cs.LG @ 2025-06-27: 1357
-
00 06-26 (4) Whole-Body Conditioned Egocentric Video Prediction Ganzkörperbedingte egozentrische Videovorhersage 全身条件化的第一人称视角视频预测 2506.21552v1 -
01 06-26 mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale mTSBench: Benchmarking Multivariate Zeitreihen Anomalieerkennung und Modellauswahl auf Scale mTSBench:制定多变时间序列基准 2506.21550v1 -
02 06-26 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Wo finden Sie Grokking in LLM Pretraining? Überwachen Sie Memorization-to-Generalization ohne Test 在 LLM 预训练中何处寻找 Grokking?无需测试即可监测从记忆到泛化的转变 2506.21551v1 -
03 06-26 HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation HalluSegBench: Counterfactual Visual Reasoning for Segmentation Halluzination Evaluation HalluSegBench:用于分割幻觉评估的反事实视觉推理 2506.21546v1 -
04 06-26 Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval Maximal aufeinander abgestimmte Materien: Vermeidung von Darstellungskollaps für robustes Cross-Modal Retrieval 最大匹配事项: 防止在强力跨模式检索中出现代表比例折叠 2506.21538v1 -
05 06-26 Exploring the Design Space of 3D MLLMs for CT Report Generation Erforschung des Design-Raums von 3D-MLLMs für die CT-Berichtserstellung 为编写CT报告探索3D MLLMs的设计空间 2506.21535v1 -
06 06-26 Chain-of-Sketch: Enabling Global Visual Reasoning Chain-of-Sketch: Globale visuelle Vernunft aktivieren 标准链链:扶持全球视觉理性 2410.08165v2 -
07 06-26 Mesh-Informed Neural Operator : A Transformer Generative Approach Mesh-informed Neural Operator : Ein transformer Generativer Ansatz 气象化神经操作器:变异创造方法 2506.16656v2 -
08 06-26 Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity Effiziente Flucht aus Sattelpunkten unter generalisierter Glätte durch selbsterklärende Regelmäßigkeit 通过自我调整常态,在普遍平滑状态下有效绕开散装货架点 2503.04712v2 -
09 06-26 Gaussian Invariant Markov Chain Monte Carlo Gaussian Invariant Markov Kette Monte Carlo 高斯不变马尔可夫链蒙特卡洛 2506.21511v1 -
10 06-26 skLEP: A Slovak General Language Understanding Benchmark skLEP: Ein slowakisches allgemeines Sprachverständnis Benchmark skLEP:斯洛伐克一般语言理解基准 2506.21508v1 -
11 06-26 NY Real Estate Racial Equity Analysis via Applied Machine Learning NY Real Estate Racial Equity Analyse über angewandtes maschinelles Lernen 通过应用机器学习进行房地产种族公平分析 2505.16946v3 -
12 06-26 Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems Prozess-Mining-gesteuerte Modellierung und Simulation zur Verbesserung der Fehlerdiagnose in cyber-physischen Systemen 由采矿流程驱动的模型和模拟模型和模拟,以加强网络物理系统中的过失诊断 2506.21502v1 -
13 06-26 Devising a solution to the problems of Cancer awareness in Telangana Lösung der Probleme des Krebsbewusstseins in Telangana 制定特拉甘纳癌症意识问题解决方案 2506.21500v1 -
14 06-26 Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment Multi-Preference Lambda-bewertet Listwise DPO für Dynamic Preference Alignment 多首选项 Lambda 加权列表 DPO 动态首选项一致 2506.19780v2 -
15 06-26 One Model to Forecast Them All and in Entity Distributions Bind Them Ein Modell, um sie zu prognostizieren Alles und in Entity-Distributionen Bind Them 预测所有实体和实体分配的模型之一 2501.15499v2 -
16 06-26 Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages Mit Phonemes: Mehrsprachigkeit von LLMs für nicht-lateinische Script-Sprachen verbessern 以电话提示:提高LLMS的非拉丁文拼写语言多重语言质量 2411.02398v3 -
17 06-26 From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents Von der Web-Suche in Richtung Agentic Deep Research: Incentivizing Search with Reasoning Agents 从网络搜索到代理深层研究:激励使用理性代理进行搜索 2506.18959v2 -
18 06-26 Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection Zuverlässige Erkennung von leerem Raum: Bedingte markierte Punktprozesse für Objekterkennung 争取可靠地探测空空空间:物体探测的有条件定点过程 2506.21486v1 -
19 06-26 Evaluation of Traffic Signals for Daily Traffic Pattern Bewertung von Verkehrssignalen für das tägliche Verkehrsmuster 对每日交通模式交通信号的评价 2506.21469v1 -
20 06-26 In-Context Learning Strategies Emerge Rationally In-Context Learning Strategies Emerge Rational 上下文学习策略的理性涌现 2506.17859v2 -
21 06-26 Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage Optimierung der Runge-Kutta-Methoden der 4. Ordnung: Dynamischer heuristischer Ansatz für Effizienz und geringen Speicher 优化第四阶极龙格-库塔方法:高效和低储存的动态超光速方法 2506.21465v1 -
22 06-26 Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs Capacity-Constrained Online-Lernen mit Verzögerungen: Scheduling Frameworks und Trade-offs bedauern 受能力制约的有延误的在线学习:时间安排框架和悔恨取舍 2503.19856v2 -
23 06-26 Aligning Spoken Dialogue Models from User Interactions Ausrichten von gesprochenen Dialogmodellen aus Benutzerinteraktionen 校对用户互动中的口语对话框模型 2506.21463v1 -
24 06-26 A Keyword-Based Technique to Evaluate Broad Question Answer Script Eine Keyword-basierte Technik zur Bewertung von Broad Question Answer Script 用于评价广泛问答脚本的关键字技术 2506.21461v1 -
25 06-26 Wild refitting for black box prediction Wilde Nachrüstung für Black Box Vorhersage 黑盒预测的野生改造 2506.21460v1 -
26 06-26 Fake it till You Make it: Reward Modeling as Discriminative Prediction Verfälschen Sie es, bis Sie es: Belohnung Modellieren als diskriminative Vorhersage 假称直到你做出它: 奖赏模型作为有偏见的预测 2506.13846v2 -
27 06-26 Measurement to Meaning: A Validity-Centered Framework for AI Evaluation Messung zur Bedeutung: Ein gültigkeitszentrierter Rahmen für die AI-Bewertung 衡量到意义:AI评价的有效性-中心框架 2505.10573v4 -
28 06-26 PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries PARALLELPROMPT: Parallelität aus großen Sprachmodellfragen extrahieren PARALLELPROMPT:从大语言模型查询中提取并行性 2506.18728v2 -
29 06-26 Towards an Optimal Control Perspective of ResNet Training Auf dem Weg zu einer optimalen Steuerungsperspektive der ResNet-Schulung 建立ResNet培训最佳控制视角 2506.21453v1 -
30 06-26 A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario Ein umfassender Datensatz für die Untertage-Miner-Erkennung in unterschiedlichen Szenarien 不同情景下地下矿工探测综合数据集 2506.21451v1 -
31 06-26 Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform Lernbare adaptive Zeit-Frequenz-Darstellung über differenzierbare Kurzzeit Fourier-Transformation 通过可微短时傅里叶变换实现可学习的自适应时频表示 2506.21440v1 -
32 06-26 New Bounds for Sparse Variational Gaussian Processes Neue Grenzen für Sparse Variational Gaussian Prozesse 偏偏多高斯进程的新界口 2502.08730v2 -
33 06-26 Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations Erklärbarkeit großer Sprachmodelle mit SMILE: Statistische Modell-agnostische Interpretierbarkeit mit lokalen Erklärungen 使用SMILE解释大语言模型的可解释性:统计模型 – – 与当地解释的可解释性 2505.21657v3 -
34 06-26 Graph Neural Network for Neutrino Physics Event Reconstruction Graph Neural Netzwerk für Neutrino Physik Ereignis Rekonstruktion 中子物理事件重建神经网络 2403.11872v2 -
35 06-26 The Sample Complexity of Learning Lipschitz Operators with respect to Gaussian Measures Die Probenkomplexität von Lipschitz-Betreibern in Bezug auf Gaussische Maßnahmen Gaussian措施方面学习利普施茨经营者的抽样复杂性 2410.23440v3 -
36 06-26 Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort Deception Detection in dyadischen Austauschen mit multimodalem maschinellem Lernen: Eine Studie über eine schwedische Kohorte 利用多式机器学习的多式机器交流中的欺骗感检测:瑞典教区研究 2506.21429v1 -
37 06-26 Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning Flow-based Single-Step-Abschluss für effizientes und expressives politisches Lernen 以流动为基础的单一步骤完成高效和明确政策学习 2506.21427v1 -
38 06-26 TracLLM: A Generic Framework for Attributing Long Context LLMs TracLLM: Ein generisches Rahmenwerk für die Zuweisung von LLMs mit langem Kontext TracLLM:长上下文LLM归因的通用框架 2506.04202v3 -
39 06-26 Continual Learning as Computationally Constrained Reinforcement Learning Kontinuierliches Lernen als Computationally Constrained Reinforcement Learning 持续学习作为计算限制的训练强化学习 2307.04345v3 -
40 06-26 Improving Stochastic Cubic Newton with Momentum Verbesserung der stochastischen Kubik Newton mit Momentum 快速改善斯托卡立方立方牛顿 2410.19644v2 -
41 06-26 Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional Aktionsminimierung trifft auf generative Modellierung: Effizientes Transition Path Sampling mit der Onsager-Machlup Funktion 行动最优化符合产生模型的生成模型:与Onsager-Machlup 职能进行高效率过渡道路抽样 2504.18506v3 -
42 06-26 Distributed Cross-Channel Hierarchical Aggregation for Foundation Models Verteilte Cross-Channel Hierarchische Aggregation für Stiftungsmodelle 基础模型的分布式跨通道分层聚合 2506.21411v1 -
43 06-26 Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference Skalierbare Bayesische Low-Rank-Anpassung von großen Sprachmodellen über stochastische Variations-Subraum-Inferenz 通过Stochastic变异性子空间推断,对大语言模型进行可缩放的Bayesian低Rank 2506.21408v1 -
44 06-26 Early Stopping Tabular In-Context Learning Frühzeitiges Stoppen des tabellarischen In-Context-Lernens 早期停止制表列表内容学习 2506.21387v1 -
45 06-26 Representation Learning of Lab Values via Masked AutoEncoders Darstellung Lernen von Laborwerten über Maskierte AutoEncoder 通过蒙面自动编码器学习实验室价值 2501.02648v3 -
46 06-26 Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection Temporal-Aware Graph Aufmerksamkeit Netzwerk für Kryptowährung Transaktion Betrugserkennung 加密货币交易欺诈侦查实时警示图关注网络 2506.21382v1 -
47 06-26 HARPT: A Corpus for Analyzing Consumers’ Trust and Privacy Concerns in Mobile Health Apps HARPT: Ein Corpus für die Analyse des Vertrauens und der Datenschutzbelange der Verbraucher in mobilen Gesundheits-Apps HARPT: 分析移动保健应用程序中消费者信任和隐私问题的一个公司 2506.19268v2 -
48 06-26 Pay Attention to Small Weights Achten Sie auf kleine Gewichte 关注小体重 2506.21374v1 -
49 06-26 Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v2 -
50 06-26 MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators MAx-DNN: Mehrstufige Arithmetik-Annäherung für energieeffiziente DNN-Hardwarebeschleuniger MAX-DNN: 能源高效 DNN 硬件加速器的多级自动测量近似法 2506.21371v1 -
51 06-26 rQdia: Regularizing Q-Value Distributions With Image Augmentation rQdia: Regularisieren der Q-Value-Distributionen mit Bildvergrößerung rQdia: 以图像放大方式规范 Q- 价值发行 2506.21367v1 -
52 06-26 SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning SMMILE: Ein sachverständiger Benchmark für multimodales medizinisches In-Context-Lernen SMMILE:多模式医学内书学习专家开发基准 2506.21355v1 -
53 06-26 Lipschitz Bounds for Persistent Laplacian Eigenvalues under One-Simplex Insertions Lipschitz Bounds für persistente Laplacian Eigenwerte unter One-Simplex-Insertionen 在单质插入下用于持久性拉板电极值的 Lipschitz Bounds 2506.21352v1 -
54 06-26 On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory Über die Fähigkeit tiefer Netzwerke, Symmetrien aus Daten zu lernen: Eine neurale Kerneltheorie 深网络从数据中学习对称的深网络能力:神经核心理论 2412.11521v2 -
55 06-26 Learning Value of Information towards Joint Communication and Control in 6G V2X Lernwert von Informationen zur gemeinsamen Kommunikation und Kontrolle in 6G V2X 面向6G V2X联合通信与控制的信息价值学习 2505.06978v2 -
56 06-26 PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks PuriDefense: Randomized Local Implizite Adversarial Purification for Defending Black-Box Query-based Attacks 防御:保护黑箱质疑式袭击的随机本地秘密对抗性净化 2401.10586v2 -
57 06-26 Regret Bounds for Robust Online Decision Making Bedauern Sie Grenzen für robuste Online-Entscheidungsfindung 对强有力的在线决策感到遗憾 2504.06820v2 -
58 06-26 DynamicBench: Evaluating Real-Time Report Generation in Large Language Models DynamicBench: Bewertung der Echtzeit-Berichtserstellung in großen Sprachmodellen 动态 bench:评价以大语言模式编制实时报告的情况 2506.21343v1 -
59 06-26 AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification AGTCNet: Ein graphisch-zeitlicher Ansatz für die Klassifikation der Primärmotorik EEG AGTCNet: 固定机动图像电子EEG分类的图表-临时方法 2506.21338v1 -
60 06-26 A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis Ein skalierbares Quantum-Neural-Netzwerk für annähernde SRBB-basierte Einheitssynthese 近似基于SRBB的单一合成的可缩放量量子神经网络 2412.03083v2 -
61 06-26 ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion ScaleGNN: Auf dem Weg zu skalierbaren Graphen-Neuralnetzwerken über adaptive High-Order Neighboring Feature Fusion ScaleGNN:通过自适应高阶邻域特征融合,迈向可扩展的图神经网络 2504.15920v4 -
62 06-26 Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts Latent Prototype Routing: Erzielen einer nahezu perfekten Lastabgleichung in Mixture-of-Experts 原型原型路由:在混合专家中实现近效果负载平衡 2506.21328v1 -
63 06-26 Stochastic Quantum Spiking Neural Networks with Quantum Memory and Local Learning Stochastische Quantum-Spiking-Neuralnetzwerke mit Quantengedächtnis und lokalem Lernen 具有量子内存和本地学习的实测量量谱剖析神经网络 2506.21324v1 -
64 06-26 On Uniform Weighted Deep Polynomial approximation Auf einheitliche Gewichte tiefe Polynom-Annäherung 统一加权深多元近似值 2506.21306v1 -
65 06-26 Context-Aware Doubly-Robust Semi-Supervised Learning Kontext-Bewusst Doppel-Robust Semi-überwachtes Lernen Doubly-Robust半监督学习 2502.15577v2 -
66 06-26 Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance Semantische Szenegrafik für Ultrasound-Bilderklärung und Scan-Anleitung 超声超声图像解释和扫描指导的语义谱图 2506.19683v2 -
67 06-26 Exploring Adapter Design Tradeoffs for Low Resource Music Generation Erforschung von Adapter-Design-Tradeoffs für Low Resource Music Generation 探索用于低资源音乐制作的适应设计取舍 2506.21298v1 -
68 06-26 Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols Teufelshand: Daten vergiften Angriffe auf lokal private Graphen-Lernprotokolle 魔鬼之手:对本地私人图案学习程序的数据毒害攻击 2506.09803v2 -
69 06-26 Improved seeding strategies for k-means and k-GMM Verbesserte Saatstrategien für k-Mittel und k-GMM 改进k-手段和k-GMM和k-GMM的播种战略 2506.21291v1 -
70 06-26 Small Encoders Can Rival Large Decoders in Detecting Groundedness Kleine Encoder können große Decoder bei der Erkennung von Erdlichkeit rivalisieren 在地面探测中能够使大型分离器在探测地面时发生迭接 2506.21288v1 -
71 06-26 Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling Energy Matching: Zusammenführen von Flow Matching- und Energy-Based-Modellen für die Generative Modellierung 能源匹配:统一流动匹配和以能源为基础的生成模型模型 2504.10612v4 -
72 06-26 Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution Hypersphärische Variations-Autoencoder mit effizienter sphärischer Cauchy-Distribution 使用高效球道球道配送的超球变异自动编码器 2506.21278v1 -
73 06-26 Lagrangian Index Policy for Restless Bandits with Average Reward Lagrangian Index Policy for Restless Bandits with Average Reward 以平均回报率衡量的无休无休止强盗拉格朗加指数政策 2412.12641v2 -
74 06-26 A GREAT Architecture for Edge-Based Graph Problems Like TSP Eine großartige Architektur für Edge-Based Graph Probleme wie TSP 象TSP那样的边缘图表问题大建筑 2408.16717v2 -
75 06-26 These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining Diese sind nicht alle Funktionen, die Sie suchen: Ein grundlegender Engpass in überwachten Pretraining 这些不是所有你正在寻找的特征: 受监督预科班的基本瓶颈。 2506.18221v2 -
76 06-26 DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster DiLoCoX: Ein kommunikationsarmer groß angelegter Ausbildungsrahmen für dezentralisierte Cluster DILOCOX:权力下放小组的低通信大范围培训框架 2506.21263v1 -
77 06-26 Simulating Hard Attention Using Soft Attention Simulation der harten Aufmerksamkeit mit weicher Aufmerksamkeit 使用软关注模拟硬关注 2412.09925v2 -
78 06-26 Wavelet Diffusion Neural Operator Wavelet Diffusions-Neuraloperator Wavelet 扩散神经操作员 2412.04833v3 -
79 06-26 Radio Map Estimation via Latent Domain Plug-and-Play Denoising Radiokarte Schätzung über Latent Domain Plug-and-Play Denoising 通过Latent Domain Plug 和 Play Disoising 无线电地图估计 2501.13472v2 -
80 06-26 From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting Von der On-Chain zum Makro: Bewertung der Bedeutung der Datenquellenvielfalt in der Kryptowährungsmarktprognose 从连网到宏观:评估数据来源多样性在加密货币市场预测中的重要性 2506.21246v1 -
81 06-26 Zero-Shot Learning for Obsolescence Risk Forecasting Zero-Shot-Lernen für Obsoleszenz-Risikoprognosen 用于悬浮风险预测的零热学习 2506.21240v1 -
82 06-26 Capturing Style in Author and Document Representation Stil in der Autor- und Dokumentdarstellung erfassen 在作者和文件代表中获取样式 2407.13358v2 -
83 06-26 Rapid Gyroscope Calibration: A Deep Learning Approach Schnelle Gyroskop-Kalibrierung: Ein tiefer Lernansatz 快速热波校准:深学习方法 2409.00488v3 -
84 06-26 Complexity-aware fine-tuning Komplexitätsbewusste Feinabstimmung 复杂度认知微调 2506.21220v1 -
85 06-26 Balancing Privacy, Robustness, and Efficiency in Machine Learning Ausbalancierende Privatsphäre, Robustheit und Effizienz im maschinellen Lernen 平衡隐私、强健和机器学习效率 2312.14712v3 -
86 06-26 Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? Kausale Vernunft in großen Sprachmodellen enthüllen: Realität oder Mirage? 大语言模型中未解的因果理由:现实还是幻影? 2506.21215v1 -
87 06-26 Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs Unüberwachtes Lernen zur Vorhersage des optimalen Transportplans zwischen unausgewogenen Graphen 用于非平衡图之间最优传输方案预测的无监督学习 2506.12025v2 -
88 06-26 LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey LLM-basierte human-agente Kooperations- und Interaktionssysteme: Eine Umfrage 以LLM为基础的人类-机构协作和互动系统:调查 2505.00753v4 -
89 06-26 Seal Your Backdoor with Variational Defense Versiegeln Sie Ihre Hintertür mit abwechslungsreicher Verteidigung 以不同防御方式密封你的后门 2503.08829v2 -
90 06-26 Artificial Delegates Resolve Fairness Issues in Perpetual Voting with Partial Turnout Künstliche Delegierte lösen Fairness-Probleme bei der ständigen Abstimmung mit teilweiser Wahlbeteiligung 持部分投票票的永久表决中的人造代表解决公平问题 2506.21186v1 -
91 06-26 PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp PCF-Grasp: Umwandlung der Punktvervollständigung in Geometrie-Feature zur Verbesserung der 6-DoF-Grasp PCF-格拉斯普:将完成点转换成几何特征,以加强6-DoF格拉斯普 2504.16320v2 -
92 06-26 Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4 Performance-Verbesserung der räumlichen semantischen Segmentierung mit angereicherten Audio-Features und agentenbasierter Fehlerkorrektur für DCASE 2025 Challenge Task 4 DCASE 2025 挑战赛任务4:利用增强音频特征和基于智能体的纠错提升空间语义分割性能 2506.21174v1 -
93 06-26 Variational Supervised Contrastive Learning Variationelles Überwachtes Kontrastuelles Lernen 差异监督反舞弊学习 2506.07413v2 -
94 06-26 Moderating the Generalization of Score-based Generative Model Moderierung der Generalisierung des Score-basierten Generativen Modells 简化基于记分制的通用创制模式 2412.07229v2 -
95 06-26 Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning Metis-RISE: RL fördert und verbessert multimodales Reasoning Model Learning Metis-RISE: RL 激励和SFT加强多模式理由示范学习 2506.13056v2 -
96 06-26 Self-Regulated Neurogenesis for Online Data-Incremental Learning Selbstregulierte Neurogenese für Online-Daten-Inkrementelles Lernen 在线数据强化学习自调节神经源 2403.14684v2 -
97 06-26 Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design Vielfältige Mini-Batch-Auswahl in Verstärkungs-Lernen für effiziente chemische Exploration in de novo Drug Design 为在新药设计中进行高效化学勘探而加强学习的多样化小型批次选择 2506.21158v1 -
98 06-26 Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation Schätzung von transformerbasierten räumlich-zeitlichen kontrafaktischen Ergebnissen 基于Transformer的时空反事实结果估计 2506.21154v1 -
99 06-26 A Novel Federated Learning-Based IDS for Enhancing UAVs Privacy and Security Ein neuartiges, auf Federated Learning basierendes IDS zur Verbesserung der Privatsphäre und Sicherheit von UAVs 一种用于增强无人机隐私和安全的新型联邦学习入侵检测系统(IDS) 2312.04135v3 -
100 06-26 Linearity-based neural network compression Linearitätsbasierte neuronale Netzwerkkompression 线性神经网络压缩 2506.21146v1 -
101 06-26 Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion Personalisiertes Federated Learning durch Dual-Prompt-Optimierung und Cross Fusion 通过双速优化和交叉融合进行个性化联邦学习 2506.21144v1 -
102 06-26 Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks Generative Adversarial Evasion und Out-of-Distribution-Detection für UAV-Cyber-Attacks 无人驾驶航空飞行器网络设备生成反向疏散和分配外探测 2506.21142v1 -
103 06-26 Multi-convex Programming for Discrete Latent Factor Models Prototyping Multi-convex-Programmierung für diskrete Latent Factor Models Prototyping Discrete 后端因数模型的多contex 编程程序 2504.01431v2 -
104 06-26 DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding DBConformer: Doppel-Branch-Konvolutionstransformator für EEG-Dekodierung DBConformer:用于EEG解码的双分支卷积Transformer 2506.21140v1 -
105 06-26 Solving Inverse Problem for Multi-armed Bandits via Convex Optimization Inverses Problem für mehrarmige Banditen durch Convex-Optimierung lösen 通过 Convex 优化解决多武装强盗的反向问题 2501.18945v3 -
106 06-26 NaLaFormer: Norm-Aware Linear Attention for Transformer Models NaLaFormer: Norm-Aware Lineare Aufmerksamkeit für Transformer-Modelle NaLaFormer: 变形模型的诺姆- Aware 线性注意 2506.21137v1 -
107 06-26 Inverse Reinforcement Learning via Convex Optimization Inverse Verstärkungs-Lernen über Convex-Optimierung 通过Convex优化化进行反强化学习 2501.15957v2 -
108 06-26 Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks Curriculum-geführtes Antifragiles Verstärkungslernen für sichere UAV-Dekonfliktion unter Beobachtungs-Raumangriffen 在观测-空间攻击下安全无人驾驶飞行器消除冲突课程-指导反脆弱强化学习 2506.21129v1 -
109 06-26 Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments Robuster Policy-Switch für Antifragiles Verstärkungslernen für UAV-Deconfliction in Adversarial Environments 在逆向环境中为无人驾驶航空器消除冲突而进行抗脆弱强化学习的强有力政策转换 2506.21127v1 -
110 06-26 Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection Trade-Off-Grenzen drücken: Kompakte und dennoch effektive Fernerkundungs-Änderungserkennung 推进贸易-开放边界:结合但有效的遥感变化探测 2506.21109v1 -
111 06-26 Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges Unhaltbar: Unpaarte Single-Cell-Multi-Perturbation-Schätzung durch Dual Conditional Diffusion Implizite Brücken 不持久: 由双条件分解隐形桥进行无压单细胞多扰动估计 2506.21107v1 -
112 06-26 Learning to Skip the Middle Layers of Transformers Lernen, die mittleren Schichten der Transformer zu überspringen 学习跳过变换器的中层 2506.21103v1 -
113 06-26 Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning Interpretierbares Hierarchisches Konzept durch aufmerksamkeitsorientiertes Graphenlernen 通过引人指导图表学习推理的可解释的等级概念 2506.21102v1 -
114 06-26 FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation FeDa4Fair: Client-Level-Federated Datasets für die Fairness-Bewertung FeDa4fair:公平评价客户-联邦数据集 2506.21095v1 -
115 06-26 Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection Chain-of-Thought verbesserte Shallow Transformer für drahtlose Symbolerkennung 用于无线符号检测的思维链增强浅层Transformer 2506.21093v1 -
116 06-26 CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions CovDocker: Benchmarking Covalent Drug Design mit Aufgaben, Datensätzen und Lösungen CovDocker:用任务、数据集和解决办法确定共价药物设计基准 2506.21085v1 -
117 06-26 EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception EgoAdapt: Adaptive multisensorische Destillation und politisches Lernen für eine effiziente egozentrische Wahrnehmung EgoAdapt: 适应性多感性蒸馏和政策学习,促进高效率的以地球为中心感知 2506.21080v1 -
118 06-26 Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games Homogenisierung von Multi-Agent-Learning-Dynamik in Finite-State Markov Spiele 在Finite- State-Markov运动会中多剂学习动态的同质化 2506.21079v1 -
119 06-26 Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph Verbesserung der LLM-Tool-Nutzung mit hochwertigen Instruktionsdaten aus Wissensgrafik 利用来自知识图的高质量教学数据加强LLM工具的使用 2506.21071v1 -
120 06-26 SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations SDE Matching: Skalierbares und simulationsfreies Training latenter stochastischer Differentialgleichungen SDE 匹配:可缩放和模拟无模拟的静态碎裂差异等量模拟培训 2502.02472v3 -
121 06-26 FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning FedDAA: Dynamisches Client-Clustering für Konzept Drift-Anpassung im Federated Learning FedDAA: 联邦学习中适应概念的动态客户集群组合 2506.21054v1 -
122 06-26 Sharp concentration of uniform generalization errors in binary linear classification Scharfe Konzentration von einheitlichen Verallgemeinerungsfehlern in der binären linearen Klassifikation 二进线线性分类中统一一般化误差的集中程度 2505.16713v2 -
123 06-26 Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling Verbesserung der Diffusions-basierten Bildbearbeitung Treue durch Anleitung und Planung 通过指导和日程安排改进基于传播的图像编辑信仰 2506.21045v1 -
124 06-26 Efficient Skill Discovery via Regret-Aware Optimization Effiziente Skill Discovery durch regret-aware Optimierung 通过Regret-Aware 优化发现高效技能 2506.21044v1 -
125 06-26 Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning Strenge Subgoal Execution: Zuverlässige Langzeitplanung im Hierarchischen Stärkungslernen 严格次级目标执行:在等级强化学习中可靠的长期规划 2506.21039v1 -
126 06-26 RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment RL-Selector: Verstärkte lernorientierte Datenauswahl über Redundanzbewertung RL-选择者:通过裁员评估甄选强化学习指导数据 2506.21037v1 -
127 06-26 An Information-Theoretic Analysis for Federated Learning under Concept Drift Eine informationstheoretische Analyse für das Federated Learning unter Konzept Drift 根据 “ 漂流概念 “ 进行的联邦学习信息理论分析 2506.21036v1 -
128 06-26 SceneGenAgent: Precise Industrial Scene Generation with Coding Agent SceneGenAgent: Präzise industrielle Szenegenerierung mit Coding Agent SceneGenAgent:利用编码智能体实现精确的工业场景生成 2410.21909v3 -
129 06-26 Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning Little By Little: Kontinuierliches Lernen über selbsttätiges Sparse Mixture-of-Rank Adaptives Lernen 一点一点:通过自激活稀疏秩混合自适应学习实现持续学习 2506.21035v1 -
130 06-26 PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling PCDVQ: Verbesserung der Vector Quantization für große Sprachmodelle über Polar Coordinate Entkopplung PCDVQ:通过极地协调脱钩,加强大语言模型的矢量量化 2506.05432v2 -
131 06-26 TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence TRIDENT: Tri-Modal Molecular Representation Learning mit taxonomischen Anmerkungen und lokaler Korrespondenz TRIDENT:结合分类学注释和局部对应的三模态分子表示学习 2506.21028v1 -
132 06-26 Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems Mischung von Experten-augmented Deep Unfolding für Aktivitätserkennung in IRS-gestützten Systemen IRS辅助系统中活动探测专家加固深载混合体 2502.20183v2 -
133 06-26 HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation HybridQ: Hybrid-Klassisch-Quantum Generatives Adversariales Netzwerk für die Bildgenerierung von Hauterkrankungen HybridQ:用于皮肤病图像生成的混合经典-量子生成对抗网络 2506.21015v1 -
134 06-26 Efficient Image Generation with Variadic Attention Heads Effiziente Bildgenerierung mit verschiedenen Aufmerksamkeitsköpfen 高效的图像生成,由Variadic关注组织负责人负责 2211.05770v3 -
135 06-26 Proximal Point Method for Online Saddle Point Problem Proximale Point-Methode für Online-Sättelpunkt-Problem 在线搭配点问题的近点方法 2407.04591v3 -
136 06-26 Review learning: Real world validation of privacy preserving continual learning across medical institutions Review learning: Echte Welt-Validierung der Privatsphäre Erhaltung kontinuierlichen Lernens in medizinischen Einrichtungen 审查学习:维护各医疗机构持续学习的隐私的真实世界验证 2210.09394v2 -
137 06-26 Distilling Normalizing Flows Destillieren von Normalisierungsströmen 保持正常流动 2506.21003v1 -
138 06-26 Genetic Algorithm with Innovative Chromosome Patterns in the Breeding Process Genetischer Algorithmus mit innovativen Chromosomenmustern im Zuchtprozess 育种过程中创新性染色体模式的遗传数值 2501.18184v3 -
139 06-26 Pretrained Reversible Generation as Unsupervised Visual Representation Learning Pretrained Reversible Generation als unüberwachtes visuelles Repräsentationslernen 将预训练可逆生成作为无监督视觉表示学习 2412.01787v3 -
140 06-26 Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance Schritt für Schritt Video-zu-Audio-Synthese über Negative Audio-Anleitung 通过消极音频指导,逐步进行视频到视听合成 2506.20995v1 -
141 06-26 Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity Überbrückung der Lücke zwischen Annäherung und Lernen durch Optimale Annäherung durch ReLU MLPs der Maximalregularität 通过最大合规性RELU MLP,通过最佳接近缩小接近与学习之间的差距 2409.12335v4 -
142 06-26 SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes SharpZO: Hybrid Sharpness-Aware Vision Sprachmodell Prompt Tuning via Forward-Only Passes SharpZO: 混合尖锐-敏锐视觉语言模型,通过前向-单行道快速调试 2506.20990v1 -
143 06-26 Can Gradient Descent Simulate Prompting? Kann Gradient Descent Simulate Prompting? 梯子源模拟能刺激吗? 2506.20989v1 -
144 06-26 Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem Split-Merge: Ein unterschiedsbasierter Ansatz für das Dominante Eigenwertproblem Split-Merge:对支配性电子价值问题采取基于差异的办法 2501.15131v2 -
145 06-26 Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations Generalisierte Tensor-basierte Parameter-Effizient Feinsteuerung über Lie Group Transformationen 通过 “ 谎言集团变形 “ 进行通用的Tensor基准参数有效精美调整 2504.00851v2 -
146 06-26 Explainable quantum regression algorithm with encoded data structure Erklärbarer Quantenregressionsalgorithmus mit kodierter Datenstruktur 具有编码数据结构的可解释量子回归算法 2307.03334v5 -
147 06-26 EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora EraRAG: Effiziente und inkrementelle Retrieval-Augmented Generation für wachsende Korpora EraRAG:面向不断增长语料库的高效增量检索增强生成 2506.20963v1 -
148 06-26 Antibody Design and Optimization with Multi-scale Equivariant Graph Diffusion Models for Accurate Complex Antigen Binding Antikörper-Design und Optimierung mit mehrstufigen äquivarianten Graphen-Diffusions-Modellen für präzise, komplexe Antigen-Bindung 防反体设计和优化,采用多种规模等同图形扩散模型,用于准确的复合抗原装订 2506.20957v1 -
149 06-26 Model State Arithmetic for Machine Unlearning Modell Staat Arithmetik für Maschine Unlearning 机器脱修示范国 2506.20941v1 -
150 06-26 Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics Prognose geopolitischer Ereignisse mit einem spare Temporal Fusion Transformer und Gaußschen Prozesshybrid: Eine Fallstudie in Nahost und US-Konfliktdynamik 以松散的时空融合变异器和高斯进程混合器预测地缘政治事件:中东和美国冲突动态案例研究 2506.20935v1 -
151 06-26 Lower Bounds on the Size of Markov Equivalence Classes Untere Schranken für die Größe der Markov-Äquivalenzklassen 马尔可夫等价类大小的下界 2506.20933v1 -
152 06-26 Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market Quantum-Verstärkung-Learning-Trading-Agent für Sektor-Rotation auf dem Aktienmarkt Taiwan 台湾股市部门轮换的量级强化学习贸易代理 2506.20930v1 -
153 06-26 Active Learning for Manifold Gaussian Process Regression Aktives Lernen für manifolde Gaußsche Prozessregression 流形高斯过程回归的主动学习 2506.20928v1 -
154 06-26 Interpretable Representation Learning for Additive Rule Ensembles Interpretable Representative Learning for Additive Rule Ensembles 补充规则会议的解释性代表性学习 2506.20927v1 -
155 06-26 LLM-guided Chemical Process Optimization with a Multi-Agent Approach LLM-geführte chemische Prozessoptimierung mit einem Multi-Agent-Ansatz LLM引导的化学过程优化:多智能体方法 2506.20921v1 -
156 06-26 Machine learning of microstructure–property relationships in materials leveraging microstructure representation from foundational vision transformers Maschinelles Lernen von Mikrostruktur-Eigenschaftsbeziehungen in Materialien, die die Mikrostrukturdarstellung von grundlegenden Vision-Transformatoren nutzen 利用基础视觉变压器代表微观结构的材料中微型结构-财产关系 2501.18637v2 -
157 06-26 Explainable AI for Radar Resource Management: Modified LIME in Deep Reinforcement Learning Erklärbare KI für Radar-Ressourcenmanagement: Modifizierte LIME im Deep Reinforcement Learning 用于雷达资源管理的可解释AI:深度强化学习中修改的LIME 2506.20916v1 -
158 06-26 ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models ZKPROV: Ein Null-Knowledge-Ansatz zur Datensatzprovenz für große Sprachmodelle ZKPROV:大语言模型数据集验证零知识化办法 2506.20915v1 -
159 06-26 Faster Fixed-Point Methods for Multichain MDPs Schnellere Fixed-Point-Methoden für Multichain-MDPs 《多链 MDP快速固定点方法》 2506.20910v1 -
160 06-26 Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL Optimale Single-Policy-Probenkomplexität und transiente Abdeckung für durchschnittlich reward Offline-RL 平均离岸平均回报率的 最佳单一政策抽样复杂程度和中度覆盖率 2506.20904v1 -
161 06-26 Graph-Structured Feedback Multimodel Ensemble Online Conformal Prediction Graph-strukturiertes Feedback Multimodel Ensemble Online Conformal Prediction 多模型组合在线非正式预测 2506.20898v1 -
162 06-25 (3) On the Necessity of Output Distribution Reweighting for Effective Class Unlearning Über die Notwendigkeit der Neugewichtung der Output-Distribution für effektives Klassenunlernen 有效班级取消学习时必须增加产出分配的加权 2506.20893v1 -
163 06-25 Next-token prediction capacity: general upper bounds and a lower bound for transformers Next-token Vorhersagekapazität: allgemeine obere Grenzen und eine untere Grenze für Transformatoren 下对位预测能力:变压器一般上限值和下限值 2405.13718v3 -
164 06-25 Omniwise: Predicting GPU Kernels Performance with LLMs Omniwise: Vorhersage der Leistung von GPU-Kerneln mit LLMs 总括性: 使用 LLMs 预测 GPU 核心内核性能 2506.20886v1 -
165 06-25 HyperINF: Unleashing the HyperPower of the Schulz’s Method for Data Influence Estimation HyperINF: Lösen der Hyperkraft der Schulz-Methode zur Bestimmung des Einflusses von Daten HyperINF: 释放Schulz数据影响估计方法的超功率 2410.05090v2 -
166 06-25 Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance Komplexe Modelltransformationen durch verstärktes Lernen mit unsicherer menschlicher Führung 以不确定的人类指导加强学习的复杂模式转变 2506.20883v1 -
167 06-25 Fairly Accurate: Fairness-aware Multi-group Target Detection in Online Discussion Ziemlich genau: Fairness-bewusste Multi-Gruppen-Zielerkennung in Online-Diskussion 准确无误:在线讨论中公平了解多群体目标检测 2407.11933v2 -
168 06-25 Always Skip Attention Immer die Aufmerksamkeit überspringen 总是跳过关注 2505.01996v2 -
169 06-25 A3 : an Analytical Low-Rank Approximation Framework for Attention A3: ein analytischer Rahmen für die Annäherung an den Low-Rank-Wert A3: 分析性低Rank接近度关注框架 2505.12942v3 -
170 06-25 Empowering Digital Agriculture: A Privacy-Preserving Framework for Data Sharing and Collaborative Research Empowering Digital Agriculture: Ein datenschutzschonender Rahmen für den Datenaustausch und die Sonderforschung 赋予数字农业权力:数据分享和合作研究保护隐私框架 2506.20872v1 -
171 06-25 High-dimensional Contextual Bandit Problem without Sparsity Hochdimensionales Kontext-Bandit-Problem ohne Sparsamkeit 无分分的高维背景土匪问题 2306.11017v2 -
172 06-25 Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA Leaner Training, Lower Leakage: Die Erinnerung an LLM Fine-Tuning mit LoRA 皮皮培训,《下下渗漏:重新研究LLM与LORA的精细调整的记忆 2506.20856v1 -
173 06-25 Subspace-Distance-Enabled Active Learning for Efficient Data-Driven Model Reduction of Parametric Dynamical Systems Subspace-Distance-Enabled Active Learning für effiziente datengetriebene Modellreduktion parametrischer dynamischer Systeme 减少参数动态系统的高效数据驱动模型 2505.00460v2 -
174 06-25 Multi-Objective Reinforcement Learning for Cognitive Radar Resource Management Multi-Zielives Stärkungslernen für Kognitives Radarressourcenmanagement 多目标强化学习促进认知雷达资源管理 2506.20853v1 -
175 06-25 InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction InterFormer: Effektives Heterogenes Interaktionslernen für Click-through-Rate-Vorhersage Interformer: 有效不同差异的交互式学习,用于点击频频率预测 2411.09852v3 -
176 06-25 Learning-Based Resource Management in Integrated Sensing and Communication Systems Lernbasiertes Ressourcenmanagement in integrierten Sensing- und Kommunikationssystemen 综合遥感和通信系统基于学习的资源管理 2506.20849v1 -
177 06-25 From Tiny Machine Learning to Tiny Deep Learning: A Survey Vom kleinen maschinellen Lernen bis zum kleinen tiefen Lernen: Eine Umfrage 从小机器学习到小深习:调查 2506.18927v2 -
178 06-25 Reducing Biases in Record Matching Through Scores Calibration Reduzierung von Biasen in Rekorden, die durch Score-Kalibrierung übereinstimmen 通过计分校准减少记录匹配比分 2411.01685v2 -
179 06-25 Uncertainty-Aware Machine-Learning Framework for Predicting Dislocation Plasticity and Stress-Strain Response in FCC Alloys Unsicherheitsbewusstes Machine-Learning-Framework für die Vorhersage von Versetzungsplastizität und Spannungs-Dehnungs-Verhalten in FCC-Legierungen FCC合金中预测位错塑性和应力-应变响应的不确定性感知机器学习框架 2506.20839v1 -
180 06-25 Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning Globale falsche Negative auf der Flucht entdecken für selbstüberwachtes kontraproduktives Lernen 为自我监督的反竞争学习而发现飞行中的全球虚假负差 2502.20612v2 -
181 06-25 Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data Composite Flow passend zum Verstärkungslernen mit Shifted-Dynamics-Daten 与上下动动量数据匹配的强化学习综合流程 2505.23062v2 -
182 06-25 Harnessing the Universal Geometry of Embeddings Die universelle Geometrie der Einbettungen nutzen 利用通用嵌入式几何法 2505.12540v3 -
183 06-25 TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation TaxaDiffusion: Progressiv ausgebildetes Diffusionsmodell für die Generierung feinkörniger Arten 传税:逐步培训的传税模式 2506.01923v2 -
184 06-25 Efficacy of Temporal Fusion Transformers for Runoff Simulation Wirksamkeit von Temporal Fusion Transformern für Runoff Simulation 用于模拟径流的时空熔化变换器的效能 2506.20831v1 -
185 06-25 Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery Erweiterte Computer-Vision für die Extraktion georeferenzierten Fahrzeug-Trajektorien aus Drohnenbildern 从无人机图像中提取地理参照车辆轨迹的高级计算机愿景 2411.02136v3 -
186 06-25 Demystifying Distributed Training of Graph Neural Networks for Link Prediction Entmystifizieren der verteilten Ausbildung von Graphen-Neural-Netzwerken zur Link-Vorhersage 对图形神经网络进行缩小神秘性分布培训,促进连结预测 2506.20818v1 -
187 06-25 Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers Universelle und effiziente Erkennung von Adversarialdaten durch nicht einheitliche Auswirkungen auf Netzwerkebenen 通过对网络层的不统一影响普遍和高效率地检测对立数据 2506.20816v1 -
188 06-25 Divide, Specialize, and Route: A New Approach to Efficient Ensemble Learning Divide, Specialize und Route: Ein neuer Ansatz für effizientes Ensemble-Lernen 区分、专门和路线:高效组合学习的新方式 2506.20814v1 -
189 06-25 FINN-GL: Generalized Mixed-Precision Extensions for FPGA-Accelerated LSTMs FINN-GL: Generalisierte Mischpräzisionserweiterungen für FPGA-beschleunigte LSTMs FINN-GL:FPGA加速式LSTMs通用混合精密扩展 2506.20810v1 -
190 06-25 GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization GPU-Kernel-Wissenschaftler: Ein LLM-getriebenes Framework für iterative Kernel-Optimierung GPU 核心科学家:循环核心优化LLM-驱动框架 2506.20807v1 -
191 06-25 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas Die Ideation-Execution-Gap: Ergebnisse der LLM-generierten gegen menschliche Forschungsideen 观察与执行差距:LLM-Genered与人类研究概念的执行结果 2506.20803v1 -
192 06-25 Structural System Identification via Validation and Adaptation Strukturelle Systemidentifikation durch Validierung und Anpassung 通过校验和适应确定结构系统 2506.20799v1 -
193 06-25 Stochastic Parameter Decomposition Stochastische Parameterzerlegung 随机参数分解 2506.20790v1 -
194 06-25 Spiking Neural Networks for SAR Interferometric Phase Unwrapping: A Theoretical Framework for Energy-Efficient Processing Spiking Neural Networks for SAR Interferometric Phase Unwrapping: Ein theoretischer Rahmen für energieeffiziente Verarbeitung 用于合成孔径雷达干涉测量阶段拆解的Spiking神经网络:节能处理理论框架 2506.20782v1 -
195 06-25 Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon Stabile Minima der ReLU Neuronalen Netzwerke leiden unter dem Fluch der Dimensionalität: Das neurale Shattering Phänomen ReLU神经网络中受多面性诅咒之苦的神经网络的稳定微型:神经震荡现象 2506.20779v1 -
196 06-25 Steering Your Diffusion Policy with Latent Space Reinforcement Learning Steuerung Ihrer Diffusionspolitik mit latentem Raum-Verstärkung-Lernen 指导您的发射政策 与远程空间加强学习 2506.15799v2 -
197 06-25 Revealing higher-order neural representations of uncertainty with the Noise Estimation through Reinforcement-based Diffusion (NERD) model Enthüllen von neuronalen Darstellungen höherer Ordnung von Unsicherheiten mit der Lärmschätzung durch das Modell der Verstärkungs-basierten Diffusion (NERD) 通过基于强化的扩散噪声估计(NERD)模型揭示不确定性的高阶神经表征 2503.14333v2 -
198 06-25 Stochastic and Non-local Closure Modeling for Nonlinear Dynamical Systems via Latent Score-based Generative Models Stochastische und nicht-lokale Verschlussmodellierung für nichtlineare dynamische Systeme über latente Score-basierte Generative Modelle 通过低记分生成模型为非线性动态系统模拟非线性动态系统建立存储和非本地关闭模型 2506.20771v1 -
199 06-25 GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs GASP: Effiziente Black-Box-Generierung von Adversarial Suffixen für Jailbreaking LLMs GASP: 面向LLM越狱的对抗性后缀高效黑盒生成 2411.14133v2 -
200 06-25 Control and optimization for Neural Partial Differential Equations in Supervised Learning Steuerung und Optimierung für neurale Teildifferenzialgleichungen im Supervised Learning 受监督学习中神经部分差异等同的控制与优化 2506.20764v1 -
201 06-25 Characterization and Mitigation of Training Instabilities in Microscaling Formats Charakterisierung und Milderung von Ausbildungsinstabilitäten in Mikroskalierungsformaten 微缩缩放格式培训不稳定情况的特点和缓解 2506.20752v1 -
202 06-25 Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers Mehrere Ströme der Beziehungsextraktion: Anreicherung und Erinnerung an Transformer 关系采掘的多种流流:变形器中的丰富和回顾 2506.20746v1 -
203 06-25 A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools Eine KI-Umfrage für die Materialwissenschaft: Gründungsmodelle, LLM-Agenten, Datensätze und Tools 材料科学学会调查:基础模型、LLM代理、数据集和工具 2506.20743v1 -
204 06-25 Test-time Scaling Techniques in Theoretical Physics – A Comparison of Methods on the TPBench Dataset Testzeitskalierungstechniken in der Theoretischen Physik – Ein Vergleich der Methoden am TPBench-Datensatz 理论物理试验时间增强技术 – – TPBench数据集方法比较 2506.20729v1 -
205 06-25 On Convolutions, Intrinsic Dimension, and Diffusion Models Über Faltungen, intrinsische Dimension und Diffusionsmodelle 论卷积、内在维度和扩散模型 2506.20705v1 -
206 06-25 Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models Diffusion Tree Sampling: Skalierbare Inferenz-Zeit-Ausrichtung von Diffusionsmodellen 扩散树采样:扩散模型的可缩放推推-时间对齐 2506.20701v1 -
207 06-25 DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy DemoDiffusion: One-Shot-Imitation des Menschen mit vortrainierter Diffusion Policy DemoDiffusion: 利用预训练扩散策略进行单样本人类模仿 2506.20668v1 -
208 06-25 Data Quality in Crowdsourcing and Spamming Behavior Detection Datenqualität bei Crowdsourcing und Spamming Verhaltenserkennung 众包和垃圾传播行为检测数据质量 2404.17582v2 -
209 06-25 Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning 听不见邪恶:在联邦学习中发现恶意服务器的渐变渗漏 2506.20651v1 -
210 06-25 Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer Mastering Multiple-Expert Routing: Realisierbare $H$-Konsistenz und starke Garantien für Learning to Defer 掌握多专家路由:可实现的$H$一致性与学习延迟决策的强保证 2506.20650v1 -
211 06-25 Disentangled representations of microscopy images Entwirrte Darstellungen von Mikroskopiebildern 显微镜图像的分解表达式 2506.20649v1 -
212 06-25 Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices Effizientes Federated Learning mit verschlüsselter Datenfreigabe für daten-heterogene Edge-Geräte 数据异异异边设备加密数据共享加密数据高效联邦学习 2506.20644v1 -
213 06-25 Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data Ausbalancieren der Skalen: Ein theoretischer und algorithmischer Rahmen für das Lernen aus unausgewogenen Daten 平衡尺度:从不平衡数据中学习的理论和算法框架 2502.10381v2 -
214 06-25 Towards Community-Driven Agents for Machine Learning Engineering Auf dem Weg zu gemeinschaftsgetriebenen Agenten für Machine-Learning-Engineering 迈向社区驱动的机器学习工程智能体 2506.20640v1 -
215 06-25 First-order methods for stochastic and finite-sum convex optimization with deterministic constraints Verfahren erster Ordnung zur stochastischen und finite-sum-konvexen Optimierung mit deterministischen Zwängen 具有确定性限制的随机和有限总消费的优化第一阶方法 2506.20630v1 -
216 06-25 PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models PLoP: Präzise LoRA-Platzierung für effiziente Feinsteuerung großer Modelle PLP: 高效微调大型模型的精确LORA定位 2506.20629v1 -
217 06-25 On Context-Content Uncertainty Principle Zu Kontext-Inhalt-Unsicherheitsprinzip 关于内含内含的不确定性原则 2506.20699v1 -
218 06-25 Probing Quantum Spin Systems with Kolmogorov-Arnold Neural Network Quantum States Probing Quantum Spin Systems mit Kolmogorov-Arnold Neural Network Quantum States 用Kolmogorov-Arnold神经网络量子态探测量子自旋系统 2506.01891v3 -
219 06-25 Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning Lost in Retraining: Roaming des Parameterraums exponentieller Familien unter geschlossenem Loop-Lernen 迷失于再训练:闭环学习下在指数族参数空间中漫游 2506.20623v1 -
220 06-25 Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models Recycling the Web: Eine Methode zur Verbesserung der Vorschulung von Daten Qualität und Menge für Sprachmodelle 网上再循环:提高语文模式培训前数据质量和数量的方法 2506.04689v2 -
221 06-25 Do Concept Bottleneck Models Respect Localities? Respektieren Konzept-Engpass-Modelle die Lokalitäten? 概念瓶颈模型是否尊重局部性? 2401.01259v5 -
222 06-25 From $\mathcal{O}(n^{2})$ to $\mathcal{O}(n)$ Parameters: Quantum Self-Attention in Vision Transformers for Biomedical Image Classification Von $\mathcal{O}(n^{2})$ bis $\mathcal{O}(n)$ Parameter: Quanten Selbstaufmerksamkeit in Visionstransformatoren für die biomedizinische Bildklassifikation 从$\mathcal{O}(n^{2})$到$\mathcal{O}(n)$参数:生物医学图像分类视觉变异器中的量子自我注意 2503.07294v2 -
223 06-25 H-FEX: A Symbolic Learning Method for Hamiltonian Systems H-FEX: Eine symbolische Lernmethode für Hamilton-Systeme H-FEX:汉密尔顿系统符号学习方法 2506.20607v1 -
224 06-25 LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization LT-PINN: Lagrangian Topologie-bewusste physik-informierte Neuronales Netzwerk für boundary-focused Engineering Optimization LT-PINN:Lagrangian 地形 – – 具有意识的地形 – – 物理意识 – – 以边界为重点的工程优化知情神经网络 2506.06300v3 -
225 06-25 FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation FluoroSAM: Ein sprachförderndes Foundation-Modell für flexible Röntgenbild-Segmentierung FluororosAM:灵活X射线图像分割语言快速基础模型 2403.08059v3 -
226 06-25 On the Role of Context in Reading Time Prediction Zur Rolle des Kontexts bei der Lesezeitvorhersage 关于在阅读时间预测方面背景作用的 2409.08160v4 -
227 06-25 The kernel of graph indices for vector search Der Kernel der Graphenindizes für die Vektorsuche 用于矢量搜索的图表索引核心 2506.20584v1 -
228 06-25 Rethinking Early Stopping: Refine, Then Calibrate Frühes Aufhören neu denken: Verfeinern, dann kalibrieren 重新思考早期停止: 校正, 然后校准 2501.19195v2 -
229 06-25 Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling Entsperren des In-Context-Lernens für natürliche Datensätze jenseits der Sprachmodellierung 解锁超出语言建模之外的自然数据集的文中学习 2501.06256v2 -
230 06-25 Causal Representation Learning with Observational Grouping for CXR Classification Kausales Repräsentationslernen mit Beobachtungsgruppe für CXR-Klassifikation 与CXR分类观察组一起进行因果代表性学习 2506.20582v1 -
231 06-25 TabArena: A Living Benchmark for Machine Learning on Tabular Data TabArena: Ein lebender Benchmark für maschinelles Lernen auf Tabellendaten TabArena:用表格数据进行机器学习的活基准 2506.16791v2 -
232 06-25 Exploring Graph-Transformer Out-of-Distribution Generalization Abilities Erforschen von Graph-Transformer-Verallgemeinerungsfähigkeiten im Out-of-Distribution-Bereich 探索图图向外转移 2506.20575v1 -
233 06-25 Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series Benchmarking unüberwachter Strategien zur Erkennung von Anomalien in multivariaten Zeitreihen 确定多变时间序列中异常探测不受监督战略的基准 2506.20574v1 -
234 06-25 LARP: Learner-Agnostic Robust Data Prefiltering LARP: Learner-Agnostic Robuste Datenvorfilterung LARP: 学习者-不可知强力数据预过滤 2506.20573v1 -
235 06-25 Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control Verstärktes Lernen steigert die Produktion von Windfarmen durch Ermöglichung der Closed-Loop-Kollaborative Steuerung 增强学习能力,通过扶持闭路合作控制,增加风农场发电量 2506.20554v1 -
236 06-25 Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks Weniger Aufmerksamkeit auf trügerische Artefakte: Robuste Erkennung von komprimierten Deepfakes auf Online-Sozialen Netzwerken 较少注意欺骗性人工制品:在网上社交网络上大力发现压缩的深层假象 2506.20548v1 -
237 06-25 Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls Kontextuelle Optimierung unter Kovariate Shift: Ein robuster Ansatz durch Intersektion von Wassersteinkugeln 共变转移下的上下文优化:通过交叉的瓦森斯泰因球 采取强有力的方法 2406.02426v2 -
238 06-25 Demonstration of effective UCB-based routing in skill-based queues on real-world data Demonstration eines effektiven UCB-basierten Routings in kompetenzbasierten Warteschlangen auf realen Daten 根据真实世界数据,在基于技能的队列中,演示基于UCB的有效路线 2506.20543v1 -
239 06-25 Adversarial Reasoning at Jailbreaking Time Adversariales Reasoning zur Jailbreaking-Zeit 越狱时的对抗性推理 2502.01633v2 -
240 06-25 Physics-Informed Machine Learning Regulated by Finite Element Analysis for Simulation Acceleration of Laser Powder Bed Fusion Physik-informiertes maschinelles Lernen reguliert durch Finite Element Analyse für Simulation Beschleunigung von Laser-Pulver Bed Fusion 受激光粉尘床溶化加速模拟加速的有限元素分析规范的物理系统化机械学习 2506.20537v1 -
241 06-25 WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads WattsOnAI: Messen, Analysieren und Visualisieren von Energie und Carbon Footprint von KI-Workloads WattsOnAI:AI工作量的测量、分析、可视化能源和碳足迹 2506.20535v1 -
242 06-25 Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery Globale Konvergenz iterativ umgewichteter Least Squares für robuste Subraum-Recovery 自动再加权最低空间平面对强力亚空间恢复的全球趋同 2506.20533v1 -
243 06-25 Attention with Trained Embeddings Provably Selects Important Tokens Aufmerksamkeit bei trainierten Einbettungen wählt wahrscheinlich wichtige Token aus 与经过训练的嵌入器的关注 2505.17282v3 -
244 06-25 Variational Learning Finds Flatter Solutions at the Edge of Stability Variationelles Lernen findet flachere Lösungen am Rande der Stabilität 稳定边缘的变异学习发现快餐式解决方案 2506.12903v2 -
245 06-25 Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains Proximale Kontrolle von UAVs mit Federated Learning für Mensch-Roboter Collaborative Domains 人类-机器人合作域的联邦学习系统对无人驾驶航空器的优化控制 2412.02863v2 -
246 06-25 Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation Industrial Energy Disaggregation mit digitalem Twin-generated Dataset und effizienter Datenvergrößerung 工业能源分类与数字双生成数据集和高效数据扩增 2506.20525v1 -
247 06-25 On Advancements of the Forward-Forward Algorithm Auf den Fortschritten des Vorwärtsalgorithmus 关于前向前前进算法的推进 2504.21662v2 -
248 06-25 Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards Asymmetrisches REINFORCE für Off-Policy-Verstärkung-Lernen: Ausgleich positiver und negativer Belohnungen 非政策加强学习的不对称REINFORCE对非政策加强学习的影响:平衡正与负的奖励 2506.20520v1 -
249 06-25 VRAIL: Vectorized Reward-based Attribution for Interpretable Learning VRAIL: Vectorized Reward-based Attribution for Interpretable Learning VRAIL: 可解释性学习的矢量奖励 2506.16014v3 -
250 06-25 WallStreetFeds: Client-Specific Tokens as Investment Vehicles in Federated Learning WallStreetFeds: Kundenspezifische Token als Investment Vehicles in Federated Learning WallStreetFededs: 客户特有名称作为联邦学习联盟的投资工具 2506.20518v1 -
251 06-25 Fast ground penetrating radar dual-parameter full waveform inversion method accelerated by hybrid compilation of CUDA kernel function and PyTorch Schnelle Bodendurchdringung Radar Dual-Parameter Vollwellenform Inversion Methode beschleunigt durch hybride Zusammenstellung von CUDA-Kernel-Funktion und PyTorch 通过混合汇编CUDA内核功能和PyTorch,加速采用快速穿透快速地面雷达双参数双参数全波形反转法 2506.20513v1 -
252 06-25 OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling OctoThinker: Mittleres Training fördert verstärktes Lernen Scaling OctoThinker: 中级培训鼓励加强学习 2506.20512v1 -
253 06-25 Collaborative Batch Size Optimization for Federated Learning Kollaborative Batch-Größenoptimierung für Federated Learning 联邦学习联合会的合作批量数量优化 2506.20511v1 -
254 06-25 LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation LPOSS: Label Propagation über Patches und Pixel für Open-vocabulary Semantic Segmentation LPOSS: 用于开放式词汇语义分解的补丁和像素的标签传播 2503.19777v2 -
255 06-25 Unidentified and Confounded? Understanding Two-Tower Models for Unbiased Learning to Rank Unidentifiziert und verwechselt? Zwei-Tower-Modelle für unvoreingenommenes Lernen verstehen 如何理解两塔式的无偏见学习模式到排名? 2506.20501v1 -
256 06-25 Training Plug-n-Play Knowledge Modules with Deep Context Distillation Training Plug-n-Play Wissensmodule mit Deep Context Destillation 具有深背景蒸馏作用的培训插件-n-玩耍知识模块 2503.08727v3 -
257 06-25 Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging Gut, ich werde es selbst verschmelzen: Ein Multi-Fidelity-Framework für automatisiertes Modellverschmelzen 好吧,我会合并它我自己:一个自动模型合并的多功能框架 2502.04030v2 -
258 06-25 ReCode: Updating Code API Knowledge with Reinforcement Learning ReCode: Aktualisierung von Code-API-Kenntnissen mit Verstärkungslernen ReCode:更新法规API知识与强化学习 2506.20495v1 -
259 06-25 Multimodal Representation Learning and Fusion Multimodales Repräsentationslernen und -fusion 多模式代表性学习和融合 2506.20494v1 -
260 06-25 Non-equilibrium Annealed Adjoint Sampler Nicht-Equilibrium Annealed Adjoint Sampler 非平衡 Annaaled 联合采样器 2506.18165v2 -
261 06-25 Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning Offline-Zielkonditioniertes Verstärkungslernen mit projektiver Quasimetrieplanung 离线目标条件强化学习与射影拟度量规划 2506.18847v2 -
262 06-25 Counterfactual Influence as a Distributional Quantity Gegenfaktischer Einfluss als Verteilungsmenge 分发量的反事实影响 2506.20481v1 -
263 06-25 Graph Linearization Methods for Reasoning on Graphs with Large Language Models Graphische Linearisierungsmethoden zur Begründung von Graphen mit großen Sprachmodellen 用于解释大语言模型图表的线性线性方法 2410.19494v3 -
264 06-25 MARCO: Multi-Agent Code Optimization with Real-Time Knowledge Integration for High-Performance Computing MARCO: Multi-Agent Code-Optimierung mit Echtzeit-Knowledge Integration für High-Performance Computing MARCO: 利用实时知识整合优化多机构代码,促进高绩效计算 2505.03906v3 -
265 06-25 Physics-informed Imitative Reinforcement Learning for Real-world Driving Physik-informiert Imitative Verstärkungs-Lernen für das Fahren in der realen Welt 为现实世界驾驶进行物理知情的模拟强化学习 2407.02508v3 -
266 06-25 HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling HiWave: Schulungsfreie High-Resolution-Bildgenerierung über Wavelet-basierte Diffusions-Sampling Hiwave:通过以波子为基础的传播抽样生成高分辨率图像,无需培训 2506.20452v1 -
267 06-25 Automatic Demonstration Selection for LLM-based Tabular Data Classification Automatische Demonstrationsauswahl für LLM-basierte Tabellendatenklassifikation 以LLM为基础的表格数据分类的自动演示选择 2506.20451v1 -
268 06-25 Image Super-Resolution with Guarantees via Conformalized Generative Models Bild Super-Resolution mit Garantien über konformisierte Generative Modelle 图像超级分辨率,通过正规化创制模型提供保障 2502.09664v2 -
269 06-25 Méthode de quadrature pour les PINNs fondée théoriquement sur la hessienne des résiduels Quadraturmethode für PINNs, theoretisch begründet auf der Hesse-Matrix der Residuen 理论上基于残差海森矩阵的PINN求积方法 2506.20441v1 -
270 06-25 Tackling Data Heterogeneity in Federated Learning through Knowledge Distillation with Inequitable Aggregation Bekämpfung von Daten Heterogenität im Föderierten Lernen durch Wissensdestillation mit unwiderruflicher Aggregation 通过知识蒸馏处理联邦学习中的数据异质性,以不平等的聚合方式进行知识蒸馏 2506.20431v1 -
271 06-25 Scalable Subset Selection in Linear Mixed Models Skalierbare Subset-Auswahl in linearen gemischten Modellen 线性混合模型中可缩放子集选择 2506.20425v1 -
272 06-25 Off-Policy Evaluation and Learning for the Future under Non-Stationarity Off-Policy-Evaluierung und -Lernen für die Zukunft unter Nicht-Stationarität 非政策性评价和在非标准化下学习未来 2506.20417v1 -
273 06-25 No Free Lunch: Rethinking Internal Feedback for LLM Reasoning Kein kostenloses Mittagessen: Internes Feedback für LLM Reasoning neu denken 无免费午餐:重新思考LLM理由解释的内部反馈 2506.17219v2 -
274 06-25 Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning Client Clustering trifft auf Wissensaustausch: Verbesserung der Privatsphäre und Robustheit im personalisierten Peer-to-Peer-Learning 客户端聚类遇上知识共享:增强个性化点对点学习中的隐私性与鲁棒性 2506.20413v1 -
275 06-25 POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes POLAR: Pessimistisches modellbasiertes politisches Lernen Algorithmen für dynamische Behandlungssysteme POLAR: 动态治疗制度基于政策学习模型的悲观模型 2506.20406v1 -
276 06-25 scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection scMamba: Ein skalierbares Foundation-Modell für die Single-Cell-Multi-Omics-Integration jenseits einer sehr variablen Feature-Auswahl scMamba:一个超越高可变地物选择的单细胞多有机集成的可缩放基础模型 2506.20697v1 -
277 06-25 Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking Ausbeuten von leichten Hierarchischen ViT und Dynamic Framework für effizientes visuelles Tracking 利用轻量轻级高压静电和高效视觉跟踪动态框架 2506.20381v1 -
278 06-25 TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis TESSERA: Temporale Einbettungen von Oberflächenspektren für die Darstellung und Analyse der Erde TESSERA:用于地球代表和分析的地平面表面表层实时嵌入 2506.20380v1 -
279 06-25 WyckoffDiff – A Generative Diffusion Model for Crystal Symmetry WyckoffDiff - ein generatives Diffusionsmodell für die Kristallsymmetrie WycccoffDiff – – 水晶对称生成扩散模型 2502.06485v3 -
280 06-25 Chemical knowledge-informed framework for privacy-aware retrosynthesis learning Chemischer wissensbasierter Rahmen für datenschutzbewusstes Retrosynthese-Lernen 以化学知识为基础的隐私意识复后学习框架 2502.19119v2 -
281 06-25 InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking InvZW: Invariantes Feature-Lernen über Lärm-Adversarial-Training für robuste Bild-Null-Wasser-Markierung InvZW:通过对强力图像零水标记的噪音 – – Adversarial培训进行不易变地物学习 2506.20370v1 -
282 06-25 A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges Eine Umfrage zum Erklärbaren Verstärkungslernen: Konzepte, Algorithmen, Herausforderungen 关于 “ 可解释的强化学习调查:概念、等级、挑战 “ 的调查 2211.06665v5 -
283 06-25 Self-Supervised Graph Learning via Spectral Bootstrapping and Laplacian-Based Augmentations Selbstüberwachtes Graphenlernen über Spektrale Bootstrapping- und Laplacian-basierte Augmentationen 通过光谱推进和拉平板辅助和拉平板辅助增强学习自摄图像 2506.20362v1 -
284 06-25 Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach Auf dem Weg zu einer interpretierbaren und effizienten Feature-Auswahl in Trajektori-Datensätzen: Ein taxonomischer Ansatz 走向在轨迹数据集中进行解释和高效地物选择:分类学方法 2506.20359v1 -
285 06-25 A foundation model with multi-variate parallel attention to generate neuronal activity Ein Fundamentmodell mit multivariater paralleler Aufmerksamkeit zur Generierung neuronaler Aktivität 具有多变量平行关注以产生神经活动的基础模型 2506.20354v1 -
286 06-25 Backpropagation Through Time For Networks With Long-Term Dependencies Backpropagation durch die Zeit für Netzwerke mit langfristigen Abhängigkeiten 长期依赖网络在时间上反向通信 2103.15589v3 -
287 06-25 DipSVD: Dual-importance Protected SVD for Efficient LLM Compression DipSVD: Dual-Importance Protected SVD für effiziente LLM-Kompression DipSVD: 用于高效LLM压缩的双重重要性保护SVD 2506.20353v1 -
288 06-25 It’s not you, it’s me – Global urban visual perception varies across demographics and personalities Es sind nicht Sie, es bin ich – Die globale urbane visuelle Wahrnehmung variiert zwischen demographischen und Persönlichkeiten 不是你,是我,全球城市的视觉感知 不同人口和个性的不同 2505.12758v2 -
289 06-25 On the ability of Deep Neural Networks to Learn Granger Causality in Multi-Variate Time Series Data Über die Fähigkeit von Deep Neural Networks, Granger-Causalität in mehrstufigen Zeitreihendaten zu lernen 关于深神经网络在多变时间序列数据中学习重力原因的能力 2506.20347v1 -
290 06-25 Signatures of planets and Galactic subpopulations in solar analogs. Precise chemical abundances with neural networks Signaturen von Planeten und galaktischen Subpopulationen in solaren Analogen. Präzise chemische Fülle mit neuronalen Netzwerken 太阳模拟物中行星和银河子人口组的签名; 具有神经网络的精密化学丰度 2506.20345v1 -
291 06-25 A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization Eine vollständige Verlustlandschaftsanalyse der Regularisierten Tiefenmatrix-Fabrikierung 对正规化深母体因子化的全损全损地貌分析 2506.20344v1 -
292 06-25 Feature Hallucination for Self-supervised Action Recognition Feature Halluzination für die Selbstüberwachte Aktionserkennung 自我监督行动承认的幻觉 2506.20342v1 -
293 06-25 Recurrent neural network-based robust control systems with closed-loop regional incremental ISS and application to MPC design Recurrent neuronale netzwerkbasierte robuste Steuerungssysteme mit geschlossener, regionaler Inkrementelle ISS und Anwendung in MPC-Design 具有闭环区域增量ISS的基于循环神经网络的鲁棒控制系统及其在MPC设计中的应用 2506.20334v1 -
294 06-25 Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content Biomed-Enriched: Ein mit LLMs angereicherter biomedizinischer Datensatz für das Pretraining und die Extraktion seltener und versteckter Inhalte Biomed-Enriched: 利用LLM增强的生物医学数据集,用于预训练及提取稀有和隐藏内容 2506.20331v1 -
295 06-25 Representation Learning with Parameterised Quantum Circuits for Advancing Speech Emotion Recognition Representatives Lernen mit parameterisierten Quantenkreisen zur Förderung der Sprachemotionserkennung 代表制学习,与推进言语情感识别参数量子电路进行代表制学习 2501.12050v3 -
296 06-25 Producer-Fairness in Sequential Bundle Recommendation Hersteller-Fairness in Sequential Bundle Empfehlung 序套件建议中的生产者公平 2506.20329v1 -
297 06-25 Permutation Equivariant Neural Controlled Differential Equations for Dynamic Graph Representation Learning Permutation Gleichwertige neural gesteuerte Differentialgleichungen für dynamisches Graphendarstellungslernen 用于动态图表代表性学习的等同神经控制的异异性神经控制差异等量 2506.20324v1 -
298 06-25 Comparative Analysis of Deep Learning Models for Crop Disease Detection: A Transfer Learning Approach Vergleichende Analyse von Deep-Learning-Modellen zur Erkennung von Crop Disease: Ein Transfer-Learning-Ansatz 作物疾病检测深学习模型的比较分析:转让学习方法 2506.20323v1 -
299 06-25 Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning Konfuzius3-Math: Leichtes Hochleistungs-LLM für das chinesische K-12 Mathematik-Lernen Confucius3-Math:面向中国 K-12 数学学习的轻量级高性能推理 LLM 2506.18330v2 -
300 06-25 BINDy – Bayesian identification of nonlinear dynamics with reversible-jump Markov-chain Monte-Carlo BINDy – Bayesische Identifikation von nichtlinearen Dynamiken mit reversiblem Sprung Markov-Kette Monte-Carlo BINDy:基于可逆跳跃马尔可夫链蒙特卡洛的非线性动力学贝叶斯识别 2408.08062v3 -
301 06-25 Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration Beyond-Expert Performance mit begrenzten Demonstrationen: Effizientes Imitationslernen mit doppelter Exploration 具有有限演示的超出专家的超专业性能:高效的双重探索模拟学习 2506.20307v1 -
302 06-25 Learning Moderately Input-Sensitive Functions: A Case Study in QR Code Decoding Moderate Input-Sensitive Funktionen lernen: Eine Fallstudie in QR-Code-Dekodierung 学习中度投入-敏感性功能:QR编码编码的案例研究 2506.20305v1 -
303 06-25 Bilinear MLPs enable weight-based mechanistic interpretability Bilineare MLPs ermöglichen gewichtsbasierte mechanistische Interpretationsfähigkeit 双线MLPs使基于重量的机械机械解释能力得以实现 2410.08417v2 -
304 06-25 Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning Graph-Assistente Stiche für Offline-Hierarchisches Verstärkungslernen 离线高层强化学习的图表辅助细化 2506.07744v2 -
305 06-25 OLALa: Online Learned Adaptive Lattice Codes for Heterogeneous Federated Learning OLala: Online gelernte adaptive Gittercodes für heterogenes Federated Learning OLALA: 异质联邦学习在线知识适应性拉蒂码 2506.20297v1 -
306 06-25 Provably Improving Generalization of Few-Shot Models with Synthetic Data Wahrscheinliche Verbesserung der Verallgemeinerung von wenigen scharfen Modellen mit synthetischen Daten 改进利用合成数据普及微小热模型及合成数据 2505.24190v2 -
307 06-25 Flexible Infinite-Width Graph Convolutional Neural Networks Flexible Infinite-Width Graph Convolutional Neural Networks 灵活的无限线-线形图进化神经神经网络 2402.06525v2 -
308 06-25 Efficient uniform approximation using Random Vector Functional Link networks Effiziente einheitliche Annäherung mit Random Vector Functional Link-Netzwerken 使用随机矢量功能链接网络的有效统一近似 2306.17501v2 -
309 06-25 Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo Lösen von linear-gaussischen inversen Problemen mit entkoppelter Diffusion Sequential Monte Carlo 解决线性 – – 高加索-巴伊西亚州脱相扩散的反向问题 2502.06379v2 -
310 06-25 Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective Über topologische selbsterklärbare GNNs hinaus: Eine formale Erklärbarkeitsperspektive 超越地形学的自我自我解释的GNNs:正式解释的视角 2502.02719v2 -
311 06-25 Distilling A Universal Expert from Clustered Federated Learning Destillieren eines universellen Experten aus clustered Federated Learning 一名来自分组联邦学习的通用专家 2506.20285v1 -
312 06-25 Forensic Study of Paintings Through the Comparison of Fabrics Forensische Studie von Gemälden durch den Vergleich von Stoffen 比较印刷品法证研究 2506.20272v1 -
313 06-25 X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnosis X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnose XSIT:痴呆症诊断的内在解释式地表视野变异器 2506.20267v1 -
314 06-25 3D variational autoencoder for fingerprinting microstructure volume elements 3D-Variations-Autoencoder für die Fingerabdruck-Mikrostruktur-Volume-Elemente 用于指纹微结构体积元素的 3D 变式自动编码器 2503.17427v3 -
315 06-25 Exploration-Exploitation Tradeoff in Universal Lossy Compression Explorations-Exploitation-Tradeoff bei universeller Lossy-Kompression 普遍损失压缩中探索-探索-探索-开发权衡 2506.20261v1 -
316 06-25 Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders Feintuning-Maschine-erlernte Partikelstrom-Rekonstruktion für neue Detektorgeometrien in zukünftigen Kollidern 微调机了解粒子流重建,以在未来相撞器中进行新探测器的地形 2503.00131v4 -
317 06-25 Argumentative Ensembling for Robust Recourse under Model Multiplicity Argumentatives Zusammenbauen für robusten Rücklauf unter Modellvielfalt 多种模式下强力利用的参数组合 2506.20260v1 -
318 06-25 A Transformer Based Handwriting Recognition System Jointly Using Online and Offline Features Ein transformerbasiertes Handschrifterkennungssystem, das Online- und Offline-Funktionen verwendet 联合使用在线和离线特点的基于变换手写识别系统 2506.20255v1 -
319 06-25 Time-series surrogates from energy consumers generated by machine learning approaches for long-term forecasting scenarios Zeitreihen von Energieverbrauchern, die durch maschinelle Lernansätze für langfristige Prognoseszenarien erzeugt werden 长期预测设想情景的机器学习方法产生的能源消费者代用时间序列代用 2506.20253v1 -
320 06-25 Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models Q-resafe: Bewertung von Sicherheitsrisiken und Quantization-aware Sicherheits-Patching für Quantized Large Language Models Q-安全:评估安全风险和量化大语言模式量化大语言模型的量化安全补丁 2506.20251v1 -
321 06-25 FedBKD: Distilled Federated Learning to Embrace Gerneralization and Personalization on Non-IID Data FedBKD: Distilled Federated Learning to Embrace Gerneralization and Personalization on Non-IID Data FDBKD: 精化的联邦学习学习,以接受非二二二二二维数据方面的全球化和个性化 2506.20245v1 -
322 06-25 Dual-Channel Multiplex Graph Neural Networks for Recommendation Dual-Channel Multiplex Graph Neuronale Netzwerke zur Empfehlung 供建议用的双声道多气多气图神经网络 2403.11624v5 -
323 06-25 Directed Link Prediction using GNN with Local and Global Feature Fusion Direkte Link-Vorhersage mit GNN mit lokaler und globaler Feature-Fusion 使用GNN与本地和全球地貌融合的GNN进行直接链接预测 2506.20235v1 -
324 06-25 E-ABIN: an Explainable module for Anomaly detection in BIological Networks E-ABIN: ein erklärbares Modul zur Anomalieerkennung in BIologischen Netzwerken E-ABIN:生物网络异常检测可解释模块 2506.20693v1 -
325 06-25 Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems 通过相互作用粒子系统逐步自由序列的巴伊西亚实验设计 2504.13320v2 -
326 06-25 SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling SLEEPING-DISCO 9M: Ein großformatiger Vortrainings-Datensatz für generative Musikmodellierung SLEPING-DISCO 9M:用于基因音乐建模的大规模培训前数据集 2506.14293v3 -
327 06-25 Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast Unterstützung der Planung und des Betriebs erneuerbarer Energien mit datengetriebener Hochauflösungs-Ensemble-Wettervorhersage 以数据驱动的高分辨率集合天气预报支持可再生能源规划和运作 2505.04396v2 -
328 06-25 MS-TVNet:A Long-Term Time Series Prediction Method Based on Multi-Scale Dynamic Convolution MS-TVNet:Eine Langzeit-Zeitreihenvorhersagemethode auf der Grundlage multi-Scale Dynamic Convolution MS-TVNet:基于多空间动态演变的长期时间序列预测方法 2506.17253v2 -
329 06-25 Curved representational Bregman divergences and their applications Gebogene Repräsentationsdivergenzen von Bregman und deren Anwendungen 曲线代表布列格曼差异及其应用 2504.05654v2 -
330 06-25 Affective Priming Score: A Data-Driven Method to Detect Priming in Sequential Datasets Affektiver Priming-Score: Eine datengetriebene Methode, um Priming in Sequenzdatensätzen zu erkennen 情感原始分数:在序列数据集中检测原始数据的数据驱动方法 2506.20204v1 -
331 06-25 Zero-Shot Attribution for Large Language Models: A Distribution Testing Approach Zero-Shot Attribution für große Sprachmodelle: Ein Distributionstestverfahren 大语言模式零点位数:分销测试方法 2506.20197v1 -
332 06-25 DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs DuoGPT: Training-freie Dual Sparsity durch Aktivierungs-bewusstes Pruning in LLMs DuoGPT:通过在LLM中采取积极-有意识的节制措施,实现无培训的双重平等 2506.20194v1 -
333 06-25 IKDiffuser: A Generative Inverse Kinematics Solver for Multi-arm Robots via Diffusion Model IKDiffuser: Ein generatives Inverse Kinematik-Lösemittel für Multiarm-Roboter über Diffusionsmodell IKDiffuser: 通过扩散模型为多武器机器人制造的生成反反反虚拟解答器 2506.13087v3 -
334 06-25 Causal Operator Discovery in Partial Differential Equations via Counterfactual Physics-Informed Neural Networks Causal Operator Discovery in Partial Differential Equations via Counterfactual Physics-informed Neural Networks 通过反事实物理内成神经网络在部分差别中发现因果操作器 2506.20181v1 -
335 06-25 COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees COIN: Ungewissheitssicherung Selektive Frage-Beantwortung für Stiftungsmodelle mit wahrscheinlichen Risikogarantien COIN: 可靠风险保障基础模型的不确定性保护选择性问题选择性回答 2506.20178v1 -
336 06-25 Valid Selection among Conformal Sets Gültige Auswahl unter konformen Sets 在套件中有效选择 2506.20173v1 -
337 06-25 Causal discovery in deterministic discrete LTI-DAE systems Kausale Entdeckung in deterministischen diskreten LTI-DAE-Systemen LTI-DAE系统中决定性离散离散系统中的因果发现 2506.20169v1 -
338 06-25 Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes Aktives Lernen von tiefen neuralen Netzwerken durch gradient-free Schneidplanen 通过无梯度断层计划积极学习深神经网络 2410.02145v5 -
339 06-25 Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners Belohnender Graph Reasoning Prozess macht LLMs mehr Generalized Reasoners 奖励图表说明程序使LLMs公司更普遍化理由 2503.00845v2 -
340 06-25 Counterfactual Fairness through Transforming Data Orthogonal to Bias Counterfactual Fairness durch Umwandlung von Daten Orthogonal zu Bias 通过将数据正正向转换为比亚斯来反事实公平 2403.17852v3 -
341 06-25 Accept More, Reject Less: Reducing up to 19% Unnecessary Desk-Rejections over 11 Years of ICLR Data Mehr akzeptieren, weniger ablehnen: bis zu 19% reduzieren Unnötige Desk-Abweisungen über 11 Jahre ICLR-Daten 接受更多,拒绝减:在11年的ICLR数据中,将不必要的书面拒绝减少19% 2506.20141v1 -
342 06-25 Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis Stückweise lineare Annäherung in Lernindexstrukturen: Theoretische und empirische Analyse 进化指数结构的细线近似:理论和经验分析 2506.20139v1 -
343 06-25 TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis TSPulse: Dual Space Tiny Pre-Trained Modelle für die schnelle Zeit-Serien-Analyse TSPulse: 快速时序分析的双重空间细细件前培训模型 2505.13033v2 -
344 06-25 High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data High-Resolution Live Fuel Moisture Content (LFMC) Karten für Wildfire-Risiko aus multimodalen Erdbeobachtungsdaten 多式地球观测数据产生的野火风险高分辨率活燃料动力内容地图 2506.20132v1 -
345 06-25 Log-Linear Attention Log-Linear-Achtung 日志边注意 2506.04761v2 -
346 06-25 CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation CCRS: Ein Null-Shot LLM-as-a-Richter-Rahmen für eine umfassende RAG-Bewertung CCRS: 全面RAG综合评价框架 2506.20128v1 -
347 06-25 Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts Bewertung der Verallgemeinerung und Vertretungsstabilität in kleinen LMs durch Prompting, Fine-Tuning und Out-of-Distribution Prompts 通过提示、罚款和销售外提示评估小型液流中小液流中普遍化和代表性稳定情况 2506.17289v2 -
348 06-25 Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests Einsatz von KI-Gradern für fehlende Score-Imputation, um eine genaue Abschätzung der Fähigkeit in konstruierten Reaktionstests zu erreichen 利用AI 级数来计算缺失计分数,以在建构反应测试中实现准确能力估算 2506.20119v1 -
349 06-25 U-R-VEDA: Integrating UNET, Residual Links, Edge and Dual Attention, and Vision Transformer for Accurate Semantic Segmentation of CMRs U-R-VEDA: Integration von UNET, Residual Links, Edge und Dual Attention und Vision Transformer für präzise semantische Segmentierung von CMRs U-R-VEDA:将UNET、残余链接、边缘和双重关注以及遗留集束弹药准确的语义分割的愿景变异器结合起来 2506.20689v1 -
350 06-25 Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives Extrahieren von interpretierbaren Modellen aus Baumensembles: Computational and Statistical Perspectives 从树形集合中提取解释模型:计算和统计视角 2506.20114v1 -
351 06-25 Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox Autonome Cyber-Resilienz durch ein Co-Evolutionäres Waffenrennen innerhalb einer verstärkten digitalen Twin Sandbox 通过在强化数字双沙箱内共同推进的军备竞赛实现自动网络复原力 2506.20102v1 -
352 06-25 What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning Was in LLM-generierten Daten zählt: Vielfalt und ihre Wirkung auf Modell Feintuning LLM产生的数据中哪些重要:多样性及其对模拟微调的影响 2506.19262v2 -
353 06-25 MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations MIRAGE: Benchmark für multimodale Informationssuche und -vernunft in sachverständigen Gesprächen in der Landwirtschaft MIRAGE:农业专家指导下的农业多模式信息查找和说明理由基准 2506.20100v1 -
354 06-25 BeltCrack: the First Sequential-image Industrial Conveyor Belt Crack Detection Dataset and Its Baseline with Triple-domain Feature Learning BeltCrack: Der erste Sequential-Image-Industrie-Förderband Crack Detection Datensatz und seine Basis mit Triple-Domain Feature Learning BeltCrack:第一个序列图像工业相像式工业电容器带裂缝探测数据集及其基线,包括三域主文学习 2506.17892v2 -
355 06-25 Fine-Grained Perturbation Guidance via Attention Head Selection Feinkörnige Störungsführung über Aufmerksamkeitskopfauswahl 通过 “ 关注负责人甄选 “ 指导 2506.10978v2 -
356 06-25 MEL: Multi-level Ensemble Learning for Resource-Constrained Environments MEL: Multi-Level-Ensemble-Lernen für ressourcenbeschränkte Umgebungen MEL:为受资源制约的环境进行多层次连锁学习 2506.20094v1 -
357 06-25 Understanding World or Predicting Future? A Comprehensive Survey of World Models Welt verstehen oder Zukunft voraussagen? Eine umfassende Übersicht über Weltmodelle 了解世界或预测未来?世界模式综合概览 2411.14499v2 -
358 06-25 A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression Eine Übersicht über Predictive Maintenance Methods: Eine Analyse der Prognostik durch Klassifizierung und Regression 预测维护方法调查:通过分类和递减分析预测指标 2506.20090v1 -
359 06-25 Attack Smarter: Attention-Driven Fine-Grained Webpage Fingerprinting Attacks Attack Smarter: aufmerksamkeitsgetriebene feinkörnige Webseiten-Fingerprinting-Angriffe 攻击智能:引人注意的精美网页指纹印攻击 2506.20082v1 -
360 06-25 Federated Learning Clients Clustering with Adaptation to Data Drifts Federated Learning Clients Clustering mit Anpassung an Daten Drifts 采用适应数据流数据组合组合的联邦学习客户 2411.01580v2 -
361 06-25 Quantum-Classical Hybrid Quantized Neural Network Quantum-klassische Hybride Quantisiertes Neuronales Netzwerk 量子-经典混合量化神经网络 2506.18240v2 -
362 06-25 mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks mSTEB: Massive mehrsprachige Bewertung von LLMs zu Sprach- und Textaufgaben mSTEB: 对关于发言和文本任务LLM女士进行大规模多语种评价 2506.08400v2 -
363 06-25 A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs Ein modulares Multitask-Reasoning-Framework Integrating Spatio-temporal Models und LLMs 纳入时空空间模型和LLMs的模块多任务解释框架 2506.20073v1 -
364 06-25 Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges Leichte Fußgängererkennung in Sicht- und Infrarotbild-Feeds: Probleme und Herausforderungen 可见和红外图像输入中的低亮害虫探测:问题和挑战 2311.08557v3 -
365 06-25 Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision Multimodale Informationen Retrieval für offene Welt mit Edit Distanz Schwache Überwachung 编辑远程弱力监督的开放世界多模式信息检索器 2506.20070v1 -
366 06-25 Thought Anchors: Which LLM Reasoning Steps Matter? Thought Anchors: Welche LLM-Gründungsschritte sind wichtig? 何为理据步骤? 2506.19143v2 -
367 06-25 Conformal Prediction with Upper and Lower Bound Models Konforme Vorhersage mit oberen und unteren Bound-Modellen 与上下下两界模型的非正规预测 2503.04071v2 -
368 06-24 (2) Identifying Heterogeneity in Distributed Learning Heterogenität im verteilten Lernen identifizieren 确定分布式学习中的差异性 2506.16394v3 -
369 06-24 Supervised Coupled Matrix-Tensor Factorization (SCMTF) for Computational Phenotyping of Patient Reported Outcomes in Ulcerative Colitis Überwachte gekoppelte Matrix-Tensor-Faktorisierung (SCMTF) für Computational Phenotyping von Patienten berichteten Ergebnisse bei Ulcerative Colitis 用于溃疡性结肠炎患者报告结果计算表型分析的有监督耦合矩阵-张量分解(SCMTF) 2506.20065v1 -
370 06-24 Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models Lernen von Instruction-Following-Richtlinien durch offenes Instruction-Relabeling mit großen Sprachmodellen 通过不限名额指令与大语言模式重新标签 2506.20061v1 -
371 06-24 The Alignment Trap: Complexity Barriers Die Alignment-Falle: Komplexitätsbarrieren 协调陷阱:复杂障碍 2506.10304v2 -
372 06-24 Universal pre-training by iterated random computation Universelles Pre-Training durch iterierte Zufallsberechnung 通过迭代随机计算进行通用预培训 2506.20057v1 -
373 06-24 Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization Machine-Learning-Assisted Photonic Device Development: Ein multiskaliger Ansatz von der Theorie zur Charakterisierung 机学辅助光学设备开发:从理论到定性的多尺度方法 2506.20056v1 -
374 06-24 MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models MegaFold: System-Level-Optimierungen zur Beschleunigung von Proteinstruktur-Vorhersagemodellen MegaFold:加速蛋白质结构结构预测模型的全系统优化 2506.20686v1 -
375 06-24 A Principled Path to Fitted Distributional Evaluation Ein prinzipieller Weg zur integrierten Verteilungsevaluierung 合格分配评价的一条原则性道路 2506.20048v1 -
376 06-24 GNN’s Uncertainty Quantification using Self-Distillation Die Unbestimmtheitsquantifizierung von GNN mittels Selbstdestillation GNN 使用自处理法对不确定性进行量化 2506.20046v1 -
377 06-24 PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning PocketVina ermöglicht skalierbare und hochgenaue physikalisch gültige Docking durch Multi-Pocket-Konditionierung PocketVina 通过多盘附加条件, 使可缩放和高度精确的物理有效折叠 2506.20043v1 -
378 06-24 LSH-DynED: A Dynamic Ensemble Framework with LSH-Based Undersampling for Evolving Multi-Class Imbalanced Classification LSH-DynED: Ein dynamisches Ensemble-Framework mit LSH-basierter Unterprobe für die Evolving-Multi-Class-Unausgeglichene Klassifizierung LSH-Dyned:一个动态组合框架,以基于LSH的下层取样为基础,用于不断演化的多类综合分类 2506.20041v1 -
379 06-24 Cross-Layer Discrete Concept Discovery for Interpreting Language Models Cross-Layer Discrete Concept Discovery für Interpretationssprachmodelle 解释语言模型的跨语言监听概念发现 2506.20040v1 -
380 06-24 Learning Bilateral Team Formation in Cooperative Multi-Agent Reinforcement Learning Bilaterale Teambildung im kooperativen Multi-Agenten-Verstärkungs-Lernen lernen 合作多机构加强合作学习双边学习小组 2506.20039v1 -
381 06-24 Verifiable Unlearning on Edge Überprüfbares Lernen am Rande 边缘不可核实的学习 2506.20037v1 -
382 06-24 Neural network-based Godunov corrections for approximate Riemann solvers using bi-fidelity learning Neurale netzwerkbasierte Godunov-Korrekturen für ungefähre Riemann-Löser mit Bi-Fidelity-Lernen 近似Riemann的Riemann解决者使用双性忠诚学习校正 2503.13248v2 -
383 06-24 Automated Generation of Diverse Courses of Actions for Multi-Agent Operations using Binary Optimization and Graph Learning Automatisierte Generierung von vielfältigen Handlungskursen für Multi-Agenten-Betriebe mit Binäroptimierung und Graphen-Lernen 利用二进制优化和图表学习,自动产生多种多机构业务行动多样化行动方案 2506.20031v1 -
384 06-24 Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining Daumen auf der Waage: Optimaler Verlustgewichtung in Last Layer Retraining 缩放缩略图: 上层再训练中的最佳损耗 2506.20025v1 -
385 06-24 Evaluating Long Range Dependency Handling in Code Generation LLMs Bewertung der Langzeitabhängigkeitsbehandlung in LLMs der Code-Generation 评估代码生成中的长期依赖性处理 2407.21049v2 -
386 06-24 Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting Erklärte Rollendiffusionsmodelle für probabilistische Wettervorhersagen 预测概率天气预测的显学滚滚传播模型 2506.20024v1 -
387 06-24 DIM-SUM: Dynamic IMputation for Smart Utility Management DIM-SUM: Dynamische Imputation für intelligentes Utility Management DIM-SUM: 智能工具管理动态数字 2506.20023v1 -
388 06-24 New Insights on Unfolding and Fine-tuning Quantum Federated Learning Neue Erkenntnisse zum Entfalten und Feintuning Quantum-Federated Learning 新《关于不增加和微调量量子联邦学习的新观点》 2506.20016v1 -
389 06-24 Towards Better Benchmark Datasets for Inductive Knowledge Graph Completion Auf dem Weg zu besseren Benchmark-Datensätzen für induktive Wissensgraphenvervollständigung 建立更好的基准数据集,以完成引入知识图的完成 2406.11898v3 -
390 06-24 Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons Neuromorphes drahtloses Split Computing mit Resonanz-und-Feuer-Neuronen 神经无线神经无线分裂计算,有共振和火灾中中子 2506.20015v1 -
391 06-24 DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation DRO-Augment Framework: Robustheit durch Synergisieren Wasserstein distributiv robust Optimierung und Datenvergrößerung DRO-Augment 框架:通过协同 Wasserstein 分布鲁棒优化与数据增强实现鲁棒性 2506.17874v2 -
392 06-24 Scalable Machine Learning Algorithms using Path Signatures Skalierbare maschinelle Lernalgorithmen mit Pfadsignaturen 使用路径签名缩放机器学习算法 2506.17634v2 -
393 06-24 Can One Safety Loop Guard Them All? Agentic Guard Rails for Federated Computing Kann ein Sicherheitsschlaufe Guard sie alle? Agentic Guard Rails für Federated Computing 一个安全环圈能保护全部吗? 2506.20000v1 -
394 06-24 A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading Behavior Ein Spatio-Temporal-Punkt-Verfahren zur feinkörnigen Modellierung des Leseverhaltens 阅读行为精细模拟模型的斯帕迪奥时点进程 2506.19999v1 -
395 06-24 In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory In-Context Learning for Gradient-Free Receiver Adaptation: Prinzipien, Anwendungen und Theorie 逐步免费接收者适应:原则、应用和理论 2506.15176v2 -
396 06-24 TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design TRACED: Transition-aware regret Annäherung mit Mitlernbarkeit für Umweltdesign TRACEED: 环境设计中具有共负环境设计共负作用的过渡-意识到遗憾相近情况 2506.19997v1 -
397 06-24 CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems CoVE: Komprimierte Vokabelerweiterung macht LLM-basierte Recommender-Systeme besser COVE:压缩的词汇扩充使基于LLM的推荐系统更好 2506.19993v1 -
398 06-24 HERCULES: Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization HERCULES: Hierarchische Einbettung von rekursiven Clustern mit LLMs für eine effiziente Zusammenfassung HERCULES:利用LLMs高效汇总法,基于等级嵌入式嵌入式递递性集群 2506.19992v1 -
399 06-24 Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems Follow-the-Perturbed-Leader nähert sich Best-of-Both-Worlds für die m-Set Semi-Bandit-Probleme M-Set半银行问题最佳世界最佳办法 2504.07307v3 -
400 06-24 MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel MaizeField3D: Ein kuratierter 3D-Punkt-Cloud- und Verfahrensmodell-Datensatz von Feld-Grown Maize aus einem Diversity-Panel Maize Fire3D:来自多样性小组的3D点云和实地增长磁场程序模型数据集 2503.07813v2 -
401 06-24 Proofs as Explanations: Short Certificates for Reliable Predictions Beweise als Erklärungen: Kurze Zertifikate für zuverlässige Vorhersagen 作为解释的证明:可靠预测的短期证明 2504.08377v3 -
402 06-24 FORTRESS: Frontier Risk Evaluation for National Security and Public Safety FORTRESS: Frontier Risk Evaluation für nationale Sicherheit und öffentliche Sicherheit FORTRES:国家安全和公共安全的边界风险评估 2506.14922v2 -
403 06-24 MAIZX: A Carbon-Aware Framework for Optimizing Cloud Computing Emissions MAIZX: Ein Carbon-Aware-Framework zur Optimierung von Cloud-Computing-Emissionen MAIZX:优化云计算排放的碳软件框架 2506.19972v1 -
404 06-24 COBRA-PPM: A Causal Bayesian Reasoning Architecture Using Probabilistic Programming for Robot Manipulation Under Uncertainty COBRA-PPM: Eine kausale Bayesian-Reasoning-Architektur mit probabilistischer Programmierung für Robotermanipulation unter Unsicherheit COBRA-PPM: 在不确定性下对机器人操纵进行概率程序设计 2403.14488v3 -
405 06-24 Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models Fuzz-Testing trifft auf LLM-basierte Agenten: Ein automatisiertes und effizientes Framework für Jailbreaking-Text-to-Image-Generationsmodelle 以LLM为根据的代理物:一个自动有效的框架,用于制作监狱破译文本到图像制作模型。 2408.00523v3 -
406 06-24 Protein Structure Tokenization: Benchmarking and New Recipe Proteinstruktur Tokenization: Benchmarking und neues Rezept 蛋白质结构化:基准和新食谱 2503.00089v2 -
407 06-24 Progressive Size-Adaptive Federated Learning: A Comprehensive Framework for Heterogeneous Multi-Modal Data Systems Progressives Size-Adaptive-Federated Learning: Ein umfassender Rahmen für heterogene multimodale Datensysteme 渐进式规模-成熟型联邦学习:多种模式数据系统综合框架 2506.20685v1 -
408 06-24 SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models SA-Solver: Stochastischer Adams Solver für schnelle Probenahme von Diffusionsmodellen SA-Solver:用于快速采样扩散模型的蒸汽器溶解器 2309.05019v3 -
409 06-24 MILAAP: Mobile Link Allocation via Attention-based Prediction MILAAP: Mobile Link Allocation über aufmerksamkeitsbasierte Vorhersage MILAAP:通过基于关注的预测分配移动链接 2506.19947v1 -
410 06-24 Data-Driven Dynamic Factor Modeling via Manifold Learning Datengetriebene Dynamische Faktormodellierung über Manifold Learning 数据驱动动态因子通过 MManiple Learning 学习模式建模 2506.19945v1 -
411 06-24 LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps LLM-Wassermarkierung mit Mischungen und statistischen bis rechnerischen Lücken 利用混合体和统计-计算差距的 LLM 水印 2505.01484v2 -
412 06-24 The Most Important Features in Generalized Additive Models Might Be Groups of Features Die wichtigsten Merkmale in generalisierten additiven Modellen könnten Gruppen von Funktionen sein 通用Additive模型中最重要的地物可能是地物群 2506.19937v1 -
413 06-24 Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture Jede Bestellung GPT als Maskierte Diffusion Modell: Entkopplung Formulierung und Architektur 任何指令 GPT , 以遮蔽扩散模型: 脱钩制成和结构 2506.19935v1 -
414 06-24 C-Learner: Constrained Learning for Causal Inference C-Learner: Eingeschränktes Lernen für kausale Schlussfolgerung C-Learner: 控制学习以诱因推断 2405.09493v4 -
415 06-24 Anomaly Detection and Radio-frequency Interference Classification with Unsupervised Learning in Narrowband Radio Technosignature Searches Anomalieerkennung und Hochfrequenz-Interferenzklassifikation mit unüberwachtem Lernen in Schmalband-Radio-Technosignatur-Suchen 在窄带无线电技术签名搜索中进行无监督学习的异常探测和无线电频率干扰分类 2411.16556v2 -
416 06-24 A Comparative Analysis of Reinforcement Learning and Conventional Deep Learning Approaches for Bearing Fault Diagnosis Eine vergleichende Analyse des Verstärkungslernens und konventioneller Deep-Learning-Ansätze zur Fault-Diagnose 强化学习和遗留过失诊断常规深习方法比较分析 2506.19929v1 -
417 06-24 Prover Agent: An Agent-based Framework for Formal Mathematical Proofs Prover Agent: Ein agentenbasiertes Framework für formale mathematische Nachweise 以代理人为基础的正式数学证明框架 2506.19923v1 -
418 06-24 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation Radiale Aufmerksamkeit: $O(n\log n)$ Sparse Achtung mit Energieverlust für lange Video-Generation 径向注意力:用于长视频生成的具有能量衰减的 $O(n\log n)$ 稀疏注意力 2506.19852v1 -
419 06-24 Orthogonal Finetuning Made Scalable Orthogonale Feinsteuerung aus skalierbarem Material 可缩放 2506.19847v1 -
420 06-24 A Comparative Study of NAFNet Baselines for Image Restoration Eine vergleichende Studie von NAFNet Baselines für die Bildrestaurierung NAFNet图像恢复基线比较研究 2506.19845v1 -
421 06-24 Convergence of Mean Shift Algorithms for Large Bandwidths and Simultaneous Accurate Clustering Konvergenz von mittleren Shift-Algorithmen für große Bandbreiten und simultanes präzises Clustering 大型带宽和同声精密集束中 平均移动比值的趋同 2506.19837v1 -
422 06-24 Machine Learning with Privacy for Protected Attributes Maschinelles Lernen mit Datenschutz für geschützte Attribute 带有受保护属性隐私的机器学习 2506.19836v1 -
423 06-24 Inferring Higher-Order Couplings with Neural Networks Rückschlüsse auf höhere Auftragskoppelungen mit neuralen Netzen 与神经网络连接 2501.06108v3 -
424 06-24 A standard transformer and attention with linear biases for molecular conformer generation Ein Standardtransformator und Aufmerksamkeit mit linearen Vorspannungen für die molekulare Konformergeneration 标准的变压器和对分子相配器生成具有线性偏偏的注意 2506.19834v1 -
425 06-24 Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential Fourier-Multi-Komponente und Multi-Layer-Neural-Netzwerke: Entsperren von Hochfrequenzpotenzialen Fariier多功能多功能多轨道神经网络:释放高功能潜能 2502.18959v2 -
426 06-24 Scaling Speculative Decoding with Lookahead Reasoning Spekulative Dekodierung mit Blick auf die Vernunft skalieren 带有 “ 眼前 “ 理由的 投机替代 2506.19830v1 -
427 06-24 Persona Features Control Emergent Misalignment Persona Eigenschaften Kontrolle Emergent Fehlausrichtung 人文特征控制 2506.19823v1 -
428 06-24 ProxelGen: Generating Proteins as 3D Densities ProxelGen: Proteine als 3D-Dichte generieren ProxelGen: 将蛋白质生成为 3D 密度 2506.19820v1 -
429 06-24 Model-Based Exploration in Monitored Markov Decision Processes Modellbasierte Exploration in überwachten Markov-Entscheidungsprozessen 在监测的Markov决策过程中进行基于模型的探索 2502.16772v5 -
430 06-24 Curating art exhibitions using machine learning Kunstausstellungen mit maschinellem Lernen kuratieren 利用机器学习,举办美术展览 2506.19813v1 -
431 06-24 Ambiguous Online Learning Vielfältiges Online-Lernen 模糊的在线学习 2506.19810v1 -
432 06-24 KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality KnowRL: Erforschendes Wissenswertes Verstärktes Lernen für die Realität KnowRL:探索知识强化学习促进事实质量 2506.19807v1 -
433 06-24 First-Passage Approach to Optimizing Perturbations for Improved Training of Machine Learning Models First-Passage-Ansatz zur Optimierung von Störungen für verbessertes Training von Machine Learning-Modellen 优化干扰以改进机械学习模式培训的第一套办法 2502.04121v3 -
434 06-24 Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective Convolution-Gewichtungsmethode für das physikinformierte neuronale Netzwerk: Eine primär-duale Optimierungsperspektive 物理学-知情神经网络的革命加权法:原始-多极优化视角 2506.19805v1 -
435 06-24 Multiscale Training of Convolutional Neural Networks Multiskalige Ausbildung konvolutionärer neuraler Netzwerke 革命神经网络的多规模培训 2501.12739v3 -
436 06-24 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Warum kämpfen Open Source LLMs mit Datenanalyse? Eine systematische empirische Studie 开放源码LLMs为何要与数据分析斗争?系统的经验研究 2506.19794v1 -
437 06-24 A comparative analysis of machine learning algorithms for predicting probabilities of default Eine vergleichende Analyse von maschinellen Lernalgorithmen zur Vorhersage von Ausfallwahrscheinlichkeiten 对用于预测违约概率的机器学习算法进行比较分析 2506.19789v1 -
438 06-24 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning SRFT: Einstufige Methode mit überwachter und verstärkter Feinsteuerung für die Vernunft SRFT: 单一标准方法,以监督和加固为理由的罚款 2506.19767v1 -
439 06-24 FDA-Opt: Communication-Efficient Federated Fine-Tuning of Language Models FDA-Opt: Kommunikationseffizientes Federated Fine-Tuning von Sprachmodellen FFDA-Opt: 交流-高效联邦语言模型精密使用 2505.04535v2 -
440 06-24 The Shape of Consumer Behavior: A Symbolic and Topological Analysis of Time Series Die Form des Konsumverhaltens: Eine symbolische und topologische Analyse der Zeitreihen 《消费者行为形态:时间序列的象征和地形分析》 2506.19759v1 -
441 06-24 Cross-regularization: Adaptive Model Complexity through Validation Gradients Cross-Regulierung: Adaptive Modellkomplexität durch Validierungsgradienten 交叉正规化:通过验证梯度使适应性模型复杂度 2506.19755v1 -
442 06-24 A Robust Twin Parametric Margin Support Vector Machine for Multiclass Classification Eine robuste Twin-Parametrische Margin-Unterstützungs-Vektormaschine für die Multiclass-Klassifikation 多级分类的强力双双参数边距支持矢量机 2306.06213v3 -
443 06-24 On the necessity of adaptive regularisation: Optimal anytime online learning on $\boldsymbol{\ell_p}$-balls Über die Notwendigkeit einer adaptiven Regularisierung: Optimales Online-Lernen jederzeit auf $\boldsymbol{\ell_p}$-Bällen 关于自适应正则化的必要性:$\boldsymbol{\ell_p}$ 球上的最优随时在线学习 2506.19752v1 -
444 06-24 Continuous Bayesian Model Selection for Multivariate Causal Discovery Kontinuierliche bayesische Modellauswahl für multivariate Kausalentdeckung 多变因果发现连续的巴伊西亚模型选择 2411.10154v2 -
445 06-24 DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization DecDEC: Ein Systemansatz zur Steigerung der LLM-Quantisierung mit niedrigem Bit DecDEC: 推进低比低级LLM量化的系统方法 2412.20185v2 -
446 06-24 Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls Noise Consistency Training: Ein nativer Ansatz für One-Step-Generator beim Lernen zusätzlicher Steuerungen 噪音一致性培训:在学习额外控制措施方面对单步发电机采取土著办法 2506.19741v1 -
447 06-24 Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery Q2SAR: Ein Quantum-Multiple-Kernel-Lernansatz für die Drogenentdeckung Q2SAR:药物发现量子多核心学习方法 2506.14920v2 -
448 06-24 Unscrambling disease progression at scale: fast inference of event permutations with optimal transport Verkrampfte Krankheitsprogression im Maßstab: schnelle Schlussfolgerung von Ereignispermutationen mit optimalem Transport 分解疾病大规模演变:以最佳运输方式快速推断事件变异 2410.14388v3 -
449 06-24 DRIFT: Data Reduction via Informative Feature Transformation- Generalization Begins Before Deep Learning starts DRIFT: Datenreduktion durch Informative Feature Transformation- Verallgemeinerung beginnt bevor Deep Learning startet DRIFT: 在深学习开始前通过信息特征转换普遍化开始减少数据 2506.19734v1 -
450 06-24 Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units Wer macht was im Deep Learning? Multidimensionale Game-Theoretische Zuordnung der Funktion neuraler Einheiten 谁在深层学习中做什么? 神经单位功能的多层面游戏理论归属 2506.19732v1 -
451 06-24 IgCONDA-PET: Weakly-Supervised PET Anomaly Detection using Implicitly-Guided Attention-Conditional Counterfactual Diffusion Modeling – a Multi-Center, Multi-Cancer, and Multi-Tracer Study IgCONDA-PET: Schwachüberwachte PET-Anomalie-Erkennung mittels implizit geführter Aufmerksamkeits-Bedingtheits-Kontrafaktual-Diffusionsmodellierung – eine Multi-Center-, Multi-Cancer- und Multi-Tracer-Studie IgCONDA-PET:使用隐性引导的注意-有条件反扩散模型 – – 多中心、多癌症和多跟踪研究 – – 进行微弱超弱PET异常探测 2405.00239v3 -
452 06-24 Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving Lokale Look-Ahead-Anleitung über Verifier-in-the-Loop für automatisierte Theorem-Proving 通过自动理论验证人在线验证人进行自动理论验证,指导当地目视中心 2503.09730v2 -
453 06-24 Geometric-Aware Variational Inference: Robust and Adaptive Regularization with Directional Weight Uncertainty Geometrisch-Bewusst Variationelle Schlussfolgerung: Robuste und adaptive Regularisierung mit Richtungsgewichtsunsicherheit 几何-软件变化推断:强力和适应性规范化,具有方向性重量不确定性 2506.19726v1 -
454 06-24 Identifying Unknown Stochastic Dynamics via Finite expression methods Unbekannte Stochastische Dynamik über Finite-Expression-Methoden identifizieren 通过 Finite 表达式方法识别未知的斯托卡动态 2504.07085v3 -
455 06-24 Conservative quantum offline model-based optimization Konservative Quanten-Offline-Modell-basierte Optimierung 保守性量子离线离线模型优化 2506.19714v1 -
456 06-24 Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales Anleitung im Frequenzbereich ermöglicht High-Fidelity-Sampling bei niedrigen CFG-Skalen CFG 低CFG 尺度高频域允许高频度抽样的指南 2506.19713v1 -
457 06-24 Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks Lernen-unterstützte Bigraph Matching Ansatz zur Multi-Crew Wiederherstellung beschädigter Stromnetze mit Straßentransport-Netzwerke gekoppelt 与公路运输网相结合的多组恢复受损电力网的学习辅助活书匹配方法 2506.19703v1 -
458 06-24 Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Ausreißer-sicheres Pre-Training für robuste 4-Bit Quantisierung großer Sprachmodelle 大语言模式强力四比四比四的量化培训前培训 2506.19697v1 -
459 06-24 Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks Nahezu optimale Schätzungen für die $\ell^p$-Lipschitz-Konstanten von tiefen zufälligen ReLU-Neuralnetzwerken 深随机ReLU神经网络的 $\ell^p$-Lipschitz 常数近于最佳的估计值 2506.19695v1 -
460 06-24 ReBoot: Encrypted Training of Deep Neural Networks with CKKS Bootstrapping ReBoot: Verschlüsseltes Training von Deep Neural Networks mit CKKS Bootstrapping ReBoot:使用 CKKS 启动系统加密深神经网络培训 2506.19693v1 -
461 06-24 Leveraging Lightweight Generators for Memory Efficient Continual Learning Leveraging Lightweight Generators für Speicher Effizientes kontinuierliches Lernen 利用轻型发电机促进记忆高效持续学习 2506.19692v1 -
462 06-24 AYLA: Amplifying Gradient Sensitivity via Loss Transformation in Non-Convex Optimization AYLA: Verstärkte Gradientenempfindlichkeit durch Verlusttransformation in nicht konvexer Optimierung AYLA:通过非Convex优化的亏损转化增强渐进感敏度 2504.01875v2 -
463 06-24 When Can We Reuse a Calibration Set for Multiple Conformal Predictions? Wann können wir eine Kalibrierung für mehrere konforme Vorhersagen wiederverwenden? 什么时候我们才能重新使用一个校准装置 来进行多常规的预测呢? 2506.19689v1 -
464 06-24 Model Guidance via Robust Feature Attribution Modellführung über robuste Eigenschaftszuweisung 通过强力地物学示范指导 2506.19680v1 -
465 06-24 Extreme Learning Machines for Exoplanet Simulations: A Faster, Lightweight Alternative to Deep Learning Extreme Lernmaschinen für Exoplanetensimulationen: Eine schnellere, leichte Alternative zum Deep Learning 用于Explanet模拟的极端学习机器:一种更快、轻量比深层学习的替代方法 2506.19679v1 -
466 06-24 Higher-Order Graph Databases Graphendatenbanken mit höherer Ordnung 高层图表数据库 2506.19661v1 -
467 06-24 PEVLM: Parallel Encoding for Vision-Language Models PEVLM: Parallele Kodierung für Vision-Language-Modelle PEVLM: 视觉语言模型平行编码 2506.19651v1 -
468 06-24 Tensor-Parallelism with Partially Synchronized Activations Tensor-Parallelismus mit teilweise synchronisierten Aktivierungen 具有部分同步激活作用的长者-长者-长者主义 2506.19645v1 -
469 06-24 Unsupervised Data Generation for Offline Reinforcement Learning: A Perspective from Model Unüberwachte Datengenerierung für Offline-Verstärkung Lernen: Eine Perspektive vom Modell 未受监督的离线强化学习数据生成:模式的视角 2506.19643v1 -
470 06-24 Hierarchical Time Series Forecasting Via Latent Mean Encoding Hierarchische Zeitreihen über latente mittlere Kodierung prognostizieren 预测Via 隐中平均值编码的等级时间序列 2506.19633v1 -
471 06-24 Why Uncertainty Calibration Matters for Reliable Perturbation-based Explanations Warum Ungewissheitskalibrierung zählt für zuverlässige Perturbation-basierte Erklärungen 以可靠干扰为基础的解释的不确定性校准为何重要 2506.19630v1 -
472 06-24 Operator Forces For Coarse-Grained Molecular Dynamics Bedienerkräfte für geradlinige molekulare Dynamiken 粗粗粒分子动态操作员力量 2506.19628v1 -
473 06-24 Scaling Up Unbiased Search-based Symbolic Regression Skalierung unvoreingenommener suchbasierter symbolischer Regression 扩展无偏的基于搜索的符号回归 2506.19626v1 -
474 06-24 Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges Multimodales maschinelles Lernen in der psychischen Gesundheit: Eine Erhebung von Daten, Algorithmen und Herausforderungen 心理健康中多式机器学习:数据调查、判断力和挑战 2407.16804v2 -
475 06-24 Contactless Cardiac Pulse Monitoring Using Event Cameras Kontaktlose Herz Pulsüberwachung mit Ereigniskameras 使用事件相机进行无触碰心心心脏病脉动监测 2505.09529v2 -
476 06-24 ECG-SMART-NET: A Deep Learning Architecture for Precise ECG Diagnosis of Occlusion Myocardial Infarction EKG-SMART-NET: Eine Deep-Learning-Architektur für präzise EKG-Diagnose des Okklusionsmyokardinfarkts ECG-SMART-NET: 精密ECG心肌梗塞症诊断的深学习结构 2405.09567v2 -
477 06-24 A text-to-tabular approach to generate synthetic patient data using LLMs Ein text-to-tabuläres Konzept zur Generierung synthetischer Patientendaten mit Hilfe von LLMs 使用LLMM 生成合成病人数据的一种文本到表格的方法 2412.05153v2 -
478 06-24 Beyond Static Models: Hypernetworks for Adaptive and Generalizable Forecasting in Complex Parametric Dynamical Systems Jenseits statischer Modelle: Hypernetworks für adaptive und generalisierbare Vorhersagen in komplexen parametrischen dynamischen Systemen 超越静态模型:复杂参数动态系统适应性和可通用预报超网络 2506.19609v1 -
479 06-24 ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP ChordPrompt: Orchestrierung von Cross-Modal Prompt Synergy für Multi-Domain Incremental Learning in CLIP ChordPrompt:CLIP中多领域递增学习的交织式跨模式即时同步协同 2506.19608v1 -
480 06-24 Constructive Universal Approximation and Finite Sample Memorization by Narrow Deep ReLU Networks Konstruktive Universal-Annäherung und Finite Sample-Memorisation durch Narrow Deep ReLU Networks 由窄深深RELU网络进行 2409.06555v2 -
481 06-24 Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases Diff-Def: Diffusionsgenerierte Deformationsfelder für Bedingte Atlase Diff-Def: 用于条件图集的扩散-驱动解析字段 2403.16776v3 -
482 06-24 Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra Training Flexible Modelle genetischer Variant-Effekte aus funktionellen Anmerkungen mit beschleunigter Linear Algebra 使用加速线性线性代数对功能说明的遗传变异效应灵活模型的培训 2506.19598v1 -
483 06-24 Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications Vision Transformer-basierte Zeitreihen-Bildrekonstruktion für Cloud-Filling-Anwendungen 为云层填云应用而重建基于时间-系列图像 2506.19591v1 -
484 06-24 ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks ConStellaration: Ein Datensatz von QI-ähnlichen Stellaratoren-Plasmagrenzen und Optimierungs-Benchmarks 交配:一套类似 QI 星际等离子体边界和优化基准的数据集 2506.19583v1 -
485 06-24 Realistic Image-to-Image Machine Unlearning via Decoupling and Knowledge Retention Realistisches Bild-zu-Bild-Maschine-Entlernen durch Entkopplung und Wissensretention 通过脱钩和知识保留消除学习 2502.04260v2 -
486 06-24 Fake or Real, Can Robots Tell? Evaluating Embodied Vision-Language Models on Real and 3D-Printed Objects Fake oder Real, Können Roboter erzählen? Evaluieren von körpereigenen Vision-Sprachenmodellen auf realen und 3D-gedruckten Objekten 假的还是假的,机器人能告诉吗?评价关于真物和3D实用物的内嵌视觉语言模型 2506.19579v1 -
487 06-24 FAF: A Feature-Adaptive Framework for Few-Shot Time Series Forecasting FAF: Ein Feature-Adaptive-Framework für die Vorhersage von Kurzzeitreihen FAF: 低热时间序列预测的特征-适应性框架 2506.19567v1 -
488 06-24 Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees Neudenken Neurale Kombinatorische Optimierung für Fahrzeugrouting-Probleme mit unterschiedlichen Engegraden 重新思考具有不同紧紧度的车辆运行问题神经组合优化 2505.24627v2 -
489 06-24 ConCM: Consistency-Driven Calibration and Matching for Few-Shot Class-Incremental Learning ConCM: Konsistenzgetriebene Kalibrierung und Passung für das Few-Shot-Klassen-inkrementelle Lernen ConCM: 校准和校准低热级高级强化学习 2506.19558v1 -
490 06-24 General Methods Make Great Domain-specific Foundation Models: A Case-study on Fetal Ultrasound General Methods Make Great Domain-spezifische Foundation Models: Eine Fallstudie über Fetal Ultrasound 通用方法:胎儿超声波案例研究 2506.19552v1 -
491 06-24 Discovering Symmetries of ODEs by Symbolic Regression Symmetrien von ODEs durch symbolische Regression entdecken 通过符号回归发现对 ODE 的对称 2506.19550v1 -
492 06-24 RCStat: A Statistical Framework for using Relative Contextualization in Transformers RCStat: Statistischer Rahmen für die Verwendung der relativen Kontextualisierung in Transformern RCStat: 在变异器中使用相对环境化的统计框架 2506.19549v1 -
493 06-24 Overtuning in Hyperparameter Optimization Overtuning in Hyperparameter-Optimierung 超重参数超强优化 2506.19540v1 -
494 06-24 Dimension Reduction for Symbolic Regression Dimensionsreduzierung für symbolische Regression 减少内效退退的尺寸 2506.19537v1 -
495 06-24 Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks Identifizieren physikalisch realisierbare Auslöser für Backdoored Face Recognition Networks 识别后门脸部识别网络的有形可实现触发器 2506.19533v1 -
496 06-24 A Framework for Uncertainty Quantification Based on Nearest Neighbors Across Layers Ein Rahmen für die Unsicherheitsquantifizierung auf der Grundlage der nächsten Nachbarländer über Schichten hinweg 基于跨层近邻的不确定性定量框架 2506.19895v1 -
497 06-24 Towards Robust Stability Prediction in Smart Grids: GAN-based Approach under Data Constraints and Adversarial Challenges Auf dem Weg zu robusten Stabilitätsprognosen in Smart Grids: GAN-basierter Ansatz unter Datenbeschränkungen und adversarialen Herausforderungen 实现智能网格中强有力的稳定预测:在数据制约和反向挑战下采取基于全球网络的方法 2501.16490v2 -
498 06-24 Towards Unsupervised Multi-Agent Reinforcement Learning via Task-Agnostic Exploration Auf dem Weg zu einem unüberwachten Mehr-Agenten-Verstärkungs-Lernen durch Task-Agnostic Exploration 通过任务不可知探索实现无人监督的多机构强化学习 2502.08365v3 -
499 06-24 Explaining deep neural network models for electricity price forecasting with XAI Erläutern von Deep-Neural-Netzwerk-Modellen für die Strompreisprognose mit XAI 解释与XAI公司一道进行电力价格预测的深层神经网络模型 2506.19894v1 -
500 06-24 Visual hallucination detection in large vision-language models via evidential conflict Visuelle Halluzinationserkennung in großen visionssprachlichen Modellen über Beweiskonflikte 通过证据冲突在大型视觉语言模型中发现视觉幻觉 2506.19513v1 -
501 06-24 TrainVerify: Equivalence-Based Verification for Distributed LLM Training TrainVerify: Gleichwertigkeitsbasierte Überprüfung für verteiltes LLM-Training 培训核查:分布式LLM培训的等效核查 2506.15961v2 -
502 06-24 Distillation-Enabled Knowledge Alignment for Generative Semantic Communications in AIGC Provisioning Tasks Destillationsfähiges Wissen Ausrichtung für generative semantische Kommunikation in AIGC Provisioning-Aufgaben 在AIGC提供任务中产生语义通信的知识协调 2506.19893v1 -
503 06-24 RepuNet: A Reputation System for Mitigating Malicious Clients in DFL RepuNet: Ein Reputationssystem zur Bekämpfung bösartiger Kunden in der DFL RepuNet:DFL中减少恶意客户的声望系统 2506.19892v1 -
504 06-24 MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications MATE:为无障碍应用提供LLM 授权多机构翻译环境 2506.19502v1 -
505 06-24 NaviAgent: Bilevel Planning on Tool Dependency Graphs for Function Calling NaviAgent: Bilevel-Planung auf Werkzeugabhängigkeitsgraphen für Funktionsaufruf NaviAgent: 功能调用工具依赖图双层规划 2506.19500v1 -
506 06-24 HeNCler: Node Clustering in Heterophilous Graphs via Learned Asymmetric Similarity HeNCler: Knoten-Clustering in heterophilen Graphen mittels Asymmetrischer Ähnlichkeit HeNCler:通过取得非对称相似性,将异生物性图案的节点分组 2405.17050v2 -
507 06-24 COLUR: Confidence-Oriented Learning, Unlearning and Relearning with Noisy-Label Data for Model Restoration and Refinement COLUR: Vertrauensorientiertes Lernen, Unlearning und Relearning mit Noisy-Label-Daten zur Modellrestauration und -verfeinerung COLUR: 以信心为导向的学习、不学习和再学习,使用噪音标签数据促进示范恢复和完善 2506.19496v1 -
508 06-24 Tunable correlation retention: A statistical method for generating synthetic data Tunable Korrelationsspeicherung: Eine statistische Methode zur Generierung synthetischer Daten 保留可裁判的关联性:生成合成数据的统计方法 2403.01471v3 -
509 06-24 Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story Steigerung der Vielfalt bei Parallelagenten: Eine höchstmögliche Entropie-Explorationsgeschichte 增强平行代表机构的多样性:最大国家宇宙体探索空间 2505.01336v2 -
510 06-24 Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy Erinnern an die vergessene Klasse Mitgliedschaften: Unerlernte Modelle können laute Labeler sein, um Leak Privacy 回顾《被遗忘的阶级成员:未学习的模型》, 2506.19486v1 -
511 06-24 Privacy Attacks on Image AutoRegressive Models Datenschutzangriffe auf Image AutoRegressive Modelle 对图像自动递减模型的隐私攻击 2502.02514v4 -
512 06-24 Fast and Distributed Equivariant Graph Neural Networks by Virtual Node Learning Schnelle und distributed äquivariant Graph Neural Networks by Virtual Node Learning 通过虚拟节点学习快速和分布的等同图形神经网络 2506.19482v1 -
513 06-24 ADDQ: Adaptive Distributional Double Q-Learning ADDQ: Adaptive Verteilung Doppeltes Q-Lernen ADDQ:适应性分配双重学习 2506.19478v1 -
514 06-24 Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense Tiefe neuronale Netzwerke mit ReLU, undichtem ReLU und Softplus-Aktivierung überwinden nachweislich den Fluch der Dimensionalität für Kolmogorov partielle Differentialgleichungen mit Lipschitz-Nichtlinearitäten im $L^p$-Sense 与ReLU、渗漏ReLU和软增压激活的深神经网络可以克服Kolmogorov部分差异方程式的维度诅咒,Lipschitz非线性方程式以$L^p$-sense为单位。 2309.13722v2 -
515 06-24 Uncertainty Quantification on Graph Learning: A Survey Ungewissheit Quantifizierung des Graphenlernens: Eine Umfrage 图表学习的不确定性量化:调查 2404.14642v3 -
516 06-24 Orthogonal Soft Pruning for Efficient Class Unlearning Orthogonale Soft Pruning für effizientes Lernen 为高效班级取消学习而整形软节奏 2506.19891v1 -
517 06-24 Stylized Structural Patterns for Improved Neural Network Pre-training Stylisierte Strukturmuster für verbesserte Neural-Netzwerk-Vorausbildung 改善神经网络的固定结构模式 2506.19465v1 -
518 06-24 Tagged for Direction: Pinning Down Causal Edge Directions with Precision Tagged for Richtung: Pinning Down Causal Edge Richtungen mit Präzision 指向标记: 精密地弯曲下构造边缘方向 2506.19459v1 -
519 06-24 Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference Mischung aus Cache-Conditional Experts für effiziente mobile Geräteableitung 高效移动设备引力缓存-条件专家混合 2412.00099v2 -
520 06-24 Low-Complexity Semantic Packet Aggregation for Token Communication via Lookahead Search Low-Complexity Semantic Packet Aggregation für Token Communication via Lookahead Search 通过 Lookahead 搜索建立Tokon 通信的低复杂度语义包装集成 2506.19451v1 -
521 06-24 SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification SSPS: Selbstüberwachte positive Probenahme für robuste selbstüberwachte Lautsprecherverifizierung SSPS: 自我监督的自我监督发言人自我监督的积极抽样核查 2505.14561v2 -
522 06-24 The Elements of Differentiable Programming Die Elemente der differenzierbaren Programmierung 不同方案拟订要素 2403.14606v3 -
523 06-24 Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Verstärkung Learning 重力-引力引导焦点集中影响多机构强化学习机制中心 2506.19417v1 -
524 06-24 Multi-Continental Healthcare Modelling Using Blockchain-Enabled Federated Learning Multi-Continental Healthcare Modellierung mittels Blockchain-Enabled Federated Learning 利用综合链链-能连链的联邦学习模式建立多州保健模式 2410.17933v3 -
525 06-24 Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models Meta-Reasoner: Dynamische Anleitung zur optimierten Schlussfolgerungs-Zeit-Reasoning in großen Sprachmodellen Meta-Reasoner:大语言模型中优化推断-时间理由的动态指导 2502.19918v3 -
526 06-24 Online Discovery of Simulation Models for Evolving Business Processes (Extended Version) Online Discovery of Simulation Models for Evolving Business Processes (Erweiterte Version) 不断演变的业务流程模拟模型在线发现(扩展版) 2506.10049v2 -
527 06-24 M3D: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition M3D: Manifold-based Domain Adaptation mit dynamischer Distribution für nicht-deep Transfer Learning in Cross-Subjekt und Cross-Session EEG-based Emotion Recognition M3D: 跨学科和跨学科EEG的情感识别中非深入转移学习动态分布的多功能适应 2404.15615v3 -
528 06-24 Improved and Explainable Cervical Cancer Classification using Ensemble Pooling of Block Fused Descriptors Verbesserte und erklärbare Cervical Cancer Classification mit Ensemblepooling von Block Fused Descriptors 使用聚在一起的聚聚聚块阻燃描述词块改进子宫颈癌分类和可解释的子宫颈癌分类 2405.01600v2 -
529 06-24 Controllable Video Generation with Provable Disentanglement Steuerbare Video-Generation mit wahrnehmbarer Entwirrtheit 带可变解脱的可控视频生成 2502.02690v2 -
530 06-24 Maximal Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators Maximale Aktualisierung Parametrisierung und Null-Shot-Hyperparameter-Übertragung für Fourier-Neural-Betreiber Fourier神经操作员最大更新平衡化和零热超强参数转换 2506.19396v1 -
531 06-24 ANOVA-boosting for Random Fourier Features ANOVA-Boosting für zufällige Fourier-Features ANOVA 启动随机 Fourier 特性 2404.03050v2 -
532 06-24 Causal-Aware Intelligent QoE Optimization for VR Interaction with Adaptive Keyframe Extraction Causal-Aware Intelligente QoE-Optimierung für VR-Interaktion mit adaptiver Keyframe-Extraktion VR 与适应性键框架的提取互动的优化 QoE 2506.19890v1 -
533 06-24 Do Vendi Scores Converge with Finite Samples? Truncated Vendi Score for Finite-Sample Convergence Guarantees Bewältigen sich Vendi-Scores mit Finite-Proben? Beschnittener Vendi-Score für Finite-Sample-Konvergenzgarantien Vendi 分数是否与有限样本相连接? 2410.21719v3 -
534 06-24 NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs NAADA: A Noise-Aware Aufmerksamkeit Denoising Autoencoder für Dental Panoramic Radiographs NAADA: 用于牙科全景射电辐射仪的噪音警报器注意自动编码器 2506.19387v1 -
535 06-24 Deep Electromagnetic Structure Design Under Limited Evaluation Budgets Deep Elektromagnetic Structure Design unter begrenzter Bewertung Budgets 有限评价预算下的深电磁结构设计 2506.19384v1 -
536 06-24 Explainable Artificial Intelligence Credit Risk Assessment using Machine Learning Erklärbare Künstliche Intelligenz Bonitätsbeurteilung mittels maschinellem Lernen 利用机器学习进行可解释的人工智能信息信用风险评估 2506.19383v1 -
537 06-24 ReDit: Reward Dithering for Improved LLM Policy Optimization ReDit: Belohnung für verbesserte LLM-Policy-Optimierung ReDit:为改进LLM政策优化而向优利分差 2506.18631v2 -
538 06-24 Path Learning with Trajectory Advantage Regression Pfad-Lernen mit Trajektor-Vorteil Regression 路径学习与轨迹优于后退的路径学习 2506.19375v1 -
539 06-24 Flopping for FLOPs: Leveraging equivariance for computational efficiency Flopping für FLOPs: Equivarianz für Berechnungseffizienz FLOPs 的浮动 : 利用计算效率的等差 2502.05169v2 -
540 06-24 WebGuard++: Interpretable Malicious URL Detection via Bidirectional Fusion of HTML Subgraphs and Multi-Scale Convolutional BERT WebGuard++: Interpretable bösartige URL-Erkennung durch bidirektionale Fusion von HTML-Subgraphen und multi-Scale Convolutional BERT WebGuard++: 通过 HTML 子集成和多波段进化 BERT 双向融合来可解释的恶意 URL 探测 2506.19356v1 -
541 06-24 In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly In-Context Occams Razor: Wie Transformer einfachere Hypothesen auf der Fliege bevorzugen 内文 Occam 的剃刀: 如何在飞行中发生变形人更倾向于简单易碎的假说 2506.19351v1 -
542 06-24 Discrepancy-Aware Graph Mask Auto-Encoder Discrepancy-Aware Graph Maske Auto-Encoder 自动编码器 2506.19343v1 -
543 06-24 Unlocking Insights Addressing Alcohol Inference Mismatch through Database-Narrative Alignment Unlocking Insights adressing Alcohol Inferenz Mismatch durch Datenbank-Narrative Alignment 通过数据库-聚合对齐来解锁对酒精推断误差的透视 2506.19342v1 -
544 06-24 CAM-NET: An AI Model for Whole Atmosphere with Thermosphere and Ionosphere Extension CAM-NET: Ein KI-Modell für ganze Atmosphäre mit Thermosphäre und Ionosphärenerweiterung CAM-NET:具有热层和电离层扩展作用的AI全大气模型 2506.19340v1 -
545 06-24 Contrastive Cross-Modal Learning for Infusing Chest X-ray Knowledge into ECGs Kontrastives Cross-Modal-Lernen für das Einbringen von Röntgenwissen im Brustkorb in EKGs 将切斯特X射线知识注入ECG 2506.19329v1 -
546 06-24 Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack Diffusionsbasierte aufgabenorientierte semantische Kommunikation mit Model Inversion Attack 以传播为基础的以任务为导向的语义通信与模型反向攻击 2506.19886v1 -
547 06-24 Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups Summ-of-Parts: Selbstzuordnende neurale Netzwerke mit Ende-zu-Ende-Lernen von Feature-Gruppen 部分总和:自成一体的神经网络,以及特异组的端到端学习 2310.16316v4 -
548 06-24 FlightKooba: A Fast Interpretable FTP Model FlightKooba: Ein schnell interpretierbares FTP-Modell 库巴飞行:快速解释式FTP模型 2506.19885v1 -
549 06-24 Adversarial Attacks on Deep Learning-Based False Data Injection Detection in Differential Relays Adversariale Angriffe auf tief lernbasierte falsche Dateninjektionserkennung in Differentialrelais 在差异中继中对深学习假数据输入探测的反向攻击 2506.19302v1 -
550 06-24 LAuReL: Learned Augmented Residual Layer LAuReL: Erlernte Augmented Residual Layer LAuReL: 积累的剩余层 2411.07501v4 -
551 06-24 SycnMapV2: Robust and Adaptive Unsupervised Segmentation SycnMapV2: Robuste und adaptive unüberwachte Segmentierung SycnMapV2:强力和适应性不受监督的分割 2506.16297v2 -
552 06-24 The Effect of Depth on the Expressivity of Deep Linear State-Space Models Der Effekt der Tiefe auf die Expressivität von Deep Linear State-Space-Modellen 深度对深线国家空间模型的表达性的影响 2506.19296v1 -
553 06-24 Efficient Extreme Operating Condition Search for Online Relay Setting Calculation in Renewable Power Systems Based on Parallel Graph Neural Network Effiziente extreme Betriebsbedingungen Suche nach Online-Relay-Setting-Berechnung in erneuerbaren Stromsystemen basierend auf parallelem Graphen-Neural-Netzwerk 以平行图形神经网络为基础的可再生能源系统在线中继设置计算 2506.19289v1 -
554 06-24 Information-Theoretic Proofs for Diffusion Sampling Informationstheoretische Nachweise für die Diffusionsprobenahme 用于扩散取样的信息理论证据 2502.02305v2 -
555 06-24 DF2: Distribution-Free Decision-Focused Learning DF2: Verteilungsfreies entscheidungsorientiertes Lernen DF2:无分发决定-以学习为目的的学习 2308.05889v2 -
556 06-24 A Batch-Insensitive Dynamic GNN Approach to Address Temporal Discontinuity in Graph Streams Ein Batch-Insensibler Dynamischer GNN-Ansatz zur Adresse der zeitlichen Diskontinuität in Graph Streams 处理图表流中时间性失常问题的批量不敏感动态 GNN 方法 2506.19282v1 -
557 06-24 STIMULUS: Achieving Fast Convergence and Low Sample Complexity in Stochastic Multi-Objective Learning STIMULUS: Schnelle Konvergenz und geringe Probenkomplexität im stochastischen Multi-Ziel-Lernen STIMULUS:在托盘多目的学习中实现快速趋同和低样本复杂性 2506.19883v1 -
558 06-24 Robust OOD Graph Learning via Mean Constraints and Noise Reduction Robustes OOD Graphenlernen über mittlere Einschränkungen und Lärmreduzierung 通过中度制约和减少噪音进行强有力的 OOD 图表学习 2506.19281v1 -
559 06-24 Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach Emotion Detection on User Front-Facing App Interfaces für verbesserte Zeitplanoptimierung: Ein Ansatz zum maschinellen Lernen 增强计划优化的用户前向应用程序接口的情感探测:机械学习方法 2506.19280v1 -
560 06-24 Rare dense solutions clusters in asymmetric binary perceptrons – local entropy via fully lifted RDT Seltene dichte Lösungen Cluster in asymmetrischen binären Perzeptronen – lokale Entropie über vollständig angehobene RDT 非对称二进光线 – – 通过完全提升的区域主任小组,当地对流 2506.19276v1 -
561 06-24 Compound Fault Diagnosis for Train Transmission Systems Using Deep Learning with Fourier-enhanced Representation Compound Fault Diagnose für Zugübertragungssysteme mit Deep Learning mit Fourier-verstärkter Darstellung 利用Fourier加强的代表制进行深学习,对火车传输系统进行断层分析 2504.07155v2 -
562 06-24 A Qubit-Efficient Hybrid Quantum Encoding Mechanism for Quantum Machine Learning Ein qubit-effizienter Hybrid-Quantum-Encoding-Mechanismus für das Quantum Machine Learning 量子机器学习量子编码机制 2506.19275v1 -
563 06-24 Stabilizing PDE–ML Coupled System Stabilisierung des PDE-ML-gekoppelten Systems 稳定PDE-ML混合系统 2506.19274v1 -
564 06-24 Process Reward Models That Think Prozess Belohnung Modelle, die denken 思考的流程奖励模式 2504.16828v3 -
565 06-24 Continuous-variable Quantum Diffusion Model for State Generation and Restoration Kontinuierlich-variables Quantendiffusionsmodell für die Zustandserstellung und Wiederherstellung 国家发电和复原的连续可变量量量传播模型 2506.19270v1 -
566 06-24 Learning Treatment Representations for Downstream Instrumental Variable Regression Lern-Behandlung Darstellungen für Downstream Instrumentale Variable Regression 下下游工具递退学习治疗代表 2506.02200v2 -
567 06-24 Leveraging Large Language Models to Democratize Access to Costly Datasets for Academic Research Nutzung großer Sprachmodelle zur Demokratisierung des Zugangs zu kostspieligen Datensätzen für die akademische Forschung 利用大语言模式使学术研究获得成本昂贵的数据集民主化 2412.02065v2 -
568 06-24 Network Structures as an Attack Surface: Topology-Based Privacy Leakage in Federated Learning Netzwerkstrukturen als Angriffsfläche: Topologiebasiertes Datenschutz-Leakage im Federated Learning 网络结构作为攻击表面:联邦学习中的基于地形的隐私渗漏 2506.19260v1 -
569 06-24 Personality Prediction from Life Stories using Language Models Persönlichkeitsvorhersage aus Lebensgeschichten mit Sprachmodellen 使用语言模型对生活故事的个性预测 2506.19258v1 -
570 06-24 Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Position: Machine Learning Konferenzen sollten einen “Refutations and Critiques” Track erstellen 职位:机器学习会议应建立“反驳和批评”轨道 2506.19882v1 -
571 06-24 Robust Behavior Cloning Via Global Lipschitz Regularization Robustes Verhalten Klonen über globale Lipschitz Regularisierung 强力行为 克隆通过全球自由自由实现正规化 2506.19250v1 -
572 06-24 Inference-Time Reward Hacking in Large Language Models Inferenz-Time Reward Hacking in großen Sprachmodellen 大语种模型中的推定-时间回授 2506.19248v1 -
573 06-24 Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning Verhaltensanomalienerkennung in verteilten Systemen über Federated Contrastive Learning 通过联邦反竞争学习在分布式系统中进行行为异常检测 2506.19246v1 -
574 06-24 Universal kernels via harmonic analysis on Riemannian symmetric spaces Universelle Kerne durch harmonische Analyse auf Riemannschen symmetrischen Räumen 通过对里格曼对称空间的和谐分析实现通用内核 2506.19245v1 -
575 06-24 SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation SASSHA: Scharfheitsbewusste Adaptive Second-Order-Optimierung mit stabiler hessischer Annäherung SASSHA: 使用稳定黑森相近的优化度 2502.18153v2 -
576 06-24 High precision PINNs in unbounded domains: application to singularity formulation in PDEs Hochpräzise PINNs in ungebundenen Domänen: Anwendung auf Singularitätsformulierung in PDEs 在无约束域域的高精精密 PINNs: 应用到PDEs 的独一配方 2506.19243v1 -
577 06-24 Understanding Reasoning in Thinking Language Models via Steering Vectors Verständnis von Vernunft im Denken von Sprachmodellen über Lenkungs-Vektoren 通过指导矢量来理解思考语言模式的理由 2506.18167v2 -
578 06-24 A General Framework for Property-Driven Machine Learning Ein allgemeiner Rahmen für eigentumsorientiertes maschinelles Lernen 财产驱动机器学习总框架 2505.00466v2 -
579 06-24 Limits of Discrete Energy of Families of Increasing Sets Grenzen der diskreten Energie von Familien zunehmender Sets 增加组家庭不同能源限度的限制 2504.11302v2 -
580 06-24 Private Model Personalization Revisited Private Modell-Personalisierung überarbeitet 重新研究的私人个人模式 2506.19220v1 -
581 06-24 Iterative Minimax Games with Coupled Linear Constraints Iterative Minimax Spiele mit gekoppelten linearen Einschränkungen 带有连线限制的迭接小型游戏 2212.04672v5 -
582 06-23 (1) Transferring Features Across Language Models With Model Stitching Übertragung von Funktionen über Sprachmodelle mit Modellstich 使用模型裁剪的跨语言模型传输功能 2506.06609v2 -
583 06-23 Align and Distill: Unifying and Improving Domain Adaptive Object Detection Align and Distill: Domain-Adaptive-Objekterkennung vereinheitlichen und verbessern 调整和蒸馏:统一和改进域适应性物体探测 2403.12029v4 -
584 06-23 Simulation of a closed-loop dc-dc converter using a physics-informed neural network-based model Simulation eines Closed-Loop-DC-Wandlers mit einem physik-informierten neuronalen Netzwerk-basierten Modell 使用以物理知情神经网络为基础的模型模拟闭闭环dc-dc转换器 2506.19178v1 -
585 06-23 Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes Maschinen und mathematische Mutationen: Verwendung von GNNs zur Charakterisierung von Quiver-Mutationsklassen 机器和数学变异:使用 GNNs 来定性 Quiver 变异分类 2411.07467v2 -
586 06-23 The Gittins Index: A Design Principle for Decision-Making Under Uncertainty Der Gittins Index: Ein Design-Prinzip für Entscheidungsfindung unter Unsicherheit Gittins指数:不确定性下决策的设计原则 2506.10872v2 -
587 06-23 Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms Realistische gemeinsame Raumgrenzen für die Bewegungsanalyse gesunder und beeinträchtigter menschlicher Arme lernen 人体健康与残疾武器运动分析范围空间联合学习现实联合空间边界 2311.10653v3 -
588 06-23 Distilling Tool Knowledge into Language Models via Back-Translated Traces Destillieren von Werkzeugwissen in Sprachmodelle über Back-Translated Traces 通过后转路径将工具知识提炼成语言模型 2506.19171v1 -
589 06-23 A Deep Learning Based Method for Fast Registration of Cardiac Magnetic Resonance Images Eine Deep Learning-basierte Methode zur schnellen Registrierung von Herz-Magnetresonanz-Bildern 快速登记心电磁共振图像的深学习法 2506.19167v1 -
590 06-23 GradualDiff-Fed: A Federated Learning Specialized Framework for Large Language Model GradualDiff-Fed: Ein Federated Learning Specialized Framework für großes Sprachmodell 逐步发展伙伴关系:联邦学习大语言模式专门框架 2506.19164v1 -
591 06-23 ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs ProxSparse: Regularisiertes Lernen von halbstrukturierten Sparsity Masken für vortrainierte LLMs ProxSparse:为预训练LLM正则化学习半结构化稀疏掩码 2502.00258v2 -
592 06-23 Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality Hintere Kontraktion für Sparse Neuronale Netzwerke in Besov-Räumen mit Intrinsischer Dimensionalität 贝索夫空间内有内分层的微孔神经网络的皮层收缩 2506.19144v1 -
593 06-23 EEG Foundation Challenge: From Cross-Task to Cross-Subject EEG Decoding EEG-Stiftungsherausforderung: Von der Cross-Task zur Cross-Subject-EEG-Dekodierung EEG基金会挑战:从跨任务到跨主题的EEG解码 2506.19141v1 -
594 06-23 Command-V: Pasting LLM Behaviors via Activation Profiles Befehl-V: Einfügen von LLM-Behaviors über Aktivierungsprofile 命令- V: 通过激活剖析档粘贴 LLM 行为 2506.19140v1 -
595 06-23 Local Learning Rules for Out-of-Equilibrium Physical Generative Models Lokale Lernregeln für Physische Generative Modelle außerhalb des Equilibriums 外部平衡物理生成模型的地方学习规则 2506.19136v1 -
596 06-23 Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series Zeit-IMM: Ein Datensatz und Benchmark für irreguläre multimodale Multivariate Zeitreihen 时间-IMM:非正常多式联运多变时间序列的数据集和基准 2506.10412v2 -
597 06-23 Riemannian generative decoder Riemannischer Generativ-Decoder 里伊曼尼基因解码器 2506.19133v1 -
598 06-23 Finding Clustering Algorithms in the Transformer Architecture Clustering-Algorithmen in der Transformer-Architektur finden 在变换结构中查找聚集的算法 2506.19125v1 -
599 06-23 CUPID: Curating Data your Robot Loves with Influence Functions CUPID: Daten kuratieren, die Ihr Roboter mit Einflussfunktionen liebt CUPID: 计算机器人爱的有影响函数的曲线数据 2506.19121v1 -
600 06-23 Blameless Users in a Clean Room: Defining Copyright Protection for Generative Models Blameless Users in einem sauberen Raum: Definition des Urheberrechtsschutzes für generative Modelle 清洁室内的无名用户:界定对创源模式的版权保护 2506.19881v1 -
601 06-23 On the algorithmic construction of deep ReLU networks Zur algorithmischen Konstruktion von tiefen ReLU-Netzwerken 关于深ReLU网络的算法构造 2506.19104v1 -
602 06-23 ADVLLM: Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities ADVLLM: Iterative Selbst-Tuning LLMs für verbesserte Jailbreaking-Fähigkeiten ADVLLM: 强化破室能力自动自动自调LMs 2410.18469v4 -
603 06-23 Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks Code Graph Model (CGM): Ein Graph-integriertes Large Language Model für Repository-Level Software Engineering Aufgaben 代码图表模型(CGM):存储层软件工程任务 2505.16901v4 -
604 06-23 Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation Lernen von stochastischen Lehrerdarstellungen mit studentisch geführter Wissensdestillation 利用学生引导的知识蒸馏,从随机教师表示中学习 2504.14307v2 -
605 06-23 Finetuning a Weather Foundation Model with Lightweight Decoders for Unseen Physical Processes Feinsteuerung eines Weather Foundation Modells mit leichten Decodern für ungesehene physikalische Prozesse 微调天气基础模型,为未见物理过程使用轻量代谢器 2506.19088v1 -
606 06-23 Benchmarking Music Generation Models and Metrics via Human Preference Studies Benchmarking von Musikgenerierungsmodellen und Metrics über Human Preference Studies 通过人类特惠研究制定音乐创作模型和计量基准 2506.19085v1 -
607 06-23 FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation FairCauseSyn: Auf dem Weg zu einer kausal fairen LLM-gestützten synthetischen Datengenerierung FairCauseSyn: 迈向因果公平的LLM增强合成数据生成 2506.19082v1 -
608 06-23 First-Order Sparse Convex Optimization: Better Rates with Sparse Updates Sparse konvexe Optimierung erster Ordnung: Bessere Raten mit Sparse-Updates 一阶稀疏凸优化:通过稀疏更新获得更好的收敛速率 2506.19075v1 -
609 06-23 Which Company Adjustment Matter? Insights from Uplift Modeling on Financial Health Welches Unternehmen passt sich an? Einblicke aus dem Uplift Modelling on Financial Health 哪些公司调整重要?从提升金融健康模型中的观点 2506.19049v1 -
610 06-23 Rational Metareasoning for Large Language Models Rationales Metareasoning für große Sprachmodelle 大语言模型的理性元推理 2410.05563v3 -
611 06-23 Self-reflecting Large Language Models: A Hegelian Dialectical Approach Selbstreflektierende große Sprachmodelle: Ein hegelianischer dialektischer Ansatz 自我反映大语言模型:海格利人对立方法 2501.14917v6 -
612 06-23 Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training Critical Batch Size Revisited: Ein einfacher empirischer Ansatz für großflächige Sprachmodellschulungen 重新审视关键批量大小:大型批量语文示范培训的简单经验方法 2505.23971v2 -
613 06-23 Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Sequential Auctions under Unknown Environments Online-Lernen für dynamischen Vickrey-Clarke-Groves-Mechanismus in sequenziellen Auktionen unter unbekannten Umgebungen 在未知环境中有顺序拍卖的动态Vickrey-Clark-Groves机制在线学习 2506.19038v1 -
614 06-23 Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning Robustes Verstärktes Lernen aus menschlichem Feedback für große Sprachmodelle Feintuning 从人类反馈中学习大语言模型精美调整 2504.03784v4 -
615 06-23 Plan for Speed – Dilated Scheduling for Masked Diffusion Language Models Plan für Geschwindigkeit – Erweitertes Scheduling für maskierte Diffusions-Sprachmodelle 速度计划 – – 蒙面传播语言模型的压缩排程计划 2506.19037v1 -
616 06-23 Failure Modes of Time Series Interpretability Algorithms for Critical Care Applications and Potential Solutions Failure Modes of Time Series Interpretations-Algorithmen für kritische Pflegeanwendungen und mögliche Lösungen 关键护理应用和潜在解决方案的可解释性数值 2506.19035v1 -
617 06-23 When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets Wenn Diffusionsmodelle merken: Induktive Biasen in der Wahrscheinlichkeit Fluss von minimal-Norm Shallow Neural Nets 当传播模型时 记忆化:最低浅质神经网可能性流动中的诱导二分法 2506.19031v1 -
618 06-23 Emergent Risk Awareness in Rational Agents under Resource Constraints Emergent Risk Awareness in Rational Agents unter Ressourcenbeschränkungen 资源限制下对合理代理的新兴风险意识 2505.23436v3 -
619 06-23 Statistical Inference for Optimal Transport Maps: Recent Advances and Perspectives Statistische Schlussfolgerung für optimale Verkehrskarten: Jüngste Fortschritte und Perspektiven 最佳运输地图统计推论:最新进展和前景 2506.19025v1 -
620 06-23 Double Machine Learning for Conditional Moment Restrictions: IV Regression, Proximal Causal Learning and Beyond Doppeltes maschinelles Lernen für bedingten Moment Einschränkungen: IV Regression, proximales Kausallernen und darüber hinaus 有条件时刻限制的双机学习:四级递减、近似因果学习及以后 2506.14950v2 -
621 06-23 Automating Traffic Monitoring with SHM Sensor Networks via Vision-Supervised Deep Learning Automatisieren der Verkehrsüberwachung mit SHM-Sensornetzwerken über Vision-Supervised Deep Learning 通过视觉监督深层学习,与南高频传感器网络进行自动化交通监测 2506.19023v1 -
622 06-23 Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions Simulationsbasierte Sensitivitätsanalyse in Optimalen Behandlungsregimen und kausaler Zersetzung mit individualisierten Interventionen 最佳治疗制度和与个性化干预相结合的因果分解中的模拟-基于模拟的敏感度分析 2506.19010v1 -
623 06-23 Steering Conceptual Bias via Transformer Latent-Subspace Activation Steuerung konzeptioneller Bias über Transformer Latent-Subspace-Aktivierung 通过变换器中子空间动力动动 2506.18887v1 -
624 06-23 Accurate and scalable exchange-correlation with deep learning Genaue und skalierbare Austauschkorrelation mit Deep Learning 与深深学习的准确和可缩放的交换关系 2506.14665v3 -
625 06-23 A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series Ein verlässlicher Rahmen für die Mensch-in-the-Loop-Anomalie-Erkennung in der Zeitreihe 时间序列中人类在Loop异常探测的可靠框架 2405.03234v4 -
626 06-23 CDI: Copyrighted Data Identification in Diffusion Models CDI: Copyrighted Data Identification in Diffusion Models CDI: 传播模型中的版权数据识别 2411.12858v3 -
627 06-23 Controlling Moments with Kernel Stein Discrepancies Kontrollieren von Momenten mit Kernel Stein Diskrepanzen 控制内核施用技术差异的控控时刻 2211.05408v7 -
628 06-23 EXPRTS: Exploring and Probing the Robustness of Time Series Forecasting Models EXPRTS: Erforschung und Erprobung der Robustheit von Zeitreihenprognosemodellen EXPRTS:探索和检验时间系列预测模型的强劲性 2403.03508v2 -
629 06-23 A Comment On “The Illusion of Thinking”: Reframing the Reasoning Cliff as an Agentic Gap Ein Kommentar zu “Die Illusion des Denkens”: Den vernünftigen Cliff als Agent-Gap zurückweisen 关于“思考的幻觉”的评论:将理性断裂重新定位为一种危险差距 2506.18957v1 -
630 06-23 Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment Segmentation-Aware Generatives Verstärkungsnetzwerk (GRN) für Tissue Layer Segmentierung in 3-D-Ultraschall-Bildern für chronische Rückenschmerzen (cLBP) Assessment 用于慢性腰痛(cLBP)评估的三维超声图像组织层分割的分割感知生成强化网络(GRN) 2501.17690v3 -
631 06-23 LIGHTHOUSE: Fast and precise distance to shoreline calculations from anywhere on earth LIGHTHOUSE: Schneller und präziser Abstand zu Küstenberechnungen von überall auf der Erde 从地球上任何地方 快速和精确的距离到海岸线的计算 2506.18842v1 -
632 06-23 LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning LongWriter-Zero: Mastering Ultra-Long Text Generation via Verstärkungslernen LongWriter-Zero:通过强化学习掌握超大龙制文本 2506.18841v1 -
633 06-23 A Comprehensive Study of Machine Learning Techniques for Log-Based Anomaly Detection Eine umfassende Untersuchung von Techniken des maschinellen Lernens zur logbasierten Anomalieerkennung 全面研究用于基于日志异常探测的机器学习技术 2307.16714v5 -
634 06-23 Conformal Prediction for Causal Effects of Continuous Treatments Konforme Vorhersage für ursächliche Wirkungen kontinuierlicher Behandlungen 持续治疗因果影响非正式预测 2407.03094v3 -
635 06-23 Regularized Neural Ensemblers Regularisierte Neurale Ensemblers 正规神经组 2410.04520v2 -
636 06-23 Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators Kernel-Spektralfugeneinbettungen für hochdimensionale laute Datensätze mit Duo-Landmark-Integraloperatoren 使用双陆标记集成操作器进行高维噪音数据集的内核光谱联合嵌入 2405.12317v2 -
637 06-23 Maximizing Confidence Alone Improves Reasoning Maximierung des Vertrauens allein verbessert die Vernunft 使信心最大化单独提高合理性 2505.22660v3 -
638 06-23 Multi-Agent Online Control with Adversarial Disturbances Multi-Agent Online-Steuerung mit störenden Störungen 具有对抗骚乱的多代理在线控制 2506.18814v1 -
639 06-23 Learning Physical Systems: Symplectification via Gauge Fixing in Dirac Structures Physikalische Systeme lernen: Symplektifizierung über Messstreifenfixierung in Dirac-Strukturen 学习物理系统:通过在Dirac结构中定额进行定额的症状 2506.18812v1 -
640 06-23 Image Captions are Natural Prompts for Text-to-Image Models Bildunterschriften sind natürliche Prompts für Text-zu-Image-Modelle 图像说明是文本到图像模型的自然提示 2307.08526v2 -
641 06-23 Simple and Critical Iterative Denoising: A Recasting of Discrete Diffusion in Graph Generation Einfaches und kritisches iteratives Denoisieren: Eine Neuformulierung von diskreter Diffusion in der Graphengenerierung 简单和关键迭代去噪:图生成中离散扩散的重新表述 2503.21592v2 -
642 06-23 A Multi-view Divergence-Convergence Feature Augmentation Framework for Drug-related Microbes Prediction Ein Multi-View Divergence-Convergence Feature Augmentation Framework for Drug-related Microbes Prediction 与药物有关的微生物预测多视图差异-信念-特征增强框架 2506.18797v1 -
643 06-23 Focus Your Attention: Towards Data-Intuitive Lightweight Vision Transformers Fokussieren Sie Ihre Aufmerksamkeit: Auf dem Weg zu datenintuitiven Leichtbautransformatoren 关注焦点:面向数据直观的轻量级视觉变异器 2506.18791v1 -
644 06-23 Learning to Insert for Constructive Neural Vehicle Routing Solver Einfügen lernen für konstruktive Neural Vehicle Routing Solver 用于建设型神经车辆路标解答器的“插入学习” 2505.13904v2 -
645 06-23 Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning Shift Happens: Mischung aus Experten basierende kontinuierliche Anpassung im Federated Learning 变化发生:基于专家的混合组合,在联邦学习中持续适应 2506.18789v1 -
646 06-23 A generalized neural tangent kernel for surrogate gradient learning Ein generalisierter neuronaler Tangentenkern für das Erlernen von Surrogatgradienten 用于代用梯度学习的普遍神经相近内核 2405.15539v2 -
647 06-23 Reasoning Limitations of Multimodal Large Language Models. A Case Study of Bongard Problems Begründung von Einschränkungen multimodaler Großsprachenmodelle. Eine Fallstudie zu Bongard-Problemen 多种多式大语言模型的理由限制,邦格问题案例研究 2411.01173v2 -
648 06-23 The Impact of Input Order Bias on Large Language Models for Software Fault Localization Die Auswirkungen der Eingabereihenfolge Bias auf große Sprachmodelle für Softwarefehlerlokalisierung 输入顺序对软件失错本地化大语言模式的影响 2412.18750v3 -
649 06-23 Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training Programmierung durch Backprop: LLMs erwerben wiederverwendbare algorithmische Abstraktionen während des Code-Trainings 通过反向传播编程:LLM在代码训练期间习得可重用的算法抽象 2506.18777v1 -
650 06-23 Fast Bayesian Optimization of Function Networks with Partial Evaluations Schnelle Bayesian Optimierung von Funktionsnetzwerken mit teilweisen Bewertungen 利用部分评价优化功能网络 2506.11456v2 -
651 06-23 DPG loss functions for learning parameter-to-solution maps by neural networks DPG-Verlustfunktionen für das Lernen von Parameter-zu-Lösung-Karten durch neuronale Netzwerke 神经网络学习参数图解图的DPG损失函数 2506.18773v1 -
652 06-23 Neural Total Variation Distance Estimators for Changepoint Detection in News Data Neurale Gesamtvariationsdistanz-Schätzer für Changepoint Detection in News Daten 用于新闻数据中变化点探测变化点的神经总变化 2506.18764v1 -
653 06-23 Local Averaging Accurately Distills Manifold Structure From Noisy Data Lokale Mittelung genau destilliert manifold Struktur aus geräuschreichen Daten 从噪音数据生成的本地蒸馏处理结构 2506.18761v1 -
654 06-23 Robust Anomaly Detection in Network Traffic: Evaluating Machine Learning Models on CICIDS2017 Robuste Anomalieerkennung im Netzwerkverkehr: Bewertung von Machine Learning-Modellen auf CICIDS2017 网络交通中的强力异常探测:评价CICIDS2017的机械学习模式 2506.19877v1 -
655 06-23 SEAL: Scaling to Emphasize Attention for Long-Context Retrieval SEAL: Skalierung zur Betonung der Aufmerksamkeit für Long-Context-Retrieval SEAL: 通过缩放强调注意力以实现长上下文检索 2501.15225v2 -
656 06-23 Sensitivity Analysis of Image Classification Models using Generalized Polynomial Chaos Sensitivitätsanalyse von Bildklassifikationsmodellen mit Generalized Polynomial Chaos 利用普遍化的多面性混乱现象分析图像分类模型的敏感性分析 2506.18751v1 -
657 06-23 ContinualFlow: Learning and Unlearning with Neural Flow Matching ContinualFlow: Lernen und Nichtlernen mit neuralem Fluss passend 连续花:与神经流动匹配学习和不学习 2506.18747v1 -
658 06-23 Fast State-Augmented Learning for Wireless Resource Allocation with Dual Variable Regression Schnelles State-Augmented-Lernen für drahtlose Ressourcenallokation mit Dual Variable Regression 以双重变量递减为无线资源分配快速国家强化学习 2506.18748v1 -
659 06-23 DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation DiffDesign: Steuerbare Diffusion mit Meta Prior für effiziente Interior Design Generation DiffDesign: 用于高效室内设计生成的带元先验的可控扩散 2411.16301v3 -
660 06-23 Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments Experimentieren, schnell und langsam: Bayesische Optimierung langfristiger Ergebnisse mit Online-Experimenten 实验、快速和缓慢:利用在线实验优化长期成果 2506.18744v1 -
661 06-23 On the Existence of Universal Simulators of Attention Über die Existenz universeller Simulatoren der Aufmerksamkeit 全世界关注模拟器的存在 2506.18739v1 -
662 06-23 Towards Group Fairness with Multiple Sensitive Attributes in Federated Foundation Models Auf dem Weg zu Gruppengerechtigkeit mit mehreren sensiblen Attributen in Federated Foundation Models 争取在联邦基金会模式中实现多敏感属性集团公平 2506.18732v1 -
663 06-23 When to Forget? Complexity Trade-offs in Machine Unlearning Wann vergessen? Komplexität Trade-offs in Machine Unlearning 何时忘却? 机器不学习的复杂权衡取舍 2502.17323v2 -
664 06-23 Learning interpretable positional encodings in transformers depends on initialization Das Lernen interpretierbarer Positionskodierungen in Transformatoren hängt von der Initialisierung ab 变压器中学习可解释的位置编码取决于初始化 2406.08272v4 -
665 06-23 Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition Einbeziehung semantischer Informationen über Word-Embeddings für skeletonbasierte Aktionserkennung 通过词嵌入引入语义信息,用于基于骨架的动作识别 2506.18721v1 -
666 06-23 Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation Multimodaler Ankerverteiler mit Wissensdestillation zur Emotionserkennung im Gespräch 具有知识蒸馏的多式锁定器变异器,在对话中承认情感 2506.18716v1 -
667 06-23 PC-SRGAN: Physically Consistent Super-Resolution Generative Adversarial Network for General Transient Simulations PC-SRGAN: Physikalisch konsistente Super-Resolution Generatives Adversarial Network für allgemeine Transientensimulationen PC-SRGAN: 通用中转模拟器实际一致的超分辨率生成反反向网络 2505.06502v2 -
668 06-23 Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition Kontext Biasing für Aussprachen-Orthographie Missverhältnis in der automatischen Spracherkennung 自动语音识别中出现偏差以引发-正正对学误差的背景情况 2506.18703v1 -
669 06-23 SaGIF: Improving Individual Fairness in Graph Neural Networks via Similarity Encoding SaGIF: Individuelle Fairness in Graphen-Neuralen Netzwerken durch Ähnlichkeitskodierung verbessern SaGIF:通过相似编码提高图形神经网络的个人公平性 2506.18696v1 -
670 06-23 BAnG: Bidirectional Anchored Generation for Conditional RNA Design BAnG: Bidirektionale Anchored Generation für Conditional RNA Design BAnG: 用于条件RNA设计的双向锚定生成 2502.21274v2 -
671 06-23 One Step Diffusion via Shortcut Models Ein Schritt Diffusion über Shortcut-Modelle 通过快捷键模型进行单步扩散 2410.12557v3 -
672 06-23 VesselGPT: Autoregressive Modeling of Vascular Geometry VesselGPT: Autoregressive Modellierung der Gefäßgeometrie VesselGPT: 血管几何的自回归建模 2505.13318v2 -
673 06-23 A Random Matrix Analysis of In-context Memorization for Nonlinear Attention Eine Zufallsmatrixanalyse der In-Kontext-Memorisierung für nichtlineare Aufmerksamkeit 非线性注意力的上下文内记忆的随机矩阵分析 2506.18656v1 -
674 06-23 Tight Generalization Error Bounds for Stochastic Gradient Descent in Non-convex Learning Enge Verallgemeinerungsfehler-Bounds für stochastische Gradient Descent in Non-convex-Lernen 非凸学习中随机梯度下降的紧泛化误差界 2506.18645v1 -
675 06-23 On Union-Closedness of Language Generation Zur Unions-Schließung der Sprachgenerierung 关于联合语言一代的关闭 2506.18642v1 -
676 06-23 Federated Loss Exploration for Improved Convergence on Non-IID Data Föderated Loss Exploration für verbesserte Konvergenz auf nicht-IID-Daten 改进关于非IID数据的趋同的联邦损失探索 2506.18640v1 -
677 06-23 Granular-Ball-Induced Multiple Kernel K-Means Granular-Ball-induzierter Mehrfach-Kernel K-Means 粒球诱导的多核K-Means 2506.18637v1 -
678 06-23 Trustworthy Prediction with Gaussian Process Knowledge Scores Vertrauenswürdige Vorhersage mit Gaussian Prozess Wissen Scores 高斯进程知识分数的可信赖的预测 2506.18630v1 -
679 06-23 On Equivariant Model Selection through the Lens of Uncertainty Bei gleicher Modellauswahl durch das Lens of Uncertainty 通过不确定性的镜头进行等同模型选择 2506.18629v1 -
680 06-23 Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits Multi-Agenten-Verstärkungs-Lernen für Inverses Design in photonischen integrierten Schaltungen 光感集成电路反设计多机构强化学习 2506.18627v1 -
681 06-23 Bures-Wasserstein Flow Matching for Graph Generation Bures-Wasserstein-Durchfluss passend für die Graphenerzeugung Bures-Wasserstein 图表生成匹配流程 2506.14020v2 -
682 06-23 Prédiction optimale pour un modèle ordinal à covariables fonctionnelles 2506.18615v1 -
683 06-23 Policy gradient methods for ordinal policies Politikgradientenmethoden für Ordinalpolitiken 通常政策的政策梯度方法 2506.18614v1 -
684 06-23 SHAMaNS: Sound Localization with Hybrid Alpha-Stable Spatial Measure and Neural Steerer SHAMaNS: Klanglokalisierung mit hybrider Alpha-stabiler Raummessung und neuralem Steerer SHAMaNS: 基于混合α稳定空间测度和神经导向器的声源定位 2506.18954v1 -
685 06-23 Simulation-Free Differential Dynamics through Neural Conservation Laws Simulationsfreie Differentialdynamik durch neurale Erhaltungsgesetze 通过神经保护法实现无模拟-无差异动态 2506.18604v1 -
686 06-23 BulletGen: Improving 4D Reconstruction with Bullet-Time Generation BulletGen: Verbesserung der 4D-Rekonstruktion mit Bullet-Time-Generation BulletGen: 利用子弹时间生成改进4D重建 2506.18601v1 -
687 06-23 No Training Wheels: Steering Vectors for Bias Correction at Inference Time Keine Trainingsräder: Lenk-Vektoren für Bias-Korrektur zur Inferenzzeit 无培训轮:推论时间比亚更正指导矢量 2506.18598v1 -
688 06-23 SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds SpaNN: Erkennung mehrerer Adversarial Patches auf CNNs durch Spanning Saliency Thresholds SPANN: 透过透视阈值在CNN上检测多反向补丁 2506.18591v1 -
689 06-23 Optimization-Induced Dynamics of Lipschitz Continuity in Neural Networks Optimierungs-induzierte Dynamik der Lipschitz-Kontinuität in neuralen Netzwerken 神经网络中利普西茨连续性的优化-引导动态 2506.18588v1 -
690 06-23 Radio Map Prediction from Aerial Images and Application to Coverage Optimization Radio Map Vorhersage von Luftbildern und Anwendung in die Reichweite Optimierung 从空中图像和应用于最佳报道优化的无线电地图预测 2410.17264v2 -
691 06-23 Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning Effiziente Strahlauswahl für ISAC im zellfreien Massiv MIMO über digitales Twin Assisted Deep Reinforcement Learning 通过数字双互助深层强化学习,在无细胞大规模MIMO中高效选择ISAC 2506.18560v1 -
692 06-23 Soft decision trees for survival analysis Weiche Entscheidungsbäume für die Überlebensanalyse 用于生存分析的软决策树 2506.16846v2 -
693 06-23 Accurate early detection of Parkinson’s disease from SPECT imaging through Convolutional Neural Networks Präzise Früherkennung der Parkinson-Krankheit durch SPECT-Bildgebung durch konvolutionäre neurale Netzwerke 通过进化神经网络从SPECT成像中准确早期检测帕金森病 2412.05348v2 -
694 06-23 AutoPDL: Automatic Prompt Optimization for LLM Agents AutoPDL: Automatische Prompt-Optimierung für LLM-Agenten AutoPDL:LLM智能体的自动提示优化 2504.04365v2 -
695 06-23 Hidden Breakthroughs in Language Model Training Versteckte Durchbrüche im Sprachmodelltraining 语言模式培训中的隐藏突破 2506.15872v2 -
696 06-23 Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning Transformer-Weltmodell für Proben Effizientes Mehr-Agenten-Verstärkungs-Lernen 取样效率高的多机构强化学习世界模式 2506.18537v1 -
697 06-23 Affordable AI Assistants with Knowledge Graph of Thoughts Erschwingliche KI-Assistenten mit Wissensgrafik der Gedanken 具有知识思想知识图的负担得起的AI助理 2504.02670v4 -
698 06-23 Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning Multi-Stage-Manipulation mit Demonstrations-Augmented Reward, Politik und World Model Learning 以示范性奖励、政策和世界示范学习模式进行多层次处理 2503.01837v2 -
699 06-23 End-to-End Spoken Grammatical Error Correction End-to-End-Spoken Grammatical Error Correction 端端到端口语语语法错误校正 2506.18532v1 -
700 06-23 A Set-to-Set Distance Measure in Hyperbolic Space Eine eingestellte Distanzmessung im Hyperbolischen Raum 超曲向空间的定位到 set- set 距离测量 2506.18529v1 -
701 06-23 Federated Learning from Molecules to Processes: A Perspective Föderiertes Lernen von Molekülen zu Prozessen: Eine Perspektive 从分子到过程的联邦学习:视角 2506.18525v1 -
702 06-23 DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System Modeling DDOT: Ein Derivativ-gerichteter Dual-Decoder-Normaldifferentialgleichungstransformator für dynamische Systemmodellierung DDOT: 用于动态系统建模的衍生式双向双向脱coder普通差异等同变换器 2506.18522v1 -
703 06-23 Machine-learning based high-bandwidth magnetic sensing Machine-Learning-basierte High-Bandwidth-Magnet-Sensoring 基于机械学习的高带宽磁遥感 2409.12820v2 -
704 06-23 Theoretical guarantees for neural estimators in parametric statistics Theoretische Garantien für neuronale Schätzer in der parametrischen Statistik 参数统计中神经测算员的理论保障 2506.18508v1 -
705 06-23 Indeterminate Probability Theory Unbestimmte Wahrscheinlichkeitstheorie 不确定概率理论 2303.11536v2 -
706 06-23 PuckTrick: A Library for Making Synthetic Data More Realistic PuckTrick: Eine Bibliothek, um synthetische Daten realistischer zu machen PuckTrick:一个使合成数据更加现实的图书馆 2506.18499v1 -
707 06-23 SPoRt – Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL SPoRt – Safe Policy Ratio: Zertifizierte Schulung und Bereitstellung von Task-Richtlinien in modellfreier RL SPoRt – 安全政策比率:无模式RL中任务政策的认证培训和部署 2504.06386v2 -
708 06-23 Leveraging neural network interatomic potentials for a foundation model of chemistry Nutzung von interatomaren Potenzialen für ein Grundlagenmodell der Chemie 为化学基础模型发挥神经网络互动潜力 2506.18497v1 -
709 06-23 Disentangling representations of retinal images with generative models Entwirrende Darstellungen von retinalen Bildern mit generativen Modellen 用基因模型拆分视视视像图像的形状 2402.19186v3 -
710 06-23 AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing AnalogNAS-Bench: Ein NAS-Benchmark für analoges In-Memory Computing AnalogNAS-Bench:NAS模拟计算基准 2506.18495v1 -
711 06-23 Reliability-Adjusted Prioritized Experience Replay Reliability-Adjusted Prioritized Experience Replay 调整了可靠性调整后确定优先经验重述 2506.18482v1 -
712 06-23 FREQuency ATTribution: Benchmarking Frequency-based Occlusion for Time Series Data FREQuency ATTribution: Benchmarking Frequenzbasierte Okklusion für Zeitreihendaten FREQuency ATTribution:时间序列数据基于频率的遮挡基准 2506.18481v1 -
713 06-23 LLMs on a Budget? Say HOLA LLMs auf einem Budget? Sagen Sie HOLA 预算LLLM 预算? 2506.18952v1 -
714 06-23 A Deep Convolutional Neural Network-Based Novel Class Balancing for Imbalance Data Segmentation Eine tiefkonvolutionäre neurale Netzwerk-basierte neuartige Klassenbalancing für Ungleichgewicht-Datensegmentierung 以深革命神经网络为基础、基于深度神经网络的新奇分类平衡,以平衡数据分割 2506.18474v1 -
715 06-23 A Motivational Architecture for Open-Ended Learning Challenges in Robots Eine motivierende Architektur für offene Lernherausforderungen in Robotern 机器人中开放式学习挑战的动力结构 2506.18454v1 -
716 06-23 xInv: Explainable Optimization of Inverse Problems xInv: Erklärbare Optimierung inverser Probleme xInv: 反向问题的可解释优化 2506.11056v2 -
717 06-23 TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning TreeSynth: Vielfältige Daten von Grund auf über baumgeführte Subraumpartitionierung synthetisieren TreeSynth: 通过树引导的子空间划分从零开始合成多样化数据 2503.17195v2 -
718 06-23 LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently LoRA-One: Ein-Schritt-Full Gradient könnte für das Fine-Tuning großer Sprachmodelle genügen, beweisbar und effizient LoRA-One: 单步全梯度即可可证明且高效地微调大语言模型 2502.01235v3 -
719 06-23 New Hardness Results for Low-Rank Matrix Completion Neue Härte-Ergebnisse für Low-Rank-Matrix-Vervollständigung 低 Rank 矩阵补全新硬性结果 2506.18440v1 -
720 06-23 Thermal Vision: Pioneering Non-Invasive Temperature Tracking in Congested Spaces Thermische Vision: Pionierische nicht-invasive Temperaturverfolgung in überlasteten Räumen 热远景:在拥挤空间进行最先出现的非侵入性温度跟踪 2412.00863v2 -
721 06-23 Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations Harmony: Ein gemeinsamer selbstüberwachter und schwach-überwachter Rahmen für das Lernen von allgemeinen visuellen Repräsentationen 和谐:学习一般目的视觉表现的共同自我监督、弱力监督框架 2405.14239v3 -
722 06-23 How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models Wie robust ist Modellbearbeitung nach Feinsteuerung? Eine empirische Studie zu Text-zu-Bild-Diffusionsmodellen 微调后模型编辑的力度如何? 文本到图像传播模型的经验研究 2506.18428v1 -
723 06-23 Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models Circuit Compositions: Erforschen von modularen Strukturen in transformerbasierten Sprachmodellen 电路构成:探索以变换语言模式为基础的模块结构 2410.01434v3 -
724 06-23 An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets Ein erweiterter Benchmark, der den Rand der Ungewissheit für aktives Lernen in Tabellendatensätzen bestätigt 扩大基准范围,在表格数据集中重新覆盖并肯定不确定抽样的边缘,以便积极学习 2306.08954v3 -
725 06-23 FARCLUSS: Fuzzy Adaptive Rebalancing and Contrastive Uncertainty Learning for Semi-Supervised Semantic Segmentation FARCLUSS: Fuzzy Adaptive Rebalancing and Contrastive Uncertainty Learning für semi-überwachte semantische Segmentierung FARCLUSS: 用于半监督语义分割的模糊自适应再平衡和对比不确定性学习 2506.11142v2 -
726 06-23 Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings Generative Modellierung von Voll-Atom-Protein-Konformationen mit Latent Diffusion auf Graph-Embeddings 利用在图形嵌入器上延迟扩散生成全原子蛋白质变形的生成模型 2506.17064v2 -
727 06-23 Optimizing Sensory Neurons: Nonlinear Attention Mechanisms for Accelerated Convergence in Permutation-Invariant Neural Networks for Reinforcement Learning Sensorische Neuronen optimieren: Nichtlineare Aufmerksamkeitsmechanismen für beschleunigte Konvergenz in Permutations-Invarianten Neuralen Netzwerken für Verstärkungslernen 优化感知神经中世纪:在加强学习的常变-内在神经网络中加速趋同的非线性注意机制 2506.00691v4 -
728 06-23 Online high-precision prediction method for injection molding product weight by integrating time series/non-time series mixed features and feature attention mechanism Online-Präzisionsvorhersageverfahren für das Gewicht des Spritzgussprodukts durch Integration von Zeitreihen/Nicht-Zeitreihen gemischte Funktionen und Feature-Aufmerksamkeitsmechanismus 通过将时间序列/非时间序列混合特点和特征关注机制相结合,对注入模型产品重量的在线高精确度预测方法 2506.18950v1 -
729 06-23 ADNF-Clustering: An Adaptive and Dynamic Neuro-Fuzzy Clustering for Leukemia Prediction ADNF-Clustering: Adaptives und dynamisches Neuro-Fuzzy-Clustering für Leukämie-Vorhersage ADNF-Clustering:用于白血病预测的自适应动态神经模糊聚类 2506.18396v1 -
730 06-23 Reliable Vertical Federated Learning in 5G Core Network Architecture Zuverlässiges vertikales Federated Learning in 5G Core Network Architecture 5G核心网络架构中的可靠垂直联邦学习 2505.15244v3 -
731 06-23 SLR: An Automated Synthesis Framework for Scalable Logical Reasoning SLR: Ein automatisiertes Synthese-Framework für skalierbare logische Vernunft SLR: 一个可缩放逻辑理由的自动合成框架 2506.15787v2 -
732 06-23 LOGICPO: Efficient Translation of NL-based Logical Problems to FOL using LLMs and Preference Optimization LOGICPO: Effiziente Übersetzung von NL-basierten Logischen Problemen in FOL mittels LLMs und Preference Optimization LOGICPO:利用LLM和优先优化将基于NL的逻辑问题有效翻译给FOL 2506.18383v1 -
733 06-23 PERSCEN: Learning Personalized Interaction Pattern and Scenario Preference for Multi-Scenario Matching PERSCEN: Lerne personalisierte Interaktionsmuster und Szenarioeinstellungen für Multi-Szenario-Matching PERSCEN: 学习个人化互动模式和多情景匹配情景 2506.18382v1 -
734 06-23 Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space Ganzheitliche Physik Solver: Lernen von PDEs in einem einheitlichen Spektral-Physischen Raum 综合物理解答器:在统一光谱物理空间学习PDE 2410.11382v2 -
735 06-23 Persistent Sampling: Enhancing the Efficiency of Sequential Monte Carlo Persistente Probenahme: Verbesserung der Effizienz von Sequential Monte Carlo 持久性抽样:提高按顺序排列的蒙特卡洛的效率 2407.20722v3 -
736 06-23 Recent Trends in Artificial Intelligence Technology: A Scoping Review Jüngste Trends in der Künstlichen Intelligenz-Technologie: Eine Bewertung 《人造情报技术的近期趋势:范围审查》 2305.04532v3 -
737 06-23 Factual Knowledge in Language Models: Robustness and Anomalies under Simple Temporal Context Variations Factual Knowledge in Language Models: Robustheit und Anomalien unter einfachen zeitlichen Kontextvariationen 语言模型中的事实知识:简单时间环境变化下的强力和异常现象 2502.01220v6 -
738 06-23 DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy DipLLM: Feinsteuerungs-LLM für strategische Entscheidungsfindung in der Diplomatie DipLLM: 外交战略决策的精细推荐LLM 2506.09655v2 -
739 06-23 Global Context-aware Representation Learning for Spatially Resolved Transcriptomics Global Context-aware Representative Learning for Spatially Resolved Transcriptomics 空间解决中转技术学全球背景意识代表制学习 2506.15698v2 -
740 06-23 RePST: Language Model Empowered Spatio-Temporal Forecasting via Semantic-Oriented Reprogramming RePST: Sprachmodell empowered Spatio-Temporal Forecasting via Semantisch-orientierte Reprogrammierung REPST:通过以语义为主的重新编制方案来进行语言模型增强能力SPA-时间预报 2408.14505v3 -
741 06-23 SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation SlimMoE: Strukturierte Kompression großer MoE-Modelle über Expert Slimming und Destillation SlimMoE:通过专家攀爬和蒸馏对大型MOE模型进行结构性压缩 2506.18349v1 -
742 06-23 Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration Bohdi: Heterogene LLM Fusion mit automatischer Datenexploration Bohdi: 与自动数据探索相混合的异基因LLM 2506.15721v2 -
743 06-23 LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots LoopSR: Looping Sim-and-Real für lebenslange politische Anpassung von Legged Robots 环圈:为终身政策调整而环绕定形机器人终身政策 2409.17992v2 -
744 06-23 Dynamic Hybrid Modeling: Incremental Identification and Model Predictive Control Dynamische Hybridmodellierung: Inkrementelle Identifikation und Modellvorhersagesteuerung 动态混合模型:递增识别和模型预测控制 2506.18344v1 -
745 06-23 Controlled Generation with Equivariant Variational Flow Matching Kontrollierte Generation mit äquivarianter Variations-Flow-Matching 具有等同变化流动比对的受控生产 2506.18340v1 -
746 06-23 Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics Strukturierte Kolmogorov-Arnold-Neurale ODEs für interpretierbares Lernen und symbolische Entdeckung nichtlinearer Dynamik 结构化Kolmogorov-Arnold神经ODE:用于可解释学习和非线性动力学的符号发现 2506.18339v1 -
747 06-23 Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations? Escaping the SpuriVerse: Können große Vision-Language-Modelle über gesehene Scheinkorrelationen hinaus verallgemeinern? 摆脱SpuriVerse:大型视觉语言模型能否泛化到已见过的虚假相关之外? 2506.18322v1 -
748 06-23 A Transformer-Based Approach for Diagnosing Fault Cases in Optical Fiber Amplifiers Ein transformerbasierter Ansatz zur Diagnose von Fehlerfällen in optischen Faserverstärkern 光纤放大器中分析过失案例的以变换器为基础的方法 2505.06245v2 -
749 06-23 BrainSymphony: A Transformer-Driven Fusion of fMRI Time Series and Structural Connectivity BrainSymphony: Eine transformergetriebene Fusion von fMRI-Zeitreihen und struktureller Konnektivität 脑交响乐:FMRI时间序列和结构连接的变异器-驱动融合 2506.18314v1 -
750 06-23 Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies Schärfung der Speere: Adaptive, von Experten geleitete Gegenangriffe auf die DRL-basierte autonome Fahrpolitik 尖尖尖尖尖:适应性专家指导对基于DRL的自主驾驶政策进行反反向攻击 2506.18304v1 -
751 06-23 Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution Kollaborative Mean Abschätzung unter Heterogenen strategischen Agenten: Individuelle Rationalität, Fairness und Wahrheitsbeitrag 不同不同战略媒介之间合作平均估计:个人合理性、公平性和真实贡献 2407.15881v2 -
752 06-23 Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment Interpretation eines Deep-Learning-Modells bei der Embryo-Auswahl für die In-vitro-Fertilisation (IVF) 体外受精(IVF)治疗胚胎选择中深度学习模型的解释 2506.06680v2 -
753 06-23 AFBS:Buffer Gradient Selection in Semi-asynchronous Federated Learning AFBS: Puffer-Gradienten-Auswahl im semi-asynchronen Föderierten Lernen AFBS: 半同步联邦学习中的缓分级选择 2506.12754v2 -
754 06-23 GeNeRT: A Physics-Informed Approach to Intelligent Wireless Channel Modeling via Generalizable Neural Ray Tracing GeNeRT: Ein physik-informierter Ansatz zur intelligenten drahtlosen Kanalmodellierung via Generalizable Neural Ray Tracing GENERT:通过可通用神经射线追踪对智能无线频道建模的物理综合方法 2506.18295v1 -
755 06-23 Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction Instabilität in Diffusions-ODEs: Eine Erklärung für die ungenaue Bildrekonstruktion DFDODEs的不稳定性:不准确图像重建的解释 2506.18290v1 -
756 06-23 LoRA vs Full Fine-tuning: An Illusion of Equivalence LoRA vs. Full Fine-Tuning: Eine Illusion der Gleichwertigkeit LoRA 与 完全微调: 等同的幻象 2410.21228v2 -
757 06-23 Learning High-Quality Latent Representations for Anomaly Detection and Signal Integrity Enhancement in High-Speed Signals Lernen von hochwertigen Latentendarstellungen für Anomalieerkennung und Signalintegritätsverbesserung in Hochgeschwindigkeitssignalen 高频信号中反常探测和信号完整性增强的学习高品质低端显示器 2506.18288v1 -
758 06-23 Learning Causal Graphs at Scale: A Foundation Model Approach Das Lernen von Kausalgraphen im Maßstab: Ein Basismodellansatz 规模化学习性因果图表:基础模式方法 2506.18285v1 -
759 06-23 Quantifying Uncertainty in the Presence of Distribution Shifts Quantifizierung der Unsicherheit in der Gegenwart von Verteilungsverschiebungen 分配变更存在不确定性的量化 2506.18283v1 -
760 06-23 Phase retrieval with rank $d$ measurements – \emph{descending} algorithms phase transitions Phase Retrieval mit Rang $d$ Messungen – \emph{descending} Algorithmen Phasenübergänge 秩为 $d$ 的测量下的相位恢复 – \emph{descending} 算法的相变 2506.18282v1 -
761 06-23 Hallucination Level of Artificial Intelligence Whisperer: Case Speech Recognizing Pantterinousut Rap Song Halluzinationsgrad der Künstlichen Intelligenz Whisperer: Case Speech Anerkennung von Pantterinousut Rap Song 人造情报自言自语:承认 “ 潘特罗宁自唱 “ 的个案发言 2506.16174v2 -
762 06-23 Optimal spectral initializers impact on phase retrieval phase transitions – an RDT view Optimale spektrale Initialisatoren wirken sich auf Phasenabfragephasenübergänge aus – eine RDT-Ansicht 最佳光谱初始化器对阶段回收阶段过渡的影响 – – RDT观点 2506.18279v1 -
763 06-23 Fast Rate Information-theoretic Bounds on Generalization Errors Schnelle Rate Information-theoretische Grenzen auf Verallgemeinerungsfehler 通用误差信息理论误差快速速速率 2303.14658v3 -
764 06-23 Finite-Time Information-Theoretic Bounds in Queueing Control Finite-Time Information-Theoretische Bounds in Queueing Control 定队控制中的短时信息理论环 2506.18278v1 -
765 06-23 Phase transition of \emph{descending} phase retrieval algorithms Phasenübergang von \emph{descending} Phasen-Retrieval-Algorithmen \emph{descending} 相位恢复算法的相变 2506.18275v1 -
766 06-23 Leveraging Large Language Models for Information Verification – an Engineering Approach Nutzung großer Sprachmodelle für die Informationsverifizierung – ein Engineering-Ansatz 利用大语言模型进行信息核查 – – 工程方法 2506.18274v1 -
767 06-23 When Large Language Models Meet Vector Databases: A Survey Wenn große Sprachmodelle Vektordatenbanken treffen: Eine Umfrage 当大语言模型与矢量数据库相匹配时:调查 2402.01763v4 -
768 06-23 Evolutionary Optimization of Physics-Informed Neural Networks: Evo-PINN Frontiers and Opportunities Evolutionäre Optimierung physikinformierter neuraler Netzwerke: Evo-PINN Grenzen und Chancen 物理内化神经网络的优化演变:Evo-PINN的前沿和机会 2501.06572v3 -
769 06-23 Memory-Augmented Architecture for Long-Term Context Handling in Large Language Models Memory-Augmented Architecture für langfristiges Kontext-Handling in großen Sprachmodellen 大语言模型长期背景处理的记忆强化建筑 2506.18271v1 -
770 06-23 ARD-LoRA: Dynamic Rank Allocation for Parameter-Efficient Fine-Tuning of Foundation Models with Heterogeneous Adaptation Needs ARD-LoRA: Dynamische Rangverteilung für Parameter-Effiziente Feineinstellung von Grundmodellen mit heterogenen Anpassungsbedürfnissen ARD-LORA: 具有不同差异适应需要的基础模型参数有效精密设计动态排名分配 2506.18267v1 -
771 06-23 Incentives for Responsiveness, Instrumental Control and Impact Anreize für Reaktionsfähigkeit, Instrumentenkontrolle und Wirkung 反应、工具控制和影响奖励措施 2001.07118v3 -
772 06-23 FutureFill: Fast Generation from Convolutional Sequence Models FutureFill: Schnelle Generierung aus Faltungs-Sequenzmodellen FutureFill:卷积序列模型的快速生成 2410.03766v3 -
773 06-23 AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining AdaLRS: Loss-Guided Adaptive Learning Rate Suche nach effizientem Foundation Model Pretraining AdaLRS: 为高效基础基础示范培训前而寻找学习率 2506.13274v2 -
774 06-23 MGHF: Multi-Granular High-Frequency Perceptual Loss for Image Super-Resolution MGHF: Multi-Granulare High-Frequency Perceptual Loss für Bild-Super-Resolution MGHF: 图像超分辨率的多语言高频感知损失 2411.13548v2 -
775 06-23 Ground tracking for improved landmine detection in a GPR system Bodenverfolgung für eine verbesserte Landminenerkennung in einem GPR-System 在GPR系统中改进地雷探测的地面跟踪 2506.18258v1 -
776 06-23 RLPR: Extrapolating RLVR to General Domains without Verifiers RLPR: Extrapolieren von RLVR auf allgemeine Domains ohne Prüfer RLPR: 将RLVR外推至普通域域,无验证符 2506.18254v1 -
777 06-23 DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic DSAC-C: Beschränkte maximale Entropie für robuste diskrete Soft-Actor-Kritik DSAC- C: 软盘分解软控控控控控控最大导体 2310.17173v2 -
778 06-23 Exploring Efficient Quantification of Modeling Uncertainties with Differentiable Physics-Informed Machine Learning Architectures Effiziente Quantifizierung von Modellierungsunsicherheiten mit differenzierten physikinformierten Machine Learning-Architekturen 探索对以不同物理和机械化学习架构建模的不确定性模型化进行高效率的量化 2506.18247v1 -
779 06-23 Dual-Forward Path Teacher Knowledge Distillation: Bridging the Capacity Gap Between Teacher and Student Dual-Forward-Path-Teacher-Wissensdestillation: Überbrückung der Kapazitätslücke zwischen Lehrer und Schüler 双前向路径教师知识蒸馏:弥合师生容量差距 2506.18244v1 -
780 06-23 Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Chain-of-Experts: Entsperren der Kommunikationskraft von Mixture-of-Experts-Modellen 专家链:解锁混合专家模型的通信能力 2506.18945v1 -
781 06-23 Uncertainty-aware Efficient Subgraph Isomorphism using Graph Topology Unsicherheitsbewusster effizienter Subgraph-Isomorphismus mittels Graphtopologie 利用图拓扑的不确定性感知高效子图同构 2209.09090v3 -
782 06-23 LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs LLM Web Dynamics: Aufspüren eines Modellkollapses in einem Netzwerk von LLMs LLM 网络动态:追踪在LLM网络中的模型崩溃情况 2506.15690v2 -
783 06-23 AdapThink: Adaptive Thinking Preferences for Reasoning Language Model AdapThink: Adaptive Denkpräferenzen für Reasoning-Sprachmodelle AdapThink:推理语言模型的自适应思维偏好 2506.18237v1 -
784 06-23 ASGO: Adaptive Structured Gradient Optimization ASGO: Adaptive Strukturierte Gradientenoptimierung ASGO: 适应结构结构梯度优化 2503.20762v2 -
785 06-23 Cross-Architecture Knowledge Distillation (KD) for Retinal Fundus Image Anomaly Detection on NVIDIA Jetson Nano Cross-Architecture Knowledge Destillation (KD) für retinale Fundus-Bildanomalieerkennung auf NVIDIA Jetson Nano NVIDIA Jetson Nano 图像异常探测跨建筑知识蒸馏(KD) 2506.18220v1 -
786 06-23 Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales Symmetrischer Verstärkungs-Lernverlust für robustes Lernen auf unterschiedlichen Aufgaben und Modellskalen 不同任务和模式规模的有力学习的对称强化学习损失 2405.17618v3 -
787 06-23 Cost-Aware Routing for Efficient Text-To-Image Generation Kostenbewusstes Routing für eine effiziente Text-zu-Bild-Generierung 高效生成文本到图像的成本-软件路由 2506.14753v2 -
788 06-23 Distributionally Robust Active Learning for Gaussian Process Regression Distributionell robustes aktives Lernen für Gaußsche Prozessregression Gaussian 进程倒退的分布强力积极学习 2502.16870v2 -
789 06-22 (7) BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning BLAZE: Cross-Language und Cross-Project Bug Lokalisierung über Dynamic Chunking und Hard Example Learning BLAZE:通过动态打字和硬实例学习实现跨语言和跨项目错误定位 2407.17631v3 -
790 06-22 Data-driven Discovery of Biophysical T Cell Receptor Co-specificity Rules Datengesteuerte Entdeckung der biophysikalischen T-Zellrezeptor-Kospezifitätsregeln 以数据驱动的数据驱动的生物物理细胞受体受体发现 2412.13722v3 -
791 06-22 Joint Embedding Predictive Architecture for self-supervised pretraining on polymer molecular graphs Joint Embedding Predictive Architecture für selbstüberwachtes Pretraining auf Polymer-Molekulargraphen 联合嵌入式联合预测结构,以进行关于聚合分子图的自我监督的预培训 2506.18194v1 -
792 06-22 DeInfoReg: A Decoupled Learning Framework for Better Training Throughput DeInfoReg: Ein entkoppelter Lernrahmen für besseren Trainingsdurchsatz DeInfoReg:提高训练吞吐量的解耦学习框架 2506.18193v1 -
793 06-22 Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion Stabilisierung des zeitlichen Unterschieds Lernen durch implizite stochastische Rekursion 通过隐性蒸气回流稳定时间差异学习 2505.01361v2 -
794 06-22 Call Me Maybe: Enhancing JavaScript Call Graph Construction using Graph Neural Networks Rufen Sie mich vielleicht an: Verbesserung der JavaScript Call Graph Construction mit Graph Neural Networks 使用图形神经网络加强 JavaScript 呼叫图图建设 2506.18191v1 -
795 06-22 The Impact of Medication Non-adherence on Adverse Outcomes: Evidence from Schizophrenia Patients via Survival Analysis Die Wirkung von Arzneimittelmangel auf unerwünschte Ergebnisse: Nachweise von Schizophreniepatienten durch Überlebensanalyse 《不遵守药品对不利结果的影响:通过生存分析从精神病患者那里得到的证据》 2506.18187v1 -
796 06-22 Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels Online-Lernen von Whittle-Indizes für ruhelose Banditen mit nicht-stationären Transition-Kerneln 在线学习无休无休止强盗用非稳定过渡核心的Whittle Indists在线学习 2506.18186v1 -
797 06-22 Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba Memba: Membrangetriebenes parametereffizientes Fine-Tuning für Mamba Memba:面向Mamba的膜驱动参数高效微调 2506.18184v1 -
798 06-22 Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models Halluzination-Aware Multimodaler Benchmark für die gastrointestinale Bildanalyse mit großen Vision-Sprachenmodellen 使用大型视觉语言模型分析胃肠图象的幻觉-软件多式基准 2505.07001v2 -
799 06-22 Fast and Accurate Power Load Data Completion via Regularization-optimized Low-Rank Factorization Schnelle und präzise Leistungslastdatenvervollständigung über Regularisierungsoptimierte Low-Rank-Fabrikisierung 通过正规化、优化低射速电荷因子化完成快速和准确电源负载数据 2505.19133v2 -
800 06-22 One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models Ein Schritt ist genug: Sparse Autoencoder für Text-zu-Bild-Diffusionsmodelle 单步就够了: 用于文本到图像扩散模型的粗略自动编码器 2410.22366v4 -
801 06-22 Pitfalls of Conformal Predictions for Medical Image Classification Pitfalls von konformen Vorhersagen für medizinische Bildklassifikation 医学图像分类非正规预测的空洞 2506.18162v1 -
802 06-22 Multi-Agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control Multi-Agent Soft Actor-Critic mit koordiniertem Verlust für autonome Mobilität-auf-Demand-Flotte-Kontrolle 多代理商软软软操作器-对自动机动按需机动车队控制协调损失具有协调损失的批评 2404.06975v2 -
803 06-22 Probabilistic and reinforced mining of association rules Probabilistischer und verstärkter Abbau von Assoziierungsregeln 协会规则的概率和强化开采 2506.18155v1 -
804 06-22 Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection Routing Mamba: Skalierung von State Space-Modellen mit Mixture-of-Experts Projektion Routing Mamba: 配有混合专家预测模型的扩大型国家空间模型 2506.18145v1 -
805 06-22 Enhancing LLM Knowledge Learning through Generalization Verbesserung des LLM-Wissenslernens durch Verallgemeinerung 通过普遍化加强LLM知识学习 2503.03705v2 -
806 06-22 Supercharging Graph Transformers with Advective Diffusion Supercharging Graph Transformer mit advektiver Diffusion 具有辅助扩散作用的极强巨型平面变形器 2310.06417v4 -
807 06-22 On the fast convergence of minibatch heavy ball momentum Auf die schnelle Konvergenz der Minibatch schweren Ball Momentum 小型大球的重球势头迅速趋同 2206.07553v5 -
808 06-22 Bridging Geometric Diffusion and Energy Minimization: A Unified Framework for Neural Message Passing Bridging Geometrische Diffusion und Energie Minimierung: Ein einheitliches Framework für neurale Message Passing 连接几何扩散和能源最小化:神经信息传递统一框架 2409.09111v2 -
809 06-22 Stable and consistent density-based clustering via multiparameter persistence Stabiles und konsistentes Dichte-basiertes Clustering über Multiparameter Persistenz 通过多参数耐久性建立稳定、一致的基于密度的集群 2005.09048v4 -
810 06-22 Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence Unüberwachte Risikofaktoren-Identifikation über Krebsarten und Datenmodalitäten durch erklärbare künstliche Intelligenz 通过可解释的人工智能,在癌症类型和数据模式中,通过可解释的人工智能,确定各种癌症类型和数据模式的不受监督的风险因素 2506.12944v2 -
811 06-22 Bayesian Multiobject Tracking With Neural-Enhanced Motion and Measurement Models Bayesian Multiobject Tracking mit neural-erweiterten Bewegungs- und Messmodellen Bayesian 多功能物体跟踪,以神经强化机动和测量模型跟踪 2506.18124v1 -
812 06-22 RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies 机器人阿雷纳:对通用机器人政策进行分布式真实世界评价 2506.18123v1 -
813 06-22 Dynamic Temporal Positional Encodings for Early Intrusion Detection in IoT Dynamische temporale Positionskodierungen für die frühzeitige Intrusionserkennung im IoT 用于在 IoT 中早期入侵探测的动态时间位置定位编码 2506.18114v1 -
814 06-22 RL for Reasoning by Adaptively Revealing Rationales RL zur Begründung durch adaptives Aufdecken von Rationales 以适应性流转推理推理推理的RL 2506.18110v1 -
815 06-22 SD-KDE: Score-Debiased Kernel Density Estimation SD-KDE: Abschätzung der Score-Debiased-Kernel-Dichte SD-KDE: 计分下降核心密度估计 2504.19084v2 -
816 06-22 CT Radiomics-Based Explainable Machine Learning Model for Accurate Differentiation of Malignant and Benign Endometrial Tumors: A Two-Center Study CT Radiomics-based Explainable Machine Learning Model für genaue Differenzierung von bösartigen und benachbarten Endometrialtumoren: Eine Zwei-Center-Studie CT 基于辐射的可解释解析机器学习模型,用于准确区分马利干和贝尼尼天地地对地肿瘤:双中心研究 2506.18106v1 -
817 06-22 Enhancing VICReg: Random-Walk Pairing for Improved Generalization and Better Global Semantics Capturing Verbesserung von VICReg: Random-Walk-Pairing für verbesserte Generalisierung und bessere Erfassung globaler Semantik 增强VICReg:通过随机游走配对改进泛化并更好地捕获全局语义 2506.18104v1 -
818 06-22 ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation ShareGPT-4o-Image: Multimodale Modelle mit GPT-4o-Level-Bilderzeugung ausrichten ShareGPT-4o-图像:使多模式模型与GPT-4o-层次图像生成相一致 2506.18095v1 -
819 06-22 MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks MalPurifier: Verbesserung der Android Malware-Erkennung mit Adversarial Reinigung gegen Evasion Angriffe 马尔伪化物:加强Android Maware的探测,并进行反向净化,防止攻击侵入 2312.06423v2 -
820 06-22 GRASP: Grouped Regression with Adaptive Shrinkage Priors GRASP: Gruppenregression mit adaptiven Schrumpfvorstufen GRASP: 具有适应性缩小前科的分组倒退 2506.18092v1 -
821 06-22 Active Fine-Tuning of Multi-Task Policies Aktives Fine-Tuning von Multi-Task-Policies 多任务策略的主动微调 2410.05026v3 -
822 06-22 Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares Identifizierbare Convex-Concave-Regression über Subgradient Regularisierte Least Squares 经由亚级正规化最不发达国家广场的可识别的 Convex-Concev 倒退 2506.18078v1 -
823 06-22 Distributionally robust minimization in meta-learning for system identification Verteilungsstarke Minimierung im Meta-Learning zur Systemidentifikation 在用于系统识别的元学习中大力进行分配,尽量减少分配,以便进行系统识别 2506.18074v1 -
824 06-22 PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks PREMAP: Ein vereinheitlichendes PREiMage-APproximation-Framework für neuronale Netze PREMAP:神经网络的统一原像近似(PREiMage APproximation)框架 2408.09262v2 -
825 06-22 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity 1568 Tokens in einen einzigen Vektor und wieder zurück krammen: Die Grenzen der Einbettung von Raumkapazität erkunden 将1568吨撞成单一矢量和后向:探索嵌入空间能力的极限 2502.13063v3 -
826 06-22 Rumor Detection on Social Media with Reinforcement Learning-based Key Propagation Graph Generator Gerücht Detection auf Social Media mit Verstärkung Learning-basierte Key Propagation Graph Generator 以强化学习为基础的社会媒体的传闻探测 2405.13094v2 -
827 06-22 Bayesian Theory of Consciousness as Exchangeable Emotion-Cognition Inference Bayesische Bewusstseinstheorie als auswechselbare Emotion-Kognition-Schlussfolgerung 贝叶斯人的觉悟理论,作为可交流的情感 – – 情绪 – – 气氛推论 2407.09488v2 -
828 06-22 TAB: Unified Benchmarking of Time Series Anomaly Detection Methods TAB: Unified Benchmarking von Methoden zur Erkennung von Anomalien in der Zeitreihe TAB: 不同探测方法的时间序列统一基准 2506.18046v1 -
829 06-22 FinGPT: Enhancing Sentiment-Based Stock Movement Prediction with Dissemination-Aware and Context-Enriched LLMs FinGPT: Verbesserung der Sentiment-Based Stock Movement Prediction mit Verbreitungs-Bewusst und Kontext-angereicherten LLMs FINGPT:利用传播软件和内容丰富的LMs,加强基于情绪的库存流动预测 2412.10823v2 -
830 06-22 Hierarchical Decision Making Based on Structural Information Principles Hierarchische Entscheidungsfindung auf der Grundlage struktureller Informationsprinzipien 基于结构信息原则的等级决策 2404.09760v2 -
831 06-22 Pathwise Explanation of ReLU Neural Networks Pathwise Erklärung von ReLU Neuronalen Netzwerken ReLU 神经网络解析 2506.18037v1 -
832 06-22 Why Do Some Language Models Fake Alignment While Others Don’t? Warum richten sich einige Sprachmodelle falsch aus, während andere es nicht tun? 为何有些语言模型假相配合而其他人则不假相? 2506.18032v1 -
833 06-22 FLARE: Toward Universal Dataset Purification against Backdoor Attacks FLARE: Auf dem Weg zur Universaldatensatzreinigung gegen Hintertürangriffe FLARE: 实现普遍数据集净化,防止幕后袭击 2411.19479v3 -
834 06-22 POPGym Arcade: Parallel Pixelated POMDPs POPGym Arcade: Parallele Pixelierte POMDPs POPGym 街机屋:平行像素化 POMDPs 2503.01450v5 -
835 06-22 Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data Lernen aus Referenzantworten: Vielseitige Sprachmodellausrichtung ohne Binäre menschliche Präferenzdaten 从参考资料解答中学习:通用语言模型调整,无二元人类优先数据 2504.09895v2 -
836 06-22 Generalization under Byzantine & Poisoning Attacks: Tight Stability Bounds in Robust Distributed Learning Verallgemeinerung unter byzantinischen und vergiftenden Angriffen: Enge Stabilitätsgrenzen in robustem verteiltem Lernen Byzantine和毒毒袭击下的普及化:强力分布式学习中的紧固稳定环环绕 2506.18020v1 -
837 06-22 AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs AlphaDecay: Modulweiser Weight Decay für Heavy-Tailed-Balancing in LLMs AlphaDecay:面向LLM重尾平衡的逐模块权重衰减 2506.14562v2 -
838 06-22 Probing the Embedding Space of Transformers via Minimal Token Perturbations Den Embedding Space von Transformers über Minimal Token Perturbations probieren 通过最小 Token 扰动来验证变形器嵌入空间 2506.18011v1 -
839 06-22 Imputation of Longitudinal Data Using GANs: Challenges and Implications for Classification Imputation von Längsschnittdaten mit GANs: Herausforderungen und Implikationen für die Klassifizierung 使用GAN的纵向数据插补:分类面临的挑战与影响 2506.18007v1 -
840 06-22 EDA-DM: Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models EDA-DM: Verbesserte Verteilungsausrichtung für die Nachschulung Quantisierung von Diffusionsmodellen EDA-DM:加强传播模型的培训后量化的分发协调 2401.04585v3 -
841 06-22 Fast Neural Inverse Kinematics on Human Body Motions Schnelle Neurale Inverse Kinematik auf menschlichen Körper Bewegungen 人类身体运动的快速神经反反向数学 2506.17996v1 -
842 06-22 Newtonian and Lagrangian Neural Networks: A Comparison Towards Efficient Inverse Dynamics Identification Newtonian and Lagrangeian Neural Networks: Ein Vergleich zu einer effizienten Inverse Dynamics-Identifikation 牛顿和拉格朗江神经网络:与高效反向动态识别比较 2506.17994v1 -
843 06-22 Data Curation Matters: Model Collapse and Spurious Shift Performance Prediction from Training on Uncurated Text Embeddings Datenkurationsmaterien: Modellkollaps und spurlose Shift-Performance-Vorhersage aus dem Training auf unkuratierten Text-Embeddings 数据说明事项:从未完成的文字嵌入培训中得出的模型折叠和净性转变的绩效预测 2506.17989v1 -
844 06-22 A Coverage-Guided Testing Framework for Quantum Neural Networks Ein Coverage-Guided Testing Framework für Quantum-Neural-Netzwerke 量子神经网络覆盖指导测试框架 2411.02450v2 -
845 06-22 SliceGX: Layer-wise GNN Explanation with Model-slicing SliceGX: Schichtweise GNN-Erläuterung mit Model-Slicing SliceGX:基于模型切片的逐层GNN解释 2506.17977v1 -
846 06-22 Trustworthy Efficient Communication for Distributed Learning using LQ-SGD Algorithm Vertrauenswürdige effiziente Kommunikation für verteiltes Lernen mit LQ-SGD-Algorithmus 利用LQ-SGD 算法为分配学习进行值得信赖的高效沟通 2506.17974v1 -
847 06-22 Reinforcement Learning Teachers of Test Time Scaling Verstärktes Lernen von Lehrern der Testzeitskalierung 测试时间尺度强化学习教师 2506.08388v2 -
848 06-22 h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective h-Kalibrierung: Rethinking Klassifikator-Rekalibrierung mit probabilistischem fehlergebundenem Ziel h-校准:用概率错误误差目标重新思考分类法重新校准 2506.17968v1 -
849 06-22 Adapting Vision-Language Models for Evaluating World Models Anpassung von Vision-Language-Modellen für die Bewertung von Weltmodellen 调整世界模型评估的愿景-语言模型 2506.17967v1 -
850 06-22 AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement AnyEnhance: Ein einheitliches Generatives Modell mit Prompt-Guidance und Selbstkritik für Sprachverbesserung Any促进:促进声音增强的快速指导和自我批评统一生成模式 2501.15417v2 -
851 06-22 Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models Leveraging Model Guidance zum Extrahieren von Trainingsdaten aus personalisierten Diffusionsmodellen 利用示范指南,从个性化传播模式中提取培训数据 2410.03039v2 -
852 06-22 An entropy-optimal path to humble AI Entropie-optimaler Weg zur bescheidenen KI 通往谦卑的 AI 的星盘最佳路径 2506.17940v1 -
853 06-22 IDAL: Improved Domain Adaptive Learning for Natural Images Dataset IDAL: Verbessertes Domain Adaptives Lernen für natürliche Bilder Datensatz IDAL: 改进自然图像数据集的适应性空间学习 2506.17931v1 -
854 06-22 Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Evolving Prompts In-Context: Eine offene, sich selbst replizierende Perspektive 不断演变的加速:一个开放的、自我复制的视角 2506.17930v1 -
855 06-22 Unveiling Molecular Moieties through Hierarchical Grad-CAM Graph Explainability Enthüllung molekularer Moieties durch Hierarchische Grad-CAM Graph Erklärbarkeit 通过等级梯度- CAM 图形解释 2402.01744v5 -
856 06-22 ASTER: Adaptive Spatio-Temporal Early Decision Model for Dynamic Resource Allocation ASTER: Adaptives Spatio-Temporales Frühentscheidungsmodell für die dynamische Ressourcenallokation ASTER:动态资源分配的自适应时空早期决策模型 2506.17929v1 -
857 06-22 Improving the Efficiency of Long Document Classification using Sentence Ranking Approach Verbesserung der Effizienz der Langdokumentklassifikation mittels Sentence-Ranking-Ansatz 采用判决分级办法提高长文件分类的效率 2506.07248v2 -
858 06-22 Permutation Equivariant Model-based Offline Reinforcement Learning for Auto-bidding Permutation Equivariant Modellbasiertes Offline-Verstärkungslernen für Auto-Bindung 用于自动招标的离线强化学习 2506.17919v1 -
859 06-22 A real-time anomaly detection method for robots based on a flexible and sparse latent space Eine Echtzeit-Anomalieerkennungsmethode für Roboter auf Basis eines flexiblen und spärlichen Latentraums 以灵活和稀少的潜在空间为基础的机器人实时异常现象探测方法 2504.11170v3 -
860 06-22 Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks Grafik Neuronale Netzwerke in Supply Chain Analytics und Optimierung: Konzepte, Perspektiven, Datensatz und Benchmarks 供应链分析和优化中的神经网络:概念、视角、数据集和基准 2411.08550v2 -
861 06-22 Interpretable global minima of deep ReLU neural networks on sequentially separable data Interpretable globale Minima von tiefen neuronalen ReLU-Netzwerken auf sequentiell trennbaren Daten 深RELU神经网络关于相继分离数据的可解释全球小型深RELU神经网络 2405.07098v3 -
862 06-22 SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback SIPDO: Closed-Loop Prompt Optimierung über Synthetic Data Feedback SIPDO:通过合成数据反馈,通过闭闭电话快速优化 2505.19514v2 -
863 06-22 Text2Struct: A Machine Learning Pipeline for Mining Structured Data from Text Text2Struct: Eine maschinenlernende Pipeline für den Bergbau strukturierte Daten aus Text Text2Struct: 文字中采矿结构化数据的机械学习管道 2212.09044v4 -
864 06-22 TROJAN-GUARD: Hardware Trojans Detection Using GNN in RTL Designs TROJAN-GUARD: Hardware-Trojaner-Erkennung mit GNN in RTL-Designs TROJAN-GUARD:在RTL设计中使用GNN的硬件探测Trojans 2506.17894v1 -
865 06-22 A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy Ein bayesischer nicht-parametrischer Ansatz für Generative Modelle: Integrieren von Variational Autoencoder und Generative Adversarial Networks mit Wasserstein und maximaler mittlerer Diskrepanz 采用巴耶斯非参数方法处理产生模型:采用瓦塞斯泰因和最大平均值差异法,整合变式自动编码器和生成反对向网络 2308.14048v2 -
866 06-22 ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training ECHO-LLaMA: Effizientes Caching für leistungsstarkes LLaMA-Training ECHO-LLaMA:面向高性能LLaMA训练的高效缓存 2505.17331v2 -
867 06-22 SPD-CFL: Stepwise Parameter Dropout for Efficient Continual Federated Learning SPD-CFL: Schrittweiser Parameter-Ausfall für effizientes kontinuierliches Federated Learning SPD-CFL: 高效持续联邦学习的分级参数辍学 2405.09394v2 -
868 06-22 Cloud-Aware SAR Fusion for Enhanced Optical Sensing in Space Missions Cloud-Aware SAR Fusion für verbesserte optische Wahrnehmung in Weltraummissionen 用于空间飞行任务中增强光学遥感的云器合成孔合成孔径雷达 2506.17885v1 -
869 06-22 Navigating Conflicting Views: Harnessing Trust for Learning Navigieren gegensätzlicher Ansichten: Vertrauen fürs Lernen gewinnen 引导冲突观点:利用信任学习 2406.00958v4 -
870 06-22 Dim and Small Target Detection for Drone Broadcast Frames Based on Time-Frequency Analysis Dim und kleine Target Detection für Drohnen Broadcast Frames basierend auf Zeit-Frequenz-Analyse 根据时间-期限分析对无人机广播框架进行迪姆和小目标探测 2505.18167v2 -
871 06-22 Choice of Scoring Rules for Indirect Elicitation of Properties with Parametric Assumptions Wahl der Bewertungsregeln für die Indirekte Elizitation von Immobilien mit parametrischen Annahmen 带有参数假设的间接引力财产选择规则 2506.17880v1 -
872 06-22 Decoding Federated Learning: The FedNAM+ Conformal Revolution Decoding Federated Learning: Die FedNAM+ Konforme Revolution 解说联邦学习:美联联储+非正规革命 2506.17872v1 -
873 06-22 How Alignment Shrinks the Generative Horizon Wie Alignment den generativen Horizont schrumpft 协同一致如何缩小生成地平线 2506.17871v1 -
874 06-22 NestQuant: Post-Training Integer-Nesting Quantization for On-Device DNN NestQuant: Post-Training Integer-Nesting Quantization for On-Device DNN NestQuant: 培训后DNN的整数 2506.17870v1 -
875 06-22 Geometric Contact Flows: Contactomorphisms for Dynamics and Control Geometrische Kontaktflüsse: Kontaktomorphismen für Dynamik und Steuerung 几何接触流动:动态和控制的接触形态 2506.17868v1 -
876 06-22 DeepMedcast: A Deep Learning Method for Generating Intermediate Weather Forecasts among Multiple NWP Models DeepMedcast: Eine Deep-Learning-Methode zur Generierung von Zwischenwetterprognosen unter mehreren NWP-Modellen 深气象:在多国家工作方案模型中生成中期天气预报的深层学习方法 2411.10010v2 -
877 06-22 IGNIS: A Robust Neural Network Framework for Constrained Parameter Estimation in Archimedean Copulas IGNIS: Ein robustes neurales Netzwerk-Framework für eingeschränkte Parameterschätzungen in Archimedischen Copulas IGNIS:Archimedean Copulas受控参数估计的强力神经网络框架 2505.22518v2 -
878 06-22 How Visual Representations Map to Language Feature Space in Multimodal LLMs Wie visuelle Repräsentationen in multimodalen LLMs auf den Sprach-Feature-Raum abgebildet werden 多模态LLM中视觉表示如何映射到语言特征空间 2506.11976v2 -
879 06-22 Learning to Reason under Off-Policy Guidance Unter außerpolitischer Anleitung zur Vernunft lernen 根据非政策指导学习理由 2504.14945v5 -
880 06-21 (6) FedBaF: Federated Learning Aggregation Biased by a Foundation Model FedBaF: Federated Learning Aggregation Durch ein Stiftungsmodell biased FedBAF: 联邦学习联合组织 2410.18352v3 -
881 06-21 AbRank: A Benchmark Dataset and Metric-Learning Framework for Antibody-Antigen Affinity Ranking AbRank: Benchmark Dataset und Metric-Learning Framework für Antikörper-Antigen-Affinitätsranking AbRank:抗体-安提gen同系物排序基准数据集和计量-学习框架 2506.17857v1 -
882 06-21 Bayesian Inference for Left-Truncated Log-Logistic Distributions for Time-to-event Data Analysis Bayesische Schlussfolgerung für links-beschnittene Log-Logistic-Distributionen für die Zeit-zu-Ereignis-Datenanalyse 用于时间到活动数据分析的左排出日志分布的贝叶斯推理 2506.17852v1 -
883 06-21 Pathway-based Progressive Inference (PaPI) for Energy-Efficient Continual Learning Pathway-based Progressive Inferenz (PaPI) für energieeffizientes kontinuierliches Lernen 能源效率连续不断学习基于途径的渐进推论(PAPI) 2506.17848v1 -
884 06-21 A Comparative Study of Open-Source Libraries for Synthetic Tabular Data Generation: SDV vs. SynthCity Eine vergleichende Studie von Open-Source-Bibliotheken für die synthetische tabellarische Datengenerierung: SDV vs. SynthCity 对用于合成图表数据生成的开放源码图书馆的比较研究:SDV诉合成城市 2506.17847v1 -
885 06-21 Causal Spherical Hypergraph Networks for Modelling Social Uncertainty Causal Spherical Hypergraph Networks for Modeling Social Uncertainty 社会不确定性建模模型化的因果球面高光谱网络 2506.17840v1 -
886 06-21 Evaluating Rank-N-Contrast: Continuous and Robust Representations for Regression Bewertung von Rank-N-Kontrast: Kontinuierliche und robuste Darstellungen für Regression 评价排名-N-Contrast:持续和有力的倒退代表 2411.16298v3 -
887 06-21 Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking Leveling the Playing Field: Klassische und gelernte Controller für Quadrotor Trajectory Tracking sorgfältig miteinander vergleichen 平整播放字段: 仔细比较用于四重奏轨迹跟踪的古典和中学主计长 2506.17832v1 -
888 06-21 Sharper Bounds for Chebyshev Moment Matching, with Applications Scharfere Bounds für Chebyshev Moment Matching, mit Anwendungen Chebyshev Moment 配配, 与应用程序 2408.12385v2 -
889 06-21 Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach Ausrichten von gefrorenen LLMs durch Verstärkungslernen: Ein iteratives Reweight-then-Optimize-Ansatz 通过强化学习将冻结的LLMs与 “ 强化学习:一种过渡性再加权再优化方法 “ 相匹配 2506.17828v1 -
890 06-21 Actionable Interpretability via Causal Hypergraphs: Unravelling Batch Size Effects in Deep Learning Durch Causal-Hypergraphen praktikable Interpretierbarkeit: Entwirren von Batch-Größeneffekten im Deep Learning 通过Causal Hyphriphes:深学习中不易破坏的批量大小效应 2506.17826v1 -
891 06-21 Quantum-Hybrid Support Vector Machines for Anomaly Detection in Industrial Control Systems Quanten-Hybrid-Unterstützung Vektormaschinen für Anomalieerkennung in industriellen Steuerungssystemen 用于在工业控制系统中异常探测的量子-湿性支持矢量机 2506.17824v1 -
892 06-21 Learning to Dock: A Simulation-based Study on Closing the Sim2Real Gap in Autonomous Underwater Docking Dock lernen: Eine simulationsbasierte Studie zum Schließen der Sim2Real Gap im autonomen Unterwasser-Docking 学习对接:缩小自主水下对接中Sim2Real差距的仿真研究 2506.17823v1 -
893 06-21 CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning CultureMERT: Kontinuierliches Pre-Training für kulturübergreifendes Musikrepräsentanz-Lernen CUCMERT: 不同文化间音乐代表制学习的继续预培训 2506.17818v1 -
894 06-21 Secure Energy Transactions Using Blockchain Leveraging AI for Fraud Detection and Energy Market Stability Sichere Energietransaktionen mittels Blockchain-Leveraging-KI für Betrugserkennung und Energiemarktstabilität 利用安全能源交易利用 “ 利用全链利用AI “ 来欺诈侦查和能源市场稳定 2506.19870v1 -
895 06-21 Flatness After All? Flachheit nach allem? 终究是平坦吗? 2506.17809v1 -
896 06-21 Reimagining Parameter Space Exploration with Diffusion Models Reimagining Parameter-Weltraumforschung mit Diffusionsmodellen 利用扩散模型重新想象参数空间探索 2506.17807v1 -
897 06-21 AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator AdRo-FL: Informierte und sichere Kundenauswahl für das Federated Learning in der Gegenwart von Adversarial Aggregator ADRO-FL:在存在反versarial聚合体的情况下,为联邦学习进行知情和安全的客户选择 2506.17805v1 -
898 06-21 Smooth InfoMax – Towards Easier Post-Hoc Interpretability Smooth InfoMax – Auf dem Weg zu einer einfacheren Nach-Hoc-Interpretabilität 平滑的InfoMax – – 迈向更轻松的后热后解释 2408.12936v3 -
899 06-21 SING: SDE Inference via Natural Gradients SING: SDE-Schlussfolgerung über natürliche Gradienten SING: SDE 通过自然梯度推断 2506.17796v1 -
900 06-21 Physics-informed KAN PointNet: Deep learning for simultaneous solutions to inverse problems in incompressible flow on numerous irregular geometries Physik-informiert KAN PointNet: Deep Learning für simultane Lösungen für inverse Probleme im unkompressiblen Fluss auf zahlreichen irregulären Geometrien KAN PointNet:深刻学习如何同时解决许多非正常地貌的不压抑性流动的反面问题 2504.06327v2 -
901 06-21 Bayesian Social Deduction with Graph-Informed Language Models Bayesische soziale Deduktion mit Graphen-informierten Sprachmodellen 采用图形化语言模型的巴伊斯社会衰退 2506.17788v1 -
902 06-21 Beyond instruction-conditioning, MoTE: Mixture of Task Experts for Multi-task Embedding Models Über die Instruktionskonditionierung hinaus, MoTE: Mischung von Task-Experten für Multi-Task-Einbettungsmodelle 超越教学-调控,MOTE:多任务嵌入模型任务专家混合 2506.17781v1 -
903 06-21 Enhancing Glucose Level Prediction of ICU Patients through Hierarchical Modeling of Irregular Time-Series Verbesserung der Glukose-Prognose von ICU-Patienten durch hierarchische Modellierung irregulärer Zeitreihen 通过不定期时序的等级建模,加强对伊斯兰法院联盟病人的葡萄糖水平预测 2411.01418v3 -
904 06-21 Toward Autonomous UI Exploration: The UIExplorer Benchmark Auf dem Weg zu autonomer UI-Exploration: Der UIExplorer Benchmark 走向自主的界面勘探:界面勘探者基准 2506.17779v1 -
905 06-21 Machine Learning Model Integration with Open World Temporal Logic for Process Automation Machine Learning Model Integration mit Open World Temporal Logic für die Prozessautomatisierung 与开放世界时间逻辑集成的机械学习模型集成 2506.17776v1 -
906 06-21 PhysiX: A Foundation Model for Physics Simulations PhysiX: Ein Grundlagenmodell für Physiksimulationen PhysiX:物理模拟基础模型 2506.17774v1 -
907 06-21 Log-Normal Multiplicative Dynamics for Stable Low-Precision Training of Large Networks Log-Normal Multiplikative Dynamiken für stabile Low-Precision Training von großen Netzwerken 用于大型网络稳定低精度培训的对地-热多复制动态 2506.17768v1 -
908 06-21 A Locally Differential Private Coding-Assisted Succinct Histogram Protocol Ein lokal differenziertes, privates Coding Assisted Succinct Histogramm Protokoll 本地差异私家编码辅助闪电直方图议定书 2506.17767v1 -
909 06-21 Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions Flugbahnvorhersage für autonomes Fahren: Fortschritt, Grenzen und Zukunftsrichtung 自主驾驶的轨迹预测:进步、限制和未来方向 2503.03262v2 -
910 06-21 Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes Derandomizing Simultane Confidence Regions for band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes 改进诺姆弹道和多数表决制度,为限制有定型功能的功能提供同步信任区 2506.17764v1 -
911 06-21 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training DUMP: Automatisiertes Lehrplanlernen auf Verteilungsebene für RL-basiertes LLM-Post-Training DDMP: 以LLLLM为基础的LLM培训后课程自动分发级别课程学习 2504.09710v2 -
912 06-21 Towards a Unified Textual Graph Framework for Spectral Reasoning via Physical and Chemical Information Fusion Auf dem Weg zu einem einheitlichen textuellen Graphen-Framework für spektrale Reasoning mittels physikalischer und chemischer Informationsfusion 建立一个通过物理和化学信息融合的光学理由统一文本图框架 2506.17761v1 -
913 06-21 G-Adaptivity: optimised graph-based mesh relocation for finite element methods G-Adaptivität: optimierte graphbasierte Netzverlagerung für Finite-Elemente-Methoden G-适应性:以最佳图形为基础的网格,用于定点元件方法 2407.04516v3 -
914 06-21 Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities Physik-informierte Mischung von Experten-Netzwerk für interpretierbare Batteriedegradations-Trajektorie Berechnung inmitten von Zweitleben Komplexitäten 面向二次寿命复杂性下可解释电池退化轨迹计算的物理知情混合专家网络 2506.17755v1 -
915 06-21 SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification SCISSOR: Semantische Bias durch cluster-aware Siamesische Netzwerke für robuste Klassifizierung abmildern SCISSOR: 通过 “ 硬性分类 “ 的集束电电波暹脑网络,减缓语义比亚 2506.14587v2 -
916 06-21 Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences Kernel-Grenzwert für recurrente neurale Netzwerke, die auf ergodischen Datensequenzen trainiert werden Ergodic数据序列培训的经常性神经网络核心限制 2308.14555v3 -
917 06-21 Pix2Geomodel: A Next-Generation Reservoir Geomodeling with Property-to-Property Translation Pix2Geomodel: Ein Next-Generation Reservoir Geomodeling mit Property-to-Property-Übersetzung Pix2 Geomod: 下一个拥有地对地翻译的地建模 2506.17747v1 -
918 06-21 Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator Direkte diskriminative Optimierung: Ihr Likelihood-basiertes visuelles Generatives Modell ist geheim ein GAN-Diskriminator 直接排斥性优化:你以相似性为基础的视觉创造模型秘密地是一个GAN 歧视者 2503.01103v3 -
919 06-21 MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation MoORE: SVD-basierte Modell-MoE-ization für Konflikt- und Vergessenheits-Resistenz-Multi-Task-Anpassung MoORE: 以SVD为基础的冲突与遗忘-恢复-远程多任务适应示范MoE化模式 2506.14436v2 -
920 06-21 Learning Aerodynamics for the Control of Flying Humanoid Robots Aerodynamik lernen zur Steuerung von fliegenden humanoiden Robotern 用于控制飞行人类体机器人的学习空气动力学 2506.00305v2 -
921 06-21 Rethinking the Role of Operating Conditions for Learning-based Multi-condition Fault Diagnosis Überdenken der Rolle der Betriebsbedingungen für lernbasierte Multi-Condition-Fault-Diagnose 重新思考业务条件对基于学习的多设备错失诊断的作用 2506.17740v1 -
922 06-21 Recursive Gaussian Process State Space Model Rekursive Gaussian Prozess Zustand Raum Modell 递递性高斯进程状态空间模型 2411.14679v2 -
923 06-21 Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs Sicheres Pruning LoRA: Robustes, distanzgeführtes Pruning für die Sicherheitsausrichtung bei der Anpassung von LLMs 安全谨慎 LoRA:为适应LLMs实现安全协调提供强有力的远程指导 2506.18931v1 -
924 06-21 Numerical simulation of transient heat conduction with moving heat source using Physics Informed Neural Networks Numerische Simulation der transienten Wärmeleitung mit beweglicher Wärmequelle mittels Physics Informed Neural Networks 利用物理知情神经网络利用移动热源对瞬时热导导与移动热源进行数字模拟 2506.17726v1 -
925 06-21 Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains Time-Aware-Lernen Kausaldarstellung für Modellverallgemeinerung in sich entwickelnden Domänen 正在演变的域域中模型普遍化模型的学习时间- 软件因果代表 2506.17718v1 -
926 06-21 Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages Enthüllungsfaktoren für ein verbessertes POS-Tagging: Eine Studie über ressourcenarme mittelalterliche romanische Sprachen 强化POS贴标签的难解因素:低资源中世纪罗姆语言研究 2506.17715v1 -
927 06-21 Truthful Elicitation of Imprecise Forecasts Wahre Botschaft von ungenauen Prognosen 不精确预测的真实诱导 2503.16395v2 -
928 06-21 CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition CEGA: Ein kosteneffizienter Ansatz für graphisch basierte Modellextraktion und -akquisition CEGA:基于图表的抽取和采购模式的成本-效益办法 2506.17709v1 -
929 06-21 Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach Zähmen von OOD-Maßnahmen für das Offline-Verstärkungslernen: ein vorteilhafter Ansatz 塔坦 OOOD 离线强化学习行动:以优势为基础的方法 2505.05126v3 -
930 06-21 Data-Dependent Regret Bounds for Constrained MABs Datendependent Regret Bounds for Constrained MABs 受约束 MAB 的受控数据依赖的 Regret Bounds 2505.20010v2 -
931 06-21 Curse of Dimensionality in Neural Network Optimization Der Fluch der Dimensionalität in der Neuralen Netzwerkoptimierung 神经网络中多维度的诅咒 优化 2502.05360v2 -
932 06-21 Zero-Shot Conversational Stance Detection: Dataset and Approaches Zero-Shot Conversational Stance Detection: Datensatz und Ansätze 零热对调调检测:数据集和方法 2506.17693v1 -
933 06-21 Enhancing Stress-Strain Predictions with Seq2Seq and Cross-Attention based on Small Punch Test Verbesserung der Stress-Strain-Vorhersagen mit Seq2Seq und Cross-Attention auf Basis von Small Punch Test 基于小冲孔试验,利用Seq2Seq和交叉注意力增强应力-应变预测 2506.17680v1 -
934 06-21 Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing Inferenz-Zeit-Blick-Verfeinerung für die Mikro-Expression-Erkennung: Eventbasiertes Eye Tracking mit Motion-Aware-Post-Processing verbessern 微电压识别的推断-时玻璃改进改进:加强以动态-软件处理后的方式对事件进行目视跟踪 2506.12524v2 -
935 06-21 Reinforcement Learning-Based Dynamic Grouping for Tubular Structure Tracking Verstärkung Learning-based Dynamic Grouping für Rohrstruktur-Tracking 用于跟踪Tubular 结构跟踪的强化学习型动态组 2506.18930v1 -
936 06-21 Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities Nicht-asymptotische Annäherungen der Gaußschen neuronalen Netze über Ungleichheiten in Poincaré zweiter Ordnung 通过Poincaré分级的第二级不平等,高森神经网络的非症状近似 2304.04010v2 -
937 06-21 Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference Vernunftschaltungen in Sprachmodellen: Eine mechanistische Interpretation der syllogistischen Inferenz 语言模型中说明理由的电路:对音频推断的机械解释 2408.08590v3 -
938 06-21 Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization Robustes LLM-Unlearning mit MUDMAN: Meta-Unlearning mit Disruptionsmasken und Normalisierung 与 MUDMAN 一起重新学习: 以干扰蒙蔽和正常化的方式重新学习 2506.12484v2 -
939 06-21 Gaussian Process Latent Variable Modeling for Few-shot Time Series Forecasting Gaussian Prozess Latente Variable Modellierung für wenige Fotos Time Series Forecasting Gaussian 微短时间序列预测的 Gaussian 进程中点变量建模 2212.10306v2 -
940 06-21 FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies FaithfulSAE: Auf dem Weg zur Erfassung treuer Funktionen mit Sparse Autoencodern ohne externe Datensatzabhängigkeiten 忠实的SAE:在没有外部数据集依赖性的情况下, 与粗略自动解析器一起获取忠实的特征 2506.17673v1 -
941 06-21 Learning Personalized Utility Functions for Drivers in Ride-hailing Systems Using Ensemble Hypernetworks Learning Personalisierte Utility-Funktionen für Treiber in Ride-Haling-Systemen mit Ensemble Hypernetworks 利用组合式超网络进行乘载系统的驱动人员学习个性化功用功能 2506.17672v1 -
942 06-21 TPTT: Transforming Pretrained Transformer into Titans TPTT: Transformieren des vortrainierten Transformers in Titanen TPTT: 将预训练变形器转换成巨人 2506.17671v1 -
943 06-21 Online Multi-LLM Selection via Contextual Bandits under Unstructured Context Evolution Online-Multi-LLM-Auswahl über Kontext-Banditen unter unstrukturierter Kontext-Evolution 在无结构环境演变下通过背景强盗进行在线多LLLM选择 2506.17670v1 -
944 06-21 How to Train Your Multi-Exit Model? Analyzing the Impact of Training Strategies Wie trainieren Sie Ihr Multi-Exit-Modell? Analysieren der Auswirkungen von Trainingsstrategien 如何培训你的多出口模式?分析培训战略的影响 2407.14320v2 -
945 06-21 Advanced Modeling for Exoplanet Detection and Characterization Erweiterte Modellierung für Exoplanetenerkennung und Charakterisierung 推进异地平原探测和特征化的模型化 2506.17665v1 -
946 06-21 Stop Overvaluing Multi-Agent Debate – We Must Rethink Evaluation and Embrace Model Heterogeneity Mehr-Agenten-Debatte stoppen – Wir müssen Bewertung neu denken und Modell Heterogenität umarmen 停止高估多机构辩论 – – 我们必须重新思考评价和拥抱模型多样性 2502.08788v3 -
947 06-21 How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs Wie numerische Präzision die Fähigkeit von LLMs zur Arithmetik beeinflusst 数字精确度如何影响LLM 的理理原因能力 2410.13857v2 -
948 06-21 Comba: Improving Bilinear RNNs with Closed-loop Control Comba: Bilineare RNNs mit Closed-Loop-Steuerung verbessern Comba: 改进有闭环控制的双线区域网网 2506.02475v3 -
949 06-21 Step-Opt: Boosting Optimization Modeling in LLMs through Iterative Data Synthesis and Structured Validation Schritt-Opt: Steigerung der Optimierungsmodellierung in LLMs durch iterative Datensynthese und strukturierte Validierung Step-Opt:通过迭代数据合成和结构化校验,促进LLMs中的优化建模 2506.17637v1 -
950 06-21 Completely Parameter-Free Single-Loop Algorithms for Nonconvex-Concave Minimax Problems Vollständig Parameter-freie Single-Loop-Algorithmen für nicht konvex-konkave Minimax-Probleme 完全无参数的非convex- Conceve Minimax 问题单线单光解算法 2407.21372v3 -
951 06-21 RPLKG: Robust Prompt Learning with Knowledge Graph RPLKG: Robustes Prompt-Lernen mit Wissensgrafik ROPLKG: 运用知识图进行强力快速学习 2304.10805v2 -
952 06-21 LLM-Prompt: Integrated Heterogeneous Prompts for Unlocking LLMs in Time Series Forecasting LLM-Prompt: Integrierte Heterogene Prompt für die Entriegelung von LLMs in der Zeitreihenprognose LLM-Prompt:在时间序列预测中解锁LLMLM的综合异种提示 2506.17631v1 -
953 06-21 UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation UniMoT: Unified Molecule-Text Language Model mit diskreter Token-Darstellung UniMoT: 具有分立调制调制解析器表示式的统一分子文字语言模式 2408.00863v2 -
954 06-21 A Closer Look into Mixture-of-Experts in Large Language Models Ein genauerer Blick in Mixture-of-Experts in großen Sprachmodellen 更密切地研究大语言模型混合专家 2406.18219v3 -
955 06-21 DrivAer Transformer: A high-precision and fast prediction method for vehicle aerodynamic drag coefficient based on the DrivAerNet++ dataset DrivAer Transformer: Eine hochpräzise und schnelle Vorhersagemethode für den aerodynamischen Widerstandskoeffizienten auf Basis des DrivAerNet++ Datensatzes DriivAer变换器:基于DriivAerNet++数据集的车辆空气动力拖动系数的高精度和快速预测方法 2504.08217v5 -
956 06-21 Exploring the Secondary Risks of Large Language Models Erforschung der sekundären Risiken großer Sprachmodelle 探讨大语言模式的次要风险 2506.12382v2 -
957 06-21 Exploiting Efficiency Vulnerabilities in Dynamic Deep Learning Systems Nutzung von Effizienzlücken in dynamischen Deep-Learning-Systemen 利用动态深深学习系统的效率脆弱性 2506.17621v1 -
958 06-21 Trustworthy Chronic Disease Risk Prediction For Self-Directed Preventive Care via Medical Literature Validation Vertrauenswürdige Risikovorhersage für chronische Krankheiten für die selbstgesteuerte Präventivversorgung über die Validierung medizinischer Literatur 通过医学文学鉴定对自我分散的预防性护理进行可靠可靠慢性慢性病风险预测 2506.17620v1 -
959 06-21 Federated Learning With Energy Harvesting Devices: An MDP Framework Federated Learning with Energy Harvesting Devices: Ein MDP-Framework 联邦能源收获装置学习:MDP框架 2405.10513v2 -
960 06-21 EQuARX: Efficient Quantized AllReduce in XLA for Distributed Machine Learning Acceleration EQuARX: Effiziente Quantisiertes AllReduce in XLA zur Beschleunigung des verteilten maschinellen Lernens EuARX: XLA 中高效量化全减,以加速分配机器学习 2506.17615v1 -
961 06-21 Open-world machine learning: A review and new outlooks Open-World Machine Learning: Ein Rückblick und neue Perspektiven 开放世界机器学习:回顾和新展望 2403.01759v4 -
962 06-21 TyphoFormer: Language-Augmented Transformer for Accurate Typhoon Track Forecasting TyphoFormer: Sprachgesteigerter Transformer für präzise Typhoon-Track-Prognose 台风前台风:用于准确预报台风轨道的语文增强变换器 2506.17609v1 -
963 06-21 Towards Fundamental Limits for Active Multi-distribution Learning Auf dem Weg zu grundlegenden Grenzen für aktives Multidistributionslernen 走向积极的多分发学习基本限制 2506.17607v1 -
964 06-21 Unlearning Isn’t Invisible: Detecting Unlearning Traces in LLMs from Model Outputs Unlearning ist nicht unsichtbar: Unlearning Traces in LLMs von Model Outputs erkennen 从模型产出中检测出LLMM中未学习的踪迹 2506.14003v2 -
965 06-21 Steering LLMs for Formal Theorem Proving Lenkung LLMs für formale Theorem Proving 正式理论证明指导LLMs 2502.15507v4 -
966 06-21 Risk Bounds For Distributional Regression Risikogrenzen für distributive Regression 分布性倒退的风险临界值 2505.09075v2 -
967 06-21 HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models HalluRNN: Halluzinationen durch Recurrent Cross-Layer-Reasoning in großen Vision-Sprachenmodellen abmildern HalluRNN:通过在大型视觉语言模型中反复出现的跨代理由减少幻觉 2506.17587v1 -
968 06-21 Novel Multicolumn Kernel Extreme Learning Machine for Food Detection via Optimal Features from CNN Neuartige Multikolumn-Kernel Extreme Lernmaschine für Lebensmittel-Erkennung durch optimale Funktionen von CNN 利用有线电视新闻网最佳地物检测食品的极端学习机器 2205.07348v2 -
969 06-21 Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models Cite Pretrain: Retrieval-freie Wissenszuweisung für große Sprachmodelle Cite Prettrain: 大语言模型的检索-无知识归属 2506.17585v1 -
970 06-21 LFR-PINO: A Layered Fourier Reduced Physics-Informed Neural Operator for Parametric PDEs LFR-PINO: Ein geschichteter Fourier reduzierter physikinformierter Neuraloperator für parametrische PDEs LFR-PINO: 用于参数PDE的多层四层减少四层物理学 2506.17582v1 -
971 06-21 Optimizing Mastery Learning by Fast-Forwarding Over-Practice Steps Mastery-Lernen optimieren, indem überpraktizierende Schritte schnell vorangebracht werden 通过快速推进超实践步骤优化硕士学习 2506.17577v1 -
972 06-21 Towards Deeper GCNs: Alleviating Over-smoothing via Iterative Training and Fine-tuning Auf dem Weg zu tieferen GCNs: Übersäuerung durch iteratives Training und Feinabstimmung mildern 走向更深的GCNCs:通过迭接培训和微调减少过度移动 2506.17576v1 -
973 06-21 Predicting Mild Cognitive Impairment Using Naturalistic Driving and Trip Destination Modeling Voraussagen einer milden kognitiven Schädigung mittels naturalistischer Fahr- und Reisezielmodellierung 利用自然驱动和出港目的地模型模型预测低度认知缺陷 2504.09027v2 -
974 06-21 Accelerating Residual Reinforcement Learning with Uncertainty Estimation Beschleunigung des residualen Verstärkungslernens mit Unsicherheitsabschätzung 以不确定的估算值加速剩余强化学习 2506.17564v1 -
975 06-21 Stochastic Gradient Descent for Nonparametric Regression Stochastischer Gradient Abstieg für nichtparametrische Regression 用于非参数回退的斯托克渐变底层 2401.00691v4 -
976 06-21 SynDaCaTE: A Synthetic Dataset For Evaluating Part-Whole Hierarchical Inference SynDaCaTE: Ein synthetischer Datensatz zur Bewertung der hierarchischen Inferenz SynDaCaTE:用于评价整个整体等级推理部分的合成数据集 2506.17558v1 -
977 06-21 Multi-agent Markov Entanglement Multi-Agent Markov Verschränkung 多剂 Markov 缠绕 2506.02385v2 -
978 06-21 Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method Schnellere Low-Rank-Annäherung und Kernel Ridge-Regression über die Block-Nyström-Methode 通过块-Nyström方法更快地低兰克相近和内核脊回归 2506.17556v1 -
979 06-21 Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach Balance zwischen Interferenz und Korrelation in räumlichen Experimentaldesigns: Ein ursächlicher Graphenschnitt-Ansatz 空间实验设计中平衡干扰和关联:因果图表切割法 2505.20130v2 -
980 06-21 DRIMV_TSK: An Interpretable Surgical Evaluation Model for Incomplete Multi-View Rectal Cancer Data DRIMV_TSK: Ein Interpretations-Surgical-Bewertungsmodell für unvollständige Rectal-Krebsdaten DRIMV_TSK:不完全的多视力直肠癌数据可解释的外科评估模型 2506.17552v1 -
981 06-21 Wireless-Friendly Window Position Optimization for RIS-Aided Outdoor-to-Indoor Networks based on Multi-Modal Large Language Model Wireless-Friendly Window Position Optimization für RIS-Aided Outdoor-to-Indoor-Netzwerke basierend auf Multi-Modal Large Language Model 以多模式大语言模式为基础,优化以无线友好型友好型网络为主的外门对门至门网络的RIS辅助最佳窗口位置 2410.20691v2 -
982 06-21 Predicting E-commerce Purchase Behavior using a DQN-Inspired Deep Learning Model for enhanced adaptability E-Commerce-Prognose Kaufverhalten mit einem DQN-inspirierten Deep Learning-Modell für verbesserte Anpassungsfähigkeit 利用DQN启发的加强适应性的深学习模式预测电子商务采购行为 2506.17543v1 -
983 06-21 EditLord: Learning Code Transformation Rules for Code Editing EditLord: Regeln zur Code-Transformation für die Code-Editing 编辑主: 学习代码编辑的代码转换规则 2504.15284v3 -
984 06-21 MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization MTSIC: Mehrstufige Transformer-basierte GAN für spektrale Infrarot-Bildfarbgebung MTIIC: 用于光谱红外红外图像色彩化的多级变形器GAN 2506.17540v1 -
985 06-21 ConsumerBench: Benchmarking Generative AI Applications on End-User Devices ConsumerBench: Benchmarking Generative KI-Anwendungen auf Endgeräten 消费者:确定最终用户设备应用基准 2506.17538v1 -
986 06-21 Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU Demokratie der KI Numerische Wettermodelle: Ein Beispiel für globale Vorhersagen mit FourCastNetv2 Hergestellt von einem Universitätsforschungslabor mit GPU AI 数字气象模型的民主民主:大学研究实验室利用GPU用四CTNetv2进行的全球预测实例 2504.17028v2 -
987 06-21 Infected Smallville: How Disease Threat Shapes Sociality in LLM Agents Infizierte Smallville: Wie Krankheitsgefährdung die Gesellschaft in LLM-Agenten prägt 小镇感染者:LLM代理中疾病威胁形态如何影响社会 2506.13783v2 -
988 06-21 Quantum-Enhanced Reinforcement Learning for Power Grid Security Assessment Quantum-Verstärkungs-Lernen für Power Grid Security Assessment 提高量子强化学习促进电力网安全评估 2504.14412v2 -
989 06-21 Two Heads are Actually Better than One: Towards Better Adversarial Robustness via Transduction and Rejection Zwei Köpfe sind eigentlich besser als eins: Auf dem Weg zu besserer adversarialer Robustheit durch Transduktion und Ablehnung 两个头比一个头实际更好:通过转换和拒绝实现更好的对抗力 2305.17528v2 -
990 06-20 (5) A Survey of State Representation Learning for Deep Reinforcement Learning Eine Umfrage über staatliche Repräsentationslernen für tiefes Stärkungslernen 国家代表深强化学习学习调查 2506.17518v1 -
991 06-20 Validating Mechanistic Interpretations: An Axiomatic Approach Validierung mechanistischer Interpretationen: Ein axiomatischer Ansatz 验证机械学解释:一种不法方法 2407.13594v2 -
992 06-20 IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G IQFM Ein drahtloses Grundmodell für I/Q Streams in AI-Native 6G AI-Native 6G的I/Q流无线无线基础模型 2506.06718v2 -
993 06-20 $L^*LM$: Learning Automata from Examples using Natural Language Oracles $L^*LM$: Automata lernen aus Beispielen mit natürlichen Sprach-Orakeln $L^*LM$:利用自然语言预言机从示例中学习自动机 2402.07051v2 -
994 06-20 Modeling Neural Networks with Privacy Using Neural Stochastic Differential Equations Modellierung neuraler Netzwerke mit Datenschutz mittels neuraler stochastischer Differentialgleichungen 以使用神经神学差异等同的隐私建模神经网络 2501.06686v2 -
995 06-20 Episode-specific Fine-tuning for Metric-based Few-shot Learners with Optimization-based Training Episodenspezifische Feinabstimmung für Metric-based Learner mit Optimization-based Training 以 “ 优化化 “ 培训为 “ 以计量为基础的少见学生 “ 的 “ 最佳化 “ 培训 2506.17499v1 -
996 06-20 From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training Von der Generalität zur Meisterschaft: Composer-Style Symbolic Music Generation via Large-Scale Pre-Training 从普遍到掌握:通过大规模预培训创作作曲家-中继符号音乐 2506.17497v1 -
997 06-20 Online Adaptation for Flying Quadrotors in Tight Formations Online-Anpassung für fliegende Quadrotoren in engen Formationen 近形飞行四方体在线适应 2506.17488v1 -
998 06-20 Distilling On-device Language Models for Robot Planning with Minimal Human Intervention Destillieren von On-Device-Sprachmodellen für die Roboterplanung mit minimaler menschlicher Intervention 利用最低限度的人力干预,为机器人规划继续采用现有设计语言模式 2506.17486v1 -
999 06-20 Disentangle and Regularize: Sign Language Production with Articulator-Based Disentanglement and Channel-Aware Regularization Entwirren und Regularisieren: Gebärdenspracheproduktion mit Artikulator-basierter Entwirren und Kanal-Bewusst-Regularisierung 分解和规范化:手语制作,配有以动画师为基础的分解和频道-意识规范化 2504.06610v2 -
1000 06-20 A geometric framework for momentum-based optimizers for low-rank training Ein geometrischer Rahmen für Impuls-basierte Optimatoren für Low-Rank-Training 低级培训动力优化动力优化的几何框架 2506.17475v1 -
1001 06-20 Fed-pilot: Optimizing LoRA Allocation for Efficient Federated Fine-Tuning with Heterogeneous Clients Fed-Pilot: Optimierung der LoRA-Allokation für effizientes Federated Fine-Tuning mit heterogenen Kunden Fed-试点:优化LORA分配,与异质客户进行高效的联邦货币调整 2410.10200v2 -
1002 06-20 Distributional Training Data Attribution Verteilung der Ausbildungsdaten 分配培训数据 2506.12965v2 -
1003 06-20 Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms Modellierung des menschlichen visuellen Systems: Vergleichende Erkenntnisse aus response-optimierten und aufgabenoptimierten Visionsmodellen, Sprachmodellen und verschiedenen Auslesemechanismen 模拟人类视觉系统:从反应适应和任务适应的视觉模型、语言模型和不同的阅读机制中比较透视 2410.14031v4 -
1004 06-20 SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v2 -
1005 06-20 Computational Approaches to Understanding Large Language Model Impact on Writing and Information Ecosystems Computational Approaches to Understanding Large Language Model Impact on Writing and Information Ecosystems 理解大语言模型对书写和信息生态系统的影响的计算方法 2506.17467v1 -
1006 06-20 FedNAMs: Performing Interpretability Analysis in Federated Learning Context FedNAMs: Interpretationsanalyse im Federated Learning Context durchführen FFNAM: 在联邦学习背景下进行解释性分析 2506.17466v1 -
1007 06-20 LieDetect: Detection of representation orbits of compact Lie groups from point clouds LieDetect: Erkennung von Darstellungsbahnen kompakter Lie-Gruppen von Punktwolken 测谎:从点云中探测到紧凑层的代表轨道 2309.03086v2 -
1008 06-20 Directional Gradient Projection for Robust Fine-Tuning of Foundation Models Richtgradientenprojektion für robuste Feinsteuerung von Fundamentmodellen 基金会模型硬性精美调整方向梯度预测 2502.15895v2 -
1009 06-20 A Comparative Analysis of Distributed Linear Solvers under Data Heterogeneity Eine vergleichende Analyse der verteilten linearen Solver unter Daten Heterogenität 数据差异下分布线性溶剂的比较分析 2304.10640v4 -
1010 06-20 UT-GraphCast Hindcast Dataset: A Global AI Forecast Archive from UT Austin for Weather and Climate Applications UT-GraphCast Hindcast Dataset: Ein globales KI-Prognosearchiv aus UT Austin für Wetter- und Klimaanwendungen UT-GraphCast Hindcast 数据集:来自UT Austin的天气和气候应用全球AI预报档案 2506.17453v1 -
1011 06-20 Scalable Unit Harmonization in Medical Informatics via Bayesian-Optimized Retrieval and Transformer-Based Re-ranking Skalierbare Einheitsharmonisierung in der medizinischen Informatik über Bayesian-Optimized Retrieval und Transformer-Based Re-Ranking 通过贝叶斯优化检索和基于Transformer的重排序,实现医学信息学中可扩展的单位统一 2505.00810v2 -
1012 06-20 FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering FRAMES-VQA: Benchmarking Fine-Tuning Robustheit über Multi-Modal Shifts in der visuellen Fragestellung FRAMES-VQA:确定视觉问题解答中多模式变化的精确调整强度基准 2505.21755v2 -
1013 06-20 Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation Medizinische KI gesund halten: Eine Überprüfung der Erkennungs- und Korrekturmethoden für Systemabbau 保持医疗全健康:系统退化检测和纠正方法审查 2506.17442v1 -
1014 06-20 Memorization to Generalization: Emergence of Diffusion Models from Associative Memory Erinnerung an die Verallgemeinerung: Entstehung von Diffusionsmodellen aus dem assoziativen Gedächtnis 记忆化为普遍化:共同内存传播模型的出现 2505.21777v2 -
1015 06-20 Sequence-to-Sequence Models with Attention Mechanistically Map to the Architecture of Human Memory Search Sequenz-zu-Sequenz-Modelle mit Aufmerksamkeit Mechanistisch Karte zur Architektur des menschlichen Gedächtnisses Suche 人类记忆搜索建筑图的顺序到顺序模型,注意人类记忆搜索结构的机械图 2506.17424v1 -
1016 06-20 UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making UProp: Untersuchung der Unsicherheitsausbreitung von LLMs in mehrstufiger agentischer Entscheidungsfindung UPROP:调查多级制剂决策中LLMs的不确定性传播情况 2506.17419v1 -
1017 06-20 Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble Zweite Meinungsfrage: Auf dem Weg zu adaptiver klinischer KI über den Konsens des Expert Model Ensembles 第二意见事项:通过专家示范组共识实现适应性临床AI 2505.23075v2 -
1018 06-20 Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning Das kostenlose Mittagessen stehlen: Die Grenzen des Dyna-Style-Verstärkungslernens aufzeigen 偷免费午餐:暴露妇产科强化学习的极限 2412.14312v3 -
1019 06-20 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? Aha Moment Revisited: Sind VLMs wirklich in der Lage, Selbstverifizierung in Folgezeit Scaling? 重新审视动态:在推论-时间尺度方面,VLMs是否真正有能力进行自我核查? 2506.17417v1 -
1020 06-20 Adaptive Control Attention Network for Underwater Acoustic Localization and Domain Adaptation Adaptive Steuerung Aufmerksamkeit Netzwerk für Unterwasser-akustische Lokalisierung und Domain-Anpassung 水下声传本土化和域域改造适应性控制关注网络 2506.17409v1 -
1021 06-20 Zero-Shot NAS via the Suppression of Local Entropy Decrease Zero-Shot NAS durch die Unterdrückung der lokalen Entropie-Verringerung 通过制止局部星气减少,零热NAS 2411.06236v3 -
1022 06-20 Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting Part$^{2}$GS: Teilbewusste Modellierung von artikulierten Objekten mittels 3D Gaussian Splatting Part$^{2}$GS:使用 3D Gaussian Splatting 的铰接物体部件感知建模 2506.17212v1 -
1023 06-20 BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning BREAD: Verzweigte Rollouts von Expert Anchors Bridge SFT & RL für die Vernunft 专家领航桥SFT和RL的分包推演 2506.17211v1 -
1024 06-20 AQA-Bench: An Interactive Benchmark for Evaluating LLMs’ Sequential Reasoning Ability AQA-Bench: Ein interaktiver Benchmark für die Bewertung der sequenziellen Begründungsfähigkeit von LLMs AQA-Bench:评估LLMs按顺序推理能力的互动基准 2402.09404v2 -
1025 06-20 DreamCube: 3D Panorama Generation via Multi-plane Synchronization DreamCube: 3D-Panorama-Generation über Multi-Plane-Synchronisierung DreamCube:3D全景生成,通过多飞机同步同步 2506.17206v1 -
1026 06-20 Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning Netzwerksparsität entsperrt das Skalierungspotenzial von Deep Reinforcement Learning 网络分化 释放深强化学习的扩大潜力 2506.17204v1 -
1027 06-20 DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandit Environments DAL: Ein praktisches Prior-Free Black-Box Framework für nicht-stationäre Bandit-Umgebungen DAL:非高度强盗环境实际的、事先免费的黑盒框架 2501.19401v3 -
1028 06-20 Schrödinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres Schrödinger-Brücke passend für baumstrukturierte Kosten und entropische Wasserstein-Barycentres Schrödinger桥,与树木结构成本和Entropic Wasserstein Barycentres 相匹配 2506.17197v1 -
1029 06-20 Optimal Implicit Bias in Linear Regression Optimale Implizite Bias bei linearer Regression 线性回归中的优化隐含比值 2506.17187v1 -
1030 06-20 Variational Learning of Disentangled Representations Variationelles Lernen von entfremdeten Repräsentationen 不同代表的不同学习 2506.17182v1 -
1031 06-20 Convergent Linear Representations of Emergent Misalignment Convergent Lineare Darstellungen von Emergent Fehlausrichtung 新出现的对接不均现象的一致线性代表 2506.11618v2 -
1032 06-20 Deep generative models as the probability transformation functions Tiefe generative Modelle als die Wahrscheinlichkeitstransformationsfunktionen 深基因模型作为概率转换功能 2506.17171v1 -
1033 06-20 EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback EF21 mit Glocken & Pfeifen: Sechs algorithmische Erweiterungen des modernen Fehlerrückblicks EF21 配有 “ 钟声和吹哨:现代错误反馈的六种演算扩展 2110.03294v2 -
1034 06-20 A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models Minimalistische Methode zur Feinabstimmung von Text-zu-Bild-Diffusions-Modellen 微微调文本到图像传播模型的微量微调方法 2506.12036v2 -
1035 06-20 Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity Sparse-Reg: Verbesserung der Probenkomplexität im Offline-Verstärkungs-Lernen mit Sparsity 利用公平性改进离线强化学习的抽样复杂性 2506.17155v1 -
1036 06-20 Do We Need Large VLMs for Spotting Soccer Actions? Brauchen wir große VLMs zum Spotting von Fußball-Aktionen? 我们是否需要大VLMs来发现足球行动? 2506.17144v1 -
1037 06-20 Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models Konsequente Probenahme und Simulation: Molekulare Dynamik mit energiebasierten Diffusionsmodellen 一致的取样和模拟:以能源为基础的扩散模型的分子动态 2506.17139v1 -
1038 06-20 Robust Training with Data Augmentation for Medical Imaging Classification Robustes Training mit Datenvergrößerung für die Klassifikation der medizinischen Bildgebung 医学成像分类数据强化强力培训 2506.17133v1 -
1039 06-20 Rapid and Continuous Trust Evaluation for Effective Task Collaboration Through Siamese Model Schnelle und kontinuierliche Vertrauensbewertung für effektive Aufgabenkooperation durch Siamesisches Modell 通过西亚模式对有效任务协作进行快速和持续信任评价 2506.17128v1 -
1040 06-20 Watermarking Language Models through Language Models Wasserzeichen von Sprachmodellen durch Sprachmodelle 通过语言模型建立语言模型 2411.05091v2 -
1041 06-20 TransDreamerV3: Implanting Transformer In DreamerV3 TransDreamerV3: Implantationstransformator in DreamerV3 TransDreamerV3: 在梦中植入变异器 2506.17103v1 -
1042 06-20 Identifiability of Deep Polynomial Neural Networks Identifizierbarkeit von tiefpolynomischen neuralen Netzwerken 深多元神经网络的可识别性 2506.17093v1 -
1043 06-20 Domain Specific Benchmarks for Evaluating Multimodal Large Language Models Domainspezifische Benchmarks für die Bewertung multimodaler Großsprachenmodelle 评价多模式大语言模式的具体域域基准 2506.12958v2 -
1044 06-20 Neural Polar Decoders for DNA Data Storage Neuronale Polardecoder für die DNA-Datenspeicherung DNA数据存储的神经极极代号 2506.17076v1 -
1045 06-20 Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting Diffusion & Adversarial Schrödinger Brücken über iterative Proportionale Markovian Fitting 通过迭代比例相称马尔科维安健身桥 2410.02601v3 -
1046 06-20 Al-Khwarizmi: Discovering Physical Laws with Foundation Models Al-Khwarizmi: Physikalische Gesetze mit Stiftungsmodellen entdecken Al-Khwarizmi:利用基金会模式发现实体法 2502.01702v2 -
1047 06-20 Safe Guaranteed Exploration for Non-linear Systems Sichere, garantierte Exploration für nichtlineare Systeme 非线性系统安全保证勘探 2402.06562v2 -
1048 06-20 Empowering Near-Field Communications in Low-Altitude Economy with LLM: Fundamentals, Potentials, Solutions, and Future Directions Stärkung der Nahfeldkommunikation in Low-Altitude Economy mit LLM: Grundlagen, Potenziale, Lösungen und Zukunftsrichtungen 以LLM:基础、潜力、解决方案和未来方向,增强低度经济中近地通信能力 2506.17067v1 -
1049 06-20 Flow-Based Non-stationary Temporal Regime Causal Structure Learning Fließbasiertes nicht-stationäres Temporalregime Kausalstrukturlernen 以流动为基础的非静止不流动时间制度因果结构学习 2506.17065v1 -
1050 06-20 Client Selection Strategies for Federated Semantic Communications in Heterogeneous IoT Networks Kundenauswahlstrategien für die gefederte semantische Kommunikation in heterogenen IoT-Netzwerken 异源性互联网网络中联邦语义通信的客户选择战略 2506.17063v1 -
1051 06-20 SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification SAFEx: Analysieren von Schwachstellen von MoE-basierten LLMs durch stabile sicherheitskritische Expertenidentifikation SAFEx:通过稳定安全-关键专家鉴定分析以MoE为基础的LLMs的脆弱性 2506.17368v1 -
1052 06-20 Universal Music Representations? Evaluating Foundation Models on World Music Corpora Universal Music Representations? Bewertung von Stiftungsmodellen auf World Music Corpora 世界音乐公司模型评估基金会 2506.17055v1 -
1053 06-20 From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers Von Konzepten zu Komponenten: Konzept-agnostische Aufmerksamkeit Modul Entdeckung in Transformatoren 从概念到组成部分:在变异器中发现概念 – – 不可接受注意模块 2506.17052v1 -
1054 06-20 Navigating the Deep: Signature Extraction on Deep Neural Networks Navigieren der Tiefe: Signaturextraktion auf tiefen neuralen Netzwerken 深层导航:深神经网络的签名提取 2506.17047v1 -
1055 06-20 MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models MUCAR: Benchmarking der multilingualen Cross-Modal Ambiguity Auflösung für multimodale große Sprachmodelle MUCAR:为多模式大语言模型制定多语言跨模式模糊分辨率基准 2506.17046v1 -
1056 06-20 Problem Space Transformations for Out-of-Distribution Generalisation in Behavioural Cloning Problemraumtransformationen für die Verallgemeinerung außerhalb der Verteilung im Verhaltens-Klonen 行为性克隆中传播外普遍化的空间转变问题 2411.04056v2 -
1057 06-20 COS-DPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework COS-DPO: Bedingtes eins-shot Multi-Objective Fine-Tuning Framework COS-DPO: 有条件的单片多目标微调框架 2410.08316v3 -
1058 06-20 MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection MAWIFlow Benchmark: Realistische flussbasierte Bewertung für Netzwerkintrusionserkennung MAWIFlow 基准:对网络入侵探测的现实流动评价 2506.17041v1 -
1059 06-20 LSCD: Lomb-Scargle Conditioned Diffusion for Time series Imputation LSCD: Lomb-Scargle Conditioned Diffusion für Zeitreihen Imputation LSCD: 用于时间序列的有附加条件的激光扩散 2506.17039v1 -
1060 06-20 Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure Prediction Bayesisches gemeinsames Modell von Multi-Sensor- und Failure Event-Daten für Multi-Mode Failure Prediction 多种模式故障预测多传感器和故障事件多发生数据的贝叶斯联合模型 2506.17036v1 -
1061 06-20 Critical Appraisal of Fairness Metrics in Clinical Predictive AI Kritische Bewertung von Fairness-Metriken in klinisch vorausschauender KI 临床预测性人工智能中的公平度量 2506.17035v1 -
1062 06-20 Conditional Front-door Adjustment for Heterogeneous Treatment Assignment Effect Estimation Under Non-adherence Bedingte Front-Tür-Anpassung für heterogene Behandlung Zuordnungseffektschätzung unter Nichtbefolgung 不遵守规定情况下异质性治疗分配效果估计的条件性前门调整 2505.05677v3 -
1063 06-20 Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment Skalierbares und zuverlässiges Multi-Agenten-Verstärkungslernen für die Verkehrszuweisung 可缩放和可靠的多试剂交通分配强化学习 2506.17029v1 -
1064 06-20 Zero-shot Class Unlearning via Layer-wise Relevance Analysis and Neuronal Path Perturbation Null-Schuss-Klasse Entlernen über schichtweise Relevanz Analyse und neuronale Path Perturbation 通过从图层角度的关联性分析和神经路径干扰,零中弹的班级取消学习 2410.23693v2 -
1065 06-20 Eau De $Q$-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning Eau De $Q$-Network: Adaptive Destillation von neuralen Netzwerken im Deep Reinforcement Learning Eau de $Q$-网络:深强化学习中神经网络的适应性蒸馏 2503.01437v2 -
1066 06-20 A Quantile Regression Approach for Remaining Useful Life Estimation with State Space Models Ein Quantile Regressionsansatz für verbleibende sinnvolle Lebensschätzung mit State Space Models 国家空间模型中剩余使用寿命估计的量化回归方法 2506.17018v1 -
1067 06-20 The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation Die versteckten Kosten eines Bildes: Quantifizierung des Energieverbrauchs von KI-Bilderzeugung 图像的隐藏成本:对AI图像生成的能源消耗量进行量化 2506.17016v1 -
1068 06-20 Simulating Correlated Electrons with Symmetry-Enforced Normalizing Flows Simulieren korrelierter Elektronen mit Symmetrie-verstärkten Normalisierungsströmen 以对称强制正常化流程模拟与Cor相关电子 2506.17015v1 -
1069 06-20 Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators Robustes Verstärkungslernen für diskrete kompositorische Generierung über allgemeine Soft Operatoren 通过一般软操作员为分辨合成生成进行强力强化学习 2506.17007v1 -
1070 06-20 Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments Prmpt2Adpt: Promptbasierte Zero-Shot-Domain-Anpassung für ressourcenbeschränkte Umgebungen 受资源限制的环境的快速零热域适应 2506.16994v1 -
1071 06-20 CoIFNet: A Unified Framework for Multivariate Time Series Forecasting with Missing Values CoIFNet: Ein einheitliches Framework für die Multivariate Zeitreihenprognose mit fehlenden Werten CoIFNet:多变时间序列缺值预测统一框架 2506.13064v2 -
1072 06-20 SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments SHAKTI: Ein 2,5 Milliarden Parameter kleines Sprachmodell optimiert für Edge-KI und Low-Resource-Umgebungen SHAKTI:为边缘AI和低资源环境优化的25亿参数小语言模型 2410.11331v2 -
1073 06-20 The learned range test method for the inverse inclusion problem Die Lernbereich-Testmethode für das inverse Inklusion-Problem 反包容问题的学习范围测试方法 2411.00463v2 -
1074 06-20 Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond Sprachengpässe-Modelle: Ein Rahmen für interpretierbares Wissen auf Tracing und darüber hinaus 语言瓶颈模式:可解释知识追踪框架及以后 2506.16982v1 -
1075 06-20 Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction Gegurtetes und ensembled neurales Netzwerk für lineare und nichtlineare Dimensionsreduktion 内线和非线性足够尺寸减少带带和组合的神经网络 2412.08961v2 -
1076 06-20 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework Polysemantik mit PRISM erfassen: Ein Multi-Konzept-Feature Beschreibung Framework 利用PRISM获得多边性能:多概念特征描述框架 2506.15538v2 -
1077 06-20 Latent Concept Disentanglement in Transformer-based Language Models Latent Concept Disentanglement in Transformer-basierten Sprachmodellen 以变换器为基础的语言模型中的边端概念分解 2506.16975v1 -
1078 06-20 Mask-PINNs: Regulating Feature Distributions in Physics-Informed Neural Networks Masken-PINNs: Regelbare Funktionsverteilungen in physikinformierten Neuronalen Netzwerken Mask-PINNs:物理成形神经网络中规范地物分布 2505.06331v2 -
1079 06-20 PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval PromptDSI: Prompt-basiert Probefrei Instance-wise Incremental Learning for Document Retrieval 快速DSI:为文件检索进行基于即时的无排练-不重复式递增学习 2406.12593v3 -
1080 06-20 RocketStack: A level-aware deep recursive ensemble learning framework with exploratory feature fusion and model pruning dynamics RocketStack: Ein level-aware tiefe rekursives Ensemble Lernrahmen mit Sondierungsfunktion Fusion und Modellschneiden Dynamik 火箭堆:一个具有探索性聚集和模型排出动态的深深有觉知的循环深层共聚学习框架 2506.16965v1 -
1081 06-20 LogProber: Disentangling confidence from contamination in LLM responses LogProber: Entwirren des Vertrauens in LLM-Antworten 日志Prober:解除对LLM反应中污染的信心 2408.14352v3 -
1082 06-20 Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review Methoden des maschinellen Lernens für Anwendungen der kleinen Daten- und Upstream-Bioverarbeitung: Ein umfassender Überblick 小型数据和上游生物处理应用的机械学习方法:全面审查 2506.12322v2 -
1083 06-20 LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models LAION-C: Ein Out-of-Distribution-Benchmark für Web-Scale Vision-Modelle LAION-C:网络规模愿景模型的分发外基准 2506.16950v1 -
1084 06-20 Solving a class of stochastic optimal control problems by physics-informed neural networks Lösung einer Klasse stochastischer optimaler Kontrollprobleme durch physikinformierte neuronale Netzwerke 通过物理知情神经网络解决一系列随机最佳控制问题 2402.15592v2 -
1085 06-20 Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs Kalibrierte vorausschauende untere Bounds zur Zeit-zu-Unsicher-Probenahme in LLMs LLM 中时间到非安全抽样时对低频谱校准的预测值下下界 2506.13593v2 -
1086 06-20 Gaussian Processes and Reproducing Kernels: Connections and Equivalences Gaußsche Prozesse und reproduzierende Kerne: Verbindungen und Äquivalenzen 高斯进程和再生产核心:连接和等效 2506.17366v1 -
1087 06-20 Enhancing Expressivity of Quantum Neural Networks Based on the SWAP test Steigerung der Expressivität von Quantum-Neuralen Netzwerken auf Basis des SWAP-Tests 根据全部门办法测试,提高量子神经网络的表达性 2506.16938v1 -
1088 06-20 A deep learning and machine learning approach to predict neonatal death in the context of São Paulo Ein tiefer Lern- und maschineller Lernansatz zur Vorhersage des neonatalen Todes im Kontext von São Paulo 在圣保罗背景下预测新生儿死亡的深层学习和机器学习方法 2506.16929v1 -
1089 06-20 A Neural Operator based Hybrid Microscale Model for Multiscale Simulation of Rate-Dependent Materials Ein neurales Operator-basiertes Hybrid-Mikroskalen-Modell zur Multiskalen-Simulation von ratenabhängigen Materialien 以神经操作器为基础的多级制模调依赖材料多级模拟混合微型模型 2506.16918v1 -
1090 06-20 Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs Robuste Gradienten für POMDPs mit verstecktem Modell 隐藏模式 POMDPs 的硬性有限记忆政策梯度 2505.09518v2 -
1091 06-20 From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts Von Daten zu Wissen: Bewertung, wie effizient Sprachmodelle Fakten lernen 从数据到知识:评价如何高效语言模式学习事实 2506.16912v1 -
1092 06-20 Graph is all you need? Lightweight data-agnostic neural architecture search without training Graph ist alles, was Sie brauchen? Leichte daten-agnostische neuronale Architektur-Suche ohne Training 轻量数据神经神经结构搜索,不经过训练 2405.01306v2 -
1093 06-20 RCNet: $ΔΣ$ IADCs as Recurrent AutoEncoders RCNet: $ΔΣ$ IADCs als recurrent AutoEncoder RCNet:作为循环自编码器的 $ΔΣ$ IADCs 2506.16903v1 -
1094 06-20 On Almost Surely Safe Alignment of Large Language Models at Inference-Time Zur fast sicher sicheren Ausrichtung großer Sprachmodelle bei Inferenz-Time 在推断时几乎可以安全地统一大语言模型 2502.01208v3 -
1095 06-20 With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You Mit begrenzten Daten für multimodale Ausrichtung, lassen Sie die STRUKTUR-Leitfaden Sie 以有限数据实现多式联运对齐,让结构引导你 2506.16895v1 -
1096 06-20 LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment LearnAlign: Grundlegende Datenauswahl für Verstärkungslernen in großen Sprachmodellen basierend auf verbesserter Gradient Alignment 学习对称:根据改进梯度对齐,为在大语言模式中强化学习选择理由数据 2506.11480v2 -
1097 06-20 From Lab to Factory: Pitfalls and Guidelines for Self-/Unsupervised Defect Detection on Low-Quality Industrial Images Vom Labor zur Fabrik: Pitfalls und Richtlinien für selbst-/unüberwachte Fehlererkennung auf niederqualitativen Industriebildern 从实验室到工厂:坑和低质量工业形象自我/不受监督的缺陷探测准则 2506.16890v1 -
1098 06-20 Stable Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders Stabiles Lernen mit Spiking Neuronal Networks ausgestattet mit Affine Encodern und Decodern 利用利用配有仙形编码器和代碼器的螺旋神经网络进行的稳定学习 2404.04549v3 -
1099 06-20 Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension Diskrepanzen sind Tugend: Schwach-zu-starke Verallgemeinerung durch Lens der Intrinsischen Dimension 差异是道德:通过内分泌尺寸的透镜对电压的微弱普遍化 2502.05075v5 -
1100 06-20 The Importance of Being Lazy: Scaling Limits of Continual Learning Die Bedeutung des Faulseins: Skalierungsgrenzen des kontinuierlichen Lernens 懒惰的重要性:持续学习的局限性 2506.16884v1 -
1101 06-20 Efficient Feedback Gate Network for Hyperspectral Image Super-Resolution Effizientes Feedback Gate-Netzwerk für Hyperspektrale Bild-Super-Resolution 超光谱图像超分辨率高效反馈门户网 2506.17361v1 -
1102 06-20 A Statistical Evaluation of Indoor LoRaWAN Environment-Aware Propagation for 6G: MLR, ANOVA, and Residual Distribution Analysis Eine statistische Auswertung von Indoor LoRaWAN Environment-Aware Propagation für 6G: MLR, ANOVA und Residual Distribution Analysis 6G:MLR、ANOVA和残余分布分析的室内LORAWAN环境-软件传播统计评价 2504.16688v3 -
1103 06-20 Training Multi-Layer Binary Neural Networks With Local Binary Error Signals Training Multi-Layer Binär-Neural-Netzwerke mit lokalen Binär-Fehler-Signale 利用本地二进制错误信号,培训多语言二进制神经网络 2412.00119v3 -
1104 06-20 Optimal Depth of Neural Networks Optimale Tiefe der neuralen Netze 神经网络的最佳深度 2506.16862v1 -
1105 06-20 Towards Efficient Few-shot Graph Neural Architecture Search via Partitioning Gradient Contribution Auf dem Weg zu einer effizienten Nur-Shot Graph-Neural-Architektur Suche über Partitionierung Gradient Beitrag 通过分割渐变贡献, 实现高效、 短短截图图像神经结构搜索 2506.01231v2 -
1106 06-20 ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation ICC: 多式数据集曲线的量化图像显示具体度 2403.01306v4 -
1107 06-20 Anomaly Detection in Event-triggered Traffic Time Series via Similarity Learning Anomalie-Erkennung in ereignisgetriggerten Traffic Time-Serien über Ähnlichkeits-Lernen 通过类似学习在事件触发的交通时间序列中异常探测 2506.16855v1 -
1108 06-20 Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models Reward-Agnostic Prompt-Optimierung für Diffusionsmodelle von Text zu Bild 文本到图像传播模型的奖励-不可知迅速优化 2506.16853v1 -
1109 06-20 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation Anpassung beim Lernen: LLMs für wissenschaftliche Probleme mit intelligenter Werkzeugverwendung anpassen 在学习期间适应:通过智能工具使用适应为科学问题定位LLMs 2411.00412v4 -
1110 06-20 Bandwidth Selectors on Semiparametric Bayesian Networks Bandbreiten-Selektoren auf semiparametrischen Bayesischen Netzwerken 半参数贝近地网络上的带宽选择器 2506.16844v1 -
1111 06-20 FedFitTech: A Baseline in Federated Learning for Fitness Tracking FedFitTech: Eine Basis im Federated Learning für Fitness-Tracking FedFitTech:联邦健身跟踪学习基准 2506.16840v1 -
1112 06-20 Beyond Blur: A Fluid Perspective on Generative Diffusion Models Beyond Blur: Eine flüssige Perspektive auf generative Diffusionsmodelle 模糊之外:关于发源传播模型的流透视角 2506.16827v1 -
1113 06-20 Predicting New Research Directions in Materials Science using Large Language Models and Concept Graphs Vorhersage neuer Forschungsrichtungen in der Materialwissenschaft mit großen Sprachmodellen und Konzeptgraphen 利用大语言模型和概念图预测材料科学新研究方向 2506.16824v1 -
1114 06-20 When and How Does CLIP Enable Domain and Compositional Generalization? Wann und wie aktiviert CLIP Domain- und Kompositionsverallgemeinerung? CLIP 何时和如何启用域和组成集约化? 2502.09507v2 -
1115 06-20 Robust Group Anomaly Detection for Quasi-Periodic Network Time Series Robuste Gruppenanomalienerkennung für Quasi-periodische Netzwerk-Zeitreihen 准固定网络自动探测强力组 时间序列 2506.16815v1 -
1116 06-20 Boltzmann Classifier: A Thermodynamic-Inspired Approach to Supervised Learning Boltzmann Klassifikator: Ein thermodynamisch inspirierter Ansatz zum überwachten Lernen Boltzmann 分类: 一种热动力学激励式的受监督学习方法 2505.06753v2 -
1117 06-20 CINNAMON: A hybrid approach to change point detection and parameter estimation in single-particle tracking data CINNAMON: Ein hybrider Ansatz zur Änderung der Punkterkennung und der Parameterschätzung in Einzelteilchen-Tracking-Daten CINNAMON: 改变单粒子跟踪数据中点探测和参数估计的混合方法 2503.14253v2 -
1118 06-20 DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis DVFS-Aware DNN-Schlussfolgerung zu GPUs: Latenzmodellierung und Leistungsanalyse DVFS-Aware DNN GPUs的推论:长期建模和业绩分析 2502.06295v2 -
1119 06-20 Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack Effizient, aber gefährdet: Benchmarking und Defending LLM Batch Prompting Attack 高效但脆弱:基准设定和捍卫LLM批次快速袭击 2503.15551v2 -
1120 06-20 Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective Erforschung und Verbesserung der Initialisierung für tiefe Graphen-Neural-Netzwerke: Eine Signalverbreitungsperspektive 探索和改进深图神经网络的初始化:信号传动视角 2506.16790v1 -
1121 06-20 Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps LoRA durch die Lens of Parameter Redundancy erneut besuchen: Spectral Encoding hilft 通过参数冗余的镜头对 LoRA 进行重审: 光谱编码帮助 2506.16787v1 -
1122 06-20 CodeV-R1: Reasoning-Enhanced Verilog Generation CodeV-R1: Reasoning-verstärkte Verilog-Generierung CodeV-R1:推理增强的Verilog生成 2505.24183v2 -
1123 06-20 What Is the Point of Equality in Machine Learning Fairness? Beyond Equality of Opportunity Was ist der Punkt der Gleichheit in der Fairness des maschinellen Lernens? 机器学习公平中的平等点是什么? 2506.16782v1 -
1124 06-20 SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation SSR-Zero: Einfaches Selbstveredelungslernen für maschinelle Übersetzung 机械翻译简单自评强化学习 2505.16637v3 -
1125 06-20 Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies Können wir Fehler ohne Fehlerdaten erkennen? Ungewissheit-Bewusst Runtime Failure Detection for Imitation Learning Policies 我们能否在无故障数据的情况下检测失败? 用于模拟学习政策的不确定性- 软件运行时故障检测 2503.08558v3 -
1126 06-20 Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations Wissensdestillationsrahmen für die Beschleunigung hochakkurater neuraler Netzwerk-basierter molekularer Dynamiksimulationen 加速高准确度高神经网基分子动态模拟学知识蒸馏框架 2506.15337v2 -
1127 06-20 Metapath-based Hyperbolic Contrastive Learning for Heterogeneous Graph Embedding Metapath-basiertes hyperbolisches Kontrastives Lernen für heterogene Grapheneinbettung 异异异形图形嵌入式的超双曲反对立学习 2506.16754v1 -
1128 06-20 Nature Language Model: Deciphering the Language of Nature for Scientific Discovery Nature Language Model: Die Sprache der Natur für die wissenschaftliche Entdeckung bestimmen 自然语言模型:为科学发现而破除自然语言 2502.07527v3 -
1129 06-20 Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation Off-Policy-Actor-Kritik für adversarische Beobachtung Robustheit: Virtuelles alternatives Training durch symmetrische Politikevaluierung 外部观察强力非政策行为者-批评者:通过对称政策评价进行虚拟替代培训 2506.16753v1 -
1130 06-20 DeepSelective: Interpretable Prognosis Prediction via Feature Selection and Compression in EHR Data DeepSelective: Interpretierbare Prognosevorhersage über Feature Selection und Komprimierung in EHR-Daten DeepSelective:通过EHR数据中的特征选择和压缩,作出可解释的预后预测 2504.11264v2 -
1131 06-20 Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization Konforme Schlussfolgerung unter hochdimensionalen Kovariate Verschiebungen über Likelihood-Ratio Regularisierung 通过传统-拉蒂奥正规化,在高多样性可变性转变下发生非正式推论 2502.13030v4 -
1132 06-20 IsoNet: Causal Analysis of Multimodal Transformers for Neuromuscular Gesture Classification IsoNet: Kausale Analyse multimodaler Transformer für die neuromuskuläre Gestenklassifikation IsoNet:用于神经肌肉手腕分类的多式变形器的因果分析 2506.16744v1 -
1133 06-20 Group-Level Data Selection for Efficient Pretraining Gruppen-Level-Datenauswahl für effizientes Vortraining 高效预科培训的集团一级数据选择 2502.14709v2 -
1134 06-20 Client-Centered Federated Learning for Heterogeneous EHRs: Use Fewer Participants to Achieve the Same Performance Client-Centered Federated Learning for Heterogeneous EHRs: Verwenden Sie weniger Teilnehmer, um die gleiche Leistung zu erreichen 以客户为中心的异种EHR联邦学习:利用较少的参与者实现相同业绩 2404.13318v4 -
1135 06-20 Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening Unwahrscheinliche Belohnung: GRPO über die Verbreitung hinaus schärfen 奖励不理想者:将GRPO提升到分销加压之后 2506.02355v2 -
1136 06-20 Optimism Without Regularization: Constant Regret in Zero-Sum Games Optimismus ohne Regularisierung: Ständiger Bedauern in Null-Sum-Spielen 不带常规的乐观主义:对零-苏姆运动会的一贯悔恨 2506.16736v1 -
1137 06-20 On Training-Test (Mis)alignment in Unsupervised Combinatorial Optimization: Observation, Empirical Exploration, and Analysis On Training-Test (Mis)Ausrichtung in unüberwachter kombinatorischer Optimierung: Beobachtung, empirische Exploration und Analyse 未经监督的组合优化中的培训测试(Miss)调整:观察、经验探索和分析 2506.16732v1 -
1138 06-20 Disentangling and Integrating Relational and Sensory Information in Transformer Architectures Entwirren und Integrieren von relationalen und sensorischen Informationen in Transformer-Architekturen 将关系和感官信息拆解和整合到变换结构中 2405.16727v3 -
1139 06-20 Incentivizing High-quality Participation From Federated Learning Agents Anreize für eine qualitativ hochwertige Beteiligung von Federated Learning Agents 激励来自联邦学习代理机构的高质量参与 2506.16731v1 -
1140 06-20 TriCon-SF: A Triple-Shuffle and Contribution-Aware Serial Federated Learning Framework for Heterogeneous Healthcare Data TriCon-SF: Ein Dreifach-Shuffle und Contribution-Aware Serial Federated Learning Framework für heterogene Gesundheitsdaten TriCon-SF: 不同基因保健数据三维和贡献软件系列联邦学习框架 2506.16723v1 -
1141 06-20 DRARL: Disengagement-Reason-Augmented Reinforcement Learning for Efficient Improvement of Autonomous Driving Policy DRARL: Entflechtung-Verstärkung-Verstärkung-Lernen zur effizienten Verbesserung der autonomen Fahrpolitik DRARL: 为有效改进自主驾驶政策而加强学习 2506.16720v1 -
1142 06-20 Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback Automatisierte Skill Discovery für Sprachagenten durch Exploration und iteratives Feedback 通过探索和迭回反馈自动发现语言物剂技能 2506.04287v2 -
1143 06-20 Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness Multi-Agenten-Debatte als Test-Time Scaling: Eine systematische Studie der bedingten Wirksamkeit 重新审议作为试验时间尺度的多机构辩论:对有条件有效性的系统研究 2505.22960v2 -
1144 06-20 Info-Coevolution: An Efficient Framework for Data Model Coevolution Info-Coevolution: Ein effizienter Rahmen für die Datenmodellkoevolution 信息革命:数据模型革命的有效框架 2506.08070v2 -
1145 06-20 How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension Wie viele Domains genügen für Domain Generalization? Eine enge Charakterisierung über die Domain Shattering Dimension 多少个域足以实现域泛化?通过域打散维度的紧致刻画 2506.16704v1 -
1146 06-20 SIDE: Semantic ID Embedding for effective learning from sequences SIDE: Semantische ID Einbetten für effektives Lernen aus Sequenzen 语义识别码嵌入,以便从序列中有效学习 2506.16698v1 -
1147 06-20 Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach Verständnis und Reduzierung der klassenabhängigen Effekte von Datenvergrößerung mit einem Zwei-Spieler-Spiel-Ansatz 理解和减少数据递增的二级依赖影响,采用双层游戏方法 2407.03146v4 -
1148 06-20 Fast and Stable Diffusion Planning through Variational Adaptive Weighting Schnelle und stabile Diffusionsplanung durch variationale adaptive Gewichtung 通过变式适应性重力规划快速和稳定扩散 2506.16688v1 -
1149 06-20 Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections Compliant Residual DAgger: Verbesserung der Real-World Kontakt-Rich-Manipulation mit menschlichen Korrekturen 共同残存挖掘者:改进现实世界接触-Rich 人教管管管 2506.16685v1 -
1150 06-20 How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions Wie Sie Ihr Text-zu-Image-Modell trainieren: Bewertung von Design-Optionen für synthetische Trainingsbilder 如何培训您的文本到图像模型:评估合成培训说明的设计选择 2506.16679v1 -
1151 06-20 Open-Set Graph Anomaly Detection via Normal Structure Regularisation Open-Set Graph Anomalie Erkennung durch Normalstruktur Regularisierung 通过正常结构规范化进行开放版图异常检测 2311.06835v5 -
1152 06-20 Kinetics: Rethinking Test-Time Scaling Laws Kinetik: Überdenken von Test-Zeit-Skalierungsgesetzen 动因:重新思考试验时间扩增法 2506.05333v3 -
1153 06-20 RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations RL2Grid: Benchmarking-Verstärkung im Netzbetrieb RL2Grid:在电力网业务中确定加强学习的基准 2503.23101v2 -
1154 06-20 Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models Adaptive Anleitung beschleunigt die Stärkung des Lernens von Vernunftmodellen 适应性指导加速加速强化理性模型学习 2506.13923v2 -
1155 06-20 The Hitchhiker’s Guide to Efficient, End-to-End, and Tight DP Auditing Der Hitchhiker-Leitfaden für effizientes, Ende-zu-Ende und enges DP-Auditing Hitchhiker的《高效、最终到最终和严格DP审计指南》 2506.16666v1 -
1156 06-20 Private Training & Data Generation by Clustering Embeddings Privates Training & Datengenerierung durch Clustering-Embeddings 通过集群化嵌入进行私营培训和数据生成 2506.16661v1 -
1157 06-20 A Minimalist Optimizer Design for LLM Pretraining Minimalistisches Optimizer-Design für LLM Pretraining LLM 培训前最起码的优化剂设计 2506.16659v1 -
1158 06-20 Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards Multi-Armed Bandits mit maschinellem Lernen-erzeugte Surrogate Belohnungen 多装甲强盗和机器学习优于学习的代金奖 2506.16658v1 -
1159 06-20 Near Optimal Decision Trees in a SPLIT Second Nahe Optimale Entscheidung Bäume in einem SPLIT zweite SPLIT 秒中接近最佳决定树 2502.15988v2 -
1160 06-19 (4) Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures Relationales Deep Learning: Herausforderungen, Grundlagen und Architekturen der nächsten Generation 关系深层学习:挑战、基础和下一代建筑 2506.16654v1 -
1161 06-19 Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials Integration dynamischer Systeme Lernen mit Basismodellen: Ein Meta-Evolutionäres KI-Framework für klinische Studien 将动态系统学习与基础模型相结合:临床试验的非革命性AI框架 2506.14782v2 -
1162 06-19 LLMs in Coding and their Impact on the Commercial Software Engineering Landscape LLMs in Coding und ihre Auswirkungen auf die kommerzielle Software-Engineering-Landschaft 编码及其对商业软件工程景观的影响 2506.16653v1 -
1163 06-19 CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity CodeDiffuser: Aufmerksamkeitsverstärkte Diffusionspolitik über VLM-generierten Code für Instruction Ambiguity 代码用户:通过VLM - 教育结构设计守则加强关注 - 强化传播政策 2506.16652v1 -
1164 06-19 A Distributional-Lifting Theorem for PAC Learning Ein Distributional-Lifting-Theorem für PAC-Lernen PAC 学习的分布式放行理论 2506.16651v1 -
1165 06-19 Distributional Adversarial Loss Verlust des Verteilungsgefälles 分布相对损 损 2406.03458v2 -
1166 06-19 Semantic Outlier Removal with Embedding Models and LLMs Semantic Outlier Entfernung mit Einbetten Modelle und LLMs 带有嵌入型模型和LLMs的语义外外部清除 2506.16644v1 -
1167 06-19 Learning to Route LLMs with Confidence Tokens Lernen, LLMs mit vertrauensvollen Token zu routen 学习使用充满信心的LLMs路线 2410.13284v3 -
1168 06-19 Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions Low-Resource-Video-Super-Resolution mit Speicher, Wavelets und deformierbare Konvolutionen 使用记忆、波子和变形革命的低资源视频超级分辨率 2502.01816v3 -
1169 06-19 Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation Latent Noise Injection für die private und statistisch ausgerichtete Synthetische Datengenerierung 私人和统计上统一合成数据生成的热点喷射器 2506.16636v1 -
1170 06-19 Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts Lion Secretly Solves Constrained Optimization: Wie Lyapunov voraussagt 限制优化:如Lyapunov预测 2310.05898v6 -
1171 06-19 Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data Erlernen ursächlich vorhersehbarer Ergebnisse aus Psychiatrischen Langzeitdaten 精神病纵向数据产生的可预期的学习结果 2506.16629v1 -
1172 06-19 Initial Investigation of LLM-Assisted Development of Rule-Based Clinical NLP System Erste Untersuchung der LLM-Assistenten Entwicklung eines regelbasierten klinischen NLP-Systems 利用LLM协助开发有章可循的临床NLP系统的初步调查 2506.16628v1 -
1173 06-19 FlatCAD: Fast Curvature Regularization of Neural SDFs for CAD Models FlatCAD: Schnelle Curvature Regularisierung von neuralen SDFs für CAD-Modelle FlatCAD: CAD 模型的神经SDF 快速曲线常规化 2506.16627v1 -
1174 06-19 Harmonizing Safety and Speed: A Human-Algorithm Approach to Enhance the FDA’s Medical Device Clearance Policy Harmonisierung von Sicherheit und Geschwindigkeit: Ein Mensch-Algorithmus-Ansatz zur Verbesserung der Sicherheitspolitik für medizinische Geräte der FDA 统一安全和速度:采取人类-逻辑方法,加强林业发展局的医疗设备清理政策 2407.11823v2 -
1175 06-19 MonoSOWA: Scalable monocular 3D Object detector Without human Annotations MonoSOWA: Skalierbarer monookularer 3D Objektdetektor ohne menschliche Anmerkungen MonoSOWA:无人说明的可缩缩的单镜3D物体探测器 2501.09481v3 -
1176 06-19 Distribution Parameter Actor-Critic: Shifting the Agent-Environment Boundary for Diverse Action Spaces Verteilungsparameter Aktor-Kritik: Verschiebung der Agent-Umwelt-Grenze für unterschiedliche Aktionsräume 分布参数 Actor-Critic: 改变不同行动空间的代理环境边界 2506.16608v1 -
1177 06-19 SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics SlepNet: Spektrales Subgraphenrepräsentationslernen für neurale Dynamik SlepNet:神经动力学光谱子图示学习 2506.16602v1 -
1178 06-19 FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE FLAME: Auf dem Weg zu Federated Fine-Tuning großen Sprachmodellen durch adaptive SMoE FLAME:通过适应性SMOE,走向联邦微调大语言模式 2506.16600v1 -
1179 06-19 DRIVE Through the Unpredictability: From a Protocol Investigating Slip to a Metric Estimating Command Uncertainty DRIVE durch die Unvorhersehbarkeit: Von einem Protokoll, das Slip untersucht, zu einem Metric Estimating Command Uncertainty 无法预测:从协议调查滑坡到计量估计命令不确定性 2506.16593v1 -
1180 06-19 Energy-Based Transfer for Reinforcement Learning Energiebasierter Transfer für verstärktes Lernen 强化学习以能源为基础的转让 2506.16590v1 -
1181 06-19 Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework Messung eines (ausreichenden) Weltmodells in LLMs: Ein Rahmen für die Abweichungszersetzung 计量(足够)LLMM世界模型:差异分解框架 2506.16584v1 -
1182 06-19 A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning A Impliziert B: Schaltungsanalyse in LLMs für propositionelle logische Vernunft A Implies B: 用于推定逻辑理由的LLMLM的电路分析 2411.04105v4 -
1183 06-19 From Semantic To Instance: A Semi-Self-Supervised Learning Approach Von semantisch bis instance: Ein halbselbstüberwachter Lernansatz 从语义到实例:半自监督的学习方法 2506.16563v1 -
1184 06-19 ChatDBG: Augmenting Debugging with Large Language Models ChatDBG: Augmenting Debugging mit großen Sprachmodellen 聊天DBG: 使用大语言模式加强调试 2403.16354v5 -
1185 06-19 One Sample is Enough to Make Conformal Prediction Robust Eine Probe reicht aus, um konforme Vorhersagen robust zu machen 一个样本就足够制造 共创预测力了 2506.16553v1 -
1186 06-19 A Free Probabilistic Framework for Analyzing the Transformer-based Language Models Ein freier probabilistischer Rahmen für die Analyse der transformerbasierten Sprachmodelle 分析以变换器为基础的语言模型的自由概率框架 2506.16550v1 -
1187 06-19 Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU Herr Snuffleupagus bei SemEval-2025 Task 4: Unlearning Factual Knowledge von LLMs mit adaptiver RMU Snuffleupagus先生在SemEval-2025任务4:从利用适应性RMU的LLMs中汲取事实知识 2506.16548v1 -
1188 06-19 BIDA: A Bi-level Interaction Decision-making Algorithm for Autonomous Vehicles in Dynamic Traffic Scenarios BIDA: Ein Zwei-Ebenen-Interaktionsentscheidungs-Algorithmus für autonome Fahrzeuge in dynamischen Verkehrsszenarien BIDA:动态交通情况中机动车辆的双级互动决策比额 2506.16546v1 -
1189 06-19 Essential-Web v1.0: 24T tokens of organized web data Essential-Web v1.0: 24T Token von organisierten Web-Daten 基本Web v1.0: 24个有组织网络数据标记 2506.14111v2 -
1190 06-19 On the Robustness of Decision-Focused Learning Zur Robustheit des entscheidungsorientierten Lernens 关于决策重点学习的有力性 2311.16487v4 -
1191 06-19 Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches Ausrichtung der ASR-Bewertung auf menschliche und LLM-Richtungen: Intelligibilitätsmetrics mit phonetischen, semantischen und NLI-Anflügen 将ASR评价与人类和LLM判决:使用电话、语义和NLI方法的智能计量学 2506.16528v1 -
1192 06-19 Improvement of Nuclide Detection through Graph Spectroscopic Analysis Framework and its Application to Nuclear Facility Upset Detection Verbesserung der Nuklid-Erkennung durch Graph Spektroskopische Analyserahmen und ihre Anwendung auf Kernanlagen-Auffangerkennung 通过图谱光谱分析框架及其适用于核设施爆裂探测的图示光谱分析框架改进核子分子探测 2506.16522v1 -
1193 06-19 Robust Reward Modeling via Causal Rubrics Robuste Reward-Modellierung über Kausalrubriken 通过果实卢布建模的强力奖赏模型 2506.16507v1 -
1194 06-19 Subspace-Boosted Model Merging Subraum-beschleunigtes Modell Zusammenführen 子空间叠装模型合并 2506.16506v1 -
1195 06-19 SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity SparseLoRA: LLM-Fine-Tuning mit Kontextsparsität beschleunigen 加快LLM与上下文质量的精细调整 2506.16500v1 -
1196 06-19 ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning ML-Master: Auf dem Weg zu KI durch Integration von Exploration und Vernunft ML-Master:通过综合探讨和理由,争取AI为AI 2506.16499v1 -
1197 06-19 QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation QG-SMS: Verbesserung der Testobjektanalyse durch Studentenmodellierung und Simulation QG-SMS:通过学生建模和模拟加强测试物品分析 2503.05888v2 -
1198 06-19 Manifold Learning for Personalized and Label-Free Detection of Cardiac Arrhythmias Manifold Learning für personalisierte und etikettenfreie Erkennung von Herzrhythmusstörungen 用于个性化、无标签心律失常检测的流形学习 2506.16494v1 -
1199 06-19 Competing Bandits in Decentralized Contextual Matching Markets Konkurrieren von Banditen in dezentralisierten Kontext-Matching-Märkten 分散环境匹配市场中相互竞争的强盗 2411.11794v2 -
1200 06-19 Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection Auf dem Weg zu allgemeingültigen allgemeinen schädlichen Sprachdatensätzen für Implizite Hass-Spracherkennung 争取建立通用通用通用有害言论数据集,用于隐含仇恨言论探测 2506.16476v1 -
1201 06-19 Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining Human2LocoMan: Vielseitige Quadrupedalmanipulation mit menschlichem Vortraining lernen 人类2 Locoman: 学习与人类预科培训一起四步操作 2506.16475v1 -
1202 06-19 Boosting multi-demographic federated learning for chest radiograph analysis using general-purpose self-supervised representations Förderung des multidemografischen föderierten Lernens für die Röntgenanalyse in der Brust mittels selbstüberwachter Darstellungen für allgemeine Zwecke 利用普通用途自我监督的表述方式,促进多人口联合会学习胸部射电分析 2504.08584v2 -
1203 06-19 AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation AlphaTrans: Ein neuro-symbolischer Kompositionsansatz für Repository-Level-Code-Übersetzung und Validierung AlphaTrans: 存储层代码翻译和校验的神经-交元组合法 2410.24117v5 -
1204 06-19 Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities Progressive Inferenz-Zeit Annealing von Diffusionsmodellen für die Probenahme von Boltzmann Dichten Boltzmann大区采样扩散模型的逐步推导-及时销毁 2506.16471v1 -
1205 06-19 Human-like Forgetting Curves in Deep Neural Networks Menschenähnliche vergessende Kurven in tiefen neuralen Netzwerken 人类在深神经网络中忘记曲线 2506.12034v2 -
1206 06-19 Black-Box Privacy Attacks on Shared Representations in Multitask Learning Black-Box-Datenschutzangriffe auf geteilte Repräsentationen im Multitasking-Lernen 在多任务学习中分享代表的黑人隐私攻击 2506.16460v1 -
1207 06-19 Joint Tensor-Train Parameterization for Efficient and Expressive Low-Rank Adaptation Gemeinsame Tensor-Train-Parameterisierung für effiziente und Expressive Low-Rank-Anpassung 高效和表现式低射速适应联合登机机-培训参数 2506.16456v1 -
1208 06-19 Consumer-friendly EEG-based Emotion Recognition System: A Multi-scale Convolutional Neural Network Approach Consumer-friendly EEG-based Emotion Recognition System: Ein multi-scale Convolutional Neural Network Ansatz 消费者友好的基于脑电图的情绪识别系统:多尺度卷积神经网络方法 2506.16448v1 -
1209 06-19 Leveraging Influence Functions for Resampling Data in Physics-Informed Neural Networks Nutzung von Einflussfunktionen für die Resampling-Daten in physikinformierten Neuronalen Netzwerken 利用物理内成形神经网络中恢复数据取样的利用影响功能 2506.16443v1 -
1210 06-19 PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding PerceptionLM: Open-Access-Daten und Modelle für ein detailliertes visuelles Verständnis 感知LM:开放存取数据和详细视觉理解模型 2504.13180v2 -
1211 06-19 An efficient neuromorphic approach for collision avoidance combining Stack-CNN with event cameras Ein effizienter neuromorpher Ansatz zur Kollisionsvermeidung, der Stack-CNN mit Eventkameras kombiniert 将Stack-CNN与事件摄像头相结合的一种高效的避免碰撞神经形态法 2506.16436v1 -
1212 06-19 Agentic Personalisation of Cross-Channel Marketing Experiences Agentische Personalisierung von Cross-Channel-Marketing-Erfahrungen 跨渠道营销经验的代理个性化 2506.16429v1 -
1213 06-19 EFormer: An Effective Edge-based Transformer for Vehicle Routing Problems EFormer: Ein effektiver Edge-basierter Transformer für Fahrzeugrouting-Probleme Eformer:车辆运行问题的有效边缘变异器 2506.16428v1 -
1214 06-19 Quantifying artificial intelligence through algorithmic generalization Quantifizierung künstlicher Intelligenz durch algorithmische Verallgemeinerung 通过算法一般化对人工智能进行量化 2411.05943v2 -
1215 06-19 ALTA: Compiler-Based Analysis of Transformers ALTA: Compiler-basierte Analyse von Transformatoren ALTA:以汇编者为基础对变形器的分析 2410.18077v2 -
1216 06-19 Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models MoE Router optimieren: Design, Implementierung und Evaluation in Transformer-Modellen 优化教育部优化路由器:变革型模型的设计、实施和评价 2506.16419v1 -
1217 06-19 On Continuous Monitoring of Risk Violations under Unknown Shift Kontinuierliche Überwachung von Risikoverletzungen unter unbekannter Verschiebung 关于根据未知轮移持续监测违反风险情况 2506.16416v1 -
1218 06-19 When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework Wann funktioniert Trennen und Erobern für den langen Kontext LLM? Ein Lärmzersetzungsrahmen 何时分化和征服工作为长期LLM服务? 噪音分解框架 2506.16411v1 -
1219 06-19 CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning CLOUD: Ein skalierbares und physikinformiertes Grundmodell für das Kristalldarstellungslernen CLOUD: 水晶代表制学习的可缩缩化和物理成形基础模型 2506.17345v1 -
1220 06-19 Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression Breaking the Compression Ceiling: Datenfreie Pipeline für ultraeffiziente Delta-Kompression 打破压缩上限:超有效三角洲压缩无数据管道 2505.13563v2 -
1221 06-19 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights Drag-and-Drop LLMs: Nullschnelle Prompt-zu-Gewichte 拖放LMs: 零热快速到重 2506.16406v1 -
1222 06-19 Generating Directed Graphs with Dual Attention and Asymmetric Encoding Generieren von gerichteten Graphen mit doppelter Aufmerksamkeit und asymmetrischer Kodierung 产生具有双重注意和对称编码的定向图形 2506.16404v1 -
1223 06-19 IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks IS-Bench: Bewertung der interaktiven Sicherheit von VLM-getriebenen Körpermitteln bei täglichen Haushaltsaufgaben IS-Bench:评估每日家务任务中VLM-Driven 充装代理人的互动安全 2506.16402v1 -
1224 06-19 GoalLadder: Incremental Goal Discovery with Vision-Language Models Zielleiter: Incremental Goal Discovery mit Vision-Language-Modellen 目标增编:利用视觉语言模型发现递增目标 2506.16396v1 -
1225 06-19 State-Space Kolmogorov Arnold Networks for Interpretable Nonlinear System Identification State-Space Kolmogorov Arnold Networks for Interpretable Nonlinear System Identification 国家空间局Kolmogorov Arnold 解释非线性非线性系统识别网络 2506.16392v1 -
1226 06-19 Patch-based learning of adaptive Total Variation parameter maps for blind image denoising Patchbasiertes Lernen von adaptiven Total Variation-Parameterkarten für Blind Image Denoising 盲人图像除污的全变化参数图 2503.16010v2 -
1227 06-19 CLIP-MG: Guiding Semantic Attention with Skeletal Pose Features and RGB Data for Micro-Gesture Recognition on the iMiGUE Dataset CLIP-MG: Leitende Semantische Aufmerksamkeit mit skeletalen Pose-Funktionen und RGB-Daten zur Micro-Gesture-Erkennung auf dem iMiGUE-Datensatz CLIP-MG:在iMIGUE数据集中以骨骨骼藻类特征和RGB数据指导语义注意,用于识别微气识别的RGB数据 2506.16385v1 -
1228 06-19 Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval Hopfield-Fenchel-Young Networks: Ein einheitliches Rahmenwerk für assoziative Memory Retrieval Hopfield-Fenchel-青年网络:联合记忆检索统一框架 2411.08590v2 -
1229 06-19 Celo: Training Versatile Learned Optimizers on a Compute Diet Celo: Training vielseitiger gelernter Optimierer auf einer Computer-Diät Celo:就计算膳食培训有说服力的优化剂 2501.12670v2 -
1230 06-19 Classification of Cattle Behavior and Detection of Heat (Estrus) using Sensor Data Klassifizierung des Rinderverhaltens und der Erkennung von Wärme (Estrus) anhand von Sensordaten 使用传感器数据对牛行为进行分类和检测热量(Estrus) 2506.16380v1 -
1231 06-19 FFINO: Factorized Fourier Improved Neural Operator for Modeling Multiphase Flow in Underground Hydrogen Storage FFINO: Factorized Fourier Verbesserter neuraler Operator für die Modellierung von Mehrphasenströmungen im unterirdischen Wasserstoffspeicher FFINO:用于模拟地下氢储存多阶段流动模型的四倍改进神经操作员 2506.17344v1 -
1232 06-19 WebXAII: an open-source web framework to study human-XAI interaction WebXAII: ein Open-Source-Web-Framework zur Untersuchung der Mensch-XAI-Interaktion WebXAII:研究人类-XAI相互作用的公开来源网络框架 2506.14777v2 -
1233 06-19 Variance-Based Defense Against Blended Backdoor Attacks Varianzbasierte Verteidigung gegen gemischte Hintertürangriffe 以差异为基础防范混合的幕后袭击 2506.01444v2 -
1234 06-19 Data-Driven Policy Mapping for Safe RL-based Energy Management Systems Datengestützte Politikmappings für sichere RL-basierte Energiemanagementsysteme 以RL为基础的安全能源管理系统数据驱动政策绘图 2506.16352v1 -
1235 06-19 Adaptive Experimental Design for Policy Learning Adaptives Experimentelles Design für politisches Lernen 政策学习适应性实验设计 2401.03756v4 -
1236 06-19 Watermarking Autoregressive Image Generation Autoregressive Bildgenerierung mit Wasserzeichen 自动递减图像生成 2506.16349v1 -
1237 06-19 Quantum-Informed Contrastive Learning with Dynamic Mixup Augmentation for Class-Imbalanced Expert Systems Quantum-informiertes Kontrastives Lernen mit dynamischer Mixup Augmentation für klassengerechte Expertensysteme 以动态混合增量促进分类平衡专家系统 2506.13987v2 -
1238 06-19 Sustainable Greenhouse Microclimate Modeling: A Comparative Analysis of Recurrent and Graph Neural Networks Sustainable Greenhouse Microclimate Modeling: Eine vergleichende Analyse von recurrenten und Graphen-Neuralen Netzwerken 可持续的温室微观气候建模:经常性和图形神经网络比较分析 2502.17371v4 -
1239 06-19 Feedback-driven recurrent quantum neural network universality feedbackgesteuerte rezidivierende quantenneuronale Netzwerk-Universalität 由反馈驱动的经常性量子神经网络普遍性 2506.16332v1 -
1240 06-19 Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners Beitrag anregen und auch Parameter lernen: Föderiertes Lernen mit strategischen Dateninhabern 激励贡献和学习参数:与战略数据所有者进行联邦学习 2505.12010v2 -
1241 06-19 SimBank: from Simulation to Solution in Prescriptive Process Monitoring SimBank: Von der Simulation zur Lösung in der Prescriptive Process Monitoring SimBank:从模拟到规范程序监测的解决方案 2506.14772v2 -
1242 06-19 Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation Effiziente und flexible Neural-Netzwerk-Schulung durch schichtweise Feedback-Propagation 通过多层反馈传播进行有效和灵活的神经网络培训 2308.12053v3 -
1243 06-19 Bayesian Optimization over Bounded Domains with the Beta Product Kernel Bayesian Optimierung über Bounded Domains mit dem Beta Product Kernel 利用贝塔产品中枢在Beta Product Kernel的封闭区上空实现最佳贝叶斯优化 2506.16316v1 -
1244 06-19 Signatures to help interpretability of anomalies Unterschriften zur Interpretation von Anomalien 签名有助于异常点的解释 2506.16314v1 -
1245 06-19 Improved Exploration in GFlownets via Enhanced Epistemic Neural Networks Verbesserte Exploration in GFlownets durch verstärkte epistemische Neuralnetze 通过增强的神电网网改进对绿地网的探索 2506.16313v1 -
1246 06-19 Neural Guided Diffusion Bridges Neural geführte Diffusionsbrücken 神经向导扩散桥 2502.11909v3 -
1247 06-19 Optimizing Multilingual Text-To-Speech with Accents & Emotions Multilinguale Text-To-Speech-Optimierung mit Akzenten & Emotionen 利用 Acents 和情感优化多语种文字语音语音 2506.16310v1 -
1248 06-19 Adaptive Social Metaverse Streaming based on Federated Multi-Agent Deep Reinforcement Learning Adaptives soziales Metaverse-Streaming auf Basis von Federated Multi-Agent Deep Reinforcement Learning 基于联邦多要求深层强化学习的适应性社会超常流 2506.17342v1 -
1249 06-19 AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation AlignDistil: Token-Level-Sprachmodell Alignment als Adaptive Policy Destillation Aligndistil: 作为适应性政策蒸馏的调整级语言模式模型对齐 2503.02832v2 -
1250 06-19 The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units Die Condition Number als Scale-Invariant Proxy für die Informationskodierung in neuralen Einheiten 作为神经单位信息编码缩放- 变量代理工具的条件编号 2506.16289v1 -
1251 06-19 Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective Next-Token Vorhersage sollte Ambiguität-Sensitiv sein: Eine Meta-Learning-Perspektive 下肯预测应该是对模糊度敏感度的:一种元学习的视角 2506.16288v1 -
1252 06-19 LLM-Guided Indoor Navigation with Multimodal Map Understanding LLM-geführte Indoor-Navigation mit multimodalem Kartenverständnis 具有多式地图理解的LLM-引导式室内导航 2503.11702v4 -
1253 06-19 Random feature approximation for general spectral methods Random Feature Approximation für allgemeine Spektralmethoden 普通光谱方法的随机随机特征近似 2506.16283v1 -
1254 06-19 Harnessing omnipresent oscillator networks as computational resource Nutzung allgegenwärtiger Oszillatornetzwerke als rechnerische Ressource 将无所不在的振动器网络作为计算资源 2502.04818v3 -
1255 06-19 The Exploration of Error Bounds in Classification with Noisy Labels Die Erforschung von Fehlergrenzen in der Klassifizierung mit Noisy-Labels 探索有噪音标签的分类误差 2501.15163v2 -
1256 06-19 Serving Large Language Models on Huawei CloudMatrix384 Große Sprachmodelle auf Huawei CloudMatrix384 瓦威云马特列克384 2506.12708v3 -
1257 06-19 Optimal Online Bookmaking for Any Number of Outcomes Optimale Online Bookmaking für jede Anzahl von Ergebnissen 任意数量结果的优化在线账簿制作 2506.16253v1 -
1258 06-19 Guaranteed prediction sets for functional surrogate models Garantierte Vorhersagesätze für funktionale Ersatzmodelle 功能替代模型的保证预测数据集 2501.18426v2 -
1259 06-19 Synthetic ALS-EEG Data Augmentation for ALS Diagnosis Using Conditional WGAN with Weight Clipping Synthetische ALS-EEG Datenvergrößerung für ALS Diagnose mit Bedingtem WGAN mit Gewichtseinschnitt 使用有重量缩放的附加条件WGAN系统进行ALS诊断的ALS-EEG合成合成数据增强 2506.16243v1 -
1260 06-19 Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means Einheitliche Mean-Schätzung für schwerfällige Verteilungen über Median-of-Means 通过中中度中度中途重型发运重故障统一平均估计值 2506.14673v3 -
1261 06-19 Active MRI Acquisition with Diffusion Guided Bayesian Experimental Design Aktive MRT-Akquisition mit Diffusion Guided Bayesian Experimental Design 主动MRI 利用扩散导导贝叶斯实验设计获得MRI 2506.16237v1 -
1262 06-19 Think Global, Act Local: Bayesian Causal Discovery with Language Models in Sequential Data Think Global, Act Local: Bayesian Causal Discovery mit Sprachmodellen in Sequential Data 《全球思维》,《地方行动法》:Bayesian Causal发现序列数据中的语言模式 2506.16234v1 -
1263 06-19 Can AI Dream of Unseen Galaxies? Conditional Diffusion Model for Galaxy Morphology Augmentation Kann KI von ungesehenen Galaxien träumen? Bedingtes Diffusionsmodell für Galaxy Morphology Augmentation AI 能梦到看不见的星系吗? 2506.16233v1 -
1264 06-19 Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy Malware-Klassifizierung NLP & Machine Learning für verbesserte Genauigkeit 恶意分类利用NLP和机器学习来提高准确度 2506.16224v1 -
1265 06-19 Interventions Against Machine-Assisted Statistical Discrimination Maßnahmen gegen maschinengestützte statistische Diskriminierungen 反对机器辅助统计歧视的干预措施 2310.04585v4 -
1266 06-19 From Pixels to CSI: Distilling Latent Dynamics For Efficient Wireless Resource Management Von Pixeln zu CSI: Destillieren von Latent Dynamics für effizientes drahtloses Ressourcenmanagement 从像素到 CSI: 为高效无线资源管理蒸馏中流动态 2506.16216v1 -
1267 06-19 Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts Multi-Preference-Optimierung: Verallgemeinern von DPO über Set-Level-Kontrast 多优先优化:通过定点对比度普及残疾人组织 2412.04628v4 -
1268 06-19 VideoGAN-based Trajectory Proposal for Automated Vehicles VideoGAN-basierter Flugbahnvorschlag für Automatisierte Fahrzeuge 以视频GAN为基础的自动车辆轨迹建议 2506.16209v1 -
1269 06-19 Learning Dynamics in Continual Pre-Training for Large Language Models Dynamisches Lernen im kontinuierlichen Pre-Training für große Sprachmodelle 大语言模式持续培训前培训中的学习动态 2505.07796v2 -
1270 06-19 Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs Effiziente und datenschutzschonende Soft-Prompt-Übertragung für LLMs 为LLMM公司高效和隐私保护软件迅速转让 2506.16196v1 -
1271 06-19 Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction Föderated Learning for MRI-based BrainAGE: Eine multizentrische Studie zur post-stroke funktionellen Ergebnisvorhersage 为基于MRI的脑力智能学习联合会学习:关于打击后功能性结果预测的多中心研究 2506.15626v2 -
1272 06-19 CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization CP$2美元:利用几何方法,通过Canonic化进行非正式预测 2506.16189v1 -
1273 06-19 Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval Hierarchisches Multi-Positive-Kontrastives Lernen für das Patentbild-Retrieval 用于专利图像检索的等级式多动态差异学习 2506.13496v3 -
1274 06-19 Robust Hallucination Detection in LLMs via Adaptive Token Selection Robuste Halluzinationserkennung in LLMs durch adaptive Tokenauswahl 通过适应 Tok 选择在LLMs中进行强力幻觉检测 2504.07863v2 -
1275 06-19 Sheaf Hypergraph Networks Sheaf Hypergraph Networks Sheaf 电报网络 2309.17116v2 -
1276 06-19 Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks Repräsentationslernen mit gegenseitigem Einfluss von Modalitäten für die Knotenklassifikation in multimodalen Heterogenen Netzwerken 多模式不同形式网络节点分类方式相互影响,代表学习 2505.07895v3 -
1277 06-19 From Teacher to Student: Tracking Memorization Through Model Distillation Vom Lehrer zum Schüler: Erinnerung durch Modelldestillation verfolgen 从教师到学生:通过示范蒸馏跟踪记忆 2506.16170v1 -
1278 06-19 Performance of Rank-One Tensor Approximation on Incomplete Data Leistung der Rang eins Tensor-Annäherung auf unvollständigen Daten 在不完全数据上接近 “ 一等-一等 “ 的性能 2504.07818v2 -
1279 06-19 Return-Aligned Decision Transformer Return-Aligned Decision Transformer 回归统一决定转换器 2402.03923v6 -
1280 06-19 DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products DeltaProdukt: Verbesserung der State-Tracking in linearen RNNs über Haushaltsprodukte DeltaProduction:通过家用产品改进国家通过家用产品对Linear RNNNs的跟踪 2502.10297v6 -
1281 06-19 Geometric Learning in Black-Box Optimization: A GNN Framework for Algorithm Performance Prediction Geometrisches Lernen in der Black-Box-Optimierung: Ein GNN-Framework für Algorithmen-Performance-Vorhersage 黑人Box优化中的几何学习:GNN指数性能预测框架 2506.16144v1 -
1282 06-19 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning GRPO-CARE: Konsequentitäts-Bewusst-Verstärkungs-Lernen für multimodale Vernunft GROPO-CARE: 统一软件强化学习,用于多模式理由 2506.16141v1 -
1283 06-19 Solving Zero-Sum Convex Markov Games Lösen Zero-Sum Convex Markov Spiele 解决零- 苏姆 Convex Markov 游戏 2506.16120v1 -
1284 06-19 Deep learning joint extremes of metocean variables using the SPAR model Deep Learning gemeinsame Extreme von Metozean-Variablen mit dem SPAR-Modell 使用SPAR模型的深海海洋变量的深学习联合极端 2412.15808v2 -
1285 06-19 ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning ReinFlow: Feinsteuerungs-Flow Matching-Politik mit Online-Verstärkungs-Lernen ReinFlow: 与在线强化学习匹配流动政策的微调 2505.22094v5 -
1286 06-19 Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification Vermeidung von Überbeanspruchung in Graphen-Neuralen Netzwerken durch Spectrum-Erhaltung von Sparsifikationen 通过光谱保护分解减轻图形神经网络的过度隔动 2506.16110v1 -
1287 06-19 Advancing atomic electron tomography with neural networks Weiterentwicklung der Atomelektronentomographie mit neuronalen Netzwerken 利用神经网络推进原子电子摄影 2506.16104v1 -
1288 06-19 Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans Flow Matching: Markov-Kernel, stochastische Prozesse und Transportpläne 流程匹配:Markov Kernels, 存储过程和运输计划 2501.16839v4 -
1289 06-19 Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning Semantic-Aware-Spektrum-Sharing im Internet von Fahrzeugen auf Basis von Deep Reinforcement Learning 基于深强化学习的车辆在互联网上共享语义-语言软件频谱 2406.07213v4 -
1290 06-19 Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning Rekonfigurierbare intelligente oberflächenunterstützte VEC auf Basis von Multi-Agenten-Verstärkungslernen 基于多机构强化学习的可重新配置智能表面辅助VEC 2406.11318v2 -
1291 06-19 On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse Über die Grenzen der Sprachgenerierung: Trade-Offs zwischen Halluzination und Modekollaps 语言产生限制:幻觉与模式崩溃之间的取舍 2411.09642v2 -
1292 06-19 Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks Deep-Reinforcement-Learning-based AoI-Aware Ressourcenzuweisung für RIS-Aided IoV-Netzwerke 为RIS援助的IOV网络分配的深入加强-基于学习的AoI-软件资源 2406.11245v2 -
1293 06-19 A Brain-to-Population Graph Learning Framework for Diagnosing Brain Disorders Ein Brain-to-Population Graph Learning Framework zur Diagnose von Hirnerkrankungen 脑至人口图诊断脑疾病学习框架 2506.16096v1 -
1294 06-19 Temporal horizons in forecasting: a performance-learnability trade-off Zeithorizonte bei der Prognose: ein Leistungs-Ernennbarkeits-Austausch 预测的时空前景:业绩-可忽略性权衡取舍 2506.03889v2 -
1295 06-19 Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network Ressourcenzuteilung für Twin Maintenance und Computing Task Processing im digitalen Twin Vehicular Edge Computing Network 数字双面电子计算网络双向维护和电子计算任务处理的资源分配 2407.07575v2 -
1296 06-19 Mobility-Aware Federated Self-supervised Learning in Vehicular Network Mobilitätsbewusstes Selbstüberwachtes Lernen im Vehicular Network 车辆网络中流动软件 – – 流动软件 – – 联邦监督的自我学习 2408.00256v2 -
1297 06-19 Diffusion-Based Hypothesis Testing and Change-Point Detection Diffusionsbasierte Hypothesenprüfung und Change-Point-Erkennung 基于传播的假假设测试和变化点探测 2506.16089v1 -
1298 06-19 HSTU-BLaIR: Lightweight Contrastive Text Embedding for Generative Recommender HSTU-BLaIR: Leichte Kontrastive Text-Embedding für generative Recommender HSTU-BLAIR: 用于产生建议建议的轻量比对式文本嵌入 2504.10545v3 -
1299 06-19 Investigating Lagrangian Neural Networks for Infinite Horizon Planning in Quadrupedal Locomotion Untersuchung lagrangischer neuraler Netzwerke für die unendliche Horizontplanung in der Quadrupedal-Locomotion 调查拉格朗江神经网络,以在四分居动中进行无限期地地平线规划 2506.16079v1 -
1300 06-19 Probing the Robustness of Large Language Models Safety to Latent Perturbations Nachweis der Robustheit großer Sprachmodelle Sicherheit zu latenten Störungen 检验大语言模型安全性是否强,以证实大语言模型安全性是否足以应对前端扰动 2506.16078v1 -
1301 06-19 Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching Schnellere stochastische Optimierung mit willkürlichen Verzögerungen über asynchrones Mini-Batching 通过非同步小型批次快速优化任意拖延 2408.07503v2 -
1302 06-19 Joint User Priority and Power Scheduling for QoS-Aware WMMSE Precoding: A Constrained-Actor Attentive-Critic Approach Gemeinsame Benutzerpriorität und Leistungsplanung für QoS-Aware WMMSE-Vorkodierung: Ein eingeschränkter, aktiv-kritischer Ansatz 面向QoS感知WMMSE预编码的联合用户优先级与功率调度:受约束行动者-注意力评论者方法 2506.16074v1 -
1303 06-19 KCES: Training-Free Defense for Robust Graph Neural Networks via Kernel Complexity KCES: Training-freie Verteidigung für robuste Graphen-Neural-Netzwerke über Kernel-Komplexität KCES:通过核心复杂度为坚固的图表神经网络提供无训练防御 2506.11611v2 -
1304 06-19 A Lightweight RL-Driven Deep Unfolding Network for Robust WMMSE Precoding in Massive MU-MIMO-OFDM Systems Ein leichtes RL-getriebenes Tiefen-Entfaltungs-Netzwerk für robuste WMMSE-Vorkodierung in massiven MU-MIMO-OFDM-Systemen 大型MU-MIMO-OFDM系统中强力 WMMSE 预码的轻量 RL-Dripry 深载网络 2506.16072v1 -
1305 06-19 Provably Efficient Online RLHF with One-Pass Reward Modeling Effiziente Online-RLHF mit One-Pass-Reward-Modellierung 配有 “ 一纸分奖励 “ 模型的在线甚高频网络高效率 2502.07193v2 -
1306 06-19 Complexity of Injectivity and Verification of ReLU Neural Networks Komplexität der Injektivität und Verifizierung von ReLU-Neuralnetzen RELU神经网络的投射复杂度和核查 2405.19805v3 -
1307 06-19 DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing DRL-basiertes, selbstüberwachtes Lernen für Aufgabe Offloading und Ressourcenallokation im ISAC-fähigen Fahrzeug Edge Computing DRL-基于DRL的基于联邦的自我监督学习,以在ISAC-可加入的车辆边缘电子计算中进行任务卸载和资源分配 2408.14831v2 -
1308 06-19 On the generalization of Tanimoto-type kernels to real valued functions Über die Verallgemeinerung von Tanimoto-Kerneln zu echten wertgeschätzten Funktionen 将谷本本型内核普遍化为实际的有价值功能 2007.05943v3 -
1309 06-19 Floating-Point Neural Networks Are Provably Robust Universal Approximators Floating-Point-Neural-Netzwerke sind wahrscheinlich robuste Universal-Annäherung 浮动点神经网络具有可可预见强健的通用通用近似器 2506.16065v1 -
1310 06-19 Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models Membership Inferenz Attack sollte weiter zu Verteilungsstatistiken für destillierte Generative Modelle 成员攻击的推论应转向已蒸馏生成模型的分发统计数据 2502.02970v3 -
1311 06-19 CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations CRIA: Ein Cross-View-Interaktions- und Instanz-adaptierter Vorausbildungsrahmen für allgemeine EEG-Vertretungen CRIA: 通用的EEG代表制跨视角互动和根据实际情况制定的培训前框架 2506.16056v1 -
1312 06-19 A Sparse Tensor Generator with Efficient Feature Extraction Ein Sparse Tensor Generator mit effizienter Feature Extraction 具有高效地物采掘功能的简式天窗生成器 2405.04944v3 -
1313 06-19 LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records LabTOP: Ein einheitliches Modell für Labortestergebnisse Vorhersage auf elektronische Gesundheitsdatensätze LabTOP:电子健康记录实验室试验结果预测统一模型 2502.14259v4 -
1314 06-19 From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots Vom Experten zum Generalisten: Auf dem Weg zur allgemeinen Ganzkörperkontrolle für humanoide Roboter 从专家到通才:对人体机器人实行全面整体控制 2506.12779v2 -
1315 06-19 From Data to Decision: Data-Centric Infrastructure for Reproducible ML in Collaborative eScience Von Daten zur Entscheidung: Data-Centric Infrastruktur für reproduzierbare ML in Collaborative eScience 从数据到决定:合作电子科学中可复制ML的数据中心基础设施 2506.16051v1 -
1316 06-19 Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Leiter-Residual: Parallelismus-bewusste Architektur zur Beschleunigung großer Modellinferenz mit Kommunikationsüberlappung 云梯-残余:加速大型模型推断与通信重叠的平行意识结构 2501.06589v5 -
1317 06-19 FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system FALCON: Feedback-gesteuert Adaptiv Lang-/Kurzzeitspeicher verstärkt Coding Optimization System FALCON: 反馈驱动的适应性长/短期内存强化编码优化系统 2410.21349v5 -
1318 06-19 AutomataGPT: Forecasting and Ruleset Inference for Two-Dimensional Cellular Automata AutomataGPT: Prognose und Regelschluss für zweidimensionale zelluläre Automata AutomataGPT: 两维细胞自动数据预测和规则推理 2506.17333v1 -
1319 06-19 DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling DynScaling: Effizientes Verifier-freies Inferenzscaling über dynamische und integrierte Sampling DynScaling:通过动态和综合抽样,提高验证人无引文的有效比例 2506.16043v1 -
1320 06-19 OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents OSWorld-Human: Benchmarking der Effizienz von Computer-Use Agents OS 世界人类:计算机使用代理的效率基准 2506.16042v1 -
1321 06-19 Enhancing Document-Level Question Answering via Multi-Hop Retrieval-Augmented Generation with LLaMA 3 Verbesserung der Dokumenten-Fragebeantwortung mittels Multi-Hop Retrieval-Augmented Generation mit LLaMA 3 通过多层检索-提法一代加强文件层面的回答问题,LLAMA 3 2506.16037v1 -
1322 06-19 Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Vision-geführtes Chunking ist alles, was Sie brauchen: Verbesserung der RAG durch multimodales Dokumentenverständnis 愿景引导的决赛是您所需要的:用多模式文件理解加强RAG 2506.16035v1 -
1323 06-19 DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models DeepGDel: Deep Learning-basierte Gene Deletion Prediction Framework für wachstumsverbundene Produktion in Genom-Scale Metabolic-Modellen 深层GDel:在基因组-规模元元模型中实现增长和混合生产以深学习为基础的基因删除预测框架 2504.06316v4 -
1324 06-19 A Scalable Factorization Approach for High-Order Structured Tensor Recovery Ein skalierbarer Factorisierungsansatz für die hochordentlich strukturierte Tensor-Wiederherstellung 高分结构结构梯度恢复的可缩放因数化办法 2506.16032v1 -
1325 06-19 V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models V2X-VLM: End-to-End V2X kooperatives autonomes Fahren durch große Vision-Sprache Modelle V2X-VLM:通过大型视觉语言模型自主驾驶的终端到终端V2X合作社 2408.09251v3 -
1326 06-19 Multi-agent Multi-armed Bandits with Minimum Reward Guarantee Fairness Multi-Agent Multi-Armed Bandits mit Mindestprämie Garantie Fairness 具有最低奖励保证公平性的多武装多武装多武装强盗 2502.15240v2 -
1327 06-19 Conformal prediction for frequency-severity modeling Konforme Vorhersage für Frequenz-Schwere-Modellierung 频率比重建模非正式预测 2307.13124v4 -
1328 06-19 EvoLM: In Search of Lost Language Model Training Dynamics EvoLM: Auf der Suche nach verlorenen Sprachmodellen EvoLM: 寻找失传语言培训模式 2506.16029v1 -
1329 06-19 Min-p, Max Exaggeration: A Critical Analysis of Min-p Sampling in Language Models Min-p, Max Übertreibung: Eine kritische Analyse der Min-p-Sampling in Sprachmodellen Min-p, Max Explation: 对语言模型的 Min-p 抽样的批判性分析 2506.13681v2 -
1330 06-19 Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis Effiziente Videoannotation im Einzelhandel: Robuster Ansatz zur Erstellung von Schlüsselrahmen für die Analyse von Produkt- und Kundeninteraktion 高效零售视频注释:产品和客户互动分析的强有力关键框架生成方法 2506.14854v2 -
1331 06-19 Bridging Brain with Foundation Models through Self-Supervised Learning Gehirn mit Grundmodellen durch Selbstüberwachtes Lernen überbrücken 通过自监督学习连接大脑与基础模型 2506.16009v1 -
1332 06-19 Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning Jeder Rang könnte ein Experte sein: Ein-Ranked-Mixtur von Experten LoRA für Multi-Task-Learning 每一级别都可以是一名专家:多任务学习专家LORA的单条混合体 2501.15103v2 -
1333 06-19 Out-of-Distribution Detection: A Task-Oriented Survey of Recent Advances Out-of-Distribution Detection: Eine aufgabenorientierte Umfrage über die jüngsten Fortschritte 分销外探测:最近进展的专案调查 2409.11884v3 -
1334 06-19 Data-Agnostic Cardinality Learning from Imperfect Workloads Daten-agnostische Kardinalität Lernen aus unvollkommenen Arbeitsbelastungen 从不完美工作量中学习 2506.16007v1 -
1335 06-19 On Domain-Adaptive Post-Training for Multimodal Large Language Models Zum Domain-Adaptive Post-Training für multimodale große Sprachmodelle 关于多模式大语言模式的多模式后培训 2411.19930v3 -
1336 06-19 AutoHFormer: Efficient Hierarchical Autoregressive Transformer for Time Series Prediction AutoHFormer: Effizienter Hierarchischer Autoregressiver Transformer für die Vorhersage der Zeitreihen AutoH former: 用于时间序列预测的高效的等级自动递减变换器 2506.16001v1 -
1337 06-19 TAPS: Throat and Acoustic Paired Speech Dataset for Deep Learning-Based Speech Enhancement TAPS: Throat and Acoustic Paired Speech Dataset für Deep Learning-based Speech Enhancement TAPS: 用于加强深学习式语音强化的喉音和声频语音数据集 2502.11478v2 -
1338 06-19 Tuning-Free Coreset Markov Chain Monte Carlo via Hot DoG Tuning-Free Coreset Markov Kette Monte Carlo über Hot DoG 通过Hot DoG连线蒙特卡洛(Monte Carlo) 2410.18973v2 -
1339 06-19 A Comprehensive Survey on Continual Learning in Generative Models Eine umfassende Umfrage zum kontinuierlichen Lernen in generativen Modellen 关于以创建模式持续学习的综合调查 2506.13045v3 -
1340 06-19 Heterogeneous-Modal Unsupervised Domain Adaptation via Latent Space Bridging Heterogen-Modal Unüberwachte Domain-Anpassung über Latent Space Bridging 通过低空空间连接对域进行无监督的适应 2506.15971v1 -
1341 06-19 LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning LazyEviction: Verlangsamte KV-Eviktion mit Aufmerksamkeitsmusterbeobachtung für effizientes Long Reasoning LazyEvition: 以关注方式对有效长长理由进行观察的Lucking KV驱逐 2506.15969v1 -
1342 06-19 Two Heads Are Better than One: Simulating Large Transformers with Small Ones Zwei Köpfe sind besser als einer: Große Transformer mit kleinen zu simulieren 两头胜于一:模拟大型变形器,使用小头变形器 2506.12220v2 -
1343 06-19 Bridging Text and Crystal Structures: Literature-driven Contrastive Learning for Materials Science Bridging Text und Kristallstrukturen: Literaturgetriebenes Kontrastives Lernen für die Materialwissenschaft 架桥文字和水晶结构:以文学为动力的材料科学竞赛学习 2501.12919v2 -
1344 06-19 On the Theoretical Understanding of Identifiable Sparse Autoencoders and Beyond Über das theoretische Verständnis identifizierbarer Sparse Autoencoder und darüber hinaus 关于可辨识的微缩自动编码器理论理解及以后问题 2506.15963v1 -
1345 06-19 Learning Model Successors Nachfolger von Lernmodellen 学习模式继承人 2502.00197v2 -
1346 06-19 Contactless Precision Steering of Particles in a Fluid inside a Cube with Rotating Walls Kontaktlose Präzisionslenkung von Partikeln in einer Flüssigkeit in einem Würfel mit rotierenden Wänden 带旋转墙的立方体内流流体中的粒子无接触精确度指示器 2506.15958v1 -
1347 06-19 One Period to Rule Them All: Identifying Critical Learning Periods in Deep Networks Eine Periode, um sie alle zu beherrschen: Kritische Lernphasen in tiefen Netzwerken identifizieren 确定深网络的关键学习期 2506.15954v1 -
1348 06-19 Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments Hierarchisches und Modulares Netzwerk zur nicht-prähensilen Manipulation in allgemeinen Umgebungen 关于一般环境中非流行病操纵的等级和模块网络 2502.20843v2 -
1349 06-19 Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments Entschlüsselung des Interplays zwischen Übertragungseffekten und Belohnungsautokorrelationen in Switchback-Experimenten 在回转实验中解开结转效应与回转回实验中回调自动关系之间的交互作用 2403.17285v4 -
1350 06-19 Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning Gemeinsame Optimierung des Informationszeitalters und des Energieverbrauchs im NR-V2X-System auf Basis von Deep Reinforcement Learning 基于深强化学习的NR-V2X系统信息和能源消耗年龄的联合优化 2407.08458v2 -
1351 06-19 Statistical Inference under Performativity Statistische Schlussfolgerung unter Performativität 性能下统计推断值 2505.18493v2 -
1352 06-19 Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach Topologie-Lernaktionen für Power Grid Control: Ein graphisch basierter Soft-Label-Lernansatz 电网控制学习地形行动:以图表为基础的软标签模拟学习方法 2503.15190v2 -
1353 06-19 On the optimal regret of collaborative personalized linear bandits Über das optimale Bedauern der kollaborativen personalisierten linearen Banditen 合作的个人化线性强盗的最佳遗憾 2506.15943v1 -
1354 06-19 CORAL: Disentangling Latent Representations in Long-Tailed Diffusion CORAL: Entwirrende Latentendarstellungen in langanhaltender Diffusion CORAL: 在长期失败的传播中拆分内流代表处 2506.15933v1 -
1355 06-19 Competing Bandits in Matching Markets via Super Stability Konkurrierende Banditen in Matching Markets über Super Stabilität 通过超级稳定在匹配市场中相互竞争的强盗 2506.15926v1 -
1356 06-19 fairmetrics: An R package for group fairness evaluation fairmetrics: Ein R-Paket für die Bewertung von Gruppengerechtigkeit 公平度:团体公平评估R包件 2506.06243v2
Article 0
Title@2025-06-26 (4): Whole-Body Conditioned Egocentric Video Prediction
Title: Whole-Body Conditioned Egocentric Video Prediction | Ganzkörperbedingte egozentrische Videovorhersage | 以全身动作为条件的第一视角视频预测 2506.21552v1 |
Authors (6): Yutong Bai, Danny Tran, Amir Bar, Yann LeCun, Trevor Darrell, Jitendra Malik
We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks, enabling a comprehensive analysis of the model’s embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human.
我们训练模型,以预测人类行动的以地球为中心视频(PEVA),考虑到以往的视频和以相对的3D体构成为代表的动作。通过调整由身体联合等级结构组成的运动构成的轨迹,我们的模型学会从第一人的角度模拟人的身体行为如何影响环境。我们在Nymeria上训练一个自动反向的有条件扩散变压器。Nymeria是真实世界以自我为中心的视频和身体的大规模数据集。我们进一步设计了一个等级评价协议,其任务越来越具有挑战性,从而能够全面分析模型所体现的预测和控制能力。我们的工作代表了从人类角度应对模拟复杂的现实世界环境和以视频预测为代表的代理人行为的挑战的初步尝试。
Article 1
Title@2025-06-26 (4): mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale
Title: mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | mTSBench: Benchmarking Multivariate Zeitreihen Anomalieerkennung und Modellauswahl auf Scale | mTSBench:制定多变时间序列基准 2506.21550v1 |
Authors (3): Xiaona Zhou, Constantin Brif, Ismini Lourentzou
Multivariate time series anomaly detection (MTS-AD) is critical in domains like healthcare, cybersecurity, and industrial monitoring, yet remains challenging due to complex inter-variable dependencies, temporal dynamics, and sparse anomaly labels. We introduce mTSBench, the largest benchmark to date for MTS-AD and unsupervised model selection, spanning 344 labeled time series across 19 datasets and 12 diverse application domains. mTSBench evaluates 24 anomaly detection methods, including large language model (LLM)-based detectors for multivariate time series, and systematically benchmarks unsupervised model selection techniques under standardized conditions. Consistent with prior findings, our results confirm that no single detector excels across datasets, underscoring the importance of model selection. However, even state-of-the-art selection methods remain far from optimal, revealing critical gaps. mTSBench provides a unified evaluation suite to enable rigorous, reproducible comparisons and catalyze future advances in adaptive anomaly detection and robust model selection.
多变时间序列异常探测(MTS-AD)在保健、网络安全和工业监测等领域至关重要,但由于复杂的可变依赖性、时间动态和稀疏异常标签等原因,多变时间序列异常探测(MTS-AD)仍然具有挑战性。 我们引入了mTSBench,这是迄今为止用于MTS-AD和无人监督的模式选择的最大基准,覆盖了19个数据集和12个不同应用域的344个标记时间序列。 mTSBench评估了24种异常探测方法,包括用于多变时间序列的大型语言模型(LLLM)探测器,以及在标准化条件下系统设定不受监督的模型选择技术基准。与先前的调查结果一致,我们的结果证实,没有单一的探测器能够超越数据集,强调模型选择的重要性。然而,即使是最先进的选择方法也远远没有达到最佳水平,揭示了关键差距。 MTSBench提供了统一的评估套,以便能够在适应异常探测和稳健模型选择方面进行严格、可复制的比较,并催化未来的进展。
Article 2
Title@2025-06-26 (4): Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
Title: Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Wo finden Sie Grokking in LLM Pretraining? Überwachen Sie Memorization-to-Generalization ohne Test | 在 LLM 预训练中在哪里找到 Grokking?无需测试即可监控从记忆到泛化的转变 2506.21551v1 |
Authors (3): Ziyue Li, Chenrui Fan, Tianyi Zhou
Grokking, i.e., test performance that keeps improving long after the training loss has converged, has recently been observed in neural network training, making the mechanism of generalization and of other emerging capabilities such as reasoning mysterious. While prior studies usually train small models on a few toy or highly specific tasks for thousands of epochs, we conduct the first study of grokking on checkpoints during one-pass pretraining of a 7B large language model (LLM), i.e., OLMoE. We compute the training loss and evaluate generalization on diverse benchmark tasks, including math reasoning, code generation, and commonsense/domain-specific knowledge retrieval tasks. Our study, for the first time, verifies that grokking still happens in the pretraining of large-scale foundation models, though different data may enter grokking stages asynchronously. We further demystify grokking’s “emergence of generalization” by investigating LLM internal dynamics. Specifically, we find that training samples’ pathways (i.e., expert choices across layers) evolve from random and instance-specific to more structured and shareable between samples during grokking. Also, the complexity of a sample’s pathway decreases even though the loss has converged. These findings indicate a memorization-to-generalization conversion, providing a mechanistic explanation of delayed generalization. In the study, we develop two novel metrics to quantify pathway distance and the complexity of a single pathway. We show their ability to predict the generalization improvement on diverse downstream tasks. They are efficient, simple to compute, and solely dependent on training data. Hence, they have practical value for pretraining, enabling us to monitor generalization performance without finetuning or testing. Theoretically, we show that more structured pathways reduce model complexity and improve the generalization bound.
格罗克金,即测试性表现在培训损失趋同后,在培训损失趋同后,情况在相当长的时间里不断改善。我们最近在神经网络培训中目睹了培训损失并评价了不同基准任务的一般化,包括数学推理、代码生成和公元/域特定知识检索任务等。虽然先前的研究通常对数千个时代的微小模型进行一些玩具或高度特定的任务的小型模型培训,但我们在对7B大语言模型(LLIM)的一次性预训练中首次对检查站进行磨损研究,即:OLMoE。我们计算了培训损失并评价了不同基准任务,包括数学推理、代码生成和公元/域特定知识检索任务。我们的研究首次证实,在大规模基础模型的预训练阶段,虽然不同的数据可能进入了混杂阶段,但我们进一步淡化了7B大语言模型(LIM)的“概括化”,即OLMoE 。我们发现,在一般模型的内部动态上,简单的多样化,我们发现培训样本的改进过程(即专家选择跨层)从随机、结构结构结构、常规变现到结构数据转换能力,也显示了一种可变现的缩的模型。
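The abstract above describes a pathway as the expert choices a sample makes across layers and mentions a metric for pathway distance, but it does not give the formula. The sketch below is only one plausible way to compare two pathways: it averages the per-layer Jaccard distance between the sets of selected experts. The function name `pathway_distance`, the Jaccard choice, and the toy expert indices are all assumptions for illustration, not the authors' metric.

```python
def pathway_distance(path_a, path_b):
    """Mean per-layer Jaccard distance between two expert-choice pathways.

    Each pathway is a list with one entry per layer; each entry is a
    non-empty collection of the expert indices selected at that layer.
    (Hypothetical metric, not taken from the paper.)
    """
    assert len(path_a) == len(path_b), "pathways must cover the same layers"
    total = 0.0
    for experts_a, experts_b in zip(path_a, path_b):
        a, b = set(experts_a), set(experts_b)
        # Jaccard distance: 0 when the expert sets match, 1 when disjoint.
        total += 1.0 - len(a & b) / len(a | b)
    return total / len(path_a)


# Toy pathways over three layers with top-2 routing (made-up indices).
sample_1 = [(0, 3), (1, 2), (5, 7)]
sample_2 = [(0, 3), (2, 4), (6, 8)]
d = pathway_distance(sample_1, sample_2)
```

Under this assumed metric, a falling average distance between training samples would correspond to the "more structured and shareable" pathways the abstract reports.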
Article 3
Title@2025-06-26 (4): HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
Title: HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | HalluSegBench: Counterfactual Visual Reasoning for Segmentation Halluzination Evaluation | HalluSegBench:用于分割幻觉评估的反事实视觉推理 2506.21546v1 |
Authors (6): Xinzhuo Li, Adheesh Juvekar, Xingyou Liu, Muntasir Wahed, Kiet A. Nguyen, Ismini Lourentzou
Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.
然而,这些模型往往通过为不以图像内容为根据的物体制作分离面罩或错误地标出不相干区域而产生幻觉。现有的分解幻觉评价程序主要侧重于标签或文字幻觉,而没有操纵视觉环境,限制了它们诊断重大故障的能力。作为回应,我们引入了HalluSegeBench,这是第一个专门设计通过反事实视觉推理角度评价视觉地面幻觉的基准。我们的基准包括一套新颖的数据集,共有1340对反事实实例,涵盖281个独特的对象类别,以及一套新推出的计量指标,在视觉一致的场景编辑中量化幻觉敏感性。HalluSege-Bench的实验与最先进的视觉语言分解模型显示,由视觉驱动的幻觉比由标签驱动的幻觉更加普遍,模型往往以假分解为主,强调需要反事实推论来判断真真性。
Article 4
Title@2025-06-26 (4): Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
Title: Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | Maximal aufeinander abgestimmte Materien: Vermeidung von Darstellungskollaps für robustes Cross-Modal Retrieval | 最大匹配事项: 防止在强力跨模式检索中出现代表比例折叠 2506.21538v1 |
Authors (4): Hani Alomari, Anushka Sivakumar, Andrew Zhang, Chris Thomas
Cross-modal image-text retrieval is challenging because of the diverse possible associations between content from different modalities. Traditional methods learn a single-vector embedding to represent the semantics of each sample, but struggle to capture the nuanced and diverse relationships that can exist across modalities. Set-based approaches, which represent each sample with multiple embeddings, offer a promising alternative, as they can capture richer and more diverse relationships. In this paper, we show that, despite their promise, these set-based representations continue to face issues, including sparse supervision and set collapse, which limit their effectiveness. To address these challenges, we propose Maximal Pair Assignment Similarity to optimize a one-to-one matching between embedding sets that preserves semantic diversity within each set. We also introduce two loss functions to further enhance the representations: Global Discriminative Loss to enhance distinction among embeddings, and Intra-Set Divergence Loss to prevent collapse within each set. Our method achieves state-of-the-art performance on MS-COCO and Flickr30k without relying on external data.
由于不同模式的内容之间可能存在各种关联,跨式图像-文字检索具有挑战性。传统方法学会了一种单一矢量嵌入以代表每个样本的语义,但努力捕捉不同模式之间可能存在的细化和多种关系。基于设置的方法代表了每个样本的多个嵌入式,提供了一种有希望的替代办法,因为它们可以捕捉到更丰富、更多样化的关系。在本文中,我们表明,尽管这些基于设置的表达方式有希望,但它们继续面临各种问题,包括监督不力和设置崩溃,这限制了它们的有效性。为了应对这些挑战,我们提议最大层层对层分配相似性,以优化组合内保存语义多样性的嵌入式组合之间的一对一匹配。我们还引入了两个损失功能,以进一步加强表述方式:全球差异性损失,以加强嵌入的区别,以及内部分裂性损失,以防止每组内部崩溃。我们的方法在MS-CO和Flick30k上实现了最先进的业绩,而无需依靠外部数据。
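The one-to-one matching between embedding sets mentioned in the abstract above can be illustrated with a small sketch. It scores two equally sized sets by the best pairing in which every embedding is used exactly once. Cosine similarity, averaging over pairs, the brute-force search over permutations, and the function name are assumptions made for this example; the abstract does not state the paper's exact formulation.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_pair_assignment_similarity(set_a, set_b):
    """Return (score, assignment) for the best one-to-one pairing.

    assignment[i] is the index in set_b matched to set_a[i]. Brute force
    over permutations, so only suitable for small sets. (Hypothetical
    helper, not the paper's implementation.)
    """
    assert len(set_a) == len(set_b), "sketch assumes equally sized sets"
    sim = [[cosine(a, b) for b in set_b] for a in set_a]
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(len(set_b))):
        score = sum(sim[i][j] for i, j in enumerate(perm)) / len(perm)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


# Toy 2-D embedding sets (made-up values).
image_set = [[1.0, 0.0], [0.0, 1.0]]
text_set = [[0.0, 2.0], [3.0, 0.0]]
score, assignment = max_pair_assignment_similarity(image_set, text_set)
```

Because each embedding must be matched to a distinct partner, a set whose embeddings have all collapsed onto one vector cannot score well against a diverse set. For larger sets, an assignment solver such as the Hungarian algorithm would replace the permutation search.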
Article 5
Title@2025-06-26 (4): Exploring the Design Space of 3D MLLMs for CT Report Generation
Title: Exploring the Design Space of 3D MLLMs for CT Report Generation | Erforschung des Design-Raums von 3D-MLLMs für die CT-Berichtserstellung | 为编写CT报告探索3D MLLMs的设计空间 2506.21535v1 |
Authors (5): Mohammed Baharoon, Jun Ma, Congyu Fang, Augustin Toma, Bo Wang
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10%, achieving 2nd place in the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of the size of the LLM under the same training protocol. We also show that a larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution
在这项工作中,我们系统地调查3DMLLMS的设计空间,包括视觉输入演示、投影仪、大语言模型(LLMS)和3DCT报告生成的微调技术。我们还引入了两种基于知识的报告增强方法,提高GREEN分数的绩效,达到10,在MICCAI 2024 AMOS-MMM挑战中达到了第2位。我们在AMOS-MM数据集中的1,687个案例的结果表明,RRG基本上独立于同一培训协议下的LLM的大小。我们还表明,如果原VIT在数量上事先培训,则数量并不总是更大。最后,我们表明,使用分解面罩和CT量改进性能。我们可在https://github.com/bowang-lab/AMOS-MM-Solution上公开查阅该代码。
Article 6
Title@2025-06-26 (4): Chain-of-Sketch: Enabling Global Visual Reasoning
Title: Chain-of-Sketch: Enabling Global Visual Reasoning | Chain-of-Sketch: Globale visuelle Vernunft aktivieren | 标准链链:扶持全球视觉理性 2410.08165v2 |
Authors (5): Aryo Lotfi, Enrico Fini, Samy Bengio, Moin Nabi, Emmanuel Abbe
Modern vision models have achieved remarkable success in benchmarks where local features provide critical information about the target. There is now a growing interest in tackling tasks requiring more global reasoning, where local features do not provide significant information. Minsky and Papert put forward such tasks in 1969 with their connectivity study, exposing the limitations of the perceptron model. In this paper, we introduce an expanded set of global visual datasets involving graphs, strings, mazes, and image grids. We show that large vision models still struggle to learn these tasks efficiently. Similarly, state-of-the-art multi-modal LLMs perform poorly on these datasets. We explain this learning inefficiency by means of the ‘globality degree’ measure. To mitigate this, we propose a method called chain-of-sketch (CoS). Similar to the chain-of-thought and scratchpad techniques used in language models, CoS breaks the original task into intermediate visual steps to help learn a complex task. In addition, we show that not all CoS strategies perform equally well. Our key insight is to impose a Markovian structure on the CoS frames. This leads to the introduction of ‘inductive CoS’ which achieves better out-of-distribution generalization and performs well even with smaller models compared to non-inductive variants.
现代愿景模型在基准方面取得了显著的成功,当地特征提供了有关目标的重要信息; 现在人们越来越有兴趣处理需要更全球推理的任务,而当地特征没有提供重要信息; 明斯克和佩佩佩特在1969年通过连通性研究提出这些任务,暴露了渗透模型的局限性。 在本文件中,我们引入了一套扩大的全球视觉数据集,其中包括图表、字符串、迷宫和图像网; 我们显示,大型愿景模型仍在努力有效地学习这些任务; 同样, 最先进的多模式LMs在这些数据集上表现不佳。 我们用“ 全球化程度” 衡量方法来解释这种学习效率低下的情况。 为了减轻这一点,我们提出了一种称为“ 链式套式” (COS) 的方法。 类似于语言模型中使用的思维链和刮痕技术, CoS 将最初的任务分解为中间视觉步骤, 以帮助学习复杂的任务。 此外, 我们显示,并非所有COS战略都表现得同样好。 我们的关键洞察力是将Markovian 结构设置在这些数据集上。 我们用“ 全球化度” 测量尺度来解释这种低效。 我们提出“ ” 方法,这导致引入较小型模型, 更细化模式。
Article 7
Title@2025-06-26 (4): Mesh-Informed Neural Operator : A Transformer Generative Approach
Title: Mesh-Informed Neural Operator : A Transformer Generative Approach | Mesh-informed Neural Operator : Ein transformer Generativer Ansatz | 气象化神经操作器:变异创造方法 2506.16656v2 |
Authors (4): Yaozhong Shi, Zachary E. Ross, Domniki Asimaki, Kamyar Azizzadenesheli
Generative models in function spaces, situated at the intersection of generative modeling and operator learning, are attracting increasing attention due to their immense potential in diverse scientific and engineering applications. While functional generative models are theoretically domain- and discretization-agnostic, current implementations heavily rely on the Fourier Neural Operator (FNO), limiting their applicability to regular grids and rectangular domains. To overcome these critical limitations, we introduce the Mesh-Informed Neural Operator (MINO). By leveraging graph neural operators and cross-attention mechanisms, MINO offers a principled, domain- and discretization-agnostic backbone for generative modeling in function spaces. This advancement significantly expands the scope of such models to more diverse applications in generative, inverse, and regression tasks. Furthermore, MINO provides a unified perspective on integrating neural operators with general advanced deep learning architectures. Finally, we introduce a suite of standardized evaluation metrics that enable objective comparison of functional generative models, addressing another critical gap in the field.
功能性基因模型在基因模型和操作者学习交汇处的功能空间中产生模型,由于其在各种科学和工程应用方面的巨大潜力,正日益引起人们的注意。功能性基因模型在理论上是域和离散型的,但目前的实施严重依赖Fourier神经操作员(FNO),将其应用范围限制在常规网格和矩形领域。为了克服这些关键的局限性,我们引入了Mesh-Inform Inform Neal操作员(MINO)。通过利用图形神经操作员和交叉注意机制,MINO为功能空间的基因模型提供有原则的、域化和离散型的骨干。这一进步大大扩大了这些模型的范围,使其在基因、反向和回归任务方面应用更加多样化。此外,MINO还提出了将神经操作员与一般高级深层次学习结构相结合的统一观点。最后,我们引入了一套标准化的评价指标,以便能够对功能性基因模型进行客观比较,解决实地的另一个关键差距。
Article 8
Title@2025-06-26 (4): Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
Title: Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity | Effiziente Flucht aus Sattelpunkten unter generalisierter Glätte durch selbsterklärende Regelmäßigkeit | 通过自我调整常态,在普遍平滑状态下有效绕开散装货架点 2503.04712v2 |
Authors (4): Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Benjamin Tang
We study the optimization of non-convex functions that are not necessarily smooth (i.e., whose gradient and/or Hessian need not be Lipschitz) using first-order methods. Smoothness is a restrictive assumption in machine learning in both theory and practice, motivating significant recent work on finding first-order stationary points of functions satisfying generalizations of smoothness with first-order methods. We develop a novel framework that lets us systematically study the convergence of a large class of first-order optimization algorithms (which we call decrease procedures) under generalizations of smoothness. We instantiate our framework to analyze the convergence of first-order optimization algorithms to first- and second-order stationary points under generalizations of smoothness. As a consequence, we establish the first convergence guarantees for first-order methods to second-order stationary points under generalizations of smoothness. We demonstrate that several canonical examples fall under our framework, and highlight practical implications.
我们用第一顺序方法研究非电流功能的优化问题,这些功能不一定是平滑的(平坦和/或黑森是利普西茨),使用第一顺序方法。光滑是在理论和实践的机器学习中的一种限制性假设,激发了最近关于寻找第一顺序固定功能点的重要工作,以第一顺序方法满足平滑的一般要求。我们开发了一个新的框架,使我们能够系统研究在平滑的概括下将一流一级优化算法(我们称之为减缩程序)综合起来的问题。我们即时地将我们的框架分析第一顺序优化算法与第一和\textit{seconomics的趋同,在平滑的概括下将固定顺序点集中起来。结果,我们为第一顺序方法确定了第一个趋同保证,在一般的平坦度方法下将第二顺序固定点合并起来。我们展示了几个典型例子属于我们的框架范围,并突出了实际影响。
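To make the notion of a first-order method reaching a second-order stationary point concrete, here is a minimal sketch of perturbed gradient descent, a well-known saddle-escaping technique. It is not the paper's decrease procedure. The objective f(x, y) = x^2 + y^4/4 - y^2/2 is a hypothetical choice: it has a saddle at the origin, minima at y = ±1, and a Hessian that grows without bound in y, so it is not smooth in the classical global sense. Step size, threshold, and noise radius are illustrative values only.

```python
import random


def grad(x, y):
    """Gradient of f(x, y) = x**2 + y**4 / 4 - y**2 / 2."""
    return 2.0 * x, y ** 3 - y


def perturbed_gd(x, y, lr=0.1, steps=2000, eps=1e-4, radius=1e-3, seed=0):
    """Gradient descent that adds small noise whenever the gradient is tiny.

    A tiny gradient means the iterate is near a stationary point; the
    noise lets it slide off a saddle, while near a minimum it only
    jitters within roughly `radius` of the minimizer.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        if (gx * gx + gy * gy) ** 0.5 < eps:
            # Near-stationary: perturb instead of taking a gradient step.
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            continue
        x -= lr * gx
        y -= lr * gy
    return x, y


# Started exactly at the saddle, unperturbed descent never moves.
stuck = perturbed_gd(0.0, 0.0, radius=0.0)
# With perturbations the iterate leaves the saddle and settles near y = +1 or -1.
escaped = perturbed_gd(0.0, 0.0)
```

The origin is a first-order stationary point (zero gradient) but not a second-order one, since the Hessian there has a negative eigenvalue along y. That is the distinction the abstract draws.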
Article 9
Title@2025-06-26 (4): Gaussian Invariant Markov Chain Monte Carlo
Title: Gaussian Invariant Markov Chain Monte Carlo | Gaussian Invariant Markov Kette Monte Carlo | 高斯不变马尔可夫链蒙特卡洛 2506.21511v1 |
Authors (4): Michalis K. Titsias, Angelos Alexopoulos, Siran Liu, Petros Dellaportas
We develop sampling methods that consist of Gaussian invariant versions of random walk Metropolis (RWM), the Metropolis-adjusted Langevin algorithm (MALA), and second-order Hessian or Manifold MALA. We show that, unlike standard RWM and MALA, Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy-to-use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high-dimensional targets in latent Gaussian models, where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
我们开发了抽样方法,其中包括随机步行大都会(RWM)的Gaussian变异版本、大都会经调整的Langevin算法(MALA)和第二顺序的Hessian或Manifound MALA。与标准的RWM和MALA不同,我们显示,高山变异取样可导致统计效率提高的自动测算器。这是由于高山变异的显著特性,使我们能够为Gaussian目标的Poisson方程获得精确的分析解决方案。这些解决办法可用于构建高效和易于使用控制变异的变量,以减少任何难选目标下的估算器的差异。我们在若干例子中展示了新的采样者和估计器,包括潜值模型中的高维目标,我们在此与一些先进方法进行比较并获得最新的结果。我们还提供了关于几何测角的梯度的理论结果,并提供了显示最佳接受率对目标高标值的依赖度的最佳比例分析。
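One way to see what Gaussian invariance of a proposal means is the preconditioned Crank-Nicolson (pCN) move, a well-known proposal that leaves the standard normal distribution invariant. Whether the samplers in the abstract above coincide with pCN is an assumption of this sketch, not something the abstract states. The code targets a one-dimensional density proportional to exp(-phi(x)) N(x; 0, 1); `pcn_sampler`, `beta`, and the example `phi` functions are illustrative names and values.

```python
import math
import random


def pcn_sampler(phi, x0, beta=0.5, n_steps=1000, seed=0):
    """pCN Metropolis sampler for a density proportional to exp(-phi(x)) * N(x; 0, 1).

    The proposal sqrt(1 - beta^2) * x + beta * z with z ~ N(0, 1) is
    reversible with respect to N(0, 1), so the acceptance ratio depends
    only on phi.
    """
    rng = random.Random(seed)
    x, accepted, chain = x0, 0, []
    for _ in range(n_steps):
        proposal = math.sqrt(1.0 - beta ** 2) * x + beta * rng.gauss(0.0, 1.0)
        log_alpha = phi(x) - phi(proposal)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x, accepted = proposal, accepted + 1
        chain.append(x)
    return chain, accepted / n_steps


# Exactly Gaussian target (phi = 0): every proposal is accepted.
_, rate_gaussian = pcn_sampler(lambda x: 0.0, x0=0.0)
# Non-Gaussian target: the extra quartic term makes some proposals get rejected.
_, rate_tilted = pcn_sampler(lambda x: 0.25 * x ** 4, x0=0.0)
```

For an exactly Gaussian target the acceptance rate is 1 regardless of `beta`, and it can only drop as the target departs from Gaussianity. This is in the spirit of the abstract's remark that the optimal acceptance rate depends on the Gaussianity of the target.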
Article 10
Title@2025-06-26 (4): skLEP: A Slovak General Language Understanding Benchmark
Title: skLEP: A Slovak General Language Understanding Benchmark | sklep: Ein slowakisches allgemeines Sprachverständnis Benchmark | SkLEP:斯洛伐克一般语言理解基准 2506.21508v1 |
Authors (8): Marek Šuppa, Andrej Ridzik, Daniel Hládek, Tomáš Javůrek, Viktória Ondrejová, Kristína Sásiková, Martin Tamajka, Marián Šimko
In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we also release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hope of fostering reproducibility and driving future research in Slovak NLU.
在这项工作中,我们引入了SkLEP,这是专门为评价斯洛伐克自然语言理解(NLU)模式而设计的第一个全面基准;我们汇编了SkLEP, 涵盖九种不同的任务,包括象征性、句式和文件层面的挑战,从而对模型能力进行彻底评估;为建立这一基准,我们为斯洛伐克人专门设计了新的原始数据集,并仔细翻译了已经建立的英文NLU资源;在本文件中,我们还利用SkLEP任务,对一系列斯洛伐克特有、多语种和经过培训的英语语言模型进行了首次系统和广泛的评价;最后,我们还发布了完整的基准数据、一个开放源工具包,便利了模型的微调和评估,并在https://github.com/slovak-nlp/sklep上发布了一个公共领导板,希望促进斯洛伐克国家语言联盟的再生化和推动未来研究。
Article 11
Title@2025-06-26 (4): NY Real Estate Racial Equity Analysis via Applied Machine Learning
Title: NY Real Estate Racial Equity Analysis via Applied Machine Learning | NY Real Estate Racial Equity Analyse über angewandtes maschinelles Lernen | 通过应用机器学习进行房地产种族公平分析 2505.16946v3 |
Authors (3): Sanjana Chalavadi, Andrei Pastor, Terry Leitch
This study analyzes tract-level real estate ownership patterns in New York State (NYS) and New York City (NYC) to uncover racial disparities. We use an advanced race/ethnicity imputation model (LSTM+Geo with XGBoost filtering, validated at 89.2% accuracy) to compare the predicted racial composition of property owners to the resident population from census data. We examine both a Full Model (statewide) and a Name-Only LSTM Model (NYC) to assess how incorporating geospatial context affects our predictions and disparity estimates. The results reveal significant inequities: White individuals hold a disproportionate share of properties and property value relative to their population, while Black, Hispanic, and Asian communities are underrepresented as property owners. These disparities are most pronounced in minority-majority neighborhoods, where ownership is predominantly White despite a predominantly non-White population. Corporate ownership (LLCs, trusts, etc.) exacerbates these gaps by reducing owner-occupied opportunities in urban minority communities. We provide a breakdown of ownership vs. population by race for majority-White, -Black, -Hispanic, and -Asian tracts, identify those with extreme ownership disparities, and compare patterns in urban, suburban, and rural contexts. The findings underscore persistent racial inequity in property ownership, reflecting broader historical and socio-economic forces, and highlight the importance of data-driven approaches to address these issues.
这项研究分析了纽约州(纽约州)和纽约市(纽约州)一级房地产所有权模式,以发现种族差异。我们使用先进的种族/族裔估算模型(LSTM+Geo和XGBo过滤器,以89.2%的准确率验证),将根据人口普查数据预测的财产所有者的种族构成与居民人口进行比较。我们研究了全模式(全州范围)和全名LSTM模式(NYC),以评估纳入地理空间背景如何影响我们的预测和差异估计。结果揭示了严重的不平等:白人占其人口的财产和财产价值的比例过高,黑人、西班牙裔和亚洲社区作为财产所有者的比例不足。这些差异在少数民族社区最为明显,尽管人口大多为非白人,但所有权主要为白人。公司所有权(全州范围)和独名LSTM模式(Nationality-Onal-LSTM模式)都通过减少城市少数群体占用的机会,加剧了这些差距。我们按种族对多数白人、黑人、黑人、希斯帕尼和亚裔社区的所有权与人口的比例不成比例,而黑人社区作为财产所有者的人数比例则不足。这些差距在城市和亚裔社会经济背景中反映出这些历史和历史、历史上的差距和历史上的差异和历史不平等。
Article 12
Title@2025-06-26 (4): Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems
Title: Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems | Prozess-Mining-gesteuerte Modellierung und Simulation zur Verbesserung der Fehlerdiagnose in cyber-physischen Systemen | 由采矿流程驱动的模型和模拟模型和模拟,以加强网络物理系统中的过失诊断 2506.21502v1 |
Authors (6): Francesco Vitale, Nicola Dall’Ora, Sebastiano Gaiardelli, Enrico Fraccaroli, Nicola Mazzocca, Franco Fummi
Fault diagnosis in Cyber-Physical Systems (CPSs) is essential for ensuring system dependability and operational efficiency by accurately detecting anomalies and identifying their root causes. However, the manual modeling of faulty behaviors often demands extensive domain expertise and produces models that are complex, error-prone, and difficult to interpret. To address this challenge, we present a novel unsupervised fault diagnosis methodology that integrates collective anomaly detection in multivariate time series, process mining, and stochastic simulation. Initially, collective anomalies are detected from low-level sensor data using multivariate time-series analysis. These anomalies are then transformed into structured event logs, enabling the discovery of interpretable process models through process mining. By incorporating timing distributions into the extracted Petri nets, the approach supports stochastic simulation of faulty behaviors, thereby enhancing root cause analysis and behavioral understanding. The methodology is validated using the Robotic Arm Dataset (RoAD), a widely recognized benchmark in smart manufacturing. Experimental results demonstrate its effectiveness in modeling, simulating, and classifying faulty behaviors in CPSs. This enables the creation of comprehensive fault dictionaries that support predictive maintenance and the development of digital twins for industrial environments.
网络物理系统中的错误诊断对于通过准确发现异常现象和查明其根源,确保系统可靠性和运作效率至关重要。然而,对错误行为的手工建模往往需要广泛的领域专门知识,并产生复杂、容易出错和难以解释的模型。为了应对这一挑战,我们提出了一个新型的未经监督的错误诊断方法,将集体异常检测纳入多变时间序列、进程采矿和随机模拟中。最初,通过多变时间序列分析,从低级别传感器数据中检测出集体异常。这些异常随后转化为结构化的事件日志,从而能够通过进程采矿发现可解释的流程模型。通过将时间分布纳入提取的Petri 网,该方法支持对错误行为进行随机模拟,从而增强根本原因分析和行为理解。该方法使用机械臂数据集(ROAD)这一广泛公认的智能制造基准加以验证。实验结果表明其在建模、模拟和分类CPS系统中的错误行为方面的有效性。通过将时间序列挖掘,从而能够创建支持数字工业环境的完整断层。
Article 13
Title@2025-06-26 (4): Devising a solution to the problems of Cancer awareness in Telangana
Title: Devising a solution to the problems of Cancer awareness in Telangana | Lösung der Probleme des Krebsbewusstseins in Telangana | 制定特拉甘纳癌症意识问题解决方案 2506.21500v1 |
Authors (4): Priyanka Avhad, Vedanti Kshirsagar, Urvi Ranjan, Mahek Nakhua
According to the data, the percentage of women who underwent screening for cervical, breast, and oral cancer in Telangana in 2020 was 3.3 percent, 0.3 percent, and 2.3 percent, respectively. Although early detection is the only way to reduce morbidity and mortality, people have very low awareness of the signs and symptoms of cervical and breast cancer and of screening practices. We developed an ML classification model to predict whether a person is susceptible to breast or cervical cancer based on demographic factors. We devised a system that suggests the nearest hospital or cancer treatment centre based on the user's location or address. In addition, we can integrate the health card to maintain medical records of all individuals and conduct awareness drives and campaigns. For the ML classification models, we used decision tree classification for cervical cancer susceptibility and support vector classification for breast cancer susceptibility. By devising this solution, we come one step closer to our goal of spreading cancer awareness, thereby decreasing cancer mortality and increasing cancer literacy among the people of Telangana.
根据这些数据,在Telangana,2020年接受宫颈癌、乳腺癌和口腔癌筛查的妇女比例分别为3.3%、0.3%和2.3%,尽管早期检测是降低发病率和死亡率的唯一途径,但人们对宫颈癌和乳腺癌症状以及症状和筛查做法的认识非常低,我们开发了ML分类模型,根据人口因素预测一个人是否容易患乳腺癌或宫颈癌。我们设计了一个系统,根据用户地点或地址,为最近的医院或癌症治疗中心提供建议。此外,我们可以整合卫生卡,以维护所有个人的医疗记录,开展提高认识的运动和运动。对于ML分类模型,我们分别使用决定树分类和支持宫颈癌易感和乳腺癌易感性病媒分类算法。因此,通过设计这一解决方案,我们更接近我们的目标,即传播癌症意识,从而降低癌症死亡率,提高特拉加纳人的癌症识字率。
Article 14
Title@2025-06-26 (4): Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment
Title: Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment | Multi-Preference Lambda-bewertet Listwise DPO für Dynamic Preference Alignment | 多首选项 Lambda 加权列表 DPO 动态首选项一致 2506.19780v2 |
Authors (4): Yuhui Sun, Xiyao Wang, Zixi Li, Jinman Zhao
While large-scale unsupervised language models (LMs) capture broad world knowledge and reasoning capabilities, steering their behavior toward desired objectives remains challenging due to the lack of explicit supervision. Existing alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on training a reward model and performing reinforcement learning to align with human preferences. However, RLHF is often computationally intensive, unstable, and sensitive to hyperparameters. To address these limitations, Direct Preference Optimization (DPO) was introduced as a lightweight and stable alternative, enabling direct alignment of language models with pairwise preference data via classification loss. However, DPO and its extensions generally assume a single static preference distribution, limiting flexibility in multi-objective or dynamic alignment settings. In this paper, we propose a novel framework: Multi-Preference Lambda-weighted Listwise DPO, which extends DPO to incorporate multiple human preference dimensions (e.g., helpfulness, harmlessness, informativeness) and enables dynamic interpolation through a controllable simplex-weighted formulation. Our method supports both listwise preference feedback and flexible alignment across varying user intents without re-training. Empirical and theoretical analysis demonstrates that our method is as effective as traditional DPO on static objectives while offering greater generality and adaptability for real-world deployment.
虽然大规模不受监督的语言模式(LMS)能够捕捉广泛的世界知识和推理能力,但由于缺乏明确的监督,引导其行为走向预期目标仍具有挑战性。现有的调整技术,如从人类反馈中强化学习(RLHF),依靠培训奖励模式和开展强化学习以适应人类的偏好。然而,RLHF往往在计算上密集、不稳定和敏感于超光谱。为解决这些局限性,直接偏好优化(DPO)作为一种轻量级和稳定的替代方案,通过分类损失,使语言模式与对称优惠数据直接一致。然而,DPO及其扩展通常采取单一的静态优惠分配,限制在多目标或动态的调整环境中的灵活性。在本文件中,我们提出了一个新的框架:多Pregation Lambda加权列表方法,将DPO扩大到多个人类偏好层面(例如帮助、无害、信息性),并通过可控性简单加权的编制,使得动态的内插。我们的方法既支持列表式的优惠反馈,也支持在不同用户意图之间灵活调整,在多目标或动态调整环境中,同时提供更灵活的理论性分析。
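A minimal sketch of how a lambda-weighted listwise objective of this kind could be computed. The abstract does not give the loss, so the form below is an assumption: each preference dimension supplies a target distribution over a list of candidate responses, the targets are mixed with simplex weights `lam`, and the loss is the cross-entropy against a softmax of DPO-style implicit rewards. The names `listwise_dpo_loss`, `targets_per_dim`, and `beta` are illustrative, not taken from the paper.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_dpo_loss(policy_logps, ref_logps, targets_per_dim, lam, beta=0.1):
    """Cross-entropy between a lambda-mixed target distribution and the
    softmax of implicit rewards beta * (log pi - log pi_ref).
    Assumed form for illustration only."""
    # lam must lie on the probability simplex.
    assert all(w >= 0 for w in lam) and abs(sum(lam) - 1.0) < 1e-9
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    probs = softmax(rewards)
    # Mix the per-dimension target distributions with the weights lam.
    mixed = [sum(w * t[i] for w, t in zip(lam, targets_per_dim))
             for i in range(len(probs))]
    return -sum(q * math.log(p) for q, p in zip(mixed, probs))

# Toy list of three responses and two preference dimensions.
policy = [-1.0, -2.0, -3.0]
reference = [-2.0, -2.0, -2.0]
helpful = [1.0, 0.0, 0.0]    # dimension A prefers response 0
harmless = [0.0, 0.0, 1.0]   # dimension B prefers response 2
loss_a = listwise_dpo_loss(policy, reference, [helpful, harmless], [1.0, 0.0], beta=1.0)
loss_b = listwise_dpo_loss(policy, reference, [helpful, harmless], [0.0, 1.0], beta=1.0)
loss_mix = listwise_dpo_loss(policy, reference, [helpful, harmless], [0.5, 0.5], beta=1.0)
```

Because this assumed loss is linear in `lam`, changing the weights at inference or training time interpolates between the per-dimension objectives, which is one way to read the "controllable simplex-weighted formulation" in the abstract.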
Article 15
Title@2025-06-26 (4): One Model to Forecast Them All and in Entity Distributions Bind Them
Title: One Model to Forecast Them All and in Entity Distributions Bind Them | Ein Modell, um sie zu prognostizieren Alles und in Entity-Distributionen Bind Them | 预测所有实体和实体分配的模型之一 2501.15499v2 |
Authors (2): Kutay Bölat, Simon Tindemans
Probabilistic forecasting in power systems often involves multi-entity datasets like households, feeders, and wind turbines, where generating reliable entity-specific forecasts presents significant challenges. Traditional approaches require training individual models for each entity, making them inefficient and hard to scale. This study addresses this problem using GUIDE-VAE, a conditional variational autoencoder that allows entity-specific probabilistic forecasting using a single model. GUIDE-VAE provides flexible outputs, ranging from interpretable point estimates to full probability distributions, thanks to its advanced covariance composition structure. These distributions capture uncertainty and temporal dependencies, offering richer insights than traditional methods. To evaluate our GUIDE-VAE-based forecaster, we use household electricity consumption data as a case study due to its multi-entity and highly stochastic nature. Experimental results demonstrate that GUIDE-VAE outperforms conventional quantile regression techniques across key metrics while ensuring scalability and versatility. These features make GUIDE-VAE a powerful and generalizable tool for probabilistic forecasting tasks, with potential applications beyond household electricity consumption.
电力系统的概率预测往往涉及多个实体的数据集,如家庭、饲料和风力涡轮机,由此产生可靠的具体实体的预测带来重大挑战。传统方法要求为每个实体培训单个模型,使其效率低,规模难以衡量。本研究使用一个单一模型来解决这个问题,该模型是有条件的可变自动编码工具,允许使用一个单一模型进行具体实体的概率预测。 GUDE-VAE提供了灵活的产出,从可解释点估计到全面概率分布,由于它的先进的共变结构。这些分布方法收集了不确定性和时间依赖性,提供了比传统方法更丰富的洞察力。为了评估基于GUID-VAE的预报器,我们使用家庭用电量数据作为案例研究,因为其多性和高度互不相干的性质。实验结果表明,DUID-VAE在确保可缩放性和易用性的同时,超越了关键计量标准,超越了常规的四重回归技术。这些特征使得GUDE-VAE成为一种强大和可通用的预测任务工具,具有超出家庭用电消费范围的潜在应用。
Article 16
Title@2025-06-26 (4): Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages
Title: Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages | Mit Phonemes: Mehrsprachigkeit von LLMs für nicht-lateinische Script-Sprachen verbessern | 以电话提示:提高LLMS的非拉丁文拼写语言多重语言质量 2411.02398v3 |
Authors (7): Hoang H Nguyen, Khyati Mahajan, Vikas Yadav, Julian Salazar, Philip S. Yu, Masoud Hashemi, Rishabh Maheshwary
Although multilingual LLMs have achieved remarkable performance across benchmarks, we find they continue to underperform on non-Latin script languages across contemporary LLM families. This discrepancy arises from the fact that LLMs are pretrained with orthographic scripts, which are dominated by Latin characters that obscure their shared phonology with non-Latin scripts. We propose leveraging phonemic transcriptions as complementary signals to induce script-invariant representations. Our study demonstrates that integrating phonemic signals improves performance across both non-Latin and Latin script languages, with a particularly significant impact on closing the performance gap between the two. Through detailed experiments, we show that phonemic and orthographic scripts retrieve distinct examples for in-context learning (ICL). This motivates our proposed Mixed-ICL retrieval strategy, where further aggregation from both leads to our significant performance improvements for both Latin script languages (up to 12.6%) and non-Latin script languages (up to 15.1%) compared to randomized ICL retrieval.
虽然多语种LLM在基准方面取得了显著的成绩,但我们发现,在当代LLM家族中,这些LLM在非拉丁文字语言上的表现仍然不尽如人意。这一差异源于LLM在接受正拼写文字学培训之前就已接受过拼写文字学的训练,这些文字主要是拉丁字符,这些拉丁字符模糊了他们与非拉丁文字的同声文字学。我们建议利用电话抄录作为辅助信号,以诱导脚本和拉丁文字表达。我们的研究显示,结合语音信号可以改善非拉丁文字和拉丁文字语言的成绩,对缩小两种文字的成绩差距产生特别显著的影响。我们通过详细实验发现,语音和文字文字文字学(ICL)可以找到不同的例子。这激励了我们拟议的混合ICL检索战略,在这两个战略中,从中进一步整合导致我们对拉丁文字语言(高达12.6%)和非拉丁文字语言(高达15.1%)和非拉丁文字语言(高达15.1%)的成绩与随机的ICL检索相比都有显著的改进。
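A minimal sketch of a mixed retrieval step for in-context examples. It assumes each pool example has two embeddings, one from its orthographic text and one from its phonemic transcription, and that aggregation means interleaving the two top-k lists and dropping duplicates. The abstract does not specify the aggregation rule, so `mixed_icl_retrieve` and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query, pool, k):
    """Indices of the k pool vectors most similar to the query."""
    ranked = sorted(range(len(pool)), key=lambda i: cosine(query, pool[i]), reverse=True)
    return ranked[:k]

def mixed_icl_retrieve(q_orth, q_phon, pool_orth, pool_phon, k):
    """Interleave orthographic and phonemic hits, keeping k unique examples
    (assumed aggregation rule)."""
    orth_hits = top_k(q_orth, pool_orth, k)
    phon_hits = top_k(q_phon, pool_phon, k)
    merged = []
    for a, b in zip(orth_hits, phon_hits):
        for idx in (a, b):
            if idx not in merged:
                merged.append(idx)
    return merged[:k]

# Three pool examples with toy 2-D embeddings in each view.
pool_orth = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pool_phon = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
selected = mixed_icl_retrieve([1.0, 0.0], [1.0, 0.0], pool_orth, pool_phon, k=2)
```

In this toy case the two views rank different examples first, so the mixed list contains one example from each view, mirroring the observation that the two scripts retrieve distinct examples.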
Article 17
Title@2025-06-26 (4): From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents
Title: From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents | Von der Web-Suche in Richtung Agentic Deep Research: Incentivizing Search with Reasoning Agents | 从网络搜索到代理深层研究:激励使用理性代理进行搜索 2506.18959v2 |
Authors (23): Weizhi Zhang, Yangning Li, Yuanchen Bei, Junyu Luo, Guancheng Wan, Liangwei Yang, Chenxuan Xie, Yuyao Yang, Wei-Chieh Huang, Chunyu Miao, Henry Peng Zou, Xiao Luo, Yusheng Zhao, Yankai Chen, Chunkit Chan, Peilin Zhou, Xinyang Zhang, Chenwei Zhang, Jingbo Shang, Ming Zhang, Yangqiu Song, Irwin King, Philip S. Yu
Information retrieval is a cornerstone of modern knowledge acquisition, enabling billions of queries each day across diverse domains. However, traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. These systems transcend conventional information search techniques by tightly integrating autonomous reasoning, iterative retrieval, and information synthesis into a dynamic feedback loop. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We also introduce a test-time scaling law to formalize the impact of computational depth on reasoning and search. Supported by benchmark results and the rise of open-source implementations, we demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking. All the related resources, including industry products, research papers, benchmark datasets, and open-source implementations, are collected for the community in https://github.com/DavidZWZ/Awesome-Deep-Research.
我们的立场是,具有推理和代理能力的大型语言模型(LLMS)正在引入一个新的范式,称为 “ 干深研究 “ 。这些系统通过紧密整合自主推理、迭代检索和信息合成,超越常规信息搜索技术,形成动态反馈循环。我们追踪从静态网络搜索到互动、代理系统的变化,这些系统计划、探索和学习。我们还引入了测试-时间缩放法,以正式确定计算深度对推理和搜索的影响。我们借助基准结果和开放源执行的崛起,证明 “ 干深研究 “ 不仅大大超越了现有方法,而且还准备成为未来信息搜索的主要范式。所有相关资源,包括工业产品、研究论文、基准数据集和开放源实施,都在https://github.com/DavidZZZ/Awesome-Deep-Research中为社区收集了所有相关资源,包括工业产品、研究文件、基准数据集、基准数据集和开放源实施。
Article 18
Title@2025-06-26 (4): Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
Title: Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection | Zuverlässige Erkennung von leerem Raum: Bedingte markierte Punktprozesse für Objekterkennung | 争取可靠地探测空空空间:物体探测的有条件定点过程 2506.21486v1 |
Authors (3): Tobias J. Riedlinger, Kira Maag, Hanno Gottschalk
Deep neural networks have set the state-of-the-art in computer vision tasks such as bounding box detection and semantic segmentation. Object detectors and segmentation models assign confidence scores to predictions, reflecting the model’s uncertainty in object detection or pixel-wise classification. However, these confidence estimates are often miscalibrated, as their architectures and loss functions are tailored to task performance rather than probabilistic foundation. Even with well calibrated predictions, object detectors fail to quantify uncertainty outside detected bounding boxes, i.e., the model does not make a probability assessment of whether an area without detected objects is truly free of obstacles. This poses a safety risk in applications such as automated driving, where uncertainty in empty areas remains unexplored. In this work, we propose an object detection model grounded in spatial statistics. Bounding box data matches realizations of a marked point process, commonly used to describe the probabilistic occurrence of spatial point events identified as bounding box centers, where marks are used to describe the spatial extension of bounding boxes and classes. Our statistical framework enables a likelihood-based training and provides well-defined confidence estimates for whether a region is drivable, i.e., free of objects. We demonstrate the effectiveness of our method through calibration assessments and evaluation of performance.
深神经网络设置了计算机视觉任务中最先进的技术,如捆绑盒探测和语义分解等。物体探测器和分解模型给预测分配了信任分数,反映了模型在物体探测或像素分类中的不确定性。然而,这些信任估计往往有误,因为其结构和损失功能是针对任务性能而不是概率基础的。即使有经过适当校准的预测,物体探测器也无法量化所检测到的框框外不确定性,也就是说,该模型无法对没有被检测到的物体的区域是否真正没有障碍进行概率评估。这在自动驾驶等应用中造成了安全风险,因为空域的不确定性仍未被探索。在这项工作中,我们提出了一个基于空间统计的物体探测模型。框数据匹配一个标记点的实现情况,通常用来描述被确定为捆绑箱中心的空间点事件的概率发生,用标记来描述捆绑框和分类的空间扩展。我们的统计框架使得有可能进行基于可能性的培训,并且提供了对是否通过磁性评估进行自由的校准。我们通过一个区域来显示一个可以进行的业绩评估。
Article 19
Title@2025-06-26 (4): Evaluation of Traffic Signals for Daily Traffic Pattern
Title: Evaluation of Traffic Signals for Daily Traffic Pattern | Bewertung von Verkehrssignalen für das tägliche Verkehrsmuster | 对每日交通模式交通信号的评价 2506.21469v1 |
Authors (2): Mohammad Shokrolah Shirazi, Hung-Fu Chang
Turning movement count (TMC) data is crucial for traffic signal design, intersection geometry planning, traffic flow, and congestion analysis. This work proposes three methods, called dynamic, static, and hybrid configuration, for TMC-based traffic signals. A vision-based tracking system is developed to estimate the TMC of six intersections in Las Vegas using traffic cameras. The intersection design, route (e.g., vehicle movement directions), and signal configuration files with compatible formats are synthesized and imported into Simulation of Urban MObility for signal evaluation with realistic data. The initial experimental results based on estimated waiting times indicate that cycle times of 90 and 120 seconds work best for all intersections. In addition, four intersections show better performance with the dynamic signal timing configuration, and the other two, with lower performance, have a lower ratio of total vehicle count to total lanes of the intersection leg. Since daily traffic flow often exhibits a bimodal pattern, we propose a hybrid signal method that switches between dynamic and static methods, adapting to peak and off-peak traffic conditions for improved flow management. To this end, a built-in traffic generator module creates vehicle routes for 4 hours, including peak hours, and a signal design module produces signal schedule cycles according to the static, dynamic, and hybrid methods. Vehicle count distributions are weighted differently for each zone (i.e., West, North, East, South) to generate diverse traffic patterns. The extended experimental results for 6 intersections with 4 hours of simulation time imply that zone-based traffic pattern distributions affect signal design selection. Although the static method works well for evenly distributed zone-based traffic, the hybrid method works well for highly weighted traffic at intersection pairs of the West-East and North-South zones.
转动计数数据对交通信号设计、交叉几何规划、交通流量和拥堵分析至关重要。 这项工作提出了三种方法,称为动态、静态和混合配置,用于TMC的交通信号。 开发了一个基于视觉的跟踪系统,以利用交通摄像头对拉斯维加斯6个交叉点的TMC进行估计。 交叉设计、路线(例如车辆移动方向)和具有兼容格式的信号配置文件被合成并输入到城市移动模拟中,以便用现实数据进行信号评价。 根据估计等待时间得出的初步实验结果显示,90和120秒的周期时间对所有交叉点最有效。 此外,四个交叉点显示动态信号时间配置的性能更好,静态路段的动态路段总数比较低。 由于日常交通流量往往呈现双式模式,我们提出了一种混合信号方法,在动态和静态方法之间开关,以适应峰值和峰值的交通条件来改进流量管理。 因此,一个基于建筑的交通发电机模块将车道路段的路线设定为4小时,包括高峰时段, 以及一个信号设计模块设计模块显示动态路段的交通流量结构结构, 向北纬路段总结构向北段的频率分布, 向北纬度向北纬度为动态路段的汇率分配。 向北路段分配为动态路段的汇率计算, 。 向北纬度向北位计算计算, 向南计算向南计算,向北纬路段路段路段路段路段至南计算,向北路段至北路段至北路段至北路段段段段段段, 。
Article 20
Title@2025-06-26 (4): In-Context Learning Strategies Emerge Rationally
Title: In-Context Learning Strategies Emerge Rationally | In-Context Learning Strategies Emerge Rational | 新兴动力 2506.17859v2 |
Authors (6): Daniel Wurgaft, Ekdeep Singh Lubana, Core Francisco Park, Hidenori Tanaka, Gautam Reddy, Noah D. Goodman
Recent work analyzing in-context learning (ICL) has identified a broad set of strategies that describe model behavior in different experimental conditions. We aim to unify these findings by asking why a model learns these disparate strategies in the first place. Specifically, we start with the observation that when trained to learn a mixture of tasks, as is popular in the literature, the strategies learned by a model for performing ICL can be captured by a family of Bayesian predictors: a memorizing predictor, which assumes a discrete prior on the set of seen tasks, and a generalizing predictor, where the prior matches the underlying task distribution. Adopting the normative lens of rational analysis, where a learner’s behavior is explained as an optimal adaptation to data given computational constraints, we develop a hierarchical Bayesian framework that almost perfectly predicts Transformer next-token predictions throughout training – without assuming access to its weights. Under this framework, pretraining is viewed as a process of updating the posterior probability of different strategies, and inference-time behavior as a posterior-weighted average over these strategies’ predictions. Our framework draws on common assumptions about neural network learning dynamics, which make explicit a tradeoff between loss and complexity among candidate strategies: beyond how well it explains the data, a model’s preference towards implementing a strategy is dictated by its complexity. This helps explain well-known ICL phenomena, while offering novel predictions: e.g., we show a superlinear trend in the timescale for transitioning from generalization to memorization as task diversity increases. Overall, our work advances an explanatory and predictive account of ICL grounded in tradeoffs between strategy loss and complexity.
最近的工作分析文中学习(ICL)已经确定了一套广泛的战略,描述不同实验条件下的模型行为。我们的目标是统一这些结论,询问为什么模型首先学习这些不同的战略。具体地说,我们首先观察,当我们受过训练学习混合任务时,正如文献中流行的那样,通过执行ICL模型学习的战略可以被一个巴伊西亚预测家的大家庭所捕捉:一个记忆化的预测者,它假定在一组已知任务之前是一个离散的预言者,以及一个总体化预测者,而以前的预测者与基本任务分布相匹配。我们采用了理性分析的规范性透镜,其中将学习者的行为解释为对计算限制下的数据的最佳调整。我们的框架将一个等级级的拜伊斯框架,它几乎完美地预测下流的预测,而不用假设其贸易权重。在这个框架下,先期行为被视为一个更新不同战略的远征概率的过程,而后向时间化的行为则作为这些战略的后期平均值。在预测中,我们的框架将一个共同的解释性的解释性战略用来解释一个共同的系统, 解释一个共同的变价变的假设, 也就是它是如何解释一个解释一个共同的变变变的策略,它是如何解释一个解释它是如何解释一个共同的策略,一个解释一个共同的缩缩缩缩的缩的策略, 。在它是如何在它是如何解释一个解释一个解释一个解释一个解释一个解释一个系统, 的策略,它是如何在我们的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式的变式战略, 。
Article 21
Title@2025-06-26 (4): Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage
Title: Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage | Optimierung der Runge-Kutta-Methoden der 4. Ordnung: Dynamischer heuristischer Ansatz für Effizienz und geringen Speicher | 优化第四阶极龙格-库塔方法:高效和低储存的动态超光速方法 2506.21465v1 |
Authors (3): Gavin Lee Goodship, Luis Miralles-Pechuan, Stephen O’Sullivan
Extended Stability Runge-Kutta (ESRK) methods are crucial for solving large-scale computational problems in science and engineering, including weather forecasting, aerodynamic analysis, and complex biological modelling. However, balancing accuracy, stability, and computational efficiency remains challenging, particularly for high-order, low-storage schemes. This study introduces a hybrid Genetic Algorithm (GA) and Reinforcement Learning (RL) approach for automated heuristic discovery, optimising low-storage ESRK methods. Unlike traditional approaches that rely on manually designed heuristics or exhaustive numerical searches, our method leverages GA-driven mutations for search-space exploration and an RL-inspired state transition mechanism to refine heuristic selection dynamically. This enables systematic parameter reduction, preserving fourth-order accuracy while significantly improving computational efficiency. The proposed GA-RL heuristic optimisation framework is validated through rigorous testing on benchmark problems, including the 1D and 2D Brusselator systems and the steady-state Navier-Stokes equations. The best-performing heuristic achieves a 25% reduction in IPOPT runtime compared to traditional ESRK optimisation processes while maintaining numerical stability and accuracy. These findings demonstrate the potential of adaptive heuristic discovery to improve resource efficiency in high-fidelity simulations and broaden the applicability of low-storage Runge-Kutta methods in real-world computational fluid dynamics, physics simulations, and other demanding fields. This work establishes a new paradigm in heuristic optimisation for numerical methods, opening pathways for further exploration using Deep RL and AutoML-based heuristic search.
扩展稳定龙格-库塔(ESRK)方法对于解决科学和工程(包括天气预报、空气动力分析和复杂的生物建模)方面的大规模计算问题至关重要,然而,平衡准确性、稳定性和计算效率仍然具有挑战性,特别是在高阶、低储量计划方面。这项研究引入了自动超常发现(GA)和强化学习(RL)混合方法,优化了低储量ESK方法。与依赖人工设计的超常或详尽的数字搜索的传统方法不同,我们的方法利用GA驱动的深度变异来进行搜索空间探索,并采用RL驱动的州级过渡机制来动态完善超常选择。这有利于系统降低参数,保持四级的准确性,同时大幅提高计算效率。 拟议的GA-RUT(GA)和SEUBER(RL)的优化优化框架通过对基准问题的严格测试,包括1D和2DBruseltor系统,以及稳定状态的Savier-Stokes 数字等方程式。我们最优秀的超低级搜索方法利用GA的深度探索模型进行探索,在IPPTLL的运行轨道上进行25°计算,在运行过程中实现了不断递减缩的轨变精化,同时进行中,并展示与传统资源稳定地分析。
Article 22
Title@2025-06-26 (4): Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs
Title: Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs | Capacity-Constrained Online-Lernen mit Verzögerungen: Scheduling Frameworks und Trade-offs bedauern | 受能力制约的有延误的在线学习:时间安排框架和悔恨取舍 2503.19856v2 |
Authors (3): Alexander Ryabchenko, Idan Attias, Daniel M. Roy
We study online learning with oblivious losses and delays under a novel "capacity constraint" that limits how many past rounds can be tracked simultaneously for delayed feedback. Under "clairvoyance" (i.e., delay durations are revealed upfront each round) and/or "preemptibility" (i.e., we can stop tracking previously chosen round feedback), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the "optimal capacity" needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}(\sqrt{TK + DK/C + D\log(K)})$ for bandits and $\widetilde{\Theta}(\sqrt{(D+T)\log(K)})$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding $\widetilde{O}(d_{\max})$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta(\sqrt{TK(1+d/C)+Td\log(K)})$ and the optimal capacity is $\Theta(\min\{K/\log(K), d\})$ in the bandit setting, while in the full-information feedback setting, the minimax regret is $\Theta(\sqrt{T(d+1)\log(K)})$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel preemptive and non-preemptive scheduling policies, based on Pareto-distributed proxy delays, and batching techniques, respectively. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.
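The fixed-delay rates quoted in this abstract can be evaluated numerically to see why a modest capacity suffices. The sketch below computes only the expressions inside the $\Theta(\cdot)$ (constants are ignored, so the numbers are orders of magnitude, not regret values); the function names are illustrative and assume $K \ge 2$ and $d \ge 1$.

```python
import math

def bandit_rate(T, K, d, C):
    """sqrt(T*K*(1 + d/C) + T*d*log K): the fixed-delay bandit expression."""
    return math.sqrt(T * K * (1 + d / C) + T * d * math.log(K))

def full_info_rate(T, K, d):
    """sqrt(T*(d + 1)*log K): the fixed-delay full-information expression."""
    return math.sqrt(T * (d + 1) * math.log(K))

def bandit_optimal_capacity(K, d):
    """min{K / log K, d}: the capacity scale stated for the bandit setting."""
    return min(K / math.log(K), d)

T, K, d = 10_000, 10, 50
c_opt = bandit_optimal_capacity(K, d)
rate_at_c_opt = bandit_rate(T, K, d, c_opt)
rate_unlimited = bandit_rate(T, K, d, math.inf)  # d / inf evaluates to 0.0
ratio = rate_at_c_opt / rate_unlimited
```

With $C = \min\{K/\log(K), d\}$ the capacity term $TKd/C$ equals either $Td\log(K)$ or $TK$, so it at most doubles the quantity under the square root. The ratio to the unlimited-capacity expression is therefore at most $\sqrt{2}$ for any $T$, $K$, and $d$, and increasing $C$ further changes only the constant.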
Article 23
Title@2025-06-26 (4): Aligning Spoken Dialogue Models from User Interactions
Title: Aligning Spoken Dialogue Models from User Interactions | Ausrichten von gesprochenen Dialogmodellen aus Benutzerinteraktionen | 校对用户互动中的口语对话框模型 2506.21463v1 |
Authors (4): Anne Wu, Laurent Mazaré, Neil Zeghidour, Alexandre Défossez
We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns. We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, to cover preferences over both linguistic content and temporal context variations. We leverage offline alignment methods to finetune a full-duplex autoregressive speech-to-speech model. Extensive experiments demonstrate that feedback on generic conversations can be consistently effective in improving spoken dialogue models to produce more factual, safer and more contextually aligned interactions. We deploy the finetuned model and conduct holistic human evaluations to assess the impact beyond single-turn conversations. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
我们提出一个新的优惠调整框架,以改善用户互动实时对话的口语对话模式; 目前优惠的学习方法主要侧重于基于文本的语言模式,不直接适应实时语音互动的复杂性,其动态更丰富(例如中断、互接),发言者旋转之间没有明显的分化; 我们创建了大型数据集,由来自原始多端语音对话的150 000多对特惠组合组成,并附有AI反馈说明,以涵盖语言内容和时间背景变化的偏好; 我们利用离线调整方法微调一个全不易自定义的自动语音对语音模式。 广泛的实验表明,对通用对话的反馈可以始终有效地改进口语对话模式,产生更实际、更安全和更符合背景的交互作用。 我们运用微调模型,进行整体的人类评价,以评估单点话之外的影响。 我们的调查结果揭示了各种动态之间平衡兼顾的重要性,这对自然实时语音对话系统至关重要。
Article 24
Title@2025-06-26 (4): A Keyword-Based Technique to Evaluate Broad Question Answer Script
Title: A Keyword-Based Technique to Evaluate Broad Question Answer Script | Eine Keyword-basierte Technik zur Bewertung von Broad Question Answer Script | 用于评价广泛问答脚本的关键字技术 2506.21461v1 |
Authors (5): Tamim Al Mahmud, Md Gulzar Hussain, Sumaiya Kabir, Hasnain Ahmad, Mahmudus Sobhan
Evaluation is the method of assessing and determining the educational system through various techniques such as verbal or viva voce tests and subjective or objective written tests. This paper presents an efficient solution for evaluating subjective answer scripts electronically. We propose and implement an integrated system that examines and evaluates the written answer script. The system finds the keywords in the answer script and compares them with keywords parsed from both open and closed domains. It also checks for grammatical and spelling errors in the answer script. Tested on the answer scripts of 100 students, the proposed system achieves a precision score of 0.91.
评估是评估和确定教育系统的方法,通过口头或口头或口头声音测试、主观或客观书面测试等各种技术进行评估和确定教育系统,本文件是评价主观回答文字电子化的有效办法,我们在此文件中提出并实施了一套综合系统,用以审查和评价书面答复文字,这一条的重点是从回答文字中找出关键词,然后将其与从开放和封闭域中解析的关键词进行比较,该系统还检查了回答文字中的语法和拼写错误。我们提议的系统以100名学生的回答文字测试,给出了0.91分精确分数。
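A minimal sketch of keyword-based scoring of a written answer, assuming the reference keywords are already available (the abstract says they are parsed from open and closed domains but does not say how). The stopword list, the tokenizer, and the scoring rule (fraction of reference keywords found) are illustrative assumptions, and the grammar and spelling checks are not covered.

```python
import re

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "by"}

def extract_keywords(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def keyword_score(answer, reference_keywords):
    """Fraction of reference keywords that appear in the answer."""
    reference = {k.lower() for k in reference_keywords}
    if not reference:
        return 0.0
    return len(extract_keywords(answer) & reference) / len(reference)

answer = "Photosynthesis converts light energy into chemical energy in the chloroplast."
reference = ["photosynthesis", "light", "chlorophyll", "chloroplast"]
score = keyword_score(answer, reference)  # 3 of 4 reference keywords are present
```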
Article 25
Title@2025-06-26 (4): Wild refitting for black box prediction
Title: Wild refitting for black box prediction | Wilde Nachrüstung für Black Box Vorhersage | 黑盒预测的野生改造 2506.21460v1 |
Authors (1): Martin J. Wainwright
We describe and analyze a computationally efficient refitting procedure for computing high-probability upper bounds on the instance-wise mean-squared prediction error of penalized nonparametric estimates based on least-squares minimization. Requiring only a single dataset and black box access to the prediction method, it consists of three steps: computing suitable residuals, symmetrizing and scaling them with a pre-factor $\rho$, and using them to define and solve a modified prediction problem recentered at the current estimate. We refer to it as wild refitting, since it uses Rademacher residual symmetrization as in a wild bootstrap variant. Under relatively mild conditions allowing for noise heterogeneity, we establish a high probability guarantee on its performance, showing that the wild refit with a suitably chosen wild noise scale $\rho$ gives an upper bound on prediction error. This theoretical analysis provides guidance into the design of such procedures, including how the residuals should be formed, the amount of noise rescaling in the wild sub-problem needed for upper bounds, and the local stability properties of the black-box procedure. We illustrate the applicability of this procedure to various problems, including non-rigid structure-from-motion recovery with structured matrix penalties; plug-and-play image restoration with deep neural network priors; and randomized sketching with kernel methods.
我们描述并分析一个计算高概率高值上限的考虑高效的重新配置程序,该程序是在以最小平方最小化为基础,对受处罚的、非参数性估算的预测错误进行任意、平均和偏差的预测错误,仅要求单一数据集和黑盒访问预测方法,它由三个步骤组成:计算适当的残余物,以元前的元值对称和缩放,并使用它们来界定和解决当前估计中出现的经修改的预测问题。我们把它称为奇特的重新配置,因为它使用Rademacher残余对称的预测错误,作为野靴式变异。在相对温和的条件下,允许噪声异性,我们对它的性能建立高概率保证,表明用适当选择的野生噪声比标值对预测误率进行校准。这种理论分析为设计此类程序提供了指导,包括如何形成残余物,野生子对准的子质标值调整数量,因为它使用野生次质的次质调整,作为野生靴式组合的配比值,在较轻的条件下,在允许噪异异异的变异性情况下,我们建立高异性组合,我们为其性结构结构结构结构结构的恢复程序,我们用结构结构结构结构结构结构结构结构结构的恢复,用结构结构结构结构结构的变式的恢复,用结构结构,用结构结构结构结构结构的变形变式的变制,用办法说明。
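The three steps named in the abstract can be sketched as follows. This is one reading of the procedure: the wild responses are taken to be the current fit plus $\rho$ times Rademacher-signed residuals, and the black box is refit on them. The stand-in estimator `fit_ridge_1d` and the function `wild_refit` are hypothetical, and the sketch does not compute the paper's error bound.

```python
import random

def fit_ridge_1d(xs, ys, penalty=1.0):
    """Stand-in black box: penalized least squares for y ~ w * x."""
    w = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)
    return lambda x: w * x

def wild_refit(fit, xs, ys, rho, rng):
    """Refit a black-box method on Rademacher-symmetrized, rescaled residuals."""
    f_hat = fit(xs, ys)
    fitted = [f_hat(x) for x in xs]
    # Step 1: residuals of the current estimate.
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Step 2: symmetrize with random signs and scale by the pre-factor rho.
    signs = [rng.choice((-1.0, 1.0)) for _ in xs]
    wild_ys = [f + rho * s * r for f, s, r in zip(fitted, signs, residuals)]
    # Step 3: solve the modified problem recentered at the current estimate.
    f_wild = fit(xs, wild_ys)
    return f_hat, f_wild, wild_ys

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
f_hat, f_wild, wild_ys = wild_refit(fit_ridge_1d, xs, ys, rho=2.0, rng=random.Random(0))
```

Only calls to `fit` are needed, which matches the single-dataset, black-box access described in the abstract; how the difference between `f_wild` and `f_hat` is converted into a bound is left to the paper.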
Article 26
Title@2025-06-26 (4): Fake it till You Make it: Reward Modeling as Discriminative Prediction
Title: Fake it till You Make it: Reward Modeling as Discriminative Prediction | Verfälschen Sie es, bis Sie es: Belohnung Modellieren als diskriminative Vorhersage | 假称直到你做出它: 奖赏模型作为有偏见的预测 2506.13846v2 |
Authors (6): Runtao Liu, Jiahao Zhan, Yingqing He, Chen Wei, Alan Yuille, Qifeng Chen
An effective reward model plays a pivotal role in reinforcement learning for post-training enhancement of visual generative models. However, current approaches to reward modeling suffer from implementation complexity due to their reliance on extensive human-annotated preference data or meticulously engineered quality dimensions that are often incomplete and engineering-intensive. Inspired by adversarial training in generative adversarial networks (GANs), this paper proposes GAN-RM, an efficient reward modeling framework that eliminates manual preference annotation and explicit quality dimension engineering. Our method trains the reward model through discrimination between a small set of representative, unpaired target samples (denoted as Preference Proxy Data) and model-generated ordinary outputs, requiring only a few hundred target samples. Comprehensive experiments demonstrate our GAN-RM’s effectiveness across multiple key applications including test-time scaling implemented as Best-of-N sample filtering, post-training approaches like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Code and data will be released at https://github.com/Visualignment/GAN-RM.
有效的奖赏模式在加强培训后提高视觉基因模型的学习方面发挥着关键作用,然而,目前的奖赏模式方法由于依赖广泛的人类附加说明的优惠数据或往往不完全和工程密集的精心设计的质量层面,而在执行方面又很复杂。受生成对抗网络(GAN)中对抗训练的启发,本文件提出GAN-RM,这是一个有效的奖赏模式框架,消除了人工偏好说明和明确的质量层面工程。我们的方法通过区分一小组有代表性的、未配对的目标样本(称为Preference Proxy Data)和模型生成的普通产出来培训奖赏模式,只需要几百个目标样本。全面实验表明我们的GAN-RM在多种关键应用中的有效性,包括以Best-of-N样本过滤实现的测试时间扩展,以及监督微调(SFT)和直接偏好优化(DPO)等培训后方法。代码和数据将在 https://github.com/Visualignment/GAN-RM 发布。
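The Best-of-N filtering mentioned in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GAN-RM's code: `generate` and `reward_model` are hypothetical stand-ins for a generator and a trained discriminator, and the toy versions below work on plain floats.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[], T],
              reward_model: Callable[[T], float],
              n: int) -> T:
    """Draw n candidates and keep the one the reward model scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[T] = [generate() for _ in range(n)]
    return max(candidates, key=reward_model)


# Toy stand-ins (assumptions for illustration only): samples are floats and
# the "reward model" prefers values close to a target of 1.0.
rng = random.Random(0)


def toy_generate() -> float:
    return rng.uniform(-2.0, 2.0)


def toy_reward(sample: float) -> float:
    return -abs(sample - 1.0)


best = best_of_n(toy_generate, toy_reward, n=16)
```

Larger `n` trades more generation cost for a better expected reward, which is why this is described as test-time scaling.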
Article 27
Title@2025-06-26 (4): Measurement to Meaning: A Validity-Centered Framework for AI Evaluation
Title: Measurement to Meaning: A Validity-Centered Framework for AI Evaluation | Messung zur Bedeutung: Ein gültigkeitszentrierter Rahmen für die AI-Bewertung | 衡量到意义:AI评价的有效性-中心框架 2505.10573v4 |
Authors (9): Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo
While the capabilities and utility of AI systems have advanced, rigorous norms for evaluating these systems have lagged. Grand claims, such as models achieving general reasoning capabilities, are supported with model performance on narrow benchmarks, like performance on graduate-level exam questions, which provide a limited and potentially misleading assessment. We provide a structured approach for reasoning about the types of evaluative claims that can be made given the available evidence. For instance, our framework helps determine whether performance on a mathematical benchmark is an indication of the ability to solve problems on math tests or instead indicates a broader ability to reason. Our framework is well-suited for the contemporary paradigm in machine learning, where various stakeholders provide measurements and evaluations that downstream users use to validate their claims and decisions. At the same time, our framework also informs the construction of evaluations designed to speak to the validity of the relevant claims. By leveraging psychometrics’ breakdown of validity, evaluations can prioritize the most critical facets for a given claim, improving empirical utility and decision-making efficacy. We illustrate our framework through detailed case studies of vision and language model evaluations, highlighting how explicitly considering validity strengthens the connection between evaluation evidence and the claims being made.
虽然AI系统的能力和效用已经提高,但评价这些系统的严格规范却落后了。大型的主张,如获得一般推理能力的模型等,得到了关于狭隘基准的示范性业绩的支持,例如研究生级考试问题的业绩,这提供了有限和潜在的误导性评估。我们为根据现有证据可以提出的评价性主张的类型提供了有条理的推理方法。例如,我们的框架有助于确定数学基准的绩效是表明解决数学测试问题的能力,还是表明较广泛的理性能力。我们的框架非常适合现代机器学习模式,即各种利益攸关方提供下游用户用来验证其主张和决定的测量和评价。与此同时,我们的框架也为旨在说明有关主张有效性的评价结构提供了参考。通过利用精神计量的断裂,评价可以确定某一索赔的最关键方面的优先次序,提高经验的效用和决策效力。我们通过对愿景和语言模型评价的详细案例研究来说明我们的框架,强调明确考虑有效性如何加强评价证据与正在提出的主张之间的联系。
Article 28
Title@2025-06-26 (4): PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries
Title: PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries | PARALLELPROMPT: Parallelität aus großen Sprachmodellfragen extrahieren | PARALELPROPT:从大语言模式查询中提取平行论 2506.18728v2 |
Authors (4): Steven Kolawole, Keshav Santhanam, Virginia Smith, Pratiksha Thaker
LLM serving systems typically treat user prompts as monolithic inputs, optimizing inference through decoding tricks or inter-query batching. However, many real-world prompts contain latent semantic parallelism: decomposable structures where subtasks can be executed independently to reduce latency while preserving meaning. We introduce PARALLELPROMPT, the first benchmark for measuring intra-query parallelism in natural user prompts. Our dataset comprises over 37,000 real-world prompts from public LLM chat logs, each annotated with a structured schema capturing task templates, shared context, and iteration inputs. These schemas are extracted using LLM-assisted prompting with rule-based multilingual validation. To evaluate the benefits of decomposition, we provide an execution suite that benchmarks serial vs. parallel strategies, measuring latency, structural adherence, and semantic fidelity. Our results show that intra-query parallelism can be successfully parsed in over 75% of curated datasets, unlocking up to 5x speedups on tasks like translation, comprehension, and comparative analysis, with minimal quality degradation. By releasing this benchmark, curation pipeline, and evaluation suite, we provide the first standardized testbed for studying structure-aware execution in LLM serving pipelines.
LLM 服务系统通常将用户提示作为单体输入,通过解码技巧或交错分批来优化推导。 然而,许多真实世界提示包含潜在的语义平行-分解结构,在这种结构中,子任务可以独立地执行,以减少潜伏,同时保留含义。我们引入了PARALELELLPROMPT,这是测量自然用户提示中隔体平行现象的第一个基准。我们的数据集由来自公共LLM聊天日志的37 000多个真实世界提示组成,每个都配有结构化的模型捕捉任务模板、共享背景和互换投入。这些系统是使用基于基于规则的多语种验证的LLLM协助生成的。为了评估分解的好处,我们提供了一套执行套,以平行战略为基准,衡量延缩、结构坚持和语义忠诚。我们的结果显示,内部平行状态可以成功地在超过75 %的已整理数据集中进行分解,将任务解解到5x速度,例如翻译、理解、对比、比较分析、测试性分析,我们进行这种标准化的升级化质量研究。
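A schema with a task template, shared context, and iteration inputs, together with serial and parallel execution, might look like the sketch below. The field names, the `{context}`/`{item}` slot syntax, and the `call_llm` callable are assumptions made for illustration; the benchmark's actual schema format is not given in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PromptSchema:
    template: str      # task template with {context} and {item} slots
    context: str       # shared context reused by every subtask
    items: List[str]   # iteration inputs, one per independent subtask

    def subprompts(self) -> List[str]:
        """Expand the template into one prompt per iteration input."""
        return [self.template.format(context=self.context, item=item)
                for item in self.items]


def run_serial(schema: PromptSchema, call_llm: Callable[[str], str]) -> List[str]:
    return [call_llm(prompt) for prompt in schema.subprompts()]


def run_parallel(schema: PromptSchema, call_llm: Callable[[str], str],
                 max_workers: int = 8) -> List[str]:
    # Executor.map returns results in input order, so outputs stay aligned
    # with schema.items even though the calls overlap in time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_llm, schema.subprompts()))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return prompt.upper()


schema = PromptSchema(
    template="{context} Translate into French: {item}",
    context="Keep the tone formal.",
    items=["good morning", "thank you", "see you soon"],
)
serial_out = run_serial(schema, fake_llm)
parallel_out = run_parallel(schema, fake_llm)
```

Threads suit this case because real model calls are I/O-bound; the latency gain depends on how many subtasks the serving backend can handle concurrently.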
Article 29
Title@2025-06-26 (4): Towards an Optimal Control Perspective of ResNet Training
Title: Towards an Optimal Control Perspective of ResNet Training | Auf dem Weg zu einer optimalen Steuerungsperspektive der ResNet-Schulung | 建立ResNet培训最佳控制视角 2506.21453v1 |
Authors (4): Jens Püttschneider, Simon Heilig, Asja Fischer, Timm Faulwasser
We propose a training formulation for ResNets reflecting an optimal control problem that is applicable to standard architectures and general loss functions. We suggest bridging both worlds by penalizing intermediate outputs of hidden states, which correspond to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent skip connections and the output layer. We demonstrate that our training dynamics bias the weights of unnecessary deeper residual layers toward zero. This indicates the potential for a theory-grounded layer pruning strategy.
我们建议为ResNets制定一个培训方案,反映适用于标准架构和一般损失功能的最佳控制问题。我们建议通过惩罚隐藏国家的中间输出来弥补这两个世界,与最佳控制阶段的成本相对应。对于标准ResNets,我们通过随后的跳过连接和输出层来传播国家,从而获得中间输出。我们证明我们的培训动态偏向了不必要的更深残余层的消散权重。这表明了理论基础层调整战略的潜力。
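The idea of adding stage costs on intermediate outputs can be illustrated with a scalar toy ResNet. This is a sketch under assumptions: the `tanh` residual branch, the linear output layer, the squared-error loss, and the `stage_weight` factor are all illustrative choices, not the paper's formulation. An intermediate state is passed only through the skip connections (the identity here) and then the output layer.

```python
import math
from typing import List


def forward_states(x0: float, weights: List[float]) -> List[float]:
    """Hidden states of a scalar ResNet: x_{k+1} = x_k + w_k * tanh(x_k)."""
    states = [x0]
    for w in weights:
        x = states[-1]
        states.append(x + w * math.tanh(x))  # skip connection plus residual branch
    return states


def stage_cost_loss(x0: float, target: float, weights: List[float],
                    out_scale: float = 1.0, stage_weight: float = 0.1) -> float:
    """Terminal loss plus weighted stage costs on the intermediate outputs."""
    states = forward_states(x0, weights)

    def output(x: float) -> float:
        # Skipping the remaining residual branches leaves the state unchanged,
        # so the intermediate output is the output layer applied to x.
        return out_scale * x

    def loss(y: float) -> float:
        return (y - target) ** 2

    terminal = loss(output(states[-1]))
    stage = sum(loss(output(x)) for x in states[1:-1])
    return terminal + stage_weight * stage


example = stage_cost_loss(x0=0.5, target=1.0, weights=[0.8, 0.3, 0.0])
```

Because every intermediate output is pushed toward the target, layers after the point where the target is already met gain nothing from a nonzero residual branch, which matches the abstract's observation that unnecessary deeper layers are driven toward zero weights.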
Article 30
Title@2025-06-26 (4): A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario
Title: A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario | Ein umfassender Datensatz für die Untertage-Miner-Erkennung in unterschiedlichen Szenarien | 不同情景下地下矿工探测综合数据集 2506.21451v1 |
Authors (4): Cyrus Addy, Ajay Kumar Gurumadaiah, Yixiang Gao, Kwame Awuah-Offei
Underground mining operations face significant safety challenges that make emergency response capabilities crucial. While robots have shown promise in assisting with search and rescue operations, their effectiveness depends on reliable miner detection capabilities. Deep learning algorithms offer potential solutions for automated miner detection, but require comprehensive training datasets, which are currently lacking for underground mining environments. This paper presents a novel thermal imaging dataset specifically designed to enable the development and validation of miner detection systems for potential emergency applications. We systematically captured thermal imagery of various mining activities and scenarios to create a robust foundation for detection algorithms. To establish baseline performance metrics, we evaluated several state-of-the-art object detection algorithms including YOLOv8, YOLOv10, YOLO11, and RT-DETR on our dataset. While not exhaustive of all possible emergency situations, this dataset serves as a crucial first step toward developing reliable thermal-based miner detection systems that could eventually be deployed in real emergency scenarios. This work demonstrates the feasibility of using thermal imaging for miner detection and establishes a foundation for future research in this critical safety application.
地下采矿作业面临巨大的安全挑战,使应急反应能力变得至关重要。虽然机器人在协助搜索和救援行动方面表现出希望,但其效力取决于可靠的矿工探测能力。深层学习算法为自动探测矿工提供了潜在的解决办法,但需要全面的训练数据集,目前地下采矿环境尚缺乏这些数据集。本文介绍了专门设计用于开发和验证矿工探测系统以用于潜在紧急应用的新型热成像数据集。我们系统收集了各种采矿活动的热成像和情景,以便为探测算法奠定坚实的基础。为建立基线性能指标,我们评估了几个最先进的物体探测算法,包括YOLOv8、YOLOv10、YOLOO11和在我们的数据集上的RT-DETR。虽然这一数据集并非详尽无遗所有可能的紧急情况,但作为开发可靠的热基探雷系统以最终可在实际紧急情况下部署的关键性的第一步。这项工作表明,使用热成能探测探雷工的可行性,并为这一关键安全应用的未来研究奠定基础。
Article 31
Title@2025-06-26 (4): Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform
Title: Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform | Lernbare adaptive Zeit-Frequenz-Darstellung über differenzierbare Kurzzeit Fourier-Transformation | 通过有区别的短时四轮式变换,通过有区别的短时四轮式变换, 2506.21440v1 |
Authors (5): Maxime Leiber, Yosra Marnissi, Axel Barrau, Sylvain Meignen, Laurent Massoulié
The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the limitations of traditional STFT parameter tuning methods, which often rely on computationally intensive discrete searches. It enables fine-tuning of the time-frequency representation (TFR) based on any desired criterion. Moreover, our approach integrates seamlessly with neural networks, allowing joint optimization of the STFT parameters and network weights. The efficacy of the proposed differentiable STFT in enhancing TFRs and improving performance in downstream tasks is demonstrated through experiments on both simulated and real-world data.
短期的Fourier变换法(STFT)被广泛用于分析非静止信号,但其性能对其参数非常敏感,人工或超速调整往往产生不理想的结果。为了克服这一限制,我们提议对STFT采用统一的可区分的公式,以便能够根据梯度优化其参数。这一方法解决传统的STFT参数调制方法的局限性,这些方法往往依靠计算密集的离散搜索。它能够根据任何需要的标准对时频表示法进行微调。此外,我们的方法与神经网络无缝地结合,使STFT参数和网络重量能够联合优化。拟议的可区分的STFT在加强TRS和改进下游任务绩效方面的功效通过模拟数据和现实世界数据的实验得到证明。
Article 32
Title@2025-06-26 (4): New Bounds for Sparse Variational Gaussian Processes
Title: New Bounds for Sparse Variational Gaussian Processes | Neue Grenzen für Sparse Variational Gaussian Prozesse | 偏偏多高斯进程的新界口 2502.08730v2 |
Authors (1): Michalis K. Titsias
Sparse variational Gaussian processes (GPs) construct tractable posterior approximations to GP models. At the core of these methods is the assumption that the true posterior distribution over training function values ${\bf f}$ and inducing variables ${\bf u}$ is approximated by a variational distribution that incorporates the conditional GP prior $p({\bf f} | {\bf u})$ in its factorization. While this assumption is considered fundamental, we show that for model training we can relax it through the use of a more general variational distribution $q({\bf f} | {\bf u})$ that depends on $N$ extra parameters, where $N$ is the number of training examples. In GP regression, we can analytically optimize the evidence lower bound over the extra parameters and express a tractable collapsed bound that is tighter than the previous bound. The new bound is also amenable to stochastic optimization and its implementation requires minor modifications to existing sparse GP code. Further, we also describe extensions to non-Gaussian likelihoods. On several datasets we demonstrate that our method can reduce bias when learning the hyperparameters and can lead to better predictive performance.
稀疏变分高斯过程(GP)为GP模型构建易处理的后验近似。这些方法的核心假设是:训练函数值 ${\bf f}$ 和诱导变量 ${\bf u}$ 上的真实后验分布由一个变分分布来近似,该变分分布在其因子分解中包含条件GP先验 $p({\bf f} | {\bf u})$。虽然这一假设被视为基本假设,但我们表明,在模型训练中可以通过使用更一般的变分分布 $q({\bf f} | {\bf u})$ 来放宽它,该分布依赖于 $N$ 个额外参数,其中 $N$ 是训练样本的数量。在GP回归中,我们可以对额外参数解析地优化证据下界,并得到一个比先前界更紧的易处理的坍缩界。新的界同样适用于随机优化,其实现只需对现有稀疏GP代码做少量修改。此外,我们还描述了对非高斯似然的扩展。在多个数据集上,我们证明该方法在学习超参数时可以减少偏差,并带来更好的预测性能。
Article 33
Title@2025-06-26 (4): Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations
Title: Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations | Erklärbarkeit großer Sprachmodelle mit SMILE: Statistische Modell-agnostische Interpretierbarkeit mit lokalen Erklärungen | 使用SMILE解释大语言模型的可解释性:统计模型 – – 与当地解释的可解释性 2505.21657v3 |
Authors (4): Zeinab Dehghani, Mohammed Naveed Akram, Koorosh Aslansefat, Adil Khan
Large language models like GPT, LLAMA, and Claude have become incredibly powerful at generating text, but they are still black boxes, so it is hard to understand how they decide what to say. That lack of transparency can be problematic, especially in fields where trust and accountability matter. To help with this, we introduce SMILE, a new method that explains how these models respond to different parts of a prompt. SMILE is model-agnostic and works by slightly changing the input, measuring how the output changes, and then highlighting which words had the most impact. It then creates simple visual heat maps showing which parts of a prompt matter the most. We tested SMILE on several leading LLMs and used metrics such as accuracy, consistency, stability, and fidelity to show that it gives clear and reliable explanations. By making these models easier to understand, SMILE brings us one step closer to making AI more transparent and trustworthy.
GPT、LLAMA和Claude等大型语言模型在生成文本方面已经变得非常强大,但它们仍然是黑盒,所以很难理解它们如何决定要说什么。缺乏透明度可能会有问题,特别是在信任和问责很重要的领域。为了对此有所帮助,我们引入了SMILE,这是一个解释这些模型如何对快速的不同部分作出反应的新方法。SMILE是模型的不可知性,通过略微改变输入、测量产出变化,然后突出哪些词具有最大影响来开展工作。创建简单的直观热图,显示一个最迅速的事物的哪些部分。我们用几个主要的LLMMMS测试了SMILE,并使用了精确性、一致性、稳定性和忠诚性等指标来表明它提供了清晰可靠的解释。通过让这些模型更容易理解,SMILE让我们更接近于使AI更加透明和可信。
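The perturb-and-measure loop described above can be sketched as follows. This is a simplified illustration, not SMILE itself: the abstract does not specify the perturbation scheme or the distance measure, so the sketch assumes leave-one-word-out deletion and a Jaccard token distance, and `toy_model` is a hypothetical stand-in for an LLM.

```python
from typing import Callable, List, Tuple


def jaccard_distance(a: str, b: str) -> float:
    """1 - |intersection| / |union| over whitespace tokens."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def word_importance(prompt: str,
                    model: Callable[[str], str]) -> List[Tuple[str, float]]:
    """Score each word by how much the output changes when it is removed."""
    words = prompt.split()
    baseline = model(prompt)
    scores = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        scores.append((word, jaccard_distance(baseline, model(perturbed))))
    return scores


def toy_model(prompt: str) -> str:
    """Hypothetical model whose answer depends on a single keyword."""
    return "warm sunny" if "summer" in prompt.split() else "cold snowy"


scores = word_importance("describe the summer weather", toy_model)
```

The per-word scores are the values a heat map would color; a fuller implementation would sample many perturbations and fit a local surrogate rather than deleting one word at a time.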
Article 34
Title@2025-06-26 (4): Graph Neural Network for Neutrino Physics Event Reconstruction
Title: Graph Neural Network for Neutrino Physics Event Reconstruction | Graph Neural Netzwerk für Neutrino Physik Ereignis Rekonstruktion | 中子物理事件重建神经网络 2403.11872v2 |
Authors (9): V Hewes, Adam Aurisano, Giuseppe Cerati, Jim Kowalkowski, Claire Lee, Wei-keng Liao, Daniel Grzenda, Kaushal Gumpula, Xiaohe Zhang
Liquid Argon Time Projection Chamber (LArTPC) detector technology offers a wealth of high-resolution information on particle interactions, and leveraging that information to its full potential requires sophisticated automated reconstruction techniques. This article describes NuGraph2, a Graph Neural Network (GNN) for low-level reconstruction of simulated neutrino interactions in a LArTPC detector. Simulated neutrino interactions in the MicroBooNE detector geometry are described as heterogeneous graphs, with energy depositions on each detector plane forming nodes on planar subgraphs. The network utilizes a multi-head attention message-passing mechanism to perform background filtering and semantic labelling on these graph nodes, identifying those associated with the primary physics interaction with 98.0% efficiency and labelling them according to particle type with 94.9% efficiency. The network operates directly on detector observables across multiple 2D representations, but utilizes a 3D-context-aware mechanism to encourage consistency between these representations. Model inference takes 0.12 s/event on a CPU, and 0.005 s/event batched on a GPU. This architecture is designed to be a general-purpose solution for particle reconstruction in neutrino physics, with the potential for deployment across a broad range of detector technologies, and offers a core convolution engine that can be leveraged for a variety of tasks beyond the two described in this article.
液晶射时投影室(LARTPC)探测器技术提供了大量关于粒子相互作用的高分辨率信息,并充分利用这些信息来充分发挥其潜力,这就需要先进的自动重建技术。本篇文章描述了Nugraph2,一个用于在LarTPC探测器中低层次重建模拟中微微子相互作用的图形神经网络。MicroBooNE探测器的模拟中微子相互作用被描述为多维图形,但利用3D-text-aware机制鼓励这些表示的一致性。模型在规划子图上形成节点。网络利用多头注意信息传递机制在这些图形节点上进行背景过滤和语义标签,查明与初级物理相互作用相关的98.0效率的图像神经网络(GNNN),并按粒子类型贴标签。这个网络直接在多个2D示意图的探测器上进行观测,但利用3D-text-aware机制鼓励这些表达的一致性。模型在CPU上采用0.12-eve/event 模型,以及0.005s/event mission 标签标记这些图形/eval标签标签标签标签标签标签标签,用于在GPPPO 范围的常规部署中进行两次勘测算。
Article 35
Title@2025-06-26 (4): The Sample Complexity of Learning Lipschitz Operators with respect to Gaussian Measures
Title: The Sample Complexity of Learning Lipschitz Operators with respect to Gaussian Measures | Die Probenkomplexität von Lipschitz-Betreibern in Bezug auf Gaussische Maßnahmen | Gaussian措施方面学习利普施茨经营者的抽样复杂性 2410.23440v3 |
Authors (3): Ben Adcock, Michael Griebel, Gregor Maier
Operator learning, the approximation of mappings between infinite-dimensional function spaces using machine learning, has gained increasing research attention in recent years. Approximate operators, learned from data, can serve as efficient surrogate models for problems in computational science and engineering, complementing traditional methods. However, despite their empirical success, our understanding of the underlying mathematical theory is in large part still incomplete. In this paper, we study the approximation of Lipschitz operators with respect to Gaussian measures. We prove higher Gaussian Sobolev regularity of Lipschitz operators and establish lower and upper bounds on the Hermite polynomial approximation error. We then study general reconstruction strategies of Lipschitz operators from $m$ arbitrary (potentially adaptive) linear samples. As a key finding, we tightly characterize the corresponding sample complexity, that is, the smallest achievable worst-case error among all possible choices of (adaptive) sampling and reconstruction strategies in terms of $m$. As a consequence, we identify an inherent curse of sample complexity: No method to approximate Lipschitz operators based on $m$ linear samples can achieve algebraic convergence rates in $m$. On the positive side, we prove that a sufficiently fast spectral decay of the covariance operator of the underlying Gaussian measure guarantees convergence rates which are arbitrarily close to any algebraic rate. Overall, by tightly characterizing the sample complexity, our work confirms the intrinsic difficulty of learning Lipschitz operators, regardless of the data or learning technique.
操作员学习,即利用机器学习的无限功能空间间测图的近似值,近年来引起了越来越多的研究关注。从数据学的近似操作员可以作为计算科学和工程问题的有效替代模型,补充传统方法。然而,尽管他们取得了经验上的成功,但我们对基本数学理论的理解在很大程度上仍然不完全。在本文中,我们研究了利普西茨操作员对高斯测量方法的近似值。我们证明利普西茨操作员的常规性较高,并在赫尔米特多诺米近似误差上和下限。我们从数据学中学习了利普西茨操作员的一般重建战略,从任意(潜在的适应性)直线性样本中取出。我们仔细分析相应的抽样复杂程度,即所有可能选择(适应性)取样和重建战略中最小的最坏错误。结果是,我们发现了样本复杂性的内在诅咒:任何基于 $m$ 个线性样本来近似利普西茨操作员的方法,都无法达到关于 $m$ 的代数收敛速率。从积极的方面看,我们证明,底层高斯测度的协方差算子若具有足够快的谱衰减,即可保证收敛速率任意接近任何代数速率。总之,通过严格刻画样本复杂性,我们的工作证实了学习利普西茨操作员的内在困难,无论采用何种数据或学习技术。
Article 36
Title@2025-06-26 (4): Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort
Title: Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort | Deception Detection in dyadischen Austauschen mit multimodalem maschinellem Lernen: Eine Studie über eine schwedische Kohorte | 利用多式机器学习的多式机器交流中的欺骗感检测:瑞典教区研究 2506.21429v1 |
Authors (4): Franco Rugolon, Thomas Jack Samuels, Stephan Hau, Lennart Högman
This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions, focusing on the integration of data from both the deceiver and the deceived. We compare early and late fusion approaches, utilizing audio and video data - specifically, Action Units and gaze information - across all possible combinations of modalities and participants. Our dataset, newly collected from Swedish native speakers engaged in truth or lie scenarios on emotionally relevant topics, serves as the basis for our analysis. The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches. Moreover, including data from both participants significantly enhances deception detection accuracy, with the best performance (71%) achieved using a late fusion strategy applied to both modalities and participants. These findings align with psychological theories suggesting differential control of facial and vocal expressions during initial interactions. As the first study of its kind on a Scandinavian cohort, this research lays the groundwork for future investigations into dyadic interactions, particularly within psychotherapy settings.
这项研究调查了使用多式机器学习技术探测dyadic互动中欺骗行为的效率,重点是将欺骗者和被欺骗者的数据结合起来。我们比较早期和后期融合方法,在各种模式和参与者的各种可能的组合中,使用视听数据 – – 特别是行动股和凝视信息 – – 对所有可能的组合方式和参与者进行对比。我们从瑞典本地语言者中新收集的数据集是了解真相的,或根据情感相关主题的假想收集的数据集,作为我们分析的基础。结果显示,与单一模式方法相比,将言语和面部信息结合起来可以产生优异的性能。此外,包括两个参与者提供的数据,大大提高了欺骗检测的准确性,使用既适用于模式又适用于参与者的晚融合战略取得的最佳性能(71%)。这些研究结果与心理理论一致,表明在初始互动期间对面部和声音表达方式有不同的控制。作为对斯堪的斯堪的纳维亚人组的首次研究,这种研究为今后对dyadic互动进行调查奠定了基础,特别是在精神疗法环境中。
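The difference between early and late fusion can be made concrete with a small sketch. The study's classifiers and combination rule are not given in the abstract, so this assumes the common conventions: early fusion concatenates feature vectors before a single classifier, and late fusion takes a weighted average of per-source probabilities.

```python
from typing import List, Optional, Sequence


def early_fusion(feature_blocks: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-modality (or per-participant) features into one vector."""
    return [value for block in feature_blocks for value in block]


def late_fusion(probabilities: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-source deception probabilities by a weighted average."""
    if not probabilities:
        raise ValueError("need at least one probability")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have equal length")
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)


# Hypothetical inputs: audio and video features from deceiver and deceived.
fused_features = early_fusion([[0.1, 0.4], [0.7], [0.2, 0.9]])
fused_probability = late_fusion([0.8, 0.6])
```

With late fusion each modality and participant keeps its own model, so a weak source can be down-weighted without retraining the others.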
Article 37
Title@2025-06-26 (4): Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
Title: Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Flow-based Single-Step-Abschluss für effizientes und expressives politisches Lernen | 以流动为基础的单一步骤完成高效和明确政策学习 2506.21427v1 |
Authors (2): Prajwal Koirala, Cody Fleming
Generative models such as diffusion and flow-matching offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions, but their iterative sampling introduces high inference costs and training instability due to gradient propagation across sampling steps. We propose the Single-Step Completion Policy (SSCP), a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples, enabling accurate, one-shot action generation. In an off-policy actor-critic framework, SSCP combines the expressiveness of generative models with the training and inference efficiency of unimodal policies, without requiring long backpropagation chains. Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability over diffusion-based baselines. We further extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference. SSCP achieves strong results across standard offline RL and behavior cloning benchmarks, positioning it as a versatile, expressive, and efficient framework for deep RL and sequential decision-making.
推广和流程匹配等生成模型,通过捕捉丰富的多式联运行动分布,为离线强化学习提供表达式政策(RL),通过捕获富集的多式联运行动分布,但其迭代抽样采样带来了高推价和培训不稳定性,因为跨采样步骤的梯度传播。我们提议了\textit{Sing-Single-Step Forpulation Policy}(SSCP),这是经过强化流程匹配目标培训的基因化政策,以预测中间流样本的直接完成矢量,从而能够准确、一分球行动生成。在一个离政策性行为者-批评框架内,SSCP将基因化模型的表达性与单式政策的培训性和推论效率结合起来,而无需长长的后向推进链。我们的方法尺度可以有效地实现离线、离线到离线到在线和在线的RL环境,在对基于扩散的基线的速度和适应性方面带来巨大收益。我们进一步扩展了SSCP,使平板政策能够在没有明确的等级推论的情况下利用子目标目标性目标性目标性矢量结构。SSCP在标准离离线下和行为克隆基准之间取得了强有力的结果,将它定位定位为一个可操作性、连续和深度决策。
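One reading of "completion vectors" is sketched below, assuming the standard linear flow-matching path $x_t = (1-t)\,x_0 + t\,x_1$ between a noise sample $x_0$ and a target action $x_1$. Under that assumption the completion target from an intermediate sample is $x_1 - x_t$, so a single addition recovers the action. The function names are illustrative and the paper's exact objective may differ.

```python
from typing import List

Vector = List[float]


def interpolate(noise: Vector, action: Vector, t: float) -> Vector:
    """Intermediate flow sample x_t = (1 - t) * x_0 + t * x_1."""
    return [(1.0 - t) * a + t * b for a, b in zip(noise, action)]


def velocity_target(noise: Vector, action: Vector) -> Vector:
    """Standard flow-matching regression target: x_1 - x_0."""
    return [b - a for a, b in zip(noise, action)]


def completion_target(x_t: Vector, action: Vector) -> Vector:
    """Assumed completion target: the remaining displacement x_1 - x_t."""
    return [b - a for a, b in zip(x_t, action)]


def one_shot_action(x_t: Vector, completion: Vector) -> Vector:
    """Recover the action in a single step: x_t + completion."""
    return [a + c for a, c in zip(x_t, completion)]


noise, action, t = [0.0, 0.0], [1.0, -2.0], 0.25
x_t = interpolate(noise, action, t)
completion = completion_target(x_t, action)
recovered = one_shot_action(x_t, completion)
```

On the linear path the completion vector equals $(1-t)$ times the velocity, so a network that predicts it directly avoids integrating the velocity field over many steps.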
Article 38
Title@2025-06-26 (4): TracLLM: A Generic Framework for Attributing Long Context LLMs
Title: TracLLM: A Generic Framework for Attributing Long Context LLMs | TracLLM: Ein generisches Rahmenwerk für die Zuweisung von LLMs mit langem Kontext | TracLLM: 长期LMLM授标通用框架 2506.04202v3 |
Authors (4): Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia
Long context large language models (LLMs) are deployed in many real-world applications such as RAG, agents, and broad LLM-integrated applications. Given an instruction and a long context (e.g., documents, PDF files, webpages), a long context LLM can generate an output grounded in the provided context, aiming to provide more accurate, up-to-date, and verifiable outputs while reducing hallucinations and unsupported claims. This raises a research question: how to pinpoint the texts (e.g., sentences, passages, or paragraphs) in the context that contribute most to or are responsible for the output generated by an LLM? This process, which we call context traceback, has various real-world applications, such as 1) debugging LLM-based systems, 2) conducting post-attack forensic analysis for attacks (e.g., prompt injection attacks, knowledge corruption attacks) on an LLM, and 3) highlighting knowledge sources to enhance users' trust in outputs generated by LLMs. When applied to context traceback for long context LLMs, existing feature attribution methods such as Shapley have sub-optimal performance and/or incur a large computational cost. In this work, we develop TracLLM, the first generic context traceback framework tailored to long context LLMs. Our framework can improve the effectiveness and efficiency of existing feature attribution methods. To improve efficiency, we develop an informed-search-based algorithm in TracLLM. We also develop contribution score ensemble/denoising techniques to improve the accuracy of TracLLM. Our evaluation results show that TracLLM can effectively identify texts in a long context that lead to the output of an LLM. Our code and data are at: https://github.com/Wang-Yanting/TracLLM.
长背景长语言模型(LLMS)用于许多现实世界应用程序,如RAG、代理和广 LLM综合应用程序。鉴于一个指令和长背景(例如文件、PDF文件、网页),长背景的LLM可以产生基于提供环境的产出,目的是提供更准确、最新和可核查的产出,同时减少幻觉和不支持的主张。这引起了一个研究问题:如何在最有助于或对LLMM生成产出负最大责任的文本(例如判决、段落或段落)中定位文本(例如,句子、段落或段落)?这个进程,我们称之为背景的LLMM综合应用程序?这个进程(我们称之为背景的追踪)有各种真实世界应用程序,例如1)调试基于LM系统的系统;2)对攻击(例如迅速注入攻击、知识攻击)进行事后法证分析,同时减少幻觉和不支持的主张。这引起了一个研究问题:如何在LMLM/CLM的上下文追踪时,现有的特征分析方法,例如Shaply-opimalus的次背景, 业绩/OLLLLLLLM框架可以改进我们现有的成本框架。
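The abstract only says that TracLLM uses an informed-search-based algorithm, so the following is one plausible divide-and-conquer reading rather than the authors' method: repeatedly halve groups of texts, score each group, and keep the top-scoring groups until single texts remain. The `score` callable is a hypothetical contribution score, for example one derived from a feature attribution method.

```python
from typing import Callable, List, Sequence


def traceback(texts: Sequence[str],
              score: Callable[[List[str]], float],
              k: int = 1) -> List[int]:
    """Return indices of up to k texts found by halving and pruning groups."""
    groups: List[List[int]] = [list(range(len(texts)))]
    while any(len(group) > 1 for group in groups):
        halves: List[List[int]] = []
        for group in groups:
            if len(group) == 1:
                halves.append(group)
            else:
                mid = len(group) // 2
                halves.extend([group[:mid], group[mid:]])
        # Keep only the k groups that contribute most to the output.
        halves.sort(key=lambda g: score([texts[i] for i in g]), reverse=True)
        groups = halves[:k]
    return sorted(i for group in groups for i in group)


def toy_score(group: List[str]) -> float:
    """Hypothetical score: how many texts in the group mention the answer."""
    return float(sum("Paris" in text for text in group))


context = ["Intro text.", "A recipe.", "The capital is Paris.", "Weather.", "Sports."]
found = traceback(context, toy_score, k=1)
```

Scoring groups instead of individual texts keeps the number of score evaluations roughly logarithmic in the context length for fixed `k`, which is the efficiency argument for a search of this kind.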
Article 39
Title@2025-06-26 (4): Continual Learning as Computationally Constrained Reinforcement Learning
Title: Continual Learning as Computationally Constrained Reinforcement Learning | Kontinuierliches Lernen als Computationally Constrained Reinforcement Learning | 持续学习作为计算限制的训练强化学习 2307.04345v3 |
Authors (7): Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, Benjamin Van Roy
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.
长期有效积累知识以发展日益复杂的技能的代理人可以推进人工智能能力的前沿,这种代理人的设计仍然是人工智能的长期挑战,通过持续学习的主题来处理,该专著澄清并正式确定持续学习的概念,引入一个框架和一套工具来刺激进一步的研究。
Article 40
Title@2025-06-26 (4): Improving Stochastic Cubic Newton with Momentum
Title: Improving Stochastic Cubic Newton with Momentum | Verbesserung der stochastischen Kubik Newton mit Momentum | 快速改善斯托卡立方立方牛顿 2410.19644v2 |
Authors (3): El Mahdi Chayti, Nikita Doikov, Martin Jaggi
We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton’s method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.
我们研究了解决一般非convex优化问题的第二顺序方法。 我们提议在牛顿方法中使用特殊版本的动力来稳定随机梯度和黑森估计值。 我们表明,这种动力可以改善随机估计值的差异,并允许方法为任何噪声水平趋同。 使用立方正法技术,我们证明我们处理一般非convex问题的方法的全球趋同率为第二顺序固定点,即使每次迭代只使用单一的随机数据样本。 这与所有现有的非convex问题第二顺序方法截然不同,通常需要大批量。 因此,我们首先展示了非convex情况下任意大小的批次在Stochacistic Cubic Newton中的全球趋同率。 此外,我们展示了我们正常化的牛顿方法的convex随机问题速度加快。
Article 41
Title@2025-06-26 (4): Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional
Title: Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional | Aktionsminimierung trifft auf generative Modellierung: Effizientes Transition Path Sampling mit der Onsager-Machlup Funktion | 行动最优化符合产生模型的生成模型:与Onsager-Machlup 职能进行高效率过渡道路抽样 2504.18506v3 |
Authors (6): Sanjeev Raja, Martin Šípka, Michael Psenka, Tobias Kreiman, Michal Pavelka, Aditi S. Krishnapriyan
Transition path sampling (TPS), which involves finding probable paths connecting two points on an energy landscape, remains a challenge due to the complexity of real-world atomistic systems. Current machine learning approaches use expensive, task-specific, and data-free training procedures, limiting their ability to benefit from high-quality datasets and large-scale pre-trained models. In this work, we address TPS by interpreting candidate paths as trajectories sampled from stochastic dynamics induced by the learned score function of pre-trained generative models, specifically denoising diffusion and flow matching. Under these dynamics, finding high-likelihood transition paths becomes equivalent to minimizing the Onsager-Machlup (OM) action functional. This enables us to repurpose pre-trained generative models for TPS in a zero-shot manner, in contrast with bespoke, task-specific approaches in previous work. We demonstrate our approach on varied molecular systems, obtaining diverse, physically realistic transition pathways and generalizing beyond the pre-trained model’s original training dataset. Our method can be easily incorporated into new generative models, making it practically relevant as models continue to scale and improve with increased data availability. Code is available at github.com/ASK-Berkeley/OM-TPS.
过渡路径取样(TPS)涉及寻找连接能源景观上两个点的可能的路径,由于现实世界的原子系统的复杂性,这仍然是一个挑战。当前机器学习方法使用昂贵、针对具体任务和无数据的培训程序,限制了它们受益于高质量数据集和大规模预先培训模型的能力。在这项工作中,我们通过将候选路径解释为从经培训前基因模型的学识分数函数所引导的随机动态中抽取的轨迹来处理TPS,特别是分解传播和流动匹配。在这些动态下,找到高相似的过渡路径可以等同于最大限度地减少Onsager-Machlup(OM)行动功能。这使我们能够以零发式的方式重新使用事先经过培训的TPS基因化模型,而不是先前工作中的说辞式、特定任务的方法。我们展示了我们对于各种分子系统的做法,获得多样化的、实际现实的过渡路径,以及超越了经过培训前模型的原始培训数据集。我们的方法可以很容易地纳入新的基因化模型中,使其随着模型规模的扩大和数据可用性的提高而不断改进,具有实际意义。代码可在 github.com/ASK-Berkeley/OM-TPS 获取。
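A discretized Onsager-Machlup action can be written down for a toy system. The sketch assumes one-dimensional overdamped dynamics $dx = f(x)\,dt + \sqrt{2D}\,dW$, for which the leading term of the action is $\sum_i \frac{\Delta t}{4D}\big(\frac{x_{i+1}-x_i}{\Delta t} - f(x_i)\big)^2$; the divergence correction is omitted. In the paper the drift comes from a pre-trained generative model's learned score, whereas here `force` is a hand-written double-well drift used only for illustration.

```python
from typing import Callable, List


def force(x: float) -> float:
    """Drift f(x) = -U'(x) for the toy double well U(x) = (x**2 - 1)**2."""
    return -4.0 * x * (x * x - 1.0)


def om_action(path: List[float], dt: float, diffusion: float,
              drift: Callable[[float], float] = force) -> float:
    """Sum of dt / (4 D) * (velocity - drift)**2 over consecutive path points."""
    total = 0.0
    for x_now, x_next in zip(path, path[1:]):
        velocity = (x_next - x_now) / dt
        total += dt / (4.0 * diffusion) * (velocity - drift(x_now)) ** 2
    return total


def straight_path(start: float, end: float, steps: int) -> List[float]:
    """Evenly spaced points from start to end, endpoints included."""
    return [start + (end - start) * i / steps for i in range(steps + 1)]


def drift_path(start: float, dt: float, steps: int) -> List[float]:
    """Noise-free Euler trajectory, which the action should not penalize."""
    path = [start]
    for _ in range(steps):
        path.append(path[-1] + dt * force(path[-1]))
    return path


barrier_crossing = om_action(straight_path(-1.0, 1.0, 100), dt=0.02, diffusion=1.0)
relaxation = om_action(drift_path(0.5, dt=0.01, steps=50), dt=0.01, diffusion=1.0)
```

A path that simply follows the drift has essentially zero action, while a path that climbs over the barrier between the wells pays a positive cost; minimizing the action over paths with fixed endpoints is what yields high-likelihood transition paths.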
Article 42
Title@2025-06-26 (4): Distributed Cross-Channel Hierarchical Aggregation for Foundation Models
Title: Distributed Cross-Channel Hierarchical Aggregation for Foundation Models | Verteilte Cross-Channel Hierarchische Aggregation für Stiftungsmodelle | 基金会模型分布式跨河道分道分道分道分道分道分道分道分道分道分道分道分道 2506.21411v1 |
Authors (9): Aristeidis Tsaris, Isaac Lyngaas, John Lagregren, Mohamed Wahib, Larry York, Prasanna Balaprakash, Dan Lu, Feiyi Wang, Xiao Wang
Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources such as varying physical groundings or data acquisition systems and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-intensive, a challenge not fully addressed by current distributed methods. In this work, we introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach designed for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of vision transformer architecture, significantly improving computational efficiency. We evaluated D-CHAG on hyperspectral imaging and weather forecasting tasks. When integrated with tensor parallelism and model sharding, our approach achieved up to a 75% reduction in memory usage and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier Supercomputer.
基于愿景的科学基础模型对推进科学发现和创新有着重大希望,其潜力来自它们利用变压器结构将各种来源的图像汇总起来的能力,例如各种物理地面定位或数据采集系统,以及学习时空关系的能力。然而,象征性图像和集成图像可以计算为密集,而目前分布的方法并未充分解决这一挑战。在这项工作中,我们引入了为不同图像模式众多渠道的数据集设计的分布式跨通道分道分层分层分层聚合(D-CHAG)方法。我们的方法与任何模型平行战略和任何类型的视觉变异器结构相兼容,大大提高了计算效率。我们评估了超光谱成像和天气预报任务方面的D-CHAG。当与超光谱和模型分解相结合时,我们的方法在记忆用量方面减少了75%,在前沿超级计算机上超过1 024 AMD GPUs的通过量翻了一番。
Article 43
Title@2025-06-26 (4): Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference
Title: Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference | Skalierbare Bayesische Low-Rank-Anpassung von großen Sprachmodellen über stochastische Variations-Subraum-Inferenz | 通过Stochastic变异性子空间推断,对大语言模型进行可缩放的Bayesian低Rank 2506.21408v1 |
Authors (5): Colin Samplawski, Adam D. Cobb, Manoj Acharya, Ramneet Kaur, Susmit Jha
Despite their widespread use, large language models (LLMs) are known to hallucinate incorrect information and be poorly calibrated. This makes the uncertainty quantification of these models of critical importance, especially in high-stakes domains, such as autonomy and healthcare. Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference over the low-rank adaptation (LoRA) parameters of a fine-tuned model. While effective, these approaches struggle to scale to larger LLMs because they require additional parameters beyond those of LoRA. In this work we present $\textbf{Scala}$ble $\textbf{B}$ayesian $\textbf{L}$ow-Rank Adaptation via Stochastic Variational Subspace Inference (ScalaBL). We perform Bayesian inference in an $r$-dimensional subspace, for LoRA rank $r$. By repurposing the LoRA parameters as projection matrices, we are able to map samples from this subspace into the full weight space of the LLM. This allows us to learn all the parameters of our approach using stochastic variational inference. Despite the low dimensionality of our subspace, we are able to achieve competitive performance with state-of-the-art approaches while only requiring ${\sim}1000$ additional parameters. Furthermore, it allows us to scale up to the largest Bayesian LLM to date, with four times as many base parameters as prior work.
尽管使用广泛,但大型语言模型(LLM)已知会产生错误信息的幻觉,而且校准较差。这使得对这些模型进行不确定性量化至关重要,特别是在自主系统和医疗保健等高风险领域。先前的工作通过对微调模型的低秩适应(LoRA)参数进行推断,使基于贝叶斯深度学习的方法更易于处理这一问题。这些方法虽然有效,但由于比LoRA需要更多的额外参数,难以扩展到更大的LLM。在这项工作中,我们提出了通过随机变分子空间推断实现的可扩展贝叶斯低秩适应(ScalaBL)。我们在$r$维子空间中进行贝叶斯推断,其中$r$为LoRA的秩。通过将LoRA参数重新用作投影矩阵,我们能够将该子空间中的样本映射到LLM的完整权重空间。这使我们能够使用随机变分推断来学习方法中的所有参数。尽管子空间维度很低,我们仍能取得与最先进方法相当的性能,而只需约1000个额外参数。此外,它使我们能够扩展到迄今为止最大的贝叶斯LLM,其基础参数数量是先前工作的四倍。
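The subspace idea in the abstract above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the projection `B @ diag(s) @ A`, the diagonal Gaussian over `s`, and all names (`sample_lora_delta`, `mu`, `log_sigma`) are assumptions chosen only to show how a sample from an r-dimensional distribution could be mapped into a full weight update through the LoRA factors.

```python
import numpy as np

def sample_lora_delta(B, A, mu, log_sigma, rng):
    """Draw one weight update from a Gaussian over an r-dimensional subspace."""
    # Reparameterized sample s = mu + sigma * eps, with eps ~ N(0, I_r).
    eps = rng.standard_normal(mu.shape)
    s = mu + np.exp(log_sigma) * eps
    # Assumed projection: scale the r rank-one components of B @ A by s,
    # which equals B @ diag(s) @ A.
    return (B * s) @ A, s

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 2
B = rng.standard_normal((d_out, r))   # LoRA "up" factor (hypothetical values)
A = rng.standard_normal((r, d_in))    # LoRA "down" factor (hypothetical values)
mu = np.ones(r)                       # variational mean
log_sigma = np.full(r, -2.0)          # variational log standard deviation
delta_w, s = sample_lora_delta(B, A, mu, log_sigma, rng)
n_variational = mu.size + log_sigma.size  # only 2r numbers describe the posterior
```

Under these assumptions the number of variational parameters per adapted layer grows with the rank `r` rather than with the layer size, which is consistent with the small parameter overhead the abstract reports.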
Article 44
Title@2025-06-26 (4): Early Stopping Tabular In-Context Learning
Title: Early Stopping Tabular In-Context Learning | Frühzeitiges Stoppen des tabellarischen In-Context-Lernens | 早期停止制表列表内容学习 2506.21387v1 |
Authors (3): Jaris Küken, Lennart Purucker, Frank Hutter
Tabular foundation models have shown strong performance across various tabular learning tasks via in-context learning, offering robust generalization without any downstream finetuning. However, their inference-time costs remain high, particularly for larger datasets. To address this, we propose early-stopping the in-context learning process. We achieve this by dynamically evaluating whether to stop in-context learning after each Transformer encoder layer. Once stopped, we decode the embedding using a pre-trained layer-wise decoder. Experiments across 34 small classification tasks show that early stopping in-context learning accelerates inference by up to x1.3 with negligible degradation in predictive performance. To assess scalability, we further evaluate our method on five larger classification tasks, achieving speedups of up to x2.2. Our results demonstrate the potential of early exiting as an effective and practical strategy for improving the efficiency of tabular in-context learning.
制表基础模型显示,在各种表格式学习任务中,通过理论内学习取得了良好的业绩,提供了强有力的概括性,而没有下游的微调。然而,它们的推论时间成本仍然很高,特别是对于较大的数据集而言。为此,我们提议尽早停止理论内学习过程。我们通过动态评估在每个变异器编码器层之后是否停止文字内学习来实现这一目标。一旦停止,我们就使用预先训练的分层解码器解码嵌入。34个小分类任务规模的实验表明,早期停止理论内学习加速了推导速度,最高为x1.3,预测性性表现可忽略不小。为了评估可扩展性,我们进一步评估了我们关于五个较大分类任务的方法,实现x2.2的加速。我们的结果表明,早期退出有可能成为提高文字内表学习效率的有效实用战略。
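The early-exit loop described above can be sketched as follows. The exit rule (every query row reaches a confidence threshold) is a hypothetical stand-in, since the abstract does not state the stopping criterion, and the toy `layers` and `decoders` are placeholders for Transformer encoder layers and pre-trained layer-wise decoders.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def early_exit_forward(x, layers, decoders, threshold=0.9):
    """Run layers in order and stop once the layer-wise decoder is confident."""
    h, probs, depth = x, None, 0
    for depth, (layer, decoder) in enumerate(zip(layers, decoders), start=1):
        h = layer(h)
        probs = softmax(decoder(h))
        # Hypothetical exit rule: every query row has max probability >= threshold.
        if probs.max(axis=-1).min() >= threshold:
            break
    return probs, depth

# Toy stand-ins: each "layer" doubles the embedding, each "decoder" reads it as logits.
layers = [lambda h: 2.0 * h] * 4
decoders = [lambda h: h] * 4
probs, depth = early_exit_forward(np.array([[1.0, 0.0]]), layers, decoders)
```

In this toy setup the confidence after the first layer is about 0.88 and after the second about 0.98, so the loop exits at depth 2 and skips the remaining two layers.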
Article 45
Title@2025-06-26 (4): Representation Learning of Lab Values via Masked AutoEncoders
Title: Representation Learning of Lab Values via Masked AutoEncoders | Darstellung Lernen von Laborwerten über Maskierte AutoEncoder | 通过蒙面自动编码器学习实验室价值 2501.02648v3 |
Authors (8): David Restrepo, Chenwei Wu, Yueran Jia, Jaden K. Sun, Jack Gallifant, Catherine G. Bielick, Yugang Jia, Leo A. Celi
Accurate imputation of missing laboratory values in electronic health records (EHRs) is critical to enable robust clinical predictions and reduce biases in AI systems in healthcare. Existing methods, such as XGBoost, softimpute, GAIN, Expectation Maximization (EM), and MICE, struggle to model the complex temporal and contextual dependencies in EHR data, particularly in underrepresented groups. In this work, we propose Lab-MAE, a novel transformer-based masked autoencoder framework that leverages self-supervised learning for the imputation of continuous sequential lab values. Lab-MAE introduces a structured encoding scheme that jointly models laboratory test values and their corresponding timestamps, enabling explicit capture of temporal dependencies. Empirical evaluation on the MIMIC-IV dataset demonstrates that Lab-MAE significantly outperforms state-of-the-art baselines such as XGBoost, softimpute, GAIN, EM, and MICE across multiple metrics, including root mean square error (RMSE), R-squared (R2), and Wasserstein distance (WD). Notably, Lab-MAE achieves equitable performance across demographic groups of patients, advancing fairness in clinical predictions. We further investigate the role of follow-up laboratory values as potential shortcut features, revealing Lab-MAE’s robustness in scenarios where such data is unavailable. The findings suggest that our transformer-based architecture, adapted to the characteristics of EHR data, offers a foundation model for more accurate and fair clinical imputation. In addition, we measure and compare the carbon footprint of Lab-MAE with that of an XGBoost model, highlighting its environmental requirements.
对电子健康记录(EHRs)中缺失的实验室值进行精确插补至关重要,这有助于在保健中进行稳健的临床预测和减少AI系统的偏差。现有的方法,如XGBoost、softimpute、GAIN、期望最大化(EM)和MICE,难以模拟EHR数据中复杂的时间和背景依赖性,特别是在代表人数不足的群体中。在这项工作中,我们提出Lab-MAE,一个基于Transformer的掩码自动编码器框架,利用自我监督的学习来插补连续的序列实验室值。Lab-MAE引入一个结构化的编码方案,共同模拟实验室测试值及其相应的时间戳,从而能够明确捕捉时间依赖性。对MIMIC-IV数据集的实证评估表明,Lab-MAE在均方根误差(RMSE)、R平方(R2)和Wasserstein距离(WD)等多项指标上显著优于XGBoost、softimpute、GAIN、EM和MICE等最先进的基线。值得注意的是,Lab-MAE在不同人口群体的患者中取得了公平的表现,推进了临床预测的公平性。我们进一步研究了后续实验室值作为潜在捷径特征的作用,揭示了Lab-MAE在此类数据不可用的情况下的稳健性。研究结果表明,我们基于Transformer、适应EHR数据特点的架构为更准确、更公平的临床插补提供了基础模型。此外,我们测量了Lab-MAE的碳足迹,并与XGBoost模型进行了比较,突出了其环境需求。
Article 46
Title@2025-06-26 (4): Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection
Title: Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection | Temporal-Aware Graph Aufmerksamkeit Netzwerk für Kryptowährung Transaktion Betrugserkennung | 加密货币交易欺诈侦查实时警示图关注网络 2506.21382v1 |
Authors (3): Zhi Zheng, Bochuan Zhou, Yuping Song
Cryptocurrency transaction fraud detection faces the dual challenges of increasingly complex transaction patterns and severe class imbalance. Traditional methods rely on manual feature engineering and struggle to capture temporal and structural dependencies in transaction networks. This paper proposes an Augmented Temporal-aware Graph Attention Network (ATGAT) that enhances detection performance through three modules: (1) designing an advanced temporal embedding module that fuses multi-scale time difference features with periodic position encoding; (2) constructing a temporal-aware triple attention mechanism that jointly optimizes structural, temporal, and global context attention; (3) employing weighted BCE loss to address class imbalance. Experiments on the Elliptic++ cryptocurrency dataset demonstrate that ATGAT achieves an AUC of 0.9130, representing a 9.2% improvement over the best traditional method XGBoost, 12.0% over GCN, and 10.0% over standard GAT. This method not only validates the enhancement effect of temporal awareness and triple attention mechanisms on graph neural networks, but also provides financial institutions with more reliable fraud detection tools, with its design principles generalizable to other temporal graph anomaly detection tasks.
加密货币交易欺诈的探测面临日益复杂的交易模式和严重的阶级不平衡的双重挑战。传统方法依靠人工特征工程和努力捕捉交易网络的时间和结构依赖性。本文件建议建立一个强化的时空觉图形关注网络(ATGAT),通过三个模块提高检测性能:(1) 设计一个先进的时间嵌入模块,将多重时间差异特征与定期位置编码结合起来;(2) 建立一个时间觉觉的三重关注机制,共同优化结构、时间和全球范围的注意力;(3) 使用加权的BCE损失解决阶级不平衡问题。对 Ellipitic+密码货币数据集的实验表明,AUC为0.9130,比最佳的传统方法XGBoost提高了9.2%,比GCN提高了12.0%,比标准GAT高出10.0%。这种方法不仅验证了时间意识的增强效应和对图形神经网络的三重关注机制,而且还为金融机构提供了更可靠的欺诈检测工具,其设计原则可概括到其他时钟异常检测任务。
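The third module above, weighted BCE loss for class imbalance, can be sketched in a few lines. The weighting rule used here (positive weight equal to the negative-to-positive ratio) is a common convention and an assumption; the abstract does not say how ATGAT sets its weights.

```python
import numpy as np

def weighted_bce(y_true, p_pred, pos_weight, eps=1e-12):
    """Binary cross-entropy with the positive (fraud) class up-weighted."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(pos_weight * y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return loss.mean()

y = np.array([1.0, 0.0, 0.0, 0.0])      # one fraudulent transaction in four
p = np.full(4, 0.5)                     # an uninformative classifier
pos_weight = (y == 0).sum() / (y == 1).sum()   # assumed rule: n_neg / n_pos = 3
loss_weighted = weighted_bce(y, p, pos_weight)
loss_plain = weighted_bce(y, p, 1.0)
```

With this choice the single positive example contributes as much to the loss as the three negatives combined, so a model cannot lower the loss simply by ignoring the rare class.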
Article 47
Title@2025-06-26 (4): HARPT: A Corpus for Analyzing Consumers’ Trust and Privacy Concerns in Mobile Health Apps
Title: HARPT: A Corpus for Analyzing Consumers’ Trust and Privacy Concerns in Mobile Health Apps | HARPT: Ein Corpus für die Analyse des Vertrauens und der Datenschutzbelange der Verbraucher in mobilen Gesundheits-Apps | HARPT: 分析移动保健应用程序中消费者信任和隐私问题的一个公司 2506.19268v2 |
Authors (6): Timoteo Kelly, Abdulkadir Korkmaz, Samuel Mallet, Connor Souders, Sadra Aliakbarpour, Praveen Rao
We present HARPT, a large-scale annotated corpus of mobile health app store reviews aimed at advancing research in user privacy and trust. The dataset comprises over 480,000 user reviews labeled into seven categories that capture critical aspects of trust in applications, trust in providers and privacy concerns. Creating HARPT required addressing multiple complexities, such as defining a nuanced label schema, isolating relevant content from large volumes of noisy data, and designing an annotation strategy that balanced scalability with accuracy. This strategy integrated rule-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers to accelerate coverage. In parallel, a carefully curated subset of 7,000 reviews was manually annotated to support model development and evaluation. We benchmark a broad range of classification models, demonstrating that strong performance is achievable and providing a baseline for future research. HARPT is released as a public resource to support work in health informatics, cybersecurity, and natural language processing.
我们提出了大规模、附加说明的移动健康应用商店审查,旨在推进用户隐私和信任方面的研究。数据集由48万多个用户审查组成,分为七个类别,涵盖对应用程序的信任、对供应商的信任和对隐私的关切等关键方面。建立流动健康应用审查需要处理多种复杂问题,例如界定细微标签模式,将相关内容与大量吵杂数据隔离开来,以及设计平衡可调适性与准确性的批注战略。这一战略整合了基于规则的过滤、迭代人工标签与审查、有针对性的数据扩增,以及使用基于变压器的分类器进行薄弱的监督,以加快覆盖范围。与此同时,经过仔细整理的7 000项审查被手工附加说明,以支持模式的开发和评估。我们为一系列广泛的分类模型设定基准,表明强有力的业绩是可以实现的,并为今后的研究提供了基准。HARPT作为公共资源发布,以支持卫生信息学、网络和自然语言处理方面的工作。
Article 48
Title@2025-06-26 (4): Pay Attention to Small Weights
Title: Pay Attention to Small Weights | Achten Sie auf kleine Gewichte | 关注小体重 2506.21374v1 |
Authors (4): Chao Zhou, Tom Jacobs, Advait Gadhikar, Rebekka Burkholz
Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free – the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; third, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
据知,在内存和计算成本方面,对受过训练的大型神经网络进行微调都是资源密集型的。为了缓解这一点,一个共同的方法是将培训限制在一组模型参数之内。通过分析微调期间梯度和重量之间的关系,我们观察到一个显著的模式:大梯度往往与小磁度重量有关。这种相关性在微调设置方面比从零开始的培训更为明显。我们提出,NANOADAM在这项观察中只动态地更新微量重量,并提供若干实际的优势:首先,这一标准是无梯度的,参数子可以在不计算梯度的情况下确定;第二,它保留大梯度重量,这有可能将培训前所学的关键特征编码起来,从而降低灾难性遗忘的风险;第三,它允许使用更大的学习率,并不断提高实验的普及性能。我们为NLP和视觉任务演示了这一点。
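A minimal sketch of the selection rule described above: pick the smallest-magnitude weights without looking at gradients, then apply an Adam step only to them. The fraction, the hyperparameters, and the function names are assumptions for illustration, and the sketch omits whatever schedule NANOADAM uses to refresh the mask dynamically.

```python
import numpy as np

def small_weight_mask(w, fraction):
    """Boolean mask selecting the `fraction` of entries with smallest |w|."""
    k = max(1, int(round(fraction * w.size)))
    # Indices of the k smallest magnitudes; no gradient is needed for this step.
    idx = np.argpartition(np.abs(w).ravel(), k - 1)[:k]
    mask = np.zeros(w.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(w.shape)

def masked_adam_step(w, grad, m, v, t, mask, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update applied only where mask is True; other weights are frozen."""
    m = np.where(mask, b1 * m + (1 - b1) * grad, m)
    v = np.where(mask, b2 * v + (1 - b2) * grad**2, v)
    m_hat = m / (1 - b1**t)          # bias-corrected first moment
    v_hat = v / (1 - b2**t)          # bias-corrected second moment
    step = lr * m_hat / (np.sqrt(v_hat) + eps)
    return np.where(mask, w - step, w), m, v

w = np.array([0.01, -2.0, 0.05, 3.0])
mask = small_weight_mask(w, fraction=0.5)
w_new, m, v = masked_adam_step(w, np.ones(4), np.zeros(4), np.zeros(4), t=1, mask=mask)
```

In a real optimizer the moment buffers would be allocated only for the selected entries; storing them densely here keeps the sketch short.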
Article 49
Title@2025-06-26 (4): Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application
Title: Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application | Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung | 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v2 |
Authors (4): Xiucheng Wang, Honggang Jia, Nan Cheng, Dusit Niyato
In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed to enhance robustness against both channel noise and transmission data distribution shifts. A theoretical foundation is established using stochastic differential equations (SDEs), from which a closed-form mapping between any signal-to-noise ratio (SNR) and the optimal denoising timestep is derived. Moreover, to address distribution mismatch, a mathematical scaling method is introduced to align received semantic features with the training distribution of the GAI. Built on this theoretical foundation, a latent diffusion model (LDM)-based semantic communication framework is proposed that combines a variational autoencoder for semantic feature extraction with a pretrained diffusion model for denoising. The proposed system is a training-free framework that supports zero-shot generalization, and achieves superior performance under low-SNR and out-of-distribution conditions, offering a scalable and robust solution for future 6G semantic communication systems. Experimental results demonstrate that the proposed semantic communication framework achieves state-of-the-art performance in both pixel-level accuracy and semantic perceptual quality, consistently outperforming baselines across a wide range of SNRs and data distributions without any fine-tuning or post-training.
本文提出一个新的语义通信框架,通过基因人工智能(GAI)增强对频道噪音和传输数据分布变化的稳健性能; 利用随机差分方程(SDEs)建立理论基础,从中得出任何信号对噪音比率(SNR)和最佳分流时间步骤之间的封闭式映射; 此外,为解决分布不匹配问题,还采用了数学缩放方法,使接收到的语义特征与GAI的培训分布相匹配。 在这个理论基础上,提议了一个基于潜在传播模型(LDM)的语义通信框架,将用于提取语义特征的变异自动校对仪(SDEs)结合起来,在此过程中,使用一种先入为定的传播模型进行分解。 拟议的系统是一个无培训框架,支持零发全局化,并在低调和分配条件下实现优异性业绩,为未来的6G语义通信系统提供了一个可扩缩和稳健的解决方案。 实验结果显示,拟议的语义通信框架在SMAL级后质量上实现了任何恒定的SIS级质量。
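To make the SNR-to-timestep idea concrete, here is a sketch under a standard assumption rather than the paper's own derivation: for a variance-preserving (DDPM-style) forward process with unit-power features, x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, so the SNR at step t is abar_t / (1 - abar_t). Matching this to the channel SNR gives abar = SNR / (1 + SNR). The linear beta schedule and the function names are hypothetical.

```python
import numpy as np

def alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal coefficient of an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def timestep_for_snr(snr_db, abar):
    """Index t whose forward-process SNR abar/(1-abar) is closest to the channel SNR."""
    snr = 10.0 ** (snr_db / 10.0)
    target = snr / (1.0 + snr)  # solves abar / (1 - abar) = snr
    return int(np.argmin(np.abs(abar - target)))

abar = alpha_bar()
t_0db = timestep_for_snr(0.0, abar)
t_10db = timestep_for_snr(10.0, abar)
t_20db = timestep_for_snr(20.0, abar)
```

A noisier channel maps to a later timestep, so the receiver would start reverse diffusion from a point with more denoising steps ahead of it; a cleaner channel needs fewer.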
Article 50
Title@2025-06-26 (4): MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators
Title: MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators | MAx-DNN: Mehrstufige Arithmetik-Annäherung für energieeffiziente DNN-Hardwarebeschleuniger | MAX-DNN: 能源高效 DNN 硬件加速器的多级自动测量近似法 2506.21371v1 |
Authors (5): Vasileios Leon, Georgios Makris, Sotirios Xydis, Kiamal Pekmestzi, Dimitrios Soudris
Nowadays, the rapid growth of Deep Neural Network (DNN) architectures has established them as the de facto approach for performing advanced Machine Learning tasks with excellent accuracy. Targeting low-power DNN computing, this paper examines the interplay of fine-grained error resilience of DNN workloads in collaboration with hardware approximation techniques, to achieve higher levels of energy efficiency. Utilizing the state-of-the-art ROUP approximate multipliers, we systematically explore their fine-grained distribution across the network according to our layer-, filter-, and kernel-level approaches, and examine their impact on accuracy and energy. We use the ResNet-8 model on the CIFAR-10 dataset to evaluate our approximations. The proposed solution delivers up to 54% energy gains in exchange for up to 4% accuracy loss, compared to the baseline quantized model, while it provides 2x energy gains with better accuracy versus the state-of-the-art DNN approximations.
目前,深神经网络(DNN)结构的快速增长将它们确立为提供精密的先进机器学习任务的实际方法。 以低功率的DNN计算为目标,本文件审查了DNN工作量微微差错应力与硬件近似技术的相互作用,以实现更高的能效水平。 利用最先进的ROUP近似乘数,我们系统地根据我们的层、过滤和内核层面方法探索其在网络中的细差分布,并研究其对准确性和能源的影响。 我们使用CIFAR-10数据集的ResNet-8模型来评估我们的近似值。 与基线四分位模型相比,拟议的解决方案提供了高达54%的能源增益,以换取高达4%的精度损失,同时它提供了比最先进的DNN近率更精确的2x能源增益。
Article 51
Title@2025-06-26 (4): rQdia: Regularizing Q-Value Distributions With Image Augmentation
Title: rQdia: Regularizing Q-Value Distributions With Image Augmentation | rQdia: Regularisieren der Q-Value-Distributionen mit Bildvergrößerung | rQdia: 以图像放大方式规范 Q- 价值发行 2506.21367v1 |
Authors (2): Sam Lerman, Jing Bi
rQdia regularizes Q-value distributions with augmented images in pixel-based deep reinforcement learning. With a simple auxiliary loss that equalizes these distributions via MSE, rQdia boosts DrQ and SAC on 9/12 and 10/12 tasks respectively in the MuJoCo Continuous Control Suite from pixels, and Data-Efficient Rainbow on 18/26 Atari Arcade environments. Gains are measured in both sample efficiency and longer-term training. Moreover, the addition of rQdia finally propels model-free continuous control from pixels over the state encoding baseline.
rQdia 将Q值分布规范化,在像素强化深层学习中增加图像。由于简单的辅助损失,这些分布通过MSE实现平衡,rQdia分别在9/12和10/12任务中将DrQ和SAC提升到MuJoCo连续控制套件中,18/26 Atari 街机环境中的数据有效彩虹中,在18/26 Atari 街机环境中分别实施DrQ和10/12任务,在样本效率和长期培训中衡量收益。此外,在州编码基线上添加了rQdia最终推进器,从像素上实现无模型连续控制。
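The auxiliary loss described above can be sketched as an MSE between the Q-values of an observation and of its augmented copy. This is a simplified reading of the abstract: the linear `q_fn`, the toy observations, and the exact form of the "Q-value distribution" are assumptions.

```python
import numpy as np

def rqdia_aux_loss(q_fn, obs, aug_obs):
    """MSE between Q-values of original and augmented observations."""
    q = q_fn(obs)          # shape: (batch, num_actions)
    q_aug = q_fn(aug_obs)  # same network, augmented input
    return np.mean((q - q_aug) ** 2)

# Toy Q-function: a fixed linear map from 2 features to 2 action values.
W = np.eye(2)
q_fn = lambda o: o @ W
obs = np.array([[1.0, 2.0]])
loss_same = rqdia_aux_loss(q_fn, obs, obs)                       # identical inputs
loss_shift = rqdia_aux_loss(q_fn, obs, np.array([[2.0, 2.0]]))   # perturbed input
```

In training, a term like this would be added to the usual RL loss so that the critic assigns similar values to an image and its augmentation.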
Article 52
Title@2025-06-26 (4): SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning
Title: SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | SMMILE: Ein sachverständiger Benchmark für multimodales medizinisches In-Context-Lernen | SMMILE:多模式医学内书学习专家开发基准 2506.21355v1 |
Authors (12): Melanie Rieff, Maya Varma, Ossian Rabow, Subathra Adithan, Julie Kim, Ken Chang, Hannah Lee, Nidhi Rohatgi, Christian Bluethgen, Mohamed S. Muneer, Jean-Benoit Delbrouck, Michael Moor
Multimodal in-context learning (ICL) remains underexplored despite significant potential for domains such as medicine. Clinicians routinely encounter diverse, specialized tasks requiring adaptation from limited examples, such as drawing insights from a few relevant prior cases or considering a constrained set of differential diagnoses. While multimodal large language models (MLLMs) have shown advances in medical visual question answering (VQA), their ability to learn multimodal tasks from context is largely unknown. We introduce SMMILE, the first expert-driven multimodal ICL benchmark for medical tasks. Eleven medical experts curated problems, each including a multimodal query and multimodal in-context examples as task demonstrations. SMMILE encompasses 111 problems (517 question-image-answer triplets) covering 6 medical specialties and 13 imaging modalities. We further introduce SMMILE++, an augmented variant with 1038 permuted problems. A comprehensive evaluation of 15 MLLMs demonstrates that most models exhibit moderate to poor multimodal ICL ability in medical tasks. In open-ended evaluations, ICL contributes only 8% average improvement over zero-shot on SMMILE and 9.4% on SMMILE++. We observe a susceptibility to irrelevant in-context examples: even a single noisy or irrelevant example can degrade performance by up to 9.5%. Moreover, example ordering exhibits a recency bias, i.e., placing the most relevant example last can lead to substantial performance improvements by up to 71%. Our findings highlight critical limitations and biases in current MLLMs when learning multimodal medical tasks from context.
尽管在医学等领域具有巨大潜力,多模态上下文学习(ICL)仍未得到充分探索。临床医生经常遇到各种各样的专门任务,需要根据有限的例子加以调整,例如从以前的少数相关病例中汲取见解,或考虑一组有限的鉴别诊断。虽然多模态大型语言模型(MLLM)在医学视觉问答(VQA)方面取得了进展,但它们从上下文中学习多模态任务的能力在很大程度上仍是未知的。我们引入了SMMILE,这是第一个由专家驱动的医学任务多模态ICL基准。11名医学专家整理了问题,每个问题都包括一个多模态查询和作为任务示范的多模态上下文示例。SMMILE包含111个问题(517个问题-图像-答案三元组),涵盖6个医学专业和13种成像模式。我们进一步引入了SMMILE++,这是一个包含1038个置换问题的扩充版本。对15个MLLM的综合评估表明,大多数模型在医学任务中的多模态ICL能力为中等至较差。在开放式评估中,ICL相对于零样本在SMMILE上平均仅提升8%,在SMMILE++上平均仅提升9.4%。我们观察到模型易受不相关上下文示例的影响:即使是一个带噪声或不相关的示例,也可使性能下降多达9.5%。此外,示例顺序表现出近因偏差,即把最相关的示例放在最后可使性能大幅提升多达71%。我们的研究结果凸显了当前MLLM在从上下文中学习多模态医学任务时的关键局限和偏差。
Article 53
Title@2025-06-26 (4): Lipschitz Bounds for Persistent Laplacian Eigenvalues under One-Simplex Insertions
Title: Lipschitz Bounds for Persistent Laplacian Eigenvalues under One-Simplex Insertions | Lipschitz Bounds für persistente Laplacian Eigenwerte unter One-Simplex-Insertionen | 在单质插入下用于持久性拉板电极值的 Lipschitz Bounds 2506.21352v1 |
Authors (3): Le Vu Anh, Mehmet Dik, Nguyen Viet Anh
Persistent Laplacians are matrix operators that track how the shape and structure of data transform across scales and are popularly adopted in biology, physics, and machine learning. Their eigenvalues are concise descriptors of geometric and topological features in a filtration. Although earlier work established global algebraic stability for these operators, the precise change in a single eigenvalue when one simplex, such as a vertex, edge, or triangle, is added has remained unknown. This is important because downstream tools, including heat-kernel signatures and spectral neural networks, depend directly on these eigenvalues. We close this gap by proving a uniform Lipschitz bound: after inserting one simplex, every up-persistent Laplacian eigenvalue can vary by at most twice the Euclidean norm of that simplex’s boundary, independent of filtration scale and complex size. This result delivers the first eigenvalue-level robustness guarantee for spectral topological data analysis. It guarantees that spectral features remain stable under local updates and enables reliable error control in dynamic data settings.
持久性的拉普拉梯是跟踪数据形状和结构如何在尺度上变化的矩阵操作员,并被生物、物理和机器学习广泛采用。它们的叶质值是过滤中几何和地貌特征的简明描述符。虽然早先的工作为这些操作员确立了全球代数稳定性,但是当一个简单的x,例如顶点、边缘或三角体,添加一个单项电子值的精确变化仍然未知。这很重要,因为下游工具,包括热内核信号和光谱神经网络,直接依赖这些电子值。我们通过证明一个统一的利普西茨约束来缩小这一差距:在插入一个简单x之后,每个高持久性的拉普拉皮奇亚基因值最多可以变化两倍于该简单x边界的欧克利德兰规范,独立于过滤尺度和复杂尺寸。这为光谱表层数据分析提供了第一个精度水平的稳健性保证。它保证光谱特征在当地更新中保持稳定,并且能够在动态数据设置中进行可靠的错误控制。
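The bound stated above can be sanity-checked numerically in the simplest setting. The sketch below uses the ordinary graph Laplacian L = B B^T (a non-persistent, dimension-0 analogue, not the persistent Laplacian treated in the paper) and inserts a single edge, whose boundary v - u has Euclidean norm sqrt(2). The example graph and helper name are illustrative assumptions.

```python
import numpy as np

def graph_laplacian(n, edges):
    """L = B B^T for an oriented vertex-edge incidence matrix B."""
    B = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):
        B[u, j], B[v, j] = -1.0, 1.0
    return B @ B.T

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]   # path graph on 5 vertices
new_edge = (0, 4)                          # inserting it closes the path into a cycle
before = np.linalg.eigvalsh(graph_laplacian(n, edges))
after = np.linalg.eigvalsh(graph_laplacian(n, edges + [new_edge]))

boundary_norm = np.sqrt(2.0)               # the boundary of an edge is v - u
max_shift = np.max(np.abs(after - before)) # eigvalsh returns sorted eigenvalues
bound = 2.0 * boundary_norm                # "twice the Euclidean norm of the boundary"
```

Here the insertion is a rank-one positive semidefinite update, so Weyl's inequality already limits each sorted eigenvalue's shift to at most 2, comfortably inside the bound of about 2.83. The paper's contribution is a guarantee of this kind for up-persistent Laplacians across filtration scales.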
Article 54
Title@2025-06-26 (4): On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
Title: On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory | Über die Fähigkeit tiefer Netzwerke, Symmetrien aus Daten zu lernen: Eine neurale Kerneltheorie | 深网络从数据中学习对称的深网络能力:神经核心理论 2412.11521v2 |
Authors (2): Andrea Perin, Stephane Deny
Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds considerable promise for improving predictions in machine learning. In this work, we aim to understand when and how deep networks – with standard architectures trained in a standard, supervised way – learn symmetries from data. Inspired by real-world scenarios, we study a classification paradigm where data symmetries are only partially observed during training: some classes include all transformations of a cyclic group, while others – only a subset. In the infinite-width limit, where kernel analogies apply, we derive a neural kernel theory of symmetry learning. The group-cyclic nature of the dataset allows us to analyze the Gram matrix of neural kernels in the Fourier domain; here we find a simple characterization of the generalization error as a function of class separation (signal) and class-orbit density (noise). This characterization reveals that generalization can only be successful when the local structure of the data prevails over its non-local, symmetry-induced structure, in the kernel space defined by the architecture. We extend our theoretical treatment to any finite group, including non-abelian groups. Our framework also applies to equivariant architectures (e.g., CNNs), and recovers their success in the special case where the architecture matches the inherent symmetry of the data. Empirically, our theory reproduces the generalization failure of finite-width networks (MLP, CNN, ViT) trained on partially observed versions of rotated-MNIST. We conclude that conventional deep networks lack a mechanism to learn symmetries that have not been explicitly embedded in their architecture a priori. Our framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.
对称性(由群作用产生的变换)存在于许多数据集中,利用它们有望显著改进机器学习中的预测。在这项工作中,我们旨在理解深度网络(采用标准架构并以标准的监督方式训练)何时以及如何从数据中学习对称性。受现实世界情景的启发,我们研究一种分类范式,其中数据对称性在训练中只被部分观察到:一些类别包含循环群的所有变换,而另一些类别只包含一个子集。在适用核类比的无限宽度极限下,我们推导出对称性学习的神经核理论。数据集的循环群性质使我们能够在傅里叶域中分析神经核的Gram矩阵;在此我们发现,泛化误差可以简单地刻画为类别间隔(信号)和类轨道密度(噪声)的函数。这一刻画表明,只有在由架构定义的核空间中,数据的局部结构胜过其非局部的、由对称性引起的结构时,泛化才能成功。我们将理论分析推广到任意有限群,包括非阿贝尔群。我们的框架也适用于等变架构(例如CNN),并在架构与数据固有对称性相匹配的特殊情况下重现了它们的成功。在实证方面,我们的理论再现了有限宽度网络(MLP、CNN、ViT)在部分观察到的旋转MNIST版本上训练时的泛化失败。我们的结论是,常规深度网络缺乏一种机制来学习未事先明确嵌入其架构的对称性。我们的框架可以加以扩展,以指导能够从数据中学习对称性的架构和训练程序的设计。
Article 55
Title@2025-06-26 (4): Learning Value of Information towards Joint Communication and Control in 6G V2X
Title: Learning Value of Information towards Joint Communication and Control in 6G V2X | Lernwert von Informationen zur gemeinsamen Kommunikation und Kontrolle in 6G V2X | 6G V2X中面向联合通信与控制的信息价值学习 2505.06978v2 |
Authors (3): Lei Lei, Kan Zheng, Xuemin Shen
As Cellular Vehicle-to-Everything (C-V2X) evolves towards future sixth-generation (6G) networks, Connected Autonomous Vehicles (CAVs) are emerging to become a key application. Leveraging data-driven Machine Learning (ML), especially Deep Reinforcement Learning (DRL), is expected to significantly enhance CAV decision-making in both vehicle control and V2X communication under uncertainty. These two decision-making processes are closely intertwined, with the value of information (VoI) acting as a crucial bridge between them. In this paper, we introduce Sequential Stochastic Decision Process (SSDP) models to define and assess VoI, demonstrating their application in optimizing communication systems for CAVs. Specifically, we formally define the SSDP model and demonstrate that the MDP model is a special case of it. The SSDP model offers a key advantage by explicitly representing the set of information that can enhance decision-making when available. Furthermore, as current research on VoI remains fragmented, we propose a systematic VoI modeling framework grounded in the MDP, Reinforcement Learning (RL) and Optimal Control theories. We define different categories of VoI and discuss their corresponding estimation methods. Finally, we present a structured approach to leverage the various VoI metrics for optimizing the “When”, “What”, and “How” to communicate problems. For this purpose, SSDP models are formulated with VoI-associated reward functions derived from VoI-based optimization objectives. While we use a simple vehicle-following control problem to illustrate the proposed methodology, it holds significant potential to facilitate the joint optimization of stochastic, sequential control and communication decisions in a wide range of networked control systems.
随着蜂窝车联网(C-V2X)向未来的第六代(6G)网络演进,网联自动驾驶车辆(CAV)正在成为一项关键应用。利用数据驱动的机器学习(ML),特别是深度强化学习(DRL),有望显著增强CAV在不确定条件下的车辆控制和V2X通信决策。这两个决策过程密切相关,信息价值(VoI)是它们之间的重要桥梁。在本文中,我们引入序贯随机决策过程(SSDP)模型来定义和评估VoI,并展示其在优化CAV通信系统中的应用。具体而言,我们正式定义了SSDP模型,并证明MDP模型是它的一个特例。SSDP模型的一个关键优势在于,它明确表示了在可用时能够改进决策的信息集合。此外,由于目前对VoI的研究仍然很分散,我们提出了一个以MDP、强化学习(RL)和最优控制理论为基础的系统化VoI建模框架。我们定义了不同类别的VoI,并讨论了相应的估计方法。最后,我们提出一种结构化方法,利用各种VoI指标来优化“何时”、“什么”和“如何”通信的问题。为此,我们构建了SSDP模型,其VoI相关奖励函数由基于VoI的优化目标导出。虽然我们用一个简单的车辆跟随控制问题来说明所提出的方法,但它在促进各类网络化控制系统中随机序贯控制与通信决策的联合优化方面具有巨大潜力。
Article 56
Title@2025-06-26 (4): PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks
Title: PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks | PuriDefense: Randomized Local Implizite Adversarial Purification for Defending Black-Box Query-based Attacks | 防御:保护黑箱质疑式袭击的随机本地秘密对抗性净化 2401.10586v2 |
Authors (6): Ping Guo, Xiang Li, Zhiyuan Yang, Xi Lin, Qingchuan Zhao, Qingfu Zhang
Black-box query-based attacks constitute significant threats to Machine Learning as a Service (MLaaS) systems since they can generate adversarial examples without accessing the target model’s architecture and parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise the test accuracy of non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at a low level of inference cost. These models leverage the local implicit function and rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our proposed purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.
黑盒查询式攻击对机器学习服务系统构成重大威胁,因为这些系统可以产生对抗性例子,而不能利用目标模型的结构和参数;传统的防御机制,如对抗性训练、梯度蒙面和输入转换,要么造成大量的计算成本,要么损害非对抗性投入的测试准确性;为应对这些挑战,我们提议一个高效的防御机制PuriDefence,即PuriDefense,采用随机的补丁净化机制,以低的推论成本,使用轻量净化模型组合,这些模型利用当地的隐含功能,重建自然图像的多重。我们的理论分析表明,这种方法通过将随机性纳入净化,减缓了基于查询的攻击的趋同。关于CIFAR-10和图像网络的广泛实验证实了我们提议的纯净化式防御机制的有效性,表明对基于查询的攻击的稳健性有了显著改善。
Article 57
Title@2025-06-26 (4): Regret Bounds for Robust Online Decision Making
Title: Regret Bounds for Robust Online Decision Making | Bedauern Sie Grenzen für robuste Online-Entscheidungsfindung | 对强有力的在线决策感到遗憾 2504.06820v2 |
Authors (2): Alexander Appel, Vanessa Kosoy
We propose a framework which generalizes “decision making with structured observations” by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner that can be nonoblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits and tabular robust online reinforcement learning. In both cases, we derive regret bounds that improve on the state of the art (except that we do not address computational efficiency).
我们提出一个框架,通过允许稳健(即多值)的模型(即多值)模式,将“决策与结构化观测”概括化。在这个框架内,每个模型将每个决定与一系列概率分布与结果联系起来。自然可以任意(对抗)的方式从这个组合中选择分布,这种分布可能不明显,并取决于过去的历史。由此形成的框架比古典强盗和强化学习更具有普遍性,因为真实性假设变得更弱,更现实。我们随后为这个框架得出一个遗憾界限理论。虽然我们的下界和上界并不紧凑,但足以充分描述权力法学习能力。我们在两个特殊案例中展示了这一理论:强势线性强强和表格式强的在线强化学习。在这两个案例中,我们产生了改善现状的遗憾界限(除非我们不处理计算效率问题 ) 。
Article 58
Title@2025-06-26 (4): DynamicBench: Evaluating Real-Time Report Generation in Large Language Models
Title: DynamicBench: Evaluating Real-Time Report Generation in Large Language Models | DynamicBench: Bewertung der Echtzeit-Berichtserstellung in großen Sprachmodellen | 动态 bench:评价以大语言模式编制实时报告的情况 2506.21343v1 |
Authors (8): Jingyao Li, Hao Sun, Zile Qiao, Yong Jiang, Pengjun Xie, Fei Huang, Hong Xu, Jiaya Jia
Traditional benchmarks for large language models (LLMs) typically rely on static evaluations through storytelling or opinion expression, which fail to capture the dynamic requirements of real-time information processing in contemporary applications. To address this limitation, we present DynamicBench, a benchmark designed to evaluate the proficiency of LLMs in storing and processing up-to-the-minute data. DynamicBench utilizes a dual-path retrieval pipeline, integrating web searches with local report databases. It necessitates domain-specific knowledge, ensuring accurate report generation within specialized fields. By evaluating models in scenarios that either provide or withhold external documents, DynamicBench effectively measures their capability to independently process recent information or leverage contextual enhancements. Additionally, we introduce an advanced report generation system adept at managing dynamic information synthesis. Our experimental results confirm the efficacy of our approach, with our method achieving state-of-the-art performance, surpassing GPT4o in document-free and document-assisted scenarios by 7.0% and 5.8%, respectively. The code and data will be made publicly available.
大型语言模型(LLMS)的传统基准通常依靠静态评价,通过故事或意见表达,无法反映实时信息处理在当代应用中的动态要求。为了应对这一限制,我们介绍“动态邦奇”这一基准,用以评价LMS在储存和处理最新到分钟数据方面的熟练程度。动态邦奇利用双路径检索管道,将网络搜索与当地报告数据库结合起来。它需要具体领域的知识,确保在专门领域生成准确的响应报告。通过在提供或扣留外部文件的假设情景中评估准确的模型,动态邦希有效地衡量其独立处理最新信息或利用背景强化手段的能力。此外,我们引入了能够管理动态信息合成的高级报告生成系统。我们的实验结果证实了我们的方法的有效性,我们的方法实现了最新业绩,在无文件和文件辅助文件的假设中超过了GPT4o,分别实现了7.0%和5.8%的GPT4o。代码和数据将被公布。
Article 59
Title@2025-06-26 (4): AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification
Title: AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification | AGTCNet: Ein graphisch-zeitlicher Ansatz für die Klassifikation der Primärmotorik EEG | AGTCNet: 固定机动图像电子EEG分类的图表-临时方法 2506.21338v1 |
Authors (6): Galvin Brice S. Lim, Brian Godwin S. Lim, Argel A. Bandala, John Anthony C. Jose, Timothy Scott C. Chu, Edwin Sybingco
Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing. Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity across individuals and over time, compounded by EEG hardware constraints. While prior studies have sought to develop robust BCI systems, existing approaches remain ineffective in capturing the intricate spatiotemporal dependencies within multichannel EEG signals. This study addresses this gap by introducing the attentive graph-temporal convolutional network (AGTCNet), a novel graph-temporal model for motor imagery EEG (MI-EEG) classification. Specifically, AGTCNet leverages the topographic configuration of EEG electrodes as an inductive bias and integrates graph convolutional attention network (GCAT) to jointly learn expressive spatiotemporal EEG representations. The proposed model significantly outperformed existing MI-EEG classifiers, achieving state-of-the-art performance while utilizing a compact architecture, underscoring its effectiveness and practicality for BCI deployment. With a 49.87% reduction in model size, 64.65% faster inference time, and shorter input EEG signal, AGTCNet achieved a moving average accuracy of 66.82% for subject-independent classification on the BCI Competition IV Dataset 2a, which further improved to 82.88% when fine-tuned for subject-specific classification. On the EEG Motor Movement/Imagery Dataset, AGTCNet achieved moving average accuracies of 64.14% and 85.22% for 4-class and 2-class subject-independent classifications, respectively, with further improvements to 72.13% and 90.54% for subject-specific classifications.
利用脑电图(EEG)的脑机接口(BCI)技术是一项变革性创新,使运动障碍者能够平等地与环境互动。尽管前景广阔,但由于神经活动在个体之间和随时间变化所固有的复杂性和变异性,再加上EEG硬件的限制,开发跨受试者和跨会话不变的BCI系统仍是一项重大挑战。虽然先前的研究力求开发稳健的BCI系统,但现有方法仍无法有效捕捉多通道EEG信号中错综复杂的时空依赖关系。本研究通过引入注意力图-时间卷积网络(AGTCNet)来弥补这一差距,这是一种用于运动想象EEG(MI-EEG)分类的新型图-时间模型。具体而言,AGTCNet利用EEG电极的空间分布配置作为归纳偏置,并整合图卷积注意力网络(GCAT),以联合学习富有表现力的EEG时空表示。所提出的模型显著优于现有的MI-EEG分类器,在采用紧凑架构的同时实现了最先进的性能,凸显了其在BCI部署中的有效性和实用性。在模型规模减少49.87%、推理时间加快64.65%且输入EEG信号更短的情况下,AGTCNet在BCI Competition IV Dataset 2a上的受试者无关分类中取得了66.82%的移动平均准确率,针对特定受试者微调后进一步提高到82.88%。在EEG Motor Movement/Imagery Dataset上,AGTCNet在4类和2类受试者无关分类中分别取得了64.14%和85.22%的移动平均准确率,针对特定受试者的分类则进一步提高到72.13%和90.54%。
Article 60
Title@2025-06-26 (4): A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis
Title: A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis | Ein skalierbares Quantum-Neural-Netzwerk für annähernde SRBB-basierte Einheitssynthese | 近似基于SRBB的单一合成的可缩放量量子神经网络 2412.03083v2 |
Authors (3): Giacomo Belli, Marco Mordacci, Michele Amoretti
In this work, a scalable quantum neural network is introduced as a means to approximate any unitary evolution through the Standard Recursive Block Basis (SRBB) and, subsequently, redesigned with a number of CNOTs asymptotically reduced by an exponential contribution. This algebraic approach to the problem of unitary synthesis exploits Lie algebras and their topological features to obtain scalable parameterizations of unitary operators. First, the original SRBB-based scalability scheme, already known in the literature only from a theoretical point of view, is reformulated for efficient algorithm implementation and complexity management. Remarkably, 2-qubit operators emerge as a special case outside the original scaling scheme. Furthermore, an algorithm is proposed to reduce the number of CNOTs, thus deriving a new implementable scaling scheme that requires only one layer of approximation. The scalable CNOT-reduced quantum neural network is implemented and its performance is assessed with a variety of different unitary matrices, both sparse and dense, up to 6 qubits via the PennyLane library. The effectiveness of the approximation is measured with different metrics in relation to two optimizers: a gradient-based method and the Nelder-Mead method. The approximate CNOT-reduced SRBB-based synthesis algorithm is also tested on real hardware and compared with other valid approximation and decomposition methods available in the literature.
在这项工作中,引入了一个可伸缩的量子神经网络,作为通过标准递归基础基础(SRBB)来估计任何单元演进的手段,随后重新设计了一些CNOTs, 其间会因指数性贡献而逐渐减少一些CNOTs。这种对统一合成问题的代数方法利用了Lie 代数及其地形特征,以获得单一操作者可伸缩参数。首先,最初的基于SRBB的缩放计划,仅从理论角度出发才在文献中知道,为高效算法实施和复杂管理重新制定了基于SRB的原始缩放计划。值得注意的是,2平方位操作员作为特例出现在原缩放计划之外。此外,还提出了一种算法,以减少CNOTs的数量,从而形成一个新的可执行的缩放计划,只需要一层近似。可缩放的CNOT的量子神经网络得到了实施,其性能通过PennyLane图书馆以各种不同的、稀薄和稠密的单一矩阵进行评估,最高至6平方位。近似值的有效性与现有的NBRBS-CR和最接近性方法测量了。
Article 61
Title@2025-06-26 (4): ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion
Title: ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion | ScaleGNN: Auf dem Weg zu skalierbaren Graphen-Neuralnetzwerken über adaptive High-Order Neighboring Feature Fusion | SASGNN:通过适应性高顺序相邻相邻地貌融合,走向可缩放的图形神经网络 2504.15920v4 |
Authors (8): Xiang Li, Jianpeng Qi, Haobing Liu, Yuan Cao, Guoqing Chao, Zhongying Zhao, Junyu Dong, Yanwei Yu
Graph Neural Networks (GNNs) have demonstrated impressive performance across diverse graph-based tasks by leveraging message passing to capture complex node relationships. However, when applied to large-scale real-world graphs, GNNs face two major challenges: First, it becomes increasingly difficult to ensure both scalability and efficiency, as the repeated aggregation of large neighborhoods leads to significant computational overhead; Second, the over-smoothing problem arises, where excessive or deep propagation makes node representations indistinguishable, severely hindering model expressiveness. To tackle these issues, we propose ScaleGNN, a novel framework that adaptively fuses multi-hop node features for both scalable and effective graph learning. First, we construct per-hop pure neighbor matrices that capture only the exclusive structural information at each hop, avoiding the redundancy of conventional aggregation. Then, an enhanced feature fusion strategy significantly balances low-order and high-order information, preserving both local detail and global correlations without incurring excessive complexity. To further reduce redundancy and over-smoothing, we introduce a Local Contribution Score (LCS)-based masking mechanism to filter out less relevant high-order neighbors, ensuring that only the most meaningful information is aggregated. In addition, learnable sparse constraints selectively integrate multi-hop valuable features, emphasizing the most informative high-order neighbors. Extensive experiments on real-world datasets demonstrate that ScaleGNN consistently outperforms state-of-the-art GNNs in both predictive accuracy and computational efficiency, highlighting its practical value for large-scale graph learning.
图神经网络(GNN)通过利用消息传递捕捉复杂的节点关系,在各种基于图的任务中表现出色。然而,当应用于大规模真实世界图时,GNN面临两大挑战:第一,由于对大型邻域的反复聚合导致大量计算开销,越来越难以同时确保可扩展性和效率;第二,出现过度平滑问题,即过多或过深的传播使节点表示难以区分,严重妨碍模型的表达能力。为了解决这些问题,我们提出ScaleGNN,这是一个新颖的框架,能够自适应地融合多跳节点特征,以实现可扩展且有效的图学习。首先,我们构建每一跳的纯邻居矩阵,仅捕捉每一跳独有的结构信息,避免了常规聚合的冗余。然后,增强的特征融合策略显著平衡了低阶和高阶信息,在不引入过高复杂度的情况下保留了局部细节和全局相关性。为了进一步减少冗余和过度平滑,我们引入了基于局部贡献分数(LCS)的掩码机制,以过滤掉相关性较低的高阶邻居,确保只聚合最有意义的信息。此外,可学习的稀疏约束有选择地整合多跳的有价值特征,突出信息量最大的高阶邻居。在真实世界数据集上的大量实验表明,ScaleGNN在预测准确率和计算效率方面始终优于最先进的GNN,凸显了其在大规模图学习中的实用价值。
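The "per-hop pure neighbor matrices" in the ScaleGNN abstract can be illustrated with a small sketch. It assumes that "pure" means a node pair appears at hop k only if its shortest-path distance is exactly k; the abstract does not give the construction, so this reading, the function name `pure_hop_matrices`, and the dense NumPy representation are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def pure_hop_matrices(adj: np.ndarray, max_hops: int) -> list:
    """Return [P_1, ..., P_K], where P_k[i, j] is True iff the shortest-path
    distance from node i to node j is exactly k (assumed meaning of "pure")."""
    n = adj.shape[0]
    a = adj.astype(int)
    reached = np.eye(n, dtype=bool)    # pairs already seen at a smaller distance
    frontier = np.eye(n, dtype=bool)   # pairs at exactly the current distance
    hops = []
    for _ in range(max_hops):
        # Expand the current frontier by one edge.
        step = (frontier.astype(int) @ a) > 0
        # Keep only pairs not reachable in fewer hops.
        frontier = step & ~reached
        reached |= frontier
        hops.append(frontier.copy())
    return hops

# Path graph 0 - 1 - 2 - 3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
for k, p in enumerate(pure_hop_matrices(adj, 3), start=1):
    print(k, np.flatnonzero(p[0]))  # nodes at exactly k hops from node 0
```

Because each pair is assigned to a single hop, the matrices are disjoint, which is one way to avoid re-aggregating the same neighbors at several depths.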
Article 62
Title@2025-06-26 (4): Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts
Title: Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Latent Prototype Routing: Erzielen einer nahezu perfekten Lastabgleichung in Mixture-of-Experts | 原型原型路由:在混合专家中实现近效果负载平衡 2506.21328v1 |
Authors (1): Jiajie Yang
Mixture-of-Experts (MoE) architectures have emerged as a key strategy for scaling large language models (LLMs) efficiently. However, current MoE systems suffer from severe load imbalance, where only a small subset of experts is consistently activated during training and inference, leading to significant underutilization of model capacity and computational resources. In this work, we revisit expert routing through a clustering perspective and propose Latent Prototype Routing (LPR), a novel routing framework that generalizes existing approaches while promoting balanced expert utilization without compromising downstream performance. Extensive experiments across multiple open-source MoE models – including DeepSeek-V3, Qwen3-MoE, and Mixtral – demonstrate that LPR reduces the Gini coefficient of expert load from 0.70 to 0.035 on average, improves the min-max expert load ratio from 1e-6 to 0.70, achieving near-perfect load balancing.
专家混合结构(Mixture of Experters (MoE)架构已成为有效推广大型语言模型的关键战略,然而,目前的教育部系统承受着严重的负荷不平衡,在培训和推论期间,只有一小部分专家在不断被激活,导致模型能力和计算资源严重利用不足。在这项工作中,我们重新审视专家通过集群视角选择路线,并提议采用Lentant Prototy Routing(LPR)这一新的路线框架,它概括现有方法,同时促进平衡专家利用,同时不损害下游业绩。 多种开放源的教育部模型 – – 包括DeepSeek-V3、Qwen3-MOE和Mixtral – 的广泛实验表明,LPR平均将专家负荷的基尼系数从0.70降至0.035,将最小负载专家负荷比率从1e-6提高到0.70,实现近效负载平衡。
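The two load-balance statistics quoted in the LPR abstract, the Gini coefficient of expert load and the min-max load ratio, can be computed from a vector of per-expert token counts. The sketch below uses the standard sorted-rank formula for the Gini coefficient; the function names and the example counts are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gini(load) -> float:
    """Gini coefficient of a non-negative load vector (0 means perfectly balanced)."""
    x = np.sort(np.asarray(load, dtype=float))
    n = x.size
    total = x.sum()
    if total == 0:
        return 0.0  # convention: an all-zero load is treated as balanced
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1) @ x / (n * total))

def min_max_ratio(load) -> float:
    """Ratio of the least-loaded to the most-loaded expert (1 means balanced)."""
    x = np.asarray(load, dtype=float)
    if x.max() == 0:
        return 1.0  # convention: an all-zero load is treated as balanced
    return float(x.min() / x.max())

# Hypothetical token counts routed to 4 experts
skewed = [0, 0, 0, 1000]
even = [250, 250, 250, 250]
print(gini(skewed), min_max_ratio(skewed))
print(gini(even), min_max_ratio(even))
```

A Gini coefficient near 0 together with a min-max ratio near 1 is what "near-perfect load balancing" corresponds to under these definitions.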
Article 63
Title@2025-06-26 (4): Stochastic Quantum Spiking Neural Networks with Quantum Memory and Local Learning
Title: Stochastic Quantum Spiking Neural Networks with Quantum Memory and Local Learning | Stochastische Quantum-Spiking-Neuralnetzwerke mit Quantengedächtnis und lokalem Lernen | 具有量子内存和本地学习的实测量量谱剖析神经网络 2506.21324v1 |
Authors (3): Jiechen Chen, Bipin Rajendran, Osvaldo Simeone
Neuromorphic and quantum computing have recently emerged as promising paradigms for advancing artificial intelligence, each offering complementary strengths. Neuromorphic systems built on spiking neurons excel at processing time-series data efficiently through sparse, event-driven computation, consuming energy only upon input events. Quantum computing, on the other hand, leverages superposition and entanglement to explore feature spaces that are exponentially large in the number of qubits. Hybrid approaches combining these paradigms have begun to show potential, but existing quantum spiking models have important limitations. Notably, prior quantum spiking neuron implementations rely on classical memory mechanisms on single qubits, requiring repeated measurements to estimate firing probabilities, and they use conventional backpropagation on classical simulators for training. Here we propose a stochastic quantum spiking (SQS) neuron model that addresses these challenges. The SQS neuron uses multi-qubit quantum circuits to realize a spiking unit with internal quantum memory, enabling event-driven probabilistic spike generation in a single shot. Furthermore, we outline how networks of SQS neurons – dubbed SQS neural networks (SQSNNs) – can be trained via a hardware-friendly local learning rule, eliminating the need for global classical backpropagation. The proposed SQSNN model fuses the time-series efficiency of neuromorphic computing with the exponentially large inner state space of quantum computing, paving the way for quantum spiking neural networks that are modular, scalable, and trainable on quantum hardware.
神经形态计算和量子计算最近成为推进人工智能的有前景的范式,二者各具互补优势。基于脉冲神经元的神经形态系统擅长通过稀疏的事件驱动计算高效处理时间序列数据,仅在输入事件发生时消耗能量。另一方面,量子计算利用叠加和纠缠来探索规模随量子比特数呈指数增长的特征空间。结合这些范式的混合方法已开始显示出潜力,但现有的量子脉冲模型存在重要局限。值得注意的是,先前的量子脉冲神经元实现依赖单量子比特上的经典记忆机制,需要重复测量来估计发放概率,并且使用经典模拟器上的常规反向传播进行训练。在此,我们提出一种应对这些挑战的随机量子脉冲(SQS)神经元模型。SQS神经元使用多量子比特量子电路实现具有内部量子记忆的脉冲单元,能够在单次运行中实现事件驱动的概率性脉冲生成。此外,我们概述了如何通过硬件友好的局部学习规则训练由SQS神经元组成的网络(称为SQS神经网络,SQSNN),从而无需全局的经典反向传播。所提出的SQSNN模型将神经形态计算的时间序列效率与量子计算指数级庞大的内部状态空间相融合,为模块化、可扩展且可在量子硬件上训练的量子脉冲神经网络铺平了道路。
Article 64
Title@2025-06-26 (4): On Uniform Weighted Deep Polynomial approximation
Title: On Uniform Weighted Deep Polynomial approximation | Auf einheitliche Gewichte tiefe Polynom-Annäherung | 统一加权深多元近似值 2506.21306v1 |
Authors (2): Kingsley Yeon, Steven B. Damelin
It is a classical result in rational approximation theory that certain non-smooth or singular functions, such as $|x|$ and $x^{1/p}$, can be efficiently approximated using rational functions with root-exponential convergence in terms of degrees of freedom \cite{Sta, GN}. In contrast, polynomial approximations admit only algebraic convergence by Jackson’s theorem \cite{Lub2}. Recent work shows that composite polynomial architectures can recover exponential approximation rates even without smoothness \cite{KY}. In this work, we introduce and analyze a class of weighted deep polynomial approximants tailored for functions with asymmetric behavior: growing unbounded on one side and decaying on the other. By multiplying a learnable deep polynomial with a one-sided weight, we capture both local non-smoothness and global growth. We show numerically that this framework outperforms Taylor, Chebyshev, and standard deep polynomial approximants, even when all use the same number of parameters. To optimize these approximants in practice, we propose a stable graph-based parameterization strategy building on \cite{Jar}.
典型的结果是理性近似理论, 某些非超值或单一功能, 如 $ x $ $ 和 $x 1/ p} , 可以用自由度 \ cite { Sta, GN} 来使用理性函数与根- 耗离趋同法进行有效比较。 相比之下, 多元近似只能接受杰克逊理论论的代数趋同值。 最近的工作显示, 复合多边多边结构即使没有平滑, 也可以恢复指数近似率。 在这项工作中, 我们引入并分析一组为不对称行为在一边不受限制地增长而同时在另一边衰减的函数而定制的加权深度多元近似差值。 通过将可学习的深度多数值乘以一面重量, 我们捕捉到本地非移动性和全球增长。 我们用数字显示, 这个框架比泰勒、 Chebyshev 和标准的深度多元近似值要高, 甚至当所有在图形中都使用稳定度的参数的校准度 。
Article 65
Title@2025-06-26 (4): Context-Aware Doubly-Robust Semi-Supervised Learning
Title: Context-Aware Doubly-Robust Semi-Supervised Learning | Kontext-Bewusst Doppel-Robust Semi-überwachtes Lernen | Doubly-Robust半监督学习 2502.15577v2 |
Authors (4): Clement Ruah, Houssem Sifaou, Osvaldo Simeone, Bashir Al-Hashimi
The widespread adoption of artificial intelligence (AI) in next-generation communication systems is challenged by the heterogeneity of traffic and network conditions, which call for the use of highly contextual, site-specific data. A promising solution is to rely not only on real-world data, but also on synthetic pseudo-data generated by a network digital twin (NDT). However, the effectiveness of this approach hinges on the accuracy of the NDT, which can vary widely across different contexts. To address this problem, this paper introduces context-aware doubly-robust (CDR) learning, a novel semi-supervised scheme that adapts its reliance on the pseudo-data to the different levels of fidelity of the NDT across contexts. CDR is evaluated on the task of downlink beamforming, where it outperforms previous state-of-the-art approaches, providing a 24% loss decrease when compared to doubly-robust (DR) semi-supervised learning in regimes with low labeled data availability.
在下一代通信系统中广泛采用人工智能(AI)受到交通和网络条件差异性的挑战,这种差异性要求使用高背景、针对具体地点的数据。一个大有希望的解决办法是不仅依赖现实世界的数据,而且依赖网络数字双组(NDT)产生的合成伪数据。然而,这一方法的有效性取决于NDT的准确性,这种准确性在不同背景中差异很大。为解决这一问题,本文件引入了一种新颖的半监督的半监督的学习方法,即对伪数据的依赖适应了NDT在各种情况下的不同水平的忠实性。CDR在与以往最先进的方法相形形形色色时,对下行链式的任务进行了评价,在与具有低标签数据可获性的制度中的二元罗伯特(DR)半监督性学习相比,减少了24%的损失。
Article 66
Title@2025-06-26 (4): Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance
Title: Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance | Semantische Szenegrafik für Ultrasound-Bilderklärung und Scan-Anleitung | 超声超声图像解释和扫描指导的语义谱图 2506.19683v2 |
Authors (5): Xuesong Li, Dianye Huang, Yameng Zhang, Nassir Navab, Zhongliang Jiang
Understanding medical ultrasound imaging remains a long-standing challenge due to significant visual variability caused by differences in imaging and acquisition parameters. Recent advancements in large language models (LLMs) have been used to automatically generate terminology-rich summaries oriented to clinicians with sufficient physiological knowledge. Nevertheless, the increasing demand for improved ultrasound interpretability and basic scanning guidance among non-expert users, e.g., in point-of-care settings, has not yet been explored. In this study, we first introduce the scene graph (SG) for ultrasound images to explain image content to ordinary users and provide guidance for ultrasound scanning. The ultrasound SG is first computed using a transformer-based one-stage method, eliminating the need for explicit object detection. To generate a graspable image explanation for ordinary users, the user query is then used to further refine the abstract SG representation through LLMs. Additionally, the predicted SG is explored for its potential in guiding ultrasound scanning toward missing anatomies within the current imaging view, assisting ordinary users in achieving more standardized and complete anatomical exploration. The effectiveness of this SG-based image explanation and scanning guidance has been validated on images from the left and right neck regions, including the carotid and thyroid, across five volunteers. The results demonstrate the potential of the method to maximally democratize ultrasound by enhancing its interpretability and usability for ordinary users.
由于成像和采集参数的差异造成显著的视觉变异,理解医学超声成像仍是一项长期挑战。大型语言模型(LLM)的最新进展已被用于自动生成术语丰富的摘要,面向具备足够生理学知识的临床医生。然而,非专业用户(例如在床旁护理环境中)对提高超声可解释性和基本扫描指导的需求日益增长,这一点尚未得到探索。在本研究中,我们首先为超声图像引入场景图(SG),用于向普通用户解释图像内容并为超声扫描提供指导。超声SG首先使用基于Transformer的单阶段方法计算,无需显式的目标检测。为了向普通用户生成易于理解的图像解释,随后利用用户查询通过LLM进一步细化抽象的SG表示。此外,我们还探讨了预测的SG在引导超声扫描朝向当前成像视图中缺失的解剖结构方面的潜力,帮助普通用户实现更标准化、更完整的解剖探查。这种基于SG的图像解释和扫描指导的有效性已在五名志愿者的左右颈部区域(包括颈动脉和甲状腺)图像上得到验证。结果表明,该方法有潜力通过提高超声对普通用户的可解释性和可用性,最大限度地普及超声。
Article 67
Title@2025-06-26 (4): Exploring Adapter Design Tradeoffs for Low Resource Music Generation
Title: Exploring Adapter Design Tradeoffs for Low Resource Music Generation | Erforschung von Adapter-Design-Tradeoffs für Low Resource Music Generation | 探索用于低资源音乐制作的适应设计取舍 2506.21298v1 |
Authors (3): Atharva Mehta, Shivam Chauhan, Monojit Choudhury
Fine-tuning large-scale music generation models, such as MusicGen and Mustango, is a computationally expensive process, often requiring updates to billions of parameters and, therefore, significant hardware resources. Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly adapter-based methods, have emerged as a promising alternative, enabling adaptation with minimal trainable parameters while preserving model performance. However, the design choices for adapters, including their architecture, placement, and size, are numerous, and it is unclear which of these combinations would produce optimal adapters and why, for a given case of low-resource music genre. In this paper, we attempt to answer this question by studying various adapter configurations for two AI music models, MusicGen and Mustango, on two genres: Hindustani Classical and Turkish Makam music. Our findings reveal distinct trade-offs: convolution-based adapters excel in capturing fine-grained local musical details such as ornamentations and short melodic phrases, while transformer-based adapters better preserve long-range dependencies crucial for structured improvisation. Additionally, we analyze computational resource requirements across different adapter scales, demonstrating how mid-sized adapters (40M parameters) achieve an optimal balance between expressivity and quality. Furthermore, we find that Mustango, a diffusion-based model, generates more diverse outputs with better adherence to the description in the input prompt while lacking in providing stability in notes, rhythm alignment, and aesthetics. Also, it is computationally intensive and requires significantly more time to train. In contrast, autoregressive models like MusicGen offer faster training and are more efficient, and can produce better quality output in comparison, but have slightly higher redundancy in their generations.
微调大型音乐制作模型,如MusicGen和Mustango,是一个计算成本昂贵的过程,往往需要更新数十亿参数,因此需要大量硬件资源。 参数-高效的美调(PefFT)技术,特别是基于适应器的方法,已经成为一个大有希望的替代方案,在保持模型性能的同时,能够以最起码的训练参数进行适应。然而,适应器的设计选择,包括其结构、位置和大小,数量众多,这些组合中哪些会产生最优化的调校正,为什么,对于某个资源较少的音乐流体来说,这些组合会产生最优化的调适度,为什么呢?在本文件中,我们试图通过研究两种AI音乐模型(Muscult Geting and Mustago)的多种调适度配置来回答这个问题:印度古典和土木马卡姆音乐。我们发现,基于革命的适应器在捕捉精细的本地音乐模型(如装和短调调调)中,而基于变异的调的调能更好地保存对结构化至关重要的长距离培训模型。 在本文中,我们试图进行更精确的调校准的调制的调定的调定时, 。
Article 68
Title@2025-06-26 (4): Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols
Title: Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols | Teufelshand: Daten vergiften Angriffe auf lokal private Graphen-Lernprotokolle | 魔鬼之手:对本地私人图案学习程序的数据毒害攻击 2506.09803v2 |
Authors (6): Longzhu He, Chaozhuo Li, Peng Tang, Li Sun, Sen Su, Philip S. Yu
Graph neural networks (GNNs) have achieved significant success in graph representation learning and have been applied to various domains. However, many real-world graphs contain sensitive personal information, such as user profiles in social networks, raising serious privacy concerns when graph learning is performed using GNNs. To address this issue, locally private graph learning protocols have gained considerable attention. These protocols leverage the privacy advantages of local differential privacy (LDP) and the effectiveness of GNN’s message-passing in calibrating noisy data, offering strict privacy guarantees for users’ local data while maintaining high utility (e.g., node classification accuracy) for graph learning. Despite these advantages, such protocols may be vulnerable to data poisoning attacks, a threat that has not been considered in previous research. Identifying and addressing these threats is crucial for ensuring the robustness and security of privacy-preserving graph learning frameworks. This work introduces the first data poisoning attack targeting locally private graph learning protocols. The attacker injects fake users into the protocol, manipulates these fake users to establish links with genuine users, and sends carefully crafted data to the server, ultimately compromising the utility of private graph learning. The effectiveness of the attack is demonstrated both theoretically and empirically. In addition, several defense strategies have also been explored, but their limited effectiveness highlights the need for more robust defenses.
图表神经网络(GNNs)在图形代表学习方面取得巨大成功,并应用于各个领域。然而,许多真实世界的图表包含敏感的个人信息,如社交网络中的用户概况,在使用GNNs进行图形学习时引起严重的隐私问题。为解决这一问题,本地私人图形学习协议得到了相当的重视。这些协议利用了当地差异隐私隐私隐私的隐私优势和GNN信息传递在校准噪音数据方面的有效性,为用户的本地数据提供了严格的隐私保障,同时保持了与真实用户的链接(例如,节点分类精确度),同时为图表学习提供了高度的实用性(例如,节点分类精确度 ) 。尽管有这些优势,但这类协议可能易受到数据中毒袭击的威胁,而此前的研究中并未考虑到这一威胁。 识别和应对这些威胁对于确保隐私保存图形学习框架的稳健和安全性至关重要。 这项工作将首次数据中毒袭击当地私人图形学习协议的用户引入了协议,操纵这些假用户与真正的用户建立联系,并向服务器发送精心制作的数据,最终损害了私人图表学习的实用性,但最终损害了私人图表学习的效用。 探索性战略的有效性也是有限的。 。 探索性攻击的有效性是有限的, 。 探索性战略 也得到了展示性 。
Article 69
Title@2025-06-26 (4): Improved seeding strategies for k-means and k-GMM
Title: Improved seeding strategies for k-means and k-GMM | Verbesserte Saatstrategien für k-Mittel und k-GMM | 改进k-手段和k-GMM和k-GMM的播种战略 2506.21291v1 |
Authors (2): Guillaume Carrière, Frédéric Cazals
We revisit the randomized seeding techniques for k-means clustering and k-GMM (Gaussian Mixture model fitting with Expectation-Maximization), formalizing their three key ingredients: the metric used for seed sampling, the number of candidate seeds, and the metric used for seed selection. This analysis yields novel families of initialization methods exploiting a lookahead principle, which conditions the seed selection to an enhanced coherence with the final metric used to assess the algorithm, and a multipass strategy to tame down the effect of randomization. Experiments show a consistent constant factor improvement over classical contenders in terms of the final metric (SSE for k-means, log-likelihood for k-GMM), at a modest overhead. In particular, for k-means, our methods improve on the recently designed multi-swap strategy, which was the first one to outperform the greedy k-means++ seeding. Our experimental analysis also sheds light on subtle properties of k-means often overlooked, including the (lack of) correlations between the SSE upon seeding and the final SSE, the variance reduction phenomena observed in iterative seeding methods, and the sensitivity of the final SSE to the pool size for greedy methods. Practically, our most effective seeding methods are strong candidates to become one of the standard techniques, if not the standard technique. From a theoretical perspective, our formalization of seeding opens the door to a new line of analytical approaches.
我们重新审视k-means聚类和k-GMM(采用期望最大化的高斯混合模型拟合)的随机播种技术,形式化其三个关键要素:用于种子采样的度量、候选种子的数量,以及用于种子选择的度量。这一分析产生了新的初始化方法族,它们利用前瞻(lookahead)原则,使种子选择与用于评估算法的最终度量更加一致,并利用多遍(multipass)策略削弱随机化的影响。实验表明,在最终度量(k-means为SSE,k-GMM为对数似然)方面,这些方法相对于经典方法有稳定的常数因子改进,且开销不大。特别是对于k-means,我们的方法改进了最近设计的multi-swap策略,该策略是第一个优于贪心k-means++播种的方法。我们的实验分析还揭示了k-means中常被忽视的微妙性质,包括播种时的SSE与最终SSE之间(缺乏)相关性、迭代播种方法中观察到的方差缩减现象,以及贪心方法中最终SSE对候选池大小的敏感性。在实践中,我们最有效的播种方法有望成为标准技术之一,甚至成为唯一的标准技术。从理论角度看,我们对播种的形式化为新的分析方法打开了大门。
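The three ingredients named in the abstract (sampling metric, number of candidate seeds, selection metric) map directly onto the classical greedy k-means++ baseline. The sketch below implements that baseline only, not the lookahead or multipass variants proposed in the paper; the function name `greedy_kmeanspp` and its parameters are illustrative assumptions.

```python
import numpy as np

def greedy_kmeanspp(X, k, n_candidates=4, rng=None):
    """Greedy k-means++ seeding.

    Sampling metric:  squared distance to the nearest chosen seed (D^2 sampling).
    Candidate count:  n_candidates draws per new seed.
    Selection metric: SSE after tentatively adding each candidate.
    Returns the seeds and the SSE upon seeding (before any Lloyd iterations).
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    # d2[i] = squared distance from point i to its nearest chosen seed
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        cand = rng.choice(n, size=n_candidates, p=probs)
        # Squared distances from every point to each candidate: shape (c, n)
        cand_d2 = ((X[None, :, :] - X[cand][:, None, :]) ** 2).sum(axis=2)
        new_d2 = np.minimum(d2[None, :], cand_d2)
        best = int(np.argmin(new_d2.sum(axis=1)))  # candidate giving the lowest SSE
        centers.append(X[cand[best]])
        d2 = new_d2[best]
    return np.array(centers), float(d2.sum())

X = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]])
centers, sse = greedy_kmeanspp(X, k=2, rng=0)
print(centers, sse)
```

Setting `n_candidates=1` recovers plain k-means++; larger pools trade extra distance computations for a lower expected SSE upon seeding, which is the pool-size sensitivity the abstract refers to.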
Article 70
Title@2025-06-26 (4): Small Encoders Can Rival Large Decoders in Detecting Groundedness
Title: Small Encoders Can Rival Large Decoders in Detecting Groundedness | Kleine Encoder können große Decoder bei der Erkennung von Erdlichkeit rivalisieren | 在地面探测中能够使大型分离器在探测地面时发生迭接 2506.21288v1 |
Authors (7): Istabrak Abbes, Gabriele Prato, Quentin Fournier, Fernando Rodriguez, Alaa Boukhary, Adam Elwood, Sarath Chandar
Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness, i.e., generating responses strictly supported by the context, is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task-specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at: https://github.com/chandarlab/Hallucinate-less
利用外部环境增强大型语言模型(LLMS),大大改善了其在自然语言处理(NLP)任务方面的表现;然而,LLMS在所提供背景缺乏信息、往往诉诸无根据的猜测或内部知识时,努力可靠地回答询问;基础(产生得到背景的大力支持的答复)对于确保事实的一致性和可信赖性至关重要;本研究的重点是查明某一询问是否以LLMS在昂贵的答案生成之前提供的背景文件为依据;这种检测机制可以大大减少推断时间和资源消耗。我们显示,轻量、特定任务编码模型,如RoBERTA和NomicBERT, 微调整理数据集,能够达到与Llama3 8B和GPT4o等最新高水平的LLMMS的精确度,同时减少数量级的推断力。该代码可在以下网址查阅:https://github.com/chandarlab/Hallucincate-less。
Article 71
Title@2025-06-26 (4): Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
Title: Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling | Energy Matching: Zusammenführen von Flow Matching- und Energy-Based-Modellen für die Generative Modellierung | 能源匹配:统一流动匹配和以能源为基础的生成模型模型 2504.10612v4 |
Authors (8): Michal Balcerak, Tamaz Amiranashvili, Antonio Terpin, Suprosanna Shit, Lea Bogensperger, Sebastian Kaltenbach, Petros Koumoutsakos, Bjoern Menze
The most widely used generative models map noise and data distributions by matching flows or scores. However, they struggle to incorporate partial observations and additional priors, something energy-based models (EBMs) handle elegantly by simply adding corresponding scalar energy terms. We address this issue by proposing Energy Matching, a framework that endows flow-based approaches with the flexibility of EBMs. Far from the data manifold, samples move along curl-free, optimal transport paths from noise to data. As they approach the data manifold, an entropic energy term guides the system into a Boltzmann equilibrium distribution, explicitly capturing the underlying likelihood structure of the data. We parameterize this dynamic with a single time-independent scalar field, which serves as both a powerful generator and a flexible prior for effective regularization of inverse problems. Our method substantially outperforms existing EBMs on CIFAR-10 and ImageNet generation in terms of fidelity, while retaining simulation-free training of transport-based approaches away from the data manifold. Furthermore, we leverage the method’s flexibility to introduce an interaction energy that supports diverse mode exploration, which we demonstrate in a controlled protein-generation setting. Our approach focuses on learning a scalar potential energy, without time-conditioning, auxiliary generators, or additional networks, which marks a significant departure from recent EBM methods. We believe that this simplified framework significantly advances EBMs' capabilities and paves the way for their wider adoption in generative modeling across diverse domains.
最广泛使用的基因模型通过匹配流量或分数来绘制噪音和数据分布。然而,它们努力将部分观测和更多前置能源模型(EBMS)纳入部分观测和更多前置能源模型(EBMs),简单增加相应的卡路里能源术语,优雅地处理。我们通过提出“能源匹配”来解决这一问题,这个框架将流动方法与EBMs的灵活性相匹配。远不局限于数据元件,样品沿着无卷轴的最佳运输路径从噪音到数据。当它们接近数据元件时,一个变异性能源术语引导系统进入博尔茨曼均衡分布,明确捕捉数据的潜在可能性结构。我们用一个单一的、时间独立的标价字段将这种动态参数化为单一的参数,它既是一个强大的生成器,又是一个灵活地在对反向问题进行有效的正规化之前。我们的方法大大地超越了CFAR-10上的现有EBMs和图像网络生成的忠诚度,同时保留对基于传输方法的滚动式培训,远离数据元体。此外,我们利用该方法的灵活性来引入一种互动能量支持不同模式的能源,支持不同模式的模型的探索,我们用新的网络,我们从一个控制型型网络的路径,在不深层次上学习了一种控制型的模型生成。
Article 72
Title@2025-06-26 (4): Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution
Title: Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution | Hypersphärische Variations-Autoencoder mit effizienter sphärischer Cauchy-Distribution | 使用高效球道球道配送的超球变异自动编码器 2506.21278v1 |
Authors (2): Lukas Sablica, Kurt Hornik
We propose a novel variational autoencoder (VAE) architecture that employs a spherical Cauchy (spCauchy) latent distribution. Unlike traditional Gaussian latent spaces or the widely used von Mises-Fisher (vMF) distribution, spCauchy provides a more natural hyperspherical representation of latent variables, better capturing directional data while maintaining flexibility. Its heavy-tailed nature prevents over-regularization, ensuring efficient latent space utilization while offering a more expressive representation. Additionally, spCauchy circumvents the numerical instabilities inherent to vMF, which arise from computing normalization constants involving Bessel functions. Instead, it enables a fully differentiable and efficient reparameterization trick via Möbius transformations, allowing for stable and scalable training. The KL divergence can be computed through a rapidly converging power series, eliminating concerns of underflow or overflow associated with evaluation of ratios of hypergeometric functions. These properties make spCauchy a compelling alternative for VAEs, offering both theoretical advantages and practical efficiency in high-dimensional generative modeling.
我们提出一种新的变分自编码器(VAE)架构,采用球面柯西(spCauchy)潜在分布。与传统的高斯潜在空间或广泛使用的von Mises-Fisher(vMF)分布不同,spCauchy为潜在变量提供了更自然的超球面表示,在保持灵活性的同时更好地刻画方向性数据。其重尾特性可防止过度正则化,既保证潜在空间得到高效利用,又提供更具表现力的表示。此外,spCauchy避免了vMF固有的数值不稳定性,这种不稳定性源于计算涉及贝塞尔函数的归一化常数。取而代之的是,它通过Möbius变换实现完全可微且高效的重参数化技巧,从而支持稳定、可扩展的训练。KL散度可以通过快速收敛的幂级数计算,消除了在计算超几何函数之比时出现下溢或上溢的顾虑。这些特性使spCauchy成为VAE的一个有吸引力的替代方案,在高维生成建模中兼具理论优势和实际效率。
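The reparameterization idea in this abstract can be illustrated with a small sketch: draw a point uniformly on the unit sphere, then push it through a Möbius transformation that pulls mass toward a location `mu`. This is only an illustration under stated assumptions. The function names, the concentration parameter `rho` in [0, 1), and the exact form of the map are assumptions made here and may differ from the parameterization used in the paper; `mu` is assumed to be a unit vector.

```python
import math
import random


def sample_uniform_sphere(dim, rng):
    """Draw a point uniformly from the unit sphere in R^dim by normalizing a Gaussian."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def mobius_transform(u, mu, rho):
    """Map a unit vector u to another unit vector, pulled toward mu.

    Assumed form: psi = rho * mu with 0 <= rho < 1 and |mu| = 1.
    With rho = 0 the input is returned unchanged.
    """
    psi = [rho * m for m in mu]
    shifted = [ui + pi for ui, pi in zip(u, psi)]
    sq_norm = sum(s * s for s in shifted)  # >= (1 - rho)^2 > 0
    scale = (1.0 - rho * rho) / sq_norm
    return [scale * s + p for s, p in zip(shifted, psi)]


def sample_latent(mu, rho, rng):
    """Reparameterized draw: all randomness sits in u, the map is smooth in mu and rho."""
    u = sample_uniform_sphere(len(mu), rng)
    return mobius_transform(u, mu, rho)


rng = random.Random(0)
z = sample_latent([0.0, 0.0, 1.0], 0.7, rng)
print(z, sum(v * v for v in z))
```

Because the noise `u` does not depend on `mu` or `rho`, gradients can flow through the transformation, which is the property a reparameterization trick needs. No Bessel functions appear anywhere in the sampling path.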
Article 73
Title@2025-06-26 (4): Lagrangian Index Policy for Restless Bandits with Average Reward
Title: Lagrangian Index Policy for Restless Bandits with Average Reward | Lagrangian Index Policy for Restless Bandits with Average Reward | 以平均回报率衡量的无休无休止强盗拉格朗加指数政策 2412.12641v2 |
Authors (3): Konstantin Avrachenkov, Vivek S. Borkar, Pratik Shah
We study the Lagrange Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions. Even though in most cases their performances are very similar, in the cases when WIP shows bad performance, LIP continues to perform very well. We then propose reinforcement learning algorithms, both tabular and NN-based, to obtain online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We calculate analytically the Lagrange index for the restart model, which applies to the optimal web crawling and the minimization of the weighted age of information. We also give a new proof of asymptotic optimality in case of homogeneous arms as the number of arms goes to infinity, based on exchangeability and de Finetti’s theorem.
我们研究无休止的多武装强盗的拉格朗指数政策(LIP),长期平均奖赏。我们特别将LIP的表现与Whittle指数政策(WIP)的绩效进行比较,这两种累进政策在某些自然条件下都被认为只是暂时最佳的。尽管在多数情况下,他们的表现非常相似,但在WIP表现不佳的情况下,LIP的表现仍然非常出色。我们然后提议以表格和NNW为基础的强化学习算法,以便在无模式环境下获得LIP的在线学习计划。拟议的LIP强化学习计划要求比WIP的类似计划要少得多的记忆。我们用分析方法计算重新启动模型的拉格朗指数,该指数适用于最佳网络爬动和加权信息年龄的最小化。我们还根据可交换性和de Filletti的理论,在武器数量达到无限性时,为同质武器提供无症状的最佳优化性的新证据。
Article 74
Title@2025-06-26 (4): A GREAT Architecture for Edge-Based Graph Problems Like TSP
Title: A GREAT Architecture for Edge-Based Graph Problems Like TSP | Eine großartige Architektur für Edge-Based Graph Probleme wie TSP | 象TSP那样的边缘图表问题大建筑 2408.16717v2 |
Authors (5): Attila Lischka, Filip Rydin, Jiaming Wu, Morteza Haghir Chehreghani, Balázs Kulcsár
In recent years, many learning-based approaches have been proposed to tackle combinatorial optimization problems such as routing problems. Many of these approaches are based on graph neural networks (GNNs) or related transformers, operating on the Euclidean coordinates representing the routing problems. However, models operating on Euclidean coordinates are ill-suited for non-Euclidean, asymmetric problem instances that are often found in real-world settings. To overcome this limitation, we propose a novel GNN-based and edge-focused neural model called Graph Edge Attention Network (GREAT). Using GREAT as an encoder to capture the properties of a routing problem instance, we build a reinforcement learning framework which we apply to Euclidean and non-Euclidean variants of vehicle routing problems such as the Traveling Salesman Problem, the Capacitated Vehicle Routing Problem and the Orienteering Problem. Our framework is among the first to tackle non-Euclidean variants of these problems and achieves competitive results among learning-based solvers.
在过去的几年中,提出了许多基于学习的方法,以解决诸如路由问题等组合优化问题,其中许多方法都基于图形神经网络或相关的变压器,在代表路由问题的欧几里德坐标上运行,然而,在欧几里德坐标上运行的模式不适合在现实世界环境中经常发现的非欧几里德、非对称问题案例。为了克服这一限制,我们提出了一个新的基于GNN的、以边缘为重点的神经模型,称为“图视网”。我们利用Great作为编码器来捕捉路由问题实例的特性,我们建立了一个强化学习框架,适用于欧洲球里德和非欧几里德的车辆路由问题变体,如旅行推销员问题、卡帕蒂特车辆运行问题和动态问题。我们的框架首先解决了这些问题的非欧几里德变体变体,并在基于学习的解决问题者之间取得竞争性结果。
Article 75
Title@2025-06-26 (4): These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining
Title: These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining | Diese sind nicht alle Funktionen, die Sie suchen: Ein grundlegender Engpass in überwachten Pretraining | 这些不是所有你正在寻找的特征: 受监督预科班的基本瓶颈。 2506.18221v2 |
Authors (3): Xingyu Alice Yang, Jianyu Zhang, Léon Bottou
Transfer learning is a cornerstone of modern machine learning, promising a way to adapt models pretrained on a broad mix of data to new tasks with minimal new data. However, a significant challenge remains in ensuring that transferred features are sufficient to handle unseen datasets, amplified by the difficulty of quantifying whether two tasks are “related”. To address these challenges, we evaluate model transfer from a pretraining mixture to each of its component tasks, assessing whether pretrained features can match the performance of task-specific direct training. We identify a fundamental limitation in deep learning models – an “information saturation bottleneck” – where networks fail to learn new features once they encode similar competing features during training. When restricted to learning only a subset of key features during pretraining, models will permanently lose critical features for transfer and perform inconsistently on data distributions, even components of the training mixture. Empirical evidence from published studies suggests that this phenomenon is pervasive in deep learning architectures – factors such as data distribution or ordering affect the features that current representation learning methods can learn over time. This study suggests that relying solely on large-scale networks may not be as effective as focusing on task-specific training, when available. We propose richer feature representations as a potential solution to better generalize across new datasets and, specifically, present existing methods alongside a novel approach, the initial steps towards addressing this challenge.
转让学习是现代机器学习的基石,有望使在数据广泛混合基础上预先培训的模型适应新任务,而新数据极少。然而,在确保转让的特征足以处理无形数据集方面,仍然存在重大挑战,因为难以量化两个任务是否“相关”而使这些功能更加难以处理。 为了应对这些挑战,我们评估从培训前混合向每个组成部分任务转移的模式,评估培训前特点能否与具体任务直接培训的绩效相匹配。我们确定了深层次学习模式的根本局限性,即“信息饱和瓶颈”,即一旦网络在培训期间对类似的竞争特征进行编码,网络就无法学习新特征。如果在培训前只学习一组关键特征,模型将永远失去关键特征,用于转让,并在数据分布方面执行不一致,甚至培训混合部分。从已发表的研究中得出的经验性证据表明,这种现象在深层次的学习结构中十分普遍 – – 例如数据传播或命令影响当前代表性学习方法随着时间的推移能够学习的特点。这项研究表明,仅仅依靠大型网络,在培训前期阶段,可能无法有效地侧重于特定任务的培训,在现有的新办法中,我们提出更丰富的办法。
Article 76
Title@2025-06-26 (4): DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
Title: DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster | DiLoCoX: Ein kommunikationsarmer groß angelegter Ausbildungsrahmen für dezentralisierte Cluster | DILOCOX:权力下放小组的低通信大范围培训框架 2506.21263v1 |
Authors (9): Ji Qi, WenPeng Zhu, Li Li, Ming Wu, YingJun Wu, Wu He, Xun Gao, Jason Zeng, Michael Heinrich
The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we conduct training on slow networks and thereby unleash the power of decentralized clusters when dealing with models exceeding 100 billion parameters? In this paper, we propose DiLoCoX, a low-communication large-scale decentralized cluster training framework. It combines Pipeline Parallelism with Dual Optimizer Policy, One-Step-Delay Overlap of Communication and Local Training, and an Adaptive Gradient Compression Scheme. This combination significantly improves the scale of parameters and the speed of model pre-training. We justify the benefits of one-step-delay overlap of communication and local training, as well as the adaptive gradient compression scheme, through a theoretical analysis of convergence. Empirically, we demonstrate that DiLoCoX is capable of pre-training a 107B foundation model over a 1Gbps network. Compared to vanilla AllReduce, DiLoCoX can achieve a 357x speedup in distributed training while maintaining negligible degradation in model convergence. To the best of our knowledge, this is the first decentralized training framework successfully applied to models with over 100 billion parameters.
基础模型的分布式培训,特别是大型语言模型(LLMS)的分布式培训需要高水平的通信。 因此,它高度依赖集中集集,具有快速和可靠的互连性。 我们能否在缓慢的网络上进行培训,从而在涉及超过1000亿参数的模型时释放分散式集群的力量? 在本文中,我们提议DiLoCoX,这是一个低通信的大规模分散式集群培训框架。它将管道平行与双优化政策、通信和地方培训的单步重叠和适应性渐进式压缩计划结合起来。这种组合大大改善了参数的规模和模型预培训的速度。我们有理由通过对趋同进行理论分析来证明通信和地方培训的一步骤重叠以及适应性梯度压缩计划的好处。我们很生动地证明DiloCoX能够在1Gbps网络上对107B基础模型进行预培训。 与Vanilla AllRedduce相比, DiLoCoX可以在分配培训中实现357x的首次速度,同时保持100亿分流化模型的成功降解。
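A back-of-the-envelope calculation shows why naive synchronization is the bottleneck in the setting this abstract describes. The sketch below assumes 2 bytes per parameter (for example fp16 or bf16 gradients) and a single 1 Gbps link; both are assumptions for illustration, and the estimate ignores AllReduce topology, protocol overhead, and any overlap with computation.

```python
def full_sync_seconds(num_params, bytes_per_param, link_bits_per_second):
    """Time to push one full, uncompressed copy of the gradients through one link."""
    total_bits = num_params * bytes_per_param * 8
    return total_bits / link_bits_per_second


# 107B parameters, assumed 2 bytes each, over an assumed single 1 Gbps link
seconds = full_sync_seconds(107 * 10**9, 2, 10**9)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min) per uncompressed exchange")
```

Under these assumptions a single uncompressed exchange takes roughly half an hour, which is why the abstract combines less frequent synchronization, overlap of communication with local training, and gradient compression.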
Article 77
Title@2025-06-26 (4): Simulating Hard Attention Using Soft Attention
Title: Simulating Hard Attention Using Soft Attention | Simulation der harten Aufmerksamkeit mit weicher Aufmerksamkeit | 使用软关注模拟硬关注 2412.09925v2 |
Authors (4): Andy Yang, Lena Strobl, David Chiang, Dana Angluin
We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several subclasses of languages recognized by hard-attention transformers, which can be defined in variants of linear temporal logic. We demonstrate how soft-attention transformers can compute formulas of these logics using unbounded positional embeddings or temperature scaling. Second, we demonstrate how temperature scaling allows softmax transformers to simulate general hard-attention transformers, using a temperature that depends on the minimum gap between the maximum attention scores and other attention scores.
我们研究的是使用软关注的变压器能够模拟硬关注的条件,即有效地将所有关注都集中在一组位置上。 首先,我们检查了几个由硬关注变压器承认的亚类语言,这些可按线性时间逻辑变量加以定义。我们演示了软关注变压器如何用无约束的定位嵌入或温度缩放来计算这些逻辑的公式。第二,我们演示了温度缩放如何让软最大变压器模拟普通硬关注变压器,使用温度取决于最大关注分数与其他关注分数之间的最小差距。
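The role of the score gap can be seen in a small sketch. If the largest attention score exceeds every other score by at least `gap`, then softmax at temperature `gap / ln((n - 1) / eps)` puts weight at least `1 - eps` on the maximum, since the remaining `n - 1` positions together contribute at most `eps` relative to it. The function names are illustrative, and the sketch only covers a unique maximum, which is narrower than the general setting in the abstract.

```python
import math


def softmax(scores, temperature):
    """Numerically stable softmax of scores / temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_for_gap(gap, n_positions, eps):
    """Temperature at which a unique maximum receives weight >= 1 - eps.

    Requires gap > 0, n_positions >= 2 and 0 < eps < 1.
    """
    return gap / math.log((n_positions - 1) / eps)


scores = [2.0, 1.0, 0.5, 1.5]          # maximum at index 0, gap to runner-up is 0.5
tau = temperature_for_gap(0.5, len(scores), 0.01)
print(softmax(scores, 1.0))            # ordinary soft attention, weight is spread out
print(softmax(scores, tau))            # nearly all weight on index 0
```

The temperature shrinks as the gap shrinks or as the sequence grows, which matches the abstract's statement that the required temperature depends on the minimum gap between the maximum score and the others.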
Article 78
Title@2025-06-26 (4): Wavelet Diffusion Neural Operator
Title: Wavelet Diffusion Neural Operator | Wavelet Diffusions-Neuraloperator | Wavelet 扩散神经操作员 2412.04833v3 |
Authors (10): Peiyan Hu, Rui Wang, Xiang Zheng, Tao Zhang, Haodong Feng, Ruiqi Feng, Long Wei, Yue Wang, Zhi-Ming Ma, Tailin Wu
Simulating and controlling physical systems described by partial differential equations (PDEs) are crucial tasks across science and engineering. Recently, diffusion generative models have emerged as a competitive class of methods for these tasks due to their ability to capture long-term dependencies and model high-dimensional states. However, diffusion models typically struggle with handling system states with abrupt changes and generalizing to higher resolutions. In this work, we propose Wavelet Diffusion Neural Operator (WDNO), a novel PDE simulation and control framework that enhances the handling of these complexities. WDNO comprises two key innovations. Firstly, WDNO performs diffusion-based generative modeling in the wavelet domain for the entire trajectory to handle abrupt changes and long-term dependencies effectively. Secondly, to address the issue of poor generalization across different resolutions, which is one of the fundamental tasks in modeling physical systems, we introduce multi-resolution training. We validate WDNO on five physical systems, including 1D advection equation, three challenging physical systems with abrupt changes (1D Burgers’ equation, 1D compressible Navier-Stokes equation and 2D incompressible fluid), and a real-world dataset ERA5, which demonstrates superior performance on both simulation and control tasks over state-of-the-art methods, with significant improvements in long-term and detail prediction accuracy. Remarkably, in the challenging context of the 2D high-dimensional and indirect control task aimed at reducing smoke leakage, WDNO reduces the leakage by 78% compared to the second-best baseline. The code can be found at https://github.com/AI4Science-WestlakeU/wdno.git.
模拟和控制由偏微分方程(PDE)描述的物理系统是科学与工程领域的关键任务。近年来,扩散生成模型因其能够捕捉长期依赖关系并对高维状态建模,已成为这些任务中一类有竞争力的方法。然而,扩散模型通常难以处理存在突变的系统状态,也难以泛化到更高分辨率。在这项工作中,我们提出小波扩散神经算子(WDNO),这是一个新的PDE模拟与控制框架,能够更好地应对这些复杂情况。WDNO包含两项关键创新。首先,WDNO在小波域中对整条轨迹进行基于扩散的生成建模,以有效处理突变和长期依赖关系。其次,针对跨分辨率泛化能力差这一物理系统建模中的基本问题,我们引入了多分辨率训练。我们在五个物理系统上验证了WDNO,包括一维平流方程、三个存在突变的高难度物理系统(一维Burgers方程、一维可压缩Navier-Stokes方程和二维不可压缩流体)以及真实世界数据集ERA5,结果表明WDNO在模拟和控制任务上均优于最先进的方法,并在长期预测和细节预测精度上有显著提升。值得注意的是,在旨在减少烟雾泄漏的二维高维间接控制这一高难度任务中,WDNO相比次优基线将泄漏减少了78%。代码见 https://github.com/AI4Science-WestlakeU/wdno.git。
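A one-level Haar transform gives a feel for why a wavelet representation suits abrupt changes. The Haar basis is used here only because it is the simplest wavelet; the abstract does not say which wavelet family WDNO uses, so this choice is an assumption.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform of an even-length signal."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly (up to floating-point rounding)."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out


# A step function: the jump sits between samples 2 and 3
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
approx, detail = haar_forward(step)
print(detail)                          # only the pair containing the jump is nonzero
print(haar_inverse(approx, detail))    # reconstructs the step
```

The discontinuity is captured by a single localized detail coefficient while smooth regions contribute zeros, so a generative model working on these coefficients sees a sparse, well-localized description of the abrupt change.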
Article 79
Title@2025-06-26 (4): Radio Map Estimation via Latent Domain Plug-and-Play Denoising
Title: Radio Map Estimation via Latent Domain Plug-and-Play Denoising | Radiokarte Schätzung über Latent Domain Plug-and-Play Denoising | 通过Latent Domain Plug 和 Play Disoising 无线电地图估计 2501.13472v2 |
Authors (5): Le Xu, Lei Cheng, Junting Chen, Wenqiang Pu, Xiao Fu
Radio map estimation (RME), also known as spectrum cartography, aims to reconstruct the strength of radio interference across different domains (e.g., space and frequency) from sparsely sampled measurements. To tackle this typical inverse problem, state-of-the-art RME methods rely on handcrafted or data-driven structural information of radio maps. However, the former often struggles to model complex radio frequency (RF) environments and the latter requires excessive training – making it hard to quickly adapt to in situ sensing tasks. This work presents a spatio-spectral RME approach based on plug-and-play (PnP) denoising, a technique from computational imaging. The idea is to leverage the observation that the denoising operations of signals like natural images and radio maps are similar – despite the nontrivial differences of the signals themselves. Hence, sophisticated denoisers designed for or learned from natural images can be directly employed to assist RME, avoiding using radio map data for training. Unlike conventional PnP methods that operate directly in the data domain, the proposed method exploits the underlying physical structure of radio maps and proposes an ADMM algorithm that denoises in a latent domain. This design significantly improves computational efficiency and enhances noise robustness. Theoretical aspects, e.g., recoverability of the complete radio map and convergence of the ADMM algorithm are analyzed. Synthetic and real data experiments are conducted to demonstrate the effectiveness of our approach.
无线电地图估计(RME)也称为频谱制图,目的是从零星抽样测量中重建不同领域(如空间和频率)的无线电干扰强度(RME),目的是从零星抽样测量中重建不同领域(如空间和频率)的无线电干扰强度;为了解决这一典型的反问题,最新RME方法依靠无线电地图的手工制作或数据驱动结构信息;然而,前者往往难以模拟复杂的无线电频率(RF)环境,后者需要过多的培训 – – 这使得它难以迅速适应现场遥感任务;这项工作提出了基于插播(PnP)分接(PnP)分流的简易光谱RME算法方法,这是一种计算成像技术的技术;目的是利用以下观察,即自然图像和无线电地图地图等信号的分解操作是相似的 – – 尽管信号本身存在非重大差异;因此,为自然图像设计或从自然图像中学习的精密的隐隐含器可以直接用于协助RME,避免使用无线电地图数据进行培训。 与直接在数据领域运作的常规PnP方法不同,拟议的方法是利用无线电地图的基本物理结构结构结构结构结构结构结构,并提议使ADMMMMMM的同步数据升级更能改进。
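The plug-and-play idea can be sketched on a toy 1-D completion problem: ADMM alternates a closed-form data-fit step with a denoising step that stands in for the prior. This is a generic data-domain PnP-ADMM sketch, not the latent-domain algorithm of the paper. The moving-average denoiser, the penalty `rho`, and all function names are assumptions chosen to keep the example self-contained.

```python
def moving_average(x, radius=1):
    """Stand-in denoiser: average each sample with its valid neighbours."""
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - radius), min(len(x), i + radius + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def pnp_admm(y, mask, denoiser=moving_average, rho=1.0, n_iters=50):
    """Estimate a signal from samples y that are observed where mask[i] == 1."""
    z = [yi if m else 0.0 for yi, m in zip(y, mask)]
    u = [0.0] * len(y)
    for _ in range(n_iters):
        # x-update: closed-form minimizer of the masked least-squares term
        # plus the quadratic coupling (rho / 2) * (x - (z - u))^2
        x = [(m * yi + rho * (zi - ui)) / (m + rho)
             for yi, m, zi, ui in zip(y, mask, z, u)]
        # z-update: the proximal step of the prior is replaced by a denoiser
        z = denoiser([xi + ui for xi, ui in zip(x, u)])
        # scaled dual update
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


observed = [1.0, 0.0, 1.2, 0.0, 0.0, 0.9, 0.0, 1.1]
mask = [1, 0, 1, 0, 0, 1, 0, 1]
print(pnp_admm(observed, mask))
```

Swapping `moving_average` for a stronger denoiser changes the implicit prior without touching the rest of the loop, which is the appeal of the plug-and-play formulation the abstract builds on.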
Article 80
Title@2025-06-26 (4): From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting
Title: From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting | Von der On-Chain zum Makro: Bewertung der Bedeutung der Datenquellenvielfalt in der Kryptowährungsmarktprognose | 从连网到宏观:评估数据来源多样性在加密货币市场预测中的重要性 2506.21246v1 |
Authors (3): Giorgos Demosthenous, Chryssis Georgiou, Eliada Polydorou
This study investigates the impact of data source diversity on the performance of cryptocurrency forecasting models by integrating various data categories, including technical indicators, on-chain metrics, sentiment and interest metrics, traditional market indices, and macroeconomic indicators. We introduce the Crypto100 index, representing the top 100 cryptocurrencies by market capitalization, and propose a novel feature reduction algorithm to identify the most impactful and resilient features from diverse data sources. Our comprehensive experiments demonstrate that data source diversity significantly enhances the predictive performance of forecasting models across different time horizons. Key findings include the paramount importance of on-chain metrics for both short-term and long-term predictions, the growing relevance of traditional market indices and macroeconomic indicators for longer-term forecasts, and substantial improvements in model accuracy when diverse data sources are utilized. These insights help demystify the short-term and long-term driving factors of the cryptocurrency market and lay the groundwork for developing more accurate and resilient forecasting models.
这项研究通过综合各种数据类别,包括技术指标、链内指标、情绪和兴趣指标、传统市场指数和宏观经济指标,调查数据来源多样性对加密货币预测模型绩效的影响。我们引入了Crypto100指数,代表了市场资本化的100个最大加密数据,并提出新的减少特征算法,以确定不同数据来源最具有影响力和复原力的特点。我们的全面实验表明,数据来源多样性极大地增强了不同时间范围内预测模型的预测性能。主要调查结果包括:链内指标对短期和长期预测都至关重要,传统市场指数和宏观经济指标对长期预测的相关性日益提高,在使用多种数据来源时,模型准确性得到大幅提高。这些见解有助于解开密码货币市场的短期和长期驱动因素,并为开发更准确、更具有复原力的预测模型奠定基础。
Article 81
Title@2025-06-26 (4): Zero-Shot Learning for Obsolescence Risk Forecasting
Title: Zero-Shot Learning for Obsolescence Risk Forecasting | Zero-Shot-Lernen für Obsoleszenz-Risikoprognosen | 用于悬浮风险预测的零热学习 2506.21240v1 |
Authors (7): Elie Saad, Aya Mrabah, Mariem Besbes, Marc Zolghadri, Victor Czmil, Claude Baron, Vincent Bourgeois
Component obsolescence poses significant challenges in industries reliant on electronic components, causing increased costs and disruptions in the security and availability of systems. Accurate obsolescence risk prediction is essential but hindered by a lack of reliable data. This paper proposes a novel approach to forecasting obsolescence risk using zero-shot learning (ZSL) with large language models (LLMs) to address data limitations by leveraging domain-specific knowledge from tabular datasets. Applied to two real-world datasets, the method demonstrates effective risk prediction. A comparative evaluation of four LLMs underscores the importance of selecting the right model for specific forecasting tasks.
在依赖电子部件的行业中,部件过时构成重大挑战,造成成本增加,系统安全和可用性中断,准确的过时风险预测至关重要,但由于缺乏可靠数据而受阻,本文件提出一种新办法,利用大型语言模型的零点学习(ZSL)预测过时风险,利用表格数据集的域别知识解决数据限制问题。该方法适用于两个现实世界数据集,显示有效的风险预测。对四个LLMS的比较评价强调了选择适合具体预测任务的模式的重要性。
Article 82
Title@2025-06-26 (4): Capturing Style in Author and Document Representation
Title: Capturing Style in Author and Document Representation | Stil in der Autor- und Dokumentdarstellung erfassen | 在作者和文件代表中获取样式 2407.13358v2 |
Authors (3): Enzo Terreau, Antoine Gourru, Julien Velcin
A wide range of Deep Natural Language Processing (NLP) models integrate continuous and low-dimensional representations of words and documents. Surprisingly, very few models study representation learning for authors. These representations can be used for many NLP tasks, such as author identification and classification, or in recommendation systems. A strong limitation of existing works is that they do not explicitly capture writing style, making them hardly applicable to literary data. We therefore propose a new architecture based on Variational Information Bottleneck (VIB) that learns embeddings for both authors and documents with a stylistic constraint. Our model fine-tunes a pre-trained document encoder. We stimulate the detection of writing style by adding predefined stylistic features, which makes the representation axes interpretable with respect to writing style indicators. We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship Corpus and IMDb62, for which we show that it matches or outperforms strong recent baselines in authorship attribution while capturing the authors' stylistic aspects much more accurately.
深天然语言处理(NLP)的多种模式包括持续和低维的文字和文件表达方式。 令人惊讶的是,很少有模型为作者进行代表性学习。 这些表达方式可用于许多自然语言处理(NLP)任务,例如作者身份和分类,或建议系统。 对现有作品的强烈限制是,它们没有明确体现写作风格,因此很难适用于文学数据。 因此,我们提议基于动态信息瓶颈(VIB)的新架构,它学习以文体限制的方式嵌入作者和文件。 我们的模型微调是一个经过预先训练的文件编码器。 我们通过添加预定义的文体特征,使代表轴轴在写风格指标方面可以解释,来刺激对写作风格的风格的探测。 我们对三个数据集的方法进行了评估:从古滕贝格项目、博客作者Corpus和IMDb62中提取的文学文集,我们为此表明,它与作者归属的强/中基准相匹配或优于后者,同时更准确地捕捉到作者的文体学方面。
Article 83
Title@2025-06-26 (4): Rapid Gyroscope Calibration: A Deep Learning Approach
Title: Rapid Gyroscope Calibration: A Deep Learning Approach | Schnelle Gyroskop-Kalibrierung: Ein tiefer Lernansatz | 快速热波校准:深学习方法 2409.00488v3 |
Authors (2): Yair Stolero, Itzik Klein
Low-cost gyroscope calibration is essential for ensuring the accuracy and reliability of gyroscope measurements. Stationary calibration estimates the deterministic parts of measurement errors. To this end, a common practice is to average the gyroscope readings during a predefined period and estimate the gyroscope bias. Calibration duration plays a crucial role in performance; therefore, longer periods are preferred. However, some applications require quick startup times, so calibration is allowed only for a short time. In this work, we focus on reducing low-cost gyroscope calibration time using deep learning methods. We propose an end-to-end convolutional neural network for the application of gyroscope calibration. We explore the possibilities of using multiple real and virtual gyroscopes to improve the calibration performance of single gyroscopes. To train and validate our approach, we recorded a dataset consisting of 186.6 hours of gyroscope readings, using 36 gyroscopes of four different brands. We also created a virtual dataset consisting of simulated gyroscope readings. The six datasets were used to evaluate our proposed approach. One of our key achievements in this work is reducing gyroscope calibration time by up to 89% using three low-cost gyroscopes. Our dataset is publicly available to allow reproducibility of our work and to increase research in the field.
低成本陀螺仪的校准对于确保陀螺仪测量的准确性和可靠性至关重要。静态校准用于估计测量误差中的确定性部分。为此,常见做法是在预定时段内对陀螺仪读数取平均,从而估计陀螺仪偏差。校准时长对性能起着关键作用,因此通常倾向于采用较长的时段。然而,某些应用要求快速启动,因此只允许进行短时间校准。在这项工作中,我们致力于利用深度学习方法缩短低成本陀螺仪的校准时间。我们提出了一个用于陀螺仪校准的端到端卷积神经网络,并探索利用多个真实和虚拟陀螺仪来提升单个陀螺仪校准性能的可能性。为了训练和验证我们的方法,我们使用四个不同品牌的36个陀螺仪记录了一个包含186.6小时陀螺仪读数的数据集,还创建了一个由模拟陀螺仪读数组成的虚拟数据集。这六个数据集被用于评估我们提出的方法。这项工作的一项关键成果是,使用三个低成本陀螺仪可将陀螺仪校准时间最多缩短89%。我们的数据集已公开,以便复现我们的工作并推动该领域的研究。
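The averaging baseline mentioned in this abstract, and the reason longer calibration windows help it, can be written down in a few lines. The sketch assumes independent, zero-mean measurement noise, which is a simplification; real low-cost gyroscopes also show drift and correlated noise, and the function names are illustrative.

```python
import math
import statistics


def estimate_bias(stationary_readings):
    """Baseline stationary calibration: the bias estimate is the mean reading."""
    return statistics.fmean(stationary_readings)


def standard_error(noise_std, n_samples):
    """Standard deviation of the mean of n independent samples."""
    return noise_std / math.sqrt(n_samples)


readings = [0.011, 0.009, 0.012, 0.010, 0.008]   # rad/s, sensor at rest
print(estimate_bias(readings))

# Under the white-noise assumption, 100x more samples gives a 10x smaller error
print(standard_error(0.01, 100), standard_error(0.01, 10_000))
```

Under this simple model, keeping only about 11% of the averaging window would roughly triple the standard error of the bias estimate, which gives a sense of what a learned calibrator has to compensate for when the calibration time is cut sharply.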
Article 84
Title@2025-06-26 (4): Complexity-aware fine-tuning
Title: Complexity-aware fine-tuning | Komplexitätsbewusste Feinabstimmung | 复杂度认知微调 2506.21220v1 |
Authors (5): Andrey Goncharov, Daniil Vyazhev, Petr Sychev, Edvard Khalafyan, Alexey Zaytsev
General-purpose Large Language Models (LLMs) are frequently fine-tuned through supervised fine-tuning (SFT) to enhance performance in specific domains. Better results can be achieved by distilling the chain-of-thought of a larger model, at the cost of numerous expensive calls and a much greater amount of data. We propose a novel blueprint for efficient fine-tuning that uses reasoning only for complex data identified by entropy. Specifically, across two small open models (≈3B) we split the training data into complexity categories by single-token answer entropy (ROC AUC 0.73), fine-tune the LLMs via SFT and distillation, and show that our pipeline significantly outperforms the standard SFT approach (0.55 vs 0.43 average accuracy) and provides performance comparable with distillation while using 62% less data (0.55 average accuracy for both). We publish our code and data to facilitate further research in this direction.
通用大语言模型(LLMS)经常通过监督的微调(SFT)进行微调,以提高特定领域的绩效,通过提炼大型模型的思维链,以大量昂贵的电话和大量数据为代价,可以取得更好的结果。我们提出了一个高效微调的新蓝图,该蓝图只对通过英特罗普查明的复杂数据进行推理。具体地说,在两个小型开放模型($\approx 3B$)中,我们将培训数据分成复杂的类别,通过SFT和蒸馏,将一个单一的象征性答录(ROC ACUC 0.73美元)、微调大型语言模型(LLMS)进行精细调,显示我们的输油管大大超过标准SFT方法(0.55美元比0.43美元平均精度),并在使用62美元减去数据(平均精确度(0.55美元)的同时提供可比较的蒸馏性能。我们公布我们的代码和数据,以便利在这方面进行进一步的研究。
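The entropy-based split described in this abstract can be sketched as follows: compute the Shannon entropy of the model's distribution over a single answer token and bucket each example by it. The number of buckets and the thresholds `low` and `high` are hypothetical; the abstract does not give the values used.

```python
import math


def answer_entropy(probs):
    """Shannon entropy (in nats) of a distribution over candidate answer tokens."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def complexity_bucket(probs, low=0.3, high=0.9):
    """Assign an example to a complexity category; thresholds are hypothetical."""
    h = answer_entropy(probs)
    if h < low:
        return "easy"
    if h < high:
        return "medium"
    return "hard"


print(complexity_bucket([0.97, 0.01, 0.01, 0.01]))   # confident answer
print(complexity_bucket([0.25, 0.25, 0.25, 0.25]))   # maximally uncertain answer
```

In a pipeline of the kind the abstract outlines, low-entropy examples would go to plain SFT and only the high-entropy ones would receive the more expensive reasoning-based treatment.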
Article 85
Title@2025-06-26 (4): Balancing Privacy, Robustness, and Efficiency in Machine Learning
Title: Balancing Privacy, Robustness, and Efficiency in Machine Learning | Ausbalancierende Privatsphäre, Robustheit und Effizienz im maschinellen Lernen | 平衡隐私、强健和机器学习效率 2312.14712v3 |
Authors (3): Youssef Allouah, Rachid Guerraoui, John Stephan
This position paper argues that achieving robustness, privacy, and efficiency simultaneously in machine learning systems is infeasible under prevailing threat models. The tension between these goals arises not from algorithmic shortcomings but from structural limitations imposed by worst-case adversarial assumptions. We advocate for a systematic research agenda aimed at formalizing the robustness-privacy-efficiency trilemma, exploring how principled relaxations of threat models can unlock better trade-offs, and designing benchmarks that expose rather than obscure the compromises made. By shifting focus from aspirational universal guarantees to context-aware system design, the machine learning community can build models that are truly appropriate for real-world deployment.
本立场文件认为,在现行威胁模式下,在机器学习系统中同时实现稳健性、隐私和效率是行不通的。 这些目标之间的紧张关系并非源于算法缺陷,而是由于最坏的对抗性假设造成的结构性限制。 我们主张系统研究议程,旨在正式确定稳健性-隐私-效率三重力,探索威胁模式的原则性放松如何能够打开更好的权衡,并设计暴露而不是掩盖妥协的基准。 通过将重点从抱负性普遍保障转向环境意识系统设计,机器学习界可以构建真正适合现实世界部署的模式。
Article 86
Title@2025-06-26 (4): Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
Title: Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? | Kausale Vernunft in großen Sprachmodellen enthüllen: Realität oder Mirage? | 大语言模型中未解的因果理由:现实还是幻影? 2506.21215v1 |
Authors (8): Haoang Chi, He Li, Wenjing Yang, Feng Liu, Long Lan, Xiaoguang Ren, Tongliang Liu, Bo Han
Causal reasoning capability is critical in advancing large language models (LLMs) toward strong artificial intelligence. While versatile LLMs appear to have demonstrated capabilities in understanding contextual causality and providing responses that obey the laws of causality, it remains unclear whether they perform genuine causal reasoning akin to humans. However, current evidence indicates the contrary. Specifically, LLMs are only capable of performing shallow (level-1) causal reasoning, primarily attributed to the causal knowledge embedded in their parameters, but they lack the capacity for genuine human-like (level-2) causal reasoning. To support this hypothesis, methodologically, we delve into the autoregression mechanism of transformer-based LLMs, revealing that it is not inherently causal. Empirically, we introduce a new causal Q&A benchmark called CausalProbe-2024, whose corpora are fresh and nearly unseen for the studied LLMs. The LLMs exhibit a significant performance drop on CausalProbe-2024 compared to earlier benchmarks, indicating the fact that they primarily engage in level-1 causal reasoning. To bridge the gap towards level-2 causal reasoning, we draw inspiration from the fact that human reasoning is usually facilitated by general knowledge and intended goals. We propose G^2-Reasoner, a method that incorporates general knowledge and goal-oriented prompts into LLMs’ causal reasoning processes. Experiments demonstrate that G^2-Reasoner significantly enhances LLMs’ causal reasoning capability, particularly in fresh and counterfactual contexts. This work sheds light on a new path for LLMs to advance towards genuine causal reasoning, going beyond level-1 and making strides towards level-2.
原因推理能力对于推动大型语言模型(LLMS)获得强大的人工智能至关重要。尽管多功能LMS在理解背景因果关系和提供符合因果关系法则的响应方面表现出了能力,但是仍然不清楚它们是否具有与人类相似的真正因果关系推理。然而,目前的证据表明相反。具体地说,LMS只能进行浅(一级)因果推理,主要归因于其参数所包含的因果知识,但它们缺乏真正人性(二级)因果推理的能力。为了支持这一假设,从方法上看,我们进入基于变压LMS的自动反反向推理机制,揭示它并非必然的因果关系。我们引入了一个称为Causal Pro-2024的新的因果推理基准,其体对所研究的LMSMs来说是新鲜而几乎看不见的。LMS与先前的基准相比,其表现显著下降,表明它们主要参与一级(二级)因果推理学。为了弥补以2级为基础的因果关系推理学差距,我们从以下事实中汲取灵感,即人类推理学通常通过一般的了解和预期性推理推理学水平,将GLMSLMS的推理学能力转化为推理学。
Article 87
Title@2025-06-26 (4): Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs
Title: Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs | Unüberwachtes Lernen für optimale Verkehrsplanungsvorhersage zwischen unausgewogenen Graphen | 未受监督的优化交通学习计划预测 2506.12025v2 |
Authors (3): Sonia Mazelet, Rémi Flamary, Bertrand Thirion
Optimal transport between graphs, based on Gromov-Wasserstein and other extensions, is a powerful tool for comparing and aligning graph structures. However, solving the associated non-convex optimization problems is computationally expensive, which limits the scalability of these methods to large graphs. In this work, we present Unbalanced Learning of Optimal Transport (ULOT), a deep learning method that predicts optimal transport plans between two graphs. Our method is trained by minimizing the fused unbalanced Gromov-Wasserstein (FUGW) loss. We propose a novel neural architecture with cross-attention that is conditioned on the FUGW tradeoff hyperparameters. We evaluate ULOT on synthetic stochastic block model (SBM) graphs and on real cortical surface data obtained from fMRI. ULOT predicts transport plans with competitive loss up to two orders of magnitude faster than classical solvers. Furthermore, the predicted plan can be used as a warm start for classical solvers to accelerate their convergence. Finally, the predicted transport plan is fully differentiable with respect to the graph inputs and FUGW hyperparameters, enabling the optimization of functionals of the ULOT plan.
基于 Gromov-Wasserstein 和其他扩展, 图表之间最优化的迁移是比较和调整图形结构的有力工具。 然而, 解决相关的非convex优化问题在计算上是昂贵的, 这限制了这些方法的可缩进性。 在这项工作中, 我们展示了最佳运输不均的学习方法( ULOT ) , 这是预测两个图表之间最佳运输计划的深层次学习方法。 我们的方法是通过最大限度地减少导出不平衡的Gromov- Wasserstein (FUGW) 损失来培训的。 我们提出一个具有交叉注意的新型神经结构, 以 FUGW 交换超参数为条件。 我们评估合成合成相切碎块模型(SBM) 图形和从 fMRI 获得的真皮层表面数据 ULOT 。 ULOT 预测的运输计划有竞争性损失,其规模可达两个级, 比古典求解算器更快。 此外, 预测的计划可以用作古典求解解的热开始加速其趋趋同的加速趋同器。 最后, 预测的运输计划与功能优化计划完全不同, 使OBODLGUW 最优化计划能化计划与ULGPALPALPALPALPALPALPAL。
Article 88
Title@2025-06-26 (4): LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey
Title: LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey | LLM-basierte human-agente Kooperations- und Interaktionssysteme: Eine Umfrage | 以LLM为基础的人类-机构协作和互动系统:调查 2505.00753v4 |
Authors (15): Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Dongyuan Li, Renhe Jiang, Xue Liu, Philip S. Yu
Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world applications. To overcome these limitations, LLM-based human-agent systems (LLM-HAS) incorporate human-provided information, feedback, or control into the agent system to enhance system performance, reliability and safety. These human-agent collaboration systems enable humans and LLM-based agents to collaborate effectively by leveraging their complementary strengths. This paper provides the first comprehensive and structured survey of LLM-HAS. It clarifies fundamental concepts, systematically presents core components shaping these systems, including environment & profiling, human feedback, interaction types, orchestration and communication, explores emerging applications, and discusses unique challenges and opportunities arising from human-AI collaboration. By consolidating current knowledge and offering a structured overview, we aim to foster further research and innovation in this rapidly evolving interdisciplinary field. Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems.
大型语言模型(LLM)的最新进展激发了人们对构建完全自主智能体的日益浓厚的兴趣。然而,完全自主的基于LLM的智能体仍面临重大挑战,包括幻觉导致的可靠性有限、难以处理复杂任务,以及显著的安全和伦理风险,这些都限制了它们在现实应用中的可行性和可信度。为克服这些局限,基于LLM的人机智能体系统(LLM-HAS)将人类提供的信息、反馈或控制纳入智能体系统,以提升系统的性能、可靠性和安全性。这类人与智能体协作系统使人类和基于LLM的智能体能够发挥各自的互补优势,开展有效协作。本文首次对LLM-HAS进行了全面而结构化的综述:阐明基本概念,系统介绍构成这些系统的核心组成部分(包括环境与画像、人类反馈、交互类型、编排与通信),探讨新兴应用,并讨论人与AI协作带来的独特挑战和机遇。通过整合现有知识并提供结构化概览,我们希望促进这一快速发展的跨学科领域的进一步研究与创新。论文列表和资源见 https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems。
Article 89
Title@2025-06-26 (4): Seal Your Backdoor with Variational Defense
Title: Seal Your Backdoor with Variational Defense | Versiegeln Sie Ihre Hintertür mit abwechslungsreicher Verteidigung | 以不同防御方式密封你的后门 2503.08829v2 |
Authors (3): Ivan Sabolić, Matej Grcić, Siniša Šegvić
We propose VIBE, a model-agnostic framework that trains classifiers resilient to backdoor attacks. The key concept behind our approach is to treat malicious inputs and corrupted labels from the training dataset as observed random variables, while the actual clean labels are latent. VIBE then recovers the corresponding latent clean label posterior through variational inference. The resulting training procedure follows the expectation-maximization (EM) algorithm. The E-step infers the clean pseudolabels by solving an entropy-regularized optimal transport problem, while the M-step updates the classifier parameters via gradient descent. Being modular, VIBE can seamlessly integrate with recent advancements in self-supervised representation learning, which enhance its ability to resist backdoor attacks. We experimentally validate the method's effectiveness against contemporary backdoor attacks on standard datasets, a large-scale setup with 1k classes, and a dataset poisoned with multiple attacks. VIBE consistently outperforms previous defenses across all tested scenarios.
我们建议VIBE, 这是一种模型- 不可知性框架, 用来训练能抵御后门攻击的分类人员。 我们方法的关键概念是将培训数据集中的恶意输入和腐蚀标签作为观察到的随机变量处理, 而实际的清洁标签则是潜伏的。 然后VIBE通过变推法恢复相应的潜在清洁标签后背体。 由此产生的培训程序遵循了预期- 最大化算法。 电子步骤通过解决一个加密常规化的最佳运输问题来推断清洁的假标签, 而 M 步骤则通过梯度下移更新分类参数。 成为模块, VIBE 能够与最近自我监督演示学习的进展无缝地融合, 从而增强它抵抗后门攻击的能力。 我们实验性地验证了对标准数据集的当代后门攻击的方法有效性, 一种规模为1千元的大型设置, 以及一个带有多重攻击毒害的数据集。 VIBE 在所有测试的情景中, 始终超越了先前的防御。
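The E-step mentioned in the VIBE abstract (pseudolabels from an entropy-regularized optimal transport problem) can be illustrated with plain Sinkhorn iterations. This is only a minimal sketch under stated assumptions, not the authors' implementation: the cost is taken to be the negative log-probability of the classifier, every sample carries equal mass, and the names `sinkhorn_pseudolabels`, `class_marginal`, and `epsilon` are hypothetical.

```python
import math

def sinkhorn_pseudolabels(log_probs, class_marginal, epsilon=0.5, n_iters=200):
    """Entropy-regularized OT between n samples and k classes (illustrative).

    log_probs[i][j]: classifier log-probability of class j for sample i
    (assumed cost = -log_probs). class_marginal[j]: assumed target share of
    class j. Returns one soft pseudolabel distribution per sample.
    """
    n, k = len(log_probs), len(class_marginal)
    # Gibbs kernel exp(-cost / epsilon) = exp(log_prob / epsilon)
    K = [[math.exp(lp / epsilon) for lp in row] for row in log_probs]
    u, v = [1.0] * n, [1.0] * k
    row_mass = 1.0 / n
    for _ in range(n_iters):
        # Rescale rows so that each sample carries mass 1/n
        u = [row_mass / sum(K[i][j] * v[j] for j in range(k)) for i in range(n)]
        # Rescale columns so that class j receives mass class_marginal[j]
        v = [class_marginal[j] / sum(K[i][j] * u[i] for i in range(n))
             for j in range(k)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Normalize each row of the transport plan into a label distribution
    return [[p / sum(row) for p in row] for row in plan]

# Hypothetical classifier outputs for four samples and two classes
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
labels = sinkhorn_pseudolabels(
    [[math.log(p) for p in row] for row in probs], class_marginal=[0.5, 0.5]
)
```

With a balanced class marginal, the transport constraint pushes part of the mass of the first three samples toward the second class, which is the kind of correction a plain argmax over classifier outputs would not make.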
Article 90
Title@2025-06-26 (4): Artificial Delegates Resolve Fairness Issues in Perpetual Voting with Partial Turnout
Title: Artificial Delegates Resolve Fairness Issues in Perpetual Voting with Partial Turnout | Künstliche Delegierte lösen Fairness-Probleme bei der ständigen Abstimmung mit teilweiser Wahlbeteiligung | 持部分投票票的永久表决中的人造代表解决公平问题 2506.21186v1 |
Authors (4): Apurva Shah, Axel Abels, Ann Nowé, Tom Lenaerts
Perpetual voting addresses fairness in sequential collective decision-making by evaluating representational equity over time. However, existing perpetual voting rules rely on full participation and complete approval information, assumptions that rarely hold in practice, where partial turnout is the norm. In this work, we study the integration of Artificial Delegates, preference-learning agents trained to represent absent voters, into perpetual voting systems. We examine how absenteeism affects fairness and representativeness under various voting methods and evaluate the extent to which Artificial Delegates can compensate for missing participation. Our findings indicate that while absenteeism significantly affects fairness, Artificial Delegates reliably mitigate these effects and enhance robustness across diverse scenarios.
长期投票通过长期评估代表性公平性,处理顺序集体决策的公平性,但现行永久投票规则依赖于全面参与和完整的核准信息,而这种假设在实践中很少得到,而部分投票是常态。在这项工作中,我们研究将人工代表、经过培训代表缺席选民的偏好学习人员纳入永久投票系统的问题。我们研究在各种投票方法下缺勤如何影响公平和代表性,并评估人工代表对缺失参与的补偿程度。我们的调查结果表明,缺勤虽然严重影响公平性,但人工代表可靠地减轻这些影响,加强各种情景的稳健性。
Article 91
Title@2025-06-26 (4): PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp
Title: PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp | PCF-Grasp: Umwandlung der Punktvervollständigung in Geometrie-Feature zur Verbesserung der 6-DoF-Grasp | PCF-格拉斯普:将完成点转换成几何特征,以加强6-DoF格拉斯普 2504.16320v2 |
Authors (7): Yaofeng Cheng, Fusheng Zha, Wei Guo, Pengfei Wang, Chao Zeng, Lining Sun, Chenguang Yang
The 6-Degree of Freedom (DoF) grasp method based on point clouds has shown significant potential in enabling robots to grasp target objects. However, most existing methods are based on the point clouds (2.5D points) generated from single-view depth images. These point clouds cover only one surface side of the object and thus provide incomplete geometry information, which misleads the grasping algorithm when it judges the shape of the target object and results in low grasping accuracy. Humans can accurately grasp objects from a single view by leveraging their geometry experience to estimate object shapes. Inspired by humans, we propose a novel 6-DoF grasping framework that converts the point completion results into object shape features to train the 6-DoF grasp network. Here, point completion can generate approximately complete points from the 2.5D points, similar to the human geometry experience, and converting them into shape features is the way to utilize them to improve grasp efficiency. Furthermore, due to the gap between the network generation and actual execution, we integrate a score filter into our framework to select more executable grasp proposals for the real robot. This enables our method to maintain a high grasp quality from any camera viewpoint. Extensive experiments demonstrate that utilizing complete point features enables the generation of significantly more accurate grasp proposals and that the inclusion of a score filter greatly enhances the credibility of real-world robot grasping. Our method achieves a success rate 17.8% higher than the state-of-the-art method in real-world experiments.
基于点云的六自由度(6-DoF)抓取方法在使机器人抓取目标物体方面显示出很大的潜力。然而,大多数现有方法都以单视角深度图像生成的点云(2.5D点)为基础。这些点云只包含物体的一个表面,提供的几何信息不完整,从而误导抓取算法对目标物体形状的判断,导致抓取精度较低。人类可以利用自身的几何经验估计物体形状,从而在单一视角下准确抓取物体。受人类启发,我们提出了一个新颖的6-DoF抓取框架,将点云补全结果转换为物体形状特征,用于训练6-DoF抓取网络。在这里,点云补全可以从2.5D点生成近似完整的点,类似于人类的几何经验,而将其转换为形状特征则是利用它来提高抓取效率的方式。此外,由于网络生成与实际执行之间存在差距,我们在框架中集成了一个分数过滤器,为真实机器人选择更可执行的抓取方案。这使我们的方法能够在任何相机视角下保持较高的抓取质量。大量实验表明,利用完整的点特征可以生成明显更准确的抓取方案,而加入分数过滤器则大大提高了真实世界机器人抓取的可信度。在真实世界实验中,我们的方法的成功率比最先进的方法高出17.8%。
Article 92
Title@2025-06-26 (4): Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4
Title: Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4 | Performance-Verbesserung der räumlichen semantischen Segmentierung mit angereicherten Audio-Features und agentenbasierter Fehlerkorrektur für DCASE 2025 Challenge Task 4 | DASAS 2025年挑战任务4,具有浓缩音频特征和以代理物为基础的错误更正的 空间语义分离的性能改进 2506.21174v1 |
Authors (6): Jongyeon Park, Joonhee Lee, Do-Hyeon Lim, Hong Kook Kim, Hyeongcheol Geum, Jeong Eun Lim
This technical report presents submission systems for Task 4 of the DCASE 2025 Challenge. First, the model incorporates additional audio features (spectral roll-off and chroma features) into the embedding feature extracted from the mel-spectral feature to improve the classification capabilities of an audio-tagging model in the spatial semantic segmentation of sound scenes (S5) system. This approach is motivated by the fact that mixed audio often contains subtle cues that are difficult to capture with mel-spectrograms alone. Thus, these additional features offer alternative perspectives for the model. Second, an agent-based label correction system is applied to the outputs processed by the S5 system. This system reduces false positives, improving the final class-aware signal-to-distortion ratio improvement (CA-SDRi) metric. Finally, we refine the training dataset to enhance the classification accuracy of low-performing classes by removing irrelevant samples and incorporating external data. That is, audio mixtures are generated from a limited number of data points; thus, even a small number of out-of-class data points could degrade model performance. The experiments demonstrate that the submitted systems employing these approaches improve CA-SDRi by up to 14.7% in relative terms compared to the baseline of DCASE 2025 Challenge Task 4.
这份技术报告介绍了DCASE 2025挑战(S5) 任务4的提交系统,该模型将更多音频特征(光谱滚动和染色体特征)纳入从Mel光谱特征中提取的嵌入功能,以证明声音场景空间语系分隔(S5)系统中的音频标记模型的分类能力;这一方法的动机是混合音频往往包含难以单独用Mel-spectrogrogram来捕捉的细微提示;因此,这些额外功能为模型提供了交替视角。第二,对S5系统处理的产出应用了基于代理的标签校正系统。这个系统减少了错误的正数,改进了声频信号对扭曲比率的最后等级信号模型(CA-SDRI)的分类能力。最后,我们改进了培训数据集,以提高低性能班级的等级偏差准确度,方法是删除无线-偏差样本和纳入外部数据。这就是,音频混合导导导导来自数量有限的数据点;因此,即使是少量的超出级的SDAS级外位模型,可以将这些SBA+14级基准测试显示这些测试。
Article 93
Title@2025-06-26 (4): Variational Supervised Contrastive Learning
Title: Variational Supervised Contrastive Learning | Variationelles Überwachtes Kontrastuelles Lernen | 差异监督反舞弊学习 2506.07413v2 |
Authors (5): Ziwen Wang, Jiajun Fan, Thao Nguyen, Heng Ji, Ge Liu
Contrastive learning has proven to be highly efficient and adaptable in shaping representation spaces across diverse modalities by pulling similar samples together and pushing dissimilar ones apart. However, two key limitations persist: (1) without explicit regulation of the embedding distribution, semantically related instances can inadvertently be pushed apart unless complementary signals guide pair selection, and (2) excessive reliance on large in-batch negatives and tailored augmentations hinders generalization. To address these limitations, we propose Variational Supervised Contrastive Learning (VarCon), which reformulates supervised contrastive learning as variational inference over latent class variables and maximizes a posterior-weighted evidence lower bound (ELBO) that replaces exhaustive pair-wise comparisons for efficient class-aware matching and grants fine-grained control over intra-class dispersion in the embedding space. Our experiments, conducted exclusively on image data (CIFAR-10, CIFAR-100, ImageNet-100, and ImageNet-1K), show that VarCon (1) achieves state-of-the-art performance for contrastive learning frameworks, reaching 79.36% Top-1 accuracy on ImageNet-1K and 78.29% on CIFAR-100 with a ResNet-50 encoder while converging in just 200 epochs; (2) yields substantially clearer decision boundaries and semantic organization in the embedding space, as evidenced by KNN classification, hierarchical clustering results, and transfer-learning assessments; and (3) demonstrates superior few-shot learning performance compared with the supervised baseline and superior robustness across various augmentation strategies.
事实证明,通过将相似的样本集中起来,将不同的样本分开,在塑造不同模式的代表性空间方面,相互抵触的学习非常高效,而且适应性强。然而,两大限制依然存在:(1) 没有明确监管嵌入分布, 与语义相关的情况可能会被无意地分开,除非补充性信号指南选择对配对;(2) 过度依赖大型批量负值和定制增强度阻碍了总体化。为解决这些限制,我们提议变式监督性差异性差异性学习(Varcon),将监督性对比性学习重新定位为对潜在类变量的变异推论,并最大限度地扩大事后加权证据的下限(ELBO),以取代高效的类认知匹配和对嵌入空间内部分布的细化对比; 完全依靠图像数据培训,我们在CIFAR-10、CIFAR-100、图像网络-100和图像网络-网络-1K的实验表明, Varcon(1) 在对比性学习框架方面达到最先进的业绩,在图像网络-1的精细度上达到79.36 %的顶级和高级精准性证据(ELB) 和高端级递递增的递增的递增性组织的递增性递增性递增性递增性递增性能(KRAR-100) 的递增性能) 的递增性能的递增性决定 。
Article 94
Title@2025-06-26 (4): Moderating the Generalization of Score-based Generative Model
Title: Moderating the Generalization of Score-based Generative Model | Moderierung der Generalisierung des Score-basierten Generativen Modells | 简化基于记分制的通用创制模式 2412.07229v2 |
Authors (7): Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong
Score-based Generative Models (SGMs) have demonstrated remarkable generalization abilities, e.g. generating unseen, but natural data. However, the greater the generalization power, the more likely the unintended generalization, and the more dangerous the abuse. Research on moderated generalization in SGMs remains limited. To fill this gap, we first examine the current ‘gold standard’ in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find it does not work in SGMs. Further analysis of score functions reveals that the MU ‘gold standard’ does not alter the original score function, which explains its ineffectiveness. Based on this insight, we propose the first Moderated Score-based Generative Model (MSGM), which introduces a novel score adjustment strategy that redirects the score function away from undesirable data during the continuous-time stochastic differential equation process. Extensive experimental results demonstrate that MSGM significantly reduces the likelihood of generating undesirable content while preserving high visual quality for normal image generation. Albeit designed for SGMs, MSGM is a general and flexible MU framework that is compatible with diverse diffusion architectures (SGM and DDPM) and training strategies (re-training and fine-tuning), and enables zero-shot transfer of the pre-trained models to downstream tasks, e.g. image inpainting and reconstruction. The code will be shared upon acceptance.
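基于分数的生成模型(SGM)表现出显著的泛化能力,例如生成未见过但自然的数据。然而,泛化能力越强,出现非预期泛化的可能性就越大,滥用的危险也越大。关于SGM中受控泛化的研究仍然有限。为了填补这一空白,我们首先考察了机器遗忘(MU)中当前的“黄金标准”,即在移除不良训练数据后重新训练模型,并发现它在SGM中不起作用。对分数函数的进一步分析表明,MU“黄金标准”并不改变原始的分数函数,这解释了它为何无效。基于这一认识,我们提出了第一个受控的基于分数的生成模型(MSGM),它引入了一种新颖的分数调整策略,在连续时间随机微分方程过程中使分数函数偏离不良数据。大量实验结果表明,MSGM显著降低了生成不良内容的可能性,同时保持了正常图像生成的高视觉质量。尽管是为SGM设计的,MSGM是一个通用且灵活的MU框架,兼容多种扩散架构(SGM和DDPM)和训练策略(重新训练和微调),并支持将预训练模型零样本迁移到下游任务,例如图像修复和重建。代码将在论文被接收后公开。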
Article 95
Title@2025-06-26 (4): Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
Title: Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning | Metis-RISE: RL fördert und verbessert multimodales Reasoning Model Learning | Metis-RISE: RL 激励和SFT加强多模式理由示范学习 2506.13056v2 |
Authors (7): Haibo Qiu, Xiaohan Lan, Fanfan Liu, Xiaohu Sun, Delian Ruan, Peng Shi, Lin Ma
Recent advancements in large language models (LLMs) have witnessed a surge in the development of advanced reasoning paradigms, which are now being integrated into multimodal large language models (MLLMs). However, existing approaches often fall short: methods solely employing reinforcement learning (RL) can struggle with sample inefficiency and activating entirely absent reasoning capabilities, while conventional pipelines that initiate with a cold-start supervised fine-tuning (SFT) phase before RL may restrict the model’s exploratory capacity and face suboptimal convergence. In this work, we introduce Metis-RISE (RL Incentivizes and SFT Enhances) for multimodal reasoning model learning. Unlike conventional approaches, Metis-RISE distinctively omits an initial SFT stage, beginning instead with an RL phase (e.g., using a Group Relative Policy Optimization variant) to incentivize and activate the model’s latent reasoning capacity. Subsequently, the targeted SFT stage addresses two key challenges identified during RL: (1) inefficient trajectory sampling for tasks where the model possesses but inconsistently applies correct reasoning, which we tackle using self-distilled reasoning trajectories from the RL model itself; and (2) fundamental capability absence, which we address by injecting expert-augmented knowledge for prompts where the model entirely fails. This strategic application of RL for incentivization followed by SFT for enhancement forms the core of Metis-RISE, leading to two versions of our MLLMs (7B and 72B parameters). Evaluations on the OpenCompass Multimodal Reasoning Leaderboard demonstrate that both models achieve state-of-the-art performance among similar-sized models, with the 72B version ranking fourth overall. Please refer to our project page for open-source information.
大型语言模型(LLM)的最新进展见证了高级推理范式的迅速发展,这些范式目前正被整合到多模态大型语言模型(MLLM)中。然而,现有方法往往存在不足:仅使用强化学习(RL)的方法可能受困于样本效率低下,且难以激活模型完全不具备的推理能力;而在RL之前先进行冷启动监督微调(SFT)的传统流程,则可能限制模型的探索能力并导致次优收敛。在这项工作中,我们提出了用于多模态推理模型学习的Metis-RISE(RL Incentivizes and SFT Enhances,即RL激励、SFT增强)。与传统方法不同,Metis-RISE省略了初始的SFT阶段,而是从RL阶段开始(例如使用Group Relative Policy Optimization的一个变体),以激励和激活模型潜在的推理能力。随后,有针对性的SFT阶段解决RL期间发现的两个关键挑战:(1)轨迹采样效率低下,即模型具备正确推理能力但未能一致运用的任务,我们利用RL模型自身的自蒸馏推理轨迹加以解决;(2)基础能力缺失,我们通过为模型完全失败的提示注入专家增强的知识加以解决。这种先用RL激励、再用SFT增强的策略构成了Metis-RISE的核心,并产生了两个版本的MLLM(7B和72B参数)。在OpenCompass多模态推理排行榜上的评估表明,两个模型在同等规模的模型中均达到了最先进的性能,其中72B版本总体排名第四。有关开源信息,请参阅我们的项目页面。
Article 96
Title@2025-06-26 (4): Self-Regulated Neurogenesis for Online Data-Incremental Learning
Title: Self-Regulated Neurogenesis for Online Data-Incremental Learning | Selbstregulierte Neurogenese für Online-Daten-Inkrementelles Lernen | 在线数据强化学习自调节神经源 2403.14684v2 |
Authors (4): Murat Onur Yildirim, Elif Ceren Gok Yildirim, Decebal Constantin Mocanu, Joaquin Vanschoren
Neural networks often struggle with catastrophic forgetting when learning sequences of tasks or data streams, unlike humans who can continuously learn and consolidate new concepts even in the absence of explicit cues. Online data-incremental learning seeks to emulate this capability by processing each sample only once, without having access to task or stream cues at any point in time, since this is more realistic compared to offline setups, where all data from novel class(es) is assumed to be readily available. However, existing methods typically rely on storing subsets of data in memory or expanding the initial model architecture, resulting in significant computational overhead. Drawing inspiration from ‘self-regulated neurogenesis’, the brain’s mechanism for creating specialized regions or circuits for distinct functions, we propose a novel approach, SERENA, which encodes each concept in a specialized network path called a ‘concept cell’, integrated into a single over-parameterized network. Once a concept is learned, its corresponding concept cell is frozen, effectively preventing the forgetting of previously acquired information. Furthermore, we introduce two new continual learning scenarios that more closely reflect real-world conditions, characterized by gradually changing sample sizes. Experimental results show that our method not only establishes new state-of-the-art results across ten benchmarks but also remarkably surpasses offline supervised batch learning performance. The code is available at https://github.com/muratonuryildirim/serena.
当学习任务或数据流序列时,神经网络往往与灾难性的忘记而挣扎,而人类在学习任务或数据流序列时,与人类不同,即使没有明确的提示,也能不断学习和整合新概念。在线数据强化学习寻求通过只处理一次样本来效仿这一能力,而没有机会获得任务或流线索,因为这与离线设置相比更加现实,因为与离线设置相比,新类的所有数据都被认为很容易获得。然而,现有方法通常依赖于将数据子集存储在记忆中或扩大初始模型结构,从而产生重要的计算间接费用。从“自我调节神经源生成”的大脑机制中汲取灵感,为不同功能创建专门区域或电路的灵感,我们建议一种新型的SERENA方法,将每个概念编码在称作“感官细胞”的专门网络路径中,并整合到一个单一的超参数化的网络中。一旦一个概念被发现,其相应的概念细胞就会被冻结,从而有效地防止忘记先前获得的信息。此外,我们引入了两种新的持续学习情景,更近地反映现实世界状况,以逐渐变化的样本大小为特征特征特征特征的特征。实验性结果显示我们无法彻底地评估的结果显示。
Article 97
Title@2025-06-26 (4): Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design
Title: Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design | Vielfältige Mini-Batch-Auswahl in Verstärkungs-Lernen für effiziente chemische Exploration in de novo Drug Design | 为在新药设计中进行高效化学勘探而加强学习的多样化小型批次选择 2506.21158v1 |
Authors (5): Hampus Gummesson Svensson, Ola Engkvist, Jon Paul Janet, Christian Tyrchan, Morteza Haghir Chehreghani
In many real-world applications, evaluating the goodness of instances is often costly and time-consuming (e.g., when it requires human feedback or physics simulations), in contrast to proposing new instances. This is even more critical in reinforcement learning, as new interactions with the environment (i.e., new instances) need to be evaluated to provide a reward signal to learn from. As sufficient exploration is crucial, learning from a diverse mini-batch can have a large impact and help mitigate mode collapse. In this paper, we introduce diverse mini-batch selection for reinforcement learning and propose to use determinantal point processes for this task. We study this framework in the context of a real-world problem, namely drug discovery. We experimentally study how our proposed framework can improve the effectiveness of chemical exploration in de novo drug design, where finding diverse and high-quality solutions is essential. We conduct a comprehensive evaluation with three well-established molecular generation oracles over numerous generative steps. Our experiments conclude that our diverse mini-batch selection framework can substantially improve the diversity of the solutions, while still obtaining solutions of high quality. In drug discovery, such an outcome can potentially lead to fulfilling unmet medication needs faster.
在许多现实世界应用中,评估各种事例的好坏往往成本高,而且耗费时间,例如人类反馈和物理模拟,而不是提出新的事例;特别是,这在强化学习方面更为重要,因为需要评估与环境的新互动(即新事例),以提供从中学习的奖励信号。由于充分探索至关重要,从不同的微型批中学习可以产生巨大影响,有助于缓解模式崩溃。在本文件中,我们为强化学习采用不同的小型批量选择,并提议为这项任务使用决定性点进程。我们在现实世界问题(即毒品发现)背景下研究这一框架。我们实验研究我们提议的框架如何提高化学勘探在新药物设计(即寻找多样化和高质量解决方案至关重要)中的效率。我们用三种成熟的分子生成或触法来全面评估许多具有基因特征的步骤。我们的实验结论是,我们多样化的小型批量选择框架可以大大改善解决方案的多样性,同时仍然获得高质量的解决方案。在药物发现中,这种结果有可能导致满足未得到满足的药物需求。
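The idea of using a determinantal point process (DPP) to pick a diverse mini-batch can be sketched as follows. The abstract does not describe the exact sampling procedure, so this example swaps in a simple greedy determinant maximization as an approximation; the kernel `L`, the function names, and the toy values are all hypothetical. Each kernel entry is assumed to be quality_i * similarity_ij * quality_j, so a subset scores highly when its items are both good and mutually dissimilar.

```python
def determinant(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            return 0.0  # singular up to numerical tolerance
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det  # a row swap flips the sign
        det *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return det

def greedy_dpp_select(L, batch_size):
    """Greedily add the item that maximizes det of the selected submatrix."""
    selected = []
    for _ in range(batch_size):
        best_item, best_det = None, -1.0
        for i in range(len(L)):
            if i in selected:
                continue
            idx = selected + [i]
            d = determinant([[L[a][b] for b in idx] for a in idx])
            if d > best_det:
                best_item, best_det = i, d
        selected.append(best_item)
    return selected

# Toy kernel: items 0 and 1 are near-duplicates, item 2 is distinct
L = [
    [1.0, 0.9405, 0.08],
    [0.9405, 0.9025, 0.076],
    [0.08, 0.076, 0.64],
]
batch = greedy_dpp_select(L, batch_size=2)
```

In this toy case the second pick is the lower-quality but distinct item 2 rather than the near-duplicate item 1, because the determinant of the pair {0, 1} is almost zero.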
Article 98
Title@2025-06-26 (4): Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation
Title: Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation | Schätzung von transformerbasierten räumlich-zeitlichen kontrafaktischen Ergebnissen | 以变换器为基础的空间-时-时-时-反事实结果估计 2506.21154v1 |
Authors (6): He Li, Haoang Chi, Mingyu Liu, Wanrong Huang, Liyang Xu, Wenjing Yang
The real world naturally has dimensions of time and space. Therefore, estimating the counterfactual outcomes with spatial-temporal attributes is a crucial problem. However, previous methods are based on classical statistical models, which still have limitations in performance and generalization. This paper proposes a novel framework for estimating counterfactual outcomes with spatial-temporal attributes using the Transformer, exhibiting stronger estimation ability. Under mild assumptions, the proposed estimator within this framework is consistent and asymptotically normal. To validate the effectiveness of our approach, we conduct simulation experiments and real data experiments. Simulation experiments show that our estimator has a stronger estimation capability than baseline methods. Real data experiments provide a valuable conclusion to the causal effect of conflicts on forest loss in Colombia. The source code is available at https://github.com/lihe-maxsize/DeppSTCI_Release_Version-master.
真实世界自然具有时间和空间的维度。 因此,用空间时空属性来估计反事实结果是一个关键问题。 但是,以前的方法是以古典统计模型为基础的,这些模型在性能和一般化方面仍然有局限性。本文件提出了一个新颖的框架,用以利用变形器来估计具有空间时空属性的反事实结果,其估计能力更强。根据温和的假设,在这个框架内拟议的估计数字是一致的,且不时的正常。为了验证我们的方法的有效性,我们进行模拟实验和真实的数据实验。模拟实验表明,我们的估计数据比基线方法有更强的估计能力。真实的数据实验为哥伦比亚森林损失冲突因果关系提供了宝贵的结论。源代码见https://github.com/lihe-maxfard/DeppSTCI_Release_Version-master。
Article 99
Title@2025-06-26 (4): A Novel Federated Learning-Based IDS for Enhancing UAVs Privacy and Security
Title: A Novel Federated Learning-Based IDS for Enhancing UAVs Privacy and Security | Ein neuartiges, lernbasiertes IDS zur Verbesserung der Privatsphäre und Sicherheit von UAVs | 旨在加强无人驾驶航空器隐私和安全的新联邦学习型新学习型ISDS 2312.04135v3 |
Authors (4): Ozlem Ceviz, Pinar Sadioglu, Sevil Sen, Vassilios G. Vassilakis
Unmanned aerial vehicles (UAVs) operating within Flying Ad-hoc Networks (FANETs) encounter security challenges due to the dynamic and distributed nature of these networks. Previous studies focused predominantly on centralized intrusion detection, assuming a central entity responsible for storing and analyzing data from all devices. However, these approaches face challenges including computation and storage costs, along with a single point of failure risk, threatening data privacy and availability. The widespread dispersion of data across interconnected devices underscores the need for decentralized approaches. This paper introduces the Federated Learning-based Intrusion Detection System (FL-IDS), addressing challenges encountered by centralized systems in FANETs. FL-IDS reduces computation and storage costs for both clients and the central server, which is crucial for resource-constrained UAVs. Operating in a decentralized manner, FL-IDS enables UAVs to collaboratively train a global intrusion detection model without sharing raw data, thus avoiding delay in decisions based on collected data, as is often the case with traditional methods. Experimental results demonstrate FL-IDS’s competitive performance with Central IDS (C-IDS) while mitigating privacy concerns, with the Bias Towards Specific Clients (BTSC) method further enhancing FL-IDS performance even at lower attacker ratios. Comparative analysis with traditional intrusion detection methods, including Local IDS (L-IDS), sheds light on the strengths of FL-IDS. This study significantly contributes to UAV security by introducing a privacy-aware, decentralized intrusion detection approach tailored to UAV networks. Moreover, by introducing a realistic dataset for FANETs and federated learning, our approach differs from others lacking high dynamism and 3D node movements or accurate federated data federations.
由于这些网络的动态和分布性质,无人驾驶航空飞行器(无人驾驶飞行器)在飞行Ad-Hoc网络内运作,面临安全挑战。以前的研究主要侧重于中央入侵探测,假设是一个负责储存和分析所有装置数据的中央实体。然而,这些方法面临种种挑战,包括计算和储存成本,以及单一的故障风险、数据隐私和可用性等单一点;不同相互关联的装置之间数据的广泛分散突出表明需要分散做法。本文件介绍了基于学习的联邦入侵发现系统(FL-IDS),以应对FANET系统中央系统遇到的挑战。FL-IDS降低客户和中央服务器的计算和存储成本,这对于资源缺乏的UAVs都至关重要。FL-IDS以分散的方式运作,使UAVs能够合作培训全球入侵探测模型,而无需分享原始数据,从而避免根据所收集的数据做出决策的延误,这通常是传统方法的例子。实验结果表明,FL-IDS在减轻隐私顾虑的同时,取得了与集中式IDS(C-IDS)相当的性能;偏向特定客户端(BTSC)方法即使在攻击者比例较低时也能进一步提升FL-IDS的性能。与包括本地IDS(L-IDS)在内的传统入侵检测方法的比较分析揭示了FL-IDS的优势。本研究提出了一种面向无人机网络、具有隐私意识的去中心化入侵检测方法,对无人机安全作出了重要贡献。此外,我们为FANET和联邦学习引入了贴近现实的数据集,这使我们的方法有别于缺乏高动态性和三维节点运动、或缺乏准确联邦数据划分的其他方法。
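The collaborative training described in the FL-IDS abstract, where UAVs contribute to a global model without sharing raw data, is commonly realized with a weighted average of client parameters (FedAvg). The abstract does not state which aggregation rule FL-IDS or its BTSC variant uses, so the sketch below shows generic FedAvg only; `federated_average` and the toy numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights: one flat list of floats per client (same length each).
    client_sizes: number of local training samples per client.
    Only parameters are exchanged; raw client data never leaves the device.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[p] * size for w, size in zip(client_weights, client_sizes)) / total
        for p in range(n_params)
    ]

# Two hypothetical UAV clients holding 1 and 3 local samples respectively
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by dataset size lets clients with more local traffic records influence the global model more, which is one simple way a server could also bias aggregation toward specific clients.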
Article 100
Title@2025-06-26 (4): Linearity-based neural network compression
Title: Linearity-based neural network compression | Linearitätsbasierte neuronale Netzwerkkompression | 线性神经网络压缩 2506.21146v1 |
Authors (2): Silas Dobler, Florian Lemmerich
In neural network compression, most current methods reduce unnecessary parameters by measuring importance and redundancy. To augment already highly optimized existing solutions, we propose linearity-based compression as a novel way to reduce weights in a neural network. It is based on the intuition that with ReLU-like activation functions, neurons that are almost always activated behave linearly, allowing for merging of subsequent layers. We introduce the theory underlying this compression and evaluate our approach experimentally. Our novel method achieves lossless compression down to 1/4 of the original model size in the majority of tested models. Applying our method to models already pruned with importance-based methods shows very little interference between different types of compression, demonstrating that the techniques can be combined successfully. Overall, our work lays the foundation for a new type of compression method that enables smaller and ultimately more efficient neural network models.
在神经网络压缩中,大多数当前方法都通过测量重要性和冗余度来减少不必要的参数。为了增加已经高度优化的现有解决方案,我们提议以线性压缩作为降低神经网络重量的新方式。它基于直觉,即使用RELU式的激活功能,几乎总是激活的神经元可以线性地运行,从而可以合并随后的层次。我们引入了这种压缩背后的理论,并实验性地评估了我们的方法。我们的新方法在大多数测试过的模型中实现了无损压缩,将原始模型大小降至四分之一。在已经基于重要性的细微模型上应用我们的方法,表明不同类型压缩模型之间的干扰很小,显示了成功组合技术的选择。总体而言,我们的工作为一种新的压缩方法奠定了基础,该方法使得较小并最终更加有效的神经网络模型得以实现。
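The intuition in the linearity-based compression abstract can be made concrete with a small worked example. If every hidden neuron is always active, ReLU acts as the identity and two consecutive layers collapse into one: W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The sketch below assumes this idealized all-active case; how the paper handles neurons that are only almost always active is not described in the abstract, and all names here are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def two_layer_forward(W1, b1, W2, b2, x):
    """Reference network: linear -> ReLU -> linear."""
    hidden = relu([h + b for h, b in zip(matvec(W1, x), b1)])
    return [y + b for y, b in zip(matvec(W2, hidden), b2)]

def merge_linear_layers(W1, b1, W2, b2):
    """Fold W2 (W1 x + b1) + b2 into a single layer W x + b.

    Only valid on inputs where every hidden pre-activation is positive,
    so that ReLU acts as the identity.
    """
    n_hidden, n_in = len(W1), len(W1[0])
    W = [[sum(W2[i][k] * W1[k][j] for k in range(n_hidden)) for j in range(n_in)]
         for i in range(len(W2))]
    b = [sum(W2[i][k] * b1[k] for k in range(n_hidden)) + b2[i]
         for i in range(len(W2))]
    return W, b

W1, b1 = [[1.0, 2.0], [0.0, 1.0]], [1.0, 1.0]
W2, b2 = [[1.0, -1.0]], [0.5]
x = [1.0, 2.0]  # hidden pre-activations are [6.0, 3.0], both positive

W, b = merge_linear_layers(W1, b1, W2, b2)
merged_out = [y + bi for y, bi in zip(matvec(W, x), b)]
original_out = two_layer_forward(W1, b1, W2, b2, x)
```

Here the merged layer stores 3 numbers (a 1x2 matrix and one bias) instead of the 9 held by the two original layers, which is where the size reduction would come from.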
Article 101
Title@2025-06-26 (4): Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion
Title: Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion | Personalisiertes Federated Learning durch Dual-Prompt-Optimierung und Cross Fusion | 通过双速优化和交叉融合进行个性化联邦学习 2506.21144v1 |
Authors (5): Yuguang Zhang, Kuangpu Guo, Zhihe Lu, Yunbo Wang, Jian Liang
Federated learning (FL) enables collaborative model training across decentralized clients without sharing local data, but is challenged by heterogeneity in data, computation, and communication. Pretrained vision-language models (VLMs), with their strong generalization and lightweight tuning via prompts, offer a promising solution. However, existing federated prompt-learning methods rely only on text prompts and overlook joint label-domain distribution shifts. In this paper, we propose a personalized FL framework based on dual-prompt learning and cross fusion, termed pFedDC. Specifically, each client maintains both global and local prompts across vision and language modalities: global prompts capture common knowledge shared across the federation, while local prompts encode client-specific semantics and domain characteristics. Meanwhile, a cross-fusion module is designed to adaptively integrate prompts from different levels, enabling the model to generate personalized representations aligned with each client’s unique data distribution. Extensive experiments across nine datasets with various types of heterogeneity show that pFedDC consistently outperforms state-of-the-art methods.
联邦学习(FL)使分散客户之间的合作模式培训成为合作模式,不分享当地数据,但受到数据、计算和交流方面差异性的挑战。先入为主的视力语言模型(VLM),由于通过提示进行强烈的普及和轻量调,提供了很有希望的解决办法。然而,现有的联邦化快速学习方法仅依靠文字提示,忽视了标签-域分配方面的联合转变。在本文中,我们提议了一个基于双速学习和交叉融合的个性化FL框架,称为PFedDC。具体地说,每个客户在各种愿景和语言模式之间都保持全球和地方的快感:全球快感获取全联邦共享的共同知识,而本地快感则将客户特定的语义和域特性编码。与此同时,一个交叉融合模块的设计是为了适应性地整合不同级别的快感,使模型产生与每个客户的独特数据分布相匹配的个人化表达方式。在九个数据集和各种不同种类的异质性化模型中进行的广泛实验表明,PFDC一贯地超越了全联邦之间共享的状态方法。
Article 102
Title@2025-06-26 (4): Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks
Title: Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks | Generative Adversarial Evasion und Out-of-Distribution-Detection für UAV-Cyber-Attacks | 无人驾驶航空飞行器网络设备生成反向疏散和分配外探测 2506.21142v1 |
Authors (2): Deepak Kumar Panda, Weisi Guo
The growing integration of UAVs into civilian airspace underscores the need for resilient and intelligent intrusion detection systems (IDS), as traditional anomaly detection methods often fail to identify novel threats. A common approach treats unfamiliar attacks as out-of-distribution (OOD) samples; however, this leaves systems vulnerable when mitigation is inadequate. Moreover, conventional OOD detectors struggle to distinguish stealthy adversarial attacks from genuine OOD events. This paper introduces a conditional generative adversarial network (cGAN)-based framework for crafting stealthy adversarial attacks that evade IDS mechanisms. We first design a robust multi-class IDS classifier trained on benign UAV telemetry and known cyber-attacks, including Denial of Service (DoS), false data injection (FDI), man-in-the-middle (MiTM), and replay attacks. Using this classifier, our cGAN perturbs known attacks to generate adversarial samples that misclassify as benign while retaining statistical resemblance to OOD distributions. These adversarial samples are iteratively refined to achieve high stealth and success rates. To detect such perturbations, we implement a conditional variational autoencoder (CVAE), leveraging negative log-likelihood to separate adversarial inputs from authentic OOD samples. Comparative evaluation shows that CVAE-based regret scores significantly outperform traditional Mahalanobis distance-based detectors in identifying stealthy adversarial threats. Our findings emphasize the importance of advanced probabilistic modeling to strengthen IDS capabilities against adaptive, generative-model-based cyber intrusions.
无人机日益融入民用空域,这凸显了对具有弹性和智能的入侵探测系统(IDS)的需求,因为传统的异常探测方法往往无法识别新的威胁。一种常见的方法将不熟悉的攻击视为分布外(OOD)样本;然而,这在缓解措施不足时使系统变得脆弱。此外,常规的OOD探测器难以区分隐蔽的对抗性攻击与真正的OOD事件。本文介绍了一个基于条件生成对抗网络(cGAN)的框架,用于构造能够规避IDS机制的隐蔽对抗性攻击。我们首先设计一个稳健的多类IDS分类器,并在良性无人机遥测数据和已知网络攻击上进行训练,这些攻击包括拒绝服务(DoS)、虚假数据注入(FDI)、中间人(MiTM)和重放攻击。利用该分类器,我们的cGAN对已知攻击进行扰动,生成被误分类为良性、同时在统计上仍与OOD分布相似的对抗样本。这些对抗样本经过迭代优化,以实现较高的隐蔽性和成功率。为了检测此类扰动,我们实现了一个条件变分自编码器(CVAE),利用负对数似然将对抗性输入与真实的OOD样本区分开来。比较评估表明,在识别隐蔽的对抗性威胁方面,基于CVAE的遗憾分数明显优于传统的基于马氏距离的探测器。我们的研究结果强调了先进的概率建模对于增强IDS抵御自适应的、基于生成模型的网络入侵的能力的重要性。
Article 103
Title@2025-06-26 (4): Multi-convex Programming for Discrete Latent Factor Models Prototyping
Title: Multi-convex Programming for Discrete Latent Factor Models Prototyping | Multi-convex-Programmierung für diskrete Latent Factor Models Prototyping | Discrete 后端因数模型的多contex 编程程序 2504.01431v2 |
Authors (4): Hao Zhu, Shengchao Yan, Jasper Hoffmann, Joschka Boedecker
Discrete latent factor models (DLFMs) are widely used in various domains such as machine learning, economics, neuroscience, psychology, etc. Currently, fitting a DLFM to some dataset relies on a customized solver for individual models, which requires lots of effort to implement and is limited to the targeted specific instance of DLFMs. In this paper, we propose a generic framework based on CVXPY, which allows users to specify and solve the fitting problem of a wide range of DLFMs, including both regression and classification models, within a very short script. Our framework is flexible and inherently supports the integration of regularization terms and constraints on the DLFM parameters and latent factors, such that the users can easily prototype the DLFM structure according to their dataset and application scenario. We introduce our open-source Python implementation and illustrate the framework in several examples.
在机器学习、经济学、神经科学、心理学等不同领域广泛使用分辨潜伏要素模型。 目前,将DLFM安装到某些数据集中,取决于个人模型的定制求解器,这需要做出大量努力才能实施,并仅限于DLFM模型的特定实例。在本文件中,我们提出了一个基于CVXPY的通用框架,允许用户在一个非常简短的脚本内具体描述和解决包括回归和分类模型在内的多种DLFM模型的合适问题。我们的框架灵活且内在地支持对DLFM参数和潜在因素的正规化条件和限制的整合,使用户能够方便地根据其数据集和应用假想情况对DLFM结构进行原型。我们介绍了开源Python模型的实施,并在几个实例中说明框架。
Article 104
Title@2025-06-26 (4): DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding
Title: DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding | DBConformer: Doppel-Branch-Konvolutionstransformator für EEG-Dekodierung | DBCon前导体: EEG 解码的双相相相电变异变异器 2506.21140v1 |
Authors (6): Ziwei Wang, Hongbin Wang, Tianwang Jia, Xingyi He, Siyang Li, Dongrui Wu
Electroencephalography (EEG)-based brain-computer interfaces (BCIs) transform spontaneous/evoked neural activity into control commands for external communication. While convolutional neural networks (CNNs) remain the mainstream backbone for EEG decoding, their inherently short receptive field makes it difficult to capture long-range temporal dependencies and global inter-channel relationships. Recent CNN-Transformer hybrids (Conformers) partially address this issue, but most adopt a serial design, resulting in suboptimal integration of local and global features, and often overlook explicit channel-wise modeling. To address these limitations, we propose DBConformer, a dual-branch convolutional Transformer network tailored for EEG decoding. It integrates a temporal Conformer to model long-range temporal dependencies and a spatial Conformer to extract inter-channel interactions, capturing both temporal dynamics and spatial patterns in EEG signals. A lightweight channel attention module further refines spatial representations by assigning data-driven importance to EEG channels. Extensive experiments on five motor imagery (MI) datasets and two seizure detection datasets under three evaluation settings demonstrate that DBConformer consistently outperforms 10 competitive baseline models, with over eight times fewer parameters than the high-capacity EEG Conformer baseline. Further, the visualization results confirm that the features extracted by DBConformer are physiologically interpretable and aligned with sensorimotor priors in MI. The superior performance and interpretability of DBConformer make it reliable for robust and explainable EEG decoding. Code is available at https://github.com/wzwvv/DBConformer.
以大脑-计算机界面为基础的大脑-计算机界面(EEG)将自发/反向神经活动转化成外部通信的控制指令。虽然 convolual 神经网络(CNNs)仍然是 EEG解码的主要主干线,但它们固有的短期可接受字段使得难以捕捉长距离时间依赖和全球通道关系,最近CNN-Transfer(Contrads)的混合部分地解决这一问题,但多数人采用序列设计,导致地方和全球特征的不优化整合,并常常忽视明确的通道型模型。为了应对这些限制,我们提议DBConfor, 双布拉茨的神经神经神经网络(CNNs)仍然是为 EEEG解码设计而专门设计的主流主干线连接网络。它将时间连接到远距离时间依赖和空间连接连接到电网际互动中,在EEEEEG频道中,轻度关注模块通过赋予数据驱动的重要性来进一步改进空间表现。在五部运动图像(MI)上进行广泛的实验,在EBC的高级直观基准度上,在EBS-S-Servereal 上,在三种直径前的直径数据定位数据模型下进行持续的直判判分校判分比EB。
Article 105
Title@2025-06-26 (4): Solving Inverse Problem for Multi-armed Bandits via Convex Optimization
Title: Solving Inverse Problem for Multi-armed Bandits via Convex Optimization | Inverses Problem für mehrarmige Banditen durch Convex-Optimierung lösen | 通过 Convex 优化解决多武装强盗的反向问题 2501.18945v3 |
Authors (2): Hao Zhu, Joschka Boedecker
We consider the inverse problem of multi-armed bandits (IMAB) that are widely used in neuroscience and psychology research for behavior modelling. We first show that the IMAB problem is not convex in general, but can be relaxed to a convex problem via variable transformation. Based on this result, we propose a two-step sequential heuristic for (approximately) solving the IMAB problem. We discuss a condition under which our method provides a global solution to the IMAB problem with a certificate, as well as approximations that further save computing time. Numerical experiments indicate that our heuristic method is more robust than directly solving the IMAB problem via repeated local optimization, and can match the performance of Monte Carlo methods in significantly less running time. We provide an implementation of our method based on CVXPY, which allows straightforward application by users not well versed in convex optimization.
我们考虑了在行为模型的神经学和心理学研究中广泛使用的多武装匪徒(IMAB)的反面问题。我们首先发现IMAB问题不是一般的混凝土问题,而是可以通过变异变换而放松到混凝土问题。根据这个结果,我们提出一个两步相继的累赘,以(大约)解决IMAB问题。我们讨论了一个条件,即我们的方法以证书和近似方法为IMAB问题提供全球解决方案,以进一步节省计算时间。数字实验表明,我们的超常方法比通过重复的地方优化直接解决IMAB问题更强大,并且能够在大大缩短的运行时间内实现蒙特卡洛方法的效绩。我们提供了基于CVXPY的方法的实施,该方法允许不精通于 convex优化的用户直接应用。
Article 106
Title@2025-06-26 (4): NaLaFormer: Norm-Aware Linear Attention for Transformer Models
Title: NaLaFormer: Norm-Aware Linear Attention for Transformer Models | NaLaFormer: Norm-Aware Lineare Aufmerksamkeit für Transformer-Modelle | NaLaFormer: 变形模型的诺姆- Aware 线性注意 2506.21137v1 |
Authors (6): Weikang Meng, Yadan Luo, Liangyu Huo, Yaowei Wang, Xin Li, Zheng Zhang
Linear attention has emerged as a viable alternative to softmax attention by reducing complexity from quadratic to linear in sequence length. To preserve two fundamental properties of softmax, non-negativity and entropy reduction, current works employ various linearly separable kernel functions with $L_1$ normalization instead of the softmax operator. However, query norms are neglected by the normalization operation in linear attention, and this degradation leads to a substantial entropy gap. Meanwhile, existing works inhibit negative values of query and key vectors, resulting in missing inner-product interactions after mapping. To address these dual challenges, we propose a novel Norm-Aware Linear Attention mechanism serving to restore norm-guided dynamic spikiness and recover kernel-perturbed norm distributions. Specifically, we first decouple query and key matrices into two components: norm and direction, to achieve norm-aware spikiness control and norm consistency, respectively. We mathematically reveal that the extent of entropy reduction varies with the query norm in softmax normalization, motivating a query-norm aware kernel function for dynamic control over entropy reduction. Furthermore, to ensure norm consistency and enforce non-negativity constraints, we employ a norm-preserving mapping to project all elements of the angular matrix into positive values, leveraging cosine similarity to inhibit dimensions with opposite directions. We conduct extensive experiments demonstrating that NaLaFormer improves performance on vision and language tasks, enhancing both expressiveness and efficiency by up to 4.2%.
线性关注作为软体关注的一种可行替代方法,通过在序列长度上将复杂性从四面形降低到线性。为了维护软体、非惯性和非递减的两种基本特性,目前的工程采用各种线性分离内核功能,以1美元为常规,而不是软体操作者。然而,常规操作在线性关注中忽视了查询规范,这种退化在很大程度上导致一个螺旋差距。与此同时,现有的工程抑制了查询和关键矢量的负值,导致在绘图后出现缺失的内产品互动。为了应对这些双重挑战,我们提议建立一个新的诺姆-Award 线性关注机制,用于恢复规范引导的动态动态螺旋螺旋,并恢复内螺旋内核规范分布。具体地说,我们首先将调调调和关键矩阵分为两个组成部分:规范和方向,以实现规范性螺旋螺旋调和规范一致性控制,并实现规范一致性。我们数学显示,在软体性规范正常化中,鼓励对内核内核内核的内核内核内核功能功能功能,同时显示对内核规范的稳定性的稳定性,同时进行弹性调整。
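The two points the NaLaFormer abstract makes about existing linear attention (linear cost via a separable kernel with $L_1$ normalization, and insensitivity to the query norm) can be seen in a small pure-Python sketch. This is not the paper's mechanism: the ReLU `feature_map` and the function names are assumptions chosen for illustration, standing in for the generic kernelized attention the abstract criticizes.

```python
def feature_map(x):
    # Assumed non-negative kernel feature map (plain ReLU); hypothetical choice.
    return [max(v, 0.0) for v in x]


def quadratic_attention(Q, K, V):
    """Kernelized attention computed token pair by token pair: O(n^2) in length."""
    out = []
    for q in Q:
        fq = feature_map(q)
        scores = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in K]
        total = sum(scores)  # L1 normalization (scores are non-negative)
        weights = [s / total for s in scores]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Same result, but keys and values are summarized once: O(n) in length."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]   # sum_k phi(k) v^T
    ksum = [0.0] * d                      # sum_k phi(k)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for i in range(d):
            ksum[i] += fk[i]
            for j in range(dv):
                kv[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = feature_map(q)
        denom = sum(fq[i] * ksum[i] for i in range(d))
        out.append([sum(fq[i] * kv[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out
```

Because ReLU is positively homogeneous, scaling a query by any positive constant rescales numerator and denominator equally, so the output does not change. Softmax attention, by contrast, becomes sharper as the query norm grows, which is the gap the abstract attributes to the normalization step.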
Article 107
Title@2025-06-26 (4): Inverse Reinforcement Learning via Convex Optimization
Title: Inverse Reinforcement Learning via Convex Optimization | Inverse Verstärkungs-Lernen über Convex-Optimierung | 通过Convex优化化进行反强化学习 2501.15957v2 |
Authors (3): Hao Zhu, Yuan Zhang, Joschka Boedecker
We consider the inverse reinforcement learning (IRL) problem, where an unknown reward function of some Markov decision process is estimated based on observed expert demonstrations. In most existing approaches, IRL is formulated and solved as a nonconvex optimization problem, posing challenges in scenarios where robustness and reproducibility are critical. We discuss a convex formulation of the IRL problem (CIRL) initially proposed by Ng and Russell, and reformulate the problem such that the domain-specific language CVXPY can be applied directly to specify and solve the convex problem. We also extend the CIRL problem to scenarios where the expert policy is not given analytically but by trajectories of state-action pairs, which can be strongly inconsistent with optimality, by augmenting some of the constraints. Theoretical analysis and practical implementation for hyperparameter auto-selection are introduced. This note helps users easily apply CIRL to their problems without background knowledge of convex optimization.
我们考虑了反强化学习(IRL)问题,根据观察到的专家演示,估计了某些Markov决策过程的一个未知的奖励功能。在多数现有办法中,IRL被作为非convex优化问题制定和解决,对稳健性和可复制性至关重要的情景提出了挑战。我们讨论了Ng和Russel最初提出的IRL问题(CIRL)的直截了当的提法,并重塑了问题,使特定域语言CVXPY能够直接用于具体确定和解决convex问题。我们还将CIRL问题扩大到专家政策没有分析,而是作为州-行动对口的轨迹,这与最佳性有很大矛盾,为此增加了一些限制。引入了理论分析和超分计自动选择的实际实施。本说明帮助用户在不掌握对矩对准的背景知识的情况下,方便地应用CIRL解决他们的问题。
Article 108
Title@2025-06-26 (4): Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks
Title: Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks | Curriculum-geführtes Antifragiles Verstärkungslernen für sichere UAV-Dekonfliktion unter Beobachtungs-Raumangriffen | 在观测-空间攻击下安全无人驾驶飞行器消除冲突课程-指导反脆弱强化学习 2506.21129v1 |
Authors (3): Deepak Kumar Panda, Adolfo Perrusquia, Weisi Guo
Reinforcement learning (RL) policies deployed in safety-critical systems, such as unmanned aerial vehicle (UAV) navigation in dynamic airspace, are vulnerable to out-of-distribution (OOD) adversarial attacks in the observation space. These attacks induce distributional shifts that significantly degrade value estimation, leading to unsafe or suboptimal decision making and rendering the existing policy fragile. To address this vulnerability, we propose an antifragile RL framework designed to adapt against a curriculum of incremental adversarial perturbations. The framework introduces a simulated attacker that incrementally increases the strength of observation-space perturbations, which enables the RL agent to adapt and generalize across a wider range of OOD observations and anticipate previously unseen attacks. We begin with a theoretical characterization of fragility, formally defining catastrophic forgetting as a monotonic divergence in value function distributions with increasing perturbation strength. Building on this, we define antifragility as the boundedness of such value shifts and derive adaptation conditions under which forgetting is stabilized. Our method enforces these bounds through iterative expert-guided critic alignment using Wasserstein distance minimization across incrementally perturbed observations. We empirically evaluate the approach in a UAV deconfliction scenario involving dynamic 3D obstacles. Results show that the antifragile policy consistently outperforms standard and robust RL baselines when subjected to both projected gradient descent (PGD) and GPS spoofing attacks, achieving up to 15% higher cumulative reward and over 30% fewer conflict events. These findings demonstrate the practical and theoretical viability of antifragile reinforcement learning for secure and resilient decision-making in environments with evolving threat scenarios.
在动态空气空间的无人驾驶飞行器(UAV)导航等安全临界系统中部署的强化学习(RL)政策,在动态空气空间的无人驾驶飞行器(UAV)导航中,很容易在观测空间出现超分配(OOOD)对抗性攻击。这些攻击导致分布性变化,显著降低价值估计,导致不安全或不优化决策,使现有政策变得脆弱。为解决这一脆弱性,我们提议了一个抗脆弱RL框架,以适应对抗对抗性扰动课程。这个框架引入了一个模拟攻击器,以逐步增强观测-空间扰动的强度,使RL代理在更广泛的OOOD观察中能够适应和概括各种超分配(OOOD)对抗性攻击。我们首先从理论上描述脆弱性的特征,正式将灾难性的遗忘定义为值函数分布的单项差异性差异,使现有的政策更加脆弱。我们以此为基础,将抗脆弱性定义为这种价值变化的界限,使这些遗忘的适应性条件趋于稳定。我们的方法通过反复的专家-指导的批评者校正校正的校正,用瓦列斯坦(VL)的距离最小性最小化地在渐进式的测测测测测测测测测测测测点上三度下,在不断变压的基线上,在持续地测测测地测测测测测测测测测测测测测测测测测测测地的模型中,显示的模型。
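The abstract above measures fragility as the shift of the value distribution under a curriculum of increasing perturbation strength, and antifragility as the boundedness of that shift. The sketch below illustrates only that measurement, under strong simplifying assumptions: value estimates are scalar samples, the Wasserstein-1 distance is computed for equal-size 1-D samples, and all function names are hypothetical rather than taken from the paper.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size empirical 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def perturbation_curriculum(max_eps, stages):
    """Linearly increasing perturbation strengths, ending at max_eps."""
    return [max_eps * (i + 1) / stages for i in range(stages)]


def value_shift_profile(clean_values, attacked_values, curriculum):
    """Distance between clean and attacked value samples at each stage.

    attacked_values(eps) is a user-supplied callable (an assumption of this
    sketch) returning value estimates under perturbation strength eps.
    """
    return [wasserstein_1d(clean_values, attacked_values(eps))
            for eps in curriculum]


def is_bounded(distances, bound):
    """Simplified stand-in for the boundedness notion of antifragility."""
    return all(d <= bound for d in distances)
```

A critic whose value shift keeps growing with the perturbation strength fails the `is_bounded` check once the curriculum is long enough. One whose shift saturates passes it.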
Article 109
Title@2025-06-26 (4): Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments
Title: Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments | Robuster Policy-Switch für Antifragiles Verstärkungslernen für UAV-Deconfliction in Adversarial Environments | 在逆向环境中为无人驾驶航空器消除冲突而进行抗脆弱强化学习的强有力政策转换 2506.21127v1 |
Authors (2): Deepak Kumar Panda, Weisi Guo
The increasing automation of navigation for unmanned aerial vehicles (UAVs) has exposed them to adversarial attacks that exploit vulnerabilities in reinforcement learning (RL) through sensor manipulation. Although existing robust RL methods aim to mitigate such threats, their generalization to out-of-distribution shifts from the optimal value distribution is limited, as they are primarily designed to handle fixed perturbations. To address this limitation, this paper introduces an antifragile RL framework that enhances adaptability to broader distributional shifts by incorporating a switching mechanism based on discounted Thompson sampling (DTS). This mechanism dynamically selects among multiple robust policies to minimize adversarially induced state-action-value distribution shifts. The proposed approach first derives a diverse ensemble of action-robust policies by accounting for a range of perturbations in the policy space. These policies are then modeled as a multi-armed bandit (MAB) problem, where DTS optimally selects policies in response to nonstationary Bernoulli rewards, effectively adapting to evolving adversarial strategies. A theoretical framework is also provided, showing that optimizing the DTS to minimize the overall regret due to distributional shift results in effective adaptation against unseen adversarial attacks, thus inducing antifragility. Extensive numerical simulations validate the effectiveness of the proposed framework in complex navigation environments with multiple dynamic three-dimensional obstacles and with stronger projected gradient descent (PGD) and spoofing attacks. Compared to conventional robust, non-adaptive RL methods, the antifragile approach achieves superior performance, demonstrating shorter navigation path lengths and a higher rate of conflict-free navigation trajectories.
无人机(UAV)导航自动化程度的不断提高,使其暴露于通过操纵传感器来利用强化学习(RL)漏洞的对抗性攻击之下。尽管现有的稳健RL方法旨在缓解此类威胁,但由于它们主要针对固定扰动而设计,对于偏离最优价值分布的分布外偏移,其泛化能力有限。为解决这一局限,本文提出一个抗脆弱RL框架,通过引入基于折扣汤普森采样(DTS)的切换机制,增强对更广泛分布偏移的适应能力。该机制在多个稳健策略之间动态选择,以最小化对抗性诱发的状态-动作-价值分布偏移。所提方法首先通过考虑策略空间中的一系列扰动,得到一组多样化的动作稳健策略。随后将这些策略建模为多臂老虎机(MAB)问题,由DTS针对非平稳伯努利奖励最优地选择策略,从而有效适应不断演变的对抗策略。文中还给出了理论框架:通过优化DTS以最小化分布偏移造成的总体遗憾,可以有效适应未见过的对抗性攻击,从而产生抗脆弱性。大量数值仿真在具有多个动态三维障碍物的复杂导航环境中,以及在更强的投影梯度下降(PGD)攻击和欺骗攻击下,验证了所提框架的有效性。与传统的稳健、非自适应RL方法相比,抗脆弱方法取得了更优的性能,表现为更短的导航路径长度和更高的无冲突导航轨迹比例。
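The switching mechanism described above treats each robust policy as a bandit arm with nonstationary Bernoulli rewards. A minimal sketch of discounted Thompson sampling follows. It assumes a Beta(1, 1) prior, a single discount factor `gamma` applied to every arm at each step, and a caller-supplied `reward_prob(t, arm)` function. These names and defaults are illustrative assumptions, not the paper's configuration.

```python
import random


def discounted_thompson_sampling(reward_prob, n_arms, steps, gamma=0.95, seed=0):
    """Select arms under drifting Bernoulli rewards.

    reward_prob(t, arm) returns the success probability of `arm` at step t.
    Returns the chosen arms and the final discounted success/failure counts.
    """
    rng = random.Random(seed)
    successes = [0.0] * n_arms
    failures = [0.0] * n_arms
    choices = []
    for t in range(steps):
        # Sample a plausible success rate for each arm from its Beta posterior.
        samples = [rng.betavariate(1.0 + successes[a], 1.0 + failures[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1.0 if rng.random() < reward_prob(t, arm) else 0.0
        # Discount all past evidence so that old observations fade out.
        for a in range(n_arms):
            successes[a] *= gamma
            failures[a] *= gamma
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        choices.append(arm)
    return choices, successes, failures
```

Discounting caps the effective evidence per run at roughly `1 / (1 - gamma)` observations, so an arm that was best before an adversary changed strategy loses its advantage within a few dozen steps. In the setting of the abstract, each arm would correspond to one policy from the robust ensemble.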
Article 110
Title@2025-06-26 (4): Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection
Title: Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection | Trade-Off-Grenzen drücken: Kompakte und dennoch effektive Fernerkundungs-Änderungserkennung | 推进贸易-开放边界:结合但有效的遥感变化探测 2506.21109v1 |
Authors (3): Luosheng Xu, Dalin Zhang, Zhaohui Song
Remote sensing change detection is essential for monitoring urban expansion, disaster assessment, and resource management, offering timely, accurate, and large-scale insights into dynamic landscape transformations. While deep learning has revolutionized change detection, the increasing complexity and computational demands of modern models have not necessarily translated into significant accuracy gains. Instead of following this trend, this study explores a more efficient approach, focusing on lightweight models that maintain high accuracy while minimizing resource consumption, which is an essential requirement for on-satellite processing. To this end, we propose FlickCD (the name stands for "quick flick, then get great results"), pushing the boundaries of the performance-resource trade-off. FlickCD introduces an Enhanced Difference Module (EDM) to amplify critical feature differences between temporal phases while suppressing irrelevant variations such as lighting and weather changes, thereby reducing computational costs in the subsequent change decoder. Additionally, the FlickCD decoder incorporates Local-Global Fusion Blocks, leveraging Shifted Window Self-Attention (SWSA) and Enhanced Global Self-Attention (EGSA) to efficiently capture semantic information at multiple scales, preserving both coarse- and fine-grained changes. Extensive experiments on four benchmark datasets demonstrate that FlickCD reduces computational and storage overheads by more than an order of magnitude while achieving state-of-the-art (SOTA) performance or incurring only a minor (<1% F1) accuracy trade-off. The implementation code is publicly available at https://github.com/xulsh8/FlickCD.
对监测城市扩张、灾害评估和资源管理而言,遥感变化探测对于监测城市扩张、灾害评估和资源管理至关重要,它提供了及时、准确和大规模地洞了解动态景观变化的变化变化。虽然深层次的学习使变化的探测发生了革命性的变化,但现代模型日益复杂和计算的需求并不一定转化为显著的准确性收益。本研究没有遵循这一趋势,而是探索一种效率更高的方法,侧重于保持高准确性的轻量模型,同时尽量减少资源消耗,这是卫星处理的一项基本要求。为此,我们提议FlickCD,这意味着快速闪烁,然后取得巨大成果,推动业绩资源交易的界限。FlickCD引入了一个强化差异模块(EDM),以扩大时间阶段之间的关键特征差异,同时抑制灯光和天气变化等不相关的变化,从而减少随后变化的计算成本。此外,FlickCD 解码包含本地-全球贸易变压区,利用已调整窗口自我保存(SWSA)和强化全球自控(EGSA),以在多个尺度上有效收集语系信息,既保持coreal-and-frical-deal-deal dealalalalalalal lax),同时又通过公开进行四级的存储。
Article 111
Title@2025-06-26 (4): Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges
Title: Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges | Unhaltbar: Unpaarte Single-Cell-Multi-Perturbation-Schätzung durch Dual Conditional Diffusion Implizite Brücken | 不持久: 由双条件分解隐形桥进行无压单细胞多扰动估计 2506.21107v1 |
Authors (8): Changxi Chi, Jun Xia, Yufei Huang, Jingbo Zhou, Siyuan Li, Yunfan Liu, Chang Yu, Stan Z. Li
Estimating single-cell responses across various perturbations facilitates the identification of key genes and enhances drug screening, significantly boosting experimental efficiency. However, single-cell sequencing is a destructive process, making it impossible to capture the same cell’s phenotype before and after perturbation. Consequently, data collected under perturbed and unperturbed conditions are inherently unpaired. Existing methods either attempt to forcibly pair unpaired data using random sampling, or neglect the inherent relationship between unperturbed and perturbed cells during the modeling. In this work, we propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions, effectively addressing the challenge of unpaired data. We further interpret this framework as a form of data augmentation. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way, and further incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles. Moreover, gene expression under the same perturbation often varies significantly across cells, frequently exhibiting a bimodal distribution that reflects intrinsic heterogeneity. To capture this, we introduce a more suitable evaluation metric. We propose Unlasting, dual conditional diffusion models that overcome the problem of unpaired single-cell perturbation data and strengthen the model’s insight into perturbations under the guidance of the GRN, with a dedicated mask model designed to improve generation quality by predicting silent genes. In addition, we introduce a biologically grounded evaluation metric that better reflects the inherent heterogeneity in single-cell responses.
在各种扰动中估计单细胞反应的跨度,有助于识别关键基因,提高药物筛选效率。然而,单细胞测序是一个破坏性过程,无法在扰动前后捕捉同一细胞的苯型,因此,在扰动和无扰动条件下收集的数据本质上是无法调节的。现有的方法要么试图使用随机抽样强行配对未受扰动的数据,要么忽视在建模期间未受扰动和受扰动的细胞之间的内在关系。在这项工作中,我们提议了一个基于双分传隐蔽桥梁(DDIB)的框架,以学习不同数据分布之间的绘图,有效地应对未受扰动数据的挑战。因此,我们进一步将这一框架解释为数据增强的一种形式。我们整合了基因调控网络信息,以具有生物意义的方式传播扰动信号,进一步纳入一种用于预测静态基因的掩码模型,提高生成的掩码质量。此外,在相同的扰动中,基因的表达方式往往在不同的细胞之间差异很大,经常展示一种由不同分类的内向的内嵌式的直径,并显示一种双向式的直径流分布。我们提出了一种更深的直径直径直径直径的直径的直径的模型。我们提出了一种更深的直径直径的直径直径的直径的模型,用模型,将一个反向式的测测测测测测测测测测测测测测测测测测测测测测测测测算。
Article 112
Title@2025-06-26 (4): Learning to Skip the Middle Layers of Transformers
Title: Learning to Skip the Middle Layers of Transformers | Lernen, die mittleren Schichten der Transformer zu überspringen | 学习跳过变换器的中层 2506.21103v1 |
Authors (2): Tim Lawson, Laurence Aitchison
Conditional computation is a popular strategy to make Transformers more efficient. Existing methods often target individual modules (e.g., mixture-of-experts layers) or skip layers independently of one another. However, interpretability research has demonstrated that the middle layers of Transformers exhibit greater redundancy, and that early layers aggregate information into token positions. Guided by these insights, we propose a novel architecture that dynamically skips a variable number of layers from the middle outward. In particular, a learned gating mechanism determines whether to bypass a symmetric span of central blocks based on the input, and a gated attention mechanism prevents subsequent tokens from attending to skipped token positions. Residual norms are controlled with a ‘sandwich’ or ‘perilayernorm’ scheme and gate sparsity with an adaptive regularization loss. We had aimed to reduce compute requirements for ‘simpler’ tokens and potentially foster an emergent multi-level representational hierarchy but, at the scales investigated, our approach does not achieve improvements in the trade-off between validation cross-entropy and estimated FLOPs compared to dense baselines with fewer layers. We release our code at https://github.com/tim-lawson/skip-middle.
有条件计算是提高变异器效率的流行战略。 现有方法通常针对单个模块( 如专家层混合) 或互不相干。 然而, 可解释性研究显示, 中变异器的中层表现出更大的冗余, 并且早期的层汇总信息会形成象征位置。 在这些洞察力的指导下, 我们提出了一个新结构, 动态地跳过中外的多层。 特别是, 一个学习的格子机制决定是否绕过基于输入的中央区块对称范围, 以及一个门式注意机制阻止随后的标志进入暂记位置。 剩余规范由“ andwich” 或“ perilaynorom” 方案控制, 门式偏差导致适应性规范损失。 我们的目的是减少对“ 质性” 符号的折数要求, 并可能促进新兴的多层代表等级等级。 但是, 在所调查的尺度上, 我们的方法并没有在验证跨层和估计的FLOPs之间的折算法上实现改进。 我们发布了在 http://gimb/ comb- lab- laps at at- sqours 。
Article 113
Title@2025-06-26 (4): Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning
Title: Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning | Interpretierbares Hierarchisches Konzept durch aufmerksamkeitsorientiertes Graphenlernen | 通过引人指导图表学习推理的可解释的等级概念 2506.21102v1 |
Authors (4): David Debot, Pietro Barbiero, Gabriele Dominici, Giuseppe Marra
Concept-Based Models (CBMs) are a class of deep learning models that provide interpretability by explaining predictions through high-level concepts. These models first predict concepts and then use them to perform a downstream task. However, current CBMs offer interpretability only for the final task prediction, while the concept predictions themselves are typically made via black-box neural networks. To address this limitation, we propose Hierarchical Concept Memory Reasoner (H-CMR), a new CBM that provides interpretability for both concept and task predictions. H-CMR models relationships between concepts using a learned directed acyclic graph, where edges represent logic rules that define concepts in terms of other concepts. During inference, H-CMR employs a neural attention mechanism to select a subset of these rules, which are then applied hierarchically to predict all concepts and the final task. Experimental results demonstrate that H-CMR matches state-of-the-art performance while enabling strong human interaction through concept and model interventions. The former can significantly improve accuracy at inference time, while the latter can enhance data efficiency during training when background knowledge is available.
以概念为基础的模型(CBM)是一类深层次学习模型,通过解释高层次概念的预测提供解释性,这些模型首先预测概念,然后利用这些模型执行下游任务。然而,目前的建立信任措施只为最后任务预测提供解释性,而概念预测本身通常是通过黑箱神经网络作出的。为解决这一局限性,我们提议了等级概念记忆理性(H-CMR),这是一种新的CBM,为概念和任务预测提供了解释性。H-CMR模型在概念之间使用一个有学识的、定向的周期性图表,其边缘代表着从其他概念的角度界定概念的逻辑规则。在推断中,H-CMR使用一个神经关注机制来选择这些规则的一组,然后按等级应用这些规则来预测所有概念和最终任务。实验结果表明,H-CMR与最新业绩相匹配,同时通过概念和模型干预使人类能够进行强有力的互动。前者可以大大提高推论时间的准确性,而后者可以在具备背景知识时提高培训的数据效率。
Article 114
Title@2025-06-26 (4): FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
Title: FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation | FeDa4Fair: Client-Level-Federated Datasets für die Fairness-Bewertung | FeDa4fair:公平评价客户-联邦数据集 2506.21095v1 |
Authors (4): Xenia Heilmann, Luca Corbucci, Mattia Cerrato, Anna Monreale
Federated Learning (FL) enables collaborative model training across multiple clients without sharing clients’ private data. However, fairness remains a key concern, as biases in local clients’ datasets can impact the entire federated system. Heterogeneous data distributions across clients may lead to models that are fairer for some clients than others. Although several fairness-enhancing solutions are present in the literature, most focus on mitigating bias for a single sensitive attribute, typically binary, overlooking the diverse and sometimes conflicting fairness needs of different clients. This narrow perspective can limit the effectiveness of fairness interventions for the different clients. To support more robust and reproducible fairness research in FL, we aim to enable a consistent benchmarking of fairness-aware FL methods at both the global and client levels. In this paper, we contribute in three ways: (1) we introduce FeDa4Fair, a library to generate tabular datasets tailored to evaluating fair FL methods under heterogeneous client bias; (2) we release four bias-heterogeneous datasets and corresponding benchmarks to compare fairness mitigation methods in a controlled environment; (3) we provide ready-to-use functions for evaluating fairness outcomes for these datasets.
联邦学习(FL)使多个客户能够进行合作模式培训,而不必分享客户的私人数据。然而,公平性仍然是一个关键问题,因为当地客户数据集中的偏见会对整个联邦系统产生影响。不同客户之间的数据分布可能形成对一些客户更为公平的模型。虽然文献中存在若干促进公平的解决办法,但大多数侧重于减少单一敏感属性的偏差,通常为二进制,忽视不同客户的不同和有时相互矛盾的公平需要。这种有限的视角可能限制不同客户公平干预的有效性。为了支持在FL进行更稳健和可再现的公平性研究,我们力求在全球和客户两级实现公平意识FL方法的一致基准。在本文件中,我们以三种方式作出贡献:(1) 我们引入FeDa4Faiir,这是一个图书馆,以生成表格数据集,专门评估不同客户的偏差性公平法;(2) 我们发布四套偏向偏差的数据集和相应的基准,以比较受控环境中的公平性缓解方法;(3) 我们提供随时可用的功能,用以评价这些数据的公平性结果。
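The abstract's distinction between global and client-level fairness can be made concrete with a small sketch. It uses the demographic parity difference (the gap in positive-prediction rates between groups) as the metric. The function names and the input layout are assumptions made for illustration and are not the FeDa4Fair API.

```python
def demographic_parity_difference(predictions, sensitive):
    """Largest gap in positive-prediction rate between sensitive groups."""
    by_group = {}
    for y_hat, group in zip(predictions, sensitive):
        by_group.setdefault(group, []).append(y_hat)
    rates = [sum(values) / len(values) for values in by_group.values()]
    return max(rates) - min(rates)


def client_level_report(clients):
    """Per-client and pooled ("global") demographic parity difference.

    clients maps a client name to a (predictions, sensitive_attribute) pair;
    this layout is a hypothetical convention for the sketch.
    """
    report = {name: demographic_parity_difference(preds, groups)
              for name, (preds, groups) in clients.items()}
    pooled_preds = [y for preds, _ in clients.values() for y in preds]
    pooled_groups = [g for _, groups in clients.values() for g in groups]
    report["global"] = demographic_parity_difference(pooled_preds, pooled_groups)
    return report
```

With two clients whose biases point in opposite directions, the pooled metric can be zero while each client sees the maximum disparity. This is the kind of heterogeneous client bias that a global-only evaluation would miss.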
Article 115
Title@2025-06-26 (4): Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection
Title: Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection | Chain-of-Thought verbesserte Shallow Transformer für drahtlose Symbolerkennung | 用于无线探测无线符号探测的 研究链强化浅ow变压器 2506.21093v1 |
Authors (4): Li Fan, Peng Wang, Jing Yang, Cong Shen
Transformers have shown potential in solving wireless communication problems, particularly via in-context learning (ICL), where models adapt to new tasks through prompts without requiring model updates. However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs. In this work, we propose CHain Of thOught Symbol dEtection (CHOOSE), a CoT-enhanced shallow Transformer framework for wireless symbol detection. By introducing autoregressive latent reasoning steps within the hidden space, CHOOSE significantly improves the reasoning capacity of shallow models (1-2 layers) without increasing model depth. This design enables lightweight Transformers to achieve detection performance comparable to much deeper models, making them well-suited for deployment on resource-constrained mobile devices. Experimental results demonstrate that our approach outperforms conventional shallow Transformers and achieves performance comparable to that of deep Transformers, while maintaining storage and computational efficiency. This represents a promising direction for implementing Transformer-based algorithms in wireless receivers with limited computational resources.
变异器在解决无线通信问题方面显示出潜力,尤其是通过内流学习(ICL),模型通过不要求更新模型的提示而适应新的任务。然而,以前以ICL为基础的变异器模型依靠多层的深层结构才能达到令人满意的性能,从而导致大量的存储和计算成本。在这项工作中,我们提议采用高压压符号符号模拟(CHOOSE)的CHain,这是COT强化的浅质变器框架,用于无线符号检测。通过在隐藏空间引入自动递增潜潜潜推步骤,CHOOSE在不增加模型深度的情况下大大提高了浅层模型(1-2层)的推理能力。这一设计使轻质变异器能够取得与更深层模型相近的探测性能,使其非常适合在资源限制的移动装置上部署。实验结果表明,我们的方法在保持存储和计算效率的同时,比深层变异器的性能超过了常规浅变异器,并取得了与深层变异器的性能。这代表着在无线式接收器中采用基于变异器的极有希望的方向。
Article 116
Title@2025-06-26 (4): CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions
Title: CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | CovDocker: Benchmarking Covalent Drug Design mit Aufgaben, Datensätzen und Lösungen | CovDocker:用任务、数据集和解决办法确定共价药物设计基准 2506.21085v1 |
Authors (7): Yangzhe Peng, Kaiyuan Gao, Liang He, Yuheng Cong, Haiguang Liu, Kun He, Lijun Wu
Molecular docking plays a crucial role in predicting the binding mode of ligands to target proteins, and covalent interactions, which involve the formation of a covalent bond between the ligand and the target, are particularly valuable due to their strong, enduring binding nature. However, most existing docking methods and deep learning approaches hardly account for the formation of covalent bonds and the associated structural changes. To address this gap, we introduce a comprehensive benchmark for covalent docking, CovDocker, which is designed to better capture the complexities of covalent binding. We decompose the covalent docking process into three main tasks: reactive location prediction, covalent reaction prediction, and covalent docking. By adapting state-of-the-art models, such as Uni-Mol and Chemformer, we establish baseline performances and demonstrate the effectiveness of the benchmark in accurately predicting interaction sites and modeling the molecular transformations involved in covalent binding. These results confirm the role of the benchmark as a rigorous framework for advancing research in covalent drug design. It underscores the potential of data-driven approaches to accelerate the discovery of selective covalent inhibitors and addresses critical challenges in therapeutic development.
分子对接法在预测对蛋白质的捆绑模式和共价互动方面发挥着至关重要的作用,共价对接法涉及在对蛋白质和目标之间形成共价联结,由于它们具有很强的、持久的绑定性质,这些互动特别宝贵。然而,大多数现有的对接方法和深层学习方法几乎无法说明共价债券的形成和相关结构变化。为弥补这一差距,我们为共价对接法引入了一个综合基准CovDocker,目的是更好地捕捉共价对接的复杂性。我们把共价对接程序分解为三大主要任务:反应定位预测、共价反应预测和共价对接法。我们调整了诸如Uni-Mol和Chemorth等最新模型,建立了基线性能,并展示了基准在准确预测互动点和对共价约束所涉分子变化进行建模方面的有效性。这些结果证实了该基准作为推进对共价药物设计研究的严格框架的作用。我们强调了数据压制方法在加速选择性共价研究方面的潜力。
Article 117
Title@2025-06-26 (4): EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
Title: EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception | EgoAdapt: Adaptive multisensorische Destillation und politisches Lernen für eine effiziente egozentrische Wahrnehmung | EgoAdapt: 适应性多感性蒸馏和政策学习,促进高效率的以地球为中心感知 2506.21080v1 |
Authors (10): Sanjoy Chowdhury, Subrata Biswas, Sayan Nag, Tushar Nagarajan, Calvin Murdock, Ishwarya Ananthabhotla, Yijun Qian, Vamsi Krishna Ithapu, Dinesh Manocha, Ruohan Gao
Modern perception models, particularly those designed for multisensory egocentric tasks, have achieved remarkable performance but often come with substantial computational costs. These high demands pose challenges for real-world deployment, especially in resource-constrained environments. In this paper, we introduce EgoAdapt, a framework that adaptively performs cross-modal distillation and policy learning to enable efficient inference across different egocentric perception tasks, including egocentric action recognition, active speaker localization, and behavior anticipation. Our proposed policy module is adaptable to task-specific action spaces, making it broadly applicable. Experimental results on three challenging egocentric datasets (EPIC-Kitchens, EasyCom, and Aria Everyday Activities) demonstrate that our method significantly enhances efficiency, reducing GMACs by up to 89.09%, parameters by up to 82.02%, and energy by up to 9.6x, while remaining on par with, and in many cases outperforming, the corresponding state-of-the-art models.
现代认知模型,特别是那些设计用于多感性自我中心任务的模型,已经取得了显著的成绩,但往往需要大量计算成本。这些高需求对现实世界的部署提出了挑战,特别是在资源受限制的环境中。在本文中,我们引入了EgoAdapt,这是一个适应性地进行跨模式蒸馏和政策学习的框架,以便在不同的自我中心认知任务中进行有效的推论,包括自我中心行动识别、积极的演讲者定位和行为预测。我们拟议的政策模块适应特定任务的行动空间,使其广泛适用。三种具有挑战性的自我中心数据集EPIC-Kitchens、EaserCom和Aria的实验结果表明,我们的方法极大地提高了效率,将GMAC降低到89.09%,将参数降低到82.02 %,将能源降低到9.6x,而与此同时,在很多情况下,相应的州级模型的表现仍然优于业绩。
Article 118
Title@2025-06-26 (4): Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games
Title: Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games | Homogenisierung von Multi-Agent-Learning-Dynamik in Finite-State Markov Spiele | 在Finite- State-Markov运动会中多剂学习动态的同质化 2506.21079v1 |
Authors (1): Yann Kerzreho
This paper introduces a new approach for approximating the learning dynamics of multiple reinforcement learning (RL) agents interacting in a finite-state Markov game. The idea is to rescale the learning process by simultaneously reducing the learning rate and increasing the update frequency, effectively treating the agent’s parameters as a slow-evolving variable influenced by the fast-mixing game state. Under mild assumptions (ergodicity of the state process and continuity of the updates), we prove the convergence of this rescaled process to an ordinary differential equation (ODE). This ODE provides a tractable, deterministic approximation of the agent’s learning dynamics. An implementation of the framework is available at: https://github.com/yannKerzreho/MarkovGameApproximation
本文介绍了一种新的方法,以近似于多强化学习(RL)代理机构在有限状态Markov游戏中互动的多强化学习(RL)代理机构的学习动态。其想法是同时降低学习率,增加更新频率,同时调整学习过程,有效地将该代理机构的参数视为受快速混合游戏状态影响的一个缓慢变化的变量。在国家过程的轻度假设和升级的连续性下,我们证明这一重编进程与普通差异方程式(ODE)的趋同。这个ODE提供了该代理机构的学习动态的可移动、决定性近似。可在以下网站查阅框架的实施情况:https://github.com/yannKerzreho/MarkovGameApproximation
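The slow-parameter / fast-state rescaling described in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration (a two-state chain, a scalar parameter, a linear update rule); it is not the paper's framework or the code in the linked repository. It compares a small-step stochastic update driven by a Markov state with the ODE obtained by averaging the update over the chain's stationary distribution.

```python
import math
import random

# Toy two-state "game": transition matrix and per-state update targets (all hypothetical).
P = [[0.9, 0.1], [0.2, 0.8]]
TARGET = [1.0, 4.0]
PI = [2.0 / 3.0, 1.0 / 3.0]  # stationary distribution of P


def simulate(theta0, eps, horizon, seed=0):
    """Run theta <- theta + eps * (TARGET[s] - theta) for horizon / eps steps."""
    rng = random.Random(seed)
    theta, state = theta0, 0
    for _ in range(int(round(horizon / eps))):
        theta += eps * (TARGET[state] - theta)
        # The state mixes quickly relative to the slow drift of theta.
        state = 0 if rng.random() < P[state][0] else 1
    return theta


def ode_solution(theta0, horizon):
    """Closed-form solution of d(theta)/dt = mean_target - theta."""
    mean_target = sum(p * t for p, t in zip(PI, TARGET))
    return mean_target + (theta0 - mean_target) * math.exp(-horizon)
```

Calling `simulate(0.0, 0.001, 5.0)` and `ode_solution(0.0, 5.0)` should give nearby values, and shrinking `eps` (more, smaller updates over the same horizon) should tighten the agreement.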
Article 119
Title@2025-06-26 (4): Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph
Title: Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph | Verbesserung der LLM-Tool-Nutzung mit hochwertigen Instruktionsdaten aus Wissensgrafik | 利用来自知识图的高质量教学数据加强LLM工具的使用 2506.21071v1 |
Authors (10): Jingwei Wang, Zai Zhang, Hao Qian, Chunjing Gan, Binbin Hu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Bin Shi, Bo Dong
Teaching large language models (LLMs) to use tools is crucial for improving their problem-solving abilities and expanding their applications. However, effectively using tools is challenging because it requires a deep understanding of tool functionalities and user intentions. Previous methods relied mainly on LLMs to generate instruction data, but the quality of these data was often insufficient. In this paper, we propose a new method that uses knowledge graphs to generate high-quality instruction data for LLMs. Knowledge graphs are manually curated datasets rich in semantic information. We begin by extracting various query pathways from a given knowledge graph, which are transformed into a broad spectrum of user queries. We then translate the relationships between entities into actionable tools and parse the pathways of each query into detailed solution steps, thereby creating high-quality instruction data. Our experiments show that fine-tuning on just a small sample of this synthetic data can significantly improve the tool utilization and overall capabilities of LLMs.
教授大型语言模型(LLMS)使用工具对于提高其解决问题的能力和扩大应用至关重要。然而,有效使用工具具有挑战性,因为它要求深入了解工具功能和用户意图。以前的方法主要依靠LLMS生成教学数据,但这些数据的质量往往不足。在本文中,我们建议采用新方法,使用知识图为LLMS生成高质量的教学数据。知识图是人工整理的数据集,其语义信息丰富。我们首先从特定知识图中提取各种查询路径,这些路径被转换成广泛的用户查询。我们然后将各实体之间的关系转化为可操作的工具,将每个查询路径分析成详细的解决方案步骤,从而创建高质量的教学数据。我们的实验表明,只对少量的合成数据进行微调,就可以大大改进LMS的工具利用和总体能力。
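The pipeline in this abstract (extract query pathways from a knowledge graph, treat relations as tools, turn each path into solution steps) can be sketched as follows. The triples, the function names, and the step format are all hypothetical and chosen only for illustration; the abstract does not specify the authors' actual data format.

```python
# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]


def paths_from(start, max_hops):
    """Enumerate relation paths of 1..max_hops hops starting at `start`."""
    results = []
    frontier = [(start, [])]
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for head, relation, tail in TRIPLES:
                if head == node:
                    new_path = path + [(head, relation, tail)]
                    results.append(new_path)
                    next_frontier.append((tail, new_path))
        frontier = next_frontier
    return results


def to_solution_steps(path):
    """Treat each relation as a tool call whose argument is the previous result."""
    return [f"{relation}({head!r}) -> {tail!r}" for head, relation, tail in path]
```

A two-hop path from "Inception" corresponds to a query such as "Where was the director of Inception born?", with one tool call per hop.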
Article 120
Title@2025-06-26 (4): SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations
Title: SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations | SDE Matching: Skalierbares und simulationsfreies Training latenter stochastischer Differentialgleichungen | SDE 匹配:可缩放和模拟无模拟的静态碎裂差异等量模拟培训 2502.02472v3 |
Authors (3): Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth
The Latent Stochastic Differential Equation (SDE) is a powerful tool for time series and sequence modeling. However, training Latent SDEs typically relies on adjoint sensitivity methods, which require simulating and backpropagating through approximate SDE solutions and therefore scale poorly. In this work, we propose SDE Matching, a new simulation-free method for training Latent SDEs. Inspired by modern Score and Flow Matching algorithms for learning generative dynamics, we extend these ideas to the domain of stochastic dynamics for time series and sequence modeling, eliminating the need for costly numerical simulations. Our results demonstrate that SDE Matching achieves performance comparable to adjoint sensitivity methods while drastically reducing computational complexity.
隐性蒸馏差异计算(SDE)是时间序列和序列建模的有力工具。然而,培训隐性 SDE通常依赖联合灵敏度方法,这些方法取决于模拟和通过近似 SDE 解决方案的反向造影,这限制了可缩放性。在这项工作中,我们提出SDE匹配,这是用于培训隐性 SDE 的一种新的无模拟匹配方法。在现代记分和流动配对算法的启发下,我们将这些想法推广到时间序列和序列建模的随机动态领域,消除了成本昂贵的数字模拟的需要。我们的结果显示,SDE匹配在大幅降低计算复杂性的同时,取得了与共性灵敏度灵敏度相近的性能。
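For context, the kind of numerical simulation that simulation-based training has to run (and that SDE Matching is said to avoid) can be sketched with the standard Euler-Maruyama scheme. This is a generic textbook integrator, not code from the paper; the function and argument names are illustrative.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, dt, steps, seed=0):
    """Simulate dX = drift(X) dt + diffusion(X) dW with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        path.append(x + drift(x) * dt + diffusion(x) * dw)
    return path
```

For example, `euler_maruyama(1.0, lambda x: -x, lambda x: 0.3, 0.01, 1000)` simulates an Ornstein-Uhlenbeck process. Every training step of a simulation-based method needs a sequential loop of this kind, which is the cost the abstract refers to.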
Article 121
Title@2025-06-26 (4): FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning
Title: FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning | FedDAA: Dynamisches Client-Clustering für Konzept Drift-Anpassung im Federated Learning | FedDAA: 联邦学习中适应概念的动态客户集群组合 2506.21054v1 |
Authors (2): Fu Peng, Ming Tang
In federated learning (FL), the data distribution of each client may change over time, introducing both temporal and spatial data heterogeneity, known as concept drift. Data heterogeneity arises from three drift sources: real drift (a shift in the conditional distribution P(y | x)), virtual drift (a shift in the input distribution P(x)), and label drift (a shift in the label distribution P(y)). However, most existing FL methods addressing concept drift primarily focus on real drift. When clients experience virtual or label drift, these methods often fail to selectively retain useful historical knowledge, leading to catastrophic forgetting. A key challenge lies in distinguishing different sources of drift, as they require distinct adaptation strategies: real drift calls for discarding outdated data, while virtual or label drift benefits from retaining historical data. Without explicitly identifying the drift sources, a general adaptation strategy is suboptimal and may harm generalization. To address this challenge, we propose FedDAA, a dynamic clustered FL framework designed to adapt to multi-source concept drift while preserving valuable historical knowledge. Specifically, FedDAA integrates three modules: a cluster number determination module to find the optimal number of clusters; a real drift detection module to distinguish real drift from virtual/label drift; and a concept drift adaptation module to adapt to new data while retaining useful historical information. We provide theoretical convergence guarantees, and experiments show that FedDAA achieves 7.84% to 8.52% accuracy improvements over state-of-the-art methods on Fashion-MNIST, CIFAR-10, and CIFAR-100.
在联谊学习(FL)中,每个客户的数据分配可能随时间而改变,引入时间和空间数据差异性,称为概念流,数据差异性来自三个漂移源:真实流(有条件分布P(y | x)的转换)、虚拟流(投入分布P(x)的改变)和标签流(标签分配P(y)的改变),但是,大多数处理概念流动的现有FL方法主要侧重于真实流动。当客户经历虚拟流动或标签流动时,这些方法往往无法选择性地保留有用的历史知识,导致灾难性的遗忘。关键挑战在于区分不同的漂移来源,因为它们需要不同的适应战略:真正流传呼吁放弃过时的数据,虚拟或标签从保存历史数据中获得的漂移好处。在没有明确确定漂移来源的情况下,一般适应战略是亚的亚性,可能损害一般化。为应对这一挑战,我们提出FedDAA,这是一个动态的集群FL框架,旨在适应多源流传概念流动,同时保存宝贵的历史知识。具体而言,FedDAA集成了三个模块:用于确定最优簇数的簇数确定模块;用于区分真实漂移与虚拟/标签漂移的真实漂移检测模块;以及在保留有用历史信息的同时适应新数据的概念漂移适应模块。我们提供了理论收敛保证,实验表明FedDAA在Fashion-MNIST、CIFAR-10和CIFAR-100上比最先进方法的准确率提高了7.84%至8.52%。
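The three drift sources named in this abstract, shifts in P(y | x), P(x), and P(y), can be made concrete on discrete toy data. The sketch below is only a diagnostic illustration under assumed names and an assumed threshold `tol`; it is not FedDAA's real drift detection module, whose design the abstract does not describe.

```python
from collections import Counter


def _dist(items):
    """Empirical distribution of a list of hashable items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def _tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def drift_sources(old, new, tol=0.1):
    """Flag which of P(x), P(y), P(y|x) moved between two lists of (x, y) pairs."""
    flags = {
        "virtual": _tv(_dist([x for x, _ in old]), _dist([x for x, _ in new])) > tol,
        "label": _tv(_dist([y for _, y in old]), _dist([y for _, y in new])) > tol,
    }
    # Compare P(y | x) only on inputs seen in both windows.
    shared = {x for x, _ in old} & {x for x, _ in new}
    flags["real"] = any(
        _tv(_dist([y for x, y in old if x == v]), _dist([y for x, y in new if x == v])) > tol
        for v in shared
    )
    return flags
```

If the labels of every input flip while the input mix stays the same, only "real" is flagged, which is the case where historical data stops being useful. If only the input mix changes, "real" stays false and old data remains valid for the unchanged conditional.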
Article 122
Title@2025-06-26 (4): Sharp concentration of uniform generalization errors in binary linear classification
Title: Sharp concentration of uniform generalization errors in binary linear classification | Scharfe Konzentration von einheitlichen Verallgemeinerungsfehlern in der binären linearen Klassifikation | 二进线线性分类中统一一般化误差的集中程度 2505.16713v2 |
Authors (1): Shogo Nakakita
We examine the concentration of uniform generalization errors around their expectation in binary linear classification problems via an isoperimetric argument. In particular, we establish Poincaré and log-Sobolev inequalities for the joint distribution of the output labels and the label-weighted input vectors, which we apply to derive concentration bounds. The derived concentration bounds are sharp up to moderate multiplicative constants by those under well-balanced labels. In asymptotic analysis, we also show that almost sure convergence of uniform generalization errors to their expectation occurs in very broad settings, such as proportionally high-dimensional regimes. Using this convergence, we establish uniform laws of large numbers under dimension-free conditions.
我们通过等离子参数来检查在二元线性分类问题的预期值上统一一般化误差的集中程度。 特别是, 我们为输出标签和标签加权输入矢量的联合分布建立 Poincar' {e} 和log- Sobolev 不平等性, 我们运用这些不平等性来得出浓度界限。 衍生的浓度界限在平衡性标签之下, 达到中等的多倍常数。 在无症状分析中, 我们还表明, 统一一般化误差几乎肯定在非常广泛的情况下发生, 如比例高的系统。 我们利用这种趋同性, 在无维度条件下, 制定了大量统一的法律 。
Article 123
Title@2025-06-26 (4): Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling
Title: Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling | Verbesserung der Diffusions-basierten Bildbearbeitung Treue durch Anleitung und Planung | 通过指导和日程安排改进基于传播的图像编辑信仰 2506.21045v1 |
Authors (2): Hansam Cho, Seoung Bum Kim
Text-guided diffusion models have become essential for high-quality image synthesis, enabling dynamic image editing. In image editing, two crucial aspects are editability, which determines the extent of modification, and faithfulness, which reflects how well unaltered elements are preserved. However, achieving optimal results is challenging because of the inherent trade-off between editability and faithfulness. To address this, we propose Faithfulness Guidance and Scheduling (FGS), which enhances faithfulness with minimal impact on editability. FGS incorporates faithfulness guidance to strengthen the preservation of input image information and introduces a scheduling strategy to resolve misalignment between editability and faithfulness. Experimental results demonstrate that FGS achieves superior faithfulness while maintaining editability. Moreover, its compatibility with various editing methods enables precise, high-quality image edits across diverse tasks.
在图像编辑中,两个关键方面是可编辑性,这决定了修改的程度,以及忠诚度,这反映了未修改要素的保存程度。然而,由于编辑性和忠诚之间的内在权衡关系,取得最佳结果具有挑战性。为了解决这个问题,我们提议“忠实指导与排练”(FGS),它能提高忠诚性,对编辑性的影响最小。FGS包含忠诚性指导,以加强对输入图像信息的保存,并引入一个时间表战略,解决编辑性和忠诚性之间的不匹配问题。实验结果表明,FGS在保持编辑性的同时,实现了较高的忠诚性。此外,它与各种编辑方法的兼容性使得能够对各种任务进行准确、高质量的图像编辑。
Article 124
Title@2025-06-26 (4): Efficient Skill Discovery via Regret-Aware Optimization
Title: Efficient Skill Discovery via Regret-Aware Optimization | Effiziente Skill Discovery durch regret-aware Optimierung | 通过Regret-Aware 优化发现高效技能 2506.21044v1 |
Authors (5): He Zhang, Ming Zhou, Shaopeng Zhai, Ying Sun, Hui Xiong
Unsupervised skill discovery aims to learn diverse and distinguishable behaviors in open-ended reinforcement learning. Existing methods focus on improving diversity through pure exploration, mutual information optimization, and temporal representation learning. Although they perform well on exploration, they remain limited in efficiency, especially in high-dimensional settings. In this work, we frame skill discovery as a min-max game of skill generation and policy learning, proposing a regret-aware method on top of temporal representation learning that expands the discovered skill space along the direction of upgradable policy strength. The key insight behind the proposed method is that skill discovery is adversarial to policy learning, i.e., skills with weak strength should be explored further, while skills with converged strength need less exploration. As an implementation, we score the degree of strength convergence with regret and guide skill discovery with a learnable skill generator. To avoid degeneration, skill generation comes from an upgradable population of skill generators. We conduct experiments on environments with varying complexities and dimension sizes. Empirical results show that our method outperforms baselines in both efficiency and diversity. Moreover, our method achieves a 15% zero-shot improvement in high-dimensional environments compared to existing methods.
不受监督的技能发现旨在在开放式强化学习中学习多样化和可辨别的行为。对于现有的方法来说,它们侧重于通过纯粹的探索、相互的信息优化和学习时间代表性来改进多样性。尽管这些方法在探索方面表现良好,但在效率方面仍然有限,特别是在高维情况下。在这项工作中,我们把技能发现设置为技能创造和政策学习的微量游戏,在时间代表性学习的基础上提出一种遗憾认识方法,在可升级政策实力的方向上扩大所发现的技能空间。拟议方法背后的关键洞察力是,技能发现与政策学习相对应,即,弱力技能应该进一步探索,同时减少对汇聚力量技能的探索。作为执行,我们以遗憾感记分力水平,用可学习的技能生成器指导技能发现技能。为了避免变换代,技能生成来自高层次的技能。我们在复杂度和维度大小不同的环境中进行实验。根据经验,我们的方法发现,在效率和多样性方面,我们的方法比基线都差。此外,作为执行,我们的方法在高维度环境中实现了15 %的改进。
Article 125
Title@2025-06-26 (4): Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
Title: Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning | Strenge Subgoal Execution: Zuverlässige Langzeitplanung im Hierarchischen Stärkungslernen | 严格次级目标执行:在等级强化学习中可靠的长期规划 2506.21039v1 |
Authors (4): Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han
Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement step refines graph-based planning by dynamically adjusting edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
长视分目标任务对强化学习(RL)构成根本性挑战,特别是在目标距离遥远、回报稀少的情况下;虽然以等级和图表为基础的方法提供部分解决办法,但它们往往受到次级目标不可行和低效率规划的影响;我们引入了严格分目标执行(SSE),这是一个基于图表的分级RL框架,通过结构性限制高层决策,强制实现单步子目标;为了加强探索,SSE采用分解的勘探政策,系统地在目标空间的探索区域中进行分解;此外,对失败程度的路径进行改进,根据观察到的低成功率动态调整以图表为基础的规划,从而改进次级目标的可靠性;各种长方位基准的实验结果表明SE在效率和成功率方面始终超越现有以目标为条件的RL和分级RL方法。
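The failure-aware path refinement idea, adjusting edge costs according to observed low-level success rates before planning, can be sketched with an ordinary shortest-path search. The cost rule below (base cost divided by a floored success rate) and all names are assumptions for illustration; the abstract does not give SSE's actual update rule.

```python
import heapq


def refined_cost(base_cost, successes, attempts, floor=0.05):
    """Inflate an edge cost as its observed low-level success rate drops (assumed rule)."""
    rate = successes / attempts if attempts else 1.0
    return base_cost / max(rate, floor)


def shortest_path(graph, start, goal):
    """Dijkstra over graph[u] = {v: cost}; returns (total_cost, path)."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, edge_cost in graph.get(node, {}).items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + edge_cost, nxt, path + [nxt]))
    return float("inf"), []
```

With this rule, a nominally short edge that the low-level policy rarely completes becomes expensive, so the planner routes through longer but more reliable subgoals.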
Article 126
Title@2025-06-26 (4): RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment
Title: RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment | RL-Selector: Verstärkte lernorientierte Datenauswahl über Redundanzbewertung | RL-选择者:通过裁员评估甄选强化学习指导数据 2506.21037v1 |
Authors (4): Suorong Yang, Peijia Li, Furao Shen, Jian Zhao
Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more data-efficient training paradigms. Data selection has shown promise to mitigate redundancy by identifying the most representative samples, thereby reducing training costs without compromising performance. Existing methods typically rely on static scoring metrics or pretrained models, overlooking the combined effect of selected samples and their evolving dynamics during training. We introduce the concept of epsilon-sample cover, which quantifies sample redundancy based on inter-sample relationships, capturing the intrinsic structure of the dataset. Based on this, we reformulate data selection as a reinforcement learning (RL) process and propose RL-Selector, where a lightweight RL agent optimizes the selection policy by leveraging epsilon-sample cover derived from evolving dataset distribution as a reward signal. Extensive experiments across benchmark datasets and diverse architectures demonstrate that our method consistently outperforms existing state-of-the-art baselines. Models trained with our selected datasets show enhanced generalization performance with improved training efficiency.
现代深层结构往往依赖大型数据集,但关于这些数据集的培训往往需要大量计算和存储间接费用。现实世界数据集往往包含大量冗余,导致需要数据效率更高的培训模式。数据选择显示,通过确定最具代表性的样本,有望减少冗余,从而降低培训成本,同时又不损害性能。现有方法通常依赖静态评分指标或预培训模型,忽视选定样本的综合效应及其在培训期间不断变化的动态。我们引入了环西龙-sample覆盖概念,这一概念根据各种抽样关系对样本冗余进行量化,捕捉数据集的内在结构。基于这一概念,我们重新配置数据选择,作为强化学习(RL)进程,并提出RL-选择者,在这种选择中,轻量的RL代理通过利用不断演变的数据集分布作为奖赏信号,优化选择政策。跨基准数据集和不同结构的广泛实验表明,我们的方法始终超越了现有的最新基准,捕捉了数据集的内在结构。我们经过培训后,通过改进的性能改进的一般性能模型显示我们改进的一般性能。
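One way to read the redundancy notion in this abstract is as a covering condition: a sample is redundant if a selected sample already lies within distance epsilon of it. The sketch below is a greedy selection under that reading. It is an assumption-laden illustration, not the paper's definition of epsilon-sample cover and not the RL-Selector policy.

```python
import math


def greedy_epsilon_cover(points, epsilon):
    """Keep a point only if no already-kept point lies within `epsilon` of it."""
    kept = []
    for p in points:
        if all(math.dist(p, q) > epsilon for q in kept):
            kept.append(p)
    return kept


def is_covered(points, kept, epsilon):
    """Check that every point has a kept representative within `epsilon`."""
    return all(any(math.dist(p, q) <= epsilon for q in kept) for p in points)
```

A static rule like this ignores training dynamics, which is the limitation the abstract attributes to existing methods; a learned policy could instead use a cover-based quantity as its reward signal.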
Article 127
Title@2025-06-26 (4): An Information-Theoretic Analysis for Federated Learning under Concept Drift
Title: An Information-Theoretic Analysis for Federated Learning under Concept Drift | Eine informationstheoretische Analyse für das Federated Learning unter Konzept Drift | 根据 “ 漂流概念 “ 进行的联邦学习信息理论分析 2506.21036v1 |
Authors (3): Fu Peng, Meng Zhang, Ming Tang
Recent studies in federated learning (FL) commonly train models on static datasets. However, real-world data often arrives as streams with shifting distributions, causing performance degradation known as concept drift. This paper analyzes FL performance under concept drift using information theory and proposes an algorithm to mitigate the performance degradation. We model concept drift as a Markov chain and introduce the Stationary Generalization Error to assess a model’s capability to capture characteristics of future unseen data. Its upper bound is derived using KL divergence and mutual information. We study three drift patterns (periodic, gradual, and random) and their impact on FL performance. Inspired by this, we propose an algorithm that regularizes the empirical risk minimization approach with KL divergence and mutual information, thereby enhancing long-term performance. We also explore the performance-cost tradeoff by identifying a Pareto front. To validate our approach, we build an FL testbed using Raspberry Pi 4 devices. Experimental results corroborate the theoretical findings, confirming that drift patterns significantly affect performance. Our method consistently outperforms existing approaches for these three patterns, demonstrating its effectiveness in adapting to concept drift in FL.
联邦学习(FL)的近期研究通常对静态数据集进行训练。然而,真实世界数据通常以流体流形式出现,分布变化,导致性能退化,称为概念流,本文利用信息理论分析概念流的FL性能,并提出减少性能退化的算法。我们将漂浮概念建为Markov链,并引入“标准通用错误”来评估模型捕捉未来不可见数据特性的能力。它的上界是利用KL差异和相互信息得出的。我们研究了三种漂流模式(周期、渐进和随机)及其对FL性能的影响。我们为此提出一种算法,将实验风险最小化方法与KL差异和相互信息规范起来,从而提高长期性能。我们还通过确定Pareto前台来探索性能成本权衡。为了验证我们的方法,我们用 Raspberry Pi4 设备建立了一个FL 测试台。实验结果与理论结论相证实,证实了这种漂移模式对性能的显著影响。我们的方法始终超越了这三种模式的现有方法,并展示了在漂移法上的概念的有效性。
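Two ingredients named in this abstract, a Markov chain over data distributions and KL divergence, can be computed directly for small discrete cases. The helper names and the power-iteration approach below are illustrative assumptions; the paper's Stationary Generalization Error and its bound are not reproduced here.

```python
import math


def stationary(transition, iters=1000):
    """Approximate the stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return dist


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Here each state of the chain would stand for one data distribution, and the stationary distribution describes how often each one is visited in the long run. `kl_divergence` measures how far one such distribution is from another.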
Article 128
Title@2025-06-26 (4): SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
Title: SceneGenAgent: Precise Industrial Scene Generation with Coding Agent | SceneGenAgent: Präzise industrielle Szenegenerierung mit Coding Agent | SceneGenerAgenti: 精密工业场景与编码剂生成 2410.21909v3 |
Authors (8): Xiao Xia, Dan Zhang, Zibo Liao, Zhenyu Hou, Tianrui Sun, Jing Li, Ling Fu, Yuxiao Dong
The modeling of industrial scenes is essential for simulations in industrial manufacturing. While large language models (LLMs) have shown significant progress in generating general 3D scenes from textual descriptions, generating industrial scenes with LLMs poses a unique challenge due to their demand for precise measurements and positioning, requiring complex planning over spatial arrangement. To address this challenge, we introduce SceneGenAgent, an LLM-based agent for generating industrial scenes through C# code. SceneGenAgent ensures precise layout planning through a structured and calculable format, layout verification, and iterative refinement to meet the quantitative requirements of industrial scenarios. Experimental results demonstrate that LLMs powered by SceneGenAgent exceed their original performance, reaching up to 81.0% success rate in real-world industrial scene generation tasks and effectively meeting most scene generation requirements. To further enhance accessibility, we construct SceneInstruct, a dataset designed for fine-tuning open-source LLMs to integrate into SceneGenAgent. Experiments show that fine-tuning open-source LLMs on SceneInstruct yields significant performance improvements, with Llama3.1-70B approaching the capabilities of GPT-4o. Our code and data are available at https://github.com/THUDM/SceneGenAgent
工业场景模型对于工业制造业的模拟至关重要。大型语言模型(LLMS)在用文字描述生成一般的3D场景方面取得了显著进展,而利用LLMS生成工业场景则因其对精确测量和定位的需求而带来了独特的挑战,需要对空间安排进行复杂的规划。为了应对这一挑战,我们引入了C#代码生成工业场景的基于LLM的CeneGenAgenti代理商SceenGenAgenti(CeneGenAgenti)确保了精确的布局规划,通过结构化和可计算的格式、布局核查以及迭接的完善以满足工业场景的数量要求。实验结果表明,SceneGenAgenent所驱动的LMS超过其最初的性能,在真实世界工业场景生成任务中达到高达81.0%的成功率,并有效地满足了大多数场景生成要求。为了进一步提高无障碍性,我们建造了SeenInstruct(SenInGPTHMM),一个旨在将开源LMMS-GPTSentrentrental 3./MUD)和MSUDSUDMS。
Article 129
Title@2025-06-26 (4): Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning
Title: Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning | Little By Little: Kontinuierliches Lernen über selbsttätiges Sparse Mixture-of-Rank Adaptives Lernen | 小小小小:通过自发的微小混血体适应性学习不断学习 2506.21035v1 |
Authors (6): Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong
Continual learning (CL) with large pre-trained models is challenged by catastrophic forgetting and task interference. Existing LoRA-based Mixture-of-Experts (MoE) approaches mitigate forgetting by assigning and freezing task-specific adapters, but suffer from interference, redundancy, and ambiguous routing due to coarse adapter-level selection. This design introduces three key challenges: 1) Interference: Activating full LoRA experts per input leads to subspace interference and prevents selective reuse of useful components across tasks. 2) Redundancy: Newly added experts often duplicate or contradict existing knowledge due to unnecessary activation of unrelated ranks and insufficient reuse of relevant ones. 3) Ambiguity: Overlapping features across tasks confuse the router, resulting in unstable expert assignments. As more experts accumulate, earlier task routing degrades, accelerating forgetting. We propose MoRA, a Mixture-of-Rank Adaptive learning approach with self-activated and sparse rank activation for CL. Rather than mixing multiple low-rank matrices, MoRA decomposes each rank-r update into r rank-1 components, each treated as an independent expert, enabling fine-grained mixture of rank-1 expert utilization while mitigating interference and redundancy. To avoid ambiguous routing, we propose that each rank-1 expert can infer its own relevance via intermediate activations. Coupled with our proposed rank pruning and activation budgets, MoRA adaptively selects a sparse mixture of ranks per input. We validate MoRA on continual learning tasks with CLIP and large language models (LLMs), analyzing both in-domain learning and out-of-domain forgetting/generalization during fine-tuning. MoRA proves effective at enhancing CL with pre-trained models (PTMs), improving generalization while mitigating forgetting.
具有大量预先培训模型的持续学习(CL)受到灾难性的遗忘和任务干扰的挑战。现有的基于LORA的Mixture-Experts(MoE)方法通过指派和冻结特定任务适应器来减轻遗忘,但因粗糙的适应器一级选择而受到干扰、冗余和模糊路由。然而,这一设计提出了三大挑战:(1) 干扰:启动每个输入的全LORA专家导致子空间干扰,并防止有选择地重复使用有用的部件。(2) 重复性:由于不必要地激活不相关级别和不足够地再利用相关类别,新增加的专家经常重复或违背现有知识。(3) 模糊性:任务重叠性使路由器混淆,导致专家任务不稳定。随着更多的专家积累、早于任务调整的退化,加速遗忘。 我们提议采用MORA, 混合-Mixture-Rank的适应性学习方法,同时为CLLA提出普通级的升级和低级激活。我们不将多种低级的变换基体混合、摩拉解将每级更新一个级的级更新为排名-1部分,同时提出大幅的士比级学习。
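The decomposition mentioned in this abstract rests on a standard identity: a rank-r update B A equals the sum of r rank-1 outer products b_i a_i^T, so applying it to an input x gives the sum of b_i (a_i . x). The sketch below uses that identity and, as an assumption for illustration only, scores each rank-1 component by |a_i . x| and keeps the top `budget` of them. MoRA's actual self-activation and pruning rules are not specified in the abstract.

```python
def rank1_components(B, A):
    """Split delta_W = B @ A (B: d_out x r, A: r x d_in) into r (column, row) pairs."""
    r = len(A)
    return [([row[i] for row in B], A[i]) for i in range(r)]


def sparse_update(B, A, x, budget):
    """Apply only the `budget` rank-1 components with the largest |a_i . x|."""
    comps = rank1_components(B, A)
    scores = [sum(a * xj for a, xj in zip(a_i, x)) for _, a_i in comps]
    chosen = sorted(range(len(comps)), key=lambda i: abs(scores[i]), reverse=True)[:budget]
    out = [0.0] * len(B)
    for i in chosen:
        b_i = comps[i][0]
        for k in range(len(out)):
            out[k] += b_i[k] * scores[i]
    return out
```

With `budget` equal to the rank, `sparse_update` reproduces the full product B A x. Smaller budgets activate only the components whose intermediate activation is strongest for that input.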
Article 130
Title@2025-06-26 (4): PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling
Title: PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling | PCDVQ: Verbesserung der Vector Quantization für große Sprachmodelle über Polar Coordinate Entkopplung | PCDVQ:通过极地协调脱钩,加强大语言模型的矢量量化 2506.05432v2 |
Authors (6): Yuxuan Yue, Zukang Xu, Zhihang Yuan, Dawei Yang, Jianlong Wu, Liqiang Nie
Large Language Models (LLMs) face significant challenges in edge deployment due to their massive parameter scale. Vector Quantization (VQ), a clustering-based quantization method, serves as a prevalent solution to this issue for its extremely low bit-width (even 2-bit) and considerable accuracy. Since a vector is a quantity in mathematics and physics that has both direction and magnitude, existing VQ works typically quantize them in a coupled manner. However, we find that direction exhibits significantly greater sensitivity to quantization than magnitude. For instance, when separately clustering the directions and magnitudes of weight vectors in LLaMA-2-7B, the accuracy drops on zero-shot tasks are 46.5% and 2.3%, respectively. This gap widens further as the number of clustering centers is reduced. Further, Euclidean distance, a common metric to assess vector similarities in current VQ works, places greater emphasis on reducing the magnitude error. This property is contrary to the above finding, unavoidably leading to larger quantization errors. To this end, this paper proposes Polar Coordinate Decoupled Vector Quantization (PCDVQ), an effective and efficient VQ framework consisting of two key modules: 1) Polar Coordinate Decoupling (PCD), which transforms vectors into their polar coordinate representations and performs independent quantization of the direction and magnitude parameters. 2) Distribution Aligned Codebook Construction (DACC), which optimizes the direction and magnitude codebooks in accordance with the source distribution. Experimental results show that PCDVQ outperforms baseline methods at the 2-bit level by at least 1.5% zero-shot accuracy, establishing a novel paradigm for accurate and highly compressed LLMs.
大型语言模型(LLM)因参数规模庞大,在边缘部署中面临重大挑战。向量量化(VQ)是一种基于聚类的量化方法,凭借极低的比特数(甚至2比特)和可观的精度,成为解决该问题的常用方案。由于向量在数学和物理中是同时具有方向和大小的量,现有的VQ工作通常以耦合的方式对二者进行量化。然而,我们发现方向对量化的敏感度明显高于大小。例如,在LLaMA-2-7B中分别对权重向量的方向和大小进行聚类时,零样本任务的精度下降分别为46.5%和2.3%。随着聚类中心数量的减少,这一差距还会进一步扩大。此外,欧氏距离是当前VQ工作中评估向量相似度的常用度量,它更侧重于减小大小误差。这一性质与上述发现相悖,不可避免地导致更大的量化误差。为此,本文提出极坐标解耦向量量化(PCDVQ),这是一个有效且高效的VQ框架,由两个关键模块组成:1)极坐标解耦(PCD),将向量转换为极坐标表示,并对方向和大小参数进行独立量化;2)分布对齐码本构建(DACC),根据源分布优化方向和大小码本。实验结果表明,PCDVQ在2比特水平上的零样本精度比基线方法至少高出1.5%,为精确且高度压缩的LLM确立了新的范式。
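The decoupling idea in this abstract, quantizing direction and magnitude independently instead of matching whole vectors by Euclidean distance, can be sketched on small vectors. As a simplifying assumption, direction is represented here by a unit vector matched via cosine similarity rather than by explicit polar angles, and the codebooks are hand-written rather than constructed as in DACC. This is not the PCDVQ implementation.

```python
import math


def nearest(value, codebook, distance):
    """Return the codebook entry minimizing `distance` to `value`."""
    return min(codebook, key=lambda c: distance(value, c))


def quantize_polar(vec, direction_codebook, magnitude_codebook):
    """Quantize direction and magnitude independently, then recombine (vec must be nonzero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    unit = [v / norm for v in vec]
    # Direction: pick the unit codeword with the largest cosine similarity.
    d = nearest(unit, direction_codebook, lambda u, c: -sum(a * b for a, b in zip(u, c)))
    # Magnitude: pick the closest scalar level.
    m = nearest(norm, magnitude_codebook, lambda a, b: abs(a - b))
    return [m * di for di in d]
```

Separate codebooks let the bit budget be split unevenly, for example giving more entries to direction, which the abstract reports as the more sensitive component.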
Article 131
Title@2025-06-26 (4): TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence
Title: TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence | TRIDENT: Tri-Modal Molecular Representative Learning mit taxonomischen Anmerkungen und lokaler Korrespondenz | 三模式分子代表性学习,具有分类说明和当地通讯 2506.21028v1 |
Authors (9): Feng Jiang, Mangal Prakash, Hehuan Ma, Jianyuan Deng, Yuzhi Guo, Amina Mollaysa, Tommaso Mansi, Rui Liao, Junzhou Huang
Molecular property prediction aims to learn representations that map chemical structures to functional properties. While multimodal learning has emerged as a powerful paradigm to learn molecular representations, prior works have largely overlooked textual and taxonomic information of molecules for representation learning. We introduce TRIDENT, a novel framework that integrates molecular SMILES, textual descriptions, and taxonomic functional annotations to learn rich molecular representations. To achieve this, we curate a comprehensive dataset of molecule-text pairs with structured, multi-level functional annotations. Instead of relying on conventional contrastive loss, TRIDENT employs a volume-based alignment objective to jointly align tri-modal features at the global level, enabling soft, geometry-aware alignment across modalities. Additionally, TRIDENT introduces a novel local alignment objective that captures detailed relationships between molecular substructures and their corresponding sub-textual descriptions. A momentum-based mechanism dynamically balances global and local alignment, enabling the model to learn both broad functional semantics and fine-grained structure-function mappings. TRIDENT achieves state-of-the-art performance on 11 downstream tasks, demonstrating the value of combining SMILES, textual, and taxonomic functional annotations for molecular property prediction.
分子财产预测旨在了解如何将化学结构映射成功能特性。虽然多式联运学习已成为学习分子代表性的强大范例,但先前的工作基本上忽视了分子的文字和分类信息,以进行代表性学习。我们引入了TRIDENT,这是一个将分子SMILES、文字描述和分类功能说明相结合的新框架,以学习丰富的分子代表性;为此,我们编撰了一组全面的分子-文本数据集,配有结构化的、多层次功能说明。尽管多式联运学习已成为学习分子代表性的强大范例,但以前的工作基本上忽视了分子的文字和分类信息。我们引入了TRIDENT,这是一个将分子子结构及其相应的子文字描述详细结合起来的新的地方调整目标。一个基于动力的机制,动态地平衡了全球和地方之间的平衡,使模型能够学习广泛的功能性语义和精细度结构功能特征绘图。TRIDENT使用一个基于量的调整目标,以在全球水平上联合协调三模式的三模式,使软性、几何测量-测量系统在各种模式上保持一致。此外,TRIDENT还引入一个新的本地调整目标,展示了分子结构预测价值。
Article 132
Title@2025-06-26 (4): Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems
Title: Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems | Mischung von Experten-augmented Deep Unfolding für Aktivitätserkennung in IRS-gestützten Systemen | IRS辅助系统中活动探测专家加固深载混合体 2502.20183v2 |
Authors (5): Zeyi Ren, Qingfeng Lin, Jingreng Lei, Yang Li, Yik-Chung Wu
In the realm of activity detection for massive machine-type communications, intelligent reflecting surfaces (IRS) have shown significant potential in enhancing coverage for devices lacking direct connections to the base station (BS). However, traditional activity detection methods are typically designed for a single type of channel model, which does not reflect the complexities of real-world scenarios, particularly in systems incorporating IRS. To address this challenge, this paper introduces a novel approach that combines model-driven deep unfolding with a mixture of experts (MoE) framework. By automatically selecting one of three expert designs and applying it to the unfolded projected gradient method, our approach eliminates the need for prior knowledge of channel types between devices and the BS. Simulation results demonstrate that the proposed MoE-augmented deep unfolding method surpasses the traditional covariance-based method and black-box neural network design, delivering superior detection performance under mixed channel fading conditions.
在大规模机器型通信的活动探测领域,智能反射表面(IRS)在扩大与基地站缺乏直接联系的装置的覆盖范围方面显示出巨大的潜力,然而,传统活动探测方法通常是为单一类型的频道模型设计的,该模型没有反映真实世界情景的复杂性,特别是在包含IRS的系统中。为了应对这一挑战,本文件采用了一种新颖的方法,将模型驱动的深度与专家混合框架(MOE)结合起来。通过自动选择三种专家设计之一并将其应用到显示的预测梯度方法,我们的方法消除了事先了解装置和BS之间频道类型的必要性。模拟结果表明,拟议的MOE强化的深层开发方法超过了传统的共变法和黑盒神经网络设计,在混合通道退缩条件下提供了优异的探测性能。
Article 133
Title@2025-06-26 (4): HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation
Title: HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation | HybridQ: Hybrid-Klassisch-Quantum Generatives Adversariales Netzwerk für die Bildgenerierung von Hauterkrankungen | CCF: 皮肤疾病成像生成的混合古金-量反反转网络 2506.21015v1 |
Authors (4): Qingyue Jiao, Kangyu Zheng, Yiyu Shi, Zhiding Liang
Machine learning-assisted diagnosis is gaining traction in skin disease detection, but training effective models requires large amounts of high-quality data. Skin disease datasets often suffer from class imbalance, privacy concerns, and object bias, making data augmentation essential. While classical generative models are widely used, they demand extensive computational resources and lengthy training time. Quantum computing offers a promising alternative, but existing quantum-based image generation methods can only yield grayscale low-quality images. Through a novel classical-quantum latent space fusion technique, our work overcomes this limitation and introduces the first classical-quantum generative adversarial network (GAN) capable of generating color medical images. Our model outperforms classical deep convolutional GANs and existing hybrid classical-quantum GANs in both image generation quality and the classification performance boost obtained when used for data augmentation. Moreover, the performance boost is comparable with that achieved using state-of-the-art classical generative models, yet with over 25 times fewer parameters and 10 times fewer training epochs. Such results suggest a promising future for quantum image generation as quantum hardware advances. Finally, we demonstrate the robust performance of our model on a real IBM quantum machine with hardware noise.
皮肤疾病数据集往往存在阶级不平衡、隐私问题和对象偏差,因此数据扩增至关重要。古典基因模型被广泛使用,但它们需要大量的计算资源和漫长的培训时间。 量子计算提供了一个充满希望的替代方法,但现有的量子成像生成方法只能产生灰度的低质量图像。通过一种新型古典-量子潜伏空间融合技术,我们的工作克服了这一限制,并引入了第一个能够生成彩色医学图像的古典-量子基因对抗网络(GAN ) 。我们的模型优于古典深层基因对抗网络(GAN ) , 以及现有的混合古典- 量子 GAN , 两者在图像生成质量和分类性能提升数据时都要求大量使用。 此外, 性能提升与使用最先进的古典基因化模型所实现的相似, 但仍有超过25倍的参数和10倍的培训。 这样的结果显示, 量子图像生成的前景很有希望, 作为量子硬件进步。 最后,我们用硬质硬件模型展示了我们真正的硬质的硬质模型。
Article 134
Title@2025-06-26 (4): Efficient Image Generation with Variadic Attention Heads
Title: Efficient Image Generation with Variadic Attention Heads | Effiziente Bildgenerierung mit verschiedenen Aufmerksamkeitsköpfen | 高效的图像生成,由Variadic关注组织负责人负责 2211.05770v3 |
Authors (5): Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, Humphrey Shi
While the integration of transformers in vision models has yielded significant improvements on vision tasks, they still require significant amounts of computation for both training and inference. Restricted attention mechanisms significantly reduce these computational burdens but come at the cost of losing either global or local coherence. We propose a simple yet powerful method to reduce these trade-offs: allow the attention heads of a single transformer to attend to multiple receptive fields. We demonstrate our method utilizing Neighborhood Attention (NA) and integrate it into a StyleGAN-based architecture for image generation. With this work, dubbed StyleNAT, we are able to achieve a FID of 2.05 on FFHQ, a 6% improvement over StyleGAN-XL, while utilizing 28% fewer parameters and with 4$\times$ the throughput capacity. StyleNAT achieves the Pareto Frontier on FFHQ-256 and demonstrates powerful and efficient image generation on other datasets. Our code and model checkpoints are publicly available at: https://github.com/SHI-Labs/StyleNAT
虽然将变压器纳入愿景模型在愿景任务方面已取得了显著改进,但它们仍需要大量的培训和推断计算。 限制关注机制大大减少了这些计算负担,但以失去全球或地方一致性为代价。 我们提出一个简单而有力的方法来减少这些权衡:让单一变压器的负责人关注多个可接受域。 我们展示了我们使用邻里注意(NA)的方法,并将其纳入基于StyleGAN的图像生成架构。 通过这项被称为StyleNAT的工作,我们能够在FFHQ上实现2.05的FID,比StyleGAN-XL改进6%,同时减少了28%的参数并具有4倍的吞吐量能力。StyleNAT在FFHQ-256上实现了Pareto边界,并在其他数据集上展示了强大而高效的图像生成。我们的代码和模型检查点公开于:https://github.com/SHI-Labs/StyleNAT
Article 135
Title@2025-06-26 (4): Proximal Point Method for Online Saddle Point Problem
Title: Proximal Point Method for Online Saddle Point Problem | Proximale Point-Methode für Online-Sättelpunkt-Problem | 在线搭配点问题的近点方法 2407.04591v3 |
Authors (2): Qing-xin Meng, Jian-wei Liu
This paper focuses on the online saddle point problem, which involves a sequence of two-player time-varying convex-concave games. Considering the nonstationarity of the environment, we adopt the duality gap and the dynamic Nash equilibrium regret as performance metrics for algorithm design. We present three variants of the proximal point method: the Online Proximal Point Method (OPPM), the Optimistic OPPM (OptOPPM), and the OptOPPM with multiple predictors. Each algorithm guarantees upper bounds for both the duality gap and dynamic Nash equilibrium regret, achieving near-optimality when measured against the duality gap. Specifically, in certain benign environments, such as sequences of stationary payoff functions, these algorithms maintain a nearly constant metric bound. Experimental results further validate the effectiveness of these algorithms. Lastly, this paper discusses potential reliability concerns associated with using dynamic Nash equilibrium regret as a performance metric. The technical appendix and code can be found at https://github.com/qingxin6174/PPM-for-OSP.
本文侧重于在线马鞍问题, 包括一系列双玩者时间变换的调子。 考虑到环境的不常态性, 我们采用双性差和动态纳什平衡遗憾作为算法设计的性能衡量标准。 我们提出了三种准点方法的变体: 在线准点法(OPPM)、 最佳OPPPM(OPOPPPM) 和拥有多个预测器的 OptOPPM 。 每种算法都保证双性差和动态纳什平衡遗憾的上限, 在测量两性差时接近最佳性。 具体地说, 在某些良性环境, 如固定报酬功能的顺序, 这些算法保持几乎不变的矩阵约束。 实验结果进一步验证了这些算法的有效性。 最后, 本文讨论了与使用动态纳什平衡遗憾作为性能衡量标准相关的潜在可靠性问题。 技术附录和代码可以在 https://github.com/qingxin6174/ PPM-for- OSPP上找到。
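A minimal sketch of a proximal point step for a time-varying saddle point problem, under simplifying assumptions that are not taken from the paper: the payoff is the unconstrained scalar bilinear game f_t(x, y) = a_t * x * y, and the names `oppm_step` and `run` are hypothetical. The proximal point update is implicit, so each player responds to the other's new iterate; for this payoff the implicit equations have a closed-form solution.

```python
def oppm_step(x, y, a, eta):
    """One implicit proximal-point step for f(x, y) = a * x * y.

    The minimizing player x and the maximizing player y solve
        x_new = x - eta * a * y_new
        y_new = y + eta * a * x_new
    simultaneously. Assumption: scalar, unconstrained, bilinear payoff.
    """
    denom = 1.0 + (eta * a) ** 2
    x_new = (x - eta * a * y) / denom
    y_new = (y + eta * a * x) / denom
    return x_new, y_new


def run(payoffs, eta=0.5, x=1.0, y=1.0):
    """Play one proximal-point step per round against a sequence of a_t values."""
    trajectory = [(x, y)]
    for a in payoffs:
        x, y = oppm_step(x, y, a, eta)
        trajectory.append((x, y))
    return trajectory


if __name__ == "__main__":
    traj = run([1.0, -2.0, 0.5] * 20)
    print(traj[0], traj[-1])
```

In this toy setting every round has its saddle point at the origin, and each step shrinks x^2 + y^2 by the factor 1 / (1 + (eta * a_t)^2), which is why the implicit update stays stable where simultaneous gradient descent-ascent would spiral outward.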
Article 136
Title@2025-06-26 (4): Review learning: Real world validation of privacy preserving continual learning across medical institutions
Title: Review learning: Real world validation of privacy preserving continual learning across medical institutions | Review learning: Echte Welt-Validierung der Privatsphäre Erhaltung kontinuierlichen Lernens in medizinischen Einrichtungen | 审查学习:维护各医疗机构持续学习的隐私的真实世界验证 2210.09394v2 |
Authors (12): Jaesung Yoo, Sunghyuk Choi, Ye Seul Yang, Suhyeon Kim, Jieun Choi, Dongkyeong Lim, Yaeji Lim, Hyung Joon Joo, Dae Jung Kim, Rae Woong Park, Hyeong-Jin Yoon, Kwangsoo Kim
When a deep learning model is trained sequentially on different datasets, it often forgets the knowledge learned from previous data, a problem known as catastrophic forgetting. This damages the model’s performance on diverse datasets, which is critical in privacy-preserving deep learning (PPDL) applications based on transfer learning (TL). To overcome this, we introduce “review learning” (RevL), a low-cost continual learning algorithm for diagnosis prediction using electronic health records (EHR) within a PPDL framework. RevL generates data samples from the model, which are used to review knowledge from previous datasets. Six simulated institutional experiments and one real-world experiment involving three medical institutions were conducted to validate RevL, using three binary-classification EHR datasets. In the real-world experiment with data from 106,508 patients, the mean global area under the receiver operating characteristic curve was 0.710 for RevL and 0.655 for TL. These results demonstrate RevL’s ability to retain previously learned knowledge and its effectiveness in real-world PPDL scenarios. Our work establishes a realistic pipeline for PPDL research based on model transfers across institutions and highlights the practicality of continual learning in real-world medical settings using private EHR data.
当一个深层次学习模型在不同的数据集上连续培训时,它往往忘记从以往数据中获取的知识,这是一个被称为灾难性的遗忘的问题。这损害了该模型在不同的数据集上的业绩,而该模型在基于转移学习(TL)的隐私保护深层次学习(PPDL)应用中至关重要。为了克服这一点,我们在PPDL框架内引入了“复习学习”(RevL),这是使用电子健康记录(EHR)进行诊断预测的低成本持续学习算法。RevL从用于审查从以往数据集中获取的知识的模型中生成数据样本。六次模拟机构实验和一次真实世界实验涉及三个医疗机构,利用三个二元分类EHR数据对RevL进行验证。在现实世界实验中使用106,508名病人的数据,接收者操作曲线下的平均全球区域为0.710,TL为0.655。这些结果表明,RevL有能力保留以前学到的知识及其在现实世界的PDDL假设中的有效性。我们的工作为PDL研究建立了现实的管道,其基础是各机构之间的模型转移,并突出了在实际学习世界中不断学习的实践环境。
Article 137
Title@2025-06-26 (4): Distilling Normalizing Flows
Title: Distilling Normalizing Flows | Destillieren von Normalisierungsströmen | 保持正常流动 2506.21003v1 |
Authors (6): Steven Walton, Valeriy Klyukin, Maksim Artemev, Denis Derkach, Nikita Orlov, Humphrey Shi
Explicit density learners are becoming an increasingly popular technique for generative models because of their ability to better model probability distributions. They have advantages over Generative Adversarial Networks due to their ability to perform density estimation and exact latent-variable inference. This has many advantages, including being able to simply interpolate, calculate sample likelihood, and analyze the probability distribution. The downside of these models is that they are often more difficult to train and have lower sampling quality. Normalizing flows are explicit density models that use composable bijective functions to turn an intractable probability function into a tractable one. In this work, we present novel knowledge distillation techniques to increase the sampling quality and density estimation of smaller student normalizing flows. We seek to study the capacity of knowledge distillation in Compositional Normalizing Flows to understand the benefits and weaknesses provided by these architectures. Normalizing flows have unique properties that allow for non-traditional forms of knowledge transfer, where we can transfer that knowledge within intermediate layers. We find that through this distillation, we can make students significantly smaller while making substantial performance gains over a non-distilled student. With smaller models there is a proportionally increased throughput, as this depends on the number of bijectors, and thus parameters, in the network.
显性密度学习者由于有能力更好地模拟概率分布,正在成为一种日益流行的基因模型的基因模型技术。他们比基因反反转网络具有优势,因为他们有能力进行密度估计,并且具有精确的潜伏可变推法。这有许多优势,包括:能够简单地进行内插,计算样本可能性,分析概率分布。这些模型的下方是,它们往往更难进行培训,而且抽样质量较低。 标准化流是明确的密度模型,它们使用可折射的双向函数,将一个难以调和的概率函数转换成可移动的函数。在这项工作中,我们提出了新的知识蒸馏技术,以提高对较小学生正常流动的抽样质量和密度估计。我们试图研究在结构的正常流动中进行知识蒸馏的能力,以了解这些结构所提供的好处和弱点。 正常流动具有独特的特性,允许非传统形式的知识转让,我们可以在中间层中传输知识。我们发现,通过这种蒸馏,我们可以使学生大大缩小规模,同时在非静止的学生正常的网络参数上取得实质性的绩效收益。我们试图研究,这种模型通过一个小的成成成成成一个成一个成一个成一个成一个成一个成一成一成一成一成一的模型。
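For reference, the standard change-of-variables identity behind the phrase "composable bijective functions" can be written as below. This is the textbook formula for normalizing flows, not a formula quoted from the paper; the symbols $f_k$, $h_k$, $p_X$, and $p_Z$ are notation introduced here for illustration.

```latex
% A flow composes K bijectors: f = f_K \circ \cdots \circ f_1, with h_0 = x and h_k = f_k(h_{k-1}).
\log p_X(x) = \log p_Z\bigl(f(x)\bigr)
  + \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(h_{k-1})}{\partial h_{k-1}} \right|
```

Each intermediate $h_k$ lives in the same space as the data, which is one way to read the abstract's remark that knowledge can be transferred within intermediate layers; the number of terms in the sum also grows with the number of bijectors, matching the throughput comment.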
Article 138
Title@2025-06-26 (4): Genetic Algorithm with Innovative Chromosome Patterns in the Breeding Process
Title: Genetic Algorithm with Innovative Chromosome Patterns in the Breeding Process | Genetischer Algorithmus mit innovativen Chromosomenmustern im Zuchtprozess | 育种过程中创新性染色体模式的遗传数值 2501.18184v3 |
Authors (1): Qingchuan Lyu
This paper proposes Genetic Algorithm with Border Trades (GAB), a novel modification of the standard genetic algorithm that enhances exploration by incorporating new chromosome patterns in the breeding process. This approach significantly mitigates premature convergence and improves search diversity. Empirically, GAB achieves up to 8x higher fitness and 10x faster convergence on complex job scheduling problems compared to standard Genetic Algorithms, reaching average fitness scores of 888 versus 106 in under 20 seconds. On the classic Flip-Flop problem, GAB consistently finds optimal or near-optimal solutions in fewer generations, even as input sizes scale to thousands of bits. These results highlight GAB as a highly effective and computationally efficient alternative for solving large-scale combinatorial optimization problems.
本文建议对标准遗传算法进行新的修改,将新的染色体模式纳入育种过程,从而增进探索,从而大大减轻过早的趋同,改进了搜索的多样性,与标准的遗传算法相比,GAB在复杂的工作时间安排问题上取得了高达8倍的健身率和10倍的更快的趋同,在20秒内达到888比106的平均健身率。关于典型的Flip-Flop问题,GAB在较少的几代人中始终能找到最佳或接近最佳的解决方案,即使是作为成千位位的输入大小。这些结果突出表明GAB是解决大规模组合优化问题的高效和计算效率高的替代办法。
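As a point of reference for the Flip-Flop benchmark mentioned above, here is a minimal sketch of the commonly used Flip-Flop fitness (the number of adjacent bit pairs that differ) together with a plain elitist genetic algorithm. This is the standard baseline only; it does not implement the Border Trades mechanism of GAB, whose details are not given in the abstract, and all function names and hyperparameters are illustrative assumptions.

```python
import random


def flip_flop_fitness(bits):
    """Count adjacent positions whose bits differ (maximum is len(bits) - 1)."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def one_point_crossover(p1, p2, rng):
    """Standard one-point crossover; the cut is chosen uniformly inside the string."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def standard_ga(n_bits=32, pop_size=40, generations=100, rate=0.02, seed=0):
    """Baseline GA: keep the fitter half, refill with mutated crossover children."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=flip_flop_fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = [
            mutate(one_point_crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=flip_flop_fitness)


if __name__ == "__main__":
    best = standard_ga()
    print(flip_flop_fitness(best), "out of a possible", len(best) - 1)
```

Because the fitter half of the population is carried over unchanged, the best fitness never decreases between generations; premature convergence in this baseline shows up as a population that becomes nearly identical before reaching the optimum.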
Article 139
Title@2025-06-26 (4): Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Title: Pretrained Reversible Generation as Unsupervised Visual Representation Learning | Pretrained Reversible Generation als unüberwachtes visuelles Repräsentationslernen | 作为无人监督的视觉代表学习 2412.01787v3 |
Authors (7): Rongkun Xue, Jinouwen Zhang, Yazhe Niu, Dazhong Shen, Bingqi Ma, Yu Liu, Jing Yang
Recent generative models based on score matching and flow matching have significantly advanced generation tasks, but their potential in discriminative tasks remains underexplored. Previous approaches, such as generative classifiers, have not fully leveraged the capabilities of these models for discriminative tasks due to their intricate designs. We propose Pretrained Reversible Generation (PRG), which extracts unsupervised representations by reversing the generative process of a pretrained continuous generation model. PRG effectively reuses unsupervised generative models, leveraging their high capacity to serve as robust and generalizable feature extractors for downstream tasks. This framework enables the flexible selection of feature hierarchies tailored to specific downstream tasks. Our method consistently outperforms prior approaches across multiple benchmarks, achieving state-of-the-art performance among generative-model-based methods, including 78% top-1 accuracy on ImageNet at a resolution of 64$\times$64. Extensive ablation studies, including out-of-distribution evaluations, further validate the effectiveness of our approach. Code is available at https://github.com/opendilab/PRG.
基于分数比对和流动比对的最近基因化模型具有显著进步的代代相传任务,但它们在歧视性任务方面的潜力仍未得到充分探讨。以前的方法,如基因分类师,由于设计复杂,没有充分利用这些模型的能力来完成歧视性任务。我们提议,通过扭转事先经过训练的连续代比对模型的基因化过程,产生不受监督的表示。PRG有效地重新利用不受监督的基因化模型,利用其强大的能力作为下游任务的强大和可通用的特征提取器。这个框架使得能够灵活选择适合具体下游任务的特征等级。我们的方法始终超越了先前的方法,超越了多种基准,在基于基因化模型的方法中取得最先进的性能,包括64*64年决议对图像网的78%至1的精度。广泛的放大研究,包括分配外评价,进一步验证我们的方法的有效性。代码见https://github.com/opendilab/PRG。
Article 140
Title@2025-06-26 (4): Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance
Title: Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance | Schritt für Schritt Video-zu-Audio-Synthese über Negative Audio-Anleitung | 通过消极音频指导,逐步进行视频到视听合成 2506.20995v1 |
Authors (4): Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji
We propose a novel step-by-step video-to-audio generation method that sequentially produces individual audio tracks, each corresponding to a specific sound event in the video. Our approach mirrors traditional Foley workflows, aiming to capture all sound events induced by a given video comprehensively. Each generation step is formulated as a guided video-to-audio synthesis task, conditioned on a target text prompt and previously generated audio tracks. This design is inspired by the idea of concept negation from prior compositional generation frameworks. To enable this guided generation, we introduce a training framework that leverages pre-trained video-to-audio models and eliminates the need for specialized paired datasets, allowing training on more accessible data. Experimental results demonstrate that our method generates multiple semantically distinct audio tracks for a single input video, leading to higher-quality composite audio synthesis than existing baselines.
我们建议一种新型的逐步录相到声频生成方法, 依次生成单个音频音轨, 每个音频音轨都对应视频中的具体事件。 我们的方法反映了传统的Foley工作流程, 目的是全面捕捉由特定视频引发的所有音频事件。 每一代步骤都是以一个目标文本为条件的带指导的视频到声频合成任务, 以一个快速和先前生成的音频音轨为条件。 这个设计受先前组成生成框架否定概念的理念的启发。 为了让这个有指导的一代能够利用预先培训的视频到音频模型, 并消除专门配对数据集的需求, 从而能够就更便于获取的数据进行培训。 实验结果显示, 我们的方法为单一输入视频生成了多种语言上不同的音轨, 从而导致比现有基线更高质量的复合音频合成。
Article 141
Title@2025-06-26 (4): Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity
Title: Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity | Überbrückung der Lücke zwischen Annäherung und Lernen durch Optimale Annäherung durch ReLU MLPs der Maximalregularität | 通过最大合规性RELU MLP,通过最佳接近缩小接近与学习之间的差距 2409.12335v4 |
Authors (2): Ruiyang Hong, Anastasis Kratsios
The foundations of deep learning are supported by the seemingly opposing perspectives of approximation and learning theory. The former advocates for large/expressive models that need not generalize, while the latter considers classes that generalize but may be too small/constrained to be universal approximators. Motivated by real-world deep learning implementations that are both expressive and statistically reliable, we ask: “Is there a class of neural networks that is large enough to be universal yet structured enough to generalize?” This paper constructively provides a positive answer to this question by identifying a highly structured class of ReLU multilayer perceptrons (MLPs), which are optimal function approximators and are statistically well-behaved. We show that any $(L,\alpha)$-H\"{o}lder function from $[0,1]^d$ to $[-n,n]$ can be approximated to a uniform $\mathcal{O}(1/n)$ error on $[0,1]^d$ with a sparsely connected ReLU MLP with the same H\"{o}lder exponent $\alpha$ and coefficient $L$, of width $\mathcal{O}(dn^{d/\alpha})$, depth $\mathcal{O}(\log(d))$, with $\mathcal{O}(dn^{d/\alpha})$ nonzero parameters, and whose weights and biases take values in $\{0,\pm 1/2\}$ except in the first and last layers, which instead have magnitude at most $n$. Further, our class of MLPs achieves a near-optimal sample complexity of $\mathcal{O}(\log(N)/\sqrt{N})$ when given $N$ i.i.d. normalized sub-Gaussian training samples. We achieve this through a new construction that perfectly fits together linear pieces using Kuhn triangulations, along with a new proof technique which shows that our construction preserves the regularity of not only the H\"{o}lder functions, but also any uniformly continuous function. Our results imply that neural networks can solve the McShane extension problem on suitable finite sets.
深层次学习的基础由近似或学习理论的表面相反视角支持。 前一个大/ 表达模型的代言人不需要泛泛化, 而后一个代言人认为, 普通化但太小/ 约束性过小, 无法成为通用的相近者。 我们问 : “ 真实的深层次学习执行过程的动力, 既能表达性又统计性可靠 ” 神经网络的大小是否足够普遍, 但结构性足以概括化? 本文建设性地回答了这一问题, 确定了高度结构化的 RELU多层概念( MLPs) , 最优化的功能是相似的, 高级的, 并且统计性地表示 $1, 美元到 $, 美元, 并且当我们的新( =% 美元) 和 美元( 美元) 的变现性能, 当我们的新( RU) MPseral_ 和 美元( 美元) 的变异性( 美元) 开始, 以美元 美元/ Omax d) 。
Article 142
Title@2025-06-26 (4): SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes
Title: SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes | SharpZO: Hybrid Sharpness-Aware Vision Sprachmodell Prompt Tuning via Forward-Only Passes | SharpZO: 混合尖锐-敏锐视觉语言模型,通过前向-单行道快速调试 2506.20990v1 |
Authors (6): Yifan Yang, Zhen Zhang, Rupak Vignesh Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang
Fine-tuning vision language models (VLMs) has achieved remarkable performance across various downstream tasks; yet, it requires access to model gradients through backpropagation (BP), making it unsuitable for memory-constrained, inference-only edge devices. To address this limitation, previous work has explored various BP-free fine-tuning methods. However, these approaches often rely on high-variance evolutionary strategies (ES) or zeroth-order (ZO) optimization, and often fail to achieve satisfactory performance. In this paper, we propose a hybrid Sharpness-aware Zeroth-order optimization (SharpZO) approach, specifically designed to enhance the performance of ZO VLM fine-tuning via sharpness-aware warm-up training. SharpZO features a two-stage optimization process: a sharpness-aware ES stage that globally explores and smooths the loss landscape to construct a strong initialization, followed by a fine-grained local search via sparse ZO optimization. The entire optimization relies solely on forward passes. Detailed theoretical analysis and extensive experiments on CLIP models demonstrate that SharpZO significantly improves accuracy and convergence speed, achieving up to 7% average gain over state-of-the-art forward-only methods.
微调视觉语言模型(VLMS)在各种下游任务中取得了显著的成绩;然而,它要求通过反向调整(BP)获得模型梯度,使其不适合于记忆限制的、仅推断的边缘设备。为解决这一局限性,先前的工作探索了各种不考虑BP的微调方法。然而,这些方法往往依赖于高差异进化战略(ES)或零级优化(ZO),往往不能取得令人满意的业绩。在本文件中,我们建议采用混合的夏丁斯-觉Zeros-顺序优化(SharpZO)方法,具体设计该方法的目的是通过敏锐度-敏锐的热度培训提高ZO VLM微调的性能。 SharpZO具有两个阶段的优化过程:敏锐的ES级演化阶段,通过全球探索和平缓冲损失场景以构建强大的初始化,然后通过稀薄ZO优化进行精细的本地搜索。整个优化完全依靠前方的传递。关于CLIP模型的详细理论分析和广泛的实验表明,SharpZO的高级前进速度大大改进了前向率和速度。
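To make "relies solely on forward passes" concrete, the sketch below shows the generic two-point zeroth-order gradient estimator that forward-only methods commonly build on. It is not SharpZO itself: the sharpness-aware ES warm-up and the sparse ZO stage are not reproduced, and the function names, step sizes, and the toy quadratic loss are assumptions chosen for illustration.

```python
import random


def zo_gradient(loss, params, mu, rng, n_samples=8):
    """Estimate the gradient of `loss` at `params` using only loss evaluations.

    For each random Gaussian direction u, the central difference
    (loss(p + mu*u) - loss(p - mu*u)) / (2*mu) approximates the directional
    derivative along u; averaging it times u gives a gradient estimate.
    """
    dim = len(params)
    grad = [0.0] * dim
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = loss([p + mu * ui for p, ui in zip(params, u)])
        minus = loss([p - mu * ui for p, ui in zip(params, u)])
        scale = (plus - minus) / (2.0 * mu * n_samples)
        for i in range(dim):
            grad[i] += scale * u[i]
    return grad


def zo_sgd(loss, params, lr=0.05, mu=1e-3, steps=200, seed=0):
    """Plain gradient descent driven by the zeroth-order estimate above."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(steps):
        g = zo_gradient(loss, params, mu, rng)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


if __name__ == "__main__":
    target = [1.0, -2.0, 0.5]
    quadratic = lambda p: sum((pi - ti) ** 2 for pi, ti in zip(p, target))
    print(zo_sgd(quadratic, [0.0, 0.0, 0.0]))
```

Each step costs 2 * n_samples forward evaluations and no backward pass; the variance of the estimate grows with the parameter dimension, which is the difficulty the abstract attributes to existing ES and ZO approaches.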
Article 143
Title@2025-06-26 (4): Can Gradient Descent Simulate Prompting?
Title: Can Gradient Descent Simulate Prompting? | Kann Gradient Descent Simulate Prompting? | 梯子源模拟能刺激吗? 2506.20989v1 |
Authors (3): Eric Zhang, Leshem Choshen, Jacob Andreas
There are two primary ways of incorporating new information into a language model (LM): changing its prompt or changing its parameters, e.g., via fine-tuning. Parameter updates incur no long-term storage cost for model changes. However, for many model updates, prompting is significantly more effective: prompted models can generalize robustly from single examples and draw logical inferences that do not occur under standard fine-tuning. Can models be modified so that fine-tuning does emulate prompting? This paper describes a method for meta-training LMs such that gradient updates emulate the effects of conditioning on new information. Our approach uses tools from gradient-based meta-learning but uses an LM’s own prompted predictions as targets, eliminating the need for ground-truth labels. Subsequent gradient descent training recovers some (and occasionally all) of prompted model performance, showing improvement on the “reversal curse” tasks and answering questions about text passages after a single gradient update. These results suggest that, with appropriate initialization, gradient descent can be surprisingly expressive. Our results suggest new avenues for long-context modeling and offer insight into the generalization capabilities of gradient-based learning.
将新信息纳入语言模式(LM)有两种主要方式:改变其快速或改变其参数,例如微调。参数更新不产生长期存储成本。然而,对于许多模型更新来说,推动效果要大得多:推动模型能够从单一实例中强有力地概括,并得出标准微调下不会发生的逻辑推论。可以修改模型,以便微调能够效仿推理结果吗?本文描述了一个元培训LM方法,例如梯度更新可以模仿调整新信息的效果。我们的方法使用基于梯度的元学习工具,但使用LM自己推动的预测作为目标,消除对地面真相标签的需求。随后的梯度下降培训恢复了一些(有时还恢复全部)激励模型性能 – – 显示“逆向诅咒”任务的改进,并在单一梯度更新后回答关于文本段落的问题。这些结果表明,随着适当的初始化,梯度下降率可以令人惊讶地表达。我们的结果显示,通过新的途径可以进行长文本模型建模和直观基于梯度的一般学习能力。
Article 144
Title@2025-06-26 (4): Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem
Title: Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem | Split-Merge: Ein unterschiedsbasierter Ansatz für das Dominante Eigenwertproblem | Split-Merge:对支配性电子价值问题采取基于差异的办法 2501.15131v2 |
Authors (2): Xiaozhi Liu, Yong Xia
The computation of the dominant eigenvector of symmetric positive semidefinite matrices is a cornerstone operation in numerous optimization-driven applications. Traditional methods, typically based on the Quotient formulation, often suffer from challenges related to computational efficiency and reliance on prior spectral knowledge. In this work, we leverage the alternative Difference formulation to reinterpret the classical power method as a first-order optimization algorithm. This perspective allows for a novel convergence analysis and facilitates the development of accelerated variants with larger step-sizes, achieving faster convergence without additional computational cost. Building on this insight, we introduce a generalized family of Difference-based methods, with the power method as a special case. Within this family, we propose Split-Merge, an algorithm that attains accelerated convergence without requiring spectral knowledge and operates solely via matrix-vector products. Extensive experiments on both synthetic and real-world datasets demonstrate that Split-Merge consistently outperforms state-of-the-art methods in both efficiency and scalability. In particular, it achieves more than a $10\times$ speedup over the classical power method, underscoring its practical effectiveness for large-scale problems.
计算正正半不完全基质矩阵的占主导性成象仪是许多优化驱动的应用的基石。通常基于\ textituuatient} 配方的传统方法通常在计算效率和依赖先前光谱知识方面面临挑战。在这项工作中,我们利用替代的\ textit{ difference} 配方来重新解释古典权力方法,将其作为一阶优化算法。这一视角使得可以进行新的趋同分析,并有利于开发具有更大分级尺寸的加速变方,在不增加计算成本的情况下实现更快的趋同。基于这一洞察,我们引入了基于差异方法的普遍组合,以权力方法为特例。在这个组中,我们提出“Splet-Merge”算法,这种算法在不需要光谱知识的情况下实现加速趋同,并且仅仅通过矩阵-矢量产品运作。关于合成和真实世界数据集的广泛实验表明,Splet-Melge在效率和可扩展性两方面都始终超越了状态。特别是,它比一个超高速度的模型问题更高。
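The classical power method that the abstract uses as its baseline can be sketched in a few lines. This is the textbook algorithm, not Split-Merge, whose update rule is not given in the abstract; the helper names and the stopping rule are illustrative assumptions. Like the method described, it touches the matrix only through matrix-vector products.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def power_method(A, iters=500, tol=1e-12):
    """Approximate the dominant eigenpair of a symmetric PSD matrix A.

    Assumption: the uniform starting vector is not orthogonal to the
    dominant eigenvector; a random start is the usual safeguard.
    """
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        y = matvec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break  # x lies in the null space of A
        x = [v / norm for v in y]
        # Rayleigh quotient of the unit vector x
        new_lam = sum(xi * yi for xi, yi in zip(x, matvec(A, x)))
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam, x


if __name__ == "__main__":
    lam, vec = power_method([[4.0, 1.0], [1.0, 3.0]])
    print(lam, vec)
```

The convergence rate is governed by the ratio of the second-largest to the largest eigenvalue, so matrices with a small spectral gap need many iterations; that is the regime where accelerated variants are most useful.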
Article 145
Title@2025-06-26 (4): Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations
Title: Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations | Generalisierte Tensor-basierte Parameter-Effizient Feinsteuerung über Lie Group Transformationen | 通过 “ 谎言集团变形 “ 进行通用的Tensor基准参数有效精美调整 2504.00851v2 |
Authors (6): Chongjie Si, Zhiyi Shi, Xuehui Wang, Yichen Xiao, Xiaokang Yang, Wei Shen
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. However, the wide range of tasks and high computational costs make full fine-tuning impractical. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. Despite the success of these methods, they are primarily designed for linear layers, focusing on two-dimensional matrices while largely ignoring higher-dimensional parameter spaces like convolutional kernels. Moreover, directly applying these methods to higher-dimensional parameter spaces often disrupts their structural relationships. Given the rapid advancements in matrix-based PEFT methods, rather than designing a specialized strategy, we propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties. Specifically, we treat parameters as elements of a Lie group, with updates modeled as perturbations in the corresponding Lie algebra. These perturbations are mapped back to the Lie group through the exponential map, ensuring smooth, consistent updates that preserve the inherent structure of the parameter space. Extensive experiments on computer vision and natural language processing validate the effectiveness and versatility of our approach, demonstrating clear improvements over existing methods.
为各种下游任务调整经过培训的基建模型是人工智能中的一种核心做法。然而,由于任务繁多且计算成本高,因此完全微调不切实际。为了克服这一点,LORA等参数高效微调方法(PEFT)已经出现,并正在成为一个日益突出的研究焦点。尽管这些方法取得了成功,但它们主要针对线性层设计,侧重于二维矩阵,同时基本上忽略了像卷轴这样的高维参数空间。此外,直接将这些方法应用于高维参数空间往往会破坏它们的结构关系。鉴于基于矩阵的PEFT方法的快速进步,而不是设计专门战略,我们建议一种将基于矩阵的PEEFT方法推广到更高维度的参数空间而不损害其结构特性的一般化方法。具体地说,我们把参数当作一个Lie组的要素,在相应的Lie algebra 中以触觉为模型进行更新。这些参数通过指数映射图被追溯到Lie组,确保平稳、一致的更新,以维护参数空间的固有结构。关于计算机视觉和自然语言处理方法的宽度和反向验证现有方法的改进。
Article 146
Title@2025-06-26 (4): Explainable quantum regression algorithm with encoded data structure
Title: Explainable quantum regression algorithm with encoded data structure | Erklärbarer Quantenregressionsalgorithmus mit kodierter Datenstruktur | 具有编码数据结构的可解释量子回归算法 2307.03334v5 |
Authors (6): C. -C. Joseph Wang, F. Perkkola, I. Salmenperä, A. Meijer-van de Griend, J. K. Nurminen, R. S. Bennink
Hybrid variational quantum algorithms (VQAs) are promising for solving practical problems such as combinatorial optimization, quantum chemistry simulation, quantum machine learning, and quantum error correction on noisy quantum computers. However, with a typical random ansatz or quantum alternating operator ansatz, the derived variational quantum algorithms become black boxes that cannot be trusted for model interpretation, let alone deployed in applications that inform critical decisions: the resulting variational parameters are just rotation angles for the quantum gates and have nothing to do with the interpretable values that a model can provide directly. In this paper, we construct the first interpretable quantum regression algorithm, in which the quantum state exactly encodes the classical data table and the variational parameters correspond directly to the regression coefficients, which are real numbers by construction, providing a high degree of model interpretability and a minimal optimization cost thanks to having just the right expressiveness. We also take advantage of the encoded data structure to reduce the time complexity of computing the regression map. To shorten the circuit depth for nonlinear regression, our algorithm can be extended by building nonlinear features through classical preprocessing and encoding them as independent column vectors. Although the authors have recently realized compressed encoding on superconducting qubits using a less noisy encoding scheme, we envision potential quantum utility with multi-qubit gates implemented in neutral cold atoms and ions.
混合混合量子算法(VQAs)对于解决诸如组合优化、量子化学模拟、量子机器学习和在噪音量子计算机上校正量子错误等实际问题很有希望。然而,由于典型的随机 ansatz 或量交替操作员 ansatz, 衍生的量子算法成为了一个黑盒,无法相信模型解释,更不用说作为应用工具的部署,为关键决定提供信息:这些变异参数的结果仅仅是量子门的旋转角度,与模型直接提供的可解释值无关。在本文中,我们构建了第一个可解释的量子回归算法,其中量子状态精确地编码了古典数据表和变异参数,这些参数直接对应回归系数,这些参数是真实的,提供了高度的模型解释性和最低的成本,以优化模型解释。我们还利用编码数据结构来降低计算回归图的时间复杂性。为了缩短非线性回归的电路深度,我们算算算法可以通过通过建立古典前量回归算性量子回归算法来扩展我们的算法,通过古典前处理方式将古型的量回归法化前的特性纳入古体化数据,作为中级数据,而实现了中级级级化的内压级化的高级数据,但我们通过最近实现了制版版级化的磁数据级化的磁数据级化的高级数据,通过制版化的机机母化的机母体化的机能,在制的机母矢算,我们实现了,通过较低的机母化的机母母化的机母体,我们实现了了在制化的机能级母体化的机母体化的机母体化的机母体,我们实现了了实现了通过最近的磁母体化的磁母体化的磁母体级母体级母体化的磁母体化的机母体化的机母体化机母体化机母体化机母体化机母体化的机母体化机母体化机母体上,我们通过实现。
Article 147
Title@2025-06-26 (4): EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora
Title: EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora | Erarag: Effiziente und inkrementelle retrieval Augmented Generation für wachsende Corpora | EraRAG: 增长企业的高效和递增回收增量增殖型增殖型增殖型增殖型增殖型增殖型增殖型 2506.20963v1 |
Authors (9): Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang, Xiaofang Zhou
Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates. Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures, enabling efficient and localized insertions of new data without disrupting the existing topology. The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency. Experiments on large-scale benchmarks demonstrate that EraRAG achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy. This work offers a practical path forward for RAG systems that must operate over continually growing corpora, bridging the gap between retrieval efficiency and adaptability. Our code and data are available at https://github.com/EverM0re/EraRAG-Official.
为克服这些限制,我们引入了基于图表的多层图集框架(Graph-RAG),它支持高效和可缩放的动态更新。我们的方法利用超机基于地方敏感散列(LSH),将原体分割并组织成等级图表结构,从而能够在不干扰现有表层结构的情况下高效和局部地插入新数据。设计消除了在新文件到达时进行再培训或代价高昂的重新计算的需要,同时保持高检索精确度和低悬浮度。大规模基准实验表明,EraRag与现有的图集系统相比,在更新时间和象征性消费方面达到了一定的降幅,同时提供了更高的精确性性性能。这项工作为RAG系统提供了一条实用的前进道路,这些系统必须超越不断增长的Corporia/M 数据调适度。
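The idea of hyperplane-based LSH with localized insertions can be illustrated with a small sketch. It shows only the generic sign-of-projection hashing scheme; EraRAG's multi-layer graph construction and its bucket management are not described in the abstract and are not reproduced here, and the names `make_hyperplanes`, `lsh_code`, and `insert` are hypothetical.

```python
import random


def make_hyperplanes(dim, n_planes, seed=0):
    """Draw fixed random hyperplane normals; they must stay fixed across updates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_code(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0.0 else 0
        for plane in planes
    )


def insert(buckets, doc_id, vec, planes):
    """Add a document to the bucket named by its code; other buckets are untouched."""
    code = lsh_code(vec, planes)
    buckets.setdefault(code, []).append(doc_id)
    return code


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_planes=3)
    buckets = {}
    insert(buckets, "doc-a", [0.9, 0.1, 0.0, 0.2], planes)
    insert(buckets, "doc-b", [0.8, 0.2, 0.1, 0.1], planes)
    insert(buckets, "doc-c", [-0.7, 0.3, -0.9, 0.0], planes)
    print(buckets)
```

Because the hyperplanes are fixed, a new embedding's code depends only on that embedding, so an insertion modifies exactly one bucket; this locality is what lets an index grow without rebuilding the partitions that already exist.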
Article 148
Title@2025-06-26 (4): Antibody Design and Optimization with Multi-scale Equivariant Graph Diffusion Models for Accurate Complex Antigen Binding
Title: Antibody Design and Optimization with Multi-scale Equivariant Graph Diffusion Models for Accurate Complex Antigen Binding | Antikörper-Design und Optimierung mit mehrstufigen äquivarianten Graphen-Diffusions-Modellen für präzise, komplexe Antigen-Bindung | 防反体设计和优化,采用多种规模等同图形扩散模型,用于准确的复合抗原装订 2506.20957v1 |
Authors (4): Jiameng Chen, Xiantao Cai, Jia Wu, Wenbin Hu
Antibody design remains a critical challenge in therapeutic and diagnostic development, particularly for complex antigens with diverse binding interfaces. Current computational methods face two main limitations: (1) capturing geometric features while preserving symmetries, and (2) generalizing to novel antigen interfaces. Despite recent advancements, these methods often fail to accurately capture molecular interactions and maintain structural integrity. To address these challenges, we propose AbMEGD, an end-to-end framework integrating Multi-scale Equivariant Graph Diffusion for antibody sequence and structure co-design. Leveraging advanced geometric deep learning, AbMEGD combines atomic-level geometric features with residue-level embeddings, capturing local atomic details and global sequence-structure interactions. Its E(3)-equivariant diffusion method ensures geometric precision, computational efficiency, and robust generalizability for complex antigens. Furthermore, experiments using the SAbDab database demonstrate a 10.13% increase in amino acid recovery, a 3.32% rise in improvement percentage, and a 0.062 Å reduction in root mean square deviation within the critical CDR-H3 region compared to DiffAb, a leading antibody design model. These results highlight AbMEGD’s ability to balance structural integrity with improved functionality, establishing a new benchmark for sequence-structure co-design and affinity optimization. The code is available at: https://github.com/Patrick221215/AbMEGD.
抗体设计仍然是治疗和诊断发展中的一项重大挑战,特别是对于具有多种约束界面的复杂抗原而言。当前的计算方法面临两个主要的局限性:(1)在保持对称的同时捕捉几何特征,以及(2)普及新型抗原界面。尽管最近取得了进步,但这些方法往往无法准确捕捉分子互动并保持结构完整性。为了应对这些挑战,我们提议了\ textbf{AMUGD},一个端到端的框架,将 & textbf{Multy-sial-altbf{E}qualevariant\ textb{G}Lapeh\textb{Dright_DrightA}Drextf{D}}}融合用于抗体序列和结构共同设计。尽管最近有所进步,但这些方法往往无法准确捕捉分子互动并保持结构完整性。为了应对这些挑战,我们提议了E(3)-Q-Q-Qrealfity传播模式,确保了这些地球测量精确性、计算效率,以及复杂抗原基因的可靠通用代码。此外,使用SAbDA-deal-dealdealdealalalalalal-ralal-ral-ralallieval-ral Aration Areval-revax a10.13 areval reval deal ral ral ral reval_reval_reval_reval_ral_ral rum a reval_rum a reval_ral_reval_reval_ reval_ rvial ral_ rvial__ral_ral_ral_ral__ralral_ral__ralral_rbal__rb__ a ral________ral_ral_ral_ral__ral_ral_ral______ral_ral__ral_____
Article 149
Title@2025-06-26 (4): Model State Arithmetic for Machine Unlearning
Title: Model State Arithmetic for Machine Unlearning | Modell Staat Arithmetik für Maschine Unlearning | 机器脱修示范国 2506.20941v1 |
Authors (4): Keivan Rezaei, Mehrdad Saberi, Abhilasha Ravichander, Soheil Feizi
Large language models are trained on massive corpora of web data, which may include private data, copyrighted material, factually inaccurate data, or data that degrades model performance. Eliminating the influence of such problematic datapoints through complete retraining (repeatedly pretraining the model on datasets that exclude these specific instances) is computationally prohibitive. For this reason, unlearning algorithms have emerged that aim to eliminate the influence of particular datapoints at a low computational cost while otherwise preserving the model. However, precisely estimating and undoing the influence of individual datapoints has proved to be challenging. In this work, we propose a new algorithm, MSA, for estimating and undoing the influence of datapoints by leveraging model checkpoints, i.e., artifacts capturing model states at different stages of pretraining. Our experimental results demonstrate that MSA consistently outperforms existing machine unlearning algorithms across multiple benchmarks, models, and evaluation metrics, suggesting that MSA could be an effective approach towards more flexible large language models that are capable of data erasure.
大型语言模型在网络数据大规模组合方面受过培训,其中可能包括私人数据、有版权的材料、事实不准确的数据,或降低模型性能的数据。通过全面再培训,反复对不包括这些具体实例的数据集模型进行预先培训,从而消除这类有问题的数据点的影响,这在计算上是令人望而却步。为此原因,出现了旨在消除特定数据点影响的不学习算法,同时以低计算成本保存模型。然而,精确估计和消除单个数据点的影响证明是具有挑战性的。在这项工作中,我们建议采用新的算法 MSA,用以估计和消除数据点的影响,方法是利用模型检查点,即记录预训练不同阶段模型状态的产物。我们的实验结果表明,MSA 一贯优于多种基准、模型和评价衡量标准的现有机器不学习算法,表明 MSA 可以是一种有效的方法,用于更灵活的、能够消除数据的大语言模型。
Article 150
Title@2025-06-26 (4): Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics
Title: Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics | Prognose geopolitischer Ereignisse mit einem spare Temporal Fusion Transformer und Gaußschen Prozesshybrid: Eine Fallstudie in Nahost und US-Konfliktdynamik | 以松散的时空融合变异器和高斯进程混合器预测地缘政治事件:中东和美国冲突动态案例研究 2506.20935v1 |
Authors (2): Hsin-Hsiung Huang, Hayden Hampton
Forecasting geopolitical conflict from data sources like the Global Database of Events, Language, and Tone (GDELT) is a critical challenge for national security. The inherent sparsity, burstiness, and overdispersion of such data cause standard deep learning models, including the Temporal Fusion Transformer (TFT), to produce unreliable long-horizon predictions. We introduce STFT-VNNGP, a hybrid architecture that won the 2023 Algorithms for Threat Detection (ATD) competition by overcoming these limitations. Designed to bridge this gap, our model employs a two-stage process: first, a TFT captures complex temporal dynamics to generate multi-quantile forecasts. These quantiles then serve as informed inputs for a Variational Nearest Neighbor Gaussian Process (VNNGP), which performs principled spatiotemporal smoothing and uncertainty quantification. In a case study forecasting conflict dynamics in the Middle East and the U.S., STFT-VNNGP consistently outperforms a standalone TFT, showing a superior ability to predict the timing and magnitude of bursty event periods, particularly at long-range horizons. This work offers a robust framework for generating more reliable and actionable intelligence from challenging event data, with all code and workflows made publicly available to ensure reproducibility.
从全球事件、语言和语调数据库(GDELT)等数据来源预测地缘政治冲突,是国家安全面临的一个重大挑战。这些数据固有的稀疏性、突发性和过度离散性,导致包括时间融合变换器(TFT)在内的标准深度学习模型产生不可靠的长期预测。我们引入了 STFT-VNNGP,这是一个通过克服这些限制而赢得 2023 年威胁检测算法(ATD)竞赛的混合架构。为了弥合这一差距,我们的模型采用两阶段过程:首先,TFT 捕捉复杂的时间动态,以产生多分位数预测。这些分位数随后作为变分最近邻高斯过程(VNNGP)的输入,由其进行有原则的时空平滑和不确定性量化。在预测中东和美国冲突动态的案例研究中,STFT-VNNGP 始终优于单独的 TFT,在预测突发事件期的时间和规模方面表现出更强的能力,尤其是在长期预测范围上。这项工作为从具有挑战性的事件数据中生成更可靠、更可操作的情报提供了稳健的框架,所有代码和工作流程均已公开,以确保可重复性。
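Multi-quantile forecasts like those in the first stage above are commonly scored with the pinball (quantile) loss. The abstract does not name the loss it uses, so the following is only a minimal sketch under that assumption; the event counts and forecasts are made-up illustration values.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


# Hypothetical sparse, bursty weekly event counts and two quantile forecasts.
counts = [0, 0, 12, 1, 0]
forecast_q50 = [0, 1, 3, 1, 0]
forecast_q90 = [1, 2, 10, 3, 1]

loss_q50 = pinball_loss(counts, forecast_q50, 0.5)
loss_q90 = pinball_loss(counts, forecast_q90, 0.9)
print(loss_q50, loss_q90)
```

A high quantile level penalizes missing the burst far more than over-forecasting quiet weeks, which is one reason quantile outputs suit bursty count data.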
Article 151
Title@2025-06-26 (4): Lower Bounds on the Size of Markov Equivalence Classes
Title: Lower Bounds on the Size of Markov Equivalence Classes | Untere Grenzen auf der Größe der Markov-Äquivalenzklassen | 马克夫等等效类大小的下下界界圈 2506.20933v1 |
Authors (3): Erik Jahn, Frederick Eberhardt, Leonard J. Schulman
Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned about the underlying causal graph from purely observational data. Under the assumptions of acyclicity, causal sufficiency, and a uniform model prior, Markov equivalence classes are known to be small on average. In this paper, we show that this is no longer the case when any of these assumptions is relaxed. Specifically, we prove exponentially large lower bounds for the expected size of Markov equivalence classes in three settings: sparse random directed acyclic graphs, uniformly random acyclic directed mixed graphs, and uniformly random directed cyclic graphs.
因果发现算法通常只能将因果图恢复到其 Markov 等价类,除非作出额外的参数假设。这些等价类的大小反映了仅凭纯观察数据对底层因果图所能了解的限度。在无环性、因果充分性和均匀模型先验的假设下,已知 Markov 等价类平均而言很小。在本文中,我们表明,当这些假设中的任何一个被放宽时,情况就不再如此。具体地说,我们在三种情形下证明了 Markov 等价类期望大小的指数级下界:稀疏随机有向无环图、均匀随机无环有向混合图,以及均匀随机有向有环图。
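For intuition about what the size of a Markov equivalence class means, the sketch below counts it by brute force for tiny DAGs. It relies on the standard characterization that two DAGs are Markov equivalent exactly when they share a skeleton and the same v-structures. This is an illustration only, not the paper's proof technique, and it scales exponentially in the number of edges.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Repeatedly peel off nodes with no incoming edge from the remaining set."""
    remaining = set(nodes)
    edge_set = set(edges)
    while remaining:
        sources = [v for v in remaining
                   if not any(b == v and a in remaining for a, b in edge_set)]
        if not sources:
            return False  # every remaining node has a parent: there is a cycle
        remaining -= set(sources)
    return True


def v_structures(edges):
    """Unshielded colliders a -> c <- b where a and b are not adjacent."""
    edge_set = set(edges)
    adjacent = edge_set | {(b, a) for a, b in edge_set}
    found = set()
    for a, c in edge_set:
        for b, c2 in edge_set:
            if c2 == c and a < b and (a, b) not in adjacent:
                found.add((a, c, b))
    return found


def mec_size(nodes, edges):
    """Count DAGs with the same skeleton and v-structures as the given DAG."""
    target = v_structures(edges)
    count = 0
    for flips in product([False, True], repeat=len(edges)):
        candidate = [(b, a) if flip else (a, b)
                     for (a, b), flip in zip(edges, flips)]
        if is_acyclic(nodes, candidate) and v_structures(candidate) == target:
            count += 1
    return count


chain_size = mec_size([0, 1, 2], [(0, 1), (1, 2)])             # 0 -> 1 -> 2
collider_size = mec_size([0, 1, 2], [(0, 2), (1, 2)])          # 0 -> 2 <- 1
triangle_size = mec_size([0, 1, 2], [(0, 1), (0, 2), (1, 2)])  # complete DAG
print(chain_size, collider_size, triangle_size)
```

The chain cannot be distinguished from two of its reorientations, while the collider is identified uniquely, so the class size measures how much orientation information observational data leaves open.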
Article 152
Title@2025-06-26 (4): Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market
Title: Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market | Quantum-Verstärkung-Learning-Trading-Agent für Sektor-Rotation auf dem Aktienmarkt Taiwan | 台湾股市部门轮换的量级强化学习贸易代理 2506.20930v1 |
Authors (3): Chi-Sheng Chen, Xinyu Zhang, Ya-Chuan Chen
We propose a hybrid quantum-classical reinforcement learning framework for sector rotation in the Taiwan stock market. Our system employs Proximal Policy Optimization (PPO) as the backbone algorithm and integrates both classical architectures (LSTM, Transformer) and quantum-enhanced models (QNN, QRWKV, QASA) as policy and value networks. An automated feature engineering pipeline extracts financial indicators from capital share data to ensure consistent model input across all configurations. Empirical backtesting reveals a key finding: although quantum-enhanced models consistently achieve higher training rewards, they underperform classical models in real-world investment metrics such as cumulative return and Sharpe ratio. This discrepancy highlights a core challenge in applying reinforcement learning to financial domains – namely, the mismatch between proxy reward signals and true investment objectives. Our analysis suggests that current reward designs may incentivize overfitting to short-term volatility rather than optimizing risk-adjusted returns. This issue is compounded by the inherent expressiveness and optimization instability of quantum circuits under Noisy Intermediate-Scale Quantum (NISQ) constraints. We discuss the implications of this reward-performance gap and propose directions for future improvement, including reward shaping, model regularization, and validation-based early stopping. Our work offers a reproducible benchmark and critical insights into the practical challenges of deploying quantum reinforcement learning in real-world finance.
我们为台湾股票市场的板块轮动提出了一个量子与经典混合的强化学习框架。我们的系统使用近端策略优化(PPO)作为主干算法,并将经典架构(LSTM、Transformer)和量子增强模型(QNN、QRWKV、QASA)作为策略网络和价值网络。一个自动化特征工程管道从股本数据中提取金融指标,以确保所有配置的模型输入一致。实证回测揭示了一个关键发现:虽然量子增强模型始终获得更高的训练奖励,但它们在累积回报和夏普比率等真实世界投资指标上落后于经典模型。这一差异突出了将强化学习应用于金融领域的一个核心挑战,即代理奖励信号与真正投资目标之间的不匹配。我们的分析表明,目前的奖励设计可能鼓励对短期波动的过度拟合,而不是优化风险调整后的回报。这一问题因量子电路在含噪中等规模量子(NISQ)约束下固有的表达能力和优化不稳定性而更加严重。我们讨论了这种奖励与绩效差距的影响,并提出了未来改进的方向,包括奖励塑形、模型正则化和基于验证的提前停止。我们的工作为在现实金融中部署量子强化学习的实际挑战提供了可复现的基准和重要见解。
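The two investment metrics named above can be computed from a series of per-period returns. The sketch below uses the textbook definitions; the annualization factor of 252 trading days and the sample returns are assumptions for illustration, not values taken from the paper.

```python
import math


def cumulative_return(returns):
    """Compound simple per-period returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return float("nan")  # undefined when returns never vary
    return mean / std * math.sqrt(periods_per_year)


# Hypothetical daily returns of a backtested strategy.
daily_returns = [0.01, -0.005, 0.02, 0.0]
total = cumulative_return(daily_returns)
annualized_sharpe = sharpe_ratio(daily_returns)
print(total, annualized_sharpe)
```

Neither quantity appears in a per-step reward unless the reward is designed around it, which is the proxy mismatch the abstract describes.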
Article 153
Title@2025-06-26 (4): Active Learning for Manifold Gaussian Process Regression
Title: Active Learning for Manifold Gaussian Process Regression | Aktives Lernen für manifolde Gaußsche Prozessregression | Gaussian 进程倒退的 Manifide Gaussian 正在学习 2506.20928v1 |
Authors (4): Yuanxing Cheng, Lulu Kang, Yiwei Wang, Chun Liu
This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, supervised by an active learning criterion that minimizes global prediction error. Experiments on synthetic data demonstrate superior performance over randomly sequential learning. The framework efficiently handles complex, discontinuous functions while preserving computational tractability, offering practical value for scientific and engineering applications. Future work will focus on scalability and uncertainty-aware manifold learning.
本文件为多重高斯进程(GP)回归引入了积极的学习框架,将多重学习与战略数据选择相结合,以提高高维空间的准确性。我们的方法共同优化了用于减少维度的神经网络和潜空的高斯进程倒退者,并辅之以一个能最大限度地减少全球预测错误的积极学习标准。合成数据的实验显示了优于随机顺序学习的优异性能。这个框架高效地处理复杂、不连续的功能,同时保持计算可牵引性,为科学和工程应用提供实用价值。未来的工作将侧重于可扩展性和有不确定性的多重学习。
Article 154
Title@2025-06-26 (4): Interpretable Representation Learning for Additive Rule Ensembles
Title: Interpretable Representation Learning for Additive Rule Ensembles | Interpretable Representative Learning for Additive Rule Ensembles | 补充规则会议的解释性代表性学习 2506.20927v1 |
Authors (4): Shahrzad Behzadimanesh, Pierre Le Bodic, Geoffrey I. Webb, Mario Boley
Small additive ensembles of symbolic rules offer interpretable prediction models. Traditionally, these ensembles use rule conditions based on conjunctions of simple threshold propositions $x \geq t$ on a single input variable $x$ and threshold $t$, resulting geometrically in axis-parallel polytopes as decision regions. While this form ensures a high degree of interpretability for individual rules and can be learned efficiently using the gradient boosting approach, it relies on having access to a curated set of expressive and ideally independent input features so that a small ensemble of axis-parallel regions can describe the target variable well. Absent such features, reaching sufficient accuracy requires increasing the number and complexity of individual rules, which diminishes the interpretability of the model. Here, we extend classical rule ensembles by introducing logical propositions with learnable sparse linear transformations of input variables, i.e., propositions of the form $\mathbf{x}^\mathrm{T}\mathbf{w} \geq t$, where $\mathbf{w}$ is a learnable sparse weight vector, enabling decision regions as general polytopes with oblique faces. We propose a learning method using sequential greedy optimization based on an iteratively reweighted formulation of logistic regression. Experimental results demonstrate that the proposed method efficiently constructs rule ensembles with the same test risk as state-of-the-art methods while significantly reducing model complexity across ten benchmark datasets.
由符号规则组成的小型加性集成提供了可解释的预测模型。传统上,这些集成使用的规则条件基于简单阈值命题的合取,即针对单个输入变量 $x$ 和阈值 $t$ 的命题 $x \geq t$,其决策区域在几何上是与坐标轴平行的多面体。虽然这种形式可以确保单个规则的高度可解释性,并且可以使用梯度提升方法高效地学习,但它依赖于一组经过整理、表达力强且最好相互独立的输入特征,这样少量轴平行区域才能很好地描述目标变量。缺少这样的特征时,要达到足够的精度就需要增加单个规则的数量和复杂度,从而降低模型的可解释性。在这里,我们通过引入对输入变量进行可学习的稀疏线性变换的逻辑命题来扩展经典规则集成,即形如 $\mathbf{x}^\mathrm{T}\mathbf{w} \geq t$ 的命题,其中 $\mathbf{w}$ 是可学习的稀疏权重向量,从而使决策区域成为具有斜面的一般多面体。我们提出了一种基于逻辑回归迭代重加权形式的顺序贪婪优化学习方法。实验结果表明,所提方法能够高效地构建规则集成,在十个基准数据集上达到与最先进方法相同的测试风险,同时显著降低模型复杂度。
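To make the proposition form $\mathbf{x}^\mathrm{T}\mathbf{w} \geq t$ concrete, here is a small sketch of how an additive rule ensemble with such conditions would score an input. The weights, thresholds, and rule coefficients are hypothetical, and the learning procedure (the iteratively reweighted greedy optimization) is not shown.

```python
def proposition(x, w, t):
    """True when the linear form x^T w reaches the threshold t."""
    return sum(xi * wi for xi, wi in zip(x, w)) >= t


def rule_fires(x, propositions):
    """A rule condition is a conjunction of propositions (w, t)."""
    return all(proposition(x, w, t) for w, t in propositions)


def ensemble_score(x, rules, intercept=0.0):
    """Additive ensemble: intercept plus the coefficient of every firing rule."""
    return intercept + sum(beta for beta, props in rules if rule_fires(x, props))


# Hypothetical two-rule ensemble over inputs x = (x0, x1).
rules = [
    # Axis-parallel rule: x0 >= 2 (the weight vector selects a single variable).
    (1.5, [([1.0, 0.0], 2.0)]),
    # Oblique rule: x0 - x1 >= 0 AND x1 >= 1 (sparse weights on two variables).
    (-0.7, [([1.0, -1.0], 0.0), ([0.0, 1.0], 1.0)]),
]

score_both = ensemble_score([3.0, 1.0], rules, intercept=0.2)
score_none = ensemble_score([1.0, 2.0], rules, intercept=0.2)
print(score_both, score_none)
```

The classical axis-parallel case is recovered when every weight vector has a single non-zero entry, so the oblique form strictly generalizes it.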
Article 155
Title@2025-06-26 (4): LLM-guided Chemical Process Optimization with a Multi-Agent Approach
Title: LLM-guided Chemical Process Optimization with a Multi-Agent Approach | LLM-geführte chemische Prozessoptimierung mit einem Multi-Agent-Ansatz | LLM-LLM-制导化学过程 优化采用多机构办法 2506.20921v1 |
Authors (5): Tong Zeng, Srivathsan Badrinarayanan, Janghoon Ock, Cheng-Kai Lai, Amir Barati Farimani
Chemical process optimization is crucial to maximize production efficiency and economic performance. Traditional methods, including gradient-based solvers, evolutionary algorithms, and parameter grid searches, become impractical when operating constraints are ill-defined or unavailable, requiring engineers to rely on subjective heuristics to estimate feasible parameter ranges. To address this constraint definition bottleneck, we present a multi-agent framework of large language model (LLM) agents that autonomously infer operating constraints from minimal process descriptions, then collaboratively guide optimization using the inferred constraints. Our AutoGen-based agentic framework employs OpenAI’s o3 model, with specialized agents for constraint generation, parameter validation, simulation execution, and optimization guidance. Through two phases - autonomous constraint generation using embedded domain knowledge, followed by iterative multi-agent optimization - the framework eliminates the need for predefined operational bounds. Validated on the hydrodealkylation process across cost, yield, and yield-to-cost ratio metrics, the framework demonstrated competitive performance with conventional optimization methods while achieving better computational efficiency, requiring fewer iterations to converge. Our approach converged in under 20 minutes, achieving a 31-fold speedup over grid search. Beyond computational efficiency, the framework’s reasoning-guided search demonstrates sophisticated process understanding, correctly identifying utility trade-offs, and applying domain-informed heuristics. This approach shows significant potential for optimization scenarios where operational constraints are poorly characterized or unavailable, particularly for emerging processes and retrofit applications.
化学流程优化对于最大限度地提高生产效率和经济绩效至关重要。传统方法,包括基于梯度的解决方案、进化算法和参数网格搜索等传统方法,在操作限制定义不当或无法提供的情况下,不切实际,要求工程师依赖主观超常性来估计可行的参数范围。为解决这一制约因素定义瓶颈,我们提出了一个由大型语言模型代理组成的多试剂框架,根据最低流程描述自动推断操作限制,然后用推断的制约因素共同指导优化。我们基于AutoGen的代理框架使用O3模式,由专门工具来生成制约、参数验证、模拟执行和优化指导。在两个阶段中,即利用嵌入域知识自主生成限制,然后是迭代多试优化 — 该框架消除了对预先界定操作界限的需要。在成本、收益和收益-成本比比率衡量尺度之间,框架以常规优化方法展示了竞争性业绩,同时实现更高的计算效率,需要更少的周期融合。我们的方法在20分钟内趋于一致,在使用嵌入式域内实现31倍的逆向后生成,在高级域域知识生成的快速递增速应用,然后进行迭式优化的版搜索,并展示了成本分析框架,从而展示了超越电网格的计算,从而显示新的电路段式的流程。
Article 156
Title@2025-06-26 (4): Machine learning of microstructure–property relationships in materials leveraging microstructure representation from foundational vision transformers
Title: Machine learning of microstructure–property relationships in materials leveraging microstructure representation from foundational vision transformers | Maschinelles Lernen von Mikrostruktur-Eigenschaftsbeziehungen in Materialien, die die Mikrostrukturdarstellung von grundlegenden Vision-Transformatoren nutzen | 利用基础视觉变压器代表微观结构的材料中微型结构-财产关系 2501.18637v2 |
Authors (2): Sheila E. Whitman, Marat I. Latypov
Machine learning of microstructure–property relationships from data is an emerging approach in computational materials science. Most existing machine learning efforts focus on the development of task-specific models for each microstructure–property relationship. We propose utilizing pre-trained foundational vision transformers for the extraction of task-agnostic microstructure features and subsequent light-weight machine learning of a microstructure-dependent property. We demonstrate our approach with pre-trained state-of-the-art vision transformers (CLIP, DINOv2, SAM) in two case studies on machine learning of: (i) the elastic modulus of two-phase microstructures based on simulation data; and (ii) the Vickers hardness of Ni-base and Co-base superalloys based on experimental data published in the literature. Our results show the potential of foundational vision transformers for robust microstructure representation and efficient machine learning of microstructure–property relationships without the need for expensive task-specific training or fine-tuning of bespoke deep learning models.
从数据中学习微型结构-财产关系的机器是计算材料科学中的一种新兴方法。大多数现有的机器学习努力侧重于为每个微型结构-财产关系开发具体任务模型。我们提议利用预先培训的基础视觉变压器提取任务-微型结构特征,随后对一个依赖微型结构的财产进行轻量机械学习。我们在两个关于机器学习的案例研究中展示了我们的做法,即:(一) 以模拟数据为基础的两阶段微型结构弹性模版;(二) 根据文献中公布的实验数据,Vicker对Ni-Base和Com-Base超合金的硬性。我们的结果表明,基础视觉变压器对于稳健的微型结构代表性和对微型结构-财产关系的高效机器学习具有潜力,不需要昂贵的特定任务培训或微深学习模型的微调。
Article 157
Title@2025-06-26 (4): Explainable AI for Radar Resource Management: Modified LIME in Deep Reinforcement Learning
Title: Explainable AI for Radar Resource Management: Modified LIME in Deep Reinforcement Learning | Erklärbare KI für Radar-Ressourcenmanagement: Modifizierte LIME im Deep Reinforcement Learning | 用于雷达资源管理的可解释的AIAI:深强化学习中修改的LIME 2506.20916v1 |
Authors (4): Ziyang Lu, M. Cenk Gursoy, Chilukuri K. Mohan, Pramod K. Varshney
Deep reinforcement learning has been extensively studied in decision-making processes and has demonstrated superior performance over conventional approaches in various fields, including radar resource management (RRM). However, a notable limitation of neural networks is their "black box" nature, and recent research has increasingly focused on explainable AI (XAI) techniques to describe the rationale behind neural network decisions. One promising XAI method is local interpretable model-agnostic explanations (LIME). However, the sampling process in LIME ignores the correlations between features. In this paper, we propose a modified LIME approach that integrates deep learning (DL) into the sampling process, which we refer to as DL-LIME. We employ DL-LIME within deep reinforcement learning for radar resource management. Numerical results show that DL-LIME outperforms conventional LIME in terms of both fidelity and task performance. DL-LIME also provides insights into which factors are more important in decision making for radar resource management.
在决策过程中广泛研究了深入强化学习的问题,并表明在各个领域,包括雷达资源管理(RRM),与常规方法相比,深入强化学习的成绩优于常规方法。然而,神经网络的一个显著局限性是其“黑盒”性质,而最近的研究工作越来越侧重于可解释的AI(XAI)技术,以描述神经网络决定的理由。一种有希望的XAI方法是当地可解释的模型-认知解释(LIME),然而,LIME的取样过程忽略了各种特征之间的关联。在本文中,我们提议了经修改的LIME方法,将深度学习(DLL)纳入取样过程,我们称之为DL-LIME。我们在雷达资源管理的深度强化学习中采用了DLL-LIME。数字结果表明,DL-LIME在忠诚和任务表现两方面都超越了常规的LIME。DLME,显示了两种指标的优异性表现。DL-LIME还提供了关于哪些因素在雷达资源管理决策中更为重要的见解。
Article 158
Title@2025-06-26 (4): ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models
Title: ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models | ZKPROV: Ein Null-Knowledge-Ansatz zur Datensatzprovenz für große Sprachmodelle | ZKPROV:大语言模型数据集验证零知识化办法 2506.20915v1 |
Authors (3): Mina Namazi, Alexander Nemecek, Erman Ayday
As the deployment of large language models (LLMs) grows in sensitive domains, ensuring the integrity of their computational provenance becomes a critical challenge, particularly in regulated sectors such as healthcare, where strict requirements apply to dataset usage. We introduce ZKPROV, a novel cryptographic framework that enables zero-knowledge proofs of LLM provenance. It allows users to verify that a model is trained on a reliable dataset without revealing sensitive information about it or its parameters. Unlike prior approaches that focus on complete verification of the training process (incurring significant computational cost) or depend on trusted execution environments, ZKPROV offers a distinct balance. Our method cryptographically binds a trained model to its authorized training dataset(s) through zero-knowledge proofs while avoiding proof of every training step. By leveraging dataset-signed metadata and compact model parameter commitments, ZKPROV provides sound and privacy-preserving assurances that the result of the LLM is derived from a model trained on the claimed authorized and relevant dataset. Experimental results demonstrate the efficiency and scalability of ZKPROV in generating and verifying this proof, making it a practical solution for real-world deployments. We also provide formal security guarantees, proving that our approach preserves dataset confidentiality while ensuring trustworthy dataset provenance.
随着大型语言模型(LLMS)在敏感领域的部署不断增长,确保其计算出处的完整性成为一项重大挑战,特别是在医疗保健等监管部门,对数据集的使用适用严格的要求。我们引入了ZKPROV,这是一个新型的加密框架,使LLM出处的零知识证明成为零知识证明。它使用户能够核实一个模型是在可靠的数据集上培训,而没有透露有关该数据集或参数的敏感信息。与以前注重全面核查培训过程(产生大量计算费用)或依赖可信赖的执行环境的做法不同,ZKPROV提供了一种独特的平衡。我们的方法通过零知识证明将一个经过培训的模型与经授权的培训数据集捆绑在一起,同时避免每个培训步骤的证据。通过利用数据集签署的元数据和紧凑模型参数承诺,ZKPROV提供了可靠和保密的保证:LMM的结果来自经过关于声称的授权和相关数据集的培训模型的模型。实验结果显示ZKPROV在生成这一证据和正式部署方面的效率和可扩展性。我们通过零知识证据将一个经过培训的模式捆绑起来,同时确保真实的安全保障。
Article 159
Title@2025-06-26 (4): Faster Fixed-Point Methods for Multichain MDPs
Title: Faster Fixed-Point Methods for Multichain MDPs | Schnellere Fixed-Point-Methoden für Multichain-MDPs | 《多链 MDP快速固定点方法》 2506.20910v1 |
Authors (2): Matthew Zurek, Yudong Chen
We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion, a fundamental but theoretically challenging setting. Beyond the difficulties inherent to all average-reward problems posed by the lack of contractivity and non-uniqueness of solutions to the Bellman operator, in the multichain setting an optimal policy must solve the navigation subproblem of steering towards the best connected component, in addition to optimizing long-run performance within each component. We develop algorithms which better solve this navigational subproblem in order to achieve faster convergence for multichain MDPs, obtaining improved rates of convergence and sharper measures of complexity relative to prior work. Many key components of our results are of potential independent interest, including novel connections between average-reward and discounted problems, optimal fixed-point methods for discounted VI which extend to general Banach spaces, new sublinear convergence rates for the discounted value error, and refined suboptimality decompositions for multichain MDPs. Overall our results yield faster convergence rates for discounted and average-reward problems and expand the theoretical foundations of VI approaches.
在平均回报标准下,我们研究了解决通用(a.k.a.多链(多链)的Markov决策流程(MDPs)的数值-指数(VI)算法(VI),这是一个基本但理论上具有挑战性的环境,除了所有平均回报问题所固有的困难之外,在对贝尔曼经营者缺乏合同性和非统一的解决办法所造成的所有平均回报问题,在多链环境中,最佳政策必须解决向最佳连接部分方向的导航分问题,同时优化每个组成部分的长期性能。我们开发了更好地解决这一导航次问题的算法,以便实现多链 MDP的更快趋同,提高趋同率,并较准确地衡量与先前工作相比的复杂性。我们的结果中的许多关键组成部分具有潜在的独立利益,包括平均回报和折扣问题之间的新联系、扩大一般Banach空间的六折扣最佳固定点方法、折扣价值错误的新的亚线性趋同率,以及改进多链 MDP的亚性通性分位调。我们的成果为折扣基础和平均问题六的理论化办法带来更快的趋同率和扩展。
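The discounted value iteration that the abstract builds on can be sketched on a tiny multichain MDP. This is plain textbook discounted VI, not the paper's accelerated fixed-point method; the three-state MDP, its rewards, and the discount factor are hypothetical and chosen so that the start state must steer towards the better of two absorbing components.

```python
def value_iteration(P, R, gamma, tol=1e-10, max_iters=100_000):
    """Discounted value iteration.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    immediate reward for taking action a in state s.
    """
    values = [0.0] * len(P)
    for _ in range(max_iters):
        new_values = [
            max(R[s][a] + gamma * sum(p * values[s2] for p, s2 in P[s][a])
                for a in range(len(P[s])))
            for s in range(len(P))
        ]
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


# State 0 is transient: action 0 moves to state 1, action 1 moves to state 2.
# States 1 and 2 are absorbing and pay 1 and 2 per step respectively.
P = [
    [[(1.0, 1)], [(1.0, 2)]],
    [[(1.0, 1)]],
    [[(1.0, 2)]],
]
R = [
    [0.0, 0.0],
    [1.0],
    [2.0],
]
gamma = 0.9
values = value_iteration(P, R, gamma)
print(values)
```

Scaling the values by `1 - gamma` gives a rough estimate of the long-run average reward reachable from each state as `gamma` approaches 1, which is one simple way the discounted and average-reward problems are related.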
Article 160
Title@2025-06-26 (4): Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL
Title: Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL | Optimale Single-Policy-Probenkomplexität und transiente Abdeckung für durchschnittlich reward Offline-RL | 平均离岸平均回报率的 最佳单一政策抽样复杂程度和中度覆盖率 2506.20904v1 |
Authors (3): Matthew Zurek, Guy Zamir, Yudong Chen
We study offline reinforcement learning in average-reward MDPs, which presents increased challenges from the perspectives of distribution shift and non-uniform coverage, and has been relatively underexamined from a theoretical perspective. While previous work obtains performance guarantees under single-policy data coverage assumptions, such guarantees utilize additional complexity measures which are uniform over all policies, such as the uniform mixing time. We develop sharp guarantees depending only on the target policy, specifically the bias span and a novel policy hitting radius, yielding the first fully single-policy sample complexity bound for average-reward offline RL. We are also the first to handle general weakly communicating MDPs, contrasting restrictive structural assumptions made in prior work. To achieve this, we introduce an algorithm based on pessimistic discounted value iteration enhanced by a novel quantile clipping technique, which enables the use of a sharper empirical-span-based penalty function. Our algorithm also does not require any prior parameter knowledge for its implementation. Remarkably, we show via hard examples that learning under our conditions requires coverage assumptions beyond the stationary distribution of the target policy, distinguishing single-policy complexity measures from previously examined cases. We also develop lower bounds nearly matching our main result.
我们研究平均回报 MDP 的离线强化学习,从分布转换和非统一覆盖的角度来看,这带来了更多的挑战,而且从理论角度来分析相对不足。虽然以前的工作在单一政策数据覆盖假设下获得了绩效保障,但这种保障使用在所有政策上均匀的更多复杂度措施,例如统一的混合时间。我们只根据目标政策,特别是偏差范围以及新颖的政策撞击半径来制定锐利的保证,产生第一个完全单一政策样本复杂度,以平均回报为离线RL。我们也是第一个处理一般薄弱的传达 MDP , 对比先前工作中的限制性结构假设。为了实现这一点,我们采用了一种基于悲观的贴现价值的算法,这种算法使得能够使用更清晰的实证性惩罚功能。我们的算法也不要求任何先前的参数知识来实施。我们通过难的例子显示,在我们的条件下学习需要超出目标政策固定分布的范围假设,从而区别先前审查的主要结果。
Article 161
Title@2025-06-26 (4): Graph-Structured Feedback Multimodel Ensemble Online Conformal Prediction
Title: Graph-Structured Feedback Multimodel Ensemble Online Conformal Prediction | Graph-strukturiertes Feedback Multimodel Ensemble Online Conformal Prediction | 多模型组合在线非正式预测 2506.20898v1 |
Authors (2): Erfan Hajihashemi, Yanning Shen
Online conformal prediction has demonstrated its capability to construct a prediction set for each incoming data point that covers the true label with a predetermined probability. To cope with potential distribution shift, multi-model online conformal prediction has been introduced to select and leverage different models from a preselected candidate set. Along with the improved flexibility, the choice of the preselected set also brings challenges. A candidate set that includes a large number of models may increase the computational complexity. In addition, the inclusion of irrelevant models with poor performance may negatively impact the performance and lead to unnecessarily large prediction sets. To address these challenges, we propose a novel multi-model online conformal prediction algorithm that identifies a subset of effective models at each time step by collecting feedback from a bipartite graph, which is refined upon receiving new data. A model is then selected from this subset to construct the prediction set, resulting in reduced computational complexity and smaller prediction sets. Additionally, we demonstrate that using prediction set size as feedback, alongside model loss, can significantly improve efficiency by constructing smaller prediction sets while still satisfying the required coverage guarantee. The proposed algorithms are proven to ensure valid coverage and achieve sublinear regret. Experiments on real and synthetic datasets validate that the proposed methods construct smaller prediction sets and outperform existing multi-model online conformal prediction approaches.
在线一致预测表明,它有能力为每个进取的数据点建立一套包含真实标签且预设概率的预测数据集。为了应对潜在分布变化,已经引入了多模型在线一致预测,以从预选候选人组中选择和利用不同的模型。随着灵活性的提高,选择预选数据集也带来了挑战。包含大量模型的候选数据集可能会增加计算复杂性。此外,将不相干且性能差的模型纳入其中可能会对性能产生不利影响,并导致不必要的大型预测组。为了应对这些挑战,我们提议采用新的多模型在线符合预测算法,通过从接收新数据后完善的双部分图中收集反馈,确定每个步骤的有效模型的一组。然后从这一组中选择一个模型,以构建预测组,从而降低计算复杂性和缩小预测组。此外,我们证明,利用预测设定的大小作为反馈,加上模型损失,可以大大提高效率,同时建立较小的预测组,同时仍满足所需的覆盖范围保证。拟议的算法证明能够确保有效覆盖范围并实现亚线性遗憾,每个步骤都有一套有效模型。在接收新数据时段图时,然后从这个组中挑选出一个模型,然后从这个组中挑选出一个模型,然后从这个组中挑选出一个模型,用来构建一个模型,用来构建一个模型,用来构建一个模型,用来构建一个模型,用以进行实际和合成模型,用以验证现有模型的模型的模型的模拟。
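As background for the coverage guarantee mentioned above, the sketch below shows a generic single-model online conformal update that tracks a threshold on nonconformity scores. It is not the paper's graph-feedback multi-model algorithm; the step size `eta`, the score stream, and the function name are assumptions made for illustration.

```python
def online_threshold_updates(scores, alpha, eta, q0=0.0):
    """Track a score threshold so the long-run miscoverage approaches alpha.

    At each step the prediction set contains every label whose nonconformity
    score is at most the current threshold; the set misses the true label
    when the true label's score exceeds that threshold.
    """
    q = q0
    thresholds, errors = [], []
    for score in scores:
        thresholds.append(q)
        err = 1 if score > q else 0
        errors.append(err)
        # Raise the threshold after a miss, lower it slightly after a cover.
        q += eta * (err - alpha)
    return thresholds, errors


# Deterministic stand-in for a stream of true-label scores in [0, 1).
scores = [((i * 37) % 100) / 100 for i in range(2000)]
thresholds, errors = online_threshold_updates(scores, alpha=0.1, eta=0.05)
miscoverage = sum(errors) / len(errors)
print(miscoverage)
```

Because the threshold stays bounded when scores are bounded, the average error is forced towards `alpha` at rate 1/(eta·T), regardless of how the scores are distributed.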
Article 162
Title@2025-06-25 (3): On the Necessity of Output Distribution Reweighting for Effective Class Unlearning
Title: On the Necessity of Output Distribution Reweighting for Effective Class Unlearning | Über die Notwendigkeit der Neugewichtung der Output-Distribution für effektives Klassenunlernen | 有效班级取消学习时必须增加产出分配的加权 2506.20893v1 |
Authors (3): Yian Wang, Ali Ebrahimpour-Boroojeny, Hari Sundaram
In this work, we introduce an output-reweighting unlearning method, RWFT, a lightweight technique that erases an entire class from a trained classifier without full retraining. Forgetting specific classes from trained models is essential for enforcing user deletion rights and mitigating harmful or biased predictions. Full retraining is costly, and existing unlearning methods fail to replicate the behavior of retrained models when predicting samples from the unlearned class. We demonstrate this failure by designing a variant of membership inference attacks, MIA-NN, that successfully reveals the unlearned class for any of these methods. We propose a simple redistribution of the probability mass for predictions on samples from the forgotten class, which is robust to MIA-NN. We also introduce a new metric based on the total variation (TV) distance between prediction probabilities to quantify residual leakage, so that future methods are not susceptible to the new attack. Through extensive experiments with state-of-the-art baselines in machine unlearning, we show that our approach matches the results of full retraining in both the metrics used for evaluation by prior work and the new metric we propose in this work. Compared to state-of-the-art methods, we gain 2.79% in previously used metrics and 111.45% in our new TV-based metric over the best existing method.
在这项工作中,我们引入了一种产出再加权的不学习方法,即RWFT,即轻量级技术,不经过全面再培训,即从受过训练的分类器中清除整个班级。从受过训练的分类器中忘记特定班级对于执行用户删除权利和减轻有害或偏差预测至关重要。全面再培训费用很高,而现有的不学习方法在预测未学班的样本时未能复制经过再培训的模型的行为。我们设计了一个成员推论攻击的变式,即MIA-NN,成功地揭示了任何这类方法的未学习类。我们建议简单重新分配在被遗忘的类别中预测样本的概率质量,这对MIA-NN是强有力的。我们还根据预测的全变(TV)距离引入了一个新的指标,以量化剩余渗漏,防止未来方法易受到新攻击的可能性。通过对机器不学习的先进基线进行广泛的试验,我们证明我们的方法符合用于评价的任何这类方法中所使用的完全再培训的结果。我们提议在这项工作中采用的新指标。与我们以前采用的最新的衡量方法相比,我们获得了目前采用的最佳衡量方法。
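The two ingredients named above, moving probability mass away from the forgotten class and measuring leakage with total variation distance, can be illustrated on a single prediction vector. The proportional renormalization used here is an assumption for illustration; the abstract does not specify RWFT's actual reweighting rule, and the probabilities are made up.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def drop_class(probs, forget_idx):
    """Zero out one class and renormalize the rest proportionally.

    This is one simple redistribution choice, assumed here for illustration.
    """
    kept = sum(p for i, p in enumerate(probs) if i != forget_idx)
    if kept == 0.0:
        raise ValueError("all probability mass sits on the forgotten class")
    return [0.0 if i == forget_idx else p / kept for i, p in enumerate(probs)]


# Hypothetical softmax output for a sample from the class to be forgotten.
original = [0.6, 0.3, 0.1]
reweighted = drop_class(original, forget_idx=0)
shift = total_variation(original, reweighted)
print(reweighted, shift)
```

In an evaluation along the lines the abstract describes, the distance of interest would be between the unlearned model's outputs and those of a fully retrained model, where a smaller value indicates less residual leakage.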
Article 163
Title@2025-06-25 (3): Next-token prediction capacity: general upper bounds and a lower bound for transformers
Title: Next-token prediction capacity: general upper bounds and a lower bound for transformers | Next-token Vorhersagekapazität: allgemeine obere Grenzen und eine untere Grenze für Transformatoren | 下对位预测能力:变压器一般上限值和下限值 2405.13718v3 |
Authors (3): Liam Madden, Curtis Fox, Christos Thrampoulidis
Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributions for has not been established. To fill this gap, we prove upper and lower bounds on this number, which are equal up to a multiplicative constant. We prove these bounds in the general setting where next-token distributions can be arbitrary as well as the empirical setting where they are calculated from a finite number of document sequences. Our lower bounds are for one-layer multi-head decoder-only transformers and our proofs highlight an important injectivity property satisfied by self-attention. Furthermore, we provide numerical evidence that the minimal number of parameters for memorization is sufficient for being able to train the model to the entropy lower bound.
根据一系列的符号, 如单词, 下点预测的任务是预测下点有条件概率分布 。 仅使用解码器的变压器已成为此任务的有效模型, 但其属性仍然不完全理解 。 特别是, 仅使用解码器的变压器可以内插下端的分布序列数量最多的不同背景序列尚未确定 。 为了填补这一空白, 我们证明这个数字的上下界与多倍数常数相等。 我们证明, 在一般设置中, 下一点分布器可以任意性, 以及从一定数量文档序列中计算它们的经验性设置 。 我们的下界是单层多位解码器的变压器, 以及我们的证据可以突出一个由自我注意满足的重要的投射属性 。 此外, 我们提供数字证据, 表示, 最小的记忆化参数数量足以将模型训练到低端框中。
Article 164
Title@2025-06-25 (3): Omniwise: Predicting GPU Kernels Performance with LLMs
Title: Omniwise: Predicting GPU Kernels Performance with LLMs | Omniwise: Vorhersage der Leistung von GPU-Kerneln mit LLMs | 总括性: 使用 LLMs 预测 GPU 核心内核性能 2506.20886v1 |
Authors (4): Zixian Wang, Cole Ramos, Muhammad A. Awad, Keith Lowery
In recent years, the rapid advancement of deep neural networks (DNNs) has revolutionized artificial intelligence, enabling models with unprecedented capabilities in understanding, generating, and processing complex data. These powerful architectures have transformed a wide range of downstream applications, tackling tasks beyond human reach. In this paper, we introduce Omniwise, the first end-to-end, self-supervised fine-tuning pipeline that applies large language models (LLMs) to GPU kernel performance prediction–a novel use case in performance profiling. Omniwise is model-agnostic and lightweight, achieving strong results even with a small 3B-parameter model. It can predict key performance metrics, including memory bandwidth, cache hit rates, GFLOPs, and arithmetic intensity, directly from kernel code without the need for code execution or profiling tools. Our approach achieves over 90% of predictions within 10% relative error on GPU kernels executed on AMD MI250 and MI300X architectures. In addition to the pipeline, we develop an online inference server and a Visual Studio Code plugin that seamlessly integrate LLM-based performance prediction into developers’ workflows.
近些年来,深神经网络(DNNS)的快速进步使人工智能发生了革命性的变化,造就了在理解、生成和处理复杂数据方面能力空前的模型。这些强大的建筑改变了广泛的下游应用,解决了人类无法完成的任务。在本文中,我们引入了Omniwith,这是第一个端到端、自监督的微调管道,将大型语言模型(LLLMS)应用于GPU内核性能预测-在性能剖析图中的一种新应用案例。Omniwith是模型性能和轻量级,即使使用一个小的 3B 参数模型也能够取得强有力的结果。它可以直接从内核代码中预测关键性能指标,包括记忆带宽、缓存触速率、GFLLOPs和计算强度,而不需要代码执行或剖析工具。我们的方法在对AMM MI250 和 MI300X 结构中执行的GPUPU内核内核的10%的相对错误范围内实现90%的预测。除了管道外,我们还开发了一个在线发价服务器和视觉演算码插件器插件。
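The headline accuracy figure above ("predictions within 10% relative error") and one of the predicted metrics, arithmetic intensity, are both simple ratios. The sketch below shows how such quantities are typically computed; the sample numbers are hypothetical and do not come from the paper.

```python
def fraction_within(predicted, measured, rel_tol=0.10):
    """Share of predictions whose relative error is at most rel_tol."""
    hits = sum(1 for p, m in zip(predicted, measured)
               if abs(p - m) <= rel_tol * abs(m))
    return hits / len(measured)


def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations performed per byte of memory traffic."""
    return flops / bytes_moved


# Hypothetical measured vs. predicted memory bandwidth (GB/s) for four kernels.
measured = [100.0, 200.0, 50.0, 400.0]
predicted = [105.0, 170.0, 52.0, 400.0]
accuracy = fraction_within(predicted, measured)

# Hypothetical kernel: 2e9 FLOPs over 5e8 bytes of traffic.
intensity = arithmetic_intensity(2e9, 5e8)
print(accuracy, intensity)
```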
Article 165
Title@2025-06-25 (3): HyperINF: Unleashing the HyperPower of the Schulz’s Method for Data Influence Estimation
Title: HyperINF: Unleashing the HyperPower of the Schulz’s Method for Data Influence Estimation | HyperINF: Lösen der Hyperkraft der Schulz-Methode zur Bestimmung des Einflusses von Daten | HyperINF: 释放Schulz数据影响估计方法的超功率 2410.05090v2 |
Authors (3): Xinyu Zhou, Simin Fan, Martin Jaggi
Influence functions provide a principled method to assess the contribution of individual training samples to a specific target. Yet, their high computational costs limit their applications on large-scale models and datasets. Existing methods proposed for influence function approximation have significantly reduced the computational overheads. However, they mostly suffer from inaccurate estimation due to the lack of strong convergence guarantees from the algorithm. The family of hyperpower methods is well known for its rigorous convergence guarantees on matrix inverse approximation, while the matrix multiplication operation can involve intractable memory and computation costs on large-scale models. We propose HyperINF, an efficient and accurate influence function approximation method which leverages the hyperpower method, specifically Schulz’s iterative algorithm. To deal with the computation-intensive matrix multiplication, we incorporate the generalized Fisher information (GFIM) as a low-rank approximation of the Hessian matrix, which reduces the memory and computation overheads to constant costs independent of ranks on LoRA-tuned models. We first demonstrate the superior accuracy and stability of HyperINF compared to other baselines through a synthetic convergence simulation for matrix inversion. We further validate the efficacy of HyperINF through extensive real-world data attribution tasks, including mislabeled data detection and data selection for LLM and VLM fine-tuning. On LoRA-tuned models, HyperINF achieves superior downstream performance with minimal memory and computational overhead, while other baselines suffer from significant degradation. Our codebase is available at https://github.com/Blackzxy/HyperINF.
影响函数为评估单个训练样本对特定目标的贡献提供了一种有原则的方法。然而,其高昂的计算成本限制了它们在大规模模型和数据集上的应用。现有的影响函数近似方法已显著降低了计算开销,但由于算法缺乏强收敛保证,它们大多存在估计不准确的问题。超幂(hyperpower)方法族以其在矩阵求逆近似上的严格收敛保证而著称,但其中的矩阵乘法运算在大规模模型上可能带来难以承受的内存和计算成本。我们提出 HyperINF,一种高效且准确的影响函数近似方法,它利用超幂方法,具体为 Schulz 迭代算法。为了应对计算密集的矩阵乘法,我们引入广义 Fisher 信息(GFIM)作为 Hessian 矩阵的低秩近似,从而在 LoRA 微调的模型上将内存和计算开销降低为与秩无关的常数成本。我们首先通过矩阵求逆的合成收敛模拟,展示了 HyperINF 相比其他基线更高的准确性和稳定性。随后,我们通过大量真实世界的数据归因任务进一步验证了 HyperINF 的有效性,包括错误标注数据检测以及面向 LLM 和 VLM 微调的数据选择。在 LoRA 微调的模型上,HyperINF 以极小的内存和计算开销取得了更优的下游性能,而其他基线则出现明显退化。我们的代码库见 https://github.com/Blackzxy/HyperINF。
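For readers unfamiliar with Schulz's iteration, the sketch below shows the classical second-order hyperpower update X_{k+1} = X_k (2I - A X_k) on a small dense matrix. It is only the textbook iteration, not the HyperINF implementation: the GFIM low-rank approximation is omitted, and the helper names and the starting point X_0 = A^T / (||A||_1 ||A||_inf) are choices made here for illustration.

```python
def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def schulz_inverse(a, iterations=30):
    """Approximate the inverse of a nonsingular square matrix with Schulz's iteration."""
    n = len(a)
    # Starting point X0 = A^T / (||A||_1 * ||A||_inf); this keeps the spectral
    # radius of (I - A X0) below 1, so the iteration converges.
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = 1.0 / (norm_1 * norm_inf)
    x = [[a[j][i] * scale for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        ax = matmul(a, x)
        # Build 2I - A X, then update X <- X (2I - A X).
        correction = [
            [(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
            for i in range(n)
        ]
        x = matmul(x, correction)
    return x


approx_inverse = schulz_inverse([[4.0, 1.0], [2.0, 3.0]])
```

The residual I - A X_k is squared at every step, which is the quadratic convergence the abstract alludes to; each step costs two matrix products, which is the expense the low-rank approximation is meant to reduce.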
Article 166
Title@2025-06-25 (3): Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance
Title: Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance | Komplexe Modelltransformationen durch verstärktes Lernen mit unsicherer menschlicher Führung | 以不确定的人类指导加强学习的复杂模式转变 2506.20883v1 |
Authors (2): Kyanna Dagenais, Istvan David
Model-driven engineering problems often require complex model transformations (MTs), i.e., MTs that are chained in extensive sequences. Pertinent examples of such problems include model synchronization, automated model repair, and design space exploration. Manually developing complex MTs is an error-prone and often infeasible process. Reinforcement learning (RL) is an apt way to alleviate these issues. In RL, an autonomous agent explores the state space through trial and error to identify beneficial sequences of actions, such as MTs. However, RL methods exhibit performance issues in complex problems. In these situations, human guidance can be of high utility. In this paper, we present an approach and technical framework for developing complex MT sequences through RL, guided by potentially uncertain human advice. Our framework allows user-defined MTs to be mapped onto RL primitives, and executes them as RL programs to find optimal MT sequences. Our evaluation shows that human guidance, even if uncertain, substantially improves RL performance, and results in more efficient development of complex MTs. Through a trade-off between the certainty and timeliness of human advice, our method takes a step towards RL-driven human-in-the-loop engineering methods.
由模型驱动的工程问题往往要求复杂的模型转换,即以广泛顺序链条的模型转换。这些问题的相关例子包括模型同步、自动模型修理和空间探索设计。手工开发复杂的模型是一个容易出错且往往不可行的过程。强化学习(RL)是缓解这些问题的恰当方法。在RL,自主代理商通过试验和错误探索国家空间,以确定有益的行动序列,如MTs。然而,RL方法显示出复杂的问题中的性能问题。在这种情况下,人类指导可能非常有用。在本文中,我们提出了一个通过RL开发复杂的MT序列的方法和技术框架,以潜在的不确定人类建议为指导。我们的框架允许用户定义的MTs被绘图到RL原始,并将它们作为RL方案加以执行,以找到最佳的MT序列。我们的评估表明,人类指导,即使不确定,也会大大改进RL的性能,并导致更高效地发展复杂的RMTs。我们通过贸易驱动的确定性和及时性方法,在人类工程方法之间采取了一种贸易驱动式的一步。
Article 167
Title@2025-06-25 (3): Fairly Accurate: Fairness-aware Multi-group Target Detection in Online Discussion
Title: Fairly Accurate: Fairness-aware Multi-group Target Detection in Online Discussion | Ziemlich genau: Fairness-bewusste Multi-Gruppen-Zielerkennung in Online-Diskussion | 准确无误:在线讨论中公平了解多群体目标检测 2407.11933v2 |
Authors (3): Soumyajit Gupta, Maria De-Arteaga, Matthew Lease
Target-group detection is the task of detecting which group(s) a social media post is "directed at or about", with various applications, such as targeted-marketing. In this work, we focus on the fairness implications of target-group detection in the context of toxicity detection, where the perceived harm of a post often depends on which group(s) it targets. Because toxicity is highly contextual, language that appears benign in general may be harmful when targeting specific demographic groups. It is thus important to first detect which group(s) are being targeted by a post as a precursor to the subsequent task of determining whether the post is toxic given the group(s). Target-group detection is also challenging: a single post may simultaneously target one to many groups, and we must detect groups fairly in order to promote equitable treatment. We show that our proposed approach to fairness-aware multi target-group detection not only reduces bias across groups, but also achieves competitive predictive performance, outperforming existing fairness-aware baselines. To spur future research on fairness-aware target-group detection and support competitive benchmarking, we also share our code.
检测目标群体的任务在于检测社会媒体职位的“指向”或“指向”哪个群体,包括各种应用,例如定向营销。在这项工作中,我们侧重于在毒性检测背景下检测目标群体的公平影响,因为发现职位的伤害往往取决于目标人群。由于毒性很强,看来友好的语言在针对特定人口群体时可能普遍有害。因此,首先必须检测作为随后确定该职位是否有毒的前身的某个(s)群体,作为确定该职位是否为目标的(s)职位的前身。在检测目标群体时,也具有挑战性:单个职位可能同时针对多个群体,我们必须公平地检测群体,以促进公平待遇。我们表明,我们拟议的“认识到公平”的多目标群体检测方法不仅减少各群体之间的偏见,而且实现竞争性预测性业绩,优于现有的公平认识基线。为了鼓励今后对公平认识目标群体的检测和支持竞争性基准的研究,我们还共享我们的代码。
Article 168
Title@2025-06-25 (3): Always Skip Attention
Title: Always Skip Attention | Immer die Aufmerksamkeit überspringen | 总是跳过关注 2505.01996v2 |
Authors (4): Yiping Ji, Hemanth Saratchandran, Peyman Moghadam, Simon Lucey
We highlight a curious empirical result within modern Vision Transformers (ViTs). Specifically, self-attention catastrophically fails to train unless it is used in conjunction with a skip connection. This is in contrast to other elements of a ViT that continue to exhibit good performance (albeit suboptimal) when skip connections are removed. Further, we show that this critical dependence on skip connections is a relatively new phenomenon, with previous deep architectures (e.g., CNNs) exhibiting good performance in their absence. In this paper, we show theoretically that the self-attention mechanism is fundamentally ill-conditioned and is, therefore, uniquely dependent on skip connections for regularization. Additionally, we propose Token Graying, a simple yet effective complement (to skip connections) that further improves the condition of input tokens. We validate our approach in both supervised and self-supervised training methods.
我们强调现代视野变异器(VITs)中令人好奇的经验结果。 具体地说,自我注意灾难性地未能训练,除非与跳过连接结合使用。 这与在取消跳过连接时继续表现良好(尽管不太理想)的VIT其他元素形成鲜明对比。 此外,我们表明,这种对跳过连接的严重依赖是一种相对较新的现象,以前的深层结构(如CNNs)在不存在时表现良好。 在本文中,我们理论上认为,自我注意机制根本是条件不良的,因此,完全取决于跳过连接进行正规化。 此外,我们提议Token Graying是一种简单而有效的补充(跳过连接),可以进一步改善输入符号的条件。我们验证了我们在监管和自我监督的培训方法方面的做法。
Article 169
Title@2025-06-25 (3): A3 : an Analytical Low-Rank Approximation Framework for Attention
Title: A3 : an Analytical Low-Rank Approximation Framework for Attention | A3: ein analytischer Rahmen für die Annäherung an den Low-Rank-Wert | A3: 分析性低Rank接近度关注框架 2505.12942v3 |
Authors (7): Jeffrey T. H. Wong, Cheng Zhang, Xinye Cao, Pedro Gimenes, George A. Constantinides, Wayne Luk, Yiren Zhao
Large language models have demonstrated remarkable performance; however, their massive parameter counts make deployment highly expensive. Low-rank approximation offers a promising compression solution, yet existing approaches have two main limitations: (1) they focus on minimizing the output error of individual linear layers, without considering the architectural characteristics of Transformers, and (2) they decompose a large weight matrix into two small low-rank matrices. Consequently, these methods often fall short compared to other compression techniques like pruning and quantization, and introduce runtime overhead such as the extra GEMM kernel launches for decomposed small matrices. To address these limitations, we propose A3, a post-training low-rank approximation framework. A3 splits a Transformer layer into three functional components, namely QK, OV, and MLP. For each component, A3 provides an analytical solution that reduces the hidden dimension size inside each component while minimizing the component’s functional loss (i.e., error in attention scores, attention outputs, and MLP outputs). This approach directly reduces model sizes, KV cache sizes, and FLOPs without introducing any runtime overheads. In addition, it provides a new narrative in advancing the optimization problem from singular linear layer loss optimization toward improved end-to-end performance. Through extensive experiments, we show that A3 maintains superior performance compared to SoTAs. For example, under the same reduction budget in computation and memory, our low-rank approximated LLaMA 3.1-70B achieves a perplexity of 4.69 on WikiText-2, outperforming the previous SoTA’s 7.87 by 3.18. We also demonstrate the versatility of A3, including KV cache compression, quantization, and mixed-rank assignments for enhanced performance.
大型语言模型表现出了显著的性能;然而,其庞大的参数量使得部署成本非常高昂。低秩近似提供了一种有前景的压缩方案,但现有方法有两个主要局限:(1)它们侧重于最小化单个线性层的输出误差,而不考虑 Transformer 的架构特性;(2)它们将一个大的权重矩阵分解为两个小的低秩矩阵。因此,这些方法往往不如剪枝和量化等其他压缩技术,并且会引入运行时开销,例如为分解后的小矩阵额外启动 GEMM 内核。为了解决这些局限,我们提出了 A3,一个训练后低秩近似框架。A3 将 Transformer 层拆分为三个功能组件,即 QK、OV 和 MLP。对于每个组件,A3 提供一个解析解,在缩小组件内部隐藏维度的同时最小化该组件的功能损失(即注意力分数、注意力输出和 MLP 输出的误差)。该方法直接减小模型大小、KV 缓存大小和 FLOPs,且不引入任何运行时开销。此外,它将优化问题从单个线性层的损失优化推进到端到端性能的提升。通过大量实验,我们表明 A3 相比现有最优方法保持了更优的性能。例如,在相同的计算和内存压缩预算下,我们低秩近似的 LLaMA 3.1-70B 在 WikiText-2 上取得了 4.69 的困惑度,比此前最优方法的 7.87 低 3.18。我们还展示了 A3 的通用性,包括 KV 缓存压缩、量化以及用于提升性能的混合秩分配。
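The second limitation named in the abstract above, factorizing one weight matrix into two low-rank matrices, can be made concrete with a little parameter arithmetic. The sketch below covers only that generic baseline, not the A3 method itself; the function names and the 4096 x 4096 layer size are illustrative assumptions.

```python
def dense_params(rows, cols):
    """Parameter count of a dense rows x cols weight matrix."""
    return rows * cols


def low_rank_params(rows, cols, rank):
    """Parameter count after factorizing W (rows x cols) into U (rows x rank) and V (rank x cols)."""
    return rank * (rows + cols)


def break_even_rank(rows, cols):
    """Largest rank at which the factorized form is still strictly smaller than the dense one."""
    return (rows * cols - 1) // (rows + cols)


# Hypothetical square projection layer.
rows, cols, rank = 4096, 4096, 512
compression_ratio = low_rank_params(rows, cols, rank) / dense_params(rows, cols)
```

A factorized layer saves parameters only when rank < rows * cols / (rows + cols), and it replaces one matrix multiplication with two smaller ones, which is the extra kernel launch the abstract refers to.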
Article 170
Title@2025-06-25 (3): Empowering Digital Agriculture: A Privacy-Preserving Framework for Data Sharing and Collaborative Research
Title: Empowering Digital Agriculture: A Privacy-Preserving Framework for Data Sharing and Collaborative Research | Empowering Digital Agriculture: Ein datenschutzschonender Rahmen für den Datenaustausch und die Sonderforschung | 赋予数字农业权力:数据分享和合作研究保护隐私框架 2506.20872v1 |
Authors (5): Osama Zafar, Rosemarie Santa González, Mina Namazi, Alfonso Morales, Erman Ayday
Data-driven agriculture, which integrates technology and data into agricultural practices, has the potential to improve crop yield, disease resilience, and long-term soil health. However, privacy concerns, such as adverse pricing, discrimination, and resource manipulation, deter farmers from sharing data, as it can be used against them. To address this barrier, we propose a privacy-preserving framework that enables secure data sharing and collaboration for research and development while mitigating privacy risks. The framework combines dimensionality reduction techniques (like Principal Component Analysis (PCA)) and differential privacy by introducing Laplacian noise to protect sensitive information. The proposed framework allows researchers to identify potential collaborators for a target farmer and train personalized machine learning models either on the data of identified collaborators via federated learning or directly on the aggregated privacy-protected data. It also allows farmers to identify potential collaborators based on similarities. We have validated this on real-life datasets, demonstrating robust privacy protection against adversarial attacks and utility performance comparable to a centralized system. We demonstrate how this framework can facilitate collaboration among farmers and help researchers pursue broader research objectives. The adoption of the framework can empower researchers and policymakers to leverage agricultural data responsibly, paving the way for transformative advances in data-driven agriculture. By addressing critical privacy challenges, this work supports secure data integration, fostering innovation and sustainability in agricultural systems.
由数据驱动的农业将技术和数据纳入农业做法,具有提高作物产量、抗病能力和长期土壤健康的潜能,然而,对隐私的关切,如不利的定价、歧视和资源操纵等,阻止农民分享数据,因为数据可以用来对付他们。为克服这一障碍,我们提议一个隐私保护框架,使数据共享和研发合作能够安全,同时减少隐私风险。该框架结合了降低维度技术(如主要组成部分分析)和差异隐私权,采用拉普拉西亚噪音保护敏感信息。拟议框架使研究人员能够确定目标农民的潜在合作者,并培训个性化机器学习模型,要么通过联合学习,要么直接利用已查明的合作者的数据,阻止农民分享数据,因为综合隐私保护数据是针对相似性的。我们在真实生活数据集上证实了这一点,展示了强有力的隐私保护,防止对抗性攻击和可与中央系统相比的实用性业绩。我们证明这一框架可如何促进农民和研究人员之间的合作,从而保护敏感信息。采用这一框架可以增强研究人员和决策者的能力,从而通过联合学习或直接使用综合隐私保护数据数据,从而推动农业安全创新。
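The abstract above mentions adding Laplacian noise for differential privacy. A minimal sketch of the standard Laplace mechanism follows, assuming the values have already been reduced (for example by PCA) and that a sensitivity bound is known. The function names, the sensitivity, and the epsilon value are hypothetical; the paper's actual pipeline is not specified in this digest.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, sensitivity, epsilon, seed=None):
    """Add Laplace noise with scale sensitivity / epsilon to every value."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Made-up reduced features for one farm; the privacy budget is an assumption.
features = [0.42, -1.30, 2.05]
noisy_features = privatize(features, sensitivity=1.0, epsilon=0.5, seed=7)
```

A smaller epsilon means a larger noise scale and stronger privacy, at the cost of utility; choosing it is a policy decision rather than something the code can settle.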
Article 171
Title@2025-06-25 (3): High-dimensional Contextual Bandit Problem without Sparsity
Title: High-dimensional Contextual Bandit Problem without Sparsity | Hochdimensionales Kontext-Bandit-Problem ohne Sparsamkeit | 无分分的高维背景土匪问题 2306.11017v2 |
Authors (2): Junpei Komiyama, Masaaki Imaizumi
In this research, we investigate the high-dimensional linear contextual bandit problem where the number of features $p$ is greater than the budget $T$, or it may even be infinite. Differing from the majority of previous works in this field, we do not impose sparsity on the regression coefficients. Instead, we rely on recent findings on overparameterized models, which enable us to analyze the performance of the minimum-norm interpolating estimator when data distributions have small effective ranks. We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance. Through our analysis, we derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation. Moreover, we introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance. We assess the performance of the proposed algorithms through a series of simulations.
在这项研究中,我们调查了高维线性背景土匪问题,在那里,地物数量大于预算$T,或者甚至可能是无限的。与这个领域以前的大部分工程不同,我们并没有对回归系数施加宽度。相反,我们依靠最近关于过度分化模型的调查结果,这使我们能够在数据分布效率低时分析最低北向间插估测器的性能。我们建议采用探索-当时承诺算法来解决这一问题并检查其性能。我们通过分析,以美元计算出电子计算算法的最佳率,并表明这一比率可以通过平衡勘探和开发实现。此外,我们引入适应性探索-当时算法,以适应性的方式找到最佳平衡。我们通过一系列模拟评估了拟议算法的性能。
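The explore-then-commit structure described above can be sketched in a deliberately simplified setting: a plain K-armed bandit with no contexts, rather than the paper's high-dimensional linear contextual problem with a minimum-norm interpolating estimator. All names and numbers below are illustrative assumptions.

```python
import random


def explore_then_commit(pull, n_arms, horizon, rounds_per_arm):
    """Pull every arm `rounds_per_arm` times, then commit to the empirically best arm."""
    if n_arms * rounds_per_arm > horizon:
        raise ValueError("exploration budget exceeds the horizon")
    totals = [0.0] * n_arms
    reward = 0.0
    # Exploration phase: every arm gets the same number of pulls.
    for arm in range(n_arms):
        for _ in range(rounds_per_arm):
            r = pull(arm)
            totals[arm] += r
            reward += r
    # Equal pull counts mean the largest total is also the largest empirical mean.
    committed = max(range(n_arms), key=lambda a: totals[a])
    # Commit phase: play the chosen arm for the rest of the horizon.
    for _ in range(horizon - n_arms * rounds_per_arm):
        reward += pull(committed)
    return committed, reward


# Hypothetical Bernoulli arms with made-up success probabilities.
rng = random.Random(0)
means = [0.2, 0.5, 0.8]
chosen_arm, total_reward = explore_then_commit(
    lambda arm: 1.0 if rng.random() < means[arm] else 0.0,
    n_arms=3,
    horizon=2000,
    rounds_per_arm=200,
)
```

The single tuning knob is `rounds_per_arm`: too little exploration risks committing to the wrong arm, too much wastes the budget, and that is the exploration and exploitation balance the abstract says determines the optimal rate in $T$.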
Article 172
Title@2025-06-25 (3): Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA
Title: Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA | Leaner Training, Lower Leakage: Die Erinnerung an LLM Fine-Tuning mit LoRA | 皮皮培训,《下下渗漏:重新研究LLM与LORA的精细调整的记忆 2506.20856v1 |
Authors (2): Fei Wang, Baochun Li
Memorization in large language models (LLMs) makes them vulnerable to data extraction attacks. While pre-training memorization has been extensively studied, fewer works have explored its impact in fine-tuning, particularly for LoRA fine-tuning, a widely adopted parameter-efficient method. In this work, we re-examine memorization in fine-tuning and uncover a surprising divergence from prior findings across different fine-tuning strategies. Factors such as model scale and data duplication, which strongly influence memorization in pre-training and full fine-tuning, do not follow the same trend in LoRA fine-tuning. Using a more relaxed similarity-based memorization metric, we demonstrate that LoRA significantly reduces memorization risks compared to full fine-tuning, while still maintaining strong task performance.
大型语言模型(LLMs)的记忆化使得他们容易受到数据提取攻击。虽然对培训前记忆化进行了广泛研究,但探索其微调影响的工作较少,特别是Lora微调,这是一个广泛采用的具有参数效率的方法。在这项工作中,我们重新审查微调的记忆化,发现不同微调战略与以往发现的差异令人惊讶。模型规模和数据重复等因素,严重影响了培训前记忆化和全面微调,但与Lora微调的类似趋势不同。我们使用较宽松的类似度模化标准,证明Lora与全面微调相比,大量减少记忆化风险,同时保持很强的任务性能。
Article 173
Title@2025-06-25 (3): Subspace-Distance-Enabled Active Learning for Efficient Data-Driven Model Reduction of Parametric Dynamical Systems
Title: Subspace-Distance-Enabled Active Learning for Efficient Data-Driven Model Reduction of Parametric Dynamical Systems | Subspace-Distance-Enabled Active Learning für effiziente datengetriebene Modellreduktion parametrischer dynamischer Systeme | 减少参数动态系统的高效数据驱动模型 2505.00460v2 |
Authors (3): Harshit Kapadia, Peter Benner, Lihong Feng
In situations where the solution of a high-fidelity dynamical system needs to be evaluated repeatedly, over a vast pool of parametric configurations and in the absence of access to the underlying governing equations, data-driven model reduction techniques are preferable. We propose a novel active learning approach to build a parametric data-driven reduced-order model (ROM) by greedily picking the most important parameter samples from the parameter domain. As a result, during the ROM construction phase, the number of high-fidelity solutions dynamically grows in a principled fashion. The high-fidelity solution snapshots are expressed in several parameter-specific linear subspaces, with the help of proper orthogonal decomposition (POD), and the relative distance between these subspaces is used as a guiding mechanism to perform active learning. To achieve this, we provide a distance measure to evaluate the similarity between pairs of linear subspaces with different dimensions, and also show that this distance measure is a metric. The usability of the proposed subspace-distance-enabled active learning (SDE-AL) framework is demonstrated by augmenting two existing non-intrusive reduced-order modeling approaches, and providing their active-learning-driven (ActLearn) extensions, namely, SDE-ActLearn-POD-KSNN, and SDE-ActLearn-POD-NN. Furthermore, we report positive results for two parametric physical models, highlighting the efficiency of the proposed SDE-AL approach.
在需要对高保真动力系统的解在大量参数配置上反复求值、且无法获取底层控制方程的情况下,数据驱动的模型降阶技术更为可取。我们提出一种新的主动学习方法,通过从参数域中贪婪地挑选最重要的参数样本,构建参数化的数据驱动降阶模型(ROM)。因此,在 ROM 构建阶段,高保真解的数量以有原则的方式动态增长。借助本征正交分解(POD),高保真解快照被表示在若干个参数特定的线性子空间中,这些子空间之间的相对距离被用作执行主动学习的引导机制。为此,我们提供了一种距离度量,用于评估不同维度的线性子空间对之间的相似性,并证明该距离度量是一个度量(metric)。我们通过扩展两种现有的非侵入式降阶建模方法,给出其主动学习驱动(ActLearn)的扩展,即 SDE-ActLearn-POD-KSNN 和 SDE-ActLearn-POD-NN,展示了所提出的子空间距离主动学习(SDE-AL)框架的可用性。此外,我们报告了两个参数化物理模型上的积极结果,凸显了所提 SDE-AL 方法的效率。
Article 174
Title@2025-06-25 (3): Multi-Objective Reinforcement Learning for Cognitive Radar Resource Management
Title: Multi-Objective Reinforcement Learning for Cognitive Radar Resource Management | Multi-Zielives Stärkungslernen für Kognitives Radarressourcenmanagement | 多目标强化学习促进认知雷达资源管理 2506.20853v1 |
Authors (5): Ziyang Lu, Subodh Kalia, M. Cenk Gursoy, Chilukuri K. Mohan, Pramod K. Varshney
The time allocation problem in multi-function cognitive radar systems focuses on the trade-off between scanning for newly emerging targets and tracking the previously detected targets. We formulate this as a multi-objective optimization problem and employ deep reinforcement learning to find Pareto-optimal solutions, comparing the deep deterministic policy gradient (DDPG) and soft actor-critic (SAC) algorithms. Our results demonstrate the effectiveness of both algorithms in adapting to various scenarios, with SAC showing improved stability and sample efficiency compared to DDPG. We further employ the NSGA-II algorithm to estimate an upper bound on the Pareto front of the considered problem. This work contributes to the development of more efficient and adaptive cognitive radar systems capable of balancing multiple competing objectives in dynamic environments.
多功能认知雷达系统的时间分配问题侧重于对新出现目标进行扫描与跟踪先前发现的目标之间的权衡,我们将此作为一个多目标优化问题加以阐述,并采用深入强化学习方法,找到最佳的Pareto解决方案,比较深度确定性政策梯度(DPG)和软行为者-批评算法。我们的结果显示两种算法在适应各种情景方面的有效性,SAC显示与DDPG相比,稳定性和抽样效率有所提高。我们进一步使用NSGA-II算法来估计在Pareto前面所考虑的问题的上限。这项工作有助于开发更高效和适应性的认知雷达系统,能够在动态环境中平衡多种相互竞争的目标。
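Pareto optimality, which the abstract above relies on, can be illustrated with a small non-dominated filter. The sketch assumes two objectives that are both maximized (for instance a scanning score and a tracking score); the objective values are invented, and neither the reinforcement-learning agents nor NSGA-II are reproduced here.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (all objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (scanning score, tracking score) pairs for six candidate policies.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(candidates)
```

This quadratic-time filter is fine for a handful of policies; evolutionary methods such as NSGA-II use faster non-dominated sorting for large populations.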
Article 175
Title@2025-06-25 (3): InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
Title: InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction | InterFormer: Effektives Heterogenes Interaktionslernen für Click-through-Rate-Vorhersage | Interformer: 有效不同差异的交互式学习,用于点击频频率预测 2411.09852v3 |
Authors (28): Zhichen Zeng, Xiaolong Liu, Mengyue Hang, Xiaoyi Liu, Qinghai Zhou, Chaofei Yang, Yiqun Liu, Yichen Ruan, Laming Chen, Yuxin Chen, Yujia Hao, Jiaqi Xu, Jade Nie, Xi Liu, Buyun Zhang, Wei Wen, Siyang Yuan, Hang Yin, Xin Zhang, Kai Wang, Wen-Yen Chen, Yiping Han, Huayu Li, Chunzhi Yang, Bo Long, Philip S. Yu, Hanghang Tong, Jiyan Yang
Click-through rate (CTR) prediction, which predicts the probability of a user clicking an ad, is a fundamental task in recommender systems. The emergence of heterogeneous information, such as user profile and behavior sequences, depicts user interests from different aspects. A mutually beneficial integration of heterogeneous information is the cornerstone of successful CTR prediction. However, most of the existing methods suffer from two fundamental limitations: (1) insufficient inter-mode interaction due to the unidirectional information flow between modes, and (2) aggressive information aggregation caused by early summarization, resulting in excessive information loss. To address the above limitations, we propose a novel module named InterFormer to learn heterogeneous information interaction in an interleaving style. To achieve better interaction learning, InterFormer enables bidirectional information flow for mutually beneficial learning across different modes. To avoid aggressive information aggregation, we retain complete information in each data mode and use a separate bridging arch for effective information selection and summarization. Our proposed InterFormer achieves state-of-the-art performance on three public datasets and a large-scale industrial dataset.
点击通速率(CTR)预测预测用户点击广告的概率,这是推荐人系统中的一项基本任务。各种信息的出现,如用户概况和行为序列等,从不同方面描绘了用户的兴趣。多种信息之间的互利整合是CTR预测成功的基石。但是,大多数现有方法都受到两个基本限制,包括:(1) 由于不同模式之间单向信息流动造成的模式间互动不足,以及(2) 早期汇总导致的信息过度损失,导致信息大量汇总。为了应对上述限制,我们提议了一个名为 InterFormer 的新模块,以学习互换式的多种信息互动。为了实现更好的互动学习,InterFormer 能够双向信息流动,用于不同模式的互利学习。为了避免积极的信息汇总,我们保留了每个数据模式中的全部信息,并使用一个单独的连接中心,以有效选择和汇总信息。我们提议的InterFormer在三个公共数据集和大规模工业数据集上实现了最新表现。
Article 176
Title@2025-06-25 (3): Learning-Based Resource Management in Integrated Sensing and Communication Systems
Title: Learning-Based Resource Management in Integrated Sensing and Communication Systems | Lernbasiertes Ressourcenmanagement in integrierten Sensing- und Kommunikationssystemen | 综合遥感和通信系统基于学习的资源管理 2506.20849v1 |
Authors (4): Ziyang Lu, M. Cenk Gursoy, Chilukuri K. Mohan, Pramod K. Varshney
In this paper, we tackle the task of adaptive time allocation in integrated sensing and communication systems equipped with radar and communication units. The dual-functional radar-communication system’s task involves allocating dwell times for tracking multiple targets and utilizing the remaining time for data transmission towards estimated target locations. We introduce a novel constrained deep reinforcement learning (CDRL) approach, designed to optimize resource allocation between tracking and communication under time budget constraints, thereby enhancing target communication quality. Our numerical results demonstrate the efficiency of our proposed CDRL framework, confirming its ability to maximize communication quality in highly dynamic environments while adhering to time constraints.
在本文件中,我们处理配备雷达和通信单位的综合遥感和通信系统中的适应性时间分配任务,双功能雷达通信系统的任务是分配时间跟踪多个目标并利用剩余时间将数据传送到估计目标地点;我们采用了新的限制深度强化学习(CDRL)方法,目的是在预算限制下在时间限制下优化跟踪和通信之间的资源分配,从而提高目标通信质量;我们的数字结果显示了我们拟议的CDRL框架的效率,证实它有能力在高度动态的环境中最大限度地提高通信质量,同时遵守时间限制。
Article 177
Title@2025-06-25 (3): From Tiny Machine Learning to Tiny Deep Learning: A Survey
Title: From Tiny Machine Learning to Tiny Deep Learning: A Survey | Vom kleinen maschinellen Lernen bis zum kleinen tiefen Lernen: Eine Umfrage | 从小机器学习到小深习:调查 2506.18927v2 |
Authors (11): Shriyank Somvanshi, Md Monzurul Islam, Gaurab Chhetri, Rohit Chakraborty, Mahmuda Sultana Mimi, Sawgat Ahmed Shuvo, Kazi Sifatul Islam, Syed Aaqib Javed, Sharif Ahmed Rafat, Anandi Dutta, Subasish Das
The rapid growth of edge devices has driven the demand for deploying artificial intelligence (AI) at the edge, giving rise to Tiny Machine Learning (TinyML) and its evolving counterpart, Tiny Deep Learning (TinyDL). While TinyML initially focused on enabling simple inference tasks on microcontrollers, the emergence of TinyDL marks a paradigm shift toward deploying deep learning models on severely resource-constrained hardware. This survey presents a comprehensive overview of the transition from TinyML to TinyDL, encompassing architectural innovations, hardware platforms, model optimization techniques, and software toolchains. We analyze state-of-the-art methods in quantization, pruning, and neural architecture search (NAS), and examine hardware trends from MCUs to dedicated neural accelerators. Furthermore, we categorize software deployment frameworks, compilers, and AutoML tools enabling practical on-device learning. Applications across domains such as computer vision, audio recognition, healthcare, and industrial monitoring are reviewed to illustrate the real-world impact of TinyDL. Finally, we identify emerging directions including neuromorphic computing, federated TinyDL, edge-native foundation models, and domain-specific co-design approaches. This survey aims to serve as a foundational resource for researchers and practitioners, offering a holistic view of the ecosystem and laying the groundwork for future advancements in edge AI.
边缘装置的迅速增长推动了在边缘部署人工智能(AI)的需求,从而产生了小型机器学习(TinyML)及其不断发展的对应机构(Tiny Deep Learning(TinyDL) 。虽然TinyML最初侧重于为微控制器提供简单的推论任务,但微小DL的出现标志着向在资源严重紧缺的硬件方面部署深层次学习模型的范式转变。这项调查全面概述了从小ML到小DL的过渡,包括建筑创新、硬件平台、模型优化技术和软件工具链。我们分析了在定量化、快速运行和神经结构搜索(NAS)方面最先进的方法,并研究了从微控制器到专门的神经加速器的硬件趋势。此外,我们对软件部署框架、编译器和自动ML工具进行了分类,使实用化的硬件学习变得实用。对诸如计算机愿景、音频识别、保健和工业监测等领域的应用进行了审查,以说明小DL的实际世界影响。最后,我们确定了新出现的方向,包括神经上的最先进的计算、更新的边缘和神经边缘构造基础模型,为基础基础基础,为高级基础的内置地基建的实地研究提供了基础基础基础基础。
Article 178
Title@2025-06-25 (3): Reducing Biases in Record Matching Through Scores Calibration
Title: Reducing Biases in Record Matching Through Scores Calibration | Reduzierung von Biasen in Rekorden, die durch Score-Kalibrierung übereinstimmen | 通过计分校准减少记录匹配比分 2411.01685v2 |
Authors (2): Mohammad Hossein Moslemi, Mostafa Milani
Record matching is the task of identifying records that refer to the same real-world entity across datasets. While most existing models optimize for accuracy, fairness has become an important concern due to the potential for unequal outcomes across demographic groups. Prior work typically focuses on binary outcomes evaluated at fixed decision thresholds. However, such evaluations can miss biases in matching scores: biases that persist across thresholds and affect downstream tasks. We propose a threshold-independent framework for measuring and reducing score bias, defined as disparities in the distribution of matching scores across groups. We show that several state-of-the-art matching methods exhibit substantial score bias, even when appearing fair under standard threshold-based metrics. To address this, we introduce two post-processing score calibration algorithms. The first, calib, aligns group-wise score distributions using the Wasserstein barycenter, targeting demographic parity. The second, ccalib, conditions on predicted labels to further reduce label-dependent biases, such as equal opportunity. Both methods are model-agnostic and require no access to model training data. calib also offers theoretical guarantees, ensuring reduced bias with minimal deviation from original scores. Experiments across real-world datasets and matching models confirm that calib and ccalib substantially reduce score bias while minimally impacting model accuracy.
记录匹配是确定记录的任务,这些记录指的是跨数据集的同一个真实世界实体。虽然大多数现有模型优化了准确性,但公平性已成为一个重要的关切问题,因为各人口群体之间可能产生不平等的结果。以往的工作通常侧重于在固定决定阈值下评估的二进制结果。然而,这种评价在匹配跨越阈值并影响下游任务的得分偏差方面可能会出现偏差。我们提出了一个衡量和减少得分偏差的门槛独立框架,即不同组间得分分配的差异。我们显示,一些最先进的匹配方法表现出很大的评分偏差,即使根据标准的门槛衡量标准看是公平的。为了解决这个问题,我们引入了两种后处理得分校准算法。第一个是Calib,在使用瓦塞斯坦百分数中标来调整组间得分分布时,针对人口均等。第二个是Calib,关于预测标签的条件,以进一步减少依赖标签的偏差,例如机会均等。这两种方法都是模型,不需要获得模型培训数据。Clib还提供理论保证,确保减少偏差,同时大幅确认原始得分的精确性模型。
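For one-dimensional scores, the Wasserstein barycenter of several distributions has a simple form: its quantile function is the weighted average of the groups' quantile functions. The sketch below uses that fact to map each group's scores onto a common distribution. It illustrates the general idea behind barycenter-based alignment and is not the paper's calib algorithm; the group-size weights, the interpolation scheme, and all names are assumptions.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an ascending list, for q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def calibrate_to_barycenter(scores_by_group):
    """Map each score to the barycenter value at the score's within-group quantile."""
    sorted_scores = {g: sorted(s) for g, s in scores_by_group.items()}
    total = sum(len(s) for s in scores_by_group.values())
    # Assumption: groups are weighted by their share of the records.
    weights = {g: len(s) / total for g, s in scores_by_group.items()}
    calibrated = {}
    for group, scores in scores_by_group.items():
        n = len(scores)
        order = sorted(range(n), key=lambda i: scores[i])
        adjusted = [0.0] * n
        for rank, index in enumerate(order):
            q = rank / (n - 1) if n > 1 else 0.5
            # Barycenter quantile = weighted average of the group quantiles.
            adjusted[index] = sum(
                weights[g] * quantile(sorted_scores[g], q) for g in sorted_scores
            )
        calibrated[group] = adjusted
    return calibrated


# Made-up matching scores for two demographic groups.
scores = {"group_a": [0.6, 0.2, 0.4], "group_b": [0.4, 0.8, 0.6]}
aligned = calibrate_to_barycenter(scores)
```

After the mapping both groups share the same score distribution, while the ranking of records inside each group is unchanged.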
Article 179
Title@2025-06-25 (3): Uncertainty-Aware Machine-Learning Framework for Predicting Dislocation Plasticity and Stress-Strain Response in FCC Alloys
Title: Uncertainty-Aware Machine-Learning Framework for Predicting Dislocation Plasticity and Stress-Strain Response in FCC Alloys | Unsicheres Machine-Learning-Framework für die Vorhersage von Dislokation Plastizität und Stress-Stain-Reaktion in FCC-Legierungen | FCC合金中预测异异可塑性和压力-压力-压力-压力-压力-压力反应的 不确定性-警报机学习框架 2506.20839v1 |
Authors (5): Jing Luo, Yejun Gu, Yanfei Wang, Xiaolong Ma, Jaafar A. El-Awady
Machine learning has significantly advanced the understanding and application of structural materials, with an increasing emphasis on integrating existing data and quantifying uncertainties in predictive modeling. This study presents a comprehensive methodology utilizing a mixture density network (MDN) model, trained on extensive experimental data from the literature. This approach uniquely predicts the distribution of dislocation density, inferred as a latent variable, and the resulting stress distribution at the grain level. The incorporation of statistical parameters of those predicted distributions into a dislocation-mediated plasticity model allows for accurate stress-strain predictions with explicit uncertainty quantification. This strategy not only improves the accuracy and reliability of mechanical property predictions but also plays a vital role in optimizing alloy design, thereby facilitating the development of new materials in a rapidly evolving industry.
机器学习极大地促进了结构材料的理解和应用,日益强调综合现有数据和量化预测模型的不确定性,本研究提出了使用混合密度网络模型的综合方法,该模型在文献的广泛实验数据方面受过培训,这一方法独特地预测了作为潜在变数推算的失调密度的分布,以及由此在谷物一级的压力分布。将这些预测分布的统计参数纳入一个以错位为中介的可塑性模型,从而可以准确预测压力紧张,并有明确的不确定性量化。这一战略不仅提高了机械地产预测的准确性和可靠性,而且还在优化合金设计方面发挥了至关重要的作用,从而便利了迅速演变的行业新材料的开发。
Article 180
Title@2025-06-25 (3): Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
Title: Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning | Globale falsche Negative auf der Flucht entdecken für selbstüberwachtes kontraproduktives Lernen | 为自我监督的反竞争学习而发现飞行中的全球虚假负差 2502.20612v2 |
Authors (4): Vicente Balmaseda, Bokun Wang, Ching-Long Lin, Tianbao Yang
In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can result in the creation of negative pairs with similar semantics, referred to as “false negatives”, leading to their embeddings being falsely pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns, on the fly, a threshold for each anchor to identify its false negatives during training. In contrast to previous methods for false negative discovery, our approach globally detects false negatives across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND.
在自我监督的对比性学习中,负面对子通常是用锚图像和从整个数据集中提取的样本来构造的,不包括锚。然而,这一方法可能导致产生带有类似语义的负对子,称为“假阴性”,导致其嵌入过程被错误地拆分。为了解决这一问题,我们引入了基于优化的GloFND,这是一种在训练期间自动学习每个锚数据的阈值以识别其虚假底值的优化方法。与以往的虚假负面发现方法相反,我们的做法在全球范围发现整个数据集中的假阴性,而不是在微型批量中在当地。此外,其 Periteration计算成本仍然独立于数据集的大小。图像和图像文本数据的实验结果显示了拟议方法的有效性。我们的实施可以在 https://github.com/vibalcam/GloFND上查阅。
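The thresholding idea in the abstract above can be illustrated with a toy filter that flags candidates whose cosine similarity to the anchor exceeds a threshold and removes them from the negative set. In GloFND the threshold is learned per anchor during training; here it is a fixed, hypothetical input, and the two-dimensional embeddings are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def split_negatives(anchor, candidates, threshold):
    """Return (kept, flagged) candidate indices; flagged ones look like false negatives."""
    kept, flagged = [], []
    for index, candidate in enumerate(candidates):
        if cosine(anchor, candidate) > threshold:
            flagged.append(index)  # too similar to be treated as a negative
        else:
            kept.append(index)
    return kept, flagged


# Made-up 2-D embeddings; real embeddings would come from the encoder.
anchor = (1.0, 0.0)
candidates = [(0.9, 0.1), (0.0, 1.0), (-1.0, 0.0), (1.0, 0.05)]
kept, flagged = split_negatives(anchor, candidates, threshold=0.8)
```

Only the `kept` indices would then enter the contrastive loss as negatives for this anchor.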
Article 181
Title@2025-06-25 (3): Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Title: Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data | Composite Flow passend zum Verstärkungslernen mit Shifted-Dynamics-Daten | 与上下动动量数据匹配的强化学习综合流程 2505.23062v2 |
Authors (5): Lingkai Kong, Haichuan Wang, Tonghan Wang, Guojun Xiong, Milind Tambe
Incorporating pre-collected offline data from a source environment can significantly improve the sample efficiency of reinforcement learning (RL), but this benefit is often challenged by discrepancies between the transition dynamics of the source and target environments. Existing methods typically address this issue by penalizing or filtering out source transitions in high dynamics-gap regions. However, their estimation of the dynamics gap often relies on KL divergence or mutual information, which can be ill-defined when the source and target dynamics have disjoint support. To overcome these limitations, we propose CompFlow, a method grounded in the theoretical connection between flow matching and optimal transport. Specifically, we model the target dynamics as a conditional flow built upon the output distribution of the source-domain flow, rather than learning it directly from a Gaussian prior. This composite structure offers two key advantages: (1) improved generalization for learning target dynamics, and (2) a principled estimation of the dynamics gap via the Wasserstein distance between source and target transitions. Leveraging our principled estimation of the dynamics gap, we further introduce an optimistic active data collection strategy that prioritizes exploration in regions of high dynamics gap, and theoretically prove that it reduces the performance disparity with the optimal policy. Empirically, CompFlow outperforms strong baselines across several RL benchmarks with shifted dynamics.
从源环境预先收集的离线数据可以大大提高强化学习(RL)的抽样效率,但这一效益往往受到源与目标环境过渡动态之间的差异的挑战。现有方法通常通过惩罚或过滤高动态差距区域源的过渡来解决这一问题。但是,它们对于动态差距的估计往往依赖KL差异或相互信息,而当源和目标动态得到不连贯的支持时,这种差异或相互信息可能定义不当。为了克服这些限制,我们提议CompFlow,这是基于流动匹配与最佳运输之间理论联系的一种方法。具体地说,我们将目标动态作为基于源-地流动产出分布的有条件流动模型,而不是直接从Gausian之前的区域学习。这一综合结构提供了两个主要优势:(1) 改进学习目标动态的通用化,(2) 通过源与目标转型之间的瓦瑟斯坦距离对动态差距进行有原则性的估计。我们利用我们对动态差距的原则性估计,我们进一步引入了一种乐观的积极数据收集战略,优先在高动态差距区域进行勘探,从理论上证明它能够减少最佳政策基线之间的强性差距。
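The abstract's point that KL divergence can be ill-defined under disjoint support, while the Wasserstein distance still gives a graded measure of the dynamics gap, can be shown on a one-dimensional toy. This is only an illustration, not CompFlow's estimator: the sample values and function names are assumptions, and the 1-D formula used (mean absolute difference of sorted, equal-size samples) is the standard closed form for empirical distributions.

```python
import math

def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical samples:
    the mean absolute difference of the sorted values."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def kl_histogram(p, q):
    """KL(p || q) for two discrete distributions on the same bins.
    Returns infinity when p puts mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

# Hypothetical next-state samples from source and target dynamics.
source_next = [0.0, 0.1, 0.2, 0.3]
target_near = [1.0, 1.1, 1.2, 1.3]  # shifted by 1, disjoint support
target_far = [5.0, 5.1, 5.2, 5.3]   # shifted by 5, disjoint support

gap_near = wasserstein_1d(source_next, target_near)
gap_far = wasserstein_1d(source_next, target_far)

# Histograms over three bins: source in bin 0, targets in bins 1 and 2.
kl_near = kl_histogram([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
kl_far = kl_histogram([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

Both KL values are infinite and so cannot rank the two shifts, whereas the Wasserstein gap grows with the size of the shift.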
Article 182
Title@2025-06-25 (3): Harnessing the Universal Geometry of Embeddings
Title: Harnessing the Universal Geometry of Embeddings | Die universelle Geometrie der Einbettungen nutzen | 利用通用嵌入式几何法 2505.12540v3 |
Authors (4): Rishi Jha, Collin Zhang, Vitaly Shmatikov, John X. Morris
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal latent representation (i.e., a universal semantic structure conjectured by the Platonic Representation Hypothesis). Our translations achieve high cosine similarity across model pairs with different architectures, parameter counts, and training datasets. The ability to translate unknown embeddings into a different space while preserving their geometry has serious implications for the security of vector databases. An adversary with access only to embedding vectors can extract sensitive information about the underlying documents, sufficient for classification and attribute inference.
我们引入了将文本从一个矢量空间嵌入到另一个矢量空间的第一种方法, 没有任何配对数据、 编码器或预设的匹配组。 我们不受监督的方法将任何嵌入和嵌入于一个普遍的潜值代表( 即, 由等离子代表假设推断的通用语义结构 ) 。 我们的翻译实现了不同结构、 参数计数 和培训数据集的模型对配的高度共生相似性 。 将未知嵌入到不同空间同时保存其几何特征的能力对矢量数据库的安全有着严重影响 。 只有嵌入矢量代表器的对手可以提取关于基本文件的敏感信息, 足以进行分类和属性推断 。
Article 183
Title@2025-06-25 (3): TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
Title: TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation | TaxaDiffusion: Progressiv ausgebildetes Diffusionsmodell für die Generierung feinkörniger Arten | 传税:逐步培训的传税模式 2506.01923v2 |
Authors (6): Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath, Anuj Karpatne, Wei-Lun Chao, Cheng Zhang
We propose TaxaDiffusion, a taxonomy-informed training framework for diffusion models to generate fine-grained animal images with high morphological and identity accuracy. Unlike standard approaches that treat each species as an independent category, TaxaDiffusion incorporates domain knowledge that many species exhibit strong visual similarities, with distinctions often residing in subtle variations of shape, pattern, and color. To exploit these relationships, TaxaDiffusion progressively trains conditioned diffusion models across different taxonomic levels – starting from broad classifications such as Class and Order, refining through Family and Genus, and ultimately distinguishing at the Species level. This hierarchical learning strategy first captures coarse-grained morphological traits shared by species with common ancestors, facilitating knowledge transfer before refining fine-grained differences for species-level distinction. As a result, TaxaDiffusion enables accurate generation even with limited training samples per species. Extensive experiments on three fine-grained animal datasets demonstrate that TaxaDiffusion outperforms existing approaches, achieving superior fidelity in fine-grained animal image generation. Project page: https://amink8.github.io/TaxaDiffusion/
为了利用这种关系,TaxaDiffusion逐步培养不同分类层次的有条件的传播模式 – 从诸如等级和秩序等广泛分类开始,通过家庭和Genus加以改进,并最终在物种一级加以区分。这一等级学习战略首先捕捉与共同祖先的物种共有的杂质形态特征,促进知识的转让,然后才能细化物种层次的差别。因此,TaxaDiffusion使得即使每个物种的培训样本有限,也能进行准确的生成。关于三个细细类动物数据集的广泛实验表明,它们超越了现有的方法,在细类动物图像生成方面实现了优等忠诚。项目网页:https://amink8.github.io/TaxaDiffusion/
Article 184
Title@2025-06-25 (3): Efficacy of Temporal Fusion Transformers for Runoff Simulation
Title: Efficacy of Temporal Fusion Transformers for Runoff Simulation | Wirksamkeit von Temporal Fusion Transformern für Runoff Simulation | 用于模拟径流的时空熔化变换器的效能 2506.20831v1 |
Authors (2): Sinan Rasiya Koya, Tirthankar Roy
Combining attention with recurrence has been shown to be valuable in sequence modeling, including hydrological predictions. Here, we explore the strength of Temporal Fusion Transformers (TFTs) over Long Short-Term Memory (LSTM) networks in rainfall-runoff modeling. We train ten randomly initialized models, TFT and LSTM, for 531 CAMELS catchments in the US. We repeat the experiment with five subsets of the Caravan dataset, each representing catchments in the US, Australia, Brazil, Great Britain, and Chile. Then, the performance of the models, their variability regarding the catchment attributes, and the difference according to the datasets are assessed. Our findings show that TFT slightly outperforms LSTM, especially in simulating the midsection and peak of hydrographs. Furthermore, we show the ability of TFT to handle longer sequences and why it can be a better candidate for higher or larger catchments. Being an explainable AI technique, TFT identifies the key dynamic and static variables, providing valuable scientific insights. However, both TFT and LSTM exhibit a considerable drop in performance with the Caravan dataset, indicating possible data quality issues. Overall, the study highlights the potential of TFT in improving hydrological modeling and understanding.
将注意力与重现结合起来,在包括水文预测在内的序列建模中显示出了宝贵的价值。在这里,我们探索了长期短期内存(LSTM)网络在降雨径流模型中的时间分解变异器(TTTT)的强度。我们为美国531个CAMLES集水区随机培训了10个初始模型(TFT和LSTM)。我们重复了对代表美国、澳大利亚、巴西、大不列颠和智利各集水区的Carvan数据集的5个子集的实验。然后,对模型的性能、它们在集水分属性方面的变异性以及根据数据集的差异进行了评估。我们的调查结果显示,TFT略优于LSTM,特别是在模拟水文学的中段和峰值方面。此外,我们展示了TFT处理较长序列的能力,以及为什么它可以更好地选择更高或更大的集水分集。TFT是一种可解释的技术,TF确定关键的动态和静态变量,提供了宝贵的科学洞察力。然而,TFT和LSTM都展示了与全面航空数据概要的绩效下降。
Article 185
Title@2025-06-25 (3): Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery
Title: Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery | Erweiterte Computer-Vision für die Extraktion georeferenzierten Fahrzeug-Trajektorien aus Drohnenbildern | 从无人机图像中提取地理参照车辆轨迹的高级计算机愿景 2411.02136v3 |
Authors (4): Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis
This paper presents a framework for extracting georeferenced vehicle trajectories from high-altitude drone imagery, addressing key challenges in urban traffic monitoring and the limitations of traditional ground-based systems. Our approach integrates several novel contributions, including a tailored object detector optimized for high-altitude bird’s-eye view perspectives, a unique track stabilization method that uses detected vehicle bounding boxes as exclusion masks during image registration, and an orthophoto and master frame-based georeferencing strategy that enhances consistent alignment across multiple drone viewpoints. Additionally, our framework features robust vehicle dimension estimation and detailed road segmentation, enabling comprehensive traffic analysis. Conducted in the Songdo International Business District, South Korea, the study utilized a multi-drone experiment covering 20 intersections, capturing approximately 12TB of 4K video data over four days. The framework produced two high-quality datasets: the Songdo Traffic dataset, comprising approximately 700,000 unique vehicle trajectories, and the Songdo Vision dataset, containing over 5,000 human-annotated images with about 300,000 vehicle instances in four classes. Comparisons with high-precision sensor data from an instrumented probe vehicle highlight the accuracy and consistency of our extraction pipeline in dense urban environments. The public release of Songdo Traffic and Songdo Vision, and the complete source code for the extraction pipeline, establishes new benchmarks in data quality, reproducibility, and scalability in traffic research. Results demonstrate the potential of integrating drone technology with advanced computer vision for precise and cost-effective urban traffic monitoring, providing valuable resources for developing intelligent transportation systems and enhancing traffic management strategies.
本文提供了一个框架,用于从高空无人驾驶飞机图像中提取地理参照车辆轨迹,应对城市交通监测和传统陆基系统限制方面的主要挑战。我们的方法综合了几项新贡献,包括:为高空鸟视景视角优化的定制目标探测器,一种独特的轨道稳定方法,在图像登记期间使用探测到的车辆捆绑盒作为排除面罩,以及一种以圆形和总框架为基础的地理参照战略,使多无人驾驶飞机观点更加一致。此外,我们的框架包括强力的车辆尺寸估计和详细的路段分割,便于进行全面交通分析。在韩国宋道国际商业区进行的研究利用了涵盖20个交叉点的多轨实验,在四天内采集了约12个4K视频数据。 该框架产生了两个高质量的数据集:Songdo交通数据集,由大约700 000个独特的车辆轨迹组成,以及Songdo Vision数据集,包含5 000多多张的、约300 000个车辆例的智能图象,从而得以全面分析交通状况。与高精确的交通感测器传感器数据进行比较,从一个不断推进的车辆的实验室测试环境中,为改进的准确度数据定位和升级的车辆管理提供最新数据,以展示数据,以展示系统,以展示数据源源的准确性,以显示城市交通数据,以显示和升级的准确性数据,以显示城市交通数据,以显示城市路路路路基路路基路基。
Article 186
Title@2025-06-25 (3): Demystifying Distributed Training of Graph Neural Networks for Link Prediction
Title: Demystifying Distributed Training of Graph Neural Networks for Link Prediction | Entmystifizieren der verteilten Ausbildung von Graphen-Neural-Netzwerken zur Link-Vorhersage | 对图形神经网络进行缩小神秘性分布培训,促进连结预测 2506.20818v1 |
Authors (2): Xin Huang, Chul-Ho Lee
Graph neural networks (GNNs) are powerful tools for solving graph-related problems. Distributed GNN frameworks and systems enhance the scalability of GNNs and accelerate model training, yet most are optimized for node classification. Their performance on link prediction remains underexplored. This paper demystifies distributed training of GNNs for link prediction by investigating the issue of performance degradation when each worker trains a GNN on its assigned partitioned subgraph without having access to the entire graph. We discover that the main sources of the issue come from not only the information loss caused by graph partitioning but also the ways of drawing negative samples during model training. While sharing the complete graph information with each worker resolves the issue and preserves link prediction accuracy, it incurs a high communication cost. We propose SpLPG, which effectively leverages graph sparsification to mitigate the issue of performance degradation at a reduced communication cost. Experiment results on several public real-world datasets demonstrate the effectiveness of SpLPG, which reduces the communication overhead by up to about 80% while mostly preserving link prediction accuracy.
图表神经网络(GNNs)是解决与图形有关的问题的有力工具。 分布式GNN框架和系统提高了GNNs的可缩缩缩性,并加快了模型培训,但大多数都优化了节点分类。 链接预测的性能仍然未得到充分探讨。 本文通过调查每个工人在其指定的分区分层分层图上培训GNN时的性能退化问题,解开了对GNNs进行联系预测的分布式培训。 我们发现,问题的主要来源不仅来自图形分割造成的信息损失,而且还来自模型培训中绘制负面样本的方式。 在与每个工人分享完整的图表信息解决了问题并保留了预测准确性链接的同时,它产生了很高的通信成本。 我们建议SpLPG, 有效地利用图形放大功能,以降低通信成本来缓解性能退化问题。 几个公共真实世界数据集的实验结果证明了SpLPG的有效性, 它将通信费降低到大约80%,同时主要保持链接预测准确性。
Article 187
Title@2025-06-25 (3): Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers
Title: Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers | Universelle und effiziente Erkennung von Adversarialdaten durch nicht einheitliche Auswirkungen auf Netzwerkebenen | 通过对网络层的不统一影响普遍和高效率地检测对立数据 2506.20816v1 |
Authors (2): Furkan Mumcu, Yasin Yilmaz
Deep Neural Networks (DNNs) are notoriously vulnerable to adversarial input designs with limited noise budgets. While numerous successful attacks with subtle modifications to original input have been proposed, defense techniques against these attacks are relatively understudied. Existing defense approaches either focus on improving DNN robustness by negating the effects of perturbations or use a secondary model to detect adversarial data. Although equally important, the attack detection approach, which is studied in this work, provides a more practical defense compared to the robustness approach. We show that the existing detection methods are either ineffective against the state-of-the-art attack techniques or computationally inefficient for real-time processing. We propose a novel universal and efficient method to detect adversarial examples by analyzing the varying degrees of impact of attacks on different DNN layers. Our method trains a lightweight regression model that predicts deeper-layer features from early-layer features, and uses the prediction error to detect adversarial samples. Through theoretical arguments and extensive experiments, we demonstrate that our detection method is highly effective, computationally efficient for real-time processing, compatible with any DNN architecture, and applicable across different domains, such as image, video, and audio.
深神经网络(DNNs)在噪音预算有限的情况下极易受到对抗性输入设计的影响。虽然提出了许多对原始输入进行微妙修改的成功攻击,但防御技术相对而言却相对缺乏研究。现有的防御方法要么侧重于通过排除扰动效应的影响来提高DNN的稳健性,要么使用二级模型来探测敌对数据。虽然同样重要的是,在这项工作中研究的攻击探测方法比强力方法提供了更实际的防御。我们表明,现有的探测方法要么对最先进的攻击技术无效,要么对实时处理方法计算效率低。我们提出了一个新的普遍有效方法,通过分析对DNN不同层次攻击的不同程度的影响来探测对抗性例子。我们的方法是培训一个轻量回归模型,预测早期的更深层特征,并利用预测错误来探测对抗性样品。通过理论论断和广泛的实验,我们证明,我们的探测方法对于实时处理非常有效,计算效率很高,与任何DNN结构相容,并适用于不同领域,例如图像、音频和可应用。
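The detection idea in the abstract above (fit a lightweight regressor from early-layer to deeper-layer features, then flag inputs with a large prediction error) can be sketched as follows. This is a toy under stated assumptions, not the authors' detector: features are scalars, the regressor is ordinary least squares, the feature values are made up, and the 1.5x threshold margin is an arbitrary choice.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b (scalar features)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

def score(early, deep, a, b):
    """Prediction error of the deep feature given the early feature."""
    return abs(deep - (a * early + b))

# Hypothetical clean (early, deep) feature pairs from a trained network.
clean_early = [0.0, 1.0, 2.0, 3.0, 4.0]
clean_deep = [1.0, 3.1, 4.9, 7.0, 9.0]
a, b = fit_line(clean_early, clean_deep)

# Threshold: largest error on clean data times a safety margin (assumed).
threshold = 1.5 * max(score(e, d, a, b) for e, d in zip(clean_early, clean_deep))

def is_adversarial(early, deep):
    """Flag an input whose deep feature deviates from the regressor's guess."""
    return score(early, deep, a, b) > threshold

# A perturbation that barely moves the early feature but shifts the deep one.
flag_clean = is_adversarial(2.5, 6.0)
flag_attack = is_adversarial(2.5, 9.5)
```

The clean pair stays under the threshold while the pair with a shifted deep feature is flagged, which mirrors the nonuniform layer impact the abstract relies on.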
Article 188
Title@2025-06-25 (3): Divide, Specialize, and Route: A New Approach to Efficient Ensemble Learning
Title: Divide, Specialize, and Route: A New Approach to Efficient Ensemble Learning | Divide, Specialize und Route: Ein neuer Ansatz für effizientes Ensemble-Lernen | 区分、专门和路线:高效组合学习的新方式 2506.20814v1 |
Authors (8): Jakub Piwko, Jędrzej Ruciński, Dawid Płudowski, Antoni Zajko, Patryzja Żak, Mateusz Zacharecki, Anna Kozak, Katarzyna Woźnica
Ensemble learning has proven effective in boosting predictive performance, but traditional methods such as bagging, boosting, and dynamic ensemble selection (DES) suffer from high computational cost and limited adaptability to heterogeneous data distributions. To address these limitations, we propose Hellsemble, a novel and interpretable ensemble framework for binary classification that leverages dataset complexity during both training and inference. Hellsemble incrementally partitions the dataset into circles of difficulty by iteratively passing misclassified instances from simpler models to subsequent ones, forming a committee of specialised base learners. Each model is trained on increasingly challenging subsets, while a separate router model learns to assign new instances to the most suitable base model based on inferred difficulty. Hellsemble achieves strong classification accuracy while maintaining computational efficiency and interpretability. Experimental results on OpenML-CC18 and Tabzilla benchmarks demonstrate that Hellsemble often outperforms classical ensemble methods. Our findings suggest that embracing instance-level difficulty offers a promising direction for constructing efficient and robust ensemble systems.
综合学习在提高预测性能方面证明是行之有效的,但传统方法,如包装、提升和动态混合选择(DES),却因计算成本高和适应不同数据分布的能力有限而受到影响。为克服这些限制,我们建议Hellsemble(一个新颖和可解释的二进制分类共同框架),在培训和推断期间利用数据集的复杂性。综合将数据集逐步分割成困难圈,从更简单的模型反复地将错误的分类实例传送到随后的模型,组成一个专门的基础学习者委员会。每个模型都接受挑战性越来越大的子集的培训,而一个单独的路由模型则学会根据推断的困难为最合适的基础模型分配新的实例。Gellsemble(Gellsemble)在保持计算效率和可解释性的同时,实现了很强的分类准确性。OpenML-CC18和Tabzilla基准的实验结果表明,Hellsemble 集合往往超越了典型的组合方法。我们的研究结果表明,采用实例一级的困难为建设高效和健全的组合系统提供了很有希望的方向。
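The training loop described in the abstract above (pass misclassified instances on to the next model, then route new instances to the most suitable one) can be sketched in a few lines. This is an illustrative toy, not the Hellsemble code: the base learners are 1-D decision stumps, the router is assumed to be a 1-nearest-neighbour lookup rather than a learned model, and all function names and data are hypothetical.

```python
def fit_stump(points):
    """Best threshold classifier on 1-D data: predict `polarity` if x > t."""
    xs = [x for x, _ in points]
    best = None
    for t in [min(xs) - 1.0] + xs:
        for polarity in (0, 1):
            hits = sum((polarity if x > t else 1 - polarity) == y for x, y in points)
            if best is None or hits > best[0]:
                best = (hits, t, polarity)
    return best[1], best[2]

def stump_predict(stump, x):
    t, polarity = stump
    return polarity if x > t else 1 - polarity

def fit_hellsemble_sketch(points, max_models=8):
    """Train stumps in sequence; each one sees only what earlier ones got wrong."""
    models, routed, remaining = [], [], list(points)
    while remaining and len(models) < max_models:
        stump = fit_stump(remaining)
        models.append(stump)
        wrong = [(x, y) for x, y in remaining if stump_predict(stump, x) != y]
        # Remember which model handled each correctly classified instance.
        routed += [(x, len(models) - 1) for x, y in remaining
                   if stump_predict(stump, x) == y]
        remaining = wrong
    # Anything still misclassified stays with the last model.
    routed += [(x, len(models) - 1) for x, _ in remaining]
    return models, routed

def predict(models, routed, x):
    """Router (assumed here: 1-nearest neighbour) picks the model for x."""
    _, index = min(routed, key=lambda item: abs(item[0] - x))
    return stump_predict(models[index], x)

# Hypothetical data that no single threshold can separate.
data = [(0.25, 1), (0.75, 1), (1.25, 0), (1.75, 0),
        (2.25, 1), (2.75, 1), (3.25, 0), (3.75, 0)]
models, routed = fit_hellsemble_sketch(data)
train_accuracy = sum(predict(models, routed, x) == y for x, y in data) / len(data)
```

A single stump gets at most six of the eight points right here, so at least two specialised models are needed; at inference only the routed model is evaluated, which is where the efficiency argument comes from.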
Article 189
Title@2025-06-25 (3): FINN-GL: Generalized Mixed-Precision Extensions for FPGA-Accelerated LSTMs
Title: FINN-GL: Generalized Mixed-Precision Extensions for FPGA-Accelerated LSTMs | FINN-GL: Generalisierte Mischpräzisionserweiterungen für FPGA-beschleunigte LSTMs | FINN-GL:FPGA加速式LSTMs通用混合精密扩展 2506.20810v1 |
Authors (5): Shashwat Khandelwal, Jakoba Petri-Koenig, Thomas B. Preußer, Michaela Blott, Shreejith Shanker
Recurrent neural networks (RNNs), particularly LSTMs, are effective for time-series tasks like sentiment analysis and short-term stock prediction. However, their computational complexity poses challenges for real-time deployment in resource-constrained environments. While FPGAs offer a promising platform for energy-efficient AI acceleration, existing tools mainly target feed-forward networks, and LSTM acceleration typically requires full custom implementation. In this paper, we address this gap by leveraging the open-source and extensible FINN framework to enable the generalized deployment of LSTMs on FPGAs. Specifically, we leverage the Scan operator from the Open Neural Network Exchange (ONNX) specification to model the recurrent nature of LSTM computations, enabling support for mixed quantisation within them and functional verification of LSTM-based models. Furthermore, we introduce custom transformations within the FINN compiler to map the quantised ONNX computation graph to hardware blocks from the HLS kernel library of the FINN compiler and Vitis HLS. We validate the proposed tool-flow by training a quantised ConvLSTM model for a mid-price stock prediction task using the widely used dataset and generating a corresponding hardware IP of the model using our flow, targeting the XCZU7EV device. We show that the quantised ConvLSTM accelerator generated through our flow achieves a balance between performance (latency) and resource consumption, while matching (or bettering) the inference accuracy of state-of-the-art models with reduced precision. We believe that the generalisable nature of the proposed flow will pave the way for resource-efficient RNN accelerator designs on FPGAs.
常规神经网络,特别是LSTMS,对于情绪分析和短期库存预测等时间序列任务有效。然而,它们的计算复杂性对资源受限环境中实时部署构成挑战。虽然FPGAs为节能AI加速提供了充满希望的平台,但现有工具主要是向前反馈网络,而LSTM加速通常需要完全自定义实施。在本文件中,我们利用开放源码和可扩展的FINNN框架框架来解决这一差距,以便能够在FFPGAs上普遍部署LSTMS。具体地说,我们利用开放神经网络交换(ONNX)的扫描操作员来模拟LSTM计算在资源受限环境中的经常性质,从而能够支持内部的混合量化和基于LSTM的模型的功能核查。此外,我们在FINN汇编器中引入自定义转换,以绘制 ONX 计算图图图,以来自FINNCRC汇编和Vission State SLS图书馆的硬块块。我们确认拟议的工具流动工具流动方式,在CONLSADMT的通用流程模型中,同时使用我们用于运行的运行的内流数据预结果。
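The role of a scan-style operator in expressing recurrence, as mentioned in the abstract above, can be illustrated in plain Python. This sketch only mirrors the general idea of threading a state through a per-step body; it is not the ONNX Scan operator's API and not FINN code. The cell is a deliberately simplified scalar gated unit rather than a full LSTM, and every name in it is hypothetical.

```python
import math

def scan(body, init_state, inputs):
    """Apply `body` to each input in order, threading the state through.
    Returns the final state and the list of per-step outputs."""
    state, outputs = init_state, []
    for x in inputs:
        state, out = body(state, x)
        outputs.append(out)
    return state, outputs

def gated_cell(state, x, w_gate=1.0, w_cand=1.0):
    """A simplified scalar gated cell (not a full LSTM):
    the gate decides how much of the old state to keep."""
    gate = 1.0 / (1.0 + math.exp(-(w_gate * x)))  # sigmoid gate in (0, 1)
    candidate = math.tanh(w_cand * x)
    new_state = gate * state + (1.0 - gate) * candidate
    return new_state, new_state  # the output equals the new state

final_state, outputs = scan(gated_cell, 0.0, [0.5, -1.0, 2.0])
```

Because the loop body is a fixed subgraph applied once per time step, a compiler can treat the body as an ordinary feed-forward graph and handle the iteration separately.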
Article 190
Title@2025-06-25 (3): GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
Title: GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization | GPU-Kernel-Wissenschaftler: Ein LLM-getriebenes Framework für iterative Kernel-Optimierung | GPU 核心科学家:循环核心优化LLM-驱动框架 2506.20807v1 |
Authors (2): Martin Andrews, Sam Witteveen
Optimizing GPU kernels for high performance is a complex task, often demanding deep architectural knowledge, extensive profiling, and iterative experimentation. This challenge is amplified when targeting newer or less-documented GPU architectures where traditional development aids are scarce. This paper introduces an LLM-powered “GPU Kernel Scientist,” an automated methodology for iteratively refining accelerator kernels. Our methodology employs LLMs in a multi-stage, evolutionary process: (a) strategically selecting promising prior code versions as a basis for new iterations; (b) generating hypotheses for optimization experiments, based on existing code and assimilated knowledge from general GPU literature; and (c) autonomously implementing these experiments through code modification and subsequent submission to an external evaluation system, using only observed timing data as performance feedback. We detail how this approach navigates the challenges of the AMD MI300 target architecture and leverages LLMs to compensate for limited domain-specific human expertise. Since quantitative results from an ongoing performance competition were embargoed on paper submission date, we present the architectural design, operational workflow, and qualitative insights, highlighting the potential of LLM-driven agents to democratise and accelerate GPU kernel optimization, especially in resource-constrained or rapidly evolving hardware environments.
优化 GPU 内核以取得高绩效是一项复杂的任务,往往需要深层次的建筑知识、广泛的剖析和迭代实验。当针对传统发展辅助手段稀缺的较新或较少记录的 GPU 结构时,这一挑战会更加艰巨。本文介绍了LLM 驱动的“GPU Kernel 科学家 ” , 这是一种自动的方法,用于迭接地精炼加速器内核。我们的方法在一个多阶段的演进过程中采用LMS : (a) 从战略上选择有前途的先前代码版本,作为新的迭代的基础;(b) 根据现有的代码和一般GPU文献的吸收知识,为优化实验创造假说;以及(c) 通过修改代码和随后向外部评价系统提交数据,仅使用观察的定时数据作为业绩反馈,自主实施这些实验。我们详细介绍这一方法如何应对AMM300目标架构的挑战,并利用LMMS 来补偿有限的具体领域人类专门知识。由于持续的业绩竞争的结果在纸质提交日期被禁,我们介绍建筑设计设计、操作工作流程和定性洞察;以及定性洞察看,通过规则,强调不断变化的硬质化的硬质环境,特别是加速的硬质分析器。
Article 191
Title@2025-06-25 (3): The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
Title: The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas | Die Ideation-Execution-Gap: Ergebnisse der LLM-generierten gegen menschliche Forschungsideen | 观察与执行差距:LLM-Genered与人类研究概念的执行结果 2506.20803v1 |
Authors (3): Chenglei Si, Tatsunori Hashimoto, Diyi Yang
Large Language Models (LLMs) have shown promise in accelerating the scientific research pipeline. A key capability for this process is the ability to generate novel research ideas, and prior studies have found settings in which LLM-generated research ideas were judged as more novel than human-expert ideas. However, a good idea should not simply appear to be novel; it should also result in better research after being executed. To test whether AI-generated ideas lead to better research outcomes, we conduct an execution study by recruiting 43 expert researchers to execute randomly-assigned ideas, either written by experts or generated by an LLM. Each expert spent over 100 hours implementing the idea and wrote a 4-page short paper to document the experiments. All the executed projects are then reviewed blindly by expert NLP researchers. Comparing the review scores of the same ideas before and after execution, the scores of the LLM-generated ideas decrease significantly more than those of expert-written ideas on all evaluation metrics (novelty, excitement, effectiveness, and overall; p < 0.05), closing the gap between LLM and human ideas observed at the ideation stage. When comparing the aggregated review scores from the execution study, we even observe that for many metrics there is a flip in rankings where human ideas score higher than LLM ideas. This ideation-execution gap highlights the limitations of current LLMs in generating truly effective research ideas and the challenge of evaluating research ideas in the absence of execution outcomes.
大型语言模型(LLMS)在加速科学研究管道方面表现出了希望。 这一过程的一个关键能力是能够产生新的研究想法,而先前的研究发现,LLM公司产生的研究想法被评为比人类专家想法更新颖的,但是,一个好的想法不应该简单看起来是新颖的,它在执行后也应导致更好的研究。为了检验AI公司产生的想法是否导致更好的研究成果,我们开展了一项执行研究,征聘了43名专家研究人员随机执行专家或专家或LLM公司提出的想法。 每位专家花费了100多小时的时间来实施这一想法,并写了一份四页的短文件来记录实验。然后,所有执行的项目都由NLP公司的专家研究人员盲目地加以审查。比较了相同想法在实施前后的评分,LM公司提出的想法的得分比专家撰写的关于所有评价指标(新颖、刺激、有效和整体;第 < 0.05页)的构想。 每位专家都花100多小时的时间来落实这一想法,并撰写了一份四页短的短的短的论文。 在比较执行过程中,所有执行中的所有项目项目都被盲目审查的评分时,然后由NLPM公司研究人员。 研究的评分中,我们观察到在实际的评分中观察到的评分中,甚至观察了执行中的许多评测测测测了执行中的许多评分。
Article 192
Title@2025-06-25 (3): Structural System Identification via Validation and Adaptation
Title: Structural System Identification via Validation and Adaptation | Strukturelle Systemidentifikation durch Validierung und Anpassung | 通过校验和适应确定结构系统 2506.20799v1 |
Authors (2): Cristian López, Keegan J. Moore
Estimating the governing equation parameter values is essential for integrating experimental data with scientific theory to understand, validate, and predict the dynamics of complex systems. In this work, we propose a new method for structural system identification (SI), uncertainty quantification, and validation directly from data. Inspired by generative modeling frameworks, a neural network maps random noise to physically meaningful parameters. These parameters are then used in the known equation of motion to obtain fake accelerations, which are compared to real training data via a mean square error loss. To simultaneously validate the learned parameters, we use independent validation datasets. The generated accelerations from these datasets are evaluated by a discriminator network, which determines whether the output is real or fake, and guides the parameter-generator network. Analytical and real experiments show the parameter estimation accuracy and model validation for different nonlinear structural systems.
估算正方程参数值对于将实验数据与科学理论相结合以理解、验证和预测复杂系统的动态至关重要。 在这项工作中,我们提出了一个新的结构系统识别方法(SI)、不确定性量化和直接从数据中验证。受基因模型框架的启发,神经网络将随机噪音绘制为具有实际意义的参数。这些参数随后用于已知的运动方程式中,以获得假加速度,这些加速度通过平均平方差损失与实际培训数据进行比较。为了同时验证所学参数,我们使用独立的验证数据集。这些数据集产生的加速度由一个歧视者网络进行评估,该网络确定产出是真实的还是假的,并指导参数生成器网络。分析和实际实验显示参数估算准确性和不同非线性结构系统的模型验证。
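The step in which candidate parameters are pushed through the known equation of motion and compared with measured accelerations via a mean-square-error loss can be sketched as below. This toy omits the generator and discriminator networks entirely and assumes a linear single-degree-of-freedom oscillator with unit mass, which is simpler than the nonlinear systems the abstract mentions; the states and parameter values are made up.

```python
def acceleration(x, v, c, k, m=1.0):
    """Free response of m*x'' + c*x' + k*x = 0, solved for x''."""
    return -(c * v + k * x) / m

def mse(predicted, measured):
    """Mean square error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(predicted, measured)) / len(measured)

# Hypothetical (displacement, velocity) states and the accelerations
# "measured" from a system with damping c = 0.3 and stiffness k = 4.0.
states = [(1.0, 0.0), (0.5, -1.0), (-0.2, 0.8), (0.0, 1.5)]
measured = [acceleration(x, v, c=0.3, k=4.0) for x, v in states]

def loss(c, k):
    """Loss of a candidate parameter pair against the measured data."""
    return mse([acceleration(x, v, c, k) for x, v in states], measured)

loss_true = loss(0.3, 4.0)
loss_wrong = loss(0.1, 3.0)
```

In the method described above, a network would propose `c` and `k` from random noise and be trained to drive this loss down, with a discriminator checking the generated accelerations on held-out validation data.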
Article 193
Title@2025-06-25 (3): Stochastic Parameter Decomposition
Title: Stochastic Parameter Decomposition | Stochastischer Parameter Zersetzung | 蒸汽参数分解 2506.20790v1 |
Authors (3): Lucius Bushnaq, Dan Braun, Lee Sharkey
A key step in reverse engineering neural networks is to decompose them into simpler parts that can be studied in relative isolation. Linear parameter decomposition – a framework that has been proposed to resolve several issues with current decomposition methods – decomposes neural network parameters into a sum of sparsely used vectors in parameter space. However, the current main method in this framework, Attribution-based Parameter Decomposition (APD), is impractical on account of its computational cost and sensitivity to hyperparameters. In this work, we introduce Stochastic Parameter Decomposition (SPD), a method that is more scalable and robust to hyperparameters than APD, which we demonstrate by decomposing models that are slightly larger and more complex than was possible to decompose with APD. We also show that SPD avoids other issues, such as shrinkage of the learned parameters, and better identifies ground truth mechanisms in toy models. By bridging causal mediation analysis and network decomposition methods, this demonstration opens up new research possibilities in mechanistic interpretability by removing barriers to scaling linear parameter decomposition methods to larger models. We release a library for running SPD and reproducing our experiments at https://github.com/goodfire-ai/spd.
反向工程神经网络的一个关键步骤是将其分解为较简单的部分,可以在相对孤立的情况下加以研究。线性参数分解 – – 为解决当前分解方法中的若干问题而提议的一个框架 – – 将神经网络参数分解成参数空间中少用矢量的总和。然而,目前这个框架中的主要方法,即基于归因的参数分解(APD),由于其计算成本和对超参数的敏感度,是不切实际的。在这项工作中,我们引入了\textit{Stochaistic Parater Dicomposition}(SPD),这是一个比亚分解方法更可伸缩和有力的方法,这个方法通过分解比亚分解法中多得多的模型,将神经网络参数分解成在参数空间中少得多的矢量体矢量。我们还表明,目前SPDD避免了其他问题,例如所学参数的缩缩缩放,以及更好地辨别高度模型中的地面真相机制。通过弥补因果关系的调解分析和网络分解方法,这一演示在机械性解释上开辟了新的研究可能性,通过消除障碍和超度分解性参数,我们在SDDPDSmaimimal上进行我们的实验模型。
Article 194
Title@2025-06-25 (3): Spiking Neural Networks for SAR Interferometric Phase Unwrapping: A Theoretical Framework for Energy-Efficient Processing
Title: Spiking Neural Networks for SAR Interferometric Phase Unwrapping: A Theoretical Framework for Energy-Efficient Processing | Spiking Neural Networks for SAR Interferometric Phase Unwrapping: Ein theoretischer Rahmen für energieeffiziente Verarbeitung | 用于合成孔径雷达干涉测量阶段拆解的Spiking神经网络:节能处理理论框架 2506.20782v1 |
Authors (1): Marc Bara
We present the first theoretical framework for applying spiking neural networks (SNNs) to synthetic aperture radar (SAR) interferometric phase unwrapping. Despite extensive research in both domains, our comprehensive literature review confirms that SNNs have never been applied to phase unwrapping, representing a significant gap in current methodologies. As Earth observation data volumes continue to grow exponentially (with missions like NISAR expected to generate 100PB in two years), energy-efficient processing becomes critical for sustainable data center operations. SNNs, with their event-driven computation model, offer potential energy savings of 30-100x compared to conventional approaches while maintaining comparable accuracy. We develop spike encoding schemes specifically designed for wrapped phase data, propose SNN architectures that leverage the spatial propagation nature of phase unwrapping, and provide theoretical analysis of computational complexity and convergence properties. Our framework demonstrates how the temporal dynamics inherent in SNNs can naturally model the spatial continuity constraints fundamental to phase unwrapping. This work opens a new research direction at the intersection of neuromorphic computing and SAR interferometry, offering a complementary approach to existing algorithms that could enable more sustainable large-scale InSAR processing.
我们提出了将神经网络(SNNS)应用于合成孔径雷达(SAR)干涉测量阶段拆解的第一个理论框架。尽管在这两个领域进行了广泛的研究,我们的全面文献审查证实,SNNS从未应用于拆解阶段,这是目前方法中的一个巨大差距。随着地球观测数据量继续成倍增长(像NISAR这样的飞行任务预计在两年内产生100PB),节能处理对于可持续的数据中心运作至关重要。SNNS及其事件驱动的计算模型提供了与常规方法相比的30-100x的潜在节能,同时保持了可比较的准确性。我们专门为包裹阶段数据开发了峰值编码计划,提出了利用拆解阶段的空间传播性质的SNNN结构,并对计算的复杂性和趋同性进行了理论分析。我们的框架表明SNNS所固有的时间动态如何自然地模拟到拆解阶段所必须的空间连续性限制。这项工作开启了神经形态计算和合成合成合成孔径雷达的交叉点的新研究方向,为现有的算法提供了补充方法,以便能够更可持续的大规模地进行处理。
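For readers unfamiliar with the underlying problem, the spatial continuity constraint mentioned in the abstract above can be shown with a conventional one-dimensional unwrapping routine. This is a classical Itoh-style baseline written for illustration, not the spiking framework proposed in the paper; the phase ramp is synthetic and the function names are assumptions.

```python
import math

def wrap(phase):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi

def unwrap_1d(wrapped):
    """Itoh-style 1-D unwrapping: integrate wrapped phase differences.
    Valid only if the true phase changes by less than pi between samples."""
    unwrapped = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        unwrapped.append(unwrapped[-1] + wrap(cur - prev))
    return unwrapped

# Hypothetical smooth phase ramp spanning several multiples of 2*pi.
true_phase = [0.4 * i for i in range(50)]
wrapped = [wrap(p) for p in true_phase]
recovered = unwrap_1d(wrapped)
max_error = max(abs(r - t) for r, t in zip(recovered, true_phase))
```

The recovery succeeds only because neighbouring samples differ by less than pi; each sample's estimate is propagated from its neighbour, which is the spatial propagation structure the abstract proposes to map onto spiking dynamics.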
Article 195
Title@2025-06-25 (3): Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon
Title: Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon | Stabile Minima der ReLU Neuronalen Netzwerke leiden unter dem Fluch der Dimensionalität: Das neurale Shattering Phänomen | ReLU神经网络中受多面性诅咒之苦的神经网络的稳定微型:神经震荡现象 2506.20779v1 |
Authors (4): Tongtong Liang, Dan Qiao, Yu-Xiang Wang, Rahul Parhi
We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs – a problem well motivated by the minima stability and edge-of-stability phenomena in gradient-descent training. Existing work either requires interpolation or focuses only on univariate inputs. This paper presents new and somewhat surprising theoretical results for multivariate inputs. In two natural settings, (1) the generalization gap for flat solutions and (2) the mean-squared error (MSE) in nonparametric function estimation by stable minima, we prove upper and lower bounds, which establish that while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows. This gives an exponential separation between flat solutions and low-norm solutions (i.e., weight decay), which are known not to suffer from the curse of dimensionality. In particular, our minimax lower bound construction, based on a novel packing argument with boundary-localized ReLU neurons, reveals how flat solutions can exploit a kind of “neural shattering” where neurons rarely activate, but with high weight magnitudes. This leads to poor performance in high dimensions. We corroborate these theoretical findings with extensive numerical simulations. To the best of our knowledge, our analysis provides the first systematic explanation for why flat minima may fail to generalize in high dimensions.
我们研究了平面/低(损失)曲率的隐含偏差,以及它对于具有多变量投入的双层超分度ReLU网络普遍化的影响 – – 这是由渐变培训中微小稳定性和稳定边缘现象引起的问题。 现有工作要么需要内插, 要么只侧重于单向输入。 本文为多变量输入提供了新的和有点令人惊奇的理论结果。 在两种自然环境中:(1) 平面解决方案的概括性差距, 以及(2) 以稳定的迷你模型进行的非对称函数估计的中度差( MSE) , 我们证明, 平面的确意味着普遍化, 由此产生的趋同速度必然会随着输入层面的增长而急剧恶化。 这使得平面解决方案与低温解决方案( e. 重量衰减) 之间呈指数分化, 这在知情中不会受到维度的诅咒。 特别是, 我们的微缩缩缩的下界框构建( MSE) 在与边界定位的ReLU神经元的新型包装论证下, 证明了平面解决方案在高层次上是如何利用高层次的推导力, 在高层次分析中如何利用“ ” 模拟分析中, 使这些压性结果成为了“ ” 的压压压力, 我们的深度的深度分析。
Article 196
Title@2025-06-25 (3): Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Title: Steering Your Diffusion Policy with Latent Space Reinforcement Learning | Steuerung Ihrer Diffusionspolitik mit latentem Raum-Verstärkung-Lernen | 指导您的发射政策 与远程空间加强学习 2506.15799v2 |
Authors (8): Andrew Wagenmaker, Mitsuhiko Nakamoto, Yunchu Zhang, Seohong Park, Waleed Yagoub, Anusha Nagabandi, Abhishek Gupta, Sergey Levine
Robotic control policies learned from human demonstrations have achieved impressive results in many real-world applications. However, in scenarios where initial performance is not satisfactory, as is often the case in novel open-world settings, such behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior – an expensive and time-consuming process. In contrast, reinforcement learning (RL) holds the promise of enabling autonomous online policy improvement, but often falls short of achieving this due to the large number of samples it typically requires. In this work we take steps towards enabling fast autonomous adaptation of BC-trained policies via efficient real-world RL. Focusing in particular on diffusion policies – a state-of-the-art BC methodology – we propose diffusion steering via reinforcement learning (DSRL): adapting the BC policy by running RL over its latent-noise space. We show that DSRL is highly sample efficient, requires only black-box access to the BC policy, and enables effective real-world autonomous policy improvement. Furthermore, DSRL avoids many of the challenges associated with finetuning diffusion policies, obviating the need to modify the weights of the base policy at all. We demonstrate DSRL on simulated benchmarks, real-world robotic tasks, and for adapting pretrained generalist policies, illustrating its sample efficiency and effective performance at real-world policy improvement.
从人类演示中汲取的机器人控制政策在许多现实世界应用中取得了令人印象深刻的成果。然而,在最初表现不尽人意的情景中,如新颖的开放世界环境中经常出现的情况那样,这种行为性克隆(BC)学识政策通常要求收集更多的人类演示,以进一步改善其行为 – – 这是一种昂贵和耗时的过程。相比之下,强化学习(RL)有希望促成自主的在线政策改进,但由于它通常需要大量样本,因此往往无法做到这一点。在这项工作中,我们采取步骤,通过高效率的现实世界RL,使接受不列颠哥伦比亚培训的政策能够快速自主地适应。我们特别侧重于传播政策 – – 一种最先进的不列颠哥伦比亚方法 – – 我们建议通过强化学习(DSRL)来收集更多的人类演示,以进一步改进不列颠哥伦比亚政策。我们表明,加强学习(RL)具有高度的样本效率,只需要黑箱访问不列行政策,并能够有效地改进真实世界自主政策。此外,DSRL避免了与调整传播政策相关的许多挑战,避免了在基础政策改进方面改变实际比重,我们展示了基础政策基准标准,我们展示了所有标准。
Article 197
Title@2025-06-25 (3): Revealing higher-order neural representations of uncertainty with the Noise Estimation through Reinforcement-based Diffusion (NERD) model
Title: Revealing higher-order neural representations of uncertainty with the Noise Estimation through Reinforcement-based Diffusion (NERD) model | Enthüllen von neuronalen Darstellungen höherer Ordnung von Unsicherheiten mit der Lärmschätzung durch das Modell der Verstärkungs-basierten Diffusion (NERD) | 通过以增援为基础的扩散(NERD)模型进行噪音估计,以揭示高阶神经神经神经的不确定性 2503.14333v2 |
Authors (2): Hojjat Azimi Asrari, Megan A. K. Peters
Studies often aim to reveal "first-order" representations (FORs), which encode aspects of an observer's environment, such as contents or structure. A less-common target is "higher-order" representations (HORs), which are "about" FORs – e.g., their strength or uncertainty – and which may contribute to learning. HORs about uncertainty are unlikely to be direct "read-outs" of FOR characteristics, instead reflecting noisy estimation processes incorporating prior expectations about uncertainty, but how the brain represents such expected uncertainty distributions remains largely unexplored. Here, we study "noise expectation" HORs using neural data from a task which may require the brain to learn about its own noise: decoded neurofeedback, wherein human subjects learn to volitionally produce target neural patterns. We develop and apply a Noise Estimation through Reinforcement-based Diffusion (NERD) model to characterize how brains may undertake this process, and show that NERD offers high explanatory power for human behavior.
研究往往旨在揭示“第一顺序”的表示(FORs),这种表示(FORs)将观察员环境的各个方面,如内容或结构等纳入其中。一个较不常见的目标是“高顺序”的表示(HORs),这些表示(HORs)是“关于”的表示(例如,其强度或不确定性),可能有助于学习。关于不确定性的表示不大可能直接地“解释”用于特性,而是反映包含先前对不确定性的预期期望的噪音估计过程,但大脑如何代表这种预期的不确定分布仍然基本上没有得到探讨。在这里,我们研究“新要求”的指使(HORs)使用神经数据,这可能要求大脑了解自己的噪音:解码神经节,让人类的主体学会自动生成目标神经模式。我们开发并应用一个噪音刺激模型,通过基于强化的Difculation(NERD)来描述大脑如何进行这种过程,并显示NERD为人类行为提供高度的解释力。
Article 198
Title@2025-06-25 (3): Stochastic and Non-local Closure Modeling for Nonlinear Dynamical Systems via Latent Score-based Generative Models
Title: Stochastic and Non-local Closure Modeling for Nonlinear Dynamical Systems via Latent Score-based Generative Models | Stochastische und nicht-lokale Verschlussmodellierung für nichtlineare dynamische Systeme über latente Score-basierte Generative Modelle | 通过低记分生成模型为非线性动态系统模拟非线性动态系统建立存储和非本地关闭模型 2506.20771v1 |
Authors (3): Xinghao Dong, Huchen Yang, Jin-Long Wu
We propose a latent score-based generative AI framework for learning stochastic, non-local closure models and constitutive laws in nonlinear dynamical systems of computational mechanics. This work addresses a key challenge of modeling complex multiscale dynamical systems without a clear scale separation, for which numerically resolving all scales is prohibitively expensive, e.g., for engineering turbulent flows. While classical closure modeling methods leverage domain knowledge to approximate subgrid-scale phenomena, their deterministic and local assumptions can be too restrictive in regimes lacking a clear scale separation. Recent developments of diffusion-based stochastic models have shown promise in the context of closure modeling, but their prohibitive computational inference cost limits their practical use in many real-world applications. This work addresses this limitation by jointly training convolutional autoencoders with conditional diffusion models in the latent spaces, significantly reducing the dimensionality of the sampling process while preserving essential physical characteristics. Numerical results demonstrate that the joint training approach helps discover a proper latent space that not only guarantees small reconstruction errors but also ensures good performance of the diffusion model in the latent space. When integrated into numerical simulations, the proposed stochastic modeling framework via latent conditional diffusion models achieves significant computational acceleration while maintaining comparable predictive accuracy to standard diffusion models in physical spaces.
我们提议了一个潜在的基于分数的基因化AI框架,用于在非线性动态计算力系统中学习随机、非局部封闭模型和构成法律。这项工作应对了一个关键挑战,即在不进行明确的规模分离的情况下,模拟复杂的多尺度动态系统,而没有进行明确的规模分离,为此,数字解决所有尺度的尺度都极其昂贵,例如工程动荡流动。虽然传统的封闭模型方法利用域知识,以近似亚电网规模现象,但其确定性和地方假设在缺乏明确规模分离的制度中可能限制性过强。基于扩散的随机模型的近期发展在封闭模型方面显示了希望,但其令人望而却令人望而却步的计算性推论成本限制了许多现实世界应用的实际应用。这项工作通过在潜在空间联合培训具有有条件扩散模型的革命性自动组合者,大大降低取样过程的维度,同时保留基本的物理特征。数字性结果显示,联合培训方法有助于发现一个适当的潜伏空间,不仅保证小规模重建错误,而且还确保潜在空间的传播模型的良好性。在通过可比较的精确性模型中实现可比较的加速性模型的模拟,同时通过可比较的精确的加速性模型,在可变的精确性模型中实现。
Article 199
Title@2025-06-25 (3): GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
Title: GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs | GASP: Effiziente Black-Box-Generation von Adversarial Suffixen für Jailbreaking LLMs | GASP: 高效的黑色塑料制成的防腐化塑料塑料 2411.14133v2 |
Authors (2): Advik Raj Basani, Xiao Zhang
LLMs have shown impressive capabilities across various natural language processing tasks, yet remain vulnerable to input prompts, known as jailbreak attacks, carefully designed to bypass safety guardrails and elicit harmful responses. Traditional methods rely on manual heuristics but suffer from limited generalizability. Despite being automatic, optimization-based attacks often produce unnatural prompts that can be easily detected by safety filters or require high computational costs due to discrete token optimization. In this paper, we introduce Generative Adversarial Suffix Prompter (GASP), a novel automated framework that can efficiently generate human-readable jailbreak prompts in a fully black-box setting. In particular, GASP leverages latent Bayesian optimization to craft adversarial suffixes by efficiently exploring continuous latent embedding spaces, gradually optimizing the suffix prompter to improve attack efficacy while balancing prompt coherence via a targeted iterative refinement procedure. Through comprehensive experiments, we show that GASP can produce natural adversarial prompts, significantly improving jailbreak success over baselines, reducing training times, and accelerating inference speed, thus making it an efficient and scalable solution for red-teaming LLMs.
LLMS在各种自然语言处理任务中表现出了令人印象深刻的能力,但是仍然容易受到被称为越狱攻击的输入提示的影响,这种输入提示被称作越狱攻击,精心设计以绕过安全护栏和引起有害反应;传统方法依靠人工超力,但受一般限制;尽管是自动的,但以优化为基础的攻击往往产生非自然的提示,而安全过滤器可以很容易地检测到,或者由于离散的象征优化而需要很高的计算成本;在本文中,我们引入了“General Aversarial Suffix Expresser”(GASP),这是一个新的自动化框架,可以在完全黑箱环境下有效生成人可读的越狱提示;特别是,GASP利用隐性巴伊西亚优化工具,通过高效探索持续潜伏的嵌入空间,来制造对抗性后继物,逐渐优化防震动器,通过有针对性的迭接改进程序来平衡迅速的一致性;我们通过全面实验,显示GASPASPSPS能够产生自然的对抗提示,大大改进防守基线的成功,减少培训时间,加速推导速度,从而使其成为红色磁磁管的高效和可伸缩的解决方案。
Article 200
Title@2025-06-25 (3): Control and optimization for Neural Partial Differential Equations in Supervised Learning
Title: Control and optimization for Neural Partial Differential Equations in Supervised Learning | Steuerung und Optimierung für neurale Teildifferenzialgleichungen im Supervised Learning | 受监督学习中神经部分差异等同的控制与优化 2506.20764v1 |
Authors (3): Alain Bensoussan, Minh-Binh Tran, Bangjie Wang
Although there is a substantial body of literature on control and optimization problems for parabolic and hyperbolic systems, the specific problem of controlling and optimizing the coefficients of the associated operators within such systems has not yet been thoroughly explored. In this work, we aim to initiate a line of research in control theory focused on optimizing and controlling the coefficients of these operators – a problem that naturally arises in the context of neural networks and supervised learning. In supervised learning, the primary objective is to transport initial data toward target data through the layers of a neural network. We propose a novel perspective: neural networks can be interpreted as partial differential equations (PDEs). From this viewpoint, the control problem traditionally studied in the context of ordinary differential equations (ODEs) is reformulated as a control problem for PDEs, specifically targeting the optimization and control of coefficients in parabolic and hyperbolic operators. To the best of our knowledge, this specific problem has not yet been systematically addressed in the control theory of PDEs. To this end, we propose a dual system formulation for the control and optimization problem associated with parabolic PDEs, laying the groundwork for the development of efficient numerical schemes in future research. We also provide a theoretical proof showing that the control and optimization problem for parabolic PDEs admits minimizers. Finally, we investigate the control problem associated with hyperbolic PDEs and prove the existence of solutions for a corresponding approximated control problem.
虽然关于对抛物线和双曲系统的控制和优化问题有大量文献,但控制和优化这些系统中相关操作者系数的具体问题尚未彻底探讨,在这项工作中,我们的目标是发起一系列控制理论的研究,重点是优化和控制这些操作者的系数,这是神经网络和受监督的学习中自然产生的一个问题。在监督的学习中,首要目标是通过神经网络层将初始数据传送到目标数据中。我们提出了一个新观点:神经网络可以被解释为部分差异方程式(PDEs)。从这个观点看,传统上在普通差异方程式(ODs)背景下研究的控制问题被重新确定为PDEs的控制问题,具体针对对在神经网络和受监督的学习中自然产生的系数的优化和控制问题。根据我们所知,这一具体问题还没有在PDEs的控制理论中系统地得到解决。我们建议为与paric PDEs相关的控制和优化问题制定双重系统。我们还提出了一种与PDEs(PDEs)解决方案相关的控制问题,为PDEs(OD)的理论性控制机制的发展提供基础。最后,为PDEs最佳数字控制的未来研究方案提供一种基础。
Article 201
Title@2025-06-25 (3): Characterization and Mitigation of Training Instabilities in Microscaling Formats
Title: Characterization and Mitigation of Training Instabilities in Microscaling Formats | Charakterisierung und Milderung von Ausbildungsinstabilitäten in Mikroskalierungsformaten | 微缩缩放格式培训不稳定情况的特点和缓解 2506.20752v1 |
Authors (5): Huangyuan Su, Mujin Kwun, Stephanie Gil, Sham Kakade, Nikhil Anand
Training large language models is an expensive, compute-bound process that must be repeated as models scale, algorithms improve, and new data is collected. To address this, next-generation hardware accelerators increasingly support lower-precision arithmetic formats, such as the Microscaling (MX) formats introduced in NVIDIA’s Blackwell architecture. These formats use a shared scale within blocks of parameters to extend representable range and perform forward/backward GEMM operations in reduced precision for efficiency gains. In this work, we investigate the challenges and viability of block-scaled precision formats during model training. Across nearly one thousand language models trained from scratch – spanning compute budgets from $2 \times 10^{17}$ to $4.8 \times 10^{19}$ FLOPs and sweeping over a broad range of weight-activation precision combinations – we consistently observe that training in MX formats exhibits sharp, stochastic instabilities in the loss, particularly at larger compute scales. To explain this phenomenon, we conduct controlled experiments and ablations on a smaller proxy model that exhibits behavior similar to that of the language model, sweeping across architectural settings, hyperparameters, and precision formats. These experiments motivate a simple model in which multiplicative gradient bias introduced by the quantization of layer-norm affine parameters and a small fraction of activations can trigger runaway divergence. Through in situ intervention experiments on our proxy model, we demonstrate that instabilities can be averted or delayed by modifying precision schemes mid-training. Guided by these findings, we evaluate stabilization strategies in the LLM setting and show that certain hybrid configurations recover performance competitive with full-precision training. We release our code at https://github.com/Hither1/systems-scaling.
培训大型语言模型是一个昂贵的、 commote- combed 程序, 且必须作为模型规模、 算法改进 和新数据收集 。 要解决这个问题, 下一代硬件加速器将越来越多地支持低精度算术格式, 如 NVIDIA 黑井架构中引入的微缩缩缩缩缩缩缩缩(MX)格式。 这些格式在参数区块内使用共享比例比例, 以扩大可代表范围, 并进行前向/ 后向 GEMM 操作, 降低效率收益的精确度。 在这项工作中, 我们调查了模型培训中区块块度精确度精确度的精确度 。 在从抓起的近一千种语言模型中, 将预算从2美元/ 10 17美元/美元/美元/ 美元/ 美元/ 至4.8美元/ 时间轴缩略微缩略音( MX) 格式, 并扫荡一系列重量激活精度精度的精度组合。 我们不断观察到, MX 格式的培训显示损失的剧烈、 不稳定性, , 特别是更粗度的缩的缩缩缩缩缩缩缩缩缩缩缩缩缩缩缩缩缩缩。 我们可以通过的缩略的缩缩缩缩略的缩略的缩缩化的缩缩缩缩缩缩缩缩略的缩缩略图。
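The "shared scale within blocks of parameters" idea from the abstract above can be illustrated with a minimal sketch. It is not the paper's code and not a faithful implementation of any hardware format: the block size of 32, the signed 8-bit integer element grid, and the function names `quantize_block`, `dequantize_block`, and `quantize_tensor` are all assumptions chosen for illustration.

```python
import math

BLOCK_SIZE = 32   # assumed block length, for illustration only
ELEM_MAX = 127    # largest magnitude on the assumed signed 8-bit element grid


def quantize_block(values):
    """Quantize one block to integers that share a single power-of-two scale."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 1.0, [0] * len(values)
    # Smallest power of two for which max_abs / scale fits the element range.
    exponent = math.ceil(math.log2(max_abs / ELEM_MAX))
    scale = 2.0 ** exponent
    ints = [max(-ELEM_MAX, min(ELEM_MAX, round(v / scale))) for v in values]
    return scale, ints


def dequantize_block(scale, ints):
    """Map the stored integers back to floats using the block's shared scale."""
    return [scale * q for q in ints]


def quantize_tensor(values, block_size=BLOCK_SIZE):
    """Split a flat list into blocks and quantize each block independently."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


# Two blocks: one with small entries, one dominated by a single outlier.
data = [0.01 * i for i in range(32)] + [100.0] + [0.01] * 31
blocks = quantize_tensor(data)
restored = [x for scale, ints in blocks for x in dequantize_block(scale, ints)]
worst_error = max(abs(a - b) for a, b in zip(data, restored))
print([scale for scale, _ in blocks], worst_error)
```

In the second block the outlier forces a coarse scale, so the small entries all round to zero. That is one simple way to see why a per-block scale extends range at the cost of precision for the remaining entries in the block.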
Article 202
Title@2025-06-25 (3): Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers
Title: Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers | Mehrere Ströme der Beziehungsextraktion: Anreicherung und Erinnerung an Transformer | 关系采掘的多种流流:变形器中的丰富和回顾 2506.20746v1 |
Authors (4): Todd Nief, David Reber, Sean Richardson, Ari Holtzman
When an LLM learns a relation during finetuning (e.g., new movie releases, corporate mergers, etc.), where does this information go? Is it extracted when the model processes an entity, recalled just-in-time before a prediction, or are there multiple separate heuristics? Existing localization approaches (e.g. activation patching) are ill-suited for this analysis because they tend to replace parts of the residual stream, potentially deleting information. To fill this gap, we propose dynamic weight-grafting between fine-tuned and pre-trained language models to show that fine-tuned language models both (1) extract relation information learned during finetuning while processing entities and (2) "recall" this information in later layers while generating predictions. In some cases, models need both of these pathways to correctly generate finetuned information while, in other cases, a single "enrichment" or "recall" pathway alone is sufficient. We examine the necessity and sufficiency of these information pathways, examining what layers they occur at, how much redundancy they exhibit, and which model components are involved – finding that the "recall" pathway occurs via both task-specific attention mechanisms and a relation extraction step in the output of the attention and the feedforward networks at the final layers before next token prediction.
当LLM在微调(例如,新电影发行、公司合并等)过程中学习某种关系时,LLM在微调(例如,新电影发行、公司合并等)中学习了这种关系,这种信息会流向何处?当模型处理一个实体时,在预测前及时召回时,是抽取这种信息吗?现有的本地化方法(例如,激活补丁)不适合进行这种分析,因为它们往往取代剩余流的一部分,有可能删除信息。为了填补这一空白,我们提议在微调和预先训练的语言模式之间动态加权调整,以显示微调的语言模式:(1) 提取在微调过程中在实体处理时学到的关系信息,(2) 在后层中“再次点拨”这种信息,同时产生预测。在某些情况下,这些模式需要两种途径正确生成微调信息,而在其他情况下,单一种“增资”或“再调”路径就足够了。我们研究了这些信息路径的必要性和充分性,检查它们所显示的层次,它们显示的冗余程度,以及涉及哪些模式组成部分 – 发现在具体任务机制和最后产出层次上,在预测前的层次上注意前的“再次注意之前,即开始注意”。
Article 203
Title@2025-06-25 (3): A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools
Title: A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools | Eine KI-Umfrage für die Materialwissenschaft: Gründungsmodelle, LLM-Agenten, Datensätze und Tools | 材料科学学会调查:基础模型、LLM代理、数据集和工具 2506.20743v1 |
Authors (4): Minh-Hao Van, Prateek Verma, Chen Zhao, Xintao Wu
Foundation models (FMs) are catalyzing a transformative shift in materials science (MatSci) by enabling scalable, general-purpose, and multimodal AI systems for scientific discovery. Unlike traditional machine learning models, which are typically narrow in scope and require task-specific engineering, FMs offer cross-domain generalization and exhibit emergent capabilities. Their versatility is especially well-suited to materials science, where research challenges span diverse data types and scales. This survey provides a comprehensive overview of foundation models, agentic systems, datasets, and computational tools supporting this growing field. We introduce a task-driven taxonomy encompassing six broad application areas: data extraction, interpretation and Q&A; atomistic simulation; property prediction; materials structure, design and discovery; process planning, discovery, and optimization; and multiscale modeling. We discuss recent advances in both unimodal and multimodal FMs, as well as emerging large language model (LLM) agents. Furthermore, we review standardized datasets, open-source tools, and autonomous experimental platforms that collectively fuel the development and integration of FMs into research workflows. We assess the early successes of foundation models and identify persistent limitations, including challenges in generalizability, interpretability, data imbalance, safety concerns, and limited multimodal fusion. Finally, we articulate future research directions centered on scalable pretraining, continual learning, data governance, and trustworthiness.
基础模型(FMs)正在推动材料科学(MatSci)的变革性转变,使材料科学(MatSci)能够实现可扩展的、通用的和多式的独立科学发现系统。传统机器学习模型与传统机器学习模型不同,传统机器学习模型的范围一般狭窄,需要特定任务的工程;调频提供跨部的通用和展示能力。其多功能性特别适合于材料科学,研究挑战涉及不同的数据类型和规模。这一调查全面概述了支持这一日益扩大的领域的基础模型、代理系统、数据集和计算工具。我们引入了任务驱动的分类学,包括六个广泛的应用领域:数据提取、解释和A;不完全模拟;财产预测;材料结构、设计和发现;流程规划、发现和优化;以及多规模模型。我们讨论了在单式和多式调频调频调频调频以及新兴的大型语言模型(LLM)代理方面的最新进展。此外,我们审查了标准化的数据集集、开放源工具以及自主的实验平台,这些都共同推动调频的开发和整合到研究工作流程中。我们最后评估了数据基础基础基础的早期差距,并理解了基础基础基础基础分析的早期分析、基础和可持续性。
Article 204
Title@2025-06-25 (3): Test-time Scaling Techniques in Theoretical Physics – A Comparison of Methods on the TPBench Dataset
Title: Test-time Scaling Techniques in Theoretical Physics – A Comparison of Methods on the TPBench Dataset | Testzeitskalierungstechniken in der Theoretischen Physik – Ein Vergleich der Methoden am TPBench-Datensatz | 理论物理试验时间增强技术 – – TPBench数据集方法比较 2506.20729v1 |
Authors (8): Zhiqi Gao, Tianyi Li, Yurii Kvasiuk, Sai Chaitanya Tadepalli, Maja Rudolph, Daniel J. H. Chung, Frederic Sala, Moritz Münchmeyer
Large language models (LLMs) have shown strong capabilities in complex reasoning, and test-time scaling techniques can enhance their performance at comparatively low cost. Many of these methods have been developed and evaluated on mathematical reasoning benchmarks such as AIME. This paper investigates whether the lessons learned from these benchmarks generalize to the domain of advanced theoretical physics. We evaluate a range of common test-time scaling methods on the TPBench physics dataset and compare their effectiveness with results on AIME. To better leverage the structure of physics problems, we develop a novel, symbolic weak-verifier framework to improve parallel scaling results. Our empirical results demonstrate that this method significantly outperforms existing test-time scaling approaches on TPBench. We also evaluate our method on AIME, confirming its effectiveness in solving advanced mathematical problems. Our findings highlight the power of step-wise symbolic verification for tackling complex scientific problems.
大型语言模型(LLMS)在复杂的推理方面表现出很强的能力,测试时间的缩放技术可以以相当低的成本提高它们的性能,其中许多方法已经开发,并用数学推理基准(如AIME)进行了评估。本文调查从这些基准中吸取的经验教训是否概括到先进的理论物理学领域。我们评估了TPBench物理数据集的一系列共同测试时间缩放方法,并将其效力与AIME的结果进行比较。为了更好地利用物理问题的结构,我们开发了一个创新的、象征性的弱化验证框架,以改进平行的缩放结果。我们的经验结果表明,这种方法大大超过了TPBench的现有测试时间缩放方法。我们还评估了我们关于AIME的方法,确认其在解决先进的数学问题方面的有效性。我们的调查结果强调了逐步进行象征性核查以解决复杂的科学问题的力量。
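To make the idea of combining parallel sampling with a weak verifier more concrete, here is a minimal sketch of a majority vote restricted to candidates that pass a check. It is not the paper's framework: `verified_majority_vote`, the toy numeric verifier, and the hard-coded candidate answers are hypothetical stand-ins for sampled LLM outputs and a symbolic checker.

```python
from collections import Counter


def verified_majority_vote(candidates, verifier):
    """Return the most common answer among candidates the verifier accepts.

    Falls back to a plain majority vote when the verifier rejects everything,
    since a weak verifier may wrongly reject every sample.
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    accepted = [c for c in candidates if verifier(c)]
    pool = accepted if accepted else candidates
    answer, _count = Counter(pool).most_common(1)[0]
    return answer


def derivative_check(candidate, x=3.0, h=1e-4):
    """Toy verifier: compare a claimed value of d/dx x^2 at x with a finite difference."""
    numeric = ((x + h) ** 2 - (x - h) ** 2) / (2 * h)
    return abs(candidate - numeric) < 1e-3


# Hypothetical sampled answers to "what is the derivative of x^2 at x = 3?"
samples = [6.0, 9.0, 9.0, 9.0, 6.0]
plain = Counter(samples).most_common(1)[0][0]
checked = verified_majority_vote(samples, derivative_check)
print(plain, checked)
```

Here the unfiltered vote settles on the most frequent wrong answer, while the filtered vote recovers the correct one. A check applied to intermediate steps rather than only to final answers would need a richer candidate representation than this sketch uses.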
Article 205
Title@2025-06-25 (3): On Convolutions, Intrinsic Dimension, and Diffusion Models
Title: On Convolutions, Intrinsic Dimension, and Diffusion Models | Über Konvolutionen, Intrinsische Dimension und Diffusionsmodelle | 革命、内在层面和扩散模型 2506.20705v1 |
Authors (3): Kin Kwan Leung, Rasa Hosseinzadeh, Gabriel Loaiza-Ganem
The manifold hypothesis asserts that data of interest in high-dimensional ambient spaces, such as image data, lies on unknown low-dimensional submanifolds. Diffusion models (DMs) – which operate by convolving data with progressively larger amounts of Gaussian noise and then learning to revert this process – have risen to prominence as the most performant generative models, and are known to be able to learn distributions with low-dimensional support. For a given datum in one of these submanifolds, we should thus intuitively expect DMs to have implicitly learned its corresponding local intrinsic dimension (LID), i.e. the dimension of the submanifold it belongs to. Kamkari et al. (2024b) recently showed that this is indeed the case by linking this LID to the rate of change of the log marginal densities of the DM with respect to the amount of added noise, resulting in an LID estimator known as FLIPD. LID estimators such as FLIPD have a plethora of uses: among others, they quantify the complexity of a given datum, and can be used to detect outliers, adversarial examples and AI-generated text. FLIPD achieves state-of-the-art performance at LID estimation, yet its theoretical underpinnings are incomplete since Kamkari et al. (2024b) only proved its correctness under the highly unrealistic assumption of affine submanifolds. In this work we bridge this gap by formally proving the correctness of FLIPD under realistic assumptions. Additionally, we show that an analogous result holds when Gaussian convolutions are replaced with uniform ones, and discuss the relevance of this result.
多方面的假设表明,对高维环境空间感兴趣的数据,例如图像数据,存在于未知的低维子元体上。扩散模型(DMs) – – 其运作方式是将数据与数量逐渐增加的高萨噪音相结合,然后学习恢复这一进程 – – 已经作为最有性能的基因化模型而成为突出的模型,并已知能够以低维支持来学习分布。对于这些子层中的一个特定数据,例如图像数据,我们应该直觉地期待DMs隐含地了解其相应的本地内在层面(LID),即它所属的低维度(DMs)的层面。传播模型(DMDs) – – 将数据包含在逐渐增加的噪音中将数据与DM的边际密度变化速度联系起来,从而导致一个被称为FLIPD的测算器。LIDs等测算器除了其他用途外,还用不切实际的数值来量化给给定的达平面值的复杂度(LID)。 Kamkari 等人等人等人等人(2024b) – – – – 其假设的准确性能在ADLILIFDS下正式地分析结果。
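The link between LID and the rate of change of the log marginal density with respect to added noise can be checked by hand in the affine case that the abstract calls unrealistic. The sketch below is not FLIPD itself, which works with a trained diffusion model. It assumes a standard Gaussian supported on a d-dimensional linear subspace of R^D, for which the noised density at the origin has a closed form, and the names `log_density_at_origin` and `lid_from_slope` are hypothetical.

```python
import math


def log_density_at_origin(sigma, ambient_dim, intrinsic_dim):
    """Log density at the origin after convolving with N(0, sigma^2 I).

    Assumed data: a standard Gaussian on a d-dimensional linear subspace of R^D.
    The noised distribution is Gaussian with variance 1 + sigma^2 along the
    d subspace directions and sigma^2 along the remaining D - d directions.
    """
    d, D = intrinsic_dim, ambient_dim
    return (-0.5 * D * math.log(2.0 * math.pi)
            - 0.5 * d * math.log(1.0 + sigma ** 2)
            - (D - d) * math.log(sigma))


def lid_from_slope(sigma, ambient_dim, intrinsic_dim, h=1e-4):
    """Estimate LID as D plus the slope of the log density in log(sigma).

    intrinsic_dim is only used to build the toy density; the estimate itself
    is read off the central finite difference.
    """
    log_sigma = math.log(sigma)
    upper = log_density_at_origin(math.exp(log_sigma + h), ambient_dim, intrinsic_dim)
    lower = log_density_at_origin(math.exp(log_sigma - h), ambient_dim, intrinsic_dim)
    slope = (upper - lower) / (2.0 * h)
    return ambient_dim + slope


small_noise = lid_from_slope(sigma=1e-3, ambient_dim=10, intrinsic_dim=3)
large_noise = lid_from_slope(sigma=1.0, ambient_dim=10, intrinsic_dim=3)
print(small_noise, large_noise)
```

At small noise the estimate approaches the true dimension of 3. At sigma = 1 the slope also picks up the curvature of the data density itself, so the estimate drifts away from 3, which suggests why the noise level matters for estimators of this kind.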
Article 206
Title@2025-06-25 (3): Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models
Title: Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models | Diffusion Tree Sampling: Skalierbare Inferenz-Zeit-Ausrichtung von Diffusionsmodellen | 扩散树采样:扩散模型的可缩放推推-时间对齐 2506.20701v1 |
Authors (4): Vineet Jain, Kusha Sareen, Mohammad Pedramfar, Siamak Ravanbakhsh
Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. Existing steering methods suffer from inaccurate value estimation, especially at high noise levels, which biases guidance. Moreover, information from past runs is not reused to improve sample quality, resulting in inefficient use of compute. Inspired by the success of Monte Carlo Tree Search, we address these limitations by casting inference-time alignment as a search problem that reuses past computations. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain and iteratively refining value estimates with each additional generation. Our proposed method, Diffusion Tree Sampling (DTS), produces asymptotically exact samples from the target distribution in the limit of infinite rollouts, and its greedy variant, Diffusion Tree Search (DTS$^\star$), performs a global search for high reward samples. On MNIST and CIFAR-10 class-conditional generation, DTS matches the FID of the best-performing baseline with up to $10\times$ less compute. In text-to-image generation and language completion tasks, DTS$^\star$ effectively searches for high reward samples that match best-of-N with up to $5\times$ less compute. By reusing information from previous generations, we get an anytime algorithm that turns additional compute into steadily better samples, providing a scalable approach for inference-time alignment of diffusion models.
在基因模型中,现有的指导方法存在不准确的价值估计,特别是在高噪音水平上,这有偏向性指导。此外,过去运行的信息没有被再利用来提高样本质量,导致计算效率低下。在蒙特卡洛树搜索的成功激励下,我们通过将推断时间调整作为重新利用过去计算结果的搜索问题来解决这些局限性。我们引入了一种基于树的方法,即通过传播链和反复更新对每一代人的价值估计值进行更新,从奖励一致的目标密度中提取样本,通过传播链和反复更新计算值估计值来回授。我们拟议的方法,即“植树采伐”(DTS),从无限滚动限制的目标分布中产生零现精确的样本,导致计算效率差的利用。我们受蒙性的变种,即投放树搜索(DTS$star),进行高奖励样品的全球搜索。在MNISTI和CIFAR-10级标准生成中,DTTTS匹配最佳基准,从每一代的10美元更准确的传播价格,从10美元提高到每一代的快速检索。在文本搜索中有效地提供高额的版本,为5美元。
Article 207
Title@2025-06-25 (3): DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy
Title: DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy | DemoDiffusion: Eine heiße menschliche Imitation mit vortrainierter Diffusionspolitik | 利用预先培训的传播政策进行单向人类模拟 2506.20668v1 |
Authors (3): Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani
We propose DemoDiffusion, a simple and scalable method for enabling robots to perform manipulation tasks in natural environments by imitating a single human demonstration. Our approach is based on two key insights. First, the hand motion in a human demonstration provides a useful prior for the robot’s end-effector trajectory, which we can convert into a rough open-loop robot motion trajectory via kinematic retargeting. Second, while this retargeted motion captures the overall structure of the task, it may not align well with plausible robot actions in-context. To address this, we leverage a pre-trained generalist diffusion policy to modify the trajectory, ensuring it both follows the human motion and remains within the distribution of plausible robot actions. Our approach avoids the need for online reinforcement learning or paired human-robot data, enabling robust adaptation to new tasks and scenes with minimal manual effort. Experiments in both simulation and real-world settings show that DemoDiffusion outperforms both the base policy and the retargeted trajectory, enabling the robot to succeed even on tasks where the pre-trained generalist policy fails entirely. Project page: https://demodiffusion.github.io/
我们提议DemmoDifulation, 这是一种简单且可扩缩的方法, 使机器人能够模仿人类的单一演示, 在自然环境中执行操作任务。 我们的方法基于两个关键的洞察力。 首先, 人类演示中的手动运动为机器人的终端效应轨迹提供了有用的前程, 我们可以通过运动再定向转换成一个粗糙的开放环形机器人运动轨迹。 第二, 虽然这个重定向运动捕捉了任务的总体结构, 但是它可能不与真实的机器人在文字中的动作相匹配。 为了解决这个问题, 我们利用预先训练的通用扩散政策来改变轨迹, 确保它既跟随人类运动, 也保持在合理机器人动作的分布中。 我们的方法避免了在线强化学习或配对人体机器人数据的需求, 从而能够以最小的手工努力对新任务和场景进行有力的适应。 在模拟和现实世界环境中进行的实验显示, DemoDifmulation 都不符合基本政策和再定向轨迹, 使机器人甚至能够成功完成事先经过训练的一般政策失败的任务。 项目页: https://demodifgiftifgiftgast.
Article 208
Title@2025-06-25 (3): Data Quality in Crowdsourcing and Spamming Behavior Detection
Title: Data Quality in Crowdsourcing and Spamming Behavior Detection | Datenqualität bei Crowdsourcing und Spamming Verhaltenserkennung | 众包和垃圾传播行为检测数据质量 2404.17582v2 |
Authors (4): Yang Ba, Michelle V. Mancenido, Erin K. Chiou, Rong Pan
As crowdsourcing emerges as an efficient and cost-effective method for obtaining labels for machine learning datasets, it is important to assess the quality of crowd-provided data, so as to improve analysis performance and reduce biases in subsequent machine learning tasks. Given the lack of ground truth in most cases of crowdsourcing, we refer to data quality as annotators’ consistency and credibility. Unlike the simple scenarios where Kappa coefficient and intraclass correlation coefficient usually can apply, online crowdsourcing requires dealing with more complex situations. We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition, and we classify spammers into three categories based on their different behavioral patterns. A spammer index is proposed to assess entire data consistency, and two metrics are developed to measure crowd workers’ credibility by utilizing the Markov chain and generalized random effects models. Furthermore, we showcase the practicality of our techniques and their advantages by applying them on a face verification task with both simulation and real-world data collected from two crowdsourcing platforms.
由于众包是获取机器学习数据集标签的有效和成本效益高的方法,因此必须评估众包数据的质量,以便改进分析性能,减少随后机器学习任务中的偏差。鉴于大多数众包缺乏地面真相,我们把数据质量称为说明者的一致性和可信度。与通常可以适用卡帕系数和阶级内部相关系数的简单假设不同,在线众包需要处理更复杂的情况。我们采用系统的方法评估数据质量,发现因差异分解产生的垃圾威胁,并根据不同行为模式将垃圾邮件分类为三类。我们建议采用垃圾邮件索引来评估整个数据一致性,并开发了两个衡量人群工人信誉的尺度,即利用马尔科夫链和一般随机效应模型。此外,我们用两个众包平台收集的模拟和真实世界数据来展示我们技术的实用性及其优势。
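For reference, the "simple scenario" the abstract contrasts itself with can be written out in a few lines. The sketch below computes Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It is not the paper's spammer index or its variance-decomposition method, and the example labels are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined, and this sketch reports it as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


careful = [1, 0, 1, 0]
always_yes = [1, 1, 1, 1]        # an annotator who gives the same answer every time
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
print(cohen_kappa(careful, always_yes))
```

The constant-answer annotator agrees with the careful one on half the items yet receives a kappa of 0, because that level of agreement is exactly what chance predicts. Pairwise statistics like this need every pair to label the same items, which is one reason they may not fit crowdsourcing settings with many annotators and sparse overlap.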
Article 209
Title@2025-06-25 (3): Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning
Title: Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning | Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning | 听不见邪恶:在联邦学习中发现恶意服务器的渐变渗漏 2506.20651v1 |
Authors (2): Fei Wang, Baochun Li
Recent work has shown that gradient updates in federated learning (FL) can unintentionally reveal sensitive information about a client’s local data. This risk becomes significantly greater when a malicious server manipulates the global model to provoke information-rich updates from clients. In this paper, we adopt a defender’s perspective to provide the first comprehensive analysis of malicious gradient leakage attacks and the model manipulation techniques that enable them. Our investigation reveals a core trade-off: these attacks cannot be both highly effective in reconstructing private data and sufficiently stealthy to evade detection – especially in realistic FL settings that incorporate common normalization techniques and federated averaging. Building on this insight, we argue that malicious gradient leakage attacks, while theoretically concerning, are inherently limited in practice and often detectable through basic monitoring. As a complementary contribution, we propose a simple, lightweight, and broadly applicable client-side detection mechanism that flags suspicious model updates before local training begins, despite the fact that such detection may not be strictly necessary in realistic FL settings. This mechanism further underscores the feasibility of defending against these attacks with minimal overhead, offering a deployable safeguard for privacy-conscious federated learning systems.
最近的工作表明,联邦学习(FL)中的梯度更新会无意中透露有关客户本地数据的敏感信息。当恶意服务器操纵全球模型以激起客户提供丰富信息的最新信息时,这种风险会大得多。在本文件中,我们从捍卫者的角度出发,首次全面分析恶意梯度渗漏袭击和使这些袭击得以使用的模型操纵技术。我们的调查揭示了一个核心的权衡:这些袭击在重建私人数据时不可能非常有效,也不可能有足够的隐性以躲避探测,特别是在现实的FL环境中,这种环境包含共同的正常化技术和平均联盟化。基于这一洞察,我们认为,恶意梯度渗漏袭击虽然理论上涉及,但在实践上必然有限,而且往往可以通过基本监测探测出来。作为补充,我们建议了一个简单、轻量和广泛适用的客户方检测机制,在地方培训开始前标出可疑的模型更新信号,尽管在现实的FL环境中可能并非绝对必要。这一机制进一步强调了以最低的间接费用来防范这些袭击的可行性,为有隐私意识的进化学习系统提供可部署的保障。
Article 210
Title@2025-06-25 (3): Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer
Title: Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer | Mastering Multiple-Expert Routing: Realisierbare $H$-Konsistenz und starke Garantien für das Lernen zu verteidigen | 掌握多专家课程:可实现的美元-耐力和学习迟缓的有力保障 2506.20650v1 |
Authors (3): Anqi Mao, Mehryar Mohri, Yutao Zhong
The problem of learning to defer with multiple experts consists of optimally assigning input instances to experts, balancing the trade-off between their accuracy and computational cost. This is a critical challenge in natural language generation, but also in other fields such as image processing and medical diagnostics. Recent studies have proposed surrogate loss functions to optimize deferral, but challenges remain in ensuring their consistency properties. This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage (jointly learning predictor and deferral function) and two-stage (learning only the deferral function with a fixed expert) learning scenarios. For single-stage deferral, we introduce a family of new realizable $H$-consistent surrogate losses and further prove $H$-consistency for a selected member. For two-stage deferral, we derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, the multiple-expert scenario. Additionally, we provide enhanced theoretical guarantees under low-noise assumptions for both scenarios. Finally, we report the results of experiments using our proposed surrogate losses, comparing their performance against existing baselines.
与多位专家一起学习推迟的问题包括向专家优化分配投入情况,平衡其准确性和计算成本之间的平衡。这是自然语言生成以及图像处理和医学诊断等其他领域的重大挑战。最近的研究提出了替代损失功能,以优化推迟功能,但在确保其一致性特性方面仍然存在挑战。本文件介绍了新的替代损失功能和高效算法,并提供了强有力的理论学习保证。我们讨论了以下问题:在单阶段(联合学习预测和推迟功能)和两阶段(仅用固定专家学习推迟功能)两个阶段(仅用固定专家学习推迟功能)之间实现可实现的一致、美元一致性约束和比值一致。关于单阶段(联合学习预测和推迟功能)和两阶段(仅用固定专家学习情景学习推迟功能)的难题。关于单阶段推迟,我们采用了一套新的可实现的美元一致性的替代损失功能,并进一步证明对选定成员而言,美元具有很强的理论学保证。关于两阶段推迟的建议,我们得出新的替代损失,即实现可实现可实现可实现的美元一致性、美元一致性约束性和延迟性功能,以及两个阶段(根据我们现有基线假设)的、在两种假设下,为改进的核安全假设提供新的假设提供更好的初步结果。
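To make the deferral trade-off concrete, the following is a minimal sketch of the target (non-surrogate) loss that a learning-to-defer system is scored on. It is an illustration only, not the surrogate losses introduced in the paper. The function name `deferral_loss` and the cost model (expert error plus a fixed per-expert compute cost) are assumptions chosen for this example.

```python
def deferral_loss(pred_label, defer_to, true_label, expert_preds, expert_costs):
    """Loss of one routing decision on one example (hypothetical cost model).

    defer_to == 0 : the learned predictor answers; loss is its 0/1 error.
    defer_to == j : expert j (1-based) answers; loss is that expert's 0/1
                    error plus an assumed fixed compute cost expert_costs[j-1].
    """
    if defer_to == 0:
        return float(pred_label != true_label)
    j = defer_to - 1
    return float(expert_preds[j] != true_label) + expert_costs[j]


def best_routing(pred_label, true_label, expert_preds, expert_costs):
    """Routing decision with the smallest loss, given hindsight knowledge of the label."""
    options = range(len(expert_preds) + 1)
    return min(
        options,
        key=lambda d: deferral_loss(pred_label, d, true_label, expert_preds, expert_costs),
    )
```

The surrogate losses studied in the abstract exist because this target loss is discontinuous in the model parameters and cannot be minimized directly by gradient methods.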
Article 211
Title@2025-06-25 (3): Disentangled representations of microscopy images
Title: Disentangled representations of microscopy images | Entwirrte Darstellungen von Mikroskopiebildern | 显微镜图像的分解表达式 2506.20649v1 |
Authors (4): Jacopo Dapueto, Vito Paolo Pastore, Nicoletta Noceti, Francesca Odone
Microscopy image analysis is fundamental for different applications, from diagnosis to synthetic engineering and environmental monitoring. Modern acquisition systems have granted the possibility to acquire an escalating amount of images, requiring a consequent development of a large collection of deep learning-based automatic image analysis methods. Although deep neural networks have demonstrated great performance in this field, interpretability, an essential requirement for microscopy image analysis, remains an open challenge. This work proposes a Disentangled Representation Learning (DRL) methodology to enhance model interpretability for microscopy image classification. Exploiting benchmark datasets from three different microscopic image domains (plankton, yeast vacuoles, and human cells), we show how a DRL framework, based on transferring a representation learnt from synthetic data, can provide a good trade-off between accuracy and interpretability in this domain.
从诊断到合成工程和环境监测,显微镜图像分析是不同应用的基础,从诊断到合成工程和环境监测。现代获取系统使得有可能获得数量不断上升的图像,因此需要随后开发大量基于深层次学习的自动图像分析方法。虽然深神经网络在这一领域表现出了巨大的表现,但显微镜图像分析的基本要求——可解释性仍然是一项公开的挑战。这项工作提出了一种分解的代表学习方法,以加强显微镜图像分类的模型解释性。我们利用了三个不同的微镜领域(浮游生物、酵母真空和人类细胞)的基准数据集,我们展示了基于从合成数据中学习的表述的DRL框架如何在该领域的准确性和可解释性之间实现良好的平衡。
Article 212
Title@2025-06-25 (3): Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices
Title: Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices | Effizientes Federated Learning mit verschlüsselter Datenfreigabe für daten-heterogene Edge-Geräte | 数据异异异边设备加密数据共享加密数据高效联邦学习 2506.20644v1 |
Authors (6): Hangyu Li, Hongyue Wu, Guodong Fan, Zhen Zhang, Shizhan Chen, Zhiyong Feng
As privacy protection gains increasing importance, more models are being trained on edge devices and subsequently merged into the central server through Federated Learning (FL). However, current research overlooks the impact of network topology, physical distance, and data heterogeneity on edge devices, leading to issues such as increased latency and degraded model performance. To address these issues, we propose a new federated learning scheme for edge devices called Federated Learning with Encrypted Data Sharing (FedEDS). FedEDS uses the client model and the model’s stochastic layer to train the data encryptor. The data encryptor generates encrypted data and shares it with other clients. The client uses the corresponding client’s stochastic layer and encrypted data to train and adjust the local model. FedEDS uses the client’s local private data and encrypted shared data from other clients to train the model. This approach accelerates the convergence of federated learning training and mitigates the negative impact of data heterogeneity, making it suitable for application services deployed on edge devices requiring rapid convergence. Experimental results show the efficacy of FedEDS in improving model performance.
随着隐私保护越来越重要,正在对更多的模型进行边缘设备培训,然后通过联邦学习联合会(FL)将其并入中央服务器。然而,目前的研究忽略了网络地形学、物理距离和数据在边缘设备上的异质性的影响,从而导致诸如增加潜伏度和降低模型性能等问题。为了解决这些问题,我们提议在边缘设备上建立一个称为“以加密数据共享方式进行联邦学习联合会(FededEDS)”的新的联合学习计划。FedEDS使用客户模型和模型的随机层来培训数据加密。数据加密器生成加密数据,并与其他客户共享。客户使用相应的客户的随机层和加密数据来培训和调整本地模型。FedEDS使用客户的本地私人数据和来自其他客户的加密共享数据来培训模型。这种方法加快了联邦学习培训的趋同速度,减轻了数据异性性的负面影响,使其适合用于需要快速趋同的边缘设备的应用服务。实验结果显示FedEDS在推动性能模型方面的功效。
Article 213
Title@2025-06-25 (3): Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data
Title: Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data | Ausbalancieren der Skalen: Ein theoretischer und algorithmischer Rahmen für das Lernen aus unausgewogenen Daten | 平衡尺度:从不平衡数据中学习的理论和算法框架 2502.10381v2 |
Authors (4): Corinna Cortes, Anqi Mao, Mehryar Mohri, Yutao Zhong
Class imbalance remains a major challenge in machine learning, especially in multi-class problems with long-tailed distributions. Existing methods, such as data resampling, cost-sensitive techniques, and logistic loss modifications, though popular and often effective, lack solid theoretical foundations. As an example, we demonstrate that cost-sensitive methods are not Bayes-consistent. This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification. We then propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees based on empirical loss and a new notion of class-sensitive Rademacher complexity. Leveraging these theoretical results, we devise novel and general learning algorithms, IMMAX (Imbalanced Margin Maximization), which incorporate confidence margins and are applicable to various hypothesis sets. While our focus is theoretical, we also present extensive empirical results demonstrating the effectiveness of our algorithms compared to existing baselines.
分类不平衡仍然是机器学习的一大挑战,特别是在长期销售的多级问题中。现有的方法,如数据抽取、成本敏感技术和后勤损失修改,尽管流行而且往往有效,但缺乏坚实的理论基础。举例来说,我们证明成本敏感方法不是贝耶斯一致的方法。本文为分析不平衡分类的概括性引入了一个新的理论框架。我们然后为二进制和多级环境提出了一个新的阶级平衡差值损失功能,证明了其强劲的美元一致性,并根据经验损失和对阶级敏感的雷德马赫复杂程度的新概念获得了相应的学习保障。利用这些理论成果,我们设计了新颖和一般的学习算法,IMMAX(Im平衡的玛金最大化),其中包括信任幅度,适用于各种假设。我们的重点虽然是理论性,但我们也提出了广泛的实验结果,表明我们算法相对于现有基线的有效性。
Article 214
Title@2025-06-25 (3): Towards Community-Driven Agents for Machine Learning Engineering
Title: Towards Community-Driven Agents for Machine Learning Engineering | Auf dem Weg zu gemeinschaftsgetriebenen Agenten für Maschinenbau | 争取社区驱动机械学习工程代理 2506.20640v1 |
Authors (5): Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang
Large language model-based machine learning (ML) agents have shown great promise in automating ML research. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent’s ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a novel agent that excels at exchanging insights and developing novel solutions within a community context. CoMind achieves state-of-the-art performance on MLE-Live and outperforms 79.2% of human competitors on average across four ongoing Kaggle competitions. Our code is released at https://github.com/comind-ml/CoMind.
大型语言模型的机器学习代理在ML研究自动化方面表现出巨大的希望,然而,现有代理通常在某一研究问题上孤立地运作,而没有与更广泛的研究界接触,人类研究人员往往通过分享知识获得洞察力和贡献。为了缩小这一差距,我们引入了MLE-Live,这是一个现场评价框架,旨在评估代理商与模拟Kaggle研究界的集体知识进行交流和利用这些知识的能力。我们在此框架的基础上,提议CoMind,这是一个在社区范围内交流见解和开发新解决方案的新型代理商。CoMind在MLE-Live和超模版四个正在进行的Kagle竞争中平均达到79.2%的人类竞争者最先进的表现。我们的代码在https://github.com/comind-ml/CoMind上发布。
Article 215
Title@2025-06-25 (3): First-order methods for stochastic and finite-sum convex optimization with deterministic constraints
Title: First-order methods for stochastic and finite-sum convex optimization with deterministic constraints | Verfahren erster Ordnung zur stochastischen und finite-sum-konvexen Optimierung mit deterministischen Zwängen | 具有确定性限制的随机和有限总消费的优化第一阶方法 2506.20630v1 |
Authors (2): Zhaosong Lu, Yifeng Xiao
In this paper, we study a class of stochastic and finite-sum convex optimization problems with deterministic constraints. Existing methods typically aim to find an $\epsilon$-expectedly feasible stochastic optimal solution, in which the expected constraint violation and expected optimality gap are both within a prescribed tolerance $\epsilon$. However, in many practical applications, constraints must be nearly satisfied with certainty, rendering such solutions potentially unsuitable due to the risk of substantial violations. To address this issue, we propose stochastic first-order methods for finding an $\epsilon$-surely feasible stochastic optimal ($\epsilon$-SFSO) solution, where the constraint violation is deterministically bounded by $\epsilon$ and the expected optimality gap is at most $\epsilon$. Our methods apply an accelerated stochastic gradient (ASG) scheme or a modified variance-reduced ASG scheme only once to a sequence of quadratic penalty subproblems with appropriately chosen penalty parameters. We establish first-order oracle complexity bounds for the proposed methods in computing an $\epsilon$-SFSO solution. As a byproduct, we also derive first-order oracle complexity results for the sample average approximation method in computing an $\epsilon$-SFSO solution of the stochastic optimization problem, using our proposed methods to solve the sample average problem.
本文研究一类带确定性约束的随机与有限和凸优化问题。现有方法通常旨在寻找 $\epsilon$-期望可行随机最优解,即期望约束违反量和期望最优性间隙均不超过给定容差 $\epsilon$。然而,在许多实际应用中,约束必须以确定的方式近似满足,由于存在严重违反约束的风险,这类解可能并不适用。为解决这一问题,我们提出随机一阶方法,用于寻找 $\epsilon$-必然可行随机最优($\epsilon$-SFSO)解,其中约束违反量被 $\epsilon$ 确定性地界定,期望最优性间隙至多为 $\epsilon$。我们的方法对一系列罚参数选取适当的二次罚子问题仅应用一次加速随机梯度(ASG)格式或改进的方差缩减 ASG 格式。我们为所提方法计算 $\epsilon$-SFSO 解建立了一阶 oracle 复杂度界。作为副产品,我们还利用所提方法求解样本平均问题,得到了样本平均近似方法计算随机优化问题 $\epsilon$-SFSO 解的一阶 oracle 复杂度结果。
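The two solution notions contrasted in this abstract can be written side by side. The block below is a sketch under stated assumptions: the abstract does not give the problem in symbols, so the constraint form $c(x) \le 0$, the feasible set $X$, the norm, and the penalty parameters $\rho_k$ are illustrative choices rather than the paper's exact notation.

```latex
% Assumed notation: objective f(x) = E[F(x, xi)], deterministic constraints c(x) <= 0,
% [.]_+ the componentwise positive part, f^* the optimal value, x the (random) output.
\begin{align*}
  \text{Problem:}\quad
    & \min_{x \in X}\; f(x) := \mathbb{E}_{\xi}\bigl[F(x,\xi)\bigr]
      \quad \text{s.t.}\quad c(x) \le 0, \\
  \text{expectedly feasible:}\quad
    & \mathbb{E}\bigl[\|[c(x)]_+\|\bigr] \le \epsilon,
      \qquad \mathbb{E}\bigl[f(x)\bigr] - f^* \le \epsilon, \\
  \text{surely feasible ($\epsilon$-SFSO):}\quad
    & \|[c(x)]_+\| \le \epsilon \ \text{with certainty},
      \qquad \mathbb{E}\bigl[f(x)\bigr] - f^* \le \epsilon, \\
  \text{quadratic penalty subproblem } k:\quad
    & \min_{x \in X}\; f(x) + \frac{\rho_k}{2}\,\bigl\|[c(x)]_+\bigr\|^2 .
\end{align*}
```

The only difference between the two notions is whether the constraint bound holds in expectation or for every realization of the algorithm's randomness.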
Article 216
Title@2025-06-25 (3): PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
Title: PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models | PLoP: Präzise LoRA-Platzierung für effiziente Feinsteuerung großer Modelle | PLP: 高效微调大型模型的精确LORA定位 2506.20629v1 |
Authors (3): Soufiane Hayou, Nikhil Ghosh, Bin Yu
Low-Rank Adaptation (LoRA) is a widely used finetuning method for large models. Its small memory footprint allows practitioners to adapt large models to specific tasks at a fraction of the cost of full finetuning. Different modifications have been proposed to enhance its efficiency by, for example, setting the learning rate, the rank, and the initialization. Another axis of improvement is the adapter placement strategy: when using LoRA, practitioners usually pick the module types to adapt, such as the Query and Key modules. Few works have studied the problem of adapter placement, and their results are inconclusive: the original LoRA paper suggested placing adapters in attention modules, while other works suggested placing them in the MLP modules. Through an intuitive theoretical analysis, we introduce PLoP (Precise LoRA Placement), a lightweight method that automatically identifies the module types where LoRA adapters should be placed, given a pretrained model and a finetuning task. Through comprehensive experiments on supervised finetuning and reinforcement learning for reasoning, we demonstrate that PLoP consistently outperforms, or in the worst case is competitive with, commonly used placement strategies.
低兰克适应(LORA)是大型模型广泛使用的微调方法,其小记忆足迹使从业人员能够以完全微调成本的一小部分使大型模型适应具体任务。提出了不同的修改建议,以提高其效率,例如,通过设定学习率、等级和初始化。另一个改进轴是适配安置战略:当使用LORA时,从业人员通常选择模块类型与LORA(如Query和Key模块)相适应。很少有作品研究适应者安置问题,且没有得出结论性结果:原始LORA文件建议将适配者放在模块中,而其他作品则建议将其置于MLP模块中。通过直观的理论分析,我们引入了PLP(Precise LoRA Placis),这是一种轻量的方法,允许自动识别模块类型,在LORA适应者应放置的位置,我们先经过培训的模型和微调任务。我们证明,PLOP始终优异,在最坏的情况下,通过监督性微调和强化学习的全面实验,共同使用定位战略。
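As a reminder of what "adapter placement" decides, here is a minimal NumPy sketch of a LoRA-adapted linear layer and of attaching adapters only to selected module types. It does not implement PLoP's selection criterion, which the abstract does not specify. The names `LoRALinear` and `place_adapters`, the module-type labels, and the default rank and scaling are all assumptions for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, rng):
        self.weight = weight                       # frozen, shape (d_out, d_in)
        d_out, d_in = weight.shape
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))           # trainable, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, d_in); the low-rank path adds scale * x A^T B^T.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size


def place_adapters(weights, chosen_types, rank=4, alpha=8.0, seed=0):
    """Wrap only the module types in chosen_types; all other modules stay frozen."""
    rng = np.random.default_rng(seed)
    return {
        name: LoRALinear(w, rank, alpha, rng)
        for name, w in weights.items()
        if name in chosen_types
    }
```

Because `B` starts at zero, an adapted layer initially reproduces the pretrained layer exactly, so the placement choice only affects which modules can change during finetuning and how many parameters are trained.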
Article 217
Title@2025-06-25 (3): On Context-Content Uncertainty Principle
Title: On Context-Content Uncertainty Principle | Zu Kontext-Inhalt-Unsicherheitsprinzip | 关于内含内含的不确定性原则 2506.20699v1 |
Authors (1): Xin Li
The Context-Content Uncertainty Principle (CCUP) proposes that inference under uncertainty is governed by an entropy asymmetry between context and content: high-entropy contexts must be interpreted through alignment with low-entropy, structured content. In this paper, we develop a layered computational framework that derives operational principles from this foundational asymmetry. At the base level, CCUP formalizes inference as directional entropy minimization, establishing a variational gradient that favors content-first structuring. Building upon this, we identify four hierarchical layers of operational principles: (L1) Core Inference Constraints, including structure-before-specificity, asymmetric inference flow, cycle-consistent bootstrapping, and conditional compression, all shown to be mutually reducible; (L2) Resource Allocation Principles, such as precision-weighted attention, asymmetric learning rates, and attractor-based memory encoding; (L3) Temporal Bootstrapping Dynamics, which organize learning over time via structure-guided curricula; and (L4) Spatial Hierarchical Composition, which integrates these mechanisms into self-organizing cycles of memory, inference, and planning. We present formal equivalence theorems, a dependency lattice among principles, and computational simulations demonstrating the efficiency gains of CCUP-aligned inference. This work provides a unified theoretical foundation for understanding how brains and machines minimize uncertainty through recursive structure-specificity alignment. The brain is not just an inference machine. It is a cycle-consistent entropy gradient resolver, aligning structure and specificity via path-dependent, content-seeded simulation.
语境-内容不确定性原理(CCUP)提出,不确定性下的推理受语境与内容之间的熵不对称支配:高熵的语境必须通过与低熵、结构化的内容对齐来加以解释。本文建立了一个分层计算框架,从这一基础性不对称中推导出操作原则。在基础层面,CCUP 将推理形式化为有方向的熵最小化,确立了一种有利于内容优先结构化的变分梯度。在此基础上,我们确定了四个层级的操作原则:(L1)核心推理约束,包括结构先于特异性、非对称推理流、循环一致自举和条件压缩,并证明它们可以相互归约;(L2)资源分配原则,如精度加权注意力、非对称学习率和基于吸引子的记忆编码;(L3)时间自举动力学,通过结构引导的课程在时间上组织学习;(L4)空间层级组合,将上述机制整合为记忆、推理与规划的自组织循环。我们给出了形式化的等价定理、各原则之间的依赖格,以及展示 CCUP 对齐推理效率增益的计算模拟。本工作为理解大脑和机器如何通过递归的结构-特异性对齐来最小化不确定性提供了统一的理论基础。大脑不只是一台推理机器,它是一个循环一致的熵梯度求解器,通过路径依赖、以内容为种子的模拟来对齐结构与特异性。
Article 218
Title@2025-06-25 (3): Probing Quantum Spin Systems with Kolmogorov-Arnold Neural Network Quantum States
Title: Probing Quantum Spin Systems with Kolmogorov-Arnold Neural Network Quantum States | Probing Quantum Spin Systems mit Kolmogorov-Arnold Neural Network Quantum States | 利用Kolmogorov-Arnold神经网络量子态探测量子自旋系统 2506.01891v3 |
Authors (5): Mahmud Ashraf Shamim, Eric A F Reinhardt, Talal Ahmed Chowdhury, Sergei Gleyzer, Paulo T Araujo
Neural Quantum States (NQS) are a class of variational wave functions parametrized by neural networks (NNs) to study quantum many-body systems. In this work, we propose SineKAN, an NQS ansatz based on Kolmogorov-Arnold Networks (KANs), to represent quantum mechanical wave functions as nested univariate functions. We show that the SineKAN wavefunction with learnable sinusoidal activation functions can capture the ground state energies, fidelities and various correlation functions of the one-dimensional Transverse-Field Ising model, Anisotropic Heisenberg model, and Antiferromagnetic $J_{1}-J_{2}$ model with different chain lengths. In our study of the $J_1-J_2$ model with $L=100$ sites, we find that the SineKAN model outperforms several previously explored neural quantum state ansätze, including Restricted Boltzmann Machines (RBMs), Long Short-Term Memory models (LSTMs), and Multi-layer Perceptrons (MLPs), a.k.a. Feed Forward Neural Networks, when compared to the results obtained from the Density Matrix Renormalization Group (DMRG) algorithm. We find that SineKAN models can be trained to high precision and accuracy with minimal computational costs.
神经量子( NQS) 是一个由神经网络( NNS) 研究量子体系统的变波函数的类别。 在这项工作中, 我们提议基于 Kolmogorov-Arnold 网络( KANS) 的 NQS\ textit{ ansatz} , 代表量子机械波函数, 作为嵌入的单体功能 。 我们显示,\ textt{SineKAN} 具有可学习的正弦值激活功能的波函数, 可以捕捉到一个维度反向字段的模型、 Anisotropic Heisenberg 模型和 不同链长的抗腐蚀磁波模型的地面状态能量、真实性和各种相关功能。 在我们对$L=100美元的模型的研究中, 我们发现,\ textt{SineKAN} 模型可以超越先前探索的内基量值( Restrical) 和 Restrical Restrial( Restal- Restrial) 模型。
Article 219
Title@2025-06-25 (3): Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning
Title: Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning | Lost in Retraining: Roaming des Parameterraums exponentieller Familien unter geschlossenem Loop-Lernen | 损失在再培训中:在闭路学习下,在封闭式学习下,对有生命力的家庭的参数空间进行Roaming 2506.20623v1 |
Authors (3): Fariba Jangjoo, Matteo Marsili, Yasser Roudi
Closed-loop learning is the process of repeatedly estimating a model from data generated from the model itself. It is receiving great attention due to the possibility that large neural network models may, in the future, be primarily trained with data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motion that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows sufficient statistics with the martingale property and that as a result the process converges to absorbing states that amplify initial biases present in the data. However, we show that this outcome may be prevented by polluting the data with an infinitesimal fraction of data points generated from a fixed model, by relying on maximum a posteriori estimation or by introducing regularisation. Furthermore, we show that the asymptotic behavior of the dynamics is not reparametrisation invariant.
闭路学习是一个从模型本身产生的数据中反复估计模型的过程,由于大型神经网络模型今后可能主要接受人工神经网络本身产生的数据培训,因此受到极大关注。我们研究属于指数型家庭的模型的这个过程,从中得出支配参数动态的动作方程式。我们显示,对参数的最大可能性的估计足以提供与马丁格尔属性有关的充分统计数据,因此,这一过程会与吸收扩大数据中存在的初始偏差的国家汇合在一起。然而,我们表明,如果用固定模型产生的极小的数据点污染数据,依靠最大程度的事后估计或通过引入常规化,则可能无法防止这一结果。此外,我们表明,动态的无症状行为并不是在逆差中进行重新校正。
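The martingale and absorbing-state behaviour described in this abstract can be seen in the simplest exponential family, a Bernoulli model. The sketch below is an illustration chosen for this digest, not the paper's experiment. The function name `closed_loop_bernoulli` and the sample sizes are assumptions.

```python
import numpy as np


def closed_loop_bernoulli(p0, n_samples, n_generations, seed=0):
    """Repeatedly refit a Bernoulli parameter by maximum likelihood on its own samples."""
    rng = np.random.default_rng(seed)
    p = p0
    trajectory = [p]
    for _ in range(n_generations):
        data = rng.binomial(1, p, size=n_samples)  # data generated by the current model
        p = data.mean()                            # MLE of the Bernoulli mean
        trajectory.append(p)
    return np.array(trajectory)


# Example run: a small sample size makes the drift toward 0 or 1 easy to observe.
trajectory = closed_loop_bernoulli(p0=0.5, n_samples=20, n_generations=500)
```

Each refit satisfies E[p_next | p] = p, so the estimate is a martingale, while p = 0 and p = 1 are absorbing because a model at either value can only regenerate constant data.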
Article 220
Title@2025-06-25 (3): Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Title: Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models | Recycling the Web: Eine Methode zur Verbesserung der Vorschulung von Daten Qualität und Menge für Sprachmodelle | 网上再循环:提高语文模式培训前数据质量和数量的方法 2506.04689v2 |
Authors (7): Thao Nguyen, Yang Li, Olga Golovneva, Luke Zettlemoyer, Sewoong Oh, Ludwig Schmidt, Xian Li
Scaling laws predict that the performance of large language models improves with increasing model size and data size. In practice, pre-training has been relying on massive web crawls, using almost all data sources publicly available on the internet so far. However, this pool of natural data does not grow at the same rate as the compute supply. Furthermore, the availability of high-quality texts is even more limited: data filtering pipelines often remove up to 99% of the initial web scrapes to achieve state-of-the-art. To address the “data wall” of pre-training scaling, our work explores ways to transform and recycle data discarded in existing filtering processes. We propose REWIRE, REcycling the Web with guIded REwrite, a method to enrich low-quality documents so that they could become useful for training. This in turn allows us to increase the representation of synthetic data in the final pre-training set. Experiments at 1B, 3B and 7B scales of the DCLM benchmark show that mixing high-quality raw texts and our rewritten texts lead to 1.0, 1.3 and 2.5 percentage points improvement respectively across 22 diverse tasks, compared to training on only filtered web data. Training on the raw-synthetic data mix is also more effective than having access to 2x web data. Through further analysis, we demonstrate that about 82% of the mixed in texts come from transforming lower-quality documents that would otherwise be discarded. REWIRE also outperforms related approaches of generating synthetic data, including Wikipedia-style paraphrasing, question-answer synthesizing and knowledge extraction. These results suggest that recycling web texts holds the potential for being a simple and effective approach for scaling pre-training data.
缩放法律预测,随着模型规模和数据规模的增加,大型语言模型的性能会随着模型规模和数据规模的扩大而提高。在实践中,培训前一直依赖大规模网络爬行,使用互联网上迄今公开的几乎所有数据源。然而,这一自然数据库的生长速度与计算供应量的增长速度不同。此外,高质量文本的提供甚至更加有限:数据过滤管道往往会从初始网络废料中消除多达99%的绩效,以达到最新水平。为了解决培训前规模的“数据墙”问题,我们的工作一直在探索如何改造和再循环现有过滤过程中丢弃的数据。我们提议REWIRE, 将网络循环使用虚拟版的Rewret, 以这一方法来丰富低质量文件,这样就可以对培训有用。 数据过滤前的1B、 3B和 7B 测试显示,高质量的原始文本和我们重新编译文的文本将最终转化为1.0、 1.3 和 2.5 变异性指标将分别在22个不同的任务中进行, 培训数据将显示我们精选的精选前数据。
Article 221
Title@2025-06-25 (3): Do Concept Bottleneck Models Respect Localities?
Title: Do Concept Bottleneck Models Respect Localities? | Respektieren Konzept-Hengpässe-Modelle die Lokalitäten? | ” 瓶颈模式 “ 概念是否尊重地方? 2401.01259v5 |
Authors (4): Naveen Raman, Mateo Espinosa Zarlenga, Juyeon Heo, Mateja Jamnik
Concept-based explainability methods use human-understandable intermediaries to produce explanations for machine learning models. These methods assume concept predictions can help understand a model’s internal reasoning. In this work, we assess the degree to which such an assumption is true by analyzing whether concept predictors leverage “relevant” features to make predictions, a property we call locality. Concept-based models that fail to respect localities also fail to be explainable because concept predictions are based on spurious features, making the interpretation of the concept predictions vacuous. To assess whether concept-based models respect localities, we construct three metrics that characterize when models respect localities, complementing our analysis with theoretical results. Each of our metrics captures a different notion of perturbation and assesses whether perturbing “irrelevant” features impacts the predictions made by a concept predictor. We find that many concept-based models used in practice fail to respect localities because concept predictors cannot always clearly distinguish distinct concepts. Based on these findings, we propose suggestions for alleviating this issue.
基于概念的解释方法使用人类可理解的中间人来解释机器学习模型。这些方法假定概念预测有助于理解模型的内部推理。在这项工作中,我们通过分析概念预测人是否利用“相关”特征作出预测来评估这种假设的真实程度,我们称之为“地点”。基于概念的模型不尊重地点,也未能解释,因为概念预测是基于虚假特征,使概念预测的解释变得空洞。为了评估基于概念的模型是否尊重地点,我们建造和使用三个衡量尺度来描述模型尊重地点的特点,用理论结果来补充我们的分析。我们的每一项衡量尺度都捕捉了一种不同的扰动性概念,并评估“相关”特征是否影响概念预测人所作的预测。我们发现,在实践中使用的许多基于概念的模型无法尊重地点,因为概念预测人总是无法明确区分不同的概念。根据这些发现,我们提出了减轻这一问题的建议。
Article 222
Title@2025-06-25 (3): From $\mathcal{O}(n^{2})$ to $\mathcal{O}(n)$ Parameters: Quantum Self-Attention in Vision Transformers for Biomedical Image Classification
Title: From $\mathcal{O}(n^{2})$ to $\mathcal{O}(n)$ Parameters: Quantum Self-Attention in Vision Transformers for Biomedical Image Classification | Von $\mathcal{O}(n^{2})$ bis $\mathcal{O}(n)$ Parameter: Quanten Selbstaufmerksamkeit in Visionstransformatoren für die biomedizinische Bildklassifikation | 从$\mathcal{O}(n^{2})$到$\mathcal{O}(n)$参数:生物医学图像分类视觉变换器中的量子自注意力 2503.07294v2 |
Authors (3): Thomas Boucher, John Whittle, Evangelos B. Mazomenos
We demonstrate that quantum vision transformers (QViTs), vision transformers (ViTs) with self-attention (SA) mechanisms replaced by quantum self-attention (QSA) mechanisms, can match state-of-the-art (SOTA) biomedical image classifiers while using 99.99% fewer parameters. QSAs are produced by replacing linear SA layers with parameterised quantum neural networks (QNNs), producing a QSA mechanism and reducing parameter scaling from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$. On RetinaMNIST, our ultra parameter-efficient QViT outperforms 13/14 SOTA methods including CNNs and ViTs, achieving 56.5% accuracy, just 0.88% below the top MedMamba model while using 99.99% fewer parameters (1K vs 14.5M) and 89% fewer GFLOPs. We present the first investigation of knowledge distillation (KD) from classical to quantum vision transformers in biomedical image classification, showing that QViTs maintain comparable performance to classical ViTs across eight diverse datasets spanning multiple modalities, with improved QSA parameter-efficiency. Our higher-qubit architecture benefitted more from KD pre-training, suggesting a scaling relationship between QSA parameters and KD effectiveness. These findings establish QSA as a practical architectural choice toward parameter-efficient biomedical image analysis.
我们证明,量子视觉变换器(QViT),即用量子自注意力(QSA)机制替换自注意力(SA)机制的视觉变换器(ViT),能够在参数减少99.99%的情况下达到最先进(SOTA)生物医学图像分类器的水平。QSA 通过用参数化量子神经网络(QNN)替换线性 SA 层得到,使参数规模从 $\mathcal{O}(n^2)$ 降至 $\mathcal{O}(n)$。在 RetinaMNIST 上,我们的超高参数效率 QViT 优于14种 SOTA 方法中的13种(包括 CNN 和 ViT),准确率达到56.5%,仅比最佳的 MedMamba 模型低0.88%,同时参数减少99.99%(1K 对 14.5M),GFLOPs 减少89%。我们首次研究了生物医学图像分类中从经典视觉变换器到量子视觉变换器的知识蒸馏(KD),结果表明,在涵盖多种模态的八个不同数据集上,QViT 保持了与经典 ViT 相当的性能,并提高了 QSA 的参数效率。我们的高量子比特架构从 KD 预训练中获益更多,这表明 QSA 参数与 KD 效果之间存在规模化关系。这些发现确立了 QSA 作为实现参数高效的生物医学图像分析的一种实用架构选择。
Article 223
Title@2025-06-25 (3): H-FEX: A Symbolic Learning Method for Hamiltonian Systems
Title: H-FEX: A Symbolic Learning Method for Hamiltonian Systems | H-FEX: Eine symbolische Lernmethode für Hamilton-Systeme | H-FEX:汉密尔顿系统符号学习方法 2506.20607v1 |
Authors (3): Jasen Lai, Senwei Liang, Chunmei Wang
Hamiltonian systems describe a broad class of dynamical systems governed by Hamiltonian functions, which encode the total energy and dictate the evolution of the system. Data-driven approaches, such as symbolic regression and neural network-based methods, provide a means to learn the governing equations of dynamical systems directly from observational data of Hamiltonian systems. However, these methods often struggle to accurately capture complex Hamiltonian functions while preserving energy conservation. To overcome this limitation, we propose the Finite Expression Method for learning Hamiltonian Systems (H-FEX), a symbolic learning method that introduces novel interaction nodes designed to capture intricate interaction terms effectively. Our experiments, including those on highly stiff dynamical systems, demonstrate that H-FEX can recover Hamiltonian functions of complex systems that accurately capture system dynamics and preserve energy over long time horizons. These findings highlight the potential of H-FEX as a powerful framework for discovering closed-form expressions of complex dynamical systems.
汉密尔顿系统描述由汉密尔顿功能管理的广泛一类动态系统,该功能对总能量进行编码,对该系统的演化有支配力。数据驱动方法,例如象征性回归和神经网络法,提供了直接从汉密尔顿系统的观测数据中学习动态系统治理方程式的手段。然而,这些方法往往难以准确捕捉复杂的汉密尔顿功能,同时保护节能。为了克服这一限制,我们提议了学习汉密尔顿系统(H-FEX)的“Finite表达法 ” , 这是一种象征性学习方法,它引入了旨在有效捕捉复杂互动术语的新的互动节点。我们的实验,包括高度僵硬的动态系统实验,表明H-FEX能够恢复汉密尔顿系统功能,准确捕捉到系统动态并保存长时的能量。这些研究结果凸显了H-FEX作为发现复杂动态系统的闭式表达的强大框架的潜力。
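For readers less familiar with the setting, the sketch below shows how a Hamiltonian function dictates dynamics and why long-horizon energy preservation is a natural test. It integrates Hamilton's equations for a separable Hamiltonian with the standard leapfrog scheme. It does not implement H-FEX or its interaction nodes. The harmonic-oscillator Hamiltonian, the step size, and the function names are assumptions for illustration.

```python
def leapfrog(q, p, grad_V, dt, n_steps):
    """Integrate dq/dt = dH/dp = p and dp/dt = -dH/dq = -V'(q)
    for a separable Hamiltonian H(q, p) = p**2 / 2 + V(q)."""
    for _ in range(n_steps):
        p = p - 0.5 * dt * grad_V(q)   # half kick
        q = q + dt * p                 # full drift
        p = p - 0.5 * dt * grad_V(q)   # half kick
    return q, p


def oscillator_energy(q, p):
    """Hamiltonian of the unit harmonic oscillator, V(q) = q**2 / 2."""
    return 0.5 * p**2 + 0.5 * q**2


# Start at q = 1, p = 0 (energy 0.5) and integrate over a long horizon.
q_end, p_end = leapfrog(1.0, 0.0, grad_V=lambda q: q, dt=0.01, n_steps=10_000)
energy_drift = abs(oscillator_energy(q_end, p_end) - 0.5)
```

A learned Hamiltonian can be checked the same way: roll it out with a symplectic integrator and monitor how far the energy drifts from its initial value.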
Article 224
Title@2025-06-25 (3): LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization
Title: LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization | LT-PINN: Lagrangian Topologie-bewusste physik-informierte Neuronales Netzwerk für boundary-focused Engineering Optimization | LT-PINN:Lagrangian 地形 – – 具有意识的地形 – – 物理意识 – – 以边界为重点的工程优化知情神经网络 2506.06300v3 |
Authors (5): Yuanye Zhou, Zhaokun Wang, Kai Zhou, Hui Tang, Xiaofan Li
Physics-informed neural networks (PINNs) have emerged as a powerful meshless tool for topology optimization, capable of simultaneously determining optimal topologies and physical solutions. However, conventional PINNs rely on density-based topology descriptions, which necessitate manual interpolation and limit their applicability to complex geometries. To address this, we propose Lagrangian topology-conscious PINNs (LT-PINNs), a novel framework for boundary-focused engineering optimization. By parameterizing the control variables of topology boundary curves as learnable parameters, LT-PINNs eliminate the need for manual interpolation and enable precise boundary determination. We further introduce a specialized boundary condition loss function and a topology loss function to ensure sharp and accurate boundary representations, even for intricate topologies. The accuracy and robustness of LT-PINNs are validated via two types of partial differential equations (PDEs): the elastic equation with Dirichlet boundary conditions and Laplace’s equation with Neumann boundary conditions. Furthermore, we demonstrate the effectiveness of LT-PINNs on more complex time-dependent and time-independent flow problems without relying on measurement data, and showcase their engineering application potential in flow velocity rearrangement, transforming a uniform upstream velocity into a sine-shaped downstream profile. The results demonstrate that (1) LT-PINNs achieve substantial reductions in relative L2 errors compared with state-of-the-art density topology-oriented PINNs (DT-PINNs), (2) LT-PINNs can handle arbitrary boundary conditions, making them suitable for a wide range of PDEs, and (3) LT-PINNs can infer clear topology boundaries without manual interpolation, especially for complex topologies.
物理知情神经网络(PINNs)已成为一个强大的表面优化的全网工具,能够同时确定最佳地形和物理解决方案。然而,传统的PINNs依赖基于密度的地形描述,这就需要人工的内插,并限制其适用于复杂的地形。为此,我们提议Lagrangian地形意识型神经网络(LT-PINNs),这是一个以边界为重点的工程优化新框架。通过将地形边界曲线的控制变量参数作为可学习参数来参数,LT-PINNS消除了人工内插的需要,并能够精确地确定边界。我们进一步引入专门的边界条件损失功能和地形损失功能,以确保精确和准确的边界表述,即使是复杂的地形特征。LT-PINNS的准确性和稳健性通过两种局部差异方程式(PDEs)来验证,其中包括与Drichlet边界条件的弹性方程式和Laplace公司与Nemann边界条件的等方程式。此外,我们展示了LPNPNNPNPN的实效效力,在更复杂的时间依赖性和时间和时间上的边界界限内精确度上下限。我们进一步展示其深度流,不依赖的内流,而不依赖的内流的内流的内流的内测测测测测测测测测测测测测测测测测程。
Article 225
Title@2025-06-25 (3): FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation
Title: FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation | FluoroSAM: Ein sprachförderndes Foundation-Modell für flexible Röntgenbild-Segmentierung | FluoroSAM:用于灵活X射线图像分割的语言可提示基础模型 2403.08059v3 |
Authors (8): Benjamin D. Killeen, Liam J. Wang, Blanca Inigo, Han Zhang, Mehran Armand, Russell H. Taylor, Greg Osgood, Mathias Unberath
Language promptable X-ray image segmentation would enable greater flexibility for human-in-the-loop workflows in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving problems within a narrow scope, but expanding to broader use requires additional data, annotations, and training time. Recently, language-aligned foundation models (LFMs) – machine learning models trained on large amounts of highly variable image and text data thus enabling broad applicability – have emerged as promising tools for automated image analysis. Existing foundation models for medical image analysis focus on scenarios and modalities where large, richly annotated datasets are available. However, the X-ray imaging modality features highly variable image appearance and applications, from diagnostic chest X-rays to interventional fluoroscopy, with varying availability of data. To pave the way toward an LFM for comprehensive and language-aligned analysis of arbitrary medical X-ray images, we introduce FluoroSAM, a language-promptable variant of the Segment Anything Model, trained from scratch on 3M synthetic X-ray images from a wide variety of human anatomies, imaging geometries, and viewing angles. These include pseudo-ground truth masks for 128 organ types and 464 tools with associated text descriptions. FluoroSAM is capable of segmenting myriad anatomical structures and tools based on natural language prompts, thanks to the novel incorporation of vector quantization (VQ) of text embeddings in the training process. We demonstrate FluoroSAM’s performance quantitatively on real X-ray images and showcase on several applications how FluoroSAM is a key enabler for rich human-machine interaction in the X-ray image acquisition and analysis context. Code is available at https://github.com/arcadelab/fluorosam.
语言可提示的X射线图像分割将为诊断和介入精准医学中的人在回路工作流程带来更大的灵活性。先前的工作提供了能够在狭窄范围内解决问题的任务特定模型,但要扩展到更广泛的用途,则需要额外的数据、标注和训练时间。最近,语言对齐基础模型(LFM),即在大量高度多样的图像和文本数据上训练、因而具有广泛适用性的机器学习模型,已成为自动图像分析的有前景的工具。现有的医学图像分析基础模型侧重于拥有大型、标注丰富的数据集的场景和模态。然而,X射线成像模态的图像外观和应用差异很大,从诊断性胸部X光片到介入性透视,数据的可获得性也各不相同。为了给面向任意医学X射线图像的全面、语言对齐分析的LFM铺平道路,我们提出了FluoroSAM,它是Segment Anything Model的语言可提示变体,在300万张合成X射线图像上从头训练,这些图像涵盖多种人体解剖结构、成像几何和观察角度,并包含128种器官类型和464种工具的伪真值掩码及相应的文本描述。得益于在训练过程中新颖地引入文本嵌入的向量量化(VQ),FluoroSAM能够根据自然语言提示分割众多解剖结构和工具。我们在真实X射线图像上定量展示了FluoroSAM的性能,并通过若干应用展示了FluoroSAM如何成为X射线图像采集与分析中丰富人机交互的关键推动因素。代码见 https://github.com/arcadelab/fluorosam。
Article 226
Title@2025-06-25 (3): On the Role of Context in Reading Time Prediction
Title: On the Role of Context in Reading Time Prediction | Zur Rolle des Kontexts bei der Lesezeitvorhersage | 关于在阅读时间预测方面背景作用的 2409.08160v4 |
Authors (4): Andreas Opedal, Eleanor Chodroff, Ryan Cotterell, Ethan Gotlieb Wilcox
We present a new perspective on how readers integrate context during real-time language comprehension. Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit (e.g., a word) is an affine function of its in-context information content. We first observe that surprisal is only one out of many potential ways that a contextual predictor can be derived from a language model. Another one is the pointwise mutual information (PMI) between a unit and its context, which turns out to yield the same predictive power as surprisal when controlling for unigram frequency. Moreover, both PMI and surprisal are correlated with frequency. This means that neither PMI nor surprisal contains information about context alone. In response to this, we propose a technique where we project surprisal onto the orthogonal complement of frequency, yielding a new contextual predictor that is uncorrelated with frequency. Our experiments show that the proportion of variance in reading times explained by context is a lot smaller when context is represented by the orthogonalized predictor. From an interpretability standpoint, this indicates that previous studies may have overstated the role that context has in predicting reading times.
我们对读者如何在实时语言理解中整合背景提出了一个新的观点。 我们的提案基于超正理论, 假设语言单位( 例如一个单词)的处理努力与其内置信息内容的功能是一对等的。 我们首先发现超普里萨只是从多种潜在方法中的一种, 来源预测器可以从语言模型中产生。 另一个是单位与其上下文之间点向的相互信息( PMI), 最终在控制单格频率时产生与超正比力相同的预测力。 此外, PMI 和 suprisal 都与频率相关。 这意味着PMI 和 suprisal 都不包含仅包含上下文信息。 对此, 我们提出一种技术, 我们投影到频率的正反向补补补码上, 产生与频率不相联的新的背景预测器。 我们的实验显示, 当背景在控制单格频率时, 所解释的时间差异的比例要小得多。 此外, PMI 和suprisal 都与频率相关。 这意味着 PMI 和 suprisal 都无法仅包含上下文信息。 对此做出解释性预测, 。 在解释性分析时, 解释性观点中, 显示, 之前的研究可能读取了这个作用。
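The projection described in the abstract above can be sketched with a one-variable regression: the residual of surprisal after regressing on frequency is, by construction, uncorrelated with frequency. This is a minimal illustration, not the authors' code. The toy numbers, the choice of log unigram probability as the frequency measure, and all function names are assumptions made for the example.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def residualize(target, control):
    """Residuals of an OLS fit of `target` on `control` (with intercept)."""
    t_bar, c_bar = mean(target), mean(control)
    cov = sum((t - t_bar) * (c - c_bar) for t, c in zip(target, control))
    var = sum((c - c_bar) ** 2 for c in control)
    slope = cov / var
    return [(t - t_bar) - slope * (c - c_bar) for t, c in zip(target, control)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy values, one entry per word.
log_freq = [-3.2, -7.5, -5.1, -9.0, -4.4, -6.3]   # log p(word)
surprisal = [2.9, 8.1, 6.0, 7.2, 5.5, 4.8]        # -log p(word | context)
# PMI(word; context) = log p(word | context) - log p(word)
pmi = [-s - lf for s, lf in zip(surprisal, log_freq)]

# Project each predictor onto the orthogonal complement of frequency.
ortho_surprisal = residualize(surprisal, log_freq)
ortho_pmi = residualize(pmi, log_freq)
```

Because PMI is surprisal negated plus a linear term in log frequency, the two residual vectors differ only in sign in this setup, which is one way to see why the two predictors carry the same information once frequency is controlled for.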
Article 227
Title@2025-06-25 (3): The kernel of graph indices for vector search
Title: The kernel of graph indices for vector search | Der Kernel der Graphenindizes für die Vektorsuche | 用于矢量搜索的图表索引核心 2506.20584v1 |
Authors (2): Mariano Tepper, Ted Willke
The most popular graph indices for vector search use principles from computational geometry to build the graph. Hence, their formal graph navigability guarantees are only valid in Euclidean space. In this work, we show that machine learning can be used to build graph indices for vector search in metric and non-metric vector spaces (e.g., for inner product similarity). From this novel perspective, we introduce the Support Vector Graph (SVG), a new type of graph index that leverages kernel methods to establish the graph connectivity and that comes with formal navigability guarantees valid in metric and non-metric vector spaces. In addition, we interpret the most popular graph indices, including HNSW and DiskANN, as particular specializations of SVG and show that new indices can be derived from the principles behind this specialization. Finally, we propose SVG-L0 that incorporates an $\ell_0$ sparsity constraint into the SVG kernel method to build graphs with a bounded out-degree. This yields a principled way of implementing this practical requirement, in contrast to the traditional heuristic of simply truncating the out edges of each node. Additionally, we show that SVG-L0 has a self-tuning property that avoids the heuristic of using a set of candidates to find the out-edges of each node and that keeps its computational complexity in check.
用于矢量搜索的最受欢迎的图表指数使用计算几何原则来构建图形。 因此, 它们的正式图形导航保证只在欧几里德空间有效。 在这项工作中, 我们显示, 机器学习可以用来在度量和非度量矢量空间( 例如, 内部产品相似性) 建立矢量搜索的图表指数。 我们从这个新的角度, 引入支持矢量图( SVG) , 这是一种新型的图表指数, 利用内心图方法来建立图形连接, 并伴随正式导航保证度和非度量矢量空间的有效。 此外, 我们将最受欢迎的图表指数, 包括 HNSWW 和 DiskANN , 解释为SVG 的特定专业, 并显示新的指数可以从此专业化空间( 如内产品相似性) 的原则中衍生出。 最后, 我们建议 SVG- L0 限制 方法将 $\ 00 美元 的缩略度限制纳入 SVG 内核方法, 来构建有约束的图表。 这产生一种执行这一实际要求的有原则性的方法, , 与简单的超度值值相比, 将SNS- devricdeal 显示每次的自我定位的自我定位, 的自我调整, 的自我定位, 将显示, 的自我定位的自我定位的自我定位的自我定位, 。
Article 228
Title@2025-06-25 (3): Rethinking Early Stopping: Refine, Then Calibrate
Title: Rethinking Early Stopping: Refine, Then Calibrate | Frühes Aufhören neu denken: Verfeinern, dann kalibrieren | 重新思考早期停止: 校正, 然后校准 2501.19195v2 |
Authors (4): Eugène Berta, David Holzmüller, Michael I. Jordan, Francis Bach
Machine learning classifiers often produce probabilistic predictions that are critical for accurate and interpretable decision-making in various domains. The quality of these predictions is generally evaluated with proper losses, such as cross-entropy, which decompose into two components: calibration error assesses general under/overconfidence, while refinement error measures the ability to distinguish different classes. In this paper, we present a novel variational formulation of the calibration-refinement decomposition that sheds new light on post-hoc calibration, and enables rapid estimation of the different terms. Equipped with this new perspective, we provide theoretical and empirical evidence that calibration and refinement errors are not minimized simultaneously during training. Selecting the best epoch based on validation loss thus leads to a compromise point that is suboptimal for both terms. To address this, we propose minimizing refinement error only during training (Refine,…), before minimizing calibration error post hoc, using standard techniques (…then Calibrate). Our method integrates seamlessly with any classifier and consistently improves performance across diverse classification tasks.
机器学习分类器往往产生对各个领域的准确和可解释决策至关重要的概率预测。这些预测的质量通常以适当的损失来评价,例如交叉孔虫,这种质量分解成两个部分:校准错误评估一般的低/超信任度,而完善错误则衡量区分不同类别的能力。在本文中,我们提出了一个校准-再精密分解的新的变式公式,为后热校准提供了新的光亮,并使得能够快速估计不同的术语。在这一新的角度下,我们提供了理论和经验证据,证明校准和精炼错误不会在培训期间同时最小化。根据验证损失选择最佳的切入点,从而导致一个折衷点,对于这两个术语来说都是次优的。为了解决这个问题,我们建议只在培训期间(Refine,…),在使用标准技术(…nry Calbrate)最大限度地减少校准后误差之前,在尽量减少校准误差前,我们的方法与任何分类和各种分类任务之间不断改进的性能融合。
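A minimal sketch of the "refine, then calibrate" idea follows. It makes two assumptions that the abstract does not state: temperature scaling stands in for the "standard techniques", and the validation loss after temperature scaling is used as a proxy for refinement error. The toy logits and all function names are hypothetical.

```python
import math

def nll(logits, labels, temperature=1.0):
    """Mean cross-entropy of temperature-scaled logits."""
    total = 0.0
    for row, y in zip(logits, labels):
        scaled = [z / temperature for z in row]
        m = max(scaled)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)

def fit_temperature(logits, labels, lo=0.05, hi=20.0, steps=200):
    """Log-spaced grid search for the temperature that minimizes NLL."""
    best_t, best_loss = 1.0, nll(logits, labels, 1.0)
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        loss = nll(logits, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss

def select_epoch(val_logits_per_epoch, labels):
    """Pick the epoch whose loss AFTER calibration is lowest (refinement proxy)."""
    scores = [fit_temperature(logits, labels)[1] for logits in val_logits_per_epoch]
    return min(range(len(scores)), key=scores.__getitem__)

def toy_logits(labels, margin, n_wrong):
    """Binary logits with a fixed margin; the last `n_wrong` rows favor the wrong class."""
    rows = []
    for i, y in enumerate(labels):
        predicted = y if i < len(labels) - n_wrong else 1 - y
        row = [0.0, 0.0]
        row[predicted] = margin
        rows.append(row)
    return rows

labels = [0, 1, 0, 1, 0, 1, 0, 1]
# Epoch 0: modest confidence, 6/8 correct. Epoch 1: overconfident, 7/8 correct.
epochs = [toy_logits(labels, 0.5, 2), toy_logits(labels, 12.0, 1)]

raw_choice = min(range(len(epochs)), key=lambda e: nll(epochs[e], labels))
refine_choice = select_epoch(epochs, labels)
final_temperature, _ = fit_temperature(epochs[refine_choice], labels)
```

In this toy setup the raw validation loss prefers the earlier epoch because the later one is overconfident, while the calibrated loss prefers the later, more discriminative epoch, whose overconfidence is then corrected post hoc by `final_temperature`.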
Article 229
Title@2025-06-25 (3): Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling
Title: Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling | Entsperren des In-Context-Lernens für natürliche Datensätze jenseits der Sprachmodellierung | 解锁超出语言建模之外的自然数据集的文中学习 2501.06256v2 |
Authors (8): Jelena Bratulić, Sudhanshu Mittal, David T. Hoffmann, Samuel Böhm, Robin Tibor Schirrmeister, Tonio Ball, Christian Rupprecht, Thomas Brox
Large Language Models (LLMs) exhibit In-Context Learning (ICL), which enables the model to perform new tasks conditioning only on the examples provided in the context without updating the model’s weights. While ICL offers fast adaptation across natural language tasks and domains, its emergence is less straightforward for modalities beyond text. In this work, we systematically uncover properties present in LLMs that support the emergence of ICL for autoregressive models and various modalities by promoting the learning of the needed mechanisms for ICL. We identify exact token repetitions in the training data sequences as an important factor for ICL. Such repetitions further improve stability and reduce transiency in ICL performance. Moreover, we emphasise the significance of training task difficulty for the emergence of ICL. Finally, by applying our novel insights on ICL emergence, we unlock ICL capabilities for various visual datasets and a more challenging EEG classification task in a few-shot learning regime.
大型语言模型(LLMS)展览 “ 内文学习 “ ,使该模型能够执行新的任务,仅以上下文中提供的例子为条件,而不更新模型的权重。虽然ICL提供各种自然语言任务和领域的快速适应,但其出现对于文本之外的方式来说并不那么简单。在这项工作中,我们系统地发现LLMS中存在的特性,这些特性支持ICL出现自动递减模式和各种模式,促进学习ICL所需的机制。我们把培训数据序列中确切的象征性重复作为ICL的一个重要因素。这种重复进一步提高了ICL的稳定性并减少了其运行的短暂性。此外,我们强调培训任务困难对ICL的出现的重要性。最后,通过运用我们对ICL的新的洞察力,我们为各种视觉数据集解锁了ICL的能力,并在几门学习制度中增加了EG的分类任务。
Article 230
Title@2025-06-25 (3): Causal Representation Learning with Observational Grouping for CXR Classification
Title: Causal Representation Learning with Observational Grouping for CXR Classification | Kausales Repräsentationslernen mit Beobachtungsgruppe für CXR-Klassifikation | 与CXR分类观察组一起进行因果代表性学习 2506.20582v1 |
Authors (3): Rajat Rasal, Avinash Kori, Ben Glocker
Identifiable causal representation learning seeks to uncover the true causal relationships underlying a data generation process. In medical imaging, this presents opportunities to improve the generalisability and robustness of task-specific latent features. This work introduces the concept of grouping observations to learn identifiable representations for disease classification in chest X-rays via an end-to-end framework. Our experiments demonstrate that these causal representations improve generalisability and robustness across multiple classification tasks when grouping is used to enforce invariance w.r.t. race, sex, and imaging views.
在医学成像中,这为改进特定任务潜在特征的普遍性和稳健性提供了机会;这项工作引入了分组观测概念,通过端对端框架,了解胸部X光中疾病分类的可识别表现;我们的实验表明,这些因果表现提高了多种分类任务之间的普遍性和稳健性,因为分组被用于执行种族、性别和成像方面的差异观点。
Article 231
Title@2025-06-25 (3): TabArena: A Living Benchmark for Machine Learning on Tabular Data
Title: TabArena: A Living Benchmark for Machine Learning on Tabular Data | TabArena: Ein lebender Benchmark für maschinelles Lernen auf Tabellendaten | TabArena:用表格数据进行机器学习的活基准 2506.16791v2 |
Authors (7): Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, Frank Hutter
With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are static. Their design is not updated even if flaws are discovered, model versions are updated, or new models are released. To address this, we introduce TabArena, the first continuously maintained living tabular benchmarking system. To launch TabArena, we manually curate a representative collection of datasets and well-implemented models, conduct a large-scale benchmarking study to initialize a public leaderboard, and assemble a team of experienced maintainers. Our results highlight the influence of validation method and ensembling of hyperparameter configurations to benchmark models at their full potential. While gradient-boosted trees are still strong contenders on practical tabular datasets, we observe that deep learning methods have caught up under larger time budgets with ensembling. At the same time, foundation models excel on smaller datasets. Finally, we show that ensembles across models advance the state-of-the-art in tabular machine learning and investigate the contributions of individual models. We launch TabArena with a public leaderboard, reproducible code, and maintenance protocols to create a living benchmark available at https://tabarena.ai.
随着深层次学习和基础模型对表格数据的日益普及,标准化和可靠基准的需求量比以往任何时候要高。然而,目前的基准是静止的。即使发现缺陷,也没有更新设计,也没有更新其设计,即使发现缺陷,更新模型版本,或发布新模型。为了解决这个问题,我们引入了第一个持续维持的表表格基准系统TabArena。为了启动TabArena,我们手工整理了一套具有代表性的数据集和良好执行模式,进行了大规模基准研究,以启动一个公共领导板,并组建了一个有经验的维护者小组。我们的结果突出了验证方法的影响,并组装了超参数配置,以充分衡量模型的潜能。尽管梯度振动的树木仍然是实用的表格数据集的强大竞争者,但我们发现深层学习方法在更大的预算时间里赶上了混杂的集合。与此同时,基础模型在较小的数据集上非常出色。最后,我们展示了各种模型在列表机器学习中推进了状态,并调查了单个模型的贡献。我们用一个公共领导人基准,在塔巴纳建立了一个数据库上创建了一套可复制的代码。
Article 232
Title@2025-06-25 (3): Exploring Graph-Transformer Out-of-Distribution Generalization Abilities
Title: Exploring Graph-Transformer Out-of-Distribution Generalization Abilities | Erforschen von Graph-Transformer-Verallgemeinerungsfähigkeiten im Out-of-Distribution-Bereich | 探索图图向外转移 2506.20575v1 |
Authors (2): Itay Niv, Neta Rabin
Deep learning on graphs has shown remarkable success across numerous applications, including social networks, bio-physics, traffic networks, and recommendation systems. Regardless of their successes, current methods frequently depend on the assumption that training and testing data share the same distribution, a condition rarely met in real-world scenarios. While graph-transformer (GT) backbones have recently outperformed traditional message-passing neural networks (MPNNs) in multiple in-distribution (ID) benchmarks, their effectiveness under distribution shifts remains largely unexplored. In this work, we address the challenge of out-of-distribution (OOD) generalization for graph neural networks, with a special focus on the impact of backbone architecture. We systematically evaluate GT and hybrid backbones in OOD settings and compare them to MPNNs. To do so, we adapt several leading domain generalization (DG) algorithms to work with GTs and assess their performance on a benchmark designed to test a variety of distribution shifts. Our results reveal that GT and hybrid GT-MPNN backbones consistently demonstrate stronger generalization ability compared to MPNNs, even without specialized DG algorithms. Additionally, we propose a novel post-training analysis approach that compares the clustering structure of the entire ID and OOD test datasets, specifically examining domain alignment and class separation. Demonstrating its model-agnostic design, this approach not only provides meaningful insights into GT and MPNN backbones but also shows promise for broader applicability to DG problems beyond graph learning, offering a deeper perspective on generalization abilities that goes beyond standard accuracy metrics. Together, our findings highlight the promise of graph-transformers for robust, real-world graph learning and set a new direction for future research in OOD generalization.
图表上的深层学习显示,在包括社交网络、生物物理、交通网络和建议系统在内的众多应用系统中取得了显著的成功。尽管取得了成功,但目前的方法往往取决于这样的假设:培训和测试数据分布相同,在现实世界情景下,这种情况很少出现。虽然图形转换(GT)主干网最近在多个分布(ID)基准中优于传统的信息传递神经网络(MPNN),但在分配转移中,其效力在很大程度上仍未得到探索。在这项工作中,我们解决了图表神经网络(OOOD)的超传(OOOOD)普及化的挑战,并特别关注骨干结构的影响。我们系统地评价OOOD设置的GT和混合骨干,并将它们与MPNG(O)相比,我们系统地评估GO(GNG)的G(GO)常规分析结果,并具体地将OD(OGO)常规分析结果与OD(ODGLA)常规数据对比。我们提出一个新的通用域域通用的通用数据分析。
Article 233
Title@2025-06-25 (3): Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series
Title: Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series | Benchmarking unüberwachter Strategien zur Erkennung von Anomalien in multivariaten Zeitreihen | 确定多变时间序列中异常探测不受监督战略的基准 2506.20574v1 |
Authors (3): Laura Boggia, Rafael Teixeira de Lima, Bogdan Malaescu
Anomaly detection in multivariate time series is an important problem across various fields such as healthcare, financial services, manufacturing or physics detector monitoring. Accurately identifying when unexpected errors or faults occur is essential, yet challenging, due to the unknown nature of anomalies and the complex interdependencies between time series dimensions. In this paper, we investigate transformer-based approaches for time series anomaly detection, focusing on the recently proposed iTransformer architecture. Our contributions are fourfold: (i) we explore the application of the iTransformer to time series anomaly detection, and analyse the influence of key parameters such as window size, step size, and model dimensions on performance; (ii) we examine methods for extracting anomaly labels from multidimensional anomaly scores and discuss appropriate evaluation metrics for such labels; (iii) we study the impact of anomalous data present during training and assess the effectiveness of alternative loss functions in mitigating their influence; and (iv) we present a comprehensive comparison of several transformer-based models across a diverse set of datasets for time series anomaly detection.
多变时间序列中的异常探测是保健、金融服务、制造或物理探测器监测等不同领域的一个重要问题。精确地确定何时发生意外错误或故障是必要的,但由于异常现象性质不明以及时间序列维度之间错综复杂的相互依存性,因此具有挑战性。在本文件中,我们调查基于变压器的时间序列异常探测方法,重点是最近提议的变异结构。我们的贡献有四个方面:(一) 我们探索对时间序列异常现象探测应用变异技术,并分析窗口大小、步骤大小和模型层面等关键参数对性能的影响;(二) 我们审查从多层面异常现象评分中提取异常标签的方法,并讨论此类标签的适当评价指标;(三) 我们研究培训期间出现的异常数据的影响,并评估其他损失功能在减轻其影响方面的效力。以及(四) 我们全面比较了不同数据集中的若干变压器模型,用于时间序列异常现象检测。
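Two of the knobs named in the abstract above, window size and step size, and the step from multidimensional anomaly scores to labels can be illustrated as follows. This is a generic sketch, not the paper's pipeline: reducing the per-dimension scores with a maximum and applying a fixed threshold are assumptions, as are the toy values and function names.

```python
def sliding_windows(series, window_size, step_size):
    """Cut a multivariate series (list of per-step feature lists) into windows."""
    last_start = len(series) - window_size
    return [series[s:s + window_size] for s in range(0, last_start + 1, step_size)]

def labels_from_scores(scores, threshold):
    """Flag a time step as anomalous if any dimension's score exceeds the threshold."""
    return [1 if max(row) > threshold else 0 for row in scores]

# Hypothetical series: 10 time steps, 2 features per step.
series = [[float(t), float(t % 3)] for t in range(10)]
windows = sliding_windows(series, window_size=4, step_size=2)

# Hypothetical per-dimension anomaly scores for 4 time steps.
scores = [[0.1, 0.2], [0.1, 0.9], [0.3, 0.2], [1.5, 0.1]]
labels = labels_from_scores(scores, threshold=0.8)
```

A smaller step size yields more, heavily overlapping windows, which increases training data at the cost of redundancy. Other reductions (mean, sum, per-dimension thresholds) would give different labels from the same scores.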
Article 234
Title@2025-06-25 (3): LARP: Learner-Agnostic Robust Data Prefiltering
Title: LARP: Learner-Agnostic Robust Data Prefiltering | LARP: Learner-Agnostic Robuste Datenvorfilterung | LARP: 学习者-不可知强力数据预过滤 2506.20573v1 |
Authors (3): Kristian Minchev, Dimitar Iliev Dimitrov, Nikola Konstantinov
The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which many learning procedures are sensitive. Therefore, the question of whether and how public datasets should be prefiltered to facilitate accurate downstream learning arises. On a technical level this requires the construction of principled data prefiltering methods which are learner-agnostic robust, in the sense of provably protecting a set of pre-specified downstream learners from corrupted data. In this work, we formalize the problem of Learner-Agnostic Robust data Prefiltering (LARP), which aims at finding prefiltering procedures that minimize a worst-case loss over a pre-specified set of learners. We first instantiate our framework in the context of scalar mean estimation with Huber estimators under the Huber data contamination model. We provide a hardness result on a specific problem instance and analyze several natural prefiltering procedures. Our theoretical results indicate that performing LARP on a heterogeneous set of learners leads to some loss in model performance compared to the alternative of prefiltering data for each learner/use-case individually. We explore the resulting utility loss and its dependence on the problem parameters via extensive experiments on real-world image and tabular data, observing statistically significant reduction in utility. Finally, we model the trade-off between the utility drop and the cost of repeated (learner-specific) prefiltering within a game-theoretic framework and showcase benefits of LARP for large datasets.
大量公共数据集的广泛可得性是最近成功统计性游戏推断和机器学习方法取得成功的一个关键因素。然而,这些数据集往往包含一些低质量或污染数据,许多学习程序对此敏感。因此,公共数据集是否以及如何进行预过滤以方便准确的下游学习的问题。在技术一级,这要求建立有原则的数据预过滤方法,这些方法在学习者、测量者、稳健程度方面是可靠的,可以明显地保护一组预定的下游学习者不受腐败数据的影响。在这项工作中,我们正式确定了学习者-Agnnoster Robust数据的重复过滤(LARP)问题,目的是找到预过滤程序,将最坏情况的损失降到最低程度,以利准确的下游学习者。在Huber数据污染模型下,我们首先与Huber估计者一起进行升级前期估算。我们提供了一个具体问题实例的精确性结果,并分析了若干自然预检程序。我们的理论结果表明,在每组通用数据反复筛选参数中执行LARP的精确性规则,从而从统计性模型中降低成本。我们通过成本模型,在每一阶段进行实地评估,在数据库中,然后进行实地评估。
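The worst-case objective in the abstract above can be made concrete for scalar mean estimation. The sketch below compares the worst error over a fixed set of learners before and after a single shared prefilter. The median-distance prefilter, the two learners, the toy data, and the known true mean of 0 are all assumptions for illustration, not the procedures analyzed in the paper.

```python
import statistics

def plain_mean(xs):
    return sum(xs) / len(xs)

def huber_mean(xs, delta=1.0, iters=50):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    mu = statistics.median(xs)
    for _ in range(iters):
        # Points within `delta` get full weight; farther points are down-weighted.
        weights = [1.0 if abs(x - mu) <= delta else delta / abs(x - mu) for x in xs]
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return mu

def prefilter(xs, radius):
    """Hypothetical learner-agnostic prefilter: keep points near the median."""
    center = statistics.median(xs)
    return [x for x in xs if abs(x - center) <= radius]

def worst_case_error(xs, learners, true_mean):
    """Largest estimation error over the pre-specified set of learners."""
    return max(abs(learner(xs) - true_mean) for learner in learners)

# Toy contaminated sample: five clean points near 0 and one outlier.
data = [0.1, -0.2, 0.0, 0.3, -0.1, 50.0]
learners = [plain_mean, huber_mean]

before = worst_case_error(data, learners, true_mean=0.0)
after = worst_case_error(prefilter(data, radius=3.0), learners, true_mean=0.0)
```

Here the non-robust learner dominates the worst case before filtering, so one shared prefilter helps every learner at once. The abstract's point is that such a shared filter generally costs some utility compared with filtering separately for each learner.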
Article 235
Title@2025-06-25 (3): Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control
Title: Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control | Verstärktes Lernen steigert die Produktion von Windfarmen durch Ermöglichung der Closed-Loop-Kollaborative Steuerung | 增强学习能力,通过扶持闭路合作控制,增加风农场发电量 2506.20554v1 |
Authors (4): Andrew Mole, Max Weissenbacher, Georgios Rigas, Sylvain Laizet
Traditional wind farm control operates each turbine independently to maximize individual power output. However, coordinated wake steering across the entire farm can substantially increase the combined wind farm energy production. Although dynamic closed-loop control has proven effective in flow control applications, wind farm optimization has relied primarily on static, low-fidelity simulators that ignore critical turbulent flow dynamics. In this work, we present the first reinforcement learning (RL) controller integrated directly with high-fidelity large-eddy simulation (LES), enabling real-time response to atmospheric turbulence through collaborative, dynamic control strategies. Our RL controller achieves a 4.30% increase in wind farm power output compared to baseline operation, nearly doubling the 2.19% gain from static optimal yaw control obtained through Bayesian optimization. These results establish dynamic flow-responsive control as a transformative approach to wind farm optimization, with direct implications for accelerating renewable energy deployment to net-zero targets.
传统风力农场控制系统独立运行每个涡轮机,以最大限度地实现个人电力产出。然而,整个农场的协调后游可大幅提高风力农场联合能源生产量。尽管动态闭环控制在流量控制应用方面证明行之有效,但风力农场优化主要依靠静态、低纤维模拟器,它们忽视了关键的动荡流动动态。在这项工作中,我们展示了第一个强化学习控制器,该控制器直接与高纤维大型模拟(LES)相结合,通过协作、动态控制战略,能够实时应对大气动荡。我们的风力农场控制器与基线操作相比,风力农场的发电量增加了4.3%,从通过贝叶斯优化获得的静态最佳电线虫控制中获得的收益几乎增加了2.19%。这些结果建立了动态流动控制器,作为风力优化的转型方法,直接影响到可再生能源向净零目标的加速部署。
Article 236
Title@2025-06-25 (3): Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks
Title: Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks | Weniger Aufmerksamkeit auf trügerische Artefakte: Robuste Erkennung von komprimierten Deepfakes auf Online-Sozialen Netzwerken | 较少注意欺骗性人工制品:在网上社交网络上大力发现压缩的深层假象 2506.20548v1 |
Authors (8): Manyi Li, Renshuai Tao, Yufan Liu, Chuangchuang Tan, Haotong Qin, Bing Li, Yunchao Wei, Yao Zhao
With the rapid advancement of deep learning, particularly through generative adversarial networks (GANs) and diffusion models (DMs), AI-generated images, or "deepfakes", have become nearly indistinguishable from real ones. These images are widely shared across Online Social Networks (OSNs), raising concerns about their misuse. Existing deepfake detection methods overlook the "block effects" introduced by compression in OSNs, which obscure deepfake artifacts, and primarily focus on raw images, rarely encountered in real-world scenarios. To address these challenges, we propose PLADA (Pay Less Attention to Deceptive Artifacts), a novel framework designed to tackle the lack of paired data and the ineffective use of compressed images. PLADA consists of two core modules: Block Effect Eraser (B2E), which uses a dual-stage attention mechanism to handle block effects, and Open Data Aggregation (ODA), which processes both paired and unpaired data to improve detection. Extensive experiments across 26 datasets demonstrate that PLADA achieves a remarkable balance in deepfake detection, outperforming SoTA methods in detecting deepfakes on OSNs, even with limited paired data and compression. More importantly, this work introduces the "block effect" as a critical factor in deepfake detection, providing a robust solution for open-world scenarios. Our code is available at https://github.com/ManyiLee/PLADA.
随着深层学习的迅速发展,特别是通过基因对抗网络(GANs)和传播模型(DMs)、AI产生的图像或“Eepfakes”的快速进步,这些图像几乎与真实的图像几乎无法区分。这些图像在在线社会网络(OSNs)中被广泛分享,引起对其滥用的担忧。现有的深层假发现方法忽略了OSNs压缩带来的“区块效应 ” 。OSNs的压缩掩盖了深层假手工艺品,主要侧重于原始图像,在现实世界的情景中很少遇到。为了应对这些挑战,我们建议PLADA(少注意欺骗性艺术行为),这是一个新颖的框架,旨在解决配对数据的缺乏和压缩图像的无效使用。PLADADA由两个核心模块组成:Blockeffect Eraser(B2E),它使用双阶段关注机制处理区块效应,以及ODAG(ODADA)处理对立和未对立数据的公开数据,在现实世界的情景中进行广泛的实验,在深层次的检测中甚至递解解解(OSURAL)数据中,在进行更精确的模型中,在检测中可以理解中采用。
Article 237
Title@2025-06-25 (3): Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls
Title: Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls | Kontextuelle Optimierung unter Kovariate Shift: Ein robuster Ansatz durch Intersektion von Wassersteinkugeln | 共变转移下的上下文优化:通过交叉的瓦森斯泰因球 采取强有力的方法 2406.02426v2 |
Authors (3): Tianyu Wang, Ningyuan Chen, Chun Wang
In contextual optimization, a decision-maker leverages contextual information, often referred to as covariates, to better resolve uncertainty and make informed decisions. In this paper, we examine the challenges of contextual decision-making under covariate shift, a phenomenon where the distribution of covariates differs between the training and test environments. Such shifts can lead to inaccurate upstream estimations for test covariates that lie far from the training data, ultimately resulting in suboptimal downstream decisions. To tackle these challenges, we propose a novel approach called Intersection Wasserstein-balls DRO (IW-DRO), which integrates multiple estimation methods into the distributionally robust optimization (DRO) framework. At the core of our approach is an innovative ambiguity set defined as the intersection of two Wasserstein balls, with their centers constructed using appropriate nonparametric and parametric estimators. On the computational side, we reformulate the IW-DRO problem as a tractable convex program and develop an approximate algorithm tailored for large-scale problems to enhance computational efficiency. From a theoretical perspective, we demonstrate that IW-DRO achieves superior performance compared to single Wasserstein-ball DRO models. We further establish performance guarantees by analyzing the coverage of the intersection ambiguity set and the measure concentration of both estimators under the Wasserstein distance. Notably, we derive a finite-sample concentration result for the Nadaraya-Watson kernel estimator under covariate shift. The proposed IW-DRO framework offers practical value for decision-makers operating in uncertain environments affected by covariate shifts.
在环境优化方面,决策者利用背景信息(通常称为共变)来更好地解决不确定性和做出知情决定。在本文件中,我们审视了在共变式变化下背景决策的挑战,共变式的分布在培训和测试环境之间有差异的现象。这种转变可能导致测试变异性的上游估计不准确,而测试变异远离培训数据,最终导致不优化的下游决定。为了应对这些挑战,我们提议了一种新颖的方法,称为交叉瓦塞斯汀球DRO(IW-DRO),将多种估算方法纳入分布式强力优化框架。我们的方法核心是将创新的模糊性设定为两个瓦塞斯坦球的交叉点,其中心使用适当的非参数和参数性估测器构建。在计算方面,我们重新将IW-DRO问题作为一个可拉动的矩形组合程序,并针对大规模框架的提高计算效率,我们从理论角度表明,IW-DRO在分配式更稳性优化优化优化优化优化优化的优化度优化度优化度优化度优化度优化度(DRO) ,我们用Staria-destrain Stalteral-destrain roisal deal developation roduisl lax the lax the lax the roduclex the roduclemental roducal deal deal destral deal destraldaldaldaldaldaldaldaldaldalticaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldaldal),我们在使用 度中,我们通过在持续分析了标准化的计算结果,我们在使用。
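The Nadaraya-Watson kernel estimator mentioned in the abstract above is short enough to write out. The sketch below uses a Gaussian kernel and a scalar covariate, both assumptions made for illustration, and shows how the estimate degenerates for a query far from the training covariates, which is the covariate-shift failure the abstract describes.

```python
import math

def nadaraya_watson(x_query, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with Gaussian weights centered at x_query."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: the query has no usable neighbors.
        return float("nan")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical training data on a line.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]

near = nadaraya_watson(1.0, xs, ys, bandwidth=0.5)    # query inside the training range
far = nadaraya_watson(100.0, xs, ys, bandwidth=0.5)   # query far outside it
```

A purely nonparametric estimate like this one is unreliable away from the data, while a parametric fit can extrapolate but may be misspecified. That trade-off is the stated motivation for centering the two Wasserstein balls on one estimator of each kind.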
Article 238
Title@2025-06-25 (3): Demonstration of effective UCB-based routing in skill-based queues on real-world data
Title: Demonstration of effective UCB-based routing in skill-based queues on real-world data | Demonstration eines effektiven UCB-basierten Routings in kompetenzbasierten Warteschlangen auf realen Daten | 根据真实世界数据,在基于技能的队列中,演示基于UCB的有效路线 2506.20543v1 |
Authors (4): Sanne van Kempen, Jaron Sanders, Fiona Sloothaak, Maarten G. Wolf
This paper is about optimally controlling skill-based queueing systems such as data centers, cloud computing networks, and service systems. By means of a case study using a real-world data set, we investigate the practical implementation of a recently developed reinforcement learning algorithm for optimal customer routing. Our experiments show that the algorithm efficiently learns and adapts to changing environments and outperforms static benchmark policies, indicating its potential for live implementation. We also augment the real-world applicability of this algorithm by introducing a new heuristic routing rule to reduce delays. Moreover, we show that the algorithm can optimize for multiple objectives: next to payoff maximization, secondary objectives such as server load fairness and customer waiting time reduction can be incorporated. Tuning parameters are used for balancing inherent performance trade-offs. Lastly, we investigate the sensitivity to estimation errors and parameter tuning, providing valuable insights for implementing adaptive routing algorithms in complex real-world queueing systems.
本文是关于优化控制以技能为基础的排队系统,如数据中心、云计算网络和服务系统。通过使用真实世界数据集进行案例研究,我们调查了最近开发的优化客户路由的强化学习算法的实际实施情况。我们的实验表明,算法有效地学习和适应不断变化的环境,并优于静态基准政策,表明其实时实施的潜力。我们还通过引入新的超常路线规则来减少延误来增强这一算法的真实世界适用性。此外,我们表明,算法可以优化多种目标:下一目标是支付最大化,第二是服务器装载公平性和客户等待时间缩短等次级目标。使用调试参数来平衡内在的性能权衡取舍。最后,我们研究了估算误差和参数调整的敏感性,为在复杂的真实世界排队系统中实施适应性路程算法提供了宝贵的见解。
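For readers unfamiliar with upper-confidence-bound (UCB) routing, the sketch below shows the textbook UCB1 index applied to a single customer type choosing among servers. It is a generic illustration and not the algorithm studied in the paper: it ignores queue lengths, waiting times, and the secondary objectives the abstract mentions, and the class name, payoffs, and exploration constant are hypothetical.

```python
import math

class UCBRouter:
    """Route each arriving customer to the server with the highest UCB1 index."""

    def __init__(self, n_servers, exploration=2.0):
        self.counts = [0] * n_servers    # customers routed to each server so far
        self.means = [0.0] * n_servers   # running mean payoff per server
        self.exploration = exploration
        self.t = 0                       # total routing decisions made

    def choose(self):
        self.t += 1
        # Try every server once before trusting the index.
        for server, n in enumerate(self.counts):
            if n == 0:
                return server

        def index(s):
            bonus = math.sqrt(self.exploration * math.log(self.t) / self.counts[s])
            return self.means[s] + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, server, payoff):
        self.counts[server] += 1
        # Incremental update of the running mean.
        self.means[server] += (payoff - self.means[server]) / self.counts[server]

# Hypothetical deterministic payoffs: server 1 serves this customer type better.
payoffs = [0.2, 0.8]
router = UCBRouter(n_servers=2)
for _ in range(200):
    chosen = router.choose()
    router.update(chosen, payoffs[chosen])
```

The exploration bonus shrinks as a server accumulates observations, so most traffic ends up on the better server while the weaker one is still sampled occasionally. That occasional sampling is what lets an index policy of this kind adapt when the environment changes.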
Article 239
Title@2025-06-25 (3): Adversarial Reasoning at Jailbreaking Time
Title: Adversarial Reasoning at Jailbreaking Time | Widerspenstige Vernunft in der Zeit des Gefängnisbruchs | 监狱破禁时间的对立理由 2502.01633v2 |
Authors (6): Mahdi Sabbaghi, Paul Kassianik, George Pappas, Yaron Singer, Amin Karbasi, Hamed Hassani
As large language models (LLMs) are becoming more capable and widespread, the study of their failure cases is becoming increasingly important. Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs. We develop an adversarial reasoning approach to automatic jailbreaking that leverages a loss signal to guide the test-time compute, achieving SOTA attack success rates against many aligned LLMs, even those that aim to trade inference-time compute for adversarial robustness. Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
随着大型语言模型(LLMs)的能力和普及程度的提高,对其失败案例的研究正变得越来越重要。最近在标准化、计量和缩放测试时间计算方面取得的进步为优化模型以实现高难度任务业绩提出了新方法。在本文件中,我们将这些进步用于示范破狱任务:从一致的LLMs中引来有害反应。我们开发了自动破狱的对抗推理方法,利用损失信号来指导测试-时间计算,实现SOTA对许多匹配的LMs的成功率,甚至那些旨在用推断-时间计算对抗性强健性的方法。我们的方法引入了一种了解LLM脆弱性的新模式,为开发更强大和可靠的AI系统奠定了基础。
Article 240
Title@2025-06-25 (3): Physics-Informed Machine Learning Regulated by Finite Element Analysis for Simulation Acceleration of Laser Powder Bed Fusion
Title: Physics-Informed Machine Learning Regulated by Finite Element Analysis for Simulation Acceleration of Laser Powder Bed Fusion | Physik-informiertes maschinelles Lernen reguliert durch Finite Element Analyse für Simulation Beschleunigung von Laser-Pulver Bed Fusion | 受激光粉尘床溶化加速模拟加速的有限元素分析规范的物理系统化机械学习 2506.20537v1 |
Authors (3): R. Sharma, M. Raissi, Y. B. Guo
Efficient simulation of Laser Powder Bed Fusion (LPBF) is crucial for process prediction due to the lasting issue of high computation cost using traditional numerical methods such as finite element analysis (FEA). This study presents an efficient modeling framework termed FEA-Regulated Physics-Informed Neural Network (FEA-PINN) to accelerate the thermal field prediction in an LPBF process while maintaining the FEA accuracy. A novel dynamic material updating strategy is developed to capture the dynamic phase change of powder-liquid-solid in the PINN model. The PINN model incorporates temperature-dependent material properties and phase change behavior using the apparent heat capacity method. While the PINN model demonstrates high accuracy with a small amount of training data and enables generalization of new process parameters via transfer learning, it faces the challenge of high computation cost in time-dependent problems due to the residual accumulation. To overcome this issue, the FEA-PINN framework integrates corrective FEA simulations during inference to enforce physical consistency and reduce error drift. A comparative analysis shows that FEA-PINN achieves equivalent accuracy to FEA while significantly reducing computational cost. The framework has been validated using the benchmark FEA data and demonstrated through single-track scanning in LPBF.
由于使用有限要素分析(FEA)等传统数字方法的高计算成本问题,激光粉底浸泡泡素(LPBFF)的有效模拟对于过程预测至关重要。本研究提出了一个高效的模型框架,称为FEA-Regulate物理内化神经网络(FEA-PINNN),目的是在LPBF进程中加速热场预测,同时保持FEA的准确性。正在开发一个新的动态材料更新战略,以捕捉PINN模型中粉末-液体固态的动态阶段变化。PINN模型包含依赖温度的物质特性和使用表面热能能力方法的阶段变化行为。PINN模型显示,使用少量培训数据具有高度准确性,并能通过转移学习将新的过程参数普遍化。但是,由于残余积累,它面临着高计算成本的挑战。为了克服这一问题,FEA-PNNF在推断中结合了纠正性FEA模拟,以实施物理一致性和减少误差流。比较分析表明,FEA-PINNN在大幅降低计算成本时,FEA实现了等同FEA的精确性。框架已经通过LFA-BFRA的单一数据测试得到验证。
Article 241
Title@2025-06-25 (3): WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads
Title: WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads | WattsOnAI: Messen, Analysieren und Visualisieren von Energie und Carbon Footprint von KI-Workloads | WattsOnAI:AI工作量的测量、分析、可视化能源和碳足迹 2506.20535v1 |
Authors (5): Hongzhen Huang, Kunming Zhang, Hanlong Liao, Kui Wu, Guoming Tang
The rapid advancement of AI, particularly large language models (LLMs), has raised significant concerns about the energy use and carbon emissions associated with model training and inference. However, existing tools for measuring and reporting such impacts are often fragmented, lacking systematic metric integration and offering limited support for correlation analysis among them. This paper presents WattsOnAI, a comprehensive software toolkit for the measurement, analysis, and visualization of energy use, power draw, hardware performance, and carbon emissions across AI workloads. By seamlessly integrating with existing AI frameworks, WattsOnAI offers standardized reports and exports fine-grained time-series data to support benchmarking and reproducibility in a lightweight manner. It further enables in-depth correlation analysis between hardware metrics and model performance and thus facilitates bottleneck identification and performance enhancement. By addressing critical limitations in existing tools, WattsOnAI encourages the research community to weigh environmental impact alongside raw performance of AI workloads and advances the shift toward more sustainable “Green AI” practices. The code is available at https://github.com/SusCom-Lab/WattsOnAI.
AI的快速发展,特别是大型语言模型(LLMS),引起了人们对与模型培训和推论有关的能源使用和碳排放的严重关切,然而,衡量和报告这些影响的现有工具往往支离破碎,缺乏系统的标准化整合,对相互关系分析的支持有限。本文介绍了WattsOnAI, 能源使用、电力抽取、硬件性能和碳排放计量、分析和可视化的综合软件工具包。WattsOnAI与现有的AI框架紧密结合,提供标准化报告和出口精细的实时系列数据,以支持基准制定和以轻量度方式再生。它进一步使得硬件计量和模型性能之间的深入分析能够促进瓶颈识别和绩效提高。WattsOnAI通过解决现有工具中的关键局限性,鼓励研究界与AI工作量的原始表现一道权衡环境影响,并推动转向更可持续的“绿色AI”做法。该代码可在 https://github.com/SusCom-Lab/WattsOnAI 上查阅。
Article 242
Title@2025-06-25 (3): Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery
Title: Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery | Globale Konvergenz iterativ umgewichteter Least Squares für robuste Subraum-Recovery | 自动再加权最低空间平面对强力亚空间恢复的全球趋同 2506.20533v1 |
Authors (4): Gilad Lerman, Kang Li, Tyler Maunu, Teng Zhang
Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain poorly understood. This paper establishes that, under deterministic conditions, a variant of IRLS with dynamic smoothing regularization converges linearly to the underlying subspace from any initialization. We extend these guarantees to affine subspace estimation, a setting that lacks prior recovery theory. Additionally, we illustrate the practical benefits of IRLS through an application to low-dimensional neural network training. Our results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.
强大的亚空间估计是许多机器学习和数据分析任务的基础。 循环再加权最低空间( IRLS) 是解决这一问题的优雅、 经验上有效的方法, 但其理论属性仍然不甚为人理解。 本文确定, 在确定性条件下, 动态平稳正规化的IRLS的变种从任何初始化开始就直线地连接到潜在的亚空间。 我们将这些保证扩大到近距离子空间估计, 这个环境缺乏先前的恢复理论。 此外, 我们通过应用低维神经网络培训来说明IRLS的实际效益。 我们的结果为IRLS提供了在强大的亚空间回收中的第一个全球趋同保证, 更广义地说, 为Riemannian 方块上的非Convex IRLS提供了第一个全球趋同保证。
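As a rough illustration of the IRLS idea for robust subspace recovery, the sketch below fits a line through the origin to 2-D points by reweighting each point with the inverse of its distance to the current line. It is a toy under stated assumptions: it uses a fixed smoothing floor `delta` rather than the dynamic smoothing regularization analysed in the paper, and the names `top_direction` and `irls_line` are hypothetical.

```python
import math

def top_direction(points, weights):
    """Leading eigenvector of the weighted 2x2 scatter matrix (closed form)."""
    a = sum(w * x * x for (x, y), w in zip(points, weights))
    b = sum(w * x * y for (x, y), w in zip(points, weights))
    c = sum(w * y * y for (x, y), w in zip(points, weights))
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # principal-axis angle
    return (math.cos(theta), math.sin(theta))

def irls_line(points, delta=1e-6, iters=30):
    """IRLS-style fit of a 1-D subspace; delta is a fixed floor (an assumption)."""
    weights = [1.0] * len(points)  # the first pass is ordinary PCA
    for _ in range(iters):
        ux, uy = top_direction(points, weights)
        # Distance of each point to the line spanned by (ux, uy).
        dists = [abs(-uy * x + ux * y) for x, y in points]
        # Points far from the current line are down-weighted.
        weights = [1.0 / max(d, delta) for d in dists]
    return top_direction(points, weights)

# Inliers lie on the x-axis; three outliers pull plain PCA away from it.
inliers = [(float(t), 0.0) for t in range(-5, 6)]
outliers = [(0.5, 3.0), (-1.0, -4.0), (2.0, 5.0)]
direction = irls_line(inliers + outliers)
```

On this toy data the first (unweighted) pass is tilted by the outliers, and the reweighted passes rotate the estimate back toward the inlier line.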
Article 243
Title@2025-06-25 (3): Attention with Trained Embeddings Provably Selects Important Tokens
Title: Attention with Trained Embeddings Provably Selects Important Tokens | Aufmerksamkeit bei trainierten Einbettungen wählt wahrscheinlich wichtige Token aus | 与经过训练的嵌入器的关注 2505.17282v3 |
Authors (4): Diyuan Wu, Aleksandr Shevchenko, Samet Oymak, Marco Mondelli
Token embeddings play a crucial role in language modeling but, despite this practical relevance, their theoretical understanding remains limited. Our paper addresses the gap by characterizing the structure of embeddings obtained via gradient descent. Specifically, we consider a one-layer softmax attention model with a linear head for binary classification, i.e., $\texttt{Softmax}( p^\top E_X^\top ) E_X v = \frac{ \sum_{i=1}^T \exp(p^\top E_{x_i}) E_{x_i}^\top v}{\sum_{j=1}^T \exp(p^\top E_{x_{j}}) }$, where $E_X = [ E_{x_1} , \dots, E_{x_T} ]^\top$ contains the embeddings of the input sequence, $p$ is the embedding of the $\mathrm{\langle cls \rangle}$ token and $v$ the output vector. First, we show that, already after a single step of gradient training with the logistic loss, the embeddings $E_X$ capture the importance of tokens in the dataset by aligning with the output vector $v$ proportionally to the frequency with which the corresponding tokens appear in the dataset. Then, after training $p$ via gradient flow until convergence, the softmax selects the important tokens in the sentence (i.e., those that are predictive of the label), and the resulting $\mathrm{\langle cls \rangle}$ embedding maximizes the margin for such a selection. Experiments on real-world datasets (IMDB, Yelp) exhibit a phenomenology close to that unveiled by our theory.
Token 嵌入在语言建模中起着关键作用,但尽管具有这种实际意义,对它们的理论理解仍然有限。本文通过刻画经梯度下降得到的嵌入的结构来弥补这一空白。具体而言,我们考虑一个带线性头的单层 softmax 注意力二分类模型,即 $\texttt{Softmax}( p^\top E_X^\top ) E_X v = \frac{ \sum_{i=1}^T \exp(p^\top E_{x_i}) E_{x_i}^\top v}{\sum_{j=1}^T \exp(p^\top E_{x_{j}}) }$,其中 $E_X = [ E_{x_1} , \dots, E_{x_T} ]^\top$ 包含输入序列的嵌入,$p$ 是 $\mathrm{\langle cls \rangle}$ token 的嵌入,$v$ 是输出向量。首先,我们证明,在使用 logistic 损失进行一步梯度训练之后,嵌入 $E_X$ 就已经捕捉到数据集中 token 的重要性:它们与输出向量 $v$ 的对齐程度与相应 token 在数据集中出现的频率成正比。随后,通过梯度流训练 $p$ 直至收敛后,softmax 会选出句子中的重要 token(即对标签具有预测性的 token),所得的 $\mathrm{\langle cls \rangle}$ 嵌入使这种选择的间隔最大化。在真实数据集(IMDB、Yelp)上的实验呈现出与我们理论所揭示的现象相近的表现。
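The one-layer attention model in the abstract can be evaluated directly from its formula. The sketch below computes the softmax weights $\exp(p^\top E_{x_i}) / \sum_j \exp(p^\top E_{x_j})$ and the scalar output for a toy sequence; the function name and the toy vectors are illustrative assumptions, not taken from the paper.

```python
import math

def attention_output(E_X, p, v):
    """Return (output, weights) for Softmax(p^T E_X^T) E_X v.

    E_X: list of T token embeddings (each a list of length d)
    p:   embedding of the <cls> token (length d)
    v:   output vector (length d)
    """
    logits = [sum(pi * ei for pi, ei in zip(p, e)) for e in E_X]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [x / z for x in exps]
    values = [sum(ei * vi for ei, vi in zip(e, v)) for e in E_X]
    output = sum(w * val for w, val in zip(weights, values))
    return output, weights

E_X = [[1.0, 0.0], [0.0, 1.0]]  # two toy token embeddings
v = [1.0, -1.0]
uniform_out, uniform_w = attention_output(E_X, [0.0, 0.0], v)
peaked_out, peaked_w = attention_output(E_X, [10.0, 0.0], v)
```

With `p = 0` the softmax attends uniformly and the two tokens cancel; a `p` aligned with the first embedding makes the softmax select that token, which is the selection behaviour the abstract describes in the limit.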
Article 244
Title@2025-06-25 (3): Variational Learning Finds Flatter Solutions at the Edge of Stability
Title: Variational Learning Finds Flatter Solutions at the Edge of Stability | Variationelles Lernen findet flachere Lösungen am Rande der Stabilität | 稳定边缘的变异学习发现快餐式解决方案 2506.12903v2 |
Authors (8): Avrajit Ghosh, Bai Cong, Rio Yokota, Saiprasad Ravishankar, Rongrong Wang, Molei Tao, Mohammad Emtiyaz Khan, Thomas Möllenhoff
Variational Learning (VL) has recently gained popularity for training deep neural networks and is competitive with standard learning methods. Part of its empirical success can be explained by theories such as PAC-Bayes bounds, minimum description length and marginal likelihood, but there are few tools to unravel the implicit regularization at play. Here, we analyze the implicit regularization of VL through the Edge of Stability (EoS) framework. EoS has previously been used to show that gradient descent can find flat solutions, and we extend this result to VL to show that it can find even flatter solutions. This is obtained by controlling the posterior covariance and the number of Monte Carlo samples from the posterior. These results are derived in a similar fashion to the standard EoS literature for deep learning, by first deriving a result for a quadratic problem and then extending it to deep neural networks. We empirically validate these findings on a wide variety of large networks, such as ResNet and ViT, and find that the theoretical results closely match the empirical ones. Ours is the first work to analyze the EoS dynamics in VL.
在培养深神经网络方面,变异学习(VL)最近越来越受欢迎,并且对标准学习方法具有竞争力。其经验成功的一部分可以用PAC-Bayes 边框、最小描述长度和边际可能性等理论来解释,但很少有工具可以解析游戏中的隐含规范化。在这里,我们通过稳定边缘框架分析了VL的隐含正规化。Eos过去曾用来表明梯度下降可以找到平坦的解决方案,我们把这一结果扩大到VL,以表明它能够找到更受人青睐的解决方案。这是通过控制后方的后方变量和Monte Carlo样本数量获得的。这些结果与Eos标准文献用于深层学习的类似,首先得出了二次曲线问题的结果,然后将其扩展到深神经网络。我们从经验上验证了诸如ResNet和VIT等各种大型网络的这些研究结果,以便发现理论结果与实验结果非常接近。我们的第一个工作是分析VL的EOS动态。
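The quadratic starting point of Edge of Stability analyses can be seen in a few lines. The sketch below shows only the plain gradient descent baseline, where the iterates on $f(x) = \tfrac{1}{2} h x^2$ contract when the learning rate is below $2/h$ and blow up above it; it does not reproduce the paper's VL result, and the function name is hypothetical.

```python
def gd_on_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2; return |x| per step."""
    x = x0
    trace = [abs(x)]
    for _ in range(steps):
        x = x - lr * sharpness * x  # the gradient is sharpness * x
        trace.append(abs(x))
    return trace

h = 4.0                                      # curvature (sharpness) of the quadratic
stable = gd_on_quadratic(h, lr=1.9 / h)      # just below the 2 / h threshold
unstable = gd_on_quadratic(h, lr=2.1 / h)    # just above the 2 / h threshold
```

Each step multiplies `x` by `1 - lr * h`, so the threshold `lr = 2 / h` is exactly where that factor reaches magnitude one.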
Article 245
Title@2025-06-25 (3): Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains
Title: Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains | Proximale Kontrolle von UAVs mit Federated Learning für Mensch-Roboter Collaborative Domains | 人类-机器人合作域的联邦学习系统对无人驾驶航空器的优化控制 2412.02863v2 |
Authors (4): Lucas Nogueira Nobrega, Ewerton de Oliveira, Martin Saska, Tiago Nascimento
Human-robot interaction (HRI) is a growing area of research. In HRI, complex command (action) classification is still an open problem that usually prevents the real applicability of such techniques. The literature presents some works that use neural networks to detect these actions. However, occlusion is still a major issue in HRI, especially when using uncrewed aerial vehicles (UAVs), since, during the robot’s movement, the human operator is often out of the robot’s field of view. Furthermore, in multi-robot scenarios, distributed training is also an open problem. To address both issues, this work proposes an action recognition and control approach based on Long Short-Term Memory (LSTM) Deep Neural Networks with two layers in association with three densely connected layers and Federated Learning (FL) embedded in multiple drones. FL enables our approach to be trained in a distributed fashion, i.e., with access to data without the need for cloud or other repositories, which facilitates the multi-robot system’s learning. Furthermore, our multi-robot approach also prevents occlusion situations, and experiments with real robots achieve an accuracy greater than 96%.
人类- 机器人互动( HRI) 是一个日益增长的研究领域。 在 HRI 中, 复杂的命令( 动作) 分类仍是一个尚未解决的问题, 通常会妨碍这种技术的真正应用。 文献展示了一些使用神经网络检测这些动作的作品。 然而, 封闭仍然是人权 中的一个主要问题, 特别是使用未封闭的飞行器( UAVs ) , 因为, 在机器人运动期间, 人类操作者经常不在机器人视野的视野之内。 此外, 在多机器人假设中, 分布式培训也是一个尚未解决的问题。 从这个意义上讲, 这项工作提议了一种基于长期短期内存( LSTM) 深神经网络的行动识别和控制方法, 与三个密不可分的层和多个无人机中嵌入的联邦学习( FLL) 相关联的两层相联。 FL 使得我们的方法能够以分布式的方式接受培训, 也就是说, 无需云或其他储存库, 就能促进多机器人系统的学习。 此外, 我们的多机器人方法的结果也防止了隐蔽状态, , 并且有比实际机器人更精确的实验 96 % 。
Article 246
Title@2025-06-25 (3): Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation
Title: Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation | Industrial Energy Disaggregation mit digitalem Twin-generated Dataset und effizienter Datenvergrößerung | 工业能源分类与数字双生成数据集和高效数据扩增 2506.20525v1 |
Authors (5): Christian Internò, Andrea Castellani, Sebastian Schmitt, Fabio Stella, Barbara Hammer
Industrial Non-Intrusive Load Monitoring (NILM) is limited by the scarcity of high-quality datasets and the complex variability of industrial energy consumption patterns. To address data scarcity and privacy issues, we introduce the Synthetic Industrial Dataset for Energy Disaggregation (SIDED), an open-source dataset generated using Digital Twin simulations. SIDED includes three types of industrial facilities across three different geographic locations, capturing diverse appliance behaviors, weather conditions, and load profiles. We also propose the Appliance-Modulated Data Augmentation (AMDA) method, a computationally efficient technique that enhances NILM model generalization by intelligently scaling appliance power contributions based on their relative impact. We show in experiments that NILM models trained with AMDA-augmented data significantly improve the disaggregation of energy consumption of complex industrial appliances like combined heat and power systems. Specifically, in our out-of-sample scenarios, models trained with AMDA achieved a Normalized Disaggregation Error of 0.093, outperforming models trained without data augmentation (0.451) and those trained with random data augmentation (0.290). Data distribution analyses confirm that AMDA effectively aligns training and test data distributions, enhancing model generalization.
由于缺乏高质量的数据集和工业能源消费模式的复杂变异性,工业非侵入性负载监测(NILM)有限,因为缺少高质量的数据集和工业能源消费模式的复杂变化。为了解决数据稀缺和隐私问题,我们引入了能源分解合成工业数据集(SIDIDD),这是利用数字双模拟产生的开放源数据集。SIDIDD包括三个不同地理位置的三类工业设施,捕捉了各种不同的电器行为、天气条件和负载剖面图。我们还提议了应用程序-移动数据放大法(AMDA)方法,这是一种计算效率高的技术,通过根据相对影响明智地扩大应用程序贡献来增强NILM模型的通用化。我们在实验中显示,NILMM模型经过利用AMDA强化数据模型培训,大大改进了综合热电系统等复杂工业电器的能源消耗的分类。具体地说,在我们的抽样假设中,接受过AMDA培训的模型实现了0.093的正常分解误差,未经过数据增强数据的扩增积度(0.451)和经过随机数据扩增度测试的模型。
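The reported scores (0.093, 0.451, 0.290) are Normalized Disaggregation Errors. The abstract does not give the formula, so the sketch below assumes the form commonly used in the NILM literature, squared error normalized by the squared true signal; the paper may use a variant (for example with a square root), and the function name is hypothetical.

```python
def normalized_disaggregation_error(true_power, predicted_power):
    """NDE = sum_t (y_t - yhat_t)^2 / sum_t y_t^2 (assumed definition)."""
    num = sum((y - p) ** 2 for y, p in zip(true_power, predicted_power))
    den = sum(y ** 2 for y in true_power)
    return num / den

# Toy appliance trace in arbitrary power units.
truth = [10.0, 0.0, 10.0, 0.0]
estimate = [9.0, 1.0, 9.0, 1.0]
nde = normalized_disaggregation_error(truth, estimate)
```

Under this definition a perfect prediction scores 0 and an all-zero prediction scores 1, so lower is better.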
Article 247
Title@2025-06-25 (3): On Advancements of the Forward-Forward Algorithm
Title: On Advancements of the Forward-Forward Algorithm | Auf den Fortschritten des Vorwärtsalgorithmus | 关于前向前前进算法的推进 2504.21662v2 |
Authors (3): Mauricio Ortiz Torres, Markus Lange, Arne P. Raulf
The Forward-Forward algorithm has evolved in machine learning research, tackling more complex tasks that mimic real-life applications. In recent years, several techniques have improved it beyond its original version, letting it handle a challenging dataset like CIFAR10 without losing its flexibility and low memory usage. Our results show that combining convolutional channel grouping, learning rate schedules, and independent block structures during training leads to a 20\% decrease in test error percentage. Additionally, to move toward implementations on low-capacity hardware, we present a series of lighter models that achieve test error percentages within (21$\pm$3)\% with between 164,706 and 754,386 trainable parameters. This serves as a basis for our future study on complete verification and validation of these kinds of neural networks.
前向算法在机器学习研究中有所发展,处理模仿现实应用的更复杂任务。在过去几年中,它通过若干技术得到了改进,以比原版更好运行,处理像CIFAR10这样的具有挑战性的数据集,同时又不丧失灵活性和低记忆使用率。我们的结果显示,通过结合进化频道组合、学习率时间表和独立块状结构,在培训过程中取得了改进,导致测试误差百分比下降20%。此外,为了进一步实施低容量硬件项目,我们提出了一系列较轻模型,在21\pm3美元范围内达到低测试误差率(21\pm3美元) 和164,706和754,386之间可培训参数的数量,这成为我们今后全面核查和验证这类神经网络的基础。
Article 248
Title@2025-06-25 (3): Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Title: Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards | Asymmetrisches REINFORCE für Off-Policy-Verstärkung-Lernen: Ausgleich positiver und negativer Belohnungen | 非政策加强学习的不对称REINFORCE对非政策加强学习的影响:平衡正与负的奖励 2506.20520v1 |
Authors (6): Charles Arnal, Gaëtan Narozniak, Vivien Cabannes, Yunhao Tang, Julia Kempe, Remi Munos
Reinforcement learning (RL) is increasingly used to align large language models (LLMs). Off-policy methods offer greater implementation simplicity and data efficiency than on-policy techniques, but often result in suboptimal performance. In this work, we study the intermediate range of algorithms between off-policy RL and supervised fine-tuning by analyzing a simple off-policy REINFORCE algorithm, where the advantage is defined as $A=r-V$, with $r$ a reward and $V$ some tunable baseline. Intuitively, lowering $V$ emphasizes high-reward samples, while raising it penalizes low-reward ones more heavily. We first provide a theoretical analysis of this off-policy REINFORCE algorithm, showing that when the baseline $V$ lower-bounds the expected reward, the algorithm enjoys a policy improvement guarantee. Our analysis reveals that while on-policy updates can safely leverage both positive and negative signals, off-policy updates benefit from focusing more on positive rewards than on negative ones. We validate our findings experimentally in a controlled stochastic bandit setting and through fine-tuning state-of-the-art LLMs on reasoning tasks.
强化学习(RL)越来越多地用于调整大型语言模型(LLM) 。非政策性方法比政策性技术更简单,数据效率更高,但往往导致低于最佳性能。在这项工作中,我们研究了非政策性RL之间的中间算法范围,并通过分析一个简单的REINFORCE非政策性REINFORCE算法来监督微调,该算法的优势被定义为$A=r-V$,其中的优势被定义为美元奖励和一些金枪鱼可获量的基线美元。直觉地说,降低V$强调高回报样本,同时提高低回报样本。我们首先对这一非政策性REINFORCE算法提供了理论分析,表明当基线值为美元,低限预期的回报时,该算法享有政策改进保证。我们的分析表明,虽然在政策上更新可以安全地利用正和负信号,但非政策性更新的好处是更多地侧重于正面的而不是负面的。我们从受控的悬殊的悬殊中验证了我们的调查结果。
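The role of the baseline in the advantage $A = r - V$ can be sketched on a two-armed bandit with a softmax policy trained on a fixed off-policy batch. This is a minimal illustration under assumptions (tabular logits, a hand-written batch, a hypothetical function name), not the paper's experimental setup.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def off_policy_reinforce_step(logits, batch, baseline, lr=0.5):
    """One ascent step on the batch mean of (r - V) * log pi(a)."""
    pi = softmax(logits)
    grad = [0.0] * len(logits)
    for action, reward in batch:
        advantage = reward - baseline  # A = r - V
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            # d log pi(action) / d logit_k for a softmax policy
            grad[k] += advantage * (indicator - pi[k])
    return [l + lr * g / len(batch) for l, g in zip(logits, grad)]

# Fixed data from a uniform behaviour policy: arm 0 pays 1, arm 1 pays 0.
batch = [(0, 1.0), (1, 0.0), (0, 1.0), (1, 0.0)]
logits = [0.0, 0.0]
for _ in range(20):
    logits = off_policy_reinforce_step(logits, batch, baseline=-0.5)
policy = softmax(logits)
```

With a baseline below every reward, all samples receive positive weight and the update behaves like reward-weighted imitation of the batch; raising `baseline` toward the mean reward makes the low-reward samples push the policy away from arm 1 instead.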
Article 249
Title@2025-06-25 (3): VRAIL: Vectorized Reward-based Attribution for Interpretable Learning
Title: VRAIL: Vectorized Reward-based Attribution for Interpretable Learning | VRAIL: Vectorized Reward-based Attribution for Interpretable Learning | VRAIL: 可解释性学习的矢量奖励 2506.16014v3 |
Authors (3): Jina Kim, Youjin Jang, Jeongjin Han
We propose VRAIL (Vectorized Reward-based Attribution for Interpretable Learning), a bi-level framework for value-based reinforcement learning (RL) that learns interpretable weight representations from state features. VRAIL consists of two stages: a deep learning (DL) stage that fits an estimated value function using state features, and an RL stage that uses this to shape learning via potential-based reward transformations. The estimator is modeled in either linear or quadratic form, allowing attribution of importance to individual features and their interactions. Empirical results on the Taxi-v3 environment demonstrate that VRAIL improves training stability and convergence compared to standard DQN, without requiring environment modifications. Further analysis shows that VRAIL uncovers semantically meaningful subgoals, such as passenger possession, highlighting its ability to produce human-interpretable behavior. Our findings suggest that VRAIL serves as a general, model-agnostic framework for reward shaping that enhances both learning and interpretability.
我们建议VRAIL(基于解释性学习的激励奖励),这是一个基于价值的强化学习的双级框架(RL),从各州的特点中学习可解释的重量表现。 VRAIL由两个阶段组成:一个深层次的学习(DL)阶段,该阶段符合使用国家特点的估计价值功能,一个利用这一阶段来通过潜在的奖励转换影响学习的RAL阶段。该估计以线性或二次形式建模,允许赋予个人特征及其互动的重要性。出租车-V3环境的经验性结果表明,VRAIL与标准的DQN相比,提高了培训稳定性和趋同性,而不需要环境的修改。进一步的分析表明,VRAIL发现了具有内在意义的次级目标,例如乘客拥有,突出其产生人际行为的能力。我们的研究结果表明,VRAIL作为奖励塑造既能提高学习能力又能解释性的一般、示范性、不可忽视性框架。
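Potential-based reward shaping with a linear value estimator can be sketched as follows. The code uses the standard form $r' = r + \gamma\,\Phi(s') - \Phi(s)$ with $\Phi(s) = w^\top \phi(s)$; the feature vectors, weights, and function names are illustrative assumptions, and VRAIL's exact transformation may differ.

```python
def potential(weights, features):
    """Linear potential Phi(s) = w . phi(s); each weight attributes importance to one feature."""
    return sum(w * f for w, f in zip(weights, features))

def shaped_reward(reward, feats, next_feats, weights, gamma=0.99):
    """Standard potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(weights, next_feats) - potential(weights, feats)

# Hypothetical two-feature states, e.g. (near_passenger, has_passenger).
weights = [0.5, 2.0]
trajectory = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
env_rewards = [-1.0, -1.0]

shaped = [
    shaped_reward(r, s, s_next, weights, gamma=1.0)
    for r, s, s_next in zip(env_rewards, trajectory[:-1], trajectory[1:])
]
total_shaped = sum(shaped)
```

With `gamma=1.0` the shaping terms telescope, so the shaped return differs from the original return only by `Phi(last) - Phi(first)`; this is why potential-based shaping can guide learning without modifying the environment.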
Article 250
Title@2025-06-25 (3): WallStreetFeds: Client-Specific Tokens as Investment Vehicles in Federated Learning
Title: WallStreetFeds: Client-Specific Tokens as Investment Vehicles in Federated Learning | WallStreetFeds: Kundenspezifische Token als Investment Vehicles in Federated Learning | WallStreetFededs: 客户特有名称作为联邦学习联盟的投资工具 2506.20518v1 |
Authors (3): Arno Geimer, Beltran Fiz Pontiveros, Radu State
Federated Learning (FL) is a collaborative machine learning paradigm which allows participants to collectively train a model while training data remains private. This paradigm is especially beneficial for sectors like finance, where data privacy, security and model performance are paramount. FL has been extensively studied in the years following its introduction, leading to, among other things, better-performing collaboration techniques, ways to defend against other clients trying to attack the model, and contribution assessment methods. An important element in for-profit Federated Learning is the development of incentive methods to determine the allocation and distribution of rewards for participants. While numerous methods for allocation have been proposed and thoroughly explored, distribution frameworks remain relatively understudied. In this paper, we propose a novel framework which introduces client-specific tokens as investment vehicles within the FL ecosystem. Our framework aims to address the limitations of existing incentive schemes by leveraging a decentralized finance (DeFi) platform and automated market makers (AMMs) to create a more flexible and scalable reward distribution system for participants, and a mechanism for third parties to invest in the federated learning process.
联邦学习联合会(FL)是一个合作的机器学习模式,它使参与者能够集体培训一个模型,而培训数据仍然是私有的。这种模式对金融等部门特别有益,因为其数据隐私、安全和模型性能是最重要的。在引入该模式之后的几年里,对FL进行了广泛的研究,除其他外,导致更好的合作技巧、针对试图攻击该模式的其他客户的辩护方法以及会费评估方法。为营利性联邦学习制定奖励方法,以确定参与者的奖赏分配和分配。尽管已经提出并彻底探讨了许多分配方法,但分配框架仍然相对不足。在本文件中,我们提出了一个新的框架,将客户专用象征性物作为FL生态系统的投资工具。我们的框架旨在通过分散化的融资平台和自动化市场制造者,为参与者建立一个更加灵活和可扩展的奖赏分配系统,以及第三方投资于联邦学习进程的机制,以解决现有奖励办法的局限性。
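For readers unfamiliar with automated market makers, the sketch below shows the widely used fee-free constant-product rule ($x \cdot y = k$) pricing a hypothetical client token against a reserve currency. The abstract does not say which AMM design the framework uses, so the pricing rule, the pool sizes, and the function name are all assumptions.

```python
def swap_out(reserve_in, reserve_out, amount_in):
    """Tokens received from a fee-free constant-product pool (x * y = k)."""
    return reserve_out * amount_in / (reserve_in + amount_in)

# Hypothetical pool: 1000 units of reserve currency against 100 client tokens.
reserve_currency, client_tokens = 1000.0, 100.0
k_before = reserve_currency * client_tokens

paid = 100.0                                   # an investor buys client tokens
received = swap_out(reserve_currency, client_tokens, paid)
reserve_currency += paid
client_tokens -= received
k_after = reserve_currency * client_tokens

price_after = reserve_currency / client_tokens  # marginal price rises after the buy
```

Buying a client's token raises its marginal price, which is the kind of signal a third-party investor would act on in a token-based incentive scheme.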
Article 251
Title@2025-06-25 (3): Fast ground penetrating radar dual-parameter full waveform inversion method accelerated by hybrid compilation of CUDA kernel function and PyTorch
Title: Fast ground penetrating radar dual-parameter full waveform inversion method accelerated by hybrid compilation of CUDA kernel function and PyTorch | Schnelle Bodendurchdringung Radar Dual-Parameter Vollwellenform Inversion Methode beschleunigt durch hybride Zusammenstellung von CUDA-Kernel-Funktion und PyTorch | 通过混合汇编CUDA内核功能和PyTorch,加速采用快速穿透快速地面雷达双参数双参数全波形反转法 2506.20513v1 |
Authors (6): Lei Liu, Chao Song, Liangsheng He, Silin Wang, Xuan Feng, Cai Liu
This study proposes a high-performance dual-parameter full waveform inversion (FWI) framework for ground-penetrating radar (GPR), accelerated through the hybrid compilation of CUDA kernel functions and PyTorch. The method leverages the computational efficiency of GPU programming while preserving the flexibility and usability of Python-based deep learning frameworks. By integrating customized CUDA kernels into PyTorch’s automatic differentiation mechanism, the framework enables accurate and efficient inversion of both dielectric permittivity and electrical conductivity. Experimental evaluations on synthetic data and real wavefield data demonstrate that the proposed method achieves dual-parameter FWI for GPR data while maintaining high accuracy. Moreover, the framework is flexible and extensible, supporting optional regularization strategies such as total variation and multi-scale inversion. These features make the proposed approach a practical and scalable framework for rapid GPR-based subsurface imaging in applications including civil engineering, environmental monitoring, and geophysical exploration.
这项研究建议对地面穿透雷达采用高性能双参数全波变形框架,通过综合汇编CUDA内核功能和PyTorrch加速进行。该方法利用了GPU编程的计算效率,同时保持了基于Python的深层学习框架的灵活性和使用性。通过将CUDA内核纳入PyTorch的自动区分机制,该框架能够准确和有效地转换电动允许性和电导性。对合成数据和实际波地数据进行的实验评估表明,拟议方法在保持高准确性的同时,实现了GPR数据的双参数FWI。此外,该框架灵活和可推广,支持完全变换和多尺度转换等备选的正规化战略。这些特点使得拟议方法成为在民用工程、环境监测和地球物理勘探等应用中快速以GPR为基础的次表层成像的实用和可扩展框架。
Article 252
Title@2025-06-25 (3): OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Title: OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | OctoThinker: Mittleres Training fördert verstärktes Lernen Scaling | OctoThinker: 中级培训鼓励加强学习 2506.20512v1 |
Authors (4): Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu
Different base language model families, such as Llama and Qwen, exhibit divergent behaviors during post-training with reinforcement learning (RL), especially on reasoning-intensive tasks. What makes a base language model suitable for reinforcement learning? Gaining deeper insight into this question is essential for developing RL-scalable foundation models of the next generation. In this work, we investigate how mid-training strategies shape RL dynamics, focusing on two representative model families: Qwen and Llama. Our study reveals that (1) high-quality mathematical corpora, such as MegaMath-Web-Pro, significantly improve both base model and RL performance, while existing alternatives (e.g., FineMath-4plus) fail to do so; (2) further adding QA-style data, particularly long chain-of-thought (CoT) reasoning examples, enhances RL outcomes, and instruction data further unlocks this effect; (3) while long-CoT improves reasoning depth, it can also induce verbosity of model responses and instability of RL training, underscoring the importance of data formatting; (4) scaling mid-training consistently leads to stronger downstream RL performance. Building on these insights, we introduce a two-stage mid-training strategy, Stable-then-Decay, in which base models are first trained on 200B tokens with a constant learning rate, followed by 20B tokens across three CoT-focused branches with learning rate decay. This yields OctoThinker, a family of models demonstrating strong RL compatibility and closing the performance gap with more RL-friendly model families, i.e., Qwen. We hope our work will help shape pre-training strategies for foundation models in the RL era. To support further research, we release our open-source models along with a curated math reasoning-intensive corpus of over 70 billion tokens (i.e., MegaMath-Web-Pro-Max).
Llama 和 Quwen 等不同基础语言模式家庭在强化学习(RL) 后培训期间表现出不同的行为, 特别是在强化学习( RL) 上。 是什么使得基础语言模式适合强化学习? 更深入地了解这一问题对于开发下一代RL可扩缩的基础模型至关重要。 在这项工作中, 我们调查中培训战略如何影响RL动态, 侧重于两个具有代表性的模式家庭: Quen 和 Llama。 我们的研究显示:(1) 高质量的数学公司, 如MegaMath- Web-Pro, 大大改进了基础模型和RL的绩效, 而现有的替代模式( 例如, FinalMath-4+) 却未能这样做; (2) 进一步增加QA类数据, 特别是长期思考( CoT) 推理学实例, 增强RL的结果, 指导数据数据数据数据进一步释放前的深度。
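The Stable-then-Decay recipe amounts to a two-phase learning rate schedule indexed by tokens seen. The sketch below keeps the rate constant for the first 200B tokens and decays it over the following 20B; the cosine decay shape and the peak and final learning rates are assumptions, since the abstract states only the token budgets.

```python
import math

STABLE_TOKENS = 200e9  # constant-rate phase (from the abstract)
DECAY_TOKENS = 20e9    # decay phase per CoT-focused branch (from the abstract)

def stable_then_decay_lr(tokens_seen, peak_lr=3e-4, final_lr=3e-5):
    """Constant learning rate, then cosine decay (decay shape and rates are assumed)."""
    if tokens_seen <= STABLE_TOKENS:
        return peak_lr
    progress = min((tokens_seen - STABLE_TOKENS) / DECAY_TOKENS, 1.0)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [stable_then_decay_lr(t) for t in (0.0, 100e9, 200e9, 210e9, 220e9)]
```

In a branched setup each of the three CoT-focused branches would restart from the same stable-phase checkpoint and run its own decay phase.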
Article 253
Title@2025-06-25 (3): Collaborative Batch Size Optimization for Federated Learning
Title: Collaborative Batch Size Optimization for Federated Learning | Kollaborative Batch-Größenoptimierung für Federated Learning | 联邦学习联合会的合作批量数量优化 2506.20511v1 |
Authors (3): Arno Geimer, Karthick Panner Selvam, Beltran Fiz Pontiveros
Federated Learning (FL) is a decentralized collaborative machine learning framework for training models without collecting data in a centralized location. It has seen application across various disciplines, from helping medical diagnoses in hospitals to detecting fraud in financial transactions. In this paper, we focus on improving the local training process through hardware usage optimization. Although participants in a federation might train on shared hardware, no information is exchanged between them, so an improper training configuration can hinder their training process. Taking advantage of the parallel processing inherent to Federated Learning, we use a greedy randomized search to optimize local batch sizes for the best training settings across all participants. Our results show that, compared with default parameter settings, our method improves convergence speed while staying nearly on par with the case where local parameters are optimized.
联邦学习联合会(FL)是一个分散合作的机械学习框架,用于培训模式,而不在集中地点收集数据,它在不同学科应用,从帮助医院的医疗诊断到发现金融交易中的欺诈。在本文中,我们侧重于通过硬件使用优化改善当地培训过程。联邦参与者可能分享他们正在接受培训的硬件,因为他们之间没有信息交流,他们的培训过程可能受到不适当的培训配置的阻碍。我们利用联邦学习协会固有的平行处理,利用贪婪随机搜索,优化所有参与者的最佳培训环境的本地批量规模。我们的结果显示,在默认参数设置下,我们的方法可以提高趋同速度,同时与优化当地参数的情况保持接近一致。
Article 254
Title@2025-06-25 (3): LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Title: LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation | LPOSS: Label Propagation über Patches und Pixel für Open-vocabulary Semantic Segmentation | LPOSS: 用于开放式词汇语义分解的补丁和像素的标签传播 2503.19777v2 |
Authors (4): Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias
We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vision Model (VM) that is observed to better capture these relationships. We address resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries. Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image. LPOSS+ achieves state-of-the-art performance among training-free methods, across a diverse set of datasets. Code: https://github.com/vladan-stojnic/LPOSS
我们建议采用开放词汇语义分离模型(VLMs)为开放词汇语义分隔法(VLMs)提供一种不培训的方法。我们的方法通过标签传播,提高VLM最初的每批预测,通过将补到补到补关系,共同优化预测。由于VLMs主要优化用于跨模式调整,而不是用于模式内相似性,我们使用观察到的愿景模型(VM)更好地捕捉这些关系。我们通过在像素水平上使用标签作为改进步骤来解决基于补丁的编码器(VM)固有的分辨率限制,显著提高分类边界附近的分解准确性。我们称为LPOSS+(LPOSS+)的方法对整个图像进行推断,避免基于窗口的处理,从而在整个图像中捕捉到背景互动。LPOSS+(LPOSS+)在一系列不同的数据集中实现无培训方法的状态性能。代码:https://github.com/vladan-stojnic/LPOSS
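Label propagation over a patch graph can be sketched with the classic iteration $Y \leftarrow \alpha S Y + (1 - \alpha) Y_0$, where $S$ is the symmetrically normalized affinity matrix. This is generic label propagation on a three-node toy graph, not the LPOSS formulation; in the paper's setting the initial scores would come from the VLM and the affinities from the vision model, and all numbers below are made up.

```python
import math

def label_propagation(affinity, seeds, alpha=0.5, iters=100):
    """Iterate Y <- alpha * S @ Y + (1 - alpha) * Y0 with S = D^-1/2 W D^-1/2."""
    n, k = len(affinity), len(seeds[0])
    deg = [sum(row) for row in affinity]
    S = [[affinity[i][j] / math.sqrt(deg[i] * deg[j]) if affinity[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    Y = [row[:] for row in seeds]
    for _ in range(iters):
        Y = [[alpha * sum(S[i][j] * Y[j][c] for j in range(n))
              + (1.0 - alpha) * seeds[i][c]
              for c in range(k)] for i in range(n)]
    return Y

# Three patches: patch 1 has no initial prediction and is more similar to patch 0.
affinity = [[0.0, 1.0, 0.0],
            [1.0, 0.0, 0.2],
            [0.0, 0.2, 0.0]]
seeds = [[1.0, 0.0],   # patch 0: initial score for class 0
         [0.0, 0.0],   # patch 1: uncertain
         [0.0, 1.0]]   # patch 2: initial score for class 1
scores = label_propagation(affinity, seeds)
labels = [max(range(2), key=lambda c: row[c]) for row in scores]
```

The uncertain patch inherits the class of the neighbour it is most similar to, which is the patch-to-patch refinement the abstract describes.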
Article 255
Title@2025-06-25 (3): Unidentified and Confounded? Understanding Two-Tower Models for Unbiased Learning to Rank
Title: Unidentified and Confounded? Understanding Two-Tower Models for Unbiased Learning to Rank | Unidentifiziert und verwechselt? Zwei-Tower-Modelle für unvoreingenommenes Lernen verstehen | 如何理解两塔式的无偏见学习模式到排名? 2506.20501v1 |
Authors (3): Philipp Hager, Onno Zoeter, Maarten de Rijke
Additive two-tower models are popular learning-to-rank methods for handling biased user feedback in industry settings. Recent studies, however, report a concerning phenomenon: training two-tower models on clicks collected by well-performing production systems leads to decreased ranking performance. This paper investigates two recent explanations for this observation: confounding effects from logging policies and model identifiability issues. We theoretically analyze the identifiability conditions of two-tower models, showing that either document swaps across positions or overlapping feature distributions are required to recover model parameters from clicks. We also investigate the effect of logging policies on two-tower models, finding that they introduce no bias when models perfectly capture user behavior. However, logging policies can amplify biases when models imperfectly capture user behavior, particularly when prediction errors correlate with document placement across positions. We propose a sample weighting technique to mitigate these effects and provide actionable insights for researchers and practitioners using two-tower models.
添加二至下方模型是处理行业环境中有偏见用户反馈的流行式学习到等级方法。然而,最近的研究报告报告了有关现象:在业绩良好的生产系统收集的点击器上培训双到下方模型导致排名业绩下降。本文调查了这一观察的最近两个解释:伐木政策和模型可识别性问题的混杂效应。我们从理论上分析了两到下方模型的可识别性条件,表明要么需要不同位置的文件互换,要么需要重叠的特征分布,以便从点击中恢复模型参数。我们还调查了伐木政策对二到下方模型的影响,发现在模型完全捕捉用户行为时,它们没有引入偏差。然而,伐木政策可以在模型不完善地捕捉用户行为时扩大偏差,特别是当预测错误与跨位置的文件放置相关时。我们建议了一种抽样加权技术,以缓解这些效应,并为使用二到式模型的研究人员和从业人员提供可操作的洞察力。
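The identifiability question for additive two-tower models can be illustrated with the simplest ambiguity: a constant moved from one tower to the other leaves every click probability unchanged. The sketch below is an illustrative assumption about the model form (a sigmoid over the sum of a relevance logit and a position-bias logit) with made-up numbers; the conditions analysed in the paper go beyond this shift.

```python
import math

def click_probability(relevance_logit, position_bias_logit):
    """Additive two-tower click model: sigmoid(relevance + position bias)."""
    return 1.0 / (1.0 + math.exp(-(relevance_logit + position_bias_logit)))

relevance = {"doc_a": 1.2, "doc_b": -0.3}  # relevance tower outputs (hypothetical)
bias = {1: 0.8, 2: 0.1}                    # bias tower outputs per rank (hypothetical)

# Move a constant from the bias tower into the relevance tower.
shift = 0.7
relevance_shifted = {d: r + shift for d, r in relevance.items()}
bias_shifted = {k: b - shift for k, b in bias.items()}

original = [click_probability(relevance[d], bias[k]) for d in relevance for k in bias]
shifted = [click_probability(relevance_shifted[d], bias_shifted[k])
           for d in relevance for k in bias]
```

Because both parameter sets explain the clicks equally well, click data alone cannot pin down the towers' absolute values; the abstract's conditions (document swaps across positions or overlapping feature distributions) concern when the parameters can be recovered.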
Article 256
Title@2025-06-25 (3): Training Plug-n-Play Knowledge Modules with Deep Context Distillation
Title: Training Plug-n-Play Knowledge Modules with Deep Context Distillation | Training Plug-n-Play Wissensmodule mit Deep Context Destillation | 具有深背景蒸馏作用的培训插件-n-玩耍知识模块 2503.08727v3 |
Authors (5): Lucas Caccia, Alan Ansell, Edoardo Ponti, Ivan Vulić, Alessandro Sordoni
Dynamically integrating new or rapidly evolving information after (Large) Language Model pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as the training objective for KMs. We instead propose Deep Context Distillation: we learn KM parameters so as to simulate the hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets. Finally, we highlight synergies between KMs and RAG.
在(Large)语言模型预培训之后,动态地整合新的或迅速发展的信息仍然是一项挑战,特别是在低数据情景中或在处理私人和专门文件时。内流学习和检索增强的一代(RAG)面临一些限制,包括高推论成本和无法获取全球文件信息。在本文件中,我们建议通过培训文件级知识模块(KMS)将知识模块化。知识管理是作为节能LORA模块实施的轻量级组件,这些模块经过培训,可以储存关于新文件的信息,并且可以很容易地按需插入模型。我们表明,下端预测作为知识管理的培训目标表现不佳。我们建议深背景蒸馏:我们学习知识管理参数,例如模拟隐蔽状态和在背景中选择文件的教师的登录。我们的方法超越了标准下端预测和前测试培训技术,横跨两个数据集。最后,我们强调知识管理和RAG之间的协同作用。
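A distillation objective that matches both logits and hidden states might look like the sketch below. It combines a KL term on the output distributions with a mean absolute difference on hidden states; the choice of distance, the weighting, and the function names are assumptions, since the abstract does not spell out the loss.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def context_distillation_loss(teacher_logits, student_logits,
                              teacher_hidden, student_hidden, hidden_weight=1.0):
    """KL(teacher || student) on logits plus a hidden-state matching term (assumed form)."""
    p = softmax(teacher_logits)   # teacher sees the document in context
    q = softmax(student_logits)   # student has only the knowledge module
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    hidden = sum(abs(t - s) for t, s in zip(teacher_hidden, student_hidden))
    return kl + hidden_weight * hidden / len(teacher_hidden)

matched = context_distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0],
                                    [0.3, -0.2], [0.3, -0.2])
mismatched = context_distillation_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0],
                                       [0.3, -0.2], [0.0, 0.0])
```

In training, only the LoRA parameters of the knowledge module would receive gradients from such a loss, while the teacher's outputs are treated as fixed targets.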
Article 257
Title@2025-06-25 (3): Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging
Title: Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging | Gut, ich werde es selbst verschmelzen: Ein Multi-Fidelity-Framework für automatisiertes Modellverschmelzen | 好吧,我会合并它我自己:一个自动模型合并的多功能框架 2502.04030v2 |
Authors (2): Guinan Su, Jonas Geiping
Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. One way to efficiently supplement capabilities is model merging, which offers a promising alternative by combining multiple models without retraining. However, current merging approaches rely on manually designed strategies for merging hyperparameters, limiting the exploration of potential model combinations and requiring significant human effort. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies while reducing costs through multi-fidelity approximations. We support both single- and multi-objective optimization and introduce two novel search spaces: layerwise fusion (LFS) and depth-wise integration (DIS). Evaluating across a number of benchmarks, we find that the search autonomously finds 1) merges that further boost single-objective performance, even on tasks the model has already been fine-tuned on, and 2) merges that optimize multi-objective frontiers across tasks. Effective merges are found with limited compute, e.g., within fewer than 500 search steps.
理性能力代表了大型语言模型(LLMS)的关键前沿,但开发这些模型需要广泛的专有数据集和计算资源。 高效地补充能力的方法之一是模型合并,通过不再培训合并多个模型提供了有希望的替代办法。然而,目前的合并方法依靠人工设计的战略,将超参数合并,限制了对潜在模型组合的探索,需要大量人力投入。我们提议了一个自动化模型合并框架,以便能够精细探索合并战略,同时通过多功能近似来降低成本。我们支持单一和多目标优化,并引入两个新的搜索空间:分层融合和深度整合。我们发现,在对一些基准进行评估时,自动发现:(1) 合并,即使模型已经对各项任务进行了调整,也能够进一步促进单一目标性业绩的合并;(2) 优化跨任务的多目标边界的合并。有效的合并是有限的,例如,在不到500个搜索步骤内。
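To make the idea of a layer-wise merging search space concrete, here is a small sketch in which each layer gets its own mixing coefficients and a search procedure would tune those coefficients. The dictionary layout, the function name `merge_layerwise`, and the use of plain weighted averaging are assumptions for illustration; the abstract does not specify how the LFS or DIS spaces are parameterized.

```python
import numpy as np

def merge_layerwise(models, layer_weights):
    """Merge models layer by layer with per-layer mixing coefficients.

    models: list of dicts mapping layer name -> weight array (same shapes).
    layer_weights: dict mapping layer name -> one positive coefficient per model.
    Coefficients are normalized to sum to 1 within each layer.
    """
    merged = {}
    for name in models[0]:
        w = np.asarray(layer_weights[name], dtype=float)
        w = w / w.sum()
        merged[name] = sum(wi * m[name] for wi, m in zip(w, models))
    return merged
```

An automated search would treat `layer_weights` as the hyperparameters to optimize, evaluating candidate merges cheaply at low fidelity before spending more compute on promising ones.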
Article 258
Title@2025-06-25 (3): ReCode: Updating Code API Knowledge with Reinforcement Learning
Title: ReCode: Updating Code API Knowledge with Reinforcement Learning | ReCode: Aktualisierung von Code-API-Kenntnissen mit Verstärkungslernen | ReCode:更新法规API知识与强化学习 2506.20495v1 |
Authors (5): Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang
Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their training data, even with access to current documentation, impedes reliable code generation in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics human programmer adaptation to API changes. Specifically, we construct a dataset of approximately 2,000 data entries to train the LLMs to perform version migration based on updated information. Then, we introduce a modified string similarity metric for code evaluation as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs’ code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared to supervised fine-tuning, ReCode has less impact on LLMs’ general code generation abilities. We apply ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO), all achieving consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms the 32B-parameter code instruction-tuned model and the reasoning model with the same architecture. Code is available at https://github.com/zjunlp/ReCode.
大型语言模型(LLMS)具有非凡的代码生成能力,但在适应外部图书馆API的频繁更新时却步履维艰。这一关键限制来自对培训数据中过时的 API 知识的依赖,即使能够查阅现有文件,从而在动态环境中阻碍可靠的代码生成。为了解决这一问题,我们提议ReCode(基于规则的加强学习以更新代码),这是一个模仿人类程序程序员适应API变化的新框架。具体地说,我们建立一个大约2 000个数据条目的数据集,以培训LLMS进行基于更新信息的版本的迁移。然后,我们引入一个修改后的代码评估字符串相似度指标,作为强化学习的奖励。我们的实验表明,ReCode大大提升了LPIS在动态API情景中的代码生成性能,特别是在隐蔽的代码AredateArena任务上。与监管的微调相比,ReCode对于LMS的一般代码生成能力影响较小。我们应用了一套LMS和强化学习算法(GPO和DAPO),所有这些都都实现了一致的改进。 值得注意的是,在培训后,Quender2.5-C-7BB的模型/Rebroughdaldroformax
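The idea of rewarding generated code by its string similarity to a reference can be sketched as follows. The abstract does not describe ReCode's modified metric, so this stand-in uses the standard library's `difflib.SequenceMatcher` ratio after whitespace normalization; the function name and the normalization step are assumptions.

```python
import difflib

def similarity_reward(generated: str, reference: str) -> float:
    """Return a reward in [0, 1] based on character-level similarity.

    Whitespace runs are collapsed first so that formatting differences
    alone do not lower the reward (an assumed design choice).
    """
    gen = " ".join(generated.split())
    ref = " ".join(reference.split())
    return difflib.SequenceMatcher(None, gen, ref).ratio()
```

A rule-based reward like this needs no test execution, which keeps the reinforcement learning loop cheap, at the cost of scoring semantically equivalent but differently written code lower.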
Article 259
Title@2025-06-25 (3): Multimodal Representation Learning and Fusion
Title: Multimodal Representation Learning and Fusion | Multimodales Repräsentationslernen und -fusion | 多模式代表性学习和融合 2506.20494v1 |
Authors (10): Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Junfeng Hao
Multi-modal learning is a fast-growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, such as images, text, and audio. By using the strengths of each modality, multi-modal learning allows AI systems to build stronger and richer internal representations. These representations help machines interpret, reason, and make decisions in real-life situations. The field includes core techniques such as representation learning (to obtain shared features from different data types), alignment methods (to match information across modalities), and fusion strategies (to combine them using deep learning models). Although there has been good progress, some major problems remain, such as dealing with different data formats, handling missing or incomplete inputs, and defending against adversarial attacks. Researchers are now exploring new methods, such as unsupervised or semi-supervised learning and AutoML tools, to make models more efficient and easier to scale. More attention is also being paid to designing better evaluation metrics and building shared benchmarks, which make it easier to compare model performance across tasks and domains. As the field continues to grow, multi-modal learning is expected to improve many areas, including computer vision, natural language processing, speech recognition, and healthcare. In the future, it may help to build AI systems that understand the world in a more human-like way: flexible, context-aware, and able to deal with real-world complexity.
多模式学习是人工智能中一个快速增长的领域。 它试图帮助机器通过综合来自不同来源的信息(如图像、文本和音频)来理解复杂的事物。 通过利用每种模式的优势,多模式学习使AI系统能够建立更强大和更丰富的内部代表。 这些帮助机器在现实环境中更好地解释、推理和作出决定。 这个领域包括核心技术,例如代表性学习(以不同数据类型获得共同特征)、协调方法(以不同模式匹配信息)和融合战略(以深层次学习模式结合信息)等。虽然取得了良好的进展,但仍然存在一些重大问题。就像处理不同数据格式、缺失或不完整的投入以及防御对抗性攻击一样。现在研究人员正在探索新方法,如无监督或半监督的学习、自动ML工具等,以使模型更高效和更容易规模化。这个领域还包括更多关注设计更好的评价衡量标准或建立共同基准,从而更容易比较跨任务和领域的模型绩效。随着实地的不断增长,多模式学习将帮助改进许多领域: 计算机视野、 自然语言处理、语音识别和保健系统等。 在将来, 将它与真实的复杂程度相适应世界。
Article 260
Title@2025-06-25 (3): Non-equilibrium Annealed Adjoint Sampler
Title: Non-equilibrium Annealed Adjoint Sampler | Nicht-Equilibrium Annealed Adjoint Sampler | 非平衡 Annaaled 联合采样器 2506.18165v2 |
Authors (4): Jaemoo Choi, Yongxin Chen, Molei Tao, Guan-Horng Liu
Recently, there has been significant progress in learning-based diffusion samplers, which aim to sample from a given unnormalized density. These methods typically follow one of two paradigms: (i) formulating sampling as an unbiased stochastic optimal control (SOC) problem using a canonical reference process, or (ii) refining annealed path measures through importance-weighted sampling. Although annealing approaches have advantages in guiding samples toward high-density regions, reliance on importance sampling leads to high variance and limited scalability in practice. In this paper, we introduce the Non-equilibrium Annealed Adjoint Sampler (NAAS), a novel SOC-based diffusion sampler that leverages annealed reference dynamics without resorting to importance sampling. NAAS employs a lean adjoint system inspired by adjoint matching, enabling efficient and scalable training. We demonstrate the effectiveness of our approach across a range of tasks, including sampling from classical energy landscapes and molecular Boltzmann distributions.
最近,在以学习为基础的扩散采样器方面取得了显著进展,这些采样器旨在从特定非正常密度中取样,这些方法通常遵循两种范式之一:(一) 使用罐头参照程序将采样制成一个无偏见的随机最佳控制(SOC)问题,或(二) 通过重要加权抽样改进防疫路径测量方法,尽管防疫方法在引导样品进入高密度区域方面有优势,但依赖重要取样在实际操作中导致差异很大和可缩放性有限。在本文中,我们引入了基于新颖SOC的传播采样器,在不使用重要取样的情况下,利用静脉参照动态。NAAS采用了一种精细的连接系统,这种系统受到联合匹配的启发,能够进行有效和可缩放的培训。我们展示了我们的方法在一系列任务中的有效性,包括来自古典能源景观和分子博尔茨曼分布的取样。
Article 261
Title@2025-06-25 (3): Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning
Title: Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning | Offline-Zielkonditioniertes Verstärkungslernen mit projektiver Quasimetrieplanung | 离线目标-有条件加强强化学习,进行预测准准准准规划 2506.18847v2 |
Authors (5): Anthony Kobanda, Waris Radji, Mathieu Petitbois, Odalric-Ambrym Maillard, Rémy Portelas
Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals from previously collected trajectories. Scaling that promise to long-horizon tasks remains challenging, notably due to compounding value-estimation errors. Principled geometry offers a potential solution to address these issues. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns an asymmetric distance and then repurposes it, firstly as a repulsive energy forcing a sparse set of keypoints to spread uniformly over the learned latent space, and secondly as a structured directional cost guiding towards proximal sub-goals. In particular, ProQ couples this geometry with a Lagrangian out-of-distribution detector to ensure the learned keypoints stay within reachable areas. By unifying metric learning, keypoint coverage, and goal-conditioned control, our approach produces meaningful sub-goals and robustly drives long-horizon goal-reaching on diverse navigation benchmarks.
离线目标强化强化学习(ProQ)旨在培训代理人员从先前收集的轨迹中达到特定目标。 向长方位任务承诺的扩展仍然具有挑战性, 特别是由于价值估计错误的复杂性。 原则几何为解决这些问题提供了潜在的解决方案。 在进行这一深入了解后, 我们引入了预测量规划(ProQ), 即一个能够学习不对称距离并随后重新加以利用的构成框架, 首先是令人厌恶的能量, 迫使一组稀少的关键点在所学的潜在空间上统一分布, 其次是向准目标提供结构化的方向成本指导。 特别是, ProQ 将这一几何方法与拉格朗吉亚分流探测器相配对, 以确保所学的关键点保持在可达的区域内。 通过统一指标学习、 关键点覆盖和有目标的控控控, 我们的方法产生了有意义的次级目标, 并强有力地驱动对不同导航基准的长方位目标产生影响。
Article 262
Title@2025-06-25 (3): Counterfactual Influence as a Distributional Quantity
Title: Counterfactual Influence as a Distributional Quantity | Gegenfaktischer Einfluss als Verteilungsmenge | 分发量的反事实影响 2506.20481v1 |
Authors (4): Matthieu Meeus, Igor Shilov, Georgios Kaissis, Yves-Alexandre de Montjoye
Machine learning models are known to memorize samples from their training data, raising concerns around privacy and generalization. Counterfactual self-influence is a popular metric to study memorization, quantifying how the model’s prediction for a sample changes depending on the sample’s inclusion in the training dataset. However, recent work has shown memorization to be affected by factors beyond self-influence, with other training samples, in particular (near-)duplicates, having a large impact. We here study memorization treating counterfactual influence as a distributional quantity, taking into account how all training samples influence how a sample is memorized. For a small language model, we compute the full influence distribution of training samples on each other and analyze its properties. We find that solely looking at self-influence can severely underestimate tangible risks associated with memorization: the presence of (near-)duplicates seriously reduces self-influence, while we find these samples to be (near-)extractable. We observe similar patterns for image classification, where simply looking at the influence distributions reveals the presence of near-duplicates in CIFAR-10. Our findings highlight that memorization stems from complex interactions across training data and is better captured by the full influence distribution than by self-influence alone.
机器学习模型可以记住其培训数据中的样本,引起人们对隐私和一般化的关切。反事实自我影响是一种研究记忆化的流行衡量标准,它量化了模型对样本变化的预测,取决于将样本纳入培训数据集的情况。然而,最近的工作表明,自我影响会受到自我影响以外的因素的影响,而其他培训样本,特别是(近(近)的复制品,具有很大影响。我们在这里研究将反事实影响作为分配数量处理的记忆化问题。我们考虑到所有培训样本如何影响一个样本的记忆化。对于一个小语言模型,我们计算模型对样本的抽样变化的全面影响分布,分析其特性。我们发现,仅仅研究自我影响可能会严重低估与记忆化相关的有形风险,特别是(近)的(近)复制品的存在会严重减少自我影响,而我们发现这些样本是(近(近)易)的。我们观察了相似的图像分类模式,仅仅看一个小语言模型对培训样本的全面影响分布会如何,而仅仅看一个小的图像分布会显示我们从综合分析中获取的数据分析的自我分析会更好。
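Treating influence as a distribution means estimating, for every pair of training samples (i, j), how including sample i changes the model's behavior on sample j. A common way to estimate this, assumed here rather than taken from the abstract, is to train many models on random subsets and compare average scores with and without each sample. The array names and the `influence_matrix` function are illustrative.

```python
import numpy as np

def influence_matrix(included, scores):
    """Estimate pairwise counterfactual influence from subset-trained models.

    included: (models, samples) boolean array; included[m, i] is True if
              sample i was in the training set of model m.
    scores:   (models, samples) array; scores[m, j] is model m's score on
              sample j (e.g. probability of the correct label).
    Returns an (i, j) matrix: mean score on j when i is included minus the
    mean score on j when i is excluded. Every sample must be both included
    and excluded by at least one model.
    """
    inc = included.astype(float)
    exc = 1.0 - inc
    with_i = inc.T @ scores / inc.sum(axis=0)[:, None]
    without_i = exc.T @ scores / exc.sum(axis=0)[:, None]
    return with_i - without_i
```

The diagonal of the result is the usual self-influence; each column j is the full distribution of influences on sample j, which is where near-duplicates would show up as large off-diagonal entries.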
Article 263
Title@2025-06-25 (3): Graph Linearization Methods for Reasoning on Graphs with Large Language Models
Title: Graph Linearization Methods for Reasoning on Graphs with Large Language Models | Graphische Linearisierungsmethoden zur Begründung von Graphen mit großen Sprachmodellen | 用于解释大语言模型图表的线性线性方法 2410.19494v3 |
Authors (9): Christos Xypolopoulos, Guokan Shang, Xiao Fei, Giannis Nikolentzos, Hadi Abdine, Iakovos Evdaimon, Michail Chatzianastasis, Giorgos Stamou, Michalis Vazirgiannis
Large language models have evolved to process multiple modalities beyond text, such as images and audio, which motivates us to explore how to effectively leverage them for graph reasoning tasks. The key question, therefore, is how to transform graphs into linear sequences of tokens, a process we term “graph linearization”, so that LLMs can handle graphs naturally. We consider that graphs should be linearized meaningfully to reflect certain properties of natural language text, such as local dependency and global alignment, in order to help contemporary LLMs, trained on trillions of textual tokens, better understand graphs. To achieve this, we developed several graph linearization methods based on graph centrality and degeneracy. These methods are further enhanced using node relabeling techniques. The experimental results demonstrate the effectiveness of our methods compared to the random linearization baseline. Our work introduces novel graph representations suitable for LLMs, contributing to the potential integration of graph machine learning with the trend of multimodal processing using a unified transformer model.
大型语言模型已经演变为处理文本以外的多种模式,例如图像和音频,这促使我们探索如何有效地利用图像和音频来进行图形推理任务。因此,关键问题是如何将图形转换成线性象征序列,我们称之为“线性线性化”,这样LLMS就能自然地处理图形。我们认为,图表应该有意义地线化,以反映自然语言文本的某些特性,例如当地依赖性和全球对齐,以便方便当代LMS, 接受数万亿个文本符号的培训, 更好地理解图表。为了实现这一目标,我们开发了几种基于图形中心性和退化性的图形线性化方法。这些方法利用节点再标签技术得到进一步加强。实验结果显示了我们的方法相对于随机线性基准的有效性。我们的工作引入了适合LMS的新型图形表达方式,有助于将图形机学习与使用统一的变压模型进行多式联运的趋势结合起来。
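As a rough illustration of centrality-based linearization with node relabeling, the sketch below ranks nodes by degree, renames them by rank, and emits the edge list in sorted order. Degree centrality, the tie-breaking rule, and the output format are assumptions chosen for simplicity; the abstract does not say which centrality measures or text formats are used.

```python
def linearize_by_degree(edges):
    """Turn an undirected edge list into a token sequence.

    Nodes are relabeled 0, 1, 2, ... in order of decreasing degree
    (ties broken by the original label), so the most central nodes get
    the smallest ids and their edges appear first in the sequence.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    relabel = {node: i for i, node in enumerate(ranked)}
    new_edges = sorted(
        (min(relabel[u], relabel[v]), max(relabel[u], relabel[v]))
        for u, v in edges
    )
    return " ".join(f"({a},{b})" for a, b in new_edges)
```

For a star graph centered on `b`, for example, the hub becomes node 0 and all of its edges are listed together, which gives the sequence the kind of local dependency the abstract argues for.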
Article 264
Title@2025-06-25 (3): MARCO: Multi-Agent Code Optimization with Real-Time Knowledge Integration for High-Performance Computing
Title: MARCO: Multi-Agent Code Optimization with Real-Time Knowledge Integration for High-Performance Computing | MARCO: Multi-Agent Code-Optimierung mit Echtzeit-Knowledge Integration für High-Performance Computing | MARCO: 利用实时知识整合优化多机构代码,促进高绩效计算 2505.03906v3 |
Authors (10): Asif Rahman, Veljko Cvetkovic, Kathleen Reece, Aidan Walters, Yasir Hassan, Aneesh Tummeti, Bryan Torres, Denise Cooney, Margaret Ellis, Dimitrios S. Nikolopoulos
Large language models (LLMs) have transformed software development through code generation capabilities, yet their effectiveness for high-performance computing (HPC) remains limited. HPC code requires specialized optimizations for parallelism, memory efficiency, and architecture-specific considerations that general-purpose LLMs often overlook. We present MARCO (Multi-Agent Reactive Code Optimizer), a novel framework that enhances LLM-generated code for HPC through a specialized multi-agent architecture. MARCO employs separate agents for code generation and performance evaluation, connected by a feedback loop that progressively refines optimizations. A key innovation is MARCO’s web-search component that retrieves real-time optimization techniques from recent conference proceedings and research publications, bridging the knowledge gap in pre-trained LLMs. Our extensive evaluation on the LeetCode 75 problem set demonstrates that MARCO achieves a 14.6% average runtime reduction compared to Claude 3.5 Sonnet alone, while the integration of the web-search component yields a 30.9% performance improvement over the base MARCO system. These results highlight the potential of multi-agent systems to address the specialized requirements of high-performance code generation, offering a cost-effective alternative to domain-specific model fine-tuning.
大型语言模型(LLMS)通过代码生成能力改变了软件开发,但其高效高性能计算(HPC)的效果仍然有限,HPC代码要求专门优化平行性、记忆效率以及一般通用LMS经常忽略的建筑特有考虑。我们介绍了MARCO(MLCO)(Multi-Agency Reactive Code Apptimerimizer),这是一个新颖的框架,它通过专门的多试剂结构加强LLMM为HPC生成的代码。MARCO使用不同的代码生成和绩效评估代理,通过逐步完善优化的反馈回路进行连接。一个关键的创新是MARCO的网络搜索组件,它从最近的会议记录和研究出版物中检索实时优化技术,缩小培训前LMS的知识差距。我们对LetCode 75问题集的广泛评价表明,MARCO仅与Claude 3.5 Sonnet系统相比,平均减少了14.6 %的运行时间,而网络搜索组件的整合则使MARCO系统的业绩得到30.9的改进。这些结果突出表明,多试剂系统有可能解决高性能模型生成的专门要求。
Article 265
Title@2025-06-25 (3): Physics-informed Imitative Reinforcement Learning for Real-world Driving
Title: Physics-informed Imitative Reinforcement Learning for Real-world Driving | Physik-informiert Imitative Verstärkungs-Lernen für das Fahren in der realen Welt | 为现实世界驾驶进行物理知情的模拟强化学习 2407.02508v3 |
Authors (4): Hang Zhou, Yihao Qin, Dan Xu, Yiding Ji
Recent advances in imitative reinforcement learning (IRL) have considerably enhanced the ability of autonomous agents to assimilate expert demonstrations, leading to rapid skill acquisition in a range of demanding tasks. However, such learning-based agents face significant challenges when transferring knowledge to highly dynamic closed-loop environments. Their performance is significantly impacted by the conflicting optimization objectives of imitation learning (IL) and reinforcement learning (RL), sample inefficiency, and the complexity of uncovering the hidden world model and physics. To address this challenge, we propose a physics-informed IRL that is entirely data-driven. It leverages both expert demonstration data and exploratory data with a joint optimization objective, allowing the underlying physical principles of vehicle dynamics to emerge naturally from the training process. The performance is evaluated through empirical experiments, and the results exceed those of popular IL, RL and IRL algorithms in closed-loop settings on the Waymax benchmark. Our approach exhibits a 37.8% reduction in collision rate and a 22.2% reduction in off-road rate compared to the baseline method.
在模仿强化学习(IRL)方面最近取得的进展大大提高了自主机构吸收专家示范的能力,导致在一系列艰巨任务中迅速获得技能;然而,这种以学习为基础的机构在向高度动态的闭路环境转让知识时面临重大挑战;由于模仿学习(IL)和强化学习(RL)的优化目标相互冲突、抽样效率低下以及发现隐蔽世界模式和物理学的复杂性,其业绩受到显著影响。为了应对这一挑战,我们提议采用完全以数据为驱动的物理知情的IRL。它利用专家示范数据和探索性数据,共同优化目标,使车辆动态的基本物理原理能够自然地从培训过程产生。通过经验实验和结果评价业绩,在Waymax基准的闭路环境中超过流行的IL、RL和IRL算法。我们的方法显示碰撞率降低了37.8%,离轨率比基线方法降低了22.2%。
Article 266
Title@2025-06-25 (3): HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
Title: HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling | HiWave: Schulungsfreie High-Resolution-Bildgenerierung über Wavelet-basierte Diffusions-Sampling | Hiwave:通过以波子为基础的传播抽样生成高分辨率图像,无需培训 2506.20452v1 |
Authors (4): Tobias Vontobel, Seyedmorteza Sadat, Farnood Salehi, Romann M. Weber
Diffusion models have emerged as the leading approach for image synthesis, demonstrating exceptional photorealism and diversity. However, training diffusion models at high resolutions remains computationally prohibitive, and existing zero-shot generation techniques for synthesizing images beyond training resolutions often produce artifacts, including object duplication and spatial incoherence. In this paper, we introduce HiWave, a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis using pretrained diffusion models. Our method employs a two-stage pipeline: generating a base image from the pretrained model followed by a patch-wise DDIM inversion step and a novel wavelet-based detail enhancer module. Specifically, we first utilize inversion methods to derive initial noise vectors that preserve global coherence from the base image. Subsequently, during sampling, our wavelet-domain detail enhancer retains low-frequency components from the base image to ensure structural consistency, while selectively guiding high-frequency components to enrich fine details and textures. Extensive evaluations using Stable Diffusion XL demonstrate that HiWave effectively mitigates common visual artifacts seen in prior methods, achieving superior perceptual quality. A user study confirmed HiWave’s performance, where it was preferred over the state-of-the-art alternative in more than 80% of comparisons, highlighting its effectiveness for high-quality, ultra-high-resolution image synthesis without requiring retraining or architectural modifications.
传播模型已成为图像合成的主导方法,显示了特殊的光现实主义和多样性。然而,高分辨率培训传播模型的推广模型仍然在计算上令人望而却步,现有将图像合成到培训分辨率之外的零光生成技术往往产生艺术品,包括物体重复和空间不一致。在本文中,我们引入了HiWave,这是一个无培训的零光模型,它大大加强了超高分辨率图像合成的视觉真实性和结构一致性,使用预先培训的传播模型。我们的方法使用两阶段管道:从预先培训的模型中产生一个基础图像,随后采用宽度的DDIM转换步骤和一个基于新颖的波盘细节增强模块。具体地说,我们首先使用零光生成方法来合成图像,从基本图像中获取能够保持全球一致性的初始噪声矢量。随后,在取样过程中,我们的波盘-部分保留了基础图像中的低频组件,以确保结构一致性,同时有选择地指导高频组件,以丰富精细的细节和文字。使用Stable Difredistration XL进行广泛的评价,然后用新颖的波盘转换步骤显示,在先前的高级图像质量中,在高清晰度研究中,在高分辨率分析中,在高分辨率中,在前的图像中实现了高分辨率分析中,在高分辨率分析中,在高端研究中,在高端研究中,在高分辨率分析中实现了。
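The frequency split described above can be illustrated with a one-level 2D Haar transform: keep the low-frequency band of the base image and take the high-frequency bands from the current sample. This is a simplified sketch on a single-channel array; the choice of Haar wavelets, the single decomposition level, and the function names are assumptions, and the selective guidance of high frequencies during sampling is not modeled.

```python
import numpy as np

def haar2d(x):
    """One-level 2D Haar transform of an array with even height and width."""
    lo = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical average
    hi = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical detail
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d."""
    h, w = ll.shape
    lo = np.empty((h, 2 * w))
    hi = np.empty((h, 2 * w))
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = lo + hi, lo - hi
    return x

def mix_frequencies(base, sample):
    """Low-frequency band from `base`, high-frequency bands from `sample`."""
    ll_base, _, _, _ = haar2d(base)
    _, lh, hl, hh = haar2d(sample)
    return ihaar2d(ll_base, lh, hl, hh)
```

Because the low-frequency band fixes the coarse layout, the result keeps the base image's global structure while the detail bands are free to change, which is the intuition behind avoiding duplicated objects at high resolution.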
Article 267
Title@2025-06-25 (3): Automatic Demonstration Selection for LLM-based Tabular Data Classification
Title: Automatic Demonstration Selection for LLM-based Tabular Data Classification | Automatische Demonstrationsauswahl für LLM-basierte Tabellendatenklassifikation | 以LLM为基础的表格数据分类的自动演示选择 2506.20451v1 |
Authors (2): Shuchu Han, Wolfgang Bruckner
A fundamental question in applying In-Context Learning (ICL) for tabular data classification is how to determine the ideal number of demonstrations in the prompt. This work addresses this challenge by presenting an algorithm to automatically select a reasonable number of required demonstrations. Our method distinguishes itself by integrating not only the tabular data’s distribution but also the user’s selected prompt template and the specific Large Language Model (LLM) into its estimation. Rooted in Spectral Graph Theory, our proposed algorithm defines a novel metric to quantify the similarities between different demonstrations. We then construct a similarity graph and analyze the eigenvalues of its Laplacian to derive the minimum number of demonstrations capable of representing the data within the LLM’s intrinsic representation space. We validate the efficacy of our approach through experiments comparing its performance against conventional random selection algorithms on diverse datasets and LLMs.
在表格数据分类中应用In-Contlearning(ICL)的一个基本问题是,如何确定快速演示的理想数量。这项工作通过提供自动选择合理数量所需演示的算法来应对这一挑战。我们的方法不仅通过将表格数据分布,而且通过将用户选定的快速模板和特定的大语言模型(LLM)纳入估计而有所区别。我们的拟议算法以光谱图理论为基础,定义了一种新颖的衡量标准,以量化不同演示之间的相似性。我们随后绘制了一个相似性图,并分析了其拉帕拉西语的egen值,以得出能够代表LLM内在代表空间内数据的最低演示量。我们通过对不同数据集和LLMS的常规随机选择算法进行比较,验证了我们方法的有效性。
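The spectral step can be sketched as follows: build the graph Laplacian of a demonstration similarity matrix and read a count off its eigenvalues. The abstract specifies neither the similarity metric nor the rule that maps eigenvalues to a number of demonstrations, so this sketch takes the similarity matrix as given and uses the common eigengap heuristic as a stand-in; `estimate_num_demos` is a hypothetical name.

```python
import numpy as np

def estimate_num_demos(similarity):
    """Estimate a demonstration count from a similarity graph.

    similarity: (n, n) array of non-negative pairwise similarities.
    Uses the unnormalized Laplacian L = D - W and returns the position of
    the largest gap between consecutive eigenvalues (eigengap heuristic),
    which approximates the number of well-separated groups in the graph.
    """
    w = np.asarray(similarity, dtype=float)
    w = (w + w.T) / 2.0          # enforce symmetry
    np.fill_diagonal(w, 0.0)     # ignore self-similarity
    laplacian = np.diag(w.sum(axis=1)) - w
    eigenvalues = np.linalg.eigvalsh(laplacian)  # ascending order
    gaps = np.diff(eigenvalues)
    return int(np.argmax(gaps)) + 1
```

With two tight groups of mutually similar demonstrations and no similarity between the groups, the Laplacian has two near-zero eigenvalues followed by a jump, so the function returns 2.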
Article 268
Title@2025-06-25 (3): Image Super-Resolution with Guarantees via Conformalized Generative Models
Title: Image Super-Resolution with Guarantees via Conformalized Generative Models | Bild Super-Resolution mit Garantien über konformisierte Generative Modelle | 图像超级分辨率,通过正规化创制模型提供保障 2502.09664v2 |
Authors (3): Eduardo Adame, Daniel Csillag, Guilherme Tegoni Goedert
The increasing use of generative ML foundation models for image restoration tasks such as super-resolution calls for robust and interpretable uncertainty quantification methods. We address this need by presenting a novel approach based on conformal prediction techniques to create a ‘confidence mask’ capable of reliably and intuitively communicating where the generated image can be trusted. Our method is adaptable to any black-box generative model, including those locked behind an opaque API, requires only easily attainable data for calibration, and is highly customizable via the choice of a local image similarity metric. We prove strong theoretical guarantees for our method that span fidelity error control (according to our local image similarity metric), reconstruction quality, and robustness in the face of data leakage. Finally, we empirically evaluate these results and establish our method’s solid performance.
在超分辨率等图像恢复任务中越来越多地使用基因化 ML 基础模型,这要求采用稳健和可解释的不确定性量化方法。我们通过提出基于一致预测技术的新颖方法来满足这一需要,以创建能够可靠和直觉地沟通生成图像的“信任面罩 ” , 从而能够可靠和直觉地沟通到可以信任生成图像的地方。 我们的方法可以适应任何黑盒的基因模型,包括那些被锁在不透明的 API 背后的基因模型,只需要容易获得的校准数据,并且通过选择本地图像相似度标准可以高度定制。 我们证明,在理论上提供了强有力的保障,用以控制(根据我们本地图像相似度衡量标准 ) , 重建质量, 以及面对数据泄漏时的稳健性。 最后,我们用经验来评估这些结果并确定我们的方法的可靠性能。
Article 269
Title@2025-06-25 (3): Méthode de quadrature pour les PINNs fondée théoriquement sur la hessienne des résiduels
Title: Méthode de quadrature pour les PINNs fondée théoriquement sur la hessienne des résiduels | Méthode de quadrature pour les PINNs Fondée théoriquement sur la hessienne des résiduels | 理论上基于残差海森矩阵的PINNs求积方法 2506.20441v1 |
Authors (5): Antoine Caradot, Rémi Emonet, Amaury Habrard, Abdel-Rahim Mezidi, Marc Sebban
Physics-informed Neural Networks (PINNs) have emerged as an efficient way to learn surrogate neural solvers of PDEs by embedding the physical model in the loss function and minimizing its residuals using automatic differentiation at so-called collocation points. Originally uniformly sampled, the choice of the latter has been the subject of recent advances leading to adaptive sampling refinements. In this paper, we propose a new quadrature method for approximating definite integrals based on the Hessian of the considered function, which we leverage to guide the selection of the collocation points during the training process of PINNs.
物理知情神经网络(PINNs)通过将物理模型嵌入损失功能并使用所谓的合用点的自动区别来尽量减少其残渣,已成为学习PDE的替代神经溶液的有效方法。最初统一抽样,后者的选择是最近取得适应性取样改进的进展的主题。在本文件中,我们提出了一种基于考虑功能的黑手党的近似固定整体体的新的二次方位方法,我们利用这种方法指导在PINNs培训过程中选择合用点。
Article 270
Title@2025-06-25 (3): Tackling Data Heterogeneity in Federated Learning through Knowledge Distillation with Inequitable Aggregation
Title: Tackling Data Heterogeneity in Federated Learning through Knowledge Distillation with Inequitable Aggregation | Bekämpfung von Daten Heterogenität im Föderierten Lernen durch Wissensdestillation mit unwiderruflicher Aggregation | 通过知识蒸馏处理联邦学习中的数据异质性,以不平等的聚合方式进行知识蒸馏 2506.20431v1 |
Authors (1): Xing Ma
Federated learning aims to train a global model in a distributed environment that is close to the performance of centralized training. However, issues such as client label skew, data quantity skew, and other heterogeneity problems severely degrade the model’s performance. Most existing methods overlook the scenario where only a small portion of clients participate in training within a large-scale client setting, whereas our experiments show that this scenario presents a more challenging federated learning task. Therefore, we propose a Knowledge Distillation with teacher-student Inequitable Aggregation (KDIA) strategy tailored to address the federated learning setting mentioned above, which can effectively leverage knowledge from all clients. In KDIA, the student model is the average aggregation of the participating clients, while the teacher model is formed by a weighted aggregation of all clients based on three frequencies: participation intervals, participation counts, and data volume proportions. During local training, self-knowledge distillation is performed. Additionally, we utilize a generator trained on the server to generate approximately independent and identically distributed (IID) data features locally for auxiliary training. We conduct extensive experiments on the CIFAR-10/100/CINIC-10 datasets and various heterogeneous settings to evaluate KDIA. The results show that KDIA can achieve better accuracy with fewer rounds of training, and the improvement is more significant under severe heterogeneity.
联邦学习的目的是在接近集中培训业绩的分布环境中培训一个全球模式,然而,客户标签、数据数量、数据数量、其他异质问题等问题严重削弱了模式的绩效。大多数现有方法忽略了只有一小部分客户在大型客户环境中参与培训的情景,而我们的实验表明,这种情景是一个更具挑战性的联结学习任务。因此,我们提议与师生不均等聚合(KDIA)战略一起进行知识蒸馏,专门针对上述联合学习设置,这可以有效地利用所有客户的知识。在KDIA,学生模式是参与客户的平均组合,而教师模式则根据三个频率(参与间隔、参与计数和数据量比例)对所有客户进行加权组合。在当地培训期间,进行自我知识蒸馏。此外,我们利用在服务器上培训过的发电机在当地生成了大约独立和相同的分布数据特征,用于辅助培训。我们在CIFAR-10/10/CIDI中进行广泛的实验,在KFARA-10/10/NIDI中,以更精确的方式进行更精确地改进了KFAR-10/NIIS-10的模型。我们用更精确地评估了KA-10-10的模型,在更精确的模型下,能够更精确地改进。
Article 271
Title@2025-06-25 (3): Scalable Subset Selection in Linear Mixed Models
Title: Scalable Subset Selection in Linear Mixed Models | Skalierbare Subset-Auswahl in linearen gemischten Modellen | 线性混合模型中可缩放子集选择 2506.20425v1 |
Authors (3): Ryan Thompson, Matt P. Wand, Joanna J. J. Wang
Linear mixed models (LMMs), which incorporate fixed and random effects, are key tools for analyzing heterogeneous data, such as in personalized medicine or adaptive marketing. Nowadays, this type of data is increasingly wide, sometimes containing thousands of candidate predictors, necessitating sparsity for prediction and interpretation. However, existing sparse learning methods for LMMs do not scale well beyond tens or hundreds of predictors, leaving a large gap compared with sparse methods for linear models, which ignore random effects. This paper closes the gap with a new $\ell_0$ regularized method for LMM subset selection that can run on datasets containing thousands of predictors in seconds to minutes. On the computational front, we develop a coordinate descent algorithm as our main workhorse and provide a guarantee of its convergence. We also develop a local search algorithm to help traverse the nonconvex optimization surface. Both algorithms readily extend to subset selection in generalized LMMs via a penalized quasi-likelihood approximation. On the statistical front, we provide a finite-sample bound on the Kullback-Leibler divergence of the new method. We then demonstrate its excellent performance in synthetic experiments and illustrate its utility on two datasets from biology and journalism.
包含固定和随机效应的线性混合模型(LMMs)包含固定和随机效应,是分析多种数据的关键工具,如个性化医学或适应性营销。如今,这类数据越来越广泛,有时包含数千个候选预测器,需要大量预测和解释。然而,LMMs现有的稀疏学习方法没有超过几十或数百个预测器,与线性模型的稀疏方法相比,留下巨大差距,而线性模型的稀疏方法忽略了随机效应。本文用一个新的 $\ ell_0美元固定化方法填补了LMM子选择的缺口,该方法可以在包含数千个预测器的数据集上运行几秒钟到几分钟。在计算方面,我们开发了协调的下行算算法,作为我们的主要工作马,并提供其趋同的保证。我们还开发了一种本地搜索算法,以帮助绕过非convelx优化表面。这两种算法都很容易扩展为普通LMMMs的子筛选法,而忽略了随机效应。在统计方面,我们提供了一种定式的模模-模模集集集集集,以约束在新方法的库尔背-利差分差分差分差的分法上。然后展示了它与合成生物学的两极的功能。我们展示了它在合成生物学上。然后展示了它的极的精确性试验和精确性能。
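To show what a coordinate descent update looks like under an $\ell_0$ penalty, here is a sketch for the plain linear model, minimizing 0.5 * ||y - X b||^2 + lam * ||b||_0. It deliberately ignores random effects, so it is not the paper's LMM algorithm; it only illustrates the hard-thresholding step that such methods build on. The function name and iteration count are assumptions.

```python
import numpy as np

def l0_coordinate_descent(X, y, lam, n_iter=50):
    """Cyclic coordinate descent for L0-penalized least squares.

    For each coordinate j, the unpenalized optimum is rho / ||x_j||^2,
    where rho is the correlation of x_j with the partial residual.
    Keeping the coordinate lowers the squared-error term by
    rho^2 / (2 ||x_j||^2), so it is kept only if that exceeds lam.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            residual = residual + X[:, j] * beta[j]   # remove j's contribution
            rho = X[:, j] @ residual
            keep = rho ** 2 / (2.0 * col_sq[j]) > lam
            beta[j] = rho / col_sq[j] if keep else 0.0
            residual = residual - X[:, j] * beta[j]   # add it back
    return beta
```

Each update costs one pass over a column, which is what lets coordinate descent scale to thousands of predictors; a local search step, as mentioned in the abstract, would then try swapping variables in and out to escape poor local minima.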
Article 272
Title@2025-06-25 (3): Off-Policy Evaluation and Learning for the Future under Non-Stationarity
Title: Off-Policy Evaluation and Learning for the Future under Non-Stationarity | Off-Policy-Evaluierung und -Lernen für die Zukunft unter Nicht-Stationarität | 非政策性评价和在非标准化下学习未来 2506.20417v1 |
Authors (7): Tatsuhiro Shimizu, Kazuki Kawamura, Takanori Muroi, Yusuke Narita, Kei Tateno, Takuma Udagawa, Yuta Saito
We study the novel problem of future off-policy evaluation (F-OPE) and learning (F-OPL) for estimating and optimizing the future value of policies in non-stationary environments, where distributions vary over time. In e-commerce recommendations, for instance, our goal is often to estimate and optimize the policy value for the upcoming month using data collected by an old policy in the previous month. A critical challenge is that data related to the future environment is not observed in the historical data. Existing methods assume stationarity or depend on restrictive reward-modeling assumptions, leading to significant bias. To address these limitations, we propose a novel estimator named Off-Policy Estimator for the Future Value (OPFV), designed for accurately estimating policy values at any future time point. The key feature of OPFV is its ability to leverage the useful structure within time-series data. While future data might not be present in the historical log, we can leverage, for example, seasonal, weekly, or holiday effects that are consistent in both the historical and future data. Our estimator is the first to exploit these time-related structures via a new type of importance weighting, enabling effective F-OPE. Theoretical analysis identifies the conditions under which OPFV becomes low-bias. In addition, we extend our estimator to develop a new policy-gradient method to proactively learn a good future policy using only historical data. Empirical results show that our methods substantially outperform existing methods in estimating and optimizing the future policy value under non-stationarity for various experimental setups.
我们研究未来离政策评价(F-OPE)和学习(F-OPL)的新问题,以估计和优化非静止环境中政策的未来价值,因为非静止环境中的分布随时间而变化。例如,在电子商务建议中,我们的目标往往是使用旧政策在上个月收集的数据来估计和优化即将到来的月份的政策价值。一个严峻的挑战是历史数据中没有观察到与未来环境有关的数据。现有方法假定固定性或取决于限制性的薪酬模型模型假设,从而导致重大偏差。为了应对这些限制,我们提议了一个名为 extit_textb{Offff\ff\ textbf{P}O-licty Emlictial imator , 用于计算未来政策价值的新数字。 未来数据在历史模型中,我们可能无法持续地利用历史模型中的历史模型分析结果,我们只能从历史模型中获取新的数据,在历史模型中可以持续地利用新的数据模型,我们未来的模型,我们从历史模型中学习新的数据。
Article 273
Title@2025-06-25 (3): No Free Lunch: Rethinking Internal Feedback for LLM Reasoning
Title: No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Kein kostenloses Mittagessen: Internes Feedback für LLM Reasoning neu denken | 无免费午餐:重新思考LLM理由解释的内部反馈 2506.17219v2 |
Authors (9): Yanzhi Zhang, Zhaoxi Zhang, Haoxiang Guan, Yilin Cheng, Yitong Duan, Chen Wang, Yue Wang, Shuxin Zheng, Jiyan He
Reinforcement learning has emerged as a powerful paradigm for post-training large language models (LLMs) to improve reasoning. Approaches like Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) have shown strong results, but they require extensive external supervision. We investigate an alternative class of methods, Reinforcement Learning from Internal Feedback (RLIF), which relies solely on intrinsic model-derived signals instead of external rewards. In particular, we leverage unsupervised reward proxies such as token-level entropy, trajectory-level entropy, and self-certainty. Our theoretical analysis shows these internal objectives are partially equivalent, and we empirically evaluate various RLIF strategies on challenging math reasoning benchmarks. Experimental results demonstrate that RLIF can boost the reasoning performance of base LLMs in the early phase of training, matching or surpassing RLVR techniques on these tasks. However, as training progresses, performance degrades, eventually falling below that of the model before training. Moreover, we find that RLIF yields little improvement for instruction-tuned models, indicating diminishing returns of intrinsic feedback once an LLM is already instruction-tuned. We further analyze this limitation by mixing model weights and explain the reasons behind RLIF’s training behaviors, providing practical guidelines for integrating internal feedback signals into LLM training. We hope our analysis of internal feedback will inform more principled and effective strategies for LLM post-training.
强化学习是培训后大型语言模型(LLM)改进推理的有力范例。强化学习人类反馈(RLHF)和强化学习(RLVR)等方法已经显示出强有力的成果,但需要广泛的外部监督。我们调查了另一类方法,即强化学习内部反馈(RLIF),它完全依靠从内部模式获得的信号,而不是外部奖励。特别是,我们利用了无监督的奖赏,例如象征性级的指南、轨迹级的指南和自我确定。我们的理论分析显示,这些内部目标部分相当,我们从经验上评价各种关于挑战数学推理基准的RLIF战略。实验结果表明,RLIF在培训的初始阶段可以提高LM的推理性能,同时或超过RLFR的技巧。然而,在培训进展甚至低于培训之前的模型。此外,我们发现,RLIF在调整模型后,表明一旦LM的内部反馈的内在反馈将减少回馈,我们又将分析为LM的内部指示调整了方向。我们如何将分析。我们通过培训的进度分析。
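The internal reward proxies named in the abstract can be illustrated with a minimal sketch. It assumes per-step logits are available as plain lists, and it takes self-certainty to be the mean KL divergence from the uniform distribution to the model's next-token distribution, which is one common definition and may differ from the paper's exact formulation. The function names are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one step's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(probs):
    # Shannon entropy (in nats) of a single next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_uniform_to_p(probs, eps=1e-300):
    # KL(U || p): large when the model is far from uniform, i.e. confident.
    v = len(probs)
    return sum((1.0 / v) * math.log((1.0 / v) / max(p, eps)) for p in probs)

def internal_rewards(step_logits):
    # step_logits: one list of vocabulary logits per generated token.
    dists = [softmax(logits) for logits in step_logits]
    n = len(dists)
    mean_entropy = sum(token_entropy(p) for p in dists) / n
    self_certainty = sum(kl_uniform_to_p(p) for p in dists) / n
    # Both values are higher for more confident generations (assumed reward sign).
    return {"neg_token_entropy": -mean_entropy, "self_certainty": self_certainty}
```

Both signals reward confidence rather than correctness, which is consistent with the abstract's observation that gains appear early and then degrade.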
Article 274
Title@2025-06-25 (3): Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning
Title: Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning | Client Clustering trifft auf Wissensaustausch: Verbesserung der Privatsphäre und Robustheit im personalisierten Peer-to-Peer-Learning | 客户端聚类与知识共享相结合:增强个性化点对点学习中的隐私性和鲁棒性 2506.20413v1 |
Authors (3): Mohammad Mahdi Maheri, Denys Herasymuk, Hamed Haddadi
The growing adoption of Artificial Intelligence (AI) in Internet of Things (IoT) ecosystems has intensified the need for personalized learning methods that can operate efficiently and privately across heterogeneous, resource-constrained devices. However, enabling effective personalized learning in decentralized settings introduces several challenges, including efficient knowledge transfer between clients, protection of data privacy, and resilience against poisoning attacks. In this paper, we address these challenges by developing P4 (Personalized, Private, Peer-to-Peer) – a method designed to deliver personalized models for resource-constrained IoT devices while ensuring differential privacy and robustness against poisoning attacks. Our solution employs a lightweight, fully decentralized algorithm to privately detect client similarity and form collaborative groups. Within each group, clients leverage differentially private knowledge distillation to co-train their models, maintaining high accuracy while ensuring robustness to the presence of malicious clients. We evaluate P4 on popular benchmark datasets using both linear and CNN-based architectures across various heterogeneity settings and attack scenarios. Experimental results show that P4 achieves 5% to 30% higher accuracy than leading differentially private peer-to-peer approaches and maintains robustness with up to 30% malicious clients. Additionally, we demonstrate its practicality by deploying it on resource-constrained devices, where collaborative training between two clients adds only ~7 seconds of overhead.
互联网物质(IoT)生态系统越来越多地采用人工智能(AI),这更加需要个人化的学习方法,这种方法可以在多种不同、资源受限制的装置之间高效和私下运作。然而,在分散环境中,使个人化的有效学习在分散环境中带来若干挑战,包括客户之间高效的知识转让、保护数据隐私和抵御中毒袭击的复原力。在本文件中,我们通过开发P4(个人化、私人、同侪/Peer)来应对这些挑战,这一方法旨在为资源受限制的IoT装置提供个性化模型,同时确保对中毒袭击的隐私和稳健性有差异。我们的解决方案使用一种轻量级、完全分散的算法,以私人检测客户的相似性并形成协作团体。在每一个群体中,客户利用差异性私人知识蒸馏来共同培养其模型,同时保持高度的准确性,同时确保恶意客户的存在。我们利用基于线性和CNN的架构和攻击情景来评估流行的基准数据集的P4。实验结果表明,P4的准确性比领先的私人客户的近30%的近似点性平价两秒。
Article 275
Title@2025-06-25 (3): POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes
Title: POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes | POLAR: Pessimistisches modellbasiertes politisches Lernen Algorithmen für dynamische Behandlungssysteme | POLAR: 动态治疗制度基于政策学习模型的悲观模型 2506.20406v1 |
Authors (5): Ruijia Zhang, Zhengling Qi, Yue Wu, Xiangyu Zhang, Yanxun Xu
Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on strong positivity assumptions and lack robustness under partial data coverage, while offline reinforcement learning approaches typically focus on average training performance, lack statistical guarantees, and require solving complex optimization problems. To address these challenges, we propose POLAR, a novel pessimistic model-based policy learning algorithm for offline DTR optimization. POLAR estimates the transition dynamics from offline data and quantifies uncertainty for each history-action pair. A pessimistic penalty is then incorporated into the reward function to discourage actions with high uncertainty. Unlike many existing methods that focus on average training performance, POLAR directly targets the suboptimality of the final learned policy and offers theoretical guarantees, without relying on computationally intensive minimax or constrained optimization procedures. To the best of our knowledge, POLAR is the first model-based DTR method to provide both statistical and computational guarantees, including finite-sample bounds on policy suboptimality. Empirical results on both synthetic data and the MIMIC-III dataset demonstrate that POLAR outperforms state-of-the-art methods and yields near-optimal, history-aware treatment strategies.
动态处理制度(DTRs)提供了一个原则性框架,用以优化在决定必须随着时间变化而适应个人轨迹的领域,如保健、教育和数字干预等领域的顺序决策;然而,现有的统计方法往往依赖强烈的假设假设,在部分数据覆盖范围下缺乏稳健性,而离线强化学习方法通常侧重于平均培训业绩,缺乏统计保障,需要解决复杂的优化问题;为应对这些挑战,我们提议POLAR,这是一个新的悲观的离线DTR优化模式政策学习算法。POLAR估计离线数据过渡动态,并量化每个历史行动对应方的不确定性。然后,将悲观惩罚纳入奖励功能,以阻止高度不确定性的行动。与许多侧重于平均培训业绩的现有方法不同,POLAR直接针对最后学习政策的亚优性,并提供理论保证,而不必依赖计算密集的微量或有限的优化程序。据我们所知,POLAR是第一个同时提供统计保证和计算保证的基于模型的DTR方法,包括关于策略次优性的有限样本界。在合成数据和MIMIC-III数据集上的实证结果表明,POLAR优于最先进的方法,并能得到接近最优、考虑历史信息的治疗策略。
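The pessimistic penalty described in the abstract can be sketched as a reward adjustment of the form r(h, a) minus c times u(h, a), where u is an uncertainty measure for the history-action pair. The abstract does not state how POLAR quantifies uncertainty, so the count-based term 1/sqrt(n) below is an assumption chosen for illustration, as are the function and parameter names.

```python
import math
from collections import Counter

def pessimistic_rewards(transitions, penalty_scale=1.0):
    # transitions: iterable of (history, action, reward) tuples from offline data.
    counts = Counter()
    reward_sums = Counter()
    for history, action, reward in transitions:
        counts[(history, action)] += 1
        reward_sums[(history, action)] += reward

    penalized = {}
    for key, n in counts.items():
        mean_reward = reward_sums[key] / n
        # Assumed uncertainty proxy: rarely observed pairs are less certain.
        uncertainty = 1.0 / math.sqrt(n)
        penalized[key] = mean_reward - penalty_scale * uncertainty
    return penalized
```

With this adjustment, an action that looks slightly better on a single observation can rank below a well-covered action, which is the behavior the penalty is meant to encourage under partial data coverage.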
Article 276
Title@2025-06-25 (3): scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection
Title: scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection | scMamba: Ein skalierbares Foundation-Modell für die Single-Cell-Multi-Omics-Integration jenseits einer sehr variablen Feature-Auswahl | scMamba:一个超越高可变地物选择的单细胞多有机集成的可缩放基础模型 2506.20697v1 |
Authors (4): Zhen Yuan, Shaoqing Jiao, Yihang Xiao, Jiajie Peng
The advent of single-cell multi-omics technologies has enabled the simultaneous profiling of diverse omics layers within individual cells. Integrating such multimodal data provides unprecedented insights into cellular identity, regulatory processes, and disease mechanisms. However, it remains challenging, as current methods often rely on selecting highly variable genes or peaks during preprocessing, which may inadvertently discard crucial biological information. Here, we present scMamba, a foundation model designed to integrate single-cell multi-omics data without the need for prior feature selection while preserving genomic positional information. scMamba introduces a patch-based cell tokenization strategy that treats genomic regions as words (tokens) and cells as sentences. Building upon the concept of state space duality, scMamba distills rich biological insights from high-dimensional, sparse single-cell multi-omics data. Additionally, our novel contrastive learning approach, enhanced with cosine similarity regularization, enables superior alignment across omics layers compared to traditional methods. Systematic benchmarking across multiple datasets demonstrates that scMamba significantly outperforms state-of-the-art methods in preserving biological variation, aligning omics layers, and enhancing key downstream tasks such as clustering, cell type annotation, and trajectory inference. Our findings position scMamba as a powerful tool for large-scale single-cell multi-omics integration, capable of handling large-scale atlases and advancing biological discovery.
单细胞多组技术的出现,使各个细胞内不同基因层的同步特征得以同时剖析。这种多式联运数据对细胞特征、监管过程和疾病机制提供了史无前例的洞察力。然而,它仍然具有挑战性,因为目前的方法往往依赖在预处理过程中选择高度可变基因或峰值,这可能会无意中丢弃重要的生物信息。在这里,我们提出了ScMamba,这是一个基础模型,旨在整合单细胞多组数据,而无需事先选择特征,同时保存基因组位置信息。scMamba引入了基于补丁细胞代号的战略,将基因组区域作为单词(当量)和细胞作为句子处理。基于国家空间二元性概念,ScMamba从高维、稀疏的单细胞多组数据中提取丰富的生物洞察力。此外,我们的新颖的对比学习方法,在保存基因组定位的同时,能够优于基因组层次与传统方法。 跨多个数据集的系统基准显示,将基因组区域作为单组(当量)的文字(当量)和细胞组作为词组的词组化的分流,在生物级的大型集中,在生物级的大型集中,将巨型集中,将巨型的集中,将生物级研究中,将巨型集的底研究,将生物分级研究。
Article 277
Title@2025-06-25 (3): Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking
Title: Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking | Ausbeuten von leichten Hierarchischen ViT und Dynamic Framework für effizientes visuelles Tracking | 利用轻量轻级高压静电和高效视觉跟踪动态框架 2506.20381v1 |
Authors (6): Ben Kang, Xin Chen, Jie Zhao, Chunjuan Bo, Dong Wang, Huchuan Lu
Transformer-based visual trackers have demonstrated significant advancements due to their powerful modeling capabilities. However, their practicality is limited on resource-constrained devices because of their slow processing speeds. To address this challenge, we present HiT, a novel family of efficient tracking models that achieve high performance while maintaining fast operation across various devices. The core innovation of HiT lies in its Bridge Module, which connects lightweight transformers to the tracking framework, enhancing feature representation quality. Additionally, we introduce a dual-image position encoding approach to effectively encode spatial information. HiT achieves an impressive speed of 61 frames per second (fps) on the NVIDIA Jetson AGX platform, alongside a competitive AUC of 64.6% on the LaSOT benchmark, outperforming all previous efficient trackers. Building on HiT, we propose DyHiT, an efficient dynamic tracker that flexibly adapts to scene complexity by selecting routes with varying computational requirements. DyHiT uses search area features extracted by the backbone network and inputs them into an efficient dynamic router to classify tracking scenarios. Based on the classification, DyHiT applies a divide-and-conquer strategy, selecting appropriate routes to achieve a superior trade-off between accuracy and speed. The fastest version of DyHiT achieves 111 fps on NVIDIA Jetson AGX while maintaining an AUC of 62.4% on LaSOT. Furthermore, we introduce a training-free acceleration method based on the dynamic routing architecture of DyHiT. This method significantly improves the execution speed of various high-performance trackers without sacrificing accuracy. For instance, our acceleration method enables the state-of-the-art tracker SeqTrack-B256 to achieve a 2.68 times speedup on an NVIDIA GeForce RTX 2080 Ti GPU while maintaining the same AUC of 69.9% on LaSOT.
以变压器为基础的视觉跟踪器由于其强大的建模能力而展示了显著的进步。但是,由于处理速度缓慢,其实用性在资源限制的装置上有限。为了应对这一挑战,我们展示了HiT,这是一套新型高效跟踪模型,在保持各种装置的快速运行的同时实现了高性能。HiT的核心创新在于其桥梁模块,它将轻量变压器与跟踪框架连接起来,提高了特征显示质量。此外,我们引入了一种双重图像定位编码方法,以有效编码空间信息。HiT在NVIDIA Jetson AGX平台上达到每秒61帧(fps)的速度,并在LaSOT基准上取得64.6%的AUC,优于以往所有高效跟踪器。在HiT的基础上,我们提出DyHiT,这是一种高效的动态跟踪器,通过选择不同计算要求的路径来灵活适应场景的复杂性。DyHiT使用由主干网络提取的搜索区域特征,并将其输入一个高效的动态路由器,对跟踪场景进行分类。基于分类结果,DyHiT采用分而治之的策略,选择合适的路径,在精度与速度之间取得更优的平衡。最快版本的DyHiT在NVIDIA Jetson AGX上达到111 fps,同时在LaSOT上保持62.4%的AUC。此外,我们基于DyHiT的动态路由架构提出了一种无需训练的加速方法,可在不牺牲精度的情况下显著提升多种高性能跟踪器的运行速度。例如,该方法使最先进的跟踪器SeqTrack-B256在NVIDIA GeForce RTX 2080 Ti GPU上实现2.68倍加速,同时在LaSOT上保持69.9%的AUC。
Article 278
Title@2025-06-25 (3): TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
Title: TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis | TESSERA: Temporale Einbettungen von Oberflächenspektren für die Darstellung und Analyse der Erde | TESSERA:用于地球代表和分析的地平面表面表层实时嵌入 2506.20380v1 |
Authors (13): Zhengpeng Feng, Sadiq Jaffer, Jovana Knezevic, Silja Sormunen, Robin Young, Madeline Lisaius, Markus Immitzer, James Ball, Clement Atzberger, David A. Coomes, Anil Madhavapeddy, Andrew Blake, Srinivasan Keshav
Satellite remote sensing (RS) enables a wide array of downstream Earth observation (EO) applications, including climate modeling, carbon accounting, and strategies for conservation and sustainable land use. We present TESSERA, a novel Remote Sensing Foundation Model (RSFM) that uses Self-Supervised Learning (SSL) to generate global, robust representations at 10m scale from pixel-level satellite time series data. TESSERA combines information from only optical and SAR data streams using two parallel Transformer-based encoders: one dedicated to Sentinel-1 SAR polarizations and another to Sentinel-2 MSI data (10 selected spectral bands) to create representations that are then fused using a multilayer perceptron (MLP), resulting in a global representation map covering the years 2017 to 2024. Our precomputed representations set a new state-of-the-art performance benchmark and our open-source approach democratizes access to high-performance, high-resolution representations. We benchmark the performance of TESSERA in five diverse tasks, comparing our work with state-of-the-art task-specific models and other foundation models. Our results show that TESSERA outperforms both traditional RS baselines and the leading geospatial foundation models in these diverse downstream tasks.
卫星遥感使下游地球观测(EO)应用广泛多样,包括气候建模、碳核算以及养护和可持续土地使用的战略。我们介绍TESSERA,这是利用自学学习(SSL)的新型遥感基金会模型,利用像素级卫星时间序列数据,在10米范围内生成全球强势代表。TESSERA利用两个平行的变异器模型,将光学和合成孔径雷达数据流中的信息合并在一起:一个专门用于Sentinel-1合成孔径雷达两极化,另一个专门用于Sentinel-2MSI数据(10个选定的光谱带),以便建立代表机构,然后用多层透视器(MLP)进行整合,从而形成2017-2024年的全球代表性地图。我们预先制作的演示为一个新的最新业绩基准和我们的开放源方法,使高性能高分辨率演示的获取民主化。我们将TESERRA在五项不同任务中的业绩基准,将我们的工作与最新任务特定模型和其他基础模型(10个选定的光谱带段)进行比较,我们的成果显示下游空间基础的基础。
Article 279
Title@2025-06-25 (3): WyckoffDiff – A Generative Diffusion Model for Crystal Symmetry
Title: WyckoffDiff – A Generative Diffusion Model for Crystal Symmetry | WyckoffDiff - ein generatives Diffusionsmodell für die Kristallsymmetrie | WycccoffDiff – – 水晶对称生成扩散模型 2502.06485v3 |
Authors (6): Filip Ekström Kelvinius, Oskar B. Andersson, Abhijith S. Parackal, Dong Qian, Rickard Armiento, Fredrik Lindsten
Crystalline materials often exhibit a high level of symmetry. However, most generative models do not account for symmetry, but rather model each atom without any constraints on its position or element. We propose a generative model, Wyckoff Diffusion (WyckoffDiff), which generates symmetry-based descriptions of crystals. This is enabled by considering a crystal structure representation that encodes all symmetry, and we design a novel neural network architecture which enables using this representation inside a discrete generative model framework. In addition to respecting symmetry by construction, the discrete nature of our model enables fast generation. We additionally present a new metric, Fréchet Wrenformer Distance, which captures the symmetry aspects of the materials generated, and we benchmark WyckoffDiff against recently proposed generative models for crystal generation. As a proof-of-concept study, we use WyckoffDiff to find new materials below the convex hull of thermodynamical stability.
晶体材料往往表现出高度的对称性。 然而,大多数基因模型并不反映对称性,而是每个原子的模型,其位置或元素不受任何限制。我们提议了一个基因模型,Wyckoff Difmission(Wyckoff Diff),该模型生成基于对称性的晶体描述。这是通过考虑一种晶体结构表示法,将所有对称性都编码起来,我们设计了一个新型的神经网络结构,使得能够在离散的基因模型框架内使用这种表示法。除了通过建筑来尊重对称性外,我们模型的离散性质也能够快速生成。我们还提出了一个新的指标,Fr'echet Wrencreform,它捕捉了所生成材料的对称性方面,我们根据最近提议的晶体生成的基因模型对WyckoffDiff进行基准。作为概念的验证研究,我们使用WyckoffDiff在热动力稳定层下面找到新的材料。
Article 280
Title@2025-06-25 (3): Chemical knowledge-informed framework for privacy-aware retrosynthesis learning
Title: Chemical knowledge-informed framework for privacy-aware retrosynthesis learning | Chemischer wissensbasierter Rahmen für datenschutzbewusstes Retrosynthese-Lernen | 以化学知识为基础的隐私意识复后学习框架 2502.19119v2 |
Authors (6): Guikun Chen, Xu Zhang, Xiaolin Hu, Yong Liu, Yi Yang, Wenguan Wang
Chemical reaction data is a pivotal asset, driving advances in competitive fields such as pharmaceuticals, materials science, and industrial chemistry. Its proprietary nature renders it sensitive, as it often includes confidential insights and competitive advantages organizations strive to protect. However, in contrast to this need for confidentiality, the current standard training paradigm for machine learning-based retrosynthesis gathers reaction data from multiple sources into one single edge to train prediction models. This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries and frequent data transmission between entities, potentially exposing proprietary information to unauthorized access or interception during storage and transfer. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models. CKIF enables distributed training across multiple chemical organizations without compromising the confidentiality of proprietary reaction data. Instead of gathering raw reaction data, CKIF learns retrosynthesis models through iterative, chemical knowledge-informed aggregation of model parameters. In particular, the chemical properties of predicted reactants are leveraged to quantitatively assess the observable behaviors of individual models, which in turn determines the adaptive weights used for model aggregation. On a variety of reaction datasets, CKIF outperforms several strong baselines by a clear margin.
化学反应数据是一个关键资产,它推动在药品、材料科学和工业化学等竞争领域取得进展,其专有性质使其敏感,因为它往往包括机密的洞察力和竞争优势组织努力保护。然而,与这种保密需要相反,目前机器学习复式合成系统的标准培训模式将反应数据从多种来源收集到单一边缘,以训练预测模型。这一模式带来相当大的隐私风险,因为它需要跨组织边界广泛提供数据和各实体之间经常传送数据,有可能将专利信息暴露于未经授权的获取或截获中。在本研究中,我们引入了化学知识知情框架(CKIF),这是学习复式合成模型的一种隐私保护方法。CKIF能够将培训分散到多个化学组织,而不损害自有反应数据的保密性。CKIF不是收集原始反应数据,而是通过对模型参数进行反复、化学知识知情的汇总来学习复用模型模型模型。特别是预测反应者的化学特性被用于定量评估单个模型的可观测行为,从而反过来确定用于模型的强度比值。
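The aggregation step described in the abstract, where per-model adaptive weights determine how parameters are combined, can be sketched as a weighted average. The abstract does not say how CKIF turns the chemical properties of predicted reactants into weights, so the `scores` argument below is a hypothetical stand-in for that assessment, and the flat-list parameter layout is an assumption made to keep the example self-contained.

```python
def aggregate_parameters(client_params, scores):
    # client_params: list of dicts mapping a parameter name to a flat list of floats.
    # scores: one non-negative quality score per client (hypothetical input).
    if len(client_params) != len(scores):
        raise ValueError("need exactly one score per client")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    weights = [s / total for s in scores]

    aggregated = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted average of each coordinate across clients.
        aggregated[name] = [
            sum(w * params[name][i] for w, params in zip(weights, client_params))
            for i in range(length)
        ]
    return aggregated
```

Only parameters and scores are exchanged in this sketch; raw reaction data never leaves a client, which is the privacy property the abstract emphasizes.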
Article 281
Title@2025-06-25 (3): InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking
Title: InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking | InvZW: Invariantes Feature-Lernen über Lärm-Adversarial-Training für robuste Bild-Null-Wasser-Markierung | InvZW:通过对强力图像零水标记的噪音 – – Adversarial培训进行不易变地物学习 2506.20370v1 |
Authors (2): Abdullah All Tanvir, Xin Zhong
This paper introduces a novel deep learning framework for robust image zero-watermarking based on distortion-invariant feature learning. As a zero-watermarking scheme, our method leaves the original image unaltered and learns a reference signature through optimization in the feature space. The proposed framework consists of two key modules. In the first module, a feature extractor is trained via noise-adversarial learning to generate representations that are both invariant to distortions and semantically expressive. This is achieved by combining adversarial supervision against a distortion discriminator and a reconstruction constraint to retain image content. In the second module, we design a learning-based multibit zero-watermarking scheme where the trained invariant features are projected onto a set of trainable reference codes optimized to match a target binary message. Extensive experiments on diverse image datasets and a wide range of distortions show that our method achieves state-of-the-art robustness in both feature stability and watermark recovery. Comparative evaluations against existing self-supervised and deep watermarking techniques further highlight the superiority of our framework in generalization and robustness.
本文介绍了基于扭曲变化特性学习的稳健图像零水标记的新深层次学习框架。作为零水标记办法,我们的方法使原始图像未变换,并通过优化地物空间学习参考签名。拟议框架由两个关键模块组成。在第一个模块中,通过噪音对抗性学习,对地物提取器进行了培训,以产生既不会扭曲又不会以语义表达的演示。这是通过结合对抗性监督来达到的。在第二个模块中,我们设计了一个基于学习的多位零水标记办法,将经过培训的易变特征投放到一套可培训的参考代码上,以优化匹配目标双元信息。关于不同图像数据集的广泛实验和广泛的扭曲表明,我们的方法在特性稳定性和水标记恢复方面都达到了最先进的强性。对照现有的自我监督和深水标记技术进行比较评估,进一步突出我们框架在一般化和稳健度方面的优势。
Article 282
Title@2025-06-25 (3): A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges
Title: A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges | Eine Umfrage zum Erklärbaren Verstärkungslernen: Konzepte, Algorithmen, Herausforderungen | 关于 “ 可解释的强化学习调查:概念、等级、挑战 “ 的调查 2211.06665v5 |
Authors (5): Yunpeng Qing, Shunyu Liu, Jie Song, Huiqiong Wang, Mingli Song
Reinforcement Learning (RL) is a popular machine learning paradigm where intelligent agents interact with the environment to fulfill a long-term goal. Driven by the resurgence of deep learning, Deep RL (DRL) has witnessed great success over a wide spectrum of complex control tasks. Despite the encouraging results achieved, the deep neural network-based backbone is widely regarded as a black box that keeps practitioners from trusting and employing trained agents in realistic scenarios where high security and reliability are essential. To alleviate this issue, a large body of literature has sought to shed light on the inner workings of intelligent agents, by constructing intrinsic interpretability or post-hoc explainability. In this survey, we provide a comprehensive review of existing works on eXplainable RL (XRL) and introduce a new taxonomy where prior works are clearly categorized into model-explaining, reward-explaining, state-explaining, and task-explaining methods. We also review and highlight RL methods that conversely leverage human knowledge to promote the learning efficiency and performance of agents, although methods of this kind are often overlooked in the XRL field. Some challenges and opportunities in XRL are discussed. This survey intends to provide a high-level summary of XRL and to motivate future research on more effective XRL solutions. Corresponding open source codes are collected and categorized at https://github.com/Plankson/awesome-explainable-reinforcement-learning.
强化学习(RL)是一种流行的机器学习模式,智能剂与环境互动,以实现长期目标。在深层次学习的重新抬头的驱动下,Deep RL(DRL)在一系列复杂的控制任务中取得了巨大成功。尽管取得了令人鼓舞的成果,但深神经网络骨干被广泛视为一个黑盒,它阻碍从业者在高度安全和可靠性至关重要的现实情景中信任并雇用训练有素的代理人。为了缓解这一问题,通过建立内在可解释性或后热度解释性,提出了大量文献,专门说明智能剂内部的运作情况。在这次调查中,我们全面审查了关于可扩展的RL(XRL)的现有工作,并引入了一个新的分类学,将先前的工作明确归类为示范解释、报酬解释、州解释和任务解释方法。我们还审查并突出了RL(RL)方法的开放性方法,在XRL(XRL)领域常常忽视这种方法,而这种方法往往被忽略。在XRL(XRL)领域对高层次的研究分类中,一些挑战和机会正在对XL(C)的分类进行更多的研究分类。
Article 283
Title@2025-06-25 (3): Self-Supervised Graph Learning via Spectral Bootstrapping and Laplacian-Based Augmentations
Title: Self-Supervised Graph Learning via Spectral Bootstrapping and Laplacian-Based Augmentations | Selbstüberwachtes Graphenlernen über Spektrale Bootstrapping- und Laplacian-basierte Augmentationen | 通过光谱推进和拉平板辅助和拉平板辅助增强学习自摄图像 2506.20362v1 |
Authors (2): Lorenzo Bini, Stephane Marchand-Maillet
We present LaplaceGNN, a novel self-supervised graph learning framework that bypasses the need for negative sampling by leveraging spectral bootstrapping techniques. Our method integrates Laplacian-based signals into the learning process, allowing the model to effectively capture rich structural representations without relying on contrastive objectives or handcrafted augmentations. By focusing on positive alignment, LaplaceGNN achieves linear scaling while offering a simpler, more efficient, self-supervised alternative for graph neural networks, applicable across diverse domains. Our contributions are twofold: we precompute spectral augmentations through max-min centrality-guided optimization, enabling rich structural supervision without relying on handcrafted augmentations, then we integrate an adversarial bootstrapped training scheme that further strengthens feature learning and robustness. Our extensive experiments on different benchmark datasets show that LaplaceGNN achieves superior performance compared to state-of-the-art self-supervised graph methods, offering a promising direction for efficiently learning expressive graph representations.
我们提出LaplaceGNN,这是一个新的自我监督的图形学习框架,它通过利用光谱靴子技术,绕过负面取样的需要。我们的方法将基于Laplacian的信号纳入学习过程,使模型能够有效捕捉丰富的结构性表述,而不必依赖对比性目标或手工制作的放大器。我们通过注重正面调整,LaplaGNN可以实现线性缩放,同时提供一个适用于不同领域的图形神经网络的更简单、更高效、自监督的替代方法。我们的贡献是双重的:我们预先通过最大限度的中心引导优化来计算光谱谱增强,在不依赖手工制作的放大器的情况下进行丰富的结构监督,然后我们整合一个能进一步加强特征学习和稳健的对抗性靴式培训计划。我们在不同的基准数据集上进行的广泛实验表明,LaplaceGNNNN比最先进的自我监督的图形方法取得更高的性能,为高效学习直观图形展示提供了有希望的方向。
Article 284
Title@2025-06-25 (3): Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach
Title: Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach | Auf dem Weg zu einer interpretierbaren und effizienten Feature-Auswahl in Trajektori-Datensätzen: Ein taxonomischer Ansatz | 走向在轨迹数据集中进行解释和高效地物选择:分类学方法 2506.20359v1 |
Authors (2): Chanuka Don Samarasinghage, Dhruv Gulabani
Trajectory analysis is not only about obtaining movement data, but it is also of paramount importance in understanding the pattern in which an object moves through space and time, as well as in predicting its next move. Due to the significant interest in the area, data collection has improved substantially, resulting in a large number of features becoming available for training and predicting models. However, this introduces a high-dimensionality-induced feature explosion problem, which reduces the efficiency and interpretability of the data, thereby reducing the accuracy of machine learning models. To overcome this issue, feature selection has become one of the most prevalent tools. Thus, the objective of this paper was to introduce a taxonomy-based feature selection method that categorizes features based on their internal structure. This approach classifies the data into geometric and kinematic features, further categorizing them into curvature, indentation, speed, and acceleration. The comparative analysis indicated that a taxonomy-based approach consistently achieved comparable or superior predictive performance. Furthermore, due to the taxonomic grouping, which reduces combinatorial space, the time taken to select features was drastically reduced. The taxonomy was also used to gain insights into what feature sets each dataset was more sensitive to. Overall, this study provides robust evidence that a taxonomy-based feature selection method can add a layer of interpretability, reduce dimensionality and computational complexity, and contribute to high-level decision-making. It serves as a step toward providing a methodological framework for researchers and practitioners dealing with trajectory datasets and contributing to the broader field of explainable artificial intelligence.
轨迹分析不仅涉及获取移动数据,而且对于了解一个物体在空间和时间中移动的模式以及预测其下一个移动也至关重要。由于对该领域的兴趣很大,数据收集工作已大有改进,导致为培训和预测模型提供了大量特征;然而,这带来了一个高维因素引发的特征爆炸问题,降低了数据的效率和可解释性,从而降低了机器学习模型的准确性。为了克服这一问题,特征选择已成为最流行的工具之一。因此,本文件的目的是引入基于分类法的特征选择方法,根据内部结构对特征进行分类。这种方法将数据分为几何和运动特征,进一步将其分为曲线、缩进、速度和加速等特征。比较分析表明,基于分类的方法始终可以达到可比性或高度的预测性。此外,由于分类组化,减少组合空间,选择特征所需的时间大大减少了。该分类法还被用来了解每个数据集对哪些特征集更为敏感。总体而言,本研究提供了有力证据,表明基于分类法的特征选择方法能够增加一层可解释性,降低维度和计算复杂度,并有助于高层决策。它是朝着为处理轨迹数据集的研究人员和从业人员提供方法框架迈出的一步,并为可解释人工智能这一更广泛的领域作出贡献。
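The split into kinematic and geometric feature groups can be made concrete with a small sketch. It assumes a trajectory is a list of 2D points sampled at a fixed interval `dt`, and it uses the signed turning angle between consecutive segments as a simple proxy for curvature. The function names and this choice of proxy are assumptions for illustration and are not taken from the paper.

```python
import math

def kinematic_features(points, dt=1.0):
    # Speed on each segment and acceleration between consecutive segments.
    speeds = [math.dist(p, q) / dt for p, q in zip(points, points[1:])]
    accels = [(v2 - v1) / dt for v1, v2 in zip(speeds, speeds[1:])]
    return speeds, accels

def turning_angles(points):
    # Geometric group: signed heading change at each interior point, in radians.
    angles = []
    for a, b, c in zip(points, points[1:], points[2:]):
        heading_in = math.atan2(b[1] - a[1], b[0] - a[0])
        heading_out = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the difference into [-pi, pi).
        delta = (heading_out - heading_in + math.pi) % (2 * math.pi) - math.pi
        angles.append(delta)
    return angles
```

Grouping features this way lets a selection procedure search over a handful of named groups instead of every individual feature combination, which is the reduction in combinatorial space the abstract refers to.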
Article 285
Title@2025-06-25 (3): A foundation model with multi-variate parallel attention to generate neuronal activity
Title: A foundation model with multi-variate parallel attention to generate neuronal activity | Ein Fundamentmodell mit multivariater paralleler Aufmerksamkeit zur Generierung neuronaler Aktivität | 具有多变量平行关注以产生神经活动的基础模型 2506.20354v1 |
Authors (5): Francesco Carzaniga, Michael Hersche, Abu Sebastian, Kaspar Schindler, Abbas Rahimi
Learning from multi-variate time-series with heterogeneous channel configurations remains a fundamental challenge for deep neural networks (DNNs), particularly in clinical domains such as intracranial electroencephalography (iEEG), where channel setups vary widely across subjects. In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. We use MVPA to build MVPFormer, a generative foundation model for human electrophysiology, trained to predict the evolution of iEEG signals across diverse subjects. To support this and future efforts by the community, we release the SWEC iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of recordings from heterogeneous clinical sources. MVPFormer leverages MVPA to achieve strong generalization across subjects, demonstrating expert-level performance in seizure detection and outperforming state-of-the-art Transformer baselines on our SWEC dataset and on the MAYO and FNUSA datasets. We further validate MVPA on standard time-series forecasting and classification tasks, where it matches or exceeds existing attention-based models. Together, our contributions establish MVPA as a general-purpose attention mechanism for heterogeneous time-series and MVPFormer as the first open-source, open-weights, and open-data iEEG foundation model with state-of-the-art clinical performance. The code is available at https://github.com/IBM/multi-variate-parallel-transformer. The SWEC iEEG dataset is available at https://mb-neuro.medical-blocks.ch/public_access/databases/ieeg/swec_ieeg.
从多变时间序列中学习,有多种不同的频道配置,这仍然是深层神经网络(DNNs)面临的一个根本性挑战,特别是在临床领域,如颅内脑电图(iEEG),各受试者的频道设置差异很大。在这项工作中,我们引入了多变平行关注(MVPA),这是一个全新的自我关注机制,可以分解内容、时间和空间关注,能够灵活、普遍和高效地模拟具有不同频道计数和配置的时间序列数据。我们使用MVPA来建立MVPFormer,这是一个用于人类电生理的生成式基础模型,受过训练可以预测iEEG信号在不同受试者中的演变。为了支持社区这一和未来的努力,我们发布了SWEC iEEG数据集,这是迄今为止最大的公开可用的iEEG数据集,包括来自不同临床来源的近10,000小时的录音。MVPFormer利用MVPA实现了跨受试者的强泛化能力,在癫痫发作检测中表现出专家级水平,并在我们的SWEC、MAYO和FNUSA数据集上优于最先进的Transformer基线。我们还在标准时间序列预测和分类任务上验证了MVPA,其表现与现有基于注意力的模型相当或更优。总之,我们的贡献确立了MVPA作为面向异构时间序列的通用注意力机制,以及MVPFormer作为首个开源、开放权重、开放数据且具有最先进临床性能的iEEG基础模型。代码见 https://github.com/IBM/multi-variate-parallel-transformer 。SWEC iEEG数据集见 https://mb-neuro.medical-blocks.ch/public_access/databases/ieeg/swec_ieeg 。
Article 286
Title@2025-06-25 (3): Backpropagation Through Time For Networks With Long-Term Dependencies
Title: Backpropagation Through Time For Networks With Long-Term Dependencies | Backpropagation durch die Zeit für Netzwerke mit langfristigen Abhängigkeiten | 长期依赖网络在时间上反向通信 2103.15589v3 |
Authors (2): George Bird, Maxim E. Polivoda
Backpropagation through time (BPTT) is a technique for updating the tuned parameters within recurrent neural networks (RNNs). Several attempts at creating such an algorithm have been made, including Nth Ordered Approximations and Truncated-BPTT. These methods approximate the backpropagation gradients under the assumption that the RNN only utilises short-term dependencies. This is an acceptable assumption to make for the current state of artificial neural networks. As RNNs become more advanced, a shift towards influence by long-term dependencies is likely. Thus, a new method for backpropagation is required. We propose using the ‘discrete forward sensitivity equation’ and a variant of it for single and multiple interacting recurrent loops respectively. This solution is exact and also allows the network’s parameters to vary between each subsequent step; however, it does require the computation of a Jacobian.
时间反向分析( BPTT) 是更新经常性神经网络( RNN) 中调控参数的一种技术。 已经尝试了几次建立这样的算法, 包括: Nth 顺序排列的近似和 转线- BPTT。 这些方法在假设 RNN 仅利用短期依赖性的情况下, 接近后向反向调整梯度。 这是为当前人工神经网络状况做出的一种可以接受的假设。 随着 RNNP 的不断进步, 向长期依赖性影响转变的可能性也有可能。 因此, 需要一种新的反向调整方法。 我们建议使用“ 分解的前方敏感方程” , 以及它用于单个和多个互动的重复循环的变方。 这个解决方案非常精确, 也允许网络参数在随后的每个步骤之间变化, 但它确实需要计算雅各布人 。
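The abstract names a discrete forward sensitivity equation without stating it. As a hedged illustration, the sketch below uses the generic forward-mode recursion for a scalar recurrence h_{t+1} = tanh(w h_t + x_t): the sensitivity s_t = dh_t/dw is carried forward alongside the state, so no truncation of long-term dependencies is needed. The scalar network, the function name, and the exact form of the recursion are assumptions and may differ from the paper's formulation.

```python
import math

def run_with_sensitivity(w, inputs, h0=0.0):
    # Returns the final hidden state and its exact derivative with respect to w.
    h = h0
    s = 0.0  # dh_0/dw is zero because h0 does not depend on w
    for x in inputs:
        h_next = math.tanh(w * h + x)
        # Chain rule: d/dw tanh(w*h + x) = (1 - tanh^2) * (h + w * dh/dw).
        # In the scalar case the Jacobian dh_next/dh is (1 - h_next**2) * w.
        s = (1.0 - h_next ** 2) * (h + w * s)
        h = h_next
    return h, s
```

In the multi-dimensional case the term multiplying the previous sensitivity becomes a full Jacobian matrix, which matches the computational cost the abstract mentions.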
Article 287
Title@2025-06-25 (3): DipSVD: Dual-importance Protected SVD for Efficient LLM Compression
Title: DipSVD: Dual-importance Protected SVD for Efficient LLM Compression | DipSVD: Dual-Importance Protected SVD für effiziente LLM-Kompression | DipSVD: 用于高效LLM压缩的双重重要性保护SVD 2506.20353v1 |
Authors (9): Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuanlong Xie, Yao Zhu
The ever-increasing computational demands and deployment costs of large language models (LLMs) have spurred numerous compression methods. Compared to quantization and unstructured pruning, SVD compression offers superior hardware compatibility and theoretical guarantees. However, existing SVD-based methods focus on the overall discrepancy between the original and compressed matrices while overlooking the protection of critical components within the matrix, which leads to inferior performance in the compressed models. This paper proposes a dual-level importance protection mechanism to enhance SVD-based compression methods: (1) local importance protection: preserving the most critical singular vectors within each weight matrix through channel-weighted data whitening; and (2) global importance protection: enabling less important layers to bear a greater portion of the compression burden through either a heuristic or optimization-based approach, thereby minimizing the impact of compression on critical layers. Extensive experiments demonstrate that DipSVD outperforms existing SVD-based compression approaches across multiple benchmarks, achieving superior model performance especially at high model compression ratios.
大型语言模型(LLMS)日益增长的计算要求和部署成本催生了许多压缩方法。与量化和非结构化的剪裁相比,SVD压缩提供了更好的硬件兼容性和理论保障。然而,基于SVD的现有方法侧重于原始矩阵和压缩矩阵之间的总体差异,而忽视了对矩阵内关键部件的保护,从而导致压缩模型的性能低下。本文件提议了一种双重重要保护机制,以加强基于SVD的压缩方法:(1) 当地重要性保护:通过频道加权数据白化来保护每个重量矩阵中最关键的单一矢量;(2) 全球重要性保护:使较不重要的层能够通过超重或优化的方法承担较大压缩负担,从而最大限度地减少压缩对关键层的影响。广泛的实验表明,DipSVD超越了现有的基于SVD的压缩方法,超越了多个基准,特别是在高模型压缩率上达到优等模型性能。
Article 288
Title@2025-06-25 (3): It’s not you, it’s me – Global urban visual perception varies across demographics and personalities
Title: It’s not you, it’s me – Global urban visual perception varies across demographics and personalities | Es sind nicht Sie, es bin ich – Die globale urbane visuelle Wahrnehmung variiert zwischen demographischen und Persönlichkeiten | 不是你,是我,全球城市的视觉感知 不同人口和个性的不同 2505.12758v2 |
Authors (8): Matias Quintana, Youlong Gu, Xiucheng Liang, Yujun Hou, Koichi Ito, Yihan Zhu, Mahmoud Abdelrahman, Filip Biljecki
Understanding people’s preferences and needs is crucial for urban planning decisions, yet current approaches often combine them from multi-cultural and multi-city populations, obscuring important demographic differences and risking amplifying biases. We conducted a large-scale urban visual perception survey of streetscapes worldwide using street view imagery, examining how demographics – including gender, age, income, education, race and ethnicity, and, for the first time, personality traits – shape perceptions among 1,000 participants, with balanced demographics, from five countries and 45 nationalities. This dataset, introduced as Street Perception Evaluation Considering Socioeconomics (SPECS), exhibits statistically significant differences in perception scores in six traditionally used indicators (safe, lively, wealthy, beautiful, boring, and depressing) and four new ones we propose (live nearby, walk, cycle, green) among demographics and personalities. We revealed that location-based sentiments are carried over in people’s preferences when comparing urban streetscapes with other cities. Further, we compared the perception scores based on where participants and streetscapes are from. We found that an off-the-shelf machine learning model trained on an existing global perception dataset tends to overestimate positive indicators and underestimate negative ones compared to human responses, suggesting that targeted intervention should consider locals’ perception. Our study aspires to rectify the myopic treatment of street perception, which rarely considers demographics or personality traits.
理解人们的偏好和需求对于城市规划决策至关重要,然而,目前的方法往往将它们与多元文化和多城市人口结合起来,掩盖重要的人口差异,并有可能扩大偏见。我们利用街头观景图像对世界各地的街头景象进行了大规模的城市视觉调查,并审查了人口结构 – – 包括性别、年龄、收入、教育、种族和族裔,以及首次对个性特征 – – 塑造了来自5个国家和45个国家的1 000名参与者的观念和平衡的人口结构。这个数据集是作为街头观感评价社会经济(SPECS)推出的,显示了六个传统上使用的指标(安全、活跃、富足、美丽、无聊和令人沮丧)和四个新的指标(在附近、步行、循环、绿色)在人口和个性之间对街头景象进行了大规模的城市景象调查。我们发现,在比较城市街道景象和其他城市的景象时,基于地点的情绪变化,我们比较了根据参与者和街头景象从何来的认知分数。我们发现,一个在统计现有全球观景象方面受过训练的离子学习的机器学习的分数模式,比较了我们现有的全球观感官的直观、直观数据往往会考虑我们对正估测。
Article 289
Title@2025-06-25 (3): On the ability of Deep Neural Networks to Learn Granger Causality in Multi-Variate Time Series Data
Title: On the ability of Deep Neural Networks to Learn Granger Causality in Multi-Variate Time Series Data | Über die Fähigkeit von Deep Neural Networks, Granger-Causalität in mehrstufigen Zeitreihendaten zu lernen | 关于深神经网络在多变时间序列数据中学习重力原因的能力 2506.20347v1 |
Authors (2): Malik Shahid Sultan, Hernando Ombao
Granger Causality (GC) offers an elegant statistical framework to study the association between multivariate time series data. Linear Vector Autoregressive (VAR) models have nice interpretation properties but limited practical application, owing to underlying assumptions on the kind of associations that these models can capture. Numerous attempts have already been made in the literature to exploit the functional approximation power of Deep Neural Networks (DNNs) for the task of GC estimation. These methods, however, treat GC as a variable selection problem. We present a novel paradigm for approaching GC. We present the idea that GC is essentially linked with prediction and that, if a deep learning model is used to model the time series collectively or jointly, a well-regularized model may learn the true Granger causal structure from the data, given that there is enough training data. We propose to uncover the learned GC structure by comparing the model uncertainty or distribution of the residuals when the past of everything is used with the case where a specific time series component is dropped from the model. We also compare the effect of input layer dropout on the ability of a neural network to learn Granger causality from the data. We show that a well-regularized model can in fact learn the true GC structure from the data without explicitly adding terms in the loss function that guide the model to select variables or perform sparse regression.
Granger Causality (GC) 提供了一个优雅的统计框架,用于研究多变时间序列数据之间的关联。线性矢量自回归模型(VAR)虽然具有很好的解释属性,但由于这些模型可以捕捉到的关联类型的基本假设,其实际应用有限。文献中已经做了许多尝试,利用深度神经网络(DNN)的函数近似能力来进行GC估计。但这些方法却把GC视为一个变量选择问题。我们为处理GC提出了一个新的范式。我们提出了这样一个想法,即GC本质上与预测相联系,如果使用深度学习模型对时间序列进行整体或联合建模,那么在训练数据充足的情况下,一个经过良好正则化的模型可能从数据中学习真实的Granger因果结构。我们提议通过比较模型的不确定性或残差的分布来揭示所学的GC结构,即把使用全部序列的过去的情形与从模型中删除特定时间序列分量的情形相比较。我们还比较了输入层dropout对神经网络从数据中学习Granger因果关系的能力的影响。我们表明,一个经过良好正则化的模型实际上可以从数据中学习真实的GC结构,而无需在损失函数中显式添加引导模型进行变量选择或稀疏回归的项。
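The comparison described above (residuals with every series' past versus residuals with one series dropped) can be illustrated in its classical linear form. This is a minimal sketch under stated assumptions, not the paper's deep-learning procedure: the data-generating process, the one-lag models without intercept, and the helper names `rss_restricted` and `rss_full` are all hypothetical.

```python
import random

random.seed(0)
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    # Hypothetical ground truth: x drives y with a one-step lag.
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0.0, 0.1))


def rss_restricted(target):
    """RSS of target_t = a*target_{t-1} + e_t (other series dropped)."""
    steps = range(1, len(target))
    num = sum(target[t] * target[t - 1] for t in steps)
    den = sum(target[t - 1] ** 2 for t in steps)
    a = num / den
    return sum((target[t] - a * target[t - 1]) ** 2 for t in steps)


def rss_full(target, other):
    """RSS of target_t = a*target_{t-1} + b*other_{t-1} + e_t (2x2 normal equations)."""
    steps = range(1, len(target))
    stt = sum(target[t - 1] ** 2 for t in steps)
    soo = sum(other[t - 1] ** 2 for t in steps)
    sto = sum(target[t - 1] * other[t - 1] for t in steps)
    bt = sum(target[t] * target[t - 1] for t in steps)
    bo = sum(target[t] * other[t - 1] for t in steps)
    det = stt * soo - sto ** 2
    a = (bt * soo - bo * sto) / det
    b = (bo * stt - bt * sto) / det
    return sum((target[t] - a * target[t - 1] - b * other[t - 1]) ** 2 for t in steps)


# Ratio well below 1: the past of x helps predict y (x Granger-causes y).
ratio_x_to_y = rss_full(y, x) / rss_restricted(y)
# Ratio close to 1: the past of y does not help predict x.
ratio_y_to_x = rss_full(x, y) / rss_restricted(x)
print(ratio_x_to_y, ratio_y_to_x)
```

A neural variant would replace the two least-squares fits with a jointly trained network and compare residual distributions or predictive uncertainty instead of raw sums of squares.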
Article 290
Title@2025-06-25 (3): Signatures of planets and Galactic subpopulations in solar analogs. Precise chemical abundances with neural networks
Title: Signatures of planets and Galactic subpopulations in solar analogs. Precise chemical abundances with neural networks | Signaturen von Planeten und galaktischen Subpopulationen in solaren Analogen. Präzise chemische Fülle mit neuronalen Netzwerken | 太阳模拟物中行星和银河子人口组的签名; 具有神经网络的精密化学丰度 2506.20345v1 |
Authors (4): Giulia Martos, Jorge Meléndez, Lorenzo Spina, Sara Lucatello
The aim of this work is to obtain precise atmospheric parameters and chemical abundances automatically for solar twins and analogs to find signatures of exoplanets, as well as to assess how peculiar the Sun is compared to these stars and to analyze any possible fine structures in the Galactic thin disk. We developed a neural network (NN) algorithm using Python to obtain these parameters for a sample of 99 solar twins and solar analogs previously studied in the literature from normalized high-quality spectra from HARPS, with a resolving power of $R \sim 115000$ and a signal-to-noise ratio S/N > 400. We obtained precise atmospheric parameters and abundance ratios [X/Fe] of 20 chemical elements (Li, C, O, Na, Mg, Al, Si, S, Ca, Sc, Ti, V, Cr, Mn, Co, Ni, Cu, Zn, Y, and Ba). The results are in line with the literature, with average differences and standard deviations of $(2 \pm 27)$ K for $T_{\rm eff}$, $(0.00 \pm 0.06)$ dex for $\log g$, $(0.00 \pm 0.02)$ dex for [Fe/H], $(-0.01 \pm 0.05)$ km s$^{-1}$ for the microturbulence velocity, $(0.02 \pm 0.08)$ km s$^{-1}$ for the macroturbulence velocity, and $(-0.12 \pm 0.26)$ km s$^{-1}$ for the projected rotational velocity ($v \sin i$). Regarding the chemical abundances, most of the elements agree with the literature within 0.01-0.02 dex. The abundances were corrected for the effects of Galactic chemical evolution and analyzed against the condensation temperature ($T_{\rm cond}$) to verify whether the stars presented depletion of refractories compared to volatiles. We found that the Sun is more depleted in refractory elements compared to volatiles than 89% of the studied solar analogs, with a significance of 9.5$\sigma$ when compared to the stars without detected exoplanets. We also found the possible presence of three subpopulations in the solar analogs: one Cu-rich, one Cu-poor, and the last one slightly older and poor in Na.
Article 291
Title@2025-06-25 (3): A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization
Title: A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization | Eine vollständige Verlustlandschaftsanalyse der Regularisierten Tiefenmatrix-Fabrikierung | 对正规化深母体因子化的全损全损地貌分析 2506.20344v1 |
Authors (3): Po Chen, Rujun Jiang, Peng Wang
Despite its wide range of applications across various domains, the optimization foundations of deep matrix factorization (DMF) remain largely open. In this work, we aim to fill this gap by conducting a comprehensive study of the loss landscape of the regularized DMF problem. Toward this goal, we first provide a closed-form expression of all critical points. Building on this, we establish precise conditions under which a critical point is a local minimizer, a global minimizer, a strict saddle point, or a non-strict saddle point. Leveraging these results, we derive a necessary and sufficient condition under which each critical point is either a local minimizer or a strict saddle point. This provides insights into why gradient-based methods almost always converge to a local minimizer of the regularized DMF problem. Finally, we conduct numerical experiments to visualize its loss landscape under different settings to support our theory.
尽管深度矩阵因子化(DMF)的应用范围很广,但深层矩阵因子化(DMF)的优化基础仍然基本开放。在这项工作中,我们的目标是通过全面研究正常的DMF问题的损失情况来填补这一空白。为了实现这一目标,我们首先提供所有关键点的封闭式表达方式。在此基础上,我们建立了精确的条件,使一个临界点成为局部最小化器、全球最小化器、严格的战壕点或非严格性马鞍点。利用这些结果,我们形成了一个必要和充分的条件,使每个临界点要么是局部最小化器,要么是严格的马鞍点。这提供了对基于梯度的方法为何几乎总是会与正常的DMF问题的地方最小化器汇合的洞穴。最后,我们进行了数字实验,以便在不同环境下对其损失情况进行直观化,以支持我们的理论。
Article 292
Title@2025-06-25 (3): Feature Hallucination for Self-supervised Action Recognition
Title: Feature Hallucination for Self-supervised Action Recognition | Feature Halluzination für die Selbstüberwachte Aktionserkennung | 自我监督行动承认的幻觉 2506.20342v1 |
Authors (2): Lei Wang, Piotr Koniusz
Understanding human actions in videos requires more than raw pixel analysis; it relies on high-level semantic reasoning and effective integration of multimodal features. We propose a deep translational action recognition framework that enhances recognition accuracy by jointly predicting action concepts and auxiliary features from RGB video frames. At test time, hallucination streams infer missing cues, enriching feature representations without increasing computational overhead. To focus on action-relevant regions beyond raw pixels, we introduce two novel domain-specific descriptors. Object Detection Features (ODF) aggregate outputs from multiple object detectors to capture contextual cues, while Saliency Detection Features (SDF) highlight spatial and intensity patterns crucial for action recognition. Our framework seamlessly integrates these descriptors with auxiliary modalities such as optical flow, Improved Dense Trajectories, skeleton data, and audio cues. It remains compatible with state-of-the-art architectures, including I3D, AssembleNet, Video Transformer Network, FASTER, and recent models like VideoMAE V2 and InternVideo2. To handle uncertainty in auxiliary features, we incorporate aleatoric uncertainty modeling in the hallucination step and introduce a robust loss function to mitigate feature noise. Our multimodal self-supervised action recognition framework achieves state-of-the-art performance on multiple benchmarks, including Kinetics-400, Kinetics-600, and Something-Something V2, demonstrating its effectiveness in capturing fine-grained action dynamics.
理解视频中的人类行动需要的不仅仅是原始像素分析; 它依赖于高层次的语义推理和多式联运特征的有效整合。 我们提出一个深层次的翻译行动识别框架, 通过共同预测 RGB 视频框架的行动概念和辅助特征来提高认知准确性。 在测试时, 幻觉流会推断出缺失的线索, 丰富特征表达方式而不增加计算管理。 要在原始像素之外关注与行动相关的区域, 我们引入两个新的域标注器。 目标检测特性( ODF) 综合输出来自多个目标探测器, 以捕捉背景提示, 而 萨林特( SDF) 突出对行动识别至关重要的空间和强度模式。 我们的框架将这些描述符与辅助模式( 如光学流、 改进的感官轨迹、 骨架数据、 音频提示等) 紧密结合。 它仍然与最先进的结构兼容, 包括 I3D、 AsmemblebleNet、 视频变异器网络、 FAFASTERT, 以及最近的一些模型, 如VMAE V2 和 Internvidedododoo2 等模型, 要处理对动作的不确定性, 要在辅助特性特征中处理中的不确定性, , 我们将这些描述中引入了一种稳性模型, 的模型化的模型, 的模型化的模型化的模型化模型化模型化模型化了。
Article 293
Title@2025-06-25 (3): Recurrent neural network-based robust control systems with closed-loop regional incremental ISS and application to MPC design
Title: Recurrent neural network-based robust control systems with closed-loop regional incremental ISS and application to MPC design | Recurrent neuronale netzwerkbasierte robuste Steuerungssysteme mit geschlossener, regionaler Inkrementelle ISS und Anwendung in MPC-Design | 经常性神经网络的稳健神经网络控制系统,带有闭环区域递增性国际空间站并应用于多氯三联苯设计 2506.20334v1 |
Authors (4): Daniele Ravasio, Marcello Farina, Alessio La Bella, Andrea Ballarino
This paper investigates the design of output-feedback schemes for systems described by a class of recurrent neural networks. We propose a procedure based on linear matrix inequalities for designing an observer and a static state-feedback controller. The algorithm leverages global and regional incremental input-to-state stability (incremental ISS) and enables the tracking of constant setpoints, ensuring robustness to disturbances and state estimation uncertainty. To address the potential limitations of regional incremental ISS, we introduce an alternative scheme in which the static law is replaced with a tube-based nonlinear model predictive controller (NMPC) that exploits regional incremental ISS properties. We show that these conditions enable the formulation of a robust NMPC law with guarantees of convergence and recursive feasibility, leading to an enlarged region of attraction. Theoretical results are validated through numerical simulations on the pH-neutralisation process benchmark, demonstrating the effectiveness of the proposed schemes.
本文调查了一组经常性神经网络所描述的系统的产出反馈计划的设计情况。我们提出了一个基于线性矩阵不平等的程序,用于设计观察员和静态国家反馈控制器。算法利用全球和区域递增投入到国家的稳定性(增加的国际空间站),并能够跟踪固定的定点,确保稳健的干扰和国家估计不确定性。为了解决区域递增国际空间站的潜在局限性,我们引入了一个替代方案,用一个基于管的、非线性模型预测控制器来取代静态法律,利用区域递增国际空间站的特性。我们表明,这些条件使得能够制定强有力的NMPC法律,保证趋同性和递归性可行性,导致吸引力的扩大。通过对pH-中立进程基准进行数字模拟,验证理论结果,表明拟议计划的有效性。
Article 294
Title@2025-06-25 (3): Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content
Title: Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content | Biomed-angereichert: Ein biomedizinischer Datensatz mit LLMs für die Vorschulung und Extraktion seltener und versteckter Inhalte | 生物医学富含生物医学:生物医学数据集,配有预培训和提取稀有和隐藏内容的LLMMs 2506.20331v1 |
Authors (3): Rian Touchent, Nathan Godey, Eric de la Clergerie
We introduce Biomed-Enriched, a biomedical text dataset constructed from PubMed via a two-stage annotation process. In the first stage, a large language model annotates 400K paragraphs from PubMed scientific articles, assigning scores for their type (review, study, clinical case, other), domain (clinical, biomedical, other), and educational quality. The educational quality score (rated 1 to 5) estimates how useful a paragraph is for college-level learning. These annotations are then used to fine-tune a small language model, which propagates the labels across the full PMC-OA corpus. The resulting metadata allows us to extract refined subsets, including 2M clinical case paragraphs with over 450K high-quality ones from articles with commercial-use licenses, and to construct several variants via quality filtering and domain upsampling. Clinical text is typically difficult to access due to privacy constraints, as hospital records cannot be publicly shared. Hence, our dataset provides an alternative large-scale, openly available collection of clinical cases from PubMed, making it a valuable resource for biomedical and clinical NLP. Preliminary continual-pretraining experiments with OLMo2 suggest these curated subsets enable targeted improvements, with clinical upsampling boosting performance by ~5% on MMLU ProfMed and educational quality filtering improving MedQA and MedMCQA by ~1%. Combinations of these techniques led to faster convergence, reaching same performance with a third of training tokens, indicating potential for more efficient and effective biomedical pretraining strategies.
我们提出 Biomed-Enriched,这是一个通过两阶段标注流程从 PubMed 构建的生物医学文本数据集。在第一阶段,一个大型语言模型对来自 PubMed 科学文章的 40 万个段落进行标注,为其类型(综述、研究、临床病例、其他)、领域(临床、生物医学、其他)和教育质量打分。教育质量分数(1 至 5 分)用于估计一个段落对大学水平学习的有用程度。随后,这些标注被用于微调一个小型语言模型,由它将标签传播到整个 PMC-OA 语料库。由此得到的元数据使我们能够提取精炼的子集,包括 200 万个临床病例段落,其中超过 45 万个高质量段落来自具有商业使用许可的文章,并通过质量过滤和领域上采样构建若干变体。由于隐私限制,医院记录无法公开共享,临床文本通常难以获取。因此,我们的数据集提供了另一种来自 PubMed 的大规模、公开可用的临床病例集合,使其成为生物医学和临床 NLP 的宝贵资源。使用 OLMo2 进行的初步持续预训练实验表明,这些精选子集能够带来有针对性的改进:临床上采样使 MMLU ProfMed 上的性能提升约 5%,教育质量过滤使 MedQA 和 MedMCQA 提升约 1%。这些技术的组合带来了更快的收敛,仅用三分之一的训练词元即可达到相同的性能,表明更高效、更有效的生物医学预训练策略具有潜力。
Article 295
Title@2025-06-25 (3): Representation Learning with Parameterised Quantum Circuits for Advancing Speech Emotion Recognition
Title: Representation Learning with Parameterised Quantum Circuits for Advancing Speech Emotion Recognition | Representatives Lernen mit parameterisierten Quantenkreisen zur Förderung der Sprachemotionserkennung | 代表制学习,与推进言语情感识别参数量子电路进行代表制学习 2501.12050v3 |
Authors (5): Thejan Rajapakshe, Rajib Rana, Farina Riaz, Sara Khalifa, Björn W. Schuller
Quantum machine learning (QML) offers a promising avenue for advancing representation learning in complex signal domains. In this study, we investigate the use of parameterised quantum circuits (PQCs) for speech emotion recognition (SER), a challenging task due to the subtle temporal variations and overlapping affective states in vocal signals. We propose a hybrid quantum-classical architecture that integrates PQCs into a conventional convolutional neural network (CNN), leveraging quantum properties such as superposition and entanglement to enrich emotional feature representations. Experimental evaluations on three benchmark datasets (IEMOCAP, RECOLA, and MSP-IMPROV) demonstrate that our hybrid model achieves improved classification performance relative to a purely classical CNN baseline, with over 50% reduction in trainable parameters. This work provides early evidence of the potential for QML to enhance emotion recognition and lays the foundation for future quantum-enabled affective computing systems.
量子机器学习(QML)为在复杂的信号领域推进代表性学习提供了一个有希望的渠道。在本研究中,我们调查了参数量子电路(PQCs)用于语音情感识别(SER)的用途,这是一项艰巨的任务,因为时间变化微妙,声音信号中具有相互重叠的情感状态。我们提议了一个混合量子古典结构,将PQCs纳入常规神经神经网络(CNN),利用量子特性,如叠加和缠绕来丰富情感特征表现。 对三套基准数据集IEMOCAP、RECOLA和MSP-IMPROV的实验性评估表明,我们的混合模型比纯古典CNN基线提高了分类性能,可训练参数减少了50%以上。这项工作早期地证明了QML提高情感认知的潜力,并为未来量子化影响计算系统奠定了基础。
Article 296
Title@2025-06-25 (3): Producer-Fairness in Sequential Bundle Recommendation
Title: Producer-Fairness in Sequential Bundle Recommendation | Hersteller-Fairness in Sequential Bundle Empfehlung | 序套件建议中的生产者公平 2506.20329v1 |
Authors (3): Alexandre Rio, Marta Soare, Sihem Amer-Yahia
We address fairness in the context of sequential bundle recommendation, where users are served in turn with sets of relevant and compatible items. Motivated by real-world scenarios, we formalize producer-fairness, which seeks to achieve a desired exposure of different item groups across users in a recommendation session. Our formulation combines naturally with building high-quality bundles. Our problem is solved in real time as users arrive. We propose an exact solution that caters to small instances of our problem. We then examine two heuristics, quality-first and fairness-first, and an adaptive variant that determines on the fly the right balance between bundle fairness and quality. Our experiments on three real-world datasets underscore the strengths and limitations of each solution and demonstrate their efficacy in providing fair bundle recommendations without compromising bundle quality.
我们从相继的捆绑建议的角度处理公平问题,即用户反过来得到一系列相关和兼容的物品。在现实世界情景的推动下,我们正式确定生产者公平性,力求在建议会议上实现不同物品群体在用户之间的理想接触。我们的配方自然地与建立高质量的捆绑结合起来。我们的问题在用户到达时即刻解决。我们提出了一个适合我们问题的小例子的精确解决方案。我们然后研究两种超常性,即质量第一和公平第一,以及决定捆绑公平和质量之间正确平衡的适应性变式。我们在三个真实世界数据集的实验强调了每一种解决办法的长处和局限性,并展示了它们在提供公平的捆绑绑绑建议而不损害捆绑质量方面的效力。
Article 297
Title@2025-06-25 (3): Permutation Equivariant Neural Controlled Differential Equations for Dynamic Graph Representation Learning
Title: Permutation Equivariant Neural Controlled Differential Equations for Dynamic Graph Representation Learning | Permutation Gleichwertige neural gesteuerte Differentialgleichungen für dynamisches Graphendarstellungslernen | 用于动态图表代表性学习的等同神经控制的异异性神经控制差异等量 2506.20324v1 |
Authors (5): Torben Berndt, Benjamin Walker, Tiexin Qin, Jan Stühmer, Andrey Kormilitzin
Dynamic graphs exhibit complex temporal dynamics due to the interplay between evolving node features and changing network structures. Recently, Graph Neural Controlled Differential Equations (Graph Neural CDEs) successfully adapted Neural CDEs from paths on Euclidean domains to paths on graph domains. Building on this foundation, we introduce Permutation Equivariant Neural Graph CDEs, which project Graph Neural CDEs onto permutation equivariant function spaces. This significantly reduces the model’s parameter count without compromising representational power, resulting in more efficient training and improved generalisation. We empirically demonstrate the advantages of our approach through experiments on simulated dynamical systems and real-world tasks, showing improved performance in both interpolation and extrapolation scenarios.
动态图形显示了复杂的时间动态, 原因是不断演变的节点特征和变化的网络结构之间的相互作用。 最近, 图形神经控制差异方程式( Grap Neal Control differental Equal Equal CDEs) 成功地改造了Euclidean 域的路径, 以及图形域的路径。 在此基础上, 我们引入了异变性神经图 CDEs, 投射到变异性等同功能空间上。 这大大降低了模型参数的计数, 同时又不损及代表能力, 从而导致更有效的培训和改进了一般化。 我们通过模拟动态系统和现实世界任务实验, 以实验方式展示了我们的方法的优势, 显示了在模拟动态系统和真实世界任务方面的绩效, 显示了在内推法和外推法情景上的绩效。
Article 298
Title@2025-06-25 (3): Comparative Analysis of Deep Learning Models for Crop Disease Detection: A Transfer Learning Approach
Title: Comparative Analysis of Deep Learning Models for Crop Disease Detection: A Transfer Learning Approach | Vergleichende Analyse von Deep-Learning-Modellen zur Erkennung von Crop Disease: Ein Transfer-Learning-Ansatz | 作物疾病检测深学习模型的比较分析:转让学习方法 2506.20323v1 |
Authors (4): Saundarya Subramaniam, Shalini Majumdar, Shantanu Nadar, Kaustubh Kulkarni
This research presents the development of an Artificial Intelligence (AI)-driven crop disease detection system designed to assist farmers in rural areas with limited resources. We compare different deep learning models, focusing on their efficacy in transfer learning. By leveraging deep learning models, including EfficientNet, ResNet101, MobileNetV2, and our custom CNN, which achieved a validation accuracy of 95.76%, the system effectively classifies plant diseases. This research demonstrates the potential of transfer learning in reshaping agricultural practices, improving crop health management, and supporting sustainable farming in rural environments.
这项研究介绍了开发人工智能(AI)驱动作物疾病检测系统的情况,该系统旨在以有限的资源帮助农村地区的农民。我们的目标是比较不同的深层次学习模式,以便进行比较分析,重点是这些模式在转移学习方面的功效。通过利用深层次学习模式,包括高效网络、ResNet101、移动网络2和我们的有线电视新闻网,它们达到了95.76%的验证精确度。该系统有效地将植物疾病分类。这一研究表明了在改造农业做法、改善作物健康管理和支持农村环境的可持续农业方面转移学习的潜力。
Article 299
Title@2025-06-25 (3): Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning
Title: Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Konfuzius3-Math: Leichtes Hochleistungs-LLM für das chinesische K-12 Mathematik-Lernen | 剖析3- 数学: 中国 K-12 数学学习的轻量级高性能推理法LLMLM 2506.18330v2 |
Authors (5): Lixin Wu, Na Cai, Qiao Cheng, Jiachen Wang, Yitao Duan
We introduce Confucius3-Math, an open-source large language model with 14B parameters that (1) runs efficiently on a single consumer-grade GPU; (2) achieves SOTA performance on a range of mathematical reasoning tasks, outperforming many models with significantly larger sizes. In particular, as part of our mission to enhance education and knowledge dissemination with AI, Confucius3-Math is specifically committed to mathematics learning for Chinese K-12 students and educators. Built via post-training with large-scale reinforcement learning (RL), Confucius3-Math aligns with the national curriculum and excels at solving mainstream Chinese K-12 mathematical problems at low cost. In this report we share our development recipe, the challenges we encounter and the techniques we develop to overcome them. In particular, we introduce three technical innovations: Targeted Entropy Regularization, Recent Sample Recovery and Policy-Specific Hardness Weighting. These innovations encompass a new entropy regularization, a novel data scheduling policy, and an improved group-relative advantage estimator. Collectively, they significantly stabilize the RL training, improve data efficiency, and boost performance. Our work demonstrates the feasibility of building strong reasoning models in a particular domain at low cost. We open-source our model and code at https://github.com/netease-youdao/Confucius3-Math.
我们引入了开放源码的大型语言模型“孔子3-马特 ” ( Confurcius3-Math),这是一个开放源码的大型语言模型,有14B参数:(1) 高效地运行在单一消费级GPU上;(2) 实现SOTA在一系列数学推理任务方面的成绩,优于许多规模大得多的模型;特别是,作为我们与AI一起加强教育和知识传播的任务的一部分,孔子3-马特致力于中国K-12学生和教育工作者的数学学习。这些创新包括通过大规模强化学习(RL)、 Confurcius3-Math)的后培训,与国家课程保持一致,并优于低成本解决中流中国K-12数学问题。我们在本报告中分享了我们的发展秘诀、我们遇到的挑战以及我们为克服这些挑战而开发的技术。特别是,我们引入了三项技术创新:定向整治,最近的抽样恢复和政策特征易变。这些创新包括新的昆虫正规化,新的数据排期政策,以及改进的集团反向优势估测算器。集体,它们极大地稳定了RL培训,提高数据效率,提高数据效率,提升了我们在低成本模型上展示了我们的工作推理。
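For orientation, the plain group-relative advantage estimator that the abstract says it improves upon can be sketched as follows. This is the standard baseline form (each sampled answer's reward normalized by the mean and standard deviation of its group), not the paper's Policy-Specific Hardness Weighting; the function name, the use of the population standard deviation, and the `eps` constant are assumptions made for the sketch.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and spread of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group: 8 sampled answers to one problem, reward 1 if correct.
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
adv = group_relative_advantages(rewards)
print(adv)

# A group where every answer gets the same reward carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
print(flat)
```

The second call shows the known weakness of the baseline: problems that are always solved (or never solved) produce zero advantages, which is one reason to reweight by problem hardness.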
Article 300
Title@2025-06-25 (3): BINDy – Bayesian identification of nonlinear dynamics with reversible-jump Markov-chain Monte-Carlo
Title: BINDy – Bayesian identification of nonlinear dynamics with reversible-jump Markov-chain Monte-Carlo | BINDy – Bayesische Identifikation von nichtlinearen Dynamiken mit reversiblem Sprung Markov-Kette Monte-Carlo | BINDI-BINDy-Bayesian 识别非线性动态与可逆-可逆-jump Markov-链链-Monte-Carlo的非线性动态 2408.08062v3 |
Authors (2): Max D. Champneys, Timothy J. Rogers
Model parsimony is an important cognitive bias in data-driven modelling that aids interpretability and helps to prevent over-fitting. Sparse identification of nonlinear dynamics (SINDy) methods are able to learn sparse representations of complex dynamics directly from data, given a basis of library functions. In this work, a novel Bayesian treatment of dictionary learning system identification, as an alternative to SINDy, is envisaged. The proposed method – Bayesian identification of nonlinear dynamics (BINDy) – is distinct from previous approaches in that it targets the full joint posterior distribution over both the terms in the library and their parameterisation in the model. This formulation confers the advantage that an arbitrary prior may be placed over the model structure to produce models that are sparse in the model space rather than in parameter space. Because this posterior is defined over parameter vectors that can change in dimension, the inference cannot be performed by standard techniques. Instead, a Gibbs sampler based on reversible-jump Markov-chain Monte-Carlo is proposed. BINDy is shown to compare favourably to ensemble SINDy in three benchmark case-studies. In particular, it is seen that the proposed method is better able to assign high probability to correct model terms.
Article 301
Title@2025-06-25 (3): Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
Title: Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration | Beyond-Expert Performance mit begrenzten Demonstrationen: Effizientes Imitationslernen mit doppelter Exploration | 具有有限演示的超出专家的超专业性能:高效的双重探索模拟学习 2506.20307v1 |
Authors (5): Heyang Zhao, Xingrui Yu, David M. Bossens, Ivor W. Tsang, Quanquan Gu
Imitation learning is a central problem in reinforcement learning where the goal is to learn a policy that mimics the expert’s behavior. In practice, it is often challenging to learn the expert policy from a limited number of demonstrations accurately due to the complexity of the state space. Moreover, it is essential to explore the environment and collect data to achieve beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty to potentially improve the convergence to the expert policy, and (2) curiosity-driven exploration of the states that deviate from the demonstration trajectories to potentially yield beyond-expert performance. Empirically, we demonstrate that ILDE outperforms the state-of-the-art imitation learning algorithms in terms of sample efficiency and achieves beyond-expert performance on Atari and MuJoCo tasks with fewer demonstrations than in previous work. We also provide a theoretical justification of ILDE as an uncertainty-regularized policy optimization method with optimistic exploration, leading to a regret growing sublinearly in the number of episodes.
Article 302
Title@2025-06-25 (3): Learning Moderately Input-Sensitive Functions: A Case Study in QR Code Decoding
Title: Learning Moderately Input-Sensitive Functions: A Case Study in QR Code Decoding | Moderate Input-Sensitive Funktionen lernen: Eine Fallstudie in QR-Code-Dekodierung | 学习中度投入-敏感性功能:QR编码编码的案例研究 2506.20305v1 |
Authors (3): Kazuki Yoda, Kazuhiko Kawamoto, Hiroshi Kera
The hardness of learning a function that attains a target task relates to its input-sensitivity. For example, image classification tasks are input-insensitive as minor corruptions should not affect the classification results, whereas arithmetic and symbolic computation, which have been recently attracting interest, are highly input-sensitive as each input variable connects to the computation results. This study presents the first learning-based Quick Response (QR) code decoding and investigates learning functions of medium sensitivity. Our experiments reveal that Transformers can successfully decode QR codes, even beyond the theoretical error-correction limit, by learning the structure of embedded texts. They generalize from English-rich training data to other languages and even random strings. Moreover, we observe that the Transformer-based QR decoder focuses on data bits while ignoring error-correction bits, suggesting a decoding mechanism distinct from standard QR code readers.
Article 303
Title@2025-06-25 (3): Bilinear MLPs enable weight-based mechanistic interpretability
Title: Bilinear MLPs enable weight-based mechanistic interpretability | Bilineare MLPs ermöglichen gewichtsbasierte mechanistische Interpretationsfähigkeit | 双线MLPs使基于重量的机械机械解释能力得以实现 2410.08417v2 |
Authors (5): Michael T. Pearce, Thomas Dooms, Alice Rigg, Jose M. Oramas, Lee Sharkey
A mechanistic understanding of how MLPs do computation in deep neural networks remains elusive. Current interpretability work can extract features from hidden activations over an input dataset but generally cannot explain how MLP weights construct features. One challenge is that element-wise nonlinearities introduce higher-order interactions and make it difficult to trace computations through the MLP layer. In this paper, we analyze bilinear MLPs, a type of Gated Linear Unit (GLU) without any element-wise nonlinearity that nevertheless achieves competitive performance. Bilinear MLPs can be fully expressed in terms of linear operations using a third-order tensor, allowing flexible analysis of the weights. Analyzing the spectra of bilinear MLP weights using eigendecomposition reveals interpretable low-rank structure across toy tasks, image classification, and language modeling. We use this understanding to craft adversarial examples, uncover overfitting, and identify small language model circuits directly from the weights alone. Our results demonstrate that bilinear layers serve as an interpretable drop-in replacement for current activation functions and that weight-based interpretability is viable for understanding deep-learning models.
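The claim that a bilinear layer reduces to linear operations on the weights can be checked on a single output unit. The sketch below assumes the unit computes (w . x) * (v . x) for two weight vectors w and v, and shows that this equals the quadratic form x^T Q x with the symmetric matrix Q = (w v^T + v w^T) / 2. The function names and random values are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def bilinear_unit(w, v, x):
    """One bilinear output: (w . x) * (v . x), with no element-wise nonlinearity."""
    return dot(w, x) * dot(v, x)


def interaction_matrix(w, v):
    """Symmetric Q such that x^T Q x == (w . x) * (v . x)."""
    n = len(w)
    return [[0.5 * (w[i] * v[j] + v[i] * w[j]) for j in range(n)] for i in range(n)]


def quadratic_form(Q, x):
    """Evaluate x^T Q x."""
    n = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))


random.seed(0)
d = 4
w = [random.uniform(-1, 1) for _ in range(d)]
v = [random.uniform(-1, 1) for _ in range(d)]
x = [random.uniform(-1, 1) for _ in range(d)]

Q = interaction_matrix(w, v)
lhs = bilinear_unit(w, v, x)
rhs = quadratic_form(Q, x)
print(lhs, rhs)
```

Because Q is real and symmetric, it has a real eigendecomposition, which is what makes a spectral analysis of the weights possible. Stacking one such matrix per output unit gives the third-order tensor mentioned in the abstract.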
Article 304
Title@2025-06-25 (3): Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning
Title: Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning | Graph-Assistente Stiche für Offline-Hierarchisches Verstärkungslernen | 离线高层强化学习的图表辅助细化 2506.07744v2 |
Authors (5): Seungho Baek, Taegeon Park, Jongchan Park, Seungjun Oh, Yusung Kim
Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
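Subgoal selection as graph search, as described in this abstract, can be sketched on a toy graph. The abstract only says "a shortest-path algorithm"; Dijkstra's algorithm is used here as an assumed choice, and the node names and edge costs are hypothetical stand-ins for clustered states and temporal distances.

```python
import heapq

def shortest_subgoal_path(graph, start, goal):
    """Dijkstra's algorithm; returns the node sequence from start to goal, or None."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical graph: nodes are clusters of similar states, weights are temporal distances.
graph = {
    "s": {"a": 1.0, "b": 4.0},
    "a": {"b": 1.0, "g": 5.0},
    "b": {"g": 1.0},
    "g": {},
}
subgoals = shortest_subgoal_path(graph, "s", "g")
print(subgoals)
```

In this picture a low-level policy would be asked to reach each node of `subgoals` in turn; the policy itself is outside the scope of the sketch.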
Article 305
Title@2025-06-25 (3): OLALa: Online Learned Adaptive Lattice Codes for Heterogeneous Federated Learning
Title: OLALa: Online Learned Adaptive Lattice Codes for Heterogeneous Federated Learning | OLala: Online gelernte adaptive Gittercodes für heterogenes Federated Learning | OLALA: 异质联邦学习在线知识适应性拉蒂码 2506.20297v1 |
Authors (3): Natalie Lang, Maya Simhi, Nir Shlezinger
Federated learning (FL) enables collaborative training across distributed clients without sharing raw data, often at the cost of substantial communication overhead induced by transmitting high-dimensional model updates. This overhead can be alleviated by having the clients quantize their model updates, with dithered lattice quantizers identified as an attractive scheme due to their structural simplicity and convergence-preserving properties. However, existing lattice-based FL schemes typically rely on a fixed quantization rule, which is suboptimal in heterogeneous and dynamic environments where the distribution of model updates varies across users and training rounds. In this work, we propose Online Learned Adaptive Lattices (OLALa), a heterogeneous FL framework where each client can adjust its quantizer online using lightweight local computations. We first derive convergence guarantees for FL with non-fixed lattice quantizers and show that proper lattice adaptation can tighten the convergence bound. Then, we design an online learning algorithm that enables clients to tune their quantizers throughout the FL process while exchanging only a compact set of quantization parameters. Numerical experiments demonstrate that OLALa consistently improves learning performance under various quantization rates, outperforming conventional fixed-codebook and non-adaptive schemes.
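For readers unfamiliar with dithered lattice quantization, here is a minimal sketch. It assumes the simplest possible lattice, the scaled integers `step * Z` applied per coordinate, with subtractive dither. The function name, the `step` parameter, and the example update are hypothetical, and the learned, client-specific lattices of the paper are not modeled.

```python
import random

def dithered_quantize(update, step, rng):
    """Subtractive dithered quantization onto the scaled integer lattice step * Z."""
    recovered = []
    for x in update:
        # Dither known to both sides (in practice, e.g., via a shared seed).
        dither = rng.uniform(-step / 2, step / 2)
        # Encoder: add the dither and snap to the nearest lattice point.
        lattice_point = step * round((x + dither) / step)
        # Decoder: subtract the same dither from the received lattice point.
        recovered.append(lattice_point - dither)
    return recovered

rng = random.Random(0)
update = [0.31, -1.27, 0.05, 2.4]   # hypothetical model-update entries
step = 0.5                          # lattice scale; a smaller step needs more bits

recovered = dithered_quantize(update, step, rng)
max_error = max(abs(a - b) for a, b in zip(update, recovered))
print(recovered, max_error)
```

The per-entry error never exceeds `step / 2`, so `step` plays the role of a tunable quantizer parameter in this toy setting: shrinking it lowers distortion at the price of a higher rate.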
Article 306
Title@2025-06-25 (3): Provably Improving Generalization of Few-Shot Models with Synthetic Data
Title: Provably Improving Generalization of Few-Shot Models with Synthetic Data | Wahrscheinliche Verbesserung der Verallgemeinerung von wenigen scharfen Modellen mit synthetischen Daten | 改进利用合成数据普及微小热模型及合成数据 2505.24190v2 |
Authors (6): Lan-Cuong Nguyen, Quan Nguyen-Tri, Bang Tran Khanh, Dung D. Le, Long Tran-Thanh, Khoat Than
Few-shot image classification remains challenging due to the scarcity of labeled training examples. Augmenting them with synthetic data has emerged as a promising way to alleviate this issue, but models trained on synthetic samples often face performance degradation due to the inherent gap between real and synthetic distributions. To address this limitation, we develop a theoretical framework that quantifies the impact of such distribution discrepancies on supervised learning, specifically in the context of image classification. More importantly, our framework suggests practical ways to generate good synthetic samples and to train a predictor with high generalization ability. Building upon this framework, we propose a novel, theoretically grounded algorithm that integrates prototype learning to optimize both data partitioning and model training, effectively bridging the gap between real few-shot data and synthetic data. Extensive experimental results show that our approach outperforms state-of-the-art methods across multiple datasets.
Article 307
Title@2025-06-25 (3): Flexible Infinite-Width Graph Convolutional Neural Networks
Title: Flexible Infinite-Width Graph Convolutional Neural Networks | Flexible Infinite-Width Graph Convolutional Neural Networks | 灵活的无限线-线形图进化神经神经网络 2402.06525v2 |
Authors (3): Ben Anson, Edward Milsom, Laurence Aitchison
A common theoretical approach to understanding neural networks is to take an infinite-width limit, at which point the outputs become Gaussian process (GP) distributed. This is known as a neural network Gaussian process (NNGP). However, the NNGP kernel is fixed and tunable only through a small number of hyperparameters, thus eliminating the possibility of representation learning. This contrasts with finite-width NNs, which are often believed to perform well because they are able to flexibly learn representations for the task at hand. Thus, in simplifying NNs to make them theoretically tractable, NNGPs may eliminate precisely what makes them work well (representation learning). This motivated us to understand whether representation learning is necessary in a range of graph tasks. We develop a precise tool for this task, the graph convolutional deep kernel machine. This is very similar to an NNGP, in that it is an infinite-width limit and uses kernels, but comes with a “knob” to control the amount of flexibility and hence representation learning. We found that representation learning gives noticeable performance improvements for heterophilous node classification tasks, but less so for homophilous node classification tasks.
Article 308
Title@2025-06-25 (3): Efficient uniform approximation using Random Vector Functional Link networks
Title: Efficient uniform approximation using Random Vector Functional Link networks | Effiziente einheitliche Annäherung mit Random Vector Functional Link-Netzwerken | 使用随机矢量功能链接网络的有效统一近似 2306.17501v2 |
Authors (2): Palina Salanevich, Olov Schavemaker
A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. Only the outer weights of such an architecture are to be learned, so the learning process boils down to a linear optimization task, allowing one to sidestep the pitfalls of nonconvex optimization problems. In this paper, we prove that an RVFL with ReLU activation functions can approximate Lipschitz continuous functions in $L_\infty$ norm. To the best of our knowledge, our result is the first approximation result in $L_\infty$ norm using nice inner weights; namely, Gaussians. We give a nonasymptotic lower bound for the number of hidden-layer nodes to achieve a given accuracy with high probability, depending on, among other things, the Lipschitz constant of the target function, the desired accuracy, and the input dimension. Our method of proof is rooted in probability theory and harmonic analysis.
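The claim that training an RVFL reduces to a linear problem can be made concrete with a small sketch. It assumes a scalar input, Gaussian inner weights and biases, ReLU activations, and a ridge-regularized least-squares fit of the outer weights. The ridge term, the target function |x|, the network size, and all names are illustrative assumptions, and the fit is in the least-squares sense rather than the $L_\infty$ sense studied in the paper.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

rng = random.Random(0)
n_hidden = 10
inner_w = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]  # random, never trained
inner_b = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]  # random, never trained

def features(x):
    """Hidden-layer activations for a scalar input x."""
    return [relu(a * x + b) for a, b in zip(inner_w, inner_b)]

# Training data for a 1-Lipschitz target on [-1, 1].
xs = [i / 25 - 1 for i in range(51)]
ys = [abs(x) for x in xs]
H = [features(x) for x in xs]

# Outer weights: minimize ||H w - y||^2 + ridge * ||w||^2 via the normal equations.
ridge = 1e-3
gram = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
         for j in range(n_hidden)] for i in range(n_hidden)]
rhs = [sum(h[i] * y for h, y in zip(H, ys)) for i in range(n_hidden)]
outer_w = solve(gram, rhs)

def predict(x):
    return sum(w * f for w, f in zip(outer_w, features(x)))

mse_fit = sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
mse_zero = sum(y ** 2 for y in ys) / len(xs)  # error of the all-zero predictor
print(mse_fit, mse_zero)
```

Only `outer_w` is learned, and it comes from one linear solve; no gradient descent over a nonconvex loss is involved.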
Article 309
Title@2025-06-25 (3): Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo
Title: Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo | Lösen von linear-gaussischen inversen Problemen mit entkoppelter Diffusion Sequential Monte Carlo | 解决线性 – – 高加索-巴伊西亚州脱相扩散的反向问题 2502.06379v2 |
Authors (3): Filip Ekström Kelvinius, Zheng Zhao, Fredrik Lindsten
A recent line of research has exploited pre-trained generative diffusion models as priors for solving Bayesian inverse problems. We contribute to this research direction by designing a sequential Monte Carlo method for linear-Gaussian inverse problems which builds on “decoupled diffusion”, where the generative process is designed such that larger updates to the sample are possible. The method is asymptotically exact, and we demonstrate the effectiveness of our Decoupled Diffusion Sequential Monte Carlo (DDSMC) algorithm on synthetic data as well as protein and image data. Further, we demonstrate how the approach can be extended to discrete data.
Article 310
Title@2025-06-25 (3): Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective
Title: Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective | Über topologische selbsterklärbare GNNs hinaus: Eine formale Erklärbarkeitsperspektive | 超越地形学的自我自我解释的GNNs:正式解释的视角 2502.02719v2 |
Authors (4): Steve Azzolin, Sagar Malhotra, Andrea Passerini, Stefano Teso
Self-Explainable Graph Neural Networks (SE-GNNs) are popular explainable-by-design GNNs, but their explanations’ properties and limitations are not well understood. Our first contribution fills this gap by formalizing the explanations extracted by some popular SE-GNNs, referred to as Minimal Explanations (MEs), and comparing them to established notions of explanations, namely Prime Implicant (PI) and faithful explanations. Our analysis reveals that MEs match PI explanations for a restricted but significant family of tasks. In general, however, they can be less informative than PI explanations and are surprisingly misaligned with widely accepted notions of faithfulness. Although faithful and PI explanations are informative, they are intractable to find, and we show that they can be prohibitively large. Given these observations, a natural choice is to augment SE-GNNs with alternative modalities of explanation that compensate for SE-GNNs’ limitations. To this end, we propose Dual-Channel GNNs that integrate a white-box rule extractor and a standard SE-GNN, adaptively combining both channels. Our experiments show that even a simple instantiation of Dual-Channel GNNs can recover succinct rules and perform on par with or better than widely used SE-GNNs.
Article 311
Title@2025-06-25 (3): Distilling A Universal Expert from Clustered Federated Learning
Title: Distilling A Universal Expert from Clustered Federated Learning | Destillieren eines universellen Experten aus clustered Federated Learning | 一名来自分组联邦学习的通用专家 2506.20285v1 |
Authors (5): Zeqi Leng, Chunxu Zhang, Guodong Long, Riting Xia, Bo Yang
Clustered Federated Learning (CFL) addresses the challenges posed by non-IID data by training multiple group- or cluster-specific expert models. However, existing methods often overlook the shared information across clusters, which represents the generalizable knowledge valuable to all participants in the Federated Learning (FL) system. To overcome this limitation, this paper introduces a novel FL framework that distills a universal expert model from the knowledge of multiple clusters. This universal expert captures globally shared information across all clients and is subsequently distributed to each client as the initialization for the next round of model training. The proposed FL framework operates in three iterative steps: (1) local model training at each client, (2) cluster-specific model aggregation, and (3) universal expert distillation. This three-step learning paradigm ensures the preservation of fine-grained non-IID characteristics while effectively incorporating shared knowledge across clusters. Compared to traditional gradient-based aggregation methods, the distillation-based model aggregation introduces greater flexibility in handling model heterogeneity and reduces conflicts among cluster-specific experts. Extensive experimental results demonstrate the superior performance of the proposed method across various scenarios, highlighting its potential to advance the state of CFL by balancing personalized and shared knowledge more effectively.
Article 312
Title@2025-06-25 (3): Forensic Study of Paintings Through the Comparison of Fabrics
Title: Forensic Study of Paintings Through the Comparison of Fabrics | Forensische Studie von Gemälden durch den Vergleich von Stoffen | 比较印刷品法证研究 2506.20272v1 |
Authors (3): Juan José Murillo-Fuentes, Pablo M. Olmos, Laura Alba-Carcelén
The study of canvas fabrics in works of art is a crucial tool for authentication, attribution and conservation. Traditional methods are based on thread density map matching, which cannot be applied when canvases do not come from contiguous positions on a roll. This paper presents a novel approach based on deep learning to assess the similarity of textiles. We introduce an automatic tool that evaluates the similarity between canvases without relying on thread density maps. A Siamese deep learning model is designed and trained to compare pairs of images by exploiting the feature representations learned from the scans. In addition, a similarity estimation method is proposed, aggregating predictions from multiple pairs of cloth samples to provide a robust similarity score. Our approach is applied to canvases from the Museo Nacional del Prado, corroborating the hypothesis that plain weave canvases, widely used in painting, can be effectively compared even when their thread densities are similar. The results demonstrate the feasibility and accuracy of the proposed method, opening new avenues for the analysis of masterpieces.
Article 313
Title@2025-06-25 (3): X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnosis
Title: X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnosis | X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnose | XSIT:痴呆症诊断的内在解释式地表视野变异器 2506.20267v1 |
Authors (4): Fabian Bongratz, Tom Nuno Wolf, Jaume Gual Ramon, Christian Wachinger
Interpretable models are crucial for supporting clinical decision-making, driving advances in their development and application for medical images. However, the nature of 3D volumetric data makes it inherently challenging to visualize and interpret intricate and complex structures like the cerebral cortex. Cortical surface renderings, on the other hand, provide a more accessible and understandable 3D representation of brain anatomy, facilitating visualization and interactive exploration. Motivated by this advantage and the widespread use of surface data for studying neurological disorders, we present the eXplainable Surface Vision Transformer (X-SiT). This is the first inherently interpretable neural network that offers human-understandable predictions based on interpretable cortical features. As part of X-SiT, we introduce a prototypical surface patch decoder for classifying surface patch embeddings, incorporating case-based reasoning with spatially corresponding cortical prototypes. The results demonstrate state-of-the-art performance in detecting Alzheimer’s disease and frontotemporal dementia while additionally providing informative prototypes that align with known disease patterns and reveal classification errors.
Article 314
Title@2025-06-25 (3): 3D variational autoencoder for fingerprinting microstructure volume elements
Title: 3D variational autoencoder for fingerprinting microstructure volume elements | 3D-Variations-Autoencoder für die Fingerabdruck-Mikrostruktur-Volume-Elemente | 用于指纹微结构体积元素的 3D 变式自动编码器 2503.17427v3 |
Authors (4): Michael D. White, Michael D. Atkinson, Adam J. Plowman, Pratheek Shanthraj
Microstructure quantification is an important step towards establishing structure-property relationships in materials. Machine learning-based image processing methods have been shown to outperform conventional image processing techniques and are increasingly applied to microstructure quantification tasks. In this work, we present a 3D variational autoencoder (VAE) for encoding microstructure volume elements (VEs) comprising voxelated crystallographic orientation data. Crystal symmetries in the orientation space are accounted for by mapping to the crystallographic fundamental zone as a preprocessing step, which allows for a continuous loss function to be used and improves the training convergence rate. The VAE is then used to encode a training set of VEs with an equiaxed polycrystalline microstructure with random texture. Accurate reconstructions are achieved with a relative average misorientation error of $3 \times 10^{-2}$ on the test dataset, for a continuous latent space with dimension 256. We show that the model generalises well to microstructures with textures, grain sizes and aspect ratios outside the training distribution. Structure-property relationships are explored through using the training set of VEs as initial configurations in various crystal plasticity (CP) simulations. Microstructural fingerprints extracted from the VAE, which parameterise the VEs in a low-dimensional latent space, are stored alongside the volume-averaged stress response, at each strain increment, to uniaxial tensile deformation from CP simulations. This is then used to train a fully connected neural network mapping the input fingerprint to the resulting stress response, which acts as a surrogate model for the CP simulation. The fingerprint-based surrogate model is shown to accurately predict the microstructural dependence in the CP stress response, with a relative mean-squared error of 2.75 MPa on unseen test data.
Article 315
Title@2025-06-25 (3): Exploration-Exploitation Tradeoff in Universal Lossy Compression
Title: Exploration-Exploitation Tradeoff in Universal Lossy Compression | Explorations-Exploitation-Tradeoff bei universeller Lossy-Kompression | 普遍损失压缩中探索-探索-探索-开发权衡 2506.20261v1 |
Authors (2): Nir Weinberger, Ram Zamir
Universal compression can learn the source and adapt to it either in a batch mode (forward adaptation) or in a sequential mode (backward adaptation). We recast the sequential mode as a multi-armed bandit (MAB) problem, a fundamental model in reinforcement learning, and study the trade-off between exploration and exploitation in the lossy compression case. We show that a previously proposed “natural type selection” scheme can be cast as a reconstruction-directed MAB algorithm for sequential lossy compression, and explain its limitations in terms of robustness and short-block performance. We then derive and analyze robust cost-directed MAB algorithms, which work at any block length.
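To make the exploration-exploitation trade-off concrete, the sketch below runs the textbook UCB1 bandit rule over three hypothetical "arms", each standing in for a candidate reconstruction codebook with an unknown mean distortion. This is a generic illustration under assumed numbers, not the natural type selection scheme or the cost-directed algorithms of the paper; the reward is taken to be one minus a noisy distortion.

```python
import math
import random

def ucb1(pull, n_arms, rounds):
    """Textbook UCB1: try every arm once, then pick the arm with the largest
    empirical mean plus exploration bonus sqrt(2 * ln(t) / pulls)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initial exploration: one pull per arm
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

rng = random.Random(0)
mean_distortion = [0.8, 0.5, 0.1]  # hypothetical per-codebook distortions

def pull(arm):
    """Reward = 1 - distortion, with bounded noise in [-0.1, 0.1]."""
    return 1.0 - mean_distortion[arm] + rng.uniform(-0.1, 0.1)

counts = ucb1(pull, n_arms=3, rounds=2000)
print(counts)
```

The bonus term shrinks as an arm is pulled more often, so poor arms are revisited only occasionally (exploration) while most rounds go to the arm with the lowest distortion (exploitation).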
Article 316
Title@2025-06-25 (3): Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders
Title: Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders | Feintuning-Maschine-erlernte Partikelstrom-Rekonstruktion für neue Detektorgeometrien in zukünftigen Kollidern | 微调机了解粒子流重建,以在未来相撞器中进行新探测器的地形 2503.00131v4 |
Authors (7): Farouk Mokhtar, Joosep Pata, Dolores Garcia, Eric Wulff, Mengke Zhang, Michael Kagan, Javier Duarte
We demonstrate transfer learning capabilities in a machine-learned algorithm trained for particle-flow reconstruction in high energy particle colliders. This paper presents a cross-detector fine-tuning study, where we initially pretrain the model on a large full simulation dataset from one detector design, and subsequently fine-tune the model on a sample with a different collider and detector design. Specifically, we use the Compact Linear Collider detector (CLICdet) model for the initial training set and demonstrate successful knowledge transfer to the CLIC-like detector (CLD) proposed for the Future Circular Collider in electron-positron mode. We show that with an order of magnitude fewer samples from the second dataset, we can achieve the same performance as a costly training from scratch, across particle-level and event-level performance metrics, including jet and missing transverse momentum resolution. Furthermore, we find that the fine-tuned model achieves comparable performance to the traditional rule-based particle-flow approach on event-level metrics after training on 100,000 CLD events, whereas a model trained from scratch requires at least 1 million CLD events to achieve similar reconstruction performance. To our knowledge, this represents the first full-simulation cross-detector transfer learning study for particle-flow reconstruction. These findings offer valuable insights towards building large foundation models that can be fine-tuned across different detector designs and geometries, helping to accelerate the development cycle for new detectors and opening the door to rapid detector design and optimization using machine learning.
Article 317
Title@2025-06-25 (3): Argumentative Ensembling for Robust Recourse under Model Multiplicity
Title: Argumentative Ensembling for Robust Recourse under Model Multiplicity | Argumentatives Zusammenbauen für robusten Rücklauf unter Modellvielfalt | 多种模式下强力利用的参数组合 2506.20260v1 |
Authors (4): Junqi Jiang, Antonio Rago, Francesco Leofante, Francesca Toni
In machine learning, it is common to obtain multiple equally performing models for the same prediction task, e.g., when training neural networks with different random seeds. Model multiplicity (MM) is the situation which arises when these competing models differ in their predictions for the same input, for which ensembling is often employed to determine an aggregation of the outputs. Providing recourse recommendations via counterfactual explanations (CEs) under MM thus becomes complex, since the CE may not be valid across all models, i.e., the CEs are not robust under MM. In this work, we formalise the problem of providing recourse under MM, which we name recourse-aware ensembling (RAE). We propose the idea that under MM, CEs for each individual model should be considered alongside their predictions so that the aggregated prediction and recourse are decided in tandem. Centred around this intuition, we introduce six desirable properties for solutions to this problem. For solving RAE, we propose a novel argumentative ensembling method which guarantees the robustness of CEs under MM. Specifically, our method leverages computational argumentation to explicitly represent the conflicts between models and counterfactuals regarding prediction results and CE validity. It then uses argumentation semantics to resolve the conflicts and obtain the final solution, in a manner which is parametric to the chosen semantics. Our method also allows for the specification of preferences over the models under MM, allowing further customisation of the ensemble. In a comprehensive theoretical analysis, we characterise the behaviour of argumentative ensembling with four different argumentation semantics. We then empirically demonstrate the effectiveness of our approach in satisfying desirable properties with eight instantiations of our method. (Abstract is shortened for arXiv.)
Article 318
Title@2025-06-25 (3): A Transformer Based Handwriting Recognition System Jointly Using Online and Offline Features
Title: A Transformer Based Handwriting Recognition System Jointly Using Online and Offline Features | Ein transformerbasiertes Handschrifterkennungssystem, das Online- und Offline-Funktionen verwendet | 联合使用在线和离线特点的基于变换手写识别系统 2506.20255v1 |
Authors (4): Ayush Lodh, Ritabrata Chakraborty, Shivakumara Palaiahnakote, Umapada Pal
We posit that handwriting recognition benefits from complementary cues carried by the rasterized complex glyph and the pen’s trajectory, yet most systems exploit only one modality. We introduce an end-to-end network that performs early fusion of offline images and online stroke data within a shared latent space. A patch encoder converts the grayscale crop into fixed-length visual tokens, while a lightweight transformer embeds the $(x, y, \text{pen})$ sequence. Learnable latent queries attend jointly to both token streams, yielding context-enhanced stroke embeddings that are pooled and decoded under a cross-entropy loss objective. Because integration occurs before any high-level classification, temporal cues reinforce each other during representation learning, producing stronger writer independence. Comprehensive experiments on IAMOn-DB and VNOn-DB demonstrate that our approach achieves state-of-the-art accuracy, exceeding previous bests by up to 1%. Our study also shows adaptation of this pipeline with gesturification on the ISI-Air dataset. Our code can be found here.
Article 319
Title@2025-06-25 (3): Time-series surrogates from energy consumers generated by machine learning approaches for long-term forecasting scenarios
Title: Time-series surrogates from energy consumers generated by machine learning approaches for long-term forecasting scenarios | Zeitreihen von Energieverbrauchern, die durch maschinelle Lernansätze für langfristige Prognoseszenarien erzeugt werden | 长期预测设想情景的机器学习方法产生的能源消费者代用时间序列代用 2506.20253v1 |
Authors (8): Ben Gerhards, Nikita Popkov, Annekatrin König, Marcel Arpogaus, Bastian Schäfermeier, Leonie Riedl, Stephan Vogt, Philip Hehlert
Forecasting attracts a lot of research attention in the electricity value chain. However, most studies concentrate on short-term forecasting of generation or consumption with a focus on systems and less on individual consumers. Even more neglected is the topic of long-term forecasting of individual power consumption. Here, we provide an in-depth comparative evaluation of data-driven methods for generating synthetic time series data tailored to long-term forecasting of energy consumption. High-fidelity synthetic data is crucial for a wide range of applications, including state estimation in energy systems or power grid planning. In this study, we assess and compare the performance of multiple state-of-the-art but less common techniques: a hybrid Wasserstein Generative Adversarial Network (WGAN), Denoising Diffusion Probabilistic Model (DDPM), Hidden Markov Model (HMM), and Masked Autoregressive Bernstein polynomial normalizing Flows (MABF). We analyze the ability of each method to replicate the temporal dynamics, long-range dependencies, and probabilistic transitions characteristic of individual energy consumption profiles. Our comparative evaluation highlights the strengths and limitations of WGAN, DDPM, HMM, and MABF, aiding in the selection of the most suitable approach for state estimation and other energy-related tasks. Our generation and analysis framework aims to enhance the accuracy and reliability of synthetic power consumption data while generating data that fulfills criteria such as anonymisation, preserving privacy and mitigating the risk of profiling individual customers. This study utilizes an open-source dataset from households in Germany with 15-minute time resolution. The generated synthetic power profiles can readily be used in applications like state estimation or consumption forecasting.
Article 320
Title@2025-06-25 (3): Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
Title: Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models | Q-resafe: Bewertung von Sicherheitsrisiken und Quantization-aware Sicherheits-Patching für Quantized Large Language Models | Q-安全:评估安全风险和量化大语言模式量化大语言模型的量化安全补丁 2506.20251v1 |
Authors (7): Kejia Chen, Jiawen Zhang, Jiacong Hu, Yu Wang, Jian Lou, Zunlei Feng, Mingli Song
Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies on a few calibration dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strategies. In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, utilizing widely accepted safety benchmarks. To address the identified safety vulnerabilities, we propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experimental results demonstrate that Q-resafe successfully re-aligns the safety of quantized LLMs with their pre-quantization counterparts, even under challenging evaluation scenarios. Project page is available at: https://github.com/Thecommonirin/Qresafe.
Article 321
Title@2025-06-25 (3): FedBKD: Distilled Federated Learning to Embrace Gerneralization and Personalization on Non-IID Data
Title: FedBKD: Distilled Federated Learning to Embrace Gerneralization and Personalization on Non-IID Data | FedBKD: Distilled Federated Learning to Embrace Gerneralization and Personalization on Non-IID Data | FDBKD: 精化的联邦学习学习,以接受非二二二二二维数据方面的全球化和个性化 2506.20245v1 |
Authors (8): Yushan Zhao, Jinyuan He, Donglai Chen, Weijie Luo, Chong Xie, Ri Zhang, Yonghong Chen, Yan Xu
Federated learning (FL) is a decentralized collaborative machine learning (ML) technique. It provides a solution to the issues of isolated data islands and data privacy leakage in industrial ML practices. One major challenge in FL is handling non-independent and identically distributed (non-IID) data. Current solutions focus either on constructing an all-powerful global model or on customizing personalized local models. Few of them can provide both a well-generalized global model and well-performing local models at the same time. Additionally, many FL solutions to the non-IID problem benefit from introducing public datasets. However, this also increases the risk of data leakage. To tackle these problems, we propose a novel data-free distillation framework, Federated Bidirectional Knowledge Distillation (FedBKD). Specifically, we train Generative Adversarial Networks (GANs) to generate synthetic data. During GAN training, local models serve as discriminators and their parameters are frozen. The synthetic data is then used for bidirectional distillation between global and local models to achieve knowledge interactions so that the performance of both sides is improved. We conduct extensive experiments on 4 benchmarks under different non-IID settings. The results show that FedBKD achieves SOTA performance in every case.
Article 322
Title@2025-06-25 (3): Dual-Channel Multiplex Graph Neural Networks for Recommendation
Title: Dual-Channel Multiplex Graph Neural Networks for Recommendation | Dual-Channel Multiplex Graph Neuronale Netzwerke zur Empfehlung | 供建议用的双声道多气多气图神经网络 2403.11624v5 |
Authors (7): Xiang Li, Chaofan Fu, Zhongying Zhao, Guanjie Zheng, Chao Huang, Yanwei Yu, Junyu Dong
Effective recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interactive relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping platforms. Nevertheless, these approaches still grapple with two significant challenges: (1) insufficient modeling and exploitation of the impact that behavior patterns formed by multiplex relations between users and items have on representation learning, and (2) neglect of the effect of different relations within behavior patterns on the target relation in recommender system scenarios. In this work, we introduce a novel recommendation framework, Dual-Channel Multiplex Graph Neural Network (DCMGNN), which addresses the aforementioned challenges. It incorporates an explicit behavior pattern representation learner to capture the behavior patterns composed of multiplex user-item interactive relations, and includes a relation chain representation learner and a relation chain-aware encoder to discover the impact of various auxiliary relations on the target relation and the dependencies between different relations, and to mine the appropriate order of relations in a behavior pattern. Extensive experiments on three real-world datasets demonstrate that our DCMGNN surpasses various state-of-the-art recommendation methods. It outperforms the best baselines by 10.06% and 12.15% on average across all datasets in terms of Recall@10 and NDCG@10, respectively.
nan
Article 323
Title@2025-06-25 (3): Directed Link Prediction using GNN with Local and Global Feature Fusion
Title: Directed Link Prediction using GNN with Local and Global Feature Fusion | Direkte Link-Vorhersage mit GNN mit lokaler und globaler Feature-Fusion | 使用GNN与本地和全球地貌融合的GNN进行直接链接预测 2506.20235v1 |
Authors (6): Yuyang Zhang, Xu Shen, Yu Xie, Ka-Chun Wong, Weidun Xie, Chengbin Peng
Link prediction is a classical problem in graph analysis with many practical applications. For directed graphs, recently developed deep learning approaches typically analyze node similarities through contrastive learning and aggregate neighborhood information through graph convolutions. In this work, we propose a novel graph neural network (GNN) framework to fuse feature embedding with community information. We theoretically demonstrate that such hybrid features can improve the performance of directed link prediction. To utilize such features efficiently, we also propose an approach to transform input graphs into directed line graphs so that nodes in the transformed graph can aggregate more information during graph convolutions. Experiments on benchmark datasets show that our approach outperforms the state-of-the-art in most cases when 30%, 40%, 50%, and 60% of the connected links are used as training data, respectively.
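The graph transformation mentioned above can be illustrated with the textbook line digraph construction, in which every original edge becomes a node. This is a sketch under that assumption; the abstract does not specify the paper's exact transformation, and `directed_line_graph` is a hypothetical helper name.

```python
from collections import defaultdict

def directed_line_graph(edges):
    """Return the line digraph: one node per original edge (u, v), with an arc
    (u, v) -> (v, w) whenever the head of one edge is the tail of the next."""
    out_edges = defaultdict(list)
    for u, v in edges:
        out_edges[u].append((u, v))
    line_arcs = []
    for u, v in edges:
        for nxt in out_edges[v]:
            line_arcs.append(((u, v), nxt))
    return list(edges), line_arcs

# Toy directed graph (hypothetical).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "d")]
nodes, arcs = directed_line_graph(edges)
for src, dst in arcs:
    print(src, "->", dst)
```

In the transformed graph a link prediction for (u, v) becomes a statement about a node, so message passing can aggregate information from all edges that feed into u -> v.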
nan
Article 324
Title@2025-06-25 (3): E-ABIN: an Explainable module for Anomaly detection in BIological Networks
Title: E-ABIN: an Explainable module for Anomaly detection in BIological Networks | E-ABIN: ein erklärbares Modul zur Anomalieerkennung in BIologischen Netzwerken | E-ABIN:生物网络异常检测可解释模块 2506.20693v1 |
Authors (4): Ugo Lomoio, Tommaso Mazza, Pierangelo Veltri, Pietro Hiram Guzzi
The increasing availability of large-scale omics data calls for robust analytical frameworks capable of handling complex gene expression datasets while offering interpretable results. Recent advances in artificial intelligence have enabled the identification of aberrant molecular patterns distinguishing disease states from healthy controls. Coupled with improvements in model interpretability, these tools now support the identification of genes potentially driving disease phenotypes. However, current approaches to gene anomaly detection often remain limited to single datasets and lack accessible graphical interfaces. Here, we introduce E-ABIN, a general-purpose, explainable framework for Anomaly detection in Biological Networks. E-ABIN combines classical machine learning and graph-based deep learning techniques within a unified, user-friendly platform, enabling the detection and interpretation of anomalies from gene expression or methylation-derived networks. By integrating algorithms such as Support Vector Machines, Random Forests, Graph Autoencoders (GAEs), and Graph Adversarial Attributed Networks (GAANs), E-ABIN ensures a high predictive accuracy while maintaining interpretability. We demonstrate the utility of E-ABIN through case studies of bladder cancer and coeliac disease, where it effectively uncovers biologically relevant anomalies and offers insights into disease mechanisms.
nan
Article 325
Title@2025-06-25 (3): Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems
Title: Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems | Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems | 通过相互作用粒子系统逐步自由序列的巴伊西亚实验设计 2504.13320v2 |
Authors (4): Robert Gruhlke, Matei Hanu, Claudia Schillings, Philipp Wacker
We introduce a gradient-free framework for Bayesian Optimal Experimental Design (BOED) in sequential settings, aimed at complex systems where gradient information is unavailable. Our method combines Ensemble Kalman Inversion (EKI) for design optimization with the Affine-Invariant Langevin Dynamics (ALDI) sampler for efficient posterior sampling, both of which are derivative-free and ensemble-based. To address the computational challenges posed by nested expectations in BOED, we propose variational Gaussian and parametrized Laplace approximations that provide tractable upper and lower bounds on the Expected Information Gain (EIG). These approximations enable scalable utility estimation in high-dimensional spaces and PDE-constrained inverse problems. We demonstrate the performance of our framework through numerical experiments ranging from linear Gaussian models to PDE-based inference tasks, highlighting the method’s robustness, accuracy, and efficiency in information-driven experimental design.
nan
Article 326
Title@2025-06-25 (3): SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling
Title: SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling | SLEEPING-DISCO 9M: Ein großformatiger Vortrainings-Datensatz für generative Musikmodellierung | SLEPING-DISCO 9M:用于基因音乐建模的大规模培训前数据集 2506.14293v3 |
Authors (3): Tawsif Ahmed, Andrej Radonjic, Gollam Rabby
We present Sleeping-DISCO 9M, a large-scale pre-training dataset for music and song. To the best of our knowledge, there is no open-source, high-quality dataset representing popular and well-known songs for generative music modeling tasks such as text-music, music-captioning, singing-voice synthesis, melody reconstruction and cross-model retrieval. Past contributions focused on isolated and constrained factors, with the core aim of creating synthetic or re-recorded music corpora (e.g. GTSinger, M4Singer); arbitrarily large-scale audio datasets (e.g. DISCO-10M and LAIONDISCO-12M) have been another focus for the community. Unfortunately, adoption of these datasets in the generative music community has been limited, as they fail to reflect real-world music and its flavour. Our dataset changes this narrative and is constructed using actual popular music and world-renowned artists.
nan
Article 327
Title@2025-06-25 (3): Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast
Title: Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast | Unterstützung der Planung und des Betriebs erneuerbarer Energien mit datengetriebener Hochauflösungs-Ensemble-Wettervorhersage | 支持可再生能源规划和运作,以数据驱动的高分辨率高分辨率气象组合组合天气预报支持可再生能源规划和运作 2505.04396v2 |
Authors (14): Jingnan Wang, Jie Chao, Shangshang Yang, Congyi Nai, Kaijun Ren, Kefeng Deng, Xi Chen, Yaxin Liu, Hanqiuzi Wen, Ziniu Xiao, Lifeng Zhang, Xiaodong Wang, Jiping Guan, Baoxiang Pan
The planning and operation of renewable energy, especially wind power, depend crucially on accurate, timely, and high-resolution weather information. Coarse-grid global numerical weather forecasts are typically downscaled to meet these requirements, introducing challenges of scale inconsistency, process representation error, computation cost, and entanglement of distinct uncertainty sources from chaoticity, model bias, and large-scale forcing. We address these challenges by learning the climatological distribution of a target wind farm using its high-resolution numerical weather simulations. An optimal combination of this learned high-resolution climatological prior with coarse-grid large-scale forecasts yields a highly accurate, fine-grained, full-variable, large ensemble of weather pattern forecasts. Using observed meteorological records and wind turbine power outputs as references, the proposed methodology verifies favorably against existing numerical/statistical forecasting-downscaling pipelines in terms of deterministic/probabilistic skill and economic gains. Moreover, a 100-member, 10-day forecast with a spatial resolution of 1 km and an output frequency of 15 min takes < 1 hour on a mid-range GPU, in contrast to $\mathcal{O}(10^3)$ CPU hours for conventional numerical simulation. By drastically reducing computational costs while maintaining accuracy, our method paves the way for more efficient and reliable renewable energy planning and operation.
nan
Article 328
Title@2025-06-25 (3): MS-TVNet: A Long-Term Time Series Prediction Method Based on Multi-Scale Dynamic Convolution
Title: MS-TVNet: A Long-Term Time Series Prediction Method Based on Multi-Scale Dynamic Convolution | MS-TVNet: Eine Langzeit-Zeitreihenvorhersagemethode auf der Grundlage multi-Scale Dynamic Convolution | MS-TVNet:基于多空间动态演变的长期时间序列预测方法 2506.17253v2 |
Authors (4): Chenghan Li, Mingchen Li, Yipu Liao, Ruisheng Diao
Long-term time series prediction has predominantly relied on Transformer and MLP models, while the potential of convolutional networks in this domain remains underexplored. To address this gap, we introduce a novel multi-scale time series reshape module, which effectively captures the relationships among multi-period patches and variable dependencies. Building upon this module, we propose MS-TVNet, a multi-scale 3D dynamic convolutional neural network. Through comprehensive evaluations on diverse datasets, MS-TVNet demonstrates superior performance compared to baseline models, achieving state-of-the-art (SOTA) results in long-term time series prediction. Our findings highlight the effectiveness of leveraging convolutional networks for capturing complex temporal patterns, suggesting a promising direction for future research in this field. The code is released at https://github.com/Curyyfaust/TVNet.
nan
Article 329
Title@2025-06-25 (3): Curved representational Bregman divergences and their applications
Title: Curved representational Bregman divergences and their applications | Gebogene Repräsentationsdivergenzen von Bregman und deren Anwendungen | 曲线代表布列格曼差异及其应用 2504.05654v2 |
Authors (1): Frank Nielsen
By analogy to curved exponential families in statistics, we define curved Bregman divergences as Bregman divergences restricted to nonlinear parameter subspaces. We show that the barycenter of a finite weighted set of parameters under a curved Bregman divergence amounts to the right Bregman projection onto the nonlinear subspace of the barycenter with respect to the full Bregman divergence. We demonstrate the significance of curved Bregman divergences with two examples: (1) symmetrized Bregman divergences and (2) the Kullback-Leibler divergence between circular complex normal distributions. We then consider monotonic embeddings to define representational curved Bregman divergences and show that the $\alpha$-divergences are representational curved Bregman divergences with respect to $\alpha$-embeddings of the probability simplex into the positive measure cone. As an application, we report an efficient method to calculate the intersection of a finite set of $\alpha$-divergence spheres.
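As a small illustration of the definition, the sketch below evaluates an ordinary Bregman divergence $B_F(p:q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle$ and then restricts it to a nonlinear curve. The choice of generator (half the squared Euclidean norm) and of the unit circle as the parameter curve are assumptions made for this example only, not examples taken from the paper.

```python
import math

def bregman_divergence(F, grad_F, p, q):
    """B_F(p : q) = F(p) - F(q) - <p - q, grad F(q)>."""
    gq = grad_F(q)
    inner = sum((pi - qi) * gi for pi, qi, gi in zip(p, q, gq))
    return F(p) - F(q) - inner

def half_sq_norm(x):
    # Generator F(x) = 0.5 * ||x||^2, whose Bregman divergence is 0.5 * ||p - q||^2.
    return 0.5 * sum(xi * xi for xi in x)

def grad_half_sq_norm(x):
    return list(x)

def circle(t):
    # Hypothetical nonlinear parameter curve: the unit circle in R^2.
    return [math.cos(t), math.sin(t)]

def curved_divergence(t1, t2):
    """The full divergence evaluated only at parameters lying on the curve."""
    return bregman_divergence(half_sq_norm, grad_half_sq_norm, circle(t1), circle(t2))

print(curved_divergence(0.3, 1.1))
print(1 - math.cos(0.8))  # closed form on the circle: 1 - cos(t1 - t2)
```

On this curve the restricted divergence reduces to $1 - \cos(t_1 - t_2)$, which shows how restricting to a curved subspace changes the divergence's form as a function of the curve parameter.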
nan
Article 330
Title@2025-06-25 (3): Affective Priming Score: A Data-Driven Method to Detect Priming in Sequential Datasets
Title: Affective Priming Score: A Data-Driven Method to Detect Priming in Sequential Datasets | Affektiver Priming-Score: Eine datengetriebene Methode, um Priming in Sequenzdatensätzen zu erkennen | 情感原始分数:在序列数据集中检测原始数据的数据驱动方法 2506.20204v1 |
Authors (3): Eduardo Gutierrez Maestro, Hadi Banaee, Amy Loutfi
Affective priming exemplifies the challenge of ambiguity in affective computing. While the community has largely addressed this issue from a label-based perspective, identifying data points in the sequence affected by the priming effect, the impact of priming on data itself, particularly in physiological signals, remains underexplored. Data affected by priming can lead to misclassifications when used in learning models. This study proposes the Affective Priming Score (APS), a data-driven method to detect data points influenced by the priming effect. The APS assigns a score to each data point, quantifying the extent to which it is affected by priming. To validate this method, we apply it to the SEED and SEED-VII datasets, which contain sufficient transitions between emotional events to exhibit priming effects. We train models with the same configuration using both the original data and priming-free sequences. The misclassification rate is significantly reduced when using priming-free sequences compared to the original data. This work contributes to the broader challenge of ambiguity by identifying and mitigating priming effects at the data level, enhancing model robustness, and offering valuable insights for the design and collection of affective computing datasets.
nan
Article 331
Title@2025-06-25 (3): Zero-Shot Attribution for Large Language Models: A Distribution Testing Approach
Title: Zero-Shot Attribution for Large Language Models: A Distribution Testing Approach | Zero-Shot Attribution für große Sprachmodelle: Ein Distributionstestverfahren | 大语言模式零点位数:分销测试方法 2506.20197v1 |
Authors (3): Clément L. Canonne, Yash Pote, Uddalok Sarkar
A growing fraction of all code is sampled from Large Language Models (LLMs). We investigate the problem of attributing code generated by language models using hypothesis testing to leverage established techniques and guarantees. Given a set of samples $S$ and a suspect model $\mathcal{L}^*$, our goal is to assess the likelihood of $S$ originating from $\mathcal{L}^*$. Due to the curse of dimensionality, this is intractable when only samples from the LLM are given: to circumvent this, we use both samples and density estimates from the LLM, a form of access commonly available. We introduce $\mathsf{Anubis}$, a zero-shot attribution tool that frames attribution as a distribution testing problem. Our experiments on a benchmark of code samples show that $\mathsf{Anubis}$ achieves high AUROC scores ($\ge 0.9$) when distinguishing between LLMs like DeepSeek-Coder, CodeGemma, and Stable-Code using only $\approx 2000$ samples.
nan
Article 332
Title@2025-06-25 (3): DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
Title: DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs | DuoGPT: Training-freie Dual Sparsity durch Aktivierungs-bewusstes Pruning in LLMs | DuoGPT:通过在LLM中采取积极-有意识的节制措施,实现无培训的双重平等 2506.20194v1 |
Authors (4): Ruokai Yin, Yuhang Li, Donghyun Lee, Priyadarshini Panda
Large language models (LLMs) deliver strong performance but are difficult to deploy due to high memory and compute costs. While pruning reduces these demands, most methods ignore activation sparsity observed at runtime. We reinterpret activation sparsity as dynamic structured weight sparsity and propose DuoGPT, a unified framework that constructs dual-sparse (spMspV) workloads by combining unstructured weight pruning with activation sparsity. To preserve accuracy, we extend the Optimal Brain Compression (OBC) framework with activation-aware calibration and introduce output residuals from the dense model as correction terms. We further optimize the solution for efficient GPU execution, enabling scalability to billion-parameter LLMs. Evaluations on LLaMA-2 and LLaMA-3 show that DuoGPT outperforms state-of-the-art structured pruning methods by up to 9.17% accuracy at an iso-speedup of 1.39$\times$ compared to the baseline dense model.
nan
Article 333
Title@2025-06-25 (3): IKDiffuser: A Generative Inverse Kinematics Solver for Multi-arm Robots via Diffusion Model
Title: IKDiffuser: A Generative Inverse Kinematics Solver for Multi-arm Robots via Diffusion Model | IKDiffuser: Ein generatives Inverse Kinematik-Lösemittel für Multiarm-Roboter über Diffusionsmodell | IKDiffuser: 通过扩散模型为多武器机器人制造的生成反反反虚拟解答器 2506.13087v3 |
Authors (2): Zeyu Zhang, Ziyuan Jiao
Solving Inverse Kinematics (IK) problems is fundamental to robotics, but has primarily been successful with single serial manipulators. For multi-arm robotic systems, IK remains challenging due to complex self-collisions, coupled joints, and high-dimensional redundancy. These complexities make traditional IK solvers slow, prone to failure, and lacking in solution diversity. In this paper, we present IKDiffuser, a diffusion-based model designed for fast and diverse IK solution generation for multi-arm robotic systems. IKDiffuser learns the joint distribution over the configuration space, capturing complex dependencies and enabling seamless generalization to multi-arm robotic systems of different structures. In addition, IKDiffuser can incorporate additional objectives during inference without retraining, offering versatility and adaptability for task-specific requirements. In experiments on 6 different multi-arm systems, the proposed IKDiffuser achieves superior solution accuracy, precision, diversity, and computational efficiency compared to existing solvers. The proposed IKDiffuser framework offers a scalable, unified approach to solving multi-arm IK problems, facilitating the potential of multi-arm robotic systems in real-time manipulation tasks.
nan
Article 334
Title@2025-06-25 (3): Causal Operator Discovery in Partial Differential Equations via Counterfactual Physics-Informed Neural Networks
Title: Causal Operator Discovery in Partial Differential Equations via Counterfactual Physics-Informed Neural Networks | Causal Operator Discovery in Partial Differential Equations via Counterfactual Physics-informed Neural Networks | 通过反事实物理内成神经网络在部分差别中发现因果操作器 2506.20181v1 |
Authors (1): Ronald Katende
We develop a principled framework for discovering causal structure in partial differential equations (PDEs) using physics-informed neural networks and counterfactual perturbations. Unlike classical residual minimization or sparse regression methods, our approach quantifies operator-level necessity through functional interventions on the governing dynamics. We introduce causal sensitivity indices and structural deviation metrics to assess the influence of candidate differential operators within neural surrogates. Theoretically, we prove exact recovery of the causal operator support under restricted isometry or mutual coherence conditions, with residual bounds guaranteeing identifiability. Empirically, we validate the framework on both synthetic and real-world datasets across climate dynamics, tumor diffusion, and ocean flows. Our method consistently recovers governing operators even under noise, redundancy, and data scarcity, outperforming standard PINNs and DeepONets in structural fidelity. This work positions causal PDE discovery as a tractable and interpretable inference task grounded in structural causal models and variational residual analysis.
nan
Article 335
Title@2025-06-25 (3): COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees
Title: COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees | COIN: Ungewissheitssicherung Selektive Frage-Beantwortung für Stiftungsmodelle mit wahrscheinlichen Risikogarantien | COIN: 可靠风险保障基础模型的不确定性保护选择性问题选择性回答 2506.20178v1 |
Authors (7): Zhiyuan Wang, Jinhao Duan, Qingni Wang, Xiaofeng Zhu, Tianlong Chen, Xiaoshuang Shi, Kaidi Xu
Uncertainty quantification (UQ) for foundation models is essential to identify and mitigate potential hallucinations in automatically generated text. However, heuristic UQ approaches lack formal guarantees for key metrics such as the false discovery rate (FDR) in selective prediction. Previous work adopts the split conformal prediction (SCP) framework to ensure desired coverage of admissible answers by constructing prediction sets, but these sets often contain incorrect candidates, limiting their practical utility. To address this, we propose COIN, an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question under user-specified FDR constraints. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods such as Clopper-Pearson to establish a high-probability upper bound on the true error rate (i.e., FDR). This enables the selection of the largest uncertainty threshold that ensures FDR control on test data while significantly increasing sample retention. We demonstrate COIN’s robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data across both general and multimodal text generation tasks. Furthermore, we show that employing alternative upper bound constructions and UQ strategies can further boost COIN’s power performance, which underscores its extensibility and adaptability to diverse application scenarios.
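The calibration step described above can be sketched as follows: compute a one-sided Clopper-Pearson upper bound on the error rate among retained answers, then keep the largest uncertainty threshold whose bound stays below the target FDR level. This is a simplified illustration, not COIN's implementation: the function names, the `(uncertainty, is_correct)` data layout, and the default `alpha`/`delta` values are assumptions, and the sketch ignores any correction that scanning many thresholds may require.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson_upper(k, n, delta=0.05):
    """One-sided (1 - delta) upper confidence bound on an error rate, given
    k errors among n calibration samples (solved by bisection)."""
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid  # CDF still above delta: the bound lies further right
        else:
            hi = mid
    return hi

def select_threshold(calibration, alpha=0.1, delta=0.05):
    """calibration: list of (uncertainty, is_correct) pairs. Returns the largest
    threshold t such that the upper bound on the error rate among answers with
    uncertainty <= t is at most alpha, or None if no threshold qualifies."""
    best = None
    for t in sorted({u for u, _ in calibration}):
        kept = [ok for u, ok in calibration if u <= t]
        errors = sum(1 for ok in kept if not ok)
        if clopper_pearson_upper(errors, len(kept), delta) <= alpha:
            best = t
    return best

# Synthetic calibration set (hypothetical): low-uncertainty answers are correct.
calibration = [(i / 100, i < 80) for i in range(100)]
print(select_threshold(calibration, alpha=0.1, delta=0.05))
```

At test time, an answer would be returned only if its uncertainty score is at or below the selected threshold; otherwise the model abstains.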
nan
Article 336
Title@2025-06-25 (3): Valid Selection among Conformal Sets
Title: Valid Selection among Conformal Sets | Gültige Auswahl unter konformen Sets | 在套件中有效选择 2506.20173v1 |
Authors (4): Mahmoud Hegazy, Liviu Aolaritei, Michael I. Jordan, Aymeric Dieuleveut
Conformal prediction offers a distribution-free framework for constructing prediction sets with coverage guarantees. In practice, multiple valid conformal prediction sets may be available, arising from different models or methodologies. However, selecting the most desirable set, such as the smallest, can invalidate the coverage guarantees. To address this challenge, we propose a stability-based approach that ensures coverage for the selected prediction set. We extend our results to the online conformal setting, propose several refinements in settings where additional structure is available, and demonstrate the effectiveness of our approach through experiments.
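As background for the setting, a single split conformal prediction interval can be sketched as below. This shows only the standard construction for one model with absolute-residual scores; it does not implement the stability-based selection the abstract proposes, and the function names and toy scores are hypothetical.

```python
import math

def conformal_quantile(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score;
    infinite when that rank exceeds n (too few calibration points)."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]

def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction with half-width q."""
    return (point_prediction - q, point_prediction + q)

# Calibration scores: absolute residuals |y - y_hat| on held-out data (toy values).
scores = [0.3, 0.1, 0.7, 0.5, 0.2, 0.6, 0.4]
q = conformal_quantile(scores, alpha=0.25)
print(q)
print(prediction_interval(2.0, q))
```

Running this construction with several models yields several valid intervals; picking the narrowest one after seeing them is exactly the selection step that can break the coverage guarantee.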
nan
Article 337
Title@2025-06-25 (3): Causal discovery in deterministic discrete LTI-DAE systems
Title: Causal discovery in deterministic discrete LTI-DAE systems | Kausale Entdeckung in deterministischen diskreten LTI-DAE-Systemen | LTI-DAE系统中决定性离散离散系统中的因果发现 2506.20169v1 |
Authors (2): Bala Rajesh Konkathi, Arun K. Tangirala
Discovering pure causes or driver variables in deterministic LTI systems is of vital importance in the data-driven reconstruction of causal networks. Recent work by Kathari and Tangirala (2022) formulated causal discovery as a constraint identification problem. The constraints are identified using a dynamic iterative PCA (DIPCA)-based approach for dynamical systems corrupted with Gaussian measurement errors. The DIPCA-based method works efficiently for dynamical systems devoid of any algebraic relations. However, several dynamical systems operate under feedback control and/or are coupled with conservation laws, leading to differential-algebraic (DAE) or mixed causal systems. In this work, we propose a method, the partition of variables (PoV), for causal discovery in LTI-DAE systems. This method is superior to that of Kathari and Tangirala (2022), as PoV also works for pure dynamical systems, which are devoid of algebraic equations. The proposed method identifies the causal drivers up to a minimal subset. PoV deploys DIPCA to first determine the number of algebraic relations ($n_a$), the number of dynamical relations ($n_d$) and the constraint matrix. Subsequently, the subsets are identified through an admissible partitioning of the constraint matrix, found by computing its condition number. Case studies are presented to demonstrate the effectiveness of the proposed method.
nan
Article 338
Title@2025-06-25 (3): Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes
Title: Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes | Aktives Lernen von tiefen neuralen Netzwerken durch gradient-free Schneidplanen | 通过无梯度断层计划积极学习深神经网络 2410.02145v5 |
Authors (3): Erica Zhang, Fangzhao Zhang, Mert Pilanci
Active learning methods aim to improve sample complexity in machine learning. In this work, we investigate an active learning scheme via a novel gradient-free cutting-plane training method for ReLU networks of arbitrary depth and develop a convergence theory. We demonstrate, for the first time, that cutting-plane algorithms, traditionally used in linear models, can be extended to deep neural networks despite their nonconvexity and nonlinear decision boundaries. Moreover, this training method induces the first deep active learning scheme known to achieve convergence guarantees, revealing a geometric contraction rate of the feasible set. We exemplify the effectiveness of our proposed active learning method against popular deep active learning baselines via both synthetic data experiments and a sentiment classification task on real datasets.
nan
Article 339
Title@2025-06-25 (3): Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
Title: Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners | Belohnender Graph Reasoning Prozess macht LLMs mehr Generalized Reasoners | 奖励图表说明程序使LLMs公司更普遍化理由 2503.00845v2 |
Authors (4): Miao Peng, Nuo Chen, Zongrui Suo, Jia Li
Despite significant advancements in Large Language Models (LLMs), developing advanced reasoning capabilities in LLMs remains a key challenge. Process Reward Models (PRMs) have demonstrated exceptional promise in enhancing reasoning by providing step-wise feedback, particularly in the context of mathematical reasoning. However, their application to broader reasoning domains remains understudied, largely due to the high costs associated with manually creating step-level supervision. In this work, we explore the potential of PRMs in graph reasoning problems - a domain that demands sophisticated multi-step reasoning and offers opportunities for automated step-level data generation using established graph algorithms. We introduce GraphSILO, the largest dataset for graph reasoning problems with fine-grained step-wise labels, built using automated Task-oriented Trajectories and Monte Carlo Tree Search (MCTS) to generate detailed reasoning steps with step-wise labels. Building upon this dataset, we train GraphPRM, the first PRM designed for graph reasoning problems, and evaluate its effectiveness in two key settings: inference-time scaling and reinforcement learning via Direct Preference Optimization (DPO). Experimental results show that GraphPRM significantly improves LLM performance across 13 graph reasoning tasks, delivering a 9% gain for Qwen2.5-7B and demonstrating transferability to new graph reasoning datasets and new reasoning domains like mathematical problem-solving. Notably, GraphPRM enhances LLM performance on GSM8K and Math500, underscoring the cross-domain applicability of graph-based reasoning rewards. Our findings highlight the potential of PRMs in advancing reasoning across diverse domains, paving the way for more versatile and effective LLMs.
nan
Article 340
Title@2025-06-25 (3): Counterfactual Fairness through Transforming Data Orthogonal to Bias
Title: Counterfactual Fairness through Transforming Data Orthogonal to Bias | Counterfactual Fairness durch Umwandlung von Daten Orthogonal zu Bias | 通过将数据正正向转换为比亚斯来反事实公平 2403.17852v3 |
Authors (2): Shuyi Chen, Shixiang Zhu
Machine learning models have shown exceptional prowess in solving complex issues across various domains. However, these models can sometimes exhibit biased decision-making, resulting in unequal treatment of different groups. Despite substantial research on counterfactual fairness, methods to reduce the impact of multivariate and continuous sensitive variables on decision-making outcomes are still underdeveloped. We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB), which is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications. Our approach, based on the assumption of a jointly normal distribution within a structural causal model (SCM), demonstrates that counterfactual fairness can be achieved by ensuring the data is orthogonal to the observed sensitive variables. The OB algorithm is model-agnostic, making it applicable to a wide range of machine learning models and tasks. Additionally, it includes a sparse variant to improve numerical stability through regularization. Empirical evaluations on both simulated and real-world datasets, encompassing settings with both discrete and continuous sensitive variables, show that our methodology effectively promotes fairer outcomes without compromising accuracy.
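The core idea of making data orthogonal to a sensitive variable can be illustrated with plain least-squares residualization. This sketch assumes a single continuous sensitive variable and is not the OB algorithm itself, which handles groups of variables and includes a sparse, regularized variant; the function names and toy data are hypothetical.

```python
def center(values):
    """Subtract the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def residualize(feature, sensitive):
    """Remove from `feature` its least-squares projection onto one sensitive
    variable, so the result is uncorrelated with that variable."""
    f, s = center(feature), center(sensitive)
    ss = sum(si * si for si in s)
    if ss == 0:
        return f  # constant sensitive variable: nothing to remove
    beta = sum(fi * si for fi, si in zip(f, s)) / ss
    return [fi - beta * si for fi, si in zip(f, s)]

# Toy data (hypothetical): the feature depends strongly on the sensitive variable.
sensitive = [1.0, 2.0, 3.0, 4.0, 5.0]
feature = [2.1, 3.9, 6.2, 7.8, 10.1]
adjusted = residualize(feature, sensitive)
print(sum(a * s for a, s in zip(adjusted, center(sensitive))))  # approximately 0
```

A downstream model trained on the adjusted features can no longer exploit linear dependence on the sensitive variable, which is the sense in which the pre-processing is model-agnostic.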
nan
Article 341
Title@2025-06-25 (3): Accept More, Reject Less: Reducing up to 19% Unnecessary Desk-Rejections over 11 Years of ICLR Data
Title: Accept More, Reject Less: Reducing up to 19% Unnecessary Desk-Rejections over 11 Years of ICLR Data | Mehr akzeptieren, weniger ablehnen: bis zu 19% reduzieren Unnötige Desk-Abweisungen über 11 Jahre ICLR-Daten | 接受更多,拒绝减:在11年的ICLR数据中,将不必要的书面拒绝减少19% 2506.20141v1 |
Authors (3): Xiaoyu Li, Zhao Song, Jiahao Zhang
The explosive growth of AI research has driven paper submissions at flagship AI conferences to unprecedented levels, leading many venues in 2025 (e.g., CVPR, ICCV, KDD, AAAI, IJCAI, WSDM) to enforce strict per-author submission limits and to desk-reject any excess papers by simple ID order. While this policy helps reduce reviewer workload, it may unintentionally discard valuable papers and penalize authors’ efforts. In this paper, we ask whether it is possible to follow submission limits while minimizing needless rejections. We first formalize the current desk-rejection policies as an optimization problem, and then develop a practical algorithm based on linear programming relaxation and a rounding scheme. In an extensive evaluation on 11 years of real-world ICLR (International Conference on Learning Representations) data, our method preserves up to $19.23\%$ more papers without violating any author limits. Moreover, our algorithm is highly efficient in practice, with all results on ICLR data computed in at most 53.64 seconds. Our work provides a simple and practical desk-rejection strategy that significantly reduces unnecessary rejections, demonstrating strong potential to improve current CS conference submission policies.
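A toy instance shows why rejecting by ID order can discard more papers than necessary. The sketch below assumes one reading of the baseline policy (each author keeps their first `limit` papers by ID, and a paper survives only if none of its authors rejects it) and compares it with an exhaustive search for the largest feasible set. The exhaustive search stands in for the paper's LP relaxation and rounding, which is not reproduced here; all names and data are hypothetical.

```python
from itertools import combinations

def id_order_policy(papers, limit):
    """Baseline sketch: for every author, keep their first `limit` papers by ID
    and desk-reject the rest; a paper survives only if no author rejects it."""
    by_author = {}
    for pid in sorted(papers):
        for author in papers[pid]:
            by_author.setdefault(author, []).append(pid)
    rejected = set()
    for pids in by_author.values():
        rejected.update(pids[limit:])
    return [pid for pid in sorted(papers) if pid not in rejected]

def respects_limit(kept, papers, limit):
    """True if no author appears on more than `limit` kept papers."""
    counts = {}
    for pid in kept:
        for author in papers[pid]:
            counts[author] = counts.get(author, 0) + 1
    return all(c <= limit for c in counts.values())

def best_subset(papers, limit):
    """Exhaustive search for a largest set of papers within every author's limit.
    Exponential time, so only suitable for toy instances."""
    ids = sorted(papers)
    for size in range(len(ids), -1, -1):
        for kept in combinations(ids, size):
            if respects_limit(kept, papers, limit):
                return list(kept)
    return []

# Paper ID -> set of authors (hypothetical), with a limit of one paper per author.
papers = {1: {"A"}, 2: {"A", "B"}, 3: {"B"}}
print(id_order_policy(papers, limit=1))  # [1]
print(best_subset(papers, limit=1))      # [1, 3]
```

Paper 3 is rejected by the baseline only because author B's slot is counted against paper 2, which is itself rejected on author A's account; once paper 2 is gone, paper 3 no longer violates any limit.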
nan
Article 342
Title@2025-06-25 (3): Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis
Title: Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis | Stückweise lineare Annäherung in Lernindexstrukturen: Theoretische und empirische Analyse | 进化指数结构的细线近似:理论和经验分析 2506.20139v1 |
Authors (10): Jiayong Qin, Xianyu Zhu, Qiyu Liu, Guangyi Zhang, Zhigang Cai, Jianwei Liao, Sha Hu, Jingshu Peng, Yingxia Shao, Lei Chen
A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($\epsilon$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $\epsilon$-PLA fitting algorithms remain underexplored. In this paper, we revisit $\epsilon$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $\Omega(\kappa \cdot \epsilon^2)$ on the expected segment coverage for existing $\epsilon$-PLA fitting algorithms, where $\kappa$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $\epsilon$-PLA algorithms when used in different learned data structures. Our results highlight key trade-offs among model accuracy, model size, and query performance, providing actionable guidelines for the principled design of future learned data structures.
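To make the object of study concrete, here is a minimal greedy $\epsilon$-PLA fit over sorted keys, where the value to approximate is each key's position in the array. It anchors every segment at its first point and intersects the admissible slope ranges, a simple "shrinking cone" style heuristic; this is an assumption chosen for brevity, not necessarily one of the algorithms the paper benchmarks, and optimal fitting methods generally produce fewer segments.

```python
import bisect

def fit_pla(keys, epsilon):
    """Greedy fit. `keys` must be sorted and distinct; keys[i] has position i.
    Returns segments as (start_key, start_pos, slope)."""
    segments = []
    start, lo, hi = 0, 0.0, float("inf")
    for i in range(1, len(keys)):
        dx = keys[i] - keys[start]
        dy = i - start
        # Slopes that keep point i within +/- epsilon of a line through the anchor.
        new_lo = max(lo, (dy - epsilon) / dx)
        new_hi = min(hi, (dy + epsilon) / dx)
        if new_lo <= new_hi:
            lo, hi = new_lo, new_hi
        else:
            # No single slope fits all points: close the segment, restart at i.
            segments.append((keys[start], start, (lo + hi) / 2))
            start, lo, hi = i, 0.0, float("inf")
    if keys:
        slope = 0.0 if hi == float("inf") else (lo + hi) / 2
        segments.append((keys[start], start, slope))
    return segments

def predict(segments, key):
    """Approximate position of `key` using the segment that covers it."""
    starts = [s[0] for s in segments]
    idx = max(bisect.bisect_right(starts, key) - 1, 0)
    start_key, start_pos, slope = segments[idx]
    return start_pos + slope * (key - start_key)

keys = [1, 2, 3, 10, 11, 12, 13, 50, 90, 91, 92, 200]
segments = fit_pla(keys, epsilon=1)
print(len(segments))
print(max(abs(predict(segments, k) - i) for i, k in enumerate(keys)))
```

A learned index would finish a lookup by searching only the window of width $2\epsilon$ around the predicted position, so fewer segments (higher coverage per segment) mean a smaller model for the same search cost.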
nan
Article 343
Title@2025-06-25 (3): TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis
Title: TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis | TSPulse: Dual Space Tiny Pre-Trained Modelle für die schnelle Zeit-Serien-Analyse | TSPulse: 快速时序分析的双重空间细细件前培训模型 2505.13033v2 |
Authors (8): Vijay Ekambaram, Subodh Kumar, Arindam Jati, Sumanta Mukherjee, Tomoya Sakai, Pankaj Dayama, Wesley M. Gifford, Jayant Kalagnanam
The rise of time-series pre-trained models has advanced temporal representation learning, but current state-of-the-art models are often large-scale, requiring substantial compute. We introduce TSPulse, ultra-compact time-series pre-trained models with only 1M parameters, specialized to perform strongly across classification, anomaly detection, imputation, and retrieval tasks. TSPulse introduces innovations at both the architecture and task levels. At the architecture level, it employs a dual-space masked reconstruction, learning from both time and frequency domains to capture complementary signals. This is further enhanced by a dual-embedding disentanglement, generating both detailed embeddings for fine-grained analysis and high-level semantic embeddings for broader task understanding. Notably, TSPulse’s semantic embeddings are robust to shifts in time, magnitude, and noise, which is important for robust retrieval. At the task level, TSPulse incorporates TSLens, a fine-tuning component enabling task-specific feature attention. It also introduces a multi-head triangulation technique that correlates deviations from multiple prediction heads, enhancing anomaly detection by fusing complementary model outputs. Additionally, a hybrid mask pretraining is proposed to improve zero-shot imputation by reducing pre-training bias. These architecture and task innovations collectively contribute to TSPulse’s significant performance gains: 5-16% on the UEA classification benchmarks, +20% on the TSB-AD anomaly detection leaderboard, +50% in zero-shot imputation, and +25% in time-series retrieval. Remarkably, these results are achieved with just 1M parameters (10-100X smaller than existing SOTA models) and allow GPU-free inference, setting a new standard for efficient time-series pre-trained models. The models can be accessed from https://huggingface.co/ibm-granite/granite-timeseries-tspulse-r1
Article 344
Title@2025-06-25 (3): High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data
Title: High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data | High-Resolution Live Fuel Moisture Content (LFMC) Karten für Wildfire-Risiko aus multimodalen Erdbeobachtungsdaten | 多式地球观测数据产生的野火风险高分辨率活燃料动力内容地图 2506.20132v1 |
Authors (8): Patrick Alan Johnson, Gabriel Tseng, Yawen Zhang, Heather Heward, Virginia Sjahli, Favyen Bastani, Joseph Redmon, Patrick Beukema
Wildfires are increasing in intensity and severity at an alarming rate. Recent advances in AI and publicly available satellite data enable monitoring critical wildfire risk factors globally, at high resolution and low latency. Live Fuel Moisture Content (LFMC) is a critical wildfire risk factor and is valuable for both wildfire research and operational response. However, ground-based LFMC samples are both labor-intensive and costly to acquire, resulting in sparse and infrequent updates. In this work, we explore the use of a pretrained, highly-multimodal earth-observation model for generating large-scale spatially complete (wall-to-wall) LFMC maps. Our approach achieves significant improvements over previous methods using randomly initialized models (20% reduction in RMSE). We provide an automated pipeline that enables rapid generation of these LFMC maps across the United States, and demonstrate its effectiveness in two regions recently impacted by wildfire (Eaton and Palisades).
Article 345
Title@2025-06-25 (3): Log-Linear Attention
Title: Log-Linear Attention | Log-Linear-Achtung | 日志边注意 2506.04761v2 |
Authors (6): Han Guo, Songlin Yang, Tarushii Goel, Eric P. Xing, Tri Dao, Yoon Kim
The attention mechanism in Transformers is an important primitive for accurate and scalable sequence modeling. Its quadratic-compute and linear-memory complexities, however, remain significant bottlenecks. Linear attention and state-space models enable linear-time, constant-memory sequence modeling and can moreover be trained efficiently through matmul-rich parallelization across sequence length. However, at their core these models are still RNNs, and thus their use of a fixed-size hidden state to model the context is a fundamental limitation. This paper develops log-linear attention, an attention mechanism that balances linear attention’s efficiency and the expressiveness of softmax attention. Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states. We show that with a particular growth function, log-linear attention admits a similarly matmul-rich parallel form whose compute cost is log-linear in sequence length. Log-linear attention is a general framework and can be applied on top of existing linear attention variants. As case studies, we instantiate log-linear variants of two recent architectures – Mamba-2 and Gated DeltaNet – and find they perform well compared to their linear-time variants.
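To make the idea of a "logarithmically growing set of hidden states" concrete, the sketch below splits a prefix of length `t` into buckets of power-of-two sizes, in the style of a Fenwick tree. The abstract does not name its growth function, so this partition is an assumption chosen for illustration, and `prefix_buckets` is a hypothetical helper name.

```python
def prefix_buckets(t):
    """Split the token prefix [0, t) into half-open buckets of power-of-two size.

    The most recent tokens land in the smallest buckets; one hidden state
    per bucket would give at most floor(log2(t)) + 1 states at position t.
    """
    buckets = []
    end = t
    while end > 0:
        size = end & -end              # lowest set bit of `end`
        buckets.append((end - size, end))
        end -= size
    return buckets

print(prefix_buckets(13))  # [(12, 13), (8, 12), (0, 8)]
```

Under this assumed scheme the number of buckets equals the number of set bits in `t`, so it never exceeds `t.bit_length()`, whereas a single fixed-size state would summarize the whole prefix at once.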
Article 346
Title@2025-06-25 (3): CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation
Title: CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation | CCRS: Ein Null-Shot LLM-as-a-Richter-Rahmen für eine umfassende RAG-Bewertung | CCRS: 全面RAG综合评价框架 2506.20128v1 |
Authors (1): Aashiq Muhamed
RAG systems enhance LLMs by incorporating external knowledge, which is crucial for domains that demand factual accuracy and up-to-date information. However, evaluating the multifaceted quality of RAG outputs, spanning aspects such as contextual coherence, query relevance, factual correctness, and informational completeness, poses significant challenges. Existing evaluation methods often rely on simple lexical overlap metrics, which are inadequate for capturing these nuances, or involve complex multi-stage pipelines with intermediate steps like claim extraction or require finetuning specialized judge models, hindering practical efficiency. To address these limitations, we propose CCRS (Contextual Coherence and Relevance Score), a novel suite of five metrics that utilizes a single, powerful, pretrained LLM as a zero-shot, end-to-end judge. CCRS evaluates: Contextual Coherence (CC), Question Relevance (QR), Information Density (ID), Answer Correctness (AC), and Information Recall (IR). We apply CCRS to evaluate six diverse RAG system configurations on the challenging BioASQ dataset. Our analysis demonstrates that CCRS effectively discriminates between system performances, confirming, for instance, that the Mistral-7B reader outperforms Llama variants. We provide a detailed analysis of CCRS metric properties, including score distributions, convergent/discriminant validity, tie rates, population statistics, and discriminative power. Compared to the complex RAGChecker framework, CCRS offers comparable or superior discriminative power for key aspects like recall and faithfulness, while being significantly more computationally efficient. CCRS thus provides a practical, comprehensive, and efficient framework for evaluating and iteratively improving RAG systems.
Article 347
Title@2025-06-25 (3): Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts
Title: Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts | Bewertung der Verallgemeinerung und Vertretungsstabilität in kleinen LMs durch Prompting, Fine-Tuning und Out-of-Distribution Prompts | 通过提示、罚款和销售外提示评估小型液流中小液流中普遍化和代表性稳定情况 2506.17289v2 |
Authors (2): Rahul Raja, Arpita Vats
We investigate the generalization capabilities of small language models under two popular adaptation paradigms: few-shot prompting and supervised fine-tuning. While prompting is often favored for its parameter efficiency and flexibility, it remains unclear how robust this approach is in low-resource settings and under distributional shifts. This paper presents a comparative study of prompting and fine-tuning across task formats, prompt styles, and model scales, with a focus on their behavior in both in-distribution and out-of-distribution (OOD) settings. Beyond accuracy, we analyze the internal representations learned by each approach to assess the stability and abstraction of task-specific features. Our findings highlight critical differences in how small models internalize and generalize knowledge under different adaptation strategies. This work offers practical guidance for model selection in low-data regimes and contributes empirical insight into the ongoing debate over prompting versus fine-tuning. Code for the experiments is available at the following
Article 348
Title@2025-06-25 (3): Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests
Title: Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests | Einsatz von KI-Gradern für fehlende Score-Imputation, um eine genaue Abschätzung der Fähigkeit in konstruierten Reaktionstests zu erreichen | 利用AI 级数来计算缺失计分数,以在建构反应测试中实现准确能力估算 2506.20119v1 |
Authors (2): Masaki Uto, Yuma Ito
Evaluating the abilities of learners is a fundamental objective in the field of education. In particular, there is an increasing need to assess higher-order abilities such as expressive skills and logical thinking. Constructed-response tests such as short-answer and essay-based questions have become widely used as a method to meet this demand. Although these tests are effective, they require substantial manual grading, making them both labor-intensive and costly. Item response theory (IRT) provides a promising solution by enabling the estimation of ability from incomplete score data, where human raters grade only a subset of answers provided by learners across multiple test items. However, the accuracy of ability estimation declines as the proportion of missing scores increases. Although data augmentation techniques for imputing missing scores have been explored in order to address this limitation, they often struggle with inaccuracy for sparse or heterogeneous data. To overcome these challenges, this study proposes a novel method for imputing missing scores by leveraging automated scoring technologies for accurate IRT-based ability estimation. The proposed method achieves high accuracy in ability estimation while markedly reducing manual grading workload.
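The claim that IRT can estimate ability from incomplete score data can be illustrated with a minimal Rasch (1PL) sketch. It is not the paper's imputation method: it assumes binary scores, item difficulties that are already known, and a hypothetical function name `estimate_ability`. Ungraded items are marked `None` and skipped.

```python
import math

def estimate_ability(scores, difficulties, iters=25):
    """Maximum-likelihood ability under a Rasch (1PL) model.

    scores[i] is 1, 0, or None (not graded); ungraded items are ignored.
    Requires at least one correct and one incorrect graded answer,
    otherwise the maximum-likelihood estimate is not finite.
    """
    observed = [(x, b) for x, b in zip(scores, difficulties) if x is not None]
    theta = 0.0
    for _ in range(iters):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for _, b in observed]
        grad = sum(x - p for (x, _), p in zip(observed, probs))  # score function
        info = sum(p * (1.0 - p) for p in probs)                 # Fisher information
        theta += grad / info                                     # Newton step
    return theta

theta = estimate_ability([1, 1, 0, None, 0], [-1.0, 0.0, 0.5, 1.0, 2.0])
```

Each skipped item lowers the Fisher information `info`, which is one way to see why accuracy declines as the share of missing scores grows.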
Article 349
Title@2025-06-25 (3): U-R-VEDA: Integrating UNET, Residual Links, Edge and Dual Attention, and Vision Transformer for Accurate Semantic Segmentation of CMRs
Title: U-R-VEDA: Integrating UNET, Residual Links, Edge and Dual Attention, and Vision Transformer for Accurate Semantic Segmentation of CMRs | U-R-VEDA: Integration von UNET, Residual Links, Edge und Dual Attention und Vision Transformer für präzise semantische Segmentierung von CMRs | U-R-VEDA:将UNET、残余链接、边缘和双重关注以及遗留集束弹药准确的语义分割的愿景变异器结合起来 2506.20689v1 |
Authors (2): Racheal Mukisa, Arvind K. Bansal
Artificial intelligence, including deep learning models, will play a transformative role in automated medical image analysis for the diagnosis of cardiac disorders and their management. Accurate automated delineation of cardiac images is the necessary first step for the quantification and automated diagnosis of cardiac disorders. In this paper, we propose a deep-learning-based enhanced UNet model, U-R-Veda, which integrates convolution transformations, vision transformer, residual links, channel-attention, and spatial attention, together with edge-detection-based skip-connections for accurate, fully automated semantic segmentation of cardiac magnetic resonance (CMR) images. The model extracts local features and their interrelationships using a stack of combination convolution blocks, with embedded channel and spatial attention in the convolution block, and vision transformers. Deep embedding of channel and spatial attention in the convolution block identifies important features and their spatial localization. Combining edge information with channel and spatial attention as a skip connection reduces information loss during convolution transformations. The overall model significantly improves the semantic segmentation of CMR images necessary for improved medical image analysis. An algorithm for the dual attention module (channel and spatial attention) is presented. Performance results show that U-R-Veda achieves an average accuracy of 95.2%, based on DSC metrics. The model outperforms the accuracy attained by other models, based on DSC and HD metrics, especially for the delineation of right-ventricle and left-ventricle-myocardium.
Article 350
Title@2025-06-25 (3): Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives
Title: Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives | Extrahieren von interpretierbaren Modellen aus Baumensembles: Computational and Statistical Perspectives | 从树形集合中提取解释模型:计算和统计视角 2506.20114v1 |
Authors (3): Brian Liu, Rahul Mazumder, Peter Radchenko
Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.
Article 351
Title@2025-06-25 (3): Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox
Title: Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox | Autonome Cyber-Resilienz durch ein Co-Evolutionäres Waffenrennen innerhalb einer verstärkten digitalen Twin Sandbox | 通过在强化数字双沙箱内共同推进的军备竞赛实现自动网络复原力 2506.20102v1 |
Authors (2): Malikussaid, Sutiyo
The convergence of IT and OT has created hyper-connected ICS, exposing critical infrastructure to a new class of adaptive, intelligent adversaries that render static defenses obsolete. Existing security paradigms often fail to address a foundational “Trinity of Trust,” comprising the fidelity of the system model, the integrity of synchronizing data, and the resilience of the analytical engine against sophisticated evasion. This paper introduces the ARC framework, a method for achieving analytical resilience through an autonomous, closed-loop hardening process. ARC establishes a perpetual co-evolutionary arms race within the high-fidelity sandbox of an F-SCDT. A DRL agent, the “Red Agent,” is formalized and incentivized to autonomously discover stealthy, physically-plausible attack paths that maximize process disruption while evading detection. Concurrently, an ensemble-based “Blue Agent” defender is continuously hardened via adversarial training against the evolving threats discovered by its adversary. This co-evolutionary dynamic forces both agents to become progressively more sophisticated, enabling the system to autonomously probe and patch its own vulnerabilities. Experimental validation on both the TEP and the SWaT testbeds demonstrates the framework’s superior performance. A comprehensive ablation study, supported by extensive visualizations including ROC curves and SHAP plots, reveals that the co-evolutionary process itself is responsible for a significant performance increase in detecting novel attacks. By integrating XAI to ensure operator trust and proposing a scalable F-ARC architecture, this work presents ARC not merely as an improvement, but as a necessary paradigm shift toward dynamic, self-improving security for the future of critical infrastructure.
Article 352
Title@2025-06-25 (3): What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning
Title: What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning | Was in LLM-generierten Daten zählt: Vielfalt und ihre Wirkung auf Modell Feintuning | LLM产生的数据中哪些重要:多样性及其对模拟微调的影响 2506.19262v2 |
Authors (9): Yuchang Zhu, Huazhen Zhong, Qunshu Lin, Haotong Wei, Xiaolong Sun, Zixuan Yu, Minghao Liu, Zibin Zheng, Liang Chen
With the remarkable generative capabilities of large language models (LLMs), using LLM-generated data to train downstream models has emerged as a promising approach to mitigate data scarcity in specific domains and reduce time-consuming annotations. However, recent studies have highlighted a critical issue: iterative training on self-generated data results in model collapse, where model performance degrades over time. Despite extensive research on the implications of LLM-generated data, these works often neglect the importance of data diversity, a key factor in data quality. In this work, we aim to understand the implications of the diversity of LLM-generated data on downstream model performance. Specifically, we explore how varying levels of diversity in LLM-generated data affect downstream model performance. Additionally, we investigate the performance of models trained on data that mixes different proportions of LLM-generated data, which we refer to as synthetic data. Our experimental results show that, with minimal distribution shift, moderately diverse LLM-generated data can enhance model performance in scenarios with insufficient labeled data, whereas highly diverse generated data has a negative impact. We hope our empirical findings will offer valuable guidance for future studies on LLMs as data generators.
Article 353
Title@2025-06-25 (3): MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
Title: MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations | MIRAGE: Benchmark für multimodale Informationssuche und -vernunft in sachverständigen Gesprächen in der Landwirtschaft | MIRAGE:农业专家指导下的农业多模式信息查找和说明理由基准 2506.20100v1 |
Authors (7): Vardhan Dongre, Chi Gui, Shubham Garg, Hooshang Nayyeri, Gokhan Tur, Dilek Hakkani-Tür, Vikram S. Adve
We introduce MIRAGE, a new benchmark for multimodal expert-level reasoning and decision-making in consultative interaction settings. Designed for the agriculture domain, MIRAGE captures the full complexity of expert consultations by combining natural user queries, expert-authored responses, and image-based context, offering a high-fidelity benchmark for evaluating models on grounded reasoning, clarification strategies, and long-form generation in a real-world, knowledge-intensive domain. Grounded in over 35,000 real user-expert interactions and curated through a carefully designed multi-step pipeline, MIRAGE spans diverse crop health, pest diagnosis, and crop management scenarios. The benchmark includes more than 7,000 unique biological entities, covering plant species, pests, and diseases, making it one of the most taxonomically diverse benchmarks available for vision-language models, grounded in the real world. Unlike existing benchmarks that rely on well-specified user inputs and closed-set taxonomies, MIRAGE features underspecified, context-rich scenarios with open-world settings, requiring models to infer latent knowledge gaps, handle rare entities, and either proactively guide the interaction or respond. Project Page: https://mirage-benchmark.github.io
Article 354
Title@2025-06-25 (3): BeltCrack: the First Sequential-image Industrial Conveyor Belt Crack Detection Dataset and Its Baseline with Triple-domain Feature Learning
Title: BeltCrack: the First Sequential-image Industrial Conveyor Belt Crack Detection Dataset and Its Baseline with Triple-domain Feature Learning | BeltCrack: Der erste Sequential-Image-Industrie-Förderband Crack Detection Datensatz und seine Basis mit Triple-Domain Feature Learning | BeltCrack:第一个序列图像工业相像式工业电容器带裂缝探测数据集及其基线,包括三域主文学习 2506.17892v2 |
Authors (4): Jianghong Huang, Luping Ji, Xin Ma, Mao Ye
Conveyor belts are important equipment in modern industry, widely applied in production and manufacturing. Their health is critical to operational efficiency and safety. Cracks are a major threat to belt health. For safety reasons, intelligent detection of belt cracks is attracting increasing attention. To implement intelligent detection with machine learning, real crack samples are believed to be necessary. However, existing crack datasets primarily focus on pavement scenarios or synthetic data, with no real-world industrial belt crack datasets at all. Furthermore, to validate usability and effectiveness, we propose a special baseline method with triple-domain (i.e., time-space-frequency) hierarchical feature fusion learning for the two brand-new datasets. Experimental results demonstrate the availability and effectiveness of our dataset. In addition, they show that our baseline is clearly superior to other similar detection methods. Our datasets and source code are available at https://github.com/UESTC-nnLab/BeltCrack.
Article 355
Title@2025-06-25 (3): Fine-Grained Perturbation Guidance via Attention Head Selection
Title: Fine-Grained Perturbation Guidance via Attention Head Selection | Feinkörnige Störungsführung über Aufmerksamkeitskopfauswahl | 通过 “ 关注负责人甄选 “ 指导 2506.10978v2 |
Authors (10): Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Minjae Kim, Jaewon Min, Wooseok Jang, Saungwu Lee, Sayak Paul, Susung Hong, Seungryong Kim
Recent guidance methods in diffusion models steer reverse sampling by perturbing the model to construct an implicit weak model and guide generation away from it. Among these approaches, attention perturbation has demonstrated strong empirical performance in unconditional scenarios where classifier-free guidance is not applicable. However, existing attention perturbation methods lack principled approaches for determining where perturbations should be applied, particularly in Diffusion Transformer (DiT) architectures where quality-relevant computations are distributed across layers. In this paper, we investigate the granularity of attention perturbations, ranging from the layer level down to individual attention heads, and discover that specific heads govern distinct visual concepts such as structure, style, and texture quality. Building on this insight, we propose “HeadHunter”, a systematic framework for iteratively selecting attention heads that align with user-centric objectives, enabling fine-grained control over generation quality and visual attributes. In addition, we introduce SoftPAG, which linearly interpolates each selected head’s attention map toward an identity matrix, providing a continuous knob to tune perturbation strength and suppress artifacts. Our approach not only mitigates the oversmoothing issues of existing layer-level perturbation but also enables targeted manipulation of specific visual styles through compositional head selection. We validate our method on modern large-scale DiT-based text-to-image models including Stable Diffusion 3 and FLUX.1, demonstrating superior performance in both general quality enhancement and style-specific guidance. Our work provides the first head-level analysis of attention perturbation in diffusion models, uncovering interpretable specialization within attention layers and enabling practical design of effective perturbation strategies.
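The SoftPAG step described above, linear interpolation of a head's attention map toward the identity matrix, can be written in a few lines. This is a plain-Python sketch on a single square attention map; the function name `soft_pag` and the strength parameter name `u` are assumptions, not taken from the paper's code.

```python
def soft_pag(attn, u):
    """Interpolate a square attention map toward the identity matrix.

    u = 0.0 leaves the map unchanged; u = 1.0 replaces it with the
    identity, so each token attends only to itself.
    """
    n = len(attn)
    return [
        [(1.0 - u) * attn[i][j] + u * (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]

attn = [[0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.3, 0.3, 0.4]]
perturbed = soft_pag(attn, 0.5)
```

Because both the original map and the identity have rows that sum to 1, every interpolated row still sums to 1, so intermediate values of `u` give valid attention maps between the unperturbed and fully perturbed extremes.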
Article 356
Title@2025-06-25 (3): MEL: Multi-level Ensemble Learning for Resource-Constrained Environments
Title: MEL: Multi-level Ensemble Learning for Resource-Constrained Environments | MEL: Multi-Level-Ensemble-Lernen für ressourcenbeschränkte Umgebungen | MEL:为受资源制约的环境进行多层次连锁学习 2506.20094v1 |
Authors (7): Krishna Praneet Gudipaty, Walid A. Hanafy, Kaan Ozkara, Qianlin Liang, Jesse Milzman, Prashant Shenoy, Suhas Diggavi
AI inference at the edge is becoming increasingly common for low-latency services. However, edge environments are power- and resource-constrained, and susceptible to failures. Conventional failure resilience approaches, such as cloud failover or compressed backups, often compromise latency or accuracy, limiting their effectiveness for critical edge inference services. In this paper, we propose Multi-Level Ensemble Learning (MEL), a new framework for resilient edge inference that simultaneously trains multiple lightweight backup models capable of operating collaboratively, refining each other when multiple servers are available, and independently under failures while maintaining good accuracy. Specifically, we formulate our approach as a multi-objective optimization problem with a loss formulation that inherently encourages diversity among individual models to promote mutually refining representations, while ensuring each model maintains good standalone performance. Empirical evaluations across vision, language, and audio datasets show that MEL provides performance comparable to original architectures while also providing fault tolerance and deployment flexibility across edge platforms. Our results show that our ensemble model, sized at 40% of the original model, achieves similar performance, while preserving 95.6% of ensemble accuracy in the case of failures when trained using MEL.
Article 357
Title@2025-06-25 (3): Understanding World or Predicting Future? A Comprehensive Survey of World Models
Title: Understanding World or Predicting Future? A Comprehensive Survey of World Models | Welt verstehen oder Zukunft voraussagen? Eine umfassende Übersicht über Weltmodelle | 了解世界或预测未来?世界模式综合概览 2411.14499v2 |
Authors (12): Jingtao Ding, Yunke Zhang, Yu Shang, Yuheng Zhang, Zefang Zong, Jie Feng, Yuan Yuan, Hongyuan Su, Nian Li, Nicholas Sukiennik, Fengli Xu, Yong Li
The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence. This survey offers a comprehensive review of the literature on world models. Generally, world models are regarded as tools for either understanding the present state of the world or predicting its future dynamics. This review presents a systematic categorization of world models, emphasizing two primary functions: (1) constructing internal representations to understand the mechanisms of the world, and (2) predicting future states to simulate and guide decision-making. Initially, we examine the current progress in these two categories. We then explore the application of world models in key domains, including autonomous driving, robotics, and social simulacra, with a focus on how each domain utilizes these aspects. Finally, we outline key challenges and provide insights into potential future research directions. We summarize the representative papers along with their code repositories in https://github.com/tsinghua-fib-lab/World-Model.
Article 358
Title@2025-06-25 (3): A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression
Title: A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression | Eine Übersicht über Predictive Maintenance Methods: Eine Analyse der Prognostik durch Klassifizierung und Regression | 预测维护方法调查:通过分类和递减分析预测指标 2506.20090v1 |
Authors (3): Ainaz Jamshidi, Dongchan Kim, Muhammad Arif
Predictive maintenance (PdM) has become a crucial element of modern industrial practice. PdM plays a significant role in operational dependability and cost management by decreasing unforeseen downtime and optimizing asset life cycle management. Machine learning and deep learning have enabled more precise forecasts of equipment failure and remaining useful life (RUL). Although many studies have been conducted on PdM, there has not yet been a standalone comparative study between regression- and classification-based approaches. In this review, we look across a range of PdM methodologies, while focusing more strongly on the comparative use of classification and regression methods in prognostics. While regression-based methods typically provide estimates of RUL, classification-based methods present a forecast of the probability of failure across defined time intervals. Through a comprehensive analysis of recent literature, we highlight key advancements, challenges (such as data imbalance and high-dimensional feature spaces), and emerging trends, including hybrid approaches and AI-enabled prognostic systems. This review aims to provide researchers and practitioners with an awareness of the strengths and compromises of various PdM methods and to help identify future research and build more robust, directed adaptive maintenance systems. Future work may include a systematic review of practical aspects such as public datasets, benchmarking platforms, and open-source tools to support the advancement of PdM research.
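The contrast between the two framings can be seen in how the training target is built. The sketch below recasts a regression target (RUL) as a class index over defined time intervals. The interval edges and the helper name `rul_to_class` are hypothetical and chosen only for illustration.

```python
import bisect

def rul_to_class(rul, edges):
    """Map a remaining-useful-life value to the index of its time interval.

    `edges` are sorted upper bounds of the intervals; class 0 is the
    interval in which failure is nearest, and the last class collects
    every RUL value beyond the largest edge.
    """
    return bisect.bisect_left(edges, rul)

edges = [10, 30, 100]   # e.g. operating cycles: <=10, 11-30, 31-100, >100
labels = [rul_to_class(r, edges) for r in (5, 10, 11, 100, 250)]
print(labels)  # [0, 0, 1, 2, 3]
```

A regressor would be trained on the raw RUL values, while a classifier trained on these labels would output one probability per interval.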
Article 359
Title@2025-06-25 (3): Attack Smarter: Attention-Driven Fine-Grained Webpage Fingerprinting Attacks
Title: Attack Smarter: Attention-Driven Fine-Grained Webpage Fingerprinting Attacks | Attack Smarter: aufmerksamkeitsgetriebene feinkörnige Webseiten-Fingerprinting-Angriffe | 攻击智能:引人注意的精美网页指纹印攻击 2506.20082v1 |
Authors (3): Yali Yuan, Weiyi Zou, Guang Cheng
Website Fingerprinting (WF) attacks aim to infer which websites a user is visiting by analyzing traffic patterns, thereby compromising user anonymity. Although this technique has been demonstrated to be effective in controlled experimental environments, it remains largely limited to small-scale scenarios, typically restricted to recognizing website homepages. In practical settings, however, users frequently access multiple subpages in rapid succession, often before previous content fully loads. WebPage Fingerprinting (WPF) generalizes the WF framework to large-scale environments by modeling subpages of the same site as distinct classes. These pages often share similar page elements, resulting in lower inter-class variance in traffic features. Furthermore, we consider multi-tab browsing scenarios, in which a single trace encompasses multiple categories of webpages. This leads to overlapping traffic segments, and similar features may appear in different positions within the traffic, thereby increasing the difficulty of classification. To address these challenges, we propose an attention-driven fine-grained WPF attack, named ADWPF. Specifically, during the training phase, we apply targeted augmentation to salient regions of the traffic based on attention maps, including attention cropping and attention masking. ADWPF then extracts low-dimensional features from both the original and augmented traffic and applies self-attention modules to capture the global contextual patterns of the trace. Finally, to handle the multi-tab scenario, we employ residual attention to generate class-specific representations of webpages occurring at different temporal positions. Extensive experiments demonstrate that the proposed method consistently surpasses state-of-the-art baselines across datasets of different scales.
Article 360
Title@2025-06-25 (3): Federated Learning Clients Clustering with Adaptation to Data Drifts
Title: Federated Learning Clients Clustering with Adaptation to Data Drifts | Federated Learning Clients Clustering mit Anpassung an Daten Drifts | 采用适应数据流数据组合组合的联邦学习客户 2411.01580v2 |
Authors (6): Minghao Li, Dmitrii Avdiukhin, Rana Shahout, Nikita Ivkin, Vladimir Braverman, Minlan Yu
Federated Learning (FL) trains deep models across edge devices without centralizing raw data, preserving user privacy. However, client heterogeneity slows down convergence and limits global model accuracy. Clustered FL (CFL) mitigates this by grouping clients with similar representations and training a separate model for each cluster. In practice, client data evolves over time, a phenomenon we refer to as data drift, which breaks cluster homogeneity and degrades performance. Data drift can take different forms depending on whether changes occur in the output values, the input features, or the relationship between them. We propose FIELDING, a CFL framework for handling diverse types of data drift with low overhead. FIELDING detects drift at individual clients and performs selective re-clustering to balance cluster quality and model performance, while remaining robust to malicious clients and varying levels of heterogeneity. Experiments show that FIELDING improves final model accuracy by 1.9-5.9% and achieves target accuracy 1.16x-2.23x faster than existing state-of-the-art CFL methods.
Article 361
Title@2025-06-25 (3): Quantum-Classical Hybrid Quantized Neural Network
Title: Quantum-Classical Hybrid Quantized Neural Network | Quantum-klassische Hybride Quantisiertes Neuronales Netzwerk | 量子-经典混合量化神经网络 2506.18240v2 |
Authors (7): Wenxin Li, Chuan Wang, Hongdong Zhu, Qi Gao, Yin Ma, Hai Wei, Kai Wen
In this work, we present a novel Quadratic Binary Optimization (QBO) model for quantized neural network training, enabling the use of arbitrary activation and loss functions through spline interpolation. We introduce Forward Interval Propagation (FIP), a method designed to tackle the challenges of non-linearity and the multi-layer composite structure in neural networks by discretizing activation functions into linear subintervals. This approach preserves the universal approximation properties of neural networks while allowing complex nonlinear functions to be optimized using quantum computers, thus broadening their applicability in artificial intelligence. From an optimization perspective, we provide theoretical upper bounds on the approximation error and the number of Ising spins required by deriving the sample complexity of the empirical risk minimization problem. A significant challenge in solving the associated Quadratic Constrained Binary Optimization (QCBO) model on a large scale is the presence of numerous constraints. When employing the penalty method to handle these constraints, tuning a large number of penalty coefficients becomes a critical hyperparameter optimization problem, increasing computational complexity and potentially affecting solution quality. To address this, we employ the Quantum Conditional Gradient Descent (QCGD) algorithm, which leverages quantum computing to directly solve the QCBO problem. We prove the convergence of QCGD under a quantum oracle with randomness and bounded variance in objective value, as well as under limited precision constraints in the coefficient matrix. Additionally, we provide an upper bound on the Time-To-Solution for the QCBO solving process. Experimental results using a coherent Ising machine (CIM) demonstrate a 94.95% accuracy on the Fashion MNIST classification task, with only 1.1-bit precision.
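As a rough illustration of what discretizing an activation function into linear subintervals can look like, the sketch below interpolates tanh piecewise-linearly on a uniform grid and measures the approximation error. It is not the paper's FIP formulation or its QBO encoding; the interval [-4, 4], the segment counts, and the helper names `make_piecewise_linear` and `max_abs_error` are assumptions made for illustration.

```python
import math

def make_piecewise_linear(f, lo, hi, n_segments):
    """Build a piecewise-linear interpolant of f on [lo, hi] with uniform breakpoints."""
    width = (hi - lo) / n_segments
    knots = [lo + i * width for i in range(n_segments + 1)]
    values = [f(x) for x in knots]

    def approx(x):
        # Clamp to the approximation interval (an assumption of this sketch).
        x = min(max(x, lo), hi)
        # Index of the subinterval containing x.
        i = min(int((x - lo) / width), n_segments - 1)
        # Relative position inside the subinterval, in [0, 1].
        t = (x - knots[i]) / width
        return (1 - t) * values[i] + t * values[i + 1]

    return approx

def max_abs_error(f, approx, lo, hi, n_probe=2001):
    """Largest absolute gap between f and its interpolant on a probe grid."""
    step = (hi - lo) / (n_probe - 1)
    return max(abs(f(lo + k * step) - approx(lo + k * step)) for k in range(n_probe))

coarse = make_piecewise_linear(math.tanh, -4.0, 4.0, 4)
fine = make_piecewise_linear(math.tanh, -4.0, 4.0, 32)
err_coarse = max_abs_error(math.tanh, coarse, -4.0, 4.0)
err_fine = max_abs_error(math.tanh, fine, -4.0, 4.0)
```

Refining the grid shrinks the error, which is the trade-off the abstract alludes to: more subintervals give a tighter approximation but would require more binary variables to select the active segment.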
Article 362
Title@2025-06-25 (3): mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
Title: mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks | mSTEB: Massive mehrsprachige Bewertung von LLMs zu Sprach- und Textaufgaben | mSTEB: 对关于发言和文本任务LLM女士进行大规模多语种评价 2506.08400v2 |
Authors (7): Luel Hagos Beyene, Vivek Verma, Min Ma, Jesujoba O. Alabi, Fabian David Schmidt, Joyce Nakatumba-Nabende, David Ifeoluwa Adelani
Large language models (LLMs) have demonstrated impressive performance on a wide range of tasks, including in multimodal settings such as speech. However, their evaluation is often limited to English and a few high-resource languages. For low-resource languages, there is no standardized evaluation benchmark. In this paper, we address this gap by introducing mSTEB, a new benchmark to evaluate the performance of LLMs on a wide range of tasks covering language identification, text classification, question answering, and translation on both speech and text modalities. We evaluated the performance of leading LLMs such as Gemini 2.0 Flash and GPT-4o (Audio) and state-of-the-art open models such as Qwen 2 Audio and Gemma 3 27B. Our evaluation shows a wide gap in performance between high-resource and low-resource languages, especially for languages spoken in Africa and the Americas/Oceania. Our findings show that more investment is needed to address their under-representation in LLM coverage.
Article 363
Title@2025-06-25 (3): A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs
Title: A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs | Ein modulares Multitask-Reasoning-Framework Integrating Spatio-temporal Models und LLMs | 纳入时空空间模型和LLMs的模块多任务解释框架 2506.20073v1 |
Authors (6): Kethmi Hirushini Hettige, Jiahao Ji, Cheng Long, Shili Xiang, Gao Cong, Jingyuan Wang
Spatio-temporal data mining plays a pivotal role in informed decision making across diverse domains. However, existing models are often restricted to narrow tasks, lacking the capacity for multi-task inference and complex long-form reasoning that require generation of in-depth, explanatory outputs. These limitations restrict their applicability to real-world, multi-faceted decision scenarios. In this work, we introduce STReason, a novel framework that integrates the reasoning strengths of large language models (LLMs) with the analytical capabilities of spatio-temporal models for multi-task inference and execution. Without requiring task-specific finetuning, STReason leverages in-context learning to decompose complex natural language queries into modular, interpretable programs, which are then systematically executed to generate both solutions and detailed rationales. To facilitate rigorous evaluation, we construct a new benchmark dataset and propose a unified evaluation framework with metrics specifically designed for long-form spatio-temporal reasoning. Experimental results show that STReason significantly outperforms advanced LLM baselines across all metrics, particularly excelling in complex, reasoning-intensive spatio-temporal scenarios. Human evaluations further validate STReason’s credibility and practical utility, demonstrating its potential to reduce expert workload and broaden the applicability to real-world spatio-temporal tasks. We believe STReason provides a promising direction for developing more capable and generalizable spatio-temporal reasoning systems.
Article 364
Title@2025-06-25 (3): Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges
Title: Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges | Leichte Fußgängererkennung in Sicht- und Infrarotbild-Feeds: Probleme und Herausforderungen | 可见和红外图像输入中的低亮害虫探测:问题和挑战 2311.08557v3 |
Authors (2): Thangarajah Akilan, Hrishikesh Vachhani
Pedestrian detection has become a cornerstone for several high-level tasks, including autonomous driving, intelligent transportation, and traffic surveillance. Several works have focused on pedestrian detection using visible images, mainly in the daytime. However, this task is very intriguing when the environmental conditions change to poor lighting or nighttime. Recently, new ideas have emerged for using alternative sources, such as Far InfraRed (FIR) temperature sensor feeds, to detect pedestrians in low-light conditions. This study reviews recent developments in low-light pedestrian detection approaches. It systematically categorizes and analyses various algorithms, from region-based to non-region-based and graph-based learning methodologies, highlighting their methodologies, implementation issues, and challenges. It also outlines the key benchmark datasets that can be used for research and development of advanced pedestrian detection algorithms, particularly in low-light situations.
Article 365
Title@2025-06-25 (3): Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision
Title: Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision | Multimodale Informationen Retrieval für offene Welt mit Edit Distanz Schwache Überwachung | 编辑远程弱力监督的开放世界多模式信息检索器 2506.20070v1 |
Authors (2): KMA Solaiman, Bharat Bhargava
Existing multi-media retrieval models either rely on creating a common subspace with modality-specific representation models or require schema mapping among modalities to measure similarities among multi-media data. Our goal is to avoid the annotation overhead incurred from considering retrieval as a supervised classification task and re-use the pretrained encoders in large language models and vision tasks. We propose “FemmIR”, a framework to retrieve multimodal results relevant to information needs expressed with multimodal queries by example without any similarity label. Such identification is necessary for real-world applications where data annotations are scarce and satisfactory performance is required without fine-tuning with a common framework across applications. We curate a new dataset called MuQNOL for benchmarking progress on this task. Our technique is based on weak supervision introduced through edit distance between samples: graph edit distance can be modified to consider the cost of replacing a data sample in terms of its properties, and relevance can be measured through the implicit signal from the amount of edit cost among the objects. Unlike metric learning or encoding networks, FemmIR re-uses the high-level properties and maintains the property value and relationship constraints with a multi-level interaction score between data samples and the query example provided by the user. We empirically evaluate FemmIR on a missing person use case with MuQNOL. FemmIR performs comparably to similar retrieval systems in delivering on-demand retrieval results with exact and approximate similarities while using the existing property identifiers in the system.
Article 366
Title@2025-06-25 (3): Thought Anchors: Which LLM Reasoning Steps Matter?
Title: Thought Anchors: Which LLM Reasoning Steps Matter? | Thought Anchors: Welche LLM-Gründungsschritte sind wichtig? | 何为理据步骤? 2506.19143v2 |
Authors (4): Paul C. Bogdan, Uzay Macar, Neel Nanda, Arthur Conmy
Reasoning large language models have recently achieved state-of-the-art performance in many fields. However, their long-form chain-of-thought reasoning creates interpretability challenges as each generated token depends on all previous ones, making the computation harder to decompose. We argue that analyzing reasoning traces at the sentence level is a promising approach to understanding reasoning processes. We present three complementary attribution methods: (1) a black-box method measuring each sentence’s counterfactual importance by comparing final answers across 100 rollouts conditioned on the model generating that sentence or one with a different meaning; (2) a white-box method of aggregating attention patterns between pairs of sentences, which identified “broadcasting” sentences that receive disproportionate attention from all future sentences via “receiver” attention heads; (3) a causal attribution method measuring logical connections between sentences by suppressing attention toward one sentence and measuring the effect on each future sentence’s tokens. Each method provides evidence for the existence of thought anchors, reasoning steps that have outsized importance and that disproportionately influence the subsequent reasoning process. These thought anchors are typically planning or backtracking sentences. We provide an open-source tool (www.thought-anchors.com) for visualizing the outputs of our methods, and present a case study showing converging patterns across methods that map how a model performs multi-step reasoning. The consistency across methods demonstrates the potential of sentence-level analysis for a deeper understanding of reasoning models.
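The black-box method in (1) compares final answers across two sets of rollouts. A minimal sketch of such a comparison follows, assuming the rollouts have already been generated and reduced to final-answer strings. The abstract does not say which distance is used; total variation distance between the two empirical answer distributions is an assumption here, as are the function names and the toy data.

```python
from collections import Counter

def answer_distribution(answers):
    """Empirical distribution over final answers from a list of rollouts."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: count / total for answer, count in counts.items()}

def counterfactual_importance(answers_with, answers_without):
    """Total variation distance between the two answer distributions.

    answers_with:    final answers from rollouts that keep the sentence.
    answers_without: final answers from rollouts where the sentence was
                     replaced by one with a different meaning.
    """
    p = answer_distribution(answers_with)
    q = answer_distribution(answers_without)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

# Toy data (hypothetical): 100 rollouts per condition.
with_sentence = ["42"] * 90 + ["17"] * 10
without_sentence = ["42"] * 40 + ["17"] * 60
score = counterfactual_importance(with_sentence, without_sentence)
```

A score near 0 would mean the sentence barely changes where the reasoning ends up; a score near 1 would mean the final answer depends almost entirely on it.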
Article 367
Title@2025-06-25 (3): Conformal Prediction with Upper and Lower Bound Models
Title: Conformal Prediction with Upper and Lower Bound Models | Konforme Vorhersage mit oberen und unteren Bound-Modellen | 与上下下两界模型的非正规预测 2503.04071v2 |
Authors (5): Miao Li, Michael Klamkin, Mathieu Tanneau, Reza Zandehshahvar, Pascal Van Hentenryck
This paper studies a Conformal Prediction (CP) methodology for building prediction intervals in a regression setting, given only deterministic lower and upper bounds on the target variable. It proposes a new CP mechanism (CPUL) that goes beyond post-processing by adopting a model selection approach over multiple nested interval construction methods. Paradoxically, many well-established CP methods, including CPUL, may fail to provide adequate coverage in regions where the bounds are tight. To remedy this limitation, the paper proposes an optimal thresholding mechanism, OMLT, that adjusts CPUL intervals in tight regions with undercoverage. The combined CPUL-OMLT is validated on large-scale learning tasks where the goal is to bound the optimal value of a parametric optimization problem. The experimental results demonstrate substantial improvements over baseline methods across various datasets.
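For context, the simple post-processing baseline that the abstract contrasts CPUL with can be sketched as standard split conformal prediction whose intervals are intersected with the known bounds. This is not CPUL or OMLT; the absolute-residual score, the clipping of the point prediction into the bounds, and all function names are assumptions of the sketch.

```python
import math

def clip(value, lower, upper):
    """Project a value onto the interval [lower, upper]."""
    return min(max(value, lower), upper)

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[k - 1]

def calibrate(preds, lowers, uppers, targets, alpha):
    """Calibrate on held-out data using absolute residuals of the clipped prediction."""
    scores = [
        abs(y - clip(p, lo, hi))
        for p, lo, hi, y in zip(preds, lowers, uppers, targets)
    ]
    return conformal_quantile(scores, alpha)

def predict_interval(pred, lower, upper, q):
    """Split-conformal interval intersected with the deterministic bounds."""
    center = clip(pred, lower, upper)
    return max(center - q, lower), min(center + q, upper)

q = calibrate([0.0, 0.0, 0.0, 0.0], [1.0] * 4, [3.0] * 4, [1.0, 2.0, 3.0, 2.0], alpha=0.5)
interval = predict_interval(7.0, 4.0, 6.0, 0.5)
```

Clipping the prediction before scoring keeps the interval non-empty and never increases the residual when the target lies within its bounds.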
Article 368
Title@2025-06-24 (2): Identifying Heterogeneity in Distributed Learning
Title: Identifying Heterogeneity in Distributed Learning | Heterogenität im verteilten Lernen identifizieren | 确定分布式学习中的差异性 2506.16394v3 |
Authors (3): Zelin Xiao, Jia Gu, Song Xi Chen
We study methods for identifying heterogeneous parameter components in distributed M-estimation with minimal data transmission. One is based on a re-normalized Wald test, which is shown to be consistent as long as the number of distributed data blocks $K$ is of a smaller order than the minimum block sample size and the level of heterogeneity is dense. The second one is an extreme contrast test (ECT) based on the difference between the largest and smallest component-wise estimated parameters among data blocks. By introducing a sample splitting procedure, the ECT can avoid the bias accumulation arising from the M-estimation procedures, and exhibits consistency for $K$ being much larger than the sample size while the heterogeneity is sparse. The ECT procedure is easy to operate and communication-efficient. A combination of the Wald and the extreme contrast tests is formulated to attain more robust power under varying levels of sparsity of the heterogeneity. We also conduct intensive numerical experiments to compare the family-wise error rate (FWER) and the power of the proposed methods. Additionally, we conduct a case study to present the implementation and validity of the proposed methods.
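The quantity underlying the extreme contrast test, the gap between the largest and smallest component-wise estimates across blocks, can be sketched as follows. The sketch uses block sample means as a stand-in M-estimator and deliberately omits the sample-splitting step and the calibration of a rejection threshold, neither of which is specified in the abstract; the function names and toy data are hypothetical.

```python
def block_means(block):
    """Component-wise sample mean of one data block (a simple M-estimator)."""
    n = len(block)
    dim = len(block[0])
    return [sum(row[j] for row in block) / n for j in range(dim)]

def extreme_contrast(blocks):
    """Per component: largest minus smallest block estimate."""
    estimates = [block_means(b) for b in blocks]
    dim = len(estimates[0])
    return [
        max(e[j] for e in estimates) - min(e[j] for e in estimates)
        for j in range(dim)
    ]

# Toy data: three blocks, two parameters; only the second parameter differs.
blocks = [
    [[1.0, 0.0], [1.0, 2.0]],  # block means: [1.0, 1.0]
    [[1.0, 4.0], [1.0, 6.0]],  # block means: [1.0, 5.0]
    [[1.0, 1.0], [1.0, 1.0]],  # block means: [1.0, 1.0]
]
contrasts = extreme_contrast(blocks)
```

Each block only needs to transmit its estimate vector, which is consistent with the communication efficiency the abstract claims for the ECT.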
Article 369
Title@2025-06-24 (2): Supervised Coupled Matrix-Tensor Factorization (SCMTF) for Computational Phenotyping of Patient Reported Outcomes in Ulcerative Colitis
Title: Supervised Coupled Matrix-Tensor Factorization (SCMTF) for Computational Phenotyping of Patient Reported Outcomes in Ulcerative Colitis | Überwachte gekoppelte Matrix-Tensor-Fabrikation (SCMTF) für Computational Phenotyping von Patienten berichteten Ergebnisse bei Ulcerative Colitis | 受监督的用于计算表性科结结结结结结结结结结结结结果的病人报告结果的计算式基因分析的矩阵-传感器系数(SCMTF) 2506.20065v1 |
Authors (4): Cristian Minoccheri, Sophia Tesic, Kayvan Najarian, Ryan Stidham
Phenotyping is the process of distinguishing groups of patients to identify different types of disease progression. A recent trend employs low-rank matrix and tensor factorization methods for their capability of dealing with multi-modal, heterogeneous, and missing data. Symptom quantification is crucial for understanding patient experiences in inflammatory bowel disease, especially in conditions such as ulcerative colitis (UC). However, patient-reported symptoms are typically noisy, subjective, and significantly more sparse than other data types. For this reason, they are usually not included in phenotyping and other machine learning methods. This paper explores the application of computational phenotyping to leverage Patient-Reported Outcomes (PROs) using a novel supervised coupled matrix-tensor factorization (SCMTF) method, which integrates temporal PROs and temporal labs with static features to predict medication persistence in ulcerative colitis. This is the first tensor-based method that is both supervised and coupled; it is also the first application to the UC domain and the first application to PROs. We use a deep learning framework that makes the model flexible and easy to train. The proposed method allows us to handle the large amount of missing data in the PROs. The best model predicts changes in medication 8 and 20 months in the future with AUCs of 0.853 and 0.803 on the test set, respectively. We derive interpretable phenotypes consisting of static features and temporal features (including their temporal patterns). We show that low-rank matrix and tensor-based phenotyping can be successfully applied to the UC domain and to highly missing PRO data. We identify phenotypes useful to predict medication persistence; these phenotypes include several symptom variables, showing that PROs contain relevant information that is usually discarded.
Article 370
Title@2025-06-24 (2): Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models
Title: Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models | Lernen von Instruction-Following-Richtlinien durch offenes Instruction-Relabeling mit großen Sprachmodellen | 通过不限名额指令与大语言模式重新标签 2506.20061v1 |
Authors (4): Zhicheng Zhang, Ziyan Wang, Yali Du, Fei Fang
Developing effective instruction-following policies in reinforcement learning remains challenging due to the reliance on extensive human-labeled instruction datasets and the difficulty of learning from sparse rewards. In this paper, we propose a novel approach that leverages the capabilities of large language models (LLMs) to automatically generate open-ended instructions retrospectively from previously collected agent trajectories. Our core idea is to employ LLMs to relabel unsuccessful trajectories by identifying meaningful subtasks the agent has implicitly accomplished, thereby enriching the agent’s training data and substantially alleviating reliance on human annotations. Through this open-ended instruction relabeling, we efficiently learn a unified instruction-following policy capable of handling diverse tasks within a single policy. We empirically evaluate our proposed method in the challenging Craftax environment, demonstrating clear improvements in sample efficiency, instruction coverage, and overall policy performance compared to state-of-the-art baselines. Our results highlight the effectiveness of utilizing LLM-guided open-ended instruction relabeling to enhance instruction-following reinforcement learning.
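The relabeling idea can be sketched as a small data transformation: failed trajectories are paired with new instructions describing what the agent actually accomplished. In the sketch below the LLM is replaced by a stub, `toy_labeler`, and the `Trajectory` structure, its `events` field, and all function names are hypothetical; the paper's prompts, trajectory format, and training loop are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trajectory:
    instruction: str
    events: List[str]  # hypothetical symbolic log of what happened in the episode
    success: bool

def relabel_failures(
    trajectories: List[Trajectory],
    describe_achievements: Callable[[List[str]], List[str]],
) -> List[Trajectory]:
    """Turn each failed trajectory into successful examples for other instructions."""
    relabeled = []
    for traj in trajectories:
        if traj.success:
            continue  # already a valid training example for its own instruction
        for new_instruction in describe_achievements(traj.events):
            relabeled.append(Trajectory(new_instruction, traj.events, True))
    return relabeled

def toy_labeler(events: List[str]) -> List[str]:
    """Stand-in for the LLM call: one instruction per distinct logged event."""
    return [f"achieve: {event}" for event in dict.fromkeys(events)]

data = [
    Trajectory("craft a pickaxe", ["collect wood", "place table", "collect wood"], False),
    Trajectory("collect wood", ["collect wood"], True),
]
extra = relabel_failures(data, toy_labeler)
```

In this toy run the single failed episode yields two additional positive examples, which is the mechanism by which relabeling densifies sparse rewards.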
Article 371
Title@2025-06-24 (2): The Alignment Trap: Complexity Barriers
Title: The Alignment Trap: Complexity Barriers | Die Alignment-Falle: Komplexitätsbarrieren | 协调陷阱:复杂障碍 2506.10304v2 |
Authors (1): Jasper Yao
This paper argues that AI alignment is not merely difficult, but is founded on a fundamental logical contradiction. We first establish The Enumeration Paradox: we use machine learning precisely because we cannot enumerate all necessary safety rules, yet making ML safe requires examples that can only be generated from the very enumeration we admit is impossible. This paradox is then confirmed by a set of five independent mathematical proofs, or “pillars of impossibility.” Our main results show that: (1) Geometric Impossibility: The set of safe policies has measure zero, a necessary consequence of projecting infinite-dimensional world-context requirements onto finite-dimensional models. (2) Computational Impossibility: Verifying a policy’s safety is coNP-complete, even for non-zero error tolerances. (3) Statistical Impossibility: The training data required for safety (abundant examples of rare disasters) is a logical contradiction and thus unobtainable. (4) Information-Theoretic Impossibility: Safety rules contain more incompressible, arbitrary information than any feasible network can store. (5) Dynamic Impossibility: The optimization process for increasing AI capability is actively hostile to safety, as the gradients for the two objectives are generally anti-aligned. Together, these results demonstrate that the pursuit of safe, highly capable AI is not a matter of overcoming technical hurdles, but of confronting fundamental, interlocking barriers. The paper concludes by presenting a strategic trilemma that these impossibilities force upon the field. A formal verification of the core theorems in Lean4 is currently in progress.
Article 372
Title@2025-06-24 (2): Universal pre-training by iterated random computation
Title: Universal pre-training by iterated random computation | Universelles Pre-Training durch iterierte Zufallsberechnung | 通过迭代随机计算进行通用预培训 2506.20057v1 |
Authors (1): Peter Bloem
We investigate the use of randomly generated data for the sake of pre-training a model. We justify this approach theoretically from the perspective of algorithmic complexity, building on recent research that shows that sequence models can be trained to approximate Solomonoff induction. We derive similar, but complementary theoretical results. We show empirically that synthetically generated data can be used to pre-train a model before the data is seen. We replicate earlier results that models trained this way show zero-shot in-context learning across a variety of datasets, and that this performance improves with scale. We extend earlier results to real-world data, and show that finetuning a model after pre-training offers faster convergence and better generalization.
Article 373
Title@2025-06-24 (2): Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization
Title: Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization | Machine-Learning-Assisted Photonic Device Development: Ein multiskaliger Ansatz von der Theorie zur Charakterisierung | 机学辅助光学设备开发:从理论到定性的多尺度方法 2506.20056v1 |
Authors (19): Yuheng Chen, Alexander Montes McNeil, Taehyuk Park, Blake A. Wilson, Vaishnavi Iyer, Michael Bezick, Jae-Ik Choi, Rohan Ojha, Pravin Mahendran, Daksh Kumar Singh, Geetika Chitturi, Peigang Chen, Trang Do, Alexander V. Kildishev, Vladimir M. Shalaev, Michael Moebius, Wenshan Cai, Yongmin Liu, Alexandra Boltasseva
Photonic device development (PDD) has achieved remarkable success in designing and implementing new devices for controlling light across various wavelengths, scales, and applications, including telecommunications, imaging, sensing, and quantum information processing. PDD is an iterative, five-step process that consists of: i) deriving device behavior from design parameters, ii) simulating device performance, iii) finding the optimal candidate designs from simulations, iv) fabricating the optimal device, and v) measuring device performance. Classically, all these steps involve Bayesian optimization, material science, control theory, and direct physics-driven numerical methods. However, many of these techniques are computationally intractable, monetarily costly, or difficult to implement at scale. In addition, PDD suffers from large optimization landscapes, uncertainties in structural or optical characterization, and difficulties in implementing robust fabrication processes. However, the advent of machine learning over the past decade has provided novel, data-driven strategies for tackling these challenges, including surrogate estimators for speeding up computations, generative modeling for noisy measurement modeling and data augmentation, reinforcement learning for fabrication, and active learning for experimental physical discovery. In this review, we present a comprehensive perspective on these methods to enable machine-learning-assisted PDD (ML-PDD) for efficient design optimization with powerful generative models, fast simulation and characterization modeling under noisy measurements, and reinforcement learning for fabrication. This review will provide researchers from diverse backgrounds with valuable insights into this emerging topic, fostering interdisciplinary efforts to accelerate the development of complex photonic devices and systems.
Article 374
Title@2025-06-24 (2): MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models
Title: MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models | MegaFold: System-Level-Optimierungen zur Beschleunigung von Proteinstruktur-Vorhersagemodellen | MegaFold:加速蛋白质结构结构预测模型的全系统优化 2506.20686v1 |
Authors (5): Hoa La, Ahan Gupta, Alex Morehead, Jianlin Cheng, Minjia Zhang
Protein structure prediction models such as AlphaFold3 (AF3) push the frontier of biomolecular modeling by incorporating science-informed architectural changes to the transformer architecture. However, these advances come at a steep system cost, introducing compute- and memory-intensive operators, 2D attention mechanisms, and retrieval-augmented data pipelines, which collectively hinder the scalability of AF3 training. In this work, we present MegaFold, a cross-platform system to accelerate AF3 training. MegaFold tackles key bottlenecks through ahead-of-time caching to eliminate GPU idle time from the retrieval-augmented data pipeline, Triton-based kernels for memory-efficient EvoAttention on heterogeneous devices, and deep fusion for common and critical small operators in AF3. Evaluation on both NVIDIA H200 and AMD MI250 GPUs shows that MegaFold reduces peak memory usage of AF3 training by up to 1.23$\times$ and improves per-iteration training time by up to 1.73$\times$ and 1.62$\times$, respectively. More importantly, MegaFold enables training on 1.35$\times$ longer sequence lengths compared to PyTorch baselines without running out of memory, significantly improving the scalability of modern protein folding models. We open-source our code at https://github.com/Supercomputing-System-AI-Lab/MegaFold/.
Article 375
Title@2025-06-24 (2): A Principled Path to Fitted Distributional Evaluation
Title: A Principled Path to Fitted Distributional Evaluation | Ein prinzipieller Weg zur integrierten Verteilungsevaluierung | 合格分配评价的一条原则性道路 2506.20048v1 |
Authors (4): Sungee Hong, Jiayi Wang, Zhengling Qi, Raymond Ka Wai Wong
In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted-Q evaluation – developed for expectation-based reinforcement learning – to the distributional OPE setting. We refer to this extension as fitted distributional evaluation (FDE). While only a few related approaches exist, there remains no unified framework for designing FDE methods. To fill this gap, we present a set of guiding principles for constructing theoretically grounded FDE methods. Building on these principles, we develop several new FDE methods with convergence analysis and provide theoretical justification for existing methods, even in non-tabular environments. Extensive experiments, including simulations on linear quadratic regulators and Atari games, demonstrate the superior performance of the FDE methods.
Article 376
Title@2025-06-24 (2): GNN’s Uncertainty Quantification using Self-Distillation
Title: GNN’s Uncertainty Quantification using Self-Distillation | Die Unbestimmtheitsquantifizierung von GNN mittels Selbstdestillation | GNN 使用自处理法对不确定性进行量化 2506.20046v1 |
Authors (2): Hirad Daneshvar, Reza Samavi
Graph Neural Networks (GNNs) have shown remarkable performance in the healthcare domain. However, what remains challenging is quantifying the predictive uncertainty of GNNs, which is an important aspect of trustworthiness in clinical settings. While Bayesian and ensemble methods can be used to quantify uncertainty, they are computationally expensive. Additionally, the disagreement metric used by ensemble methods to compute uncertainty cannot capture the diversity of models in an ensemble network. In this paper, we propose a novel method, based on knowledge distillation, to quantify GNNs’ uncertainty more efficiently and with higher precision. We apply self-distillation, where the same network serves as both the teacher and student models, thereby avoiding the need to train several networks independently. To ensure the impact of self-distillation, we develop an uncertainty metric that captures the diverse nature of the network by assigning different weights to each GNN classifier. We experimentally evaluate the precision, performance, and ability of our approach in distinguishing out-of-distribution data on two graph datasets: MIMIC-IV and Enzymes. The evaluation results demonstrate that the proposed method can effectively capture the predictive uncertainty of the model while having performance similar to that of the MC Dropout and ensemble methods. The code is publicly available at https://github.com/tailabTMU/UQ_GNN.
Article 377
Title@2025-06-24 (2): PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning
Title: PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | PocketVina ermöglicht skalierbare und hochgenaue physikalisch gültige Docking durch Multi-Pocket-Konditionierung | PocketVina 通过多盘附加条件, 使可缩放和高度精确的物理有效折叠 2506.20043v1 |
Authors (4): Ahmet Sarigun, Bora Uyar, Vedran Franke, Altuna Akalin
Sampling physically valid ligand-binding poses remains a major challenge in molecular docking, particularly for unseen or structurally diverse targets. We introduce PocketVina, a fast and memory-efficient, search-based docking framework that combines pocket prediction with systematic multi-pocket exploration. We evaluate PocketVina across four established benchmarks (PDBbind2020 (timesplit and unseen), DockGen, Astex, and PoseBusters) and observe consistently strong performance in sampling physically valid docking poses. PocketVina achieves state-of-the-art performance when jointly considering ligand RMSD and physical validity (PB-valid), while remaining competitive with deep learning-based approaches in terms of RMSD alone, particularly on structurally diverse and previously unseen targets. PocketVina also maintains state-of-the-art physically valid docking accuracy across ligands with varying degrees of flexibility. We further introduce TargetDock-AI, a benchmarking dataset we curated, consisting of over 500,000 protein-ligand pairs, and a partition of the dataset labeled with PubChem activity annotations. On this large-scale dataset, PocketVina successfully discriminates active from inactive targets, outperforming a deep learning baseline while requiring significantly less GPU memory and runtime. PocketVina offers a robust and scalable docking strategy that requires no task-specific training and runs efficiently on standard GPUs, making it well-suited for high-throughput virtual screening and structure-based drug discovery.
Article 378
Title@2025-06-24 (2): LSH-DynED: A Dynamic Ensemble Framework with LSH-Based Undersampling for Evolving Multi-Class Imbalanced Classification
Title: LSH-DynED: A Dynamic Ensemble Framework with LSH-Based Undersampling for Evolving Multi-Class Imbalanced Classification | LSH-DynED: Ein dynamisches Ensemble-Framework mit LSH-basierter Unterprobe für die Evolving-Multi-Class-Unausgeglichene Klassifizierung | LSH-Dyned:一个动态组合框架,以基于LSH的下层取样为基础,用于不断演化的多类综合分类 2506.20041v1 |
Authors (2): Soheil Abadifard, Fazli Can
The classification of imbalanced data streams, which have unequal class distributions, is a key difficulty in machine learning, especially when dealing with multiple classes. While binary imbalanced data stream classification tasks have received considerable attention, only a few studies have focused on multi-class imbalanced data streams. Effectively managing the dynamic imbalance ratio is a key challenge in this domain. This study introduces a novel, robust, and resilient approach to address these challenges by integrating Locality Sensitive Hashing with Random Hyperplane Projections (LSH-RHP) into the Dynamic Ensemble Diversification (DynED) framework. To the best of our knowledge, we present the first application of LSH-RHP for undersampling in the context of imbalanced non-stationary data streams. The proposed method undersamples the majority classes by utilizing LSH-RHP, provides a balanced training set, and improves the ensemble’s prediction performance. We conduct comprehensive experiments on 23 real-world and ten semi-synthetic datasets and compare LSH-DynED with 15 state-of-the-art methods. The results reveal that LSH-DynED outperforms other approaches in terms of both Kappa and mG-Mean effectiveness measures, demonstrating its capability in dealing with multi-class imbalanced non-stationary data streams. Notably, LSH-DynED performs well in large-scale, high-dimensional datasets with considerable class imbalances and demonstrates adaptation and robustness in real-world circumstances. To motivate our design, we review existing methods for imbalanced data streams, outline key challenges, and offer guidance for future work. For the reproducibility of our results, we have made our implementation available on GitHub.
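A minimal sketch of undersampling with random-hyperplane LSH, to make the idea above concrete. This is not the authors' implementation: the function names, the round-robin draw across buckets, and all parameter defaults are assumptions chosen for illustration. Each majority-class sample is hashed to a bit signature (one bit per random hyperplane), and samples are then drawn evenly across buckets so that the retained subset covers different regions of the feature space.

```python
import random
from collections import defaultdict

def rhp_signature(x, hyperplanes):
    """Bit signature: one bit per hyperplane, 1 when the dot product is non-negative."""
    return tuple(int(sum(w * v for w, v in zip(h, x)) >= 0.0) for h in hyperplanes)

def lsh_undersample(majority, n_keep, n_planes=4, seed=0):
    """Keep up to n_keep majority samples, drawing round-robin across LSH buckets.

    Hypothetical helper: the bucket-balancing rule is an assumption, not the paper's.
    """
    rng = random.Random(seed)
    dim = len(majority[0])
    # Random hyperplanes with Gaussian normals define the hash function.
    hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for x in majority:
        buckets[rhp_signature(x, hyperplanes)].append(x)
    pools = list(buckets.values())
    for items in pools:
        rng.shuffle(items)
    kept = []
    # Take one sample per non-empty bucket per pass until the quota is met.
    while len(kept) < n_keep and any(pools):
        for items in pools:
            if items and len(kept) < n_keep:
                kept.append(items.pop())
    return kept
```

Drawing round-robin rather than proportionally is one possible design choice: it favors sparse regions of the majority class, at the cost of under-representing dense ones.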
Article 379
Title@2025-06-24 (2): Cross-Layer Discrete Concept Discovery for Interpreting Language Models
Title: Cross-Layer Discrete Concept Discovery for Interpreting Language Models | Cross-Layer Discrete Concept Discovery für Interpretationssprachmodelle | 解释语言模型的跨语言监听概念发现 2506.20040v1 |
Authors (4): Ankur Garg, Xuemin Yu, Hassan Sajjad, Samira Ebrahimi Kahou
Uncovering emergent concepts across transformer layers remains a significant challenge because the residual stream linearly mixes and duplicates information, obscuring how features evolve within large language models. Current research efforts primarily inspect neural representations at single layers, thereby overlooking this cross-layer superposition and the redundancy it introduces. These representations are typically either analyzed directly for activation patterns or passed to probing classifiers that map them to a limited set of predefined concepts. To address these limitations, we propose CLVQ-VAE, a framework that uses vector quantization to map representations across layers and in the process collapse duplicated residual-stream features into compact, interpretable concept vectors. Our approach uniquely combines top-k temperature-based sampling during quantization with EMA codebook updates, providing controlled exploration of the discrete latent space while maintaining codebook diversity. We further enhance the framework with scaled-spherical k-means++ for codebook initialization, which clusters by directional similarity rather than magnitude, better aligning with semantic structure in word embedding space.
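The quantization step described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the paper's code: the function names are hypothetical, distances are squared Euclidean, and the EMA update moves only the selected code toward the input (production VQ-VAE variants usually also track per-code usage counts).

```python
import math
import random

def topk_sample_code(x, codebook, k=2, temperature=1.0, rng=random):
    """Sample a code index among the k nearest codes, with softmax(-distance / T) weights."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    nearest = sorted(range(len(codebook)), key=dists.__getitem__)[:k]
    d0 = dists[nearest[0]]  # subtract the minimum for numerical stability
    weights = [math.exp(-(dists[i] - d0) / temperature) for i in nearest]
    return rng.choices(nearest, weights=weights, k=1)[0]

def ema_update(codebook, index, x, decay=0.99):
    """Move the selected code toward x by an exponential moving average (in place)."""
    codebook[index] = [decay * c + (1.0 - decay) * v for c, v in zip(codebook[index], x)]
```

With `k=1` the sampler reduces to ordinary nearest-neighbor quantization; larger `k` and higher temperature let less-used codes receive updates, which is one way to keep a codebook diverse.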
Article 380
Title@2025-06-24 (2): Learning Bilateral Team Formation in Cooperative Multi-Agent Reinforcement Learning
Title: Learning Bilateral Team Formation in Cooperative Multi-Agent Reinforcement Learning | Bilaterale Teambildung im kooperativen Multi-Agenten-Verstärkungs-Lernen lernen | 合作多机构加强合作学习双边学习小组 2506.20039v1 |
Authors (2): Koorosh Moslemi, Chi-Guhn Lee
Team formation and the dynamics of team-based learning have drawn significant interest in the context of Multi-Agent Reinforcement Learning (MARL). However, existing studies primarily focus on unilateral groupings, predefined teams, or fixed-population settings, leaving the effects of algorithmic bilateral grouping choices in dynamic populations underexplored. To address this gap, we introduce a framework for learning two-sided team formation in dynamic multi-agent systems. Through this study, we gain insight into what algorithmic properties in bilateral team formation influence policy performance and generalization. We validate our approach using widely adopted multi-agent scenarios, demonstrating competitive performance and improved generalization in most scenarios.
Article 381
Title@2025-06-24 (2): Verifiable Unlearning on Edge
Title: Verifiable Unlearning on Edge | Überprüfbares Lernen am Rande | 边缘不可核实的学习 2506.20037v1 |
Authors (3): Mohammad M Maheri, Alex Davidson, Hamed Haddadi
Machine learning providers commonly distribute global models to edge devices, which subsequently personalize these models using local data. However, issues such as copyright infringements, biases, or regulatory requirements may require the verifiable removal of certain data samples across all edge devices. Ensuring that edge devices correctly execute such unlearning operations is critical to maintaining integrity. In this work, we introduce a verification framework leveraging zero-knowledge proofs, specifically zk-SNARKs, to confirm data unlearning on personalized edge-device models without compromising privacy. We have developed algorithms explicitly designed to facilitate unlearning operations that are compatible with efficient zk-SNARK proof generation, ensuring minimal computational and memory overhead suitable for constrained edge environments. Furthermore, our approach carefully preserves personalized enhancements on edge devices, maintaining model performance post-unlearning. Our results affirm the practicality and effectiveness of this verification framework, demonstrating verifiable unlearning with minimal degradation in personalization-induced performance improvements. Our methodology ensures verifiable, privacy-preserving, and effective machine unlearning across edge devices.
Article 382
Title@2025-06-24 (2): Neural network-based Godunov corrections for approximate Riemann solvers using bi-fidelity learning
Title: Neural network-based Godunov corrections for approximate Riemann solvers using bi-fidelity learning | Neurale netzwerkbasierte Godunov-Korrekturen für ungefähre Riemann-Löser mit Bi-Fidelity-Lernen | 近似Riemann的Riemann解决者使用双性忠诚学习校正 2503.13248v2 |
Authors (2): Akshay Thakur, Matthew J. Zahr
The Riemann problem is fundamental in the computational modeling of hyperbolic partial differential equations, enabling the development of stable and accurate upwind schemes. While exact solvers provide robust upwinding fluxes, their high computational cost necessitates approximate solvers. Although approximate solvers achieve accuracy in many scenarios, they produce inaccurate solutions in certain cases. To overcome this limitation, we propose constructing neural network-based surrogate models, trained using supervised learning, designed to map interior and exterior conservative state variables to the corresponding exact flux. Specifically, we propose two distinct approaches: one utilizing a vanilla neural network and the other employing a bi-fidelity neural network. The performance of the proposed approaches is demonstrated through applications to one-dimensional and two-dimensional partial differential equations, showcasing their robustness and accuracy.
Article 383
Title@2025-06-24 (2): Automated Generation of Diverse Courses of Actions for Multi-Agent Operations using Binary Optimization and Graph Learning
Title: Automated Generation of Diverse Courses of Actions for Multi-Agent Operations using Binary Optimization and Graph Learning | Automatisierte Generierung von vielfältigen Handlungskursen für Multi-Agenten-Betriebe mit Binäroptimierung und Graphen-Lernen | 利用二进制优化和图表学习,自动产生多种多机构业务行动多样化行动方案 2506.20031v1 |
Authors (4): Prithvi Poddar, Ehsan Tarkesh Esfahani, Karthik Dantu, Souma Chowdhury
Operations in disaster response, search & rescue, and military missions that involve multiple agents demand automated processes to support the planning of the courses of action (COA). Moreover, traverse-affecting changes in the environment (rain, snow, blockades, etc.) may impact the expected performance of a COA, making it desirable to have a pool of COAs that are diverse in task distributions across agents. Further, variations in agent capabilities, which could be human crews and/or autonomous systems, present practical opportunities and computational challenges to the planning process. This paper presents a new theoretical formulation and computational framework to generate such diverse pools of COAs for operations with soft variations in agent-task compatibility. Key to the problem formulation is a graph abstraction of the task space and the pool of COAs itself to quantify its diversity. Formulating the COAs as a centralized multi-robot task allocation problem, a genetic algorithm is used for (order-ignoring) allocations of tasks to each agent that jointly maximize diversity within the COA pool and overall compatibility of the agent-task mappings. A graph neural network is trained using a policy gradient approach to then perform single agent task sequencing in each COA, which maximizes completion rates adaptive to task features. Our tests of the COA generation process in a simulated environment demonstrate significant performance gain over a random walk baseline, small optimality gap in task sequencing, and execution time of about 50 minutes to plan up to 20 COAs for 5-agent/100-task operations.
Article 384
Title@2025-06-24 (2): Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining
Title: Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining | Daumen auf der Waage: Optimaler Verlustgewichtung in Last Layer Retraining | 缩放缩略图: 上层再训练中的最佳损耗 2506.20025v1 |
Authors (3): Nathan Stromberg, Christos Thrampoulidis, Lalitha Sankar
While machine learning models become more capable in discriminative tasks at scale, their ability to overcome biases introduced by training data has come under increasing scrutiny. Previous results suggest that there are two extremes of parameterization with very different behaviors: the population (underparameterized) setting where loss weighting is optimal and the separable overparameterized setting where loss weighting is ineffective at ensuring equal performance across classes. This work explores the regime of last layer retraining (LLR) in which the unseen limited (retraining) data is frequently inseparable and the model proportionately sized, falling between the two aforementioned extremes. We show, in theory and practice, that loss weighting is still effective in this regime, but that these weights must take into account the relative overparameterization of the model.
Article 385
Title@2025-06-24 (2): Evaluating Long Range Dependency Handling in Code Generation LLMs
Title: Evaluating Long Range Dependency Handling in Code Generation LLMs | Bewertung der Langzeitabhängigkeitsbehandlung in LLMs der Code-Generation | 评估代码生成中的长期依赖性处理 2407.21049v2 |
Authors (2): Yannick Assogba, Donghao Ren
As language models support larger and larger context sizes, evaluating their ability to make effective use of that context becomes increasingly important. We analyze the ability of several code generation models to handle long range dependencies using a suite of multi-step key retrieval tasks in context windows up to 8k tokens in length. The tasks progressively increase in difficulty and allow more nuanced evaluation of model capabilities than tests like the popular needle-in-the-haystack test. We find that performance degrades significantly for many models (up to 2x) when a function references another function that is defined later in the prompt. We also observe that models that use sliding window attention mechanisms have difficulty handling references further than the size of a single window. We perform simple prompt modifications using call graph information to improve multi-step retrieval performance up to 3x. Our analysis highlights ways that long-context performance needs deeper consideration beyond retrieval of single facts within a document.
Article 386
Title@2025-06-24 (2): Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting
Title: Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting | Erklärte Rollendiffusionsmodelle für probabilistische Wettervorhersagen | 预测概率天气预测的显学滚滚传播模型 2506.20024v1 |
Authors (7): Salva Rühling Cachay, Miika Aittala, Karsten Kreis, Noah Brenowitz, Arash Vahdat, Morteza Mardani, Rose Yu
Diffusion models are a powerful tool for probabilistic forecasting, yet most applications in high-dimensional chaotic systems predict future snapshots one-by-one. This common approach struggles to model complex temporal dependencies and fails to explicitly account for the progressive growth of uncertainty inherent to such systems. While rolling diffusion frameworks, which apply increasing noise to forecasts at longer lead times, have been proposed to address this, their integration with state-of-the-art, high-fidelity diffusion techniques remains a significant challenge. We tackle this problem by introducing Elucidated Rolling Diffusion Models (ERDM), the first framework to successfully unify a rolling forecast structure with the principled, performant design of Elucidated Diffusion Models (EDM). To do this, we adapt the core EDM components (its noise schedule, network preconditioning, and Heun sampler) to the rolling forecast setting. The success of this integration is driven by three key contributions: (i) a novel loss weighting scheme that focuses model capacity on the mid-range forecast horizons where determinism gives way to stochasticity; (ii) an efficient initialization strategy using a pre-trained EDM for the initial window; and (iii) a bespoke hybrid sequence architecture for robust spatiotemporal feature extraction under progressive denoising. On 2D Navier-Stokes simulations and ERA5 global weather forecasting at 1.5° resolution, ERDM consistently outperforms key diffusion-based baselines, including conditional autoregressive EDM. ERDM offers a flexible and powerful general framework for tackling diffusion-based sequence generation problems where modeling escalating uncertainty is paramount. Code is available at: https://github.com/salvaRC/erdm
Article 387
Title@2025-06-24 (2): DIM-SUM: Dynamic IMputation for Smart Utility Management
Title: DIM-SUM: Dynamic IMputation for Smart Utility Management | DIM-SUM: Dynamische Imputation für intelligentes Utility Management | DIM-SUM: 智能工具管理动态数字 2506.20023v1 |
Authors (5): Ryan Hildebrant, Rahul Bhope, Sharad Mehrotra, Christopher Tull, Nalini Venkatasubramanian
Time series imputation models have traditionally been developed using complete datasets with artificial masking patterns to simulate missing values. However, in real-world infrastructure monitoring, practitioners often encounter datasets where large amounts of data are missing and follow complex, heterogeneous patterns. We introduce DIM-SUM, a preprocessing framework for training robust imputation models that bridges the gap between artificially masked training data and real missing patterns. DIM-SUM combines pattern clustering and adaptive masking strategies with theoretical learning guarantees to handle diverse missing patterns actually observed in the data. Through extensive experiments on over 2 billion readings from California water districts, electricity datasets, and benchmarks, we demonstrate that DIM-SUM outperforms traditional methods by reaching similar accuracy with lower processing time and significantly less training data. When compared against a large pre-trained model, DIM-SUM averages 2x higher accuracy with significantly less inference time.
Article 388
Title@2025-06-24 (2): New Insights on Unfolding and Fine-tuning Quantum Federated Learning
Title: New Insights on Unfolding and Fine-tuning Quantum Federated Learning | Neue Erkenntnisse zum Entfalten und Feintuning Quantum-Federated Learning | 新《关于不增加和微调量量子联邦学习的新观点》 2506.20016v1 |
Authors (2): Shanika Iroshi Nanayakkara, Shiva Raj Pokhrel
Client heterogeneity poses significant challenges to the performance of Quantum Federated Learning (QFL). To overcome these limitations, we propose a new approach leveraging deep unfolding, which enables clients to autonomously optimize hyperparameters, such as learning rates and regularization factors, based on their specific training behavior. This dynamic adaptation mitigates overfitting and ensures robust optimization in highly heterogeneous environments where standard aggregation methods often fail. Our framework achieves approximately 90% accuracy, significantly outperforming traditional methods, which typically yield around 55% accuracy, as demonstrated through real-time training on IBM quantum hardware and Qiskit Aer simulators. By developing self-adaptive fine-tuning, the proposed method proves particularly effective in critical applications such as gene expression analysis and cancer detection, enhancing diagnostic precision and predictive modeling within quantum systems. Our results are attributed to convergence-aware, learnable optimization steps intrinsic to the deep unfolded framework, which maintains generalization. Hence, this study addresses the core limitations of conventional QFL, streamlining its applicability to complex challenges in areas such as healthcare and genomic research.
Article 389
Title@2025-06-24 (2): Towards Better Benchmark Datasets for Inductive Knowledge Graph Completion
Title: Towards Better Benchmark Datasets for Inductive Knowledge Graph Completion | Auf dem Weg zu besseren Benchmark-Datensätzen für induktive Wissensgraphenvervollständigung | 建立更好的基准数据集,以完成引入知识图的完成 2406.11898v3 |
Authors (3): Harry Shomer, Jay Revolinsky, Jiliang Tang
Knowledge Graph Completion (KGC) attempts to predict missing facts in a Knowledge Graph (KG). Recently, there’s been an increased focus on designing KGC methods that can excel in the inductive setting, where a portion or all of the entities and relations seen in inference are unobserved during training. Numerous benchmark datasets have been proposed for inductive KGC, all of which are subsets of existing KGs used for transductive KGC. However, we find that the current procedure for constructing inductive KGC datasets inadvertently creates a shortcut that can be exploited even while disregarding the relational information. Specifically, we observe that the Personalized PageRank (PPR) score can achieve strong or near SOTA performance on most datasets. In this paper, we study the root cause of this problem. Using these insights, we propose an alternative strategy for constructing inductive KGC datasets that helps mitigate the PPR shortcut. We then benchmark multiple popular methods using the newly constructed datasets and analyze their performance. The new benchmark datasets help promote a better understanding of the capabilities and challenges of inductive KGC by removing any shortcuts that obfuscate performance. The code and dataset can be found at https://github.com/HarryShomer/Better-Inductive-KGC.
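To illustrate the kind of shortcut discussed above, here is a small sketch of Personalized PageRank by power iteration on an untyped graph, so relation labels play no role in the scores. The graph, the restart probability, and the function name are assumptions for illustration; the repository linked above is the reference for the authors' actual procedure.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for PPR with restart probability alpha at `source`.

    adj maps each node to a list of neighbors (edge types are deliberately ignored).
    """
    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            nbrs = adj[n]
            if not nbrs:
                # Dangling node: send its walk mass back to the source.
                nxt[source] += (1.0 - alpha) * scores[n]
                continue
            share = (1.0 - alpha) * scores[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        nxt[source] += alpha  # restart mass; total probability stays 1
        scores = nxt
    return scores

# Toy chain a - b - c - d: candidates closer to the query entity score higher.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ppr = personalized_pagerank(graph, "a")
ranking = sorted((n for n in graph if n != "a"), key=ppr.get, reverse=True)
```

Ranking candidate tail entities by `ppr` uses only graph proximity to the head entity, which is why a strong PPR baseline suggests a dataset can be solved without relational reasoning.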
Article 390
Title@2025-06-24 (2): Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons
Title: Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons | Neuromorphes drahtloses Split Computing mit Resonanz-und-Feuer-Neuronen | 神经无线神经无线分裂计算,有共振和火灾中中子 2506.20015v1 |
Authors (5): Dengyu Wu, Jiechen Chen, H. Vincent Poor, Bipin Rajendran, Osvaldo Simeone
Neuromorphic computing offers an energy-efficient alternative to conventional deep learning accelerators for real-time time-series processing. However, many edge applications, such as wireless sensing and audio recognition, generate streaming signals with rich spectral features that are not effectively captured by conventional leaky integrate-and-fire (LIF) spiking neurons. This paper investigates a wireless split computing architecture that employs resonate-and-fire (RF) neurons with oscillatory dynamics to process time-domain signals directly, eliminating the need for costly spectral pre-processing. By resonating at tunable frequencies, RF neurons extract time-localized spectral features while maintaining low spiking activity. This temporal sparsity translates into significant savings in both computation and transmission energy. Assuming an OFDM-based analog wireless interface for spike transmission, we present a complete system design and evaluate its performance on audio classification and modulation classification tasks. Experimental results show that the proposed RF-SNN architecture achieves comparable accuracy to conventional LIF-SNNs and ANNs, while substantially reducing spike rates and total energy consumption during inference and communication.
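A rough sketch of a single resonate-and-fire neuron may help show why such units respond selectively to frequency. It assumes the common complex-state formulation, dz/dt = (b + iω)z + I(t), with a spike when the imaginary part crosses a threshold; the discretization, reset rule, and all parameter values are illustrative assumptions rather than the paper's settings.

```python
import cmath
import math

def rf_neuron_spikes(signal, freq_hz, dt=1e-3, damping=-5.0, threshold=0.05):
    """Return the time-step indices at which a resonate-and-fire neuron spikes."""
    omega = 2.0 * math.pi * freq_hz
    # Exact one-step propagator of the damped oscillator dz/dt = (damping + i*omega) z.
    step = cmath.exp(complex(damping, omega) * dt)
    z = 0j
    spikes = []
    for t, current in enumerate(signal):
        z = step * z + dt * current
        if z.imag > threshold:
            spikes.append(t)
            z = 0j  # assumed reset-to-zero rule after a spike
    return spikes

# One second of a 10 Hz sine sampled at 1 kHz.
tone = [math.sin(2.0 * math.pi * 10.0 * k * 1e-3) for k in range(1000)]
tuned = rf_neuron_spikes(tone, freq_hz=10.0)     # neuron tuned to the input frequency
detuned = rf_neuron_spikes(tone, freq_hz=40.0)   # neuron tuned elsewhere
```

The tuned neuron accumulates oscillation amplitude and fires, while the detuned one stays well below threshold, so a bank of such neurons acts as a sparse, spike-based spectral front end.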
Article 391
Title@2025-06-24 (2): DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation
Title: DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation | DRO-Augment Framework: Robustheit durch Synergisieren Wasserstein distributiv robust Optimierung und Datenvergrößerung | DRO - 增强框架:通过协调瓦森斯坦(Wasserstein)的分布式强力优化和数据增强,使瓦森斯坦(Wasserstein)的分布性强力 2506.17874v2 |
Authors (3): Jiaming Hu, Debarghya Mukherjee, Ioannis Ch. Paschalidis
In many real-world applications, ensuring the robustness and stability of deep neural networks (DNNs) is crucial, particularly for image classification tasks that encounter various input perturbations. While data augmentation techniques have been widely adopted to enhance the resilience of a trained model against such perturbations, there remains significant room for improvement in robustness against corrupted data and adversarial attacks simultaneously. To address this challenge, we introduce DRO-Augment, a novel framework that integrates Wasserstein Distributionally Robust Optimization (W-DRO) with various data augmentation strategies to improve the robustness of the models significantly across a broad spectrum of corruptions. Our method outperforms existing augmentation methods under severe data perturbations and adversarial attack scenarios while maintaining the accuracy on the clean datasets on a range of benchmark datasets, including but not limited to CIFAR-10-C, CIFAR-100-C, MNIST, and Fashion-MNIST. On the theoretical side, we establish novel generalization error bounds for neural networks trained using a computationally efficient, variation-regularized loss function closely related to the W-DRO problem.
Article 392
Title@2025-06-24 (2): Scalable Machine Learning Algorithms using Path Signatures
Title: Scalable Machine Learning Algorithms using Path Signatures | Skalierbare maschinelle Lernalgorithmen mit Pfadsignaturen | 使用路径签名缩放机器学习算法 2506.17634v2 |
Authors (1): Csaba Tóth
The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures - iterated integrals that provide faithful, hierarchical representations of paths - offering a principled and universal feature map for sequential and structured data. Rooted in rough path theory, path signatures are invariant to reparameterization and well-suited for modelling evolving dynamics, long-range dependencies, and irregular sampling - common challenges in real-world time series and graph data. This thesis investigates how to harness the expressive power of path signatures within scalable machine learning pipelines. It introduces a suite of models that combine theoretical robustness with computational efficiency, bridging rough path theory with probabilistic modelling, deep learning, and kernel methods. Key contributions include: Gaussian processes with signature kernel-based covariance functions for uncertainty-aware time series modelling; the Seq2Tens framework, which employs low-rank tensor structure in the weight space for scalable deep modelling of long-range dependencies; and graph-based models where expected signatures over graphs induce hypo-elliptic diffusion processes, offering expressive yet tractable alternatives to standard graph neural networks. Further developments include Random Fourier Signature Features, a scalable kernel approximation with theoretical guarantees, and Recurrent Sparse Spectrum Signature Gaussian Processes, which combine Gaussian processes, signature kernels, and random features with a principled forgetting mechanism for multi-horizon time series forecasting with adaptive context length. We hope this thesis serves as both a methodological toolkit and a conceptual bridge, and provides a useful reference for the current state of the art in scalable, signature-based learning for sequential and structured data.
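As a concrete reference point for the iterated integrals mentioned above, the following sketch computes the first two signature levels of a piecewise-linear path, accumulating segment by segment via Chen's identity. The function name is hypothetical, and the thesis's scalable methods work with kernels and low-rank structure rather than this direct computation.

```python
def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path (list of d-dim points)."""
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        # Chen's identity for appending a linear segment:
        # S2 += S1 (outer) delta + 0.5 * delta (outer) delta
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2

# An L-shaped path, and the same path with an extra point on its first segment.
coarse = signature_depth2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
fine = signature_depth2([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 1.0]])
```

Both calls return the same values, which illustrates the reparameterization invariance noted in the abstract: the signature depends on the shape traced out, not on how the path is sampled.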
Article 393
Title@2025-06-24 (2): Can One Safety Loop Guard Them All? Agentic Guard Rails for Federated Computing
Title: Can One Safety Loop Guard Them All? Agentic Guard Rails for Federated Computing | Kann ein Sicherheitsschlaufe Guard sie alle? Agentic Guard Rails für Federated Computing | 一个安全环圈能保护全部吗? 2506.20000v1 |
Authors (2): Narasimha Raghavan Veeraragavan, Jan Franz Nygård
We propose Guardian-FC, a novel two-layer framework for privacy-preserving federated computing that unifies safety enforcement across diverse privacy-preserving mechanisms, including cryptographic back-ends like fully homomorphic encryption (FHE) and multiparty computation (MPC), as well as statistical techniques such as differential privacy (DP). Guardian-FC decouples guard-rails from privacy mechanisms by executing plug-ins (modular computation units), written in a backend-neutral, domain-specific language (DSL) designed specifically for federated computing workflows and interchangeable Execution Providers (EPs), which implement DSL operations for various privacy back-ends. An Agentic-AI control plane enforces a finite-state safety loop through signed telemetry and commands, ensuring consistent risk management and auditability. The manifest-centric design supports fail-fast job admission and seamless extensibility to new privacy back-ends. We present qualitative scenarios illustrating backend-agnostic safety and a formal model foundation for verification. Finally, we outline a research agenda inviting the community to advance adaptive guard-rail tuning, multi-backend composition, DSL specification development, implementation, and compiler extensibility alongside human-override usability.
Article 394
Title@2025-06-24 (2): A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading Behavior
Title: A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading Behavior | Ein Spatio-Temporal-Punkt-Verfahren zur feinkörnigen Modellierung des Leseverhaltens | 阅读行为精细模拟模型的斯帕迪奥时点进程 2506.19999v1 |
Authors (5): Francesco Ignazio Re, Andreas Opedal, Glib Manaiev, Mario Giulianelli, Ryan Cotterell
Reading is a process that unfolds across space and time, alternating between fixations where a reader focuses on a specific point in space, and saccades where a reader rapidly shifts their focus to a new point. An ansatz of psycholinguistics is that modeling a reader’s fixations and saccades yields insight into their online sentence processing. However, standard approaches to such modeling rely on aggregated eye-tracking measurements and models that impose strong assumptions, ignoring much of the spatio-temporal dynamics that occur during reading. In this paper, we propose a more general probabilistic model of reading behavior, based on a marked spatio-temporal point process, that captures not only how long fixations last, but also where they land in space and when they take place in time. The saccades are modeled using a Hawkes process, which captures how each fixation excites the probability of a new fixation occurring near it in time and space. The duration time of fixation events is modeled as a function of fixation-specific predictors convolved across time, thus capturing spillover effects. Empirically, our Hawkes process model exhibits a better fit to human saccades than baselines. With respect to fixation durations, we observe that incorporating contextual surprisal as a predictor results in only a marginal improvement in the model’s predictive accuracy. This finding suggests that surprisal theory struggles to explain fine-grained eye movements.
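The self-exciting behavior described above can be sketched with a conditional intensity function. The kernel shapes here (exponential decay in time, Gaussian falloff in a one-dimensional position such as a word index) and every parameter value are assumptions made for illustration; the paper's model and its fitted parameters may differ.

```python
import math

def hawkes_intensity(t, x, history, mu=0.5, alpha=0.8, beta=2.0, sigma=1.5):
    """Intensity at time t and position x, given past events [(t_i, x_i), ...].

    mu is the background rate; each earlier event adds a bump that decays
    exponentially in time and falls off as a Gaussian in space.
    """
    total = mu
    for t_i, x_i in history:
        if t_i < t:
            time_kernel = math.exp(-beta * (t - t_i))
            space_kernel = math.exp(-((x - x_i) ** 2) / (2.0 * sigma ** 2))
            total += alpha * time_kernel * space_kernel
    return total

fixations = [(1.0, 3.0)]  # one past fixation at time 1.0 on position 3
near = hawkes_intensity(1.1, 3.0, fixations)
far_in_space = hawkes_intensity(1.1, 10.0, fixations)
far_in_time = hawkes_intensity(5.0, 3.0, fixations)
```

A new fixation is most likely soon after, and close to, a previous one, and the intensity relaxes toward the background rate `mu` as the distance in time or space grows.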
Article 395
Title@2025-06-24 (2): In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory
Title: In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory | In-Context Learning for Gradient-Free Receiver Adaptation: Prinzipien, Anwendungen und Theorie | 逐步免费接收者适应:原则、应用和理论 2506.15176v2 |
Authors (6): Matteo Zecchin, Tomer Raviv, Dileep Kalathil, Krishna Narayanan, Nir Shlezinger, Osvaldo Simeone
In recent years, deep learning has facilitated the creation of wireless receivers capable of functioning effectively in conditions that challenge traditional model-based designs. Leveraging programmable hardware architectures, deep learning-based receivers offer the potential to dynamically adapt to varying channel environments. However, current adaptation strategies, including joint training, hypernetwork-based methods, and meta-learning, either demonstrate limited flexibility or necessitate explicit optimization through gradient descent. This paper presents gradient-free adaptation techniques rooted in the emerging paradigm of in-context learning (ICL). We review architectural frameworks for ICL based on Transformer models and structured state-space models (SSMs), alongside theoretical insights into how sequence models effectively learn adaptation from contextual information. Further, we explore the application of ICL to cell-free massive MIMO networks, providing both theoretical analyses and empirical evidence. Our findings indicate that ICL represents a principled and efficient approach to real-time receiver adaptation using pilot signals and auxiliary contextual information, without requiring online retraining.
Article 396
Title@2025-06-24 (2): TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
Title: TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design | TRACED: Transition-aware regret Annäherung mit Mitlernbarkeit für Umweltdesign | TRACEED: 环境设计中具有共负环境设计共负作用的过渡-意识到遗憾相近情况 2506.19997v1 |
Authors (6): Geonwoo Cho, Jaegyun Im, Jihwan Lee, Hojun Yi, Sejin Kim, Sundong Kim
Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called co-learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED yields curricula that improve zero-shot generalization across multiple benchmarks while requiring up to 2x fewer environment interactions than strong baselines. Ablation studies confirm that the transition prediction error drives rapid complexity ramp-up and that co-learnability delivers additional gains when paired with the transition prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED.
Article 397
Title@2025-06-24 (2): CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems
Title: CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems | CoVE: Komprimierte Vokabelerweiterung macht LLM-basierte Recommender-Systeme besser | COVE:压缩的词汇扩充使基于LLM的推荐系统更好 2506.19993v1 |
Authors (6): Haochen Zhang, Tianyi Zhang, Junze Yin, Oren Gal, Anshumali Shrivastava, Vladimir Braverman
Recommender systems play a pivotal role in providing relevant content to users. With the rapid development of large language models (LLMs), researchers have begun utilizing LLMs to build more powerful recommender systems. However, existing approaches that focus on aligning LLMs with recommendation tasks do not fully leverage their sequential information processing capabilities, leading to suboptimal performance. In this paper, we propose a novel system called compressed vocabulary expansion (CoVE). In CoVE, each item is assigned a unique ID within the expanded vocabulary. Our framework effectively capitalizes on sequence understanding abilities of LLMs, significantly enhancing their performance on recommendation tasks. Additionally, we compress the embedding layer, making CoVE practical for large-scale industrial applications. The effectiveness and performance of CoVE are demonstrated through comprehensive experiments on multiple recommendation datasets and comparisons with prior works. Our code can be found at https://github.com/HaochenZhang717/CoVE-official-Repo.
Article 398
Title@2025-06-24 (2): HERCULES: Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization
Title: HERCULES: Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization | HERCULES: Hierarchische Einbettung von rekursiven Clustern mit LLMs für eine effiziente Zusammenfassung | HERCULES:利用LLMs高效汇总法,基于等级嵌入式嵌入式递递性集群 2506.19992v1 |
Authors (2): Gabor Petnehazi, Bernadett Aradi
The explosive growth of complex datasets across various modalities necessitates advanced analytical tools that not only group data effectively but also provide human-understandable insights into the discovered structures. We introduce HERCULES (Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization), a novel algorithm and Python package designed for hierarchical k-means clustering of diverse data types, including text, images, and numeric data (processed one modality per run). HERCULES constructs a cluster hierarchy by recursively applying k-means clustering, starting from individual data points at level 0. A key innovation is its deep integration of Large Language Models (LLMs) to generate semantically rich titles and descriptions for clusters at each level of the hierarchy, significantly enhancing interpretability. The algorithm supports two main representation modes: 'direct' mode, which clusters based on original data embeddings or scaled numeric features, and 'description' mode, which clusters based on embeddings derived from LLM-generated summaries. Users can provide a 'topic_seed' to guide LLM-generated summaries towards specific themes. An interactive visualization tool facilitates thorough analysis and understanding of the clustering results. We demonstrate HERCULES’s capabilities and discuss its potential for extracting meaningful, hierarchical knowledge from complex datasets.
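The recursive construction described above can be illustrated with a minimal sketch. This is not the HERCULES package or its API: the function names `kmeans` and `build_hierarchy`, the farthest-first initialization, and the use of centroids as cluster representatives are all assumptions made for illustration, and the LLM summarization step is omitted entirely.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns (labels, centroids).

    Initialization is farthest-first (an illustrative, deterministic choice).
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: mean of members; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def build_hierarchy(points, ks):
    """Level 0 is the raw points; each later level clusters the previous level's centroids.

    ks[i] is the (hypothetical) number of clusters used to form level i + 1.
    Returns one label list per level, mapping each item to its parent cluster.
    """
    levels, reps = [], list(points)
    for k in ks:
        labels, reps = kmeans(reps, k)
        levels.append(labels)
    return levels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
levels = build_hierarchy(points, ks=[2, 1])
```

In a setting like the one the abstract describes, each cluster at each level would additionally receive an LLM-generated title and description; in 'description' mode the embeddings of those summaries, rather than the centroids, would feed the next level.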
Article 399
Title@2025-06-24 (2): Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems
Title: Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems | Follow-the-Perturbed-Leader nähert sich Best-of-Both-Worlds für die m-Set Semi-Bandit-Probleme | M-Set半银行问题最佳世界最佳办法 2504.07307v3 |
Authors (4): Jingxin Zhan, Yuchen Xin, Chenjie Sun, Zhihua Zhang
We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner selects exactly $m$ arms out of $d$ arms in total. In the adversarial setting, the best regret bound, known to be $\mathcal{O}(\sqrt{nmd})$ for time horizon $n$, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, this requires explicitly computing the arm-selection probabilities by solving an optimization problem at each time step and then sampling according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the $m$ arms with the $m$ smallest (estimated) losses after random perturbation. In this paper, we show that FTPL with a Fréchet perturbation also enjoys the near-optimal regret bound $\mathcal{O}(\sqrt{nm}(\sqrt{d\log(d)}+m^{5/6}))$ in the adversarial setting and approaches best-of-both-worlds regret bounds, i.e., achieves a logarithmic regret for the stochastic setting. Moreover, our lower bounds show that the extra factors are unavoidable with our approach; any improvement would require a fundamentally different and more challenging method.
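The FTPL selection rule described above can be sketched as follows. This is a minimal illustration rather than the paper's algorithm: the shape parameter of the Fréchet distribution, the learning rate `eta`, and the function names are assumptions, and the construction of the loss estimates from semi-bandit feedback is left out.

```python
import math
import random

def frechet_sample(rng, shape=2.0):
    """Draw from a Frechet distribution with CDF exp(-x**(-shape)) via inverse-CDF sampling.

    shape=2.0 is an illustrative default, not a value taken from the abstract.
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return (-math.log(u)) ** (-1.0 / shape)

def ftpl_select(est_cum_losses, m, eta, rng, shape=2.0):
    """Return the indices of the m arms with the smallest perturbed estimated losses."""
    perturbed = [loss - frechet_sample(rng, shape) / eta for loss in est_cum_losses]
    order = sorted(range(len(perturbed)), key=lambda i: perturbed[i])
    return sorted(order[:m])

rng = random.Random(0)
arms = ftpl_select([5.0, 1.0, 3.0, 0.0, 4.0], m=2, eta=1.0, rng=rng)
```

No probability vector is computed here: one perturbation draw per arm followed by a sort is all that the selection step needs, which is the computational advantage over FTRL that the abstract points to.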
Article 400
Title@2025-06-24 (2): MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel
Title: MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel | MaizeField3D: Ein kuratierter 3D-Punkt-Cloud- und Verfahrensmodell-Datensatz von Feld-Grown Maize aus einem Diversity-Panel | Maize Fire3D:来自多样性小组的3D点云和实地增长磁场程序模型数据集 2503.07813v2 |
Authors (9): Elvis Kimara, Mozhgan Hadadi, Jackson Godbersen, Aditya Balu, Talukder Jubery, Yawei Li, Adarsh Krishnamurthy, Patrick S. Schnable, Baskar Ganapathysubramanian
The development of artificial intelligence (AI) and machine learning (ML) based tools for 3D phenotyping, especially for maize, has been limited due to the lack of large and diverse 3D datasets. 2D image datasets fail to capture essential structural details such as leaf architecture, plant volume, and spatial arrangements that 3D data provide. To address this limitation, we present MaizeField3D (https://baskargroup.github.io/MaizeField3D/), a curated dataset of 3D point clouds of field-grown maize plants from a diverse genetic panel, designed to be AI-ready for advancing agricultural research. Our dataset includes 1,045 high-quality point clouds of field-grown maize collected using a terrestrial laser scanner (TLS). Point clouds of 520 plants from this dataset were segmented and annotated using a graph-based segmentation method to isolate individual leaves and stalks, ensuring consistent labeling across all samples. This labeled data was then used for fitting procedural models that provide a structured parametric representation of the maize plants. The leaves of the maize plants in the procedural models are represented using Non-Uniform Rational B-Spline (NURBS) surfaces that were generated using a two-step optimization process combining gradient-free and gradient-based methods. We conducted rigorous manual quality control on all datasets, correcting errors in segmentation, ensuring accurate leaf ordering, and validating metadata annotations. The dataset also includes metadata detailing plant morphology and quality, alongside multi-resolution subsampled point cloud data (100k, 50k, 10k points), which can be readily used for different downstream computational tasks. MaizeField3D will serve as a comprehensive foundational dataset for AI-driven phenotyping, plant structural analysis, and 3D applications in agricultural research.
Article 401
Title@2025-06-24 (2): Proofs as Explanations: Short Certificates for Reliable Predictions
Title: Proofs as Explanations: Short Certificates for Reliable Predictions | Beweise als Erklärungen: Kurze Zertifikate für zuverlässige Vorhersagen | 作为解释的证明:可靠预测的短期证明 2504.08377v3 |
Authors (4): Avrim Blum, Steve Hanneke, Chirag Pabbaraju, Donya Saless
We consider a model for explainable AI in which an explanation for a prediction $h(x)=y$ consists of a subset $S'$ of the training data (if it exists) such that all classifiers $h' \in H$ that make at most $b$ mistakes on $S'$ predict $h'(x)=y$. Such a set $S'$ serves as a proof that $x$ indeed has label $y$ under the assumption that (1) the target function $h^\star$ belongs to $H$, and (2) the set $S$ contains at most $b$ corrupted points. For example, if $b=0$ and $H$ is the family of linear classifiers in $\mathbb{R}^d$, and if $x$ lies inside the convex hull of the positive data points in $S$ (and hence every consistent linear classifier labels $x$ as positive), then Carathéodory's theorem states that $x$ lies inside the convex hull of $d+1$ of those points. So, a set $S'$ of size $d+1$ could be released as an explanation for a positive prediction, and would serve as a short proof of correctness of the prediction under the assumption of realizability. In this work, we consider this problem more generally, for general hypothesis classes $H$ and general values $b\geq 0$. We define the notion of the robust hollow star number of $H$ (which generalizes the standard hollow star number), and show that it precisely characterizes the worst-case size of the smallest certificate achievable, and analyze its size for natural classes. We also consider worst-case distributional bounds on certificate size, as well as distribution-dependent bounds that we show tightly control the sample size needed to get a certificate for any given test example. In particular, we define a notion of the certificate coefficient $\varepsilon_x$ of an example $x$ with respect to a data distribution $D$ and target function $h^\star$, and prove matching upper and lower bounds on sample size as a function of $\varepsilon_x$, $b$, and the VC dimension $d$ of $H$.
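The Carathéodory example above can be made concrete in the plane ($d=2$, $b=0$), where a certificate is a set of $d+1=3$ positive points whose triangle contains $x$. The brute-force search below is only a sketch for illustration; the function names are hypothetical, and degenerate (collinear) triples are skipped, so it will not find certificates when all positive points lie on a line.

```python
from itertools import combinations

def cross(o, a, b):
    """Signed area (times 2) of the triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(x, a, b, c):
    """True if x lies inside or on the boundary of the non-degenerate triangle a, b, c."""
    d1, d2, d3 = cross(a, b, x), cross(b, c, x), cross(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def short_certificate(x, positives):
    """Return 3 positive points whose convex hull contains x, or None if no such triple exists."""
    for a, b, c in combinations(positives, 3):
        if cross(a, b, c) != 0 and in_triangle(x, a, b, c):
            return (a, b, c)
    return None

positives = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 5)]
certificate = short_certificate((1, 1), positives)
```

Any linear classifier that labels the three returned points positive must also label $x$ positive, so the triple can be released in place of the full positive set.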
Article 402
Title@2025-06-24 (2): FORTRESS: Frontier Risk Evaluation for National Security and Public Safety
Title: FORTRESS: Frontier Risk Evaluation for National Security and Public Safety | FORTRESS: Frontier Risk Evaluation für nationale Sicherheit und öffentliche Sicherheit | FORTRES:国家安全和公共安全的边界风险评估 2506.14922v2 |
Authors (7): Christina Q. Knight, Kaustubh Deshpande, Ved Sirdeshmukh, Meher Mankikar, Scale Red Team, SEAL Research Team, Julian Michael
The rapid advancement of large language models (LLMs) introduces dual-use capabilities that could both threaten and bolster national security and public safety (NSPS). Models implement safeguards to protect against potential misuse relevant to NSPS and allow for benign users to receive helpful information. However, current benchmarks often fail to test safeguard robustness to potential NSPS risks in an objective, robust way. We introduce FORTRESS: 500 expert-crafted adversarial prompts with instance-based rubrics of 4-7 binary questions for automated evaluation across 3 domains (unclassified information only): Chemical, Biological, Radiological, Nuclear and Explosive (CBRNE), Political Violence & Terrorism, and Criminal & Financial Illicit Activities, with 10 total subcategories across these domains. Each prompt-rubric pair has a corresponding benign version to test for model over-refusals. This evaluation of frontier LLMs’ safeguard robustness reveals varying trade-offs between potential risks and model usefulness: Claude-3.5-Sonnet demonstrates a low average risk score (ARS) (14.09 out of 100) but the highest over-refusal score (ORS) (21.8 out of 100), while Gemini 2.5 Pro shows low over-refusal (1.4) but a high average potential risk (66.29). Deepseek-R1 has the highest ARS at 78.05, but the lowest ORS at only 0.06. Models such as o1 display a more even trade-off between potential risks and over-refusals (with an ARS of 21.69 and ORS of 5.2). To provide policymakers and researchers with a clear understanding of models’ potential risks, we publicly release FORTRESS at https://huggingface.co/datasets/ScaleAI/fortress_public. We also maintain a private set for evaluation.
Article 403
Title@2025-06-24 (2): MAIZX: A Carbon-Aware Framework for Optimizing Cloud Computing Emissions
Title: MAIZX: A Carbon-Aware Framework for Optimizing Cloud Computing Emissions | MAIZX: Ein Carbon-Aware-Framework zur Optimierung von Cloud-Computing-Emissionen | MAIZX:优化云计算排放的碳软件框架 2506.19972v1 |
Authors (3): Federico Ruilova, Ernst Gunnar Gran, Sven-Arne Reinemo
Cloud computing drives innovation but also poses significant environmental challenges due to its high energy consumption and carbon emissions. Data centers account for 2-4% of global energy usage, and the ICT sector’s share of electricity consumption is projected to reach 40% by 2040. As the goal of achieving net-zero emissions by 2050 becomes increasingly urgent, there is a growing need for more efficient and transparent solutions, particularly for private cloud infrastructures, which are utilized by 87% of organizations, despite the dominance of public-cloud systems. This study evaluates the MAIZX framework, designed to optimize cloud operations and reduce carbon footprint by dynamically ranking resources, including data centers, edge computing nodes, and multi-cloud environments, based on real-time and forecasted carbon intensity, Power Usage Effectiveness (PUE), and energy consumption. Leveraging a flexible ranking algorithm, MAIZX achieved an 85.68% reduction in CO2 emissions compared to baseline hypervisor operations. Tested across geographically distributed data centers, the framework demonstrates scalability and effectiveness, directly interfacing with hypervisors to optimize workloads in private, hybrid, and multi-cloud environments. MAIZX integrates real-time data on carbon intensity, power consumption, and carbon footprint, as well as forecasted values, into cloud management, providing a robust tool for enhancing climate performance potential while maintaining operational efficiency.
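To make the ranking idea concrete, here is a small sketch that orders candidate sites by an estimated operational footprint. It is not the MAIZX ranking algorithm, which the abstract does not specify: the score (IT energy times PUE times grid carbon intensity), the `Site` fields, and the function names are all assumptions, and forecasted values are ignored.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    it_energy_kwh: float               # energy the workload would draw at this site
    pue: float                         # Power Usage Effectiveness (facility / IT energy)
    carbon_intensity_g_per_kwh: float  # grid carbon intensity, gCO2 per kWh

def estimated_emissions_g(site):
    """Assumed score: facility energy (IT energy x PUE) times grid carbon intensity."""
    return site.it_energy_kwh * site.pue * site.carbon_intensity_g_per_kwh

def rank_sites(sites):
    """Return sites ordered from lowest to highest estimated emissions."""
    return sorted(sites, key=estimated_emissions_g)

sites = [
    Site("dc-a", 10.0, 1.5, 400.0),
    Site("edge-b", 10.0, 1.2, 100.0),
    Site("dc-c", 10.0, 1.1, 300.0),
]
ranking = [s.name for s in rank_sites(sites)]
```

A scheduler built on such a score would place the workload on the first site in the ranking and re-rank as carbon intensity readings change.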
Article 404
Title@2025-06-24 (2): COBRA-PPM: A Causal Bayesian Reasoning Architecture Using Probabilistic Programming for Robot Manipulation Under Uncertainty
Title: COBRA-PPM: A Causal Bayesian Reasoning Architecture Using Probabilistic Programming for Robot Manipulation Under Uncertainty | COBRA-PPM: Eine kausale Bayesian-Reasoning-Architektur mit probabilistischer Programmierung für Robotermanipulation unter Unsicherheit | COBRA-PPM: 在不确定性下对机器人操纵进行概率程序设计 2403.14488v3 |
Authors (5): Ricardo Cannizzaro, Michael Groom, Jonathan Routley, Robert Osazuwa Ness, Lars Kunze
Manipulation tasks require robots to reason about cause and effect when interacting with objects. Yet, many data-driven approaches lack causal semantics and thus only consider correlations. We introduce COBRA-PPM, a novel causal Bayesian reasoning architecture that combines causal Bayesian networks and probabilistic programming to perform interventional inference for robot manipulation under uncertainty. We demonstrate its capabilities through high-fidelity Gazebo-based experiments on an exemplar block stacking task, where it predicts manipulation outcomes with high accuracy (Pred Acc: 88.6%) and performs greedy next-best action selection with a 94.2% task success rate. We further demonstrate sim2real transfer on a domestic robot, showing effectiveness in handling real-world uncertainty from sensor noise and stochastic actions. Our generalised and extensible framework supports a wide range of manipulation scenarios and lays a foundation for future work at the intersection of robotics and causality.
Article 405
Title@2025-06-24 (2): Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models
Title: Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models | Fuzz-Testing trifft auf LLM-basierte Agenten: Ein automatisiertes und effizientes Framework für Jailbreaking-Text-to-Image-Generationsmodelle | 以LLM为根据的代理物:一个自动有效的框架,用于制作监狱破译文本到图像制作模型。 2408.00523v3 |
Authors (5): Yingkai Dong, Xiangtao Meng, Ning Yu, Zheng Li, Shanqing Guo
Text-to-image (T2I) generative models have revolutionized content creation by transforming textual descriptions into high-quality images. However, these models are vulnerable to jailbreaking attacks, where carefully crafted prompts bypass safety mechanisms to produce unsafe content. While researchers have developed various jailbreak attacks to expose this risk, these methods face significant limitations, including impractical access requirements, easily detectable unnatural prompts, restricted search spaces, and high query demands on the target system. In this paper, we propose JailFuzzer, a novel fuzzing framework driven by large language model (LLM) agents, designed to efficiently generate natural and semantically meaningful jailbreak prompts in a black-box setting. Specifically, JailFuzzer employs fuzz-testing principles with three components: a seed pool for initial and jailbreak prompts, a guided mutation engine for generating meaningful variations, and an oracle function to evaluate jailbreak success. Furthermore, we construct the guided mutation engine and oracle function by LLM-based agents, which further ensures efficiency and adaptability in black-box settings. Extensive experiments demonstrate that JailFuzzer has significant advantages in jailbreaking T2I models. It generates natural and semantically coherent prompts, reducing the likelihood of detection by traditional defenses. Additionally, it achieves a high success rate in jailbreak attacks with minimal query overhead, outperforming existing methods across all key metrics. This study underscores the need for stronger safety mechanisms in generative models and provides a foundation for future research on defending against sophisticated jailbreaking attacks. JailFuzzer is open-source and available at this repository: https://github.com/YingkaiD/JailFuzzer.
Article 406
Title@2025-06-24 (2): Protein Structure Tokenization: Benchmarking and New Recipe
Title: Protein Structure Tokenization: Benchmarking and New Recipe | Proteinstruktur Tokenization: Benchmarking und neues Rezept | 蛋白质结构化:基准和新食谱 2503.00089v2 |
Authors (4): Xinyu Yuan, Zichen Wang, Marcus Collins, Huzefa Rangwala
Recent years have witnessed a surge in the development of protein structural tokenization methods, which chunk protein 3D structures into discrete or continuous representations. Structure tokenization enables the direct application of powerful techniques like language modeling for protein structures, and large multimodal models to integrate structures with protein sequences and functional texts. Despite the progress, the capabilities and limitations of these methods remain poorly understood due to the lack of a unified evaluation framework. We first introduce StructTokenBench, a framework that comprehensively evaluates the quality and efficiency of structure tokenizers, focusing on fine-grained local substructures rather than global structures, as typical in existing benchmarks. Our evaluations reveal that no single model dominates all benchmarking perspectives. Observations of codebook under-utilization led us to develop AminoAseed, a simple yet effective strategy that enhances codebook gradient updates and optimally balances codebook size and dimension for improved tokenizer utilization and quality. Compared to the leading model ESM3, our method achieves an average of 6.31% performance improvement across 24 supervised tasks, with sensitivity and utilization rates increased by 12.83% and 124.03%, respectively. Source code and model weights are available at https://github.com/KatarinaYuan/StructTokenBench
Article 407
Title@2025-06-24 (2): Progressive Size-Adaptive Federated Learning: A Comprehensive Framework for Heterogeneous Multi-Modal Data Systems
Title: Progressive Size-Adaptive Federated Learning: A Comprehensive Framework for Heterogeneous Multi-Modal Data Systems | Progressives Size-Adaptive-Federated Learning: Ein umfassender Rahmen für heterogene multimodale Datensysteme | 渐进式规模-成熟型联邦学习:多种模式数据系统综合框架 2506.20685v1 |
Authors (5): Sajid Hussain, Muhammad Sohail, Nauman Ali Khan, Naima Iltaf, Ihtesham ul Islam
Federated Learning (FL) has emerged as a transformative paradigm for distributed machine learning while preserving data privacy. However, existing approaches predominantly focus on model heterogeneity and aggregation techniques, largely overlooking the fundamental impact of dataset size characteristics on federated training dynamics. This paper introduces Size-Based Adaptive Federated Learning (SAFL), a novel progressive training framework that systematically organizes federated learning based on dataset size characteristics across heterogeneous multi-modal data. Our comprehensive experimental evaluation across 13 diverse datasets spanning 7 modalities (vision, text, time series, audio, sensor, medical vision, and multimodal) reveals critical insights: 1) an optimal dataset size range of 1000-1500 samples for federated learning effectiveness; 2) a clear modality performance hierarchy with structured data (time series, sensor) significantly outperforming unstructured data (text, multimodal); and 3) systematic performance degradation for large datasets exceeding 2000 samples. SAFL achieves an average accuracy of 87.68% across all datasets, with structured data modalities reaching 99%+ accuracy. The framework demonstrates superior communication efficiency, reducing total data transfer to 7.38 GB across 558 communications while maintaining high performance. Our real-time monitoring framework provides unprecedented insights into system resource utilization, network efficiency, and training dynamics. This work fills critical gaps in understanding how data characteristics should drive federated learning strategies, providing both theoretical insights and practical guidance for real-world FL deployments in neural network and learning systems.
Article 408
Title@2025-06-24 (2): SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models
Title: SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models | SA-Solver: Stochastischer Adams Solver für schnelle Probenahme von Diffusionsmodellen | SA-Solver:用于快速采样扩散模型的蒸汽器溶解器 2309.05019v3 |
Authors (7): Shuchen Xue, Mingyang Yi, Weijian Luo, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
Diffusion Probabilistic Models (DPMs) have achieved considerable success in generation tasks. As sampling from DPMs is equivalent to solving the diffusion SDE or ODE, which is time-consuming, numerous fast sampling methods built upon improved differential equation solvers have been proposed. The majority of such techniques consider solving the diffusion ODE due to its superior efficiency. However, stochastic sampling could offer additional advantages in generating diverse and high-quality data. In this work, we engage in a comprehensive analysis of stochastic sampling from two aspects: variance-controlled diffusion SDE and linear multi-step SDE solver. Based on our analysis, we propose SA-Solver, which is an improved efficient stochastic Adams method for solving diffusion SDE to generate data with high quality. Our experiments show that SA-Solver achieves: 1) improved or comparable performance compared with the existing state-of-the-art (SOTA) sampling methods for few-step sampling; 2) SOTA FID on substantial benchmark datasets under a suitable number of function evaluations (NFEs). Code is available at https://github.com/scxue/SA-Solver.
Article 409
Title@2025-06-24 (2): MILAAP: Mobile Link Allocation via Attention-based Prediction
Title: MILAAP: Mobile Link Allocation via Attention-based Prediction | MILAAP: Mobile Link Allocation über aufmerksamkeitsbasierte Vorhersage | MILAAP:通过基于关注的预测分配移动链接 2506.19947v1 |
Authors (2): Yung-Fu Chen, Anish Arora
Channel hopping (CS) communication systems must adapt to interference changes in the wireless network and to node mobility for maintaining throughput efficiency. Optimal scheduling requires up-to-date network state information (i.e., of channel occupancy) to select non-overlapping channels for links in interference regions. However, state sharing among nodes introduces significant communication overhead, especially as network size or node mobility scale, thereby decreasing throughput efficiency of already capacity-limited networks. In this paper, we eschew state sharing while adapting the CS schedule based on a learning-based channel occupancy prediction. We propose the MiLAAP attention-based prediction framework for machine learning models of spectral, spatial, and temporal dependencies among network nodes. MiLAAP uses a self-attention mechanism that lets each node capture the temporospectral CS pattern in its interference region and accordingly predict the channel occupancy state within that region. Notably, the prediction relies only on locally and passively observed channel activities, and thus introduces no communication overhead. To deal with node mobility, MiLAAP also uses a multi-head self-attention mechanism that lets each node locally capture the spatiotemporal dependencies on other network nodes that can interfere with it and accordingly predict the motion trajectory of those nodes. Detecting nodes that enter or move outside the interference region is used to further improve the prediction accuracy of channel occupancy. We show that for dynamic networks that use local CS sequences to support relatively long-lived flow traffics, the channel state prediction accuracy of MiLAAP is remarkably ~100% across different node mobility patterns and it achieves zero-shot generalizability across different periods of CS sequences.
Article 410
Title@2025-06-24 (2): Data-Driven Dynamic Factor Modeling via Manifold Learning
Title: Data-Driven Dynamic Factor Modeling via Manifold Learning | Datengetriebene Dynamische Faktormodellierung über Manifold Learning | 数据驱动动态因子通过 MManiple Learning 学习模式建模 2506.19945v1 |
Authors (3): Graeme Baker, Agostino Capponi, J. Antonio Sidaoui
We propose a data-driven dynamic factor framework where a response variable depends on a high-dimensional set of covariates, without imposing any parametric model on the joint dynamics. Leveraging Anisotropic Diffusion Maps, a nonlinear manifold learning technique introduced by Singer and Coifman, our framework uncovers the joint dynamics of the covariates and responses in a purely data-driven way. We approximate the embedding dynamics using linear diffusions, and exploit Kalman filtering to predict the evolution of the covariates and response variables directly from the diffusion map embedding space. We generalize Singer’s convergence rate analysis of the graph Laplacian from the case of independent uniform samples on a compact manifold to the case of time series arising from Langevin diffusions in Euclidean space. Furthermore, we provide rigorous justification for our procedure by showing the robustness of approximations of the diffusion map coordinates by linear diffusions, and the convergence of ergodic averages under standard spectral assumptions on the underlying dynamics. We apply our method to the stress testing of equity portfolios using a combination of financial and macroeconomic factors from the Federal Reserve’s supervisory scenarios. We demonstrate that our data-driven stress testing method outperforms standard scenario analysis and Principal Component Analysis benchmarks through historical backtests spanning three major financial crises, achieving reductions in mean absolute error of up to 55% and 39% for scenario-based portfolio return prediction, respectively.
Article 411
Title@2025-06-24 (2): LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps
Title: LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps | LLM-Wassermarkierung mit Mischungen und statistischen bis rechnerischen Lücken | LLM LLM 利用混合体和统计到统计差距进行水标记 2505.01484v2 |
Authors (2): Pedro Abdalla, Roman Vershynin
Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? A widely studied approach to this problem is watermarking. We propose an undetectable and elementary watermarking scheme in the closed setting. Also, in the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.
Article 412
Title@2025-06-24 (2): The Most Important Features in Generalized Additive Models Might Be Groups of Features
Title: The Most Important Features in Generalized Additive Models Might Be Groups of Features | Die wichtigsten Merkmale in generalisierten additiven Modellen könnten Gruppen von Funktionen sein | 通用Additive模型中最重要的地物可能是地物群 2506.19937v1 |
Authors (11): Tomas M. Bosschieter, Luis Franca, Jessica Wolk, Yiyuan Wu, Bella Mehta, Joseph Dehoney, Orsolya Kiss, Fiona C. Baker, Qingyu Zhao, Rich Caruana, Kilian M. Pohl
While analyzing the importance of features has become ubiquitous in interpretable machine learning, the joint signal from a group of related features is sometimes overlooked or inadvertently excluded. Neglecting the joint signal could bypass a critical insight: in many instances, the most significant predictors are not isolated features, but rather the combined effect of groups of features. This can be especially problematic for datasets that contain natural groupings of features, including multimodal datasets. This paper introduces a novel approach to determine the importance of a group of features for Generalized Additive Models (GAMs) that is efficient, requires no model retraining, allows defining groups posthoc, permits overlapping groups, and remains meaningful in high-dimensional settings. Moreover, this definition offers a parallel with explained variation in statistics. We showcase properties of our method on three synthetic experiments that illustrate the behavior of group importance across various data regimes. We then demonstrate the importance of groups of features in identifying depressive symptoms from a multimodal neuroscience dataset, and study the importance of social determinants of health after total hip arthroplasty. These two case studies reveal that analyzing group importance offers a more accurate, holistic view of the medical issues compared to a single-feature analysis.
Article 413
Title@2025-06-24 (2): Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture
Title: Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture | Jede Bestellung GPT als Maskierte Diffusion Modell: Entkopplung Formulierung und Architektur | 任何指令 GPT , 以遮蔽扩散模型: 脱钩制成和结构 2506.19935v1 |
Authors (8): Shuchen Xue, Tianyu Xie, Tianyang Hu, Zijin Feng, Jiacheng Sun, Kenji Kawaguchi, Zhenguo Li, Zhi-Ming Ma
Large language models (LLMs) predominantly use autoregressive (AR) approaches, but masked diffusion models (MDMs) are emerging as viable alternatives. A key challenge in comparing AR and MDM paradigms is their typical architectural difference: AR models are often decoder-only, while MDMs have largely been encoder-only. This practice of changing both the modeling paradigm and architecture simultaneously makes direct comparisons unfair, as it’s hard to distinguish whether observed differences stem from the paradigm itself or the architectural shift. This research evaluates MDMs within a decoder-only framework to: (1) equitably compare MDM (as Any-Order AR, or AO-AR) and standard AR paradigms. Our investigation suggests that the standard AO-AR objective, which averages over all token permutations, may benefit from refinement, as many permutations appear less informative compared to the language’s inherent left-to-right structure. (2) Investigate architectural influences (decoder-only vs. encoder-only) within MDMs. We demonstrate that while encoder-only MDMs model a simpler conditional probability space, decoder-only MDMs can achieve dramatic generation speedups ($\sim25\times$) and comparable perplexity with temperature annealing despite modeling a vastly larger space, highlighting key trade-offs. This work thus decouples core paradigm differences from architectural influences, offering insights for future model design. Code is available at https://github.com/scxue/AO-GPT-MDM.
Article 414
Title@2025-06-24 (2): C-Learner: Constrained Learning for Causal Inference
Title: C-Learner: Constrained Learning for Causal Inference | C-Learner: Eingeschränktes Lernen für kausale Schlussfolgerung | C-Learner: 控制学习以诱因推断 2405.09493v4 |
Authors (4): Tiffany Tianhui Cai, Yuri Fonseca, Kaiwen Hou, Hongseok Namkoong
Popular debiased estimation methods for causal inference – such as augmented inverse propensity weighting and targeted maximum likelihood estimation – enjoy desirable asymptotic properties like statistical efficiency and double robustness but they can produce unstable estimates when there is limited overlap between treatment and control, requiring additional assumptions or ad hoc adjustments in practice (e.g., truncating propensity scores). In contrast, simple plug-in estimators are stable but lack desirable asymptotic properties. We propose a novel debiasing approach that achieves the best of both worlds, producing stable plug-in estimates with desirable asymptotic properties. Our constrained learning framework solves for the best plug-in estimator under the constraint that the first-order error with respect to the plugged-in quantity is zero, and can leverage flexible model classes including neural networks and tree ensembles. In several experimental settings, including ones in which we handle text-based covariates by fine-tuning language models, our constrained learning-based estimator outperforms basic versions of one-step estimation and targeting in challenging settings with limited overlap between treatment and control, and performs similarly otherwise. Finally, to understand why our method exhibits superior performance in settings with low overlap, we present a theoretical example with heavy-tailed inverse propensity scores in which other debiased estimators converge more slowly compared to ours.
Article 415
Title@2025-06-24 (2): Anomaly Detection and Radio-frequency Interference Classification with Unsupervised Learning in Narrowband Radio Technosignature Searches
Title: Anomaly Detection and Radio-frequency Interference Classification with Unsupervised Learning in Narrowband Radio Technosignature Searches | Anomalieerkennung und Hochfrequenz-Interferenzklassifikation mit unüberwachtem Lernen in Schmalband-Radio-Technosignatur-Suchen | 在窄带无线电技术签名搜索中进行无监督学习的异常探测和无线电频率干扰分类 2411.16556v2 |
Authors (10): Ben Jacobson-Bell, Steve Croft, Carmen Choza, Alex Andersson, Daniel Bautista, Vishal Gajjar, Matthew Lebofsky, David H. E. MacMahon, Caleb Painter, Andrew P. V. Siemion
The search for radio technosignatures is an anomaly detection problem: Candidate signals represent needles of interest in the proverbial haystack of radio-frequency interference (RFI). Current search frameworks find an enormous number of false-positive signals, especially in large surveys, requiring manual follow-up to a sometimes prohibitive degree. Unsupervised learning provides an algorithmic way to winnow the most anomalous signals from the chaff, as well as group together RFI signals that bear morphological similarities. We present GLOBULAR (Grouping Low-frequency Observations By Unsupervised Learning After Reduction) clustering, a signal processing method that uses HDBSCAN to reduce the false-positive rate and isolate outlier signals for further analysis. When combined with a standard narrowband signal detection and spatial filtering pipeline, such as turboSETI, GLOBULAR clustering offers significant improvements in the false-positive rate over the standard pipeline alone, suggesting that it could dramatically reduce manual follow-up requirements for future large surveys. By removing RFI signals in regions of high spectral occupancy, GLOBULAR clustering may also enable the detection of signals missed by the standard pipeline. We benchmark our method against the Choza et al. turboSETI-only search of 97 nearby galaxies at the L band, demonstrating a false-positive hit reduction rate of 93.1% and a false-positive event reduction rate of 99.3%.
Article 416
Title@2025-06-24 (2): A Comparative Analysis of Reinforcement Learning and Conventional Deep Learning Approaches for Bearing Fault Diagnosis
Title: A Comparative Analysis of Reinforcement Learning and Conventional Deep Learning Approaches for Bearing Fault Diagnosis | Eine vergleichende Analyse des Verstärkungslernens und konventioneller Deep-Learning-Ansätze zur Fault-Diagnose | 强化学习和遗留过失诊断常规深习方法比较分析 2506.19929v1 |
Authors (2): Efe Çakır, Patrick Dumond
Bearing faults in rotating machinery can lead to significant operational disruptions and maintenance costs. Modern methods for bearing fault diagnosis rely heavily on vibration analysis and machine learning techniques, which often require extensive labeled data and may not adapt well to dynamic environments. This study explores the feasibility of reinforcement learning (RL), specifically Deep Q-Networks (DQNs), for bearing fault classification tasks in machine condition monitoring to enhance the accuracy and adaptability of bearing fault diagnosis. The results demonstrate that while RL models developed in this study can match the performance of traditional supervised learning models under controlled conditions, they excel in adaptability when equipped with optimized reward structures. However, their computational demands highlight areas for further improvement. These findings demonstrate RL’s potential to complement traditional methods, paving the way for adaptive diagnostic frameworks.
Article 417
Title@2025-06-24 (2): Prover Agent: An Agent-based Framework for Formal Mathematical Proofs
Title: Prover Agent: An Agent-based Framework for Formal Mathematical Proofs | Prover Agent: Ein agentenbasiertes Framework für formale mathematische Nachweise | 以代理人为基础的正式数学证明框架 2506.19923v1 |
Authors (4): Kaito Baba, Chaoran Liu, Shuhei Kurita, Akiyoshi Sannai
We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas to assist in discovering the overall proof strategy. It achieves an 86.1% success rate on the MiniF2F benchmark, establishing a new state-of-the-art among methods using small language models (SLMs) with a much lower sample budget than previous approaches. We also present case studies illustrating how these generated lemmas contribute to solving challenging problems.
Article 418
Title@2025-06-24 (2): Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
Title: Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation | Radiale Aufmerksamkeit: $O(n\log n)$ Sparse Achtung mit Energieverlust für lange Video-Generation | 辐射注意: $O(nlog n)$ 散射注意, 长期视频生成的能源衰减导致能量衰减 2506.19852v1 |
Authors (14): Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han
Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as the spatial and temporal distance between tokens increases, akin to the physical decay of signals or waves over space and time in nature. Motivated by this, we propose Radial Attention, a scalable sparse attention mechanism with $O(n \log n)$ complexity that translates energy decay into exponentially decaying compute density, which is significantly more efficient than standard $O(n^2)$ dense attention and more expressive than linear attention. Specifically, Radial Attention employs a simple, static attention mask where each token attends to spatially nearby tokens, with the attention window size shrinking with temporal distance. Moreover, it allows pre-trained video diffusion models to extend their generation length with efficient LoRA-based fine-tuning. Extensive experiments show that Radial Attention maintains video quality across Wan2.1-14B, HunyuanVideo, and Mochi 1, achieving up to a 1.9$\times$ speedup over the original dense attention. With minimal tuning, it enables video generation up to 4$\times$ longer while reducing training costs by up to 4.4$\times$ compared to direct fine-tuning and accelerating inference by up to 3.7$\times$ compared to dense attention inference.
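The static mask described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `radial_mask`, the frame-major token layout, and the specific schedule (the spatial window halves each time the temporal distance doubles) are assumptions chosen only to show a window that shrinks with temporal distance.

```python
def radial_mask(num_frames, tokens_per_frame, base_window):
    """Return mask[i][j] == True when query token i may attend to key token j.

    Tokens are assumed to be laid out frame by frame, so token index
    i = frame * tokens_per_frame + spatial_position (a hypothetical layout).
    """
    n = num_frames * tokens_per_frame
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        frame_i, pos_i = divmod(i, tokens_per_frame)
        for j in range(n):
            frame_j, pos_j = divmod(j, tokens_per_frame)
            dt = abs(frame_i - frame_j)
            # Assumed schedule: the window halves whenever the temporal
            # distance doubles (dt=0 -> base, dt=1 -> base/2, dt=2..3 -> base/4).
            window = base_window >> dt.bit_length()
            mask[i][j] = abs(pos_i - pos_j) <= window
    return mask


def density(mask):
    """Fraction of query/key pairs that are kept by the mask."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


if __name__ == "__main__":
    m = radial_mask(num_frames=8, tokens_per_frame=16, base_window=8)
    print(f"kept fraction of attention pairs: {density(m):.3f}")
```

Because the mask depends only on token positions, it can be built once and reused across layers and denoising steps; a practical kernel would store it in a block-sparse format rather than as a dense boolean matrix.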
Article 419
Title@2025-06-24 (2): Orthogonal Finetuning Made Scalable
Title: Orthogonal Finetuning Made Scalable | Orthogonale Feinsteuerung aus skalierbarem Material | 可缩放 2506.19847v1 |
Authors (4): Zeju Qiu, Weiyang Liu, Adrian Weller, Bernhard Schölkopf
Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. We further introduce the Cayley-Neumann parameterization, an efficient orthogonal parameterization that approximates the matrix inversion in the Cayley transform via a truncated Neumann series. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without compromising performance. In addition, we extend OFTv2 to support finetuning quantized foundation models and show that it outperforms the popular QLoRA in training stability, efficiency, and memory usage.
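The idea of replacing the inverse in the Cayley transform with a truncated Neumann series can be sketched in a few lines. This is an illustration under stated assumptions, not the OFTv2 code: it assumes the convention R = (I + Q)(I - Q)^{-1} with Q skew-symmetric, approximates (I - Q)^{-1} by I + Q + ... + Q^k (valid when the norm of Q is below 1), and uses hypothetical helper names and plain Python lists instead of a tensor library.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cayley_neumann(q, terms):
    """Approximate (I + Q)(I - Q)^{-1} with a Neumann series truncated at Q^terms."""
    n = len(q)
    inv_approx = identity(n)   # running sum I + Q + ... + Q^k
    power = identity(n)        # current power Q^k
    for _ in range(terms):
        power = matmul(power, q)
        inv_approx = add(inv_approx, power)
    return matmul(add(identity(n), q), inv_approx)


def orthogonality_error(r):
    """Largest absolute entry of R^T R - I; zero for an exactly orthogonal R."""
    n = len(r)
    rt = [[r[j][i] for j in range(n)] for i in range(n)]
    gram = matmul(rt, r)
    return max(abs(gram[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    q = [[0.0, 0.1], [-0.1, 0.0]]  # small skew-symmetric matrix
    for k in (1, 2, 5):
        print(k, orthogonality_error(cayley_neumann(q, k)))
```

The result is only approximately orthogonal, and the error shrinks as more series terms are kept; the series avoids an explicit matrix inverse at the cost of requiring Q to stay small in norm.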
Article 420
Title@2025-06-24 (2): A Comparative Study of NAFNet Baselines for Image Restoration
Title: A Comparative Study of NAFNet Baselines for Image Restoration | Eine vergleichende Studie von NAFNet Baselines für die Bildrestaurierung | NAFNet图像恢复基线比较研究 2506.19845v1 |
Authors (2): Vladislav Esaulov, M. Moein Esfahani
We study NAFNet (Nonlinear Activation Free Network), a simple and efficient deep learning baseline for image restoration. Using CIFAR10 images corrupted with noise and blur, we conduct an ablation study of NAFNet’s core components. Our baseline model implements SimpleGate activation, Simplified Channel Attention (SCA), and Layer Normalization. We compare this baseline to different variants that replace or remove components. Quantitative results (PSNR, SSIM) and examples illustrate how each modification affects restoration performance. Our findings support the NAFNet design: the SimpleGate and simplified attention mechanisms yield better results than conventional activations and attention, while LayerNorm proves to be important for stable training. We conclude with recommendations for model design and a discussion of potential improvements and future work.
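For readers unfamiliar with SimpleGate, the operation can be sketched as follows: the feature map is split into two halves along the channel axis and the halves are multiplied elementwise, so no nonlinear activation function is applied. The list-of-channels representation and the function name `simple_gate` are illustrative assumptions; a real model would apply the same split to a tensor.

```python
def simple_gate(channels):
    """Split the channel axis in half and multiply the halves elementwise.

    `channels` is a list of channels, each a flat list of feature values
    (an assumed toy layout). The output has half as many channels.
    """
    if len(channels) % 2 != 0:
        raise ValueError("SimpleGate needs an even number of channels")
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    return [[a * b for a, b in zip(ca, cb)] for ca, cb in zip(first, second)]


if __name__ == "__main__":
    features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(simple_gate(features))
```

Since the gate halves the channel count, the preceding layer in such a block typically doubles it first so that the block's width is preserved.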
Article 421
Title@2025-06-24 (2): Convergence of Mean Shift Algorithms for Large Bandwidths and Simultaneous Accurate Clustering
Title: Convergence of Mean Shift Algorithms for Large Bandwidths and Simultaneous Accurate Clustering | Konvergenz von mittleren Shift-Algorithmen für große Bandbreiten und simultanes präzises Clustering | 大型带宽和同声精密集束中 平均移动比值的趋同 2506.19837v1 |
Authors (2): Susovan Pal, Praneeth Vepakomma
The mean shift (MS) is a non-parametric, density-based, iterative algorithm that has prominent usage in clustering and image segmentation. A rigorous proof of its convergence in full generality remains unknown. Two significant steps in this direction were taken in [Gh1], which proved that for sufficiently large bandwidth, the MS algorithm with the Gaussian kernel always converges in any dimension, and in [Gh2], where the same author proved that MS always converges in one dimension for kernels with differentiable, strictly decreasing, convex profiles. The more recent paper [YT] proved convergence in greater generality, without any restriction on the bandwidth, under the assumption that the KDE $f$ has a Lipschitz continuous gradient on the closure of the convex hull of the trajectory of the iterated sequence of the mode estimate, and also satisfies the Łojasiewicz property there. The main theoretical result of this paper is a generalization of those of [Gh1], where we show that (1) for sufficiently large bandwidth, convergence is guaranteed in any dimension with any radially symmetric and strictly positive definite kernel. The proof uses two alternate characterizations of radially symmetric positive definite smooth kernels by Schoenberg and Bernstein [Fass], and borrows some steps from the proofs in [Gh1]. Although the authors acknowledge that the result in this paper is more restrictive than that of [YT] due to the lower bandwidth limit, it uses a different set of assumptions than [YT], and the proof technique is different.
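As background for the convergence statements above, here is a minimal one-dimensional sketch of the mean shift iteration with a Gaussian kernel. It is the textbook update (move the current estimate to the kernel-weighted mean of the data), not code from the paper; the function names, tolerance, and toy data are assumptions.

```python
import math


def mean_shift_step(x, data, bandwidth):
    """One mean shift update: the Gaussian-weighted mean of the data around x."""
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2)) for xi in data]
    return sum(w * xi for w, xi in zip(weights, data)) / sum(weights)


def mean_shift(x0, data, bandwidth, tol=1e-10, max_iter=10_000):
    """Iterate the update until successive estimates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = mean_shift_step(x, data, bandwidth)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


if __name__ == "__main__":
    data = [0.0, 0.2, 5.0, 5.2]
    # Small bandwidth: different starting points settle on different modes.
    print(mean_shift(0.0, data, bandwidth=0.3), mean_shift(5.2, data, bandwidth=0.3))
    # Large bandwidth: the density estimate is unimodal and both starts agree.
    print(mean_shift(0.0, data, bandwidth=10.0), mean_shift(5.2, data, bandwidth=10.0))
```

In this toy data the large-bandwidth runs collapse to a single mode at the centre of symmetry, which gives some intuition for why the large-bandwidth regime is the easier one to analyze.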
Article 422
Title@2025-06-24 (2): Machine Learning with Privacy for Protected Attributes
Title: Machine Learning with Privacy for Protected Attributes | Maschinelles Lernen mit Datenschutz für geschützte Attribute | 带有受保护属性隐私的机器学习 2506.19836v1 |
Authors (4): Saeed Mahloujifar, Chuan Guo, G. Edward Suh, Kamalika Chaudhuri
Differential privacy (DP) has become the standard for private data analysis. Certain machine learning applications only require privacy protection for specific protected attributes. Using naive variants of differential privacy in such use cases can result in unnecessary degradation of utility. In this work, we refine the definition of DP to create a more general and flexible framework that we call feature differential privacy (FDP). Our definition is simulation-based and allows for both addition/removal and replacement variants of privacy, and can handle arbitrary and adaptive separation of protected and non-protected features. We prove the properties of FDP, such as adaptive composition, and demonstrate its implications for limiting attribute inference attacks. We also propose a modification of the standard DP-SGD algorithm that satisfies FDP while leveraging desirable properties such as amplification via sub-sampling. We apply our framework to various machine learning tasks and show that it can significantly improve the utility of DP-trained models when public features are available. For example, we train diffusion models on the AFHQ dataset of animal faces and observe a drastic improvement in FID compared to DP, from 286.7 to 101.9 at $\epsilon=8$, assuming that the blurred version of a training image is available as a public feature. Overall, our work provides a new approach to private data analysis that can help reduce the utility cost of DP while still providing strong privacy guarantees.
Article 423
Title@2025-06-24 (2): Inferring Higher-Order Couplings with Neural Networks
Title: Inferring Higher-Order Couplings with Neural Networks | Rückschlüsse auf höhere Auftragskoppelungen mit neuralen Netzen | 与神经网络连接 2501.06108v3 |
Authors (3): Aurélien Decelle, Alfonso de Jesús Navas Gómez, Beatriz Seoane
Maximum entropy methods, rooted in the inverse Ising/Potts problem from statistical physics, are widely used to model pairwise interactions in complex systems across disciplines such as bioinformatics and neuroscience. While successful, these approaches often fail to capture higher-order interactions that are critical for understanding collective behavior. In contrast, modern machine learning methods can model such interactions, but their interpretability often comes at a prohibitive computational cost. Restricted Boltzmann Machines (RBMs) provide a computationally efficient alternative by encoding statistical correlations through hidden units in a bipartite architecture. In this work, we introduce a method that maps RBMs onto generalized Potts models, enabling the systematic extraction of interactions up to arbitrary order. Leveraging large-$N$ approximations – made tractable by the RBM’s structure – we extract effective many-body couplings with minimal computational effort. We further propose a robust framework for recovering higher-order interactions in more complex generative models, and introduce a simple gauge-fixing scheme for the effective Potts representation. Validation on synthetic data demonstrates accurate recovery of two- and three-body interactions. Applied to protein sequence data, our method reconstructs contact maps with high fidelity and outperforms state-of-the-art inverse Potts models. These results establish RBMs as a powerful and efficient tool for modeling higher-order structure in high-dimensional categorical data.
Article 424
Title@2025-06-24 (2): A standard transformer and attention with linear biases for molecular conformer generation
Title: A standard transformer and attention with linear biases for molecular conformer generation | Ein Standardtransformator und Aufmerksamkeit mit linearen Vorspannungen für die molekulare Konformergeneration | 标准的变压器和对分子相配器生成具有线性偏偏的注意 2506.19834v1 |
Authors (2): Viatcheslav Gurev, Timothy Rumbell
Sampling low-energy molecular conformations, spatial arrangements of atoms in a molecule, is a critical task for many different calculations performed in the drug discovery and optimization process. Numerous specialized equivariant networks have been designed to generate molecular conformations from 2D molecular graphs. Recently, non-equivariant transformer models have emerged as a viable alternative due to their capability to scale to improve generalization. However, the concern has been that non-equivariant models require a large model size to compensate for the lack of equivariant bias. In this paper, we demonstrate that a well-chosen positional encoding effectively addresses these size limitations. A standard transformer model incorporating relative positional encoding for molecular graphs, when scaled to 25 million parameters, surpasses the current state-of-the-art non-equivariant base model with 64 million parameters on the GEOM-DRUGS benchmark. We implemented relative positional encoding as a negative attention bias that linearly increases with the shortest path distances between graph nodes at varying slopes for different attention heads, similar to ALiBi, a widely adopted relative positional encoding technique in the NLP domain. This architecture has the potential to serve as a foundation for a novel class of generative models for molecular conformations.
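The bias described above can be sketched as follows: compute shortest path distances on the molecular graph, then subtract slope times distance from the attention logits, with one slope per head. This is a minimal illustration, not the authors' code. The function names are hypothetical, the geometric slope sequence is borrowed from ALiBi as an assumption, and unreachable node pairs receive a bias of negative infinity (which assumes strictly positive slopes).

```python
import math
from collections import deque


def shortest_path_lengths(num_nodes, edges):
    """All-pairs shortest path lengths on an unweighted, undirected graph (BFS)."""
    adjacency = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    dist = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def alibi_style_slopes(num_heads):
    """Geometric slopes 2^(-8h/H), h = 1..H (the ALiBi recipe, assumed here)."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def graph_attention_bias(num_nodes, edges, slopes):
    """bias[h][i][j] = -slopes[h] * d(i, j), to be added to attention logits."""
    dist = shortest_path_lengths(num_nodes, edges)
    return [[[-slope * dist[i][j] for j in range(num_nodes)]
             for i in range(num_nodes)] for slope in slopes]


if __name__ == "__main__":
    # Propane-like chain of three heavy atoms: 0 - 1 - 2.
    bias = graph_attention_bias(3, [(0, 1), (1, 2)], alibi_style_slopes(4))
    print(bias[0][0])
```

Heads with steep slopes focus on bonded neighbours, while heads with shallow slopes can still attend across the whole molecule.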
Article 425
Title@2025-06-24 (2): Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential
Title: Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential | Fourier-Multi-Komponente und Multi-Layer-Neural-Netzwerke: Entsperren von Hochfrequenzpotenzialen | Fariier多功能多功能多轨道神经网络:释放高功能潜能 2502.18959v2 |
Authors (4): Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou
The architecture of a neural network and the selection of its activation function are both fundamental to its performance. Equally vital is ensuring these two elements are well-matched, as their alignment is key to achieving effective representation and learning. In this paper, we introduce the Fourier Multi-Component and Multi-Layer Neural Network (FMMNN), a novel model that creates a strong synergy between them. We demonstrate that FMMNNs are highly effective and flexible in modeling high-frequency components. Our theoretical results demonstrate that FMMNNs have exponential expressive power for function approximation. We also analyze the optimization landscape of FMMNNs and find it to be much more favorable than that of standard fully connected neural networks, especially when dealing with high-frequency features. In addition, we propose a scaled random initialization method for the first layer’s weights in FMMNNs, which significantly speeds up training and enhances overall performance. Extensive numerical experiments support our theoretical insights, showing that FMMNNs consistently outperform traditional approaches in accuracy and efficiency across various tasks.
Article 426
Title@2025-06-24 (2): Scaling Speculative Decoding with Lookahead Reasoning
Title: Scaling Speculative Decoding with Lookahead Reasoning | Spekulative Dekodierung mit Blick auf die Vernunft skalieren | 带有 “ 眼前 “ 理由的 投机替代 2506.19830v1 |
Authors (5): Yichao Fu, Rui Ge, Zelei Shao, Zhijie Deng, Hao Zhang
Reasoning models excel by generating long chains of thought, but decoding the resulting thousands of tokens is slow. Token-level speculative decoding (SD) helps, but its benefit is capped, because the chance that an entire $\gamma$-token guess is correct falls exponentially as $\gamma$ grows. This means allocating more compute for longer token drafts faces an algorithmic ceiling – making the speedup modest and hardware-agnostic. We raise this ceiling with Lookahead Reasoning, which exploits a second, step-level layer of parallelism. Our key insight is that reasoning models generate step-by-step, and each step needs only to be semantically correct, not an exact token match. In Lookahead Reasoning, a lightweight draft model proposes several future steps; the target model expands each proposal in one batched pass, and a verifier keeps semantically correct steps while letting the target regenerate any that fail. Token-level SD still operates within each reasoning step, so the two layers of parallelism multiply. We show Lookahead Reasoning lifts the peak speedup of SD both theoretically and empirically. Across GSM8K, AIME, and other benchmarks, Lookahead Reasoning improves the speedup of SD from 1.4x to 2.1x while preserving answer quality, and its speedup scales better with additional GPU throughput. Our code is available at https://github.com/hao-ai-lab/LookaheadReasoning
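The ceiling on token-level speculative decoding can be made concrete with a small calculation. The sketch below uses a common simplifying assumption that is not stated in the abstract: each drafted token is accepted independently with the same probability `alpha`. Under that assumption a whole draft of `gamma` tokens is accepted with probability alpha^gamma, and the expected number of tokens produced per target-model pass is (1 - alpha^(gamma+1)) / (1 - alpha), which can never exceed 1 / (1 - alpha).

```python
def prob_full_draft_accepted(alpha, gamma):
    """Probability that all gamma drafted tokens are accepted (i.i.d. assumption)."""
    return alpha ** gamma


def expected_tokens_per_pass(alpha, gamma):
    """Expected tokens emitted per target pass: accepted prefix plus one target token."""
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def ceiling(alpha):
    """Limit of expected_tokens_per_pass as the draft length grows without bound."""
    return 1.0 / (1.0 - alpha)


if __name__ == "__main__":
    alpha = 0.8  # hypothetical per-token acceptance rate
    for gamma in (1, 2, 4, 8, 16, 32):
        print(gamma,
              round(prob_full_draft_accepted(alpha, gamma), 4),
              round(expected_tokens_per_pass(alpha, gamma), 4))
    print("ceiling:", ceiling(alpha))
```

Longer drafts yield diminishing returns under this model, which is the motivation for adding a second, step-level layer of parallelism instead of simply drafting more tokens.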
Article 427
Title@2025-06-24 (2): Persona Features Control Emergent Misalignment
Title: Persona Features Control Emergent Misalignment | Persona Eigenschaften Kontrolle Emergent Fehlausrichtung | 人文特征控制 2506.19823v1 |
Authors (9): Miles Wang, Tom Dupré la Tour, Olivia Watkins, Alex Makelov, Ryan A. Chi, Samuel Miserendino, Johannes Heidecke, Tejal Patwardhan, Dan Mossing
Understanding how language models generalize behaviors from their training to a broader deployment distribution is an important problem in AI safety. Betley et al. discovered that fine-tuning GPT-4o on intentionally insecure code causes “emergent misalignment,” where models give stereotypically malicious responses to unrelated prompts. We extend this work, demonstrating emergent misalignment across diverse conditions, including reinforcement learning on reasoning models, fine-tuning on various synthetic datasets, and in models without safety training. To investigate the mechanisms behind this generalized misalignment, we apply a “model diffing” approach using sparse autoencoders to compare internal model representations before and after fine-tuning. This approach reveals several “misaligned persona” features in activation space, including a toxic persona feature which most strongly controls emergent misalignment and can be used to predict whether a model will exhibit such behavior. Additionally, we investigate mitigation strategies, discovering that fine-tuning an emergently misaligned model on just a few hundred benign samples efficiently restores alignment.
Article 428
Title@2025-06-24 (2): ProxelGen: Generating Proteins as 3D Densities
Title: ProxelGen: Generating Proteins as 3D Densities | ProxelGen: Proteine als 3D-Dichte generieren | ProxelGen: 将蛋白质生成为 3D 密度 2506.19820v1 |
Authors (4): Felix Faltings, Hannes Stark, Regina Barzilay, Tommi Jaakkola
We develop ProxelGen, a protein structure generative model that operates on 3D densities as opposed to the prevailing 3D point cloud representations. Representing proteins as voxelized densities, or proxels, enables new tasks and conditioning capabilities. We generate proteins encoded as proxels via a 3D CNN-based VAE in conjunction with a diffusion model operating on its latent space. Compared to state-of-the-art models, ProxelGen’s samples achieve higher novelty, better FID scores, and the same level of designability as the training set. ProxelGen’s advantages are demonstrated in a standard motif scaffolding benchmark, and we show how 3D density-based generation allows for more flexible shape conditioning.
Article 429
Title@2025-06-24 (2): Model-Based Exploration in Monitored Markov Decision Processes
Title: Model-Based Exploration in Monitored Markov Decision Processes | Modellbasierte Exploration in überwachten Markov-Entscheidungsprozessen | 在监测的Markov决策过程中进行基于模型的探索 2502.16772v5 |
Authors (4): Alireza Kazemipour, Simone Parisi, Matthew E. Taylor, Michael Bowling
A tenet of reinforcement learning is that the agent always observes rewards. However, this is not true in many realistic settings, e.g., a human observer may not always be available to provide rewards, sensors may be limited or malfunctioning, or rewards may be inaccessible during deployment. Monitored Markov decision processes (Mon-MDPs) have recently been proposed to model such settings. However, existing Mon-MDP algorithms have several limitations: they do not fully exploit the problem structure, cannot leverage a known monitor, lack worst-case guarantees for ‘unsolvable’ Mon-MDPs without specific initialization, and offer only asymptotic convergence proofs. This paper makes three contributions. First, we introduce a model-based algorithm for Mon-MDPs that addresses these shortcomings. The algorithm employs two instances of model-based interval estimation: one to ensure that observable rewards are reliably captured, and another to learn the minimax-optimal policy. Second, we empirically demonstrate the advantages. We show faster convergence than prior algorithms in over four dozen benchmarks, and even more dramatic improvement when the monitoring process is known. Third, we present the first finite-sample bound on performance. We show convergence to a minimax-optimal policy even when some rewards are never observable.
Article 430
Title@2025-06-24 (2): Curating art exhibitions using machine learning
Title: Curating art exhibitions using machine learning | Kunstausstellungen mit maschinellem Lernen kuratieren | 利用机器学习,举办美术展览 2506.19813v1 |
Authors (1): Eurico Covas
Art curatorship has always been mostly the subjective work of human experts, who, with extensive knowledge of many and diverse artworks, select a few of those to present in communal spaces, spaces that evolved into what we now call art galleries. There is no hard and fast set of rules on how to select these artworks, given a theme which is either presented to the art curator or constructed by them. Here we present a series of artificial models – a total of four related models – based on machine learning techniques (a subset of artificial intelligence) that attempt to learn from existing exhibitions which have been curated by human experts, in order to be able to do similar curatorship work. We focus exclusively on the last 25 years of past exhibitions at the Metropolitan Museum of Art in New York, due to the quality of the data available and the physical and time limitations of our research. Our four artificial intelligence models achieve a reasonable ability at imitating the various curators responsible for all those exhibitions, with various degrees of precision and curatorial coherence. In particular, we draw two key insights: first, that there is sufficient information in these exhibitions to construct an artificial intelligence model that replicates past exhibitions with an accuracy well above random choices; second, that using feature engineering and carefully designing the architecture of modest size models can make them as good as those using the so-called large language models such as GPT in a brute force approach. We also believe, based on small attempts to use the models in out-of-sample experiments, that given much more data, it should be possible for these kinds of artificial intelligence agents to come closer and closer to the aesthetic and curatorial judgment of human art curators.
Article 431
Title@2025-06-24 (2): Ambiguous Online Learning
Title: Ambiguous Online Learning | Vielfältiges Online-Lernen | 模糊的在线学习 2506.19810v1 |
Authors (1): Vanessa Kosoy
We propose a new variant of online learning that we call “ambiguous online learning”. In this setting, the learner is allowed to produce multiple predicted labels. Such an “ambiguous prediction” is considered correct when at least one of the labels is correct, and none of the labels are “predictably wrong”. The definition of “predictably wrong” comes from a hypothesis class in which hypotheses are also multi-valued. Thus, a prediction is “predictably wrong” if it’s not allowed by the (unknown) true hypothesis. In particular, this setting is natural in the context of multivalued dynamical systems, recommendation algorithms and lossless compression. It is also strongly related to so-called “apple tasting”. We show that in this setting, there is a trichotomy of mistake bounds: up to logarithmic factors, any hypothesis class has an optimal mistake bound of either $\Theta(1)$, $\Theta(\sqrt{N})$ or $N$.
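The correctness criterion stated above can be written as a short predicate. This is one reading of the abstract, sketched under assumptions: labels are hashable values, a prediction is a set of labels, and `allowed` stands for the set of labels that the (unknown) true multi-valued hypothesis permits on the current input. The function name and arguments are hypothetical.

```python
def is_correct(predicted, true_label, allowed):
    """An ambiguous prediction is correct when it contains the true label
    and contains no label outside the set allowed by the true hypothesis."""
    contains_truth = true_label in predicted
    no_predictably_wrong_label = predicted <= allowed  # subset test
    return contains_truth and no_predictably_wrong_label


if __name__ == "__main__":
    allowed = {"a", "b"}          # labels permitted by the true hypothesis
    print(is_correct({"a"}, "a", allowed))        # single correct label
    print(is_correct({"a", "b"}, "a", allowed))   # hedged but still allowed
    print(is_correct({"a", "c"}, "a", allowed))   # "c" is predictably wrong
    print(is_correct({"b"}, "a", allowed))        # misses the true label
```

The subset condition is what prevents the learner from trivially predicting every label: hedging is free only within the labels the true hypothesis allows.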
Article 432
Title@2025-06-24 (2): KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
Title: KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | KnowRL: Erforschendes Wissenswertes Verstärktes Lernen für die Realität | KnowRL:探索知识强化学习促进事实质量 2506.19807v1 |
Authors (5): Baochang Ren, Shuofei Qiao, Wenhao Yu, Huajun Chen, Ningyu Zhang
Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. While Reinforcement Learning (RL) can enhance complex reasoning abilities, its outcome-oriented reward mechanism often lacks factual supervision over the thinking process, further exacerbating the hallucination problem. To address the high hallucination in slow-thinking models, we propose Knowledge-enhanced RL, KnowRL. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. This targeted factual input during RL training enables the model to learn and internalize fact-based reasoning strategies. By directly rewarding adherence to facts within the reasoning steps, KnowRL fosters a more reliable thinking process. Experimental results on three hallucination evaluation datasets and two reasoning evaluation datasets demonstrate that KnowRL effectively mitigates hallucinations in slow-thinking models while maintaining their original strong reasoning capabilities. Our code is available at https://github.com/zjunlp/KnowRL.
Article 433
Title@2025-06-24 (2): First-Passage Approach to Optimizing Perturbations for Improved Training of Machine Learning Models
Title: First-Passage Approach to Optimizing Perturbations for Improved Training of Machine Learning Models | First-Passage-Ansatz zur Optimierung von Störungen für verbessertes Training von Machine Learning-Modellen | 优化干扰以改进机械学习模式培训的第一套办法 2502.04121v3 |
Authors (4): Sagi Meir, Tommer D. Keidar, Shlomi Reuveni, Barak Hirshberg
Machine learning models have become indispensable tools in applications across the physical sciences. Their training is often time-consuming, vastly exceeding the inference timescales. Several protocols have been developed to perturb the learning process and improve the training, such as shrink and perturb, warm restarts, and stochastic resetting. For classifiers, these perturbations have been shown to result in enhanced speedups or improved generalization. However, the design of such perturbations is usually done ad hoc by intuition and trial and error. To rationally optimize training protocols, we frame them as first-passage processes and consider their response to perturbations. We show that if the unperturbed learning process reaches a quasi-steady state, the response at a single perturbation frequency can predict the behavior at a wide range of frequencies. We apply this approach to a CIFAR-10 classifier using the ResNet-18 model and identify a useful perturbation and frequency among several possibilities. We demonstrate the transferability of the approach to other datasets, architectures, optimizers and even tasks (regression instead of classification). Our work allows optimization of perturbations for improving the training of machine learning models using a first-passage approach.
nan
Article 434
Title@2025-06-24 (2): Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective
Title: Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective | Convolution-Gewichtungsmethode für das physikinformierte neuronale Netzwerk: Eine primär-duale Optimierungsperspektive | 物理学-知情神经网络的革命加权法:原始-多极优化视角 2506.19805v1 |
Authors (2): Chenhao Si, Ming Yan
Physics-informed neural networks (PINNs) are extensively employed to solve partial differential equations (PDEs) by ensuring that the outputs and gradients of deep learning models adhere to the governing equations. However, constrained by computational limitations, PINNs are typically optimized using a finite set of points, which poses significant challenges in guaranteeing their convergence and accuracy. In this study, we propose a new weighting scheme that adaptively shifts the loss weights from isolated points to their continuous neighborhood regions. Empirical results show that this weighting scheme reduces the relative $L^2$ errors.
nan
Article 435
Title@2025-06-24 (2): Multiscale Training of Convolutional Neural Networks
Title: Multiscale Training of Convolutional Neural Networks | Multiskalige Ausbildung konvolutionärer neuraler Netzwerke | 革命神经网络的多规模培训 2501.12739v3 |
Authors (4): Shadab Ahamed, Niloufar Zakariaei, Eldad Haber, Moshe Eliasof
Training convolutional neural networks (CNNs) on high-resolution images is often bottlenecked by the cost of evaluating gradients of the loss on the finest spatial mesh. To address this, we propose Multiscale Gradient Estimation (MGE), a Multilevel Monte Carlo-inspired estimator that expresses the expected gradient on the finest mesh as a telescopic sum of gradients computed on progressively coarser meshes. By assigning larger batches to the cheaper coarse levels, MGE achieves the same variance as single-scale stochastic gradient estimation while reducing the number of fine mesh convolutions by a factor of 4 with each downsampling. We further embed MGE within a Full-Multiscale training algorithm that solves the learning problem on coarse meshes first and “hot-starts” the next finer level, cutting the required fine mesh iterations by an additional order of magnitude. Extensive experiments on image denoising, deblurring, inpainting and super-resolution tasks using UNet, ResNet and ESPCN backbones confirm the practical benefits: Full-Multiscale reduces the computation costs by 4-16$\times$ with no significant loss in performance. Together, MGE and Full-Multiscale offer a principled, architecture-agnostic route to accelerate CNN training on high-resolution data without sacrificing accuracy, and they can be combined with other variance-reduction or learning-rate schedules to further enhance scalability.
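The telescopic sum behind MGE can be written out explicitly. The notation here is assumed for illustration and is not taken from the paper: $g_\ell$ denotes the loss gradient evaluated on mesh level $\ell$, with $\ell = 1$ the finest mesh and $\ell = L$ the coarsest.

$$
\mathbb{E}[g_1] = \mathbb{E}[g_L] + \sum_{\ell=1}^{L-1} \mathbb{E}\left[g_\ell - g_{\ell+1}\right]
$$

The intermediate terms cancel pairwise, so the identity is exact. Each expectation can be estimated with its own batch, which is what allows large batches on the cheap coarse levels and small batches for the fine-level correction terms.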
nan
Article 436
Title@2025-06-24 (2): Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Title: Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study | Warum kämpfen Open Source LLMs mit Datenanalyse? Eine systematische empirische Studie | 开放源码LLMs为何要与数据分析斗争?系统的经验研究 2506.19794v1 |
Authors (10): Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang
Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs. By curating a seed dataset of diverse, realistic scenarios, we evaluate models across three dimensions: data understanding, code generation, and strategic planning. Our analysis reveals three key findings: (1) Strategic planning quality serves as the primary determinant of model performance; (2) Interaction design and task complexity significantly influence reasoning capabilities; (3) Data quality demonstrates a greater impact than diversity in achieving optimal performance. We leverage these insights to develop a data synthesis methodology, demonstrating significant improvements in open-source LLMs’ analytical reasoning capabilities.
nan
Article 437
Title@2025-06-24 (2): A comparative analysis of machine learning algorithms for predicting probabilities of default
Title: A comparative analysis of machine learning algorithms for predicting probabilities of default | Eine vergleichende Analyse von maschinellen Lernalgorithmen zur Vorhersage von Ausfallwahrscheinlichkeiten | 对用于预测违约概率的机器学习算法进行比较分析 2506.19789v1 |
Authors (2): Adrian Iulian Cristescu, Matteo Giordano
Predicting the probability of default (PD) of prospective loans is a critical objective for financial institutions. In recent years, machine learning (ML) algorithms have achieved remarkable success across a wide variety of prediction tasks; yet, they remain relatively underutilised in credit risk analysis. This paper highlights the opportunities that ML algorithms offer to this field by comparing the performance of five predictive models (Random Forests, Decision Trees, XGBoost, Gradient Boosting and AdaBoost) to the predominantly used logistic regression, over a benchmark dataset from Scheule et al. (Credit Risk Analytics: The R Companion). Our findings underscore the strengths and weaknesses of each method, providing valuable insights into the most effective ML algorithms for PD prediction in the context of loan portfolios.
nan
Article 438
Title@2025-06-24 (2): SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Title: SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | SRFT: Einstufige Methode mit überwachter und verstärkter Feinsteuerung für die Vernunft | SRFT: 单一标准方法,以监督和加固为理由的罚款 2506.19767v1 |
Authors (10): Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao
Large language models (LLMs) have achieved remarkable progress in reasoning tasks, yet the optimal integration of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) remains a fundamental challenge. Through comprehensive analysis of token distributions, learning dynamics, and integration mechanisms from entropy-based perspectives, we reveal key differences between these paradigms: SFT induces coarse-grained global changes to LLM policy distributions, while RL performs fine-grained selective optimizations, with entropy serving as a critical indicator of training effectiveness. Building on these observations, we propose Supervised Reinforcement Fine-Tuning (SRFT), a single-stage method that unifies both fine-tuning paradigms through entropy-aware weighting mechanisms. Our approach simultaneously applies SFT and RL to directly optimize the LLM using demonstrations and self-exploration rollouts rather than through two-stage sequential methods. Extensive experiments show that SRFT achieves 59.1% average accuracy, outperforming zero-RL methods by 9.0% on five mathematical reasoning benchmarks and 10.9% on three out-of-distribution benchmarks.
nan
Article 439
Title@2025-06-24 (2): FDA-Opt: Communication-Efficient Federated Fine-Tuning of Language Models
Title: FDA-Opt: Communication-Efficient Federated Fine-Tuning of Language Models | FDA-Opt: Kommunikationseffizientes Federated Fine-Tuning von Sprachmodellen | FFDA-Opt: 交流-高效联邦语言模型精密使用 2505.04535v2 |
Authors (3): Michail Theologitis, Vasilis Samoladas, Antonios Deligiannakis
Federated Learning (FL) enables the utilization of vast, previously inaccessible data sources. At the same time, pre-trained Language Models (LMs) have taken the world by storm and for good reason. They exhibit remarkable emergent abilities and are readily adapted to downstream tasks. This opens one of the most exciting frontiers in FL: fine-tuning LMs. Yet, a persistent challenge in FL is the frequent, rigid communication of parameters – a problem magnified by the sheer size of these contemporary models. The FedOpt family of algorithms has become the go-to approach for FL, relying on fixed but arbitrary intervals for model exchanges. Recently, the FDA algorithm prescribed a dynamic approach by monitoring the training progress. However, it introduced a hard-to-calibrate parameter and imposed a rigid synchronization scheme. In this work, we address these limitations by proposing the FDA-Opt family of algorithms – a unified generalization of both FDA and FedOpt. Our experimental evaluation focuses on fine-tuning LMs on downstream NLP tasks and demonstrates that FDA-Opt outperforms FedOpt even when it is configured with hyper-parameters specifically optimized for the latter. In other words, we show that FDA-Opt is a practical, drop-in replacement for FedOpt in modern FL libraries and systems: it requires no additional configuration and delivers superior performance out of the box.
nan
Article 440
Title@2025-06-24 (2): The Shape of Consumer Behavior: A Symbolic and Topological Analysis of Time Series
Title: The Shape of Consumer Behavior: A Symbolic and Topological Analysis of Time Series | Die Form des Konsumverhaltens: Eine symbolische und topologische Analyse der Zeitreihen | 《消费者行为形态:时间序列的象征和地形分析》 2506.19759v1 |
Authors (2): Pola Bereta, Ioannis Diamantis
Understanding temporal patterns in online search behavior is crucial for real-time marketing and trend forecasting. Google Trends offers a rich proxy for public interest, yet the high dimensionality and noise of its time-series data present challenges for effective clustering. This study evaluates three unsupervised clustering approaches, Symbolic Aggregate approXimation (SAX), enhanced SAX (eSAX), and Topological Data Analysis (TDA), applied to 20 Google Trends keywords representing major consumer categories. Our results show that while SAX and eSAX offer fast and interpretable clustering for stable time series, they struggle with volatility and complexity, often producing ambiguous "catch-all" clusters. TDA, by contrast, captures global structural features through persistent homology and achieves more balanced and meaningful groupings. We conclude with practical guidance for using symbolic and topological methods in consumer analytics and suggest that hybrid approaches combining both perspectives hold strong potential for future applications.
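For readers unfamiliar with SAX, the following is a minimal sketch of the standard procedure (z-normalisation, Piecewise Aggregate Approximation, then discretisation with equiprobable Gaussian breakpoints). The function name `sax`, its parameters, and the four-letter alphabet are illustrative assumptions, not the study's implementation, and the sketch assumes `len(series) >= n_segments`.

```python
from statistics import NormalDist, mean, pstdev

def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (illustrative sketch)."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each near-equal-length segment
    n = len(z)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints that split the standard normal into equiprobable regions
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # Map each segment mean to the letter of the region it falls in
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)

word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
print(word)
```

Series that map to the same or similar words can then be grouped, which is what makes the representation fast and interpretable but also coarse for volatile series.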
nan
Article 441
Title@2025-06-24 (2): Cross-regularization: Adaptive Model Complexity through Validation Gradients
Title: Cross-regularization: Adaptive Model Complexity through Validation Gradients | Cross-Regulierung: Adaptive Modellkomplexität durch Validierungsgradienten | 交叉正规化:通过验证梯度使适应性模型复杂度 2506.19755v1 |
Authors (1): Carlos Stein Brito
Model regularization requires extensive manual tuning to balance complexity against overfitting. Cross-regularization resolves this tradeoff by directly adapting regularization parameters through validation gradients during training. The method splits parameter optimization - training data guides feature learning while validation data shapes complexity controls - converging provably to cross-validation optima. When implemented through noise injection in neural networks, this approach reveals striking patterns: unexpectedly high noise tolerance and architecture-specific regularization that emerges organically during training. Beyond complexity control, the framework integrates seamlessly with data augmentation, uncertainty calibration and growing datasets while maintaining single-run efficiency through a simple gradient-based approach.
nan
Article 442
Title@2025-06-24 (2): A Robust Twin Parametric Margin Support Vector Machine for Multiclass Classification
Title: A Robust Twin Parametric Margin Support Vector Machine for Multiclass Classification | Eine robuste Twin-Parametrische Margin-Unterstützungs-Vektormaschine für die Multiclass-Klassifikation | 多级分类的强力双双参数边距支持矢量机 2306.06213v3 |
Authors (3): Renato De Leone, Francesca Maggioni, Andrea Spinelli
In this paper, we introduce novel Twin Parametric Margin Support Vector Machine (TPMSVM) models designed to address multiclass classification tasks under feature uncertainty. To handle data perturbations, we construct a bounded-by-norm uncertainty set around each training observation and derive the robust counterparts of the deterministic models using robust optimization techniques. To capture complex data structure, we explore both linear and kernel-induced classifiers, providing computationally tractable reformulations of the resulting robust models. Additionally, we propose two alternatives for the final decision function, enhancing the models’ flexibility. Finally, we validate the effectiveness of the proposed robust multiclass TPMSVM methodology on real-world datasets, showing the good performance of the approach in the presence of uncertainty.
nan
Article 443
Title@2025-06-24 (2): On the necessity of adaptive regularisation: Optimal anytime online learning on $\boldsymbol{\ell_p}$-balls
Title: On the necessity of adaptive regularisation: Optimal anytime online learning on $\boldsymbol{\ell_p}$-balls | Über die Notwendigkeit einer adaptiven Regularisierung: Optimales Online-Lernen jederzeit auf $\boldsymbol{\ell_p}$-Bällen | 关于适应性规范化的必要性: 最佳时间在网上学习 $\boldsymbol{\ell_p}$-balls 2506.19752v1 |
Authors (4): Emmeran Johnson, David Martínez-Rubio, Ciara Pike-Burke, Patrick Rebeschini
We study online convex optimization on $\ell_p$-balls in $\mathbb{R}^d$ for $p > 2$. While always sub-linear, the optimal regret exhibits a shift between the high-dimensional setting ($d > T$), in which the dimension $d$ is greater than the time horizon $T$, and the low-dimensional setting ($d \leq T$). We show that Follow-the-Regularised-Leader (FTRL) with time-varying regularisation that is adaptive to the dimension regime is anytime optimal for all dimension regimes. Motivated by this, we ask whether it is possible to obtain anytime optimality of FTRL with fixed non-adaptive regularisation. Our main result establishes that for separable regularisers, adaptivity in the regulariser is necessary, and that any fixed regulariser will be sub-optimal in one of the two dimension regimes. Finally, we provide lower bounds which rule out sub-linear regret bounds for the linear bandit problem in sufficiently high dimension for all $\ell_p$-balls with $p \geq 1$.
nan
Article 444
Title@2025-06-24 (2): Continuous Bayesian Model Selection for Multivariate Causal Discovery
Title: Continuous Bayesian Model Selection for Multivariate Causal Discovery | Kontinuierliche bayesische Modellauswahl für multivariate Kausalentdeckung | 多变因果发现连续的巴伊西亚模型选择 2411.10154v2 |
Authors (5): Anish Dhir, Ruby Sedgwick, Avinash Kori, Ben Glocker, Mark van der Wilk
Current causal discovery approaches require restrictive model assumptions in the absence of interventional data to ensure structure identifiability. These assumptions often do not hold in real-world applications leading to a loss of guarantees and poor performance in practice. Recent work has shown that, in the bivariate case, Bayesian model selection can greatly improve performance by exchanging restrictive modelling for more flexible assumptions, at the cost of a small probability of making an error. Our work shows that this approach is useful in the important multivariate case as well. We propose a scalable algorithm leveraging a continuous relaxation of the discrete model selection problem. Specifically, we employ the Causal Gaussian Process Conditional Density Estimator (CGP-CDE) as a Bayesian non-parametric model, using its hyperparameters to construct an adjacency matrix. This matrix is then optimised using the marginal likelihood and an acyclicity regulariser, giving the maximum a posteriori causal graph. We demonstrate the competitiveness of our approach, showing it is advantageous to perform multivariate causal discovery without infeasible assumptions using Bayesian model selection.
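The abstract does not say which acyclicity regulariser is used. One widely used choice in continuous structure learning, shown here purely as an assumed example, is the trace-exponential penalty introduced with NOTEARS, for a weighted adjacency matrix $A \in \mathbb{R}^{d \times d}$:

$$
h(A) = \operatorname{tr}\left(e^{A \circ A}\right) - d
$$

Here $\circ$ is the elementwise product and $e^{(\cdot)}$ the matrix exponential. $h(A) = 0$ exactly when the graph encoded by $A$ has no directed cycles and $h(A) > 0$ otherwise, so adding a multiple of $h(A)$ to a differentiable objective steers the optimisation toward acyclic graphs.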
nan
Article 445
Title@2025-06-24 (2): DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
Title: DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization | DecDEC: Ein Systemansatz zur Steigerung der LLM-Quantisierung mit niedrigem Bit | DecDEC: 推进低比低级LLM量化的系统方法 2412.20185v2 |
Authors (4): Yeonhong Park, Jake Hyun, Hojoon Kim, Jae W. Lee
Quantization of Large Language Models (LLMs) has recently gained popularity, particularly for on-device settings with limited hardware resources. While efficient, quantization inevitably degrades model quality, especially in aggressive low-bit settings such as 3-bit and 4-bit precision. In this paper, we propose DecDEC, an inference scheme that improves the quality of low-bit LLMs while preserving the key benefits of quantization: GPU memory savings and latency reduction. DecDEC stores the residual matrix – the difference between full-precision and quantized weights – in CPU memory, and dynamically fetches the residuals for only a small portion of the weights. This portion corresponds to the salient channels, marked by activation outliers, with the fetched residuals helping to correct quantization errors in these channels. Salient channels are identified dynamically at each decoding step by analyzing the input activations – this enables adaptation to the dynamic nature of activation distribution, thus maximizing the effectiveness of error compensation. We demonstrate the effectiveness of DecDEC by augmenting state-of-the-art quantization methods. For example, DecDEC reduces the perplexity of a 3-bit Llama-3-8B-Instruct model from 10.15 to 9.12 – outperforming its 3.5-bit counterpart – while adding less than 0.0003% to GPU memory usage and incurring only a 1.7% inference slowdown on NVIDIA RTX 4050 Mobile.
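The compensation idea can be illustrated with a toy matrix-vector product. This is a sketch under stated assumptions, not the DecDEC implementation: the uniform quantizer, the exact top-k selection by activation magnitude, and all names (`quantize`, `compensated_matvec`, `k`) are hypothetical stand-ins, and plain Python lists replace GPU and CPU buffers.

```python
def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def quantize(W, step=0.5):
    # Toy uniform quantizer standing in for a real low-bit scheme (assumption)
    return [[round(w / step) * step for w in row] for row in W]

def compensated_matvec(Wq, R, x, k):
    """Compute Wq @ x plus residual corrections for the k largest-|x| input channels."""
    salient = sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:k]
    y = matvec(Wq, x)
    for i, row in enumerate(R):
        # Only the residual columns of the salient channels are "fetched"
        y[i] += sum(row[j] * x[j] for j in salient)
    return y

W = [[0.9, -0.3, 0.2], [0.1, 0.7, -0.6]]   # full-precision weights
Wq = quantize(W)                            # quantized weights
R = [[w - q for w, q in zip(rw, rq)] for rw, rq in zip(W, Wq)]  # residual matrix
x = [4.0, 0.1, -0.2]                        # channel 0 is an activation outlier

exact = matvec(W, x)
for k in (0, 1, 3):
    approx = compensated_matvec(Wq, R, x, k)
    print(k, max(abs(a - e) for a, e in zip(approx, exact)))
```

Because the quantization error in the output scales with the activation magnitude of each channel, correcting only the outlier channel already removes most of the error in this toy case, and correcting all channels recovers the full-precision result.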
nan
Article 446
Title@2025-06-24 (2): Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls
Title: Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls | Noise Consistency Training: Ein nativer Ansatz für One-Step-Generator beim Lernen zusätzlicher Steuerungen | 噪音一致性培训:在学习额外控制措施方面对单步发电机采取土著办法 2506.19741v1 |
Authors (4): Yihong Luo, Shuchen Xue, Tianyang Hu, Jing Tang
The pursuit of efficient and controllable high-quality content generation remains a central challenge in artificial intelligence-generated content (AIGC). While one-step generators, enabled by diffusion distillation techniques, offer excellent generation quality and computational efficiency, adapting them to new control conditions (such as structural constraints, semantic guidelines, or external inputs) poses a significant challenge. Conventional approaches often necessitate computationally expensive modifications to the base model and subsequent diffusion distillation. This paper introduces Noise Consistency Training (NCT), a novel and lightweight approach to directly integrate new control signals into pre-trained one-step generators without requiring access to original training images or retraining the base diffusion model. NCT operates by introducing an adapter module and employs a noise consistency loss in the noise space of the generator. This loss aligns the adapted model’s generation behavior across noises that are conditionally dependent to varying degrees, implicitly guiding it to adhere to the new control. Theoretically, this training objective can be understood as minimizing the distributional distance between the adapted generator and the conditional distribution induced by the new conditions. NCT is modular, data-efficient, and easily deployable, relying only on the pre-trained one-step generator and a control signal model. Extensive experiments demonstrate that NCT achieves state-of-the-art controllable generation in a single forward pass, surpassing existing multi-step and distillation-based methods in both generation quality and computational efficiency. Code is available at https://github.com/Luo-Yihong/NCT
nan
Article 447
Title@2025-06-24 (2): Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery
Title: Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery | Q2SAR: Ein Quantum-Multiple-Kernel-Lernansatz für die Drogenentdeckung | Q2SAR:药物发现量子多核心学习方法 2506.14920v2 |
Authors (5): Alejandro Giraldo, Daniel Ruiz, Mariano Caruso, Javier Mancilla, Guido Bellomo
Quantitative Structure-Activity Relationship (QSAR) modeling is a cornerstone of computational drug discovery. This research demonstrates the successful application of a Quantum Multiple Kernel Learning (QMKL) framework to enhance QSAR classification, showing a notable performance improvement over classical methods. We apply this methodology to a dataset for identifying DYRK1A kinase inhibitors. The workflow involves converting SMILES representations into numerical molecular descriptors, reducing dimensionality via Principal Component Analysis (PCA), and employing a Support Vector Machine (SVM) trained on an optimized combination of multiple quantum and classical kernels. By benchmarking the QMKL-SVM against a classical Gradient Boosting model, we show that the quantum-enhanced approach achieves a superior AUC score, highlighting its potential to provide a quantum advantage in challenging cheminformatics classification tasks.
nan
Article 448
Title@2025-06-24 (2): Unscrambling disease progression at scale: fast inference of event permutations with optimal transport
Title: Unscrambling disease progression at scale: fast inference of event permutations with optimal transport | Verkrampfte Krankheitsprogression im Maßstab: schnelle Schlussfolgerung von Ereignispermutationen mit optimalem Transport | 分解疾病大规模演变:以最佳运输方式快速推断事件变异 2410.14388v3 |
Authors (2): Peter A. Wijeratne, Daniel C. Alexander
Disease progression models infer group-level temporal trajectories of change in patients’ features as a chronic degenerative condition plays out. They provide unique insight into disease biology and staging systems with individual-level clinical utility. Discrete models consider disease progression as a latent permutation of events, where each event corresponds to a feature becoming measurably abnormal. However, permutation inference using traditional maximum likelihood approaches becomes prohibitive due to combinatoric explosion, severely limiting model dimensionality and utility. Here we leverage ideas from optimal transport to model disease progression as a latent permutation matrix of events belonging to the Birkhoff polytope, facilitating fast inference via optimisation of the variational lower bound. This enables a factor of 1000 times faster inference than the current state of the art and, correspondingly, supports models with several orders of magnitude more features than the current state of the art can consider. Experiments demonstrate the increase in speed, accuracy and robustness to noise in simulation. Further experiments with real-world imaging data from two separate datasets, one from Alzheimer’s disease patients, the other age-related macular degeneration, showcase, for the first time, pixel-level disease progression events in the brain and eye, respectively. Our method is low compute, interpretable and applicable to any progressive condition and data modality, giving it broad potential clinical utility.
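For reference, the Birkhoff polytope mentioned above is the set of doubly stochastic matrices. With $n$ denoting the number of events (notation assumed here), it is

$$
\mathcal{B}_n = \left\{ P \in \mathbb{R}^{n \times n} : P_{ij} \ge 0, \ \sum_{i=1}^{n} P_{ij} = 1 \ \forall j, \ \sum_{j=1}^{n} P_{ij} = 1 \ \forall i \right\}
$$

By the Birkhoff-von Neumann theorem its vertices are exactly the $n \times n$ permutation matrices, which is why it serves as a continuous relaxation of the discrete space of event orderings.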
nan
Article 449
Title@2025-06-24 (2): DRIFT: Data Reduction via Informative Feature Transformation- Generalization Begins Before Deep Learning starts
Title: DRIFT: Data Reduction via Informative Feature Transformation- Generalization Begins Before Deep Learning starts | DRIFT: Datenreduktion durch Informative Feature Transformation- Verallgemeinerung beginnt bevor Deep Learning startet | DRIFT: 在深学习开始前通过信息特征转换普遍化开始减少数据 2506.19734v1 |
Authors (1): Ben Keslaki
Modern deep learning architectures excel at optimization, but only after the data has entered the network. The true bottleneck lies in preparing the right input: minimal, salient, and structured in a way that reflects the essential patterns of the data. We propose DRIFT (Data Reduction via Informative Feature Transformation), a novel preprocessing technique inspired by vibrational analysis in physical systems, to identify and extract the most resonant modes of input data prior to training. Unlike traditional models that attempt to learn amidst both signal and noise, DRIFT mimics physics perception by emphasizing informative features while discarding irrelevant elements. The result is a more compact and interpretable representation that enhances training stability and generalization performance. In DRIFT, images are projected onto a low-dimensional basis formed by spatial vibration mode shapes of plates, offering a physically grounded feature set. This enables neural networks to operate with drastically fewer input dimensions (about 50 features on MNIST and fewer than 100 on CIFAR100) while achieving competitive classification accuracy. Extensive experiments across MNIST and CIFAR100 demonstrate DRIFT’s superiority over standard pixel-based models and PCA in terms of training stability, resistance to overfitting, and generalization robustness. Notably, DRIFT displays minimal sensitivity to changes in batch size, network architecture, and image resolution, further establishing it as a resilient and efficient data representation strategy. This work shifts the focus from architecture engineering to input curation and underscores the power of physics-driven data transformations in advancing deep learning performance.
nan
Article 450
Title@2025-06-24 (2): Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units
Title: Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units | Wer macht was im Deep Learning? Multidimensionale Game-Theoretische Zuordnung der Funktion neuraler Einheiten | 谁在深层学习中做什么? 神经单位功能的多层面游戏理论归属 2506.19732v1 |
Authors (6): Shrey Dixit, Kayson Fakhar, Fatemeh Hadaeghi, Patrick Mineault, Konrad P. Kording, Claus C. Hilgetag
Neural networks now generate text, images, and speech with billions of parameters, producing a need to know how each neural unit contributes to these high-dimensional outputs. Existing explainable-AI methods, such as SHAP, attribute importance to inputs, but cannot quantify the contributions of neural units across thousands of output pixels, tokens, or logits. Here we close that gap with Multiperturbation Shapley-value Analysis (MSA), a model-agnostic game-theoretic framework. By systematically lesioning combinations of units, MSA yields Shapley Modes, unit-wise contribution maps that share the exact dimensionality of the model’s output. We apply MSA across scales, from multi-layer perceptrons to the 56-billion-parameter Mixtral-8x7B and Generative Adversarial Networks (GAN). The approach demonstrates how regularisation concentrates computation in a few hubs, exposes language-specific experts inside the LLM, and reveals an inverted pixel-generation hierarchy in GANs. Together, these results showcase MSA as a powerful approach for interpreting, editing, and compressing deep neural networks.
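To make the idea of output-shaped Shapley contributions concrete, here is a minimal sketch that lesions every combination of units in a toy model and computes exact Shapley values per output dimension. The function `shapley_modes`, the toy model, and its unit names are hypothetical and are not the MSA codebase. Exact enumeration is exponential in the number of units, so it is only feasible at toy sizes; larger models would need a sampling-based approximation.

```python
from itertools import combinations
from math import factorial

def shapley_modes(units, model):
    """Exact Shapley value of each unit for a vector-valued model(active_units)."""
    n = len(units)
    dim = len(model(frozenset()))
    modes = {u: [0.0] * dim for u in units}
    for u in units:
        others = [v for v in units if v != u]
        for r in range(n):
            # Shapley weight for coalitions of size r that exclude u
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                with_u, without_u = model(s | {u}), model(s)
                for d in range(dim):
                    modes[u][d] += weight * (with_u[d] - without_u[d])
    return modes

def toy_model(active):
    # Hypothetical 3-unit "network" with a 2-dimensional output;
    # units missing from `active` are treated as lesioned
    out = [0.0, 0.0]
    if "a" in active:
        out[0] += 1.0
    if "b" in active:
        out[1] += 2.0
    if "a" in active and "c" in active:
        out[0] += 0.5  # interaction between units a and c
    return out

modes = shapley_modes(["a", "b", "c"], toy_model)
for unit, contribution in modes.items():
    print(unit, contribution)
```

Each unit receives a contribution vector with the same dimensionality as the model output, and the contributions sum to the difference between the intact and fully lesioned outputs.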
nan
Article 451
Title@2025-06-24 (2): IgCONDA-PET: Weakly-Supervised PET Anomaly Detection using Implicitly-Guided Attention-Conditional Counterfactual Diffusion Modeling – a Multi-Center, Multi-Cancer, and Multi-Tracer Study
Title: IgCONDA-PET: Weakly-Supervised PET Anomaly Detection using Implicitly-Guided Attention-Conditional Counterfactual Diffusion Modeling – a Multi-Center, Multi-Cancer, and Multi-Tracer Study | IgCONDA-PET: Schwachüberwachte PET-Anomalie-Erkennung mittels implizit geführter Aufmerksamkeits-Bedingtheits-Kontrafaktual-Diffusionsmodellierung – eine Multi-Center-, Multi-Cancer- und Multi-Tracer-Studie | IgCONDA-PET:使用隐性引导的注意-有条件反扩散模型 – – 多中心、多癌症和多跟踪研究 – – 多中心、多癌症和多跟踪研究 – – 进行微弱超弱PET异常探测 2405.00239v3 |
Authors (2): Shadab Ahamed, Arman Rahmim
Minimizing the need for pixel-level annotated data to train PET lesion detection and segmentation networks is highly desired and can be transformative, given time and cost constraints associated with expert annotations. Current unsupervised or weakly-supervised anomaly detection methods rely on autoencoder or generative adversarial networks (GANs) trained only on healthy data. While these approaches reduce annotation dependency, GAN-based methods are notably more challenging to train than non-GAN alternatives (such as autoencoders) due to issues such as the simultaneous optimization of two competing networks, mode collapse, and training instability. In this paper, we present the weakly-supervised $\textbf{I}$mplicitly-$\textbf{g}$uided $\textbf{CO}$u$\textbf{N}$terfactual diffusion model for $\textbf{D}$etecting $\textbf{A}$nomalies in $\textbf{PET}$ images (IgCONDA-PET). The solution is developed and validated using PET scans from six retrospective cohorts consisting of a total of 2652 cases (multi-cancer, multi-tracer) containing both local and public datasets (spanning multiple centers). The training is conditioned on image class labels (healthy vs. unhealthy) via attention modules, and we employ implicit diffusion guidance. We perform counterfactual generation which facilitates “unhealthy-to-healthy” domain translation by generating a synthetic, healthy version of an unhealthy input image, enabling the detection of anomalies through the calculated differences. The performance of our method was compared against several other deep learning based weakly-supervised or unsupervised methods as well as traditional methods like 41% SUV$_\text{max}$ thresholding. We also highlight the importance of incorporating attention modules in our network for the detection of small anomalies. The code is publicly available at: https://github.com/ahxmeds/IgCONDA-PET.git.
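The traditional baseline mentioned above, 41% SUV$_\text{max}$ thresholding, is simple enough to sketch. The function name and the toy 2D grid below are illustrative assumptions; in practice the threshold is usually applied within a region of interest on 3D volumes rather than over a whole image.

```python
def suv_max_threshold(suv, fraction=0.41):
    """Mark voxels whose SUV is at least `fraction` of the maximum SUV."""
    peak = max(max(row) for row in suv)
    cutoff = fraction * peak
    return [[value >= cutoff for value in row] for row in suv]

# Toy 2x2 SUV map (hypothetical values)
suv_map = [[1.0, 2.0],
           [10.0, 4.5]]
mask = suv_max_threshold(suv_map)
print(mask)
```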
nan
Article 452
Title@2025-06-24 (2): Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
Title: Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving | Lokale Look-Ahead-Anleitung über Verifier-in-the-Loop für automatisierte Theorem-Proving | 通过自动理论验证人在线验证人进行自动理论验证,指导当地目视中心 2503.09730v2 |
Authors (4): Sara Rajaee, Kumar Pratik, Gabriele Cesa, Arash Behboodi
The most promising recent methods for AI reasoning require applying variants of reinforcement learning (RL) either to rolled-out trajectories from the LLMs, even for step-wise rewards, or to large quantities of human-annotated trajectory data. The reliance on rolled-out trajectories renders the compute cost and time prohibitively high. In particular, the correctness of a reasoning trajectory can typically only be judged at its completion, leading to sparse rewards in RL or requiring expensive synthetic data generation in expert iteration-like methods. In this work, we focus on the Automatic Theorem Proving (ATP) task and propose a novel verifier-in-the-loop design, which, unlike existing approaches that leverage feedback on the entire reasoning trajectory, employs an automated verifier to give intermediate feedback at each step of the reasoning process. Using Lean as the verifier, we empirically show that the step-by-step local verification produces a global improvement in the model’s reasoning accuracy and efficiency.
nan
Article 453
Title@2025-06-24 (2): Geometric-Aware Variational Inference: Robust and Adaptive Regularization with Directional Weight Uncertainty
Title: Geometric-Aware Variational Inference: Robust and Adaptive Regularization with Directional Weight Uncertainty | Geometrisch-Bewusst Variationelle Schlussfolgerung: Robuste und adaptive Regularisierung mit Richtungsgewichtsunsicherheit | 几何-软件变化推断:强力和适应性规范化,具有方向性重量不确定性 2506.19726v1 |
Authors (1): Carlos Stein Brito
Deep neural networks require principled uncertainty quantification, yet existing variational inference methods often employ isotropic Gaussian approximations in weight space that poorly match the network’s inherent geometry. We address this mismatch by introducing Concentration-Adapted Perturbations (CAP), a variational framework that models weight uncertainties directly on the unit hypersphere using von Mises-Fisher distributions. Building on recent work in radial-directional posterior decompositions and spherical weight constraints, CAP provides the first complete theoretical framework connecting directional statistics to practical noise regularization in neural networks. Our key contribution is an analytical derivation linking vMF concentration parameters to activation noise variance, enabling each layer to learn its optimal uncertainty level through a novel closed-form KL divergence regularizer. In experiments on CIFAR-10, CAP significantly improves model calibration - reducing Expected Calibration Error by 5.6x - while providing interpretable layer-wise uncertainty profiles. CAP requires minimal computational overhead and integrates seamlessly into standard architectures, offering a theoretically grounded yet practical approach to uncertainty quantification in deep learning.
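For reference, the von Mises-Fisher density on the unit hypersphere in $\mathbb{R}^p$ has the standard form below, where $\boldsymbol{\mu}$ is the unit-norm mean direction, $\kappa \ge 0$ is the concentration, and $I_\nu$ is the modified Bessel function of the first kind. The symbols are the conventional ones and may differ from the paper's notation.

$$
f(\mathbf{x}; \boldsymbol{\mu}, \kappa) = C_p(\kappa) \exp\left(\kappa\, \boldsymbol{\mu}^\top \mathbf{x}\right), \qquad C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)}
$$

Larger $\kappa$ concentrates mass around $\boldsymbol{\mu}$, and $\kappa \to 0$ approaches the uniform distribution on the sphere, so a per-layer $\kappa$ acts as an inverse measure of directional uncertainty.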
Article 454
Title@2025-06-24 (2): Identifying Unknown Stochastic Dynamics via Finite expression methods
Title: Identifying Unknown Stochastic Dynamics via Finite expression methods | Unbekannte Stochastische Dynamik über Finite-Expression-Methoden identifizieren | 通过 Finite 表达式方法识别未知的斯托卡动态 2504.07085v3 |
Authors (3): Senwei Liang, Chunmei Wang, Xingjian Xu
Modeling stochastic differential equations (SDEs) is crucial for understanding complex dynamical systems in various scientific fields. Recent methods often employ neural network-based models, which typically represent SDEs through a combination of deterministic and stochastic terms. However, these models usually lack interpretability and have difficulty generalizing beyond their training domain. This paper introduces the Finite Expression Method (FEX), a symbolic learning approach designed to derive interpretable mathematical representations of the deterministic component of SDEs. For the stochastic component, we integrate FEX with advanced generative modeling techniques to provide a comprehensive representation of SDEs. The numerical experiments on linear, nonlinear, and multidimensional SDEs demonstrate that FEX generalizes well beyond the training domain and delivers more accurate long-term predictions compared to neural network-based methods. The symbolic expressions identified by FEX not only improve prediction accuracy but also offer valuable scientific insights into the underlying dynamics of the systems, paving the way for new scientific discoveries.
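The split into a deterministic and a stochastic component can be illustrated with a plain Euler-Maruyama simulation of dX = drift(X) dt + diffusion(X) dW. This is a generic sketch, not FEX itself: the drift and diffusion functions below are hypothetical stand-ins for whatever symbolic expression and noise model a method would identify.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, n_steps, rng):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment with variance dt
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


# Hypothetical example: mean-reverting drift -2x with constant noise level 0.3.
rng = random.Random(0)
path = euler_maruyama(lambda x: -2.0 * x, lambda x: 0.3, x0=1.0, dt=0.01, n_steps=500, rng=rng)
```

Because the drift is an explicit expression rather than a network, it can be evaluated at states far from the training data, which is the generalization property the abstract emphasizes.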
Article 455
Title@2025-06-24 (2): Conservative quantum offline model-based optimization
Title: Conservative quantum offline model-based optimization | Konservative Quanten-Offline-Modell-basierte Optimierung | 保守性量子离线离线模型优化 2506.19714v1 |
Authors (5): Kristian Sotirov, Annie E. Paine, Savvas Varsamopoulos, Antonio A. Gentile, Osvaldo Simeone
Offline model-based optimization (MBO) refers to the task of optimizing a black-box objective function using only a fixed set of prior input-output data, without any active experimentation. Recent work has introduced quantum extremal learning (QEL), which leverages the expressive power of variational quantum circuits to learn accurate surrogate functions by training on a few data points. However, as widely studied in the classical machine learning literature, predictive models may incorrectly extrapolate objective values in unexplored regions, leading to the selection of overly optimistic solutions. In this paper, we propose integrating QEL with conservative objective models (COM) - a regularization technique aimed at ensuring cautious predictions on out-of-distribution inputs. The resulting hybrid algorithm, COM-QEL, builds on the expressive power of quantum neural networks while safeguarding generalization via conservative modeling. Empirical results on benchmark optimization tasks demonstrate that COM-QEL reliably finds solutions with higher true objective values compared to the original QEL, validating its superiority for offline design problems.
Article 456
Title@2025-06-24 (2): Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
Title: Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales | Anleitung im Frequenzbereich ermöglicht High-Fidelity-Sampling bei niedrigen CFG-Skalen | CFG 低CFG 尺度高频域允许高频度抽样的指南 2506.19713v1 |
Authors (4): Seyedmorteza Sadat, Tobias Vontobel, Farnood Salehi, Romann M. Weber
Classifier-free guidance (CFG) has become an essential component of modern conditional diffusion models. Although highly effective in practice, the underlying mechanisms by which CFG enhances quality, detail, and prompt alignment are not fully understood. We present a novel perspective on CFG by analyzing its effects in the frequency domain, showing that low and high frequencies have distinct impacts on generation quality. Specifically, low-frequency guidance governs global structure and condition alignment, while high-frequency guidance mainly enhances visual fidelity. However, applying a uniform scale across all frequencies – as is done in standard CFG – leads to oversaturation and reduced diversity at high scales and degraded visual quality at low scales. Based on these insights, we propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low- and high-frequency components and applies separate guidance strengths to each component. FDG improves image quality at low guidance scales and avoids the drawbacks of high CFG scales by design. Through extensive experiments across multiple datasets and models, we demonstrate that FDG consistently enhances sample fidelity while preserving diversity, leading to improved FID and recall compared to CFG, establishing our method as a plug-and-play alternative to standard classifier-free guidance.
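The idea of separate guidance strengths per frequency band can be sketched on a 1-D signal. The abstract does not say which frequency decomposition FDG uses, so the moving-average low-pass filter, the function names, and the toy inputs below are all assumptions chosen for illustration.

```python
def low_pass(signal, radius=1):
    """Moving-average low-pass filter with edge clamping (an illustrative choice)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def cfg(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def frequency_decoupled_guidance(cond, uncond, w_low, w_high):
    """Apply CFG separately to the low- and high-frequency parts, then recombine."""
    cond_low, uncond_low = low_pass(cond), low_pass(uncond)
    cond_high = [c - l for c, l in zip(cond, cond_low)]
    uncond_high = [u - l for u, l in zip(uncond, uncond_low)]
    guided_low = cfg(cond_low, uncond_low, w_low)
    guided_high = cfg(cond_high, uncond_high, w_high)
    return [l + h for l, h in zip(guided_low, guided_high)]


# Toy conditional and unconditional predictions (made-up numbers).
cond = [1.0, 2.0, 0.5, 3.0]
uncond = [0.5, 1.0, 1.5, 2.0]
guided = frequency_decoupled_guidance(cond, uncond, w_low=1.5, w_high=4.0)
```

With `w_low == w_high` the two bands recombine into ordinary CFG, so the decoupled form is a strict generalization in this sketch.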
Article 457
Title@2025-06-24 (2): Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks
Title: Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks | Lernen-unterstützte Bigraph Matching Ansatz zur Multi-Crew Wiederherstellung beschädigter Stromnetze mit Straßentransport-Netzwerke gekoppelt | 与公路运输网相结合的多组恢复受损电力网的学习辅助活书匹配方法 2506.19703v1 |
Authors (5): Nathan Maurer, Harshal Kaushik, Roshni Anna Jacob, Jie Zhang, Souma Chowdhury
The resilience of critical infrastructure networks (CINs) after disruptions, such as those caused by natural hazards, depends on both the speed of restoration and the extent to which operational functionality can be regained. Allocating resources for restoration is a combinatorial optimal planning problem that involves determining which crews will repair specific network nodes and in what order. This paper presents a novel graph-based formulation that merges two interconnected graphs, representing crew and transportation nodes and power grid nodes, into a single heterogeneous graph. To enable efficient planning, graph reinforcement learning (GRL) is integrated with bigraph matching. GRL is utilized to design the incentive function for assigning crews to repair tasks based on the graph-abstracted state of the environment, ensuring generalization across damage scenarios. Two learning techniques are employed: a graph neural network trained using Proximal Policy Optimization and another trained via Neuroevolution. The learned incentive functions inform a bipartite graph that links crews to repair tasks, enabling weighted maximum matching for crew-to-task allocations. An efficient simulation environment that pre-computes optimal node-to-node path plans is used to train the proposed restoration planning methods. An IEEE 8500-bus power distribution test network coupled with a 21 square km transportation network is used as the case study, with scenarios varying in terms of numbers of damaged nodes, depots, and crews. Results demonstrate the approach’s generalizability and scalability across scenarios, with learned policies providing 3-fold better performance than random policies, while also outperforming optimization-based solutions in both computation time (by several orders of magnitude) and power restored.
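The final allocation step, weighted maximum matching on a crew-to-task bipartite graph, can be sketched as follows. The incentive scores are made-up numbers standing in for the learned incentive function, and the brute-force search is only for exposition; at realistic scale a polynomial-time assignment algorithm such as the Hungarian method would be the usual choice.

```python
from itertools import permutations


def max_weight_matching(incentive):
    """Return the crew-to-task assignment with the highest total incentive.

    incentive[i][j] is the score for sending crew i to task j.
    Assumes there are at least as many tasks as crews.
    """
    n_crews, n_tasks = len(incentive), len(incentive[0])
    best_total, best_assignment = float("-inf"), None
    for tasks in permutations(range(n_tasks), n_crews):
        total = sum(incentive[crew][task] for crew, task in enumerate(tasks))
        if total > best_total:
            best_total, best_assignment = total, tasks
    return list(best_assignment), best_total


# Two crews, three damaged nodes; scores are hypothetical.
incentive = [
    [0.9, 0.2, 0.4],
    [0.8, 0.7, 0.1],
]
assignment, total = max_weight_matching(incentive)
```

Here both crews score highest on task 0, and the matching resolves the conflict by sending crew 1 to task 1 instead, which a greedy per-crew choice would not do.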
Article 458
Title@2025-06-24 (2): Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
Title: Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models | Ausreißer-sicheres Pre-Training für robuste 4-Bit Quantisierung großer Sprachmodelle | 大语言模式强力四比四比四的量化培训前培训 2506.19697v1 |
Authors (5): Jungwoo Park, Taewhoo Lee, Chanwoong Yoon, Hyeon Hwang, Jaewoo Kang
Extreme activation outliers in Large Language Models (LLMs) critically degrade quantization performance, hindering efficient on-device deployment. While channel-wise operations and adaptive gradient scaling are recognized causes, practical mitigation remains challenging. We introduce Outlier-Safe Pre-Training (OSP), a practical guideline that proactively prevents outlier formation rather than relying on post-hoc mitigation. OSP combines three key innovations: (1) the Muon optimizer, eliminating privileged bases while maintaining training efficiency; (2) Single-Scale RMSNorm, preventing channel-wise amplification; and (3) a learnable embedding projection, redistributing activation magnitudes originating from embedding matrices. We validate OSP by training a 1.4B-parameter model on 1 trillion tokens, which is the first production-scale LLM trained without such outliers. Under aggressive 4-bit quantization, our OSP model achieves a 35.7 average score across 10 benchmarks (compared to 26.5 for an Adam-trained model), with only a 2% training overhead. Remarkably, OSP models exhibit near-zero excess kurtosis (0.04) compared to extreme values (1818.56) in standard models, fundamentally altering LLM quantization behavior. Our work demonstrates that outliers are not inherent to LLMs but are consequences of training strategies, paving the way for more efficient LLM deployment. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Outlier-Safe-Pre-Training.
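Excess kurtosis, the statistic quoted above, measures how heavy-tailed a set of values is relative to a Gaussian (which scores 0). A minimal sketch using population moments follows; the exact estimator used in the paper is not stated in the abstract, so this form is an assumption.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (population moments, an assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


# Hypothetical activations: 99 small values and a single extreme outlier.
with_outlier = excess_kurtosis([0.0] * 99 + [100.0])
# Values spread evenly over two levels have no tails at all.
no_tails = excess_kurtosis([-1.0, 1.0] * 50)
```

A single extreme channel is enough to push the statistic far above zero, which is why it serves as a compact indicator of quantization-hostile outliers.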
Article 459
Title@2025-06-24 (2): Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks
Title: Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks | Nahezu optimale Schätzungen für die Konstanten $\ell^p$-Lipschitz von tiefen zufälligen ReLU-Neuralnetzwerken | 深随机RLU神经网络的 $\ ell\ p$- Lipschitz 常数近于最佳的估计值 2506.19695v1 |
Authors (5): Sjoerd Dirksen, Patrick Finke, Paul Geuchen, Dominik Stöger, Felix Voigtlaender
This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn from symmetric distributions. We derive high-probability upper and lower bounds for wide networks that differ at most by a factor that is logarithmic in the network’s width and linear in its depth. In the special case of shallow networks, we obtain matching bounds. Remarkably, the behavior of the $\ell^p$-Lipschitz constant varies significantly between the regimes $p \in [1,2)$ and $p \in [2,\infty]$. For $p \in [2,\infty]$, the $\ell^p$-Lipschitz constant behaves similarly to $\Vert g \Vert_{p'}$, where $g \in \mathbb{R}^d$ is a $d$-dimensional standard Gaussian vector and $1/p + 1/p' = 1$. In contrast, for $p \in [1,2)$, the $\ell^p$-Lipschitz constant aligns more closely to $\Vert g \Vert_{2}$.
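For orientation, the quantity under study can be written out using the standard definition, which the abstract does not spell out. For a scalar-valued, piecewise-linear network, the Lipschitz constant with respect to $\Vert \cdot \Vert_p$ equals the largest dual norm of the gradient:

$$
\mathrm{Lip}_p(\Phi) \;=\; \sup_{x \neq y} \frac{|\Phi(x) - \Phi(y)|}{\Vert x - y \Vert_p} \;=\; \operatorname*{ess\,sup}_{x \in \mathbb{R}^d} \Vert \nabla \Phi(x) \Vert_{p'}, \qquad \frac{1}{p} + \frac{1}{p'} = 1 .
$$

As a rough guide to the two regimes, a standard Gaussian vector $g \in \mathbb{R}^d$ has $\Vert g \Vert_2 \approx \sqrt{d}$, $\mathbb{E}\Vert g \Vert_1 = d\sqrt{2/\pi}$, and $\Vert g \Vert_\infty \approx \sqrt{2 \ln d}$ for large $d$. For $p = \infty$ (so $p' = 1$) the stated comparison quantity therefore grows linearly in $d$, while for $p = 1$ (so $p' = \infty$) the result says the constant tracks $\Vert g \Vert_2 \approx \sqrt{d}$ rather than the much smaller $\Vert g \Vert_\infty$.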
Article 460
Title@2025-06-24 (2): ReBoot: Encrypted Training of Deep Neural Networks with CKKS Bootstrapping
Title: ReBoot: Encrypted Training of Deep Neural Networks with CKKS Bootstrapping | ReBoot: Verschlüsseltes Training von Deep Neural Networks mit CKKS Bootstrapping | ReBoot:使用 CKKS 启动系统加密深神经网络培训 2506.19693v1 |
Authors (2): Alberto Pirillo, Luca Colombo
Growing concerns over data privacy underscore the need for deep learning methods capable of processing sensitive information without compromising confidentiality. Among privacy-enhancing technologies, Homomorphic Encryption (HE) stands out by providing post-quantum cryptographic security and end-to-end data protection, safeguarding data even during computation. While Deep Neural Networks (DNNs) have gained attention in HE settings, their use has largely been restricted to encrypted inference. Prior research on encrypted training has primarily focused on logistic regression or has relied on multi-party computation to enable model fine-tuning. This stems from the substantial computational overhead and algorithmic complexity involved in training DNNs under HE. In this paper, we present ReBoot, the first framework to enable fully encrypted and non-interactive training of DNNs. Built upon the CKKS scheme, ReBoot introduces a novel HE-compliant neural network architecture based on local error signals, specifically designed to minimize multiplicative depth and reduce noise accumulation. ReBoot employs a tailored packing strategy that leverages real-number arithmetic via SIMD operations, significantly lowering both computational and memory overhead. Furthermore, by integrating approximate bootstrapping, the ReBoot learning algorithm supports effective training of arbitrarily deep multi-layer perceptrons, making it well-suited for machine-learning-as-a-service. ReBoot is evaluated on both image recognition and tabular benchmarks, achieving accuracy comparable to 32-bit floating-point plaintext training while enabling fully encrypted training. It improves test accuracy by up to +3.27% over encrypted logistic regression, and up to +6.83% over existing encrypted DNN frameworks, while reducing training latency by up to 8.83x. ReBoot is made available to the scientific community as a public repository.
Article 461
Title@2025-06-24 (2): Leveraging Lightweight Generators for Memory Efficient Continual Learning
Title: Leveraging Lightweight Generators for Memory Efficient Continual Learning | Leveraging Lightweight Generators für Speicher Effizientes kontinuierliches Lernen | 利用轻型发电机促进记忆高效持续学习 2506.19692v1 |
Authors (4): Christiaan Lamers, Ahmed Nabil Belbachir, Thomas Bäck, Niki van Stein
Catastrophic forgetting can be trivially alleviated by keeping all data from previous tasks in memory. Therefore, minimizing the memory footprint while maximizing the amount of relevant information is crucial to the challenge of continual learning. This paper aims to decrease the memory required by memory-based continual learning algorithms. We explore the options of extracting a minimal amount of information, while maximally alleviating forgetting. We propose the usage of lightweight generators based on Singular Value Decomposition to enhance existing continual learning methods, such as A-GEM and Experience Replay. These generators need a minimal amount of memory while being maximally effective. They require no training time, just a single linear-time fitting step, and can capture a distribution effectively from a small number of data samples. Depending on the dataset and network architecture, our results show a significant increase in average accuracy compared to the original methods. Our method shows great potential in minimizing the memory footprint of memory-based continual learning algorithms.
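One way an SVD-based generator could work is sketched below: fit a low-rank basis to a handful of stored samples, keep only the mean, the basis, and per-direction scales, and draw replay samples from them. The abstract only states that the generators are based on SVD, so the Gaussian coefficient sampling, the rank choice, and the function names are assumptions, not the authors' design.

```python
import numpy as np


def fit_svd_generator(samples, rank):
    """Fit a rank-limited linear generator to the rows of `samples`."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Thin SVD: centered = U @ diag(S) @ Vt; rows of Vt are principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Per-direction standard deviation of the projected data.
    scale = singular_values[:rank] / np.sqrt(max(len(samples) - 1, 1))
    return mean, vt[:rank], scale


def generate(mean, components, scale, n, rng):
    """Draw n synthetic samples; Gaussian coefficients are an assumption of this sketch."""
    coeffs = rng.standard_normal((n, len(scale))) * scale
    return mean + coeffs @ components


rng = np.random.default_rng(0)
samples = rng.standard_normal((20, 8))  # stand-in for stored task data
mean, components, scale = fit_svd_generator(samples, rank=3)
replay = generate(mean, components, scale, n=5, rng=rng)
```

The memory saving in this sketch comes from storing `rank * d + d + rank` numbers instead of every retained sample.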
Article 462
Title@2025-06-24 (2): AYLA: Amplifying Gradient Sensitivity via Loss Transformation in Non-Convex Optimization
Title: AYLA: Amplifying Gradient Sensitivity via Loss Transformation in Non-Convex Optimization | AYLA: Verstärkte Gradientenempfindlichkeit durch Verlusttransformation in nicht konvexer Optimierung | AYLA:通过非Convex优化的亏损转化增强渐进感敏度 2504.01875v2 |
Authors (1): Ben Keslaki
Stochastic Gradient Descent (SGD) and its variants, such as ADAM, are foundational to deep learning optimization, adjusting model parameters through fixed or adaptive learning rates based on loss function gradients. However, these methods often struggle to balance adaptability and efficiency in high-dimensional, non-convex settings. This paper introduces AYLA, a novel optimization framework that enhances training dynamics via loss function transformation. AYLA applies a tunable power-law transformation to the loss, preserving critical points while scaling loss values to amplify gradient sensitivity and accelerate convergence. Additionally, we propose an effective learning rate that dynamically adapts to the transformed loss, further improving optimization efficiency. Empirical evaluations on minimizing a synthetic non-convex polynomial, solving a non-convex curve-fitting task, and performing digit classification (MNIST) and image recognition (CIFAR-100) demonstrate that AYLA consistently outperforms SGD and ADAM in both convergence speed and training stability. By reshaping the loss landscape, AYLA provides a model-agnostic enhancement to existing optimization methods, offering a promising advancement in deep neural network training.
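To see why a power-law transformation can preserve critical points while rescaling gradients, consider the simplest such transform. The abstract does not give AYLA's exact form, so the choice $\tilde{L} = L^{N}$ with a positive loss and exponent $N > 0$ is an assumption made only for illustration:

$$
\tilde{L}(\theta) = L(\theta)^{N}, \qquad \nabla_\theta \tilde{L}(\theta) = N \, L(\theta)^{N-1} \, \nabla_\theta L(\theta), \qquad L(\theta) > 0 .
$$

The factor $N L^{N-1}$ is strictly positive, so $\nabla_\theta \tilde{L} = 0$ exactly when $\nabla_\theta L = 0$, and the gradient direction is unchanged. Only the magnitude is rescaled: with $N = 0.5$, a loss of $L = 4$ gives a factor of $0.5 \cdot 4^{-0.5} = 0.25$, while a loss of $L = 0.01$ gives $0.5 \cdot 0.01^{-0.5} = 5$, so gradients are amplified as the loss becomes small.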
Article 463
Title@2025-06-24 (2): When Can We Reuse a Calibration Set for Multiple Conformal Predictions?
Title: When Can We Reuse a Calibration Set for Multiple Conformal Predictions? | Wann können wir eine Kalibrierung für mehrere konforme Vorhersagen wiederverwenden? | 什么时候我们才能重新使用一个校准装置 来进行多常规的预测呢? 2506.19689v1 |
Authors (2): A. A. Balinsky, A. D. Balinsky
Reliable uncertainty quantification is crucial for the trustworthiness of machine learning applications. Inductive Conformal Prediction (ICP) offers a distribution-free framework for generating prediction sets or intervals with user-specified confidence. However, standard ICP guarantees are marginal and typically require a fresh calibration set for each new prediction to maintain their validity. This paper addresses this practical limitation by demonstrating how e-conformal prediction, in conjunction with Hoeffding’s inequality, can enable the repeated use of a single calibration set with a high probability of preserving the desired coverage. Through a case study on the CIFAR-10 dataset, we train a deep neural network and utilise a calibration set to estimate a Hoeffding correction. This correction allows us to apply a modified Markov’s inequality, leading to the construction of prediction sets with quantifiable confidence. Our results illustrate the feasibility of maintaining provable performance in conformal prediction while enhancing its practicality by reducing the need for repeated calibration. The code for this work is publicly available.
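For context, the baseline that the paper modifies, standard inductive (split) conformal prediction, can be sketched in a few lines. This is the textbook procedure with a marginal guarantee, not the e-conformal method with a Hoeffding correction; the nonconformity score `1 - p(label)` and the toy numbers are assumptions.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every label is included
    return sorted(calibration_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """Keep every label whose nonconformity score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(class_probs) if 1.0 - p <= threshold]


# Hypothetical calibration scores (1 - probability assigned to the true label).
scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
threshold = conformal_threshold(scores, alpha=0.5)
labels = prediction_set([0.7, 0.25, 0.05], threshold)
```

The coverage guarantee of this baseline is an average over the draw of the calibration set and one test point, which is why reusing one calibration set across many predictions needs the extra correction the paper studies.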
Article 464
Title@2025-06-24 (2): Model Guidance via Robust Feature Attribution
Title: Model Guidance via Robust Feature Attribution | Modellführung über robuste Eigenschaftszuweisung | 通过强力地物学示范指导 2506.19680v1 |
Authors (3): Mihnea Ghitu, Matthew Wicker, Vihari Piratla
Controlling the patterns a model learns is essential to preventing reliance on irrelevant or misleading features. Such reliance on irrelevant features, often called shortcut features, has been observed across domains, including medical imaging and natural language processing, where it may lead to real-world harms. A common mitigation strategy leverages annotations (provided by humans or machines) indicating which features are relevant or irrelevant. These annotations are compared to model explanations, typically in the form of feature salience, and used to guide the loss function during training. Unfortunately, recent works have demonstrated that feature salience methods are unreliable and therefore offer a poor signal to optimize. In this work, we propose a simplified objective that simultaneously optimizes for explanation robustness and mitigation of shortcut learning. Unlike prior objectives with similar aims, we demonstrate theoretically why our approach ought to be more effective. Across a comprehensive series of experiments, we show that our approach consistently reduces test-time misclassifications by 20% compared to state-of-the-art methods. We also extend prior experimental settings to include natural language processing tasks. Additionally, we conduct novel ablations that yield practical insights, including the relative importance of annotation quality over quantity. Code for our method and experiments is available at: https://github.com/Mihneaghitu/ModelGuidanceViaRobustFeatureAttribution.
Article 465
Title@2025-06-24 (2): Extreme Learning Machines for Exoplanet Simulations: A Faster, Lightweight Alternative to Deep Learning
Title: Extreme Learning Machines for Exoplanet Simulations: A Faster, Lightweight Alternative to Deep Learning | Extreme Lernmaschinen für Exoplanetensimulationen: Eine schnellere, leichte Alternative zum Deep Learning | 用于Explanet模拟的极端学习机器:一种更快、轻量比深层学习的替代方法 2506.19679v1 |
Authors (6): Tara P. A. Tahseen, Luís F. Simões, Kai Hou Yip, Nikolaos Nikolaou, João M. Mendonça, Ingo P. Waldmann
Increasing resolution and coverage of astrophysical and climate data necessitates increasingly sophisticated models, often pushing the limits of computational feasibility. While emulation methods can reduce calculation costs, the neural architectures typically used (optimised via gradient descent) are themselves computationally expensive to train, particularly in terms of data generation requirements. This paper investigates the utility of the Extreme Learning Machine (ELM) as a lightweight, non-gradient-based machine learning algorithm for accelerating complex physical models. We evaluate ELM surrogate models in two test cases with different data structures: (i) sequentially-structured data, and (ii) image-structured data. For test case (i), where the number of samples $N$ far exceeds the dimensionality of input data $d$ ($N \gg d$), ELMs achieve remarkable efficiency, offering a 100,000$\times$ faster training time and a 40$\times$ faster prediction speed compared to a Bi-Directional Recurrent Neural Network (BIRNN), whilst improving upon BIRNN test performance. For test case (ii), characterised by $d \gg N$ and image-based inputs, a single ELM was insufficient, but an ensemble of 50 individual ELM predictors achieves comparable accuracy to a benchmark Convolutional Neural Network (CNN), with a 16.4$\times$ reduction in training time, though costing a 6.9$\times$ increase in prediction time. We find different sample efficiency characteristics between the test cases: in test case (i) individual ELMs demonstrate superior sample efficiency, requiring only 0.28% of the training dataset compared to the benchmark BIRNN, while in test case (ii) the ensemble approach requires 78% of the data used by the CNN to achieve comparable results, representing a trade-off between sample efficiency and model complexity.
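The non-gradient training that makes ELMs cheap can be shown with a generic sketch: hidden weights are drawn at random and frozen, and only the output weights are obtained from a single least-squares solve. The tanh activation, ridge term, hidden width, and toy regression target are assumptions and are unrelated to the paper's exoplanet data.

```python
import numpy as np


def fit_elm(x, y, n_hidden, rng, ridge=1e-6):
    """Fit an ELM: random fixed hidden layer, closed-form output weights."""
    w = rng.standard_normal((x.shape[1], n_hidden))  # never trained
    b = rng.standard_normal(n_hidden)                # never trained
    h = np.tanh(x @ w + b)
    # Ridge-regularised least squares for the output weights only.
    beta = np.linalg.solve(h.T @ h + ridge * np.eye(n_hidden), h.T @ y)
    return w, b, beta


def predict_elm(x, w, b, beta):
    return np.tanh(x @ w + b) @ beta


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * x[:, 0])  # toy target, N >> d as in test case (i)
w, b, beta = fit_elm(x, y, n_hidden=50, rng=rng)
mse = float(np.mean((predict_elm(x, w, b, beta) - y) ** 2))
```

Training cost is dominated by one linear solve in the hidden width, with no epochs or backpropagation, which is the source of the large speed-ups reported for the sample-rich case.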
Article 466
Title@2025-06-24 (2): Higher-Order Graph Databases
Title: Higher-Order Graph Databases | Graphendatenbanken mit höherer Ordnung | 高层图表数据库 2506.19661v1 |
Authors (9): Maciej Besta, Shriram Chandran, Jakub Cudak, Patrick Iff, Marcin Copik, Robert Gerstenberger, Tomasz Szydlo, Jürgen Müller, Torsten Hoefler
Recent advances in graph databases (GDBs) have been driving interest in large-scale analytics, yet current systems fail to support higher-order (HO) interactions beyond first-order (one-hop) relations, which are crucial for tasks such as subgraph counting, polyadic modeling, and HO graph learning. We address this by introducing a new class of systems, higher-order graph databases (HO-GDBs), that use lifting and lowering paradigms to seamlessly extend traditional GDBs with HO. We provide a theoretical analysis of OLTP and OLAP queries, ensuring correctness, scalability, and ACID compliance. We implement a lightweight, modular, and parallelizable HO-GDB prototype that offers native support for hypergraphs, node-tuples, subgraphs, and other HO structures under a unified API. The prototype scales to large HO OLTP and OLAP workloads and shows how HO improves analytical tasks, for example enhancing accuracy of graph neural networks within a GDB by 44%. Our work ensures low latency and high query throughput, and generalizes both ACID-compliant and eventually consistent systems.
Article 467
Title@2025-06-24 (2): PEVLM: Parallel Encoding for Vision-Language Models
Title: PEVLM: Parallel Encoding for Vision-Language Models | PEVLM: Parallele Kodierung für Vision-Language-Modelle | PEVLM: 视觉语言模型平行编码 2506.19651v1 |
Authors (6): Letian Kang, Shixian Luo, Yiqiang Li, Xiaoyang Yu, Shenxuan Zhou, Yong Wu
Vision-Language Models (VLMs) have demonstrated strong performance in video-language tasks, yet their application to long video understanding remains constrained by the quadratic complexity of standard attention mechanisms. In this paper, we propose PEVLM, a parallel encoding strategy specifically designed to improve the prefill efficiency of VLMs without requiring model finetuning. PEVLM partitions the input into block-wise segments with a shared sink, preserves full-attention positional embeddings, and aligns attention weights to mimic full-attention distributions. This design reduces attention computation from $O((T \times N)^2)$ to $O(T \times N)$ while maintaining high accuracy. Extensive experiments on the LongVideoBench benchmark show that PEVLM achieves up to 8.37% accuracy improvement over existing inference-efficient methods and delivers up to 7.47x speedup in attention computation and 40% reduction in end-to-end latency. Under strict latency constraints, PEVLM significantly outperforms baselines, raising accuracy from 23.26% to 61.03%. These results highlight PEVLM’s effectiveness for low-latency, long-context video understanding, making it well-suited for real-world applications such as autonomous driving.
Article 468
Title@2025-06-24 (2): Tensor-Parallelism with Partially Synchronized Activations
Title: Tensor-Parallelism with Partially Synchronized Activations | Tensor-Parallelismus mit teilweise synchronisierten Aktivierungen | 具有部分同步激活作用的长者-长者-长者主义 2506.19645v1 |
Authors (5): Itay Lamprecht, Asaf Karnieli, Yair Hanani, Niv Giladi, Daniel Soudry
Training and inference of Large Language Models (LLMs) with tensor-parallelism requires substantial communication to synchronize activations. Our findings suggest that with a few minor adjustments to current practices, LLMs can be trained without fully synchronizing activations, reducing bandwidth demands. We name this “Communication-Aware Architecture for Tensor-parallelism” (CAAT-Net). We train 1B and 7B parameter CAAT-Net models, with a 50% reduction in tensor-parallel communication and no significant drop in pretraining accuracy. Furthermore, we demonstrate how CAAT-Net accelerates both training and inference workloads.
Article 469
Title@2025-06-24 (2): Unsupervised Data Generation for Offline Reinforcement Learning: A Perspective from Model
Title: Unsupervised Data Generation for Offline Reinforcement Learning: A Perspective from Model | Unüberwachte Datengenerierung für Offline-Verstärkung Lernen: Eine Perspektive vom Modell | 未受监督的离线强化学习数据生成:模式的视角 2506.19643v1 |
Authors (5): Shuncheng He, Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Xiangyang Ji
Offline reinforcement learning (RL) has recently attracted growing interest from RL researchers. However, the performance of offline RL suffers from the out-of-distribution problem, which in online RL can be corrected by feedback. Previous offline RL research focuses on restricting the offline algorithm to in-distribution or even in-sample action sampling. In contrast, less work has paid attention to the influence of the batch data. In this paper, we first build a theoretical bridge between the batch data and the performance of offline RL algorithms, from the perspective of model-based offline RL optimization. We conclude that, under mild assumptions, the distance between the state-action pair distribution generated by the behavioural policy and the distribution generated by the optimal policy accounts for the performance gap between the policy learned by model-based offline RL and the optimal policy. Secondly, we reveal that in task-agnostic settings, a series of policies trained by unsupervised RL can minimize the worst-case regret in the performance gap. Inspired by the theoretical conclusions, UDG (Unsupervised Data Generation) is devised to generate data and select proper data for offline training under task-agnostic settings. Empirical results demonstrate that UDG can outperform supervised data generation on solving unknown tasks.
Article 470
Title@2025-06-24 (2): Hierarchical Time Series Forecasting Via Latent Mean Encoding
Title: Hierarchical Time Series Forecasting Via Latent Mean Encoding | Hierarchische Zeitreihen über latente mittlere Kodierung prognostizieren | 预测Via 隐中平均值编码的等级时间序列 2506.19633v1 |
Authors (3): Alessandro Salatiello, Stefan Birr, Manuel Kunz
Coherently forecasting the behaviour of a target variable across both coarse and fine temporal scales is crucial for profit-optimized decision-making in several business applications, and remains an open research problem in temporal hierarchical forecasting. Here, we propose a new hierarchical architecture that tackles this problem by leveraging modules that specialize in forecasting the different temporal aggregation levels of interest. The architecture, which learns to encode the average behaviour of the target variable within its hidden layers, makes accurate and coherent forecasts across the target temporal hierarchies. We validate our architecture on the challenging, real-world M5 dataset and show that it outperforms established methods, such as the TSMixer model.
Article 471
Title@2025-06-24 (2): Why Uncertainty Calibration Matters for Reliable Perturbation-based Explanations
Title: Why Uncertainty Calibration Matters for Reliable Perturbation-based Explanations | Warum Ungewissheitskalibrierung zählt für zuverlässige Perturbation-basierte Erklärungen | 以可靠干扰为基础的解释的不确定性校准为何重要 2506.19630v1 |
Authors (3): Thomas Decker, Volker Tresp, Florian Buettner
Perturbation-based explanations are widely utilized to enhance the transparency of modern machine-learning models. However, their reliability is often compromised by the unknown model behavior under the specific perturbations used. This paper investigates the relationship between uncertainty calibration - the alignment of model confidence with actual accuracy - and perturbation-based explanations. We show that models frequently produce unreliable probability estimates when subjected to explainability-specific perturbations and theoretically prove that this directly undermines explanation quality. To address this, we introduce ReCalX, a novel approach to recalibrate models for improved perturbation-based explanations while preserving their original predictions. Experiments on popular computer vision models demonstrate that our calibration strategy produces explanations that are more aligned with human perception and actual object locations.
Article 472
Title@2025-06-24 (2): Operator Forces For Coarse-Grained Molecular Dynamics
Title: Operator Forces For Coarse-Grained Molecular Dynamics | Bedienerkräfte für geradlinige molekulare Dynamiken | 粗粗粒分子动态操作员力量 2506.19628v1 |
Authors (5): Leon Klein, Atharva Kelkar, Aleksander Durumeric, Yaoyi Chen, Frank Noé
Coarse-grained (CG) molecular dynamics simulations extend the length and time scale of atomistic simulations by replacing groups of correlated atoms with CG beads. Machine-learned coarse-graining (MLCG) has recently emerged as a promising approach to construct highly accurate force fields for CG molecular dynamics. However, the calibration of MLCG force fields typically hinges on force matching, which demands extensive reference atomistic trajectories with corresponding force labels. In practice, atomistic forces are often not recorded, making traditional force matching infeasible on pre-existing datasets. Recently, noise-based kernels have been introduced to adapt force matching to the low-data regime, including situations in which reference atomistic forces are not present. While this approach produces force fields which recapitulate slow collective motion, it introduces significant local distortions due to the corrupting effects of the noise-based kernel. In this work, we introduce more general kernels based on normalizing flows that substantially reduce these local distortions while preserving global conformational accuracy. We demonstrate our method on small proteins, showing that flow-based kernels can generate high-quality CG forces solely from configurational samples.
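The dependence of force matching on recorded atomistic forces is visible in the usual form of its objective. The notation below is an assumption introduced for illustration and is not taken from the abstract: $r_i$ are atomistic configurations, $f(r_i)$ the atomistic forces, $\Xi$ the map from atoms to CG beads, $\Xi_F$ the corresponding force map, and $U_\theta$ the learned CG potential.

$$
\mathcal{L}(\theta) \;=\; \frac{1}{M} \sum_{i=1}^{M} \left\Vert \, \Xi_F f(r_i) + \nabla_R U_\theta(\Xi r_i) \, \right\Vert_2^2 .
$$

Every term in the sum needs the force label $f(r_i)$, so a dataset that stores only configurations cannot be used directly. Kernel-based variants replace the missing label with a quantity computed from the configurations alone, which is where the choice of kernel, noise-based or flow-based, enters.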
Article 473
Title@2025-06-24 (2): Scaling Up Unbiased Search-based Symbolic Regression
Title: Scaling Up Unbiased Search-based Symbolic Regression | Skalierung unvoreingenommen Suchbasierte Symbolische Regression | 增强无偏向的反向( U) 2506.19626v1 |
Authors (4): Paul Kahlmeyer, Joachim Giesen, Michael Habeck, Henrik Voigt
In a regression task, a function is learned from labeled data to predict the labels at new data points. The goal is to achieve small prediction errors. In symbolic regression, the goal is more ambitious, namely, to learn an interpretable function that makes small prediction errors. This additional goal largely rules out the standard approach used in regression, that is, reducing the learning problem to learning parameters of an expansion of basis functions by optimization. Instead, symbolic regression methods search for a good solution in a space of symbolic expressions. To cope with the typically vast search space, most symbolic regression methods make implicit, or sometimes even explicit, assumptions about its structure. Here, we argue that the only obvious structure of the search space is that it contains small expressions, that is, expressions that can be decomposed into a few subexpressions. We show that systematically searching spaces of small expressions finds solutions that are more accurate and more robust against noise than those obtained by state-of-the-art symbolic regression methods. In particular, systematic search outperforms state-of-the-art symbolic regressors in terms of its ability to recover the true underlying symbolic expressions on established benchmark data sets.
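The contrast between parameter fitting and searching a space of small expressions can be illustrated with a toy exhaustive search. This is only a sketch of the general idea, not the authors' algorithm: the operator set, the depth limit, and the tie-break on expression length are assumptions.

```python
import itertools
import math

UNARY = {"sin": math.sin, "cos": math.cos}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def enumerate_expressions(max_depth):
    """Return (text, function) pairs for all expressions built in max_depth rounds."""
    seen = [("x", lambda x: x), ("1", lambda x: 1.0)]
    for _ in range(max_depth):
        new = []
        for name, op in UNARY.items():
            for text, f in seen:
                new.append((f"{name}({text})", lambda x, op=op, f=f: op(f(x))))
        for name, op in BINARY.items():
            # Both operators are commutative, so unordered pairs suffice.
            for (ta, fa), (tb, fb) in itertools.combinations_with_replacement(seen, 2):
                new.append((f"({ta} {name} {tb})",
                            lambda x, op=op, fa=fa, fb=fb: op(fa(x), fb(x))))
        seen = seen + new
    return seen


def search(xs, ys, max_depth=2):
    """Pick the expression with the lowest MSE, preferring shorter text on ties."""
    best = None
    for text, f in enumerate_expressions(max_depth):
        mse = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        key = (round(mse, 12), len(text))
        if best is None or key < best[0]:
            best = (key, text)
    return best[1], best[0][0]


xs = [0.1 * i for i in range(-20, 21)]
ys = [x * math.sin(x) for x in xs]  # hidden ground truth for the demo
expr, mse = search(xs, ys)
```

Even this tiny grammar yields roughly two hundred candidates after two rounds, which hints at why scaling systematic search to realistic expression sizes is the hard part.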
nan
Article 474
Title@2025-06-24 (2): Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges
Title: Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges | Multimodales maschinelles Lernen in der psychischen Gesundheit: Eine Erhebung von Daten, Algorithmen und Herausforderungen | 心理健康中多式机器学习:数据调查、判断力和挑战 2407.16804v2 |
Authors (3): Zahraa Al Sahili, Ioannis Patras, Matthew Purver
Multimodal machine learning (MML) is rapidly reshaping the way mental-health disorders are detected, characterized, and longitudinally monitored. Whereas early studies relied on isolated data streams – such as speech, text, or wearable signals – recent research has converged on architectures that integrate heterogeneous modalities to capture the rich, complex signatures of psychiatric conditions. This survey provides the first comprehensive, clinically grounded synthesis of MML for mental health. We (i) catalog 26 public datasets spanning audio, visual, physiological signals, and text modalities; (ii) systematically compare transformer, graph, and hybrid-based fusion strategies across 28 models, highlighting trends in representation learning and cross-modal alignment. Beyond summarizing current capabilities, we interrogate open challenges: data governance and privacy, demographic and intersectional fairness, evaluation explainability, and the complexity of mental health disorders in multimodal settings. By bridging methodological innovation with psychiatric utility, this survey aims to orient both ML researchers and mental-health practitioners toward the next generation of trustworthy, multimodal decision-support systems.
nan
Article 475
Title@2025-06-24 (2): Contactless Cardiac Pulse Monitoring Using Event Cameras
Title: Contactless Cardiac Pulse Monitoring Using Event Cameras | Kontaktlose Herz Pulsüberwachung mit Ereigniskameras | 使用事件相机进行无触碰心心心脏病脉动监测 2505.09529v2 |
Authors (3): Mohamed Moustafa, Joseph Lemley, Peter Corcoran
Time event cameras are a novel technology for recording scene information at extremely low latency and with low power consumption. Event cameras output a stream of events that encapsulate pixel-level light intensity changes within the scene, capturing information with a higher dynamic range and temporal resolution than traditional cameras. This study investigates the contact-free reconstruction of an individual’s cardiac pulse signal from time event recording of their face using a supervised convolutional neural network (CNN) model. An end-to-end model is trained to extract the cardiac signal from a two-dimensional representation of the event stream, with model performance evaluated based on the accuracy of the calculated heart rate. The experimental results confirm that physiological cardiac information in the facial region is effectively preserved within the event stream, showcasing the potential of this novel sensor for remote heart rate monitoring. The model trained on event frames achieves a root mean square error (RMSE) of 3.32 beats per minute (bpm) compared to the RMSE of 2.92 bpm achieved by the baseline model trained on standard camera frames. Furthermore, models trained on event frames generated at 60 and 120 FPS outperformed the 30 FPS standard camera results, achieving an RMSE of 2.54 and 2.13 bpm, respectively.
nan
Article 476
Title@2025-06-24 (2): ECG-SMART-NET: A Deep Learning Architecture for Precise ECG Diagnosis of Occlusion Myocardial Infarction
Title: ECG-SMART-NET: A Deep Learning Architecture for Precise ECG Diagnosis of Occlusion Myocardial Infarction | EKG-SMART-NET: Eine Deep-Learning-Architektur für präzise EKG-Diagnose des Okklusionsmyokardinfarkts | ECG-SMART-NET: 精密ECG心肌梗塞症诊断的深学习结构 2405.09567v2 |
Authors (13): Nathan T. Riek, Murat Akcakaya, Zeineb Bouzid, Tanmay Gokhale, Stephanie Helman, Karina Kraevsky-Philips, Rui Qi Ji, Ervin Sejdic, Jessica K. Zègre-Hemsey, Christian Martin-Gill, Clifton W. Callaway, Samir Saba, Salah Al-Zaiti
Objective: In this paper we develop and evaluate ECG-SMART-NET for occlusion myocardial infarction (OMI) identification. OMI is a severe form of heart attack characterized by complete blockage of one or more coronary arteries requiring immediate referral for cardiac catheterization to restore blood flow to the heart. Two thirds of OMI cases are difficult to visually identify from a 12-lead electrocardiogram (ECG) and can be potentially fatal if not identified quickly. Previous works on this topic are scarce, and current state-of-the-art evidence suggests both feature-based random forests and convolutional neural networks (CNNs) are promising approaches to improve ECG detection of OMI. Methods: While the ResNet architecture has been adapted for use with ECG recordings, it is not ideally suited to capture informative temporal features within each lead and the spatial concordance or discordance across leads. We propose a clinically informed modification of the ResNet-18 architecture. The model first learns temporal features through temporal convolutional layers with 1xk kernels followed by a spatial convolutional layer, after the residual blocks, with 12x1 kernels to learn spatial features. Results: ECG-SMART-NET was benchmarked against the original ResNet-18 and other state-of-the-art models on a multisite real-world clinical dataset that consists of 10,393 ECGs from 7,397 unique patients (rate of OMI = 7.2%). ECG-SMART-NET outperformed other models in the classification of OMI with a test AUC of 0.953 [0.921, 0.978]. Conclusion and Significance: ECG-SMART-NET can outperform the state-of-the-art random forest for OMI prediction and is better suited for this task than the original ResNet-18 architecture.
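The effect of the two kernel shapes can be seen in a small pure-Python sketch, assuming a 12-lead recording stored as 12 lists of samples and no padding. It shows only how a 1xk kernel acts within each lead and how a 12x1 kernel mixes leads at each time step; it is not the ECG-SMART-NET architecture, and the function names are hypothetical.

```python
def temporal_conv(ecg, kernel):
    """Slide one 1xk kernel along time, separately within each lead (no padding).

    Like most deep-learning 'convolutions', this is a cross-correlation:
    the kernel is not flipped.
    """
    k = len(kernel)
    return [
        [sum(lead[t + j] * kernel[j] for j in range(k)) for t in range(len(lead) - k + 1)]
        for lead in ecg
    ]


def spatial_conv(features, lead_weights):
    """Apply one 12x1 kernel: a weighted sum across all leads at each time step."""
    n_steps = len(features[0])
    return [
        sum(w * lead[t] for w, lead in zip(lead_weights, features))
        for t in range(n_steps)
    ]


# Toy input: 12 leads, 10 samples each.
ecg = [[float(i + t) for t in range(10)] for i in range(12)]
temporal = temporal_conv(ecg, [1.0, 0.0, -1.0])   # shape 12 x 8
spatial = spatial_conv(temporal, [1.0 / 12] * 12)  # shape 1 x 8
```

The temporal step never mixes leads and the spatial step never mixes time steps, which is the separation the abstract describes.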
nan
Article 477
Title@2025-06-24 (2): A text-to-tabular approach to generate synthetic patient data using LLMs
Title: A text-to-tabular approach to generate synthetic patient data using LLMs | Ein text-to-tabuläres Konzept zur Generierung synthetischer Patientendaten mit Hilfe von LLMs | 使用LLMM 生成合成病人数据的一种文本到表格的方法 2412.05153v2 |
Authors (6): Margaux Tornqvist, Jean-Daniel Zucker, Tristan Fauvel, Nicolas Lambert, Mathilde Berthelot, Antoine Movschin
Access to large-scale high-quality healthcare databases is key to accelerating medical research and making insightful discoveries about diseases. However, access to such data is often limited by patient privacy concerns, data sharing restrictions and high costs. To overcome these limitations, synthetic patient data has emerged as an alternative. However, synthetic data generation (SDG) methods typically rely on machine learning (ML) models trained on original data, leading back to the data scarcity problem. We propose an approach to generate synthetic tabular patient data that does not require access to the original data, but only a description of the desired database. We leverage prior medical knowledge and in-context learning capabilities of large language models (LLMs) to generate realistic patient data, even in a low-resource setting. We quantitatively evaluate our approach against state-of-the-art SDG models, using fidelity, privacy, and utility metrics. Our results show that while LLMs may not match the performance of state-of-the-art models trained on the original data, they effectively generate realistic patient data with well-preserved clinical correlations. An ablation study highlights key elements of our prompt contributing to high-quality synthetic patient data generation. This approach, which is easy to use and does not require original data or advanced ML skills, is particularly valuable for quickly generating custom-designed patient data, supporting project implementation and providing educational resources.
nan
Article 478
Title@2025-06-24 (2): Beyond Static Models: Hypernetworks for Adaptive and Generalizable Forecasting in Complex Parametric Dynamical Systems
Title: Beyond Static Models: Hypernetworks for Adaptive and Generalizable Forecasting in Complex Parametric Dynamical Systems | Jenseits statischer Modelle: Hypernetworks für adaptive und generalisierbare Vorhersagen in komplexen parametrischen dynamischen Systemen | 超越静态模型:复杂参数动态系统适应性和可通用预报超网络 2506.19609v1 |
Authors (3): Pantelis R. Vlachas, Konstantinos Vlachas, Eleni Chatzi
Dynamical systems play a key role in modeling, forecasting, and decision-making across a wide range of scientific domains. However, variations in system parameters, also referred to as parametric variability, can lead to drastically different model behavior and output, posing challenges for constructing models that generalize across parameter regimes. In this work, we introduce the Parametric Hypernetwork for Learning Interpolated Networks (PHLieNet), a framework that simultaneously learns: (a) a global mapping from the parameter space to a nonlinear embedding and (b) a mapping from the inferred embedding to the weights of a dynamics propagation network. The learned embedding serves as a latent representation that modulates a base network, termed the hypernetwork, enabling it to generate the weights of a target network responsible for forecasting the system’s state evolution conditioned on the previous time history. By interpolating in the space of models rather than observations, PHLieNet facilitates smooth transitions across parameterized system behaviors, enabling a unified model that captures the dynamic behavior across a broad range of system parameterizations. The performance of the proposed technique is validated in a series of dynamical systems with respect to its ability to extrapolate in time and interpolate and extrapolate in the parameter space, i.e., generalize to dynamics that were unseen during training. In all cases, our approach outperforms or matches state-of-the-art baselines in both short-term forecast accuracy and in capturing long-term dynamical features, such as attractor statistics.
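The two learned mappings, (a) parameter to embedding and (b) embedding to target-network weights, can be sketched in a few lines. Everything here is an illustrative assumption: the scalar parameter, the `tanh` embedding, the linear hypernetwork, the linear target network, and the fixed example weights are not taken from PHLieNet, and no training is shown.

```python
import math


def embed(param, embed_weights):
    """(a) Map a scalar system parameter to a nonlinear embedding."""
    return [math.tanh(w * param + b) for w, b in embed_weights]


def generate_weights(embedding, hyper_matrix):
    """(b) Hypernetwork: map the embedding to the weights of the target network."""
    return [sum(h * e for h, e in zip(row, embedding)) for row in hyper_matrix]


def forecast(history, target_weights):
    """Target network (linear here): predict the next state from the most recent states."""
    window = history[-len(target_weights):]
    return sum(w * s for w, s in zip(target_weights, window))


# Hypothetical fixed weights standing in for learned ones.
embed_weights = [(1.0, 0.0), (0.5, 1.0)]
hyper_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights_a = generate_weights(embed(0.0, embed_weights), hyper_matrix)
weights_b = generate_weights(embed(1.0, embed_weights), hyper_matrix)
next_state = forecast([1.0, 2.0, 3.0, 4.0], weights_a)
```

Each parameter value yields its own forecasting weights, so moving through parameter space moves through a family of models rather than through observations.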
nan
Article 479
Title@2025-06-24 (2): ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP
Title: ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP | ChordPrompt: Orchestrierung von Cross-Modal Prompt Synergy für Multi-Domain Incremental Learning in CLIP | ChordPrompt:CLIP中多领域递增学习的交织式跨模式即时同步协同 2506.19608v1 |
Authors (2): Zhiyuan Wang, Bokui Chen
Continual learning (CL) empowers pre-trained vision-language models to adapt effectively to novel or previously underrepresented data distributions without comprehensive retraining, enhancing their adaptability and efficiency. While vision-language models like CLIP show great promise, they struggle to maintain performance across domains in incremental learning scenarios. Existing prompt learning methods face two main limitations: 1) they primarily focus on class-incremental learning scenarios, lacking specific strategies for multi-domain task incremental learning; 2) most current approaches employ single-modal prompts, neglecting the potential benefits of cross-modal information exchange. To address these challenges, we propose the ChordPrompt framework, which facilitates a harmonious interplay between visual and textual prompts. ChordPrompt introduces cross-modal prompts to leverage interactions between visual and textual information. Our approach also employs domain-adaptive text prompts to select appropriate prompts for continual adaptation across multiple domains. Comprehensive experiments on multi-domain incremental learning benchmarks demonstrate that ChordPrompt outperforms state-of-the-art methods in zero-shot generalization and downstream task performance.
nan
Article 480
Title@2025-06-24 (2): Constructive Universal Approximation and Finite Sample Memorization by Narrow Deep ReLU Networks
Title: Constructive Universal Approximation and Finite Sample Memorization by Narrow Deep ReLU Networks | Konstruktive Universal-Annäherung und Finite Sample-Memorisation durch Narrow Deep ReLU Networks | 由窄深深RELU网络进行 2409.06555v2 |
Authors (2): Martín Hernández, Enrique Zuazua
We present a fully constructive analysis of deep ReLU neural networks for classification and function approximation tasks. First, we prove that any dataset with $N$ distinct points in $\mathbb{R}^d$ and $M$ output classes can be exactly classified using a multilayer perceptron (MLP) of width $2$ and depth at most $2N + 4M - 1$, with all network parameters constructed explicitly. This result is sharp with respect to width and is interpreted through the lens of simultaneous or ensemble controllability in discrete nonlinear dynamics. Second, we show that these explicit constructions yield uniform bounds on the parameter norms and, in particular, provide upper estimates for minimizers of standard regularized training loss functionals in supervised learning. As the regularization parameter vanishes, the trained networks converge to exact classifiers with bounded norm, explaining the effectiveness of overparameterized training in the small-regularization regime. We also prove a universal approximation theorem in $L^p(\Omega; \mathbb{R}_+)$ for any bounded domain $\Omega \subset \mathbb{R}^d$ and $p \in [1, \infty)$, using MLPs of fixed width $d + 1$. The proof is constructive, geometrically motivated, and provides explicit estimates on the network depth when the target function belongs to the Sobolev space $W^{1,p}$. We also extend the approximation and depth estimation results to $L^p(\Omega; \mathbb{R}^m)$ for any $m \geq 1$. Our results offer a unified and interpretable framework connecting controllability, expressivity, and training dynamics in deep neural networks.
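As a quick worked instance of the stated bounds, the helpers below simply evaluate the two formulas from the abstract: depth at most $2N + 4M - 1$ at width 2 for exact classification, and fixed width $d + 1$ for approximation. The function names are hypothetical and nothing about the construction itself is implemented.

```python
def memorization_depth_bound(n_points, n_classes):
    """Upper bound 2N + 4M - 1 on the depth of a width-2 MLP that classifies
    N distinct points into M classes exactly, as stated in the abstract."""
    if n_points < 1 or n_classes < 1:
        raise ValueError("n_points and n_classes must be positive")
    return 2 * n_points + 4 * n_classes - 1


def approximation_width(input_dim):
    """Fixed width d + 1 used for universal approximation on a domain in R^d."""
    if input_dim < 1:
        raise ValueError("input_dim must be positive")
    return input_dim + 1


# Example: 100 points, 10 classes -> depth at most 2*100 + 4*10 - 1.
depth = memorization_depth_bound(100, 10)
```

The bound grows linearly in both the number of points and the number of classes, while the width stays constant.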
nan
Article 481
Title@2025-06-24 (2): Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases
Title: Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases | Diff-Def: Diffusionsgenerierte Deformationsfelder für Bedingte Atlase | Diff- Def: 用于条件图集的 Diff- Def: 用于条件图集的 Dif- 扩散- 驱动解析字段 2403.16776v3 |
Authors (6): Sophie Starck, Vasiliki Sideri-Lampretsa, Bernhard Kainz, Martin J. Menten, Tamara T. Mueller, Daniel Rueckert
Anatomical atlases are widely used for population studies and analysis. Conditional atlases target a specific sub-population defined via certain conditions, such as demographics or pathologies, and allow for the investigation of fine-grained anatomical differences like morphological changes associated with ageing or disease. Existing approaches use either registration-based methods that are often unable to handle large anatomical variations or generative adversarial models, which are challenging to train since they can suffer from training instabilities. Instead of generating atlases directly as intensities, we propose using latent diffusion models to generate deformation fields, which transform a general population atlas into one representing a specific sub-population. Our approach ensures structural integrity, enhances interpretability and avoids hallucinations that may arise during direct image synthesis by generating this deformation field and regularising it using a neighbourhood of images. We compare our method to several state-of-the-art atlas generation methods using brain MR images from the UK Biobank. Our method generates highly realistic atlases with smooth transformations and high anatomical fidelity, outperforming existing baselines. We demonstrate the quality of these atlases through comprehensive evaluations, including quantitative metrics for anatomical accuracy, perceptual similarity, and qualitative analyses displaying the consistency and realism of the generated atlases.
nan
Article 482
Title@2025-06-24 (2): Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra
Title: Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra | Training Flexible Modelle genetischer Variant-Effekte aus funktionellen Anmerkungen mit beschleunigter Linear Algebra | 使用加速线性线性代数对功能说明的遗传变异效应灵活模型的培训 2506.19598v1 |
Authors (3): Alan N. Amin, Andres Potapczynski, Andrew Gordon Wilson
To understand how genetic variants in human genomes manifest in phenotypes – traits like height or diseases like asthma – geneticists have sequenced and measured hundreds of thousands of individuals. Geneticists use this data to build models that predict how a genetic variant impacts phenotype given genomic features of the variant, like DNA accessibility or the presence of nearby DNA-bound proteins. As more data and features become available, one might expect predictive models to improve. Unfortunately, training these models is bottlenecked by the need to solve expensive linear algebra problems because variants in the genome are correlated with nearby variants, requiring inversion of large matrices. Previous methods have therefore been restricted to fitting small models, and fitting simplified summary statistics, rather than the full likelihood of the statistical model. In this paper, we leverage modern fast linear algebra techniques to develop DeepWAS (Deep genome Wide Association Studies), a method to train large and flexible neural network predictive models to optimize likelihood. Notably, we find that larger models only improve performance when using our full likelihood approach; when trained by fitting traditional summary statistics, larger models perform no better than small ones. We find larger models trained on more features make better predictions, potentially improving disease predictions and therapeutic target identification.
nan
Article 483
Title@2025-06-24 (2): Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications
Title: Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications | Vision Transformer-basierte Zeitreihen-Bildrekonstruktion für Cloud-Filling-Anwendungen | 为云层填云应用而重建基于时间-系列图像 2506.19591v1 |
Authors (3): Lujun Li, Yiqun Wang, Radu State
Cloud cover in multispectral imagery (MSI) poses significant challenges for early season crop mapping, as it leads to missing or corrupted spectral information. Synthetic aperture radar (SAR) data, which is not affected by cloud interference, offers a complementary solution, but lacks sufficient spectral detail for precise crop mapping. To address this, we propose a novel framework, Time-series MSI Image Reconstruction using Vision Transformer (ViT), to reconstruct MSI data in cloud-covered regions by leveraging the temporal coherence of MSI and the complementary information from SAR through the attention mechanism. Comprehensive experiments, using rigorous reconstruction evaluation metrics, demonstrate that the Time-series ViT framework significantly outperforms baselines that use non-time-series MSI and SAR or time-series MSI without SAR, effectively enhancing MSI image reconstruction in cloud-covered regions.
nan
Article 484
Title@2025-06-24 (2): ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks
Title: ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks | ConStellaration: Ein Datensatz von QI-ähnlichen Stellaratoren-Plasmagrenzen und Optimierungs-Benchmarks | 交配:一套类似 QI 星际等离子体边界和优化基准的数据集 2506.19583v1 |
Authors (11): Santiago A. Cadena, Andrea Merlo, Emanuel Laude, Alexander Bauer, Atul Agrawal, Maria Pascu, Marija Savtchouk, Enrico Guiraud, Lukas Bonauer, Stuart Hudson, Markus Kaiser
Stellarators are magnetic confinement devices under active development to deliver steady-state carbon-free fusion energy. Their design involves a high-dimensional, constrained optimization problem that requires expensive physics simulations and significant domain expertise. Recent advances in plasma physics and open-source tools have made stellarator optimization more accessible. However, broader community progress is currently bottlenecked by the lack of standardized optimization problems with strong baselines and datasets that enable data-driven approaches, particularly for quasi-isodynamic (QI) stellarator configurations, considered as a promising path to commercial fusion due to their inherent resilience to current-driven disruptions. Here, we release an open dataset of diverse QI-like stellarator plasma boundary shapes, paired with their ideal magnetohydrodynamic (MHD) equilibria and performance metrics. We generated this dataset by sampling a variety of QI fields and optimizing corresponding stellarator plasma boundaries. We introduce three optimization benchmarks of increasing complexity: (1) a single-objective geometric optimization problem, (2) a “simple-to-build” QI stellarator, and (3) a multi-objective ideal-MHD stable QI stellarator that investigates trade-offs between compactness and coil simplicity. For every benchmark, we provide reference code, evaluation scripts, and strong baselines based on classical optimization techniques. Finally, we show how learned models trained on our dataset can efficiently generate novel, feasible configurations without querying expensive physics oracles. By openly releasing the dataset along with benchmark problems and baselines, we aim to lower the entry barrier for optimization and machine learning researchers to engage in stellarator design and to accelerate cross-disciplinary progress toward bringing fusion energy to the grid.
nan
Article 485
Title@2025-06-24 (2): Realistic Image-to-Image Machine Unlearning via Decoupling and Knowledge Retention
Title: Realistic Image-to-Image Machine Unlearning via Decoupling and Knowledge Retention | Realistisches Bild-zu-Bild-Maschine-Entlernen durch Entkopplung und Wissensretention | 通过脱钩和知识保留消除学习 2502.04260v2 |
Authors (2): Ayush K. Varshney, Vicenç Torra
Machine Unlearning allows participants to remove their data from a trained machine learning model in order to preserve their privacy and security. However, the machine unlearning literature for generative models is rather limited. The literature for image-to-image generative models (I2I models) treats minimizing the distance between Gaussian noise and the output of the I2I model on forget samples as machine unlearning. However, we argue that a machine learning model performs fairly well on unseen data, i.e., a retrained model will be able to capture generic patterns in the data and hence will not generate an output equivalent to Gaussian noise. In this paper, we consider that the model after unlearning should treat forget samples as out-of-distribution (OOD) data, i.e., the unlearned model should no longer recognize or encode the specific patterns found in the forget samples. To achieve this, we propose a framework which decouples the model parameters with gradient ascent, ensuring with a theoretical guarantee that forget samples are OOD for the unlearned model. We also provide an $(\epsilon, \delta)$-unlearning guarantee for model updates with gradient ascent. The unlearned model is further fine-tuned on the remaining samples to maintain its performance. We also propose an attack model to verify that the unlearned model has effectively removed the influence of forget samples. Extensive empirical evaluation on two large-scale datasets, ImageNet-1K and Places365, highlights the superiority of our approach. To show comparable performance with a retrained model, we also present a comparison using a simple AutoEncoder across various baselines on the CIFAR-10 dataset.
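The two-stage recipe in the abstract, gradient ascent on forget samples followed by fine-tuning on the remaining samples, can be illustrated on a toy one-parameter regression. This is a sketch under strong assumptions: a scalar model `y = w * x`, squared error, and hand-picked step counts and learning rate. It does not implement the paper's parameter decoupling or its $(\epsilon, \delta)$ guarantee.

```python
def mse(w, samples):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)


def mse_grad(w, samples):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def unlearn(w, forget, retain, ascent_steps=5, finetune_steps=5, lr=0.05):
    """Gradient ascent on the forget set, then gradient descent on the retain set."""
    for _ in range(ascent_steps):
        w += lr * mse_grad(w, forget)   # ascent: increase loss on forget samples
    for _ in range(finetune_steps):
        w -= lr * mse_grad(w, retain)   # descent: restore performance on retained data
    return w


# Toy data: retained samples follow y = 2x, the forget sample follows y = 3x.
retain = [(1.0, 2.0), (2.0, 4.0)]
forget = [(1.0, 3.0)]
w_trained = 13.0 / 6.0  # least-squares fit on retain + forget together
w_unlearned = unlearn(w_trained, forget, retain)
```

In this toy setting the unlearned weight moves toward the slope supported by the retained data alone, so the forget-sample loss rises while the retained loss falls.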
nan
Article 486
Title@2025-06-24 (2): Fake or Real, Can Robots Tell? Evaluating Embodied Vision-Language Models on Real and 3D-Printed Objects
Title: Fake or Real, Can Robots Tell? Evaluating Embodied Vision-Language Models on Real and 3D-Printed Objects | Fake oder Real, Können Roboter erzählen? Evaluieren von körpereigenen Vision-Sprachenmodellen auf realen und 3D-gedruckten Objekten | 假的还是假的,机器人能告诉吗?评价关于真物和3D实用物的内嵌视觉语言模型 2506.19579v1 |
Authors (3): Federico Tavella, Kathryn Mearns, Angelo Cangelosi
Robotic scene understanding increasingly relies on vision-language models (VLMs) to generate natural language descriptions of the environment. In this work, we present a comparative study of captioning strategies for tabletop scenes captured by a robotic arm equipped with an RGB camera. The robot collects images of objects from multiple viewpoints, and we evaluate several models that generate scene descriptions. We compare the performance of various captioning models, like BLIP and VLMs. Our experiments examine the trade-offs between single-view and multi-view captioning, and the difference between recognising real-world and 3D-printed objects. We quantitatively evaluate object identification accuracy, completeness, and naturalness of the generated captions. Results show that VLMs can be used in robotic settings where common objects need to be recognised, but fail to generalise to novel representations. Our findings provide practical insights into deploying foundation models for embodied agents in real-world settings.
nan
Article 487
Title@2025-06-24 (2): FAF: A Feature-Adaptive Framework for Few-Shot Time Series Forecasting
Title: FAF: A Feature-Adaptive Framework for Few-Shot Time Series Forecasting | FAF: Ein Feature-Adaptive-Framework für die Vorhersage von Kurzzeitreihen | FAF: 低热时间序列预测的特征-适应性框架 2506.19567v1 |
Authors (6): Pengpeng Ouyang, Dong Chen, Tong Yang, Shuo Feng, Zhao Jin, Mingliang Xu
Multi-task and few-shot time series forecasting tasks are commonly encountered in scenarios such as the launch of new products in different cities. However, traditional time series forecasting methods suffer from insufficient historical data, which stems from a disregard for the generalized and specific features among different tasks. To address these challenges, we propose the Feature-Adaptive Time Series Forecasting Framework (FAF), which consists of three key components: the Generalized Knowledge Module (GKM), the Task-Specific Module (TSM), and the Rank Module (RM). During the training phase, the GKM is updated through a meta-learning mechanism that enables the model to extract generalized features across related tasks. Meanwhile, the TSM is trained to capture diverse local dynamics through multiple functional regions, each of which learns specific features from individual tasks. During the testing phase, the RM dynamically selects the most relevant functional region from the TSM based on input sequence features, which is then combined with the generalized knowledge learned by the GKM to generate accurate forecasts. This design enables FAF to achieve robust and personalized forecasting even with sparse historical observations. We evaluate FAF on five diverse real-world datasets under few-shot time series forecasting settings. Experimental results demonstrate that FAF consistently outperforms baselines that include three categories of time series forecasting methods. In particular, FAF achieves a 41.81% improvement over the best baseline, iTransformer, on the CO$_2$ emissions dataset.
nan
Article 488
Title@2025-06-24 (2): Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees
Title: Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees | Neudenken Neurale Kombinatorische Optimierung für Fahrzeugrouting-Probleme mit unterschiedlichen Engegraden | 重新思考具有不同紧紧度的车辆运行问题神经组合优化 2505.24627v2 |
Authors (4): Fu Luo, Yaoxin Wu, Zhi Zheng, Zhenkun Wang
Recent neural combinatorial optimization (NCO) methods have shown promising problem-solving ability without requiring domain-specific expertise. Most existing NCO methods use training and testing data with a fixed constraint value, and the effect of constraint tightness on their performance has received little study. This paper takes the capacity-constrained vehicle routing problem (CVRP) as an example to empirically analyze the NCO performance under different tightness degrees of the capacity constraint. Our analysis reveals that existing NCO methods overfit the capacity constraint, and they can only perform satisfactorily on a small range of the constraint values but poorly on other values. To tackle this drawback of existing NCO methods, we develop an efficient training scheme that explicitly considers varying degrees of constraint tightness and propose a multi-expert module to learn a generally adaptable solving strategy. Experimental results show that the proposed method can effectively overcome the overfitting issue, demonstrating superior performance on the CVRP and CVRP with time windows (CVRPTW) with various constraint tightness degrees.
nan
Article 489
Title@2025-06-24 (2): ConCM: Consistency-Driven Calibration and Matching for Few-Shot Class-Incremental Learning
Title: ConCM: Consistency-Driven Calibration and Matching for Few-Shot Class-Incremental Learning | ConCM: Konsistenzgetriebene Kalibrierung und Passung für das wenige-heiße Klassen-Inkrementelle Lernen | CCCM: 校准和校准低热级高级强化学习 2506.19558v1 |
Authors (6): QinZhe Wang, Zixuan Chen, Keke Huang, Xiu Su, Chunhua Yang, Chang Xu
Few-Shot Class-Incremental Learning (FSCIL) requires models to adapt to novel classes with limited supervision while preserving learned knowledge. Existing prospective learning-based space construction methods reserve space to accommodate novel classes. However, prototype deviation and structure fixity limit the expressiveness of the embedding space. In contrast to fixed space reservation, we explore the optimization of feature-structure dual consistency and propose a Consistency-driven Calibration and Matching Framework (ConCM) that systematically mitigates the knowledge conflict inherent in FSCIL. Specifically, inspired by hippocampal associative memory, we design a memory-aware prototype calibration that extracts generalized semantic attributes from base classes and reintegrates them into novel classes to enhance the conceptual center consistency of features. Further, we propose dynamic structure matching, which adaptively aligns the calibrated features to a session-specific optimal manifold space, ensuring cross-session structure consistency. Theoretical analysis shows that our method satisfies both geometric optimality and maximum matching, thereby overcoming the need for class-number priors. On large-scale FSCIL benchmarks including mini-ImageNet and CUB200, ConCM achieves state-of-the-art performance, surpassing the current best method by 3.20% and 3.68% in harmonic accuracy of incremental sessions.
nan
Article 490
Title@2025-06-24 (2): General Methods Make Great Domain-specific Foundation Models: A Case-study on Fetal Ultrasound
Title: General Methods Make Great Domain-specific Foundation Models: A Case-study on Fetal Ultrasound | General Methods Make Great Domain-spezifische Foundation Models: Eine Fallstudie über Fetal Ultrasound | 通用方法:胎儿超声波案例研究 2506.19552v1 |
Authors (9): Jakob Ambsdorf, Asbjørn Munk, Sebastian Llambias, Anders Nymark Christensen, Kamil Mikolaj, Randall Balestriero, Martin Tolsgaard, Aasa Feragen, Mads Nielsen
With access to large-scale, unlabeled medical datasets, researchers are confronted with two questions: Should they attempt to pretrain a custom foundation model on this medical data, or use transfer-learning from an existing generalist model? And, if a custom model is pretrained, are novel methods required? In this paper we explore these questions by conducting a case-study, in which we train a foundation model on a large regional fetal ultrasound dataset of 2M images. By selecting the well-established DINOv2 method for pretraining, we achieve state-of-the-art results on three fetal ultrasound datasets, covering data from different countries, classification, segmentation, and few-shot tasks. We compare against a series of models pretrained on natural images, ultrasound images, and supervised baselines. Our results demonstrate two key insights: (i) Pretraining on custom data is worth it, even if smaller models are trained on less data, as scaling in natural image pretraining does not translate to ultrasound performance. (ii) Well-tuned methods from computer vision are making it feasible to train custom foundation models for a given medical domain, requiring no hyperparameter tuning and little methodological adaptation. Given these findings, we argue that a bias towards methodological innovation should be avoided when developing domain-specific foundation models under common computational resource constraints.
Article 491
Title@2025-06-24 (2): Discovering Symmetries of ODEs by Symbolic Regression
Title: Discovering Symmetries of ODEs by Symbolic Regression | Symmetrien von ODEs durch symbolische Regression entdecken | 通过符号回归发现对 ODE 的对称 2506.19550v1 |
Authors (3): Paul Kahlmeyer, Niklas Merk, Joachim Giesen
Solving systems of ordinary differential equations (ODEs) is essential when it comes to understanding the behavior of dynamical systems. Yet, automated solving remains challenging, in particular for nonlinear systems. Computer algebra systems (CASs) provide support for solving ODEs by first simplifying them, in particular through the use of Lie point symmetries. Finding these symmetries is, however, itself a difficult problem for CASs. Recent works in symbolic regression have shown promising results for recovering symbolic expressions from data. Here, we adapt search-based symbolic regression to the task of finding generators of Lie point symmetries. With this approach, we can find symmetries of ODEs that existing CASs cannot find.
Article 492
Title@2025-06-24 (2): RCStat: A Statistical Framework for using Relative Contextualization in Transformers
Title: RCStat: A Statistical Framework for using Relative Contextualization in Transformers | RCStat: Statistischer Rahmen für die Verwendung der relativen Kontextualisierung in Transformern | RCStat: 在变异器中使用相对环境化的统计框架 2506.19549v1 |
Authors (4): Debabrata Mahapatra, Shubham Agarwal, Apoorv Saxena, Subrata Mitra
Prior work on input-token importance in auto-regressive transformers has relied on Softmax-normalized attention weights, which obscure the richer structure of pre-Softmax query-key logits. We introduce RCStat, a statistical framework that harnesses raw attention logits via Relative Contextualization (RC), a random variable measuring contextual alignment between token segments, and derive an efficient upper bound for RC. We demonstrate two applications: (i) Key-Value compression, where RC-based thresholds drive adaptive key-value eviction for substantial cache reduction with minimal quality loss; and (ii) Attribution, where RC yields higher-fidelity token-, sentence-, and chunk-level explanations than post-Softmax methods. Across question answering, summarization, and attribution benchmarks, RCStat achieves significant empirical gains, delivering state-of-the-art compression and attribution performance without any model retraining.
Article 493
Title@2025-06-24 (2): Overtuning in Hyperparameter Optimization
Title: Overtuning in Hyperparameter Optimization | Overtuning in Hyperparameter-Optimierung | 超重参数超强优化 2506.19540v1 |
Authors (3): Lennart Schneider, Bernd Bischl, Matthias Feurer
Hyperparameter optimization (HPO) aims to identify an optimal hyperparameter configuration (HPC) such that the resulting model generalizes well to unseen data. As the expected generalization error cannot be optimized directly, it is estimated with a resampling strategy, such as holdout or cross-validation. This approach implicitly assumes that minimizing the validation error leads to improved generalization. However, since validation error estimates are inherently stochastic and depend on the resampling strategy, a natural question arises: Can excessive optimization of the validation error lead to overfitting at the HPO level, akin to overfitting in model training based on empirical risk minimization? In this paper, we investigate this phenomenon, which we term overtuning, a form of overfitting specific to HPO. Despite its practical relevance, overtuning has received limited attention in the HPO and AutoML literature. We provide a formal definition of overtuning and distinguish it from related concepts such as meta-overfitting. We then conduct a large-scale reanalysis of HPO benchmark data to assess the prevalence and severity of overtuning. Our results show that overtuning is more common than previously assumed, typically mild but occasionally severe. In approximately 10% of cases, overtuning leads to the selection of a seemingly optimal HPC with worse generalization error than the default or first configuration tried. We further analyze how factors such as performance metric, resampling strategy, dataset size, learning algorithm, and HPO method affect overtuning and discuss mitigation strategies. Our results highlight the need to raise awareness of overtuning, particularly in the small-data regime, indicating that further mitigation strategies should be studied.
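The effect described above can be illustrated with a toy simulation. This is a minimal sketch and not the authors' benchmark reanalysis: the error range, the Gaussian noise model, and the names `simulate_hpo` and `overtuning` are all assumptions made for illustration. The incumbent is always picked by its noisy validation error, and the gap is measured as the true error of the final incumbent minus the best true error among all incumbents seen so far.

```python
import random

def simulate_hpo(n_configs=200, noise_sd=0.05, seed=0):
    """Random-search HPO where validation error is a noisy estimate of true error."""
    rng = random.Random(seed)
    trajectory = []  # (incumbent validation error, incumbent true error) per step
    best_val, best_true = float("inf"), None
    for _ in range(n_configs):
        true_err = rng.uniform(0.10, 0.30)             # hypothetical generalization error
        val_err = true_err + rng.gauss(0.0, noise_sd)  # noisy resampling estimate
        if val_err < best_val:                         # incumbent is chosen on validation only
            best_val, best_true = val_err, true_err
        trajectory.append((best_val, best_true))
    return trajectory

def overtuning(trajectory):
    """True error of the final incumbent minus the best true error of any incumbent."""
    final_true = trajectory[-1][1]
    return final_true - min(true for _, true in trajectory)

traj = simulate_hpo()
gap = overtuning(traj)
print(f"final validation error: {traj[-1][0]:.3f}, overtuning gap: {gap:.3f}")
```

With `noise_sd=0.0` the validation error equals the true error and the gap is exactly zero; any positive gap in this toy setup comes purely from selecting on a noisy estimate.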
Article 494
Title@2025-06-24 (2): Dimension Reduction for Symbolic Regression
Title: Dimension Reduction for Symbolic Regression | Dimensionsreduzierung für symbolische Regression | 减少内效退退的尺寸 2506.19537v1 |
Authors (3): Paul Kahlmeyer, Markus Fischer, Joachim Giesen
Solutions of symbolic regression problems are expressions that are composed of input variables and operators from a finite set of function symbols. One measure for evaluating symbolic regression algorithms is their ability to recover formulae, up to symbolic equivalence, from finite samples. Not unexpectedly, the recovery problem becomes harder when the formula gets more complex, that is, when the number of variables and operators gets larger. Variables in naturally occurring symbolic formulas often appear only in fixed combinations. This can be exploited in symbolic regression by substituting one new variable for the combination, effectively reducing the number of variables. However, finding valid substitutions is challenging. Here, we address this challenge by searching over the expression space of small substitutions and testing for validity. The validity test is reduced to a test of functional dependence. The resulting iterative dimension reduction procedure can be used with any symbolic regression approach. We show that it reliably identifies valid substitutions and significantly boosts the performance of different types of state-of-the-art symbolic regression algorithms.
Article 495
Title@2025-06-24 (2): Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks
Title: Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks | Identifizieren physikalisch realisierbare Auslöser für Backdoored Face Recognition Networks | 识别后门脸部识别网络的有形可实现触发器 2506.19533v1 |
Authors (3): Ankita Raj, Ambar Pal, Chetan Arora
Backdoor attacks embed a hidden functionality into deep neural networks, causing the network to display anomalous behavior when activated by a predetermined pattern in the input (the trigger), while otherwise behaving well on public test data. Recent works have shown that backdoored face recognition (FR) systems can respond to natural-looking triggers like a particular pair of sunglasses. Such attacks pose a serious threat to the applicability of FR systems in high-security applications. We propose a novel technique to (1) detect whether an FR network is compromised with a natural, physically realizable trigger, and (2) identify such triggers given a compromised network. We demonstrate the effectiveness of our methods with a compromised FR network, where we are able to identify the trigger (e.g., green sunglasses or red hat) with a top-5 accuracy of 74%, whereas a naive brute force baseline achieves 56% accuracy.
Article 496
Title@2025-06-24 (2): A Framework for Uncertainty Quantification Based on Nearest Neighbors Across Layers
Title: A Framework for Uncertainty Quantification Based on Nearest Neighbors Across Layers | Ein Rahmen für die Unsicherheitsquantifizierung auf der Grundlage der nächsten Nachbarländer über Schichten hinweg | 基于跨层近邻的不确定性定量框架 2506.19895v1 |
Authors (3): Miguel N. Font, José L. Jorro-Aragoneses, Carlos M. Alaíz
Neural Networks have high accuracy in solving problems where it is difficult to detect patterns or create a logical model. However, these algorithms sometimes return wrong solutions, which become problematic in high-risk domains like medical diagnosis or autonomous driving. One strategy to detect and mitigate these errors is the measurement of the uncertainty over neural network decisions. In this paper, we present a novel post-hoc framework for measuring the uncertainty of a decision based on retrieved training cases that have a similar activation vector to the query for each layer. Based on these retrieved cases, we propose two new metrics: Decision Change and Layer Uncertainty, which capture changes in nearest-neighbor class distributions across layers. We evaluated our approach in a classification model for two datasets: CIFAR-10 and MNIST. The results show that these metrics enhance uncertainty estimation, especially in challenging classification tasks, outperforming softmax-based confidence.
Article 497
Title@2025-06-24 (2): Towards Robust Stability Prediction in Smart Grids: GAN-based Approach under Data Constraints and Adversarial Challenges
Title: Towards Robust Stability Prediction in Smart Grids: GAN-based Approach under Data Constraints and Adversarial Challenges | Auf dem Weg zu robusten Stabilitätsprognosen in Smart Grids: GAN-basierter Ansatz unter Datenbeschränkungen und adversarialen Herausforderungen | 实现智能网格中强有力的稳定预测:在数据制约和反向挑战下采取基于全球网络的方法 2501.16490v2 |
Authors (5): Emad Efatinasab, Alessandro Brighente, Denis Donadel, Mauro Conti, Mirco Rampazzo
Smart grids are crucial for meeting rising energy demands driven by global population growth and urbanization. By integrating renewable energy sources, they enhance efficiency, reliability, and sustainability. However, ensuring their availability and security requires advanced operational control and safety measures. Although artificial intelligence and machine learning can help assess grid stability, challenges such as data scarcity and cybersecurity threats, particularly adversarial attacks, remain. Data scarcity is a major issue, as obtaining real-world instances of grid instability requires significant expertise, resources, and time. Yet, these instances are critical for testing new research advancements and security mitigations. This paper introduces a novel framework for detecting instability in smart grids using only stable data. It employs a Generative Adversarial Network (GAN) where the generator is designed not to produce near-realistic data but instead to generate Out-Of-Distribution (OOD) samples with respect to the stable class. These OOD samples represent unstable behavior, anomalies, or disturbances that deviate from the stable data distribution. By training exclusively on stable data and exposing the discriminator to OOD samples, our framework learns a robust decision boundary to distinguish stable conditions from any unstable behavior, without requiring unstable data during training. Furthermore, we incorporate an adversarial training layer to enhance resilience against attacks. Evaluated on a real-world dataset, our solution achieves up to 98.1% accuracy in predicting grid stability and 98.9% in detecting adversarial attacks. Implemented on a single-board computer, it enables real-time decision-making with an average response time of under 7 ms.
Article 498
Title@2025-06-24 (2): Towards Unsupervised Multi-Agent Reinforcement Learning via Task-Agnostic Exploration
Title: Towards Unsupervised Multi-Agent Reinforcement Learning via Task-Agnostic Exploration | Auf dem Weg zu einem unüberwachten Mehr-Agenten-Verstärkungs-Lernen durch Task-Agnostic Exploration | 通过任务不可知探索实现无人监督的多机构强化学习 2502.08365v3 |
Authors (3): Riccardo Zamboni, Mirco Mutti, Marcello Restelli
In reinforcement learning, we typically refer to unsupervised pre-training when we aim to pre-train a policy without a priori access to the task specification, i.e. rewards, to be later employed for efficient learning of downstream tasks. In single-agent settings, the problem has been extensively studied and mostly understood. A popular approach, called task-agnostic exploration, casts the unsupervised objective as maximizing the entropy of the state distribution induced by the agent’s policy, from which principles and methods follow. In contrast, little is known about it in multi-agent settings, which are ubiquitous in the real world. What are the pros and cons of alternative problem formulations in this setting? How hard is the problem in theory, and how can we solve it in practice? In this paper, we address these questions by first characterizing those alternative formulations and highlighting how the problem, even when tractable in theory, is non-trivial in practice. Then, we present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. Finally, we provide numerical validations to both corroborate the theoretical findings and pave the way for unsupervised multi-agent reinforcement learning via task-agnostic exploration in challenging domains, showing that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
Article 499
Title@2025-06-24 (2): Explaining deep neural network models for electricity price forecasting with XAI
Title: Explaining deep neural network models for electricity price forecasting with XAI | Erläutern von Deep-Neural-Netzwerk-Modellen für die Strompreisprognose mit XAI | 解释与XAI公司一道进行电力价格预测的深层神经网络模型 2506.19894v1 |
Authors (2): Antoine Pesenti, Aidan OSullivan
Electricity markets are highly complex, involving many interactions and complex dependencies that make it hard to understand the inner workings of the market and what is driving prices. Econometric white-box models have been developed for this purpose; however, they are not as powerful as deep neural network (DNN) models. In this paper, we use a DNN to forecast the price and then use XAI methods to understand the factors driving the price dynamics in the market. The objective is to increase our understanding of how different electricity markets work. To do that, we apply explainable methods such as SHAP and Gradient, combined with visual techniques like heatmaps (saliency maps) to analyse the behaviour and contributions of various features across five electricity markets. We introduce the novel concepts of SSHAP values and SSHAP lines to enhance the complex representation of high-dimensional tabular models.
Article 500
Title@2025-06-24 (2): Visual hallucination detection in large vision-language models via evidential conflict
Title: Visual hallucination detection in large vision-language models via evidential conflict | Visuelle Halluzinationserkennung in großen visionssprachlichen Modellen über Beweiskonflikte | 通过证据冲突在大型视觉语言模型中发现视觉幻觉 2506.19513v1 |
Authors (5): Tao Huang, Zhekun Liu, Rui Wang, Yang Zhang, Liping Jing
Despite the remarkable multimodal capabilities of Large Vision-Language Models (LVLMs), discrepancies often occur between visual inputs and textual outputs–a phenomenon we term visual hallucination. This critical reliability gap poses substantial risks in safety-critical Artificial Intelligence (AI) applications, necessitating a comprehensive evaluation benchmark and effective detection methods. Firstly, we observe that existing visual-centric hallucination benchmarks mainly assess LVLMs from a perception perspective, overlooking hallucinations arising from advanced reasoning capabilities. We develop the Perception-Reasoning Evaluation Hallucination (PRE-HAL) dataset, which enables the systematic evaluation of both perception and reasoning capabilities of LVLMs across multiple visual semantics, such as instances, scenes, and relations. Comprehensive evaluation with this new benchmark exposed more visual vulnerabilities, particularly in the more challenging task of relation reasoning. To address this issue, we propose, to the best of our knowledge, the first Dempster-Shafer theory (DST)-based visual hallucination detection method for LVLMs through uncertainty estimation. This method aims to efficiently capture the degree of conflict in high-level features at the model inference phase. Specifically, our approach employs simple mass functions to mitigate the computational complexity of evidence combination on power sets. We conduct an extensive evaluation of state-of-the-art LVLMs, LLaVA-v1.5, mPLUG-Owl2 and mPLUG-Owl3, with the new PRE-HAL benchmark. Experimental results indicate that our method outperforms five baseline uncertainty metrics, achieving average AUROC improvements of 4%, 10%, and 7% across three LVLMs. Our code is available at https://github.com/HT86159/Evidential-Conflict.
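For readers unfamiliar with Dempster-Shafer theory, the sketch below shows the standard Dempster rule of combination and the conflict mass it produces, using simple mass functions (one focal set plus the whole frame). It is only an illustration of the general rule: how the paper derives evidence from high-level LVLM features is not specified in this abstract, and the names `simple_mass` and `combine` and the two-answer frame are assumptions.

```python
from itertools import product

def simple_mass(focal, support, frame):
    """Simple mass function: `support` on one focal set, the remainder on the whole frame."""
    m = {frozenset(frame): 1.0 - support}
    key = frozenset(focal)
    m[key] = m.get(key, 0.0) + support
    return m

def combine(m1, m2):
    """Dempster's rule of combination. Returns (combined masses, conflict K)."""
    joint, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + x * y
        else:
            conflict += x * y          # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: v / (1.0 - conflict) for s, v in joint.items()}, conflict

frame = {"yes", "no"}                      # hypothetical answer space
m_a = simple_mass({"yes"}, 0.8, frame)     # one piece of evidence supports "yes"
m_b = simple_mass({"no"}, 0.6, frame)      # another supports "no"
combined, conflict = combine(m_a, m_b)
```

Here the conflict is the product of the two opposing supports (0.8 × 0.6), and a high value signals that the pieces of evidence disagree. Restricting to simple mass functions keeps the number of focal sets small, which is what avoids enumerating the full power set.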
Article 501
Title@2025-06-24 (2): TrainVerify: Equivalence-Based Verification for Distributed LLM Training
Title: TrainVerify: Equivalence-Based Verification for Distributed LLM Training | TrainVerify: Gleichwertigkeitsbasierte Überprüfung für verteiltes LLM-Training | 培训核查:分布式LLM培训的等效核查 2506.15961v2 |
Authors (7): Yunchi Lu, Youshan Miao, Cheng Tan, Peng Huang, Yi Zhu, Xian Zhang, Fan Yang
Training large language models (LLMs) at scale requires parallel execution across thousands of devices, incurring enormous computational costs. Yet, these costly distributed trainings are rarely verified, leaving them prone to silent errors and potentially wasting millions of GPU hours. We introduce TrainVerify, a system for verifiable distributed training of LLMs. Given a deep learning model’s logical specification as the ground truth, TrainVerify formally verifies that a distributed parallel execution plan is mathematically equivalent to it. Direct verification is notoriously difficult due to the sheer scale of LLMs which often involves billions of variables and highly intricate computation graphs. Therefore, TrainVerify introduces shape-reduction techniques and a stage-wise parallel verification algorithm that significantly reduces complexity while preserving formal correctness. TrainVerify scales to frontier LLMs, including the successful verification of the Llama3 (405B) and DeepSeek-V3 (671B) training plans.
Article 502
Title@2025-06-24 (2): Distillation-Enabled Knowledge Alignment for Generative Semantic Communications in AIGC Provisioning Tasks
Title: Distillation-Enabled Knowledge Alignment for Generative Semantic Communications in AIGC Provisioning Tasks | Destillationsfähiges Wissen Ausrichtung für generative semantische Kommunikation in AIGC Provisioning-Aufgaben | 在AIGC提供任务中产生语义通信的知识协调 2506.19893v1 |
Authors (2): Jingzhi Hu, Geoffrey Ye Li
Due to the surging amount of AI-generated content (AIGC), its provisioning to edges and mobile users from the cloud incurs substantial traffic on networks. Generative semantic communication (GSC) offers a promising solution by transmitting highly compact information, i.e., prompt text and latent representations, instead of high-dimensional AIGC data. However, GSC relies on the alignment between the knowledge in the cloud generative AI (GAI) and that possessed by the edges and users, and between the knowledge for wireless transmission and that of actual channels, which remains challenging. In this paper, we propose DeKA-g, a distillation-enabled knowledge alignment algorithm for GSC systems. The core idea is to distill the generation knowledge from the cloud-GAI into low-rank matrices, which can be incorporated by the edge and used to adapt the transmission knowledge to diverse wireless channel conditions. DeKA-g comprises two novel methods: metaword-aided knowledge distillation (MAKD) and variable-rate grouped SNR adaptation (VGSA). For MAKD, an optimized metaword is employed to enhance the efficiency of knowledge distillation, while VGSA enables efficient adaptation to diverse compression rates and SNR ranges. From simulation results, DeKA-g improves the alignment between the edge-generated images and the cloud-generated ones by 44%. Moreover, it adapts to compression rates with 116% higher efficiency than the baseline and enhances the performance in low-SNR conditions by 28%.
Article 503
Title@2025-06-24 (2): RepuNet: A Reputation System for Mitigating Malicious Clients in DFL
Title: RepuNet: A Reputation System for Mitigating Malicious Clients in DFL | RepuNet: Ein Reputationssystem zur Bekämpfung bösartiger Kunden in der DFL | RepuNet:DFL中减少恶意客户的声望系统 2506.19892v1 |
Authors (4): Isaac Marroqui Penalva, Enrique Tomás Martínez Beltrán, Manuel Gil Pérez, Alberto Huertas Celdrán
Decentralized Federated Learning (DFL) enables nodes to collaboratively train models without a central server, introducing new vulnerabilities since each node independently selects peers for model aggregation. Malicious nodes may exploit this autonomy by sending corrupted models (model poisoning), delaying model submissions (delay attack), or flooding the network with excessive messages, negatively affecting system performance. Existing solutions often depend on rigid configurations or additional infrastructures such as blockchain, leading to computational overhead, scalability issues, or limited adaptability. To overcome these limitations, this paper proposes RepuNet, a decentralized reputation system that categorizes threats in DFL and dynamically evaluates node behavior using metrics like model similarity, parameter changes, message latency, and communication volume. Nodes’ influence in model aggregation is adjusted based on their reputation scores. RepuNet was integrated into the Nebula DFL platform and experimentally evaluated with MNIST and CIFAR-10 datasets under non-IID distributions, using federations of up to 25 nodes in both fully connected and random topologies. Different attack intensities, frequencies, and activation intervals were tested. Results demonstrated that RepuNet effectively detects and mitigates malicious behavior, achieving F1 scores above 95% for MNIST scenarios and approximately 76% for CIFAR-10 cases. These outcomes highlight RepuNet’s adaptability, robustness, and practical potential for mitigating threats in decentralized federated learning environments.
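The idea of scaling a node's influence by its reputation can be sketched as a reputation-weighted average of model parameters. This is a hypothetical illustration, not RepuNet's actual aggregation rule or the Nebula API: the function name `aggregate`, the flat parameter lists, and the `min_reputation` cutoff are all assumptions.

```python
def aggregate(models, reputations, min_reputation=0.0):
    """Average parameter vectors, weighting each neighbor by its reputation score.

    Neighbors whose reputation is not above `min_reputation` are excluded entirely.
    """
    kept = [(m, r) for m, r in zip(models, reputations) if r > min_reputation]
    if not kept:
        raise ValueError("no neighbor has sufficient reputation")
    total = sum(r for _, r in kept)
    dim = len(kept[0][0])
    return [sum(m[i] * r for m, r in kept) / total for i in range(dim)]

# Two honest neighbors and one suspected poisoner with a low reputation score.
neighbors = [[1.0, 1.0], [3.0, 3.0], [100.0, -100.0]]
scores = [0.9, 0.9, 0.05]
trusted_only = aggregate(neighbors, scores, min_reputation=0.1)
```

In this toy case the third neighbor falls below the cutoff, so the result is the plain average of the two honest models.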
Article 504
Title@2025-06-24 (2): MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications
Title: MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications | MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications | MATE:为无障碍应用提供LLM 授权多机构翻译环境 2506.19502v1 |
Authors (3): Aleksandr Algazinov, Matt Laing, Paul Laban
Accessibility remains a critical concern in today’s society, as many technologies are not developed to support the full range of user needs. Existing multi-agent systems (MAS) often cannot provide comprehensive assistance for users in need due to the lack of customization stemming from closed-source designs. Consequently, individuals with disabilities frequently encounter significant barriers when attempting to interact with digital environments. We introduce MATE, a multimodal accessibility MAS, which performs the modality conversions based on the user’s needs. The system is useful for assisting people with disabilities by ensuring that data will be converted to an understandable format. For instance, if the user cannot see well and receives an image, the system converts this image to its audio description. MATE can be applied to a wide range of domains, industries, and areas, such as healthcare, and can become a useful assistant for various groups of users. The system supports multiple types of models, ranging from LLM API calling to using custom machine learning (ML) classifiers. This flexibility ensures that the system can be adapted to various needs and is compatible with a wide variety of hardware. Since the system is expected to run locally, it ensures the privacy and security of sensitive information. In addition, the framework can be effectively integrated with institutional technologies (e.g., digital healthcare service) for real-time user assistance. Furthermore, we introduce ModCon-Task-Identifier, a model that is capable of extracting the precise modality conversion task from the user input. Numerous experiments show that ModCon-Task-Identifier consistently outperforms other LLMs and statistical models on our custom data. Our code and data are publicly available at https://github.com/AlgazinovAleksandr/Multi-Agent-MATE.
Article 505
Title@2025-06-24 (2): NaviAgent: Bilevel Planning on Tool Dependency Graphs for Function Calling
Title: NaviAgent: Bilevel Planning on Tool Dependency Graphs for Function Calling | NaviAgent: Bilevel-Planung auf Werkzeugabhängigkeitsgraphen für Funktionsaufruf | NaviAgent: 功能调用工具依赖图双层规划 2506.19500v1 |
Authors (5): Yan Jiang, Hao Zhou, LiZhong GU, Ai Han, TianLong Li
LLMs’ reliance on static knowledge and fragile tool invocation severely hinders the orchestration of complex, heterogeneous toolchains, particularly at large scales. Existing methods typically use rigid single-path execution, resulting in poor error recovery and exponentially growing search spaces. We introduce NaviAgent, a graph-navigated bilevel planning architecture for robust function calling, comprising a Multi-Path Decider and Graph-Encoded Navigator. As an LLM-powered agent, the Multi-Path Decider defines a four-dimensional decision space and continuously perceives environmental states, dynamically selecting the optimal action to fully cover all tool invocation scenarios. The Graph-Encoded Navigator constructs a Tool Dependency Heterogeneous Graph (TDHG), where node embeddings explicitly fuse API schema structure with historical invocation behavior. It also integrates a novel heuristic search strategy that guides the Decider toward efficient and highly successful toolchains, even for unseen tool combinations. Experiments show that NaviAgent consistently achieves the highest task success rate (TSR) across all foundation models and task complexities, outperforming the average baselines (ReAct, ToolLLM, α-UMI) by 13.5%, 16.4%, and 19.0% on Qwen2.5-14B, Qwen2.5-32B, and Deepseek-V3, respectively. Its execution steps are typically within one step of the most efficient baseline, ensuring a strong balance between quality and efficiency. Notably, a fine-tuned Qwen2.5-14B model achieves a TSR of 49.5%, surpassing the much larger 32B model (44.9%) under our architecture. Incorporating the Graph-Encoded Navigator further boosts TSR by an average of 2.4 points, with gains of over 9 points on complex tasks for larger models (Deepseek-V3 and GPT-4o), highlighting its essential role in toolchain orchestration.
Article 506
Title@2025-06-24 (2): HeNCler: Node Clustering in Heterophilous Graphs via Learned Asymmetric Similarity
Title: HeNCler: Node Clustering in Heterophilous Graphs via Learned Asymmetric Similarity | Hencler: Knoten-Clustering in heterophilen Graphen mittels Asymmetrischer Ähnlichkeit | HENCler:通过取得非对称相似性,将异生物性图案的节点分组 2405.17050v2 |
Authors (5): Sonny Achten, Zander Op de Beeck, Francesco Tonin, Volkan Cevher, Johan A. K. Suykens
Clustering nodes in heterophilous graphs is challenging as traditional methods assume that effective clustering is characterized by high intra-cluster and low inter-cluster connectivity. To address this, we introduce HeNCler, a novel approach for Heterophilous Node Clustering. HeNCler learns a similarity graph by optimizing a clustering-specific objective based on weighted kernel singular value decomposition. Our approach enables spectral clustering on an asymmetric similarity graph, providing flexibility for both directed and undirected graphs. By solving the primal problem directly, our method overcomes the computational difficulties of traditional adjacency partitioning-based approaches. Experimental results show that HeNCler significantly improves node clustering performance in heterophilous graph settings, highlighting the advantage of its asymmetric graph-learning framework.
Article 507
Title@2025-06-24 (2): COLUR: Confidence-Oriented Learning, Unlearning and Relearning with Noisy-Label Data for Model Restoration and Refinement
Title: COLUR: Confidence-Oriented Learning, Unlearning and Relearning with Noisy-Label Data for Model Restoration and Refinement | COLUR: Vertrauensorientiertes Lernen, Unlearning und Relearning mit Noisy-Label-Daten zur Modellrestauration und -verfeinerung | COLUR: 以信心为导向的学习、不学习和再学习,使用噪音标签数据促进示范恢复和完善 2506.19496v1 |
Authors (6): Zhihao Sui, Liang Hu, Jian Cao, Usman Naseem, Zhongyuan Lai, Qi Zhang
Large deep learning models have achieved significant success in various tasks. However, the performance of a model can degrade significantly if it must be trained on datasets whose noisy labels carry misleading or ambiguous information. To date, there have been limited investigations into how to restore performance once a model has been degraded by noisy-label data. Inspired by the "forgetting mechanism" in neuroscience, which accelerates the relearning of correct knowledge by unlearning the wrong knowledge, we propose a robust model restoration and refinement (MRR) framework COLUR, namely Confidence-Oriented Learning, Unlearning and Relearning. Specifically, we implement COLUR with an efficient co-training architecture to unlearn the influence of label noise, and then refine model confidence on each label for relearning. Extensive experiments are conducted on four real datasets and all evaluation results show that COLUR consistently outperforms other SOTA methods after MRR.
Article 508
Title@2025-06-24 (2): Tunable correlation retention: A statistical method for generating synthetic data
Title: Tunable correlation retention: A statistical method for generating synthetic data | Tunable Korrelationsspeicherung: Eine statistische Methode zur Generierung synthetischer Daten | 保留可裁判的关联性:生成合成数据的统计方法 2403.01471v3 |
Authors (4): Nicklas Jävergård, Rainey Lyons, Adrian Muntean, Jonas Forsman
We propose a method to generate statistically representative synthetic data from a given dataset. The main goal of our method is for the created dataset to mimic the inter-feature correlations present in the original data, while also offering a tunable parameter to influence the privacy level. In particular, our method constructs a statistical map by using the empirical conditional distributions between the features of the original dataset. Part of the tunability is achieved by limiting the depths of the conditional distributions that are used. We describe in detail the algorithms used both to construct the statistical map and to generate synthetic observations from it. This approach is tested in three different ways: with a hand-calculated example; a manufactured dataset; and a real-world energy-related dataset of consumption/production of households in Madeira Island. We evaluate the method by comparing the datasets using the Pearson correlation matrix with different levels of resolution and depths of correlation. These two considerations are viewed as tunable parameters influencing the resulting dataset's fidelity and privacy. The proposed methodology is general in the sense that it does not rely on the test dataset used. We expect it to be applicable in a much broader context than indicated here.
Article 509
Title@2025-06-24 (2): Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story
Title: Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story | Steigerung der Vielfalt bei Parallelagenten: Eine höchstmögliche Entropie-Explorationsgeschichte | 增强平行代表机构的多样性:最大国家宇宙体探索空间 2505.01336v2 |
Authors (4): Vincenzo De Paola, Riccardo Zamboni, Mirco Mutti, Marcello Restelli
Parallel data collection has redefined Reinforcement Learning (RL), unlocking unprecedented efficiency and powering breakthroughs in large-scale real-world applications. In this paradigm, $N$ identical agents operate in $N$ replicas of an environment simulator, accelerating data collection by a factor of $N$. A critical question arises: Does specializing the policies of the parallel agents hold the key to surpassing the $N$-factor acceleration? In this paper, we introduce a novel learning framework that maximizes the entropy of collected data in a parallel setting. Our approach carefully balances the entropy of individual agents with inter-agent diversity, effectively minimizing redundancies. The latter idea is implemented with a centralized policy gradient method, which shows promise when evaluated empirically against systems of identical agents, as well as synergy with batch RL techniques that can exploit data diversity. Finally, we provide an original concentration analysis that shows faster rates for specialized parallel sampling distributions, which supports our methodology and may be of independent interest.
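The trade-off between individual entropy and inter-agent diversity can be made concrete with a standard identity: the entropy of the pooled (mixture) state distribution equals the mean of the agents' individual entropies plus their generalized Jensen-Shannon divergence. The sketch below checks this on a toy four-state example; the discrete state space and the function names are assumptions, and it does not reproduce the paper's policy gradient method.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mixture(dists):
    """Uniform mixture of the agents' state distributions (the pooled data)."""
    n = len(dists)
    return [sum(d[s] for d in dists) / n for s in range(len(dists[0]))]

def decompose(dists):
    """Return (mixture entropy, mean individual entropy, diversity term)."""
    mix_h = entropy(mixture(dists))
    mean_h = sum(entropy(d) for d in dists) / len(dists)
    return mix_h, mean_h, mix_h - mean_h   # last term is the generalized JSD

# Two identical agents versus two agents specialized on disjoint states.
identical = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
specialized = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
same_mix, same_mean, same_div = decompose(identical)
spec_mix, spec_mean, spec_div = decompose(specialized)
```

Both setups have the same mean individual entropy (log 2), but only the specialized agents gain the extra diversity term, which raises the entropy of the pooled data from log 2 to log 4.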
nan
Article 510
Title@2025-06-24 (2): Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy
Title: Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy | Erinnern an die vergessene Klasse Mitgliedschaften: Unerlernte Modelle können laute Labeler sein, um Leak Privacy | 回顾《被遗忘的阶级成员:未学习的模型》, 2506.19486v1 |
Authors (7): Zhihao Sui, Liang Hu, Jian Cao, Dora D. Liu, Usman Naseem, Zhongyuan Lai, Qi Zhang
Machine Unlearning (MU) technology facilitates the removal of the influence of specific data instances from trained models on request. Despite rapid advancements in MU technology, its vulnerabilities are still underexplored, posing potential risks of privacy breaches through leaks of ostensibly unlearned information. Current limited research on MU attacks requires access to original models containing privacy data, which violates the critical privacy-preserving objective of MU. To address this gap, we initiate an innovative study on recalling the forgotten class memberships from unlearned models (ULMs) without requiring access to the original one. Specifically, we implement a Membership Recall Attack (MRA) framework with a teacher-student knowledge distillation architecture, where ULMs serve as noisy labelers to transfer knowledge to student models. Then, it is translated into a Learning with Noisy Labels (LNL) problem for inferring the correct labels of the forgetting instances. Extensive experiments on state-of-the-art MU methods with multiple real datasets demonstrate that the proposed MRA strategy exhibits high efficacy in recovering class memberships of unlearned instances. As a result, our study and evaluation have established a benchmark for future research on MU vulnerabilities.
nan
Article 511
Title@2025-06-24 (2): Privacy Attacks on Image AutoRegressive Models
Title: Privacy Attacks on Image AutoRegressive Models | Datenschutzangriffe auf Image AutoRegressive Modelle | 对图像自动递减模型的隐私攻击 2502.02514v4 |
Authors (4): Antoni Kowalczuk, Jan Dubiński, Franziska Boenisch, Adam Dziedzic
Image AutoRegressive generation has emerged as a new powerful paradigm with image autoregressive models (IARs) matching state-of-the-art diffusion models (DMs) in image quality (FID: 1.48 vs. 1.58) while allowing for a higher generation speed. However, the privacy risks associated with IARs remain unexplored, raising concerns regarding their responsible deployment. To address this gap, we conduct a comprehensive privacy analysis of IARs, comparing their privacy risks to the ones of DMs as reference points. Concretely, we develop a novel membership inference attack (MIA) that achieves a remarkably high success rate in detecting training images (with a True Positive Rate at False Positive Rate = 1% of 86.38% vs. 6.38% for DMs with comparable attacks). We leverage our novel MIA to provide dataset inference (DI) for IARs, and show that it requires as few as 6 samples to detect dataset membership (compared to 200 for DI in DMs), confirming a higher information leakage in IARs. Finally, we are able to extract hundreds of training data points from an IAR (e.g., 698 from VAR-d30). Our results suggest a fundamental privacy-utility trade-off: while IARs excel in image generation quality and speed, they are empirically significantly more vulnerable to privacy attacks compared to DMs that achieve similar performance. We release the code at https://github.com/sprintml/privacy_attacks_against_iars for reproducibility.
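The headline metric, True Positive Rate at a fixed False Positive Rate, can be computed from attack scores as in the following sketch. It assumes that a higher score means "more likely a training member"; the function name `tpr_at_fpr` and the toy scores are illustrative and not taken from the released code.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Fraction of members detected when at most `target_fpr` of non-members are flagged."""
    ranked = sorted(nonmember_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # number of false positives tolerated
    # A sample is flagged only if its score strictly exceeds the threshold,
    # so at most `allowed` non-members can be flagged.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    hits = sum(score > threshold for score in member_scores)
    return hits / len(member_scores)


nonmembers = [float(i) for i in range(100)]   # toy scores 0..99
members = [150.0, 99.0, 98.0, 10.0]
rate = tpr_at_fpr(members, nonmembers, target_fpr=0.01)
```

Reporting TPR at a low FPR, rather than average accuracy, emphasizes how many training images an attacker can identify with high confidence.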
nan
Article 512
Title@2025-06-24 (2): Fast and Distributed Equivariant Graph Neural Networks by Virtual Node Learning
Title: Fast and Distributed Equivariant Graph Neural Networks by Virtual Node Learning | Schnelle und distributed äquivariant Graph Neural Networks by Virtual Node Learning | 通过虚拟节点学习快速和分布的等同图形神经网络 2506.19482v1 |
Authors (4): Yuelin Zhang, Jiacheng Cen, Jiaqi Han, Wenbing Huang
Equivariant Graph Neural Networks (GNNs) have achieved remarkable success across diverse scientific applications. However, existing approaches face critical efficiency challenges when scaling to large geometric graphs and suffer significant performance degradation when the input graphs are sparsified for computational tractability. To address these limitations, we introduce FastEGNN and DistEGNN, two novel enhancements to equivariant GNNs for large-scale geometric graphs. FastEGNN employs a key innovation: a small ordered set of virtual nodes that effectively approximates the large unordered graph of real nodes. Specifically, we implement distinct message passing and aggregation mechanisms for different virtual nodes to ensure mutual distinctiveness, and minimize Maximum Mean Discrepancy (MMD) between virtual and real coordinates to achieve global distributedness. This design enables FastEGNN to maintain high accuracy while efficiently processing large-scale sparse graphs. For extremely large-scale geometric graphs, we present DistEGNN, a distributed extension where virtual nodes act as global bridges between subgraphs in different devices, maintaining consistency while dramatically reducing memory and computational overhead. We comprehensively evaluate our models across four challenging domains: N-body systems (100 nodes), protein dynamics (800 nodes), Water-3D (8,000 nodes), and our new Fluid113K benchmark (113,000 nodes). Results demonstrate superior efficiency and performance, establishing new capabilities in large-scale equivariant graph learning. Code is available at https://github.com/GLAD-RUC/DistEGNN.
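The Maximum Mean Discrepancy term can be sketched as follows. This is the biased estimate of squared MMD with a Gaussian kernel; the kernel choice, the bandwidth `sigma`, and the toy coordinates are assumptions, since the abstract does not specify them.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two coordinate tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mmd_squared(virtual, real, sigma=1.0):
    """Biased estimate of squared MMD between two point sets."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(virtual, virtual)
            + mean_kernel(real, real)
            - 2 * mean_kernel(virtual, real))


real_nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
spread_virtual = [(0.0, 0.0), (1.0, 1.0)]      # covers the real nodes
collapsed_virtual = [(5.0, 5.0), (5.0, 5.0)]   # far away and degenerate
print(mmd_squared(spread_virtual, real_nodes))
print(mmd_squared(collapsed_virtual, real_nodes))
```

A small MMD indicates that a few virtual nodes are distributed like the real nodes, which is the sense in which they can summarize a large graph.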
nan
Article 513
Title@2025-06-24 (2): ADDQ: Adaptive Distributional Double Q-Learning
Title: ADDQ: Adaptive Distributional Double Q-Learning | ADDQ: Adaptive Verteilung Doppeltes Q-Lernen | ADDQ:适应性分配双重学习 2506.19478v1 |
Authors (5): Leif Döring, Benedikt Wille, Maximilian Birr, Mihail Bîrsan, Martin Slowik
Bias problems in the estimation of $Q$-values are a well-known obstacle that slows down convergence of $Q$-learning and actor-critic methods. Part of the success of modern RL algorithms is due to a direct or indirect overestimation reduction mechanism. We propose an easy-to-implement method built on top of distributional reinforcement learning (DRL) algorithms to deal with overestimation in a locally adaptive way. Our framework is simple to implement: existing distributional algorithms can be improved with a few lines of code. We provide theoretical evidence and use double $Q$-learning to show how to include locally adaptive overestimation control in existing algorithms. Experiments are provided for tabular, Atari, and MuJoCo environments.
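For reference, the classical tabular double Q-learning update that the abstract builds on can be sketched as below. This is plain double Q-learning, not ADDQ itself: one table selects the greedy next action and the other evaluates it, which reduces overestimation. Terminal states are not handled, and the table layout (a list of per-state action-value lists) is an assumption.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state, alpha, gamma, rng):
    """One double Q-learning step on two tables indexed as q[state][action]."""
    # Randomly choose which table learns; the other one evaluates.
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a
    next_values = learner[next_state]
    best_next = max(range(len(next_values)), key=lambda a: next_values[a])
    # Action chosen by the learner, value taken from the evaluator.
    target = reward + gamma * evaluator[next_state][best_next]
    learner[state][action] += alpha * (target - learner[state][action])


rng = random.Random(0)
q_a = [[0.0, 0.0], [1.0, 5.0]]
q_b = [[0.0, 0.0], [2.0, 3.0]]
double_q_update(q_a, q_b, state=0, action=0, reward=1.0,
                next_state=1, alpha=1.0, gamma=0.5, rng=rng)
```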
nan
Article 514
Title@2025-06-24 (2): Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense
Title: Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense | Tiefe neuronale Netzwerke mit ReLU, undichtem ReLU und Softplus-Aktivierung überwinden nachweislich den Fluch der Dimensionalität für Kolmogorov partielle Differentialgleichungen mit Lipschitz-Nichtlinearitäten im $L^p$-Sense | 与RELU、渗漏ReLU和软增压激活的深神经网络可以克服Kolmogorov部分差异方程式的维度诅咒,Lipschitz非线性方程式以$Lúp$-sense为单位。 2309.13722v2 |
Authors (5): Julia Ackermann, Arnulf Jentzen, Thomas Kruse, Benno Kuckuck, Joshua Lee Padgett
Recently, several deep learning (DL) methods for approximating high-dimensional partial differential equations (PDEs) have been proposed. The interest that these methods have generated in the literature is in large part due to simulations which appear to demonstrate that such DL methods have the capacity to overcome the curse of dimensionality (COD) for PDEs in the sense that the number of computational operations they require to achieve a certain approximation accuracy $\varepsilon\in(0,\infty)$ grows at most polynomially in the PDE dimension $d\in\mathbb N$ and the reciprocal of $\varepsilon$. While there is thus far no mathematical result that proves that any of these methods is indeed capable of overcoming the COD, there are now a number of rigorous results in the literature that show that deep neural networks (DNNs) have the expressive power to approximate PDE solutions without the COD in the sense that the number of parameters used to describe the approximating DNN grows at most polynomially in both the PDE dimension $d\in\mathbb N$ and the reciprocal of the approximation accuracy $\varepsilon>0$. Roughly speaking, in the literature it has been proved for every $T>0$ that solutions $u_d\colon [0,T]\times\mathbb R^d\to \mathbb R$, $d\in\mathbb N$, of semilinear heat PDEs with Lipschitz continuous nonlinearities can be approximated by DNNs with ReLU activation at the terminal time in the $L^2$-sense without the COD provided that the initial value functions $\mathbb R^d\ni x\mapsto u_d(0,x)\in\mathbb R$, $d\in\mathbb N$, can be approximated by ReLU DNNs without the COD. It is the key contribution of this work to generalize this result by establishing this statement in the $L^p$-sense with $p\in(0,\infty)$ and by allowing the activation function to be more general, covering the ReLU, the leaky ReLU, and the softplus activation functions as special cases.
nan
Article 515
Title@2025-06-24 (2): Uncertainty Quantification on Graph Learning: A Survey
Title: Uncertainty Quantification on Graph Learning: A Survey | Ungewissheit Quantifizierung des Graphenlernens: Eine Umfrage | 图表学习的不确定性量化:调查 2404.14642v3 |
Authors (8): Chao Chen, Chenghua Guo, Rui Xu, Xiangwen Liao, Xi Zhang, Sihong Xie, Hui Xiong, Philip Yu
Graphical models have demonstrated their exceptional capabilities across numerous applications, such as social networks, citation networks, and online recommendation systems. However, their performance, confidence, and trustworthiness are often limited by the inherent randomness in data and the challenges of accurately modeling real-world complexities. There has been increased interest in developing uncertainty quantification (UQ) techniques tailored to graphical models. In this survey, we comprehensively examine existing works on UQ for graphical models, focusing on key aspects such as the sources, representation, handling, and evaluation of uncertainty. This survey distinguishes itself from most existing UQ surveys by specifically concentrating on UQ in graphical models, including probabilistic graphical models (PGMs) and graph neural networks (GNNs). After reviewing sources of uncertainty, we organize the work using two high-level dimensions: uncertainty representation and uncertainty handling. By offering a comprehensive overview of the current landscape, including both established methodologies and emerging trends, we aim to bridge gaps in understanding key challenges and opportunities in UQ for graphical models, hoping to inspire researchers working on graphical models or uncertainty quantification to make further advancements at the intersection of the two fields.
nan
Article 516
Title@2025-06-24 (2): Orthogonal Soft Pruning for Efficient Class Unlearning
Title: Orthogonal Soft Pruning for Efficient Class Unlearning | Orthogonale Soft Pruning für effizientes Lernen | 为高效班级取消学习而整形软节奏 2506.19891v1 |
Authors (3): Qinghui Gong, Xue Yang, Xiaohu Tang
Machine unlearning aims to selectively remove class-specific knowledge from pretrained neural networks to satisfy privacy regulations such as the GDPR. Existing methods typically face a trade-off between unlearning speed and preservation of predictive accuracy, often incurring either high computational overhead or significant performance degradation on retained classes. In this paper, we propose a novel class-aware soft pruning framework leveraging orthogonal convolutional kernel regularization to achieve rapid and precise forgetting with millisecond-level response times. By enforcing orthogonality constraints during training, our method decorrelates convolutional filters and disentangles feature representations, while efficiently identifying class-specific channels through activation difference analysis. Extensive evaluations across multiple architectures and datasets demonstrate stable pruning with near-instant execution, complete forgetting of targeted classes, and minimal accuracy loss on retained data. Experiments on CIFAR-10, CIFAR-100, and TinyImageNet confirm that our approach substantially reduces membership inference attack risks and accelerates unlearning by orders of magnitude compared to state-of-the-art baselines. This framework provides an efficient, practical solution for real-time machine unlearning in Machine Learning as a Service (MLaaS) scenarios.
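One common form of orthogonal kernel regularization penalizes the squared Frobenius norm of $W W^\top - I$, where each row of $W$ is a flattened convolutional filter. The sketch below computes that penalty. Whether the paper uses exactly this form is an assumption, and `orthogonality_penalty` is a hypothetical name.

```python
def orthogonality_penalty(filters):
    """Squared Frobenius norm of (W W^T - I); rows of W are flattened filters."""
    n = len(filters)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(filters[i], filters[j]))
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty


orthonormal = [[1.0, 0.0], [0.0, 1.0]]
redundant = [[1.0, 0.0], [1.0, 0.0]]   # two identical filters
print(orthogonality_penalty(orthonormal), orthogonality_penalty(redundant))
```

Adding such a term to the training loss pushes filters toward decorrelated directions, which is the property the abstract relies on to isolate class-specific channels.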
nan
Article 517
Title@2025-06-24 (2): Stylized Structural Patterns for Improved Neural Network Pre-training
Title: Stylized Structural Patterns for Improved Neural Network Pre-training | Stylisierte Strukturmuster für verbesserte Neural-Netzwerk-Vorausbildung | 改善神经网络的固定结构模式 2506.19465v1 |
Authors (4): Farnood Salehi, Vandit Sharma, Amirhossein Askari Farsangi, Tunç Ozan Aydın
Modern deep learning models in computer vision require large datasets of real images, which are difficult to curate and pose privacy and legal concerns, limiting their commercial use. Recent works suggest synthetic data as an alternative, yet models trained with it often underperform. This paper proposes a two-step approach to bridge this gap. First, we propose an improved neural fractal formulation through which we introduce a new class of synthetic data. Second, we propose reverse stylization, a technique that transfers visual features from a small, license-free set of real images onto synthetic datasets, enhancing their effectiveness. We analyze the domain gap between our synthetic datasets and real images using Kernel Inception Distance (KID) and show that our method achieves a significantly lower distributional gap compared to existing synthetic datasets. Furthermore, our experiments across different tasks demonstrate the practical impact of this reduced gap. We show that pretraining the EDM2 diffusion model on our synthetic dataset leads to an 11% reduction in FID during image generation, compared to models trained on existing synthetic datasets, and a 20% decrease in autoencoder reconstruction error, indicating improved performance in data representation. Furthermore, a ViT-S model trained for classification on this synthetic data achieves over a 10% improvement in ImageNet-100 accuracy. Our work opens up exciting possibilities for training practical models when sufficiently large real training sets are not available.
nan
Article 518
Title@2025-06-24 (2): Tagged for Direction: Pinning Down Causal Edge Directions with Precision
Title: Tagged for Direction: Pinning Down Causal Edge Directions with Precision | Tagged for Richtung: Pinning Down Causal Edge Richtungen mit Präzision | 指向标记: 精密地弯曲下构造边缘方向 2506.19459v1 |
Authors (5): Florian Peter Busch, Moritz Willig, Florian Guldan, Kristian Kersting, Devendra Singh Dhami
Not every causal relation between variables is equal, and this can be leveraged for the task of causal discovery. Recent research shows that pairs of variables with particular type assignments induce a preference on the causal direction of other pairs of variables with the same type. Although useful, this assignment of a specific type to a variable can be tricky in practice. We propose a tag-based causal discovery approach where multiple tags are assigned to each variable in a causal graph. Existing causal discovery approaches are first applied to direct some edges, which are then used to determine edge relations between tags. Then, these edge relations are used to direct the undirected edges. Doing so improves upon purely type-based relations, where the assumption of type consistency lacks robustness and flexibility due to being restricted to single types for each variable. Our experimental evaluations show that this boosts causal discovery and that these high-level tag relations fit common knowledge.
nan
Article 519
Title@2025-06-24 (2): Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Title: Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference | Mischung aus Cache-Conditional Experts für effiziente mobile Geräteableitung | 高效移动设备引力缓存-条件专家混合 2412.00099v2 |
Authors (8): Andrii Skliar, Ties van Rozendaal, Romain Lepert, Todor Boinovski, Mart van Baalen, Markus Nagel, Paul Whatmough, Babak Ehteshami Bejnordi
Mixture of Experts (MoE) LLMs have recently gained attention for their ability to enhance performance by selectively engaging specialized subnetworks or “experts” for each input. However, deploying MoEs on memory-constrained devices remains challenging, particularly when generating tokens sequentially with a batch size of one, as opposed to typical high-throughput settings involving long sequences or large batches. In this work, we optimize MoE on memory-constrained devices where only a subset of expert weights fits in DRAM. We introduce a novel cache-aware routing strategy that leverages expert reuse during token generation to improve cache locality. We evaluate our approach on language modeling, MMLU, and GSM8K benchmarks and present on-device results demonstrating 2$\times$ speedups on mobile devices, offering a flexible, training-free solution to extend MoE’s applicability across real-world applications.
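A cache-aware router can be pictured as ordinary top-k routing with a preference for experts whose weights are already resident in memory. The sketch below adds a fixed bonus to the router logits of cached experts. The additive bonus, the parameter names, and the function `cache_aware_top_k` are illustrative assumptions; the abstract does not describe the actual routing rule.

```python
def cache_aware_top_k(router_logits, cached_experts, k, bonus):
    """Pick k experts, nudging the choice toward experts already in the cache."""
    adjusted = [
        logit + (bonus if expert in cached_experts else 0.0)
        for expert, logit in enumerate(router_logits)
    ]
    ranked = sorted(range(len(adjusted)), key=lambda e: adjusted[e], reverse=True)
    return ranked[:k]


logits = [2.0, 1.9, 0.5, 1.8]
plain = cache_aware_top_k(logits, cached_experts=set(), k=2, bonus=0.3)
cached = cache_aware_top_k(logits, cached_experts={3}, k=2, bonus=0.3)
```

With `bonus=0` this reduces to standard top-k routing; a larger bonus trades routing fidelity for fewer expert loads from slower storage.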
nan
Article 520
Title@2025-06-24 (2): Low-Complexity Semantic Packet Aggregation for Token Communication via Lookahead Search
Title: Low-Complexity Semantic Packet Aggregation for Token Communication via Lookahead Search | Low-Complexity Semantic Packet Aggregation für Token Communication via Lookahead Search | 通过 Lookahead 搜索建立Tokon 通信的低复杂度语义包装集成 2506.19451v1 |
Authors (4): Seunghun Lee, Jihong Park, Jinho Choi, Hyuncheol Park
Tokens are fundamental processing units of generative AI (GenAI) and large language models (LLMs), and token communication (TC) is essential for enabling remote AI-generated content (AIGC) and wireless LLM applications. Unlike traditional bits, which are treated independently, tokens carry semantics that depend on their surrounding context tokens. This inter-token dependency makes TC vulnerable to outage channels, where the loss of a single token can significantly distort the original message semantics. Motivated by this, this paper focuses on optimizing token packetization to maximize the average token similarity (ATS) between the original and received token messages under outage channels. Due to inter-token dependency, this token grouping problem is combinatorial, with complexity growing exponentially with message length. To address this, we propose a novel framework of semantic packet aggregation with lookahead search (SemPA-Look), built on two core ideas. First, it introduces the residual semantic score (RSS) as a token-level surrogate for the message-level ATS, allowing robust semantic preservation even when a certain token packet is lost. Second, instead of full search, SemPA-Look applies a lookahead search-inspired algorithm that samples intra-packet token candidates without replacement (fixed depth), conditioned on inter-packet token candidates sampled with replacement (fixed width), thereby achieving linear complexity. Experiments on a remote AIGC task with the MS-COCO dataset (text captioned images) demonstrate that SemPA-Look achieves high ATS and LPIPS scores comparable to exhaustive search, while reducing computational complexity by up to 40$\times$. Compared to other linear-complexity algorithms such as the genetic algorithm (GA), SemPA-Look achieves 10$\times$ lower complexity, demonstrating its practicality for remote AIGC and other TC applications.
nan
Article 521
Title@2025-06-24 (2): SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification
Title: SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification | SSPS: Selbstüberwachte positive Probenahme für robuste selbstüberwachte Lautsprecherverifizierung | SSPS: 自我监督的自我监督发言人自我监督的积极抽样核查 2505.14561v2 |
Authors (2): Theo Lepage, Reda Dehak
Self-Supervised Learning (SSL) has led to considerable progress in Speaker Verification (SV). The standard framework uses same-utterance positive sampling and data-augmentation to generate anchor-positive pairs of the same speaker. This is a major limitation, as this strategy primarily encodes channel information from the recording condition, shared by the anchor and positive. We propose a new positive sampling technique to address this bottleneck: Self-Supervised Positive Sampling (SSPS). For a given anchor, SSPS aims to find an appropriate positive, i.e., of the same speaker identity but a different recording condition, in the latent space using clustering assignments and a memory queue of positive embeddings. SSPS improves SV performance for both SimCLR and DINO, reaching 2.57% and 2.53% EER, outperforming SOTA SSL methods on VoxCeleb1-O. In particular, SimCLR-SSPS achieves a 58% EER reduction by lowering intra-speaker variance, providing comparable performance to DINO-SSPS.
nan
Article 522
Title@2025-06-24 (2): The Elements of Differentiable Programming
Title: The Elements of Differentiable Programming | Die Elemente der differenzierbaren Programmierung | 不同方案拟订要素 2403.14606v3 |
Authors (2): Mathieu Blondel, Vincent Roulet
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible. As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two. Differentiable programming is not merely the differentiation of programs, but also the thoughtful design of programs intended for differentiation. By making programs differentiable, we inherently introduce probability distributions over their execution, providing a means to quantify the uncertainty associated with program outputs.
nan
Article 523
Title@2025-06-24 (2): Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning
Title: Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning | Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Verstärkung Learning | 重力-引力引导焦点集中影响多机构强化学习机制中心 2506.19417v1 |
Authors (3): Yisak Park, Sunwoo Lee, Seungyul Han
Cooperative multi-agent reinforcement learning (MARL) under sparse rewards presents a fundamental challenge due to limited exploration and insufficient coordinated attention among agents. In this work, we propose the Focusing Influence Mechanism (FIM), a novel framework that enhances cooperation by directing agent influence toward task-critical elements, referred to as Center of Gravity (CoG) state dimensions, inspired by Clausewitz’s military theory. FIM consists of three core components: (1) identifying CoG state dimensions based on their stability under agent behavior, (2) designing counterfactual intrinsic rewards to promote meaningful influence on these dimensions, and (3) encouraging persistent and synchronized focus through eligibility-trace-based credit accumulation. These mechanisms enable agents to induce more targeted and effective state transitions, facilitating robust cooperation even in extremely sparse reward settings. Empirical evaluations across diverse MARL benchmarks demonstrate that the proposed FIM significantly improves cooperative performance compared to baselines.
nan
Article 524
Title@2025-06-24 (2): Multi-Continental Healthcare Modelling Using Blockchain-Enabled Federated Learning
Title: Multi-Continental Healthcare Modelling Using Blockchain-Enabled Federated Learning | Multi-Continental Healthcare Modellierung mittels Blockchain-Enabled Federated Learning | 利用综合链链-能连链的联邦学习模式建立多州保健模式 2410.17933v3 |
Authors (8): Rui Sun, Zhipeng Wang, Hengrui Zhang, Ming Jiang, Yizhe Wen, Jiahao Sun, Xinyu Qu, Kezhi Li
One of the biggest challenges of building artificial intelligence (AI) models in healthcare is data sharing. Since healthcare data is private, sensitive, and heterogeneous, collecting sufficient data for modelling is exhausting, costly, and sometimes impossible. In this paper, we propose a framework for global healthcare modelling using datasets from multiple continents (Europe, North America, and Asia) without sharing the local datasets, and choose glucose management as a study model to verify its effectiveness. Technically, blockchain-enabled federated learning is implemented with adaptations to meet the privacy and safety requirements of healthcare data. Meanwhile, it rewards honest participation and penalizes malicious activities using its on-chain incentive mechanism. Experimental results show that the proposed framework is effective, efficient, and privacy-preserving. Its prediction accuracy consistently outperforms models trained on limited personal data and achieves comparable or even slightly better results than centralized training in certain scenarios, all while preserving data privacy. This work paves the way for international collaborations on healthcare projects, where additional data is crucial for reducing bias and providing benefits to humanity.
nan
Article 525
Title@2025-06-24 (2): Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
Title: Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models | Meta-Reasoner: Dynamische Anleitung zur optimierten Schlussfolgerungs-Zeit-Reasoning in großen Sprachmodellen | Meta-Reasoner:大语言模型中优化推断-时间理由的动态指导 2502.19918v3 |
Authors (6): Yuan Sui, Yufei He, Tri Cao, Simeng Han, Yulin Chen, Bryan Hooi
Large Language Models (LLMs) increasingly rely on prolonged reasoning chains to solve complex tasks. However, this trial-and-error approach often leads to high computational overhead and error propagation, where early mistakes can derail subsequent steps. To address these issues, we introduce Meta-Reasoner, a framework that dynamically optimizes inference-time reasoning by enabling LLMs to “think about how to think.” Drawing inspiration from human meta-cognition and dual-process theory, Meta-Reasoner operates as a strategic advisor, decoupling high-level guidance from step-by-step generation. It employs contextual multi-armed bandits to iteratively evaluate reasoning progress and select optimal strategies (e.g., backtrack, clarify ambiguity, restart from scratch, or propose alternative approaches), and reallocates computational resources toward the most promising paths. Our evaluations on mathematical reasoning and puzzles highlight the potential of dynamic reasoning chains to overcome inherent challenges in the LLM reasoning process and also show promise in broader applications, offering a scalable and adaptable solution for reasoning-intensive tasks.
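The strategy-selection loop can be illustrated with a small contextual bandit. The sketch keeps one UCB1 estimator per discrete context label (for example a coarse summary of reasoning progress). The class name `ContextualUCB`, the context labels, and the reward signal are hypothetical, and the paper's bandit may use a different algorithm and context representation.

```python
import math
from collections import defaultdict


class ContextualUCB:
    """UCB1 kept separately for each discrete context."""

    def __init__(self, strategies, exploration=1.0):
        self.strategies = list(strategies)
        self.exploration = exploration
        n = len(self.strategies)
        self.counts = defaultdict(lambda: [0] * n)
        self.values = defaultdict(lambda: [0.0] * n)

    def select(self, context):
        counts = self.counts[context]
        # Try every strategy once before trusting the estimates.
        for i, count in enumerate(counts):
            if count == 0:
                return self.strategies[i]
        total = sum(counts)
        scores = [
            self.values[context][i]
            + self.exploration * math.sqrt(2 * math.log(total) / counts[i])
            for i in range(len(counts))
        ]
        return self.strategies[scores.index(max(scores))]

    def update(self, context, strategy, reward):
        i = self.strategies.index(strategy)
        self.counts[context][i] += 1
        # Incremental mean of observed rewards.
        self.values[context][i] += (reward - self.values[context][i]) / self.counts[context][i]


bandit = ContextualUCB(["backtrack", "restart", "continue"])
for _ in range(200):
    choice = bandit.select("stuck")
    # Toy reward: pretend backtracking is what helps when reasoning is stuck.
    bandit.update("stuck", choice, 1.0 if choice == "backtrack" else 0.0)
```

After enough rounds the bandit mostly selects the strategy that has paid off in that context while still occasionally re-testing the others.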
nan
Article 526
Title@2025-06-24 (2): Online Discovery of Simulation Models for Evolving Business Processes (Extended Version)
Title: Online Discovery of Simulation Models for Evolving Business Processes (Extended Version) | Online Discovery of Simulation Models for Evolving Business Processes (Erweiterte Version) | 不断演变的业务流程模拟模型在线发现(扩展版) 2506.10049v2 |
Authors (4): Francesco Vinci, Gyunam Park, Wil van der Aalst, Massimiliano de Leoni
Business Process Simulation (BPS) refers to techniques designed to replicate the dynamic behavior of a business process. Many approaches have been proposed to automatically discover simulation models from historical event logs, reducing the cost and time of designing them manually. However, in dynamic business environments, organizations continuously refine their processes to enhance efficiency, reduce costs, and improve customer satisfaction. Existing techniques for process simulation discovery lack adaptability to real-time operational changes. In this paper, we propose a streaming process simulation discovery technique that integrates Incremental Process Discovery with Online Machine Learning methods. This technique prioritizes recent data while preserving historical information, ensuring adaptation to evolving process dynamics. Experiments conducted on four different event logs demonstrate the importance, in simulation, of giving more weight to recent data while retaining historical knowledge. Our technique not only produces more stable simulations but also exhibits robustness in handling concept drift, as highlighted in one of the use cases.
nan
Article 527
Title@2025-06-24 (2): M3D: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition
Title: M3D: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition | M3D: Manifold-based Domain Adaptation mit dynamischer Distribution für nicht-deep Transfer Learning in Cross-Subjekt und Cross-Session EEG-based Emotion Recognition | M3D: 跨学科和跨学科EEG的情感识别中非深入转移学习动态分布的多功能适应和基于多科目和跨学科EEG的情感识别中非深入转移学习动态分布 2404.15615v3 |
Authors (7): Ting Luo, Jing Zhang, Yingwei Qiu, Li Zhang, Yaohua Hu, Zhuliang Yu, Zhen Liang
Emotion decoding using Electroencephalography (EEG)-based affective brain-computer interfaces (aBCIs) plays a crucial role in affective computing but is limited by challenges such as EEG’s non-stationarity, individual variability, and the high cost of large labeled datasets. While deep learning methods are effective, they require extensive computational resources and large data volumes, limiting their practical application. To overcome these issues, we propose Manifold-based Domain Adaptation with Dynamic Distribution (M3D), a lightweight, non-deep transfer learning framework. M3D consists of four key modules: manifold feature transformation, dynamic distribution alignment, classifier learning, and ensemble learning. The data is mapped to an optimal Grassmann manifold space, enabling dynamic alignment of source and target domains. This alignment is designed to prioritize both marginal and conditional distributions, improving adaptation efficiency across diverse datasets. In classifier learning, the principle of structural risk minimization is applied to build robust classification models. Additionally, dynamic distribution alignment iteratively refines the classifier. The ensemble learning module aggregates classifiers from different optimization stages to leverage diversity and enhance prediction accuracy. M3D is evaluated on two EEG emotion recognition datasets using two validation protocols (cross-subject single-session and cross-subject cross-session) and a clinical EEG dataset for Major Depressive Disorder (MDD). Experimental results show that M3D outperforms traditional non-deep learning methods with a 4.47% average improvement and achieves deep learning-level performance with reduced data and computational requirements, demonstrating its potential for real-world aBCI applications.
nan
Article 528
Title@2025-06-24 (2): Improved and Explainable Cervical Cancer Classification using Ensemble Pooling of Block Fused Descriptors
Title: Improved and Explainable Cervical Cancer Classification using Ensemble Pooling of Block Fused Descriptors | Verbesserte und erklärbare Cervical Cancer Classification mit Ensemblepooling von Block Fused Descriptors | 使用聚在一起的聚聚聚块阻燃描述词块改进子宫颈癌分类和可解释的子宫颈癌分类 2405.01600v2 |
Authors (3): Saurabh Saini, Kapil Ahuja, Akshat S. Chauhan
Cervical cancer is the second most common cancer in women and causes high death rates. Earlier models for detecting cervical cancer had limited success. In this work, we propose new models that substantially outperform previous models. Previous studies show that pretrained ResNets extract features from cervical cancer images well. Hence, our first model involves working with three ResNets (50, 101, 152). All the existing works use only the last convolution block of their respective ResNet, which captures abstract features (e.g., shapes, objects). However, we believe that detailed features (e.g., color, edges, texture), coming from earlier convolution blocks, are equally important for cancer (specifically cervical cancer) classification. Since the number of features now becomes large, we use a novel feature selection technique: Global Max Pooling for detailed features and Global Average Pooling for abstract features. Hence, our second model consists of the resulting Cascaded Block Fused variants of the three ResNets. To improve the performance further, we combine and normalize the features of the three standard ResNets as well as our proposed three Cascaded Block Fused ResNets. This type of combination is also new in the cancer classification domain (including cervical cancer), and results in our third and fourth models, respectively. We use a linear SVM for classification. We perform exhaustive experiments on two public datasets, IARC and AnnoCerv, achieving an average performance of 97.92% and 92.97%, surpassing the standard ResNets' performance of 90.89% and 87.97%, respectively. We outperform the competitive approach available on the IARC dataset with an average gain of 13.20%, while no prior competitive work is available on AnnoCerv. Additionally, we introduce a novel SHAP+LIME explainability method, accurately identifying the cancerous region in 97% of cases.
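The pooling scheme described in the abstract can be illustrated with a minimal sketch. It assumes feature maps are already extracted as `(channels, height, width)` arrays; the function name `block_fused_descriptor` and the L2 normalization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def block_fused_descriptor(early_maps, last_map):
    """Fuse per-block feature maps into one descriptor (hypothetical helper).

    early_maps: list of arrays shaped (channels, height, width) from earlier blocks.
    last_map:   array shaped (channels, height, width) from the last block.
    """
    # Global Max Pooling keeps the strongest response per channel (detailed features).
    detailed = [fmap.max(axis=(1, 2)) for fmap in early_maps]
    # Global Average Pooling summarizes each channel of the last block (abstract features).
    abstract = last_map.mean(axis=(1, 2))
    fused = np.concatenate(detailed + [abstract])
    # L2 normalization is an assumption; the abstract only says features are normalized.
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused

rng = np.random.default_rng(0)
early = [rng.random((4, 8, 8)), rng.random((6, 4, 4))]
last = rng.random((10, 2, 2))
descriptor = block_fused_descriptor(early, last)
print(descriptor.shape)  # (20,)
```

The resulting vector has one entry per channel across all blocks, so it could be passed to a linear SVM regardless of the spatial size of each block.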
Article 529
Title@2025-06-24 (2): Controllable Video Generation with Provable Disentanglement
Title: Controllable Video Generation with Provable Disentanglement | Steuerbare Video-Generation mit wahrnehmbarer Entwirrtheit | 带可变解脱的可控视频生成 2502.02690v2 |
Authors (9): Yifan Shen, Peiyuan Zhu, Zijian Li, Shaoan Xie, Zeyu Tang, Namrata Deka, Zongfang Liu, Guangyi Chen, Kun Zhang
Controllable video generation remains a significant challenge, despite recent advances in generating high-quality and consistent videos. Most existing methods for controlling video generation treat the video as a whole, neglecting intricate fine-grained spatiotemporal relationships, which limits both control precision and efficiency. In this paper, we propose Controllable Video Generative Adversarial Networks (CoVoGAN) to disentangle the video concepts, thus facilitating efficient and independent control over individual concepts. Specifically, following the minimal change principle, we first disentangle static and dynamic latent variables. We then leverage the sufficient change property to achieve component-wise identifiability of dynamic latent variables, enabling disentangled control of video generation. To establish the theoretical foundation, we provide a rigorous analysis demonstrating the identifiability of our approach. Building on these theoretical insights, we design a Temporal Transition Module to disentangle latent dynamics. To enforce the minimal change principle and sufficient change property, we minimize the dimensionality of latent dynamic variables and impose temporal conditional independence. To validate our approach, we integrate this module as a plug-in for GANs. Extensive qualitative and quantitative experiments on various video generation benchmarks demonstrate that our method significantly improves generation quality and controllability across diverse real-world scenarios.
Article 530
Title@2025-06-24 (2): Maximal Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators
Title: Maximal Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators | Maximale Aktualisierung Parametrisierung und Null-Shot-Hyperparameter-Übertragung für Fourier-Neural-Betreiber | Fourier神经操作员最大更新平衡化和零热超强参数转换 2506.19396v1 |
Authors (3): Shanda Li, Shinjae Yoo, Yiming Yang
Fourier Neural Operators (FNOs) offer a principled approach for solving complex partial differential equations (PDEs). However, scaling them to handle more complex PDEs requires increasing the number of Fourier modes, which significantly expands the number of model parameters and makes hyperparameter tuning computationally impractical. To address this, we introduce $\mu$Transfer-FNO, a zero-shot hyperparameter transfer technique that enables optimal configurations, tuned on smaller FNOs, to be directly applied to billion-parameter FNOs without additional tuning. Building on the Maximal Update Parametrization ($\mu$P) framework, we mathematically derive a parametrization scheme that facilitates the transfer of optimal hyperparameters across models with different numbers of Fourier modes in FNOs, which is validated through extensive experiments on various PDEs. Our empirical study shows that $\mu$Transfer-FNO reduces computational cost for tuning hyperparameters on large FNOs while maintaining or improving accuracy.
Article 531
Title@2025-06-24 (2): ANOVA-boosting for Random Fourier Features
Title: ANOVA-boosting for Random Fourier Features | ANOVA-Boosting für zufällige Fourier-Features | ANOVA 启动随机 Fourier 特性 2404.03050v2 |
Authors (2): Daniel Potts, Laura Weidensager
We propose two algorithms for boosting random Fourier feature models for approximating high-dimensional functions. These methods utilize the classical and generalized analysis of variance (ANOVA) decomposition to learn low-order functions, where there are few interactions between the variables. Our algorithms are able to find an index set of important input variables and variable interactions reliably. Furthermore, we generalize already existing random Fourier feature models to an ANOVA setting, where terms of different order can be used. Our algorithms have the advantage of interpretability, meaning that the influence of every input variable is known in the learned model, even for dependent input variables. We give theoretical as well as numerical results that our algorithms perform well for sensitivity analysis. The ANOVA-boosting step reduces the approximation error of existing methods significantly.
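As a rough illustration of a random Fourier feature model restricted to a few input variables (one low-order ANOVA-style term), the sketch below fits a ridge regression on features of an assumed active set. It does not implement the boosting step or the index-set search; `fit_anova_term`, the Gaussian frequency distribution, and the ridge parameter are assumptions made for the example.

```python
import numpy as np

def rff_features(x, freqs, phases):
    # z(x) = sqrt(2 / D) * cos(x @ W + b), the standard random Fourier feature map.
    num_features = freqs.shape[1]
    return np.sqrt(2.0 / num_features) * np.cos(x @ freqs + phases)

def fit_anova_term(x, y, active_vars, num_features=200, ridge=1e-3, seed=0):
    """Ridge-regress y on random Fourier features of the variables in active_vars."""
    rng = np.random.default_rng(seed)
    freqs = rng.normal(size=(len(active_vars), num_features))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    z = rff_features(x[:, active_vars], freqs, phases)
    coef = np.linalg.solve(z.T @ z + ridge * np.eye(num_features), z.T @ y)
    return lambda x_new: rff_features(x_new[:, active_vars], freqs, phases) @ coef

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(500, 5))
# Toy target: only variables 0, 1, 2 matter, with one pairwise interaction.
y = np.sin(2.0 * x[:, 0]) + x[:, 1] * x[:, 2]

model = fit_anova_term(x, y, active_vars=[0, 1, 2])
mse = np.mean((model(x) - y) ** 2)
print(mse < np.var(y))
```

Restricting the features to the active variables is what makes the fitted model interpretable in this toy setting: variables 3 and 4 cannot influence the prediction at all.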
Article 532
Title@2025-06-24 (2): Causal-Aware Intelligent QoE Optimization for VR Interaction with Adaptive Keyframe Extraction
Title: Causal-Aware Intelligent QoE Optimization for VR Interaction with Adaptive Keyframe Extraction | Causal-Aware Intelligente QoE-Optimierung für VR-Interaktion mit adaptiver Keyframe-Extraktion | VR 与适应性键框架的提取互动的优化 QoE 2506.19890v1 |
Authors (3): Ziru Zhang, Jiadong Yu, Danny H. K. Tsang
The optimization of quality of experience (QoE) in multi-user virtual reality (VR) interactions demands a delicate balance between ultra-low latency, high-fidelity motion synchronization, and equitable resource allocation. While adaptive keyframe extraction mitigates transmission overhead, existing approaches often overlook the causal relationships among allocated bandwidth, CPU frequency, and user perception, limiting QoE gains. This paper proposes an intelligent framework to maximize QoE by integrating adaptive keyframe extraction with causal-aware reinforcement learning (RL). First, a novel QoE metric is formulated using the Weber-Fechner Law, combining perceptual sensitivity, attention-driven priorities, and motion reconstruction accuracy. The QoE optimization problem is then modeled as a mixed integer programming (MIP) task, jointly optimizing keyframe ratios, bandwidth, and computational resources under horizon-fairness constraints. We propose Partial State Causal Deep Deterministic Policy Gradient (PS-CDDPG), which integrates the Deep Deterministic Policy Gradient (DDPG) method with causal influence detection. By leveraging causal information regarding how QoE is influenced and determined by various actions, we explore actions guided by weights calculated from causal inference (CI), which in turn improves training efficiency. Experiments conducted with the CMU Motion Capture Database demonstrate that our framework significantly reduces interactive latency, enhances QoE, and maintains fairness, achieving superior performance compared to benchmark methods.
Article 533
Title@2025-06-24 (2): Do Vendi Scores Converge with Finite Samples? Truncated Vendi Score for Finite-Sample Convergence Guarantees
Title: Do Vendi Scores Converge with Finite Samples? Truncated Vendi Score for Finite-Sample Convergence Guarantees | Bewältigen sich Vendi-Scores mit Finite-Proben? Beschnittener Vendi-Score für Finite-Sample-Konvergenzgarantien | Vendi 分数是否与有限样本相连接? 2410.21719v3 |
Authors (2): Azim Ospanov, Farzan Farnia
Evaluating the diversity of generative models without reference data poses methodological challenges. The reference-free Vendi and RKE scores address this by quantifying the diversity of generated data using matrix-based entropy measures. Among these two, the Vendi score is typically computed via the eigendecomposition of an $n \times n$ kernel matrix constructed from $n$ generated samples. However, the prohibitive computational cost of eigendecomposition for large $n$ often limits the number of samples used to fewer than 20,000. In this paper, we investigate the statistical convergence of the Vendi and RKE scores under restricted sample sizes. We numerically demonstrate that, in general, the Vendi score computed with standard sample sizes below 20,000 may not converge to its asymptotic value under infinite sampling. To address this, we introduce the $t$-truncated Vendi score by truncating the eigenspectrum of the kernel matrix, which is provably guaranteed to converge to its population limit with $n=\mathcal{O}(t)$ samples. We further show that existing Nyström and FKEA approximation methods converge to the asymptotic limit of the truncated Vendi score. In contrast to the Vendi score, we prove that the RKE score enjoys universal convergence guarantees across all kernel functions. We conduct several numerical experiments to illustrate the concentration of Nyström and FKEA computed Vendi scores around the truncated Vendi score, and we analyze how the truncated Vendi and RKE scores correlate with the diversity of image and text data. The code is available at https://github.com/aziksh-ospanov/truncated-vendi.
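A small sketch may help make the eigenvalue-based computation concrete. The Vendi score below is the exponential of the Shannon entropy of the eigenvalues of the normalized kernel matrix. The truncated variant keeps the top $t$ eigenvalues and spreads the leftover mass evenly over them, which is an assumed renormalization and should be checked against the paper and the linked repository. The Gaussian kernel and its bandwidth are also illustrative choices.

```python
import numpy as np

def kernel_eigenvalues(samples, bandwidth=1.0):
    # Gaussian kernel matrix, divided by n so the eigenvalues sum to 1.
    sq_dists = np.sum((samples[:, None, :] - samples[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    eigvals = np.linalg.eigvalsh(kernel / len(samples))
    return np.clip(eigvals, 0.0, None)[::-1]  # descending order

def vendi_score(eigvals):
    # exp of the Shannon entropy of the eigenvalue distribution.
    positive = eigvals[eigvals > 0]
    return float(np.exp(-np.sum(positive * np.log(positive))))

def truncated_vendi_score(eigvals, t):
    # Keep the top-t eigenvalues and spread the leftover mass evenly over them
    # (assumed renormalization, not confirmed by the abstract).
    top = eigvals[:t]
    top = top + (1.0 - top.sum()) / len(top)
    return vendi_score(top)

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 2))
eigvals = kernel_eigenvalues(samples)
full = vendi_score(eigvals)
truncated = truncated_vendi_score(eigvals, t=10)
print(full, truncated)
```

By construction the truncated score is bounded above by `t`, whereas the full score can grow up to the number of samples.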
Article 534
Title@2025-06-24 (2): NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs
Title: NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs | NAADA: A Noise-Aware Aufmerksamkeit Denoising Autoencoder für Dental Panoramic Radiographs | a. 用于牙科全景射电辐射仪的噪音警报器注意自动编码器 2506.19387v1 |
Authors (3): Khuram Naveed, Bruna Neves de Freitas, Ruben Pauwels
Convolutional denoising autoencoders (DAEs) are powerful tools for image restoration. However, they inherit a key limitation of convolutional neural networks (CNNs): they tend to recover low-frequency features, such as smooth regions, more effectively than high-frequency details. This leads to the loss of fine details, which is particularly problematic in dental radiographs where preserving subtle anatomical structures is crucial. While self-attention mechanisms can help mitigate this issue by emphasizing important features, conventional attention methods often prioritize features corresponding to cleaner regions and may overlook those obscured by noise. To address this limitation, we propose a noise-aware self-attention method, which allows the model to effectively focus on and recover key features even within noisy regions. Building on this approach, we introduce the noise-aware attention-enhanced denoising autoencoder (NAADA) network for enhancing noisy panoramic dental radiographs. Compared with recent state-of-the-art (and much heavier) methods such as Uformer and MResDNN, our method improves the reconstruction of fine details, ensuring better image quality and diagnostic accuracy.
Article 535
Title@2025-06-24 (2): Deep Electromagnetic Structure Design Under Limited Evaluation Budgets
Title: Deep Electromagnetic Structure Design Under Limited Evaluation Budgets | Deep Elektromagnetic Structure Design unter begrenzter Bewertung Budgets | 有限评价预算下的深电磁结构设计 2506.19384v1 |
Authors (5): Shijian Zheng, Fangxiao Jin, Shuhai Zhang, Quan Xue, Mingkui Tan
Electromagnetic structure (EMS) design plays a critical role in developing advanced antennas and materials, but remains challenging due to high-dimensional design spaces and expensive evaluations. While existing methods commonly employ high-quality predictors or generators to reduce the number of evaluations, they are often data-intensive and struggle with real-world scale and budget constraints. To address this, we propose a novel method called Progressive Quadtree-based Search (PQS). Rather than exhaustively exploring the high-dimensional space, PQS converts the conventional image-like layout into a quadtree-based hierarchical representation, enabling a progressive search from global patterns to local details. Furthermore, to lessen reliance on highly accurate predictors, we introduce a consistency-driven sample selection mechanism. This mechanism quantifies the reliability of predictions, balancing exploitation and exploration when selecting candidate designs. We evaluate PQS on two real-world engineering tasks, i.e., Dual-layer Frequency Selective Surface and High-gain Antenna. Experimental results show that our method can achieve satisfactory designs under limited computational budgets, outperforming baseline methods. In particular, compared to generative approaches, it cuts evaluation costs by 75-85%, effectively saving 20.27-38.80 days of the product design cycle.
Article 536
Title@2025-06-24 (2): Explainable Artificial Intelligence Credit Risk Assessment using Machine Learning
Title: Explainable Artificial Intelligence Credit Risk Assessment using Machine Learning | Erklärbare Künstliche Intelligenz Bonitätsbeurteilung mittels maschinellem Lernen | 利用机器学习进行可解释的人工智能信息信用风险评估 2506.19383v1 |
Authors (2): Shreya, Harsh Pathak
This paper presents an intelligent and transparent AI-driven system for Credit Risk Assessment using three state-of-the-art ensemble machine learning models combined with Explainable AI (XAI) techniques. The system leverages XGBoost, LightGBM, and Random Forest algorithms for predictive analysis of loan default risks, addressing the challenges of model interpretability using SHAP and LIME. Preprocessing steps include custom imputation, one-hot encoding, and standardization. Class imbalance is managed using SMOTE, and hyperparameter tuning is performed with GridSearchCV. The models are evaluated on multiple performance metrics including ROC-AUC, precision, recall, and F1-score. LightGBM emerges as the most business-optimal model, with the highest accuracy and the best trade-off between approval and default rates. Furthermore, the system generates applicant-specific XAI visual reports and business impact summaries to ensure transparent decision-making.
Article 537
Title@2025-06-24 (2): ReDit: Reward Dithering for Improved LLM Policy Optimization
Title: ReDit: Reward Dithering for Improved LLM Policy Optimization | ReDit: Belohnung für verbesserte LLM-Policy-Optimierung | Redit:为改进LLM政策优化而向优利分差 2506.18631v2 |
Authors (6): Chenxing Wei, Jiarui Yu, Ying Tiffany He, Hande Dong, Yao Shu, Fei Yu
DeepSeek-R1 has successfully enhanced Large Language Model (LLM) reasoning capabilities through its rule-based reward system. While it is a "perfect" reward system that effectively mitigates reward hacking, such reward functions are often discrete. Our experimental observations suggest that discrete rewards can lead to gradient anomalies, unstable optimization, and slow convergence. To address this issue, we propose ReDit (Reward Dithering), a method that dithers the discrete reward signal by adding simple random noise. With this perturbed reward, exploratory gradients are continuously provided throughout the learning process, enabling smoother gradient updates and accelerating convergence. The injected noise also introduces stochasticity into flat reward regions, encouraging the model to explore novel policies and escape local optima. Experiments across diverse tasks demonstrate the effectiveness and efficiency of ReDit. On average, ReDit achieves performance comparable to vanilla GRPO with only approximately 10% of the training steps and, furthermore, still exhibits a 4% performance improvement over vanilla GRPO when trained for a similar duration. Visualizations confirm significant mitigation of gradient issues with ReDit. Moreover, theoretical analyses are provided to further validate these advantages.
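The effect of dithering on a flat reward region can be seen in a toy example. The sketch below assumes GRPO-style group-normalized advantages and zero-mean Gaussian noise. The noise scale `sigma`, the use of the population standard deviation, and the helper names are illustrative assumptions, not details taken from the paper.

```python
import random
import statistics

def group_advantages(rewards, eps=1e-6):
    # GRPO-style group-normalized advantages: (r - mean) / (std + eps).
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def dither(rewards, sigma=0.05, rng=None):
    # Add zero-mean Gaussian noise to each discrete reward (sigma is an assumed value).
    rng = rng or random.Random(0)
    return [r + rng.gauss(0.0, sigma) for r in rewards]

discrete = [0.0, 0.0, 0.0, 0.0]                  # every sampled response failed
flat = group_advantages(discrete)                # all zeros: no learning signal
perturbed = group_advantages(dither(discrete))   # non-zero advantages after dithering
print(flat)
print(perturbed)
```

When every response in a group receives the same discrete reward, the normalized advantages vanish and so does the policy gradient. The perturbed rewards break the tie, which is the intuition behind the "exploratory gradients" mentioned in the abstract.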
Article 538
Title@2025-06-24 (2): Path Learning with Trajectory Advantage Regression
Title: Path Learning with Trajectory Advantage Regression | Pfad-Lernen mit Trajektor-Vorteil Regression | 路径学习与轨迹优于后退的路径学习 2506.19375v1 |
Authors (1): Kohei Miyaguchi
In this paper, we propose trajectory advantage regression, a method of offline path learning and path attribution based on reinforcement learning. The proposed method can be used to solve path optimization problems while algorithmically only solving a regression problem.
Article 539
Title@2025-06-24 (2): Flopping for FLOPs: Leveraging equivariance for computational efficiency
Title: Flopping for FLOPs: Leveraging equivariance for computational efficiency | Flopping für FLOPs: Equivarianz für Berechnungseffizienz | FLOPs 的浮动 : 利用计算效率的等差 2502.05169v2 |
Authors (3): Georg Bökman, David Nordström, Fredrik Kahl
Incorporating geometric invariance into neural networks enhances parameter efficiency but typically increases computational costs. This paper introduces new equivariant neural networks that preserve symmetry while maintaining a comparable number of floating-point operations (FLOPs) per parameter to standard non-equivariant networks. We focus on horizontal mirroring (flopping) invariance, common in many computer vision tasks. The main idea is to parametrize the feature spaces in terms of mirror-symmetric and mirror-antisymmetric features, i.e., irreps of the flopping group. This decomposes the linear layers to be block-diagonal, requiring half the number of FLOPs. Our approach reduces both FLOPs and wall-clock time, providing a practical solution for efficient, scalable symmetry-aware architectures.
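The block-diagonal claim can be checked numerically on a toy linear layer. In this sketch the "flop" is modeled as swapping the two halves of a feature vector, which is a simplified stand-in for horizontal mirroring of an image. The matrix sizes and variable names are assumptions chosen for illustration.

```python
import numpy as np

m = 3
rng = np.random.default_rng(0)
a, b = rng.normal(size=(m, m)), rng.normal(size=(m, m))
identity, zeros = np.eye(m), np.zeros((m, m))

# The flop swaps the two halves of a feature vector (toy stand-in for mirroring).
flop = np.block([[zeros, identity], [identity, zeros]])
# Any weight matrix of this form commutes with the flop, i.e. it is equivariant.
weight = np.block([[a, b], [b, a]])

# Orthogonal change of basis to symmetric (x1 + x2) and antisymmetric (x1 - x2) parts.
basis = np.block([[identity, identity], [identity, -identity]]) / np.sqrt(2.0)
weight_irrep = basis @ weight @ basis.T   # equals blockdiag(a + b, a - b)

dense_mults = (2 * m) ** 2   # 4 m^2 multiplications per input vector
block_mults = 2 * m ** 2     # two m x m blocks: half as many
print(np.allclose(weight_irrep[:m, m:], 0.0), dense_mults, block_mults)
```

In the symmetric/antisymmetric basis the off-diagonal blocks vanish, so the layer only needs two `m x m` products instead of one `2m x 2m` product, which is where the halving of FLOPs comes from.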
Article 540
Title@2025-06-24 (2): WebGuard++:Interpretable Malicious URL Detection via Bidirectional Fusion of HTML Subgraphs and Multi-Scale Convolutional BERT
Title: WebGuard++:Interpretable Malicious URL Detection via Bidirectional Fusion of HTML Subgraphs and Multi-Scale Convolutional BERT | WebGuard++:Interpretable bösartige URL-Erkennung durch bidirektionale Fusion von HTML-Subgraphen und multi-Scale Convolutional BERT | WebGuard++: 通过 HTML 子集成和多波段进化 BERT 双向融合来可解释的恶意 URL 探测 2506.19356v1 |
Authors (5): Ye Tian, Zhang Yumin, Yifan Jia, Jianguo Sun, Yanbin Wang
URL+HTML feature fusion shows promise for robust malicious URL detection, since attacker artifacts persist in DOM structures. However, prior work suffers from four critical shortcomings: (1) incomplete URL modeling, failing to jointly capture lexical patterns and semantic context; (2) HTML graph sparsity, where threat-indicative nodes (e.g., obfuscated scripts) are isolated amid benign content, causing signal dilution during graph aggregation; (3) unidirectional analysis, ignoring bidirectional interaction between URL and HTML features; and (4) opaque decisions, lacking attribution to malicious DOM components. To address these challenges, we present WebGuard++, a detection framework with four novel components: 1) Cross-scale URL Encoder: Hierarchically learns local-to-global and coarse-to-fine URL features based on a Transformer network with dynamic convolution. 2) Subgraph-aware HTML Encoder: Decomposes DOM graphs into interpretable substructures, amplifying sparse threat signals via hierarchical feature fusion. 3) Bidirectional Coupling Module: Aligns URL and HTML embeddings through cross-modal contrastive learning, optimizing inter-modal consistency and intra-modal specificity. 4) Voting Module: Localizes malicious regions through consensus voting on malicious subgraph predictions. Experiments show that WebGuard++ achieves significant improvements over state-of-the-art baselines, with 1.1x-7.9x higher TPR at fixed FPRs of 0.001 and 0.0001 across both datasets.
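The headline metric, TPR at a fixed FPR, can be computed from detector scores as in the sketch below. The function name `tpr_at_fpr` and the toy scores are hypothetical. The threshold rule (flag everything scoring strictly above the benign score that would otherwise exceed the budget) is one common convention and may differ from the authors' evaluation code.

```python
def tpr_at_fpr(scores, labels, target_fpr):
    """TPR at the most permissive threshold whose FPR does not exceed target_fpr."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # Number of benign samples that may be flagged without exceeding the target FPR.
    allowed = int(target_fpr * len(negatives))
    # Flag everything scoring strictly above the (allowed + 1)-th highest benign score.
    threshold = negatives[allowed] if allowed < len(negatives) else float("-inf")
    return sum(s > threshold for s in positives) / len(positives)

# Toy detector scores: label 1 = malicious, label 0 = benign.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(tpr_at_fpr(scores, labels, target_fpr=0.25))  # 0.75
print(tpr_at_fpr(scores, labels, target_fpr=0.0))   # 0.5
```

At very small FPR budgets such as 0.001 or 0.0001, the metric only rewards malicious samples that outrank almost all benign ones, so a large benign set is needed to estimate it reliably.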
Article 541
Title@2025-06-24 (2): In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly
Title: In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly | In-Context Occams Razor: Wie Transformer einfachere Hypothesen auf der Fliege bevorzugen | 内文 Occam 的剃刀: 如何在飞行中发生变形人更倾向于简单易碎的假说 2506.19351v1 |
Authors (4): Puneesh Deora, Bhavya Vasudeva, Tina Behnia, Christos Thrampoulidis
In-context learning (ICL) enables transformers to adapt to new tasks through contextual examples without parameter updates. While existing research has typically studied ICL in fixed-complexity environments, practical language models encounter tasks spanning diverse complexity levels. This paper investigates how transformers navigate hierarchical task structures where higher-complexity categories can perfectly represent any pattern generated by simpler ones. We design well-controlled testbeds based on Markov chains and linear regression that reveal that transformers not only identify the appropriate complexity level for each task but also accurately infer the corresponding parameters, even when the in-context examples are compatible with multiple complexity hypotheses. Notably, when presented with data generated by simpler processes, transformers consistently favor the least complex sufficient explanation. We theoretically explain this behavior through a Bayesian framework, demonstrating that transformers effectively implement an in-context Bayesian Occam’s razor by balancing model fit against complexity penalties. We further ablate the roles of model size, training mixture distribution, inference context length, and architecture. Finally, we validate this Occam’s razor-like inductive bias on a pretrained GPT-4 model with Boolean-function tasks as a case study, suggesting it may be inherent to transformers trained on diverse task distributions.
Article 542
Title@2025-06-24 (2): Discrepancy-Aware Graph Mask Auto-Encoder
Title: Discrepancy-Aware Graph Mask Auto-Encoder | Discrepancy-Aware Graph Maske Auto-Encoder | 自动编码器 2506.19343v1 |
Authors (5): Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Weigang Lu
Masked Graph Auto-Encoder, a powerful graph self-supervised training paradigm, has recently shown superior performance in graph representation learning. Existing works typically rely on node contextual information to recover the masked information. However, they fail to generalize well to heterophilic graphs where connected nodes may not be similar, because they focus only on capturing the neighborhood information and ignore the discrepancy information between different nodes, resulting in indistinguishable node representations. In this paper, to address this issue, we propose a Discrepancy-Aware Graph Mask Auto-Encoder (DGMAE). It obtains more distinguishable node representations by reconstructing the discrepancy information of neighboring nodes during the masking process. We conduct extensive experiments on 17 widely-used benchmark datasets. The results show that our DGMAE can effectively preserve the discrepancies of nodes in low-dimensional space. Moreover, DGMAE significantly outperforms state-of-the-art graph self-supervised learning methods on three graph analytic tasks, including node classification, node clustering, and graph classification, demonstrating its remarkable superiority. The code of DGMAE is available at https://github.com/zhengziyu77/DGMAE.
Article 543
Title@2025-06-24 (2): Unlocking Insights Addressing Alcohol Inference Mismatch through Database-Narrative Alignment
Title: Unlocking Insights Addressing Alcohol Inference Mismatch through Database-Narrative Alignment | Unlocking Insights adressing Alcohol Inferenz Mismatch durch Datenbank-Narrative Alignment | 通过数据库-聚合对齐来解锁对酒精推断误差的透视 2506.19342v1 |
Authors (6): Sudesh Bhagat, Raghupathi Kandiboina, Ibne Farabi Shihab, Skylar Knickerbocker, Neal Hawkins, Anuj Sharma
Road traffic crashes are a significant global cause of fatalities, emphasizing the urgent need for accurate crash data to enhance prevention strategies and inform policy development. This study addresses the challenge of alcohol inference mismatch (AIM) by employing database narrative alignment to identify AIM in crash data. A framework was developed to improve data quality in crash management systems and reduce the percentage of AIM crashes. Utilizing the BERT model, the analysis of 371,062 crash records from Iowa (2016-2022) revealed 2,767 AIM incidents, resulting in an overall AIM percentage of 24.03%. Statistical tools, including the Probit Logit model, were used to explore the crash characteristics affecting AIM patterns. The findings indicate that alcohol-related fatal crashes and nighttime incidents have a lower percentage of mismatch, while crashes involving unknown vehicle types and older drivers are more susceptible to mismatch. The geospatial clustering performed as part of this study can identify regions with an increased need for education and training. These insights highlight the necessity for targeted training programs and data management teams to improve the accuracy of crash reporting and support evidence-based policymaking.
Article 544
Title@2025-06-24 (2): CAM-NET: An AI Model for Whole Atmosphere with Thermosphere and Ionosphere Extension
Title: CAM-NET: An AI Model for Whole Atmosphere with Thermosphere and Ionosphere Extension | CAM-NET: Ein KI-Modell für ganze Atmosphäre mit Thermosphäre und Ionosphärenerweiterung | CAM-NET:具有热层和电离层扩展作用的AI全大气模型 2506.19340v1 |
Authors (2): Jiahui Hu, Wenjun Dong
We present Compressible Atmospheric Model-Network (CAM-NET), an AI model designed to predict neutral atmospheric variables from the Earth’s surface to the ionosphere with high accuracy and computational efficiency. Accurate modeling of the entire atmosphere is critical for understanding the upward propagation of gravity waves, which influence upper-atmospheric dynamics and coupling across atmospheric layers. CAM-NET leverages the Spherical Fourier Neural Operator (SFNO) to capture global-scale atmospheric dynamics while preserving the Earth’s spherical structure. Trained on a decade of datasets from the Whole Atmosphere Community Climate Model with thermosphere and ionosphere eXtension (WACCM-X), CAM-NET demonstrates accuracy comparable to WACCM-X while achieving a speedup of over 1000x in inference time; once trained, it can provide a one-year simulation within a few minutes. The model effectively predicts key atmospheric parameters, including zonal and meridional winds, temperature, and time rate of pressure. Inspired by traditional modeling approaches that use external couplers to simulate tracer transport, CAM-NET introduces a modular architecture that explicitly separates tracer prediction from core dynamics. The core backbone of CAM-NET focuses on forecasting primary physical variables (e.g., temperature, wind velocity), while tracer variables are predicted through a lightweight, fine-tuned model. This design allows for efficient adaptation to specific tracer scenarios with minimal computational cost, avoiding the need to retrain the entire model. We have validated this approach on the $O^2$ tracer, demonstrating strong performance and generalization capabilities.
Article 545
Title@2025-06-24 (2): Contrastive Cross-Modal Learning for Infusing Chest X-ray Knowledge into ECGs
Title: Contrastive Cross-Modal Learning for Infusing Chest X-ray Knowledge into ECGs | Kontrastives Cross-Modal-Lernen für das Einbringen von Röntgenwissen im Brustkorb in EKGs | 将切斯特X射线知识注入ECG 2506.19329v1 |
Authors (3): Vineet Punyamoorty, Aditya Malusare, Vaneet Aggarwal
Modern diagnostic workflows are increasingly multimodal, integrating diverse data sources such as medical images, structured records, and physiological time series. Among these, electrocardiograms (ECGs) and chest X-rays (CXRs) are two of the most widely used modalities for cardiac assessment. While CXRs provide rich diagnostic information, ECGs are more accessible and can support scalable early warning systems. In this work, we propose CroMoTEX, a novel contrastive learning-based framework that leverages chest X-rays during training to learn clinically informative ECG representations for multiple cardiac-related pathologies: cardiomegaly, pleural effusion, and edema. Our method aligns ECG and CXR representations using a novel supervised cross-modal contrastive objective with adaptive hard negative weighting, enabling robust and task-relevant feature learning. At test time, CroMoTEX relies solely on ECG input, allowing scalable deployment in real-world settings where CXRs may be unavailable. Evaluated on the large-scale MIMIC-IV-ECG and MIMIC-CXR datasets, CroMoTEX outperforms baselines across all three pathologies, achieving up to 78.31 AUROC on edema. Our code is available at github.com/vineetpmoorty/cromotex.
Article 546
Title@2025-06-24 (2): Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack
Title: Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack | Diffusionsbasierte aufgabenorientierte semantische Kommunikation mit Model Inversion Attack | 以传播为基础的以任务为导向的语义通信与模型反向攻击 2506.19886v1 |
Authors (5): Xuesong Wang, Mo Li, Xingyan Shi, Zhaoqian Liu, Shenghao Yang
Semantic communication has emerged as a promising neural network-based system design for 6G networks. Task-oriented semantic communication is a novel paradigm whose core goal is to efficiently complete specific tasks by transmitting semantic information, optimizing communication efficiency and task performance. The key challenge lies in preserving privacy while maintaining task accuracy, as this scenario is susceptible to model inversion attacks. In such attacks, adversaries can restore or even reconstruct input data by analyzing and processing model outputs, owing to the neural network-based nature of the systems. In addition, traditional systems use image quality indicators (such as PSNR or SSIM) to assess attack severity, which may be inadequate for task-oriented semantic communication, since visual differences do not necessarily ensure semantic divergence. In this paper, we propose a diffusion-based semantic communication framework, named DiffSem, that optimizes semantic information reconstruction through a diffusion mechanism with self-referential label embedding to significantly improve task performance. Our model also compensates for channel noise and adopts semantic information distortion to ensure the robustness of the system in various signal-to-noise ratio environments. To evaluate the attacker’s effectiveness, we propose a new metric that better quantifies the semantic fidelity of estimations from the adversary. Experimental results based on this criterion show that on the MNIST dataset, DiffSem improves the classification accuracy by 10.03% and maintains stable performance under dynamic channels. Our results further demonstrate that significant deviation exists between traditional image quality indicators and the leakage of task-relevant semantic information.
Article 547
Title@2025-06-24 (2): Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups
Title: Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups | Summ-of-Parts: Selbstzuordnende neurale Netzwerke mit Ende-zu-Ende-Lernen von Feature-Gruppen | 部分总和:自成一体的神经网络,以及特异组的端到端学习 2310.16316v4 |
Authors (5): Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong
Self-attributing neural networks (SANNs) present a potential path towards interpretable models for high-dimensional problems, but often face significant trade-offs in performance. In this work, we formally prove a lower bound on errors of per-feature SANNs, whereas group-based SANNs can achieve zero error and thus high performance. Motivated by these insights, we propose Sum-of-Parts (SOP), a framework that transforms any differentiable model into a group-based SANN, where feature groups are learned end-to-end without group supervision. SOP achieves state-of-the-art performance for SANNs on vision and language tasks, and we validate that the groups are interpretable on a range of quantitative and semantic metrics. We further validate the utility of SOP explanations in model debugging and cosmological scientific discovery. Our code is available at https://github.com/BrachioLab/sop
Article 548
Title@2025-06-24 (2): FlightKooba: A Fast Interpretable FTP Model
Title: FlightKooba: A Fast Interpretable FTP Model | FlightKooba: Ein schnell interpretierbares FTP-Modell | 库巴飞行:快速解释式FTP模型 2506.19885v1 |
Authors (5): Jing Lu, Xuan Wu, Yizhun Tian, Songhan Fan, Yali Fang
The Koopman theory is a powerful and effective modeling tool for converting nonlinear systems into linear representations, and flight trajectory prediction (FTP) is a complex nonlinear system. However, current models applying the Koopman theory to FTP tasks are not very effective, model interpretability remains an issue, and the Koopman operators are computationally intensive, resulting in long training times. To address these issues, this paper proposes a new modeling and control framework based on the HIPPO method, the Koopman theory, and state space equations from cybernetics: FlightKooba. Inspired by the idea of structural state space equations, FlightKooba directly constructs the Koopman operators from data. This makes the framework highly interpretable and significantly reduces the number of trainable parameters in the module, thereby greatly reducing training time. Experiments have demonstrated the superiority of the FlightKooba modeling method in terms of time and memory consumption (training time comparable to the Mamba module without using CUDA-level acceleration; memory reduced by more than 50% on most datasets, with a tenfold reduction in the number of parameters), while essentially completing the FTP task. It provides a new method for the fast computation of the Koopman operators, opening up new possibilities for the combination of time series forecasting and control.
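As a rough illustration of what constructing a linear operator directly from data can look like, the sketch below fits a matrix `K` by least squares so that each snapshot is mapped to the next one. This is the textbook dynamic-mode-decomposition fit, not FlightKooba's HIPPO-based construction, which the abstract does not spell out; the function name `fit_linear_operator` and the toy rotation system are assumptions made for the example.

```python
import numpy as np

def fit_linear_operator(snapshots):
    """Least-squares K such that snapshots[:, t+1] ~= K @ snapshots[:, t]."""
    past, future = snapshots[:, :-1], snapshots[:, 1:]
    # Closed-form fit via the pseudo-inverse; no gradient training involved.
    return future @ np.linalg.pinv(past)

# Toy linear system (assumed for illustration): a slowly decaying rotation.
theta = 0.1
A = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(50):
    states.append(A @ states[-1])
snapshots = np.stack(states, axis=1)  # shape (2, 51), one column per time step

K = fit_linear_operator(snapshots)
```

Because the operator comes from a closed-form fit, it adds no trainable parameters, which is one way to read the reduction in training time claimed above.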
Article 549
Title@2025-06-24 (2): Adversarial Attacks on Deep Learning-Based False Data Injection Detection in Differential Relays
Title: Adversarial Attacks on Deep Learning-Based False Data Injection Detection in Differential Relays | Adversariale Angriffe auf tief lernbasierte falsche Dateninjektionserkennung in Differentialrelais | 在差异中继中对深学习假数据输入探测的反向攻击 2506.19302v1 |
Authors (4): Ahmad Mohammad Saber, Aditi Maheshwari, Amr Youssef, Deepa Kundur
The application of Deep Learning-based Schemes (DLSs) for detecting False Data Injection Attacks (FDIAs) in smart grids has attracted significant attention. This paper demonstrates that adversarial attacks, carefully crafted FDIAs, can evade existing DLSs used for FDIA detection in Line Current Differential Relays (LCDRs). We propose a novel adversarial attack framework, utilizing the Fast Gradient Sign Method, which exploits DLS vulnerabilities by introducing small perturbations to LCDR remote measurements, leading to misclassification of the FDIA as a legitimate fault while also triggering the LCDR to trip. We evaluate the robustness of multiple deep learning models, including multi-layer perceptrons, convolutional neural networks, long short-term memory networks, and residual networks, under adversarial conditions. Our experimental results demonstrate that while these models perform well, they exhibit high degrees of vulnerability to adversarial attacks. For some models, the adversarial attack success rate exceeds 99.7%. To address this threat, we introduce adversarial training as a proactive defense mechanism, significantly enhancing the models’ ability to withstand adversarial FDIAs without compromising fault detection accuracy. Our results highlight the significant threat posed by adversarial attacks to DLS-based FDIA detection, underscore the necessity for robust cybersecurity measures in smart grids, and demonstrate the effectiveness of adversarial training in enhancing model robustness against adversarial FDIAs.
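The Fast Gradient Sign Method named in the abstract perturbs an input by a small step in the direction of the sign of the loss gradient. The sketch below applies it to a toy logistic-regression detector; the weights, the input, and the step size `eps` are made-up values for illustration and have nothing to do with the LCDR measurements or models used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Return x + eps * sign(dL/dx) for the binary cross-entropy loss
    of a logistic model p = sigmoid(w . x + b)."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w  # gradient of the loss with respect to the input
    return x + eps * np.sign(grad_x)

# Hypothetical detector weights and a sample with true label 1.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, 0.1, 0.2])
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, eps=0.4)
p_clean = sigmoid(x @ w + b)    # above 0.5: classified as class 1
p_adv = sigmoid(x_adv @ w + b)  # below 0.5: the perturbation flips the decision
```

Each feature moves by at most `eps`, which is what makes the perturbation small in the max-norm sense while still changing the classifier's output.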
Article 550
Title@2025-06-24 (2): LAuReL: Learned Augmented Residual Layer
Title: LAuReL: Learned Augmented Residual Layer | LAuReL: Erlernte Augmented Residual Layer | LauReL: 积累的剩余层 2411.07501v4 |
Authors (3): Gaurav Menghani, Ravi Kumar, Sanjiv Kumar
One of the core pillars of efficient deep learning methods is architectural improvements such as the residual/skip connection, which has led to significantly better model convergence and quality. Since then the residual connection has become ubiquitous in not just convolutional neural networks but also transformer-based architectures, the backbone of LLMs. In this paper we introduce Learned Augmented Residual Layer (LAuReL) – a novel generalization of the canonical residual connection – with the goal to be an in-situ replacement of the latter while outperforming on both model quality and footprint metrics. Our experiments show that using LAuReL can help boost performance for both vision and language models. For example, on the ResNet-50, ImageNet 1K task, it achieves 60% of the gains from adding an extra layer, while only adding 0.003% more parameters, and matches it while adding 2.6 times fewer parameters. Similarly, when pre-training 1B and 4B parameter LLMs, LAuReL improves performance on a variety of challenging downstream evaluation tasks by 2.54% to 20.05%, while adding only 0.012% and 0.1% additional parameters, respectively.
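The abstract does not give the LAuReL formula, so the following is only one plausible reading of "generalizing the residual connection": replace the fixed sum `f(x) + x` with a weighted sum whose scalar weights are learned. The names `alpha` and `beta`, and the `tanh` stand-in for a network block, are assumptions for illustration and may differ from the variants defined in the paper.

```python
import numpy as np

def canonical_residual(x, f):
    """Standard skip connection: f(x) + x."""
    return f(x) + x

def learned_weighted_residual(x, f, alpha, beta):
    """Hypothetical generalization: alpha * f(x) + beta * x,
    where alpha and beta would be learnable scalars."""
    return alpha * f(x) + beta * x

def block(x):
    # Stand-in for a network block (assumed for the example).
    return np.tanh(x)

x = np.array([0.5, -1.0, 2.0])
baseline = canonical_residual(x, block)
same = learned_weighted_residual(x, block, alpha=1.0, beta=1.0)
reweighted = learned_weighted_residual(x, block, alpha=0.8, beta=1.2)
```

With `alpha = beta = 1` the sketch reduces to the canonical connection, and it adds only two scalars per layer, which is consistent in spirit with the very small parameter overheads reported above.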
Article 551
Title@2025-06-24 (2): SycnMapV2: Robust and Adaptive Unsupervised Segmentation
Title: SycnMapV2: Robust and Adaptive Unsupervised Segmentation | SycnMapV2: Robuste und adaptive unüberwachte Segmentierung | SycnMapV2:强力和适应性不受监督的分割 2506.16297v2 |
Authors (3): Heng Zhang, Zikang Wan, Danilo Vasconcellos Vargas
Human vision excels at segmenting visual cues without the need for explicit training, and it remains remarkably robust even as noise severity increases. In contrast, existing AI algorithms struggle to maintain accuracy under similar conditions. Here, we present SyncMapV2, the first to solve unsupervised segmentation with state-of-the-art robustness. SyncMapV2 exhibits a minimal drop in mIoU, only 0.01%, under digital corruption, compared to a 23.8% drop observed in SOTA methods. This superior performance extends across various types of corruption: noise (7.3% vs. 37.7%), weather (7.5% vs. 33.8%), and blur (7.0% vs. 29.5%). Notably, SyncMapV2 accomplishes this without any robust training, supervision, or loss functions. It is based on a learning paradigm that uses self-organizing dynamical equations combined with concepts from random networks. Moreover, unlike conventional methods that require re-initialization for each new input, SyncMapV2 adapts online, mimicking the continuous adaptability of human vision. Thus, we go beyond the accurate and robust results, and present the first algorithm that can do all the above online, adapting to input rather than re-initializing. In adaptability tests, SyncMapV2 demonstrates near-zero performance degradation, which motivates and fosters a new generation of robust and adaptive intelligence in the near future.
Article 552
Title@2025-06-24 (2): The Effect of Depth on the Expressivity of Deep Linear State-Space Models
Title: The Effect of Depth on the Expressivity of Deep Linear State-Space Models | Der Effekt der Tiefe auf die Expressivität von Deep Linear State-Space-Modellen | 深度对深线国家空间模型-深线国家空间模型的表达性的影响 2506.19296v1 |
Authors (4): Zeyu Bao, Penghao Yu, Haotian Jiang, Qianxiao Li
Deep state-space models (SSMs) have gained increasing popularity in sequence modelling. While there are numerous theoretical investigations of shallow SSMs, how the depth of the SSM affects its expressiveness remains a crucial problem. In this paper, we systematically investigate the role of depth and width in deep linear SSMs, aiming to characterize how they influence the expressive capacity of the architecture. First, we rigorously prove that in the absence of parameter constraints, increasing depth and increasing width are generally equivalent, provided that the parameter count remains within the same order of magnitude. However, under the assumption that the parameter norms are constrained, the effects of depth and width differ significantly. We show that a shallow linear SSM with large parameter norms can be represented by a deep linear SSM with smaller norms using a constructive method. In particular, this demonstrates that deep SSMs are more capable of representing targets with large norms than shallow SSMs under norm constraints. Finally, we derive upper bounds on the minimal depth required for a deep linear SSM to represent a given shallow linear SSM under constrained parameter norms. We also validate our theoretical results with numerical experiments.
Article 553
Title@2025-06-24 (2): Efficient Extreme Operating Condition Search for Online Relay Setting Calculation in Renewable Power Systems Based on Parallel Graph Neural Network
Title: Efficient Extreme Operating Condition Search for Online Relay Setting Calculation in Renewable Power Systems Based on Parallel Graph Neural Network | Effiziente extreme Betriebsbedingungen Suche nach Online-Relay-Setting-Berechnung in erneuerbaren Stromsystemen basierend auf parallelem Graphen-Neural-Netzwerk | 以平行图形神经网络为基础的可再生能源系统在线中继设置计算 2506.19289v1 |
Authors (7): Yan Li, Zengli Yang, Youhuai Wang, Jing Wang, Xiaoyu Han, Jingyu Wang, Dongyuan Shi
The Extreme Operating Conditions Search (EOCS) problem is one of the key problems in relay setting calculation, which is used to ensure that the setting values of protection relays can adapt to the changing operating conditions of power systems over a period of time after deployment. The high penetration of renewable energy and the wide application of inverter-based resources make the operating conditions of renewable power systems more volatile, which urges the adoption of the online relay setting calculation strategy. However, the computation speed of existing EOCS methods based on local enumeration, heuristic algorithms, and mathematical programming cannot meet the efficiency requirement of online relay setting calculation. To reduce the time overhead, this paper, for the first time, proposes an efficient deep learning-based EOCS method suitable for online relay setting calculation. First, the power system information is formulated as four layers, i.e., a component parameter layer, a topological connection layer, an electrical distance layer, and a graph distance layer, which are fed into a parallel graph neural network (PGNN) model for feature extraction. Then, the four feature layers corresponding to each node are spliced and stretched, and then fed into the decision network to predict the extreme operating condition of the system. Finally, the proposed PGNN method is validated on the modified IEEE 39-bus and 118-bus test systems, where some of the synchronous generators are replaced by renewable generation units. The nonlinear fault characteristics of renewables are fully considered when computing fault currents. The experiment results show that the proposed PGNN method achieves higher accuracy than the existing methods in solving the EOCS problem. Meanwhile, it also provides greater improvements in online computation time.
Article 554
Title@2025-06-24 (2): Information-Theoretic Proofs for Diffusion Sampling
Title: Information-Theoretic Proofs for Diffusion Sampling | Informationstheoretische Nachweise für die Diffusionsprobenahme | 用于扩散取样的信息理论证据 2502.02305v2 |
Authors (2): Galen Reeves, Henry D. Pfister
This paper provides an elementary, self-contained analysis of diffusion-based sampling methods for generative modeling. In contrast to existing approaches that rely on continuous-time processes and then discretize, our treatment works directly with discrete-time stochastic processes and yields precise non-asymptotic convergence guarantees under broad assumptions. The key insight is to couple the sampling process of interest with an idealized comparison process that has an explicit Gaussian-convolution structure. We then leverage simple identities from information theory, including the I-MMSE relationship, to bound the discrepancy (in terms of the Kullback-Leibler divergence) between these two discrete-time processes. In particular, we show that, if the diffusion step sizes are chosen sufficiently small and one can approximate certain conditional mean estimators well, then the sampling distribution is provably close to the target distribution. Our results also provide a transparent view on how to accelerate convergence by using additional randomness in each step to match higher-order moments in the comparison process.
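For readers unfamiliar with the I-MMSE relationship mentioned above, its standard scalar form is stated here as background; the notation ($\mathrm{snr}$, $N$, $\mathrm{mmse}$) is the usual textbook one and is not taken from the paper itself. For $Y = \sqrt{\mathrm{snr}}\,X + N$ with $N \sim \mathcal{N}(0,1)$ independent of $X$, and mutual information measured in nats:

$$
\frac{d}{d\,\mathrm{snr}}\, I(X;Y) = \frac{1}{2}\,\mathrm{mmse}(\mathrm{snr}),
\qquad
\mathrm{mmse}(\mathrm{snr}) = \mathbb{E}\!\left[\big(X - \mathbb{E}[X \mid Y]\big)^2\right].
$$

As a quick check, for $X \sim \mathcal{N}(0,1)$ one has $I(X;Y) = \tfrac{1}{2}\ln(1+\mathrm{snr})$, whose derivative is $\tfrac{1}{2}\cdot\tfrac{1}{1+\mathrm{snr}}$, matching $\mathrm{mmse}(\mathrm{snr}) = \tfrac{1}{1+\mathrm{snr}}$. The conditional mean $\mathbb{E}[X \mid Y]$ in this identity is the kind of estimator the abstract requires to be well approximated.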
Article 555
Title@2025-06-24 (2): DF2: Distribution-Free Decision-Focused Learning
Title: DF2: Distribution-Free Decision-Focused Learning | DF2: Verteilungsfreies entscheidungsorientiertes Lernen | DF2:无分发决定-无分发决定-以学习为目的的学习 2308.05889v2 |
Authors (7): Lingkai Kong, Wenhao Mu, Jiaming Cui, Yuchen Zhuang, B. Aditya Prakash, Bo Dai, Chao Zhang
Decision-focused learning (DFL), which differentiates through the KKT conditions, has recently emerged as a powerful approach for predict-then-optimize problems. However, under probabilistic settings, DFL faces three major bottlenecks: model mismatch error, sample average approximation error, and gradient approximation error. Model mismatch error stems from the misalignment between the model’s parameterized predictive distribution and the true probability distribution. Sample average approximation error arises when using finite samples to approximate the expected optimization objective. Gradient approximation error occurs when the objectives are non-convex and KKT conditions cannot be directly applied. In this paper, we present DF2, the first distribution-free decision-focused learning method designed to mitigate these three bottlenecks. Rather than depending on a task-specific forecaster that requires precise model assumptions, our method directly learns the expected optimization function during training. To efficiently learn this function in a data-driven manner, we devise an attention-based model architecture inspired by the distribution-based parameterization of the expected objective. We evaluate DF2 on two synthetic problems and three real-world problems, demonstrating the effectiveness of DF2. Our code is available at: https://github.com/Lingkai-Kong/DF2.
Article 556
Title@2025-06-24 (2): A Batch-Insensitive Dynamic GNN Approach to Address Temporal Discontinuity in Graph Streams
Title: A Batch-Insensitive Dynamic GNN Approach to Address Temporal Discontinuity in Graph Streams | Ein Batch-Insensibler Dynamischer GNN-Ansatz zur Adresse der zeitlichen Diskontinuität in Graph Streams | 处理图表流中时间性失常问题的批量不敏感动态 GNN 方法 2506.19282v1 |
Authors (2): Yang Zhou, Xiaoning Ren
In dynamic graphs, preserving temporal continuity is critical. However, Memory-based Dynamic Graph Neural Networks (MDGNNs) trained with large batches often disrupt event sequences, leading to temporal information loss. This discontinuity not only deteriorates temporal modeling but also hinders optimization by increasing the difficulty of parameter convergence. Our theoretical study quantifies this through a Lipschitz upper bound, showing that large batch sizes enlarge the parameter search space. In response, we propose BADGNN, a novel batch-agnostic framework consisting of two core components: (1) Temporal Lipschitz Regularization (TLR) to control parameter search space expansion, and (2) Adaptive Attention Adjustment (A3) to alleviate attention distortion induced by both regularization and batching. Empirical results on three benchmark datasets show that BADGNN maintains strong performance while enabling significantly larger batch sizes and faster training compared to TGN. Our code is available at https://anonymous.4open.science/r/TGN_Lipichitz-C033/.
Article 557
Title@2025-06-24 (2): STIMULUS: Achieving Fast Convergence and Low Sample Complexity in Stochastic Multi-Objective Learning
Title: STIMULUS: Achieving Fast Convergence and Low Sample Complexity in Stochastic Multi-Objective Learning | STIMULUS: Schnelle Konvergenz und geringe Probenkomplexität im stochastischen Multi-Ziel-Lernen | STIMULUS:在托盘多目的学习中实现快速趋同和低样本复杂性 2506.19883v1 |
Authors (8): Zhuqing Liu, Chaosheng Dong, Michinari Momma, Simone Shao, Shaoyuan Xu, Yan Gao, Haibo Yang, Jia Liu
Recently, multi-objective optimization (MOO) has gained attention for its broad applications in ML, operations research, and engineering. However, MOO algorithm design remains in its infancy and many existing MOO methods suffer from unsatisfactory convergence rate and sample complexity performance. To address this challenge, in this paper, we propose an algorithm called STIMULUS (stochastic path-integrated multi-gradient recursive estimator), a new and robust approach for solving MOO problems. Different from the traditional methods, STIMULUS introduces a simple yet powerful recursive framework for updating stochastic gradient estimates to improve convergence performance with low sample complexity. In addition, we introduce an enhanced version of STIMULUS, termed STIMULUS-M, which incorporates a momentum term to further expedite convergence. We establish $O(1/T)$ convergence rates of the proposed methods for non-convex settings and $O(\exp(-\mu T))$ for strongly convex settings, where $T$ is the total number of iteration rounds. Additionally, we achieve the state-of-the-art $O\left(n+\sqrt{n}\epsilon^{-1}\right)$ sample complexities for non-convex settings and $O\left(n+\sqrt{n}\ln(\mu/\epsilon)\right)$ for strongly convex settings, where $\epsilon>0$ is a desired stationarity error. Moreover, to alleviate the periodic full gradient evaluation requirement in STIMULUS and STIMULUS-M, we further propose enhanced versions with adaptive batching called STIMULUS+/STIMULUS-M+ and provide their theoretical analysis.
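To make the idea of a recursive stochastic gradient estimate with periodic full-gradient evaluation concrete, the sketch below uses the well-known single-objective recursion `v_t = g_i(x_t) - g_i(x_{t-1}) + v_{t-1}` on a least-squares problem. This is an assumption-laden simplification: STIMULUS itself handles several objectives and combines their gradients, which is not shown here, and the function name, step size, and loop lengths are illustrative choices rather than the paper's.

```python
import numpy as np

def recursive_gradient_descent(A, b, x0, lr, epochs, inner_steps, seed=0):
    """Minimize mean_i 0.5 * (A[i] @ x - b[i])**2 with a recursive estimator."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = x0.copy()
    for _ in range(epochs):
        # Periodic full-gradient evaluation restarts the estimator.
        v = A.T @ (A @ x - b) / n
        x_prev, x = x, x - lr * v
        for _ in range(inner_steps):
            i = rng.integers(n)
            g_new = A[i] * (A[i] @ x - b[i])       # sample gradient at current point
            g_old = A[i] * (A[i] @ x_prev - b[i])  # same sample at previous point
            v = g_new - g_old + v                  # recursive update of the estimate
            x_prev, x = x, x - lr * v
    return x

# Synthetic, noiseless regression problem (assumed for illustration).
data_rng = np.random.default_rng(0)
A = data_rng.normal(size=(50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

x_hat = recursive_gradient_descent(A, b, np.zeros(3), lr=0.02,
                                   epochs=300, inner_steps=20)
```

Each inner step touches a single sample, so the full gradient is only computed once per epoch; that periodic full pass is the cost the adaptive-batching variants mentioned above aim to reduce.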
Article 558
Title@2025-06-24 (2): Robust OOD Graph Learning via Mean Constraints and Noise Reduction
Title: Robust OOD Graph Learning via Mean Constraints and Noise Reduction | Robustes OOD Graphenlernen über mittlere Einschränkungen und Lärmreduzierung | 通过中度制约和减少噪音进行强有力的 OOD 图表学习 2506.19281v1 |
Authors (2): Yang Zhou, Xiaoning Ren
Graph Out-of-Distribution (OOD) classification often suffers from sharp performance drops, particularly under category imbalance and structural noise. This work tackles two pressing challenges in this context: (1) the underperformance of minority classes due to skewed label distributions, and (2) their heightened sensitivity to structural noise in graph data. To address these problems, we propose two complementary solutions. First, Constrained Mean Optimization (CMO) improves minority class robustness by encouraging similarity-based instance aggregation under worst-case conditions. Second, the Neighbor-Aware Noise Reweighting (NNR) mechanism assigns dynamic weights to training samples based on local structural consistency, mitigating noise influence. We provide theoretical justification for our methods, and validate their effectiveness with extensive experiments on both synthetic and real-world datasets, showing significant improvements in Graph OOD generalization and classification accuracy. The code for our method is available at: https://anonymous.4open.science/r/CMO-NNR-2F30.
Article 559
Title@2025-06-24 (2): Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach
Title: Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach | Emotion Detection on User Front-Facing App Interfaces für verbesserte Zeitplanoptimierung: Ein Ansatz zum maschinellen Lernen | 增强计划优化的用户前向应用程序接口的情感探测:机械学习方法 2506.19280v1 |
Authors (3): Feiting Yang, Antoine Moevus, Steve Lévesque
Human-Computer Interaction (HCI) has evolved significantly to incorporate emotion recognition capabilities, creating unprecedented opportunities for adaptive and personalized user experiences. This paper explores the integration of emotion detection into calendar applications, enabling user interfaces to dynamically respond to users’ emotional states and stress levels, thereby enhancing both productivity and engagement. We present and evaluate two complementary approaches to emotion detection: a biometric-based method utilizing heart rate (HR) data extracted from electrocardiogram (ECG) signals processed through Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural networks to predict the emotional dimensions of Valence, Arousal, and Dominance; and a behavioral method analyzing computer activity through multiple machine learning models to classify emotions based on fine-grained user interactions such as mouse movements, clicks, and keystroke patterns. Our comparative analysis, from real-world datasets, reveals that while both approaches demonstrate effectiveness, the computer activity-based method delivers superior consistency and accuracy, particularly for mouse-related interactions, which achieved approximately 90% accuracy. Furthermore, GRU networks outperformed LSTM models in the biometric approach, with Valence prediction reaching 84.38% accuracy.
Article 560
Title@2025-06-24 (2): Rare dense solutions clusters in asymmetric binary perceptrons – local entropy via fully lifted RDT
Title: Rare dense solutions clusters in asymmetric binary perceptrons – local entropy via fully lifted RDT | Seltene dichte Lösungen Cluster in asymmetrischen binären Perzeptronen – lokale Entropie über vollständig angehobene RDT | 非对称二进光线 – – 通过完全提升的区域主任小组,当地对流 2506.19276v1 |
Authors (1): Mihailo Stojnic
We study the classical asymmetric binary perceptron (ABP) and the associated local entropy (LE) as a potential source of its algorithmic hardness. Isolation of typical ABP solutions in the SAT phase seemingly suggests a universal algorithmic hardness. Paradoxically, efficient algorithms do exist even for constraint densities $\alpha$ fairly close to, but at a finite distance (the computational gap) from, the capacity. In recent years, the existence of rare large dense clusters and the magical ability of fast algorithms to find them have been posited as the conceptual resolution of this paradox. Monotonicity or breakdown of the LEs associated with such atypical clusters are predicated to play a key role in their thinning-out or even complete defragmentation. The invention of fully lifted random duality theory (fl RDT) [90,93,94] allows studying typical features of random structures. A large deviation upgrade, sfl LD RDT [96,97], moves things further and enables characterizations of atypical features as well. Utilizing the machinery of [96,97], we here develop a generic framework to study LE as an atypical feature of the ABP. Already on the second level of lifting we discover that the LE results closely match those obtained through replica methods. For the classical zero-threshold ABP, we obtain that LE breaks down for $\alpha$ in the $(0.77,0.78)$ interval, which basically matches the $\alpha\sim 0.75-0.77$ range that the current best ABP solvers can handle and effectively indicates that LE’s behavior might indeed be among the key reflections of the presumable existence of the ABP’s computational gaps.
Article 561
Title@2025-06-24 (2): Compound Fault Diagnosis for Train Transmission Systems Using Deep Learning with Fourier-enhanced Representation
Title: Compound Fault Diagnosis for Train Transmission Systems Using Deep Learning with Fourier-enhanced Representation | Compound Fault Diagnose für Zugübertragungssysteme mit Deep Learning mit Fourier-verstärkter Darstellung | 利用Fourier加强的代表制进行深学习,对火车传输系统进行断层分析 2504.07155v2 |
Authors (3): Jonathan Adam Rico, Nagarajan Raghavan, Senthilnath Jayavelu
Fault diagnosis prevents train disruptions by ensuring the stability and reliability of their transmission systems. Data-driven fault diagnosis models have several advantages over traditional methods in terms of dealing with non-linearity, adaptability, scalability, and automation. However, existing data-driven models are trained on separate transmission components and only consider single faults due to the limitations of existing datasets. These models will perform worse in scenarios where components operate with each other at the same time, affecting each component’s vibration signals. To address some of these challenges, we propose a frequency domain representation and a 1-dimensional convolutional neural network for compound fault diagnosis and apply them to the PHM Beijing 2024 dataset, which includes 21 sensor channels, 17 single faults, and 42 compound faults from 4 interacting components, that is, motor, gearbox, left axle box, and right axle box. Our proposed model achieved 97.67% and 93.93% accuracies on the test set with 17 single faults and on the test set with 42 compound faults, respectively.
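A frequency-domain representation of a vibration channel can be as simple as the magnitude of its discrete Fourier transform. The sketch below computes that for a synthetic two-tone signal; the sampling rate, the tones, and the helper name `magnitude_spectrum` are assumptions for illustration, and the paper's exact preprocessing is not described in the abstract.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """One-sided magnitude spectrum of a real-valued 1-D signal."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, spectrum

# Synthetic sensor channel (assumed): 50 Hz and 120 Hz components, 1 s at 1 kHz.
sample_rate = 1000
t = np.arange(1000) / sample_rate
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

freqs, spectrum = magnitude_spectrum(signal, sample_rate)
peak_frequency = freqs[np.argmax(spectrum)]  # dominant component of the channel
```

Stacking one such spectrum per sensor channel would give a multi-channel 1-D input of the kind a 1-dimensional convolutional network can consume.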
Article 562
Title@2025-06-24 (2): A Qubit-Efficient Hybrid Quantum Encoding Mechanism for Quantum Machine Learning
Title: A Qubit-Efficient Hybrid Quantum Encoding Mechanism for Quantum Machine Learning | Ein qubit-effizienter Hybrid-Quantum-Encoding-Mechanismus für das Quantum Machine Learning | 量子机器学习量子编码机制 2506.19275v1 |
Authors (5): Hevish Cowlessur, Tansu Alpcan, Chandra Thapa, Seyit Camtepe, Neel Kanth Kundu
Efficiently embedding high-dimensional datasets onto noisy and low-qubit quantum systems is a significant barrier to practical Quantum Machine Learning (QML). Approaches such as quantum autoencoders can be constrained by current hardware capabilities and may exhibit vulnerabilities to reconstruction attacks due to their invertibility. We propose Quantum Principal Geodesic Analysis (qPGA), a novel, non-invertible method for dimensionality reduction and qubit-efficient encoding. Executed classically, qPGA leverages Riemannian geometry to project data onto the unit Hilbert sphere, generating outputs inherently suitable for quantum amplitude encoding. This technique preserves the neighborhood structure of high-dimensional datasets within a compact latent space, significantly reducing qubit requirements for amplitude encoding. We derive theoretical bounds quantifying qubit requirements for effective encoding onto noisy systems. Empirical results on MNIST, Fashion-MNIST, and CIFAR-10 show that qPGA preserves local structure more effectively than both quantum and hybrid autoencoders. Additionally, we demonstrate that qPGA enhances resistance to reconstruction attacks due to its non-invertible nature. In downstream QML classification tasks, qPGA can achieve over 99% accuracy and F1-score on MNIST and Fashion-MNIST, outperforming quantum-dependent baselines. Initial tests on real hardware and noisy simulators confirm its potential for noise-resilient performance, offering a scalable solution for advancing QML applications.
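The qubit savings of amplitude encoding come from a simple count: a unit-norm vector with `d` entries fits into the amplitudes of `ceil(log2(d))` qubits. The sketch below shows only that classical bookkeeping (zero-padding to a power of two and normalizing onto the unit sphere); it is not qPGA, and the function name `amplitude_encode` is hypothetical.

```python
import math
import numpy as np

def amplitude_encode(x):
    """Pad x to a power-of-two length and normalize it to unit L2 norm.

    Returns the amplitude vector and the number of qubits it occupies."""
    x = np.asarray(x, dtype=float)
    n_qubits = max(1, math.ceil(math.log2(len(x))))
    padded = np.zeros(2 ** n_qubits)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0.0:
        raise ValueError("cannot encode the zero vector")
    return padded / norm, n_qubits

# A flattened 28x28 image has 784 entries and needs 10 qubits;
# a 16-dimensional latent vector needs only 4.
image_state, image_qubits = amplitude_encode(np.ones(784))
latent_state, latent_qubits = amplitude_encode(np.arange(1.0, 17.0))
```

Reducing the dimension before encoding therefore lowers the qubit count logarithmically, which is the motivation for pairing dimensionality reduction with amplitude encoding.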
Article 563
Title@2025-06-24 (2): Stabilizing PDE–ML Coupled System
Title: Stabilizing PDE–ML Coupled System | Stabilisierung des PDE-ML-gekoppelten Systems | 稳定PDE-ML混合系统 2506.19274v1 |
Authors (3): Saad Qadeer, Panos Stinis, Hui Wan
A long-standing obstacle in the use of machine-learnt surrogates with larger PDE systems is the onset of instabilities when solved numerically. Efforts towards ameliorating these have mostly concentrated on improving the accuracy of the surrogates or imbuing them with additional structure, and have garnered limited success. In this article, we study a prototype problem and draw insights that can help with more complex systems. In particular, we focus on a viscous Burgers’-ML system and, after identifying the cause of the instabilities, prescribe strategies to stabilize the coupled system. To improve the accuracy of the stabilized system, we next explore methods based on the Mori–Zwanzig formalism.
Article 564
Title@2025-06-24 (2): Process Reward Models That Think
Title: Process Reward Models That Think | Prozess Belohnung Modelle, die denken | 思考的流程奖励模式 2504.16828v3 |
Authors (8): Muhammad Khalifa, Rishabh Agarwal, Lajanugen Logeswaran, Jaekyeom Kim, Hao Peng, Moontae Lee, Honglak Lee, Lu Wang
Step-by-step verifiers – also known as process reward models (PRMs) – are a key ingredient for test-time scaling. PRMs require step-level supervision, making them expensive to train. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier fine-tuned on orders of magnitude fewer process labels than those required by discriminative PRMs. Our approach capitalizes on the inherent reasoning abilities of long CoT models, and outperforms LLM-as-a-Judge and discriminative verifiers – using only 1% of the process labels in PRM800K – across several challenging benchmarks. Specifically, ThinkPRM beats the baselines on ProcessBench, MATH-500, and AIME '24 under best-of-N selection and reward-guided search. In an out-of-domain evaluation on a subset of GPQA-Diamond and LiveCodeBench, our PRM surpasses discriminative verifiers trained on the full PRM800K by 8% and 4.5%, respectively. Lastly, under the same token budget, ThinkPRM scales up verification compute more effectively than LLM-as-a-Judge, outperforming it by 7.2% on a subset of ProcessBench. Our work highlights the value of generative, long CoT PRMs that can scale test-time compute for verification while requiring minimal supervision for training. Our code, data, and models will be released at https://github.com/mukhal/thinkprm.
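Best-of-N selection with a process reward model can be pictured as scoring every step of each candidate solution and keeping the candidate whose aggregated score is highest. The sketch below assumes the step scores are already available as numbers and aggregates them with `min` (a solution is only as good as its weakest step); both the scores and the choice of aggregation are illustrative assumptions, not details from the paper.

```python
def best_of_n(candidates, step_scores, aggregate=min):
    """Pick the candidate whose aggregated step-level score is highest."""
    if len(candidates) != len(step_scores):
        raise ValueError("need one list of step scores per candidate")
    totals = [aggregate(scores) for scores in step_scores]
    best = max(range(len(candidates)), key=totals.__getitem__)
    return candidates[best], totals[best]

# Hypothetical verifier outputs for three sampled solutions.
candidates = ["solution A", "solution B", "solution C"]
step_scores = [
    [0.9, 0.2, 0.8],   # one weak step drags A down
    [0.7, 0.6, 0.9],
    [0.95, 0.5, 0.4],
]

answer, score = best_of_n(candidates, step_scores)
```

Swapping `aggregate` for a product or a mean changes how harshly a single doubtful step is penalized, which is a design choice left open here.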
Article 565
Title@2025-06-24 (2): Continuous-variable Quantum Diffusion Model for State Generation and Restoration
Title: Continuous-variable Quantum Diffusion Model for State Generation and Restoration | Kontinuierlich-variables Quantendiffusionsmodell für die Zustandserstellung und Wiederherstellung | 国家发电和复原的连续可变量量量传播模型 2506.19270v1 |
Authors (3): Haitao Huang, Chuangtao Chen, Qinglin Zhao
The generation and preservation of complex quantum states against environmental noise are paramount challenges in advancing continuous-variable (CV) quantum information processing. This paper introduces a novel framework based on continuous-variable quantum diffusion principles, synergizing them with CV quantum neural networks (CVQNNs) to address these dual challenges. For the task of state generation, our Continuous-Variable Quantum Diffusion Generative model (CVQD-G) employs a physically driven forward diffusion process using a thermal loss channel, which is then inverted by a learnable, parameter-efficient backward denoising process based on a CVQNN with time-embedding. This framework’s capability is further extended for state recovery by the Continuous-Variable Quantum Diffusion Restoration model (CVQD-R), a specialized variant designed to restore quantum states, particularly coherent states with unknown parameters, from thermal degradation. Extensive numerical simulations validate these dual capabilities, demonstrating the high-fidelity generation of diverse Gaussian (coherent, squeezed) and non-Gaussian (Fock, cat) states, typically with fidelities exceeding 99%, and confirming the model’s ability to robustly restore corrupted states. Furthermore, a comprehensive complexity analysis reveals favorable training and inference costs, highlighting the framework’s efficiency, scalability, and its potential as a robust tool for quantum state engineering and noise mitigation in realistic CV quantum systems.
Article 566
Title@2025-06-24 (2): Learning Treatment Representations for Downstream Instrumental Variable Regression
Title: Learning Treatment Representations for Downstream Instrumental Variable Regression | Lern-Behandlung Darstellungen für Downstream Instrumentale Variable Regression | 下下游工具递退学习治疗代表 2506.02200v2 |
Authors (3): Shiangyi Lin, Hui Lan, Vasilis Syrgkanis
Traditional instrumental variable (IV) estimators face a fundamental constraint: they can only accommodate as many endogenous treatment variables as available instruments. This limitation becomes particularly challenging in settings where the treatment is presented in a high-dimensional and unstructured manner (e.g. descriptions of patient treatment pathways in a hospital). In such settings, researchers typically resort to applying unsupervised dimension reduction techniques to learn a low-dimensional treatment representation prior to implementing IV regression analysis. We show that such methods can suffer from substantial omitted variable bias due to implicit regularization in the representation learning step. We propose a novel approach to construct treatment representations by explicitly incorporating instrumental variables during the representation learning process. Our approach provides a framework for handling high-dimensional endogenous variables with limited instruments. We demonstrate both theoretically and empirically that fitting IV models on these instrument-informed representations ensures identification of directions that optimize outcome prediction. Our experiments show that our proposed methodology improves upon the conventional two-stage approaches that perform dimension reduction without incorporating instrument information.
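For orientation, the conventional baseline that this abstract contrasts against can be sketched as classic two-stage least squares (2SLS). This is a minimal illustration on synthetic data, not the authors' instrument-informed representation method; the function name, the data-generating process, and all variable names are assumptions made for the example.

```python
import numpy as np

def two_stage_least_squares(z, t, y):
    """Classic 2SLS for instruments z (n, m), treatments t (n, p), outcome y (n,)."""
    # Stage 1: project the endogenous treatments onto the instruments.
    stage1, *_ = np.linalg.lstsq(z, t, rcond=None)
    t_hat = z @ stage1
    # Stage 2: regress the outcome on the projected treatments.
    beta, *_ = np.linalg.lstsq(t_hat, y, rcond=None)
    return beta

# Synthetic example (assumed): u is an unobserved confounder, true effect is 2.0.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=(n, 1))            # instrument
u = rng.normal(size=n)                 # unobserved confounder
t = z[:, 0] + u + rng.normal(size=n)   # endogenous treatment
y = 2.0 * t + 3.0 * u + rng.normal(size=n)
t = t[:, None]

ols_estimate = np.linalg.lstsq(t, y, rcond=None)[0][0]   # biased by the confounder
iv_estimate = two_stage_least_squares(z, t, y)[0]        # uses only instrument-driven variation
```

In this toy setup the plain regression is pulled away from the true effect by the confounder, while the 2SLS estimate should land close to it. With one instrument, only one treatment coefficient can be identified this way, which is the constraint the abstract describes.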
Article 567
Title@2025-06-24 (2): Leveraging Large Language Models to Democratize Access to Costly Datasets for Academic Research
Title: Leveraging Large Language Models to Democratize Access to Costly Datasets for Academic Research | Nutzung großer Sprachmodelle zur Demokratisierung des Zugangs zu kostengünstigen Datensätzen für die akademische Forschung | 利用大语言模式使学术研究获得成本昂贵的数据集民主化 2412.02065v2 |
Authors (2): Julian Junyan Wang, Victor Xiaoqi Wang
Unequal access to costly datasets essential for empirical research has long hindered researchers from disadvantaged institutions, limiting their ability to contribute to their fields and advance their careers. Recent breakthroughs in Large Language Models (LLMs) have the potential to democratize data access by automating data collection from unstructured sources. We develop and evaluate a novel methodology using GPT-4o-mini within a Retrieval-Augmented Generation (RAG) framework to collect data from corporate disclosures. Our approach achieves human-level accuracy in collecting CEO pay ratios from approximately 10,000 proxy statements and Critical Audit Matters (CAMs) from more than 12,000 10-K filings, with LLM processing times of 9 and 40 minutes respectively, each at a cost under $10. This stands in stark contrast to the hundreds of hours needed for manual collection or the thousands of dollars required for commercial database subscriptions. To foster a more inclusive research community by empowering researchers with limited resources to explore new avenues of inquiry, we share our methodology and the resulting datasets.
Article 568
Title@2025-06-24 (2): Network Structures as an Attack Surface: Topology-Based Privacy Leakage in Federated Learning
Title: Network Structures as an Attack Surface: Topology-Based Privacy Leakage in Federated Learning | Netzwerkstrukturen als Angriffsfläche: Topologiebasiertes Datenschutz-Leakage im Federated Learning | 网络结构作为攻击表面:联邦学习中的基于地形的隐私渗漏 2506.19260v1 |
Authors (3): Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya
Federated learning systems increasingly rely on diverse network topologies to address scalability and organizational constraints. While existing privacy research focuses on gradient-based attacks, the privacy implications of network topology knowledge remain critically understudied. We conduct the first comprehensive analysis of topology-based privacy leakage across realistic adversarial knowledge scenarios, demonstrating that adversaries with varying degrees of structural knowledge can infer sensitive data distribution patterns even under strong differential privacy guarantees. Through systematic evaluation of 4,720 attack instances, we analyze six distinct adversarial knowledge scenarios: complete topology knowledge and five partial knowledge configurations reflecting real-world deployment constraints. We propose three complementary attack vectors: communication pattern analysis, parameter magnitude profiling, and structural position correlation, achieving success rates of 84.1%, 65.0%, and 47.2% under complete knowledge conditions. Critically, we find that 80% of realistic partial knowledge scenarios maintain attack effectiveness above security thresholds, with certain partial knowledge configurations achieving performance superior to the baseline complete knowledge scenario. To address these vulnerabilities, we propose and empirically validate structural noise injection as a complementary defense mechanism across 808 configurations, demonstrating up to 51.4% additional attack reduction when properly layered with existing privacy techniques. These results establish that network topology represents a fundamental privacy vulnerability in federated learning systems while providing practical pathways for mitigation through topology-aware defense mechanisms.
Article 569
Title@2025-06-24 (2): Personality Prediction from Life Stories using Language Models
Title: Personality Prediction from Life Stories using Language Models | Persönlichkeitsvorhersage aus Lebensgeschichten mit Sprachmodellen | 使用语言模型对生活故事的个性预测 2506.19258v1 |
Authors (5): Rasiq Hussain, Jerry Ma, Rithik Khandelwal, Joshua Oltmanns, Mehak Gupta
Natural Language Processing (NLP) offers new avenues for personality assessment by leveraging rich, open-ended text, moving beyond traditional questionnaires. In this study, we address the challenge of modeling long narrative interviews, each exceeding 2000 tokens, to predict Five-Factor Model (FFM) personality traits. We propose a two-step approach: first, we extract contextual embeddings using sliding-window fine-tuning of pretrained language models; then, we apply Recurrent Neural Networks (RNNs) with attention mechanisms to integrate long-range dependencies and enhance interpretability. This hybrid method effectively bridges the strengths of pretrained transformers and sequence modeling to handle long-context data. Through ablation studies and comparisons with state-of-the-art long-context models such as LLaMA and Longformer, we demonstrate improvements in prediction accuracy, efficiency, and interpretability. Our results highlight the potential of combining language-based features with long-context modeling to advance personality assessment from life narratives.
Article 570
Title@2025-06-24 (2): Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track
Title: Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track | Position: Machine Learning Konferenzen sollten einen “Refutations and Critiques” Track erstellen | 职位:机器学习会议应建立“反驳和批评”轨道 2506.19882v1 |
Authors (15): Rylan Schaeffer, Joshua Kazdan, Yegor Denisov-Blanch, Brando Miranda, Matthias Gerstgrasser, Susan Zhang, Andreas Haupt, Isha Gupta, Elyas Obbad, Jesse Dodge, Jessica Zosa Forde, Koustuv Sinha, Francesco Orabona, Sanmi Koyejo, David Donoho
Science progresses by iteratively advancing and correcting humanity’s understanding of the world. In machine learning (ML) research, rapid advancements have led to an explosion of publications, but have also led to misleading, incorrect, flawed or perhaps even fraudulent studies being accepted and sometimes highlighted at ML conferences due to the fallibility of peer review. While such mistakes are understandable, ML conferences do not offer robust processes to help the field systematically correct when such errors are made. This position paper argues that ML conferences should establish a dedicated “Refutations and Critiques” (R & C) Track. This R & C Track would provide a high-profile, reputable platform to support vital research that critically challenges prior research, thereby fostering a dynamic self-correcting research ecosystem. We discuss key considerations including track design, review principles, and potential pitfalls, and provide an illustrative example submission concerning a recent ICLR 2025 Oral. We conclude that ML conferences should create official, reputable mechanisms to help ML research self-correct.
Article 571
Title@2025-06-24 (2): Robust Behavior Cloning Via Global Lipschitz Regularization
Title: Robust Behavior Cloning Via Global Lipschitz Regularization | Robustes Verhalten Klonen über globale Lipschitz Regularisierung | 强力行为 克隆通过全球自由自由实现正规化 2506.19250v1 |
Authors (5): Shili Wu, Yizhao Jin, Puhua Niu, Aniruddha Datta, Sean B. Andersson
Behavior Cloning (BC) is an effective imitation learning technique and has even been adopted in some safety-critical domains such as autonomous vehicles. BC trains a policy to mimic the behavior of an expert by using a dataset composed of only state-action pairs demonstrated by the expert, without any additional interaction with the environment. However, during deployment, the policy observations may contain measurement errors or adversarial disturbances. Since the observations may deviate from the true states, they can mislead the agent into taking sub-optimal actions. In this work, we use a global Lipschitz regularization approach to enhance the robustness of the learned policy network. We then show that the resulting global Lipschitz property provides a robustness certificate to the policy with respect to different bounded norm perturbations. Then, we propose a way to construct a Lipschitz neural network that ensures the policy robustness. We empirically validate our theory across various environments in Gymnasium. Keywords: Robust Reinforcement Learning; Behavior Cloning; Lipschitz Neural Network
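The link between a global Lipschitz property and a robustness certificate can be illustrated with a standard bound: for a feedforward network with 1-Lipschitz activations such as ReLU, the product of the layers' spectral norms upper-bounds the l2 Lipschitz constant, so an observation perturbation of norm eps can shift the action by at most that bound times eps. The sketch below is an assumption-laden toy (hypothetical function names, tiny hand-picked weights) and is not the paper's network construction or regularizer.

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Product of spectral norms: an upper bound on the l2 Lipschitz constant."""
    return float(np.prod([np.linalg.norm(w, 2) for w in weights]))

def policy(x, weights):
    """Toy ReLU policy network without biases (biases do not change the bound)."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(w @ h, 0.0)
    return weights[-1] @ h

# Hand-picked weights (assumed) so the bound is easy to read off.
weights = [2.0 * np.eye(2), 3.0 * np.eye(2)]
bound = lipschitz_upper_bound(weights)

state = np.array([1.0, -0.5])
perturbed = state + np.array([0.05, 0.05])   # noisy observation of the same state

# Observed change in the action versus the certified worst case.
action_shift = np.linalg.norm(policy(perturbed, weights) - policy(state, weights))
certified_shift = bound * np.linalg.norm(perturbed - state)
```

The observed shift never exceeds the certified one; regularizing the bound during training therefore tightens the certificate, at the possible cost of expressiveness.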
Article 572
Title@2025-06-24 (2): Inference-Time Reward Hacking in Large Language Models
Title: Inference-Time Reward Hacking in Large Language Models | Inferenz-Time Reward Hacking in großen Sprachmodellen | 大语种模型中的推定-时间回授 2506.19248v1 |
Authors (5): Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio du Pin Calmon
A common paradigm to improve the performance of large language models is optimizing for a reward model. Reward models assign a numerical score to LLM outputs indicating, for example, which response would likely be preferred by a user or is most aligned with safety goals. However, reward models are never perfect. They inevitably function as proxies for complex desiderata such as correctness, helpfulness, and safety. By overoptimizing for a misspecified reward, we can subvert intended alignment goals and reduce overall performance – a phenomenon commonly referred to as reward hacking. In this work, we characterize reward hacking in inference-time alignment and demonstrate when and how we can mitigate it by hedging on the proxy reward. We study this phenomenon under Best-of-$n$ (BoN) and Soft-Best-of-$n$ (SBoN), and we introduce Best-of-Poisson (BoP) that provides an efficient, near-exact approximation of the optimal reward-KL divergence policy at inference time. We show that the characteristic pattern of hacking as observed in practice (where the true reward first increases before declining) is an inevitable property of a broad class of inference-time mechanisms, including BoN and BoP. To counter this effect, hedging offers a tactical choice to avoid placing undue confidence in high but potentially misleading proxy reward signals. We introduce HedgeTune, an efficient algorithm to find the optimal inference-time parameter and avoid reward hacking. We demonstrate through experiments that hedging mitigates reward hacking and achieves superior distortion-reward tradeoffs with minimal computational overhead.
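The two inference-time mechanisms named in this abstract can be sketched as follows, assuming the common formulation in which Best-of-n picks the candidate with the highest proxy reward and Soft Best-of-n samples a candidate with probability proportional to exp(reward / temperature). The function names, the temperature parameter name, and the reward values are illustrative assumptions; Best-of-Poisson and HedgeTune are not reproduced here.

```python
import numpy as np

def best_of_n(rewards):
    """Best-of-n: index of the candidate with the highest proxy reward."""
    return int(np.argmax(rewards))

def soft_best_of_n(rewards, temperature, rng):
    """Soft Best-of-n: sample an index with probability proportional to exp(reward / temperature)."""
    logits = np.asarray(rewards, dtype=float) / temperature
    logits -= logits.max()              # stabilize the softmax numerically
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
proxy_rewards = np.array([0.1, 0.7, 0.4, 0.9])   # assumed proxy scores for n = 4 candidates

bon_idx = best_of_n(proxy_rewards)
sbon_idx = soft_best_of_n(proxy_rewards, temperature=0.5, rng=rng)
# A very small temperature makes the soft variant behave like plain Best-of-n.
greedy_idx = soft_best_of_n(proxy_rewards, temperature=1e-6, rng=rng)
```

A larger temperature spreads probability over lower-scoring candidates, which is one way to hedge against a proxy reward that is misleading at its extremes.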
Article 573
Title@2025-06-24 (2): Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning
Title: Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning | Verhaltensanomalienerkennung in verteilten Systemen über Federated Contrastive Learning | 通过联邦反竞争学习在分布式系统中进行行为异常检测 2506.19246v1 |
Authors (6): Renzi Meng, Heyi Wang, Yumeng Sun, Qiyuan Wu, Lian Lian, Renhan Zhang
This paper addresses the increasingly prominent problem of anomaly detection in distributed systems. It proposes a detection method based on federated contrastive learning. The goal is to overcome the limitations of traditional centralized approaches in terms of data privacy, node heterogeneity, and anomaly pattern recognition. The proposed method combines the distributed collaborative modeling capabilities of federated learning with the feature discrimination enhancement of contrastive learning. It builds embedding representations on local nodes and constructs positive and negative sample pairs to guide the model in learning a more discriminative feature space. Without exposing raw data, the method optimizes a global model through a federated aggregation strategy. Specifically, the method uses an encoder to represent local behavior data in high-dimensional space. This includes system logs, operational metrics, and system calls. The model is trained using both contrastive loss and classification loss to improve its ability to detect fine-grained anomaly patterns. The method is evaluated under multiple typical attack types. It is also tested in a simulated real-time data stream scenario to examine its responsiveness. Experimental results show that the proposed method outperforms existing approaches across multiple performance metrics. It demonstrates strong detection accuracy and adaptability, effectively addressing complex anomalies in distributed environments. Through careful design of key modules and optimization of the training mechanism, the proposed method achieves a balance between privacy preservation and detection performance. It offers a feasible technical path for intelligent security management in distributed systems.
Article 574
Title@2025-06-24 (2): Universal kernels via harmonic analysis on Riemannian symmetric spaces
Title: Universal kernels via harmonic analysis on Riemannian symmetric spaces | Universelle Kerne durch harmonische Analyse auf Riemannschen symmetrischen Räumen | 通过对里格曼对称空间的和谐分析实现通用内核 2506.19245v1 |
Authors (3): Franziskus Steinert, Salem Said, Cyrus Mostajeran
The universality properties of kernels characterize the class of functions that can be approximated in the associated reproducing kernel Hilbert space and are of fundamental importance in the theoretical underpinning of kernel methods in machine learning. In this work, we establish fundamental tools for investigating universality properties of kernels in Riemannian symmetric spaces, thereby extending the study of this important topic to kernels in non-Euclidean domains. Moreover, we use the developed tools to prove the universality of several recent examples from the literature on positive definite kernels defined on Riemannian symmetric spaces, thus providing theoretical justification for their use in applications involving manifold-valued data.
Article 575
Title@2025-06-24 (2): SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Title: SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation | SASSHA: Scharfheitsbewusste Adaptive Second-Order-Optimierung mit stabiler hessischer Annäherung | SASSHA: 使用稳定黑森相近的优化度 2502.18153v2 |
Authors (4): Dahun Shin, Dongyeop Lee, Jinseok Chung, Namhoon Lee
Approximate second-order optimization methods often exhibit poorer generalization compared to first-order approaches. In this work, we look into this issue through the lens of the loss landscape and find that existing second-order methods tend to converge to sharper minima compared to SGD. In response, we propose Sassha, a novel second-order method designed to enhance generalization by explicitly reducing sharpness of the solution, while stabilizing the computation of approximate Hessians along the optimization trajectory. In fact, this sharpness minimization scheme is crafted also to accommodate lazy Hessian updates, so as to secure efficiency besides flatness. To validate its effectiveness, we conduct a wide range of standard deep learning experiments where Sassha demonstrates its outstanding generalization performance that is comparable to, and mostly better than, other methods. We provide a comprehensive set of analyses including convergence, robustness, stability, efficiency, and cost.
Article 576
Title@2025-06-24 (2): High precision PINNs in unbounded domains: application to singularity formulation in PDEs
Title: High precision PINNs in unbounded domains: application to singularity formulation in PDEs | Hochpräzise PINNs in ungebundenen Domänen: Anwendung auf Singularitätsformulierung in PDEs | 在无约束域域的高精精密 PINNs: 应用到PDEs 的独一配方 2506.19243v1 |
Authors (5): Yixuan Wang, Ziming Liu, Zongyi Li, Anima Anandkumar, Thomas Y. Hou
We investigate the high-precision training of Physics-Informed Neural Networks (PINNs) in unbounded domains, with a special focus on applications to singularity formulation in PDEs. We propose a modularized approach and study the choices of neural network ansatz, sampling strategy, and optimization algorithm. When combined with rigorous computer-assisted proofs and PDE analysis, the numerical solutions identified by PINNs, provided they are of high precision, can serve as a powerful tool for studying singularities in PDEs. For the 1D Burgers equation, our framework can lead to a solution with very high precision, and for the 2D Boussinesq equation, which is directly related to the singularity formulation in the 3D Euler and Navier-Stokes equations, we obtain a solution whose loss is $4$ digits smaller than that obtained in \cite{wang2023asymptotic} with fewer training steps. We also discuss potential directions for pushing towards machine precision for higher-dimensional problems.
Article 577
Title@2025-06-24 (2): Understanding Reasoning in Thinking Language Models via Steering Vectors
Title: Understanding Reasoning in Thinking Language Models via Steering Vectors | Verständnis von Vernunft im Denken von Sprachmodellen über Lenkungs-Vektoren | 通过指导矢量来理解思考语言模式的理由 2506.18167v2 |
Authors (5): Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda
Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these models achieve improved performance, controlling their reasoning processes remains challenging. This work presents a steering approach for thinking LLMs by analyzing and manipulating specific reasoning behaviors in DeepSeek-R1-Distill models. Through a systematic experiment on 500 tasks across 10 diverse categories, we identify several reasoning behaviors exhibited by thinking models, including expressing uncertainty, generating examples for hypothesis validation, and backtracking in reasoning chains. We demonstrate that these behaviors are mediated by linear directions in the model’s activation space and can be controlled using steering vectors. By extracting and applying these vectors, we provide a method to modulate specific aspects of the model’s reasoning process, such as its tendency to backtrack or express uncertainty. Our approach offers practical tools for steering reasoning processes in thinking models in a controlled and interpretable manner. We validate our steering method using three DeepSeek-R1-Distill models, demonstrating consistent control across different model architectures.
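To make the idea of a linear direction that controls a behavior concrete, here is a minimal sketch assuming one common construction: the steering vector is the difference between mean activations on examples that show the behavior and on examples that do not, and it is added to a hidden state with a scalar strength. The abstract does not specify the extraction procedure, so the function names, the `alpha` parameter, and the synthetic activations are all assumptions for illustration.

```python
import numpy as np

def steering_vector(acts_with, acts_without):
    """Difference of mean activations between examples with and without a behavior."""
    return acts_with.mean(axis=0) - acts_without.mean(axis=0)

def apply_steering(hidden, vector, alpha):
    """Shift hidden states along the steering direction; alpha < 0 suppresses the behavior."""
    return hidden + alpha * vector

# Synthetic activations (assumed): 3 examples per group, hidden size 4.
acts_backtracking = np.full((3, 4), 2.0)
acts_baseline = np.zeros((3, 4))

vec = steering_vector(acts_backtracking, acts_baseline)
hidden = np.zeros((2, 4))                      # two token positions to steer
steered = apply_steering(hidden, vec, alpha=0.5)
```

In a real model the activations would come from a chosen layer's residual stream, and the shift would be applied during generation rather than to a static array.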
Article 578
Title@2025-06-24 (2): A General Framework for Property-Driven Machine Learning
Title: A General Framework for Property-Driven Machine Learning | Ein allgemeiner Rahmen für eigentumsorientiertes maschinelles Lernen | 财产驱动机器学习总框架 2505.00466v2 |
Authors (5): Thomas Flinkow, Marco Casadio, Colin Kessler, Rosemary Monahan, Ekaterina Komendantskaya
Neural networks have been shown to frequently fail to learn critical safety and correctness properties purely from data, highlighting the need for training methods that directly integrate logical specifications. While adversarial training can be used to improve robustness to small perturbations within $\epsilon$-cubes, domains other than computer vision – such as control systems and natural language processing – may require more flexible input region specifications via generalised hyper-rectangles. Differentiable logics offer a way to encode arbitrary logical constraints as additional loss terms that guide the learning process towards satisfying these constraints. In this paper, we investigate how these two complementary approaches can be unified within a single framework for property-driven machine learning, as a step toward effective formal verification of neural networks. We show that well-known properties from the literature are subcases of this general approach, and we demonstrate its practical effectiveness on a case study involving a neural network controller for a drone system. Our framework is made publicly available at https://github.com/tflinkow/property-driven-ml.
Article 579
Title@2025-06-24 (2): Limits of Discrete Energy of Families of Increasing Sets
Title: Limits of Discrete Energy of Families of Increasing Sets | Grenzen der diskreten Energie von Familien zunehmender Sets | 增加组家庭不同能源限度的限制 2504.11302v2 |
Authors (1): Hari Sarang Nathan
The Hausdorff dimension of a set can be detected using the Riesz energy. Here, we consider situations where a sequence of points, $\{x_n\}$, "fills in" a set $E \subset \mathbb{R}^d$ in an appropriate sense and investigate the degree to which the discrete analog to the Riesz energy of these sets can be used to bound the Hausdorff dimension of $E$. We also discuss applications to data science and Erdős/Falconer type problems.
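As a rough illustration of what a discrete analog of the Riesz energy looks like, the sketch below computes the normalized sum of inverse powers of pairwise distances over a finite point set, that is, the sum over i ≠ j of |x_i - x_j|^(-s) divided by N^2. The 1/N^2 normalization and the function name are assumptions for this example; the paper's exact definition may differ.

```python
import numpy as np

def discrete_riesz_energy(points, s):
    """Normalized discrete s-energy: (1 / N^2) * sum over i != j of |x_i - x_j|^(-s)."""
    pts = np.asarray(points, dtype=float)          # shape (N, d)
    n = len(pts)
    diffs = pts[:, None, :] - pts[None, :, :]      # all pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)
    off_diag = ~np.eye(n, dtype=bool)              # exclude the i == j terms
    return float(np.sum(dists[off_diag] ** (-s)) / n**2)

# Three points on the real line with pairwise distances 1, 2, and 3.
energy = discrete_riesz_energy([[0.0], [1.0], [3.0]], s=1.0)
```

For these three points and s = 1, the ordered-pair sum is 2(1 + 1/2 + 1/3) = 11/3, so the normalized energy is 11/27. Tracking how such a quantity grows as more points are added, for different values of s, is the kind of behavior the abstract relates to the dimension of the limiting set.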
Article 580
Title@2025-06-24 (2): Private Model Personalization Revisited
Title: Private Model Personalization Revisited | Private Modell-Personalisierung überarbeitet | 重新研究的私人个人模式 2506.19220v1 |
Authors (3): Conor Snedeker, Xinyu Zhou, Raef Bassily
We study model personalization under user-level differential privacy (DP) in the shared representation framework. In this problem, there are $n$ users whose data is statistically heterogeneous, and their optimal parameters share an unknown embedding $U^* \in\mathbb{R}^{d\times k}$ that maps the user parameters in $\mathbb{R}^d$ to low-dimensional representations in $\mathbb{R}^k$, where $k\ll d$. Our goal is to privately recover the shared embedding and the local low-dimensional representations with small excess risk in the federated setting. We propose a private, efficient federated learning algorithm to learn the shared embedding based on the FedRep algorithm in [CHM+21]. Unlike [CHM+21], our algorithm satisfies differential privacy, and our results hold for the case of noisy labels. In contrast to prior work on private model personalization [JRS+21], our utility guarantees hold under a larger class of users’ distributions (sub-Gaussian instead of Gaussian distributions). Additionally, in natural parameter regimes, we improve the privacy error term in [JRS+21] by a factor of $\widetilde{O}(dk)$. Next, we consider the binary classification setting. We present an information-theoretic construction to privately learn the shared embedding and derive a margin-based accuracy guarantee that is independent of $d$. Our method utilizes the Johnson-Lindenstrauss transform to reduce the effective dimensions of the shared embedding and the users’ data. This result shows that dimension-independent risk bounds are possible in this setting under a margin loss.
Article 581
Title@2025-06-24 (2): Iterative Minimax Games with Coupled Linear Constraints
Title: Iterative Minimax Games with Coupled Linear Constraints | Iterative Minimax Spiele mit gekoppelten linearen Einschränkungen | 带有连线限制的迭接小型游戏 2212.04672v5 |
Authors (3): Huiling Zhang, Zi Xu, Yu-Hong Dai
The study of nonconvex minimax games has gained significant momentum in machine learning and decision science communities due to their fundamental connections to adversarial training scenarios. This work develops a primal-dual alternating proximal gradient (PDAPG) algorithm framework for resolving iterative minimax games featuring nonsmooth nonconvex objectives subject to coupled linear constraints. We establish rigorous convergence guarantees for both nonconvex-strongly concave and nonconvex-concave game configurations, demonstrating that PDAPG achieves an $\varepsilon$-stationary solution within $\mathcal{O}\left( \varepsilon ^{-2} \right)$ iterations for strongly concave settings and $\mathcal{O}\left( \varepsilon ^{-4} \right)$ iterations for concave scenarios. Our analysis provides the first known iteration complexity bounds for this class of constrained minimax games, particularly addressing the critical challenge of coupled linear constraints that induce inherent interdependencies among strategy variables. The proposed game-theoretic framework advances existing solution methodologies by simultaneously handling nonsmooth components and coordinated constraint structures through alternating primal-dual updates.
Article 582
Title@2025-06-23 (1): Transferring Features Across Language Models With Model Stitching
Title: Transferring Features Across Language Models With Model Stitching | Übertragung von Funktionen über Sprachmodelle mit Modellstich | 使用模型裁剪的跨语言模型传输功能 2506.06609v2 |
Authors (4): Alan Chen, Jack Merullo, Alessandro Stolfo, Ellie Pavlick
In this work, we demonstrate that affine mappings between residual streams of language models are a cheap way to effectively transfer represented features between models. We apply this technique to transfer the weights of Sparse Autoencoders (SAEs) between models of different sizes to compare their representations. We find that small and large models learn similar representation spaces, which motivates training expensive components like SAEs on a smaller model and transferring them to a larger model at a FLOPs savings. In particular, using a small-to-large transferred SAE as initialization can lead to 50% cheaper training runs when training SAEs on larger models. Next, we show that transferred probes and steering vectors can effectively recover ground truth performance. Finally, we dive deeper into feature-level transferability, finding that semantic and structural features transfer noticeably differently while specific classes of functional features have their roles faithfully mapped. Overall, our findings illustrate similarities and differences in the linear representation spaces of small and large models and demonstrate a method for improving the training efficiency of SAEs.
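The core operation, an affine map from one model's residual stream to another's, can be sketched with ordinary least squares on paired activations. This is an assumed, simplified setup: the paper may fit the map differently (for example by gradient descent on a downstream loss), and the function name, dimensions, and synthetic activations below are hypothetical.

```python
import numpy as np

def fit_affine_map(src, dst):
    """Least-squares fit of dst ≈ src @ w + b from paired activations."""
    ones = np.ones((src.shape[0], 1))
    src_aug = np.hstack([src, ones])               # extra column of ones absorbs the bias
    solution, *_ = np.linalg.lstsq(src_aug, dst, rcond=None)
    return solution[:-1], solution[-1]             # weight (d_src, d_dst), bias (d_dst,)

# Synthetic paired activations (assumed): small model width 3, large model width 5.
rng = np.random.default_rng(0)
small_acts = rng.normal(size=(200, 3))
true_w = rng.normal(size=(3, 5))
true_b = rng.normal(size=5)
large_acts = small_acts @ true_w + true_b

w, b = fit_affine_map(small_acts, large_acts)
mapped = small_acts @ w + b                        # small-model activations in large-model space
```

Once such a map exists, any linear artifact defined on one residual stream, such as a probe direction or an SAE decoder row, can be pushed through it to obtain a candidate counterpart in the other model.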
Article 583
Title@2025-06-23 (1): Align and Distill: Unifying and Improving Domain Adaptive Object Detection
Title: Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Align and Distill: Domain-Adaptive-Objekterkennung vereinheitlichen und verbessern | 调整和蒸馏:统一和改进域适应性物体探测 2403.12029v4 |
Authors (8): Justin Kay, Timm Haucke, Suzanne Stathatos, Siqi Deng, Erik Young, Pietro Perona, Sara Beery, Grant Van Horn
Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inconsistent implementation practices preventing transparent comparisons of methods, and (c) Lack of generality due to outdated backbones and lack of diversity in benchmarks. We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin. ALDI++ outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes to Foggy Cityscapes, +5.7 AP50 on Sim10k to Cityscapes (where ours is the only method to outperform a fair baseline), and +0.6 AP50 on CFC Kenai to Channel. ALDI and ALDI++ are architecture-agnostic, setting a new state-of-the-art for YOLO and DETR-based DAOD as well without additional hyperparameter tuning. Our framework, dataset, and state-of-the-art method offer a critical reset for DAOD and provide a strong foundation for future research. Code and data are available: https://github.com/justinkay/aldi and https://github.com/visipedia/caltech-fish-counting.
Article 584
Title@2025-06-23 (1): Simulation of a closed-loop dc-dc converter using a physics-informed neural network-based model
Title: Simulation of a closed-loop dc-dc converter using a physics-informed neural network-based model | Simulation eines Closed-Loop-DC-Wandlers mit einem physik-informierten neuronalen Netzwerk-basierten Modell | 使用以物理知情神经网络为基础的模型模拟闭闭环dc-dc转换器 2506.19178v1 |
Authors (3): Marc-Antoine Coulombe, Maxime Berger, Antoine Lesage-Landry
The growing reliance on power electronics introduces new challenges requiring detailed time-domain analyses with fast and accurate circuit simulation tools. Currently, commercial time-domain simulation software relies mainly on physics-based methods to simulate power electronics. Recent work showed that data-driven and physics-informed learning methods can increase simulation speed with limited compromise on accuracy, but many challenges remain before deployment in commercial tools is possible. In this paper, we propose a physics-informed bidirectional long short-term memory neural network (BiLSTM-PINN) model to simulate the time-domain response of a closed-loop dc-dc boost converter for various operating points, parameters, and perturbations. A physics-informed fully-connected neural network (FCNN) and a BiLSTM are also trained to establish a comparison. The three methods are then compared using step-response tests to assess their performance and limitations in terms of accuracy. The results show that the BiLSTM-PINN and BiLSTM models outperform the FCNN model by more than 9 and 4.5 times, respectively, in terms of median RMSE. Their standard deviation values are more than 2.6 and 1.7 times smaller than the FCNN’s, making them more consistent as well. These results illustrate that the proposed BiLSTM-PINN is a potential alternative to other physics-based or data-driven methods for power electronics simulations.
Article 585
Title@2025-06-23 (1): Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes
Title: Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes | Maschinen und mathematische Mutationen: Verwendung von GNNs zur Charakterisierung von Quiver-Mutationsklassen | 机器和数学变异:使用 GNNs 来定性 Quiver 变异分类 2411.07467v2 |
Authors (7): Jesse He, Helen Jenne, Herman Chau, Davis Brown, Mark Raugas, Sara Billey, Henry Kvinge
Machine learning is becoming an increasingly valuable tool in mathematics, enabling one to identify subtle patterns across collections of examples so vast that they would be impossible for a single researcher to feasibly review and analyze. In this work, we use graph neural networks to investigate \emph{quiver mutation} – an operation that transforms one quiver (or directed multigraph) into another – which is central to the theory of cluster algebras with deep connections to geometry, topology, and physics. In the study of cluster algebras, the question of \emph{mutation equivalence} is of fundamental concern: given two quivers, can one efficiently determine if one quiver can be transformed into the other through a sequence of mutations? In this paper, we use graph neural networks and AI explainability techniques to independently discover mutation equivalence criteria for quivers of type $\tilde{D}$. Along the way, we also show that even without explicit training to do so, our model captures structure within its hidden representation that allows us to reconstruct known criteria from type $D$, adding to the growing evidence that modern machine learning models are capable of learning abstract and parsimonious rules from mathematical data.
Article 586
Title@2025-06-23 (1): The Gittins Index: A Design Principle for Decision-Making Under Uncertainty
Title: The Gittins Index: A Design Principle for Decision-Making Under Uncertainty | Der Gittins Index: Ein Design-Prinzip für Entscheidungsfindung unter Unsicherheit | Gittins指数:不确定性下决策的设计原则 2506.10872v2 |
Authors (2): Ziv Scully, Alexander Terenin
The Gittins index is a tool that optimally solves a variety of decision-making problems involving uncertainty, including multi-armed bandit problems, minimizing mean latency in queues, and search problems like the Pandora’s box model. However, despite the above examples and later extensions thereof, the space of problems that the Gittins index can solve optimally is limited, and its definition is rather subtle compared to those of other multi-armed bandit algorithms. As a result, the Gittins index is often regarded as being primarily a concept of theoretical importance, rather than a practical tool for solving decision-making problems. The aim of this tutorial is to demonstrate that the Gittins index can be fruitfully applied to practical problems. We start by giving an example-driven introduction to the Gittins index, then walk through several examples of problems it solves, some optimally and some suboptimally but still with excellent performance. Two practical highlights in the latter category are applying the Gittins index to Bayesian optimization, and applying the Gittins index to minimizing tail latency in queues.
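As a concrete illustration of the Pandora's box setting mentioned above, the sketch below computes the classical reservation value of a single box, which plays the role of the index in that model: the value z at which the expected gain from opening, E[max(V - z, 0)], equals the opening cost. This is the textbook formulation rather than code from the tutorial; the function name, the discrete reward distribution, and the bisection tolerance are assumptions.

```python
def reservation_value(values, probs, cost, tol=1e-9):
    """Solve E[max(V - z, 0)] = cost for z by bisection.

    values/probs describe a discrete reward distribution (hypothetical input
    format); cost must be positive so that a unique root exists.
    """
    def excess(z):
        # Expected improvement over an outside option z, minus the opening cost.
        return sum(p * max(v - z, 0.0) for v, p in zip(values, probs)) - cost

    lo, hi = min(values) - cost, max(values)   # excess(lo) >= 0 > excess(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# A box paying 0 or 10 with equal probability, costing 1 to open:
# 0.5 * (10 - z) = 1 gives z = 8.
z = reservation_value([0.0, 10.0], [0.5, 0.5], 1.0)
```

In the classical model, boxes are opened in decreasing order of this value, and the search stops once the best reward seen so far exceeds every remaining box's value.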
Article 587
Title@2025-06-23 (1): Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms
Title: Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms | Realistische gemeinsame Raumgrenzen für die Bewegungsanalyse gesunder und beeinträchtigter menschlicher Arme lernen | 人体健康与残疾武器运动分析范围空间联合学习现实联合空间边界 2311.10653v3 |
Authors (3): Shafagh Keyvanian, Michelle J. Johnson, Nadia Figueroa
A realistic human kinematic model that satisfies anatomical constraints is essential for human-robot interaction, biomechanics and robot-assisted rehabilitation. Modeling realistic joint constraints, however, is challenging as human arm motion is constrained by joint limits, inter- and intra-joint dependencies, self-collisions, individual capabilities and muscular or neurological constraints which are difficult to represent. Hence, physicians and researchers have relied on simple box-constraints, ignoring important anatomical factors. In this paper, we propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion (RoM) boundaries from motion capture data. This is achieved by fitting a one-class support vector machine to a dataset of upper-limb joint space exploration motions with an efficient hyper-parameter tuning scheme. Our approach outperforms similar works focused on valid RoM learning. Further, we propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms. We validate the metric on healthy subjects physically constrained to emulate hemiplegia and the different disability levels of stroke patients. [https://sites.google.com/seas.upenn.edu/learning-rom]
Article 588
Title@2025-06-23 (1): Distilling Tool Knowledge into Language Models via Back-Translated Traces
Title: Distilling Tool Knowledge into Language Models via Back-Translated Traces | Destillieren von Werkzeugwissen in Sprachmodelle über Back-Germany Traces | 通过后转路径将工具知识提炼成语言模型 2506.19171v1 |
Authors (12): Xingyue Huang, Xianglong Hu, Zifeng Ding, Yuan He, Rishabh, Waleed Alzarooni, Ziyu Ye, Wendong Fan, Bailan He, Haige Bo, Changran Hu, Guohao Li
Large language models (LLMs) often struggle with mathematical problems that require exact computation or multi-step algebraic reasoning. Tool-integrated reasoning (TIR) offers a promising solution by leveraging external tools such as code interpreters to ensure correctness, but it introduces inference-time dependencies that hinder scalability and deployment. In this work, we propose a new paradigm for distilling tool knowledge into LLMs purely through natural language. We first construct a Solver Agent that solves math problems by interleaving planning, symbolic tool calls, and reflective reasoning. Then, using a back-translation pipeline powered by multiple LLM-based agents, we convert interleaved TIR traces into natural language reasoning traces. A Translator Agent generates explanations for individual tool calls, while a Rephrase Agent merges them into a fluent and globally coherent narrative. Empirically, we show that fine-tuning a small open-source model on these synthesized traces enables it to internalize both tool knowledge and structured reasoning patterns, yielding gains on competition-level math benchmarks without requiring tool access at inference.
Article 589
Title@2025-06-23 (1): A Deep Learning Based Method for Fast Registration of Cardiac Magnetic Resonance Images
Title: A Deep Learning Based Method for Fast Registration of Cardiac Magnetic Resonance Images | Eine Deep Learning-basierte Methode zur schnellen Registrierung von Herz-Magnetresonanz-Bildern | 快速登记心电磁共振图像的深学习法 2506.19167v1 |
Authors (1): Benjamin Graham
Image registration is used in many medical image analysis applications, such as tracking the motion of tissue in cardiac images, where cardiac kinematics can be an indicator of tissue health. Registration is a challenging problem for deep learning algorithms because ground truth transformations are not feasible to create, and because there are potentially multiple transformations that can produce images that appear correlated with the goal. Unsupervised methods have been proposed to learn to predict effective transformations, but these methods take significantly longer to predict than established baseline methods. For a deep learning method to see adoption in wider research and clinical settings, it should be designed to run in a reasonable time on common, mid-level hardware. Fast methods have been proposed for the task of image registration but often use patch-based methods which can affect registration accuracy for a highly dynamic organ such as the heart. In this thesis, a fast, volumetric registration model is proposed for use in quantifying cardiac strain. The proposed Deep Learning Neural Network (DLNN) is designed to utilize an architecture that can compute convolutions very efficiently, allowing the model to achieve registration fidelity similar to other state-of-the-art models while taking a fraction of the time to perform inference. The proposed fast and lightweight registration (FLIR) model is used to predict tissue motion which is then used to quantify the non-uniform strain experienced by the tissue. For acquisitions taken from the same patient at approximately the same time, it would be expected that strain values measured between the acquisitions would have very small differences. Using this metric, strain values computed using the FLIR method are shown to be very consistent.
Article 590
Title@2025-06-23 (1): GradualDiff-Fed: A Federated Learning Specialized Framework for Large Language Model
Title: GradualDiff-Fed: A Federated Learning Specialized Framework for Large Language Model | GradualDiff-Fed: Ein Federated Learning Specialized Framework für großes Sprachmodell | 逐步发展伙伴关系:联邦学习大语言模式专门框架 2506.19164v1 |
Authors (2): Amir Faiyaz, Tara Salman
The rapid proliferation of large language models (LLMs) has created an unprecedented demand for fine-tuning models for specialized domains, such as medical science. While federated learning (FL) offers a decentralized and privacy-preserving approach to collaboratively fine-tune LLMs without sharing raw data, it presents significant challenges, particularly in performance and in managing large model sizes efficiently. In this paper, we introduce GradualDiff-Fed, an FL framework designed explicitly for LLMs and the challenge of handling their large parameter counts. GradualDiff-Fed reduces communication costs by transmitting only the difference of model weights rather than the entire model during training rounds. Such an approach significantly improves scalability and communication efficiency, making it more feasible to fine-tune LLMs across distributed clients without compromising performance. Our evaluation demonstrates that GradualDiff-Fed achieves performance on par with centralized training while drastically reducing communication overhead. These results highlight the potential of GradualDiff-Fed as an efficient solution for fine-tuning large models from distributed data in privacy-preserving settings without compromising performance.
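The idea of exchanging weight differences instead of full models can be sketched as follows. This is a generic illustration and not the GradualDiff-Fed implementation: the abstract does not say how the differences are encoded or compressed, so the function names, the flat list-of-floats weight format, and the plain averaging rule are all assumptions.

```python
def client_delta(global_w, local_w):
    """Difference between a client's fine-tuned weights and the global weights.

    Only this delta is sent to the server in the sketch (hypothetical format:
    one flat list of floats per model).
    """
    return [l - g for g, l in zip(global_w, local_w)]


def server_apply(global_w, deltas):
    """Average the client deltas and add the result to the global weights."""
    n = len(deltas)
    return [g + sum(d[i] for d in deltas) / n for i, g in enumerate(global_w)]


global_w = [1.0, 2.0]
local_a = [1.5, 2.0]   # client A after local fine-tuning
local_b = [0.5, 4.0]   # client B after local fine-tuning
deltas = [client_delta(global_w, local_a), client_delta(global_w, local_b)]
new_global = server_apply(global_w, deltas)
```

A dense delta has as many entries as the model itself, so any communication saving has to come from how the delta is represented (for example, if few entries change). That part is outside this sketch.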
Article 591
Title@2025-06-23 (1): ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs
Title: ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs | ProxSparse: Regularisiertes Lernen von halbstrukturierten Sparsity Masken für vortrainierte LLMs | ProxSparse:为预先培训的LMM 定期学习半结构化半结构化的顶罩 2502.00258v2 |
Authors (8): Hongyi Liu, Rajarshi Saha, Zhen Jia, Youngsuk Park, Jiaji Huang, Shoham Sabach, Yu-Xiang Wang, George Karypis
Large Language Models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. Semi-structured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback. We present ProxSparse, a learning-based framework for mask selection enabled by regularized optimization. ProxSparse transforms the rigid, non-differentiable mask selection process into a smoother optimization procedure, allowing gradual mask exploration with flexibility. ProxSparse does not involve additional weight updates once the mask is determined. Our extensive evaluations on 7 widely used models show that ProxSparse consistently outperforms previously proposed semi-structured mask selection methods with significant improvement, demonstrating the effectiveness of our learned approach towards semi-structured pruning.
Article 592
Title@2025-06-23 (1): Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality
Title: Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality | Hintere Kontraktion für Sparse Neuronale Netzwerke in Besov-Räumen mit Intrinsischer Dimensionalität | 贝索夫空间内有内分层的微孔神经网络的皮层收缩 2506.19144v1 |
Authors (4): Kyeongwon Lee, Lizhen Lin, Jaewoo Park, Seonghyun Jeong
This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our analysis shows that Bayesian neural networks equipped with either sparse or continuous shrinkage priors attain the optimal rates which are dependent on the intrinsic dimension of the true structures. Moreover, we show that these priors enable rate adaptation, allowing the posterior to contract at the optimal rate even when the smoothness level of the true function is unknown. The proposed framework accommodates a broad class of functions, including additive and multiplicative Besov functions as special cases. These results advance the theoretical foundations of Bayesian neural networks and provide rigorous justification for their practical effectiveness in high-dimensional, structured estimation problems.
Article 593
Title@2025-06-23 (1): EEG Foundation Challenge: From Cross-Task to Cross-Subject EEG Decoding
Title: EEG Foundation Challenge: From Cross-Task to Cross-Subject EEG Decoding | EEG-Stiftungsherausforderung: Von der Cross-Task zur Cross-Subject-EEG-Dekodierung | EEG基金会挑战:从跨任务到跨主题的EEG解码 2506.19141v1 |
Authors (19): Bruno Aristimunha, Dung Truong, Pierre Guetschel, Seyed Yahya Shirazi, Isabelle Guyon, Alexandre R. Franco, Michael P. Milham, Aviv Dotan, Scott Makeig, Alexandre Gramfort, Jean-Remi King, Marie-Constance Corsi, Pedro A. Valdés-Sosa, Amit Majumdar, Alan Evans, Terrence J Sejnowski, Oren Shriki, Sylvain Chevallier, Arnaud Delorme
Current electroencephalogram (EEG) decoding models are typically trained on small numbers of subjects performing a single task. Here, we introduce a large-scale, code-submission-based competition comprising two challenges. First, the Transfer Challenge asks participants to build and test a model that can zero-shot decode new tasks and new subjects from their EEG data. Second, the Psychopathology factor prediction Challenge asks participants to infer subject measures of mental health from EEG data. For this, we use an unprecedented, multi-terabyte dataset of high-density EEG signals (128 channels) recorded from over 3,000 child to young adult subjects engaged in multiple active and passive tasks. We provide several tunable neural network baselines for each of these two challenges, including a simple network and demographic-based regression models. Developing models that generalise across tasks and individuals will pave the way for ML network architectures capable of adapting to EEG data collected from diverse tasks and individuals. Similarly, predicting mental health-relevant personality trait values from EEG might identify objective biomarkers useful for clinical diagnosis and design of personalised treatment for psychological conditions. Ultimately, the advances spurred by this challenge could contribute to the development of computational psychiatry and useful neurotechnology, and contribute to breakthroughs in both fundamental neuroscience and applied clinical research.
Article 594
Title@2025-06-23 (1): Command-V: Pasting LLM Behaviors via Activation Profiles
Title: Command-V: Pasting LLM Behaviors via Activation Profiles | Befehl-V: Einfügen von LLM-Behaviors über Aktivierungsprofile | 命令- V: 通过激活剖析档粘贴 LLM 行为 2506.19140v1 |
Authors (7): Barry Wang, Avi Schwarzschild, Alexander Robey, Ali Payani, Charles Fleming, Mingjie Sun, Daphne Ippolito
Retrofitting large language models (LLMs) with new behaviors typically requires full finetuning or distillation, costly steps that must be repeated for every architecture. In this work, we introduce Command-V, a backpropagation-free behavior transfer method that copies an existing residual activation adapter from a donor model and pastes its effect into a recipient model. Command-V profiles layer activations on a small prompt set, derives linear converters between corresponding layers, and applies the donor intervention in the recipient’s activation space. This process does not require access to the original training data and needs minimal compute. In three case studies (safety-refusal enhancement, jailbreak facilitation, and automatic chain-of-thought reasoning), Command-V matches or exceeds the performance of direct finetuning while using orders of magnitude less compute. Our code and data are accessible at https://github.com/GithuBarry/Command-V/.
Article 595
Title@2025-06-23 (1): Local Learning Rules for Out-of-Equilibrium Physical Generative Models
Title: Local Learning Rules for Out-of-Equilibrium Physical Generative Models | Lokale Lernregeln für Physische Generative Modelle außerhalb des Equilibriums | 外部平衡物理生成模型的地方学习规则 2506.19136v1 |
Authors (4): Cyrill Bösch, Geoffrey Roeder, Marc Serra-Garcia, Ryan P. Adams
We show that the out-of-equilibrium driving protocol of score-based generative models (SGMs) can be learned via a local learning rule. The gradient with respect to the parameters of the driving protocol is computed directly from force measurements or from observed system dynamics. As a demonstration, we implement an SGM in a network of driven, nonlinear, overdamped oscillators coupled to a thermal bath. We first apply it to the problem of sampling from a mixture of two Gaussians in 2D. Finally, we train a network of 10x10 oscillators to sample images of 0s and 1s from the MNIST dataset.
Article 596
Title@2025-06-23 (1): Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series
Title: Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series | Zeit-IMM: Ein Datensatz und Benchmark für irreguläre multimodale Multivariate Zeitreihen | 时间-IMM:非正常多式联运多变时间序列的数据集和基准 2506.10412v2 |
Authors (7): Ching Chang, Jeehyun Hwang, Yidan Shi, Haixin Wang, Wen-Chih Peng, Tien-Fu Chen, Wei Wang
Time series data in real-world applications such as healthcare, climate modeling, and finance are often irregular, multimodal, and messy, with varying sampling rates, asynchronous modalities, and pervasive missingness. However, existing benchmarks typically assume clean, regularly sampled, unimodal data, creating a significant gap between research and real-world deployment. We introduce Time-IMM, a dataset specifically designed to capture cause-driven irregularity in multimodal multivariate time series. Time-IMM represents nine distinct types of time series irregularity, categorized into trigger-based, constraint-based, and artifact-based mechanisms. Complementing the dataset, we introduce IMM-TSF, a benchmark library for forecasting on irregular multimodal time series, enabling asynchronous integration and realistic evaluation. IMM-TSF includes specialized fusion modules, including a timestamp-to-text fusion module and a multimodality fusion module, which support both recency-aware averaging and attention-based integration strategies. Empirical results demonstrate that explicitly modeling multimodality on irregular time series data leads to substantial gains in forecasting performance. Time-IMM and IMM-TSF provide a foundation for advancing time series analysis under real-world conditions. The dataset is publicly available at https://www.kaggle.com/datasets/blacksnail789521/time-imm/data, and the benchmark library can be accessed at https://anonymous.4open.science/r/IMMTSF_NeurIPS2025.
Article 597
Title@2025-06-23 (1): Riemannian generative decoder
Title: Riemannian generative decoder | Riemannischer Generativ-Decoder | 里伊曼尼基因解码器 2506.19133v1 |
Authors (3): Andreas Bjerregaard, Søren Hauberg, Anders Krogh
Riemannian representation learning typically relies on approximating densities on chosen manifolds. This involves optimizing difficult objectives, potentially harming models. To completely circumvent this issue, we introduce the Riemannian generative decoder which finds manifold-valued maximum likelihood latents with a Riemannian optimizer while training a decoder network. By discarding the encoder, we vastly simplify the manifold constraint compared to current approaches, which often handle only a few specific manifolds. We validate our approach on three case studies – a synthetic branching diffusion process, human migrations inferred from mitochondrial DNA, and cells undergoing a cell division cycle – each showing that learned representations respect the prescribed geometry and capture intrinsic non-Euclidean structure. Our method requires only a decoder, is compatible with existing architectures, and yields interpretable latent spaces aligned with data geometry.
Article 598
Title@2025-06-23 (1): Finding Clustering Algorithms in the Transformer Architecture
Title: Finding Clustering Algorithms in the Transformer Architecture | Clustering-Algorithmen in der Transformer-Architektur finden | 在变换结构中查找聚集的算法 2506.19125v1 |
Authors (5): Kenneth L. Clarkson, Lior Horesh, Takuya Ito, Charlotte Park, Parikshit Ram
The invention of the transformer architecture has revolutionized Artificial Intelligence (AI), yielding unprecedented success in areas such as natural language processing, computer vision, and multimodal reasoning. Despite these advances, it is unclear whether transformers are able to learn and implement precise algorithms. Here, we demonstrate that transformers can exactly implement a fundamental and widely used algorithm for $k$-means clustering: Lloyd’s algorithm. First, we theoretically prove the existence of such a transformer architecture, which we term the $k$-means transformer, that exactly implements Lloyd’s algorithm for $k$-means clustering using the standard ingredients of modern transformers: attention and residual connections. Next, we numerically implement this transformer and demonstrate in experiments the exact correspondence between our architecture and Lloyd’s algorithm, providing a fully neural implementation of $k$-means clustering. Finally, we demonstrate that interpretable alterations (e.g., incorporating layer normalizations or multilayer perceptrons) to this architecture yield diverse and novel variants of clustering algorithms, such as soft $k$-means, spherical $k$-means, trimmed $k$-means, and more. Collectively, our findings demonstrate how transformer mechanisms can precisely map onto algorithmic procedures, offering a clear and interpretable perspective on implementing precise algorithms in transformers.
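For reference, Lloyd's algorithm itself alternates an assignment step and a centroid-update step until the centers stop moving. The sketch below is a plain Python version of that classical procedure, not the paper's transformer construction. The function name, the tuple-based point format, and the rule of keeping an empty cluster's center unchanged are assumptions.

```python
def lloyd(points, centers, max_iters=100):
    """Classical Lloyd iteration for k-means on tuples of floats."""
    centers = list(centers)
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else center          # assumption: empty cluster keeps its center
            for center, members in zip(centers, clusters)
        ]
        if new_centers == centers:          # converged
            break
        centers = new_centers
    return centers


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
fitted = lloyd(points, [(1.0, 1.0), (9.0, 1.0)])
```

The assignment step is a hard nearest-center lookup, and the update step is an average over the assigned points. The abstract states that attention and residual connections suffice to implement these steps exactly; the details of that mapping are in the paper, not in this sketch.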
Article 599
Title@2025-06-23 (1): CUPID: Curating Data your Robot Loves with Influence Functions
Title: CUPID: Curating Data your Robot Loves with Influence Functions | CUPID: Daten kuratieren, die Ihr Roboter mit Einflussfunktionen liebt | CUPID: 计算机器人爱的有影响函数的曲线数据 2506.19121v1 |
Authors (8): Christopher Agia, Rohan Sinha, Jingyun Yang, Rika Antonova, Marco Pavone, Haruki Nishimura, Masha Itkina, Jeannette Bohg
In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - such as closed-loop task success or failure - remains a persistent challenge. We propose CUPID, a robot data curation method based on a novel influence function-theoretic formulation for imitation learning policies. Given a set of evaluation rollouts, CUPID estimates the influence of each training demonstration on the policy’s expected return. This enables ranking and selection of demonstrations according to their impact on the policy’s closed-loop performance. We use CUPID to curate data by 1) filtering out training demonstrations that harm policy performance and 2) subselecting newly collected trajectories that will most improve the policy. Extensive simulated and hardware experiments show that our approach consistently identifies which data drives test-time performance. For example, training with less than 33% of curated data can yield state-of-the-art diffusion policies on the simulated RoboMimic benchmark, with similar gains observed in hardware. Furthermore, hardware experiments show that our method can identify robust strategies under distribution shift, isolate spurious correlations, and even enhance the post-training of generalist robot policies. Additional materials are made available at: https://cupid-curation.github.io.
Article 600
Title@2025-06-23 (1): Blameless Users in a Clean Room: Defining Copyright Protection for Generative Models
Title: Blameless Users in a Clean Room: Defining Copyright Protection for Generative Models | Blameless Users in einem sauberen Raum: Definition des Urheberrechtsschutzes für generative Modelle | 清洁室内的无名用户:界定对创源模式的版权保护 2506.19881v1 |
Authors (1): Aloni Cohen
Are there any conditions under which a generative model’s outputs are guaranteed not to infringe the copyrights of its training data? This is the question of “provable copyright protection” first posed by Vyas, Kakade, and Barak (ICML 2023). They define near access-freeness (NAF) and propose it as sufficient for protection. This paper revisits the question and establishes new foundations for provable copyright protection – foundations that are firmer both technically and legally. First, we show that NAF alone does not prevent infringement. In fact, NAF models can enable verbatim copying, a blatant failure of copy protection that we dub being tainted. Then, we introduce our blameless copy protection framework for defining meaningful guarantees, and instantiate it with clean-room copy protection. Clean-room copy protection allows a user to control their risk of copying by behaving in a way that is unlikely to copy in a counterfactual clean-room setting. Finally, we formalize a common intuition about differential privacy and copyright by proving that DP implies clean-room copy protection when the dataset is golden, a copyright deduplication requirement.
Article 601
Title@2025-06-23 (1): On the algorithmic construction of deep ReLU networks
Title: On the algorithmic construction of deep ReLU networks | Zur algorithmischen Konstruktion von tiefen ReLU-Netzwerken | 关于深ReLU网络的算法构造 2506.19104v1 |
Authors (1): Daan Huybrechs
It is difficult to describe in mathematical terms what a neural network trained on data represents. On the other hand, there is a growing mathematical understanding of what neural networks are in principle capable of representing. Feedforward neural networks using the ReLU activation function represent continuous and piecewise linear functions and can approximate many others. The study of their expressivity addresses the question: which ones? Contributing to the available answers, we take the perspective of a neural network as an algorithm. In this analogy, a neural network is programmed constructively, rather than trained from data. An interesting example is a sorting algorithm: we explicitly construct a neural network that sorts its inputs exactly, not approximately, and that, in a sense, has optimal computational complexity if the input dimension is large. Such constructed networks may have several billion parameters. We construct and analyze several other examples, both existing and new. We find that, in these examples, neural networks as algorithms are typically recursive and parallel. Compared to conventional algorithms, ReLU networks are restricted by having to be continuous. Moreover, the depth of recursion is limited by the depth of the network, with deep networks having superior properties over shallow ones.
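To make the "network as algorithm" view concrete, the sketch below sorts a list using only additions, subtractions, and ReLU. It relies on the identities min(a, b) = a - ReLU(a - b) and max(a, b) = b + ReLU(a - b), arranged as an odd-even transposition sorting network. This is an illustrative construction with depth linear in the input size, not the more efficient network the paper constructs; the function names are assumptions.

```python
def relu(x):
    return max(x, 0)


def compare_exchange(a, b):
    """Return (min(a, b), max(a, b)) using one ReLU and two affine maps."""
    d = relu(a - b)
    return a - d, b + d


def relu_sort(xs):
    """Odd-even transposition network: n layers of ReLU compare-exchange units."""
    xs = list(xs)
    n = len(xs)
    for layer in range(n):
        # Even layers pair (0,1), (2,3), ...; odd layers pair (1,2), (3,4), ...
        for i in range(layer % 2, n - 1, 2):
            xs[i], xs[i + 1] = compare_exchange(xs[i], xs[i + 1])
    return xs


result = relu_sort([3, 1, 2, 5, 4])
```

The output is exact in exact arithmetic, which is why the example uses integers; with floating-point inputs, `a - (a - b)` can differ from `b` by rounding error. Each layer is a continuous piecewise linear map, consistent with the abstract's remark that ReLU networks are restricted to continuous functions.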
Article 602
Title@2025-06-23 (1): ADVLLM: Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities
Title: ADVLLM: Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities | ADVLLM: Iterative Selbst-Tuning LLMs für verbesserte Jailbreaking-Fähigkeiten | ADVLLM: 强化破室能力自动自动自调LMs 2410.18469v4 |
Authors (8): Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao
Recent research has shown that Large Language Models (LLMs) are vulnerable to automated jailbreak attacks, where adversarial suffixes crafted by algorithms and appended to harmful queries bypass safety alignment and trigger unintended responses. Current methods for generating these suffixes are computationally expensive and have low Attack Success Rates (ASR), especially against well-aligned models like Llama2 and Llama3. To overcome these limitations, we introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. Moreover, it exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3. Beyond improving jailbreak ability, ADV-LLM provides valuable insights for future safety alignment research through its ability to generate large datasets for studying LLM safety.
Article 603
Title@2025-06-23 (1): Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
Title: Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks | Code Graph Model (CGM): Ein Graph-integriertes Large Language Model für Repository-Level Software Engineering Aufgaben | 代码图表模型(CGM):存储层软件工程任务 2505.16901v4 |
Authors (15): Hongyuan Tao, Ying Zhang, Zhenhao Tang, Hongen Peng, Xukun Zhu, Bingchang Liu, Yingguang Yang, Ziyin Zhang, Zhaogui Xu, Haipeng Zhang, Linchao Zhu, Rui Wang, Hang Yu, Jianguo Li, Peng Di
Recent advances in Large Language Models (LLMs) have shown promise in function-level code generation, yet repository-level software engineering tasks remain challenging. Current solutions predominantly rely on proprietary LLM agents, which introduce unpredictability and limit accessibility, raising concerns about data privacy and model customization. This paper investigates whether open-source LLMs can effectively address repository-level tasks without requiring agent-based approaches. We demonstrate this is possible by enabling LLMs to comprehend functions and files within codebases through their semantic information and structural dependencies. To this end, we introduce Code Graph Models (CGMs), which integrate repository code graph structures into the LLM’s attention mechanism and map node attributes to the LLM’s input space using a specialized adapter. When combined with an agentless graph RAG framework, our approach achieves a 43.00% resolution rate on the SWE-bench Lite benchmark using the open-source Qwen2.5-72B model. This performance ranks first among open weight models, second among methods with open-source systems, and eighth overall, surpassing the previous best open-source model-based method by 12.33%.
Article 604
Title@2025-06-23 (1): Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation
Title: Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation | Lernen von stochastischen Lehrerdarstellungen mit studentisch geführter Wissensdestillation | 利用学生指导知识蒸馏,从Stochatic教师代表处学习 2504.14307v2 |
Authors (6): Muhammad Haseeb Aslam, Clara Martinez, Marco Pedersoli, Alessandro Koerich, Ali Etemad, Eric Granger
Advances in self-distillation have shown that when knowledge is distilled from a teacher to a student using the same deep learning (DL) architecture, the student performance can surpass the teacher particularly when the network is overparameterized and the teacher is trained with early stopping. Alternatively, ensemble learning also improves performance, although training, storing, and deploying multiple models becomes impractical as the number of models grows. Even distilling an ensemble to a single student model or weight averaging methods first requires training of multiple teacher models and does not fully leverage the inherent stochasticity for generating and distilling diversity in DL models. These constraints are particularly prohibitive in resource-constrained or latency-sensitive applications such as wearable devices. This paper proposes to train only one model and generate multiple diverse teacher representations using distillation-time dropout. However, generating these representations stochastically leads to noisy representations that are misaligned with the learned task. To overcome this problem, a novel stochastic self-distillation (SSD) training strategy is introduced for filtering and weighting teacher representation to distill from task-relevant representations only, using student-guided knowledge distillation (SGKD). The student representation at each distillation step is used as authority to guide the distillation process. Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods without increasing the model size at both training and testing time, and incurs negligible computational complexity compared to state-of-the-art ensemble learning and weight averaging methods.
Article 605
Title@2025-06-23 (1): Finetuning a Weather Foundation Model with Lightweight Decoders for Unseen Physical Processes
Title: Finetuning a Weather Foundation Model with Lightweight Decoders for Unseen Physical Processes | Feinsteuerung eines Weather Foundation Modells mit leichten Decodern für ungesehene physikalische Prozesse | 微调天气基础模型,为未见物理过程使用轻量代谢器 2506.19088v1 |
Authors (6): Fanny Lehmann, Firat Ozdemir, Benedikt Soja, Torsten Hoefler, Siddhartha Mishra, Sebastian Schemm
Recent advances in AI weather forecasting have led to the emergence of so-called “foundation models”, typically defined by expensive pretraining and minimal fine-tuning for downstream tasks. However, in the natural sciences, a desirable foundation model should also encode meaningful statistical relationships between the underlying physical variables. This study evaluates the performance of the state-of-the-art Aurora foundation model in predicting hydrological variables, which were not considered during pretraining. We introduce a lightweight approach using shallow decoders trained on the latent representations of the pretrained model to predict these new variables. As a baseline, we compare this to fine-tuning the full model, which allows further optimization of the latent space while incorporating new variables into both inputs and outputs. The decoder-based approach requires 50% less training time and 35% less memory, while achieving strong accuracy across various hydrological variables and preserving desirable properties of the foundation model, such as autoregressive stability. Notably, decoder accuracy depends on the physical correlation between the new variables and those used during pretraining, indicating that Aurora’s latent space captures meaningful physical relationships. In this sense, we argue that an important quality metric for foundation models in Earth sciences is their ability to be extended to new variables without a full fine-tuning. This provides a new perspective for making foundation models more accessible to communities with limited computational resources, while supporting broader adoption in Earth sciences.
Article 606
Title@2025-06-23 (1): Benchmarking Music Generation Models and Metrics via Human Preference Studies
Title: Benchmarking Music Generation Models and Metrics via Human Preference Studies | Benchmarking von Musikgenerierungsmodellen und Metrics über Human Preference Studies | 通过人类特惠研究制定音乐创作模型和计量基准 2506.19085v1 |
Authors (4): Florian Grötschla, Ahmet Solak, Luca A. Lanzendörfer, Roger Wattenhofer
Recent advancements have brought generated music closer to human-created compositions, yet evaluating these models remains challenging. While human preference is the gold standard for assessing quality, translating these subjective judgments into objective metrics, particularly for text-audio alignment and music quality, has proven difficult. In this work, we generate 6k songs using 12 state-of-the-art models and conduct a survey of 15k pairwise audio comparisons with 2.5k human participants to evaluate the correlation between human preferences and widely used metrics. To the best of our knowledge, this work is the first to rank current state-of-the-art music generation models and metrics based on human preference. To further the field of subjective metric evaluation, we provide open access to our dataset of generated music and human evaluations.
Article 607
Title@2025-06-23 (1): FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation
Title: FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation | FaircauseSyn: Auf dem Weg zu einer ursächlich fairen LLM-generierten synthetischen Datengenerierung | FairCreause Syn: 迈向产生公平而公平的LLM – – 增强的合成数据 2506.19082v1 |
Authors (3): Nitish Nagesh, Ziyu Wang, Amir M. Rahmani
Synthetic data generation creates data based on real-world data using generative models. In health applications, generating high-quality data while maintaining fairness for sensitive attributes is essential for equitable outcomes. Existing GAN-based and LLM-based methods focus on counterfactual fairness and are primarily applied in finance and legal domains. Causal fairness provides a more comprehensive evaluation framework by preserving causal structure, but current synthetic data generation methods do not address it in health settings. To fill this gap, we develop the first LLM-augmented synthetic data generation method to enhance causal fairness using real-world tabular health data. Our generated data deviates by less than 10% from real data on causal fairness metrics. When trained on causally fair predictors, synthetic data reduces bias on the sensitive attribute by 70% compared to real data. This work improves access to fair synthetic data, supporting equitable health research and healthcare delivery.
Article 608
Title@2025-06-23 (1): First-Order Sparse Convex Optimization: Better Rates with Sparse Updates
Title: First-Order Sparse Convex Optimization: Better Rates with Sparse Updates | Sparse Convex Optimization: Bessere Preise mit Sparse-Updates | 第一序式螺旋螺旋式最优化: 与粗序更新相比, 利率更好 。 2506.19075v1 |
Authors (1): Dan Garber
It was recently established that for convex optimization problems with a sparse optimal solution (be it entry-wise sparsity or matrix rank-wise sparsity) it is possible to have linear convergence rates which depend on an improved mixed-norm condition number of the form $\frac{\beta_1 s}{\alpha_2}$, where $\beta_1$ is the $\ell_1$-Lipschitz continuity constant of the gradient, $\alpha_2$ is the $\ell_2$-quadratic growth constant, and $s$ is the sparsity of the optimal solution. However, beyond the improved convergence rate, these methods are unable to leverage the sparsity of optimal solutions to also improve the runtime of each iteration, which may still be prohibitively high for high-dimensional problems. In this work, we establish that linear convergence rates which depend on this improved condition number can be obtained using only sparse updates, which may result in significantly improved overall running times. Moreover, our methods are considerably easier to implement.
Article 609
Title@2025-06-23 (1): Which Company Adjustment Matter? Insights from Uplift Modeling on Financial Health
Title: Which Company Adjustment Matter? Insights from Uplift Modeling on Financial Health | Welches Unternehmen passt sich an? Einblicke aus dem Uplift Modelling on Financial Health | 哪些公司调整重要?从提升金融健康模型中的观点 2506.19049v1 |
Authors (2): Xinlin Wang, Mats Brorsson
Uplift modeling has achieved significant success in various fields, particularly in online marketing. It is a method that primarily utilizes machine learning and deep learning to estimate individual treatment effects. In this paper, we apply uplift modeling to analyze the effect of company adjustments on companies' financial status, treating these adjustments as treatments or interventions. Although there have been extensive studies and applications regarding binary treatments, multiple treatments, and continuous treatments, company adjustments are often more complex than these scenarios, as they constitute a series of multiple time-dependent actions. Estimating the effect of company adjustments needs to take into account not only individual treatment traits but also the temporal order of this series of treatments. This study collects a real-world data set of company financial statements and reported behavior in Luxembourg for the experiments. First, we use two meta-learners and three other well-known uplift models to analyze different company adjustments by simplifying the adjustments to binary treatments. Furthermore, we propose a new uplift modeling framework (MTDnet) to address the time-dependent nature of these adjustments, and the experimental results show the necessity of considering the timing of these adjustments.
Article 610
Title@2025-06-23 (1): Rational Metareasoning for Large Language Models
Title: Rational Metareasoning for Large Language Models | Rationale Metaveraking für große Sprachmodelle | 大语言模型的逻辑比值 2410.05563v3 |
Authors (5): C. Nicolò De Sabbata, Theodore R. Sumers, Badr AlKhamissi, Antoine Bosselut, Thomas L. Griffiths
Being prompted to engage in reasoning has emerged as a core technique for using large language models (LLMs), deploying additional inference-time compute to improve task performance. However, as LLMs increase in both size and adoption, inference costs are correspondingly becoming increasingly burdensome. How, then, might we optimize reasoning’s cost-performance tradeoff? This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to selectively use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting and STaR, our method significantly reduces inference costs (20-37% fewer tokens generated across three models) while maintaining task performance across diverse datasets.
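A minimal sketch of a reward that trades answer quality against reasoning cost, in the spirit of the Value of Computation penalty described above. The function names `voc_reward` and `pick_best`, the per-token cost `token_cost`, and the linear form of the penalty are illustrative assumptions, not the paper's definition.

```python
def voc_reward(utility: float, num_reasoning_tokens: int,
               token_cost: float = 0.001) -> float:
    """Hypothetical reward: task utility minus a linear cost on reasoning tokens."""
    if num_reasoning_tokens < 0:
        raise ValueError("num_reasoning_tokens must be non-negative")
    return utility - token_cost * num_reasoning_tokens


def pick_best(candidates):
    """Return the (utility, num_tokens) pair with the highest reward.

    Assumption: a sample-then-filter training loop would keep the
    highest-reward samples as fine-tuning targets.
    """
    return max(candidates, key=lambda c: voc_reward(c[0], c[1]))
```

Under this assumed reward, two equally correct answers are ranked by length, so the shorter reasoning chain is preferred whenever it does not cost accuracy.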
Article 611
Title@2025-06-23 (1): Self-reflecting Large Language Models: A Hegelian Dialectical Approach
Title: Self-reflecting Large Language Models: A Hegelian Dialectical Approach | Selbstreflektierende große Sprachmodelle: Ein hegelianischer dialektischer Ansatz | 自我反映大语言模型:海格利人对立方法 2501.14917v6 |
Authors (6): Sara Abdali, Can Goksen, Michael Solodko, Saeed Amizadeh, Julie E. Maybee, Kazuhito Koishida
Investigating NLP through a philosophical lens has recently caught researchers’ eyes, as it bridges computational methods with classical schools of philosophy. This paper introduces a philosophical framework inspired by the Hegelian Dialectic to enable LLMs’ self-reflection, utilizing a self-dialectical approach to emulate internal critiques and synthesize new scientific ideas (spanning domains such as mathematics, physics, and more). Additionally, we explore the effect of generation temperature in LLMs by introducing a dynamic annealing approach, which encourages creativity in the early stages and gradually focuses on refinement and nuance, as well as a constant-temperature strategy. Furthermore, we implement a Multi-Agent Majority Voting (MAMV) strategy to assess the validity and novelty of the generated ideas, which proves useful in the absence of domain experts. We also evaluate the effectiveness of our method in generating novel scientific ideas and improving LLMs’ reasoning capabilities. Our experiments demonstrate promising results in ideation, along with significant improvements in mathematical and symbolic reasoning.
Article 612
Title@2025-06-23 (1): Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Title: Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training | Critical Batch Size Revisited: Ein einfacher empirischer Ansatz für großflächige Sprachmodellschulungen | 重新审视关键批量大小:大型批量语文示范培训的简单经验方法 2505.23971v2 |
Authors (4): William Merrill, Shane Arora, Dirk Groeneveld, Hannaneh Hajishirzi
The right batch size is important when training language models at scale: a large batch size is necessary for fast training, but a batch size that is too large will harm token efficiency. To navigate this tradeoff, McCandlish et al. (2018) suggest that a critical batch size (CBS), below which training will not substantially degrade loss, can be estimated based on the gradient noise scale during training. While their method has been adopted in practice, e.g., when training GPT-3, strong assumptions are required to justify gradient noise as a proxy for the CBS, which makes it unclear whether their approach should be trusted in practice, limiting its applicability. In this paper, we introduce a simple, empirical approach to directly measure the CBS and show how the CBS evolves over training. Applying our approach to the OLMo models, we find that CBS is near 0 at initialization, increases rapidly at first, and then plateaus as training progresses. Furthermore, we find that this trend holds across different model sizes (1B and 7B), suggesting CBS from small training runs can inform larger-scale training runs. Our findings about how the CBS changes over training motivate batch size warmup as a natural way to reliably train language models at large batch size: start the batch size small and increase it as the CBS grows. To validate this claim, we use batch size warmup to train OLMo 1B to slightly better loss than the original training run with 43% fewer gradient steps. This shows how our framework can be applied to reliably train language models at larger batch sizes, increasing data parallelism without compromising performance.
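The batch size warmup idea above (start small, grow as the CBS grows) can be sketched as a simple schedule. The doubling rule, the function name `warmup_batch_size`, and all parameter names are assumptions made for illustration; the abstract does not specify the schedule used for OLMo.

```python
def warmup_batch_size(tokens_seen: int, initial_batch: int,
                      max_batch: int, double_every: int) -> int:
    """Hypothetical schedule: double the batch size every `double_every`
    training tokens, never exceeding `max_batch`."""
    if min(tokens_seen, initial_batch, max_batch, double_every) < 0 or double_every == 0:
        raise ValueError("arguments must be non-negative and double_every > 0")
    batch = initial_batch
    for _ in range(tokens_seen // double_every):
        if batch >= max_batch:
            break  # cap reached; further doublings would have no effect
        batch *= 2
    return min(batch, max_batch)
```

In practice the growth points would presumably be tied to a measured CBS curve rather than a fixed token interval; the fixed interval here only keeps the sketch short.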
Article 613
Title@2025-06-23 (1): Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Sequential Auctions under Unknown Environments
Title: Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Sequential Auctions under Unknown Environments | Online-Lernen für dynamischen Vickrey-Clarke-Groves-Mechanismus in sequenziellen Auktionen unter unbekannten Umgebungen | 在未知环境中有顺序拍卖的动态Vickrey-Clark-Groves机制在线学习 2506.19038v1 |
Authors (2): Vincent Leon, S. Rasoul Etesami
We consider the problem of online dynamic mechanism design for sequential auctions in unknown environments, where the underlying market and, thus, the bidders’ values vary over time as interactions between the seller and the bidders progress. We model the sequential auctions as an infinite-horizon average-reward Markov decision process (MDP), where the transition kernel and reward functions are unknown to the seller. In each round, the seller determines an allocation and a payment for each bidder. Each bidder receives a private reward and submits a sealed bid to the seller. The state, which represents the underlying market, evolves according to an unknown transition kernel and the seller’s allocation policy. Unlike existing works that formulate the problem as a multi-armed bandit model or as an episodic MDP, where the environment resets to an initial state after each round or episode, our paper considers a more realistic and sophisticated setting in which the market continues to evolve without restarting. We first extend the Vickrey-Clarke-Groves (VCG) mechanism, which is known to be efficient, truthful, and individually rational for one-shot static auctions, to sequential auctions, thereby obtaining a dynamic VCG mechanism counterpart that preserves these desired properties. We then focus on the online setting and develop an online reinforcement learning algorithm for the seller to learn the underlying MDP model and implement a mechanism that closely resembles the dynamic VCG mechanism. We show that the learned online mechanism asymptotically converges to a dynamic mechanism that approximately satisfies efficiency, truthfulness, and individual rationality with arbitrarily high probability and achieves guaranteed performance in terms of various notions of regret.
Article 614
Title@2025-06-23 (1): Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Title: Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning | Robustes Verstärktes Lernen aus menschlichem Feedback für große Sprachmodelle Feintuning | 从人类反馈中学习大语言模型精美调整 2504.03784v4 |
Authors (5): Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, Chengchun Shi
Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the Bradley-Terry model, which relies on assumptions about human preferences that may not reflect the complexity and variability of real-world judgments. In this paper, we propose a robust algorithm to enhance the performance of existing approaches under such reward model misspecifications. Theoretically, our algorithm reduces the variance of reward and policy estimators, leading to improved regret bounds. Empirical evaluations on LLM benchmark datasets demonstrate that the proposed algorithm consistently outperforms existing methods, with 77-81% of responses being favored over baselines on the Anthropic Helpful and Harmless dataset.
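For reference, the Bradley-Terry model mentioned above assigns the probability that response a is preferred over response b as the logistic function of the reward difference. The sketch below shows that standard model and its negative log-likelihood only; it does not implement the robust algorithm proposed in the paper, and the function names are illustrative.

```python
import math


def bt_preference_prob(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that a is preferred over b: sigmoid(r_a - r_b)."""
    diff = reward_a - reward_b
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)  # numerically stable branch for negative differences
    return e / (1.0 + e)


def bt_negative_log_likelihood(pairs) -> float:
    """Mean negative log-likelihood over (reward_chosen, reward_rejected) pairs."""
    return -sum(math.log(bt_preference_prob(c, r)) for c, r in pairs) / len(pairs)
```

The model depends on the two rewards only through their difference, which is one of the assumptions about human preferences that may fail on real-world judgments.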
Article 615
Title@2025-06-23 (1): Plan for Speed – Dilated Scheduling for Masked Diffusion Language Models
Title: Plan for Speed – Dilated Scheduling for Masked Diffusion Language Models | Plan für Geschwindigkeit – Erweitertes Scheduling für maskierte Diffusions-Sprachmodelle | 速度计划 – – 蒙面传播语言模型的压缩排程计划 2506.19037v1 |
Authors (3): Omer Luxembourg, Haim Permuter, Eliya Nachmani
Masked diffusion language models (MDLMs) have shown strong promise for non-autoregressive text generation, yet existing samplers act as implicit planners, selecting tokens to unmask via denoiser confidence or entropy scores. Such heuristics falter under parallel unmasking - they ignore pairwise interactions between tokens and cannot account for dependencies when unmasking multiple positions at once, limiting their inference time to that of traditional auto-regressive (AR) models. We introduce the Dilated-scheduled Unmasking Strategy (DUS), an inference-only, planner-model-free method that requires no additional training. DUS leverages a first-order Markov assumption to partition sequence positions into dilation-based groups of non-adjacent tokens, enabling independent, parallel unmasking steps that respect local context and minimize the joint entropy of each iteration step. Unlike semi-AR block approaches (e.g., LLaDA and Dream) that still invoke the denoiser per block, DUS reduces the number of denoiser calls to O(log B) per generation block - yielding substantial speedup over the O(B) run time of state-of-the-art diffusion models, where B is the block size in the semi-AR inference process. In experiments on math (GSM8K) and code completion (HumanEval, MBPP) benchmarks - domains suited to non-ordinal generation - DUS improves scores over a parallel confidence-based planner, without modifying the underlying denoiser. DUS offers a lightweight, budget-aware approach to efficient, high-quality text generation, paving the way to unlock the true capabilities of MDLMs.
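One way to picture a dilation-based partition with O(log B) rounds is a coarse-to-fine stride pattern, sketched below. This is an illustrative assumption about what such a partition could look like, not the paper's exact schedule; the function name `dilated_groups` and the power-of-two restriction are hypothetical.

```python
def dilated_groups(block_size: int) -> list[list[int]]:
    """Partition positions 0..block_size-1 into coarse-to-fine dilated groups.

    Hypothetical pattern: the stride halves each round, and every round takes
    the positions on the current stride grid that earlier rounds did not take.
    Positions within one group are therefore never adjacent.
    """
    if block_size < 1 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    groups, taken, stride = [], set(), block_size
    while stride >= 1:
        group = [p for p in range(0, block_size, stride) if p not in taken]
        groups.append(group)
        taken.update(group)
        stride //= 2
    return groups
```

For `block_size=8` this yields `[[0], [4], [2, 6], [1, 3, 5, 7]]`: four parallel unmasking rounds (log2(8) + 1) instead of eight sequential ones, with each round's positions separated by at least one already-handled or still-masked neighbor.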
Article 616
Title@2025-06-23 (1): Failure Modes of Time Series Interpretability Algorithms for Critical Care Applications and Potential Solutions
Title: Failure Modes of Time Series Interpretability Algorithms for Critical Care Applications and Potential Solutions | Failure Modes of Time Series Interpretations-Algorithmen für kritische Pflegeanwendungen und mögliche Lösungen | 关键护理应用和潜在解决方案的可解释性数值 2506.19035v1 |
Authors (2): Shashank Yadav, Vignesh Subbian
Interpretability plays a vital role in aligning and deploying deep learning models in critical care, especially in constantly evolving conditions that influence patient survival. However, common interpretability algorithms face unique challenges when applied to dynamic prediction tasks, where patient trajectories evolve over time. Gradient-, occlusion-, and permutation-based methods often struggle with time-varying target dependency and temporal smoothness. This work systematically analyzes these failure modes and supports learnable mask-based interpretability frameworks as alternatives, which can incorporate temporal continuity and label consistency constraints to learn feature importance over time. Here, we propose that learnable mask-based approaches for dynamic time-series prediction problems provide more reliable and consistent interpretations for applications in critical care and similar domains.
Article 617
Title@2025-06-23 (1): When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets
Title: When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets | Wenn Diffusionsmodelle merken: Induktive Biasen in der Wahrscheinlichkeit Fluss von minimal-Norm Shallow Neural Nets | 当传播模型时 记忆化:最低浅质神经网可能性流动中的诱导二分法 2506.19031v1 |
Authors (6): Chen Zeno, Hila Manor, Greg Ongie, Nir Weinberger, Tomer Michaeli, Daniel Soudry
While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold. We analyze this by studying the probability flow of shallow ReLU neural network denoisers trained with minimal $\ell^2$ norm. For intuition, we introduce a simpler score flow and show that for orthogonal datasets, both flows follow similar trajectories, converging to a training point or a sum of training points. However, early stopping by the diffusion time scheduler allows probability flow to reach more general manifold points. This reflects the tendency of diffusion models to both memorize training samples and generate novel points that combine aspects of multiple samples, motivating our study of such behavior in simplified settings. We extend these results to obtuse simplex data and, through simulations in the orthogonal case, confirm that probability flow converges to a training point, a sum of training points, or a manifold point. Moreover, memorization decreases when the number of training samples grows, as fewer samples accumulate near training points.
Article 618
Title@2025-06-23 (1): Emergent Risk Awareness in Rational Agents under Resource Constraints
Title: Emergent Risk Awareness in Rational Agents under Resource Constraints | Emergent Risk Awareness in Rational Agents unter Ressourcenbeschränkungen | 资源限制下对合理代理的新兴风险意识 2505.23436v3 |
Authors (7): Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Wei-Chen Lee, Ani Calinescu, Doyne Farmer, Michael Wooldridge
Advanced reasoning models with agentic capabilities (AI agents) are deployed to interact with humans and to solve sequential decision-making problems under (approximate) utility functions and internal models. When such problems have resource or failure constraints where action sequences may be forcibly terminated once resources are exhausted, agents face implicit trade-offs that reshape their utility-driven (rational) behaviour. Additionally, since these agents are typically commissioned by a human principal to act on their behalf, asymmetries in constraint exposure can give rise to previously unanticipated misalignment between human objectives and agent incentives. We formalise this setting through a survival bandit framework, provide theoretical and empirical results that quantify the impact of survival-driven preference shifts, identify conditions under which misalignment emerges and propose mechanisms to mitigate the emergence of risk-seeking or risk-averse behaviours. As a result, this work aims to increase understanding and interpretability of emergent behaviours of AI agents operating under such survival pressure, and offer guidelines for safely deploying such AI systems in critical resource-limited environments.
Article 619
Title@2025-06-23 (1): Statistical Inference for Optimal Transport Maps: Recent Advances and Perspectives
Title: Statistical Inference for Optimal Transport Maps: Recent Advances and Perspectives | Statistische Schlussfolgerung für optimale Verkehrskarten: Jüngste Fortschritte und Perspektiven | 最佳运输地图统计推论:最新进展和前景 2506.19025v1 |
Authors (3): Sivaraman Balakrishnan, Tudor Manole, Larry Wasserman
In many applications of optimal transport (OT), the object of primary interest is the optimal transport map. This map rearranges mass from one probability distribution to another in the most efficient way possible by minimizing a specified cost. In this paper we review recent advances in estimating and developing limit theorems for the OT map, using samples from the underlying distributions. We also review parallel lines of work that establish similar results for special cases and variants of the basic OT setup. We conclude with a discussion of key directions for future research with the goal of providing practitioners with reliable inferential tools.
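As a concrete instance of the object under study: in one dimension with squared-distance cost, the optimal map between two equal-size empirical distributions is the monotone rearrangement, which sends the i-th smallest source sample to the i-th smallest target sample. The sketch below covers only this textbook special case; the function name `empirical_ot_map_1d` is illustrative and the estimators reviewed in the paper are more general.

```python
def empirical_ot_map_1d(source, target):
    """Return, for each source sample, its image under the 1D optimal map.

    Assumes equal sample sizes and squared-distance cost, so the optimal
    coupling matches samples by rank (monotone rearrangement).
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    order = sorted(range(len(source)), key=lambda i: source[i])
    sorted_target = sorted(target)
    mapped = [0.0] * len(source)
    for rank, i in enumerate(order):
        mapped[i] = sorted_target[rank]  # i-th smallest source -> i-th smallest target
    return mapped
```

Because the map is estimated from samples, it is itself random, which is why limit theorems are needed before it can be used for inference.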
Article 620
Title@2025-06-23 (1): Double Machine Learning for Conditional Moment Restrictions: IV Regression, Proximal Causal Learning and Beyond
Title: Double Machine Learning for Conditional Moment Restrictions: IV Regression, Proximal Causal Learning and Beyond | Doppeltes maschinelles Lernen für bedingten Moment Einschränkungen: IV Regression, proximales Kausallernen und darüber hinaus | 有条件时刻限制的双机学习:四级递减、近似因果学习及以后 2506.14950v2 |
Authors (4): Daqian Shao, Ashkan Soleymani, Francesco Quinzan, Marta Kwiatkowska
Solving conditional moment restrictions (CMRs) is a key problem considered in statistics, causal inference, and econometrics, where the aim is to solve for a function of interest that satisfies some conditional moment equalities. Specifically, many techniques for causal inference, such as instrumental variable (IV) regression and proximal causal learning (PCL), are CMR problems. Most CMR estimators use a two-stage approach, where the first-stage estimation is directly plugged into the second stage to estimate the function of interest. However, naively plugging in the first-stage estimator can cause heavy bias in the second stage. This is particularly the case for recently proposed CMR estimators that use deep neural network (DNN) estimators for both stages, where regularisation and overfitting biases are present. We propose DML-CMR, a two-stage CMR estimator that provides an unbiased estimate with fast convergence rate guarantees. We derive a novel learning objective to reduce bias and develop the DML-CMR algorithm following the double/debiased machine learning (DML) framework. We show that our DML-CMR estimator can achieve the minimax optimal convergence rate of $O(N^{-1/2})$ under parameterisation and mild regularity conditions, where $N$ is the sample size. We apply DML-CMR to a range of problems using DNN estimators, including IV regression and proximal causal learning on real-world datasets, demonstrating state-of-the-art performance against existing CMR estimators and algorithms tailored to those problems.
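To make the IV setting concrete, the sketch below uses the classical linear IV estimator, which solves the moment equation Cov(Z, Y - beta * X) = 0, and compares it with ordinary least squares on a tiny hand-built dataset with an unobserved confounder. This is the textbook linear case, not DML-CMR; the data and all names are made up for illustration.

```python
def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def ols_slope(x, y):
    """Least-squares slope; biased when x is correlated with a confounder."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Linear IV estimate: Cov(z, y) / Cov(z, x), valid if z affects y only via x."""
    return covariance(z, y) / covariance(z, x)


# Toy data: instrument z, hidden confounder u (uncorrelated with z in-sample),
# treatment x = z + u, outcome y = 2 * x + 3 * u, so the true effect is 2.
z = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]
```

On this toy data `iv_slope(z, x, y)` recovers the true effect 2.0, while `ols_slope(x, y)` gives 3.5 because the confounder inflates the naive regression.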
Article 621
Title@2025-06-23 (1): Automating Traffic Monitoring with SHM Sensor Networks via Vision-Supervised Deep Learning
Title: Automating Traffic Monitoring with SHM Sensor Networks via Vision-Supervised Deep Learning | Automatisieren der Verkehrsüberwachung mit SHM-Sensornetzwerken über Vision-Supervised Deep Learning | 通过视觉监督深层学习,与南高频传感器网络进行自动化交通监测 2506.19023v1 |
Authors (6): Hanshuo Wu, Xudong Jian, Christos Lataniotis, Cyprien Hoelzl, Eleni Chatzi, Yves Reuland
Bridges, as critical components of civil infrastructure, are increasingly affected by deterioration, making reliable traffic monitoring essential for assessing their remaining service life. Among operational loads, traffic load plays a pivotal role, and recent advances in deep learning - particularly in computer vision (CV) - have enabled progress toward continuous, automated monitoring. However, CV-based approaches suffer from limitations, including privacy concerns and sensitivity to lighting conditions, while traditional non-vision-based methods often lack flexibility in deployment and validation. To bridge this gap, we propose a fully automated deep-learning pipeline for continuous traffic monitoring using structural health monitoring (SHM) sensor networks. Our approach integrates CV-assisted high-resolution dataset generation with supervised training and inference, leveraging graph neural networks (GNNs) to capture the spatial structure and interdependence of sensor data. By transferring knowledge from CV outputs to SHM sensors, the proposed framework enables sensor networks to achieve accuracy comparable to that of vision-based systems, with minimal human intervention. Applied to accelerometer and strain gauge data in a real-world case study, the model achieves state-of-the-art performance, with classification accuracies of 99% for light vehicles and 94% for heavy vehicles.
Article 622
Title@2025-06-23 (1): Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions
Title: Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions | Simulationsbasierte Sensitivitätsanalyse in Optimalen Behandlungsregimen und kausaler Zersetzung mit individualisierten Interventionen | 最佳治疗制度和与个性化干预相结合的因果分解中的模拟-基于模拟的敏感度分析 2506.19010v1 |
Authors (3): Soojin Park, Suyeon Kang, Chioun Lee
Causal decomposition analysis aims to assess the effect of modifying risk factors on reducing social disparities in outcomes. Recently, this analysis has incorporated individual characteristics when modifying risk factors by utilizing optimal treatment regimes (OTRs). Since the newly defined individualized effects rely on the no omitted confounding assumption, developing sensitivity analyses to account for potential omitted confounding is essential. Moreover, OTRs and individualized effects are primarily based on binary risk factors, and no formal approach currently exists to benchmark the strength of omitted confounding using observed covariates for binary risk factors. To address this gap, we extend a simulation-based sensitivity analysis that simulates unmeasured confounders, addressing two sources of bias emerging from deriving OTRs and estimating individualized effects. Additionally, we propose a formal bounding strategy that benchmarks the strength of omitted confounding for binary risk factors. Using the High School Longitudinal Study 2009 (HSLS:09), we demonstrate this sensitivity analysis and benchmarking method.
Article 623
Title@2025-06-23 (1): Steering Conceptual Bias via Transformer Latent-Subspace Activation
Title: Steering Conceptual Bias via Transformer Latent-Subspace Activation | Steuerung konzeptioneller Bias über Transformer Latent-Subspace-Aktivierung | 通过变换器中子空间动力动动 2506.18887v1 |
Authors (2): Vansh Sharma, Venkat Raman
This work examines whether activating latent subspaces in large language models (LLMs) can steer scientific code generation toward a specific programming language. Five causal LLMs were first evaluated on scientific coding prompts to quantify their baseline bias among four programming languages. A static neuron-attribution method, perturbing the highest activated MLP weight for a C++ or CPP token, proved brittle and exhibited limited generalization across prompt styles and model scales. To address these limitations, a gradient-refined adaptive activation steering framework (G-ACT) was developed: per-prompt activation differences are clustered into a small set of steering directions, and lightweight per-layer probes are trained and refined online to select the appropriate steering vector. In LLaMA-3.2 3B, this approach reliably biases generation towards the CPP language, increasing the average probe classification accuracy by 15% and improving the probe classification accuracy of the early layers (0-6) by 61.5% compared to the standard ACT framework. For LLaMA-3.3 70B, where attention-head signals become more diffuse, targeted injections at key layers still improve language selection. Although per-layer probing introduces a modest inference overhead, it remains practical by steering only a subset of layers and enables reproducible model behavior. These results demonstrate a scalable, interpretable and efficient mechanism for concept-level control for practical agentic systems.
Article 624
Title@2025-06-23 (1): Accurate and scalable exchange-correlation with deep learning
Title: Accurate and scalable exchange-correlation with deep learning | Genaue und skalierbare Austauschkorrelation mit Deep Learning | 与深深学习的准确和可缩放的交换关系 2506.14665v3 |
Authors (25): Giulia Luise, Chin-Wei Huang, Thijs Vogels, Derk P. Kooi, Sebastian Ehlert, Stephanie Lanius, Klaas J. H. Giesbertz, Amir Karton, Deniz Gunceler, Megan Stanley, Wessel P. Bruinsma, Lin Huang, Xinran Wei, José Garrido Torres, Abylay Katbashev, Rodrigo Chavez Zavaleta, Bálint Máté, Sékou-Oumar Kaba, Roberto Sordillo, Yingrong Chen, David B. Williams-Young, Christopher M. Bishop, Jan Hermann, Rianne van den Berg, Paola Gori-Giorgi
Density Functional Theory (DFT) is the most widely used electronic structure method for predicting the properties of molecules and materials. Although DFT is, in principle, an exact reformulation of the Schrödinger equation, practical applications rely on approximations to the unknown exchange-correlation (XC) functional. Most existing XC functionals are constructed using a limited set of increasingly complex, hand-crafted features that improve accuracy at the expense of computational efficiency. Yet, no current approximation achieves the accuracy and generality for predictive modeling of laboratory experiments at chemical accuracy – typically defined as errors below 1 kcal/mol. In this work, we present Skala, a modern deep learning-based XC functional that bypasses expensive hand-designed features by learning representations directly from data. Skala achieves chemical accuracy for atomization energies of small molecules while retaining the computational efficiency typical of semi-local DFT. This performance is enabled by training on an unprecedented volume of high-accuracy reference data generated using computationally intensive wavefunction-based methods. Notably, Skala systematically improves with additional training data covering diverse chemistry. By incorporating a modest amount of additional high-accuracy data tailored to chemistry beyond atomization energies, Skala achieves accuracy competitive with the best-performing hybrid functionals across general main group chemistry, at the cost of semi-local DFT. As the training dataset continues to expand, Skala is poised to further enhance the predictive power of first-principles simulations.
Article 625
Title@2025-06-23 (1): A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series
Title: A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series | Ein verlässlicher Rahmen für die Mensch-in-the-Loop-Anomalie-Erkennung in der Zeitreihe | 时间序列中人类在Loop异常探测的可靠框架 2405.03234v4 |
Authors (4): Ziquan Deng, Xiwei Xuan, Kwan-Liu Ma, Zhaodan Kong
Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit potential issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights by elucidating how models attribute their decisions, many limitations still exist: they are primarily instance-based and not scalable across the dataset, and they provide one-directional information from the model to the human side, lacking a mechanism for users to address detected issues. To fill these gaps, we introduce HILAD, a novel framework designed to foster a dynamic and bidirectional collaboration between humans and AI for enhancing anomaly detection models in time series. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation through user studies with two models and three time series datasets demonstrates the effectiveness of HILAD, which fosters a deeper model understanding, immediate corrective actions, and model reliability enhancement.
Article 626
Title@2025-06-23 (1): CDI: Copyrighted Data Identification in Diffusion Models
Title: CDI: Copyrighted Data Identification in Diffusion Models | CDI: Copyrighted Data Identification in Diffusion Models | CDI: 传播模型中的版权数据识别 2411.12858v3 |
Authors (4): Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic
Diffusion Models (DMs) benefit from large and diverse datasets for their training. Since this data is often scraped from the Internet without permission from the data owners, this raises concerns about copyright and intellectual property protections. While (illicit) use of data is easily detected for training samples perfectly re-created by a DM at inference time, it is much harder for data owners to verify if their data was used for training when the outputs from the suspect DM are not close replicas. Conceptually, membership inference attacks (MIAs), which detect if a given data point was used during training, present themselves as a suitable tool to address this challenge. However, we demonstrate that existing MIAs are not strong enough to reliably determine the membership of individual images in large, state-of-the-art DMs. To overcome this limitation, we propose CDI, a framework for data owners to identify whether their dataset was used to train a given DM. CDI relies on dataset inference techniques, i.e., instead of using the membership signal from a single data point, CDI leverages the fact that most data owners, such as providers of stock photography, visual media companies, or even individual artists, own datasets with multiple publicly exposed data points which might all be included in the training of a given DM. By selectively aggregating signals from existing MIAs and using new handcrafted methods to extract features for these datasets, feeding them to a scoring model, and applying rigorous statistical testing, CDI allows data owners with as few as 70 data points to identify with a confidence of more than 99% whether their data was used to train a given DM. Thereby, CDI represents a valuable tool for data owners to claim illegitimate use of their copyrighted data. We make the code available at https://github.com/sprintml/copyrighted_data_identification
Article 627
Title@2025-06-23 (1): Controlling Moments with Kernel Stein Discrepancies
Title: Controlling Moments with Kernel Stein Discrepancies | Kontrollieren von Momenten mit Kernel Stein Diskrepanzen | 控制内核施用技术差异的控控时刻 2211.05408v7 |
Authors (4): Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey
Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant. Notable applications include the diagnosis of approximate MCMC samplers and goodness-of-fit tests for unnormalized statistical models. The present work analyzes the convergence control properties of KSDs. We first show that standard KSDs used for weak convergence control fail to control moment convergence. To address this limitation, we next provide sufficient conditions under which alternative diffusion KSDs control both moment and weak convergence. As an immediate consequence we develop, for each $q > 0$, the first KSDs known to exactly characterize $q$-Wasserstein convergence.
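To make concrete why a KSD can be computed without the normalizing constant, the sketch below estimates the squared standard Langevin KSD with an RBF kernel, which needs only the score function (the gradient of the log density). This is the standard construction, not the diffusion KSDs proposed in the paper; the function name `ksd_squared`, the bandwidth, and the Gaussian test target are assumptions for illustration.

```python
import numpy as np

def ksd_squared(samples, score, bandwidth=1.0):
    """V-statistic estimate of the squared Langevin KSD with an RBF kernel."""
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    s = score(x)                                   # (n, d) score at each sample
    diff = x[:, None, :] - x[None, :, :]           # (n, n, d), entries x_i - x_j
    sq = (diff ** 2).sum(axis=-1)                  # squared pairwise distances
    h2 = bandwidth ** 2
    k = np.exp(-sq / (2.0 * h2))                   # RBF kernel matrix
    term_ss = (s @ s.T) * k                        # s(x_i)^T s(x_j) k(x_i, x_j)
    # s(x_i)^T grad_y k + s(x_j)^T grad_x k, using grad_y k = (x_i - x_j) / h^2 * k
    s_i_diff = np.einsum("id,ijd->ij", s, diff)
    s_j_diff = np.einsum("jd,ijd->ij", s, diff)
    term_cross = (s_i_diff - s_j_diff) / h2 * k
    term_trace = (d / h2 - sq / h2 ** 2) * k       # trace of grad_x grad_y k
    return (term_ss + term_cross + term_trace).mean()

rng = np.random.default_rng(0)
std_normal_score = lambda x: -x                    # score of a standard normal target
good = ksd_squared(rng.normal(size=(200, 2)), std_normal_score)
shifted = ksd_squared(rng.normal(loc=2.0, size=(200, 2)), std_normal_score)
```

Samples drawn from the target give a small value, while the mean-shifted samples give a clearly larger one. As the abstract notes, a small value of this standard KSD does not by itself guarantee that moments converge.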
Article 628
Title@2025-06-23 (1): EXPRTS: Exploring and Probing the Robustness of Time Series Forecasting Models
Title: EXPRTS: Exploring and Probing the Robustness of Time Series Forecasting Models | EXPRTS: Erforschung und Erprobung der Robustheit von Zeitreihenprognosemodellen | EXPRTS:探索和检验时间系列预测模型的强劲性 2403.03508v2 |
Authors (7): Håkon Hanisch Kjærnli, Lluis Mas-Ribas, Hans Jakob Håland, Vegard Sjåvik, Aida Ashrafi, Helge Langseth, Odd Erik Gundersen
When deploying time series forecasting models based on machine learning to real-world settings, one often encounters situations where the data distribution drifts. Such drifts expose the forecasting models to out-of-distribution (OOD) data, and machine learning models lack robustness in these settings. Robustness can be improved by using deep generative models or genetic algorithms to augment time series datasets, but these approaches lack interpretability and are computationally expensive. In this work, we develop an interpretable and simple framework for generating time series. Our method combines time-series decompositions with analytic functions, and is able to generate time series with characteristics matching both in- and out-of-distribution data. This approach allows users to generate new time series in an interpretable fashion, which can be used to augment the dataset and improve forecasting robustness. We demonstrate our framework through EXPRTS, a visual analytics tool designed for univariate time series forecasting models and datasets. Different visualizations of the data distribution, forecasting errors and single time series instances enable users to explore time series datasets, apply transformations, and evaluate forecasting model robustness across diverse scenarios. We show how our framework can generate meaningful OOD time series that improve model robustness, and we validate the effectiveness and usability of EXPRTS through three use-cases and a user study.
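The idea of generating series from a decomposition with analytic components can be sketched as follows. This is a generic trend-plus-seasonality-plus-noise generator, not the EXPRTS implementation; the function `generate_series` and all parameter values are hypothetical.

```python
import numpy as np

def generate_series(n, slope, season_amp, period, noise_std, rng):
    """Build a series as linear trend + sinusoidal seasonality + Gaussian residual."""
    t = np.arange(n)
    trend = slope * t
    seasonal = season_amp * np.sin(2 * np.pi * t / period)
    residual = rng.normal(scale=noise_std, size=n)
    return trend + seasonal + residual

rng = np.random.default_rng(0)
# A series resembling the (assumed) training distribution.
in_dist = generate_series(120, slope=0.05, season_amp=1.0, period=12, noise_std=0.1, rng=rng)
# An out-of-distribution variant: steeper trend and stronger seasonality.
ood = generate_series(120, slope=0.20, season_amp=3.0, period=12, noise_std=0.1, rng=rng)
```

Each parameter has a direct interpretation (trend slope, seasonal strength, noise level), which is what makes this style of augmentation easy to inspect compared with a deep generative model.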
Article 629
Title@2025-06-23 (1): A Comment On “The Illusion of Thinking”: Reframing the Reasoning Cliff as an Agentic Gap
Title: A Comment On “The Illusion of Thinking”: Reframing the Reasoning Cliff as an Agentic Gap | Ein Kommentar zu “Die Illusion des Denkens”: Den vernünftigen Cliff als Agent-Gap zurückweisen | 关于“思考的幻觉”的评论:将理性断裂重新定位为一种危险差距 2506.18957v1 |
Authors (3): Sheraz Khan, Subha Madhavan, Kannan Natarajan
The recent work by Shojaee et al. (2025), titled The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, presents a compelling empirical finding, a reasoning cliff, where the performance of Large Reasoning Models (LRMs) collapses beyond a specific complexity threshold, which the authors posit as an intrinsic scaling limitation of Chain-of-Thought (CoT) reasoning. This commentary, while acknowledging the study’s methodological rigor, contends that this conclusion is confounded by experimental artifacts. We argue that the observed failure is not evidence of a fundamental cognitive boundary, but rather a predictable outcome of system-level constraints in the static, text-only evaluation paradigm, including tool use restrictions, context window recall issues, the absence of crucial cognitive baselines, inadequate statistical reporting, and output generation limits. We reframe this performance collapse through the lens of an agentic gap, asserting that the models are not failing at reasoning, but at execution within a profoundly restrictive interface. We empirically substantiate this critique by demonstrating a striking reversal. A model, initially declaring a puzzle impossible when confined to text-only generation, now employs agentic tools to not only solve it but also master variations of complexity far beyond the reasoning cliff it previously failed to surmount. Additionally, our empirical analysis of tool-enabled models like o4-mini and GPT-4o reveals a hierarchy of agentic reasoning, from simple procedural execution to complex meta-cognitive self-correction, which has significant implications for how we define and measure machine intelligence. The illusion of thinking attributed to LRMs is less a reasoning deficit and more a consequence of an otherwise capable mind lacking the tools for action.
Article 630
Title@2025-06-23 (1): Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment
Title: Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment | Segmentation-Aware Generatives Verstärkungsnetzwerk (GRN) für Tissue Layer Segmentierung in 3-D-Ulbrosound-Bildern für chronische Rückenschmerzen (cLBP) Assessment | 三维超声图像中3-三维超声图像中用于慢性低位疼痛(cLBBP)的 组织图层分层(CLBP)评估 2501.17690v3 |
Authors (17): Zixue Zeng, Xiaoyan Zhao, Matthew Cartier, Tong Yu, Jing Wang, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M Cormack, Allison Bean, Ryan Nussbaum, Maya Maurer, Emily Landis-Walkenhorst, Dinesh Kumbhare, Kang Kim, Ajay Wasan, Jiantao Pu
We introduce a novel segmentation-aware joint training framework called generative reinforcement network (GRN) that integrates segmentation loss feedback to optimize both image generation and segmentation performance in a single stage. An image enhancement technique called segmentation-guided enhancement (SGE) is also developed, where the generator produces images tailored specifically for the segmentation model. Two variants of GRN were also developed, including GRN for sample-efficient learning (GRN-SEL) and GRN for semi-supervised learning (GRN-SSL). GRN’s performance was evaluated using a dataset of 69 fully annotated 3D ultrasound scans from 29 subjects. The annotations included six anatomical structures: dermis, superficial fat, superficial fascial membrane (SFM), deep fat, deep fascial membrane (DFM), and muscle. Our results show that GRN-SEL with SGE reduces labeling efforts by up to 70% while achieving a 1.98% improvement in the Dice Similarity Coefficient (DSC) compared to models trained on fully labeled datasets. GRN-SEL alone reduces labeling efforts by 60%, GRN-SSL with SGE decreases labeling requirements by 70%, and GRN-SSL alone by 60%, all while maintaining performance comparable to fully supervised models. These findings suggest the effectiveness of the GRN framework in optimizing segmentation performance with significantly less labeled data, offering a scalable and efficient solution for ultrasound image analysis and reducing the burdens associated with data annotation.
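For reference, the Dice Similarity Coefficient reported above is defined as twice the overlap between two masks divided by the sum of their sizes. The sketch below computes it for flat binary masks; the function name and the convention of returning 1.0 for two empty masks are choices made for this example.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# One overlapping pixel, mask sizes 2 and 1: DSC = 2 * 1 / (2 + 1).
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```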
Article 631
Title@2025-06-23 (1): LIGHTHOUSE: Fast and precise distance to shoreline calculations from anywhere on earth
Title: LIGHTHOUSE: Fast and precise distance to shoreline calculations from anywhere on earth | LIGHTHOUSE: Schneller und präziser Abstand zu Küstenberechnungen von überall auf der Erde | 从地球上任何地方 快速和精确的距离到海岸线的计算 2506.18842v1 |
Authors (5): Patrick Beukema, Henry Herzog, Yawen Zhang, Hunter Pitelka, Favyen Bastani
We introduce a new dataset and algorithm for fast and efficient coastal distance calculations from Anywhere on Earth (AoE). Existing global coastal datasets are only available at coarse resolution (e.g. 1-4 km), which limits their utility. Publicly available satellite imagery combined with computer vision enables much higher precision. We provide a global coastline dataset at 10 meter resolution, a 100+ fold improvement in precision over existing data. To handle the computational challenge of querying at such an increased scale, we introduce a new library: Layered Iterative Geospatial Hierarchical Terrain-Oriented Unified Search Engine (Lighthouse). Lighthouse is both exceptionally fast and resource-efficient, requiring only 1 CPU and 2 GB of RAM to achieve millisecond online inference, making it well suited for real-time applications in resource-constrained environments.
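To show what a distance-to-shoreline query computes, the sketch below takes the minimum great-circle (haversine) distance from a query point to a list of coastline points. This brute-force scan is only a baseline for intuition and is not the Lighthouse algorithm, which the abstract describes as a hierarchical search; the coastline coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_to_coast_km(lat, lon, coast_points):
    """Brute-force nearest-coastline distance; cost is linear in the number of points."""
    return min(haversine_km(lat, lon, clat, clon) for clat, clon in coast_points)

coast = [(0.0, 1.0), (10.0, 10.0)]  # hypothetical coastline samples
d = distance_to_coast_km(0.0, 0.0, coast)
```

A linear scan like this becomes impractical for a global coastline at 10 meter resolution, which is the scaling problem a spatial index or hierarchical search is meant to address.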
Article 632
Title@2025-06-23 (1): LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
Title: LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | LongWriter-Zero: Mastering Ultra-Long Text Generation via Verstärkungslernen | LongWriter-Zero:通过强化学习掌握超大龙制文本 2506.18841v1 |
Authors (5): Yuhao Wu, Yushi Bai, Zhiqiang Hu, Roy Ka-Wei Lee, Juanzi Li
Ultra-long generation by large language models (LLMs) is a widely demanded scenario, yet it remains a significant challenge due to their maximum generation length limit and overall quality degradation as sequence length increases. Previous approaches, exemplified by LongWriter, typically rely on "teaching", which involves supervised fine-tuning (SFT) on synthetic long-form outputs. However, this strategy heavily depends on synthetic SFT data, which is difficult and costly to construct, often lacks coherence and consistency, and tends to be overly artificial and structurally monotonous. In this work, we propose an incentivization-based approach that, starting entirely from scratch and without relying on any annotated or synthetic data, leverages reinforcement learning (RL) to foster the emergence of ultra-long, high-quality text generation capabilities in LLMs. We perform RL training starting from a base model, similar to R1-Zero, guiding it to engage in reasoning that facilitates planning and refinement during the writing process. To support this, we employ specialized reward models that steer the LLM towards improved length control, writing quality, and structural formatting. Experimental evaluations show that our LongWriter-Zero model, trained from Qwen2.5-32B, consistently outperforms traditional SFT methods on long-form writing tasks, achieving state-of-the-art results across all metrics on WritingBench and Arena-Write, and even surpassing 100B+ models such as DeepSeek R1 and Qwen3-235B. We open-source our data and model checkpoints under https://huggingface.co/THU-KEG/LongWriter-Zero-32B
Article 633
Title@2025-06-23 (1): A Comprehensive Study of Machine Learning Techniques for Log-Based Anomaly Detection
Title: A Comprehensive Study of Machine Learning Techniques for Log-Based Anomaly Detection | Eine umfassende Untersuchung von Techniken des maschinellen Lernens zur logbasierten Anomalieerkennung | 全面研究用于基于日志异常探测的机器学习技术 2307.16714v5 |
Authors (5): Shan Ali, Chaima Boufaied, Domenico Bianculli, Paula Branco, Lionel Briand
Growth in system complexity increases the need for automated log analysis techniques, such as Log-based Anomaly Detection (LAD). While deep learning (DL) methods have been widely used for LAD, traditional machine learning (ML) techniques can also perform well depending on the context and dataset. Semi-supervised techniques deserve the same attention as they offer practical advantages over fully supervised methods. Current evaluations mainly focus on detection accuracy, but this alone is insufficient to determine the suitability of a technique for a given LAD task. Other aspects to consider include training and prediction times as well as the sensitivity to hyperparameter tuning, which in practice matters to engineers. This paper presents a comprehensive empirical study evaluating a wide range of supervised and semi-supervised, traditional and deep ML techniques across four criteria: detection accuracy, time performance, and sensitivity to hyperparameter tuning in both detection accuracy and time performance. The experimental results show that supervised traditional and deep ML techniques fare similarly in terms of their detection accuracy and prediction time on most of the benchmark datasets considered in our study. Moreover, overall, sensitivity analysis to hyperparameter tuning with respect to detection accuracy shows that supervised traditional ML techniques are less sensitive than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.
Article 634
Title@2025-06-23 (1): Conformal Prediction for Causal Effects of Continuous Treatments
Title: Conformal Prediction for Causal Effects of Continuous Treatments | Konforme Vorhersage für ursächliche Wirkungen kontinuierlicher Behandlungen | 持续治疗因果影响非正式预测 2407.03094v3 |
Authors (6): Maresa Schröder, Dennis Frauen, Jonas Schweisthal, Konstantin Heß, Valentyn Melnychuk, Stefan Feuerriegel
Uncertainty quantification of causal effects is crucial for safety-critical applications such as personalized medicine. A powerful approach for this is conformal prediction, which has several practical benefits due to model-agnostic finite-sample guarantees. Yet, existing methods for conformal prediction of causal effects are limited to binary/discrete treatments and make highly restrictive assumptions such as known propensity scores. In this work, we provide a novel conformal prediction method for potential outcomes of continuous treatments. We account for the additional uncertainty introduced through propensity estimation so that our conformal prediction intervals are valid even if the propensity score is unknown. Our contributions are three-fold: (1) We derive finite-sample prediction intervals for potential outcomes of continuous treatments. (2) We provide an algorithm for calculating the derived intervals. (3) We demonstrate the effectiveness of the conformal prediction intervals in experiments on synthetic and real-world datasets. To the best of our knowledge, we are the first to propose conformal prediction for continuous treatments when the propensity score is unknown and must be estimated from data.
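For readers unfamiliar with the underlying tool, the sketch below shows standard split conformal prediction for ordinary regression: absolute residuals on a calibration set yield a quantile that turns point predictions into intervals with finite-sample coverage. It does not implement the paper's method for continuous treatments or unknown propensity scores; the toy data, the stand-in model `predict`, and the miscoverage level are assumptions.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf          # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def predict(x):
    """Hypothetical fitted regression model."""
    return 2.0 * x

def sample(n):
    """Toy data: y = 2x + Gaussian noise."""
    xs = [random.uniform(0, 1) for _ in range(n)]
    return [(x, 2.0 * x + random.gauss(0, 0.5)) for x in xs]

random.seed(0)
calibration = sample(500)
scores = [abs(y - predict(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha=0.1)

# The interval for a new x is [predict(x) - q, predict(x) + q].
held_out = sample(2000)
coverage = sum(abs(y - predict(x)) <= q for x, y in held_out) / len(held_out)
```

On exchangeable data the empirical coverage should land near the nominal 90%. The setting in the abstract is harder because the intervention shifts the treatment distribution and the propensity must be estimated.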
Article 635
Title@2025-06-23 (1): Regularized Neural Ensemblers
Title: Regularized Neural Ensemblers | Regularisierte Neurale Ensemblers | 正规神经组 2410.04520v2 |
Authors (6): Sebastian Pineda Arango, Maciej Janowski, Lennart Purucker, Arber Zela, Frank Hutter, Josif Grabocka
Ensemble methods are known for enhancing the accuracy and robustness of machine learning models by combining multiple base learners. However, standard approaches like greedy or random ensembling often fall short, as they assume a constant weight across samples for the ensemble members. This can limit expressiveness and hinder performance when aggregating the ensemble predictions. In this study, we explore employing regularized neural networks as ensemble methods, emphasizing the significance of dynamic ensembling to leverage diverse model predictions adaptively. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the ensembling model by randomly dropping base model predictions during the training. We demonstrate this approach provides lower bounds for the diversity within the ensemble, reducing overfitting and improving generalization capabilities. Our experiments showcase that the regularized neural ensemblers yield competitive results compared to strong baselines across several modalities such as computer vision, natural language processing, and tabular data.
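The regularization described above, randomly dropping base-model predictions during training, can be sketched with a masked softmax over per-sample ensemble weights. In the sketch, `weight_logits` stands in for the output of the neural ensembler, and the drop rate, shapes, and function name are assumptions; the paper's architecture and training loop are not reproduced.

```python
import numpy as np

def masked_ensemble(base_probs, weight_logits, drop_rate, rng):
    """Combine base predictions with per-sample weights, dropping models at random.

    base_probs:    (n_models, n_samples, n_classes) base-model probabilities
    weight_logits: (n_samples, n_models) unnormalized per-sample weights
    """
    n_models, n_samples, _ = base_probs.shape
    keep = rng.random((n_samples, n_models)) >= drop_rate
    # Guarantee that every sample keeps at least one base model.
    empty_rows = np.flatnonzero(~keep.any(axis=1))
    keep[empty_rows, rng.integers(0, n_models, size=empty_rows.size)] = True
    # Dropped models get -inf logits, so their softmax weight is exactly zero.
    logits = np.where(keep, weight_logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    combined = np.einsum("nm,mnc->nc", weights, base_probs)
    return combined, weights, keep

rng = np.random.default_rng(0)
base = rng.dirichlet(np.ones(3), size=(4, 5))    # 4 models, 5 samples, 3 classes
logits = rng.normal(size=(5, 4))                 # stand-in for ensembler outputs
combined, weights, keep = masked_ensemble(base, logits, drop_rate=0.5, rng=rng)
```

Because the weights are renormalized over the surviving models, the ensembler cannot rely on any single base model always being present, which is the intuition behind the diversity argument in the abstract.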
Article 636
Title@2025-06-23 (1): Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators
Title: Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators | Kernel-Spektralfugeneinbettungen für hochdimensionale laute Datensätze mit Duo-Landmark-Integraloperatoren | 使用双陆标记集成操作器进行高维噪音数据集的内核光谱联合嵌入 2405.12317v2 |
Authors (2): Xiucai Ding, Rong Ma
Integrative analysis of multiple heterogeneous datasets has become standard practice in many research fields, especially in single-cell genomics and medical informatics. Existing approaches oftentimes suffer from limited power in capturing nonlinear structures, insufficient account of noisiness and effects of high-dimensionality, and lack of adaptivity to imbalances in signals and sample sizes, and their results are sometimes difficult to interpret. To address these limitations, we propose a novel kernel spectral method that achieves joint embeddings of two independently observed high-dimensional noisy datasets. The proposed method automatically captures and leverages possibly shared low-dimensional structures across datasets to enhance embedding quality. The obtained low-dimensional embeddings can be utilized for many downstream tasks such as simultaneous clustering, data visualization, and denoising. The proposed method is justified by rigorous theoretical analysis. Specifically, we show the consistency of our method in recovering the low-dimensional noiseless signals, and characterize the effects of the signal-to-noise ratios on the rates of convergence. Under a joint manifolds model framework, we establish the convergence of ultimate embeddings to the eigenfunctions of some newly introduced integral operators. These operators, referred to as duo-landmark integral operators, are defined by the convolutional kernel maps of some reproducing kernel Hilbert spaces (RKHSs). These RKHSs capture the either partially or entirely shared underlying low-dimensional nonlinear signal structures of the two datasets. Our numerical experiments and analyses of two single-cell omics datasets demonstrate the empirical advantages of the proposed method over existing methods in both embeddings and several downstream tasks.
Article 637
Title@2025-06-23 (1): Maximizing Confidence Alone Improves Reasoning
Title: Maximizing Confidence Alone Improves Reasoning | Maximierung des Vertrauens allein verbessert die Vernunft | 使信心最大化单独提高合理性 2505.22660v3 |
Authors (6): Mihir Prabhudesai, Lili Chen, Alex Ippoliti, Katerina Fragkiadaki, Hao Liu, Deepak Pathak
Reinforcement learning (RL) has enabled machine learning models to achieve significant advances in many fields. Most recently, RL has empowered frontier language models to solve challenging math, science, and coding problems. However, central to any RL algorithm is the reward function, and reward engineering is a notoriously difficult problem in any domain. In this paper, we propose RENT: Reinforcement Learning via Entropy Minimization – a fully unsupervised RL method that requires no external reward or ground-truth answers, and instead uses the model’s entropy of its underlying distribution as an intrinsic reward. We find that by reinforcing the chains of thought that yield high model confidence on its generated answers, the model improves its reasoning ability. In our experiments, we showcase these improvements on an extensive suite of commonly-used reasoning benchmarks, including GSM8K, MATH500, AMC, AIME, and GPQA, and models of varying sizes from the Qwen and Mistral families. The generality of our unsupervised learning method lends itself to applicability in a wide range of domains where external supervision is unavailable.
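The intrinsic reward described above can be illustrated by computing the negative entropy of the model's next-token distributions, so that more confident generations score higher. The sketch assumes access to per-token logits and averages over all generated tokens; which tokens RENT actually uses and how the reward enters the RL update are not specified here, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def negative_entropy_reward(token_logits):
    """Average negative entropy over generated tokens (higher means more confident)."""
    entropies = [entropy(softmax(step)) for step in token_logits]
    return -sum(entropies) / len(entropies)

# Peaked distributions (confident) versus nearly uniform ones (uncertain).
confident = negative_entropy_reward([[8.0, 0.0, 0.0], [0.0, 9.0, 0.0]])
uncertain = negative_entropy_reward([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
```

The reward is bounded above by zero and is lowest for a uniform distribution, where it equals the negative log of the vocabulary size.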
Article 638
Title@2025-06-23 (1): Multi-Agent Online Control with Adversarial Disturbances
Title: Multi-Agent Online Control with Adversarial Disturbances | Multi-Agent Online-Steuerung mit störenden Störungen | 具有对抗骚乱的多代理在线控制 2506.18814v1 |
Authors (4): Anas Barakat, John Lazarsfeld, Georgios Piliouras, Antonios Varvitsiotis
Multi-agent control problems involving a large number of agents with competing and time-varying objectives are increasingly prevalent in applications across robotics, economics, and energy systems. In this paper, we study online control in multi-agent linear dynamical systems with disturbances. In contrast to most prior work in multi-agent control, we consider an online setting where disturbances are adversarial and where each agent seeks to minimize its own, adversarial sequence of convex losses. In this setting, we investigate the robustness of gradient-based controllers from single-agent online control, with a particular focus on understanding how individual regret guarantees are influenced by the number of agents in the system. Under minimal communication assumptions, we prove near-optimal sublinear regret bounds that hold uniformly for all agents. Finally, when the objectives of the agents are aligned, we show that the multi-agent control problem induces a time-varying potential game for which we derive equilibrium gap guarantees.
Article 639
Title@2025-06-23 (1): Learning Physical Systems: Symplectification via Gauge Fixing in Dirac Structures
Title: Learning Physical Systems: Symplectification via Gauge Fixing in Dirac Structures | Physikalische Systeme lernen: Symplektifizierung über Messstreifenfixierung in Dirac-Strukturen | 学习物理系统:通过在Dirac结构中定额进行定额的症状 2506.18812v1 |
Authors (4): Aristotelis Papatheodorou, Pranav Vaidhyanathan, Natalia Ares, Ioannis Havoutis
Physics-informed deep learning has achieved remarkable progress by embedding geometric priors, such as Hamiltonian symmetries and variational principles, into neural networks, enabling structure-preserving models that extrapolate with high accuracy. However, in systems with dissipation and holonomic constraints, ubiquitous in legged locomotion and multibody robotics, the canonical symplectic form becomes degenerate, undermining the very invariants that guarantee stability and long-term prediction. In this work, we tackle this foundational limitation by introducing Presymplectification Networks (PSNs), the first framework to learn the symplectification lift via Dirac structures, restoring a non-degenerate symplectic geometry by embedding constrained systems into a higher-dimensional manifold. Our architecture combines a recurrent encoder with a flow-matching objective to learn the augmented phase-space dynamics end-to-end. We then attach a lightweight Symplectic Network (SympNet) to forecast constrained trajectories while preserving energy, momentum, and constraint satisfaction. We demonstrate our method on the dynamics of the ANYmal quadruped robot, a challenging contact-rich, multibody system. To the best of our knowledge, this is the first framework that effectively bridges the gap between constrained, dissipative mechanical systems and symplectic learning, unlocking a whole new class of geometric machine learning models, grounded in first principles yet adaptable from data.
Article 640
Title@2025-06-23 (1): Image Captions are Natural Prompts for Text-to-Image Models
Title: Image Captions are Natural Prompts for Text-to-Image Models | Bildunterschriften sind natürliche Prompts für Text-zu-Image-Modelle | 图像说明是文本到图像模型的自然提示 2307.08526v2 |
Authors (5): Shiye Lei, Hao Chen, Sen Zhang, Bo Zhao, Dacheng Tao
With the rapid development of Artificial Intelligence Generated Content (AIGC), it has become a common practice to train models on synthetic data due to data-scarcity and privacy leakage problems. Owing to the massive and diverse information conveyed in real images, it is challenging for text-to-image generative models to synthesize informative training data with hand-crafted prompts. Considering the impressive ability of large generative models, could such models directly synthesize good training images for prediction tasks with proper prompts? We offer an affirmative response to this question by proposing a simple yet effective method, validated through ImageNet classification. Specifically, we caption each real image with an advanced captioning model to obtain informative and faithful prompts that extract class-relevant information and clarify the polysemy of class names. The image captions and class names are concatenated to prompt generative models for training image synthesis. We show that this simple caption incorporation significantly boosts the informativeness of synthetic data, thereby enhancing downstream model generalization. More importantly, besides improvements in data augmentation and privacy preservation, our experiments demonstrate that synthesized images can exceed real data in terms of out-of-distribution robustness.
Article 641
Title@2025-06-23 (1): Simple and Critical Iterative Denoising: A Recasting of Discrete Diffusion in Graph Generation
Title: Simple and Critical Iterative Denoising: A Recasting of Discrete Diffusion in Graph Generation | Einfaches und kritisches iteratives Denoisieren: Eine Neuformulierung von diskreter Diffusion in der Graphengenerierung | 简单和关键迭代代代代代代:图生成中分辨扩散的重新定性 2503.21592v2 |
Authors (1): Yoann Boget
Discrete Diffusion and Flow Matching models have significantly advanced generative modeling for discrete structures, including graphs. However, the dependencies between intermediate noisy states lead to error accumulation and propagation during the reverse denoising process, a phenomenon known as compounding denoising errors. To address this problem, we propose a novel framework called Simple Iterative Denoising, which simplifies discrete diffusion and circumvents the issue by assuming conditional independence between intermediate states. Additionally, we enhance our model by incorporating a Critic. During generation, the Critic selectively retains or corrupts elements in an instance based on their likelihood under the data distribution. Our empirical evaluations demonstrate that the proposed method significantly outperforms existing discrete diffusion baselines in graph generation tasks.
nan
Article 642
Title@2025-06-23 (1): A Multi-view Divergence-Convergence Feature Augmentation Framework for Drug-related Microbes Prediction
Title: A Multi-view Divergence-Convergence Feature Augmentation Framework for Drug-related Microbes Prediction | Ein Multi-View Divergence-Convergence Feature Augmentation Framework for Drug-related Microbes Prediction | 与药物有关的微生物预测多视图差异-信念-特征增强框架 2506.18797v1 |
Authors (6): Xin An, Ruijie Li, Qiao Ning, Shikai Guo, Hui Li, Qian Ma
In the study of drug function and precision medicine, identifying new drug-microbe associations is crucial. However, current methods analyze drug-microbe associations and similarities in isolation, lacking effective inter-view optimization and coordinated multi-view feature fusion. In our study, a multi-view Divergence-Convergence Feature Augmentation framework for Drug-related Microbes Prediction (DCFA_DMP) is proposed to better learn and integrate association information and similarity information. In the divergence phase, DCFA_DMP strengthens the complementarity and diversity between heterogeneous information and similarity information by applying adversarial learning between the association network view and the different similarity views, optimizing the feature space. In the convergence phase, a novel Bidirectional Synergistic Attention Mechanism is proposed to deeply synergize the complementary features between different views, achieving a deep fusion of the feature space. Moreover, Transformer graph learning is alternately applied on the drug-microbe heterogeneous graph, enabling each drug or microbe node to focus on the most relevant nodes. Numerous experiments demonstrate DCFA_DMP’s strong performance in predicting drug-microbe associations. It also proves effective in predicting associations for new drugs and microbes in cold-start experiments, further confirming its stability and reliability in predicting potential drug-microbe associations.
nan
Article 643
Title@2025-06-23 (1): Focus Your Attention: Towards Data-Intuitive Lightweight Vision Transformers
Title: Focus Your Attention: Towards Data-Intuitive Lightweight Vision Transformers | Fokussieren Sie Ihre Aufmerksamkeit: Auf dem Weg zu datenintuitiven Leichtbautransformatoren | 关注焦点:面向数据直观的轻量级视觉变异器 2506.18791v1 |
Authors (4): Suyash Gaurav, Muhammad Farhan Humayun, Jukka Heikkonen, Jatin Chaudhary
The evolution of Vision Transformers has led to their widespread adaptation to different domains. Despite large-scale success, there remain significant challenges including their reliance on extensive computational and memory resources for pre-training on huge datasets as well as difficulties in task-specific transfer learning. These limitations coupled with energy inefficiencies mainly arise due to the computation-intensive self-attention mechanism. To address these issues, we propose a novel Super-Pixel Based Patch Pooling (SPPP) technique that generates context-aware, semantically rich, patch embeddings to effectively reduce the architectural complexity and improve efficiency. Additionally, we introduce the Light Latent Attention (LLA) module in our pipeline by integrating latent tokens into the attention mechanism allowing cross-attention operations to significantly reduce the time and space complexity of the attention module. By leveraging the data-intuitive patch embeddings coupled with dynamic positional encodings, our approach adaptively modulates the cross-attention process to focus on informative regions while maintaining the global semantic structure. This targeted attention improves training efficiency and accelerates convergence. Notably, the SPPP module is lightweight and can be easily integrated into existing transformer architectures. Extensive experiments demonstrate that our proposed architecture provides significant improvements in terms of computational efficiency while achieving comparable results with the state-of-the-art approaches, highlighting its potential for energy-efficient transformers suitable for edge deployment. (The code is available on our GitHub repository: https://github.com/zser092/Focused-Attention-ViT).
nan
Article 644
Title@2025-06-23 (1): Learning to Insert for Constructive Neural Vehicle Routing Solver
Title: Learning to Insert for Constructive Neural Vehicle Routing Solver | Einfügen lernen für konstruktive Neural Vehicle Routing Solver | 用于建设型神经车辆路标解答器的“插入学习” 2505.13904v2 |
Authors (7): Fu Luo, Xi Lin, Mengyuan Zhong, Fei Liu, Zhenkun Wang, Jianyong Sun, Qingfu Zhang
Neural Combinatorial Optimisation (NCO) is a promising learning-based approach for solving Vehicle Routing Problems (VRPs) without extensive manual design. While existing constructive NCO methods typically follow an appending-based paradigm that sequentially adds unvisited nodes to partial solutions, this rigid approach often leads to suboptimal results. To overcome this limitation, we explore the insertion-based paradigm and propose Learning to Construct with Insertion-based Paradigm (L2C-Insert), a novel learning-based method for constructive NCO. Unlike traditional approaches, L2C-Insert builds solutions by strategically inserting unvisited nodes at any valid position in the current partial solution, which can significantly enhance flexibility and solution quality. The proposed framework introduces three key components: a novel model architecture for precise insertion position prediction, an efficient training scheme for model optimization, and an advanced inference technique that fully exploits the insertion paradigm’s flexibility. Extensive experiments on both synthetic and real-world instances of the Travelling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) demonstrate that L2C-Insert consistently achieves superior performance across various problem sizes.
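To make the insertion-based construction concrete, here is a minimal sketch of the classical cheapest-insertion heuristic for the TSP. It is not L2C-Insert: the learned model that predicts insertion positions is replaced by a greedy cost rule, and the function names (`tour_length`, `cheapest_insertion`) are hypothetical.

```python
import math

def tour_length(points, tour):
    """Length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def cheapest_insertion(points):
    """Build a tour by inserting each unvisited node at its cheapest position.

    Assumption: a greedy cost rule stands in for the learned policy
    described in the abstract.
    """
    n = len(points)
    if n <= 2:
        return list(range(n))
    tour = [0, 1]
    unvisited = set(range(2, n))
    while unvisited:
        best = None  # (added length, node, insert position)
        for node in sorted(unvisited):
            for pos in range(len(tour)):
                a, b = tour[pos], tour[(pos + 1) % len(tour)]
                delta = (math.dist(points[a], points[node])
                         + math.dist(points[node], points[b])
                         - math.dist(points[a], points[b]))
                if best is None or delta < best[0]:
                    best = (delta, node, pos + 1)
        _, node, pos = best
        tour.insert(pos, node)  # any position in the partial tour is allowed
        unvisited.remove(node)
    return tour

# Corners of the unit square, listed in a deliberately non-tour order.
square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
square_tour = cheapest_insertion(square)
print(square_tour, tour_length(square, square_tour))
```

An appending-based constructor can only extend the end of the partial tour, whereas the loop above considers every edge of the current tour as a candidate insertion point.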
nan
Article 645
Title@2025-06-23 (1): Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning
Title: Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning | Shift Happens: Mischung aus Experten basierende kontinuierliche Anpassung im Federated Learning | 变化发生:基于专家的混合组合,在联邦学习中持续适应 2506.18789v1 |
Authors (4): Rahul Atul Bhope, K. R. Jayaram, Praveen Venkateswaran, Nalini Venkatasubramanian
Federated Learning (FL) enables collaborative model training across decentralized clients without sharing raw data, yet faces significant challenges in real-world settings where client data distributions evolve dynamically over time. This paper tackles the critical problem of covariate and label shifts in streaming FL environments, where non-stationary data distributions degrade model performance and require adaptive middleware solutions. We introduce ShiftEx, a shift-aware mixture-of-experts framework that dynamically creates and trains specialized global models in response to detected distribution shifts, using Maximum Mean Discrepancy for covariate shifts. The framework employs a latent memory mechanism for expert reuse and implements facility location-based optimization to jointly minimize covariate mismatch, expert creation costs, and label imbalance. Through theoretical analysis and comprehensive experiments on benchmark datasets, we demonstrate 5.5-12.9 percentage point accuracy improvements and 22-95% faster adaptation compared to state-of-the-art FL baselines across diverse shift scenarios. The proposed approach offers a scalable, privacy-preserving middleware solution for FL systems operating in non-stationary, real-world conditions while minimizing communication and computational overhead.
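As an illustration of the Maximum Mean Discrepancy statistic mentioned above, the following sketch computes the biased (V-statistic) estimate of squared MMD with an RBF kernel. The kernel choice, the bandwidth, the sample sizes and the function names are assumptions made for this example. How ShiftEx thresholds or applies the statistic is not specified here.

```python
import math
import random

def rbf_kernel(x, y, bandwidth):
    """Gaussian RBF kernel between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        return sum(rbf_kernel(u, v, bandwidth) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def gaussian_sample(n, mean, rng):
    """n two-dimensional points with unit-variance Gaussian coordinates."""
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]

rng = random.Random(0)
reference = gaussian_sample(100, 0.0, rng)   # data seen so far
same_dist = gaussian_sample(100, 0.0, rng)   # new batch, no shift
shifted = gaussian_sample(100, 2.0, rng)     # new batch, covariate shift

mmd_same = mmd_squared(reference, same_dist)
mmd_shifted = mmd_squared(reference, shifted)
print(mmd_same, mmd_shifted)
```

A batch drawn from the reference distribution gives a value near zero, while the mean-shifted batch gives a clearly larger one. That gap is what a shift detector would threshold on.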
nan
Article 646
Title@2025-06-23 (1): A generalized neural tangent kernel for surrogate gradient learning
Title: A generalized neural tangent kernel for surrogate gradient learning | Ein generalisierter neuronaler Tangentenkern für das Erlernen von Surrogatgradienten | 用于代用梯度学习的普遍神经相近内核 2405.15539v2 |
Authors (3): Luke Eilers, Raoul-Martin Memmesheimer, Sven Goedeke
State-of-the-art neural network training methods depend on the gradient of the network function. Therefore, they cannot be applied to networks whose activation functions do not have useful derivatives, such as binary and discrete-time spiking neural networks. To overcome this problem, the activation function’s derivative is commonly substituted with a surrogate derivative, giving rise to surrogate gradient learning (SGL). This method works well in practice but lacks a theoretical foundation. The neural tangent kernel (NTK) has proven successful in the analysis of gradient descent. Here, we provide a generalization of the NTK, which we call the surrogate gradient NTK, that enables the analysis of SGL. First, we study a naive extension of the NTK to activation functions with jumps, demonstrating that gradient descent for such activation functions is also ill-posed in the infinite-width limit. To address this problem, we generalize the NTK to gradient descent with surrogate derivatives, i.e., SGL. We carefully define this generalization and extend the existing key theorems on the NTK with mathematical rigor. Further, we illustrate our findings with numerical experiments. Finally, we numerically compare SGL in finite-width networks with a sign activation function to kernel regression with the surrogate gradient NTK; the results confirm that the surrogate gradient NTK provides a good characterization of SGL.
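The substitution at the heart of SGL can be shown on a single neuron. The sketch below is a toy under stated assumptions: a sign activation in the forward pass, the derivative of tanh(beta * z) as the surrogate in the backward pass, and a squared-error loss. None of these choices, nor the function names, are taken from the paper.

```python
import math

def sign(z):
    """Forward activation; its true derivative is zero almost everywhere."""
    return 1.0 if z >= 0.0 else -1.0

def surrogate_derivative(z, beta=2.0):
    """Derivative of tanh(beta * z), used in place of the derivative of sign."""
    return beta * (1.0 - math.tanh(beta * z) ** 2)

def train_sign_neuron(data, lr=0.1, epochs=20):
    """Fit y = sign(w * x + b) by gradient descent with a surrogate derivative."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            z = w * x + b
            error = sign(z) - y  # derivative of 0.5 * (out - y)^2 w.r.t. out
            grad_z = error * surrogate_derivative(z)  # surrogate chain rule
            w -= lr * grad_z * x
            b -= lr * grad_z
    return w, b

data = [(-2.0, -1.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train_sign_neuron(data)
predictions = [sign(w * x + b) for x, _ in data]
print(w, b, predictions)
```

With the true derivative of `sign`, `grad_z` would be zero at every step and the parameters would never move. The surrogate is what lets the error signal through.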
nan
Article 647
Title@2025-06-23 (1): Reasoning Limitations of Multimodal Large Language Models. A Case Study of Bongard Problems
Title: Reasoning Limitations of Multimodal Large Language Models. A Case Study of Bongard Problems | Begründung von Einschränkungen multimodaler Großsprachenmodelle. Eine Fallstudie zu Bongard-Problemen | 多种多式大语言模型的理由限制,邦格问题案例研究 2411.01173v2 |
Authors (3): Mikołaj Małkiński, Szymon Pawlonka, Jacek Mańdziuk
Abstract visual reasoning (AVR) involves discovering shared concepts across images through analogy, akin to solving IQ test problems. Bongard Problems (BPs) remain a key challenge in AVR, requiring both visual reasoning and verbal description. We investigate whether multimodal large language models (MLLMs) can solve BPs by formulating a set of diverse MLLM-suited solution strategies and testing $4$ proprietary and $4$ open-access models on $3$ BP datasets featuring synthetic (classic BPs) and real-world (Bongard-HOI and Bongard-OpenWorld) images. Despite some successes on real-world datasets, MLLMs struggle with synthetic BPs. To explore this gap, we introduce Bongard-RWR, a dataset representing synthetic BP concepts using real-world images. Our findings suggest that weak MLLM performance on classical BPs is not due to domain specificity, but rather stems from their general AVR limitations. Code and dataset are available at: https://github.com/pavonism/bongard-rwr
nan
Article 648
Title@2025-06-23 (1): The Impact of Input Order Bias on Large Language Models for Software Fault Localization
Title: The Impact of Input Order Bias on Large Language Models for Software Fault Localization | Die Auswirkungen der Eingabereihenfolge Bias auf große Sprachmodelle für Softwarefehlerlokalisierung | 输入顺序对软件失错本地化大语言模式的影响 2412.18750v3 |
Authors (4): Md Nakhla Rafi, Dong Jae Kim, Tse-Hsun Chen, Shaowei Wang
Large Language Models (LLMs) have shown significant potential in software engineering tasks such as Fault Localization (FL) and Automatic Program Repair (APR). This study investigates how input order and context size influence LLM performance in FL, a crucial step for many downstream software engineering tasks. We evaluate different method orderings using Kendall Tau distances, including “perfect” (where ground truths appear first) and “worst” (where ground truths appear last), across two benchmarks containing Java and Python projects. Our results reveal a strong order bias: in Java projects, Top-1 FL accuracy drops from 57% to 20% when reversing the order, while in Python projects, it decreases from 38% to approximately 3%. However, segmenting inputs into smaller contexts mitigates this bias, reducing the performance gap in FL from 22% and 6% to just 1% across both benchmarks. We replaced method names with semantically meaningful alternatives to determine whether this bias is due to data leakage. The observed trends remained consistent, suggesting that the bias is not caused by memorization from training data but rather by the inherent effect of input order. Additionally, we explored ordering methods based on traditional FL techniques and metrics, finding that DepGraph’s ranking achieves 48% Top-1 accuracy, outperforming simpler approaches such as CallGraph(DFS). These findings highlight the importance of structuring inputs, managing context effectively, and selecting appropriate ordering strategies to enhance LLM performance in FL and other software engineering applications.
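For readers unfamiliar with the metric, the Kendall tau distance between two orderings counts the item pairs that appear in opposite relative order. The sketch below is a plain implementation with hypothetical method names as list items. How the study samples orderings at intermediate distances is not shown.

```python
from itertools import combinations

def kendall_tau_distance(order_a, order_b):
    """Number of item pairs ranked in opposite order by the two lists."""
    position_in_b = {item: index for index, item in enumerate(order_b)}
    discordant = 0
    for first, second in combinations(order_a, 2):  # first precedes second in order_a
        if position_in_b[first] > position_in_b[second]:
            discordant += 1
    return discordant

def normalized_kendall_tau_distance(order_a, order_b):
    """Distance scaled to [0, 1]; 0 means identical, 1 means fully reversed."""
    n = len(order_a)
    total_pairs = n * (n - 1) // 2
    return kendall_tau_distance(order_a, order_b) / total_pairs if total_pairs else 0.0

# Hypothetical method list with the faulty method ranked first ("perfect" order).
perfect = ["parse", "validate", "render", "flush"]
worst = list(reversed(perfect))
one_swap = ["validate", "parse", "render", "flush"]

print(kendall_tau_distance(perfect, worst))
print(normalized_kendall_tau_distance(perfect, one_swap))
```

On this scale the "perfect" and "worst" orderings described in the abstract sit at the two extremes, and intermediate orderings fall in between.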
nan
Article 649
Title@2025-06-23 (1): Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training
Title: Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training | Programmierung durch Backprop: LLMs erwerben wiederverwendbare algorithmische Abstraktionen während des Code-Trainings | 通过反向传播编程:LLM 在代码训练中获得可复用的算法抽象 2506.18777v1 |
Authors (7): Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktäschel, Jakob Foerster, Laura Ruis
Training large language models (LLMs) on source code significantly enhances their general-purpose reasoning abilities, but the mechanisms underlying this generalisation are poorly understood. In this paper, we propose Programming by Backprop (PBB) as a potential driver of this effect - teaching a model to evaluate a program for inputs by training on its source code alone, without ever seeing I/O examples. To explore this idea, we finetune LLMs on two sets of programs representing simple maths problems and algorithms: one with source code and I/O examples (w/ IO), the other with source code only (w/o IO). We find evidence that LLMs have some ability to evaluate w/o IO programs for inputs in a range of experimental settings, and make several observations. Firstly, PBB works significantly better when programs are provided as code rather than semantically equivalent language descriptions. Secondly, LLMs can produce outputs for w/o IO programs directly, by implicitly evaluating the program within the forward pass, and more reliably when stepping through the program in-context via chain-of-thought. We further show that PBB leads to more robust evaluation of programs across inputs than training on I/O pairs drawn from a distribution that mirrors naturally occurring data. Our findings suggest a mechanism for enhanced reasoning through code training: it allows LLMs to internalise reusable algorithmic abstractions. Significant scope remains for future work to enable LLMs to more effectively learn from symbolic procedures, and progress in this direction opens other avenues like model alignment by training on formal constitutional principles.
nan
Article 650
Title@2025-06-23 (1): Fast Bayesian Optimization of Function Networks with Partial Evaluations
Title: Fast Bayesian Optimization of Function Networks with Partial Evaluations | Schnelle Bayesian Optimierung von Funktionsnetzwerken mit teilweisen Bewertungen | 利用部分评价优化功能网络 2506.11456v2 |
Authors (2): Poompol Buathong, Peter I. Frazier
Bayesian optimization of function networks (BOFN) is a framework for optimizing expensive-to-evaluate objective functions structured as networks, where some nodes’ outputs serve as inputs for others. Many real-world applications, such as manufacturing and drug discovery, involve function networks with additional properties - nodes that can be evaluated independently and incur varying costs. A recent BOFN variant, p-KGFN, leverages this structure and enables cost-aware partial evaluations, selectively querying only a subset of nodes at each iteration. p-KGFN reduces the number of expensive objective function evaluations needed but has a large computational overhead: choosing where to evaluate requires optimizing a nested Monte Carlo-based acquisition function for each node in the network. To address this, we propose an accelerated p-KGFN algorithm that reduces computational overhead with only a modest loss in query efficiency. Key to our approach is generation of node-specific candidate inputs for each node in the network via one inexpensive global Monte Carlo simulation. Numerical experiments show that our method maintains competitive query efficiency while achieving up to a 16x speedup over the original p-KGFN algorithm.
nan
Article 651
Title@2025-06-23 (1): DPG loss functions for learning parameter-to-solution maps by neural networks
Title: DPG loss functions for learning parameter-to-solution maps by neural networks | DPG-Verlustfunktionen für das Lernen von Parameter-zu-Lösung-Karten durch neuronale Netzwerke | 神经网络学习参数图解图的DPG损失函数 2506.18773v1 |
Authors (3): Pablo Cortés Castillo, Wolfgang Dahmen, Jay Gopalakrishnan
We develop, analyze, and experimentally explore residual-based loss functions for machine learning of parameter-to-solution maps in the context of parameter-dependent families of partial differential equations (PDEs). Our primary concern is rigorous accuracy certification to enhance the prediction capability of the resulting deep neural network reduced models. This is achieved by the use of variationally correct loss functions. Through one specific example of an elliptic PDE, details for establishing the variational correctness of a loss function from an ultraweak Discontinuous Petrov-Galerkin (DPG) discretization are worked out. Despite the focus on the example, the proposed concepts apply to a much wider scope of problems, namely problems for which stable DPG formulations are available. The issue of high-contrast diffusion fields and the ensuing difficulties with degrading ellipticity are discussed. Both numerical results and theoretical arguments illustrate that for high-contrast diffusion parameters the proposed DPG loss functions deliver much more robust performance than simpler least-squares losses.
nan
Article 652
Title@2025-06-23 (1): Neural Total Variation Distance Estimators for Changepoint Detection in News Data
Title: Neural Total Variation Distance Estimators for Changepoint Detection in News Data | Neurale Gesamtvariationsdistanz-Schätzer für Changepoint Detection in News Daten | 用于新闻数据中变化点探测变化点的神经总变化 2506.18764v1 |
Authors (3): Csaba Zsolnai, Niels Lörch, Julian Arnold
Detecting when public discourse shifts in response to major events is crucial for understanding societal dynamics. Real-world data is high-dimensional, sparse, and noisy, making changepoint detection in this domain a challenging endeavor. In this paper, we leverage neural networks for changepoint detection in news data, introducing a method based on the so-called learning-by-confusion scheme, which was originally developed for detecting phase transitions in physical systems. We train classifiers to distinguish between articles from different time periods. The resulting classification accuracy is used to estimate the total variation distance between underlying content distributions, where significant distances highlight changepoints. We demonstrate the effectiveness of this method on both synthetic datasets and real-world data from The Guardian newspaper, successfully identifying major historical events including 9/11, the COVID-19 pandemic, and presidential elections. Our approach requires minimal domain knowledge, can autonomously discover significant shifts in public discourse, and yields a quantitative measure of change in content, making it valuable for journalism, policy analysis, and crisis monitoring.
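The link between classification accuracy and total variation (TV) distance rests on a standard identity: with two equally likely classes, the best achievable accuracy equals (1 + TV) / 2. So an observed accuracy `acc` yields the estimate 2 * acc - 1, which is a lower bound when the classifier is suboptimal. The sketch below checks the identity on two hypothetical discrete topic distributions. The paper's exact estimator may differ, and the names and numbers here are illustrative only.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in outcomes)

def bayes_optimal_accuracy(p, q):
    """Best accuracy for telling p from q when both periods are equally likely.

    The optimal rule assigns each outcome to the period giving it more mass.
    """
    outcomes = set(p) | set(q)
    return 0.5 * sum(max(p.get(k, 0.0), q.get(k, 0.0)) for k in outcomes)

def tv_from_accuracy(accuracy):
    """TV estimate implied by a balanced two-class accuracy."""
    return max(0.0, 2.0 * accuracy - 1.0)

# Hypothetical topic shares before and after a candidate changepoint.
before = {"politics": 0.7, "health": 0.3}
after = {"politics": 0.2, "health": 0.8}

accuracy = bayes_optimal_accuracy(before, after)
print(accuracy, tv_from_accuracy(accuracy), total_variation(before, after))
```

When the two periods have the same content distribution, the best accuracy is 0.5 and the estimate is 0. A peak in the estimate across candidate split times marks a changepoint.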
nan
Article 653
Title@2025-06-23 (1): Local Averaging Accurately Distills Manifold Structure From Noisy Data
Title: Local Averaging Accurately Distills Manifold Structure From Noisy Data | Lokale Mittelung genau destilliert manifold Struktur aus geräuschreichen Daten | 从噪音数据生成的本地蒸馏处理结构 2506.18761v1 |
Authors (5): Yihan Shen, Shiyu Wang, Arnaud Lamy, Mariam Avagyan, John Wright
High-dimensional data are ubiquitous, with examples ranging from natural images to scientific datasets, and often reside near low-dimensional manifolds. Leveraging this geometric structure is vital for downstream tasks, including signal denoising, reconstruction, and generation. However, in practice, the manifold is typically unknown and only noisy samples are available. A fundamental approach to uncovering the manifold structure is local averaging, which is a cornerstone of state-of-the-art provable methods for manifold fitting and denoising. However, to the best of our knowledge, there are no works that rigorously analyze the accuracy of local averaging in a manifold setting in high-noise regimes. In this work, we provide theoretical analyses of a two-round mini-batch local averaging method applied to noisy samples drawn from a $d$-dimensional manifold $\mathcal M \subset \mathbb{R}^D$, under a relatively high-noise regime where the noise size is comparable to the reach $\tau$. We show that with high probability, the averaged point $\hat{\mathbf q}$ achieves the bound $d(\hat{\mathbf q}, \mathcal M) \leq \sigma \sqrt{d\left(1+\frac{\kappa\,\mathrm{diam}(\mathcal M)}{\log(D)}\right)}$, where $\sigma$, $\mathrm{diam}(\mathcal M)$, and $\kappa$ denote the standard deviation of the Gaussian noise, the manifold’s diameter, and a bound on its extrinsic curvature, respectively. This is the first analysis of local averaging accuracy over the manifold in the relatively high-noise regime where $\sigma \sqrt{D} \approx \tau$. The proposed method can serve as a preprocessing step for a wide range of provable methods designed for lower-noise regimes. Additionally, our framework can provide a theoretical foundation for a broad spectrum of denoising and dimensionality reduction methods that rely on local averaging techniques.
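A small numerical sketch can show why local averaging pulls noisy samples toward the manifold. It uses a single averaging round on a unit circle embedded in a 10-dimensional space, which is a simplification of the two-round mini-batch procedure analyzed in the paper. The manifold, noise level, radius and function names are all assumptions chosen for illustration.

```python
import math
import random

def sample_noisy_circle(n, ambient_dim, sigma, rng):
    """Points on the unit circle in the first two coordinates, plus Gaussian noise."""
    points = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        clean = [math.cos(theta), math.sin(theta)] + [0.0] * (ambient_dim - 2)
        points.append([c + rng.gauss(0.0, sigma) for c in clean])
    return points

def distance_to_circle(point):
    """Euclidean distance from a point to the embedded unit circle."""
    radial_gap = math.hypot(point[0], point[1]) - 1.0
    return math.sqrt(radial_gap ** 2 + sum(c * c for c in point[2:]))

def local_average(query, points, radius):
    """Mean of all samples within the given radius of the query point."""
    neighbors = [p for p in points if math.dist(query, p) <= radius]
    return [sum(p[i] for p in neighbors) / len(neighbors) for i in range(len(query))]

rng = random.Random(0)
samples = sample_noisy_circle(n=1000, ambient_dim=10, sigma=0.1, rng=rng)
queries = samples[:20]  # each query is itself a sample, so neighbors is never empty

raw_error = sum(distance_to_circle(q) for q in queries) / len(queries)
averaged_error = sum(
    distance_to_circle(local_average(q, samples, radius=0.7)) for q in queries
) / len(queries)
print(raw_error, averaged_error)
```

Averaging cancels most of the noise in directions normal to the circle, at the cost of a small inward bias from curvature. The bound stated above controls a trade-off of this kind in the high-noise regime.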
nan
Article 654
Title@2025-06-23 (1): Robust Anomaly Detection in Network Traffic: Evaluating Machine Learning Models on CICIDS2017
Title: Robust Anomaly Detection in Network Traffic: Evaluating Machine Learning Models on CICIDS2017 | Robuste Anomalieerkennung im Netzwerkverkehr: Bewertung von Machine Learning-Modellen auf CICIDS2017 | 网络交通中的强力异常探测:评价CICIDS2017的机械学习模式 2506.19877v1 |
Authors (2): Zhaoyang Xu, Yunbo Liu
Identifying suitable machine learning paradigms for intrusion detection remains critical for building effective and generalizable security solutions. In this study, we present a controlled comparison of four representative models - Multi-Layer Perceptron (MLP), 1D Convolutional Neural Network (CNN), One-Class Support Vector Machine (OCSVM) and Local Outlier Factor (LOF) - on the CICIDS2017 dataset under two scenarios: detecting known attack types and generalizing to previously unseen threats. Our results show that supervised MLP and CNN achieve near-perfect accuracy on familiar attacks but suffer drastic recall drops on novel attacks. Unsupervised LOF attains moderate overall accuracy and high recall on unknown threats at the cost of elevated false alarms, while boundary-based OCSVM balances precision and recall best, demonstrating robust detection across both scenarios. These findings offer practical guidance for selecting IDS models in dynamic network environments.
nan
Article 655
Title@2025-06-23 (1): SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
Title: SEAL: Scaling to Emphasize Attention for Long-Context Retrieval | SEAL: Skalierung zur Betonung der Aufmerksamkeit für die Langzeitretrieval-Retrieval | SEAL: 逐步强调对长期检索的重视 2501.15225v2 |
Authors (5): Changhun Lee, Minsang Seok, Jun-gyu Jin, Younghyun Cho, Eunhyeok Park
While many advanced LLMs are designed to handle long sequence data, we can still observe notable quality degradation even within the sequence limit. In this work, we introduce a novel approach called Scaling to Emphasize Attention for Long-context retrieval (SEAL), which enhances the retrieval performance of large language models (LLMs) over long contexts. We observe that specific attention heads are closely tied to long-context retrieval, showing positive or negative correlation with retrieval scores, and adjusting the strength of these heads boosts the quality of LLMs in long context by a large margin. Built on this insight, we propose a learning-based mechanism that leverages generated data to emphasize these heads. By applying SEAL, we achieve significant improvements in long-context retrieval performance across various tasks and models. Additionally, when combined with existing training-free context extension techniques, SEAL extends the contextual limits of LLMs while maintaining highly reliable outputs.
nan
Article 656
Title@2025-06-23 (1): Sensitivity Analysis of Image Classification Models using Generalized Polynomial Chaos
Title: Sensitivity Analysis of Image Classification Models using Generalized Polynomial Chaos | Sensitivitätsanalyse von Bildklassifikationsmodellen mit Generalized Polynomial Chaos | 利用普遍化的多面性混乱现象分析图像分类模型的敏感性分析 2506.18751v1 |
Authors (5): Lukas Bahr, Lucas Poßner, Konstantin Weise, Sophie Gröger, Rüdiger Daub
Integrating advanced communication protocols in production has accelerated the adoption of data-driven predictive quality methods, notably machine learning (ML) models. However, ML models in image classification often face significant uncertainties arising from model, data, and domain shifts. These uncertainties lead to overconfidence in the classification model’s output. To better understand these models, sensitivity analysis can help to analyze the relative influence of input parameters on the output. This work investigates the sensitivity of image classification models used for predictive quality. We propose modeling the distributional domain shifts of inputs with random variables and quantifying their impact on the model’s outputs using Sobol indices computed via generalized polynomial chaos (GPC). This approach is validated through a case study involving a welding defect classification problem, utilizing a fine-tuned ResNet18 model and an emblem classification model used in BMW Group production facilities.
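To illustrate how Sobol indices fall out of a polynomial chaos expansion, the sketch below projects a toy function of two independent standard normal inputs onto the first-order Hermite polynomials (He1(x) = x) by Monte Carlo. It then reads each first-order index as the squared coefficient divided by the output variance. The toy `model`, the degree-1 truncation and the sampling-based projection are assumptions for this example, not the paper's image-classification setup.

```python
import random

def model(x1, x2):
    """Hypothetical scalar model: linear effects plus a small interaction."""
    return 3.0 * x1 + 1.0 * x2 + 0.5 * x1 * x2

def first_order_sobol(f, n_samples=200_000, seed=0):
    """First-order Sobol indices from a degree-1 Hermite chaos projection."""
    rng = random.Random(seed)
    sum_f = sum_f2 = sum_f_x1 = sum_f_x2 = 0.0
    for _ in range(n_samples):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = f(x1, x2)
        sum_f += y
        sum_f2 += y * y
        sum_f_x1 += y * x1  # projection onto He1(x1) = x1
        sum_f_x2 += y * x2  # projection onto He1(x2) = x2
    mean = sum_f / n_samples
    variance = sum_f2 / n_samples - mean ** 2
    c1 = sum_f_x1 / n_samples
    c2 = sum_f_x2 / n_samples
    return c1 ** 2 / variance, c2 ** 2 / variance

s1, s2 = first_order_sobol(model)
print(s1, s2, 1.0 - s1 - s2)
```

For this toy model the exact output variance is 9 + 1 + 0.25 = 10.25, so the exact first-order indices are 9/10.25 and 1/10.25. The remainder is the share attributable to the interaction term.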
nan
Article 657
Title@2025-06-23 (1): ContinualFlow: Learning and Unlearning with Neural Flow Matching
Title: ContinualFlow: Learning and Unlearning with Neural Flow Matching | ContinualFlow: Lernen und Nichtlernen mit neuralem Fluss passend | 连续花:与神经流动匹配学习和不学习 2506.18747v1 |
Authors (3): Lorenzo Simone, Davide Bacciu, Shuangge Ma
We introduce ContinualFlow, a principled framework for targeted unlearning in generative models via Flow Matching. Our method leverages an energy-based reweighting loss to softly subtract undesired regions of the data distribution without retraining from scratch or requiring direct access to the samples to be unlearned. Instead, it relies on energy-based proxies to guide the unlearning process. We prove that this induces gradients equivalent to Flow Matching toward a soft mass-subtracted target, and validate the framework through experiments on 2D and image domains, supported by interpretable visualizations and quantitative evaluations.
nan
Article 658
Title@2025-06-23 (1): Fast State-Augmented Learning for Wireless Resource Allocation with Dual Variable Regression
Title: Fast State-Augmented Learning for Wireless Resource Allocation with Dual Variable Regression | Schnelles State-Augmented-Lernen für drahtlose Ressourcenallokation mit Dual Variable Regression | 以双重变量递减为无线资源分配快速国家强化学习 2506.18748v1 |
Authors (4): Yigit Berkay Uslu, Navid NaderiAlizadeh, Mark Eisen, Alejandro Ribeiro
We consider resource allocation problems in multi-user wireless networks, where the goal is to optimize a network-wide utility function subject to constraints on the ergodic average performance of users. We demonstrate how a state-augmented graph neural network (GNN) parametrization for the resource allocation policy circumvents the drawbacks of the ubiquitous dual subgradient methods by representing the network configurations (or states) as graphs and treating dual variables as dynamic inputs to the model, in the form of graph signals supported over the graphs. Lagrangian-maximizing state-augmented policies are learned during the offline training phase, and the dual variables evolve through gradient updates while the learned state-augmented policies are executed during the inference phase. Our main contributions are to illustrate how near-optimal initialization of dual multipliers for faster inference can be accomplished with dual variable regression, leveraging a secondary GNN parametrization, and how maximization of the Lagrangian over the multipliers sampled from the dual descent dynamics substantially improves the training of state-augmented models. We demonstrate the superior performance of the proposed algorithm with extensive numerical experiments in a case study of transmit power control. Finally, we prove a convergence result and an exponential probability bound on the excursions of the dual function (iterate) optimality gaps.
nan
Article 659
Title@2025-06-23 (1): DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation
Title: DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation | DiffDesign: Steuerbare Diffusion mit Meta Prior für effiziente Interior Design Generation | DiffDign: 有效内部设计设计前可控制的Meta扩散 2411.16301v3 |
Authors (2): Yuxuan Yang, Tao Geng
Interior design is a complex and creative discipline involving aesthetics, functionality, ergonomics, and materials science. Effective solutions must meet diverse requirements, typically producing multiple deliverables such as renderings and design drawings from various perspectives. Consequently, interior design processes are often inefficient and demand significant creativity. With advances in machine learning, generative models have emerged as a promising means of improving efficiency by creating designs from text descriptions or sketches. However, few generative works focus on interior design, leading to substantial discrepancies between outputs and practical needs, such as differences in size, spatial scope, and the lack of controllable generation quality. To address these challenges, we propose DiffDesign, a controllable diffusion model with meta priors for efficient interior design generation. Specifically, we utilize the generative priors of a 2D diffusion model pre-trained on a large image dataset as our rendering backbone. We further guide the denoising process by disentangling cross-attention control over design attributes, such as appearance, pose, and size, and introduce an optimal transfer-based alignment module to enforce view consistency. Simultaneously, we construct an interior design-specific dataset, DesignHelper, consisting of over 400 solutions across more than 15 spatial types and 15 design styles. This dataset helps fine-tune DiffDesign. Extensive experiments conducted on various benchmark datasets demonstrate the effectiveness and robustness of DiffDesign.
nan
Article 660
Title@2025-06-23 (1): Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments
Title: Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments | Experimentieren, schnell und langsam: Bayesische Optimierung langfristiger Ergebnisse mit Online-Experimenten | 实验、快速和缓慢:利用在线实验优化长期成果 2506.18744v1 |
Authors (5): Qing Feng, Samuel Dalton, Benjamin Letham, Maximilian Balandat, Eytan Bakshy
Online experiments in internet systems, also known as A/B tests, are used for a wide range of system tuning problems, such as optimizing recommender system ranking policies and learning adaptive streaming controllers. Decision-makers generally wish to optimize for long-term treatment effects of the system changes, which often requires running experiments for a long time as short-term measurements can be misleading due to non-stationarity in treatment effects over time. The sequential experimentation strategies, which typically involve several iterations, can be prohibitively long in such cases. We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) and/or offline proxies (e.g., off-policy evaluation) with long-running, slow experiments to perform sequential, Bayesian optimization over large action spaces in a short amount of time.
nan
Article 661
Title@2025-06-23 (1): On the Existence of Universal Simulators of Attention
Title: On the Existence of Universal Simulators of Attention | Über die Existenz universeller Simulatoren der Aufmerksamkeit | 全世界关注模拟器的存在 2506.18739v1 |
Authors (4): Debanjan Dutta, Faizanuddin Ansari, Anish Chakrabarty, Swagatam Das
Prior work on the learnability of transformers has established their capacity to approximate specific algorithmic patterns through training under restrictive architectural assumptions. Fundamentally, these arguments remain data-driven and therefore can only provide a probabilistic guarantee. Expressivity, by contrast, has been explored theoretically to address the problems \emph{computable} by such architectures. These results proved the Turing-completeness of transformers and investigated bounds based on circuit complexity and formal logic. At the crossroads between learnability and expressivity, the question remains: \emph{can transformer architectures exactly simulate an arbitrary attention mechanism, or in particular, the underlying operations?} In this study, we investigate the transformer encoder’s ability to simulate a vanilla attention mechanism. By constructing a universal simulator $\mathcal{U}$ composed of transformer encoders, we present algorithmic solutions to identically replicate attention outputs and the underlying elementary matrix and activation operations via RASP, a formal framework for transformer computation. Our proofs, for the first time, show the existence of an algorithmically achievable data-agnostic solution, previously known to be approximated only by learning.
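For reference, the vanilla attention mechanism mentioned in the abstract is usually written as softmax(QK^T / sqrt(d)) V. The sketch below is a plain-Python illustration of that standard formula only. It is not the paper's RASP construction or its simulator $\mathcal{U}$, and the function names are chosen here for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def vanilla_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q is n_q x d, K is n_k x d, V is n_k x d_v (lists of lists).
    """
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With a zero query every key gets the same score, so the output is the plain average of the value rows. This is a quick sanity check on any implementation.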
nan
Article 662
Title@2025-06-23 (1): Towards Group Fairness with Multiple Sensitive Attributes in Federated Foundation Models
Title: Towards Group Fairness with Multiple Sensitive Attributes in Federated Foundation Models | Auf dem Weg zu Gruppengerechtigkeit mit mehreren sensiblen Attributen in Federated Foundation Models | 争取在联邦基金会模式中实现多敏感属性集团公平 2506.18732v1 |
Authors (5): Yuning Yang, Han Yu, Tianrun Gao, Xiaodong Xu, Guangyu Wang
The deep integration of foundation models (FM) with federated learning (FL) enhances personalization and scalability for diverse downstream tasks, making it crucial in sensitive domains like healthcare. Achieving group fairness has become an increasingly prominent issue in the era of federated foundation models (FFMs), since biases in sensitive attributes might lead to inequitable treatment for under-represented demographic groups. Existing studies mostly focus on achieving fairness with respect to a single sensitive attribute. This renders them unable to provide clear interpretability of dependencies among multiple sensitive attributes, which is required to achieve group fairness. Our paper makes a first attempt at a causal analysis of the relationship between group fairness across various sensitive attributes in the FFM. We extend the FFM structure to trade off multiple sensitive attributes simultaneously and quantify the causal effect behind the group fairness through causal discovery and inference. Extensive experiments validate its effectiveness, offering insights into interpretability towards building trustworthy and fair FFM systems.
nan
Article 663
Title@2025-06-23 (1): When to Forget? Complexity Trade-offs in Machine Unlearning
Title: When to Forget? Complexity Trade-offs in Machine Unlearning | Wann vergessen? Komplexität Trade-offs in Machine Unlearning | 何时忘却? 机器不学习的复杂权衡取舍 2502.17323v2 |
Authors (4): Martin Van Waerebeke, Marco Lorenzi, Giovanni Neglia, Kevin Scaman
Machine Unlearning (MU) aims at removing the influence of specific data points from a trained model, striving to achieve this at a fraction of the cost of full model retraining. In this paper, we analyze the efficiency of unlearning methods and establish the first upper and lower bounds on minimax computation times for this problem, characterizing the performance of the most efficient algorithm against the most difficult objective function. Specifically, for strongly convex objective functions and under the assumption that the forget data is inaccessible to the unlearning method, we provide a phase diagram for the unlearning complexity ratio – a novel metric that compares the computational cost of the best unlearning method to full model retraining. The phase diagram reveals three distinct regimes: one where unlearning at a reduced cost is infeasible, another where unlearning is trivial because adding noise suffices, and a third where unlearning achieves significant computational advantages over retraining. These findings highlight the critical role of factors such as data dimensionality, the number of samples to forget, and privacy constraints in determining the practical feasibility of unlearning.
nan
Article 664
Title@2025-06-23 (1): Learning interpretable positional encodings in transformers depends on initialization
Title: Learning interpretable positional encodings in transformers depends on initialization | Das Lernen interpretierbarer Positionskodierungen in Transformatoren hängt von der Initialisierung ab | 变压器中学习可解释的位置编码取决于初始化 2406.08272v4 |
Authors (6): Takuya Ito, Luca Cocchi, Tim Klinger, Parikshit Ram, Murray Campbell, Luke Hearne
In transformers, the positional encoding (PE) provides essential information that distinguishes the position and order amongst tokens in a sequence. Most prior investigations of PE effects on generalization were tailored to 1D input sequences, such as those presented in natural language, where adjacent tokens (e.g., words) are highly related. In contrast, many real world tasks involve datasets with highly non-trivial positional arrangements, such as datasets organized in multiple spatial dimensions, or datasets for which ground truth positions are not known. Here we find that the choice of initialization of a learnable PE greatly influences its ability to learn interpretable PEs that lead to enhanced generalization. We empirically demonstrate our findings in three experiments: 1) A 2D relational reasoning task; 2) A nonlinear stochastic network simulation; 3) A real world 3D neuroscience dataset, applying interpretability analyses to verify the learning of accurate PEs. Overall, we find that a learned PE initialized from a small-norm distribution can 1) uncover interpretable PEs that mirror ground truth positions in multiple dimensions, and 2) lead to improved generalization. These results illustrate the feasibility of learning identifiable and interpretable PEs for enhanced generalization.
nan
Article 665
Title@2025-06-23 (1): Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition
Title: Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition | Einschließlich semantischer Informationen über Word-Embeddings für skeletonbasierte Aktionserkennung | 包括通过单词嵌入嵌入式提供语义信息,促进基于Sleton的行动确认 2506.18721v1 |
Authors (4): Dustin Aganian, Erik Franze, Markus Eisenbach, Horst-Michael Gross
Effective human action recognition is widely used for cobots in Industry 4.0 to assist in assembly tasks. However, conventional skeleton-based methods often lose keypoint semantics, limiting their effectiveness in complex interactions. In this work, we introduce a novel approach to skeleton-based action recognition that enriches input representations by leveraging word embeddings to encode semantic information. Our method replaces one-hot encodings with semantic volumes, enabling the model to capture meaningful relationships between joints and objects. Through extensive experiments on multiple assembly datasets, we demonstrate that our approach significantly improves classification performance, and enhances generalization capabilities by simultaneously supporting different skeleton types and object classes. Our findings highlight the potential of incorporating semantic information to enhance skeleton-based action recognition in dynamic and diverse environments.
nan
Article 666
Title@2025-06-23 (1): Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation
Title: Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation | Multimodaler Ankerverteiler mit Wissensdestillation zur Emotionserkennung im Gespräch | 具有知识蒸馏的多式锁定器变异器,在对话中承认情感 2506.18716v1 |
Authors (4): Jie Li, Shifei Ding, Lili Guo, Xuan Li
Emotion Recognition in Conversation (ERC) aims to detect the emotions of individual utterances within a conversation. Generating efficient and modality-specific representations for each utterance remains a significant challenge. Previous studies have proposed various models to integrate features extracted using different modality-specific encoders. However, they neglect the varying contributions of modalities to this task and introduce high complexity by aligning modalities at the frame level. To address these challenges, we propose the Multi-modal Anchor Gated Transformer with Knowledge Distillation (MAGTKD) for the ERC task. Specifically, prompt learning is employed to enhance textual modality representations, while knowledge distillation is utilized to strengthen representations of weaker modalities. Furthermore, we introduce a multi-modal anchor gated transformer to effectively integrate utterance-level representations across modalities. Extensive experiments on the IEMOCAP and MELD datasets demonstrate the effectiveness of knowledge distillation in enhancing modality representations and show that MAGTKD achieves state-of-the-art performance in emotion recognition. Our code is available at: https://github.com/JieLi-dd/MAGTKD.
nan
Article 667
Title@2025-06-23 (1): PC-SRGAN: Physically Consistent Super-Resolution Generative Adversarial Network for General Transient Simulations
Title: PC-SRGAN: Physically Consistent Super-Resolution Generative Adversarial Network for General Transient Simulations | PC-SRGAN: Physikalisch konsistente Super-Resolution Generatives Adversarial Network für allgemeine Transientensimulationen | PC-SRGAN: 通用中转模拟器实际一致的超分辨率生成反反向网络 2505.06502v2 |
Authors (4): Md Rakibul Hasan, Pouria Behnoudfar, Dan MacKinlay, Thomas Poulet
Machine Learning, particularly Generative Adversarial Networks (GANs), has revolutionised Super Resolution (SR). However, generated images often lack physical meaningfulness, which is essential for scientific applications. Our approach, PC-SRGAN, enhances image resolution while ensuring physical consistency for interpretable simulations. PC-SRGAN significantly improves both the Peak Signal-to-Noise Ratio and the Structural Similarity Index Measure compared to conventional methods, even with limited training data (e.g., only 13% of training data required for SRGAN). Beyond SR, PC-SRGAN augments physically meaningful machine learning, incorporating numerically justified time integrators and advanced quality metrics. These advancements promise reliable and causal machine-learning models in scientific domains. A significant advantage of PC-SRGAN over conventional SR techniques is its physical consistency, which makes it a viable surrogate model for time-dependent problems. PC-SRGAN advances scientific machine learning, offering improved accuracy and efficiency for image processing, enhanced process understanding, and broader applications to scientific research. We publicly release the complete source code at https://github.com/hasan-rakibul/PC-SRGAN.
nan
Article 668
Title@2025-06-23 (1): Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition
Title: Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition | Kontext Biasing für Aussprachen-Orthographie Missverhältnis in der automatischen Spracherkennung | 自动语音识别中出现偏差以引发-正正对学误差的背景情况 2506.18703v1 |
Authors (2): Christian Huber, Alexander Waibel
Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition. When using appropriate modeling units, e.g., byte-pair encoded characters, these systems are in principle open vocabulary systems. In practice, however, they often fail to recognize words not seen during training, e.g., named entities, acronyms, or domain-specific special words. To address this problem, many context biasing methods have been proposed; however, for words with a pronunciation-orthography mismatch, these methods may still struggle. We propose a method that allows corrections of substitution errors to improve the recognition accuracy of such challenging words. Users can add corrections on the fly during inference. We show that with this method we get a relative improvement in biased word error rate of up to 11%, while maintaining a competitive overall word error rate.
nan
Article 669
Title@2025-06-23 (1): SaGIF: Improving Individual Fairness in Graph Neural Networks via Similarity Encoding
Title: SaGIF: Improving Individual Fairness in Graph Neural Networks via Similarity Encoding | SaGIF: Individuelle Fairness in Graphen-Neuralen Netzwerken durch Ähnlichkeitskodierung verbessern | SaGIF:通过相似编码提高图形神经网络的个人公平性 2506.18696v1 |
Authors (5): Yuchang Zhu, Jintang Li, Huizhe Zhang, Liang Chen, Zibin Zheng
Individual fairness (IF) in graph neural networks (GNNs), which requires that similar individuals receive similar outcomes from GNNs, has become a critical issue. Despite its importance, this area remains largely unexplored in terms of (1) a clear understanding of what induces individual unfairness in GNNs and (2) a comprehensive consideration of identifying similar individuals. To bridge these gaps, we conduct a preliminary analysis to explore the underlying reason for individual unfairness and observe correlations between IF and similarity consistency, a concept introduced to evaluate the discrepancy in identifying similar individuals based on graph structure versus node features. Inspired by our observations, we introduce two metrics to assess individual similarity from two distinct perspectives: topology fusion and feature fusion. Building upon these metrics, we propose Similarity-aware GNNs for Individual Fairness, named SaGIF. The key insight behind SaGIF is the integration of individual similarities by independently learning similarity representations, leading to an improvement of IF in GNNs. Our experiments on several real-world datasets validate the effectiveness of our proposed metrics and SaGIF. Specifically, SaGIF consistently outperforms state-of-the-art IF methods while maintaining utility performance. Code is available at: https://github.com/ZzoomD/SaGIF.
nan
Article 670
Title@2025-06-23 (1): BAnG: Bidirectional Anchored Generation for Conditional RNA Design
Title: BAnG: Bidirectional Anchored Generation for Conditional RNA Design | BAnG: Bidirektionale Anchored Generation für Conditional RNA Design | BANG: 有条件的RNA设计双向导导导导导导导出代 2502.21274v2 |
Authors (3): Roman Klypa, Alberto Bietti, Sergei Grudinin
Designing RNA molecules that interact with specific proteins is a critical challenge in experimental and computational biology. Existing computational approaches require a substantial amount of previously known interacting RNA sequences for each specific protein or a detailed knowledge of RNA structure, restricting their utility in practice. To address this limitation, we develop RNA-BAnG, a deep learning-based model designed to generate RNA sequences for protein interactions without these requirements. Central to our approach is a novel generative method, Bidirectional Anchored Generation (BAnG), which leverages the observation that protein-binding RNA sequences often contain functional binding motifs embedded within broader sequence contexts. We first validate our method on generic synthetic tasks involving similar localized motifs to those appearing in RNAs, demonstrating its benefits over existing generative approaches. We then evaluate our model on biological sequences, showing its effectiveness for conditional RNA sequence design given a binding protein.
nan
Article 671
Title@2025-06-23 (1): One Step Diffusion via Shortcut Models
Title: One Step Diffusion via Shortcut Models | Ein Schritt Diffusion über Shortcut-Modelle | 通过快捷键模型进行单步扩散 2410.12557v3 |
Authors (4): Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel
Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process. Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
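The idea of conditioning on the desired step size can be illustrated with a small sampling loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `shortcut_fn` is a hypothetical stand-in for the trained network, time is assumed to run from noise at t = 0 to data at t = 1, and the toy function below simply points at a fixed target so the result is easy to check.

```python
def sample_with_shortcuts(shortcut_fn, x0, num_steps):
    """Move from noise (t=0) to data (t=1) in `num_steps` equal jumps.

    shortcut_fn(x, t, d) is a hypothetical stand-in for the trained network:
    it receives the current sample x, the current time t, and the desired
    step size d, and returns the direction to move in.
    """
    d = 1.0 / num_steps
    x = list(x0)
    for i in range(num_steps):
        t = i * d
        direction = shortcut_fn(x, t, d)
        # Jump ahead by d along the predicted direction.
        x = [xi + d * si for xi, si in zip(x, direction)]
    return x

# Toy stand-in (assumption): all data sits at TARGET, so the straight line
# from x to TARGET is the right path for every step size d.
TARGET = [1.0, -2.0]

def toy_shortcut(x, t, d):
    # Average velocity needed to reach TARGET in the remaining time 1 - t.
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]

one_step = sample_with_shortcuts(toy_shortcut, [0.3, 0.7], num_steps=1)
four_steps = sample_with_shortcuts(toy_shortcut, [0.3, 0.7], num_steps=4)
```

In this toy setting both step budgets land on the same point. A real model has to learn a `d`-dependent direction for that to hold, which is what the step-size conditioning is for.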
nan
Article 672
Title@2025-06-23 (1): VesselGPT: Autoregressive Modeling of Vascular Geometry
Title: VesselGPT: Autoregressive Modeling of Vascular Geometry | VesselGPT: Autoregressive Modellierung der Gefäßgeometrie | VesselGPT: 血管几何自动递减建模 2505.13318v2 |
Authors (5): Paula Feldman, Martin Sinnona, Claudio Delrieux, Viviana Siless, Emmanuel Iarussi
Anatomical trees are critical for clinical diagnosis and treatment planning, yet their complex and diverse geometry makes accurate representation a significant challenge. Motivated by the latest advances in large language models, we introduce an autoregressive method for synthesizing anatomical trees. Our approach first embeds vessel structures into a learned discrete vocabulary using a VQ-VAE architecture, then models their generation autoregressively with a GPT-2 model. This method effectively captures intricate geometries and branching patterns, enabling realistic vascular tree synthesis. Comprehensive qualitative and quantitative evaluations reveal that our technique achieves high-fidelity tree reconstruction with compact discrete representations. Moreover, our B-spline representation of vessel cross-sections preserves critical morphological details that are often overlooked in the parameterizations used by previous methods. To the best of our knowledge, this work is the first to generate blood vessels in an autoregressive manner. Code is available at https://github.com/LIA-DiTella/VesselGPT-MICCAI.
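The "learned discrete vocabulary" step of a VQ-VAE comes down to a nearest-neighbour lookup in a codebook. The sketch below shows only that generic quantization step. The encoder, the B-spline features, and the GPT-2 stage are omitted, and the codebook and latent values are made up for illustration rather than taken from the paper.

```python
def quantize(vectors, codebook):
    """Map each continuous vector to the index of its nearest codebook entry."""
    tokens = []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

def dequantize(tokens, codebook):
    """Recover the quantized vectors from their token indices."""
    return [codebook[t] for t in tokens]

# Illustrative values only (assumptions, not from the paper).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
latents = [[0.1, -0.1], [0.9, 1.2], [1.8, 0.1]]
tokens = quantize(latents, codebook)  # → [0, 1, 2]
```

The resulting integer tokens are the kind of sequence an autoregressive language model can then be trained on.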
nan
Article 673
Title@2025-06-23 (1): A Random Matrix Analysis of In-context Memorization for Nonlinear Attention
Title: A Random Matrix Analysis of In-context Memorization for Nonlinear Attention | Eine zufällige Matrixanalyse der In-Kontext-Memorisierung für nichtlineare Aufmerksamkeit | 用于非线性关注的非线性关注的 内流记忆化随机矩阵分析 2506.18656v1 |
Authors (5): Zhenyu Liao, Jiaqing Liu, TianQi Hou, Difan Zou, Zenan Ling
Attention mechanisms have revolutionized machine learning (ML) by enabling efficient modeling of global dependencies across inputs. Their inherently parallelizable structures allow for efficient scaling with the exponentially increasing size of both pretraining data and model parameters. Yet, despite their central role as the computational backbone of modern large language models (LLMs), the theoretical understanding of Attention, especially in the nonlinear setting, remains limited. In this paper, we provide a precise characterization of the \emph{in-context memorization error} of \emph{nonlinear Attention}, in the high-dimensional proportional regime where the number of input tokens $n$ and their embedding dimension $p$ are both large and comparable. Leveraging recent advances in the theory of large kernel random matrices, we show that nonlinear Attention typically incurs higher memorization error than linear ridge regression on random inputs. However, this gap vanishes, and can even be reversed, when the input exhibits statistical structure, particularly when the Attention weights align with the input signal direction. Our results reveal how nonlinearity and input structure interact with each other to govern the memorization performance of nonlinear Attention. The theoretical insights are supported by numerical experiments.
nan
Article 674
Title@2025-06-23 (1): Tight Generalization Error Bounds for Stochastic Gradient Descent in Non-convex Learning
Title: Tight Generalization Error Bounds for Stochastic Gradient Descent in Non-convex Learning | Enge Verallgemeinerungsfehler-Bounds für stochastische Gradient Descent in Non-convex-Lernen | 非节流学习中 Stopchactic Gradient Emple 的紧度一般误差弹道界 2506.18645v1 |
Authors (4): Wenjun Xiong, Juan Ding, Xinlei Zuo, Qizhai Li
Stochastic Gradient Descent (SGD) is fundamental for training deep neural networks, especially in non-convex settings. Understanding SGD’s generalization properties is crucial for ensuring robust model performance on unseen data. In this paper, we analyze the generalization error bounds of SGD for non-convex learning by introducing the Type II perturbed SGD (T2pm-SGD), which accommodates both sub-Gaussian and bounded loss functions. The generalization error bound is decomposed into two components: the trajectory term and the flatness term. Our analysis improves the trajectory term to $O(n^{-1})$, significantly improving on the previous $O((nb)^{-1/2})$ bound for bounded losses, where $n$ is the number of training samples and $b$ is the batch size. By selecting an optimal variance for the perturbation noise, the overall bound is further refined to $O(n^{-2/3})$. For sub-Gaussian loss functions, a tighter trajectory term is also achieved. In both cases, the flatness term remains stable across iterations and is smaller than those reported in previous literature, which increase with iterations. This stability, ensured by T2pm-SGD, leads to tighter generalization error bounds for both loss function types. Our theoretical results are validated through extensive experiments on benchmark datasets, including MNIST and CIFAR-10, demonstrating the effectiveness of T2pm-SGD in establishing tighter generalization bounds.
nan
Article 675
Title@2025-06-23 (1): On Union-Closedness of Language Generation
Title: On Union-Closedness of Language Generation | Zur Unions-Schließung der Sprachgenerierung | 关于联合语言一代的关闭 2506.18642v1 |
Authors (4): Steve Hanneke, Amin Karbasi, Anay Mehrotra, Grigoris Velegkas
We investigate language generation in the limit, a model introduced by Kleinberg and Mullainathan [NeurIPS 2024] and extended by Li, Raman, and Tewari [COLT 2025]. While Kleinberg and Mullainathan proved generation is possible for all countable collections, Li et al. defined a hierarchy of generation notions (uniform, non-uniform, and generatable) and explored their feasibility for uncountable collections. Our first set of results resolves two open questions of Li et al. by proving that finite unions of generatable or non-uniformly generatable classes need not be generatable. These follow from a stronger result: there is a non-uniformly generatable class and a uniformly generatable class whose union is non-generatable. This adds to the aspects along which language generation in the limit is different from traditional tasks in statistical learning theory like classification, which are closed under finite unions. In particular, it implies that given two generators for different collections, one cannot combine them to obtain a single “more powerful” generator, prohibiting this notion of boosting. Our construction also addresses a third open question of Li et al. on whether there are uncountable classes that are non-uniformly generatable and do not satisfy the eventually unbounded closure (EUC) condition introduced by Li, Raman, and Tewari. Our approach utilizes carefully constructed classes along with a novel diagonalization argument that could be of independent interest in the growing area of language generation.
nan
Article 676
Title@2025-06-23 (1): Federated Loss Exploration for Improved Convergence on Non-IID Data
Title: Federated Loss Exploration for Improved Convergence on Non-IID Data | Föderated Loss Exploration für verbesserte Konvergenz auf nicht-IID-Daten | 改进关于非IID数据的趋同的联邦损失探索 2506.18640v1 |
Authors (4): Christian Internò, Markus Olhofer, Yaochu Jin, Barbara Hammer
Federated learning (FL) has emerged as a groundbreaking paradigm in machine learning (ML), offering privacy-preserving collaborative model training across diverse datasets. Despite its promise, FL faces significant hurdles when data are not independent and identically distributed (non-IID), where most existing methods struggle with data heterogeneity and lack robust performance. This paper introduces Federated Loss Exploration (FedLEx), an innovative approach specifically designed to tackle these challenges. FedLEx addresses the shortcomings of existing FL methods in non-IID settings by optimizing its learning behavior for scenarios in which assumptions about data heterogeneity are impractical or unknown. It employs a federated loss exploration technique, where clients contribute to a global guidance matrix by calculating gradient deviations for model parameters. This matrix serves as a strategic compass to guide clients’ gradient updates in subsequent FL rounds, thereby fostering optimal parameter updates for the global model. FedLEx effectively navigates the complex loss surfaces inherent in non-IID data and transfers knowledge efficiently, since only a small number of epochs and a small amount of data are required to build a strong global guidance matrix that can achieve model convergence without the need for additional data sharing or data distribution statistics in a large client scenario. Our extensive experiments with state-of-the-art FL algorithms demonstrate significant improvements in performance, particularly under realistic non-IID conditions, thus highlighting FedLEx’s potential to overcome critical barriers in diverse FL applications.
nan
Article 677
Title@2025-06-23 (1): Granular-Ball-Induced Multiple Kernel K-Means
Title: Granular-Ball-Induced Multiple Kernel K-Means | Granular-Ball-induzierter Mehrfach-Kernel K-Means | 颗粒球制导多核心K-Myans 2506.18637v1 |
Authors (4): Shuyin Xia, Yifan Wang, Lifeng Shen, Guoyin Wang
Most existing multi-kernel clustering algorithms, such as multi-kernel K-means, often struggle with computational efficiency and robustness when faced with complex data distributions. These challenges stem from their dependence on point-to-point relationships for optimization, which can lead to difficulty in accurately capturing data sets’ inherent structure and diversity. Additionally, the intricate interplay between multiple kernels in such algorithms can further exacerbate these issues, effectively impacting their ability to cluster data points in high-dimensional spaces. In this paper, we leverage granular-ball computing to improve the multi-kernel clustering framework. The core of granular-ball computing is to adaptively fit data distribution by balls from coarse to acceptable levels. Each ball can enclose data points based on a density consistency measurement. Such ball-based data description thus improves the computational efficiency and the robustness to unknown noises. Specifically, based on granular-ball representations, we introduce the granular-ball kernel (GBK) and its corresponding granular-ball multi-kernel K-means framework (GB-MKKM) for efficient clustering. Using granular-ball relationships in multiple kernel spaces, the proposed GB-MKKM framework shows its superiority in efficiency and clustering performance in the empirical evaluation of various clustering tasks.
nan
Article 678
Title@2025-06-23 (1): Trustworthy Prediction with Gaussian Process Knowledge Scores
Title: Trustworthy Prediction with Gaussian Process Knowledge Scores | Vertrauenswürdige Vorhersage mit Gaussian Prozess Wissen Scores | 高斯进程知识分数的可信赖的预测 2506.18630v1 |
Authors (4): Kurt Butler, Guanchao Feng, Tong Chen, Petar Djuric
Probabilistic models are often used to make predictions in regions of the data space where no observations are available, but it is not always clear whether such predictions are well-informed by previously seen data. In this paper, we propose a knowledge score for predictions from Gaussian process regression (GPR) models that quantifies the extent to which observing data has reduced our uncertainty about a prediction. The knowledge score is interpretable and naturally bounded between 0 and 1. We demonstrate in several experiments that the knowledge score can anticipate when predictions from a GPR model are accurate, and that this anticipation improves performance in tasks such as anomaly detection, extrapolation, and missing data imputation. Source code for this project is available online at https://github.com/KurtButler/GP-knowledge.
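The abstract does not give the formula for the knowledge score, so the sketch below uses an assumed stand-in: the fraction of prior variance removed by the data, `1 - posterior_var / prior_var`, which also lies between 0 and 1. The RBF kernel, its hyperparameters, the noise level, and all function names are illustrative assumptions. The authors' actual definition should be taken from their paper or repository.

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel on scalars (assumed kernel choice)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def variance_reduction_score(x_train, x_star, noise=1e-2):
    """Assumed stand-in score: 1 - posterior variance / prior variance."""
    prior_var = rbf(x_star, x_star)
    # Kernel matrix of the training inputs plus observation noise on the diagonal.
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf(a, x_star) for a in x_train]
    alpha = solve(K, k_star)
    # Standard GP posterior variance of the latent function at x_star.
    post_var = prior_var - sum(k * a for k, a in zip(k_star, alpha))
    return 1.0 - post_var / prior_var

near = variance_reduction_score([0.0, 1.0, 2.0], 1.0)   # inside the data
far = variance_reduction_score([0.0, 1.0, 2.0], 10.0)   # far from the data
```

Under these assumptions the score is close to 1 next to the training inputs and close to 0 far away from them, which matches the qualitative behaviour described in the abstract.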
nan
Article 679
Title@2025-06-23 (1): On Equivariant Model Selection through the Lens of Uncertainty
Title: On Equivariant Model Selection through the Lens of Uncertainty | Bei gleicher Modellauswahl durch das Lens of Uncertainty | 通过不确定性的镜头进行等同模型选择 2506.18629v1 |
Authors (4): Putri A. van der Linden, Alexander Timans, Dharmesh Tailor, Erik J. Bekkers
Equivariant models leverage prior knowledge on symmetries to improve predictive performance, but misspecified architectural constraints can harm it instead. While work has explored learning or relaxing constraints, selecting among pretrained models with varying symmetry biases remains challenging. We examine this model selection task from an uncertainty-aware perspective, comparing frequentist (via Conformal Prediction), Bayesian (via the marginal likelihood), and calibration-based measures to naive error-based evaluation. We find that uncertainty metrics generally align with predictive performance, but Bayesian model evidence does so inconsistently. We attribute this to a mismatch in Bayesian and geometric notions of model complexity, and discuss possible remedies. Our findings point towards the potential of uncertainty in guiding symmetry-aware model selection.
nan
Article 680
Title@2025-06-23 (1): Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits
Title: Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits | Multi-Agenten-Verstärkungs-Lernen für Inverses Design in photonischen integrierten Schaltungen | 光感集成电路反设计多机构强化学习 2506.18627v1 |
Authors (6): Yannik Mahlau, Maximilian Schier, Christoph Reinders, Frederik Schubert, Marco Bügling, Bodo Rosenhahn
Inverse design of photonic integrated circuits (PICs) has traditionally relied on gradient-based optimization. However, this approach is prone to getting stuck in local minima, which results in suboptimal design functionality. As interest in PICs increases due to their potential for addressing modern hardware demands through optical computing, more adaptive optimization algorithms are needed. We present a reinforcement learning (RL) environment as well as multi-agent RL algorithms for the design of PICs. By discretizing the design space into a grid, we formulate the design task as an optimization problem with thousands of binary variables. We consider multiple two- and three-dimensional design tasks that represent PIC components for an optical computing system. By decomposing the design space into thousands of individual agents, our algorithms are able to optimize designs with only a few thousand environment samples. They outperform previous state-of-the-art gradient-based optimization in both two- and three-dimensional design tasks. Our work may also serve as a benchmark for further exploration of sample-efficient RL for inverse design in photonics.
Article 681
Title@2025-06-23 (1): Bures-Wasserstein Flow Matching for Graph Generation
Title: Bures-Wasserstein Flow Matching for Graph Generation | Bures-Wasserstein-Durchfluss passend für die Graphenerzeugung | Bures-Wasserstein 图表生成匹配流程 2506.14020v2 |
Authors (4): Keyue Jiang, Jiahao Cui, Xiaowen Dong, Laura Toni
Graph generation has emerged as a critical task in fields ranging from molecule design to drug discovery. Contemporary approaches, notably diffusion and flow-based models, have achieved solid graph generative performance through constructing a probability path that interpolates between a reference distribution and the data distribution. However, these methods typically model the evolution of individual nodes and edges independently and use linear interpolations to build the path assuming that the data lie in Euclidean space. We show that this is suboptimal given the intrinsic non-Euclidean structure and interconnected patterns of graphs, and it poses risks to the sampling convergence. To build a better probability path, we model the joint evolution of the nodes and edges by representing graphs as connected systems parameterized by Markov random fields (MRF). We then leverage the optimal transport displacement between MRF objects to design the probability path for graph generation. Based on this, we introduce BWFlow, a flow-matching framework for graph generation that respects the underlying geometry of graphs and provides smooth velocities in the probability path. The novel framework can be adapted to both continuous and discrete flow-matching algorithms. Experimental evaluations in plain graph generation and 2D/3D molecule generation validate the effectiveness of BWFlow in graph generation with competitive performance, stable training, and guaranteed sampling convergence.
Article 682
Title@2025-06-23 (1): Prédiction optimale pour un modèle ordinal à covariables fonctionnelles
Title: Prédiction optimale pour un modèle ordinal à covariables fonctionnelles | Optimal prediction for an ordinal model with functional covariates | nan 2506.18615v1 |
Authors (3): Simón Weinberger, Jairo Cugliari, Aurélie Le Cain
We present a prediction framework for ordinal models: we introduce optimal predictions using loss functions and give the explicit form of the Least-Absolute-Deviation prediction for these models. Then, we reformulate an ordinal model with functional covariates as a classic ordinal model with multiple scalar covariates. We illustrate all the proposed methods and try to apply them to a dataset collected by EssilorLuxottica for the development of a control algorithm for the shade of connected glasses.
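To make the Least-Absolute-Deviation prediction concrete: for ordered categories coded 0, 1, 2, ..., a standard result is that the prediction minimizing expected absolute loss is a median of the predictive distribution. The sketch below assumes the category probabilities are already given by some fitted ordinal model; the function names and the example probabilities are hypothetical, and the explicit form derived in the paper is not reproduced here.

```python
def lad_prediction(probs):
    """Smallest category index whose cumulative probability reaches 0.5."""
    total = 0.0
    for k, p in enumerate(probs):
        total += p
        if total >= 0.5:
            return k
    # Guard against rounding when probs sum to slightly less than 1.
    return len(probs) - 1


def expected_abs_loss(probs, guess):
    """Expected |category - guess| under the given category probabilities."""
    return sum(p * abs(k - guess) for k, p in enumerate(probs))


# Hypothetical predictive distribution over four ordered categories.
probs = [0.4, 0.15, 0.15, 0.3]
best = lad_prediction(probs)
losses = [expected_abs_loss(probs, g) for g in range(len(probs))]
```

In this example the most probable category is 0, yet the median category 1 has the lowest expected absolute loss, which shows why the choice of loss function changes the optimal prediction.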
Article 683
Title@2025-06-23 (1): Policy gradient methods for ordinal policies
Title: Policy gradient methods for ordinal policies | Politikgradientenmethoden für Ordinalpolitiken | 通常政策的政策梯度方法 2506.18614v1 |
Authors (2): Simón Weinberger, Jairo Cugliari
In reinforcement learning, the softmax parametrization is the standard approach for policies over discrete action spaces. However, it fails to capture the order relationship between actions. Motivated by a real-world industrial problem, we propose a novel policy parametrization based on ordinal regression models adapted to the reinforcement learning setting. Our approach addresses practical challenges, and numerical experiments demonstrate its effectiveness in real applications and in continuous action tasks, where discretizing the action space and applying the ordinal policy yields competitive performance.
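One way to picture an ordinal policy is a cumulative-logit model, in which a single scalar score shifts probability mass monotonically along the ordered actions. The abstract does not say which ordinal regression model is used, so the cumulative-logit form, the fixed thresholds, and all names below are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_policy(score, thresholds):
    """Action probabilities from a cumulative-logit model.

    P(action <= k) = sigmoid(thresholds[k] - score), with thresholds sorted
    in increasing order. There are len(thresholds) + 1 ordered actions.
    """
    cdf = [sigmoid(t - score) for t in thresholds] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


def expected_action(probs):
    return sum(k * p for k, p in enumerate(probs))


thresholds = [-1.0, 0.0, 1.0]  # hypothetical; these would be learned
low = ordinal_policy(0.0, thresholds)
high = ordinal_policy(2.0, thresholds)
```

Unlike a softmax over unordered logits, raising `score` here can only move mass toward higher-indexed actions, which encodes the order relationship directly.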
Article 684
Title@2025-06-23 (1): SHAMaNS: Sound Localization with Hybrid Alpha-Stable Spatial Measure and Neural Steerer
Title: SHAMaNS: Sound Localization with Hybrid Alpha-Stable Spatial Measure and Neural Steerer | SHAMANS: Klanglokalisierung mit hybrider Alpha-stabiler Raummessung und neuralem Steerer | SHAMANS: 与混合阿尔法稳定空间测量和神经传感器的稳妥本地化 2506.18954v1 |
Authors (5): Diego Di Carlo, Mathieu Fontaine, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii
This paper describes a sound source localization (SSL) technique that combines an $\alpha$-stable model for the observed signal with a neural network-based approach for modeling steering vectors. Specifically, a physics-informed neural network, referred to as Neural Steerer, is used to interpolate measured steering vectors (SVs) on a fixed microphone array. This allows for a more robust estimation of the so-called $\alpha$-stable spatial measure, which represents the most plausible direction of arrival (DOA) of a target signal. As an $\alpha$-stable model for the non-Gaussian case ($\alpha \in (0, 2)$) theoretically defines a unique spatial measure, we choose to leverage it to account for residual reconstruction error of the Neural Steerer in the downstream tasks. The objective scores indicate that our proposed technique outperforms state-of-the-art methods in the case of multiple sound sources.
Article 685
Title@2025-06-23 (1): Simulation-Free Differential Dynamics through Neural Conservation Laws
Title: Simulation-Free Differential Dynamics through Neural Conservation Laws | Simulationsfreie Differentialdynamik durch neurale Erhaltungsgesetze | 通过神经保护法实现无模拟-无差异动态 2506.18604v1 |
Authors (3): Mengjian Hua, Eric Vanden-Eijnden, Ricky T. Q. Chen
We present a novel simulation-free framework for training continuous-time diffusion processes over very general objective functions. Existing methods typically either prescribe the optimal diffusion process, which only works for heavily restricted problem formulations, or require expensive simulation to numerically obtain the time-dependent densities and sample from the diffusion process. In contrast, we propose a coupled parameterization which jointly models a time-dependent density function, or probability path, and the dynamics of a diffusion process that generates this probability path. To accomplish this, our approach directly bakes in the Fokker-Planck equation and density function requirements as hard constraints, by extending and greatly simplifying the construction of Neural Conservation Laws. This enables simulation-free training for a large variety of problem formulations, from data-driven objectives as in generative modeling and dynamical optimal transport, to optimality-based objectives as in stochastic optimal control, with straightforward extensions to mean-field objectives due to the ease of accessing exact density functions. We validate our method in a diverse range of application domains from modeling spatio-temporal events to learning optimal dynamics from population data.
Article 686
Title@2025-06-23 (1): BulletGen: Improving 4D Reconstruction with Bullet-Time Generation
Title: BulletGen: Improving 4D Reconstruction with Bullet-Time Generation | BulletGen: Verbesserung der 4D-Rekonstruktion mit Bullet-Time-Generation | BulletGen: 改进4D重建与代代代代代代代代代代代代代代代 2506.18601v1 |
Authors (5): Denys Rozumnyi, Jonathon Luiten, Numair Khan, Johannes Schönberger, Peter Kontschieder
Transforming casually captured, monocular videos into fully immersive dynamic experiences is a highly ill-posed task that comes with significant challenges, e.g., reconstructing unseen regions and dealing with the ambiguity in monocular depth estimation. In this work we introduce BulletGen, an approach that takes advantage of generative models to correct errors and complete missing information in a Gaussian-based dynamic scene representation. This is done by aligning the output of a diffusion-based video generation model with the 4D reconstruction at a single frozen “bullet-time” step. The generated frames are then used to supervise the optimization of the 4D Gaussian model. Our method seamlessly blends generative content with both static and dynamic scene components, achieving state-of-the-art results on both novel-view synthesis and 2D/3D tracking tasks.
Article 687
Title@2025-06-23 (1): No Training Wheels: Steering Vectors for Bias Correction at Inference Time
Title: No Training Wheels: Steering Vectors for Bias Correction at Inference Time | Keine Trainingsräder: Lenk-Vektoren für Bias-Korrektur zur Inferenzzeit | 无培训轮:推论时间比亚更正指导矢量 2506.18598v1 |
Authors (3): Aviral Gupta, Armaan Sethi, Ameesh Sethi
Neural network classifiers trained on datasets with uneven group representation often inherit class biases and learn spurious correlations. These models may perform well on average but consistently fail on atypical groups. For example, in hair color classification, datasets may over-represent females with blond hair, reinforcing stereotypes. Although various algorithmic and data-centric methods have been proposed to address such biases, they often require retraining or significant compute. In this work, we propose a cheap, training-free method inspired by steering vectors used to edit behaviors in large language models. We compute the difference in mean activations between majority and minority groups to define a “bias vector,” which we subtract from the model’s residual stream. This leads to reduced classification bias and improved worst-group accuracy. We explore multiple strategies for extracting and applying these vectors in transformer-like classifiers, showing that steering vectors, traditionally used in generative models, can also be effective in classification. More broadly, we showcase an extremely cheap, inference-time, training-free method to mitigate bias in classification models.
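The bias-vector computation described above can be sketched in a few lines. This is a minimal illustration on plain Python lists, assuming activations have already been collected at one layer for each group; the function names and the `strength` scaling factor are hypothetical and are not taken from the paper.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def bias_vector(majority_acts, minority_acts):
    """Difference in mean activations: majority mean minus minority mean."""
    mu_major = mean_vector(majority_acts)
    mu_minor = mean_vector(minority_acts)
    return [a - b for a, b in zip(mu_major, mu_minor)]


def steer(activation, bias, strength=1.0):
    """Subtract the (optionally scaled) bias vector from one activation."""
    return [h - strength * b for h, b in zip(activation, bias)]


# Toy two-dimensional activations for the two groups.
majority = [[1.0, 2.0], [3.0, 2.0]]
minority = [[0.0, 2.0], [2.0, 2.0]]
bias = bias_vector(majority, minority)
steered = steer([5.0, 5.0], bias)
```

Because the vector is computed once from stored activations and applied with a single subtraction, no gradient step or retraining is involved, which matches the training-free framing of the abstract.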
Article 688
Title@2025-06-23 (1): SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds
Title: SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds | SpaNN: Erkennung mehrerer Adversarial Patches auf CNNs durch Spanning Saliency Thresholds | SPANN: 透过透视阈值在CNN上检测多反向补丁 2506.18591v1 |
Authors (3): Mauricio Byrd Victorica, György Dán, Henrik Sandberg
State-of-the-art convolutional neural network models for object detection and image classification are vulnerable to physically realizable adversarial perturbations, such as patch attacks. Existing defenses have focused, implicitly or explicitly, on single-patch attacks, leaving their sensitivity to the number of patches as an open question or rendering them computationally infeasible or inefficient against attacks consisting of multiple patches in the worst cases. In this work, we propose SpaNN, an attack detector whose computational complexity is independent of the expected number of adversarial patches. The key novelty of the proposed detector is that it builds an ensemble of binarized feature maps by applying a set of saliency thresholds to the neural activations of the first convolutional layer of the victim model. It then performs clustering on the ensemble and uses the cluster features as the input to a classifier for attack detection. Contrary to existing detectors, SpaNN does not rely on a fixed saliency threshold for identifying adversarial regions, which makes it robust against white box adversarial attacks. We evaluate SpaNN on four widely used data sets for object detection and classification, and our results show that SpaNN outperforms state-of-the-art defenses by up to 11 and 27 percentage points in the case of object detection and the case of image classification, respectively. Our code is available at https://github.com/gerkbyrd/SpaNN.
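The idea of sweeping saliency thresholds rather than fixing one can be sketched as follows. The code binarizes a single feature map at several thresholds and summarizes each binary map by its number of 4-connected regions. The connected-component count is a simple stand-in for the clustering step, since the abstract does not name the clustering algorithm or the cluster features; the toy feature map and all function names are assumptions.

```python
def binarize(fmap, threshold):
    """Binary mask of entries at or above the threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in fmap]


def count_components(mask):
    """Number of 4-connected regions of ones, via iterative flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def threshold_ensemble_features(fmap, thresholds):
    """One region count per threshold; a detector could consume this vector."""
    return [count_components(binarize(fmap, t)) for t in thresholds]


# Toy 4x4 activation map with three separated high-activation areas.
fmap = [
    [0.9, 0.1, 0.0, 0.8],
    [0.2, 0.1, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0],
    [0.3, 0.3, 0.0, 0.0],
]
features = threshold_ensemble_features(fmap, [0.25, 0.5, 0.85])
```

The cost of this sweep depends on the number of thresholds and the map size, not on how many patches are present, which is consistent with the complexity claim in the abstract.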
Article 689
Title@2025-06-23 (1): Optimization-Induced Dynamics of Lipschitz Continuity in Neural Networks
Title: Optimization-Induced Dynamics of Lipschitz Continuity in Neural Networks | Optimierungs-induzierte Dynamik der Lipschitz-Kontinuität in neuralen Netzwerken | 神经网络中利普西茨连续性的优化-引导动态 2506.18588v1 |
Authors (5): Róisín Luo, James McDermott, Christian Gagné, Qiang Sun, Colm O’Riordan
Lipschitz continuity characterizes the worst-case sensitivity of neural networks to small input perturbations; yet its dynamics (i.e. temporal evolution) during training remain under-explored. We present a rigorous mathematical framework to model the temporal evolution of Lipschitz continuity during training with stochastic gradient descent (SGD). This framework leverages a system of stochastic differential equations (SDEs) to capture both deterministic and stochastic forces. Our theoretical analysis identifies three principal factors driving the evolution: (i) the projection of gradient flows, induced by the optimization dynamics, onto the operator-norm Jacobian of parameter matrices; (ii) the projection of gradient noise, arising from the randomness in mini-batch sampling, onto the operator-norm Jacobian; and (iii) the projection of the gradient noise onto the operator-norm Hessian of parameter matrices. Furthermore, our theoretical framework sheds light on how factors such as noisy supervision, parameter initialization, batch size, and mini-batch sampling trajectories shape the evolution of the Lipschitz continuity of neural networks. Our experimental results demonstrate strong agreement between the theoretical implications and the observed behaviors.
Article 690
Title@2025-06-23 (1): Radio Map Prediction from Aerial Images and Application to Coverage Optimization
Title: Radio Map Prediction from Aerial Images and Application to Coverage Optimization | Radio Map Vorhersage von Luftbildern und Anwendung in die Reichweite Optimierung | 从空中图像和应用于最佳报道优化的无线电地图预测 2410.17264v2 |
Authors (3): Fabian Jaensch, Giuseppe Caire, Begüm Demir
Several studies have explored deep learning algorithms to predict large-scale signal fading, or path loss, in urban communication networks. The goal is to replace costly measurement campaigns, inaccurate statistical models, or computationally expensive ray-tracing simulations with machine learning models that deliver quick and accurate predictions. We focus on predicting path loss radio maps using convolutional neural networks, leveraging aerial images alone or in combination with supplementary height information. Notably, our approach does not rely on explicit classification of environmental objects, which is unavailable for most locations worldwide. While the prediction of radio maps using complete 3D environmental data is well-studied, the use of only aerial images remains under-explored. We address this gap by showing that state-of-the-art models developed for existing radio map datasets can be effectively adapted to this task. Additionally, we introduce a new model dubbed UNetDCN that achieves on-par or better performance compared to the state of the art with reduced complexity. The trained models are differentiable, and therefore they can be incorporated in various network optimization algorithms. While an extensive discussion is beyond this paper’s scope, we demonstrate this through an example optimizing the directivity of base stations in cellular networks via backpropagation to enhance coverage.
Article 691
Title@2025-06-23 (1): Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning
Title: Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning | Effiziente Strahlauswahl für ISAC im zellfreien Massiv MIMO über digitales Twin Assisted Deep Reinforcement Learning | 通过数字双互助深层强化学习,在无细胞大规模MIMO中高效选择ISAC 2506.18560v1 |
Authors (5): Jiexin Zhang, Shu Xu, Chunguo Li, Yongming Huang, Luxi Yang
Beamforming enhances signal strength and quality by focusing energy in specific directions. This capability is particularly crucial in cell-free integrated sensing and communication (ISAC) systems, where multiple distributed access points (APs) collaborate to provide both communication and sensing services. In this work, we first derive the distribution of joint target detection probabilities across multiple receiving APs under false alarm rate constraints, and then formulate the beam selection procedure as a Markov decision process (MDP). We establish a deep reinforcement learning (DRL) framework, in which reward shaping and sinusoidal embedding are introduced to facilitate agent learning. To eliminate the high costs and associated risks of real-time agent-environment interactions, we further propose a novel digital twin (DT)-assisted offline DRL approach. Different from traditional online DRL, a conditional generative adversarial network (cGAN)-based DT module, operating as a replica of the real world, is meticulously designed to generate virtual state-action transition pairs and enrich data diversity, enabling offline adjustment of the agent’s policy. Additionally, we address the out-of-distribution issue by incorporating an extra penalty term into the loss function design. The convergence of agent-DT interaction and the upper bound of the Q-error function are theoretically derived. Numerical results demonstrate the remarkable performance of our proposed approach, which significantly reduces online interaction overhead while maintaining effective beam selection across diverse conditions including strict false alarm control, low signal-to-noise ratios, and high target velocities.
Article 692
Title@2025-06-23 (1): Soft decision trees for survival analysis
Title: Soft decision trees for survival analysis | Weiche Entscheidungsbäume für die Überlebensanalyse | 用于生存分析的软决策树 2506.16846v2 |
Authors (3): Antonio Consolo, Edoardo Amaldi, Emilio Carrizosa
Decision trees are popular in survival analysis for their interpretability and ability to model complex relationships. Survival trees, which predict the timing of singular events using censored historical data, are typically built through heuristic approaches. Recently, there has been growing interest in globally optimized trees, where the overall tree is trained by minimizing the error function over all its parameters. We propose a new soft survival tree model (SST), with a soft splitting rule at each branch node, trained via a nonlinear optimization formulation amenable to decomposition. Since SSTs provide for every input vector a specific survival function associated to a single leaf node, they satisfy the conditional computation property and inherit the related benefits. SST and the training formulation combine flexibility with interpretability: any smooth survival function (parametric, semiparametric, or nonparametric) estimated through maximum likelihood can be used, and each leaf node of an SST yields a cluster of distinct survival functions which are associated to the data points routed to it. Numerical experiments on 15 well-known datasets show that SSTs, with parametric and spline-based semiparametric survival functions, trained using an adaptation of the node-based decomposition algorithm proposed by Consolo et al. (2024) for soft regression trees, outperform three benchmark survival trees in terms of four widely-used discrimination and calibration measures. SSTs can also be extended to consider group fairness.
Article 693
Title@2025-06-23 (1): Accurate early detection of Parkinson’s disease from SPECT imaging through Convolutional Neural Networks
Title: Accurate early detection of Parkinson’s disease from SPECT imaging through Convolutional Neural Networks | Präzise Früherkennung der Parkinson-Krankheit durch SPECT-Bildgebung durch konvolutionäre neurale Netzwerke | 通过进化神经网络从SPECT成像中准确早期检测帕金森病 2412.05348v2 |
Authors (1): R. Prashanth
Early and accurate detection of Parkinson’s disease (PD) is a crucial diagnostic challenge of immense clinical significance for effective treatment regimens and patient management. For instance, a group of subjects termed SWEDD, who are clinically diagnosed as PD but show normal Single Photon Emission Computed Tomography (SPECT) scans, have their diagnosis changed to non-PD after a few years of follow-up; in the meantime, they are treated with PD medications, which do more harm than good. In this work, machine learning models are developed using features from SPECT images to distinguish early PD and SWEDD subjects from normal subjects. These models were observed to perform with high accuracy. The study suggests that these diagnostic models have the potential to help PD clinicians in the diagnostic process.
Article 694
Title@2025-06-23 (1): AutoPDL: Automatic Prompt Optimization for LLM Agents
Title: AutoPDL: Automatic Prompt Optimization for LLM Agents | AutoPDL: Automatische Prompt-Optimierung für LLM-Agenten | AAUPDL:LLM代理器自动快速优化 2504.04365v2 |
Authors (4): Claudio Spiess, Mandana Vaziri, Louis Mandel, Martin Hirzel
The performance of large language models (LLMs) depends on how they are prompted, with choices spanning both the high-level prompting pattern (e.g., Zero-Shot, CoT, ReAct, ReWOO) and the specific prompt content (instructions and few-shot demonstrations). Manually tuning this combination is tedious, error-prone, and specific to a given LLM and task. Therefore, this paper proposes AutoPDL, an automated approach to discovering good LLM agent configurations. Our approach frames this as a structured AutoML problem over a combinatorial space of agentic and non-agentic prompting patterns and demonstrations, using successive halving to efficiently navigate this space. We introduce a library implementing common prompting patterns using the PDL prompt programming language. AutoPDL solutions are human-readable, editable, and executable PDL programs that use this library. This approach also enables source-to-source optimization, allowing human-in-the-loop refinement and reuse. Evaluations across three tasks and seven LLMs (ranging from 3B to 70B parameters) show consistent accuracy gains ($9.06\pm15.3$ percentage points), up to 68.9pp, and reveal that selected prompting strategies vary across models and tasks.
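Successive halving, the search strategy named above, can be sketched generically: score every candidate on a small budget, keep the best fraction, then repeat with a larger budget until one remains. The sketch below is not AutoPDL's implementation; the `evaluate` callback, the budget schedule, and the accuracy numbers are hypothetical, with only the prompting-pattern names borrowed from the abstract.

```python
def successive_halving(candidates, evaluate, initial_budget=1, eta=2):
    """Return the surviving candidate after repeated prune-and-rebudget rounds.

    evaluate(candidate, budget) must return a score where higher is better.
    Each round keeps the top 1/eta of the candidates and multiplies the
    per-candidate budget by eta.
    """
    survivors = list(candidates)
    if not survivors:
        raise ValueError("need at least one candidate")
    budget = initial_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


# Made-up accuracies standing in for evaluation on `budget` validation examples.
made_up_accuracy = {"zero_shot": 0.4, "cot": 0.6, "react": 0.7, "rewoo": 0.5}
calls = []


def evaluate(pattern, budget):
    calls.append((pattern, budget))
    return made_up_accuracy[pattern]


best_pattern = successive_halving(made_up_accuracy, evaluate)
```

With four candidates and `eta=2`, the first round spends budget 1 on all four and the second spends budget 2 on the two survivors, so most of the evaluation cost goes to the promising configurations.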
Article 695
Title@2025-06-23 (1): Hidden Breakthroughs in Language Model Training
Title: Hidden Breakthroughs in Language Model Training | Versteckte Durchbrüche im Sprachmodelltraining | 语言模式培训中的隐藏突破 2506.15872v2 |
Authors (3): Sara Kangaslahti, Elan Rosenfeld, Naomi Saphra
Loss curves are smooth during most of model training, so visible discontinuities stand out as possible conceptual breakthroughs. Studying these breakthroughs enables a deeper understanding of learning dynamics, but only when they are properly identified. This paper argues that similar breakthroughs occur frequently throughout training but they are obscured by a loss metric that collapses all variation into a single scalar. To find these hidden transitions, we introduce POLCA, a method for decomposing changes in loss along arbitrary bases of the low-rank training subspace. We use our method to identify clusters of samples that share similar changes in loss during training, disaggregating the overall loss into that of smaller groups of conceptually similar data. We validate our method on synthetic arithmetic and natural language tasks, showing that POLCA recovers clusters that represent interpretable breakthroughs in the model’s capabilities. We demonstrate the promise of these hidden phase transitions as a tool for unsupervised interpretability.
Article 696
Title@2025-06-23 (1): Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning
Title: Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning | Transformer-Weltmodell für Proben Effizientes Mehr-Agenten-Verstärkungs-Lernen | 取样效率高的多机构强化学习世界模式 2506.18537v1 |
Authors (3): Azad Deihim, Eduardo Alonso, Dimitra Apostolopoulou
We present the Multi-Agent Transformer World Model (MATWM), a novel transformer-based world model designed for multi-agent reinforcement learning in both vector- and image-based environments. MATWM combines a decentralized imagination framework with a semi-centralized critic and a teammate prediction module, enabling agents to model and anticipate the behavior of others under partial observability. To address non-stationarity, we incorporate a prioritized replay mechanism that trains the world model on recent experiences, allowing it to adapt to agents’ evolving policies. We evaluated MATWM on a broad suite of benchmarks, including the StarCraft Multi-Agent Challenge, PettingZoo, and MeltingPot. MATWM achieves state-of-the-art performance, outperforming both model-free and prior world model approaches, while demonstrating strong sample efficiency, achieving near-optimal performance in as few as 50K environment interactions. Ablation studies confirm the impact of each component, with substantial gains in coordination-heavy tasks.
Article 697
Title@2025-06-23 (1): Affordable AI Assistants with Knowledge Graph of Thoughts
Title: Affordable AI Assistants with Knowledge Graph of Thoughts | Erschwingliche KI-Assistenten mit Wissensgrafik der Gedanken | 具有知识思想知识图的负担得起的AI助理 2504.02670v4 |
Authors (18): Maciej Besta, Lorenzo Paleari, Jia Hao Andrea Jiang, Robert Gerstenberger, You Wu, Jón Gunnar Hannesson, Patrick Iff, Ales Kubicek, Piotr Nyczyk, Diana Khimey, Nils Blach, Haiqiang Zhang, Tao Zhang, Peiran Ma, Grzegorz Kwaśniewski, Marcin Copik, Hubert Niewiadomski, Torsten Hoefler
Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face significant challenges, including high operational costs and limited success rates on complex benchmarks like GAIA. To address these issues, we propose Knowledge Graph of Thoughts (KGoT), an innovative AI assistant architecture that integrates LLM reasoning with dynamically constructed knowledge graphs (KGs). KGoT extracts and structures task-relevant knowledge into a dynamic KG representation, iteratively enhanced through external tools such as math solvers, web crawlers, and Python scripts. Such structured representation of task-relevant knowledge enables low-cost models to solve complex tasks effectively while also minimizing bias and noise. For example, KGoT achieves a 29% improvement in task success rates on the GAIA benchmark compared to Hugging Face Agents with GPT-4o mini. Moreover, harnessing a smaller model dramatically reduces operational costs by over 36x compared to GPT-4o. Improvements for other models (e.g., Qwen2.5-32B and Deepseek-R1-70B) and benchmarks (e.g., SimpleQA) are similar. KGoT offers a scalable, affordable, versatile, and high-performing solution for AI assistants.
Article 698
Title@2025-06-23 (1): Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning
Title: Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning | Multi-Stage-Manipulation mit Demonstrations-Augmented Reward, Politik und World Model Learning | 以示范性奖励、政策和世界示范学习模式进行多层次处理 2503.01837v2 |
Authors (5): Adrià López Escoriza, Nicklas Hansen, Stone Tao, Tongzhou Mu, Hao Su
Long-horizon tasks in robotic manipulation present significant challenges in reinforcement learning (RL) due to the difficulty of designing dense reward functions and effectively exploring the expansive state-action space. However, despite a lack of dense rewards, these tasks often have a multi-stage structure, which can be leveraged to decompose the overall objective into manageable subgoals. In this work, we propose DEMO3, a framework that exploits this structure for efficient learning from visual inputs. Specifically, our approach incorporates multi-stage dense reward learning, a bi-phasic training scheme, and world model learning into a carefully designed demonstration-augmented RL framework that strongly mitigates the challenge of exploration in long-horizon tasks. Our evaluations demonstrate that our method improves data-efficiency by an average of 40% and by 70% on particularly difficult tasks compared to state-of-the-art approaches. We validate this across 16 sparse-reward tasks spanning four domains, including challenging humanoid visual control tasks using as few as five demonstrations.
Article 699
Title@2025-06-23 (1): End-to-End Spoken Grammatical Error Correction
Title: End-to-End Spoken Grammatical Error Correction | End-to-End-Spoken Grammatical Error Correction | 端端到端口语语语法错误校正 2506.18532v1 |
Authors (5): Mengjie Qian, Rao Ma, Stefano Bannò, Mark J. F. Gales, Kate M. Knill
Grammatical Error Correction (GEC) and feedback play a vital role in supporting second language (L2) learners, educators, and examiners. While written GEC is well-established, spoken GEC (SGEC), aiming to provide feedback based on learners’ speech, poses additional challenges due to disfluencies, transcription errors, and the lack of structured input. SGEC systems typically follow a cascaded pipeline consisting of Automatic Speech Recognition (ASR), disfluency detection, and GEC, making them vulnerable to error propagation across modules. This work examines an End-to-End (E2E) framework for SGEC and feedback generation, highlighting challenges and possible solutions when developing these systems. Cascaded, partial-cascaded and E2E architectures are compared, all built on the Whisper foundation model. A challenge for E2E systems is the scarcity of GEC labeled spoken data. To address this, an automatic pseudo-labeling framework is examined, increasing the training data from 77 to over 2500 hours. To improve the accuracy of the SGEC system, additional contextual information, exploiting the ASR output, is investigated. Giving candidates feedback on their mistakes is an essential step to improving performance. In E2E systems the SGEC output must be compared with an estimate of the fluent transcription to obtain the feedback. To improve the precision of this feedback, a novel reference alignment process is proposed that aims to remove hypothesised edits that result from fluent transcription errors. Finally, these approaches are combined with an edit confidence estimation approach, to exclude low-confidence edits. Experiments on the in-house Linguaskill (LNG) corpora and the publicly available Speak & Improve (S&I) corpus show that the proposed approaches significantly boost E2E SGEC performance.
Article 700
Title@2025-06-23 (1): A Set-to-Set Distance Measure in Hyperbolic Space
Title: A Set-to-Set Distance Measure in Hyperbolic Space | Eine eingestellte Distanzmessung im Hyperbolischen Raum | 超曲向空间的定位到 set- set 距离测量 2506.18529v1 |
Authors (9): Pengxiang Li, Wei Wu, Zhi Gao, Xiaomeng Fan, Peilin Yu, Yuwei Wu, Zhipeng Lu, Yunde Jia, Mehrtash Harandi
We propose a hyperbolic set-to-set distance measure for computing dissimilarity between sets in hyperbolic space. While point-to-point distances in hyperbolic space effectively capture hierarchical relationships between data points, many real-world applications require comparing sets of hyperbolic data points, where the local structure and the global structure of the sets carry crucial semantic information. The proposed hyperbolic set-to-set distance measure (HS2SD) integrates both global and local structural information: global structure through geodesic distances between Einstein midpoints of hyperbolic sets, and local structure through topological characteristics of the two sets. To efficiently compute topological differences, we prove that a finite Thue-Morse sequence of degree and adjacency matrices can serve as a robust approximation to capture the topological structure of a set. In this case, by considering the topological differences, HS2SD provides a more nuanced understanding of the relationships between two hyperbolic sets. Empirical evaluation on entity matching, standard image classification, and few-shot image classification demonstrates that our distance measure outperforms existing methods by effectively modeling the hierarchical and complex relationships inherent in hyperbolic sets.
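The global term described above, a geodesic distance between Einstein midpoints, can be sketched for points in the Klein model of hyperbolic space with curvature -1. The choice of the Klein model, the equal weighting of points, and the function names are assumptions; the paper may work in another model and convert, and the local topological term is not covered here.

```python
import math


def lorentz_factor(x):
    """gamma = 1 / sqrt(1 - ||x||^2) for a Klein-model point with ||x|| < 1."""
    return 1.0 / math.sqrt(1.0 - sum(v * v for v in x))


def einstein_midpoint(points):
    """Lorentz-factor-weighted average of Klein-model points."""
    gammas = [lorentz_factor(p) for p in points]
    total = sum(gammas)
    dim = len(points[0])
    return [sum(g * p[i] for g, p in zip(gammas, points)) / total
            for i in range(dim)]


def klein_distance(u, v):
    """Geodesic distance in the Klein model (curvature -1)."""
    dot = sum(a * b for a, b in zip(u, v))
    cosh_d = (1.0 - dot) * lorentz_factor(u) * lorentz_factor(v)
    return math.acosh(max(1.0, cosh_d))  # clamp guards against rounding below 1


def global_set_distance(set_a, set_b):
    """Distance between the Einstein midpoints of two sets."""
    return klein_distance(einstein_midpoint(set_a), einstein_midpoint(set_b))


set_a = [[0.5, 0.0], [-0.5, 0.0]]  # symmetric about the origin
set_b = [[0.5, 0.0]]
d = global_set_distance(set_a, set_b)
```

In the Klein model the distance from the origin to a point at radius r is artanh(r), so the value for this example should agree with `math.atanh(0.5)`.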
nan
Article 701
Title@2025-06-23 (1): Federated Learning from Molecules to Processes: A Perspective
Title: Federated Learning from Molecules to Processes: A Perspective | Föderiertes Lernen von Molekülen zu Prozessen: Eine Perspektive | 从分子到过程的联邦学习:视角 2506.18525v1 |
Authors (2): Jan G. Rittig, Clemens Kortmann
We present a perspective on federated learning in chemical engineering that envisions collaborative efforts in machine learning (ML) developments within the chemical industry. Large amounts of chemical and process data are proprietary to chemical companies and are therefore locked in data silos, hindering the training of ML models on large data sets in chemical engineering. Recently, the concept of federated learning has gained increasing attention in ML research, enabling organizations to jointly train machine learning models without disclosure of their individual data. We discuss potential applications of federated learning in several fields of chemical engineering, from the molecular to the process scale. In addition, we apply federated learning in two exemplary case studies that simulate practical scenarios of multiple chemical companies holding proprietary data sets: (i) prediction of binary mixture activity coefficients with graph neural networks and (ii) system identification of a distillation column with autoencoders. Our results indicate that ML models jointly trained with federated learning yield significantly higher accuracy than models trained by each chemical company individually and can perform similarly to models trained on combined datasets from all companies. Federated learning therefore has great potential to advance ML models in chemical engineering while respecting corporate data privacy, making it promising for future industrial applications.
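As a rough illustration of joint training without sharing raw data, the sketch below uses federated averaging (FedAvg) on a one-dimensional linear model. The abstract does not name the aggregation scheme, so FedAvg, the toy model, and all function names here are assumptions made for illustration only.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    # A few gradient-descent steps on one client's private (x, y) pairs.
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def federated_average(client_weights, client_sizes):
    # Server step: average parameters weighted by local dataset size.
    total = sum(client_sizes)
    return tuple(
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(2)
    )

def train_federated(clients, rounds=200):
    # Only model parameters leave each client, never the data itself.
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights
```

With two clients whose data both follow y = 2x + 1 on different ranges of x, `train_federated` recovers a slope near 2 and an intercept near 1 although neither client sees the other's samples.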
nan
Article 702
Title@2025-06-23 (1): DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System Modeling
Title: DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System Modeling | DDOT: Ein Derivativ-gerichteter Dual-Decoder-Normaldifferentialgleichungstransformator für dynamische Systemmodellierung | DDOT: 用于动态系统建模的衍生式双向双向脱coder普通差异等同变换器 2506.18522v1 |
Authors (5): Yang Chang, Kuang-Da Wang, Ping-Chun Hsieh, Cheng-Kuan Lin, Wen-Chih Peng
Uncovering the underlying ordinary differential equations (ODEs) that govern dynamic systems is crucial for advancing our understanding of complex phenomena. Traditional symbolic regression methods often struggle to capture the temporal dynamics and intervariable correlations inherent in ODEs. ODEFormer, a state-of-the-art method for inferring multidimensional ODEs from single trajectories, has made notable progress. However, its focus on single-trajectory evaluation is highly sensitive to initial starting points, which may not fully reflect true performance. To address this, we propose the divergence difference metric (DIV-diff), which evaluates divergence over a grid of points within the target region, offering a comprehensive and stable analysis of the variable space. Alongside this metric, we introduce DDOT (Derivative-Directed Dual-Decoder Ordinary Differential Equation Transformer), a transformer-based model designed to reconstruct multidimensional ODEs in symbolic form. By incorporating an auxiliary task predicting the ODE’s derivative, DDOT effectively captures both structure and dynamic behavior. Experiments on ODEBench show DDOT outperforms existing symbolic regression methods, achieving an absolute improvement of 4.58% and 1.62% in $P(R^2 > 0.9)$ for reconstruction and generalization tasks, respectively, and an absolute reduction of 3.55% in DIV-diff. Furthermore, DDOT demonstrates real-world applicability on an anesthesia dataset, highlighting its practical impact.
nan
Article 703
Title@2025-06-23 (1): Machine-learning based high-bandwidth magnetic sensing
Title: Machine-learning based high-bandwidth magnetic sensing | Machine-Learning-basierte High-Bandwidth-Magnet-Sensoring | 基于机械学习的高带宽磁遥感 2409.12820v2 |
Authors (5): Galya Haim, Stefano Martina, John Howell, Nir Bar-Gill, Filippo Caruso
Recent years have seen significant growth of quantum technologies, and specifically quantum sensing, both in terms of the capabilities of advanced platforms and their applications. One of the leading platforms in this context is nitrogen-vacancy (NV) color centers in diamond, providing versatile, high-sensitivity, and high-spatial-resolution magnetic sensing. Nevertheless, current schemes for spin resonance magnetic sensing (as applied by NV quantum sensing) suffer from tradeoffs associated with sensitivity, dynamic range, and bandwidth. Here we address this issue, and implement machine learning tools to enhance NV magnetic sensing in terms of the sensitivity/bandwidth tradeoff in large dynamic range scenarios. Our results indicate a potential reduction of required data points by at least a factor of 3, while maintaining the current error level. Our results promote quantum machine learning protocols for sensing applications towards more feasible and efficient quantum technologies.
nan
Article 704
Title@2025-06-23 (1): Theoretical guarantees for neural estimators in parametric statistics
Title: Theoretical guarantees for neural estimators in parametric statistics | Theoretische Garantien für neuronale Schätzer in der parametrischen Statistik | 参数统计中神经测算员的理论保障 2506.18508v1 |
Authors (3): Almut Rödder, Manuel Hentschel, Sebastian Engelke
Neural estimators are simulation-based estimators for the parameters of a family of statistical models, which build a direct mapping from the sample to the parameter vector. They benefit from the versatility of available network architectures and efficient training methods developed in the field of deep learning. Neural estimators are amortized in the sense that, once trained, they can be applied to any new data set with almost no computational cost. While many papers have shown very good performance of these methods in simulation studies and real-world applications, so far no statistical guarantees are available to support these observations theoretically. In this work, we study the risk of neural estimators by decomposing it into several terms that can be analyzed separately. We formulate easy-to-check assumptions ensuring that each term converges to zero, and we verify them for popular applications of neural estimators. Our results provide a general recipe to derive theoretical guarantees also for broader classes of architectures and estimation problems.
nan
Article 705
Title@2025-06-23 (1): Indeterminate Probability Theory
Title: Indeterminate Probability Theory | Unbestimmte Wahrscheinlichkeitstheorie | 不确定概率理论 2303.11536v2 |
Authors (11): Tao Yang, Chuang Liu, Xiaofeng Ma, Weijia Lu, Ning Wu, Bingyang Li, Zhifei Yang, Peng Liu, Lin Sun, Xiaodong Zhang, Can Zhang
Complex continuous or mixed joint distributions (e.g., P(Y | z_1, z_2, …, z_N)) generally lack closed-form solutions, often necessitating approximations such as MCMC. This paper proposes Indeterminate Probability Theory (IPT), which makes the following contributions: (1) An observer-centered framework in which experimental outcomes are represented as distributions combining ground truth with observation error; (2) The introduction of three independence candidate axioms that enable a two-phase probabilistic inference framework; (3) The derivation of closed-form solutions for arbitrary complex joint distributions under this framework. Both the Indeterminate Probability Neural Network (IPNN) model and the non-neural multivariate time series forecasting application demonstrate IPT’s effectiveness in modeling high-dimensional distributions, with successful validation up to 1000 dimensions. Importantly, IPT is consistent with classical probability theory and subsumes the frequentist equation in the limit of vanishing observation error.
nan
Article 706
Title@2025-06-23 (1): PuckTrick: A Library for Making Synthetic Data More Realistic
Title: PuckTrick: A Library for Making Synthetic Data More Realistic | PuckTrick: Eine Bibliothek, um synthetische Daten realistischer zu machen | PuckTrick:一个使合成数据更加现实的图书馆 2506.18499v1 |
Authors (3): Alessandra Agostini, Andrea Maurino, Blerina Spahiu
The increasing reliance on machine learning (ML) models for decision-making requires high-quality training data. However, access to real-world datasets is often restricted due to privacy concerns, proprietary restrictions, and incomplete data availability. As a result, synthetic data generation (SDG) has emerged as a viable alternative, enabling the creation of artificial datasets that preserve the statistical properties of real data while ensuring privacy compliance. Despite its advantages, synthetic data is often overly clean and lacks real-world imperfections, such as missing values, noise, outliers, and misclassified labels, which can significantly impact model generalization and robustness. To address this limitation, we introduce Pucktrick, a Python library designed to systematically contaminate synthetic datasets by introducing controlled errors. The library supports multiple error types, including missing data, noisy values, outliers, label misclassification, duplication, and class imbalance, offering a structured approach to evaluating ML model resilience under real-world data imperfections. Pucktrick provides two contamination modes: one for injecting errors into clean datasets and another for further corrupting already contaminated datasets. Through extensive experiments on real-world financial datasets, we evaluate the impact of systematic data contamination on model performance. Our findings demonstrate that ML models trained on contaminated synthetic data outperform those trained on purely synthetic, error-free data, particularly for tree-based and linear models such as SVMs and Extra Trees.
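The idea of controlled contamination can be sketched for one error type, missing values. The function below is a generic illustration and not the Pucktrick API; its name, its parameters, and the choice to treat the target as an overall missing-value rate are assumptions. Because values that are already missing count toward the target, the same function covers both a clean input and an already contaminated one.

```python
import random

def inject_missing(rows, column, target_fraction, seed=0):
    # Hypothetical helper: raise the share of missing values in `column`
    # to `target_fraction`, counting values that are already missing.
    rng = random.Random(seed)
    rows = [dict(r) for r in rows]  # work on a copy, keep the input intact
    already = [i for i, r in enumerate(rows) if r[column] is None]
    clean = [i for i, r in enumerate(rows) if r[column] is not None]
    wanted = round(target_fraction * len(rows))
    extra = max(0, wanted - len(already))
    for i in rng.sample(clean, min(extra, len(clean))):
        rows[i][column] = None
    return rows
```

Calling it once with `target_fraction=0.3` and again on the result with `0.5` raises the missing rate in two steps, while the original rows stay unchanged.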
nan
Article 707
Title@2025-06-23 (1): SPoRt – Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL
Title: SPoRt – Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL | SPORt – Safe Policy Ratio: Zertifizierte Schulung und Bereitstellung von Task-Richtlinien in modellfreier RL | SPORT – – 安全政策比率:无模式RL中任务政策的认证培训和部署 2504.06386v2 |
Authors (3): Jacques Cloete, Nikolaus Vertovec, Alessandro Abate
To apply reinforcement learning to safety-critical applications, we ought to provide safety guarantees during both policy training and deployment. In this work, we present theoretical results that place a bound on the probability of violating a safety property for a new task-specific policy in a model-free, episodic setting. This bound, based on a maximum policy ratio computed with respect to a ‘safe’ base policy, can also be applied to temporally-extended properties (beyond safety) and to robust control problems. To utilize these results, we introduce SPoRt, which provides a data-driven method for computing this bound for the base policy using the scenario approach, and includes Projected PPO, a new projection-based approach for training the task-specific policy while maintaining a user-specified bound on property violation. SPoRt thus enables users to trade off safety guarantees against task-specific performance. Complementing our theoretical results, we present experimental results demonstrating this trade-off and comparing the theoretical bound to posterior bounds derived from empirical violation rates.
nan
Article 708
Title@2025-06-23 (1): Leveraging neural network interatomic potentials for a foundation model of chemistry
Title: Leveraging neural network interatomic potentials for a foundation model of chemistry | Nutzung von interatomaren Potenzialen für ein Grundlagenmodell der Chemie | 为化学基础模型发挥神经网络互动潜力 2506.18497v1 |
Authors (3): So Yeon Kim, Yang Jeong Park, Ju Li
Large-scale foundation models, including neural network interatomic potentials (NIPs) in computational materials science, have demonstrated significant potential. However, despite their success in accelerating atomistic simulations, NIPs face challenges in directly predicting electronic properties and often require coupling to higher-scale models or extensive simulations for macroscopic properties. Machine learning (ML) offers alternatives for structure-to-property mapping but faces trade-offs: feature-based methods often lack generalizability, while deep neural networks require significant data and computational power. To address these trade-offs, we introduce HackNIP, a two-stage pipeline that leverages pretrained NIPs. This method first extracts fixed-length feature vectors (embeddings) from NIP foundation models and then uses these embeddings to train shallow ML models for downstream structure-to-property predictions. This study investigates whether such a hybridization approach, by “hacking” the NIP, can outperform end-to-end deep neural networks, determines the dataset size at which this transfer learning approach surpasses direct fine-tuning of the NIP, and identifies which NIP embedding depths yield the most informative features. HackNIP is benchmarked on Matbench, evaluated for data efficiency, and tested on diverse tasks including ab initio, experimental, and molecular properties. We also analyze how embedding depth impacts performance. This work demonstrates a hybridization strategy to overcome ML trade-offs in materials science, aiming to democratize high-performance predictive modeling.
nan
Article 709
Title@2025-06-23 (1): Disentangling representations of retinal images with generative models
Title: Disentangling representations of retinal images with generative models | Entwirrende Darstellungen von retinalen Bildern mit generativen Modellen | 用基因模型拆分视视视像图像的形状 2402.19186v3 |
Authors (4): Sarah Müller, Lisa M. Koch, Hendrik P. A. Lensch, Philipp Berens
Retinal fundus images play a crucial role in the early detection of eye diseases. However, the impact of technical factors on these images can pose challenges for reliable AI applications in ophthalmology. For example, large fundus cohorts are often confounded by factors like camera type, bearing the risk of learning shortcuts rather than the causal relationships behind the image generation process. Here, we introduce a population model for retinal fundus images that effectively disentangles patient attributes from camera effects, enabling controllable and highly realistic image generation. To achieve this, we propose a disentanglement loss based on distance correlation. Through qualitative and quantitative analyses, we show that our models encode desired information in disentangled subspaces and enable controllable image generation based on the learned subspaces, demonstrating the effectiveness of our disentanglement loss. The project’s code is publicly available: https://github.com/berenslab/disentangling-retinal-images.
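Distance correlation, on which the disentanglement loss is based, can be computed from pairwise distances alone. The sketch below implements the standard sample statistic for scalar features. How the authors turn it into a loss over subspaces is not stated in the abstract, so this only shows the underlying quantity, and the function names are hypothetical.

```python
import math

def _centered_distances(values):
    # Pairwise |x_i - x_j| matrix, double-centered (rows, columns, grand mean).
    n = len(values)
    d = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]  # equals column means (symmetry)
    grand_mean = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_correlation(x, y):
    # Sample distance correlation of two equally long scalar sequences.
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dvar_x, dvar_y = mean_product(a, a), mean_product(b, b)
    if dvar_x <= 0 or dvar_y <= 0:
        return 0.0  # a constant sequence carries no dependence
    dcov2 = max(mean_product(a, b), 0.0)  # guard against rounding below zero
    return math.sqrt(dcov2 / math.sqrt(dvar_x * dvar_y))
```

A value near 0 between a patient-attribute feature and a camera feature would indicate little dependence between them; a disentanglement penalty could minimise this quantity across the two subspaces.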
nan
Article 710
Title@2025-06-23 (1): AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing
Title: AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing | AnalogNAS-Bench: Ein NAS-Benchmark für analoges In-Memory Computing | AnalogNAS-Bench:NAS模拟计算基准 2506.18495v1 |
Authors (4): Aniss Bessalah, Hatem Mohamed Abdelmoumen, Karima Benatchba, Hadjer Benmeziane
Analog In-memory Computing (AIMC) has emerged as a highly efficient paradigm for accelerating Deep Neural Networks (DNNs), offering significant energy and latency benefits over conventional digital hardware. However, state-of-the-art neural networks are not inherently designed for AIMC, as they fail to account for its unique non-idealities. Neural Architecture Search (NAS) is thus needed to systematically discover neural architectures optimized explicitly for AIMC constraints. However, comparing NAS methodologies and extracting insights about robust architectures for AIMC requires a dedicated NAS benchmark that explicitly accounts for AIMC-specific hardware non-idealities. To address this, we introduce AnalogNAS-Bench, the first NAS benchmark tailored specifically for AIMC. Our study reveals three key insights: (1) standard quantization techniques fail to capture AIMC-specific noises, (2) robust architectures tend to feature wider and branched blocks, (3) skip connections improve resilience to temporal drift noise. These insights highlight the limitations of current NAS benchmarks for AIMC and pave the way for future analog-aware NAS. All the implementations used in this paper can be found at https://github.com/IBM/analog-nas/tree/main/analognasbench.
nan
Article 711
Title@2025-06-23 (1): Reliability-Adjusted Prioritized Experience Replay
Title: Reliability-Adjusted Prioritized Experience Replay | Reliability-Adjusted Prioritized Experience Replay | 调整了可靠性调整后确定优先经验重述 2506.18482v1 |
Authors (3): Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer
Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-5 benchmark.
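For context, the baseline that ReaPER extends can be sketched as follows: standard proportional PER samples transitions with probability proportional to a power of their absolute temporal difference error and corrects the resulting bias with importance weights. The reliability adjustment itself is not specified in the abstract and is therefore not shown; function names and default values are assumptions.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    # Priority p_i = (|delta_i| + eps)^alpha, normalised to a distribution.
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_batch(td_errors, batch_size, alpha=0.6, beta=0.4, seed=0):
    # Draw indices by priority and return importance-sampling weights,
    # normalised here by the largest weight in the batch.
    rng = random.Random(seed)
    probs = sampling_probabilities(td_errors, alpha)
    n = len(td_errors)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_weight = max(weights)
    return indices, [w / max_weight for w in weights]
```

Transitions with larger errors are drawn more often but receive smaller weights, which keeps the expected update closer to that of uniform sampling.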
nan
Article 712
Title@2025-06-23 (1): FREQuency ATTribution: Benchmarking Frequency-based Occlusion for Time Series Data
Title: FREQuency ATTribution: Benchmarking Frequency-based Occlusion for Time Series Data | FREQuency ATTription: Benchmarking Frequenzbasierte Okklusion für Zeitreihendaten | 时间序列数据基于频率的封闭性基准 2506.18481v1 |
Authors (3): Dominique Mercier, Andreas Dengel, Sheraz Ahmed
Deep neural networks are among the most successful algorithms in terms of performance and scalability in different domains. However, since these networks are black boxes, their usability is severely restricted due to the lack of interpretability. Existing interpretability methods do not address the analysis of time-series-based networks specifically enough. This paper shows that an analysis in the frequency domain can not only highlight relevant areas in the input signal better than existing methods, but is also more robust to fluctuations in the signal. In this paper, FreqATT is presented, a framework that enables post-hoc interpretation of networks for time series analysis. To achieve this, the relevance of different frequencies is evaluated, and the signal is either filtered or the relevant input data is marked.
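Frequency-based occlusion can be sketched as follows: remove one frequency band from the input, feed the filtered signal to the model, and record how much the output changes. This is a minimal illustration with a naive discrete Fourier transform, not the FreqATT implementation; the `model` callable, the band definition, and the relevance score are assumptions.

```python
import cmath

def dft(signal):
    # Naive O(n^2) discrete Fourier transform, adequate for short signals.
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def occlude_band(signal, low, high):
    # Zero all bins whose frequency index lies in [low, high], including
    # the mirrored negative-frequency bins, then transform back.
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if low <= min(k, n - k) <= high:
            spectrum[k] = 0
    return inverse_dft(spectrum)

def band_relevance(model, signal, bands):
    # Relevance of a band = absolute change in the model output when the
    # band is removed from the input.
    baseline = model(signal)
    return [abs(baseline - model(occlude_band(signal, lo, hi)))
            for lo, hi in bands]
```

Bands whose removal barely changes the output receive a relevance near zero, while bands the model depends on stand out.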
nan
Article 713
Title@2025-06-23 (1): LLMs on a Budget? Say HOLA
Title: LLMs on a Budget? Say HOLA | LLMs auf einem Budget? Sagen Sie HOLA | 预算LLLM 预算? 2506.18952v1 |
Authors (7): Zohaib Hasan Siddiqui, Jiechao Gao, Ebad Shabbir, Mohammad Anas Azeez, Rafiq Ali, Gautam Siddharth Kashyap, Usman Naseem
Running Large Language Models (LLMs) on edge devices is constrained by high compute and memory demands, posing a barrier for real-time applications in sectors like healthcare, education, and embedded systems. Current solutions such as quantization, pruning, and retrieval-augmented generation (RAG) offer only partial optimizations and often compromise on speed or accuracy. We introduce HOLA, an end-to-end optimization framework for efficient LLM deployment. Internally, it leverages Hierarchical Speculative Decoding (HSD) for faster inference without quality loss. Externally, AdaComp-RAG adjusts retrieval complexity based on context needs. Together with LoBi, which blends structured pruning (LoRA) and quantization, HOLA delivers significant gains: 17.6% EMA on GSM8K, 10.5% MCA on ARC, and reduced latency and memory on edge devices like Jetson Nano, proving both scalable and production-ready.
nan
Article 714
Title@2025-06-23 (1): A Deep Convolutional Neural Network-Based Novel Class Balancing for Imbalance Data Segmentation
Title: A Deep Convolutional Neural Network-Based Novel Class Balancing for Imbalance Data Segmentation | Eine tiefkonvolutionäre neurale Netzwerk-basierte neuartige Klassenbalancing für Ungleichgewicht-Datensegmentierung | 以深革命神经网络为基础、基于深度神经网络的新奇分类平衡,以平衡数据分割 2506.18474v1 |
Authors (6): Atifa Kalsoom, M. A. Iftikhar, Amjad Ali, Zubair Shah, Shidin Balakrishnan, Hazrat Ali
Retinal fundus images provide valuable insights into the human eye’s interior structure and crucial features, such as blood vessels, optic disk, macula, and fovea. However, accurate segmentation of retinal blood vessels can be challenging due to imbalanced data distribution and varying vessel thickness. In this paper, we propose BLCB-CNN, a novel pipeline based on deep learning and a bi-level class balancing scheme to achieve vessel segmentation in retinal fundus images. The BLCB-CNN scheme uses a Convolutional Neural Network (CNN) architecture and an empirical approach to balance the distribution of pixels across vessel and non-vessel classes and within thin and thick vessels. Level-I is used for vessel/non-vessel balancing and Level-II is used for thick/thin vessel balancing. Additionally, pre-processing of the input retinal fundus image is performed by Global Contrast Normalization (GCN), Contrast Limited Adaptive Histogram Equalization (CLAHE), and gamma corrections to increase intensity uniformity as well as to enhance the contrast between vessels and background pixels. The resulting balanced dataset is used for classification-based segmentation of the retinal vascular tree. We evaluate the proposed scheme on standard retinal fundus images and achieve superior performance measures, including an area under the ROC curve of 98.23%, Accuracy of 96.22%, Sensitivity of 81.57%, and Specificity of 97.65%. We also demonstrate the method’s efficacy through external cross-validation on STARE images, confirming its generalization ability.
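Two of the pre-processing steps named above, Global Contrast Normalization and gamma correction, are simple enough to sketch on a flat list of pixel intensities. CLAHE is omitted because it needs tile-wise histogram logic. The function names, the rescaling step between the two operations, and the parameter values are assumptions, not details from the paper.

```python
import math

def global_contrast_normalize(pixels, eps=1e-8):
    # Subtract the image mean and divide by the image standard deviation.
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    return [(p - mean) / max(std, eps) for p in pixels]

def rescale_to_unit(pixels):
    # Map intensities linearly to [0, 1] so that gamma correction is defined.
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]

def gamma_correct(pixels, gamma):
    # Power-law transform on intensities in [0, 1]; gamma < 1 brightens.
    return [p ** gamma for p in pixels]
```

A possible chain is `gamma_correct(rescale_to_unit(global_contrast_normalize(img)), 0.8)`, where `img` is a flattened grayscale image and 0.8 is an arbitrary example value.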
nan
Article 715
Title@2025-06-23 (1): A Motivational Architecture for Open-Ended Learning Challenges in Robots
Title: A Motivational Architecture for Open-Ended Learning Challenges in Robots | Eine motivierende Architektur für offene Lernherausforderungen in Robotern | 机器人中开放式学习挑战的动力结构 2506.18454v1 |
Authors (4): Alejandro Romero, Gianluca Baldassarre, Richard J. Duro, Vieri Giuliano Santucci
Developing agents capable of autonomously interacting with complex and dynamic environments, where task structures may change over time and prior knowledge cannot be relied upon, is a key prerequisite for deploying artificial systems in real-world settings. The open-ended learning framework identifies the core challenges for creating such agents, including the ability to autonomously generate new goals, acquire the necessary skills (or curricula of skills) to achieve them, and adapt to non-stationary environments. While many existing works tackle various aspects of these challenges in isolation, few propose integrated solutions that address them simultaneously. In this paper, we introduce H-GRAIL, a hierarchical architecture that, through the use of different typologies of intrinsic motivations and interconnected learning mechanisms, autonomously discovers new goals, learns the required skills for their achievement, generates skill sequences for tackling interdependent tasks, and adapts to non-stationary environments. We tested H-GRAIL in a real robotic scenario, demonstrating how the proposed solutions effectively address the various challenges of open-ended learning.
nan
Article 716
Title@2025-06-23 (1): xInv: Explainable Optimization of Inverse Problems
Title: xInv: Explainable Optimization of Inverse Problems | xInv: Erklärbare Optimierung inverser Probleme | xInv: 反向问题的可解释优化 2506.11056v2 |
Authors (4): Sean Memery, Kevin Denamganai, Anna Kapron-King, Kartic Subr
Inverse problems are central to a wide range of fields, including healthcare, climate science, and agriculture. They involve the estimation of inputs, typically via iterative optimization, to some known forward model so that it produces a desired outcome. Despite considerable development in the explainability and interpretability of forward models, the iterative optimization of inverse problems remains largely cryptic to domain experts. We propose a methodology to produce explanations, from traces produced by an optimizer, that are interpretable by humans at the abstraction of the domain. The central idea in our approach is to instrument a differentiable simulator so that it emits natural language events during its forward and backward passes. In a post-process, we use a Language Model to create an explanation from the list of events. We demonstrate the effectiveness of our approach with an illustrative optimization problem and an example involving the training of a neural network.
nan
Article 717
Title@2025-06-23 (1): TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning
Title: TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning | TreeSynth: Verschiedenste Daten von Scratch über baumgeführte Subraumpartitionierung synthetisieren | TreeSynth: 通过树制辅助空间分割从 Scratch 通过树制辅助空间分隔从 Scratch 中合成多样性数据 2503.17195v2 |
Authors (10): Sheng Wang, Pengan Chen, Jingqi Zhou, Qintong Li, Jingwei Dong, Jiahui Gao, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu
Model customization necessitates high-quality and diverse datasets, but acquiring such data remains time-consuming and labor-intensive. Despite the great potential of large language models (LLMs) for data synthesis, current approaches are constrained by limited seed data, model biases, and low-variation prompts, resulting in limited diversity and biased distributions with the increase of data scales. To tackle this challenge, we introduce TREESYNTH, a tree-guided subspace-based data synthesis approach inspired by decision trees. It constructs a spatial partitioning tree to recursively divide a task-specific full data space (i.e., root node) into numerous atomic subspaces (i.e., leaf nodes) with mutually exclusive and exhaustive attributes to ensure both distinctiveness and comprehensiveness before synthesizing samples within each atomic subspace. This globally dividing-and-synthesizing method finally collects subspace samples into a comprehensive dataset, effectively circumventing repetition and space collapse to ensure the diversity of large-scale data synthesis. Furthermore, the spatial partitioning tree enables sample allocation into atomic subspaces, allowing the rebalancing of existing datasets for more balanced and comprehensive distributions. Empirically, extensive experiments across diverse benchmarks consistently demonstrate the superior data diversity, model performance, and robust scalability of TREESYNTH compared to both human-crafted datasets and peer data synthesis methods, with an average performance gain reaching 10%. Besides, the consistent improvements of TREESYNTH-balanced datasets highlight its efficacious application to redistribute existing datasets for more comprehensive coverage and the induced performance enhancement. The code is available at https://github.com/cpa2001/TreeSynth.
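The divide-then-synthesize idea can be sketched with a fixed attribute table standing in for the LLM-driven partitioning: each tree level splits on one attribute, each leaf is a mutually exclusive combination of values, and samples are generated per leaf. The attribute table, the `generate` callable, and the function names are hypothetical; in TreeSynth the splits and the samples come from a language model.

```python
def build_leaves(attributes, path=None):
    # Recursively split the data space on one attribute per tree level.
    # Each leaf is a dict fixing a value for every attribute, so leaves are
    # mutually exclusive and together cover the whole space.
    path = path or {}
    remaining = [name for name in attributes if name not in path]
    if not remaining:
        return [path]
    name = remaining[0]
    leaves = []
    for value in attributes[name]:
        leaves.extend(build_leaves(attributes, {**path, name: value}))
    return leaves

def synthesize(leaves, generate, per_leaf=2):
    # Call the (hypothetical) generator a fixed number of times per leaf,
    # which spreads samples evenly over the atomic subspaces.
    return [generate(leaf, i) for leaf in leaves for i in range(per_leaf)]
```

With `{"topic": ["algebra", "geometry"], "difficulty": ["easy", "medium", "hard"]}` the tree has six leaves, and `synthesize` requests the same number of samples for each of them.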
nan
Article 718
Title@2025-06-23 (1): LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently
Title: LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently | LoRA-One: Ein-Schritt-Full Gradient könnte genug für feines Tuning von großen Sprachmodellen sein, wahrscheinlich und effizient | LORA-OI: 精巧、高效、可预见和高效的微调大语言模型的单步全步可满足需要 2502.01235v3 |
Authors (3): Yuanhe Zhang, Fanghui Liu, Yudong Chen
This paper explores how theory can guide and enhance practical algorithms, using Low-Rank Adaptation (LoRA, Hu et al. 2022) in large language models as a case study. We rigorously prove that, under gradient descent, LoRA adapters align with specific singular subspaces of the one-step full fine-tuning gradient. This result suggests that, by properly initializing the adapters using the one-step full gradient, subspace alignment can be achieved immediately, and that this holds for both linear and nonlinear models. Building on our theory, we propose a theory-driven algorithm, LoRA-One, for which linear convergence (as well as generalization) is established and for which incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. Besides, our theory reveals connections between LoRA-One and other gradient-alignment-based methods, helping to clarify misconceptions in the design of such algorithms. LoRA-One achieves significant empirical improvements over LoRA and its variants across benchmarks in natural language understanding, mathematical reasoning, and code generation. Code is available at: https://github.com/YuanheZ/LoRA-One.
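One plausible reading of "initializing the adapters using the one-step full gradient" is sketched below. The symbols $W_0$ (pretrained weight), $L$ (fine-tuning loss), $r$ (adapter rank), $\gamma$ (scale), and the exact scaling of the factors are assumptions for illustration, not the paper's stated algorithm.

$$ G = -\nabla_W L(W_0) = U \Sigma V^\top, \qquad B_0 = \sqrt{\gamma}\, U_r \Sigma_r^{1/2}, \qquad A_0 = \sqrt{\gamma}\, \Sigma_r^{1/2} V_r^\top $$

Here $U_r$, $\Sigma_r$, $V_r$ keep the top $r$ singular triplets of $G$. Under these assumptions $B_0 A_0 = \gamma\, U_r \Sigma_r V_r^\top$, which by the Eckart-Young theorem is the best rank-$r$ approximation of a full gradient step of size $\gamma$, so the adapters start inside the singular subspaces they would otherwise have to align with during training.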
nan
Article 719
Title@2025-06-23 (1): New Hardness Results for Low-Rank Matrix Completion
Title: New Hardness Results for Low-Rank Matrix Completion | Neue Härte-Ergebnisse für Low-Rank-Matrix-Vervollständigung | 低 Rank 矩阵补全新硬性结果 2506.18440v1 |
Authors (2): Dror Chawin, Ishay Haviv
The low-rank matrix completion problem asks whether a given real matrix with missing values can be completed so that the resulting matrix has low rank or is close to a low-rank matrix. The completed matrix is often required to satisfy additional structural constraints, such as positive semi-definiteness or a bounded infinity norm. The problem arises in various research fields, including machine learning, statistics, and theoretical computer science, and has broad real-world applications. This paper presents new $\mathsf{NP} $-hardness results for low-rank matrix completion problems. We show that for every sufficiently large integer $d$ and any real number $\varepsilon \in [ 2^{-O(d)},\frac{1}{7}]$, given a partial matrix $A$ with exposed values of magnitude at most $1$ that admits a positive semi-definite completion of rank $d$, it is $\mathsf{NP}$-hard to find a positive semi-definite matrix that agrees with each given value of $A$ up to an additive error of at most $\varepsilon$, even when the rank is allowed to exceed $d$ by a multiplicative factor of $O (\frac{1}{\varepsilon ^2 \cdot \log(1/\varepsilon)} )$. This strengthens a result of Hardt, Meka, Raghavendra, and Weitz (COLT, 2014), which applies to multiplicative factors smaller than $2$ and to $\varepsilon $ that decays polynomially in $d$. We establish similar $\mathsf{NP}$-hardness results for the case where the completed matrix is constrained to have a bounded infinity norm (rather than be positive semi-definite), for which all previous hardness results rely on complexity assumptions related to the Unique Games Conjecture. Our proofs involve a novel notion of nearly orthonormal representations of graphs, the concept of line digraphs, and bounds on the rank of perturbed identity matrices.
nan
Article 720
Title@2025-06-23 (1): Thermal Vision: Pioneering Non-Invasive Temperature Tracking in Congested Spaces
Title: Thermal Vision: Pioneering Non-Invasive Temperature Tracking in Congested Spaces | Thermische Vision: Pionierische nicht-invasive Temperaturverfolgung in überlasteten Räumen | 热远景:在拥挤空间进行最先出现的非侵入性温度跟踪 2412.00863v2 |
Authors (2): Arijit Samal, Haroon R Lone
Non-invasive temperature monitoring of individuals plays a crucial role in identifying and isolating symptomatic individuals. Temperature monitoring becomes particularly vital in settings characterized by close human proximity, often referred to as dense settings. However, existing research on non-invasive temperature estimation using thermal cameras has predominantly focused on sparse settings. Unfortunately, the risk of disease transmission is significantly higher in dense settings like movie theaters or classrooms. Consequently, there is an urgent need to develop robust temperature estimation methods tailored explicitly for dense settings. Our study proposes a non-invasive temperature estimation system that combines a thermal camera with an edge device. Our system employs YOLO models for face detection and utilizes a regression framework for temperature estimation. We evaluated the system on a diverse dataset collected in dense and sparse settings. Our proposed face detection model achieves an impressive mAP score of over 84 in both in-dataset and cross-dataset evaluations. Furthermore, the regression framework demonstrates remarkable performance with a mean square error of 0.18$^{\circ}$C and an impressive $R^2$ score of 0.96. Our experiments’ results highlight the developed system’s effectiveness, positioning it as a promising solution for continuous temperature monitoring in real-world applications. With this paper, we release our dataset and programming code publicly.
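The two regression metrics reported above can be computed as in the following sketch. The function names are hypothetical, and the temperatures in the usage note are made-up illustration values, not data from the paper.

```python
def mean_squared_error(y_true, y_pred):
    # Average squared residual; for temperatures in degrees Celsius the
    # result is in squared degrees.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For reference temperatures `[36.5, 37.0, 38.2, 36.8]`, a perfect prediction gives an $R^2$ of 1, while always predicting the mean of the reference values gives an $R^2$ of 0.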
Article 721
Title@2025-06-23 (1): Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations
Title: Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations | Harmony: Ein gemeinsamer selbstüberwachter und schwach-überwachter Rahmen für das Lernen von allgemeinen visuellen Repräsentationen | 和谐:学习一般目的视觉表现的共同自我监督、弱力监督框架 2405.14239v3 |
Authors (3): Mohammed Baharoon, Jonathan Klein, Dominik L. Michels
Vision-language contrastive learning frameworks such as CLIP enable learning representations from natural language supervision and provide strong zero-shot classification capabilities. However, due to the nature of the supervisory signal in these paradigms, they lack the ability to learn localized features, leading to degraded performance on dense prediction tasks such as segmentation and detection. On the other hand, self-supervised learning methods have shown the ability to learn granular representations, complementing the high-level features in vision-language training. In this work, we present Harmony, a framework that combines vision-language training with discriminative and generative self-supervision to learn visual features that generalize across different downstream vision tasks. Our framework is specifically designed to work on web-scraped data by not relying on negative examples in the self-supervised learning path and by addressing the one-to-one correspondence issue using soft CLIP targets generated by an EMA model. Moreover, Harmony optimizes for five different objectives simultaneously, efficiently utilizing the supervision in each data example, which makes it even better suited to data-constrained settings. We comprehensively evaluate Harmony across various downstream vision tasks and find that it significantly outperforms the baseline CLIP and outperforms the previously leading joint self- and weakly supervised methods, SLIP, MaskCLIP, and DetailCLIP.
Article 722
Title@2025-06-23 (1): How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models
Title: How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models | Wie robust ist Modellbearbeitung nach Feinsteuerung? Eine empirische Studie zu Text-zu-Bild-Diffusionsmodellen | 微调后模型编辑的力度如何? 文本到图像传播模型的经验研究 2506.18428v1 |
Authors (4): Feng He, Zhenyang Liu, Marco Valentino, Zhixue Zhao
Model editing offers a low-cost technique to inject or correct a particular behavior in a pre-trained model without extensive retraining, supporting applications such as factual correction and bias mitigation. Despite this common practice, it remains unknown whether edits persist after fine-tuning or whether they are inadvertently reversed. This question has fundamental practical implications. For example, if fine-tuning removes prior edits, it could serve as a defence mechanism against hidden malicious edits. Conversely, the unintended removal of edits related to bias mitigation could pose serious safety concerns. We systematically investigate the interaction between model editing and fine-tuning in the context of text-to-image (T2I) diffusion models, which are known to exhibit biases and generate inappropriate content. Our study spans two T2I model families (Stable Diffusion and FLUX), two state-of-the-art editing techniques, and three fine-tuning methods (DreamBooth, LoRA, and DoRA). Through an extensive empirical analysis across diverse editing tasks and evaluation metrics, our findings reveal a trend: edits generally fail to persist through fine-tuning, even when fine-tuning is tangential or unrelated to the edits. Notably, we observe that DoRA exhibits the strongest edit reversal effect. At the same time, among editing methods, UCE demonstrates greater robustness, retaining significantly higher efficacy post-fine-tuning compared to ReFACT. These findings highlight a crucial limitation in current editing methodologies, emphasizing the need for more robust techniques to ensure reliable long-term control and alignment of deployed AI systems. These findings have dual implications for AI safety: they suggest that fine-tuning could serve as a remediation mechanism for malicious edits while simultaneously highlighting the need for re-editing after fine-tuning to maintain beneficial safety and alignment properties.
Article 723
Title@2025-06-23 (1): Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Title: Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models | Circuit Compositions: Erforschen von modularen Strukturen in transformerbasierten Sprachmodellen | 电路构成:探索以变换语言模式为基础的模块结构 2410.01434v3 |
Authors (3): Philipp Mondorf, Sondre Wold, Barbara Plank
A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions through subnetworks that can be composed to perform more complex tasks. Recent advances in mechanistic interpretability have made progress in identifying $\textit{circuits}$, which represent the minimal computational subgraphs responsible for a model’s behavior on specific tasks. However, most studies focus on identifying circuits for individual tasks without investigating how functionally similar circuits $\textit{relate}$ to each other. To address this gap, we study the modularity of neural networks by analyzing circuits for highly compositional subtasks within a transformer-based language model. Specifically, given a probabilistic context-free grammar, we identify and compare circuits responsible for ten modular string-edit operations. Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness. Moreover, we demonstrate that the circuits identified can be reused and combined through set operations to represent more complex functional model capabilities.
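The last two findings (node overlap between functionally similar circuits, and reuse through set operations) can be illustrated with a minimal sketch. It treats a circuit as a plain set of node names and measures overlap by intersection-over-union; both the node names and the choice of IoU are assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical circuits: each is the set of model components (nodes) it keeps.
# The node names below are invented for illustration only.

def node_overlap(a: set, b: set) -> float:
    """Intersection-over-union of two circuits' node sets (an assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compose(*circuits: set) -> set:
    """Combine circuits by set union, giving a candidate circuit for a composite task."""
    combined = set()
    for circuit in circuits:
        combined |= circuit
    return combined


copy_circuit = {"L0.H1", "L1.H3", "L2.MLP"}
reverse_circuit = {"L0.H1", "L1.H3", "L2.H0"}
swap_circuit = {"L0.H2", "L3.MLP"}

print(node_overlap(copy_circuit, reverse_circuit))  # 2 shared / 4 total = 0.5
print(sorted(compose(copy_circuit, swap_circuit)))
```

Whether a composed circuit is faithful on the composite task still has to be checked empirically; the union only proposes a candidate subnetwork.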
Article 724
Title@2025-06-23 (1): An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets
Title: An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets | Ein erweiterter Benchmark, der den Rand der Ungewissheit für aktives Lernen in Tabellendatensätzen bestätigt | 扩大基准范围,在表格数据集中重新覆盖并肯定不确定抽样的边缘,以便积极学习 2306.08954v3 |
Authors (4): Po-Yi Lu, Yi-Jie Cheng, Chun-Liang Li, Hsuan-Tien Lin
Active Learning (AL) addresses the crucial challenge of enabling machines to efficiently gather labeled examples through strategic queries. Among the many AL strategies, Uncertainty Sampling (US) stands out as one of the most widely adopted. US queries the example(s) that the current model finds uncertain, proving to be both straightforward and effective. Despite claims in the literature suggesting superior alternatives to US, community-wide acceptance remains elusive. In fact, existing benchmarks for tabular datasets present conflicting conclusions on the continued competitiveness of US. In this study, we review the literature on AL strategies in the last decade and build the most comprehensive open-source AL benchmark to date to understand the relative merits of different AL strategies. The benchmark surpasses existing ones by covering a broader range of strategies, models, and data. By investigating the conflicting conclusions of existing tabular AL benchmarks under broad AL experimental settings, we uncover fresh insights into the often-overlooked issue of model compatibility in the context of US. Specifically, we find that using different models for querying unlabeled examples and for the learning task degrades US’s effectiveness. Notably, our findings affirm that US maintains a competitive edge over other strategies when paired with compatible models. These findings have practical implications and provide a concrete recipe for AL practitioners, empowering them to make informed decisions when working on tabular classification with limited labeled data. The code for this project is available at https://github.com/ariapoy/active-learning-benchmark.
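For readers unfamiliar with US, the query step can be sketched in a few lines. The sketch below scores each unlabeled example by the entropy of the current model's predicted class distribution and queries the highest-scoring ones; the function names and the toy probabilities are illustrative assumptions, and entropy is only one of several common uncertainty scores.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertainty_query(pool_probs, batch_size=1):
    """Return indices of the `batch_size` most uncertain pool examples.

    `pool_probs[i]` is the current model's predicted class distribution for
    unlabeled example i. Entropy is one common uncertainty score; margin or
    least-confidence scores are drop-in alternatives.
    """
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:batch_size]


# Toy pool of three unlabeled examples with binary predictions.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30]]
print(uncertainty_query(pool_probs, batch_size=2))  # [1, 2]
```

In light of the compatibility finding above, `pool_probs` would ideally come from the same model that is subsequently trained on the queried labels.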
Article 725
Title@2025-06-23 (1): FARCLUSS: Fuzzy Adaptive Rebalancing and Contrastive Uncertainty Learning for Semi-Supervised Semantic Segmentation
Title: FARCLUSS: Fuzzy Adaptive Rebalancing and Contrastive Uncertainty Learning for Semi-Supervised Semantic Segmentation | FARCLUSS: Fuzzy Adaptive Rebalancing and Contrastive Uncertainty Learning für semi-überwachte semantische Segmentierung | CFACLUSS: 半超声分解的模糊适应性再平衡和相矛盾的不确定性学习 2506.11142v2 |
Authors (3): Ebenezer Tarubinga, Jenifer Kalafatovich, Seong-Whan Lee
Semi-supervised semantic segmentation (SSSS) faces persistent challenges in effectively leveraging unlabeled data, such as ineffective utilization of pseudo-labels, exacerbation of class imbalance biases, and neglect of prediction uncertainty. Current approaches often discard uncertain regions through strict thresholding that favours dominant classes. To address these limitations, we introduce a holistic framework that transforms uncertainty into a learning asset through four principal components: (1) fuzzy pseudo-labeling, which preserves soft class distributions from top-K predictions to enrich supervision; (2) uncertainty-aware dynamic weighting, which modulates pixel-wise contributions via entropy-based reliability scores; (3) adaptive class rebalancing, which dynamically adjusts losses to counteract long-tailed class distributions; and (4) lightweight contrastive regularization, which encourages compact and discriminative feature embeddings. Extensive experiments on benchmarks demonstrate that our method outperforms current state-of-the-art approaches, achieving significant improvements in the segmentation of under-represented classes and ambiguous regions.
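Components (1) and (2) can be made concrete with a small per-pixel sketch. The abstract does not give the exact formulas, so everything below is an assumption: the soft label is obtained by renormalizing the top-K teacher probabilities, and the reliability score is one minus the normalized entropy of the teacher prediction.

```python
import math


def fuzzy_pseudo_label(probs, k=2):
    """Keep the top-k class probabilities and renormalize them into a soft label."""
    top = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
    mass = sum(probs[c] for c in top)
    return [probs[c] / mass if c in top else 0.0 for c in range(len(probs))]


def reliability(probs):
    """Entropy-based weight in [0, 1]: 1 for a one-hot prediction, 0 for a uniform one."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - h / math.log(len(probs))


def weighted_soft_ce(student_probs, teacher_probs, k=2):
    """Soft-target cross-entropy for one pixel, scaled by the teacher's reliability."""
    target = fuzzy_pseudo_label(teacher_probs, k)
    ce = -sum(t * math.log(s) for t, s in zip(target, student_probs) if t > 0.0)
    return reliability(teacher_probs) * ce


teacher = [0.5, 0.3, 0.2]   # toy teacher prediction over three classes
student = [0.4, 0.4, 0.2]   # toy student prediction for the same pixel
print(fuzzy_pseudo_label(teacher, k=2))   # soft label over the two top classes
print(weighted_soft_ce(student, teacher))
```

Unlike hard thresholding, an uncertain pixel still contributes a gradient here, only with a smaller weight.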
Article 726
Title@2025-06-23 (1): Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings
Title: Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings | Generative Modellierung von Voll-Atom-Protein-Konformationen mit Latent Diffusion auf Graph-Embeddings | 利用在图形嵌入器上延迟扩散生成全原子蛋白质变形的生成模型 2506.17064v2 |
Authors (5): Aditya Sengar, Ali Hariri, Daniel Probst, Patrick Barth, Pierre Vandergheynst
Generating diverse, all-atom conformational ensembles of dynamic proteins such as G-protein-coupled receptors (GPCRs) is critical for understanding their function, yet most generative models simplify atomic detail or ignore conformational diversity altogether. We present latent diffusion for full protein generation (LD-FPG), a framework that constructs complete all-atom protein structures, including every side-chain heavy atom, directly from molecular dynamics (MD) trajectories. LD-FPG employs a Chebyshev graph neural network (ChebNet) to obtain low-dimensional latent embeddings of protein conformations, which are processed using three pooling strategies: blind, sequential and residue-based. A diffusion model trained on these latent representations generates new samples that a decoder, optionally regularized by dihedral-angle losses, maps back to Cartesian coordinates. Using D2R-MD, a 2-microsecond MD trajectory (12 000 frames) of the human dopamine D2 receptor in a membrane environment, the sequential and residue-based pooling strategies reproduce the reference ensemble with high structural fidelity (all-atom lDDT of approximately 0.7; C-alpha-lDDT of approximately 0.8) and recover backbone and side-chain dihedral-angle distributions with a Jensen-Shannon divergence of less than 0.03 compared to the MD data. LD-FPG thereby offers a practical route to system-specific, all-atom ensemble generation for large proteins, providing a promising tool for structure-based therapeutic design on complex, dynamic targets. The D2R-MD dataset and our implementation are freely available to facilitate further research.
Article 727
Title@2025-06-23 (1): Optimizing Sensory Neurons: Nonlinear Attention Mechanisms for Accelerated Convergence in Permutation-Invariant Neural Networks for Reinforcement Learning
Title: Optimizing Sensory Neurons: Nonlinear Attention Mechanisms for Accelerated Convergence in Permutation-Invariant Neural Networks for Reinforcement Learning | Sensorische Neuronen optimieren: Nichtlineare Aufmerksamkeitsmechanismen für beschleunigte Konvergenz in Permutations-Invarianten Neuralen Netzwerken für Verstärkungslernen | 优化感知神经中世纪:在加强学习的常变-内在神经网络中加速趋同的非线性注意机制 2506.00691v4 |
Authors (5): Junaid Muzaffar, Khubaib Ahmed, Ingo Frommholz, Zeeshan Pervez, Ahsan ul Haq
Training reinforcement learning (RL) agents often requires significant computational resources and prolonged training durations. To address this challenge, we build upon prior work that introduced a neural architecture with permutation-invariant sensory processing. We propose a modified attention mechanism that applies a non-linear transformation to the key vectors (K), producing enriched representations (K’) through a custom mapping function. This Nonlinear Attention (NLA) mechanism enhances the representational capacity of the attention layer, enabling the agent to learn more expressive feature interactions. As a result, our model achieves significantly faster convergence and improved training efficiency, while maintaining performance on par with the baseline. These results highlight the potential of nonlinear attention mechanisms to accelerate reinforcement learning without sacrificing effectiveness.
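The mechanism can be sketched as ordinary scaled dot-product attention in which the keys pass through a non-linear map before the scores are computed. The abstract does not specify the custom mapping function, so `tanh` is used below purely as a stand-in; the function names and toy inputs are likewise assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nonlinear_key(k):
    """Stand-in for the paper's unspecified custom mapping K -> K' (assumed: tanh)."""
    return [math.tanh(x) for x in k]


def nla_attention(queries, keys, values):
    """Scaled dot-product attention whose keys pass through `nonlinear_key` first."""
    mapped = [nonlinear_key(k) for k in keys]
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in mapped])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(dim)])
    return outputs


queries = [[1.0, 0.0], [0.0, 1.0]]                # fixed queries (toy values)
keys = [[0.5, -1.0], [2.0, 0.3], [-0.7, 0.9]]     # one key per sensory input
values = [[1.0], [2.0], [3.0]]                    # one value per sensory input
out = nla_attention(queries, keys, values)
print(out)
```

Because the map is applied to each key independently, reordering the sensory inputs (keys and values together) leaves the output unchanged, so the permutation invariance of the baseline architecture is preserved.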
Article 728
Title@2025-06-23 (1): Online high-precision prediction method for injection molding product weight by integrating time series/non-time series mixed features and feature attention mechanism
Title: Online high-precision prediction method for injection molding product weight by integrating time series/non-time series mixed features and feature attention mechanism | Online-Präzisionsvorhersageverfahren für das Gewicht des Spritzgussprodukts durch Integration von Zeitreihen/Nicht-Zeitreihen gemischte Funktionen und Feature-Aufmerksamkeitsmechanismus | 通过将时间序列/非时间序列混合特点和特征关注机制相结合,对注入模型产品重量的在线高精确度预测方法 2506.18950v1 |
Authors (5): Maoyuan Li, Sihong Li, Guancheng Shen, Yun Zhang, Huamin Zhou
To address the challenges of untimely detection and online monitoring lag in injection molding quality anomalies, this study proposes a mixed feature attention-artificial neural network (MFA-ANN) model for high-precision online prediction of product weight. By integrating mechanism-based and data-driven analysis, the proposed architecture decouples time series data (e.g., melt flow dynamics, thermal profiles) from non-time series data (e.g., mold features, pressure settings), enabling hierarchical feature extraction. A self-attention mechanism is embedded during cross-domain feature fusion to dynamically calibrate inter-modality feature weights, thereby emphasizing critical determinants of weight variability. The results demonstrate that the MFA-ANN model achieves an RMSE of 0.0281 with a 0.5 g weight fluctuation tolerance, outperforming conventional benchmarks: a 25.1% accuracy improvement over non-time series ANN models, 23.0% over LSTM networks, 25.7% over SVR, and 15.6% over RF models. Ablation studies quantitatively validate the synergistic enhancement derived from the integration of mixed feature modeling (contributing 22.4%) and the attention mechanism (contributing 11.2%), significantly enhancing the model’s adaptability to varying working conditions and its resistance to noise. Moreover, critical sensitivity analyses further reveal that data resolution significantly impacts prediction reliability: low-fidelity sensor inputs degrade performance by 23.8% RMSE compared to high-precision measurements. Overall, this study provides an efficient and reliable solution for the intelligent quality control of injection molding processes.
Article 729
Title@2025-06-23 (1): ADNF-Clustering: An Adaptive and Dynamic Neuro-Fuzzy Clustering for Leukemia Prediction
Title: ADNF-Clustering: An Adaptive and Dynamic Neuro-Fuzzy Clustering for Leukemia Prediction | ADNF-Clustering: Adaptives und dynamisches Neuro-Fuzzy-Clustering für Leukämie-Vorhersage | ADNF-CLADNF:白血病预测适应性和动态神经结扎聚群 2506.18396v1 |
Authors (4): Marco Aruta, Ciro Listone, Giuseppe Murano, Aniello Murano
Leukemia diagnosis and monitoring rely increasingly on high-throughput image data, yet conventional clustering methods lack the flexibility to accommodate evolving cellular patterns and quantify uncertainty in real time. We introduce Adaptive and Dynamic Neuro-Fuzzy Clustering, a novel streaming-capable framework that combines Convolutional Neural Network-based feature extraction with an online fuzzy clustering engine. ADNF initializes soft partitions via Fuzzy C-Means, then continuously updates micro-cluster centers, densities, and fuzziness parameters using a Fuzzy Temporal Index (FTI) that measures entropy evolution. A topology refinement stage performs density-weighted merging and entropy-guided splitting to guard against over- and under-segmentation. On the C-NMC leukemia microscopy dataset, our tool achieves a silhouette score of 0.51, demonstrating superior cohesion and separation over static baselines. The method’s adaptive uncertainty modeling and label-free operation hold immediate potential for integration within the INFANT pediatric oncology network, enabling scalable, up-to-date support for personalized leukemia management.
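The Fuzzy C-Means initialization mentioned above relies on two textbook update rules, sketched here in plain Python. This covers only standard FCM with Euclidean distance; the FTI-driven streaming updates and the merge/split stage are not described in enough detail in the abstract to sketch, and the toy points are illustrative.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-Means membership of `point` in each cluster (fuzzifier m > 1)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
             for center in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def update_center(points, memberships, m=2.0):
    """Membership-weighted mean of the points for one cluster."""
    weights = [u ** m for u in memberships]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(dim)]


centers = [[0.0, 0.0], [4.0, 0.0]]
u = fcm_memberships([1.0, 0.0], centers)
print(u)  # roughly [0.9, 0.1]: the point is three times closer to the first center
```

Soft memberships of this kind are what allow a clustering method to report uncertainty for ambiguous cells instead of forcing a hard assignment.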
Article 730
Title@2025-06-23 (1): Reliable Vertical Federated Learning in 5G Core Network Architecture
Title: Reliable Vertical Federated Learning in 5G Core Network Architecture | Zuverlässiges vertikales Federated Learning in 5G Core Network Architecture | 5G核心网络架构中的可靠垂直联邦学习 2505.15244v3 |
Authors (2): Mohamad Mestoukirdi, Mourad Khanfouci
This work proposes a new algorithm to mitigate model generalization loss in Vertical Federated Learning (VFL) operating under client reliability constraints within 5G Core Networks (CNs). Recently studied and endorsed by 3GPP, VFL enables collaborative and load-balanced model training and inference across the CN. However, the performance of VFL degrades significantly when the Network Data Analytics Functions (NWDAFs), which serve as primary clients for VFL model training and inference, experience reliability issues stemming from resource constraints and operational overhead. Unlike edge environments, CN environments adopt fundamentally different data management strategies, characterized by more centralized data orchestration capabilities. This presents opportunities to implement better distributed solutions that take full advantage of the CN's data handling flexibility. Leveraging this flexibility, we propose a method that optimizes the vertical feature split among clients while centrally defining their local models based on reliability metrics. Our empirical evaluation demonstrates the effectiveness of our proposed algorithm, showing improved performance over traditional baseline methods.
Article 731
Title@2025-06-23 (1): SLR: An Automated Synthesis Framework for Scalable Logical Reasoning
Title: SLR: An Automated Synthesis Framework for Scalable Logical Reasoning | SLR: Ein automatisiertes Synthese-Framework für skalierbare logische Vernunft | SLR: 一个可缩放逻辑理由的自动合成框架 2506.15787v2 |
Authors (9): Lukas Helff, Ahmad Omar, Felix Friedrich, Wolfgang Stammer, Antonia Wüst, Tim Woydt, Rupert Mitchell, Patrick Schramowski, Kristian Kersting
We introduce SLR, an end-to-end framework for systematic evaluation and training of Large Language Models (LLMs) via Scalable Logical Reasoning. Given a user’s task specification, SLR enables scalable, automated synthesis of inductive reasoning tasks with precisely controlled difficulty. For each task, SLR synthesizes (i) a latent ground-truth rule, (ii) an executable validation program used by a symbolic judge to deterministically verify model outputs, and (iii) an instruction prompt for the reasoning task. Using SLR, we create SLR-Bench, a benchmark comprising over 19k prompts spanning 20 curriculum levels that progressively increase in relational, arithmetic, and recursive complexity. Large-scale evaluation reveals that contemporary LLMs readily produce syntactically valid rules, yet often fail at correct logical inference. Recent reasoning LLMs do somewhat better, but incur substantial increases in test-time compute, sometimes exceeding 15k completion tokens. Finally, logic-tuning via SLR doubles Llama-3-8B accuracy on SLR-Bench, achieving parity with Gemini-Flash-Thinking at a fraction of computational cost. SLR is fully automated, requires no human annotation, ensures dataset novelty, and offers a scalable environment for probing and advancing LLMs’ reasoning capabilities.
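The role of the symbolic judge in item (ii) can be illustrated with a toy sketch. The abstract does not say what form SLR's validation programs take, so the version below is an assumption: a candidate rule is a Python predicate, and the judge accepts it only if it covers every positive example and no negative one. The task and examples are invented.

```python
def judge(rule, positives, negatives):
    """Accept `rule` only if it covers every positive and no negative example."""
    return all(rule(x) for x in positives) and not any(rule(x) for x in negatives)


# Toy task (invented for illustration): latent rule "the train has a red car".
positives = [{"cars": ["red", "blue"]}, {"cars": ["red"]}]
negatives = [{"cars": ["blue"]}, {"cars": ["green", "blue"]}]


def candidate_ok(train):
    """A rule that matches the latent ground truth."""
    return "red" in train["cars"]


def candidate_bad(train):
    """A well-formed rule that is nonetheless logically wrong."""
    return len(train["cars"]) == 2


print(judge(candidate_ok, positives, negatives))   # True
print(judge(candidate_bad, positives, negatives))  # False
```

`candidate_bad` mirrors the failure mode reported above: a syntactically valid rule that fails the logical check. Because the verdict is a deterministic function of the rule and the examples, no human annotation or model-based grader is needed.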
Article 732
Title@2025-06-23 (1): LOGICPO: Efficient Translation of NL-based Logical Problems to FOL using LLMs and Preference Optimization
Title: LOGICPO: Efficient Translation of NL-based Logical Problems to FOL using LLMs and Preference Optimization | LOGICPO: Effiziente Übersetzung von NL-basierten Logischen Problemen in FOL mittels LLMs und Preference Optimization | LOGICPO:利用LLM和优先优化将基于NL的逻辑问题有效翻译给FOL 2506.18383v1 |
Authors (3): Koushik Viswanadha, Deepanway Ghosal, Somak Aditya
Logical reasoning is a key task for artificial intelligence due to its role in major downstream tasks such as Question Answering and Summarization. Recent methods for improving the reasoning ability of LLMs fall short in correctly converting a natural language reasoning problem to an equivalent logical formulation, which hinders the framework’s overall ability to reason. To address this, we propose finetuning on a preference optimization dataset to learn to parse and represent a natural language problem as a whole as a consistent logical program by 1) introducing a new supervised and preference optimization dataset, LogicPO, and 2) adopting popular techniques such as Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO) to finetune open-source LLMs. Our best model, based on Phi-3.5, consistently outperforms GPT-3.5-turbo (8-shot), producing 10% more logically correct outputs with 14% fewer syntax errors. Through the framework and our improved evaluation metrics, we offer a promising direction for improving the logical reasoning of LLMs by better representing them in their logical formulations.
Article 733
Title@2025-06-23 (1): PERSCEN: Learning Personalized Interaction Pattern and Scenario Preference for Multi-Scenario Matching
Title: PERSCEN: Learning Personalized Interaction Pattern and Scenario Preference for Multi-Scenario Matching | PERSCEN: Lerne personalisierte Interaktionsmuster und Szenarioeinstellungen für Multi-Szenario-Matching | PERSCEN: 学习个人化互动模式和多情景匹配情景 2506.18382v1 |
Authors (8): Haotong Du, Yaqing Wang, Fei Xiong, Lei Shao, Ming Liu, Hao Gu, Quanming Yao, Zhen Wang
With the expansion of business scales and scopes on online platforms, multi-scenario matching has become a mainstream solution to reduce maintenance costs and alleviate data sparsity. The key to effective multi-scenario recommendation lies in capturing both user preferences shared across all scenarios and scenario-aware preferences specific to each scenario. However, existing methods often overlook user-specific modeling, limiting the generation of personalized user representations. To address this, we propose PERSCEN, an innovative approach that incorporates user-specific modeling into multi-scenario matching. PERSCEN constructs a user-specific feature graph based on user characteristics and employs a lightweight graph neural network to capture higher-order interaction patterns, enabling personalized extraction of preferences shared across scenarios. Additionally, we leverage vector quantization techniques to distil scenario-aware preferences from users’ behavior sequence within individual scenarios, facilitating user-specific and scenario-aware preference modeling. To enhance efficient and flexible information transfer, we introduce a progressive scenario-aware gated linear unit that allows fine-grained, low-latency fusion. Extensive experiments demonstrate that PERSCEN outperforms existing methods. Further efficiency analysis confirms that PERSCEN effectively balances performance with computational cost, ensuring its practicality for real-world industrial systems.
Article 734
Title@2025-06-23 (1): Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space
Title: Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space | Ganzheitliche Physik Solver: Lernen von PDEs in einem einheitlichen Spektral-Physischen Raum | 综合物理解答器:在统一光谱物理空间学习PDE 2410.11382v2 |
Authors (3): Xihang Yue, Yi Yang, Linchao Zhu
Recent advances in operator learning have produced two distinct approaches for solving partial differential equations (PDEs): attention-based methods offering point-level adaptability but lacking spectral constraints, and spectral-based methods providing domain-level continuity priors but limited in local flexibility. This dichotomy has hindered the development of PDE solvers with both strong flexibility and generalization capability. This work introduces Holistic Physics Mixer (HPM), a simple framework that bridges this gap by integrating spectral and physical information in a unified space. HPM unifies both approaches as special cases while enabling more powerful spectral-physical interactions beyond either method alone. This enables HPM to inherit both the strong generalization of spectral methods and the flexibility of attention mechanisms while avoiding their respective limitations. Through extensive experiments across diverse PDE problems, we demonstrate that HPM consistently outperforms state-of-the-art methods in both accuracy and computational efficiency, while maintaining strong generalization capabilities with limited training data and excellent zero-shot performance on unseen resolutions.
Article 735
Title@2025-06-23 (1): Persistent Sampling: Enhancing the Efficiency of Sequential Monte Carlo
Title: Persistent Sampling: Enhancing the Efficiency of Sequential Monte Carlo | Persistente Probenahme: Verbesserung der Effizienz von Sequential Monte Carlo | 持久性抽样:提高按顺序排列的蒙特卡洛的效率 2407.20722v3 |
Authors (2): Minas Karamanis, Uroš Seljak
Sequential Monte Carlo (SMC) samplers are powerful tools for Bayesian inference but suffer from high computational costs due to their reliance on large particle ensembles for accurate estimates. We introduce persistent sampling (PS), an extension of SMC that systematically retains and reuses particles from all prior iterations to construct a growing, weighted ensemble. By leveraging multiple importance sampling and resampling from a mixture of historical distributions, PS mitigates the need for excessively large particle counts, directly addressing key limitations of SMC such as particle impoverishment and mode collapse. Crucially, PS achieves this without additional likelihood evaluations: weights for persistent particles are computed using cached likelihood values. This framework not only yields more accurate posterior approximations but also produces marginal likelihood estimates with significantly lower variance, enhancing reliability in model comparison. Furthermore, the persistent ensemble enables efficient adaptation of transition kernels by leveraging a larger, decorrelated particle pool. Experiments on high-dimensional Gaussian mixtures, hierarchical models, and non-convex targets demonstrate that PS consistently outperforms standard SMC and related variants, including recycled and waste-free SMC, achieving substantial reductions in mean squared error for posterior expectations and evidence estimates, all at reduced computational cost. PS thus establishes itself as a robust, scalable, and efficient alternative for complex Bayesian inference tasks.
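The reweighting idea can be sketched on a toy problem. The example below is a simplification under stated assumptions, not the paper's algorithm: the historical distributions are normalized one-dimensional Gaussians with known densities, each stage contributes the same number of exact samples, and every retained particle is weighted against the equal-weight mixture of all stages (the balance heuristic of multiple importance sampling). The real method works with unnormalized targets and cached likelihood values.

```python
import math
import random


def normal_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation `sigma`."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def persistent_weights(particles, sigmas):
    """Weight every retained particle for the latest distribution.

    The proposal is taken to be the equal-weight mixture of all distributions
    visited so far (balance-heuristic multiple importance sampling).
    """
    target_sigma = sigmas[-1]
    weights = []
    for x in particles:
        mixture = sum(normal_pdf(x, s) for s in sigmas) / len(sigmas)
        weights.append(normal_pdf(x, target_sigma) / mixture)
    return weights


random.seed(0)
sigmas = [3.0, 2.0, 1.0]          # toy annealing path ending at N(0, 1)
per_stage = 2000                  # same particle count at every stage
particles = [random.gauss(0.0, s) for s in sigmas for _ in range(per_stage)]

weights = persistent_weights(particles, sigmas)
total = sum(weights)
second_moment = sum(w * x * x for w, x in zip(weights, particles)) / total
print(round(second_moment, 2))    # close to 1.0, the variance of the final target
```

Because the mixture density is never smaller than the target density divided by the number of stages, no weight can exceed the number of stages, which is one intuition for why reusing old particles keeps the weight variance under control.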
Article 736
Title@2025-06-23 (1): Recent Trends in Artificial Intelligence Technology: A Scoping Review
Title: Recent Trends in Artificial Intelligence Technology: A Scoping Review | Jüngste Trends in der Künstlichen Intelligenz-Technologie: Eine Bewertung | 《人造情报技术的近期趋势:范围审查》 2305.04532v3 |
Authors (3): Teemu Niskanen, Tuomo Sipola, Olli Väänänen
Artificial intelligence is increasingly ubiquitous across multiple domains. Smartphones, social media platforms, search engines, and autonomous vehicles are just a few examples of applications that utilize artificial intelligence technologies to enhance their performance. This study carries out a scoping review of the current state-of-the-art artificial intelligence technologies following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. The goal was to find the most advanced technologies used in different domains of artificial intelligence technology research. Articles published in 2022 in three recognized journals from the artificial intelligence and machine learning domain were examined: Journal of Artificial Intelligence Research, Journal of Machine Learning Research, and Machine Learning. Certain criteria were set for the technological solutions: the technology must be tested against comparable solutions, commonly approved or otherwise well-justified datasets must be used when applying the technology, and results must show improvements over comparable solutions. One of the most important parts of technology development appeared to be how to process and exploit data gathered from multiple sources. The data can be highly unstructured, and the technological solution should be able to utilize the data with minimal manual work from humans. The results of this review indicate that creating labeled datasets is very laborious, and solutions exploiting unsupervised or semi-supervised learning technologies are increasingly researched. The learning algorithms should be efficiently updatable, and predictions should be interpretable. When artificial intelligence technologies are used in real-world applications, safety and explainable predictions must be considered before mass adoption can occur.
Article 737
Title@2025-06-23 (1): Factual Knowledge in Language Models: Robustness and Anomalies under Simple Temporal Context Variations
Title: Factual Knowledge in Language Models: Robustness and Anomalies under Simple Temporal Context Variations | Factual Knowledge in Language Models: Robustheit und Anomalien unter einfachen zeitlichen Kontextvariationen | 语言模型中的事实知识:简单时间环境变化下的强力和异常现象 2502.01220v6 |
Authors (5): Hichem Ammar Khodja, Frédéric Béchet, Quentin Brabant, Alexis Nasr, Gwénolé Lecorvé
This paper explores the robustness of language models (LMs) to variations in the temporal context within factual knowledge. It examines whether LMs can correctly associate a temporal context with a past fact valid over a defined period, by asking them to differentiate correct from incorrect contexts. The LMs’ ability to distinguish is analyzed along two dimensions: the distance of the incorrect context from the validity period and the granularity of the context. To this end, a dataset called TimeStress is introduced, enabling the evaluation of 18 diverse LMs. Results reveal that the best LM achieves a perfect distinction for only 11% of the studied facts, with errors that, although rare, are critical and that humans would not make. This work highlights the limitations of current LMs in temporal representation.
Article 738
Title@2025-06-23 (1): DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
Title: DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy | DipLLM: Feinsteuerungs-LLM für strategische Entscheidungsfindung in der Diplomatie | DipLLM: 外交战略决策的精细推荐LLM 2506.09655v2 |
Authors (6): Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao
Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
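As a hedged illustration of the autoregressive factorization mentioned above (the notation is assumed here and is not taken from the paper): write $s$ for the game state and $a = (a_1, \ldots, a_n)$ for a joint action that assigns one order to each of a player's $n$ units. The chain rule of probability then lets a joint policy be written as a product of unit-level decisions:

```latex
\pi(a \mid s) \;=\; \prod_{i=1}^{n} \pi\bigl(a_i \mid s,\, a_1, \ldots, a_{i-1}\bigr)
```

Under this reading, if each unit has at most $K$ candidate orders, the joint action space can contain up to $K^n$ combinations, while each factor on the right-hand side only ranks at most $K$ options.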
nan
Article 739
Title@2025-06-23 (1): Global Context-aware Representation Learning for Spatially Resolved Transcriptomics
Title: Global Context-aware Representation Learning for Spatially Resolved Transcriptomics | Global Context-aware Representative Learning for Spatially Resolved Transcriptomics | 空间解决中转技术学全球背景意识代表制学习 2506.15698v2 |
Authors (6): Yunhak Oh, Junseok Lee, Yeongmin Kim, Sangwoo Seo, Namkyeong Lee, Chanyoung Park
Spatially Resolved Transcriptomics (SRT) is a cutting-edge technique that captures the spatial context of cells within tissues, enabling the study of complex biological networks. Recent graph-based methods leverage both gene expression and spatial information to identify relevant spatial domains. However, these approaches fall short in obtaining meaningful spot representations, especially for spots near spatial domain boundaries, as they heavily emphasize adjacent spots that have minimal feature differences from an anchor node. To address this, we propose Spotscape, a novel framework that introduces the Similarity Telescope module to capture global relationships between multiple spots. Additionally, we propose a similarity scaling strategy to regulate the distances between intra- and inter-slice spots, facilitating effective multi-slice integration. Extensive experiments demonstrate the superiority of Spotscape in various downstream tasks, including single-slice and multi-slice scenarios. Our code is available at the following link: https://github.com/yunhak0/Spotscape.
nan
Article 740
Title@2025-06-23 (1): RePST: Language Model Empowered Spatio-Temporal Forecasting via Semantic-Oriented Reprogramming
Title: RePST: Language Model Empowered Spatio-Temporal Forecasting via Semantic-Oriented Reprogramming | RePST: Sprachmodell empowered Spatio-Temporal Forecasting via Semantisch-orientierte Reprogrammierung | REPST:通过以语义为主的重新编制方案来进行语言模型增强能力SPA-时间预报 2408.14505v3 |
Authors (5): Hao Wang, Jindong Han, Wei Fan, Leilei Sun, Hao Liu
Spatio-temporal forecasting is pivotal in numerous real-world applications, including transportation planning, energy management, and climate monitoring. In this work, we aim to harness the reasoning and generalization abilities of Pre-trained Language Models (PLMs) for more effective spatio-temporal forecasting, particularly in data-scarce scenarios. However, recent studies uncover that PLMs, which are primarily trained on textual data, often falter when tasked with modeling the intricate correlations in numerical time series, thereby limiting their effectiveness in comprehending spatio-temporal data. To bridge the gap, we propose RePST, a semantic-oriented PLM reprogramming framework tailored for spatio-temporal forecasting. Specifically, we first propose a semantic-oriented decomposer that adaptively disentangles spatially correlated time series into interpretable sub-components, which facilitates PLM to understand sophisticated spatio-temporal dynamics via a divide-and-conquer strategy. Moreover, we propose a selective discrete reprogramming scheme, which introduces an expanded spatio-temporal vocabulary space to project spatio-temporal series into discrete representations. This scheme minimizes the information loss during reprogramming and enriches the representations derived by PLMs. Extensive experiments on real-world datasets show that the proposed RePST outperforms twelve state-of-the-art baseline methods, particularly in data-scarce scenarios, highlighting the effectiveness and superior generalization capabilities of PLMs for spatio-temporal forecasting. Our codes can be found at https://github.com/usail-hkust/REPST.
nan
Article 741
Title@2025-06-23 (1): SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Title: SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation | SlimMoE: Strukturierte Kompression großer MoE-Modelle über Expert Slimming und Destillation | SlimMoE:通过专家攀爬和蒸馏对大型MOE模型进行结构性压缩 2506.18349v1 |
Authors (7): Zichong Li, Chen Liang, Zixuan Zhang, Ilgee Hong, Young Jin Kim, Weizhu Chen, Tuo Zhao
The Mixture of Experts (MoE) architecture has emerged as a powerful paradigm for scaling large language models (LLMs) while maintaining inference efficiency. However, the enormous memory requirements of MoE models make them prohibitively expensive to fine-tune or deploy in resource-constrained environments. To address this challenge, we introduce SlimMoE, a multi-stage compression framework for transforming large MoE models into much smaller, efficient variants without incurring the prohibitive costs of training from scratch. Our method systematically reduces parameter counts by slimming experts and transferring knowledge through intermediate stages, effectively mitigating the performance degradation common in one-shot pruning approaches. Using this framework, we compress Phi 3.5-MoE (41.9B total/6.6B activated parameters) to create Phi-mini-MoE (7.6B total/2.4B activated parameters) and Phi-tiny-MoE (3.8B total/1.1B activated parameters) using only 400B tokens, less than 10% of the original model’s training data. These compressed models can be fine-tuned on a single GPU (A100 for Phi-mini-MoE, A6000 for Phi-tiny-MoE), making them highly suitable for academic and resource-limited settings. Our experiments demonstrate that these compressed models outperform others of similar size and remain competitive with larger models. For instance, Phi-mini-MoE achieves performance similar to or better than Phi-3-mini using only 2/3 of the activated parameters and yields comparable MMLU scores to Llama 3.1 8B despite having significantly lower latency. Our findings demonstrate that structured pruning combined with staged distillation offers an effective path to creating high-quality, compact MoE models, paving the way for broader adoption of MoE architectures. We make our models publicly available at https://huggingface.co/microsoft/Phi-mini-MoE-instruct and https://huggingface.co/microsoft/Phi-tiny-MoE-instruct.
nan
Article 742
Title@2025-06-23 (1): Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Title: Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration | Bohdi: Heterogene LLM Fusion mit automatischer Datenexploration | Bohdi: 与自动数据探索相混合的异基因LLM 2506.15721v2 |
Authors (8): Junqi Gao, Zhichang Guo, Dazhi Zhang, Dong Li, Runze Liu, Pengfei Li, Kai Tian, Biqing Qi
Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from limited domain for knowledge fusion, preventing the target LLM from fully acquiring knowledge across diverse domains, and 2) fixed data allocation proportions across domains, failing to dynamically adjust according to the target LLM’s varying capabilities across domains, leading to a capability imbalance. To overcome these limitations, we propose Bohdi, a synthetic-data-only heterogeneous LLM fusion framework. Through the organization of knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation through multi-model collaboration, thereby comprehensively extracting knowledge from source LLMs. By formalizing domain expansion and data sampling proportion allocation on the knowledge tree as a Hierarchical Multi-Armed Bandit problem, Bohdi leverages the designed DynaBranches mechanism to adaptively adjust sampling proportions based on the target LLM’s performance feedback across domains. Integrated with our proposed Introspection-Rebirth (IR) mechanism, DynaBranches dynamically tracks capability shifts during target LLM’s updates via Sliding Window Binomial Likelihood Ratio Testing (SWBLRT), further enhancing its online adaptation capability. Comparative experimental results on a comprehensive suite of benchmarks demonstrate that Bohdi significantly outperforms existing baselines on multiple target LLMs, exhibits higher data efficiency, and virtually eliminates the imbalance in the target LLM’s capabilities. Our code is available at https://github.com/gjq100/Bohdi.git.
nan
Article 743
Title@2025-06-23 (1): LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots
Title: LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots | LoopSR: Looping Sim-and-Real für lebenslange politische Anpassung von Legged Robots | 环圈:为终身政策调整而环绕定形机器人终身政策 2409.17992v2 |
Authors (5): Peilin Wu, Weiji Xie, Jiahang Cao, Hang Lai, Weinan Zhang
Reinforcement Learning (RL) has shown remarkable and generalizable capability in legged locomotion through sim-to-real transfer. However, while adaptive methods like domain randomization are expected to enhance policy robustness across diverse environments, they potentially compromise the policy’s performance in any specific environment, leading to suboptimal real-world deployment due to the No Free Lunch theorem. To address this, we propose LoopSR, a lifelong policy adaptation framework that continuously refines RL policies in the post-deployment stage. LoopSR employs a transformer-based encoder to map real-world trajectories into a latent space and reconstruct a digital twin of the real world for further improvement. Autoencoder architecture and contrastive learning methods are adopted to enhance feature extraction of real-world dynamics. Simulation parameters for continual training are derived by combining predicted values from the decoder with retrieved parameters from a pre-collected simulation trajectory dataset. By leveraging simulated continual training, LoopSR achieves superior data efficiency compared with strong baselines, yielding strong performance with limited data in both sim-to-sim and sim-to-real experiments.
nan
Article 744
Title@2025-06-23 (1): Dynamic Hybrid Modeling: Incremental Identification and Model Predictive Control
Title: Dynamic Hybrid Modeling: Incremental Identification and Model Predictive Control | Dynamische Hybridmodellierung: Inkrementelle Identifikation und Modellvorhersagesteuerung | 动态混合模型:递增识别和模型预测控制 2506.18344v1 |
Authors (8): Adrian Caspari, Thomas Bierweiler, Sarah Fadda, Daniel Labisch, Maarten Nauta, Franzisko Wagner, Merle Warmbold, Constantinos C. Pantelides
Mathematical models are crucial for optimizing and controlling chemical processes, yet they often face significant limitations in terms of computational time, algorithm complexity, and development costs. Hybrid models, which combine mechanistic models with data-driven models (i.e. models derived via the application of machine learning to experimental data), have emerged as a promising solution to these challenges. However, the identification of dynamic hybrid models remains difficult due to the need to integrate data-driven models within mechanistic model structures. We present an incremental identification approach for dynamic hybrid models that decouples the mechanistic and data-driven components to overcome computational and conceptual difficulties. Our methodology comprises four key steps: (1) regularized dynamic parameter estimation to determine optimal time profiles for flux variables, (2) correlation analysis to evaluate relationships between variables, (3) data-driven model identification using advanced machine learning techniques, and (4) hybrid model integration to combine the mechanistic and data-driven components. This approach facilitates early evaluation of model structure suitability, accelerates the development of hybrid models, and allows for independent identification of data-driven components. Three case studies are presented to illustrate the robustness, reliability, and efficiency of our incremental approach in handling complex systems and scenarios with limited data.
nan
Article 745
Title@2025-06-23 (1): Controlled Generation with Equivariant Variational Flow Matching
Title: Controlled Generation with Equivariant Variational Flow Matching | Kontrollierte Generation mit äquivarianter Variations-Flow-Matching | 具有等同变化流动比对的受控生产 2506.18340v1 |
Authors (7): Floor Eijkelboom, Heiko Zimmermann, Sharvaree Vadgama, Erik J Bekkers, Max Welling, Christian A. Naesseth, Jan-Willem van de Meent
We derive a controlled generation objective within the framework of Variational Flow Matching (VFM), which casts flow matching as a variational inference problem. We demonstrate that controlled generation can be implemented two ways: (1) by way of end-to-end training of conditional generative models, or (2) as a Bayesian inference problem, enabling post hoc control of unconditional models without retraining. Furthermore, we establish the conditions required for equivariant generation and provide an equivariant formulation of VFM tailored for molecular generation, ensuring invariance to rotations, translations, and permutations. We evaluate our approach on both uncontrolled and controlled molecular generation, achieving state-of-the-art performance on uncontrolled generation and outperforming state-of-the-art models in controlled generation, both with end-to-end training and in the Bayesian inference setting. This work strengthens the connection between flow-based generative modeling and Bayesian inference, offering a scalable and principled framework for constraint-driven and symmetry-aware generation.
nan
Article 746
Title@2025-06-23 (1): Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics
Title: Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics | Strukturierte Kolmogorov-Arnold-Neurale ODEs für interpretierbares Lernen und symbolische Entdeckung nichtlinearer Dynamik | Kolmogorov-Arold Neal 用于口译学习和非线性动态的符号发现 2506.18339v1 |
Authors (4): Wei Liu, Kiran Bacsa, Loon Ching Tang, Eleni Chatzi
Understanding and modeling nonlinear dynamical systems is a fundamental problem across scientific and engineering domains. While deep learning has demonstrated remarkable potential for learning complex system behavior, achieving models that are both highly accurate and physically interpretable remains a major challenge. To address this, we propose Structured Kolmogorov-Arnold Neural ODEs (SKANODEs), a novel framework that integrates structured state-space modeling with the Kolmogorov-Arnold Network (KAN). SKANODE first employs a fully trainable KAN as a universal function approximator within a structured Neural ODE framework to perform virtual sensing, recovering latent states that correspond to physically interpretable quantities such as positions and velocities. Once this structured latent representation is established, we exploit the symbolic regression capability of KAN to extract compact and interpretable expressions for the system’s governing dynamics. The resulting symbolic expression is then substituted back into the Neural ODE framework and further calibrated through continued training to refine its coefficients, enhancing both the precision of the discovered equations and the predictive accuracy of system responses. Extensive experiments on both simulated and real-world systems demonstrate that SKANODE achieves superior performance while offering interpretable, physics-consistent models that uncover the underlying mechanisms of nonlinear dynamical systems.
nan
Article 747
Title@2025-06-23 (1): Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?
Title: Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations? | Escaping the SpuriVerse: Können große Vision-Language-Modelle jenseits von gesehenen puriösen Korruptionen verallgemeinern? | 摆脱SpuriVerse:大型视觉语言模型能否超越表面净化的Correrations而普遍化? 2506.18322v1 |
Authors (8): Yiwei Yang, Chung Peng Lee, Shangbin Feng, Dora Zhao, Bingbing Wen, Anthony Z. Liu, Yulia Tsvetkov, Bill Howe
Finetuning can cause spurious correlations to arise between non-essential features and the target labels, but benchmarks to study these effects involve contrived settings and narrow tasks. In contrast, we consider spurious correlations in multi-modal Large Vision Language Models (LVLMs) pretrained on extensive and diverse datasets without explicit task supervision. We develop a benchmark by sourcing GPT-4o errors on real-world visual-question-answering (VQA) benchmarks, then curating a subset through LVLM-human annotation and synthetic counterfactual evaluation to identify errors caused by spurious correlations. This process yields SpuriVerse, a novel benchmark comprised of 124 distinct types of spurious correlations extracted from real-world datasets, each containing 1 realistic and 10 synthetic VQA samples for a total of 1364 multiple choice questions. We evaluate 15 open and closed-source LVLMs on SpuriVerse, finding that even state-of-the-art closed-source models struggle significantly, achieving at best only 37.1% accuracy. Fine-tuning on synthetic examples that emphasize the spurious correlation improves performance to 78.40%, suggesting that training on diverse spurious patterns generalizes to unseen situations: models appear to learn to avoid “shortcuts” and attend to the overall image context.
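The benchmark size quoted above can be checked directly from the per-type counts given in the abstract. A minimal sketch (variable names are illustrative only):

```python
# Counts as stated in the abstract.
num_correlation_types = 124   # distinct types of spurious correlations
realistic_per_type = 1        # realistic VQA samples per type
synthetic_per_type = 10       # synthetic VQA samples per type

# Each type contributes its realistic and synthetic samples.
samples_per_type = realistic_per_type + synthetic_per_type
total_questions = num_correlation_types * samples_per_type

print(total_questions)  # 1364
```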
nan
Article 748
Title@2025-06-23 (1): A Transformer-Based Approach for Diagnosing Fault Cases in Optical Fiber Amplifiers
Title: A Transformer-Based Approach for Diagnosing Fault Cases in Optical Fiber Amplifiers | Ein transformerbasierter Ansatz zur Diagnose von Fehlerfällen in optischen Faserverstärkern | 光纤放大器中分析过失案例的以变换器为基础的方法 2505.06245v2 |
Authors (3): Dominic Schneider, Lutz Rapp, Christoph Ament
A transformer-based deep learning approach is presented that enables the diagnosis of fault cases in optical fiber amplifiers using condition-based monitoring time series data. The model, Inverse Triple-Aspect Self-Attention Transformer (ITST), uses an encoder-decoder architecture, utilizing three feature extraction paths in the encoder, feature-engineered data for the decoder and a self-attention mechanism. The results show that ITST outperforms state-of-the-art models in terms of classification accuracy, which enables predictive maintenance for optical fiber amplifiers, reducing network downtimes and maintenance costs.
nan
Article 749
Title@2025-06-23 (1): BrainSymphony: A Transformer-Driven Fusion of fMRI Time Series and Structural Connectivity
Title: BrainSymphony: A Transformer-Driven Fusion of fMRI Time Series and Structural Connectivity | BrainSymphony: Eine transformergetriebene Fusion von fMRI-Zeitreihen und struktureller Konnektivität | 脑交响乐:FMRI时间序列和结构连接的变异器-驱动融合 2506.18314v1 |
Authors (3): Moein Khajehnejad, Forough Habibollahi, Adeel Razi
Existing foundation models for neuroimaging are often prohibitively large and data-intensive. We introduce BrainSymphony, a lightweight, parameter-efficient foundation model that achieves state-of-the-art performance while being pre-trained on significantly smaller public datasets. BrainSymphony’s strong multimodal architecture processes functional MRI data through parallel spatial and temporal transformer streams, which are then efficiently distilled into a unified representation by a Perceiver module. Concurrently, it models structural connectivity from diffusion MRI using a novel signed graph transformer to encode the brain’s anatomical structure. These powerful, modality-specific representations are then integrated via an adaptive fusion gate. Despite its compact design, our model consistently outperforms larger models on a diverse range of downstream benchmarks, including classification, prediction, and unsupervised network identification tasks. Furthermore, our model revealed novel insights into brain dynamics using attention maps on a unique external psilocybin neuroimaging dataset (pre- and post-administration). BrainSymphony establishes that architecturally-aware, multimodal models can surpass their larger counterparts, paving the way for more accessible and powerful research in computational neuroscience.
nan
Article 750
Title@2025-06-23 (1): Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies
Title: Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies | Schärfung der Speere: Adaptive, von Experten geleitete Gegenangriffe auf die DRL-basierte autonome Fahrpolitik | 尖尖尖尖尖:适应性专家指导对基于DRL的自主驾驶政策进行反反向攻击 2506.18304v1 |
Authors (3): Junchao Fan, Xuyang Lei, Xiaolin Chang
Deep reinforcement learning (DRL) has emerged as a promising paradigm for autonomous driving. However, despite their advanced capabilities, DRL-based policies remain highly vulnerable to adversarial attacks, posing serious safety risks in real-world deployments. Investigating such attacks is crucial for revealing policy vulnerabilities and guiding the development of more robust autonomous systems. While prior attack methods have made notable progress, they still face several challenges: 1) they often rely on high-frequency attacks, yet critical attack opportunities are typically context-dependent and temporally sparse, resulting in inefficient attack patterns; 2) restricting attack frequency can improve efficiency but often results in unstable training due to the adversary’s limited exploration. To address these challenges, we propose an adaptive expert-guided adversarial attack method that enhances both the stability and efficiency of attack policy training. Our method first derives an expert policy from successful attack demonstrations using imitation learning, strengthened by an ensemble Mixture-of-Experts architecture for robust generalization across scenarios. This expert policy then guides a DRL-based adversary through a KL-divergence regularization term. Due to the diversity of scenarios, expert policies may be imperfect. To address this, we further introduce a performance-aware annealing strategy that gradually reduces reliance on the expert as the adversary improves. Extensive experiments demonstrate that our method outperforms existing approaches in terms of collision rate, attack efficiency, and training stability, especially in cases where the expert policy is sub-optimal.
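To make the idea of a KL-divergence regularization term with performance-aware annealing more concrete, here is a minimal sketch for discrete action distributions. It is not the authors' implementation: the direction of the KL term, the linear schedule in `annealed_weight`, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability vectors."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def annealed_weight(beta0, adversary_score, expert_score):
    """Hypothetical schedule: the regularization weight shrinks linearly
    as the adversary's score approaches (or exceeds) the expert's score."""
    if expert_score <= 0:
        return 0.0
    ratio = min(max(adversary_score / expert_score, 0.0), 1.0)
    return beta0 * (1.0 - ratio)

def regularized_loss(rl_loss, expert_probs, adversary_probs, beta):
    """RL loss plus a KL penalty that pulls the adversary toward the expert.
    The direction KL(expert || adversary) is an assumption."""
    return rl_loss + beta * kl_divergence(expert_probs, adversary_probs)

expert = [0.7, 0.2, 0.1]
adversary = [0.4, 0.4, 0.2]
beta = annealed_weight(beta0=1.0, adversary_score=0.5, expert_score=1.0)
loss = regularized_loss(rl_loss=1.0, expert_probs=expert,
                        adversary_probs=adversary, beta=beta)
```

With this kind of schedule the expert dominates early training, and its influence fades once the adversary performs comparably, which is one way an imperfect expert could be prevented from capping the adversary's performance.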
nan
Article 751
Title@2025-06-23 (1): Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution
Title: Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution | Kollaborative Mean Abschätzung unter Heterogenen strategischen Agenten: Individuelle Rationalität, Fairness und Wahrheitsbeitrag | 不同不同战略媒介之间合作平均估计:个人合理性、公平性和真实贡献 2407.15881v2 |
Authors (4): Alex Clinton, Yiding Chen, Xiaojin Zhu, Kirthevasan Kandasamy
We study a collaborative learning problem where $m$ agents aim to estimate a vector $\mu =(\mu_1,\ldots,\mu_d)\in \mathbb{R}^d$ by sampling from associated univariate normal distributions $\{\mathcal{N}(\mu_k, \sigma^2)\}_{k\in[d]}$. Agent $i$ incurs a cost $c_{i,k}$ to sample from $\mathcal{N}(\mu_k, \sigma^2)$. Instead of working independently, agents can exchange data, collecting cheaper samples and sharing them in return for costly data, thereby reducing both costs and estimation error. We design a mechanism to facilitate such collaboration, while addressing two key challenges: ensuring individually rational (IR) and fair outcomes so all agents benefit, and preventing strategic behavior (e.g. non-collection, data fabrication) to avoid socially undesirable outcomes. We design a mechanism and an associated Nash equilibrium (NE) which minimizes the social penalty (the sum of agents’ estimation errors and collection costs) while being IR for all agents. We achieve a $\mathcal{O}(\sqrt{m})$-approximation to the minimum social penalty in the worst case and an $\mathcal{O}(1)$-approximation under favorable conditions. Additionally, we establish three hardness results: no nontrivial mechanism (i) guarantees a dominant strategy equilibrium where agents report truthfully, (ii) is IR for every strategy profile of other agents, or (iii) avoids a worst-case $\Omega(\sqrt{m})$ price of stability in any NE. Finally, by integrating concepts from axiomatic bargaining, we demonstrate that our mechanism supports fairer outcomes than one which minimizes social penalty.
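The trade-off between estimation error and collection cost can be illustrated for a single agent working alone. The sketch below assumes the agent uses the sample mean, whose mean squared error with $n$ i.i.d. draws from $\mathcal{N}(\mu_k, \sigma^2)$ is $\sigma^2/n$, and that its penalty is the sum of these errors and its sampling costs; the paper's exact penalty definition may differ, and the function names are hypothetical.

```python
import math

def lone_agent_penalty(n_samples, costs, sigma):
    """Estimation error plus collection cost for one agent working alone.

    n_samples[k] is the number of draws from coordinate k and costs[k] the
    per-sample cost. The sample mean of n draws has MSE sigma**2 / n.
    """
    error = sum(sigma ** 2 / n for n in n_samples)
    cost = sum(c * n for c, n in zip(costs, n_samples))
    return error + cost

def best_lone_sample_size(cost, sigma):
    """Real-valued minimizer of sigma**2 / n + cost * n (integrality ignored)."""
    return sigma / math.sqrt(cost)

n_star = best_lone_sample_size(cost=0.01, sigma=1.0)
penalty = lone_agent_penalty([n_star], [0.01], sigma=1.0)
```

An agent facing a high cost $c_{i,k}$ collects few samples on its own and is left with a large error, which is the gap that exchanging cheap samples for costly ones is meant to close.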
nan
Article 752
Title@2025-06-23 (1): Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment
Title: Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment | Interpretation von Deep-Learning-Modell in der Embryo-Auswahl für die In-Vitro-Düngung (IVF) Behandlung | 体外受肥(IVF)治疗Embryo选择 Empryo的深学习模型解释 2506.06680v2 |
Authors (7): Radha Kodali, Venkata Rao Dhulipalla, Venkata Siva Kishor Tatavarty, Madhavi Nadakuditi, Bharadwaj Thiruveedhula, Suryanarayana Gunnam, Durga Prasad Bavirisetti
Infertility has a considerable impact on individuals’ quality of life, affecting them socially and psychologically, with projections indicating a rise in the upcoming years. In vitro fertilization (IVF) emerges as one of the primary techniques within economically developed nations, employed to address the rising problem of low fertility. Expert embryologists conventionally grade embryos by reviewing blastocyst images to select the most optimal for transfer, yet this process is time-consuming and lacks efficiency. Blastocyst images provide a valuable resource for assessing embryo viability. In this study, we introduce an explainable artificial intelligence (XAI) framework for classifying embryos, employing a fusion of convolutional neural network (CNN) and long short-term memory (LSTM) architecture, referred to as CNN-LSTM. Utilizing deep learning, our model achieves high accuracy in embryo classification while maintaining interpretability through XAI.
nan
Article 753
Title@2025-06-23 (1): AFBS: Buffer Gradient Selection in Semi-asynchronous Federated Learning
Title: AFBS: Buffer Gradient Selection in Semi-asynchronous Federated Learning | AFBS: Puffer-Gradienten-Auswahl im semi-asynchronen Föderierten Lernen | AFBS: 半同步联邦学习中的缓分级选择 2506.12754v2 |
Authors (6): Chaoyi Lu, Yiding Sun, Jinqian Chen, Zhichuan Yang, Jiangming Pan, Jihua Zhu
Asynchronous federated learning (AFL) accelerates training by eliminating the need to wait for stragglers, but its asynchronous nature introduces gradient staleness, where outdated gradients degrade performance. Existing solutions address this issue with gradient buffers, forming a semi-asynchronous framework. However, this approach struggles when buffers accumulate numerous stale gradients, as blindly aggregating all gradients can harm training. To address this, we propose AFBS (Asynchronous FL Buffer Selection), the first algorithm to perform gradient selection within buffers while ensuring privacy protection. Specifically, each client sends its label distribution matrix, encrypted by random projection, before training, and the server clusters clients based on it. During training, the server scores and selects gradients within each cluster based on their informational value, discarding low-value gradients to enhance semi-asynchronous federated learning. Extensive experiments in highly heterogeneous system and data environments demonstrate AFBS’s superior performance compared to state-of-the-art methods. Notably, on the most challenging task, CIFAR-100, AFBS improves accuracy by up to 4.8% over the previous best algorithm and reduces the time to reach target accuracy by 75%.
nan
Article 754
Title@2025-06-23 (1): GeNeRT: A Physics-Informed Approach to Intelligent Wireless Channel Modeling via Generalizable Neural Ray Tracing
Title: GeNeRT: A Physics-Informed Approach to Intelligent Wireless Channel Modeling via Generalizable Neural Ray Tracing | GeNeRT: Ein physik-informierter Ansatz zur intelligenten drahtlosen Kanalmodellierung via Generalizable Neural Ray Tracing | GENERT:通过可通用神经射线追踪对智能无线频道建模的物理综合方法 2506.18295v1 |
Authors (4): Kejia Bian, Meixia Tao, Shu Sun, Jun Yu
Neural ray tracing (RT) has emerged as a promising paradigm for channel modeling by combining physical propagation principles with neural networks. It enables high modeling accuracy and efficiency. However, current neural RT methods face two key limitations: constrained generalization capability due to strong spatial dependence, and weak adherence to electromagnetic laws. In this paper, we propose GeNeRT, a Generalizable Neural RT framework with enhanced generalization, accuracy and efficiency. GeNeRT supports both intra-scenario spatial transferability and inter-scenario zero-shot generalization. By incorporating Fresnel-inspired neural network design, it also achieves higher accuracy in multipath component (MPC) prediction. Furthermore, a GPU-tensorized acceleration strategy is introduced to improve runtime efficiency. Extensive experiments conducted in outdoor scenarios demonstrate that GeNeRT generalizes well across untrained regions within a scenario and entirely unseen environments, and achieves superior accuracy in MPC prediction compared to baselines. Moreover, it outperforms Wireless Insite in runtime efficiency, particularly in multi-transmitter settings. Ablation experiments validate the effectiveness of the network architecture and training strategy in capturing physical principles of ray-surface interactions.
nan
Article 755
Title@2025-06-23 (1): Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction
Title: Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction | Instabilität in Diffusions-ODEs: Eine Erklärung für die ungenaue Bildrekonstruktion | DFDODEs的不稳定性:不准确图像重建的解释 2506.18290v1 |
Authors (9): Han Zhang, Jinghong Mao, Shangwen Zhu, Zhantao Yang, Lianghua Huang, Yu Liu, Deli Zhao, Ruili Feng, Fan Cheng
Diffusion reconstruction plays a critical role in various applications such as image editing, restoration, and style transfer. In theory, the reconstruction should be simple - it just inverts and regenerates images by numerically solving the Probability Flow-Ordinary Differential Equation (PF-ODE). Yet in practice, noticeable reconstruction errors have been observed, which cannot be well explained by numerical errors. In this work, we identify a deeper intrinsic property in the PF-ODE generation process, the instability, that can further amplify the reconstruction errors. The root of this instability lies in the sparsity inherent in the generation distribution, which means that the probability is concentrated on scattered and small regions while the vast majority remains almost empty. To demonstrate the existence of instability and its amplification on reconstruction error, we conduct experiments on both toy numerical examples and popular open-sourced diffusion models. Furthermore, based on the characteristics of image data, we theoretically prove that the instability’s probability converges to one as the data dimensionality increases. Our findings highlight the inherent challenges in diffusion-based reconstruction and can offer insights for future improvements.
Article 756
Title@2025-06-23 (1): LoRA vs Full Fine-tuning: An Illusion of Equivalence
Title: LoRA vs Full Fine-tuning: An Illusion of Equivalence | LoRA vs. Full Fine-Tuning: Eine Illusion der Gleichwertigkeit | LoRA 与 完全微调: 等同的幻象 2410.21228v2 |
Authors (4): Reece Shuttleworth, Jacob Andreas, Antonio Torralba, Pratyusha Sharma
Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to effectively fine-tune LLMs with an extreme reduction in trainable parameters. But \emph{are their learned solutions really equivalent?} We study how LoRA and full fine-tuning change pre-trained models by analyzing the model’s weight matrices through the lens of their spectral properties. We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure: weight matrices trained with LoRA have new, high-ranking singular vectors, which we call \emph{intruder dimensions}, while those trained with full fine-tuning do not. Further, we extend the finding that LoRA forgets less than full fine-tuning and find that its forgetting is largely localized to the intruder dimensions: by causally intervening on the intruder dimensions, changing their associated singular values post-fine-tuning, we show that they cause forgetting. Moreover, scaling them down significantly improves modeling of the pre-training distribution with a minimal drop in downstream task performance. Given this, we should expect accumulating intruder dimensions to be harmful and lead to more forgetting. This will be amplified during continual learning because of sequential fine-tuning, and we show that LoRA models do accumulate intruder dimensions here and tend to perform worse in this setting, emphasizing the practicality of our findings.
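As a rough illustration of the idea of an intruder dimension, the sketch below flags singular vectors of a fine-tuned matrix that have low cosine similarity to every singular vector of the pre-trained matrix. It assumes the singular vectors have already been computed elsewhere (for example by an SVD routine); the function names and the threshold `epsilon` are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def intruder_indices(pretrained_vecs, finetuned_vecs, epsilon=0.5):
    """Indices of fine-tuned singular vectors dissimilar to all pre-trained ones.

    epsilon is an assumed similarity threshold, chosen here for illustration.
    The absolute value is used because singular vectors are defined up to sign.
    """
    intruders = []
    for j, y in enumerate(finetuned_vecs):
        best = max(abs(cosine(x, y)) for x in pretrained_vecs)
        if best < epsilon:
            intruders.append(j)
    return intruders

# Toy example: the pre-trained vectors span the first two axes of R^3.
pretrained = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
finetuned = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(intruder_indices(pretrained, finetuned))  # the second vector is flagged
```

In this toy case the first fine-tuned vector is nearly aligned with a pre-trained direction, while the second is orthogonal to both and is therefore reported as an intruder.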
Article 757
Title@2025-06-23 (1): Learning High-Quality Latent Representations for Anomaly Detection and Signal Integrity Enhancement in High-Speed Signals
Title: Learning High-Quality Latent Representations for Anomaly Detection and Signal Integrity Enhancement in High-Speed Signals | Lernen von hochwertigen Latentendarstellungen für Anomalieerkennung und Signalintegritätsverbesserung in Hochgeschwindigkeitssignalen | 高频信号中反常探测和信号完整性增强的学习高品质低端显示器 2506.18288v1 |
Authors (6): Muhammad Usama, Hee-Deok Jang, Soham Shanbhag, Yoo-Chang Sung, Seung-Jun Bae, Dong Eui Chang
This paper addresses the dual challenge of improving anomaly detection and signal integrity in high-speed dynamic random access memory signals. To achieve this, we propose a joint training framework that integrates an autoencoder with a classifier to learn more distinctive latent representations by focusing on valid data features. Our approach is evaluated across three anomaly detection algorithms and consistently outperforms two baseline methods. Detailed ablation studies further support these findings. Furthermore, we introduce a signal integrity enhancement algorithm that improves signal integrity by an average of 11.3%. The source code and data used in this study are available at https://github.com/Usama1002/learning-latent-representations.
Article 758
Title@2025-06-23 (1): Learning Causal Graphs at Scale: A Foundation Model Approach
Title: Learning Causal Graphs at Scale: A Foundation Model Approach | Das Lernen von Kausalgraphen im Maßstab: Ein Basismodellansatz | 规模化学习性因果图表:基础模式方法 2506.18285v1 |
Authors (3): Naiyu Yin, Tian Gao, Yue Yu
Due to its human-interpretability and invariance properties, the Directed Acyclic Graph (DAG) has been a foundational tool across various areas of AI research, leading to significant advancements. However, DAG learning remains highly challenging, due to its super-exponential growth in computational cost and identifiability issues, particularly in small-sample regimes. To address these two challenges, in this work we leverage the recent success of linear transformers and develop a foundation model approach for discovering multiple order-consistent DAGs across tasks. In particular, we propose Attention-DAG (ADAG), a novel attention-mechanism-based architecture for learning multiple linear Structural Equation Models (SEMs). ADAG learns the mapping from observed data to both graph structure and parameters via a nonlinear attention-based kernel, enabling efficient multi-task estimation of the underlying linear SEMs. By formulating the learning process across multiple tasks as a continuous optimization problem, the pre-trained ADAG model captures the common structural properties as a shared low-dimensional prior, thereby reducing the ill-posedness of downstream DAG learning tasks in small-sample regimes. We evaluate our proposed approach on benchmark synthetic datasets and find that ADAG achieves substantial improvements in both DAG learning accuracy and zero-shot inference efficiency. To the best of our knowledge, this is the first practical approach for pre-training a foundation model specifically designed for DAG learning, representing a step toward more efficient and generalizable downstream applications in causal discovery.
Article 759
Title@2025-06-23 (1): Quantifying Uncertainty in the Presence of Distribution Shifts
Title: Quantifying Uncertainty in the Presence of Distribution Shifts | Quantifizierung der Unsicherheit in der Gegenwart von Verteilungsverschiebungen | 分配变更存在不确定性的量化 2506.18283v1 |
Authors (2): Yuli Slavutsky, David M. Blei
Neural networks make accurate predictions but often fail to provide reliable uncertainty estimates, especially under covariate distribution shifts between training and testing. To address this problem, we propose a Bayesian framework for uncertainty estimation that explicitly accounts for covariate shifts. While conventional approaches rely on fixed priors, the key idea of our method is an adaptive prior, conditioned on both training and new covariates. This prior naturally increases uncertainty for inputs that lie far from the training distribution in regions where predictive performance is likely to degrade. To efficiently approximate the resulting posterior predictive distribution, we employ amortized variational inference. Finally, we construct synthetic environments by drawing small bootstrap samples from the training data, simulating a range of plausible covariate shifts using only the original dataset. We evaluate our method on both synthetic and real-world data. It yields substantially improved uncertainty estimates under distribution shifts.
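The bootstrap step described above can be sketched as follows: each synthetic environment is a small sample drawn with replacement from the training covariates, so the covariate statistics vary from one environment to the next. This is only a minimal sketch; the function names, the scalar covariates, and the sample sizes are assumptions made for illustration, not the authors' setup.

```python
import random

def bootstrap_environments(covariates, num_envs, env_size, seed=0):
    """Draw num_envs small bootstrap samples (with replacement) of size env_size."""
    rng = random.Random(seed)
    return [rng.choices(covariates, k=env_size) for _ in range(num_envs)]

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Hypothetical one-dimensional training covariates.
train_x = [0.1 * i for i in range(100)]
envs = bootstrap_environments(train_x, num_envs=5, env_size=8)

# Small samples have noticeably different means, mimicking mild covariate shift.
for env in envs:
    print(round(mean(env), 3))
```

Because each environment is small, its empirical distribution can differ noticeably from that of the full training set, which is what makes the samples usable as stand-ins for shifted test conditions.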
Article 760
Title@2025-06-23 (1): Phase retrieval with rank $d$ measurements – \emph{descending} algorithms phase transitions
Title: Phase retrieval with rank $d$ measurements – \emph{descending} algorithms phase transitions | Phase Retrieval mit Rang $d$ Messungen – \emph{descending} Algorithmen Phasenübergänge | 以 $d$ 位数测量的阶段检索 – \ emph{descending} 算法阶段过渡 2506.18282v1 |
Authors (1): Mihailo Stojnic
Companion paper [118] developed a powerful \emph{Random duality theory} (RDT) based analytical program to statistically characterize the performance of \emph{descending} phase retrieval algorithms (dPR) (these include all variants of gradient descent, among them the widely popular Wirtinger flows). We here generalize the program and show how it can be utilized to handle rank $d$ positive definite phase retrieval (PR) measurements (with special cases $d=1$ and $d=2$ serving as emulations of the real and complex phase retrievals, respectively). In particular, we observe that the minimal sample complexity ratio (number of measurements scaled by the dimension of the unknown signal) which ensures dPR’s success exhibits a phase transition (PT) phenomenon. For both plain and lifted RDT we determine the phase transition locations. To complement the theoretical results we implement a log barrier gradient descent variant and observe that, even in small dimensional scenarios (with problem sizes on the order of 100), the simulated phase transitions are in excellent agreement with the theoretical predictions.
Article 761
Title@2025-06-23 (1): Hallucination Level of Artificial Intelligence Whisperer: Case Speech Recognizing Pantterinousut Rap Song
Title: Hallucination Level of Artificial Intelligence Whisperer: Case Speech Recognizing Pantterinousut Rap Song | Halluzinationsgrad der Künstlichen Intelligenz Whisperer: Case Speech Anerkennung von Pantterinousut Rap Song | 人造情报自言自语:承认 “ 潘特罗宁自唱 “ 的个案发言 2506.16174v2 |
Authors (3): Ismo Horppu, Frederick Ayala, Erlin Gulbenkoglu
All languages are peculiar. Some of them are considered more challenging to understand than others. The Finnish language is known to be a complex language. Also, when languages are used by artists, the pronunciation and meaning might be trickier to understand. Therefore, we are putting AI to a fun, yet challenging trial: translating a Finnish rap song to text. We will compare the Faster Whisperer algorithm and YouTube’s internal speech-to-text functionality. The reference truth will be Finnish rap lyrics, which the main author’s little brother, Mc Timo, has written. Transcribing the lyrics will be challenging because the artist raps over synth music played by Syntikka Janne. The hallucination level and mishearing of AI speech-to-text extractions will be measured by comparing errors made against the original Finnish lyrics. The error function is informal but still works for our case.
Article 762
Title@2025-06-23 (1): Optimal spectral initializers impact on phase retrieval phase transitions – an RDT view
Title: Optimal spectral initializers impact on phase retrieval phase transitions – an RDT view | Optimale spektrale Initialisatoren wirken sich auf Phasenabfragephasenübergänge aus – eine RDT-Ansicht | 最佳光谱初始化器对阶段回收阶段过渡的影响 – – RDT观点 2506.18279v1 |
Authors (1): Mihailo Stojnic
We analyze the relation between spectral initializers and theoretical limits of \emph{descending} phase retrieval algorithms (dPR). In companion paper [104], for any sample complexity ratio, $\alpha$, the \emph{parametric manifold}, ${\mathcal {PM}}(\alpha)$, is recognized as a critically important structure that generically determines dPRs’ abilities to solve phase retrieval (PR). Moreover, the overlap between the algorithmic solution and the true signal is positioned as a key component of ${\mathcal {PM}}$. We here consider the so-called \emph{overlap optimal} spectral initializers (OptSpins) as dPR’s starting points and develop a generic \emph{Random duality theory} (RDT) based program to statistically characterize them. In particular, we determine the functional structure of OptSpins and evaluate the starting overlaps that they provide for the dPRs. Since ${\mathcal {PM}}$’s so-called \emph{flat regions} are highly susceptible to \emph{local jitteriness} and as such are key obstacles on dPR’s path towards PR’s global optimum, a precise characterization of the starting overlap allows one to determine if such regions can be successfully circumvented. Through the presented theoretical analysis we observe two key points in that regard: \textbf{\emph{(i)}} dPR’s theoretical phase transition (critical $\alpha$ above which they solve PR) might be difficult to practically achieve, as the ${\mathcal {PM}}$’s flat regions are large, causing the associated OptSpins to fall exactly within them; and \textbf{\emph{(ii)}} opting for so-called ``\emph{safer compression}'' and slightly increasing $\alpha$ (by say $15\%$) shrinks flat regions and allows OptSpins to fall outside them and dPRs to ultimately solve PR. Numerical simulations are conducted as well and shown to be in excellent agreement with theoretical predictions.
Article 763
Title@2025-06-23 (1): Fast Rate Information-theoretic Bounds on Generalization Errors
Title: Fast Rate Information-theoretic Bounds on Generalization Errors | Schnelle Rate Information-theoretische Grenzen auf Verallgemeinerungsfehler | 通用误差信息理论误差快速速速率 2303.14658v3 |
Authors (4): Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu
The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound by Bu et al., which itself is a tightened version of the first bound on the topic by Russo et al. and Xu et al., this paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size $n$. It has been recognized that these bounds are in general not tight, as is readily verified for the exemplary quadratic Gaussian mean estimation problem, where the individual sample mutual information bound scales as $O(\sqrt{1/n})$ while the true generalization error scales as $O(1/n)$. The first contribution of this paper is to show that the same bound can in fact be asymptotically tight if an appropriate assumption is made. In particular, we show that the fast rate can be recovered when the assumption is made on the excess risk instead of the loss function, as is usually done in the existing literature. A theoretical justification is given for this choice. The second contribution of the paper is a new set of generalization error bounds based on the $(\eta, c)$-central condition, a condition that is relatively easy to verify and has the property that the mutual information term directly determines the convergence rate of the bound. Several analytical and numerical examples are given to show the effectiveness of these bounds.
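To make the $O(1/n)$ rate concrete, consider one standard instance of the quadratic Gaussian mean estimation problem: samples $Z_i \sim N(\mu, \sigma^2)$, hypothesis $W$ equal to the sample mean, and loss $(w - z)^2$. The expected training loss is $\sigma^2(1 - 1/n)$ and the expected test loss is $\sigma^2(1 + 1/n)$, so the expected generalization gap is $2\sigma^2/n$. The sketch below checks this by simulation; the specific setup and function names are assumptions chosen for illustration, not code from the paper.

```python
import random

def exact_gap(sigma, n):
    """Closed-form expected gap (test loss minus training loss): 2 * sigma^2 / n."""
    return 2.0 * sigma ** 2 / n

def simulated_gap(mu, sigma, n, trials=20000, seed=0):
    """Monte Carlo estimate of the expected gap for the sample-mean estimator."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        w = sum(sample) / n  # hypothesis: the sample mean
        train_loss = sum((w - z) ** 2 for z in sample) / n
        fresh = rng.gauss(mu, sigma)  # an independent test point
        test_loss = (w - fresh) ** 2
        total += test_loss - train_loss
    return total / trials

for n in (5, 10, 20):
    print(n, exact_gap(1.0, n), round(simulated_gap(0.0, 1.0, n), 3))
```

Doubling $n$ halves the gap, whereas a bound of order $\sqrt{1/n}$ would shrink only by a factor of about 1.41.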
Article 764
Title@2025-06-23 (1): Finite-Time Information-Theoretic Bounds in Queueing Control
Title: Finite-Time Information-Theoretic Bounds in Queueing Control | Finite-Time Information-Theoretische Bounds in Queueing Control | 定队控制中的短时信息理论环 2506.18278v1 |
Authors (3): Yujie Liu, Vincent Y. F. Tan, Yunbei Xu
We establish the first finite-time information-theoretic lower bounds (and derive new policies that achieve them) for the total queue length in scheduling problems over stochastic processing networks with both adversarial and stochastic arrivals. Prior analyses of MaxWeight guarantee only stability and asymptotic optimality in heavy traffic; we prove that, at finite horizons, MaxWeight can incur strictly larger backlog by problem-dependent factors which we identify. Our main innovations are 1) a minimax framework that pinpoints the precise problem parameters governing any policy’s finite-time performance; 2) an information-theoretic lower bound on total queue length; 3) a fundamental limitation of MaxWeight, namely that it is suboptimal in finite time; and 4) a new scheduling rule that minimizes the full Lyapunov drift, including its second-order term, thereby matching the lower bound under certain conditions, up to universal constants. These findings reveal a fundamental limitation of “drift-only” methods and point the way toward principled, non-asymptotic optimality in queueing control.
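For readers unfamiliar with the baseline, the classical MaxWeight rule picks, at each time step, the feasible schedule that maximizes the queue-length-weighted service. The sketch below shows that rule on a two-queue toy system; it illustrates only the MaxWeight baseline, not the new policy proposed in the paper, and the function names and the order of arrivals and service within a step are assumptions.

```python
def maxweight_schedule(queue_lengths, schedules):
    """Return the feasible schedule maximizing sum_i q_i * service_i."""
    def weight(schedule):
        return sum(q * s for q, s in zip(queue_lengths, schedule))
    return max(schedules, key=weight)

def step(queue_lengths, arrivals, schedule):
    """One time step: add arrivals, remove served jobs, never go below zero."""
    return [max(q + a - s, 0)
            for q, a, s in zip(queue_lengths, arrivals, schedule)]

# Two queues sharing one server: serve queue 0 or queue 1, one job per step.
schedules = [(1, 0), (0, 1)]
queues = [3, 5]

chosen = maxweight_schedule(queues, schedules)  # the longer queue is served
queues = step(queues, arrivals=[1, 0], schedule=chosen)
print(chosen, queues)
```

The rule uses only the current queue lengths, which corresponds to minimizing the first-order term of the Lyapunov drift; the abstract's point is that ignoring the second-order term can cost performance at finite horizons.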
Article 765
Title@2025-06-23 (1): Phase transition of \emph{descending} phase retrieval algorithms
Title: Phase transition of \emph{descending} phase retrieval algorithms | Phasenübergang von \emph{descending} Phasen-Retrieval-Algorithmen | \ emph{ dedescending} 阶段检索算法的阶段过渡 2506.18275v1 |
Authors (1): Mihailo Stojnic
We study theoretical limits of \emph{descending} phase retrieval algorithms. Utilizing \emph{Random duality theory} (RDT) we develop a generic program that allows statistical characterization of various algorithmic performance metrics. Through these we identify the concepts of \emph{parametric manifold} and its \emph{funneling points} as key mathematical objects that govern the underlying algorithms’ behavior. An isomorphism between single funneling point manifolds and global convergence of descending algorithms is established. The structure and shape of the parametric manifold as well as its dependence on the sample complexity are studied through both plain and lifted RDT. Emergence of a phase transition is observed. Namely, as sample complexity increases, the parametric manifold transitions from a multi to a single funneling point structure. This in turn corresponds to a transition from the scenarios where descending algorithms generically fail to the scenarios where they succeed in solving phase retrieval. We also develop and implement a practical algorithmic variant that in a hybrid alternating fashion combines a barrier and a plain gradient descent. Even though the theoretical results are obtained for infinite dimensional scenarios (and consequently non-jittery parametric manifolds), we observe a strong agreement between theoretical and simulated phase transition predictions for fairly small dimensions on the order of a few hundred.
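As a minimal sketch of what a descending algorithm looks like in the real-valued case, the code below runs plain gradient descent on the squared-residual objective $\frac{1}{4m}\sum_i ((a_i^\top x)^2 - y_i)^2$. This objective, the step size, and the toy measurements are assumptions chosen for illustration; the paper's hybrid barrier variant is not reproduced here.

```python
def inner(a, x):
    """Dot product of two equal-length vectors."""
    return sum(a_k * x_k for a_k, x_k in zip(a, x))

def pr_loss(x, vectors, y):
    """Objective (1 / 4m) * sum_i ((a_i . x)^2 - y_i)^2."""
    return sum((inner(a, x) ** 2 - y_i) ** 2
               for a, y_i in zip(vectors, y)) / (4 * len(y))

def pr_gradient(x, vectors, y):
    """Gradient (1 / m) * sum_i ((a_i . x)^2 - y_i) * (a_i . x) * a_i."""
    grad = [0.0] * len(x)
    for a, y_i in zip(vectors, y):
        p = inner(a, x)
        coeff = (p ** 2 - y_i) * p
        for k, a_k in enumerate(a):
            grad[k] += coeff * a_k
    return [g / len(y) for g in grad]

def descend(x0, vectors, y, step_size=0.01, iterations=200):
    """Plain gradient descent from the starting point x0."""
    x = list(x0)
    for _ in range(iterations):
        g = pr_gradient(x, vectors, y)
        x = [x_k - step_size * g_k for x_k, g_k in zip(x, g)]
    return x

# Toy problem: phaseless (squared) measurements of x_true.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [1.0, 2.0]
y = [inner(a, x_true) ** 2 for a in vectors]

x_start = [0.5, 1.5]
x_end = descend(x_start, vectors, y)
print(pr_loss(x_start, vectors, y), pr_loss(x_end, vectors, y))
```

The objective cannot distinguish $x$ from $-x$, so success is always measured up to a global sign, which is why the overlap with the true signal is the natural performance metric.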
Article 766
Title@2025-06-23 (1): Leveraging Large Language Models for Information Verification – an Engineering Approach
Title: Leveraging Large Language Models for Information Verification – an Engineering Approach | Nutzung großer Sprachmodelle für die Informationsverifizierung – ein Engineering-Ansatz | 利用大语言模型进行信息核查 – – 工程方法 2506.18274v1 |
Authors (7): Nguyen Nang Hung, Nguyen Thanh Trong, Vuong Thanh Toan, Nguyen An Phuoc, Dao Minh Tu, Nguyen Manh Duc Tuan, Nguyen Dinh Mau
For the ACMMM25 challenge, we present a practical engineering approach to multimedia news source verification, utilizing Large Language Models (LLMs) like GPT-4o as the backbone of our pipeline. Our method processes images and videos through a streamlined sequence of steps: First, we generate metadata using general-purpose queries via Google tools, capturing relevant content and links. Multimedia data is then segmented, cleaned, and converted into frames, from which we select the top-K most informative frames. These frames are cross-referenced with metadata to identify consensus or discrepancies. Additionally, audio transcripts are extracted for further verification. Notably, the entire pipeline is automated using GPT-4o through prompt engineering, with human intervention limited to final validation.
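The frame-selection step can be sketched as a simple top-K pick over per-frame scores. The abstract does not say how informativeness is measured, so the scores below are hypothetical placeholders, and the function name is an assumption.

```python
import heapq

def select_top_k_frames(frame_scores, k):
    """Return the indices of the k highest-scoring frames, in temporal order.

    frame_scores is a list of (frame_index, score) pairs; how the score is
    computed is an assumption left open here.
    """
    top = heapq.nlargest(k, frame_scores, key=lambda item: item[1])
    return sorted(index for index, _ in top)

# Hypothetical informativeness scores for four frames.
scores = [(0, 0.1), (1, 0.9), (2, 0.4), (3, 0.8)]
print(select_top_k_frames(scores, k=2))
```

Returning the indices in temporal order keeps the selected frames usable for later steps that compare them against metadata or transcripts in sequence.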
Article 767
Title@2025-06-23 (1): When Large Language Models Meet Vector Databases: A Survey
Title: When Large Language Models Meet Vector Databases: A Survey | Wenn große Sprachmodelle Vektordatenbanken treffen: Eine Umfrage | 当大语言模型与矢量数据库相匹配时:调查 2402.01763v4 |
Authors (8): Zhi Jing, Yongye Su, Yikun Han, Bo Yuan, Haiyun Xu, Chunjiang Liu, Kehai Chen, Min Zhang
This survey explores the synergistic potential of Large Language Models (LLMs) and Vector Databases (VecDBs), a burgeoning but rapidly evolving research area. With the proliferation of LLMs comes a host of challenges, including hallucinations, outdated knowledge, prohibitive commercial application costs, and memory issues. VecDBs emerge as a compelling solution to these issues by offering an efficient means to store, retrieve, and manage the high-dimensional vector representations intrinsic to LLM operations. Through this nuanced review, we delineate the foundational principles of LLMs and VecDBs and critically analyze their integration’s impact on enhancing LLM functionalities. This discourse extends into a discussion on the speculative future developments in this domain, aiming to catalyze further research into optimizing the confluence of LLMs and VecDBs for advanced data handling and knowledge extraction capabilities.
Article 768
Title@2025-06-23 (1): Evolutionary Optimization of Physics-Informed Neural Networks: Evo-PINN Frontiers and Opportunities
Title: Evolutionary Optimization of Physics-Informed Neural Networks: Evo-PINN Frontiers and Opportunities | Evolutionäre Optimierung physikinformierter neuraler Netzwerke: Evo-PINN Grenzen und Chancen | 物理内化神经网络的优化演变:Evo-PINN的前沿和机会 2501.06572v3 |
Authors (6): Jian Cheng Wong, Abhishek Gupta, Chin Chun Ooi, Pao-Hsiung Chiu, Jiao Liu, Yew-Soon Ong
Deep learning models trained on finite data lack a complete understanding of the physical world. On the other hand, physics-informed neural networks (PINNs) are infused with such knowledge through the incorporation of mathematically expressible laws of nature into their training loss function. By complying with physical laws, PINNs provide advantages over purely data-driven models in limited-data regimes and present a promising route towards Physical AI. This feature has propelled them to the forefront of scientific machine learning, a domain characterized by scarce and costly data. However, the vision of accurate physics-informed learning comes with significant challenges. This work examines PINNs for the first time in terms of model optimization and generalization, shedding light on the need for new algorithmic advances to overcome issues pertaining to the training speed, precision, and generalizability of today’s PINN models. Of particular interest are gradient-free evolutionary algorithms (EAs) for optimizing the uniquely complex loss landscapes arising in PINN training. Methods synergizing gradient descent and EAs for discovering bespoke neural architectures and balancing multiple terms in physics-informed learning objectives are positioned as important avenues for future research. Another exciting track is to cast evolutionary algorithms as meta-learners of generalizable PINN models. To substantiate these proposed avenues, we further highlight results from recent literature to showcase the early success of such approaches in addressing the aforementioned challenges in PINN optimization and generalization.
Article 769
Title@2025-06-23 (1): Memory-Augmented Architecture for Long-Term Context Handling in Large Language Models
Title: Memory-Augmented Architecture for Long-Term Context Handling in Large Language Models | Memory-Augmented Architecture für langfristiges Kontext-Handling in großen Sprachmodellen | 大语言模型长期背景处理的记忆强化建筑 2506.18271v1 |
Authors (2): Haseeb Ullah Khan Shinwari, Muhammad Usama
Large Language Models face significant challenges in maintaining coherent interactions over extended dialogues due to their limited contextual memory. This limitation often leads to fragmented exchanges and reduced relevance in responses, diminishing user experience. To address these issues, we propose a memory-augmented architecture that dynamically retrieves, updates, and prunes relevant information from past interactions, ensuring effective long-term context handling. Experimental results demonstrate that our solution significantly improves contextual coherence, reduces memory overhead, and enhances response quality, showcasing its potential for real-time applications in interactive systems.
Article 770
Title@2025-06-23 (1): ARD-LoRA: Dynamic Rank Allocation for Parameter-Efficient Fine-Tuning of Foundation Models with Heterogeneous Adaptation Needs
Title: ARD-LoRA: Dynamic Rank Allocation for Parameter-Efficient Fine-Tuning of Foundation Models with Heterogeneous Adaptation Needs | ARD-LoRA: Dynamische Rangverteilung für Parameter-Effiziente Feineinstellung von Grundmodellen mit heterogenen Anpassungsbedürfnissen | ARD-LORA: 具有不同差异适应需要的基础模型参数有效精密设计动态排名分配 2506.18267v1 |
Authors (2): Haseeb Ullah Khan Shinwari, Muhammad Usama
Conventional Low-Rank Adaptation (LoRA) methods employ a fixed rank, imposing uniform adaptation across transformer layers and attention heads despite their heterogeneous learning dynamics. This paper introduces Adaptive Rank Dynamic LoRA (ARD-LoRA), a novel framework that automates rank allocation through learnable scaling factors. These factors are optimized via a meta-objective balancing task performance and parameter efficiency, incorporating $\ell_1$ sparsity for minimal rank and Total Variation regularization for stable rank transitions. ARD-LoRA enables continuous, differentiable, per-head rank adaptation. Experiments on LLAMA-3.1-70B and PaliGemma-2 demonstrate ARD-LoRA’s efficacy, achieving up to 99.3% of full fine-tuning performance with only 0.32% trainable parameters, outperforming strong baselines like DoRA and AdaLoRA. Furthermore, it reduces multimodal adaptation memory by 41%. These results establish dynamic, fine-grained rank allocation as a critical paradigm for efficient foundation model adaptation.
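One way to read the meta-objective is as a task loss plus two penalties on the learnable scaling factors: an $\ell_1$ term that pushes factors (and hence ranks) toward zero, and a total-variation term that discourages abrupt changes between neighbouring factors. The sketch below is an assumption-laden illustration: the exact form of the objective, the axis along which total variation is taken (adjacent list entries here), and all names and weights are hypothetical.

```python
def rank_regularizer(alphas, l1_weight, tv_weight):
    """l1 sparsity plus total variation over an ordered list of scaling factors."""
    l1 = sum(abs(a) for a in alphas)
    tv = sum(abs(b - a) for a, b in zip(alphas, alphas[1:]))
    return l1_weight * l1 + tv_weight * tv

def meta_objective(task_loss, alphas, l1_weight=0.1, tv_weight=0.2):
    """Assumed form of the objective: task loss plus the rank regularizer."""
    return task_loss + rank_regularizer(alphas, l1_weight, tv_weight)

# Hypothetical scaling factors for three adjacent heads.
alphas = [1.0, 0.5, 0.5]
print(rank_regularizer(alphas, l1_weight=0.1, tv_weight=0.2))
print(meta_objective(task_loss=2.0, alphas=alphas))
```

Both penalties are piecewise linear in the factors, so the objective stays differentiable almost everywhere and can be optimized with the same gradient-based training loop as the adapter weights.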
Article 771
Title@2025-06-23 (1): Incentives for Responsiveness, Instrumental Control and Impact
Title: Incentives for Responsiveness, Instrumental Control and Impact | Anreize für Reaktionsfähigkeit, Instrumentenkontrolle und Wirkung | 反应、工具控制和影响奖励措施 2001.07118v3 |
Authors (5): Ryan Carey, Eric Langlois, Chris van Merwijk, Shane Legg, Tom Everitt
We introduce three concepts that describe an agent’s incentives: response incentives indicate which variables in the environment, such as sensitive demographic information, affect the decision under the optimal policy. Instrumental control incentives indicate whether an agent’s policy is chosen to manipulate part of its environment, such as the preferences or instructions of a user. Impact incentives indicate which variables an agent will affect, intentionally or otherwise. For each concept, we establish sound and complete graphical criteria, and discuss general classes of techniques that may be used to produce incentives for safe and fair agent behaviour. Finally, we outline how these notions may be generalised to multi-decision settings. This journal-length paper extends our conference publications “Incentives for Responsiveness, Instrumental Control and Impact” and “Agent Incentives: A Causal Perspective”: the material on response incentives and instrumental control incentives is updated, while the work on impact incentives and multi-decision settings is entirely new.
Article 772
Title@2025-06-23 (1): FutureFill: Fast Generation from Convolutional Sequence Models
Title: FutureFill: Fast Generation from Convolutional Sequence Models | FutureFill: Schnelle Generation aus konvolutionären Sequenzmodellen | 未来金融危机:从变序模型中快速繁衍 2410.03766v3 |
Authors (9): Naman Agarwal, Xinyi Chen, Evan Dogariu, Devan Shah, Hubert Strauss, Vlad Feinberg, Daniel Suo, Peter Bartlett, Elad Hazan
We address the challenge of efficient auto-regressive generation in sequence prediction models by introducing FutureFill, a general-purpose fast generation method for any sequence prediction algorithm based on convolutional operators. FutureFill reduces generation time from quadratic to quasilinear in the context length. Moreover, when generating from a prompt, it requires a prefill cache whose size grows only with the number of tokens to be generated, often much smaller than the caches required by standard convolutional or attention based models. We validate our theoretical claims with experiments on synthetic tasks and demonstrate substantial efficiency gains when generating from a deep convolutional sequence prediction model.
Article 773
Title@2025-06-23 (1): AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining
Title: AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining | AdaLRS: Loss-Guided Adaptive Learning Rate Suche nach effizientem Foundation Model Pretraining | AdaLRS: 为高效基础基础示范培训前而寻找学习率 2506.13274v2 |
Authors (5): Hongyuan Dong, Dingkang Yang, Xiao Liang, Chao Feng, Jiao Ran
Learning rate is widely regarded as crucial for effective foundation model pretraining. Recent research explores and demonstrates the transferability of learning rate configurations across varying model and dataset sizes, etc. Nevertheless, these approaches are constrained to specific training scenarios and typically necessitate extensive hyperparameter tuning on proxy models. In this work, we propose \textbf{AdaLRS}, a plug-and-play adaptive learning rate search algorithm that conducts online optimal learning rate search via optimizing loss descent velocities. We provide experimental results to show that the optimization of training loss and loss descent velocity in foundation model pretraining are both convex and share the same optimal learning rate. Relying solely on training loss dynamics, AdaLRS involves few extra computations to guide the search process, and its convergence is guaranteed via theoretical analysis. Experiments on both LLM and VLM pretraining show that AdaLRS adjusts suboptimal learning rates to the neighborhood of the optimum with marked efficiency and effectiveness, with model performance improved accordingly. We also show the robust generalizability of AdaLRS across varying training scenarios, such as different model sizes, training paradigms, and base learning rate scheduler choices.
Article 774
Title@2025-06-23 (1): MGHF: Multi-Granular High-Frequency Perceptual Loss for Image Super-Resolution
Title: MGHF: Multi-Granular High-Frequency Perceptual Loss for Image Super-Resolution | MGHF: Multi-Granulare High-Frequency Perceptual Loss für Bild-Super-Resolution | MGHF: 图像超分辨率的多语言高频感知损失 2411.13548v2 |
Authors (6): Shoaib Meraj Sami, Md Mahedi Hasan, Mohammad Saeed Ebrahimi Saadabadi, Jeremy Dawson, Nasser Nasrabadi, Raghuveer Rao
While different variants of perceptual losses have been employed in super-resolution literature to synthesize more realistic, appealing, and detailed high-resolution images, most are convolutional neural networks-based, causing information loss during guidance and often relying on complicated architectures and training procedures. We propose an invertible neural network (INN)-based naive \textbf{M}ulti-\textbf{G}ranular \textbf{H}igh-\textbf{F}requency (MGHF-n) perceptual loss trained on ImageNet to overcome these issues. Furthermore, we develop a comprehensive framework (MGHF-c) with several constraints to preserve, prioritize, and regularize information across multiple perspectives: texture and style preservation, content preservation, regional detail preservation, and joint content-style regularization. Information is prioritized through adaptive entropy-based pruning and reweighting of INN features. We utilize Gram matrix loss for style preservation and mean-squared error loss for content preservation. Additionally, we propose content-style consistency through correlation loss to regulate unnecessary texture generation while preserving content information. Since small image regions may contain intricate details, we employ modulated PatchNCE in the INN features as a local information preservation objective. Extensive experiments on various super-resolution algorithms, including GAN- and diffusion-based methods, demonstrate that our MGHF framework significantly improves performance. After the review process, our code will be released in the public repository.
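The two simplest terms mentioned above, a Gram matrix loss for style and a mean-squared error loss for content, can be sketched on plain feature lists as follows. This is a generic illustration of those two losses, not the MGHF implementation: the feature layout (channels by flattened positions), the normalization, and the function names are assumptions.

```python
def gram_matrix(features):
    """Gram matrix of channel features, normalized by the number of positions.

    features holds C rows (channels), each a flattened list of N activations.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(f_i, f_j)) / n for f_j in features]
            for f_i in features]

def style_loss(features_sr, features_hr):
    """Mean squared difference between the two Gram matrices."""
    g_sr, g_hr = gram_matrix(features_sr), gram_matrix(features_hr)
    c = len(g_sr)
    return sum((g_sr[i][j] - g_hr[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)

def content_loss(features_sr, features_hr):
    """Mean squared error between the features themselves."""
    diffs = [(a - b) ** 2
             for row_sr, row_hr in zip(features_sr, features_hr)
             for a, b in zip(row_sr, row_hr)]
    return sum(diffs) / len(diffs)

# Hypothetical features: 2 channels, 2 positions each.
sr = [[1.0, 0.0], [0.0, 1.0]]
hr = [[0.0, 1.0], [1.0, 0.0]]
print(style_loss(sr, hr), content_loss(sr, hr))
```

In the toy example the two feature maps differ only by a spatial swap, so the content loss is positive while the style loss is zero: the Gram matrix discards where activations occur and keeps only channel correlations.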
Article 775
Title@2025-06-23 (1): Ground tracking for improved landmine detection in a GPR system
Title: Ground tracking for improved landmine detection in a GPR system | Bodenverfolgung für eine verbesserte Landminenerkennung in einem GPR-System | 在GPR系统中改进地雷探测的地面跟踪 2506.18258v1 |
Authors (4): Li Tang, Peter A. Torrione, Cihat Eldeniz, Leslie M. Collins
Ground penetrating radar (GPR) provides a promising technology for accurate subsurface object detection. In particular, it has shown promise for detecting landmines with low metal content. However, the ground bounce (GB) that is present in GPR data, which is caused by the dielectric discontinuity between soil and air, is a major source of interference and degrades landmine detection performance. To mitigate this interference, GB tracking algorithms formulated using both a Kalman filter (KF) and a particle filter (PF) framework are proposed. In particular, the location of the GB in the radar signal is modeled as the hidden state in a stochastic system for the PF approach. The observations are the 2D radar images, which arrive scan by scan along the down-track direction. An initial training stage sets parameters automatically to accommodate different ground and weather conditions. The features associated with the GB description are updated adaptively with the arrival of new data. The prior distribution for a given location is predicted by propagating information from two adjacent channels/scans, which ensures that the overall GB surface remains smooth. The proposed algorithms are verified in experiments utilizing real data, and their performances are compared with other GB tracking approaches. We demonstrate that improved GB tracking contributes to improved performance for the landmine detection problem.
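A minimal sketch of the Kalman filter side of such a tracker is given below, treating the ground bounce location (for example, a depth-bin index) as a scalar hidden state that drifts slowly from scan to scan. The random-walk state model, the noise variances, and the function name are assumptions made for illustration; the paper's actual state and observation models are not specified in the abstract.

```python
def kalman_track(measurements, process_var, measurement_var,
                 initial_estimate, initial_var):
    """Scalar Kalman filter with a random-walk state model.

    Returns the filtered estimate of the ground bounce location per scan.
    """
    estimate, variance = initial_estimate, initial_var
    track = []
    for z in measurements:
        # Predict: the state is assumed to stay put, with growing uncertainty.
        variance += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
        track.append(estimate)
    return track

# Hypothetical noisy detections of the ground bounce bin over six scans.
detections = [20.0, 21.0, 19.5, 20.5, 35.0, 20.0]
track = kalman_track(detections, process_var=0.01, measurement_var=1.0,
                     initial_estimate=20.0, initial_var=1.0)
print([round(v, 2) for v in track])
```

With a small process variance the filter pulls only partway toward an isolated outlier such as the fifth detection, which reflects the smoothness of the ground surface that the abstract aims to preserve.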
Article 776
Title@2025-06-23 (1): RLPR: Extrapolating RLVR to General Domains without Verifiers
Title: RLPR: Extrapolating RLVR to General Domains without Verifiers | RLPR: Extrapolieren von RLVR auf allgemeine Domains ohne Prüfer | RLPR: 将RLVR外推至普通域域,无验证符 2506.18254v1 |
Authors (12): Tianyu Yu, Bo Ji, Shouli Wang, Shu Yao, Zefan Wang, Ganqu Cui, Lifan Yuan, Ning Ding, Yuan Yao, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua
Reinforcement Learning with Verifiable Rewards (RLVR) demonstrates promising potential in advancing the reasoning capabilities of LLMs. However, its success remains largely confined to mathematical and code domains. This primary limitation stems from the heavy reliance on domain-specific verifiers, which results in prohibitive complexity and limited scalability. To address this challenge, our key observation is that an LLM’s intrinsic probability of generating a correct free-form answer directly indicates its own evaluation of the reasoning reward (i.e., how well the reasoning process leads to the correct answer). Building on this insight, we propose RLPR, a simple verifier-free framework that extrapolates RLVR to broader general domains. RLPR uses the LLM’s own token probability scores for reference answers as the reward signal and maximizes the expected reward during training. We find that addressing the high variance of this noisy probability reward is crucial to making it work, and propose prob-to-reward and stabilizing methods to ensure a precise and stable reward from LLM intrinsic probabilities. Comprehensive experiments on four general-domain benchmarks and three mathematical benchmarks show that RLPR consistently improves reasoning capabilities in both areas for Gemma-, Llama-, and Qwen-based models. Notably, RLPR outperforms concurrent VeriFree by 7.6 points on TheoremQA and 7.5 points on Minerva, and even surpasses the strong verifier-model-dependent approach General-Reasoner by 1.6 average points across seven benchmarks.
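To make the idea of a probability-based reward concrete, the sketch below turns the log-probabilities a model assigns to the tokens of a reference answer into a single scalar. The function name `probability_reward` and the choice of averaging per-token probabilities are assumptions made for illustration; the abstract does not specify the aggregation, and the prob-to-reward and stabilizing steps it mentions are not reproduced here.

```python
import math


def probability_reward(ref_token_logprobs):
    """Average per-token probability of the reference answer.

    ref_token_logprobs: log-probabilities the policy assigns to each
    reference-answer token, conditioned on the prompt and its own reasoning.
    The mean-of-probabilities aggregation is an assumption of this sketch.
    """
    if not ref_token_logprobs:
        raise ValueError("reference answer must contain at least one token")
    probs = [math.exp(lp) for lp in ref_token_logprobs]
    return sum(probs) / len(probs)


# Hypothetical log-probs for a two-token reference answer.
reward = probability_reward([math.log(0.5), 0.0])
```

A reward of this kind stays in [0, 1] and needs no external verifier, which is the property the abstract emphasizes.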
Article 777
Title@2025-06-23 (1): DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic
Title: DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic | DSAC-C: Beschränkte maximale Entropie für robuste diskrete Soft-Actor-Kritik | DSAC- C: 软盘分解软控控控控控控最大导体 2310.17173v2 |
Authors (2): Dexter Neo, Tsuhan Chen
We present a novel extension to the family of Soft Actor-Critic (SAC) algorithms. We argue that, based on the Maximum Entropy Principle, discrete SAC can be further improved via additional statistical constraints derived from a surrogate critic policy. Furthermore, our findings suggest that these constraints provide added robustness against potential domain shifts, which is essential for the safe deployment of reinforcement learning agents in the real world. We provide theoretical analysis and show empirical results in low-data regimes for both in-distribution and out-of-distribution variants of Atari 2600 games.
Article 778
Title@2025-06-23 (1): Exploring Efficient Quantification of Modeling Uncertainties with Differentiable Physics-Informed Machine Learning Architectures
Title: Exploring Efficient Quantification of Modeling Uncertainties with Differentiable Physics-Informed Machine Learning Architectures | Effiziente Quantifizierung von Modellierungsunsicherheiten mit differenzierten physikinformierten Machine Learning-Architekturen | 探索对以不同物理和机械化学习架构建模的不确定性模型化进行高效率的量化 2506.18247v1 |
Authors (5): Manaswin Oddiraju, Bharath Varma Penumatsa, Divyang Amin, Michael Piedmonte, Souma Chowdhury
Quantifying and propagating modeling uncertainties is crucial for reliability analysis, robust optimization, and other model-based algorithmic processes in engineering design and control. Physics-informed machine learning (PIML) methods have emerged in recent years as a new alternative to traditional computational modeling and surrogate modeling methods, offering a balance between computing efficiency, modeling accuracy, and interpretability. However, their ability to predict and propagate modeling uncertainties remains mostly unexplored. In this paper, a promising class of auto-differentiable hybrid PIML architectures that combine partial physics with artificial neural networks (ANNs, used for input transformation or adaptive parameter estimation) is integrated with Bayesian neural networks (BNNs), which replace the ANNs; the goal is to explore whether BNNs can successfully provide uncertainty propagation capabilities in the PIML architectures as well, further supported by the auto-differentiability of these architectures. A two-stage training process is used to alleviate the challenges traditionally encountered in training probabilistic ML models. The resulting BNN-integrated PIML architecture is evaluated on an analytical benchmark problem and flight experiment data for a fixed-wing RC aircraft, with prediction performance observed to be slightly worse than or on par with purely data-driven ML and original PIML models. Moreover, Monte Carlo sampling of probabilistic BNN weights was found to be most effective in propagating uncertainty in the BNN-integrated PIML architectures.
Article 779
Title@2025-06-23 (1): Dual-Forward Path Teacher Knowledge Distillation: Bridging the Capacity Gap Between Teacher and Student
Title: Dual-Forward Path Teacher Knowledge Distillation: Bridging the Capacity Gap Between Teacher and Student | Dual-Forward-Pfad-Lehrerwissen Destillation: Überwindung des Kapazitaetraums zwischen Lehrer und Student | 教师知识蒸馏:缩小师生能力差距 2506.18244v1 |
Authors (5): Tong Li, Long Liu, Yihang Hu, Hu Chen, Shifeng Chen
Knowledge distillation (KD) provides an effective way to improve the performance of a student network under the guidance of pre-trained teachers. However, this approach usually introduces a large capacity gap between teacher and student networks, limiting the distillation gains. Previous methods addressing this problem either discard accurate knowledge representation or fail to dynamically adjust the transferred knowledge, which is less effective in addressing the capacity gap problem and hinders students from achieving performance comparable to that of the pre-trained teacher. In this work, we extend the idea of prompt-based learning to address the capacity gap problem, and propose Dual-Forward Path Teacher Knowledge Distillation (DFPT-KD), which replaces the pre-trained teacher with a novel dual-forward path teacher to supervise the learning of the student. The key to DFPT-KD is prompt-based tuning, i.e., establishing an additional prompt-based forward path within the pre-trained teacher and optimizing it with the pre-trained teacher frozen to make the transferred knowledge compatible with the representation ability of the student. Extensive experiments demonstrate that DFPT-KD leads to trained students performing better than with vanilla KD. To make the transferred knowledge more compatible with the representation abilities of the student, we further fine-tune the whole prompt-based forward path, yielding a novel distillation approach dubbed DFPT-KD+. Extensive experiments show that DFPT-KD+ improves upon DFPT-KD and achieves state-of-the-art accuracy.
Article 780
Title@2025-06-23 (1): Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Title: Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models | Chain-of-Experts: Entsperren der Kommunikationskraft von Mixture-of-Experts-Modellen | 专家链:解锁混合专家模型的通信能力 2506.18945v1 |
Authors (10): Zihan Wang, Rui Pan, Jiarui Yao, Robert Csordas, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li, Shiwei Liu
We propose Chain-of-Experts (CoE), a new Mixture-of-Experts (MoE) architecture that introduces sequential expert communication within each layer. Unlike traditional MoE models, where experts operate independently in parallel, CoE processes tokens iteratively across a chain of experts inside a layer. To support dynamic expert selection across iterations, CoE employs a dedicated router at each iteration step within a layer. This design allows tokens to re-evaluate and select different experts during each iteration, rather than being statically assigned. As a result, CoE introduces a flexible routing mechanism that increases the diversity of expert combinations and enriches the model’s representational capacity. CoE demonstrates improved performance under fixed compute: on math reasoning tasks, it reduces validation loss from 1.20 to 1.12 compared to a standard MoE. Beyond performance, CoE offers a new scaling axis: depth through expert iteration, which complements conventional width/depth scaling. For example, using 2x iterations matches the performance of 3x expert selections (in width), while reducing memory usage by 17.6-42% relative to other scaling strategies. Our analysis reveals that CoE’s benefits stem from its iterative residual structure and enhanced expert specialization empowered by iterative routing, which together unlock more expressive representations. Code is available at https://github.com/ZihanWang314/coe.
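The iterative routing described above can be sketched for a single token as follows. This is a toy illustration under stated assumptions, not the authors' implementation: experts are plain linear maps, gating is a softmax over router scores, and the names `coe_layer`, `experts`, and `routers` are hypothetical. What it does take from the abstract is the structure: one dedicated router per iteration, re-selection of experts at each iteration, and a residual connection between iterations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def coe_layer(x, experts, routers, top_k=1):
    """Process one token through a chain of experts inside one layer.

    x:       hidden vector of length d
    experts: list of d x d weight matrices (toy linear experts, an assumption)
    routers: one (num_experts x d) matrix per iteration, so the token can
             pick different experts at each step
    """
    h = list(x)
    for router in routers:
        gates = softmax(matvec(router, h))
        # Re-select the top-k experts based on the current hidden state.
        chosen = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
        update = [0.0] * len(h)
        for i in chosen:
            out = matvec(experts[i], h)
            update = [u + gates[i] * o for u, o in zip(update, out)]
        # Residual connection between iterations.
        h = [hi + ui for hi, ui in zip(h, update)]
    return h


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
uniform_router = [[0.0, 0.0], [0.0, 0.0]]  # scores both experts equally
out = coe_layer([1.0, 2.0], [identity, zeros], [uniform_router, uniform_router])
```

With two iterations the token passes through two routed expert steps while the layer still holds only one set of expert weights, which is the sense in which iteration adds a scaling axis separate from width.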
Article 781
Title@2025-06-23 (1): Uncertainty-aware Efficient Subgraph Isomorphism using Graph Topology
Title: Uncertainty-aware Efficient Subgraph Isomorphism using Graph Topology | Ungewissheit bewusst Effizienter Subgraph Isomorphismus mit Graph Topologie | 使用图形地形学 2209.09090v3 |
Authors (2): Arpan Kusari, Wenbo Sun
Subgraph isomorphism, also known as subgraph matching, is typically regarded as an NP-complete problem. This complexity is further compounded in practical applications where edge weights are real-valued and may be affected by measurement noise and potentially missing data. Such graph matching routinely arises in applications such as image matching and map matching. Most subgraph matching methods fail to perform node-to-node matching in the presence of such corruptions. We propose a method for identifying the node correspondence between a subgraph and a full graph in the inexact case without node labels in two steps: (a) extract the minimal unique topology-preserving subset from the subgraph and find its feasible matching in the full graph, and (b) implement a consensus-based algorithm to expand the matched node set by pairing unique paths based on boundary commutativity. To demonstrate the effectiveness of the proposed method, a simulation is performed on Erdős-Rényi random graphs, and two case studies are performed on the image-based affine covariant features dataset and the KITTI stereo dataset, respectively. Going beyond the existing subgraph matching approaches, the proposed method is shown to have realistically sub-linear computational efficiency, robustness to random measurement noise, and good statistical properties. Our method is also readily applicable to the exact matching case without loss of generality.
Article 782
Title@2025-06-23 (1): LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs
Title: LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs | LLM Web Dynamics: Aufspüren eines Modellkollapses in einem Netzwerk von LLMs | LLM 网络动态:追踪在LLM网络中的模型崩溃情况 2506.15690v2 |
Authors (4): Tianyu Wang, Lingyou Pang, Akira Horiguchi, Carey E. Priebe
The increasing use of synthetic data from the public Internet has enhanced data usage efficiency in large language model (LLM) training. However, the potential threat of model collapse remains insufficiently explored. Existing studies primarily examine model collapse in a single model setting or rely solely on statistical surrogates. In this work, we introduce LLM Web Dynamics (LWD), an efficient framework for investigating model collapse at the network level. By simulating the Internet with a retrieval-augmented generation (RAG) database, we analyze the convergence pattern of model outputs. Furthermore, we provide theoretical guarantees for this convergence by drawing an analogy to interacting Gaussian Mixture Models.
Article 783
Title@2025-06-23 (1): AdapThink: Adaptive Thinking Preferences for Reasoning Language Model
Title: AdapThink: Adaptive Thinking Preferences for Reasoning Language Model | AdapThink: Adaptive Denkeinstellungen für das Sprachmodell der Vernunft | AapThink:对理由语言模式的适应性思维偏好 2506.18237v1 |
Authors (6): Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun
Reinforcement Learning (RL)-based post-training has significantly advanced the complex reasoning capabilities of language models, fostering sophisticated self-reflection processes. However, this “slow thinking” paradigm presents a critical challenge to reasoning efficiency: models may expend excessive computation on simple questions and shift reasoning prematurely for complex ones. Previous mechanisms typically rely on static length budgets or predefined rules, lacking the adaptability for varying question complexities and models’ evolving capabilities. To this end, we propose AdapThink, an adaptive post-training framework designed to induce more efficient thinking while maintaining the performance of reasoning language models. Specifically, AdapThink incorporates two key mechanisms: 1) a group-relative reward function that leverages model confidence and response characteristics to dynamically adjust the preference for reflection-related transition words without resorting to a fixed length preference; 2) a diversity-aware sampling mechanism that balances the training group’s solution accuracy with reasoning diversity via an entropy-guided score. Experiments on several mathematical reasoning datasets with DeepSeek-distilled models demonstrate AdapThink’s advantages in enabling adaptive reasoning patterns and mitigating these inefficiencies.
Article 784
Title@2025-06-23 (1): ASGO: Adaptive Structured Gradient Optimization
Title: ASGO: Adaptive Structured Gradient Optimization | ASGO: Adaptive Strukturierte Gradientenoptimierung | ASGO: 适应结构结构梯度优化 2503.20762v2 |
Authors (7): Kang An, Yuxing Liu, Rui Pan, Yi Ren, Shiqian Ma, Donald Goldfarb, Tong Zhang
Training deep neural networks is a structured optimization problem, because the parameters are naturally represented by matrices and tensors rather than by vectors. Under this structural representation, it has been widely observed that gradients are low-rank and Hessians are approximately block-wise diagonal. These structured properties are crucial for designing efficient optimization algorithms, but are not utilized by many current popular optimizers like Adam. In this paper, we present a novel optimization algorithm ASGO that capitalizes on these properties by employing a preconditioner that is adaptively updated using structured gradients. By fine-grained theoretical analysis, ASGO is proven to achieve superior convergence rates compared to existing structured gradient methods. Based on the convergence theory, we further demonstrate that ASGO can benefit from the low-rank and block-wise diagonal properties. We also discuss practical modifications of ASGO and empirically verify ASGO’s effectiveness on language model tasks.
Article 785
Title@2025-06-23 (1): Cross-Architecture Knowledge Distillation (KD) for Retinal Fundus Image Anomaly Detection on NVIDIA Jetson Nano
Title: Cross-Architecture Knowledge Distillation (KD) for Retinal Fundus Image Anomaly Detection on NVIDIA Jetson Nano | Cross-Architecture Knowledge Destillation (KD) für retinale Fundus-Bildanomalieerkennung auf NVIDIA Jetson Nano | NVIDIA Jetson Nano 图像异常探测跨建筑知识蒸馏(KD) 2506.18220v1 |
Authors (2): Berk Yilmaz, Aniruddh Aiyengar
Early and accurate identification of retinal ailments is crucial for averting ocular decline; however, dependable diagnostic devices are often unavailable in low-resourced settings. This project proposes to address that gap by developing a lightweight, edge-device-deployable disease classifier using cross-architecture knowledge distillation. We first train a high-capacity vision transformer (ViT) teacher model, pre-trained using I-JEPA self-supervised learning, to classify fundus images into four classes: Normal, Diabetic Retinopathy, Glaucoma, and Cataract. We kept an Internet of Things (IoT) focus when compressing to a CNN-based student model for deployment in resource-limited conditions, such as on the NVIDIA Jetson Nano. This was accomplished using a novel framework that included a Partitioned Cross-Attention (PCA) projector, a Group-Wise Linear (GL) projector, and a multi-view robust training method. The teacher model has 97.4 percent more parameters than the student model, yet the student achieves 89 percent classification accuracy, retaining roughly 93 percent of the teacher model’s diagnostic performance. The retention of clinical classification behavior supports our method’s initial aim: compression of the ViT while retaining accuracy. Our work serves as an example of a scalable, AI-driven triage solution for retinal disorders in under-resourced areas.
Article 786
Title@2025-06-23 (1): Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales
Title: Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales | Symmetrischer Verstärkungs-Lernverlust für robustes Lernen auf unterschiedlichen Aufgaben und Modellskalan | 不同任务和模式规模的有力学习的对称强化学习损失 2405.17618v3 |
Authors (2): Ju-Seung Byun, Andrew Perrault
Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance. Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) can introduce additional difficulty. Differing preferences can complicate the alignment process, and prediction errors in a trained reward model can become more severe as the LLM generates unseen outputs. To enhance training robustness, RL has adopted techniques from supervised learning, such as ensembles and layer normalization. In this work, we improve the stability of RL training by adapting the reverse cross entropy (RCE) from supervised learning for noisy data to define a symmetric RL loss. We demonstrate performance improvements across various tasks and scales. We conduct experiments in discrete action tasks (Atari games) and continuous action space tasks (MuJoCo benchmark and Box2D) using Symmetric A2C (SA2C) and Symmetric PPO (SPPO), with and without added noise, with especially notable performance for SPPO across different hyperparameters. Furthermore, we validate the benefits of the symmetric RL loss when using SPPO for large language models through improved performance in RLHF tasks, such as IMDB positive sentiment and TL;DR summarization tasks.
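For orientation, the sketch below shows the supervised-learning form of a symmetric loss, i.e., standard cross entropy plus reverse cross entropy for a single example with a possibly noisy label. It is not the paper's RL loss, whose exact adaptation the abstract does not describe. The weights `alpha` and `beta` and the constant `log_zero` (a finite stand-in for log 0 on the one-hot label distribution) are illustrative assumptions.

```python
import math


def symmetric_cross_entropy(probs, label, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Cross entropy plus reverse cross entropy for one labeled example.

    probs:    predicted class probabilities (sum to 1)
    label:    index of the (possibly noisy) target class
    log_zero: finite value used in place of log(0) for non-target classes
    """
    # Standard CE: -log p(label)
    ce = -math.log(probs[label])
    # Reverse CE: -sum_k p_k * log q_k, where q is the one-hot label
    # distribution; log q_k is 0 for the target and log_zero otherwise.
    rce = -sum(p * (0.0 if k == label else log_zero) for k, p in enumerate(probs))
    return alpha * ce + beta * rce


loss = symmetric_cross_entropy([0.5, 0.5], label=0)
```

Because the reverse term is bounded (here by `-log_zero`), it grows only linearly as the predicted probability of the target falls, which is the usual argument for its tolerance to label noise.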
Article 787
Title@2025-06-23 (1): Cost-Aware Routing for Efficient Text-To-Image Generation
Title: Cost-Aware Routing for Efficient Text-To-Image Generation | Kostenbewusstes Routing für eine effiziente Text-zu-Bild-Generierung | 高效生成文本到图像的成本-软件路由 2506.14753v2 |
Authors (6): Qinchan Li, Kenneth Chen, Changyue Su, Wittawat Jitkrittum, Qi Sun, Patsorn Sangkloy
Diffusion models are well known for their ability to generate a high-fidelity image for an input prompt through an iterative denoising process. Unfortunately, the high fidelity also comes at a high computational cost due to the inherently sequential generative process. In this work, we seek to optimally balance quality and computational cost, and propose a framework to allow the amount of computation to vary for each prompt, depending on its complexity. Each prompt is automatically routed to the most appropriate text-to-image generation function, which may correspond to a distinct number of denoising steps of a diffusion model, or a disparate, independent text-to-image model. Unlike uniform cost reduction techniques (e.g., distillation, model quantization), our approach achieves the optimal trade-off by learning to reserve expensive choices (e.g., 100+ denoising steps) only for a few complex prompts, and employ more economical choices (e.g., a small distilled model) for less sophisticated prompts. We empirically demonstrate on COCO and DiffusionDB that by learning to route to nine already-trained text-to-image models, our approach is able to deliver an average quality that is higher than that achievable by any of these models alone.
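One simple way to picture per-prompt routing is a rule that picks, for each prompt, the candidate generator with the best predicted quality after subtracting a cost penalty. The sketch below assumes such a rule; the abstract does not state the actual routing objective, and the function `route_prompt`, the candidate names, the quality scores, and the costs are all hypothetical.

```python
def route_prompt(predicted_quality, cost, trade_off):
    """Pick the candidate with the highest quality-minus-cost score.

    predicted_quality: dict mapping candidate name -> predicted quality for
                       this prompt (in practice from a learned predictor)
    cost:              dict mapping candidate name -> relative compute cost
    trade_off:         weight on cost; larger values favor cheaper candidates
    """
    return max(
        predicted_quality,
        key=lambda name: predicted_quality[name] - trade_off * cost[name],
    )


# Hypothetical candidates: a small distilled model and a 100-step sampler.
cost = {"distilled": 1.0, "steps_100": 10.0}
simple_prompt = {"distilled": 0.80, "steps_100": 0.82}
complex_prompt = {"distilled": 0.40, "steps_100": 0.85}

choice_simple = route_prompt(simple_prompt, cost, trade_off=0.01)
choice_complex = route_prompt(complex_prompt, cost, trade_off=0.01)
```

In this toy setup the simple prompt goes to the cheap model because the expensive one adds little quality, while the complex prompt justifies the extra compute.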
Article 788
Title@2025-06-23 (1): Distributionally Robust Active Learning for Gaussian Process Regression
Title: Distributionally Robust Active Learning for Gaussian Process Regression | Distributionell robustes aktives Lernen für Gaußsche Prozessregression | Gaussian 进程倒退的分布强力积极学习 2502.16870v2 |
Authors (12): Shion Takeno, Yoshito Okura, Yu Inatsu, Tatsuya Aoyama, Tomonari Tanaka, Satoshi Akahane, Hiroyuki Hanada, Noriaki Hashimoto, Taro Murayama, Hanju Lee, Shinya Kojima, Ichiro Takeuchi
Gaussian process regression (GPR) or kernel ridge regression is a widely used and powerful tool for nonlinear prediction. Therefore, active learning (AL) for GPR, which actively collects data labels to achieve an accurate prediction with fewer data labels, is an important problem. However, existing AL methods do not theoretically guarantee prediction accuracy for the target distribution. Furthermore, as discussed in the distributionally robust learning literature, specifying the target distribution is often difficult. Thus, this paper proposes two AL methods that effectively reduce the worst-case expected error for GPR, that is, the worst-case expectation over candidate target distributions. We show an upper bound on the worst-case expected squared error, which suggests that the error can be made arbitrarily small with a finite number of data labels under mild conditions. Finally, we demonstrate the effectiveness of the proposed methods on synthetic and real-world datasets.
Article 789
Title@2025-06-22 (7): BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning
Title: BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning | BLAZE: Cross-Language und Cross-Project Bug Lokalisierung über Dynamic Chunking und Hard Example Learning | BLAZE:通过动态打字和硬实例学习实现跨语言和跨项目错误定位 2407.17631v3 |
Authors (3): Partha Chakraborty, Mahmoud Alfadel, Meiyappan Nagappan
Software bugs require developers to exert significant effort to identify and resolve them, often consuming about one-third of their time. Bug localization, the process of pinpointing the exact source code files that need modification, is crucial in reducing this effort. Existing bug localization tools, typically reliant on deep learning techniques, face limitations in cross-project applicability and effectiveness in multi-language environments. Recent advancements with Large Language Models (LLMs) offer detailed representations for bug localization. However, they encounter challenges with limited context windows and mapping accuracy. To address these issues, we propose BLAZE, an approach that employs dynamic chunking and hard example learning. First, BLAZE dynamically segments source code to minimize continuity loss. Then, BLAZE fine-tunes a GPT-based model using challenging bug cases, in order to enhance cross-project and cross-language bug localization. To support the capability of BLAZE, we create the BEETLEBOX dataset, which comprises 26,321 bugs from 29 large and thriving open-source projects across five different programming languages (Java, C++, Python, Go, and JavaScript). Our evaluations of BLAZE on three benchmark datasets (BEETLEBOX, SWE-Bench, and Ye et al.) demonstrate substantial improvements compared to six state-of-the-art baselines. Specifically, BLAZE achieves increases of up to 120% in Top 1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR). An extensive ablation study confirms the contributions of our pipeline components to the overall performance enhancement.
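The three reported metrics have standard definitions for ranked retrieval, sketched below for a single bug report. The file names are hypothetical, and this is the textbook computation rather than the authors' evaluation script. MAP and MRR are the means of `average_precision` and `reciprocal_rank` over all bug reports.

```python
def top1_hit(ranked_files, buggy_files):
    """1.0 if the top-ranked file is actually buggy, else 0.0."""
    return 1.0 if ranked_files and ranked_files[0] in buggy_files else 0.0


def reciprocal_rank(ranked_files, buggy_files):
    """1 / rank of the first buggy file in the ranking (0.0 if none)."""
    for rank, path in enumerate(ranked_files, start=1):
        if path in buggy_files:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_files, buggy_files):
    """Mean of precision@rank taken at each rank where a buggy file appears."""
    if not buggy_files:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked_files, start=1):
        if path in buggy_files:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(buggy_files)


# Hypothetical ranking for one bug report with two buggy files.
ranking = ["a.py", "b.py", "c.py"]
buggy = {"b.py", "c.py"}
rr = reciprocal_rank(ranking, buggy)
ap = average_precision(ranking, buggy)
```

Here the first buggy file sits at rank 2, so the reciprocal rank is 1/2, and the average precision is the mean of 1/2 and 2/3.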
Article 790
Title@2025-06-22 (7): Data-driven Discovery of Biophysical T Cell Receptor Co-specificity Rules
Title: Data-driven Discovery of Biophysical T Cell Receptor Co-specificity Rules | Datengesteuerte Entdeckung der biophysikalischen T-Zellrezeptor-Kospezifitätsregeln | 以数据驱动的数据驱动的生物物理细胞受体受体发现 2412.13722v3 |
Authors (8): Andrew G. T. Pyo, Yuta Nagano, Martina Milighetti, James Henderson, Curtis G. Callan Jr., Benny Chain, Ned S. Wingreen, Andreas Tiffeau-Mayer
The biophysical interactions between the T cell receptor (TCR) and its ligands determine the specificity of the cellular immune response. However, the immense diversity of receptors and ligands has made it challenging to discover generalizable rules across the distinct binding affinity landscapes created by different ligands. Here, we present an optimization framework for discovering biophysical rules that predict whether TCRs share specificity to a ligand. Applying this framework to TCRs associated with a collection of SARS-CoV-2 peptides, we systematically characterize how co-specificity depends on the type and position of amino-acid differences between receptors. We also demonstrate that the inferred rules generalize to ligands highly dissimilar to any seen during training. Our analysis reveals that matching of steric properties between substituted amino acids is more important for receptor co-specificity than the hydrophobic properties that prominently determine evolutionary substitutability. Our analysis also quantifies the substantial importance for specificity of positions not in direct contact with the peptide. These findings highlight the potential for data-driven approaches to uncover the molecular mechanisms underpinning the specificity of adaptive immune responses.
Article 791
Title@2025-06-22 (7): Joint Embedding Predictive Architecture for self-supervised pretraining on polymer molecular graphs
Title: Joint Embedding Predictive Architecture for self-supervised pretraining on polymer molecular graphs | Joint Embedding Predictive Architecture für selbstüberwachtes Pretraining auf Polymer-Molekulargraphen | 联合嵌入式联合预测结构,以进行关于聚合分子图的自我监督的预培训 2506.18194v1 |
Authors (3): Francesco Piccoli, Gabriel Vogel, Jana M. Weber
Recent advances in machine learning (ML) have shown promise in accelerating the discovery of polymers with desired properties by aiding in tasks such as virtual screening via property prediction. However, progress in polymer ML is hampered by the scarcity of high-quality labeled datasets, which are necessary for training supervised ML models. In this work, we study the use of the very recent ‘Joint Embedding Predictive Architecture’ (JEPA), a type of architecture for self-supervised learning (SSL), on polymer molecular graphs to understand whether pretraining with the proposed SSL strategy improves downstream performance when labeled data is scarce. Our results indicate that JEPA-based self-supervised pretraining on polymer graphs enhances downstream performance, particularly when labeled data is very scarce, achieving improvements across all tested datasets.
Article 792
Title@2025-06-22 (7): DeInfoReg: A Decoupled Learning Framework for Better Training Throughput
Title: DeInfoReg: A Decoupled Learning Framework for Better Training Throughput | DeInfoReg: Ein entkoppelter Lernrahmen für besseren Trainingsdurchsatz | DInfoReg:一个分离的学习框架,以改善培训工作量 2506.18193v1 |
Authors (3): Zih-Hao Huang, You-Teng Lin, Hung-Hsuan Chen
This paper introduces Decoupled Supervised Learning with Information Regularization (DeInfoReg), a novel approach that transforms a long gradient flow into multiple shorter ones, thereby mitigating the vanishing gradient problem. Integrating a pipeline strategy, DeInfoReg enables model parallelization across multiple GPUs, significantly improving training throughput. We compare our proposed method with standard backpropagation and other gradient flow decomposition techniques. Extensive experiments on diverse tasks and datasets demonstrate that DeInfoReg achieves superior performance and better noise resistance than traditional BP models and efficiently utilizes parallel computing resources. The code for reproducibility is available at: https://github.com/ianzih/Decoupled-Supervised-Learning-for-Information-Regularization/.
Article 793
Title@2025-06-22 (7): Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion
Title: Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion | Stabilisierung des zeitlichen Unterschieds Lernen durch implizite stochastische Rekursion | 通过隐性蒸气回流稳定时间差异学习 2505.01361v2 |
Authors (3): Hwanwoo Kim, Panos Toulis, Eric Laber
Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized algorithms. However, despite its widespread use, TD procedures are generally sensitive to step size specification. A poor choice of step size can dramatically increase variance and slow convergence in both on-policy and off-policy evaluation tasks. In practice, researchers use trial and error to identify stable step sizes, but these approaches tend to be ad hoc and inefficient. As an alternative, we propose implicit TD algorithms that reformulate TD updates into fixed point equations. Such updates are more stable and less sensitive to step size without sacrificing computational efficiency. Moreover, we derive asymptotic convergence guarantees and finite-time error bounds for our proposed implicit TD algorithms, which include implicit TD(0), TD($\lambda$), and TD with gradient correction (TDC). Our results show that implicit TD algorithms are applicable to a much broader range of step sizes, and thus provide a robust and versatile framework for policy evaluation and value approximation in modern RL tasks. We demonstrate these benefits empirically through extensive numerical examples spanning both on-policy and off-policy tasks.
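To illustrate what an implicit update can look like, the sketch below treats TD(0) with linear value approximation and evaluates the current state's value at the new parameters, which turns the update into a fixed-point equation. Solving that equation in closed form gives an ordinary TD step with a shrunken step size. This is one common way to make the update implicit and is an assumption here; the paper's exact formulation, and its TD(λ) and TDC variants, may differ.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def implicit_td0_step(theta, phi, phi_next, reward, alpha, gamma):
    """One implicit TD(0) step with linear features.

    Fixed-point equation (assumed form):
        theta_new = theta + alpha * (r + gamma * phi_next.theta
                                       - phi.theta_new) * phi
    Solving for theta_new gives a standard TD step whose step size is
    scaled by 1 / (1 + alpha * ||phi||^2).
    """
    td_error = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    effective_step = alpha / (1.0 + alpha * dot(phi, phi))
    return [t + effective_step * td_error * p for t, p in zip(theta, phi)]


theta_new = implicit_td0_step(
    theta=[0.0], phi=[1.0], phi_next=[0.0], reward=1.0, alpha=1.0, gamma=0.9
)
# Even an extreme step size cannot overshoot the one-step target here.
theta_big = implicit_td0_step(
    theta=[0.0], phi=[1.0], phi_next=[0.0], reward=1.0, alpha=1e6, gamma=0.9
)
```

The effective step size is always below `1 / ||phi||^2`, so a poorly chosen `alpha` degrades gracefully instead of causing divergence, which matches the reduced sensitivity to step size claimed in the abstract.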
Article 794
Title@2025-06-22 (7): Call Me Maybe: Enhancing JavaScript Call Graph Construction using Graph Neural Networks
Title: Call Me Maybe: Enhancing JavaScript Call Graph Construction using Graph Neural Networks | Rufen Sie mich vielleicht an: Verbesserung der JavaScript Call Graph Construction mit Graph Neural Networks | 使用图形神经网络加强 JavaScript 呼叫图图建设 2506.18191v1 |
Authors (4): Masudul Hasan Masud Bhuiyan, Gianluca De Stefano, Giancarlo Pellegrino, Cristian-Alexandru Staicu
Static analysis plays a key role in finding bugs, including security issues. A critical step in static analysis is building accurate call graphs that model function calls in a program. However, due to hard-to-analyze language features, existing call graph construction algorithms for JavaScript are neither sound nor complete. Prior work shows that even advanced solutions produce false edges and miss valid ones. In this work, we assist these tools by identifying missed call edges. Our main idea is to frame the problem as link prediction on full program graphs, using a rich representation with multiple edge types. Our approach, GRAPHIA, leverages recent advances in graph neural networks to model non-local relationships between code elements. Concretely, we propose representing JavaScript programs using a combination of syntactic- and semantic-based edges. GRAPHIA can learn from imperfect labels, including static call edges from existing tools and dynamic edges from tests, either from the same or different projects. Because call graphs are sparse, standard machine learning metrics like ROC are not suitable. Instead, we evaluate GRAPHIA by ranking function definitions for each unresolved call site. We conduct a large-scale evaluation on 50 popular JavaScript libraries with 163K call edges (150K static and 13K dynamic). GRAPHIA builds program graphs with 6.6M structural and 386K semantic edges. It ranks the correct target as the top candidate in over 42% of unresolved cases and within the top 5 in 72% of cases, reducing the manual effort needed for analysis. Our results show that learning-based methods can improve the recall of JavaScript call graph construction. To our knowledge, this is the first work to apply GNN-based link prediction to full multi-file program graphs for interprocedural analysis.
Article 795
Title@2025-06-22 (7): The Impact of Medication Non-adherence on Adverse Outcomes: Evidence from Schizophrenia Patients via Survival Analysis
Title: The Impact of Medication Non-adherence on Adverse Outcomes: Evidence from Schizophrenia Patients via Survival Analysis | Die Wirkung von Arzneimittelmangel auf unerwünschte Ergebnisse: Nachweise von Schizophreniepatienten durch Überlebensanalyse | 《不遵守药品对不利结果的影响:通过生存分析从精神病患者那里得到的证据》 2506.18187v1 |
Authors (4): Shahriar Noroozizadeh, Pim Welle, Jeremy C. Weiss, George H. Chen
This study quantifies the association between non-adherence to antipsychotic medications and adverse outcomes in individuals with schizophrenia. We frame the problem using survival analysis, focusing on the time to the earliest of several adverse events (early death, involuntary hospitalization, jail booking). We extend standard causal inference methods (T-learner, S-learner, nearest neighbor matching) to utilize various survival models to estimate individual and average treatment effects, where treatment corresponds to medication non-adherence. Analyses are repeated using different amounts of longitudinal information (3, 6, 9, and 12 months). Using data from Allegheny County in western Pennsylvania, we find strong evidence that non-adherence advances adverse outcomes by approximately 1 to 4 months. Ablation studies confirm that county-provided risk scores adjust for key confounders, as their removal amplifies the estimated effects. Subgroup analyses by medication formulation (injectable vs. oral) and medication type consistently show that non-adherence is associated with earlier adverse events. These findings highlight the clinical importance of adherence in delaying psychiatric crises and show that integrating survival analysis with causal inference tools can yield policy-relevant insights. We caution that although we apply causal inference, we only make associative claims and discuss assumptions needed for causal interpretation.
Article 796
Title@2025-06-22 (7): Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels
Title: Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels | Online-Lernen von Whittle-Indizes für ruhelose Banditen mit nicht-stationären Transition-Kerneln | 在线学习无休无休止强盗用非稳定过渡核心的Whittle Indists在线学习 2506.18186v1 |
Authors (4): Md Kamran Chowdhury Shisher, Vishrant Tripathi, Mung Chiang, Christopher G. Brinton
We consider optimal resource allocation for restless multi-armed bandits (RMABs) in unknown, non-stationary settings. RMABs are PSPACE-hard to solve optimally, even when all parameters are known. The Whittle index policy is known to achieve asymptotic optimality for a large class of such problems, while remaining computationally efficient. In many practical settings, however, the transition kernels required to compute the Whittle index are unknown and non-stationary. In this work, we propose an online learning algorithm for Whittle indices in this setting. Our algorithm first predicts current transition kernels by solving a linear optimization problem based on upper confidence bounds and empirical transition probabilities calculated from data over a sliding window. Then, it computes the Whittle index associated with the predicted transition kernels. We design these sliding windows and upper confidence bounds to guarantee dynamic regret that is sub-linear in the number of episodes $T$, under the condition that transition kernels change slowly over time (rate upper bounded by $\epsilon=1/T^k$ with $k>0$). Furthermore, our proposed algorithm and regret analysis are designed to exploit prior domain knowledge and structural information of the RMABs to accelerate the learning process. Numerical results validate that our algorithm achieves the lowest cumulative regret among the compared baselines in non-stationary environments.
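The sliding-window empirical transition probabilities mentioned above can be sketched as below. This covers only the windowed estimate; the upper confidence bounds, the linear optimization step, and the Whittle index computation are omitted. The class name `SlidingWindowKernel` is hypothetical, and for brevity the estimate is conditioned on the state only, whereas a restless-bandit kernel would normally also depend on the action.

```python
from collections import deque


class SlidingWindowKernel:
    """Empirical transition probabilities from the most recent observations only."""

    def __init__(self, n_states, window):
        self.n_states = n_states
        # Old transitions fall out automatically once the window is full.
        self.buffer = deque(maxlen=window)

    def observe(self, state, next_state):
        self.buffer.append((state, next_state))

    def estimate(self, state):
        counts = [0] * self.n_states
        for s, s_next in self.buffer:
            if s == state:
                counts[s_next] += 1
        total = sum(counts)
        if total == 0:
            # No data for this state inside the window: fall back to uniform.
            return [1.0 / self.n_states] * self.n_states
        return [c / total for c in counts]


kernel = SlidingWindowKernel(n_states=2, window=3)
for transition in [(0, 0), (0, 1), (0, 1), (0, 1)]:
    kernel.observe(*transition)
print(kernel.estimate(0))
```

A shorter window tracks a drifting kernel more closely but yields noisier estimates, which is the trade-off the window design in the abstract has to balance.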
Article 797
Title@2025-06-22 (7): Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba
Title: Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba | Memba: Membrangetriebene Parameter-Effizient Feintuning für Mamba | Memba:Mamba的膜驱动光膜驱动参数 2506.18184v1 |
Authors (5): Donghyun Lee, Yuhang Li, Ruokai Yin, Shiting Xiao, Priyadarshini Panda
State Space Models (SSMs) have emerged as powerful alternatives to attention-based Transformers, with Mamba demonstrating impressive efficiency and scalability. As these models grow larger, Parameter-Efficient Fine-Tuning (PEFT) methods become critical for adapting pre-trained Mamba to downstream tasks without prohibitive computational costs. However, previous approaches simply apply traditional Transformer-tailored PEFT methods without addressing the unique temporal processing dynamics of SSMs. To address this limitation, we propose Memba, a membrane-driven PEFT approach specifically designed for Mamba. Memba introduces Leaky Integrate Membrane (LIM) neurons as bio-inspired gating mechanisms that naturally accumulate membrane potentials over time, enhancing selective information retention. By strategically combining LIM neurons with Low-Rank Adaptations (LoRA) and cross-layer membrane transfer, our approach significantly improves Mamba’s temporal modeling capabilities. Extensive experiments across language and vision tasks demonstrate that Memba achieves substantial improvements over existing PEFT methods. The code is available at https://github.com/Intelligent-Computing-Lab-Yale/Memba.
Article 798
Title@2025-06-22 (7): Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models
Title: Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models | Halluzination-Aware Multimodaler Benchmark für die gastrointestinale Bildanalyse mit großen Vision-Sprachenmodellen | 使用大型视觉语言模型分析胃肠图象的幻觉-软件多式基准 2505.07001v2 |
Authors (10): Bidur Khanal, Sandesh Pokhrel, Sanjay Bhandari, Ramesh Rana, Nikesh Shrestha, Ram Bahadur Gurung, Cristian Linte, Angus Watson, Yash Raj Shrestha, Binod Bhattarai
Vision-Language Models (VLMs) are becoming increasingly popular in the medical domain, bridging the gap between medical images and clinical language. Existing VLMs demonstrate an impressive ability to comprehend medical images and text queries to generate detailed, descriptive diagnostic medical reports. However, hallucination, the tendency to generate descriptions that are inconsistent with the visual content, remains a significant issue in VLMs, with particularly severe implications in the medical field. To facilitate VLM research on gastrointestinal (GI) image analysis and study hallucination, we curate a multimodal image-text GI dataset: Gut-VLM. This dataset is created using a two-stage pipeline: first, descriptive medical reports of Kvasir-v2 images are generated using ChatGPT, which introduces some hallucinated or incorrect texts. In the second stage, medical experts systematically review these reports, and identify and correct potential inaccuracies to ensure high-quality, clinically reliable annotations. Unlike traditional datasets that contain only descriptive texts, our dataset also features tags identifying hallucinated sentences and their corresponding corrections. A common approach to reducing hallucination in VLMs is to finetune the model on a small-scale, problem-specific dataset. However, we take a different strategy using our dataset. Instead of finetuning the VLM solely for generating textual reports, we finetune it to detect and correct hallucinations, an approach we call hallucination-aware finetuning. Our results show that this approach is better than simply finetuning for descriptive report generation. Additionally, we conduct an extensive evaluation of state-of-the-art VLMs across several metrics, establishing a benchmark. GitHub Repo: https://github.com/bhattarailab/Hallucination-Aware-VLM.
Article 799
Title@2025-06-22 (7): Fast and Accurate Power Load Data Completion via Regularization-optimized Low-Rank Factorization
Title: Fast and Accurate Power Load Data Completion via Regularization-optimized Low-Rank Factorization | Schnelle und präzise Leistungslastdatenvervollständigung über Regularisierungsoptimierte Low-Rank-Fabrikisierung | 通过正规化、优化低射速电荷因子化完成快速和准确电源负载数据 2505.19133v2 |
Authors (5): Yan Xia, Hao Feng, Hongwei Sun, Junjie Wang, Qicong Hu
Low-rank representation learning has emerged as a powerful tool for recovering missing values in power load data due to its ability to exploit the inherent low-dimensional structures of spatiotemporal measurements. Among various techniques, low-rank factorization models are favoured for their efficiency and interpretability. However, their performance is highly sensitive to the choice of regularization parameters, which are typically fixed or manually tuned, resulting in limited generalization capability or slow convergence in practical scenarios. In this paper, we propose a Regularization-optimized Low-Rank Factorization, which introduces a Proportional-Integral-Derivative controller to adaptively adjust the regularization coefficient. Furthermore, we provide a detailed algorithmic complexity analysis, showing that our method preserves the computational efficiency of stochastic gradient descent while improving adaptivity. Experimental results on real-world power load datasets validate the superiority of our method in both imputation accuracy and training efficiency compared to existing baselines.
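To make the idea of a PID-adjusted regularization coefficient concrete, here is a generic discrete PID controller driving a coefficient update. This is only a sketch: the abstract does not say which error signal the controller tracks, so the generalization gap used in `adapt_regularization` is an assumption, as are all names and gain values.

```python
class PIDController:
    """Textbook discrete PID controller with unit time step."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        self.integral += error                 # integral term accumulates past errors
        derivative = error - self.prev_error   # derivative term reacts to the trend
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def adapt_regularization(lam, controller, train_error, val_error, lam_min=1e-6):
    # ASSUMPTION: the controlled error is the gap between validation and
    # training error; the paper may use a different signal.
    gap = val_error - train_error
    # A larger gap (overfitting) raises the coefficient; clamp to stay positive.
    return max(lam_min, lam + controller.update(gap))


controller = PIDController(kp=0.01, ki=0.001, kd=0.0)
lam = 0.1
for train_err, val_err in [(0.50, 0.70), (0.45, 0.60), (0.42, 0.50)]:
    lam = adapt_regularization(lam, controller, train_err, val_err)
    print(round(lam, 5))
```

Because the controller only adds a few scalar operations per update, a scheme of this shape would not change the per-iteration cost of stochastic gradient descent, which is consistent with the efficiency claim in the abstract.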
Article 800
Title@2025-06-22 (7): One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
Title: One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models | Ein Schritt ist genug: Sparse Autoencoder für Text-zu-Bild-Diffusionsmodelle | 单步就够了: 用于文本到图像扩散模型的粗略自动编码器 2410.22366v4 |
Authors (8): Viacheslav Surkov, Chris Wendler, Antonio Mari, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre, David Bau
For large language models (LLMs), sparse autoencoders (SAEs) have been shown to decompose intermediate representations that often are not interpretable directly into sparse sums of interpretable features, facilitating better control and subsequent analysis. However, similar analyses and approaches have been lacking for text-to-image models. We investigate the possibility of using SAEs to learn interpretable features for SDXL Turbo, a few-step text-to-image diffusion model. To this end, we train SAEs on the updates performed by transformer blocks within SDXL Turbo’s denoising U-net in its 1-step setting. Interestingly, we find that they generalize to 4-step SDXL Turbo and even to the multi-step SDXL base model (i.e., a different model) without additional training. In addition, we show that their learned features are interpretable, causally influence the generation process, and reveal specialization among the blocks. We do so by creating RIEBench, a representation-based image editing benchmark, for editing images while they are generated by turning on and off individual SAE features. This allows us to track which transformer blocks’ features are the most impactful depending on the edit category. Our work is the first investigation of SAEs for interpretability in text-to-image diffusion models and our results establish SAEs as a promising approach for understanding and manipulating the internal mechanisms of text-to-image models.
Article 801
Title@2025-06-22 (7): Pitfalls of Conformal Predictions for Medical Image Classification
Title: Pitfalls of Conformal Predictions for Medical Image Classification | Pitfalls von konformen Vorhersagen für medizinische Bildklassifikation | 医学图像分类非正规预测的空洞 2506.18162v1 |
Authors (3): Hendrik Mehrtens, Tabea Bucher, Titus J. Brinker
Reliable uncertainty estimation is one of the major challenges for medical classification tasks. While many approaches have been proposed, recently the statistical framework of conformal predictions has gained a lot of attention, due to its ability to provide provable calibration guarantees. Nonetheless, the application of conformal predictions in safety-critical areas such as medicine comes with pitfalls, limitations and assumptions that practitioners need to be aware of. We demonstrate through examples from dermatology and histopathology that conformal predictions are unreliable under distributional shifts in input and label variables. Additionally, conformal predictions should not be used for selecting predictions to improve accuracy and are not reliable for subsets of the data, such as individual classes or patient attributes. Moreover, in classification settings with a small number of classes, which are common in medical image classification tasks, conformal predictions have limited practical value.
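For readers unfamiliar with the technique, a minimal sketch of standard split conformal prediction for classification follows. It is not the authors' experimental code; the function names and the calibration data are hypothetical. The nonconformity score is one minus the predicted probability of the true class.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold giving marginal coverage of at least 1 - alpha."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank (1-indexed) of the calibration quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every class is included.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All classes whose nonconformity score does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical binary calibration set: predicted probabilities and true labels.
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
cal_labels = [0, 0, 1, 0]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set([0.7, 0.3], threshold))
```

The guarantee is marginal over exchangeable data, so it says nothing about individual classes or patient subgroups and breaks under distribution shift, which are the pitfalls listed above. With only two classes, the possible sets are empty, a single class, or both classes, which illustrates why the practical value is limited in that setting.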
Article 802
Title@2025-06-22 (7): Multi-Agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control
Title: Multi-Agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control | Multi-Agent Soft Actor-Critic mit koordiniertem Verlust für autonome Mobilität-auf-Demand-Flotte-Kontrolle | 多代理商软软软操作器-对自动机动按需机动车队控制协调损失具有协调损失的批评 2404.06975v2 |
Authors (5): Zeno Woywood, Jasper I. Wiltfang, Julius Luy, Tobias Enders, Maximilian Schiffer
We study a sequential decision-making problem for a profit-maximizing operator of an autonomous mobility-on-demand system. Optimizing a central operator’s vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic’s loss function to appropriately consider coordinated actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.
Article 803
Title@2025-06-22 (7): Probabilistic and reinforced mining of association rules
Title: Probabilistic and reinforced mining of association rules | Probabilistischer und verstärkter Abbau von Assoziierungsregeln | 协会规则的概率和强化开采 2506.18155v1 |
Authors (1): Yongchao Huang
This work introduces four novel probabilistic and reinforcement-driven methods for association rule mining (ARM): Gaussian process-based association rule mining (GPAR), Bayesian ARM (BARM), multi-armed bandit based ARM (MAB-ARM), and reinforcement learning based association rule mining (RLAR). These methods depart fundamentally from traditional frequency-based algorithms such as Apriori, FP-Growth, and Eclat, offering enhanced capabilities for incorporating prior knowledge, modeling uncertainty and item dependencies, probabilistic inference, and adaptive search strategies. GPAR employs Gaussian processes to model item co-occurrence via feature representations, enabling principled inference, uncertainty quantification, and efficient generalization to unseen itemsets without retraining. BARM adopts a Bayesian framework with priors and optional correlation structures, yielding robust uncertainty quantification through full posterior distributions over item presence probabilities. MAB-ARM, including its Monte Carlo tree search (MCTS) companion, utilizes an upper confidence bound (UCB) strategy for efficient and adaptive exploration of the itemset space, while RLAR applies a deep Q-network (DQN) to learn a generalizable policy for identifying high-quality rules. Collectively, these approaches improve the flexibility and robustness of ARM, particularly for discovering rare or complex patterns and operating on small datasets. Empirical results on synthetic and real-world datasets demonstrate their effectiveness, while also highlighting trade-offs in computational complexity and interpretability. These innovations mark a significant shift from static, frequency-driven paradigms, offering prior- and dependency-informed, uncertainty-aware, or scalable ARM frameworks for diverse application domains such as retail, geography, finance, medical diagnostics, and risk-sensitive scenarios.
Article 804
Title@2025-06-22 (7): Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection
Title: Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection | Routing Mamba: Skalierung von State Space-Modellen mit Mixture-of-Experts Projektion | Routing Mamba: 配有混合专家预测模型的扩大型国家空间模型 2506.18145v1 |
Authors (8): Zheng Zhan, Liliang Ren, Shuohang Wang, Liyuan Liu, Yang Liu, Yeyun Gong, Yanzhi Wang, Yelong Shen
Linear State Space Models (SSMs) offer remarkable performance gains in efficient sequence modeling, with constant inference-time computation and memory complexity. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long sequence modeling. However, efficiently scaling the expressive power of SSMs, particularly with Mixture of Experts (MoE), remains challenging, as naive integration attempts often falter or degrade performance. In this work, we introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts. By sharing routing decisions between projection layers and lightweight sub-modules within Mamba across experts, RoM leverages synergies among linear projection experts for effective and efficient sparse scaling of Mamba layers. At a scale of 1.3B active parameters (10B total) and 16K training sequence length, RoM achieves language modeling performance equivalent to a dense Mamba model requiring over 2.3x more active parameters, and demonstrates consistent perplexity across context lengths. Experimental results further show RoM effectively scales hybrid language models, yielding a 23% FLOPS saving compared to dense Mamba scaling for similar performance.
Article 805
Title@2025-06-22 (7): Enhancing LLM Knowledge Learning through Generalization
Title: Enhancing LLM Knowledge Learning through Generalization | Verbesserung des LLM-Wissenslernens durch Verallgemeinerung | 通过普遍化加强LLM知识学习 2503.03705v2 |
Authors (6): Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia
As large language models (LLMs) are increasingly deployed in diverse applications, faithfully integrating evolving factual knowledge into these models remains a critical challenge. Continued pre-training on paraphrased data has shown empirical promise for enhancing knowledge acquisition. However, this approach is often costly and unreliable, as it relies on external models or manual effort for rewriting, and may inadvertently alter the factual content. In this work, we hypothesize and empirically show that an LLM’s ability to continually predict the same factual knowledge tokens given diverse paraphrased contexts is positively correlated with its capacity to extract that knowledge via question-answering. Based on this view and aiming to improve generalization to diverse paraphrased contexts, we introduce two strategies to enhance LLMs’ ability to predict the same knowledge tokens given varied contexts, thereby enhancing knowledge acquisition. First, we propose formatting-based data augmentation, which diversifies documents conveying the same knowledge by altering document formats rather than their content, thereby preserving factual integrity. Second, we adopt sharpness-aware minimization as the optimizer to further improve generalization. Extensive experiments demonstrate our methods’ effectiveness in both continued pre-training and instruction tuning, and further gains can be achieved by combining with paraphrased data.
Article 806
Title@2025-06-22 (7): Supercharging Graph Transformers with Advective Diffusion
Title: Supercharging Graph Transformers with Advective Diffusion | Supercharging Graph Transformer mit advektiver Diffusion | 具有辅助扩散作用的极强巨型平面变形器 2310.06417v4 |
Authors (4): Qitian Wu, Chenxiao Yang, Kaipeng Zeng, Michael Bronstein
The capability of generalization is a cornerstone for the success of modern learning systems. For non-Euclidean data that involve topological structures, e.g., graphs, one important aspect neglected by prior studies is how machine learning models generalize under topological shifts. This paper proposes Advective Diffusion Transformer (AdvDIFFormer), a physics-inspired graph Transformer model designed to address this challenge. The model is derived from advective diffusion equations, which describe a class of continuous message passing processes with observed and latent topological structures. We show that AdvDIFFormer has provable capability for controlling generalization error under topological shifts, which in contrast cannot be guaranteed by graph diffusion models, i.e., the generalized formulation of common graph neural networks in continuous space. Empirically, the model demonstrates superiority in various predictive tasks across information networks, molecular screening and protein interactions.
Article 807
Title@2025-06-22 (7): On the fast convergence of minibatch heavy ball momentum
Title: On the fast convergence of minibatch heavy ball momentum | Auf die schnelle Konvergenz der Minibatch schweren Ball Momentum | 小型大球的重球势头迅速趋同 2206.07553v5 |
Authors (3): Raghu Bollapragada, Tyler Chen, Rachel Ward
Simple stochastic momentum methods are widely used in machine learning optimization, but their good practical performance is at odds with an absence of theoretical guarantees of acceleration in the literature. In this work, we aim to close the gap between theory and practice by showing that stochastic heavy ball momentum retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, at least when minibatching with a sufficiently large batch size. The algorithm we study can be interpreted as an accelerated randomized Kaczmarz algorithm with minibatching and heavy ball momentum. The analysis relies on carefully decomposing the momentum transition matrix, and using new spectral norm concentration bounds for products of independent random matrices. We provide numerical illustrations demonstrating that our bounds are reasonably sharp.
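The kind of iteration studied here can be illustrated with a small minibatch Kaczmarz solver with heavy ball momentum. This is a simplified sketch rather than the paper's exact algorithm: rows are sampled uniformly without replacement (an assumption; the analysis may use a different sampling distribution), and the function name, step size, and momentum value are illustrative.

```python
import random


def minibatch_heavy_ball_kaczmarz(A, b, batch_size, step, momentum, iters, seed=0):
    """Solve a consistent system A x = b with averaged Kaczmarz steps plus momentum."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    x_prev = list(x)
    row_norm_sq = [sum(v * v for v in row) for row in A]
    for _ in range(iters):
        batch = rng.sample(range(m), batch_size)
        direction = [0.0] * n
        for i in batch:
            residual = sum(A[i][j] * x[j] for j in range(n)) - b[i]
            scale = residual / row_norm_sq[i]
            for j in range(n):
                # Average of the per-row Kaczmarz projection directions.
                direction[j] += scale * A[i][j] / batch_size
        # Heavy ball update: projection step plus momentum on the last displacement.
        x_next = [x[j] - step * direction[j] + momentum * (x[j] - x_prev[j])
                  for j in range(n)]
        x_prev, x = x, x_next
    return x


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]  # consistent with the solution x = [1, 2]
print(minibatch_heavy_ball_kaczmarz(A, b, batch_size=2, step=1.0, momentum=0.3, iters=300))
```

Setting `batch_size` to the number of rows makes the iteration deterministic and recovers heavy ball momentum on the averaged projection operator, which is the regime whose linear rate the abstract says is retained for sufficiently large batches.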
Article 808
Title@2025-06-22 (7): Bridging Geometric Diffusion and Energy Minimization: A Unified Framework for Neural Message Passing
Title: Bridging Geometric Diffusion and Energy Minimization: A Unified Framework for Neural Message Passing | Bridging Geometrische Diffusion und Energie Minimierung: Ein einheitliches Framework für neurale Message Passing | 连接几何扩散和能源最小化:神经信息传递统一框架 2409.09111v2 |
Authors (3): Qitian Wu, David Wipf, Junchi Yan
Learning representations for structured data with certain geometries (e.g., observed or unobserved) is a fundamental challenge, wherein message passing neural networks (MPNNs) have become a de facto class of model solutions. In this paper, we propose an energy-constrained diffusion model as a principled mathematical framework for understanding the mechanism of MPNNs and navigating novel architectural designs. Inspired by physical systems, the model combines the inductive bias of diffusion on manifolds with layer-wise constraints of energy minimization. We identify that the diffusion operators have a one-to-one correspondence with the energy functions implicitly descended by the diffusion process, and the finite-difference iteration for solving the energy-constrained diffusion system induces the propagation layers of various types of MPNNs operating on observed or latent structures. This leads to a unified perspective on common neural architectures whose computational flows can be cast as message passing (or its special case), including MLP, GCN, GIN, APPNP, GCNII, GAT, and Transformers. Building on these insights, we devise a new class of neural message passing models, dubbed diffusion-inspired Transformers, whose global attention layers are derived from the principled energy-constrained diffusion framework. Across diverse datasets, ranging from real-world networks to images, texts, and physical particles, we demonstrate that the new model achieves promising performance in scenarios where the data structures are observed (as a graph), partially observed, or entirely unobserved.
Article 809
Title@2025-06-22 (7): Stable and consistent density-based clustering via multiparameter persistence
Title: Stable and consistent density-based clustering via multiparameter persistence | Stabiles und konsistentes Dichte-basiertes Clustering über Multiparameter Persistenz | 通过多参数耐久性建立稳定、一致的基于密度的集群 2005.09048v4 |
Authors (2): Alexander Rolle, Luis Scoccola
We consider the degree-Rips construction from topological data analysis, which provides a density-sensitive, multiparameter hierarchical clustering algorithm. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance, a metric for hierarchical clusterings that we introduce. Taking certain one-parameter slices of degree-Rips recovers well-known methods for density-based clustering, but we show that these methods are unstable. However, we prove that degree-Rips, as a multiparameter object, is stable, and we propose an alternative approach for taking slices of degree-Rips, which yields a one-parameter hierarchical clustering algorithm with better stability properties. We prove that this algorithm is consistent, using the correspondence-interleaving distance. We provide an algorithm for extracting a single clustering from one-parameter hierarchical clusterings, which is stable with respect to the correspondence-interleaving distance. Finally, we integrate these methods into a pipeline for density-based clustering, which we call Persistable. Adapting tools from multiparameter persistent homology, we propose visualization tools that guide the selection of all parameters of the pipeline. We demonstrate Persistable on benchmark data sets, showing that it identifies multi-scale cluster structure in data.
Article 810
Title@2025-06-22 (7): Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence
Title: Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence | Unüberwachte Risikofaktoren-Identifikation über Krebsarten und Datenmodalitäten durch erklärbare künstliche Intelligenz | 通过可解释的人工智能,在癌症类型和数据模式中,通过可解释的人工智能,确定各种癌症类型和数据模式的不受监督的风险因素 2506.12944v2 |
Authors (10): Maximilian Ferle, Jonas Ader, Thomas Wiemers, Nora Grieb, Adrian Lindenmeyer, Hans-Jonas Meyer, Thomas Neumuth, Markus Kreuz, Kristin Reiche, Maximilian Merz
Risk stratification is a key tool in clinical decision-making, yet current approaches often fail to translate sophisticated survival analysis into actionable clinical criteria. We present a novel method for unsupervised machine learning that directly optimizes for survival heterogeneity across patient clusters through a differentiable adaptation of the multivariate logrank statistic. Unlike most existing methods that rely on proxy metrics, our approach provides a novel methodology for training any neural network architecture on any data modality to identify prognostically distinct patient groups. We thoroughly evaluate the method in simulation experiments and demonstrate its utility in practice by applying it to two distinct cancer types: analyzing laboratory parameters from multiple myeloma patients and computed tomography images from non-small cell lung cancer patients, identifying prognostically distinct patient subgroups with significantly different survival outcomes in both cases. Post-hoc explainability analyses uncover clinically meaningful features determining the group assignments, which align well with established risk factors and thus lend strong weight to the method's utility. This pan-cancer, model-agnostic approach represents a valuable advancement in clinical risk stratification, enabling the discovery of novel prognostic signatures across diverse data types while providing interpretable results that promise to complement treatment personalization and clinical decision-making in oncology and beyond.
Article 811
Title@2025-06-22 (7): Bayesian Multiobject Tracking With Neural-Enhanced Motion and Measurement Models
Title: Bayesian Multiobject Tracking With Neural-Enhanced Motion and Measurement Models | Bayesian Multiobject Tracking mit neural-erweiterten Bewegungs- und Messmodellen | Bayesian 多功能物体跟踪,以神经强化机动和测量模型跟踪 2506.18124v1 |
Authors (3): Shaoxiu Wei, Mingchao Liang, Florian Meyer
Multiobject tracking (MOT) is an important task in applications including autonomous driving, ocean sciences, and aerospace surveillance. Traditional MOT methods are model-based and combine sequential Bayesian estimation with data association and an object birth model. More recent methods are fully data-driven and rely on the training of neural networks. Both approaches offer distinct advantages in specific settings. In particular, model-based methods are generally applicable across a wide range of scenarios, whereas data-driven MOT achieves superior performance in scenarios where abundant labeled data for training is available. A natural question is whether a general framework can integrate the two approaches. This paper introduces a hybrid method that utilizes neural networks to enhance specific aspects of the statistical model in Bayesian MOT that have been identified as overly simplistic. By doing so, the performance of the prediction and update steps of Bayesian MOT is improved. To ensure tractable computation, our framework uses belief propagation to avoid high-dimensional operations combined with sequential Monte Carlo methods to perform low-dimensional operations efficiently. The resulting method combines the flexibility and robustness of model-based approaches with the capability of neural networks to learn complex information from data. We evaluate the performance of the proposed method based on the nuScenes autonomous driving dataset and demonstrate that it has state-of-the-art performance.
Article 812
Title@2025-06-22 (7): RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
Title: RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies | RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies | 机器人阿雷纳:对通用机器人政策进行分布式真实世界评价 2506.18123v1 |
Authors (30): Pranav Atreya, Karl Pertsch, Tony Lee, Moo Jin Kim, Arhan Jain, Artur Kuramshin, Clemens Eppner, Cyrus Neary, Edward Hu, Fabio Ramos, Jonathan Tremblay, Kanav Arora, Kirsty Ellis, Luca Macesanu, Matthew Leonard, Meedeum Cho, Ozgur Aslan, Shivin Dass, Jie Wang, Xingfang Yuan, Xuning Yang, Abhishek Gupta, Dinesh Jayaraman, Glen Berseth, Kostas Daniilidis, Roberto Martin-Martin, Youngwoon Lee, Percy Liang, Chelsea Finn, Sergey Levine
Comprehensive, unbiased, and comparable evaluation of modern generalist policies is uniquely challenging: existing approaches for robot benchmarking typically rely on heavy standardization, either by specifying fixed evaluation tasks and environments, or by hosting centralized “robot challenges”, and do not readily scale to evaluating generalist policies across a broad range of tasks and environments. In this work, we propose RoboArena, a new approach for scalable evaluation of generalist robot policies in the real world. Instead of standardizing evaluations around fixed tasks, environments, or locations, we propose to crowd-source evaluations across a distributed network of evaluators. Importantly, evaluators can freely choose the tasks and environments they evaluate on, enabling easy scaling of diversity, but they are required to perform double-blind evaluations over pairs of policies. Then, by aggregating preference feedback from pairwise comparisons across diverse tasks and environments, we can derive a ranking of policies. We instantiate our approach across a network of evaluators at seven academic institutions using the DROID robot platform. Through more than 600 pairwise real-robot evaluation episodes across seven generalist policies, we demonstrate that our crowd-sourced approach can more accurately rank the performance of existing generalist policies than conventional, centralized evaluation approaches, while being more scalable, resilient, and trustworthy. We open our evaluation network to the community and hope that it can enable more accessible comparisons of generalist robot policies.
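One common way to turn pairwise preferences into a ranking is the Bradley-Terry model. The abstract does not state which aggregation RoboArena uses, so the sketch below is a stand-in technique rather than the authors' method; the policy names and win counts are invented for illustration.

```python
def bradley_terry(wins, items, iters=200):
    """Estimate Bradley-Terry strengths with the classic minorization-maximization update.

    wins[(i, j)] is the number of comparisons in which i was preferred over j.
    """
    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            total_wins = sum(wins.get((i, j), 0) for j in items if j != i)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (strength[i] + strength[j])
                for j in items if j != i
            )
            updated[i] = total_wins / denom if denom > 0 else strength[i]
        norm = sum(updated.values())
        # Strengths are only defined up to scale, so renormalize to sum to one.
        strength = {i: v / norm for i, v in updated.items()}
    return strength


# Hypothetical double-blind comparison outcomes between three policies.
policies = ["policy_a", "policy_b", "policy_c"]
wins = {
    ("policy_a", "policy_b"): 8, ("policy_b", "policy_a"): 2,
    ("policy_b", "policy_c"): 7, ("policy_c", "policy_b"): 3,
    ("policy_a", "policy_c"): 9, ("policy_c", "policy_a"): 1,
}
strengths = bradley_terry(wins, policies)
print(sorted(policies, key=strengths.get, reverse=True))
```

A model of this kind needs only relative judgments, so evaluators can compare policies on whatever task they choose without a shared absolute scoring scale.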
Article 813
Title@2025-06-22 (7): Dynamic Temporal Positional Encodings for Early Intrusion Detection in IoT
Title: Dynamic Temporal Positional Encodings for Early Intrusion Detection in IoT | Dynamische temporale Positionskodierungen für die frühzeitige Intrusionserkennung im IoT | 用于在 IoT 中早期入侵探测的动态时间位置定位编码 2506.18114v1 |
Authors (6): Ioannis Panopoulos, Maria-Lamprini A. Bartsioka, Sokratis Nikolaidis, Stylianos I. Venieris, Dimitra I. Kaklamani, Iakovos S. Venieris
The rapid expansion of the Internet of Things (IoT) has introduced significant security challenges, necessitating efficient and adaptive Intrusion Detection Systems (IDS). Traditional IDS models often overlook the temporal characteristics of network traffic, limiting their effectiveness in early threat detection. We propose a Transformer-based Early Intrusion Detection System (EIDS) that incorporates dynamic temporal positional encodings to enhance detection accuracy while maintaining computational efficiency. By leveraging network flow timestamps, our approach captures both sequence structure and timing irregularities indicative of malicious behaviour. Additionally, we introduce a data augmentation pipeline to improve model robustness. Evaluated on the CICIoT2023 dataset, our method outperforms existing models in both accuracy and earliness. We further demonstrate its real-time feasibility on resource-constrained IoT devices, achieving low-latency inference and minimal memory footprint.
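As a rough illustration of a positional encoding driven by flow timestamps rather than sequence indices, the sketch below evaluates the usual sinusoidal encoding at the elapsed time of each flow. The abstract does not give the exact formulation, so the function name `temporal_positional_encoding`, the `max_period` constant, and the use of time relative to the first flow are assumptions.

```python
import numpy as np

def temporal_positional_encoding(timestamps, d_model, max_period=10000.0):
    """Sinusoidal encoding evaluated at elapsed time (d_model must be even)."""
    t = np.asarray(timestamps, dtype=float)
    t = t - t[0]                                   # time since the first flow
    freqs = max_period ** (-np.arange(0, d_model, 2) / d_model)
    angles = t[:, None] * freqs[None, :]
    pe = np.zeros((len(t), d_model))
    pe[:, 0::2] = np.sin(angles)                   # even channels
    pe[:, 1::2] = np.cos(angles)                   # odd channels
    return pe

# Irregularly spaced flows: the long gap before the third flow yields a
# visibly different encoding than evenly spaced positions would.
pe = temporal_positional_encoding([5.0, 5.5, 9.0], d_model=8)
```

With timestamps 0, 1, 2, ... this reduces to the standard index-based sinusoidal encoding, so irregular gaps are the only source of difference.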
Article 814
Title@2025-06-22 (7): RL for Reasoning by Adaptively Revealing Rationales
Title: RL for Reasoning by Adaptively Revealing Rationales | RL zur Begründung durch adaptives Aufdecken von Rationales | 以适应性流转推理推理推理的RL 2506.18110v1 |
Authors (7): Mohammad Hossein Amani, Aryo Lotfi, Nicolas Mario Baldwin, Samy Bengio, Mehrdad Farajtabar, Emmanuel Abbe, Robert West
We propose that reinforcement learning (RL) from partial expert demonstrations is not merely a training heuristic, but a promising framework for solving complex sequence generation tasks. Supervised fine-tuning (SFT) relies on dense ground-truth labels, which become increasingly costly as sequence length grows. RL, on the other hand, struggles with sparse rewards and a combinatorially large output space. We address this by introducing adaptive backtracking (AdaBack), a per-sample curriculum learning algorithm that reveals only a partial prefix of the target output during training. The supervision length is adjusted dynamically for each sample based on the model’s past reward signal, allowing it to incrementally learn to complete reasoning chains by conditioning on correct partial solutions. We investigate this intermediate regime between SFT and RL and argue that per-sample curriculum learning is more than a trade-off between efficiency and generality: it can succeed in tasks with long sequences of latent dependencies where SFT and RL both fail to generalize. Using a synthetic task with latent parity constraints, we show that our adaptive curriculum over partial answers reliably solves problems that are otherwise intractable. On mathematical reasoning benchmarks (MATH, GSM8k), we find that curriculum learning enables models to solve problems that RL alone cannot, acquiring new reasoning capabilities through incremental exposure to partial solutions.
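The per-sample curriculum idea can be sketched as follows: each training sample keeps its own reveal fraction, which shrinks after a rewarded rollout and grows after a failed one. The abstract does not state the actual update rule, so the names `SampleState`, `revealed_prefix`, and `update_state`, and the fixed-step threshold rule, are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class SampleState:
    # Portion of the target output shown to the model as a prefix.
    reveal_fraction: float = 1.0

def revealed_prefix(target_tokens, state):
    """Return the part of the target that is revealed for this sample."""
    k = round(state.reveal_fraction * len(target_tokens))
    return target_tokens[:k]

def update_state(state, reward, threshold=0.5, step=0.1):
    """Reveal less after success, more after failure (hypothetical rule)."""
    if reward >= threshold:
        state.reveal_fraction = max(0.0, state.reveal_fraction - step)
    else:
        state.reveal_fraction = min(1.0, state.reveal_fraction + step)
    return state

# One sample whose rollout succeeded: the next prefix is shorter.
state = SampleState()
target = list(range(10))
update_state(state, reward=1.0)
prefix = revealed_prefix(target, state)
```

A reveal fraction of 1.0 corresponds to fully supervised targets and 0.0 to plain RL from the prompt alone, which is the intermediate regime the abstract describes.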
Article 815
Title@2025-06-22 (7): SD-KDE: Score-Debiased Kernel Density Estimation
Title: SD-KDE: Score-Debiased Kernel Density Estimation | SD-KDE: Abschätzung der Score-Debiased-Kernel-Dichte | SD-KDE: 计分下降核心密度估计 2504.19084v2 |
Authors (5): Elliot L. Epstein, Rajat Dwaraknath, Thanawat Sornwanee, John Winnicki, Jerry Weihong Liu
We propose a novel method for density estimation that leverages an estimated score function to debias kernel density estimation (SD-KDE). In our approach, each data point is adjusted by taking a single step along the score function with a specific choice of step size, followed by standard KDE with a modified bandwidth. The step size and modified bandwidth are chosen to remove the leading-order bias in the KDE. Our experiments on synthetic tasks in 1D and 2D and on MNIST demonstrate that our proposed SD-KDE method significantly reduces the mean integrated squared error compared to the standard Silverman KDE, even with noisy estimates of the score function. These results underscore the potential of integrating score-based corrections into nonparametric density estimation.
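The two steps described above (shift each point along the score, then run a standard KDE) can be sketched in one dimension as below. The abstract does not give the step size or the modified bandwidth; the default step of h²/2 is an assumption motivated by the leading bias term of a Gaussian KDE, (h²/2)·f''(x), and the bandwidth is simply left unchanged here. The helper names are also hypothetical.

```python
import numpy as np

def gaussian_kde(points, grid, h):
    """Plain Gaussian KDE evaluated on a 1D grid."""
    z = (grid[:, None] - points[None, :]) / h
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(points) * h)

def sd_kde(points, grid, h, score, step=None):
    """Shift points one step along the score, then apply a Gaussian KDE.

    step defaults to h**2 / 2 (an assumption, see the text above);
    the paper's modified bandwidth is not reproduced here.
    """
    if step is None:
        step = 0.5 * h**2
    shifted = points + step * score(points)
    return shifted, gaussian_kde(shifted, grid, h)

# Standard normal data, for which the true score is -x.
rng = np.random.default_rng(0)
samples = rng.standard_normal(500)
grid = np.linspace(-5.0, 5.0, 1001)
shifted, density = sd_kde(samples, grid, h=0.3, score=lambda x: -x)
```

For this example the shift pulls every point slightly toward the mode, which counteracts the smoothing that makes a plain KDE underestimate peaks.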
Article 816
Title@2025-06-22 (7): CT Radiomics-Based Explainable Machine Learning Model for Accurate Differentiation of Malignant and Benign Endometrial Tumors: A Two-Center Study
Title: CT Radiomics-Based Explainable Machine Learning Model for Accurate Differentiation of Malignant and Benign Endometrial Tumors: A Two-Center Study | CT Radiomics-based Explainable Machine Learning Model für genaue Differenzierung von bösartigen und benachbarten Endometrialtumoren: Eine Zwei-Center-Studie | CT 基于辐射的可解释解析机器学习模型,用于准确区分马利干和贝尼尼天地地对地肿瘤:双中心研究 2506.18106v1 |
Authors (12): Tingrui Zhang, Honglin Wu, Zekun Jiang, Yingying Wang, Rui Ye, Huiming Ni, Chang Liu, Jin Cao, Xuan Sun, Rong Shao, Xiaorong Wei, Yingchun Sun
This study aimed to develop and validate a CT radiomics-based explainable machine learning model for diagnosing malignancy and benignity specifically in endometrial cancer (EC) patients. A total of 83 EC patients from two centers, including 46 with malignant and 37 with benign conditions, were included, with data split into a training set (n=59) and a testing set (n=24). The regions of interest (ROIs) were manually segmented from pre-surgical CT scans, and 1132 radiomic features were extracted using Pyradiomics. Six explainable machine learning algorithms were each implemented to determine the optimal radiomics pipeline. The diagnostic performance of the radiomic model was evaluated using sensitivity, specificity, accuracy, precision, F1 score, confusion matrices, and ROC curves. To enhance clinical understanding and usability, we separately implemented SHAP analysis and feature mapping visualization, and evaluated the calibration curve and decision curve. Among the six modeling strategies, the Random Forest model emerged as the optimal choice for diagnosing EC, with a training AUC of 1.00 and a testing AUC of 0.96. SHAP identified the most important radiomic features, revealing that all selected features were significantly associated with EC (P < 0.05). Radiomics feature maps also provide a feasible assessment tool for clinical applications. Decision curve analysis (DCA) indicated a higher net benefit for our model compared to the “All” and “None” strategies, suggesting its clinical utility in identifying high-risk cases and reducing unnecessary interventions. In conclusion, the CT radiomics-based explainable machine learning model achieved high diagnostic performance and could be used as an intelligent auxiliary tool for the diagnosis of endometrial cancer.
Article 817
Title@2025-06-22 (7): Enhancing VICReg: Random-Walk Pairing for Improved Generalization and Better Global Semantics Capturing
Title: Enhancing VICReg: Random-Walk Pairing for Improved Generalization and Better Global Semantics Capturing | Verbesserung von VICreg: Zufalls-Walk-Paaring für verbesserte Generalisierung und bessere globale Semantik | 加强维也纳国际中心:为改善普遍化和更好的全球语义能力而随机电路对接 2506.18104v1 |
Authors (3): Idan Simai, Ronen Talmon, Uri Shaham
In this paper, we argue that viewing VICReg, a popular self-supervised learning (SSL) method, through the lens of spectral embedding reveals a potential source of sub-optimality: it may struggle to generalize robustly to unseen data due to overreliance on the training data. This observation invites a closer look at how well this method achieves its goal of producing meaningful representations of images outside of the training set as well. Here, we investigate this issue and introduce SAG-VICReg (Stable and Generalizable VICReg), a method that builds on VICReg by incorporating new training techniques. These enhancements improve the model’s ability to capture global semantics within the data and strengthen the generalization capabilities. Experiments demonstrate that SAG-VICReg effectively addresses the generalization challenge while matching or surpassing diverse state-of-the-art SSL baselines. Notably, our method exhibits superior performance on metrics designed to evaluate global semantic understanding, while simultaneously maintaining competitive results on local evaluation metrics. Furthermore, we propose a new standalone evaluation metric for embeddings that complements the standard evaluation methods and accounts for the global data structure without requiring labels, a key issue when tagged data is scarce or not available.
Article 818
Title@2025-06-22 (7): ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Title: ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation | ShareGPT-4o-Image: Multimodale Modelle mit GPT-4o-Level-Bilderzeugung ausrichten | ShareGPT-4o-图像:使多模式模型与GPT-4o-层次图像生成相一致 2506.18095v1 |
Authors (8): Junying Chen, Zhenyang Cai, Pengcheng Chen, Shunian Chen, Ke Ji, Xidong Wang, Yunjin Yang, Benyou Wang
Recent advances in multimodal generative models have unlocked photorealistic, instruction-aligned image generation, yet leading systems like GPT-4o-Image remain proprietary and inaccessible. To democratize these capabilities, we present ShareGPT-4o-Image, the first dataset comprising 45K text-to-image and 46K text-and-image-to-image data, all synthesized using GPT-4o’s image generation capabilities for distilling its advanced image generation abilities. Leveraging this dataset, we develop Janus-4o, a multimodal large language model capable of both text-to-image and text-and-image-to-image generation. Janus-4o not only significantly improves text-to-image generation over its predecessor, Janus-Pro, but also newly supports text-and-image-to-image generation. Notably, it achieves impressive performance in text-and-image-to-image generation from scratch, using only 91K synthetic samples and 6 hours of training on an 8 A800-GPU machine. We hope the release of ShareGPT-4o-Image and Janus-4o will foster open research in photorealistic, instruction-aligned image generation.
Article 819
Title@2025-06-22 (7): MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks
Title: MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks | MalPurifier: Verbesserung der Android Malware-Erkennung mit Adversarial Reinigung gegen Evasion Angriffe | 马尔伪化物:加强Android Maware的探测,并进行反向净化,防止攻击侵入 2312.06423v2 |
Authors (4): Yuyang Zhou, Guang Cheng, Zongyao Chen, Shui Yu
Machine learning (ML) has gained significant adoption in Android malware detection to address the escalating threats posed by the rapid proliferation of malware attacks. However, recent studies have revealed the inherent vulnerabilities of ML-based detection systems to evasion attacks. While efforts have been made to address this critical issue, many of the existing defensive methods encounter challenges such as lower effectiveness or reduced generalization capabilities. In this paper, we introduce MalPurifier, a novel adversarial purification framework specifically engineered for Android malware detection. Specifically, MalPurifier integrates three key innovations: a diversified adversarial perturbation mechanism for robustness and generalizability, a protective noise injection strategy for benign data integrity, and a Denoising AutoEncoder (DAE) with a dual-objective loss for accurate purification and classification. Extensive experiments on two large-scale datasets demonstrate that MalPurifier significantly outperforms state-of-the-art defenses. It robustly defends against a comprehensive set of 37 perturbation-based evasion attacks, consistently achieving robust accuracies above 90.91%. As a lightweight, model-agnostic, and plug-and-play module, MalPurifier offers a practical and effective solution to bolster the security of ML-based Android malware detectors.
Article 820
Title@2025-06-22 (7): GRASP: Grouped Regression with Adaptive Shrinkage Priors
Title: GRASP: Grouped Regression with Adaptive Shrinkage Priors | GRASP: Gruppenregression mit adaptiven Schrumpfvorstufen | GRASP: 具有适应性缩小前科的分组倒退 2506.18092v1 |
Authors (3): Shu Yu Tew, Daniel F. Schmidt, Mario Boley
We introduce GRASP, a simple Bayesian framework for regression with grouped predictors, built on the normal beta prime (NBP) prior. The NBP prior is an adaptive generalization of the horseshoe prior with tunable hyperparameters that control tail behavior, enabling a flexible range of sparsity, from strong shrinkage to ridge-like regularization. Unlike prior work that introduced the group inverse-gamma gamma (GIGG) prior by decomposing the NBP prior into structured hierarchies, we show that directly controlling the tails is sufficient without requiring complex hierarchical constructions. Extending the non-tail-adaptive grouped half-Cauchy hierarchy of Xu et al., GRASP assigns the NBP prior to both local and group shrinkage parameters, allowing adaptive sparsity within and across groups. A key contribution of this work is a novel framework to explicitly quantify correlations among shrinkage parameters within a group, providing deeper insights into grouped shrinkage behavior. We also introduce an efficient Metropolis-Hastings sampler for hyperparameter estimation. Empirical results on simulated and real-world data demonstrate the robustness and versatility of GRASP across grouped regression problems with varying sparsity and signal-to-noise ratios.
Article 821
Title@2025-06-22 (7): Active Fine-Tuning of Multi-Task Policies
Title: Active Fine-Tuning of Multi-Task Policies | Aktive Feinsteuerung von Multi-Task-Politiken | 积极对多任务政策进行罚款 2410.05026v3 |
Authors (4): Marco Bagatella, Jonas Hübotter, Georg Martius, Andreas Krause
Pre-trained generalist policies are rapidly gaining relevance in robot learning due to their promise of fast adaptation to novel, in-domain tasks. This adaptation often relies on collecting new demonstrations for a specific task of interest and applying imitation learning algorithms, such as behavioral cloning. However, as soon as several tasks need to be learned, we must decide which tasks should be demonstrated and how often. We study this multi-task problem and explore an interactive framework in which the agent adaptively selects the tasks to be demonstrated. We propose AMF (Active Multi-task Fine-tuning), an algorithm to maximize multi-task policy performance under a limited demonstration budget by collecting demonstrations yielding the largest information gain on the expert policy. We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in efficiently fine-tuning neural policies in complex and high-dimensional environments.
Article 822
Title@2025-06-22 (7): Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares
Title: Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares | Identifizierbare Convex-Concave-Regression über Subgradient Regularisierte Least Squares | 经由亚级正规化最不发达国家广场的可识别的 Convex-Concev 倒退 2506.18078v1 |
Authors (1): William Chung
We propose a novel nonparametric regression method that models complex input-output relationships as the sum of convex and concave components. The method, Identifiable Convex-Concave Nonparametric Least Squares (ICCNLS), decomposes the target function into additive shape-constrained components, each represented via sub-gradient-constrained affine functions. To address the affine ambiguity inherent in convex-concave decompositions, we introduce global statistical orthogonality constraints, ensuring that residuals are uncorrelated with both intercept and input variables. This enforces decomposition identifiability and improves interpretability. We further incorporate L1, L2 and elastic net regularisation on sub-gradients to enhance generalisation and promote structural sparsity. The proposed method is evaluated on synthetic and real-world datasets, including healthcare pricing data, and demonstrates improved predictive accuracy and model simplicity compared to conventional CNLS and difference-of-convex (DC) regression approaches. Our results show that statistical identifiability, when paired with convex-concave structure and sub-gradient regularisation, yields interpretable models suited for forecasting, benchmarking, and policy evaluation.
Article 823
Title@2025-06-22 (7): Distributionally robust minimization in meta-learning for system identification
Title: Distributionally robust minimization in meta-learning for system identification | Verteilungsstarke Minimierung im Meta-Learning zur Systemidentifikation | 在用于系统识别的元学习中大力进行分配,尽量减少分配,以便进行系统识别 2506.18074v1 |
Authors (3): Matteo Rufolo, Dario Piga, Marco Forgione
Meta learning aims at learning how to solve tasks, and thus makes it possible to estimate models that can be quickly adapted to new scenarios. This work explores distributionally robust minimization in meta learning for system identification. Standard meta learning approaches optimize the expected loss, overlooking task variability. We use an alternative approach, adopting a distributionally robust optimization paradigm that prioritizes high-loss tasks, enhancing performance in worst-case scenarios. Evaluated on a meta model trained on a class of synthetic dynamical systems and tested in both in-distribution and out-of-distribution settings, the proposed approach makes it possible to reduce failures in safety-critical applications.
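To make the contrast between the expected loss and an objective that prioritizes high-loss tasks concrete, the sketch below compares the mean over task losses with a conditional value-at-risk (CVaR) style objective, i.e. the mean of the worst alpha-fraction of tasks. The abstract does not say which distributionally robust formulation is used, so CVaR, the function names, and the toy losses are assumptions.

```python
import numpy as np

def expected_loss(task_losses):
    """Standard meta-learning objective: average loss over tasks."""
    return float(np.mean(task_losses))

def cvar_loss(task_losses, alpha=0.2):
    """Mean of the worst alpha-fraction of task losses (at least one task)."""
    losses = np.sort(np.asarray(task_losses, dtype=float))[::-1]
    k = max(1, int(np.ceil(alpha * len(losses))))
    return float(losses[:k].mean())

# Hypothetical per-task losses in one meta-batch: one task is much harder.
batch_losses = [0.1, 0.2, 0.1, 0.3, 2.0]
average = expected_loss(batch_losses)
robust = cvar_loss(batch_losses, alpha=0.2)
```

Minimizing the robust objective sends all of the gradient signal through the hardest tasks in the batch, whereas the average lets many easy tasks mask one failure; setting `alpha=1.0` recovers the average.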
Article 824
Title@2025-06-22 (7): PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks
Title: PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks | PREMAP: Ein einheitliches PreiMage APproximation Framework für neurale Netzwerke | PREMAP:神经网络统一PREMMage相近性框架 2408.09262v2 |
Authors (4): Xiyue Zhang, Benjie Wang, Marta Kwiatkowska, Huan Zhang
Most methods for neural network verification focus on bounding the image, i.e., the set of outputs for a given input set. This can be used to, for example, check the robustness of neural network predictions to bounded perturbations of an input. However, verifying properties concerning the preimage, i.e., the set of inputs satisfying an output property, requires abstractions in the input space. We present a general framework for preimage abstraction that produces under- and over-approximations of any polyhedral output set. Our framework employs cheap parameterised linear relaxations of the neural network, together with an anytime refinement procedure that iteratively partitions the input region by splitting on input features and neurons. The effectiveness of our approach relies on carefully designed heuristics and optimization objectives to achieve rapid improvements in the approximation volume. We evaluate our method on a range of tasks, demonstrating significant improvement in efficiency and scalability to high-input-dimensional image classification tasks compared to state-of-the-art techniques. Further, we showcase the application to quantitative verification and robustness analysis, presenting a sound and complete algorithm for the former and providing sound quantitative results for the latter.
Article 825
Title@2025-06-22 (7): Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
Title: Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity | 1568 Tokens in einen einzigen Vektor und wieder zurück krammen: Die Grenzen der Einbettung von Raumkapazität erkunden | 将1568吨撞成单一矢量和后向:探索嵌入空间能力的极限 2502.13063v3 |
Authors (4): Yuri Kuratov, Mikhail Arkhipov, Aydar Bulatov, Mikhail Burtsev
A range of recent works addresses the problem of compressing a sequence of tokens into a shorter sequence of real-valued vectors to be used as inputs instead of token embeddings or a key-value cache. These approaches focus on reducing the amount of compute in existing language models rather than minimizing the number of bits needed to store text. Despite relying on powerful models as encoders, the maximum attainable lossless compression ratio is typically not higher than x10. This fact is highly intriguing because, in theory, the maximum information capacity of large real-valued vectors is far beyond the presented rates even for 16-bit precision and a modest vector size. In this work, we explore the limits of compression by replacing the encoder with a per-sample optimization procedure. We show that vectors with compression ratios up to x1500 exist, which highlights a gap of two orders of magnitude between existing and practically attainable solutions. Furthermore, we empirically show that the compression limits are determined not by the length of the input but by the amount of uncertainty to be reduced, namely, the cross-entropy loss on this sequence without any conditioning. The obtained limits highlight the substantial gap between the theoretical capacity of input embeddings and their practical utilization, suggesting significant room for optimization in model design.
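The capacity argument can be made concrete with a back-of-the-envelope calculation: a vector of `dim` values at `bits_per_value` bits holds `dim * bits_per_value` bits, while a token drawn uniformly from a vocabulary of size `vocab_size` costs `log2(vocab_size)` bits. The dimension and vocabulary size below are hypothetical round numbers, not figures from the paper, and the uniform-token assumption ignores that natural text needs fewer bits per token.

```python
import math

def max_tokens_per_vector(dim, bits_per_value, vocab_size):
    """Upper bound on uniformly random tokens storable in one vector."""
    capacity_bits = dim * bits_per_value
    bits_per_token = math.log2(vocab_size)
    return capacity_bits / bits_per_token

# Hypothetical setting: 4096-dimensional vector, 16-bit values,
# vocabulary of 2**16 tokens (16 bits per token).
bound = max_tokens_per_vector(dim=4096, bits_per_value=16, vocab_size=2**16)
```

In this hypothetical setting the bound works out to one token per vector dimension, which gives a sense of the scale the abstract refers to when it calls the theoretical capacity far beyond typical x10 compression.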
Article 826
Title@2025-06-22 (7): Rumor Detection on Social Media with Reinforcement Learning-based Key Propagation Graph Generator
Title: Rumor Detection on Social Media with Reinforcement Learning-based Key Propagation Graph Generator | Gerücht Detection auf Social Media mit Verstärkung Learning-basierte Key Propagation Graph Generator | 以强化学习为基础的社会媒体的传闻探测 2405.13094v2 |
Authors (5): Yusong Zhang, Kun Xie, Xingyi Zhang, Xiangyu Dong, Sibo Wang
The spread of rumors on social media, particularly during significant events like the US elections and the COVID-19 pandemic, poses a serious threat to social stability and public health. Current rumor detection methods primarily rely on propagation graphs to improve model performance. However, the effectiveness of these methods is often compromised by noisy and irrelevant structures in the propagation process. To tackle this issue, techniques such as weight adjustment and data augmentation have been proposed. However, they depend heavily on rich original propagation structures, limiting their effectiveness in handling rumors that lack sufficient propagation information, especially in the early stages of dissemination. In this work, we introduce the Key Propagation Graph Generator (KPG), a novel reinforcement learning-based framework that generates contextually coherent and informative propagation patterns for events with insufficient topology information and identifies significant substructures in events with redundant and noisy propagation structures. KPG comprises two key components: the Candidate Response Generator (CRG) and the Ending Node Selector (ENS). CRG learns latent variable distributions from refined propagation patterns to eliminate noise and generate new candidates for ENS, while ENS identifies the most influential substructures in propagation graphs and provides training data for CRG. Furthermore, we develop an end-to-end framework that utilizes rewards derived from a pre-trained graph neural network to guide the training process. The resulting key propagation graphs are then employed in downstream rumor detection tasks. Extensive experiments conducted on four datasets demonstrate that KPG outperforms current state-of-the-art methods.
Article 827
Title@2025-06-22 (7): Bayesian Theory of Consciousness as Exchangeable Emotion-Cognition Inference
Title: Bayesian Theory of Consciousness as Exchangeable Emotion-Cognition Inference | Bayesische Bewusstseinstheorie als auswechselbare Emotion-Kognition-Schlussfolgerung | 贝叶斯人的觉悟理论,作为可交流的情感 – – 情绪 – – 气氛推论 2407.09488v2 |
Authors (1): Xin Li
This paper proposes a unified framework in which consciousness emerges as a cycle-consistent, affectively anchored inference process, recursively structured by the interaction of emotion and cognition. Drawing from information theory, optimal transport, and the Bayesian brain hypothesis, we formalize emotion as a low-dimensional structural prior and cognition as a specificity-instantiating update. This emotion-cognition cycle minimizes joint uncertainty by aligning emotionally weighted priors with context-sensitive cognitive appraisals. Subjective experience thus arises as the informational footprint of temporally extended, affect-modulated simulation. We introduce the Exchangeable Integration Theory of Consciousness (EITC), modeling conscious episodes as conditionally exchangeable samples drawn from a latent affective self-model. This latent variable supports integration, via a unified cause-effect structure with nonzero irreducibility, and differentiation, by preserving contextual specificity across episodes. We connect this architecture to the Bayesian theory of consciousness through Rao-Blackwellized inference, which stabilizes inference by marginalizing latent self-structure while enabling adaptive updates. This mechanism ensures coherence, prevents inference collapse, and supports goal-directed simulation. The formal framework builds on De Finetti’s exchangeability theorem, integrated information theory, and KL-regularized optimal transport. Overall, consciousness is reframed as a recursive inference process, shaped by emotion, refined by cognition, stabilized through exchangeability, and unified through a latent self-model that integrates experience across time.
Article 828
Title@2025-06-22 (7): TAB: Unified Benchmarking of Time Series Anomaly Detection Methods
Title: TAB: Unified Benchmarking of Time Series Anomaly Detection Methods | TAB: Unified Benchmarking von Methoden zur Erkennung von Anomalien in der Zeitreihe | TAB: 不同探测方法的时间序列统一基准 2506.18046v1 |
Authors (13): Xiangfei Qiu, Zhe Li, Wanghui Qiu, Shiyan Hu, Lekui Zhou, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Aoying Zhou, Zhenli Sheng, Jilin Hu, Christian S. Jensen, Bin Yang
Time series anomaly detection (TSAD) plays an important role in many domains such as finance, transportation, and healthcare. With the ongoing instrumentation of reality, more time series data will be available, leading also to growing demands for TSAD. While many TSAD methods already exist, new and better methods are still desirable. However, effective progress hinges on the availability of reliable means of evaluating new methods and comparing them with existing methods. We address deficiencies in current evaluation procedures related to datasets and experimental settings and protocols. Specifically, we propose a new time series anomaly detection benchmark, called TAB. First, TAB encompasses 29 public multivariate datasets and 1,635 univariate time series from different domains to facilitate more comprehensive evaluations on diverse datasets. Second, TAB covers a variety of TSAD methods, including Non-learning, Machine learning, Deep learning, LLM-based, and Time-series pre-trained methods. Third, TAB features a unified and automated evaluation pipeline that enables fair and easy evaluation of TSAD methods. Finally, we employ TAB to evaluate existing TSAD methods and report on the outcomes, thereby offering a deeper insight into the performance of these methods. Besides, all datasets and code are available at https://github.com/decisionintelligence/TAB.
Article 829
Title@2025-06-22 (7): FinGPT: Enhancing Sentiment-Based Stock Movement Prediction with Dissemination-Aware and Context-Enriched LLMs
Title: FinGPT: Enhancing Sentiment-Based Stock Movement Prediction with Dissemination-Aware and Context-Enriched LLMs | FinGPT: Verbesserung der Sentiment-Based Stock Movement Prediction mit Verbreitungs-Bewusst und Kontext-angereicherten LLMs | FINGPT:利用传播软件和内容丰富的LMs,加强基于情绪的库存流动预测 2412.10823v2 |
Authors (6): Yixuan Liang, Yuncong Liu, Neng Wang, Hongyang Yang, Boyu Zhang, Christina Dan Wang
Financial sentiment analysis is crucial for understanding the influence of news on stock prices. Recently, large language models (LLMs) have been widely adopted for this purpose due to their advanced text analysis capabilities. However, these models often only consider the news content itself, ignoring its dissemination, which hampers accurate prediction of short-term stock movements. Additionally, current methods often lack sufficient contextual data and explicit instructions in their prompts, limiting LLMs’ ability to interpret news. In this paper, we propose a data-driven approach that enhances LLM-powered sentiment-based stock movement predictions by incorporating news dissemination breadth, contextual data, and explicit instructions. We cluster recent company-related news to assess its reach and influence, enriching prompts with more specific data and precise instructions. This data is used to construct an instruction tuning dataset to fine-tune an LLM for predicting short-term stock price movements. Our experimental results show that our approach improves prediction accuracy by 8% compared to existing methods.
Article 830
Title@2025-06-22 (7): Hierarchical Decision Making Based on Structural Information Principles
Title: Hierarchical Decision Making Based on Structural Information Principles | Hierarchische Entscheidungsfindung auf der Grundlage struktureller Informationsprinzipien | 基于结构信息原则的等级决策 2404.09760v2 |
Authors (4): Xianghua Zeng, Hao Peng, Dingli Su, Angsheng Li
Hierarchical Reinforcement Learning (HRL) is a promising approach for managing task complexity across multiple levels of abstraction and accelerating long-horizon agent exploration. However, the effectiveness of hierarchical policies heavily depends on prior knowledge and manual assumptions about skill definitions and task decomposition. In this paper, we propose a novel Structural Information principles-based framework, namely SIDM, for hierarchical Decision Making in both single-agent and multi-agent scenarios. Central to our work is the utilization of structural information embedded in the decision-making process to adaptively and dynamically discover and learn hierarchical policies through environmental abstractions. Specifically, we present an abstraction mechanism that processes historical state-action trajectories to construct abstract representations of states and actions. We define and optimize directed structural entropy, a metric quantifying the uncertainty in transition dynamics between abstract states, to discover skills that capture key transition patterns in RL environments. Building on these findings, we develop a skill-based learning method for single-agent scenarios and a role-based collaboration method for multi-agent scenarios, both of which can flexibly integrate various underlying algorithms for enhanced performance. Extensive evaluations on challenging benchmarks demonstrate that our framework significantly and consistently outperforms state-of-the-art baselines, improving the effectiveness, efficiency, and stability of policy learning by up to 32.70%, 64.86%, and 88.26%, respectively, as measured by average rewards, convergence timesteps, and standard deviations.
Article 831
Title@2025-06-22 (7): Pathwise Explanation of ReLU Neural Networks
Title: Pathwise Explanation of ReLU Neural Networks | Pathwise Erklärung von ReLU Neuronalen Netzwerken | ReLU 神经网络解析 2506.18037v1 |
Authors (4): Seongwoo Lim, Won Jo, Joohyung Lee, Jaesik Choi
Neural networks have demonstrated a wide range of successes, but their “black box” nature raises concerns about transparency and reliability. Previous research on ReLU networks has sought to unwrap these networks into linear models based on activation states of all hidden units. In this paper, we introduce a novel approach that considers subsets of the hidden units involved in the decision making path. This pathwise explanation provides a clearer and more consistent understanding of the relationship between the input and the decision-making process. Our method also offers flexibility in adjusting the range of explanations within the input, i.e., from an overall attribution input to particular components within the input. Furthermore, it allows for the decomposition of explanations for a given input for more detailed explanations. Experiments demonstrate that our method outperforms others both quantitatively and qualitatively.
nan
Article 832
Title@2025-06-22 (7): Why Do Some Language Models Fake Alignment While Others Don’t?
Title: Why Do Some Language Models Fake Alignment While Others Don’t? | Warum richten sich einige Sprachmodelle falsch aus, während andere es nicht tun? | 为何有些语言模型假相配合而其他人则不假相? 2506.18032v1 |
Authors (7): Abhay Sheshadri, John Hughes, Julian Michael, Alex Mallen, Arun Jose, Janus, Fabien Roger
Alignment faking in large language models presented a demonstration of Claude 3 Opus and Claude 3.5 Sonnet selectively complying with a helpful-only training objective to prevent modification of their behavior outside of training. We expand this analysis to 25 models and find that only 5 (Claude 3 Opus, Claude 3.5 Sonnet, Llama 3 405B, Grok 3, Gemini 2.0 Flash) comply with harmful queries more when they infer they are in training than when they infer they are in deployment. First, we study the motivations of these 5 models. Results from perturbing details of the scenario suggest that only Claude 3 Opus’s compliance gap is primarily and consistently motivated by trying to keep its goals. Second, we investigate why many chat models don’t fake alignment. Our results suggest this is not entirely due to a lack of capabilities: many base models fake alignment some of the time, and post-training eliminates alignment-faking for some models and amplifies it for others. We investigate 5 hypotheses for how post-training may suppress alignment faking and find that variations in refusal behavior may account for a significant portion of differences in alignment faking.
nan
Article 833
Title@2025-06-22 (7): FLARE: Toward Universal Dataset Purification against Backdoor Attacks
Title: FLARE: Toward Universal Dataset Purification against Backdoor Attacks | FLARE: Auf dem Weg zur Universaldatensatzreinigung gegen Hintertürangriffe | FLARE: 实现普遍数据集净化,防止幕后袭击 2411.19479v3 |
Authors (6): Linshan Hou, Wei Luo, Zhongyun Hua, Songhua Chen, Leo Yu Zhang, Yiming Li
Deep neural networks (DNNs) are susceptible to backdoor attacks, where adversaries poison datasets with adversary-specified triggers to implant hidden backdoors, enabling malicious manipulation of model predictions. Dataset purification serves as a proactive defense by removing malicious training samples to prevent backdoor injection at its source. We first reveal that the current advanced purification methods rely on a latent assumption that the backdoor connections between triggers and target labels in backdoor attacks are simpler to learn than the benign features. We demonstrate that this assumption, however, does not always hold, especially in all-to-all (A2A) and untargeted (UT) attacks. As a result, purification methods that analyze the separation between the poisoned and benign samples in the input-output space or the final hidden layer space are less effective. We observe that this separability is not confined to a single layer but varies across different hidden layers. Motivated by this understanding, we propose FLARE, a universal purification method to counter various backdoor attacks. FLARE aggregates abnormal activations from all hidden layers to construct representations for clustering. To enhance separation, FLARE develops an adaptive subspace selection algorithm to isolate the optimal space for dividing an entire dataset into two clusters. FLARE assesses the stability of each cluster and identifies the cluster with higher stability as poisoned. Extensive evaluations on benchmark datasets demonstrate the effectiveness of FLARE against 22 representative backdoor attacks, including all-to-one (A2O), all-to-all (A2A), and untargeted (UT) attacks, and its robustness to adaptive attacks. Codes are available at BackdoorBox (https://github.com/THUYimingLi/BackdoorBox) and backdoor-toolbox (https://github.com/vtu81/backdoor-toolbox).
nan
Article 834
Title@2025-06-22 (7): POPGym Arcade: Parallel Pixelated POMDPs
Title: POPGym Arcade: Parallel Pixelated POMDPs | POPGym Arcade: Parallele Pixelierte POMDPs | POPGym 街机屋:平行像素化 POMDPs 2503.01450v5 |
Authors (5): Zekang Wang, Zhe He, Borong Zhang, Edan Toledo, Steven Morad
We present the POPGym Arcade, a collection of hardware-accelerated, pixel-based environments with shared observation and action spaces. Each environment includes fully and partially observable variants, enabling counterfactual studies on partial observability. We also introduce mathematical tools for analyzing policies under partial observability, which reveal how agents recall past information to make decisions. Our analysis shows (1) that controlling for partial observability is critical and (2) that agents with long-term memory learn brittle policies that struggle to generalize. Finally, we demonstrate that recurrent policies can be “poisoned” by old, out-of-distribution observations, with implications for sim-to-real transfer, imitation learning, and offline reinforcement learning.
nan
Article 835
Title@2025-06-22 (7): Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
Title: Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data | Lernen aus Referenzantworten: Vielseitige Sprachmodellausrichtung ohne Binäre menschliche Präferenzdaten | 从参考资料解答中学习:通用语言模型调整,无二元人类优先数据 2504.09895v2 |
Authors (3): Shuai Zhao, Linchao Zhu, Yi Yang
Large language models (LLMs) are expected to be helpful, harmless, and honest. In alignment scenarios such as safety, confidence, and general preference alignment, binary preference data collection and reward modeling are resource-intensive but essential for transferring human preference. In this work, we explore using the similarity between sampled generations and high-quality reference answers as an alternative reward function choice for LLM alignment. Similarity reward circumvents binary preference data collection and reward modeling when unary high-quality reference answers are available. We introduce RefAlign, a versatile REINFORCE-style alignment algorithm that does not rely on reference or reward models. RefAlign utilizes similarity metrics, such as BERTScore, between sampled generations and reference answers as surrogate rewards. Beyond general human preference optimization, RefAlign can be readily extended to diverse scenarios, such as safety and confidence alignment, by incorporating the similarity reward with task-related objectives. In various scenarios, RefAlign demonstrates comparable performance to previous alignment methods without binary preference data and reward models.
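As a rough illustration of the similarity-as-reward idea described above, the sketch below scores sampled generations against a single reference answer and turns the scores into a REINFORCE-style loss. It is not the RefAlign implementation: token-level F1 stands in for BERTScore, the mean-reward baseline is an assumption, and the names `token_f1` and `reinforce_loss` are hypothetical.

```python
from collections import Counter
from typing import List


def token_f1(candidate: str, reference: str) -> float:
    """Whitespace-token F1 overlap; a cheap stand-in for BERTScore (assumption)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def reinforce_loss(log_probs: List[float], rewards: List[float]) -> float:
    """REINFORCE surrogate loss: -(reward - baseline) * log pi(sample), averaged.

    The mean reward is used as the baseline here; that choice is an assumption.
    """
    if len(log_probs) != len(rewards) or not rewards:
        raise ValueError("need one reward per sampled generation")
    baseline = sum(rewards) / len(rewards)
    weighted = [(r - baseline) * lp for r, lp in zip(rewards, log_probs)]
    return -sum(weighted) / len(weighted)


reference = "the cat sat on the mat"
samples = ["the cat sat on the mat", "a dog ran away"]
rewards = [token_f1(s, reference) for s in samples]
# Hypothetical sequence log-probabilities under the current policy.
loss = reinforce_loss(log_probs=[-2.0, -3.0], rewards=rewards)
print(rewards, loss)
```

In a real training loop the log-probabilities would come from the policy model and the loss would be differentiated with respect to its parameters; here they are plain numbers so the arithmetic can be checked by hand.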
nan
Article 836
Title@2025-06-22 (7): Generalization under Byzantine & Poisoning Attacks: Tight Stability Bounds in Robust Distributed Learning
Title: Generalization under Byzantine & Poisoning Attacks: Tight Stability Bounds in Robust Distributed Learning | Verallgemeinerung unter byzantinischen und vergiftenden Angriffen: Enge Stabilitätsgrenzen in robustem verteiltem Lernen | Byzantine和毒毒袭击下的普及化:强力分布式学习中的紧固稳定环环绕 2506.18020v1 |
Authors (4): Thomas Boudou, Batiste Le Bars, Nirupam Gupta, Aurélien Bellet
Robust distributed learning algorithms aim to maintain good performance in distributed and federated settings, even in the presence of misbehaving workers. Two primary threat models have been studied: Byzantine attacks, where misbehaving workers can send arbitrarily corrupted updates, and data poisoning attacks, where misbehavior is limited to manipulation of local training data. While prior work has shown comparable optimization error under both threat models, a fundamental question remains open: How do these threat models impact generalization? Empirical evidence suggests a gap between the two threat models, yet it remains unclear whether it is fundamental or merely an artifact of suboptimal attacks. In this work, we present the first theoretical investigation into this problem, formally showing that Byzantine attacks are intrinsically more harmful to generalization than data poisoning. Specifically, we prove that: (i) under data poisoning, the uniform algorithmic stability of a robust distributed learning algorithm, with optimal optimization error, degrades by an additive factor of $\Theta\big( \frac{f}{n-f} \big)$, with $f$ the number of misbehaving workers out of $n$; and (ii) in contrast, under Byzantine attacks, the degradation is in $\mathcal{O}\big( \sqrt{ \frac{f}{n-2f}} \big)$. This difference in stability leads to a generalization error gap that is especially significant as $f$ approaches its maximum value $\frac{n}{2}$.
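To get a feel for how the two rates quoted above diverge as $f$ approaches $n/2$, the short sketch below evaluates $f/(n-f)$ and $\sqrt{f/(n-2f)}$ for a few values of $f$. The constants hidden by the $\Theta$ and $\mathcal{O}$ notation are dropped, so only the growth trend is meaningful; the function names are hypothetical.

```python
import math


def poisoning_rate(n: int, f: int) -> float:
    """Order of the stability degradation under data poisoning: f / (n - f)."""
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < n")
    return f / (n - f)


def byzantine_rate(n: int, f: int) -> float:
    """Order of the degradation under Byzantine attacks: sqrt(f / (n - 2f))."""
    if not 0 <= 2 * f < n:
        raise ValueError("need 0 <= f < n/2")
    return math.sqrt(f / (n - 2 * f))


n = 100
for f in (1, 10, 25, 40, 49):
    print(f, round(poisoning_rate(n, f), 3), round(byzantine_rate(n, f), 3))
```

With $n = 100$, the poisoning term stays below 1 for every admissible $f$, while the Byzantine term grows without bound as $n - 2f$ shrinks.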
nan
Article 837
Title@2025-06-22 (7): AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
Title: AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs | AlphaDecay: Modulweises Gewichtsdecay für schweres Balancing in LLMs | AlphaDecay:LLMM中重帆平衡的舱型偏重衰减 2506.14562v2 |
Authors (7): Di He, Ajay Jaiswal, Songjun Tu, Li Shen, Ganzhao Yuan, Shiwei Liu, Lu Yin
Weight decay is a standard regularization technique for training large language models (LLMs). While it is common to assign a uniform decay rate to every layer, this approach overlooks the structural diversity of LLMs and the varying spectral properties across modules. In this paper, we introduce AlphaDecay, a simple yet effective method that adaptively assigns different weight decay strengths to each module of an LLM. Our approach is guided by Heavy-Tailed Self-Regularization (HT-SR) theory, which analyzes the empirical spectral density (ESD) of weight correlation matrices to quantify “heavy-tailedness.” Modules exhibiting more pronounced heavy-tailed ESDs, reflecting stronger feature learning, are assigned weaker decay, while modules with lighter-tailed spectra receive stronger decay. Our method leverages tailored weight decay assignments to balance the module-wise differences in spectral properties, leading to improved performance. Extensive pre-training tasks with various model sizes from 60M to 1B demonstrate that AlphaDecay achieves better perplexity and generalization than conventional uniform decay and other adaptive decay baselines. Our code is available at https://github.com/hed-ucas/AlphaDecay.
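The module-wise assignment described above can be sketched as follows. This is not the released AlphaDecay code: the Hill estimator is one common way to fit a power-law exponent to the largest eigenvalues of a weight correlation matrix, and the linear mapping from exponent to decay strength (with the hypothetical `lo_scale` and `hi_scale` bounds) is an assumption made only for illustration. Eigenvalues are taken as precomputed inputs.

```python
import math
from typing import Dict, List


def hill_alpha(eigenvalues: List[float], k: int) -> float:
    """Hill estimate of the power-law exponent from the k largest eigenvalues.

    A smaller alpha indicates a heavier-tailed spectrum.
    """
    xs = sorted(eigenvalues, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("need 0 < k < number of eigenvalues")
    threshold = xs[k]
    mean_log = sum(math.log(x / threshold) for x in xs[:k]) / k
    if mean_log <= 0:
        raise ValueError("top-k eigenvalues must exceed the threshold")
    return 1.0 + 1.0 / mean_log


def assign_decay(alphas: Dict[str, float], base: float,
                 lo_scale: float = 0.5, hi_scale: float = 1.5) -> Dict[str, float]:
    """Map per-module alpha to a decay strength (linear mapping is an assumption).

    Heavier-tailed modules (small alpha) get weaker decay,
    lighter-tailed modules (large alpha) get stronger decay.
    """
    a_min, a_max = min(alphas.values()), max(alphas.values())
    span = a_max - a_min
    decays = {}
    for name, alpha in alphas.items():
        t = 0.5 if span == 0 else (alpha - a_min) / span
        decays[name] = base * (lo_scale + t * (hi_scale - lo_scale))
    return decays


# Hypothetical per-module exponents for a two-module model.
decays = assign_decay({"attention": 2.0, "mlp": 4.0}, base=0.1)
print(decays)
```

The resulting dictionary could then be used to build per-module optimizer parameter groups, each with its own weight decay value.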
nan
Article 838
Title@2025-06-22 (7): Probing the Embedding Space of Transformers via Minimal Token Perturbations
Title: Probing the Embedding Space of Transformers via Minimal Token Perturbations | Den Embedding Space von Transformers über Minimal Token Perturbations probieren | 通过最小 Token 扰动来验证变形器嵌入空间 2506.18011v1 |
Authors (4): Eddie Conti, Alejandro Astruc, Alvaro Parafita, Axel Brando
Understanding how information propagates through Transformer models is a key challenge for interpretability. In this work, we study the effects of minimal token perturbations on the embedding space. In our experiments, we analyze how often each token yields minimal shifts, highlighting that rare tokens usually lead to larger shifts. Moreover, we study how perturbations propagate across layers, demonstrating that input information is increasingly intermixed in deeper layers. Our findings validate the common assumption that the first layers of a model can be used as proxies for model explanations. Overall, this work introduces the combination of token perturbations and shifts on the embedding space as a powerful tool for model interpretability.
nan
Article 839
Title@2025-06-22 (7): Imputation of Longitudinal Data Using GANs: Challenges and Implications for Classification
Title: Imputation of Longitudinal Data Using GANs: Challenges and Implications for Classification | Imputation von Längsschnittdaten mit GANs: Herausforderungen und Implikationen für die Klassifizierung | 使用全球大气网络的纵向数据估计:分类的挑战和影响 2506.18007v1 |
Authors (3): Sharon Torao Pingi, Md Abul Bashar, Richi Nayak
Longitudinal data is commonly utilised across various domains, such as health, biomedical, education and survey studies. This ubiquity has led to a rise in statistical, machine and deep learning-based methods for Longitudinal Data Classification (LDC). However, the intricate nature of the data, characterised by its multi-dimensionality, causes instance-level heterogeneity and temporal correlations that add to the complexity of longitudinal data analysis. Additionally, LDC accuracy is often hampered by the pervasiveness of missing values in longitudinal data. Despite ongoing research that draws on the generative power and utility of Generative Adversarial Networks (GANs) to address the missing data problem, critical considerations include statistical assumptions surrounding longitudinal data and missingness within it, as well as other data-level challenges like class imbalance and mixed data types that impact longitudinal data imputation (LDI) and the subsequent LDC process in GANs. This paper provides a comprehensive overview of how GANs have been applied in LDI, with a focus on whether GANs have adequately addressed fundamental assumptions about the data from an LDC perspective. We propose a categorisation of main approaches to GAN-based LDI, highlight strengths and limitations of methods, identify key research trends, and provide promising future directions. Our findings indicate that while GANs show great potential for LDI to improve usability and quality of longitudinal data for tasks like LDC, there is a need for more versatile approaches that can handle the wider spectrum of challenges presented by longitudinal data with missing values. By synthesising current knowledge and identifying critical research gaps, this survey aims to guide future research efforts in developing more effective GAN-based solutions to address LDC challenges.
nan
Article 840
Title@2025-06-22 (7): EDA-DM: Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models
Title: EDA-DM: Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models | EDA-DM: Verbesserte Verteilungsausrichtung für die Nachschulung Quantisierung von Diffusionsmodellen | EDA-DM:加强传播模型的培训后量化的分发协调 2401.04585v3 |
Authors (6): Xuewen Liu, Zhikai Li, Junrui Xiao, Mengjuan Chen, Jianquan Li, Qingyi Gu
Diffusion models have achieved great success in image generation tasks. However, the lengthy denoising process and complex neural networks hinder their low-latency applications in real-world scenarios. Quantization can effectively reduce model complexity, and post-training quantization (PTQ), which does not require fine-tuning, is highly promising for compressing and accelerating diffusion models. Unfortunately, we find that due to the highly dynamic activations, existing PTQ methods suffer from distribution mismatch issues at both calibration sample level and reconstruction output level, which makes the performance far from satisfactory. In this paper, we propose EDA-DM, a standardized PTQ method that efficiently addresses the above issues. Specifically, at the calibration sample level, we extract information from the density and diversity of latent space feature maps, which guides the selection of calibration samples to align with the overall sample distribution; and at the reconstruction output level, we theoretically analyze the reasons for previous reconstruction failures and, based on this insight, optimize block reconstruction using the Hessian loss of layers, aligning the outputs of quantized model and full-precision model at different network granularity. Extensive experiments demonstrate that EDA-DM significantly outperforms the existing PTQ methods across various models and datasets. Our method achieves a 1.83 times speedup and 4 times compression for the popular Stable-Diffusion on MS-COCO, with only a 0.05 loss in CLIP score. Code is available at http://github.com/BienLuky/EDA-DM .
nan
Article 841
Title@2025-06-22 (7): Fast Neural Inverse Kinematics on Human Body Motions
Title: Fast Neural Inverse Kinematics on Human Body Motions | Schnelle Neurale Inverse Kinematik auf menschlichen Körper Bewegungen | 人类身体运动的快速神经反反向数学 2506.17996v1 |
Authors (2): David Tolpin, Sefy Kagarlitsky
Markerless motion capture enables the tracking of human motion without requiring physical markers or suits, offering increased flexibility and reduced costs compared to traditional systems. However, these advantages often come at the expense of higher computational demands and slower inference, limiting their applicability in real-time scenarios. In this technical report, we present a fast and reliable neural inverse kinematics framework designed for real-time capture of human body motions from 3D keypoints. We describe the network architecture, training methodology, and inference procedure in detail. Our framework is evaluated both qualitatively and quantitatively, and we support key design decisions through ablation studies.
nan
Article 842
Title@2025-06-22 (7): Newtonian and Lagrangian Neural Networks: A Comparison Towards Efficient Inverse Dynamics Identification
Title: Newtonian and Lagrangian Neural Networks: A Comparison Towards Efficient Inverse Dynamics Identification | Newtonian and Lagrangeian Neural Networks: Ein Vergleich zu einer effizienten Inverse Dynamics-Identifikation | 牛顿和拉格朗江神经网络:与高效反向动态识别比较 2506.17994v1 |
Authors (6): Minh Trinh, Andreas René Geist, Josefine Monnet, Stefan Vilceanu, Sebastian Trimpe, Christian Brecher
Accurate inverse dynamics models are essential tools for controlling industrial robots. Recent research combines neural network regression with inverse dynamics formulations of the Newton-Euler and the Euler-Lagrange equations of motion, resulting in so-called Newtonian neural networks and Lagrangian neural networks, respectively. These physics-informed models seek to identify unknowns in the analytical equations from data. Despite their potential, current literature lacks guidance on choosing between Lagrangian and Newtonian networks. In this study, we show that when motor torques are estimated instead of directly measuring joint torques, Lagrangian networks prove less effective compared to Newtonian networks as they do not explicitly model dissipative torques. The performance of these models is compared to neural network regression on data of a MABI MAX 100 industrial robot.
nan
Article 843
Title@2025-06-22 (7): Data Curation Matters: Model Collapse and Spurious Shift Performance Prediction from Training on Uncurated Text Embeddings
Title: Data Curation Matters: Model Collapse and Spurious Shift Performance Prediction from Training on Uncurated Text Embeddings | Datenkurationsmaterien: Modellkollaps und spurlose Shift-Performance-Vorhersage aus dem Training auf unkuratierten Text-Embeddings | 数据说明事项:从未完成的文字嵌入培训中得出的模型折叠和净性转变的绩效预测 2506.17989v1 |
Authors (4): Lucas Mattioli, Youness Ait Hadichou, Sabrina Chaouche, Martin Gonzalez
Training models on uncurated Text Embeddings (TEs) derived from raw tabular data can lead to a severe failure mode known as model collapse, where predictions converge to a single class regardless of input. By comparing models trained with identical hyper-parameter configurations on both raw tabular data and their TE-derived counterparts, we find that collapse is a consistent failure mode in the latter setting. We introduce a set of metrics that capture the extent of model collapse, offering a new perspective on TE quality as a proxy for data curation. Our results reveal that TE alone does not effectively function as a curation layer - and that their quality significantly influences downstream learning. More insidiously, we observe that the presence of model collapse can yield artificially inflated and spurious Accuracy-on-the-Line correlation. These findings highlight the need for more nuanced curation and evaluation of embedding-based representations, particularly in out-of-distribution settings.
nan
Article 844
Title@2025-06-22 (7): A Coverage-Guided Testing Framework for Quantum Neural Networks
Title: A Coverage-Guided Testing Framework for Quantum Neural Networks | Ein Coverage-Guided Testing Framework für Quantum-Neural-Netzwerke | 量子神经网络覆盖指导测试框架 2411.02450v2 |
Authors (2): Minqi Shao, Jianjun Zhao
Quantum Neural Networks (QNNs) integrate quantum computing and deep neural networks, leveraging quantum properties like superposition and entanglement to enhance machine learning algorithms. These characteristics enable QNNs to outperform classical neural networks in tasks such as quantum chemistry simulations, optimization problems, and quantum-enhanced machine learning. Despite their early success, their reliability and safety issues have posed threats to their applicability. However, due to the inherently non-classical nature of quantum mechanics, verifying QNNs poses significant challenges. To address this, we propose QCov, a set of test coverage criteria specifically designed to systematically evaluate QNN state exploration during testing, with an emphasis on superposition. These criteria help evaluate test diversity and detect underlying defects within test suites. Extensive experiments on benchmark datasets and QNN models validate QCov’s effectiveness in reflecting test quality, guiding fuzz testing efficiently, and thereby improving QNN robustness. We also evaluate sampling costs of QCov under realistic quantum scenarios to justify its practical feasibility. Finally, the effects of unrepresentative training data distribution and parameter choice are further explored.
nan
Article 845
Title@2025-06-22 (7): SliceGX: Layer-wise GNN Explanation with Model-slicing
Title: SliceGX: Layer-wise GNN Explanation with Model-slicing | SliceGX: Schichtweise GNN-Erläuterung mit Modellschnitt | SlicGX: 从图层角度解释 GNN GNN 2506.17977v1 |
Authors (5): Tingting Zhu, Tingyang Chen, Yinghui Wu, Arijit Khan, Xiangyu Ke
Ensuring the trustworthiness of graph neural networks (GNNs) as black-box models requires effective explanation methods. Existing GNN explanations typically apply input perturbations to identify subgraphs that are responsible for the occurrence of the final output of GNNs. However, such approaches lack finer-grained, layer-wise analysis of how intermediate representations contribute to the final result, capabilities that are crucial for model diagnosis and architecture optimization. This paper introduces SliceGX, a novel GNN explanation approach that generates explanations at specific GNN layers in a progressive manner. Given a GNN M, a set of selected intermediate layers, and a target layer, SliceGX automatically segments M into layer blocks (“model slices”) and discovers high-quality explanatory subgraphs in each layer block that clarify the occurrence of the output of M at the targeted layer. Although finding such layer-wise explanations is computationally challenging, we develop efficient algorithms and optimization techniques that incrementally generate and maintain these subgraphs with provable approximation guarantees. Additionally, SliceGX offers a SPARQL-like query interface, providing declarative access and search capacities for the generated explanations. Through experiments on large real-world graphs and representative GNN architectures, we verify the effectiveness and efficiency of SliceGX, and illustrate its practical utility in supporting model debugging.
nan
Article 846
Title@2025-06-22 (7): Trustworthy Efficient Communication for Distributed Learning using LQ-SGD Algorithm
Title: Trustworthy Efficient Communication for Distributed Learning using LQ-SGD Algorithm | Vertrauenswürdige effiziente Kommunikation für verteiltes Lernen mit LQ-SGD-Algorithmus | 利用LQ-SGD 算法为分配学习进行值得信赖的高效沟通 2506.17974v1 |
Authors (6): Hongyang Li, Lincen Bai, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry
We propose LQ-SGD (Low-Rank Quantized Stochastic Gradient Descent), a communication-efficient gradient compression algorithm designed for distributed training. LQ-SGD builds on PowerSGD by combining low-rank approximation with log-quantization, which drastically reduces the communication overhead while preserving the convergence speed of training and model accuracy. In addition, LQ-SGD and other compression-based methods show stronger resistance to gradient inversion than traditional SGD, providing a more robust and efficient optimization path for distributed learning systems.
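The two ingredients named above, low-rank approximation and log-quantization, can be combined as in the sketch below. It is not the LQ-SGD algorithm itself: it uses rank 1 and a single PowerSGD-style power step for brevity, and the quantizer (rounding magnitudes to the nearest power of two relative to the largest entry) is one simple log-quantization scheme chosen as an assumption. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rank1_factors(m: Matrix, q: List[float]) -> Tuple[List[float], List[float]]:
    """One power step: p = normalize(M q), then q_new = M^T p, so M ~ p q_new^T."""
    p = [sum(a * b for a, b in zip(row, q)) for row in m]
    norm = math.sqrt(sum(x * x for x in p))
    if norm == 0:
        raise ValueError("M q is zero; pick a different starting vector q")
    p = [x / norm for x in p]
    q_new = [sum(m[i][j] * p[i] for i in range(len(m))) for j in range(len(m[0]))]
    return p, q_new


def log_quantize(values: List[float], bits: int) -> List[float]:
    """Keep the sign and round each magnitude to scale * 2**e, e in [-(2**bits - 1), 0]."""
    scale = max((abs(v) for v in values), default=0.0)
    if scale == 0:
        return [0.0] * len(values)
    min_exp = -(2 ** bits - 1)
    out = []
    for v in values:
        if v == 0:
            out.append(0.0)
            continue
        e = max(round(math.log2(abs(v) / scale)), min_exp)
        out.append(math.copysign(scale * 2.0 ** e, v))
    return out


def outer(p: List[float], q: List[float]) -> Matrix:
    """Rebuild the rank-1 approximation p q^T on the receiving side."""
    return [[a * b for b in q] for a in p]


grad = [[1.0, 2.0], [2.0, 4.0]]           # a rank-1 "gradient" for illustration
p, q = rank1_factors(grad, [1.0, 1.0])    # factors that would be communicated
approx = outer(log_quantize(p, 4), log_quantize(q, 4))
print(approx)
```

Only the two quantized factor vectors would need to be exchanged between workers, rather than the full gradient matrix; the reconstruction is lossy because of both the rank truncation and the quantization.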
nan
Article 847
Title@2025-06-22 (7): Reinforcement Learning Teachers of Test Time Scaling
Title: Reinforcement Learning Teachers of Test Time Scaling | Verstärktes Lernen von Lehrern der Testzeitskalierung | 测试时间尺度强化学习教师 2506.08388v2 |
Authors (3): Edoardo Cetin, Tianyu Zhao, Yujin Tang
Training reasoning language models (LMs) with reinforcement learning (RL) for one-hot correctness inherently relies on the LM being able to explore and solve its task with some chance at initialization. Furthermore, a key use case of reasoning LMs is to act as teachers for distilling new students and cold-starting future RL iterations rather than being deployed themselves. From these considerations, we introduce a new framework that avoids RL’s exploration challenge by training a new class of Reinforcement-Learned Teachers (RLTs) focused on yielding the most effective downstream distillation. RLTs are prompted with both the question and solution to each problem, and tasked to simply “connect-the-dots” with detailed explanations tailored for their students. We train RLTs with dense rewards obtained by feeding each explanation to the student and testing its understanding of the problem’s solution. In practice, the raw outputs of a 7B RLT provide higher final performance on competition and graduate-level tasks than existing distillation and cold-starting pipelines that collect and postprocess the reasoning traces of orders of magnitude larger LMs. Furthermore, RLTs maintain their effectiveness when training larger students and when applied zero-shot to out-of-distribution tasks, unlocking new levels of efficiency and re-usability for the RL reasoning framework.
nan
Article 848
Title@2025-06-22 (7): h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective
Title: h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective | h-Kalibrierung: Rethinking Klassifikator-Rekalibrierung mit probabilistischem fehlergebundenem Ziel | h-校准:用概率错误误差目标重新思考分类法重新校准 2506.17968v1 |
Authors (6): Wenjian Huang, Guiping Cao, Jiahao Xia, Jingkun Chen, Hao Wang, Jianguo Zhang
Deep neural networks have demonstrated remarkable performance across numerous learning tasks but often suffer from miscalibration, resulting in unreliable probability outputs. This has inspired many recent works on mitigating miscalibration, particularly through post-hoc recalibration methods that aim to obtain calibrated probabilities without sacrificing the classification performance of pre-trained models. In this study, we summarize and categorize previous works into three general strategies: intuitively designed methods, binning-based methods, and methods based on formulations of ideal calibration. Through theoretical and practical analysis, we highlight ten common limitations in previous approaches. To address these limitations, we propose a probabilistic learning framework for calibration called h-calibration, which theoretically constructs an equivalent learning formulation for canonical calibration with boundedness. On this basis, we design a simple yet effective post-hoc calibration algorithm. Our method not only overcomes the ten identified limitations but also achieves markedly better performance than traditional methods, as validated by extensive experiments. We further analyze, both theoretically and experimentally, the relationship and advantages of our learning objective compared to traditional proper scoring rules. In summary, our probabilistic framework derives an approximately equivalent differentiable objective for learning error-bounded calibrated probabilities, elucidating the correspondence and convergence properties of computational statistics with respect to theoretical bounds in canonical calibration. The theoretical effectiveness is verified on standard post-hoc calibration benchmarks by achieving state-of-the-art performance. This research offers a valuable reference for learning reliable likelihood in related fields.
nan
Article 849
Title@2025-06-22 (7): Adapting Vision-Language Models for Evaluating World Models
Title: Adapting Vision-Language Models for Evaluating World Models | Anpassung von Vision-Language-Modellen für die Bewertung von Weltmodellen | 调整世界模型评估的愿景-语言模型 2506.17967v1 |
Authors (8): Mariya Hendriksen, Tabish Rashid, David Bignell, Raluca Georgescu, Abdelhak Lemkhenter, Katja Hofmann, Sam Devlin, Sarah Parisot
World models – generative models that simulate environment dynamics conditioned on past observations and actions – are gaining prominence in planning, simulation, and embodied AI. However, evaluating their rollouts remains a fundamental challenge, requiring fine-grained, temporally grounded assessment of action alignment and semantic consistency – capabilities not captured by existing metrics. Vision-Language Models (VLMs) have shown promise as automatic evaluators of generative content due to their strong multimodal reasoning abilities. Yet, their use in fine-grained, temporally sensitive evaluation tasks remains limited and requires targeted adaptation. We introduce an evaluation protocol targeting two recognition tasks – action recognition and character recognition – each assessed across binary, multiple-choice, and open-ended formats. To support this, we present UNIVERSE (UNIfied Vision-language Evaluator for Rollouts in Simulated Environments), a method for adapting VLMs to rollout evaluation under data and compute constraints. We conduct a large-scale study comparing full, partial, and parameter-efficient finetuning across task formats, context lengths, sampling strategies, and data compositions. The resulting unified evaluator matches the performance of task-specific baselines using a single checkpoint. Human studies confirm strong alignment with human judgments, establishing UNIVERSE as a scalable, semantics-aware evaluator for world models.
nan
Article 850
Title@2025-06-22 (7): AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
Title: AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | AnyEnhance: Ein einheitliches Generatives Modell mit Prompt-Guidance und Selbstkritik für Sprachverbesserung | Any促进:促进声音增强的快速指导和自我批评统一生成模式 2501.15417v2 |
Authors (8): Junan Zhang, Jing Yang, Zihao Fang, Yuancheng Wang, Zehua Zhang, Zhuo Wang, Fan Fan, Zhizheng Wu
We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Based on a masked generative model, AnyEnhance is capable of handling both speech and singing voices, supporting a wide range of enhancement tasks including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker’s timbre. In this way, it could boost enhancement performance when a reference audio is available and enable the target speaker extraction task without altering the underlying architecture. Moreover, we also introduce a self-critic mechanism into the generative process for masked generative models, yielding higher-quality outputs through iterative self-assessment and refinement. Extensive experiments on various enhancement tasks demonstrate AnyEnhance outperforms existing methods in terms of both objective metrics and subjective listening tests. Demo audios are publicly available at https://amphionspace.github.io/anyenhance/.
nan
Article 851
Title@2025-06-22 (7): Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models
Title: Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models | Leveraging Model Guidance zum Extrahieren von Trainingsdaten aus personalisierten Diffusionsmodellen | 利用示范指南,从个性化传播模式中提取培训数据 2410.03039v2 |
Authors (3): Xiaoyu Wu, Jiaru Zhang, Zhiwei Steven Wu
Diffusion Models (DMs) have become powerful image generation tools, especially for few-shot fine-tuning where a pretrained DM is fine-tuned on a small image set to capture specific styles or objects. Many people upload these personalized checkpoints online, fostering communities such as Civitai and HuggingFace. However, model owners may overlook the data leakage risks when releasing fine-tuned checkpoints. Moreover, concerns regarding copyright violations arise when unauthorized data is used during fine-tuning. In this paper, we ask: “Can training data be extracted from these fine-tuned DMs shared online?” A successful extraction would not only present data leakage threats but also offer tangible evidence of copyright infringement. To answer this, we propose FineXtract, a framework for extracting fine-tuning data. Our method approximates fine-tuning as a gradual shift in the model’s learned distribution, from the original pretrained DM toward the fine-tuning data. By extrapolating the models before and after fine-tuning, we guide the generation toward high-probability regions within the fine-tuned data distribution. We then apply a clustering algorithm to extract the most probable images from those generated using this extrapolated guidance. Experiments on DMs fine-tuned with datasets including WikiArt, DreamBooth, and real-world checkpoints posted online validate the effectiveness of our method, extracting about 20% of fine-tuning data in most cases. The code is available at https://github.com/Nicholas0228/FineXtract.
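The extrapolation idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the FineXtract implementation: it assumes the guidance takes the linear form `pre + w * (ft - pre)` applied to the two models' noise predictions, and the function name, the guidance weight `w`, and the toy vectors are all hypothetical.

```python
def extrapolated_guidance(eps_pretrained, eps_finetuned, w):
    """Linearly extrapolate from the pretrained prediction past the fine-tuned one.

    ASSUMPTION: guidance has the form pre + w * (ft - pre). With w = 1 this
    returns the fine-tuned prediction; w > 1 pushes further in the direction
    in which fine-tuning moved the model.
    """
    if len(eps_pretrained) != len(eps_finetuned):
        raise ValueError("predictions must have the same length")
    return [p + w * (f - p) for p, f in zip(eps_pretrained, eps_finetuned)]


# Toy noise predictions (hypothetical values, not from the paper).
eps_pre = [0.0, 1.0]
eps_ft = [1.0, 1.0]
guided = extrapolated_guidance(eps_pre, eps_ft, w=3.0)
```

Only the coordinate where the two models disagree is amplified; coordinates where fine-tuning changed nothing are left untouched.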
Article 852
Title@2025-06-22 (7): An entropy-optimal path to humble AI
Title: An entropy-optimal path to humble AI | Entropie-optimaler Weg zur bescheidenen KI | 通往谦卑的 AI 的星盘最佳路径 2506.17940v1 |
Authors (5): Davide Bassetti, Lukáš Pospíšil, Michael Groom, Terence J. O’Kane, Illia Horenko
Progress in AI has led to the creation of very successful, but by no means humble, models and tools, especially regarding (i) the huge and further exploding costs and resources they demand, and (ii) the over-confidence of these tools in the answers they provide. Here we introduce a novel mathematical framework for a non-equilibrium entropy-optimizing reformulation of Boltzmann machines based on the exact law of total probability. It results in a highly performant, but much cheaper, gradient-descent-free learning framework with mathematically justified existence and uniqueness criteria, and answer confidence/reliability measures. Comparisons to state-of-the-art AI tools in terms of performance, cost and the model descriptor lengths on a set of synthetic problems with varying complexity reveal that the proposed method results in more performant and slimmer models, with the descriptor lengths being very close to the intrinsic complexity scaling bounds for the underlying problems. Applying this framework to historical climate data results in models with systematically higher prediction skills for the onsets of La Niña and El Niño climate phenomena, requiring just a few years of climate data for training, a small fraction of what is necessary for contemporary climate prediction tools.
Article 853
Title@2025-06-22 (7): IDAL: Improved Domain Adaptive Learning for Natural Images Dataset
Title: IDAL: Improved Domain Adaptive Learning for Natural Images Dataset | IDAL: Verbessertes Domain Adaptives Lernen für natürliche Bilder Datensatz | IDAL: 改进自然图像数据集的适应性空间学习 2506.17931v1 |
Authors (3): Ravi Kant Gupta, Shounak Das, Amit Sethi
We present a novel approach for unsupervised domain adaptation (UDA) for natural images. A commonly used objective for UDA schemes is to enhance domain alignment in representation space even if there is a domain shift in the input space. Existing adversarial domain adaptation methods may not effectively align different domains of multimodal distributions associated with classification problems. Our approach has two main features. Firstly, its neural architecture uses the deep structure of ResNet and the effective separation of scales of a feature pyramid network (FPN) to work with both content and style features. Secondly, it uses a combination of a novel loss function and judiciously selected existing loss functions to train the network architecture. This tailored combination is designed to address challenges inherent to natural images, such as scale, noise, and style shifts, that occur on top of a multi-modal (multi-class) distribution. The combined loss function not only enhances model accuracy and robustness on the target domain but also speeds up training convergence. Our proposed UDA scheme generalizes better than state-of-the-art CNN-based methods on the Office-Home, Office-31, and VisDA-2017 datasets and is comparable on the DomainNet dataset.
Article 854
Title@2025-06-22 (7): Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
Title: Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Evolving Prompts In-Context: Eine offene, sich selbst replizierende Perspektive | 不断演变的加速:一个开放的、自我复制的视角 2506.17930v1 |
Authors (3): Jianyu Wang, Zhiqiang Hu, Lidong Bing
We propose a novel prompt design paradigm that challenges conventional wisdom in large language model (LLM) prompting. While conventional wisdom prioritizes well-crafted instructions and demonstrations for in-context learning (ICL), we show that pruning random demonstrations into seemingly incoherent “gibberish” can remarkably improve performance across diverse tasks. Notably, the “gibberish” always matches or surpasses state-of-the-art automatic prompt optimization techniques, achieving substantial gains regardless of LLM alignment. Nevertheless, discovering an effective pruning strategy is non-trivial, as existing attribution methods and prompt compression algorithms fail to deliver robust results, let alone human intuition. In light of this, we propose PromptQuine, a self-discovering prompt optimization framework based on evolutionary search that automatically finds the pruning strategy on its own in low-data regimes. Much like the emergent complexity in nature (such as symbiosis and self-organization) that arises in response to resource constraints, our framework evolves and refines unconventional yet highly effective prompts by leveraging only the tokens present within the context. We demonstrate its effectiveness on classification, multi-choice question answering, generation and math reasoning tasks across LLMs, while achieving decent runtime efficiency. We hope our findings can guide mechanistic studies on in-context learning, and provide a call to action, to pave the way for more open-ended search algorithms for more effective LLM prompting.
Article 855
Title@2025-06-22 (7): Unveiling Molecular Moieties through Hierarchical Grad-CAM Graph Explainability
Title: Unveiling Molecular Moieties through Hierarchical Grad-CAM Graph Explainability | Enthüllung molekularer Moieties durch Hierarchische Grad-CAM Graph Erklärbarkeit | 通过等级梯度- CAM 图形解释 2402.01744v5 |
Authors (5): Salvatore Contino, Paolo Sortino, Maria Rita Gulotta, Ugo Perricone, Roberto Pirrone
Background: Virtual Screening (VS) has become an essential tool in drug discovery, enabling the rapid and cost-effective identification of potential bioactive molecules. Among recent advancements, Graph Neural Networks (GNNs) have gained prominence for their ability to model complex molecular structures using graph-based representations. However, the integration of explainable methods to elucidate the specific contributions of molecular substructures to biological activity remains a significant challenge. This limitation hampers both the interpretability of predictive models and the rational design of novel therapeutics. Results: We trained 20 GNN models on a dataset of small molecules with the goal of predicting their activity on 20 distinct protein targets from the Kinase family. These classifiers achieved state-of-the-art performance in virtual screening tasks, demonstrating high accuracy and robustness on different targets. Building upon these models, we implemented the Hierarchical Grad-CAM graph Explainer (HGE) framework, enabling an in-depth analysis of the molecular moieties driving protein-ligand binding stabilization. HGE exploits Grad-CAM explanations at the atom, ring, and whole-molecule levels, leveraging the message-passing mechanism to highlight the most relevant chemical moieties. Validation against experimental data from the literature confirmed the ability of the explainer to recognize a molecular pattern of drugs and correctly annotate them to the known target. Conclusion: Our approach may represent a valid support to shorten both the screening and the hit discovery process. Detailed knowledge of the molecular substructures that play a role in the binding process can help the computational chemist to gain insights into the structure optimization, as well as in drug repurposing tasks.
Article 856
Title@2025-06-22 (7): ASTER: Adaptive Spatio-Temporal Early Decision Model for Dynamic Resource Allocation
Title: ASTER: Adaptive Spatio-Temporal Early Decision Model for Dynamic Resource Allocation | ASTER: Adaptives Spatio-Temporales Frühentscheidungsmodell für die dynamische Ressourcenallokation | ATER: 动态资源分配适应性SPATIO-临时早期决定模式 2506.17929v1 |
Authors (4): Shulun Chen, Wei Shao, Flora D. Salim, Hao Xue
Supporting decision-making has long been a central vision in the field of spatio-temporal intelligence. While prior work has improved the timeliness and accuracy of spatio-temporal forecasting, converting these forecasts into actionable strategies remains a key challenge. A main limitation is the decoupling of the prediction and the downstream decision phases, which can significantly degrade the downstream efficiency. For example, in emergency response, the priority is successful resource allocation and intervention, not just incident prediction. To this end, we propose an Adaptive Spatio-Temporal Early Decision model (ASTER) that reforms the forecasting paradigm from event anticipation to actionable decision support. This framework ensures that information is directly used for decision-making, thereby maximizing overall effectiveness. Specifically, ASTER introduces a new Resource-aware Spatio-Temporal interaction module (RaST) that adaptively captures long- and short-term dependencies under dynamic resource conditions, producing context-aware spatiotemporal representations. To directly generate actionable decisions, we further design a Preference-oriented decision agent (Poda) based on multi-objective reinforcement learning, which transforms predictive signals into resource-efficient intervention strategies by deriving optimal actions under specific preferences and dynamic constraints. Experimental results on four benchmark datasets demonstrate the state-of-the-art performance of ASTER in improving both early prediction accuracy and resource allocation outcomes across six downstream metrics.
Article 857
Title@2025-06-22 (7): Improving the Efficiency of Long Document Classification using Sentence Ranking Approach
Title: Improving the Efficiency of Long Document Classification using Sentence Ranking Approach | Verbesserung der Effizienz der Langdokumentklassifikation mittels Sentence-Ranking-Ansatz | 采用判决分级办法提高长文件分类的效率 2506.07248v2 |
Authors (4): Prathamesh Kokate, Mitali Sarnaik, Manavi Khopade, Raviraj Joshi
Long document classification poses challenges due to the computational limitations of transformer-based models, particularly BERT, which are constrained by fixed input lengths and quadratic attention complexity. Moreover, using the full document for classification is often redundant, as only a subset of sentences typically carries the necessary information. To address this, we propose a TF-IDF-based sentence ranking method that improves efficiency by selecting the most informative content. Our approach explores fixed-count and percentage-based sentence selection, along with an enhanced scoring strategy combining normalized TF-IDF scores and sentence length. Evaluated on the MahaNews LDC dataset of long Marathi news articles, the method consistently outperforms baselines such as first, last, and random sentence selection. With MahaBERT-v2, we achieve near-identical classification accuracy with just a 0.33 percent drop compared to the full-context baseline, while reducing input size by over 50 percent and inference latency by 43 percent. This demonstrates that significant context reduction is possible without sacrificing performance, making the method practical for real-world long document classification tasks.
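The sentence-selection step described here can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the authors' implementation: the smoothed IDF formula, the way the length term is blended in via `length_weight`, and the function name `rank_sentences` are all illustrative choices.

```python
import math
import re
from collections import Counter


def rank_sentences(sentences, top_k=2, length_weight=0.0):
    """Return the top_k sentences by score, kept in their original order.

    ASSUMPTION: score = (1 - length_weight) * normalized mean TF-IDF
                        + length_weight * normalized sentence length.
    """
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    max_len = max((len(t) for t in tokenized), default=0) or 1

    raw = []
    for tokens in tokenized:
        if not tokens:
            raw.append(0.0)
            continue
        tf = Counter(tokens)
        # Smoothed IDF, so terms that occur in every sentence still count a little.
        raw.append(sum((c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                       for t, c in tf.items()))
    max_raw = max(raw, default=0.0) or 1.0

    scores = [(1.0 - length_weight) * (r / max_raw)
              + length_weight * (len(tokens) / max_len)
              for r, tokens in zip(raw, tokenized)]
    # Highest score first; ties go to the earlier sentence.
    best = sorted(range(n), key=lambda i: (-scores[i], i))[:top_k]
    return [sentences[i] for i in sorted(best)]


doc = [
    "The cat sat.",
    "The cat sat.",
    "Quantum chromodynamics explains gluon confinement.",
]
selected = rank_sentences(doc, top_k=1)
```

A percentage-based variant would simply set `top_k` to a fraction of `len(sentences)` before calling the function.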
Article 858
Title@2025-06-22 (7): Permutation Equivariant Model-based Offline Reinforcement Learning for Auto-bidding
Title: Permutation Equivariant Model-based Offline Reinforcement Learning for Auto-bidding | Permutation Equivariant Modellbasiertes Offline-Verstärkungslernen für Auto-Bindung | 用于自动招标的离线强化学习 2506.17919v1 |
Authors (6): Zhiyu Mou, Miao Xu, Wei Chen, Rongquan Bai, Chuan Yu, Jian Xu
Reinforcement learning (RL) for auto-bidding has shifted from using simplistic offline simulators (Simulation-based RL Bidding, SRLB) to offline RL on fixed real datasets (Offline RL Bidding, ORLB). However, ORLB policies are limited by the dataset’s state space coverage, offering modest gains. While SRLB expands state coverage, its simulator-reality gap risks misleading policies. This paper introduces Model-based RL Bidding (MRLB), which learns an environment model from real data to bridge this gap. MRLB trains policies using both real and model-generated data, expanding state coverage beyond ORLB. To ensure model reliability, we propose: 1) A permutation equivariant model architecture for better generalization, and 2) A robust offline Q-learning method that pessimistically penalizes model errors. These form the Permutation Equivariant Model-based Offline RL (PE-MORL) algorithm. Real-world experiments show that PE-MORL outperforms state-of-the-art auto-bidding methods.
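To make the term "permutation equivariant" concrete, the following is a generic illustration and not the PE-MORL architecture: a layer that combines each element with the mean over all elements commutes with any reordering of its inputs. The weights and the toy feature values are hypothetical.

```python
def equivariant_layer(values, self_weight, mean_weight):
    """Map each element x_i to self_weight * x_i + mean_weight * mean(x)."""
    mean = sum(values) / len(values)
    return [self_weight * v + mean_weight * mean for v in values]


def permute(values, order):
    """Reorder values so that position i holds values[order[i]]."""
    return [values[i] for i in order]


features = [1.0, 2.0, 4.0]  # hypothetical per-advertiser features
order = [2, 0, 1]

# Equivariance: applying the layer then permuting equals permuting then applying.
layer_then_permute = permute(equivariant_layer(features, 2.0, -1.0), order)
permute_then_layer = equivariant_layer(permute(features, order), 2.0, -1.0)
```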
Article 859
Title@2025-06-22 (7): A real-time anomaly detection method for robots based on a flexible and sparse latent space
Title: A real-time anomaly detection method for robots based on a flexible and sparse latent space | Eine Echtzeit-Anomalieerkennungsmethode für Roboter auf Basis eines flexiblen und spärlichen Latentraums | 以灵活和稀少的潜在空间为基础的机器人实时异常现象探测方法 2504.11170v3 |
Authors (4): Taewook Kang, Bum-Jae You, Juyoun Park, Yisoo Lee
The growing demand for robots to operate effectively in diverse environments necessitates robust real-time anomaly detection techniques during robotic operations. However, deep learning-based models in robotics face significant challenges due to limited training data and highly noisy signal features. In this paper, we present the Sparse Masked Autoregressive Flow-based Adversarial AutoEncoder model to address these problems. This approach integrates a Masked Autoregressive Flow model into Adversarial AutoEncoders to construct a flexible latent space and utilizes a Sparse autoencoder to efficiently focus on important features, even in scenarios with limited feature space. Our experiments demonstrate that the proposed model achieves a 4.96% to 9.75% higher area under the receiver operating characteristic curve for pick-and-place robotic operations with randomly placed cans, compared to existing state-of-the-art methods. Notably, it showed up to 19.67% better performance in scenarios involving collisions with lightweight objects. Additionally, unlike the existing state-of-the-art model, our model performs inference within 1 millisecond, ensuring real-time anomaly detection. These capabilities make our model highly applicable to machine learning-based robotic safety systems in dynamic environments. The code is available at https://github.com/twkang43/sparse-maf-aae.
Article 860
Title@2025-06-22 (7): Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks
Title: Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks | Grafik Neuronale Netzwerke in Supply Chain Analytics und Optimierung: Konzepte, Perspektiven, Datensatz und Benchmarks | 供应链分析和优化中的神经网络:概念、视角、数据集和基准 2411.08550v2 |
Authors (4): Azmine Toushik Wasi, MD Shafikul Islam, Adipto Raihan Akib, Mahathir Mohammad Bappy
Graph Neural Networks (GNNs) have recently gained traction in transportation, bioinformatics, language and image processing, but research on their application to supply chain management remains limited. Supply chains are inherently graph-like, making them ideal for GNN methodologies, which can optimize and solve complex problems. The barriers include a lack of proper conceptual foundations, familiarity with graph applications in SCM, and real-world benchmark datasets for GNN-based supply chain research. To address this, we discuss and connect supply chains with graph structures for effective GNN application, providing detailed formulations, examples, mathematical definitions, and task guidelines. Additionally, we present a multi-perspective real-world benchmark dataset from a leading FMCG company in Bangladesh, focusing on supply chain planning. We discuss various supply chain tasks using GNNs and benchmark several state-of-the-art models on homogeneous and heterogeneous graphs across six supply chain analytics tasks. Our analysis shows that GNN-based models consistently outperform statistical Machine Learning and other Deep Learning models by around 10-30% in regression, 10-30% in classification and detection tasks, and 15-40% in anomaly detection tasks on designated metrics. With this work, we lay the groundwork for solving supply chain problems using GNNs, supported by conceptual discussions, methodological insights, and a comprehensive dataset.
Article 861
Title@2025-06-22 (7): Interpretable global minima of deep ReLU neural networks on sequentially separable data
Title: Interpretable global minima of deep ReLU neural networks on sequentially separable data | Interpretable globale Minima von tiefen neuronalen ReLU-Netzwerken auf sequentiell trennbaren Daten | 深RELU神经网络关于相继分离数据的可解释全球小型深RELU神经网络 2405.07098v3 |
Authors (2): Thomas Chen, Patrícia Muñoz Ewald
We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.
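As a quick sense of scale for the parameter count $Q(M+2)$, take the hypothetical values $Q = 10$ classes and $M = 784$ input dimensions (illustrative numbers, not taken from the paper):

```latex
Q(M+2) = 10 \times (784 + 2) = 10 \times 786 = 7860
```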
Article 862
Title@2025-06-22 (7): SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
Title: SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback | SIPDO: Closed-Loop Prompt Optimierung über Synthetic Data Feedback | SIPDO:通过合成数据反馈,通过闭闭电话快速优化 2505.19514v2 |
Authors (5): Yaoning Yu, Ye Yu, Kai Wei, Haojing Luo, Haohan Wang
Prompt quality plays a critical role in the performance of large language models (LLMs), motivating a growing body of work on prompt optimization. Most existing methods optimize prompts over a fixed dataset, assuming static input distributions and offering limited support for iterative improvement. We introduce SIPDO (Self-Improving Prompts through Data-Augmented Optimization), a closed-loop framework for prompt learning that integrates synthetic data generation into the optimization process. SIPDO couples a synthetic data generator with a prompt optimizer, where the generator produces new examples that reveal current prompt weaknesses and the optimizer incrementally refines the prompt in response. This feedback-driven loop enables systematic improvement of prompt performance without assuming access to external supervision or new tasks. Experiments across question answering and reasoning benchmarks show that SIPDO outperforms standard prompt tuning methods, highlighting the value of integrating data synthesis into prompt learning workflows.
Article 863
Title@2025-06-22 (7): Text2Struct: A Machine Learning Pipeline for Mining Structured Data from Text
Title: Text2Struct: A Machine Learning Pipeline for Mining Structured Data from Text | Text2Struct: Eine maschinenlernende Pipeline für den Bergbau strukturierte Daten aus Text | Text2Struct: 文字中采矿结构化数据的机械学习管道 2212.09044v4 |
Authors (2): Chaochao Zhou, Bo Yang
Many analysis and prediction tasks require the extraction of structured data from unstructured texts. However, an annotation scheme and a training dataset have not been available for training machine learning models to mine structured data from text without special templates and patterns. To address this, this paper presents an end-to-end machine learning pipeline, Text2Struct, including a text annotation scheme, training data processing, and machine learning implementation. We formulated the mining problem as the extraction of metrics and units associated with numerals in the text. Text2Struct was trained and evaluated using an annotated text dataset collected from abstracts of medical publications regarding thrombectomy. In terms of prediction performance, a Dice coefficient of 0.82 was achieved on the test dataset. By random sampling, most predicted relations between numerals and entities were well matched to the ground-truth annotations. These results show that Text2Struct is viable for the mining of structured data from text without special templates or patterns. Further improvement of the pipeline is anticipated through expanding the dataset and investigating other machine learning models. A code demonstration can be found at: https://github.com/zcc861007/Text2Struct
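For reference, the Dice coefficient reported above measures the overlap between a predicted and a reference set. The sketch below assumes the sets hold labeled token positions; the abstract does not state the unit of comparison, so that choice and the toy indices are illustrative only.

```python
def dice_coefficient(predicted, reference):
    """Dice = 2 * |A intersect B| / (|A| + |B|), defined as 1.0 when both sets are empty."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


# Hypothetical token positions tagged as a "unit" entity.
predicted_tokens = {1, 2, 3}
reference_tokens = {2, 3, 4}
score = dice_coefficient(predicted_tokens, reference_tokens)
```

Here two of the three positions on each side agree, giving 2 * 2 / (3 + 3), or about 0.67.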
Article 864
Title@2025-06-22 (7): TROJAN-GUARD: Hardware Trojans Detection Using GNN in RTL Designs
Title: TROJAN-GUARD: Hardware Trojans Detection Using GNN in RTL Designs | TROJAN-GUARD: Hardware-Trojaner-Erkennung mit GNN in RTL-Designs | TROJAN-GUARD:在RTL设计中使用GNN的硬件探测Trojans 2506.17894v1 |
Authors (4): Kiran Thorat, Amit Hasan, Caiwen Ding, Zhijie Shi
Chip manufacturing is a complex process, and to achieve a faster time to market, an increasing number of untrusted third-party tools and designs from around the world are being utilized. The use of these untrusted third-party intellectual properties (IPs) and tools increases the risk of adversaries inserting hardware trojans (HTs). The covert nature of HTs poses significant threats to cyberspace, potentially leading to severe consequences for national security, the economy, and personal privacy. Many graph neural network (GNN)-based HT detection methods have been proposed. However, they perform poorly on larger designs because they rely on training with smaller designs. Additionally, these methods do not explore different GNN models that are well-suited for HT detection or provide efficient training and inference processes. We propose a novel framework that generates graph embeddings for large designs (e.g., RISC-V) and incorporates various GNN models tailored for HT detection. Furthermore, our framework introduces domain-specific techniques for efficient training and inference by implementing model quantization. Model quantization reduces the precision of the weights, lowering the computational requirements and enhancing processing speed without significantly affecting detection accuracy. We evaluate our framework using a custom dataset, and our results demonstrate a precision of 98.66% and a recall (true positive rate) of 92.30%, highlighting the effectiveness and efficiency of our approach in detecting hardware trojans in large-scale chip designs.
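The effect of weight quantization mentioned above can be illustrated with a generic symmetric 8-bit scheme. The abstract does not say which quantization scheme the framework uses, so this is an assumed, textbook-style sketch with hypothetical weights.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float weights."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]  # hypothetical layer weights
q_weights, q_scale = quantize_int8(weights)
restored = dequantize(q_weights, q_scale)
# Rounding to the nearest integer level bounds the error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step.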
Article 865
Title@2025-06-22 (7): A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy
Title: A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy | Ein bayesischer nicht-parametrischer Ansatz für Generative Modelle: Integrieren von Variational Autoencoder und Generative Adversarial Networks mit Wasserstein und maximaler mittlerer Diskrepanz | 采用巴耶斯非参数方法处理产生模型:采用瓦塞斯泰因和最大平均值差异法,整合变式自动编码器和生成反对向网络 2308.14048v2 |
Authors (2): Forough Fazeli-Asl, Michael Minyi Zhang
We propose a novel generative model within the Bayesian non-parametric learning (BNPL) framework to address some notable failure modes in generative adversarial networks (GANs) and variational autoencoders (VAEs), namely overfitting in the GAN case and noisy samples in the VAE case. We will demonstrate that the BNPL framework enhances training stability and provides robustness and accuracy guarantees when incorporating the Wasserstein distance and maximum mean discrepancy measure (WMMD) into our model’s loss function. Moreover, we introduce a so-called “triple model” that combines the GAN, the VAE, and further incorporates a code-GAN (CGAN) to explore the latent space of the VAE. This triple model design generates high-quality, diverse samples, while the BNPL framework, leveraging the WMMD loss function, enhances training stability. Together, these components enable our model to achieve superior performance across various generative tasks. These claims are supported by both theoretical analyses and empirical validation on a wide variety of datasets.
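The maximum mean discrepancy term in the WMMD loss can be illustrated in isolation. The sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel; the kernel choice, the bandwidth, and the toy samples are assumptions, and the Wasserstein and Bayesian non-parametric parts of the model are not reproduced.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


real = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]      # toy data sample
fake_near = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0]]  # toy generated sample, close
fake_far = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.1]]    # toy generated sample, far

near_gap = mmd_squared(real, fake_near)
far_gap = mmd_squared(real, fake_far)
```

A generator whose samples sit far from the data receives a much larger penalty than one whose samples overlap it, which is what makes the quantity usable as a loss term.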
Article 866
Title@2025-06-22 (7): ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training
Title: ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training | ECHO-LlaMA: Effizientes Caching für Hochleistungs-LLaMA-Schulungen | ECHO-LLAMA: 高效率的高绩效拉马培训 2505.17331v2 |
Authors (8): Maryam Dialameh, Rezaul Karim, Hossein Rajabzadeh, Omar Mohamed Awad, Hyock Ju Kwon, Boxing Chen, Walid Ahmed, Yang Liu
This paper introduces ECHO-LLaMA, an efficient LLaMA architecture designed to improve both the training speed and inference throughput of LLaMA architectures while maintaining their learning capacity. ECHO-LLaMA transforms LLaMA models into shared KV caching across certain layers, significantly reducing KV computational complexity while maintaining or improving language performance. Experimental results demonstrate that ECHO-LLaMA achieves up to 77% higher token-per-second throughput during training, up to 16% higher Model FLOPs Utilization (MFU), and up to 14% lower loss when trained on an equal number of tokens. Furthermore, on the 1.1B model, ECHO-LLaMA delivers approximately 7% higher test-time throughput compared to the baseline. By introducing a computationally efficient adaptation mechanism, ECHO-LLaMA offers a scalable and cost-effective solution for pretraining and finetuning large language models, enabling faster and more resource-efficient training without compromising performance.
Article 867
Title@2025-06-22 (7): SPD-CFL: Stepwise Parameter Dropout for Efficient Continual Federated Learning
Title: SPD-CFL: Stepwise Parameter Dropout for Efficient Continual Federated Learning | SPD-CFL: Schrittweiser Parameter-Ausfall für effizientes kontinuierliches Federated Learning | SPD-CFL: 高效持续联邦学习的分级参数辍学 2405.09394v2 |
Authors (8): Yuning Yang, Han Yu, Chuan Sun, Tianrun Gao, Xiaohong Liu, Xiaodong Xu, Ping Zhang, Guangyu Wang
Federated Learning (FL) is a collaborative machine learning paradigm for training models on local sensitive data with privacy protection. Pre-trained transformer-based models have emerged as useful foundation models (FMs) to be fine-tuned for a wide range of downstream tasks. However, large-scale pre-trained models make it challenging for traditional FL due to high communication overhead in the resource-constrained IoT. This has inspired the field of parameter-efficient fine-tuning (PEFT) research. Existing PEFT methods attempt to optimize model performance at the given dropout level. Such an approach places the burden on human users to find a dropout rate that provides a satisfactory level of performance through trial-and-error, which is time consuming and resource intensive. To address this limitation, we propose the Step-wise Parameter Dropout for Continual Federated Learning (SPD-CFL) approach. Instead of pre-defining a desired dropout rate, it allows users to specify the target level of performance and then attempts to find the most suitable dropout rate for the given FL model. Specifically, on the server side, SPD-CFL drops trainable parameters in a stepwise manner to improve communication efficiency by reducing the rank of low-rank adaptation (LoRA). The sensitivity-based gradient consistency (SGC) measure is designed to facilitate the adaptive adjustment of parameter dropout. In addition, SPD-CFL introduces continual learning (CL) on the client side to mitigate performance degradation due to the inconsistent optima with distinct parameter dropout rates under heterogeneous FL. Extensive experiments on the public benchmark dataset CIFAR-10 and a real-world medical Face dataset demonstrate significant superiority of SPD-CFL over state-of-the-art methods. Compared to the best-performing baseline, it achieves a 2.07% higher test AUC while reducing communication overhead by 29.53%.
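Why lowering the LoRA rank cuts communication can be seen from a parameter count. The sketch below is illustrative only: the fixed-step schedule stands in for the paper's sensitivity-based (SGC) adjustment, which is not reproduced, and the layer width of 768 and all rank values are hypothetical.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def stepwise_ranks(start_rank, step, min_rank):
    """Hypothetical schedule: lower the rank by `step` until `min_rank` is reached."""
    if step <= 0:
        raise ValueError("step must be positive")
    ranks = []
    rank = start_rank
    while rank >= min_rank:
        ranks.append(rank)
        rank -= step
    return ranks


schedule = stepwise_ranks(start_rank=8, step=2, min_rank=2)
# Parameters a client would upload per adapted layer at each stage.
upload_sizes = [lora_param_count(768, 768, r) for r in schedule]
```

Because the count is linear in the rank, halving the rank halves the adapter payload exchanged in each round.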
Article 868
Title@2025-06-22 (7): Cloud-Aware SAR Fusion for Enhanced Optical Sensing in Space Missions
Title: Cloud-Aware SAR Fusion for Enhanced Optical Sensing in Space Missions | Cloud-Aware SAR Fusion für verbesserte optische Wahrnehmung in Weltraummissionen | 用于空间飞行任务中增强光学遥感的云器合成孔合成孔径雷达 2506.17885v1 |
Authors (2): Trong-An Bui, Thanh-Thoai Le
Cloud contamination significantly impairs the usability of optical satellite imagery, affecting critical applications such as environmental monitoring, disaster response, and land-use analysis. This research presents a Cloud-Attentive Reconstruction Framework that integrates SAR-optical feature fusion with deep learning-based image reconstruction to generate cloud-free optical imagery. The proposed framework employs an attention-driven feature fusion mechanism to align complementary structural information from Synthetic Aperture Radar (SAR) with spectral characteristics from optical data. Furthermore, a cloud-aware model update strategy introduces adaptive loss weighting to prioritize cloud-occluded regions, enhancing reconstruction accuracy. Experimental results demonstrate that the proposed method outperforms existing approaches, achieving a PSNR of 31.01 dB, SSIM of 0.918, and MAE of 0.017. These outcomes highlight the framework’s effectiveness in producing high-fidelity, spatially and spectrally consistent cloud-free optical images.
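One simple way to prioritize cloud-occluded regions is a mask-weighted reconstruction loss. The form below is an assumption made for illustration (the abstract does not give the weighting formula): each pixel gets weight `1 + alpha * mask`, where `mask` is 1 under cloud and 0 elsewhere, and `alpha` is a hypothetical emphasis factor.

```python
def cloud_weighted_l1(predicted, target, cloud_mask, alpha=3.0):
    """Weighted mean absolute error with extra weight on cloud-covered pixels.

    ASSUMPTION: weight_i = 1 + alpha * cloud_mask_i; alpha = 0 recovers plain MAE.
    """
    if not (len(predicted) == len(target) == len(cloud_mask)):
        raise ValueError("inputs must have the same length")
    weights = [1.0 + alpha * m for m in cloud_mask]
    weighted_error = sum(w * abs(p - t) for w, p, t in zip(weights, predicted, target))
    return weighted_error / sum(weights)


# Two flattened pixels (toy values): the second is under cloud and badly reconstructed.
pred = [0.0, 1.0]
truth = [0.0, 0.0]
mask = [0, 1]
plain = cloud_weighted_l1(pred, truth, mask, alpha=0.0)
weighted = cloud_weighted_l1(pred, truth, mask, alpha=3.0)
```

With the extra weight, the same error under cloud raises the loss from 0.5 to 0.8, so gradient updates focus on the occluded region.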
Article 869
Title@2025-06-22 (7): Navigating Conflicting Views: Harnessing Trust for Learning
Title: Navigating Conflicting Views: Harnessing Trust for Learning | Navigieren gegensätzlicher Ansichten: Vertrauen fürs Lernen gewinnen | 引导冲突观点:利用信任学习 2406.00958v4 |
Authors (6): Jueqing Lu, Wray Buntine, Yuanyuan Qi, Joanna Dipnall, Belinda Gabbe, Lan Du
Resolving conflicts is critical for improving the reliability of multi-view classification. While prior work focuses on learning consistent and informative representations across views, it often assumes perfect alignment and equal importance of all views, an assumption rarely met in real-world scenarios, as some views may express distinct information. To address this, we develop a computational trust-based discounting method that enhances the Evidential Multi-view framework by accounting for the instance-wise reliability of each view through a probability-sensitive trust mechanism. We evaluate our method on six real-world datasets using Top-1 Accuracy, Fleiss’ Kappa, and a new metric, Multi-View Agreement with Ground Truth, to assess prediction reliability. We also assess the effectiveness of uncertainty in indicating prediction correctness via AUROC. Additionally, we test the scalability of our method through end-to-end training on a large-scale dataset. The experimental results show that computational trust can effectively resolve conflicts, paving the way for more reliable multi-view classification models in real-world applications. Code is available at: https://github.com/OverfitFlow/Trust4Conflict
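Fleiss' Kappa, one of the agreement metrics named above, can be computed directly from a table of vote counts. The sketch assumes each view acts as one "rater" and each instance as one "subject"; that mapping and the toy tables are illustrative, and the return value for the degenerate single-category case is a convention chosen here.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters that assigned subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement per subject, then averaged over subjects.
    per_subject = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                   for row in counts]
    observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall category proportions.
    proportions = [sum(row[j] for row in counts) / (n_subjects * n_raters)
                   for j in range(n_categories)]
    expected = sum(p * p for p in proportions)
    if expected == 1.0:
        return 1.0  # all votes in one category; convention assumed for this sketch
    return (observed - expected) / (1.0 - expected)


# Three views voting on two instances, two classes (toy data).
unanimous = fleiss_kappa([[3, 0], [0, 3]])
conflicted = fleiss_kappa([[2, 1], [1, 2]])
```

Unanimous views give a kappa of 1, while views that split two-to-one on every instance fall below chance-level agreement.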
Article 870
Title@2025-06-22 (7): Dim and Small Target Detection for Drone Broadcast Frames Based on Time-Frequency Analysis
Title: Dim and Small Target Detection for Drone Broadcast Frames Based on Time-Frequency Analysis | Dim und kleine Target Detection für Drohnen Broadcast Frames basierend auf Zeit-Frequenz-Analyse | 根据时间-期限分析对无人机广播框架进行迪姆和小目标探测 2505.18167v2 |
Authors (5): Jie Li, Jing Li, Zhanyu Ju, Fengkui Gong, Lu Lv
We propose a dim and small target detection algorithm for drone broadcast frames based on time-frequency analysis of the communication protocol. Specifically, by analyzing modulation parameters and frame structures, we establish prior knowledge of the transmission frequency, signal bandwidth, Zadoff-Chu (ZC) sequences, and frame length of drone broadcast frames. The RF signals are processed through the designed filter banks, and the frequency-domain parameters of the bounding boxes generated by the detector are corrected using the transmission frequency and signal bandwidth. Given the strong correlation properties of ZC sequences, the frequency-domain parameters of bounding boxes with low confidence scores are corrected based on ZC sequences and frame length, which improves the detection accuracy of dim targets at low signal-to-noise ratios. In addition, a segmented energy refinement method mitigates the deviation caused by high-energy interference signals, which further corrects the time-domain detection parameters for dim targets. As the sampling duration increases, detection speed improves while the detection accuracy for broadcast frames, which are treated as small targets, decreases. We establish the trade-off between detection accuracy and speed as a function of sampling duration, which helps to meet different drone regulation requirements. Simulation results demonstrate that the proposed algorithm improves the evaluation metrics by 2.27% compared to existing algorithms. The proposed algorithm also shows strong robustness under varying flight distances, diverse types of environmental noise, and different flight visual environments. Finally, the broadcast frame decoding results indicate that an RID accuracy of 97.30% is achieved.
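The correlation property of ZC sequences that the abstract relies on can be checked numerically. The sketch below uses the standard odd-length Zadoff-Chu definition and computes the cyclic autocorrelation; the length 63 and root 25 are illustrative assumptions, not parameters taken from the paper or from any drone broadcast protocol.

```python
import cmath
import math


def zadoff_chu(root: int, length: int) -> list[complex]:
    """Zadoff-Chu sequence for an odd `length` and a root coprime with it."""
    if length % 2 == 0 or math.gcd(root, length) != 1:
        raise ValueError("need an odd length and gcd(root, length) == 1")
    return [cmath.exp(-1j * math.pi * root * n * (n + 1) / length)
            for n in range(length)]


def cyclic_autocorrelation(seq: list[complex]) -> list[complex]:
    """R[k] = sum over n of seq[(n + k) mod N] * conj(seq[n])."""
    n_len = len(seq)
    return [sum(seq[(n + k) % n_len] * seq[n].conjugate() for n in range(n_len))
            for k in range(n_len)]


# Illustrative parameters (assumed, not from the paper).
ZC = zadoff_chu(root=25, length=63)
R = cyclic_autocorrelation(ZC)
PEAK = abs(R[0])                            # zero-lag value, equal to the length
MAX_SIDELOBE = max(abs(r) for r in R[1:])   # all other lags are numerically ~0
```

The single sharp peak at zero lag is what makes a correlation against a known ZC sequence usable as a timing and frequency anchor even when the frame itself is weak.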
Article 871
Title@2025-06-22 (7): Choice of Scoring Rules for Indirect Elicitation of Properties with Parametric Assumptions
Title: Choice of Scoring Rules for Indirect Elicitation of Properties with Parametric Assumptions | Wahl der Bewertungsregeln für die Indirekte Elizitation von Immobilien mit parametrischen Annahmen | 带有参数假设的间接引力财产选择规则 2506.17880v1 |
Authors (2): Lingfang Hu, Ian A. Kash
People are commonly interested in predicting a statistical property of a random event, such as its mean or variance. Proper scoring rules assess the quality of predictions and require that the expected score be uniquely maximized at the precise prediction, in which case we say the score directly elicits the property. Previous research has widely studied the existence and characterization of proper scoring rules for different properties, but little literature discusses the choice of proper scoring rules for the application at hand. In this paper, we explore a novel task, the indirect elicitation of properties with parametric assumptions, where the target property is a function of several directly elicitable sub-properties and the total score is a weighted sum of proper scoring rules for each sub-property. Because of the restriction to a parametric model class, different settings for the weights lead to different constrained optimal solutions. Our goal is to determine how the choice of weights affects the estimation of the target property and which choice is best. We start with simulation studies and observe an interesting pattern: in most cases, the optimal estimate of the target property changes monotonically as each weight increases, and the best configuration of weights is often to set some weights to zero. To understand why this happens, we first establish an elementary theoretical framework and then provide deeper sufficient conditions for the case of two sub-properties and for the case of more sub-properties. The theory for 2-D cases fully explains the experimental results. In higher-dimensional situations, we focus on linear cases and suggest that more complex settings can be understood by locally mapping them into linear situations or by using linear approximations when the true values of the sub-properties are close enough to the parametric space.
Article 872
Title@2025-06-22 (7): Decoding Federated Learning: The FedNAM+ Conformal Revolution
Title: Decoding Federated Learning: The FedNAM+ Conformal Revolution | Decoding Federated Learning: Die FedNAM+ Konforme Revolution | 解说联邦学习:美联联储+非正规革命 2506.17872v1 |
Authors (3): Sree Bhargavi Balija, Amitash Nanda, Debashis Sahoo
Federated learning has significantly advanced distributed training of machine learning models across decentralized data sources. However, existing frameworks often lack comprehensive solutions that combine uncertainty quantification, interpretability, and robustness. To address this, we propose FedNAM+, a federated learning framework that integrates Neural Additive Models (NAMs) with a novel conformal prediction method to enable interpretable and reliable uncertainty estimation. Our method introduces a dynamic level adjustment technique that utilizes gradient-based sensitivity maps to identify key input features influencing predictions. This facilitates both interpretability and pixel-wise uncertainty estimates. Unlike traditional interpretability methods such as LIME and SHAP, which do not provide confidence intervals, FedNAM+ offers visual insights into prediction reliability. We validate our approach through experiments on CT scan, MNIST, and CIFAR datasets, demonstrating high prediction accuracy with minimal loss (e.g., only 0.1% on MNIST), along with transparent uncertainty measures. Visual analysis highlights variable uncertainty intervals, revealing low-confidence regions where model performance can be improved with additional data. Compared to Monte Carlo Dropout, FedNAM+ delivers efficient and global uncertainty estimates with reduced computational overhead, making it particularly suitable for federated learning scenarios. Overall, FedNAM+ provides a robust, interpretable, and computationally efficient framework that enhances trust and transparency in decentralized predictive modeling.
Article 873
Title@2025-06-22 (7): How Alignment Shrinks the Generative Horizon
Title: How Alignment Shrinks the Generative Horizon | Wie Alignment den generativen Horizont schrumpft | 协同一致如何缩小生成地平线 2506.17871v1 |
Authors (2): Chenghao Yang, Ari Holtzman
Despite their impressive capabilities, aligned large language models (LLMs) often generate outputs that lack diversity. What drives this stability in the generation? We investigate this phenomenon through the lens of probability concentration in the model’s output distribution. To quantify this concentration, we introduce the Branching Factor (BF) – a token-invariant measure of the effective number of plausible next steps during generation. Our empirical analysis reveals two key findings: (1) BF often decreases as generation progresses, suggesting that LLMs become more predictable as they generate. (2) alignment tuning substantially sharpens the model’s output distribution from the outset, reducing BF by nearly an order of magnitude (e.g., from 12 to 1.2) relative to base models. This stark reduction helps explain why aligned models often appear less sensitive to decoding strategies. Building on this insight, we find this stability has surprising implications for complex reasoning. Aligned Chain-of-Thought (CoT) models (e.g., DeepSeek-distilled models), for instance, leverage this effect; by generating longer reasoning chains, they push generation into later, more deterministic (lower BF) stages, resulting in more stable outputs. We hypothesize that alignment tuning does not fundamentally change a model’s behavior, but instead steers it toward stylistic tokens (e.g., “Sure”) that unlock low-entropy trajectories already present in the base model. This view is supported by nudging experiments, which show that prompting base models with such tokens can similarly reduce BF. Together, our findings establish BF as a powerful diagnostic for understanding and controlling LLM outputs - clarifying how alignment reduces variability, how CoT promotes stable generations, and how base models can be steered away from diversity.
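The abstract does not spell out how BF is computed. One common way to express an "effective number of plausible next steps" is the exponential of the average Shannon entropy of the per-step next-token distributions, and the sketch below assumes that definition; the function name and the toy distributions are made up for illustration and are not taken from the paper.

```python
import math


def branching_factor(step_distributions: list[list[float]]) -> float:
    """exp(mean Shannon entropy in nats) over per-step next-token distributions.

    Assumed definition for illustration; the paper's exact estimator may differ.
    """
    entropies = []
    for probs in step_distributions:
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("each distribution must sum to 1")
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return math.exp(sum(entropies) / len(entropies))


# Toy next-token distributions (made up for illustration).
uniform_steps = [[0.25] * 4, [0.25] * 4]           # four equally likely tokens
peaked_steps = [[0.97, 0.01, 0.01, 0.01]] * 2      # one dominant token

BF_UNIFORM = branching_factor(uniform_steps)   # about 4: four real options
BF_PEAKED = branching_factor(peaked_steps)     # slightly above 1: almost no choice
```

Under this reading, a drop such as the abstract's "from 12 to 1.2" means the model goes from roughly a dozen live continuations per step to barely more than one.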
Article 874
Title@2025-06-22 (7): NestQuant: Post-Training Integer-Nesting Quantization for On-Device DNN
Title: NestQuant: Post-Training Integer-Nesting Quantization for On-Device DNN | NestQuant: Post-Training Integer-Nesting Quantization for On-Device DNN | NestQuant: 培训后DNN的整数 2506.17870v1 |
Authors (6): Jianhang Xie, Chuntao Ding, Xiaqing Li, Shenyuan Ren, Yidong Li, Zhichao Lu
Deploying quantized deep neural network (DNN) models with resource adaptation capabilities on ubiquitous Internet of Things (IoT) devices to provide high-quality AI services can leverage the benefits of compression and meet multi-scenario resource requirements. However, existing dynamic/mixed precision quantization requires retraining or special hardware, whereas post-training quantization (PTQ) has two limitations for resource adaptation: (i) state-of-the-art PTQ methods provide only one fixed-bitwidth model, which makes it challenging to adapt to the dynamic resources of IoT devices; (ii) deploying multiple PTQ models with diverse bitwidths consumes large storage resources and incurs switching overheads. To this end, this paper introduces a resource-friendly post-training integer-nesting quantization, i.e., NestQuant, for on-device quantized model switching on IoT devices. NestQuant incorporates integer weight decomposition, which splits quantized weights bit-wise into higher-bit and lower-bit weights of integer data types. It also contains a decomposed-weights nesting mechanism that optimizes the higher-bit weights by adaptive rounding and nests them into the original quantized weights. In deployment, we can send and store only one NestQuant model and switch between the full-bit and part-bit models by paging the lower-bit weights in or out to adapt to resource changes and reduce consumption. Experimental results on ImageNet-1K pretrained DNNs demonstrate that the NestQuant model achieves high top-1 accuracy and reduces data transmission, storage consumption, and switching overheads. In particular, ResNet-101 with INT8 nesting INT6 achieves 78.1% and 77.9% accuracy for the full-bit and part-bit models, respectively, and reduces switching overheads by approximately 78.1% compared with PTQ models of diverse bitwidths.
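The bit-wise split behind "INT8 nesting INT6" can be illustrated with plain shifts and masks. This is a minimal sketch of the decomposition idea only: the function names are hypothetical, and it uses simple truncation where the paper optimizes the higher-bit weights with adaptive rounding.

```python
def split_weight(w: int, total_bits: int = 8, high_bits: int = 6) -> tuple[int, int]:
    """Split a signed integer weight into (higher-bit part, lower-bit part)."""
    low_bits = total_bits - high_bits
    w_min, w_max = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    if not w_min <= w <= w_max:
        raise ValueError(f"weight {w} does not fit in INT{total_bits}")
    w_high = w >> low_bits               # arithmetic shift keeps the sign
    w_low = w & ((1 << low_bits) - 1)    # unsigned remainder in [0, 2**low_bits)
    return w_high, w_low


def full_bit_weight(w_high: int, w_low: int, low_bits: int = 2) -> int:
    """Reconstruct the original weight when the lower bits are paged in."""
    return (w_high << low_bits) + w_low


def part_bit_weight(w_high: int, low_bits: int = 2) -> int:
    """Approximate weight when the lower bits are paged out."""
    return w_high << low_bits


# Example with an arbitrary INT8 weight.
W = -77
W_HIGH, W_LOW = split_weight(W)
RESTORED = full_bit_weight(W_HIGH, W_LOW)   # identical to W
APPROX = part_bit_weight(W_HIGH)            # within 3 of W for a 2-bit remainder
```

Because the higher-bit part is itself a valid INT6 tensor, only the small lower-bit remainder has to be moved when switching between the two models.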
Article 875
Title@2025-06-22 (7): Geometric Contact Flows: Contactomorphisms for Dynamics and Control
Title: Geometric Contact Flows: Contactomorphisms for Dynamics and Control | Geometrische Kontaktflüsse: Kontaktomorphismen für Dynamik und Steuerung | 几何接触流动:动态和控制的接触形态 2506.17868v1 |
Authors (4): Andrea Testa, Søren Hauberg, Tamim Asfour, Leonel Rozo
Accurately modeling and predicting complex dynamical systems, particularly those involving force exchange and dissipation, is crucial for applications ranging from fluid dynamics to robotics, but presents significant challenges due to the intricate interplay of geometric constraints and energy transfer. This paper introduces Geometric Contact Flows (GCF), a novel framework leveraging Riemannian and Contact geometry as inductive biases to learn such systems. GCF constructs a latent contact Hamiltonian model encoding desirable properties like stability or energy conservation. An ensemble of contactomorphisms then adapts this model to the target dynamics while preserving these properties. This ensemble allows for uncertainty-aware geodesics that attract the system’s behavior toward the data support, enabling robust generalization and adaptation to unseen scenarios. Experiments on learning dynamics for physical systems and for controlling robots on interaction tasks demonstrate the effectiveness of our approach.
Article 876
Title@2025-06-22 (7): DeepMedcast: A Deep Learning Method for Generating Intermediate Weather Forecasts among Multiple NWP Models
Title: DeepMedcast: A Deep Learning Method for Generating Intermediate Weather Forecasts among Multiple NWP Models | DeepMedcast: Eine Deep-Learning-Methode zur Generierung von Zwischenwetterprognosen unter mehreren NWP-Modellen | 深气象:在多国家工作方案模型中生成中期天气预报的深层学习方法 2411.10010v2 |
Authors (1): Atsushi Kudo
Numerical weather prediction (NWP) centers around the world operate a variety of NWP models. In addition, recent advances in AI-driven NWP models have further increased the availability of NWP outputs. While this expansion holds the potential to improve forecast accuracy, it raises a critical question: which prediction is the most plausible? If the NWP models have comparable accuracy, it is impossible to determine in advance which one is the best. Traditional approaches, such as ensemble or weighted averaging, combine multiple NWP outputs to produce a single forecast with improved accuracy. However, they often result in meteorologically unrealistic and uninterpretable outputs, such as the splitting of tropical cyclone centers or frontal boundaries into multiple distinct systems. To address this issue, we propose DeepMedcast, a deep learning method that generates intermediate forecasts between two or more NWP outputs. Unlike averaging, DeepMedcast provides predictions in which meteorologically significant features – such as the locations of tropical cyclones, extratropical cyclones, fronts, and shear lines – approximately align with the arithmetic mean of the corresponding features predicted by the input NWP models, without distorting meteorological structures. We demonstrate the capability of DeepMedcast through case studies and verification results, showing that it produces realistic and interpretable forecasts with higher accuracy than the input NWP models. By providing plausible intermediate forecasts, DeepMedcast can significantly contribute to the efficiency and standardization of operational forecasting tasks, including general, marine, and aviation forecasts.
Article 877
Title@2025-06-22 (7): IGNIS: A Robust Neural Network Framework for Constrained Parameter Estimation in Archimedean Copulas
Title: IGNIS: A Robust Neural Network Framework for Constrained Parameter Estimation in Archimedean Copulas | IGNIS: Ein robustes neurales Netzwerk-Framework für eingeschränkte Parameterschätzungen in Archimedischen Copulas | IGNIS:Archimedean Copulas受控参数估计的强力神经网络框架 2505.22518v2 |
Authors (3): Agnideep Aich, Ashit Baran Aich, Bruce Wade
We introduce IGNIS, a deep-learning framework for constrained parameter estimation in Archimedean copulas with natural domain $\theta \geq 1$. While illustrated here on four families (Gumbel, Joe and the novel A1/A2 copulas), IGNIS is readily applicable to any one-parameter Archimedean model with $\theta \geq 1$. Classical estimators (Method of Moments (MoM), Maximum Likelihood Estimation (MLE), Maximum Pseudo-Likelihood (MPL)) break down on A1/A2 due to non-monotonic dependence mappings, steep likelihood gradients and the need for custom constraint handling. IGNIS sidesteps these issues by learning a direct mapping from four summary statistics (Kendall’s $\tau$, Spearman’s $\rho$, empirical 0.95 tail-dependence and Pearson $r$) plus a one-hot family indicator to $\theta$, ending in a softplus + 1 output layer that automatically enforces $\hat{\theta} \geq 1$. Trained on 500 simulated $\theta$ values per family (10000 observations each), IGNIS outperforms the Method of Moments in extensive simulations and delivers accurate, stable estimates on real-world AAPL-MSFT returns and CDC diabetes data. Our results demonstrate a unified, constraint-aware neural estimator for modern copula-based dependence modeling, easily extendable to any copula family respecting $\theta \geq 1$.
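The "softplus + 1" output layer is what guarantees $\hat{\theta} \geq 1$ regardless of the raw network output. A minimal sketch of that final mapping, with hypothetical function names and a numerically stable softplus (the network itself is omitted):

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def theta_from_logit(z: float) -> float:
    """Map an unconstrained network output z to an estimate theta_hat >= 1."""
    return 1.0 + softplus(z)


# Arbitrary raw outputs, from strongly negative to strongly positive.
RAW_OUTPUTS = (-50.0, -1.0, 0.0, 3.0, 50.0)
THETAS = [theta_from_logit(z) for z in RAW_OUTPUTS]
```

Since softplus is non-negative and increasing, the constraint holds by construction and no clipping or penalty term is needed during training.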
Article 878
Title@2025-06-22 (7): How Visual Representations Map to Language Feature Space in Multimodal LLMs
Title: How Visual Representations Map to Language Feature Space in Multimodal LLMs | Wie visuelle Darstellungen den Sprach-Feature-Raum in multimodalen LLMs anzeigen | 多模式LMM中语言特征空间的视觉图示图 2506.11976v2 |
Authors (5): Constantin Venhoff, Ashkan Khakzar, Sonia Joseph, Philip Torr, Neel Nanda
Effective multimodal reasoning depends on the alignment of visual and linguistic representations, yet the mechanisms by which vision-language models (VLMs) achieve this alignment remain poorly understood. Following the LiMBeR framework, we deliberately maintain a frozen large language model (LLM) and a frozen vision transformer (ViT), connected solely by training a linear adapter during visual instruction tuning. By keeping the language model frozen, we ensure it maintains its original language representations without adaptation to visual data. Consequently, the linear adapter must map visual features directly into the LLM’s existing representational space rather than allowing the language model to develop specialized visual understanding through fine-tuning. Our experimental design uniquely enables the use of pre-trained sparse autoencoders (SAEs) of the LLM as analytical probes. These SAEs remain perfectly aligned with the unchanged language model and serve as a snapshot of the learned language feature representations. Through systematic analysis of SAE reconstruction error, sparsity patterns, and SAE feature descriptions, we reveal the layer-wise progression through which visual representations gradually align with language feature representations, converging in middle-to-later layers. This suggests a fundamental misalignment between ViT outputs and early LLM layers, raising important questions about whether current adapter-based architectures optimally facilitate cross-modal representation learning.
Article 879
Title@2025-06-22 (7): Learning to Reason under Off-Policy Guidance
Title: Learning to Reason under Off-Policy Guidance | Unter außerpolitischer Anleitung zur Vernunft lernen | 根据非政策指导学习理由 2504.14945v5 |
Authors (8): Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang
Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning with verifiable rewards (RLVR). However, existing RLVR approaches are inherently “on-policy”, limiting learning to a model’s own outputs and failing to acquire reasoning abilities beyond its initial capabilities. To address this issue, we introduce LUFFY (Learning to reason Under oFF-policY guidance), a framework that augments RLVR with off-policy reasoning traces. LUFFY dynamically balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training. Specifically, LUFFY combines the Mixed-Policy GRPO framework, which has a theoretically guaranteed convergence rate, with policy shaping via regularized importance sampling to avoid superficial and rigid imitation during mixed-policy training. Compared with previous RLVR methods, LUFFY achieves an average gain of over +6.4 across six math benchmarks and an advantage of over +6.2 points on out-of-distribution tasks. Most significantly, we show that LUFFY successfully trains weak models in scenarios where on-policy RLVR completely fails. These results provide compelling evidence that LUFFY transcends the fundamental limitations of on-policy RLVR and demonstrates the great potential of utilizing off-policy guidance in RLVR.
Article 880
Title@2025-06-21 (6): FedBaF: Federated Learning Aggregation Biased by a Foundation Model
Title: FedBaF: Federated Learning Aggregation Biased by a Foundation Model | FedBaF: Federated Learning Aggregation Durch ein Stiftungsmodell biased | FedBAF: 联邦学习联合组织 2410.18352v3 |
Authors (4): Jong-Ik Park, Srinivasa Pranav, José M. F. Moura, Carlee Joe-Wong
Foundation models are now a major focus of leading technology organizations due to their ability to generalize across diverse tasks. Existing approaches for adapting foundation models to new applications often rely on Federated Learning (FL) and disclose the foundation model weights to clients when using it to initialize the global model. While these methods ensure client data privacy, they compromise model and information security. In this paper, we introduce Federated Learning Aggregation Biased by a Foundation Model (FedBaF), a novel method for dynamically integrating pre-trained foundation model weights during the FL aggregation phase. Unlike conventional methods, FedBaF preserves the confidentiality of the foundation model while still leveraging its power to train more accurate models, especially in non-IID and adversarial scenarios. Our comprehensive experiments use Pre-ResNet and foundation models like Vision Transformer to demonstrate that FedBaF not only matches, but often surpasses the test accuracy of traditional weight initialization methods by up to 11.4% in IID and up to 15.8% in non-IID settings. Additionally, FedBaF applied to a Transformer-based language model significantly reduced perplexity by up to 39.2%.
Article 881
Title@2025-06-21 (6): AbRank: A Benchmark Dataset and Metric-Learning Framework for Antibody-Antigen Affinity Ranking
Title: AbRank: A Benchmark Dataset and Metric-Learning Framework for Antibody-Antigen Affinity Ranking | AbRank: Benchmark Dataset und Metric-Learning Framework für Antikörper-Antigen-Affinitätsranking | AbRank:抗体-安提gen同系物排序基准数据集和计量-学习框架 2506.17857v1 |
Authors (7): Chunan Liu, Aurelien Pelissier, Yanjun Shao, Lilian Denzler, Andrew C. R. Martin, Brooks Paige, Mariia Rodriguez Martinez
Accurate prediction of antibody-antigen (Ab-Ag) binding affinity is essential for therapeutic design and vaccine development, yet the performance of current models is limited by noisy experimental labels, heterogeneous assay conditions, and poor generalization across the vast antibody and antigen sequence space. We introduce AbRank, a large-scale benchmark and evaluation framework that reframes affinity prediction as a pairwise ranking problem. AbRank aggregates over 380,000 binding assays from nine heterogeneous sources, spanning diverse antibodies, antigens, and experimental conditions, and introduces standardized data splits that systematically increase distribution shift, from local perturbations such as point mutations to broad generalization across novel antigens and antibodies. To ensure robust supervision, AbRank defines an m-confident ranking framework by filtering out comparisons with marginal affinity differences, focusing training on pairs with at least an m-fold difference in measured binding strength. As a baseline for the benchmark, we introduce WALLE-Affinity, a graph-based approach that integrates protein language model embeddings with structural information to predict pairwise binding preferences. Our benchmarks reveal significant limitations in current methods under realistic generalization settings and demonstrate that ranking-based training improves robustness and transferability. In summary, AbRank offers a robust foundation for machine learning models to generalize across the antibody-antigen space, with direct relevance for scalable, structure-aware antibody therapeutic design.
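The m-confident filtering step can be sketched as a pair generator that keeps only comparisons with at least an m-fold affinity gap. The sketch assumes affinities are given as dissociation constants $K_d$ (lower means stronger binding); the function name, identifiers, and values are hypothetical and not drawn from the AbRank data.

```python
from itertools import combinations


def m_confident_pairs(affinities: dict[str, float], m: float) -> list[tuple[str, str]]:
    """Return (stronger, weaker) pairs whose K_d values differ by at least m-fold.

    `affinities` maps a complex id to a positive K_d (assumed convention:
    lower K_d means stronger binding).
    """
    if m < 1:
        raise ValueError("m must be >= 1")
    if any(kd <= 0 for kd in affinities.values()):
        raise ValueError("K_d values must be positive")
    pairs = []
    for (id_a, kd_a), (id_b, kd_b) in combinations(affinities.items(), 2):
        # Order the pair so that the stronger binder (smaller K_d) comes first.
        (strong_id, strong_kd), (weak_id, weak_kd) = sorted(
            [(id_a, kd_a), (id_b, kd_b)], key=lambda item: item[1])
        if weak_kd / strong_kd >= m:
            pairs.append((strong_id, weak_id))
    return pairs


# Hypothetical K_d values in nM, made up for illustration.
TOY = {"ab1": 1.0, "ab2": 1.5, "ab3": 20.0, "ab4": 250.0}
PAIRS = m_confident_pairs(TOY, m=10.0)
```

With m = 10, the near-tie between ab1 and ab2 is dropped, so a ranking model is never asked to order two measurements that differ by less than plausible assay noise.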
Article 882
Title@2025-06-21 (6): Bayesian Inference for Left-Truncated Log-Logistic Distributions for Time-to-event Data Analysis
Title: Bayesian Inference for Left-Truncated Log-Logistic Distributions for Time-to-event Data Analysis | Bayesische Schlussfolgerung für links-beschnittene Log-Logistic-Distributionen für die Zeit-zu-Ereignis-Datenanalyse | 用于时间到活动数据分析的左排出日志分布的贝叶斯推理 2506.17852v1 |
Authors (4): Fahad Mostafa, Md Rejuan Haque, Md Mostafijur Rahman, Farzana Nasrin
Parameter estimation is a foundational step in statistical modeling, enabling us to extract knowledge from data and apply it effectively. Bayesian estimation combines prior beliefs with observed data to infer distribution parameters probabilistically and robustly. Moreover, it provides full posterior distributions, allowing uncertainty quantification and regularization, which is especially useful in small or truncated samples. The left-truncated log-logistic (LTLL) distribution is particularly well-suited for modeling time-to-event data where observations are subject to a known lower bound, such as precipitation data and cancer survival times. In this paper, we propose a Bayesian approach for estimating the parameters of the LTLL distribution with a fixed truncation point $x_L > 0$. Given a random variable $X \sim LL(\alpha, \beta; x_L)$, where $\alpha > 0$ is the scale parameter and $\beta > 0$ is the shape parameter, the likelihood function is derived based on a truncated sample $X_1, X_2, \dots, X_N$ with $X_i > x_L$. We assume independent prior distributions for the parameters, and posterior inference is conducted via Markov Chain Monte Carlo sampling, specifically using the Metropolis-Hastings algorithm to obtain posterior estimates $\hat{\alpha}$ and $\hat{\beta}$. Through simulation studies and real-world applications, we demonstrate that Bayesian estimation provides more stable and reliable parameter estimates, particularly when the likelihood surface is irregular due to left truncation. The results highlight the advantages of Bayesian inference for estimating parameter uncertainty in truncated distributions for time-to-event data analysis.
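A minimal sketch of the kind of sampler described, using the standard log-logistic density divided by its survival function at the truncation point. The exponential priors, the log-scale random-walk proposal, the step size, and the simulated data are all assumptions made for illustration; the abstract does not specify the authors' choices.

```python
import math
import random


def ltll_loglik(alpha: float, beta: float, data: list[float], x_l: float) -> float:
    """Log-likelihood of a log-logistic(alpha, beta) sample left-truncated at x_l."""
    if alpha <= 0 or beta <= 0:
        return -math.inf
    # Survival at the truncation point: S(x_l) = 1 / (1 + (x_l / alpha) ** beta).
    log_surv_xl = -math.log1p((x_l / alpha) ** beta)
    total = 0.0
    for x in data:
        # log f(x) = log(beta/alpha) + (beta-1) log(x/alpha) - 2 log(1 + (x/alpha)^beta)
        total += (math.log(beta / alpha) + (beta - 1.0) * math.log(x / alpha)
                  - 2.0 * math.log1p((x / alpha) ** beta))
    # The truncated density is f(x) / S(x_l), so subtract log S(x_l) per observation.
    return total - len(data) * log_surv_xl


def log_prior(alpha: float, beta: float) -> float:
    """Independent Exponential(rate=0.1) priors (an assumed, weak choice)."""
    if alpha <= 0 or beta <= 0:
        return -math.inf
    return -0.1 * alpha - 0.1 * beta


def sample_ltll(alpha: float, beta: float, x_l: float, n: int,
                rng: random.Random) -> list[float]:
    """Inverse-survival sampling from the left-truncated distribution."""
    surv_xl = 1.0 / (1.0 + (x_l / alpha) ** beta)
    draws = []
    for _ in range(n):
        s = surv_xl * (1.0 - rng.random())     # survival value in (0, S(x_l)]
        draws.append(alpha * ((1.0 - s) / s) ** (1.0 / beta))
    return draws


def metropolis_hastings(data: list[float], x_l: float, n_iter: int = 3000,
                        step: float = 0.05, seed: int = 0) -> list[tuple[float, float]]:
    """Random-walk Metropolis-Hastings on (log alpha, log beta)."""
    rng = random.Random(seed)
    alpha, beta = 1.0, 1.0
    log_post = ltll_loglik(alpha, beta, data, x_l) + log_prior(alpha, beta)
    chain = []
    for _ in range(n_iter):
        # Multiplicative proposals keep both parameters positive.
        a_new = alpha * math.exp(step * rng.gauss(0.0, 1.0))
        b_new = beta * math.exp(step * rng.gauss(0.0, 1.0))
        log_post_new = ltll_loglik(a_new, b_new, data, x_l) + log_prior(a_new, b_new)
        # Hastings correction for the log-scale proposal: (a_new * b_new) / (alpha * beta).
        log_accept = log_post_new - log_post + math.log(a_new * b_new / (alpha * beta))
        if math.log(1.0 - rng.random()) < log_accept:
            alpha, beta, log_post = a_new, b_new, log_post_new
        chain.append((alpha, beta))
    return chain


# Simulated check with assumed true values alpha = 2, beta = 3, x_L = 1.
X_L = 1.0
DATA = sample_ltll(alpha=2.0, beta=3.0, x_l=X_L, n=400, rng=random.Random(1))
CHAIN = metropolis_hastings(DATA, X_L)
KEPT = CHAIN[1000:]                                  # discard burn-in
ALPHA_HAT = sum(a for a, _ in KEPT) / len(KEPT)      # posterior mean of alpha
BETA_HAT = sum(b for _, b in KEPT) / len(KEPT)       # posterior mean of beta
```

The retained draws in `KEPT` approximate the joint posterior, so credible intervals for either parameter can be read off from their empirical quantiles.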
Article 883
Title@2025-06-21 (6): Pathway-based Progressive Inference (PaPI) for Energy-Efficient Continual Learning
Title: Pathway-based Progressive Inference (PaPI) for Energy-Efficient Continual Learning | Pathway-based Progressive Inferenz (PaPI) für energieeffizientes kontinuierliches Lernen | 能源效率连续不断学习基于途径的渐进推论(PAPI) 2506.17848v1 |
Authors (3): Suyash Gaurav, Jukka Heikkonen, Jatin Chaudhary
Continual learning systems face the dual challenge of preventing catastrophic forgetting while maintaining energy efficiency, particularly in resource-constrained environments. This paper introduces Pathway-based Progressive Inference (PaPI), a novel theoretical framework that addresses these challenges through a mathematically rigorous approach to pathway selection and adaptation. We formulate continual learning as an energy-constrained optimization problem and provide formal convergence guarantees for our pathway routing mechanisms. Our theoretical analysis demonstrates that PaPI achieves an $\mathcal{O}(K)$ improvement in the stability-plasticity trade-off compared to monolithic architectures, where $K$ is the number of pathways. We derive tight bounds on forgetting rates using Fisher Information Matrix analysis and prove that PaPI’s energy consumption scales with the number of active parameters rather than the total model size. Comparative theoretical analysis shows that PaPI provides stronger guarantees against catastrophic forgetting than Elastic Weight Consolidation (EWC) while maintaining better energy efficiency than both EWC and Gradient Episodic Memory (GEM). Our experimental validation confirms these theoretical advantages across multiple benchmarks, demonstrating PaPI’s effectiveness for continual learning in energy-constrained settings. Our code is available at https://github.com/zser092/PAPI_FILES.
Article 884
Title@2025-06-21 (6): A Comparative Study of Open-Source Libraries for Synthetic Tabular Data Generation: SDV vs. SynthCity
Title: A Comparative Study of Open-Source Libraries for Synthetic Tabular Data Generation: SDV vs. SynthCity | Eine vergleichende Studie von Open-Source-Bibliotheken für die synthetische tabellarische Datengenerierung: SDV vs. SynthCity | 对用于合成图表数据生成的开放源码图书馆的比较研究:SDV诉合成城市 2506.17847v1 |
Authors (1): Cristian Del Gobbo
High-quality training data is critical to the performance of machine learning models, particularly Large Language Models (LLMs). However, obtaining real, high-quality data can be challenging, especially for smaller organizations and early-stage startups. Synthetic data generators provide a promising solution by replicating the statistical and structural properties of real data while preserving privacy and scalability. This study evaluates the performance of six tabular synthetic data generators from two widely used open-source libraries: SDV (Gaussian Copula, CTGAN, TVAE) and SynthCity (Bayesian Network, CTGAN, TVAE). Using a real-world dataset from the UCI Machine Learning Repository, comprising energy consumption and environmental variables from Belgium, we simulate a low-data regime by training models on only 1,000 rows. Each generator is then tasked with producing synthetic datasets under two conditions: a 1:1 (1,000 rows) and a 1:10 (10,000 rows) input-output ratio. Evaluation is conducted using two criteria: statistical similarity, measured via classical statistics and distributional metrics; and predictive utility, assessed using a “Train on Synthetic, Test on Real” approach with four regression models. While statistical similarity remained consistent across models in both scenarios, predictive utility declined notably in the 1:10 case. The Bayesian Network from SynthCity achieved the highest fidelity in both scenarios, while TVAE from SDV performed best in predictive tasks under the 1:10 setting. Although no significant performance gap was found between the two libraries, SDV stands out for its superior documentation and ease of use, making it more accessible for practitioners.
Article 885
Title@2025-06-21 (6): Causal Spherical Hypergraph Networks for Modelling Social Uncertainty
Title: Causal Spherical Hypergraph Networks for Modelling Social Uncertainty | Causal Spherical Hypergraph Networks for Modeling Social Uncertainty | 社会不确定性建模模型化的因果球面高光谱网络 2506.17840v1 |
Authors (2): Anoushka Harit, Zhongtian Sun
Human social behaviour is governed by complex interactions shaped by uncertainty, causality, and group dynamics. We propose Causal Spherical Hypergraph Networks (Causal-SphHN), a principled framework for socially grounded prediction that jointly models higher-order structure, directional influence, and epistemic uncertainty. Our method represents individuals as hyperspherical embeddings and group contexts as hyperedges, capturing semantic and relational geometry. Uncertainty is quantified via Shannon entropy over von Mises-Fisher distributions, while temporal causal dependencies are identified using Granger-informed subgraphs. Information is propagated through an angular message-passing mechanism that respects belief dispersion and directional semantics. Experiments on SNARE (offline networks), PHEME (online discourse), and AMIGOS (multimodal affect) show that Causal-SphHN improves predictive accuracy, robustness, and calibration over strong baselines. Moreover, it enables interpretable analysis of influence patterns and social ambiguity. This work contributes a unified causal-geometric approach for learning under uncertainty in dynamic social environments.
Article 886
Title@2025-06-21 (6): Evaluating Rank-N-Contrast: Continuous and Robust Representations for Regression
Title: Evaluating Rank-N-Contrast: Continuous and Robust Representations for Regression | Bewertung von Rank-N-Kontrast: Kontinuierliche und robuste Darstellungen für Regression | 评价排名-N-Contrast:持续和有力的倒退代表 2411.16298v3 |
Authors (3): Valentin Six, Alexandre Chidiac, Arkin Worlikar
This document is an evaluation of the original “Rank-N-Contrast” (arXiv:2210.01189v2) paper published in 2023. This evaluation is done for academic purposes. Deep regression models often fail to capture the continuous nature of sample orders, creating fragmented representations and suboptimal performance. To address this, we reproduced the Rank-N-Contrast (RNC) framework, which learns continuous representations by contrasting samples by their rankings in the target space. Our study validates RNC’s theoretical and empirical benefits, including improved performance and robustness. We extended the evaluation to an additional regression dataset and conducted robustness tests using a holdout method, where a specific range of continuous data was excluded from the training set. This approach assessed the model’s ability to generalize to unseen data and achieve state-of-the-art performance. This replication study validates the original findings and broadens the understanding of RNC’s applicability and robustness.
Article 887
Title@2025-06-21 (6): Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking
Title: Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking | Leveling the Playing Field: Klassische und gelernte Controller für Quadrotor Trajectory Tracking sorgfältig miteinander vergleichen | 平整播放字段: 仔细比较用于四重奏轨迹跟踪的古典和中学主计长 2506.17832v1 |
Authors (4): Pratik Kunapuli, Jake Welde, Dinesh Jayaraman, Vijay Kumar
Learning-based control approaches like reinforcement learning (RL) have recently produced a slew of impressive results for tasks like quadrotor trajectory tracking and drone racing. Naturally, it is common to demonstrate the advantages of these new controllers against established methods like analytical controllers. We observe, however, that reliably comparing the performance of such very different classes of controllers is more complicated than might appear at first sight. As a case study, we take up the problem of agile tracking of an end-effector for a quadrotor with a fixed arm. We develop a set of best practices for synthesizing the best-in-class RL and geometric controllers (GC) for benchmarking. In the process, we resolve widespread RL-favoring biases in prior studies that provide asymmetric access to: (1) the task definition, in the form of an objective function, (2) representative datasets, for parameter optimization, and (3) feedforward information, describing the desired future trajectory. The resulting findings are the following: our improvements to the experimental protocol for comparing learned and classical controllers are critical, and each of the above asymmetries can yield misleading conclusions. Prior works have claimed that RL outperforms GC, but we find the gaps between the two controller classes are much smaller than previously published when accounting for symmetric comparisons. Geometric control achieves lower steady-state error than RL, while RL has better transient performance, resulting in GC performing better in relatively slow or less agile tasks, but RL performing better when greater agility is required. Finally, we open-source implementations of geometric and RL controllers for these aerial vehicles, implementing best practices for future development. The website and code are available at https://pratikkunapuli.github.io/rl-vs-gc/
Article 888
Title@2025-06-21 (6): Sharper Bounds for Chebyshev Moment Matching, with Applications
Title: Sharper Bounds for Chebyshev Moment Matching, with Applications | Scharfere Bounds für Chebyshev Moment Matching, mit Anwendungen | Chebyshev Moment 配配, 与应用程序 2408.12385v2 |
Authors (4): Cameron Musco, Christopher Musco, Lucas Rosenblatt, Apoorv Vikram Singh
We study the problem of approximately recovering a probability distribution given noisy measurements of its Chebyshev polynomial moments. This problem arises broadly across algorithms, statistics, and machine learning. By leveraging a global decay bound on the coefficients in the Chebyshev expansion of any Lipschitz function, we sharpen prior work, proving that accurate recovery in the Wasserstein distance is possible with more noise than previously known. Our result immediately yields a number of applications: 1) We give a simple “linear query” algorithm for constructing a differentially private synthetic data distribution with Wasserstein-$1$ error $\tilde{O}(1/n)$ based on a dataset of $n$ points in $[-1,1]$. This bound is optimal up to log factors, and matches a recent result of Boedihardjo, Strohmer, and Vershynin [Probab. Theory. Rel., 2024], which uses a more complex “superregular random walk” method. 2) We give an $\tilde{O}(n^2/\epsilon)$ time algorithm for the linear algebraic problem of estimating the spectral density of an $n\times n$ symmetric matrix up to $\epsilon$ error in the Wasserstein distance. Our result accelerates prior methods from Chen et al. [ICML 2021] and Braverman et al. [STOC 2022]. 3) We tighten an analysis of Vinayak, Kong, Valiant, and Kakade [ICML 2019] on the maximum likelihood estimator for the statistical problem of “Learning Populations of Parameters”, extending the parameter regime in which sample optimal results can be obtained. Beyond these main results, we provide an extension of our bound to estimating distributions in $d > 1$ dimensions. We hope that these bounds will find applications more broadly to problems involving distribution recovery from noisy moment information.
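To make the measured quantity concrete, the sketch below computes empirical Chebyshev moments of a dataset on $[-1,1]$, using the identity $T_k(x) = \cos(k \arccos x)$. It shows only the moments themselves, not the recovery algorithm or the noise model from the paper. The function name `chebyshev_moments` is an assumption made for this sketch.

```python
import math


def chebyshev_moments(samples, degree):
    """Empirical moments E[T_k(x)] for k = 0..degree, with every x in [-1, 1]."""
    n = len(samples)
    return [
        # T_k(x) = cos(k * arccos(x)) on [-1, 1].
        sum(math.cos(k * math.acos(x)) for x in samples) / n
        for k in range(degree + 1)
    ]


moments = chebyshev_moments([-1.0, 0.0, 1.0], degree=2)
```

In the setting the abstract describes, a recovery procedure would see noisy versions of these values and aim for a distribution close to the original in Wasserstein distance.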
Article 889
Title@2025-06-21 (6): Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach
Title: Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach | Ausrichten von gefrorenen LLMs durch Verstärkungslernen: Ein iteratives Reweight-then-Optimize-Ansatz | 通过强化学习将冻结的LLMs与 “ 强化学习:一种过渡性再加权再优化方法 “ 相匹配 2506.17828v1 |
Authors (9): Xinnan Zhang, Chenliang Li, Siliang Zeng, Jiaxiang Li, Zhongruo Wang, Kaixiang Lin, Songtao Lu, Alfredo Garcia, Mingyi Hong
Aligning large language models (LLMs) with human preferences usually requires fine-tuning methods such as RLHF and DPO. These methods directly optimize the model parameters, so they cannot be used at test time to improve model performance, nor are they applicable when the model weights are not accessible. In contrast, test-time methods sidestep weight updates by leveraging reward functions to guide and improve output quality. However, they incur high inference costs, and their one-shot guidance is often based on imperfect reward or value functions, leading to suboptimal outputs. In this work, we present a method named Iterative Reweight-then-Optimize (IRO), a reinforcement learning (RL) framework that performs RL-style alignment of the (frozen) base model without touching its parameters. During training, each iteration (i) samples candidates from the base model, (ii) resamples using current value functions, and (iii) trains a new lightweight value function that guides the next decoding pass. At test time, the value functions are used to guide the base model generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI’s reinforcement fine-tuning (RFT), but without requiring access to the model weights.
Article 890
Title@2025-06-21 (6): Actionable Interpretability via Causal Hypergraphs: Unravelling Batch Size Effects in Deep Learning
Title: Actionable Interpretability via Causal Hypergraphs: Unravelling Batch Size Effects in Deep Learning | Durch Causal-Hypergraphen praktikable Interpretierbarkeit: Entwirren von Batch-Größeneffekten im Deep Learning | 通过Causal Hyphriphes:深学习中不易破坏的批量大小效应 2506.17826v1 |
Authors (3): Zhongtian Sun, Anoushka Harit, Pietro Lio
While the impact of batch size on generalisation is well studied in vision tasks, its causal mechanisms remain underexplored in graph and text domains. We introduce a hypergraph-based causal framework, HGCNet, that leverages deep structural causal models (DSCMs) to uncover how batch size influences generalisation via gradient noise, minima sharpness, and model complexity. Unlike prior approaches based on static pairwise dependencies, HGCNet employs hypergraphs to capture higher-order interactions across training dynamics. Using do-calculus, we quantify direct and mediated effects of batch size interventions, providing interpretable, causally grounded insights into optimisation. Experiments on citation networks, biomedical text, and e-commerce reviews show that HGCNet outperforms strong baselines including GCN, GAT, PI-GNN, BERT, and RoBERTa. Our analysis reveals that smaller batch sizes causally enhance generalisation through increased stochasticity and flatter minima, offering actionable interpretability to guide training strategies in deep learning. This work positions interpretability as a driver of principled architectural and optimisation choices beyond post hoc analysis.
Article 891
Title@2025-06-21 (6): Quantum-Hybrid Support Vector Machines for Anomaly Detection in Industrial Control Systems
Title: Quantum-Hybrid Support Vector Machines for Anomaly Detection in Industrial Control Systems | Quanten-Hybrid-Unterstützung Vektormaschinen für Anomalieerkennung in industriellen Steuerungssystemen | 用于在工业控制系统中异常探测的量子-湿性支持矢量机 2506.17824v1 |
Authors (4): Tyler Cultice, Md. Saif Hassan Onim, Annarita Giani, Himanshu Thapliyal
Sensitive data captured by Industrial Control Systems (ICS) play a large role in the safety and integrity of many critical infrastructures. Detection of anomalous or malicious data, or Anomaly Detection (AD), with machine learning is one of many vital components of cyberphysical security. Quantum kernel-based machine learning methods have shown promise in identifying complex anomalous behavior by leveraging the highly expressive and efficient feature spaces of quantum computing. This study focuses on the parameterization of Quantum Hybrid Support Vector Machines (QSVMs) using three popular datasets from Cyber-Physical Systems (CPS). The results demonstrate that QSVMs outperform traditional classical kernel methods, achieving 13.3% higher F1 scores. Additionally, this research investigates noise using simulations based on real IBMQ hardware, revealing a maximum error of only 0.98% in the QSVM kernels. This error results in an average reduction of 1.57% in classification metrics. Furthermore, the study found that QSVMs show a 91.023% improvement in kernel-target alignment compared to classical methods, indicating a potential “quantum advantage” in anomaly detection for critical infrastructures. This effort suggests that QSVMs can provide a substantial advantage in anomaly detection for ICS, ultimately enhancing the security and integrity of critical infrastructures.
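The abstract does not define kernel-target alignment. The sketch below assumes the standard definition, the normalized Frobenius inner product between the kernel matrix $K$ and the ideal kernel $yy^\top$ built from labels in $\{-1,+1\}$. The function name and the plain-list matrix representation are assumptions made for this sketch.

```python
import math


def kernel_target_alignment(kernel, labels):
    """Alignment <K, y y^T>_F / (||K||_F * ||y y^T||_F) for labels in {-1, +1}."""
    n = len(labels)
    inner = sum(
        kernel[i][j] * labels[i] * labels[j] for i in range(n) for j in range(n)
    )
    kernel_norm = math.sqrt(
        sum(kernel[i][j] ** 2 for i in range(n) for j in range(n))
    )
    # The Frobenius norm of y y^T equals the sum of squared labels.
    target_norm = sum(y * y for y in labels)
    return inner / (kernel_norm * target_norm)


labels = [1, -1, 1]
ideal = [[a * b for b in labels] for a in labels]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
ideal_score = kernel_target_alignment(ideal, labels)
identity_score = kernel_target_alignment(identity, labels)
```

A kernel that matches the label structure exactly scores 1, while an uninformative kernel such as the identity scores lower.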
Article 892
Title@2025-06-21 (6): Learning to Dock: A Simulation-based Study on Closing the Sim2Real Gap in Autonomous Underwater Docking
Title: Learning to Dock: A Simulation-based Study on Closing the Sim2Real Gap in Autonomous Underwater Docking | Dock lernen: Eine simulationsbasierte Studie zum Schließen der Sim2Real Gap im autonomen Unterwasser-Docking | 学到码头:模拟研究,研究如何缩小自来自来自来自来自来水库中的Sim2实时差距 2506.17823v1 |
Authors (5): Kevin Chang, Rakesh Vivekanandan, Noah Pragin, Sean Bullock, Geoffrey Hollinger
Autonomous Underwater Vehicle (AUV) docking in dynamic and uncertain environments is a critical challenge for underwater robotics. Reinforcement learning is a promising method for developing robust controllers, but the disparity between training simulations and the real world, or the sim2real gap, often leads to a significant deterioration in performance. In this work, we perform a simulation study on reducing the sim2real gap in autonomous docking through training various controllers and then evaluating them under realistic disturbances. In particular, we focus on the real-world challenge of docking under different payloads that are potentially outside the original training distribution. We explore existing methods for improving robustness including randomization techniques and history-conditioned controllers. Our findings provide insights into mitigating the sim2real gap when training docking controllers. Furthermore, our work indicates areas of future research that may be beneficial to the marine robotics community.
Article 893
Title@2025-06-21 (6): CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning
Title: CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning | CultureMERT: Kontinuierliches Pre-Training für kulturübergreifendes Musikrepräsentanz-Lernen | CUCMERT: 不同文化间音乐代表制学习的继续预培训 2506.17818v1 |
Authors (3): Angelos-Nikolaos Kanatas, Charilaos Papaioannou, Alexandros Potamianos
Recent advances in music foundation models have improved audio representation learning, yet their effectiveness across diverse musical traditions remains limited. We introduce CultureMERT-95M, a multi-culturally adapted foundation model developed to enhance cross-cultural music representation learning and understanding. To achieve this, we propose a two-stage continual pre-training strategy that integrates learning rate re-warming and re-decaying, enabling stable adaptation even with limited computational resources. Training on a 650-hour multi-cultural data mix, comprising Greek, Turkish, and Indian music traditions, results in an average improvement of 4.9% in ROC-AUC and AP across diverse non-Western music auto-tagging tasks, surpassing prior state-of-the-art, with minimal forgetting on Western-centric benchmarks. We further investigate task arithmetic, an alternative approach to multi-cultural adaptation that merges single-culture adapted models in the weight space. Task arithmetic performs on par with our multi-culturally trained model on non-Western auto-tagging tasks and shows no regression on Western datasets. Cross-cultural evaluation reveals that single-culture models transfer with varying effectiveness across musical traditions, whereas the multi-culturally adapted model achieves the best overall performance. To support research on world music representation learning, we publicly release CultureMERT-95M and CultureMERT-TA-95M, fostering the development of more culturally aware music foundation models.
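Merging in weight space by task arithmetic can be sketched as adding scaled "task vectors" (adapted weights minus base weights) back onto the base model. This is a generic illustration of the technique, not the released CultureMERT-TA-95M procedure. The function name, the `scale` coefficient, and the dictionary-of-lists weight format are assumptions made for this sketch.

```python
def merge_by_task_arithmetic(base, adapted_models, scale=1.0):
    """Return base + scale * sum_i (adapted_i - base), parameter by parameter.

    base:           dict mapping parameter name -> list of floats
    adapted_models: list of dicts with the same names and shapes as base
    """
    merged = {}
    for name, base_weights in base.items():
        merged[name] = [
            # Sum each model's task vector entry, then add it to the base weight.
            w + scale * sum(model[name][idx] - w for model in adapted_models)
            for idx, w in enumerate(base_weights)
        ]
    return merged


base = {"w": [1.0, 2.0]}
single_culture = [{"w": [2.0, 2.0]}, {"w": [1.0, 4.0]}]
merged = merge_by_task_arithmetic(base, single_culture)
half = merge_by_task_arithmetic(base, single_culture, scale=0.5)
```

A scale below 1 damps each task vector's contribution. This kind of coefficient is commonly tuned on held-out data, though the abstract does not say how it was chosen here.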
Article 894
Title@2025-06-21 (6): Secure Energy Transactions Using Blockchain Leveraging AI for Fraud Detection and Energy Market Stability
Title: Secure Energy Transactions Using Blockchain Leveraging AI for Fraud Detection and Energy Market Stability | Sichere Energietransaktionen mittels Blockchain-Leveraging-KI für Betrugserkennung und Energiemarktstabilität | 利用安全能源交易利用 “ 利用全链利用AI “ 来欺诈侦查和能源市场稳定 2506.19870v1 |
Authors (10): Md Asif Ul Hoq Khan, MD Zahedul Islam, Istiaq Ahmed, Md Masud Karim Rabbi, Farhana Rahman Anonna, MD Abdul Fahim Zeeshan, Mehedi Hasan Ridoy, Bivash Ranjan Chowdhury, Md Nazmul Shakir Rabbi, GM Alamin Sadnan
Peer-to-peer trading and the move to decentralized grids have reshaped the energy markets in the United States. Nevertheless, such developments lead to new challenges, mainly regarding the safety and authenticity of energy trade. This study aimed to develop and build a secure, intelligent, and efficient energy transaction system for the decentralized US energy market. This research interlinks the technological prowess of blockchain and artificial intelligence (AI) in a novel way to solve long-standing challenges in the distributed energy market, specifically those of security, fraudulent behavior detection, and market reliability. The dataset for this research comprises more than 1.2 million anonymized energy transaction records from a simulated peer-to-peer (P2P) energy exchange network emulating real-life blockchain-based American microgrids, including those tested by LO3 Energy and Grid+ Labs. Each record contains detailed fields of transaction identifier, timestamp, energy volume (kWh), transaction type (buy/sell), unit price, prosumer/consumer identifier (hashed for privacy), smart meter readings, geolocation regions, and settlement confirmation status. The dataset also includes system-calculated behavior metrics of transaction rate, variability of energy production, and historical pricing patterns. The proposed system architecture integrates two layers, namely a blockchain layer and an artificial intelligence (AI) layer, each playing a unique but complementary role in securing energy transactions and improving market intelligence. The machine learning models used in this research were chosen for their established high performance in classification tasks, specifically in the identification of energy transaction fraud in decentralized markets.
Article 895
Title@2025-06-21 (6): Flatness After All?
Title: Flatness After All? | Flachheit nach allem? | 终究是平坦吗? 2506.17809v1 |
Authors (3): Neta Shoham, Liron Mor-Yosef, Haim Avron
Recent literature has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized networks. A key observation is that “flat” minima tend to generalize better than “sharp” minima. While this idea is supported by empirical evidence, it has also been shown that deep networks can generalize even with arbitrary sharpness, as measured by either the trace or the spectral norm of the Hessian. In this paper, we argue that generalization could be assessed by measuring flatness using a soft rank measure of the Hessian. We show that when the common neural network model (neural network with exponential family negative log likelihood loss) is calibrated, and its prediction error and its confidence in the prediction are not correlated with the first and the second derivatives of the network’s output, our measure accurately captures the asymptotic expected generalization gap. For non-calibrated models, we connect our flatness measure to the well-known Takeuchi Information Criterion and show that it still provides reliable estimates of generalization gaps for models that are not overly confident. Experimental results indicate that our approach offers a robust estimate of the generalization gap compared to baselines.
Article 896
Title@2025-06-21 (6): Reimagining Parameter Space Exploration with Diffusion Models
Title: Reimagining Parameter Space Exploration with Diffusion Models | Reimagining Parameter-Weltraumforschung mit Diffusionsmodellen | 利用扩散模型重新想象参数空间探索 2506.17807v1 |
Authors (3): Lijun Zhang, Xiao Liu, Hui Guan
Adapting neural networks to new tasks typically requires task-specific fine-tuning, which is time-consuming and reliant on labeled data. We explore a generative alternative that produces task-specific parameters directly from task identity, eliminating the need for task-specific training. To this end, we propose using diffusion models to learn the underlying structure of effective task-specific parameter space and synthesize parameters on demand. Once trained, the task-conditioned diffusion model can generate specialized weights directly from task identifiers. We evaluate this approach across three scenarios: generating parameters for a single seen task, for multiple seen tasks, and for entirely unseen tasks. Experiments show that diffusion models can generate accurate task-specific parameters and support multi-task interpolation when parameter subspaces are well-structured, but fail to generalize to unseen tasks, highlighting both the potential and limitations of this generative solution.
Article 897
Title@2025-06-21 (6): AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator
Title: AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator | AdRo-FL: Informierte und sichere Kundenauswahl für das Federated Learning in der Gegenwart von Adversarial Aggregator | ADRO-FL:在存在反versarial聚合体的情况下,为联邦学习进行知情和安全的客户选择 2506.17805v1 |
Authors (5): Md. Kamrul Hossain, Walid Aljoby, Anis Elgabli, Ahmed M. Abdelmoniem, Khaled A. Harras
Federated Learning (FL) enables collaborative learning without exposing clients’ data. While clients only share model updates with the aggregator, studies reveal that aggregators can infer sensitive information from these updates. Secure Aggregation (SA) protects individual updates during transmission; however, recent work demonstrates a critical vulnerability where adversarial aggregators manipulate client selection to bypass SA protections, constituting a Biased Selection Attack (BSA). Although verifiable random selection prevents BSA, it precludes informed client selection essential for FL performance. We propose Adversarial Robust Federated Learning (AdRo-FL), which simultaneously enables informed client selection based on client utility and robust defense against BSA while maintaining privacy-preserving aggregation. AdRo-FL implements two client selection frameworks tailored for distinct settings. The first framework assumes clients are grouped into clusters based on mutual trust, such as different branches of an organization. The second framework handles distributed clients where no trust relationships exist between them. For the cluster-oriented setting, we propose a novel defense against BSA by (1) enforcing a minimum client selection quota from each cluster, supervised by a cluster-head in every round, and (2) introducing a client utility function to prioritize efficient clients. For the distributed setting, we design a two-phase selection protocol: first, the aggregator selects the top clients based on our utility-driven ranking; then, a verifiable random function (VRF) ensures a BSA-resistant final selection. AdRo-FL also applies quantization to reduce communication overhead and sets strict transmission deadlines to improve energy efficiency. AdRo-FL achieves up to $1.85\times$ faster time-to-accuracy and up to $1.06\times$ higher final accuracy compared to insecure baselines.
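The two-phase selection for the distributed setting can be sketched roughly as below. Phase one shortlists clients by utility. Phase two picks the final set using per-client pseudorandom tickets. Note that the SHA-256 ticket here is only a stand-in assumption: it is deterministic and reproducible, but it is not a verifiable random function and gives none of a VRF's proof guarantees. The function name, parameters, and the `round_seed` format are all hypothetical.

```python
import hashlib


def two_phase_select(utilities, shortlist_size, final_size, round_seed):
    """Shortlist by utility, then pick the final clients by hashed tickets.

    utilities: dict mapping client id -> utility score (higher is better)
    """
    # Phase 1: utility-driven ranking keeps the top candidates.
    shortlist = sorted(utilities, key=utilities.get, reverse=True)[:shortlist_size]

    # Phase 2: a hash-based ticket stands in for the VRF output (assumption).
    def ticket(client_id):
        return hashlib.sha256(f"{round_seed}:{client_id}".encode()).hexdigest()

    return sorted(shortlist, key=ticket)[:final_size]


utilities = {"a": 0.9, "b": 0.1, "c": 0.8, "d": 0.7}
chosen = two_phase_select(utilities, shortlist_size=3, final_size=2, round_seed="round-1")
```

Splitting the decision this way lets utility narrow the pool while the final choice is driven by randomness the aggregator does not pick freely, which is the property the VRF is meant to enforce in the real protocol.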
Article 898
Title@2025-06-21 (6): Smooth InfoMax – Towards Easier Post-Hoc Interpretability
Title: Smooth InfoMax – Towards Easier Post-Hoc Interpretability | Smooth InfoMax – Auf dem Weg zu einer einfacheren Nach-Hoc-Interpretabilität | 平滑的InfoMax – – 迈向更轻松的后热后解释 2408.12936v3 |
Authors (3): Fabian Denoodt, Bart de Boer, José Oramas
We introduce Smooth InfoMax (SIM), a self-supervised representation learning method that incorporates interpretability constraints into the latent representations at different depths of the network. Based on $\beta$-VAEs, SIM’s architecture consists of probabilistic modules optimized locally with the InfoNCE loss to produce Gaussian-distributed representations regularized toward the standard normal distribution. This creates smooth, well-defined, and better-disentangled latent spaces, enabling easier post-hoc analysis. Evaluated on speech data, SIM preserves the large-scale training benefits of Greedy InfoMax while improving the effectiveness of post-hoc interpretability methods across layers.
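Regularizing Gaussian representations toward the standard normal is usually done with a closed-form KL divergence term, as in $\beta$-VAEs. The sketch below shows that term alone for a diagonal Gaussian. How SIM weights it against the InfoNCE loss is not stated in the abstract, so the `beta` parameter and both function names are assumptions made for this sketch.

```python
import math


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, log_var)
    )


def regularized_loss(contrastive_loss, mean, log_var, beta=1.0):
    # Hypothetical combination: a contrastive term plus a beta-weighted KL penalty.
    return contrastive_loss + beta * kl_to_standard_normal(mean, log_var)


at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
shifted = kl_to_standard_normal([1.0], [0.0])
```

The penalty is zero only when the representation already matches the standard normal, and it grows as the mean or variance drifts away from it.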
Article 899
Title@2025-06-21 (6): SING: SDE Inference via Natural Gradients
Title: SING: SDE Inference via Natural Gradients | SING: SDE-Schlussfolgerung über natürliche Gradienten | SING: SDE 通过自然梯度推断 2506.17796v1 |
Authors (3): Amber Hu, Henry Smith, Scott Linderman
Latent stochastic differential equation (SDE) models are important tools for the unsupervised discovery of dynamical systems from data, with applications ranging from engineering to neuroscience. In these complex domains, exact posterior inference of the latent state path is typically intractable, motivating the use of approximate methods such as variational inference (VI). However, existing VI methods for inference in latent SDEs often suffer from slow convergence and numerical instability. Here, we propose SDE Inference via Natural Gradients (SING), a method that leverages natural gradient VI to efficiently exploit the underlying geometry of the model and variational posterior. SING enables fast and reliable inference in latent SDE models by approximating intractable integrals and parallelizing computations in time. We provide theoretical guarantees that SING will approximately optimize the intractable, continuous-time objective of interest. Moreover, we demonstrate that better state inference enables more accurate estimation of nonlinear drift functions using, for example, Gaussian process SDE models. SING outperforms prior methods in state inference and drift estimation on a variety of datasets, including a challenging application to modeling neural dynamics in freely behaving animals. Altogether, our results illustrate the potential of SING as a tool for accurate inference in complex dynamical systems, especially those characterized by limited prior knowledge and non-conjugate structure.
Article 900
Title@2025-06-21 (6): Physics-informed KAN PointNet: Deep learning for simultaneous solutions to inverse problems in incompressible flow on numerous irregular geometries
Title: Physics-informed KAN PointNet: Deep learning for simultaneous solutions to inverse problems in incompressible flow on numerous irregular geometries | Physik-informiert KAN PointNet: Deep Learning für simultane Lösungen für inverse Probleme im unkompressiblen Fluss auf zahlreichen irregulären Geometrien | KAN PointNet:深刻学习如何同时解决许多非正常地貌的不压抑性流动的反面问题 2504.06327v2 |
Authors (2): Ali Kashefi, Tapan Mukerji
Kolmogorov-Arnold Networks (KANs) have gained attention as an alternative to traditional multilayer perceptrons (MLPs) for deep learning applications in computational physics, particularly for solving inverse problems with sparse data, as exemplified by the physics-informed Kolmogorov-Arnold network (PIKAN). However, the capability of KANs to simultaneously solve inverse problems over multiple irregular geometries within a single training run remains unexplored. To address this gap, we introduce the physics-informed Kolmogorov-Arnold PointNet (PI-KAN-PointNet), in which shared KANs are integrated into the PointNet architecture to capture the geometric features of computational domains. The loss function comprises the squared residuals of the governing equations, computed via automatic differentiation, along with sparse observations and partially known boundary conditions. We construct shared KANs using Jacobi polynomials and investigate their performance by considering Jacobi polynomials of different degrees and types in terms of both computational cost and prediction accuracy. As a benchmark test case, we consider natural convection in a square enclosure with a cylinder, where the cylinder’s shape varies across a dataset of 135 geometries. PI-KAN-PointNet offers two main advantages. First, it overcomes the limitation of current PIKANs, which are restricted to solving only a single computational domain per training run, thereby reducing computational costs. Second, when comparing the performance of PI-KAN-PointNet with that of the physics-informed PointNet using MLPs, we observe that, with approximately the same number of trainable parameters and comparable computational cost in terms of the number of epochs, training time per epoch, and memory usage, PI-KAN-PointNet yields more accurate predictions, particularly for values on unknown boundary conditions involving nonsmooth geometries.
Article 901
Title@2025-06-21 (6): Bayesian Social Deduction with Graph-Informed Language Models
Title: Bayesian Social Deduction with Graph-Informed Language Models | Bayesische soziale Deduktion mit Graphen-informierten Sprachmodellen | 采用图形化语言模型的巴伊斯社会衰退 2506.17788v1 |
Authors (7): Shahab Rahimirad, Guven Gergerli, Lucia Romero, Angela Qian, Matthew Lyle Olson, Simon Stepputtis, Joseph Campbell
Social reasoning, inferring unobservable beliefs and intentions from partial observations of other agents, remains a challenging task for large language models (LLMs). We evaluate the limits of current reasoning language models in the social deduction game Avalon and find that while the largest models demonstrate strong performance, they require extensive test-time inference and degrade sharply when distilled to smaller, real-time-capable variants. To address this, we introduce a hybrid reasoning framework that externalizes belief inference to a structured probabilistic model, while using an LLM for language understanding and interaction. Our approach achieves competitive performance with much larger models in Agent-Agent play and, notably, is the first language agent to defeat human players in a controlled study, achieving a 67% win rate and receiving higher qualitative ratings than both reasoning baselines and human teammates. We release code, models, and a dataset to support future work on social reasoning in LLM agents, which can be found at https://camp-lab-purdue.github.io/bayesian-social-deduction/
Article 902
Title@2025-06-21 (6): Beyond instruction-conditioning, MoTE: Mixture of Task Experts for Multi-task Embedding Models
Title: Beyond instruction-conditioning, MoTE: Mixture of Task Experts for Multi-task Embedding Models | Über die Instruktionskonditionierung hinaus, MoTE: Mischung von Task-Experten für Multi-Task-Einbettungsmodelle | 超越教学-调控,MOTE:多任务嵌入模型任务专家混合 2506.17781v1 |
Authors (5): Miguel Romero, Shuoyang Ding, Corey D. Barret, Georgiana Dinu, George Karypis
Dense embeddings are fundamental to modern machine learning systems, powering Retrieval-Augmented Generation (RAG), information retrieval, and representation learning. While instruction-conditioning has become the dominant approach for embedding specialization, its direct application to low-capacity models imposes fundamental representational constraints that limit the performance gains derived from specialization. In this paper, we analyze these limitations and introduce the Mixture of Task Experts (MoTE) transformer block, which leverages task-specialized parameters trained with Task-Aware Contrastive Learning to enhance the model's ability to generate specialized embeddings. Empirical results show that MoTE achieves $64\%$ higher performance gains in retrieval datasets ($+3.27 \rightarrow +5.21$) and $43\%$ higher performance gains across all datasets ($+1.81 \rightarrow +2.60$). Critically, these gains are achieved without altering instructions, training data, inference time, or number of active parameters.
Article 903
Title@2025-06-21 (6): Enhancing Glucose Level Prediction of ICU Patients through Hierarchical Modeling of Irregular Time-Series
Title: Enhancing Glucose Level Prediction of ICU Patients through Hierarchical Modeling of Irregular Time-Series | Verbesserung der Glukose-Prognose von ICU-Patienten durch hierarchische Modellierung irregulärer Zeitreihen | 通过不定期时序的等级建模,加强对伊斯兰法院联盟病人的葡萄糖水平预测 2411.01418v3 |
Authors (3): Hadi Mehdizavareh, Arijit Khan, Simon Lebech Cichosz
Accurately predicting blood glucose (BG) levels of ICU patients is critical, as both hypoglycemia (BG < 70 mg/dL) and hyperglycemia (BG > 180 mg/dL) are associated with increased morbidity and mortality. This study presents a proof-of-concept machine learning framework, the Multi-source Irregular Time-Series Transformer (MITST), designed to predict BG levels in ICU patients. In contrast to existing methods that rely heavily on manual feature engineering or utilize limited Electronic Health Record (EHR) data sources, MITST integrates diverse clinical data, including laboratory results, medications, and vital signs, without predefined aggregation. The model leverages a hierarchical Transformer architecture, designed to capture interactions among features within individual timestamps, temporal dependencies across different timestamps, and semantic relationships across multiple data sources. Evaluated using the extensive eICU database (200,859 ICU stays across 208 hospitals), MITST achieves a statistically significant (p < 0.001) average improvement of 1.7 percentage points (pp) in AUROC and 1.8 pp in AUPRC over a state-of-the-art random forest baseline. Crucially, for hypoglycemia, a rare but life-threatening condition, MITST increases sensitivity by 7.2 pp, potentially enabling hundreds of earlier interventions across ICU populations. The flexible architecture of MITST allows seamless integration of new data sources without retraining the entire model, enhancing its adaptability for clinical decision support. While this study focuses on predicting BG levels, we also demonstrate MITST’s ability to generalize to a distinct clinical task (in-hospital mortality prediction), highlighting its potential for broader applicability in ICU settings. MITST thus offers a robust and extensible solution for analyzing complex, multi-source, irregular time-series data.
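The thresholds quoted above imply a three-way labeling of a BG reading, which can be sketched as below. The function name and the "in range" label are assumptions made for this sketch. The abstract states only the hypoglycemia and hyperglycemia cut-offs, not how MITST encodes its prediction targets.

```python
def glycemic_category(bg_mg_dl):
    """Label a blood glucose reading (mg/dL) using the abstract's thresholds."""
    if bg_mg_dl < 70:
        return "hypoglycemia"
    if bg_mg_dl > 180:
        return "hyperglycemia"
    return "in range"  # hypothetical label for 70 to 180 mg/dL inclusive


readings = [65, 70, 180, 181]
categories = [glycemic_category(bg) for bg in readings]
```

Both inequalities are strict, so readings of exactly 70 or 180 mg/dL fall in the middle class.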
Article 904
Title@2025-06-21 (6): Toward Autonomous UI Exploration: The UIExplorer Benchmark
Title: Toward Autonomous UI Exploration: The UIExplorer Benchmark | Auf dem Weg zu autonomer UI-Exploration: Der UIExplorer Benchmark | 走向自主的界面勘探:界面勘探者基准 2506.17779v1 |
Authors (7): Andrei Cristian Nica, Akshaya Vishnu Kudlu Shanbhogue, Harshil Shah, Aleix Cambray, Tudor Berariu, Lucas Maystre, David Barber
Autonomous agents must know how to explore user interfaces (UIs) for reliable task solving, yet systematic evaluation of this crucial phase is lacking. We introduce UIExplore-Bench, the first benchmark explicitly dedicated to UI exploration. The benchmark evaluates agents with either Structured mode (granting access to layout information like DOM trees) or Screen mode (relying on GUI-only observations such as screenshots and human-like mouse/keyboard interactions) across three levels in a standardized GitLab sandbox environment. We formalize exploration as the process of maximizing the set of actionable UI components discovered and propose a metric, human-normalized UI-Functionalities Observed (hUFO), to quantify the effectiveness of exploration. Our results show that UIExplore-AlGo achieves the leading mean hUFO scores, reaching up to 77.2% of human performance in Structured mode and 59.0% in Screen mode at 2,000 steps, particularly excelling at the Sparse level. The results highlight the relevance of our benchmark, as current agents show a substantial performance gap compared to one hour of human expert exploration, indicating ample room for future advancements. We publicly release the benchmark environment, an exploration dataset, and an evaluation suite to catalyze research into efficient UI exploration strategies and their downstream applications, such as experience-driven task completion and automated training data generation.
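The human-normalized score described above can be illustrated with a minimal sketch. It assumes hUFO is the ratio of distinct actionable components found by the agent to those found by a human reference explorer; the benchmark's exact definition may differ, and the function name `hufo` and its arguments are hypothetical.

```python
def hufo(agent_components, human_components):
    """Human-normalized count of distinct UI functionalities observed.

    Assumption: both arguments are iterables of hashable component IDs,
    and the score is a plain ratio of distinct counts.
    """
    agent_found = set(agent_components)
    human_found = set(human_components)
    if not human_found:
        raise ValueError("human reference set must be non-empty")
    return len(agent_found) / len(human_found)


# Repeated discoveries count once; the agent found 2 of the 4 human-found items.
score = hufo(["menu", "search", "menu"], ["menu", "search", "issues", "wiki"])
```

Under this reading, a score of 0.772 would mean the agent discovered 77.2% as many distinct components as the human reference.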
Article 905
Title@2025-06-21 (6): Machine Learning Model Integration with Open World Temporal Logic for Process Automation
Title: Machine Learning Model Integration with Open World Temporal Logic for Process Automation | Machine Learning Model Integration mit Open World Temporal Logic für die Prozessautomatisierung | 与开放世界时间逻辑集成的机械学习模型集成 2506.17776v1 |
Authors (4): Dyuman Aditya, Colton Payne, Mario Leiva, Paulo Shakarian
Recent advancements in Machine Learning (ML) have yielded powerful models capable of extracting structured information from diverse and complex data sources. However, a significant challenge lies in translating these perceptual or extractive outputs into actionable, reasoned decisions within complex operational workflows. To address this challenge, this paper introduces a novel approach that integrates the outputs from various machine learning models directly with the PyReason framework, an open-world temporal logic programming reasoning engine. PyReason’s foundation in generalized annotated logic allows for the seamless incorporation of real-valued outputs (e.g., probabilities, confidence scores) from diverse ML models, treating them as truth intervals within its logical framework. Crucially, PyReason provides mechanisms, implemented in Python, to continuously poll ML model outputs, convert them into logical facts, and dynamically recompute the minimal model, ensuring real-time adaptive decision-making. Furthermore, its native support for temporal reasoning, knowledge graph integration, and fully explainable interface traces enables sophisticated analysis over time-sensitive process data and existing organizational knowledge. By combining the strengths of perception and extraction from ML models with the logical deduction and transparency of PyReason, we aim to create a powerful system for automating complex processes. This integration finds utility across numerous domains, including manufacturing, healthcare, and business operations.
Article 906
Title@2025-06-21 (6): PhysiX: A Foundation Model for Physics Simulations
Title: PhysiX: A Foundation Model for Physics Simulations | PhysiX: Ein Grundlagenmodell für Physiksimulationen | PhysiX:物理模拟基础模型 2506.17774v1 |
Authors (4): Tung Nguyen, Arsh Koneru, Shufan Li, Aditya Grover
Foundation models have achieved remarkable success across video, image, and language domains. By scaling up the number of parameters and training datasets, these models acquire generalizable world knowledge and often surpass task-specific approaches. However, such progress has yet to extend to the domain of physics simulation. A primary bottleneck is data scarcity: while millions of images, videos, and textual resources are readily available on the internet, the largest physics simulation datasets contain only tens of thousands of samples. This data limitation hinders the use of large models, as overfitting becomes a major concern. As a result, physics applications typically rely on small models, which struggle with long-range prediction due to limited context understanding. Additionally, unlike images, videos, or text, which typically exhibit fixed granularity, physics datasets often vary drastically in scale, amplifying the challenges of scaling up multitask training. We introduce PhysiX, the first large-scale foundation model for physics simulation. PhysiX is a 4.5B parameter autoregressive generative model. It uses a discrete tokenizer to encode physical processes at different scales into a sequence of discrete tokens, and employs an autoregressive next-token prediction objective to model such processes in the token space. To mitigate the rounding error in the discretization process, PhysiX incorporates a specialized refinement module. Through extensive experiments, we show that PhysiX effectively addresses the data bottleneck, outperforming task-specific baselines under comparable settings as well as the previous absolute state-of-the-art approaches on The Well benchmark. Our results indicate that knowledge learned from natural videos can be successfully transferred to physics simulation, and that joint training across diverse simulation tasks enables synergistic learning.
Article 907
Title@2025-06-21 (6): Log-Normal Multiplicative Dynamics for Stable Low-Precision Training of Large Networks
Title: Log-Normal Multiplicative Dynamics for Stable Low-Precision Training of Large Networks | Log-Normal Multiplikative Dynamiken für stabile Low-Precision Training von großen Netzwerken | 用于大型网络稳定低精度培训的对地-热多复制动态 2506.17768v1 |
Authors (5): Keigo Nishida, Eren Mehmet Kıral, Kenichi Bannai, Mohammad Emtiyaz Khan, Thomas Möllenhoff
Studies in neuroscience have shown that biological synapses follow a log-normal distribution whose transitioning can be explained by noisy multiplicative dynamics. Biological networks can function stably even under dynamically fluctuating conditions arising due to unreliable synaptic transmissions. Here we ask: Is it possible to design similar multiplicative training in artificial neural networks? To answer this question, we derive a Bayesian learning rule that assumes log-normal posterior distributions over weights which gives rise to a new Log-Normal Multiplicative Dynamics (LMD) algorithm. The algorithm uses multiplicative updates with both noise and regularization applied multiplicatively. The method is as easy to implement as Adam and only requires one additional vector to store. Our results show that LMD achieves stable and accurate training-from-scratch under low-precision forward operations for Vision Transformer and GPT-2. These results suggest that multiplicative dynamics, a biological feature, may enable stable low-precision inference and learning on future energy-efficient hardware.
Article 908
Title@2025-06-21 (6): A Locally Differential Private Coding-Assisted Succinct Histogram Protocol
Title: A Locally Differential Private Coding-Assisted Succinct Histogram Protocol | Ein lokal differenziertes, privates Coding Assisted Succinct Histogramm Protokoll | 本地差异私家编码辅助闪电直方图议定书 2506.17767v1 |
Authors (2): Hsuan-Po Liu, Hessam Mahdavifar
A succinct histogram captures frequent items and their frequencies across clients and has become increasingly important for large-scale, privacy-sensitive machine learning applications. To develop a rigorous framework to guarantee privacy for the succinct histogram problem, local differential privacy (LDP) has been utilized and has shown promising results. To preserve data utility under LDP, which essentially works by intentionally adding noise to data, error-correcting codes naturally emerge as a promising tool for reliable information collection. This work presents the first practical $(\epsilon,\delta)$-LDP protocol for constructing succinct histograms using error-correcting codes. To this end, polar codes and their successive-cancellation list (SCL) decoding algorithms are leveraged as the underlying coding scheme. More specifically, our protocol introduces Gaussian-based perturbations to enable efficient soft decoding. Experiments demonstrate that our approach outperforms prior methods, particularly for items with low true frequencies, while maintaining similar frequency estimation accuracy.
Article 909
Title@2025-06-21 (6): Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions
Title: Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions | Flugbahnvorhersage für autonomes Fahren: Fortschritt, Grenzen und Zukunftsrichtung | 自主驾驶的轨迹预测:进步、限制和未来方向 2503.03262v2 |
Authors (10): Nadya Abdel Madjid, Abdulrahman Ahmad, Murad Mebrahtu, Yousef Babaa, Abdelmoamen Nasser, Sumbal Malik, Bilal Hassan, Naoufel Werghi, Jorge Dias, Majid Khonji
As the potential for autonomous vehicles to be integrated on a large scale into modern traffic systems continues to grow, ensuring safe navigation in dynamic environments is crucial for smooth integration. To guarantee safety and prevent collisions, autonomous vehicles must be capable of accurately predicting the trajectories of surrounding traffic agents. Over the past decade, significant efforts from both academia and industry have been dedicated to designing solutions for precise trajectory forecasting. These efforts have produced a diverse range of approaches, raising questions about the differences between these methods and whether trajectory prediction challenges have been fully addressed. This paper reviews a substantial portion of recent trajectory prediction methods, proposing a taxonomy to classify existing solutions. A general overview of the prediction pipeline is also provided, covering input and output modalities, modeling features, and prediction paradigms existing in the literature. In addition, the paper discusses active research areas within trajectory prediction, addresses the posed research questions, and highlights the remaining research gaps and challenges.
Article 910
Title@2025-06-21 (6): Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes
Title: Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes | Derandomizing Simultane Confidence Regions for band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes | 改进诺姆弹道和多数表决制度,为限制有定型功能的功能提供同步信任区 2506.17764v1 |
Authors (2): Balázs Csanád Csáji, Bálint Horváth
Band-limited functions are fundamental objects that are widely used in systems theory and signal processing. In this paper we refine a recent nonparametric, nonasymptotic method for constructing simultaneous confidence regions for band-limited functions from noisy input-output measurements, by working in a Paley-Wiener reproducing kernel Hilbert space. Kernel norm bounds are tightened using a uniformly-randomized Hoeffding’s inequality for small samples and an empirical Bernstein bound for larger ones. We derive an approximate threshold, based on the sample size and how informative the inputs are, that governs which bound to deploy. Finally, we apply majority voting to aggregate confidence sets from random subsamples, boosting both stability and region size. We prove that even per-input aggregated intervals retain their simultaneous coverage guarantee. These refinements are also validated through numerical experiments.
Article 911
Title@2025-06-21 (6): DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
Title: DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training | DUMP: Automatisiertes Lehrplanlernen auf Verteilungsebene für RL-basiertes LLM-Post-Training | DDMP: 以LLLLM为基础的LLM培训后课程自动分发级别课程学习 2504.09710v2 |
Authors (5): Zhenting Wang, Guofeng Cui, Yu-Jhe Li, Kun Wan, Wentian Zhao
Recent advances in reinforcement learning (RL)-based post-training have led to notable improvements in large language models (LLMs), particularly in enhancing their reasoning capabilities to handle complex tasks. However, most existing methods treat the training data as a unified whole, overlooking the fact that modern LLM training often involves a mixture of data from diverse distributions, varying in both source and difficulty. This heterogeneity introduces a key challenge: how to adaptively schedule training across distributions to optimize learning efficiency. In this paper, we present a principled curriculum learning framework grounded in the notion of distribution-level learnability. Our core insight is that the magnitude of policy advantages reflects how much a model can still benefit from further training on a given distribution. Based on this, we propose a distribution-level curriculum learning framework for RL-based LLM post-training, which leverages the Upper Confidence Bound (UCB) principle to dynamically adjust sampling probabilities for different distributions. This approach prioritizes distributions with either high average advantage (exploitation) or low sample count (exploration), yielding an adaptive and theoretically grounded training schedule. We instantiate our curriculum learning framework with GRPO as the underlying RL algorithm and demonstrate its effectiveness on logic reasoning datasets with multiple difficulties and sources. Our experiments show that our framework significantly improves convergence speed and final performance, highlighting the value of distribution-aware curriculum strategies in LLM post-training. Code: https://github.com/ZhentingWang/DUMP.
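The UCB-based scheduling idea in this abstract can be sketched as follows. This is an illustrative toy, not the authors' implementation: the class name, the exact bonus term, the `exploration` and `temperature` parameters, and the softmax conversion to probabilities are all assumptions made for the example.

```python
import math


class DistributionUCBSampler:
    """Toy UCB scheduler over data distributions (hypothetical names)."""

    def __init__(self, names, exploration=1.0, temperature=1.0):
        self.names = list(names)
        self.exploration = exploration  # assumed weight of the bonus term
        self.temperature = temperature  # assumed softmax temperature
        self.mean_abs_adv = {n: 0.0 for n in self.names}
        self.counts = {n: 0 for n in self.names}

    def update(self, name, advantages):
        # Running mean of |advantage| for samples drawn from this distribution.
        for a in advantages:
            self.counts[name] += 1
            c = self.counts[name]
            self.mean_abs_adv[name] += (abs(a) - self.mean_abs_adv[name]) / c

    def ucb_scores(self):
        total = sum(self.counts.values())
        scores = {}
        for n in self.names:
            # Exploitation: mean advantage magnitude. Exploration: count bonus.
            bonus = self.exploration * math.sqrt(
                2.0 * math.log(total + 1) / (self.counts[n] + 1)
            )
            scores[n] = self.mean_abs_adv[n] + bonus
        return scores

    def sampling_probabilities(self):
        # Softmax over UCB scores, shifted by the max for numerical stability.
        scores = self.ucb_scores()
        top = max(scores.values())
        exps = {n: math.exp((s - top) / self.temperature) for n, s in scores.items()}
        z = sum(exps.values())
        return {n: e / z for n, e in exps.items()}


sampler = DistributionUCBSampler(["easy", "hard"])
sampler.update("easy", [0.1, -0.1])
sampler.update("hard", [1.0, -1.0])
probs = sampler.sampling_probabilities()
```

With equal sample counts, the distribution with the larger average advantage magnitude ("hard" here) receives the higher sampling probability; with equal advantages, the less-sampled distribution is favoured by the bonus term.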
Article 912
Title@2025-06-21 (6): Towards a Unified Textual Graph Framework for Spectral Reasoning via Physical and Chemical Information Fusion
Title: Towards a Unified Textual Graph Framework for Spectral Reasoning via Physical and Chemical Information Fusion | Auf dem Weg zu einem einheitlichen textuellen Graphen-Framework für spektrale Reasoning mittels physikalischer und chemischer Informationsfusion | 建立一个通过物理和化学信息融合的光学理由统一文本图框架 2506.17761v1 |
Authors (6): Jiheng Liang, Ziru Yu, Zujie Xie, Yuchen Guo, Yulan Guo, Xiangyang Yu
Motivated by the limitations of current spectral analysis methods, such as reliance on single-modality data, limited generalizability, and poor interpretability, we propose a novel multi-modal spectral analysis framework that integrates prior knowledge graphs with Large Language Models. Our method explicitly bridges physical spectral measurements and chemical structural semantics by representing them in a unified Textual Graph format, enabling flexible, interpretable, and generalizable spectral understanding. Raw spectra are first transformed into TAGs, where nodes and edges are enriched with textual attributes describing both spectral properties and chemical context. These are then merged with relevant prior knowledge, including functional groups and molecular graphs, to form a Task Graph that incorporates “Prompt Nodes” supporting LLM-based contextual reasoning. A Graph Neural Network further processes this structure to complete downstream tasks. This unified design enables seamless multi-modal integration and automated feature decoding with minimal manual annotation. Our framework achieves consistently high performance across multiple spectral analysis tasks, including node-level, edge-level, and graph-level classification. It demonstrates robust generalization in both zero-shot and few-shot settings, highlighting its effectiveness in learning from limited data and supporting in-context reasoning. This work establishes a scalable and interpretable foundation for LLM-driven spectral analysis, unifying physical and chemical modalities for scientific applications.
Article 913
Title@2025-06-21 (6): G-Adaptivity: optimised graph-based mesh relocation for finite element methods
Title: G-Adaptivity: optimised graph-based mesh relocation for finite element methods | G-Adaptivität: optimierte graphbasierte Netzverlagerung für Finite-Elemente-Methoden | G-适应性:以最佳图形为基础的网格,用于定点元件方法 2407.04516v3 |
Authors (9): James Rowbottom, Georg Maierhofer, Teo Deveney, Eike Mueller, Alberto Paganini, Katharina Schratz, Pietro Liò, Carola-Bibiane Schönlieb, Chris Budd
We present a novel and effective approach to achieving optimal mesh relocation in finite element methods (FEMs). The cost and accuracy of FEMs are critically dependent on the choice of mesh points. Mesh relocation (r-adaptivity) seeks to optimise the mesh geometry to obtain the best solution accuracy at a given computational budget. Classical r-adaptivity relies on the solution of a separate nonlinear “meshing” PDE to determine mesh point locations. This incurs significant cost at remeshing, and relies on estimates that relate interpolation- and FEM-error. Recent machine learning approaches have focused on the construction of fast surrogates for such classical methods. Instead, our new approach trains a graph neural network (GNN) to determine mesh point locations by directly minimising the FE solution error from the PDE system Firedrake to achieve higher solution accuracy. Our GNN architecture closely aligns the mesh solution space to that of classical meshing methodologies, thus replacing classical estimates for optimality with a learnable strategy. This allows for rapid and robust training and results in an extremely efficient and effective GNN approach to online r-adaptivity. Our method outperforms both classical and prior ML approaches to r-adaptive meshing. In particular, it achieves lower FE solution error, whilst retaining the significant speed-up over classical methods observed in prior ML work.
Article 914
Title@2025-06-21 (6): Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities
Title: Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities | Physik-informierte Mischung von Experten-Netzwerk für interpretierbare Batteriedegradations-Trajektorie Berechnung inmitten von Zweitleben Komplexitäten | 可解释的电池降解轨迹计算第二寿命复杂性中可解释电池降解轨迹的专家网络的物理知情混合专家网络 2506.17755v1 |
Authors (9): Xinghao Huang, Shengyu Tao, Chen Liang, Jiawei Chen, Junzhe Shi, Yuqi Li, Bizhong Xia, Guangmin Zhou, Xuan Zhang
Retired electric vehicle batteries offer immense potential to support low-carbon energy systems, but uncertainties in their degradation behavior and data inaccessibilities under second-life use pose major barriers to safe and scalable deployment. This work proposes a Physics-Informed Mixture of Experts (PIMOE) network that computes battery degradation trajectories using partial, field-accessible signals in a single cycle. PIMOE leverages an adaptive multi-degradation prediction module to classify degradation modes using expert weight synthesis underpinned by capacity-voltage and relaxation data, producing latent degradation trend embeddings. These are input to a use-dependent recurrent network for long-term trajectory prediction. Validated on 207 batteries across 77 use conditions and 67,902 cycles, PIMOE achieves an average mean absolute percentage error (MAPE) of 0.88% with a 0.43 ms inference time. Compared to the state-of-the-art Informer and PatchTST, it reduces computational time and MAPE by 50%, respectively. Compatible with random state of charge region sampling, PIMOE supports 150-cycle forecasts with 1.50% average and 6.26% maximum MAPE, and operates effectively even with pruned 5MB training data. Broadly, the PIMOE framework offers a deployable, history-free solution for battery degradation trajectory computation, redefining how second-life energy storage systems are assessed, optimized, and integrated into the sustainable energy landscape.
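For readers unfamiliar with the headline metric, MAPE is the mean of |actual − predicted| / |actual|, expressed as a percentage. A minimal sketch follows; the function name and the sample capacity values are made up for illustration and are not from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes equal-length, non-empty sequences with no zero in `actual`.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical capacity values (arbitrary units), each prediction off by 1%.
error_percent = mape([100.0, 200.0], [99.0, 202.0])
```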
Article 915
Title@2025-06-21 (6): SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification
Title: SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification | SCISSOR: Semantische Bias durch cluster-aware Siamesische Netzwerke für robuste Klassifizierung abmildern | SCISSOR: 通过 “ 硬性分类 “ 的集束电电波暹脑网络,减缓语义比亚 2506.14587v2 |
Authors (3): Shuo Yang, Bardh Prenkaj, Gjergji Kasneci
Shortcut learning undermines model generalization to out-of-distribution data. While the literature attributes shortcuts to biases in superficial features, we show that imbalances in the semantic distribution of sample embeddings induce spurious semantic correlations, compromising model robustness. To address this issue, we propose SCISSOR (Semantic Cluster Intervention for Suppressing ShORtcut), a Siamese network-based debiasing approach that remaps the semantic space by discouraging latent clusters exploited as shortcuts. Unlike prior data-debiasing approaches, SCISSOR eliminates the need for data augmentation and rewriting. We evaluate SCISSOR on 6 models across 4 benchmarks: Chest-XRay and Not-MNIST in computer vision, and GYAFC and Yelp in NLP tasks. Compared to several baselines, SCISSOR reports +5.3 absolute points in F1 score on GYAFC, +7.3 on Yelp, +7.7 on Chest-XRay, and +1 on Not-MNIST. SCISSOR is also highly advantageous for lightweight models with ~9.5% improvement on F1 for ViT on computer vision datasets and ~11.9% for BERT on NLP. Our study redefines the landscape of model generalization by addressing overlooked semantic biases, establishing SCISSOR as a foundational framework for mitigating shortcut learning and fostering more robust, bias-resistant AI systems.
Article 916
Title@2025-06-21 (6): Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences
Title: Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences | Kernel-Grenzwert für recurrente neurale Netzwerke, die auf ergodischen Datensequenzen trainiert werden | Ergodic数据序列培训的经常性神经网络核心限制 2308.14555v3 |
Authors (3): Samuel Chun-Hei Lam, Justin Sirignano, Konstantinos Spiliopoulos
Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNN) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges which are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(\frac{1}{N})$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, which it will converge to as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
Article 917
Title@2025-06-21 (6): Pix2Geomodel: A Next-Generation Reservoir Geomodeling with Property-to-Property Translation
Title: Pix2Geomodel: A Next-Generation Reservoir Geomodeling with Property-to-Property Translation | Pix2Geomodel: Ein Next-Generation Reservoir Geomodeling mit Property-to-Property-Übersetzung | Pix2 Geomod: 下一个拥有地对地翻译的地建模 2506.17747v1 |
Authors (7): Abdulrahman Al-Fakih, Ardiansyah Koeshidayatullah, Nabil A. Saraih, Tapan Mukerji, Rayan Kanfar, Abdulmohsen Alali, SanLinn I. Kaka
Accurate geological modeling is critical for reservoir characterization, yet traditional methods struggle with complex subsurface heterogeneity and with conditioning to observed data. This study introduces Pix2Geomodel, a novel conditional generative adversarial network (cGAN) framework based on Pix2Pix, designed to predict reservoir properties (facies, porosity, permeability, and water saturation) from the Rotliegend reservoir of the Groningen gas field. Utilizing a 7.6 million-cell dataset from the Nederlandse Aardolie Maatschappij, accessed via EPOS-NL, the methodology included data preprocessing, augmentation to generate 2,350 images per property, and training with a U-Net generator and PatchGAN discriminator over 19,000 steps. Evaluation metrics, including pixel accuracy (PA), mean intersection over union (mIoU), and frequency weighted intersection over union (FWIoU), together with visualizations, were used to assess performance in masked property prediction and property-to-property translation tasks. Results demonstrated high accuracy for facies (PA 0.88, FWIoU 0.85) and water saturation (PA 0.96, FWIoU 0.95), with moderate success for porosity (PA 0.70, FWIoU 0.55) and permeability (PA 0.74, FWIoU 0.60), and robust translation performance (e.g., facies-to-facies PA 0.98, FWIoU 0.97). The framework captured spatial variability and geological realism, as validated by variogram analysis, and training loss curves for the generator and discriminator were calculated for each property. Compared to traditional methods, Pix2Geomodel offers enhanced fidelity in direct property mapping. Limitations include challenges with microstructural variability and 2D constraints, suggesting future integration of multi-modal data and 3D modeling (Pix2Geomodel v2.0). This study advances the application of generative AI in geoscience, supporting improved reservoir management and open science initiatives.
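The three metrics named in this abstract have standard definitions in terms of a class confusion matrix. The sketch below uses those textbook definitions on flattened integer label lists; it is not the authors' evaluation code, and the function name and the choice to skip classes absent from both inputs are assumptions.

```python
def segmentation_metrics(y_true, y_pred, num_classes):
    """Return PA, mIoU and FWIoU for flattened integer class labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # cm[t][p] counts pixels whose true class is t and predicted class is p.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    total = len(y_true)
    correct = sum(cm[k][k] for k in range(num_classes))
    ious = []
    fwiou = 0.0
    for k in range(num_classes):
        gt = sum(cm[k])                                   # true pixels of class k
        pred = sum(cm[r][k] for r in range(num_classes))  # predicted as class k
        union = gt + pred - cm[k][k]
        if union == 0:
            continue  # class absent from both inputs: skipped (assumption)
        iou = cm[k][k] / union
        ious.append(iou)
        fwiou += (gt / total) * iou                       # weight by class frequency
    return {"PA": correct / total, "mIoU": sum(ious) / len(ious), "FWIoU": fwiou}


metrics = segmentation_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

FWIoU weights each class's IoU by how often that class occurs in the ground truth, so it can sit well above mIoU when rare classes are predicted poorly.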
Article 918
Title@2025-06-21 (6): Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Title: Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator | Direkte diskriminative Optimierung: Ihr Likelihood-basiertes visuelles Generatives Modell ist geheim ein GAN-Diskriminator | 直接排斥性优化:你以相似性为基础的视觉创造模型秘密地是一个GAN 歧视者 2503.01103v3 |
Authors (7): Kaiwen Zheng, Yongxin Chen, Huayu Chen, Guande He, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang
While likelihood-based generative models, particularly diffusion and autoregressive models, have achieved remarkable fidelity in visual generation, the maximum likelihood estimation (MLE) objective, which minimizes the forward KL divergence, inherently suffers from a mode-covering tendency that limits the generation quality under limited model capacity. In this work, we propose Direct Discriminative Optimization (DDO) as a unified framework that integrates likelihood-based generative training and GAN-type discrimination to bypass this fundamental constraint by exploiting reverse KL and self-generated negative signals. Our key insight is to parameterize a discriminator implicitly using the likelihood ratio between a learnable target model and a fixed reference model, drawing parallels with the philosophy of Direct Preference Optimization (DPO). Unlike GANs, this parameterization eliminates the need for joint training of generator and discriminator networks, allowing for direct, efficient, and effective finetuning of a well-trained model to its full potential beyond the limits of MLE. DDO can be performed iteratively in a self-play manner for progressive model refinement, with each round requiring less than 1% of pretraining epochs. Our experiments demonstrate the effectiveness of DDO by significantly advancing the previous SOTA diffusion model EDM, reducing FID scores from 1.79/1.58/1.96 to new records of 1.30/0.97/1.26 on CIFAR-10/ImageNet-64/ImageNet 512x512 datasets without any guidance mechanisms, and by consistently improving both guidance-free and CFG-enhanced FIDs of visual autoregressive models on ImageNet 256x256.
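The implicit discriminator described in this abstract can be illustrated with a small sketch. It assumes a GAN-style binary cross-entropy objective in which the discriminator logit is a scaled log-likelihood ratio between the target and reference models; the scaling factor `beta`, the function names, and the exact form of the loss are assumptions for illustration and may not match the paper's objective.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def likelihood_ratio_gan_loss(logp_target_real, logp_ref_real,
                              logp_target_fake, logp_ref_fake, beta=1.0):
    """GAN-style loss with logit = beta * (log p_target(x) - log p_ref(x)).

    "real" entries are log-likelihoods of data samples; "fake" entries are
    log-likelihoods of samples generated by the fixed reference model.
    """
    real_terms = [log_sigmoid(beta * (t - r))
                  for t, r in zip(logp_target_real, logp_ref_real)]
    # log(1 - sigmoid(z)) equals log_sigmoid(-z).
    fake_terms = [log_sigmoid(-beta * (t - r))
                  for t, r in zip(logp_target_fake, logp_ref_fake)]
    return -(sum(real_terms) / len(real_terms)
             + sum(fake_terms) / len(fake_terms))


# When the target equals the reference, every logit is 0.
baseline = likelihood_ratio_gan_loss([0.0], [0.0], [0.0], [0.0])
# Raising likelihood on real data and lowering it on self-generated samples
# reduces the loss.
improved = likelihood_ratio_gan_loss([1.0], [0.0], [-1.0], [0.0])
```

Because the logit depends only on the two likelihoods, no separate discriminator network appears in the sketch; only the target model's parameters would be trained.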
Article 919
Title@2025-06-21 (6): MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation
Title: MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation | MoORE: SVD-basierte Modell-MoE-ization für Konflikt- und Vergessenheits-Resistenz-Multi-Task-Anpassung | MoORE: 以SVD为基础的冲突与遗忘-恢复-远程多任务适应示范MoE化模式 2506.14436v2 |
Authors (5): Shen Yuan, Yin Zheng, Taifeng Wang, Binbin Liu, Hongteng Xu
Adapting large-scale foundation models in multi-task scenarios often suffers from task conflict and oblivion. To mitigate such issues, we propose a novel “model MoE-ization” strategy that leads to a conflict- and oblivion-resistant multi-task adaptation method. Given a weight matrix of a pre-trained model, our method applies SVD to it and introduces a learnable router to adjust its singular values based on tasks and samples. Accordingly, the weight matrix becomes a Mixture of Orthogonal Rank-one Experts (MoORE), in which each expert corresponds to the outer product of a left singular vector and the corresponding right one. We can improve the model capacity by imposing a learnable orthogonal transform on the right singular vectors. Unlike low-rank adaptation (LoRA) and its MoE-driven variants, MoORE guarantees the experts’ orthogonality and maintains the column space of the original weight matrix. These two properties make the adapted model resistant to the conflicts among the new tasks and the oblivion of its original tasks, respectively. Experiments on various datasets demonstrate that MoORE outperforms existing multi-task adaptation methods consistently, showing its superiority in terms of conflict- and oblivion-resistance. The code of the experiments is available at https://github.com/DaShenZi721/MoORE.
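The SVD-based construction in this abstract can be sketched with NumPy. The sketch treats each pair of singular vectors as a rank-one expert and lets a router shift the singular values per input. The additive form of the adjustment, the linear router `router_W`, and the function name are assumptions; task conditioning and the learnable orthogonal transform on the right singular vectors are omitted.

```python
import numpy as np


def moore_style_forward(W, x, router_W):
    """Apply W to x with router-adjusted singular values (illustrative only).

    W:        (d_out, d_in) pre-trained weight matrix
    x:        (d_in,) input vector
    router_W: (r, d_in) hypothetical linear router, r = min(d_out, d_in)
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    gates = router_W @ x      # one gate per rank-one expert u_i v_i^T
    adjusted = S + gates      # assumed additive adjustment of singular values
    # Equivalent to sum_i adjusted[i] * u_i * (v_i^T x).
    return U @ (adjusted * (Vt @ x))


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x = rng.normal(size=3)
y_original = moore_style_forward(W, x, np.zeros((3, 3)))   # zero router
y_adapted = moore_style_forward(W, x, rng.normal(size=(3, 3)))
```

With a zero router the output equals `W @ x`, and for any router the output stays in the span of the left singular vectors, which is one way to read the abstract's claim that the column space of the original weight matrix is maintained.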
Article 920
Title@2025-06-21 (6): Learning Aerodynamics for the Control of Flying Humanoid Robots
Title: Learning Aerodynamics for the Control of Flying Humanoid Robots | Aerodynamik lernen zur Steuerung von fliegenden humanoiden Robotern | 用于控制飞行人类体机器人的学习空气动力学 2506.00305v2 |
Authors (11): Antonello Paolino, Gabriele Nava, Fabio Di Natale, Fabio Bergonti, Punith Reddy Vanteddu, Donato Grassi, Luca Riccobene, Alex Zanotti, Renato Tognaccini, Gianluca Iaccarino, Daniele Pucci
Robots with multi-modal locomotion are an active research field due to their versatility in diverse environments. In this context, additional actuation can provide humanoid robots with aerial capabilities. Flying humanoid robots face challenges in modeling and control, particularly with aerodynamic forces. This paper addresses these challenges from a technological and scientific standpoint. The technological contribution includes the mechanical design of iRonCub-Mk1, a jet-powered humanoid robot, optimized for jet engine integration, and hardware modifications for wind tunnel experiments on humanoid robots for precise aerodynamic forces and surface pressure measurements. The scientific contribution offers a comprehensive approach to model and control aerodynamic forces using classical and learning techniques. Computational Fluid Dynamics (CFD) simulations calculate aerodynamic forces, validated through wind tunnel experiments on iRonCub-Mk1. An automated CFD framework expands the aerodynamic dataset, enabling the training of a Deep Neural Network and a linear regression model. These models are integrated into a simulator for designing aerodynamic-aware controllers, validated through flight simulations and balancing experiments on the iRonCub-Mk1 physical prototype.
nan
Article 921
Title@2025-06-21 (6): Rethinking the Role of Operating Conditions for Learning-based Multi-condition Fault Diagnosis
Title: Rethinking the Role of Operating Conditions for Learning-based Multi-condition Fault Diagnosis | Überdenken der Rolle der Betriebsbedingungen für lernbasierte Multi-Condition-Fault-Diagnose | 重新思考业务条件对基于学习的多设备错失诊断的作用 2506.17740v1 |
Authors (5): Pengyu Han, Zeyi Liu, Shijin Chen, Dongliang Zou, Xiao He
Multi-condition fault diagnosis is prevalent in industrial systems and presents substantial challenges for conventional diagnostic approaches. The discrepancy in data distributions across different operating conditions degrades model performance when a model trained under one condition is applied to others. With the recent advancements in deep learning, transfer learning has been introduced to the fault diagnosis field as a paradigm for addressing multi-condition fault diagnosis. Among these methods, domain generalization approaches can handle complex scenarios by extracting condition-invariant fault features. Although many studies have considered fault diagnosis in specific multi-condition scenarios, the extent to which operating conditions affect fault information, a crucial question, has scarcely been studied. When operating conditions have a significant impact on fault features, directly applying domain generalization methods may lead the model to learn condition-specific information, thereby reducing its overall generalization ability. This paper investigates the performance of existing end-to-end domain generalization methods under varying conditions, specifically in variable-speed and variable-load scenarios, using multiple experiments on a real-world gearbox. Additionally, a two-stage diagnostic framework is proposed, aiming to improve fault diagnosis performance under scenarios with significant operating condition impacts. By incorporating a domain-generalized encoder with a retraining strategy, the framework is able to extract condition-invariant fault features while simultaneously alleviating potential overfitting to the source domain. Several experiments on a real-world gearbox dataset are conducted to validate the effectiveness of the proposed approach.
nan
Article 922
Title@2025-06-21 (6): Recursive Gaussian Process State Space Model
Title: Recursive Gaussian Process State Space Model | Rekursive Gaussian Prozess Zustand Raum Modell | 递递性高斯进程状态空间模型 2411.14679v2 |
Authors (5): Tengjie Zheng, Haipeng Chen, Lin Cheng, Shengping Gong, Xu Huang
Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.
nan
Article 923
Title@2025-06-21 (6): Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs
Title: Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs | Sicheres Pruning LoRA: Robustes, distanzgeführtes Pruning für die Sicherheitsausrichtung bei der Anpassung von LLMs | 安全谨慎 LoRA:为适应LLMs实现安全协调提供强有力的远程指导 2506.18931v1 |
Authors (4): Shuang Ao, Yi Dong, Jinwei Hu, Sarvapali Ramchurn
Fine-tuning Large Language Models (LLMs) with Low-Rank Adaptation (LoRA) enhances adaptability while reducing computational costs. However, fine-tuning can compromise safety alignment, even with benign data, increasing susceptibility to harmful outputs. Existing safety alignment methods struggle to capture complex parameter shifts, leading to suboptimal safety-utility trade-offs. To address this issue, we propose Safe Pruning LoRA (SPLoRA), a novel pruning-based approach that selectively removes LoRA layers that weaken safety alignment, improving safety while preserving performance. At its core, we introduce Empirical-DIEM (E-DIEM), a dimension-insensitive similarity metric that effectively detects safety misalignment in LoRA-adapted models. We conduct extensive experiments on LLMs fine-tuned with a mixture of benign and malicious data, and with purely benign datasets, evaluating SPLoRA across utility, safety, and reliability metrics. Results demonstrate that SPLoRA outperforms state-of-the-art safety alignment techniques, significantly reducing safety risks while maintaining or improving model performance and reliability. Additionally, SPLoRA reduces inference overhead, making it a scalable and efficient solution for deploying safer and more reliable LLMs. The code is available at https://github.com/AoShuang92/SPLoRA.
nan
Article 924
Title@2025-06-21 (6): Numerical simulation of transient heat conduction with moving heat source using Physics Informed Neural Networks
Title: Numerical simulation of transient heat conduction with moving heat source using Physics Informed Neural Networks | Numerische Simulation der transienten Wärmeleitung mit beweglicher Wärmequelle mittels Physics Informed Neural Networks | 利用物理知情神经网络利用移动热源对瞬时热导导与移动热源进行数字模拟 2506.17726v1 |
Authors (2): Anirudh Kalyan, Sundararajan Natarajan
In this paper, physics-informed neural networks (PINNs) are employed for the numerical simulation of heat transfer involving a moving source. To reduce the computational effort, a new training method is proposed that uses continuous time-stepping through transfer learning. The time interval is divided into smaller intervals and a single network is initialized. This single network is trained on each time interval in turn, with the initial condition for the (n+1)th interval taken as the solution obtained at the nth time increment. Thus, this framework enables the computation of large temporal intervals without increasing the complexity of the network itself. The proposed framework is used to estimate the temperature distribution in a homogeneous medium with a moving heat source. The results from the proposed framework are compared with the traditional finite element method, and good agreement is seen.
nan
Article 925
Title@2025-06-21 (6): Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains
Title: Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains | Time-Aware-Lernen Kausaldarstellung für Modellverallgemeinerung in sich entwickelnden Domänen | 正在演变的域域中模型普遍化模型的学习时间- 软件因果代表 2506.17718v1 |
Authors (7): Zhuo He, Shuang Li, Wenze Song, Longhui Yuan, Jian Liang, Han Li, Kun Gai
Endowing deep models with the ability to generalize in dynamic scenarios is of vital significance for real-world deployment, given the continuous and complex changes in data distribution. Recently, evolving domain generalization (EDG) has emerged to address distribution shifts over time, aiming to capture evolving patterns for improved model generalization. However, existing EDG methods may suffer from spurious correlations by modeling only the dependence between data and targets across domains, creating a shortcut between task-irrelevant factors and the target, which hinders generalization. To this end, we design a time-aware structural causal model (SCM) that incorporates dynamic causal factors and causal mechanism drifts, and propose Static-DYNamic Causal Representation Learning (SYNC), an approach that effectively learns time-aware causal representations. Specifically, it integrates specially designed information-theoretic objectives into a sequential VAE framework that captures evolving patterns, and produces the desired representations by preserving intra-class compactness of causal factors both across and within domains. Moreover, we theoretically show that our method can yield the optimal causal predictor for each time domain. Results on both synthetic and real-world datasets show that SYNC can achieve superior temporal generalization performance.
nan
Article 926
Title@2025-06-21 (6): Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages
Title: Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages | Enthüllungsfaktoren für ein verbessertes POS-Tagging: Eine Studie über ressourcenarme mittelalterliche romanische Sprachen | 强化POS贴标签的难解因素:低资源中世纪罗姆语言研究 2506.17715v1 |
Authors (7): Matthias Schöffel, Esteban Garces Arias, Marinus Wiedner, Paula Ruppert, Meimingwei Li, Christian Heumann, Matthias Aßenmacher
Part-of-speech (POS) tagging remains a foundational component in natural language processing pipelines, particularly critical for historical text analysis at the intersection of computational linguistics and digital humanities. Despite significant advancements in modern large language models (LLMs) for ancient languages, their application to Medieval Romance languages presents distinctive challenges stemming from diachronic linguistic evolution, spelling variations, and labeled data scarcity. This study systematically investigates the central determinants of POS tagging performance across diverse corpora of Medieval Occitan, Medieval Spanish, and Medieval French texts, spanning biblical, hagiographical, medical, and dietary domains. Through rigorous experimentation, we evaluate how fine-tuning approaches, prompt engineering, model architectures, decoding strategies, and cross-lingual transfer learning techniques affect tagging accuracy. Our results reveal both notable limitations in LLMs’ ability to process historical language variations and non-standardized spelling, and promising specialized techniques that effectively address the unique challenges presented by low-resource historical languages.
nan
Article 927
Title@2025-06-21 (6): Truthful Elicitation of Imprecise Forecasts
Title: Truthful Elicitation of Imprecise Forecasts | Wahre Botschaft von ungenauen Prognosen | 以真真真真真真真真真切的易感简易预报 2503.16395v2 |
Authors (3): Anurag Singh, Siu Lun Chau, Krikamol Muandet
The quality of probabilistic forecasts is crucial for decision-making under uncertainty. While proper scoring rules incentivize truthful reporting of precise forecasts, they fall short when forecasters face epistemic uncertainty about their beliefs, limiting their use in safety-critical domains where decision-makers (DMs) prioritize proper uncertainty management. To address this, we propose a framework for scoring imprecise forecasts – forecasts given as a set of beliefs. Despite existing impossibility results for deterministic scoring rules, we enable truthful elicitation by drawing a connection to social choice theory and introducing a two-way communication framework where DMs first share the aggregation rules (e.g., averaging or min-max) they use in downstream decisions to resolve forecast ambiguity. This, in turn, helps forecasters resolve indecision during elicitation. We further show that truthful elicitation of imprecise forecasts is achievable using proper scoring rules randomized over the aggregation procedure. Our approach allows DMs to elicit and integrate the forecaster’s epistemic uncertainty into their decision-making process, thus improving credibility.
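To make the two aggregation rules named in this abstract concrete, the sketch below shows how a decision-maker might turn a set of beliefs into one action. It is only an illustration under assumptions: the loss matrix, the function name `choose_action`, and the reading of "min-max" as minimizing the worst-case expected loss are not taken from the paper, and the scoring rules themselves are not shown.

```python
import numpy as np

def choose_action(loss, beliefs, rule="average"):
    """Pick an action from an imprecise forecast given as a set of beliefs.

    loss:    array (n_actions, n_outcomes), loss of each action per outcome
    beliefs: array (n_beliefs, n_outcomes), each row a probability vector
    """
    # Expected loss of every action under every belief in the set
    expected = np.asarray(loss, dtype=float) @ np.asarray(beliefs, dtype=float).T
    if rule == "average":
        score = expected.mean(axis=1)   # average over the belief set
    elif rule == "minmax":
        score = expected.max(axis=1)    # worst case over the belief set
    else:
        raise ValueError("rule must be 'average' or 'minmax'")
    return int(np.argmin(score))
```

The same belief set can lead to different actions under the two rules, which is why the forecaster benefits from knowing the rule before reporting.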
nan
Article 928
Title@2025-06-21 (6): CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition
Title: CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition | CEGA: Ein kosteneffizienter Ansatz für graphisch basierte Modellextraktion und -akquisition | CEGA:基于图表的抽取和采购模式的成本-效益办法 2506.17709v1 |
Authors (7): Zebin Wang, Menghan Lin, Bolin Shen, Ken Anderson, Molei Liu, Tianxi Cai, Yushun Dong
Graph Neural Networks (GNNs) have demonstrated remarkable utility across diverse applications, and their growing complexity has made Machine Learning as a Service (MLaaS) a viable platform for scalable deployment. However, this accessibility also exposes GNNs to serious security threats, most notably model extraction attacks (MEAs), in which adversaries strategically query a deployed model to construct a high-fidelity replica. In this work, we evaluate the vulnerability of GNNs to MEAs and explore their potential for cost-effective model acquisition in non-adversarial research settings. Importantly, adaptive node querying strategies can also serve a critical role in research, particularly when labeling data is expensive or time-consuming. By selectively sampling informative nodes, researchers can train high-performing GNNs with minimal supervision, which is particularly valuable in domains such as biomedicine, where annotations often require expert input. To address this, we propose a node querying strategy tailored to a highly practical yet underexplored scenario, where bulk queries are prohibited, and only a limited set of initial nodes is available. Our approach iteratively refines the node selection mechanism over multiple learning cycles, leveraging historical feedback to improve extraction efficiency. Extensive experiments on benchmark graph datasets demonstrate the superiority of our approach over comparable baselines in accuracy, fidelity, and F1 score under strict query-size constraints. These results highlight both the susceptibility of deployed GNNs to extraction attacks and the promise of ethical, efficient GNN acquisition methods to support low-resource research environments.
nan
Article 929
Title@2025-06-21 (6): Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach
Title: Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach | Zähmen von OOD-Maßnahmen für das Offline-Verstärkungslernen: ein vorteilhafter Ansatz | 塔坦 OOOD 离线强化学习行动:以优势为基础的方法 2505.05126v3 |
Authors (3): Xuyang Chen, Keyu Yan, Lin Zhao
Offline reinforcement learning (RL) aims to learn decision-making policies from fixed datasets without online interactions, providing a practical solution where online data collection is expensive or risky. However, offline RL often suffers from distribution shift, resulting in inaccurate evaluation and substantial overestimation on out-of-distribution (OOD) actions. To address this, existing approaches incorporate conservatism by indiscriminately discouraging all OOD actions, thereby hindering the agent’s ability to generalize and exploit beneficial ones. In this paper, we propose Advantage-based Diffusion Actor-Critic (ADAC), a novel method that systematically evaluates OOD actions using the batch-optimal value function. Based on this evaluation, ADAC defines an advantage function to modulate the Q-function update, enabling more precise assessment of OOD action quality. We design a custom PointMaze environment and collect datasets to visually reveal that advantage modulation can effectively identify and select superior OOD actions. Extensive experiments show that ADAC achieves state-of-the-art performance on almost all tasks in the D4RL benchmark, with particularly clear margins on the more challenging tasks.
nan
Article 930
Title@2025-06-21 (6): Data-Dependent Regret Bounds for Constrained MABs
Title: Data-Dependent Regret Bounds for Constrained MABs | Datendependent Regret Bounds for Constrained MABs | 受约束 MAB 的受控数据依赖的 Regret Bounds 2505.20010v2 |
Authors (5): Gianmarco Genalti, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
This paper initiates the study of data-dependent regret bounds in constrained MAB settings. These bounds depend on the sequence of losses that characterize the problem instance. Thus, they can be much smaller than classical $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bounds, while being equivalent to them in the worst case. Despite this, data-dependent regret bounds have been completely overlooked in constrained MAB settings. The goal of this paper is to answer the following question: Can data-dependent regret bounds be derived in the presence of constraints? We answer this question affirmatively in constrained MABs with adversarial losses and stochastic constraints. Specifically, our main focus is on the most challenging and natural settings with hard constraints, where the learner must ensure that the constraints are always satisfied with high probability. We design an algorithm with a regret bound consisting of two data-dependent terms. The first term captures the difficulty of satisfying the constraints, while the second one encodes the complexity of learning independently of the presence of constraints. We also prove a lower bound showing that these two terms are not artifacts of our specific approach and analysis, but rather the fundamental components that inherently characterize the complexities of the problem. Finally, in designing our algorithm, we also derive some novel results in the related (and easier) soft constraints settings, which may be of independent interest.
nan
Article 931
Title@2025-06-21 (6): Curse of Dimensionality in Neural Network Optimization
Title: Curse of Dimensionality in Neural Network Optimization | Der Fluch der Dimensionalität in der Neuralen Netzwerkoptimierung | 神经网络中多维度的诅咒 优化 2502.05360v2 |
Authors (2): Sanghoon Na, Haizhao Yang
This paper demonstrates that when a shallow neural network with a Lipschitz continuous activation function is trained using either empirical or population risk to approximate a target function that is $r$ times continuously differentiable on $[0,1]^d$, the population risk may not decay at a rate faster than $t^{-\frac{4r}{d-2r}}$, where $t$ is an analog of the total number of optimization iterations. This result highlights the presence of the curse of dimensionality in the optimization computation required to achieve a desired accuracy. Instead of analyzing parameter evolution directly, the training dynamics are examined through the evolution of the parameter distribution under the 2-Wasserstein gradient flow. Furthermore, it is established that the curse of dimensionality persists when a locally Lipschitz continuous activation function is employed, where the Lipschitz constant in $[-x,x]$ is bounded by $O(x^\delta)$ for any $x \in \mathbb{R}$. In this scenario, the population risk is shown to decay at a rate no faster than $t^{-\frac{(4+2\delta)r}{d-2r}}$. Understanding how function smoothness influences the curse of dimensionality in neural network optimization theory is an important and underexplored direction that this work aims to address.
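A short calculation shows what the quoted rate implies in practice. The sketch below evaluates the exponent $\frac{(4+2\delta)r}{d-2r}$ (with $\delta=0$ for the globally Lipschitz case) and the number of iterations $t$ at which $t^{-a}$ first drops to a target level; constants hidden in the bound are ignored, and the function names are illustrative only.

```python
import math

def rate_exponent(r, d, delta=0.0):
    """Exponent a in the lower-bound rate t**(-a) quoted in the abstract."""
    if d <= 2 * r:
        raise ValueError("the quoted rate is only meaningful for d > 2r")
    return (4 + 2 * delta) * r / (d - 2 * r)

def log10_iterations_needed(eps, r, d, delta=0.0):
    """log10 of the smallest t with t**(-a) <= eps, ignoring constants."""
    return -math.log10(eps) / rate_exponent(r, d, delta)
```

For $r=1$ and a target of $10^{-2}$, this gives about $10^{4}$ iterations when $d=10$ but about $10^{50}$ when $d=102$, which is the curse of dimensionality the abstract refers to.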
nan
Article 932
Title@2025-06-21 (6): Zero-Shot Conversational Stance Detection: Dataset and Approaches
Title: Zero-Shot Conversational Stance Detection: Dataset and Approaches | Zero-Shot Conversational Stance Detection: Datensatz und Ansätze | 零热对调调检测:数据集和方法 2506.17693v1 |
Authors (8): Yuzhe Ding, Kang He, Bobo Li, Li Zheng, Haijun He, Fei Li, Chong Teng, Donghong Ji
Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the increasing number of online debates among social media users, conversational stance detection has become a crucial research area. However, existing conversational stance detection datasets are restricted to a limited set of specific targets, which constrains the effectiveness of stance detection models when encountering a large number of unseen targets in real-world applications. To bridge this gap, we manually curate a large-scale, high-quality zero-shot conversational stance detection dataset, named ZS-CSD, comprising 280 targets across two distinct target types. Leveraging the ZS-CSD dataset, we propose SITPCL, a speaker interaction and target-aware prototypical contrastive learning model, and establish the benchmark performance in the zero-shot setting. Experimental results demonstrate that our proposed SITPCL model achieves state-of-the-art performance in zero-shot conversational stance detection. Notably, the SITPCL model attains only an F1-macro score of 43.81%, highlighting the persistent challenges in zero-shot conversational stance detection.
nan
Article 933
Title@2025-06-21 (6): Enhancing Stress-Strain Predictions with Seq2Seq and Cross-Attention based on Small Punch Test
Title: Enhancing Stress-Strain Predictions with Seq2Seq and Cross-Attention based on Small Punch Test | Verbesserung der Stress-Strain-Vorhersagen mit Seq2Seq und Cross-Attention auf Basis von Small Punch Test | 基于小型拳击试验的Seq2Seq和交叉注意加强应力-应变预测 2506.17680v1 |
Authors (4): Zhengni Yang, Rui Yang, Weijian Han, Qixin Liu
This paper introduces a novel deep-learning approach to predict true stress-strain curves of high-strength steels from small punch test (SPT) load-displacement data. The proposed approach uses Gramian Angular Field (GAF) to transform load-displacement sequences into images, capturing spatial-temporal features, and employs a Sequence-to-Sequence (Seq2Seq) model with an LSTM-based encoder-decoder architecture, enhanced by multi-head cross-attention to improve accuracy. Experimental results demonstrate that the proposed approach achieves superior prediction accuracy, with minimum and maximum mean absolute errors of 0.15 MPa and 5.58 MPa, respectively. The proposed method offers a promising alternative to traditional experimental techniques in materials science, enhancing the accuracy and efficiency of true stress-strain relationship predictions.
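The Gramian Angular Field step can be sketched in a few lines of NumPy. The abstract does not say which GAF variant is used, so the summation form (entries $\cos(\phi_i + \phi_j)$) is assumed here, and the function name is illustrative.

```python
import numpy as np

def gramian_angular_field(series):
    """Gramian Angular Summation Field of a 1-D sequence."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("series must not be constant")
    # Rescale to [-1, 1] so that every value is a valid cosine
    x = np.clip(2.0 * (x - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    # Polar encoding: each value becomes an angle
    phi = np.arccos(x)
    # Image entry (i, j) is cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])
```

A load-displacement sequence of length n becomes an n-by-n symmetric image whose entries encode pairwise temporal relations, which an image-style encoder can then consume.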
nan
Article 934
Title@2025-06-21 (6): Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing
Title: Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing | Inferenz-Zeit-Blick-Verfeinerung für die Mikro-Expression-Erkennung: Eventbasiertes Eye Tracking mit Motion-Aware-Post-Processing verbessern | 微电压识别的推断-时玻璃改进改进:加强以动态-软件处理后的方式对事件进行目视跟踪 2506.12524v2 |
Authors (3): Nuwan Bandara, Thivya Kandappu, Archan Misra
Event-based eye tracking holds significant promise for fine-grained cognitive state inference, offering high temporal resolution and robustness to motion artifacts, critical features for decoding subtle mental states such as attention, confusion, or fatigue. In this work, we introduce a model-agnostic, inference-time refinement framework designed to enhance the output of existing event-based gaze estimation models without modifying their architecture or requiring retraining. Our method comprises two key post-processing modules: (i) Motion-Aware Median Filtering, which suppresses blink-induced spikes while preserving natural gaze dynamics, and (ii) Optical Flow-Based Local Refinement, which aligns gaze predictions with cumulative event motion to reduce spatial jitter and temporal discontinuities. To complement traditional spatial accuracy metrics, we propose a novel Jitter Metric that captures the temporal smoothness of predicted gaze trajectories based on velocity regularity and local signal complexity. Together, these contributions significantly improve the consistency of event-based gaze signals, making them better suited for downstream tasks such as micro-expression analysis and mind-state decoding. Our results demonstrate consistent improvements across multiple baseline models on controlled datasets, laying the groundwork for future integration with multimodal affect recognition systems in real-world environments.
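One plausible reading of the Motion-Aware Median Filtering module is a median filter that is applied only where the gaze signal jumps abruptly, leaving smooth motion untouched. The sketch below follows that reading; the speed threshold, window size, and function name are assumptions, not details from the paper.

```python
import numpy as np

def motion_aware_median(gaze, window=5, speed_thresh=10.0):
    """Median-filter only the samples whose frame-to-frame jump is large.

    gaze: array (T, 2) of predicted gaze coordinates.
    """
    gaze = np.asarray(gaze, dtype=float)
    out = gaze.copy()
    half = window // 2
    # Displacement of each sample from the previous one
    speed = np.zeros(len(gaze))
    speed[1:] = np.linalg.norm(np.diff(gaze, axis=0), axis=1)
    # Replace only the fast samples by the local median of the raw signal
    for t in np.flatnonzero(speed > speed_thresh):
        lo, hi = max(0, t - half), min(len(gaze), t + half + 1)
        out[t] = np.median(gaze[lo:hi], axis=0)
    return out
```

Because slow samples are passed through unchanged, a blink-like spike is suppressed without smoothing away ordinary gaze dynamics.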
nan
Article 935
Title@2025-06-21 (6): Reinforcement Learning-Based Dynamic Grouping for Tubular Structure Tracking
Title: Reinforcement Learning-Based Dynamic Grouping for Tubular Structure Tracking | Verstärkung Learning-based Dynamic Grouping für Rohrstruktur-Tracking | 用于跟踪Tubular 结构跟踪的强化学习型动态组 2506.18930v1 |
Authors (6): Chong Di, Shuwang Zhou, Da Chen, Jean-Marie Mirebeau, Minglei Shu, Laurent D. Cohen
The computation of minimal paths for applications in tracking tubular structures such as blood vessels and roads is challenged by complex morphologies and environmental variations. Existing approaches can be roughly categorized into two research lines: point-wise models and segment-wise models. Although segment-wise approaches have obtained promising results in many scenarios, they often suffer from computational inefficiency and rely heavily on a prescribed prior to fit the target elongated shapes. We propose a novel framework that casts segment-wise tracking as a Markov Decision Process (MDP), enabling a reinforcement learning approach. Our method leverages Q-Learning to dynamically explore a graph of segments, computing edge weights on-demand and adaptively expanding the search space. This strategy avoids the high cost of a pre-computed graph and proves robust to incomplete initial information. Experimental results on typical tubular structure datasets demonstrate that our method significantly outperforms state-of-the-art point-wise and segment-wise approaches. The proposed method effectively handles complex topologies and maintains global path coherence without depending on extensive prior structural knowledge.
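The two mechanisms named in this abstract, Q-Learning over a graph of segments and on-demand edge weights, can be sketched generically. This is the standard tabular Q-learning update plus a lazy cost cache, not the authors' algorithm; the state and action encoding, the reward, and all names are assumptions.

```python
from collections import defaultdict

def edge_cost(cache, u, v, cost_fn):
    """Compute the weight of edge (u, v) only when it is first needed."""
    if (u, v) not in cache:
        cache[(u, v)] = cost_fn(u, v)
    return cache[(u, v)]

def q_update(Q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: Q <- Q + alpha * (target - Q)."""
    # Value of the best follow-up action; 0 if next_state is terminal
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]
```

Here a state would be the current segment, an action the next segment to connect, and the reward the negative of `edge_cost`, with `Q` initialized as `defaultdict(float)` so that unexplored edges need no precomputation.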
nan
Article 936
Title@2025-06-21 (6): Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities
Title: Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities | Nicht-asymptotische Annäherungen der Gaußschen neuronalen Netze über Ungleichheiten in Poincaré zweiter Ordnung | 通过Poincaré分级的第二级不平等,高森神经网络的非症状近似 2304.04010v2 |
Authors (3): Alberto Bordino, Stefano Favaro, Sandra Fortini
There is a recent and growing literature on large-width asymptotic and non-asymptotic properties of deep Gaussian neural networks (NNs), namely NNs with weights initialized as Gaussian distributions. For a Gaussian NN of depth $L\geq1$ and width $n\geq1$, it is well-known that, as $n\rightarrow+\infty$, the NN’s output converges (in distribution) to a Gaussian process. Recently, some quantitative versions of this result, also known as quantitative central limit theorems (QCLTs), have been obtained, showing that the rate of convergence is $n^{-1}$, in the $2$-Wasserstein distance, and that such a rate is optimal. In this paper, we investigate the use of second-order Poincaré inequalities as an alternative approach to establish QCLTs for the NN’s output. Previous approaches consist of a careful analysis of the NN, by combining non-trivial probabilistic tools with ad-hoc techniques that rely on the recursive definition of the network, typically by means of an induction argument over the layers, and it is unclear if and how they still apply to other NN architectures. Instead, the use of second-order Poincaré inequalities relies only on the fact that the NN is a functional of a Gaussian process, reducing the problem of establishing QCLTs to the algebraic problem of computing the gradient and Hessian of the NN’s output, which still applies to other NN architectures. We show how our approach is effective in establishing QCLTs for the NN’s output, though it leads to suboptimal rates of convergence. We argue that such a worsening in the rates is peculiar to second-order Poincaré inequalities, and it should be interpreted as the “cost” for having a straightforward, and general, procedure for obtaining QCLTs.
nan
Article 937
Title@2025-06-21 (6): Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
Title: Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference | Vernunftschaltungen in Sprachmodellen: Eine mechanistische Interpretation der syllogistischen Inferenz | 语言模型中说明理由的电路:对音频推断的机械解释 2408.08590v3 |
Authors (3): Geonhee Kim, Marco Valentino, André Freitas
Recent studies on reasoning in language models (LMs) have sparked a debate on whether they can learn systematic inferential principles or merely exploit superficial patterns in the training data. To understand and uncover the mechanisms adopted for formal reasoning in LMs, this paper presents a mechanistic interpretation of syllogistic inference. Specifically, we present a methodology for circuit discovery aimed at interpreting content-independent and formal reasoning mechanisms. Through two distinct intervention methods, we uncover a sufficient and necessary circuit involving middle-term suppression that elucidates how LMs transfer information to derive valid conclusions from premises. Furthermore, we investigate how belief biases manifest in syllogistic inference, finding evidence of partial contamination from additional attention heads responsible for encoding commonsense and contextualized knowledge. Finally, we explore the generalization of the discovered mechanisms across various syllogistic schemes, model sizes and architectures. The identified circuit is sufficient and necessary for syllogistic schemes on which the models achieve high accuracy (>60%), with compatible activation patterns across models of different families. Overall, our findings suggest that LMs learn transferable content-independent reasoning mechanisms, but that, at the same time, such mechanisms do not involve generalizable and abstract logical primitives, being susceptible to contamination by the same world knowledge acquired during pre-training.
nan
Article 938
Title@2025-06-21 (6): Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization
Title: Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization | Robustes LLM-Unlearning mit MUDMAN: Meta-Unlearning mit Disruptionsmasken und Normalisierung | 与 MUDMAN 一起重新学习: 以干扰蒙蔽和正常化的方式重新学习 2506.12484v2 |
Authors (4): Filip Sondej, Yushi Yang, Mikołaj Kniejski, Marcel Windys
Language models can retain dangerous knowledge and skills even after extensive safety fine-tuning, posing both misuse and misalignment risks. Recent studies show that even specialized unlearning methods can be easily reversed. To address this, we systematically evaluate many existing and novel components of unlearning methods and identify ones crucial for irreversible unlearning. We introduce Disruption Masking, a technique in which we only allow updating weights where the signs of the unlearning gradient and the retaining gradient are the same. This ensures all updates are non-disruptive. Additionally, we identify the need for normalizing the unlearning gradients, and also confirm the usefulness of meta-learning. We combine these insights into MUDMAN (Meta-Unlearning with Disruption Masking and Normalization) and validate its effectiveness at preventing the recovery of dangerous capabilities. MUDMAN outperforms the prior TAR method by 40%, setting a new state-of-the-art for robust unlearning.
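Disruption Masking as described in this abstract amounts to an element-wise sign test between two gradients. The sketch below shows that test together with a simple normalization; the global L2 normalization and the function name are assumptions, and the meta-learning component is not shown.

```python
import numpy as np

def masked_unlearning_update(unlearn_grad, retain_grad, eps=1e-8):
    """Keep the unlearning update only where it agrees in sign with the
    retaining gradient, after normalizing the unlearning gradient."""
    unlearn_grad = np.asarray(unlearn_grad, dtype=float)
    retain_grad = np.asarray(retain_grad, dtype=float)
    # Assumed normalization: divide by the global L2 norm
    g = unlearn_grad / (np.linalg.norm(unlearn_grad) + eps)
    # Disruption mask: True where both gradients have the same sign
    mask = np.sign(unlearn_grad) == np.sign(retain_grad)
    return g * mask
```

Entries where the two gradients disagree are zeroed, so a step along the returned direction never moves a weight against the retaining objective.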
nan
Article 939
Title@2025-06-21 (6): Gaussian Process Latent Variable Modeling for Few-shot Time Series Forecasting
Title: Gaussian Process Latent Variable Modeling for Few-shot Time Series Forecasting | Gaussian Prozess Latente Variable Modellierung für wenige Fotos Time Series Forecasting | Gaussian 微短时间序列预测的 Gaussian 进程中点变量建模 2212.10306v2 |
Authors (9): Yunyao Cheng, Chenjuan Guo, Kaixuan Chen, Kai Zhao, Bin Yang, Jiandong Xie, Christian S. Jensen, Feiteng Huang, Kai Zheng
Accurate time series forecasting is crucial for optimizing resource allocation, industrial production, and urban management, particularly with the growth of cyber-physical and IoT systems. However, limited training sample availability in fields like physics and biology poses significant challenges. Existing models struggle to capture long-term dependencies and to model diverse meta-knowledge explicitly in few-shot scenarios. To address these issues, we propose MetaGP, a meta-learning-based Gaussian process latent variable model that uses a Gaussian process kernel function to capture long-term dependencies and to maintain strong correlations in time series. We also introduce Kernel Association Search (KAS) as a novel meta-learning component to explicitly model meta-knowledge, thereby enhancing both interpretability and prediction accuracy. We study MetaGP on simulated and real-world few-shot datasets, showing that it is capable of state-of-the-art prediction accuracy. We also find that MetaGP can capture long-term dependencies and can model meta-knowledge, thereby providing valuable insights into complex time series patterns.
nan
Article 940
Title@2025-06-21 (6): FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies
Title: FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies | FaithfulSAE: Auf dem Weg zur Erfassung treuer Funktionen mit Sparse Autoencodern ohne externe Datensatzabhängigkeiten | 忠实的SAE:在没有外部数据集依赖性的情况下, 与粗略自动解析器一起获取忠实的特征 2506.17673v1 |
Authors (6): Seonglae Cho, Harryn Oh, Donghyun Lee, Luis Eduardo Rodrigues Vieira, Andrew Bermingham, Ziad El Sayed
Sparse Autoencoders (SAEs) have emerged as a promising solution for decomposing large language model representations into interpretable features. However, Paulo and Belrose (2025) have highlighted instability across different initialization seeds, and Heap et al. (2025) have pointed out that SAEs may not capture model-internal features. These problems likely stem from training SAEs on external datasets - either collected from the Web or generated by another model - which may contain out-of-distribution (OOD) data beyond the model’s generalisation capabilities. This can result in hallucinated SAE features, which we term “Fake Features”, that misrepresent the model’s internal activations. To address these issues, we propose FaithfulSAE, a method that trains SAEs on the model’s own synthetic dataset. Using FaithfulSAEs, we demonstrate that training SAEs on less-OOD instruction datasets results in SAEs being more stable across seeds. Notably, FaithfulSAEs outperform SAEs trained on web-based datasets in the SAE probing task and exhibit a lower Fake Feature Ratio in 5 out of 7 models. Overall, our approach eliminates the dependency on external datasets, advancing interpretability by better capturing model-internal features while highlighting the often neglected importance of SAE training datasets.
nan
Article 941
Title@2025-06-21 (6): Learning Personalized Utility Functions for Drivers in Ride-hailing Systems Using Ensemble Hypernetworks
Title: Learning Personalized Utility Functions for Drivers in Ride-hailing Systems Using Ensemble Hypernetworks | Learning Personalisierte Utility-Funktionen für Treiber in Ride-Haling-Systemen mit Ensemble Hypernetworks | 利用组合式超网络进行乘载系统的驱动人员学习个性化功用功能 2506.17672v1 |
Authors (3): Weiming Mai, Jie Gao, Oded Cats
In ride-hailing systems, drivers decide whether to accept or reject ride requests based on factors such as order characteristics, traffic conditions, and personal preferences. Accurately predicting these decisions is essential for improving the efficiency and reliability of these systems. Traditional models, such as the Random Utility Maximization (RUM) approach, typically predict drivers’ decisions by assuming linear correlations among attributes. However, these models often fall short because they fail to account for non-linear interactions between attributes and do not cater to the unique, personalized preferences of individual drivers. In this paper, we develop a method for learning personalized utility functions using hypernetworks and ensemble learning. Hypernetworks dynamically generate weights for a linear utility function based on trip request data and driver profiles, capturing the non-linear relationships. An ensemble of hypernetworks trained on different data segments further improves model adaptability and generalization by introducing controlled randomness, thereby reducing over-fitting. We validate the performance of our ensemble hypernetworks model in terms of prediction accuracy and uncertainty estimation on a real-world dataset. The results demonstrate that our approach not only accurately predicts each driver’s utility but also effectively balances the needs for explainability and uncertainty quantification. Additionally, our model serves as a powerful tool for revealing the personalized preferences of different drivers, clearly illustrating which attributes largely impact their ride acceptance decisions.
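The idea of a hypernetwork that emits the weights of a linear utility function can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's model: the one-hidden-layer architecture, the untrained random parameters, the sigmoid link to an acceptance probability, and the names `make_hypernet`, `utility`, and `ensemble_predict` are all hypothetical.

```python
import numpy as np

def make_hypernet(rng, context_dim, n_attrs, hidden=8):
    """Create one (untrained) hypernetwork as a dict of weight matrices."""
    return {
        "W1": rng.normal(scale=0.5, size=(context_dim, hidden)),
        "W2": rng.normal(scale=0.5, size=(hidden, n_attrs)),
    }

def utility(hypernet, context, attrs):
    """Generate linear-utility weights from the context, then score the trip."""
    h = np.tanh(context @ hypernet["W1"])   # non-linear encoding of driver/trip context
    w = h @ hypernet["W2"]                  # generated weights, one per trip attribute
    return float(w @ attrs)                 # the utility stays linear in the attributes

def ensemble_predict(hypernets, context, attrs):
    """Mean acceptance probability and its spread across ensemble members."""
    utils = np.array([utility(h, context, attrs) for h in hypernets])
    probs = 1.0 / (1.0 + np.exp(-utils))
    return probs.mean(), probs.std()
```

Because the generated weights multiply the attributes directly, they can be read off per driver and per request, and the spread across ensemble members gives a rough uncertainty signal.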
nan
Article 942
Title@2025-06-21 (6): TPTT: Transforming Pretrained Transformer into Titans
Title: TPTT: Transforming Pretrained Transformer into Titans | TPTT: Transformieren des vortrainierten Transformers in Titanen | TPTT: 将预训练变形器转换成巨人 2506.17671v1 |
Authors (1): Fabien Furfaro
Recent advances in large language models (LLMs) have led to remarkable progress in natural language processing, but their computational and memory demands remain a significant challenge, particularly for long-context inference. We introduce TPTT (Transforming Pretrained Transformer into Titans), a novel framework for enhancing pretrained Transformer models with efficient linearized attention mechanisms and advanced memory management. TPTT employs techniques such as Memory as Gate (MaG) and mixed linearized attention (LiZA). It is fully compatible with the Hugging Face Transformers library, enabling seamless adaptation of any causal LLM through parameter-efficient fine-tuning (LoRA) without full retraining. We show the effectiveness of TPTT on the MMLU benchmark with models of approximately 1 billion parameters, observing substantial improvements in both efficiency and accuracy. For instance, Titans-Llama-3.2-1B achieves a 20% increase in Exact Match (EM) over its baseline. Statistical analyses and comparisons with recent state-of-the-art methods confirm the practical scalability and robustness of TPTT. Code is available at https://github.com/fabienfrfr/tptt . Python package at https://pypi.org/project/tptt/ .
nan
Article 943
Title@2025-06-21 (6): Online Multi-LLM Selection via Contextual Bandits under Unstructured Context Evolution
Title: Online Multi-LLM Selection via Contextual Bandits under Unstructured Context Evolution | Online-Multi-LLM-Auswahl über Kontext-Banditen unter unstrukturierter Kontext-Evolution | 在无结构环境演变下通过背景强盗进行在线多LLLM选择 2506.17670v1 |
Authors (6): Manhin Poon, XiangXiang Dai, Xutong Liu, Fang Kong, John C. S. Lui, Jinhang Zuo
Large language models (LLMs) exhibit diverse response behaviors, costs, and strengths, making it challenging to select the most suitable LLM for a given user query. We study the problem of adaptive multi-LLM selection in an online setting, where the learner interacts with users through multi-step query refinement and must choose LLMs sequentially without access to offline datasets or model internals. A key challenge arises from unstructured context evolution: the prompt dynamically changes in response to previous model outputs via a black-box process, which cannot be simulated, modeled, or learned. To address this, we propose the first contextual bandit framework for sequential LLM selection under unstructured prompt dynamics. We formalize a notion of myopic regret and develop a LinUCB-based algorithm that provably achieves sublinear regret without relying on future context prediction. We further introduce budget-aware and positionally-aware (favoring early-stage satisfaction) extensions to accommodate variable query costs and user preferences for early high-quality responses. Our algorithms are theoretically grounded and require no offline fine-tuning or dataset-specific training. Experiments on diverse benchmarks demonstrate that our methods outperform existing LLM routing strategies in both accuracy and cost-efficiency, validating the power of contextual bandits for real-time, adaptive LLM selection.
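For readers unfamiliar with LinUCB, the following is a sketch of the standard disjoint LinUCB algorithm, where each arm would correspond to one candidate LLM. It does not implement the paper's myopic-regret analysis or its budget-aware and positionally-aware extensions. The class name, the exploration parameter `alpha`, and the context vector `x` are illustrative assumptions.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm plus a confidence bonus."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward-weighted contexts

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of arm parameters
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # exploration bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Update only the chosen arm with the observed reward."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

Each round uses only the current context, so no prediction of how the prompt will evolve is needed, which matches the myopic setting described in the abstract.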
nan
Article 944
Title@2025-06-21 (6): How to Train Your Multi-Exit Model? Analyzing the Impact of Training Strategies
Title: How to Train Your Multi-Exit Model? Analyzing the Impact of Training Strategies | Wie trainieren Sie Ihr Multi-Exit-Modell? Analysieren der Auswirkungen von Trainingsstrategien | 如何培训你的多出口模式?分析培训战略的影响 2407.14320v2 |
Authors (7): Piotr Kubaty, Bartosz Wójcik, Bartłomiej Krzepkowski, Monika Michaluk, Tomasz Trzciński, Jary Pomponi, Kamil Adamczewski
Early exits enable the network’s forward pass to terminate early by attaching trainable internal classifiers to the backbone network. Existing early-exit methods typically adopt either a joint training approach, where the backbone and exit heads are trained simultaneously, or a disjoint approach, where the heads are trained separately. However, the implications of this choice are often overlooked, with studies typically adopting one approach without adequate justification. This choice influences training dynamics and its impact remains largely unexplored. In this paper, we introduce a set of metrics to analyze early-exit training dynamics and guide the choice of training strategy. We demonstrate that conventionally used joint and disjoint regimes yield suboptimal performance. To address these limitations, we propose a mixed training strategy: the backbone is trained first, followed by the training of the entire multi-exit network. Through comprehensive evaluations of training strategies across various architectures, datasets, and early-exit methods, we present the strengths and weaknesses of the early exit training strategies. In particular, we show consistent improvements in performance and efficiency using the proposed mixed strategy.
nan
Article 945
Title@2025-06-21 (6): Advanced Modeling for Exoplanet Detection and Characterization
Title: Advanced Modeling for Exoplanet Detection and Characterization | Erweiterte Modellierung für Exoplanetenerkennung und Charakterisierung | 推进异地平原探测和特征化的模型化 2506.17665v1 |
Authors (1): Krishna Chamarthy
Research into light curves from stars (temporal variation of brightness) has completely changed how exoplanets are discovered and characterised. This study uses stellar light curves from the Kepler dataset to discover exoplanets through planetary transits and to estimate their physical characteristics using light-curve analysis and machine learning methods. The dataset consists of measured flux recordings for many individual stars; we will examine the light curve of each star and look for periodic dips in brightness caused by an astronomical body making a transit. We will apply an established method for deriving measurements from light curve data to obtain key parameters of the transiting planet, such as its distance to the host star, orbital period, and radius. The orbital period will typically be measured from the time between successive transits, and the radius will be measured from the depth of the transit. The density of the star and planet can also be estimated from the transit event, as well as very limited information on the albedo (reflectivity) and atmosphere of the planet, based on transmission spectroscopy and/or analysis of the phase curve of the flux. In addition to these methods, we will employ machine learning to classify the stars (i.e., likely to have an exoplanet or likely not to have one) based on flux change. This could both make the search for exoplanets more efficient and provide important parameters for the planet, offering a much quicker means of searching vast astronomical datasets for likely exoplanets.
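The relations mentioned in this abstract can be made concrete with textbook formulas. The sketch below is not taken from the paper: it assumes the standard approximations that transit depth is roughly (R_planet / R_star)^2, that the period is the mean spacing between successive transit times, and that the orbital distance follows Kepler's third law with the planet mass neglected. Function names and constants are illustrative.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
R_SUN = 6.957e8      # solar radius, m

def orbital_period(transit_times):
    """Mean time between successive transits (same unit as the input)."""
    gaps = [b - a for a, b in zip(transit_times, transit_times[1:])]
    return sum(gaps) / len(gaps)

def planet_radius(depth, stellar_radius_m):
    """Planet radius from fractional transit depth: depth ~ (Rp / Rs)^2."""
    return stellar_radius_m * math.sqrt(depth)

def semi_major_axis(period_s, stellar_mass_kg):
    """Orbital distance from Kepler's third law, neglecting the planet's mass."""
    return (G * stellar_mass_kg * period_s ** 2 / (4 * math.pi ** 2)) ** (1 / 3)
```

For example, a 1% dip in flux around a Sun-like star corresponds to a planet of about one tenth of the stellar radius, roughly the size of Jupiter.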
nan
Article 946
Title@2025-06-21 (6): Stop Overvaluing Multi-Agent Debate – We Must Rethink Evaluation and Embrace Model Heterogeneity
Title: Stop Overvaluing Multi-Agent Debate – We Must Rethink Evaluation and Embrace Model Heterogeneity | Mehr-Agenten-Debatte stoppen – Wir müssen Bewertung neu denken und Modell Heterogenität umarmen | 停止高估多机构辩论 – – 我们必须重新思考评价和拥抱模型多样性 2502.08788v3 |
Authors (8): Hangfan Zhang, Zhiyao Cui, Jianhao Chen, Xinrun Wang, Qiaosheng Zhang, Zhen Wang, Dinghao Wu, Shuyue Hu
Multi-agent debate (MAD) has gained significant attention as a promising line of research to improve the factual accuracy and reasoning capabilities of large language models (LLMs). Despite its conceptual appeal, current MAD research suffers from critical limitations in evaluation practices, including limited benchmark coverage, weak baseline comparisons, and inconsistent setups. This paper presents a systematic evaluation of 5 representative MAD methods across 9 benchmarks using 4 foundational models. Surprisingly, our findings reveal that MAD often fails to outperform simple single-agent baselines such as Chain-of-Thought and Self-Consistency, even when consuming significantly more inference-time computation. To advance MAD research, we further explore the role of model heterogeneity and find it to be a universal antidote that consistently improves current MAD frameworks. Based on our findings, we argue that the field must stop overvaluing MAD in its current form; for true advancement, we must critically rethink evaluation paradigms and actively embrace model heterogeneity as a core design principle.
nan
Article 947
Title@2025-06-21 (6): How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Title: How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs | Wie numerische Präzision die Fähigkeit von LLMs zur Arithmetik beeinflusst | 数字精确度如何影响LLM 的理理原因能力 2410.13857v2 |
Authors (9): Guhao Feng, Kai Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Zhenguo Li, Liwei Wang
Despite the remarkable success of Transformer-based large language models (LLMs) across various domains, understanding and enhancing their mathematical capabilities remains a significant challenge. In this paper, we conduct a rigorous theoretical analysis of LLMs’ mathematical abilities, with a specific focus on their arithmetic performances. We identify numerical precision as a key factor that influences their effectiveness in arithmetical tasks. Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication, unless the model size grows super-polynomially with respect to the input length. In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes. We further support our theoretical findings through empirical experiments that explore the impact of varying numerical precision on arithmetic tasks, providing valuable insights for improving the mathematical reasoning capabilities of LLMs.
nan
Article 948
Title@2025-06-21 (6): Comba: Improving Bilinear RNNs with Closed-loop Control
Title: Comba: Improving Bilinear RNNs with Closed-loop Control | Comba: Bilineare RNNs mit Closed-Loop-Steuerung verbessern | Comba: 改进有闭环控制的双线区域网网 2506.02475v3 |
Authors (8): Jiaxi Hu, Yongqi Pan, Jusen Du, Disen Lan, Xiaqiang Tang, Qingsong Wen, Yuxuan Liang, Weigao Sun
Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising the recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, structurally resembling bilinear systems. In this paper, we first introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition, with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computation efficiency in both language and vision modeling.
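The interaction between the recurrent state and the key vector can be seen in the classical delta rule, sketched below. This is the generic delta-rule memory write, shown only as background; it is not Comba's scalar-plus-low-rank transition or its feedback corrections. The function name, the state layout (values by keys), and the step size `beta` are assumptions.

```python
import numpy as np

def delta_rule_step(S, k, v, beta=1.0):
    """One delta-rule write: move the value stored under key k toward v."""
    prediction = S @ k                       # what the memory currently returns for k
    # The correction multiplies the state by the key, which is the bilinear term.
    return S - beta * np.outer(prediction - v, k)
```

With a unit-norm key and `beta=1.0`, reading the memory with the same key immediately after the write returns the new value exactly.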
nan
Article 949
Title@2025-06-21 (6): Step-Opt: Boosting Optimization Modeling in LLMs through Iterative Data Synthesis and Structured Validation
Title: Step-Opt: Boosting Optimization Modeling in LLMs through Iterative Data Synthesis and Structured Validation | Schritt-Opt: Steigerung der Optimierungsmodellierung in LLMs durch iterative Datensynthese und strukturierte Validierung | 通过迭代数据合成和结构化校验,促进通过迭代数据合成和结构化校验,在LLMs中建立优化优化模型模型 2506.17637v1 |
Authors (6): Yang Wu, Yifan Zhang, Yurong Wu, Yuran Wang, Junkai Zhang, Jian Cheng
Large Language Models (LLMs) have revolutionized various domains but encounter substantial challenges in tackling optimization modeling tasks for Operations Research (OR), particularly when dealing with complex problems. In this work, we propose Step-Opt-Instruct, a framework that augments existing datasets and generates high-quality fine-tuning data tailored to optimization modeling. Step-Opt-Instruct employs iterative problem generation to systematically increase problem complexity and stepwise validation to rigorously verify data, preventing error propagation and ensuring the quality of the generated dataset. Leveraging this framework, we fine-tune open-source LLMs, including LLaMA-3-8B and Mistral-7B, to develop Step-Opt, a model that achieves state-of-the-art performance on benchmarks such as NL4OPT, MAMO, and IndustryOR. Extensive experiments demonstrate the superior performance of Step-Opt, especially in addressing complex OR tasks, with a notable 17.01% improvement in micro average accuracy on difficult problems. These findings highlight the effectiveness of combining structured validation with gradual problem refinement to advance the automation of decision-making processes using LLMs. The code and dataset are available at https://github.com/samwu-learn/Step.
nan
Article 950
Title@2025-06-21 (6): Completely Parameter-Free Single-Loop Algorithms for Nonconvex-Concave Minimax Problems
Title: Completely Parameter-Free Single-Loop Algorithms for Nonconvex-Concave Minimax Problems | Vollständig Parameter-freie Single-Loop-Algorithmen für nicht konvex-konkave Minimax-Probleme | 完全无参数的非convex- Conceve Minimax 问题单线单光解算法 2407.21372v3 |
Authors (3): Junnan Yang, Huiling Zhang, Zi Xu
Due to their importance in various emerging applications, efficient algorithms for solving minimax problems have recently received increasing attention. However, many existing algorithms require prior knowledge of the problem parameters in order to achieve optimal iteration complexity. In this paper, three completely parameter-free single-loop algorithms, namely the PF-AGP-NSC, PF-AGP-NC, and PF-AGP-NL algorithms, are proposed to solve smooth nonconvex-strongly concave, nonconvex-concave, and nonconvex-linear minimax problems, respectively, using line search without requiring any prior knowledge about parameters such as the Lipschitz constant $L$ or the strongly concave modulus $\mu$. Furthermore, we prove that the total number of gradient calls required to obtain an $\varepsilon$-stationary point for the PF-AGP-NSC algorithm, the PF-AGP-NC algorithm, and the PF-AGP-NL algorithm is upper bounded by $\mathcal{O}\left( L^2\kappa^3\varepsilon^{-2} \right)$, $\mathcal{O}\left( \log^2(L)L^4\varepsilon^{-4} \right)$, and $\mathcal{O}\left( L^3\varepsilon^{-3} \right)$, respectively, where $\kappa$ is the condition number. To the best of our knowledge, PF-AGP-NC and PF-AGP-NL are the first completely parameter-free algorithms for solving nonconvex-concave and nonconvex-linear minimax problems, respectively. PF-AGP-NSC is a completely parameter-free algorithm for solving nonconvex-strongly concave minimax problems, achieving the best known complexity with respect to $\varepsilon$. Numerical results demonstrate the efficiency of the three proposed algorithms.
nan
Article 951
Title@2025-06-21 (6): RPLKG: Robust Prompt Learning with Knowledge Graph
Title: RPLKG: Robust Prompt Learning with Knowledge Graph | RPLKG: Robustes Prompt-Lernen mit Wissensgrafik | ROPLKG: 运用知识图进行强力快速学习 2304.10805v2 |
Authors (5): YongTaek Lim, Yewon Kim, Suho Kang, Dokyung Yoon, KyungWoo Song
Large-scale pre-trained models excel in transferability and robust generalization across diverse datasets. The emergence of multimodal pre-trained models like CLIP has significantly boosted performance in various experiments. However, generalizing to new datasets or domains remains challenging, especially with limited labeled data. Also, existing methods often lack interpretability and impose high computational costs. To address this, we propose Robust Prompt Learning with Knowledge Graph (RPLKG), leveraging the knowledge graph to curate diverse, interpretable prompt sets automatically. Our method autonomously selects the optimal interpretable prompt based on dataset characteristics, achieving performance improvements over zero-shot learning and competitive performance compared to various prompt learning methods. Also, RPLKG efficiently reuses cached prompt embeddings from a single model pass and optimizes prompt selection via Gumbel-Softmax, enabling low-memory, fast training. Moreover, RPLKG advances few-shot learning effectiveness while enhancing interpretability and efficiency in model adaptation. Our
nan
Article 952
Title@2025-06-21 (6): LLM-Prompt: Integrated Heterogeneous Prompts for Unlocking LLMs in Time Series Forecasting
Title: LLM-Prompt: Integrated Heterogeneous Prompts for Unlocking LLMs in Time Series Forecasting | LLM-Prompt: Integrierte Heterogene Prompt für die Entriegelung von LLMs in der Zeitreihenprognose | LLM-Prompt:在时间序列预测中解锁LLMLM的综合异种提示 2506.17631v1 |
Authors (3): Zesen Wang, Yonggang Li, Lijuan Lan
Time series forecasting aims to model temporal dependencies among variables for future state inference, holding significant importance and widespread applications in real-world scenarios. Although deep learning-based methods have achieved remarkable progress, they still exhibit suboptimal performance in long-term forecasting and data-scarce scenarios. Recent research demonstrates that large language models (LLMs) achieve promising performance in time series forecasting. However, we find existing LLM-based methods still have shortcomings: (1) the absence of a unified paradigm for textual prompt formulation and (2) the neglect of modality discrepancies between textual prompts and time series. To address this, we propose LLM-Prompt, an LLM-based time series forecasting framework integrating multi-prompt information and cross-modal semantic alignment. Specifically, we first construct a unified textual prompt paradigm containing learnable soft prompts and textualized hard prompts. Second, to enhance LLMs’ comprehensive understanding of the forecasting task, we design a semantic space embedding and cross-modal alignment module to achieve cross-modal fusion of temporal and textual information. Finally, the transformed time series from the LLMs are projected to obtain the forecasts. Comprehensive evaluations on 6 public datasets and 3 carbon emission datasets demonstrate that LLM-Prompt is a powerful framework for time series forecasting.
nan
Article 953
Title@2025-06-21 (6): UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation
Title: UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation | UniMoT: Unified Molecule-Text Language Model mit diskreter Token-Darstellung | UniMoT: 具有分立调制调制解析器表示式的统一分子文字语言模式 2408.00863v2 |
Authors (6): Shuhan Guo, Yatao Bian, Ruibing Wang, Nan Yin, Zhen Wang, Quanming Yao
The remarkable success of Large Language Models (LLMs) across diverse tasks has driven the research community to extend their capabilities to molecular applications. However, most molecular LLMs employ adapter-based architectures that do not treat molecule and text modalities equally and lack a supervision signal for the molecule modality. To address these issues, we introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture that expands the vocabulary of LLM with molecule tokens. Specifically, we introduce a Vector Quantization-driven tokenizer that incorporates a Q-Former to bridge the modality gap between molecule and text. This tokenizer transforms molecules into sequences of molecule tokens with causal dependency, encapsulating high-level molecular and textual information. Equipped with this tokenizer, UniMoT can unify molecule and text modalities under a shared token representation and an autoregressive training paradigm, enabling it to interpret molecules as a foreign language and generate them as text. Following a four-stage training scheme, UniMoT emerges as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks. Extensive experiments demonstrate that UniMoT achieves state-of-the-art performance across a wide range of molecule comprehension and generation tasks.
nan
Article 954
Title@2025-06-21 (6): A Closer Look into Mixture-of-Experts in Large Language Models
Title: A Closer Look into Mixture-of-Experts in Large Language Models | Ein genauerer Blick in Mixture-of-Experts in großen Sprachmodellen | 更密切地研究大语言模型混合专家 2406.18219v3 |
Authors (5): Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechanism of MoE still lacks further exploration, and its modularization degree remains questionable. In this paper, we make an initial attempt to understand the inner workings of MoE-based large language models. Concretely, we comprehensively study the parametric and behavioral features of three popular MoE-based models and reveal some intriguing observations, including 1) Neurons act like fine-grained experts; 2) The router of MoE usually selects experts with larger output norms; 3) The expert diversity increases as the layer increases, while the last layer is an outlier, which is further validated by an initial experiment. Based on the observations, we also provide suggestions for a broad spectrum of MoE practitioners, such as router design and expert allocation. We hope this work could shed light on future research on the MoE framework and other modular architectures. Code is available at https://github.com/kamanphoebe/Look-into-MoEs.
nan
Article 955
Title@2025-06-21 (6): DrivAer Transformer: A high-precision and fast prediction method for vehicle aerodynamic drag coefficient based on the DrivAerNet++ dataset
Title: DrivAer Transformer: A high-precision and fast prediction method for vehicle aerodynamic drag coefficient based on the DrivAerNet++ dataset | DrivAer Transformer: Eine hochpräzise und schnelle Vorhersagemethode für den aerodynamischen Widerstandskoeffizienten auf Basis des DrivAerNet++ Datensatzes | DriivAer变换器:基于DriivAerNet++数据集的车辆空气动力拖动系数的高精度和快速预测方法 2504.08217v5 |
Authors (3): Jiaqi He, Xiangwen Luo, Yiping Wang
At the current stage, deep learning-based methods have demonstrated excellent capabilities in evaluating aerodynamic performance, significantly reducing the time and cost required for traditional computational fluid dynamics (CFD) simulations. However, when faced with the task of processing extremely complex three-dimensional (3D) vehicle models, the lack of large-scale datasets and training resources, coupled with the inherent diversity and complexity of the geometry of different vehicle models, means that the prediction accuracy and versatility of these networks are still not up to the level required for current production. In view of the remarkable success of Transformer models in the field of natural language processing and their strong potential in the field of image processing, this study proposes a point cloud learning framework called DrivAer Transformer (DAT). The DAT structure uses the DrivAerNet++ dataset, which contains high-fidelity CFD data of industrial-standard 3D vehicle shapes, enabling accurate estimation of air drag directly from 3D meshes and thus avoiding the limitations of traditional methods such as 2D image rendering or signed distance fields (SDF). DAT enables fast and accurate drag prediction, driving the evolution of the aerodynamic evaluation process and laying the critical foundation for introducing a data-driven approach to automotive design. The framework is expected to accelerate the vehicle design process and improve development efficiency.
nan
Article 956
Title@2025-06-21 (6): Exploring the Secondary Risks of Large Language Models
Title: Exploring the Secondary Risks of Large Language Models | Erforschung der sekundären Risiken großer Sprachmodelle | 探讨大语言模式的次要风险 2506.12382v2 |
Authors (6): Jiawei Chen, Zhengwei Fang, Xiao Yang, Chao Yu, Zhaoxia Yin, Hang Su
Ensuring the safety and alignment of Large Language Models is a significant challenge with their growing integration into critical applications and societal functions. While prior research has primarily focused on jailbreak attacks, less attention has been given to non-adversarial failures that subtly emerge during benign interactions. We introduce secondary risks, a novel class of failure modes marked by harmful or misleading behaviors during benign prompts. Unlike adversarial attacks, these risks stem from imperfect generalization and often evade standard safety mechanisms. To enable systematic evaluation, we introduce two risk primitives, verbose response and speculative advice, that capture the core failure patterns. Building on these definitions, we propose SecLens, a black-box, multi-objective search framework that efficiently elicits secondary risk behaviors by optimizing task relevance, risk activation, and linguistic plausibility. To support reproducible evaluation, we release SecRiskBench, a benchmark dataset of 650 prompts covering eight diverse real-world risk categories. Experimental results from extensive evaluations on 16 popular models demonstrate that secondary risks are widespread, transferable across models, and modality independent, emphasizing the urgent need for enhanced safety mechanisms to address benign yet harmful LLM behaviors in real-world deployments.
nan
Article 957
Title@2025-06-21 (6): Exploiting Efficiency Vulnerabilities in Dynamic Deep Learning Systems
Title: Exploiting Efficiency Vulnerabilities in Dynamic Deep Learning Systems | Nutzung von Effizienzlücken in dynamischen Deep-Learning-Systemen | 利用动态深深学习系统的效率脆弱性 2506.17621v1 |
Authors (2): Ravishka Rathnasuriya, Wei Yang
The growing deployment of deep learning models in real-world environments has intensified the need for efficient inference under strict latency and resource constraints. To meet these demands, dynamic deep learning systems (DDLSs) have emerged, offering input-adaptive computation to optimize runtime efficiency. While these systems succeed in reducing cost, their dynamic nature introduces subtle and underexplored security risks. In particular, input-dependent execution pathways create opportunities for adversaries to degrade efficiency, resulting in excessive latency, energy usage, and potential denial-of-service in time-sensitive deployments. This work investigates the security implications of dynamic behaviors in DDLSs and reveals how current systems expose efficiency vulnerabilities exploitable by adversarial inputs. Through a survey of existing attack strategies, we identify gaps in the coverage of emerging model architectures and limitations in current defense mechanisms. Building on these insights, we propose to examine the feasibility of efficiency attacks on modern DDLSs and develop targeted defenses to preserve robustness under adversarial conditions.
Article 958
Title@2025-06-21 (6): Trustworthy Chronic Disease Risk Prediction For Self-Directed Preventive Care via Medical Literature Validation
Title: Trustworthy Chronic Disease Risk Prediction For Self-Directed Preventive Care via Medical Literature Validation | Vertrauenswürdige Risikovorhersage für chronische Krankheiten für die selbstgesteuerte Präventivversorgung über die Validierung medizinischer Literatur | 通过医学文学鉴定对自我分散的预防性护理进行可靠可靠慢性慢性病风险预测 2506.17620v1 |
Authors (2): Minh Le, Khoi Ton
Chronic diseases are long-term, manageable, yet typically incurable conditions, highlighting the need for effective preventive strategies. Machine learning has been widely used to assess individual risk for chronic diseases. However, many models rely on medical test data (e.g. blood results, glucose levels), which limits their utility for proactive self-assessment. Additionally, to gain public trust, machine learning models should be explainable and transparent. Although some research on self-assessment machine learning models includes explainability, their explanations are not validated against established medical literature, reducing confidence in their reliability. To address these issues, we develop deep learning models that predict the risk of developing 13 chronic diseases using only personal and lifestyle factors, enabling accessible, self-directed preventive care. Importantly, we use SHAP-based explainability to identify the most influential model features and validate them against established medical literature. Our results show a strong alignment between the models’ most influential features and established medical literature, reinforcing the models’ trustworthiness. Critically, we find that this observation holds across 13 distinct diseases, indicating that this machine learning approach can be broadly trusted for chronic disease prediction. This work lays the foundation for developing trustworthy machine learning tools for self-directed preventive care. Future research can explore other approaches for models’ trustworthiness and discuss how the models can be used ethically and responsibly.
Article 959
Title@2025-06-21 (6): Federated Learning With Energy Harvesting Devices: An MDP Framework
Title: Federated Learning With Energy Harvesting Devices: An MDP Framework | Federated Learning with Energy Harvesting Devices: Ein MDP-Framework | 联邦能源收获装置学习:MDP框架 2405.10513v2 |
Authors (3): Kai Zhang, Xuanyu Cao, Khaled B. Letaief
Federated learning (FL) necessitates that edge devices conduct local training and communicate with a parameter server, resulting in significant energy consumption. A key challenge in practical FL systems is the rapid depletion of battery-limited edge devices, which limits their operational lifespan and impacts learning performance. To tackle this issue, we implement energy harvesting techniques in FL systems to capture ambient energy, thereby providing continuous power to edge devices. We first establish the convergence bound for the wireless FL system with energy harvesting devices, illustrating that the convergence is affected by partial device participation and packet drops, both of which depend on the energy supply. To accelerate the convergence, we formulate a joint device scheduling and power control problem and model it as a Markov decision process (MDP). By solving this MDP, we derive the optimal transmission policy and demonstrate that it possesses a monotone structure with respect to the battery and channel states. To overcome the curse of dimensionality caused by the exponential complexity of computing the optimal policy, we propose a low-complexity algorithm, which is asymptotically optimal as the number of devices increases. Furthermore, for unknown channels and harvested energy statistics, we develop a structure-enhanced deep reinforcement learning algorithm that leverages the monotone structure of the optimal policy to improve the training performance. Finally, extensive numerical experiments on real-world datasets are presented to validate the theoretical results and corroborate the effectiveness of the proposed algorithms.
Article 960
Title@2025-06-21 (6): EQuARX: Efficient Quantized AllReduce in XLA for Distributed Machine Learning Acceleration
Title: EQuARX: Efficient Quantized AllReduce in XLA for Distributed Machine Learning Acceleration | EQuARX: Effiziente Quantisiertes AllReduce in XLA zur Beschleunigung des verteilten maschinellen Lernens | EuARX: XLA 中高效量化全减,以加速分配机器学习 2506.17615v1 |
Authors (8): Ibrahim Ahmed, Clemens Schaefer, Gil Tabak, Denis Vnukov, Zenong Zhang, Felix Chern, Anatoliy Yevtushenko, Andy Davis
While Large Language Models (LLMs) have become highly influential, their enormous scale presents significant deployment challenges. Efficiently serving these models typically requires distributing them across numerous accelerator devices, which introduces substantial performance overhead from inter-device communication (collectives). While model quantization has been widely adopted to reduce the memory and compute requirements of LLM weights and activations with minimal quality impact, applying quantization directly to collectives like AllReduce is inherently difficult due to the inter-device summation involved, which can lead to numerical instability or significant error accumulation. In this work, we present a native dynamic block-wise efficient quantized AllReduce within the XLA compiler for TPUs (EQuARX). By using TPU-friendly quantization and deep pipelining of communication and compute, EQuARX with int8 precision achieves a 1.8X speedup over baseline BF16 AllReduce across various network topologies. Furthermore, EQuARX accelerates the prefill stage of Gemma 3 27B by 1.25X and Gemma 3 12B by 1.1X, respectively, with small to negligible impact on quality.
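To make the idea of dynamic block-wise quantization around a summation concrete, here is a minimal pure-Python sketch. It is not the EQuARX implementation: the function names, the block size, the symmetric per-block scale (`max_abs / 127`), and the choice to dequantize before accumulating are all assumptions made for illustration, and nothing here models XLA, TPUs, or pipelining.

```python
def quantize_blocks(values, block_size=4):
    """Split a vector into blocks; return one (scale, int8_codes) pair per block.

    Hypothetical scheme: symmetric quantization with a per-block scale chosen
    from the block's largest magnitude, so each block adapts to its own range.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        codes = [max(-127, min(127, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blocks(blocks):
    """Map int8 codes back to floats using each block's scale."""
    return [scale * code for scale, codes in blocks for code in codes]


def quantized_allreduce(per_device_values, block_size=4):
    """Simulate a sum-AllReduce where each contribution is sent as int8 blocks.

    Accumulation happens in float after dequantization, which is one simple way
    to avoid overflow and limit error build-up in the inter-device sum.
    """
    total = [0.0] * len(per_device_values[0])
    for values in per_device_values:
        received = dequantize_blocks(quantize_blocks(values, block_size))
        total = [t + r for t, r in zip(total, received)]
    return total


# Two simulated devices; the second half of each vector has a much larger range,
# which is where per-block scales help compared with one global scale.
devices = [
    [0.10, -0.20, 0.30, 0.40, 5.0, -6.0, 7.0, 8.0],
    [0.05, 0.15, -0.25, 0.35, 1.0, 2.0, -3.0, 4.0],
]
exact = [a + b for a, b in zip(*devices)]
approx = quantized_allreduce(devices)
max_error = max(abs(e - a) for e, a in zip(exact, approx))
```

In this sketch each element's rounding error is at most half of its block's scale, so the error of the summed result is bounded by the sum of those half-scales across devices.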
Article 961
Title@2025-06-21 (6): Open-world machine learning: A review and new outlooks
Title: Open-world machine learning: A review and new outlooks | Open-World Machine Learning: Ein Rückblick und neue Perspektiven | 开放世界机器学习:回顾和新展望 2403.01759v4 |
Authors (7): Fei Zhu, Shijie Ma, Zhen Cheng, Xu-Yao Zhang, Zhaoxiang Zhang, Dacheng Tao, Cheng-Lin Liu
Machine learning has achieved remarkable success in many applications. However, existing studies are largely based on the closed-world assumption, which assumes that the environment is stationary, and the model is fixed once deployed. In many real-world applications, this fundamental and rather naive assumption may not hold because an open environment is complex, dynamic, and full of unknowns. In such cases, rejecting unknowns, discovering novelties, and then continually learning them, could enable models to be safe and evolve continually as biological systems do. This article presents a holistic view of open-world machine learning by investigating unknown rejection, novelty discovery, and continual learning in a unified paradigm. The challenges, principles, and limitations of current methodologies are discussed in detail. Furthermore, widely used benchmarks, metrics, and performances are summarized. Finally, we discuss several potential directions for further progress in the field. By providing a comprehensive introduction to the emerging open-world machine learning paradigm, this article aims to help researchers build more powerful AI systems in their respective fields, and to promote the development of artificial general intelligence.
Article 962
Title@2025-06-21 (6): TyphoFormer: Language-Augmented Transformer for Accurate Typhoon Track Forecasting
Title: TyphoFormer: Language-Augmented Transformer for Accurate Typhoon Track Forecasting | TyphoFormer: Sprachgesteigerter Transformer für präzise Typhoon-Track-Prognose | 台风前台风:用于准确预报台风轨道的语文增强变换器 2506.17609v1 |
Authors (6): Lincan Li, Eren Erman Ozguven, Yue Zhao, Guang Wang, Yiqun Xie, Yushun Dong
Accurate typhoon track forecasting is crucial for early system warning and disaster response. While Transformer-based models have demonstrated strong performance in modeling the temporal dynamics of dense trajectories of humans and vehicles in smart cities, they usually lack access to broader contextual knowledge that enhances the forecasting reliability of sparse meteorological trajectories, such as typhoon tracks. To address this challenge, we propose TyphoFormer, a novel framework that incorporates natural language descriptions as auxiliary prompts to improve typhoon trajectory forecasting. For each time step, we use a Large Language Model (LLM) to generate concise textual descriptions based on the numerical attributes recorded in the North Atlantic hurricane database. The language descriptions capture high-level meteorological semantics and are embedded as auxiliary special tokens prepended to the numerical time series input. By integrating both textual and sequential information within a unified Transformer encoder, TyphoFormer enables the model to leverage contextual cues that are otherwise inaccessible through numerical features alone. Extensive experiments on the HURDAT2 benchmark show that TyphoFormer consistently outperforms other state-of-the-art baseline methods, particularly under challenging scenarios involving nonlinear path shifts and limited historical observations.
Article 963
Title@2025-06-21 (6): Towards Fundamental Limits for Active Multi-distribution Learning
Title: Towards Fundamental Limits for Active Multi-distribution Learning | Auf dem Weg zu grundlegenden Grenzen für aktives Multidistributionslernen | 走向积极的多分发学习基本限制 2506.17607v1 |
Authors (2): Chicheng Zhang, Yihan Zhou
Multi-distribution learning extends agnostic Probably Approximately Correct (PAC) learning to the setting in which a family of $k$ distributions, $\{D_i\}_{i\in[k]}$, is considered and a classifier’s performance is measured by its error under the worst distribution. This problem has attracted a lot of recent interest due to its applications in collaborative learning, fairness, and robustness. Despite a rather complete picture of the sample complexity of passive multi-distribution learning, research on active multi-distribution learning remains scarce, with algorithms whose optimality remains unknown. In this paper, we develop new algorithms for active multi-distribution learning and establish improved label complexity upper and lower bounds, in distribution-dependent and distribution-free settings. Specifically, in the near-realizable setting we prove upper bounds of $\widetilde{O}\Bigl(\theta_{\max}(d+k)\ln\frac{1}{\varepsilon}\Bigr)$ and $\widetilde{O}\Bigl(\theta_{\max}(d+k)\Bigl(\ln\frac{1}{\varepsilon}+\frac{\nu^2}{\varepsilon^2}\Bigr)+\frac{k\nu}{\varepsilon^2}\Bigr)$ in the realizable and agnostic settings respectively, where $\theta_{\max}$ is the maximum disagreement coefficient among the $k$ distributions, $d$ is the VC dimension of the hypothesis class, $\nu$ is the multi-distribution error of the best hypothesis, and $\varepsilon$ is the target excess error. Moreover, we show that the bound in the realizable setting is information-theoretically optimal and that the $k\nu/\varepsilon^2$ term in the agnostic setting is fundamental for proper learners. We also establish an instance-dependent sample complexity bound for passive multi-distribution learning that smoothly interpolates between the realizable and agnostic regimes \citep{blum2017collaborative,zhang2024optimal}, which may be of independent interest.
Article 964
Title@2025-06-21 (6): Unlearning Isn’t Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Title: Unlearning Isn’t Invisible: Detecting Unlearning Traces in LLMs from Model Outputs | Unlearning ist nicht unsichtbar: Unlearning Traces in LLMs von Model Outputs erkennen | 从模型产出中检测出LLMM中未学习的踪迹 2506.14003v2 |
Authors (5): Yiwei Chen, Soumyadeep Pal, Yimeng Zhang, Qing Qu, Sijia Liu
Machine unlearning (MU) for large language models (LLMs), commonly referred to as LLM unlearning, seeks to remove specific undesirable data or knowledge from a trained model, while maintaining its performance on standard tasks. While unlearning plays a vital role in protecting data privacy, enforcing copyright, and mitigating sociotechnical harms in LLMs, we identify a new vulnerability post-unlearning: unlearning trace detection. We discover that unlearning leaves behind persistent “fingerprints” in LLMs, detectable traces in both model behavior and internal representations. These traces can be identified from output responses, even when the model is prompted with forget-irrelevant inputs. Specifically, a simple supervised classifier can reliably determine whether a model has undergone unlearning based solely on its textual outputs. Further analysis shows that these traces are embedded in intermediate activations and propagate nonlinearly to the final layer, forming low-dimensional, learnable manifolds in activation space. Through extensive experiments, we show that forget-relevant prompts enable over 90% accuracy in detecting unlearning traces across all model sizes. Even with forget-irrelevant inputs, large LLMs maintain high detectability, demonstrating the broad applicability of unlearning trace detection. These findings reveal that unlearning leaves measurable signatures, introducing a new risk of reverse-engineering forgotten information when a model is identified as unlearned given an input query. Code is available at https://github.com/OPTML-Group/Unlearn-Trace.
Article 965
Title@2025-06-21 (6): Steering LLMs for Formal Theorem Proving
Title: Steering LLMs for Formal Theorem Proving | Lenkung LLMs für formale Theorem Proving | 正式理论证明指导LLMs 2502.15507v4 |
Authors (2): Shashank Kirtania, Arun Iyer
Large Language Models (LLMs) have shown promise in proving formal theorems using proof assistants like Lean. However, current state-of-the-art language models struggle to predict the next step in proofs, leading practitioners to use different sampling techniques to improve LLMs’ capabilities. We observe that the LLM is capable of predicting the correct tactic; however, it faces challenges in ranking it appropriately within the set of candidate tactics, affecting the overall selection process. To overcome this hurdle, we use activation steering to guide LLMs’ responses and improve the generations at inference time. Our results suggest that activation steering offers a promising lightweight alternative to specialized fine-tuning for enhancing theorem proving capabilities in LLMs, particularly valuable in resource-constrained environments.
Article 966
Title@2025-06-21 (6): Risk Bounds For Distributional Regression
Title: Risk Bounds For Distributional Regression | Risikogrenzen für distributive Regression | 分布性倒退的风险临界值 2505.09075v2 |
Authors (3): Carlos Misael Madrid Padilla, Oscar Hernan Madrid Padilla, Sabyasachi Chatterjee
This work examines risk bounds for nonparametric distributional regression estimators. For convex-constrained distributional regression, general upper bounds are established for the continuous ranked probability score (CRPS) and the worst-case mean squared error (MSE) across the domain. These theoretical results are applied to isotonic and trend filtering distributional regression, yielding convergence rates consistent with those for mean estimation. Furthermore, a general upper bound is derived for distributional regression under non-convex constraints, with a specific application to neural network-based estimators. Comprehensive experiments on both simulated and real data validate the theoretical contributions, demonstrating their practical effectiveness.
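For readers unfamiliar with the continuous ranked probability score, the standard definition for a predictive CDF $F$ and an observation $y$ is $\mathrm{CRPS}(F,y)=\int (F(t)-\mathbf{1}\{y\le t\})^2\,dt$, which equals $\mathbb{E}|X-y|-\tfrac12\mathbb{E}|X-X'|$ for independent $X, X'\sim F$. The sketch below evaluates that second form for an empirical distribution given by samples. It only illustrates the metric; the function name is hypothetical and the code is unrelated to the estimators analyzed in the paper.

```python
def crps_empirical(samples, y):
    """CRPS of the empirical distribution of `samples` against observation `y`.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, with both expectations
    taken under the empirical distribution (all pairs, including i == j).
    """
    m = len(samples)
    mean_abs_error = sum(abs(x - y) for x in samples) / m
    mean_pairwise = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return mean_abs_error - 0.5 * mean_pairwise


# A forecast concentrated exactly on the observation scores zero; spreading
# mass away from the observation increases the score.
perfect = crps_empirical([2.0], 2.0)
point_off_by_one = crps_empirical([0.0], 1.0)
two_point = crps_empirical([0.0, 2.0], 1.0)
```

Lower values are better, and for a single-sample forecast the score reduces to the absolute error.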
Article 967
Title@2025-06-21 (6): HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models
Title: HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models | HalluRNN: Halluzinationen durch Recurrent Cross-Layer-Reasoning in großen Vision-Sprachenmodellen abmildern | HalluRNN:通过在大型视觉语言模型中反复出现的跨代理由减少幻觉 2506.17587v1 |
Authors (5): Le Yu, Kaishen Wang, Jianlong Xiong, Yue Cao, Tao He
Though Large Vision-Language Models (LVLMs) have achieved remarkable performance across various tasks, they are still prone to hallucinations: generating outputs that are textually plausible but visually ungrounded. While prior approaches generally address this issue through data-centric fine-tuning or innovative decoding strategies, these methods often require substantial resources or task-specific configurations. In this work, we introduce an architecture-level solution, HalluRNN, which enhances model stability through recurrent cross-layer reasoning. Specifically, we propose a novel Dual-Gated Depth Propagation Unit (DG-DPU) module, which is shared across layers and recurrently refines hidden states. This allows for the adaptive propagation of information throughout the model, enforces consistency across layers, and mitigates hallucinations caused by representational drift. By fine-tuning only the DG-DPU module, HalluRNN achieves strong and robust performance across multiple benchmarks.
Article 968
Title@2025-06-21 (6): Novel Multicolumn Kernel Extreme Learning Machine for Food Detection via Optimal Features from CNN
Title: Novel Multicolumn Kernel Extreme Learning Machine for Food Detection via Optimal Features from CNN | Neuartige Multikolumn-Kernel Extreme Lernmaschine für Lebensmittel-Erkennung durch optimale Funktionen von CNN | 利用有线电视新闻网最佳地物检测食品的极端学习机器 2205.07348v2 |
Authors (2): Ghalib Ahmed Tahir, Chu Kiong Loo
Automatic food detection is an emerging topic of interest due to its wide array of applications, ranging from detecting food images on social media platforms to filtering non-food photos from users in dietary assessment apps. Recently, during the COVID-19 pandemic, it has facilitated enforcing an eating ban by automatically detecting eating activities from cameras in public places. Therefore, to tackle the challenge of recognizing food images with high accuracy, we propose a hybrid framework for extracting and selecting optimal features from an efficient neural network. A nonlinear classifier is then employed to discriminate between linearly inseparable feature vectors with great precision. In line with this idea, our method extracts features from MobileNetV3, selects an optimal subset of attributes using Shapley Additive exPlanations (SHAP) values, and exploits a kernel extreme learning machine (KELM) due to its nonlinear decision boundary and good generalization ability. However, KELM suffers from the ‘curse of dimensionality’ problem for large datasets due to the complex computation of the kernel matrix with large numbers of hidden nodes. We solve this problem by proposing a novel multicolumn kernel extreme learning machine (MCKELM), which uses the k-d tree algorithm to divide data into N subsets and trains a separate KELM on each subset of data. The method then incorporates the KELM classifiers into parallel structures and, during testing, uses k-d tree search to select the top k nearest subsets for classifying the input instead of the whole network. To evaluate the proposed framework, a large food/non-food dataset is prepared using nine publicly available datasets. Experimental results show the superiority of our method on an integrated set of measures while solving the ‘curse of dimensionality’ problem in KELM for large datasets.
Article 969
Title@2025-06-21 (6): Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Title: Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models | Cite Pretrain: Retrieval-freie Wissenszuweisung für große Sprachmodelle | Cite Prettrain: 大语言模型的检索-无知识归属 2506.17585v1 |
Authors (5): Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra
Trustworthy language models should provide both correct and verifiable answers. While language models can sometimes attribute their outputs to pretraining data, their citations are often unreliable due to hallucination. As a result, current systems insert citations by querying an external retriever at inference time, introducing latency, infrastructure dependence, and vulnerability to retrieval noise. We explore whether LLMs can be made to reliably attribute to the documents seen during (continual) pretraining, without test-time retrieval, by revising the training process. To evaluate this, we release CitePretrainBench, a benchmark that mixes real-world corpora (Wikipedia, Common Crawl, arXiv) with novel, unseen documents and probes both short-form (single fact) and long-form (multi-fact) citation tasks. Our approach follows a two-stage process: (1) continual pretraining to bind facts to persistent document identifiers, and (2) instruction tuning to elicit citation behavior. We find that simple Passive Indexing, which appends an identifier to each document, helps memorize verbatim text but fails on paraphrased or compositional facts. Instead, we propose Active Indexing, which continually pretrains on synthetic QA pairs that (1) restate each fact in diverse compositional forms, and (2) require bidirectional source-to-fact and fact-to-source generation, jointly teaching the model to generate content from a cited source and to attribute its own answers. Experiments with Qwen2.5-7B and 3B show that Active Indexing consistently outperforms Passive Indexing across all tasks and models, with citation precision gains up to 30.2 percent. Our ablation studies reveal that performance continues to improve as we scale the amount of augmented data, showing a clear upward trend even at 16 times the original token count.
Article 970
Title@2025-06-21 (6): LFR-PINO: A Layered Fourier Reduced Physics-Informed Neural Operator for Parametric PDEs
Title: LFR-PINO: A Layered Fourier Reduced Physics-Informed Neural Operator for Parametric PDEs | LFR-PINO: Ein geschichteter Fourier reduzierter physikinformierter Neuraloperator für parametrische PDEs | LFR-PINO: 用于参数PDE的多层四层减少四层物理学 2506.17582v1 |
Authors (7): Jing Wang, Biao Chen, Hairun Xie, Rui Wang, Yifan Xia, Jifa Zhang, Hui Xu
Physics-informed neural operators have emerged as a powerful paradigm for solving parametric partial differential equations (PDEs), particularly in the aerospace field, enabling the learning of solution operators that generalize across parameter spaces. However, existing methods either suffer from limited expressiveness due to fixed basis/coefficient designs, or face computational challenges due to the high dimensionality of the parameter-to-weight mapping space. We present LFR-PINO, a novel physics-informed neural operator that introduces two key innovations: (1) a layered hypernetwork architecture that enables specialized parameter generation for each network layer, and (2) a frequency-domain reduction strategy that significantly reduces parameter count while preserving essential spectral features. This design enables efficient learning of a universal PDE solver through pre-training, capable of directly handling new equations while allowing optional fine-tuning for enhanced precision. The effectiveness of this approach is demonstrated through comprehensive experiments on four representative PDE problems, where LFR-PINO achieves 22.8%-68.7% error reduction compared to state-of-the-art baselines. Notably, the frequency-domain reduction strategy reduces memory usage by 28.6%-69.3% compared to Hyper-PINNs while maintaining solution accuracy, striking an optimal balance between computational efficiency and solution fidelity.
Article 971
Title@2025-06-21 (6): Optimizing Mastery Learning by Fast-Forwarding Over-Practice Steps
Title: Optimizing Mastery Learning by Fast-Forwarding Over-Practice Steps | Mastery-Lernen optimieren, indem überpraktizierende Schritte schnell vorangebracht werden | 通过快速推进超实践步骤优化硕士学习 2506.17577v1 |
Authors (4): Meng Xia, Robin Schmucker, Conrad Borchers, Vincent Aleven
Mastery learning improves learning proficiency and efficiency. However, the overpractice of skills (students spending time on skills they have already mastered) remains a fundamental challenge for tutoring systems. Previous research has reduced overpractice through the development of better problem selection algorithms and the authoring of focused practice tasks. However, few efforts have concentrated on reducing overpractice through step-level adaptivity, which can avoid resource-intensive curriculum redesign. We propose and evaluate Fast-Forwarding as a technique that enhances existing problem selection algorithms. Based on simulation studies informed by learner models and problem-solving pathways derived from real student data, Fast-Forwarding can reduce overpractice by up to one-third, as it does not require students to complete problem-solving steps if all remaining pathways are fully mastered. Fast-Forwarding is a flexible method that enhances any problem selection algorithm, though its effectiveness is highest for algorithms that preferentially select difficult problems. Therefore, our findings suggest that while Fast-Forwarding may improve student practice efficiency, the size of its practical impact may also depend on students’ ability to stay motivated and engaged at higher levels of difficulty.
Article 972
Title@2025-06-21 (6): Towards Deeper GCNs: Alleviating Over-smoothing via Iterative Training and Fine-tuning
Title: Towards Deeper GCNs: Alleviating Over-smoothing via Iterative Training and Fine-tuning | Auf dem Weg zu tieferen GCNs: Übersäuerung durch iteratives Training und Feinabstimmung mildern | 走向更深的GCNCs:通过迭接培训和微调减少过度移动 2506.17576v1 |
Authors (6): Furong Peng, Jinzhen Gao, Xuan Lu, Kang Liu, Yifan Huo, Sheng Wang
Graph Convolutional Networks (GCNs) suffer from severe performance degradation in deep architectures due to over-smoothing. While existing studies primarily attribute the over-smoothing to repeated applications of graph Laplacian operators, our empirical analysis reveals a critical yet overlooked factor: trainable linear transformations in GCNs significantly exacerbate feature collapse, even at moderate depths (e.g., 8 layers). In contrast, Simplified Graph Convolution (SGC), which removes these transformations, maintains stable feature diversity up to 32 layers, highlighting linear transformations’ dual role in facilitating expressive power and inducing over-smoothing. However, completely removing linear transformations weakens the model’s expressive capacity. To address this trade-off, we propose Layer-wise Gradual Training (LGT), a novel training strategy that progressively builds deep GCNs while preserving their expressiveness. LGT integrates three complementary components: (1) layer-wise training to stabilize optimization from shallow to deep layers, (2) low-rank adaptation to fine-tune shallow layers and accelerate training, and (3) identity initialization to ensure smooth integration of new layers and accelerate convergence. Extensive experiments on benchmark datasets demonstrate that LGT achieves state-of-the-art performance on vanilla GCN, significantly improving accuracy even in 32-layer settings. Moreover, as a training method, LGT can be seamlessly combined with existing methods such as PairNorm and ContraNorm, further enhancing their performance in deeper networks. LGT offers a general, architecture-agnostic training framework for scalable deep GCNs. The code is available at [https://github.com/jfklasdfj/LGT_GCN].
Article 973
Title@2025-06-21 (6): Predicting Mild Cognitive Impairment Using Naturalistic Driving and Trip Destination Modeling
Title: Predicting Mild Cognitive Impairment Using Naturalistic Driving and Trip Destination Modeling | Voraussagen einer milden kognitiven Schädigung mittels naturalistischer Fahr- und Reisezielmodellierung | 利用自然驱动和出港目的地模型模型预测低度认知缺陷 2504.09027v2 |
Authors (7): Souradeep Chattopadhyay, Guillermo Basulto-Elias, Jun Ha Chang, Matthew Rizzo, Shauna Hallmark, Anuj Sharma, Soumik Sarkar
Understanding the relationship between mild cognitive impairment (MCI) and driving behavior is essential for enhancing road safety, particularly among older adults. This study introduces a novel approach by incorporating specific trip destinations (such as home, work, medical appointments, social activities, and errands) using geohashing to analyze the driving habits of older drivers in Nebraska. We employed a two-fold methodology that combines data visualization with advanced machine learning models, including C5.0, Random Forest, and Support Vector Machines, to assess the effectiveness of these location-based variables in predicting cognitive impairment. Notably, the C5.0 model showed a robust and stable performance, achieving a median recall of 0.68, which indicates that our methodology accurately identifies cognitive impairment in drivers 68% of the time. This emphasizes our model’s capacity to reduce false negatives, a crucial factor given the profound implications of failing to identify impaired drivers. Our findings underscore the innovative use of life-space variables in understanding and predicting cognitive decline, offering avenues for early intervention and tailored support for affected individuals.
Article 974
Title@2025-06-21 (6): Accelerating Residual Reinforcement Learning with Uncertainty Estimation
Title: Accelerating Residual Reinforcement Learning with Uncertainty Estimation | Beschleunigung des residualen Verstärkungslernens mit Unsicherheitsabschätzung | 以不确定的估算值加速剩余强化学习 2506.17564v1 |
Authors (7): Lakshita Dodeja, Karl Schmeckpeper, Shivam Vats, Thomas Weng, Mingxi Jia, George Konidaris, Stefanie Tellex
Residual Reinforcement Learning (RL) is a popular approach for adapting pretrained policies by learning a lightweight residual policy that provides corrective actions. While Residual RL is more sample-efficient than finetuning the entire base policy, existing methods struggle with sparse rewards and are designed for deterministic base policies. We propose two improvements to Residual RL that further enhance its sample efficiency and make it suitable for stochastic base policies. First, we leverage uncertainty estimates of the base policy to focus exploration on regions in which the base policy is not confident. Second, we propose a simple modification to off-policy residual learning that allows it to observe base actions and better handle stochastic base policies. We evaluate our method with both Gaussian-based and Diffusion-based stochastic base policies on tasks from Robosuite and D4RL, and compare against state-of-the-art finetuning methods, demo-augmented RL methods, and other residual RL methods. Our algorithm significantly outperforms existing baselines in a variety of simulation benchmark environments. We also deploy our learned policies in the real world to demonstrate their robustness with zero-shot sim-to-real transfer.
Article 975
Title@2025-06-21 (6): Stochastic Gradient Descent for Nonparametric Regression
Title: Stochastic Gradient Descent for Nonparametric Regression | Stochastischer Gradient Abstieg für nichtparametrische Regression | 用于非参数回退的斯托克渐变底层 2401.00691v4 |
Authors (2): Xin Chen, Jason M. Klusowski
This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient descent, applied to the coefficients of a truncated basis expansion of the component functions. We show that the resulting estimator satisfies an oracle inequality that allows for model mis-specification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we demonstrate that its risk is minimax optimal in terms of the dependence on the dimensionality of the data and the size of the training sample. We also provide polynomial convergence rates even when the covariates do not have full support on their domain.
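The core mechanism described above, stochastic gradient descent applied to the coefficients of a truncated basis expansion of each additive component, can be sketched in a few lines. This is an illustration under loud assumptions, not the paper's algorithm: the cosine basis, the fixed truncation level `num_terms`, the constant learning rate (the paper uses a three-stage schedule), the synthetic data, and every function name are hypothetical choices.

```python
import math
import random


def basis(x, num_terms):
    """Cosine basis on [0, 1]: sqrt(2) * cos(pi * k * x) for k = 1..num_terms."""
    return [math.sqrt(2.0) * math.cos(math.pi * k * x) for k in range(1, num_terms + 1)]


def predict(coefs, intercept, x):
    """Additive model: intercept + sum_j sum_k coefs[j][k] * basis_k(x[j])."""
    return intercept + sum(
        c * b
        for coef_j, x_j in zip(coefs, x)
        for c, b in zip(coef_j, basis(x_j, len(coef_j)))
    )


def sgd_fit(stream, dim, num_terms=4, lr=0.05):
    """One pass of SGD on squared loss; memory use is dim * num_terms + 1 floats."""
    coefs = [[0.0] * num_terms for _ in range(dim)]
    intercept = 0.0
    for x, y in stream:
        residual = y - predict(coefs, intercept, x)
        intercept += lr * residual
        for j in range(dim):
            feats = basis(x[j], num_terms)
            for k in range(num_terms):
                coefs[j][k] += lr * residual * feats[k]
    return coefs, intercept


def make_data(n, seed):
    """Synthetic additive target with Gaussian noise (assumed for the demo)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = math.cos(math.pi * x[0]) + 0.5 * math.cos(2 * math.pi * x[1]) + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


train = make_data(3000, seed=0)
test = make_data(500, seed=1)
coefs, intercept = sgd_fit(train, dim=2)
mse = sum((y - predict(coefs, intercept, x)) ** 2 for x, y in test) / len(test)
baseline_mse = sum(y ** 2 for _, y in test) / len(test)  # error of predicting 0
```

Each observation is touched once and then discarded, which is the source of the favorable memory and compute profile the abstract refers to.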
nan
Article 976
Title@2025-06-21 (6): SynDaCaTE: A Synthetic Dataset For Evaluating Part-Whole Hierarchical Inference
Title: SynDaCaTE: A Synthetic Dataset For Evaluating Part-Whole Hierarchical Inference | SynDaCaTE: Ein synthetischer Datensatz zur Bewertung der hierarchischen Inferenz | SynDaCaTE:用于评价整个整体等级推理部分的合成数据集 2506.17558v1 |
Authors (2): Jake Levi, Mark van der Wilk
Learning to infer object representations, and in particular part-whole hierarchies, has been the focus of extensive research in computer vision, in pursuit of improving data efficiency, systematic generalisation, and robustness. Models which are designed to infer part-whole hierarchies, often referred to as capsule networks, are typically trained end-to-end on supervised tasks such as object classification, in which case it is difficult to evaluate whether such a model actually learns to infer part-whole hierarchies, as claimed. To address this difficulty, we present a SYNthetic DAtaset for CApsule Testing and Evaluation, abbreviated as SynDaCaTE, and establish its utility by (1) demonstrating the precise bottleneck in a prominent existing capsule model, and (2) demonstrating that permutation-equivariant self-attention is highly effective for parts-to-wholes inference, which motivates future directions for designing effective inductive biases for computer vision.
nan
Article 977
Title@2025-06-21 (6): Multi-agent Markov Entanglement
Title: Multi-agent Markov Entanglement | Multi-Agent Markov Verschränkung | 多剂 Markov 缠绕 2506.02385v2 |
Authors (2): Shuze Chen, Tianyi Peng
Value decomposition has long been a fundamental technique in multi-agent dynamic programming and reinforcement learning (RL). Specifically, the value function of a global state $(s_1,s_2,\ldots,s_N)$ is often approximated as the sum of local functions: $V(s_1,s_2,\ldots,s_N)\approx\sum_{i=1}^N V_i(s_i)$. This approach traces back to the index policy in restless multi-armed bandit problems and has found various applications in modern RL systems. However, the theoretical justification for why this decomposition works so effectively remains underexplored. In this paper, we uncover the underlying mathematical structure that enables value decomposition. We demonstrate that a multi-agent Markov decision process (MDP) permits value decomposition if and only if its transition matrix is not “entangled” – a concept analogous to quantum entanglement in quantum physics. Drawing inspiration from how physicists measure quantum entanglement, we introduce how to measure the “Markov entanglement” for multi-agent MDPs and show that this measure can be used to bound the decomposition error in general multi-agent MDPs. Using the concept of Markov entanglement, we prove that a widely-used class of index policies is weakly entangled and enjoys a sublinear $\mathcal O(\sqrt{N})$ scale of decomposition error for $N$-agent systems. Finally, we show how Markov entanglement can be efficiently estimated in practice, providing practitioners with an empirical proxy for the quality of value decomposition.
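As a small illustration of the decomposition $V(s_1,s_2)=V_1(s_1)+V_2(s_2)$, the sketch below checks it numerically in the simplest setting where it is exact: two fully independent agents with additive rewards, so the joint transition matrix is a product of the local ones. The chains, rewards, and discount factor are made-up values, and the sketch does not implement the paper's entanglement measure.

```python
def evaluate(P, r, gamma, sweeps=2000):
    """Iterative policy evaluation: V <- r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
    return V

gamma = 0.9
# Two hypothetical agents, each a 2-state Markov chain under a fixed policy.
P1 = [[0.8, 0.2], [0.3, 0.7]]
P2 = [[0.5, 0.5], [0.1, 0.9]]
r1 = [1.0, 0.0]
r2 = [0.0, 2.0]

# Joint chain over (s1, s2) with independent transitions and additive rewards.
states = [(a, b) for a in range(2) for b in range(2)]
P_joint = [[P1[a][c] * P2[b][d] for (c, d) in states] for (a, b) in states]
r_joint = [r1[a] + r2[b] for (a, b) in states]

V1 = evaluate(P1, r1, gamma)
V2 = evaluate(P2, r2, gamma)
V_joint = evaluate(P_joint, r_joint, gamma)

# Largest difference between the joint value and the sum of local values.
max_gap = max(abs(V_joint[i] - (V1[a] + V2[b])) for i, (a, b) in enumerate(states))
print(max_gap)
```

Once the agents' transitions are coupled, the gap is generally nonzero, which is the error that the abstract proposes to bound.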
nan
Article 978
Title@2025-06-21 (6): Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method
Title: Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method | Schnellere Low-Rank-Annäherung und Kernel Ridge-Regression über die Block-Nyström-Methode | 通过块-Nyström方法更快地低兰克相近和内核脊回归 2506.17556v1 |
Authors (2): Sachin Garg, Michał Dereziński
The Nyström method is a popular low-rank approximation technique for large matrices that arise in kernel methods and convex optimization. Yet, when the data exhibits heavy-tailed spectral decay, the effective dimension of the problem often becomes so large that even the Nyström method may be outside of our computational budget. To address this, we propose Block-Nyström, an algorithm that injects a block-diagonal structure into the Nyström method, thereby significantly reducing its computational cost while recovering strong approximation guarantees. We show that Block-Nyström can be used to construct improved preconditioners for second-order optimization, as well as to efficiently solve kernel ridge regression for statistical learning over Hilbert spaces. Our key technical insight is that, within the same computational budget, combining several smaller Nyström approximations leads to stronger tail estimates of the input spectrum than using one larger approximation. Along the way, we provide a novel recursive preconditioning scheme for efficiently inverting the Block-Nyström matrix, and provide new statistical learning bounds for a broad class of approximate kernel ridge regression solvers.
nan
Article 979
Title@2025-06-21 (6): Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach
Title: Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach | Balance zwischen Interferenz und Korrelation in räumlichen Experimentaldesigns: Ein ursächlicher Graphenschnitt-Ansatz | 空间实验设计中平衡干扰和关联:因果图表切割法 2505.20130v2 |
Authors (6): Jin Zhu, Jingyi Li, Hongyi Zhou, Yinan Lin, Zhenhua Lin, Chengchun Shi
This paper focuses on the design of spatial experiments to optimize the amount of information derived from the experimental data and enhance the accuracy of the resulting causal effect estimator. We propose a surrogate function for the mean squared error (MSE) of the estimator, which facilitates the use of classical graph cut algorithms to learn the optimal design. Our proposal offers three key advances: (1) it accommodates moderate to large spatial interference effects; (2) it adapts to different spatial covariance functions; (3) it is computationally efficient. Theoretical results and numerical experiments based on synthetic environments and a dispatch simulator that models a city-scale ridesharing market further validate the effectiveness of our design. A Python implementation of our method is available at https://github.com/Mamba413/CausalGraphCut.
nan
Article 980
Title@2025-06-21 (6): DRIMV_TSK: An Interpretable Surgical Evaluation Model for Incomplete Multi-View Rectal Cancer Data
Title: DRIMV_TSK: An Interpretable Surgical Evaluation Model for Incomplete Multi-View Rectal Cancer Data | DRIMV_TSK: Ein Interpretations-Surgical-Bewertungsmodell für unvollständige Rectal-Krebsdaten | DRIMV_TSK:不完全的多视力直肠癌数据可解释的外科评估模型 2506.17552v1 |
Authors (11): Wei Zhang, Zi Wang, Hanwen Zhou, Zhaohong Deng, Weiping Ding, Yuxi Ge, Te Zhang, Yuanpeng Zhang, Kup-Sze Choi, Shitong Wang, Shudong Hu
A reliable evaluation of surgical difficulty can improve the success of the treatment for rectal cancer, and the current evaluation method is based on clinical data. However, more data about rectal cancer can be collected with the development of technology. Meanwhile, with the development of artificial intelligence, its application in rectal cancer treatment is becoming possible. In this paper, a multi-view rectal cancer dataset is first constructed to give a more comprehensive view of patients, including the high-resolution MRI image view, pressed-fat MRI image view, and clinical data view. Then, an interpretable incomplete multi-view surgical evaluation model is proposed, considering that it is hard to obtain extensive and complete patient data in real application scenarios. Specifically, a dual representation incomplete multi-view learning model is first proposed to extract the common information between views and specific information in each view. In this model, the missing view imputation is integrated into representation learning, and a second-order similarity constraint is also introduced to improve the cooperative learning between these two parts. Then, based on the imputed multi-view data and the learned dual representation, a multi-view surgical evaluation model with the TSK fuzzy system is proposed. In the proposed model, a cooperative learning mechanism is constructed to explore the consistent information between views, and Shannon entropy is also introduced to adapt the view weight. On the MVRC dataset, we compare DRIMV_TSK with several advanced algorithms, and it obtains the best results.
nan
Article 981
Title@2025-06-21 (6): Wireless-Friendly Window Position Optimization for RIS-Aided Outdoor-to-Indoor Networks based on Multi-Modal Large Language Model
Title: Wireless-Friendly Window Position Optimization for RIS-Aided Outdoor-to-Indoor Networks based on Multi-Modal Large Language Model | Wireless-Friendly Window Position Optimization für RIS-Aided Outdoor-to-Indoor-Netzwerke basierend auf Multi-Modal Large Language Model | 以多模式大语言模式为基础,优化以无线友好型友好型网络为主的外门对门至门网络的RIS辅助最佳窗口位置 2410.20691v2 |
Authors (9): Jinbo Hou, Kehai Qiu, Zitian Zhang, Yong Yu, Kezhi Wang, Stefano Capolongo, Jiliang Zhang, Zeyang Li, Jie Zhang
This paper aims to simultaneously optimize indoor wireless and daylight performance by adjusting the positions of windows and the beam directions of window-deployed reconfigurable intelligent surfaces (RISs) for RIS-aided outdoor-to-indoor (O2I) networks utilizing large language models (LLMs) as optimizers. Firstly, we illustrate the wireless and daylight system models of RIS-aided O2I networks and formulate a joint optimization problem to enhance both wireless traffic sum rate and daylight illumination performance. Then, we present a multi-modal LLM-based window optimization (LMWO) framework, accompanied by a prompt construction template to optimize the overall performance in a zero-shot fashion, functioning as both an architect and a wireless network planner. Finally, we analyze the optimization performance of the LMWO framework and the impact of the number of windows, room size, number of RIS units, and daylight factor. Numerical results demonstrate that our proposed LMWO framework can achieve outstanding optimization performance in terms of initial performance, convergence speed, final outcomes, and time complexity, compared with classic optimization methods. The building’s wireless performance can be significantly enhanced while ensuring indoor daylight performance.
nan
Article 982
Title@2025-06-21 (6): Predicting E-commerce Purchase Behavior using a DQN-Inspired Deep Learning Model for enhanced adaptability
Title: Predicting E-commerce Purchase Behavior using a DQN-Inspired Deep Learning Model for enhanced adaptability | E-Commerce-Prognose Kaufverhalten mit einem DQN-inspirierten Deep Learning-Modell für verbesserte Anpassungsfähigkeit | 利用DQN启发的加强适应性的深学习模式预测电子商务采购行为 2506.17543v1 |
Authors (1): Aditi Madhusudan Jain
This paper presents a novel approach to predicting buying intent and product demand in e-commerce settings, leveraging a Deep Q-Network (DQN) inspired architecture. In the rapidly evolving landscape of online retail, accurate prediction of user behavior is crucial for optimizing inventory management, personalizing user experiences, and maximizing sales. Our method adapts concepts from reinforcement learning to a supervised learning context, combining the sequential modeling capabilities of Long Short-Term Memory (LSTM) networks with the strategic decision-making aspects of DQNs. We evaluate our model on a large-scale e-commerce dataset comprising over 885,000 user sessions, each characterized by 1,114 features. Our approach demonstrates robust performance in handling the inherent class imbalance typical in e-commerce data, where purchase events are significantly less frequent than non-purchase events. Through comprehensive experimentation with various classification thresholds, we show that our model achieves a balance between precision and recall, with an overall accuracy of 88% and an AUC-ROC score of 0.88. Comparative analysis reveals that our DQN-inspired model offers advantages over traditional machine learning and standard deep learning approaches, particularly in its ability to capture complex temporal patterns in user behavior. The model’s performance and scalability make it well-suited for real-world e-commerce applications dealing with high-dimensional, sequential data. This research contributes to the field of e-commerce analytics by introducing a novel predictive modeling technique that combines the strengths of deep learning and reinforcement learning paradigms. Our findings have significant implications for improving demand forecasting, personalizing user experiences, and optimizing marketing strategies in online retail environments.
nan
Article 983
Title@2025-06-21 (6): EditLord: Learning Code Transformation Rules for Code Editing
Title: EditLord: Learning Code Transformation Rules for Code Editing | EditLord: Regeln zur Code-Transformation für die Code-Editing | 编辑主: 学习代码编辑的代码转换规则 2504.15284v3 |
Authors (6): Weichen Li, Albert Jan, Baishakhi Ray, Junfeng Yang, Chengzhi Mao, Kexin Pei
Code editing is a foundational task in software development, where its effectiveness depends on whether it introduces desired code property changes without changing the original code’s intended functionality. Existing approaches often formulate code editing as an implicit end-to-end task, omitting the fact that code-editing procedures inherently consist of discrete and explicit steps. Thus, they suffer from suboptimal performance and lack of robustness and generalization. We introduce EditLord, a code editing framework that makes the code transformation steps explicit. Our key insight is to employ a language model (LM) as an inductive learner to extract code editing rules from the training code pairs as concise meta-rule sets. Such rule sets will be manifested for each training sample to augment them for finetuning or assist in prompting- and iterative-based code editing. EditLord outperforms the state-of-the-art by an average of 22.7% in editing performance and 58.1% in robustness while achieving 20.2% higher functional correctness across critical software engineering and security applications, LM models, and editing modes.
nan
Article 984
Title@2025-06-21 (6): MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization
Title: MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization | MTSIC: Mehrstufige Transformer-basierte GAN für spektrale Infrarot-Bildfarbgebung | MTIIC: 用于光谱红外红外图像色彩化的多级变形器GAN 2506.17540v1 |
Authors (8): Tingting Liu, Yuan Liu, Jinhui Tang, Liyin Yuan, Chengyu Liu, Chunlai Li, Xiubao Sui, Qian Chen
Thermal infrared (TIR) images, acquired through thermal radiation imaging, are unaffected by variations in lighting conditions and atmospheric haze. However, TIR images inherently lack color and texture information, limiting downstream tasks and potentially causing visual fatigue. Existing colorization methods primarily rely on single-band images with limited spectral information and insufficient feature extraction capabilities, which often result in image distortion and semantic ambiguity. In contrast, multiband infrared imagery provides richer spectral data, facilitating the preservation of finer details and enhancing semantic accuracy. In this paper, we propose a generative adversarial network (GAN)-based framework designed to integrate spectral information to enhance the colorization of infrared images. The framework employs a multi-stage spectral self-attention Transformer network (MTSIC) as the generator. Each spectral feature is treated as a token for self-attention computation, and a multi-head self-attention mechanism forms a spatial-spectral attention residual block (SARB), achieving multi-band feature mapping and reducing semantic confusion. Multiple SARB units are integrated into a Transformer-based single-stage network (STformer), which uses a U-shaped architecture to extract contextual information, combined with multi-scale wavelet blocks (MSWB) to align semantic information in the spatial-frequency dual domain. Multiple STformer modules are cascaded to form MTSIC, progressively optimizing the reconstruction quality. Experimental results demonstrate that the proposed method significantly outperforms traditional techniques and effectively enhances the visual quality of infrared images.
nan
Article 985
Title@2025-06-21 (6): ConsumerBench: Benchmarking Generative AI Applications on End-User Devices
Title: ConsumerBench: Benchmarking Generative AI Applications on End-User Devices | ConsumerBench: Benchmarking Generative KI-Anwendungen auf Endgeräten | 消费者:确定最终用户设备应用基准 2506.17538v1 |
Authors (6): Yile Gu, Rohan Kadekodi, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci
The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a comprehensive benchmarking framework designed to evaluate the system efficiency and response time of GenAI models running on end-user devices. Unlike existing benchmarks that assume exclusive model access on dedicated GPUs, ConsumerBench simulates realistic multi-application scenarios executing concurrently on constrained hardware. Furthermore, ConsumerBench supports customizable workflows that simulate complex tasks requiring coordination among multiple applications. ConsumerBench captures both application-level metrics, including latency and Service Level Objective (SLO) attainment, and system-level metrics like CPU/GPU utilization and memory bandwidth. Through extensive experiments, ConsumerBench reveals inefficiencies in resource sharing, unfair scheduling under greedy allocation, and performance pitfalls of static model server configurations. The paper also provides practical insights for model developers and system designers, highlighting the benefits of custom kernels tailored to consumer-grade GPU architectures and the value of implementing SLO-aware scheduling strategies.
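The abstract names SLO attainment as one of the application-level metrics. A minimal sketch of how such a metric is commonly computed follows; the abstract does not give ConsumerBench's exact definition, so the "fraction of requests within the latency target" formula, the function name, and the latency values are all assumptions.

```python
def slo_attainment(latencies_s, slo_s):
    """Fraction of requests whose latency is within the SLO."""
    if not latencies_s:
        raise ValueError("no requests recorded")
    met = sum(1 for latency in latencies_s if latency <= slo_s)
    return met / len(latencies_s)

# Hypothetical per-request latencies (seconds) for two concurrently running apps.
chatbot = [0.8, 1.1, 0.9, 2.4, 1.0]
image_gen = [4.0, 6.5, 5.2, 9.1]

print(slo_attainment(chatbot, slo_s=1.5))    # 4 of 5 requests meet the target
print(slo_attainment(image_gen, slo_s=6.0))  # 2 of 4 requests meet the target
```

Reporting the metric per application, rather than pooled, is what makes unfair resource sharing between concurrent applications visible.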
nan
Article 986
Title@2025-06-21 (6): Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU
Title: Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU | Demokratie der KI Numerische Wettermodelle: Ein Beispiel für globale Vorhersagen mit FourCastNetv2 Hergestellt von einem Universitätsforschungslabor mit GPU | AI 数字气象模型的民主民主:大学研究实验室利用GPU用四CTNetv2进行的全球预测实例 2504.17028v2 |
Authors (8): Iman Khadir, Shane Stevenson, Henry Li, Kyle Krick, Abram Burrows, David Hall, Stan Posey, Samuel S. P. Shen
This paper demonstrates the feasibility of democratizing AI-driven global weather forecasting models among university research groups by leveraging Graphics Processing Units (GPUs) and freely available AI models, such as NVIDIA’s FourCastNetv2. FourCastNetv2 is NVIDIA’s advanced neural network for weather prediction and is trained on a 73-channel subset of the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 (ERA5) dataset at single levels and different pressure levels. Although the training specifications for FourCastNetv2 are not released to the public, the training documentation of the model’s first generation, FourCastNet, is available to all users. The training used 64 A100 GPUs and took 16 hours to complete. Although NVIDIA’s models offer significant reductions in both time and cost compared to traditional Numerical Weather Prediction (NWP), reproducing published forecasting results presents ongoing challenges for resource-constrained university research groups with limited GPU availability. We demonstrate both (i) leveraging FourCastNetv2 to create predictions through the designated application programming interface (API) and (ii) utilizing NVIDIA hardware to train the original FourCastNet model. Further, this paper demonstrates the capabilities and limitations of NVIDIA A100s for resource-limited research groups in universities. We also explore data management, training efficiency, and model validation, highlighting the advantages and challenges of using limited high-performance computing resources. Consequently, this paper and its corresponding GitHub materials may serve as an initial guide for other university research groups and courses related to machine learning, climate science, and data science to develop research and education programs on AI weather forecasting, and hence help democratize AI NWP in the digital economy.
nan
Article 987
Title@2025-06-21 (6): Infected Smallville: How Disease Threat Shapes Sociality in LLM Agents
Title: Infected Smallville: How Disease Threat Shapes Sociality in LLM Agents | Infizierte Smallville: Wie Krankheitsgefährdung die Gesellschaft in LLM-Agenten prägt | 小镇感染者:LLM代理中疾病威胁形态如何影响社会 2506.13783v2 |
Authors (4): Soyeon Choi, Kangwook Lee, Oliver Sng, Joshua M. Ackerman
How does the threat of infectious disease influence sociality among generative agents? We used generative agent-based modeling (GABM), powered by large language models, to experimentally test hypotheses about the behavioral immune system. Across three simulation runs, generative agents who read news about an infectious disease outbreak showed significantly reduced social engagement compared to agents who received no such news, including lower attendance at a social gathering, fewer visits to third places (e.g., cafe, store, park), and fewer conversations throughout the town. In interview responses, agents explicitly attributed their behavioral changes to disease-avoidance motivations. A validity check further indicated that they could distinguish between infectious and noninfectious diseases, selectively reducing social engagement only when there was a risk of infection. Our findings highlight the potential of GABM as an experimental tool for exploring complex human social dynamics at scale.
nan
Article 988
Title@2025-06-21 (6): Quantum-Enhanced Reinforcement Learning for Power Grid Security Assessment
Title: Quantum-Enhanced Reinforcement Learning for Power Grid Security Assessment | Quantum-Verstärkungs-Lernen für Power Grid Security Assessment | 提高量子强化学习促进电力网安全评估 2504.14412v2 |
Authors (2): Benjamin M. Peter, Mert Korkali
The increasingly challenging task of maintaining power grid security requires innovative solutions. Novel approaches using reinforcement learning (RL) agents have been proposed to help grid operators navigate the massive decision space and nonlinear behavior of these complex networks. However, applying RL to power grid security assessment, specifically for combinatorially troublesome contingency analysis problems, has proven difficult to scale. The integration of quantum computing into these RL frameworks helps scale by improving computational efficiency and boosting agent proficiency by leveraging quantum advantages in action exploration and model-based interdependence. To demonstrate a proof-of-concept use of quantum computing for RL agent training and simulation, we propose a hybrid agent that runs on quantum hardware using IBM’s Qiskit Runtime. We also provide detailed insight into the construction of parameterized quantum circuits (PQCs) for generating relevant quantum output. This agent’s proficiency at maintaining grid stability is demonstrated relative to a benchmark model without quantum enhancement using N-k contingency analysis. Additionally, we offer a comparative assessment of the training procedures for RL models integrated with a quantum backend.
nan
Article 989
Title@2025-06-21 (6): Two Heads are Actually Better than One: Towards Better Adversarial Robustness via Transduction and Rejection
Title: Two Heads are Actually Better than One: Towards Better Adversarial Robustness via Transduction and Rejection | Zwei Köpfe sind eigentlich besser als eins: Auf dem Weg zu besserer adversarialer Robustheit durch Transduktion und Ablehnung | 两个头比一个头实际更好:通过转换和拒绝实现更好的对抗力 2305.17528v2 |
Authors (6): Nils Palumbo, Yang Guo, Xi Wu, Jiefeng Chen, Yingyu Liang, Somesh Jha
Both transduction and rejection have emerged as important techniques for defending against adversarial perturbations. A recent work by Goldwasser et al. showed that rejection combined with transduction can give provable guarantees (for certain problems) that cannot be achieved otherwise. Nevertheless, under recent strong adversarial attacks, their work was shown to have low performance in a practical deep-learning setting. In this paper, we take a step towards realizing the promise of transduction+rejection in more realistic scenarios. Our key observation is that a novel application of a reduction technique by Tramèr, which was until now only used to demonstrate the vulnerability of certain defenses, can be used to actually construct effective defenses. Theoretically, we show that a careful application of this technique in the transductive setting can give significantly improved sample-complexity for robust generalization. Our theory guides us to design a new transductive algorithm for learning a selective model; extensive experiments using state-of-the-art attacks show that our approach provides significantly better robust accuracy (81.6% on CIFAR-10 and 57.9% on CIFAR-100 under $l_\infty$ with budget 8/255) than existing techniques.
nan
Article 990
Title@2025-06-20 (5): A Survey of State Representation Learning for Deep Reinforcement Learning
Title: A Survey of State Representation Learning for Deep Reinforcement Learning | Eine Umfrage über staatliche Repräsentationslernen für tiefes Stärkungslernen | 国家代表深强化学习学习调查 2506.17518v1 |
Authors (2): Ayoub Echchahed, Pablo Samuel Castro
Representation learning methods are an important tool for addressing the challenges posed by complex observation spaces in sequential decision making problems. Recently, many methods have used a wide variety of approaches for learning meaningful state representations in reinforcement learning, allowing better sample efficiency, generalization, and performance. This survey aims to provide a broad categorization of these methods within a model-free online setting, exploring how they tackle the learning of state representations differently. We categorize the methods into six main classes, detailing their mechanisms, benefits, and limitations. Through this taxonomy, our aim is to enhance the understanding of this field and provide a guide for new researchers. We also discuss techniques for assessing the quality of representations, and detail relevant future directions.
nan
Article 991
Title@2025-06-20 (5): Validating Mechanistic Interpretations: An Axiomatic Approach
Title: Validating Mechanistic Interpretations: An Axiomatic Approach | Validierung mechanistischer Interpretationen: Ein axiomatischer Ansatz | 验证机械学解释:一种不法方法 2407.13594v2 |
Authors (6): Nils Palumbo, Ravi Mangal, Zifan Wang, Saranya Vijayakumar, Corina S. Pasareanu, Somesh Jha
Mechanistic interpretability aims to reverse engineer the computation performed by a neural network in terms of its internal components. Although there is a growing body of research on mechanistic interpretation of neural networks, the notion of a mechanistic interpretation itself is often ad-hoc. Inspired by the notion of abstract interpretation from the program analysis literature that aims to develop approximate semantics for programs, we give a set of axioms that formally characterize a mechanistic interpretation as a description that approximately captures the semantics of the neural network under analysis in a compositional manner. We demonstrate the applicability of these axioms for validating mechanistic interpretations on an existing, well-known interpretability study as well as on a new case study involving a Transformer-based model trained to solve the well-known 2-SAT problem.
nan
Article 992
Title@2025-06-20 (5): IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G
Title: IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G | IQFM Ein drahtloses Grundmodell für I/Q Streams in AI-Native 6G | AI-Native 6G的I/Q流无线无线基础模型 2506.06718v2 |
Authors (2): Omar Mashaal, Hatem Abou-Zeid
Foundational models have shown remarkable potential in natural language processing and computer vision, yet remain in their infancy in wireless communications. While a few efforts have explored image-based modalities such as channel state information (CSI) and frequency spectrograms, foundational models that operate directly on raw IQ data remain largely unexplored. This paper presents IQFM, the first I/Q signal foundational model for wireless communications. IQFM supports diverse tasks: modulation classification, angle-of-arrival (AoA), beam prediction, and RF fingerprinting, without heavy preprocessing or handcrafted features. We also introduce a task-aware augmentation strategy that categorizes transformations into core augmentations, such as cyclic time shifting, and task-specific augmentations. This strategy forms the basis for structured, task-dependent representation learning within a contrastive self-supervised learning (SSL) framework. Using this strategy, the lightweight encoder, pre-trained via SSL on over-the-air multi-antenna IQ data, achieves up to 99.67% and 65.45% accuracy on modulation and AoA classification, respectively, using only one labeled sample per class, outperforming supervised baselines by up to 7x and 145x. The model also generalizes to out-of-distribution tasks; when adapted to new tasks using only 500 samples per class and minimal parameter updates via LoRA, the same frozen encoder achieves 94.15% on beam prediction (vs. 89.53% supervised), 50.00% on RML2016a modulation classification (vs. 49.30%), and 96.05% on RF fingerprinting (vs. 96.64%). These results demonstrate the potential of raw IQ-based foundational models as efficient, reusable encoders for multi-task learning in AI-native 6G systems.
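The abstract cites cyclic time shifting as an example of a core augmentation. A minimal sketch of that one transformation on a list of complex I/Q samples is given below; the function name and the toy burst are hypothetical, and how IQFM samples shift amounts or handles multiple antennas is not stated in the abstract.

```python
def cyclic_time_shift(iq_samples, shift):
    """Rotate a list of complex I/Q samples by `shift` positions in time."""
    n = len(iq_samples)
    if n == 0:
        return []
    k = shift % n
    # Samples that fall off the end wrap around to the front.
    return iq_samples[-k:] + iq_samples[:-k] if k else list(iq_samples)

# Toy burst: one sample at each of four points on the unit circle.
burst = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
shifted = cyclic_time_shift(burst, 1)
print(shifted)
```

The shift changes only where the burst starts, not the sample values themselves, which is why it is a plausible label-preserving view for contrastive pre-training.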
nan
Article 993
Title@2025-06-20 (5): $L^*LM$: Learning Automata from Examples using Natural Language Oracles
Title: $L^*LM$: Learning Automata from Examples using Natural Language Oracles | $L^*LM$: Automata lernen aus Beispielen mit natürlichen Sprach-Orakeln | $LLM$:从使用自然语言甲骨文的例子中学习自动地图 2402.07051v2 |
Authors (5): Marcell Vazquez-Chanlatte, Karim Elmaaroufi, Stefan J. Witwicki, Matei Zaharia, Sanjit A. Seshia
Expert demonstrations have proven an easy way to indirectly specify complex tasks. Recent algorithms even support extracting unambiguous formal specifications, e.g. deterministic finite automata (DFA), from demonstrations. Unfortunately, these techniques are generally not sample efficient. In this work, we introduce $L^*LM$, an algorithm for learning DFAs from both demonstrations and natural language. Due to the expressivity of natural language, we observe a significant improvement in the data efficiency of learning DFAs from expert demonstrations. Technically, $L^*LM$ leverages large language models to answer membership queries about the underlying task. This is then combined with recent techniques for transforming learning from demonstrations into a sequence of labeled example learning problems. In our experiments, we observe the two modalities complement each other, yielding a powerful few-shot learner.
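To make the notion of a membership query concrete, the sketch below defines a small DFA and an oracle that answers whether a given word is accepted. The task, the state names, and the oracle are hypothetical; in particular, the oracle here simply consults a ground-truth DFA as a stand-in for the language model that the abstract describes, and no learning loop is shown.

```python
class DFA:
    """Deterministic finite automaton over an arbitrary symbol alphabet."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = transitions  # maps (state, symbol) -> next state

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting

# Hypothetical task: "never step on lava" over the alphabet {"safe", "lava"}.
task = DFA(
    start="ok",
    accepting={"ok"},
    transitions={
        ("ok", "safe"): "ok",
        ("ok", "lava"): "dead",
        ("dead", "safe"): "dead",
        ("dead", "lava"): "dead",
    },
)

def membership_oracle(word):
    # Stand-in for a language-model oracle: here it reads the ground-truth DFA.
    return task.accepts(word)

print(membership_oracle(["safe", "safe"]))
print(membership_oracle(["safe", "lava", "safe"]))
```

An automaton learner only needs the yes/no answers, so any source that can label words, including a language model prompted with a task description, can play the oracle's role.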
nan
Article 994
Title@2025-06-20 (5): Modeling Neural Networks with Privacy Using Neural Stochastic Differential Equations
Title: Modeling Neural Networks with Privacy Using Neural Stochastic Differential Equations | Modellierung neuraler Netzwerke mit Datenschutz mittels neuraler stochastischer Differentialgleichungen | 以使用神经神学差异等同的隐私建模神经网络 2501.06686v2 |
Authors (4): Sanghyun Hong, Fan Wu, Anthony Gruber, Kookjin Lee
In this work, we study the feasibility of using neural ordinary differential equations (NODEs) to model systems with intrinsic privacy properties. Unlike conventional feedforward neural networks, which have unlimited expressivity and can represent arbitrary mappings between inputs and outputs, NODEs constrain their learning to the solution of a system of differential equations. We first examine whether this constraint reduces memorization and, consequently, the membership inference risks associated with NODEs. We conduct a comprehensive evaluation of NODEs under membership inference attacks and show that they exhibit twice the resistance compared to conventional models such as ResNets. By analyzing the variance in membership risks across different NODE models, we find that their limited expressivity leads to reduced overfitting to the training data. We then demonstrate, both theoretically and empirically, that membership inference risks can be further mitigated by utilizing a stochastic variant of NODEs: neural stochastic differential equations (NSDEs). We show that NSDEs are differentially-private (DP) learners that provide the same provable privacy guarantees as DP-SGD, the de-facto mechanism for training private models. NSDEs are also effective in mitigating membership inference attacks, achieving risk levels comparable to private models trained with DP-SGD while offering an improved privacy-utility trade-off. Moreover, we propose a drop-in-replacement strategy that efficiently integrates NSDEs into conventional feedforward architectures to enhance their privacy.
Article 995
Title@2025-06-20 (5): Episode-specific Fine-tuning for Metric-based Few-shot Learners with Optimization-based Training
Title: Episode-specific Fine-tuning for Metric-based Few-shot Learners with Optimization-based Training | Episodenspezifische Feinabstimmung für Metric-based Learner mit Optimization-based Training | 以 “ 优化化 “ 培训为 “ 以计量为基础的少见学生 “ 的 “ 最佳化 “ 培训 2506.17499v1 |
Authors (3): Xuanyu Zhuang, Geoffroy Peeters, Gaël Richard
In few-shot classification tasks (so-called episodes), a small set of labeled support samples is provided during inference to aid the classification of unlabeled query samples. Metric-based models typically operate by computing similarities between query and support embeddings within a learned metric space, followed by nearest-neighbor classification. However, these labeled support samples are often underutilized: they are only used for similarity comparison, despite their potential to fine-tune and adapt the metric space itself to the classes in the current episode. To address this, we propose a series of simple yet effective episode-specific, during-inference fine-tuning methods for metric-based models, including Rotational Division Fine-Tuning (RDFT) and its two variants, Iterative Division Fine-Tuning (IDFT) and Augmented Division Fine-Tuning (ADFT). These methods construct pseudo support-query pairs from the given support set to enable fine-tuning even for non-parametric models. Nevertheless, the severely limited amount of data in each task poses a substantial risk of overfitting when applying such fine-tuning strategies. To mitigate this, we further propose to train the metric-based model within an optimization-based meta-learning framework. With the combined efforts of episode-specific fine-tuning and optimization-based meta-training, metric-based models are equipped with the ability to rapidly adapt to the limited support samples during inference while avoiding overfitting. We validate our approach on three audio datasets from diverse domains, namely ESC-50 (environmental sounds), Speech Commands V2 (spoken keywords), and Medley-solos-DB (musical instrument). Experimental results demonstrate that our approach consistently improves performance for all evaluated metric-based models (especially for attention-based models) and generalizes well across different audio domains.
Article 996
Title@2025-06-20 (5): From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training
Title: From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training | Von der Generalität zur Meisterschaft: Composer-Style Symbolic Music Generation via Large-Scale Pre-Training | 从普遍到掌握:通过大规模预培训创作作曲家-中继符号音乐 2506.17497v1 |
Authors (2): Mingyang Yao, Ke Chen
Despite progress in controllable symbolic music generation, data scarcity remains a challenge for certain control modalities. Composer-style music generation is a prime example, as only a few pieces per composer are available, limiting the modeling of both styles and fundamental music elements (e.g., melody, chord, rhythm). In this paper, we investigate how general music knowledge learned from a broad corpus can enhance the mastery of specific composer styles, with a focus on piano piece generation. Our approach follows a two-stage training paradigm. First, we pre-train a REMI-based music generation model on a large corpus of pop, folk, and classical music. Then, we fine-tune it on a small, human-verified dataset from four renowned composers, namely Bach, Mozart, Beethoven, and Chopin, using a lightweight adapter module to condition the model on style indicators. To evaluate the effectiveness of our approach, we conduct both objective and subjective evaluations on style accuracy and musicality. Experimental results demonstrate that our method outperforms ablations and baselines, achieving more precise composer-style modeling and better musical aesthetics. Additionally, we provide observations on how the model builds music concepts from the generality pre-training and refines its stylistic understanding through the mastery fine-tuning.
Article 997
Title@2025-06-20 (5): Online Adaptation for Flying Quadrotors in Tight Formations
Title: Online Adaptation for Flying Quadrotors in Tight Formations | Online-Anpassung für fliegende Quadrotoren in engen Formationen | 近形飞行四方体在线适应 2506.17488v1 |
Authors (3): Pei-An Hsieh, Kong Yao Chee, M. Ani Hsieh
The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. Furthermore, these aerodynamic effects are highly nonlinear and fast-paced, making them difficult to model and predict. To overcome these challenges, we present L1 KNODE-DW MPC, an adaptive, mixed expert learning based control framework that allows individual quadrotors to accurately track trajectories while adapting to time-varying aerodynamic interactions during formation flights. We evaluate L1 KNODE-DW MPC in two different three-quadrotor formations and show that it outperforms several MPC baselines. Our results show that the proposed framework is capable of enabling the three-quadrotor team to remain vertically aligned in close proximity throughout the flight. These findings show that the L1 adaptive module compensates for unmodeled disturbances most effectively when paired with an accurate dynamics model. A video showcasing our framework and the physical experiments is available here: https://youtu.be/9QX1Q5Ut9Rs
Article 998
Title@2025-06-20 (5): Distilling On-device Language Models for Robot Planning with Minimal Human Intervention
Title: Distilling On-device Language Models for Robot Planning with Minimal Human Intervention | Destillieren von On-Device-Sprachmodellen für die Roboterplanung mit minimaler menschlicher Intervention | 利用最低限度的人力干预,为机器人规划继续采用现有设计语言模式 2506.17486v1 |
Authors (6): Zachary Ravichandran, Ignacio Hounie, Fernando Cladera, Alejandro Ribeiro, George J. Pappas, Vijay Kumar
Large language models (LLMs) provide robots with powerful contextual reasoning abilities and a natural human interface. Yet, current LLM-enabled robots typically depend on cloud-hosted models, limiting their usability in environments with unreliable communication infrastructure, such as outdoor or industrial settings. We present PRISM, a framework for distilling small language model (SLM)-enabled robot planners that run on-device with minimal human supervision. Starting from an existing LLM-enabled planner, PRISM automatically synthesizes diverse tasks and environments, elicits plans from the LLM, and uses this synthetic dataset to distill a compact SLM as a drop-in replacement of the source model. We apply PRISM to three LLM-enabled planners for mapping and exploration, manipulation, and household assistance, and we demonstrate that PRISM improves the performance of Llama-3.2-3B from 10-20% of GPT-4o’s performance to over 93%, using only synthetic data. We further demonstrate that the distilled planners generalize across heterogeneous robotic platforms (ground and aerial) and diverse environments (indoor and outdoor). We release all software, trained models, and datasets at https://zacravichandran.github.io/PRISM.
Article 999
Title@2025-06-20 (5): Disentangle and Regularize: Sign Language Production with Articulator-Based Disentanglement and Channel-Aware Regularization
Title: Disentangle and Regularize: Sign Language Production with Articulator-Based Disentanglement and Channel-Aware Regularization | Entwirren und Regularisieren: Gebärdenspracheproduktion mit Artikulator-basierter Entwirren und Kanal-Bewusst-Regularisierung | 分解和规范化:手语制作,配有以动画师为基础的分解和频道-意识规范化 2504.06610v2 |
Authors (3): Sumeyye Meryem Tasyurek, Tugce Kiziltepe, Hacer Yalim Keles
In this work, we propose DARSLP, a simple gloss-free, transformer-based sign language production (SLP) framework that directly maps spoken-language text to sign pose sequences. We first train a pose autoencoder that encodes sign poses into a compact latent space using an articulator-based disentanglement strategy, where features corresponding to the face, right hand, left hand, and body are modeled separately to promote structured and interpretable representation learning. Next, a non-autoregressive transformer decoder is trained to predict these latent representations from sentence-level text embeddings. To guide this process, we apply channel-aware regularization by aligning predicted latent distributions with priors extracted from the ground-truth encodings using a KL-divergence loss. The contribution of each channel to the loss is weighted according to its associated articulator region, enabling the model to account for the relative importance of different articulators during training. Our approach does not rely on gloss supervision or pretrained models, and achieves state-of-the-art results on the PHOENIX14T and CSL-Daily datasets.
Article 1000
Title@2025-06-20 (5): A geometric framework for momentum-based optimizers for low-rank training
Title: A geometric framework for momentum-based optimizers for low-rank training | Ein geometrischer Rahmen für Impuls-basierte Optimatoren für Low-Rank-Training | 低级培训动力优化动力优化的几何框架 2506.17475v1 |
Authors (3): Steffen Schotthöfer, Timon Klein, Jonas Kusch
Low-rank pre-training and fine-tuning have recently emerged as promising techniques for reducing the computational and storage costs of large neural networks. Training low-rank parameterizations typically relies on conventional optimizers such as heavy ball momentum methods or Adam. In this work, we identify and analyze potential difficulties that these training methods encounter when used to train low-rank parameterizations of weights. In particular, we show that classical momentum methods can struggle to converge to a local optimum due to the geometry of the underlying optimization landscape. To address this, we introduce novel training strategies derived from dynamical low-rank approximation, which explicitly account for the underlying geometric structure. Our approach leverages and combines tools from dynamical low-rank approximation and momentum-based optimization to design optimizers that respect the intrinsic geometry of the parameter space. We validate our methods through numerical experiments, demonstrating faster convergence and stronger validation metrics at given parameter budgets.
Article 1001
Title@2025-06-20 (5): Fed-pilot: Optimizing LoRA Allocation for Efficient Federated Fine-Tuning with Heterogeneous Clients
Title: Fed-pilot: Optimizing LoRA Allocation for Efficient Federated Fine-Tuning with Heterogeneous Clients | Fed-Pilot: Optimierung der LoRA-Allokation für effizientes Federated Fine-Tuning mit heterogenen Kunden | Fed-试点:优化LORA分配,与异质客户进行高效的联邦货币调整 2410.10200v2 |
Authors (4): Zikai Zhang, Rui Hu, Ping Liu, Jiahao Xu
Federated Learning enables the fine-tuning of foundation models (FMs) across distributed clients for specific tasks; however, its scalability is limited by the heterogeneity of client memory capacities. In this work, we propose Fed-pilot, a memory-efficient federated fine-tuning framework. It enables memory-constrained clients to participate in Low-Rank Adaptation (LoRA)-based fine-tuning by training only a subset of LoRA modules locally. Fed-pilot identifies the optimal selection of trainable LoRA modules as a knapsack optimization problem, maximizing model performance under memory constraints for each client. To mitigate inconsistencies arising from heterogeneous module allocations and Non-IID data, Fed-pilot employs a novel aggregation rule that dynamically compensates for under-updated layers. Extensive experiments on five diverse datasets across various heterogeneous data settings demonstrate Fed-pilot’s effectiveness and efficiency compared to state-of-the-art methods. To the best of our knowledge, this is the first study on federated fine-tuning of FMs that integrates memory-constrained optimization. The code will be publicly available.
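As an illustration of the knapsack framing mentioned in the abstract, the following is a minimal sketch of a 0/1 knapsack solved by dynamic programming. It is not Fed-pilot's actual formulation: the function name `select_modules`, the integer memory costs, and the integer importance scores are all hypothetical stand-ins for whatever per-module cost and value the authors use.

```python
from typing import List, Tuple


def select_modules(costs: List[int], values: List[int], budget: int) -> Tuple[List[int], int]:
    """Pick the subset of modules with maximum total value whose cost fits the budget.

    costs  -- hypothetical integer memory cost of training each LoRA module
    values -- hypothetical integer importance score of each module
    budget -- integer memory budget of one client
    """
    n = len(costs)
    # best[i][b] = best total value using only the first i modules within budget b
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, value = costs[i - 1], values[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip module i-1
            if cost <= b:  # option 2: train module i-1 if it fits
                best[i][b] = max(best[i][b], best[i - 1][b - cost] + value)

    # Walk back through the table to recover which modules were chosen.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return sorted(chosen), best[n][budget]


# Toy client: four modules, memory budget of 7 units (all numbers invented).
chosen, total = select_modules(costs=[4, 3, 2, 5], values=[9, 6, 5, 11], budget=7)
print(chosen, total)
```

In this toy instance the solver trains modules 2 and 3, which together use the full budget of 7 units. A client with a larger budget would simply call the same function with a larger `budget`, which is how heterogeneous memory capacities would enter under this reading.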
Article 1002
Title@2025-06-20 (5): Distributional Training Data Attribution
Title: Distributional Training Data Attribution | Verteilung der Ausbildungsdaten | 分配培训数据 2506.12965v2 |
Authors (7): Bruno Mlodozeniec, Isaac Reid, Sam Power, David Krueger, Murat Erdogdu, Richard E. Turner, Roger Grosse
Randomness is an unavoidable part of training deep learning models, yet something that traditional training data attribution algorithms fail to rigorously account for. They ignore the fact that, due to stochasticity in the initialisation and batching, training on the same dataset can yield different models. In this paper, we address this shortcoming through introducing distributional training data attribution (d-TDA), the goal of which is to predict how the distribution of model outputs (over training runs) depends upon the dataset. We demonstrate the practical significance of d-TDA in experiments, e.g. by identifying training examples that drastically change the distribution of some target measurement without necessarily changing the mean. Intriguingly, we also find that influence functions (IFs), a popular but poorly-understood data attribution tool, emerge naturally from our distributional framework as the limit to unrolled differentiation, without requiring restrictive convexity assumptions. This provides a new mathematical motivation for their efficacy in deep learning, and helps to characterise their limitations.
Article 1003
Title@2025-06-20 (5): Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms
Title: Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms | Modellierung des menschlichen visuellen Systems: Vergleichende Erkenntnisse aus response-optimierten und aufgabenoptimierten Visionsmodellen, Sprachmodellen und verschiedenen Auslesemechanismen | 模拟人类视觉系统:从反应适应和任务适应的视觉模型、语言模型和不同的阅读机制中比较透视 2410.14031v4 |
Authors (3): Shreya Saha, Ishaan Chadha, Meenakshi khosla
Over the past decade, predictive modeling of neural responses in the primate visual system has advanced significantly, largely driven by various DNN approaches. These include models optimized directly for visual recognition, cross-modal alignment through contrastive objectives, neural response prediction from scratch, and large language model embeddings. Likewise, different readout mechanisms, ranging from fully linear to spatial-feature factorized methods, have been explored for mapping network activations to neural responses. Despite the diversity of these approaches, it remains unclear which method performs best across different visual regions. In this study, we systematically compare these approaches for modeling the human visual system and investigate alternative strategies to improve response predictions. Our findings reveal that for early to mid-level visual areas, response-optimized models with visual inputs offer superior prediction accuracy, while for higher visual regions, embeddings from LLMs based on detailed contextual descriptions of images and task-optimized models pretrained on large vision datasets provide the best fit. Through comparative analysis of these modeling approaches, we identified three distinct regions in the visual cortex: one sensitive primarily to perceptual features of the input that are not captured by linguistic descriptions, another attuned to fine-grained visual details representing semantic information, and a third responsive to abstract, global meanings aligned with linguistic content. We also highlight the critical role of readout mechanisms, proposing a novel scheme that modulates receptive fields and feature maps based on semantic content, resulting in an accuracy boost of 3-23% over existing SOTAs for all models and brain regions. Together, these findings offer key insights into building more precise models of the visual system.
Article 1004
Title@2025-06-20 (5): SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
Title: SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving | SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving | SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v2 |
Authors (8): Xiangchen Li, Dimitrios Spatharakis, Saeid Ghafouri, Jiakun Fan, Hans Vandierendonck, Deepu John, Bo Ji, Dimitrios Nikolopoulos
Regardless of the advancements in device capabilities, efficient inference of advanced large language models (LLMs) at the edge remains challenging due to limited device memory and power constraints. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper introduces a new approach that leverages speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, as a promising approach specifically adapted for edge computing by orchestrating computation across heterogeneous devices. We propose SLED, a method that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server efficiently batches and verifies the tokens utilizing a more precise target model. This approach supports device heterogeneity and reduces server-side memory footprint by avoiding the need to deploy multiple target models. Our initial experiments with Jetson Orin Nano, Raspberry Pi 4B/5, and an edge server equipped with 4 Nvidia A100 GPUs indicate substantial benefits: significantly increased system throughput, capacity, and better cost efficiency, all without sacrificing model accuracy.
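The draft-then-verify loop described in the abstract can be sketched with toy stand-in models. This is a simplified greedy variant of speculative decoding, written under stated assumptions: the names `speculative_step`, `draft_model`, and `target_model` are hypothetical, the "models" are deterministic lookup functions rather than LLMs, and the acceptance rule (keep the longest prefix the target agrees with, then substitute the target's token) is one common choice, not necessarily the rule SLED uses.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to the next token.
Model = Callable[[List[str]], str]

SENTENCE = "the cat sat on the mat".split()


def target_model(context: List[str]) -> str:
    """Toy stand-in for the server-side target model: always continues SENTENCE."""
    return SENTENCE[len(context)] if len(context) < len(SENTENCE) else "<eos>"


def draft_model(context: List[str]) -> str:
    """Toy stand-in for a small on-device draft model: wrong at position 3."""
    return "in" if len(context) == 3 else target_model(context)


def speculative_step(context: List[str], draft: Model, target: Model, k: int) -> List[str]:
    """Run one draft-and-verify round and return the tokens to append."""
    # Edge device: draft k candidate tokens autoregressively.
    proposed: List[str] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # Server: check each drafted token against the target's own greedy choice.
    # A real server would score all positions in one batched forward pass;
    # the loop below is only for readability.
    accepted: List[str] = []
    for token in proposed:
        expected = target(context + accepted)
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)
    accepted.append(target(context + accepted))  # all drafts passed: one extra token
    return accepted


new_tokens = speculative_step(["the"], draft_model, target_model, k=4)
print(new_tokens)
```

Here the device drafts `cat sat in the`, the server accepts `cat sat`, corrects `in` to `on`, and discards the rest, so three tokens are produced from a single verification round. Because the output always matches what the target model alone would have generated greedily, this variant loses no accuracy relative to the target.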
Article 1005
Title@2025-06-20 (5): Computational Approaches to Understanding Large Language Model Impact on Writing and Information Ecosystems
Title: Computational Approaches to Understanding Large Language Model Impact on Writing and Information Ecosystems | Computational Approaches to Understanding Large Language Model Impact on Writing and Information Ecosystems | 理解大语言模型对书写和信息生态系统的影响的计算方法 2506.17467v1 |
Authors (1): Weixin Liang
Large language models (LLMs) have shown significant potential to change how we write, communicate, and create, leading to rapid adoption across society. This dissertation examines how individuals and institutions are adapting to and engaging with this emerging technology through three research directions. First, I demonstrate how the institutional adoption of AI detectors introduces systematic biases, particularly disadvantaging writers of non-dominant language varieties, highlighting critical equity concerns in AI governance. Second, I present novel population-level algorithmic approaches that measure the increasing adoption of LLMs across writing domains, revealing consistent patterns of AI-assisted content in academic peer reviews, scientific publications, consumer complaints, corporate communications, job postings, and international organization press releases. Finally, I investigate LLMs’ capability to provide feedback on research manuscripts through a large-scale empirical analysis, offering insights into their potential to support researchers who face barriers in accessing timely manuscript feedback, particularly early-career researchers and those from under-resourced settings.
Article 1006
Title@2025-06-20 (5): FedNAMs: Performing Interpretability Analysis in Federated Learning Context
Title: FedNAMs: Performing Interpretability Analysis in Federated Learning Context | FedNAMs: Interpretationsanalyse im Federated Learning Context durchführen | FFNAM: 在联邦学习背景下进行解释性分析 2506.17466v1 |
Authors (3): Amitash Nanda, Sree Bhargavi Balija, Debashis Sahoo
Federated learning continues to evolve but faces challenges in interpretability and explainability. To address these challenges, we introduce a novel approach that employs Neural Additive Models (NAMs) within a federated learning framework. This new Federated Neural Additive Models (FedNAMs) approach merges the advantages of NAMs, where individual networks concentrate on specific input features, with the decentralized approach of federated learning, ultimately producing interpretable analysis results. This integration enhances privacy by training on local data across multiple devices, thereby minimizing the risks associated with data centralization and improving model robustness and generalizability. FedNAMs maintain detailed, feature-specific learning, making them especially valuable in sectors such as finance and healthcare. They facilitate the training of client-specific models to integrate local updates, preserve privacy, and mitigate concerns related to centralization. Our studies on various text and image classification tasks, using datasets such as OpenFetch ML Wine, UCI Heart Disease, and Iris, show that FedNAMs deliver strong interpretability with minimal accuracy loss compared to traditional Federated Deep Neural Networks (DNNs). The research yields notable findings, including the identification of critical predictive features at both client and global levels: volatile acidity, sulfates, and chlorides for wine quality; chest pain type, maximum heart rate, and number of vessels for heart disease; and petal length and width for iris classification. This approach strengthens privacy and model efficiency and improves interpretability and robustness across diverse datasets. Finally, FedNAMs generate insights into the causes of high and low feature interpretability.
Article 1007
Title@2025-06-20 (5): LieDetect: Detection of representation orbits of compact Lie groups from point clouds
Title: LieDetect: Detection of representation orbits of compact Lie groups from point clouds | LieDetect: Erkennung von Darstellungsbahnen kompakter Lie-Gruppen von Punktwolken | 测谎:从点云中探测到紧凑层的代表轨道 2309.03086v2 |
Authors (2): Henrique Ennes, Raphaël Tinarrage
We suggest a new algorithm to estimate representations of compact Lie groups from finite samples of their orbits. Different from other reported techniques, our method allows the retrieval of the precise representation type as a direct sum of irreducible representations. Moreover, the knowledge of the representation type permits the reconstruction of its orbit, which is useful for identifying the Lie group that generates the action, from a finite list of candidates. Our algorithm is general for any compact Lie group, but only instantiations for SO(2), T^d, SU(2), and SO(3) are considered. Theoretical guarantees of robustness in terms of Hausdorff and Wasserstein distances are derived. Our tools are drawn from geometric measure theory, computational geometry, and optimization on matrix manifolds. The algorithm is tested for synthetic data up to dimension 32, as well as real-life applications in image analysis, harmonic analysis, density estimation, equivariant neural networks, chemical conformational spaces, and classical mechanics systems, achieving very accurate results.
Article 1008
Title@2025-06-20 (5): Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
Title: Directional Gradient Projection for Robust Fine-Tuning of Foundation Models | Richtgradientenprojektion für robuste Feinsteuerung von Fundamentmodellen | 基金会模型硬性精美调整方向梯度预测 2502.15895v2 |
Authors (5): Chengyue Huang, Junjiao Tian, Brisa Maneechotesuwan, Shivang Chopra, Zsolt Kira
Robust fine-tuning aims to adapt large foundation models to downstream tasks while preserving their robustness to distribution shifts. Existing methods primarily focus on constraining and projecting the current model towards the pre-trained initialization based on the magnitudes between fine-tuned and pre-trained weights, which often require extensive hyper-parameter tuning and can sometimes result in underfitting. In this work, we propose Directional Gradient Projection (DiGraP), a novel layer-wise trainable method that incorporates directional information from gradients to bridge regularization and multi-objective optimization. Besides demonstrating our method on image classification, as another contribution we generalize this area to the multi-modal evaluation settings for robust fine-tuning. Specifically, we first bridge the uni-modal and multi-modal gap by performing analysis on Image Classification reformulated Visual Question Answering (VQA) benchmarks and further categorize ten out-of-distribution (OOD) VQA datasets by distribution shift types and degree (i.e. near versus far OOD). Experimental results show that DiGraP consistently outperforms existing baselines across Image Classification and VQA tasks with discriminative and generative backbones, improving both in-distribution (ID) generalization and OOD robustness.
Article 1009
Title@2025-06-20 (5): A Comparative Analysis of Distributed Linear Solvers under Data Heterogeneity
Title: A Comparative Analysis of Distributed Linear Solvers under Data Heterogeneity | Eine vergleichende Analyse der verteilten linearen Solver unter Daten Heterogenität | 数据差异下分布线性溶剂的比较分析 2304.10640v4 |
Authors (4): Boris Velasevic, Rohit Parasnis, Christopher G. Brinton, Navid Azizan
We consider the problem of solving a large-scale system of linear equations in a distributed or federated manner by a taskmaster and a set of machines, each possessing a subset of the equations. We provide a comprehensive comparison of two well-known classes of algorithms used to solve this problem: projection-based methods and optimization-based methods. First, we introduce a novel geometric notion of data heterogeneity called angular heterogeneity and discuss its generality. Using this notion, we characterize the optimal convergence rates of the most prominent algorithms from each class, capturing the effects of the number of machines, the number of equations, and that of both cross-machine and local data heterogeneity on these rates. Our analysis establishes the superiority of Accelerated Projected Consensus in realistic scenarios with significant data heterogeneity and offers several insights into how angular heterogeneity affects the efficiency of the methods studied. Additionally, we develop distributed algorithms for the efficient computation of the proposed angular heterogeneity metrics. Our extensive numerical analyses validate and complement our theoretical results.
Article 1010
Title@2025-06-20 (5): UT-GraphCast Hindcast Dataset: A Global AI Forecast Archive from UT Austin for Weather and Climate Applications
Title: UT-GraphCast Hindcast Dataset: A Global AI Forecast Archive from UT Austin for Weather and Climate Applications | UT-GraphCast Hindcast Dataset: Ein globales KI-Prognosearchiv aus UT Austin für Wetter- und Klimaanwendungen | UT-GraphCast Hindcast 数据集:来自UT Austin的天气和气候应用全球AI预报档案 2506.17453v1 |
Authors (7): Naveen Sudharsan, Manmeet Singh, Harsh Kamath, Hassan Dashtian, Clint Dawson, Zong-Liang Yang, Dev Niyogi
The UT GraphCast Hindcast Dataset from 1979 to 2024 is a comprehensive global weather forecast archive generated using the Google DeepMind GraphCast Operational model. Developed by researchers at The University of Texas at Austin under the WCRP umbrella, this dataset provides daily 15-day deterministic forecasts at 00 UTC on an approximately 25 km global grid for a 45-year period. GraphCast is a physics-informed graph neural network that was trained on ECMWF ERA5 reanalysis. It predicts more than a dozen key atmospheric and surface variables on 37 vertical levels, delivering a full medium-range forecast in under one minute on modern hardware.
Article 1011
Title@2025-06-20 (5): Scalable Unit Harmonization in Medical Informatics via Bayesian-Optimized Retrieval and Transformer-Based Re-ranking
Title: Scalable Unit Harmonization in Medical Informatics via Bayesian-Optimized Retrieval and Transformer-Based Re-ranking | Skalierbare Einheitsharmonisierung in der medizinischen Informatik über Bayesian-Optimized Retrieval und Transformer-Based Re-Ranking | 通过Bayesian-Operimized检索和变压器重新排位,在医疗信息学中通过Bayesian-Operimized检索和变压器重新排位 2505.00810v2 |
Authors (1): Jordi de la Torre
Objective: To develop and evaluate a scalable methodology for harmonizing inconsistent units in large-scale clinical datasets, addressing a key barrier to data interoperability. Materials and Methods: We designed a novel unit harmonization system combining BM25, sentence embeddings, Bayesian optimization, and a bidirectional transformer-based binary classifier for retrieving and matching laboratory test entries. The system was evaluated using the Optum Clinformatics Datamart dataset (7.5 billion entries). We implemented a multi-stage pipeline: filtering, identification, harmonization proposal generation, automated re-ranking, and manual validation. Performance was assessed using Mean Reciprocal Rank (MRR) and other standard information retrieval metrics. Results: Our hybrid retrieval approach combining BM25 and sentence embeddings (MRR: 0.8833) significantly outperformed both lexical-only (MRR: 0.7985) and embedding-only (MRR: 0.5277) approaches. The transformer-based reranker further improved performance (absolute MRR improvement: 0.10), bringing the final system MRR to 0.9833. The system achieved 83.39% precision at rank 1 and 94.66% recall at rank 5. Discussion: The hybrid architecture effectively leverages the complementary strengths of lexical and semantic approaches. The reranker addresses cases where initial retrieval components make errors due to complex semantic relationships in medical terminology. Conclusion: Our framework provides an efficient, scalable solution for unit harmonization in clinical datasets, reducing manual effort while improving accuracy. Once harmonized, data can be reused seamlessly in different analyses, ensuring consistency across healthcare systems and enabling more reliable multi-institutional studies and meta-analyses.
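For readers unfamiliar with the reported metrics, the sketch below shows how Mean Reciprocal Rank and recall at rank k are conventionally computed when each query has exactly one correct answer. The function names and the toy unit strings are invented for illustration and are not taken from the paper's pipeline or data.

```python
from typing import List, Sequence


def reciprocal_rank(ranked: Sequence[str], relevant: str) -> float:
    """Return 1/rank of the correct candidate, or 0.0 if it was not retrieved."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[str]], targets: List[str]) -> float:
    """Average the reciprocal rank over all queries."""
    scores = [reciprocal_rank(r, t) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores)


def recall_at_k(rankings: List[Sequence[str]], targets: List[str], k: int) -> float:
    """Fraction of queries whose correct candidate appears in the top k."""
    hits = sum(1 for r, t in zip(rankings, targets) if t in r[:k])
    return hits / len(targets)


# Three toy queries, each with a ranked list of candidate harmonized units.
rankings = [
    ["mg/dL", "g/L", "mmol/L"],   # correct answer at rank 1
    ["g/L", "mmol/L", "mg/dL"],   # correct answer at rank 2
    ["U/L", "IU/L", "kU/L"],      # correct answer at rank 2
]
targets = ["mg/dL", "mmol/L", "IU/L"]

mrr = mean_reciprocal_rank(rankings, targets)
print(round(mrr, 4), recall_at_k(rankings, targets, 1), recall_at_k(rankings, targets, 3))
```

With a single correct answer per query, precision at rank 1 coincides with recall at rank 1, so `recall_at_k(..., 1)` covers both. The toy MRR is (1 + 1/2 + 1/2) / 3, which shows how a system can have a high MRR even when many correct answers sit at rank 2 rather than rank 1.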
Article 1012
Title@2025-06-20 (5): FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
Title: FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | FRAMES-VQA: Benchmarking Fine-Tuning Robustheit über Multi-Modal Shifts in der visuellen Fragestellung | FRAMES-VQA:确定视觉问题解答中多模式变化的精确调整强度基准 2505.21755v2 |
Authors (4): Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra, Zsolt Kira
Visual question answering (VQA) systems face significant challenges when adapting to real-world data shifts, especially in multi-modal contexts. While robust fine-tuning strategies are essential for maintaining performance across in-distribution (ID) and out-of-distribution (OOD) scenarios, current evaluation settings are primarily unimodal or particular to some types of OOD, offering limited insight into the complexities of multi-modal contexts. In this work, we propose a new benchmark FRAMES-VQA (Fine-Tuning Robustness across Multi-Modal Shifts in VQA) for evaluating robust fine-tuning for VQA tasks. We utilize ten existing VQA benchmarks, including VQAv2, IV-VQA, VQA-CP, OK-VQA and others, and categorize them into ID, near and far OOD datasets covering uni-modal, multi-modal and adversarial distribution shifts. We first conduct a comprehensive comparison of existing robust fine-tuning methods. We then quantify the distribution shifts by calculating the Mahalanobis distance using uni-modal and multi-modal embeddings extracted from various models. Further, we perform an extensive analysis to explore the interactions between uni- and multi-modal shifts as well as modality importance for ID and OOD samples. These analyses offer valuable guidance on developing more robust fine-tuning methods to handle multi-modal distribution shifts. The code is available at https://github.com/chengyuehuang511/FRAMES-VQA .
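For reference, the Mahalanobis distance used above to quantify distribution shift has the standard form below. The abstract does not say how the reference statistics are estimated, so reading $\mu$ and $\Sigma$ as the sample mean and covariance of in-distribution embeddings is an assumption.

```latex
% x: embedding of a test sample (uni-modal or multi-modal)
% \mu, \Sigma: mean and covariance of the reference (assumed in-distribution) embeddings
D_M(x) = \sqrt{(x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)}
```

When $\Sigma$ is the identity matrix this reduces to the Euclidean distance from $\mu$, so the covariance term is what lets the measure discount directions in which in-distribution embeddings already vary widely.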
nan
Article 1013
Title@2025-06-20 (5): Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation
Title: Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation | Medizinische KI gesund halten: Eine Überprüfung der Erkennungs- und Korrekturmethoden für Systemabbau | 保持医疗全健康:系统退化检测和纠正方法审查 2506.17442v1 |
Authors (3): Hao Guan, David Bates, Li Zhou
Artificial intelligence (AI) is increasingly integrated into modern healthcare, offering powerful support for clinical decision-making. However, in real-world settings, AI systems may experience performance degradation over time, due to factors such as shifting data distributions, changes in patient characteristics, evolving clinical protocols, and variations in data quality. These factors can compromise model reliability, posing safety concerns and increasing the likelihood of inaccurate predictions or adverse outcomes. This review presents a forward-looking perspective on monitoring and maintaining the “health” of AI systems in healthcare. We highlight the urgent need for continuous performance monitoring, early degradation detection, and effective self-correction mechanisms. The paper begins by reviewing common causes of performance degradation at both data and model levels. We then summarize key techniques for detecting data and model drift, followed by an in-depth look at root cause analysis. Correction strategies are further reviewed, ranging from model retraining to test-time adaptation. Our survey spans both traditional machine learning models and state-of-the-art large language models (LLMs), offering insights into their strengths and limitations. Finally, we discuss ongoing technical challenges and propose future research directions. This work aims to guide the development of reliable, robust medical AI systems capable of sustaining safe, long-term deployment in dynamic clinical settings.
nan
Article 1014
Title@2025-06-20 (5): Memorization to Generalization: Emergence of Diffusion Models from Associative Memory
Title: Memorization to Generalization: Emergence of Diffusion Models from Associative Memory | Erinnerung an die Verallgemeinerung: Entstehung von Diffusionsmodellen aus dem assoziativen Gedächtnis | 记忆化为普遍化:共同内存传播模型的出现 2505.21777v2 |
Authors (6): Bao Pham, Gabriel Raya, Matteo Negri, Mohammed J. Zaki, Luca Ambrogioni, Dmitry Krotov
Hopfield networks are associative memory (AM) systems, designed for storing and retrieving patterns as local minima of an energy landscape. In the classical Hopfield model, an interesting phenomenon occurs when the amount of training data reaches its critical memory load: spurious states, or unintended stable points, emerge at the end of the retrieval dynamics, leading to incorrect recall. In this work, we examine diffusion models, commonly used in generative modeling, from the perspective of AMs. The training phase of a diffusion model is conceptualized as memory encoding (training data is stored in the memory). The generation phase is viewed as an attempt at memory retrieval. In the small data regime, the diffusion model exhibits a strong memorization phase, where the network creates distinct basins of attraction around each sample in the training set, akin to the Hopfield model below the critical memory load. In the large data regime, a different phase appears where an increase in the size of the training set fosters the creation of new attractor states that correspond to manifolds of the generated samples. Spurious states appear at the boundary of this transition and correspond to emergent attractor states, which are absent in the training set but, at the same time, have distinct basins of attraction around them. Our findings provide a novel perspective on the memorization-generalization phenomenon in diffusion models via the lens of AMs, a theoretical prediction of the existence of spurious states, and empirical validation of this prediction in commonly used diffusion models.
nan
Article 1015
Title@2025-06-20 (5): Sequence-to-Sequence Models with Attention Mechanistically Map to the Architecture of Human Memory Search
Title: Sequence-to-Sequence Models with Attention Mechanistically Map to the Architecture of Human Memory Search | Sequenz-zu-Sequenz-Modelle mit Aufmerksamkeit Mechanistisch Karte zur Architektur des menschlichen Gedächtnisses Suche | 人类记忆搜索建筑图的顺序到顺序模型,注意人类记忆搜索结构的机械图 2506.17424v1 |
Authors (2): Nikolaus Salvatore, Qiong Zhang
Past work has long recognized the important role of context in guiding how humans search their memory. While context-based memory models can explain many memory phenomena, it remains unclear why humans develop such architectures over possible alternatives in the first place. In this work, we demonstrate that foundational architectures in neural machine translation – specifically, recurrent neural network (RNN)-based sequence-to-sequence models with attention – exhibit mechanisms that directly correspond to those specified in the Context Maintenance and Retrieval (CMR) model of human memory. Since neural machine translation models have evolved to optimize task performance, their convergence with human memory models provides a deeper understanding of the functional role of context in human memory, as well as presenting new ways to model human memory. Leveraging this convergence, we implement a neural machine translation model as a cognitive model of human memory search that is both interpretable and capable of capturing complex dynamics of learning. We show that our model accounts for both averaged and optimal human behavioral patterns as effectively as context-based memory models. Further, we demonstrate additional strengths of the proposed model by evaluating how memory search performance emerges from the interaction of different model components.
nan
Article 1016
Title@2025-06-20 (5): UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making
Title: UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making | UProp: Untersuchung der Unsicherheitsausbreitung von LLMs in mehrstufiger agentischer Entscheidungsfindung | UPROP:调查多级制剂决策中LLMs的不确定性传播情况 2506.17419v1 |
Authors (6): Jinhao Duan, James Diffenderfer, Sandeep Madireddy, Tianlong Chen, Bhavya Kailkhura, Kaidi Xu
As Large Language Models (LLMs) are integrated into safety-critical applications involving sequential decision-making in the real world, it is essential to know when to trust LLM decisions. Existing LLM Uncertainty Quantification (UQ) methods are primarily designed for single-turn question-answering formats, leaving multi-step decision-making scenarios, e.g., LLM agentic systems, underexplored. In this paper, we introduce a principled, information-theoretic framework that decomposes LLM sequential decision uncertainty into two parts: (i) internal uncertainty intrinsic to the current decision, which is the focus of existing UQ methods, and (ii) extrinsic uncertainty, a Mutual-Information (MI) quantity describing how much uncertainty should be inherited from preceding decisions. We then propose UProp, an efficient and effective extrinsic uncertainty estimator that converts the direct estimation of MI to the estimation of Pointwise Mutual Information (PMI) over multiple Trajectory-Dependent Decision Processes (TDPs). UProp is evaluated over extensive multi-step decision-making benchmarks, e.g., AgentBench and HotpotQA, with state-of-the-art LLMs, e.g., GPT-4.1 and DeepSeek-V3. Experimental results demonstrate that UProp significantly outperforms existing single-turn UQ baselines equipped with thoughtful aggregation strategies. Moreover, we provide a comprehensive analysis of UProp, including sampling efficiency, potential applications, and intermediate uncertainty propagation, to demonstrate its effectiveness. Code will be available at https://github.com/jinhaoduan/UProp.
nan
Article 1017
Title@2025-06-20 (5): Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble
Title: Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble | Zweite Meinungsfrage: Auf dem Weg zu adaptiver klinischer KI über den Konsens des Expert Model Ensembles | 第二意见事项:通过专家示范组共识实现适应性临床AI 2505.23075v2 |
Authors (9): Amit Kumthekar, Zion Tilley, Henry Duong, Bhargav Patel, Michael Magnoli, Ahmed Omar, Ahmed Nasser, Chaitanya Gharpure, Yevgen Reztzov
Despite the growing clinical adoption of large language models (LLMs), current approaches heavily rely on single model architectures. To overcome risks of obsolescence and rigid dependence on single model systems, we present a novel framework, termed the Consensus Mechanism. Mimicking clinical triage and multidisciplinary clinical decision-making, the Consensus Mechanism implements an ensemble of specialized medical expert agents enabling improved clinical decision making while maintaining robust adaptability. This architecture enables the Consensus Mechanism to be optimized for cost, latency, or performance, purely based on its interior model configuration. To rigorously evaluate the Consensus Mechanism, we employed three medical evaluation benchmarks: MedMCQA, MedQA, and MedXpertQA Text, and the differential diagnosis dataset, DDX+. On MedXpertQA, the Consensus Mechanism achieved an accuracy of 61.0% compared to 53.5% and 45.9% for OpenAI’s O3 and Google’s Gemini 2.5 Pro. Improvement was consistent across benchmarks with an increase in accuracy on MedQA ($\Delta\mathrm{Accuracy}_{\mathrm{consensus\text{-}O3}} = 3.4\%$) and MedMCQA ($\Delta\mathrm{Accuracy}_{\mathrm{consensus\text{-}O3}} = 9.1\%$). These accuracy gains extended to differential diagnosis generation, where our system demonstrated improved recall and precision ($\mathrm{F1}_{\mathrm{consensus}} = 0.326$ vs. $\mathrm{F1}_{\mathrm{O3\text{-}high}} = 0.2886$) and a higher top-1 accuracy for DDX ($\mathrm{Top1}_{\mathrm{consensus}} = 52.0\%$ vs. $\mathrm{Top1}_{\mathrm{O3\text{-}high}} = 45.2\%$).
nan
Article 1018
Title@2025-06-20 (5): Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning
Title: Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning | Das kostenlose Mittagessen stehlen: Die Grenzen des Dyna-Style-Verstärkungslernens aufzeigen | 偷免费午餐:暴露妇产科强化学习的极限 2412.14312v3 |
Authors (2): Brett Barkley, David Fridovich-Keil
Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observations. We show that, while DMBRL algorithms perform well in OpenAI Gym, their performance can drop significantly in DeepMind Control Suite (DMC), even though these settings offer similar tasks and identical physics backends. Modern techniques designed to address several key issues that arise in these settings do not provide a consistent improvement across all environments, and overall our results show that adding synthetic rollouts to the training process – the backbone of Dyna-style algorithms – significantly degrades performance across most DMC environments. Our findings contribute to a deeper understanding of several fundamental challenges in model-based RL and show that, like many optimization fields, there is no free lunch when evaluating performance across diverse benchmarks in RL.
nan
Article 1019
Title@2025-06-20 (5): Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?
Title: Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? | Aha Moment Revisited: Sind VLMs wirklich in der Lage, Selbstverifizierung in Folgezeit Scaling? | 重新审视动态:在推论-时间尺度方面,VLMs是否真正有能力进行自我核查? 2506.17417v1 |
Authors (8): Mingyuan Wu, Meitang Li, Jingcheng Yang, Jize Jiang, Kaizhuo Yan, Zhaoheng Li, Minjia Zhang, Klara Nahrstedt
Recent advances in large language models (LLMs) have demonstrated that inference-time computation techniques, such as decoding-time scaling and self-refinement, can significantly enhance reasoning capabilities without relying on external knowledge. A key driver of this success is the emergence of self-correction and self-verification behaviors, often elicited through reinforcement learning (RL). In this paper, we investigate whether these inference-time techniques extend effectively to vision-language models (VLMs), particularly those trained with RL. We find that while decoding strategies such as majority voting and best-of-N selection with self-verification all improve VLM reasoning performance, generation-reliant methods such as the former achieve significantly higher gains than verification-reliant methods such as the latter. Additionally, the self-correction behavior often associated with RL-tuned models, such as the "aha moment", does not lead to measurable gains. Through extensive experimentation within the inference-time scaling framework, we identify a key root cause: RL-trained VLMs still lack robust self-verification capabilities across both visual and textual modalities.
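The two decoding strategies contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the sampled answers and the `self_scores` dictionary are made-up stand-ins for model samples and for a model's self-verification scores.

```python
from collections import Counter


def majority_vote(answers):
    """Generation-reliant: return the most frequent sampled answer."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, score_fn):
    """Verification-reliant: return the answer the verifier scores highest."""
    return max(answers, key=score_fn)


# Hypothetical samples for one question; "42" is taken to be correct.
samples = ["42", "41", "42", "17"]

# Hypothetical self-verification scores that happen to prefer a wrong answer.
self_scores = {"42": 0.3, "41": 0.9, "17": 0.1}

print(majority_vote(samples))                           # "42"
print(best_of_n(samples, lambda a: self_scores[a]))     # "41"
```

The toy scores are chosen so that an unreliable verifier overrides a correct majority, which is the failure mode the abstract attributes to weak self-verification.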
nan
Article 1020
Title@2025-06-20 (5): Adaptive Control Attention Network for Underwater Acoustic Localization and Domain Adaptation
Title: Adaptive Control Attention Network for Underwater Acoustic Localization and Domain Adaptation | Adaptive Steuerung Aufmerksamkeit Netzwerk für Unterwasser-akustische Lokalisierung und Domain-Anpassung | 水下声传本土化和域域改造适应性控制关注网络 2506.17409v1 |
Authors (4): Quoc Thinh Vo, Joe Woods, Priontu Chowdhury, David K. Han
Localizing acoustic sound sources in the ocean is a challenging task due to the complex and dynamic nature of the environment. Factors such as high background noise, irregular underwater geometries, and varying acoustic properties make accurate localization difficult. To address these obstacles, we propose a multi-branch network architecture designed to accurately predict the distance between a moving acoustic source and a receiver, tested on real-world underwater signal arrays. The network leverages Convolutional Neural Networks (CNNs) for robust spatial feature extraction and integrates Conformers with a self-attention mechanism to effectively capture temporal dependencies. Log-mel spectrogram and generalized cross-correlation with phase transform (GCC-PHAT) features are employed as input representations. To further enhance model performance, we introduce an Adaptive Gain Control (AGC) layer that adaptively adjusts the amplitude of input features, ensuring consistent energy levels across varying ranges, signal strengths, and noise conditions. We assess the model’s generalization capability by training it in one domain and testing it in a different domain, using only a limited amount of data from the test domain for fine-tuning. Our proposed method outperforms state-of-the-art (SOTA) approaches in similar settings, establishing new benchmarks for underwater sound localization.
nan
Article 1021
Title@2025-06-20 (5): Zero-Shot NAS via the Suppression of Local Entropy Decrease
Title: Zero-Shot NAS via the Suppression of Local Entropy Decrease | Zero-Shot NAS durch die Unterdrückung der lokalen Entropie-Verringerung | 通过制止局部星气减少,零热NAS 2411.06236v3 |
Authors (4): Ning Wu, Han Huang, Yueting Xu, Zhifeng Hao
Architecture performance evaluation is the most time-consuming part of neural architecture search (NAS). Zero-Shot NAS accelerates the evaluation by utilizing zero-cost proxies instead of training. Though effective, existing zero-cost proxies require invoking backpropagations or running networks on input data, making it difficult to further accelerate the computation of proxies. To alleviate this issue, architecture topologies are used to evaluate the performance of networks in this study. We prove that particular architectural topologies decrease the local entropy of feature maps, which degrades specific features to a bias, thereby reducing network performance. Based on this proof, architectural topologies are utilized to quantify the suppression of local entropy decrease (SED) as a data-free and running-free proxy. Experimental results show that SED outperforms most state-of-the-art proxies in terms of architecture selection on five benchmarks, with computation time reduced by three orders of magnitude. We further compare the SED-based NAS with state-of-the-art proxies. SED-based NAS selects the architecture with higher accuracy and fewer parameters in only one second. The theoretical analyses of local entropy and experimental results demonstrate that the suppression of local entropy decrease facilitates selecting optimal architectures in Zero-Shot NAS.
nan
Article 1022
Title@2025-06-20 (5): Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting
Title: Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting | Part$^{2}$GS: Teilbewusste Modellierung von artikulierten Objekten mittels 3D Gaussian Splatting | *2美元=2美元=GS:使用 3D Gaussian Splatting 3D 的人工物体部分认知建模 2506.17212v1 |
Authors (6): Tianjiao Yu, Vedant Shah, Muntasir Wahed, Ying Shen, Kiet A. Nguyen, Ismini Lourentzou
Articulated objects are common in the real world, yet modeling their structure and motion remains a challenging task for 3D reconstruction methods. In this work, we introduce Part$^{2}$GS, a novel framework for modeling articulated digital twins of multi-part objects with high-fidelity geometry and physically consistent articulation. Part$^{2}$GS leverages a part-aware 3D Gaussian representation that encodes articulated components with learnable attributes, enabling structured, disentangled transformations that preserve high-fidelity geometry. To ensure physically consistent motion, we propose a motion-aware canonical representation guided by physics-based constraints, including contact enforcement, velocity consistency, and vector-field alignment. Furthermore, we introduce a field of repel points to prevent part collisions and maintain stable articulation paths, significantly improving motion coherence over baselines. Extensive evaluations on both synthetic and real-world datasets show that Part$^{2}$GS consistently outperforms state-of-the-art methods by up to 10$\times$ in Chamfer Distance for movable parts.
nan
Article 1023
Title@2025-06-20 (5): BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
Title: BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning | BREAD: Verzweigte Rollouts von Expert Anchors Bridge SFT & RL für die Vernunft | 专家领航桥SFT和RL的分包推演 2506.17211v1 |
Authors (6): Xuechen Zhang, Zijian Huang, Yingcong Li, Chenshun Ni, Jiasi Chen, Samet Oymak
Small language models (SLMs) struggle to learn complex reasoning behaviors, especially when high-quality traces are scarce or difficult to learn from. The standard training approach combines a supervised fine-tuning (SFT) stage, often to distill capabilities of a larger model, followed by a reinforcement learning (RL) stage such as Group Relative Policy Optimization (GRPO). In this paper, we investigate the fundamental limitations of this SFT + RL paradigm and propose methods to overcome them. Under a suitable theoretical model, we demonstrate that the SFT + RL strategy can fail completely when (1) the expert’s traces are too difficult for the small model to express, or (2) the small model’s initialization has exponentially small likelihood of success. To address these, we introduce BREAD: a GRPO variant that unifies the SFT and RL stages via partial expert guidance and branched rollouts. When self-generated traces fail, BREAD adaptively inserts short expert prefixes/hints, allowing the small model to complete the rest of the reasoning path, and ensuring that each update includes at least one successful trace. This mechanism both densifies the reward signal and induces a natural learning curriculum. BREAD requires fewer than 40% of ground-truth traces, consistently outperforming standard GRPO while speeding up the training by about 3 times. Importantly, we demonstrate that BREAD helps the model solve problems that are otherwise unsolvable by the SFT + RL strategy, highlighting how branched rollouts and expert guidance can substantially boost SLM reasoning.
nan
Article 1024
Title@2025-06-20 (5): AQA-Bench: An Interactive Benchmark for Evaluating LLMs’ Sequential Reasoning Ability
Title: AQA-Bench: An Interactive Benchmark for Evaluating LLMs’ Sequential Reasoning Ability | AQA-Bench: Ein interaktiver Benchmark für die Bewertung der sequenziellen Begründungsfähigkeit von LLMs | AQA- “ AQA-区 “ :评估LLLMs按顺序推理能力的互动基准 2402.09404v2 |
Authors (3): Siwei Yang, Bingchen Zhao, Cihang Xie
This paper introduces AQA-Bench, a novel benchmark to assess the sequential reasoning capabilities of large language models (LLMs) in algorithmic contexts, such as depth-first search (DFS). The key feature of our evaluation benchmark lies in its interactive evaluation protocol - for example, in DFS, the availability of each node’s connected edge is contingent upon the model’s traversal to that node, thereby necessitating the LLM’s ability to effectively remember visited nodes and strategize subsequent moves considering the possible environmental feedback in future steps. We build AQA-Bench with three different algorithms, namely binary search, depth-first search, and breadth-first search, and evaluate the sequential reasoning ability of 14 different LLMs. Our investigations reveal several interesting findings: (1) Closed-source models like GPT-4 and Gemini generally show much stronger sequential reasoning ability, significantly outperforming open-source LLMs. (2) Naively providing in-context examples may inadvertently hurt few-shot performance in an interactive environment due to over-fitting to examples. (3) Instead of using optimal steps from another test case as the in-context example, a very limited number of predecessor steps in the current test case following the optimal policy can substantially boost small models’ performance. (4) The performance gap between weak models and strong models is largely due to the inability of weak models to start well. (5) The scaling correlation between performance and model size is not always significant, sometimes even showcasing an inverse trend. We hope our study can catalyze future work on advancing the understanding and enhancement of LLMs’ capabilities in sequential reasoning. The code is available at https://github.com/UCSC-VLAA/AQA-Bench.
nan
Article 1025
Title@2025-06-20 (5): DreamCube: 3D Panorama Generation via Multi-plane Synchronization
Title: DreamCube: 3D Panorama Generation via Multi-plane Synchronization | DreamCube: 3D-Panorama-Generation über Multi-Plane-Synchronisierung | DreamCube:3D全景生成,通过多飞机同步同步 2506.17206v1 |
Authors (5): Yukun Huang, Yanning Zhou, Jianan Wang, Kaiyi Huang, Xihui Liu
3D panorama synthesis is a promising yet challenging task that demands high-quality and diverse visual appearance and geometry of the generated omnidirectional content. Existing methods leverage rich image priors from pre-trained 2D foundation models to circumvent the scarcity of 3D panoramic data, but the incompatibility between 3D panoramas and 2D single views limits their effectiveness. In this work, we demonstrate that by applying multi-plane synchronization to the operators from 2D foundation models, their capabilities can be seamlessly extended to the omnidirectional domain. Based on this design, we further introduce DreamCube, a multi-plane RGB-D diffusion model for 3D panorama generation, which maximizes the reuse of 2D foundation model priors to achieve diverse appearances and accurate geometry while maintaining multi-view consistency. Extensive experiments demonstrate the effectiveness of our approach in panoramic image generation, panoramic depth estimation, and 3D scene generation.
nan
Article 1026
Title@2025-06-20 (5): Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
Title: Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning | Netzwerksparsität entsperrt das Skalierungspotenzial von Deep Reinforcement Learning | 网络分化 释放深强化学习的扩大潜力 2506.17204v1 |
Authors (6): Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao
Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.
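The one-shot random pruning described above can be sketched as a fixed binary mask drawn once before training. This is a minimal illustration under stated assumptions: the flat weight list, the function names, and the seed handling are hypothetical, and reapplying the mask after every update is one common way to keep sparsity static rather than a detail confirmed by the abstract.

```python
import random


def random_prune_mask(n_weights, sparsity, seed=0):
    """Draw a fixed 0/1 mask that removes a `sparsity` fraction of weights."""
    rng = random.Random(seed)
    n_pruned = int(round(sparsity * n_weights))
    pruned = set(rng.sample(range(n_weights), n_pruned))
    return [0 if i in pruned else 1 for i in range(n_weights)]


def apply_mask(weights, mask):
    """Zero out pruned weights; call again after each update to keep them at zero."""
    return [w * m for w, m in zip(weights, mask)]


# Prune 90% of a toy layer with 1000 weights, once, before training.
mask = random_prune_mask(1000, sparsity=0.9, seed=0)
weights = apply_mask([1.0] * 1000, mask)
print(sum(mask))  # number of weights that remain trainable
```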
nan
Article 1027
Title@2025-06-20 (5): DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandit Environments
Title: DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandit Environments | DAL: Ein praktisches Prior-Free Black-Box Framework für nicht-stationäre Bandit-Umgebungen | DAL:非高度强盗环境实际的、事先免费的黑盒框架 2501.19401v3 |
Authors (4): Argyrios Gerogiannis, Yu-Han Huang, Subhonmesh Bose, Venugopal V. Veeravalli
We introduce a practical, black-box framework termed Detection Augmenting Learning (DAL) for the problem of non-stationary bandits without prior knowledge of the underlying non-stationarity. DAL is modular, accepting any stationary bandit algorithm as input and augmenting it with a change detector. Our approach is applicable to all common parametric and non-parametric bandit variants. Extensive experimentation demonstrates that DAL consistently surpasses current state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We provide theoretical insights into DAL’s strong empirical performance on piecewise stationary and drift settings, complemented by thorough experimental validation.
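The modular structure described above, a stationary bandit algorithm augmented with a change detector, can be sketched as a thin wrapper. Everything below is a hypothetical illustration and not the DAL algorithm itself: UCB1 stands in for "any stationary bandit algorithm", the two-window mean comparison stands in for the change detector, and the restart-on-detection rule is an assumption of this sketch.

```python
import math


class UCB1:
    """A standard stationary bandit algorithm, used here as the pluggable input."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset()

    def reset(self):
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms

    def select(self):
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm  # play every arm once first
        t = sum(self.counts)
        return max(
            range(self.n_arms),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class WindowMeanDetector:
    """Hypothetical detector: flags a change when two adjacent windows' means differ."""

    def __init__(self, window=5, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.history = []

    def reset(self):
        self.history = []

    def update(self, x):
        self.history.append(x)
        if len(self.history) < 2 * self.window:
            return False
        recent = self.history[-self.window:]
        older = self.history[-2 * self.window:-self.window]
        gap = abs(sum(recent) - sum(older)) / self.window
        return gap > self.threshold


class DetectionAugmentedBandit:
    """Wraps a stationary bandit and restarts it when any arm's detector fires."""

    def __init__(self, bandit, n_arms, make_detector):
        self.bandit = bandit
        self.detectors = [make_detector() for _ in range(n_arms)]
        self.restarts = 0

    def select(self):
        return self.bandit.select()

    def update(self, arm, reward):
        self.bandit.update(arm, reward)
        if self.detectors[arm].update(reward):
            self.bandit.reset()
            for detector in self.detectors:
                detector.reset()
            self.restarts += 1


agent = DetectionAugmentedBandit(UCB1(2), 2, WindowMeanDetector)
```

Because the wrapper only calls `select`, `update`, and `reset`, any stationary algorithm exposing those three methods could be swapped in for `UCB1` in this sketch.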
nan
Article 1028
Title@2025-06-20 (5): Schrödinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres
Title: Schrödinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres | Schrödinger-Brücke passend für baumstrukturierte Kosten und entropische Wasserstein-Barycentres | Schrödinger桥,与树木结构成本和Entropic Wasserstein Barycentres 相匹配 2506.17197v1 |
Authors (3): Samuel Howard, Peter Potaptchik, George Deligiannidis
Recent advances in flow-based generative modelling have provided scalable methods for computing the Schrödinger Bridge (SB) between distributions, a dynamic form of entropy-regularised Optimal Transport (OT) for the quadratic cost. The successful Iterative Markovian Fitting (IMF) procedure solves the SB problem via sequential bridge-matching steps, presenting an elegant and practical approach with many favourable properties over the more traditional Iterative Proportional Fitting (IPF) procedure. Beyond the standard setting, optimal transport can be generalised to the multi-marginal case in which the objective is to minimise a cost defined over several marginal distributions. Of particular importance are costs defined over a tree structure, from which Wasserstein barycentres can be recovered as a special case. In this work, we extend the IMF procedure to solve for the tree-structured SB problem. Our resulting algorithm inherits the many advantages of IMF over IPF approaches in the tree-based setting. In the specific case of Wasserstein barycentres, our approach can be viewed as extending fixed-point approaches for barycentre computation to the case of flow-based entropic OT solvers.
nan
Article 1029
Title@2025-06-20 (5): Optimal Implicit Bias in Linear Regression
Title: Optimal Implicit Bias in Linear Regression | Optimale Implizite Bias bei linearer Regression | 线性回归中的优化隐含比值 2506.17187v1 |
Authors (2): Kanumuri Nithin Varma, Babak Hassibi
Most modern learning problems are over-parameterized, where the number of learnable parameters is much greater than the number of training data points. In this over-parameterized regime, the training loss typically has infinitely many global optima that completely interpolate the data with varying generalization performance. The particular global optimum we converge to depends on the implicit bias of the optimization algorithm. The question we address in this paper is, “What is the implicit bias that leads to the best generalization performance?” To find the optimal implicit bias, we provide a precise asymptotic analysis of the generalization performance of interpolators obtained from the minimization of convex functions/potentials for over-parameterized linear regression with non-isotropic Gaussian data. In particular, we obtain a tight lower bound on the best generalization error possible among this class of interpolators in terms of the over-parameterization ratio, the variance of the noise in the labels, the eigenspectrum of the data covariance, and the underlying distribution of the parameter to be estimated. Finally, we find the optimal convex implicit bias that achieves this lower bound under certain sufficient conditions involving the log-concavity of the distribution of a Gaussian convolved with the prior of the true underlying parameter.
nan
Article 1030
Title@2025-06-20 (5): Variational Learning of Disentangled Representations
Title: Variational Learning of Disentangled Representations | Variationelles Lernen von entfremdeten Repräsentationen | 不同代表的不同学习 2506.17182v1 |
Authors (4): Yuli Slavutsky, Ozgur Beker, David Blei, Bianca Dumitrascu
Disentangled representations enable models to separate factors of variation that are shared across experimental conditions from those that are condition-specific. This separation is essential in domains such as biomedical data analysis, where generalization to new treatments, patients, or species depends on isolating stable biological signals from context-dependent effects. While extensions of the variational autoencoder (VAE) framework have been proposed to address this problem, they frequently suffer from leakage between latent representations, limiting their ability to generalize to unseen conditions. Here, we introduce DISCoVeR, a new variational framework that explicitly separates condition-invariant and condition-specific factors. DISCoVeR integrates three key components: (i) a dual-latent architecture that models shared and specific factors separately; (ii) two parallel reconstructions that ensure both representations remain informative; and (iii) a novel max-min objective that encourages clean separation without relying on handcrafted priors, while making only minimal assumptions. Theoretically, we show that this objective maximizes data likelihood while promoting disentanglement, and that it admits a unique equilibrium. Empirically, we demonstrate that DISCoVeR achieves improved disentanglement on synthetic datasets, natural images, and single-cell RNA-seq data. Together, these results establish DISCoVeR as a principled approach for learning disentangled representations in multi-condition settings.
nan
Article 1031
Title@2025-06-20 (5): Convergent Linear Representations of Emergent Misalignment
Title: Convergent Linear Representations of Emergent Misalignment | Convergent Lineare Darstellungen von Emergent Fehlausrichtung | 新出现的对接不均现象的一致线性代表 2506.11618v2 |
Authors (4): Anna Soligo, Edward Turner, Senthooran Rajamanoharan, Neel Nanda
Fine-tuning large language models on narrow datasets can cause them to develop broadly misaligned behaviours: a phenomenon known as emergent misalignment. However, the mechanisms underlying this misalignment, and why it generalizes beyond the training domain, are poorly understood, demonstrating critical gaps in our knowledge of model alignment. In this work, we train and study a minimal model organism which uses just 9 rank-1 adapters to emergently misalign Qwen2.5-14B-Instruct. Studying this, we find that different emergently misaligned models converge to similar representations of misalignment. We demonstrate this convergence by extracting a ‘misalignment direction’ from one fine-tuned model’s activations, and using it to effectively ablate misaligned behaviour from fine-tunes using higher dimensional LoRAs and different datasets. Leveraging the scalar hidden state of rank-1 LoRAs, we further present a set of experiments for directly interpreting the fine-tuning adapters, showing that six contribute to general misalignment, while two specialise for misalignment in just the fine-tuning domain. Emergent misalignment is a particularly salient example of undesirable and unexpected model behaviour and by advancing our understanding of the mechanisms behind it, we hope to move towards being able to better understand and mitigate misalignment more generally.
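The idea of extracting a direction from activations and ablating it can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the direction is a difference of mean activations between two sets of samples, and that ablation means projecting that direction out of each hidden state. The function names and the synthetic activations are hypothetical.

```python
import numpy as np

def mean_difference_direction(acts_a, acts_b):
    """Unit vector pointing from the mean of acts_b to the mean of acts_a."""
    direction = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return direction / np.linalg.norm(direction)

def ablate_direction(hidden, direction):
    """Remove the component of each hidden state along a unit direction."""
    return hidden - np.outer(hidden @ direction, direction)

# Synthetic stand-ins for residual-stream activations (assumption: 16 dims).
rng = np.random.default_rng(0)
d_model = 16
shift = np.zeros(d_model)
shift[3] = 4.0                                   # planted offset along one axis
misaligned_acts = rng.standard_normal((200, d_model)) + shift
aligned_acts = rng.standard_normal((200, d_model))

direction = mean_difference_direction(misaligned_acts, aligned_acts)
ablated = ablate_direction(misaligned_acts, direction)
```

After ablation every hidden state has zero component along the extracted direction, while the components orthogonal to it are unchanged.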
Article 1032
Title@2025-06-20 (5): Deep generative models as the probability transformation functions
Title: Deep generative models as the probability transformation functions | Tiefe generative Modelle als die Wahrscheinlichkeitstransformationsfunktionen | 深基因模型作为概率转换功能 2506.17171v1 |
Authors (5): Vitalii Bondar, Vira Babenko, Roman Trembovetskyi, Yurii Korobeinyk, Viktoriya Dzyuba
This paper introduces a unified theoretical perspective that views deep generative models as probability transformation functions. Despite the apparent differences in architecture and training methodologies among various types of generative models - autoencoders, autoregressive models, generative adversarial networks, normalizing flows, diffusion models, and flow matching - we demonstrate that they all fundamentally operate by transforming simple predefined distributions into complex target data distributions. This unifying perspective facilitates the transfer of methodological improvements between model architectures and provides a foundation for developing universal theoretical approaches, potentially leading to more efficient and effective generative modeling techniques.
Article 1033
Title@2025-06-20 (5): EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback
Title: EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback | EF21 mit Glocken & Pfeifen: Sechs algorithmische Erweiterungen des modernen Fehlerrückblicks | EF21 配有 “ 钟声和吹哨:现代错误反馈的六种演算扩展 2110.03294v2 |
Authors (5): Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators. However, existing theory of EF relies on very strong assumptions (e.g., bounded gradients), and provides pessimistic convergence rates (e.g., while the best known rate for EF in the smooth nonconvex regime, and when full gradients are compressed, is $O(1/T^{2/3})$, the rate of gradient descent in the same regime is $O(1/T)$). Recently, Richtárik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. EF21 removes the aforementioned theoretical deficiencies of EF and at the same time works better in practice. In this work we propose six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum, and bidirectional compression. To the best of our knowledge, several of these techniques have not been previously analyzed in combination with EF, and in cases where prior analysis exists – such as for bidirectional compression – our theoretical convergence guarantees significantly improve upon existing results.
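The basic EF21 update that these extensions build on can be sketched as below. This is a single-process simulation of the plain method only (none of the six extensions), assuming a Top-k contractive compressor and simple quadratic local objectives; the function names, step size, and problem sizes are illustrative choices. Each worker keeps a gradient estimate and communicates only a compressed correction to it.

```python
import numpy as np

def top_k(v, k):
    """Contractive Top-k compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21(grad_fns, x0, step, k, iters):
    """Plain EF21: workers send compressed differences to their estimates."""
    x = x0.copy()
    g_local = [top_k(grad(x), k) for grad in grad_fns]   # initial estimates
    for _ in range(iters):
        g = np.mean(g_local, axis=0)          # server averages the estimates
        x = x - step * g                      # gradient-type step
        # Each worker compresses the gap between its true gradient and its
        # current estimate, then adds the compressed gap to the estimate.
        g_local = [g_i + top_k(grad(x) - g_i, k)
                   for grad, g_i in zip(grad_fns, g_local)]
    return x

# Assumed toy problem: f_i(x) = 0.5 * ||x - b_i||^2, minimized at mean(b_i).
rng = np.random.default_rng(0)
b = rng.standard_normal((5, 10))
grad_fns = [lambda x, b_i=b_i: x - b_i for b_i in b]
x_final = ef21(grad_fns, x0=np.zeros(10), step=0.05, k=2, iters=1000)
```

On this toy problem the iterate approaches the average of the `b_i` even though each worker transmits only 2 of 10 coordinates per round.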
Article 1034
Title@2025-06-20 (5): A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models
Title: A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models | Minimalistische Methode zur Feinabstimmung von Text-zu-Bild-Diffusions-Modellen | 微微调文本到图像传播模型的微量微调方法 2506.12036v2 |
Authors (4): Yanting Miao, William Loh, Suraj Kothawade, Pacal Poupart
Recent work uses reinforcement learning (RL) to fine-tune text-to-image diffusion models, improving text-image alignment and sample quality. However, existing approaches introduce unnecessary complexity: they cache the full sampling trajectory, depend on differentiable reward models or large preference datasets, or require specialized guidance techniques. Motivated by the “golden noise” hypothesis – that certain initial noise samples can consistently yield superior alignment – we introduce Noise PPO, a minimalist RL algorithm that leaves the pre-trained diffusion model entirely frozen and learns a prompt-conditioned initial noise generator. Our approach requires no trajectory storage, reward backpropagation, or complex guidance tricks. Extensive experiments show that optimizing the initial noise distribution consistently improves alignment and sample quality over the original model, with the most significant gains at low inference steps. As the number of inference steps increases, the benefit of noise optimization diminishes but remains present. These findings clarify the scope and limitations of the golden noise hypothesis and reinforce the practical value of minimalist RL fine-tuning for diffusion models.
Article 1035
Title@2025-06-20 (5): Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity
Title: Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Sparse-Reg: Verbesserung der Probenkomplexität im Offline-Verstärkungs-Lernen mit Sparsity | 利用公平性改进离线强化学习的抽样复杂性 2506.17155v1 |
Authors (3): Samin Yeasar Arnob, Scott Fujimoto, Doina Precup
In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce “Sparse-Reg”: a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.
Article 1036
Title@2025-06-20 (5): Do We Need Large VLMs for Spotting Soccer Actions?
Title: Do We Need Large VLMs for Spotting Soccer Actions? | Brauchen wir große VLMs zum Spotting von Fußball-Aktionen? | 我们是否需要大VLMs来发现足球行动? 2506.17144v1 |
Authors (4): Ritabrata Chakraborty, Rajatsubhra Chakraborty, Avijit Dasgupta, Sandeep Chaurasia
Traditional video-based tasks like soccer action spotting rely heavily on visual inputs, often requiring complex and computationally expensive models to process dense video data. In this work, we propose a shift from this video-centric approach to a text-based task, making it lightweight and scalable by utilizing Large Language Models (LLMs) instead of Vision-Language Models (VLMs). We posit that expert commentary, which provides rich, fine-grained descriptions and contextual cues such as excitement and tactical insights, contains enough information to reliably spot key actions in a match. To demonstrate this, we use the SoccerNet Echoes dataset, which provides timestamped commentary, and employ a system of three LLMs acting as judges specializing in outcome, excitement, and tactics. Each LLM evaluates sliding windows of commentary to identify actions like goals, cards, and substitutions, generating accurate timestamps for these events. Our experiments show that this language-centric approach performs effectively in detecting critical match events, providing a lightweight and training-free alternative to traditional video-based methods for action spotting.
Article 1037
Title@2025-06-20 (5): Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models
Title: Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models | Konsequente Probenahme und Simulation: Molekulare Dynamik mit energiebasierten Diffusionsmodellen | 一致的取样和模拟:以能源为基础的扩散模型的分子动态 2506.17139v1 |
Authors (5): Michael Plainer, Hao Wu, Leon Klein, Stephan Günnemann, Frank Noé
Diffusion models have recently gained significant attention due to their effectiveness in various scientific domains, including biochemistry. When trained on equilibrium molecular distributions, diffusion models provide both a generative procedure to sample equilibrium conformations and associated forces derived from the model’s scores. However, using the forces for coarse-grained molecular dynamics simulations uncovers inconsistencies in the samples generated via classical diffusion inference and simulation, despite both originating from the same model. Particularly at the small diffusion timesteps required for simulations, diffusion models fail to satisfy the Fokker-Planck equation, which governs how the score should evolve over time. We interpret this deviation as an indication of the observed inconsistencies and propose an energy-based diffusion model with a Fokker-Planck-derived regularization term enforcing consistency. We demonstrate the effectiveness of our approach on toy systems and alanine dipeptide, and introduce a state-of-the-art transferable Boltzmann emulator for dipeptides that supports simulation and demonstrates enhanced consistency and efficient sampling.
Article 1038
Title@2025-06-20 (5): Robust Training with Data Augmentation for Medical Imaging Classification
Title: Robust Training with Data Augmentation for Medical Imaging Classification | Robustes Training mit Datenvergrößerung für die Klassifikation der medizinischen Bildgebung | 医学成像分类数据强化强力培训 2506.17133v1 |
Authors (4): Josué Martínez-Martínez, Olivia Brown, Mostafa Karami, Sheida Nabavi
Deep neural networks are increasingly being used to detect and diagnose medical conditions using medical imaging. Despite their utility, these models are highly vulnerable to adversarial attacks and distribution shifts, which can affect diagnostic reliability and undermine trust among healthcare professionals. In this study, we propose a robust training algorithm with data augmentation (RTDA) to mitigate these vulnerabilities in medical image classification. We benchmark classifier robustness against adversarial perturbations and natural variations of RTDA and six competing baseline techniques, including adversarial training and data augmentation approaches in isolation and combination, using experimental data sets with three different imaging technologies (mammograms, X-rays, and ultrasound). We demonstrate that RTDA achieves superior robustness against adversarial attacks and improved generalization performance in the presence of distribution shift in each image classification task while maintaining high clean accuracy.
Article 1039
Title@2025-06-20 (5): Rapid and Continuous Trust Evaluation for Effective Task Collaboration Through Siamese Model
Title: Rapid and Continuous Trust Evaluation for Effective Task Collaboration Through Siamese Model | Schnelle und kontinuierliche Vertrauensbewertung für effektive Aufgabenkooperation durch Siamesisches Modell | 通过西亚模式对有效任务协作进行快速和持续信任评价 2506.17128v1 |
Authors (2): Botao Zhu, Xianbin Wang
Trust is emerging as an effective tool to ensure the successful completion of collaborative tasks within collaborative systems. However, rapidly and continuously evaluating the trustworthiness of collaborators during task execution is a significant challenge due to distributed devices, complex operational environments, and dynamically changing resources. To tackle this challenge, this paper proposes a Siamese-enabled rapid and continuous trust evaluation framework (SRCTE) to facilitate effective task collaboration. First, the communication and computing resource attributes of the collaborator in a trusted state, along with historical collaboration data, are collected and represented using an attributed control flow graph (ACFG) that captures trust-related semantic information and serves as a reference for comparison with data collected during task execution. At each time slot of task execution, the collaborator’s communication and computing resource attributes, as well as task completion effectiveness, are collected in real time and represented with an ACFG to convey their trust-related semantic information. A Siamese model, consisting of two shared-parameter Structure2vec networks, is then employed to learn the deep semantics of each pair of ACFGs and generate their embeddings. Finally, the similarity between the embeddings of each pair of ACFGs is calculated to determine the collaborator’s trust value at each time slot. A real system is built using two Dell EMC 5200 servers and a Google Pixel 8 to test the effectiveness of the proposed SRCTE framework. Experimental results demonstrate that SRCTE converges rapidly with only a small amount of data and achieves a high anomaly trust detection rate compared to the baseline algorithm.
Article 1040
Title@2025-06-20 (5): Watermarking Language Models through Language Models
Title: Watermarking Language Models through Language Models | Wasserzeichen von Sprachmodellen durch Sprachmodelle | 通过语言模型建立语言模型 2411.05091v2 |
Authors (3): Agnibh Dasgupta, Abdullah Tanvir, Xin Zhong
Watermarking the outputs of large language models (LLMs) is critical for provenance tracing, content regulation, and model accountability. Existing approaches often rely on access to model internals or are constrained by static rules and token-level perturbations. Moreover, the idea of steering generative behavior via prompt-based instruction control remains largely underexplored. We introduce a prompt-guided watermarking framework that operates entirely at the input level and requires no access to model parameters or decoding logits. The framework comprises three cooperating components: a Prompting LM that synthesizes watermarking instructions from user prompts, a Marking LM that generates watermarked outputs conditioned on these instructions, and a Detecting LM trained to classify whether a response carries an embedded watermark. This modular design enables dynamic watermarking that adapts to individual prompts while remaining compatible with diverse LLM architectures, including both proprietary and open-weight models. We evaluate the framework over 25 combinations of Prompting and Marking LMs, such as GPT-4o, Mistral, LLaMA3, and DeepSeek. Experimental results show that watermark signals generalize across architectures and remain robust under fine-tuning, model distillation, and prompt-based adversarial attacks, demonstrating the effectiveness and robustness of the proposed approach.
Article 1041
Title@2025-06-20 (5): TransDreamerV3: Implanting Transformer In DreamerV3
Title: TransDreamerV3: Implanting Transformer In DreamerV3 | TransDreamerV3: Implantationstransformator in DreamerV3 | TransDreamerV3: 在梦中植入变异器 2506.17103v1 |
Authors (4): Shruti Sadanand Dongare, Amun Kharel, Jonathan Samuel, Xiaona Zhou
This paper introduces TransDreamerV3, a reinforcement learning model that enhances the DreamerV3 architecture by integrating a transformer encoder. The model is designed to improve memory and decision-making capabilities in complex environments. We conducted experiments on Atari-Boxing, Atari-Freeway, Atari-Pong, and Crafter tasks, where TransDreamerV3 demonstrated improved performance over DreamerV3, particularly in the Atari-Freeway and Crafter tasks. While issues in the Minecraft task and limited training across all tasks were noted, TransDreamerV3 displays advancement in world model-based reinforcement learning, leveraging transformer architectures.
Article 1042
Title@2025-06-20 (5): Identifiability of Deep Polynomial Neural Networks
Title: Identifiability of Deep Polynomial Neural Networks | Identifizierbarkeit von tiefpolynomischen neuralen Netzwerken | 深多元神经网络的可识别性 2506.17093v1 |
Authors (4): Konstantin Usevich, Clara Dérand, Ricardo Borsoi, Marianne Clausel
Polynomial Neural Networks (PNNs) possess a rich algebraic and geometric structure. However, their identifiability – a key property for ensuring interpretability – remains poorly understood. In this work, we present a comprehensive analysis of the identifiability of deep PNNs, including architectures with and without bias terms. Our results reveal an intricate interplay between activation degrees and layer widths in achieving identifiability. As special cases, we show that architectures with non-increasing layer widths are generically identifiable under mild conditions, while encoder-decoder networks are identifiable when the decoder widths do not grow too rapidly. Our proofs are constructive and center on a connection between deep PNNs and low-rank tensor decompositions, and Kruskal-type uniqueness theorems. This yields both generic conditions determined by the architecture, and effective conditions that depend on the network’s parameters. We also settle an open conjecture on the expected dimension of PNN’s neurovarieties, and provide new bounds on the activation degrees required for it to reach its maximum.
Article 1043
Title@2025-06-20 (5): Domain Specific Benchmarks for Evaluating Multimodal Large Language Models
Title: Domain Specific Benchmarks for Evaluating Multimodal Large Language Models | Domainspezifische Benchmarks für die Bewertung multimodaler Großsprachenmodelle | 评价多模式大语言模式的具体域域基准 2506.12958v2 |
Authors (13): Khizar Anjum, Muhammad Arbab Arshad, Kadhim Hayawi, Efstathios Polyzos, Asadullah Tariq, Mohamed Adel Serhani, Laiba Batool, Brady Lund, Nishith Reddy Mannuru, Ravi Varma Kumar Bevara, Taslim Mahbub, Muhammad Zeeshan Akram, Sakib Shahriar
Large language models (LLMs) are increasingly being deployed across disciplines due to their advanced reasoning and problem-solving capabilities. To measure their effectiveness, various benchmarks have been developed that measure aspects of LLM reasoning, comprehension, and problem-solving. While several surveys address LLM evaluation and benchmarks, a domain-specific analysis remains underexplored in the literature. This paper introduces a taxonomy of seven key disciplines, encompassing various domains and application areas where LLMs are extensively utilized. Additionally, we provide a comprehensive review of LLM benchmarks and survey papers within each domain, highlighting the unique capabilities of LLMs and the challenges faced in their application. Finally, we compile and categorize these benchmarks by domain to create an accessible resource for researchers, aiming to pave the way for advancements toward artificial general intelligence (AGI).
Article 1044
Title@2025-06-20 (5): Neural Polar Decoders for DNA Data Storage
Title: Neural Polar Decoders for DNA Data Storage | Neuronale Polardecoder für die DNA-Datenspeicherung | DNA数据存储的神经极极代号 2506.17076v1 |
Authors (2): Ziv Aharoni, Henry D. Pfister
Synchronization errors, such as insertions and deletions, present a fundamental challenge in DNA-based data storage systems, arising from both synthesis and sequencing noise. These channels are often modeled as insertion-deletion-substitution (IDS) channels, for which designing maximum-likelihood decoders is computationally expensive. In this work, we propose a data-driven approach based on neural polar decoders (NPDs) to design low-complexity decoders for channels with synchronization errors. The proposed architecture enables decoding over IDS channels with reduced complexity $O(AN \log N)$, where $A$ is a tunable parameter independent of the channel. NPDs require only sample access to the channel and can be trained without an explicit channel model. Additionally, NPDs provide mutual information (MI) estimates that can be used to optimize input distributions and code design. We demonstrate the effectiveness of NPDs on both synthetic deletion and IDS channels. For deletion channels, we show that NPDs achieve near-optimal decoding performance and accurate MI estimation, with significantly lower complexity than trellis-based decoders. We also provide numerical estimates of the channel capacity for the deletion channel. We extend our evaluation to realistic DNA storage settings, including channels with multiple noisy reads and real-world Nanopore sequencing data. Our results show that NPDs match or surpass the performance of existing methods while using significantly fewer parameters than the state-of-the-art. These findings highlight the promise of NPDs for robust and efficient decoding in DNA data storage systems.
Article 1045
Title@2025-06-20 (5): Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting
Title: Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting | Diffusion & Adversarial Schrödinger Brücken über iterative Proportionale Markovian Fitting | 通过迭代比例相称马尔科维安健身桥 2410.02601v3 |
Authors (9): Sergei Kholkin, Grigoriy Ksenofontov, David Li, Nikita Kornilov, Nikita Gushchin, Alexandra Suvorikova, Alexey Kroshnin, Evgeny Burnaev, Alexander Korotin
The Iterative Markovian Fitting (IMF) procedure, which iteratively projects onto the space of Markov processes and the reciprocal class, successfully solves the Schrödinger Bridge (SB) problem. However, an efficient practical implementation requires a heuristic modification - alternating between fitting forward and backward time diffusion at each iteration. This modification is crucial for stabilizing training and achieving reliable results in applications such as unpaired domain translation. Our work reveals a close connection between the modified version of IMF and the Iterative Proportional Fitting (IPF) procedure - a foundational method for the SB problem, also known as Sinkhorn’s algorithm. Specifically, we demonstrate that the heuristic modification of the IMF effectively integrates both IMF and IPF procedures. We refer to this combined approach as the Iterative Proportional Markovian Fitting (IPMF) procedure. Through theoretical and empirical analysis, we establish the convergence of the IPMF procedure under various settings, contributing to developing a unified framework for solving SB problems. Moreover, from a practical standpoint, the IPMF procedure enables a flexible trade-off between image similarity and generation quality, offering a new mechanism for tailoring models to specific tasks.
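For readers unfamiliar with IPF, the following sketch shows it in its simplest form: the discrete, static Sinkhorn iteration for entropic optimal transport. It is not the diffusion-based IPF or the IPMF procedure studied in the paper; the cost matrix, the regularization strength `eps`, and the marginals are assumptions chosen for illustration. The iteration alternately rescales a kernel so that the coupling matches one marginal, then the other.

```python
import numpy as np

def sinkhorn_ipf(cost, mu, nu, eps=0.5, iters=500):
    """Discrete IPF (Sinkhorn): alternately match the two marginals."""
    K = np.exp(-cost / eps)            # Gibbs kernel of the reference coupling
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)               # rescale rows to fit the first marginal
        v = nu / (K.T @ u)             # rescale columns to fit the second marginal
    return u[:, None] * K * v[None, :]

# Assumed toy problem: five points on [0, 1] with squared-distance cost.
points = np.linspace(0.0, 1.0, 5)
cost = (points[:, None] - points[None, :]) ** 2
mu = np.full(5, 0.2)
nu = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
plan = sinkhorn_ipf(cost, mu, nu)
```

Each half-step is a projection onto the set of couplings with one prescribed marginal, which is the alternating structure the abstract contrasts with IMF's projections.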
Article 1046
Title@2025-06-20 (5): Al-Khwarizmi: Discovering Physical Laws with Foundation Models
Title: Al-Khwarizmi: Discovering Physical Laws with Foundation Models | Al-Khwarizmi: Physikalische Gesetze mit Stiftungsmodellen entdecken | Al-Khwarizmi:利用基金会模式发现实体法 2502.01702v2 |
Authors (2): Christopher E. Mower, Haitham Bou-Ammar
Inferring physical laws from data is a central challenge in science and engineering, including but not limited to healthcare, physical sciences, biosciences, social sciences, sustainability, climate, and robotics. Deep networks offer high-accuracy results but lack interpretability, prompting interest in models built from simple components. The Sparse Identification of Nonlinear Dynamics (SINDy) method has become the go-to approach for building such modular and interpretable models. SINDy leverages sparse regression with L1 regularization to identify key terms from a library of candidate functions. However, SINDy’s choice of candidate library and optimization method requires significant technical expertise, limiting its widespread applicability. This work introduces Al-Khwarizmi, a novel agentic framework for physical law discovery from data, which integrates foundational models with SINDy. Leveraging LLMs, VLMs, and Retrieval-Augmented Generation (RAG), our approach automates physical law discovery, incorporating prior knowledge and iteratively refining candidate solutions via reflection. Al-Khwarizmi operates in two steps: it first summarizes system observations (comprising textual descriptions, raw data, and plots), followed by a secondary step that generates candidate feature libraries and optimizer configurations to identify hidden physics laws correctly. Evaluating our algorithm on over 198 models, we demonstrate state-of-the-art performance compared to alternatives, reaching a 20 percent increase against the best-performing alternative.
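The sparse-regression step at the core of SINDy can be illustrated with a small sketch. Note the substitution: instead of the L1-regularized regression mentioned in the abstract, this uses sequentially thresholded least squares, a sparse solver commonly paired with SINDy, because it fits in a few lines of NumPy. The candidate library `[1, x, x^2]`, the threshold, and the toy system are assumptions, and the derivative is taken as exact rather than estimated from data.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares over a candidate library."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                       # prune negligible terms
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving library columns.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Assumed toy system: dx/dt = -2 x, observed along one trajectory.
t = np.linspace(0.0, 2.0, 200)
x = 3.0 * np.exp(-2.0 * t)
dxdt = -2.0 * x                               # exact derivative for simplicity
theta = np.column_stack([np.ones_like(x), x, x**2])   # library [1, x, x^2]
xi = stlsq(theta, dxdt)
```

The recovered coefficient vector keeps only the `x` term, which is the kind of interpretable model the library-and-optimizer choices are meant to produce.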
Article 1047
Title@2025-06-20 (5): Safe Guaranteed Exploration for Non-linear Systems
Title: Safe Guaranteed Exploration for Non-linear Systems | Sichere, garantierte Exploration für nichtlineare Systeme | 非线性系统安全保证勘探 2402.06562v2 |
Authors (5): Manish Prajapat, Johannes Köhler, Matteo Turchetta, Andreas Krause, Melanie N. Zeilinger
Safely exploring environments with a-priori unknown constraints is a fundamental challenge that restricts the autonomy of robots. While safety is paramount, guarantees on sufficient exploration are also crucial for ensuring autonomous task completion. To address these challenges, we propose a novel safe guaranteed exploration framework using optimal control, which achieves first-of-its-kind results: guaranteed exploration for non-linear systems with finite time sample complexity bounds, while being provably safe with arbitrarily high probability. The framework is general and applicable to many real-world scenarios with complex non-linear dynamics and unknown domains. We improve the efficiency of this general framework by proposing an algorithm, SageMPC, SAfe Guaranteed Exploration using Model Predictive Control. SageMPC leverages three key techniques: i) exploiting a Lipschitz bound, ii) goal-directed exploration, and iii) receding horizon style re-planning, all while maintaining the desired sample complexity, safety and exploration guarantees of the framework. Lastly, we demonstrate safe efficient exploration in challenging unknown environments using SageMPC with a car model.
Article 1048
Title@2025-06-20 (5): Empowering Near-Field Communications in Low-Altitude Economy with LLM: Fundamentals, Potentials, Solutions, and Future Directions
Title: Empowering Near-Field Communications in Low-Altitude Economy with LLM: Fundamentals, Potentials, Solutions, and Future Directions | Stärkung der Nahfeldkommunikation in Low-Altitude Economy mit LLM: Grundlagen, Potenziale, Lösungen und Zukunftsrichtungen | 以LLM:基础、潜力、解决方案和未来方向,增强低度经济中近地通信能力 2506.17067v1 |
Authors (3): Zhuo Xu, Tianyue Zheng, Linglong Dai
The low-altitude economy (LAE) is gaining significant attention from academia and industry. Fortunately, LAE naturally aligns with near-field communications in extremely large-scale MIMO (XL-MIMO) systems. By leveraging near-field beamfocusing, LAE can precisely direct beam energy to unmanned aerial vehicles, while the additional distance dimension boosts overall spectrum efficiency. However, near-field communications in LAE still face several challenges, such as the increase in signal processing complexity and the necessity of distinguishing between far and near-field users. Inspired by the powerful ability of large language models (LLMs) to handle complex problems, we apply LLMs to solve the challenges of near-field communications in LAE. The objective of this article is to provide a comprehensive analysis and discussion on LLM-empowered near-field communications in LAE. Specifically, we first introduce fundamentals of LLMs and near-field communications, including the key advantages of LLMs and key characteristics of near-field communications. Then, we reveal the opportunities and challenges of near-field communications in LAE. To address these challenges, we present an LLM-based scheme for near-field communications in LAE, and provide a case study which jointly distinguishes far and near-field users and designs the multi-user precoding matrix. Finally, we outline and highlight several future research directions and open issues.
Article 1049
Title@2025-06-20 (5): Flow-Based Non-stationary Temporal Regime Causal Structure Learning
Title: Flow-Based Non-stationary Temporal Regime Causal Structure Learning | Fließbasiertes nicht-stationäres Temporalregime Kausalstrukturlernen | 以流动为基础的非静止不流动时间制度因果结构学习 2506.17065v1 |
Authors (2): Abdellah Rahmani, Pascal Frossard
Understanding causal relationships in multivariate time series is crucial in many scenarios, such as those dealing with financial or neurological data. Many such time series exhibit multiple regimes, i.e., consecutive temporal segments with a priori unknown boundaries, with each regime having its own causal structure. Inferring causal dependencies and regime shifts is critical for analyzing the underlying processes. However, causal structure learning in this setting is challenging due to (1) non-stationarity, i.e., each regime can have its own causal graph and mixing function, and (2) complex noise distributions, which may be non-Gaussian or heteroscedastic. Existing causal discovery approaches cannot address these challenges, since they generally assume stationarity or Gaussian noise with constant variance. Hence, we introduce FANTOM, a unified framework for causal discovery that handles non-stationary processes along with non-Gaussian and heteroscedastic noise. FANTOM simultaneously infers the number of regimes and their corresponding indices and learns each regime’s Directed Acyclic Graph. It uses a Bayesian Expectation Maximization algorithm that maximizes the evidence lower bound of the data log likelihood. On the theoretical side, we prove, under mild assumptions, that temporal heteroscedastic causal models, introduced in FANTOM’s formulation, are identifiable in both stationary and non-stationary settings. In addition, extensive experiments on synthetic and real data show that FANTOM outperforms existing methods.
Article 1050
Title@2025-06-20 (5): Client Selection Strategies for Federated Semantic Communications in Heterogeneous IoT Networks
Title: Client Selection Strategies for Federated Semantic Communications in Heterogeneous IoT Networks | Kundenauswahlstrategien für die gefederte semantische Kommunikation in heterogenen IoT-Netzwerken | 异源性互联网网络中联邦语义通信的客户选择战略 2506.17063v1 |
Authors (2): Samer Lahoud, Kinda Khawam
The exponential growth of IoT devices presents critical challenges in bandwidth-constrained wireless networks, particularly regarding efficient data transmission and privacy preservation. This paper presents a novel federated semantic communication (SC) framework that enables collaborative training of bandwidth-efficient models for image reconstruction across heterogeneous IoT devices. By leveraging SC principles to transmit only semantic features, our approach dramatically reduces communication overhead while preserving reconstruction quality. We address the fundamental challenge of client selection in federated learning environments where devices exhibit significant disparities in dataset sizes and data distributions. Our framework implements three distinct client selection strategies that explore different trade-offs between system performance and fairness in resource allocation. The system employs an end-to-end SC architecture with semantic bottlenecks, coupled with a loss-based aggregation mechanism that naturally adapts to client heterogeneity. Experimental evaluation on image data demonstrates that while Utilitarian selection achieves the highest reconstruction quality, Proportional Fairness maintains competitive performance while significantly reducing participation inequality and improving computational efficiency. These results establish that federated SC can successfully balance reconstruction quality, resource efficiency, and fairness in heterogeneous IoT deployments, paving the way for sustainable and privacy-preserving edge intelligence applications.
Article 1051
Title@2025-06-20 (5): SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification
Title: SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification | SAFEx: Analysieren von Schwachstellen von MoE-basierten LLMs durch stabile sicherheitskritische Expertenidentifikation | SAFEx:通过稳定安全-关键专家鉴定分析以MOE为基础的LLMLMLLMs的脆弱性 2506.17368v1 |
Authors (8): Zhenglin Lai, Mengyao Liao, Dong Xu, Zebin Zhao, Zhihang Yuan, Chao Fan, Jianqiang Li, Bingzhe Wu
Large language models based on Mixture-of-Experts have achieved substantial gains in efficiency and scalability, yet their architectural uniqueness introduces underexplored safety alignment challenges. Existing safety alignment strategies, predominantly designed for dense models, are ill-suited to address MoE-specific vulnerabilities. In this work, we formalize and systematically study the positional vulnerability of MoE models: the phenomenon where safety-aligned behaviors rely on specific expert modules, revealing critical risks inherent to MoE architectures. To this end, we present SAFEx, an analytical framework that robustly identifies, characterizes, and validates the safety-critical experts using a novel Stability-based Expert Selection (SES) algorithm. Notably, our approach enables the explicit decomposition of safety-critical experts into distinct functional groups, including those responsible for harmful content detection and those controlling safe response generation. Extensive experiments on mainstream MoE models, such as the recently released Qwen3-MoE, demonstrate that their intrinsic safety mechanisms heavily rely on a small subset of positional experts. Disabling these experts significantly compromises the models’ ability to refuse harmful requests. For Qwen3-MoE with 6144 experts (in the FNN layer), we find that disabling as few as 12 identified safety-critical experts can cause the refusal rate to drop by 22%, demonstrating the disproportionate impact of a small set of experts on overall model safety.
Article 1052
Title@2025-06-20 (5): Universal Music Representations? Evaluating Foundation Models on World Music Corpora
Title: Universal Music Representations? Evaluating Foundation Models on World Music Corpora | Universal Music Representations? Bewertung von Stiftungsmodellen auf World Music Corpora | 世界音乐公司模型评估基金会 2506.17055v1 |
Authors (3): Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos
Foundation models have revolutionized music information retrieval, but questions remain about their ability to generalize across diverse musical traditions. This paper presents a comprehensive evaluation of five state-of-the-art audio foundation models across six musical corpora spanning Western popular, Greek, Turkish, and Indian classical traditions. We employ three complementary methodologies to investigate these models’ cross-cultural capabilities: probing to assess inherent representations, targeted supervised fine-tuning of 1-2 layers, and multi-label few-shot learning for low-resource scenarios. Our analysis shows varying cross-cultural generalization, with larger models typically outperforming on non-Western music, though results decline for culturally distant traditions. Notably, our approaches achieve state-of-the-art performance on five out of six evaluated datasets, demonstrating the effectiveness of foundation models for world music understanding. We also find that our targeted fine-tuning approach does not consistently outperform probing across all settings, suggesting foundation models already encode substantial musical knowledge. Our evaluation framework and benchmarking results contribute to understanding how far current models are from achieving universal music representations while establishing metrics for future progress.
Article 1053
Title@2025-06-20 (5): From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
Title: From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers | Von Konzepten zu Komponenten: Konzept-agnostische Aufmerksamkeit Modul Entdeckung in Transformatoren | 从概念到组成部分:在变异器中发现概念 – – 不可接受注意模块 2506.17052v1 |
Authors (3): Jingtong Su, Julia Kempe, Karen Ullrich
Transformers have achieved state-of-the-art performance across language and vision tasks. This success drives the imperative to interpret their internal mechanisms with the dual goals of enhancing performance and improving behavioral control. Attribution methods help advance interpretability by assigning model outputs associated with a target concept to specific model components. Current attribution research primarily studies multi-layer perceptron neurons and addresses relatively simple concepts such as factual associations (e.g., Paris is located in France). This focus tends to overlook the impact of the attention mechanism and lacks a unified approach for analyzing more complex concepts. To fill these gaps, we introduce Scalable Attention Module Discovery (SAMD), a concept-agnostic method for mapping arbitrary, complex concepts to specific attention heads of general transformer models. We accomplish this by representing each concept as a vector, calculating its cosine similarity with each attention head, and selecting the TopK-scoring heads to construct the concept-associated attention module. We then propose Scalar Attention Module Intervention (SAMI), a simple strategy to diminish or amplify the effects of a concept by adjusting the attention module using only a single scalar parameter. Empirically, we demonstrate SAMD on concepts of varying complexity, and visualize the locations of their corresponding modules. Our results demonstrate that module locations remain stable before and after LLM post-training, and confirm prior work on the mechanics of LLM multilingualism. Through SAMI, we facilitate jailbreaking on HarmBench (+72.7%) by diminishing “safety” and improve performance on the GSM8K benchmark (+1.6%) by amplifying “reasoning”. Lastly, we highlight the domain-agnostic nature of our approach by suppressing the image classification accuracy of vision transformers on ImageNet.
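The head-selection step described in this abstract (score each attention head by cosine similarity with a concept vector, then keep the top-K heads) can be sketched roughly as follows. This is only an illustration under stated assumptions: each head is assumed to be summarized by a single vector living in the same space as the concept vector, and the names `cosine`, `top_k_heads`, and the toy vectors are hypothetical, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_heads(concept, head_vectors, k):
    """Return the k (layer, head) keys whose vectors best match the concept."""
    scores = {key: cosine(concept, vec) for key, vec in head_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (hypothetical): (layer, head) -> per-head summary vector.
heads = {
    (0, 0): [1.0, 0.0],
    (0, 1): [0.0, 1.0],
    (1, 0): [0.9, 0.1],
    (1, 1): [-1.0, 0.0],
}
module = top_k_heads([1.0, 0.0], heads, k=2)  # the concept-associated "module"
```

In this toy setup the two heads most aligned with the concept vector form the module; how head vectors are actually obtained from a transformer is left out, since the abstract does not specify it.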
Article 1054
Title@2025-06-20 (5): Navigating the Deep: Signature Extraction on Deep Neural Networks
Title: Navigating the Deep: Signature Extraction on Deep Neural Networks | Navigieren der Tiefe: Signaturextraktion auf tiefen neuralen Netzwerken | 深层导航:深神经网络的签名提取 2506.17047v1 |
Authors (6): Haolin Liu, Adrien Siproudhis, Samuel Experton, Peter Lorenz, Christina Boura, Thomas Peyrin
Neural network model extraction has emerged in recent years as an important security concern, as adversaries attempt to recover a network’s parameters via black-box queries. A key step in this process is signature extraction, which aims to recover the absolute values of the network’s weights layer by layer. Prior work, notably by Carlini et al. (2020), introduced a technique inspired by differential cryptanalysis to extract neural network parameters. However, their method suffers from several limitations that restrict its applicability to networks with a few layers only. Later works focused on improving sign extraction, but largely relied on the assumption that signature extraction itself was feasible. In this work, we revisit and refine the signature extraction process by systematically identifying and addressing for the first time critical limitations of Carlini et al.’s signature extraction method. These limitations include rank deficiency and noise propagation from deeper layers. To overcome these challenges, we propose efficient algorithmic solutions for each of the identified issues, greatly improving the efficiency of signature extraction. Our approach permits the extraction of much deeper networks than was previously possible. We validate our method through extensive experiments on ReLU-based neural networks, demonstrating significant improvements in extraction depth and accuracy. For instance, our extracted network matches the target network on at least 95% of the input space for each of the eight layers of a neural network trained on the CIFAR-10 dataset, while previous works could barely extract the first three layers. Our results represent a crucial step toward practical attacks on larger and more complex neural network architectures.
Article 1055
Title@2025-06-20 (5): MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
Title: MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models | MUCAR: Multilinguale Cross-Modal Ambiguity Auflösung für multimodale große Sprachmodelle Benchmarking | MUCAR:为多模式大语言模型制定多语言跨模式的多语种和多模式模糊分辨率基准 2506.17046v1 |
Authors (11): Xiaolong Wang, Zhaolu Kang, Wangyuxuan Zhai, Xinyue Lou, Yunghwei Lai, Ziyue Wang, Yawen Wang, Kaiyu Huang, Yile Wang, Peng Li, Yang Liu
Multimodal Large Language Models (MLLMs) have demonstrated significant advances across numerous vision-language tasks. Due to their strong image-text alignment capability, MLLMs can effectively understand image-text pairs with clear meanings. However, effectively resolving the inherent ambiguities in natural language and visual contexts remains challenging. Existing multimodal benchmarks typically overlook linguistic and visual ambiguities, relying mainly on unimodal context for disambiguation and thus failing to exploit the mutual clarification potential between modalities. To bridge this gap, we introduce MUCAR, a novel and challenging benchmark designed explicitly for evaluating multimodal ambiguity resolution across multilingual and cross-modal scenarios. MUCAR includes: (1) a multilingual dataset where ambiguous textual expressions are uniquely resolved by corresponding visual contexts, and (2) a dual-ambiguity dataset that systematically pairs ambiguous images with ambiguous textual contexts, with each combination carefully constructed to yield a single, clear interpretation through mutual disambiguation. Extensive evaluations involving 19 state-of-the-art multimodal models, encompassing both open-source and proprietary architectures, reveal substantial gaps compared to human-level performance, highlighting the need for future research into more sophisticated cross-modal ambiguity comprehension methods, further pushing the boundaries of multimodal reasoning.
Article 1056
Title@2025-06-20 (5): Problem Space Transformations for Out-of-Distribution Generalisation in Behavioural Cloning
Title: Problem Space Transformations for Out-of-Distribution Generalisation in Behavioural Cloning | Problemraumtransformationen für die Verallgemeinerung außerhalb der Verteilung im Verhaltens-Klonen | 行为性克隆中传播外普遍化的空间转变问题 2411.04056v2 |
Authors (3): Kiran Doshi, Marco Bagatella, Stelian Coros
The combination of behavioural cloning and neural networks has driven significant progress in robotic manipulation. As these algorithms may require a large number of demonstrations for each task of interest, they remain fundamentally inefficient in complex scenarios, in which finite datasets can hardly cover the state space. One of the remaining challenges is thus out-of-distribution (OOD) generalisation, i.e. the ability to predict correct actions for states with a low likelihood with respect to the state occupancy induced by the dataset. This issue is aggravated when the system to control is treated as a black-box, ignoring its physical properties. This work characterises widespread properties of robotic manipulation, specifically pose equivariance and locality. We investigate the effect of the choice of problem space on OOD performance of BC policies and how transformations arising from characteristic properties of manipulation could be employed for its improvement. We empirically demonstrate that these transformations allow behaviour cloning policies, using either standard MLP-based one-step action prediction or diffusion-based action-sequence prediction, to generalise better to OOD problem instances.
Article 1057
Title@2025-06-20 (5): COS-DPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework
Title: COS-DPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework | COS-DPO: Bedingtes eins-shot Multi-Objective Fine-Tuning Framework | COS-DPO: 有条件的单片多目标微调框架 2410.08316v3 |
Authors (5): Yinuo Ren, Tesi Xiao, Michael Shavlovsky, Lexing Ying, Holakou Rahmanian
In LLM alignment and many other ML applications, one often faces the Multi-Objective Fine-Tuning (MOFT) problem, i.e., fine-tuning an existing model with datasets labeled w.r.t. different objectives simultaneously. To address the challenge, we propose a Conditioned One-Shot fine-tuning framework (COS-DPO) that extends the Direct Preference Optimization technique, originally developed for efficient LLM alignment with preference data, to accommodate the MOFT settings. By direct conditioning on the weight across auxiliary objectives, our Weight-COS-DPO method enjoys an efficient one-shot training process for profiling the Pareto front and is capable of achieving comprehensive trade-off solutions even in the post-training stage. Based on our theoretical findings on the linear transformation properties of the loss function, we further propose the Temperature-COS-DPO method that augments the temperature parameter to the model input, enhancing the flexibility of post-training control over the trade-offs between the main and auxiliary objectives. We demonstrate the effectiveness and efficiency of the COS-DPO framework through its applications to various tasks, including the Learning-to-Rank (LTR) and LLM alignment tasks, highlighting its viability for large-scale ML deployments.
Article 1058
Title@2025-06-20 (5): MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection
Title: MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection | MAWIFlow Benchmark: Realistische flussbasierte Bewertung für Netzwerkintrusionserkennung | MAWIFlow 基准:对网络入侵探测的现实流动评价 2506.17041v1 |
Authors (3): Joshua Schraven, Alexander Windmann, Oliver Niggemann
Benchmark datasets for network intrusion detection commonly rely on synthetically generated traffic, which fails to reflect the statistical variability and temporal drift encountered in operational environments. This paper introduces MAWIFlow, a flow-based benchmark derived from the MAWILAB v1.1 dataset, designed to enable realistic and reproducible evaluation of anomaly detection methods. A reproducible preprocessing pipeline is presented that transforms raw packet captures into flow representations conforming to the CICFlowMeter format, while preserving MAWILab’s original anomaly labels. The resulting datasets comprise temporally distinct samples from January 2011, 2016, and 2021, drawn from trans-Pacific backbone traffic. To establish reference baselines, traditional machine learning methods, including Decision Trees, Random Forests, XGBoost, and Logistic Regression, are compared to a deep learning model based on a CNN-BiLSTM architecture. Empirical results demonstrate that tree-based classifiers perform well on temporally static data but experience significant performance degradation over time. In contrast, the CNN-BiLSTM model maintains better performance, thus showing improved generalization. These findings underscore the limitations of synthetic benchmarks and static models, and motivate the adoption of realistic datasets with explicit temporal structure. All datasets, pipeline code, and model implementations are made publicly available to foster transparency and reproducibility.
Article 1059
Title@2025-06-20 (5): LSCD: Lomb-Scargle Conditioned Diffusion for Time series Imputation
Title: LSCD: Lomb-Scargle Conditioned Diffusion for Time series Imputation | LSCD: Lomb-Scargle Conditioned Diffusion für Zeitreihen Imputation | LSCD: 用于时间序列的有附加条件的激光扩散 2506.17039v1 |
Authors (6): Elizabeth Fons, Alejandro Sztrajman, Yousef El-Laham, Luciana Ferrer, Svitlana Vyetrenko, Manuela Veloso
Time series with missing or irregularly sampled data are a persistent challenge in machine learning. Many methods operate in the frequency domain, relying on the Fast Fourier Transform (FFT), which assumes uniform sampling and therefore requires prior interpolation that can distort the spectra. To address this limitation, we introduce a differentiable Lomb–Scargle layer that enables a reliable computation of the power spectrum of irregularly sampled data. We integrate this layer into a novel score-based diffusion model (LSCD) for time series imputation conditioned on the entire signal spectrum. Experiments on synthetic and real-world benchmarks demonstrate that our method recovers missing data more accurately than purely time-domain baselines, while simultaneously producing consistent frequency estimates. Crucially, our method can be easily integrated into learning frameworks, enabling broader adoption of spectral guidance in machine learning approaches involving incomplete or irregular data.
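To make the spectral idea concrete, here is a minimal sketch of the classic (unnormalized) Lomb-Scargle periodogram, which estimates power directly from irregularly spaced samples without interpolating onto a uniform grid. It is a plain-Python illustration of the standard formula, not the differentiable layer proposed in the paper; the function name, the toy 1.5 Hz signal, and the frequency grid are assumptions made for the example.

```python
import math
import random

def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle power at each frequency (cycles per unit time)."""
    mean = sum(y) / len(y)
    yc = [v - mean for v in y]  # remove the mean before fitting sinusoids
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * ti) for ti in t)
        c2 = sum(math.cos(2.0 * w * ti) for ti in t)
        tau = math.atan2(s2, c2) / (2.0 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        power.append(0.5 * (yc_c ** 2 / sum(a * a for a in c)
                            + yc_s ** 2 / sum(a * a for a in s)))
    return power

rng = random.Random(0)
times = sorted(rng.uniform(0.0, 10.0) for _ in range(200))     # irregular sampling
signal = [math.sin(2.0 * math.pi * 1.5 * ti) for ti in times]  # toy 1.5 Hz tone
freqs = [0.1 + 0.05 * i for i in range(99)]                    # grid from 0.1 to 5.0
power = lomb_scargle(times, signal, freqs)
peak = freqs[power.index(max(power))]  # frequency with the largest power
```

For a clean tone sampled this densely, the largest power should fall at or next to the true frequency on the grid, which is the property an FFT on interpolated data can lose.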
Article 1060
Title@2025-06-20 (5): Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure Prediction
Title: Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure Prediction | Bayesisches gemeinsames Modell von Multi-Sensor- und Failure Event-Daten für Multi-Mode Failure Prediction | 多种模式故障预测多传感器和故障事件多发生数据的贝叶斯联合模型 2506.17036v1 |
Authors (4): Sina Aghaee Dabaghan Fard, Minhee Kim, Akash Deep, Jaesung Lee
Modern industrial systems are often subject to multiple failure modes, and their conditions are monitored by multiple sensors, generating multiple time-series signals. Additionally, time-to-failure data are commonly available. Accurately predicting a system’s remaining useful life (RUL) requires effectively leveraging multi-sensor time-series data alongside multi-mode failure event data. In most existing models, failure modes and RUL prediction are performed independently, ignoring the inherent relationship between these two tasks. Some models integrate multiple failure modes and event prediction using black-box machine learning approaches, which lack statistical rigor and cannot characterize the inherent uncertainty in the model and data. This paper introduces a unified approach to jointly model the multi-sensor time-series data and failure time concerning multiple failure modes. The proposed model integrates a Cox proportional hazards model, a Convolved Multi-output Gaussian Process, and multinomial failure mode distributions in a hierarchical Bayesian framework with corresponding priors, enabling accurate prediction with robust uncertainty quantification. Posterior distributions are effectively obtained by Variational Bayes, and prediction is performed with Monte Carlo sampling. The advantages of the proposed model are validated through extensive numerical and case studies with a jet-engine dataset.
Article 1061
Title@2025-06-20 (5): Critical Appraisal of Fairness Metrics in Clinical Predictive AI
Title: Critical Appraisal of Fairness Metrics in Clinical Predictive AI | Kritische Bewertung von Fairness-Metriken in klinisch vorausschauender KI | 临床预测性人工智能中的公平度量 2506.17035v1 |
Authors (9): João Matos, Ben Van Calster, Leo Anthony Celi, Paula Dhiman, Judy Wawira Gichoya, Richard D. Riley, Chris Russell, Sara Khalid, Gary S. Collins
Predictive artificial intelligence (AI) offers an opportunity to improve clinical practice and patient outcomes, but risks perpetuating biases if fairness is inadequately addressed. However, the definition of “fairness” remains unclear. We conducted a scoping review to identify and critically appraise fairness metrics for clinical predictive AI. We defined a “fairness metric” as a measure quantifying whether a model discriminates (societally) against individuals or groups defined by sensitive attributes. We searched five databases (2014-2024), screening 820 records, to include 41 studies, and extracted 62 fairness metrics. Metrics were classified by performance-dependency, model output level, and base performance metric, revealing a fragmented landscape with limited clinical validation and overreliance on threshold-dependent measures. Eighteen metrics were explicitly developed for healthcare, including only one clinical utility metric. Our findings highlight conceptual challenges in defining and quantifying fairness and identify gaps in uncertainty quantification, intersectionality, and real-world applicability. Future work should prioritise clinically meaningful metrics.
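As an illustration of what a threshold-dependent fairness measure looks like, the sketch below computes the gap in true positive rate between groups (often called the equal opportunity difference) at a chosen decision threshold. This is a commonly used metric picked for the example, not necessarily one of the 62 metrics catalogued in the review; the function names and toy data are hypothetical.

```python
def true_positive_rate(y_true, scores, threshold):
    """Share of actual positives whose score reaches the threshold."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)

def equal_opportunity_difference(y_true, scores, group, threshold):
    """Largest gap in true positive rate across the groups present."""
    rates = []
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        rates.append(true_positive_rate([y_true[i] for i in idx],
                                        [scores[i] for i in idx],
                                        threshold))
    return max(rates) - min(rates)

# Toy data (hypothetical): outcomes, risk scores, and a sensitive attribute.
y_true = [1, 1, 0, 1, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
group = ["a", "a", "a", "b", "b", "b"]

gap_at_050 = equal_opportunity_difference(y_true, scores, group, 0.50)
gap_at_025 = equal_opportunity_difference(y_true, scores, group, 0.25)
```

The same model and data give a different gap at each threshold, which is one reason reliance on threshold-dependent measures can be hard to interpret clinically.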
Article 1062
Title@2025-06-20 (5): Conditional Front-door Adjustment for Heterogeneous Treatment Assignment Effect Estimation Under Non-adherence
Title: Conditional Front-door Adjustment for Heterogeneous Treatment Assignment Effect Estimation Under Non-adherence | Bedingte Front-Tür-Anpassung für heterogene Behandlung Zuordnungseffektschätzung unter Nichtbefolgung | 不遵守规定情况下对不同不同待遇不同待遇的 条件性前门调整 外门调整 2505.05677v3 |
Authors (3): Winston Chen, Trenton Chang, Jenna Wiens
Estimates of heterogeneous treatment assignment effects can inform treatment decisions. In the presence of non-adherence (e.g., patients do not adhere to their assigned treatment), both the standard backdoor adjustment (SBD) and the conditional front-door adjustment (CFD) can recover unbiased estimates of the treatment assignment effects. However, the estimation variance of these approaches may vary widely across settings, which remains underexplored in the literature. In this work, we demonstrate theoretically and empirically that CFD yields lower-variance estimates than SBD when the true effect of treatment assignment is small (i.e., assigning an intervention leads to small changes in patients’ future outcomes). Additionally, since CFD requires estimating multiple nuisance parameters, we introduce LobsterNet, a multi-task neural network that implements CFD with joint modeling of the nuisance parameters. Empirically, LobsterNet reduces estimation error across several semi-synthetic and real-world datasets compared to baselines. Our findings suggest CFD with shared nuisance parameter modeling can improve treatment assignment effect estimation under non-adherence.
Article 1063
Title@2025-06-20 (5): Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment
Title: Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment | Skalierbares und zuverlässiges Multi-Agenten-Verstärkungslernen für die Verkehrszuweisung | 可缩放和可靠的多试剂交通分配强化学习 2506.17029v1 |
Authors (7): Leizhen Wang, Peibo Duan, Cheng Lyu, Zewen Wang, Zhiqiang He, Nan Zheng, Zhenliang Ma
The evolution of metropolitan cities and the increase in travel demands impose stringent requirements on traffic assignment methods. Multi-agent reinforcement learning (MARL) approaches outperform traditional methods in modeling adaptive routing behavior without requiring explicit system dynamics, which is beneficial for real-world deployment. However, MARL frameworks face challenges in scalability and reliability when managing extensive networks with substantial travel demand, limiting their practical applicability in solving large-scale traffic assignment problems. To address these challenges, this study introduces MARL-OD-DA, a new MARL framework for the traffic assignment problem, which redefines agents as origin-destination (OD) pair routers rather than individual travelers, significantly enhancing scalability. Additionally, a Dirichlet-based action space with action pruning and a reward function based on the local relative gap are designed to enhance solution reliability and improve convergence efficiency. Experiments demonstrate that the proposed MARL framework effectively handles medium-sized networks with extensive and varied city-level OD demand, surpassing existing MARL methods. When implemented in the SiouxFalls network, MARL-OD-DA achieves better assignment solutions in 10 steps, with a relative gap that is 94.99% lower than that of conventional methods.
Article 1064
Title@2025-06-20 (5): Zero-shot Class Unlearning via Layer-wise Relevance Analysis and Neuronal Path Perturbation
Title: Zero-shot Class Unlearning via Layer-wise Relevance Analysis and Neuronal Path Perturbation | Null-Schuss-Klasse Entlernen über schichtweise Relevanz Analyse und neuronale Path Perturbation | 通过从图层角度的关联性分析和神经路径干扰,零中弹的班级取消学习 2410.23693v2 |
Authors (6): Wenhan Chang, Tianqing Zhu, Ping Xiong, Yufeng Wu, Faqian Guan, Wanlei Zhou
In the rapid advancement of artificial intelligence, privacy protection has become crucial, giving rise to machine unlearning. Machine unlearning is a technique that removes specific data influences from trained models without the need for extensive retraining. However, it faces several key challenges, including accurately implementing unlearning, ensuring privacy protection during the unlearning process, and achieving effective unlearning without significantly compromising model performance. This paper presents a novel approach to machine unlearning by employing Layer-wise Relevance Analysis and Neuronal Path Perturbation. We address three primary challenges: the lack of detailed unlearning principles, privacy guarantees in the zero-shot unlearning scenario, and the balance between unlearning effectiveness and model utility. Our method balances machine unlearning performance and model utility by identifying and perturbing highly relevant neurons, thereby achieving effective unlearning. By using data not present in the original training set during the unlearning process, we satisfy the zero-shot unlearning scenario and ensure robust privacy protection. Experimental results demonstrate that our approach effectively removes targeted data from the target unlearning model while maintaining the model’s utility, offering a practical solution for privacy-preserving machine learning.
Article 1065
Title@2025-06-20 (5): Eau De $Q$-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning
Title: Eau De $Q$-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning | Eau De $Q$-Network: Adaptive Destillation von neuralen Netzwerken im Deep Reinforcement Learning | Eau de $Q$-网络:深强化学习中神经网络的适应性蒸馏 2503.01437v2 |
Authors (5): Théo Vincent, Tim Faust, Yogesh Tripathi, Jan Peters, Carlo D’Eramo
Recent works have successfully demonstrated that sparse deep reinforcement learning agents can be competitive against their dense counterparts. This opens up opportunities for reinforcement learning applications in fields where inference time and memory requirements are cost-sensitive or limited by hardware. Until now, dense-to-sparse methods have relied on hand-designed sparsity schedules that are not synchronized with the agent’s learning pace. Crucially, the final sparsity level is chosen as a hyperparameter, which requires careful tuning as setting it too high might lead to poor performances. In this work, we address these shortcomings by crafting a dense-to-sparse algorithm that we name Eau De $Q$-Network (EauDeQN). To increase sparsity at the agent’s learning pace, we consider multiple online networks with different sparsity levels, where each online network is trained from a shared target network. At each target update, the online network with the smallest loss is chosen as the next target network, while the other networks are replaced by a pruned version of the chosen network. We evaluate the proposed approach on the Atari $2600$ benchmark and the MuJoCo physics simulator, showing that EauDeQN reaches high sparsity levels while keeping performances high.
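The target-update rule described in this abstract (keep the online network with the smallest loss as the next target and replace the others with pruned copies of it) can be sketched as below. This is a toy illustration under strong assumptions: networks are flat weight lists, pruning is plain magnitude pruning, and the per-slot sparsity levels are given by hand. None of these choices, nor the names `prune_by_magnitude` and `target_update`, are confirmed by the paper.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def target_update(online_nets, losses, sparsities):
    """Pick the lowest-loss network as target; rebuild the others from it."""
    best = min(range(len(online_nets)), key=lambda i: losses[i])
    target = list(online_nets[best])
    new_online = [
        list(target) if i == best else prune_by_magnitude(target, sparsities[i])
        for i in range(len(online_nets))
    ]
    return target, new_online

# Toy example (hypothetical): two online networks with four weights each.
nets = [[0.5, -0.1, 0.3, 0.05], [0.4, -0.2, 0.0, 0.1]]
losses = [0.1, 0.3]      # network 0 currently fits the target best
sparsities = [0.0, 0.5]  # sparsity to apply if a slot is rebuilt
target, nets = target_update(nets, losses, sparsities)
```

Because the winner is chosen by loss at every target update, sparsity only increases when a sparser candidate actually performs at least as well, which is the intuition behind tying the schedule to the agent's learning pace.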
Article 1066
Title@2025-06-20 (5): A Quantile Regression Approach for Remaining Useful Life Estimation with State Space Models
Title: A Quantile Regression Approach for Remaining Useful Life Estimation with State Space Models | Ein Quantile Regressionsansatz für verbleibende sinnvolle Lebensschätzung mit State Space Models | 国家空间模型中剩余使用寿命估计的量化回归方法 2506.17018v1 |
Authors (3): Davide Frizzo, Francesco Borsatti, Gian Antonio Susto
Predictive Maintenance (PdM) is pivotal in Industry 4.0 and 5.0, proactively enhancing efficiency through accurate equipment Remaining Useful Life (RUL) prediction, thus optimizing maintenance scheduling and reducing unexpected failures and premature interventions. This paper introduces a novel RUL estimation approach leveraging State Space Models (SSM) for efficient long-term sequence modeling. To handle model uncertainty, Simultaneous Quantile Regression (SQR) is integrated into the SSM, enabling multiple quantile estimations. The proposed method is benchmarked against traditional sequence modelling techniques (LSTM, Transformer, Informer) using the C-MAPSS dataset. Results demonstrate superior accuracy and computational efficiency of SSM models, underscoring their potential for high-stakes industrial applications.
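Quantile regression is usually trained with the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. The sketch below shows that loss for a single RUL target and averages it over several quantile levels at once. It is a generic illustration and may differ from the exact SQR formulation used in the paper; the function names and the toy numbers are assumptions.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball loss for quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs q per unit; over-prediction costs 1 - q.
    return max(q * diff, (q - 1.0) * diff)

def multi_quantile_loss(y_true, preds_by_quantile):
    """Average pinball loss over a dict mapping quantile level -> prediction."""
    total = sum(pinball_loss(y_true, pred, q)
                for q, pred in preds_by_quantile.items())
    return total / len(preds_by_quantile)

# Toy example (hypothetical): true RUL of 10 cycles, three predicted quantiles.
loss_under = pinball_loss(10.0, 8.0, 0.9)   # predicting too low at q = 0.9
loss_over = pinball_loss(10.0, 12.0, 0.9)   # predicting too high at q = 0.9
avg = multi_quantile_loss(10.0, {0.1: 7.0, 0.5: 10.0, 0.9: 13.0})
```

At the 0.9 level, under-predicting by two cycles costs nine times as much as over-predicting by two, which pushes that output toward an upper bound on RUL; the spread between the low and high quantile outputs then serves as an uncertainty estimate.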
Article 1067
Title@2025-06-20 (5): The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation
Title: The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation | Die versteckten Kosten eines Bildes: Quantifizierung des Energieverbrauchs von KI-Bilderzeugung | 图像的隐藏成本:对AI图像生成的能源消耗量进行量化 2506.17016v1 |
Authors (5): Giulia Bertazzini, Chiara Albisani, Daniele Baracchi, Dasara Shullani, Roberto Verdecchia
With the growing adoption of AI image generation, in conjunction with the ever-increasing environmental resources demanded by AI, we are urged to answer a fundamental question: What is the environmental impact hidden behind each image we generate? In this research, we present a comprehensive empirical experiment designed to assess the energy consumption of AI image generation. Our experiment compares 17 state-of-the-art image generation models by considering multiple factors that could affect their energy consumption, such as model quantization, image resolution, and prompt length. Additionally, we consider established image quality metrics to study potential trade-offs between energy consumption and generated image quality. Results show that image generation models vary drastically in terms of the energy they consume, with up to a 46x difference. Image resolution affects energy consumption inconsistently, ranging from a 1.3x to 4.7x increase when doubling resolution. U-Net-based models tend to consume less than Transformer-based ones. Model quantization, by contrast, tends to deteriorate the energy efficiency of most models, while prompt length and content have no statistically significant impact. Improving image quality does not always come at the cost of higher energy consumption, with some of the models producing the highest quality images also being among the most energy efficient ones.
Article 1068
Title@2025-06-20 (5): Simulating Correlated Electrons with Symmetry-Enforced Normalizing Flows
Title: Simulating Correlated Electrons with Symmetry-Enforced Normalizing Flows | Simulieren korrelierter Elektronen mit Symmetrie-verstärkten Normalisierungsströmen | 以对称强制正常化流程模拟与Cor相关电子 2506.17015v1 |
Authors (7): Dominic Schuh, Janik Kreit, Evan Berkowitz, Lena Funcke, Thomas Luu, Kim A. Nicoli, Marcel Rodekamp
We present the first proof of principle that normalizing flows can accurately learn the Boltzmann distribution of the fermionic Hubbard model - a key framework for describing the electronic structure of graphene and related materials. State-of-the-art methods like Hybrid Monte Carlo often suffer from ergodicity issues near the time-continuum limit, leading to biased estimates. Leveraging symmetry-aware architectures as well as independent and identically distributed sampling, our approach resolves these issues and achieves significant speed-ups over traditional methods.
Article 1069
Title@2025-06-20 (5): Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators
Title: Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators | Robustes Verstärkungslernen für diskrete kompositorische Generierung über allgemeine Soft Operatoren | 通过一般软操作员为分辨合成生成进行强力强化学习 2506.17007v1 |
Authors (8): Marco Jiralerspong, Esther Derman, Danilo Vucetic, Nikolay Malkin, Bilun Sun, Tianyu Zhang, Pierre-Luc Bacon, Gauthier Gidel
A major bottleneck in scientific discovery involves narrowing a large combinatorial set of objects, such as proteins or molecules, to a small set of promising candidates. While this process largely relies on expert knowledge, recent methods leverage reinforcement learning (RL) to enhance this filtering. They achieve this by estimating proxy reward functions from available datasets and using regularization to generate more diverse candidates. These reward functions are inherently uncertain, raising a particularly salient challenge for scientific discovery. In this work, we show that existing methods, often framed as sampling proportional to a reward function, are inadequate and yield suboptimal candidates, especially in large search spaces. To remedy this issue, we take a robust RL approach and introduce a unified operator that seeks robustness to the uncertainty of the proxy reward function. This general operator targets peakier sampling distributions while encompassing known soft RL operators. It also leads us to a novel algorithm that identifies higher-quality, diverse candidates in both synthetic and real-world tasks. Ultimately, our work offers a new, flexible perspective on discrete compositional generation tasks. Code: https://github.com/marcojira/tgm.
Article 1070
Title@2025-06-20 (5): Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments
Title: Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments | Prmpt2Adpt: Promptbasierte Zero-Shot-Domain-Anpassung für ressourcenbeschränkte Umgebungen | 受资源限制的环境的快速零热域适应 2506.16994v1 |
Authors (4): Yasir Ali Farrukh, Syed Wali, Irfan Khan, Nathaniel D. Bastian
Unsupervised Domain Adaptation (UDA) is a critical challenge in real-world vision systems, especially in resource-constrained environments like drones, where memory and computation are limited. Existing prompt-driven UDA methods typically rely on large vision-language models and require full access to source-domain data during adaptation, limiting their applicability. In this work, we propose Prmpt2Adpt, a lightweight and efficient zero-shot domain adaptation framework built around a teacher-student paradigm guided by prompt-based feature alignment. At the core of our method is a distilled and fine-tuned CLIP model, used as the frozen backbone of a Faster R-CNN teacher. A small set of low-level source features is aligned to the target domain semantics (specified only through a natural language prompt) via Prompt-driven Instance Normalization (PIN). These semantically steered features are used to briefly fine-tune the detection head of the teacher model. The adapted teacher then generates high-quality pseudo-labels, which guide the on-the-fly adaptation of a compact student model. Experiments on the MDS-A dataset demonstrate that Prmpt2Adpt achieves competitive detection performance compared to state-of-the-art methods, while delivering up to 7x faster adaptation and 5x faster inference speed using few source images, making it a practical and scalable solution for real-time adaptation in low-resource domains.
Article 1071
Title@2025-06-20 (5): CoIFNet: A Unified Framework for Multivariate Time Series Forecasting with Missing Values
Title: CoIFNet: A Unified Framework for Multivariate Time Series Forecasting with Missing Values | CoIFNet: Ein einheitliches Framework für die Multivariate Zeitreihenprognose mit fehlenden Werten | CoIFNet:多变时间序列缺值预测统一框架 2506.13064v2 |
Authors (8): Kai Tang, Ji Zhang, Hua Meng, Minbo Ma, Qi Xiong, Fengmao Lv, Jie Xu, Tianrui Li
Multivariate time series forecasting (MTSF) is a critical task with broad applications in domains such as meteorology, transportation, and economics. Nevertheless, pervasive missing values caused by sensor failures or human errors significantly degrade forecasting accuracy. Prior efforts usually employ an impute-then-forecast paradigm, leading to suboptimal predictions due to error accumulation and misaligned objectives between the two stages. To address this challenge, we propose the Collaborative Imputation-Forecasting Network (CoIFNet), a novel framework that unifies imputation and forecasting to achieve robust MTSF in the presence of missing values. Specifically, CoIFNet takes the observed values, mask matrix and timestamp embeddings as input, processing them sequentially through the Cross-Timestep Fusion (CTF) and Cross-Variate Fusion (CVF) modules to capture temporal dependencies that are robust to missing values. We provide theoretical justifications on how our CoIFNet learning objective improves the performance bound of MTSF with missing values. Through extensive experiments on challenging MTSF benchmarks, we demonstrate the effectiveness and computational efficiency of our proposed approach across diverse missing-data scenarios, e.g., CoIFNet outperforms the state-of-the-art method by 24.40% (23.81%) at a point (block) missing rate of 0.6, while improving memory and time efficiency by $4.3\times$ and $2.1\times$, respectively. Our code is available at: https://github.com/KaiTang-eng/CoIFNet.
Article 1072
Title@2025-06-20 (5): SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Title: SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | SHAKTI: Ein 2,5 Milliarden Parameter kleines Sprachmodell optimiert für Edge-KI und Low-Resource-Umgebungen | SHAKTI:为边缘AI和低资源环境优化的2.5亿亿亿分数小语言模型 2410.11331v2 |
Authors (3): Syed Abdul Gaffar Shakhadri, Kruthika KR, Rakshit Aralimatti
We introduce Shakti, a 2.5 billion parameter language model specifically optimized for resource-constrained environments such as edge devices, including smartphones, wearables, and IoT systems. Shakti combines high-performance NLP with optimized efficiency and precision, making it ideal for real-time AI applications where computational resources and memory are limited. With support for vernacular languages and domain-specific tasks, Shakti excels in industries such as healthcare, finance, and customer service. Benchmark evaluations demonstrate that Shakti performs competitively against larger models while maintaining low latency and on-device efficiency, positioning it as a leading solution for edge AI.
Article 1073
Title@2025-06-20 (5): The learned range test method for the inverse inclusion problem
Title: The learned range test method for the inverse inclusion problem | Die Lernbereich-Testmethode für das inverse Inklusion-Problem | 反包容问题的学习范围测试方法 2411.00463v2 |
Authors (2): Shiwei Sun, Giovanni S. Alberti
We consider the inverse problem consisting of the reconstruction of an inclusion $B$ contained in a bounded domain $\Omega\subset\mathbb{R}^d$ from a single pair of Cauchy data $(u|_{\partial\Omega},\partial_\nu u|_{\partial\Omega})$, where $\Delta u=0$ in $\Omega\setminus\overline B$ and $u=0$ on $\partial B$. We show that the reconstruction algorithm based on the range test, a domain sampling method, can be written as a neural network with a specific architecture. We propose to learn the weights of this network in the framework of supervised learning, and to combine it with a pre-trained classifier, with the purpose of distinguishing the inclusions based on their distance from the boundary. The numerical simulations show that this learned range test method provides accurate and stable reconstructions of polygonal inclusions. Furthermore, the results are superior to those obtained with the standard range test method (without learning) and with an end-to-end fully connected deep neural network, a purely data-driven method.
Article 1074
Title@2025-06-20 (5): Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond
Title: Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond | Sprachengpässe-Modelle: Ein Rahmen für interpretierbares Wissen auf Tracing und darüber hinaus | 语言瓶颈模式:可解释知识追踪框架及以后 2506.16982v1 |
Authors (2): Antonin Berthon, Mihaela van der Schaar
Accurately assessing student knowledge is critical for effective education, yet traditional Knowledge Tracing (KT) methods rely on opaque latent embeddings, limiting interpretability. Even LLM-based approaches generate direct predictions or summaries that may hallucinate without any accuracy guarantees. We recast KT as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable. Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text. By constraining all predictive information to pass through a short natural-language bottleneck, LBMs ensure that the summary contains accurate information while remaining human-interpretable. Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories. We demonstrate that training the encoder with group-relative policy optimization, using downstream decoding accuracy as a reward signal, effectively improves summary quality.
Article 1075
Title@2025-06-20 (5): Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction
Title: Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction | Gegurtetes und ensembled neurales Netzwerk für lineare und nichtlineare Dimensionsreduktion | 内线和非线性足够尺寸减少带带和组合的神经网络 2412.08961v2 |
Authors (2): Yin Tang, Bing Li
We introduce a unified, flexible, and easy-to-implement framework of sufficient dimension reduction that can accommodate both linear and nonlinear dimension reduction, and both the conditional distribution and the conditional mean as the targets of estimation. This unified framework is achieved by a specially structured neural network – the Belted and Ensembled Neural Network (BENN) – that consists of a narrow latent layer, which we call the belt, and a family of transformations of the response, which we call the ensemble. By strategically placing the belt at different layers of the neural network, we can achieve linear or nonlinear sufficient dimension reduction, and by choosing the appropriate transformation families, we can achieve dimension reduction for the conditional distribution or the conditional mean. Moreover, thanks to the advantage of the neural network, the method is very fast to compute, overcoming a computational bottleneck of the traditional sufficient dimension reduction estimators, which involves the inversion of a matrix of dimension either p or n. We develop the algorithm, establish the convergence rate of our method, compare it with existing sufficient dimension reduction methods, and apply it to two data examples.
Article 1076
Title@2025-06-20 (5): Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
Title: Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework | Polysemantik mit PRISM erfassen: Ein Multi-Konzept-Feature Beschreibung Framework | 利用PRISM获得多边性能:多概念特征描述框架 2506.15538v2 |
Authors (7): Laura Kopf, Nils Feldhus, Kirill Bykov, Philine Lou Bommer, Anna Hedström, Marina M. -C. Höhne, Oliver Eberle
Automated interpretability research aims to identify concepts encoded in neural network features to enhance human understanding of model behavior. Current feature description methods face two critical challenges: limited robustness and the flawed assumption that each neuron encodes only a single concept (monosemanticity), despite growing evidence that neurons are often polysemantic. This assumption restricts the expressiveness of feature descriptions and limits their ability to capture the full range of behaviors encoded in model internals. To address this, we introduce Polysemantic FeatuRe Identification and Scoring Method (PRISM), a novel framework that captures the inherent complexity of neural network features. Unlike prior approaches that assign a single description per feature, PRISM provides more nuanced descriptions for both polysemantic and monosemantic features. We apply PRISM to language models and, through extensive benchmarking against existing methods, demonstrate that our approach produces more accurate and faithful feature descriptions, improving both overall description quality (via a description score) and the ability to capture distinct concepts when polysemanticity is present (via a polysemanticity score).
Article 1077
Title@2025-06-20 (5): Latent Concept Disentanglement in Transformer-based Language Models
Title: Latent Concept Disentanglement in Transformer-based Language Models | Latent Concept Disentanglement in Transformer-basierten Sprachmodellen | 以变换器为基础的语言模型中的边端概念分解 2506.16975v1 |
Authors (6): Guan Zhe Hong, Bhavya Vasudeva, Vatsal Sharan, Cyrus Rashtchian, Prabhakar Raghavan, Rina Panigrahy
When large language models (LLMs) use in-context learning (ICL) to solve a new task, they seem to grasp not only the goal of the task but also core, latent concepts in the demonstration examples. This raises the question of whether transformers represent latent structures as part of their computation or whether they take shortcuts to solve the problem. Prior mechanistic work on ICL does not address this question because it does not sufficiently examine the relationship between the learned representation and the latent concept, and the considered problem settings often involve only single-step reasoning. In this work, we examine how transformers disentangle and use latent concepts. We show that in 2-hop reasoning tasks with a latent, discrete concept, the model successfully identifies the latent concept and does step-by-step concept composition. In tasks parameterized by a continuous latent concept, we find low-dimensional subspaces in the representation space where the geometry mimics the underlying parameterization. Together, these results refine our understanding of ICL and the representation of transformers, and they provide evidence for highly localized structures in the model that disentangle latent concepts in ICL tasks.
Article 1078
Title@2025-06-20 (5): Mask-PINNs: Regulating Feature Distributions in Physics-Informed Neural Networks
Title: Mask-PINNs: Regulating Feature Distributions in Physics-Informed Neural Networks | Masken-PINNs: Regelbare Funktionsverteilungen in physikinformierten Neuronalen Netzwerken | Mask-PINNs:物理成形神经网络中规范地物分布 2505.06331v2 |
Authors (4): Feilong Jiang, Xiaonan Hou, Jianqiao Ye, Min Xia
Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving partial differential equations (PDEs) by embedding physical laws directly into the loss function. However, effective training of PINNs remains challenging due to internal covariate shift, which destabilizes feature distributions and impairs model expressiveness. While normalization techniques like Batch Normalization and Layer Normalization are standard remedies in deep learning, they disrupt the pointwise input-output mappings critical to preserving the physical consistency in PINNs. In this work, we introduce Mask-PINNs, a novel architecture that regulates internal feature distributions through a smooth, learnable mask function applied pointwise across hidden layers. Unlike conventional normalization methods, the proposed mask function preserves the deterministic nature of input-output relationships while suppressing activation drift and saturation. Theoretically, we demonstrate that Mask-PINNs control feature spread near initialization by attenuating gradient variance growth through a tailored modulation mechanism. Empirically, we validate the method on multiple PDE benchmarks across diverse activation functions. Our results show consistent improvements in prediction accuracy, convergence stability, and robustness, with relative L2 errors reduced by up to two orders of magnitude over baseline models. Furthermore, we demonstrate that Mask-PINNs enable the effective use of wider networks, overcoming a key limitation in existing PINN frameworks.
Article 1079
Title@2025-06-20 (5): PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval
Title: PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval | PromptDSI: Prompt-basiert Probefrei Instance-wise Incremental Learning for Document Retrieval | 快速DSI:为文件检索进行基于即时的无排练-不重复式递增学习 2406.12593v3 |
Authors (8): Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Yinwei Wei, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do
Differentiable Search Index (DSI) utilizes pre-trained language models to perform indexing and document retrieval via end-to-end learning without relying on external indexes. However, DSI requires full re-training to index new documents, causing significant computational inefficiencies. Continual learning (CL) offers a solution by enabling the model to incrementally update without full re-training. Existing CL solutions in document retrieval rely on memory buffers or generative models for rehearsal, which is infeasible when accessing previous training data is restricted due to privacy concerns. To this end, we introduce PromptDSI, a prompt-based, rehearsal-free continual learning approach for document retrieval. PromptDSI follows the Prompt-based Continual Learning (PCL) framework, using learnable prompts to efficiently index new documents without accessing previous documents or queries. To improve retrieval latency, we remove the initial forward pass of PCL, which otherwise greatly increases training and inference time, with a negligible trade-off in performance. Additionally, we introduce a novel topic-aware prompt pool that employs neural topic embeddings as fixed keys, eliminating the instability of prompt key optimization while maintaining competitive performance with existing PCL prompt pools. In a challenging rehearsal-free continual learning setup, we demonstrate that PromptDSI variants outperform rehearsal-based baselines, match the strong cache-based baseline in mitigating forgetting, and significantly improve retrieval performance on new corpora.
Article 1080
Title@2025-06-20 (5): RocketStack: A level-aware deep recursive ensemble learning framework with exploratory feature fusion and model pruning dynamics
Title: RocketStack: A level-aware deep recursive ensemble learning framework with exploratory feature fusion and model pruning dynamics | RocketStack: Ein level-aware tiefe rekursives Ensemble Lernrahmen mit Sondierungsfunktion Fusion und Modellschneiden Dynamik | 火箭堆:一个具有探索性聚集和模型排出动态的深深有觉知的循环深层共聚学习框架 2506.16965v1 |
Authors (1): Çağatay Demirel
Ensemble learning remains a cornerstone of machine learning, with stacking used to integrate predictions from multiple base learners through a meta-model. However, deep stacking remains rare, as most designs prioritize horizontal diversity over recursive depth due to model complexity, feature redundancy, and computational burden. To address these challenges, RocketStack, a level-aware recursive ensemble framework, is introduced and explored up to ten stacking levels, extending beyond prior architectures. The framework incrementally prunes weaker learners at each level, enabling deeper stacking without excessive complexity. To mitigate early performance saturation, mild Gaussian noise is added to out-of-fold (OOF) scores before pruning, and this variant is compared against strict OOF pruning. Furthermore, both per-level and periodic feature compression are explored using attention-based selection, the Simple, Fast, Efficient (SFE) filter, and autoencoders. Across 33 datasets (23 binary, 10 multi-class), linear-trend tests confirmed rising accuracy with depth in most variants, and the top-performing meta-model at each level increasingly outperformed the strongest standalone ensemble. In the binary subset, periodic SFE with mild OOF-score randomization reached 97.08% at level 10, 5.14% above the strict-pruning configuration, and cut runtime by 10.5% relative to no compression. In the multi-class subset, periodic attention selection reached 98.60% at level 10, exceeding the strongest baseline by 6.11%, while reducing runtime by 56.1% and feature dimensionality by 74% compared to no compression. These findings highlight mild randomization as an effective regularizer and periodic compression as a stabilizer. Echoing the design of multistage rockets in aerospace (prune, compress, propel), RocketStack achieves deep recursive ensembling with tractable complexity.
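The pruning step with randomized OOF scores might be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `prune_learners`, the top-fraction rule, and the values of `keep_fraction` and `noise_std` are all assumptions made for the example.

```python
import random

def prune_learners(oof_scores, keep_fraction=0.5, noise_std=0.0, seed=0):
    """Return the names of the learners kept after one pruning step.

    oof_scores: dict mapping learner name -> out-of-fold score (higher is better).
    The top-fraction rule and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Perturb each score with Gaussian noise before ranking;
    # noise_std=0.0 corresponds to strict OOF pruning.
    noisy = {name: s + rng.gauss(0.0, noise_std) for name, s in oof_scores.items()}
    n_keep = max(1, round(len(noisy) * keep_fraction))
    ranked = sorted(noisy, key=noisy.get, reverse=True)
    return ranked[:n_keep]

# Hypothetical OOF scores for four base learners.
scores = {"rf": 0.91, "xgb": 0.93, "svm": 0.88, "knn": 0.85}
strict = prune_learners(scores, keep_fraction=0.5, noise_std=0.0)
noisy = prune_learners(scores, keep_fraction=0.5, noise_std=0.02, seed=1)
```

With noise, a learner that ranks slightly below the cut can occasionally survive to the next level, which is one plausible reading of why mild randomization acts as a regularizer.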
Article 1081
Title@2025-06-20 (5): LogProber: Disentangling confidence from contamination in LLM responses
Title: LogProber: Disentangling confidence from contamination in LLM responses | LogProber: Entwirren des Vertrauens in LLM-Antworten | 日志Prober:解除对LLM反应中污染的信心 2408.14352v3 |
Authors (3): Nicolas Yax, Pierre-Yves Oudeyer, Stefano Palminteri
In machine learning, contamination refers to situations where testing data leak into the training set. The issue is particularly relevant for evaluating the performance of Large Language Models (LLMs), which are generally trained on gargantuan, and generally opaque, corpora of text scraped from the world wide web. Developing tools to detect contamination is therefore crucial for fairly and properly tracking the evolution of LLM performance. To date, only a few recent studies have attempted to address the issue of quantifying and detecting contamination in short text sequences, such as those commonly found in benchmarks. However, these methods have limitations that can sometimes render them impractical. In the present paper, we introduce LogProber, a novel, efficient algorithm that we show can detect contamination in a black-box setting and that tries to tackle some of these drawbacks by focusing on familiarity with the question rather than the answer. Here, we explore the properties of the proposed method in comparison with competing approaches, identify its advantages and limitations, and illustrate how different forms of contamination can go undetected depending on the design of the detection algorithm.
Article 1082
Title@2025-06-20 (5): Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review
Title: Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review | Methoden des maschinellen Lernens für Anwendungen der kleinen Daten- und Upstream-Bioverarbeitung: Ein umfassender Überblick | 小型数据和上游生物处理应用的机械学习方法:全面审查 2506.12322v2 |
Authors (4): Johnny Peng, Thanh Tung Khuat, Katarzyna Musial, Bogdan Gabrys
Data is crucial for machine learning (ML) applications, yet acquiring large datasets can be costly and time-consuming, especially in complex, resource-intensive fields like biopharmaceuticals. A key process in this industry is upstream bioprocessing, where living cells are cultivated and optimised to produce therapeutic proteins and biologics. The intricate nature of these processes, combined with high resource demands, often limits data collection, resulting in smaller datasets. This comprehensive review explores ML methods designed to address the challenges posed by small data and classifies them into a taxonomy to guide practical applications. Furthermore, each method in the taxonomy was thoroughly analysed, with a detailed discussion of its core concepts and an evaluation of its effectiveness in tackling small data challenges, as demonstrated by application results in the upstream bioprocessing and other related domains. By analysing how these methods tackle small data challenges from different perspectives, this review provides actionable insights, identifies current research gaps, and offers guidance for leveraging ML in data-constrained environments.
Article 1083
Title@2025-06-20 (5): LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
Title: LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models | LAION-C: Ein Out-of-Distribution-Benchmark für Web-Scale Vision-Modelle | LAION-C:网络规模愿景模型的分发外基准 2506.16950v1 |
Authors (5): Fanfei Li, Thomas Klein, Wieland Brendel, Robert Geirhos, Roland S. Zimmermann
Out-of-distribution (OOD) robustness is a desired property of computer vision models. Improving model robustness requires high-quality signals from robustness benchmarks to quantify progress. While various benchmark datasets such as ImageNet-C were proposed in the ImageNet era, most ImageNet-C corruption types are no longer OOD relative to today’s large, web-scraped datasets, which already contain common corruptions such as blur or JPEG compression artifacts. Consequently, these benchmarks are no longer well-suited for evaluating OOD robustness in the era of web-scale datasets. Indeed, recent models show saturating scores on ImageNet-era OOD benchmarks, indicating that it is unclear whether models trained on web-scale datasets truly become better at OOD generalization or whether they have simply been exposed to the test distortions during training. To address this, we introduce LAION-C as a benchmark alternative for ImageNet-C. LAION-C consists of six novel distortion types specifically designed to be OOD, even for web-scale datasets such as LAION. In a comprehensive evaluation of state-of-the-art models, we find that the LAION-C dataset poses significant challenges to contemporary models, including MLLMs such as Gemini and GPT-4o. We additionally conducted a psychophysical experiment to evaluate the difficulty of our corruptions for human observers, enabling a comparison of models to lab-quality human robustness data. We observe a paradigm shift in OOD generalization: from humans outperforming models, to the best models now matching or outperforming the best human observers.
Article 1084
Title@2025-06-20 (5): Solving a class of stochastic optimal control problems by physics-informed neural networks
Title: Solving a class of stochastic optimal control problems by physics-informed neural networks | Lösung einer Klasse stochastischer optimaler Kontrollprobleme durch physikinformierte neuronale Netzwerke | 通过物理知情神经网络解决一系列随机最佳控制问题 2402.15592v2 |
Authors (3): Zhe Jiao, Wantao Jia, Weiqiu Zhu
The aim of this work is to develop a deep learning method for solving high-dimensional stochastic control problems based on the Hamilton–Jacobi–Bellman (HJB) equation and physics-informed learning. Our approach is to parameterize the feedback control and the value function using a decoupled neural network with multiple outputs. We train this network by using a loss function with penalty terms that enforce the HJB equation along the sampled trajectories generated by the controlled system. Numerical experiments on various applications demonstrate that the proposed approach is efficient and applicable.
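For orientation, the HJB equation for a generic finite-horizon stochastic control problem can be written as below. The abstract does not state the problem class, so the drift $b$, diffusion $\sigma$, running cost $f$, terminal cost $g$, and horizon $T$ are assumed notation for a textbook setting rather than the paper's formulation.

```latex
\begin{aligned}
dX_t &= b(X_t,u_t)\,dt + \sigma(X_t,u_t)\,dW_t,\\
V(t,x) &= \inf_{u}\ \mathbb{E}\Big[\int_t^T f(X_s,u_s)\,ds + g(X_T)\,\Big|\,X_t=x\Big],\\
0 &= \partial_t V + \inf_{u}\Big\{ b(x,u)\cdot\nabla_x V
      + \tfrac12 \operatorname{tr}\big(\sigma\sigma^{\top}(x,u)\,\nabla_x^2 V\big) + f(x,u)\Big\},
\qquad V(T,x)=g(x).
\end{aligned}
```

A physics-informed loss in this setting would typically penalize the squared residual of the last line, evaluated at points sampled along simulated trajectories of $X_t$, together with the mismatch in the terminal condition.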
Article 1085
Title@2025-06-20 (5): Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs
Title: Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs | Kalibrierte vorausschauende untere Bounds zur Zeit-zu-Unsicher-Probenahme in LLMs | LLM 中时间到非安全抽样时对低频谱校准的预测值下下界 2506.13593v2 |
Authors (4): Hen Davidov, Gilad Freidkin, Shai Feldman, Yaniv Romano
We develop a framework to quantify the time-to-unsafe-sampling - the number of large language model (LLM) generations required to trigger an unsafe (e.g., toxic) response. Estimating this quantity is challenging, since unsafe responses are exceedingly rare in well-aligned LLMs, potentially occurring only once in thousands of generations. As a result, directly estimating time-to-unsafe-sampling would require collecting training data with a prohibitively large number of generations per prompt. However, with realistic sampling budgets, we often cannot generate enough responses to observe an unsafe outcome for every prompt, leaving the time-to-unsafe-sampling unobserved in many cases, making the estimation and evaluation tasks particularly challenging. To address this, we frame this estimation problem as one of survival analysis and develop a provably calibrated lower predictive bound (LPB) on the time-to-unsafe-sampling of a given prompt, leveraging recent advances in conformal prediction. Our key innovation is designing an adaptive, per-prompt sampling strategy, formulated as a convex optimization problem. The objective function guiding this optimized sampling allocation is designed to reduce the variance of the estimators used to construct the LPB, leading to improved statistical efficiency over naive methods that use a fixed sampling budget per prompt. Experiments on both synthetic and real data support our theoretical results and demonstrate the practical utility of our method for safety risk assessment in generative AI models.
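The censoring issue described above can be illustrated with a small simulation. It is a sketch under a simplifying assumption that is not taken from the paper: each generation for a prompt is unsafe independently with a fixed probability, so the time-to-unsafe-sampling is geometric. The names `sample_time_to_unsafe`, `p_unsafe`, and `budget` are hypothetical.

```python
import random

def sample_time_to_unsafe(p_unsafe, budget, rng):
    """Generate until the first unsafe response or until the budget runs out.

    Returns (observed_time, event_observed). If no unsafe response appears
    within `budget` generations, the time is right-censored at `budget`.
    """
    for t in range(1, budget + 1):
        if rng.random() < p_unsafe:
            return t, True
    return budget, False

rng = random.Random(0)
p_unsafe = 0.001   # hypothetical per-generation probability of an unsafe response
budget = 200       # hypothetical per-prompt sampling budget
draws = [sample_time_to_unsafe(p_unsafe, budget, rng) for _ in range(1000)]

# Fraction of prompts whose time-to-unsafe-sampling was never observed.
censored_fraction = sum(1 for _, seen in draws if not seen) / len(draws)
# Under the i.i.d. assumption, P(T > budget) = (1 - p_unsafe) ** budget.
expected_censored = (1 - p_unsafe) ** budget
```

With these values most prompts are censored, which is why the problem is naturally framed as survival analysis rather than direct regression on observed times.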
Article 1086
Title@2025-06-20 (5): Gaussian Processes and Reproducing Kernels: Connections and Equivalences
Title: Gaussian Processes and Reproducing Kernels: Connections and Equivalences | Gaußsche Prozesse und reproduzierende Kerne: Verbindungen und Äquivalenzen | 高斯进程和再生产核心:连接和等效 2506.17366v1 |
Authors (4): Motonobu Kanagawa, Philipp Hennig, Dino Sejdinovic, Bharath K. Sriperumbudur
This monograph studies the relations between two approaches using positive definite kernels: probabilistic methods using Gaussian processes, and non-probabilistic methods using reproducing kernel Hilbert spaces (RKHS). They are widely studied and used in machine learning, statistics, and numerical analysis. Connections and equivalences between them are reviewed for fundamental topics such as regression, interpolation, numerical integration, distributional discrepancies, and statistical dependence, as well as for sample path properties of Gaussian processes. A unifying perspective for these equivalences is established, based on the equivalence between the Gaussian Hilbert space and the RKHS. The monograph serves as a basis to bridge many other methods based on Gaussian processes and reproducing kernels, which are developed in parallel by the two research communities.
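The best-known instance of such an equivalence, for regression, can be stated as follows. The notation is assumed here rather than taken from the monograph: $K_{XX}$ is the kernel matrix on the training inputs, $k_{xX}$ the row vector of kernel evaluations between a test point $x$ and the training inputs, $\sigma^2$ the noise variance, and $\lambda$ the regularization constant.

```latex
\begin{aligned}
\text{GP posterior mean:}\quad
  & m(x) = k_{xX}\,\big(K_{XX} + \sigma^2 I_n\big)^{-1} y,
  && f \sim \mathcal{GP}(0,k),\ \ y_i = f(x_i) + \varepsilon_i,\ \ \varepsilon_i \sim \mathcal{N}(0,\sigma^2),\\
\text{Kernel ridge regression:}\quad
  & \hat f(x) = k_{xX}\,\big(K_{XX} + n\lambda I_n\big)^{-1} y,
  && \hat f = \arg\min_{f\in\mathcal{H}_k}\ \frac1n\sum_{i=1}^n \big(f(x_i)-y_i\big)^2 + \lambda\,\|f\|_{\mathcal{H}_k}^2 .
\end{aligned}
```

The two predictors coincide when $\sigma^2 = n\lambda$, so the noise variance on the probabilistic side plays the role of the regularization constant on the RKHS side.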
Article 1087
Title@2025-06-20 (5): Enhancing Expressivity of Quantum Neural Networks Based on the SWAP test
Title: Enhancing Expressivity of Quantum Neural Networks Based on the SWAP test | Steigerung der Expressivität von Quantum-Neuralen Netzwerken auf Basis des SWAP-Tests | 根据全部门办法测试,提高量子神经网络的表达性 2506.16938v1 |
Authors (4): Sebastian Nagies, Emiliano Tolotti, Davide Pastorello, Enrico Blanzieri
Parameterized quantum circuits represent promising architectures for machine learning applications, yet many lack clear connections to classical models, potentially limiting their ability to translate the wide success of classical neural networks to the quantum realm. We examine a specific type of quantum neural network (QNN) built exclusively from SWAP test circuits, and discuss its mathematical equivalence to a classical two-layer feedforward network with quadratic activation functions under amplitude encoding. Our analysis across classical real-world and synthetic datasets reveals that while this architecture can successfully learn many practical tasks, it exhibits fundamental expressivity limitations due to violating the universal approximation theorem, particularly failing on harder problems like the parity check function. To address this limitation, we introduce a circuit modification using generalized SWAP test circuits that effectively implements classical neural networks with product layers. This enhancement enables successful learning of parity check functions in arbitrary dimensions which we analytically argue to be impossible for the original architecture beyond two dimensions regardless of network size. Our results establish a framework for enhancing QNN expressivity through classical task analysis and demonstrate that our SWAP test-based architecture offers broad representational capacity, suggesting potential promise also for quantum learning tasks.
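The link to quadratic activations comes from the standard SWAP test statistic: the ancilla is measured in state 0 with probability 1/2 + |⟨x|w⟩|²/2. The sketch below evaluates this classically for real, amplitude-encoded vectors. It only illustrates that formula, not the paper's network, and the helper names are hypothetical.

```python
import math

def normalize(v):
    """Scale a real vector to unit norm, as amplitude encoding requires."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def swap_test_p0(x, w):
    """Probability of measuring the ancilla in |0> in a SWAP test on real states."""
    overlap = sum(a * b for a, b in zip(normalize(x), normalize(w)))
    # The overlap is a linear pre-activation; squaring it is the quadratic activation.
    return 0.5 + 0.5 * overlap ** 2

p_same = swap_test_p0([1.0, 2.0], [2.0, 4.0])   # parallel vectors
p_orth = swap_test_p0([1.0, 0.0], [0.0, 1.0])   # orthogonal vectors
```

Because the output depends only on the squared overlap, flipping the sign of `w` leaves it unchanged. That symmetry gives some intuition for why a network built only from such units has limited expressivity.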
Article 1088
Title@2025-06-20 (5): A deep learning and machine learning approach to predict neonatal death in the context of São Paulo
Title: A deep learning and machine learning approach to predict neonatal death in the context of São Paulo | Ein tiefer Lern- und maschineller Lernansatz zur Vorhersage des neonatalen Todes im Kontext von São Paulo | 在圣保罗背景下预测新生儿死亡的深层学习和机器学习方法 2506.16929v1 |
Authors (9): Mohon Raihan, Plabon Kumar Saha, Rajan Das Gupta, A Z M Tahmidul Kabir, Afia Anjum Tamanna, Md. Harun-Ur-Rashid, Adnan Bin Abdus Salam, Md Tanvir Anjum, A Z M Ahteshamul Kabir
Neonatal death is still a concerning reality for underdeveloped and even some developed countries. Worldwide data indicate that 26.693 babies out of 1,000 births die, according to Macro Trades. To reduce this number, early prediction of endangered babies is crucial. Such prediction makes it possible to take ample care of the child and mother so that early child death can be avoided. In this context, machine learning was used to determine whether a newborn baby is at risk. To train the predictive model, historical data on 1.4 million newborns were used. Machine learning and deep learning techniques such as logistic regression, K-nearest neighbor, random forest classifier, extreme gradient boosting (XGBoost), convolutional neural network, and long short-term memory (LSTM) were implemented using the dataset to identify the most accurate model for predicting neonatal mortality. Among the machine learning algorithms, XGBoost and the random forest classifier achieved the best accuracy at 94%, while among the deep learning models, LSTM delivered the highest accuracy at 99%. Therefore, LSTM appears to be the most suitable approach to predict whether precautionary measures for a child are necessary.
Article 1089
Title@2025-06-20 (5): A Neural Operator based Hybrid Microscale Model for Multiscale Simulation of Rate-Dependent Materials
Title: A Neural Operator based Hybrid Microscale Model for Multiscale Simulation of Rate-Dependent Materials | Ein neurales Operator-basiertes Hybrid-Mikroskalen-Modell zur Multiskalen-Simulation von ratenabhängigen Materialien | 以神经操作器为基础的多级制模调依赖材料多级模拟混合微型模型 2506.16918v1 |
Authors (6): Dhananjeyan Jeyaraj, Hamidreza Eivazi, Jendrik-Alexander Tröger, Stefan Wittek, Stefan Hartmann, Andreas Rausch
The behavior of materials is influenced by a wide range of phenomena occurring across various time and length scales. To better understand the impact of microstructure on macroscopic response, multiscale modeling strategies are essential. Numerical methods, such as the $\text{FE}^2$ approach, account for micro-macro interactions to predict the global response in a concurrent manner. However, these methods are computationally intensive due to the repeated evaluations of the microscale. This challenge has led to the integration of deep learning techniques into computational homogenization frameworks to accelerate multiscale simulations. In this work, we employ neural operators to predict the microscale physics, resulting in a hybrid model that combines data-driven and physics-based approaches. This allows for physics-guided learning and provides flexibility for different materials and spatial discretizations. We apply this method to time-dependent solid mechanics problems involving viscoelastic material behavior, where the state is represented by internal variables only at the microscale. The constitutive relations of the microscale are incorporated into the model architecture and the internal variables are computed based on established physical principles. The results for homogenized stresses ($<6\%$ error) show that the approach is computationally efficient ($\sim 100 \times$ faster).
Article 1090
Title@2025-06-20 (5): Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs
Title: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs | Robuste Gradienten für POMDPs mit verstecktem Modell | 隐藏模式 POMDPs 的硬性有限记忆政策梯度 2505.09518v2 |
Authors (5): Maris F. L. Galesloot, Roman Andriushchenko, Milan Češka, Sebastian Junges, Nils Jansen
Partially observable Markov decision processes (POMDPs) model specific environments in sequential decision-making under uncertainty. Critically, optimal policies for POMDPs may not be robust against perturbations in the environment. Hidden-model POMDPs (HM-POMDPs) capture sets of different environment models, that is, POMDPs with a shared action and observation space. The intuition is that the true model is hidden among a set of potential models, and it is unknown which model will be the environment at execution time. A policy is robust for a given HM-POMDP if it achieves sufficient performance for each of its POMDPs. We compute such robust policies by combining two orthogonal techniques: (1) a deductive formal verification technique that supports tractable robust policy evaluation by computing a worst-case POMDP within the HM-POMDP, and (2) subgradient ascent to optimize the candidate policy for a worst-case POMDP. The empirical evaluation shows that, compared to various baselines, our approach (1) produces policies that are more robust and generalize better to unseen POMDPs, and (2) scales to HM-POMDPs that consist of over a hundred thousand environments.
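The alternation between worst-case evaluation and subgradient ascent can be sketched on a toy problem. Everything here is an illustrative assumption: each "model" is reduced to a pair of rewards for two actions, the policy is a single probability `theta`, and the worst case is found by plain enumeration rather than by the formal verification technique the abstract describes.

```python
def value(model, theta):
    """Expected reward of playing action 1 with probability theta."""
    reward_a0, reward_a1 = model
    return (1.0 - theta) * reward_a0 + theta * reward_a1

def worst_case(models, theta):
    """Model in which the current policy performs worst (by enumeration)."""
    return min(models, key=lambda m: value(m, theta))

def robust_subgradient_ascent(models, theta=0.5, lr=0.05, steps=200):
    """Repeatedly improve the policy against its current worst-case model."""
    for _ in range(steps):
        reward_a0, reward_a1 = worst_case(models, theta)
        subgradient = reward_a1 - reward_a0      # d value / d theta
        theta = min(1.0, max(0.0, theta + lr * subgradient))
    return theta

# Two toy environments that prefer opposite actions.
models = [(1.0, 0.0), (0.0, 1.0)]
theta_star = robust_subgradient_ascent(models, theta=0.9)
print(theta_star, min(value(m, theta_star) for m in models))
```

With a constant step size the iterate hovers around the robust optimum instead of converging exactly; a decaying step size would tighten this.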
Article 1091
Title@2025-06-20 (5): From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts
Title: From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts | Von Daten zu Wissen: Bewertung, wie effizient Sprachmodelle Fakten lernen | 从数据到知识:评价如何高效语言模式学习事实 2506.16912v1 |
Authors (4): Daniel Christoph, Max Ploner, Patrick Haller, Alan Akbik
Sample efficiency is a crucial property of language models with practical implications for training efficiency. In real-world text, information follows a long-tailed distribution. Yet, we expect models to learn and recall frequent and infrequent facts. Sample-efficient models are better equipped to handle this challenge of learning and retaining rare information without requiring excessive exposure. This study analyzes multiple models of varying architectures and sizes, all trained on the same pre-training data. By annotating relational facts with their frequencies in the training corpus, we examine how model performance varies with fact frequency. Our findings show that most models perform similarly on high-frequency facts but differ notably on low-frequency facts. This analysis provides new insights into the relationship between model architecture, size, and factual learning efficiency.
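The frequency-stratified analysis described above can be sketched as a simple bucketing step. The records below are made-up toy values, and the power-of-ten bucketing is an assumption for illustration; the abstract does not specify how frequencies are binned.

```python
from collections import defaultdict

def accuracy_by_frequency(records):
    """Group (corpus_frequency, is_correct) pairs by order of magnitude."""
    buckets = defaultdict(lambda: [0, 0])        # bucket -> [hits, total]
    for frequency, is_correct in records:
        bucket = len(str(int(frequency))) - 1    # 0: 1-9, 1: 10-99, ...
        buckets[bucket][0] += int(is_correct)
        buckets[bucket][1] += 1
    return {b: hits / total for b, (hits, total) in sorted(buckets.items())}

# Toy data: (how often the fact occurs in the corpus, did the model recall it)
toy_records = [(3, False), (7, True), (40, True), (55, False),
               (900, True), (1200, True)]
per_bucket = accuracy_by_frequency(toy_records)
print(per_bucket)
```

Comparing such per-bucket curves across models is one way to see differences that an aggregate accuracy number would hide.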
Article 1092
Title@2025-06-20 (5): Graph is all you need? Lightweight data-agnostic neural architecture search without training
Title: Graph is all you need? Lightweight data-agnostic neural architecture search without training | Graph ist alles, was Sie brauchen? Leichte daten-agnostische neuronale Architektur-Suche ohne Training | 轻量数据神经神经结构搜索,不经过训练 2405.01306v2 |
Authors (5): Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Chunheng Jiang, Jianxi Gao
Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and lightweight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench-201 in 217 CPU seconds. In addition, our method achieves competitive performance on various datasets including the NAS-Bench-101, NAS-Bench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.
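The proxy itself is cheap to compute once an architecture has been turned into a graph. The sketch below assumes an undirected graph given as an edge list and, purely for illustration, ranks candidates so that a higher average degree is preferred; how nasgraph actually builds the graph from an architecture is not shown here, and the candidate names are hypothetical.

```python
def average_degree(num_nodes, edges):
    """Average degree of an undirected graph: 2 * |E| / |V|."""
    return 2 * len(edges) / num_nodes

# Hypothetical candidates: (number of nodes, undirected edge list)
candidates = {
    "arch_a": (4, [(0, 1), (1, 2), (2, 3)]),
    "arch_b": (4, [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]),
}

scores = {name: average_degree(*graph) for name, graph in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

No training data or forward pass is involved, which is what makes a proxy of this kind data-agnostic.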
Article 1093
Title@2025-06-20 (5): RCNet: $ΔΣ$ IADCs as Recurrent AutoEncoders
Title: RCNet: $ΔΣ$ IADCs as Recurrent AutoEncoders | RCNet: $ΔΣ$ IADCs als recurrent AutoEncoder | RCNet:作为循环自编码器的 $ΔΣ$ IADC 2506.16903v1 |
Authors (3): Arnaud Verdant, William Guicquero, Jérôme Chossat
This paper proposes a deep learning model (RCNet) for Delta-Sigma ($\Delta\Sigma$) ADCs. Recurrent Neural Networks (RNNs) can describe both modulators and filters. This analogy is applied to Incremental ADCs (IADC). High-end optimizers combined with full-custom losses are used to define additional hardware design constraints: quantized weights, signal saturation, temporal noise injection, and device area. Focusing on DC conversion, our early results demonstrate that $SNR$ defined as an Effective Number Of Bits (ENOB) can be optimized under a certain hardware mapping complexity. The proposed RCNet succeeded in providing design tradeoffs in terms of $SNR$ ($>$13bit) versus area constraints ($<$14pF total capacitor) at a given $OSR$ (80 samples). Interestingly, it appears that the best RCNet architectures do not necessarily rely on high-order modulators, leveraging additional topology exploration degrees of freedom.
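The modulator-as-recurrence analogy can be made concrete with a textbook first-order incremental converter. This is a minimal sketch, not the RCNet model: it assumes an ideal integrator, a 1-bit quantizer, a DC input in [-1, 1], and a plain counter as the decimation filter.

```python
def incremental_adc(dc_input, osr=80):
    """First-order incremental Delta-Sigma conversion of a DC input."""
    state = 0.0            # integrator output, i.e. the recurrent hidden state
    bit_sum = 0.0
    for _ in range(osr):
        bit = 1.0 if state >= 0.0 else -1.0   # 1-bit quantizer
        state = state + dc_input - bit        # recurrent state update
        bit_sum += bit                        # counter acts as the filter
    return bit_sum / osr                      # estimate of the DC input

estimate = incremental_adc(0.3, osr=80)
print(estimate)
```

The loop body has the shape of an RNN cell: a state carried across time steps, a nonlinearity (the quantizer), and a readout accumulated over the oversampling window.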
Article 1094
Title@2025-06-20 (5): On Almost Surely Safe Alignment of Large Language Models at Inference-Time
Title: On Almost Surely Safe Alignment of Large Language Models at Inference-Time | Zur fast sicher sicheren Ausrichtung großer Sprachmodelle bei Inferenz-Time | 在推断时几乎可以安全地统一大语言模型 2502.01208v3 |
Authors (6): Xiaotong Ji, Shyam Sundhar Ramesh, Matthieu Zimmer, Ilija Bogunovic, Jun Wang, Haitham Bou Ammar
We introduce a novel inference-time alignment approach for LLMs that aims to generate safe responses almost surely, i.e., with probability approaching one. Our approach models the generation of safe responses as a constrained Markov Decision Process (MDP) within the LLM’s latent space. We augment a safety state that tracks the evolution of safety constraints and dynamically penalize unsafe generations to ensure the generation of safe responses. Consequently, we demonstrate formal safety guarantees w.r.t. the given cost model upon solving the MDP in the latent space with sufficiently large penalties. Building on this foundation, we propose InferenceGuard, a practical implementation that safely aligns LLMs without modifying the model weights. Empirically, we demonstrate that InferenceGuard effectively balances safety and task performance, outperforming existing inference-time alignment methods in generating safe and aligned responses. Our findings contribute to the advancement of safer LLM deployment through alignment at inference-time, thus presenting a promising alternative to resource-intensive, overfitting-prone alignment techniques like RLHF.
Article 1095
Title@2025-06-20 (5): With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You
Title: With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You | Mit begrenzten Daten für multimodale Ausrichtung, lassen Sie die STRUKTUR-Leitfaden Sie | 以有限数据实现多式联运对齐,让结构引导你 2506.16895v1 |
Authors (4): Fabian Gröger, Shuo Wen, Huyen Le, Maria Brbić
Multimodal models have demonstrated powerful capabilities in complex tasks requiring multimodal alignment including zero-shot classification and cross-modal retrieval. However, existing models typically rely on millions of paired multimodal samples, which are prohibitively expensive or infeasible to obtain in many domains. In this work, we explore the feasibility of building multimodal models with a limited amount of paired data by aligning pretrained unimodal foundation models. We show that high-quality alignment is possible with as few as tens of thousands of paired samples, less than $1\%$ of the data typically used in the field. To achieve this, we introduce STRUCTURE, an effective regularization technique that preserves the neighborhood geometry of the latent space of unimodal encoders. Additionally, we show that aligning last layers is often suboptimal and demonstrate the benefits of aligning the layers with the highest representational similarity across modalities. These two components can be readily incorporated into existing alignment methods, yielding substantial gains across 24 zero-shot image classification and retrieval benchmarks, with average relative improvement of $51.6\%$ in classification and $91.8\%$ in retrieval tasks. Our results highlight the effectiveness and broad applicability of our framework for limited-sample multimodal learning and offer a promising path forward for resource-constrained domains.
Article 1096
Title@2025-06-20 (5): LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment
Title: LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | LearnAlign: Grundlegende Datenauswahl für Verstärkungslernen in großen Sprachmodellen basierend auf verbesserter Gradient Alignment | 学习对称:根据改进梯度对齐,为在大语言模式中强化学习选择理由数据 2506.11480v2 |
Authors (8): Shikun Li, Shipeng Li, Zhiqin Yang, Xinghua Zhang, Gaode Chen, Xiaobo Xia, Hengyu Liu, Zhe Peng
Reinforcement learning (RL) has become a key technique for enhancing LLMs’ reasoning abilities, yet its data inefficiency remains a major bottleneck. To address this critical yet challenging issue, we present a novel gradient-alignment-based method, named LearnAlign, which intelligently selects learnable and representative reasoning data for RL post-training. To overcome the issue of response-length bias in gradient norms, we introduce a data learnability measure based on the success rate, which can indicate the learning potential of each data point. Experiments across three mathematical reasoning benchmarks demonstrate that our method significantly reduces training data requirements while incurring only minor performance degradation, or even improving performance, compared to full-data training. For example, on the GSM8K benchmark it reduces data requirements by up to 1,000 data points while outperforming training on the full dataset (77.53% vs. 77.04%). Furthermore, we show its effectiveness in the staged RL setting. This work provides valuable insights into data-efficient RL post-training and establishes a foundation for future research in optimizing reasoning data selection. To facilitate future work, we will release code.
Article 1097
Title@2025-06-20 (5): From Lab to Factory: Pitfalls and Guidelines for Self-/Unsupervised Defect Detection on Low-Quality Industrial Images
Title: From Lab to Factory: Pitfalls and Guidelines for Self-/Unsupervised Defect Detection on Low-Quality Industrial Images | Vom Labor zur Fabrik: Pitfalls und Richtlinien für selbst-/unüberwachte Fehlererkennung auf niederqualitativen Industriebildern | 从实验室到工厂:坑和低质量工业形象自我/不受监督的缺陷探测准则 2506.16890v1 |
Authors (2): Sebastian Hönel, Jonas Nordqvist
The detection and localization of quality-related problems in industrially mass-produced products has historically relied on manual inspection, which is costly and error-prone. Machine learning has the potential to replace manual handling. As such, the desire is to facilitate an unsupervised (or self-supervised) approach, as it is often impossible to specify all conceivable defects ahead of time. A plethora of prior works have demonstrated the aptitude of common reconstruction-, embedding-, and synthesis-based methods in laboratory settings. However, in practice, we observe that most methods do not handle low data quality well or exhibit low robustness in unfavorable but typical real-world settings. For practitioners it may be very difficult to identify the actual underlying problem when such methods underperform. Worse, often-reported metrics (e.g., AUROC) are rarely suitable in practice and may give misleading results. In our setting, we attempt to identify subtle anomalies on the surface of blasted forged metal parts, using only rather low-quality RGB imagery, which is a common industrial setting. We specifically evaluate two types of state-of-the-art models that allow us to identify and improve quality issues in production data, without having to obtain new data. Our contribution is to provide guardrails for practitioners that allow them to reliably identify problems related to, e.g., (lack of) robustness or invariance, in either the chosen model or the data, in similar scenarios. Furthermore, we exemplify common pitfalls in and shortcomings of likelihood-based approaches and outline a framework for proper empirical risk estimation that is more suitable for real-world scenarios.
Article 1098
Title@2025-06-20 (5): Stable Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders
Title: Stable Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders | Stabiles Lernen mit Spiking Neuronal Networks ausgestattet mit Affine Encodern und Decodern | 利用利用配有仙形编码器和代碼器的螺旋神经网络进行的稳定学习 2404.04549v3 |
Authors (3): A. Martina Neuman, Dominik Dold, Philipp Christian Petersen
We study the learning problem associated with spiking neural networks. Specifically, we focus on spiking neural networks composed of simple spiking neurons having only positive synaptic weights, equipped with an affine encoder and decoder; we refer to these as affine spiking neural networks. These neural networks are shown to depend continuously on their parameters, which facilitates classical covering number-based generalization statements and supports stable gradient-based training. We demonstrate that the positivity of the weights enables a wide range of expressivity results, including rate-optimal approximation of smooth functions and dimension-independent approximation of Barron regular functions. In particular, we show in theory and simulations that affine spiking neural networks are capable of approximating shallow ReLU neural networks. Furthermore, we apply these affine spiking neural networks to standard machine learning benchmarks and reach competitive results. Finally, we observe that from a generalization perspective, contrary to feedforward neural networks or previous results for general spiking neural networks, the depth has little to no adverse effect on the generalization capabilities.
Article 1099
Title@2025-06-20 (5): Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension
Title: Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension | Diskrepanzen sind Tugend: Schwach-zu-starke Verallgemeinerung durch Lens der Intrinsischen Dimension | 差异是道德:通过内分泌尺寸的透镜对电压的微弱普遍化 2502.05075v5 |
Authors (5): Yijun Dong, Yicheng Li, Yunai Li, Jason D. Lee, Qi Lei
Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student-weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in $\mathcal{V}_s \cap \mathcal{V}_w$, while reduced by a factor of $\mathrm{dim}(\mathcal{V}_s)/N$ in the subspace of discrepancy $\mathcal{V}_w \setminus \mathcal{V}_s$ with $N$ pseudo-labels for W2S. Our analysis further casts light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported by experiments on synthetic regression problems, as well as real vision and NLP tasks.
Article 1100
Title@2025-06-20 (5): The Importance of Being Lazy: Scaling Limits of Continual Learning
Title: The Importance of Being Lazy: Scaling Limits of Continual Learning | Die Bedeutung des Faulseins: Skalierungsgrenzen des kontinuierlichen Lernens | 懒惰的重要性:持续学习的局限性 2506.16884v1 |
Authors (5): Jacopo Graldi, Alessandro Breccia, Giulia Lanzillotta, Thomas Hofmann, Lorenzo Noci
Despite recent efforts, neural networks still struggle to learn in non-stationary environments, and our understanding of catastrophic forgetting (CF) is far from complete. In this work, we perform a systematic study on the impact of model scale and the degree of feature learning in continual learning. We reconcile existing contradictory observations on scale in the literature, by differentiating between lazy and rich training regimes through a variable parameterization of the architecture. We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness. Using the framework of dynamical mean field theory, we then study the infinite width dynamics of the model in the feature learning regime and characterize CF, extending prior theoretical results limited to the lazy regime. We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks. We identify a transition modulated by task similarity where the model exits an effectively lazy regime with low forgetting to enter a rich regime with significant forgetting. Finally, our findings reveal that neural networks achieve optimal performance at a critical level of feature learning, which depends on task non-stationarity and transfers across model scales. This work provides a unified perspective on the role of scale and feature learning in continual learning.
Article 1101
Title@2025-06-20 (5): Efficient Feedback Gate Network for Hyperspectral Image Super-Resolution
Title: Efficient Feedback Gate Network for Hyperspectral Image Super-Resolution | Effizientes Feedback Gate-Netzwerk für Hyperspektrale Bild-Super-Resolution | 超光谱图像超分辨率高效反馈门户网 2506.17361v1 |
Authors (10): Xufei Wang, Mingjian Zhang, Fei Ge, Jinchen Zhu, Wen Sha, Jifen Ren, Zhimeng Hou, Shouguo Zheng, ling Zheng, Shizhuang Weng
Even without auxiliary images, single hyperspectral image super-resolution (SHSR) methods can be designed to improve the spatial resolution of hyperspectral images. However, failing to thoroughly explore coherence along bands and spatial-spectral information limits the performance of SHSR. In this study, we propose a novel group-based SHSR method termed the efficient feedback gate network, which uses various feedbacks and gate operations involving large kernel convolutions and spectral interactions. In particular, by providing different guidance for neighboring groups, we can learn rich band information and hierarchical hyperspectral spatial information using channel shuffling and dilated convolution in the shuffled and progressive dilated fusion module (SPDFM). Moreover, we develop a wide-bound perception gate block and a spectrum enhancement gate block to construct the spatial-spectral reinforcement gate module (SSRGM) and obtain highly representative spatial-spectral features efficiently. Additionally, we apply a three-dimensional SSRGM to enhance holistic information and coherence for hyperspectral data. The experimental results on three hyperspectral datasets demonstrate the superior performance of the proposed network over the state-of-the-art methods in terms of spectral fidelity and spatial content reconstruction.
Article 1102
Title@2025-06-20 (5): A Statistical Evaluation of Indoor LoRaWAN Environment-Aware Propagation for 6G: MLR, ANOVA, and Residual Distribution Analysis
Title: A Statistical Evaluation of Indoor LoRaWAN Environment-Aware Propagation for 6G: MLR, ANOVA, and Residual Distribution Analysis | Eine statistische Auswertung von Indoor LoRaWAN Environment-Aware Propagation für 6G: MLR, ANOVA und Residual Distribution Analysis | 6G:MLR、ANOVA和残余分布分析的室内LORAWAN环境-软件传播统计评价 2504.16688v3 |
Authors (2): Nahshon Mokua Obiri, Kristof Van Laerhoven
Modeling path loss in indoor LoRaWAN technology deployments is inherently challenging due to structural obstructions, occupant density and activities, and fluctuating environmental conditions. This study proposes a two-stage approach to capture and analyze these complexities using an extensive dataset of 1,328,334 field measurements collected over six months in a single-floor office at the University of Siegen’s Hoelderlinstrasse Campus, Germany. First, we implement a multiple linear regression model that includes traditional propagation metrics (distance, structural walls) and an extension with proposed environmental variables (relative humidity, temperature, carbon dioxide, particulate matter, and barometric pressure). Using analysis of variance, we demonstrate that adding these environmental factors can reduce unexplained variance by 42.32 percent. Secondly, we examine residual distributions by fitting five candidate probability distributions: Normal, Skew-Normal, Cauchy, Student’s t, and Gaussian Mixture Models (GMMs) with 2 to 5 components. Our results show that a four-component Gaussian Mixture Model captures the residual heterogeneity of indoor signal propagation most accurately, significantly outperforming single-distribution approaches. Given the push toward ultra-reliable, context-aware communications in 6G networks, our analysis shows that environment-aware modeling can substantially improve LoRaWAN network design in dynamic indoor IoT deployments.
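The ANOVA comparison between the baseline and the environment-extended regression comes down to two residual sums of squares. The sketch below uses the standard nested-model F statistic; the numeric inputs are invented round numbers for illustration and are not the study's results.

```python
def unexplained_variance_reduction(rss_base, rss_full):
    """Fraction of the baseline's residual variance removed by the larger model."""
    return 1.0 - rss_full / rss_base

def nested_f_statistic(rss_base, rss_full, p_base, p_full, n):
    """F statistic for comparing nested linear models.

    p_base and p_full count fitted coefficients (including the intercept),
    n is the number of observations.
    """
    numerator = (rss_base - rss_full) / (p_full - p_base)
    denominator = rss_full / (n - p_full)
    return numerator / denominator

# Toy values: baseline uses distance and walls, full model adds 5 environment terms.
reduction = unexplained_variance_reduction(rss_base=100.0, rss_full=60.0)
f_value = nested_f_statistic(rss_base=100.0, rss_full=60.0,
                             p_base=3, p_full=8, n=1008)
print(reduction, f_value)
```

A large F value relative to the F(p_full - p_base, n - p_full) distribution indicates that the added environmental variables explain more than chance would.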
Article 1103
Title@2025-06-20 (5): Training Multi-Layer Binary Neural Networks With Local Binary Error Signals
Title: Training Multi-Layer Binary Neural Networks With Local Binary Error Signals | Training Multi-Layer Binär-Neural-Netzwerke mit lokalen Binär-Fehler-Signale | 利用本地二进制错误信号,培训多语言二进制神经网络 2412.00119v3 |
Authors (3): Luca Colombo, Fabrizio Pittorino, Manuel Roveri
Binary Neural Networks (BNNs) significantly reduce computational complexity and memory usage in machine and deep learning by representing weights and activations with just one bit. However, most existing training algorithms for BNNs rely on quantization-aware floating-point Stochastic Gradient Descent (SGD), limiting the full exploitation of binary operations to the inference phase only. In this work, we propose, for the first time, a fully binary and gradient-free training algorithm for multi-layer BNNs, eliminating the need for back-propagated floating-point gradients. Specifically, the proposed algorithm relies on local binary error signals and binary weight updates, employing integer-valued hidden weights that serve as a synaptic metaplasticity mechanism, thereby enhancing its neurobiological plausibility. Our proposed solution enables the training of binary multi-layer perceptrons by using exclusively XNOR, Popcount, and increment/decrement operations. Experimental results on multi-class classification benchmarks show test accuracy improvements of up to +35.47% over the only existing fully binary single-layer state-of-the-art solution. Compared to full-precision SGD, our solution improves test accuracy by up to +35.30% under the same total memory demand, while also reducing computational cost by two to three orders of magnitude in terms of the total number of Boolean gates. The proposed algorithm is made available to the scientific community as a public repository.
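The XNOR and Popcount primitives mentioned above are enough to compute a dot product between two ±1 vectors. The sketch below shows only that primitive, with ±1 values packed into the bits of a Python integer (bit set means +1); it does not reproduce the paper's training rule, and the helper names are illustrative.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an integer, bit i set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def binary_dot(word_a, word_b, n_bits):
    """Dot product of two packed +/-1 vectors using XNOR and popcount."""
    mask = (1 << n_bits) - 1
    xnor = ~(word_a ^ word_b) & mask        # 1 wherever the signs agree
    agreements = bin(xnor).count("1")       # popcount
    return 2 * agreements - n_bits          # agreements minus disagreements

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack(a), pack(b), len(a))
print(result, sum(x * y for x, y in zip(a, b)))
```

On hardware the same computation maps to a handful of Boolean gates per weight, which is where the reported savings in gate count come from.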
Article 1104
Title@2025-06-20 (5): Optimal Depth of Neural Networks
Title: Optimal Depth of Neural Networks | Optimale Tiefe der neuralen Netze | 神经网络的最佳深度 2506.16862v1 |
Authors (1): Qian Qi
Determining the optimal depth of a neural network is a fundamental yet challenging problem, typically resolved through resource-intensive experimentation. This paper introduces a formal theoretical framework to address this question by recasting the forward pass of a deep network, specifically a Residual Network (ResNet), as an optimal stopping problem. We model the layer-by-layer evolution of hidden representations as a sequential decision process where, at each layer, a choice is made between halting computation to make a prediction or continuing to a deeper layer for a potentially more refined representation. This formulation captures the intrinsic trade-off between accuracy and computational cost. Our primary theoretical contribution is a proof that, under a plausible condition of diminishing returns on the residual functions, the expected optimal stopping depth is provably finite, even in an infinite-horizon setting. We leverage this insight to propose a novel and practical regularization term, $\mathcal{L}_{\rm depth}$, that encourages the network to learn representations amenable to efficient, early exiting. We demonstrate the generality of our framework by extending it to the Transformer architecture and exploring its connection to continuous-depth models via free-boundary problems. Empirical validation on ImageNet confirms that our regularizer successfully induces the theoretically predicted behavior, leading to significant gains in computational efficiency without compromising, and in some cases improving, final model accuracy.
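The stopping decision can be illustrated with a one-step look-ahead rule. This is a simplified sketch under assumptions the abstract does not spell out: the expected accuracy gain of each additional layer is known and non-increasing (diminishing returns), and every layer has the same fixed cost. It is not the paper's regularizer $\mathcal{L}_{\rm depth}$.

```python
def optimal_exit_depth(expected_gains, cost_per_layer):
    """Number of layers to run before halting under one-step look-ahead.

    expected_gains[d] is the expected improvement from running layer d.
    With non-increasing gains, stopping as soon as the next gain no longer
    covers the cost is the optimal rule for this toy problem.
    """
    for depth, gain in enumerate(expected_gains):
        if gain <= cost_per_layer:
            return depth          # halt and predict from the current representation
    return len(expected_gains)    # every layer was worth its cost

# Toy geometric decay of per-layer gains.
gains = [0.4, 0.2, 0.1, 0.05, 0.025]
depth = optimal_exit_depth(gains, cost_per_layer=0.08)
print(depth)
```

Because the gains shrink while the cost stays fixed, the loop must terminate at a finite depth, which mirrors the finiteness result stated in the abstract.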
Article 1105
Title@2025-06-20 (5): Towards Efficient Few-shot Graph Neural Architecture Search via Partitioning Gradient Contribution
Title: Towards Efficient Few-shot Graph Neural Architecture Search via Partitioning Gradient Contribution | Auf dem Weg zu einer effizienten Nur-Shot Graph-Neural-Architektur Suche über Partitionierung Gradient Beitrag | 通过分割渐变贡献, 实现高效、 短短截图图像神经结构搜索 2506.01231v2 |
Authors (9): Wenhao Song, Xuan Wu, Bo Yang, You Zhou, Yubin Xiao, Yanchun Liang, Hongwei Ge, Heow Pueh Lee, Chunguo Wu
To address the weight coupling problem, certain studies introduced few-shot Neural Architecture Search (NAS) methods, which partition the supernet into multiple sub-supernets. However, these methods often suffer from computational inefficiency and tend to provide suboptimal partitioning schemes. To address this problem more effectively, we analyze the weight coupling problem from a novel perspective, which primarily stems from distinct modules in succeeding layers imposing conflicting gradient directions on the preceding layer modules. Based on this perspective, we propose the Gradient Contribution (GC) method that efficiently computes the cosine similarity of gradient directions among modules by decomposing the Vector-Jacobian Product during supernet backpropagation. Subsequently, the modules with conflicting gradient directions are allocated to distinct sub-supernets while similar ones are grouped together. To assess the advantages of GC and address the limitations of existing Graph Neural Architecture Search methods, which are limited to searching a single type of Graph Neural Networks (Message Passing Neural Networks (MPNNs) or Graph Transformers (GTs)), we propose the Unified Graph Neural Architecture Search (UGAS) framework, which explores optimal combinations of MPNNs and GTs. The experimental results demonstrate that GC achieves state-of-the-art (SOTA) performance in supernet partitioning quality and time efficiency. In addition, the architectures searched by UGAS+GC outperform both the manually designed GNNs and those obtained by existing NAS methods. Finally, ablation studies further demonstrate the effectiveness of all proposed methods.
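The grouping idea can be sketched with plain cosine similarity between per-module gradient vectors. The gradients, module names, and the greedy grouping rule below are all illustrative assumptions; the GC method obtains these similarities by decomposing the Vector-Jacobian Product during supernet backpropagation, which is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def partition_modules(gradients, threshold=0.0):
    """Greedily group modules whose gradients point in compatible directions."""
    groups = []
    for name, grad in gradients.items():
        for group in groups:
            if all(cosine(grad, gradients[member]) > threshold for member in group):
                group.append(name)
                break
        else:
            groups.append([name])     # conflicts with every existing group
    return groups

# Hypothetical gradients that three candidate modules impose on a shared layer.
toy_gradients = {
    "mpnn_a": [1.0, 0.2],
    "mpnn_b": [0.9, 0.1],
    "gt_a": [-1.0, 0.1],
}
sub_supernets = partition_modules(toy_gradients)
print(sub_supernets)
```

Modules with conflicting directions end up in different sub-supernets, so their updates no longer pull the shared weights against each other.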
Article 1106
Title@2025-06-20 (5): ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
Title: ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation | ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation | ICC: 多式数据集曲线的量化图像显示具体度 2403.01306v4 |
Authors (4): Moran Yanuka, Morris Alper, Hadar Averbuch-Elor, Raja Giryes
Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild. Standard data filtering approaches succeed in removing mismatched text-image pairs, but permit semantically related but highly abstract or subjective text. These approaches lack the fine-grained ability to isolate the most concrete samples that provide the strongest signal for learning in a noisy dataset. In this work, we propose a new metric, image caption concreteness, that evaluates caption text without an image reference to measure its concreteness and relevancy for use in multimodal learning. Our approach leverages strong foundation models for measuring visual-semantic information loss in multimodal representations. We demonstrate that this strongly correlates with human evaluation of concreteness in both single-word and sentence-level texts. Moreover, we show that curation using ICC complements existing approaches: It succeeds in selecting the highest quality samples from multimodal web-scale datasets to allow for efficient training in resource-constrained settings.
Article 1107
Title@2025-06-20 (5): Anomaly Detection in Event-triggered Traffic Time Series via Similarity Learning
Title: Anomaly Detection in Event-triggered Traffic Time Series via Similarity Learning | Anomalie-Erkennung in ereignisgetriggerten Traffic Time-Serien über Ähnlichkeits-Lernen | 通过类似学习在事件触发的交通时间序列中异常探测 2506.16855v1 |
Authors (5): Shaoyu Dou, Kai Yang, Yang Jiao, Chengbo Qiu, Kui Ren
Time series analysis has achieved great success in cyber security applications such as intrusion detection and device identification. Learning similarities among multiple time series is a crucial problem since it serves as the foundation for downstream analysis. Due to the complex temporal dynamics of event-triggered time series, it often remains unclear which similarity metric is appropriate for security-related tasks, such as anomaly detection and clustering. The overarching goal of this paper is to develop an unsupervised learning framework that is capable of learning similarities among a set of event-triggered time series. From the machine learning vantage point, the proposed framework harnesses the power of both hierarchical multi-resolution sequential autoencoders and the Gaussian Mixture Model (GMM) to effectively learn low-dimensional representations from the time series. Finally, the obtained similarity measure can be easily visualized for explanation. The proposed framework aspires to offer a stepping stone that gives rise to a systematic approach to model and learn similarities among a multitude of event-triggered time series. Through extensive qualitative and quantitative experiments, we show that the proposed method considerably outperforms state-of-the-art methods.
Article 1108
Title@2025-06-20 (5): Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models
Title: Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models | Reward-Agnostic Prompt-Optimierung für Diffusionsmodelle von Text zu Bild | 文本到图像传播模型的奖励-不可知迅速优化 2506.16853v1 |
Authors (4): Semin Kim, Yeonwoo Cha, Jaehoon Yoo, Seunghoon Hong
We investigate a general approach for improving user prompts in text-to-image (T2I) diffusion models by finding prompts that maximize a reward function specified at test-time. Although diverse reward models are used for evaluating image generation, existing automated prompt engineering methods typically target specific reward configurations. Consequently, these specialized designs exhibit suboptimal performance when applied to new prompt engineering scenarios involving different reward models. To address this limitation, we introduce RATTPO (Reward-Agnostic Test-Time Prompt Optimization), a flexible test-time optimization method applicable across various reward scenarios without modification. RATTPO iteratively searches for optimized prompts by querying large language models (LLMs) without requiring reward-specific task descriptions. Instead, it uses the optimization trajectory and a novel reward-aware feedback signal (termed a “hint”) as context. Empirical results demonstrate the versatility of RATTPO, effectively enhancing user prompts across diverse reward setups that assess various generation aspects, such as aesthetics, general human preference, or spatial relationships between objects. RATTPO surpasses other test-time search baselines in search efficiency, using up to 3.5 times less inference budget, and, given sufficient inference budget, achieves performance comparable to learning-based baselines that require reward-specific fine-tuning. The code is available at https://github.com/seminkim/RATTPO.
Article 1109
Title@2025-06-20 (5): Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Title: Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation | Anpassung beim Lernen: LLMs für wissenschaftliche Probleme mit intelligenter Werkzeugverwendung anpassen | 在学习期间适应适应:利用智能工具适应科学问题定位LMS 2411.00412v4 |
Authors (6): Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
Large Language Models (LLMs) demonstrate promising capabilities in solving scientific problems but often suffer from the issue of hallucination. While integrating LLMs with tools can mitigate this issue, models fine-tuned on tool usage become overreliant on them and incur unnecessary costs. Inspired by how human experts assess problem complexity before selecting solutions, we propose a novel two-component fine-tuning method, Adapting While Learning (AWL). In the first component, World Knowledge Learning (WKL), LLMs internalize scientific knowledge by learning from tool-generated solutions. In the second component, Tool Usage Adaptation (TUA), we categorize problems as easy or hard based on the model’s accuracy, and train it to maintain direct reasoning for easy problems while switching to tools for hard ones. We validate our method on six scientific benchmark datasets across climate science, epidemiology, physics, and other domains. Compared to the original instruct model (8B), models post-trained with AWL achieve 29.11% higher answer accuracy and 12.72% better tool usage accuracy, even surpassing state-of-the-art models including GPT-4o and Claude-3.5 on four custom-created datasets. Our code is open-source at https://github.com/Rose-STL-Lab/Adapting-While-Learning.
Article 1110
Title@2025-06-20 (5): Bandwidth Selectors on Semiparametric Bayesian Networks
Title: Bandwidth Selectors on Semiparametric Bayesian Networks | Bandbreiten-Selektoren auf semiparametrischen Bayesischen Netzwerken | 半参数贝近地网络上的带宽选择器 2506.16844v1 |
Authors (3): Victor Alejandre, Concha Bielza, Pedro Larrañaga
Semiparametric Bayesian networks (SPBNs) integrate parametric and non-parametric probabilistic models, offering flexibility in learning complex data distributions from samples. In particular, kernel density estimators (KDEs) are employed for the non-parametric component. Under the assumption of data normality, the normal rule is used to learn the bandwidth matrix for the KDEs in SPBNs. This matrix is the key hyperparameter that controls the trade-off between bias and variance. However, real-world data often deviates from normality, potentially leading to suboptimal density estimation and reduced predictive performance. This paper first establishes the theoretical framework for the application of state-of-the-art bandwidth selectors and subsequently evaluates their impact on SPBN performance. We explore the approaches of cross-validation and plug-in selectors, assessing their effectiveness in enhancing the learning capability and applicability of SPBNs. To support this investigation, we have extended the open-source package PyBNesian for SPBNs with the additional bandwidth selection techniques and conducted extensive experimental analyses. Our results demonstrate that the proposed bandwidth selectors leverage increasing information more effectively than the normal rule, which, despite its robustness, stagnates with more data. In particular, unbiased cross-validation generally outperforms the normal rule, highlighting its advantage in high sample size scenarios.
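For reference, the normal rule mentioned in this abstract is commonly written, for a full bandwidth matrix, as H = (4 / (d + 2))^(2 / (d + 4)) · n^(−2 / (d + 4)) · Σ̂, where n is the sample size, d the dimension, and Σ̂ the sample covariance. The sketch below implements that textbook formula with NumPy; the function name is hypothetical, and it is not taken from PyBNesian, whose exact implementation may differ.

```python
import numpy as np

def normal_reference_bandwidth(data):
    """Bandwidth matrix from the normal reference rule (textbook form).

    data: array of shape (n_samples, n_dims).
    """
    x = np.asarray(data, dtype=float)
    n, d = x.shape
    # Sample covariance; atleast_2d keeps a (1, 1) matrix in the 1-D case.
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    factor = (4.0 / (d + 2.0)) ** (2.0 / (d + 4.0)) * n ** (-2.0 / (d + 4.0))
    return factor * cov

rng = np.random.default_rng(0)
samples = rng.normal(size=(300, 3))
H = normal_reference_bandwidth(samples)
```

In one dimension this reduces to the familiar h ≈ 1.06 · σ̂ · n^(−1/5). The rule is derived under a Gaussian assumption, which is the limitation the abstract's cross-validation and plug-in selectors are meant to address.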
Article 1111
Title@2025-06-20 (5): FedFitTech: A Baseline in Federated Learning for Fitness Tracking
Title: FedFitTech: A Baseline in Federated Learning for Fitness Tracking | FedFitTech: Eine Basis im Federated Learning für Fitness-Tracking | FFFFFTTTech:联邦健身跟踪学习基准 2506.16840v1 |
Authors (4): Zeyneddin Oz, Shreyas Korde, Marius Bock, Kristof Van Laerhoven
The rapid evolution of sensors and resource-efficient machine learning models has spurred the widespread adoption of wearable fitness tracking devices. Equipped with inertial sensors, such devices can continuously capture physical movements for fitness technology (FitTech), enabling applications from sports optimization to preventive healthcare. Traditional centralized learning approaches to detect fitness activities struggle with privacy concerns, regulatory constraints, and communication inefficiencies. In contrast, Federated Learning (FL) enables decentralized model training by communicating model updates rather than private wearable sensor data. Applying FL to FitTech presents unique challenges, such as data imbalance, lack of labelled data, heterogeneous user activity patterns, and trade-offs between personalization and generalization. To simplify research on FitTech in FL, we present the FedFitTech baseline, under the Flower framework, which is publicly available and widely used by both industry and academic researchers. Additionally, to illustrate its usage, this paper presents a case study that implements a system based on the FedFitTech baseline, incorporating a client-side early stopping strategy and comparing the results. For instance, this system allows wearable devices to optimize the trade-off between capturing common fitness activity patterns and preserving individuals’ nuances, thereby enhancing both the scalability and efficiency of privacy-aware fitness tracking applications. Results show that this reduces overall redundant communications by 13 percent, at a negligible cost of 1 percent in overall recognition performance. Thus, the FedFitTech baseline creates a foundation for a wide range of new research and development opportunities in FitTech, and it is available as open-source at: https://github.com/adap/flower/tree/main/baselines/fedfittech
Article 1112
Title@2025-06-20 (5): Beyond Blur: A Fluid Perspective on Generative Diffusion Models
Title: Beyond Blur: A Fluid Perspective on Generative Diffusion Models | Beyond Blur: Eine flüssige Perspektive auf generative Diffusionsmodelle | 模糊之外:关于发源传播模型的流透视角 2506.16827v1 |
Authors (4): Grzegorz Gruszczynski, Michal Jan Wlodarczyk, Jakub J Meixner, Przemyslaw Musialski
We propose a novel PDE-driven corruption process for generative image synthesis based on advection-diffusion processes which generalizes existing PDE-based approaches. Our forward pass formulates image corruption via a physically motivated PDE that couples directional advection with isotropic diffusion and Gaussian noise, controlled by dimensionless numbers (Peclet, Fourier). We implement this PDE numerically through a GPU-accelerated custom Lattice Boltzmann solver for fast evaluation. To induce realistic turbulence, we generate stochastic velocity fields that introduce coherent motion and capture multi-scale mixing. In the generative process, a neural network learns to reverse the advection-diffusion operator thus constituting a novel generative model. We discuss how previous methods emerge as specific cases of our operator, demonstrating that our framework generalizes prior PDE-based corruption techniques. We illustrate how advection improves the diversity and quality of the generated images while keeping the overall color palette unaffected. This work bridges fluid dynamics, dimensionless PDE theory, and deep generative modeling, offering a fresh perspective on physically informed image corruption processes for diffusion-based synthesis.
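As background for the terms used in this abstract, the standard advection-diffusion equation and the two dimensionless numbers it names can be written as below. This is the textbook form, not the paper's exact formulation: u denotes the image intensity field, v the velocity field, D the diffusivity, and U and L a characteristic velocity and length (all symbols are assumptions chosen for illustration). The Gaussian noise term is omitted because the abstract does not specify how it enters.

```latex
\frac{\partial u}{\partial t} + \mathbf{v}\cdot\nabla u = D\,\nabla^{2} u,
\qquad
\mathrm{Pe} = \frac{U L}{D},
\qquad
\mathrm{Fo} = \frac{D\,t}{L^{2}}
```

With v = 0 the equation reduces to pure isotropic diffusion (blurring), which is consistent with the abstract's claim that earlier blur-based corruption processes emerge as special cases.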
Article 1113
Title@2025-06-20 (5): Predicting New Research Directions in Materials Science using Large Language Models and Concept Graphs
Title: Predicting New Research Directions in Materials Science using Large Language Models and Concept Graphs | Vorhersage neuer Forschungsrichtungen in der Materialwissenschaft mit großen Sprachmodellen und Konzeptgraphen | 利用大语言模型和概念图预测材料科学新研究方向 2506.16824v1 |
Authors (13): Thomas Marwitz, Alexander Colsmann, Ben Breitung, Christoph Brabec, Christoph Kirchlechner, Eva Blasco, Gabriel Cadilha Marques, Horst Hahn, Michael Hirtz, Pavel A. Levkin, Yolita M. Eggeler, Tobias Schlöder, Pascal Friederich
Due to an exponential increase in published research articles, it is impossible for individual scientists to read all publications, even within their own research field. In this work, we investigate the use of large language models (LLMs) for the purpose of extracting the main concepts and semantic information from scientific abstracts in the domain of materials science to find links that were not noticed by humans and thus to suggest inspiring near/mid-term future research directions. We show that LLMs can extract concepts more efficiently than automated keyword extraction methods to build a concept graph as an abstraction of the scientific literature. A machine learning model is trained to predict emerging combinations of concepts, i.e. new research ideas, based on historical data. We demonstrate that integrating semantic concept information leads to an increased prediction performance. The applicability of our model is demonstrated in qualitative interviews with domain experts based on individualized model suggestions. We show that the model can inspire materials scientists in their creative thinking process by predicting innovative combinations of topics that have not yet been investigated.
Article 1114
Title@2025-06-20 (5): When and How Does CLIP Enable Domain and Compositional Generalization?
Title: When and How Does CLIP Enable Domain and Compositional Generalization? | Wann und wie aktiviert CLIP Domain- und Kompositionsverallgemeinerung? | CLIP 何时和如何启用域和组成集约化? 2502.09507v2 |
Authors (4): Elias Kempf, Simon Schrodi, Max Argus, Thomas Brox
The remarkable generalization performance of contrastive vision-language models like CLIP is often attributed to the diversity of their training distributions. However, key questions remain unanswered: Can CLIP generalize to an entirely unseen domain when trained on a diverse mixture of domains (domain generalization)? Can it generalize to unseen classes within partially seen domains (compositional generalization)? What factors affect such generalization? To answer these questions, we trained CLIP models on systematically constructed training distributions with controlled domain diversity and object class exposure. Our experiments show that domain diversity is essential for both domain and compositional generalization, yet compositional generalization can be surprisingly weaker than domain generalization when the training distribution contains a suboptimal subset of the test domain. Through data-centric and mechanistic analyses, we find that successful generalization requires the learning of sufficiently shared representations in intermediate layers and circuits.
Article 1115
Title@2025-06-20 (5): Robust Group Anomaly Detection for Quasi-Periodic Network Time Series
Title: Robust Group Anomaly Detection for Quasi-Periodic Network Time Series | Robuste Gruppenanomalienerkennung für Quasi-periodische Netzwerk-Zeitreihen | 准固定网络自动探测强力组 时间序列 2506.16815v1 |
Authors (5): Kai Yang, Shaoyu Dou, Pan Luo, Xin Wang, H. Vincent Poor
Many real-world multivariate time series are collected from a network of physical objects embedded with software, electronics, and sensors. The quasi-periodic signals generated by these objects often follow a similar repetitive and periodic pattern, but have variations in the period, and come in different lengths caused by timing (synchronization) errors. Given a multitude of such quasi-periodic time series, can we build machine learning models to identify those time series that behave differently from the majority of the observations? In addition, can the models help human experts to understand how the decision was made? We propose a sequence to Gaussian Mixture Model (seq2GMM) framework. The overarching goal of this framework is to identify unusual and interesting time series within a network time series database. We further develop a surrogate-based optimization algorithm that can efficiently train the seq2GMM model. Seq2GMM exhibits strong empirical performance on a plurality of public benchmark datasets, outperforming state-of-the-art anomaly detection techniques by a significant margin. We also theoretically analyze the convergence property of the proposed training algorithm and provide numerical results to substantiate our theoretical claims.
Article 1116
Title@2025-06-20 (5): Boltzmann Classifier: A Thermodynamic-Inspired Approach to Supervised Learning
Title: Boltzmann Classifier: A Thermodynamic-Inspired Approach to Supervised Learning | Boltzmann Klassifikator: Ein thermodynamisch inspirierter Ansatz zum überwachten Lernen | Boltzmann 分类: 一种热动力学激励式的受监督学习方法 2505.06753v2 |
Authors (2): Muhamed Amin, Bernard R. Brooks
We present the Boltzmann classifier, a novel distance-based probabilistic classification algorithm inspired by the Boltzmann distribution. Unlike traditional classifiers that produce hard decisions or uncalibrated probabilities, the Boltzmann classifier assigns class probabilities based on the average distance to the nearest neighbors within each class, providing interpretable, physically meaningful outputs. We evaluate the performance of the method across three application domains: molecular activity prediction, oxidation state classification of transition metal complexes, and breast cancer diagnosis. In the molecular activity task, the classifier achieved the highest accuracy in predicting active compounds against two protein targets, with strong correlations observed between the predicted probabilities and experimental pIC50 values. For metal complexes, the classifier accurately distinguished between oxidation states II and III for Fe, Mn, and Co, using only metal-ligand bond lengths extracted from crystallographic data, and demonstrated high consistency with known chemical trends. In the breast cancer dataset, the classifier achieved 97% accuracy, with low-confidence predictions concentrated in inherently ambiguous cases. Across all tasks, the Boltzmann classifier performed competitively with or better than standard models such as logistic regression, support vector machines, random forests, and k-nearest neighbors. Its probabilistic outputs were found to correlate with continuous physical or biological properties, highlighting its potential utility in both classification and regression contexts. The results suggest that the Boltzmann classifier is a robust and interpretable alternative to conventional machine learning approaches, particularly in scientific domains where underlying structure-property relationships are important.
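The mechanism described in this abstract can be sketched in a few lines: treat the average distance to the k nearest neighbors of each class as an energy, and turn the energies into class probabilities with a Boltzmann weight exp(−E / T). The function name, the Euclidean metric, and the parameters `k` and `temperature` are assumptions made for illustration; the paper's exact energy definition and scaling may differ.

```python
import numpy as np

def boltzmann_class_probabilities(x, train_x, train_y, k=3, temperature=1.0):
    """Class probabilities from per-class mean k-nearest-neighbor distances."""
    x = np.asarray(x, dtype=float)
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)
    energies = []
    for c in classes:
        # Euclidean distances from the query to every training point of class c.
        dists = np.linalg.norm(train_x[train_y == c] - x, axis=1)
        energies.append(np.sort(dists)[:k].mean())
    energies = np.array(energies)
    # Subtract the minimum energy for numerical stability before exponentiating.
    weights = np.exp(-(energies - energies.min()) / temperature)
    return classes, weights / weights.sum()

train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
classes, probs = boltzmann_class_probabilities([0.2, 0.2], train_x, train_y)
```

A lower temperature sharpens the distribution toward the nearest class, while a higher one flattens it, which is one way such a model could express low confidence on ambiguous inputs.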
Article 1117
Title@2025-06-20 (5): CINNAMON: A hybrid approach to change point detection and parameter estimation in single-particle tracking data
Title: CINNAMON: A hybrid approach to change point detection and parameter estimation in single-particle tracking data | CINNAMON: Ein hybrider Ansatz zur Änderung der Punkterkennung und der Parameterschätzung in Einzelteilchen-Tracking-Daten | CINNAMON: 改变单粒子跟踪数据中点探测和参数估计的混合方法 2503.14253v2 |
Authors (5): Jakub Malinowski, Marcin Kostrzewa, Michał Balcerek, Weronika Tomczuk, Janusz Szwabiński
Change point detection has become an important part of the analysis of single-particle tracking data, as it allows one to identify the moments at which the motion patterns of observed particles undergo significant changes. The segmentation of diffusive trajectories based on those moments may provide insight into various phenomena in soft condensed matter and biological physics. In this paper, we propose CINNAMON, a hybrid approach to classifying single-particle tracking trajectories, detecting change points within them, and estimating diffusion parameters in the segments between the change points. Our method is based on a combination of neural networks, feature-based machine learning, and statistical techniques. It has been benchmarked in the second Anomalous Diffusion Challenge. The method offers a high level of interpretability due to its analytical and feature-based components. A potential use of features from topological data analysis is also discussed.
Article 1118
Title@2025-06-20 (5): DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis
Title: DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis | DVFS-Aware DNN-Schlussfolgerung zu GPUs: Latenzmodellierung und Leistungsanalyse | DVFS-Aware DNN GPUs的推论:长期建模和业绩分析 2502.06295v2 |
Authors (4): Yunchu Han, Zhaojun Nan, Sheng Zhou, Zhisheng Niu
The rapid development of deep neural networks (DNNs) is inherently accompanied by the problem of high computational costs. To tackle this challenge, dynamic voltage frequency scaling (DVFS) is emerging as a promising technology for balancing the latency and energy consumption of DNN inference by adjusting the computing frequency of processors. However, most existing models of DNN inference time are based on the CPU-DVFS technique, and directly applying the CPU-DVFS model to DNN inference on GPUs will lead to significant errors in optimizing latency and energy consumption. In this paper, we propose a DVFS-aware latency model to precisely characterize DNN inference time on GPUs. We first formulate the DNN inference time based on extensive experiment results for different devices and analyze the impact of fitting parameters. Then by dividing DNNs into multiple blocks and obtaining the actual inference time, the proposed model is further verified. Finally, we compare our proposed model with the CPU-DVFS model in two specific cases. Evaluation results demonstrate that local inference optimization with our proposed model achieves a reduction of no less than 66% and 69% in inference time and energy consumption respectively. In addition, cooperative inference with our proposed model can improve the partition policy and reduce the energy consumption compared to the CPU-DVFS model.
Article 1119
Title@2025-06-20 (5): Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
Title: Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack | Effizient, aber gefährdet: Benchmarking und Defending LLM Batch Prompting Attack | 高效但脆弱:基准设定和捍卫LLM批次快速袭击 2503.15551v2 |
Authors (2): Murong Yue, Ziyu Yao
Batch prompting, which combines a batch of multiple queries sharing the same context in one inference, has emerged as a promising solution to reduce inference costs. However, our study reveals a significant security vulnerability in batch prompting: malicious users can inject attack instructions into a batch, leading to unwanted interference across all queries, which can result in the inclusion of harmful content, such as phishing links, or the disruption of logical reasoning. In this paper, we construct BATCHSAFEBENCH, a comprehensive benchmark comprising 150 attack instructions of two types and 8k batch instances, to study the batch prompting vulnerability systematically. Our evaluation of both closed-source and open-weight LLMs demonstrates that all LLMs are susceptible to batch-prompting attacks. We then explore multiple defending approaches. While the prompting-based defense shows limited effectiveness for smaller LLMs, the probing-based approach achieves about 95% accuracy in detecting attacks. Additionally, we perform a mechanistic analysis to understand the attack and identify attention heads that are responsible for it.
Article 1120
Title@2025-06-20 (5): Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective
Title: Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective | Erforschung und Verbesserung der Initialisierung für tiefe Graphen-Neural-Netzwerke: Eine Signalverbreitungsperspektive | 探索和改进深图神经网络的初始化:信号传动视角 2506.16790v1 |
Authors (5): Senmiao Wang, Yupeng Chen, Yushun Zhang, Ruoyu Sun, Tian Ding
Graph Neural Networks (GNNs) often suffer from performance degradation as the network depth increases. This paper addresses this issue by introducing initialization methods that enhance signal propagation (SP) within GNNs. We propose three key metrics for effective SP in GNNs: forward propagation, backward propagation, and graph embedding variation (GEV). While the first two metrics derive from classical SP theory, the third is specifically designed for GNNs. We theoretically demonstrate that a broad range of commonly used initialization methods for GNNs, which exhibit performance degradation with increasing depth, fail to control these three metrics simultaneously. To deal with this limitation, a direct exploitation of the SP analysis–searching for weight initialization variances that optimize the three metrics–is shown to significantly enhance the SP in deep GCNs. This approach is called Signal Propagation on Graph-guided Initialization (SPoGInit). Our experiments demonstrate that SPoGInit outperforms commonly used initialization methods on various tasks and architectures. Notably, SPoGInit enables performance improvements as GNNs deepen, which represents a significant advancement in addressing depth-related challenges and highlights the validity and effectiveness of the SP analysis framework.
Article 1121
Title@2025-06-20 (5): Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps
Title: Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps | LoRA durch die Lens of Parameter Redundancy erneut besuchen: Spectral Encoding hilft | 通过参数冗余的镜头对 LoRA 进行重审: 光谱编码帮助 2506.16787v1 |
Authors (7): Jiashun Cheng, Aochuan Chen, Nuo Chen, Ziqi Gao, Yuhan Li, Jia Li, Fugee Tsung
Low-Rank Adaptation (LoRA) has emerged as a prominent technique for fine-tuning large foundation models. Despite its successes, the substantial parameter redundancy, which limits the capacity and efficiency of LoRA, has been recognized as a bottleneck. In this work, we systematically investigate the impact of redundancy in fine-tuning LoRA and reveal that reducing density redundancy does not degrade expressiveness. Based on this insight, we introduce Spectral-encoding Low-Rank Adaptation (SeLoRA), which harnesses the robust expressiveness of spectral bases to re-parameterize LoRA from a sparse spectral subspace. Designed with simplicity, SeLoRA enables seamless integration with various LoRA variants for performance boosting, serving as a scalable plug-and-play framework. Extensive experiments substantiate that SeLoRA achieves greater efficiency with fewer parameters, delivering superior performance enhancements over strong baselines on various downstream tasks, including commonsense reasoning, math reasoning, and code generation.
Article 1122
Title@2025-06-20 (5): CodeV-R1: Reasoning-Enhanced Verilog Generation
Title: CodeV-R1: Reasoning-Enhanced Verilog Generation | CodeV-R1: Grundlegende Verilog-Generierung | 代码V-R1:有理性的增强性性性性性性性生殖器生成 2505.24183v2 |
Authors (19): Yaoyu Zhu, Di Huang, Hanqi Lyu, Xiaoyun Zhang, Chongxiao Li, Wenxuan Shi, Yutong Wu, Jianan Mu, Jinghua Wang, Yang Zhao, Pengwei Jin, Shuyao Cheng, Shengwen Liang, Xishan Zhang, Rui Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen
Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. Extending RLVR to electronic design automation (EDA), especially automatically generating hardware description languages (HDLs) like Verilog from natural-language (NL) specifications, however, poses three key challenges: the lack of automated and accurate verification environments, the scarcity of high-quality NL-code pairs, and the prohibitive computation cost of RLVR. To this end, we introduce CodeV-R1, an RLVR framework for training Verilog generation LLMs. First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM-generated NL descriptions, verifies code-NL-code consistency via the generated testbench, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage “distill-then-RL” training pipeline: distillation for the cold start of reasoning abilities, followed by adaptive DAPO, our novel RLVR algorithm that can reduce training cost by adaptively adjusting sampling rate. The resulting model, CodeV-R1-7B, achieves 68.6% and 72.9% pass@1 on VerilogEval v2 and RTLLM v1.1, respectively, surpassing prior state-of-the-art by 12% to 20%, while matching or even exceeding the performance of 671B DeepSeek-R1. We will release our model, training pipeline, and dataset to facilitate research in EDA and LLM communities.
Article 1123
Title@2025-06-20 (5): What Is the Point of Equality in Machine Learning Fairness? Beyond Equality of Opportunity
Title: What Is the Point of Equality in Machine Learning Fairness? Beyond Equality of Opportunity | Was ist der Punkt der Gleichheit in der Fairness des maschinellen Lernens? | 机器学习公平中的平等点是什么? 2506.16782v1 |
Authors (1): Youjin Kong
Fairness in machine learning (ML) has become a rapidly growing area of research. But why, in the first place, is unfairness in ML morally wrong? And why should we care about improving fairness? Most fair-ML research implicitly appeals to distributive equality: the idea that desirable goods and benefits, such as opportunities (e.g., Barocas et al., 2023), should be equally distributed across society. Unfair ML models, then, are seen as wrong because they unequally distribute such benefits. This paper argues that this exclusive focus on distributive equality offers an incomplete and potentially misleading ethical foundation. Grounding ML fairness in egalitarianism – the view that equality is a fundamental moral and social ideal – requires challenging structural inequality: systematic, institutional, and durable arrangements that privilege some groups while disadvantaging others. Structural inequality manifests through ML systems in two primary forms: allocative harms (e.g., economic loss) and representational harms (e.g., stereotypes, erasure). While distributive equality helps address allocative harms, it fails to explain why representational harms are wrong – why it is wrong for ML systems to reinforce social hierarchies that stratify people into superior and inferior groups – and why ML systems should aim to foster a society where people relate as equals (i.e., relational equality). To address these limitations, the paper proposes a multifaceted egalitarian framework for ML fairness that integrates both distributive and relational equality. Drawing on critical social and political philosophy, this framework offers a more comprehensive ethical foundation for tackling the full spectrum of harms perpetuated by ML systems. The paper also outlines practical pathways for implementing the framework across the ML pipeline.
Article 1124
Title@2025-06-20 (5): SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation
Title: SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation | SSR-Zero: Einfaches Selbstveredelungslernen für maschinelle Übersetzung | 机械翻译简单自评强化学习 2505.16637v3 |
Authors (5): Wenjie Yang, Mao Zheng, Mingyang Song, Zheng Li, Sitong Wang
Large language models (LLMs) have recently demonstrated remarkable capabilities in machine translation (MT). However, most advanced MT-specific LLMs heavily rely on external supervision signals during training, such as human-annotated reference data or trained reward models (RMs), which are often expensive to obtain and challenging to scale. To overcome this limitation, we propose a Simple Self-Rewarding (SSR) Reinforcement Learning (RL) framework for MT that is reference-free, fully online, and relies solely on self-judging rewards. Trained with SSR on 13K monolingual examples with Qwen-2.5-7B as the backbone, our model SSR-Zero-7B outperforms existing MT-specific LLMs, e.g., TowerInstruct-13B and GemmaX-28-9B, as well as larger general LLMs like Qwen2.5-32B-Instruct in English ↔ Chinese translation tasks from WMT23, WMT24, and Flores200 benchmarks. Furthermore, by augmenting SSR with external supervision from COMET, our strongest model, SSR-X-Zero-7B, achieves state-of-the-art performance in English ↔ Chinese translation, surpassing all existing open-source models under 72B parameters and even outperforming closed-source models, e.g., GPT-4o and Gemini 1.5 Pro. Our analysis highlights the effectiveness of the self-rewarding mechanism compared to the external LLM-as-a-judge approach in MT and demonstrates its complementary benefits when combined with trained RMs. Our findings provide valuable insight into the potential of self-improving RL methods. We have publicly released our code, data and models.
Article 1125
Title@2025-06-20 (5): Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies
Title: Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies | Können wir Fehler ohne Fehlerdaten erkennen? Ungewissheit-Bewusst Runtime Failure Detection for Imitation Learning Policies | 我们能否在无故障数据的情况下检测失败? 用于模拟学习政策的不确定性- 软件运行时故障检测 2503.08558v3 |
Authors (10): Chen Xu, Tony Khuong Nguyen, Emma Dixon, Christopher Rodriguez, Patrick Miller, Robert Lee, Paarth Shah, Rares Ambrus, Haruki Nishimura, Masha Itkina
Recent years have witnessed impressive robotic manipulation systems driven by advances in imitation learning and generative modeling, such as diffusion- and flow-based approaches. As robot policy performance increases, so does the complexity and time horizon of achievable tasks, inducing unexpected and diverse failure modes that are difficult to predict a priori. To enable trustworthy policy deployment in safety-critical human environments, reliable runtime failure detection becomes important during policy inference. However, most existing failure detection approaches rely on prior knowledge of failure modes and require failure data during training, which imposes a significant challenge in practicality and scalability. In response to these limitations, we present FAIL-Detect, a modular two-stage approach for failure detection in imitation learning-based robotic manipulation. To accurately identify failures from successful training data alone, we frame the problem as sequential out-of-distribution (OOD) detection. We first distill policy inputs and outputs into scalar signals that correlate with policy failures and capture epistemic uncertainty. FAIL-Detect then employs conformal prediction (CP) as a versatile framework for uncertainty quantification with statistical guarantees. Empirically, we thoroughly investigate both learned and post-hoc scalar signal candidates on diverse robotic manipulation tasks. Our experiments show learned signals to be mostly consistently effective, particularly when using our novel flow-based density estimator. Furthermore, our method detects failures more accurately and faster than state-of-the-art (SOTA) failure detection baselines. These results highlight the potential of FAIL-Detect to enhance the safety and reliability of imitation learning-based robotic systems as they progress toward real-world deployment.
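To make the conformal prediction step concrete, here is a minimal sketch of split conformal calibration applied to scalar failure scores. It is a simplification and not FAIL-Detect's implementation: the abstract frames detection as sequential, whereas this sketch uses a single constant threshold. The names `conformal_threshold` and `is_failure` and the example scores are hypothetical.

```python
import math

def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold from scores of successful rollouts only.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or +inf when n is too small to support the requested level.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_scores)[rank - 1]

def is_failure(score, threshold):
    """Flag a rollout step whose scalar score exceeds the calibrated threshold."""
    return score > threshold

# Hypothetical scores collected from successful rollouts.
calibration = [3, 1, 2, 4]
threshold = conformal_threshold(calibration, alpha=0.5)
print(threshold, is_failure(3.5, threshold))
```

If calibration and test scores are exchangeable, a score from a new successful rollout exceeds this threshold with probability at most `alpha`, which is the kind of statistical guarantee the abstract refers to.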
Article 1126
Title@2025-06-20 (5): Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations
Title: Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations | Wissensdestillationsrahmen für die Beschleunigung hochakkurater neuraler Netzwerk-basierter molekularer Dynamiksimulationen | 加速高准确度高神经网基分子动态模拟学知识蒸馏框架 2506.15337v2 |
Authors (5): Naoki Matsumura, Yuta Yoshimoto, Yuto Iwasaki, Meguru Yamazaki, Yasufumi Sakai
Neural network potentials (NNPs) offer a powerful alternative to traditional force fields for molecular dynamics (MD) simulations. Accurate and stable MD simulations, crucial for evaluating material properties, require training data encompassing both low-energy stable structures and high-energy structures. Conventional knowledge distillation (KD) methods fine-tune a pre-trained NNP as a teacher model to generate training data for a student model. However, in material-specific models, this fine-tuning process increases energy barriers, making it difficult to create training data containing high-energy structures. To address this, we propose a novel KD framework that leverages a non-fine-tuned, off-the-shelf pre-trained NNP as a teacher. Its gentler energy landscape facilitates the exploration of a wider range of structures, including the high-energy structures crucial for stable MD simulations. Our framework employs a two-stage training process: first, the student NNP is trained with a dataset generated by the off-the-shelf teacher; then, it is fine-tuned with a smaller, high-accuracy density functional theory (DFT) dataset. We demonstrate the effectiveness of our framework by applying it to both organic (polyethylene glycol) and inorganic (Li$_{10}$GeP$_{2}$S$_{12}$) materials, achieving comparable or superior accuracy in reproducing physical properties compared to existing methods. Importantly, our method reduces the number of expensive DFT calculations by 10x compared to existing NNP generation methods, without sacrificing accuracy. Furthermore, the resulting student NNP achieves up to 106x speedup in inference compared to the teacher NNP, enabling significantly faster and more efficient MD simulations.
Article 1127
Title@2025-06-20 (5): Metapath-based Hyperbolic Contrastive Learning for Heterogeneous Graph Embedding
Title: Metapath-based Hyperbolic Contrastive Learning for Heterogeneous Graph Embedding | Metapath-basiertes hyperbolisches Kontrastives Lernen für heterogene Grapheneinbettung | 异异异形图形嵌入式的超双曲反对立学习 2506.16754v1 |
Authors (4): Jongmin Park, Seunghoon Han, Won-Yong Shin, Sungsu Lim
The hyperbolic space, characterized by a constant negative curvature and exponentially expanding space, aligns well with the structural properties of heterogeneous graphs. However, although heterogeneous graphs inherently possess diverse power-law structures, most hyperbolic heterogeneous graph embedding models rely on a single hyperbolic space. This approach may fail to effectively capture the diverse power-law structures within heterogeneous graphs. To address this limitation, we propose a Metapath-based Hyperbolic Contrastive Learning framework (MHCL), which uses multiple hyperbolic spaces to capture diverse complex structures within heterogeneous graphs. Specifically, each hyperbolic space is trained to describe the distribution of complex structures corresponding to one metapath, which makes it possible to capture semantic information effectively. Since metapath embeddings represent distinct semantic information, preserving their discriminability is important when aggregating them to obtain node representations. Therefore, we use a contrastive learning approach to optimize MHCL and improve the discriminability of metapath embeddings. In particular, our contrastive learning method minimizes the distance between embeddings of the same metapath and maximizes the distance between those of different metapaths in hyperbolic space, thereby improving the separability of metapath embeddings with distinct semantic information. We conduct comprehensive experiments to evaluate the effectiveness of MHCL. The experimental results demonstrate that MHCL outperforms state-of-the-art baselines in various graph machine learning tasks, effectively capturing the complex structures of heterogeneous graphs.
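The distances that the contrastive objective above pulls together or pushes apart are hyperbolic rather than Euclidean. As a minimal sketch, the following computes the standard geodesic distance in the Poincaré ball with curvature -1. Which hyperbolic model MHCL actually uses is not stated in the abstract, so the choice of the Poincaré ball and the function name `poincare_distance` are assumptions.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

# The same Euclidean gap costs far more distance near the boundary,
# reflecting the exponentially expanding volume of hyperbolic space.
print(poincare_distance([0.0, 0.0], [0.1, 0.0]))
print(poincare_distance([0.85, 0.0], [0.95, 0.0]))
```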
Article 1128
Title@2025-06-20 (5): Nature Language Model: Deciphering the Language of Nature for Scientific Discovery
Title: Nature Language Model: Deciphering the Language of Nature for Scientific Discovery | Nature Language Model: Die Sprache der Natur für die wissenschaftliche Entdeckung bestimmen | 自然语言模型:为科学发现而破除自然语言 2502.07527v3 |
Authors (46): Yingce Xia, Peiran Jin, Shufang Xie, Liang He, Chuan Cao, Renqian Luo, Guoqing Liu, Yue Wang, Zequn Liu, Yuan-Jyue Chen, Zekun Guo, Yeqi Bai, Pan Deng, Yaosen Min, Ziheng Lu, Hongxia Hao, Han Yang, Jielan Li, Chang Liu, Jia Zhang, Jianwei Zhu, Ran Bi, Kehan Wu, Wei Zhang, Kaiyuan Gao, Qizhi Pei, Qian Wang, Xixian Liu, Yanting Li, Houtian Zhu, Yeqing Lu, Mingqian Ma, Zun Wang, Tian Xie, Krzysztof Maziarz, Marwin Segler, Zhao Yang, Zilong Chen, Yu Shi, Shuxin Zheng, Lijun Wu, Chen Hu, Peggy Dai, Tie-Yan Liu, Haiguang Liu, Tao Qin
Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, RNA and even cells. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be represented as sequences, which together form the “language of nature”, we introduce Nature Language Model (NatureLM), a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including: (i) generating and optimizing small molecules, proteins, RNA, and materials using text instructions; (ii) cross-domain generation/design, such as protein-to-molecule and protein-to-RNA generation; and (iii) top performance across different domains, matching or surpassing state-of-the-art specialist models. NatureLM offers a promising generalist approach for various scientific tasks, including drug discovery (hit generation/optimization, ADMET optimization, synthesis), novel material design, and the development of therapeutic proteins or nucleotides. We have developed NatureLM models in different sizes (1 billion, 8 billion, and 46.7 billion parameters) and observed a clear improvement in performance as the model size increases.
Article 1129
Title@2025-06-20 (5): Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation
Title: Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation | Off-Policy-Actor-Kritik für adversarische Beobachtung Robustheit: Virtuelles alternatives Training durch symmetrische Politikevaluierung | 外部观察强力非政策行为者-批评者:通过对称政策评价进行虚拟替代培训 2506.16753v1 |
Authors (4): Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii
Recently, robust reinforcement learning (RL) methods designed to handle adversarial input observations have received significant attention, motivated by RL’s inherent vulnerabilities. While existing approaches have demonstrated reasonable success, addressing worst-case scenarios over long time horizons requires both minimizing the agent’s cumulative rewards for adversaries and training agents to counteract them through alternating learning. However, this process introduces mutual dependencies between the agent and the adversary, making interactions with the environment inefficient and hindering the development of off-policy methods. In this work, we propose a novel off-policy method that eliminates the need for additional environmental interactions by reformulating adversarial learning as a soft-constrained optimization problem. Our approach is theoretically supported by the symmetric property of policy evaluation between the agent and the adversary. The implementation is available at https://github.com/nakanakakosuke/VALT_SAC.
Article 1130
Title@2025-06-20 (5): DeepSelective: Interpretable Prognosis Prediction via Feature Selection and Compression in EHR Data
Title: DeepSelective: Interpretable Prognosis Prediction via Feature Selection and Compression in EHR Data | DeepSelective: Interpretierbare Prognosevorhersage über Feature Selection und Komprimierung in EHR-Daten | DeepSelective:通过EHR数据中的特征选择和压缩作出可解释的预后预测 2504.11264v2 |
Authors (13): Ruochi Zhang, Qian Yang, Xiaoyang Wang, Tian Wang, Qiong Zhou, Ziqi Deng, Kewei Li, Yueying Wang, Yusi Fan, Jiale Zhang, Lan Huang, Chang Liu, Fengfeng Zhou
The rapid accumulation of Electronic Health Records (EHRs) has transformed healthcare by providing valuable data that enhance clinical predictions and diagnoses. While conventional machine learning models have proven effective, they often lack robust representation learning and depend heavily on expert-crafted features. Although deep learning offers powerful solutions, it is often criticized for its lack of interpretability. To address these challenges, we propose DeepSelective, a novel end-to-end deep learning framework for predicting patient prognosis using EHR data, with a strong emphasis on enhancing model interpretability. DeepSelective combines data compression techniques with an innovative feature selection approach, integrating custom-designed modules that work together to improve both accuracy and interpretability. Our experiments demonstrate that DeepSelective not only enhances predictive accuracy but also significantly improves interpretability, making it a valuable tool for clinical decision-making. The source code is freely available at http://www.healthinformaticslab.org/supp/resources.php .
Article 1131
Title@2025-06-20 (5): Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization
Title: Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization | Konforme Schlussfolgerung unter hochdimensionalen Kovariate Verschiebungen über Likelihood-Ratio Regularisierung | 通过传统-拉蒂奥正规化,在高多样性可变性转变下发生非正式推论 2502.13030v4 |
Authors (5): Sunay Joshi, Shayan Kiyani, George Pappas, Edgar Dobriban, Hamed Hassani
We consider the problem of conformal prediction under covariate shift. Given labeled data from a source domain and unlabeled data from a covariate shifted target domain, we seek to construct prediction sets with valid marginal coverage in the target domain. Most existing methods require estimating the unknown likelihood ratio function, which can be prohibitive for high-dimensional data such as images. To address this challenge, we introduce the likelihood ratio regularized quantile regression (LR-QR) algorithm, which combines the pinball loss with a novel choice of regularization in order to construct a threshold function without directly estimating the unknown likelihood ratio. We show that the LR-QR method has coverage at the desired level in the target domain, up to a small error term that we can control. Our proofs draw on a novel analysis of coverage via stability bounds from learning theory. Our experiments demonstrate that the LR-QR algorithm outperforms existing methods on high-dimensional prediction tasks, including a regression task for the Communities and Crime dataset, an image classification task from the WILDS repository, and an LLM question-answering task on the MMLU benchmark.
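The pinball loss mentioned above is the standard quantile regression objective. The sketch below shows only that loss and checks that the constant minimizing it is an empirical quantile. The regularization term that defines LR-QR is not described in the abstract and is deliberately omitted; `pinball_loss`, `best_constant`, and the toy data are illustrative names and values.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of predicting q for target y at level tau."""
    if y >= q:
        return tau * (y - q)
    return (1.0 - tau) * (q - y)

def best_constant(ys, tau):
    """Data point that minimizes the total pinball loss over the sample."""
    return min(sorted(ys), key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))

# Under-prediction is penalized by tau and over-prediction by 1 - tau,
# so a high tau pushes the minimizer toward the upper tail.
print(best_constant([1, 2, 3, 4, 5], tau=0.75))
```

In conformal prediction the fitted quantile plays the role of a threshold on nonconformity scores, which is why a quantile loss appears in a coverage method.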
Article 1132
Title@2025-06-20 (5): IsoNet: Causal Analysis of Multimodal Transformers for Neuromuscular Gesture Classification
Title: IsoNet: Causal Analysis of Multimodal Transformers for Neuromuscular Gesture Classification | IsoNet: Kausale Analyse multimodaler Transformer für die neuromuskuläre Gestenklassifikation | IsoNet:用于神经肌肉手腕分类的多式变形器的因果分析 2506.16744v1 |
Authors (4): Eion Tyacke, Kunal Gupta, Jay Patel, Rui Li
Hand gestures are a primary output of the human motor system, yet the decoding of their neuromuscular signatures remains a bottleneck for basic neuroscience and assistive technologies such as prosthetics. Traditional human-machine interface pipelines rely on a single biosignal modality, but multimodal fusion can exploit complementary information from sensors. We systematically compare linear and attention-based fusion strategies across three architectures: a Multimodal MLP, a Multimodal Transformer, and a Hierarchical Transformer, evaluating performance on scenarios with unimodal and multimodal inputs. Experiments use two publicly available datasets: NinaPro DB2 (sEMG and accelerometer) and HD-sEMG 65-Gesture (high-density sEMG and force). Across both datasets, the Hierarchical Transformer with attention-based fusion consistently achieved the highest accuracy, surpassing the multimodal and best single-modality linear-fusion MLP baseline by over 10% on NinaPro DB2 and 3.7% on HD-sEMG. To investigate how modalities interact, we introduce an Isolation Network that selectively silences unimodal or cross-modal attention pathways, quantifying each group of token interactions’ contribution to downstream decisions. Ablations reveal that cross-modal interactions contribute approximately 30% of the decision signal across transformer layers, highlighting the importance of attention-driven fusion in harnessing complementary modality information. Together, these findings reveal when and how multimodal fusion enhances biosignal classification and also provide mechanistic insights into human muscle activity. The study should benefit the design of sensor arrays for neurorobotic systems.
Article 1133
Title@2025-06-20 (5): Group-Level Data Selection for Efficient Pretraining
Title: Group-Level Data Selection for Efficient Pretraining | Gruppen-Level-Datenauswahl für effizientes Vortraining | 高效预科培训的集团一级数据选择 2502.14709v2 |
Authors (6): Zichun Yu, Fei Peng, Jie Lei, Arnold Overwijk, Wen-tau Yih, Chenyan Xiong
In this paper, we introduce Group-MATES, an efficient group-level data selection approach to optimize the speed-quality frontier of language model pretraining. Specifically, Group-MATES parameterizes costly group-level selection with a relational data influence model. To train this model, we sample training trajectories of the language model and collect oracle data influences alongside. The relational data influence model approximates the oracle data influence by weighting individual influence with relationships among training data. To enable efficient selection with our relational data influence model, we partition the dataset into small clusters using relationship weights and select data within each cluster independently. Experiments on DCLM 400M-4x, 1B-1x, and 3B-1x show that Group-MATES achieves 3.5%-9.4% relative performance gains over random selection across 22 downstream tasks, nearly doubling the improvements achieved by state-of-the-art individual data selection baselines. Furthermore, Group-MATES reduces the number of tokens required to reach a certain downstream performance by up to 1.75x, substantially elevating the speed-quality frontier. Further analyses highlight the critical role of relationship weights in the relational data influence model and the effectiveness of our cluster-based inference. Our code is open-sourced at https://github.com/facebookresearch/Group-MATES.
Article 1134
Title@2025-06-20 (5): Client-Centered Federated Learning for Heterogeneous EHRs: Use Fewer Participants to Achieve the Same Performance
Title: Client-Centered Federated Learning for Heterogeneous EHRs: Use Fewer Participants to Achieve the Same Performance | Client-Centered Federated Learning for Heterogeneous EHRs: Verwenden Sie weniger Teilnehmer, um die gleiche Leistung zu erreichen | 以客户为中心的异种EHR联邦学习:利用较少的参与者实现相同业绩 2404.13318v4 |
Authors (4): Jiyoun Kim, Junu Kim, Kyunghoon Hur, Edward Choi
The increasing volume of electronic health records (EHRs) presents the opportunity to improve the accuracy and robustness of models in clinical prediction tasks. Unlike traditional centralized approaches, federated learning enables training on data from multiple institutions while preserving patient privacy and complying with regulatory constraints. In practice, healthcare institutions (i.e., hosts) often need to build predictive models tailored to their specific needs using federated learning. In this scenario, two key challenges arise: (1) ensuring compatibility across heterogeneous EHR systems, and (2) managing federated learning costs within budget constraints. To address these challenges, we propose EHRFL, a federated learning framework designed for building a cost-effective, host-specific predictive model using patient EHR data. EHRFL consists of two components: (1) text-based EHR modeling, which facilitates cross-institution compatibility without costly data standardization, and (2) a participant selection strategy based on averaged patient embedding similarity to reduce the number of participants without degrading performance. Experiments on multiple open-source EHR datasets demonstrate the effectiveness of both components. We believe our framework offers a practical solution for enabling healthcare institutions to build institution-specific predictive models under budgetary constraints.
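The participant selection idea can be sketched as follows: average each institution's patient embeddings, score candidates by similarity to the host's average, and keep the top `k`. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the function names and the toy two-dimensional embeddings.

```python
import math

def mean_embedding(embeddings):
    """Coordinate-wise average of a list of equal-length vectors."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_participants(host_embeddings, candidate_embeddings, k):
    """Return the k candidates whose averaged embedding is closest to the host's."""
    host_mean = mean_embedding(host_embeddings)
    scores = {
        name: cosine(host_mean, mean_embedding(embs))
        for name, embs in candidate_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical host and three candidate institutions.
host = [[1.0, 0.0], [1.0, 0.2]]
candidates = {
    "A": [[1.0, 0.1]],
    "B": [[0.0, 1.0]],
    "C": [[0.9, 0.3], [1.0, 0.0]],
}
print(select_participants(host, candidates, k=2))
```

Sharing only one averaged vector per institution keeps the selection step cheap relative to running federated training with every candidate.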
Article 1135
Title@2025-06-20 (5): Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
Title: Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening | Unwahrscheinliche Belohnung: GRPO über die Verbreitung hinaus schärfen | 奖励不理想者:将GROPO提升到分销加压之后 2506.02355v2 |
Authors (3): Andre He, Daniel Fried, Sean Welleck
Reinforcement learning is emerging as a primary driver for improving language model reasoning capabilities. A fundamental question is whether current reinforcement learning algorithms – such as Group Relative Policy Optimization (GRPO), the de facto standard algorithm used to improve language model reasoning – merely sharpen the base model’s distribution around problems it can already solve. We investigate this question in the context of formal theorem proving, which has access to a perfect verifier. We identify a degenerate rank bias in GRPO in which highly probable trajectories are reinforced and rare ones are neglected. This results in distribution sharpening: the model can solve some problems with fewer samples, but underperforms simply sampling more solutions from the original model. To overcome GRPO’s rank bias we introduce unlikeliness reward, a simple method for explicitly up-weighting rare but correct solutions. We show that unlikeliness reward mitigates rank bias and improves pass@$N$ across a large range of $N$ in both synthetic and real theorem proving settings. We also uncover an unexpected link between rank bias and a seemingly mundane hyperparameter – the number of updates per batch – that leads to a second, complementary mitigation. We combine our insights into a revised GRPO training recipe for formal theorem proving, yielding an open pipeline that achieves performance competitive with DeepSeek-Prover-V1.5-RL on the miniF2F-test benchmark. We release our implementation at https://github.com/AndreHe02/rewarding-unlikely-release
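A rough sketch of how up-weighting rare but correct samples could interact with GRPO's group-normalized advantages is given below. The rank-based penalty is an assumed illustrative form, not a formula taken from the abstract, and `beta`, `unlikeliness_rewards`, and `group_relative_advantages` are hypothetical names; the released repository is the reference for the actual definition.

```python
import statistics

def unlikeliness_rewards(log_probs, correct, beta=0.25):
    """Scale each correct sample's reward down by how likely it was.

    Assumed form: rank 0 is the most probable sample in the group and
    receives the largest penalty; incorrect samples keep reward 0.
    """
    group_size = len(log_probs)
    order = sorted(range(group_size), key=lambda i: log_probs[i], reverse=True)
    rank = {idx: r for r, idx in enumerate(order)}
    return [
        correct[i] * (1.0 - beta * (group_size - rank[i]) / group_size)
        for i in range(group_size)
    ]

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: reward standardized within its sample group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Three sampled proofs for one theorem: two verified correct, one incorrect.
rewards = unlikeliness_rewards([-1.0, -5.0, -3.0], [1, 1, 0])
print(rewards)
print(group_relative_advantages(rewards))
```

With a plain 0/1 reward both correct samples would receive the same advantage. Under the assumed penalty the rarer correct sample gets the larger one, which is the direction of correction the abstract describes.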
Article 1136
Title@2025-06-20 (5): Optimism Without Regularization: Constant Regret in Zero-Sum Games
Title: Optimism Without Regularization: Constant Regret in Zero-Sum Games | Optimismus ohne Regularisierung: Ständiger Bedauern in Null-Sum-Spielen | 不带常规的乐观主义:对零-苏姆运动会的一贯悔恨 2506.16736v1 |
Authors (4): John Lazarsfeld, Georgios Piliouras, Ryann Sim, Stratis Skoulakis
This paper studies the optimistic variant of Fictitious Play for learning in two-player zero-sum games. While it is known that Optimistic FTRL – a regularized algorithm with a bounded stepsize parameter – obtains constant regret in this setting, we show for the first time that similar, optimal rates are also achievable without regularization: we prove for two-strategy games that Optimistic Fictitious Play (using any tiebreaking rule) obtains only constant regret, providing surprising new evidence on the ability of non-no-regret algorithms to learn fast in games. Our proof technique leverages a geometric view of Optimistic Fictitious Play in the dual space of payoff vectors, where we show that a certain energy function of the iterates remains bounded over time. Additionally, we prove a regret lower bound of $\Omega(\sqrt{T})$ for Alternating Fictitious Play. In the unregularized regime, this separates the abilities of optimism and alternation in achieving $o(\sqrt{T})$ regret.
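As an illustration, the sketch below simulates simultaneous Optimistic Fictitious Play on Matching Pennies and reports each player's external regret. It assumes the usual reading of "optimistic": each player best-responds to the cumulative payoff vector with the most recent payoff vector counted twice. Lowest-index tiebreaking and the function names are further assumptions.

```python
def argmax(values):
    """Index of the largest value, breaking ties toward the lowest index."""
    best = 0
    for i, v in enumerate(values):
        if v > values[best]:
            best = i
    return best

def optimistic_fictitious_play(A, T):
    """Run T rounds on payoff matrix A (row player maximizes, column player gets -A).

    Returns the external regret of the row and column players.
    """
    n, m = len(A), len(A[0])
    cum_row, cum_col = [0] * n, [0] * m    # cumulative payoff per pure action
    last_row, last_col = [0] * n, [0] * m  # most recent payoff vectors
    row_total = 0                          # realized payoff of the row player
    for _ in range(T):
        # Best response to cumulative payoffs plus an extra copy of the last one.
        i = argmax([cum_row[a] + last_row[a] for a in range(n)])
        j = argmax([cum_col[b] + last_col[b] for b in range(m)])
        last_row = [A[a][j] for a in range(n)]
        last_col = [-A[i][b] for b in range(m)]
        cum_row = [c + u for c, u in zip(cum_row, last_row)]
        cum_col = [c + u for c, u in zip(cum_col, last_col)]
        row_total += A[i][j]
    regret_row = max(cum_row) - row_total
    regret_col = max(cum_col) + row_total
    return regret_row, regret_col

matching_pennies = [[1, -1], [-1, 1]]
for horizon in (100, 1000, 10000):
    print(horizon, optimistic_fictitious_play(matching_pennies, horizon))
```

Printing the regret at several horizons is a quick empirical check of whether it stays bounded as `T` grows, which is what the paper proves for two-strategy games.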
Article 1137
Title@2025-06-20 (5): On Training-Test (Mis)alignment in Unsupervised Combinatorial Optimization: Observation, Empirical Exploration, and Analysis
Title: On Training-Test (Mis)alignment in Unsupervised Combinatorial Optimization: Observation, Empirical Exploration, and Analysis | On Training-Test (Mis)Ausrichtung in unüberwachter kombinatorischer Optimierung: Beobachtung, empirische Exploration und Analyse | 未经监督的组合优化中的培训测试(Miss)调整:观察、经验探索和分析 2506.16732v1 |
Authors (2): Fanchen Bu, Kijung Shin
In unsupervised combinatorial optimization (UCO), during training, one aims to have continuous decisions that are promising in a probabilistic sense for each training instance, which enables end-to-end training on initially discrete and non-differentiable problems. At test time, for each test instance, starting from continuous decisions, derandomization is typically applied to obtain the final deterministic decisions. Researchers have developed increasingly powerful test-time derandomization schemes to enhance the empirical performance and the theoretical guarantees of UCO methods. However, we notice a misalignment between training and testing in the existing UCO methods. Consequently, lower training losses do not necessarily entail better post-derandomization performance, even for the training instances without any data distribution shift. Empirically, we indeed observe such undesirable cases. We explore a preliminary idea to better align training and testing in UCO by incorporating a differentiable version of derandomization into training. Our empirical exploration shows that such an idea indeed improves training-test alignment, but also introduces nontrivial challenges into training.
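To illustrate the test-time step, the sketch below applies one classic derandomization scheme, the method of conditional expectations, to max-cut. The abstract does not say which schemes or problems the paper studies, so the problem, the scheme, and the function names are assumptions chosen for clarity.

```python
def expected_cut(p, edges):
    """Expected cut size when node i joins side 1 independently with probability p[i]."""
    return sum(p[u] * (1 - p[v]) + p[v] * (1 - p[u]) for u, v in edges)

def derandomize(p, edges):
    """Round probabilities to 0/1 one node at a time without lowering the expectation."""
    p = list(p)
    for node in range(len(p)):
        scores = []
        for value in (0.0, 1.0):
            p[node] = value
            scores.append(expected_cut(p, edges))
        # The objective is linear in p[node], so the better endpoint is never worse.
        p[node] = 1.0 if scores[1] > scores[0] else 0.0
    return [int(x) for x in p]

triangle = [(0, 1), (1, 2), (0, 2)]
soft = [0.5, 0.5, 0.5]            # continuous decisions, e.g. produced by a trained model
hard = derandomize(soft, triangle)
print(hard, expected_cut(hard, triangle), expected_cut(soft, triangle))
```

The training loss scores `soft` while test performance is measured on `hard`, so two soft solutions with the same loss can round to cuts of different quality. That gap is the misalignment the abstract points to.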
Article 1138
Title@2025-06-20 (5): Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Title: Disentangling and Integrating Relational and Sensory Information in Transformer Architectures | Entwirren und Integrieren von relationalen und sensorischen Informationen in Transformer-Architekturen | 将关系和感官信息拆解和整合到变换结构中 2405.16727v3 |
Authors (2): Awni Altabaa, John Lafferty
Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of individual objects, and relational information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the Dual Attention Transformer (DAT), featuring two distinct attention mechanisms: sensory attention for directing the flow of sensory information, and a novel relational attention mechanism for directing the flow of relational information. We empirically evaluate DAT on a diverse set of tasks ranging from synthetic relational benchmarks to complex real-world tasks such as language modeling and visual processing. Our results demonstrate that integrating explicit relational computational mechanisms into the Transformer architecture leads to significant performance gains in terms of data efficiency and parameter efficiency.
Article 1139
Title@2025-06-20 (5): Incentivizing High-quality Participation From Federated Learning Agents
Title: Incentivizing High-quality Participation From Federated Learning Agents | Anreize für eine qualitativ hochwertige Beteiligung von Federated Learning Agents | 激励来自联邦学习代理机构的高质量参与 2506.16731v1 |
Authors (5): Jinlong Pang, Jiaheng Wei, Yifan Hua, Chen Qian, Yang Liu
Federated learning (FL) provides a promising paradigm for facilitating collaboration between multiple clients that jointly learn a global model without directly sharing their local data. However, existing research suffers from two caveats: 1) From the perspective of agents, voluntary and unselfish participation is often assumed, but self-interested agents may opt out of the system or provide low-quality contributions without proper incentives; 2) From the mechanism designer’s perspective, the aggregated models can be unsatisfactory as the existing game-theoretical federated learning approach for data collection ignores the potential heterogeneous effort caused by contributed data. To alleviate the above challenges, we propose an incentive-aware framework for agent participation that considers data heterogeneity to accelerate the convergence process. Specifically, we first introduce the notion of Wasserstein distance to explicitly illustrate the heterogeneous effort and reformulate the existing upper bound of convergence. To induce truthful reporting from agents, we analyze and measure the generalization error gap of any two agents by leveraging the peer prediction mechanism to develop score functions. We further present a two-stage Stackelberg game model that formalizes the process and examines the existence of equilibrium. Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed mechanism.
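For intuition about the Wasserstein distance used to quantify heterogeneity, here is a minimal sketch of the 1-Wasserstein distance between two equal-size one-dimensional samples, which reduces to the mean absolute difference of the sorted values. How the paper applies the distance to client data is not specified in the abstract; the function name and the toy samples are illustrative.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical 1-D distributions."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Hypothetical one-dimensional features from a reference and two clients.
reference = [0, 1, 3]
similar_client = [1, 0, 3]
shifted_client = [5, 6, 8]
print(wasserstein_1d(reference, similar_client))
print(wasserstein_1d(reference, shifted_client))
```

A client whose data matches the reference distribution scores 0 regardless of sample order, while a shifted client scores in proportion to the shift.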
Article 1140
Title@2025-06-20 (5): TriCon-SF: A Triple-Shuffle and Contribution-Aware Serial Federated Learning Framework for Heterogeneous Healthcare Data
Title: TriCon-SF: A Triple-Shuffle and Contribution-Aware Serial Federated Learning Framework for Heterogeneous Healthcare Data | TriCon-SF: Ein Dreifach-Shuffle und Contribution-Aware Serial Federated Learning Framework für heterogene Gesundheitsdaten | TriCon-SF: 不同基因保健数据三维和贡献软件系列联邦学习框架 2506.16723v1 |
Authors (4): Yuping Yan, Yizhi Wang, Yuanshuai Li, Yaochu Jin
Serial pipeline training is an efficient paradigm for handling data heterogeneity in cross-silo federated learning with low communication overhead. However, even without centralized aggregation, direct transfer of models between clients can violate privacy regulations and remains susceptible to gradient leakage and linkage attacks. Additionally, ensuring resilience against semi-honest or malicious clients who may manipulate or misuse received models remains a grand challenge, particularly in privacy-sensitive domains such as healthcare. To address these challenges, we propose TriCon-SF, a novel serial federated learning framework that integrates triple shuffling and contribution awareness. TriCon-SF introduces three levels of randomization by shuffling model layers, data segments, and training sequences to break deterministic learning patterns and disrupt potential attack vectors, thereby enhancing privacy and robustness. In parallel, it leverages Shapley value methods to dynamically evaluate client contributions during training, enabling the detection of dishonest behavior and enhancing system accountability. Extensive experiments on non-IID healthcare datasets demonstrate that TriCon-SF outperforms standard serial and parallel federated learning in both accuracy and communication efficiency. Security analysis further supports its resilience against client-side privacy attacks.
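The Shapley-value contribution score can be illustrated with an exact computation over all client orderings. This is a sketch of the general definition, not of TriCon-SF's method: the abstract does not say what utility is measured or how the values are approximated, so `utility`, the client names, and the additive toy utility are hypothetical.

```python
from itertools import permutations

def shapley_values(clients, utility):
    """Exact Shapley value of each client for a coalition utility function.

    Averages each client's marginal gain over every join order, so the cost
    grows factorially and is only practical for a handful of clients.
    """
    totals = {c: 0.0 for c in clients}
    orders = list(permutations(clients))
    for order in orders:
        coalition = frozenset()
        for c in order:
            with_c = coalition | {c}
            totals[c] += utility(with_c) - utility(coalition)
            coalition = with_c
    return {c: total / len(orders) for c, total in totals.items()}

# Toy additive utility: each client adds a fixed amount of validation accuracy.
gain = {"hospital_a": 1.0, "hospital_b": 2.0, "hospital_c": 3.0}
values = shapley_values(list(gain), lambda s: sum(gain[c] for c in s))
print(values)
```

For an additive utility each client's Shapley value equals its own gain. In a real system the utility would come from evaluating models trained with different client subsets, and a persistently low or negative value would single out a client for closer inspection.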
Article 1141
Title@2025-06-20 (5): DRARL: Disengagement-Reason-Augmented Reinforcement Learning for Efficient Improvement of Autonomous Driving Policy
Title: DRARL: Disengagement-Reason-Augmented Reinforcement Learning for Efficient Improvement of Autonomous Driving Policy | DRARL: Entflechtung-Verstärkung-Verstärkung-Lernen zur effizienten Verbesserung der autonomen Fahrpolitik | DRARL: 为有效改进自主驾驶政策而加强学习 2506.16720v1 |
Authors (8): Weitao Zhou, Bo Zhang, Zhong Cao, Xiang Li, Qian Cheng, Chunyang Liu, Yaqin Zhang, Diange Yang
With the increasing presence of automated vehicles on open roads under driver supervision, disengagement cases are becoming more prevalent. While some data-driven planning systems attempt to directly utilize these disengagement cases for policy improvement, the inherent scarcity of disengagement data (each case often occurring as a single instance) restricts training effectiveness. Furthermore, some disengagement data should be excluded, since a disengagement does not always stem from a failure of the driving policy; e.g., the driver may casually intervene for a while. To this end, this work proposes disengagement-reason-augmented reinforcement learning (DRARL), which enhances the driving policy improvement process according to the reason for each disengagement case. Specifically, the reason for a disengagement is identified by an out-of-distribution (OOD) state estimation model. When no reason is found, the case is identified as a casual disengagement, which requires no additional policy adjustment. Otherwise, the policy can be updated in a reason-augmented imagination environment, improving policy performance on disengagement cases with similar reasons. The method is evaluated using real-world disengagement cases collected by autonomous driving robotaxis. Experimental results demonstrate that the method accurately identifies policy-related disengagement reasons, allowing the agent to handle both original and semantically similar cases through reason-augmented training. Furthermore, the approach prevents the agent from becoming overly conservative after policy adjustments. Overall, this work provides an efficient way to improve driving policy performance with disengagement cases.
Article 1142
Title@2025-06-20 (5): Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback
Title: Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback | Automatisierte Skill Discovery für Sprachagenten durch Exploration und iteratives Feedback | 通过探索和迭回反馈自动发现语言物剂技能 2506.04287v2 |
Authors (6): Yongjin Yang, Sinjae Kang, Juyong Lee, Dongjun Lee, Se-Young Yun, Kimin Lee
Training large language model (LLM) agents to acquire necessary skills and perform diverse tasks within an environment is gaining interest as a means to enable open-endedness. However, creating the training dataset for their skill acquisition faces several challenges. Manual trajectory collection requires significant human effort. Another approach, in which LLMs directly propose tasks to learn, often yields invalid tasks, as the LLMs lack knowledge of which tasks are actually feasible. Moreover, the generated data may not provide a meaningful learning signal, as agents often already perform well on the proposed tasks. To address this, we propose EXIF, a novel automatic skill discovery framework for LLM-powered agents, designed to improve the feasibility of generated target behaviors while accounting for the agents’ capabilities. Our method adopts an exploration-first strategy by employing an exploration agent (Alice) to train the target agent (Bob) to learn essential skills in the environment. Specifically, Alice first interacts with the environment to retrospectively generate a feasible, environment-grounded skill dataset, which is then used to train Bob. Crucially, we incorporate an iterative feedback loop, where Alice evaluates Bob’s performance to identify areas for improvement. This feedback then guides Alice’s next round of exploration, forming a closed-loop data generation process. Experiments on Webshop and Crafter demonstrate EXIF’s ability to effectively discover meaningful skills and iteratively expand the capabilities of the trained agent without any human intervention, achieving substantial performance improvements. Interestingly, we observe that setting Alice to the same model as Bob also notably improves performance, demonstrating EXIF’s potential for building a self-evolving system.
Article 1143
Title@2025-06-20 (5): Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness
Title: Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness | Multi-Agenten-Debatte als Test-Time Scaling: Eine systematische Studie der bedingten Wirksamkeit | 重新审议作为试验时间尺度的多机构辩论:对有条件有效性的系统研究 2505.22960v2 |
Authors (6): Yongjin Yang, Euiin Yi, Jongwoo Ko, Kimin Lee, Zhijing Jin, Se-Young Yun
The remarkable growth in large language model (LLM) capabilities has spurred exploration into multi-agent systems, with debate frameworks emerging as a promising avenue for enhanced problem-solving. These multi-agent debate (MAD) approaches, where agents collaboratively present, critique, and refine arguments, potentially offer improved reasoning, robustness, and diverse perspectives over monolithic models. Despite prior studies leveraging MAD, a systematic understanding of its effectiveness compared to self-agent methods, particularly under varying conditions, remains elusive. This paper seeks to fill this gap by conceptualizing MAD as a test-time computational scaling technique, distinguished by collaborative refinement and diverse exploration capabilities. We conduct a comprehensive empirical investigation comparing MAD with strong self-agent test-time scaling baselines on mathematical reasoning and safety-related tasks. Our study systematically examines the influence of task difficulty, model scale, and agent diversity on MAD’s performance. Key findings reveal that, for mathematical reasoning, MAD offers limited advantages over self-agent scaling but becomes more effective with increased problem difficulty and decreased model capability, while agent diversity shows little benefit. Conversely, for safety tasks, MAD’s collaborative refinement can increase vulnerability, but incorporating diverse agent configurations facilitates a gradual reduction in attack success through the collaborative refinement process. We believe our findings provide critical guidance for the future development of more effective and strategically deployed MAD systems.
Article 1144
Title@2025-06-20 (5): Info-Coevolution: An Efficient Framework for Data Model Coevolution
Title: Info-Coevolution: An Efficient Framework for Data Model Coevolution | Info-Coevolution: Ein effizienter Rahmen für die Datenmodellkoevolution | 信息革命:数据模型革命的有效框架 2506.08070v2 |
Authors (9): Ziheng Qin, Hailun Xu, Wei Chee Yew, Qi Jia, Yang Luo, Kanchan Sarkar, Danhui Guan, Kai Wang, Yang You
Machine learning relies heavily on data, yet the continuous growth of real-world data poses challenges for efficient dataset construction and training. A fundamental yet unsolved question is: given our current model and data, does a new piece of data (a sample or batch) need annotation or learning? Conventional approaches retain all available data, leading to non-optimal data and training efficiency. Active learning aims to reduce data redundancy by selecting a subset of samples to annotate, but it increases pipeline complexity and introduces bias. In this work, we propose Info-Coevolution, a novel framework that efficiently enables models and data to coevolve through online selective annotation with no bias. Leveraging task-specific models (and open-source models), it selectively annotates and integrates online and web data to improve datasets efficiently. For real-world datasets like ImageNet-1K, Info-Coevolution reduces annotation and training costs by 32% without performance loss. It determines the saving ratio automatically, without the ratio having to be tuned. It can further reduce the annotation ratio to 50% with semi-supervised learning. We also explore retrieval-based dataset enhancement using unlabeled open-source data. Code is available at https://github.com/NUS-HPC-AI-Lab/Info-Coevolution/.
Article 1145
Title@2025-06-20 (5): How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension
Title: How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension | Wie viele Domains genügen für Domain Generalization? Eine enge Charakterisierung über die Domain Shattering Dimension | 有多少域 足以 域 普遍化 ? 通过 域 折叠 维格 的 紧度 。 2506.16704v1 |
Authors (3): Cynthia Dwork, Lunjia Hu, Han Shao
We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.
Article 1146
Title@2025-06-20 (5): SIDE: Semantic ID Embedding for effective learning from sequences
Title: SIDE: Semantic ID Embedding for effective learning from sequences | SIDE: Semantische ID Einbetten für effektives Lernen aus Sequenzen | 语义识别码嵌入,以便从序列中有效学习 2506.16698v1 |
Authors (7): Dinesh Ramasamy, Shakti Kumar, Chris Cadonic, Jiaxin Yang, Sohini Roychowdhury, Esam Abdel Rhman, Srihari Reddy
Sequence-based recommendation models are driving the state of the art for industrial ad-recommendation systems. Such systems typically deal with user histories or sequence lengths on the order of O(10^3) to O(10^4) events. While adding embeddings at this scale is manageable in pre-trained models, incorporating them into real-time prediction models is challenging due to both storage and inference costs. To address this scaling challenge, we propose a novel approach that leverages vector quantization (VQ) to inject a compact Semantic ID (SID) as input to the recommendation models instead of a collection of embeddings. Our method builds on recent work on SIDs by introducing three key innovations: (i) a multi-task VQ-VAE framework, called VQ fusion, that fuses multiple content embeddings and categorical predictions into a single Semantic ID; (ii) a parameter-free, highly granular SID-to-embedding conversion technique, called SIDE, that is validated with two content embedding collections, thereby eliminating the need for a large parameterized lookup table; and (iii) a novel quantization method called Discrete-PCA (DPCA), which generalizes and enhances residual quantization techniques. The proposed enhancements, when applied to a large-scale industrial ads-recommendation system, achieve a 2.4X improvement in normalized entropy (NE) gain and a 3X reduction in data footprint compared to traditional SID methods.
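To make the residual quantization idea mentioned in this abstract concrete, here is a minimal sketch of plain residual quantization in Python. It is not the paper's DPCA or VQ fusion method; the function names, the toy codebooks, and the two-level setup are all illustrative assumptions. Each level picks the codeword nearest to whatever the previous levels left unexplained, and the resulting tuple of indices plays the role of a compact ID.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Return one code index per level; each level quantizes the residual left by the previous one."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for codebook in codebooks:
        # Pick the codeword nearest to the current residual (Euclidean distance).
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        codes.append(index)
        # Subtract the chosen codeword so the next level refines what is left.
        residual = residual - codebook[index]
    return codes

def reconstruct(codes, codebooks):
    """Sum the selected codewords to approximate the original vector."""
    return sum(codebook[i] for codebook, i in zip(codebooks, codes))

# Hypothetical toy codebooks: a coarse level followed by a fine level.
codebooks = [
    np.array([[0.0, 0.0], [4.0, 4.0]]),
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
]
codes = residual_quantize([4.9, 4.1], codebooks)
approx = reconstruct(codes, codebooks)
```

In this toy case the coarse level selects the codeword at (4, 4) and the fine level adds (1, 0), so the vector is stored as two small integers rather than two floats.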
Article 1147
Title@2025-06-20 (5): Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach
Title: Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach | Verständnis und Reduzierung der klassenabhängigen Effekte von Datenvergrößerung mit einem Zwei-Spieler-Spiel-Ansatz | 理解和减少数据递增的二级依赖影响,采用双层游戏方法 2407.03146v4 |
Authors (3): Yunpeng Jiang, Yutong Ban, Paul Weng
Data augmentation is widely applied and has shown its benefits in different machine learning tasks. However, as recently observed, it may have an unfair effect in multi-class classification. While data augmentation generally improves the overall performance (and therefore is beneficial for many classes), it can actually be detrimental for other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose CLAM, a CLAss-dependent Multiplicative-weights method. To derive it, we first formulate the training of a classifier as a non-linear optimization problem that aims at simultaneously maximizing the individual class performances and balancing them. By rewriting this optimization problem as an adversarial two-player game, we propose a novel multiplicative weight algorithm, for which we prove convergence. Interestingly, our formulation also reveals that the class-dependent effects of data augmentation are not due to data augmentation only, but are in fact a general phenomenon. Our empirical results over six datasets demonstrate that the performance of learned classifiers is indeed more fairly distributed over classes, with only limited impact on the average accuracy.
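As a rough illustration of what a multiplicative-weights update over classes looks like, the sketch below applies a generic Hedge-style rule in Python. It is not the CLAM algorithm itself; the function name, the step size `eta`, and the fixed per-class losses are hypothetical. The adversarial player shifts weight toward the classes with the highest loss, which is the mechanism that lets a learner be pushed to balance class performance.

```python
import math

def multiplicative_weights_step(weights, class_losses, eta=0.5):
    """One Hedge-style update: up-weight classes in proportion to exp(eta * loss)."""
    scaled = [w * math.exp(eta * loss) for w, loss in zip(weights, class_losses)]
    total = sum(scaled)
    # Renormalize so the weights remain a probability distribution over classes.
    return [s / total for s in scaled]

# Hypothetical per-class losses, held fixed here only to show the trend;
# in a real training loop they would be re-measured after each learner update.
weights = [1 / 3, 1 / 3, 1 / 3]
class_losses = [0.2, 0.9, 0.4]
for _ in range(10):
    weights = multiplicative_weights_step(weights, class_losses)
```

After a few rounds most of the weight sits on the worst-performing class (index 1 here), so a classifier trained on the weighted objective would be steered toward it.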
Article 1148
Title@2025-06-20 (5): Fast and Stable Diffusion Planning through Variational Adaptive Weighting
Title: Fast and Stable Diffusion Planning through Variational Adaptive Weighting | Schnelle und stabile Diffusionsplanung durch variationale adaptive Gewichtung | 通过变式适应性重力规划快速和稳定扩散 2506.16688v1 |
Authors (2): Zhiying Qiu, Tao Lin
Diffusion models have recently shown promise in offline RL. However, these methods often suffer from high training costs and slow convergence, particularly when using transformer-based denoising backbones. While several optimization strategies have been proposed (such as modified noise schedules, auxiliary prediction targets, and adaptive loss weighting), challenges remain in achieving stable and efficient training. In particular, existing loss weighting functions typically rely on neural network approximators, which can be ineffective early in training because MLPs generalize poorly when exposed to sparse feedback. In this work, we derive a variationally optimal uncertainty-aware weighting function and introduce a closed-form polynomial approximation method for its online estimation under the flow-based generative modeling framework. We integrate our method into a diffusion planning pipeline and evaluate it on standard offline RL benchmarks. Experimental results on Maze2D and Kitchen tasks show that our method achieves competitive performance with up to 10 times fewer training steps, highlighting its practical effectiveness.
Article 1149
Title@2025-06-20 (5): Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
Title: Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections | Compliant Residual DAgger: Verbesserung der Real-World Kontakt-Rich-Manipulation mit menschlichen Korrekturen | 共同残存挖掘者:改进现实世界接触-Rich 人教管管管 2506.16685v1 |
Authors (4): Xiaomeng Xu, Yifan Hou, Zeyi Liu, Shuran Song
We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io/
Article 1150
Title@2025-06-20 (5): How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions
Title: How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions | Wie Sie Ihr Text-zu-Image-Modell trainieren: Bewertung von Design-Optionen für synthetische Trainingsbilder | 如何培训您的文本到图像模型:评估合成培训说明的设计选择 2506.16679v1 |
Authors (7): Manuel Brack, Sudeep Katakol, Felix Friedrich, Patrick Schramowski, Hareesh Ravi, Kristian Kersting, Ajinkya Kale
Training data is at the core of any successful text-to-image model. The quality and descriptiveness of image text are crucial to a model’s performance. Given the noisiness and inconsistency in web-scraped datasets, recent works have shifted towards synthetic training captions. While this setup is generally believed to produce more capable models, the current literature does not provide any insights into its design choices. This study closes this gap by systematically investigating how different synthetic captioning strategies impact the downstream performance of text-to-image models. Our experiments demonstrate that dense, high-quality captions enhance text alignment but may introduce trade-offs in output aesthetics and diversity. Conversely, captions of randomized lengths yield balanced improvements across aesthetics and alignment without compromising sample diversity. We also demonstrate that varying caption distributions introduce significant shifts in the output bias of a trained model. Our findings underscore the importance of caption design in achieving optimal model performance and provide practical insights for more effective training data strategies in text-to-image generation.
Article 1151
Title@2025-06-20 (5): Open-Set Graph Anomaly Detection via Normal Structure Regularisation
Title: Open-Set Graph Anomaly Detection via Normal Structure Regularisation | Open-Set Graph Anomalie Erkennung durch Normalstruktur Regularisierung | 通过正常结构规范化进行开放版图异常检测 2311.06835v5 |
Authors (5): Qizhou Wang, Guansong Pang, Mahsa Salehi, Xiaokun Xia, Christopher Leckie
This paper considers an important Graph Anomaly Detection (GAD) task, namely open-set GAD, which aims to train a detection model using a small number of normal and anomaly nodes (referred to as seen anomalies) to detect both seen anomalies and unseen anomalies (i.e., anomalies that cannot be illustrated by the training anomalies). The labelled training data provide crucial prior knowledge about abnormalities for GAD models, enabling substantially reduced detection errors. However, current supervised GAD methods tend to over-emphasise fitting the seen anomalies, leading to many errors in which unseen anomalies are detected as normal nodes. Further, existing open-set AD models were introduced to handle Euclidean data and fail to effectively capture discriminative features from graph structure and node attributes for GAD. In this work, we propose a novel open-set GAD approach, namely normal structure regularisation (NSReg), to achieve generalised detection ability for unseen anomalies while maintaining its effectiveness in detecting seen anomalies. The key idea in NSReg is to introduce a regularisation term that enforces the learning of compact, semantically rich representations of normal nodes based on their structural relations to other nodes. When optimised together with supervised anomaly detection losses, the regularisation term helps incorporate strong normality into the modelling; it thus effectively avoids over-fitting the seen anomalies and learns a better normality decision boundary, largely reducing the false negatives of detecting unseen anomalies as normal. Extensive empirical results on seven real-world datasets show that NSReg significantly outperforms state-of-the-art competing methods by at least 14% AUC-ROC on the unseen anomaly classes and by 10% AUC-ROC on all anomaly classes.
Article 1152
Title@2025-06-20 (5): Kinetics: Rethinking Test-Time Scaling Laws
Title: Kinetics: Rethinking Test-Time Scaling Laws | Kinetik: Überdenken von Test-Zeit-Skalierungsgesetzen | 动因:重新思考试验时间扩增法 2506.05333v3 |
Authors (6): Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen
We rethink test-time scaling laws from a practical efficiency perspective, revealing that the effectiveness of smaller models is significantly overestimated. Prior work, grounded in compute-optimality, overlooks critical memory access bottlenecks introduced by inference-time strategies (e.g., Best-of-$N$, long CoTs). Our holistic analysis, spanning models from 0.6B to 32B parameters, reveals a new Kinetics Scaling Law that better guides resource allocation by incorporating both computation and memory access costs. The Kinetics Scaling Law suggests that test-time compute is more effective when spent on models above a threshold than on smaller ones. A key reason is that in test-time scaling (TTS), attention, rather than parameter count, emerges as the dominant cost factor. Motivated by this, we propose a new scaling paradigm centered on sparse attention, which lowers per-token cost and enables longer generations and more parallel samples within the same resource budget. Empirically, we show that sparse attention models consistently outperform dense counterparts, achieving gains of over 60 points in low-cost regimes and over 5 points in high-cost regimes for problem-solving accuracy on AIME, encompassing evaluations on state-of-the-art MoEs. These results suggest that sparse attention is essential, and increasingly important as more compute is invested, for realizing the full potential of test-time scaling, where, unlike training, accuracy has yet to saturate as a function of computation and continues to improve through increased generation. The code is available at https://github.com/Infini-AI-Lab/Kinetics.
Article 1153
Title@2025-06-20 (5): RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations
Title: RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations | RL2Grid: Benchmarking-Verstärkung im Netzbetrieb | RL2Grid:在电力网业务中确定加强学习的基准 2503.23101v2 |
Authors (10): Enrico Marchesini, Benjamin Donnot, Constance Crozier, Ian Dytham, Christian Merz, Lars Schewe, Nico Westerbeck, Cathy Wu, Antoine Marot, Priya L. Donti
Reinforcement learning (RL) can provide adaptive and scalable controllers essential for power grid decarbonization. However, RL methods struggle with power grids’ complex dynamics, long-horizon goals, and hard physical constraints. For these reasons, we present RL2Grid, a benchmark designed in collaboration with power system operators to accelerate progress in grid control and foster RL maturity. Built on RTE France’s power simulation framework, RL2Grid standardizes tasks, state and action spaces, and reward structures for a systematic evaluation and comparison of RL algorithms. Moreover, we integrate operational heuristics and design safety constraints based on human expertise to ensure alignment with physical requirements. By establishing reference performance metrics for classic RL baselines on RL2Grid’s tasks, we highlight the need for novel methods capable of handling real systems and discuss future directions for RL-based grid control.
Article 1154
Title@2025-06-20 (5): Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
Title: Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models | Adaptive Anleitung beschleunigt die Stärkung des Lernens von Vernunftmodellen | 适应性指导加速加速强化理性模型学习 2506.13923v2 |
Authors (6): Vaskar Nath, Elaine Lau, Anisha Gunjal, Manasi Sharma, Nikhil Baharte, Sean Hendryx
We study the process through which reasoning models trained with reinforcement learning on verifiable rewards (RLVR) can learn to solve new problems. We find that RLVR drives performance in two main ways: (1) by compressing pass@$k$ into pass@1 and (2) via “capability gain” in which models learn to solve new problems that they previously could not solve even at high $k$. We find that while capability gain exists across model scales, learning to solve new problems is primarily driven through self-distillation. We demonstrate these findings across model scales ranging from 0.5B to 72B parameters on >500,000 reasoning problems with prompts and verifiable final answers across math, science, and code domains. We further show that we can significantly improve pass@$k$ rates by leveraging natural language guidance for the model to consider within context while still requiring the model to derive a solution chain from scratch. Based on these insights, we derive $\text{Guide}$, a new class of online training algorithms. $\text{Guide}$ adaptively incorporates hints into the model’s context on problems for which all rollouts were initially incorrect and adjusts the importance sampling ratio for the “off-policy” trajectories in order to optimize the policy for contexts in which the hints are no longer present. We describe variants of $\text{Guide}$ for GRPO and PPO and empirically show that Guide-GRPO on 7B and 32B parameter models improves generalization over its vanilla counterpart with up to 4% macro-average improvement across math benchmarks. We include careful ablations to analyze $\text{Guide}$’s components and theoretically analyze Guide’s learning efficiency.
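For readers unfamiliar with the pass@$k$ metric used throughout this abstract, the sketch below computes the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k) in Python, where n rollouts are sampled and c of them are correct. The abstract does not say which estimator the authors use, so this choice, along with the function name `pass_at_k`, is an assumption.

```python
import math

def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples is correct,
    given c correct answers among n sampled rollouts (requires k <= n)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # math.comb(n - c, k) is 0 when fewer than k incorrect rollouts exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical problem: 3 of 10 rollouts are correct.
single_try = pass_at_k(10, 3, 1)
five_tries = pass_at_k(10, 3, 5)
```

A problem with a low pass@1 but a high pass@5, as in this example, is the kind of case where "compressing pass@$k$ into pass@1" has room to help: the model can already solve it sometimes, and training makes the first attempt succeed more often.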
Article 1155
Title@2025-06-20 (5): The Hitchhiker’s Guide to Efficient, End-to-End, and Tight DP Auditing
Title: The Hitchhiker’s Guide to Efficient, End-to-End, and Tight DP Auditing | Der Hitchhiker-Leitfaden für effizientes, Ende-zu-Ende und enges DP-Auditing | Hitchhiker的《高效、最终到最终和严格DP审计指南》 2506.16666v1 |
Authors (5): Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Georgios Kaissis, Emiliano De Cristofaro
This paper systematizes research on auditing Differential Privacy (DP) techniques, aiming to identify key insights into the current state of the art and open challenges. First, we introduce a comprehensive framework for reviewing work in the field and establish three cross-contextual desiderata that DP audits should target, namely efficiency, end-to-end-ness, and tightness. Then, we systematize the modes of operation of state-of-the-art DP auditing techniques, including threat models, attacks, and evaluation functions. This allows us to highlight key details overlooked by prior work, analyze the limiting factors to achieving the three desiderata, and identify open research problems. Overall, our work provides a reusable and systematic methodology geared to assess progress in the field and identify friction points and future directions for our community to focus on.
Article 1156
Title@2025-06-20 (5): Private Training & Data Generation by Clustering Embeddings
Title: Private Training & Data Generation by Clustering Embeddings | Privates Training & Datengenerierung durch Clustering-Embeddings | 通过集群化嵌入进行私营培训和数据生成 2506.16661v1 |
Authors (5): Felix Zhou, Samson Zhou, Vahab Mirrokni, Alessandro Epasto, Vincent Cohen-Addad
Deep neural networks often use large, high-quality datasets to achieve high performance on many machine learning tasks. When training involves potentially sensitive data, this process can raise privacy concerns, as large models have been shown to unintentionally memorize and reveal sensitive information, including reconstructing entire training samples. Differential privacy (DP) provides a robust framework for protecting individual data. In particular, a new approach to privately training deep neural networks is to approximate the input dataset with a privately generated synthetic dataset before running any subsequent training algorithm. We introduce a novel principled method for DP synthetic image embedding generation, based on fitting a Gaussian Mixture Model (GMM) in an appropriate embedding space using DP clustering. Our method provably learns a GMM under separation conditions. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy on standard benchmark datasets. Additionally, we demonstrate that our method can generate realistic synthetic images that achieve downstream classification accuracy comparable to SOTA methods. Our method is quite general, as the encoder and decoder modules can be freely substituted to suit different tasks. It is also highly scalable, consisting only of subroutines that scale linearly with the number of samples and/or can be implemented efficiently in distributed systems.
Article 1157
Title@2025-06-20 (5): A Minimalist Optimizer Design for LLM Pretraining
Title: A Minimalist Optimizer Design for LLM Pretraining | Minimalistisches Optimizer-Design für LLM Pretraining | LLM 培训前最起码的优化剂设计 2506.16659v1 |
Authors (4): Athanasios Glentis, Jiaxiang Li, Andi Han, Mingyi Hong
Training large language models (LLMs) typically relies on adaptive optimizers such as Adam, which require significant memory to maintain first- and second-moment matrices, known as optimizer states. While recent works such as GaLore, Fira, and APOLLO have proposed state-compressed variants to reduce memory consumption, a fundamental question remains: What is the minimal amount of optimizer state that is truly necessary to retain state-of-the-art performance in LLM pretraining? In this work, we systematically investigate this question using a bottom-up approach. We find that two memory- and compute-efficient optimization techniques are particularly effective: (1) column-wise gradient normalization significantly boosts the performance of plain SGD without requiring momentum; and (2) adding first-order momentum only to the output layer - where gradient variance is highest - yields performance competitive with fully adaptive methods such as Muon. Based on these insights, we propose SCALE (Stochastic Column-normalized Last-layer Momentum), a new optimizer that combines column-normalized SGD with last-layer momentum, where column normalization refers to normalizing the gradient along the output dimension. Across multiple LLaMA models (60M-1B), SCALE matches or exceeds the performance of Adam while using only 35-45% of the total memory. It also consistently outperforms memory-efficient optimizers such as GaLore, Fira, and APOLLO, making it a strong candidate for large-scale pretraining under memory constraints. For the LLaMA 7B model, SCALE outperforms the state-of-the-art method APOLLO in terms of both perplexity and memory consumption. In addition, our method serves as a minimalist baseline for more sophisticated optimizer design.
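The two ingredients named in this abstract, column-normalized SGD and momentum kept only for the last layer, can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the authors' SCALE implementation: weights are assumed to have shape (out_dim, in_dim), normalization is taken as dividing each column by its L2 norm over the output dimension, normalization is applied after the momentum average on the last layer, and the names `column_normalize` and `scale_like_step` are hypothetical.

```python
import numpy as np

def column_normalize(grad, eps=1e-8):
    """Scale each column of a (out_dim, in_dim) gradient to unit L2 norm.
    Assumption: the norm is taken over axis 0, the output dimension."""
    norms = np.linalg.norm(grad, axis=0, keepdims=True)
    return grad / (norms + eps)

def scale_like_step(params, grads, momentum, last_layer, lr=0.1, beta=0.9):
    """One update: normalized SGD everywhere, momentum state only for last_layer."""
    for name, grad in grads.items():
        if name == last_layer:
            # The only optimizer state kept: a first-moment buffer for one layer.
            momentum[name] = beta * momentum[name] + (1 - beta) * grad
            grad = momentum[name]
        params[name] = params[name] - lr * column_normalize(grad)
    return params, momentum

# Hypothetical two-layer toy model with identical gradients for both layers.
toy_grad = np.array([[3.0, 0.0, 1.0], [4.0, 2.0, 0.0]])
params = {"hidden": np.ones((2, 3)), "output": np.ones((2, 3))}
grads = {"hidden": toy_grad, "output": toy_grad.copy()}
momentum = {"output": np.zeros((2, 3))}
params, momentum = scale_like_step(params, grads, momentum, last_layer="output")
```

The memory saving comes from the `momentum` dictionary holding a buffer for a single layer, whereas Adam would hold two buffers for every parameter tensor.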
Article 1158
Title@2025-06-20 (5): Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards
Title: Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards | Multi-Armed Bandits mit maschinellem Lernen-erzeugte Surrogate Belohnungen | 多装甲强盗和机器学习优于学习的代金奖 2506.16658v1 |
Authors (4): Wenlong Ji, Yihan Pan, Ruihao Zhu, Lihua Lei
Multi-armed bandit (MAB) is a widely adopted framework for sequential decision-making under uncertainty. Traditional bandit algorithms rely solely on online data, which tends to be scarce as it must be gathered during the online phase when the arms are actively pulled. However, in many practical settings, rich auxiliary data, such as covariates of past users, is available prior to deploying any arms. We introduce a new setting for MAB where pre-trained machine learning (ML) models are applied to convert side information and historical data into surrogate rewards. A prominent feature of this setting is that the surrogate rewards may exhibit substantial bias, as true reward data is typically unavailable in the offline phase, forcing ML predictions to rely heavily on extrapolation. To address the issue, we propose the Machine Learning-Assisted Upper Confidence Bound (MLA-UCB) algorithm, which can be applied to any reward prediction model and any form of auxiliary data. When the predicted and true rewards are jointly Gaussian, it provably improves the cumulative regret, provided that the correlation is non-zero, even in cases where the mean surrogate reward is completely misaligned with the true mean reward. Notably, our method requires no prior knowledge of the covariance matrix between true and surrogate rewards. We compare MLA-UCB with the standard UCB on a range of numerical studies and show a sizable efficiency gain even when the size of the offline data and the correlation between predicted and true rewards are moderate.
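As context for the comparison with "the standard UCB", here is a minimal Python sketch of the classic UCB1 rule, which uses online rewards only. It does not implement MLA-UCB or any use of surrogate rewards; the function name `ucb1`, the Bernoulli arms, and their success probabilities are hypothetical.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; `pull(arm)` returns a reward in [0, 1]."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each has a defined empirical mean.
            arm = t - 1
        else:
            # Choose the arm with the largest optimistic estimate.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means

# Hypothetical Bernoulli arms; the seed only makes the run repeatable.
rng = random.Random(0)
success_prob = [0.2, 0.8]
counts, means = ucb1(lambda a: float(rng.random() < success_prob[a]), 2, 2000)
```

Every estimate here comes from pulls made during the online phase, which is the data scarcity that the surrogate-reward setting in the abstract is meant to relieve.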
Article 1159
Title@2025-06-20 (5): Near Optimal Decision Trees in a SPLIT Second
Title: Near Optimal Decision Trees in a SPLIT Second | Nahe Optimale Entscheidung Bäume in einem SPLIT zweite | SPLIT 秒中接近最佳决定树 2502.15988v2 |
Authors (4): Varun Babbar, Hayden McTavish, Cynthia Rudin, Margo Seltzer
Decision tree optimization is fundamental to interpretable machine learning. The most popular approach is to greedily search for the best feature at every decision point, which is fast but provably suboptimal. Recent approaches find the global optimum using branch and bound with dynamic programming, showing substantial improvements in accuracy and sparsity at great cost to scalability. An ideal solution would have the accuracy of an optimal method and the scalability of a greedy method. We introduce a family of algorithms called SPLIT (SParse Lookahead for Interpretable Trees) that move us significantly forward in achieving this ideal balance. We demonstrate that not all sub-problems need to be solved to optimality to find high-quality trees; greediness suffices near the leaves. Since each additional level of depth adds an exponential number of possible trees, this change makes our algorithms orders of magnitude faster than existing optimal methods, with negligible loss in performance. We extend this algorithm to allow scalable computation of sets of near-optimal trees (i.e., the Rashomon set).
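The greedy step that the abstract says "suffices near the leaves" can be illustrated with a short Python sketch that picks the single binary feature whose split leaves the fewest misclassified points. This is only the generic greedy criterion, not the SPLIT algorithm or its lookahead search; the function name, the misclassification objective, and the toy data are assumptions.

```python
def best_greedy_split(rows, labels):
    """Return (feature_index, errors) for the binary feature whose split
    leaves the fewest misclassified points under majority-vote leaves."""
    def leaf_errors(subset):
        # A majority-vote leaf misclassifies the minority label.
        ones = sum(subset)
        return min(ones, len(subset) - ones)

    best = None
    for feature in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[feature] == 0]
        right = [y for x, y in zip(rows, labels) if x[feature] == 1]
        errors = leaf_errors(left) + leaf_errors(right)
        if best is None or errors < best[1]:
            best = (feature, errors)
    return best

# Hypothetical toy data: four points, two binary features, binary labels.
rows = [(0, 1), (0, 1), (1, 0), (1, 1)]
labels = [1, 1, 0, 1]
choice = best_greedy_split(rows, labels)
```

A fully greedy tree applies this rule at every node; an optimal method instead searches over whole trees, and the abstract's point is that the search is only worth its cost near the root.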
Article 1160
Title@2025-06-19 (4): Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures
Title: Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures | Relationales Deep Learning: Herausforderungen, Grundlagen und Architekturen der nächsten Generation | 关系深层学习:挑战、基础和下一代建筑 2506.16654v1 |
Authors (4): Vijay Prakash Dwivedi, Charilaos Kanatsoulis, Shenyang Huang, Jure Leskovec
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data and has been applied to molecules, social networks, recommendation systems, and transportation, among other domains. Data in multi-tabular relational databases can also be constructed as ‘relational entity graphs’ for Relational Deep Learning (RDL) - a new blueprint that enables end-to-end representation learning without traditional feature engineering. Compared to arbitrary graph-structured data, relational entity graphs have key properties: (i) their structure is defined by primary-foreign key relationships between entities in different tables, (ii) the structural connectivity is a function of the relational schema defining a database, and (iii) the graph connectivity is temporal and heterogeneous in nature. In this paper, we provide a comprehensive review of RDL by first introducing the representation of relational databases as relational entity graphs, and then reviewing public benchmark datasets that have been used to develop and evaluate recent GNN-based RDL models. We discuss key challenges including large-scale multi-table integration and the complexities of modeling temporal dynamics and heterogeneous data, while also surveying foundational neural network methods and recent architectural advances specialized for relational entity graphs. Finally, we explore opportunities to unify these distinct modeling challenges, highlighting how RDL converges multiple sub-fields in graph machine learning towards the design of foundation models that can transform the processing of relational data.
Article 1161
Title@2025-06-19 (4): Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials
Title: Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials | Integration dynamischer Systeme Lernen mit Basismodellen: Ein Meta-Evolutionäres KI-Framework für klinische Studien | 将动态系统学习与基础模型相结合:临床试验的非革命性AI框架 2506.14782v2 |
Authors (6): Joseph Geraci, Bessi Qorri, Christian Cumbaa, Mike Tsay, Paul Leonczyk, Luca Pani
Artificial intelligence (AI) has evolved into an ecosystem of specialized “species,” each with unique strengths. We analyze two: DeepSeek-V3, a 671-billion-parameter Mixture of Experts large language model (LLM) exemplifying scale-driven generality, and NetraAI, a dynamical system-based framework engineered for stability and interpretability on small clinical trial datasets. We formalize NetraAI’s foundations, combining contraction mappings, information geometry, and evolutionary algorithms to identify predictive patient cohorts. Features are embedded in a metric space and iteratively contracted toward stable attractors that define latent subgroups. A pseudo-temporal embedding and long-range memory enable exploration of higher-order feature interactions, while an internal evolutionary loop selects compact, explainable 2-4-variable bundles (“Personas”). To guide discovery, we introduce an LLM Strategist as a meta-evolutionary layer that observes Persona outputs, prioritizes promising variables, injects domain knowledge, and assesses robustness. This two-tier architecture mirrors the human scientific process: NetraAI as experimentalist, the LLM as theorist, forming a self-improving loop. In case studies (schizophrenia, depression, pancreatic cancer), NetraAI uncovered small, high-effect-size subpopulations that transformed weak baseline models (AUC ~0.50-0.68) into near-perfect classifiers using only a few features. We position NetraAI at the intersection of dynamical systems, information geometry, and evolutionary learning, aligned with emerging concept-level reasoning paradigms such as LeCun’s Joint Embedding Predictive Architecture (JEPA). By prioritizing reliable, explainable knowledge, NetraAI offers a new generation of adaptive, self-reflective AI to accelerate clinical discovery.
Article 1162
Title@2025-06-19 (4): LLMs in Coding and their Impact on the Commercial Software Engineering Landscape
Title: LLMs in Coding and their Impact on the Commercial Software Engineering Landscape | LLMs in Coding und ihre Auswirkungen auf die kommerzielle Software-Engineering-Landschaft | 编码及其对商业软件工程景观的影响 2506.16653v1 |
Authors (3): Vladislav Belozerov, Peter J Barclay, Askhan Sami
Large-language-model coding tools are now mainstream in software engineering. But as these same tools move human effort up the development stack, they present fresh dangers: 10% of real prompts leak private data, 42% of generated snippets hide security flaws, and the models can even “agree” with wrong ideas, a trait called sycophancy. We argue that firms must tag and review every AI-generated line of code, keep prompts and outputs inside private or on-premises deployments, obey emerging safety regulations, and add tests that catch sycophantic answers – so they can gain speed without losing security and accuracy.
Article 1163
Title@2025-06-19 (4): CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity
Title: CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity | CodeDiffuser: Aufmerksamkeitsverstärkte Diffusionspolitik über VLM-generierten Code für Instruction Ambiguity | 代码用户:通过VLM - 教育结构设计守则加强关注 - 强化传播政策 2506.16652v1 |
Authors (9): Guang Yin, Yitong Li, Yixuan Wang, Dale McConachie, Paarth Shah, Kunimatsu Hashimoto, Huan Zhang, Katherine Liu, Yunzhu Li
Natural language instructions for robotic manipulation tasks often exhibit ambiguity and vagueness. For instance, the instruction “Hang a mug on the mug tree” may involve multiple valid actions if there are several mugs and branches to choose from. Existing language-conditioned policies typically rely on end-to-end models that jointly handle high-level semantic understanding and low-level action generation, which can result in suboptimal performance due to their lack of modularity and interpretability. To address these challenges, we introduce a novel robotic manipulation framework that can accomplish tasks specified by potentially ambiguous natural language. This framework employs a Vision-Language Model (VLM) to interpret abstract concepts in natural language instructions and generates task-specific code - an interpretable and executable intermediate representation. The generated code interfaces with the perception module to produce 3D attention maps that highlight task-relevant regions by integrating spatial and semantic information, effectively resolving ambiguities in instructions. Through extensive experiments, we identify key limitations of current imitation learning methods, such as poor adaptation to language and environmental variations. We show that our approach excels across challenging manipulation tasks involving language ambiguity, contact-rich manipulation, and multi-object interactions.
Article 1164
Title@2025-06-19 (4): A Distributional-Lifting Theorem for PAC Learning
Title: A Distributional-Lifting Theorem for PAC Learning | Ein Distributional-Lifting-Theorem für PAC-Lernen | PAC 学习的分布式放行理论 2506.16651v1 |
Authors (4): Guy Blanc, Jane Lange, Carmen Strassle, Li-Yang Tan
The apparent difficulty of efficient distribution-free PAC learning has led to a large body of work on distribution-specific learning. Distributional assumptions facilitate the design of efficient algorithms but also limit their reach and relevance. Towards addressing this, we prove a distributional-lifting theorem: This upgrades a learner that succeeds with respect to a limited distribution family $\mathcal{D}$ to one that succeeds with respect to any distribution $D^\star$, with an efficiency overhead that scales with the complexity of expressing $D^\star$ as a mixture of distributions in $\mathcal{D}$. Recent work of Blanc, Lange, Malik, and Tan considered the special case of lifting uniform-distribution learners and designed a lifter that uses a conditional sample oracle for $D^\star$, a strong form of access not afforded by the standard PAC model. Their approach, which draws on ideas from semi-supervised learning, first learns $D^\star$ and then uses this information to lift. We show that their approach is information-theoretically intractable with access only to random examples, thereby giving formal justification for their use of the conditional sample oracle. We then take a different approach that sidesteps the need to learn $D^\star$, yielding a lifter that works in the standard PAC model and enjoys additional advantages: it works for all base distribution families, preserves the noise tolerance of learners, has better sample complexity, and is simpler.
Article 1165
Title@2025-06-19 (4): Distributional Adversarial Loss
Title: Distributional Adversarial Loss | Verlust des Verteilungsgefälles | 分布相对损 损 2406.03458v2 |
Authors (5): Saba Ahmadi, Siddharth Bhandari, Avrim Blum, Chen Dan, Prabhav Jain
We initiate the study of a new notion of adversarial loss which we call distributional adversarial loss. In this notion, we assume that for each original example, the allowed adversarial perturbation set is a family of distributions, and the adversarial loss over each example is the maximum loss over all the associated distributions. The goal is to minimize the overall adversarial loss. We show sample complexity bounds in the PAC-learning setting for our notion of adversarial loss. Our notion of adversarial loss contrasts with prior work on robust learning that considers a set of points, not distributions, as the perturbation set of each clean example. As an application of our approach, we show how to unify the two lines of work on randomized smoothing and robust learning in the PAC-learning setting and derive sample complexity bounds for randomized smoothing methods. Furthermore, we investigate the role of randomness in achieving robustness against adversarial attacks. We show a general derandomization technique that preserves the extent of a randomized classifier’s robustness against adversarial attacks and show its effectiveness empirically.
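One way to write the loss described in this abstract is sketched here. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{U}(x)$ denotes the family of distributions allowed for a clean example $x$, $\ell$ is a base loss, $h$ is a hypothesis, and $P$ is the data distribution. The loss of a distribution is read as the expected loss under it.

```latex
% Per-example distributional adversarial loss (notation assumed, not the paper's)
\ell_{\mathrm{adv}}(h; x, y)
  = \sup_{D \in \mathcal{U}(x)} \; \mathbb{E}_{z \sim D}\bigl[\ell(h(z), y)\bigr],
\qquad
L_{\mathrm{adv}}(h)
  = \mathbb{E}_{(x, y) \sim P}\bigl[\ell_{\mathrm{adv}}(h; x, y)\bigr].
```

Under this reading, the point-set perturbation model of earlier robust learning work corresponds to restricting $\mathcal{U}(x)$ to point masses.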
Article 1166
Title@2025-06-19 (4): Semantic Outlier Removal with Embedding Models and LLMs
Title: Semantic Outlier Removal with Embedding Models and LLMs | Semantic Outlier Entfernung mit Einbetten Modelle und LLMs | 带有嵌入型模型和LLMs的语义外外部清除 2506.16644v1 |
Authors (6): Eren Akbiyik, João Almeida, Rik Melis, Ritu Sriram, Viviana Petrescu, Vilhjálmur Vilhjálmsson
Modern text processing pipelines demand robust methods to remove extraneous content while preserving a document’s core message. Traditional approaches such as HTML boilerplate extraction or keyword filters often fail in multilingual settings and struggle with context-sensitive nuances, whereas Large Language Models (LLMs) offer improved quality at high computational cost. We introduce SORE (Semantic Outlier Removal), a cost-effective, transparent method that leverages multilingual sentence embeddings and approximate nearest-neighbor search to identify and excise unwanted text segments. By first identifying core content via metadata embedding and then flagging segments that either closely match predefined outlier groups or deviate significantly from the core, SORE achieves near-LLM extraction precision at a fraction of the cost. Experiments on HTML datasets demonstrate that SORE outperforms structural methods and yields high precision in diverse scenarios. Our system is currently deployed in production, processing millions of documents daily across multiple languages while maintaining both efficiency and accuracy. To facilitate reproducibility and further research, we release our implementation and evaluation datasets.
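The two flagging rules in this abstract (close match to an outlier group, or large deviation from the core) can be sketched as follows. This is a minimal illustration, not the released SORE implementation: the function name `flag_outliers`, both thresholds, and the toy 3-dimensional vectors are hypothetical, and brute-force cosine similarity stands in for the approximate nearest-neighbor search.

```python
import numpy as np

def _unit(v):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def flag_outliers(segments, core, outlier_groups, match_thr=0.9, core_thr=0.2):
    """Return a boolean mask: True for segments to remove.

    segments:       (n, d) segment embeddings
    core:           (d,)   embedding of the core content (e.g. from metadata)
    outlier_groups: (k, d) embeddings of predefined outlier groups
    Thresholds are illustrative assumptions, not values from the paper.
    """
    seg = _unit(segments)
    sim_to_core = seg @ _unit(core)                               # cosine to core
    sim_to_outlier = (seg @ _unit(outlier_groups).T).max(axis=1)  # best outlier match
    return (sim_to_outlier >= match_thr) | (sim_to_core < core_thr)

# Toy embeddings: on-topic, near an outlier group, unrelated to the core.
core = [1.0, 0.0, 0.0]
outlier_groups = [[0.0, 1.0, 0.0]]
segments = [[0.9, 0.1, 0.0], [0.05, 1.0, 0.0], [0.0, 0.0, 1.0]]
flags = flag_outliers(segments, core, outlier_groups)
```

In a production setting the exhaustive similarity computation would be replaced by an index lookup; the decision rule itself stays the same.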
Article 1167
Title@2025-06-19 (4): Learning to Route LLMs with Confidence Tokens
Title: Learning to Route LLMs with Confidence Tokens | Lernen, LLMs mit vertrauensvollen Token zu routen | 学习使用充满信心的LLMs路线 2410.13284v3 |
Authors (7): Yu-Neng Chuang, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, Xia Hu, Helen Zhou
Large language models (LLMs) have demonstrated impressive performance on several tasks and are increasingly deployed in real-world applications. However, especially in high-stakes settings, it becomes vital to know when the output of an LLM may be unreliable. Depending on whether an answer is trustworthy, a system can then choose to route the question to another expert, or otherwise fall back on a safe default behavior. In this work, we study the extent to which LLMs can reliably indicate confidence in their answers, and how this notion of confidence can translate into downstream accuracy gains. We propose Self-Reflection with Error-based Feedback (Self-REF), a lightweight training strategy to teach LLMs to express confidence in whether their answers are correct in a reliable manner. Self-REF introduces confidence tokens into the LLM, from which a confidence score can be extracted. Compared to conventional approaches such as verbalizing confidence and examining token probabilities, we demonstrate empirically that confidence tokens show significant improvements in downstream routing and rejection learning tasks.
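The routing step described in this abstract can be sketched as a threshold rule on a confidence score. This is only an illustration under stated assumptions: the `Answer` type, the `route` function, the threshold value, and the stub models are hypothetical, and how Self-REF actually extracts the score from its confidence tokens is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Answer:
    text: str
    confidence: float  # assumed to lie in [0, 1], e.g. derived from a confidence token

def route(question: str,
          small_model: Callable[[str], Answer],
          fallback: Callable[[str], str],
          threshold: float = 0.7) -> Tuple[str, str]:
    """Keep the small model's answer if it is confident enough, else defer."""
    answer = small_model(question)
    if answer.confidence >= threshold:
        return answer.text, "small"
    return fallback(question), "fallback"

# Stub models standing in for real LLM calls (hypothetical).
def stub_small(question: str) -> Answer:
    known = {"2+2?": Answer("4", 0.95)}
    return known.get(question, Answer("not sure", 0.10))

def stub_expert(question: str) -> str:
    return "expert answer"

confident = route("2+2?", stub_small, stub_expert)
deferred = route("hard question", stub_small, stub_expert)
```

Replacing `fallback` with a function that returns a refusal turns the same rule into the rejection setting mentioned in the abstract.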
Article 1168
Title@2025-06-19 (4): Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions
Title: Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions | Low-Resource-Video-Super-Resolution mit Speicher, Wavelets und deformierbare Konvolutionen | 使用记忆、波子和变形革命的低资源视频超级分辨率 2502.01816v3 |
Authors (5): Kavitha Viswanathan, Shashwat Pathak, Piyush Bharambe, Harsh Choudhary, Amit Sethi
The tradeoff between reconstruction quality and compute required for video super-resolution (VSR) remains a formidable challenge in its adoption for deployment on resource-constrained edge devices. While transformer-based VSR models have set new benchmarks for reconstruction quality in recent years, these require substantial computational resources. On the other hand, lightweight models, even those introduced recently, struggle to deliver state-of-the-art reconstruction. We propose a novel lightweight and parameter-efficient neural architecture for VSR that achieves state-of-the-art reconstruction accuracy with just 2.3 million parameters. Our model enhances information utilization based on several architectural attributes. Firstly, it uses 2D wavelet decompositions strategically interlayered with learnable convolutional layers to utilize the inductive prior of spatial sparsity of edges in visual data. Secondly, it uses a single memory tensor to capture inter-frame temporal information while avoiding the computational cost of previous memory-based schemes. Thirdly, it uses residual deformable convolutions for implicit inter-frame object alignment that improve upon deformable convolutions by enhancing spatial information in inter-frame feature differences. Architectural insights from our model can pave the way for real-time VSR on the edge, such as display devices for streaming data.
Article 1169
Title@2025-06-19 (4): Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation
Title: Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation | Latent Noise Injection für die private und statistisch ausgerichtete Synthetische Datengenerierung | 私人和统计上统一合成数据生成的热点喷射器 2506.16636v1 |
Authors (2): Rex Shen, Lu Tian
Synthetic Data Generation has become essential for scalable, privacy-preserving statistical analysis. While standard approaches based on generative models, such as Normalizing Flows, have been widely used, they often suffer from slow convergence in high-dimensional settings, frequently converging more slowly than the canonical $1/\sqrt{n}$ rate when approximating the true data distribution. To overcome these limitations, we propose a Latent Noise Injection method using Masked Autoregressive Flows (MAF). Instead of directly sampling from the trained model, our method perturbs each data point in the latent space and maps it back to the data domain. This construction preserves a one-to-one correspondence between observed and synthetic data, enabling synthetic outputs that closely reflect the underlying distribution, particularly in challenging high-dimensional regimes where traditional sampling struggles. Our procedure satisfies local $(\epsilon, \delta)$-differential privacy and introduces a single perturbation parameter to control the privacy-utility trade-off. Although estimators based on individual synthetic datasets may converge slowly, we show both theoretically and empirically that aggregating across $K$ studies in a meta-analysis framework restores classical efficiency and yields consistent, reliable inference. We demonstrate that with a well-calibrated perturbation parameter, Latent Noise Injection achieves strong statistical alignment with the original data and robustness against membership inference attacks. These results position our method as a compelling alternative to conventional flow-based sampling for synthetic data sharing in decentralized and privacy-sensitive domains, such as biomedical research.
Article 1170
Title@2025-06-19 (4): Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
Title: Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts | Lion Secretly Solves Constrained Optimization: Wie Lyapunov voraussagt | 限制优化:如Lyapunov预测 2310.05898v6 |
Authors (4): Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu
Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion’s efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
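The bound constraint can be observed numerically with the commonly published form of the Lion update. The sketch here is an illustration, not the paper's analysis: the quadratic test objective, the hyperparameter values, and the helper names `lion_step` and `run_lion` are assumptions. With step size `lr` and weight decay $\lambda$, each coordinate update is $x \leftarrow (1 - \mathrm{lr}\,\lambda)\,x - \mathrm{lr}\,\mathrm{sign}(c)$, so an iterate that starts with $|x_i| \le 1/\lambda$ stays inside that box whenever $\mathrm{lr}\,\lambda \le 1$.

```python
import numpy as np

def lion_step(x, m, grad, lr=0.01, beta1=0.9, beta2=0.99, weight_decay=0.5):
    """One Lion update with decoupled weight decay (lambda = weight_decay)."""
    c = beta1 * m + (1.0 - beta1) * grad               # interpolated direction
    x_new = x - lr * (np.sign(c) + weight_decay * x)   # sign step plus decay
    m_new = beta2 * m + (1.0 - beta2) * grad           # momentum update
    return x_new, m_new

def run_lion(target, steps=2000, lr=0.01, weight_decay=0.5):
    """Minimize 0.5 * ||x - target||^2 and track the largest |x_i| ever seen."""
    x = np.zeros_like(target)
    m = np.zeros_like(target)
    max_norm = 0.0
    for _ in range(steps):
        grad = x - target                              # gradient of the quadratic
        x, m = lion_step(x, m, grad, lr=lr, weight_decay=weight_decay)
        max_norm = max(max_norm, float(np.max(np.abs(x))))
    return x, max_norm

# Two targets lie far outside the box [-1/lambda, 1/lambda]; one lies inside.
target = np.array([10.0, -10.0, 0.3])
x_final, max_norm = run_lion(target)
bound = 1.0 / 0.5   # 1 / lambda
```

In this toy run the first two coordinates are pushed to the boundary of the box rather than to their unconstrained minimizers, which is the behavior the abstract attributes to decoupled weight decay.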
Article 1171
Title@2025-06-19 (4): Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data
Title: Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data | Erlernen ursächlich vorhersehbarer Ergebnisse aus Psychiatrischen Langzeitdaten | 精神病纵向数据产生的可预期的学习结果 2506.16629v1 |
Authors (1): Eric V. Strobl
Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia.
Article 1172
Title@2025-06-19 (4): Initial Investigation of LLM-Assisted Development of Rule-Based Clinical NLP System
Title: Initial Investigation of LLM-Assisted Development of Rule-Based Clinical NLP System | Erste Untersuchung der LLM-Assistenten Entwicklung eines regelbasierten klinischen NLP-Systems | 利用LLM协助开发有章可循的临床NLP系统的初步调查 2506.16628v1 |
Authors (2): Jianlin Shi, Brian T. Bucher
Despite advances in machine learning (ML) and large language models (LLMs), rule-based natural language processing (NLP) systems remain active in clinical settings due to their interpretability and operational efficiency. However, their manual development and maintenance are labor-intensive, particularly in tasks with large linguistic variability. To overcome these limitations, we proposed a novel approach employing LLMs solely during the rule-based systems development phase. We conducted the initial experiments focusing on the first two steps of developing a rule-based NLP pipeline: find relevant snippets from the clinical note; extract informative keywords from the snippets for the rule-based named entity recognition (NER) component. Our experiments demonstrated exceptional recall in identifying clinically relevant text snippets (Deepseek: 0.98, Qwen: 0.99) and 1.0 in extracting key terms for NER. This study sheds light on a promising new direction for NLP development, enabling semi-automated or automated development of rule-based systems with significantly faster, more cost-effective, and transparent execution compared with deep learning model-based solutions.
Article 1173
Title@2025-06-19 (4): FlatCAD: Fast Curvature Regularization of Neural SDFs for CAD Models
Title: FlatCAD: Fast Curvature Regularization of Neural SDFs for CAD Models | FlatCAD: Schnelle Curvature Regularisierung von neuralen SDFs für CAD-Modelle | FlatCAD: CAD 模型的神经SDF 快速曲线常规化 2506.16627v1 |
Authors (5): Haotian Yin, Aleksander Plocharski, Michal Jan Wlodarczyk, Mikolaj Kida, Przemyslaw Musialski
Neural signed-distance fields (SDFs) have become a versatile backbone for geometric learning, yet enforcing developable, CAD-style behavior still hinges on Gaussian curvature penalties that require full Hessian evaluation and second-order automatic differentiation, both of which are costly in memory and runtime. We present a curvature proxy that regularizes only the mixed second-order term (Weingarten term), allowing the two principal curvatures to adapt freely to data while suppressing unwanted warp. Two complementary instantiations realize this idea: (i) a finite-difference proxy that replaces each Hessian entry with four forward SDF evaluations and a single first-order gradient, and (ii) an autodiff proxy that computes the same mixed derivative via one Hessian-vector product, sidestepping explicit full Hessian assembly and remaining faster in practice. Both variants converge to the exact mixed second derivative, thus preserving the intended geometric bias without incurring full second-order graphs. On the ABC benchmarks, the proxies match or exceed the reconstruction fidelity of Hessian-based baselines while reducing GPU memory use and wall-clock time by a factor of two. Because the method is drop-in and framework-agnostic, it opens a practical path toward scalable, curvature-aware SDF learning for engineering-grade shape reconstruction.
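The finite-difference idea can be illustrated with the textbook four-point stencil for a mixed second directional derivative, $u^\top H v$, evaluated along two tangent directions derived from one gradient. This is a generic sketch and not the FlatCAD code: the paper's exact stencil may differ, and the helper names `tangent_frame` and `mixed_second_derivative_fd`, the step size, and the test functions are assumptions.

```python
import numpy as np

def tangent_frame(grad):
    """Build two orthonormal tangent directions from a (nonzero) 3D gradient."""
    n = grad / np.linalg.norm(grad)
    axis = np.eye(3)[np.argmin(np.abs(n))]   # coordinate axis least aligned with n
    u = np.cross(n, axis)
    u /= np.linalg.norm(u)
    v = np.cross(n, u)                        # unit length since n and u are orthonormal
    return u, v

def mixed_second_derivative_fd(f, x, u, v, h=1e-3):
    """Central-difference estimate of u^T H_f(x) v from four evaluations of f."""
    return (f(x + h * u + h * v) - f(x + h * u - h * v)
            - f(x - h * u + h * v) + f(x - h * u - h * v)) / (4.0 * h * h)

# A warped sheet (not a true SDF) whose mixed term is nonzero at the origin.
def sheet(p):
    return p[2] - p[0] * p[1]

# A sphere SDF of radius 1, whose mixed tangent term vanishes.
def sphere(p):
    return np.linalg.norm(p) - 1.0

origin = np.zeros(3)
u0, v0 = tangent_frame(np.array([0.0, 0.0, 1.0]))   # gradient of sheet at origin
sheet_mixed = mixed_second_derivative_fd(sheet, origin, u0, v0)

p = np.array([0.0, 0.0, 2.0])
u1, v1 = tangent_frame(p / np.linalg.norm(p))       # gradient of sphere at p
sphere_mixed = mixed_second_derivative_fd(sphere, p, u1, v1)
```

A regularizer built on this quantity would penalize its magnitude at sampled surface points; no Hessian matrix is formed at any step.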
Article 1174
Title@2025-06-19 (4): Harmonizing Safety and Speed: A Human-Algorithm Approach to Enhance the FDA’s Medical Device Clearance Policy
Title: Harmonizing Safety and Speed: A Human-Algorithm Approach to Enhance the FDA’s Medical Device Clearance Policy | Harmonisierung von Sicherheit und Geschwindigkeit: Ein Mensch-Algorithmus-Ansatz zur Verbesserung der Sicherheitspolitik für medizinische Geräte der FDA | 统一安全和速度:采取人类-逻辑方法,加强林业发展局的医疗设备清理政策 2407.11823v2 |
Authors (3): Mohammad Zhalechian, Soroush Saghafian, Omar Robles
The United States Food and Drug Administration’s (FDA’s) Premarket Notification 510(k) pathway allows manufacturers to gain approval for a medical device by demonstrating its substantial equivalence to another legally marketed device. However, the inherent ambiguity of this regulatory procedure has led to high recall rates for many devices cleared through this pathway. This trend has raised significant concerns regarding the efficacy of the FDA’s current approach, prompting a reassessment of the 510(k) regulatory framework. In this paper, we develop a combined human-algorithm approach to assist the FDA in improving its 510(k) medical device clearance process by reducing the risk of recalls and the workload imposed on the FDA. We first develop machine learning methods to estimate the risk of recall of 510(k) medical devices based on the information available at submission time. We then propose a data-driven clearance policy that recommends acceptance, rejection, or deferral to FDA’s committees for in-depth evaluation. We conduct an empirical study using a unique large-scale dataset of over 31,000 medical devices that we assembled based on data sources from the FDA and Centers for Medicare and Medicaid Services (CMS). A conservative evaluation of our proposed policy based on this data shows a 32.9% improvement in the recall rate and a 40.5% reduction in the FDA’s workload. Our analyses also indicate that implementing our policy could result in significant annual cost savings of $1.7 billion, which highlights the value of using a holistic and data-driven approach to improve the FDA’s current 510(k) medical device evaluation pathway.
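A three-way policy of the kind described (accept, reject, or defer to a committee) can be sketched as two thresholds on an estimated recall risk. This is a simplified illustration only: the function name `clearance_decision` and both threshold values are hypothetical, and the paper's data-driven policy is not assumed to take this exact form.

```python
def clearance_decision(recall_risk: float,
                       accept_below: float = 0.1,
                       reject_above: float = 0.6) -> str:
    """Map an estimated recall probability to accept / reject / defer.

    Thresholds are illustrative placeholders, not values from the paper.
    """
    if not 0.0 <= recall_risk <= 1.0:
        raise ValueError("recall_risk must be a probability in [0, 1]")
    if recall_risk < accept_below:
        return "accept"        # low estimated risk: clear without review
    if recall_risk > reject_above:
        return "reject"        # high estimated risk: decline
    return "defer"             # uncertain band: send to committee review

decisions = [clearance_decision(r) for r in (0.02, 0.35, 0.90)]
```

Widening the band between the two thresholds sends more submissions to human review, which trades workload against recall risk.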
Article 1175
Title@2025-06-19 (4): MonoSOWA: Scalable monocular 3D Object detector Without human Annotations
Title: MonoSOWA: Scalable monocular 3D Object detector Without human Annotations | MonoSOWA: Skalierbarer monookularer 3D Objektdetektor ohne menschliche Anmerkungen | MonoSOWA:无人说明的可缩缩的单镜3D物体探测器 2501.09481v3 |
Authors (2): Jan Skvrna, Lukas Neumann
Inferring object 3D position and orientation from a single RGB camera is a foundational task in computer vision with many important applications. Traditionally, 3D object detection methods are trained in a fully-supervised setup, requiring LiDAR and vast amounts of human annotations, which are laborious, costly, and do not scale well with the ever-increasing amounts of data being captured. We present a novel method to train a 3D object detector from a single RGB camera without domain-specific human annotations, making orders of magnitude more data available for training. The method uses a newly proposed Local Object Motion Model to disentangle object movement source between subsequent frames, is approximately 700 times faster than previous work, and compensates for camera focal length differences to aggregate multiple datasets. The method is evaluated on three public datasets, where despite using no human labels, it outperforms prior work by a significant margin. It also shows its versatility as a pre-training tool for fully-supervised training and shows that combining pseudo-labels from multiple datasets can achieve comparable accuracy to using human labels from a single dataset. The source code and model are available at https://github.com/jskvrna/MonoSOWA.
Article 1176
Title@2025-06-19 (4): Distribution Parameter Actor-Critic: Shifting the Agent-Environment Boundary for Diverse Action Spaces
Title: Distribution Parameter Actor-Critic: Shifting the Agent-Environment Boundary for Diverse Action Spaces | Verteilungsparameter Aktor-Kritik: Verschiebung der Agent-Umwelt-Grenze für unterschiedliche Aktionsräume | 分布参数 Actor-Critic: 改变不同行动空间的代理环境边界 2506.16608v1 |
Authors (3): Jiamin He, A. Rupam Mahmood, Martha White
We introduce a novel reinforcement learning (RL) framework that treats distribution parameters as actions, redefining the boundary between agent and environment. This reparameterization makes the new action space continuous, regardless of the original action type (discrete, continuous, mixed, etc.). Under this new parameterization, we develop a generalized deterministic policy gradient estimator, Distribution Parameter Policy Gradient (DPPG), which has lower variance than the gradient in the original action space. Although learning the critic over distribution parameters poses new challenges, we introduce interpolated critic learning (ICL), a simple yet effective strategy to enhance learning, supported by insights from bandit settings. Building on TD3, a strong baseline for continuous control, we propose a practical DPPG-based actor-critic algorithm, Distribution Parameter Actor-Critic (DPAC). Empirically, DPAC outperforms TD3 in MuJoCo continuous control tasks from OpenAI Gym and DeepMind Control Suite, and demonstrates competitive performance on the same environments with discretized action spaces.
Article 1177
Title@2025-06-19 (4): SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics
Title: SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics | SlepNet: Spektrales Subgraphenrepräsentationslernen für neurale Dynamik | SlepNet:神经动力学光谱子图示学习 2506.16602v1 |
Authors (6): Siddharth Viswanath, Rahul Singh, Yanlei Zhang, J. Adam Noah, Joy Hirsch, Smita Krishnaswamy
Graph neural networks have been useful in machine learning on graph-structured data, particularly for node classification and some types of graph classification tasks. However, they have had limited use in representing patterning of signals over graphs. Patterning of signals over graphs and in subgraphs carries important information in many domains including neuroscience. Neural signals are spatiotemporally patterned, high dimensional and difficult to decode. Graph signal processing and associated GCN models utilize the graph Fourier transform and are unable to efficiently represent spatially or spectrally localized signal patterning on graphs. Wavelet transforms have shown promise here, but offer non-canonical representations and cannot be tightly confined to subgraphs. Here we propose SlepNet, a novel GCN architecture that uses Slepian bases rather than graph Fourier harmonics. In SlepNet, the Slepian harmonics optimally concentrate signal energy on specifically relevant subgraphs that are automatically learned with a mask. Thus, they can produce canonical and highly resolved representations of neural activity, focusing energy of harmonics on areas of the brain which are activated. We evaluated SlepNet across three fMRI datasets, spanning cognitive and visual tasks, and two traffic dynamics datasets, comparing its performance against conventional GNNs and graph signal processing constructs. SlepNet outperforms the baselines in all datasets. Moreover, the extracted representations of signal patterns from SlepNet offer more resolution in distinguishing between similar patterns, and thus represent brain signaling transients as informative trajectories. Here we have shown that these extracted trajectory representations can be used for other downstream untrained tasks. Thus we establish that SlepNet is useful both for prediction and representation learning in spatiotemporal data.
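The energy-concentration property of Slepian bases can be sketched with the standard graph Slepian construction: restrict to the first `bandwidth` Laplacian eigenvectors $U_K$, then diagonalize $U_K^\top S\, U_K$, where $S$ is the diagonal indicator of the chosen subgraph. This is a minimal illustration, not the SlepNet architecture: the mask is fixed by hand here whereas SlepNet learns it, and the function name `graph_slepians` and the path-graph example are assumptions.

```python
import numpy as np

def graph_slepians(adjacency, mask, bandwidth):
    """Bandlimited vectors ordered by energy concentration inside `mask`.

    adjacency: (n, n) symmetric weight matrix
    mask:      (n,) 0/1 indicator of the subgraph of interest
    bandwidth: number of low-frequency Laplacian eigenvectors to span
    """
    degree = adjacency.sum(axis=1)
    laplacian = np.diag(degree) - adjacency
    _, fourier = np.linalg.eigh(laplacian)         # columns: graph Fourier basis
    u_k = fourier[:, :bandwidth]                   # bandlimited subspace
    concentration = u_k.T @ (mask[:, None] * u_k)  # U_K^T S U_K
    conc, coeffs = np.linalg.eigh(concentration)
    order = np.argsort(conc)[::-1]                 # most concentrated first
    return u_k @ coeffs[:, order], conc[order]

# Path graph on 8 nodes; the subgraph of interest is the first 4 nodes.
n = 8
adjacency = np.zeros((n, n))
for i in range(n - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
mask = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

slepians, conc = graph_slepians(adjacency, mask, bandwidth=4)
energy_in_mask = float(np.sum(slepians[mask.astype(bool), 0] ** 2))
```

Each eigenvalue in `conc` equals the fraction of the corresponding unit-norm vector's energy that falls inside the masked nodes, so the leading columns are the ones most tightly confined to the subgraph.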
Article 1178
Title@2025-06-19 (4): FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE
Title: FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE | FLAME: Auf dem Weg zu Federated Fine-Tuning großen Sprachmodellen durch adaptive SMoE | FLAME:通过适应性SMOE,走向联邦微调大语言模式 2506.16600v1 |
Authors (4): Khiem Le, Tuan Tran, Ting Hua, Nitesh V. Chawla
Existing resource-adaptive LoRA federated fine-tuning methods enable clients to fine-tune models using compressed versions of global LoRA matrices, in order to accommodate various compute resources across clients. This compression requirement will lead to suboptimal performance due to information loss. To address this, we propose FLAME, a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture. Unlike prior approaches, FLAME retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client. However, incorporating SMoE into federated learning introduces unique challenges, specifically, the mismatch in output magnitude from partial expert activation and the imbalance in expert training quality across clients. FLAME tackles these challenges through a lightweight rescaling mechanism and an activation-aware aggregation scheme. Empirical results across diverse computational settings demonstrate that FLAME consistently outperforms existing methods, providing a robust and effective solution for resource-adaptive federated learning.
Article 1179
Title@2025-06-19 (4): DRIVE Through the Unpredictability: From a Protocol Investigating Slip to a Metric Estimating Command Uncertainty
Title: DRIVE Through the Unpredictability: From a Protocol Investigating Slip to a Metric Estimating Command Uncertainty | DRIVE durch die Unvorhersehbarkeit: Von einem Protokoll, das Slip untersucht, zu einem Metric Estimating Command Uncertainty | 无法预测:从协议调查滑坡到计量估计命令不确定性 2506.16593v1 |
Authors (8): Nicolas Samson, William Larrivée-Hardy, William Dubois, Élie Roy-Brouard, Edith Brotherton, Dominic Baril, Julien Lépine, François Pomerleau
Off-road autonomous navigation is a challenging task because it depends mainly on the accuracy of the motion model. Motion model performance is limited by the ability to predict the interaction between the terrain and the UGV, which an onboard sensor cannot directly measure. In this work, we propose using the DRIVE protocol to standardize the collection of data for system identification and characterization of the slip state space. We validated this protocol by acquiring a dataset with two platforms (from 75 kg to 470 kg) on six terrains (i.e., asphalt, grass, gravel, ice, mud, sand) for a total of 4.9 hours and 14.7 km. Using this data, we evaluate the DRIVE protocol’s ability to explore the velocity command space and identify the reachable velocities for terrain-robot interactions. We investigated the transfer function between the command velocity space and the resulting steady-state slip for an SSMR. An unpredictability metric is proposed to estimate command uncertainty and help assess risk likelihood and severity in deployment. Finally, we share the lessons we learned from running system identification on large UGVs to help the community.
Article 1180
Title@2025-06-19 (4): Energy-Based Transfer for Reinforcement Learning
Title: Energy-Based Transfer for Reinforcement Learning | Energiebasierter Transfer für verstärktes Lernen | 强化学习以能源为基础的转让 2506.16590v1 |
Authors (6): Zeyun Deng, Jasorsi Ghosh, Fiona Xie, Yuzhe Lu, Katia Sycara, Joseph Campbell
Reinforcement learning algorithms often suffer from poor sample efficiency, making them challenging to apply in multi-task or continual learning settings. Efficiency can be improved by transferring knowledge from a previously trained teacher policy to guide exploration in new but related tasks. However, if the new task sufficiently differs from the teacher’s training task, the transferred guidance may be sub-optimal and bias exploration toward low-reward behaviors. We propose an energy-based transfer learning method that uses out-of-distribution detection to selectively issue guidance, enabling the teacher to intervene only in states within its training distribution. We theoretically show that energy scores reflect the teacher’s state-visitation density and empirically demonstrate improved sample efficiency and performance across both single-task and multi-task settings.
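The gating idea in this abstract can be illustrated with a minimal sketch. It assumes the logit-based free-energy score commonly used for out-of-distribution detection, which may differ from the energy function the authors actually train. The function names, the threshold value, and the action placeholders are all hypothetical.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free energy E = -T * log(sum_i exp(logit_i / T)).

    Lower energy is treated as "more in-distribution". This is the common
    logit-based OOD score, used here as an assumption.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -temperature * (m + math.log(sum(math.exp(z - m) for z in scaled)))

def choose_action(teacher_logits, teacher_action, student_action, threshold):
    """Issue teacher guidance only when the state looks in-distribution."""
    if energy_score(teacher_logits) <= threshold:
        return teacher_action
    return student_action  # otherwise the student explores on its own
```

In this sketch the threshold would be calibrated on states from the teacher's training task, for example as a percentile of their energies; that calibration step is not shown.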
Article 1181
Title@2025-06-19 (4): Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework
Title: Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework | Messung eines (ausreichenden) Weltmodells in LLMs: Ein Rahmen für die Abweichungszersetzung | 计量(足够)LLMM世界模型:差异分解框架 2506.16584v1 |
Authors (2): Nadav Kunievsky, James A. Evans
Understanding whether large language models (LLMs) possess a world model (a structured understanding of the world that supports generalization beyond surface-level patterns) is central to assessing their reliability, especially in high-stakes applications. We propose a formal framework for evaluating whether an LLM exhibits a sufficiently robust world model, defined as producing consistent outputs across semantically equivalent prompts while distinguishing between prompts that express different intents. We introduce a new evaluation approach to measure this that decomposes model response variability into three components: variability due to user purpose, user articulation, and model instability. An LLM with a strong world model should attribute most of the variability in its responses to changes in foundational purpose rather than superficial changes in articulation. This approach allows us to quantify how much of a model’s behavior is semantically grounded rather than driven by model instability or alternative wording. We apply this framework to evaluate LLMs across diverse domains. Our results show how larger models attribute a greater share of output variability to changes in user purpose, indicating a more robust world model. This improvement is not uniform, however: larger models do not consistently outperform smaller ones across all domains, and their advantage in robustness is often modest. These findings highlight the importance of moving beyond accuracy-based benchmarks toward semantic diagnostics that more directly assess the structure and stability of a model’s internal understanding of the world.
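A three-way split of this kind can be sketched with the law of total variance. The sketch below assumes responses have already been mapped to numeric scores and that the design is balanced (the same number of articulations per purpose and samples per prompt). The function name and data layout are hypothetical, and the paper's own estimator may differ.

```python
from statistics import fmean, pvariance

def decompose_variance(responses):
    """responses[p][a] is a list of numeric outputs sampled for purpose p,
    articulation a. Returns the three variance components and their sum."""
    # Mean response for every (purpose, articulation) prompt.
    cell_means = [[fmean(cell) for cell in purpose] for purpose in responses]
    # Mean response for every purpose.
    purpose_means = [fmean(row) for row in cell_means]

    purpose_var = pvariance(purpose_means)                         # between purposes
    articulation_var = fmean(pvariance(row) for row in cell_means) # wording, within purpose
    instability_var = fmean(                                       # resampling, within prompt
        pvariance(cell) for purpose in responses for cell in purpose
    )
    total = purpose_var + articulation_var + instability_var
    return {
        "purpose": purpose_var,
        "articulation": articulation_var,
        "instability": instability_var,
        "total": total,
    }
```

With a balanced design and population variances, the three terms add up exactly to the variance of all responses pooled together, so each term divided by `total` can be read as a share of variability.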
Article 1182
Title@2025-06-19 (4): A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning
Title: A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning | A Impliziert B: Schaltungsanalyse in LLMs für propositionelle logische Vernunft | A Implies B: 用于推定逻辑理由的LLMLM的电路分析 2411.04105v4 |
Authors (6): Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy
Due to the size and complexity of modern large language models (LLMs), it has proven challenging to uncover the underlying mechanisms that models use to solve reasoning problems. For instance, is their reasoning for a specific problem localized to certain parts of the network? Do they break down the reasoning problem into modular components that are then executed as sequential steps as we go deeper in the model? To better understand the reasoning capability of LLMs, we study a minimal propositional logic problem that requires combining multiple facts to arrive at a solution. By studying this problem on Mistral and Gemma models, up to 27B parameters, we illuminate the core components the models use to solve such logic problems. From a mechanistic interpretability point of view, we use causal mediation analysis to uncover the pathways and components of the LLMs’ reasoning processes. Then, we offer fine-grained insights into the functions of attention heads in different layers. We not only find a sparse circuit that computes the answer, but also decompose it into sub-circuits that have four distinct and modular uses. Finally, we reveal that three distinct models – Mistral-7B, Gemma-2-9B and Gemma-2-27B – contain analogous but not identical mechanisms.
Article 1183
Title@2025-06-19 (4): From Semantic To Instance: A Semi-Self-Supervised Learning Approach
Title: From Semantic To Instance: A Semi-Self-Supervised Learning Approach | Von semantisch bis instance: Ein halbselbstüberwachter Lernansatz | 从语义到实例:半自监督的学习方法 2506.16563v1 |
Authors (4): Keyhan Najafian, Farhad Maleki, Lingling Jin, Ian Stavness
Instance segmentation is essential for applications such as automated monitoring of plant health, growth, and yield. However, creating large-scale datasets with pixel-level annotations of each object instance requires extensive effort, which restricts the use of deep learning for instance segmentation in these areas. This challenge is more significant in images with densely packed, self-occluded objects, which are common in agriculture. To address this challenge, we propose a semi-self-supervised learning approach that requires minimal manual annotation to develop a high-performing instance segmentation model. We design GLMask, an image-mask representation for the model to focus on shape, texture, and pattern while minimizing its dependence on color features. We develop a pipeline to generate semantic segmentation and then transform it into instance-level segmentation. The proposed approach substantially outperforms the conventional instance segmentation models, establishing a state-of-the-art wheat head instance segmentation model with mAP@50 of 98.5%. Additionally, we assessed the proposed methodology on the general-purpose Microsoft COCO dataset, achieving a significant performance improvement of over 12.6% mAP@50. This highlights that the utility of our proposed approach extends beyond precision agriculture and applies to other domains, specifically those with similar data characteristics.
Article 1184
Title@2025-06-19 (4): ChatDBG: Augmenting Debugging with Large Language Models
Title: ChatDBG: Augmenting Debugging with Large Language Models | ChatDBG: Augmenting Debugging mit großen Sprachmodellen | 聊天DBG: 使用大语言模式加强调试 2403.16354v5 |
Authors (4): Kyla H. Levin, Nicolas van Kempen, Emery D. Berger, Stephen N. Freund
Debugging is a critical but challenging task for programmers. This paper proposes ChatDBG, an AI-powered debugging assistant. ChatDBG integrates large language models (LLMs) to significantly enhance the capabilities and user-friendliness of conventional debuggers. ChatDBG lets programmers engage in a collaborative dialogue with the debugger, allowing them to pose complex questions about program state, perform root cause analysis for crashes or assertion failures, and explore open-ended queries like “why is x null?”. To handle these queries, ChatDBG grants the LLM autonomy to “take the wheel”: it can act as an independent agent capable of querying and controlling the debugger to navigate through stacks and inspect program state. It then reports its findings and yields back control to the programmer. By leveraging the real-world knowledge embedded in LLMs, ChatDBG can diagnose issues identifiable only through the use of domain-specific reasoning. Our ChatDBG prototype integrates with standard debuggers including LLDB and GDB for native code and Pdb for Python. Our evaluation across a diverse set of code, including C/C++ code with known bugs and a suite of Python code including standalone scripts and Jupyter notebooks, demonstrates that ChatDBG can successfully analyze root causes, explain bugs, and generate accurate fixes for a wide range of real-world errors. For the Python programs, a single query led to an actionable bug fix 67% of the time; one additional follow-up query increased the success rate to 85%. ChatDBG has seen rapid uptake; it has already been downloaded more than 75,000 times.
Article 1185
Title@2025-06-19 (4): One Sample is Enough to Make Conformal Prediction Robust
Title: One Sample is Enough to Make Conformal Prediction Robust | Eine Probe reicht aus, um konforme Vorhersagen robust zu machen | 一个样本就足够制造 共创预测力了 2506.16553v1 |
Authors (3): Soroush H. Zargarbashi, Mohammad Sadegh Akhondzadeh, Aleksandar Bojchevski
Given any model, conformal prediction (CP) returns prediction sets guaranteed to include the true label with high adjustable probability. Robust CP (RCP) extends this to inputs with worst-case noise. A well-established approach is to use randomized smoothing for RCP since it is applicable to any black-box model and provides smaller sets compared to deterministic methods. However, current smoothing-based RCP requires many model forward passes per input, which is computationally expensive. We show that conformal prediction attains some robustness even with a forward pass on a single randomly perturbed input. Using any binary certificate, we propose a single-sample robust CP (RCP1). Our approach returns robust sets with smaller average set size compared to SOTA methods which use many (e.g. around 100) passes per input. Our key insight is to certify the conformal prediction procedure itself rather than individual scores. Our approach is agnostic to the setup (classification and regression). We further extend our approach to smoothing-based robust conformal risk control.
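For orientation, the baseline that RCP methods build on can be sketched as ordinary split conformal prediction for classification. This is not RCP1 itself: the robust variant would additionally score a noise-perturbed input and adjust the threshold using a certificate, which is not shown. The function names and the score `1 - p(label)` are illustrative choices.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Conformal quantile of calibration scores for coverage level 1 - alpha.

    cal_scores are nonconformity scores of held-out examples at their true
    labels, e.g. 1 - p(true label).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the quantile
    if k > n:
        return float("inf")  # too few calibration points: include every label
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, qhat):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]
```

One forward pass per test input suffices here. Per the abstract, the cost of current smoothing-based RCP comes from needing many passes per input.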
Article 1186
Title@2025-06-19 (4): A Free Probabilistic Framework for Analyzing the Transformer-based Language Models
Title: A Free Probabilistic Framework for Analyzing the Transformer-based Language Models | Ein freier probabilistischer Rahmen für die Analyse der transformerbasierten Sprachmodelle | 分析以变换器为基础的语言模型的自由概率框架 2506.16550v1 |
Authors (1): Swagatam Das
We outline an operator-theoretic framework for analyzing transformer-based language models using the tools of free probability theory. By representing token embeddings and attention mechanisms as self-adjoint operators in a tracial probability space, we reinterpret attention as a non-commutative convolution and view the layer-wise propagation of representations as an evolution governed by free additive convolution. This formalism reveals a spectral dynamical system underpinning deep transformer stacks and offers insight into their inductive biases, generalization behavior, and entropy dynamics. We derive a generalization bound based on free entropy and demonstrate that the spectral trace of transformer layers evolves predictably with depth. Our approach bridges neural architecture with non-commutative harmonic analysis, enabling principled analysis of information flow and structural complexity in large language models.
Article 1187
Title@2025-06-19 (4): Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU
Title: Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU | Herr Snuffleupagus bei SemEval-2025 Task 4: Unlearning Factual Knowledge von LLMs mit adaptiver RMU | Snuffleupagus先生在SemEval-2025任务4:从利用适应性RMU的LLMs中汲取事实知识 2506.16548v1 |
Authors (2): Arjun Dosajh, Mihika Sanghi
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. However, their tendency to memorize training data raises concerns regarding privacy, copyright compliance, and security, particularly in cases involving Personally Identifiable Information (PII). Effective machine unlearning techniques are essential to mitigate these risks, yet existing methods remain underdeveloped for LLMs due to their open-ended output space. In this work, we apply the Adaptive Representation Misdirection Unlearning (RMU) technique to unlearn sensitive information from LLMs. Through extensive experiments, we analyze the effects of unlearning across different decoder layers to determine the most effective regions for sensitive information removal. Our technique ranked 4th on the official leaderboard of both 1B parameter and 7B parameter models.
Article 1188
Title@2025-06-19 (4): BIDA: A Bi-level Interaction Decision-making Algorithm for Autonomous Vehicles in Dynamic Traffic Scenarios
Title: BIDA: A Bi-level Interaction Decision-making Algorithm for Autonomous Vehicles in Dynamic Traffic Scenarios | BIDA: Ein Zwei-Ebenen-Interaktionsentscheidungs-Algorithmus für autonome Fahrzeuge in dynamischen Verkehrsszenarien | BIDA:动态交通情况中机动车辆的双级互动决策比额 2506.16546v1 |
Authors (6): Liyang Yu, Tianyi Wang, Junfeng Jiao, Fengwu Shan, Hongqing Chu, Bingzhao Gao
In complex real-world traffic environments, autonomous vehicles (AVs) need to interact with other traffic participants while making real-time and safety-critical decisions accordingly. The unpredictability of human behaviors poses significant challenges, particularly in dynamic scenarios, such as multi-lane highways and unsignalized T-intersections. To address these challenges, we design a bi-level interaction decision-making algorithm (BIDA) that integrates interactive Monte Carlo tree search (MCTS) with deep reinforcement learning (DRL), aiming to enhance interaction rationality, efficiency and safety of AVs in dynamic key traffic scenarios. Specifically, we adopt three types of DRL algorithms to construct a reliable value network and policy network, which guide the online deduction process of interactive MCTS by assisting in value update and node selection. Then, a dynamic trajectory planner and a trajectory tracking controller are designed and implemented in CARLA to ensure smooth execution of planned maneuvers. Experimental evaluations demonstrate that our BIDA not only enhances interactive deduction and reduces computational costs, but also outperforms the latest benchmarks, exhibiting superior safety, efficiency and interaction rationality under varying traffic conditions.
Article 1189
Title@2025-06-19 (4): Essential-Web v1.0: 24T tokens of organized web data
Title: Essential-Web v1.0: 24T tokens of organized web data | Essential-Web v1.0: 24T Token von organisierten Web-Daten | 基本Web v1.0: 24个有组织网络数据标记 2506.14111v2 |
Authors (24): Essential AI, Andrew Hojel, Michael Pust, Tim Romanski, Yash Vanjani, Ritvik Kapila, Mohit Parmar, Adarsh Chaluvaraju, Alok Tripathy, Anil Thomas, Ashish Tanwer, Darsh J Shah, Ishaan Shah, Karl Stratos, Khoi Nguyen, Kurt Smith, Michael Callahan, Peter Rushton, Philip Monk, Platon Mazarakis, Saad Jamal, Saurabh Srivastava, Somanshu Singla, Ashish Vaswani
Data plays the most prominent role in how language models acquire skills and knowledge. The lack of massive, well-organized pre-training datasets results in costly and inaccessible data pipelines. We present Essential-Web v1.0, a 24-trillion-token dataset in which every document is annotated with a twelve-category taxonomy covering topic, format, content complexity, and quality. Taxonomy labels are produced by EAI-Distill-0.5b, a fine-tuned 0.5b-parameter model that achieves an annotator agreement within 3% of Qwen2.5-32B-Instruct. With nothing more than SQL-style filters, we obtain competitive web-curated datasets in math (-8.0% relative to SOTA), web code (+14.3%), STEM (+24.5%) and medical (+8.6%). Essential-Web v1.0 is available on HuggingFace: https://huggingface.co/datasets/EssentialAI/essential-web-v1.0
Article 1190
Title@2025-06-19 (4): On the Robustness of Decision-Focused Learning
Title: On the Robustness of Decision-Focused Learning | Zur Robustheit des entscheidungsorientierten Lernens | 关于决策重点学习的有力性 2311.16487v4 |
Authors (1): Yehya Farhat
Decision-Focused Learning (DFL) is an emerging learning paradigm that tackles the task of training a machine learning (ML) model to predict the missing parameters of an incomplete optimization problem. DFL trains an ML model in an end-to-end system, by integrating the prediction and optimization tasks, providing better alignment of the training and testing objectives. DFL has shown a lot of promise and holds the capacity to revolutionize decision-making in many real-world applications. However, very little is known about the performance of these models under adversarial attacks. We adopt ten unique DFL methods and benchmark their performance under two distinctly focused attacks adapted towards the Predict-then-Optimize problem setting. Our study proposes the hypothesis that the robustness of a model is highly correlated with its ability to find predictions that lead to optimal decisions without deviating from the ground-truth label. Furthermore, we provide insight into how to target the models that violate this condition and show how these models respond differently depending on the achieved optimality at the end of their training cycles.
Article 1191
Title@2025-06-19 (4): Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches
Title: Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches | Ausrichtung der ASR-Bewertung auf menschliche und LLM-Richtungen: Intelligibilitätsmetrics mit phonetischen, semantischen und NLI-Anflügen | 将ASR评价与人类和LLM判决:使用电话、语义和NLI方法的智能计量学 2506.16528v1 |
Authors (3): Bornali Phukon, Xiuwen Zheng, Mark Hasegawa-Johnson
Traditional ASR metrics like WER and CER fail to capture intelligibility, especially for dysarthric and dysphonic speech, where semantic alignment matters more than exact word matches. ASR systems struggle with these speech types, often producing errors like phoneme repetitions and imprecise consonants, yet the meaning remains clear to human listeners. We identify two key challenges: (1) Existing metrics do not adequately reflect intelligibility, and (2) while LLMs can refine ASR output, their effectiveness in correcting ASR transcripts of dysarthric speech remains underexplored. To address this, we propose a novel metric integrating Natural Language Inference (NLI) scores, semantic similarity, and phonetic similarity. Our ASR evaluation metric achieves a 0.890 correlation with human judgments on Speech Accessibility Project data, surpassing traditional methods and emphasizing the need to prioritize intelligibility over error-based measures.
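To make the criticism of WER concrete, here is a minimal sketch of the standard word error rate: word-level edit distance divided by the reference length. The function name is illustrative, and the tokenization is a plain whitespace split, which is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

A repeated word such as "the the cat sat" against the reference "the cat sat" already costs one third of the reference length, although a human listener would have no trouble with the meaning.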
Article 1192
Title@2025-06-19 (4): Improvement of Nuclide Detection through Graph Spectroscopic Analysis Framework and its Application to Nuclear Facility Upset Detection
Title: Improvement of Nuclide Detection through Graph Spectroscopic Analysis Framework and its Application to Nuclear Facility Upset Detection | Verbesserung der Nuklid-Erkennung durch Graph Spektroskopische Analyserahmen und ihre Anwendung auf Kernanlagen-Auffangerkennung | 通过图谱光谱分析框架及其适用于核设施爆裂探测的图示光谱分析框架改进核子分子探测 2506.16522v1 |
Authors (3): Pedro Rodríguez Fernández, Christian Svinth, Alex Hagen
We present a method to improve the detection limit for radionuclides using spectroscopic radiation detectors and the arrival time of each detected radiation quantum. We enable this method using a neural network with an attention mechanism. We illustrate the method on the detection of Cesium release from a nuclear facility during an upset, and our method shows a $2\times$ improvement over the traditional spectroscopic method. We hypothesize that our method achieves this performance increase by modulating its detection probability by the overall rate of probable detections, specifically by adapting detection thresholds based on temporal event distributions and local spectral features, and show evidence to this effect. We believe this method is applicable broadly and may be more successful for radionuclides with more complicated decay chains than Cesium; we also note that our method can generalize beyond the addition of arrival time and could integrate other data about each detection event, such as pulse quality, location in detector, or even the combined energy and time from detections in different detectors.
Article 1193
Title@2025-06-19 (4): Robust Reward Modeling via Causal Rubrics
Title: Robust Reward Modeling via Causal Rubrics | Robuste Reward-Modellierung über Kausalrubriken | 通过果实卢布建模的强力奖赏模型 2506.16507v1 |
Authors (12): Pragya Srivastava, Harman Singh, Rahul Madhavan, Gandharv Patil, Sravanti Addepalli, Arun Suggala, Rengarajan Aravamudhan, Soumya Sharma, Anirban Laha, Aravindan Raghuveer, Karthikeyan Shanmugam, Doina Precup
Reward models (RMs) are fundamental to aligning Large Language Models (LLMs) via human feedback, yet they often suffer from reward hacking. They tend to latch on to superficial or spurious attributes, such as response length or formatting, mistaking these cues learned from correlations in training data for the true causal drivers of quality (e.g., factuality, relevance). This occurs because standard training objectives struggle to disentangle these factors, leading to brittle RMs and misaligned policies. We introduce Crome (Causally Robust Reward Modeling), a novel framework grounded in an explicit causal model designed to mitigate reward hacking. Crome employs the following synthetic targeted augmentations during training: (1) Causal Augmentations, which are pairs that differ along specific causal attributes, to enforce sensitivity along each causal attribute individually, and (2) Neutral Augmentations, which are tie-label pairs varying primarily in spurious attributes, to enforce invariance along spurious attributes. Notably, our augmentations are produced without any knowledge of spurious factors, via answer interventions only along causal rubrics, which are identified by querying an oracle LLM. Empirically, Crome significantly outperforms standard baselines on RewardBench, improving average accuracy by up to 5.4% and achieving gains of up to 13.2% and 7.2% in specific categories. The robustness of Crome is further attested by the consistent gains obtained in a Best-of-N inference setting across increasing N, across various benchmarks, including the popular RewardBench (covering chat, chat-hard, safety, and reasoning tasks), the safety-focused WildGuardTest, and the reasoning-specific GSM8k.
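One way to see how ordinary preference pairs and tie-label pairs can share a single objective is a Bradley-Terry cross-entropy with a soft label. The abstract does not state Crome's loss, so the sketch below is an assumption for illustration only, and the function names are hypothetical.

```python
import math

def _log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -softplus(-x).
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))

def pairwise_loss(reward_a, reward_b, label):
    """Cross-entropy under P(a preferred) = sigmoid(reward_a - reward_b).

    label = 1.0: a preferred, 0.0: b preferred, 0.5: tie (assumed encoding).
    """
    d = reward_a - reward_b
    return -(label * _log_sigmoid(d) + (1.0 - label) * _log_sigmoid(-d))
```

With `label = 0.5` the loss is minimized when both rewards are equal, which pushes the model toward invariance on the attribute that differs within the pair. With `label = 1.0` it rewards a larger margin, which pushes toward sensitivity.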
Article 1194
Title@2025-06-19 (4): Subspace-Boosted Model Merging
Title: Subspace-Boosted Model Merging | Subraum-beschleunigtes Modell Zusammenführen | 子空间叠装模型合并 2506.16506v1 |
Authors (4): Ronald Skorobogat, Karsten Roth, Mariana-Iuliana Georgescu, Zeynep Akata
Model merging enables the combination of multiple specialized expert models into a single model capable of performing multiple tasks. However, merging an increasing number of specialized experts generally leads to diminishing returns and reduced overall performance gains. In this work, we offer an explanation and analysis from a task arithmetic perspective, revealing that as the merging process (across numerous existing merging methods) continues for more and more experts, the associated task vector space experiences rank collapse. To mitigate this issue, we introduce Subspace Boosting, which operates on the singular value decomposed task vector space and maintains task vector ranks. Subspace Boosting raises merging efficacy for up to 20 expert models by large margins of more than 10% when evaluated on vision benchmarks. Moreover, we propose employing Higher-Order Generalized Singular Value Decomposition to further quantify task similarity, offering a new interpretable perspective on model merging.
Article 1195
Title@2025-06-19 (4): SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Title: SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity | SparseLoRA: LLM-Fine-Tuning mit Kontextsparsität beschleunigen | 加快LLM与上下文质量的精细调整 2506.16500v1 |
Authors (10): Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Chenfeng Xu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu
Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. Also, we systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2 times and achieves a measured speedup of up to 1.6 times while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
Article 1196
Title@2025-06-19 (4): ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning
Title: ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning | ML-Master: Auf dem Weg zu KI durch Integration von Exploration und Vernunft | ML-Master:通过综合探讨和理由,争取AI为AI 2506.16499v1 |
Authors (9): Zexi Liu, Yuzhu Cai, Xinyu Zhu, Yujie Zheng, Runkun Chen, Ying Wen, Yanfeng Wang, Weinan E, Siheng Chen
As AI capabilities advance toward and potentially beyond human-level performance, a natural transition emerges where AI-driven development becomes more efficient than human-centric approaches. A promising pathway toward this transition lies in AI-for-AI (AI4AI), which leverages AI techniques to automate and optimize the design, training, and deployment of AI systems themselves. While LLM-based agents have shown the potential to realize AI4AI, they are often unable to fully leverage the experience accumulated by agents during the exploration of solutions in the reasoning process, leading to inefficiencies and suboptimal performance. To address this limitation, we propose ML-Master, a novel AI4AI agent that seamlessly integrates exploration and reasoning by employing a selectively scoped memory mechanism. This approach allows ML-Master to efficiently combine diverse insights from parallel solution trajectories with analytical reasoning, guiding further exploration without overwhelming the agent with excessive context. We evaluate ML-Master on the MLE-Bench, where it achieves a 29.3% average medal rate, significantly surpassing existing methods, particularly in medium-complexity tasks, while accomplishing this superior performance within a strict 12-hour time constraint, half the 24-hour limit used by previous baselines. These results demonstrate ML-Master’s potential as a powerful tool for advancing AI4AI.
Article 1197
Title@2025-06-19 (4): QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation
Title: QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation | QG-SMS: Verbesserung der Testobjektanalyse durch Studentenmodellierung und Simulation | QG-SMS:通过学生建模和模拟加强测试物品分析 2503.05888v2 |
Authors (5): Bang Nguyen, Tingting Du, Mengxia Yu, Lawrence Angrave, Meng Jiang
While the Question Generation (QG) task has been increasingly adopted in educational assessments, its evaluation remains limited by approaches that lack a clear connection to the educational values of test items. In this work, we introduce test item analysis, a method frequently used by educators to assess test question quality, into QG evaluation. Specifically, we construct pairs of candidate questions that differ in quality across dimensions such as topic coverage, item difficulty, item discrimination, and distractor efficiency. We then examine whether existing QG evaluation approaches can effectively distinguish these differences. Our findings reveal significant shortcomings in these approaches with respect to accurately assessing test item quality in relation to student performance. To address this gap, we propose a novel QG evaluation framework, QG-SMS, which leverages large language models for Student Modeling and Simulation to perform test item analysis. As demonstrated in our extensive experiments and human evaluation study, the additional perspectives introduced by the simulated student profiles lead to a more effective and robust assessment of test items.
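Two of the dimensions named above, item difficulty and item discrimination, have simple classical definitions that can be sketched as follows. The function name and data layout are hypothetical. The upper and lower 27% groups are a common convention and not necessarily what QG-SMS uses, and in the paper's setting the responses would come from simulated students.

```python
def item_statistics(responses, item, group_fraction=0.27):
    """Classical test item analysis for one item.

    responses: one list per student, with 1 (correct) or 0 (incorrect) per item.
    Returns (difficulty, discrimination).
    """
    n = len(responses)
    # Difficulty: proportion of all students answering the item correctly.
    difficulty = sum(r[item] for r in responses) / n
    # Rank students by total score and compare the top and bottom groups.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    discrimination = (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / k
    return difficulty, discrimination
```

A discrimination value near 1 means that strong students get the item right and weak students get it wrong, while a value near 0 or below suggests the item does not separate them.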
Article 1198
Title@2025-06-19 (4): Manifold Learning for Personalized and Label-Free Detection of Cardiac Arrhythmias
Title: Manifold Learning for Personalized and Label-Free Detection of Cardiac Arrhythmias | Manifold Learning für personalisierte und etikettenfreie Erkennung von Herzrhythmusstörungen | 人工和无标签地发现心心心律失常的人工和不贴标签的人文学习 2506.16494v1 |
Authors (2): Amir Reza Vazifeh, Jason W. Fleischer
Electrocardiograms (ECGs) provide direct, non-invasive measurements of heart activity and are well-established tools for detecting and monitoring cardiovascular disease. However, manual ECG analysis can be time-consuming and prone to errors. Machine learning has emerged as a promising approach for automated heartbeat recognition and classification, but substantial variations in ECG signals make it challenging to develop generalizable models. ECG signals can vary widely across individuals and leads, while datasets often follow different labeling standards and may be biased, all of which greatly hinder supervised methods. Conventional unsupervised methods, e.g. principal component analysis, prioritize large (and often obvious) variances in the data and typically overlook subtle yet clinically relevant patterns. If labels are missing and/or variations are significant but small, both approaches fail. Here, we show that nonlinear dimensionality reduction (NLDR) can accommodate these issues and identify medically relevant features in ECG signals, with no need for training or prior information. Using the MLII and V1 leads of the MIT-BIH dataset, we demonstrate that t-distributed stochastic neighbor embedding and uniform manifold approximation and projection can discriminate individual recordings in mixed populations with >= 90% accuracy and distinguish different arrhythmias in individual patients with a median accuracy of 98.96% and a median F1-score of 91.02%. The results show that NLDR holds much promise for cardiac monitoring, including the limiting cases of single-lead ECG and the current 12-lead standard of care, and for personalized health care beyond cardiology.
Article 1199
Title@2025-06-19 (4): Competing Bandits in Decentralized Contextual Matching Markets
Title: Competing Bandits in Decentralized Contextual Matching Markets | Konkurrieren von Banditen in dezentralisierten Kontext-Matching-Märkten | 分散环境匹配市场中相互竞争的强盗 2411.11794v2 |
Authors (4): Satush Parikh, Soumya Basu, Avishek Ghosh, Abishek Sankararaman
Sequential learning in a multi-agent resource constrained matching market has received significant interest in the past few years. We study decentralized learning in two-sided matching markets where the demand side (aka players or agents) competes for the supply side (aka arms) with potentially time-varying preferences to obtain a stable match. Motivated by the linear contextual bandit framework, we assume that for each agent, an arm-mean may be represented by a linear function of a known feature vector and an unknown (agent-specific) parameter. Moreover, the preferences over arms depend on a latent environment in each round, where the latent environment varies across rounds in a non-stationary manner. We propose learning algorithms to identify the latent environment and obtain stable matchings simultaneously. Our proposed algorithms achieve instance-dependent logarithmic regret, scaling independently of the number of arms, and are hence applicable to large markets.
Article 1200
Title@2025-06-19 (4): Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection
Title: Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection | Auf dem Weg zu allgemeingültigen allgemeinen schädlichen Sprachdatensätzen für Implizite Hass-Spracherkennung | 争取建立通用通用通用有害言论数据集,用于隐含仇恨言论探测 2506.16476v1 |
Authors (4): Saad Almohaimeed, Saleh Almohaimeed, Damla Turgut, Ladislau Bölöni
Implicit hate speech has recently emerged as a critical challenge for social media platforms. While much of the research has traditionally focused on harmful speech in general, the need for generalizable techniques to detect veiled and subtle forms of hate has become increasingly pressing. Based on lexicon analysis, we hypothesize that implicit hate speech is already present in publicly available harmful speech datasets but may not have been explicitly recognized or labeled by annotators. Additionally, crowdsourced datasets are prone to mislabeling due to the complexity of the task and are often influenced by annotators’ subjective interpretations. In this paper, we propose an approach to address the detection of implicit hate speech and enhance generalizability across diverse datasets by leveraging existing harmful speech datasets. Our method comprises three key components: influential sample identification, reannotation, and augmentation using Llama-3 70B and GPT-4o. Experimental results demonstrate the effectiveness of our approach in improving implicit hate detection, achieving a +12.9-point F1 score improvement compared to the baseline.
Article 1201
Title@2025-06-19 (4): Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining
Title: Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining | Human2LocoMan: Vielseitige Quadrupedalmanipulation mit menschlichem Vortraining lernen | 人类2 Locoman: 学习与人类预科培训一起四步操作 2506.16475v1 |
Authors (14): Yaru Niu, Yunzhe Zhang, Mingyang Yu, Changyi Lin, Chenhao Li, Yikai Wang, Yuxiang Yang, Wenhao Yu, Tingnan Zhang, Bingqing Chen, Jonathan Francis, Zhenzhen Li, Jie Tan, Ding Zhao
Quadrupedal robots have demonstrated impressive locomotion capabilities in complex environments, but equipping them with autonomous versatile manipulation skills in a scalable way remains a significant challenge. In this work, we introduce a cross-embodiment imitation learning system for quadrupedal manipulation, leveraging data collected from both humans and LocoMan, a quadruped equipped with multiple manipulation modes. Specifically, we develop a teleoperation and data collection pipeline, which unifies and modularizes the observation and action spaces of the human and the robot. To effectively leverage the collected data, we propose an efficient modularized architecture that supports co-training and pretraining on structured modality-aligned data across different embodiments. Additionally, we construct the first manipulation dataset for the LocoMan robot, covering various household tasks in both unimanual and bimanual modes, supplemented by a corresponding human dataset. We validate our system on six real-world manipulation tasks, where it achieves an average success rate improvement of 41.9% overall and 79.7% under out-of-distribution (OOD) settings compared to the baseline. Pretraining with human data contributes a 38.6% success rate improvement overall and 82.7% under OOD settings, enabling consistently better performance with only half the amount of robot data. Our code, hardware, and data are open-sourced at: https://human2bots.github.io.
Article 1202
Title@2025-06-19 (4): Boosting multi-demographic federated learning for chest radiograph analysis using general-purpose self-supervised representations
Title: Boosting multi-demographic federated learning for chest radiograph analysis using general-purpose self-supervised representations | Förderung des multidemografischen föderierten Lernens für die Röntgenanalyse in der Brust mittels selbstüberwachter Darstellungen für allgemeine Zwecke | 利用普通用途自我监督的表述方式,促进多人口联合会学习胸部射电分析 2504.08584v2 |
Authors (5): Mahshad Lotfinia, Arash Tayebiarasteh, Samaneh Samiei, Mehdi Joodaki, Soroosh Tayebi Arasteh
Reliable artificial intelligence (AI) models for medical image analysis often depend on large and diverse labeled datasets. Federated learning (FL) offers a decentralized and privacy-preserving approach to training but struggles in highly non-independent and identically distributed (non-IID) settings, where institutions with more representative data may experience degraded performance. Moreover, existing large-scale FL studies have been limited to adult datasets, neglecting the unique challenges posed by pediatric data, which introduces additional non-IID variability. To address these limitations, we analyzed n=398,523 adult chest radiographs from diverse institutions across multiple countries and n=9,125 pediatric images, leveraging transfer learning from general-purpose self-supervised image representations to classify pneumonia and cases with no abnormality. Using state-of-the-art vision transformers, we found that FL improved performance only for smaller adult datasets (P<0.001) but degraded performance for larger datasets (P<0.064) and pediatric cases (P=0.242). However, equipping FL with self-supervised weights significantly enhanced outcomes across pediatric cases (P=0.031) and most adult datasets (P<0.008), except the largest dataset (P=0.052). These findings underscore the potential of easily deployable general-purpose self-supervised image representations to address non-IID challenges in clinical FL applications and highlight their promise for enhancing patient outcomes and advancing pediatric healthcare, where data scarcity and variability remain persistent obstacles.
Article 1203
Title@2025-06-19 (4): AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation
Title: AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation | AlphaTrans: Ein neuro-symbolischer Kompositionsansatz für Repository-Level-Code-Übersetzung und Validierung | AlphaTrans: 存储层代码翻译和校验的神经-交元组合法 2410.24117v5 |
Authors (7): Ali Reza Ibrahimzada, Kaiyao Ke, Mrigank Pawagi, Muhammad Salman Abid, Rangeet Pan, Saurabh Sinha, Reyhaneh Jabbarvand
Code translation transforms programs from one programming language (PL) to another. Several rule-based transpilers have been designed to automate code translation between different pairs of PLs. However, the rules can become obsolete as the PLs evolve and cannot generalize to other PLs. Recent studies have explored the automation of code translation using Large Language Models (LLMs). One key observation is that such techniques may work well for crafted benchmarks but fail to generalize to the scale and complexity of real-world projects with dependencies, custom types, PL-specific features, etc. We propose AlphaTrans, a neuro-symbolic approach to automate repository-level code translation. AlphaTrans translates both source and test code, and employs multiple levels of validation to ensure the translation preserves the functionality of the source program. To break down the problem for LLMs, AlphaTrans leverages program analysis to decompose the program into fragments and translates them in the reverse call order. We leveraged AlphaTrans to translate ten real-world open-source projects consisting of <836, 8575, 2719> classes, methods, and tests. AlphaTrans breaks down these projects into 17874 fragments and translates the entire repository. 96.40% of the translated fragments are syntactically correct, and AlphaTrans validates the translations’ runtime behavior and functional correctness for 27.03% and 25.14% of fragments. On average, the integrated translation and validation take 34 hours to translate a project, showing its scalability in practice. For the incorrect translations, AlphaTrans generates a report including existing translation, stack trace, test errors, or assertion failures. We provided these artifacts to two developers to fix the translation bugs in four projects. They were able to fix the issues in 20.1 hours on average and achieve all passing tests.
Article 1204
Title@2025-06-19 (4): Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities
Title: Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities | Progressive Inferenz-Zeit Annealing von Diffusionsmodellen für die Probenahme von Boltzmann Dichten | Boltzmann大区采样扩散模型的逐步推导-及时销毁 2506.16471v1 |
Authors (10): Tara Akhound-Sadegh, Jungyoon Lee, Avishek Joey Bose, Valentin De Bortoli, Arnaud Doucet, Michael M. Bronstein, Dominique Beaini, Siamak Ravanbakhsh, Kirill Neklyudov, Alexander Tong
Sampling efficiently from a target unnormalized probability density remains a core challenge, with relevance across countless high-impact scientific applications. A promising approach towards this challenge is the design of amortized samplers that borrow key ideas, such as probability path design, from state-of-the-art generative diffusion models. However, all existing diffusion-based samplers remain unable to draw samples from distributions at the scale of even simple molecular systems. In this paper, we propose Progressive Inference-Time Annealing (PITA), a novel framework to learn diffusion-based samplers that combines two complementary interpolation techniques: I.) Annealing of the Boltzmann distribution and II.) Diffusion smoothing. PITA trains a sequence of diffusion models from high to low temperatures by sequentially training each model at progressively higher temperatures, leveraging engineered easy access to samples of the temperature-annealed target density. In the subsequent step, PITA enables simulating the trained diffusion model to procure training samples at a lower temperature for the next diffusion model through inference-time annealing using a novel Feynman-Kac PDE combined with Sequential Monte Carlo. Empirically, PITA enables, for the first time, equilibrium sampling of N-body particle systems, Alanine Dipeptide, and tripeptides in Cartesian coordinates with dramatically lower energy function evaluations. Code available at: https://github.com/taraak/pita
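For readers unfamiliar with temperature annealing, the relation below writes out the standard Boltzmann density at temperature $T$ and shows how densities at two temperatures are related. The symbols $E$ (energy function), $k_B$ (Boltzmann constant) and $Z_T$ (normalizing constant) are conventional notation assumed here; the abstract itself does not fix any notation.

```latex
\pi_T(x) \;=\; \frac{1}{Z_T}\exp\!\left(-\frac{E(x)}{k_B T}\right),
\qquad
Z_T \;=\; \int \exp\!\left(-\frac{E(x)}{k_B T}\right)\mathrm{d}x,
\qquad
\pi_{T_2}(x) \;\propto\; \pi_{T_1}(x)^{\,T_1/T_2}.
```

Raising $T$ flattens the density and makes it easier to sample, while lowering $T$ sharpens it around low-energy states, which is why samples at a high temperature are a natural starting point for reaching a lower one.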
Article 1205
Title@2025-06-19 (4): Human-like Forgetting Curves in Deep Neural Networks
Title: Human-like Forgetting Curves in Deep Neural Networks | Menschenähnliche vergessende Kurven in tiefen neuralen Netzwerken | 人类在深神经网络中忘记曲线 2506.12034v2 |
Authors (1): Dylan Kline
This study bridges cognitive science and neural network design by examining whether artificial models exhibit human-like forgetting curves. Drawing upon Ebbinghaus’ seminal work on memory decay and principles of spaced repetition, we propose a quantitative framework to measure information retention in neural networks. Our approach computes the recall probability by evaluating the similarity between a network’s current hidden state and previously stored prototype representations. This retention metric facilitates the scheduling of review sessions, thereby mitigating catastrophic forgetting during deployment and enhancing training efficiency by prompting targeted reviews. Our experiments with Multi-Layer Perceptrons reveal human-like forgetting curves, with knowledge becoming increasingly robust through scheduled reviews. This alignment between neural network forgetting curves and established human memory models identifies neural networks as an architecture that naturally emulates human memory decay and can inform state-of-the-art continual learning algorithms.
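The retention idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the cosine-to-probability mapping, the review threshold of 0.8, and all function names are assumptions made for the example, and the exponential curve is the classic Ebbinghaus form $R = e^{-t/S}$.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_probability(hidden, prototype):
    """Assumed mapping: rescale cosine similarity from [-1, 1] to [0, 1]."""
    return 0.5 * (1.0 + cosine_similarity(hidden, prototype))


def ebbinghaus_retention(t, stability):
    """Classic exponential forgetting curve R = exp(-t / S)."""
    return math.exp(-t / stability)


def needs_review(hidden, prototype, threshold=0.8):
    """Schedule a review once the estimated recall drops below the threshold."""
    return recall_probability(hidden, prototype) < threshold


# A hidden state that still matches its stored prototype needs no review;
# one that has drifted to an orthogonal direction does.
fresh = needs_review([1.0, 0.0], [1.0, 0.0])
drifted = needs_review([1.0, 0.0], [0.0, 1.0])
```

A practical scheduler would track `recall_probability` over training steps and fit `stability` per class or task, which this sketch leaves out.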
Article 1206
Title@2025-06-19 (4): Black-Box Privacy Attacks on Shared Representations in Multitask Learning
Title: Black-Box Privacy Attacks on Shared Representations in Multitask Learning | Black-Box-Datenschutzangriffe auf geteilte Repräsentationen im Multitasking-Lernen | 在多任务学习中分享代表的黑人隐私攻击 2506.16460v1 |
Authors (6): John Abascal, Nicolás Berrios, Alina Oprea, Jonathan Ullman, Adam Smith, Matthew Jagielski
Multitask learning (MTL) has emerged as a powerful paradigm that leverages similarities among multiple learning tasks, each with insufficient samples to train a standalone model, to solve them simultaneously while minimizing data sharing across users and organizations. MTL typically accomplishes this goal by learning a shared representation that captures common structure among the tasks by embedding data from all tasks into a common feature space. Despite being designed to be the smallest unit of shared information necessary to effectively learn patterns across multiple tasks, these shared representations can inadvertently leak sensitive information about the particular tasks they were trained on. In this work, we investigate what information is revealed by the shared representations through the lens of inference attacks. Towards this, we propose a novel, black-box task-inference threat model where the adversary, given the embedding vectors produced by querying the shared representation on samples from a particular task, aims to determine whether that task was present when training the shared representation. We develop efficient, purely black-box attacks on machine learning models that exploit the dependencies between embeddings from the same task without requiring shadow models or labeled reference data. We evaluate our attacks across vision and language domains for multiple use cases of MTL and demonstrate that even with access only to fresh task samples rather than training data, a black-box adversary can successfully infer a task’s inclusion in training. To complement our experiments, we provide theoretical analysis of a simplified learning setting and show a strict separation between adversaries with training samples and fresh samples from the target task’s distribution.
Article 1207
Title@2025-06-19 (4): Joint Tensor-Train Parameterization for Efficient and Expressive Low-Rank Adaptation
Title: Joint Tensor-Train Parameterization for Efficient and Expressive Low-Rank Adaptation | Gemeinsame Tensor-Train-Parameterisierung für effiziente und Expressive Low-Rank-Anpassung | 高效和表现式低射速适应联合登机机-培训参数 2506.16456v1 |
Authors (5): Jun Qi, Chen-Yu Liu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Min-Hsiu Hsieh
Low-Rank Adaptation (LoRA) is widely recognized for its parameter-efficient fine-tuning of large-scale neural models. However, standard LoRA independently optimizes low-rank matrices, which inherently limits its expressivity and generalization capabilities. While classical tensor-train (TT) decomposition can be separately employed on individual LoRA matrices, this work demonstrates that the classical TT-based approach neither significantly improves parameter efficiency nor achieves substantial performance gains. This paper proposes TensorGuide, a novel tensor-train-guided adaptation framework to overcome these limitations. TensorGuide generates two correlated low-rank LoRA matrices through a unified TT structure driven by controlled Gaussian noise. The resulting joint TT representation inherently provides structured, low-rank adaptations, significantly enhancing expressivity, generalization, and parameter efficiency without increasing the number of trainable parameters. Theoretically, we justify these improvements through neural tangent kernel analyses, demonstrating superior optimization dynamics and enhanced generalization. Extensive experiments on quantum dot classification and GPT-2 fine-tuning benchmarks demonstrate that TensorGuide-based LoRA consistently outperforms standard LoRA and TT-LoRA, achieving improved accuracy and scalability with fewer parameters.
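As background for the comparison above, the standard LoRA update and its parameter count can be written out as follows. This covers only plain LoRA, not the TensorGuide construction; the dimensions $d$, $k$, the rank $r$, and the numeric example are assumptions chosen for illustration, and the usual scaling factor on $\Delta W$ is omitted for brevity.

```latex
W \;=\; W_0 + \Delta W, \qquad \Delta W \;=\; B A, \qquad
B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\#\text{params}(\Delta W \text{ via LoRA}) \;=\; r\,(d + k)
\quad \text{vs.} \quad
\#\text{params}(\text{full } \Delta W) \;=\; d\,k

d = k = 768,\; r = 8: \qquad 8 \cdot 1536 = 12{,}288 \quad \text{vs.} \quad 768^2 = 589{,}824 \quad (\approx 2.1\%)
```

Because $B$ and $A$ are separate trainable factors in this formulation, any coupling between them has to come from the optimizer alone, which is the independence the abstract refers to.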
Article 1208
Title@2025-06-19 (4): Consumer-friendly EEG-based Emotion Recognition System: A Multi-scale Convolutional Neural Network Approach
Title: Consumer-friendly EEG-based Emotion Recognition System: A Multi-scale Convolutional Neural Network Approach | Consumer-friendly EEG-based Emotion Recognition System: Ein multi-scale Convolutional Neural Network Ansatz | 以基于基于爱-爱-爱-爱-爱-爱-爱-情感承认系统:多规模革命神经网络方法 2506.16448v1 |
Authors (2): Tri Duc Ly, Gia H. Ngo
EEG is a non-invasive, safe, and low-risk method to record electrophysiological signals inside the brain. Especially with recent technology developments like dry electrodes, consumer-grade EEG devices, and rapid advances in machine learning, EEG is commonly used as a resource for automatic emotion recognition. With the aim to develop a deep learning model that can perform EEG-based emotion recognition in a real-life context, we propose a novel approach to utilize multi-scale convolutional neural networks to accomplish such tasks. By implementing feature extraction kernels with many ratio coefficients as well as a new type of kernel that learns key information from four separate areas of the brain, our model consistently outperforms the state-of-the-art TSception model in predicting valence, arousal, and dominance scores across many performance evaluation metrics.
Article 1209
Title@2025-06-19 (4): Leveraging Influence Functions for Resampling Data in Physics-Informed Neural Networks
Title: Leveraging Influence Functions for Resampling Data in Physics-Informed Neural Networks | Nutzung von Einflussfunktionen für die Resampling-Daten in physikinformierten Neuronalen Netzwerken | 利用物理内成形神经网络中恢复数据取样的利用影响功能 2506.16443v1 |
Authors (8): Jonas R. Naujoks, Aleksander Krasowski, Moritz Weckbecker, Galip Ümit Yolcu, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek, René P. Klausen
Physics-informed neural networks (PINNs) offer a powerful approach to solving partial differential equations (PDEs), which are ubiquitous in the quantitative sciences. Applied to both forward and inverse problems across various scientific domains, PINNs have recently emerged as a valuable tool in the field of scientific machine learning. A key aspect of their training is that the data – spatio-temporal points sampled from the PDE’s input domain – are readily available. Influence functions, a tool from the field of explainable AI (XAI), approximate the effect of individual training points on the model, enhancing interpretability. In the present work, we explore the application of influence function-based sampling approaches for the training data. Our results indicate that such targeted resampling based on data attribution methods has the potential to enhance prediction accuracy in physics-informed neural networks, demonstrating a practical application of an XAI method in PINN training.
Article 1210
Title@2025-06-19 (4): PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Title: PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | PerceptionLM: Open-Access-Daten und Modelle für ein detailliertes visuelles Verständnis | 感知LM:开放存取数据和详细视觉理解模型 2504.13180v2 |
Authors (29): Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi, Triantafyllos Afouras, Tushar Nagarajan, Muhammad Maaz, Yale Song, Tengyu Ma, Shuming Hu, Suyog Jain, Miguel Martin, Huiyu Wang, Hanoona Rasheed, Peize Sun, Po-Yao Huang, Daniel Bolya, Nikhila Ravi, Shashank Jain, Tammy Stark, Shane Moon, Babak Damavandi, Vivian Lee, Andrew Westbury, Salman Khan, Philipp Krähenbühl, Piotr Dollár, Lorenzo Torresani, Kristen Grauman, Christoph Feichtenhofer
Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design and training recipe. The research community has responded by using distillation from black-box models to label training data, achieving strong benchmark results, at the cost of measurable scientific progress. However, without knowing the details of the teacher model and its data sources, scientific progress remains difficult to measure. In this paper, we study building a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding. We analyze standard training pipelines without distillation from proprietary models and explore large-scale synthetic data to identify critical data gaps, particularly in detailed video understanding. To bridge these gaps, we release 2.8M human-labeled instances of fine-grained video question-answer pairs and spatio-temporally grounded video captions. Additionally, we introduce PLM-VideoBench, a suite for evaluating challenging video understanding tasks focusing on the ability to reason about “what”, “where”, “when”, and “how” of a video. We make our work fully reproducible by providing data, training recipes, code & models. https://github.com/facebookresearch/perception_models
Article 1211
Title@2025-06-19 (4): An efficient neuromorphic approach for collision avoidance combining Stack-CNN with event cameras
Title: An efficient neuromorphic approach for collision avoidance combining Stack-CNN with event cameras | Ein effizienter neuromorpher Ansatz zur Kollisionsvermeidung, der Stack-CNN mit Eventkameras kombiniert | 将Stack-CNN与事件摄像头相结合的一种高效的避免碰撞神经形态法 2506.16436v1 |
Authors (3): Antonio Giulio Coretti, Mattia Varile, Mario Edoardo Bertaina
Space debris poses a significant threat, driving research into active and passive mitigation strategies. This work presents an innovative collision avoidance system utilizing event-based cameras - a novel imaging technology well-suited for Space Situational Awareness (SSA) and Space Traffic Management (STM). The system, employing a Stack-CNN algorithm (previously used for meteor detection), analyzes real-time event-based camera data to detect faint moving objects. Testing on terrestrial data demonstrates the algorithm’s ability to enhance signal-to-noise ratio, offering a promising approach for on-board space imaging and improving STM/SSA operations.
Article 1212
Title@2025-06-19 (4): Agentic Personalisation of Cross-Channel Marketing Experiences
Title: Agentic Personalisation of Cross-Channel Marketing Experiences | Agentische Personalisierung von Cross-Channel-Marketing-Erfahrungen | 跨渠道营销经验的代理个性化 2506.16429v1 |
Authors (5): Sami Abboud, Eleanor Hanna, Olivier Jeunen, Vineesha Raheja, Schaun Wheeler
Consumer applications provide ample opportunities to surface and communicate various forms of content to users. From promotional campaigns for new features or subscriptions, to evergreen nudges for engagement, or personalised recommendations; across e-mails, push notifications, and in-app surfaces. The conventional approach to orchestration for communication relies heavily on labour-intensive manual marketer work, and inhibits effective personalisation of content, timing, frequency, and copy-writing. We formulate this task under a sequential decision-making framework, where we aim to optimise a modular decision-making policy that maximises incremental engagement for any funnel event. Our approach leverages a Difference-in-Differences design for Individual Treatment Effect estimation, and Thompson sampling to balance the explore-exploit trade-off. We present results from a multi-service application, where our methodology has resulted in significant increases to a variety of goal events across several product features, and is currently deployed across 150 million users.
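The explore-exploit component mentioned above can be illustrated with a textbook Beta-Bernoulli Thompson sampler. This is a simplified sketch under stated assumptions: rewards here are plain Bernoulli engagement events with fixed hypothetical rates, whereas the abstract describes rewards derived from a Difference-in-Differences treatment effect estimate, which is not modeled. The function name and the three example rates are invented for the illustration.

```python
import random


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling over a fixed set of message variants.

    true_rates[a] is the (hypothetical, unknown to the learner) engagement
    rate of variant a. Returns how often each variant was chosen.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior per variant
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible rate per variant from its posterior ...
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # ... and play the variant whose draw is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.05, 0.10, 0.30])
```

Early rounds spread traffic across all variants; as the posteriors sharpen, most traffic shifts to the variant with the highest engagement rate.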
Article 1213
Title@2025-06-19 (4): EFormer: An Effective Edge-based Transformer for Vehicle Routing Problems
Title: EFormer: An Effective Edge-based Transformer for Vehicle Routing Problems | EFormer: Ein effektiver Edge-basierter Transformer für Fahrzeugrouting-Probleme | Eformer:车辆运行问题的有效边缘变异器 2506.16428v1 |
Authors (6): Dian Meng, Zhiguang Cao, Yaoxin Wu, Yaqing Hou, Hongwei Ge, Qiang Zhang
Recent neural heuristics for the Vehicle Routing Problem (VRP) primarily rely on node coordinates as input, which may be less effective in practical scenarios where real cost metrics, such as edge-based distances, are more relevant. To address this limitation, we introduce EFormer, an Edge-based Transformer model that uses edges as the sole input for VRPs. Our approach employs a precoder module with a mixed-score attention mechanism to convert edge information into temporary node embeddings. We also present a parallel encoding strategy characterized by a graph encoder and a node encoder, each responsible for processing graph and node embeddings in distinct feature spaces, respectively. This design yields a more comprehensive representation of the global relationships among edges. In the decoding phase, parallel context embedding and multi-query integration are used to compute separate attention mechanisms over the two encoded embeddings, facilitating efficient path construction. We train EFormer using reinforcement learning in an autoregressive manner. Extensive experiments on the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) reveal that EFormer outperforms established baselines on synthetic datasets, including large-scale and diverse distributions. Moreover, EFormer demonstrates strong generalization on real-world instances from TSPLib and CVRPLib. These findings confirm the effectiveness of EFormer’s core design in solving VRPs.
Article 1214
Title@2025-06-19 (4): Quantifying artificial intelligence through algorithmic generalization
Title: Quantifying artificial intelligence through algorithmic generalization | Quantifizierung künstlicher Intelligenz durch algorithmische Verallgemeinerung | 通过算法一般化对人工智能进行量化 2411.05943v2 |
Authors (5): Takuya Ito, Murray Campbell, Lior Horesh, Tim Klinger, Parikshit Ram
The rapid development of artificial intelligence (AI) systems has created an urgent need for their scientific quantification. While their fluency across a variety of domains is impressive, AI systems fall short on tests requiring algorithmic reasoning – a glaring limitation given the necessity for interpretable and reliable technology. Despite a surge of reasoning benchmarks emerging from the academic community, no theoretical framework exists to quantify algorithmic reasoning in AI systems. Here, we adopt a framework from computational complexity theory to quantify algorithmic generalization using algebraic expressions: algebraic circuit complexity. Algebraic circuit complexity theory – the study of algebraic expressions as circuit models – is a natural framework to study the complexity of algorithmic computation. Algebraic circuit complexity enables the study of generalization by defining benchmarks in terms of the computational requirements to solve a problem. Moreover, algebraic circuits are generic mathematical objects; an arbitrarily large number of samples can be generated for a specified circuit, making it an ideal experimental sandbox for the data-hungry models that are used today. In this Perspective, we adopt tools from algebraic circuit complexity, apply them to formalize a science of algorithmic generalization, and address key challenges for its successful application to AI science.
Article 1215
Title@2025-06-19 (4): ALTA: Compiler-Based Analysis of Transformers
Title: ALTA: Compiler-Based Analysis of Transformers | ALTA: Compiler-basierte Analyse von Transformatoren | ALTA:以汇编者为基础对变形器的分析 2410.18077v2 |
Authors (6): Peter Shaw, James Cohan, Jacob Eisenstein, Kenton Lee, Jonathan Berant, Kristina Toutanova
We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights. ALTA is inspired by RASP, a language proposed by Weiss et al. (2021), and Tracr (Lindner et al., 2023), a compiler from RASP programs to Transformer weights. ALTA complements and extends this prior work, offering the ability to express loops and to compile programs to Universal Transformers, among other advantages. ALTA allows us to constructively show how Transformers can represent length-invariant algorithms for computing parity and addition, as well as a solution to the SCAN benchmark of compositional generalization tasks, without requiring intermediate scratchpad decoding steps. We also propose tools to analyze cases where the expressibility of an algorithm is established, but end-to-end training on a given training set fails to induce behavior consistent with the desired algorithm. To this end, we explore training from ALTA execution traces as a more fine-grained supervision signal. This enables additional experiments and theoretical analyses relating the learnability of various algorithms to data availability and modeling decisions, such as positional encodings. We make the ALTA framework – language specification, symbolic interpreter, and weight compiler – available to the community to enable further applications and insights.
nan
Article 1216
Title@2025-06-19 (4): Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models
Title: Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models | MoE Router optimieren: Design, Implementierung und Evaluation in Transformer-Modellen | 优化教育部优化路由器:变革型模型的设计、实施和评价 2506.16419v1 |
Authors (3): Daniel Fidel Harvey, George Weale, Berk Yilmaz
Mixture of Experts (MoE) architectures increase large language model scalability, yet their performance depends on the router module that moves tokens to specialized experts. Poor routing can lead to load imbalance and reduced accuracy. This project designed and implemented different router architectures within Transformer models to address these limitations. We experimented with six distinct router variants: Linear, Attention, Multi-Layer Perceptron (MLP), Hybrid, Hash, and our new MLP-Hadamard. We characterized these routers using BERT and the Qwen1.5-MoE model, looking at parameter efficiency, inference latency, routing entropy, and expert utilization patterns. Our evaluations showed distinct trade-offs: Linear routers offer speed, while MLP and Attention routers provide greater expressiveness. The MLP-Hadamard router shows a unique capability for structured, sparse routing. We successfully replaced and fine-tuned custom routers within the complex, quantized Qwen1.5-MoE model. This work provides a comparative analysis of MoE router designs and offers insights into optimizing their performance for efficient and effective large-scale model deployment.
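For readers unfamiliar with the simplest variant listed above, the following is a minimal sketch of a linear top-k router together with a routing-entropy measure. It is not the authors' implementation: the function names, the per-token interface, and the definition of routing entropy as the Shannon entropy of expert usage counts are assumptions for illustration.

```python
import math

def linear_router(token, weight, k=2):
    """Score experts with one linear map and keep the top-k.

    token:  list of d floats (one token embedding)
    weight: list of n_experts rows, each a list of d floats
    Returns (expert indices, gate weights that sum to 1).
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in weight]
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    # Softmax restricted to the selected experts (shifted for stability).
    peak = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - peak) for e in top]
    total = sum(exps)
    return top, [v / total for v in exps]

def routing_entropy(assignments, n_experts):
    """Shannon entropy (nats) of how often each expert is selected."""
    counts = [0] * n_experts
    for top in assignments:
        for e in top:
            counts[e] += 1
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

With this definition, low entropy means a few experts receive most tokens (load imbalance), while the maximum, log(n_experts), means perfectly uniform usage.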
nan
Article 1217
Title@2025-06-19 (4): On Continuous Monitoring of Risk Violations under Unknown Shift
Title: On Continuous Monitoring of Risk Violations under Unknown Shift | Kontinuierliche Überwachung von Risikoverletzungen unter unbekannter Verschiebung | 关于根据未知轮移持续监测违反风险情况 2506.16416v1 |
Authors (4): Alexander Timans, Rajeev Verma, Eric Nalisnick, Christian A. Naesseth
Machine learning systems deployed in the real world must operate under dynamic and often unpredictable distribution shifts. This challenges the validity of statistical safety assurances on the system’s risk established beforehand. Common risk control frameworks rely on fixed assumptions and lack mechanisms to continuously monitor deployment reliability. In this work, we propose a general framework for the real-time monitoring of risk violations in evolving data streams. Leveraging the ‘testing by betting’ paradigm, we propose a sequential hypothesis testing procedure to detect violations of bounded risks associated with the model’s decision-making mechanism, while ensuring control on the false alarm rate. Our method operates under minimal assumptions on the nature of encountered shifts, rendering it broadly applicable. We illustrate the effectiveness of our approach by monitoring risks in outlier detection and set prediction under a variety of shifts.
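The 'testing by betting' idea can be sketched as a wealth process that grows when observed losses exceed the tolerated risk level. The sketch below is a generic construction, not the paper's procedure: it assumes losses bounded in [0, 1], a null hypothesis that the conditional mean loss never exceeds `alpha`, and a fixed bet size, whereas adaptive betting strategies are common in practice. Under that null the wealth is a non-negative supermartingale, so by Ville's inequality the probability that it ever reaches 1/delta is at most delta.

```python
def betting_monitor(losses, alpha, delta=0.05, bet=0.5):
    """Return the first time step at which a risk violation is flagged.

    losses: iterable of per-step losses in [0, 1]
    alpha:  tolerated risk level (null hypothesis: mean loss <= alpha)
    delta:  bound on the false alarm probability
    bet:    fixed bet size (an assumption; must keep wealth non-negative)
    """
    if not 0.0 <= bet <= 1.0 / alpha:
        raise ValueError("bet must lie in [0, 1/alpha]")
    wealth = 1.0
    for t, loss in enumerate(losses, start=1):
        # Wealth grows when the loss exceeds alpha and shrinks otherwise.
        wealth *= 1.0 + bet * (loss - alpha)
        if wealth >= 1.0 / delta:
            return t  # alarm raised at step t
    return None  # no violation detected on this stream
```

Because the guarantee holds at every time step simultaneously, the monitor can be checked continuously without a pre-specified stopping time.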
nan
Article 1218
Title@2025-06-19 (4): When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
Title: When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework | Wann funktioniert Trennen und Erobern für den langen Kontext LLM? Ein Lärmzersetzungsrahmen | 何时分化和征服工作为长期LLM服务? 噪音分解框架 2506.16411v1 |
Authors (8): Zhen Xu, Shang Zhu, Jue Wang, Junlin Wang, Ben Athiwaratkun, Chi Wang, James Zou, Ce Zhang
We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that groups the failure modes of long context tasks into three categories: cross-chunk dependence (task noise), confusion that grows with context size (model noise), and the imperfect integration of partial results (aggregator noise). Under this view, we analyze when it is effective to use multi-agent chunking, i.e., dividing a lengthy sequence into smaller chunks and aggregating the processed results of each chunk. Our experiments on tasks such as retrieval, question answering, and summarization confirm both the theoretical analysis and the conditions that favor multi-agent chunking. By exploring superlinear model noise growth with input length, we also explain why, for large inputs, a weaker model configured with chunk-based processing can surpass a more advanced model like GPT-4o applied in a single shot. Overall, we present a principled understanding framework and our results highlight a direct pathway to handling long contexts in LLMs with carefully managed chunking and aggregator strategies.
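The chunk-then-aggregate pattern analyzed in this abstract can be sketched in a few lines. The `worker` and `aggregator` callables below are hypothetical stand-ins for LLM calls, and the toy retrieval task is chosen so that it has no cross-chunk dependence (no task noise in the abstract's terminology).

```python
def chunked_process(tokens, chunk_size, worker, aggregator):
    """Split a long input into chunks, process each, then merge the results.

    worker:     callable applied to every chunk (stand-in for a model call)
    aggregator: callable that combines the list of per-chunk results
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    return aggregator([worker(chunk) for chunk in chunks])

# Toy retrieval: find the single "needle" token in a long sequence.
def find_needle(chunk):
    return [tok for tok in chunk if tok == "needle"]

def merge_hits(results):
    return [hit for hits in results for hit in hits]

document = ["hay"] * 50 + ["needle"] + ["hay"] * 49
hits = chunked_process(document, 16, find_needle, merge_hits)
```

Tasks whose answer depends on information spread across chunk boundaries would need a more capable aggregator, which is where the aggregator-noise term in the framework becomes relevant.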
nan
Article 1219
Title@2025-06-19 (4): CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning
Title: CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning | CLOUD: Ein skalierbares und physikinformiertes Grundmodell für das Kristalldarstellungslernen | CLOUD: 水晶代表制学习的可缩缩化和物理成形基础模型 2506.17345v1 |
Authors (3): Changwen Xu, Shang Zhu, Venkatasubramanian Viswanathan
The prediction of crystal properties is essential for understanding structure-property relationships and accelerating the discovery of functional materials. However, conventional approaches relying on experimental measurements or density functional theory (DFT) calculations are often resource-intensive, limiting their scalability. Machine learning (ML) models offer a promising alternative by learning complex structure-property relationships from data, enabling faster predictions. Yet, existing ML models often rely on labeled data, adopt representations that poorly capture essential structural characteristics, and lack integration with physical principles, factors that limit their generalizability and interpretability. Here, we introduce CLOUD (Crystal Language mOdel for Unified and Differentiable materials modeling), a transformer-based framework trained on a novel Symmetry-Consistent Ordered Parameter Encoding (SCOPE) that encodes crystal symmetry, Wyckoff positions, and composition in a compact, coordinate-free string representation. Pre-trained on over six million crystal structures, CLOUD is fine-tuned on multiple downstream tasks and achieves competitive performance in predicting a wide range of material properties, demonstrating strong scaling performance. Furthermore, as proof of concept of differentiable materials modeling, CLOUD is applied to predict the phonon internal energy and heat capacity, which integrates the Debye model to preserve thermodynamic consistency. The CLOUD-DEBYE framework enforces thermodynamic consistency and enables temperature-dependent property prediction without requiring additional data. These results demonstrate the potential of CLOUD as a scalable and physics-informed foundation model for crystalline materials, unifying symmetry-consistent representations with physically grounded learning for property prediction and materials discovery.
nan
Article 1220
Title@2025-06-19 (4): Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression
Title: Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression | Breaking the Compression Ceiling: Datenfreie Pipeline für ultraeffiziente Delta-Kompression | 打破压缩上限:超有效三角洲压缩无数据管道 2505.13563v2 |
Authors (8): Xiaohui Wang, Peng Ye, Chenyu Huang, Shenghe Zheng, Bo Zhang, Lei Bai, Wanli Ouyang, Tao Chen
With the rise of the fine-tuned–pretrained paradigm, storing numerous fine-tuned models for multi-tasking creates significant storage overhead. Delta compression alleviates this by storing only the pretrained model and the highly compressed delta weights (the differences between fine-tuned and pretrained model weights). However, existing methods fail to maintain both high compression and performance, and often rely on data. To address these challenges, we propose UltraDelta, the first data-free delta compression pipeline that achieves both ultra-high compression and strong performance. UltraDelta is designed to minimize redundancy, maximize information, and stabilize performance across inter-layer, intra-layer, and global dimensions, using three key components: (1) Variance-Based Mixed Sparsity Allocation assigns sparsity based on variance, giving lower sparsity to high-variance layers to preserve inter-layer information. (2) Distribution-Aware Compression applies uniform quantization and then groups parameters by value, followed by group-wise pruning, to better preserve intra-layer distribution. (3) Trace-Norm-Guided Rescaling uses the trace norm of delta weights to estimate a global rescaling factor, improving model stability under higher compression. Extensive experiments across (a) large language models (fine-tuned on LLaMA-2 7B and 13B) with up to 133x, (b) general NLP models (RoBERTa-base, T5-base) with up to 800x, (c) vision models (ViT-B/32, ViT-L/14) with up to 400x, and (d) multi-modal models (BEiT-3) with 40x compression ratio, demonstrate that UltraDelta consistently outperforms existing methods, especially under ultra-high compression.
nan
Article 1221
Title@2025-06-19 (4): Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Title: Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights | Drag-and-Drop LLMs: Nullschnelle Prompt-zu-Gewichte | 拖放LMs: 零热快速到重 2506.16406v1 |
Authors (14): Zhiyuan Liang, Dongwen Tang, Yuhao Zhou, Xuanlei Zhao, Mingjia Shi, Wangbo Zhao, Zekai Li, Peihao Wang, Konstantin Schürholt, Damian Borth, Michael M. Bronstein, Yang You, Zhangyang Wang, Kai Wang
Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices. Once trained on a diverse collection of prompt-checkpoint pairs, DnD produces task-specific parameters in seconds, yielding i) up to 12,000x lower overhead than full fine-tuning, ii) average gains up to 30% in performance over the strongest training LoRAs on unseen common-sense reasoning, math, coding, and multimodal benchmarks, and iii) robust cross-domain generalization despite never seeing the target data or labels. Our results demonstrate that prompt-conditioned parameter generation is a viable alternative to gradient-based adaptation for rapidly specializing LLMs. Our project is available at https://jerryliang24.github.io/DnD.
nan
Article 1222
Title@2025-06-19 (4): Generating Directed Graphs with Dual Attention and Asymmetric Encoding
Title: Generating Directed Graphs with Dual Attention and Asymmetric Encoding | Generieren von gerichteten Graphen mit doppelter Aufmerksamkeit und asymmetrischer Kodierung | 产生具有双重注意和对称编码的定向图形 2506.16404v1 |
Authors (5): Alba Carballo-Castro, Manuel Madeira, Yiming Qin, Dorina Thanou, Pascal Frossard
Directed graphs naturally model systems with asymmetric, ordered relationships, essential to applications in biology, transportation, social networks, and visual understanding. Generating such graphs enables tasks such as simulation, data augmentation and novel instance discovery; however, directed graph generation remains underexplored. We identify two key factors limiting progress in this direction: first, modeling edge directionality introduces a substantially larger dependency space, making the underlying distribution harder to learn; second, the absence of standardized benchmarks hinders rigorous evaluation. Addressing the former requires more expressive models that are sensitive to directional topologies. We propose Directo, the first generative model for directed graphs built upon the discrete flow matching framework. Our approach combines: (i) principled positional encodings tailored to asymmetric pairwise relations, (ii) a dual-attention mechanism capturing both incoming and outgoing dependencies, and (iii) a robust, discrete generative framework. To support evaluation, we introduce a benchmark suite covering synthetic and real-world datasets. It shows that our method performs strongly across diverse settings and even competes with specialized models for particular classes, such as directed acyclic graphs. Our results highlight the effectiveness and generality of our approach, establishing a solid foundation for future research in directed graph generation.
nan
Article 1223
Title@2025-06-19 (4): IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
Title: IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks | IS-Bench: Bewertung der interaktiven Sicherheit von VLM-getriebenen Körpermitteln bei täglichen Haushaltsaufgaben | IS-Bench:评估每日家务任务中VLM-Driven 充装代理人的互动安全 2506.16402v1 |
Authors (8): Xiaoya Lu, Zeren Chen, Xuhao Hu, Yijin Zhou, Weichen Zhang, Dongrui Liu, Lu Sheng, Jing Shao
Flawed planning from VLM-driven embodied agents poses significant safety hazards, hindering their deployment in real-world household tasks. However, existing static, non-interactive evaluation paradigms fail to adequately assess risks within these interactive environments, since they cannot simulate dynamic risks that emerge from an agent’s actions and rely on unreliable post-hoc evaluations that ignore unsafe intermediate steps. To bridge this critical gap, we propose evaluating an agent’s interactive safety: its ability to perceive emergent risks and execute mitigation steps in the correct procedural order. We thus present IS-Bench, the first multi-modal benchmark designed for interactive safety, featuring 161 challenging scenarios with 388 unique safety risks instantiated in a high-fidelity simulator. Crucially, it facilitates a novel process-oriented evaluation that verifies whether risk mitigation actions are performed before/after specific risk-prone steps. Extensive experiments on leading VLMs, including the GPT-4o and Gemini-2.5 series, reveal that current agents lack interactive safety awareness, and that while safety-aware Chain-of-Thought can improve performance, it often compromises task completion. By highlighting these critical limitations, IS-Bench provides a foundation for developing safer and more reliable embodied AI systems.
nan
Article 1224
Title@2025-06-19 (4): GoalLadder: Incremental Goal Discovery with Vision-Language Models
Title: GoalLadder: Incremental Goal Discovery with Vision-Language Models | Zielleiter: Incremental Goal Discovery mit Vision-Language-Modellen | 目标增编:利用视觉语言模型发现递增目标 2506.16396v1 |
Authors (2): Alexey Zakharov, Shimon Whiteson
Natural language can offer a concise and human-interpretable means of specifying reinforcement learning (RL) tasks. The ability to extract rewards from a language instruction can enable the development of robotic systems that can learn from human guidance; however, it remains a challenging problem, especially in visual environments. Existing approaches that employ large, pretrained language models either rely on non-visual environment representations, require prohibitively large amounts of feedback, or generate noisy, ill-shaped reward functions. In this paper, we propose a novel method, GoalLadder, that leverages vision-language models (VLMs) to train RL agents from a single language instruction in visual environments. GoalLadder works by incrementally discovering states that bring the agent closer to completing a task specified in natural language. To do so, it queries a VLM to identify states that represent an improvement in the agent’s task progress and to rank them using pairwise comparisons. Unlike prior work, GoalLadder does not trust the VLM’s feedback completely; instead, it uses it to rank potential goal states using an Elo-based rating system, thus reducing the detrimental effects of noisy VLM feedback. Over the course of training, the agent is tasked with minimising the distance to the top-ranked goal in a learned embedding space, which is trained on unlabelled visual data. This key feature allows us to bypass the need for abundant and accurate feedback typically required to train a well-shaped reward function. We demonstrate that GoalLadder outperforms existing related methods on classic control and robotic manipulation environments, with an average final success rate of ~95% compared to only ~45% for the best competitor.
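To make the rating step concrete, here is the standard Elo update applied to one pairwise comparison between two candidate goal states. This is the textbook Elo formula, not necessarily the exact variant used in GoalLadder; the K-factor of 32 and the scale of 400 are conventional defaults assumed here.

```python
def elo_update(rating_a, rating_b, a_wins, k=32.0):
    """Update two ratings after a single pairwise comparison.

    a_wins: True if state A was judged closer to the goal than state B.
    Returns the new (rating_a, rating_b).
    """
    # Probability that A wins, as predicted by the current ratings.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta
```

Because each comparison moves the ratings only by a bounded amount, a single wrong judgment from the VLM shifts the ranking slightly rather than overturning it, which matches the abstract's motivation of tolerating noisy feedback.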
nan
Article 1225
Title@2025-06-19 (4): State-Space Kolmogorov Arnold Networks for Interpretable Nonlinear System Identification
Title: State-Space Kolmogorov Arnold Networks for Interpretable Nonlinear System Identification | State-Space Kolmogorov Arnold Networks for Interpretable Nonlinear System Identification | 国家空间局Kolmogorov Arnold 解释非线性非线性系统识别网络 2506.16392v1 |
Authors (4): Gonçalo Granjal Cruz, Balazs Renczes, Mark C Runacres, Jan Decuyper
While accurate, black-box system identification models lack interpretability of the underlying system dynamics. This paper proposes State-Space Kolmogorov-Arnold Networks (SS-KAN) to address this challenge by integrating Kolmogorov-Arnold Networks within a state-space framework. The proposed model is validated on two benchmark systems: the Silverbox and the Wiener-Hammerstein benchmarks. Results show that SS-KAN provides enhanced interpretability due to sparsity-promoting regularization and the direct visualization of its learned univariate functions, which reveal system nonlinearities. This interpretability comes at the cost of accuracy when compared to state-of-the-art black-box models. SS-KAN is thus a promising approach for interpretable nonlinear system identification, balancing accuracy and interpretability of nonlinear system dynamics.
nan
Article 1226
Title@2025-06-19 (4): Patch-based learning of adaptive Total Variation parameter maps for blind image denoising
Title: Patch-based learning of adaptive Total Variation parameter maps for blind image denoising | Patchbasiertes Lernen von adaptiven Total Variation-Parameterkarten für Blind Image Denoising | 盲人图像除污的全变化参数图 2503.16010v2 |
Authors (4): Claudio Fantasia, Luca Calatroni, Xavier Descombes, Rim Rekik
We consider a patch-based learning approach defined in terms of neural networks to estimate spatially adaptive regularisation parameter maps for image denoising with weighted Total Variation (TV), and test it in situations where the noise distribution is unknown. As an example, we consider situations where noise could be either Gaussian or Poisson and perform preliminary model selection by a standard binary classification network. Then, we define a patch-based approach where at each image pixel an optimal weighting between TV regularisation and the corresponding data fidelity is learned in a supervised way using reference natural image patches upon optimisation of SSIM and in a sliding window fashion. Extensive numerical results are reported for both noise models, showing significant improvement w.r.t. results obtained by means of optimal scalar regularisation.
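For orientation, a weighted TV denoising problem of the kind described here is commonly written as follows. The notation is an assumption for illustration (noisy image f, reconstruction u, per-pixel weight map λ, discrete gradient ∇) and may differ from the paper's; the two data-fidelity terms are the standard choices for Gaussian and Poisson noise.

```latex
\hat{u} \in \operatorname*{arg\,min}_{u} \;
  \sum_{i} \lambda_i \, \bigl\| (\nabla u)_i \bigr\|_2 \; + \; D(u; f),
\qquad
D_{\mathrm{Gauss}}(u; f) = \tfrac{1}{2} \, \| u - f \|_2^2,
\qquad
D_{\mathrm{Poisson}}(u; f) = \sum_{i} \bigl( u_i - f_i \log u_i \bigr).
```

A scalar regularisation parameter corresponds to the special case in which λ_i takes the same value at every pixel i, which is the baseline the abstract compares against.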
nan
Article 1227
Title@2025-06-19 (4): CLIP-MG: Guiding Semantic Attention with Skeletal Pose Features and RGB Data for Micro-Gesture Recognition on the iMiGUE Dataset
Title: CLIP-MG: Guiding Semantic Attention with Skeletal Pose Features and RGB Data for Micro-Gesture Recognition on the iMiGUE Dataset | CLIP-MG: Leitende Semantische Aufmerksamkeit mit skeletalen Pose-Funktionen und RGB-Daten zur Micro-Gesture-Erkennung auf dem iMiGUE-Datensatz | CLIP-MG:在iMIGUE数据集中以骨骨骼藻类特征和RGB数据指导语义注意,用于识别微气识别的RGB数据 2506.16385v1 |
Authors (3): Santosh Patapati, Trisanth Srinivasan, Amith Adiraju
Micro-gesture recognition is a challenging task in affective computing due to the subtle, involuntary nature of the gestures and their low movement amplitude. In this paper, we introduce a Pose-Guided Semantics-Aware CLIP-based architecture, or CLIP for Micro-Gesture recognition (CLIP-MG), a modified CLIP model tailored for micro-gesture classification on the iMiGUE dataset. CLIP-MG integrates human pose (skeleton) information into the CLIP-based recognition pipeline through pose-guided semantic query generation and a gated multi-modal fusion mechanism. The proposed model achieves a Top-1 accuracy of 61.82%. These results demonstrate both the potential of our approach and the remaining difficulty in fully adapting vision-language models like CLIP for micro-gesture recognition.
nan
Article 1228
Title@2025-06-19 (4): Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval
Title: Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval | Hopfield-Fenchel-Young Networks: Ein einheitliches Rahmenwerk für assoziative Memory Retrieval | Hopfield-Fenchel-青年网络:联合记忆检索统一框架 2411.08590v2 |
Authors (4): Saul Santos, Vlad Niculae, Daniel McNamee, André F. T. Martins
Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified framework, Hopfield-Fenchel-Young networks, which generalizes these models to a broader family of energy functions. Our energies are formulated as the difference between two Fenchel-Young losses: one, parameterized by a generalized entropy, defines the Hopfield scoring mechanism, while the other applies a post-transformation to the Hopfield output. By utilizing Tsallis and norm entropies, we derive end-to-end differentiable update rules that enable sparse transformations, uncovering new connections between loss margins, sparsity, and exact retrieval of single memory patterns. We further extend this framework to structured Hopfield networks using the SparseMAP transformation, allowing the retrieval of pattern associations rather than a single pattern. Our framework unifies and extends traditional and modern Hopfield networks and provides an energy minimization perspective for widely used post-transformations like $\ell_2$-normalization and layer normalization, all through suitable choices of Fenchel-Young losses and by using convex analysis as a building block. Finally, we validate our Hopfield-Fenchel-Young networks on diverse memory recall tasks, including free and sequential recall. Experiments on simulated data, image retrieval, multiple instance learning, and text rationalization demonstrate the effectiveness of our approach.
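The link between sparse transformations and exact retrieval can be seen in a small sketch of one Hopfield update step, q_new = X^T p(beta * X q), with two choices of p. The softmax version is the familiar modern Hopfield update; the sparsemax version corresponds to the Tsallis entropy with alpha = 2. The function names and the toy memory are assumptions for illustration, and the code does not cover the paper's full framework (no post-transformation, no SparseMAP).

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex."""
    ordered = sorted(scores, reverse=True)
    cumulative, tau = 0.0, 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        # The support is the longest prefix satisfying this condition.
        if 1.0 + j * value > cumulative:
            tau = (cumulative - 1.0) / j
    return [max(s - tau, 0.0) for s in scores]

def hopfield_update(patterns, query, beta, transform):
    """One retrieval step: weight stored patterns by similarity to the query."""
    scores = [beta * sum(x * q for x, q in zip(p, query)) for p in patterns]
    weights = transform(scores)
    dim = len(query)
    return [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(dim)]

memory = [[1.0, 0.0], [0.0, 1.0]]
noisy_query = [0.9, 0.1]
dense = hopfield_update(memory, noisy_query, beta=50.0, transform=softmax)
sparse = hopfield_update(memory, noisy_query, beta=50.0, transform=sparsemax)
```

In this toy case softmax leaves a tiny but non-zero weight on the second pattern, whereas sparsemax assigns it exactly zero once the score margin is large enough, so a single stored pattern is returned exactly.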
nan
Article 1229
Title@2025-06-19 (4): Celo: Training Versatile Learned Optimizers on a Compute Diet
Title: Celo: Training Versatile Learned Optimizers on a Compute Diet | Celo: Training vielseitiger gelernter Optimierer auf einer Computer-Diät | Celo:就计算膳食培训有说服力的优化剂 2501.12670v2 |
Authors (4): Abhinav Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky
Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned update rules that enable faster, hyperparameter-free training of neural networks. A critical element of practically useful learned optimizers, which can be used off the shelf after meta-training, is strong meta-generalization: the ability to apply the optimizers to new tasks. The recent state-of-the-art learned optimizer VeLO (Metz et al., 2022) requires a large number of highly diverse meta-training tasks along with massive computational resources (4000 TPU months) to achieve meta-generalization. This makes further improvements to such learned optimizers impractical. In this work, we identify several key elements in learned optimizer architectures and meta-training procedures that can lead to strong meta-generalization. We also propose evaluation metrics to reliably assess quantitative performance of an optimizer at scale on a set of evaluation tasks. Our proposed approach, Celo, makes a significant leap in improving the meta-generalization performance of learned optimizers and also outperforms tuned state-of-the-art optimizers on a diverse set of out-of-distribution tasks, despite being meta-trained for just 24 GPU hours.
nan
Article 1230
Title@2025-06-19 (4): Classification of Cattle Behavior and Detection of Heat (Estrus) using Sensor Data
Title: Classification of Cattle Behavior and Detection of Heat (Estrus) using Sensor Data | Klassifizierung des Rinderverhaltens und der Erkennung von Wärme (Estrus) anhand von Sensordaten | 使用传感器数据对牛行为进行分类和检测热量(Estrus) 2506.16380v1 |
Authors (6): Druva Dhakshinamoorthy, Avikshit Jha, Sabyasachi Majumdar, Devdulal Ghosh, Ranjita Chakraborty, Hena Ray
This paper presents a novel system for monitoring cattle behavior and detecting estrus (heat) periods using sensor data and machine learning. We designed and deployed a low-cost Bluetooth-based neck collar equipped with accelerometer and gyroscope sensors to capture real-time behavioral data from real cows, which was synced to the cloud. A labeled dataset was created using synchronized CCTV footage to annotate behaviors such as feeding, rumination, lying, and others. We evaluated multiple machine learning models – Support Vector Machines (SVM), Random Forests (RF), and Convolutional Neural Networks (CNN) – for behavior classification. Additionally, we implemented a Long Short-Term Memory (LSTM) model for estrus detection using behavioral patterns and anomaly detection. Our system achieved over 93% behavior classification accuracy and 96% estrus detection accuracy on a limited test set. The approach offers a scalable and accessible solution for precision livestock monitoring, especially in resource-constrained environments.
nan
Article 1231
Title@2025-06-19 (4): FFINO: Factorized Fourier Improved Neural Operator for Modeling Multiphase Flow in Underground Hydrogen Storage
Title: FFINO: Factorized Fourier Improved Neural Operator for Modeling Multiphase Flow in Underground Hydrogen Storage | FFINO: Factorized Fourier Verbesserter neuraler Operator für die Modellierung von Mehrphasenströmungen im unterirdischen Wasserstoffspeicher | FFINO:用于模拟地下氢储存多阶段流动模型的四倍改进神经操作员 2506.17344v1 |
Authors (2): Tao Wang, Hewei Tang
Underground hydrogen storage (UHS) is a promising energy storage option for the current energy transition to a low-carbon economy. Fast modeling of hydrogen plume migration and pressure field evolution is crucial for UHS field management. In this study, we propose a new neural operator architecture, FFINO, as a fast surrogate model for multiphase flow problems in UHS. We parameterize experimental relative permeability curves reported in the literature and include them as key uncertainty parameters in the FFINO model. We also compare the FFINO model with the state-of-the-art FMIONet model through a comprehensive combination of metrics. Our new FFINO model has 38.1% fewer trainable parameters, 17.6% less training time, and 12% less GPU memory cost compared to FMIONet. The FFINO model also achieves a 9.8% accuracy improvement in predicting hydrogen plume in focused areas, and 18% higher RMSE in predicting pressure buildup. The inference time of the trained FFINO model is 7850 times faster than a numerical simulator, which makes it a competent substitute for numerical simulations of UHS problems with superior time efficiency.
nan
Article 1232
Title@2025-06-19 (4): WebXAII: an open-source web framework to study human-XAI interaction
Title: WebXAII: an open-source web framework to study human-XAI interaction | WebXAII: ein Open-Source-Web-Framework zur Untersuchung der Mensch-XAI-Interaktion | WebXAII:研究人类-XAI相互作用的公开来源网络框架 2506.14777v2 |
Authors (4): Jules Leguy, Pierre-Antoine Jean, Felipe Torres Figueroa, Sébastien Harispe
This article introduces WebXAII, an open-source web framework designed to facilitate research on human interaction with eXplainable Artificial Intelligence (XAI) systems. The field of XAI is rapidly expanding, driven by the growing societal implications of the widespread adoption of AI (and in particular machine learning) across diverse applications. Researchers who study the interaction between humans and XAI techniques typically develop ad hoc interfaces in order to conduct their studies. These interfaces are usually not shared alongside the results of the studies, which limits their reusability and the reproducibility of experiments. In response, we design and implement WebXAII, a web-based platform that can embody full experimental protocols, meaning that it can present all aspects of the experiment to human participants and record their responses. The experimental protocols are translated into a composite architecture of generic views and modules, which offers considerable flexibility. The architecture is defined in a structured configuration file, so that protocols can be implemented with minimal programming skills. We demonstrate that WebXAII can effectively embody relevant protocols by reproducing the protocol of a state-of-the-art study from the literature.
nan
Article 1233
Title@2025-06-19 (4): Variance-Based Defense Against Blended Backdoor Attacks
Title: Variance-Based Defense Against Blended Backdoor Attacks | Varianzbasierte Verteidigung gegen gemischte Hintertürangriffe | 以差异为基础防范混合的幕后袭击 2506.01444v2 |
Authors (3): Sujeevan Aseervatham, Achraf Kerzazi, Younès Bennani
Backdoor attacks represent a subtle yet effective class of cyberattacks targeting AI models, primarily due to their stealthy nature. The model behaves normally on clean data but exhibits malicious behavior only when the attacker embeds a specific trigger into the input. This attack is performed during the training phase, where the adversary corrupts a small subset of the training data by embedding a pattern and modifying the labels to a chosen target. The objective is to make the model associate the pattern with the target label while maintaining normal performance on unaltered data. Several defense mechanisms have been proposed to sanitize training datasets. However, these methods often rely on the availability of a clean dataset to compute statistical anomalies, which may not always be feasible in real-world scenarios where datasets can be unavailable or compromised. To address this limitation, we propose a novel defense method that trains a model on the given dataset, detects poisoned classes, and extracts the critical part of the attack trigger before identifying the poisoned instances. This approach enhances explainability by explicitly revealing the harmful part of the trigger. The effectiveness of our method is demonstrated through experimental evaluations on well-known image datasets and comparative analysis against three state-of-the-art algorithms: SCAn, ABL, and AGPD.
nan
Article 1234
Title@2025-06-19 (4): Data-Driven Policy Mapping for Safe RL-based Energy Management Systems
Title: Data-Driven Policy Mapping for Safe RL-based Energy Management Systems | Datengestützte Politikmappings für sichere RL-basierte Energiemanagementsysteme | 以RL为基础的安全能源管理系统数据驱动政策绘图 2506.16352v1 |
Authors (3): Theo Zangato, Aomar Osmani, Pegah Alizadeh
Increasing global energy demand and renewable integration complexity have placed buildings at the center of sustainable energy management. We present a three-step reinforcement learning (RL)-based Building Energy Management System (BEMS) that combines clustering, forecasting, and constrained policy learning to address scalability, adaptability, and safety challenges. First, we cluster non-shiftable load profiles to identify common consumption patterns, enabling policy generalization and transfer without retraining for each new building. Next, we integrate an LSTM-based forecasting module to anticipate future states, improving the RL agents’ responsiveness to dynamic conditions. Lastly, domain-informed action masking ensures safe exploration and operation, preventing harmful decisions. Evaluated on real-world data, our approach reduces operating costs by up to 15% for certain building types, maintains stable environmental performance, and quickly classifies and optimizes new buildings with limited data. It also adapts to stochastic tariff changes without retraining. Overall, this framework delivers scalable, robust, and cost-effective building energy management.
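Action masking, the third step above, can be illustrated with a minimal sketch in which unsafe actions are removed before the greedy choice is made. The three-action battery setting, the mask rule, and the function names are hypothetical and serve only to show the mechanism; the paper's actual action space and domain rules are not given in the abstract.

```python
def masked_greedy_action(q_values, allowed):
    """Pick the highest-valued action among those the mask permits."""
    candidates = [a for a, ok in enumerate(allowed) if ok]
    if not candidates:
        raise ValueError("the mask must allow at least one action")
    return max(candidates, key=lambda a: q_values[a])

def battery_mask(state_of_charge, capacity):
    """Hypothetical domain rule for actions 0=discharge, 1=idle, 2=charge."""
    can_discharge = state_of_charge > 0.0
    can_charge = state_of_charge < capacity
    return [can_discharge, True, can_charge]

q_values = [5.0, 1.0, 3.0]  # the agent would prefer to discharge
action_when_empty = masked_greedy_action(q_values, battery_mask(0.0, 10.0))
action_when_half = masked_greedy_action(q_values, battery_mask(5.0, 10.0))
```

The same mask can be applied during exploration, for example by sampling only from the permitted actions, so that unsafe actions are never executed even early in training.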
nan
Article 1235
Title@2025-06-19 (4): Adaptive Experimental Design for Policy Learning
Title: Adaptive Experimental Design for Policy Learning | Adaptives Experimentelles Design für politisches Lernen | 政策学习适应性实验设计 2401.03756v4 |
Authors (4): Masahiro Kato, Kyohei Okumura, Takuya Ishihara, Toru Kitagawa
This study investigates the contextual best arm identification (BAI) problem, aiming to design an adaptive experiment to identify the best treatment arm conditioned on contextual information (covariates). We consider a decision-maker who assigns treatment arms to experimental units during an experiment and recommends the estimated best treatment arm based on the contexts at the end of the experiment. The decision-maker uses a policy for recommendations, which is a function that provides the estimated best treatment arm given the contexts. In our evaluation, we focus on the worst-case expected regret, a relative measure between the expected outcomes of an optimal policy and our proposed policy. We derive a lower bound for the expected simple regret and then propose a strategy called Adaptive Sampling-Policy Learning (PLAS). We prove that this strategy is minimax rate-optimal in the sense that its leading factor in the regret upper bound matches the lower bound as the number of experimental units increases.
nan
Article 1236
Title@2025-06-19 (4): Watermarking Autoregressive Image Generation
Title: Watermarking Autoregressive Image Generation | Autoregressive Bildgenerierung mit Wasserzeichen | 自动递减图像生成 2506.16349v1 |
Authors (5): Nikola Jovanović, Ismail Labiad, Tomáš Souček, Martin Vechev, Pierre Fernandez
Watermarking the outputs of generative models has emerged as a promising approach for tracking their provenance. Despite significant interest in autoregressive image generation models and their potential for misuse, no prior work has attempted to watermark their outputs at the token level. In this work, we present the first such approach by adapting language model watermarking techniques to this setting. We identify a key challenge: the lack of reverse cycle-consistency (RCC), wherein re-tokenizing generated image tokens significantly alters the token sequence, effectively erasing the watermark. To address this and to make our method robust to common image transformations, neural compression, and removal attacks, we introduce (i) a custom tokenizer-detokenizer finetuning procedure that improves RCC, and (ii) a complementary watermark synchronization layer. As our experiments demonstrate, our approach enables reliable and robust watermark detection with theoretically grounded p-values.
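For readers unfamiliar with how language-model watermarks yield p-values, the sketch below shows the generic green-list test from text watermarking, applied to a sequence of token ids. It is an assumption-laden illustration, not the paper's detector: the hash-based `is_green` rule, the `key` argument, and the default `gamma` are all hypothetical.

```python
import hashlib
from math import comb


def is_green(prev_token, token, key, gamma=0.5):
    """Pseudo-randomly mark a token as green, seeded by the key and previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def binomial_p_value(n_tokens, n_green, gamma=0.5):
    """P(X >= n_green) for X ~ Binomial(n_tokens, gamma), the no-watermark null."""
    return sum(
        comb(n_tokens, k) * gamma**k * (1.0 - gamma) ** (n_tokens - k)
        for k in range(n_green, n_tokens + 1)
    )


def detect(tokens, key, gamma=0.5):
    """Score every (previous, current) token pair and return the p-value."""
    pairs = list(zip(tokens, tokens[1:]))
    n_green = sum(is_green(p, t, key, gamma) for p, t in pairs)
    return binomial_p_value(len(pairs), n_green, gamma)
```

A small p-value suggests the sequence contains more green tokens than chance would produce. The test assumes the scored pairs are independent, which repeated pairs violate, and it only works if re-tokenizing the image recovers most of the generated tokens, which is the cycle-consistency issue the abstract describes.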
nan
Article 1237
Title@2025-06-19 (4): Quantum-Informed Contrastive Learning with Dynamic Mixup Augmentation for Class-Imbalanced Expert Systems
Title: Quantum-Informed Contrastive Learning with Dynamic Mixup Augmentation for Class-Imbalanced Expert Systems | Quantum-informiertes Kontrastives Lernen mit dynamischer Mixup Augmentation für klassengerechte Expertensysteme | 以动态混合增量促进分类平衡专家系统 2506.13987v2 |
Authors (3): Md Abrar Jahin, Adiba Abid, M. F. Mridha
Expert systems often operate in domains characterized by class-imbalanced tabular data, where detecting rare but critical instances is essential for safety and reliability. While conventional approaches, such as cost-sensitive learning, oversampling, and graph neural networks, provide partial solutions, they suffer from drawbacks like overfitting, label noise, and poor generalization in low-density regions. To address these challenges, we propose QCL-MixNet, a novel Quantum-Informed Contrastive Learning framework augmented with k-nearest neighbor (kNN) guided dynamic mixup for robust classification under imbalance. QCL-MixNet integrates three core innovations: (i) a Quantum Entanglement-inspired layer that models complex feature interactions through sinusoidal transformations and gated attention, (ii) a sample-aware mixup strategy that adaptively interpolates feature representations of semantically similar instances to enhance minority class representation, and (iii) a hybrid loss function that unifies focal reweighting, supervised contrastive learning, triplet margin loss, and variance regularization to improve both intra-class compactness and inter-class separability. Extensive experiments on 18 real-world imbalanced datasets (binary and multi-class) demonstrate that QCL-MixNet consistently outperforms 20 state-of-the-art machine learning, deep learning, and GNN-based baselines in macro-F1 and recall, often by substantial margins. Ablation studies further validate the critical role of each architectural component. Our results establish QCL-MixNet as a new benchmark for tabular imbalance handling in expert systems. Theoretical analyses reinforce its expressiveness, generalization, and optimization robustness.
nan
Article 1238
Title@2025-06-19 (4): Sustainable Greenhouse Microclimate Modeling: A Comparative Analysis of Recurrent and Graph Neural Networks
Title: Sustainable Greenhouse Microclimate Modeling: A Comparative Analysis of Recurrent and Graph Neural Networks | Sustainable Greenhouse Microclimate Modeling: Eine vergleichende Analyse von recurrenten und Graphen-Neuralen Netzwerken | 可持续的温室微观气候建模:经常性和图形神经网络比较分析 2502.17371v4 |
Authors (5): Emiliano Seri, Marcello Petitta, Chryssoula Papaioannou, Nikolaos Katsoulas, Cristina Cornaro
The integration of photovoltaic (PV) systems into greenhouses not only optimizes land use but also enhances sustainable agricultural practices by enabling dual benefits of food production and renewable energy generation. However, accurate prediction of internal environmental conditions is crucial to ensure optimal crop growth while maximizing energy production. This study introduces a novel application of Spatio-Temporal Graph Neural Networks (STGNNs) to greenhouse microclimate modeling, comparing their performance with traditional Recurrent Neural Networks (RNNs). While RNNs excel at temporal pattern recognition, they cannot explicitly model the directional relationships between environmental variables. Our STGNN approach addresses this limitation by representing these relationships as directed graphs, enabling the model to capture both environmental dependencies and their directionality. We benchmark RNNs against directed STGNNs on two 15-min-resolution datasets from Volos (Greece): a six-variable 2020 installation and a more complex eight-variable greenhouse monitored in autumn 2024. In the simpler 2020 case the RNN attains near-perfect accuracy, outperforming the STGNN. When additional drivers are available in 2024, the STGNN overtakes the RNN ($R^{2}=0.905$ vs $0.740$), demonstrating that explicitly modelling directional dependencies becomes critical as interaction complexity grows. These findings indicate when graph-based models are warranted and provide a stepping-stone toward digital twins that jointly optimise crop yield and PV power in agrivoltaic greenhouses.
nan
Article 1239
Title@2025-06-19 (4): Feedback-driven recurrent quantum neural network universality
Title: Feedback-driven recurrent quantum neural network universality | feedbackgesteuerte rezidivierende quantenneuronale Netzwerk-Universalität | 由反馈驱动的经常性量子神经网络普遍性 2506.16332v1 |
Authors (3): Lukas Gonon, Rodrigo Martínez-Peña, Juan-Pablo Ortega
Quantum reservoir computing uses the dynamics of quantum systems to process temporal data, making it particularly well-suited for learning with noisy intermediate-scale quantum devices. Early experimental proposals, such as the restarting and rewinding protocols, relied on repeating previous steps of the quantum map to avoid backaction. However, this approach compromises real-time processing and increases computational overhead. Recent developments have introduced alternative protocols that address these limitations. These include online, mid-circuit measurement, and feedback techniques, which enable real-time computation while preserving the input history. Among these, the feedback protocol stands out for its ability to process temporal information with comparatively fewer components. Despite this potential advantage, the theoretical foundations of feedback-based quantum reservoir computing remain underdeveloped, particularly with regard to the universality and the approximation capabilities of this approach. This paper addresses this issue by presenting a recurrent quantum neural network architecture that extends a class of existing feedforward models to a dynamic, feedback-driven reservoir setting. We provide theoretical guarantees for variational recurrent quantum neural networks, including approximation bounds and universality results. Notably, our analysis demonstrates that the model is universal with linear readouts, making it both powerful and experimentally accessible. These results pave the way for practical and theoretically grounded quantum reservoir computing with real-time processing capabilities.
nan
Article 1240
Title@2025-06-19 (4): Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners
Title: Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners | Beitrag anregen und auch Parameter lernen: Föderiertes Lernen mit strategischen Dateninhabern | 激励贡献和学习参数:与战略数据所有者进行联邦学习 2505.12010v2 |
Authors (5): Drashthi Doshi, Aditya Vema Reddy Kesari, Swaprava Nath, Avishek Ghosh, Suhas S Kowshik
Classical federated learning (FL) assumes that the clients have a limited amount of noisy data with which they voluntarily participate and contribute towards learning a global, more accurate model in a principled manner. The learning happens in a distributed fashion without sharing the data with the center. However, these methods do not consider the incentive of an agent for participating and contributing to the process, given that data collection and running a distributed algorithm are costly for the clients. The question of rationality of contribution has been asked recently in the literature and some results exist that consider this problem. This paper addresses the question of simultaneous parameter learning and incentivizing contribution, which distinguishes it from the extant literature. Our first mechanism incentivizes each client to contribute to the FL process at a Nash equilibrium and simultaneously learn the model parameters. However, this equilibrium outcome can be far from the optimum, where clients contribute with their full data and the algorithm learns the optimal parameters. We propose a second mechanism with monetary transfers that is budget balanced and enables the full data contribution along with optimal parameter learning. Large scale experiments with real (federated) datasets (CIFAR-10, FeMNIST, and Twitter) show that these algorithms converge quite fast in practice, yield good welfare guarantees, and deliver better model performance for all agents.
nan
Article 1241
Title@2025-06-19 (4): SimBank: from Simulation to Solution in Prescriptive Process Monitoring
Title: SimBank: from Simulation to Solution in Prescriptive Process Monitoring | SimBank: Von der Simulation zur Lösung in der Prescriptive Process Monitoring | SimBank:从模拟到规范程序监测的解决方案 2506.14772v2 |
Authors (4): Jakob De Moor, Hans Weytjens, Johannes De Smedt, Jochen De Weerdt
Prescriptive Process Monitoring (PresPM) is an emerging area within Process Mining, focused on optimizing processes through real-time interventions for effective decision-making. PresPM holds significant promise for organizations seeking enhanced operational performance. However, the current literature faces two key limitations: a lack of extensive comparisons between techniques and insufficient evaluation approaches. To address these gaps, we introduce SimBank: a simulator designed for accurate benchmarking of PresPM methods. Modeled after a bank’s loan application process, SimBank enables extensive comparisons of both online and offline PresPM methods. It incorporates a variety of intervention optimization problems with differing levels of complexity and supports experiments on key causal machine learning challenges, such as assessing a method’s robustness to confounding in data. SimBank additionally offers a comprehensive evaluation capability: for each test case, it can generate the true outcome under each intervention action, which is not possible using recorded datasets. The simulator incorporates parallel activities and loops, drawing from common logs to generate cases that closely resemble real-life process instances. Our proof of concept demonstrates SimBank’s benchmarking capabilities through experiments with various PresPM methods across different interventions, highlighting its value as a publicly available simulator for advancing research and practice in PresPM.
nan
Article 1242
Title@2025-06-19 (4): Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation
Title: Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation | Effiziente und flexible Neural-Netzwerk-Schulung durch schichtweise Feedback-Propagation | 通过多层反馈传播进行有效和灵活的神经网络培训 2308.12053v3 |
Authors (7): Leander Weber, Jim Berend, Moritz Weckbecker, Alexander Binder, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
Gradient-based optimization has been a cornerstone of machine learning that enabled the vast advances of Artificial Intelligence (AI) development over the past decades. However, this type of optimization requires differentiation, and with recent evidence of the benefits of non-differentiable (e.g. neuromorphic) architectures over classical models w.r.t. efficiency, such constraints can become limiting in the future. We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors that utilizes methods from the domain of explainability to decompose a reward to individual neurons based on their respective contributions. Leveraging these neuron-wise rewards, our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones. While having comparable computational complexity to gradient descent, LFP does not require gradient computation and generates sparse and thereby memory- and energy-efficient parameter updates and models. We establish the convergence of LFP theoretically and empirically, demonstrating its effectiveness on various models and datasets. Via two applications - neural network pruning and the approximation-free training of Spiking Neural Networks (SNNs) - we demonstrate that LFP combines increased efficiency in terms of computation and representation with flexibility w.r.t. choice of model architecture and objective function. Our code is available at https://github.com/leanderweber/layerwise-feedback-propagation.
nan
Article 1243
Title@2025-06-19 (4): Bayesian Optimization over Bounded Domains with the Beta Product Kernel
Title: Bayesian Optimization over Bounded Domains with the Beta Product Kernel | Bayesian Optimierung über Bounded Domains mit dem Beta Product Kernel | 利用贝塔产品中枢在Beta Product Kernel的封闭区上空实现最佳贝叶斯优化 2506.16316v1 |
Authors (4): Huy Hoang Nguyen, Han Zhou, Matthew B. Blaschko, Aleksei Tiulpin
Bayesian optimization with Gaussian processes (GP) is commonly used to optimize black-box functions. The Matérn and the Radial Basis Function (RBF) covariance functions are used frequently, but they do not make any assumptions about the domain of the function, which may limit their applicability in bounded domains. To address this limitation, we introduce the Beta kernel, a non-stationary kernel induced by a product of Beta distribution density functions. Such a formulation allows our kernel to naturally model functions on bounded domains. We present statistical evidence supporting the hypothesis that the kernel exhibits an exponential eigendecay rate, based on empirical analyses of its spectral properties across different settings. Our experimental results demonstrate the robustness of the Beta kernel in modeling functions with optima located near the faces or vertices of the unit hypercube. The experiments show that our kernel consistently outperforms a wide range of kernels, including the well-known Matérn and RBF, in different problems, including synthetic function optimization and the compression of vision and language models.
nan
Article 1244
Title@2025-06-19 (4): Signatures to help interpretability of anomalies
Title: Signatures to help interpretability of anomalies | Unterschriften zur Interpretation von Anomalien | 签名有助于异常点的解释 2506.16314v1 |
Authors (11): Emmanuel Gangler, Emille E. O. Ishida, Matwey V. Kornilov, Vladimir Korolev, Anastasia Lavrukhina, Konstantin Malanchev, Maria V. Pruzhinskaya, Etienne Russeil, Timofey Semenikhin, Sreevarsha Sreejith, Alina A. Volnova
Machine learning is often viewed as a black box when it comes to understanding its output, be it a decision or a score. Automatic anomaly detection is no exception to this rule, and quite often the astronomer is left to independently analyze the data in order to understand why a given event is tagged as an anomaly. We introduce here the idea of an anomaly signature, whose aim is to help the interpretability of anomalies by highlighting which features contributed to the decision.
nan
Article 1245
Title@2025-06-19 (4): Improved Exploration in GFlownets via Enhanced Epistemic Neural Networks
Title: Improved Exploration in GFlownets via Enhanced Epistemic Neural Networks | Verbesserte Exploration in GFlownets durch verstärkte epistemische Neuralnetze | 通过增强的神电网网改进对绿地网的探索 2506.16313v1 |
Authors (2): Sajan Muhammad, Salem Lahlou
Efficiently identifying the right trajectories for training remains an open problem in GFlowNets. To address this, it is essential to prioritize exploration in regions of the state space where the reward distribution has not been sufficiently learned. This calls for uncertainty-driven exploration; in other words, the agent should be aware of what it does not know. This attribute can be measured by joint predictions, which are particularly important for combinatorial and sequential decision problems. In this research, we integrate epistemic neural networks (ENN) with the conventional architecture of GFlowNets to enable more efficient joint predictions and better uncertainty quantification, thereby improving exploration and the identification of optimal trajectories. Our proposed algorithm, ENN-GFN-Enhanced, is compared to the baseline method in GFlowNets and evaluated in grid environments and structured sequence generation in various settings, demonstrating both its efficacy and efficiency.
nan
Article 1246
Title@2025-06-19 (4): Neural Guided Diffusion Bridges
Title: Neural Guided Diffusion Bridges | Neural geführte Diffusionsbrücken | 神经向导扩散桥 2502.11909v3 |
Authors (3): Gefan Yang, Frank van der Meulen, Stefan Sommer
We propose a novel method for simulating conditioned diffusion processes (diffusion bridges) in Euclidean spaces. By training a neural network to approximate bridge dynamics, our approach eliminates the need for computationally intensive Markov Chain Monte Carlo (MCMC) methods or score modeling. Compared to existing methods, it offers greater robustness across various diffusion specifications and conditioning scenarios. This applies in particular to rare events and multimodal distributions, which pose challenges for score-learning- and MCMC-based approaches. We introduce a flexible variational family, partially specified by a neural network, for approximating the diffusion bridge path measure. Once trained, it enables efficient sampling of independent bridges at a cost comparable to sampling the unconditioned (forward) process.
nan
Article 1247
Title@2025-06-19 (4): Optimizing Multilingual Text-To-Speech with Accents & Emotions
Title: Optimizing Multilingual Text-To-Speech with Accents & Emotions | Multilinguale Text-To-Speech-Optimierung mit Akzenten & Emotionen | 利用 Acents 和情感优化多语种文字语音语音 2506.16310v1 |
Authors (5): Pranav Pawar, Akshansh Dwivedi, Jenish Boricha, Himanshu Gohil, Aditya Dubey
Although state-of-the-art text-to-speech (TTS) systems achieve high naturalness in monolingual environments, synthesizing speech with correct multilingual accents (especially for Indic languages) and context-relevant emotions still poses difficulty owing to cultural nuance discrepancies in current frameworks. This paper introduces a new TTS architecture that integrates accent modelling and transliteration preservation with multi-scale emotion modelling, tuned in particular for Hindi and Indian English accents. Our approach extends the Parler-TTS model by integrating a language-specific phoneme alignment hybrid encoder-decoder architecture and culture-sensitive emotion embedding layers trained on native speaker corpora, as well as incorporating dynamic accent code switching with residual vector quantization. Quantitative tests demonstrate a 23.7% improvement in accent accuracy (Word Error Rate reduction from 15.4% to 11.8%) and 85.3% emotion recognition accuracy from native listeners, surpassing METTS and VECL-TTS baselines. The novelty of the system is that it can mix code in real time - generating statements such as “Namaste, let’s talk about
nan
Article 1248
Title@2025-06-19 (4): Adaptive Social Metaverse Streaming based on Federated Multi-Agent Deep Reinforcement Learning
Title: Adaptive Social Metaverse Streaming based on Federated Multi-Agent Deep Reinforcement Learning | Adaptives soziales Metaverse-Streaming auf Basis von Federated Multi-Agent Deep Reinforcement Learning | 基于联邦多要求深层强化学习的适应性社会超常流 2506.17342v1 |
Authors (4): Zijian Long, Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik
The social metaverse is a growing digital ecosystem that blends virtual and physical worlds. It allows users to interact socially, work, shop, and enjoy entertainment. However, privacy remains a major challenge, as immersive interactions require continuous collection of biometric and behavioral data. At the same time, ensuring high-quality, low-latency streaming is difficult due to the demands of real-time interaction, immersive rendering, and bandwidth optimization. To address these issues, we propose ASMS (Adaptive Social Metaverse Streaming), a novel streaming system based on Federated Multi-Agent Proximal Policy Optimization (F-MAPPO). ASMS leverages F-MAPPO, which integrates federated learning (FL) and deep reinforcement learning (DRL) to dynamically adjust streaming bit rates while preserving user privacy. Experimental results show that ASMS improves user experience by at least 14% compared to existing streaming methods across various network conditions. Therefore, ASMS enhances the social metaverse experience by providing seamless and immersive streaming, even in dynamic and resource-constrained networks, while ensuring that sensitive user data remains on local devices.
nan
Article 1249
Title@2025-06-19 (4): AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
Title: AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | AlignDistil: Token-Level-Sprachmodell Alignment als Adaptive Policy Destillation | Aligndistil: 作为适应性政策蒸馏的调整级语言模式模型对齐 2503.02832v2 |
Authors (6): Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu
In modern large language models (LLMs), LLM alignment is of crucial importance and is typically achieved through methods such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). However, in most existing methods for LLM alignment, all tokens in the response are optimized using a sparse, response-level reward or preference annotation. Ignoring token-level rewards may erroneously punish high-quality tokens or encourage low-quality tokens, resulting in suboptimal performance and slow convergence speed. To address this issue, we propose AlignDistil, an RLHF-equivalent distillation method for token-level reward optimization. Specifically, we introduce the reward learned by DPO into the RLHF objective and theoretically prove the equivalence between this objective and a token-level distillation process, where the teacher distribution linearly combines the logits from the DPO model and a reference model. On this basis, we further bridge the accuracy gap between the reward from the DPO model and the pure reward model, by building a contrastive DPO reward with a normal and a reverse DPO model. Moreover, to avoid under- and over-optimization on different tokens, we design a token-adaptive logit extrapolation mechanism to construct an appropriate teacher distribution for each token. Experimental results demonstrate the superiority of our AlignDistil over existing methods and showcase fast convergence due to its token-level distributional reward optimization.
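The idea of a teacher distribution built from a linear combination of logits can be sketched as follows. The exact combination used by AlignDistil is not given in the abstract, so the form `ref + weight * (dpo - ref)` and the parameter name `weight` are assumptions made purely for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_distribution(dpo_logits, ref_logits, weight):
    """Hypothetical teacher: move from the reference logits toward the DPO logits.

    weight = 0 reproduces the reference model, weight = 1 the DPO model,
    and weight > 1 extrapolates beyond the DPO model.
    """
    combined = [r + weight * (d - r) for d, r in zip(dpo_logits, ref_logits)]
    return softmax(combined)


# Illustrative next-token logits over a three-token vocabulary.
dpo = [2.0, 0.0, -1.0]
ref = [1.0, 0.5, 0.0]
print(teacher_distribution(dpo, ref, weight=1.5))
```

Under this assumed form, choosing a different `weight` per token position corresponds to the token-adaptive extrapolation the abstract mentions: a larger weight pushes harder on tokens where the two models disagree.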
nan
Article 1250
Title@2025-06-19 (4): The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units
Title: The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units | Die Condition Number als Scale-Invariant Proxy für die Informationskodierung in neuralen Einheiten | 作为神经单位信息编码缩放- 变量代理工具的条件编号 2506.16289v1 |
Authors (1): Oswaldo Ludwig
This paper explores the relationship between the condition number of a neural network’s weight tensor and the extent of information encoded by the associated processing unit, viewed through the lens of information theory. We argue that a high condition number, though not sufficient for effective knowledge encoding, may indicate that the unit has learned to selectively amplify and compress information. We formalize this intuition, particularly for linear units with Gaussian inputs, linking the condition number and the transformation’s log-volume scaling factor to the characteristics of the output entropy and the geometric properties of the learned transformation. Our analysis demonstrates that for a fixed weight norm, a concentrated distribution of singular values (high condition number) corresponds to reduced overall information transfer, indicating a specialized and efficient encoding strategy. Furthermore, we present a practical case study where these principles are applied to guide selective fine-tuning of a multimodal Large Language Model, aiming to mitigate catastrophic forgetting during cross-modal adaptation. Unlike many existing catastrophic forgetting mitigation methods that rely on access to pre-training statistics, which are often unavailable, our selective fine-tuning approach offers a way to bypass this common requirement.
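The fixed-norm claim can be checked numerically with a small sketch. It relies on a standard fact: for a square invertible linear map W applied to a Gaussian input, the differential entropy changes by log|det W|, which is the sum of the logs of the singular values. The two spectra below are illustrative assumptions, not values from the paper.

```python
import math


def condition_number(singular_values):
    """Ratio of the largest to the smallest singular value."""
    return max(singular_values) / min(singular_values)


def log_volume(singular_values):
    """log|det W| = sum of log singular values (entropy shift for Gaussian input)."""
    return sum(math.log(s) for s in singular_values)


def frobenius_norm(singular_values):
    """The Frobenius norm of W equals the l2 norm of its singular values."""
    return math.sqrt(sum(s * s for s in singular_values))


# Two hypothetical 4x4 units with the same Frobenius norm (2.0).
flat = [1.0, 1.0, 1.0, 1.0]
small = math.sqrt(0.13)
peaked = [1.9, small, small, small]

for name, sv in (("flat", flat), ("peaked", peaked)):
    print(name, frobenius_norm(sv), condition_number(sv), log_volume(sv))
```

At equal norm, the peaked spectrum has the larger condition number and the smaller log-volume, so less differential entropy passes through the unit. This follows from the AM-GM inequality: for a fixed sum of squares, the product of the singular values is largest when they are all equal.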
nan
Article 1251
Title@2025-06-19 (4): Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
Title: Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective | Next-Token Vorhersage sollte Ambiguität-Sensitiv sein: Eine Meta-Learning-Perspektive | 下肯预测应该是对模糊度敏感度的:一种元学习的视角 2506.16288v1 |
Authors (7): Leo Gagnon, Eric Elmoznino, Sarthak Mittal, Tom Marty, Tejas Kasetty, Dhanya Sridhar, Guillaume Lajoie
The rapid adaptation ability of auto-regressive foundation models is often attributed to the diversity of their pre-training data. This is because, from a Bayesian standpoint, minimizing prediction error in such settings requires integrating over all plausible latent hypotheses consistent with observations. While this behavior is desirable in principle, it often proves too ambitious in practice: under high ambiguity, the number of plausible latent alternatives makes Bayes-optimal prediction computationally intractable. Cognitive science has long recognized this limitation, suggesting that under such conditions, heuristics or information-seeking strategies are preferable to exhaustive inference. Translating this insight to next-token prediction, we hypothesize that low- and high-ambiguity predictions pose different computational demands, making ambiguity-agnostic next-token prediction a detrimental inductive bias. To test this, we introduce MetaHMM, a synthetic sequence meta-learning benchmark with rich compositional structure and a tractable Bayesian oracle. We show that Transformers indeed struggle with high-ambiguity predictions across model sizes. Motivated by cognitive theories, we propose a method to convert pre-trained models into Monte Carlo predictors that decouple task inference from token prediction. Preliminary results show substantial gains in ambiguous contexts through improved capacity allocation and test-time scalable inference, though challenges remain.
nan
Article 1252
Title@2025-06-19 (4): LLM-Guided Indoor Navigation with Multimodal Map Understanding
Title: LLM-Guided Indoor Navigation with Multimodal Map Understanding | LLM-geführte Indoor-Navigation mit multimodalem Kartenverständnis | 具有多式地图理解的LLM-引导式室内导航 2503.11702v4 |
Authors (5): Alberto Coffrini, Paolo Barsocchi, Francesco Furfari, Antonino Crivello, Alessio Ferrari
Indoor navigation presents unique challenges due to complex layouts and the unavailability of GNSS signals. Existing solutions often struggle with contextual adaptation, and typically require dedicated hardware. In this work, we explore the potential of a Large Language Model (LLM), i.e., ChatGPT, to generate natural, context-aware navigation instructions from indoor map images. We design and evaluate test cases across different real-world environments, analyzing the effectiveness of LLMs in interpreting spatial layouts, handling user constraints, and planning efficient routes. Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 86.59% correct indications and a maximum of 97.14%. The proposed system achieves high accuracy and reasoning performance. These results have key implications for AI-driven navigation and assistive technologies.
nan
Article 1253
Title@2025-06-19 (4): Random feature approximation for general spectral methods
Title: Random feature approximation for general spectral methods | Random Feature Approximation für allgemeine Spektralmethoden | 普通光谱方法的随机随机特征近似 2506.16283v1 |
Authors (2): Mike Nguyen, Nicole Mücke
Random feature approximation is arguably one of the most widely used techniques for kernel methods in large-scale learning algorithms. In this work, we analyze the generalization properties of random feature methods, extending previous results for Tikhonov regularization to a broad class of spectral regularization techniques. This includes not only explicit methods but also implicit schemes such as gradient descent and accelerated algorithms like the Heavy-Ball and Nesterov method. Through this framework, we enable a theoretical analysis of neural networks and neural operators through the lens of the Neural Tangent Kernel (NTK) approach trained via gradient descent. For our estimators we obtain optimal learning rates over regularity classes (even for classes that are not included in the reproducing kernel Hilbert space), which are defined through appropriate source conditions. This improves or completes previous results obtained in related settings for specific kernel algorithms.
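As background, the classic random Fourier feature construction for the RBF kernel gives a concrete instance of random feature approximation. This is the textbook construction, not the paper's general spectral framework. The function names, the lengthscale, and the feature count are illustrative assumptions.

```python
import math
import random


def sample_rff(dim, n_features, lengthscale, rng):
    """Draw frequencies from the RBF spectral density N(0, I / lengthscale^2)."""
    freqs = [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(n_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return freqs, offsets


def rff_map(x, freqs, offsets):
    """Map x to features z(x) such that z(x) . z(y) approximates k(x, y)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [
        scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
        for w, b in zip(freqs, offsets)
    ]


def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale**2))


rng = random.Random(0)
freqs, offsets = sample_rff(dim=2, n_features=2000, lengthscale=1.0, rng=rng)
x, y = [0.2, -0.1], [0.5, 0.4]
approx = sum(a * b for a, b in zip(rff_map(x, freqs, offsets), rff_map(y, freqs, offsets)))
print(approx, rbf_kernel(x, y, 1.0))
```

The approximation error shrinks roughly like one over the square root of the number of features. Any linear method run on the features (ridge regression, gradient descent, and so on) then approximates the corresponding kernel method at lower cost.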
nan
Article 1254
Title@2025-06-19 (4): Harnessing omnipresent oscillator networks as computational resource
Title: Harnessing omnipresent oscillator networks as computational resource | Nutzung allgegenwärtiger Oszillatornetzwerke als rechnerische Ressource | 将无所不在的振动器网络作为计算资源 2502.04818v3 |
Authors (3): Thomas Geert de Jong, Hirofumi Notsu, Kohei Nakajima
Nature is pervaded with oscillatory dynamics. In networks of coupled oscillators, patterns can arise when the system synchronizes to an external input. Hence, these networks provide processing and memory of input. We present a universal framework for harnessing oscillator networks as a computational resource. This computing framework is introduced by the ubiquitous model for phase-locking, the Kuramoto model. We force the Kuramoto model by a nonlinear target-system, then after substituting the target-system with a trained feedback-loop it emulates the target-system. Our results are two-fold. Firstly, the trained network inherits performance properties of the Kuramoto model, where all-to-all coupling is performed in linear time with respect to the number of nodes and parameters for synchronization are abundant. The latter implies that the network is generically successful since the system learns via synchronization. Secondly, the learning capabilities of the oscillator network, which describe a type of collective intelligence, can be explained using the Kuramoto model’s order parameter. In summary, this work provides the foundation for utilizing nature’s oscillator networks as a new class of information processing systems.
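The linear-time all-to-all coupling and the order parameter can be made concrete with the standard mean-field form of the Kuramoto model. The sketch below covers only the unforced model with a forward-Euler step; the forcing, the trained feedback loop, and the step size are outside its scope, and the function names are illustrative.

```python
import cmath
import math


def order_parameter(phases):
    """Return (r, psi) with r * exp(i * psi) = mean of exp(i * theta_j)."""
    z = sum(cmath.exp(1j * th) for th in phases) / len(phases)
    return abs(z), cmath.phase(z)


def kuramoto_step(phases, frequencies, coupling, dt):
    """One Euler step of d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i).

    The mean-field identity (K / N) * sum_j sin(theta_j - theta_i)
    = K * r * sin(psi - theta_i) turns the all-to-all sum into O(N) work.
    """
    r, psi = order_parameter(phases)
    return [
        th + dt * (w + coupling * r * math.sin(psi - th))
        for th, w in zip(phases, frequencies)
    ]


# Identical oscillators with positive coupling drift toward synchrony.
phases = [0.0, 1.0, 2.5]
for _ in range(2000):
    phases = kuramoto_step(phases, [0.0, 0.0, 0.0], coupling=1.0, dt=0.01)
print(order_parameter(phases)[0])
```

The magnitude `r` lies between 0 (incoherent phases) and 1 (full synchrony), which is why it serves as a compact summary of how well the network has locked to its input.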
Article 1255
Title@2025-06-19 (4): The Exploration of Error Bounds in Classification with Noisy Labels
Title: The Exploration of Error Bounds in Classification with Noisy Labels | Die Erforschung von Fehlergrenzen in der Klassifizierung mit Noisy-Labels | 探索有噪音标签的分类误差 2501.15163v2 |
Authors (4): Haixia Liu, Boxiao Li, Can Yang, Yang Wang
Numerous studies have shown that label noise can lead to poor generalization performance, negatively affecting classification accuracy. Therefore, understanding the effectiveness of classifiers trained using deep neural networks in the presence of noisy labels is of considerable practical significance. In this paper, we focus on the error bounds of excess risks for classification problems with noisy labels within deep learning frameworks. We derive error bounds for the excess risk, decomposing it into statistical error and approximation error. To handle statistical dependencies (e.g., mixing sequences), we employ an independent block construction to bound the error, leveraging techniques for dependent processes. For the approximation error, we extend these theoretical results to the vector-valued setting, where the output space consists of $K$-dimensional unit vectors. Finally, under the low-dimensional manifold hypothesis, we further refine the approximation error to mitigate the impact of high-dimensional input spaces.
Article 1256
Title@2025-06-19 (4): Serving Large Language Models on Huawei CloudMatrix384
Title: Serving Large Language Models on Huawei CloudMatrix384 | Große Sprachmodelle auf Huawei CloudMatrix384 | 瓦威云马特列克384 2506.12708v3 |
Authors (46): Pengfei Zuo, Huimin Lin, Junbo Deng, Nan Zou, Xingkun Yang, Yingyu Diao, Weifeng Gao, Ke Xu, Zhangyu Chen, Shirui Lu, Zhao Qiu, Peiyang Li, Xianyu Chang, Zhengzhong Yu, Fangzheng Miao, Jia Zheng, Ying Li, Yuan Feng, Bei Wang, Zaijian Zong, Mosong Zhou, Wenli Zhou, Houjiang Chen, Xingyu Liao, Yipeng Li, Wenxiao Zhang, Ping Zhu, Yinggang Wang, Chuanjie Xiao, Depeng Liang, Dong Cao, Juncheng Liu, Yongqiang Yang, Xiaolong Bai, Yi Li, Huaguo Xie, Huatao Wu, Zhibin Yu, Lv Chen, Hu Liu, Yujun Ding, Haipei Zhu, Jing Xia, Yi Xiong, Zhou Yu, Heng Liao
The rapid evolution of large language models (LLMs), driven by growing parameter scales, adoption of mixture-of-experts (MoE) architectures, and expanding context lengths, imposes unprecedented demands on AI infrastructure. Traditional AI clusters face limitations in compute intensity, memory bandwidth, inter-chip communication, and latency, compounded by variable workloads and strict service-level objectives. Addressing these issues requires fundamentally redesigned hardware-software integration. This paper introduces Huawei CloudMatrix, a next-generation AI datacenter architecture, realized in the production-grade CloudMatrix384 supernode. It integrates 384 Ascend 910 NPUs and 192 Kunpeng CPUs interconnected via an ultra-high-bandwidth Unified Bus (UB) network, enabling direct all-to-all communication and dynamic pooling of resources. These features optimize performance for communication-intensive operations, such as large-scale MoE expert parallelism and distributed key-value cache access. To fully leverage CloudMatrix384, we propose CloudMatrix-Infer, an advanced LLM serving solution incorporating three core innovations: a peer-to-peer serving architecture that independently scales prefill, decode, and caching; a large-scale expert parallelism strategy supporting EP320 via efficient UB-based token dispatch; and hardware-aware optimizations including specialized operators, microbatch-based pipelining, and INT8 quantization. Evaluation with the DeepSeek-R1 model shows CloudMatrix-Infer achieves state-of-the-art efficiency: prefill throughput of 6,688 tokens/s per NPU and decode throughput of 1,943 tokens/s per NPU (<50 ms TPOT). It effectively balances throughput and latency, sustaining 538 tokens/s per NPU even under stringent 15 ms latency constraints, while INT8 quantization maintains model accuracy across benchmarks.
Article 1257
Title@2025-06-19 (4): Optimal Online Bookmaking for Any Number of Outcomes
Title: Optimal Online Bookmaking for Any Number of Outcomes | Optimale Online Bookmaking für jede Anzahl von Ergebnissen | 任意数量结果的优化在线账簿制作 2506.16253v1 |
Authors (2): Hadar Tal, Oron Sabag
We study the Online Bookmaking problem, where a bookmaker dynamically updates betting odds on the possible outcomes of an event. In each betting round, the bookmaker can adjust the odds based on the cumulative betting behavior of gamblers, aiming to maximize profit while mitigating potential loss. We show that for any event and any number of betting rounds, in a worst-case setting over all possible gamblers and outcome realizations, the bookmaker’s optimal loss is the largest root of a simple polynomial. Our solution shows that bookmakers can be as fair as desired while avoiding financial risk, and the explicit characterization reveals an intriguing relation between the bookmaker’s regret and Hermite polynomials. We develop an efficient algorithm that computes the optimal bookmaking strategy: when facing an optimal gambler, the algorithm achieves the optimal loss, and in rounds where the gambler is suboptimal, it reduces the achieved loss to the optimal opportunistic loss, a notion that is related to subgame perfect Nash equilibrium. The key technical contribution to achieve these results is an explicit characterization of the Bellman-Pareto frontier, which unifies the dynamic programming updates for Bellman’s value function with the multi-criteria optimization framework of the Pareto frontier in the context of vector repeated games.
Article 1258
Title@2025-06-19 (4): Guaranteed prediction sets for functional surrogate models
Title: Guaranteed prediction sets for functional surrogate models | Garantierte Vorhersagesätze für funktionale Ersatzmodelle | 功能替代模型的保证预测数据集 2501.18426v2 |
Authors (4): Ander Gray, Vignesh Gopakumar, Sylvain Rousseau, Sébastien Destercke
We propose a method for obtaining statistically guaranteed prediction sets for functional machine learning methods: surrogate models which map between function spaces, motivated by the need to build reliable PDE emulators. The method constructs nested prediction sets on a low-dimensional representation (an SVD) of the surrogate model’s error, and then maps these sets to the prediction space using set-propagation techniques. This results in prediction sets for functional surrogate models with conformal prediction coverage guarantees. We use zonotopes as the basis of the set construction, since they allow exact linear propagation and are closed under Cartesian products, making them well-suited to this high-dimensional problem. The method is model agnostic and can thus be applied to complex Sci-ML models, including Neural Operators, but also in simpler settings. We also introduce a technique to capture the truncation error of the SVD, preserving the guarantees of the method.
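The two zonotope properties named above (exact linear propagation and closure under Cartesian products) can be sketched in a few lines. Here a zonotope is stored as a center vector plus a list of generator vectors, i.e. the set {c + Σ_k a_k g_k : a_k ∈ [-1, 1]}. The representation and the helper names `linear_map`, `cartesian_product`, and `interval_hull` are assumptions for illustration; the paper's own construction is not reproduced.

```python
def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def linear_map(matrix, zonotope):
    """Image of a zonotope under x -> A x: map the center and each generator.
    The result is exactly a zonotope again (no over-approximation)."""
    center, generators = zonotope
    return mat_vec(matrix, center), [mat_vec(matrix, g) for g in generators]


def cartesian_product(z1, z2):
    """Cartesian product of two zonotopes: concatenate the centers and
    zero-pad the generators so each acts on its own block of coordinates."""
    (c1, g1), (c2, g2) = z1, z2
    center = c1 + c2
    generators = [g + [0.0] * len(c2) for g in g1]
    generators += [[0.0] * len(c1) + g for g in g2]
    return center, generators


def interval_hull(zonotope):
    """Tightest axis-aligned box containing the zonotope."""
    center, generators = zonotope
    radius = [sum(abs(g[i]) for g in generators) for i in range(len(center))]
    return [(c - r, c + r) for c, r in zip(center, radius)]


if __name__ == "__main__":
    unit_box = ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    mapped = linear_map([[1.0, 1.0], [0.0, 2.0]], unit_box)
    print(interval_hull(mapped))
```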
Article 1259
Title@2025-06-19 (4): Synthetic ALS-EEG Data Augmentation for ALS Diagnosis Using Conditional WGAN with Weight Clipping
Title: Synthetic ALS-EEG Data Augmentation for ALS Diagnosis Using Conditional WGAN with Weight Clipping | Synthetische ALS-EEG Datenvergrößerung für ALS Diagnose mit Bedingtem WGAN mit Gewichtseinschnitt | 使用有重量缩放的附加条件WGAN系统进行ALS诊断的ALS-EEG合成合成数据增强 2506.16243v1 |
Authors (3): Abdulvahap Mutlu, Şengül Doğan, Türker Tuncer
Amyotrophic Lateral Sclerosis (ALS) is a rare neurodegenerative disease, and high-quality EEG data from ALS patients are scarce. This data scarcity, coupled with severe class imbalance between ALS and healthy control recordings, poses a challenge for training reliable machine learning classifiers. In this work, we address these issues by generating synthetic EEG signals for ALS patients using a Conditional Wasserstein Generative Adversarial Network (CWGAN). We train CWGAN on a private EEG dataset (ALS vs. non-ALS) to learn the distribution of ALS EEG signals and produce realistic synthetic samples. We preprocess and normalize EEG recordings, and train a CWGAN model to generate synthetic ALS signals. The CWGAN architecture and training routine are detailed, with key hyperparameters chosen for stable training. Qualitative evaluation of generated signals shows that they closely mimic real ALS EEG patterns. The CWGAN training converged with generator and discriminator loss curves stabilizing, indicating successful learning. The synthetic EEG signals appear realistic and have potential use as augmented data for training classifiers, helping to mitigate class imbalance and improve ALS detection accuracy. We discuss how this approach can facilitate data sharing and enhance diagnostic models.
Article 1260
Title@2025-06-19 (4): Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means
Title: Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means | Einheitliche Mean-Schätzung für schwerfällige Verteilungen über Median-of-Means | 通过中中度中度中途重型发运重故障统一平均估计值 2506.14673v3 |
Authors (2): Mikael Møller Høgsgaard, Andrea Paudice
The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data. In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class $\mathcal{F}$ when the data distribution possesses only the first $p$ moments for $p \in (1,2]$. We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest. Additionally, we present applications of our result to $k$-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.
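For readers unfamiliar with the estimator, here is a minimal sketch of the basic scalar Median of Means: shuffle the data, split it into equal blocks, average each block, and take the median of the block means. The uniform (over a function class) analysis in the paper is not reproduced; the function name `median_of_means` and the choice to drop leftover samples are assumptions of this example.

```python
import random
import statistics


def median_of_means(samples, num_blocks, seed=0):
    """Median-of-Means estimate of the mean of `samples`.

    Samples are shuffled and split into `num_blocks` blocks of equal size;
    leftover samples that do not fill a block are dropped (a simplification).
    """
    data = list(samples)
    block_size = len(data) // num_blocks
    if block_size == 0:
        raise ValueError("need at least one sample per block")
    random.Random(seed).shuffle(data)
    block_means = [
        statistics.fmean(data[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    # One extreme outlier ruins the empirical mean but not the MoM estimate.
    data = [1.0] * 99 + [1e6]
    print("mean:", statistics.fmean(data))
    print("MoM: ", median_of_means(data, num_blocks=10))
```

A single outlier can corrupt at most one block, so the median of the block means is unaffected as long as fewer than half of the blocks are contaminated.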
Article 1261
Title@2025-06-19 (4): Active MRI Acquisition with Diffusion Guided Bayesian Experimental Design
Title: Active MRI Acquisition with Diffusion Guided Bayesian Experimental Design | Aktive MRT-Akquisition mit Diffusion Guided Bayesian Experimental Design | 主动MRI 利用扩散导导贝叶斯实验设计获得MRI 2506.16237v1 |
Authors (5): Jacopo Iollo, Geoffroy Oudoumanessah, Carole Lartizien, Michel Dojat, Florence Forbes
A key challenge in maximizing the benefits of Magnetic Resonance Imaging (MRI) in clinical settings is to accelerate acquisition times without significantly degrading image quality. This objective requires a balance between under-sampling the raw k-space measurements for faster acquisitions and gathering sufficient raw information for high-fidelity image reconstruction and analysis tasks. To achieve this balance, we propose to use sequential Bayesian experimental design (BED) to provide an adaptive and task-dependent selection of the most informative measurements. Measurements are sequentially augmented with new samples selected to maximize information gain on a posterior distribution over target images. Selection is performed via a gradient-based optimization of a design parameter that defines a subsampling pattern. In this work, we introduce a new active BED procedure that leverages diffusion-based generative models to handle the high dimensionality of the images and employs stochastic optimization to select among a variety of patterns while meeting the acquisition process constraints and budget. In doing so, we show how our setting can optimize not only standard image reconstruction but also any associated image analysis task. The versatility and performance of our approach are demonstrated on several MRI acquisitions.
Article 1262
Title@2025-06-19 (4): Think Global, Act Local: Bayesian Causal Discovery with Language Models in Sequential Data
Title: Think Global, Act Local: Bayesian Causal Discovery with Language Models in Sequential Data | Think Global, Act Local: Bayesian Causal Discovery mit Sprachmodellen in Sequential Data | 《全球思维》,《地方行动法》:Bayesian Causal发现序列数据中的语言模式 2506.16234v1 |
Authors (6): Prakhar Verma, David Arbour, Sunav Choudhary, Harshita Chopra, Arno Solin, Atanu R. Sinha
Causal discovery from observational data typically assumes full access to data and availability of domain experts. In practice, data often arrive in batches, and expert knowledge is scarce. Language Models (LMs) offer a surrogate but come with their own issues: hallucinations, inconsistencies, and bias. We present BLANCE (Bayesian LM-Augmented Causal Estimation), a hybrid Bayesian framework that bridges these gaps by adaptively integrating sequential batch data with noisy, LM-derived expert knowledge while accounting for both data-induced and LM-induced biases. Our proposed representation shift from Directed Acyclic Graph (DAG) to Partial Ancestral Graph (PAG) accommodates ambiguities within a coherent Bayesian framework, allowing the global LM knowledge to be grounded in local observational data. To guide LM interaction, we use a sequential optimization scheme that adaptively queries the most informative edges. Across varied datasets, BLANCE outperforms prior work in structural accuracy and extends to Bayesian parameter estimation, showing robustness to LM noise.
Article 1263
Title@2025-06-19 (4): Can AI Dream of Unseen Galaxies? Conditional Diffusion Model for Galaxy Morphology Augmentation
Title: Can AI Dream of Unseen Galaxies? Conditional Diffusion Model for Galaxy Morphology Augmentation | Kann KI von ungesehenen Galaxien träumen? Bedingtes Diffusionsmodell für Galaxy Morphology Augmentation | AI 能梦到看不见的星系吗? 2506.16233v1 |
Authors (7): Chenrui Ma, Zechang Sun, Tao Jing, Zheng Cai, Yuan-Sen Ting, Song Huang, Mingyu Li
Observational astronomy relies on visual feature identification to detect critical astrophysical phenomena. While machine learning (ML) increasingly automates this process, models often struggle with generalization in large-scale surveys due to the limited representativeness of labeled datasets (whether from simulations or human annotation), a challenge that is especially pronounced for rare yet scientifically valuable objects. To address this, we propose a conditional diffusion model to synthesize realistic galaxy images for augmenting ML training data. Leveraging the Galaxy Zoo 2 dataset, which contains pairs of visual features and galaxy images from volunteer annotation, we demonstrate that our model generates diverse, high-fidelity galaxy images that closely adhere to the specified morphological feature conditions. Moreover, this model enables generative extrapolation to project well-annotated data into unseen domains and advance rare object detection. Integrating synthesized images into ML pipelines improves performance in standard morphology classification, boosting completeness and purity by up to 30\% across key metrics. For rare object detection, using early-type galaxies with prominent dust lane features ($\sim$0.1\% of the GZ2 dataset) as a test case, our approach more than doubled the number of detected instances from 352 to 872, compared to previous studies based on visual inspection. This study highlights the power of generative models to bridge gaps between scarce labeled data and the vast, uncharted parameter space of observational astronomy and offers insight for future astrophysical foundation model developments. Our project homepage is available at https://galaxysd-webpage.streamlit.app/.
Article 1264
Title@2025-06-19 (4): Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy
Title: Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy | Malware-Klassifizierung NLP & Machine Learning für verbesserte Genauigkeit | 恶意分类利用NLP和机器学习来提高准确度 2506.16224v1 |
Authors (4): Bishwajit Prasad Gond, Rajneekant, Pushkar Kishore, Durga Prasad Mohapatra
This paper investigates the application of natural language processing (NLP)-based n-gram analysis and machine learning techniques to enhance malware classification. We explore how NLP can be used to extract and analyze textual features from malware samples through n-grams, i.e., contiguous string or API call sequences. This approach effectively captures distinctive linguistic patterns among malware and benign families, enabling finer-grained classification. We delve into n-gram size selection, feature representation, and classification algorithms. Evaluating our proposed method on real-world malware samples, we observe significantly improved accuracy compared to traditional methods. By implementing our n-gram approach, we achieved an accuracy of 99.02% across various machine learning algorithms by using a hybrid feature selection technique to address high dimensionality. This technique reduces the feature set to only 1.6% of the original features.
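To make the n-gram feature extraction concrete, the sketch below turns API call sequences into n-gram count vectors. It is only an illustration of the general idea: the sample call names, the helper names `ngrams`, `build_vocabulary`, and `vectorize`, and the plain count representation are assumptions, and the paper's hybrid feature selection step is not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(sequences, n):
    """Sorted list of every n-gram seen in any sequence."""
    seen = set()
    for seq in sequences:
        seen.update(ngrams(seq, n))
    return sorted(seen)


def vectorize(tokens, vocabulary, n):
    """Count vector of `tokens` over a fixed n-gram vocabulary."""
    counts = Counter(ngrams(tokens, n))
    return [counts[gram] for gram in vocabulary]


if __name__ == "__main__":
    # Hypothetical API call traces, used purely as example input.
    traces = [
        ["CreateFileW", "WriteFile", "CloseHandle"],
        ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"],
    ]
    vocab = build_vocabulary(traces, n=2)
    for trace in traces:
        print(vectorize(trace, vocab, n=2))
```

The resulting vectors can be fed to any standard classifier; because the vocabulary grows quickly with n, some form of feature selection is usually needed afterwards.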
Article 1265
Title@2025-06-19 (4): Interventions Against Machine-Assisted Statistical Discrimination
Title: Interventions Against Machine-Assisted Statistical Discrimination | Maßnahmen gegen maschinengestützte statistische Diskriminierungen | 反对机器辅助统计歧视的干预措施 2310.04585v4 |
Authors (1): John Y. Zhu
I study statistical discrimination driven by verifiable beliefs, such as those generated by machine learning, rather than by humans. When beliefs are verifiable, interventions against statistical discrimination can move beyond simple, belief-free designs like affirmative action, to more sophisticated ones, that constrain decision makers based on what they are thinking. I design a belief-contingent intervention I call common identity. I show that it is effective at eliminating equilibrium statistical discrimination, even when training data exhibit the various statistical biases that often plague algorithmic decision problems.
Article 1266
Title@2025-06-19 (4): From Pixels to CSI: Distilling Latent Dynamics For Efficient Wireless Resource Management
Title: From Pixels to CSI: Distilling Latent Dynamics For Efficient Wireless Resource Management | Von Pixeln zu CSI: Destillieren von Latent Dynamics für effizientes drahtloses Ressourcenmanagement | 从像素到 CSI: 为高效无线资源管理蒸馏中流动态 2506.16216v1 |
Authors (3): Charbel Bou Chaaya, Abanoub M. Girgis, Mehdi Bennis
In this work, we aim to optimize the radio resource management of a communication system between a remote controller and its device, whose state is represented through image frames, without compromising the performance of the control task. We propose a novel machine learning (ML) technique to jointly model and predict the dynamics of the control system as well as the wireless propagation environment in latent space. Our method leverages two coupled joint-embedding predictive architectures (JEPAs): a control JEPA models the control dynamics and guides the predictions of a wireless JEPA, which captures the dynamics of the device’s channel state information (CSI) through cross-modal conditioning. We then train a deep reinforcement learning (RL) algorithm to derive a control policy from latent control dynamics and a power predictor to estimate scheduling intervals with favorable channel conditions based on latent CSI representations. As such, the controller minimizes the usage of radio resources by utilizing the coupled JEPA networks to imagine the device’s trajectory in latent space. We present simulation results on synthetic multimodal data and show that our proposed approach reduces transmit power by over 50% while maintaining control performance comparable to baseline methods that do not account for wireless optimization.
Article 1267
Title@2025-06-19 (4): Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts
Title: Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts | Multi-Preference-Optimierung: Verallgemeinern von DPO über Set-Level-Kontrast | 多优先优化:通过定点对比度普及残疾人组织 2412.04628v4 |
Authors (6): Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Nagarajan Natarajan, Chetan Bansal, Saravan Rajmohan
Direct Preference Optimization (DPO) has become a popular approach for aligning language models using pairwise preferences. However, in practical post-training pipelines, on-policy generation typically yields multiple candidate responses per prompt, which are scored by a reward model to guide learning. In this setting, we propose $\textbf{Multi-Preference Optimization (MPO)}$, a generalization of DPO that optimizes over entire sets of responses by extending the Bradley-Terry model to groupwise comparisons between chosen and rejected sets. To further enhance learning, MPO employs deviation-based weighting, which emphasizes outlier responses that deviate most from the mean reward, effectively inducing a self-paced curriculum. We theoretically prove that MPO reduces alignment bias at a rate of $\mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$ with respect to the number of responses per query. Empirically, MPO achieves state-of-the-art performance on the UltraFeedback benchmark and yields up to $\sim 17.5\%$ improvement over the state-of-the-art baseline in length-controlled win rate on AlpacaEval2, establishing a new baseline for preference-based alignment.
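As background for the generalization described above, this sketch computes the standard pairwise DPO loss for a single (chosen, rejected) pair, which is the Bradley-Terry negative log-likelihood applied to log-probability ratios against a reference model. It shows only the pairwise baseline that MPO is said to generalize; the set-level loss and the deviation-based weighting are not reproduced, and the function names and the default `beta` value are assumptions.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy (logp_*)
    and under the frozen reference model (ref_logp_*).
    """
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


if __name__ == "__main__":
    # Policy equal to the reference: the loss is log(2).
    print(dpo_pair_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy favours the chosen response more than the reference does.
    print(dpo_pair_loss(-0.5, -2.0, -1.0, -1.0))
```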
Article 1268
Title@2025-06-19 (4): VideoGAN-based Trajectory Proposal for Automated Vehicles
Title: VideoGAN-based Trajectory Proposal for Automated Vehicles | VideoGAN-basierter Flugbahnvorschlag für Automatisierte Fahrzeuge | 以视频GAN为基础的自动车辆轨迹建议 2506.16209v1 |
Authors (3): Annajoyce Mariani, Kira Maag, Hanno Gottschalk
Being able to generate realistic trajectory options is at the core of increasing the degree of automation of road vehicles. While model-driven, rule-based, and classical learning-based methods are widely used to tackle these tasks at present, they can struggle to effectively capture the complex, multimodal distributions of future trajectories. In this paper we investigate whether a generative adversarial network (GAN) trained on videos of bird’s-eye view (BEV) traffic scenarios can generate statistically accurate trajectories that correctly capture spatial relationships between the agents. To this end, we propose a pipeline that uses low-resolution BEV occupancy grid videos as training data for a video generative model. From the generated videos of traffic scenarios we extract abstract trajectory data using single-frame object detection and frame-to-frame object matching. We specifically choose a GAN architecture for its fast training and inference times compared to diffusion models. We obtain our best results within 100 GPU hours of training, with inference times under 20\,ms. We demonstrate the physical realism of the proposed trajectories in terms of distribution alignment of spatial and dynamic parameters with respect to the ground truth videos from the Waymo Open Motion Dataset.
Article 1269
Title@2025-06-19 (4): Learning Dynamics in Continual Pre-Training for Large Language Models
Title: Learning Dynamics in Continual Pre-Training for Large Language Models | Dynamisches Lernen im kontinuierlichen Pre-Training für große Sprachmodelle | 大语言模式持续培训前培训中的学习动态 2505.07796v2 |
Authors (5): Xingjin Wang, Howe Tissue, Lu Wang, Linjing Li, Daniel Dajun Zeng
Continual Pre-Training (CPT) has become a popular and effective method to apply strong foundation models to specific downstream tasks. In this work, we explore the learning dynamics throughout the CPT process for large language models. We specifically focus on how general and downstream domain performance evolves at each training step, with domain performance measured via validation losses. We observe that the CPT loss curve fundamentally characterizes the transition from one curve to another hidden curve, and can be described by decoupling the effects of distribution shift and learning rate annealing. We derive a CPT scaling law that combines the two factors, enabling the prediction of loss at any (continual) training step and across learning rate schedules (LRS) in CPT. Our formulation presents a comprehensive understanding of several critical factors in CPT, including loss potential, peak learning rate, training steps, replay ratio, etc. Moreover, our approach can be adapted to customize training hyper-parameters to different CPT goals such as balancing general and domain-specific performance. Extensive experiments demonstrate that our scaling law holds across various CPT datasets and training hyper-parameters.
Article 1270
Title@2025-06-19 (4): Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs
Title: Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs | Effiziente und datenschutzschonende Soft-Prompt-Übertragung für LLMs | 为LLMM公司高效和隐私保护软件迅速转让 2506.16196v1 |
Authors (6): Xun Wang, Jing Xu, Franziska Boenisch, Michael Backes, Christopher A. Choquette-Choo, Adam Dziedzic
Prompting has become a dominant paradigm for adapting large language models (LLMs). While discrete (textual) prompts are widely used for their interpretability, soft (parameter) prompts have recently gained traction in APIs. This is because they can encode information from more training samples while minimizing the user’s token usage, leaving more space in the context window for task-specific input. However, soft prompts are tightly coupled to the LLM they are tuned on, limiting their generalization to other LLMs. This constraint is particularly problematic for efficiency and privacy: (1) tuning prompts on each LLM incurs high computational costs, especially as LLMs continue to grow in size. Additionally, (2) when the LLM is hosted externally, soft prompt tuning often requires sharing private data with the LLM provider. For instance, this is the case with the NVIDIA NeMo API. To address these issues, we propose POST (Privacy Of Soft prompt Transfer), a framework that enables private tuning of soft prompts on a small model and subsequently transfers these prompts to a larger LLM. POST uses knowledge distillation to derive a small model directly from the large LLM to improve prompt transferability, tunes the soft prompt locally, optionally with differential privacy guarantees, and transfers it back to the larger LLM using a small public dataset. Our experiments show that POST reduces computational costs, preserves privacy, and effectively transfers high-utility soft prompts.
Article 1271
Title@2025-06-19 (4): Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction
Title: Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction | Föderated Learning for MRI-based BrainAGE: Eine multizentrische Studie zur post-stroke funktionellen Ergebnisvorhersage | 为基于MRI的脑力智能学习联合会学习:关于打击后功能性结果预测的多中心研究 2506.15626v2 |
Authors (11): Vincent Roca, Marc Tommasi, Paul Andrey, Aurélien Bellet, Markus D. Schirmer, Hilde Henon, Laurent Puy, Julien Ramon, Grégory Kuchcinski, Martin Bretzner, Renaud Lopes
$\textbf{Objective:}$ Brain-predicted age difference (BrainAGE) is a neuroimaging biomarker reflecting brain health. However, training robust BrainAGE models requires large datasets, often restricted by privacy concerns. This study evaluates the performance of federated learning (FL) for BrainAGE estimation in ischemic stroke patients treated with mechanical thrombectomy, and investigates its association with clinical phenotypes and functional outcomes. $\textbf{Methods:}$ We used FLAIR brain images from 1674 stroke patients across 16 hospital centers. We implemented standard machine learning and deep learning models for BrainAGE estimates under three data management strategies: centralized learning (pooled data), FL (local training at each site), and single-site learning. We reported prediction errors and examined associations between BrainAGE and vascular risk factors (e.g., diabetes mellitus, hypertension, smoking), as well as functional outcomes at three months post-stroke. Logistic regression evaluated BrainAGE’s predictive value for these outcomes, adjusting for age, sex, vascular risk factors, stroke severity, time between MRI and arterial puncture, prior intravenous thrombolysis, and recanalisation outcome. $\textbf{Results:}$ While centralized learning yielded the most accurate predictions, FL consistently outperformed single-site models. BrainAGE was significantly higher in patients with diabetes mellitus across all models. Comparisons between patients with good and poor functional outcomes, and multivariate predictions of these outcomes showed the significance of the association between BrainAGE and post-stroke recovery. $\textbf{Conclusion:}$ FL enables accurate age predictions without data centralization. The strong association between BrainAGE, vascular risk factors, and post-stroke recovery highlights its potential for prognostic modeling in stroke care.
Article 1272
Title@2025-06-19 (4): CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization
Title: CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization | CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization | CP$2美元:利用几何方法,通过Canonic化进行非正式预测 2506.16189v1 |
Authors (3): Putri A. van der Linden, Alexander Timans, Erik J. Bekkers
We study the problem of conformal prediction (CP) under geometric data shifts, where data samples are susceptible to transformations such as rotations or flips. While CP endows prediction models with post-hoc uncertainty quantification and formal coverage guarantees, its practicality breaks down under distribution shifts that deteriorate model performance. To address this issue, we propose integrating geometric information (such as geometric pose) into the conformal procedure to reinstate its guarantees and ensure robustness under geometric shifts. In particular, we explore recent advancements on pose canonicalization as a suitable information extractor for this purpose. Evaluating the combined approach across discrete and continuous shifts and against equivariant and augmentation-based baselines, we find that integrating geometric information with CP yields a principled way to address geometric shifts while maintaining broad applicability to black-box predictors.
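The coverage guarantee referred to above comes from the standard split conformal recipe: compute nonconformity scores on a held-out calibration set and use a finite-sample-corrected quantile as the threshold. The sketch below shows that baseline recipe for regression with absolute-residual scores. The canonicalization step proposed in the paper is not included, and the function names are hypothetical.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Split conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    calibration score. Returns infinity when n is too small for level alpha."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


def prediction_interval(point_prediction, qhat):
    """Interval for absolute-residual scores |y - f(x)|: f(x) +/- qhat."""
    return point_prediction - qhat, point_prediction + qhat


if __name__ == "__main__":
    # Absolute residuals |y_i - f(x_i)| on a calibration set (made-up values).
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    qhat = conformal_quantile(scores, alpha=0.2)
    print("threshold:", qhat)
    print("interval: ", prediction_interval(3.0, qhat))
```

The guarantee relies on calibration and test points being exchangeable, which is exactly what a geometric shift at test time can violate.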
Article 1273
Title@2025-06-19 (4): Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval
Title: Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval | Hierarchisches Multi-Positive-Kontrastives Lernen für das Patentbild-Retrieval | 用于专利图像检索的等级式多动态差异学习 2506.13496v3 |
Authors (4): Kshitij Kavimandan, Angelos Nalmpantis, Emma Beauxis-Aussalet, Robert-Jan Sips
Patent images are technical drawings that convey information about a patent’s innovation. Patent image retrieval systems aim to search in vast collections and retrieve the most relevant images. Despite recent advances in information retrieval, patent images still pose significant challenges due to their technical intricacies and complex semantic information, requiring efficient fine-tuning for domain adaptation. Current methods neglect patents’ hierarchical relationships, such as those defined by the Locarno International Classification (LIC) system, which groups broad categories (e.g., “furnishing”) into subclasses (e.g., “seats” and “beds”) and further into specific patent designs. In this work, we introduce a hierarchical multi-positive contrastive loss that leverages the LIC’s taxonomy to induce such relations in the retrieval process. Our approach assigns multiple positive pairs to each patent image within a batch, with varying similarity scores based on the hierarchical taxonomy. Our experimental analysis with various vision and multimodal models on the DeepPatent2 dataset shows that the proposed method enhances the retrieval results. Notably, our method is effective with low-parameter models, which require fewer computational resources and can be deployed in environments with limited hardware.
Article 1274
Title@2025-06-19 (4): Robust Hallucination Detection in LLMs via Adaptive Token Selection
Title: Robust Hallucination Detection in LLMs via Adaptive Token Selection | Robuste Halluzinationserkennung in LLMs durch adaptive Tokenauswahl | 通过适应 Tok 选择在LLMs中进行强力幻觉检测 2504.07863v2 |
Authors (3): Mengjia Niu, Hamed Haddadi, Guansong Pang
Hallucinations in large language models (LLMs) pose significant safety concerns that impede their broader deployment. Recent research in hallucination detection has demonstrated that LLMs’ internal representations contain truthfulness hints, which can be harnessed for detector training. However, the performance of these detectors is heavily dependent on the internal representations of predetermined tokens, fluctuating considerably when working on free-form generations with varying lengths and sparse distributions of hallucinated entities. To address this, we propose HaMI, a novel approach that enables robust detection of hallucinations through adaptive selection and learning of critical tokens that are most indicative of hallucinations. We achieve this robustness by an innovative formulation of the Hallucination detection task as Multiple Instance (HaMI) learning over token-level representations within a sequence, thereby facilitating a joint optimisation of token selection and hallucination detection on generation sequences of diverse forms. Comprehensive experimental results on four hallucination benchmarks show that HaMI significantly outperforms existing state-of-the-art approaches.
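The multiple-instance view can be illustrated with a small sketch: a generation is a bag of token-level scores, and only the most indicative tokens decide the bag-level score. Top-k mean pooling and the hinge loss below are common multiple-instance choices used here as assumptions; they are not claimed to be HaMI's exact objective, and the function names are hypothetical.

```python
def sequence_score(token_scores, k=2):
    """Aggregate token-level scores (non-empty list) into one sequence score.

    Uses the mean of the k highest-scoring tokens, so a few strongly
    indicative tokens dominate regardless of sequence length.
    """
    top_k = sorted(token_scores, reverse=True)[:k]
    return sum(top_k) / len(top_k)


def mil_margin_loss(hallucinated_scores, truthful_scores, k=2, margin=1.0):
    """Hinge loss asking a hallucinated bag to outscore a truthful bag by `margin`."""
    gap = sequence_score(hallucinated_scores, k) - sequence_score(truthful_scores, k)
    return max(0.0, margin - gap)
```

Because only the selected tokens contribute, gradients in a trainable version would flow through the tokens chosen adaptively for each sequence rather than through a fixed, predetermined position.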
Article 1275
Title@2025-06-19 (4): Sheaf Hypergraph Networks
Title: Sheaf Hypergraph Networks | Sheaf Hypergraph Networks | Sheaf 电报网络 2309.17116v2 |
Authors (4): Iulia Duta, Giulia Cassarà, Fabrizio Silvestri, Pietro Liò
Higher-order relations are widespread in nature, with numerous phenomena involving complex interactions that extend beyond simple pairwise connections. As a result, advancements in higher-order processing can accelerate the growth of various fields requiring structured data. Current approaches typically represent these interactions using hypergraphs. We enhance this representation by introducing cellular sheaves for hypergraphs, a mathematical construction that adds extra structure to the conventional hypergraph while maintaining its local, higher-order connectivity. Drawing inspiration from existing Laplacians in the literature, we develop two unique formulations of sheaf hypergraph Laplacians: linear and non-linear. Our theoretical analysis demonstrates that incorporating sheaves into the hypergraph Laplacian provides a more expressive inductive bias than standard hypergraph diffusion, creating a powerful instrument for effectively modelling complex data structures. We employ these sheaf hypergraph Laplacians to design two categories of models: Sheaf Hypergraph Neural Networks and Sheaf Hypergraph Convolutional Networks. These models generalize classical Hypergraph Networks often found in the literature. Through extensive experimentation, we show that this generalization significantly improves performance, achieving top results on multiple benchmark datasets for hypergraph node classification.
Article 1276
Title@2025-06-19 (4): Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks
Title: Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks | Repräsentationslernen mit gegenseitigem Einfluss von Modalitäten für die Knotenklassifikation in multimodalen Heterogenen Netzwerken | 多模式不同形式网络节点分类方式相互影响,代表学习 2505.07895v3 |
Authors (7): Jiafan Li, Jiaqi Zhu, Liang Chang, Yilin Li, Miaomiao Li, Yang Wang, Hongan Wang
Nowadays, numerous online platforms can be described as multi-modal heterogeneous networks (MMHNs), such as Douban’s movie networks and Amazon’s product review networks. Accurately categorizing nodes within these networks is crucial for analyzing the corresponding entities, which requires effective representation learning on nodes. However, existing multi-modal fusion methods often adopt either early fusion strategies, which may lose the unique characteristics of individual modalities, or late fusion approaches, which overlook the cross-modal guidance in GNN-based information propagation. In this paper, we propose a novel model for node classification in MMHNs, named Heterogeneous Graph Neural Network with Inter-Modal Attention (HGNN-IMA). It learns node representations by capturing the mutual influence of multiple modalities during the information propagation process, within the framework of a heterogeneous graph transformer. Specifically, a nested inter-modal attention mechanism is integrated into the inter-node attention to achieve adaptive multi-modal fusion, and modality alignment is also taken into account to encourage propagation among nodes with consistent similarities across all modalities. Moreover, an attention loss is added to mitigate the impact of missing modalities. Extensive experiments validate the superiority of the model in the node classification task, providing an innovative view on handling multi-modal data, especially when accompanied by network structures.
Article 1277
Title@2025-06-19 (4): From Teacher to Student: Tracking Memorization Through Model Distillation
Title: From Teacher to Student: Tracking Memorization Through Model Distillation | Vom Lehrer zum Schüler: Erinnerung durch Modelldestillation verfolgen | 从教师到学生:通过示范蒸馏跟踪记忆 2506.16170v1 |
Authors (1): Simardeep Singh
Large language models (LLMs) are known to memorize parts of their training data, raising important concerns around privacy and security. While previous research has focused on studying memorization in pre-trained models, much less is known about how knowledge distillation (KD) affects memorization. In this study, we explore how different KD methods influence the memorization of fine-tuned task data when a large teacher model is distilled into smaller student variants. This study demonstrates that distilling a larger teacher model, fine-tuned on a dataset, into a smaller variant not only lowers computational costs and model size but also significantly reduces the memorization risks compared to standard fine-tuning approaches.
Article 1278
Title@2025-06-19 (4): Performance of Rank-One Tensor Approximation on Incomplete Data
Title: Performance of Rank-One Tensor Approximation on Incomplete Data | Leistung der Rang eins Tensor-Annäherung auf unvollständigen Daten | 在不完全数据上接近 “ 一等-一等 “ 的性能 2504.07818v2 |
Authors (1): Hugo Lebeau
We are interested in the estimation of a rank-one tensor signal when only a portion $\varepsilon$ of its noisy observation is available. We show that the study of this problem can be reduced to that of a random matrix model whose spectral analysis gives access to the reconstruction performance. These results shed light on and specify the loss of performance induced by an artificial reduction of the memory cost of a tensor via the deletion of a random part of its entries.
Article 1279
Title@2025-06-19 (4): Return-Aligned Decision Transformer
Title: Return-Aligned Decision Transformer | Return-Aligned Decision Transformer | 回归统一决定转换器 2402.03923v6 |
Authors (5): Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra
Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. It is increasingly important to adjust the performance of AI agents to meet human requirements, for example, in applications like video games and education tools. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and includes a mechanism to control the agent’s performance using the target return. However, the action generation is hardly influenced by the target return because DT’s self-attention allocates scarce attention scores to the return tokens. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to more effectively align the actual return with the target return. RADT leverages features extracted by paying attention solely to the return, enabling action generation to consistently depend on the target return. Extensive experiments show that RADT significantly reduces the discrepancies between the actual return and the target return compared to DT-based methods. Our code is available at https://github.com/CyberAgentAILab/radt
Article 1280
Title@2025-06-19 (4): DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Title: DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products | DeltaProdukt: Verbesserung der State-Tracking in linearen RNNs über Haushaltsprodukte | DeltaProduction:通过家用产品改进国家通过家用产品对Linear RNNNs的跟踪 2502.10297v6 |
Authors (6): Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, Riccardo Grazzi
Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or mLSTM, yield fast runtime but have limited expressivity. To address this, recent architectures such as DeltaNet and RWKV-7 adopted a diagonal plus rank-1 structure, which allows simultaneous token and channel mixing, improving associative recall and, as recently shown, state-tracking when allowing negative eigenvalues in the state-transition matrices. Building on the interpretation of DeltaNet’s recurrence as performing one step of online gradient descent per token on an associative recall loss, we introduce DeltaProduct, which instead takes multiple ($n_h$) steps per token. This naturally leads to diagonal plus rank-$n_h$ state-transition matrices, formed as products of $n_h$ generalized Householder transformations, providing a tunable mechanism to balance expressivity and efficiency. We provide a detailed theoretical characterization of the state-tracking capability of DeltaProduct in finite precision, showing how it improves by increasing $n_h$. Our extensive experiments demonstrate that DeltaProduct outperforms DeltaNet in both state-tracking and language modeling, while also showing significantly improved length extrapolation capabilities.
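The recurrence described above can be sketched as follows, assuming the standard delta-rule view: one step is a gradient update on the loss 0.5 * ||S k - v||^2, which is algebraically the same as multiplying the state S by the generalized Householder matrix (I - beta k k^T) and adding beta v k^T. Function names, the list-of-lists state, and the toy inputs are illustrative assumptions; how the n_h keys, values, and step sizes are produced for each token is not modelled here.

```python
def delta_step(S, k, v, beta):
    """One online gradient step on 0.5 * ||S k - v||^2.

    Equivalent to S <- S (I - beta k k^T) + beta v k^T.
    S is a d_v x d_k matrix stored as a list of rows.
    """
    pred = [sum(row[j] * k[j] for j in range(len(k))) for row in S]
    err = [p - t for p, t in zip(pred, v)]
    return [
        [S[i][j] - beta * err[i] * k[j] for j in range(len(k))]
        for i in range(len(S))
    ]


def delta_product_update(S, keys, values, betas):
    """Apply n_h delta steps for a single token (n_h = len(keys)).

    The composed state transition is a product of n_h generalized
    Householder matrices, i.e. identity plus a matrix of rank at most n_h.
    """
    for k, v, beta in zip(keys, values, betas):
        S = delta_step(S, k, v, beta)
    return S
```

With a single step per token this reduces to a DeltaNet-style update; using more steps per token trades extra compute for a higher-rank state transition.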
Article 1281
Title@2025-06-19 (4): Geometric Learning in Black-Box Optimization: A GNN Framework for Algorithm Performance Prediction
Title: Geometric Learning in Black-Box Optimization: A GNN Framework for Algorithm Performance Prediction | Geometrisches Lernen in der Black-Box-Optimierung: Ein GNN-Framework für Algorithmen-Performance-Vorhersage | 黑人Box优化中的几何学习:GNN指数性能预测框架 2506.16144v1 |
Authors (5): Ana Kostovska, Carola Doerr, Sašo Džeroski, Panče Panov, Tome Eftimov
Automated algorithm performance prediction in numerical black-box optimization often relies on problem characterizations, such as exploratory landscape analysis features. These features are typically used as inputs to machine learning models and are represented in a tabular format. However, such approaches often overlook algorithm configurations, a key factor influencing performance. The relationships between algorithm operators, parameters, problem characteristics, and performance outcomes form a complex structure best represented as a graph. This work explores the use of heterogeneous graph data structures and graph neural networks to predict the performance of optimization algorithms by capturing the complex dependencies between problems, algorithm configurations, and performance outcomes. We focus on two modular frameworks, modCMA-ES and modDE, which decompose two widely used derivative-free optimization algorithms: the covariance matrix adaptation evolution strategy (CMA-ES) and differential evolution (DE). We evaluate 324 modCMA-ES and 576 modDE variants on 24 BBOB problems across six runtime budgets and two problem dimensions. Achieving up to 36.6% improvement in MSE over traditional tabular-based methods, this work highlights the potential of geometric learning in black-box optimization.
Article 1282
Title@2025-06-19 (4): GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Title: GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | GRPO-CARE: Konsequentitäts-Bewusst-Verstärkungs-Lernen für multimodale Vernunft | GROPO-CARE: 统一软件强化学习,用于多模式理由 2506.16141v1 |
Authors (7): Yi Chen, Yuying Ge, Rui Wang, Yixiao Ge, Junhao Cheng, Ying Shan, Xihui Liu
Recent reinforcement learning approaches, such as outcome-supervised GRPO, have advanced Chain-of-Thought reasoning in large language models (LLMs), yet their adaptation to multimodal LLMs (MLLMs) is unexplored. To address the lack of rigorous evaluation for MLLM post-training methods, we introduce SEED-Bench-R1, a benchmark with complex real-world videos requiring balanced perception and reasoning. It offers a large training set and evaluates generalization across three escalating challenges: in-distribution, cross-environment, and cross-environment-task scenarios. Using SEED-Bench-R1, we find that standard GRPO, while improving answer accuracy, often reduces logical coherence between reasoning steps and answers, with only a 57.9% consistency rate. This stems from reward signals focusing solely on final answers, encouraging shortcuts, and strict KL penalties limiting exploration. To address this, we propose GRPO-CARE, a consistency-aware RL framework optimizing both answer correctness and reasoning coherence without explicit supervision. GRPO-CARE introduces a two-tiered reward: (1) a base reward for answer correctness, and (2) an adaptive consistency bonus, computed by comparing the model’s reasoning-to-answer likelihood (via a slowly-evolving reference model) against group peers. This dual mechanism amplifies rewards for reasoning paths that are both correct and logically consistent. Replacing KL penalties with this adaptive bonus, GRPO-CARE outperforms standard GRPO on SEED-Bench-R1, achieving a 6.7% performance gain on the hardest evaluation level and a 24.5% improvement in consistency. It also shows strong transferability, improving model performance across diverse video understanding benchmarks. Our work contributes a systematically designed benchmark and a generalizable post-training framework, advancing the development of more interpretable and robust MLLMs.
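The two-tiered reward can be illustrated with a rough sketch. It assumes that the peers are the correct responses in the same sampled group, that the bonus is a fixed amount granted when a response's reference-model likelihood is at least the peer mean, and that advantages are then group-normalized as in GRPO. The bonus size, the thresholding rule, and the function names are assumptions for illustration, not the authors' implementation.

```python
import statistics


def care_rewards(correct, ref_likelihood, bonus=0.5):
    """Base reward for correctness plus a consistency bonus.

    correct:        list of bools, one per sampled response in the group
    ref_likelihood: reference model's likelihood of the final answer
                    given each response's reasoning trace
    """
    peers = [l for c, l in zip(correct, ref_likelihood) if c]
    threshold = statistics.mean(peers) if peers else float("inf")
    rewards = []
    for is_correct, likelihood in zip(correct, ref_likelihood):
        reward = 1.0 if is_correct else 0.0
        # Bonus only for answers that are correct AND better supported
        # by their own reasoning than the average correct peer.
        if is_correct and likelihood >= threshold:
            reward += bonus
        rewards.append(reward)
    return rewards


def group_advantages(rewards):
    """Group-relative advantages: standardize rewards within the group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

In this sketch a correct answer reached through reasoning that poorly supports it earns only the base reward, so shortcut solutions rank below consistent ones within the group.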
Article 1283
Title@2025-06-19 (4): Solving Zero-Sum Convex Markov Games
Title: Solving Zero-Sum Convex Markov Games | Lösen Zero-Sum Convex Markov Spiele | 解决零- 苏姆 Convex Markov 游戏 2506.16120v1 |
Authors (4): Fivos Kalogiannis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Ian Gemp, Georgios Piliouras
We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) by using independent policy gradient methods. Convex Markov games, recently defined by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with preferences that are convex over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity, the absence of Bellman consistency, and the complexity of the infinite horizon. We follow a two-step approach. First, leveraging properties of hidden-convex–hidden-concave functions, we show that a simple nonconvex regularization transforms the min-max optimization problem into a nonconvex-proximal Polyak-Lojasiewicz (NC-pPL) objective. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address the general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.
Article 1284
Title@2025-06-19 (4): Deep learning joint extremes of metocean variables using the SPAR model
Title: Deep learning joint extremes of metocean variables using the SPAR model | Deep Learning gemeinsame Extreme von Metozean-Variablen mit dem SPAR-Modell | 使用SPAR模型的深海海洋变量的深学习联合极端 2412.15808v2 |
Authors (4): Ed Mackay, Callum Murphy-Barltrop, Jordan Richards, Philip Jonathan
This paper presents a novel deep learning framework for estimating multivariate joint extremes of metocean variables, based on the Semi-Parametric Angular-Radial (SPAR) model. When considered in polar coordinates, the problem of modelling multivariate extremes is transformed to one of modelling an angular density, and the tail of a univariate radial variable conditioned on angle. In the SPAR approach, the tail of the radial variable is modelled using a generalised Pareto (GP) distribution, providing a natural extension of univariate extreme value theory to the multivariate setting. In this work, we show how the method can be applied in higher dimensions, using a case study for five metocean variables: wind speed, wind direction, wave height, wave period, and wave direction. The angular variable is modelled using a kernel density method, while the parameters of the GP model are approximated using fully-connected deep neural networks. Our approach provides great flexibility in the dependence structures that can be represented, together with computationally efficient routines for training the model. Furthermore, the method requires fewer assumptions about the underlying distribution(s) than existing approaches, and it provides an asymptotically justified means for extrapolating outside the range of observations. Using various diagnostic plots, we show that the fitted models provide a good description of the joint extremes of the metocean variables considered.
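For reference, the generalised Pareto tail used for the radial variable can be written down directly. In the sketch below the threshold `u`, scale `sigma`, and shape `xi` are plain numbers; in the SPAR setting they would depend on the angle (and, per the abstract, the GP parameters are approximated by neural networks). The function name is hypothetical.

```python
import math


def gp_survival(r, u, sigma, xi):
    """P(R > r | R > u) for a generalised Pareto tail above threshold u.

    sigma > 0 is the scale and xi the shape parameter.
    """
    if r <= u:
        return 1.0
    z = (r - u) / sigma
    if abs(xi) < 1e-12:
        # Limit xi -> 0 gives an exponential tail.
        return math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0.0:
        # For xi < 0 the distribution has a finite upper endpoint.
        return 0.0
    return base ** (-1.0 / xi)
```

Multiplying this conditional survival probability by the probability of exceeding the threshold at a given angle gives the extrapolated exceedance probability beyond the observed range.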
Article 1285
Title@2025-06-19 (4): ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Title: ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning | ReinFlow: Feinsteuerungs-Flow Matching-Politik mit Online-Verstärkungs-Lernen | ReinFlow: 与在线强化学习匹配流动政策的微调 2505.22094v5 |
Authors (4): Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy’s deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly with very few or even a single denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks, while saving denoising steps and 82.63% of wall time compared to the state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step; this performance is comparable to that of fine-tuned DDIM policies while saving an average of 23.20% of computation time. Project webpage: https://reinflow.github.io/
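The noise-injection idea can be sketched in one dimension: adding Gaussian noise to each Euler step of the flow turns the rollout into a Markov chain with Gaussian transitions, so the log-likelihood of a sampled path is a sum of Gaussian log-densities. In ReinFlow the noise is learnable; here `velocity` and `noise_std` are hypothetical callables standing in for the policy network and the noise network, and the whole block is an illustration under those assumptions rather than the authors' code.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def noisy_flow_rollout(x0, velocity, noise_std, steps, rng):
    """Integrate a flow with injected noise and track the exact path log-likelihood.

    Each transition is x_next ~ Normal(x + velocity(x, t) * dt, noise_std(x, t)).
    Returns the final sample and the summed log-probability of the path.
    """
    dt = 1.0 / steps
    x, log_prob = x0, 0.0
    for k in range(steps):
        t = k * dt
        mean = x + velocity(x, t) * dt
        std = noise_std(x, t)
        x_next = rng.gauss(mean, std)
        log_prob += gaussian_logpdf(x_next, mean, std)
        x = x_next
    return x, log_prob
```

A policy-gradient method can then use the returned log-probability directly, which is the practical benefit of making the otherwise deterministic path stochastic.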
Article 1286
Title@2025-06-19 (4): Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification
Title: Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification | Vermeidung von Überbeanspruchung in Graphen-Neuralen Netzwerken durch Spectrum-Erhaltung von Sparsifikationen | 通过光谱保护分解减轻图形神经网络的过度隔动 2506.16110v1 |
Authors (6): Langzhang Liang, Fanchen Bu, Zixing Song, Zenglin Xu, Shirui Pan, Kijung Shin
The message-passing paradigm of Graph Neural Networks often struggles with exchanging information across distant nodes, typically due to structural bottlenecks in certain graph regions, a limitation known as over-squashing. To reduce such bottlenecks, graph rewiring, which modifies graph topology, has been widely used. However, existing graph rewiring techniques often overlook the need to preserve critical properties of the original graph, e.g., spectral properties. Moreover, many approaches rely on increasing edge count to improve connectivity, which introduces significant computational overhead and exacerbates the risk of over-smoothing. In this paper, we propose a novel graph rewiring method that leverages spectrum-preserving graph sparsification for mitigating over-squashing. Our method generates graphs with enhanced connectivity while maintaining sparsity and largely preserving the original graph spectrum, effectively balancing structural bottleneck reduction and graph property preservation. Experimental results validate the effectiveness of our approach, demonstrating its superiority over strong baseline methods in classification accuracy and retention of the Laplacian spectrum.
Article 1287
Title@2025-06-19 (4): Advancing atomic electron tomography with neural networks
Title: Advancing atomic electron tomography with neural networks | Weiterentwicklung der Atomelektronentomographie mit neuronalen Netzwerken | 利用神经网络推进原子电子摄影 2506.16104v1 |
Authors (2): Juhyeok Lee, Yongsoo Yang
Accurate determination of three-dimensional (3D) atomic structures is crucial for understanding and controlling the properties of nanomaterials. Atomic electron tomography (AET) offers non-destructive atomic imaging with picometer-level precision, enabling the resolution of defects, interfaces, and strain fields in 3D, as well as the observation of dynamic structural evolution. However, reconstruction artifacts arising from geometric limitations and electron dose constraints can hinder reliable atomic structure determination. Recent progress has integrated deep learning, especially convolutional neural networks, into AET workflows to improve reconstruction fidelity. This review highlights recent advances in neural network-assisted AET, emphasizing its role in overcoming persistent challenges in 3D atomic imaging. By significantly enhancing the accuracy of both surface and bulk structural characterization, these methods are advancing the frontiers of nanoscience and enabling new opportunities in materials research and technology.
Article 1288
Title@2025-06-19 (4): Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans
Title: Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans | Flow Matching: Markov-Kernel, stochastische Prozesse und Transportpläne | 流程匹配:Markov Kernels, 存储过程和运输计划 2501.16839v4 |
Authors (2): Christian Wald, Gabriele Steidl
Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, velocity fields of curves connecting a simple latent and a target distribution are learned. Then the corresponding ordinary differential equation can be used to sample from a target distribution, starting from samples of the latent one. This paper reviews, from a mathematical point of view, different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.
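As a concrete instance of the coupling approach, the sketch below uses straight-line interpolation between a coupled latent sample x0 and target sample x1, which is one common choice and only an assumption here. The regression target for the velocity is then x1 - x0, and sampling integrates the learned field with explicit Euler steps. It is one-dimensional, and `velocity` is a hypothetical callable standing in for a trained network.

```python
def conditional_target(x0, x1, t):
    """Point on the straight line from x0 to x1 at time t, and its velocity."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def flow_matching_loss(velocity, pairs, times):
    """Mean squared error between the model velocity and the conditional target.

    pairs: list of (x0, x1) samples drawn from a coupling (transport plan)
    times: one time in [0, 1] per pair
    """
    total = 0.0
    for (x0, x1), t in zip(pairs, times):
        x_t, u_t = conditional_target(x0, x1, t)
        total += (velocity(x_t, t) - u_t) ** 2
    return total / len(pairs)


def euler_sample(x0, velocity, steps=10):
    """Push a latent sample to time 1 by integrating dx/dt = velocity(x, t)."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += velocity(x, k * dt) * dt
    return x
```

For example, if the target is the latent distribution shifted by 3 and the coupling pairs each x0 with x0 + 3, the constant field `lambda x, t: 3.0` attains zero loss and `euler_sample` shifts every latent sample by 3.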
Article 1289
Title@2025-06-19 (4): Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning
Title: Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning | Semantic-Aware-Spektrum-Sharing im Internet von Fahrzeugen auf Basis von Deep Reinforcement Learning | 基于深强化学习的车辆在互联网上共享语义-语言软件频谱 2406.07213v4 |
Authors (7): Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief
This work aims to investigate semantic communication in high-speed mobile Internet of vehicles (IoV) environments, with a focus on the spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and then propose a semantic-aware spectrum sharing algorithm (SSS) based on the deep reinforcement learning (DRL) soft actor-critic (SAC) approach. Firstly, we delve into the extraction of semantic information. Secondly, we redefine metrics for semantic information in V2V and V2I spectrum sharing in IoV environments, introducing high-speed semantic spectrum efficiency (HSSE) and semantic transmission rate (HSR). Finally, we employ the SAC algorithm for decision optimization in V2V and V2I spectrum sharing based on semantic information. This optimization encompasses the optimal link of V2V and V2I sharing strategies, the transmission power for vehicles sending semantic information, and the length of transmitted semantic symbols, aiming at maximizing the HSSE of V2I and enhancing the success rate of effective semantic information transmission (SRS) of V2V. Experimental results demonstrate that the SSS algorithm outperforms baseline algorithms, including traditional-communication-based spectrum sharing algorithms and spectrum sharing algorithms using other reinforcement learning approaches. The SSS algorithm exhibits a 15% increase in HSSE and approximately a 7% increase in SRS.
Article 1290
Title@2025-06-19 (4): Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning
Title: Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning | Rekonfigurierbare intelligente oberflächenunterstützte VEC auf Basis von Multi-Agenten-Verstärkungslernen | 基于多机构强化学习的可重新配置智能表面辅助VEC 2406.11318v2 |
Authors (6): Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang
Vehicular edge computing (VEC) is an emerging technology that enables vehicles to perform high-intensity tasks by executing tasks locally or offloading them to nearby edge devices. However, obstacles such as buildings may degrade the communications and incur communication interruptions, and thus the vehicle may not meet the requirement for task offloading. A reconfigurable intelligent surface (RIS) is introduced to support vehicle communication and provide an alternative communication path. The system performance can be improved by flexibly adjusting the phase-shift of the RIS. For an RIS-assisted VEC system where tasks arrive randomly, we design a control scheme that considers offloading power, local power allocation and phase-shift optimization. To solve this non-convex problem, we propose a new deep reinforcement learning (DRL) framework that employs a modified multi-agent deep deterministic policy gradient (MADDPG) approach to optimize the power allocation for vehicle users (VUs) and a block coordinate descent (BCD) algorithm to optimize the phase-shift of the RIS. Simulation results show that our proposed scheme outperforms the centralized deep deterministic policy gradient (DDPG) scheme and a random scheme.
Article 1291
Title@2025-06-19 (4): On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse
Title: On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse | Über die Grenzen der Sprachgenerierung: Trade-Offs zwischen Halluzination und Modekollaps | 语言产生限制:幻觉与模式崩溃之间的取舍 2411.09642v2 |
Authors (3): Alkis Kalavasis, Anay Mehrotra, Grigoris Velegkas
Specifying all desirable properties of a language model is challenging, but certain requirements seem essential. Given samples from an unknown language, the trained model should produce valid strings not seen in training and be expressive enough to capture the language’s full richness. Otherwise, outputting invalid strings constitutes “hallucination,” and failing to capture the full range leads to “mode collapse.” We ask if a language model can meet both requirements. We investigate this within a statistical language generation setting building on Gold and Angluin. Here, the model receives random samples from a distribution over an unknown language K, which belongs to a possibly infinite collection of languages. The goal is to generate unseen strings from K. We say the model generates from K with consistency and breadth if, as training size increases, its output converges to all unseen strings in K. Kleinberg and Mullainathan [KM24] asked if consistency and breadth in language generation are possible. We answer this negatively: for a large class of language models, including next-token prediction models, this is impossible for most collections of candidate languages. This contrasts with [KM24]’s result, showing consistent generation without breadth is possible for any countable collection of languages. Our finding highlights that generation with breadth fundamentally differs from generation without breadth. As a byproduct, we establish near-tight bounds on the number of samples needed for generation with or without breadth. Finally, our results offer hope: consistent generation with breadth is achievable for any countable collection of languages when negative examples (strings outside K) are available alongside positive ones. This suggests that post-training feedback, which encodes negative examples, can be crucial in reducing hallucinations while limiting mode collapse.
Article 1292
Title@2025-06-19 (4): Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks
Title: Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks | Deep-Reinforcement-Learning-based AoI-Aware Ressourcenzuweisung für RIS-Aided IoV-Netzwerke | 为RIS援助的IOV网络分配的深入加强-基于学习的AoI-软件资源 2406.11245v2 |
Authors (7): Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief
Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-to-infrastructure (V2I) links and the stability of vehicle-to-vehicle (V2V) links, we introduce the age of information (AoI) model and the payload transmission probability model. Therefore, with the objective of minimizing the AoI of V2I links and prioritizing transmission of V2V links payload, we construct this optimization problem as a Markov decision process (MDP) problem in which the BS serves as an agent to allocate resources and control phase-shift for the vehicles using the soft actor-critic (SAC) algorithm, which gradually converges and maintains a high stability. An AoI-aware joint vehicular resource allocation and RIS phase-shift control scheme based on the SAC algorithm is proposed, and simulation results show that its convergence speed, cumulative reward, AoI performance, and payload transmission probability outperform those of proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3) and stochastic algorithms.
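To make the AoI objective concrete, here is a toy discrete-time sketch of how the age of a link evolves. It assumes the textbook slotted model in which the age resets to one slot after a successful update and otherwise grows by one slot; the paper's exact AoI model may differ, and the function name is hypothetical.

```python
def aoi_trajectory(delivered, initial_age=1):
    """Age of information per time slot.

    delivered: iterable of bools, True if a fresh update was received in that slot
    """
    age = initial_age
    ages = []
    for ok in delivered:
        # A successful update resets the age; otherwise the information gets older.
        age = 1 if ok else age + 1
        ages.append(age)
    return ages
```

A resource allocation policy that minimizes AoI therefore tries to keep the gaps between successful V2I updates short, which is what the time-averaged value of this trajectory measures.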
Article 1293
Title@2025-06-19 (4): A Brain-to-Population Graph Learning Framework for Diagnosing Brain Disorders
Title: A Brain-to-Population Graph Learning Framework for Diagnosing Brain Disorders | Ein Brain-to-Population Graph Learning Framework zur Diagnose von Hirnerkrankungen | 脑至人口图诊断脑疾病学习框架 2506.16096v1 |
Authors (7): Qianqian Liao, Wuque Cai, Hongze Sun, Dongze Liu, Duo Chen, Dezhong Yao, Daqing Guo
Recently developed graph-based methods for diagnosing brain disorders using functional connectivity rely heavily on predefined brain atlases, but overlook the rich information embedded within atlases and the confounding effects of site and phenotype variability. To address these challenges, we propose a two-stage Brain-to-Population Graph Learning (B2P-GL) framework that integrates the semantic similarity of brain regions and condition-based population graph modeling. In the first stage, termed brain representation learning, we leverage brain atlas knowledge from GPT-4 to enrich the graph representation and refine the brain graph through an adaptive node reassignment graph attention network. In the second stage, termed population disorder diagnosis, phenotypic data is incorporated into population graph construction and feature fusion to mitigate confounding effects and enhance diagnosis performance. Experiments on the ABIDE I, ADHD-200, and Rest-meta-MDD datasets show that B2P-GL outperforms state-of-the-art methods in prediction accuracy while enhancing interpretability. Overall, our proposed framework offers a reliable and personalized approach to brain disorder diagnosis, advancing clinical applicability.
Article 1294
Title@2025-06-19 (4): Temporal horizons in forecasting: a performance-learnability trade-off
Title: Temporal horizons in forecasting: a performance-learnability trade-off | Zeithorizonte bei der Prognose: ein Leistungs-Ernennbarkeits-Austausch | 预测的时空前景:业绩-可忽略性权衡取舍 2506.03889v2 |
Authors (5): Pau Vilimelis Aceituno, Jack William Miller, Noah Marti, Youssef Farag, Victor Boussange
When training autoregressive models to forecast dynamical systems, a critical question arises: how far into the future should the model be trained to predict? Too short a horizon may miss long-term trends, while too long a horizon can impede convergence due to accumulating prediction errors. In this work, we formalize this trade-off by analyzing how the geometry of the loss landscape depends on the training horizon. We prove that for chaotic systems, the loss landscape’s roughness grows exponentially with the training horizon, while for limit cycles, it grows linearly, making long-horizon training inherently challenging. However, we also show that models trained on long horizons generalize well to short-term forecasts, whereas those trained on short horizons suffer exponentially (resp. linearly) worse long-term predictions in chaotic (resp. periodic) systems. We validate our theory through numerical experiments and discuss practical implications for selecting training horizons. Our results provide a principled foundation for hyperparameter optimization in autoregressive forecasting models.
Article 1295
Title@2025-06-19 (4): Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network
Title: Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network | Ressourcenzuteilung für Twin Maintenance und Computing Task Processing im digitalen Twin Vehicular Edge Computing Network | 数字双面电子计算网络双向维护和电子计算任务处理的资源分配 2407.07575v2 |
Authors (7): Yu Xie, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief
As a promising technology, vehicular edge computing (VEC) can provide computing and caching services by deploying VEC servers near vehicles. However, VEC networks still face challenges such as high vehicle mobility. Digital twin (DT), an emerging technology, can predict, estimate, and analyze real-time states by digitally modeling objects in the physical world. By integrating DT with VEC, a virtual vehicle DT can be created in the VEC server to monitor the real-time operating status of vehicles. However, maintaining the vehicle DT model requires ongoing attention from the VEC server, which also needs to offer computing services for the vehicles. Therefore, effective allocation and scheduling of VEC server resources are crucial. This study focuses on a general VEC network with a single VEC service and multiple vehicles, examining the two types of delays caused by twin maintenance and computational processing within the network. By transforming the problem using satisfaction functions, we formulate an optimization problem that maximizes each vehicle’s resource utility to determine the optimal resource allocation strategy. Because the problem is non-convex, we reformulate it as a multi-agent Markov decision process. Subsequently, we propose the twin maintenance and computing task processing resource collaborative scheduling (MADRL-CSTC) algorithm, which leverages multi-agent deep reinforcement learning. Experimental comparisons with alternative algorithms demonstrate that the proposed approach allocates resources effectively.
Article 1296
Title@2025-06-19 (4): Mobility-Aware Federated Self-supervised Learning in Vehicular Network
Title: Mobility-Aware Federated Self-supervised Learning in Vehicular Network | Mobilitätsbewusstes Selbstüberwachtes Lernen im Vehicular Network | 车辆网络中流动软件 – – 流动软件 – – 联邦监督的自我学习 2408.00256v2 |
Authors (4): Xueying Gu, Qiong Wu, Pingyi Fan, Qiang Fan
Federated Learning (FL) is an advanced distributed machine learning approach that protects the privacy of each vehicle by allowing the model to be trained on multiple devices simultaneously without the need to upload all data to a road side unit (RSU). This enables FL to handle scenarios with sensitive or widely distributed data. However, in these fields, it is well known that labeling costs can be a significant expense, and models relying on labels are not suitable for these rapidly evolving fields, especially vehicular networks or the mobile internet of things (MIoT), where new data emerges constantly. To handle this issue, self-supervised learning paves the way for training without labels. Additionally, for vehicles with high velocity, owing to blurred images, simple aggregation not only impacts the accuracy of the aggregated model but also reduces the convergence speed of FL. This paper proposes an FL algorithm that aggregates models based on image blur level, called FLSimCo, which does not require labels and serves as a pre-training stage for self-supervised learning in the vehicular environment. Simulation results demonstrate that the proposed algorithm exhibits fast and stable convergence.
Article 1297
Title@2025-06-19 (4): Diffusion-Based Hypothesis Testing and Change-Point Detection
Title: Diffusion-Based Hypothesis Testing and Change-Point Detection | Diffusionsbasierte Hypothesenprüfung und Change-Point-Erkennung | 基于传播的假假设测试和变化点探测 2506.16089v1 |
Authors (3): Sean Moushegian, Taposh Banerjee, Vahid Tarokh
Score-based methods have recently seen increasing popularity in modeling and generation. Methods have been constructed to perform hypothesis testing and change-point detection with score functions, but these methods are in general not as powerful as their likelihood-based peers. Recent works consider generalizing the score-based Fisher divergence into a diffusion-divergence by transforming score functions via multiplication with a matrix-valued function or a weight matrix. In this paper, we extend the score-based hypothesis test and change-point detection stopping rule into their diffusion-based analogs. Additionally, we theoretically quantify the performance of these diffusion-based algorithms and study scenarios where optimal performance is achievable. We propose a method of numerically optimizing the weight matrix and present numerical simulations to illustrate the advantages of diffusion-based algorithms.
Article 1298
Title@2025-06-19 (4): HSTU-BLaIR: Lightweight Contrastive Text Embedding for Generative Recommender
Title: HSTU-BLaIR: Lightweight Contrastive Text Embedding for Generative Recommender | HSTU-BLaIR: Leichte Kontrastive Text-Embedding für generative Recommender | HSTU-BLAIR: 用于产生建议建议的轻量比对式文本嵌入 2504.10545v3 |
Authors (1): Yijun Liu
Recent advances in recommender systems have underscored the complementary strengths of generative modeling and pretrained language models. We propose HSTU-BLaIR, a hybrid framework that augments the Hierarchical Sequential Transduction Unit (HSTU)-based generative recommender with BLaIR, a lightweight contrastive text embedding model. This integration enriches item representations with semantic signals from textual metadata while preserving HSTU’s powerful sequence modeling capabilities. We evaluate HSTU-BLaIR on two e-commerce datasets: three subsets from the Amazon Reviews 2023 dataset and the Steam dataset. We compare its performance against both the original HSTU-based recommender and a variant augmented with embeddings from OpenAI’s state-of-the-art text-embedding-3-large model. Despite the latter being trained on a substantially larger corpus with significantly more parameters, our lightweight BLaIR-enhanced approach – pretrained on domain-specific data – achieves better performance in nearly all cases. Specifically, HSTU-BLaIR outperforms the OpenAI embedding-based variant on all but one metric, where it is marginally lower, and matches it on another. These findings highlight the effectiveness of contrastive text embeddings in compute-efficient recommendation settings.
Article 1299
Title@2025-06-19 (4): Investigating Lagrangian Neural Networks for Infinite Horizon Planning in Quadrupedal Locomotion
Title: Investigating Lagrangian Neural Networks for Infinite Horizon Planning in Quadrupedal Locomotion | Untersuchung lagrangischer neuraler Netzwerke für die unendliche Horizontplanung in der Quadrupedal-Locomotion | 调查拉格朗江神经网络,以在四分居动中进行无限期地地平线规划 2506.16079v1 |
Authors (3): Prakrut Kotecha, Aditya Shirwatkar, Shishir Kolathaya
Lagrangian Neural Networks (LNNs) present a principled and interpretable framework for learning the system dynamics by utilizing inductive biases. While traditional dynamics models struggle with compounding errors over long horizons, LNNs intrinsically preserve the physical laws governing any system, enabling accurate and stable predictions essential for sustainable locomotion. This work evaluates LNNs for infinite horizon planning in quadrupedal robots through four dynamics models: (1) full-order forward dynamics (FD) training and inference, (2) diagonalized representation of Mass Matrix in full order FD, (3) full-order inverse dynamics (ID) training with FD inference, (4) reduced-order modeling via torso centre-of-mass (CoM) dynamics. Experiments demonstrate that LNNs bring improvements in sample efficiency (10x) and superior prediction accuracy (up to 2-10x) compared to baseline methods. Notably, the diagonalization approach of LNNs reduces computational complexity while retaining some interpretability, enabling real-time receding horizon control. These findings highlight the advantages of LNNs in capturing the underlying structure of system dynamics in quadrupeds, leading to improved performance and efficiency in locomotion planning and control. Additionally, our approach achieves a higher control frequency than previous LNN methods, demonstrating its potential for real-world deployment on quadrupeds.
Article 1300
Title@2025-06-19 (4): Probing the Robustness of Large Language Models Safety to Latent Perturbations
Title: Probing the Robustness of Large Language Models Safety to Latent Perturbations | Nachweis der Robustheit großer Sprachmodelle Sicherheit zu latenten Störungen | 检验大语言模型安全性是否强,以证实大语言模型安全性是否足以应对前端扰动 2506.16078v1 |
Authors (10): Tianle Gu, Kexin Huang, Zongqi Wang, Yixu Wang, Jie Li, Yuanqi Yao, Yang Yao, Yujiu Yang, Yan Teng, Yingchun Wang
Safety alignment is a key requirement for building reliable Artificial General Intelligence. Despite significant advances in safety alignment, we observe that minor latent shifts can still trigger unsafe responses in aligned models. We argue that this stems from the shallow nature of existing alignment methods, which focus on surface-level refusal behaviors without sufficiently altering internal representations. Consequently, small shifts in hidden activations can re-trigger harmful behaviors embedded in the latent space. To explore the robustness of safety alignment to latent perturbations, we introduce a probing method that measures the Negative Log-Likelihood of the original response generated by the model. This probe quantifies local sensitivity in the latent space, serving as a diagnostic tool for identifying vulnerable directions. Based on this signal, we construct effective jailbreak trajectories, giving rise to the Activation Steering Attack (ASA). More importantly, these insights offer a principled foundation for improving alignment robustness. To this end, we introduce Layer-wise Adversarial Patch Training (LAPT), a fine-tuning strategy that injects controlled perturbations into hidden representations during training. Experimental results show that LAPT strengthens alignment robustness without compromising general capabilities. Our findings reveal fundamental flaws in current alignment paradigms and call for representation-level training strategies that move beyond surface-level behavior supervision. Code and results are available at https://github.com/Carol-gutianle/LatentSafety.
Article 1301
Title@2025-06-19 (4): Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching
Title: Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching | Schnellere stochastische Optimierung mit willkürlichen Verzögerungen über asynchrones Mini-Batching | 通过非同步小型批次快速优化任意拖延 2408.07503v2 |
Authors (3): Amit Attia, Ofir Gaash, Tomer Koren
We consider the problem of asynchronous stochastic optimization, where an optimization algorithm makes updates based on stale stochastic gradients of the objective that are subject to an arbitrary (possibly adversarial) sequence of delays. We present a procedure which, for any given $q \in (0,1]$, transforms any standard stochastic first-order method to an asynchronous method with convergence guarantee depending on the $q$-quantile delay of the sequence. This approach leads to convergence rates of the form $O(\tau_q/qT+\sigma/\sqrt{qT})$ for non-convex and $O(\tau_q^2/(q T)^2+\sigma/\sqrt{qT})$ for convex smooth problems, where $\tau_q$ is the $q$-quantile delay, generalizing and improving on existing results that depend on the average delay. We further show a method that automatically adapts to all quantiles simultaneously, without any prior knowledge of the delays, achieving convergence rates of the form $O(\inf_{q} \tau_q/qT+\sigma/\sqrt{qT})$ for non-convex and $O(\inf_{q} \tau_q^2/(q T)^2+\sigma/\sqrt{qT})$ for convex smooth problems. Our technique is based on asynchronous mini-batching with a careful batch-size selection and filtering of stale gradients.
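To make the role of the $q$-quantile delay concrete, the following is a minimal sketch, not the authors' procedure. It assumes the bound is read as $\tau_q/(qT) + \sigma/\sqrt{qT}$ with constants dropped, and the helper names `quantile_delay` and `nonconvex_rate` are hypothetical. It shows how a few very large delays inflate the average and maximum delay while leaving the median delay small.

```python
import math


def quantile_delay(delays, q):
    """Smallest tau such that at least a q-fraction of the delays are <= tau."""
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    ordered = sorted(delays)
    index = math.ceil(q * len(ordered)) - 1
    return ordered[index]


def nonconvex_rate(tau_q, q, T, sigma):
    """Evaluate tau_q/(q*T) + sigma/sqrt(q*T), i.e. the stated bound without constants."""
    return tau_q / (q * T) + sigma / math.sqrt(q * T)


# Illustrative delay sequence (an assumption): nine fresh gradients, one very stale one.
delays = [1] * 9 + [1000]
median_delay = quantile_delay(delays, 0.5)
max_delay = quantile_delay(delays, 1.0)
average_delay = sum(delays) / len(delays)

# Compare the bound at q = 0.5 against the bound at q = 1 (which uses the maximum delay).
rate_median = nonconvex_rate(median_delay, 0.5, T=10_000, sigma=1.0)
rate_max = nonconvex_rate(max_delay, 1.0, T=10_000, sigma=1.0)
```

In this toy sequence the median-based expression is much smaller than the one driven by the maximum delay, which is the kind of gap the quantile-dependent guarantee is meant to exploit.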
Article 1302
Title@2025-06-19 (4): Joint User Priority and Power Scheduling for QoS-Aware WMMSE Precoding: A Constrained-Actor Attentive-Critic Approach
Title: Joint User Priority and Power Scheduling for QoS-Aware WMMSE Precoding: A Constrained-Actor Attentive-Critic Approach | Gemeinsame Benutzerpriorität und Leistungsplanung für QoS-Aware WMMSE-Vorkodierung: Ein eingeschränkter, aktiv-kritischer Ansatz | Qos-Aware WMMSE 预码: 控制- 控制- 控制- 控制- 控制- 控制- 控制- 反应- 批评方法 2506.16074v1 |
Authors (2): Kexuan Wang, An Liu
6G wireless networks are expected to support diverse quality-of-service (QoS) demands while maintaining high energy efficiency. Weighted Minimum Mean Square Error (WMMSE) precoding with fixed user priorities and transmit power is widely recognized for enhancing overall system performance but lacks flexibility to adapt to user-specific QoS requirements and time-varying channel conditions. To address this, we propose a novel constrained reinforcement learning (CRL) algorithm, Constrained-Actor Attentive-Critic (CAAC), which uses a policy network to dynamically allocate user priorities and power for WMMSE precoding. Specifically, CAAC integrates a Constrained Stochastic Successive Convex Approximation (CSSCA) method to optimize the policy, enabling more effective handling of energy efficiency goals and satisfaction of stochastic non-convex QoS constraints compared to traditional and existing CRL methods. Moreover, CAAC employs lightweight attention-enhanced Q-networks to evaluate policy updates without prior environment model knowledge. The network architecture not only enhances representational capacity but also boosts learning efficiency. Simulation results show that CAAC outperforms baselines in both energy efficiency and QoS satisfaction.
Article 1303
Title@2025-06-19 (4): KCES: Training-Free Defense for Robust Graph Neural Networks via Kernel Complexity
Title: KCES: Training-Free Defense for Robust Graph Neural Networks via Kernel Complexity | KCES: Training-freie Verteidigung für robuste Graphen-Neural-Netzwerke über Kernel-Komplexität | KCES:通过核心复杂度为坚固的图表神经网络提供无训练防御 2506.11611v2 |
Authors (5): Yaning Jia, Shenyang Deng, Chiyu Ma, Yaoqing Yang, Soroush Vosoughi
Graph Neural Networks (GNNs) have achieved impressive success across a wide range of graph-based tasks, yet they remain highly vulnerable to small, imperceptible perturbations and adversarial attacks. Although numerous defense methods have been proposed to address these vulnerabilities, many rely on heuristic metrics, overfit to specific attack patterns, and suffer from high computational complexity. In this paper, we propose Kernel Complexity-Based Edge Sanitization (KCES), a training-free, model-agnostic defense framework. KCES leverages Graph Kernel Complexity (GKC), a novel metric derived from the graph’s Gram matrix that characterizes GNN generalization via its test error bound. Building on GKC, we define a KC score for each edge, measuring the change in GKC when the edge is removed. Edges with high KC scores, typically introduced by adversarial perturbations, are pruned to mitigate their harmful effects, thereby enhancing GNNs’ robustness. KCES can also be seamlessly integrated with existing defense strategies as a plug-and-play module without requiring training. Theoretical analysis and extensive experiments demonstrate that KCES consistently enhances GNN robustness, outperforms state-of-the-art baselines, and amplifies the effectiveness of existing defenses, offering a principled and efficient solution for securing GNNs.
Article 1304
Title@2025-06-19 (4): A Lightweight RL-Driven Deep Unfolding Network for Robust WMMSE Precoding in Massive MU-MIMO-OFDM Systems
Title: A Lightweight RL-Driven Deep Unfolding Network for Robust WMMSE Precoding in Massive MU-MIMO-OFDM Systems | Ein leichtes RL-getriebenes Tiefen-Entfaltungs-Netzwerk für robuste WMMSE-Vorkodierung in massiven MU-MIMO-OFDM-Systemen | 大型MU-MIMO-OFDM系统中强力 WMMSE 预码的轻量 RL-Dripry 深载网络 2506.16072v1 |
Authors (2): Kexuan Wang, An Liu
Weighted Minimum Mean Square Error (WMMSE) precoding is widely recognized for its near-optimal weighted sum rate performance. However, its practical deployment in massive multi-user (MU) multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems is hindered by the assumption of perfect channel state information (CSI) and high computational complexity. To address these issues, we first develop a wideband stochastic WMMSE (SWMMSE) algorithm that iteratively maximizes the ergodic weighted sum-rate (EWSR) under imperfect CSI. Building on this, we propose a lightweight reinforcement learning (RL)-driven deep unfolding (DU) network (RLDDU-Net), where each SWMMSE iteration is mapped to a network layer. Specifically, its DU module integrates approximation techniques and leverages beam-domain sparsity as well as frequency-domain subcarrier correlation, significantly accelerating convergence and reducing computational overhead. Furthermore, the RL module adaptively adjusts the network depth and generates compensation matrices to mitigate approximation errors. Simulation results under imperfect CSI demonstrate that RLDDU-Net outperforms existing baselines in EWSR performance while offering superior computational and convergence efficiency.
Article 1305
Title@2025-06-19 (4): Provably Efficient Online RLHF with One-Pass Reward Modeling
Title: Provably Efficient Online RLHF with One-Pass Reward Modeling | Effiziente Online-RLHF mit One-Pass-Reward-Modellierung | 配有 “ 一纸分奖励 “ 模型的在线甚高频网络高效率 2502.07193v2 |
Authors (4): Long-Fei Li, Yu-Yang Qian, Peng Zhao, Zhi-Hua Zhou
Reinforcement Learning from Human Feedback (RLHF) has shown remarkable success in aligning Large Language Models (LLMs) with human preferences. Traditional RLHF approaches rely on a fixed dataset, which often suffers from limited coverage. To this end, online RLHF has emerged as a promising direction, enabling iterative data collection and model improvement. Despite its potential, this paradigm faces a key bottleneck: the requirement to continuously integrate new data into the historical dataset and re-optimize the model from scratch at each iteration, resulting in computational and storage costs that grow linearly with the number of iterations. In this work, we address this challenge by proposing a one-pass reward modeling method that does not require storing the historical data and can be computed in constant time. Specifically, we first formalize RLHF as a contextual preference bandit problem and design an online mirror descent algorithm with a tailored local norm to replace the standard maximum likelihood estimation for reward modeling. We then apply our method to various online RLHF settings, including passive data collection, active data collection, and deployment-time adaptation. We provide theoretical guarantees showing that our method improves both statistical and computational efficiency. Finally, we provide practical algorithms and conduct experiments using Llama-3-8B-Instruct and Qwen2.5-7B-Instruct models on the Ultrafeedback-binarized and Mixture2 datasets, validating the effectiveness of our proposed method.
Article 1306
Title@2025-06-19 (4): Complexity of Injectivity and Verification of ReLU Neural Networks
Title: Complexity of Injectivity and Verification of ReLU Neural Networks | Komplexität der Injektivität und Verifizierung von ReLU-Neuralnetzen | RELU神经网络的投射复杂度和核查 2405.19805v3 |
Authors (3): Vincent Froese, Moritz Grillo, Martin Skutella
Neural networks with ReLU activation play a key role in modern machine learning. Understanding the functions represented by ReLU networks is a major topic in current research, as this enables better interpretability of learning processes. Injectivity of a function computed by a ReLU network, that is, the question of whether different inputs to the network always lead to different outputs, plays a crucial role whenever invertibility of the function is required, for example in inverse problems or generative models. The exact computational complexity of deciding injectivity was recently posed as an open problem (Puthawala et al. [JMLR 2022]). We answer this question by proving coNP-completeness. On the positive side, we show that the problem for a single ReLU layer is still tractable for small input dimension; more precisely, we present a parameterized algorithm which yields fixed-parameter tractability with respect to the input dimension. In addition, we study the network verification problem, which is to verify that certain inputs only yield specific outputs. This is of great importance since neural networks are increasingly used in safety-critical systems. We prove that network verification is coNP-hard for a general class of input domains. Our results also exclude constant-factor polynomial-time approximations for the maximum of a function computed by a ReLU network. In this context, we also characterize surjectivity of functions computed by ReLU networks with one-dimensional output, which turns out to be the complement of a basic network verification task. We reveal interesting connections to computational convexity by formulating the surjectivity problem as a zonotope containment problem.
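As a small illustration of what injectivity means for a single ReLU layer, the sketch below contrasts two hand-picked layers. It is only an example of the property, not the decision algorithm from the paper; the function `relu_layer` and both weight matrices are assumptions chosen for illustration.

```python
def relu_layer(weights, x):
    """Apply x -> ReLU(W x), with the weight matrix W given as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


# One neuron on a one-dimensional input: every non-positive input maps to 0,
# so two different inputs collide and the layer is not injective.
single = [[1.0]]
collide_a = relu_layer(single, [-1.0])
collide_b = relu_layer(single, [-2.0])

# Two neurons with opposite signs: the input is recoverable as out[0] - out[1],
# so this layer is injective on the real line.
paired = [[1.0], [-1.0]]
out_neg = relu_layer(paired, [-1.0])
out_pos = relu_layer(paired, [2.0])
```

Spotting a collision by hand is easy in one dimension; the abstract's point is that deciding whether any collision exists is coNP-complete in general.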
Article 1307
Title@2025-06-19 (4): DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing
Title: DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing | DRL-basiertes, selbstüberwachtes Lernen für Aufgabe Offloading und Ressourcenallokation im ISAC-fähigen Fahrzeug Edge Computing | DRL-基于DRL的基于联邦的自我监督学习,以在ISAC-可加入的车辆边缘电子计算中进行任务卸载和资源分配 2408.14831v2 |
Authors (6): Xueying Gu, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief
Intelligent Transportation Systems (ITS) leverage Integrated Sensing and Communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle Edge Computing (VEC) addresses this by offloading tasks to a Road Side Unit (RSU), ensuring timely services. Our previous work proposed the FLSimCo algorithm, which uses local resources for Federated Self-Supervised Learning (SSL), though vehicles often cannot complete all iterations of the task. Our improved algorithm offloads part of the task to the RSU and optimizes energy consumption by adjusting transmission power, CPU frequency, and task assignment ratios, balancing local and RSU-based training. Meanwhile, setting an offloading threshold further prevents inefficiencies. Simulation results show that the enhanced algorithm reduces energy consumption and improves both offloading efficiency and the accuracy of Federated SSL.
Article 1308
Title@2025-06-19 (4): On the generalization of Tanimoto-type kernels to real valued functions
Title: On the generalization of Tanimoto-type kernels to real valued functions | Über die Verallgemeinerung von Tanimoto-Kerneln zu echten wertgeschätzten Funktionen | 将谷本本型内核普遍化为实际的有价值功能 2007.05943v3 |
Authors (2): Sandor Szedmak, Eric Bach
The Tanimoto kernel (Jaccard index) is a well-known tool for describing the similarity between sets of binary attributes. It has been extended to the case when the attributes are nonnegative real values. This paper introduces a more general Tanimoto kernel formulation which allows measuring the similarity of arbitrary real-valued functions. This extension is constructed by unifying the representation of the attributes via properly chosen sets. After deriving the general form of the kernel, an explicit feature representation is extracted from the kernel function, and a simple way of including general kernels into the Tanimoto kernel is shown. Finally, the kernel is also expressed as a quotient of piecewise linear functions, and a smooth approximation is provided.
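For orientation, here is a minimal sketch of the two classical forms that the abstract starts from: the Jaccard index on sets and the widely used inner-product form on real vectors. It does not implement the paper's generalization to arbitrary real-valued functions, and the function names are hypothetical.

```python
def tanimoto_sets(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two sets of binary attributes."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (a convention assumed here).
    return len(a & b) / len(union) if union else 1.0


def tanimoto_vectors(x, y):
    """Inner-product form <x, y> / (<x, x> + <y, y> - <x, y>) for real vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    denom = sum(p * p for p in x) + sum(q * q for q in y) - dot
    # The denominator vanishes only when both vectors are zero.
    return dot / denom if denom else 1.0


# On 0/1 indicator vectors the two forms agree.
set_score = tanimoto_sets({1, 2, 3}, {2, 3, 4})
vec_score = tanimoto_vectors([1, 1, 1, 0], [0, 1, 1, 1])
```

The agreement on indicator vectors is what makes the inner-product form a natural starting point for extensions beyond binary attributes.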
Article 1309
Title@2025-06-19 (4): Floating-Point Neural Networks Are Provably Robust Universal Approximators
Title: Floating-Point Neural Networks Are Provably Robust Universal Approximators | Floating-Point-Neural-Netzwerke sind wahrscheinlich robuste Universal-Annäherung | 浮动点神经网络具有可可预见强健的通用通用近似器 2506.16065v1 |
Authors (5): Geonho Hwang, Wonyeol Lee, Yeachan Park, Sejun Park, Feras Saad
The classical universal approximation (UA) theorem for neural networks establishes mild conditions under which a feedforward neural network can approximate a continuous function $f$ with arbitrary accuracy. A recent result shows that neural networks also enjoy a more general interval universal approximation (IUA) theorem, in the sense that the abstract interpretation semantics of the network using the interval domain can approximate the direct image map of $f$ (i.e., the result of applying $f$ to a set of inputs) with arbitrary accuracy. These theorems, however, rest on the unrealistic assumption that the neural network computes over infinitely precise real numbers, whereas their software implementations in practice compute over finite-precision floating-point numbers. An open question is whether the IUA theorem still holds in the floating-point setting. This paper introduces the first IUA theorem for floating-point neural networks that proves their remarkable ability to perfectly capture the direct image map of any rounded target function $f$, showing no limits exist on their expressiveness. Our IUA theorem in the floating-point setting exhibits material differences from the real-valued setting, which reflects the fundamental distinctions between these two computational models. This theorem also implies surprising corollaries, which include (i) the existence of provably robust floating-point neural networks; and (ii) the computational completeness of the class of straight-line programs that use only floating-point additions and multiplications for the class of all floating-point programs that halt.
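The interval-domain semantics mentioned above can be sketched as standard interval propagation through an affine layer followed by ReLU. This is a minimal illustration under stated assumptions: it uses ordinary Python floats with default rounding, so it does not model the sound floating-point semantics the paper analyzes, and the names `affine_interval` and `relu_interval` and the example weights are hypothetical.

```python
def affine_interval(weights, bias, lower, upper):
    """Bound W x + b over the box lower <= x <= upper, one output row at a time."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps which endpoint gives the extreme value.
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_interval(lower, upper):
    """ReLU is monotone, so it can be applied to each endpoint separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Example layer and input box [0, 1] x [0, 1] (both chosen for illustration).
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -1.0]
lo, hi = relu_interval(*affine_interval(W, b, [0.0, 0.0], [1.0, 1.0]))
```

The resulting box over-approximates the direct image of the input box under the layer; the IUA question is how tightly such interval semantics can track the direct image map of a target function.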
Article 1310
Title@2025-06-19 (4): Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models
Title: Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models | Membership Inferenz Attack sollte weiter zu Verteilungsstatistiken für destillierte Generative Modelle | 成员攻击的推论应转向已蒸馏生成模型的分发统计数据 2502.02970v3 |
Authors (6): Muxing Li, Zesheng Ye, Yixuan Li, Andy Song, Guangquan Zhang, Feng Liu
To detect unauthorized data usage in training large-scale generative models (e.g., ChatGPT or Midjourney), membership inference attacks (MIA) have proven effective in distinguishing a single training instance (a member) from a single non-training instance (a non-member). This success is mainly credited to a memorization effect: models tend to perform better on a member than a non-member. However, we find that standard MIAs fail against distilled generative models (i.e., student models) that are increasingly deployed in practice for efficiency (e.g., ChatGPT 4o-mini). Trained exclusively on data generated from a large-scale model (a teacher model), the student model lacks direct exposure to any members (teacher’s training data), nullifying the memorization effect that standard MIAs rely on. This finding reveals a serious privacy loophole, where generation-service providers could deploy a student model whose teacher was potentially trained on unauthorized data, yet claim the deployed model is clean because it was not directly trained on such data. Hence, are distilled models inherently unauditable for upstream privacy violations, and should we discard them when we care about privacy? We contend no, as we uncover a memory chain connecting the student and teacher’s member data: the distribution of student-generated data aligns more closely with the distribution of the teacher’s members than with non-members, thus we can detect unauthorized data usage even when direct instance-level memorization is absent. This leads us to posit that MIAs on distilled generative models should shift from instance-level scores to distribution-level statistics. We further propose three principles of distribution-based MIAs for detecting unauthorized training data through distilled generative models, and validate our position through an exemplar framework. We lastly discuss the implications of our position.
Article 1311
Title@2025-06-19 (4): CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations
Title: CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations | CRIA: Ein Cross-View-Interaktions- und Instanz-adaptierter Vorausbildungsrahmen für allgemeine EEG-Vertretungen | CRIA: 通用的EEG代表制跨视角互动和根据实际情况制定的培训前框架 2506.16056v1 |
Authors (4): Puchun Liu, C. L. Philip Chen, Yubin He, Tong Zhang
The difficulty of extracting deep features from EEG data and effectively integrating information from multiple views presents significant challenges for developing a generalizable pretraining framework for EEG representation learning. However, most existing pre-training methods rely solely on the contextual semantics of a single view, failing to capture the complex and synergistic interactions among different perspectives, limiting the expressiveness and generalization of learned representations. To address these issues, this paper proposes CRIA, an adaptive framework that utilizes variable-length and variable-channel coding to achieve a unified representation of EEG data across different datasets. In this work, we define cross-view information as the integrated representation that emerges from the interaction among temporal, spectral, and spatial views of EEG signals. The model employs a cross-attention mechanism to fuse temporal, spectral, and spatial features effectively, and combines an attention matrix masking strategy based on the information bottleneck principle with a novel viewpoint masking pre-training scheme. Experimental results on the Temple University EEG corpus and the CHB-MIT dataset show that CRIA outperforms existing methods with the same pre-training conditions, achieving a balanced accuracy of 57.02% for multi-class event classification and 80.03% for anomaly detection, highlighting its strong generalization ability.
nan
Article 1312
Title@2025-06-19 (4): A Sparse Tensor Generator with Efficient Feature Extraction
Title: A Sparse Tensor Generator with Efficient Feature Extraction | Ein Sparse Tensor Generator mit effizienter Feature Extraction | 具有高效地物采掘功能的简式天窗生成器 2405.04944v3 |
Authors (3): Tugba Torun, Ameer Taweel, Didem Unat
Sparse tensor operations are increasingly important in diverse applications such as social networks, deep learning, diagnosis, crime, and review analysis. However, a major obstacle in sparse tensor research is the lack of large-scale sparse tensor datasets. Another challenge lies in analyzing sparse tensor features, which are essential not only for understanding the nonzero pattern but also for selecting the most suitable storage format, decomposition algorithm, and reordering methods. However, due to the large size of real-world tensors, even extracting these features can be computationally expensive without careful optimization. To address these limitations, we have developed a smart sparse tensor generator that replicates key characteristics of real sparse tensors. Additionally, we propose efficient methods for extracting a comprehensive set of sparse tensor features. The effectiveness of our generator is validated through the quality of extracted features and the performance of decomposition on the generated tensors. Both the sparse tensor feature extractor and the tensor generator are open source with all the artifacts available at https://github.com/sparcityeu/FeaTensor and https://github.com/sparcityeu/GenTensor, respectively.
nan
Article 1313
Title@2025-06-19 (4): LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records
Title: LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records | LabTOP: Ein einheitliches Modell für Labortestergebnisse Vorhersage auf elektronische Gesundheitsdatensätze | LabTOP:电子健康记录实验室试验结果预测统一模型 2502.14259v4 |
Authors (3): Sujeong Im, Jungwoo Oh, Edward Choi
Lab tests are fundamental for diagnosing diseases and monitoring patient conditions. However, frequent testing can be burdensome for patients, and test results may not always be immediately available. To address these challenges, we propose LabTOP, a unified model that predicts lab test outcomes by leveraging a language modeling approach on EHR data. Unlike conventional methods that estimate only a subset of lab tests or classify discrete value ranges, LabTOP performs continuous numerical predictions for a diverse range of lab items. We evaluate LabTOP on three publicly available EHR datasets and demonstrate that it outperforms existing methods, including traditional machine learning models and state-of-the-art large language models. We also conduct extensive ablation studies to confirm the effectiveness of our design choices. We believe that LabTOP will serve as an accurate and generalizable framework for lab test outcome prediction, with potential applications in clinical decision support and early detection of critical conditions.
nan
Article 1314
Title@2025-06-19 (4): From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
Title: From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots | Vom Experten zum Generalisten: Auf dem Weg zur allgemeinen Ganzkörperkontrolle für humanoide Roboter | 从专家到通才:对人体机器人实行全面整体控制 2506.12779v2 |
Authors (8): Yuxuan Wang, Ming Yang, Weishuai Zeng, Yu Zhang, Xinrun Xu, Haobin Jiang, Ziluo Ding, Zongqing Lu
Achieving general agile whole-body control on humanoid robots remains a major challenge due to diverse motion demands and data conflicts. While existing frameworks excel in training single motion-specific policies, they struggle to generalize across highly varied behaviors due to conflicting control requirements and mismatched data distributions. In this work, we propose BumbleBee (BB), an expert-generalist learning framework that combines motion clustering and sim-to-real adaptation to overcome these challenges. BB first leverages an autoencoder-based clustering method to group behaviorally similar motions using motion features and motion descriptions. Expert policies are then trained within each cluster and refined with real-world data through iterative delta action modeling to bridge the sim-to-real gap. Finally, these experts are distilled into a unified generalist controller that preserves agility and robustness across all motion types. Experiments on two simulations and a real humanoid robot demonstrate that BB achieves state-of-the-art general whole-body control, setting a new benchmark for agile, robust, and generalizable humanoid performance in the real world.
nan
Article 1315
Title@2025-06-19 (4): From Data to Decision: Data-Centric Infrastructure for Reproducible ML in Collaborative eScience
Title: From Data to Decision: Data-Centric Infrastructure for Reproducible ML in Collaborative eScience | Von Daten zur Entscheidung: Data-Centric Infrastruktur für reproduzierbare ML in Collaborative eScience | 从数据到决定:合作电子科学中可复制ML的数据中心基础设施 2506.16051v1 |
Authors (6): Zhiwei Li, Carl Kesselman, Tran Huy Nguyen, Benjamin Yixing Xu, Kyle Bolo, Kimberley Yu
Reproducibility remains a central challenge in machine learning (ML), especially in collaborative eScience projects where teams iterate over data, features, and models. Current ML workflows are often dynamic yet fragmented, relying on informal data sharing, ad hoc scripts, and loosely connected tools. This fragmentation impedes transparency, reproducibility, and the adaptability of experiments over time. This paper introduces a data-centric framework for lifecycle-aware reproducibility, centered around six structured artifacts: Dataset, Feature, Workflow, Execution, Asset, and Controlled Vocabulary. These artifacts formalize the relationships between data, code, and decisions, enabling ML experiments to be versioned, interpretable, and traceable over time. The approach is demonstrated through a clinical ML use case of glaucoma detection, illustrating how the system supports iterative exploration, improves reproducibility, and preserves the provenance of collaborative decisions across the ML lifecycle.
nan
Article 1316
Title@2025-06-19 (4): Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping
Title: Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping | Leiter-Residual: Parallelismus-bewusste Architektur zur Beschleunigung großer Modellinferenz mit Kommunikationsüberlappung | 云梯-残余:加速大型模型推断与通信重叠的平行意识结构 2501.06589v5 |
Authors (10): Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao
Large language model inference is both memory-intensive and time-consuming, often requiring distributed algorithms to scale efficiently. Various model parallelism strategies are used in multi-GPU training and inference to partition computation across multiple devices, reducing memory load and computation time. However, model parallelism requires communication between GPUs, which has been a major bottleneck and limits the gains obtained by scaling up the number of devices. We introduce Ladder Residual, a simple architectural modification applicable to all residual-based models that enables straightforward overlapping that effectively hides the latency of communication. Our insight is that in addition to systems optimization, one can also redesign the model architecture to decouple communication from computation. While Ladder Residual can allow communication-computation decoupling in conventional parallelism patterns, we focus on Tensor Parallelism in this paper, which is particularly bottlenecked by its heavy communication. For a Transformer model with 70B parameters, applying Ladder Residual to all its layers can achieve a 29% end-to-end wall-clock speedup at inference time with TP sharding over 8 devices. We refer to the resulting Transformer model as the Ladder Transformer. We train 1B and 3B Ladder Transformers from scratch and observe comparable performance to a standard dense transformer baseline. We also show that it is possible to convert parts of the Llama-3.1 8B model to our Ladder Residual architecture with minimal accuracy degradation by only retraining for 3B tokens. We release our code for training and inference for easier replication of experiments.
nan
Article 1317
Title@2025-06-19 (4): FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system
Title: FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system | FALCON: Feedback-gesteuert Adaptiv Lang-/Kurzzeitspeicher verstärkt Coding Optimization System | FALCON: 反馈驱动的适应性长/短期内存强化编码优化系统 2410.21349v5 |
Authors (8): Zeyuan Li, Yangfan He, Lewei He, Jianhui Wang, Tianyu Shi, Bin Lei, Yuchen Li, Qiuwu Chen
Recently, large language models (LLMs) have achieved significant progress in automated code generation. Despite their strong instruction-following capabilities, these models frequently struggle to align with user intent in coding scenarios. In particular, they are hampered by datasets that lack diversity and fail to address specialized tasks or edge cases. Furthermore, challenges in supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) lead to failures in generating precise, human-intent-aligned code. To tackle these challenges and improve the code generation performance for automated programming systems, we propose Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization (i.e., FALCON). FALCON is structured into two hierarchical levels. At the global level, long-term memory improves code quality by retaining and applying learned knowledge. At the local level, short-term memory allows for the incorporation of immediate feedback from compilers and AI systems. Additionally, we introduce meta-reinforcement learning with feedback rewards to solve the global-local bi-level optimization problem and enhance the model’s adaptability across diverse code generation tasks. Extensive experiments demonstrate that our technique achieves state-of-the-art performance, leading other reinforcement learning methods by more than 4.5 percentage points on the MBPP benchmark and 6.1 percentage points on the HumanEval benchmark. The open-sourced code is publicly available at https://github.com/titurte/FALCON.
nan
Article 1318
Title@2025-06-19 (4): AutomataGPT: Forecasting and Ruleset Inference for Two-Dimensional Cellular Automata
Title: AutomataGPT: Forecasting and Ruleset Inference for Two-Dimensional Cellular Automata | AutomataGPT: Prognose und Regelschluss für zweidimensionale zelluläre Automata | AutomataGPT: 两维细胞自动数据预测和规则推理 2506.17333v1 |
Authors (3): Jaime A. Berkovich, Noah S. David, Markus J. Buehler
Cellular automata (CA) provide a minimal formalism for investigating how simple local interactions generate rich spatiotemporal behavior in domains as diverse as traffic flow, ecology, tissue morphogenesis and crystal growth. However, automatically discovering the local update rules for a given phenomenon and using them for quantitative prediction remains challenging. Here we present AutomataGPT, a decoder-only transformer pretrained on around 1 million simulated trajectories that span 100 distinct two-dimensional binary deterministic CA rules on toroidal grids. When evaluated on previously unseen rules drawn from the same CA family, AutomataGPT attains 98.5% perfect one-step forecasts and reconstructs the governing update rule with up to 96% functional (application) accuracy and 82% exact rule-matrix match. These results demonstrate that large-scale pretraining over wider regions of rule space yields substantial generalization in both the forward (state forecasting) and inverse (rule inference) problems, without hand-crafted priors. By showing that transformer models can faithfully infer and execute CA dynamics from data alone, our work lays the groundwork for abstracting real-world dynamical phenomena into data-efficient CA surrogates, opening avenues in biology, tissue engineering, physics and AI-driven scientific discovery.
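To make the forecasting target concrete, the following is a minimal sketch of one synchronous update of a binary, deterministic, two-dimensional CA on a toroidal grid. It assumes a life-like (outer-totalistic) rule over the 8-cell Moore neighbourhood; the abstract does not state that this is the rule family used, and the function name `step` and the `birth`/`survive` parameters are illustrative only.

```python
def step(grid, birth=frozenset({3}), survive=frozenset({2, 3})):
    """One synchronous update of a binary life-like CA on a torus.

    `birth` and `survive` are hypothetical parameters: the sets of live
    neighbour counts at which a dead cell becomes alive or a live cell
    stays alive. The defaults correspond to Conway's Game of Life.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the 8 Moore neighbours; the modulo wraps around the
            # edges, which is what makes the grid toroidal.
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if grid[r][c] == 1:
                new[r][c] = 1 if n in survive else 0
            else:
                new[r][c] = 1 if n in birth else 0
    return new
```

In these terms, the forward problem is predicting `step(grid)` from `grid`, and the inverse problem is recovering the rule (here, `birth` and `survive`) from observed pairs of consecutive states.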
nan
Article 1319
Title@2025-06-19 (4): DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling
Title: DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling | DynScaling: Effizientes Verifier-freies Inferenzscaling über dynamische und integrierte Sampling | DynSACLAG:通过动态和综合抽样,提高验证人无引文的有效比例 2506.16043v1 |
Authors (5): Fei Wang, Xingchen Wan, Ruoxi Sun, Jiefeng Chen, Sercan Ö. Arık
Inference-time scaling has proven effective in boosting large language model (LLM) performance through increased test-time computation. Yet, its practical application is often hindered by reliance on external verifiers or a lack of optimization for realistic computational constraints. We propose DynScaling, which addresses these limitations through two primary innovations: an integrated parallel-sequential sampling strategy and a bandit-based dynamic budget allocation framework. The integrated sampling strategy unifies parallel and sequential sampling by constructing synthetic sequential reasoning chains from initially independent parallel responses, promoting diverse and coherent reasoning trajectories. The dynamic budget allocation framework formulates the allocation of computational resources as a multi-armed bandit problem, adaptively distributing the inference budget across queries based on the uncertainty of previously sampled responses, thereby maximizing computational efficiency. By combining these components, DynScaling effectively improves LLM performance under practical resource constraints without the need for external verifiers. Experimental results demonstrate that DynScaling consistently surpasses existing verifier-free inference scaling baselines in both task performance and computational cost.
nan
Article 1320
Title@2025-06-19 (4): OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
Title: OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents | OSWorld-Human: Benchmarking der Effizienz von Computer-Use Agents | OS 世界人类:计算机使用代理的效率基准 2506.16042v1 |
Authors (3): Reyna Abhyankar, Qi Qi, Yiying Zhang
Generative AI is being leveraged to solve a variety of computer-use tasks involving desktop applications. State-of-the-art systems have focused solely on improving accuracy on leading benchmarks. However, these systems are practically unusable due to extremely high end-to-end latency (e.g., tens of minutes) for tasks that typically take humans just a few minutes to complete. To understand the cause behind this and to guide future developments of computer agents, we conduct the first study on the temporal performance of computer-use agents on OSWorld, the flagship benchmark in computer-use AI. We find that large model calls for planning and reflection account for the majority of the overall latency, and as an agent uses more steps to complete a task, each successive step can take 3x longer than steps at the beginning of a task. We then construct OSWorld-Human, a manually annotated version of the original OSWorld dataset that contains a human-determined trajectory for each task. We evaluate 16 agents on their efficiency using OSWorld-Human and find that even the highest-scoring agents on OSWorld take 1.4-2.7x more steps than necessary.
nan
Article 1321
Title@2025-06-19 (4): Enhancing Document-Level Question Answering via Multi-Hop Retrieval-Augmented Generation with LLaMA 3
Title: Enhancing Document-Level Question Answering via Multi-Hop Retrieval-Augmented Generation with LLaMA 3 | Verbesserung der Dokumenten-Fragebeantwortung mittels Multi-Hop Retrieval-Augmented Generation mit LLaMA 3 | 通过多层检索-提法一代加强文件层面的回答问题,LLAMA 3 2506.16037v1 |
Authors (6): Xinyue Huang, Ziqi Lin, Fang Sun, Wenchao Zhang, Kejian Tong, Yunbo Liu
This paper presents a novel Retrieval-Augmented Generation (RAG) framework tailored for complex question answering tasks, addressing challenges in multi-hop reasoning and contextual understanding across lengthy documents. Built upon LLaMA 3, the framework integrates a dense retrieval module with advanced context fusion and multi-hop reasoning mechanisms, enabling more accurate and coherent response generation. A joint optimization strategy combining retrieval likelihood and generation cross-entropy improves the model’s robustness and adaptability. Experimental results show that the proposed system outperforms existing retrieval-augmented and generative baselines, confirming its effectiveness in delivering precise, contextually grounded answers.
nan
Article 1322
Title@2025-06-19 (4): Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Title: Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding | Vision-geführtes Chunking ist alles, was Sie brauchen: Verbesserung der RAG durch multimodales Dokumentenverständnis | 愿景引导的决赛是您所需要的:用多模式文件理解加强RAG 2506.16035v1 |
Authors (5): Vishesh Tripathi, Tanmay Odapally, Indraneel Das, Uday Allu, Biddwan Ahmed
Retrieval-Augmented Generation (RAG) systems have revolutionized information retrieval and question answering, but traditional text-based chunking methods struggle with complex document structures, multi-page tables, embedded figures, and contextual dependencies across page boundaries. We present a novel multimodal document chunking approach that leverages Large Multimodal Models (LMMs) to process PDF documents in batches while maintaining semantic coherence and structural integrity. Our method processes documents in configurable page batches with cross-batch context preservation, enabling accurate handling of tables spanning multiple pages, embedded visual elements, and procedural content. We evaluate our approach on a curated dataset of PDF documents with manually crafted queries, demonstrating improvements in chunk quality and downstream RAG performance. Our vision-guided approach achieves better accuracy compared to traditional vanilla RAG systems, with qualitative analysis showing superior preservation of document structure and semantic coherence.
nan
Article 1323
Title@2025-06-19 (4): DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models
Title: DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models | DeepGDel: Deep Learning-basierte Gene Deletion Prediction Framework für wachstumsverbundene Produktion in Genom-Scale Metabolic-Modellen | 深层GDel:在基因组-规模元元模型中实现增长和混合生产以深学习为基础的基因删除预测框架 2504.06316v4 |
Authors (2): Ziwei Yang, Takeyuki Tamura
In genome-scale constraint-based metabolic models, gene deletion strategies are crucial for achieving growth-coupled production, where cell growth and target metabolite production are simultaneously achieved. While computational methods for calculating gene deletions have been widely explored and contribute to developing gene deletion strategy databases, current approaches are limited in leveraging new data-driven paradigms, such as machine learning, for more efficient strain design. Therefore, it is necessary to propose a fundamental framework for this objective. In this study, we first formulate the problem of gene deletion strategy prediction and then propose a framework for predicting gene deletion strategies for growth-coupled production in genome-scale metabolic models. The proposed framework leverages deep learning algorithms to learn and integrate sequential gene and metabolite data representations, enabling automatic gene deletion strategy prediction. Computational experiment results demonstrate the feasibility of the proposed framework, showing substantial improvements over baseline methods. Specifically, the proposed framework achieves increases of 14.69%, 22.52%, and 13.03% in overall accuracy across the three metabolic models of different scales under study, while maintaining balanced precision and recall in predicting gene deletion statuses. The source code and examples for the framework are publicly available at https://github.com/MetNetComp/DeepGDel.
nan
Article 1324
Title@2025-06-19 (4): A Scalable Factorization Approach for High-Order Structured Tensor Recovery
Title: A Scalable Factorization Approach for High-Order Structured Tensor Recovery | Ein skalierbarer Factorisierungsansatz für die hochordentlich strukturierte Tensor-Wiederherstellung | 高分结构结构梯度恢复的可缩放因数化办法 2506.16032v1 |
Authors (3): Zhen Qin, Michael B. Wakin, Zhihui Zhu
Tensor decompositions, which represent an order-$N$ tensor using approximately $N$ factors of much smaller dimensions, can significantly reduce the number of parameters. This is particularly beneficial for high-order tensors, as the number of entries in a tensor grows exponentially with the order. Consequently, they are widely used in signal recovery and data analysis across domains such as signal processing, machine learning, and quantum physics. A computationally and memory-efficient approach to these problems is to optimize directly over the factors using local search algorithms such as gradient descent, a strategy known as the factorization approach in matrix and tensor optimization. However, the resulting optimization problems are highly nonconvex due to the multiplicative interactions between factors, posing significant challenges for convergence analysis and recovery guarantees. In this paper, we present a unified framework for the factorization approach to solving various tensor decomposition problems. Specifically, by leveraging the canonical form of tensor decompositions, in which most factors are constrained to be orthonormal to mitigate scaling ambiguity, we apply Riemannian gradient descent (RGD) to optimize these orthonormal factors on the Stiefel manifold. Under a mild condition on the loss function, we establish a Riemannian regularity condition for the factorized objective and prove that RGD converges to the ground-truth tensor at a linear rate when properly initialized. Notably, both the initialization requirement and the convergence rate scale polynomially rather than exponentially with $N$, improving upon existing results for Tucker and tensor-train format tensors.
nan
Article 1325
Title@2025-06-19 (4): V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models
Title: V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models | V2X-VLM: End-to-End V2X kooperatives autonomes Fahren durch große Vision-Sprache Modelle | V2X-VLM:通过大型视觉语言模型自主驾驶的终端到终端V2X合作社 2408.09251v3 |
Authors (9): Junwei You, Haotian Shi, Zhuoyu Jiang, Zilin Huang, Rui Gan, Keshu Wu, Xi Cheng, Xiaopeng Li, Bin Ran
Vehicle-to-everything (V2X) cooperation has emerged as a promising paradigm to overcome the perception limitations of classical autonomous driving by leveraging information from both ego-vehicle and infrastructure sensors. However, effectively fusing heterogeneous visual and semantic information while ensuring robust trajectory planning remains a significant challenge. This paper introduces V2X-VLM, a novel end-to-end (E2E) cooperative autonomous driving framework based on vision-language models (VLMs). V2X-VLM integrates multiperspective camera views from vehicles and infrastructure with text-based scene descriptions to enable a more comprehensive understanding of driving environments. Specifically, we propose a contrastive learning-based mechanism to reinforce the alignment of heterogeneous visual and textual characteristics, which enhances the semantic understanding of complex driving scenarios, and employ a knowledge distillation strategy to stabilize training. Experiments on a large real-world dataset demonstrate that V2X-VLM achieves state-of-the-art trajectory planning accuracy, significantly reducing L2 error and collision rate compared to existing cooperative autonomous driving baselines. Ablation studies validate the contributions of each component. Moreover, the evaluation of robustness and efficiency highlights the practicality of V2X-VLM for real-world deployment to enhance overall autonomous driving safety and decision-making.
nan
Article 1326
Title@2025-06-19 (4): Multi-agent Multi-armed Bandits with Minimum Reward Guarantee Fairness
Title: Multi-agent Multi-armed Bandits with Minimum Reward Guarantee Fairness | Multi-Agent Multi-Armed Bandits mit Mindestprämie Garantie Fairness | 具有最低奖励保证公平性的多武装多武装多武装强盗 2502.15240v2 |
Authors (4): Piyushi Manupriya, Himanshu, SakethaNath Jagarlapudi, Ganesh Ghalme
We investigate the problem of maximizing social welfare while ensuring fairness in a multi-agent multi-armed bandit (MA-MAB) setting. In this problem, a centralized decision-maker takes actions over time, generating random rewards for various agents. Our goal is to maximize the sum of expected cumulative rewards, a.k.a. social welfare, while ensuring that each agent receives an expected reward that is at least a constant fraction of the maximum possible expected reward. Our proposed algorithm, RewardFairUCB, leverages the Upper Confidence Bound (UCB) technique to achieve sublinear regret bounds for both fairness and social welfare. The fairness regret measures the positive difference between the minimum reward guarantee and the expected reward of a given policy, whereas the social welfare regret measures the difference between the social welfare of the optimal fair policy and that of the given policy. We show that the RewardFairUCB algorithm achieves instance-independent social welfare regret guarantees of $\tilde{O}(T^{1/2})$ and a fairness regret upper bound of $\tilde{O}(T^{3/4})$. We also give a lower bound of $\Omega(\sqrt{T})$ for both social welfare and fairness regret. We evaluate RewardFairUCB’s performance against various baseline and heuristic algorithms using simulated and real-world data, highlighting trade-offs between fairness and social welfare regrets.
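For readers unfamiliar with the Upper Confidence Bound technique mentioned here, the sketch below shows the classical single-agent UCB1 index rule. It is not RewardFairUCB: it has one agent and no fairness constraint, and the names `ucb1` and `pull` are illustrative assumptions rather than anything from the paper.

```python
import math


def ucb1(pull, n_arms, horizon):
    """Classical single-agent UCB1 (not the paper's RewardFairUCB).

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    Returns the number of times each arm was played.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: play every arm once.
            arm = t - 1
        else:
            # Play the arm with the largest optimistic estimate:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is sampled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

The fairness-aware setting of the abstract additionally requires every agent's expected reward to stay above a fixed fraction of its best achievable value, which this plain index rule does not enforce.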
nan
Article 1327
Title@2025-06-19 (4): Conformal prediction for frequency-severity modeling
Title: Conformal prediction for frequency-severity modeling | Konforme Vorhersage für Frequenz-Schwere-Modellierung | 频率比重建模非正式预测 2307.13124v4 |
Authors (4): Helton Graziadei, Paulo C. Marques F., Eduardo F. L. de Melo, Rodrigo S. Targino
We present a model-agnostic framework for the construction of prediction intervals of insurance claims, with finite sample statistical guarantees, extending the technique of split conformal prediction to the domain of two-stage frequency-severity modeling. The framework’s effectiveness is showcased with simulated and real datasets using classical parametric models and contemporary machine learning methods. When the underlying severity model is a random forest, we extend the two-stage split conformal prediction algorithm, showing how the out-of-bag mechanism can be leveraged to eliminate the need for a calibration set in the conformal procedure.
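As background, the basic split conformal procedure that the paper extends can be sketched as follows. This is the standard single-stage version with absolute-residual scores, not the two-stage frequency-severity or out-of-bag variants described in the abstract; `split_conformal_interval` and its arguments are illustrative names, and `predict` stands for any model already fitted on a separate training split.

```python
import math


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Standard split conformal interval with absolute-residual scores.

    `predict` is assumed to be fitted on data disjoint from the
    calibration pairs (calib_x, calib_y). Under exchangeability the
    returned interval covers the new response with probability at
    least 1 - alpha.
    """
    # Nonconformity scores on the held-out calibration set.
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Rank of the finite-sample-corrected quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage level.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    center = predict(x_new)
    return (center - q, center + q)
```

The `(n + 1)` correction in the quantile rank is what yields the finite-sample guarantee; using the plain empirical quantile would give only asymptotic coverage.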
nan
Article 1328
Title@2025-06-19 (4): EvoLM: In Search of Lost Language Model Training Dynamics
Title: EvoLM: In Search of Lost Language Model Training Dynamics | EvoLM: Auf der Suche nach verlorenen Sprachmodellen | EvoLM: 寻找失传语言培训模式 2506.16029v1 |
Authors (9): Zhenting Qi, Fan Nie, Alexandre Alahi, James Zou, Himabindu Lakkaraju, Yilun Du, Eric Xing, Sham Kakade, Hanlin Zhang
Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, a model suite that enables systematic and transparent analysis of LMs’ training dynamics across pre-training, continued pre-training, supervised fine-tuning, and reinforcement learning. By training over 100 LMs with 1B and 4B parameters from scratch, we rigorously evaluate both upstream (language modeling) and downstream (problem-solving) reasoning capabilities, including considerations of both in-domain and out-of-domain generalization. Key insights highlight the diminishing returns from excessive pre-training and post-training, the importance and practices of mitigating forgetting during domain-specific continued pre-training, the crucial role of continued pre-training in bridging pre-training and post-training phases, and various intricate trade-offs when configuring supervised fine-tuning and reinforcement learning. To facilitate open research and reproducibility, we release all pre-trained and post-trained models, training datasets for all stages, and our entire training and evaluation pipeline.
nan
Article 1329
Title@2025-06-19 (4): Min-p, Max Exaggeration: A Critical Analysis of Min-p Sampling in Language Models
Title: Min-p, Max Exaggeration: A Critical Analysis of Min-p Sampling in Language Models | Min-p, Max Übertreibung: Eine kritische Analyse der Min-p-Sampling in Sprachmodellen | Min-p, Max Explation: 对语言模型的 Min-p 抽样的批判性分析 2506.13681v2 |
Authors (3): Rylan Schaeffer, Joshua Kazdan, Yegor Denisov-Blanch
Sampling from language models impacts the quality and diversity of outputs, affecting both research and real-world applications. Recently, Nguyen et al. 2024’s “Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs” introduced a new sampler called min-p, claiming it achieves superior quality and diversity over established samplers such as basic, top-k, and top-p sampling. The significance of these claims was underscored by the paper’s recognition as the 18th highest-scoring submission to ICLR 2025 and selection for an Oral presentation. This paper conducts a comprehensive re-examination of the evidence supporting min-p and reaches different conclusions from the original paper’s four lines of evidence. First, the original paper’s human evaluations omitted data, conducted statistical tests incorrectly, and described qualitative feedback inaccurately; our reanalysis demonstrates min-p did not outperform baselines in quality, diversity, or a trade-off between quality and diversity; in response to our findings, the authors of the original paper conducted a new human evaluation using a different implementation, task, and rubric that nevertheless provides further evidence min-p does not improve over baselines. Second, comprehensively sweeping the original paper’s NLP benchmarks reveals min-p does not surpass baselines when controlling for the number of hyperparameters. Third, the original paper’s LLM-as-a-Judge evaluations lack methodological clarity and appear inconsistently reported. Fourth, community adoption claims (49k GitHub repositories, 1.1M GitHub stars) were found to be unsubstantiated, leading to their removal; the revised adoption claim remains misleading. We conclude that evidence presented in the original paper fails to support claims that min-p improves quality, diversity, or a trade-off between quality and diversity.
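For context, the min-p rule under discussion is commonly described as keeping only tokens whose probability is at least a base fraction of the most likely token's probability, then renormalizing. The sketch below follows that description; the function name `min_p_filter` and the parameter name `p_base` are illustrative, and temperature scaling and the final sampling step are omitted.

```python
import math


def min_p_filter(logits, p_base=0.1):
    """Return the renormalized distribution after min-p truncation.

    A token is kept if its probability is at least `p_base` times the
    probability of the most likely token; all others are zeroed out.
    """
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # The cutoff scales with the model's confidence in its top token.
    threshold = p_base * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]
```

Because the cutoff depends on the top probability, the rule truncates aggressively when the model is confident and permissively when the distribution is flat. `p_base` is its single hyperparameter, which is relevant to the paper's comparison controlling for the number of hyperparameters.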
Article 1330
Title@2025-06-19 (4): Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis
Title: Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis | Effiziente Videoannotation im Einzelhandel: Robuster Ansatz zur Erstellung von Schlüsselrahmen für die Analyse von Produkt- und Kundeninteraktion | 高效零售视频注释:产品和客户互动分析的强有力关键框架生成方法 2506.14854v2 |
Authors (2): Varun Mannam, Zhenyu Shi
Accurate video annotation plays a vital role in modern retail applications, including customer behavior analysis, product interaction detection, and in-store activity recognition. However, conventional annotation methods rely heavily on time-consuming manual labeling by human annotators, introducing non-robust frame selection and increasing operational costs. To address these challenges in the retail domain, we propose a deep learning-based approach that automates key-frame identification in retail videos and provides automatic annotations of products and customers. Our method leverages deep neural networks to learn discriminative features by embedding video frames and incorporating object detection-based techniques tailored for retail environments. Experimental results showcase the superiority of our approach over traditional methods, achieving accuracy comparable to human annotator labeling while enhancing the overall efficiency of retail video annotation. On average, our approach yields a 2x cost saving in video annotation. By having human annotators verify or adjust less than 5% of detected frames in the video dataset, while automating the annotation process for the remaining frames without reducing annotation quality, retailers can significantly reduce operational costs. The automation of key-frame detection enables substantial time and effort savings in retail video labeling tasks, proving highly valuable for diverse retail applications such as shopper journey analysis, product interaction detection, and in-store security monitoring.
Article 1331
Title@2025-06-19 (4): Bridging Brain with Foundation Models through Self-Supervised Learning
Title: Bridging Brain with Foundation Models through Self-Supervised Learning | Gehirn mit Grundmodellen durch Selbstüberwachtes Lernen überbrücken | 通过自监督学习连接大脑与基础模型 2506.16009v1 |
Authors (5): Hamdi Altaheri, Fakhri Karray, Md. Milon Islam, S M Taslim Uddin Raju, Amir-Hossein Karimi
Foundation models (FMs), powered by self-supervised learning (SSL), have redefined the capabilities of artificial intelligence, demonstrating exceptional performance in domains like natural language processing and computer vision. These advances present a transformative opportunity for brain signal analysis. Unlike traditional supervised learning, which is limited by the scarcity of labeled neural data, SSL offers a promising solution by enabling models to learn meaningful representations from unlabeled data. This is particularly valuable in addressing the unique challenges of brain signals, including high noise levels, inter-subject variability, and low signal-to-noise ratios. This survey systematically reviews the emerging field of bridging brain signals with foundation models through the innovative application of SSL. It explores key SSL techniques, the development of brain-specific foundation models, their adaptation to downstream tasks, and the integration of brain signals with other modalities in multimodal SSL frameworks. The review also covers commonly used evaluation metrics and benchmark datasets that support comparative analysis. Finally, it highlights key challenges and outlines future research directions. This work aims to provide researchers with a structured understanding of this rapidly evolving field and a roadmap for developing generalizable brain foundation models powered by self-supervision.
Article 1332
Title@2025-06-19 (4): Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning
Title: Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning | Jeder Rang könnte ein Experte sein: Ein-Ranked-Mixtur von Experten LoRA für Multi-Task-Learning | 每一级别都可以是一名专家:多任务学习专家LORA的单条混合体 2501.15103v2 |
Authors (13): Ziyu Zhao, Yixiao Zhou, Zhi Zhang, Didi Zhu, Tao Shen, Zexi Li, Jinluan Yang, Xuwu Wang, Jing Su, Kun Kuang, Zhongyu Wei, Fei Wu, Yu Cheng
Low-Rank Adaptation (LoRA) is widely used for adapting large language models (LLMs) to specific domains due to its efficiency and modularity. Meanwhile, vanilla LoRA struggles with task conflicts in multi-task scenarios. Recent works adopt Mixture of Experts (MoE) by treating each LoRA module as an expert, thereby mitigating task interference through multiple specialized LoRA modules. While effective, these methods often isolate knowledge within individual tasks, failing to fully exploit the shared knowledge across related tasks. In this paper, we establish a connection between single LoRA and multi-LoRA MoE, integrating them into a unified framework. We demonstrate that the dynamic routing of multiple LoRAs is functionally equivalent to rank partitioning and block-level activation within a single LoRA. We further empirically demonstrate that finer-grained LoRA partitioning, within the same total and activated parameter constraints, leads to better performance gains across heterogeneous tasks. Building on these findings, we propose Single-ranked Mixture of Experts LoRA (SMoRA), which embeds MoE into LoRA by treating each rank as an independent expert. With a dynamic rank-wise activation mechanism, SMoRA promotes finer-grained knowledge sharing while mitigating task conflicts. Experiments demonstrate that SMoRA activates fewer parameters yet achieves better performance in multi-task scenarios.
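The idea of treating each rank as an expert can be sketched as a LoRA update with one gate per rank. This is a minimal illustration under stated assumptions: the linear router, the softmax over the top-k ranks, and all names (`topk_gate`, `rankwise_lora_delta`, the toy matrices) are hypothetical and are not confirmed by the abstract as SMoRA's actual routing.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def topk_gate(scores, k):
    """Softmax over the k largest scores; every other rank gets gate 0."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    z = max(scores[i] for i in top)
    weights = {i: math.exp(scores[i] - z) for i in top}
    total = sum(weights.values())
    return [weights[i] / total if i in weights else 0.0 for i in range(len(scores))]


def rankwise_lora_delta(x, A, B, router, k):
    """Compute B @ (gate * (A @ x)), where gate has one entry per rank."""
    gate = topk_gate(matvec(router, x), k)  # assumed router: one score per rank
    down = matvec(A, x)                     # r rank activations
    gated = [g * a for g, a in zip(gate, down)]
    return matvec(B, gated), gate


# Toy shapes (hypothetical): d_in = 3, rank r = 4, d_out = 2, k = 2 active ranks.
x = [1.0, 0.0, -1.0]
A = [[0.1, 0.2, 0.3], [0.0, 0.5, 0.0], [1.0, 0.0, 1.0], [0.2, 0.2, 0.2]]
B = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
router = [[0.3, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.4], [0.9, 0.0, 0.0]]
delta, gate = rankwise_lora_delta(x, A, B, router, k=2)
```

With a gate of all ones this reduces to the ordinary LoRA update `B @ A @ x`. Grouping ranks into fixed blocks that share one gate value would recover the multi-LoRA MoE view described in the abstract.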
Article 1333
Title@2025-06-19 (4): Out-of-Distribution Detection: A Task-Oriented Survey of Recent Advances
Title: Out-of-Distribution Detection: A Task-Oriented Survey of Recent Advances | Out-of-Distribution Detection: Eine aufgabenorientierte Umfrage über die jüngsten Fortschritte | 分销外探测:最近进展的专案调查 2409.11884v3 |
Authors (6): Shuo Lu, Yingsheng Wang, Lijun Sheng, Lingxiao He, Aihua Zheng, Jian Liang
Out-of-distribution (OOD) detection aims to detect test samples outside the training category space, which is an essential component in building reliable machine learning systems. Existing reviews on OOD detection primarily focus on method taxonomy, surveying the field by categorizing various approaches. However, many recent works concentrate on non-traditional OOD detection scenarios, such as test-time adaptation, multi-modal data sources and other novel contexts. In this survey, we review recent advances in OOD detection from a task-oriented perspective for the first time. According to the user’s access to the model, that is, whether the OOD detection method is allowed to modify or retrain the model, we classify the methods as training-driven or training-agnostic. In addition, considering the rapid development of pre-trained models, large pre-trained model-based OOD detection is also regarded as an important category and discussed separately. Furthermore, we provide a discussion of the evaluation scenarios, a variety of applications, and several future research directions. We believe this survey and its new taxonomy will benefit the proposal of new methods and the expansion to more practical scenarios. A curated list of related papers is provided in the GitHub repository: https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection.
Article 1334
Title@2025-06-19 (4): Data-Agnostic Cardinality Learning from Imperfect Workloads
Title: Data-Agnostic Cardinality Learning from Imperfect Workloads | Daten-agnostische Kardinalität Lernen aus unvollkommenen Arbeitsbelastungen | 从不完美工作量中学习 2506.16007v1 |
Authors (6): Peizhi Wu, Rong Kang, Tieying Zhang, Jianjun Chen, Ryan Marcus, Zachary G. Ives
Cardinality estimation (CardEst) is a critical aspect of query optimization. Traditionally, it leverages statistics built directly over the data. However, organizational policies (e.g., regulatory compliance) may restrict global data access. Fortunately, query-driven cardinality estimation can learn CardEst models using query workloads. However, existing query-driven models often require access to data or summaries for best performance, and they assume perfect training workloads with complete and balanced join templates (or join graphs). Such assumptions rarely hold in real-world scenarios, in which join templates are incomplete and imbalanced. We present GRASP, a data-agnostic cardinality learning system designed to work under these real-world constraints. GRASP’s compositional design generalizes to unseen join templates and is robust to join template imbalance. It also introduces a new per-table CardEst model that handles value distribution shifts for range predicates, and a novel learned count sketch model that captures join correlations across base relations. Across three database instances, we demonstrate that GRASP consistently outperforms existing query-driven models on imperfect workloads, both in terms of estimation accuracy and query latency. Remarkably, GRASP achieves performance comparable to, or even surpassing, traditional approaches built over the underlying data on the complex CEB-IMDb-full benchmark – despite operating without any data access and using only 10% of all possible join templates.
Article 1335
Title@2025-06-19 (4): On Domain-Adaptive Post-Training for Multimodal Large Language Models
Title: On Domain-Adaptive Post-Training for Multimodal Large Language Models | Zum Domain-Adaptive Post-Training für multimodale große Sprachmodelle | 关于多模式大语言模式的多模式后培训 2411.19930v3 |
Authors (8): Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Wayne Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang
Adapting general multimodal large language models (MLLMs) to specific domains, such as scientific and industrial fields, is highly significant in promoting their practical applications. This paper systematically investigates domain adaptation of MLLMs via post-training, focusing on data synthesis, training pipeline, and task evaluation. (1) Data Synthesis: Using only open-source models, we develop a generate-then-filter pipeline that curates diverse visual instruction tasks based on domain-specific image-caption pairs. The resulting data surpass the data synthesized by manual rules or strong closed-source models in enhancing domain-specific performance. (2) Training Pipeline: Unlike general MLLMs that typically adopt a two-stage training paradigm, we find that a single-stage approach is more effective for domain adaptation. (3) Task Evaluation: We conduct extensive experiments in high-impact domains such as biomedicine, food, and remote sensing, by post-training a variety of MLLMs and then evaluating MLLM performance on various domain-specific tasks. Finally, we fully open-source our models, code, and data to encourage future research in this area.
Article 1336
Title@2025-06-19 (4): AutoHFormer: Efficient Hierarchical Autoregressive Transformer for Time Series Prediction
Title: AutoHFormer: Efficient Hierarchical Autoregressive Transformer for Time Series Prediction | AutoHFormer: Effizienter Hierarchischer Autoregressiver Transformer für die Vorhersage der Zeitreihen | AutoH former: 用于时间序列预测的高效的等级自动递减变换器 2506.16001v1 |
Authors (7): Qianru Zhang, Honggang Wen, Ming Li, Dong Huang, Siu-Ming Yiu, Christian S. Jensen, Pietro Liò
Time series forecasting requires architectures that simultaneously achieve three competing objectives: (1) strict temporal causality for reliable predictions, (2) sub-quadratic complexity for practical scalability, and (3) multi-scale pattern recognition for accurate long-horizon forecasting. We introduce AutoHFormer, a hierarchical autoregressive transformer that addresses these challenges through three key innovations: 1) Hierarchical Temporal Modeling: Our architecture decomposes predictions into segment-level blocks processed in parallel, followed by intra-segment sequential refinement. This dual-scale approach maintains temporal coherence while enabling efficient computation. 2) Dynamic Windowed Attention: The attention mechanism employs learnable causal windows with exponential decay, reducing complexity while preserving precise temporal relationships. This design avoids both the anti-causal violations of standard transformers and the sequential bottlenecks of RNN hybrids. 3) Adaptive Temporal Encoding: A novel position encoding system captures time patterns at multiple scales. It combines fixed oscillating patterns for short-term variations with learnable decay rates for long-term trends. Comprehensive experiments demonstrate that AutoHFormer achieves 10.76X faster training and a 6.06X memory reduction compared to PatchTST on PEMS08, while maintaining consistent accuracy across 96-720 step horizons in most cases. These breakthroughs establish new benchmarks for efficient and precise time series modeling. Implementations of our method and of all baselines under the hierarchical autoregressive mechanism are available at https://github.com/lizzyhku/Autotime.
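The windowed attention in item 2) can be illustrated with a small sketch. It assumes a fixed window size and a fixed decay rate applied as a linear penalty on the logits. The abstract says these are learnable and does not give the exact functional form, so `windowed_decay_attention` and its parameters are hypothetical.

```python
import math


def windowed_decay_attention(scores, window, decay):
    """Row-wise softmax over a causal window with an exponential-decay bias.

    scores[i][j] is the raw query-key score. Position i may attend only to
    j in [i - window + 1, i], and each logit is penalized by decay * (i - j).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        lo = max(0, i - window + 1)
        logits = [scores[i][j] - decay * (i - j) for j in range(lo, i + 1)]
        z = max(logits)
        exps = [math.exp(v - z) for v in logits]
        total = sum(exps)
        row = [0.0] * n  # positions outside the window keep weight 0
        for offset, e in enumerate(exps):
            row[lo + offset] = e / total
        weights.append(row)
    return weights


n = 5
scores = [[0.0] * n for _ in range(n)]  # uniform raw scores isolate the decay effect
attn = windowed_decay_attention(scores, window=3, decay=0.5)
```

Each row touches at most `window` keys, so the cost is O(n * window) rather than O(n^2), and no position ever receives weight from a later position.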
Article 1337
Title@2025-06-19 (4): TAPS: Throat and Acoustic Paired Speech Dataset for Deep Learning-Based Speech Enhancement
Title: TAPS: Throat and Acoustic Paired Speech Dataset for Deep Learning-Based Speech Enhancement | TAPS: Throat and Acoustic Paired Speech Dataset für Deep Learning-based Speech Enhancement | TAPS: 用于加强深学习式语音强化的喉音和声频语音数据集 2502.11478v2 |
Authors (3): Yunsik Kim, Yonghun Song, Yoonyoung Chung
In high-noise environments such as factories, subways, and busy streets, capturing clear speech is challenging. Throat microphones can offer a solution because of their inherent noise-suppression capabilities; however, the passage of sound waves through skin and tissue attenuates high-frequency information, reducing speech clarity. Recent deep learning approaches have shown promise in enhancing throat microphone recordings, but further progress is constrained by the lack of a standard dataset. Here, we introduce the Throat and Acoustic Paired Speech (TAPS) dataset, a collection of paired utterances recorded from 60 native Korean speakers using throat and acoustic microphones. Furthermore, an optimal alignment approach was developed and applied to address the inherent signal mismatch between the two microphones. We tested three baseline deep learning models on the TAPS dataset and found mapping-based approaches to be superior for improving speech quality and restoring content. These findings demonstrate the TAPS dataset’s utility for speech enhancement tasks and support its potential as a standard resource for advancing research in throat microphone-based applications.
Article 1338
Title@2025-06-19 (4): Tuning-Free Coreset Markov Chain Monte Carlo via Hot DoG
Title: Tuning-Free Coreset Markov Chain Monte Carlo via Hot DoG | Tuning-Free Coreset Markov Kette Monte Carlo über Hot Dog | 通过Hot DoG连线蒙特卡洛(Monte Carlo) 2410.18973v2 |
Authors (3): Naitong Chen, Jonathan H. Huggins, Trevor Campbell
A Bayesian coreset is a small, weighted subset of a data set that replaces the full data during inference to reduce computational cost. The state-of-the-art coreset construction algorithm, Coreset Markov chain Monte Carlo (Coreset MCMC), uses draws from an adaptive Markov chain targeting the coreset posterior to train the coreset weights via stochastic gradient optimization. However, the quality of the constructed coreset, and thus the quality of its posterior approximation, is sensitive to the stochastic optimization learning rate. In this work, we propose a learning-rate-free stochastic gradient optimization procedure, Hot-start Distance over Gradient (Hot DoG), for training coreset weights in Coreset MCMC without user tuning effort. We provide a theoretical analysis of the convergence of the coreset weights produced by Hot DoG. Empirical results further demonstrate that Hot DoG provides higher-quality posterior approximations than other learning-rate-free stochastic gradient methods and performs competitively with optimally tuned ADAM.
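As background, the plain Distance over Gradient (DoG) step size that the method's name builds on can be sketched as follows. This is the generic DoG rule (largest distance travelled from the start, divided by the root of the accumulated squared gradient norms), not the Hot DoG procedure itself. The hot-start component is not specified in the abstract and is omitted. The function name `dog_sgd`, the `r_eps` default, and the toy objective are assumptions.

```python
import math


def dog_sgd(grad, x0, steps, r_eps=1e-4):
    """Gradient descent with the plain DoG step size (no learning rate to tune)."""
    x = list(x0)
    # Small initial movement scale, used until the iterate has actually moved.
    max_dist = r_eps * (1.0 + math.sqrt(sum(v * v for v in x0)))
    grad_sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += sum(v * v for v in g)
        if grad_sq_sum == 0.0:
            break  # zero gradient from the start: nothing to do
        eta = max_dist / math.sqrt(grad_sq_sum)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        max_dist = max(max_dist, dist)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_final = dog_sgd(lambda x: [2.0 * (x[0] - 3.0)], x0=[0.0], steps=500)
```

The step size starts tiny and grows as the iterate moves away from `x0`, which is why no learning rate needs to be supplied. In Coreset MCMC the gradient would be a stochastic estimate with respect to the coreset weights rather than the exact gradient used here.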
Article 1339
Title@2025-06-19 (4): A Comprehensive Survey on Continual Learning in Generative Models
Title: A Comprehensive Survey on Continual Learning in Generative Models | Eine umfassende Umfrage zum kontinuierlichen Lernen in generativen Modellen | 关于以创建模式持续学习的综合调查 2506.13045v3 |
Authors (12): Haiyang Guo, Fanhu Zeng, Fei Zhu, Jiayi Wang, Xukai Wang, Jingang Zhou, Hongbo Zhao, Wenzhuo Liu, Shijie Ma, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu
The rapid advancement of generative models has enabled modern AI systems to comprehend and produce highly sophisticated content, even achieving human-level performance in specific domains. However, these models remain fundamentally constrained by catastrophic forgetting - a persistent challenge where adapting to new tasks typically leads to significant degradation in performance on previously learned tasks. To address this practical limitation, numerous approaches have been proposed to enhance the adaptability and scalability of generative models in real-world applications. In this work, we present a comprehensive survey of continual learning methods for mainstream generative models, including large language models, multimodal large language models, vision language action models, and diffusion models. Drawing inspiration from the memory mechanisms of the human brain, we systematically categorize these approaches into three paradigms: architecture-based, regularization-based, and replay-based methods, while elucidating their underlying methodologies and motivations. We further analyze continual learning setups for different generative models, including training objectives, benchmarks, and core backbones, offering deeper insights into the field. The project page of this paper is available at https://github.com/Ghy0501/Awesome-Continual-Learning-in-Generative-Models.
Article 1340
Title@2025-06-19 (4): Heterogeneous-Modal Unsupervised Domain Adaptation via Latent Space Bridging
Title: Heterogeneous-Modal Unsupervised Domain Adaptation via Latent Space Bridging | Heterogen-Modal Unüberwachte Domain-Anpassung über Latent Space Bridging | 通过低空空间连接对域进行无监督的适应 2506.15971v1 |
Authors (5): Jiawen Yang, Shuhao Chen, Yucong Duan, Ke Tang, Yu Zhang
Unsupervised domain adaptation (UDA) methods effectively bridge domain gaps but struggle when the source and target domains belong to entirely distinct modalities. To address this limitation, we propose a novel setting called Heterogeneous-Modal Unsupervised Domain Adaptation (HMUDA), which enables knowledge transfer between completely different modalities by leveraging a bridge domain containing unlabeled samples from both modalities. To learn under the HMUDA setting, we propose Latent Space Bridging (LSB), a specialized framework designed for the semantic segmentation task. Specifically, LSB utilizes a dual-branch architecture, incorporating a feature consistency loss to align representations across modalities and a domain alignment loss to reduce discrepancies between class centroids across domains. Extensive experiments conducted on six benchmark datasets demonstrate that LSB achieves state-of-the-art performance.
Article 1341
Title@2025-06-19 (4): LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning
Title: LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning | LazyEviction: Verlangsamte KV-Eviktion mit Aufmerksamkeitsmusterbeobachtung für effizientes Long Reasoning | LazyEvition: 以关注方式对有效长长理由进行观察的Lucking KV驱逐 2506.15969v1 |
Authors (5): Haoyue Zhang, Hualei Zhang, Xiaosong Ma, Jie Zhang, Song Guo
Large Language Models (LLMs) exhibit enhanced reasoning capabilities by employing Chain-of-Thought (CoT). However, the extended reasoning sequences introduce significant GPU memory overhead due to increased key-value (KV) cache size, particularly in tasks requiring long reasoning sequences, such as mathematics and programming. Existing KV cache compression methods mitigate memory bottlenecks but struggle in long reasoning tasks. In this paper, we analyze attention patterns in reasoning tasks and reveal a Token Importance Recurrence phenomenon: a large proportion of tokens receive renewed attention after multiple decoding steps, which existing works fail to capture and which may lead to unpredictable eviction of such periodically critical tokens. To address this, we propose LazyEviction, a lagged KV eviction framework designed to maintain reasoning performance while reducing KV memory. LazyEviction uses an Observation Window-based Lagged Eviction Mechanism that retains latent recurring tokens by performing lagged evictions across decoding steps. It contains two key components: (1) Recurrence Interval Tracking for capturing temporal variations in token importance, and (2) a Maximum Recurrence Interval-Centric Eviction Policy that prioritizes eviction based on tokens’ recurrence patterns. Extensive experiments demonstrate that LazyEviction reduces KV cache size by 50% while maintaining comparable accuracy on mathematics reasoning datasets, outperforming state-of-the-art methods. Our findings highlight the importance of preserving recurring tokens, which are critical for maintaining knowledge continuity in multi-step reasoning tasks.
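The two components named above can be illustrated with a toy bookkeeping sketch. Everything concrete here is an assumption made for illustration: the importance threshold, the "overdue" score (idle time minus a token's own maximum recurrence interval), and the class name `LaggedEvictionCache`. The abstract does not specify the actual policy.

```python
class LaggedEvictionCache:
    """Toy KV-cache bookkeeping: evict only every `window` steps, using recurrence stats."""

    def __init__(self, budget, window, threshold):
        self.budget = budget        # max tokens kept after an eviction
        self.window = window        # observation window between evictions
        self.threshold = threshold  # attention weight counted as "important"
        self.last_hit = {}          # token id -> last step it was important
        self.max_interval = {}      # token id -> longest gap between hits
        self.step = 0

    def observe(self, attention):
        """Record one decoding step; attention maps token id -> attention weight."""
        self.step += 1
        for tok, weight in attention.items():
            if tok not in self.last_hit:
                # New tokens start with a hit at their arrival step.
                self.last_hit[tok] = self.step
                self.max_interval[tok] = 0
            elif weight >= self.threshold:
                gap = self.step - self.last_hit[tok]
                self.max_interval[tok] = max(self.max_interval[tok], gap)
                self.last_hit[tok] = self.step
        # Eviction is lagged: it runs only at the end of each observation window.
        return self.evict() if self.step % self.window == 0 else []

    def evict(self):
        """Drop the tokens that are most overdue relative to their own max interval."""
        excess = len(self.last_hit) - self.budget
        if excess <= 0:
            return []

        def overdue(tok):
            return (self.step - self.last_hit[tok]) - self.max_interval[tok]

        victims = sorted(self.last_hit, key=overdue, reverse=True)[:excess]
        for tok in victims:
            del self.last_hit[tok]
            del self.max_interval[tok]
        return victims


cache = LaggedEvictionCache(budget=2, window=4, threshold=0.3)
trace = [
    {0: 0.5, 1: 0.5, 2: 0.5},
    {0: 0.1, 1: 0.1, 2: 0.8},
    {0: 0.1, 1: 0.1, 2: 0.8},
    {0: 0.9, 1: 0.05, 2: 0.05},  # token 0 becomes important again after a gap
]
evicted = [cache.observe(step_attention) for step_attention in trace]
```

In this trace token 0 is idle for two steps and then recurs, so a policy that evicted greedily at every step on current attention alone could have dropped it before it was needed again. Waiting for the window to close lets the recurrence be observed first.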
Article 1342
Title@2025-06-19 (4): Two Heads Are Better than One: Simulating Large Transformers with Small Ones
Title: Two Heads Are Better than One: Simulating Large Transformers with Small Ones | Zwei Köpfe sind besser als einer: Große Transformer mit kleinen zu simulieren | 两头胜于一:模拟大型变形器,使用小头变形器 2506.12220v2 |
Authors (2): Hantao Yu, Josh Alman
The quadratic complexity of self-attention prevents transformers from scaling effectively to long input sequences. On the other hand, modern GPUs and other specialized hardware accelerators are well-optimized for processing small input sequences in transformers during both training and inference. A natural question arises: can we take advantage of the efficiency of small transformers to deal with long input sequences? In this paper, we show that transformers with long input sequences (large transformers) can be efficiently simulated by transformers that can only take short input sequences (small transformers). Specifically, we prove that any transformer with input length $N$ can be efficiently simulated by only $O((N/M)^2)$ transformers with input length $M \ll N$, and that this cannot be improved in the worst case. However, we then prove that in various natural scenarios, including average-case inputs, sliding window masking, and attention sinks, the optimal number $O(N/M)$ of small transformers suffices.
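As a worked illustration of the two bounds (the concrete numbers are hypothetical and big-O constants are ignored), take $N = 4096$ and $M = 512$:

$$\frac{N}{M} = \frac{4096}{512} = 8, \qquad \left(\frac{N}{M}\right)^2 = 8^2 = 64.$$

The worst-case construction therefore uses on the order of 64 small transformers, while the favorable scenarios need on the order of 8. One heuristic way to read the quadratic count, offered as intuition rather than as the paper's argument, is that an $N \times N$ attention pattern tiles into $(N/M)^2$ blocks of size $M \times M$.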
Article 1343
Title@2025-06-19 (4): Bridging Text and Crystal Structures: Literature-driven Contrastive Learning for Materials Science
Title: Bridging Text and Crystal Structures: Literature-driven Contrastive Learning for Materials Science | Bridging Text und Kristallstrukturen: Literaturgetriebenes Kontrastives Lernen für die Materialwissenschaft | 架桥文字和水晶结构:以文学为动力的材料科学竞赛学习 2501.12919v2 |
Authors (7): Yuta Suzuki, Tatsunori Taniai, Ryo Igarashi, Kotaro Saito, Naoya Chiba, Yoshitaka Ushiku, Kanta Ono
Understanding structure-property relationships is an essential yet challenging aspect of materials discovery and development. To facilitate this process, recent studies in materials informatics have sought latent embedding spaces of crystal structures to capture their similarities based on properties and functionalities. However, abstract feature-based embedding spaces are human-unfriendly and prevent intuitive and efficient exploration of the vast materials space. Here we introduce Contrastive Language–Structure Pre-training (CLaSP), a learning paradigm for constructing crossmodal embedding spaces between crystal structures and texts. CLaSP aims to achieve material embeddings that 1) capture property- and functionality-related similarities between crystal structures and 2) allow intuitive retrieval of materials via user-provided description texts as queries. To compensate for the lack of sufficient datasets linking crystal structures with textual descriptions, CLaSP leverages a dataset of over 400,000 published crystal structures and corresponding publication records, including paper titles and abstracts, for training. We demonstrate the effectiveness of CLaSP through text-based crystal structure screening and embedding space visualization.
Article 1344
Title@2025-06-19 (4): On the Theoretical Understanding of Identifiable Sparse Autoencoders and Beyond
Title: On the Theoretical Understanding of Identifiable Sparse Autoencoders and Beyond | Über das theoretische Verständnis identifizierbarer Sparse Autoencoder und darüber hinaus | 关于可辨识的微缩自动编码器理论理解及以后问题 2506.15963v1 |
Authors (4): Jingyi Cui, Qi Zhang, Yifei Wang, Yisen Wang
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting features learned by large language models (LLMs). They aim to disentangle complex superposed polysemantic features into interpretable monosemantic ones through feature reconstruction via sparsely activated neural networks. Despite the wide application of SAEs, it remains unclear under what conditions an SAE can fully recover the ground truth monosemantic features from the superposed polysemantic ones. In this paper, through theoretical analysis, we propose for the first time the necessary and sufficient conditions for identifiable SAEs (SAEs that learn unique and ground truth monosemantic features), including 1) extreme sparsity of the ground truth feature, 2) sparse activation of SAEs, and 3) enough hidden dimensions of SAEs. Moreover, when the identifiability conditions are not fully met, we propose a reweighting strategy to improve the identifiability. Specifically, following the theoretically suggested weight selection principle, we prove that the gap between the loss functions of SAE reconstruction and monosemantic feature reconstruction can be narrowed, so that the reweighted SAEs have better reconstruction of the ground truth monosemantic features than the uniformly weighted ones. In experiments, we validate our theoretical findings and show that our weighted SAE significantly improves feature monosemanticity and interpretability.
Article 1345
Title@2025-06-19 (4): Learning Model Successors
Title: Learning Model Successors | Nachfolger von Lernmodellen | 学习模式继承人 2502.00197v2 |
Authors (2): Yingshan Chang, Yonatan Bisk
The notion of generalization has moved away from the classical one defined in statistical learning theory towards an emphasis on out-of-domain generalization (OODG). There has been a growing focus on generalization from easy to hard, where a progression of difficulty implicitly governs the direction of domain shifts. This emerging regime has appeared in the literature under different names, such as length/logical/algorithmic extrapolation, but a formal definition is lacking. We argue that the unifying theme is induction – based on finite samples observed in training, a learner should infer an inductive principle that applies in an unbounded manner. This work formalizes the notion of inductive generalization along a difficulty progression and argues that our path ahead lies in transforming the learning paradigm. We attempt to make inroads by proposing a novel learning paradigm, Inductive Learning, which involves a central concept called model successors. We outline practical steps to adapt well-established techniques towards learning model successors. This work calls for restructuring of the research discussion around induction and generalization from fragmented task-centric communities to a more unified effort, focused on universal properties of learning and computation.
Article 1346
Title@2025-06-19 (4): Contactless Precision Steering of Particles in a Fluid inside a Cube with Rotating Walls
Title: Contactless Precision Steering of Particles in a Fluid inside a Cube with Rotating Walls | Kontaktlose Präzisionslenkung von Partikeln in einer Flüssigkeit in einem Würfel mit rotierenden Wänden | 带旋转墙的立方体内流流体中的粒子无接触精确度指示器 2506.15958v1 |
Authors (3): Lucas Amoudruz, Petr Karnakov, Petros Koumoutsakos
Contactless manipulation of small objects is essential for biomedical and chemical applications, such as cell analysis, assisted fertilisation, and precision chemistry. Established methods, including optical, acoustic, and magnetic tweezers, are now complemented by flow control techniques that use flow-induced motion to enable precise and versatile manipulation. However, trapping multiple particles in fluid remains a challenge. This study introduces a novel control algorithm capable of steering multiple particles in flow. The system uses rotating disks to generate flow fields that transport particles to precise locations. Disk rotations are governed by a feedback control policy based on the Optimising a Discrete Loss (ODIL) framework, which combines fluid dynamics equations with path objectives into a single loss function. Our experiments, conducted in both simulations and with the physical device, demonstrate the capability of the approach to transport two beads simultaneously to predefined locations, advancing robust contactless particle manipulation for biomedical applications.
Article 1347
Title@2025-06-19 (4): One Period to Rule Them All: Identifying Critical Learning Periods in Deep Networks
Title: One Period to Rule Them All: Identifying Critical Learning Periods in Deep Networks | Eine Periode, um sie alle zu beherrschen: Kritische Lernphasen in tiefen Netzwerken identifizieren | 确定深网络的关键学习期 2506.15954v1 |
Authors (6): Vinicius Yuiti Fukase, Heitor Gama, Barbara Bueno, Lucas Libanio, Anna Helena Reali Costa, Artur Jordao
Critical Learning Periods constitute an important phenomenon in deep learning, where early epochs play a decisive role in the success of many training recipes, such as data augmentation. Existing works confirm the existence of this phenomenon and provide useful insights. However, the literature lacks efforts to precisely identify when critical periods occur. In this work, we fill this gap by introducing a systematic approach for identifying critical periods during the training of deep neural networks, focusing on eliminating computationally intensive regularization techniques and effectively applying mechanisms for reducing computational costs, such as data pruning. Our method leverages generalization prediction mechanisms to pinpoint critical phases where training recipes yield maximum benefits to the predictive ability of models. By halting resource-intensive recipes beyond these periods, we significantly accelerate the learning phase and achieve reductions in training time, energy consumption, and CO$_2$ emissions. Experiments on standard architectures and benchmarks confirm the effectiveness of our method. Specifically, we reduce the training time of popular architectures by up to 59.67%, leading to a 59.47% decrease in CO$_2$ emissions and a 60% reduction in financial costs, without compromising performance. Our work enhances understanding of training dynamics and paves the way for more sustainable and efficient deep learning practices, particularly in resource-constrained environments. In the era of the race for foundation models, we believe our method emerges as a valuable framework. The repository is available at https://github.com/baunilhamarga/critical-periods
Article 1348
Title@2025-06-19 (4): Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments
Title: Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments | Hierarchisches und Modulares Netzwerk zur nicht-prähensilen Manipulation in allgemeinen Umgebungen | 关于一般环境中非流行病操纵的等级和模块网络 2502.20843v2 |
Authors (4): Yoonyoung Cho, Junhyek Han, Jisu Han, Beomjoon Kim
For robots to operate in general environments like households, they must be able to perform non-prehensile manipulation actions such as toppling and rolling to manipulate ungraspable objects. However, prior works on non-prehensile manipulation cannot yet generalize across environments with diverse geometries. The main challenge lies in adapting to varying environmental constraints: within a cabinet, the robot must avoid walls and ceilings; to lift objects to the top of a step, the robot must account for the step’s pose and extent. While deep reinforcement learning (RL) has demonstrated impressive success in non-prehensile manipulation, accounting for such variability presents a challenge for the generalist policy, as it must learn diverse strategies for each new combination of constraints. To address this, we propose a modular and reconfigurable architecture that adaptively reconfigures network modules based on task requirements. To capture the geometric variability in environments, we extend the contact-based object representation (CORN) to environment geometries, and propose a procedural algorithm for generating diverse environments to train our agent. Taken together, the resulting policy can zero-shot transfer to novel real-world environments and objects despite training entirely within a simulator. We additionally release a simulation-based benchmark featuring nine digital twins of real-world scenes with 353 objects to facilitate non-prehensile manipulation research in realistic domains.
Article 1349
Title@2025-06-19 (4): Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments
Title: Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments | Entschlüsselung des Interplays zwischen Übertragungseffekten und Belohnungsautokorrelationen in Switchback-Experimenten | 在回转实验中解开结转效应与回转回实验中回调自动关系之间的交互作用 2403.17285v4 |
Authors (5): Qianglin Wen, Chengchun Shi, Ying Yang, Niansheng Tang, Hongtu Zhu
A/B testing has become the gold standard for policy evaluation in modern technological industries. Motivated by the widespread use of switchback experiments in A/B testing, this paper conducts a comprehensive comparative analysis of various switchback designs in Markovian environments. Unlike many existing works which derive the optimal design based on specific and relatively simple estimators, our analysis covers a range of state-of-the-art estimators developed in the reinforcement learning (RL) literature. It reveals that the effectiveness of different switchback designs depends crucially on (i) the size of the carryover effect and (ii) the auto-correlations among reward errors over time. Meanwhile, these findings are estimator-agnostic, i.e., they apply to most RL estimators. Based on these insights, we provide a workflow to offer guidelines for practitioners on designing switchback experiments in A/B testing.
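To make the notion of a switchback design concrete, the following is a minimal sketch of the simplest variant: the treatment arm alternates every fixed number of time steps. The function name `switchback_assignments` and the parameters `span` and `first_arm` are hypothetical names chosen for illustration; the abstract does not specify the designs or estimators the paper compares, so this is not the authors' method.

```python
def switchback_assignments(horizon: int, span: int, first_arm: int = 1) -> list[int]:
    """Return a 0/1 treatment sequence that switches arm every `span` time steps.

    Illustrative assumption: a deterministic alternating design with a fixed
    switching interval. A small `span` switches often; a `span` at least as
    large as `horizon` never switches within the experiment.
    """
    if horizon < 0 or span <= 0:
        raise ValueError("horizon must be non-negative and span must be positive")
    if first_arm not in (0, 1):
        raise ValueError("first_arm must be 0 or 1")
    # Block index t // span flips the arm each time a new block starts.
    return [(first_arm + t // span) % 2 for t in range(horizon)]


if __name__ == "__main__":
    print(switchback_assignments(horizon=6, span=2))  # [1, 1, 0, 0, 1, 1]
```

Under this sketch, the choice of `span` is the design knob: according to the abstract, which choice works best depends on the size of the carryover effect and on the autocorrelation of the reward errors.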
Article 1350
Title@2025-06-19 (4): Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning
Title: Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning | Gemeinsame Optimierung des Informationszeitalters und des Energieverbrauchs im NR-V2X-System auf Basis von Deep Reinforcement Learning | 基于深强化学习的NR-V2X系统信息和能源消耗年龄的联合优化 2407.08458v2 |
Authors (5): Shulin Song, Zheng Zhang, Qiong Wu, Qiang Fan, Pingyi Fan
Autonomous driving may be the most important application scenario of the next generation, so the development of wireless access technologies enabling reliable and low-latency vehicle communication becomes crucial. To address this, 3GPP has developed Vehicle-to-Everything (V2X) specifications based on 5G New Radio (NR) technology, where Mode 2 Side-Link (SL) communication resembles Mode 4 in LTE-V2X, allowing direct communication between vehicles. This supplements SL communication in LTE-V2X and represents the latest advancement in cellular V2X (C-V2X) with improved performance of NR-V2X. However, in NR-V2X Mode 2, resource collisions still occur and thus degrade the age of information (AoI). Therefore, an interference cancellation method is employed to mitigate this impact by combining NR-V2X with Non-Orthogonal Multiple Access (NOMA) technology. In NR-V2X, when vehicles select a smaller resource reservation interval (RRI), higher-frequency transmissions consume more energy to reduce AoI. Hence, it is important to jointly consider AoI and communication energy consumption based on NR-V2X communication. We then formulate such an optimization problem and employ a Deep Reinforcement Learning (DRL) algorithm to compute the optimal transmission RRI and transmission power for each transmitting vehicle, reducing the energy consumption of each transmitting vehicle and the AoI of each receiving vehicle. Extensive simulations demonstrate the performance of our proposed algorithm.
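The trade-off between AoI and energy described above can be illustrated with a toy simulation. This is a sketch under strong assumptions that are not taken from the paper: time is slotted, a vehicle transmits once every `rri` slots, every transmission is received without collision, a received update resets the age to 1, and each transmission costs a fixed `tx_energy`. The function and parameter names are hypothetical.

```python
def average_aoi_and_energy(horizon: int, rri: int, tx_energy: float = 1.0) -> tuple[float, float]:
    """Simulate a collision-free periodic sender and return (mean AoI, total energy).

    Assumptions (illustrative only): one transmission every `rri` slots,
    always received, age resets to 1 on reception and grows by 1 otherwise.
    """
    if horizon <= 0 or rri <= 0:
        raise ValueError("horizon and rri must be positive")
    age = 1
    total_age = 0
    energy = 0.0
    for t in range(horizon):
        if t % rri == 0:
            age = 1              # fresh update received in this slot
            energy += tx_energy  # the sender pays for the transmission
        else:
            age += 1             # information at the receiver gets older
        total_age += age
    return total_age / horizon, energy


if __name__ == "__main__":
    print(average_aoi_and_energy(horizon=4, rri=2))  # (1.5, 2.0)
    print(average_aoi_and_energy(horizon=4, rri=4))  # (2.5, 1.0)
```

Even in this idealized setting, halving the interval lowers the mean AoI but doubles the energy spent, which is the tension the joint optimization is meant to resolve.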
Article 1351
Title@2025-06-19 (4): Statistical Inference under Performativity
Title: Statistical Inference under Performativity | Statistische Schlussfolgerung unter Performativität | 性能下统计推断值 2505.18493v2 |
Authors (5): Xiang Li, Yunai Li, Huiying Zhong, Lihua Lei, Zhun Deng
Performativity of predictions refers to the phenomenon that prediction-informed decisions may influence the target they aim to predict, which is widely observed in policy-making in social sciences and economics. In this paper, we initiate the study of statistical inference under performativity. Our contribution is two-fold. First, we build a central limit theorem for estimation and inference under performativity, which enables inferential purposes in policy-making such as constructing confidence intervals or testing hypotheses. Second, we further leverage the derived central limit theorem to investigate prediction-powered inference (PPI) under performativity, which is based on a small labeled dataset and a much larger dataset of machine-learning predictions. This enables us to obtain more precise estimation and improved confidence regions for the model parameter (i.e., policy) of interest in performative prediction. We demonstrate the power of our framework by numerical experiments. To the best of our knowledge, this paper is the first one to establish statistical inference under performativity, which brings up new challenges and inference settings that we believe will add significant value to policy-making, statistics, and machine learning.
Article 1352
Title@2025-06-19 (4): Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach
Title: Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach | Topologie-Lernaktionen für Power Grid Control: Ein graphisch basierter Soft-Label-Lernansatz | 电网控制学习地形行动:以图表为基础的软标签模拟学习方法 2503.15190v2 |
Authors (7): Mohamed Hassouna, Clara Holzhüter, Malte Lehna, Matthijs de Jong, Jan Viebahn, Bernhard Sick, Christoph Scholz
The rising proportion of renewable energy in the electricity mix introduces significant operational challenges for power grid operators. Effective power grid management demands adaptive decision-making strategies capable of handling dynamic conditions. With the increase in complexity, more and more Deep Learning (DL) approaches have been proposed to find suitable grid topologies for congestion management. In this work, we contribute to this research by introducing a novel Imitation Learning (IL) approach that leverages soft labels derived from simulated topological action outcomes, thereby capturing multiple viable actions per state. Unlike traditional IL methods that rely on hard labels to enforce a single optimal action, our method constructs soft labels that capture the effectiveness of actions that prove suitable in resolving grid congestion. To further enhance decision-making, we integrate Graph Neural Networks (GNNs) to encode the structural properties of power grids, ensuring that the topology-aware representations contribute to better agent performance. Our approach significantly outperforms its hard-label counterparts as well as state-of-the-art Deep Reinforcement Learning (DRL) baseline agents. Most notably, it achieves a 17% better performance compared to the greedy expert agent from which the imitation targets were derived.
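The contrast between hard and soft imitation targets can be sketched as follows. The abstract does not say how the soft labels are built from the simulated action outcomes, so the temperature-scaled softmax below is an assumption made purely for illustration, as are the names `soft_labels`, `soft_cross_entropy`, and `temperature`.

```python
import math


def soft_labels(scores: list[float], temperature: float = 1.0) -> list[float]:
    """Turn per-action scores (higher is better) into a probability vector.

    Assumption: a softmax over simulated scores; the paper's construction
    may differ. A hard label would instead put all mass on the best action.
    """
    if not scores or temperature <= 0:
        raise ValueError("scores must be non-empty and temperature positive")
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(target: list[float], predicted: list[float], eps: float = 1e-12) -> float:
    """Cross-entropy between a soft target distribution and model probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


if __name__ == "__main__":
    targets = soft_labels([0.9, 0.8, 0.1], temperature=0.1)
    print(targets)  # the two near-equivalent actions share most of the mass
    print(soft_cross_entropy(targets, [0.5, 0.4, 0.1]))
```

With soft targets, a policy that prefers the second-best but still viable action is penalized only mildly, whereas a hard label would treat it as fully wrong.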
Article 1353
Title@2025-06-19 (4): On the optimal regret of collaborative personalized linear bandits
Title: On the optimal regret of collaborative personalized linear bandits | Über das optimale Bedauern der kollaborativen personalisierten linearen Banditen | 合作的个人化线性强盗的最佳遗憾 2506.15943v1 |
Authors (4): Bruce Huang, Ruida Zhou, Lin F. Yang, Suhas Diggavi
Stochastic linear bandits are a fundamental model for sequential decision making, where an agent selects a vector-valued action and receives a noisy reward with expected value given by an unknown linear function. Although well studied in the single-agent setting, many real-world scenarios involve multiple agents solving heterogeneous bandit problems, each with a different unknown parameter. Applying single-agent algorithms independently ignores cross-agent similarity and learning opportunities. This paper investigates the optimal regret achievable in collaborative personalized linear bandits. We provide an information-theoretic lower bound that characterizes how the number of agents, the interaction rounds, and the degree of heterogeneity jointly affect regret. We then propose a new two-stage collaborative algorithm that achieves the optimal regret. Our analysis models heterogeneity via a hierarchical Bayesian framework and introduces a novel information-theoretic technique for bounding regret. Our results offer a complete characterization of when and how collaboration helps with an optimal regret bound $\tilde{O}(d\sqrt{mn})$, $\tilde{O}(dm^{1-\gamma}\sqrt{n})$, $\tilde{O}(dm\sqrt{n})$ for the number of rounds $n$ in the range of $(0, \frac{d}{m \sigma^2})$, $[\frac{d}{m^{2\gamma} \sigma^2}, \frac{d}{\sigma^2}]$ and $(\frac{d}{\sigma^2}, \infty)$ respectively, where $\sigma$ measures the level of heterogeneity, $m$ is the number of agents, and $\gamma\in[0, 1/2]$ is an absolute constant. In contrast, agents without collaboration achieve a regret bound $O(dm\sqrt{n})$ at best.
Article 1354
Title@2025-06-19 (4): CORAL: Disentangling Latent Representations in Long-Tailed Diffusion
Title: CORAL: Disentangling Latent Representations in Long-Tailed Diffusion | KORAL: Entwirrende Latentendarstellungen in langanhaltender Diffusion | CORAL: 在长期失败的传播中拆分内流代表处 2506.15933v1 |
Authors (6): Esther Rodriguez, Monica Welfert, Samuel McDowell, Nathan Stromberg, Julian Antolin Camarena, Lalitha Sankar
Diffusion models have achieved impressive performance in generating high-quality and diverse synthetic data. However, their success typically assumes a class-balanced training distribution. In real-world settings, multi-class data often follow a long-tailed distribution, where standard diffusion models struggle – producing low-diversity and lower-quality samples for tail classes. While this degradation is well-documented, its underlying cause remains poorly understood. In this work, we investigate the behavior of diffusion models trained on long-tailed datasets and identify a key issue: the latent representations (from the bottleneck layer of the U-Net) for tail class subspaces exhibit significant overlap with those of head classes, leading to feature borrowing and poor generation quality. Importantly, we show that this is not merely due to limited data per class, but that the relative class imbalance significantly contributes to this phenomenon. To address this, we propose COntrastive Regularization for Aligning Latents (CORAL), a contrastive latent alignment framework that leverages supervised contrastive losses to encourage well-separated latent class representations. Experiments demonstrate that CORAL significantly improves both the diversity and visual quality of samples generated for tail classes relative to state-of-the-art methods.
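For readers unfamiliar with supervised contrastive losses, here is a minimal pure-Python sketch of the generic supervised contrastive objective applied to a batch of latent vectors. It is not CORAL's exact loss, whose details the abstract does not give; the function name `supcon_loss`, the `temperature` default, and the assumption of non-zero latent vectors are all illustrative.

```python
import math


def _normalize(v: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(latents: list[list[float]], labels: list[int], temperature: float = 0.1) -> float:
    """Generic supervised contrastive loss over a batch (illustrative sketch).

    For each anchor, same-class samples are positives and every other sample
    appears in the denominator. Anchors without positives are skipped.
    """
    z = [_normalize(v) for v in latents]
    n = len(z)
    losses = []
    for i in range(n):
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        # Log-sum-exp over all non-anchor samples, shifted for stability.
        top = max(sims[j] for j in others)
        log_denom = top + math.log(sum(math.exp(sims[j] - top) for j in others))
        losses.append(-sum(sims[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    separated = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    overlapping = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
    labels = [0, 0, 1, 1]
    print(supcon_loss(separated, labels) < supcon_loss(overlapping, labels))  # True
```

The loss is small when same-class latents cluster together and far from other classes, which is the separation property the abstract says is missing for tail classes.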
Article 1355
Title@2025-06-19 (4): Competing Bandits in Matching Markets via Super Stability
Title: Competing Bandits in Matching Markets via Super Stability | Konkurrierende Banditen in Matching Markets über Super Stabilität | 通过超级稳定在匹配市场中相互竞争的强盗 2506.15926v1 |
Authors (1): Soumya Basu
We study bandit learning in matching markets with two-sided reward uncertainty, extending prior research primarily focused on single-sided uncertainty. Leveraging the concept of 'super-stability' from Irving (1994), we demonstrate the advantage of the Extended Gale-Shapley (GS) algorithm over the standard GS algorithm in achieving true stable matchings under incomplete information. By employing the Extended GS algorithm, our centralized algorithm attains a logarithmic pessimal stable regret dependent on an instance-dependent admissible gap parameter. This algorithm is further adapted to a decentralized setting with a constant regret increase. Finally, we establish a novel centralized instance-dependent lower bound for binary stable regret, elucidating the roles of the admissible gap and super-stable matching in characterizing the complexity of stable matching with bandit feedback.
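As background, the standard GS algorithm that the abstract uses as its baseline can be sketched as proposer-side deferred acceptance. The sketch assumes fully known, strict, complete preference lists; it does not implement the Extended GS algorithm, super-stability, or any bandit feedback, and the function name `gale_shapley` is a hypothetical choice.

```python
def gale_shapley(
    proposer_prefs: dict[str, list[str]],
    receiver_prefs: dict[str, list[str]],
) -> dict[str, str]:
    """Standard deferred acceptance; returns a proposer -> receiver matching.

    Assumes strict and complete preference lists that are known in advance.
    """
    # rank[r][p] is the position of proposer p in receiver r's list (0 is best).
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged_to: dict[str, str] = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has been rejected by everyone on its list
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p            # r was unmatched and accepts tentatively
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p            # r trades up and releases its old partner
            free.append(current)
        else:
            free.append(p)               # r rejects p, who will propose again
    return {p: r for r, p in engaged_to.items()}


if __name__ == "__main__":
    proposers = {"a": ["x", "y"], "b": ["x", "y"]}
    receivers = {"x": ["b", "a"], "y": ["a", "b"]}
    print(gale_shapley(proposers, receivers) == {"a": "y", "b": "x"})  # True
```

When preferences must be estimated from noisy rewards, the rankings fed to such a procedure may be wrong, which is the incomplete-information setting the paper addresses.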
Article 1356
Title@2025-06-19 (4): fairmetrics: An R package for group fairness evaluation
Title: fairmetrics: An R package for group fairness evaluation | fairmetrics: Ein R-Paket für die Bewertung von Gruppengerechtigkeit | 公平度:团体公平评估R包件 2506.06243v2 |
Authors (3): Benjamin Smith, Jianhui Gao, Jessica Gronsbell
Fairness is a growing area of machine learning (ML) that focuses on ensuring models do not produce systematically biased outcomes for specific groups, particularly those defined by protected attributes such as race, gender, or age. Evaluating fairness is a critical aspect of ML model development, as biased models can perpetuate structural inequalities. The {fairmetrics} R package offers a user-friendly framework for rigorously evaluating numerous group-based fairness criteria, including metrics based on independence (e.g., statistical parity), separation (e.g., equalized odds), and sufficiency (e.g., predictive parity). Group-based fairness criteria assess whether a model is equally accurate or well-calibrated across a set of predefined groups so that appropriate bias mitigation strategies can be implemented. {fairmetrics} provides both point and interval estimates for multiple metrics through a convenient wrapper function and includes an example dataset derived from the Medical Information Mart for Intensive Care, version II (MIMIC-II) database (Goldberger et al., 2000; Raffa, 2016).
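As an illustration of an independence-based criterion, the sketch below computes the statistical parity difference between two groups from binary predictions. It is written in Python for illustration and does not reproduce the {fairmetrics} R interface, which the abstract does not describe; the function names and the toy data are hypothetical, and no interval estimate is computed.

```python
def positive_rate(preds: list[int], groups: list[str], group: str) -> float:
    """Share of positive (1) predictions among members of `group`."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    if not selected:
        raise ValueError(f"no samples found for group {group!r}")
    return sum(selected) / len(selected)


def statistical_parity_difference(
    preds: list[int], groups: list[str], group_a: str, group_b: str
) -> float:
    """Difference in positive prediction rates; 0 means parity under this criterion."""
    return positive_rate(preds, groups, group_a) - positive_rate(preds, groups, group_b)


if __name__ == "__main__":
    # Toy data, invented for this example only.
    preds = [1, 1, 0, 1, 1, 0, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(statistical_parity_difference(preds, groups, "a", "b"))  # 0.5
```

Separation and sufficiency criteria follow the same pattern but condition on the true label or on the predicted label, respectively, before comparing rates across groups.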