cs.LG @ 2025-08-01: 1240

07-31 (4)

SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions

SUB: Benchmarking der CBM-Verallgemeinerung über Synthetische Attribute Substitutionen

基准化 CBM 通过合成属性替代实现普遍化

2507.23784v1

07-31

XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding

XSpecMesh: Qualitätsschonende Auto-Regressive Mesh-Generation Beschleunigung über Multi-Head Spekulative Decodierung

XSpecMesh:通过多头推测解码实现保质的自回归网格生成加速

2507.23777v1

07-31

SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model

SimuRA: Auf dem Weg zu einem General Goal-Oriented Agent über Simulative Reasoning Architecture mit LLM-basiertem Weltmodell

SimuRA:通过基于LLM的世界模型的模拟推理架构,迈向通用目标导向的智能体

2507.23773v1

07-31

GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

GenoMAS: Ein Multi-Agenten-Framework für wissenschaftliche Entdeckung durch codegetriebene Genexpressionsanalyse

GenoMAS: 通过代码驱动基因表达分析科学发现多机构框架

2507.21035v2

07-31

Consensus-Driven Active Model Selection

Consensus-Driven Aktive Modellauswahl

采用协商一致的主动选择模式

2507.23771v1

07-31

Formal Bayesian Transfer Learning via the Total Risk Prior

Formale Bayesian Transfer Learning über das Total Risk Prior

基于总风险先验的形式化贝叶斯迁移学习

2507.23768v1

07-31

Scaled Beta Models and Feature Dilution for Dynamic Ticket Pricing

Skalierte Beta-Modelle und Feature-Verdünnung für dynamische Ticket-Preise

用于动态票盘定价的缩放贝塔模型和特性稀释

2507.23767v1

07-31

Improving annotator selection in Active Learning using a mood and fatigue-aware Recommender System

Verbesserung der Annotator-Auswahl in Active Learning mit einem Stimmungs- und Ermüdungs-Empfänger-System

利用情绪和疲劳意识建议系统,改进积极学习中宣传员的选择

2507.23756v1

07-31

Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic

Räumlich-zeitliches Verstärkungslernen für Netzwerk-Routing mit nicht-markovschem Verkehr

面向非马尔可夫流量网络路由的时空强化学习

2507.22174v2

07-31

Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs

Regel2Text: Natürliche Sprache Erklärung der logischen Regeln in Wissensgraphen

规则2案文:知识图中逻辑规则的自然语言解释

2507.23740v1

07-31

DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction

DICOM De-Identifikation über Hybrid-KI und regelbasiertes Framework für skalierbare, unsicherheitsbewusste Schwärzung

通过混合AI与基于规则的框架进行DICOM去标识化,实现可扩展、具备不确定性感知的信息遮蔽

2507.23736v1

07-31

A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values

Ein theoretisches Rahmenwerk zur Erklärung von Verstärkungslernen mit Shapley-Werten

用Shapley值解释强化学习的理论框架

2505.07797v2

07-31

Intersectional Divergence: Measuring Fairness in Regression

Intersektionale Divergenz: Fairness in der Regression messen

交叉性差异:衡量回归中的公平性

2505.00830v2

07-31

Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies

Verstärkung der multi-agenten Zusammenarbeit mit aufmerksamkeitsbasierter akteur-kritischer Politik

加强多机构与基于注意的行为者-批评政策的协作

2507.22782v2

07-31

Quantum Transfer Learning for MNIST Classification Using a Hybrid Quantum-Classical Approach

Quantentransfer-Lernen für die MNIST-Klassifizierung mit einem hybriden Quantum-Klassiker-Ansatz

采用混合量子分类方法进行MNIST分类的量子转移学习

2408.03351v2

07-31

Anomalous Samples for Few-Shot Anomaly Detection

Anomale Stichproben für Few-Shot-Anomalieerkennung

用于小样本异常检测的异常样本

2507.23712v1

07-31

GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network

GCL-GCN: Graphormer und Kontrastives Lernen verbessertes Attribut-Graph-Clustering-Netzwerk

GCL-GCN: Graphormer与对比学习增强的属性图聚类网络

2507.19095v2

07-31

Disparate Conditional Prediction in Multiclass Classifiers

Disparate Bedingte Vorhersagen in Mehrklassen-Klassifikatoren

多分类分类中的条件预测

2206.03234v3

07-31

Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks

Satelliten-Federated Fine-Tuning für Basismodelle in Weltraum Computing Power Networks

卫星卫星联合会空间电子计算动力网络基础模型精密设计

2504.10403v3

07-31

villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

villa-X: Verbesserung des Latent Action Modeling in Vision-Language-Action-Modellen

villa-X:增强视觉-语言-动作模型中的潜在动作建模

2507.23682v1

07-31

DepMicroDiff: Diffusion-Based Dependency-Aware Multimodal Imputation for Microbiome Data

DepMicroDiff: Diffusionsbasierte Abhängigkeits-Bewusst Multimodale Imputation für Mikrobiom-Daten

DepMicroDiff: 微生物数据多式多式计算法

2507.23676v1

07-31

A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms

Eine tief lernfähige numerische Relativitätsüberlagerung für Binary Black Hole Waveforms

二进制黑洞波形的深学习动力数字相对相对性替代工具

2412.06946v3

07-31

One-Step Flow Policy Mirror Descent

Ein-Schritt-Fluss-Politik Spiegelabstieg

单步流动政策从属

2507.23675v1

07-31

TweakLLM: A Routing Architecture for Dynamic Tailoring of Cached Responses

TweakLLM: Eine Routing-Architektur für dynamisches Tailoring von Cached Responses

TweakLLM: 用于动态定制缓存响应的路由架构

2507.23674v1

07-31

SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation

SAMSA: Segment alles Modell mit Spektralwinkeln für hyperspektrale interaktive medizinische Bildsegmentierung verbessert

SAMSA:用超光谱交互式医学图像截面光谱光谱角度增强的片段“任何东西”模型

2507.23673v1

07-31

SHAP-Guided Regularization in Machine Learning Models

SHAP-geführte Regularisierung in Machine Learning-Modellen

SHAP-指导的机械学习模式规范化

2507.23665v1

07-31

Parallel Split Learning with Global Sampling

Paralleles Split-Lernen mit globaler Probenahme

与全球抽样平行拆分学习

2407.15738v4

07-31

How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Wie kann ich meinen LLM-Benchmark veröffentlichen, ohne die wahren Antworten wegzugeben?

我怎样才能公布我的LLM基准而不给出正确的答案?

2505.18102v2

07-31

Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage

Kandinsky Conformal Prediction: Jenseits von Klassen- und Kovariate-Conditional Coverage

Kandinsky 共变预测:超越等级和共同 – – 有条件的覆盖范围

2502.17264v2

07-31

OptiGradTrust: Byzantine-Robust Federated Learning with Multi-Feature Gradient Analysis and Reinforcement Learning-Based Trust Weighting

OptiGradTrust: Byzantinisch-Robust-Federiertes Lernen mit Multi-Feature Gradientenanalyse und Verstärkung Learning-Based Trust Gewichtung

OptiGradTrust:基于多特征梯度分析和强化学习信任加权的拜占庭鲁棒联邦学习

2507.23638v1

07-31

On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Zur Expressivität der Softmax-Attention: Eine Perspektive rekurrenter neuronaler Netze

” 软体关注的表达性:神经网络的经常性视角 “

2507.23632v1

07-31

CS-SHRED: Enhancing SHRED for Robust Recovery of Spatiotemporal Dynamics

CS-SHRED: Verbesserung von SHRED zur robusten Erholung der Spatiotemporalen Dynamik

CS-SHRED:加强光学时光动力的强劲恢复

2507.22303v2

07-31

DivControl: Knowledge Diversion for Controllable Image Generation

DivControl: Wissensdiversion für steuerbare Bilderzeugung

DivControl: 知识转移用于可控图像生成

2507.23620v1

07-31

L-GTA: Latent Generative Modeling for Time Series Augmentation

L-GTA: Latent Generative Modellierung für Zeitreihenvergrößerung

L-GTA: 时间序列递增原始生成模型

2507.23615v1

07-31

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

MaxInfoRL: Förderung der Exploration im Verstärkungslernen durch Informationsgewinnmaximierung

MaxInfoRL:促进探索,通过信息获取最大化加强学习

2412.12098v2

35

07-31

Consistent Point Matching

Konsistente Punktgleichung

统一点匹配

2507.23609v1

07-31

Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates

Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Assessments

具有不确定性估计值的临床试验的深入学习预测

2507.23607v1

07-31

Hierarchical Message-Passing Policies for Multi-Agent Reinforcement Learning

Hierarchische Message-Passing-Politiken für das Mehr-Agenten-Verstärkungs-Lernen

促进多机构强化学习的等级信息传递政策

2507.23604v1

07-31

EB-gMCR: Energy-Based Generative Modeling for Signal Unmixing and Multivariate Curve Resolution

EB-gMCR: Energiebasierte Generative Modellierung für Signalunmixing und Multivariate Kurvenauflösung

EB-gMCR: 以能源为基础的信号融合和多变量曲线分辨率生成模型

2507.23600v1

07-31

Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots

Geteilte Aufmerksamkeit: Unüberwachte Multi-Objekt-Entdeckung mit kontextuell getrennten Slots

分散注意: 未监督的多对象发现, 带有上下文分隔的空格

2304.01430v3

07-31

SinBasis Networks: Matrix-Equivalent Feature Extraction for Wave-Like Optical Spectrograms

SinBasis Networks: Matrix-äquivalente Feature-Extraktion für wellenähnliche optische Spektrogramme

SinBasis 网络: 面向类波光学频谱图的矩阵等效特征提取

2505.06275v2

07-31

Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding

Where Paths Collide: Eine umfassende Untersuchung der klassischen und lernbasierten multi-agenten Pathfinding

路径相撞之处:对经典和以学习为基础的多方代理调查的全面调查

2505.19219v2

07-31

GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning

GraphRAG-R1: Graph Retrieval-Augmented Generation mit prozessabhängigem Verstärkungslernen

GraphRAG-R1:基于过程约束强化学习的图检索增强生成

2507.23581v1

07-31

Neutral Residues: Revisiting Adapters for Model Extension

Neutrale Rückstände: Adapter zur Modellerweiterung

中立残留物:重新审视适应器,用于示范推广

2410.02744v3

07-31

Optimised Feature Subset Selection via Simulated Annealing

Optimierte Feature-Subset-Auswahl über Simuliertes Annealing

通过模拟退火优化特征子集选择

2507.23568v1

07-31

Momentum-based gradient descent methods for Lie groups

Momentumbasierte Gradientenabstufungsmethoden für Lie-Gruppen

李群上基于动量的梯度下降方法

2404.09363v2

07-31

Weighted least-squares approximation with determinantal point processes and generalized volume sampling

Gewichtete am wenigsten quadratische Annäherung mit determinativen Punktprozessen und generalisierter Volumen-Probenahme

带有确定点过程和通用量抽样的加权最小方平方近似值

2312.14057v4

07-31

Optimal and Near-Optimal Adaptive Vector Quantization

Optimale und nahezu optimale adaptive Vektor-Quantisierung

最优与近似最优的自适应向量量化

2402.03158v2

07-31

Hardware-Aware Fine-Tuning of Spiking Q-Networks on the SpiNNaker2 Neuromorphic Platform

Hardware-Aware Feintuning von Spiking Q-Netzwerken auf der SpiNNaker2 Neuromorphic Platform

SpiNNaker2 神经形态平台上脉冲Q网络的硬件感知微调

2507.23562v1

07-31

Physics-informed Gaussian Processes as Linear Model Predictive Controller

Physik-informierte Gaußsche Prozesse als linearer Modellvorhersageregler

将物理信息高斯过程用作线性模型预测控制器

2412.04502v2

07-31

Molecule Graph Networks with Many-body Equivariant Interactions

Molekulare Graphen-Netzwerke mit Vielkörper-Equivariant-Interaktionen

多体等同交互作用的分子图图网络

2406.13265v3

07-31

Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation

Tile and Slide : Ein neuer Rahmen für die Skalierung von NeRF von lokaler bis globaler 3D-Erdbeobachtung

平板和幻灯片:从地方向全球3D地球观测扩大内域FF的新框架

2507.01631v2

07-31

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

Verbesserte Algorithmen für Kernel-Matrix-Vektor-Multiplikation unter Sparsamkeitsannahmen

改进内核矩阵矩阵-变量乘法乘法的数值

2507.23539v1

07-31

From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices

Von LLMs bis Edge: Parametereffizientes Feintuning auf Edge-Geräten

从LLMs到边缘:边缘装置的参数-有效精密喷射

2507.23536v1

07-31

PurpCode: Reasoning for Safer Code Generation

PurpCode: Begründung für eine sicherere Code-Generierung

PurpCode:更安全代码生成的理由

2507.19060v2

07-31

Transparent AI: The Case for Interpretability and Explainability

Transparente KI: Ein Plädoyer für Interpretierbarkeit und Erklärbarkeit

透明AI:可解释性与可说明性的论证

2507.23535v1

07-31

Continual Learning with Synthetic Boundary Experience Blending

Kontinuierliches Lernen mit synthetischer Grenzerfahrung Blending

与合成边界不断学习

2507.23534v1

07-31

Diffusion Beats Autoregressive in Data-Constrained Settings

Diffusion schlägt Autoregressive in datenbeschränkten Einstellungen

在数据受限的设置中,扩散模型胜过自回归模型

2507.15857v4

07-31

H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

H-RDT: Menschliche Manipulation verbessert bimanuelle Robotermanipulation

H-RDT:人类操纵增强二手机械操纵

2507.23523v1

07-31

TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding

TPP-SD: Beschleunigung der Transformer-Punkt-Prozedursampling mit spekulativer Dekodierung

TPP-SD:加速变速点进程与投机代号抽样

2507.09252v2

07-31

Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level

Differentiell privates Clipped-SGD: Hochwahrscheinlichkeitskonvergenz mit beliebigem Clipping-Level

差分隐私 Clipped-SGD:任意裁剪水平下的高概率收敛

2507.23512v1

61

07-31

A Verifier Hierarchy

Eine Prüferhierarchie

验证者等级分层

2507.23504v1

07-31

Directional Ensemble Aggregation for Actor-Critics

Richtungsabhängige Ensemble-Aggregation für Actor-Critic-Verfahren

行为者-批评者方向集合群

2507.23501v1

07-31

Incorporating structural uncertainty in causal decision making

Einbeziehung struktureller Unsicherheit in die kausale Entscheidungsfindung

将结构性不确定性纳入因果决策

2507.23495v1

07-31

Neural-ANOVA: Analytical Model Decomposition using Automatic Integration

Neural-ANOVA: Analytische Modellzersetzung mit automatischer Integration

神经-ANOVA:使用自动集成法分析模型分解

2408.12319v2

07-31

Explainable artificial intelligence model predicting the risk of all-cause mortality in patients with type 2 diabetes mellitus

Erklärbares Modell für künstliche Intelligenz zur Vorhersage des Risikos einer Gesamtsterblichkeit bei Patienten mit Typ-2-Diabetes mellitus

可解释的人工智能模型,预测2型糖尿病患者因各种原因死亡的风险

2507.23491v1

07-31

On the Approximation of Stationary Processes using the ARMA Model

Zur Annäherung von stationären Prozessen mit dem ARMA-Modell

使用ARMA模型的固定工艺接近情况

2408.10610v4

07-31

Machine learning and machine learned prediction in chest X-ray images

Maschinelles Lernen und maschinell gelernte Vorhersagen in Röntgenbildern in der Brust

胸部X光图像中的机器学习和机器学习预测

2507.23455v1

07-31

Manifold-regularised Signature Kernel Large-Margin $\ell_p$-SVDD for Multidimensional Time Series Anomaly Detection

Manifold-regularisierte Signatur-Kernel Large-Margin $\ell_p$-SVDD für mehrdimensionale Zeitreihenanomalienerkennung

用于多维时间序列异常检测的流形正则化签名核大间隔 $\ell_p$-SVDD

2507.23449v1

07-31

Adjoint-Based Aerodynamic Shape Optimization with a Manifold Constraint Learned by Diffusion Models

Adjoint-Based Aerodynamic Shape Optimization mit einer Manifold Constraint durch Diffusion Modelle gelernt

以联合为基础的空气动力学元件优化,通过扩散模型进行控制

2507.23443v1

07-31

Coflex: Enhancing HW-NAS with Sparse Gaussian Processes for Efficient and Scalable DNN Accelerator Design

Coflex: Verbesserung von HW-NAS mit Sparse Gaussian Prozessen für effizientes und skalierbares DNN Accelerator Design

Coflex:加强HW-NAS,并配有用于高效和可缩放 DNN 加速器设计的斯普尔斯高斯进程

2507.23437v1

07-31

A ZeNN architecture to avoid the Gaussian trap

Eine ZeNN-Architektur, um die Gaussische Falle zu vermeiden

避免高斯陷阱的 ZeNN 建筑

2505.20553v2

07-31

Merging Memory and Space: A Spatiotemporal State Space Neural Operator

Zusammenführen von Speicher und Raum: Ein räumlich-temporaler Zustandsraum-Neural-Betreiber

合并的记忆与空间:一个瞬间国家空间神经操作员

2507.23428v1

07-31

Identifying Super Spreaders in Multilayer Networks

Identifizieren von Superspreizern in Multilayer-Netzwerken

识别多层网络中的超级传播器

2505.20980v2

07-31

Detection of Adulteration in Coconut Milk using Infrared Spectroscopy and Machine Learning

Erkennung von Verfälschungen in Kokosmilch mittels Infrarotspektroskopie und maschinellem Lernen

利用红外光谱和机器学习检测椰奶掺假

2507.23418v1

07-31

Honey Adulteration Detection using Hyperspectral Imaging and Machine Learning

Honey Adulteration Detection mit Hyperspektrale Bildgebung und maschinelles Lernen

利用高光谱成像和机器学习检测蜂蜜掺假

2507.23416v1

07-31

A Machine Learning Approach for Honey Adulteration Detection using Mineral Element Profiles

Ein maschineller Lernansatz für die Erkennung von Honig-Adulteration mittels Mineralelement-Profilen

利用矿物元素谱进行蜂蜜掺假检测的机器学习方法

2507.23412v1

07-31

Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline

Effiziente Schmerzerkennung durch Respirationssignale: Eine einzige Cross-Attention Transformer Multi-Window Fusion Pipeline

通过呼吸信号进行有效的疼痛识别:单一交叉感应变异器多窗口融合管道

2507.21886v3

07-31

Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

Künstliche induktive Bias für die synthetische tabellarische Datengenerierung in Data-Scarce-Szenarien

数据碎片假设情景中合成图示数据生成人工诱导比值

2407.03080v2

07-31

AGA: An adaptive group alignment framework for structured medical cross-modal representation learning

AGA: Ein adaptiver Gruppenausrichtungsrahmen für strukturiertes medizinisches Cross-Modalitäts-Repräsentations-Lernen

AGA:结构化医疗跨模式代表性学习的适应性小组调整框架

2507.23402v1

07-31

Policy Learning from Large Vision-Language Model Feedback without Reward Modeling

Politik Lernen aus großen Vision-Sprache Modell Feedback ohne Belohnung Modellierung

从大视野 – – 语言模型反馈中学习政策而不进行奖励建模

2507.23391v1

07-31

Causal Explanation of Concept Drift – A Truly Actionable Approach

Kausale Erklärung des Konzepts Drift – Ein wirklich handlungsfähiger Ansatz

对 “ 漂流 “ 概念的因果解释 – – 真正可采取行动的方法

2507.23389v1

07-31

Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks

Einige theoretische Ergebnisse auf schichtweise Effektive Dimensions-Oszillationen in Finite-Wide-ReLU-Netzwerken

关于有限宽度 RELU 网络中多层有效尺寸振动的一些理论结果

2507.07675v2

07-31

EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations

EP-Diffusor: Ein effizientes Diffusionsmodell für die Verkehrsszenengenerierung und -vorhersage über polynomische Darstellungen

EP-Diffuser:通过多边代表制有效传播交通景点生成和预测模式

2504.05422v3

07-31

Robust and Fine-Grained Detection of AI Generated Texts

Robuste und feinkörnige Erkennung von KI-generierten Texten

对 AI 生成文本的强力和精细探测

2504.11952v3

07-31

SWE-Exp: Experience-Driven Software Issue Resolution

SWE-Exp: Erfahrungsgetriebene Lösung von Software-Issues

SWE-Exp:经验驱动的软件问题解决

2507.23361v1

07-31

Optimal Transport Learning: Balancing Value Optimization and Fairness in Individualized Treatment Rules

Optimal-Transport-Lernen: Wertoptimierung und Fairness in individualisierten Behandlungsregeln ausgleichen

最优传输学习:在个体化治疗规则中平衡价值优化与公平性

2507.23349v1

07-31

SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

SWE-Debatte: Wettbewerbsfähige Multi-Agenten-Debatte für die Lösung von Software-Problemen

SWE-Debate:解决软件问题竞争性多机构辩论

2507.23348v1

07-31

Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression

Strompreisvorhersage mit Multi-Kernel Gaussian Prozess-Regression kombiniert mit Kernel-basierte Unterstützung Vektor-Regression

利用多克朗高斯进程回归与内核支持矢量回归结合进行的电力价格预测

2412.00123v4

07-31

Designing Dynamic Pricing for Bike-sharing Systems via Differentiable Agent-based Simulation

Dynamische Preisgestaltung für Bike-Sharing-Systeme über eine charakteristische agentenbasierte Simulation

通过基于不同制剂的模拟,为自行车共享系统设计动态定价

2507.23344v1

07-31

Scalable and Precise Patch Robustness Certification for Deep Learning Models with Top-k Predictions

Skalierbare und präzise Patch Robustness Zertifizierung für Deep Learning Modelle mit Top-K Vorhersagen

具有顶级预测力的深学习模型可缩放和精确的补丁强度认证

2507.23335v1

07-31

FovEx: Human-Inspired Explanations for Vision Transformers and Convolutional Neural Networks

FovEx: Menschlich inspirierte Erklärungen für Visionstransformer und konvolutionäre Neuralnetzwerke

FovEx:面向视觉Transformer和卷积神经网络的受人类启发的解释

2408.02123v3

07-31

MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation

MUST-RAG: MUSical Text Question Beantwortung mit retrieval Augmented Generation

MUST-RAG: 基于检索增强生成的音乐文本问答

2507.23334v1

07-31

EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models

EaqVLA: Kodierungsorientierte Quantisierung für Vision-Language-Action-Modelle

EaqVLA: 愿景-语言-行动模式的编码和一致的量化

2505.21567v2

07-31

Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models

Transformierte Low-Rank-Anpassung über Tensor-Zersetzung und deren Anwendungen zu Text-zu-Bild-Modellen

通过Tensor分解及其在文本到图像模型中的应用

2501.08727v2

07-31

MVCNet: Multi-View Contrastive Network for Motor Imagery Classification

MVCNet: Multi-View Kontrastives Netzwerk für die Klassifizierung von Motorbildern

MVCNet:机动图像分类多视比网络

2502.17482v4

07-31

HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction

HER2-Expression-Vorhersage mit flexiblen Multi-Modal-Eingängen durch dynamische bidirektionale Rekonstruktion

通过动态双向重建灵活多模式输入的HER2表达式预测

2506.10006v2

07-31

Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner

Gute Lernende denken an ihr Denken: Generatives PRM macht groß aufschlussreiches Modell effizienter Math Learner

优秀的学习者思考他们的思考:创创型程序使大型理性模型提高数学学习者的效率

2507.23317v1

07-31

MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse

MemShare: Speichereffiziente Inferenz für große Reasoning-Modelle durch Wiederverwendung des KV-Caches

MemShare:通过 KV 缓存复用实现大型推理模型的内存高效推理

2507.21433v2

07-31

Impact of Hyperparameter Optimization on the Accuracy of Lightweight Deep Learning Models for Real-Time Image Classification

Auswirkungen von Hyperparameter-Optimierung auf die Genauigkeit leichter Deep Learning-Modelle für Echtzeit-Bildklassifikation

超超参数优化对实时图像分类轻型深度学习模型准确性的影响

2507.23315v1

100

07-31

An Interpretable Data-Driven Unsupervised Approach for the Prevention of Forgotten Items

Ein interpretierbarer, datengestützter, unbeaufsichtigter Ansatz zur Vermeidung vergessener Gegenstände

防止被遗忘物品不受监督的可解释数据驱动的未受监督的防止被遗忘物品的方法

2507.23303v1

101

07-31

Simulation-based inference for Precision Neutrino Physics through Neural Monte Carlo tuning

Simulationsbasierte Inferenz für Präzisions-Neutrinophysik durch Neural Monte Carlo-Tuning

通过神经蒙特卡洛调控精密中子物理的模拟推论

2507.23297v1

102

07-31

SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy

SequenzLayer: Sequenzverarbeitung und Streaming von Neuronalen Netzwerken leicht gemacht

序列激光器:序列处理和串联神经网络变得容易

2507.23292v1

103

07-31

Evaluating the Dynamics of Membership Privacy in Deep Learning

Bewertung der Dynamik der Mitgliedschafts-Privacy in Deep Learning

深层学习中成员隐私动态评估

2507.23291v1

104

07-31

Tailored Forecasting from Short Time Series via Meta-learning

Maßgeschneiderte Prognose aus Kurzzeitreihen über Meta-Learning

通过元学习从短时间系列中进行量身定制的预测

2501.16325v2

105

07-31

Insights into Closed-form IPM-GAN Discriminator Guidance for Diffusion Modeling

Einblicke in die Closed-Form IPM-GAN Discriminator Guidance for Diffusion Modeling

扩散建模中闭式 IPM-GAN 判别器引导的洞见

2306.01654v2

106

07-31

CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation

CEE: Eine Jailbreak-Verteidigung zur Inferenzzeit für verkörperte Intelligenz über Subraumkonzept-Rotation

CEE:通过子空间概念旋转为具身智能提供推理时越狱防御

2504.13201v2

107

07-31

SmartPNT-MSF: A Multi-Sensor Fusion Dataset for Positioning and Navigation Research

SmartPNT-MSF: Multi-Sensor-Fusionsdatensatz für Positionierung und Navigationsforschung

SmartPNT-MSF:用于定位和导航研究的多传感器融合数据集

2507.19079v2

108

07-31

DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System

DynaSwarm: Dynamische Graphenstrukturauswahl für LLM-basiertes Multi-Agent-System

DynaSwarm: 面向基于LLM的多智能体系统的动态图结构选择

2507.23261v1

109

07-31

AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift

KI sollte besser fühlen, nicht nur größer skalieren: Adaptive Sensing als Paradigmenverschiebung

AI 应当更好,而不仅仅是规模更大:将适应性遥感作为范式转变

2507.07820v2

110

07-31

Efficient Machine Unlearning via Influence Approximation

Effizientes maschinelles Verlernen durch Einflussapproximation

通过影响近似实现高效机器遗忘

2507.23257v1

111

07-31

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

ActSafe: Aktive Exploration mit Sicherheitseinschränkungen für Verstärkungslernen

ActSafe:强化学习中带安全约束的主动探索

2410.09486v3

112

07-31

Evaluating LLMs’ Multilingual Capabilities for Bengali: Benchmark Creation and Performance Analysis

Bewertung der Mehrsprachigkeitsfähigkeiten von LLMs für Bengalen: Benchmark-Erstellung und Leistungsanalyse

评价孟加拉多种语文能力:基准设定和业绩分析

2507.23248v1

113

07-31

GrokAlign: Geometric Characterisation and Acceleration of Grokking

GrokAlign: Geometrische Charakterisierung und Beschleunigung von Grokking

GrokAlign:Grokking的几何特征和加速

2506.12284v2

114

07-31

Generalized Reinforcement Learning for Retriever-Specific Query Rewriter with Unstructured Real-World Documents

Generalisiertes Verstärkungslernen für retriever-spezifische Abfrage-Rewriter mit unstrukturierten Real-World-Dokumenten

利用无结构的 “ 现实世界文件 “ 检索特定查询卷卷的通用强化学习

2507.23242v1

115

07-31

Accumulator-Aware Post-Training Quantization for Large Language Models

Akkumulator-Aware-Nachschulungs-Quantisierung für große Sprachmodelle

大型语文模式培训后量化

2409.17092v2

116

07-31

Achieving Deep Continual Learning via Evolution

Deep Continual Learning durch Evolution erreichen

通过演进实现深入不断学习

2502.06210v2

117

07-31

Enabling Few-Shot Alzheimer’s Disease Diagnosis on Tabular Biomarker Data with LLMs

Ermöglichung der Few-Shot-Diagnose der Alzheimer-Krankheit auf tabellarischen Biomarker-Daten mit LLMs

利用LLM在表格型生物标志物数据上实现小样本阿尔茨海默病诊断

2507.23227v1

118

07-31

Unveiling the Influence of Amplifying Language-Specific Neurons

Enthüllen des Einflusses amplifizierender sprachspezifischer Neuronen

揭示放大语言特定神经元的影响

2507.22581v2

119

07-31

A Single Direction of Truth: An Observer Model’s Linear Residual Probe Exposes and Steers Contextual Hallucinations

Eine einzige Richtung der Wahrheit: Die linearen residualen Sonden eines Beobachtermodells zeigen und säumen kontextuelle Halluzinationen

真相的单一方向:观察模型的线性残余研究发现和脚底背景幻觉

2507.23221v1

120

07-31

Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Internet of Electric Vehicles

Förderung generativer Künstlicher Intelligenz und großer Sprachmodelle für das Nachfrage-Side-Management mit dem Internet von Elektrofahrzeugen

利用电动车辆互联网推动产生供求方管理的人工情报和大语言模型

2501.15544v4

121

07-31

Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders

Model Directions, keine Worte: Mechanistische Themenmodelle mit Sparse Autoencodern

模型方向,非单词:使用粗态自动编码器的机械专题模型

2507.23220v1

122

07-31

Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented Generation

Zero-Shot-Dokument Verständnis mit Pseudo Inhaltsverzeichnis-Geführte retrieval-Augmented Generation

使用内容引导回源回收新一代的 Pseudo 表格进行零热文档理解

2507.23217v1

123

07-31

Not Just What, But When: Integrating Irregular Intervals to LLM for Sequential Recommendation

Nicht nur was, aber wann: Integrieren unregelmäßiger Intervalle in LLM für sequentielle Empfehlung

不仅只是什么,但是当: 将非正常的间联者纳入LLM, 以便按顺序提出建议

2507.23209v1

124

07-31

Are Recommenders Self-Aware? Label-Free Recommendation Performance Estimation via Model Uncertainty

Sind Recommenders Self-Aware? Label-freie Empfehlung Leistungsschätzung über Modellunsicherheit

推荐人是否自觉?通过模型不确定性对无标签建议绩效的估算

2507.23208v1

125

07-31

Adapt before Continual Learning

Anpassung vor dem kontinuierlichen Lernen

在持续学习前适应

2506.03956v3

126

07-31

InfAlign: Inference-aware language model alignment

InfAlign: Inference-aware Sprachmodellausrichtung

InfAlign: 推理感知的语言模型对齐

2412.19792v4

127

07-31

Learning 3D Scene Analogies with Neural Contextual Scene Maps

3D-Szenen-Analogien mit neuralen Kontext-Szenenkarten lernen

学习 3D 与天体背景场景地图的场景模拟

2503.15897v2

128

07-31

Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Geak: Einführung von Triton Kernel AI Agent & Evaluation Benchmarks

Geak:介绍Triton Kernel AI 代理和评估基准

2507.23194v1

129

07-31

G-Core: A Simple, Scalable and Balanced RLHF Trainer

G-Core: Ein einfacher, skalierbarer und ausbalancierter RLHF-Trainer

G-Core: 简单、可缩放和平衡的RLHF培训员

2507.22789v2

130

07-31

NaN-Propagation: A Novel Method for Sparsity Detection in Black-Box Computational Functions

NaN-Propagation: Eine neuartige Methode zur Erkennung von Sparsität in Black-Box Computational Functions

NaN- propagation: 在黑箱计算函数中检测分数的新颖方法

2507.23186v1

131

07-31

H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity

H2Tune: Federated Foundation Model Feintuning mit Hybrid Heterogenität

H2Tune: 联邦基金会混合异质性混合调整示范

2507.22633v2

132

07-31

MolPIF: A Parameter Interpolation Flow Model for Molecule Generation

MolPIF: Ein Parameter Interpolationsflussmodell für die Molekülerzeugung

MolPIF: 用于分子生成的参数插值流模型

2507.13762v3

133

07-31

Entanglement-induced provable and robust quantum learning advantages

Verflechtung-induzierte nachweisbare und robuste Vorteile des Quantenlernens

纠缠引发的可证实和稳健量量的学习优势

2410.03094v2

134

07-31

A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and Challenges

Eine umfassende Übersicht über Diffusionsmodelle in der intelligenten Landwirtschaft: Fortschritt, Anwendungen und Herausforderungen

智慧农业中扩散模型的全面综述:进展、应用与挑战

2507.18376v2

135

07-31

CNN-based solution for mango classification in agricultural environments

CNN-basierte Lösung für die Mangoklassifizierung in landwirtschaftlichen Umgebungen

基于CNN的农业环境芒果分类解决方案

2507.23174v1

136

07-31

BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning

BAR Conjecture: Die Machbarkeit von Schlussfolgerungen Budget-konstruierten LLM-Diensten mit Authentizität und Vernunft

BAR 假设:具有真实性和合理性、经过预算约束的有限LLM服务推论的可行性

2507.23170v1

137

07-31

LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration

LENS: Lerne Ensemble Vertrauen aus neuralen Staaten für Multi-LLM-Antwortintegration

LENS:从神经国家学习多LLM应答整合的集合信任

2507.23167v1

138

07-31

Tensor Product Neural Networks for Functional ANOVA Model

Tensor Produkt Neuronale Netzwerke für funktionales ANOVA-Modell

功能ANOVA模型的神经网络

2502.15215v5

139

07-31

Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in Interpretability

Kompositorische Funktionsnetzwerke: Eine leistungsstarke Alternative zu tiefen neuralen Netzwerken mit eingebauter Interpretierbarkeit

构成函数网络:具有内置可解释性的深神经网络高性能替代品

2507.21004v2

140

07-30 (3)

TokenBlowUp: Resolving Representational Singularities in LLM Token Spaces via Monoidal Transformations

TokenBlowUp: Auflösung von Repräsentationssingularitäten in LLM-Tokenräumen über monoidale Transformationen

TokenBlowUp: 通过一式转换解决LLM Token空间的代表标志

2507.19747v2

141

07-30

Extended Factorization Machine Annealing for Rapid Discovery of Transparent Conducting Materials

Erweiterte Factorisierungsmaschine Annealing für die schnelle Entdeckung von transparenten leitenden Materialien

迅速发现透明操作材料的扩展保理装置

2507.23160v1

142

07-30

AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction

AdaptHetero: maschinelles Lernen Interpretationsgetriebene Subgruppenanpassung für EHR-basierte klinische Vorhersagen

适应赫特罗:基于EHR的临床预测的机器学习口译驱动分组适应

2507.21197v2

143

07-30

Decision by Supervised Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization

Entscheidung von Supervised Learning mit tiefen Ensembles: Ein praktischer Rahmen für robuste Portfolio-Optimierung

受监督的深群学习决定:强力组合组合优化实用框架

2503.13544v4

144

07-30

On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization

Über die Komplexität der Suche nach Stationären Punkten in nicht konvexe einfache Bilevel-Optimierung

关于非电解简单双级最佳化中寻找固定点的复杂性

2507.23155v1

145

07-30

FuseTen: A Generative Model for Daily 10 m Land Surface Temperature Estimation from Spatio-Temporal Satellite Observations

FuseTen: Ein generatives Modell für täglich 10 m Landoberflächentemperaturschätzung aus räumlich-zeitlichen Satellitenbeobachtungen

FuseTen:斯帕蒂奥-时空卫星观测每日10米地表温度估计的生成模型

2507.23154v1

146

07-30

AI paradigm for solving differential equations: first-principles data generation and scale-dilation operator AI solver

KI-Paradigma zur Lösung von Differentialgleichungen: First-Principles Datengenerierung und Scale-Dilation Operator KI-Löser

解决差别方程式的AI模式:第一原则数据生成和比例关系操作员AI求解器

2507.23141v1

147

07-30

Observational Multiplicity

Beobachtungsvielfalt

观测多样性

2507.23136v1

148

07-30

Evaluating and Improving the Robustness of Speech Command Recognition Models to Noise and Distribution Shifts

Bewertung und Verbesserung der Robustheit von Sprachbefehlserkennungsmodellen für Geräusch- und Verteilungsverschiebungen

评估和改进语音指令识别模式对噪音和分配变化的威力

2507.23128v1

149

07-30

ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology

ModalTune: Fine-Tuning Slide-Level-Grundlagenmodelle mit multi-Modalen Informationen für Multi-Task-Lernen in der digitalen Pathologie

模式图纳:数字病理学多任务学习多模式信息多模式学习的精准引导幻灯片级基金会模型

2503.17564v2

150

07-30

MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement

MLE-STAR: Machine-Learning-Engineering-Agent durch Suche und gezielte Verfeinerung

MLE-STAR:通过搜索和定向改进的机器学习工程智能体

2506.15692v2

151

07-30

Controlling diverse robots by inferring Jacobian fields with deep networks

Steuerung diverser Roboter durch Rückschlüsse auf Jacobian-Felder mit tiefen Netzwerken

通过将雅各布田地与深层网络进行推断,控制各种机器人

2407.08722v2

152

07-30

Insights into resource utilization of code small language models serving with runtime engines and execution providers

Einblicke in die Ressourcennutzung von Code-Small Language-Modellen, die mit Laufzeit-Engines und Ausführungsanbietern dienen

深入了解为运行时引擎和执行提供方服务的编码小型语文模式的资源利用情况

2412.15441v2

153

07-30

FLOSS: Federated Learning with Opt-Out and Straggler Support

FLOSS: Föderiertes Lernen mit Opt-Out und Straggler-Unterstützung

FLOSS: 具有“Opt-Out”和“Straggler”支持的联邦学习

2507.23115v1

154

07-30

Scalable Generative Modeling of Weighted Graphs

Skalierbare Generative Modellierung von gewichteten Graphen

加权图表的可缩放生成建模

2507.23111v1

155

07-30

Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs

Graphenstichproben für skalierbare und expressive Graphenneurale Netzwerke auf homophilen Graphen

光益图可缩缩和伸缩图形神经网络图示样本

2410.16593v4

156

07-30

Coarse Graining with Neural Operators for Simulating Chaotic Systems

Grobkörnung mit neuralen Operatoren zur Simulation chaotischer Systeme

与模拟劣质系统神经操作员的粗粗谷物

2408.05177v5

157

07-30

RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL

RASL: 大规模数据库文本到 SQL 的检索增强的相连接表表

2507.23104v1

158

07-30

On the Sustainability of AI Inferences in the Edge

Zur Nachhaltigkeit von KI-Schlussfolgerungen am Rande

边缘端AI推理的可持续性

2507.23093v1

159

07-30

Accenture-NVS1: A Novel View Synthesis Dataset

Accenture-NVS1: Ein Datensatz für Novel View Synthesis

Accenture-NVS1:新观点合成数据集

2503.18711v2

160

07-30

Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximation

Dynamisch inspiriertes Lernen invarianter Subräume für Koopman und Transferoperator Approximation

为Koopman和转移算子近似学习受动力学启发的不变子空间

2505.05085v2

161

07-30

A Foundation Model for Material Fracture Prediction

Ein Grundlagenmodell für die Vorhersage von Materialfrakturen

材料断裂预测基金会模型

2507.23077v1

162

07-30

Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

Weiterentwicklung der visionsbasierten menschlichen Handlungserkennung: Erforschen eines visionssprachlichen CLIP-Modells zur Generalisierung in domänenunabhängigen Aufgaben

推进基于愿景的人类行动认识:探索愿景-语言化 CLIP 在独立领域各任务中推广的CLIP模式

2507.18675v2

163

07-30

Locally Differentially Private Thresholding Bandits

Lokal unterschiedlich private Thresholding Bandits

地方差异式私家强盗

2507.23073v1

164

07-30

Affect Models Have Weak Generalizability to Atypical Speech

Affect Models haben geringe Verallgemeinerbarkeit zu atypischer Sprache

效果模型对非典型演讲的可普及性较弱

2504.16283v2

165

07-30

Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

Vision-Language Fusion für autonomes Fahren in Echtzeit: Zielzentrierte Cross-Achtung von Kamera, HD-Karte und Wegpunkten

实时自主驾驶的视觉-语言融合:以目标为中心交叉使用相机、HD-地图、和途径点

2507.23064v1

166

07-30

Lattice Protein Folding with Variational Annealing

Gitterprotein-Faltung mit Variations-Annealing

Lattice Protein 以变式安纳林方式折叠

2502.20632v2

167

07-30

Prediction of Significant Creatinine Elevation in First ICU Stays with Vancomycin Use: A retrospective study through Catboost

Vorhersage einer signifikanten Kreatininerhöhung in der ersten Intensivstation bei Vancomycin Anwendung: Eine retrospektive Studie über Catboost

使用万古霉素的首次ICU住院期间肌酐显著升高的预测:基于CatBoost的回顾性研究

2507.23043v1

168

07-30

Early Goal-Guided Multi-Scale Fusion for Real-Time Vision-Language Driving

Frühe zielgeführte Multi-Scale Fusion für Echtzeit-Vision-Sprachenfahren

实时愿景-语言定位驱动目标引导的早期多阶段融合

2507.23042v1

169

07-30

Two-dimensional Parallel Tempering for Constrained Optimization

Zweidimensionales paralleles Temperieren für eingeschränkte Optimierung

限制优化的二维平行热量

2506.14781v2

170

07-30

Linking Actor Behavior to Process Performance Over Time

Verknüpfen des Verhaltens von Schauspielern mit der Prozessleistung im Laufe der Zeit

将动作器行为与时间过程性能链接

2507.23037v1

171

07-30

KLLM: Fast LLM Inference with K-Means Quantization

KLLM: Schnelle LLM-Inferenz mit K-Means-Quantisierung

KLLM: 快速LLM 与 K- Means 量化的推论

2507.23035v1

172

07-30

Recursive Learning-Based Virtual Buffering for Analytical Global Placement

Rekursives Lernen-basiertes virtuelles Puffern für analytische globale Platzierung

分析全球职位安排的基于学习的累累虚拟缓冲

2506.17247v2

173

07-30

Data Readiness for Scientific AI at Scale

Datenbereitstellung für wissenschaftliche KI im Maßstab

规模化科学AI 数据准备程度

2507.23018v1

174

07-30

Deciphering interventional dynamical causality from non-intervention complex systems

Entschlüsselung interventioneller dynamischer Kausalität durch nichtinterventionsfähige komplexe Systeme

消除不干预复杂系统造成的干预性动态因果关系

2407.01621v2

175

07-30

A Smoothing Newton Method for Rank-one Matrix Recovery

Eine glättende Newton-Methode für die Rank-One-Matrix-Wiederherstellung

为一等一矩阵恢复采用平滑的牛顿方法

2507.23017v1

176

07-30

Hypergraph Neural Sheaf Diffusion: A Symmetric Simplicial Set Framework for Higher-Order Learning

Hypergraph Neural Sheaf Diffusion: Ein symmetrischer Simplicial-Set-Rahmen für höhere Anforderungen an das Lernen

超时光谱神经纤维扩散:高阶学习的对称简易设置框架

2505.05702v3

177

07-30

Learning to Prune Branches in Modern Tree-Fruit Orchards

Lernen, Zweige in modernen Baumobstplantagen zu beschneiden

学习现代树枝果园的普鲁纳分支

2507.23015v1

178

07-30

Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods

Untersuchung der Wechselbarkeit multimodaler Latentenräume: Einschränkungen von Optimierungsmethoden

调查多式联运低温空间的不可视性:以优化为基础的方法的局限性

2507.23010v1

179

07-30

Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

Stoppen Sie die Bewertung von KI mit menschlichen Tests, entwickeln Sie Prinzipien, KI-spezifische Tests statt

停止用人类测试来评价AI, 制定原则性、 AI 特定测试

2507.23009v1

180

07-30

Planning for Cooler Cities: A Multimodal AI Framework for Predicting and Mitigating Urban Heat Stress through Urban Landscape Transformation

Planung für coolere Städte: Ein multimodales KI-Framework zur Vorhersage und Milderung von städtischem Wärmestress durch Urban Landscape Transformation

更冷城市规划:通过城市景观转型预测和减轻城市热量压力的多模式AI框架

2507.23000v1

181

07-30

Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics

Konsistenz der Eigenschaftszuweisung in Deep Learning Architekturen für Multi-Omics

多种语言深深学习结构中地物归属的一致性

2507.22877v1

182

07-30

LCS: An AI-based Low-Complexity Scaler for Power-Efficient Super-Resolution of Game Content

LCS: Ein KI-basierter Low-Complexity Scaler für leistungsstarke Super-Resolution von Spielinhalten

LCS: 以AI为基础的高功率超级游戏内容分辨率低复杂度缩放仪

2507.22873v1

183

07-30

Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point

Föderiertes Lernen mit On-Device-Training und Kommunikation im 8-Bit-Schwebepunkt

在8位浮动点进行联邦在职培训和交流

2407.02610v2

184

07-30

Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning

Nutzung von Evolutionsstrategien zur Ausbildung von Transformern in der Stärkung des Lernens

利用进化战略培训变革者加强学习培训

2501.13883v2

185

07-30

Mesh based segmentation for automated margin line generation on incisors receiving crown treatment

Mesh-basierte Segmentierung für automatisierte Margenlinien-Generierung an Schneidezähnen, die Kronenbehandlung erhalten

在接受皇冠治疗的开切器上自动生成边线的网状隔断除法

2507.22859v1

186

07-30

Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers

Leistungsfähiges Pokémon auf menschlicher Ebene durch skalierbares Offline-Verstärkungslernen mit Transformern

通过与变革者一起进行可缩放的离线强化学习,进行人级竞争Pokémon

2504.04395v2

187

07-30

Synchronization of mean-field models on the circle

Synchronisierung von Mittelwert-Feld-Modellen auf dem Kreis

圆圈中平均场模型同步化

2507.22857v1

188

07-30

Federated Learning on Riemannian Manifolds: A Gradient-Free Projection-Based Approach

Föderiertes Lernen auf Riemannschen Manifolds: Ein gradient-free-Projektion-basierter Ansatz

里伊曼曼字形上的联邦学习:基于渐进、无预测的渐进式项目方法

2507.22855v1

189

07-30

A Bit of Freedom Goes a Long Way: Classical and Quantum Algorithms for Reinforcement Learning under a Generative Model

Ein bisschen Freiheit ist ein langer Weg: Klassische und Quantenalgorithmen zur Stärkung des Lernens unter einem generativen Modell

自由的一段长路:在创举模式下,为强化学习而进行古典和量子分析。

2507.22854v1

190

07-30

Lightweight Online Adaption for Time Series Foundation Model Forecasts

Leichte Online-Anpassung für Time Series Foundation Modellprognosen

时间系列基础基础模型预测

2502.12920v3

191

07-30

FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models

FRED: Finanzielle Retrieval-erweiterte Erkennung und Bearbeitung von Halluzinationen in Sprachmodellen

FRED: 财务检索-加强发现和编辑语言模型中的幻觉

2507.20930v2

192

07-30

Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving

Anwendung des Vision-Language-Modells auf Fußgänger Verhalten und Szeneverständnis im autonomen Fahren

在自主驾驶中将视觉语言模型应用到行人行为和场景理解

2501.06680v2

193

07-30

Decentralized Differentially Private Power Method

Dezentralisierte Differential-Private-Power-Methode

分散分散的、有区别的私用电力方法

2507.22849v1

194

07-30

Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation

Krümmung Dynamischer Black-Box-Angriff: Wiederherstellung der gegnerischen Robustheit durch dynamische Krümmungsschätzung

曲线动态黑盒攻击: 通过动态曲线估计, 重新审视对抗性对称稳健性

2505.19194v2

195

07-30

RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents

RLVMR: Verstärktes Lernen mit überprüfbaren Meta-Reasoning-Belohnungen für robuste Long-Horizon-Agenten

RLVMR: 对强力长森剂采用可核查的可计量可计量的奖赏加强学习

2507.22844v1

196

07-30

Subgrid BoostCNN: Efficient Boosting of Convolutional Networks via Gradient-Guided Feature Selection

Subgrid BoostCNN: Effiziente Steigerung konvolutionärer Netzwerke durch gradient-geführte Feature-Auswahl

Subgrid 启动CNN: 通过渐变引导特性选择有效推动革命网络

2507.22842v1

197

07-30

PAF-Net: Phase-Aligned Frequency Decoupling Network for Multi-Process Manufacturing Quality Prediction

PAF-Net: Phase-Aligned Frequency Entkopplungsnetzwerk für die Qualitätsvorhersage in der Mehrprozessfertigung

PAF-Net:多处理制造质量预测的分阶段统一频率脱钩网络

2507.22840v1

198

07-30

Tapping into the Black Box: Uncovering Aligned Representations in Pretrained Neural Networks

In die Black Box tappen: Uncovering Aligned Representations in Pretrained Neural Networks

进入黑盒:在培训前神经网络中实现统一代表制

2507.22832v1

199

07-30

The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation for Healthcare QA

Die Geometrie der Abfragen: Query-based Innovationen in der retrieval-Augmented Generation für Healthcare QA

查询的几何学:以查询为基础的求取-养代保健创新

2407.18044v2

200

07-30

Repetition Makes Perfect: Recurrent Graph Neural Networks Match Message Passing Limit

Wiederholung macht perfekt: Recurrent Graph Neural Networks Match Message Passing Limit

翻版完美:经常图表神经网络符合信件传递限制

2505.00291v2

201

07-30

Mitigating loss of variance in ensemble data assimilation: machine learning-based and distance-free localization

Verringerung des Varianzverlustes in der Ensembledatenassimilation: maschinelle Learning-basierte und distanzfreie Lokalisierung

减缓在数据共化方面差异的损失:机械学习和远距离本地化

2506.13362v2

202

07-30

Quantifying surprise in clinical care: Detecting highly informative events in electronic health records with foundation models

Quantifying surprise in clinical care: Erkennung von hochinformativen Ereignissen in elektronischen Gesundheitsakten mit Fundamentmodellen

将临床护理的意外事件量化:用基础模型检测电子健康记录中的高度信息化事件

2507.22798v1

203

07-30

Towards the Law of Capacity Gap in Distilling Language Models

Auf dem Weg zum Gesetz der Kapazitätslücke bei der Destillierung von Sprachmodellen

迈向《语文模式再学习能力差距法》

2311.07052v4

204

07-30

Amorphous Solid Model of Vectorial Hopfield Neural Networks

Amorphes solides Modell von Vectorial Hopfield Neural Networks

矢量跳式浮式神经网络固态模型

2507.22787v1

205

07-30

DO-EM: Density Operator Expectation Maximization

DO-EM: Dichte-Operator-Erwartungsmaximierung

DO-EM: 密度操作员预期最大化

2507.22786v1

206

07-30

Effective Non-Random Extreme Learning Machine

Effektive Non-Random Extreme Lernmaschine

有效的非兰地极端学习机

2411.16229v2

207

07-30

Label-free estimation of clinically relevant performance metrics under distribution shifts

Labelfreie Schätzung klinisch relevanter Leistungsmetriken unter Verteilungsverschiebungen

无标签地估算分布转移中与临床相关的绩效衡量指标

2507.22776v1

208

07-30

Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection

Empirische Bewertung von Konzept Drift in ML-basierte Android Malware-Erkennung

基于机器学习的Android恶意软件检测中概念漂移的实证评估

2507.22772v1

209

07-30

The Effect of Stochasticity in Score-Based Diffusion Sampling: a KL Divergence Analysis

Die Wirkung der Stochastik bei der Score-Based Diffusion Sampling: eine KL Divergence Analyse

存储在基于分分数的传播抽样中的效果:KL差异分析

2506.11378v2

210

07-30

Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization

Lehren des Lehrers: Verbesserung der Neuralen Netzwerk-Destillierbarkeit für symbolische Regression durch Jacobian Regularisierung

教师教学:通过雅各的正规化,提高神经网络的可固化性

2507.22767v1

211

07-30

Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models

Bayesische Optimierung von Prozessparametern eines Sensor-basierten Sortiersystems unter Verwendung Gaussischer Prozesse als Surrogate-Modelle

利用高斯进程作为代位模型,优化基于传感器的排序系统的处理参数

2507.22766v1

212

07-30

Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision

Von guten Dämonen und schlechten Engeln: Sichere Kontrolle unter finite Precision garantieren

善魔和坏天使:在有限精密情况下保证安全控制

2507.22760v1

213

07-30

MASCA: LLM based-Multi Agents System for Credit Assessment

MASCA: LLM-basiertes Multi-Agenten-System zur Bonitätsbeurteilung

MASCA: 以LLM为基础的信用评估多边代理系统

2507.22758v1

214

07-30

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Adressierung von Darstellungskollapsen in Vector Quantized Models mit einer linearen Ebene

处理单线层矢量量化模型中代表折叠情况

2411.02038v2

215

07-30

Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning

Bewertungsprüfer: Bewertung der synthetischen Überprüfung für Code und Begründung

标定验证符:评估编码和理由的合成核查

2502.13820v3

216

07-30

RocketStack: Level-aware deep recursive ensemble learning framework with adaptive feature fusion and model pruning dynamics

RocketStack: Level-aware deep rekursive ensemble Learning Framework mit adaptiver Funktion Fusion und Modellschnitt Dynamik

火箭堆: 具有适应性特征聚集和模型排出动态的有意识的深层循环深层共聚学习框架

2506.16965v2

217

07-30

FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation

FLOSS: Kostenloses Mittagessen in Open-Vocabulary Semantic Segmentation

FLOSS: 开放词汇语义分割中的免费午餐

2504.10487v2

218

07-30

Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining

Erhöhung der Ultra-Low-Bit-Quantisierung großer Sprachmodelle durch Saliency-Aware Partial Retraining

通过提高质量-软件部分再培训,加强大语言模型的超低比小量量化

2504.13932v3

219

07-30

Enhanced Prediction of CAR T-Cell Cytotoxicity with Quantum-Kernel Methods

Verbesserte Vorhersage der CAR T-Zell-Zytotoxizität mit Quanten-Kernel-Methoden

采用量子管方法增强对CAR T-Cell Cyt毒性的预测

2507.22710v1

220

07-30

Unsupervised Learning in Echo State Networks for Input Reconstruction

Unüberwachtes Lernen in Echo State Networks für Input-Reconstruction

在回声州投入重建网络中无人监督的学习

2501.11409v4

221

07-30

Inferring biological processes with intrinsic noise from cross-sectional data

Ableitung biologischer Prozesse mit intrinsischem Rauschen aus Querschnittsdaten

从跨部门数据中以内在噪音推断生物过程

2410.07501v2

222

07-30

Unsupervised Learning: Comparative Analysis of Clustering Techniques on High-Dimensional Data

Unüberwachtes Lernen: Vergleichende Analyse von Clustering-Techniken auf hochdimensionalen Daten

未受监督的学习:高数据群集技术比较分析

2503.23215v2

223

07-30

Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Lokale Mischungen von Experten: Im Wesentlichen kostenlose Test-Zeit-Training über Modellverschmelzung

当地专家混合:通过模式合并进行的基本免费试验时间培训

2505.14136v2

224

07-30

Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations

Raumsprache Likelihood Grounding Network für Bayesian Fusion von Mensch-Roboter-Beobachtungen

Bayesian人类-机器人观测融合空间语言定位网络

2507.19947v2

225

07-30

Cluster-Based Random Forest Visualization and Interpretation

Clusterbasierte Random Forest Visualisierung und Interpretation

以集束为基础的随机森林视觉化和解释

2507.22665v1

226

07-30

Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG

Lag nicht, RAG: Training-freie Adversarial Detection mit RAG

不要拉格,RAG:使用RAG进行无训练的反向探测

2504.04858v3

227

07-30

Transductive Model Selection under Prior Probability Shift

Transduktive Modellauswahl unter vorheriger Wahrscheinlichkeitsverschiebung

先前概率变化下的转变模式选择

2507.22647v1

228

07-30

Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction

Sichere Einführung von Offline-Verstärkungslernen über Input Convex-Action-Korrektur

通过投入Convex行动校正安全部署离线强化学习

2507.22640v1

229

07-30

trAIce3D: A Prompt-Driven Transformer Based U-Net for Semantic Segmentation of Microglial Cells from Large-Scale 3D Microscopy Images

trAIce3D: Ein prompt-getriebenes Transformer-basiertes U-Net zur semantischen Segmentierung von Mikrogliazellen aus großformatigen 3D-Mikroskopiebildern

trAIce3D: 一个基于U-Net的快速驱动变形器,用于从大型 3D 显微镜片图像中对微晶体细胞进行语义分解

2507.22635v1

230

07-30

A Unified Analysis of Generalization and Sample Complexity for Semi-Supervised Domain Adaptation

Eine einheitliche Analyse von Generalisierung und Probenkomplexität für halbüberwachte Domain-Anpassung

半监督域适应通用和抽样复杂程度统一分析

2507.22632v1

231

07-30

Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs

Graph Kollaboratives Aufmerksamkeitsnetzwerk für Link-Vorhersage in Wissensgraphen

知识图中预测联系协作关注网络

2507.03947v2

232

07-30

The Cooperative Network Architecture: Learning Structured Networks as Representation of Sensory Patterns

Die kooperative Netzwerkarchitektur: Lernstrukturierte Netzwerke als Darstellung sensorischer Muster

合作网络架构:学习结构网络作为感官模式的体现

2407.05650v4

233

07-30

Skull-stripping induces shortcut learning in MRI-based Alzheimer’s disease classification

Skull-Stripping induziert Shortcut-Lernen in der MRT-basierten Alzheimer-Klassifikation

Skull脱光诱发在以磁RI为基础的阿尔茨海默氏病分类中进行捷径学习。

2501.15831v3

234

07-30

TempRe: Template generation for single and direct multi-step retrosynthesis

TempRe: Template-Generierung für einzelne und direkte Mehrschritt-Retrosynthese

TempRe: 用于单步和直接多步逆合成的模板生成

2507.21762v2

235

07-30

Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction

Komprimierungsmethode für das Deep Diagonal State Space Model basierend auf $H^2$ Optimale Reduktion

基于$H^2$最优约简的深度对角状态空间模型压缩方法

2507.10078v2

236

07-30

Deep learning of geometrical cell division rules

Deep learning von geometrischen Zellteilungsregeln

深入学习几几何细胞分区规则

2507.22587v1

237

07-30

A Mean-Field Theory of $Θ$-Expectations

Eine Mittlere-Feld-Theorie von $Θ$-Erwartungen

$Θ$-期望的平均场理论

2507.22577v1

238

07-30

MLMC-based Resource Adequacy Assessment with Active Learning Trained Surrogate Models

MLMC-basierte Ressourcenadäquatitätsbewertung mit aktiven Learning-Trained-Surrogate-Modellen

基于MLMC的资源充裕度评估,采用主动学习训练的代理模型

2505.20930v2

239

07-30

Hyperbolic Graph Learning: A Comprehensive Review

Hyperbolisches Graphenlernen: Eine umfassende Übersicht

超双曲图学习:全面审查

2202.13852v3

240

07-30

COOkeD: Ensemble-based OOD detection in the era of zero-shot CLIP

COOkeD: Ensemble-basierte OOD-Erkennung im Zeitalter von Zero-Shot CLIP

COOkeD:零样本CLIP时代基于集成的OOD检测

2507.22576v1

241

07-30

Explaining Deep Network Classification of Matrices: A Case Study on Monotonicity

Erklärung der tiefen Netzwerkklassifikation von Matrizen: Eine Fallstudie zur Monotonizität

解释母体深网络分类:单体性案例研究

2507.22570v1

242

07-30

Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning

Effizientes Differentielles Privates Feintuning von LLMs durch Verstärkungslernen

通过强化学习对LLMs 进行有区别的私人高效率私人罚款

2507.22565v1

243

07-30

Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs

Nutzung synergistischer Kognitiv-Biasen zur Umgehung der Sicherheit in LLMs

利用协同协同一致的双星体在LLM中用于绕过安全

2507.22564v1

244

07-30

VAR: Visual Analysis for Rashomon Set of Machine Learning Models’ Performance

VAR: Visuelle Analyse der Leistungsfähigkeit von Rashomon-Modellen

VAR: Rashomon系列机器学习模型的视觉分析

2507.22556v1

245

07-30

DeepC4: Deep Conditional Census-Constrained Clustering for Large-scale Multitask Spatial Disaggregation of Urban Morphology

DeepC4: Deep Conditional Census-Constrained Clustering für großflächige Multitasking-Spatiale Disaggregation der Urbanen Morphologie

DeepC4:面向大规模多任务城市形态空间分解的深度条件人口普查约束聚类

2507.22554v1

246

07-30

RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning

RainbowPrompt: Diversity-enhanced Prompt-Evolving für kontinuierliches Lernen

RainbowPrompt:面向持续学习的多样性增强提示演化

2507.22553v1

247

07-30

Thermodynamics-Inspired Computing with Oscillatory Neural Networks for Inverse Matrix Computation

Thermodynamik-inspiriertes Rechnen mit oszillatorischen Neuronalen Netzwerken für Inverse Matrix Computation

由热动力-受热力启发的计算,与用于反向矩阵计算算法的观测神经网络

2507.22544v1

248

07-30

Pre-trained Models Perform the Best When Token Distributions Follow Zipf’s Law

Vortrainierte Modelle führen das Beste aus, wenn Token-Distributionen Zipfs Gesetz folgen

事先培训的模型按照Zipf法在配制时最佳表现

2507.22543v1

249

07-30

A surrogate model for topology optimisation of elastic structures via parametric autoencoders

Ein Surrogatmodell zur Topologieoptimierung von elastischen Strukturen über parametrische Autoencoder

通过参数自动电解器使弹性结构在地形学上优化的替代模型

2507.22539v1

250

07-30

Accident-Driven Congestion Prediction and Simulation: An Explainable Framework Using Advanced Clustering and Bayesian Networks

Accident-Driven Congestion Prediction and Simulation: Ein erklärbares Framework mit Advanced Clustering und Bayesian Networks

意外 – – 发生时的拥挤预测和模拟:使用先进集束和贝耶斯网络的可解释框架

2507.22529v1

251

07-30

FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression

FGFP: Ein fraktionaler Gaußfilter und Pruning für die Kompression tiefer neuronaler Netze

FGFP: 一个分数高斯过滤器和深神经网络压缩

2507.22527v1

252

07-30

HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data

HGCN(O): Ein selbsttätiges GCN-Hypermodell-Toolkit zur Vorhersage der Ergebnisse in Ereignis-Sequenzdaten

HGCN(O):关于事件序列数据结果预测的自发GCN超模工具箱

2507.22524v1

253

07-30

Rethinking Individual Fairness in Deepfake Detection

Individuelle Fairness in Deepfake Detection neu denken

重新思考个人在深假探测中的公平性

2507.14326v2

254

07-30

SmilesT5: Domain-specific pretraining for molecular language models

SmilesT5: Domainspezifische Vorausbildung für molekulare Sprachmodelle

微笑T5:具体领域分子语言模型预培训

2507.22514v1

255

07-30

AlphaDent: A dataset for automated tooth pathology detection

AlphaDent: Ein Datensatz für automatisierte Zahnpathologie-Erkennung

AlphaDent:用于自动检测牙齿病理学的数据集

2507.22512v1

256

07-30

Geometry of nonlinear forecast reconciliation

Geometrie der nichtlinearen Vorhersageabgleichung

非线性预测对账的几何测量

2507.22500v1

257

07-30

LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning

LoReUn: Daten selbst liefern implizit Hinweise zur Verbesserung des maschinellen Verlernens

LoReUn:数据本身隐式地提供了改进机器遗忘的线索

2507.22499v1

258

07-30

Reconstructing Historical Climate Fields With Deep Learning

Rekonstruieren von historischen Klimafeldern mit tiefem Lernen

重建历史气候领域与深学习

2311.18348v2

259

07-30

Emergence of Quantised Representations Isolated to Anisotropic Functions

Entstehung quantifizierter Repräsentationen isoliert mit anisotropen Funktionen

孤立到非尼斯代职能的量化代表的出现情况

2507.12070v2

260

07-30

Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation

Eigentumsverifizierung von DNN-Modellen mit White-Box-Adversarial-Angriffen mit spezifizierter Wahrscheinlichkeitsmanipulation

DNN 使用白毒对反对反对性袭击模式进行指定概率操纵的DNN自有性核查

2505.17579v3

261

07-30

Probing Information Distribution in Transformer Architectures through Entropy Analysis

Probing Information Distribution in Transformer-Architekturen durch Entropie-Analyse

通过 Entropy 分析在变形结构中进行测试信息发布

2507.15347v2

262

07-30

LVM-GP: Uncertainty-Aware PDE Solver via coupling latent variable model and Gaussian process

LVM-GP: Unsicherer PDE Solver über Kopplung latent variables Modell und Gaußschen Prozess

LVM-GP:通过混合潜潜伏变量模型和Gaussian过程的不确定性-软件PDE溶解器

2507.22493v1

263

07-30

Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data

Proto-EVFL: Verbessertes vertikales Federated Learning über Dual Prototype mit extrem ungebundenen Daten

Proto-EVFL:通过双重原型在数据极度未对齐的情况下增强纵向联邦学习

2507.22488v1

264

07-30

Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence

Konvergenzeigenschaften der natürlichen Gradientenablassung zur Minimierung der KL-Divergenz

最小化 KL 差异的自然渐变源的趋同属性

2504.19259v2

265

07-30

The Ball-Proximal (=”Broximal”) Point Method: a New Algorithm, Convergence Theory, and Applications

Die Kugel-Proximal (=”Broximal”) Punktmethode: ein neuer Algorithmus, Konvergenztheorie und Anwendungen

Ball- Proximal (=“ 布鲁克马” ) 点法: 新的算法、趋同理论和应用

2502.02002v2

266

07-30

Visual Language Models as Zero-Shot Deepfake Detectors

Visuelle Sprachmodelle als Zero-Shot Deepfake Detektoren

视觉语言模型,作为零热深假探测器

2507.22469v1

267

07-30

Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework

Auf dem Weg zu einer interpretierbaren Renal Health-Prognose über Multi-LMM-Kollaboratives Reasoning-Framework

通过多伦多和多伦多MM合作理由框架,迈向可解释性中时健康下降预测

2507.22464v1

268

07-30

SDBA: A Stealthy and Long-Lasting Durable Backdoor Attack in Federated Learning

SDBA: Ein verdeckter und langlebiger Backdoor-Angriff im Federated Learning

SDBA: 联邦学习中的隐秘和长期持久的后门攻击

2409.14805v2

269

07-30

Trajectory First: A Curriculum for Discovering Diverse Policies

Trajektorie zuerst: Ein Curriculum für die Entdeckung unterschiedlicher Politiken

轨迹第一:发现多样化政策课程

2506.01568v2

270

07-30

Strategic Integration of Artificial Intelligence in the C-Suite: The Role of the Chief AI Officer

Strategische Integration der Künstlichen Intelligenz in die C-Suite: Die Rolle des Chief AI Officer

人工智能在最高管理层(C-Suite)中的战略整合:首席AI官的角色

2407.10247v2

271

07-30

A case for data valuation transparency via DValCards

Ein Fall für Datenbewertungstransparenz über DValCards

通过 DValCards 提高数据估价透明度的一个案例

2506.23349v2

272

07-30

Breaking Obfuscation: Cluster-Aware Graph with LLM-Aided Recovery for Malicious JavaScript Detection

Breaking Obfuscation: Cluster-Aware Graph mit LLM-gestützte Erholung für bösartige JavaScript-Erkennung

破解混淆:结合LLM辅助恢复的聚类感知图用于恶意JavaScript检测

2507.22447v1

273

07-30

RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function

RCR-AF: Verbesserung der Modellverallgemeinerung durch Rademacher-Komplexitätsreduktions-Aktivierungsfunktion

RCR-AF:通过雷德马赫赫减少复杂程度减少激活功能加强示范性一般化

2507.22446v1

274

07-30

RANA: Robust Active Learning for Noisy Network Alignment

RANA: Robustes aktives Lernen für geräuschreiche Netzwerkausrichtung

RANA: 大力积极学习,促进吵闹网络对齐

2507.22434v1

275

07-30

Comparing Normalizing Flows with Kernel Density Estimation in Estimating Risk of Automated Driving Systems

Vergleich der Normalisierungsströme mit der Schätzung der Kerneldichte bei der Schätzung des Risikos Automatisierter Fahrsysteme

在估计自动驱动系统的风险时,将正常流动与内核密度量估计值与内核密度量的标准化对比

2507.22429v1

276

07-30

Theoretical Analysis of Relative Errors in Gradient Computations for Adversarial Attacks with CE Loss

Theoretische Analyse von relativen Fehlern bei gradienten Berechnungen für Adversarialangriffe mit CE-Verlust

CE损失反向攻击的渐进计算中的相对误差理论分析

2507.22428v1

277

07-30

Multimodal Late Fusion Model for Problem-Solving Strategy Classification in a Machine Learning Game

Multimodales Late-Fusion-Modell für Problemlösungsstrategie-Klassifizierung in einem Machine-Learning-Spiel

机器学习游戏中解决问题战略分类的多模式晚期融合模式

2507.22426v1

278

07-30

Bridging Privacy and Robustness for Trustworthy Machine Learning

Überbrückung von Privatsphäre und Robustheit für vertrauenswürdiges maschinelles Lernen

连接隐私和强力,促进可信赖的机器学习

2403.16591v5

279

07-30

Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance

Spec-VLA: Spekulative Dekodierung für Vision-Language-Action-Modelle mit entspannter Akzeptanz

Spec-VLA:放宽接受的愿景-语言-行动模式的投机代号

2507.22424v1

280

07-30

Neural Networks as Universal Finite-State Machines: A Constructive ReLU Simulation Framework for NFAs

Neurale Netzwerke als universelle Finite-State-Maschinen: Ein konstruktives ReLU-Simulations-Framework für NFAs

神经网络作为通用有限状态机:面向NFA的构造性ReLU模拟框架

2505.24110v2

281

07-30

SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

SmallThinker: Eine Familie von effizienten großen Sprachmodellen Natively Trained for Local Deployment

小规模:一个由本地培训的高效大语言模式组成的家庭,供当地部署使用

2507.20984v2

282

07-30

Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis

Föderierte, distributiv robuste Optimierung mit nicht konvexen Zielen: Algorithmen und Analysen

与非Convex目标优化的联邦分布强度优化:等级和分析

2307.14364v2

283

07-30

FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization

FedCVD++: Kommunikationseffizientes Federated Learning für kardiovaskuläre Risikovorhersage mit parametrischer und nicht parametrischer Modelloptimierung

FedCVD++: 具有参数和非参数模型优化的心血管风险预测通信-高效联邦学习

2507.22963v1

284

07-30

MINR: Implicit Neural Representations with Masked Image Modelling

MINR: Implizite neuronale Repräsentationen mit maskierter Bildmodellierung

MINR:带有蒙面图像建模的隐性神经图示

2507.22404v1

285

07-30

OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

OpenEarthSensing: Großformatiger, feinkörniger Benchmark für Open-World Remote Sensing

开放地球传感器:开放世界遥感大型精细基准

2502.20668v2

286

07-30

Gems: Group Emotion Profiling Through Multimodal Situational Understanding

Gems: Gruppen-Emotionsprofilierung durch multimodales Situationsverständnis

Gems:通过多模式情况理解来分析群体情感

2507.22393v1

287

07-30

Outcome-based Reinforcement Learning to Predict the Future

Ergebnisbasiertes Verstärkungslernen zur Vorhersage der Zukunft

基于成果的强化学习,以预测未来

2505.17989v3

288

07-30

PATENTWRITER: A Benchmarking Study for Patent Drafting with LLMs

PATENTWRITER: Eine Benchmarking-Studie für die Patenterstellung mit LLMs

PATENTWRITER: 专利起草基准研究与LLMs

2507.22387v1

289

07-30

Multi-Hazard Early Warning Systems for Agriculture with Featural-Temporal Explanations

Multi-Hazard Frühwarnsysteme für die Landwirtschaft mit featured-Temporal Erklärungen

多危险农业预警系统及时/时解释

2507.22962v1

290

07-30

OWLViz: An Open-World Benchmark for Visual Question Answering

OWLViz: Ein Open-World-Benchmark für visuelle Fragen

OWLViz:视觉问答的开放世界基准

2503.07631v3

291

07-30

Set Invariance with Probability One for Controlled Diffusion: Score-based Approach

Invarianz mit Probability One für kontrollierte Diffusion einstellen: Score-basierter Ansatz

设定控制下扩散的概率一的变量一:计分法

2507.22385v1

292

07-30

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

MAVFlow: Paralinguistische Elemente mit konditionellem Fluss erhalten, passend für blitzfreie AV2AV Mehrsprachige Übersetzung

MAVFlow: 将语言要素与条件流动相匹配的配方元素保留在零点热AV2AV多语种翻译中

2503.11026v2

293

07-30

Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations

Verbesserung der Verallgemeinerung Fähigkeit des Roboterimitationslernens durch Lösung von Kausalverwirrung in Beobachtungen

通过解决观测中的原因融合,提高机器人模拟学习的普遍化能力

2507.22380v1

294

07-30

Year-over-Year Developments in Financial Fraud Detection via Deep Learning: A Systematic Literature Review

Jährliche Entwicklungen bei der Erkennung von Finanzbetrug durch Deep Learning: Ein systematischer Literaturbericht

《通过深学习侦查金融欺诈:系统文学审查》年年发展动态

2502.00201v2

295

07-30

Prediction of acoustic field in 1-D uniform duct with varying mean flow and temperature using neural networks

Vorhersage des akustischen Feldes im 1-D-Uniformkanal mit unterschiedlichem mittleren Durchfluss und Temperatur über neuronale Netze

使用神经网络以不同平均流量和温度的1D级统一管道声学场预测

2507.22370v1

296

07-30

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

BlockFFN: Auf dem Weg zur End-Side Acceleration-Friendly Mixture-of-Experts mit Chunk-Level-Aktivierung Sparsity

块块FFN: 向具有整块级激活分级的终端- 双极加速- 友好混合混合专家方向

2507.08771v2

297

07-30

($\boldsymbol{\theta}_l, \boldsymbol{\theta}_u$)-Parametric Multi-Task Optimization: Joint Search in Solution and Infinite Task Spaces

($\boldsymbol{\theta}_l, \boldsymbol{\theta}_u$)-Parametrische Multi-Task-Optimierung: Gemeinsame Suche in Lösungs- und unendlichen Aufgabenräumen

($\boldsymbol{\theta}_l, \boldsymbol{\theta}_u$)-参数化多任务优化:在解空间和无限任务空间中联合搜索

2503.08394v3

298

07-30

MSQ: Memory-Efficient Bit Sparsification Quantization

MSQ: Speichereffiziente Bit Sparsifikation Quantisierung

MSQ: 内存效率比分分量化

2507.22349v1

299

07-30

Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning

Nutzung von großen Sprachmodellen für Bengalische Mathematik-Wort-Probleme bei der Lösung der Kette der Gedankenveranlagung

利用大语言模型解决孟加拉语数学字词与思维链理性的解决问题

2505.21354v2

300

07-30

Koopman-Based Generalization of Deep Reinforcement Learning With Application to Wireless Communications

Koopman-basierte Verallgemeinerung des Deep Reinforcement Learning mit Anwendung in der drahtlosen Kommunikation

以Koopman为基础的深强化学习通用,应用于无线通信

2503.02961v2

301

07-30

Robust Filtering and Learning in State-Space Models: Skewness and Heavy Tails Via Asymmetric Laplace Distribution

Robustes Filtern und Lernen in State-Space-Modellen: Skewness und Heavy Tails via Asymmetrische Laplace-Distribution

州空间模型中的强力过滤和学习:扭曲和重尾体通过反对称拉皮板分布

2507.22343v1

302

07-30

A Semi-Supervised Federated Learning Framework with Hierarchical Clustering Aggregation for Heterogeneous Satellite Networks

Ein semi-überwachtes Federated Learning Framework mit Hierarchical Clustering Aggregation für heterogene Satellitennetzwerke

半上层联邦学习框架,包括异源卫星网络的等级集群聚合

2507.22339v1

303

07-30

Parametrized Multi-Agent Routing via Deep Attention Models

Parametrisiertes Multi-Agent Routing über Deep Attention Modelle

基于深度注意力模型的参数化多智能体路由

2507.22338v1

304

07-30

HypKG: Hypergraph-based Knowledge Graph Contextualization for Precision Healthcare

HypKG: Hypergraph-basierte Wissensgrafik Kontextualisierung für Precision Healthcare

HYPKG: 精密保健基于地平线知识图背景情况

2507.19726v2

305

07-30

Hypernetworks for Model-Heterogeneous Personalized Federated Learning

Hypernetzwerke für modell-heterogenes personalisiertes Federated Learning

模拟异异异性个性化联邦学习超级网络

2507.22330v1

306

07-30

FAST: An Optimization Framework for Fast Additive Segmentation in Transparent ML

FAST: Ein Optimierungsrahmen für schnelle Additive Segmentierung in Transparent ML

FAST: 透明 ML 快速添加分割的最佳框架

2402.12630v2

307

07-30

Provable Low-Frequency Bias of In-Context Learning of Representations

Beweisbarer Niederfrequenz-Bias des In-Context-Lernens von Repräsentationen

可实现的低公平率代表制的理论内学习

2507.13540v2

308

07-30

Floating-Point Neural Networks Are Provably Robust Universal Approximators

Gleitkomma-Neuronale-Netzwerke sind beweisbar robuste universelle Approximatoren

浮动点神经网络具有可可预见强健的通用通用近似器

2506.16065v2

309

07-30

Scientific Machine Learning with Kolmogorov-Arnold Networks

Wissenschaftliches maschinelles Lernen mit Kolmogorov-Arnold-Netzwerken

Kolmogorov-Arnold网络的科学机器学习

2507.22959v1

310

07-30

The challenge of hidden gifts in multi-agent reinforcement learning

Die Herausforderung der versteckten Gaben in Multi-Agenten-Verstärkung Lernen

多试剂强化学习中隐藏礼品的挑战

2505.20579v3

311

07-30

BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems

BEACON: Eine Bayesische Optimierungsstrategie für Neuheitssuche in teuren Black-Box-Systemen

BEACON: 昂贵的黑箱系统新奇搜索贝叶斯最佳最佳战略

2406.03616v4

312

07-30

Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training

Wavelet trifft Adam: Komprimierende Gradienten für Gedächtnis-Effizientes Training

Wavelet Meets Adam:将逐步压缩用于记忆效率培训

2501.07237v3

313

07-30

Decoding Neural Signatures of Semantic Evaluations in Depression and Suicidality

Dekodierung neuraler Signaturen semantischer Bewertungen in Depression und Suizidalität

萧条和自相残杀情况下语义评价解码神经签名

2507.22313v1

314

07-30

An Asynchronous Decentralised Optimisation Algorithm for Nonconvex Problems

Ein asynchroner dezentralisierter Optimierungsalgorithmus für nichtkonvexe Probleme

非经济问题非集中分散化最佳优化比值

2507.22311v1

315

07-30

High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data

High-Resolution Live Fuel Moisture Content (LFMC) Karten für Wildfire-Risiko aus multimodalen Erdbeobachtungsdaten

多式地球观测数据产生的野火风险高分辨率活燃料动力内容地图

2506.20132v2

316

07-30

Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation

Vergleich von Cluster-basierten Cross-Validation-Strategien für die Bewertung von Machine Learning-Modellen

比较用于机械学习模式评价的集群交叉评估战略

2507.22299v1

317

07-29 (2)

AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data

AlphaEarth Foundations: Ein eingebettetes Feldmodell für genaue und effiziente globale Kartierung aus spärlichen Etikettendaten

阿尔法地球基金会:利用稀少标签数据进行准确、高效全球制图的嵌入实地模型

2507.22291v1

318

07-29

Intent Recognition and Out-of-Scope Detection using LLMs in Multi-party Conversations

Intent Recognition und Out-of-Scope-Erkennung mit LLMs in Multi-Party-Konversationen

在多方对话中使用LLMs进行意图识别和范围外检测

2507.22289v1

319

07-29

CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam

CHECK-MAT: Überprüfung von handschriftlichen mathematischen Antworten für die russische Unified State Prüfung

CHECK-MAT: 检查俄罗斯统一国家考试的手写数学答案

2507.22958v1

320

07-29

An Introduction to Modern Statistical Learning

Eine Einführung in das moderne statistische Lernen

现代统计学习介绍

2207.10185v2

321

07-29

HOG-CNN: Integrating Histogram of Oriented Gradients with Convolutional Neural Networks for Retinal Image Classification

HOG-CNN: Integration des Histogramms orientierter Gradienten mit konvolutionären Neuralnetzwerken für die Retinalbildklassifikation

HOG-CNN:将方向梯度直方图与卷积神经网络整合用于视网膜图像分类

2507.22274v1

322

07-29

The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data

Die Bedeutung des Diskreten seins: Messung der Auswirkungen der Diskretisierung in End-to-End-Differentially Private Synthetic Data

差异的重要性:衡量端至端端差异性私人合成数据中差异化的影响

2504.06923v3

323

07-29

Weighted Conditional Flow Matching

Gewichteter Bedingter Fluss passend

加权有条件流动匹配

2507.22270v1

324

07-29

Agent-centric learning: from external reward maximization to internal knowledge curation

Agentzentriertes Lernen: von der externen Belohnungsmaximierung bis zur internen Wissenskuration

以代理人为中心的学习:从外部奖励最大化到内部知识整理

2507.22255v1

325

07-29

Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields

Vollständig datengetriebene inverse Hyperelastizität mit hyper-network neuronalen ODE-Feldern

由全数据驱动的全数据驱动的超反超弹性,具有超网络神经极极光字段

2506.08146v2

326

07-29

Using Scaling Laws for Data Source Utility Estimation in Domain-Specific Pre-Training

Verwendung von Skalierungsgesetzen für Datenquellen-Utility-Schätzung im Domain-Spezifischen Pre-Training

在具体区域培训前使用数据源实用性估算法

2507.22250v1

327

07-29

LLM-Assisted Cheating Detection in Korean Language via Keystrokes

LLM-Assisted Cheating Detection in koreanischer Sprache über Tastenanschläge

通过击键进行LLM辅助的韩语作弊检测

2507.22956v1

328

07-29

Understanding Concept Drift with Deprecated Permissions in Android Malware Detection

Verständnis Konzept Drift mit veralteten Berechtigungen in Android Malware-Erkennung

通过已弃用权限理解 Android 恶意软件检测中的概念漂移

2507.22231v1

329

07-29

TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction

TRIBE: TRImodaler Gehirnencoder für Vollhirn fMRI-Antwortvorhersage

TRIBE:用于全脑FMRI反应预测的三元脑大脑编码器

2507.22229v1

330

07-29

LLMs Between the Nodes: Community Discovery Beyond Vectors

LLMs zwischen den Knoten: Community Discovery Beyond Vectors

节点之间的LLMs:除矢量之外的社区发现

2507.22955v1

331

07-29

Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

Kontrastive Test-Zeit-Zusammensetzung mehrerer LoRA-Modelle für die Bildgenerierung

图像生成多种LORA模型的反向测试时间构成

2403.19776v2

332

07-29

Explainability-Driven Feature Engineering for Mid-Term Electricity Load Forecasting in ERCOT’s SCENT Region

Erklärbarkeitsgetriebene Feature-Engineering für mittelfristige Stromlastprognosen in der SCENT-Region von ERCOT

ERCOT地区中期电力负载预报的可解释性-变化式地貌工程

2507.22220v1

333

07-29

Representation biases: will we achieve complete understanding by analyzing representations?

Repräsentationsvoreingenommenheiten: Werden wir durch die Analyse von Repräsentationen ein vollständiges Verständnis erreichen?

代表性偏差:我们能否通过分析表述来实现完全理解?

2507.22216v1

334

07-29

Neural Autoregressive Modeling of Brain Aging

Neurale Autoregressive Modellierung des Gehirnalterns

脑老龄化神经自动递减建模

2507.22954v1

335

07-29

Intent-Aware Neural Query Reformulation for Behavior-Aligned Product Search

Intent-Aware Neural Query Reformulation für verhaltensorientierte Produktsuche

用于行为自动产品搜索的内在软件元件神经查询重新校正

2507.22213v1

336

07-29

Graph-Based Uncertainty-Aware Self-Training with Stochastic Node Labeling

Graphenbasiertes unsicheres Selbsttraining mit stochastischem Knoten-Etikettierung

以图形为基础的不确定性软件自训练与斯托卡节点标签

2503.22745v2

337

07-29

Uncertainty-Aware Graph Self-Training with Expectation-Maximization Regularization

Unsicheres Graphen-Selbst-Training mit Erwartungsmaximierung Regularisierung

具有预期-最大程度正规化的不确定性-软件图自我培训

2503.22744v2

338

07-29

Adaptive State-Space Mamba for Real-Time Sensor Data Anomaly Detection

Adaptive State-Space Mamba für Echtzeit-Sensordatenanomalienerkennung

用于实时传感器数据异常探测的适应性国家空间Mamba

2503.22743v2

339

07-29

Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data

Gemeinsam besser: Kreuz- und Gelenkkovarianzen verbessern die Erkennung von Signalen in unterprobierten Daten

更好:交叉和共同变量加强未充分抽样数据中的信号可探测性

2507.22207v1

340

07-29

CTG-Insight: A Multi-Agent Interpretable LLM Framework for Cardiotocography Analysis and Classification

CTG-Insight: Multi-Agent Interpretable LLM Framework für Kardiotokographie Analyse und Klassifizierung

CTG-Insight:多智能体可解释LLM心电图学分析和分类框架

2507.22205v1

341

07-29

KIX: A Knowledge and Interaction-Centric Metacognitive Framework for Task Generalization

KIX: Ein Wissen und Interaktion-Zentrisches Metakognitives Framework für die Aufgabenverallgemeinerung

KIX: 任务一般化的知识和互动中心元化框架

2402.05346v3

342

07-29

Measuring Time-Series Dataset Similarity using Wasserstein Distance

Messung der Zeitreihen-Datensätze Ähnlichkeit mit Wasserstein-Abstand

利用瓦瑟斯坦距离测量时间序列数据集的相似性

2507.22189v1

343

07-29

A Scalable Pipeline for Estimating Verb Frame Frequencies Using Large Language Models

Eine skalierbare Pipeline zur Schätzung von Verb Frame Frequenzen mit großen Sprachmodellen

使用大语言模型估算 Verb 框架频谱的可缩放管道

2507.22187v1

344

07-29

SourceSplice: Source Selection for Machine Learning Tasks

SourceSplice: Auswahl der Quellen für Aufgaben des maschinellen Lernens

源代码Splice: 机器学习任务源选择

2507.22186v1

345

07-29

SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis

SyncDiff: Synchronisierte Bewegung Diffusion für Multi-Body Mensch-Objekt-Interaktion Synthese

同步Diff: 用于多波人-物体相互作用合成的同步运动扩散

2412.20104v5

346

07-29

Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration

Gestapelte SVD oder SVD gestapelt? Eine Random Matrix Theorie Perspektive auf Datenintegration

堆叠的 SVD 还是堆叠的 SVD ? 关于数据整合的随机矩阵理论视角

2507.22170v1

347

07-29

Distributional Unlearning: Forgetting Distributions, Not Just Samples

Distributionales Verlernen: Verteilungen vergessen, nicht nur Stichproben

分发的不学习:忘记分发,而不仅仅是抽样

2507.15112v2

348

07-29

When Truthful Representations Flip Under Deceptive Instructions?

Wenn wahrheitsgetreue Darstellungen unter trügerische Anweisungen fallen?

当真相代表在欺骗性指令下翻转时?

2507.22149v1

349

07-29

MOSS: Multi-Objective Optimization for Stable Rule Sets

MOSS: Multi-Objektive Optimierung für stabile Regelsätze

MOSS: 稳定规则集的多目标优化

2506.08030v2

350

07-29

Automated Label Placement on Maps via Large Language Models

Automatische Etikettenplatzierung auf Karten über große Sprachmodelle

通过大语言模型在地图上自动贴贴标签

2507.22952v1

351

07-29

Foundation Models for Demand Forecasting via Dual-Strategy Ensembling

Grundlagenmodelle für die Nachfrageprognose über Dual-Strategy-Ensembling

通过双战略组合进行需求预测的基础模型

2507.22053v1

352

07-29

Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

Extrahieren von interpretierbaren Modellen aus Baumensembles: Computational and Statistical Perspectives

从树形集合中提取解释模型:计算和统计视角

2506.20114v3

353

07-29

Weight-Parameterization in Continuous Time Deep Neural Networks for Surrogate Modeling

Gewicht-Parameterisierung in kontinuierlichen Zeittiefen Neuronalen Netzwerken für Surrogatmodellierung

用于代用建模的连续时间深心神经网络中的重量光度计

2507.22045v1

354

07-29

Compton Form Factor Extraction using Quantum Deep Neural Networks

Compton Form Factor Extraction mit Hilfe von Quantum Deep Neural Networks

使用量子深神经网络抽取 Compton 窗体系数

2504.15458v2

355

07-29

Structure-Informed Deep Reinforcement Learning for Inventory Management

Strukturinformiertes Deep Verstärkungslernen für das Bestandsmanagement

为库存管理进行结构化深强化学习

2507.22040v1

356

07-29

SAKE: Steering Activations for Knowledge Editing

SAKE: Steuerung von Aktivierungen für die Wissensbearbeitung

SAKE:用于知识编辑的激活引导

2503.01751v2

357

07-29

Supervised Quantum Image Processing

Überwachte Quantenbildverarbeitung

监督量子图像处理

2507.22039v1

358

07-29

UserBench: An Interactive Gym Environment for User-Centric Agents

UserBench: Eine interaktive Gym-Umgebung für User-Centric-Agenten

用户 Bench: 用户中心代理器的交互式 Gym 环境

2507.22034v1

359

07-29

Classification of Honey Botanical and Geographical Sources using Mineral Profiles and Machine Learning

Klassifizierung von Honig Botanical und Geografische Quellen mit Mineralprofilen und maschinellem Lernen

利用矿物概况和机器学习对蜂蜜植物和地理来源进行分类

2507.22032v1

360

07-29

Persistent Backdoor Attacks in Continual Learning

Persistente Hintertürangriffe im kontinuierlichen Lernen

持续学习中的持续后门攻击

2409.13864v3

361

07-29

Exploring the Stratified Space Structure of an RL Game with the Volume Growth Transform

Erforschung der Stratifizierten Raumstruktur eines RL-Spiels mit der Volume Growth Transform

探索与量增长变换的RL游戏的分流空间结构

2507.22010v1

362

07-29

An $\tilde{O}$ptimal Differentially Private Learner for Concept Classes with VC Dimension 1

Ein $\tilde{O}$ptimal Differential Private Learner für Konzeptklassen mit VC Dimension 1

面向 VC 维为 1 的概念类的 $\tilde{O}$ptimal 差分隐私学习器

2505.06581v2

363

07-29

Staining and locking computer vision models without retraining

Staining und Verriegelung von Computer Vision-Modelle ohne Umschulung

不经再培训而将计算机视觉模型固定和封闭

2507.22000v1

364

07-29

Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation

Lehre mich zu Trick: Erforschen von zweifelhafter Übertragbarkeit durch Wissensdestillation

教我变作:探索通过知识蒸馏来进行逆向转让

2507.21992v1

365

07-29

Higher-Order Kuramoto Oscillator Network for Dense Associative Memory

Höhere Ordnung Kuramoto Oszillator Netzwerk für Dense Assoziative Speicher

高端仓本聚合内存振动加速器网络

2507.21984v1

366

07-29

Improving Generative Ad Text on Facebook using Reinforcement Learning

Verbesserung des generativen Ad-Texts auf Facebook mit Verstärkungslernen

利用强化学习改善脸书上的创创创广告

2507.21983v1

367

07-29

Thou Shalt Not Prompt: Zero-Shot Human Activity Recognition in Smart Homes via Language Modeling of Sensor Data & Activities

Thou Shalt Not Prompt: Zero-Shot menschliche Aktivitätserkennung in Smart Homes durch Sprachmodellierung von Sensordaten & Aktivitäten

Thou Shalt Not Prompt:通过传感器数据和活动的语言建模,在智能家居中实现零样本人类活动识别

2507.21964v1

368

07-29

SLA-Centric Automated Algorithm Selection Framework for Cloud Environments

SLA-Centric automatisierte Algorithmenauswahl-Framework für Cloud-Umgebungen

SLA-Centric 云层环境自动测算选择框架

2507.21963v1

369

07-29

Ensuring Medical AI Safety: Interpretability-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data

Gewährleistung der Sicherheit von medizinischer KI: Interpretierbarkeitsgestützte Erkennung und Minderung von fehlerhaftem Modellverhalten und zugehörigen Daten

确保医疗AI安全:可解释性-驱动性探测和减少污秽模型行为和相关数据

2501.13818v2

370

07-29

DeepGo: Predictive Directed Greybox Fuzzing

深度Go:预测方向灰盒模糊

2507.21952v1

371

07-29

Simulating Posterior Bayesian Neural Networks with Dependent Weights

Simulation von hinteren bayesischen neuralen Netzwerken mit abhängigen Gewichten

模拟具有依附体重量的波别海湾神经网络

2507.22095v1

372

07-29

Multi-state Protein Design with DynamicMPNN

Multi-State Protein Design mit DynamicMPNN

使用 DynamicMPNN 的多状态蛋白质设计

2507.21938v1

373

07-29

Linear Stability Analysis of Physics-Informed Random Projection Neural Networks for ODEs

Lineare Stabilitätsanalyse der physikinformierten Zufallsprojektion Neurale Netzwerke für ODEs

极光体物理集成随机投射神经网络的线性稳定性分析

2408.15393v2

374

07-29

SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs

SmoothRot: Kombination von kanalweiser Skalierung und Rotation für quantisierungsfreundliche LLMs

平滑旋转: 将频道- Wise 缩放和旋转组合起来, 用于量化- 友好型LLMS

2506.05413v2

375

07-29

SLR: Automated Synthesis for Scalable Logical Reasoning

SLR: Automatisierte Synthese für skalierbare logische Vernunft

SLR: 用于可缩放逻辑理由的自动合成

2506.15787v3

376

07-29

HiPreNets: High-Precision Neural Networks through Progressive Training

HiPreNets: Hochpräzisions-Neuralnetzwerke durch progressives Training

HiPreNets:通过渐进培训建立高精度神经网络

2506.15064v2

377

07-29

Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks

Generalists vs. Specialists: Bewertung von LLMs auf hochkonzentrierten biophysikalischen Sequenzoptimierungsaufgaben

通才与专家:评估高度约束生物物理序列优化任务中受高度约束的生物物理序列优化任务LLMs

2410.22296v5

378

07-29

TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

TESSERA: Temporale Einbettungen von Oberflächenspektren für die Darstellung und Analyse der Erde

TESSERA:用于地球代表和分析的地平面表面表层实时嵌入

2506.20380v3

379

07-29

Evaluating Deepfake Detectors in the Wild

Bewertung von Deepfake-Detektoren in der Wildnis

评估野生深假探测器

2507.21905v1

380

07-29

Receding Hamiltonian-Informed Optimal Neural Control and State Estimation for Closed-Loop Dynamical Systems

Receding Hamiltonian-informed Optimal Neural Control und State Abschätzung für Closed-Loop Dynamical Systems

正在消退汉密尔顿式密密尔顿式内装最佳神经控制以及闭闭棒动态系统的国家估计

2411.01297v3

381

07-29

Can sparse autoencoders make sense of gene expression latent variable models?

Können spärliche Autoencoder die Genexpression latent variabler Modelle sinnvoll machen?

稀有的自动代碼器能理解基因表达潜在的变异模型吗?

2410.11468v3

382

07-29

Reducing Data Requirements for Sequence-Property Prediction in Copolymer Compatibilizers via Deep Neural Network Tuning

Reduzierung der Datenanforderungen für die Sequence-Property-Prognose in Copolymer-Compatibilizern über Deep Neural Network Tuning

通过深神经网络图案减少通过深神经网络图案对聚合聚合复合集成器的序列-财产预测数据要求

2507.21902v1

383

07-29

LLM-based Content Classification Approach for GitHub Repositories by the README Files

LLM-basierter Content-Klassifikationsansatz für GitHub-Repositories durch die README-Dateien

README 文件中基于LLM的GitHub储存库内容分类方法

2507.21899v1

384

07-29

Cardiovascular Disease Prediction using Machine Learning: A Comparative Analysis

Kardiovaskuläre Krankheitsvorhersage mit maschinellem Lernen: Eine vergleichende Analyse

利用机器学习对心血管疾病进行预测:比较分析

2507.21898v1

385

07-29

A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges

Eine Rezension über Selbstüberwachtes Lernen für Zeitreihenanomalienerkennung: Neuere Fortschritte und offene Herausforderungen

《反反常探测:最新进展和公开挑战》时间序列自我监督学习回顾

2501.15196v2

386

07-29

Data-driven quantum Koopman method for simulating nonlinear dynamics

Datengesteuerte Quantenkoopman-Methode zur Simulation nichtlinearer Dynamik

模拟非线性动态的由数据驱动的量量 Koopman 方法

2507.21890v1

387

07-29

Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions

Puzzle-Ähnlichkeit: Ein perzeptuell geführtes Cross-Reference-Metrikum für Artefakterkennung in 3D-Szenenrekonstruktionen

3D 场景重建中个体行为探测概念引导交叉参考参考度量

2411.17489v3

388

07-29

Prediction accuracy versus rescheduling flexibility in elective surgery management

Vorhersagegenauigkeit versus Anpassungsflexibilität im elektiven chirurgischen Management

在选修外科管理方面,预测准确性与重新安排灵活性

2507.15566v2

389

07-29

Context-Aware Probabilistic Modeling with LLM for Multimodal Time Series Forecasting

Context-Aware Probabilistische Modellierung mit LLM für multimodale Zeitreihenvorhersage

基于LLM的上下文感知概率建模,用于多模态时间序列预测

2505.10774v2

390

07-29

Representations in vision and language converge in a shared, multidimensional space of perceived similarities

Repräsentationen in Vision und Sprache konvergieren in einem geteilten, mehrdimensionalen Raum wahrgenommener Ähnlichkeiten

视觉和语言代表在共同的、多层面的、有共同感知的相似性空间中汇合在一起

2507.21871v1

391

07-29

Discovering Interpretable Ordinary Differential Equations from Noisy Data

Das Entdecken interpretierbarer gewöhnlicher Differentialgleichungen aus Noisy-Daten

从噪音数据中发现可解释的普通差异

2507.21841v1

392

07-29

Analysis of Fourier Neural Operators via Effective Field Theory

Analyse von Fourier-Neuraloperatoren über Effektive Feldtheorie

通过有效场论分析傅里叶神经算子

2507.21833v1

393

07-29

Introducing HALC: A general pipeline for finding optimal prompting strategies for automated coding with LLMs in the computational social sciences

Einführung von HALC: Eine allgemeine Pipeline für die Suche nach optimalen Promptenstrategien für die automatisierte Codierung mit LLMs in den Computational Social Sciences

介绍HALC:寻找计算社会科学中与LLMs自动编码的最佳加速战略的一般管道

2507.21831v1

394

07-29

EEG-CLIP : Learning EEG representations from natural language descriptions

EEG-CLIP : Lernen von EEG-Darstellungen aus natürlichen Sprachbeschreibungen

EEG-CLIP:从自然语言说明中学习EEG代表

2503.16531v2

395

07-29

MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation

MIBoost: Ein Gradient-Boosting-Algorithmus für die Variablenauswahl nach multipler Imputation

MIBoost: 多重截断后变量选择的渐变推推算算法

2507.21807v1

396

07-29

Scaling and Distilling Transformer Models for sEMG

Skalierung und Destillierung von Transformer-Modellen für sEMG

SEMG 缩放和蒸馏变压器模型

2507.22094v1

397

07-29

Bayesian Neural Network Surrogates for Bayesian Optimization of Carbon Capture and Storage Operations

Bayesian Neural Network Surrogats für die Bayesian Optimierung von CO2-Abscheidungs- und -Speicheroperationen

Bayesian碳捕获和储存作业最佳利用Bayesian 碳捕获和储存作业的Bayesian神经网络代管国

2507.21803v1

398

07-29

Unlocking Interpretability for RF Sensing: A Complex-Valued White-Box Transformer

Entsperrende Interpretierbarkeit für RF Sensing: Ein komplexes White-Box-Transformator

RF遥感的解锁可解释性:一个复杂而有价值的白箱变换器

2507.21799v1

399

07-29

Unifying Post-hoc Explanations of Knowledge Graph Completions

Vereinheitlichung von Post-hoc-Erklärungen von Wissensgraphen-Vervollständigungen

知识图完成后统一解释

2507.22951v1

400

07-29

Conceptualizing Uncertainty: A Concept-based Approach to Explaining Uncertainty

Konzeptualisierung der Unsicherheit: Ein konzeptbasierter Ansatz zur Erklärung der Unsicherheit

不确定性概念化:以概念为基础的解释不确定性的方法

2503.03443v2

401

07-29

A finite time analysis of distributed Q-learning

Eine endliche Zeitanalyse des verteilten Q-Learning

分布式 Q-learning 的有限时间分析

2405.14078v2

402

07-29

Domain Generalization and Adaptation in Intensive Care with Anchor Regression

Domänenverallgemeinerung und Anpassung in Intensivpflege mit Ankerregression

锁定后退的密集护理中的广域化和适应

2507.21783v1

403

07-29

Learning Kinetic Monte Carlo stochastic dynamics with Deep Generative Adversarial Networks

Learning Kinetic Monte Carlo stochastische Dynamik mit tiefen Generativen Adversarial Networks

与深创反对流网络一起学习运动式蒙特卡洛运动

2507.21763v1

404

07-29

Unified machine-learning framework for property prediction and time-evolution simulation of strained alloy microstructure

Unified Machine-Learning-Framework für die Eigenschaftsvorhersage und Zeit-Evolutions-Simulation von strapazierter Legierungs-Mikrostruktur

财产预测统一机械学习框架和累累合金微结构时间演变模拟

2507.21760v1

405

07-29

VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback

VLA-Touch: Erweiterung von Vision-Language-Action-Modellen mit Dual-Level Taktiles Feedback

VLA-Touch:加强具有双轨反馈的愿景-语言-行动模式

2507.17294v2

406

07-29

Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification

Verbesserung der Neural Network Training mit Hilfe von Dynamic Learning Rate Schedules für PINNs und Bildklassifikation

改善神经网络培训,利用动态学习率表改进个人信息网络和图像分类

2507.21749v1

407

07-29

evoxels: A differentiable physics framework for voxel-based microstructure simulations

evoxels: Ein differenzierbares Physik-Framework für Voxel-basierte Mikrostruktursimulationen

evoxels:基于 voxel 的微结构模拟法的不同物理框架

2507.21748v1

408

07-29

Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees

Einmal quantisieren, schnell trainieren: Allreduce-kompatible Kompression mit beweisbaren Garantien

量化一次,快速列车:用可变担保进行减压-可比较压缩

2305.18627v2

409

07-29

Motion Diffusion Autoencoders: Enabling Attribute Manipulation in Human Motion Demonstrated on Karate Techniques

Motion Diffusion Autoencoder: Ermöglichen der Attributmanipulation in der menschlichen Bewegung demonstriert auf Karate-Techniken

运动扩散自动调控器:在空手道技术上展示的在人类运动中进行使能的特性操纵

2501.18729v2

410

07-29

Zero-Shot Machine Unlearning with Proxy Adversarial Data Generation

Zero-Shot-Maschine-Entlernen mit Proxy-Adversarial-Datengenerierung

零热机离学,利用代理反对流数据生成

2507.21738v1

411

07-29

Generalized few-shot transfer learning architecture for modeling the EDFA gain spectrum

Generalisierte wenig-shot Transfer Lernarchitektur für die Modellierung der EDFA Gain-Spektrum

用于模拟欧洲开发协会增益频谱的通用的几发转让学习架构

2507.21728v1

412

07-29

Riemannian Optimization on Tree Tensor Networks with Application in Machine Learning

Riemannsche Optimierung auf Tree Tensor-Netzwerken mit Anwendung im maschinellen Lernen

Riemannian 利用机器学习应用在树透镜网络上的优化

2507.21726v1

413

07-29

Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis

Intrinsische Barrieren und praktische Wege für die Mensch-AI-Ausrichtung: Eine auf Vereinbarungen basierende Komplexitätsanalyse

内在障碍和人类-AI协调的实用途径:基于协定的复杂程度分析

2502.05934v2

414

07-29

Robust Matrix Completion for Discrete Rating-Scale Data: Coping with Fake Profiles in Recommender Systems

Robuste Matrix-Vervollständigung für diskrete Rating-Scale-Daten: Umgang mit gefälschten Profilen in Recommender-Systemen

分立评分尺度数据强力矩阵补全:在推荐人系统中处理假配置配置文件

2412.20802v2

415

07-29

Data-Driven Extended Corresponding State Approach for Residual Property Prediction of Hydrofluoroolefins

Datengetriebener erweiterter korrespondierender State Approach für die Vorhersage residualer Eigenschaften von Hydrofluorolefinen

关于氢氟烯烃残余财产预测的数据驱动扩展对应对应国家办法

2507.21720v1

416

07-29

Quantum enhanced stratification of Breast Cancer: exploring quantum expressivity for real omics data

Quantenverstärkte Schichtung des Brustkrebses: Erforschung der Quantenexpressivität für reale Omics-Daten

量子增强乳腺癌分层:探索真实动脉数据的数量表达性

2409.14089v2

417

07-29

An Equal-Probability Partition of the Sample Space: A Non-parametric Inference from Finite Samples

Eine gleichberechtigte Teilung des Probenraums: Eine nicht-parametrische Folgerung von Finite-Proben

样板空间的同等概率部分:来自有限样品的非参数推论

2507.21712v1

418

07-29

PREIG: Physics-informed and Reinforcement-driven Interpretable GRU for Commodity Demand Forecasting

PREIG: Physik-informierte und verstärkungsgetriebene interpretierbare GRU für die Prognose der Rohstoffnachfrage

PREIG: 物理知情和强化驱动的商品需求预测可解释的GRU

2507.21710v1

419

07-29

Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting

Lokaler Aufmerksamkeitsmechanismus: Förderung der Transformer-Architektur für Langzeit-Zeitreihenprognosen

地方关注机制:促进长序列时间序列预测的变革结构

2410.03805v3

420

07-29

Machine learning-based multimodal prognostic models integrating pathology images and high-throughput omic data for overall survival prediction in cancer: a systematic review

Maschinelles Lernen-basierte multimodale prognostische Modelle zur Integration pathologischer Bilder und hochdurchsetzter omischer Daten für die Gesamtüberlebensvorhersage bei Krebs: eine systematische Überprüfung

综合病理图象和高通量血压数据以全面预测癌症存活率的机器学习的多式联运预测模型:系统审查

2507.16876v2

421

07-29

Hierarchical mixtures of Gaussians for combined dimensionality reduction and clustering

Hierarchische Mischungen von Gaußianern zur kombinierten Dimensionalitätsreduktion und Clusterbildung

用于合并减少维度和集群的高斯人等级混合物

2206.04841v2

422

07-29

diffSPH: Differentiable Smoothed Particle Hydrodynamics for Adjoint Optimization and Machine Learning

diffSPH: Differenzierbare geglättete Partikelhydrodynamik für Adjoint-Optimierung und maschinelles Lernen

diffSPH: 用于联合优化和机械学习的有差异的滑动粒子流体动力学

2507.21684v1

423

07-29

Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence

Unüberwachte Risikofaktoren-Identifikation über Krebsarten und Datenmodalitäten durch erklärbare künstliche Intelligenz

通过可解释的人工智能,在癌症类型和数据模式中,通过可解释的人工智能,确定各种癌症类型和数据模式的不受监督的风险因素

2506.12944v3

424

07-29

Implementing Large Quantum Boltzmann Machines as Generative AI Models for Dataset Balancing

Implementierung großer Quantenboltzmann-Maschinen als generative KI-Modelle für die Datensatz-Balancing

实施大型量子波尔兹曼机器作为数据集平衡生成的AI模型

2502.03086v2

425

07-29

Probabilistic Consistency in Machine Learning and Its Connection to Uncertainty Quantification

Wahrscheinlichkeitskonsistenz im maschinellen Lernen und seine Verbindung zur Unsicherheitsquantifizierung

机器学习及其与不确定性量化的关联的概率一致性

2507.21670v1

426

07-29

Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification

Quantum Boltzmann Maschinen mit parallelem Annealing für medizinische Bildklassifikation

使用并行退火进行医疗图像分类的量子玻尔兹曼机

2507.14116v2

427

07-29

DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs

DGP: Ein Dual-Granularity-Prompting-Framework für Betrugserkennung mit grafisch verbesserten LLMs

DGP:用图形增强的LLMs进行欺诈侦查的双粒度提示框架

2507.21653v1

428

07-29

Hyperbolic Genome Embeddings

Hyperbolische Genom-Embeddings

超双曲基基因组嵌入器

2507.21648v1

429

07-29

Whilter: A Whisper-based Data Filter for “In-the-Wild” Speech Corpora Using Utterance-level Multi-Task Classification

Whilter: Ein Whisper-basierter Datenfilter für “In-the-Wild”-Sprachkorpora unter Verwendung einer Multi-Task-Klassifikation auf Utterance-Ebene

Whilter: 基于 Whisper 的数据过滤器, 用于“野外”语音语料库, 使用话语级多任务分类

2507.21642v1

430

07-29

Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics

Assistax: Ein hardwarebeschleunigtes Lern-Benchmark für assistive Robotik

Assistax:辅助机器人学的硬件加速强化学习基准

2507.21638v1

431

07-29

Defending Against Unforeseen Failure Modes with Latent Adversarial Training

Verteidigung gegen unvorhergesehene Ausfallmodi mit latenten Adversarial Training

利用远程反反向培训,防范意外失灵模式

2403.05030v6

432

07-29

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Latent Adversarial Training verbessert Robustheit für persistente schädliche Verhalten in LLMs

潜在对抗训练提高LLM对持久性有害行为的鲁棒性

2407.15549v3

433

07-29

A calibration test for evaluating set-based epistemic uncertainty representations

Ein Kalibriertest zur Bewertung setbasierter epistemischer Unsicherheitsdarstellungen

用于评价基于固定定点的感知性不确定表示的校准测试

2502.16299v2

434

07-29

Hybrid activation functions for deep neural networks: S3 and S4 – a novel approach to gradient flow optimization

Hybride Aktivierungsfunktionen für tiefe neuronale Netzwerke: S3 und S4 – ein neuartiger Ansatz zur Gradientenflussoptimierung

深神经网络的混合激活功能:S3和S4 – – 梯度流优化的新办法

2507.22090v1

435

07-29

Collaborative filtering based on nonnegative/binary matrix factorization

Kollaborative Filterung auf der Grundlage nichtnegativer/binärer Matrixfaktorisierung

基于非负负/二进制矩阵因子化的合作过滤

2410.10381v4

436

07-29

Categorical Distributions are Effective Neural Network Outputs for Event Prediction

Kategorische Verteilungen sind effektive neurale Netzwerk-Ausgaben für Event-Vorhersage

分类分布是事件预测的有效神经网络产出

2507.21616v1

437

07-29

Multi-branch of Attention Yields Accurate Results for Tabular Data

Multi-Branch-Attention liefert genaue Ergebnisse für Tabellendaten

多部门关注表格数据的准确结果

2502.12507v2

438

07-29

Demystifying Misconceptions in Social Bots Research

Entmystifizierende Missverständnisse in der Social Bots Forschung

社交机器人研究中的误解澄清

2303.17251v4

439

07-29

A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science

Eine Umfrage über speichereffiziente Transformer-basierte Modellausbildung in KI für die Wissenschaft

关于 AI for Science 领域基于 Transformer 的内存高效模型训练的调查

2501.11847v2

440

07-29

Principled Curriculum Learning using Parameter Continuation Methods

Prinzipielles Curriculum Lernen mit Parameter-Weiterführungsmethoden

使用参数持续方法进行有原则的课程学习

2507.22089v1

441

07-29

A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models

Eine detaillierte Faktorenanalyse für den politischen Kompasstest: Navigieren von Ideologien großer Sprachmodelle

《政治指南测试的详细要素分析:掌握大语言模式的特征》

2506.22493v2

442

07-29

Machine Learning Risk Intelligence for Green Hydrogen Investment: Insights for Duqm R3 Auction

Machine Learning Risk Intelligence für Green Hydrogen Investment: Einblicke für Duqm R3 Auktion

绿色氢投资的机器学习风险情报:Duqm R3拍卖的透视

2507.19529v2

443

07-29

Meta-Designing Quantum Experiments with Language Models

Meta-Designing Quantenexperimente mit Sprachmodellen

配有语言模型的元设计量子实验

2406.02470v2

444

07-29

“So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents

“So, erzählen Sie mir über Ihre Politik…”: Destillation von interpretierbaren Richtlinien von Deep Reinforcement Learning Agents

“告诉我你们的政策……:从深强化学习机构那里提炼可解释的政策”。

2507.07848v2

445

07-29

An em algorithm for quantum Boltzmann machines

Ein Em-Algorithmus für Quantenboltzmann-Maschinen

Boltzmann 量子机器的 Em 算法

2507.21569v1

446

07-29

Generating Heterogeneous Multi-dimensional Data: A Comparative Study

Heterogene mehrdimensionale Daten generieren: Eine vergleichende Studie

生成异质多维数据:比较研究

2507.00090v3

447

07-29

Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation

Verbesserung der Graphen-basierten Empfehlungen mit Mehrheitsvoting LLM-Rerank Augmentation

采用多数表决的LLM-重新升级增强图表为基础的建议

2507.21563v1

448

07-29

PEVLM: Parallel Encoding for Vision-Language Models

PEVLM: Parallele Kodierung für Vision-Language-Modelle

PEVLM: 视觉语言模型平行编码

2506.19651v3

449

07-29

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

C2-Evo: Co-Evolving multimodale Daten und Modell zur Selbstverbesserung

C2-Evo:共同演进的多模式数据和自我改进理由模型

2507.16518v2

450

07-29

Fine-Grained Perturbation Guidance via Attention Head Selection

Feinkörnige Störungsführung über Aufmerksamkeitskopfauswahl

通过 “ 关注负责人甄选 “ 指导

2506.10978v3

451

07-29

On Policy Stochasticity in Mutual Information Optimal Control of Linear Systems

Über Politik-Stochastik in gegenseitiger Information Optimale Kontrolle von Linearsystemen

关于相互信息中政策现状的相互信息最佳控制线性系统

2507.21543v1

452

07-29

AI-ming backwards: Vanishing archaeological landscapes in Mesopotamia and automatic detection of sites on CORONA imagery

KI-Ming rückwärts: Auslöschende archäologische Landschaften in Mesopotamien und automatische Erkennung von Stätten auf CORONA-Bildern

AI-Ming倒向:美索不达米亚消失的考古景观和自动探测CORONA图像上的遗址

2507.13420v2

453

07-29

Automatic Classification of User Requirements from Online Feedback – A Replication Study

Automatische Klassifizierung der Benutzeranforderungen aus Online-Feedback – Eine Replikationsstudie

在线反馈用户要求自动分类 – – 复制研究

2507.21532v1

454

07-29

Hierarchical Stochastic Differential Equation Models for Latent Manifold Learning in Neural Time Series

Hierarchische stochastische Differentialgleichungsmodelle für latentes Manifold Learning in der Neural Time Series

神经时间序列中前部蒙花层学习的等级学历史理学分等模型

2507.21531v1

455

07-29

Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis

Auf dem Weg zu einer erleichterten Fairnessbewertung von KI-basierten Haut-Lesions-Klassifikatoren durch GenAI-basierte Bildsynthese

通过GenAI基于GenAI的图像合成,促进基于AI的皮肤皮质分类分类的公平评估

2507.17860v2

456

07-29

A Scalable and High Availability Solution for Recommending Resolutions to Problem Tickets

Eine skalierbare und hochverfügbare Lösung für die Empfehlung von Auflösungen an Problemlösungen

向问题罚单建议解决方案的可扩展和高可用性解决方案

2507.19846v2

457

07-29

Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. I: Penalty Approach

Nonconvex Optimization Framework für Gruppen-Spar-Feedback Linear-Quadratische Optimale Kontrolle. I: Strafverfahren

群分反馈线性水分最佳最佳控制非康化最佳框架。

2507.18114v2

458

07-29

TolerantECG: A Foundation Model for Imperfect Electrocardiogram

TolerantECG: Ein Grundmodell für ein imperfektes Elektrokardiogramm

缩放式ECG:不完美心电图基金会模型

2507.09887v2

459

07-29

Posture-Driven Action Intent Inference for Playing style and Fatigue Assessment

Posture-Driven Action Intent Inferenz für Spielstil und Müdigkeit Bewertung

游戏风格和Fatigue评估的推论

2507.11642v2

460

07-29

Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges

Langfristige Fairness-Anfragen und Verfolgungen im Bereich des maschinellen Lernens: Eine Übersicht von Begriffen, Methoden und Herausforderungen

机构学习方面的长期公平调查与追踪:对名称、方法与挑战的调查

2406.06736v3

461

07-29

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Persona-Vektoren: Überwachung und Kontrolle von Charaktereigenschaften in Sprachmodellen

人向量:监测和控制语言模式中的字符轨迹

2507.21509v1

462

07-29

Kodezi Chronos: A Debugging-First Language Model for Repository-Scale Code Understanding

Kodezi Chronos: Ein Debugging-First Language Model für Repository-Scale Code Understanding

Kodezi Chronos:调试第一语言模式,用于存储库-范围守则理解

2507.12482v2

463

07-29

Probabilistic Directed Distance Fields for Ray-Based Shape Representations

Probabilistische gerichtete Distanzfelder für Ray-Based Shape-Darstellungen

光以光基形状表示法的直射距离场概率

2404.09081v2

464

07-29

Semantic segmentation of SEM images of lower bainitic and tempered martensitic steels

Semantische Segmentierung von SEM-Bildern von unteren bainitischen und gehärteten martensitischen Stählen

SEM图象的金属和温和的金属冶金钢的金属图象的语义分解

2312.17251v2

465

07-29

Evaluation and Benchmarking of LLM Agents: A Survey

Bewertung und Benchmarking von LLM-Agenten: Eine Umfrage

对LLM代理的评估和基准确定:调查

2507.21504v1

466

07-29

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

Anreize für eine fortgeschrittene Instruktions-Folge von großen Sprachmodellen

为采用大语言模式的高级指示提供激励理由

2506.01413v5

467

07-29

Multifunctional physical reservoir computing in soft tensegrity robots

Multifunktionales physikalisches Reservoir-Computing in weichen Tensegrity-Robotern

软张拉整体机器人中的多功能物理储备池计算

2507.21496v1

468

07-29

Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning

Latte: Collaborative Test-Time Adaption von Vision-Language-Modellen im Federated Learning

Latte:联邦学习联合会愿景-语言模型协作测试-时间适应

2507.21494v1

469

07-29

Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control

Verbesserung der Glasdefekterkennung mit Diffusionsmodellen: Adressierung unausgewogener Datensätze in der Fertigungsqualitätskontrolle

利用传播模型加强玻璃破损检测:在制造业质量控制中解决数据集不平衡问题

2505.03134v3

470

07-29

Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering

Sem-DPO: Semantische Inkonsistenz bei der Preference-Optimierung für Prompt Engineering mindern

Sem-DPO: 减轻在优先优化即时工程方面的语义不一致现象

2507.20133v2

471

07-29

Image Super-resolution Inspired Electron Density Prediction

Bild Super-Auflösung Inspirierte Elektronendichte Vorhersage

图像超分辨率激发电密度预测

2402.12335v2

472

07-29

Stochastic forest transition model dynamics and parameter estimation via deep learning

Stochastische Wald Übergangsmodell Dynamik und Parameterschätzung durch Deep Learning

通过深层学习对森林口过渡模型动态和参数估计

2507.21486v1

473

07-29

Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer

Testen der Spin-Bad-Sicht auf Self-Attention: Eine Hamilton-Analyse des GPT-2-Transformers

检验自注意力的自旋浴观点:GPT-2 Transformer的哈密顿分析

2507.00683v5

474

07-29

The pitfalls of next-token prediction

Die Fallstricke der Next-Token-Vorhersage

下一词元预测的陷阱

2403.06963v3

475

07-29

HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation

HI-PMK: Ein Data-Dependent-Kernel für unvollständige heterogene Datendarstellung

HI-PMK:一个数据依赖核心,用于不完全异基因数据代表

2501.04300v3

476

07-29

Capacity-Constrained Continual Learning

Leistungsbeschränktes kontinuierliches Lernen

受能力制约的不断学习

2507.21479v1

477

07-29

Adversarial bandit optimization for approximately linear functions

Adversariale Bandit-Optimierung für etwa lineare Funktionen

近似线性函数的对抗性赌博机优化

2505.20734v5

478

07-29

Hebbian Memory-Augmented Recurrent Networks: Engram Neurons in Deep Learning

Hebbian Memory-Augmented Recurrent Networks: Engram Neuronen im Deep Learning

Hebbian记忆增强的经常网络网络:深层学习中的Engram神经元

2507.21474v1

479

07-29

Retrieve-Augmented Generation for Speeding up Diffusion Policy without Additional Training

Retrieve-Augmented Generation zur Beschleunigung der Diffusionspolitik ohne zusätzliches Training

加速推广政策而无需额外培训的回收-提款一代

2507.21452v1

480

07-29

From Global to Local: A Scalable Benchmark for Local Posterior Sampling

Von Global zu Local: Ein skalierbarer Benchmark für die lokale posteriore Probenahme

从全球到地方:一个可缩放的基准

2507.21449v1

481

07-29

Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations

Echtzeit-Audio-Visuelle Sprachverbesserung mit vortrainierten visuellen Darstellungen

利用经过培训的视觉代表器加强实时视听语音语音

2507.21448v1

482

07-29

Nonparametric Sparse Online Learning of the Koopman Operator

Nonparametric Sparse Online-Lernen des Koopman-Betreibers

Koopman 运算符的非参数 Sparass 在线学习

2405.07432v3

483

07-29

PVD-ONet: A Multi-scale Neural Operator Method for Singularly Perturbed Boundary Layer Problems

PVD-ONet: Eine mehrstufige Neuraloperator-Methode für singulär gestörte Grenzschichtprobleme

PVD-ONet: 单层扰动边界层问题多级神经操作员方法

2507.21437v1

484

07-29

Measuring Sample Quality with Copula Discrepancies

Messung der Probenqualität mit Copula-Diskrepanzen

衡量抽样质量与可协调差异

2507.21434v1

485

07-29

LLAMAPIE: Proactive In-Ear Conversation Assistants

LLAMAPIE: Proaktive In-Ear-Gesprächsassistenten

LLAMAPIE: 主动的在轨在轨对话助理

2505.04066v2

486

07-29

Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning

Induzieren kausale Weltmodelle in LLMs für Zero-Shot Physical Reasoning

在LLMs中引入因果世界模型以实现零样本物理推理

2507.19855v2

487

07-29

From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions

Von Sublinear zu Linear: Schnelle Konvergenz in tiefen Netzwerken über lokale Polyak-Lojasiewicz-Regionen

从子线线至线性线性:通过地方Polyak-Lojasiewicz区在深网络中快速聚合

2507.21429v1

488

07-29

InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers

InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain für LLM mit optischen Schaltungsschalter Transceivern

无限HBD:利用光电转换收发器为LLM 建立数据中心 – – 高度宽宽度高域域

2502.03885v5

489

07-29

PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN

PAR-AdvGAN: Verbesserung der Angriffsfähigkeit mit progressiver Auto-Regression AdvGAN

PAR-AdvGAN: 提高反向攻击能力

2502.12207v2

490

07-29

Back Home: A Computer Vision Solution to Seashell Identification for Ecological Restoration

Zurück Home: Eine Computer Vision-Lösung zur Seashell-Identifizierung für die ökologische Restaurierung

返乡:通过计算机的愿景解决方案来识别海壳,促进生态恢复

2501.04873v4

491

07-29

Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling

Bergbau-Intrinsische Belohnungen aus LLM-Hidden States für effiziente Best-of-N-Probenahme

LLM隐藏国为高效率最佳采样而从LLM公司获得的采矿内部奖赏

2505.12225v2

492

07-29

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

InfiniteYou: Flexibles Foto Recrafting unter Wahrung Ihrer Identität

无限的你:在保留身份的同时灵活摄影改造

2503.16418v2

493

07-29

Randomized Kaczmarz Methods with Beyond-Krylov Convergence

Randomisierte Kaczmarz Methoden mit Beyond-Krylov Konvergenz

Kaczmarz 超克隆集成法

2501.11673v2

494

07-29

MapDiffusion: Generative Diffusion for Vectorized Online HD Map Construction and Uncertainty Estimation in Autonomous Driving

MapDiffusion: Generative Diffusion für vektorisierte Online-HD-Karte Aufbau und Unsicherheit im autonomen Fahren

地图传播:在自动驾驶中为矢量在线HD地图绘制和不确定估计进行传播

2507.21423v1

495

07-29

Torque-based Graph Surgery: Enhancing Graph Neural Networks with Hierarchical Rewiring

Drehmomentbasierte Graphenchirurgie: Verbesserung der Graphen-Neural-Netzwerke mit Hierarchischem Rewiring

基于托盘的图表外科:用等级重组增强图形神经网络

2507.21422v1

496

07-29

Cascading and Proxy Membership Inference Attacks

Cascading und Proxy Mitgliedschafts-Inferenz-Angriffe

连带和代理成员推定攻击

2507.21412v1

497

07-29

Automated Generation of Diverse Courses of Actions for Multi-Agent Operations using Binary Optimization and Graph Learning

Automatisierte Generierung von vielfältigen Handlungskursen für Multi-Agenten-Betriebe mit Binäroptimierung und Graphen-Lernen

利用二进制优化和图表学习,自动产生多种多机构业务行动多样化行动方案

2506.20031v2

498

07-29

Data Leakage and Redundancy in the LIT-PCBA Benchmark

Datenleckage und Redundanz im LIT-PCBA Benchmark

LIT-PCBA基准数据泄漏和冗余

2507.21404v1

499

07-29

Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach

Ermöglichung der Erkundung von Pareto-Stationarität im multi-objektiven Ausbau-Lernen: Ein multi-objektiver gewichtiger Chebyshev-Actor-Kritischer Ansatz

在多目标强化学习中实现Pareto平稳性探索:一种多目标加权Chebyshev Actor-Critic方法

2507.21397v1

500

07-29

Systolic Array-based Accelerator for State-Space Models

Systolischer Array-basierter Accelerator für State-Space-Modelle

州空间模型的基于收量的阵列加速器

2507.21394v1

501

07-28 (1)

FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning

FedStrategist: Ein Meta-Learning-Framework für adaptive und robuste Aggregation im Federated Learning

联邦战略:联邦学习中适应性和强力聚合的元学习框架

2507.14322v2

502

07-28

Addressing High Class Imbalance in Multi-Class Diabetic Retinopathy Severity Grading with Augmentation and Transfer Learning

Umgang mit starkem Klassenungleichgewicht bei der mehrklassigen Schweregradeinstufung der diabetischen Retinopathie mit Augmentation und Transfer-Lernen

多类糖尿病雷蒂诺病分病分级加增和转移学习中处理高等级的不平衡问题

2507.17121v2

503

07-28

Efficient Neural Combinatorial Optimization Solver for the Min-max Heterogeneous Capacitated Vehicle Routing Problem

Effizienter neuraler Kombinatorial-Optimierungslöser für das Min-max Heterogene kapazitive Fahrzeugrouting-Problem

用于解决机动车辆流动问题最小最大高度超异性电动车辆的高效神经组合组合优化解决方案

2507.21386v1

504

07-28

TiVy: Time Series Visual Summary for Scalable Visualization

TiVy: Zeitreihenvisuelle Zusammenfassung für skalierbare Visualisierung

TiVy:可缩放可视化的时间序列视觉摘要

2507.18972v2

505

07-28

Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators

Reservoir Computation mit Netzwerken der Differenzierung Neuron Ring Oszillatoren

差异式中子环振动器网络的储量计算

2507.21377v1

506

07-28

Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment

Multi-Mikrofon- und Multi-Modal-Emotionserkennung in reverberanter Umgebung

在震动性环境中多语种和多模式情感的认可

2409.09545v3

507

07-28

Load Balancing for AI Training Workloads

Lastausgleich für KI-Trainings-Workloads

AI培训的平衡 AI 平衡 IT 培训工作量

2507.21372v1

508

07-28

A Contrastive Diffusion-based Network (CDNet) for Time Series Classification

Ein Kontrastives Diffusions-basiertes Netzwerk (CDNet) für die Zeitreihenklassifikation

用于时间序列分类的以反向传播为基础的网络(CDNet)

2507.21357v1

509

07-28

Group Relative Augmentation for Data Efficient Action Detection

Gruppenrelative Augmentation für dateneffiziente Aktionserkennung

数据高效行动检测组群相对增量

2507.21353v1

510

07-28

DEM-NeRF: A Neuro-Symbolic Method for Scientific Discovery through Physics-Informed Simulation

DEM-NeRF: Eine neuro-symbolische Methode zur wissenschaftlichen Entdeckung durch physikinformierte Simulation

DEM-NeRF:通过物理信息模拟进行科学发现的一种神经符号方法

2507.21350v1

511

07-28

Recovering Manifold Structure Using Ollivier-Ricci Curvature

Recovering Manifold Structure mit Ollivier-Ricci Krümmung

使用 Ollivier-Ricci 曲率恢复流形结构

2410.01149v2

512

07-28

Graph neural networks for residential location choice: connection to classical logit models

Graphische neuronale Netze für die Wahl der Wohnlage: Anbindung an klassische Logit-Modelle

用于住宅地点选择的图形神经网络:与经典 logit 模型的联系

2507.21334v1

513

07-28

Predicting VBAC Outcomes from U.S. Natality Data using Deep and Classical Machine Learning Models

Vorhersage von VBAC-Ergebnissen aus US-Natality-Daten mittels Deep and Classical Machine Learning Models

利用深度和经典机器学习模型,从美国出生数据中预测VBAC结果

2507.21330v1

514

07-28

SQuat: Subspace-orthogonal KV Cache Quantization

SQuat: Subraum-orthogonale KV-Cache-Quantisierung

SQuat: 子空间正交 KV 缓存量化

2503.24358v2

515

07-28

Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning

Pareto-Optimal Rewards von Noisy Preferences lernen: Ein Rahmen für multi-objektives Inverse-Verstärkung-Lernen

从噪声偏好中学习 Pareto-Optimal 奖励:多目标逆强化学习框架

2505.11864v3

516

07-28

MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

MALLM-GAN: Multi-Agent Large Language Model als generatives Adversarial Network zur Synthese von Tabellendaten

MALLM-GAN:多智能体大型语言模型,作为合成表格数据的生成对抗网络

2406.10521v4

517

07-28

Position: Adopt Constraints Over Penalties in Deep Learning

Position: Nebenbedingungen statt Straftermen im Deep Learning verwenden

职位:在深深学习中采用约束措施以凌驾刑罚

2505.20628v3

518

07-28

Large Language Model-Enhanced Reinforcement Learning for Diverse and Novel Recommendations

Großes Sprachmodell-verstärktes Verstärkungslernen für vielfältige und neuartige Empfehlungen

为多样化和新颖建议加强大型语文强化学习模式

2507.21274v1

519

07-28

Deep Polynomial Chaos Expansion

Tiefenpolynomiale Chaos-Expansion

深刻的多元混乱扩大

2507.21273v1

520

07-28

Generative imaging for radio interferometry with fast uncertainty quantification

Generative Bildgebung für die Radiointerferometrie mit schneller Unsicherheitsquantifizierung

具有快速不确定性量化的无线电干涉测量生成成像

2507.21270v1

521

07-28

Numerical PDE solvers outperform neural PDE solvers

Numerische PDE-Löser übertreffen neuronale PDE-Löser

数字 PDE 溶解器超过神经神经功能 PDE 溶解器

2507.21269v1

522

07-28

Adversarial attacks and defenses in explainable artificial intelligence: A survey

Adversariale Angriffe und Abwehrkräfte in erklärbarer künstlicher Intelligenz: Eine Umfrage

可解释的人工智能中的反向攻击和防御:一项调查

2306.06123v4

523

07-28

Multiscale geometrical and topological learning in the analysis of soft matter collective dynamics

Multiskaliges geometrisches und topologisches Lernen in der Analyse der kollektiven Dynamik weicher Materie

在分析软物质集体动态中进行多尺度多几何学和地形学学习

2507.21265v1

524

07-28

Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models

Verstärktes Lernen Fine-Tunes ein Sparse Subnetwork in großen Sprachmodellen

以大语言模式建立粗略的子网络

2507.17107v2

525

07-28

Adaptive Multimodal Protein Plug-and-Play with Diffusion-Based Priors

Adaptive multimodale Protein-Plug-and-Play mit Diffusion-basierten Prioren

适应性多式蛋白丁质多式聚苯乙烯插件和基于传播的前期布料

2507.21260v1

526

07-28

Heterogeneous Treatment Effect in Time-to-Event Outcomes: Harnessing Censored Data with Recursively Imputed Trees

Heterogener Behandlungseffekt bei Time-to-Event-Ergebnissen: Zensierte Daten mit rekursiv unterstellten Bäumen nutzen

时间到晚上结果中的异异异性治疗效应:利用对立的树木利用敏感数据

2502.01575v3

527

07-28

Diffusion Denoiser-Aided Gyrocompassing

Diffusion-Denoiser-gestütztes Gyrocompassing

扩散去噪器辅助的陀螺寻北

2507.21245v1

528

07-28

Bubbleformer: Forecasting Boiling with Transformers

Bubbleformer: Vorhersage des Siedens mit Transformern

Bubbleformer: 用 Transformer 预测沸腾

2507.21244v1

529

07-28

Fluidically Innervated Lattices Make Versatile and Durable Tactile Sensors

Fluidisch innervated Gitter machen vielseitige und langlebige Taktile Sensoren

具有流力、动态和耐久感应感应传感器

2507.21225v1

530

07-28

Benchmarking a Tunable Quantum Neural Network on Trapped-Ion and Superconducting Hardware

Benchmarking eines Tunable Quantum Neural Network auf Trapped-Ion und supraleitende Hardware

在离子阱和超导硬件上对可调量子神经网络进行基准测试

2507.21222v1

531

07-28

Flow Matching Policy Gradients

Flow-Matching-Policy-Gradienten

流程匹配政策梯度

2507.21053v1

532

07-28

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Rep-MTL: Lösen der Macht der Repräsentationsaufgabe Saliency für Multi-Task-Lernen

Rep-MTL:释放代表一级任务在多任务学习方面的弹性权力

2507.21049v1

533

07-28

Agentic Web: Weaving the Next Web with AI Agents

Agentic Web: Das nächste Web mit KI-Agenten weben

代理网络: 与 AI 代理进行下个网络编织

2507.21206v1

534

07-28

Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements

Transformer als ungerollte Folgerung in probabilistischen Laplacian Eigenmaps: Eine Interpretation und mögliche Verbesserungen

Transformer 作为概率 Laplacian Eigenmaps 中的展开推理: 解释和可能的改进

2507.21040v1

535

07-28

When Brain Foundation Model Meets Cauchy-Schwarz Divergence: A New Framework for Cross-Subject Motor Imagery Decoding

Wenn Brain Foundation Model Cauchy-Schwarz Divergenz trifft: Ein neues Framework für die bereichsübergreifende Motor Imagery Decodierung

当大脑基金会模型与Cauchy-Schwarz差异相遇时:跨物体机动图象解码新框架

2507.21037v1

536

07-28

Learning from Limited and Imperfect Data

Von begrenzten und unvollkommenen Daten lernen

学习有限和不完善数据

2507.21205v1

537

07-28

Optimization Performance of Factorization Machine with Annealing under Limited Training Data

Optimierung Leistung der Factorisierungsmaschine mit Annealing unter begrenzter Trainingsdaten

根据有限培训数据与Annaaling公司一起使用的保质机械的优化性能

2507.21024v1

538

07-28

On Using the Shapley Value for Anomaly Localization: A Statistical Investigation

Über die Verwendung des Shapley-Wertes für Anomalie-Lokalisierung: Eine statistische Untersuchung

利用 Shapley 值实现异常定位:统计调查

2507.21023v1

539

07-28

Behavior-Specific Filtering for Enhanced Pig Behavior Classification in Precision Livestock Farming

Behavior-Spezifische Filterung für verbesserte Schweineverhaltensklassifikation in der Precision Livestock Farming

精密牲畜耕作中强化猪品行为分类的具体行为过滤法

2507.21021v1

540

07-28

Deep Learning for Skeleton Based Human Motion Rehabilitation Assessment: A Benchmark

Deep Learning für skeletonbasierte Human Motion Rehabilitation Assessment: Ein Benchmark

Skeleton基于Skeleton的人类运动康复评估深学习:基准

2507.21018v1

541

07-28

Predicting Cognition from fMRI: A Comparative Study of Graph, Transformer, and Kernel Models Across Task and Rest Conditions

Vorhersage der Kognition aus fMRI: Eine vergleichende Studie von Graph, Transformer und Kernelmodellen über Aufgaben- und Ruhebedingungen hinweg

FMRI的预测认知:关于不同任务和休息条件的图形、变形器和内核模型的比较研究

2507.21016v1

542

07-28

Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions

Bewertung des Versprechens und der Fallstricke von LLMs bei Hiring-Entscheidungen

评估LLM在雇用决定中的前景和陷阱

2507.02087v2

543

07-28

LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning

LoRA-PAR: Ein flexibler Dual-System-LoRA-Partitionsansatz für effizientes LLM-Feintuning

LoRA-PAR:高效 LLM 微调的灵活双系统 LoRA 分割法

2507.20999v1

544

07-28

Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition

Modulare Delta-Zusammenführung mit orthogonalen Einschränkungen: Ein skalierbarer Rahmen für kontinuierliche und reversible Modellzusammensetzung

模块三角洲与正纵形制约合并:可扩展的连续性和可复制模型构成框架

2507.20997v1

545

07-28

On the Robustness of Global Feature Effect Explanations

Über die Robustheit der globalen Feature-Effekt Erklärungen

全球特效解释的威力

2406.09069v2

546

07-28

GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding

GUI-G$^2$: Gaussian Reward Modeling für GUI Grounding

GUI-G$^2$:GUI定位的高斯奖励建模

2507.15846v3

547

07-28

Personalized Treatment Effect Estimation from Unstructured Data

Schätzung des personalisierten Behandlungseffekts aus unstrukturierten Daten

来自无结构数据的个人化治疗效果估算

2507.20993v1

548

07-28

Scaling Physical Reasoning with the PHYSICS Dataset

Skalierung der physikalischen Vernunft mit dem PHYSICS-Datensatz

利用PHYSICS数据集调整物理理由

2506.00022v3

549

07-28

Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs

Reparieren von Schwachstellen ohne unsichtbare Hände. Eine differenzierte Replikationsstudie auf LLMs

在没有无形手的情况下修复弱点,对LLMs进行差别化的推广研究。

2507.20977v1

550

07-28

Locally Adaptive Conformal Inference for Operator Models

Lokale Adaptive Konforme Schlussfolgerung für Operatormodelle

操作者模型的本地适应性本地化常规推论

2507.20975v1

551

07-28

Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder

Model-Agnostic Gender Bias Control für Text-to-Image Generation via Sparse Autoencoder

通过 Sparse 自动编码器控制文本到图像生成的模型 – – 不可允许性别比控制

2507.20973v1

552

07-28

A Modular Open Source Framework for Genomic Variant Calling

Modulares Open Source Framework für den genomischen Variant Calling

基因变异召唤模块开放源框架

2411.11513v2

553

07-28

A Survey of Deep Learning for Geometry Problem Solving

Eine Umfrage über Deep Learning zur Lösung von Geometrieproblemen

解决几何问题深层学习调查

2507.11936v4

554

07-28

Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning

Mehrbildbeschreibungen für mehrsprachige, leichte Kognitive Impairment-Erkennung durch kontrastives Lernen enthüllen

通过差异学习发现多语种轻视认知缺陷的单形多语种描述

2505.17067v3

555

07-28

From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

Von der Verschränkung zur Ausrichtung: Repräsentationsraumdekomposition für unüberwachte Zeitreihen-Domänenanpassung

从连接到对齐:无人监督的时间序列适应的代表空间分解

2507.20968v1

556

07-28

PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes

PROVCREATOR: Synthese komplexer heterogener Graphen mit Knoten- und Kantenattributen

PROVCREATOR: 合成带有节点和边缘属性的复杂异构图

2507.20967v1

557

07-28

Handoff Design in User-Centric Cell-Free Massive MIMO Networks Using DRL

Handoff-Design in benutzer-zentralen zellfreien Massive MIMO-Netzwerke mit DRL

以用户为中心的无蜂窝大规模MIMO网络中使用DRL的切换设计

2507.20966v1

558

07-28

Core Safety Values for Provably Corrigible Agents

Grundlegende Sicherheitswerte für beweisbar korrigierbare Agenten

可可调代用品的核心安全价值

2507.20964v1

559

07-28

Mean-Field Langevin Diffusions with Density-dependent Temperature

Mittleres Feld Langevin Diffusionen mit Dichte-abhängiger Temperatur

依赖密度温度的中度Langevin发射场

2507.20958v1

560

07-28

An empirical comparison of some outlier detection methods with longitudinal data

Ein empirischer Vergleich einiger Ausreißer-Detektionsmethoden mit Längsschnittdaten

将某些异常探测方法与纵向数据进行实证比较

2507.21203v1

561

07-28

PySHRED: A Python package for SHallow REcurrent Decoding for sparse sensing, model reduction and scientific discovery

PySHRED: Ein Python-Paket für Shallow REcurrent Decodierung für spärliche Erfassung, Modellreduktion und wissenschaftliche Entdeckung

PySHRED: 用于稀疏传感、模型降阶和科学发现的 SHallow REcurrent Decoding 的Python包

2507.20954v1

562

07-28

Multivariate Conformal Prediction via Conformalized Gaussian Scoring

Multivariate konforme Vorhersage über konforme Gaussian Scoring

通过集成高斯测算法进行多变的多变预测

2507.20941v1

563

07-28

Dissecting Persona-Driven Reasoning in Language Models via Activation Patching

Persona-Driven Reasoning in Sprachmodellen per Aktivierungs-Patching auflösen

通过激活补丁在语言模型中通过激活补丁解剖人-人-驱动原因

2507.20936v1

564

07-28

Aether: Geometric-Aware Unified World Modeling

Äther: Geometrisch-Bewusst Unified World Modeling

以太: 几何-软件统一世界建模

2503.18945v3

565

07-28

Breaking the Precision Ceiling in Physics-Informed Neural Networks: A Hybrid Fourier-Neural Architecture for Ultra-High Accuracy

Breaking the Precision Ceiling in Physics-informed Neural Networks: Eine hybride Fourier-Neural-Architektur für ultra-hohe Genauigkeit

打破物理内成形神经网络的精确度上限:超高精确度的混合四面体-神经结构

2507.20929v1

566

07-28

LLM2TEA: An Agentic AI Designer for Discovery with Generative Evolutionary Multitasking

LLM2TEA: Agentischer AI-Designer für Entdeckung mit generativem evolutionären Multitasking

LLM2TEA: 利用产生进化多任务探索的代理AI 设计器

2406.14917v3

567

07-28

SEAL: Searching Expandable Architectures for Incremental Learning

SEAL: Suche nach erweiterbaren Architekturen für inkrementelles Lernen

SEAL: 搜索可扩展建筑以进行递增学习

2505.10457v2

568

07-28

Zero-Shot Learning with Subsequence Reordering Pretraining for Compound-Protein Interaction

Zero-Shot-Lernen mit Nachfolge Nachbestellen Vorschulung für Compound-Protein-Interaktion

用于复合蛋白相互作用的零热学习和后序重新排序的零热学习预培训

2507.20925v1

569

07-28

Modeling User Behavior from Adaptive Surveys with Supplemental Context

Modellierung des Benutzerverhaltens aus adaptiven Umfragen mit ergänzendem Kontext

模拟具有补充背景的适应性调查用户行为

2507.20919v1

570

07-28

Are ECGs enough? Deep learning classification of pulmonary embolism using electrocardiograms

Genügen EKGs? Deep Learning Klassifikation der Lungenembolie mit Elektrokardiogrammen

ECG 是否足够? 使用心电图对肺栓塞进行深度学习分类

2503.08960v2

571

07-28

Joint modeling for learning decision-making dynamics in behavioral experiments

Gemeinsame Modellierung für das Lernen von Entscheidungsdynamiken in Verhaltensexperimenten

在行为实验中为学习决策动态进行联合建模

2506.02394v2

572

07-28

Online hierarchical partitioning of the output space in extreme multi-label data stream

Online-Hierarchische Partitionierung des Ausgaberaums im extremen Multi-Label-Datenstrom

极端多标签数据流中输出空间的在线分层

2507.20894v1

573

07-28

Implementing Adaptations for Vision AutoRegressive Model

Implementierung von Anpassungen für das AutoRegressive Vision Modell

实施适应展望自动递减模式

2507.11441v2

574

07-28

Testbed and Software Architecture for Enhancing Security in Industrial Private 5G Networks

Testbed und Software-Architektur zur Verbesserung der Sicherheit in industriellen privaten 5G-Netzwerken

加强工业私营5G网络安全测试台和软件架构

2507.20873v1

575

07-28

Not Only Grey Matter: OmniBrain for Robust Multimodal Classification of Alzheimer’s Disease

Nicht nur Grey Matter: OmniBrain für robuste multimodale Klassifizierung der Alzheimer-Krankheit

不仅灰物质:阿兹海默氏病强力多式联运分类

2507.20872v1

576

07-28

\textit{FedABC}: Attention-Based Client Selection for Federated Learning with Long-Term View

\textit{FedABC}: Aufmerksamkeitsbasierte Client-Auswahl für Federated Learning mit Langzeitansicht

\textit{FedABC}:面向长期视角的基于注意力的联邦学习客户端选择

2507.20871v1

577

07-28

Bi-cephalic self-attended model to classify Parkinson’s disease patients with freezing of gait

Bi-zephalisches selbstbeaufsichtigtes Modell zur Einstufung von Parkinson-Patienten mit Gangeinfrieren

将帕金森病人的双脑自学分类并冻结步伐的双脑自闭模式

2507.20862v1

578

07-28

REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints

REDS: Ressourceneffiziente Deep Subnetworks für dynamische Ressourcenbeschränkungen

REDS: 资源效率高的动态资源制约的深层子网络

2311.13349v3

579

07-28

Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Geometrie des Neuralen Verstärkungslernens in kontinuierlichen Zustands- und Handlungsräumen

持续状态和行动空间神经强化学习的几何测量

2507.20853v1

580

07-28

Learning unitaries with quantum statistical queries

Lernen unitärer Operatoren mit quantenstatistischen Abfragen

附有量数统计查询的学习单

2310.02254v3

581

07-28

Towards Explainable Deep Clustering for Time Series Data

Auf dem Weg zu erklärbarem Deep Clustering für Zeitreihendaten

实现时间序列数据可解释的深层群集

2507.20840v1

582

07-28

BuildSTG: A Multi-building Energy Load Forecasting Method using Spatio-Temporal Graph Neural Network

BuildSTG: Eine Multi-Building-Methode zur Energiebelastungsprognose mit Spatio-Temporal Graph Neural Network

BuildSTG:使用SPATIO-时钟图神经网络的多建筑能源载荷预测方法

2507.20838v1

583

07-28

First Hallucination Tokens Are Different from Conditional Ones

Erste Halluzinationstoken unterscheiden sich von Bedingten

第一次幻觉声调与有条件的音调不同

2507.20836v1

584

07-28

RF Challenge: The Data-Driven Radio Frequency Signal Separation Challenge

RF-Herausforderung: Die datengetriebene Funkfrequenz-Signaltrennungs-Herausforderung

RF 挑战:数据驱动无线电频率信号分离挑战

2409.08839v3

585

07-28

Combolutional Neural Networks

Kombolutionäre Neuronale Netze

混合神经网络

2507.21202v1

586

07-28

On the similarity of bandwidth-tuned quantum kernels and classical kernels

Zur Ähnlichkeit von bandbreitengesteuerten Quantenkernen und klassischen Kerneln

关于带宽调频量子内核和古典内核的相似性

2503.05602v3

587

07-28

Why Flow Matching is Particle Swarm Optimization?

Warum ist Flow Matching Partikel-Swarm-Optimierung?

为什么花流合拍是粒子蜂群最佳化?

2507.20810v1

588

07-28

Understanding Bias in Perceiving Dimensionality Reduction Projections

Verständnis von Bias in Wahrnehmung von Dimensionalitätsreduktionsprojektionen

理解在认识减少多维度减少预测中的偏见

2507.20805v1

589

07-28

Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models

Kritik des unreinen Grundes: Enthüllen des Argumentationsverhaltens medizinischer Großsprachenmodelle

简便理由的批评:统一医学大语言模式的推理行为

2412.15748v2

590

07-28

Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach

Ausrichten von großsprachlichen Modellagenten mit rationalen und moralischen Präferenzen: Ein überwachter Feintuning-Ansatz

将大语言示范物剂与理性和道德优先相匹配:受监督的微调办法

2507.20796v1

591

07-28

APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

APTx Neuron: Unified Trainable Neuron Architecture Integrating Activation and Computation

APTx Neuron: 集成激活与计算的统一可训练神经元架构

2507.14270v3

592

07-28

Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps

Kohärente Online-Road-Topologie Schätzung und Begründung mit Standard-Definitionskarten

与标准定义地图一致的在线道路地形图估计和理由

2507.01397v2

593

07-28

Dragonfly: a modular deep reinforcement learning library

Dragonfly: eine modulare Bibliothek für tiefe Verstärkung

龙蝇:一个模块式深强化学习图书馆

2505.03778v2

594

07-28

Satellite-Surface-Area Machine-Learning Models for Reservoir Storage Estimation: Regime-Sensitive Evaluation and Operational Deployment at Loskop Dam, South Africa

Satelliten-Oberflächen-Raum Maschinen-Learning-Modelle für Reservoir-Speicherschätzung: Regime-Sensitive Evaluation und Einsatz am Staudamm Loskop, Südafrika

在南非Loskop大坝储存量估计:制度敏感评价和行动部署卫星-储量储存量估计的卫星表面区域机械学习模型:系统敏感评价和行动部署

2502.19989v3

595

07-28

Industry Insights from Comparing Deep Learning and GBDT Models for E-Commerce Learning-to-Rank

Brancheneinblicke aus dem Vergleich von Deep Learning und GBDT-Modellen für E-Commerce Learning-to-Rank

比较深层学习和电子商务学习到兰克的GBDT模式的工业透视

2507.20753v1

596

07-28

Multilingual Self-Taught Faithfulness Evaluators

Mehrsprachige Selbstlernende Bewertung von Treue

多语言自学自学信仰评价员

2507.20752v1

597

07-28

Finite-Time Analysis of Discrete-Time Stochastic Interpolants

Finite-Time-Analyse von diskret-zeitlichen stochastischen Interpolanten

离散时间随机插值的有限时间分析

2502.09130v2

598

07-28

Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

Kann Prompt Schwierigkeit Online vorausgesagt werden, um RL zu beschleunigen Finetuning of Reasoning Models?

快速困难能否预测为加速理据模型的RL微调而在线化?

2507.04632v3

599

07-28

Learning the Value Systems of Societies from Preferences

Die Wertsysteme der Gesellschaften aus Präferenzen lernen

学习社会从优惠社会的价值体系

2507.20728v1

600

07-28

Everything is a Video: Unifying Modalities through Next-Frame Prediction

Alles ist ein Video: Vereinheitlichen von Modalitäten durch Next-Frame-Vorhersage

一切都是一部视频:通过下框架预测实现统一的方式

2411.10503v2

601

07-28

Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset

Verbesserung der tragbaren Wasserhahn-Audioerkennung durch Unterklasse-Annotation im HD-Epic-Datensatz

通过在HD-Epic数据集中分级注解,加强穿戴式塔普水音频探测

2505.20788v2

602

07-28

Uncertainty-driven Embedding Convolution

Ungewissheitsgetriebene Einbettung in die Konvolution

由不确定因素驱动的内嵌演变

2507.20718v1

603

07-28

GDSR: Global-Detail Integration through Dual-Branch Network with Wavelet Losses for Remote Sensing Image Super-Resolution

GDSR: Global-Detail-Integration durch Dual-Branch-Netzwerk mit Wavelet-Verlusten für remote Sensing Image Super-Resolution

GDSR:通过带有遥感图像超分辨率波浪损失的双层网络实现全球详细一体化

2501.01460v3

604

07-28

Group Sequence Policy Optimization

Optimierung der Gruppensequenzpolitik

组序列政策优化

2507.18071v2

605

07-28

Prostate Cancer Classification Using Multimodal Feature Fusion and Explainable AI

Prostatakrebsklassifikation mit multimodaler Feature Fusion und erklärbarer KI

采用多模式特征融合和可解释的AI 的前列腺癌症分类

2507.20714v1

606

07-28

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Schnelle letzte Konvergenz der SGD im glatten Interpolationssystem

SGD在平滑的内插制度中的汇合

2507.11274v2

607

07-28

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

Aufdecken der Illusion von Fairness: Prüfung von Schwachstellen bei Distributionsmanipulationsangriffen

《公平观:审计对分配操纵攻击的脆弱性》

2507.20708v1

608

07-28

PanoGAN A Deep Generative Model for Panoramic Dental Radiographs

PanoGAN Ein tiefes Generatives Modell für Panoramic Dental Radiographen

PanoGAN 全景牙科放射线的深创模型

2507.21200v1

609

07-28

Improving Open-world Continual Learning under the Constraints of Scarce Labeled Data

Verbesserung des kontinuierlichen Open-World-Lernens unter den Zwängen knapper beschrifteter Daten

在缺少标签数据的限制下改进开放世界持续学习

2502.20974v2

610

07-28

Continual Low-Rank Scaled Dot-product Attention

Continual Low-Rank Scaled Dot-Produkt Achtung

持续低兰克缩放点产品注意

2412.03214v4

611

07-28

Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

Architektur-Aware Minimierung (A$^2$M): So finden Sie flache Minima in der neuralen Architektur Suche

架构感知最小化(A$^2$M):如何在神经架构搜索中找到平坦极小值

2503.10404v2

612

07-28

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

LUT Tensor Core: Ein Software-Hardware-Co-Design für LUT-basierte Low-Bit LLM-Inferenz

LUT 信标核心:基于 LUT的低比低 LLM 推断的软件-硬件共同设计

2408.06003v3

613

07-28

Novel Pivoted Cholesky Decompositions for Efficient Gaussian Process Inference

Neue pivoted Cholesky Zersetzungen für effiziente Gaußschen Prozessableitung

高效高斯进程引力的分解

2507.20678v1

614

07-28

A Multimodal Architecture for Endpoint Position Prediction in Team-based Multiplayer Games

Eine multimodale Architektur für Endpoint-Positionsvorhersage in Team-basierten Multiplayer-Spielen

以团队为基础的多玩者运动会中端点定位预测的多模式架构

2507.20670v1

615

07-28

MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection

MIMII-Agent: LLMs mit Funktionsaufruf für relative Auswertung der anomalen Schallerkennung

MIMII-Agent:利用具备函数调用能力的 LLM 对异常声音检测进行相对评估

2507.20666v1

616

07-28

Advancing Compositional LLM Reasoning with Structured Task Relations in Interactive Multimodal Communications

Verbesserung der kompositorischen LLM-Reasoning mit strukturierten Arbeitsbeziehungen in der interaktiven multimodalen Kommunikation

与互动多模式通信中结构性任务关系有关的理由

2507.21199v1

617

07-28

Towards trustworthy AI in materials mechanics through domain-guided attention

Auf dem Weg zu vertrauenswürdiger KI in der Materialmechanik durch domänengeführte Aufmerksamkeit

通过域引导关注在材料机械学方面实现可信赖的AI

2507.20658v1

618

07-28

The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks

Die Feature Speed Formel: ein flexibler Ansatz zur Skalierung von Hyperparametern tiefer neuronaler Netzwerke

特色速度公式:对深神经网络的超强参数进行缩放的灵活办法

2311.18718v4

619

07-28

Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs

Benchmarking Graph Neural Networks für die Dokumentenlayout-Analyse in öffentlichen Angelegenheiten

用于公共事务文件布局分析的图表神经网络

2505.14699v2

620

07-28

Deep Generative Models of Evolution: SNP-level Population Adaptation by Genomic Linkage Incorporation

Tiefe generative Modelle der Evolution: SNP-Ebene Populationsanpassung durch genomische Verknüpfung

深刻的演变模式:通过基因组联系纳入SNP层次的人口适应

2507.20644v1

621

07-28

IGNIS: A Robust Neural Network Framework for Constrained Parameter Estimation in Archimedean Copulas

IGNIS: Ein robustes neurales Netzwerk-Framework für eingeschränkte Parameterschätzungen in Archimedischen Copulas

IGNIS:Archimedean Copulas受控参数估计的强力神经网络框架

2505.22518v3

622

07-28

Learning Before Filtering: Real-Time Hardware Learning at the Detector Level

Lernen vor dem Filtern: Echtzeit-Hardware-Lernen auf Detektorebene

在过滤前学习:在探测器一级实时硬件学习

2506.11981v2

623

07-28

Secure Best Arm Identification in the Presence of a Copycat

Sichere Best Arm Identification in der Gegenwart eines Copycat

在有模仿器的情况下安全最佳武器识别

2507.18975v2

624

07-28

Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression

Erweiterung großer multimodaler Modelle mit adaptiver Sparsamkeit und KV-Cache-Kompression

加强具有适应性平衡和KV缓存压缩的大型多式模型

2507.20613v1

625

07-28

Comparing and Scaling fMRI Features for Brain-Behavior Prediction

Vergleich und Skalierung von fMRI-Features für Gehirn-Verhalten-Vorhersage

比较和扩大FMRI 脑行为预测特征

2507.20601v1

626

07-28

Distributional Soft Actor-Critic with Three Refinements

Verteilungsweiche Aktor-Kritik mit drei Veredelungen

配发软软软动作器和三精度

2310.05858v6

627

07-28

PhaseNAS: Language-Model Driven Architecture Search with Dynamic Phase Adaptation

PhaseNAS: Sprachmodellgestützte Architektursuche mit dynamischer Phasenanpassung

PhaseNAS: 具有动态阶段自适应的语言模型驱动架构搜索

2507.20592v1

628

07-28

Enhancing generalization in high energy physics using white-box adversarial attacks

Verbesserung der Verallgemeinerung in der Hochenergiephysik mit White-Box-Angriffen

利用白箱对抗性攻击加强高能物理学的普及化

2411.09296v3

629

07-28

Beyond Manual Annotation: A Human-AI Collaborative Framework for Medical Image Segmentation Using Only “Better or Worse” Expert Feedback

Beyond Manual Annotation: Ein Mensch-AI-Kollaboratives Framework für medizinische Bildsegmentierung mit nur “Besser oder schlechter” Experten-Feedback

超越人工标注:仅使用“更好或更差”专家反馈的人机协作医学图像分割框架

2507.05815v2

630

07-28

AutoLibra: Agent Metric Induction from Open-Ended Feedback

AutoLibra: Agent Metric Induktion aus offenem Feedback

AutoLibra: 不限名额反馈的计量介绍代理

2505.02820v2

631

07-28

GASPnet: Global Agreement to Synchronize Phases

GASPnet: Globales Abkommen zur Synchronisierung von Phasen

GASPnet:同步阶段全球协定

2507.16674v2

632

07-28

A note on the Artstein-Avidan-Milman’s generalized Legendre transforms

Eine Anmerkung zu den verallgemeinerten Legendre-Transformationen von Artstein-Avidan-Milman

关于 Artstein-Avidan-Milman 广义 Legendre 变换的注记

2507.20577v1

633

07-28

Fusing CFD and measurement data using transfer learning

Zusammenführen von CFD- und Messdaten mittels Transfer-Lernen

利用迁移学习融合 CFD 与测量数据

2507.20576v1

634

07-28

Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy

Reminiszenz-Angriff auf Residuals: Ausnutzung der ungefähren Maschine Unlearning für die Privatsphäre

对残余物的重复记忆攻击:利用近似机器不学习促进隐私

2507.20573v1

635

07-28

DAG-AFL:Directed Acyclic Graph-based Asynchronous Federated Learning

DAG-AFL: 贫化的以环状图为基础的非同步联邦学习

2507.20571v1

636

07-28

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

CUDA-L1: Verbesserung der CUDA-Optimierung durch kontrastives Verstärkungslernen

CUDA-L1:通过反竞争强化学习改进CUDA优化

2507.14111v4

637

07-28

Statistical Inference for Differentially Private Stochastic Gradient Descent

Statistische Schlussfolgerung für unterschiedliche private stochastische Gradientenabstieg

不同私家私家稀有渐变后代的统计推推法

2507.20560v1

638

07-28

The Effect of Data Poisoning on Counterfactual Explanations

Die Auswirkung von Datenvergiftungen auf gegenfaktische Erklärungen

数据中毒对反事实解释的影响

2402.08290v4

639

07-28

Uncovering Gradient Inversion Risks in Practical Language Model Training

Uncovering Gradient Inversion Risiken in der praktischen Sprachmodellausbildung

实用语言示范培训中未覆盖的渐变风险

2507.21198v1

640

07-28

Improving Group Fairness in Tensor Completion via Imbalance Mitigating Entity Augmentation

Verbesserung der Gruppengerechtigkeit in der Tensor-Vervollständigung durch Imbalance Mitigating Entity Augmentation

通过不平衡的减轻实体增长扩大,改善集团公平性

2507.20542v1

641

07-28

NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks

NbBench: Benchmarking-Sprachenmodelle für umfassende Nanobody-Aufgaben

NbBench:全面纳米机构任务的语言模式基准

2505.02022v2

642

07-28

Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes

Action-Liste Verstärkungs-Lernsyndrom-Dekodierung für Binary Linear Block Codes

二元线性线性块块代码的标记

2507.17893v2

643

07-28

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

MagicMotion: Kontrollierbare Videogenerierung mit Dense-to-Sparse-Trajektorienführung

魔力运动:可控视频生成并配有高到分轨迹指导

2503.16421v2

644

07-28

Kimi K2: Open Agentic Intelligence

Kimi K2: Offene Agentische Intelligenz

Kimi K2:开放特工情报

2507.20534v1

645

07-28

Kernel Learning for Sample Constrained Black-Box Optimization

Kernel-Lernen für Probe eingeschränkte Black-Box-Optimierung

用于样本的内核学习

2507.20533v1

646

07-28

Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards

Versehentliche Sicherheitslücke: Faktoren bei Feinsteuerung, die das Modell schützen

意外脆弱性:改变模式保障保障措施的微调因素

2505.16789v2

647

07-28

AQUA: A Large Language Model for Aquaculture & Fisheries

AQUA: Ein großes Sprachmodell für Aquakultur und Fischerei

AQUA:水产养殖和渔业大语言模式

2507.20520v1

648

07-28

Geometric Representation Condition Improves Equivariant Molecule Generation

Geometrische Darstellung verbessert Gleichwertige Molekülerzeugung

条件改善等异分子生成

2410.03655v4

649

07-28

Guide your favorite protein sequence generative model

Führen Sie Ihre Lieblings-Protein-Sequenz generative Modell

指导您最喜爱的蛋白质序列基因模型

2505.04823v3

650

07-28

Efficient Proxy Raytracer for Optical Systems using Implicit Neural Representations

Effizienter Proxy Raytracer für optische Systeme mit impliziten Neuraldarstellungen

使用隐性神经仪表的光学系统

2507.20513v1

651

07-28

Tensor Completion with Nearly Linear Samples Given Weak Side Information

Tensor-Vervollständigung mit fast linearen Proben bei schwachen Seiteninformationen

由于侧面信息薄弱, Tensor 完成近线性样本的 Tensor 完成

2007.00736v4

652

07-28

Attributed Graph Clustering with Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning

Zugeschriebene Graphen-Clustering mit Multi-Scale Gewicht-basiert paarweise Coarsening und Kontrastives Lernen

与多比额表基于重量的对称相对宽度分析和差异性学习组合在一起的属性图

2507.20505v1

653

07-28

Prover Agent: An Agent-based Framework for Formal Mathematical Proofs

Prover Agent: Ein agentenbasiertes Framework für formale mathematische Nachweise

以代理人为基础的正式数学证明框架

2506.19923v2

654

07-28

REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models

REINFORCE++: Effizienter RLHF-Algorithmus mit Robustheit sowohl für Prompt- als auch für Reward-Modelle

REINFORCE++: 高效的RLHF对快速模型和奖励模型具有强力的测算法

2501.03262v7

655

07-28

Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Verlernen lernen und dabei behalten: Gradientenkonflikte beim maschinellen Verlernen bekämpfen

学习在保存时不学习 : 在机器不学习中对抗渐变冲突

2503.06339v2

656

07-28

Customize Multi-modal RAI Guardrails with Precedent-based predictions

Multimodale RAI-Guardrails mit vorausschauenden Vorhersagen anpassen

定制具有先例预测的多式RAI护卫车

2507.20503v1

657

07-28

DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning

DmC: Nächstgelegenes Orientierungs-Diffusionsmodell für Offline-Querdomain-Verstärkungs-Lernen

DmC: 用于离线跨域强化学习的最近邻引导扩散模型

2507.20499v1

658

07-28

Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning

Mischung aus Länge und Pruning Experten für Wissensgraphen Reasoning

知识图解释理由的长长和缓冲专家混合

2507.20498v1

659

07-28

Classification of high-dimensional data with spiked covariance matrix structure

Klassifizierung von hochdimensionalen Daten mit spiked Kovarianz-Matrix-Struktur

高维数据分类和加压共变矩阵结构

2110.01950v2

660

07-28

Position: Untrained Machine Learning for Anomaly Detection by using 3D Point Cloud Data

Position: Untrainiertes maschinelles Lernen zur Erkennung von Anomalien durch Verwendung von 3D-Punkt-Cloud-Daten

位置: 使用 3D 点云数据进行异常检测的未经训练的机器学习

2502.03876v3

661

07-28

A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization

Eine neue Methode zur zufälligen Reshuffling für ungenügende Nicht-Konvex-Finite-Summe-Optimierung

用于非移动非convelx Finite- 和优化的新随机调整方法

2312.01047v3

662

07-28

Deep Reputation Scoring in DeFi: zScore-Based Wallet Ranking from Liquidity and Trading Signals

Deep Reputation Scoring in DeFi: zScore-based Wallet Ranking von Liquidität und Handelssignalen

DeFi 中的深度信誉评分:基于流动性和交易信号的 zScore 钱包排名

2507.20494v1

663

07-28

HIAL: A New Paradigm for Hypergraph Active Learning via Influence Maximization

HIAL: Ein neues Paradigma für Hypergraph Aktives Lernen durch Einflussmaximierung

HIAL:通过影响最大化进行超光速积极学习的新范例

2507.20490v1

664

07-28

Conditional Diffusion Models for Global Precipitation Map Inpainting

Bedingte Diffusionsmodelle für die weltweite Niederschlagskarte Inpainting

全球降地地图油漆有条件传播模型

2507.20478v1

665

07-28

Token Reduction Should Go Beyond Efficiency in Generative Models – From Vision, Language to Multimodality

Token-Reduktion sollte über Effizienz in generativen Modellen hinausgehen – Von Vision, Sprache zur Multimodalität

从愿景、语言到多式联运

2505.18227v2

666

07-28

Operator Inference Aware Quadratic Manifolds with Isotropic Reduced Coordinates for Nonintrusive Model Reduction

Operator-Inferenz Bewusst Quadratische Manifolds mit isotropen reduzierten Koordinaten für nicht-intrusive Modellreduktion

使用不侵扰性减少模型减少非侵入性模型的慢位位坐标

2507.20463v1

667

07-28

EdgeAgentX-DT: Integrating Digital Twins and Generative AI for Resilient Edge Intelligence in Tactical Networks

EdgeAgentX-DT: Integrieren von digitalen Zwillingen und Generative KI für resiliente Edge-Intelligenz in taktischen Netzwerken

EdgeAgentX-DT:将数字孪生与生成式 AI 相结合,在战术网络中实现有韧性的边缘智能

2507.21196v1

668

07-28

Shapley-Value-Based Graph Sparsification for GNN Inference

Shapley-Value-Based Graph Sparsification für GNN-Inferenz

GNN 推断法的基于形状值的图形分隔

2507.20460v1

669

07-28

Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses

Masked Autoencoder, die das Herz fühlen: Enthüllen Einfachheit Bias für EKG-Analysen

感觉心脏的蒙面自动代码器:用于ECG分析的“永存的简单比”

2506.22495v4

670

07-28

Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling

Diagonal gewichtete generalisierte Methode von Momenten Schätzung für Gaussian Mixture Modeling

Gaussian Mixture 模型模型的对等光速估计动量通用方法

2507.20459v1

671

07-28

Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis

Frequency-Aware Autoregressive Modellierung für effiziente High-Resolution-Bildsynthese

高效高分辨率图像合成高效高分辨率图像集自动回归模型

2507.20454v1

672

07-28

Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models

Schwach-zu-starke Verallgemeinerung mit Ausfall-Trajektorien: Ein baumbasierter Ansatz zur Elizit-Optimal-Politik in starken Modellen

与失败轨迹相协调的弱力至强力普遍化:以树为基础的方法,在强型模型中采用适当的最佳政策

2507.18858v2

673

07-28

Your Attention Matters: to Improve Model Robustness to Noise and Spurious Correlations

Ihre Aufmerksamkeit ist wichtig: Verbesserung der Modellrobustheit gegenüber Rauschen und Scheinkorrelationen

注意事项:改进噪音和纯洁的标本的示范性强力

2507.20453v1

674

07-28

WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope

WEEP: Ein differenzierbarer, nicht konvexe Sparse Regularizer über schwach-konvexe Umhüllung

WEEP:通过微弱-Convex 信封的可区别的、不可区分的、非confvex Spassar 正规化器

2507.20447v1

675

07-28

BOASF: A Unified Framework for Speeding up Automatic Machine Learning via Adaptive Successive Filtering

BOASF: Ein einheitliches Framework zur Beschleunigung des automatischen maschinellen Lernens durch adaptives aufeinander folgendes Filtern

BOASF: 通过适应性连续过滤加速自动机学习的统一框架

2507.20446v1

676

07-28

Provable In-Context Learning of Nonlinear Regression with Transformers

Voraussichtliches In-Context-Lernen von nichtlinearer Regression mit Transformern

以变换器对非线性回归的可证实的内文学习

2507.20443v1

677

07-27 (7)

BioNeuralNet: A Graph Neural Network based Multi-Omics Network Data Analysis Tool

BioNeuralNet: Ein Graph Neural Network basiertes Multi-Omics Network Data Analysis Tool

生物神经网:基于多功能网络数据分析工具的图表神经网络工具

2507.20440v1

678

07-27

Critiques of World Models

Kritik an Weltmodellen

世界模式的证明

2507.05169v3

679

07-27

Surrogate modeling of Cellular-Potts Agent-Based Models as a segmentation task using the U-Net neural network architecture

Surrogate Modellierung von Zellular-Potts Agent-Based Models als Segmentierungsaufgabe mit Hilfe der U-Net-Neuralnetzwerkarchitektur

利用 U-Net 神经网络结构结构,将代用基于细胞-动力代理模型建模作为一种分离任务

2505.00316v3

680

07-27

FAST: Similarity-based Knowledge Transfer for Efficient Policy Learning

FAST: Ähnlichkeitsbasierter Wissenstransfer für effizientes politisches Lernen

FAST: 以相似性为基础的知识转让,促进有效的政策学习

2507.20433v1

681

07-27

Density Ratio Estimation-based Bayesian Optimization with Semi-Supervised Learning

Dichteverhältnis Schätzungsbasierte Bayesische Optimierung mit semi-überwachtem Lernen

基于巴耶斯最优化的半强化学习

2305.15612v5

682

07-27

Interpretable Anomaly-Based DDoS Detection in AI-RAN with XAI and LLMs

Interpretierbare, auf Anomalien basierende DDoS-Erkennung in AI-RAN mit XAI und LLMs

在AI-RAN使用 XAI 和LLM 进行AI-RAN的基于解释的DDoS 探测

2507.21193v1

683

07-27

ResCap-DBP: A Lightweight Residual-Capsule Network for Accurate DNA-Binding Protein Prediction Using Global ProteinBERT Embeddings

ResCap-DBP: Ein leichtes Residual-Capsule-Netzwerk für präzise DNA-Binding-Protein-Vorhersage mit globalen Protein-BERT-Embeddings

ResCap-DBP:利用全局 ProteinBERT 嵌入进行精确 DNA 结合蛋白预测的轻量级残差胶囊网络

2507.20426v1

684

07-27

Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

Kommunikation-Effizient verteiltes Training für kollaborative Flat Optima Erholung im Deep Learning

促进深学习合作、平板最佳最佳恢复的传播-高效分配培训

2507.20424v1

685

07-27

Survey of NLU Benchmarks Diagnosing Linguistic Phenomena: Why not Standardize Diagnostics Benchmarks?

Umfrage zu NLU-Benchmarks Diagnose Linguistische Phänomene: Warum nicht Diagnose-Benchmarks standardisieren?

NLU基准诊断语言神话调查:为什么不使诊断基准标准化?

2507.20419v1

686

07-27

Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

Clarify lernen: Multiturn-Gespräche mit aktionsbasiertem Kontrast-Selbst-Training

学习澄清:与基于行动的差异性自我培训进行多方向对话

2406.00222v2

687

07-27

A General Framework for Estimating Preferences Using Response Time Data

Ein allgemeiner Rahmen für die Schätzung von Präferenzen unter Verwendung von Reaktionszeitdaten

利用反应时间数据估计优惠的一般框架

2507.20403v1

688

07-27

A Free Probabilistic Framework for Analyzing the Transformer-based Language Models

Ein freier probabilistischer Rahmen für die Analyse der transformerbasierten Sprachmodelle

分析以变换器为基础的语言模型的自由概率框架

2506.16550v2

689

07-27

Exploring Adaptive Structure Learning for Heterophilic Graphs

Erforschen von adaptivem Strukturlernen für heterophile Graphen

探索异性哲学图的适应性结构学习

2507.21191v1

690

07-27

Beyond Neural Networks: Symbolic Reasoning over Wavelet Logic Graph Signals

Jenseits neuraler Netzwerke: Symbolische Vernunft über Wavelet Logic Graph Signals

超越神经网络:波盘逻辑图信号的符号原因

2507.21190v1

691

07-27

Operator-Based Machine Intelligence: A Hilbert Space Framework for Spectral Learning and Symbolic Reasoning

Operator-based Machine Intelligence: Ein Hilbert Space Framework für Spektrales Lernen und Symbolische Vernunft

以操作者为基础的机器情报:希尔伯特光学学习和符号理由空间框架

2507.21189v1

692

07-27

Bipedalism for Quadrupedal Robots: Versatile Loco-Manipulation through Risk-Adaptive Reinforcement Learning

Bipedalismus für Vierradroboter: Vielseitige Loko-Manipulation durch Risiko-Adaptive Verstärkungs-Lernen

四肢机器人的双轨主义:通过风险评估强化学习进行Versatile Loco-管理

2507.20382v1

693

07-27

Set-based Implicit Likelihood Inference of Galaxy Cluster Mass

Set-based Implicit Likelihood Inferenz von Galaxy Cluster Masse

银河群群群集基于设定的隐含可能性推推推

2507.20378v1

694

07-27

PyG 2.0: Scalable Learning on Real World Graphs

PyG 2.0: 真实世界图表上的可缩放学习

2507.16991v2

695

07-27

WBHT: A Generative Attention Architecture for Detecting Black Hole Anomalies in Backbone Networks

WBHT: Eine generative Aufmerksamkeitsarchitektur zur Erkennung von Schwarzlochanomalien in Backbone Networks

WBHT:用于检测后骨网络黑洞异常现象的引人注意结构

2507.20373v1

696

07-27

A Learning-based Domain Decomposition Method

Eine lernbasierte Methode der Domänenzersetzung

以学习为基础的域分解方法

2507.17328v2

697

07-27

Memorization: A Close Look at Books

Auswendiglernen: Ein genauer Blick auf Bücher

记忆化:对书籍的近视

2504.12549v2

698

07-27

Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning

Clustering by Aufmerksamkeit: Leveraging Previous Fitted Transformers for Data Partitioning

集中集束注意: 利用事先适合的变异器来利用数据分割

2507.20369v1

699

07-27

Sequence-Aware Inline Measurement Attribution for Good-Bad Wafer Diagnosis

Sequence-Aware Inline-Messung Attribution für gut-schlechte Wafer-Diagnose

良好巴德瓦费尔诊断的测序内线测量属性

2507.20364v1

700

07-27

Lagrangian neural networks for nonholonomic mechanics

Lagrangeische neuronale Netzwerke für nichtholonomische Mechanik

Lagrangian 神经网络,用于非蛋白体力学机械学

2411.00110v2

701

07-27

MH-GIN: Multi-scale Heterogeneous Graph-based Imputation Network for AIS Data (Extended Version)

MH-GIN: Multiskaliges Heterogenes Graph-basiertes Imputationsnetzwerk für AIS-Daten (erweiterte Version)

MH-GIN:AIS数据多比例异异形图表计算网(Expended 版本)

2507.20362v1

702

07-27

Wafer Defect Root Cause Analysis with Partial Trajectory Regression

Wafer fehlerhafte Wurzelursachenanalyse mit partieller Trajektorieregression

Wafer 偏差根源分析,带有部分轨倒退

2507.20357v1

703

07-27

A Theory of $θ$-Expectations

Eine Theorie von $θ$-Erwartungen

$θ$-期望理论

2507.20353v1

704

07-27

Embeddings to Diagnosis: Latent Fragility under Agentic Perturbations in Clinical LLMs

Einbettungen in die Diagnose: Latent Fragilität unter Agentische Störungen in klinischen LLMs

诊断的嵌入:临床LMS中的干燥干扰下的潜在易碎性

2507.21188v1

705

07-27

Computational Advantages of Multi-Grade Deep Learning: Convergence Analysis and Performance Insights

Computationale Vorteile von Multi-Grade Deep Learning: Konvergenzanalyse und Leistungseinblicke

多年级深层学习的计算优势:趋同分析和业绩透视

2507.20351v1

706

07-27

From Observations to Causations: A GNN-based Probabilistic Prediction Framework for Causal Discovery

Von Beobachtungen zu Kausationen: Ein auf GNN basierendes probabilistisches Prognose-Framework für die kausale Entdeckung

从观察到因果关系:基于GNN的 “ 发现原因概率预测框架 “

2507.20349v1

707

07-27

RadMamba: Efficient Human Activity Recognition through Radar-based Micro-Doppler-Oriented Mamba State-Space Model

RadMamba: Effiziente Erkennung menschlicher Aktivität durch Radar-basiertes Mikro-Doppler-Orientiertes Mamba State-Space-Modell

RadMamba:通过以雷达为基础的以微型多普勒为导向的Mamba国家空间模型,有效认识人类活动

2504.12039v2

708

07-27

Hypergraph Neural Networks Reveal Spatial Domains from Single-cell Transcriptomics Data

Hypergraph Neuronale Netzwerke enthüllen räumliche Domänen aus Single-cell-Transkriptionsdaten

从单细胞转换器数据中提取空间域域

2410.19868v2

709

07-27

Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data

Interpretierbare Graph Kolmogorov-Arnold-Netzwerke für Multi-Cancer-Klassifikation und Biomarker-Identifikation mittels Multi-Omics-Daten

利用多有机数据进行多癌症分类和生物标志识别的可解释图表 Kolmogorov-Arnold网络

2503.22939v3

710

07-27

Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning

Pflegen hilfreicher, personalisierter und kreativer KI-Lehrer: Ein Rahmen für pädagogische Ausrichtung mittels Stärkungslernen

培养有助、个性化和创意的AI导师:利用强化学习实现教学协调的框架

2507.20335v1

711

07-27

The Blessing and Curse of Dimensionality in Safety Alignment

Der Segen und Fluch der Dimensionalität in der Sicherheitsausrichtung

安全协调中多维度的祝福和诅咒

2507.20333v1

712

07-27

FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing

FlowAlign: Trajektorie-regularisierte, inversionsfreie Fluss-basierte Bildbearbeitung

流动对等: 轨迹- 重新分类、转换- 无流动图像编辑

2505.23145v4

713

07-27

MIPS: a Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction

MIPS: ein multimodales Infinite Polymer Sequence Pre-Training Framework für Polymer Property Prediction

MIPS: 聚合物财产预测的多式联运无限聚合物序列培训前框架

2507.20326v1

714

07-27

ELMES: An Automated Framework for Evaluating Large Language Models in Educational Scenarios

ELMES: Ein automatisierter Rahmen für die Bewertung großer Sprachmodelle in Bildungsszenarien

ELMES:评估教育情景中大语言模式自动框架

2507.22947v1

715

07-27

A Comparative Study of OpenMP Scheduling Algorithm Selection Strategies

Eine vergleichende Studie der OpenMP-Scheduling-Algorithmus-Auswahlstrategien

OpenMP 测高计表选择战略比较研究

2507.20312v1

716

07-27

What is Wrong with Perplexity for Long-context Language Modeling?

Was ist falsch an Perplexität für Langkontext-Sprachmodellierung?

长上下文语言建模的困惑度有什么问题?

2410.23771v5

717

07-27

First-Order Sparse Convex Optimization: Better Rates with Sparse Updates

Sparse konvexe Optimierung erster Ordnung: Bessere Raten mit Sparse-Updates

一阶稀疏凸优化:通过稀疏更新获得更好的收敛率

2506.19075v2

718

07-27

Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach

Auf dem Weg zu einem generalisierten Parameter Tuning in kohärenten Ising-Maschinen: Ein portfoliobasierter Ansatz

面向相干伊辛机的通用参数调优:基于组合的方法

2507.20295v1

719

07-27

Context-Aware Deep Lagrangian Networks for Model Predictive Control

Context-Aware Deep Lagrangian Networks für Modellvorhersagesteuerung

用于模型预测控制的深拉格朗江网络

2506.15249v3

720

07-27

Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation

Kontrollierbares Feature Whitening für hyperparameterfreie Bias Mitigation

用于减缓超参数-无偏见的可控地貌白化

2507.20284v1

721

07-27

Machine Learning Model Integration with Open World Temporal Logic for Process Automation

Machine Learning Model Integration mit Open World Temporal Logic für die Prozessautomatisierung

与开放世界时间逻辑集成的机械学习模型集成

2506.17776v2

722

07-27

Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning

Agentar-Fin-R1: Verbesserung der Finanzintelligenz durch Domain-Expertise, Trainingseffizienz und Advanced Reasoning

Agentar-Fin-R1:通过领域专门知识、培训效率和高级推理加强金融智能

2507.16802v4

723

07-27

Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation

Vidar: Verkörpertes Video-Diffusionsmodell für die generalistische Bimanualmanipulation

Vidar: 通用主义二手手操纵录相传播模型

2507.12898v2

724

07-27

Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence

Annähernde vollständige konforme Vorhersage für neurale Netzwerkregression mit Gauß-Newton-Einfluss

在高斯-牛顿影响下对神经网络倒退进行近似全常规预测

2507.20272v1

725

07-27

Data-Efficient Prediction-Powered Calibration via Cross-Validation

Dateneffiziente Vorhersage-Powered Kalibrierung über Cross-Validation

通过交叉校准进行数据有效预测力校准

2507.20268v1

726

07-27

Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

Ungewissheits-Bewusst-Test-Zeit-Optimierung für 3D menschliche Pose-Schätzung

3D 人类粒子估计的不确定性-软件测试-时间优化

2402.02339v2

727

07-27

Learning from Expert Factors: Trajectory-level Reward Shaping for Formulaic Alpha Mining

Lernen von Experten-Faktoren: Trajektorien-Level-Reward-Formung für den Formelischen Alpha-Mining

从专家因素中学习:公式阿尔法采矿的轨迹级奖得分形状

2507.20263v1

728

07-27

Leveraging Analytic Gradients in Provably Safe Reinforcement Learning

Nutzung analytischer Gradienten im beweisbar sicheren Verstärkungslernen

在安全强化学习中利用分析梯度

2506.01665v2

729

07-27

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference

GQSA: Gruppe Quantisierung und Sparsamkeit für die Beschleunigung der großen Sprachmodellinferenz

GQSA:加速使用大语言模式模型推断的组量化和分数

2412.17560v4

730

07-27

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration für große Sprachmodelle

ABQ-LLM:大语言模型的任意-Bit量化推断加速

2408.08554v3

731

07-27

Semi-Supervised Risk Control via Prediction-Powered Inference

Halbüberwachte Risikokontrolle durch vorausschauende Schlussfolgerung

通过预测力推断的半监督风险控制

2412.11174v2

732

07-27

Protein-SE(3): Benchmarking SE(3)-based Generative Models for Protein Structure Design

Protein-SE(3): Benchmarking SE(3)-basierte Generative Modelle für Proteinstrukturdesign

蛋白因-SE(3):制定SE(3)基准的蛋白因结构设计生成模型

2507.20243v1

733

07-27

Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers

Contrast-CAT: Kontrastierende Aktivierungen für verbesserte Interpretierbarkeit in Transformer-basierten Textklassifikatoren

反对-CAT:在基于变换器的文本分类中增强解释力的对比活动

2507.21186v1

734

07-27

Recursive KalmanNet: Analyse des capacités de généralisation d’un réseau de neurones récurrent guidé par un filtre de Kalman

Rekursives KalmanNet: Analyse der Generalisierungsfähigkeiten eines durch einen Kalman-Filter geführten rekurrenten neuronalen Netzes

Recursive KalmanNet:卡尔曼滤波器引导的循环神经网络泛化能力分析

2507.14144v2

735

07-27

TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research

TinySQL: Ein progressiver Text-zu-SQL-Datensatz für die mechanistische Interpretationsforschung

TinySQL: 用于机械解释性研究的渐进文本到SQL数据集

2503.12730v4

736

07-27

GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance

GuidedQuant: Quantisierung von großen Sprachmodellen durch Ausnutzung der End Loss Guidance

GuidedQuant:通过利用最终损失指导对大语言模型进行量化

2505.07004v3

737

07-27

Adaptive Real-Time Multi-Loss Function Optimization Using Dynamic Memory Fusion Framework: A Case Study on Breast Cancer Segmentation

Adaptive Echtzeit-Multi-Loss-Funktionsoptimierung mittels Dynamic Memory Fusion Framework: Eine Fallstudie zur Brustkrebssegmentierung

利用动态记忆融合框架,利用动态记忆融合框架优化适应性实时多损失功能:乳腺癌分割案例研究

2410.19745v2

738

07-27

Technical Indicator Networks (TINs): An Interpretable Neural Architecture Modernizing Classical Technical Analysis for Adaptive Algorithmic Trading

Technical Indicator Networks (TINs): Eine interpretierbare Neuralarchitektur zur Modernisierung der klassischen technischen Analyse für adaptives algorithmisches Trading

技术指标网络(TINs):适应性定值贸易的现代经典技术分析解释性神经结构

2507.20202v1

739

07-27

Does equivariance matter at scale?

Spielt Äquivarianz im großen Maßstab eine Rolle?

在大规模下,等变性是否重要?

2410.23179v2

740

07-27

Partial Domain Adaptation via Importance Sampling-based Shift Correction

Partielle Domänenanpassung über wichtige Sampling-basierte Shift-Korrektur

通过基于重要性抽样的调整校正

2507.20191v1

741

07-27

NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis

NeuroCLIP: Eine multimodale kontrastive Lernmethode für rTMS-behandelte Methamphetamin-Addiktionsanalyse

NeuroCLIP:经RTMS处理的甲基苯丙胺成瘾分析的多式反竞争学习方法

2507.20189v1

742

07-27

Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation

Kommen Sie zusammen, aber nicht jetzt: Eine progressive Strategie, um Low-Rank-Anpassung zu fördern

齐心合力,但现在不是现在:一个推进低Rank适应的渐进战略

2506.05713v2

743

07-27

ASNN: Learning to Suggest Neural Architectures from Performance Distributions

ASNN: Neurale Architekturen aus Leistungsverteilungen vorschlagen lernen

ASNN: 学习从业绩分配中建议神经结构

2507.20164v1

744

07-27

Syno: Structured Synthesis for Neural Operators

Syno: Strukturierte Synthese für neurale Operatoren

同步:神经操作员结构化合成

2410.23745v2

745

07-27

Practical Multi-Task Learning for Rare Conversions in Ad Tech

Praktisches Multi-Task-Lernen für rare Konvertierungen in der Anzeigentechnik

利用实用多任务学习技术技术中稀有转换的多目的实用学习

2507.20161v1

746

07-27

On the Role of Discrete Representation in Sparse Mixture of Experts

Über die Rolle der diskreten Vertretung in der Sparse Mischung von Experten

关于专家在散乱混混中代表的混乱作用问题

2411.19402v2

747

07-27

SETOL: A Semi-Empirical Theory of (Deep) Learning

SETOL: Eine semi-empirische Theorie des (Tiefen) Lernens

SETOL:半经验学理论(深)学习

2507.17912v2

748

07-27

The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in Large Language Models

The Policy Cliff: Eine theoretische Analyse von Belohnungs-Policy-Karten in großen Sprachmodellen

政策悬崖:大语言模式奖励政策图的理论分析

2507.20150v1

749

07-27

Awesome-OL: An Extensible Toolkit for Online Learning

Awesome-OL: Ein umfangreiches Toolkit für das Online-Lernen

Awesome-OL:在线学习的可扩展工具包

2507.20144v1

750

07-27

Generalized Trusted Multi-view Classification Framework with Hierarchical Opinion Aggregation

Generalized Trusted Multi-View-Klassifikationsrahmen mit Hierarchischer Meinung Aggregation

普遍信任的多观点分类框架和等级性意见汇总

2411.03713v2

751

07-27

Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time

Verteiltes Lernen über willkürliche Topologie: Lineares Tempo-Up mit polynomischer Transienten Zeit

任意地形学的分布式学习:线性快速提升与多面性瞬时

2503.16123v2

752

07-27

EvoSLD: Automated Neural Scaling Law Discovery With Large Language Models

EvoSLD: Automatisierte Neural Scaling Law Discovery mit großen Sprachmodellen

EvoSLD: 用大语言模型发现自动神经放大法

2507.21184v1

753

07-27

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Wann funktioniert Metadata Conditioning (NOT) für Sprachmodell-Vorschulungen? Eine Studie mit kontextfreien Grammatiken

元数据条件(NOT)何时能为语言示范培训前培训提供语言示范?无背景语法研究

2504.17562v2

754

07-27

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

MaPPO: Maximale Posteriori-Preference-Optimierung mit vorherigem Wissen

MaPPO: 结合先验知识的最大后验偏好优化

2507.21183v1

755

07-27

Generative molecule evolution using 3D pharmacophore for efficient Structure-Based Drug Design

Generative Molekülentwicklung mit 3D-Pharmakore für effizientes strukturbasiertes Drug Design

利用3D药用磷进行高效结构制药物设计生成分子进化

2507.20130v1

756

07-27

Aggregation-aware MLP: An Unsupervised Approach for Graph Message-passing

Aggregation-aware MLP: Ein unbeaufsichtigter Ansatz für Graph Message-Passing

聚合意识 MLP: 图形信件传送的无人监督的方法

2507.20127v1

757

07-27

Minimax Optimal Reinforcement Learning with Quasi-Optimism

Minimax Optimales Stärkungslernen mit Quasi-Optimismus

以准适应主义进行最优化强化学习

2503.00810v3

758

07-27

Wine Characterisation with Spectral Information and Predictive Artificial Intelligence

Weincharakterisierung mit Spektralinformation und vorausschauender Künstlicher Intelligenz

光谱信息和预报人工智能的优美特征

2507.20114v1

759

07-27

Online Learning with Probing for Sequential User-Centric Selection

Online-Lernen mit Probing für die sequentielle Benutzer-Centric-Auswahl

在线学习,通过测试进行序列用户- Centric 选择

2507.20112v1

760

07-27

NeuroVoxel-LM: Language-Aligned 3D Perception via Dynamic Voxelization and Meta-Embedding

NeuroVoxel-LM: Sprachorientierte 3D-Perception über dynamische Voxelisierung und Meta-Embedding

NeuroVoxel-LM:通过动态氧化化和代谢生成的3D感知

2507.20110v1

761

07-27

Graded Transformers: A Symbolic-Geometric Approach to Structured Learning

Gradierte Transformer: Ein symbolisch-geometrischer Ansatz zum strukturierten Lernen

等级变换器:结构化学习的象征性地质计量方法

2507.20108v1

762

07-27

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$

RL$^3$: Förderung des Meta-Verstärkung-Lernens über RL innerhalb RL$^2$

RL$^3$:通过RL$^2$内部的RL促进元强化学习

2306.15909v6

763

07-27

Analytic Continual Test-Time Adaptation for Multi-Modality Corruption

Analytische kontinuierliche Test-Zeit-Anpassung für Multi-Modalität Korruption

多模式腐败分析分析的连续测试时间适应多模式腐败

2410.22373v2

764

07-27

EcoTransformer: Attention without Multiplication

EcoTransformer: Achtung ohne Multiplikation

生态转换:注意不乘数

2507.20096v1

765

07-27

Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning

Meta Fusion: Ein einheitliches Rahmenwerk für Multimodalitätsfusion mit gegenseitigem Lernen

元融合:多式联运与相互学习统一框架

2507.20089v1

766

07-27

Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs

超越自反应内核:历史驱动目标,实现高效的非线性非线性通用图形MCMC

2505.18300v3

767

07-27

Feed-anywhere ANN (I) Steady Discrete $\to$ Diffusing on Graph Hidden States

Feed-anywhere ANN (I) Steady Discrete $\to$ Diffusion auf Graph Hidden States

Feed-anywhere ANN (I) 稳态离散 $\to$ 图隐藏状态上的扩散

2507.20088v1

768

07-26 (6)

Cluster Purge Loss: Structuring Transformer Embeddings for Equivalent Mutants Detection

Cluster Purge Loss: Strukturierung von Transformer-Embeddings für äquivalente Mutanten-Detektion

组群清除损失:对等变异物探测的变异体嵌入结构

2507.20078v1

769

07-26

Sparse Equation Matching: A Derivative-Free Learning for General-Order Dynamical Systems

Sparse Equation Matching: Ein Derivativ-freies Lernen für allgemein geordnete dynamische Systeme

分布分布式配对:通用平极动态系统无衍生性无损学习

2507.20072v1

770

07-26

Multi-Person Interaction Generation from Two-Person Motion Priors

Multi-Personen-Interaktionsgenerierung von Zwei-Personen-Motion-Prioren

从两人先前动议中产生多人相互影响

2505.17860v2

771

07-26

PERRY: Policy Evaluation with Confidence Intervals using Auxiliary Data

PERRY: Politikevaluierung mit Vertrauensintervallen unter Verwendung von Zusatzdaten

使用辅助数据进行具有互信性的政策评价

2507.20068v1

772

07-26

PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training

PITA: Präferenz-geführte Inferenz-Zeit-Ausrichtung für LLM nach dem Training

PITA:LLM培训后培训的优先指导推论-时间协调

2507.20067v1

773

07-26

Geometric Operator Learning with Optimal Transport

Geometrisches Operator-Lernen mit optimalem Transport

基于最优传输的几何算子学习

2507.20065v1

774

07-26

Strategic Filtering for Content Moderation: Free Speech or Free of Distortion?

Strategisches Filtern für Content Moderation: Freie Sprache oder frei von Verzerrung?

内容调节的战略过滤: 言论自由还是无扭曲?

2507.20061v1

775

07-26

ModShift: Model Privacy via Designed Shifts

ModShift: Model Privacy über Designed Shifts

ModShift: 通过设计变换实现的模型隐私

2507.20060v1

776

07-26

RAG in the Wild: On the (In)effectiveness of LLMs with Mixture-of-Knowledge Retrieval Augmentation

RAG in the Wild: Über die (In)Wirksamkeit von LLMs mit Mixture-of-Knowledge Retrieval Augmentation

野外的RAG:关于采用混合知识检索增强的LLM的(无)效力

2507.20059v1

777

07-26

Predicting Parkinson’s Disease Progression Using Statistical and Neural Mixed Effects Models: A Comparative Study on Longitudinal Biomarkers

Vorhersage der Progression der Parkinson-Krankheit anhand statistischer und neuraler Mixed Effects-Modelle: Eine vergleichende Studie über Längsschnittbiomarker

利用统计和神经混合效应模型预测帕金森氏疾病进展:纵向生物标记的比较研究

2507.20058v1

778

07-26

What Can Grokking Teach Us About Learning Under Nonstationarity?

Was kann Grokking uns über das Lernen unter Nonstationarität lehren?

格罗金能教我们什么如何在不固定状态下学习?

2507.20057v1

779

07-26

Improving Deep Learning-based Respiratory Sound Analysis with Frequency Selection and Attention Mechanism

Verbesserung der Deep Learning-basierten Atemschallanalyse mit Frequenzauswahl und Aufmerksamkeitsmechanismus

利用频率选择和注意机制改进基于深学习的呼吸系统无害分析

2507.20052v1

780

07-26

$K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning

$K^4$: Online Log Anomalienerkennung durch unüberwachtes Lernen

$K^4$:通过无监督典型性学习进行在线日志异常检测

2507.20051v1

781

07-26

Irredundant $k$-Fold Cross-Validation

Irredundant $k$-Fold Cross-Validierung

无冗余 $k$ 折交叉验证

2507.20048v1

782

07-26

Improving Audio Classification by Transitioning from Zero- to Few-Shot

Verbesserung der Audioklassifikation durch Übergang von Null- auf Wenig-Schuss

通过从零转向少热,改进音频分类

2507.20036v1

783

07-26

Selective Prompt Anchoring for Code Generation

Selektive Prompt-Ankerung für die Code-Generierung

用于代码生成的选择性提示锚定

2408.09121v6

784

07-26

Preference learning made easy: Everything should be understood through win rate

Vorliebe Lernen leicht gemacht: Alles sollte durch Win-Rate verstanden werden

首选学习容易:人人都应通过双赢率来理解一切

2502.10505v2

785

07-26

Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization

Machine-Learning-Assisted Photonic Device Development: Ein multiskaliger Ansatz von der Theorie zur Charakterisierung

机学辅助光学设备开发:从理论到定性的多尺度方法

2506.20056v2

786

07-26

When Engineering Outruns Intelligence: A Re-evaluation of Instruction-Guided Navigation

Wenn Engineering Outruns Intelligenz: Eine Neubewertung der instruction-guided Navigation

当工程超越智能:重新评价指令引导导航

2507.20021v1

787

07-26

Conformal Safety Shielding for Imperfect-Perception Agents

Konforme Sicherheitsabschirmung für Imperfect-Perception Agents

为不合格感化物剂提供正规安全防护

2506.17275v2

788

07-26

Shape Invariant 3D-Variational Autoencoder: Super Resolution in Turbulence flow

Shape Invariant 3D-Variational Autoencoder: Super Auflösung im Turbulenzfluss

形状 3D - 变化式自动编码器: 波动流中的超级分辨率

2507.22082v1

789

07-26

A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

Eine Praxis des Post-Trainings auf Llama-3 70B mit optimaler Auswahl des zusätzlichen Sprachmischverhältnisses

Llama-3-70B培训后做法,最佳选择其他语言混合比率

2409.06624v2

790

07-26

FedSWA: Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging

FedSWA: Verbesserung der Generalisierung im Federated Learning mit hoch Heterogenen Daten über Momentum-basierte stochastische kontrollierte Gewichtsverringerung

FedSWA:通过基于动力的存储器控控湿率提高具有高度异异变数据的联邦学习普及程度

2507.20016v1

791

07-26

PaRCE: Probabilistic and Reconstruction-based Competency Estimation for CNN-based Image Classification

PaRCE: Probabilistische und rekonstruktionsbasierte Kompetenzschätzung für CNN-basierte Bildklassifikation

PaRCE:基于CNN的图像分类的概率和基于重建的能力估计

2411.16715v3

792

07-26

Robust Taxi Fare Prediction Under Noisy Conditions: A Comparative Study of GAT, TimesNet, and XGBoost

Robuste Taxi-Fare-Prognose unter Lärmbedingungen: Eine vergleichende Studie von GAT, TimesNet und XGBoost

噪音条件下的强劲的出租车票价预测:GAT比较研究,TimesNet和XGBoost

2507.20008v1

793

07-26

HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning

HeLo: Heterogene Multi-Modal Fusion mit Labelkorrelation für Emotion Distribution Learning

HeLo:情感分布学习中带有标签关联的异变多模式融合

2507.06821v3

794

07-26

MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning

MeTHanol: Modularisiertes Denken von Sprachmodellen mit Intermediate Layer Thinking, Decodierung und Bootstrapping Reasoning

METHanol:含有中间层思考、解毒和诱导理由的模块化思维语言模型

2409.12059v5

795

07-26

GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning

GLC++: Source-Free Universal Domain Adaptation durch Global-Local Clustering und Contrastive Affinity Learning

GLC++:通过全局-局部聚类和对比亲和学习实现无源通用域适应

2403.14410v2

796

07-26

The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Die dunkle Seite der Kräfte: Bewertung nicht konservativer Kraftmodelle für atomistisches maschinelles Lernen

部队的黑暗面:评估非保守力量模型,以进行原子学机器学习

2412.11569v5

797

07-26

Efficient Vocal-Conditioned Music Generation via Soft Alignment Attention and Latent Diffusion

Effiziente stimmkonditionierte Musikgeneration über Soft Alignment Aufmerksamkeit und Latent Diffusion

通过软对齐关注和远程传播, 高效的Vocal有条件的音乐制作

2507.19991v1

798

07-26

Visual Analytics Using Tensor Unified Linear Comparative Analysis

Visual Analytics mit Tensor Unified Linear Comparative Analysis

利用透光器统一线性比较分析进行视觉分析

2507.19988v1

799

07-26

Recurrent neural network wave functions for Rydberg atom arrays on kagome lattice

Recurrent neuronale Netzwerkwellenfunktionen für Rydberg-Atomarrays auf Kagome-Gitter

Rydberg kagome 板上原子阵列的经常性神经网络波函数

2405.20384v2

800

07-26

LLM-Adapted Interpretation Framework for Machine Learning Models

LLM-adapted Interpretation Framework for Machine Learning Models

LLM-成熟的机器学习模型解释框架

2507.21179v1

801

07-26

Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge

Robustes Daten-Wasserzeichen in Sprachmodellen durch Einspritzen fiktiver Kenntnisse

在语言模型中,通过输入有说服力的知识在语言模型中进行强力数据水上标记

2503.04036v3

802

07-26

The Origin of Self-Attention: Pairwise Affinity Matrices in Feature Selection and the Emergence of Self-Attention

Der Ursprung der Selbstachtung: Paarweise Affinitätsmatrizen in der Feature-Auswahl und das Entstehen der Selbstachtung

自我关注的起源:选择地物中的对等亲亲关系母体和自我关注的出现

2507.14560v2

803

07-26

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

MegaScale-Infer: Servieren von Mixture-of-Experts auf Scale mit disaggregierten Experten-Parallelismus

超星级――推:利用分级专家平行主义在规模上为混合专家服务

2504.02263v4

804

07-26

Extreme value theory for singular subspace estimation in the matrix denoising model

Extreme Werttheorie für die singuläre Subraumschätzung im Matrix-Denoisierungsmodell

矩阵除空模型中单子空间估计极端值极端值理论

2507.19978v1

805

07-26

A roadmap for AI in robotics

Fahrplan für KI in der Robotik

机器人用人工智能的路线图

2507.19975v1

806

07-26

NestQuant: Nested Lattice Quantization for Matrix Products and LLMs

NestQuant: Nested Lattice Quantization für Matrix-Produkte und LLMs

NestQuant: 矩阵乘积和LLM的嵌套格量化

2502.09720v3

807

07-26

SkinDualGen: Prompt-Driven Diffusion for Simultaneous Image-Mask Generation in Skin Lesions

SkinDualGen: Prompt-getriebene Diffusion für die gleichzeitige Bild-Maske-Generierung in Hautläsionen

SkinDualGen: 皮肤遗迹中同声图像元件生成的快速驱动扩散

2507.19970v1

808

07-26

Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training

Dimer-Enhanced Optimization: Ein Ansatz erster Ordnung, um Sattelpunkte im neuralen Netzwerktraining zu überwinden

优化优化:在神经网络培训中以第一阶梯方式解剖搭配点

2507.19968v1

809

07-26

Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning

Multi-Agenten-Verstärkungs-Lernen mit großflächiger Mixed-Traffic- und Intersektionskontrolle

利用多剂强化学习系统进行大型混合运输和跨部门控制

2504.04691v2

810

07-26

Who Owns This Sample: Cross-Client Membership Inference Attack in Federated Graph Neural Networks

Wer besitzt dieses Beispiel: Cross-Client Mitgliedschaft Inferenz Attack in Federated Graph Neural Networks

拥有此样本者: 联邦神经网络的跨气候成员推论攻击

2507.19964v1

811

07-26

Preconditioned Inexact Stochastic ADMM for Deep Model

Vorkonditioniertes inexaktes stochastisches ADMM für Deep Model

用于深型号的预设不灵巧的ADMMD

2502.10784v3

812

07-26

$K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting

$K^2$VAE: Ein Koopman-Kalman-Verbesserter Variations-AutoEncoder für probabilistische Zeitreihenprognosen

$K^2$VAE: 概率时间序列预测的Koopman-Kalman增强变分自动编码器

2505.23017v3

813

07-26

Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

Sprachenübergreifendes Reisen: Benchmarking Cross-Lingual Consistency in multimodalen LLMs

跨语言旅行:多模式LLM中跨语言一致基准

2505.15075v4

814

07-26

Negative Dependence as a toolbox for machine learning : review and new developments

Negative Abhängigkeit als Werkzeugkasten für maschinelles Lernen: Überprüfung und Neuentwicklungen

消极依赖作为机器学习的工具箱:审查与新发展

2502.07285v2

815

07-26

Simple Policy Optimization

Einfache Optimierung der Politik

简单政策优化

2401.16025v9

816

07-26

Deep Learning Based Joint Channel Estimation and Positioning for Sparse XL-MIMO OFDM Systems

Deep Learning Based Joint Channel Schätzung und Positionierung für Sparse XL-MIMO OFDM Systeme

稀疏 XL-MIMO OFDM系统的基于深度学习的联合信道估计和定位

2507.19936v1

817

07-26

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Frontier AI Risk Management Framework in der Praxis: Ein technischer Bericht zur Risikoanalyse

《国际边界风险管理框架实际操作:风险分析技术报告》

2507.16534v2

818

07-26

ReCA: A Parametric ReLU Composite Activation Function

ReCA: Eine parametrische ReLU-Kompositaktivierungsfunktion

ReCA: 参数化ReLU复合激活函数

2504.08994v2

819

07-26

Efficient Shallow Ritz Method For 1D Diffusion-Reaction Problems

Effiziente Ritz-Methode für 1D-Diffusionsreaktionsprobleme

用于1D 扩散反应问题的高效浅流机法

2407.01496v4

820

07-26

Tractable Representation Learning with Probabilistic Circuits

Tractable Representative Learning mit probabilistischen Schaltungen

利用概率电路进行可追踪的代表性学习

2507.04385v2

821

07-26

Interleaved Multitask Learning with Energy Modulated Learning Progress

Interleaved Multitask Learning mit energiemoduliertem Lernfortschritt

利用能源调整的学习进度进行跨间多任务学习

2504.00707v2

822

07-26

Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting

Erklärung der Design-Wahl der Wahrscheinlichkeitspfade im Flow-Matching für Vorhersagen

说明预测流程匹配中概率路径的设计选择

2410.03229v3

823

07-26

SoftPipe: A Soft-Guided Reinforcement Learning Framework for Automated Data Preparation

SoftPipe: Ein Soft-Guided Reinforcement-Learning-Framework für die automatisierte Datenvorbereitung

SoftPipe: 自动数据编制软件辅助强化学习框架

2507.13710v2

824

07-26

The Impact of Fine-tuning Large Language Models on Automated Program Repair

Die Auswirkungen von Feinabstimmungen großer Sprachmodelle auf die automatisierte Programmreparatur

微调大语言模型对自动方案维修的影响

2507.19909v1

825

07-26

Faithful Differentiable Reasoning with Reshuffled Region-based Embeddings

Treue differenzierbare Vernunft mit neugeschaffenen, regionsbasierten Einbettungen

以区域为基础的嵌入式

2406.09529v2

826

07-26

TS-Insight: Visualizing Thompson Sampling for Verification and XAI

TS-Insight: Visualisierung der Thompson-Probenahme für Verifikation und XAI

TS-深入观察:可视化Thompson抽样核查和XAI

2507.19898v1

827

07-26

Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach

Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach

用于群稀疏反馈线性二次最优控制的非凸优化框架。二:非惩罚性办法

2507.19895v1

828

07-26

A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction

Eine Umfrage zum Generativen Modell-Unlearning: Grundlagen, Taxonomie, Evaluation und Zukunftsrichtung

关于生成模型遗忘学习的综述:基本原理、分类、评估和未来方向

2507.19894v1

829

07-26

CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation

CLoRA: Parameter-Effizientes kontinuierliches Lernen mit Low-Rank-Anpassung

CLORA:低Rank适应的参数有效持续学习

2507.19887v1

830

07-26

CoSTI: Consistency Models for (a faster) Spatio-Temporal Imputation

CoSTI: Konsistenzmodelle für (eine schnellere) Spatio-Temporale Imputation

CoSTI:用于(更快的)时空插补的一致性模型

2501.19364v2

831

07-26

DRL-AdaPart: DRL-Driven Adaptive STAR-RIS Partitioning for Fair and Frugal Resource Utilization

DRL-AdaPart: DRL-getriebene adaptive STAR-RIS-Partitionierung für faire und frugale Ressourcennutzung

DRL-AdaPart: DRL驱动的自适应STAR-RIS分割,促进公平和节约的资源利用

2407.06868v2

832

07-26

RestoreAI – Pattern-based Risk Estimation Of Remaining Explosives

RestoreAI – Musterbasierte Risikoschätzung von verbleibenden Sprengstoffen

RestoreAI – 基于模式的剩余爆炸物风险估计

2507.19873v1

833

07-26

Numerical Artifacts in Learning Dynamical Systems

Numerische Artefakte im Lernen dynamischer Systeme

学习动态系统中的数值手法

2507.14491v2

834

07-26

Quantum-Informed Machine Learning for Chaotic Systems

Quanteninformiertes maschinelles Lernen für chaotische Systeme

半量量制系统半成型机器学习

2507.19861v1

835

07-26

Training Neural Networks for Modularity aids Interpretability

Training neuronaler Netzwerke auf Modularität fördert die Interpretierbarkeit

模块辅助工具神经网络培训

2409.15747v2

836

07-26

Taming Domain Shift in Multi-source CT-Scan Classification via Input-Space Standardization

Domänenumschichtung in der CT-Scan-Klassifikation mit mehreren Quellen mittels Input-Space Standardisierung

通过输入空间标准化实现多源CT-scan分类的多源CT-Scan域变换

2507.19858v1

837

07-26

MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation

MultiKernelBench: Ein Multi-Platform Benchmark für die Kernel-Generation

MultiKernelBench: 内核生成的多平台基准

2507.17773v2

838

07-26

Agentic Reinforced Policy Optimization

Agentische verstärkte politische Optimierung

强化政策优化

2507.19849v1

839

07-26

VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets

VAE-GAN-basierte Preismanipulation in koordinierten lokalen Energiemärkten

VAE-GAN 协调的地方能源市场价格操纵

2507.19844v1

840

07-26

Hybrid Deep Learning and Handcrafted Feature Fusion for Mammographic Breast Cancer Classification

Hybrides Deep Learning und handwerkliche Feature Fusion für die mammographische Brustkrebsklassifikation

哺乳性乳腺癌分类的深层学习和手工制作的特征融合混合体

2507.19843v1

841

07-26

A Low-complexity Structured Neural Network to Realize States of Dynamical Systems

Ein strukturiertes neurales Netzwerk mit geringer Komplexität zur Realisierung von Zuständen dynamischer Systeme

实现动态系统状态的低复杂结构神经网络

2503.23697v2

842

07-26

GNSP: Gradient Null Space Projection for Preserving Cross-Modal Alignment in VLMs Continual Learning

GNSP: Gradient Null Raumprojektion zur Erhaltung der Cross-Modal Alignment in VLMs Continual Learning

GNSP: 用于在VLMs 持续学习中保持跨模式一致的渐进号空间预测

2507.19839v1

843

07-26

Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact

Bewertung des Selbstüberwachten Lernens in der medizinischen Bildgebung: Ein Maßstab für Robustheit, Verallgemeinerbarkeit und Multi-Domain-Impact

评价医疗成像方面的自我监督学习:强力、通用性和多领域影响基准

2412.19124v2

844

07-26

On the rates of convergence for learning with convolutional neural networks

Über die Konvergenzraten für das Lernen mit konvolutionären neuronalen Netzwerken

与进进进神经网络学习的趋同率

2403.16459v3

845

07-26

FedBAP: Backdoor Defense via Benign Adversarial Perturbation in Federated Learning

FedBAP: Backdoor Defense via Benign Adversarial Perturbation im Federated Learning

FedBAP:通过联邦学习中的良性对抗扰动进行后门防御

2507.21177v1

846

07-26

HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs

HCAttention: Extreme KV Cache Compression via Heterogenes Aufmerksamkeitsrechnen für LLMs

HCAttention:通过不同式注意计算法对LLMs进行极端KV缓存压缩

2507.19823v1

847

07-26

Debunking Optimization Myths in Federated Learning for Medical Image Classification

Debunking Optimization Myths in Federated Learning für medizinische Bildklassifikation

联邦医学图像分类学习联合会中最优化的神话

2507.19822v1

848

07-26

LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models

LLM-Barber: Block-Aware Rebuilder für Sparsity Maske in One-Shot für große Sprachmodelle

LLM-Barber:大语言模型单点单层面罩块件重建器

2408.10631v2

849

07-26

Identification and estimation for matrix time series CP-factor models

Identifizierung und Schätzung von Matrix-Zeitreihen CP-Faktor-Modellen

确定和估算矩阵时间序列、时间序列、CPC-因素模型

2410.05634v3

850

07-26

Sequence-based protein-protein interaction prediction and its applications in drug discovery

Sequenzbasierte Protein-Protein-Interaktions-Vorhersage und ihre Anwendungen in der Arzneimittel-Entdeckung

基于序列的蛋白蛋白质-蛋白质相互作用预测及其在药物发现中的应用

2507.19805v1

851

07-26

AI-Based Clinical Rule Discovery for NMIBC Recurrence through Tsetlin Machines

KI-basierte klinische Regel-Discovery für NMIBC-Wiederkehr durch Tsetlin-Maschinen

通过Tsetlin 机器对NMIBC的再现发现基于AI的临床规则

2507.19803v1

852

07-26

AI/ML Life Cycle Management for Interoperable AI Native RAN

AI/ML Life Cycle Management für interoperable KI Native RAN

面向可互操作AI原生RAN的AI/ML生命周期管理

2507.18538v2

853

07-26

Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling

Weiterentwicklung der Materialentdeckung mit Valence Constrained Design in der Generativen Modellierung

在产生模型模型中用贵重受控设计发现增强材料的发现

2507.19799v1

854

07-26

Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

Langsame Entscheidungshäufigkeiten in der kontinuierlichen Kontrolle überwinden: Modellbasiertes Sequenz-Verstärkungs-Lernen für modellfreie Steuerung

克服持续控制中缓慢决定因素:无模式控制的示范序列强化学习

2410.08979v5

855

07-26

Analyzing and Mitigating Repetitions in Trip Recommendation

Analyse und Eindämmung von Wiederholungen in der Reiseempfehlung

分析和减轻《Trip建议》中的重复上诉

2507.19798v1

856

07-26

Smaller, Faster, Cheaper: Architectural Designs for Efficient Machine Learning

Kleiner, schneller, billiger: Architekturdesigns für effizientes maschinelles Lernen

更小、更快、更便宜:高效机械学习的建筑设计

2507.19795v1

857

07-26

Adversarial Combinatorial Semi-bandits with Graph Feedback

Adversariale Kombinatoriale Halbbänder mit Graph Feedback

带有图图反馈的半斜面

2502.18826v5

858

07-26

Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures

Sparse-Mode Dynamische Moduszersetzung für die Disambiguierung lokaler und globaler Strukturen

局部和全球结构的偏差分解

2507.19787v1

859

07-26

SpecBPP: A Self-Supervised Learning Approach for Hyperspectral Representation and Soil Organic Carbon Estimation

SpecBPP: Ein selbstüberwachter Lernansatz für die Hyperspektraldarstellung und Boden-organische Kohlenstoffabschätzung

SpecBPP:超光谱代表性和土壤有机碳估计的自我监督学习方法

2507.19781v1

860

07-26

NeuSemSlice: Towards Effective DNN Model Maintenance via Neuron-level Semantic Slicing

NeuSemSlice: Auf dem Weg zu einer effektiven DNN-Modellpflege über Semantisches Schneiden auf Neuron-Ebene

NeuSemSlice:通过神经元级语义切片实现有效的 DNN 模型维护

2407.20281v2

861

07-26

Bag of Coins: A Statistical Probe into Neural Confidence Structures

Münzbeutel: Eine statistische Sonde in neurale Vertrauensstrukturen

《一袋硬币:对神经信心结构的统计研究》

2507.19774v1

862

07-26

Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction

Imitation Learning in Continuous Action Spaces: Compounding Fehler ohne Wechselwirkungen

连续行动空间的模拟学习:没有相互作用的减缓化合物错误

2507.09061v2

863

07-26

Large Language Model Agent for Structural Drawing Generation Using ReAct Prompt Engineering and Retrieval Augmented Generation

利用再行动即时工程和再取回增强型

2507.19771v1

864

07-26

Generalizable Targeted Data Poisoning against Varying Physical Objects

Verallgemeinerbare gezielte Datenvergiftung gegen unterschiedliche physische Objekte

针对不同物体的通用定向中毒数据

2412.03908v2

865

07-26

MIAT: Maneuver-Intention-Aware Transformer for Spatio-Temporal Trajectory Prediction

MIAT: Manöver-Intention-Bewusst-Transformer für Spatio-Temporale Trajektorien-Vorhersage

MIAT: 用于时空轨迹预测的机动意图感知Transformer

2504.05059v2

866

07-26

A Machine Learning Framework for Predicting Microphysical Properties of Ice Crystals from Cloud Particle Imagery

Ein Machine Learning Framework zur Vorhersage mikrophysikalischer Eigenschaften von Eiskristallen aus der Cloud-Partikelbildgebung

从云粒图像中预测冰晶微物理特性的机器学习框架

2507.19759v1

867

07-26

MTCAE-DFER: Multi-Task Cascaded Autoencoder for Dynamic Facial Expression Recognition

MTCAE-DFER: Multi-Task Cascaded Autoencoder für dynamische Gesichtsausdruckerkennung

MTCAE-DFER: 用于确认动态面谱表达式的多塔卡岩层自动编码器

2412.18988v2

868

07-26

Moving Out: Physically-grounded Human-AI Collaboration

Ausstieg: physikalisch begründete Mensch-AI-Kollaboration

搬出:基于身体的人类 – – AI协作

2507.18623v2

869

07-26

Modeling enzyme temperature stability from sequence segment perspective

Modellierung von Enzymtemperaturstabilität aus Sequenzsegment-Perspektive

从序列段角度对酶温度稳定性进行建模

2507.19755v1

870

07-26

Detecting Multimedia Generated by Large AI Models: A Survey

Multimedia-Erkennung durch große KI-Modelle: Eine Umfrage

由大型AI模型生成的多媒体检测:调查

2402.00045v7

871

07-26

Extended Histogram-based Outlier Score (EHBOS)

Erweiterter Histogrammbasierter Outlier-Score (EHBOS)

以直方图为基础的扩展外部分数(EHBOS)

2502.05719v2

872

07-26

DOA: A Degeneracy Optimization Agent with Adaptive Pose Compensation Capability based on Deep Reinforcement Learning

DOA: Ein Degenerierungs-Optimierungs-Agent mit adaptiver Pose-Kompensationsfähigkeit auf Basis von Deep Reinforcement Learning

DOA: 一种基于深强化学习的适应性胶囊补偿能力脱精优化剂

2507.19742v1

873

07-26

LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation

LaMAGIC2: Erweiterte Schaltungsformulierungen für sprachmodellbasierte analoge Topologie-Generierung

LaMAGIC2:语言模拟模拟模拟地形生成的先进电路配制

2506.10235v2

874

07-26

Predicting Human Mobility in Disasters via LLM-Enhanced Cross-City Learning

Vorhersage der menschlichen Mobilität in Katastrophen durch LLM-verbessertes Stadtübergreifendes Lernen

通过LLM-加强跨城市学习,预测人类在灾害中的流动性

2507.19737v1

875

07-26

Reinforcement Learning for Finite Space Mean-Field Type Games

Verstärkung Lernen für Finite Space Mean-Field Type Spiele

有限空间中场内运动会强化学习

2409.18152v3

876

07-26

A Metabolic-Imaging Integrated Model for Prognostic Prediction in Colorectal Liver Metastases

Ein metabolisch-imaging integriertes Modell für prognostische Vorhersagen in Colorectal Lebermetastasen

彩色活性器元件中预测预测综合模型

2507.19734v1

877

07-26

The Pitfalls of Imitation Learning when Actions are Continuous

Die Pitfalls des Imitationslernens, wenn die Handlungen kontinuierlich sind

连续行动时的模拟学习空洞

2503.09722v4

878

07-26

Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness

Deep RL Dual Sourcing Bestandsmanagement mit Versorgungs- und Kapazitätsrisiko-Bewusstsein

具有供应和能力风险意识的深入RL 双重保值双重保值库存管理

2507.14446v3

879

07-26

Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Geometrische Struktur von flachen neuronalen Netzwerken und konstruktive Kostenminimierung ${\mathcal L}^2$

浅层神经网络的几何结构以及将成本降至最低的建设性美元=2美元

2309.10370v3

880

07-25 (5)

GSCache: Real-Time Radiance Caching for Volume Path Tracing using 3D Gaussian Splatting

GSCache: Echtzeit-Radianz-Caching für Volume Path Tracing mit 3D Gaussian Splatting

GSCache: 使用 3D Gaussian Splatting 进行体积路径追踪的实时辐射缓存

2507.19718v1

881

07-25

Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search

Beyond Nearest Neighbors: Semantische Kompression und Graph-Augmented Retrieval für erweiterte Vektorsuche

近邻以外地区:用于增强矢量搜索的语义压缩和图形放大检索

2507.19715v1

882

07-25

Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

Oranits: Missionszuweisung und Aufgabe-Offloading in Open RAN-basierten ITS mit Hilfe von Metaheuristic und Deep Reinforcement Learning

Oranits:利用超常和深强化学习在以开放RAN为基础的ITS中执行特派任务和卸载任务

2507.19712v1

883

07-25

A Lightweight Deep Learning-based Model for Ranking Influential Nodes in Complex Networks

Ein leichtes Deep Learning-basiertes Modell für das Ranking von Einflussknoten in komplexen Netzwerken

在复杂网络中确定有影响的节点的轻量级深学习模式

2507.19702v1

884

07-25

A Validation Approach to Over-parameterized Matrix and Image Recovery

Ein Validierungsansatz für überparameterisierte Matrix und Bildwiederherstellung

超参数矩阵和图像恢复的验证方法

2209.10675v3

885

07-25

Disjoint Generative Models

Disjoint Generative Modelle

分离生成模型

2507.19700v1

886

07-25

NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: A Multi-Modal Dataset and Methodology

NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: Ein multimodaler Datensatz und Methodik

用于大规模POI共同访问预测的NAICS感知图神经网络:多模态数据集和方法

2507.19697v1

887

07-25

BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning

BEAVER: Bauen von Umgebungen mit einschätzbarer Variation zur Bewertung von multi-objektiven Verstärkungslernen

BEAVER: 在环境建设中采用可评估的变数评估多目标强化学习

2507.07769v3

888

07-25

KD-GAT: Combining Knowledge Distillation and Graph Attention Transformer for a Controller Area Network Intrusion Detection System

KD-GAT: Kombination von Wissensdestillation und Graphen-Achtungstransformator für ein Controller Area Network Intrusion Detection System

KD-GAT:将知识蒸馏和图形关注变异器合并成一个总控制区域网络入侵探测系统

2507.19686v1

889

07-25

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

kompositorische Fähigkeiten treten multiplikativ auf: Die Erforschung von Diffusionsmodellen auf einer synthetischen Aufgabe

多重复制:探索合成工作传播模型

2310.09336v5

890

07-25

Salsa as a Nonverbal Embodied Language – The CoMPAS3D Dataset and Benchmarks

Salsa als nonverbale Sprache – Der CoMPAS3D Datensatz und Benchmarks

Salsa 作为一种非语言的成形语言 – – CoMPAS3D数据集和基准

2507.19684v1

891

07-25

Feature learning is decoupled from generalization in high capacity neural networks

Feature-Lernen wird von der Generalisierung in hochkapazitätigen neuronalen Netzwerken entkoppelt

特色学习与高容量神经网络的一般化脱钩

2507.19680v1

892

07-25

Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges

Ausrichtung und Sicherheit in großen Sprachmodellen: Sicherheitsmechanismen, Trainingsparadigmen und neue Herausforderungen

大语言模式的协调和安全:安全机制、培训范式和新挑战

2507.19672v1

893

07-25

Adaptive Bayesian Data-Driven Design of Reliable Solder Joints for Micro-electronic Devices

Adaptives Bayesian Data-Driven Design von zuverlässigen Lötgelenken für mikroelektronische Geräte

微电子器件可靠焊点的自适应贝叶斯数据驱动设计

2507.19663v1

894

07-25

Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity

Nicht-konvexe Matrix-Erfassung: Brechen der quadratischen Rank-Schranke in der Probenkomplexität

非曲线矩阵表感测:打破样本复杂程度的二次级屏障

2408.13276v4

895

07-25

Growing Neural Networks: Dynamic Evolution through Gradient Descent

Wachsende neurale Netzwerke: Dynamische Evolution durch gradienten Abstieg

不断增长的神经网络:通过渐渐后代的动态演变

2501.18012v2

896

07-25

On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments

Über die Grenzen von Ray-Tracing für lernbasierte RF-Aufgaben in städtischen Umgebungen

城市环境中基于学习的RF任务

2507.19653v1

897

07-25

Street network sub-patterns and travel mode

Straßennetz-Untermuster und Reisemodus

街道网络次级模式和旅行模式

2507.19648v1

898

07-25

GABRIL: Gaze-Based Regularization for Mitigating Causal Confusion in Imitation Learning

GABRIL: Gaze-based Regularization zur Minderung von Kausalverwirrung im Imitationslernen

GABRIL: 基于注视的正则化,用于缓解模仿学习中的因果混淆

2507.19647v1

899

07-25

Categorical Schrödinger Bridge Matching

Kategorische Schrödinger-Brücke passend

分类式 Schrödinger 桥配对

2502.01416v3

900

07-25

Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy

Variationelle Schlussfolgerung optimiert mit der gekrümmten Geometrie der gekoppelten freien Energie

使用共同自由能源曲线几何法优化

2506.09091v3

901

07-25

Mask prior-guided denoising diffusion improves inverse protein folding

Maskenprior-geführte Denoising-Diffusion verbessert inverse Proteinfaltung

面罩前制导除去喷雾扩散会改善蛋白质反折叠

2412.07815v2

902

07-25

Directly Learning Stock Trading Strategies Through Profit Guided Loss Functions

Direktes Lernen von Tradingstrategien durch gewinnorientierte Verlustfunktionen

通过利润引导损失直接学习证券交易战略

2507.19639v1

903

07-25

MOCK: an Algorithm for Learning Nonparametric Differential Equations via Multivariate Occupation Kernel Functions

MOCK: ein Algorithmus für das Lernen nichtparametrischer Differentialgleichungen über multivariate Aufgaben-Kernel-Funktionen

MOCK: 通过多变量职业核心函数学习非参数等量的分等函数的算法

2306.10189v4

904

07-25

Efficient and Scalable Agentic AI with Heterogeneous Systems

Effiziente und skalierbare Agentische KI mit Heterogenen Systemen

基于异构系统的高效可扩展智能体AI

2507.19635v1

905

07-25

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

LoX: Low-Rank-Extrapolation stärkt LLM-Sicherheit gegen Feinabstimmung

LoX:低Rank外推法强力推力LLM 安全防止微调

2506.15606v3

906

07-25

Quantum Reinforcement Learning by Adaptive Non-local Observables

Quanten-Verstärkung-Lernen durch adaptive nicht-lokale Observables

适应性非当地可观测的非当地可观测物体的量级强化学习

2507.19629v1

907

07-25

Federated Calculation of the Free-Support Transportation Barycenter by Single-Loop Dual Decomposition

Föderierte Berechnung des Free-Support-Transport-Barycenters durch Single-Loop Dual Decomposition

按单一卢普两极分解法对自由支持运输百分中心进行的联邦计算

2507.19627v1

908

07-25

Studying number theory with deep learning: a case study with the Möbius and squarefree indicator functions

Zahlentheorie mit Deep Learning studieren: eine Fallstudie mit Möbius und quadratfreien Indikatorfunktionen

深深学习研究数字理论:与莫比乌斯和无平方指标函数有关的案例研究

2502.10335v2

909

07-25

State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees

Zustandsentwicklung über die Methoden erster Ordnung I: Starre Vorhersagen und endliche Stichprobengarantien

一: 严格预测和有限抽样保证

2507.19611v1

910

07-25

Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark

Tiefe unüberwachte Domain-Anpassung für die Zeitreihenklassifikation: ein Benchmark

时间序列分类:基准

2312.09857v3

911

07-25

MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?

MOCHA: Sind Code-Sprachenmodelle gegen multi-Turn bösartige Coding-Prompts robust?

MOCHA:守则语言模型是否强力打击多发恶意编码的提示?

2507.19598v1

912

07-25

Affordance-Guided Reinforcement Learning via Visual Prompting

Affordance-geführtes Verstärkungslernen durch visuelles Prompting

通过视觉促视学习,提供负担得起的辅助强化教育

2407.10341v6

913

07-25

Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning

Geospatielles Wissen abmildern Halluzination in großen Sprachmodellen: Benchmarking und Dynamische Faktizität Ausrichtung

减轻大语言模式中的地理空间知识幻觉:基准和动态事实对齐

2507.19586v1

914

07-25

Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts

Weiterentwicklung der Event-Prognose durch massives Training von großen Sprachmodellen: Herausforderungen, Lösungen und breitere Auswirkungen

通过大规模培训大语言模式:挑战、解决办法和更广泛影响

2507.19477v1

915

07-25

Let It Go? Not Quite: Addressing Item Cold Start in Sequential Recommendations with Content-Based Initialization

Lassen Sie es los? Nicht ganz: Adressieren von Item Cold Start in sequentiellen Empfehlungen mit Content-basierte Initialisierung

让它走吗?不是相当的:在基于内容的初始化的序列建议中处理项目“冷启动”

2507.19473v1

916

07-25

Is Exchangeability better than I.I.D to handle Data Distribution Shifts while Pooling Data for Data-scarce Medical image segmentation?

Ist Austauschbarkeit besser als I.I.D, um Datenverteilungsverschiebungen zu handhaben, während Daten für Data-scarce Medizinische Bildsegmentierung gepoolt werden?

是否比I. I. D. 更适于处理数据分配的转移,而处理数据分散的合并数据的医疗图像分割?

2507.19575v1

917

07-25

ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation

ReSem3D: Verfeinerbare 3D-Raumeinschränkungen durch feinkörnige semantische Erdung für eine generalisierbare Robotermanipulation

ReSem3D:通过精密的可通用机器人操纵的语义定位,改进3D空间限制

2507.18262v2

918

07-25

Linearly Convergent Algorithms for Nonsmooth Problems with Unknown Smooth Pieces

Linear konvergente Algorithmen für nichtglatte Probleme mit unbekannten glatten Stücken

与未知平滑小块的非移动问题线性一致的线性算法

2507.19465v1

919

07-25

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

RADLADS: Schnelle Aufmerksamkeitsdestillation zu linearen Aufmerksamkeitsdecodern auf Scale

RADLADS: 缩放线性引引代码的快速注意蒸馏

2505.03005v3

920

07-25

Fast Learning of Non-Cooperative Spacecraft 3D Models through Primitive Initialization

Schnelles Lernen nicht-kooperativer 3D-Modelle von Spacecraft durch Primitive Initialisierung

通过初始初始化快速学习非合作航天器3D模型

2507.19459v1

921

07-25

Hierarchical Deep Reinforcement Learning Framework for Multi-Year Asset Management Under Budget Constraints

Hierarchischer Lernrahmen für vertiefte Stärkung von mehrjähriger Vermögensverwaltung im Rahmen von Haushaltszwängen

在预算制约下多年资产管理多年资产管理的等级式深层强化学习框架

2507.19458v1

922

07-25

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

GEPA: Reflektierende Prompt-Evolution kann Verstärkungs-Lernen übertreffen

GEPA: 反思即时进化能够超过成绩的强化学习

2507.19457v1

923

07-25

Forest-Guided Clustering – Shedding Light into the Random Forest Black Box

Wald-geführte Clustering – Licht in die zufällige Wald Black Box

森林引导集束 – – 将亮光放入随机森林黑盒

2507.19455v1

924

07-25

GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences

GVCCS: Ein Datensatz zur kontrailen Identifizierung und Verfolgung sichtbarer Ganzhimmel-Kamerasequenzen

GVCCS:用于可见光全天空相机序列中凝结尾迹识别与跟踪的数据集

2507.18330v2

925

07-25

Bounded KRnet and its applications to density estimation and approximation

Gebundenes KRnet und seine Anwendungen zur Dichteschätzung und -annäherung

KRnet及其在密度估计和近似方面的应用

2305.09063v4

926

07-25

Gradient-based grand canonical optimization enabled by graph neural networks with fractional atomic existence

Gradient-basierte große kanonische Optimierung durch Graphen neuronale Netzwerke mit fraktional atomare Existenz ermöglicht

由具有分原子存在的图形神经网络促成的基于梯度的大锥体优化

2507.19438v1

927

07-25

Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization

Beobachtungen treffen auf Aktionen: Learning Control-Sufficient Representations for Robust Policy Generalization

行动:学习控制-足够的代表性,促进强有力的政策普遍化

2507.19437v1

928

07-25

TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

TaylorPODA: Eine Taylor Expansion-basierte Methode zur Verbesserung der Post-Hoc-Attributionen für Opaque-Modelle

TaylorPODA:一种基于泰勒展开的方法,用于改进不透明模型的事后归因

2507.10643v2

929

07-25

ASR-Guided Speaker-Role Diarization and Diarization-Guided ASR Decoding

ASR-geführte Lautsprecher-Rolle-Diarisierung und Diarisierung-geführte ASR-Dekodierung

ASR引导的说话人角色日志与日志引导的ASR解码

2507.17765v2

930

07-25

Distillation Scaling Laws

Destillationsskalierungsgesetze

强化法律

2502.08606v2

931

07-25

Integrating Physics and Topology in Neural Networks for Learning Rigid Body Dynamics

Integrieren von Physik und Topologie in neurale Netzwerke zum Lernen von Starrkörperdynamik

将物理和地形学纳入学习硬体体动力学神经网络

2411.11467v3

932

07-25

Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

Schritt-3 ist groß und dennoch erschwinglich: Modell-System-Co-Design für kostengünstige Decodierung

第3步是大号但价格可承受的:具有成本效益的编码模型系统共同设计。

2507.19427v1

933

07-25

Perfect Clustering in Very Sparse Diverse Multiplex Networks

Perfektes Clustering in sehr Sparse Diverse Multiplex-Netzwerke

在非常分散的多元多功能网络中完美分组

2507.19423v1

934

07-25

Programmable Virtual Humans Toward Human Physiologically-Based Drug Discovery

Programmierbare virtuelle Menschen auf dem Weg zur physiologischen Drogenentdeckung

人类生理病理药物发现方案虚拟人类

2507.19568v1

935

07-25

CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing

CircuitProbe: Spatiotemporale visuelle Semantik mit Circuit Tracing

电路探测:用电路追踪解剖时光视觉语义

2507.19420v1

936

07-25

SILS: Strategic Influence on Liquidity Stability and Whale Detection in Concentrated-Liquidity DEXs

SILS: Strategischer Einfluss auf Liquiditätsstabilität und Whale Detection in Konzentrations-Liquiditäts-DEXs

SILS: 集中-公平性DEX对流动性稳定和捕鲸探测的战略影响

2507.19411v1

937

07-25

On Arbitrary Predictions from Equally Valid Models

Auf willkürliche Vorhersagen von gleichermaßen gültigen Modellen

从同等有效模式作出的任意预测

2507.19408v1

938

07-25

Review of Deep Learning Applications to Structural Proteomics Enabled by Cryogenic Electron Microscopy and Tomography

Überprüfung von Deep-Learning-Anwendungen zur strukturellen Proteomik durch die kryogene Elektronenmikroskopie und Tomographie aktiviert

审查通过低低温电动显微镜和地形学对结构蛋白质组的深学习应用

2507.19565v1

939

07-25

FD4QC: Application of Classical and Quantum-Hybrid Machine Learning for Financial Fraud Detection A Technical Report

FD4QC: Anwendung von klassischem und Quantum-Hybrid-Maschinenlernen für die Erkennung von Finanzbetrug Ein technischer Bericht

FD4QC:应用古典和量子研究机器学习用于金融欺诈侦查技术报告

2507.19402v1

940

07-25

Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data

Erlernen ursächlich vorhersehbarer Ergebnisse aus Psychiatrischen Langzeitdaten

精神病纵向数据产生的可预期的学习结果

2506.16629v4

941

07-25

Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning

Disentangled Latent Spaces erleichtern datengestütztes Hilfslernen

促进数据驱动辅助学习

2310.09278v3

942

07-25

Multi-fidelity Bayesian Data-Driven Design of Energy Absorbing Spinodoid Cellular Structures

Multi-Fidelity Bayesian Data-Driven Design of Energy Absorbing Spinodoid Zelluläre Strukturen

多纤维贝耶斯数据驱动设计

2507.22079v1

943

07-25

Agreement-Based Cascading for Efficient Inference

Vereinbarungsbasiertes Cascading für effiziente Schlussfolgerungen

以协议为基础的高效推断的连锁计算

2407.02348v3

944

07-25

Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)

Multimodale Recurrent-Ensembles zur Vorhersage von Gehirnreaktionen auf naturalistische Filme (Algonauten 2025)

预测对自然电影的脑反应的多式经常性多年度联合会议(2025年8月20日)

2507.17897v2

945

07-25

Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question

Vielfältige LLMs oder unterschiedliche Frageinterpretationen? Das ist die Ensembling-Frage

不同的LLMs或不同的问题解释?

2507.21168v1

946

07-25

Learning neuro-symbolic convergent term rewriting systems

neuro-symbolische konvergente Begriffs-Rewriting-Systeme lernen

学习神经 – – 共聚性神经 – – 神经 – – 共用术语重写系统

2507.19372v1

947

07-25

Deep Learning for Double Auction

Deep Learning für doppelte Auktion

双重拍卖深度学习

2504.05355v2

948

07-25

Counterfactual Explanations in Medical Imaging: Exploring SPN-Guided Latent Space Manipulation

Counterfactual Erklärungen in der medizinischen Bildgebung: Erforschung SPN-geführter latenter Raummanipulation

医疗成像中的反事实解释:探索SPN引导的潜在空间操纵

2507.19368v1

949

07-25

A Data-Driven Approach to Estimate LEO Orbit Capacity Models

Ein datengestützter Ansatz zur Schätzung von LEO-Orbit-Kapazitätsmodellen

数据驱动的低地轨道轨道能力估计模型方法

2507.19365v1

950

07-25

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences

LOTUS: Ein Leaderboard für detaillierte Bildunterschriften von Qualität zu gesellschaftlichen Bias und Benutzereinstellungen

LOTUS: 从质量到社会偏见和用户首选的详细图像描述领导板

2507.19362v1

951

07-25

EffiComm: Bandwidth Efficient Multi Agent Communication

EffiComm: Bandbreite Effiziente Multi Agent Kommunikation

EffiComm: 宽带高效多代理通信

2507.19354v1

952

07-25

Reconstruction of Sparse Urban Wireless Signals via Group Equivariant Non-Expansive Operators

Rekonstruktion spärlicher urbaner Funksignale über gruppenäquivariante nicht-expansive Operatoren

通过集团等离差非扩大经营人重建城市无线无线信号

2507.19349v1

953

07-25

Short-Form Video Recommendations with Multimodal Embeddings: Addressing Cold-Start and Bias Challenges

Kurzform-Video-Empfehlungen mit multimodalen Einbettungen: Bewältigung von Kaltstart- und Bias-Herausforderungen

短形式视频建议,带有多模式嵌有:应对冷发和偏见挑战

2507.19346v1

954

07-25

Lower Bounds on the Size of Markov Equivalence Classes

Untere Grenzen auf der Größe der Markov-Äquivalenzklassen

马克夫等等效类大小的下下界界圈

2506.20933v2

955

07-25

Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs

Verdoppelung Ihrer Daten in Minuten: Ultraschnelle Tabellendatenerstellung über LLM-induzierte Abhängigkeitsgraphen

将数据翻倍:通过LLM-引导依赖图生成超快制表数据

2507.19334v1

956

07-25

SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence

SIDE: Sparse Information Entfremdung für erklärbare künstliche Intelligenz

SIDE: 用于可解释人工智能的稀疏信息解耦

2507.19321v1

957

07-25

Human-AI Synergy in Adaptive Active Learning for Continuous Lithium Carbonate Crystallization Optimization

Human-AI-Synergie im adaptiven aktiven Lernen für kontinuierliche Lithium-Karbonat-Kristallisierungs-Optimierung

人类-AI 在不断碳化液晶化优化的适应性积极学习中的人类-AI协同效应

2507.19316v1

958

07-25

Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer

Erzeugen klinisch realistischer EHR-Daten über einen Hierarchie- und Semantik-geführten Transformer

通过等级和语义学指导变换器生成临床现实的 EHR 数据

2502.20719v2

959

07-25

Accelerometry-based Energy Expenditure Estimation During Activities of Daily Living: A Comparison Among Different Accelerometer Compositions

Accelerometry-based Energy Expenses Abschätzung während der Aktivitäten des täglichen Lebens: Ein Vergleich unter verschiedenen Accelerometer Zusammensetzungen

日常生活活动期间的能源支出估计:不同加速计构成的比较

2502.10112v2

960

07-25

Negative news posts are less prevalent and generate lower user engagement than non-negative news posts across six countries

Negative Nachrichtenposts sind weniger verbreitet und erzeugen ein geringeres Nutzerengagement als nicht negative Nachrichtenposts in sechs Ländern

与六个国家的非负面新闻站相比,负新闻站不太普遍,用户参与率低于非负面新闻站

2507.19300v1

961

07-25

Controlling Topological Defects in Polar Fluids via Reinforcement Learning

Kontrolle topologischer Defekte in Polarflüssigkeiten durch Verstärkungslernen

通过强化学习控制极地流体的地形病变

2507.19298v1

962

07-25

Interpretable Cross-Sphere Multiscale Deep Learning Predicts ENSO Skilfully Beyond 2 Years

Interpretable Cross-Sphere Multiscale Deep Learning prognostiziert ENSO skilfully beyond 2 Years

跨跨阶段多尺度深层学习预测

2503.21211v2

963

07-25

Query Efficient Structured Matrix Learning

Effizientes strukturiertes Matrix-Lernen abfragen

查询高效结构化矩阵学习

2507.19290v1

964

07-25

Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments

Knowledge Grafting: Ein Mechanismus zur Optimierung von KI-Modellen in ressourcenbeschränkten Umgebungen

知识获取:优化在资源受限制的环境中采用AI模型模型的机制

2507.19261v1

965

07-25

Reactivation: Empirical NTK Dynamics Under Task Shifts

Reaktivierung: Empirische NTK-Dynamik unter Aufgabenverschiebungen

重新激活: 任务变换下的NTK实证动态

2507.16039v2

966

07-25

Delphos: A reinforcement learning framework for assisting discrete choice model specification

Delphos: Ein Verstärkungs-Lernrahmen zur Unterstützung diskreter Auswahlmodellspezifikation

Delphos:一个强化学习框架,协助制定独立选择模型规格

2506.06410v2

967

07-25

A Markov Categorical Framework for Language Modeling

Ein kategorisches Markov-Rahmenwerk für Sprachmodellierung

用于语言建模的 Markov 语言建模分类框架

2507.19247v1

968

07-25

AGORA: Incentivizing Group Emergence Capability in LLMs via Group Distillation

AGORA: Anreize für Gruppenerneuerungsfähigkeit in LLMs durch Gruppendestillation

AGORA:通过集体蒸馏在LLMs中激励群体新兴能力

2507.21166v1

969

07-25

OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

OCSVM-geführtes Repräsentationslernen für unüberwachte Anomalieerkennung

OCSVM - 不受监督的异常检测指导代表性学习

2507.21164v1

970

07-25

Component-Based Machine Learning for Indoor Flow and Temperature Fields Prediction Latent Feature Aggregation and Flow Interaction

Komponentenbasiertes maschinelles Lernen für Indoor Flow und Temperaturfelder Vorhersage Latent Feature Aggregation und Flow Interaktion

基于组成部分的室内流动和温度场室内流动和温度场机器学习

2507.19233v1

971

07-25

Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning

Verlängerung des Werkzeuglebens: Erlernen eines kompetenzvollen Einsatzes von Allzweck-Werkzeugen durch lebenslanges Stärkungslernen

延长工具寿命:通过终身指导强化学习学习如何熟练使用普通用途工具

2507.17275v2

972

07-25

Pilot Contamination-Aware Graph Attention Network for Power Control in CFmMIMO

用于CFmMIMO功率控制的导频污染感知图注意力网络

2506.00967v2

973

07-25

SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in Oncology

SigBERT: Kombination narrativer medizinischer Berichte und rough Path Signature Theory zur Einschätzung des Überlebensrisikos in der Onkologie

SigBERT: 将叙述性医疗报告与肿瘤学生存风险估算的粗路签名理论相结合

2507.22941v1

974

07-25

Dependency-aware synthetic tabular data generation

Dependency-aware synthetische tabellarische Datengenerierung

依赖意识合成表格数据生成

2507.19211v1

975

07-25

Physics-Informed Graph Neural Networks for Transverse Momentum Estimation in CMS Trigger Systems

Physik-informierte Graph-Neural-Netzwerke für transversale Momentum-Schätzung in CMS-Triggersystemen

CMS 触发系统反向动动动估计物理学综合图形神经网络

2507.19205v1

976

07-25

Latent Granular Resynthesis using Neural Audio Codecs

Latent Granular Resynthesis mit neuralen Audio Codecs

使用神经音频编码器进行前端颗粒恢复合成

2507.19202v1

977

07-25

WACA-UNet: Weakness-Aware Channel Attention for Static IR Drop Prediction in Integrated Circuit Design

WACA-UNet: Schwachheits-Bewusst-Kanal Aufmerksamkeit für statische IR-Drop-Vorhersage im integrierten Schaltungsdesign

WACA-UNet: 综合电路设计中静态IR投射预测的弱敏声道注意

2507.19197v1

978

07-25

Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?

Kann Small-Scale-Datenvergiftung Dialect-Linked Biases in großen Sprachmodellen exazerbieren?

在大语言模型中,小范围数据中毒加剧分解链接的分界线能否成为大语言模型?

2507.19195v1

979

07-25

Bespoke multiresolution analysis of graph signals

Maßgeschneiderte Multiauflösungsanalyse von Graphensignalen

对图形信号进行多分辨率分析

2507.19181v1

980

07-25

Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs

Maximale Redundanz-Beschneidung: Eine schichtweise spärliche Zuordnung für LLMs

最大冗余剪枝:面向LLM的原则驱动逐层稀疏度分配

2503.18377v2

981

07-25

Automatic Cough Analysis for Non-Small Cell Lung Cancer Detection

Automatische Cough-Analyse für nicht-kleinzellige Lungenkrebserkennung

非细胞细胞肺癌检测的自动咳嗽分析

2507.19174v1

982

07-25

Doubly Regularized Entropic Wasserstein Barycenters

Doppelt regularisierte entropische Wasserstein Barycenter

普通化的 Entropic Wasserstein 巴利中心

2303.11844v2

983

07-25

Explainable AI guided unsupervised fault diagnostics for high-voltage circuit breakers

Erklärbare KI-geführte, unbeaufsichtigte Fehlerdiagnose für Hochspannungs-Leistungsschalter

可解释的AI 指导高压断路断路器不受监督的故障诊断

2507.19168v1

984

07-25

Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them

Scalpel vs. Hammer: GRPO verstärkt bestehende Fähigkeiten, SFT ersetzt sie

手术刀与锤子:GRPO 放大现有能力,SFT 替换它们

2507.10616v2

985

07-25

Harnessing intuitive local evolution rules for physical learning

Nutzung intuitiver lokaler Evolutionsregeln für körperliches Lernen

利用自然直觉的地方进化规则进行物理学习

2507.19561v1

986

07-25

Learnable cut flow for high energy physics

Lernbarer Schnittfluss für die Hochenergiephysik

高能物理可学习的高能物理可削减流量

2503.22498v2

987

07-25

ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination

ReCoDe: Verstärktes Learning-basiertes dynamisches Constraint-Design für Multi-Agent-Koordination

ReCoDe:基于强化学习的多智能体协调动态约束设计

2507.19151v1

988

07-25

Studying Cross-cluster Modularity in Neural Networks

Cross-Cluster-Modularität in neuralen Netzwerken studieren

神经网络跨集群模块化研究

2502.02470v3

989

07-25

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Beschleunigung multimodaler Großsprachenmodelle über Dynamic Visual-Token Exit und die Empirical Findings

通过动态直视退出和实证结论加速多模式大语言模型

2411.19628v2

990

07-25

Diverse and Adaptive Behavior Curriculum for Autonomous Driving: A Student-Teacher Framework with Multi-Agent RL

Vielfältiges und adaptives Verhalten Curriculum für autonomes Fahren: Ein Schüler-Lehrer-Rahmen mit Multi-Agent RL

自主驾驶的多样化和适应行为课程:学生-教师框架与多代理RL

2507.19146v1

991

07-25

Solar Photovoltaic Assessment with Large Language Model

Solar-Photovoltaik-Abschätzung mit großem Sprachmodell

采用大语言模式的太阳能光伏评估

2507.19144v1

992

07-25

Game-Theoretic Gradient Control for Robust Neural Network Training

Spiel-Theoretische Gradientensteuerung für robustes Neural Network Training

强力神经网络培训游戏- 理论梯度控制

2507.19143v1

993

07-25

Large Language Models as Attribution Regularizers for Efficient Model Training

Große Sprachmodelle als Attribution Regularizer für effiziente Modellschulungen

大语言模式,作为高效模式培训的指定正规化机构

2502.20268v3

994

07-25

Graph Structure Learning with Privacy Guarantees for Open Graph Data

Graph Structure Learning mit Datenschutzgarantien für offene Graph Data

带有开放图表数据隐私保障的图表结构学习

2507.19116v1

995

07-25

Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation

Destillieren eines kleinen Utility-Based Passage Selectors zur Verbesserung der Retrieval-Augmented Generation

蒸馏一个小型以公用事业为基础的通道选择器,以加强回收-提款一代

2507.19102v1

996

07-25

Graph Neural Network-Based Predictor for Optimal Quantum Hardware Selection

Graph Neuronaler Netzwerk-basierter Vorhersager für eine optimale Quanten-Hardware-Auswahl

优化量子硬件选择的神经网络预测器

2507.19093v1

997

07-25

Mean flow data assimilation using physics-constrained Graph Neural Networks

Mittlere Durchflussdatenassimilation mittels physikisch bedingter Graph-Neural-Netzwerke

利用受物理学限制的图形神经网络进行平均流量数据同化

2411.09476v3

998

07-25

Clustering-Oriented Generative Attribute Graph Imputation

Clustering-oriented generative Attribute Graph Imputation

以集群为主的生成图数

2507.19085v1

999

07-25

Ambient Noise Full Waveform Inversion with Neural Operators

Ambient Noise Full Waveform Inversion mit neuralen Operatoren

使用神经操作器的环境噪音全波形反向

2503.15013v3

1000

07-25

A self-supervised neural-analytic method to predict the evolution of COVID-19 in Romania

Eine selbstüberwachte neural-analytische Methode zur Vorhersage der Entwicklung von COVID-19 in Rumänien

一种自我监督的神经分析方法,用以预测罗马尼亚COVID-19的演变

2006.12926v3

1001

07-25

ToolACE: Winning the Points of LLM Function Calling

ToolACE: Die Punkte des LLM-Funktionsaufrufs gewinnen

ToolACE:赢得LLM函数调用点

2409.00920v2

1002

07-25

Towards Sustainability Model Cards

Auf dem Weg zu Nachhaltigkeitsmodellkarten

走向可持续性示范卡

2507.19559v1

1003

07-25

XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

XAI4LLM. Lassen Sie Modelle für maschinelles Lernen und LLMs für verbessertes In-Context-Lernen im Gesundheitswesen zusammenarbeiten

XAI4LLM. 让机器学习模式和LLM合作促进保健领域加强内文学习

2405.06270v4

1004

07-25

Generating Adversarial Point Clouds Using Diffusion Model

Erstellen von Adversarial Point Clouds mit Diffusionsmodell

使用扩散模型生成反向点云

2507.21163v1

1005

07-25

Exploring molecular assembly as a biosignature using mass spectrometry and machine learning

Erforschung der molekularen Montage als Biosignatur mittels Massenspektrometrie und maschinellem Lernen

利用质量光谱测量和机器学习,探索分子组装作为一种生物签字

2507.19057v1

1006

07-25

Closing the Modality Gap for Mixed Modality Search

Schließen der Modalitätslücke für gemischte Modalitätssuche

缩小混合方式搜索模式差距

2507.19054v1

1007

07-25

Dynamics-Informed Reservoir Computing with Visibility Graphs

Dynamisch-informiertes Reservoir Computing mit Sichtbarkeitsgraphen

具有可见度图的动态化储量计算

2507.19046v1

1008

07-25

Large Language Model Powered Automated Modeling and Optimization of Active Distribution Network Dispatch Problems

Großes Sprachmodell Automatisierte Modellierung und Optimierung von Netzwerk-Dispatch-Problemen

大型语文示范电动自动建模和优化主动分发网络调度问题

2507.21162v1

1009

07-25

Neural Ordinary Differential Equations for Learning and Extrapolating System Dynamics Across Bifurcations

用于学习和外推跨分岔系统动力学的神经常微分方程

2507.19036v1

1010

07-25

ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs

ProGMLP: Progressive Rahmenbedingungen für die GNN-to-MLP-Wissensdestillation mit effizienten Trade-offs

ProGMLP:全球NN-MLP知识提炼与有效取舍的渐进框架

2507.19031v1

1011

07-25

Stella Nera: A Differentiable Maddness-Based Hardware Accelerator for Efficient Approximate Matrix Multiplication

Stella Nera: Ein differenzierter Maddness-basierter Hardware-Beschleuniger für eine effiziente, annähernde Matrix-Multiplikation

Stella Nera: 高效近光矩阵乘法的有区别的基于 Maddness 的硬件加速器

2311.10207v2

1012

07-25

Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues

Jenseits von Frames: Zero-Shot-Vorhersage der Fußgängerabsicht mit rohem zeitlichem Video und multimodalen Cues

环视框架之外:用原始时光视频和多模式结壳进行零点热热热热食人故意预测

2507.21161v1

1013

07-25

MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster

MindSpeed RL: Distributed Dataflow für skalierbare und effiziente RL-Schulungen auf Ascend NPU Cluster

MindSpeed RL: 用于对 Ascend NPU 群集进行可缩放和高效 RL 培训的分布式数据流

2507.19017v1

1014

07-25

Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains

Causal Mechanism Abschätzung in Multi-Sensor-Systemen über mehrere Domains

跨多域多传感器系统中因果机制估算

2507.17792v2

1015

07-25

A diffusion-based generative model for financial time series via geometric Brownian motion

Ein diffusionsbasiertes generatives Modell für finanzielle Zeitreihen über geometrische Brownsche Bewegung

通过几何布朗运动的金融时间序列基于扩散的遗传模型

2507.19003v1

1016

07-25

Adapting to Fragmented and Evolving Data: A Fisher Information Perspective

Anpassung an fragmentierte und sich entwickelnde Daten: Eine Fisher-Informations-Perspektive

适应零碎和不断演变的数据:Fisher 信息视角

2507.18996v1

1017

07-25

Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning

Interaction-Merged Motion Planning: Diverse Motion-Datensätze für robuste Planung effektiv nutzen

交互式组合式动态规划:有效利用多种移动式数据集进行强力规划

2507.04790v3

1018

07-25

Agent0: Leveraging LLM Agents to Discover Multi-value Features from Text for Enhanced Recommendations

Agent0: LLM-Agenten nutzen, um Multi-Value-Features aus Text für erweiterte Empfehlungen zu entdecken

Agent0: 利用LLM代理器从强化建议文本中发现多价值特性

2507.18993v1

1019

07-25

Reinforcement Learning via Conservative Agent for Environments with Random Delays

Verstärktes Lernen über Conservative Agent for Environments mit zufälligen Verzögerungen

通过 “ 随机延缓环境保守代理 “ 强化学习

2507.18992v1

1020

07-25

GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units

GENIAL: Generative Design Space Exploration über Netzwerk-Inversion für stromarme algorithmische Logische Einheiten

GENIAL:通过网络转换生成设计空间探索,用于低功率测算仪

2507.18989v1

1021

07-25

CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

CLIP-geführte Backdoor-Verteidigung durch Entropie-basierte vergiftete Datensatztrennung

CLIP-通过基于英基中毒数据集的分离来引导后门防御

2507.05113v2

1022

07-25

Verbalized Representation Learning for Interpretable Few-Shot Generalization

Verbalisiertes Repräsentationslernen für verdolmetschbare wenige-heiße Verallgemeinerung

以口头方式进行代表性学习,为可口译的少或偏的普及化提供口译

2411.18651v2

1023

07-25

Differentiated Thyroid Cancer Recurrence Classification Using Machine Learning Models and Bayesian Neural Networks with Varying Priors: A SHAP-Based Interpretation of the Best Performing Model

Unterschiedliche Schilddrüsenkrebs-Wiederkehr-Klassifikation mit Machine Learning-Modellen und bayesischen neuralen Netzwerken mit unterschiedlichen Prioren: Eine SHAP-basierte Interpretation des besten Performing-Modells

使用机械学习模型和有不同前科的贝叶斯神经网络的有区别的甲状腺癌症重复发生分类:以SHAP为基础对最佳性能模型进行解释

2507.18987v1

1024

07-25

KASPER: Kolmogorov Arnold Networks for Stock Prediction and Explainable Regimes

KASPER: Kolmogorov Arnold Netzwerke für Stock Prediction und erklärbare Regimes

KASPER: Kolmogorov Arnold 股票预测和可解释制度网络

2507.18983v1

1025

07-25

Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies

Mixed-Reality Digital Twins: Nutzung der physischen und virtuellen Welten für Hybrid Sim2Real Transition von Multi-Agent Verstärkungs-Learning-Politiken

混合-现实数字双对:利用物理和虚拟世界促进混合的Sim2重新过渡多机构强化学习政策

2403.10996v7

1026

07-25

Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

Faire Algorithmen mit Probing für Multi-Agent Multi-Armed Bandits

多代理多武装强盗验证法的公允算法

2506.14988v3

1027

07-25

Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution

Konzept-TRAK: Verstehen, wie Diffusionsmodelle Konzepte durch Konzept-Level-Attribution lernen

概念-TRAK:了解传播模式如何通过概念层面的归属来学习概念

2507.06547v2

1028

07-25

Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling

Auf dem Weg zur Verbesserung der Langzeitprognosen von Entity in zeitlichen Wissensgraphen durch globale Ähnlichkeiten und gewichtete Stichproben

通过全球相似性和加权抽样改进时间知识图中的长期审计实体预测

2507.18977v1

1029

07-25

A Toolbox, Not a Hammer – Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation

Eine Toolbox, kein Hammer – Multi-TAG: Skalierung der Mathematik mit Multi-Tool-Aggregation

一个工具箱, 不是锤锤 – – 多TAG: 使用多工具聚合的量性数学解释

2507.18973v1

1030

07-25

Underwater Waste Detection Using Deep Learning A Performance Comparison of YOLOv7 to 10 and Faster RCNN

Unterwasser-Abfallerkennung mit Deep Learning: Ein Leistungsvergleich von YOLOv7 bis 10 und Faster RCNN

利用深度学习进行水下垃圾检测:YOLOv7至10与Faster RCNN的性能比较

2507.18967v1

1031

07-25

On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem

Auf der Erforschung des inneren Spiegelabflusses für stochastisches nichtkonvexes beschränktes Problem

探索内镜面下下下流的内孔反镜下流,以缓解杂乱的非电流制约问题

2507.15264v3

1032

07-25

MedicalBERT: enhancing biomedical natural language processing using pretrained BERT-based model

MedicalBERT: Verbesserung der biomedizinischen natürlichen Sprachverarbeitung mit vorgebildetem BERT-basiertem Modell

医学BERT:利用预先培训的BERT模式,加强生物医学自然语言处理

2507.08013v2

1033

07-25

Handling Out-of-Distribution Data: A Survey

Umgang mit Out-of-Distribution-Daten: Eine Umfrage

处理分发外数据:调查

2507.21160v1

1034

07-25

Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity

Adaptive Cluster Zusammenarbeit steigert LLMs medizinische Entscheidungsunterstützung Kapazität

自适应集群协作提升 LLM 医疗决策支助能力

2507.21159v1

1035

07-25

Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights

Neural Tangent Kernel und Fisher Information Matrizen für einfache ReLU-Netzwerke mit zufälligen versteckten Gewichten

带有随机隐藏重的简单 ReLU 网络神经相垂直内核和渔业信息矩阵

2507.18555v2

1036

07-25

CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction over Medium-range Forecast Periods

CNN-basierte Surface Temperature Forecasts mit Ensemble Numerische Wettervorhersage über Mittelstrecken-Prognoseperioden

基于CNN(卷积神经网络)的地表温度预报,结合中程预报期的集合数值天气预报

2507.18937v1

1037

07-25

When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label

Wenn geräuschvolle Etiketten die Klassenungleichgewichte auf Graphen treffen: Eine grafische Augmentationsmethode mit LLM und Pseudo-Label

当噪音标签在图表上达到类平衡时:与LLM和Pseudo标签的图表放大法

2507.18153v2

1038

07-25

Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction

Geometrische Multi-Color-Nachricht Passing Graph Neural Networks für Blut-Hirn-Barriere Permeabilität Vorhersage

用于血脑屏障通透性预测的几何多色消息传递图神经网络

2507.18926v1

1039

07-25

From Conditional to Unconditional Independence: Testing Conditional Independence via Transport Maps

Von der bedingten zur bedingungslosen Unabhängigkeit: Prüfung der bedingten Unabhängigkeit über Transportkarten

从有条件独立到无条件独立:通过运输图测试有条件独立

2504.09567v3

1040

07-25

A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions

Eine systematische Überprüfung der Systeme der wichtigsten retrieval-Augmented Generation (RAG): Fortschritt, Lücken und Zukunftsrichtungen

系统审查关键回收-养代(RAG)系统:进展、差距和未来方向

2507.18910v1

1041

07-25

Probably Approximately Correct Causal Discovery

Wahrscheinlich ungefähr korrekte Kausalentdeckung

可能大致正确原因发现

2507.18903v1

1042

07-25

PLEIADES: Building Temporal Kernels with Orthogonal Polynomials

PLEIADES: Bau von Temporalkernen mit orthogonalen Polynomen

PLEIADES:用正交多项式构建时间核

2405.12179v4

1043

07-25

Bridging Quantum and Classical Computing in Drug Design: Architecture Principles for Improved Molecule Generation

Bridging Quantum and Classical Computing in Drug Design: Architektur-Prinzipien für verbesserte Molekül-Generierung

在药物设计中架桥量计算和古典计算:改进分子生成的建筑原则

2506.01177v2

1044

07-25

A Survey on State-of-the-art Deep Learning Applications and Challenges

Eine Umfrage zu aktuellen Anwendungen und Herausforderungen des Deep Learning

关于最先进的深深学习应用和挑战的调查

2403.17561v9

1045

07-25

Why Isn’t Relational Learning Taking Over the World?

Warum übernimmt das relationale Lernen nicht die Welt?

为什么关系学习不超越世界?

2507.13558v2

1046

07-25

VIBE: Video-Input Brain Encoder for fMRI Response Modeling

VIBE: Video-Input Gehirnencoder für fMRI Response Modeling

VIBE: 用于FMRI反应建模的视频投入大脑编码器

2507.17958v2

1047

07-25

Value-Based Deep RL Scales Predictably

Wertbasierte Tiefen-RL-Skalen vorausschauend

基于价值的深 RL 尺度

2502.04327v2

1048

07-25

Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise

Individueller Intrinsischer Lohn im Mehr-Agenten-Verstärkungs-Lernen durch Einbeziehung allgemeiner menschlicher Expertise

通过纳入通用的人类专门知识,学习多机构加强学习中的个人内在奖赏

2507.18867v1

1049

07-25

Estimation of conditional average treatment effects on distributed confidential data

Schätzung der bedingten durchschnittlichen Behandlungseffekte auf verteilte vertrauliche Daten

对分发的机密数据进行有条件平均待遇影响的估计

2402.02672v5

1050

07-25

Early Mortality Prediction in ICU Patients with Hypertensive Kidney Disease Using Interpretable Machine Learning

Frühe Mortalitätsvorhersage bei Patienten mit hypertensiver Nierenerkrankung unter Verwendung eines interpretierbaren maschinellen Lernens

使用可解释机器学习方法对重症监护病房(ICU)高血压肾脏疾病患者进行早期死亡率预测

2507.18866v1

1051

07-25

PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning

PrismRAG: Steigerung der RAG-Faktizität mit Distraktorresilienz und geschichteter Vernunft

PrismRAG:提高RAG事实质量,使其具有抗力和策略性合理性

2507.18857v1

1052

07-25

Resonant-Tunnelling Diode Reservoir Computing System for Image Recognition

Resonant-Tunnelling Diode Reservoir Computing System für die Bilderkennung

图像识别共振二氧化二氮储量计算系统

2507.15158v2

1053

07-24 (4)

Optimizing Metachronal Paddling with Reinforcement Learning at Low Reynolds Number

Optimierung des Metachronal-Paddelns mit Verstärkungs-Lernen bei niedriger Reynolds-Zahl

优化低Reynolds 数字加固学习的比数倾斜

2507.18849v1

1054

07-24

Low-Rank Thinning

Low-Rank Thinning

低兰氏度

2502.12063v7

1055

07-24

Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

Perturbationseffiziente Zeroth-Order-Optimierung für hardwarefreundliches On-Device-Training

方便硬件的硬件设备培训优化

2504.20314v2

1056

07-24

PIPA: Preference Alignment as Prior-Informed Statistical Estimation

PIPA: Präferenz-Ausrichtung als vorherinformierte statistische Schätzung

PIPA: 优先一致,作为先前不完善的统计估计

2502.05773v2

1057

07-24

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

R-Stitch: Dynamische Trajektorien-Stitching für effiziente Vernunft

R-Stitch: 高效理性的动态轨迹切换

2507.17307v2

1058

07-24

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

Auf dem Weg zur Verbesserung des Belohnungsdesigns in RL: Ein Reward Alignment Metric für RL-Praktizierende

努力改进RL的奖励设计:为RL开业医生的奖励调整计量

2503.05996v2

1059

07-24

RedactOR: An LLM-Powered Framework for Automatic Clinical Data De-Identification

RedactOR: Ein LLM-Powered Framework für die automatische De-Identifikation klinischer Daten

重编:一个LLM授权的自动临床数据识别框架

2505.18380v2

1060

07-24

Toward Super Agent System with Hybrid AI Routers

Auf dem Weg zum Super Agent System mit Hybrid-KI Routern

向超级代理系统过渡

2504.10519v2

1061

07-24

LeanKAN: A Parameter-Lean Kolmogorov-Arnold Network Layer with Improved Memory Efficiency and Convergence Behavior

LeanKAN: Eine Parameter-Lean Kolmogorov-Arnold Netzwerkschicht mit verbesserter Speichereffizienz und Konvergenzverhalten

LeanKAN: 提高记忆效率和一致行为达标的Lean Kolmogorov-Arnold网络层

2502.17844v2

1062

07-24

RealDeal: Enhancing Realism and Details in Brain Image Generation via Image-to-Image Diffusion Models

RealDeal: Realismus und Details in der Gehirnbildgenerierung durch Image-to-Image-Diffusionsmodelle verbessern

RealDeal:通过图像到图像传播模型,加强脑图像生成的现实和细节

2507.18830v1

1063

07-24

CueBuddy: helping non-native English speakers navigate English-centric STEM education

CueBuddy: Hilfe für nicht-native englische Referenten navigieren Englisch-centric STEM Bildung

CueBuddy:帮助非母语英语者掌握以英语为中心的STEM教育

2507.18827v1

1064

07-24

Scale-Consistent Learning for Partial Differential Equations

Scale-Consistent Learning für partielle Differentialgleichungen

部分差异等量的规模一致学习

2507.18813v1

1065

07-24

Even Faster Simulations with Flow Matching: A Study of Zero Degree Calorimeter Responses

Noch schnellere Simulationen mit Flow Matching: Eine Studie mit Null-Grad-Kalorimeter-Antworten

更快速的模拟流程匹配模拟:零度卡拉里米反应研究

2507.18811v1

1066

07-24

Curiosity Driven Exploration to Optimize Structure-Property Learning in Microscopy

Neugier trieb die Exploration an, um Struktur-Eigentums-Lernen in der Mikroskopie zu optimieren

优化微观分析中结构-财产学习的探索

2504.20011v2

1067

07-24

MetaSel: A Test Selection Approach for Fine-tuned DNN Models

MetaSel: Ein Testauswahlverfahren für fein abgestimmte DNN-Modelle

MetaSel: 微调 DNN 模型的测试选择方法

2503.17534v3

1068

07-24

Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Feature Flow analysieren, um Interpretation und Steuerung in Sprachmodellen zu verbessern

分析地貌流动,以加强语言模型的口译和指导

2502.03032v3

1069

07-24

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Künstliche Intelligenz für die Wissenschaft in Quanten-, Atom- und Kontinuumsystemen

量子、原子学和连续系统科学人造情报

2307.08423v6

1070

07-24

Test-time Offline Reinforcement Learning on Goal-related Experience

Test-time Offline-Verstärkung Lernen über zielbezogene Erfahrungen

关于目标相关经验的脱线强化学习

2507.18809v1

1071

07-24

Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Selbstüberwachte Rahmenbedingungen für die Lautsprecherverifizierung durch Bootstrapped Positive Sampling

通过推动积极抽样,自我监督的演讲人核查框架

2501.17772v4

1072

07-24

Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator

Fischer kostenlos? Annäherung der Fisher Information Matrix durch Recycling der quadratischen Gradienten Akkumulator

通过回收平梯级积聚器来接近渔业信息矩阵

2507.18807v1

1073

07-24

Ralts: Robust Aggregation for Enhancing Graph Neural Network Resilience on Bit-flip Errors

Ralts: Robuste Aggregation zur Verbesserung der Graphen-Neural-Netzwerk-Resilienz bei Bit-Flip-Fehlern

Ralts:加强图形神经网络抗力的强力聚合,以应对位翻转错误

2507.18804v1

1074

07-24

Central limit theorems for the eigenvalues of graph Laplacians on data clouds

Zentralgrenzensätze für die Eigenwerte von Graphen Laplacians auf Datenwolken

数据云中拉平板图 Laplacians 的天值中央限制定理

2507.18803v1

1075

07-24

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Plan für Geschwindigkeit: Erweitertes Scheduling für maskierte Diffusions-Sprachmodelle

速度计划: 遮蔽传播语言模型的饱和日程安排

2506.19037v3

1076

07-24

2048: Reinforcement Learning in a Delayed Reward Environment

2048: Verstärktes Lernen in einer verzögerten Belohnungsumgebung

2048年:在延迟奖励环境中加强学习

2507.05465v2

1077

07-24

Semantic IDs for Music Recommendation

Semantische IDs für Musikempfehlung

用于音乐推荐的语义代号

2507.18800v1

1078

07-24

CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization

CLEAR: Unlearning Spurious Style-Content Assoziationen mit Kontrastivem Lernen mit anti-kontrastiver Regularisierung

CLEAR: 学习与反竞争正规化相悖的利于竞争的利于竞争的纯净风格-知识型协会

2507.18794v1

1079

07-24

Tell Me What You See: An Iterative Deep Learning Framework for Image Captioning

Erzählen Sie mir, was Sie sehen: Ein iteratives Deep Learning Framework für Bildunterschriften

告诉我你看到的是什么:一个用于图像描述的循环深学习框架

2507.18788v1

1080

07-24

Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification

Vergleichende Analyse von Vision Transformern und konvolutionären Neuralnetzwerken für medizinische Bildklassifikation

关于医学图像分类的愿景变异器和革命神经网络的比较分析

2507.21156v1

1081

07-24

Discovering the dynamics of \emph{Sargassum} rafts’ centers of mass

Die Dynamik von \emph{Sargassum} Floßzentren entdecken

发现木筏质量中心的动态

2507.18771v1

1082

07-24

ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

ylmmcl bei Mehrsprachiger Textentgiftung 2025: Lexikon-geführte Entgiftung und Klassifikator-gestrichenes Umschreiben

ylmmcl 参加 2025 年多语言文本解毒:词典引导的解毒和分类器门控改写

2507.18769v1

1083

07-24

SPADE-S: A Sparsity-Robust Foundational Forecaster

SPADE-S: Ein Sparsity-Robust Foundational Forecaster

SPADE-S: 纯度-罗布斯基础预测器

2507.21155v1

1084

07-24

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation

Exploitation Over Exploration: Entlarvung der Bias in Linear Bandit Recommender Offline-Evaluation

开采过度勘探:在线性强盗建议者离岸评估中揭开比亚斯

2507.18756v1

1085

07-24

Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition

Lärm Kontrastive Schätzung-basiertes Matching Framework für die Erkennung von Low-Resource-Sicherheitsangriffen

低资源安保攻击模式识别比对框架

2401.10337v4

1086

07-24

Time-resolved dynamic CBCT reconstruction using prior-model-free spatiotemporal Gaussian representation (PMF-STGR)

Zeitaufgelöste dynamische CBCT-Rekonstruktion unter Verwendung einer modellfreien raumzeitlichen Gauß-Darstellung (PMF-STGR)

利用以前不设模型的时空代表性(PMF-STGR),解决时间问题,重建CBCT

2503.22139v2

1087

07-24

Learned Single-Pixel Fluorescence Microscopy

Gelernte Einzel-Pixel Fluoreszenzmikroskopie

获得单像素荧光显微镜

2507.18740v1

1088

07-24

An Explainable Equity-Aware P2P Energy Trading Framework for Socio-Economically Diverse Microgrid

Ein erklärbares Equity-Aware P2P Energy Trading Framework für sozio-ökonomische Diverse Microgrid

社会经济多样化微电网可解释的公平-公见P2P能源贸易框架

2507.18738v1

1089

07-24

Less is More: Adaptive Coverage for Synthetic Training Data

Weniger ist mehr: Adaptive Abdeckung für Synthetische Trainingsdaten

较少为: 合成培训数据的适应性覆盖

2504.14508v2

1090

07-24

Bootstrapped Reward Shaping

Bootstrapped Reward Shaping

增强的奖励形状

2501.00989v2

1091

07-24

Multi-Year Maintenance Planning for Large-Scale Infrastructure Systems: A Novel Network Deep Q-Learning Approach

Mehrjährige Wartungsplanung für großräumige Infrastruktursysteme: Ein neuartiges Netzwerk Deep Q-Learning-Ansatz

大型基础设施体系多年期维持规划:新网络深学习方法

2507.18732v1

1092

07-24

Exploration Behavior of Untrained Policies

Explorationsverhalten ungeübter Politiken

未经过培训的政策的探索行为

2506.22566v3

1093

07-24

An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning

Effizientes Sparse-Fine-Tuning mit geringem Quantisierungsfehler über Neural Network Pruning

通过神经网络节制低量错误的高效粗简精细调整

2502.11439v2

1094

07-24

The Right to be Forgotten in Pruning: Unveil Machine Unlearning on Sparse Models

Das Recht vergessen zu werden: Enthüllen Sie Maschine Entlernen von Sparse-Modellen

普鲁宁被遗忘的权利:破碎型号的unveil 机器退出学习

2507.18725v1

1095

07-24

SCORE-SET: A dataset of GuitarPro files for Music Phrase Generation and Sequence Learning

SCORE-SET: Ein Datensatz von GuitarPro-Dateien für Musik Phrase Generation und Sequence Learning

SCORE-SET: 用于音乐词组生成和序列学习的吉他Pro文件数据集

2507.18723v1

1096

07-24

Fixed-Point RNNs: Interpolating from Diagonal to Dense

Fixed-Point RNNs: Interpolieren von Diagonal nach Dense

固定点区域NN:从对角线到对角线的内插

2503.10799v2

1097

07-24

Adaptive Neural Quantum States: A Recurrent Neural Network Perspective

Adaptive Neurale Quantenzustände: Eine wiederkehrende Neurale Netzwerkperspektive

适应性神经量子州:经常性神经网络视角

2507.18700v1

1098

07-24

Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

Pseudo-Labeling für Kernel Ridge Regression unter Kovariate Shift

共变移下内核循环脊回归的优多环流

2302.10160v4

1099

07-24

SIDA: Synthetic Image Driven Zero-shot Domain Adaptation

SIDA: Synthetisches Bild angetrieben Null-Schuss Domain-Anpassung

SIDA: 合成图像驱动器零弹射域适应

2507.18632v1

1100

07-24

Gait Recognition Based on Tiny ML and IMU Sensors

Gait-Erkennung basierend auf winzigen ML- und IMU-Sensoren

基于小ML和IMU传感器的Gait识别

2507.18627v1

1101

07-24

TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards

TRPrompt: Bootstrapping Query-Aware Prompt Optimierung von Textbelohnungen

TRPrompt: 从文本奖励中促进解答询问软件快速优化

2507.18618v1

1102

07-24

SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning

SynC: Synthetische Bildunterschrift Datensatzverfeinerung mit ein-zu-vielen Mapping für Zero-shot Bildunterschrift

合成图像说明: 合成图像说明数据集精化,用一到多个绘图进行零光图像说明的合成图像说明

2507.18616v1

1103

07-24

BEARCUBS: A benchmark for computer-using web agents

BEARCUBS: Benchmark für computergestützte Web-Agenten

BEARCUBS:计算机使用网络代理器的基准

2503.07919v3

1104

07-24

Interact2Vec – An efficient neural network-based model for simultaneously learning users and items embeddings in recommender systems

Interact2Vec – Ein effizientes neuronales Netzwerk-basiertes Modell zum gleichzeitigen Lernen von Benutzern und Elementen in Empfehlungssysteme

Interact2Vec – – 一个有效的神经网络模式,用于同时学习用户和项目嵌入建议系统

2506.22648v3

1105

07-24

Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents

Erklärbarer Mapper: LLM-Embedding-Räume mit Perturbation-basierten Erklärungs- und Verifikations-Agenten kartographieren

可解释的成像仪:利用以扰动为基础的解释和核查仪器绘制LLM内嵌空间图

2507.18607v1

1106

07-24

Hybrid quantum-classical algorithm for near-optimal planning in POMDPs

Hybrider quantenklassischer Algorithmus zur nahezu optimalen Planung in POMDPs

POMDPs中接近最佳规划的混合量子-古典量子算法

2507.18606v1

1107

07-24

Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures

Beyond Euklid: Ein illustrierter Leitfaden zum modernen maschinellen Lernen mit geometrischen, topologischen und algebraischen Strukturen

欧几里特以外:带有几何、地形学和代数结构的现代机器学习设计指南

2407.09468v2

1108

07-24

Demystify Protein Generation with Hierarchical Conditional Diffusion Models

Entmystifizieren Protein-Generation mit Hierarchische Bedingte Diffusion Modelle

使用等级级有条件扩散模型解密蛋白一代

2507.18603v1

1109

07-24

Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs

Sparse Logit Sampling: Beschleunigung der Wissensdestillation in LLMs

粗略的登录抽样:加速在LLMs中进行知识蒸馏

2503.16870v2

1110

07-24

Linear Memory SE(2) Invariant Attention

Linearer Speicher SE(2) Invariante Aufmerksamkeit

线性内存 SE(2) 惯性注意

2507.18597v1

1111

07-24

Private Counterfactual Retrieval

Private kontraaktische Retrieval

私人反事实检索

2410.13812v2

1112

07-24

DRWKV: Focusing on Object Edges for Low-Light Image Enhancement

DRWKV: Fokussierung auf Objektkanten für Low-Light Image Enhancement

DRWKV: 关注低光图像增强对象边缘

2507.18594v1

1113

07-24

On the Convergence of Gradient Descent on Learning Transformers with Residual Connections

Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen

关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同

2506.05249v3

1114

07-24

Beyond Internal Data: Constructing Complete Datasets for Fairness Testing

Jenseits interner Daten: Konstruieren vollständiger Datensätze für Fairness-Tests

超越内部数据:为公平测试建立完整的数据集

2507.18561v1

1115

07-24

Concept Probing: Where to Find Human-Defined Concepts (Extended Version)

Konzept-Probing: Wo man menschendefinierte Konzepte findet (erweiterte Version)

概念检验:如何找到人类定义的概念(扩展版本)

2507.18681v1

1116

07-24

Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards

Omni-Thinker: Skalierung der Cross-Domain-Verallgemeinerung in LLMs über Multi-Task RL mit Hybrid Rewards

Omni-Thinker:通过多任务RL与混合奖励在LLM中扩大跨域通用化

2507.14783v2

1117

07-24

LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important

LagKV: Lag-Relative Information des KV-Cache erzählt, welche Token wichtig sind

LagKV: KV 缓存的滞后相对信息表明哪些 Token 重要

2504.04704v2

1118

07-24

The Geometry of LLM Quantization: GPTQ as Babai’s Nearest Plane Algorithm

Die Geometrie der LLM-Quantisierung: GPTQ als Babai’s nächste Flugzeugalgorithmus

LLM 定量法的几何测量:GPTQ作为Babai最接近的平地

2507.18553v1

1119

07-24

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Zeroth-Order Feinsteuerung von LLMs in Random Subspaces

随机子空间中LLMs的零级微调微调

2410.08989v3

1120

07-24

On the Performance of Concept Probing: The Influence of the Data (Extended Version)

Zur Performance von Konzept-Probing: Der Einfluss der Daten (Erweiterte Version)

关于 “ 概念检验:数据的影响 “ 的绩效(扩展版)

2507.18550v1

1121

07-24

Market Making Strategies with Reinforcement Learning

Strategien für die Marktentwicklung mit dem Ausbau des Lernens

具有强化学习的市场战略

2507.18680v1

1122

07-24

The Price equation reveals a universal force-metric-bias law of algorithmic learning and natural selection

Die Preisgleichung zeigt ein universelles Gesetz des algorithmischen Lernens und der natürlichen Selektion.

价格方程式揭示了一种通用的算法学习法和自然选择法

2507.18549v1

1123

07-24

Learning Gentle Grasping Using Vision, Sound, and Touch

Sanftes Greifen lernen mit Vision, Sound und Touch

利用愿景、声音和触摸进行轻巧的学习

2503.07926v2

1124

07-24

Deep Variational Free Energy Calculation of Hydrogen Hugoniot

Tiefe Variationsfreie Energieberechnung von Wasserstoff Hugoniot

雨原氢能深变化式自由能源计算

2507.18540v1

1125

07-24

External Knowledge Injection for CLIP-Based Class-Incremental Learning

Externe Wissensinjektion für CLIP-basiertes Klassen-Inkrementelles Lernen

为基于CLIP的高级类强化学习提供外部知识注射

2503.08510v2

1126

07-24

Elucidating the Design Space of Arbitrary-Noise-Based Diffusion Models

Erklärung des Design-Raums für willkürlich-lärmbasierte Diffusionsmodelle

说明以任意噪音为基础的传播模型的设计空间

2507.18534v1

1127

07-24

C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation

C2G-KD: PCA-Constrained Generator für datenfreie Wissensdestillation

C2G-KD: 用于无数据知识蒸馏的 PCA 约束生成器

2507.18533v1

1128

07-24

Diffuse and Disperse: Image Generation with Representation Regularization

Diffuse und Disperse: Bildgenerierung mit Repräsentationsregularisierung

Diffuse and Disperse: 形象生成,有代表性的规范化

2506.09027v2

1129

07-24

Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench

Sind KI-erzeugte Fixes sicher? LLM und Agent Patches auf der SWE-Bench analysieren

AI - 具有安全性吗? 分析SWE-bench 上的LLM 和代理补丁

2507.02976v2

1130

07-24

The Moral Gap of Large Language Models

Die moralische Kluft großer Sprachmodelle

大语言模式的道德差距

2507.18523v1

1131

07-24

Optimal Transport Regularized Divergences: Application to Adversarial Robustness

Optimaler Transport Regularisierte Divergenzen: Anwendung auf widrige Robustheit

优化运输常规化差异:适用于逆向强力

2309.03791v3

1132

07-24

GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks

GCC-Spam: Spam-Erkennung über GAN, Kontrastives Lernen und Charaktergleichheitsnetzwerke

GCC-Spam:通过 GAN、对比学习和字符相似性网络检测垃圾邮件

2507.14679v2

1133

07-24

Robust sensitivity control in digital pathology via tile score distribution matching

Robuste Sensitivitätskontrolle in der digitalen Pathologie über Kacheln-Score-Verteilungsabgleich

通过瓷砖计分分布匹配对数字病理学中的强力敏感度控制

2502.20144v3

1134

07-24

GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning

GLANCE: Graph Logic Attention Network mit Cluster Enhancement für heterophiles Graph Representation Learning

图表逻辑关注网络,通过群集增强混合图示代表性学习

2507.18521v1

1135

07-24

Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise

Euklidische Distanz Deflation unter hochdimensionalen heteroskedastischen Geräuschen

高多变性热电传噪声下的远距离通缩

2507.18520v1

1136

07-24

Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning

Revisiting Bisimulation Metric für robuste Darstellungen in Verstärkungs-Lernen

重新研究强化学习中强力代表制的模拟比照模型

2507.18519v1

1137

07-24

Visual Adaptive Prompting for Compositional Zero-Shot Learning

Visuelle Adaptive Prompting für kompositorisches Zero-Shot-Lernen

零热学习的视觉适应性促进

2502.20292v6

1138

07-24

A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area

Eine Transfer-Lernmethode für die Segmentierung von Wasserkörpern in Fernerkundungsbildern: Eine Fallstudie des Zhada-Tulin-Gebiets

遥感图像中水体分离的转让学习方法:Zhada Tulin地区的案例研究

2507.10084v2

1139

07-24

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Sublinearer Bedauern für eine Klasse von linear-Quadratischen Lernproblemen

连续时线性强化学习问题分类的子线性遗憾

2407.17226v6

1140

07-24

Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment

Multi-Preference Lambda-bewertet Listwise DPO für kleine Modellausrichtung

用于小规模模型调整的多参数 Lambda加权列表DPO

2506.19780v5

1141

07-24

DualXDA: Towards Sparse, Efficient and Explainable Data Attribution in Large AI Models

DualXDA: Auf dem Weg zu sparsamen, effizienten und erklärbaren Datenzuweisungen in großen KI-Modellen

DualXDA:在大型AI模型中实现稀疏、高效和可解释的数据归因

2402.12118v2

1142

07-24

Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models

Nicht alle Merkmale verdienen Aufmerksamkeit: Graphengeführtes Abhängigkeitslernen für tabellarische Datengenerierung mit Sprachmodellen

并非所有值得注意的地物:用语言模型编制图表数据时的图表指导依赖性学习

2507.18504v1

1143

07-24

PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

PLOT-TAL: Prompt-Lernen mit optimalem Transport für Few-Shot-Lokalisierung temporaler Aktionen

PLOT-TAL: 以最优化交通方式迅速学习,促进少数时空行动地方化

2403.18915v2

1144

07-24

EarthLink: A Self-Evolving AI Agent for Climate Science

EarthLink: Ein sich selbst entwickelnder KI-Agent für Klimawissenschaften

EarthLink:一个自我发展的AI气候科学代理机构

2507.17311v2

1145

07-24

Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

Unüberwachtes Konzept Drift Erkennung von Deep-Learning-Darstellungen in Echtzeit

从深度学习表征中实时进行无监督概念漂移检测

2406.17813v2

1146

07-24

Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks

Getreue, interpretierbare Thorax-Röntgendiagnose mit Anti-Aliased-B-cos-Netzwerken

基于抗混叠B-cos网络的可信、可解释胸部X光诊断

2507.16761v2

1147

07-24

DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts

DriftMoE: Eine Mischung aus Experten Ansatz zum Umgang mit Konzept Drifts

DriftMoE:处理 “ 漂流概念 “ 的混合专家办法

2507.18464v1

1148

07-24

Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language

Wiederherstellung des Rhythmus: Wiederherstellung der Zeichensetzung mit Transformer-Modellen für Bangla, eine ressourcenarme Sprache

恢复节奏:使用Transformer模型为低资源语言孟加拉语恢复标点

2507.18448v1

1149

07-24

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Ergebnisbasiertes Online-Verstärkungslernen: Algorithmen und grundlegende Grenzen

基于成果的在线强化学习:等级和基本限制

2505.20268v2

1150

07-24

IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation

IPCGRL: Sprachgestütztes Verstärkungslernen für die verfahrenstechnische Level-Generierung

IPCGRL: 面向程序化关卡生成的语言指令强化学习

2503.12358v4

1151

07-24

NLML-HPE: Head Pose Estimation with Limited Data via Manifold Learning

NLML-HPE: Kopfposenschätzung mit begrenzten Daten über Manifold Learning

NLML-HPE:通过流形学习利用有限数据进行头部姿态估计

2507.18429v1

1152

07-24

How do language models learn facts? Dynamics, curricula and hallucinations

Wie lernen Sprachmodelle Fakten? Dynamik, Lehrpläne und Halluzinationen

语言模式如何了解事实?动态、课程和幻觉

2503.21676v2

1153

07-24

Multi-Model Ensemble and Reservoir Computing for River Discharge Prediction in Ungauged Basins

Multi-Model-Ensemble und Reservoir Computing für Flussentladungsvorhersage in ungespurten Becken

多模型组合和储量计算,用于未排出盆地的河流排泄预测

2507.18423v1

1154

07-24

Residual Prior-driven Frequency-aware Network for Image Fusion

Residual Prior-driven Frequency-aware Netzwerk für Bild-Fusion

图像融合超前驱动频率感知网络

2507.06735v2

1155

07-24

FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs

FinDPO: Finanz-Sentiment-Analyse für algorithmischen Handel durch Preference-Optimierung von LLMs

FinDPO:通过优惠优化LLMs,分析通过高利贷交易的金融敏感度

2507.18417v1

1156

07-24

Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows

Iwin Transformer: Hierarchische Vision Transformer mit Interleaved Windows

Iwin 变换器: 使用内部视窗的等级愿景变换器

2507.18405v1

1157

07-24

CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

CLEAR: Fehleranalyse über LLM-as-a-Judge leicht gemacht

CLEAR:通过LLM-as-a-Judge轻松进行错误分析

2507.18392v1

1158

07-24

A comparison of stretched-grid and limited-area modelling for data-driven regional weather forecasting

Ein Vergleich der Modelle für datengesteuerte regionale Wettervorhersagen mit ausgedehntem Grid und begrenzten Flächen

数据驱动区域天气预报中拉伸网格与有限区域建模的比较

2507.18378v1

1159

07-24

On Reconstructing Training Data From Bayesian Posteriors and Trained Models

Über die Wiederherstellung von Trainingsdaten aus Bayesischen Nachbildungen und ausgebildeten Modellen

Bayesian Posides和经过培训的模型的培训数据重建

2507.18372v1

1160

07-24

Efficient Uncertainty in LLMs through Evidential Knowledge Distillation

Effiziente Unsicherheit in LLMs durch Evidential Knowledge Destillation

通过证据知识蒸馏在LLMs中提高效能的不确定性

2507.18366v1

1161

07-24

Leveraging the Structure of Medical Data for Improved Representation Learning

Nutzung der Struktur medizinischer Daten für ein verbessertes Repräsentationslernen

利用医疗数据结构改进代表性学习

2507.02987v3

1162

07-24

Latent Space Alignment for AI-Native MIMO Semantic Communications

Latent Space Alignment für KI-Native MIMO Semantische Kommunikation

用于AI-Native MIMO语义通信的远程空间对齐

2507.16680v2

1163

07-24

Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation

Tiny ist nicht klein genug: Hochwertige, ressourcenarme Gesichtsanimationsmodelle durch Hybrid-Wissensdestillation

微小不够小:通过混合知识蒸馏,建立高质量、资源低的面部动画模型。

2507.18352v1

1164

07-24

Low-rank adaptive physics-informed HyperDeepONets for solving differential equations

Low-rank adaptive Physik-informiert HyperDeepONets zur Lösung von Differentialgleichungen

用于解决差别方程的低级别适应性物理知情高超深电联

2507.18346v1

1165

07-24

Remembering the Markov Property in Cooperative MARL

Erinnerung an die Markov-Eigenschaft im kooperativen MARL

记住合作式MARL中的马尔可夫性质

2507.18333v1

1166

07-24

Hierarchical Dimensionless Learning (Hi-π): A physics-data hybrid-driven approach for discovering dimensionless parameter combinations

Hierarchisches dimensionsloses Lernen (Hi-π): Ein physik-data-hybridgetriebener Ansatz zur Entdeckung dimensionsloser Parameterkombinationen

高层次无尺寸学习(Hi-):物理学-数据混合驱动的发现无尺寸参数组合的物理-数据混合法

2507.18332v1

1167

07-24

Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

Position: Eine empirisch begründete Identifizierbarkeitstheorie beschleunigt die selbstüberwachte Lernforschung

职位: 以活性基础的可识别性理论将加速自我监督学习研究

2504.13101v3

1168

07-24

A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation

Ein Multi-Dataset-Benchmark für semi-überwachte semantische Segmentierung in EKG-Delineation

ECG 划定中半超部分解的多数据集基准

2507.18323v1

1169

07-24

I-CEE: Tailoring Explanations of Image Classification Models to User Expertise

I-CEE: Maßgeschneiderte Erläuterungen von Bildklassifikationsmodellen zur Benutzerexpertise

I-CEE:根据用户专门知识对图像分类模型的定制解释

2312.12102v3

1170

07-24

State of Health Estimation of Batteries Using a Time-Informed Dynamic Sequence-Inverted Transformer

Zustand der Gesundheit Schätzung von Batterien mit einem zeitinformierten dynamischen Sequenz-invertierten Transformer

使用时间化动态序列反向转换器对电池进行健康状况估计

2507.18320v1

1171

07-24

Regression-aware Continual Learning for Android Malware Detection

Regressions-aware Continual Learning für Android Malware-Erkennung

面向Android恶意软件检测的回归感知持续学习

2507.18313v1

1172

07-24

GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction

GNN-ACLP:基于图神经网络的模拟电路链接预测

2504.10240v4

1173

07-24

Variational inference for pile-up removal at hadron colliders with diffusion models

Variationsableitung zur Stapelabfuhr an Hadron-Kollidern mit Diffusionsmodellen

与扩散模型相撞的hadron相撞器的堆叠式清除的变异推论

2410.22074v2

1174

07-24

PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

PRIX: Planen lernen von rohen Pixeln für autonomes Fahren Ende-zu-Ende

PRIX: 学习从Raw像素到计划用于终端到终端自治驾驶

2507.17596v2

1175

07-24

Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation

Selbstüberwachte Verzahnung des unstrukturierten Gitters mit automatischer Differenzierung

带有自动差异的无结构网格自操作粗化

2507.18297v1

1176

07-24

Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring

Leveraging Data Augmentation und Siamese Learning für vorausschauende Prozessüberwachung

利用数据增强和西亚学习来监测预测过程

2507.18293v1

1177

07-24

Learning Concepts Definable in First-Order Logic with Counting

Lernkonzepte im Logic erster Ordnung mit Zählen definierbar

一阶逻辑中与计数相容的学习概念

1909.03820v5

1178

07-24

Alternative Loss Function in Evaluation of Transformer Models

Alternative Verlustfunktion bei der Bewertung von Transformer-Modellen

变换模型评价中的替代损失功能

2507.16548v2

1179

07-24

SyncMapV2: Robust and Adaptive Unsupervised Segmentation

SyncMapV2: Robuste und adaptive unüberwachte Segmentierung

同步马普V2: 强力和适应性不受监督的分割

2506.16297v3

1180

07-24

Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning

Multimodale Verhaltensmusteranalyse mit Eye-Tracking und LLM-basierter Vernunft

以眼跟踪和基于LLM的理由进行多模式行为模式分析

2507.18252v1

1181

07-24

Latent Representations of Intracardiac Electrograms for Atrial Fibrillation Driver Detection

Latente Darstellungen von intrakardialen Elektrogrammen für Vorhofflimmern-Treibererkennung

用于实验性纤维纤维化驱动检测的心内热解电图

2507.19547v1

1182

07-24

Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods

Boosting neu betrachtet: Benchmarking und Weiterentwicklung LP-basierter Ensemble-Methoden

重新审视促进:基准制定和推进基于LP的组合组合方法

2507.18242v1

1183

07-24

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Robustes Multi-View-Lernen durch Darstellung Fusion von Sample-Level-Achtung und Ausrichtung der simulierten Perturbation

通过展示抽样关注层的聚合和模拟扰动的调整,通过代表方式进行强有力的多视角学习

2503.04151v2

1184

07-24

Compositional Coordination for Multi-Robot Teams with Large Language Models

Kompositionskoordination für Multi-Roboter-Teams mit großen Sprachmodellen

具有大语言模式的多机器人小组的组成协调

2507.16068v2

1185

07-24

Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation

Warum wirken sich klassenabhängige Auswertungseffekte mit Zeitreihen-Feature-Attributionen aus? Eine synthetische Datenuntersuchung

为何类依赖评价效果与时间序列特征属性是否相符? 合成数据调查

2506.11790v2

1186

07-24

Sparse identification of nonlinear dynamics with library optimization mechanism: Recursive long-term prediction perspective

Sparse Identifikation von nichtlinearen Dynamiken mit Bibliotheksoptimierungsmechanismus: Rekursive langfristige Vorhersageperspektive

基于库优化机制的非线性动力学稀疏辨识:递归长期预测视角

2507.18220v1

1187

07-24

FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting

FedSA-GCL: Ein semi-asynchrones Federated Graph Learning Framework mit personalisierter Aggregation und Cluster-Aware Broadcasting

FedSA-GCL:半同步的联邦联邦图表学习框架,配有个性化聚合和集束软件广播

2507.18219v1

1188

07-24

The Role of the Time-Dependent Hessian in High-Dimensional Optimization

Die Rolle der zeitabhängigen Hesse-Matrix bei der hochdimensionalen Optimierung

时间依赖Hessian矩阵在高维优化中的作用

2403.02418v3

1189

07-24

Goal-based Trajectory Prediction for improved Cross-Dataset Generalization

Zielbasierte Trajektorie-Vorhersage für verbesserte Cross-Dataset-Verallgemeinerung

改进交叉数据通用化的基于目标的轨迹预测

2507.18196v1

1190

07-24

Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning

Jenseits der Low-Rank-Dekomposition: Ein Shortcut-Ansatz für effizientes On-Device-Lernen

超越低级别分解:高效在线学习的捷径方法

2505.05086v2

1191

07-24

A general language model for peptide identification

Ein allgemeines Sprachmodell für die Peptididentifikation

铅化物识别通用语言模式

2502.15610v4

1192

07-24

Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling

Innovator: Wissenschaftliche Weiterbildung mit feinkörnigem MoE Upcycling

创新者:科学继续预科培训,采用精美的机动车骑车

2507.18671v1

1193

07-24

ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory

ChronoSelect: Robustes Lernen mit verrauschten Labels über Dynamics Temporal Memory

ChronoSelect: 通过动态时间记忆在噪声标签下进行稳健学习

2507.18183v1

1194

07-24

Statistical Runtime Verification for LLMs via Robustness Estimation

Statistische Laufzeitprüfung für LLMs mittels Robustheitsschätzung

通过强力估计法对LLMs进行统计运行时间校验

2504.17723v2

1195

07-24

SDSC: A Structure-Aware Metric for Semantic Signal Representation Learning

SDSC: Eine strukturbewusste Metrik für das Lernen semantischer Signalrepräsentationen

SDSC:用于语义信号代言学习的结构-孔径计量仪

2507.14516v2

1196

07-24

GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar

GeoAvatar: Adaptive geometrische Gaussian Splatting für 3D-Kopf Avatar

GeoAvatar: 3D Avatar 头的适应性几何高山喷涂

2507.18155v1

1197

07-24

Robust Non-adaptive Group Testing under Errors in Group Membership Specifications

Robuste, nicht adaptive Gruppenprüfung unter Fehlern in den Gruppenmitgliedschaftsspezifikationen

根据集团成员类别规格错误进行强力非适应性小组测试

2409.05345v2

1198

07-24

Neuromorphic Computing for Embodied Intelligence in Autonomous Systems: Current Trends, Challenges, and Future Directions

Neuromorphes Computing für körpereigene Intelligenz in autonomen Systemen: Aktuelle Trends, Herausforderungen und Zukunftsrichtungen

自治区内渗透情报的神经元化计算:当前趋势、挑战和未来方向

2507.18139v1

1199

07-24

DAA*: Deep Angular A Star for Image-based Path Planning

DAA*: Deep Angular A Star für bildbasierte Pfadplanung

DAA*:基于图像的路径规划深角A星

2507.09305v3

1200

07-24

TOC-UCO: a comprehensive repository of tabular ordinal classification datasets

TOC-UCO: ein umfassendes Repository von tabellarischen Klassifikationsdatensätzen

TOC-UCO:表格格式分类数据集综合储存库

2507.17348v2

1201

07-24

Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning

Maximierung von Prefix-Konfidenz bei Test-Time verbessert mathematische Reasoning effizient

使试验时间有效改进数学理由的预设信息最大化

2507.18122v1

1202

07-24

Efficient Knowledge Tracing Leveraging Higher-Order Information in Integrated Graphs

Effizientes Knowledge Tracing Leveraging Higher-Order Information in integrierten Graphen

在综合图表中利用高级命令信息

2507.18668v1

1203

07-24

VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration

VCDiag: Klassifizierende Erroneous-Wellenformen für Ausfall-Triage-Beschleunigung

VCDiag: 失灵千兆字节加速不规则波形分类

2506.03590v3

1204

07-24

Generalizing Adam to Manifolds for Efficiently Training Transformers

Verallgemeinern von Adam zu Manifolds für effizientes Training Transformers

将亚当推广为高效率培训变换器的处理器

2305.16901v4

1205

07-24

A Two-armed Bandit Framework for A/B Testing

Ein zweiarmiges Bandit-Framework für A/B-Tests

用于A/B测试的双臂老虎机框架

2507.18118v1

1206

07-24

The Impact of Pseudo-Science in Financial Loans Risk Prediction

Die Auswirkungen von Pseudo-Science auf die Risikovorhersage von Finanzkrediten

假科学对金融贷款风险预测的影响

2507.16182v2

1207

07-24

Agentic AI framework for End-to-End Medical Data Inference

Agentische KI-Framework für Ende-zu-Ende medizinische Datenableitung

面向端到端医疗数据推断的代理式AI框架

2507.18115v1

1208

07-24

Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification

Policy-Störung im Reinforcement Learning: Adversarialer Angriff mit großen Sprachmodellen und Identifikation kritischer Zustände

强化学习方面的政策混乱:以大语言模式和关键状态识别进行反向攻击

2507.18113v1

1209

07-24

Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN

Prozentual basierte Deep-Verstärkung-Lernen und Belohnung basierte Personalisierung für Delay Aware RAN Slicing in O-RAN

在O-RAN为延迟了解RAN切片而进行百分百分率深强化学习和奖励性个人化

2507.18111v1

1210

07-24

A New Pair of GloVes

Ein neues Paar GloVes

新的地球之对

2507.18103v1

1211

07-24

Comparison of Segmentation Methods in Remote Sensing for Land Use Land Cover

Vergleich der Segmentierungsmethoden bei der Fernerkundung für die Bodenbedeckung

土地利用与土地覆盖遥感分割方法比较

2507.18099v1

1212

07-24

Learning from Hard Labels with Additional Supervision on Non-Hard-Labeled Classes

Lernen von Hardlabels mit zusätzlicher Überwachung auf nicht-Hard-Label-Klassen

利用硬标签学习,并对非硬标签类别进行额外监督

2507.18098v1

1213

07-24

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

Lang-Short-Distanz Graph Neural Networks und verbessertes Curriculum-Lernen für Emotionserkennung im Gespräch

长短距离远距神经神经网络和改进课程学习,以在对话中认识情感

2507.15205v2

1214

07-24

LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs

LLM Web Dynamics: Aufspüren eines Modellkollapses in einem Netzwerk von LLMs

LLM 网络动态:追踪在LLM网络中的模型崩溃情况

2506.15690v3

1215

07-24

A Principled Approach for Data Bias Mitigation

Ein prinzipieller Ansatz für Daten-Bias-Minderung

减轻数据偏见的原则办法

2405.12312v4

1216

07-24

Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections

Compliant Residual DAgger: Verbesserung der Real-World Kontakt-Rich-Manipulation mit menschlichen Korrekturen

Compliant Residual DAgger:利用人工纠正改进真实世界的接触密集型操作

2506.16685v2

1217

07-24

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Feinangepasste Sprachmodelle erzeugen stabile anorganische Materialien als Text

微调语言模型以文本形式生成稳定的无机材料

2402.04379v2

1218

07-24

Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning

Komprimierte und verteilte Kleinste-Quadrate-Regression: Konvergenzraten mit Anwendungen für Federated Learning

压缩和分布的最小平方回归:与应用到联邦学习的趋同率

2308.01358v2

1219

07-24

History-Guided Video Diffusion

Geschichte-geführte Video-Diffusion

历史引导视频传播

2502.06764v2

1220

07-24

Squeeze10-LLM: Squeezing LLMs’ Weights by 10 Times via a Staged Mixed-Precision Quantization Method

Squeeze10-LLM: Gewichte der LLMs um 10 Mal durch eine stufenweise gemischte Präzisionsquantifizierung

Squeeze10-LLM:通过分阶段混合精度量化方法将LLM权重压缩10倍

2507.18073v1

1221

07-24

C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams

C-AAE: Komprimierend anonymisierende Autoencoder für Datenschutz-Erhaltung Aktivitätserkennung in Healthcare Sensor Streams

C-AAE: 压缩匿名自动编码器,以便在保健感应器流中确认隐私保护活动

2507.18072v1

1222

07-24

BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference

BlockDialekt: Blockweise feinkörnige Mischformat-Quantisierung für energieeffiziente LLM-Inferenz

BlockDialect: 面向节能LLM推理的分块细粒度混合格式量化

2501.01144v5

1223

07-24

Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents

Multiscale Neural PDE Surrogats für Vorhersage und Downscaling: Anwendung auf Meeresströmungen

预测和缩小预测和缩小尺度的多尺度多神经PDE代号:对洋流的应用

2507.18067v1

1224

07-24

Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature

Fixierung der Pitfalls der probabilistischen Zeitreihen-Prognosebewertung durch Kernel-Quadratur

由内核二次曲线确定概率时间- 系列预测评价的空隙

2503.06079v2

1225

07-24

Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias

Causally Testing Gender Bias in LLMs: Eine Fallstudie über berufsbezogene Bias

《LLMM中因果测试性别偏见:职业偏见案例研究》

2212.10678v4

1226

07-24

A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models

Ein Multi-Faceted-Evaluierungsrahmen für die Bewertung synthetischer Daten, erzeugt durch große Sprachmodelle

评估由大语言模型生成的合成数据多面评价框架

2404.14445v2

1227

07-24

Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs

Privacy-Preserving Synthetic Review Generation mit unterschiedlichen Schreibstilen mit LLMs

使用LLMMs以多种写作风格生成的隐私-保护合成审查

2507.18055v1

1228

07-24

Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems

Unisoma: Ein Unified Transformer-basierter Solver für Multi-Solid-Systeme

Unisoma:多层系统统一变压器解决方案

2506.06021v2

1229

07-24

ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

ViGText: Deepfake-Bilderkennung mit Vision-Language-Modellerklärungen und Graph-Neural-Netzwerken

ViGText: 用视觉语言模型解释和图形神经网络进行深假图像探测

2507.18031v1

1230

07-24

AI Workflow, External Validation, and Development in Eye Disease Diagnosis

KI-Workflow, externe Validierung und Entwicklung in der Augenerkrankungen-Diagnose

AI 工作流程、外部验证和眼病诊断的发展

2409.15087v2

1231

07-24

Does visualization help AI understand data?

Hilft die Visualisierung KI, Daten zu verstehen?

可视化能帮助AI理解数据吗?

2507.18022v1

1232

07-24

Zeroth-order log-concave sampling

logkonkav-Probenahme der Nullten Ordnung

零级对数集中取样

2507.18021v1

1233

07-24

On Leveraging Unlabeled Data for Concurrent Positive-Unlabeled Classification and Robust Generation

Über die Nutzung nicht markierter Daten für die gleichzeitige positive und nicht markierte Klassifizierung und robuste Generierung

利用未贴标签数据进行同时正-未贴标签分类和强力生成

2006.07841v3

1234

07-24

Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models

vorausschauende Skalierungsgesetze für eine effiziente GRPO-Schulung großer vernünftiger Modelle

面向大型推理模型高效GRPO训练的预测性缩放定律

2507.18014v1

1235

07-24

Deep Reinforcement Learning for Real-Time Green Energy Integration in Data Centers

Deep Reinforcement Learning für die Echtzeit-Integration grüner Energie in Rechenzentren

数据中心实时绿色能源整合深入强化学习

2507.21153v1

1236

07-24

Active Learning For Repairable Hardware Systems With Partial Coverage

Aktives Lernen für reparable Hardware-Systeme mit teilweiser Abdeckung

为部分覆盖的可修理硬件系统积极学习

2503.16315v3

1237

07-24

Deep Unfolding for MIMO Signal Detection

Tiefenentfaltung für MIMO-Signalerkennung

MIMO信号检测的深度展开

2507.21152v1

1238

07-24

Analyzing Islamophobic Discourse Using Semi-Coded Terms and LLMs

Analyse des Islamophoben Diskurses mit semi-kodierten Ausdrücken und LLMs

使用半编码术语和LLMs分析仇视伊斯兰者的情况

2503.18273v2

1239

07-24

Fine-Grained Uncertainty Quantification via Collisions

Feinkörnige Unsicherheit Quantifizierung über Kollisionen

通过碰撞进行精细的不确定性定量

2411.12127v4

Article 0

Title@2025-07-31 (4): SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions

Title: SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions

SUB: Benchmarking der CBM-Verallgemeinerung über Synthetische Attribute Substitutionen

基准化 CBM 通过合成属性替代实现普遍化 2507.23784v1

Authors (4): Jessica Bader, Leander Girrbach, Stephan Alaniz, Zeynep Akata

Concept Bottleneck Models (CBMs) and other concept-based interpretable models show great promise for making AI applications more transparent, which is essential in fields like medicine. Despite their success, we demonstrate that CBMs struggle to reliably identify the correct concepts under distribution shifts. To assess the robustness of CBMs to concept variations, we introduce SUB: a fine-grained image and concept benchmark containing 38,400 synthetic images based on the CUB dataset. To create SUB, we select a CUB subset of 33 bird classes and 45 concepts to generate images which substitute a specific concept, such as wing color or belly pattern. We introduce a novel Tied Diffusion Guidance (TDG) method to precisely control generated images, where noise sharing for two parallel denoising processes ensures that both the correct bird class and the correct attribute are generated. This novel benchmark enables rigorous evaluation of CBMs and similar interpretable models, contributing to the development of more robust methods. Our code is available at https://github.com/ExplainableML/sub and the dataset at http://huggingface.co/datasets/Jessica-bader/SUB.

概念瓶颈模型(CBM)和其他基于概念的可解释模型显示了使AI应用更加透明的巨大前景,这在医学等领域至关重要。尽管取得了成功,但我们证明,CBM难以在分布变化下可靠地识别正确的概念。为了评估CBM对概念变化的稳健性,我们引入了SUB:一个精细的图像和概念基准,其中包括基于CUB数据集的38,400个合成图像。为了创建SUB,我们选择了一个由33个鸟类类别和45个概念组成的CUB子集,以生成替换了翅膀颜色或腹部图案等特定概念的图像。我们引入了一种新的绑定扩散引导(Tied Diffusion Guidance,TDG)方法,以精确控制生成的图像,其中两个并行去噪过程之间的噪声共享确保同时生成正确的鸟类类别和正确的属性。这个新的基准能够对CBM和类似的可解释模型进行严格的评估,有助于开发更稳健的方法。我们的代码可在 https://github.com/ExplainableML/sub 上查阅,数据集可在 http://huggingface.co/datasets/Jessica-bader/SUB 上查阅。

Article 1

Title@2025-07-31 (4): XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding

Title: XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding

XSpecMesh: Qualitätsschonende Auto-Regressive Mesh-Generation Beschleunigung über Multi-Head Spekulative Decodierung

XSpecMesh:通过多格投机代号加速实现质量保护自动递减的机械生成 2507.23777v1

Authors (5): Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang

Current auto-regressive models can generate high-quality, topologically precise meshes; however, they necessitate thousands, or even tens of thousands, of next-token predictions during inference, resulting in substantial latency. We introduce XSpecMesh, a quality-preserving acceleration method for auto-regressive mesh generation models. XSpecMesh employs a lightweight, multi-head speculative decoding scheme to predict multiple tokens in parallel within a single forward pass, thereby accelerating inference. We further propose a verification and resampling strategy: the backbone model verifies each predicted token and resamples any tokens that do not meet the quality criteria. In addition, we propose a distillation strategy that trains the lightweight decoding heads by distilling from the backbone model, encouraging their prediction distributions to align and improving the success rate of speculative predictions. Extensive experiments demonstrate that our method achieves a 1.7x speedup without sacrificing generation quality. Our code will be released.

目前的自回归模型可以生成高质量、拓扑精确的网格;然而,它们在推理过程中需要数千甚至数万次下一词元预测,从而导致显著的延迟。我们引入了XSpecMesh,这是一种面向自回归网格生成模型的质量保持加速方法。XSpecMesh采用轻量级的多头投机解码方案,在一次前向传播中并行预测多个词元,从而加快推理速度。我们进一步提出了一种验证和重采样策略:主干模型验证每一个预测的词元,并对不符合质量标准的词元进行重采样。此外,我们提出了一种蒸馏策略,通过从主干模型中蒸馏来训练轻量级解码头,促使它们的预测分布保持一致,并提高投机预测的成功率。大量实验表明,我们的方法在不牺牲生成质量的情况下实现了1.7倍的加速。我们的代码将会发布。
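The verification and resampling loop described in the XSpecMesh abstract can be illustrated with a toy greedy speculative decoder. This is a minimal sketch, not the authors' implementation: `speculative_step`, `generate`, `toy_backbone`, and `toy_draft` are hypothetical names, the "quality criterion" is assumed to be exact agreement with the backbone's greedy choice, and the backbone is queried once per position for clarity, whereas a real system would verify all drafted tokens in a single forward pass.

```python
from typing import Callable, List, Sequence

Token = int


def speculative_step(
    prefix: Sequence[Token],
    draft: Callable[[Sequence[Token]], List[Token]],
    backbone: Callable[[Sequence[Token]], Token],
) -> List[Token]:
    """Accept drafted tokens while they match the backbone's greedy choice.

    Assumption: a drafted token passes verification only if it equals the
    token the backbone would pick. The first mismatch is replaced by the
    backbone's token and the remaining draft is discarded.
    """
    accepted: List[Token] = []
    for proposed in draft(prefix):
        target = backbone(list(prefix) + accepted)
        if proposed == target:
            accepted.append(proposed)
        else:
            accepted.append(target)  # resample from the backbone
            break
    if not accepted:
        # Empty draft: fall back to one ordinary backbone step.
        accepted.append(backbone(list(prefix)))
    return accepted


def generate(prefix, draft, backbone, max_new_tokens):
    """Repeat speculative steps until max_new_tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_step(out, draft, backbone))
    return out[: len(prefix) + max_new_tokens]


def toy_backbone(seq):
    """Hypothetical backbone: the next token is the last token plus one, mod 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq):
    """Hypothetical draft heads: two correct guesses, then a wrong one."""
    last = seq[-1]
    return [(last + 1) % 10, (last + 2) % 10, (last + 4) % 10]


tokens = generate([0], toy_draft, toy_backbone, max_new_tokens=7)
```

Because every accepted token equals the backbone's own greedy choice, the output of this sketch is identical to plain greedy decoding; the speedup in practice comes from accepting several tokens per backbone pass.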

Article 2

Title@2025-07-31 (4): SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model

Title: SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model

SimuRA: Auf dem Weg zu einem General Goal-Oriented Agent über Simulative Reasoning Architecture mit LLM-basiertem Weltmodell

SimuRA:通过使用以LLM为基础的世界模型的模拟合理理由结构,努力实现以一般目标为导向的代理 2507.23773v1

Authors (7): Mingkai Deng, Jinyu Hou, Yilin Shen, Hongxia Jin, Graham Neubig, Zhiting Hu, Eric Xing

AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. On the other hand, humans are general agents who reason by mentally simulating the outcomes of their actions and plans. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of an optimal agent in any environment, SimuRA overcomes the limitations of autoregressive reasoning by introducing a world model for planning via simulation. The generalized world model is implemented using an LLM, which can flexibly plan in a wide range of environments using the concept-rich latent space of natural language. Experiments on difficult web browsing tasks show that SimuRA improves the success rate of flight search from 0% to 32.2%. World-model-based planning, in particular, shows a consistent advantage of up to 124% over autoregressive planning, demonstrating the advantage of world model simulation as a reasoning paradigm. We are excited about the possibility of training a single, general agent model based on LLMs that can act superintelligently in all environments. To start, we make a web-browsing agent built on SimuRA with pretrained LLMs available as a research demo for public testing.

基于大型语言模型(LLM)的AI代理前景广阔,但目前的做法侧重于一任务一代理的方法,这不仅缺乏可扩展性和通用性,而且受到自回归LLM的根本限制。另一方面,人类是通用代理,通过在头脑中模拟其行动和计划的结果来进行推理。为了迈向更通用、更强大的AI代理,我们引入了SimuRA,一个面向目标的通用代理推理架构。基于对任意环境中最优代理的原则性表述,SimuRA通过引入用于模拟规划的世界模型,克服了自回归推理的局限性。通用世界模型使用LLM实现,它可以利用自然语言中概念丰富的潜在空间,在广泛的环境中灵活规划。在困难的网页浏览任务上进行的实验表明,SimuRA将航班搜索的成功率从0%提高到32.2%。特别是,基于世界模型的规划相比自回归规划始终具有最高124%的优势,展示了世界模型模拟作为推理范式的优势。我们对训练一个基于LLM、能在所有环境中超智能地行动的单一通用代理模型的可能性感到兴奋。首先,我们将一个基于SimuRA并使用预训练LLM构建的网页浏览代理作为研究演示提供给公众测试。
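The idea of planning via simulation with a world model, as opposed to committing to one action at a time, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not SimuRA itself: the LLM-based world model is replaced by a plain Python function, candidate plans are enumerated exhaustively, and `plan_by_simulation`, `toy_world_model`, and `toy_score` are hypothetical names.

```python
from itertools import product


def plan_by_simulation(state, actions, world_model, score, depth):
    """Simulate every action sequence of the given depth and keep the best.

    world_model(state, action) predicts the next state; score(state) rates
    how close a state is to the goal. Nothing is executed in the real
    environment while plans are being compared.
    """
    best_plan, best_score = (), score(state)
    for plan in product(actions, repeat=depth):
        simulated = state
        for action in plan:
            simulated = world_model(simulated, action)
        plan_score = score(simulated)
        if plan_score > best_score:
            best_plan, best_score = plan, plan_score
    return best_plan


def toy_world_model(state, action):
    """Hypothetical world model over integers."""
    if action == "inc":
        return state + 1
    if action == "double":
        return state * 2
    raise ValueError(f"unknown action: {action}")


def toy_score(state, target=14):
    """Higher is better; zero means the goal has been reached."""
    return -abs(state - target)


plan = plan_by_simulation(3, ["inc", "double"], toy_world_model, toy_score, depth=3)
```

In the toy task, a greedy one-step policy would pick "double" twice and overshoot the alternatives, while simulating whole plans finds the sequence that lands exactly on the target.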

Article 3

Title@2025-07-31 (4): GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

Title: GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

GenoMAS: Ein Multi-Agenten-Framework für wissenschaftliche Entdeckung durch codegetriebene Genexpressionsanalyse

GenoMAS: 通过代码驱动基因表达分析科学发现多机构框架 2507.21035v2

Authors (3): Haoyang Liu, Yijiang Li, Haohan Wang

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.

基因表达分析是许多生物医学发现的关键,然而,由于多个大型半结构化文件的复杂性以及对广泛领域专门知识的需要,从原始转录组数据中提取洞见仍然十分艰巨。当前的自动化方法往往受到两方面的限制:要么是在边缘情况下失效的不灵活工作流程,要么是缺乏严格科学研究所需精确度的完全自主代理。GenoMAS另辟蹊径,提出了一个基于LLM的科学家团队,将结构化工作流程的可靠性与自主代理的适应性结合起来。GenoMAS通过类型化的消息传递协议协调六个专门的LLM代理,每个代理都为共享的分析画布贡献互补的优势。GenoMAS的核心是一个引导式规划框架:编程代理将高层次的任务指南展开为行动单元(Action Units),并在每个节点选择推进、修改、绕过或回溯,从而在保持逻辑一致性的同时灵活适应基因组数据的特殊性。在GenoTEX基准上,GenoMAS在数据预处理方面达到89.13%的综合相似性相关性(Composite Similarity Correlation),在基因识别方面达到60.48%的F$_1$,分别比此前最佳方法高出10.61%和16.85%。除指标之外,GenoMAS还发现了得到文献佐证的、生物学上合理的基因-表型关联,同时对潜在混杂因素进行了调整。代码可在 https://github.com/Liu-Hy/GenoMAS 查阅。

Article 4

Title@2025-07-31 (4): Consensus-Driven Active Model Selection

Title: Consensus-Driven Active Model Selection

Consensus-Driven Aktive Modellauswahl

采用协商一致的主动选择模式 2507.23771v1

Authors (5): Justin Kay, Grant Van Horn, Subhransu Maji, Daniel Sheldon, Sara Beery

The widespread availability of off-the-shelf machine learning models poses a challenge: which model, of the many available candidates, should be chosen for a given data analysis task? This question of model selection is traditionally answered by collecting and annotating a validation dataset – a costly and time-intensive process. We propose a method for active model selection, using predictions from candidate models to prioritize the labeling of test data points that efficiently differentiate the best candidate. Our method, CODA, performs consensus-driven active model selection by modeling relationships between classifiers, categories, and data points within a probabilistic framework. The framework uses the consensus and disagreement between models in the candidate pool to guide the label acquisition process, and Bayesian inference to update beliefs about which model is best as more information is collected. We validate our approach by curating a collection of 26 benchmark tasks capturing a range of model selection scenarios. CODA outperforms existing methods for active model selection significantly, reducing the annotation effort required to discover the best model by upwards of 70% compared to the previous state-of-the-art. Code and data are available at https://github.com/justinkay/coda.

现成机器学习模型的广泛可用带来了一个挑战:在众多可用候选模型中,应为给定的数据分析任务选择哪一个?传统上,模型选择问题是通过收集并标注验证数据集来回答的,这一过程成本高且耗时。我们提出一种主动模型选择方法,利用候选模型的预测来确定测试数据点的标注优先顺序,从而高效地区分出最佳候选模型。我们的方法CODA在一个概率框架内对分类器、类别和数据点之间的关系建模,进行共识驱动的主动模型选择。该框架利用候选池中模型之间的共识与分歧来指导标签获取过程,并随着收集到更多信息,用贝叶斯推断更新关于哪个模型最佳的信念。我们整理了26项基准任务来验证该方法,涵盖一系列模型选择情景。CODA显著优于现有的主动模型选择方法,与此前的最先进方法相比,发现最佳模型所需的标注工作量减少了70%以上。代码和数据可在 https://github.com/justinkay/coda 获取。
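
As a rough illustration of letting model disagreement drive label acquisition, the sketch below uses a much simpler stand-in than CODA itself: it queries the unlabeled point with the highest vote entropy across candidate models and keeps an independent Beta posterior over each model's accuracy. The function names, the toy predictions, and the oracle label are all hypothetical; CODA's actual probabilistic model over classifiers, categories, and data points is not reproduced here.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the empirical distribution of class votes for one data point."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def pick_query(predictions, labeled):
    """Return the unlabeled index on which the candidate models disagree most.

    predictions[m][i] is model m's predicted class for data point i.
    """
    candidates = [i for i in range(len(predictions[0])) if i not in labeled]
    return max(candidates, key=lambda i: vote_entropy([p[i] for p in predictions]))


def update_posteriors(posteriors, predictions, index, true_label):
    """Beta-Bernoulli update: posteriors[m] = [alpha, beta] over model m's accuracy."""
    for m, preds in enumerate(predictions):
        if preds[index] == true_label:
            posteriors[m][0] += 1
        else:
            posteriors[m][1] += 1


# Toy predictions from three hypothetical candidate models on four points.
predictions = [
    [0, 1, 1, 2],  # model 0
    [0, 1, 2, 2],  # model 1
    [0, 0, 0, 2],  # model 2
]
posteriors = [[1, 1] for _ in predictions]  # uniform Beta(1, 1) priors
labeled = {}

query = pick_query(predictions, labeled)  # point 2: all three models disagree
labeled[query] = 1                        # assumed oracle label for illustration
update_posteriors(posteriors, predictions, query, labeled[query])

# Rank models by posterior mean accuracy alpha / (alpha + beta).
best = max(range(len(posteriors)), key=lambda m: posteriors[m][0] / sum(posteriors[m]))
```

Points on which every model agrees carry zero vote entropy and are never queried first, which is the intuition behind spending labels only where they can separate the candidates.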

Article 5

Title@2025-07-31 (4): Formal Bayesian Transfer Learning via the Total Risk Prior

Title: Formal Bayesian Transfer Learning via the Total Risk Prior

Formale Bayesian Transfer Learning über das Total Risk Prior

基于总风险先验的形式化贝叶斯迁移学习 2507.23768v1

Authors (3): Nathan Wycoff, Ali Arab, Lisa O. Singh

In analyses with severe data-limitations, augmenting the target dataset with information from ancillary datasets in the application domain, called source datasets, can lead to significantly improved statistical procedures. However, existing methods for this transfer learning struggle to deal with situations where the source datasets are also limited and not guaranteed to be well-aligned with the target dataset. A typical strategy is to use the empirical loss minimizer on the source data as a prior mean for the target parameters, which places the estimation of source parameters outside of the Bayesian formalism. Our key conceptual contribution is to use a risk minimizer conditional on source parameters instead. This allows us to construct a single joint prior distribution for all parameters from the source datasets as well as the target dataset. As a consequence, we benefit from full Bayesian uncertainty quantification and can perform model averaging via Gibbs sampling over indicator variables governing the inclusion of each source dataset. We show how a particular instantiation of our prior leads to a Bayesian Lasso in a transformed coordinate system and discuss computational techniques to scale our approach to moderately sized datasets. We also demonstrate that recently proposed minimax-frequentist transfer learning techniques may be viewed as an approximate Maximum a Posteriori approach to our model. Finally, we demonstrate superior predictive performance relative to the frequentist baseline on a genetics application, especially when the source data are limited.

在数据严重受限的分析中,利用应用领域内辅助数据集(称为源数据集)的信息来扩充目标数据集,可以显著改进统计程序。然而,当源数据集同样有限且不能保证与目标数据集良好对齐时,现有的迁移学习方法难以应对。一种典型策略是将源数据上的经验损失最小化器用作目标参数的先验均值,这使源参数的估计置于贝叶斯形式体系之外。我们的主要概念贡献是改用以源参数为条件的风险最小化器。这使我们能够为源数据集和目标数据集的所有参数构建单一的联合先验分布。因此,我们受益于完整的贝叶斯不确定性量化,并可通过对控制各源数据集是否纳入的指示变量进行Gibbs采样来实现模型平均。我们展示了该先验的一种具体实例如何在变换后的坐标系中导出贝叶斯Lasso,并讨论了将该方法扩展到中等规模数据集的计算技术。我们还表明,最近提出的极小极大频率学派迁移学习技术可视为对我们模型的近似最大后验方法。最后,我们在一个遗传学应用中展示了相对于频率学派基线更优的预测性能,尤其是在源数据有限时。

Article 6

Title@2025-07-31 (4): Scaled Beta Models and Feature Dilution for Dynamic Ticket Pricing

Title: Scaled Beta Models and Feature Dilution for Dynamic Ticket Pricing

Skalierte Beta-Modelle und Feature-Verdünnung für dynamische Ticket-Preise

用于动态票盘定价的缩放贝塔模型和特性稀释 2507.23767v1

Authors (1): Jonathan R. Landers

A novel approach is presented for identifying distinct signatures of performing acts in the secondary ticket resale market by analyzing dynamic pricing distributions. Using a newly curated, time series dataset from the SeatGeek API, we model ticket pricing distributions as scaled Beta distributions. This enables accurate parameter estimation from incomplete statistical data using a hybrid of quantile matching and the method of moments. Incorporating the estimated $\alpha$ and $\beta$ parameters into Random Forest classifiers significantly improves pairwise artist classification accuracy, demonstrating the unique economic signatures in event pricing data. Additionally, we provide theoretical and empirical evidence that incorporating zero-variance (constant-value) features into Random Forest models acts as an implicit regularizer, enhancing feature variety and robustness. This regularization promotes deeper, more varied trees in the ensemble, improving the bias-variance tradeoff and mitigating overfitting to dominant features. These findings are validated on both the new ticket pricing dataset and the standard UCI ML handwritten digits dataset.

本文提出一种新方法,通过分析动态定价分布来识别二级门票转售市场中演出艺人的独特特征。我们使用从SeatGeek API新整理的时间序列数据集,将票价分布建模为缩放Beta分布。这样可以结合分位数匹配和矩估计法,从不完整的统计数据中准确估计参数。将估计的 $\alpha$ 和 $\beta$ 参数纳入随机森林分类器后,成对艺人分类的准确率显著提高,表明演出定价数据中存在独特的经济特征。此外,我们提供了理论和经验证据,表明在随机森林模型中加入零方差(常值)特征可起到隐式正则化的作用,增强特征多样性和稳健性。这种正则化促使集成中的树更深、更多样,改善了偏差-方差权衡,并减轻了对主导特征的过拟合。这些发现在新的票价数据集和标准的UCI ML手写数字数据集上均得到验证。
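
To make the scaled Beta fit concrete, here is a minimal sketch of the method-of-moments half only, assuming the price range `[low, high]` and the mean and variance are already known. The abstract describes a hybrid with quantile matching, which is not shown. The function name and the example statistics are hypothetical.

```python
def scaled_beta_from_moments(mean, variance, low, high):
    """Method-of-moments (alpha, beta) for a Beta distribution rescaled to [low, high]."""
    if not low < mean < high:
        raise ValueError("mean must lie strictly between low and high")
    width = high - low
    m = (mean - low) / width      # mean mapped to the unit interval
    v = variance / width ** 2     # variance mapped to the unit interval
    if not 0 < v < m * (1 - m):
        raise ValueError("variance is not attainable by a Beta distribution")
    total = m * (1 - m) / v - 1   # equals alpha + beta
    return m * total, (1 - m) * total


# Hypothetical summary statistics for one event's resale prices (not from the paper).
alpha, beta = scaled_beta_from_moments(mean=85.0, variance=400.0, low=40.0, high=250.0)
```

The two estimates could then be appended to a feature vector for a downstream classifier, which is the role the abstract assigns to $\alpha$ and $\beta$.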

Article 7

Title@2025-07-31 (4): Improving annotator selection in Active Learning using a mood and fatigue-aware Recommender System

Title: Improving annotator selection in Active Learning using a mood and fatigue-aware Recommender System

Verbesserung der Annotator-Auswahl in Active Learning mit einem stimmungs- und ermüdungsbewussten Empfehlungssystem

利用情绪和疲劳感知的推荐系统改进主动学习中的标注者选择 2507.23756v1

Authors (1): Diana Mortagua

This study addresses the challenge of selecting the best annotators for each query in Active Learning (AL), with the objective of minimizing misclassifications. AL recognizes the cost and time involved in acquiring labeled data and reduces the amount of labeled data needed. Nevertheless, annotation errors still need to be reduced so that the expected accuracy is reached faster and as efficiently as possible. Most strategies for forming query-annotator pairs do not consider internal factors that affect productivity, such as mood, attention, motivation, and fatigue levels. This work addresses this gap in the existing literature by considering how internal factors (mood and fatigue levels) influence annotators and by presenting a new query-annotator pairing strategy that uses a Knowledge-Based Recommendation System (RS). The RS ranks the available annotators using their past accuracy values, their mood and fatigue levels, and information about the queried instance, so that one or more of them can be chosen to label that instance. This work builds on existing literature on the influence of mood and fatigue on human performance, simulating annotators in a realistic manner and predicting their performance with the RS. The results show that considering past accuracy values together with mood and fatigue levels reduces both the number of annotation errors made by the annotators and the uncertainty of the model during training, compared to not using internal factors. Accuracy and F1-score values were also better with the proposed approach, although these gains were less substantial. The methodologies and findings presented in this study begin to explore the open challenge of human cognitive factors affecting AL.

本研究着眼于解决主动学习(AL)中为每个查询选择最佳标注者的难题,目标是尽量减少错误分类。AL考虑到获取标注数据的成本和时间问题,并减少了所需的标注数据量。然而,仍有必要减少标注错误,以尽可能高效、更快地达到预期准确率。大多数查询-标注者配对策略并未考虑影响工作效率的内部因素,如情绪、注意力、动机和疲劳程度。本工作填补了现有文献中的这一空白:不仅考虑内部因素(情绪和疲劳程度)如何影响标注者,还提出了一种使用基于知识的推荐系统(RS)的新查询-标注者配对策略。RS利用标注者过去的准确率、情绪和疲劳程度以及被查询实例的信息对可用标注者进行排序,从而选出一名或多名标注者来标注该实例。本工作以关于情绪和疲劳对人类表现影响的现有文献为基础,以贴近现实的方式模拟标注者,并用RS预测其表现。结果表明,与不使用内部因素相比,同时考虑过去的准确率以及情绪和疲劳程度,可减少标注者的标注错误数量,并降低模型在训练过程中的不确定性。所提方法的准确率和F1分数也更好,但提升幅度不及前述指标。本研究提出的方法和发现开始探索人类认知因素影响AL这一开放性难题。

Article 8

Title@2025-07-31 (4): Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic

Title: Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic

Räumlich-zeitliches Reinforcement Learning für Netzwerk-Routing mit nicht-markovschem Verkehr

面向非马尔可夫流量网络路由的时空强化学习 2507.22174v2

Authors (2): Molly Wang, Kin K. Leung

Reinforcement Learning (RL) has been widely used for packet routing in communication networks, but traditional RL methods rely on the Markov assumption that the current state contains all necessary information for decision-making. In reality, internet traffic is non-Markovian, and past states do influence routing performance. Moreover, common deep RL approaches use function approximators, such as neural networks, that do not model the spatial structure in network topologies. To address these shortcomings, we design a network environment with non-Markovian traffic and introduce a spatial-temporal RL (STRL) framework for packet routing. Our approach outperforms traditional baselines by more than 19% during training and 7% for inference despite a change in network topology.

强化学习(RL)已被广泛用于通信网络中的数据包路由,但传统的RL方法依赖于马尔可夫假设,即当前状态包含决策所需的全部信息。实际上,互联网流量是非马尔可夫的,过去的状态确实会影响路由性能。此外,常见的深度RL方法使用神经网络等函数逼近器,而这些逼近器并不对网络拓扑中的空间结构建模。为了解决这些缺陷,我们设计了一个具有非马尔可夫流量的网络环境,并引入了用于数据包路由的时空RL(STRL)框架。即使网络拓扑发生变化,我们的方法在训练期间仍比传统基线高出19%以上,在推理时高出7%。

Article 9

Title@2025-07-31 (4): Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs

Title: Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs

Rule2Text: Natürlichsprachliche Erklärung logischer Regeln in Wissensgraphen

Rule2Text:知识图谱中逻辑规则的自然语言解释 2507.23740v1

Authors (4): Nasim Shirvani-Mahdavi, Devin Wingfield, Amin Ghasemi, Chengkai Li

Knowledge graphs (KGs) often contain sufficient information to support the inference of new facts. Identifying logical rules not only improves the completeness of a knowledge graph but also enables the detection of potential errors, reveals subtle data patterns, and enhances the overall capacity for reasoning and interpretation. However, the complexity of such rules, combined with the unique labeling conventions of each KG, can make them difficult for humans to understand. In this paper, we explore the potential of large language models to generate natural language explanations for logical rules. Specifically, we extract logical rules using the AMIE 3.5.1 rule discovery algorithm from the benchmark dataset FB15k-237 and two large-scale datasets, FB-CVT-REV and FB+CVT-REV. We examine various prompting strategies, including zero- and few-shot prompting, the inclusion of variable entity types, and chain-of-thought reasoning. We conduct a comprehensive human evaluation of the generated explanations based on correctness, clarity, and hallucination, and also assess the use of large language models as automatic judges. Our results demonstrate promising performance in terms of explanation correctness and clarity, although several challenges remain for future research. All scripts and data used in this study are publicly available at https://github.com/idirlab/KGRule2NL.

知识图谱(KG)通常包含足以支持推断新事实的信息。识别逻辑规则不仅能提高知识图谱的完整性,还能发现潜在错误,揭示细微的数据模式,并增强整体的推理和解释能力。然而,这些规则的复杂性,加上每个KG独特的标注约定,可能使人难以理解它们。在本文中,我们探索大型语言模型为逻辑规则生成自然语言解释的潜力。具体来说,我们使用AMIE 3.5.1规则发现算法,从基准数据集FB15k-237以及两个大规模数据集FB-CVT-REV和FB+CVT-REV中提取逻辑规则。我们考察了多种提示策略,包括零样本和少样本提示、加入变量实体类型以及思维链推理。我们基于正确性、清晰度和幻觉对生成的解释进行了全面的人工评估,并评估了将大型语言模型用作自动评判者的效果。我们的结果在解释的正确性和清晰度方面表现良好,但仍有若干挑战留待未来研究。本研究使用的所有脚本和数据可在 https://github.com/idirlab/KGRule2NL 公开获取。

Article 10

Title@2025-07-31 (4): DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction

Title: DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction

DICOM-De-Identifikation über Hybrid-KI und regelbasiertes Framework für skalierbare, unsicherheitsbewusste Schwärzung

基于混合AI和规则框架的DICOM去标识化:可扩展、不确定性感知的脱敏 2507.23736v1

Authors (7): Kyle Naddeo, Nikolas Koutsoubis, Rahul Krish, Ghulam Rasool, Nidhal Bouaynaya, Tony OSullivan, Raj Krish

Access to medical imaging and associated text data has the potential to drive major advances in healthcare research and patient outcomes. However, the presence of Protected Health Information (PHI) and Personally Identifiable Information (PII) in Digital Imaging and Communications in Medicine (DICOM) files presents a significant barrier to the ethical and secure sharing of imaging datasets. This paper presents a hybrid de-identification framework developed by Impact Business Information Solutions (IBIS) that combines rule-based and AI-driven techniques, and rigorous uncertainty quantification for comprehensive PHI/PII removal from both metadata and pixel data. Our approach begins with a two-tiered rule-based system targeting explicit and inferred metadata elements, further augmented by a large language model (LLM) fine-tuned for Named Entity Recognition (NER), and trained on a suite of synthetic datasets simulating realistic clinical PHI/PII. For pixel data, we employ an uncertainty-aware Faster R-CNN model to localize embedded text, extract candidate PHI via Optical Character Recognition (OCR), and apply the NER pipeline for final redaction. Crucially, uncertainty quantification provides confidence measures for AI-based detections to enhance automation reliability and enable informed human-in-the-loop verification to manage residual risks. This uncertainty-aware deidentification framework achieves robust performance across benchmark datasets and regulatory standards, including DICOM, HIPAA, and TCIA compliance metrics. By combining scalable automation, uncertainty quantification, and rigorous quality assurance, our solution addresses critical challenges in medical data de-identification and supports the secure, ethical, and trustworthy release of imaging data for research.

医疗成像和相关文本数据的获取有可能推动医疗保健研究和患者结果方面的重大进展,然而,在医学数字成像和通信(DICOM)档案中存在保护健康信息(PHI)和个人识别信息(PII),对以道德和安全的方式共享成像数据集构成重大障碍,本文介绍了由影响商业信息解决方案(IBIS)开发的混合脱身份框架,该框架结合基于规则和AI驱动的技术,以及从元数据和像素数据中全面删除PHI/PII的严格不确定性量化。我们的方法始于一个两级基于规则的系统,以明确和推断的元数据要素为对象,再辅之以一个大型语言模型(LLLM),为名列实体识别(NER)进行精细调,并经过培训,以一套合成数据集模拟现实的临床PHI/PII。对于像素数据,我们采用了一种具有不确定性的更快的RNN模式,将嵌入文本本地化,通过光盘识别关键身份识别(OCRICR),并应用NER管道进行最后的重新定义。

Article 11

Title@2025-07-31 (4): A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values

Title: A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values

Ein theoretischer Rahmen zur Erklärung von Reinforcement Learning mit Shapley-Werten

用Shapley值解释强化学习的理论框架 2505.07797v2

Authors (3): Daniel Beechey, Thomas M. S. Smith, Özgür Şimşek

Reinforcement learning agents can achieve super-human performance in complex decision-making tasks, but their behaviour is often difficult to understand and explain. This lack of explanation limits deployment, especially in safety-critical settings where understanding and trust are essential. We identify three core explanatory targets that together provide a comprehensive view of reinforcement learning agents: behaviour, outcomes, and predictions. We develop a unified theoretical framework for explaining these three elements of reinforcement learning agents through the influence of individual features that the agent observes in its environment. We derive feature influences by using Shapley values, which collectively and uniquely satisfy a set of well-motivated axioms for fair and consistent credit assignment. The proposed approach, Shapley Values for Explaining Reinforcement Learning (SVERL), provides a single theoretical framework to comprehensively and meaningfully explain reinforcement learning agents. It yields explanations with precise semantics that are not only interpretable but also mathematically justified, enabling us to identify and correct conceptual issues in prior explanations. Through illustrative examples, we show how SVERL produces useful, intuitive explanations of agent behaviour, outcomes, and predictions, which are not apparent from observing agent behaviour alone.

强化学习机构在复杂的决策任务中可以实现超人业绩,但是他们的行为往往难以理解和解释。这种缺乏解释限制了部署,特别是在理解和信任至关重要的安全关键环境中。我们确定了三个核心解释目标,它们共同提供了强化学习机构的全面观点:行为、结果和预测。我们开发了一个统一的理论框架,通过强化学习机构在其环境中观察到的个别特征的影响来解释强化学习机构的这三个要素。我们利用沙普利价值观来产生特征影响,这些价值观集体和独特地满足了一套有良好动机的公平、一致的信用分配原则。拟议的方法“解释强化学习的精髓价值”为全面、有意义地解释强化学习机构提供了一个单一的理论框架。它给出了精确的语义性解释,这些语义不仅可以解释,而且具有数学上的合理性,使我们能够在先前的解释中识别和纠正概念问题。我们通过示例展示了SVERL如何对代理人的行为、结果和预测产生有用、直观的解释、结果和预测,而仅仅观察代理人的行为并不明显。
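
For readers unfamiliar with Shapley values, the sketch below computes them exactly for a small cooperative game. It illustrates the generic credit-assignment formula only: in SVERL the "players" would be the state features an agent observes and the value function would be one of the paper's characteristic functions, neither of which is reproduced here. `toy_value` and the feature names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical game: x and y contribute alone and interact; z is irrelevant."""
    v = 0.0
    if "x" in coalition:
        v += 1.0
    if "y" in coalition:
        v += 2.0
    if "x" in coalition and "y" in coalition:
        v += 1.0  # interaction term, split equally by the Shapley formula
    return v


phi = shapley_values(["x", "y", "z"], toy_value)
```

The attributions sum to the value of the full coalition (efficiency) and the irrelevant feature receives zero, two of the axioms that make Shapley values a fair and consistent way to assign credit.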

Article 12

Title@2025-07-31 (4): Intersectional Divergence: Measuring Fairness in Regression

Title: Intersectional Divergence: Measuring Fairness in Regression

Intersektionale Divergenz: Fairness in der Regression messen

交叉性散度:衡量回归中的公平性 2505.00830v2

Authors (3): Joe Germino, Nuno Moniz, Nitesh V. Chawla

Fairness in machine learning research is commonly framed in the context of classification tasks, leaving critical gaps in regression. In this paper, we propose a novel approach to measure intersectional fairness in regression tasks, going beyond the focus on single protected attributes from existing work to consider combinations of all protected attributes. Furthermore, we contend that it is insufficient to measure the average error of groups without regard for imbalanced domain preferences. Accordingly, we propose Intersectional Divergence (ID) as the first fairness measure for regression tasks that 1) describes fair model behavior across multiple protected attributes and 2) differentiates the impact of predictions in target ranges most relevant to users. We extend our proposal demonstrating how ID can be adapted into a loss function, IDLoss, that satisfies convergence guarantees and has piecewise smooth properties that enable practical optimization. Through an extensive experimental evaluation, we demonstrate how ID allows unique insights into model behavior and fairness, and how incorporating IDLoss into optimization can considerably improve single-attribute and intersectional model fairness while maintaining a competitive balance in predictive performance.

机器学习研究的公平性通常以分类任务为框架,留下关键的回归差距。在本文中,我们提出一种新的方法,衡量回归任务中的交叉公平性,超越现有工作中单一受保护属性的重点,考虑所有受保护属性的组合。此外,我们认为,光衡量各群体的平均错误而不考虑不平衡域的偏好,是不够的。因此,我们提议将跨部门差异性(ID)作为回归任务的第一个公平性衡量标准,即(1) 描述多个受保护属性之间的公平行为模式,(2) 区分目标范围中与用户最相关的预测的影响。我们扩展了我们的提议,说明如何将ID调整为损失函数,IDLOs, 满足趋同保证, 并具有可以实现实际优化的条理顺的特性。我们通过广泛的实验评估,展示ID如何允许对模型行为和公平性有独到的洞察力,以及将IDLos纳入优化可以大大改善单一属性和交叉模式的公平性,同时保持预测性能的竞争性平衡。

Article 13

Title@2025-07-31 (4): Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies

Title: Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies

Verstärkung der multi-agenten Zusammenarbeit mit aufmerksamkeitsbasierter akteur-kritischer Politik

利用基于注意力的演员-评论家策略增强多智能体协作 2507.22782v2

Authors (2): Hugo Garrido-Lestache, Jeremy Kedziora

This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme incorporating multi-headed attention mechanisms in both the actor and critic. This design facilitates dynamic, inter-agent communication, allowing agents to explicitly query teammates, thereby efficiently managing the exponential growth of joint-action spaces while ensuring a high degree of collaboration. We further introduce a penalized loss function which promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization and Multi-Agent Actor-Attention-Critic. We find that TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics (win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps).

本文介绍“团队-注意-行动者-批评”(TAAC),这是一种强化学习算法,旨在加强合作环境中的多剂协作;TAAC采用集中化培训/集中执行计划,在行为者和评论家中都采用多头关注机制;这一设计有利于动态的、机构间的交流,使代理商能够明确询问队友,从而有效地管理联合行动空间的指数增长,同时确保高度合作;我们进一步引入了一种惩罚性损失功能,促进代理人之间不同但互补的作用;我们对照代表其他多剂模式的基准算法,包括Proximal政策优化和多动因-行动者-注意-批评。我们发现TAAC展示了优异性业绩,加强了各种计量(双率、目标差异、埃洛评级、机构间连通性、平衡的空间分布以及频繁的战术互动,如球拥有互换)。

Article 14

Title@2025-07-31 (4): Quantum Transfer Learning for MNIST Classification Using a Hybrid Quantum-Classical Approach

Title: Quantum Transfer Learning for MNIST Classification Using a Hybrid Quantum-Classical Approach

Quantentransfer-Lernen für die MNIST-Klassifizierung mit einem hybriden Quantum-Klassiker-Ansatz

采用混合量子-经典方法进行MNIST分类的量子迁移学习 2408.03351v2

Authors (1): Soumyadip Sarkar

We implement a hybrid quantum-classical model for image classification that compresses MNIST digit images into a low-dimensional feature space and then maps these features onto a 5-qubit quantum state. First, an autoencoder compresses each $28\times28$ image (784 pixels) into a 64-dimensional latent vector, preserving salient features of the digit with minimal reconstruction error. We further reduce the latent representation to 5 principal components using Principal Component Analysis (PCA), to match the 5 available qubits. These 5 features are encoded as rotation angles in a quantum circuit with 5 qubits. The quantum feature map applies single-qubit rotations ($R_y$ gates) proportional to the feature values, followed by a Hadamard gate and a cascade of entangling CNOT gates to produce a non-product entangled state. Measuring the 5-qubit state yields a 32-dimensional probability distribution over basis outcomes, which serves as a quantum-enhanced feature vector for classification. A classical neural network with a softmax output is then trained on these 32-dimensional quantum feature vectors to predict the digit class. We evaluate the hybrid model on the MNIST dataset and compare it to a purely classical baseline that uses the 64-dimensional autoencoder latent features for classification. The results show that the hybrid model can successfully classify digits, demonstrating the feasibility of integrating quantum computing in the classification pipeline, although its accuracy (about 75% on test data) currently falls below the classical baseline (about 98% on the same compressed data).

我们实现了一个用于图像分类的混合量子-经典模型,它将MNIST数字图像压缩到低维特征空间,然后将这些特征映射到一个5量子比特的量子态上。首先,自编码器将每幅 $28\times28$ 图像(784个像素)压缩为64维潜在向量,以最小的重建误差保留数字的显著特征。我们再用主成分分析(PCA)将潜在表示降为5个主成分,以匹配可用的5个量子比特。这5个特征被编码为5量子比特量子电路中的旋转角度。量子特征映射先施加与特征值成比例的单量子比特旋转($R_y$ 门),随后是一个Hadamard门和一连串产生纠缠的CNOT门,从而得到非乘积的纠缠态。测量该5量子比特态可得到基态结果上的32维概率分布,作为用于分类的量子增强特征向量。随后,在这些32维量子特征向量上训练一个带softmax输出的经典神经网络来预测数字类别。我们在MNIST数据集上评估了该混合模型,并将其与使用64维自编码器潜在特征进行分类的纯经典基线进行比较。结果表明,该混合模型能够成功地对数字进行分类,证明了将量子计算整合进分类流程的可行性,不过其准确率(测试数据上约75%)目前低于经典基线(同一压缩数据上约98%)。
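
The feature map described above can be simulated classically at this size. The following is a minimal state-vector sketch in plain Python, under several assumptions the abstract does not settle: the rotation angle is `scale * feature`, the Hadamard gate is applied to every qubit, and the CNOT cascade links neighbouring qubits in a chain. All function names are hypothetical, and the input values stand in for the 5 PCA features.

```python
import math

N_QUBITS = 5


def apply_single_qubit(state, gate, q):
    """Apply a real 2x2 gate to qubit q (bit q of the basis-state index)."""
    (g00, g01), (g10, g11) = gate
    new = list(state)
    for i in range(len(state)):
        if not (i >> q) & 1:
            j = i | (1 << q)
            new[i] = g00 * state[i] + g01 * state[j]
            new[j] = g10 * state[i] + g11 * state[j]
    return new


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    new = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new[i], new[j] = state[j], state[i]
    return new


def quantum_feature_vector(features, scale=1.0):
    """Map 5 features to the 32 measurement probabilities of the circuit."""
    if len(features) != N_QUBITS:
        raise ValueError("expected one feature per qubit")
    state = [0.0] * (2 ** N_QUBITS)
    state[0] = 1.0  # start in |00000>
    for q, x in enumerate(features):
        half = scale * x / 2
        ry = ((math.cos(half), -math.sin(half)), (math.sin(half), math.cos(half)))
        state = apply_single_qubit(state, ry, q)
    h = 1 / math.sqrt(2)
    for q in range(N_QUBITS):  # assumption: Hadamard on every qubit
        state = apply_single_qubit(state, ((h, h), (h, -h)), q)
    for q in range(N_QUBITS - 1):  # assumption: nearest-neighbour CNOT chain
        state = apply_cnot(state, q, q + 1)
    # All gates used are real-valued, so the probability is the squared amplitude.
    return [a * a for a in state]


probs = quantum_feature_vector([0.3, 1.2, -0.7, 2.0, 0.5])
```

The resulting 32 probabilities play the role of the feature vector that the abstract feeds to the classical softmax network; on hardware they would be estimated from repeated measurements rather than read off exactly.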

Article 15

Title@2025-07-31 (4): Anomalous Samples for Few-Shot Anomaly Detection

Title: Anomalous Samples for Few-Shot Anomaly Detection

Anomale Proben für Few-Shot-Anomalieerkennung

用于少样本异常检测的异常样本 2507.23712v1

Authors (4): Aymane Abdali, Bartosz Boguslawski, Lucas Drumetz, Vincent Gripon

Several anomaly detection and classification methods rely on large amounts of non-anomalous or “normal” samples under the assumption that anomalous data is typically harder to acquire. This hypothesis becomes questionable in Few-Shot settings, where as little as one annotated sample can make a significant difference. In this paper, we tackle the question of utilizing anomalous samples in training a model for binary anomaly classification. We propose a methodology that incorporates anomalous samples in a multi-score anomaly detection score leveraging recent Zero-Shot and memory-based techniques. We compare the utility of anomalous samples to that of regular samples and study the benefits and limitations of each. In addition, we propose an augmentation-based validation technique to optimize the aggregation of the different anomaly scores and demonstrate its effectiveness on popular industrial anomaly detection datasets.

一些异常的检测和分类方法依靠大量非异常或“正常”的样本,而异常数据通常难以获取。这种假设在很少的热环境中会引起疑问,因为只有很少的无色样本可以产生显著的差别。在本文件中,我们处理利用异常样本来训练双核异常分类模式的问题。我们建议采用一种方法,利用最近的零热和记忆技术,将异常样本纳入多极异常检测分数中。我们将异常样本的效用与常规样本的效用作比较,并研究每种样本的利弊和局限性。此外,我们建议采用基于增强的验证技术,优化不同异常分数的汇总,并在流行的工业异常检测数据集上展示其有效性。

Article 16

Title@2025-07-31 (4): GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network

Title: GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network

GCL-GCN: Durch Graphormer und kontrastives Lernen verbessertes Clustering-Netzwerk für attributierte Graphen

GCL-GCN: Graphormer与对比学习增强的属性图聚类网络 2507.19095v2

Authors (8): Binxiong Li, Xu Xiang, Xue Li, Quanzhou Lou, Binyu Zhao, Yujie Liu, Huijie Tang, Benhan Yang

Attributed graph clustering holds significant importance in modern data analysis. However, due to the complexity of graph data and the heterogeneity of node attributes, leveraging graph information for clustering remains challenging. To address this, we propose a novel deep graph clustering model, GCL-GCN, specifically designed to address the limitations of existing models in capturing local dependencies and complex structures when dealing with sparse and heterogeneous graph data. GCL-GCN introduces an innovative Graphormer module that combines centrality encoding and spatial relationships, effectively capturing both global and local information between nodes, thereby enhancing the quality of node representations. Additionally, we propose a novel contrastive learning module that significantly enhances the discriminative power of feature representations. In the pre-training phase, this module increases feature distinction through contrastive learning on the original feature matrix, ensuring more identifiable initial representations for subsequent graph convolution and clustering tasks. Extensive experimental results on six datasets demonstrate that GCL-GCN outperforms 14 advanced methods in terms of clustering quality and robustness. Specifically, on the Cora dataset, it improves ACC, NMI, and ARI by 4.94%, 13.01%, and 10.97%, respectively, compared to the primary comparison method MBN.

然而,由于图表数据的复杂性和节点属性的异质性,为集群提供的图形信息仍然具有挑战性。为了解决这个问题,我们提议采用新的深图群集模型,即GCL-GCN, 专门旨在解决现有模型在处理分散和多样的图形数据时在捕捉本地依赖性和复杂结构方面的局限性。GCL-GCN 引入了一个创新的图形模型,该模型将核心编码和空间关系结合起来,有效地捕捉节点之间的全球和地方信息,从而提高节点表示的质量。此外,我们提议了一个新的对比学习模块,大大增强特征表示的歧视性力量。在培训前阶段,该模块通过对原始特征矩阵的对比性学习而增加区别,确保随后的图解变和组合任务有更明确的初始表述。关于六个数据集的广泛实验结果表明,GCL-GCN在组合质量和稳健性方面优于14种先进方法。具体地说,在科拉数据集方面,它使AC、NMI和ARI分别改进了4.94%、13.01%和10.97比M方法。

Article 17

Title@2025-07-31 (4): Disparate Conditional Prediction in Multiclass Classifiers

Title: Disparate Conditional Prediction in Multiclass Classifiers

Disparate Bedingte Vorhersagen in Mehrklassen-Klassifikatoren

多类分类器中的差异条件预测 2206.03234v3

Authors (3): Sivan Sabato, Eran Treister, Elad Yom-Tov

We propose methods for auditing multiclass classifiers for fairness under multiclass equalized odds, by estimating the deviation from equalized odds when the classifier is not completely fair. We generalize to multiclass classifiers the measure of Disparate Conditional Prediction (DCP), originally suggested by Sabato & Yom-Tov (2020) for binary classifiers. DCP is defined as the fraction of the population for which the classifier predicts with conditional prediction probabilities that differ from the closest common baseline. We provide new local-optimization methods for estimating the multiclass DCP under two different regimes, one in which the conditional confusion matrices for each protected sub-population are known, and one in which these cannot be estimated, for instance, because the classifier is inaccessible or because good-quality individual-level data is not available. These methods can be used to detect classifiers that likely treat a significant fraction of the population unfairly. Experiments demonstrate the accuracy of the methods. Code is provided at https://github.com/sivansabato/DCPmulticlass.

我们提出了在多类均等化几率(equalized odds)下审计多类分类器公平性的方法,即在分类器并非完全公平时估计其与均等化几率的偏离程度。我们将Sabato & Yom-Tov(2020年)最初为二元分类器提出的差异条件预测(DCP)度量推广到多类分类器。DCP的定义是:分类器对其给出的条件预测概率不同于最接近的共同基线的那部分人群所占的比例。我们提供了新的局部优化方法,用于在两种不同情形下估计多类DCP:一种是每个受保护子群体的条件混淆矩阵已知;另一种是这些矩阵无法估计,例如因为分类器无法访问,或者没有高质量的个体层面数据。这些方法可用于检测可能不公平地对待相当一部分人群的分类器。实验证明了这些方法的准确性。代码见 https://github.com/sivansabato/DCPmulticlass 。
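
The "conditional confusion matrices" mentioned for the first regime are simple to compute when individual-level labels, predictions, and group membership are available. The sketch below builds them per group and reports the largest cross-group gap in any conditional prediction probability, which is zero exactly when multiclass equalized odds holds on the sample. This crude gap is an illustrative stand-in, not the DCP measure or the paper's local-optimization estimators; the function names and toy data are hypothetical.

```python
from collections import defaultdict


def conditional_confusion(y_true, y_pred, groups, labels):
    """Per-group matrices M[g][y][z] = P(pred = z | true = y, group = g)."""
    counts = defaultdict(lambda: {y: {z: 0 for z in labels} for y in labels})
    for y, z, g in zip(y_true, y_pred, groups):
        counts[g][y][z] += 1
    matrices = {}
    for g, table in counts.items():
        matrices[g] = {}
        for y in labels:
            row_total = sum(table[y].values())
            matrices[g][y] = {
                z: (table[y][z] / row_total if row_total else 0.0) for z in labels
            }
    return matrices


def max_equalized_odds_gap(matrices, labels):
    """Largest difference across groups in any P(pred = z | true = y)."""
    gap = 0.0
    for y in labels:
        for z in labels:
            values = [m[y][z] for m in matrices.values()]
            gap = max(gap, max(values) - min(values))
    return gap


# Toy data: the classifier is perfect on group "a" but errs on group "b".
labels = [0, 1, 2]
y_true = [0, 0, 1, 1, 2, 2] * 2
y_pred = [0, 0, 1, 1, 2, 2] + [0, 1, 1, 1, 2, 0]
groups = ["a"] * 6 + ["b"] * 6

matrices = conditional_confusion(y_true, y_pred, groups, labels)
gap = max_equalized_odds_gap(matrices, labels)
```

A maximum gap says how far the worst cell is from parity, whereas DCP as defined in the abstract asks what fraction of the population is affected, so the two quantities answer different questions.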

Article 18

Title@2025-07-31 (4): Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks

Title: Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks

Satelliten-Federated Fine-Tuning für Basismodelle in Weltraum Computing Power Networks

空间算力网络中基础模型的卫星联邦微调 2504.10403v3

Authors (6): Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang, Khaled Ben Letaief

Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks. However, direct downloading of these models for fine-tuning on the ground is impeded by privacy concerns and limited bandwidth. Satellite federated learning (FL) offers a solution by enabling model fine-tuning directly on-board satellites and aggregating model updates without data downloading. Nevertheless, for large foundation models, the computational capacity of satellites is insufficient to support effective on-board fine-tuning in traditional satellite FL frameworks. To address these challenges, we propose a satellite-ground collaborative federated fine-tuning framework. The key of the framework lies in how to reasonably decompose and allocate model components to alleviate insufficient on-board computation capabilities. During fine-tuning, satellites exchange intermediate results with ground stations or other satellites for forward propagation and back propagation, which brings communication challenges due to the special communication topology of space transmission networks, such as intermittent satellite-ground communication, short duration of satellite-ground communication windows, and unstable inter-orbit inter-satellite links (ISLs). To reduce transmission delays, we further introduce tailored communication strategies that integrate both communication and computing resources. Specifically, we propose a parallel intra-orbit communication strategy, a topology-aware satellite-ground communication strategy, and a latency-minimalization inter-orbit communication strategy to reduce space communication costs. Simulation results demonstrate significant reductions in training time with improvements of approximately 33%.

人工智能(AI)和低地轨道(LEO)卫星的进步促进了大型遥感基础模型应用于各种下游任务;然而,隐私关切和有限带宽妨碍了直接下载这些模型以进行地面微调; 卫星联合学习(FL)提供了一种解决办法,使模型能够直接对机载卫星进行微调,并在没有数据下载的情况下对模型更新进行汇总; 然而,对于大型基础模型而言,卫星的计算能力不足以支持对传统卫星FL框架进行有效的机载微调; 为了应对这些挑战,我们提议了一个卫星-地面协作联合微调框架; 该框架的关键在于如何合理拆解和分配模型组成部分以缓解机载计算能力不足的情况; 在微调过程中,卫星与地面站或其他卫星交换中间结果,以便进行前向传播和后向传播,这带来了通信挑战,因为空间传输网络的特殊通信地形学,例如间歇的卫星地面通信、卫星地面通信窗口的短暂期限以及不稳定的轨道间通信连接(ISLs),为了减少传输延迟,我们提出了一个同步通信战略。

Article 19

Title@2025-07-31 (4): villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

Title: villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

villa-X: Verbesserung des Latent Action Modeling in Vision-Language-Action-Modellen

villa-X:增强视觉-语言-动作模型中的潜在动作建模 2507.23682v1

Authors (12): Xiaoyu Chen, Hangxing Wei, Pushi Zhang, Chuheng Zhang, Kaixin Wang, Yanjiang Guo, Rushuai Yang, Yucen Wang, Xinquan Xiao, Li Zhao, Jianyu Chen, Jiang Bian

Visual-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent work has begun to explore the incorporation of latent actions, an abstract representation of visual change between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Visual-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. Together, these contributions enable villa-X to achieve superior performance across simulated environments including SIMPLER and LIBERO, as well as on two real-world robot setups including gripper and dexterous hand manipulation. We believe the ViLLA paradigm holds significant promise, and that our villa-X provides a strong foundation for future research.

视觉-语言-行动模型(VLA)已成为学习机器人操纵政策的流行范例,可以遵循语言指令,对新情景加以概括。最近的工作已经开始探索将潜在行动(即两个框架之间的视觉变化的抽象表现)纳入VLA预培训。在本文中,我们引入了别墅-X(VALLA)框架,这个新颖的视觉-语言-语言-语言-行动(VELLA)框架为学习通用机器人操纵政策提供了潜在的行动模型。我们的方法改进了潜在行动是如何学习的,以及如何将其纳入VLA预培训阶段。这些贡献加在一起,使别墅-X能够在包括SIMPLER和LIBERO在内的模拟环境中取得优异性性表现,以及两个真实世界的机器人装置(包括抓手和伸缩手操纵 ) 。我们认为VELLA范例有着重大希望,而且我们的别墅-X为未来研究提供了坚实的基础。

Article 20

Title@2025-07-31 (4): DepMicroDiff: Diffusion-Based Dependency-Aware Multimodal Imputation for Microbiome Data

Title: DepMicroDiff: Diffusion-Based Dependency-Aware Multimodal Imputation for Microbiome Data

DepMicroDiff: Diffusionsbasierte Abhängigkeits-Bewusst Multimodale Imputation für Mikrobiom-Daten

DepMicroDiff: 微生物数据多式多式计算法 2507.23676v1

Authors (2): Rabeya Tus Sadia, Qiang Cheng

Microbiome data analysis is essential for understanding host health and disease, yet its inherent sparsity and noise pose major challenges for accurate imputation, hindering downstream tasks such as biomarker discovery. Existing imputation methods, including recent diffusion-based models, often fail to capture the complex interdependencies between microbial taxa and overlook contextual metadata that can inform imputation. We introduce DepMicroDiff, a novel framework that combines diffusion-based generative modeling with a Dependency-Aware Transformer (DAT) to explicitly capture both mutual pairwise dependencies and autoregressive relationships. DepMicroDiff is further enhanced by VAE-based pretraining across diverse cancer datasets and conditioning on patient metadata encoded via a large language model (LLM). Experiments on TCGA microbiome datasets show that DepMicroDiff substantially outperforms state-of-the-art baselines, achieving higher Pearson correlation (up to 0.712), cosine similarity (up to 0.812), and lower RMSE and MAE across multiple cancer types, demonstrating its robustness and generalizability for microbiome imputation.

微生物数据分析对于了解宿主健康和疾病至关重要,然而,其固有的广度和噪音对精确估算构成重大挑战,阻碍生物标志发现等下游任务。现有的估算方法,包括最近的基于扩散的模型,往往不能捕捉微生物分类和可用作估算依据的忽略相关元数据之间的复杂相互依存关系。我们引入了DepMicroDiff,这是一个新颖的框架,它将基于传播的基因模型与依赖性软件变异器(DAT)相结合,以明确捕捉双向依赖性和自动反向关系。DepmicroDiff通过基于VAE的预培训,跨越多种癌症数据集和通过大型语言模型(LLMM)编码的病人元数据调节,得到了进一步的增强。对TCGA微生物数据集的实验表明,Dep MicroDiff大大超越了最新基线,实现了较高的Pearson相关关系(高达0.712 ) , cosine 相似性(高达0.812 ) , 低的RMSE 和 MAE 在多种癌症类型上具有稳健性和可变性。

Article 21

Title@2025-07-31 (4): A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms

Title: A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms

Eine tief lernfähige numerische Relativitätsüberlagerung für Binary Black Hole Waveforms

二进制黑洞波形的深学习动力数字相对相对性替代工具 2412.06946v3

Authors (9): Osvaldo Gramaxo Freitas, Anastasios Theodoropoulos, Nino Villanueva, Tiago Fernandes, Solange Nunes, José A. Font, Antonio Onofre, Alejandro Torres-Forné, José D. Martin-Guerrero

Gravitational-wave approximants are essential for gravitational-wave astronomy, allowing coverage of the binary black hole parameter space for inference or matched filtering without costly numerical relativity (NR) simulations, but generally trading some accuracy for computational efficiency. To reduce this trade-off, NR surrogate models can be constructed using interpolation within NR waveform space. We present a two-stage training approach for neural network-based NR surrogate models. Initially trained on approximant-generated waveforms and then fine-tuned with NR data, these dual-stage artificial neural surrogate (\texttt{DANSur}) models offer rapid and competitively accurate waveform generation, generating millions of waveforms in under 20 ms on a GPU while keeping mean mismatches with NR around $10^{-4}$. We implement them in the \textsc{bilby} framework and show that they can be used for parameter estimation tasks.

引力波近似波形对引力波天文学至关重要,它们无需昂贵的数值相对论(NR)模拟即可覆盖双黑洞参数空间,用于推断或匹配滤波,但通常以牺牲部分精度换取计算效率。为减小这种权衡,可以在NR波形空间内通过插值构建NR代理模型。我们为基于神经网络的NR代理模型提出一种两阶段训练方法。这些双阶段人工神经代理(\texttt{DANSur})模型先在近似波形生成的波形上训练,再用NR数据微调,能够快速生成精度具有竞争力的波形:在GPU上不到20毫秒即可生成数百万个波形,同时与NR的平均失配保持在 $10^{-4}$ 左右。这些模型已在 \textsc{bilby} 框架中实现,我们展示了它们可用于参数估计任务。

Article 22

Title@2025-07-31 (4): One-Step Flow Policy Mirror Descent

Title: One-Step Flow Policy Mirror Descent

Ein-Schritt-Fluss-Politik Spiegelabstieg

单步流动政策从属 2507.23675v1

Authors (5): Tianyi Chen, Haitong Ma, Na Li, Kai Wang, Bo Dai

Diffusion policies have achieved great success in online reinforcement learning (RL) due to their strong expressive capacity. However, the inference of diffusion policy models relies on a slow iterative sampling process, which limits their responsiveness. To overcome this limitation, we propose Flow Policy Mirror Descent (FPMD), an online RL algorithm that enables 1-step sampling during policy inference. Our approach exploits a theoretical connection between the distribution variance and the discretization error of single-step sampling in straight interpolation flow matching models, and requires no extra distillation or consistency training. We present two algorithm variants based on flow policy and MeanFlow policy parametrizations, respectively. Extensive empirical evaluations on MuJoCo benchmarks demonstrate that our algorithms show strong performance comparable to diffusion policy baselines while requiring hundreds of times fewer function evaluations during inference.

传播政策在网上强化学习(RL)方面取得了巨大成功,因为它们具有很强的表达能力。然而,传播政策模型的推论依赖于一个缓慢的迭代抽样过程,这限制了它们的反应能力。为了克服这一限制,我们提议了流动政策镜源(FPMD)(FPMD),这是一个在线RL算法,在政策推理期间可以进行一步抽样。我们的方法利用了分布差异和单步抽样在直线内插流量匹配模型中的分立错误之间的理论联系,不需要额外的蒸馏或一致性培训。我们提出了两个基于流动政策和中流政策参数的演算法变量。关于MuJoCo基准的广泛经验评估表明,我们的算法表现出与分散政策基线相当的强效,同时在推断期间需要数百倍的功能评估。
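The link between target variance and single-step error can be illustrated with a one-dimensional toy case. The sketch below is not the FPMD algorithm: it assumes a standard normal source, a Gaussian target, and the straight interpolation x_t = (1 - t) x0 + t x1 with x1 drawn independently of x0. Under those assumptions the marginal velocity at t = 0 is E[x1] - x0, so one Euler step of size 1 always lands on the target mean, while the exact flow endpoint is mean + std * x0. All function names are hypothetical.

```python
def initial_velocity(x0, target_mean):
    """Marginal flow-matching velocity at t = 0: E[x1 - x0 | x0] = mean - x0."""
    return target_mean - x0


def one_step_sample(x0, target_mean):
    """A single Euler step of size 1 from t = 0 to t = 1."""
    return x0 + initial_velocity(x0, target_mean)


def exact_flow_sample(x0, target_mean, target_std):
    """Exact ODE endpoint for a N(0, 1) source and a N(mean, std^2) target in 1D."""
    return target_mean + target_std * x0


def one_step_error(x0, target_mean, target_std):
    """Gap between the one-step sample and the exact flow endpoint."""
    exact = exact_flow_sample(x0, target_mean, target_std)
    return abs(exact - one_step_sample(x0, target_mean))
```

In this toy setting the error equals target_std * |x0|, so it shrinks to zero as the target distribution concentrates, which is the intuition behind sampling a low-variance policy in one step.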

Article 23

Title@2025-07-31 (4): TweakLLM: A Routing Architecture for Dynamic Tailoring of Cached Responses

Title: TweakLLM: A Routing Architecture for Dynamic Tailoring of Cached Responses

TweakLLM: Eine Routing-Architektur für dynamisches Tailoring von Cached Responses

TweakLLM:用于动态定制缓存响应的路由架构 2507.23674v1

Authors (6): Muhammad Taha Cheema, Abeer Aamir, Khawaja Gul Muhammad, Naveed Anwar Bhatti, Ihsan Ayyub Qazi, Zafar Ayyub Qazi

Large Language Models (LLMs) process millions of queries daily, making efficient response caching a compelling optimization for reducing cost and latency. However, preserving relevance to user queries using this approach proves difficult due to the personalized nature of chatbot interactions and the limited accuracy of semantic similarity search. To address this, we present TweakLLM, a novel routing architecture that employs a lightweight LLM to dynamically adapt cached responses to incoming prompts. Through comprehensive evaluation, including user studies with side-by-side comparisons, satisfaction voting, as well as multi-agent LLM debates, we demonstrate that TweakLLM maintains response quality comparable to frontier models while significantly improving cache effectiveness. Our results across real-world datasets highlight TweakLLM as a scalable, resource-efficient caching solution for high-volume LLM deployments without compromising user experience.

大型语言模型(LLM)每天处理数以百万计的询问,使高效的响应为降低成本和延时提供了令人信服的优化,但是,由于聊天机互动的个性性质和语义相似性搜索的准确性有限,很难保持与用户查询的相关性,但是,为了解决这个问题,我们提出了TweakLLM,这是一个新型的路线结构,它使用轻量级LM来动态地调整缓存的响应速度。通过全面评价,包括用户研究,同时进行平行比较、满意投票以及多剂LLM辩论,我们证明TweakLLM保持了与前沿模型相似的反应质量,同时大大提高了缓存效率。我们跨越现实世界数据集的结果突出表明TweakLLM是高容量LM部署的可扩展性、资源高效的缓冲解决方案,同时又不损害用户的经验。

Article 24

Title@2025-07-31 (4): SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation

Title: SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation

SAMSA: Segment alles Modell mit Spektralwinkeln für hyperspektrale interaktive medizinische Bildsegmentierung verbessert

SAMSA:用超光谱交互式医学图像截面光谱光谱角度增强的片段“任何东西”模型 2507.23673v1

Authors (5): Alfie Roddan, Tobias Czempiel, Chi Xu, Daniel S. Elson, Stamatia Giannarou

Hyperspectral imaging (HSI) provides rich spectral information for medical imaging, yet encounters significant challenges due to data limitations and hardware variations. We introduce SAMSA, a novel interactive segmentation framework that combines an RGB foundation model with spectral analysis. SAMSA efficiently utilizes user clicks to guide both RGB segmentation and spectral similarity computations. The method addresses key limitations in HSI segmentation through a unique spectral feature fusion strategy that operates independently of spectral band count and resolution. Performance evaluation on publicly available datasets has shown 81.0% 1-click and 93.4% 5-click DICE on a neurosurgical hyperspectral dataset, and 81.1% 1-click and 89.2% 5-click DICE on an intraoperative porcine hyperspectral dataset. Experimental results demonstrate SAMSA’s effectiveness in few-shot and zero-shot learning scenarios with minimal training examples. Our approach enables seamless integration of datasets with different spectral characteristics, providing a flexible framework for hyperspectral medical image analysis.

高光谱成像(HSI)为医学成像提供了丰富的光谱信息,但受数据有限和硬件差异的影响,面临重大挑战。我们提出SAMSA,一种将RGB基础模型与光谱分析相结合的新型交互式分割框架。SAMSA高效利用用户点击来同时引导RGB分割和光谱相似度计算。该方法通过一种不依赖光谱波段数量和分辨率的独特光谱特征融合策略,解决了HSI分割中的关键局限。在公开数据集上的性能评估显示:在神经外科高光谱数据集上,1次点击和5次点击的DICE分别为81.0%和93.4%;在术中猪高光谱数据集上分别为81.1%和89.2%。实验结果表明,SAMSA在少样本和零样本学习场景下、仅使用极少训练样本时依然有效。我们的方法能够无缝整合具有不同光谱特征的数据集,为高光谱医学图像分析提供了灵活的框架。

Article 25

Title@2025-07-31 (4): SHAP-Guided Regularization in Machine Learning Models

Title: SHAP-Guided Regularization in Machine Learning Models

SHAP-geführte Regularisierung in Machine Learning-Modellen

SHAP-指导的机械学习模式规范化 2507.23665v1

Authors (1): Amal Saadallah

Feature attribution methods such as SHapley Additive exPlanations (SHAP) have become instrumental in understanding machine learning models, but their role in guiding model optimization remains underexplored. In this paper, we propose a SHAP-guided regularization framework that incorporates feature importance constraints into model training to enhance both predictive performance and interpretability. Our approach applies entropy-based penalties to encourage sparse, concentrated feature attributions while promoting stability across samples. The framework is applicable to both regression and classification tasks. As a first exploration, we investigate the regularization of tree-based models using TreeSHAP. Through extensive experiments on benchmark regression and classification datasets, we demonstrate that our method improves generalization performance while ensuring robust and interpretable feature attributions. The proposed technique offers a novel, explainability-driven regularization approach, making machine learning models both more accurate and more reliable.

沙普利Additive Explanations(SHAP)等特征归属方法有助于了解机器学习模式,但其在指导模型优化方面的作用仍未得到充分探讨。在本文中,我们提议一个SHAP指导的规范化框架,将一些重要因素纳入示范培训,以提高预测性能和可解释性。我们的方法采用基于恒温的处罚办法来鼓励稀疏、集中的特征属性,同时促进各样本之间的稳定性。这个框架适用于回归和分类任务。我们的第一个探索始于利用TreaSHAP调查基于树的模型规范化模式。我们通过对回归和分类数据集的基准进行广泛的实验,证明我们的方法在确保稳健和可解释性特征属性的同时提高了一般化绩效。拟议的方法提供了一种新颖的、以解释性驱动的规范化方法,使机器学习模型更加准确和可靠。
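An entropy-based penalty on attributions can be sketched as follows. This is an illustrative assumption about the form of the penalty, not the paper's exact objective: each sample's absolute attributions are normalized into a distribution whose Shannon entropy is low when a few features dominate, and the mean entropy is added to the task loss with a hypothetical weight. The attribution values are taken as given (for example from TreeSHAP); the stability term mentioned in the abstract is not modeled.

```python
import math


def attribution_entropy(attributions, eps=1e-12):
    """Shannon entropy of the normalized absolute attributions of one sample."""
    magnitudes = [abs(a) for a in attributions]
    total = sum(magnitudes)
    if total <= eps:
        return 0.0  # no attribution mass, nothing to penalize
    probs = [m / total for m in magnitudes]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(task_loss, attributions_per_sample, weight):
    """Task loss plus `weight` times the mean attribution entropy over samples."""
    entropies = [attribution_entropy(a) for a in attributions_per_sample]
    return task_loss + weight * sum(entropies) / len(entropies)
```

A sample whose attribution mass sits on one feature contributes zero penalty, while a sample with uniform attributions over d features contributes log(d), so minimizing the penalty pushes toward sparse, concentrated attributions.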

Article 26

Title@2025-07-31 (4): Parallel Split Learning with Global Sampling

Title: Parallel Split Learning with Global Sampling

Paralleles Split-Lernen mit globaler Probenahme

与全球抽样平行拆分学习 2407.15738v4

Authors (4): Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush, Anke Schmeink

Distributed deep learning in resource-constrained environments faces scalability and generalization challenges due to large effective batch sizes and non-identically distributed client data. We introduce a server-driven sampling strategy that maintains a fixed global batch size by dynamically adjusting client-side batch sizes. This decouples the effective batch size from the number of participating devices and ensures that global batches better reflect the overall data distribution. Using standard concentration bounds, we establish tighter deviation guarantees compared to existing approaches. Empirical results on a benchmark dataset confirm that the proposed method improves model accuracy, training efficiency, and convergence stability, offering a scalable solution for learning at the network edge.

在资源受限制的环境中,分散的深层次学习面临可缩放和概括化的挑战,因为有大量有效的批量规模和未识别分布的客户数据。我们采用了由服务器驱动的抽样战略,通过动态调整客户端批量规模,保持固定的全球批量规模。这将有效的批量规模与参与设备的数量脱钩,确保全球批量更好地反映总体数据分布。我们使用标准集中界限,建立比现有方法更严格的偏离保证。基准数据集的经验结果证实,拟议方法提高了模型的准确性、培训效率和趋同稳定性,为网络边缘学习提供了可扩展的解决方案。
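One way a server could keep the global batch size fixed is to draw the global batch uniformly without replacement from the pooled sample indices and tell each client how many of its samples were drawn. The abstract does not spell out the sampling rule, so this is a sketch under that assumption; `allocate_client_batches` and its parameters are hypothetical names.

```python
import random


def allocate_client_batches(client_sizes, global_batch_size, rng):
    """Return per-client batch sizes for one global batch.

    The global batch is drawn uniformly without replacement from the pooled
    data, so the counts always sum to `global_batch_size` regardless of how
    many clients participate.
    """
    total = sum(client_sizes)
    if global_batch_size > total:
        raise ValueError("global batch exceeds the total number of samples")
    # One entry per pooled sample, labeled with the index of the owning client.
    owners = [cid for cid, size in enumerate(client_sizes) for _ in range(size)]
    counts = [0] * len(client_sizes)
    for owner in rng.sample(owners, global_batch_size):
        counts[owner] += 1
    return counts
```

For example, `allocate_client_batches([50, 5, 200], 64, random.Random(0))` returns three counts that sum to 64, with larger clients contributing proportionally more on average; adding a fourth client changes the split but not the total.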

Article 27

Title@2025-07-31 (4): How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Title: How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Wie kann ich meinen LLM-Benchmark veröffentlichen, ohne die wahren Antworten wegzugeben?

我怎样才能公布我的LLM基准而不给出正确的答案? 2505.18102v2

Authors (3): Takashi Ishida, Thanawat Lodkaew, Ikko Yamane

Publishing a large language model (LLM) benchmark on the Internet risks contaminating future LLMs: the benchmark may be unintentionally (or intentionally) used to train or select a model. A common mitigation is to keep the benchmark private and let participants submit their models or predictions to the organizers. However, this strategy requires trust in a single organization and still permits test-set overfitting through repeated queries. To overcome this issue, we propose a way to publish benchmarks without completely disclosing the ground-truth answers to the questions, while still maintaining the ability to openly evaluate LLMs. Our main idea is to inject randomness into the answers by preparing several logically correct answers and including only one of them as the solution in the benchmark. This reduces the best possible accuracy, i.e., Bayes accuracy, of the benchmark. Not only does this keep us from disclosing the ground truth, it also offers a test for detecting data contamination. In principle, even fully capable models should not surpass the Bayes accuracy. If a model surpasses this ceiling despite this expectation, this is a strong signal of data contamination. We present experimental evidence that our method can detect data contamination accurately on a wide range of benchmarks, models, and training methodologies.

在互联网上公布一个大型语言模型(LLM)基准可能会污染未来的LLM:基准可能是无意的(或有意的)用于培训或选择一个模型。一个共同的缓解措施是保持基准的隐私,让参与者向组织者提交模型或预测。然而,这一战略需要信任一个组织,并仍然允许通过反复询问过度测试。为了克服这一问题,我们建议一种公布基准的方法,而不完全披露对问题的地面真相答案,同时保持公开评估LLMS的能力。我们的主要想法是通过编制几个逻辑正确的答案来给答案注入随机性,并且只将其中之一作为基准的解决方案。这降低了基准的最佳准确性,即Bayes准确性。这不仅有助于我们不披露地面真相,而且这一方法也为探测数据污染提供了一个测试。原则上,即使完全有能力的模型也不应该超过Bayes的准确性。尽管有这一预期,但模型超过这一上限,这是数据污染的强烈信号。我们提出实验性证据,说明我们的方法能够准确检测数据污染的范围很广的基准、培训模型和方法。
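The accuracy ceiling and the contamination check can be made concrete with a small calculation. The sketch below assumes every question has the same number k of equally valid answers, one of which is chosen uniformly at random as the published solution, so an uncontaminated model matches it with probability at most 1/k. The binomial test and all function names are illustrative assumptions, not the authors' procedure.

```python
import math


def bayes_accuracy(num_valid_answers):
    """Best achievable expected accuracy when each question keeps one of its
    k valid answers uniformly at random."""
    return sum(1.0 / k for k in num_valid_answers) / len(num_valid_answers)


def binomial_tail(n, successes, p):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(successes, n + 1)
    )


def contamination_p_value(n_questions, n_correct, ceiling):
    """One-sided p-value for scoring at least `n_correct` out of `n_questions`
    if the per-question success probability were exactly `ceiling`."""
    return binomial_tail(n_questions, n_correct, ceiling)
```

With two valid answers per question the ceiling is 0.5, and a model that matches the published answer on 70 of 100 questions yields a very small p-value, which would be hard to explain without exposure to the released answers.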

Article 28

Title@2025-07-31 (4): Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage

Title: Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage

Kandinsky Conformal Prediction: Jenseits von Klassen- und Kovariate-Conditional Coverage

Kandinsky 共变预测:超越等级和共同 – – 有条件的覆盖范围 2502.17264v2

Authors (3): Konstantina Bairaktari, Jiayun Wu, Zhiwei Steven Wu

Conformal prediction is a powerful distribution-free framework for constructing prediction sets with coverage guarantees. Classical methods, such as split conformal prediction, provide marginal coverage, ensuring that the prediction set contains the label of a random test point with a target probability. However, these guarantees may not hold uniformly across different subpopulations, leading to disparities in coverage. Prior work has explored coverage guarantees conditioned on events related to the covariates and label of the test point. We present Kandinsky conformal prediction, a framework that significantly expands the scope of conditional coverage guarantees. In contrast to Mondrian conformal prediction, which restricts its coverage guarantees to disjoint groups – reminiscent of the rigid, structured grids of Piet Mondrian’s art – our framework flexibly handles overlapping and fractional group memberships defined jointly on covariates and labels, reflecting the layered, intersecting forms in Wassily Kandinsky’s compositions. Our algorithm unifies and extends existing methods, encompassing covariate-based group conditional, class conditional, and Mondrian conformal prediction as special cases, while achieving a minimax-optimal high-probability conditional coverage bound. Finally, we demonstrate the practicality of our approach through empirical evaluation on real-world datasets.

共形预测是一种强大的、不依赖分布假设的框架,用于构建具有覆盖率保证的预测集。经典方法(如分割共形预测)提供边际覆盖,确保预测集以目标概率包含随机测试点的标签。然而,这些保证在不同子群体之间未必一致成立,从而导致覆盖率差异。已有工作探索了以测试点的协变量和标签相关事件为条件的覆盖保证。我们提出Kandinsky共形预测,这一框架显著扩展了条件覆盖保证的适用范围。Mondrian共形预测将其覆盖保证限制在互不相交的分组上,让人联想到Piet Mondrian作品中刚性、结构化的网格;与之相对,我们的框架能够灵活处理在协变量和标签上联合定义的重叠分组与分数隶属关系,正如Wassily Kandinsky构图中层叠交错的形态。我们的算法统一并扩展了现有方法,将基于协变量的分组条件、类别条件和Mondrian共形预测作为特例纳入其中,同时达到极小极大最优的高概率条件覆盖界。最后,我们通过在真实数据集上的实证评估展示了该方法的实用性。
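For orientation, the two baselines that the abstract contrasts with can be written in a few lines. The sketch below shows the standard split conformal threshold and its Mondrian variant, which calibrates separately within disjoint groups; it does not implement the Kandinsky algorithm for overlapping or fractional memberships. Function names are hypothetical, and the nonconformity scores are assumed to be precomputed on a calibration set.

```python
import math
from collections import defaultdict


def conformal_threshold(scores, alpha):
    """Split conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score.

    Returns infinity when the calibration set is too small for the target level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def groupwise_thresholds(scores, groups, alpha):
    """Mondrian-style calibration: one threshold per disjoint group."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    return {g: conformal_threshold(s, alpha) for g, s in by_group.items()}
```

A test point's prediction set then contains every label whose score does not exceed the threshold of the group the point belongs to; using the class label as the group recovers class-conditional conformal prediction.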

Article 29

Title@2025-07-31 (4): OptiGradTrust: Byzantine-Robust Federated Learning with Multi-Feature Gradient Analysis and Reinforcement Learning-Based Trust Weighting

Title: OptiGradTrust: Byzantine-Robust Federated Learning with Multi-Feature Gradient Analysis and Reinforcement Learning-Based Trust Weighting

OptiGradTrust: Byzantinisch-Robust-Federiertes Lernen mit Multi-Feature Gradientenanalyse und Verstärkung Learning-Based Trust Gewichtung

OptiGradTrust:基于多特征梯度分析和强化学习信任加权的拜占庭鲁棒联邦学习 2507.23638v1

Authors (4): Mohammad Karami, Fatemeh Ghassemi, Hamed Kebriaei, Hamid Azadegan

Federated Learning (FL) enables collaborative model training across distributed medical institutions while preserving patient privacy, but remains vulnerable to Byzantine attacks and statistical heterogeneity. We present OptiGradTrust, a comprehensive defense framework that evaluates gradient updates through a novel six-dimensional fingerprint including VAE reconstruction error, cosine similarity metrics, $L_2$ norm, sign-consistency ratio, and Monte Carlo Shapley value, which drive a hybrid RL-attention module for adaptive trust scoring. To address convergence challenges under data heterogeneity, we develop FedBN-Prox (FedBN-P), combining Federated Batch Normalization with proximal regularization for optimal accuracy-convergence trade-offs. Extensive evaluation across MNIST, CIFAR-10, and Alzheimer’s MRI datasets under various Byzantine attack scenarios demonstrates significant improvements over state-of-the-art defenses, achieving up to +1.6 percentage points over FLGuard under non-IID conditions while maintaining robust performance against diverse attack patterns through our adaptive learning approach.

联邦学习(FL)使分布式医疗机构能够在保护患者隐私的同时协作训练模型,但仍易受拜占庭攻击和统计异质性的影响。我们提出OptiGradTrust,一个全面的防御框架,它通过一种新的六维指纹评估梯度更新,其中包括VAE重构误差、余弦相似度指标、$L_2$ 范数、符号一致率和蒙特卡洛Shapley值;这些特征驱动一个混合的强化学习-注意力模块进行自适应信任评分。为应对数据异质性下的收敛难题,我们开发了FedBN-Prox(FedBN-P),将联邦批归一化与近端正则化相结合,以取得最优的精度-收敛权衡。在多种拜占庭攻击场景下,对MNIST、CIFAR-10和阿尔茨海默病MRI数据集的广泛评估表明,该方法较最先进的防御有显著提升,在非IID条件下比FLGuard最多高出1.6个百分点,同时借助自适应学习方法对多种攻击模式保持稳健的性能。

Article 30

Title@2025-07-31 (4): On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Title: On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Zur Expressivität der Softmax-Attention: Eine Perspektive rekurrenter neuronaler Netze

论Softmax注意力的表达能力:循环神经网络视角 2507.23632v1

Authors (2): Gabriel Mongaras, Eric C. Larson

Since its introduction, softmax attention has become the backbone of modern transformer architectures due to its expressiveness and scalability across a wide range of tasks. However, the main drawback of softmax attention is its quadratic memory requirement and computational complexity with respect to the sequence length. By replacing the softmax nonlinearity, linear attention and similar methods have been introduced to avoid the quadratic bottleneck of softmax attention. Although these linear forms of attention are derived from the original softmax formulation, they typically lag in downstream accuracy. While intuition about applying the softmax nonlinearity to the query-key inner product suggests that it has desirable properties compared to other nonlinearities, the question of why this discrepancy exists remains unanswered. This work demonstrates that linear attention is an approximation of softmax attention by deriving the recurrent form of softmax attention. Using this form, each part of softmax attention can be described in the language of recurrent neural networks (RNNs). Describing softmax attention as an RNN allows for the ablation of the components of softmax attention to understand the importance of each part and how they interact. In this way, our work helps explain why softmax attention is more expressive than its counterparts.

自提出以来,softmax注意力凭借其表达能力以及在各类任务上的可扩展性,已成为现代Transformer架构的支柱。然而,softmax注意力的主要缺点是其内存需求和计算复杂度随序列长度呈二次增长。通过替换softmax非线性,线性注意力及类似方法得以提出,以避开softmax注意力的二次瓶颈。尽管这些线性形式的注意力源自最初的softmax形式,它们在下游任务精度上通常落后。直觉上,作用于查询与键内积的softmax非线性比其他非线性具有更理想的性质,但这种差距为何存在仍未得到解答。本文通过推导softmax注意力的循环形式,证明线性注意力是softmax注意力的一种近似。借助这一形式,softmax注意力的每个部分都可以用循环神经网络(RNN)的语言来描述。将softmax注意力表述为RNN后,便可对其各组成部分进行消融,从而理解每个部分的重要性及其相互作用。由此,我们的工作有助于解释为什么softmax注意力比其同类方法更具表达能力。
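One standard way to see linear attention as an approximation of softmax attention, which may differ in detail from the paper's derivation, is to truncate the Taylor series exp(x) = 1 + x + x^2/2 + ... after the linear term. The kernel 1 + q.k factorizes through the feature map phi(x) = [1, x], so causal attention with that kernel can be computed by an RNN-style state update. The sketch below uses scalar values, omits the usual 1/sqrt(d) scaling, and uses hypothetical function names.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kernel_attention(queries, keys, values, kernel):
    """Causal attention: position t averages values[0..t] with weights kernel(q_t, k_s)."""
    outputs = []
    for t, q in enumerate(queries):
        weights = [kernel(q, keys[s]) for s in range(t + 1)]
        weighted = sum(w * values[s] for s, w in enumerate(weights))
        outputs.append(weighted / sum(weights))
    return outputs


def softmax_attention(queries, keys, values):
    return kernel_attention(queries, keys, values, lambda q, k: math.exp(dot(q, k)))


def first_order_attention(queries, keys, values):
    return kernel_attention(queries, keys, values, lambda q, k: 1.0 + dot(q, k))


def first_order_attention_recurrent(queries, keys, values):
    """Same output as first_order_attention, computed with a fixed-size state."""
    dim = len(keys[0])
    state_num = [0.0] * (dim + 1)  # running sum of phi(k_s) * v_s
    state_den = [0.0] * (dim + 1)  # running sum of phi(k_s)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = [1.0] + list(k)
        state_num = [s + f * v for s, f in zip(state_num, phi_k)]
        state_den = [s + f for s, f in zip(state_den, phi_k)]
        phi_q = [1.0] + list(q)
        outputs.append(dot(phi_q, state_num) / dot(phi_q, state_den))
    return outputs
```

The recurrent version never revisits earlier keys, which is what removes the quadratic cost; the price is the dropped higher-order terms of the exponential, which only stay negligible while the query-key inner products are small.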

Article 31

Title@2025-07-31 (4): CS-SHRED: Enhancing SHRED for Robust Recovery of Spatiotemporal Dynamics

Title: CS-SHRED: Enhancing SHRED for Robust Recovery of Spatiotemporal Dynamics

CS-SHRED: Verbesserung von SHRED zur robusten Erholung der Spatiotemporalen Dynamik

CS-SHRED:加强光学时光动力的强劲恢复 2507.22303v2

Authors (4): Romulo B. da Silva, Diego Passos, Cássio M. Oishi, J. Nathan Kutz

We present CS-SHRED, a novel deep learning architecture that integrates Compressed Sensing (CS) into a Shallow Recurrent Decoder (SHRED) to reconstruct spatiotemporal dynamics from incomplete, compressed, or corrupted data. Our approach introduces two key innovations. First, by incorporating CS techniques into the SHRED architecture, our method leverages a batch-based forward framework with $\ell_1$ regularization to robustly recover signals even in scenarios with sparse sensor placements, noisy measurements, and incomplete sensor acquisitions. Second, an adaptive loss function dynamically combines Mean Squared Error (MSE) and Mean Absolute Error (MAE) terms with a piecewise Signal-to-Noise Ratio (SNR) regularization, which suppresses noise and outliers in low-SNR regions while preserving fine-scale features in high-SNR regions. We validate CS-SHRED on challenging problems including viscoelastic fluid flows, maximum specific humidity fields, sea surface temperature distributions, and rotating turbulent flows. Compared to the traditional SHRED approach, CS-SHRED achieves significantly higher reconstruction fidelity – as demonstrated by improved SSIM and PSNR values, lower normalized errors, and enhanced LPIPS scores – thereby providing superior preservation of small-scale structures and increased robustness against noise and outliers. Our results underscore the advantages of the jointly trained CS and SHRED design architecture, which includes an LSTM sequence model for characterizing the temporal evolution with a shallow decoder network (SDN) for modeling the high-dimensional state space. The SNR-guided adaptive loss function for the spatiotemporal data recovery establishes CS-SHRED as a promising tool for a wide range of applications in environmental, climatic, and scientific data analyses.

我们提出CS-SHRED,一种新的深度学习架构,它将压缩感知(CS)融入浅层循环解码器(SHRED),以便从不完整、经压缩或受损的数据中重建时空动力学。我们的方法引入了两项关键创新。第一,通过将CS技术纳入SHRED架构,该方法利用带 $\ell_1$ 正则化的批量前向框架,即使在传感器布设稀疏、测量含噪、传感器采集不完整的情形下也能稳健地恢复信号。第二,一个自适应损失函数将均方误差(MSE)和平均绝对误差(MAE)项与分段的信噪比(SNR)正则化动态结合,在低SNR区域抑制噪声和离群值,同时在高SNR区域保留精细尺度特征。我们在若干具有挑战性的问题上验证了CS-SHRED,包括粘弹性流体流动、最大比湿场、海表温度分布和旋转湍流。与传统SHRED方法相比,CS-SHRED的重建保真度显著更高,表现为SSIM和PSNR值提升、归一化误差降低以及LPIPS分数改善,从而更好地保留小尺度结构,并增强对噪声和离群值的鲁棒性。我们的结果凸显了联合训练的CS与SHRED架构的优势,该架构包含用于刻画时间演化的LSTM序列模型和用于建模高维状态空间的浅层解码器网络(SDN)。面向时空数据恢复的SNR引导自适应损失函数,使CS-SHRED有望成为环境、气候和科学数据分析等广泛应用中的有力工具。

Article 32

Title@2025-07-31 (4): DivControl: Knowledge Diversion for Controllable Image Generation

Title: DivControl: Knowledge Diversion for Controllable Image Generation

DivControl: Wissensdiversion für steuerbare Bilderzeugung

DivControl:面向可控图像生成的知识分流 2507.23620v1

Authors (6): Yucheng Xie, Fu Feng, Ruixiao Shi, Jing Wang, Yong Rui, Xin Geng

Diffusion models have advanced from text-to-image (T2I) to image-to-image (I2I) generation by incorporating structured inputs such as depth maps, enabling fine-grained spatial control. However, existing methods either train separate models for each condition or rely on unified architectures with entangled representations, resulting in poor generalization and high adaptation costs for novel conditions. To this end, we propose DivControl, a decomposable pretraining framework for unified controllable generation and efficient adaptation. DivControl factorizes ControlNet via SVD into basic components, pairs of singular vectors, which are disentangled into condition-agnostic learngenes and condition-specific tailors through knowledge diversion during multi-condition training. Knowledge diversion is implemented via a dynamic gate that performs soft routing over tailors based on the semantics of condition instructions, enabling zero-shot generalization and parameter-efficient adaptation to novel conditions. To further improve condition fidelity and training efficiency, we introduce a representation alignment loss that aligns condition embeddings with early diffusion features. Extensive experiments demonstrate that DivControl achieves state-of-the-art controllability with 36.4$\times$ less training cost, while simultaneously improving average performance on basic conditions. It also delivers strong zero-shot and few-shot performance on unseen conditions, demonstrating superior scalability, modularity, and transferability.

扩散模型通过引入深度图等结构化输入,已从文本到图像(T2I)生成发展到图像到图像(I2I)生成,实现了细粒度的空间控制。然而,现有方法要么为每种条件单独训练模型,要么依赖表示相互纠缠的统一架构,导致泛化能力差、适应新条件的成本高。为此,我们提出DivControl,一个可分解的预训练框架,用于统一的可控生成和高效适应。DivControl通过SVD将ControlNet分解为基本组件(成对的奇异向量),并在多条件训练中借助知识分流将其解耦为与条件无关的learngene和特定于条件的tailor。知识分流通过一个动态门实现,该门根据条件指令的语义对各tailor进行软路由,从而实现零样本泛化以及对新条件的参数高效适应。为进一步提升条件保真度和训练效率,我们引入一种表示对齐损失,使条件嵌入与早期扩散特征对齐。大量实验表明,DivControl在训练成本降低36.4倍的情况下实现了最先进的可控性,同时提升了在基本条件上的平均性能。它在未见条件上还表现出强劲的零样本和少样本性能,展现出更优的可扩展性、模块化和可迁移性。

Article 33

Title@2025-07-31 (4): L-GTA: Latent Generative Modeling for Time Series Augmentation

Title: L-GTA: Latent Generative Modeling for Time Series Augmentation

L-GTA: Latent Generative Modellierung für Zeitreihenvergrößerung

L-GTA: 时间序列递增原始生成模型 2507.23615v1

Authors (4): Luis Roque, Carlos Soares, Vitor Cerqueira, Luis Torgo

Data augmentation is gaining importance across various aspects of time series analysis, from forecasting to classification and anomaly detection tasks. We introduce the Latent Generative Transformer Augmentation (L-GTA) model, a generative approach using a transformer-based variational recurrent autoencoder. This model uses controlled transformations within the latent space of the model to generate new time series that preserve the intrinsic properties of the original dataset. L-GTA enables the application of diverse transformations, ranging from simple jittering to magnitude warping, and combining these basic transformations to generate more complex synthetic time series datasets. Our evaluation of several real-world datasets demonstrates the ability of L-GTA to produce more reliable, consistent, and controllable augmented data. This translates into significant improvements in predictive accuracy and similarity measures compared to direct transformation methods.

数据增强在时间序列分析的各个方面,从预测到分类和异常探测任务,都越来越重要。我们引入了“低变异变异变异增强”模型(L-GTA)模型,这是一种基因化方法,使用以变异器为基础的变异常态自动编码器。该模型使用模型潜在空间内的控制变异生成新的时间序列,以保存原始数据集的内在特性。L-GTA使得能够应用从简单的抖动到规模扭曲等多种变异,并结合这些基本变异,以生成更复杂的合成时间序列数据集。我们对若干真实世界数据集的评估表明,L-GTA能够生成更可靠、一致和可控的扩大数据。这与直接变异方法相比,在预测准确性和类似性衡量方法方面大有改进。

Article 34

Title@2025-07-31 (4): MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Title: MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

MaxInfoRL: Förderung der Exploration im Verstärkungslernen durch Informationsgewinnmaximierung

MaxInfoRL:促进探索,通过信息获取最大化加强学习 2412.12098v2

Authors (5): Bhavya Sukhija, Stelian Coros, Andreas Krause, Pieter Abbeel, Carmelo Sferrazza

Reinforcement learning (RL) algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. Most common RL algorithms use undirected exploration, i.e., select random sequences of actions. Exploration can also be directed using intrinsic rewards, such as curiosity or model epistemic uncertainty. However, effectively balancing task and intrinsic rewards is challenging and often task-dependent. In this work, we introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. MaxInfoRL steers exploration towards informative transitions, by maximizing intrinsic rewards such as the information gain about the underlying task. When combined with Boltzmann exploration, this approach naturally trades off maximization of the value function with that of the entropy over states, rewards, and actions. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits. We then apply this general formulation to a variety of off-policy model-free RL methods for continuous state-action spaces, yielding novel algorithms that achieve superior performance across hard exploration problems and complex scenarios such as visual control tasks.

强化学习(RL)算法旨在平衡当前最佳战略的利用与探索可能导致更高回报的新选项之间的平衡。大多数常见的RL算法使用非定向勘探,即选择随机的行动序列。探索也可以使用内在回报(如好奇心或模型的认知不确定性)来引导。然而,有效平衡任务和内在回报是具有挑战性的,而且往往取决于任务。在这项工作中,我们引入了一个框架,即MaxInforRL,以平衡内在和外部探索。MaxInforRL引导探索走向信息化转型,通过最大限度地增加内在回报,如基本任务的信息收益。在与博尔茨曼探索相结合时,这一方法自然地将价值功能的最大化与对状态、奖赏和行动等诱变作用的最大化进行交易。我们表明,我们的方法在简化多武装匪徒的设置过程中实现了亚线性遗憾。我们随后将这一通用的表述用于一系列不受政策约束的示范RL方法,用于持续的州-行动空间,产生新的算法,从而在硬性探索问题和视觉控制任务等复杂情景中实现优绩。

Article 35

Title@2025-07-31 (4): Consistent Point Matching

Title: Consistent Point Matching

Konsistente Punktgleichung

统一点匹配 2507.23609v1

Authors (2): Halid Ziya Yerebakan, Gerardo Hermosillo Valadez

This study demonstrates that incorporating a consistency heuristic into the point-matching algorithm \cite{yerebakan2023hierarchical} improves robustness in matching anatomical locations across pairs of medical images. We validated our approach on diverse longitudinal internal and public datasets spanning CT and MRI modalities. Notably, it surpasses state-of-the-art results on the Deep Lesion Tracking dataset. Additionally, we show that the method effectively addresses landmark localization. The algorithm operates efficiently on standard CPU hardware and allows configurable trade-offs between speed and robustness. The method enables high-precision navigation between medical images without requiring a machine learning model or training data.

这项研究表明,在点对称算法\cite{yerebakan2023hiarartic}中纳入一致性超值法可以提高对相对医学图像的解剖位置进行匹配的稳健性。我们验证了我们关于跨CT和MRI模式的不同纵向内部和公共数据集的方法。值得注意的是,它超过了深 Lesion跟踪数据集的最新结果。此外,我们表明,该方法有效地解决了里程碑式的定位问题。该算法在标准的 CPU 硬件上运作高效,允许在速度和稳健性之间进行可配置的权衡。该方法可以在不需要机器学习模型或培训数据的情况下对医疗图像进行高精度导航。

Article 36

Title@2025-07-31 (4): Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates

Title: Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates

Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Assessments

具有不确定性估计值的临床试验的深入学习预测 2507.23607v1

Authors (4): Tien Huu Do, Antoine Masquelier, Nae Eoun Lee, Jonathan Crowther

Clinical trials are a systematic endeavor to assess the safety and efficacy of new drugs or treatments. Conducting such trials typically demands significant financial investment and meticulous planning, highlighting the need for accurate predictions of trial outcomes. Accurately predicting patient enrollment, a key factor in trial success, is one of the primary challenges during the planning phase. In this work, we propose a novel deep learning-based method to address this critical challenge. Our method, implemented as a neural network model, leverages pre-trained language models (PLMs) to capture the complexities and nuances of clinical documents, transforming them into expressive representations. These representations are then combined with encoded tabular features via an attention mechanism. To account for uncertainties in enrollment prediction, we enhance the model with a probabilistic layer based on the Gamma distribution, which enables range estimation. We apply the proposed model to predict clinical trial duration, assuming site-level enrollment follows a Poisson-Gamma process. We carry out extensive experiments on real-world clinical trial data, and show that the proposed method can effectively predict the number of patients enrolled at a number of sites for a given clinical trial, outperforming established baseline models.

临床试验是评估新药物或新疗法的安全和效能的系统性努力。进行这种试验通常需要大量的资金投资和仔细规划,强调准确预测试验结果的必要性。准确预测病人入学是试验成功的一个关键因素,这是规划阶段的主要挑战之一。在这项工作中,我们提出一种新的深层次的基于学习的方法来应对这一重大挑战。我们作为一种神经网络模型采用的方法,利用预先训练的语言模型(PLM)来捕捉临床文件的复杂性和细微差别,将其转化为直观的表述。然后,这些演示与通过关注机制编码的表格特征相结合。为了说明入学预测的不确定性,我们根据伽马分布,用概率层加强模型,从而能够进行范围估计。我们采用拟议的模型来预测临床试验期限,假设现场一级的招生遵循Poisson-Gamma进程。我们对现实世界临床试验数据进行了广泛的实验,并表明拟议的方法可以有效地预测在一定的临床试验地点注册的病人人数,超过既定基线模型。
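The Poisson-Gamma assumption mentioned in the abstract can be illustrated with a small sketch. It assumes a site's enrollment rate follows a Gamma distribution with a given shape and rate and that the patient count over a period is Poisson given that rate, so the marginal count is negative binomial. The function names and the numeric values are hypothetical and are not taken from the paper, which predicts the Gamma parameters with a neural network.

```python
import random


def enrollment_moments(shape, rate, months):
    """Mean and variance of the patient count at one site.

    Assumes lambda ~ Gamma(shape, rate) patients per month and
    count | lambda ~ Poisson(lambda * months).
    """
    mean = shape * months / rate
    # Poisson noise plus the spread of the Gamma-distributed rate.
    variance = mean * (1.0 + months / rate)
    return mean, variance


def rate_interval(shape, rate, level=0.9, draws=20000, seed=0):
    """Monte Carlo central interval for the enrollment rate lambda."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    lo = samples[int((1.0 - level) / 2.0 * draws)]
    hi = samples[int((1.0 + level) / 2.0 * draws) - 1]
    return lo, hi


mean, var = enrollment_moments(shape=4.0, rate=2.0, months=6.0)
low, high = rate_interval(shape=4.0, rate=2.0)
print(f"expected patients: {mean}, variance: {var}")
print(f"90% interval for the monthly rate: [{low:.2f}, {high:.2f}]")
```

The interval on the rate is what makes range estimation possible: a point prediction corresponds to the mean, while the Gamma spread gives the uncertainty band.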

Article 37

Title@2025-07-31 (4): Hierarchical Message-Passing Policies for Multi-Agent Reinforcement Learning

Title: Hierarchical Message-Passing Policies for Multi-Agent Reinforcement Learning

Hierarchische Message-Passing-Politiken für das Mehr-Agenten-Verstärkungs-Lernen

促进多机构强化学习的等级信息传递政策 2507.23604v1

Authors (3): Tommaso Marzi, Cesare Alippi, Andrea Cini

Decentralized Multi-Agent Reinforcement Learning (MARL) methods allow for learning scalable multi-agent policies, but suffer from partial observability and induced non-stationarity. These challenges can be addressed by introducing mechanisms that facilitate coordination and high-level planning. Specifically, coordination and temporal abstraction can be achieved through communication (e.g., message passing) and Hierarchical Reinforcement Learning (HRL) approaches to decision-making. However, optimization issues limit the applicability of hierarchical policies to multi-agent systems. As such, the combination of these approaches has not been fully explored. To fill this void, we propose a novel and effective methodology for learning multi-agent hierarchies of message-passing policies. We adopt the feudal HRL framework and rely on a hierarchical graph structure for planning and coordination among agents. Agents at lower levels in the hierarchy receive goals from the upper levels and exchange messages with neighboring agents at the same level. To learn hierarchical multi-agent policies, we design a novel reward-assignment method based on training the lower-level policies to maximize the advantage function associated with the upper levels. Results on relevant benchmarks show that our method performs favorably compared to the state of the art.

分散式多机构强化学习(MARL)方法有助于学习可扩缩多机构政策,但受到部分可观察性和诱发的非常态性的影响。这些挑战可以通过采用便利协调和高层规划的机制加以解决。具体地说,协调和时间抽象可以通过沟通(例如传递信息)和等级式强化学习(HRL)方法实现。不过,优化问题限制了等级政策对多机构系统的适用性。因此,这些方法的结合尚未得到充分探讨。为填补这一空白,我们提出了一种创新和有效的方法,用于学习多机构的信息传递政策等级。我们采用了封建式的HRL框架,并依靠一个等级图结构来规划和协调代理人。下层的代理人从高层获得目标并与同一级别的邻接者交流信息。为了学习等级级多机构政策,我们设计了一种新的奖励分配方法,其基础是培训较低级别的政策,以最大限度地发挥与上层相关的优势功能。关于相关基准的结果显示,我们的方法优于国家艺术的优势。

Article 38

Title@2025-07-31 (4): EB-gMCR: Energy-Based Generative Modeling for Signal Unmixing and Multivariate Curve Resolution

Title: EB-gMCR: Energy-Based Generative Modeling for Signal Unmixing and Multivariate Curve Resolution

EB-gMCR: Energiebasierte Generative Modellierung für Signalunmixing und Multivariate Kurvenauflösung

EB-gMCR: 以能源为基础的信号融合和多变量曲线分辨率生成模型 2507.23600v1

Authors (2): Yu-Tang Chang, Shih-Fang Chen

Signal unmixing analysis decomposes data into basic patterns and is widely applied in chemical and biological research. Multivariate curve resolution (MCR), a branch of signal unmixing, separates mixed chemical signals into base patterns (components) and their concentrations, playing a key role in understanding composition. Classical MCR is typically framed as matrix factorization (MF) and requires a user-specified component count, which is usually unknown in real data. As dataset size or component count increases, the scalability and reliability of MF-based MCR face significant challenges. This study reformulates MCR as a generative process (gMCR) and introduces an energy-based deep learning solver, EB-gMCR, that automatically discovers the smallest component set able to reconstruct the data faithfully. EB-gMCR starts from a large candidate pool (e.g., 1024 spectra) and employs a differentiable gating network to retain only active components while estimating their concentrations. On noisy synthetic datasets containing up to 256 latent sources, EB-gMCR maintained $R^2 \geq 0.98$ and recovered the component count within 5% of the ground truth; at lower noise it achieved $R^2 \geq 0.99$ with near-exact component estimation. Additional chemical priors, such as non-negativity or nonlinear mixing, enter as simple plug-in functions, enabling adaptation to other instruments or domains without altering the core learning process. By uniting high-capacity generative modeling and hard component selection, EB-gMCR offers a practical route to large-scale signal unmixing analysis, including chemical library-driven scenarios. The source code is available at https://github.com/b05611038/ebgmcr_solver.

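信号解混分析将数据分解为基本模式,广泛应用于化学和生物学研究。多元曲线分辨(MCR)是信号解混的一个分支,它将混合的化学信号分离为基础模式(组分)及其浓度,在理解物质组成方面发挥关键作用。经典MCR通常被表述为矩阵分解(MF),需要用户指定组分数量,而这一数量在真实数据中通常未知。随着数据集规模或组分数量的增加,基于MF的MCR在可扩展性和可靠性方面面临重大挑战。本研究将MCR重新表述为一个生成过程(gMCR),并引入基于能量的深度学习求解器EB-gMCR,它能自动发现可忠实重构数据的最小组分集合。EB-gMCR从一个大型候选池(例如1024条光谱)出发,利用可微门控网络仅保留活跃组分,同时估计其浓度。在包含多达256个潜在源的含噪合成数据集上,EB-gMCR保持 $R^2 \geq 0.98$,并将组分数量恢复到与真实值相差5%以内;在较低噪声下,它达到 $R^2 \geq 0.99$,组分估计接近精确。非负性或非线性混合等额外的化学先验可作为简单的插件函数加入,从而在不改变核心学习过程的情况下适配其他仪器或领域。通过结合高容量生成建模与硬性组分选择,EB-gMCR为大规模信号解混分析(包括化学库驱动的场景)提供了一条实用途径。源代码见 https://github.com/b05611038/ebgmcr_solver。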

Article 39

Title@2025-07-31 (4): Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots

Title: Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots

Geteilte Aufmerksamkeit: Unüberwachte Multi-Objekt-Entdeckung mit kontextuell getrennten Slots

分散注意: 未监督的多对象发现, 带有上下文分隔的空格 2304.01430v3

Authors (5): Dong Lao, Zhengyang Hu, Francesco Locatello, Yanchao Yang, Stefano Soatto

We investigate the emergence of objects in visual perception in the absence of any semantic annotation. The resulting model has received no supervision, does not use any pre-trained features, and yet it can segment the domain of an image into multiple independently moving regions. The resulting motion segmentation method can handle an unknown and varying number of objects in real-time. The core multi-modal conditional encoder-decoder architecture has one modality (optical flow) feed the encoder to produce a collection of latent codes (slots), and the other modality (color image) conditions the decoder to generate the first modality (flow) from the slots. The training criterion is designed to foster ‘information separation’ among the slots, while the architecture explicitly allocates activations to individual slots, leading to a method we call Divided Attention (DivA). At test time, DivA handles a different number of objects and different image resolution than seen at training, and is invariant to permutations of the slots. DivA achieves state-of-the-art performance while tripling the runtime speed of comparable methods, up to 104 FPS, and reduces the performance gap from supervised methods to 12% or less. Objects bootstrapped by DivA can then be used to prime static classifiers via contrastive learning. On fewer than 5,000 video clips, training DINO on DivA’s object proposals narrows the performance gap to ImageNet-based training by up to 30.2% compared to training directly on the video frames.

我们研究在没有任何语义标注的情况下,物体如何在视觉感知中涌现。所得模型没有接受任何监督,也不使用任何预训练特征,却能将图像域分割为多个独立运动的区域。由此得到的运动分割方法可以实时处理数量未知且不断变化的物体。其核心是一个多模态条件编码器-解码器架构:一种模态(光流)输入编码器以生成一组潜在编码(槽),另一种模态(彩色图像)作为解码器的条件,用于从槽中生成第一种模态(光流)。训练准则旨在促进各槽之间的“信息分离”,而架构则显式地将激活分配给各个槽,由此得到的方法我们称为分离注意力(Divided Attention,DivA)。在测试时,DivA可以处理与训练时不同的物体数量和图像分辨率,并且对槽的排列具有不变性。DivA达到了最先进的性能,同时将同类方法的运行速度提高到三倍,最高可达104 FPS,并将与有监督方法之间的性能差距缩小到12%或更低。由DivA自举得到的物体随后可通过对比学习用于引导静态分类器。在不足5,000个视频片段上,使用DivA的物体候选训练DINO,与直接在视频帧上训练相比,可将其与基于ImageNet的训练之间的性能差距最多缩小30.2%。

Article 40

Title@2025-07-31 (4): SinBasis Networks: Matrix-Equivalent Feature Extraction for Wave-Like Optical Spectrograms

Title: SinBasis Networks: Matrix-Equivalent Feature Extraction for Wave-Like Optical Spectrograms

SinBasis Networks: Matrix-äquivalente Feature-Extraktion für wellenähnliche optische Spektrogramme

Sinbasis 网络: 类似光谱仪波的矩阵等效特征提取 2505.06275v2

Authors (4): Yuzhou Zhu, Zheng Zhang, Ruyi Zhang, Liang Zhou

Wave-like images (from attosecond streaking spectrograms to optical spectra, audio mel-spectrograms and periodic video frames) encode critical harmonic structures that elude conventional feature extractors. We propose a unified, matrix-equivalent framework that reinterprets convolution and attention as linear transforms on flattened inputs, revealing filter weights as basis vectors spanning latent feature subspaces. To infuse spectral priors, we apply elementwise $\sin(\cdot)$ mappings to each weight matrix. Embedding these transforms into CNN, ViT and Capsule architectures yields Sin-Basis Networks with heightened sensitivity to periodic motifs and built-in invariance to spatial shifts. Experiments on a diverse collection of wave-like image datasets (including 80,000 synthetic attosecond streaking spectrograms, thousands of Raman, photoluminescence and FTIR spectra, mel-spectrograms from AudioSet and cycle-pattern frames from Kinetics) demonstrate substantial gains in reconstruction accuracy, translational robustness and zero-shot cross-domain transfer. Theoretical analysis via matrix isomorphism and Mercer-kernel truncation quantifies how sinusoidal reparametrization enriches expressivity while preserving stability in data-scarce regimes. Sin-Basis Networks thus offer a lightweight, physics-informed approach to deep learning across all wave-form imaging modalities.

类波图像(从阿秒条纹光谱图到光学光谱、音频梅尔频谱图和周期性视频帧)编码了常规特征提取器难以捕捉的关键谐波结构。我们提出一个统一的矩阵等价框架,将卷积和注意力重新解释为作用于展平输入的线性变换,从而揭示滤波器权重即是张成潜在特征子空间的基向量。为了注入频谱先验,我们对每个权重矩阵逐元素施加 $\sin(\cdot)$ 映射。将这些变换嵌入CNN、ViT和Capsule架构,即得到Sin-Basis网络,它对周期性模式更为敏感,并内置对空间平移的不变性。在多种类波图像数据集上的实验(包括80,000张合成阿秒条纹光谱图,数千条拉曼、光致发光和FTIR光谱,来自AudioSet的梅尔频谱图,以及来自Kinetics的周期模式帧)表明,该方法在重建精度、平移鲁棒性和零样本跨域迁移方面均有显著提升。基于矩阵同构和Mercer核截断的理论分析量化了正弦重参数化如何在数据稀缺的情形下增强表达能力并保持稳定性。因此,Sin-Basis网络为所有波形成像模态的深度学习提供了一种轻量级、物理启发的方法。

Article 41

Title@2025-07-31 (4): Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding

Title: Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding

Where Paths Collide: Eine umfassende Untersuchung der klassischen und lernbasierten multi-agenten Pathfinding

路径相撞之处:对经典和以学习为基础的多方代理调查的全面调查 2505.19219v2

Authors (7): Shiyue Wang, Haozheng Xu, Yuhan Zhang, Jingran Lin, Changhong Lu, Xiangfeng Wang, Wenhao Li

Multi-Agent Path Finding (MAPF) is a fundamental problem in artificial intelligence and robotics, requiring the computation of collision-free paths for multiple agents navigating from their start locations to designated goals. As autonomous systems become increasingly prevalent in warehouses, urban transportation, and other complex environments, MAPF has evolved from a theoretical challenge to a critical enabler of real-world multi-robot coordination. This comprehensive survey bridges the long-standing divide between classical algorithmic approaches and emerging learning-based methods in MAPF research. We present a unified framework that encompasses search-based methods (including Conflict-Based Search, Priority-Based Search, and Large Neighborhood Search), compilation-based approaches (SAT, SMT, CSP, ASP, and MIP formulations), and data-driven techniques (reinforcement learning, supervised learning, and hybrid strategies). Through systematic analysis of experimental practices across 200+ papers, we uncover significant disparities in evaluation methodologies, with classical methods typically tested on larger-scale instances (up to 200 by 200 grids with 1000+ agents) compared to learning-based approaches (predominantly 10-100 agents). We provide a comprehensive taxonomy of evaluation metrics, environment types, and baseline selections, highlighting the need for standardized benchmarking protocols. Finally, we outline promising future directions including mixed-motive MAPF with game-theoretic considerations, language-grounded planning with large language models, and neural solver architectures that combine the rigor of classical methods with the flexibility of deep learning. This survey serves as both a comprehensive reference for researchers and a practical guide for deploying MAPF solutions in increasingly complex real-world applications.

多智能体路径规划(MAPF)是人工智能和机器人学中的一个基本问题,需要为多个智能体计算从起始位置到指定目标的无碰撞路径。随着自主系统在仓库、城市交通和其他复杂环境中日益普及,MAPF已从理论难题演变为现实世界多机器人协调的关键支撑。这篇综述弥合了MAPF研究中经典算法方法与新兴的基于学习的方法之间长期存在的鸿沟。我们提出一个统一框架,涵盖基于搜索的方法(包括基于冲突的搜索、基于优先级的搜索和大邻域搜索)、基于编译的方法(SAT、SMT、CSP、ASP和MIP建模)以及数据驱动技术(强化学习、监督学习和混合策略)。通过系统分析200多篇论文中的实验做法,我们发现评估方法存在显著差异:经典方法通常在更大规模的实例上测试(最大为200×200网格、1000个以上智能体),而基于学习的方法主要使用10至100个智能体。我们给出了评估指标、环境类型和基线选择的完整分类,并强调需要标准化的基准测试协议。最后,我们概述了有前景的未来方向,包括结合博弈论考量的混合动机MAPF、基于大语言模型的语言落地规划,以及将经典方法的严谨性与深度学习的灵活性相结合的神经求解器架构。本综述既可作为研究人员的全面参考,也可作为在日益复杂的现实应用中部署MAPF方案的实用指南。
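The notion of "collision-free paths" in the abstract can be made concrete with a small checker. This is a minimal sketch under the usual grid-world conventions, which are assumptions here rather than details from the survey: time is discrete, an agent waits at its goal after arriving, and a conflict is either two agents in the same cell at the same step (vertex conflict) or two agents swapping cells in one step (edge conflict). The function names are hypothetical.

```python
from itertools import combinations


def position(path, t):
    """Cell occupied at step t; an agent waits at its goal once it arrives."""
    return path[min(t, len(path) - 1)]


def first_conflict(paths):
    """Return (kind, agent_a, agent_b, t) for the earliest conflict, else None."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for a, b in combinations(range(len(paths)), 2):
            if position(paths[a], t) == position(paths[b], t):
                return ("vertex", a, b, t)
            # Swap conflict: the two agents exchange cells between t and t + 1.
            if (position(paths[a], t) == position(paths[b], t + 1)
                    and position(paths[a], t + 1) == position(paths[b], t)):
                return ("edge", a, b, t)
    return None


swap = [[(0, 0), (0, 1)], [(0, 1), (0, 0)]]
safe = [[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)]]
print(first_conflict(swap))
print(first_conflict(safe))
```

Search-based solvers such as Conflict-Based Search repeatedly run a check of this kind and add constraints for the conflict they find; learning-based methods are typically evaluated against the same criterion.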

Article 42

Title@2025-07-31 (4): GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning

Title: GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning

GraphRAG-R1: Graph Retrieval-Augmented Generation mit prozessabhängigem Verstärkungslernen

GraphRAG-R1:基于过程约束强化学习的图检索增强生成 2507.23581v1

Authors (11): Chuanyue Yu, Kuo Zhao, Yuhan Li, Heng Chang, Mingjian Feng, Xiangzhe Jiang, Yufei Sun, Jia Li, Yuzhi Zhang, Jianxin Li, Ziwei Zhang

Graph Retrieval-Augmented Generation (GraphRAG) has shown great effectiveness in enhancing the reasoning abilities of LLMs by leveraging graph structures for knowledge representation and modeling complex real-world relationships. However, existing GraphRAG methods still face significant bottlenecks when handling complex problems that require multi-hop reasoning, as their query and retrieval phases are largely based on pre-defined heuristics and do not fully utilize the reasoning potential of LLMs. To address this problem, we propose GraphRAG-R1, an adaptive GraphRAG framework that trains LLMs with process-constrained outcome-based reinforcement learning (RL) to enhance their multi-hop reasoning ability. Our method can decompose complex problems, autonomously invoke retrieval tools to acquire necessary information, and perform effective reasoning. Specifically, we utilize a modified version of Group Relative Policy Optimization (GRPO) that supports rollout-with-thinking capability. Next, we design two process-constrained reward functions. To handle the shallow retrieval problem, we design a Progressive Retrieval Attenuation (PRA) reward to encourage essential retrievals. Then, to handle the over-thinking problem, we design a Cost-Aware F1 (CAF) reward to balance the model performance with computational costs. We further design a phase-dependent training strategy containing three training stages that correspond to cold start and these two rewards. Lastly, our method adopts a hybrid graph-textual retrieval to improve the reasoning capacity. Extensive experimental results demonstrate that GraphRAG-R1 boosts LLM capabilities in solving complex reasoning problems compared to state-of-the-art GraphRAG methods on both in-domain and out-of-domain datasets. Furthermore, our framework can be flexibly integrated with various existing retrieval methods, consistently delivering performance improvements.

图检索增强生成(GraphRAG)利用图结构进行知识表示并建模复杂的现实世界关系,在增强LLM推理能力方面表现出很好的效果。然而,现有GraphRAG方法在处理需要多跳推理的复杂问题时仍面临明显瓶颈,因为其查询和检索阶段主要依赖预定义的启发式规则,未能充分利用LLM的推理潜力。为解决这一问题,我们提出GraphRAG-R1,这是一个自适应GraphRAG框架,通过过程约束的、基于结果的强化学习(RL)训练LLM,以增强多跳推理能力。我们的方法能够分解复杂问题,自主调用检索工具获取必要信息,并进行有效推理。具体而言,我们使用改进版的组相对策略优化(GRPO),使其支持带思考的rollout。接着,我们设计了两个过程约束的奖励函数:为处理浅层检索问题,设计渐进式检索衰减(PRA)奖励以鼓励必要的检索;为处理过度思考问题,设计成本感知F1(CAF)奖励以平衡模型性能与计算成本。我们还设计了一种分阶段的训练策略,包含分别对应冷启动和上述两种奖励的三个训练阶段。最后,我们的方法采用图与文本混合检索来提升推理能力。大量实验结果表明,与最先进的GraphRAG方法相比,GraphRAG-R1在域内和域外数据集上都提升了LLM解决复杂推理问题的能力。此外,我们的框架可以灵活地与多种现有检索方法结合,并持续带来性能提升。

Article 43

Title@2025-07-31 (4): Neutral Residues: Revisiting Adapters for Model Extension

Title: Neutral Residues: Revisiting Adapters for Model Extension

Neutrale Rückstände: Adapter zur Modellerweiterung

中立残留物:重新审视适应器,用于示范推广 2410.02744v3

Authors (3): Franck Signe Talla, Edouard Grave, Hervé Jégou

We address the problem of extending a pretrained large language model to a new domain that was not seen during training. Standard techniques, such as finetuning or low-rank adaptation (LoRA), are successful at domain adaptation, but do not formally add capacity to the model. This often leads to a trade-off between performing well on the new domain and degrading performance on the original domain. Here, we revisit and improve adapters to extend LLMs from three angles: data, architecture and training procedure, which are advantageously considered jointly. The resulting method, called neutral residues, modifies adapters in a way that leads each new residual block to output near-zeros on the original domain. This solution leads to strong results when adapting a state-of-the-art model originally trained on English to a new language. Neutral residues significantly outperform competing approaches such as finetuning, LoRA or vanilla adapters in terms of the trade-off between learning the new language and not forgetting English.

我们处理将预先培训的大型语言模式扩大到培训期间没有看到的新领域的问题。标准技术,如微调或低级别适应(LORA)在领域适应方面是成功的,但并不正式增加模型能力。这往往导致在新领域表现良好与原始领域表现有辱人格之间取舍。在这里,我们重新审视和改进适应器,将LLMS从三个角度扩大:数据、结构和培训程序,这三者是共同考虑的优势。由此产生的方法,称为中性残留物,将适应器改变为导致每个新的剩余块到原始领域接近零的输出。当将最初接受英语培训的先进模型改造成新语言时,这一解决方案将带来强有力的结果。中性残留物在学习新语言和不忘英语之间的交易中,大大超越了微调、LORA或香草适应器等相互竞争的方法。

Article 44

Title@2025-07-31 (4): Optimised Feature Subset Selection via Simulated Annealing

Title: Optimised Feature Subset Selection via Simulated Annealing

Optimierte Feature-Subset-Auswahl über Simuliertes Annealing

通过模拟退火优化特征子集选择 2507.23568v1

Authors (5): Fernando Martínez-García, Álvaro Rubio-García, Samuel Fernández-Lorenzo, Juan José García-Ripoll, Diego Porras

We introduce SA-FDR, a novel algorithm for $\ell_0$-norm feature selection that considers this task as a combinatorial optimisation problem and solves it by using simulated annealing to perform a global search over the space of feature subsets. The optimisation is guided by the Fisher discriminant ratio, which we use as a computationally efficient proxy for model quality in classification tasks. Our experiments, conducted on datasets with up to hundreds of thousands of samples and hundreds of features, demonstrate that SA-FDR consistently selects more compact feature subsets while achieving a high predictive accuracy. This ability to recover informative yet minimal sets of features stems from its capacity to capture inter-feature dependencies often missed by greedy optimisation approaches. As a result, SA-FDR provides a flexible and effective solution for designing interpretable models in high-dimensional settings, particularly when model sparsity, interpretability, and performance are crucial.

我们提出SA-FDR,这是一种用于 $\ell_0$ 范数特征选择的新算法,它将该任务视为组合优化问题,并利用模拟退火在特征子集空间上进行全局搜索来求解。优化过程以Fisher判别比为指导,我们将其作为分类任务中模型质量的一种计算高效的代理指标。我们在多达数十万个样本、数百个特征的数据集上进行的实验表明,SA-FDR始终能选出更紧凑的特征子集,同时达到较高的预测精度。这种恢复信息量大而规模最小的特征集的能力,源于它能够捕捉贪心优化方法常常遗漏的特征间依赖关系。因此,SA-FDR为在高维场景中设计可解释模型提供了一种灵活而有效的方案,尤其适用于模型稀疏性、可解释性和性能都至关重要的情形。
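The search loop described in the abstract can be sketched as simulated annealing over fixed-size feature subsets. This is an illustrative simplification and not the authors' SA-FDR: the score below is a sum of univariate Fisher ratios for a binary problem, which is additive and therefore cannot capture the inter-feature dependencies the paper emphasises. The cooling schedule, swap move, function names, and toy data are all assumptions.

```python
import math
import random


def fisher_score(column, labels):
    """Univariate Fisher ratio (m0 - m1)^2 / (v0 + v1) for a binary problem."""
    groups = [[x for x, y in zip(column, labels) if y == c] for c in (0, 1)]
    means = [sum(g) / len(g) for g in groups]
    variances = [sum((x - m) ** 2 for x in g) / len(g)
                 for g, m in zip(groups, means)]
    return (means[0] - means[1]) ** 2 / (variances[0] + variances[1] + 1e-12)


def anneal_subset(columns, labels, k, steps=2000, t0=1.0, seed=0):
    """Search for k features maximising the summed score using swap moves."""
    rng = random.Random(seed)
    scores = [fisher_score(col, labels) for col in columns]
    current = set(rng.sample(range(len(columns)), k))
    value = sum(scores[j] for j in current)
    best, best_value = set(current), value
    for step in range(steps):
        temperature = t0 * (1.0 - step / steps) + 1e-9  # linear cooling
        drop = rng.choice(sorted(current))
        add = rng.choice([j for j in range(len(columns)) if j not in current])
        delta = scores[add] - scores[drop]
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current.remove(drop)
            current.add(add)
            value += delta
            if value > best_value:
                best, best_value = set(current), value
    return sorted(best), best_value


labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0.0, 0.1, 0.0, 0.1, 1.0, 1.1, 1.0, 1.1],  # informative
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],  # noise
    [5.0, 5.2, 5.1, 5.0, 7.0, 7.1, 7.2, 7.0],  # informative
    [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],  # noise
]
subset, value = anneal_subset(columns, labels, k=2)
print(subset)
```

Fixing the subset size `k` plays the role of the $\ell_0$ constraint. Replacing `fisher_score` with a multivariate criterion would be needed to reward feature combinations rather than individual features.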

Article 45

Title@2025-07-31 (4): Momentum-based gradient descent methods for Lie groups

Title: Momentum-based gradient descent methods for Lie groups

Momentumbasierte Gradientenabstiegsverfahren für Lie-Gruppen

李群上基于动量的梯度下降方法 2404.09363v2

Authors (3): Cédric M. Campos, David Martín de Diego, José Torrente

Polyak’s Heavy Ball (PHB; Polyak, 1964), a.k.a. Classical Momentum, and Nesterov’s Accelerated Gradient (NAG; Nesterov, 1983) are well-established momentum-descent methods for optimization. Although the latter generally outperforms the former, primarily, generalizations of PHB-like methods to nonlinear spaces have not been sufficiently explored in the literature. In this paper, we propose a generalization of NAG-like methods for Lie group optimization. This generalization is based on the variational one-to-one correspondence between classical and accelerated momentum methods (Campos et al., 2023). We provide numerical experiments for chosen retractions on the group of rotations, based on the Frobenius norm and the Rosenbrock function, to demonstrate the effectiveness of our proposed methods and to show that they align with the results of the Euclidean case, that is, a faster convergence rate for NAG.

Polyak的重球(PHB;Polyak,1964年), a.k.a.a. 经典潮流和Nesterov的加速梯度(NAG;Nesterov,1983年)是完善的优化动力-加速法(NAG;Nesterov,1983年),尽管后者总体上优于前者,主要是PHB类方法对非线性空间的概括,文献中对此没有进行充分探讨。在本文件中,我们建议对类似NAG的精密组优化方法进行概括化。这种概括化的基础是古典和加速加速加速加速动力方法之间的一对一对应(Campos等人,2023年)。我们根据Frobenius规范和Rosenbrock功能,为选择的轮回组合提供了数字实验,以显示我们拟议方法的有效性,并与Euclidean案的结果相一致,即加速NAG的趋同率。
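For reference, the two Euclidean methods named in the abstract are usually written as follows. This is one common parameterization, with step size $\eta$ and momentum coefficient $\mu$ as assumed symbols; it is the flat-space starting point, not the Lie group generalization the paper proposes.

```latex
\begin{aligned}
\text{PHB:}\quad & x_{k+1} = x_k - \eta \nabla f(x_k) + \mu\,(x_k - x_{k-1}),\\
\text{NAG:}\quad & y_k = x_k + \mu\,(x_k - x_{k-1}),\qquad x_{k+1} = y_k - \eta \nabla f(y_k).
\end{aligned}
```

The only difference is where the gradient is evaluated: PHB uses the current iterate $x_k$, while NAG uses the look-ahead point $y_k$. On a Lie group the additions and differences above are no longer defined, which is why a retraction is needed.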

Article 46

Title@2025-07-31 (4): Weighted least-squares approximation with determinantal point processes and generalized volume sampling

Title: Weighted least-squares approximation with determinantal point processes and generalized volume sampling

Gewichtete Kleinste-Quadrate-Approximation mit determinantalen Punktprozessen und verallgemeinertem Volume Sampling

带有确定点过程和通用量抽样的加权最小方平方近似值 2312.14057v4

Authors (2): Anthony Nouy, Bertrand Michel

We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$, associated with some feature map $\boldsymbol{\varphi}$, using evaluations of the function at random points $x_1, \dots, x_n$. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPP) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features $\boldsymbol{\varphi}(x_i)$. We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples $n = O(m\log(m))$, which means that the expected $L^2$ error is bounded by a constant times the best approximation error in $L^2$. Further assuming that the function is in some normed vector space $H$ continuously embedded in $L^2$, we also prove that the approximation error in $L^2$ is almost surely bounded by the best approximation error measured in the $H$-norm. This includes the cases of functions from $L^\infty$ or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy that uses independent repetitions of projection DPP (or volume sampling), yielding error bounds similar to those for i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies.

我们考虑如下问题:利用函数在随机点 $x_1, \dots, x_n$ 处的取值,用给定的 $m$ 维空间 $V_m$(与某个特征映射 $\boldsymbol{\varphi}$ 相关联)中的元素来逼近 $L^2$ 中的函数。在回顾了使用独立同分布点的最优加权最小二乘的一些结果之后,我们考虑使用投影行列式点过程(DPP)或体积采样的加权最小二乘。这些分布在各点之间引入依赖性,从而促进所选特征 $\boldsymbol{\varphi}(x_i)$ 的多样性。我们首先给出体积重标度采样的一个推广版本,它在样本数 $n = O(m\log(m))$ 时得到期望意义下的拟最优性结果,即期望 $L^2$ 误差不超过 $L^2$ 中最佳逼近误差的常数倍。此外,若进一步假设函数属于某个连续嵌入 $L^2$ 的赋范向量空间 $H$,我们证明 $L^2$ 中的逼近误差几乎必然地被以 $H$ 范数度量的最佳逼近误差所界定。这包括函数来自 $L^\infty$ 或再生核希尔伯特空间的情形。最后,我们提出另一种策略,即使用投影DPP(或体积采样)的独立重复,它得到与独立同分布采样或体积采样类似的误差界,但在实践中所需样本数少得多。数值实验展示了不同策略的性能。
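The i.i.d. baseline that the abstract recalls can be written out as follows. This is a sketch of the standard optimal weighted least-squares setting, with assumed notation: $\mu$ is the reference measure defining $L^2$, $\varphi_1, \dots, \varphi_m$ are the components of $\boldsymbol{\varphi}$ taken to be an orthonormal basis of $V_m$, $w$ is the weight function, and $\hat{v}$ is the estimator.

```latex
\hat{v} = \arg\min_{v \in V_m} \frac{1}{n} \sum_{i=1}^{n} w(x_i)\,\bigl(f(x_i) - v(x_i)\bigr)^2,
\qquad
w(x)^{-1} = \frac{1}{m} \sum_{j=1}^{m} \varphi_j(x)^2 = \frac{1}{m}\,\|\boldsymbol{\varphi}(x)\|_2^2 .
```

In this baseline the points $x_i$ are drawn independently from the density $w^{-1}$ with respect to $\mu$. The DPP and volume sampling schemes studied in the paper replace the independent draws with dependent ones while keeping a weighted least-squares objective of this form.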

Article 47

Title@2025-07-31 (4): Optimal and Near-Optimal Adaptive Vector Quantization

Title: Optimal and Near-Optimal Adaptive Vector Quantization

Optimale und nahezu optimale adaptive Vektor-Quantisierung

最优与近似最优的自适应向量量化 2402.03158v2

Authors (4): Ran Ben-Basat, Yaniv Ben-Itzhak, Michael Mitzenmacher, Shay Vargaftik

Quantization is a fundamental optimization for many machine-learning use cases, including compressing gradients, model weights and activations, and datasets. The most accurate form of quantization is adaptive, where the error is minimized with respect to a given input, rather than optimizing for the worst case. However, optimal adaptive quantization methods are considered infeasible in terms of both their runtime and memory requirements. We revisit the Adaptive Vector Quantization (AVQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexity. We also present an even faster near-optimal algorithm for large inputs. Our experiments show our algorithms may open the door to using AVQ more extensively in a variety of machine learning applications.

量化是许多机器学习场景中的一项基本优化,包括压缩梯度、模型权重和激活值以及数据集。最精确的量化形式是自适应量化,即针对给定输入最小化误差,而不是针对最坏情况进行优化。然而,最优自适应量化方法在运行时间和内存需求两方面都被认为不可行。我们重新审视自适应向量量化(AVQ)问题,提出了能以渐近更优的时间和空间复杂度求得最优解的算法。我们还针对大规模输入提出了一种更快的近似最优算法。实验表明,我们的算法可能为在各类机器学习应用中更广泛地使用AVQ打开大门。
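To make "the error is minimized with respect to a given input" concrete, here is a naive dynamic program for one common formalization of adaptive quantization. It is a baseline sketch, not the paper's faster algorithms, and the formalization is an assumption: each value is rounded stochastically and without bias to one of the two quantization levels that bracket it, the levels are chosen from the input values, and the objective is the total rounding variance. All function names are hypothetical.

```python
def interval_cost(xs, i, j):
    """Total variance of unbiased stochastic rounding of xs[i..j] to {xs[i], xs[j]}."""
    return sum((xs[j] - x) * (x - xs[i]) for x in xs[i:j + 1])


def optimal_levels(values, s):
    """Pick s levels from the input minimising total rounding variance.

    Naive O(s * n^3) dynamic program; requires 2 <= s <= len(values).
    """
    xs = sorted(values)
    n = len(xs)
    if not 2 <= s <= n:
        raise ValueError("need 2 <= s <= number of values")
    inf = float("inf")
    # best[m][j]: minimal cost for xs[0..j] using m levels, the last one at xs[j].
    best = [[inf] * n for _ in range(s + 1)]
    prev = [[-1] * n for _ in range(s + 1)]
    best[1][0] = 0.0
    for m in range(2, s + 1):
        for j in range(1, n):
            for i in range(j):
                if best[m - 1][i] == inf:
                    continue
                cost = best[m - 1][i] + interval_cost(xs, i, j)
                if cost < best[m][j]:
                    best[m][j], prev[m][j] = cost, i
    # Walk the back-pointers from the largest value to recover the levels.
    levels, j = [], n - 1
    for m in range(s, 0, -1):
        levels.append(xs[j])
        j = prev[m][j]
    return levels[::-1], best[s][n - 1]


levels, cost = optimal_levels([0, 1, 2, 10], 3)
print(levels, cost)
```

For the toy input the middle level lands at 2 rather than 1, because leaving the value 2 between levels 1 and 10 would be far more costly than leaving 1 between 0 and 2. A fixed uniform grid cannot adapt to the input in this way.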

Article 48

Title@2025-07-31 (4): Hardware-Aware Fine-Tuning of Spiking Q-Networks on the SpiNNaker2 Neuromorphic Platform

Title: Hardware-Aware Fine-Tuning of Spiking Q-Networks on the SpiNNaker2 Neuromorphic Platform

Hardware-Aware Feintuning von Spiking Q-Netzwerken auf der SpiNNaker2 Neuromorphic Platform

在SpiNNaker2神经形态平台上对脉冲Q网络进行硬件感知微调 2507.23562v1

Authors (3): Sirine Arfa, Bernhard Vogginger, Christian Mayr

Spiking Neural Networks (SNNs) promise orders-of-magnitude lower power consumption and low-latency inference on neuromorphic hardware for a wide range of robotic tasks. In this work, we present an energy-efficient implementation of a reinforcement learning (RL) algorithm using quantized SNNs to solve two classical control tasks. The network is trained using the Q-learning algorithm, then fine-tuned and quantized to low-bit (8-bit) precision for embedded deployment on the SpiNNaker2 neuromorphic chip. To evaluate the comparative advantage of SpiNNaker2 over conventional computing platforms, we analyze inference latency, dynamic power consumption, and energy cost per inference for our SNN models, comparing performance against a GTX 1650 GPU baseline. Our results demonstrate SpiNNaker2’s strong potential for scalable, low-energy neuromorphic computing, achieving up to 32x reduction in energy consumption. Inference latency remains on par with GPU-based execution, with improvements observed in certain task settings, reinforcing SpiNNaker2’s viability for real-time neuromorphic control and making the neuromorphic approach a compelling direction for efficient deep Q-learning.

脉冲神经网络(SNN)有望在神经形态硬件上为各类机器人任务实现低若干数量级的功耗和低延迟推理。在这项工作中,我们使用量化SNN给出了一种强化学习(RL)算法的高能效实现,用于求解两个经典控制任务。网络先用Q学习算法训练,然后经过微调并量化到低比特(8位)精度,以便嵌入式部署到SpiNNaker2神经形态芯片上。为评估SpiNNaker2相对于传统计算平台的比较优势,我们分析了SNN模型的推理延迟、动态功耗和单次推理能耗,并与GTX 1650 GPU基线进行比较。结果表明,SpiNNaker2在可扩展、低能耗的神经形态计算方面潜力巨大,能耗最多可降低32倍。推理延迟与基于GPU的执行相当,在某些任务设置下还有所改善,这进一步说明SpiNNaker2适用于实时神经形态控制,也使神经形态方法成为高效深度Q学习的一个有吸引力的方向。

Article 49

Title@2025-07-31 (4): Physics-informed Gaussian Processes as Linear Model Predictive Controller

Title: Physics-informed Gaussian Processes as Linear Model Predictive Controller

Physik-informierte Gaußsche Prozesse als linearer Modellvorhersageregler

将物理信息高斯过程用作线性模型预测控制器 2412.04502v2

Authors (3): Jörn Tebbe, Andreas Besginow, Markus Lange-Hegermann

We introduce a novel algorithm for controlling linear time-invariant systems in a tracking problem. The controller is based on a Gaussian Process (GP) whose realizations satisfy a system of linear ordinary differential equations with constant coefficients. Control inputs for tracking are determined by conditioning the prior GP on the setpoints, i.e., control as inference. The resulting Model Predictive Control scheme incorporates pointwise soft constraints by introducing virtual setpoints to the posterior Gaussian process. We show theoretically that our controller satisfies open-loop stability for the optimal control problem by leveraging general results from Bayesian inference and demonstrate this result in a numerical example.

我们引入了一种在跟踪问题中控制线性时间变数系统的新算法。控制器基于一个高斯进程( GP ) , 它的实现满足了一个以恒定系数的线性普通差分方程系统。控制跟踪投入的确定是通过在设定点上对以前的 GP 进行调节来确定的, 即将控制作为推论。由此产生的模型预测控制计划通过在后戈西亚进程中引入虚拟设置点, 纳入了点性软约束。我们从理论上显示, 我们的控制器通过利用巴耶西亚推论的一般结果, 满足了最佳控制问题的开放环的稳定性, 并以数字为例展示了这一结果。

Article 50

Title@2025-07-31 (4): Molecule Graph Networks with Many-body Equivariant Interactions

Title: Molecule Graph Networks with Many-body Equivariant Interactions

Molekulare Graphen-Netzwerke mit Vielkörper-Equivariant-Interaktionen

多体等同交互作用的分子图图网络 2406.13265v3

Authors (8): Zetian Mao, Chuan-Shen Hu, Jiawen Li, Chen Liang, Diptesh Das, Masato Sumita, Kelin Xia, Koji Tsuda

Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, two-body bond vectors in opposition may cancel each other out during message passing, leading to the loss of directional information on their shared node. In this study, we develop Equivariant N-body Interaction Networks (ENINet) that explicitly integrate $l = 1$ equivariant many-body interactions to enhance directional symmetric information in the message passing scheme. We provide a mathematical analysis demonstrating the necessity of incorporating many-body equivariant interactions and generalize the formulation to $N$-body interactions. Experiments indicate that integrating many-body equivariant representations enhances prediction accuracy across diverse scalar and tensorial quantum chemical properties.

电文传递神经网络显示,电文传递神经网络在预测分子相互作用方面有显著的功效。引入等量矢量表达法通过捕捉几何数据对称来增强表达性,从而提高模型准确性。然而,在电文传递过程中,对立的双体债券矢量可能会相互抵消,导致共享节点上的方向信息丢失。在本研究中,我们开发了“等量 N-body互动网络 ” ( ENINet) , 明确整合了 l = 1 等量性多体互动, 以加强电文传递方案中的方向对称信息。我们提供了数学分析,表明有必要纳入多体等量相互作用,并将配方推广到$N-体互动。实验表明,将多体间等量表示法整合可以提高不同变力和高量化学特性的预测准确性。

Article 51

Title@2025-07-31 (4): Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation

Title: Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation

Tile and Slide : Ein neuer Rahmen für die Skalierung von NeRF von lokaler bis globaler 3D-Erdbeobachtung

Tile and Slide:将NeRF从局部扩展到全球三维地球观测的新框架 2507.01631v2

Authors (4): Camille Billouard, Dawa Derksen, Alexandre Constantin, Bruno Vallet

Neural Radiance Fields (NeRF) have recently emerged as a paradigm for 3D reconstruction from multiview satellite imagery. However, state-of-the-art NeRF methods are typically constrained to small scenes due to the memory footprint during training, which we study in this paper. Previous work on large-scale NeRFs palliates this by dividing the scene into multiple NeRFs. This paper introduces Snake-NeRF, a framework that scales to large scenes. Our out-of-core method eliminates the need to load all images and networks simultaneously, and operates on a single device. We achieve this by dividing the region of interest into NeRFs that tile the 3D space without overlap. Importantly, we crop the images with overlap to ensure that each NeRF is trained with all the necessary pixels. We introduce a novel $2\times 2$ 3D tile progression strategy and a segmented sampler, which together prevent 3D reconstruction errors along the tile edges. Our experiments conclude that large satellite images can effectively be processed with linear time complexity, on a single GPU, and without compromising quality.

神经辐射场(Neoral Radiance Fields)(NeRF)最近从多视卫星图像中成为3D重建的范例。然而,由于我们在本文件中研究的训练过程中的记忆足迹,最先进的NeRF方法通常局限于小场景。关于大型NeRFs的以往工作通过将场景分为NeRFs来将它平缓下来。本文介绍了蛇-NeRFs这个向大场景缩放的框架。我们的“核心外”方法消除了同时装入所有图像和网络的需要,并使用一个单一装置操作。我们通过将感兴趣的区域分为3D瓷砖的NRFs。重要的是,我们将图像与重叠进行裁剪,以确保每个NERFs都得到所有必要的像素的培训。我们引入了一个新型的2美元3D的推进策略和分层取样器,共同防止3D在台边的重建错误。我们的实验结论是,大型卫星图像可以有效地用直线时间复杂性处理,在单一的GPUP上进行,而没有质量妥协。

Article 52

Title@2025-07-31 (4): Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

Title: Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

Verbesserte Algorithmen für Kernel-Matrix-Vektor-Multiplikation unter Sparsamkeitsannahmen

稀疏性假设下核矩阵-向量乘法的改进算法 2507.23539v1

Authors (4): Piotr Indyk, Michael Kapralov, Kshiteej Sheth, Tal Wagner

Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian kernel matrices $K\in \mathbb{R}^{n\times n}$. The columns of $K$ are indexed by a set of $n$ keys $k_1,k_2,\ldots,k_n\in \mathbb{R}^d$, its rows by a set of $n$ queries $q_1,q_2,\ldots,q_n\in \mathbb{R}^d$, and its $(i,j)$ entry is $K_{ij} = e^{-\|q_i-k_j\|_2^2/2\sigma^2}$ for some bandwidth parameter $\sigma>0$. Given a vector $x\in \mathbb{R}^n$ and error parameter $\epsilon>0$, our task is to output a $y\in \mathbb{R}^n$ such that $\|Kx-y\|_2\leq \epsilon \|x\|_2$ in time subquadratic in $n$ and linear in $d$. Our algorithms rely on the following modelling assumption about the matrices $K$: the sum of the entries of $K$ scales linearly in $n$, as opposed to worst-case quadratic growth. We validate this assumption experimentally for Gaussian kernel matrices encountered in various settings, such as fast attention computation in LLMs. We obtain the first subquadratic-time algorithm that works under this assumption for unrestricted vectors.

受注意力矩阵快速处理问题的启发,我们研究计算非对称高斯核矩阵 $K\in \mathbb{R}^{n\times n}$ 的矩阵-向量乘积的快速算法。$K$ 的列由一组 $n$ 个键 $k_1,k_2,\ldots,k_n\in \mathbb{R}^d$ 索引,行由一组 $n$ 个查询 $q_1,q_2,\ldots,q_n\in \mathbb{R}^d$ 索引,其第 $(i,j)$ 项为 $K_{ij} = e^{-\|q_i-k_j\|_2^2/2\sigma^2}$,其中 $\sigma>0$ 为带宽参数。给定向量 $x\in \mathbb{R}^n$ 和误差参数 $\epsilon>0$,我们的任务是在关于 $n$ 次二次、关于 $d$ 线性的时间内输出 $y\in \mathbb{R}^n$,使得 $\|Kx-y\|_2\leq \epsilon \|x\|_2$。我们的算法依赖于关于矩阵 $K$ 的如下建模假设:$K$ 的各项之和随 $n$ 线性增长,而非最坏情况下的二次增长。我们通过实验验证了这一假设,实验对象是在 LLM 快速注意力计算等多种场景中出现的高斯核矩阵。我们得到了在该假设下适用于任意向量的首个次二次时间算法。
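
To make the setting of Article 52 concrete, the following is a minimal sketch of the naive quadratic-time baseline that a subquadratic algorithm would be compared against, together with the quantity used in the modelling assumption (the sum of the entries of $K$). It is not the paper's algorithm. The function names are hypothetical, and the exponent is read as $-\|q_i-k_j\|_2^2/(2\sigma^2)$, which is an assumption about the intended parenthesization.

```python
import math


def gaussian_kernel_matvec(queries, keys, x, sigma):
    """Naive O(n^2 d) product y = K x, with K_ij = exp(-||q_i - k_j||^2 / (2 sigma^2)).

    Assumption: the exponent is divided by (2 * sigma^2).
    """
    y = []
    for q in queries:
        acc = 0.0
        for k, xj in zip(keys, x):
            sq_dist = sum((a - b) ** 2 for a, b in zip(q, k))
            acc += math.exp(-sq_dist / (2.0 * sigma ** 2)) * xj
        y.append(acc)
    return y


def kernel_entry_sum(queries, keys, sigma):
    """Sum of all entries of K, computed as 1^T (K 1)."""
    ones = [1.0] * len(keys)
    return sum(gaussian_kernel_matvec(queries, keys, ones, sigma))


# Tiny usage example with two queries and two keys in R^1.
queries = [[0.0], [5.0]]
keys = [[0.0], [5.0]]
y = gaussian_kernel_matvec(queries, keys, [1.0, 2.0], sigma=1.0)
total = kernel_entry_sum(queries, keys, sigma=1.0)
```

When keys and queries are well separated relative to $\sigma$, as in this toy input, the entry sum stays close to $n$ rather than $n^2$, which is the regime the assumption describes.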

Article 53

Title@2025-07-31 (4): From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices

Title: From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices

Von LLMs bis Edge: Parametereffizientes Feintuning auf Edge-Geräten

从LLMs到边缘:边缘装置的参数-有效精密喷射 2507.23536v1

Authors (3): Georg Slamanig, Francesco Corti, Olga Saukh

Parameter-efficient fine-tuning (PEFT) methods reduce the computational costs of updating deep learning models by minimizing the number of additional parameters used to adapt a model to a downstream task. While extensively researched in large language models (LLMs), their application to smaller models used on edge devices, such as convolutional neural networks, remains underexplored. This paper benchmarks and analyzes popular PEFT methods on convolutional architectures typically deployed in resource-constrained edge environments. We evaluate LoRA, DoRA, and GaLore for updating standard and depthwise convolutional architectures to handle distribution shifts and accommodate unseen classes. We utilize recently proposed PyTorch profilers to compare the updated model performance and computational costs of these PEFT methods with traditional fine-tuning approaches. With resource efficiency in mind, we investigate their update behavior across different rank dimensions. We find that the evaluated PEFT methods are only half as memory-efficient when applied to depthwise-separable convolution architectures, compared to their efficiency with LLMs. Conversely, when targeting convolutional architectures optimized for edge deployment, adapter-based PEFT methods can reduce floating point operations (FLOPs) during model updates by up to 95%. These insights offer valuable guidance for selecting PEFT methods based on hardware constraints, performance requirements, and application needs. Our code is online.

参数高效微调(PEFT)方法减少了更新深层次学习模型的计算成本,通过将用于调整模型以适应下流任务的额外参数数量减少到最低程度,从而降低更新深层学习模型的计算成本。虽然在大型语言模型(LLMS)中进行了广泛研究,但其应用在边缘设备(如神经神经网络)中使用的较小模型中仍然未得到充分探讨。本文的基准和分析了通常在资源限制的边缘环境中部署的革命结构中流行的PEFT方法。我们评估了LORA、DoRA和Ga Loore更新标准和深层革命结构,以管理分销移动并容纳看不见的班级。我们最近提出的PyTorrch 剖析器将这些大型语言模型的更新性能和计算成本与传统的微调方法进行比较。我们考虑到资源效率,我们调查了它们在不同级别层面的更新行为。我们发现,经过评估的PEFT方法在应用深度可分离的同级结构时,与LMS相比,只有一半的记忆效率。相反的是,在针对横向配置结构时,在优化配置结构时,通过浮动应用时,可以更新PFTFTFT的硬化应用方法来选择基于浮动的硬化的硬化的硬化方法。
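
As a rough illustration of why low-rank adapters save relatively less on depthwise-separable layers (Article 53), the sketch below counts trainable parameters only. It assumes one common LoRA factorization for convolutions (a $k\times k$ convolution into `rank` channels followed by a $1\times 1$ convolution); the layer sizes and function names are hypothetical, and parameter counts are not the memory or FLOP measurements reported in the paper.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias ignored)."""
    return (c_in // groups) * c_out * k * k


def lora_conv_params(c_in, c_out, k, rank):
    """Trainable weights of a rank-`rank` adapter.

    Assumed factorization: A is a k x k conv c_in -> rank,
    B is a 1 x 1 conv rank -> c_out.
    """
    return rank * c_in * k * k + c_out * rank


# Standard 3x3 convolution, 256 -> 256 channels, rank-4 adapter.
full_std = conv_params(256, 256, 3)                 # 589824
lora_std = lora_conv_params(256, 256, 3, rank=4)    # 10240
ratio_std = lora_std / full_std                     # about 1.7% of the layer

# Depthwise-separable block: depthwise 3x3 + pointwise 1x1.
depthwise = conv_params(256, 256, 3, groups=256)    # 2304
pointwise = conv_params(256, 256, 1)                # 65536
lora_pw = lora_conv_params(256, 256, 1, rank=4)     # 2048, adapter on the pointwise conv
ratio_sep = lora_pw / (depthwise + pointwise)       # about 3.0% of the block
```

Because the separable block is already small, the same rank yields a smaller relative reduction, which is consistent in direction with the abstract's observation but does not reproduce its numbers.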

Article 54

Title@2025-07-31 (4): PurpCode: Reasoning for Safer Code Generation

Title: PurpCode: Reasoning for Safer Code Generation

PurpCode: Begründung für eine sicherere Code-Generierung

PurpCode:更安全代码生成的理由 2507.19060v2

Authors (14): Jiawei Liu, Nirav Diwan, Zhe Wang, Haoyu Zhai, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Muntasir Wahed, Yinlin Deng, Hadjer Benkraouda, Yuxiang Wei, Lingming Zhang, Ismini Lourentzou, Gang Wang

We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learning, which explicitly teaches the model to reference cybersafety rules to generate vulnerability-free code and to avoid facilitating malicious cyberactivities; and (ii) Reinforcement Learning, which optimizes model safety and preserves model utility through diverse, multi-objective reward mechanisms. To empower the training pipelines with comprehensive cybersafety data, we conduct internal red-teaming to synthesize comprehensive and high-coverage prompts based on real-world tasks for inducing unsafe cyberactivities in the model. Based on PurpCode, we develop a reasoning-based coding model, namely PurpCode-32B, which demonstrates state-of-the-art cybersafety, outperforming various frontier models. Meanwhile, our alignment method decreases the model overrefusal rates in both general and cybersafety-specific scenarios, while preserving model utility in both code generation and common security knowledge.

我们提出 PurpCode,这是首个用于训练安全代码推理模型的后训练方案,旨在生成安全代码并防范恶意网络活动。PurpCode 分两个阶段训练推理模型:(i) 规则学习,明确教导模型参照网络安全规则,以生成无漏洞的代码并避免为恶意网络活动提供便利;(ii) 强化学习,通过多样化的多目标奖励机制优化模型安全性并保持模型效用。为了给训练流程提供全面的网络安全数据,我们开展内部红队测试,基于真实世界任务合成全面且高覆盖率的提示,用于诱导模型产生不安全的网络活动。基于 PurpCode,我们开发了基于推理的代码模型 PurpCode-32B,它展现了最先进的网络安全性,优于多种前沿模型。同时,我们的对齐方法降低了模型在通用场景和网络安全特定场景中的过度拒绝率,并保持了模型在代码生成和常见安全知识方面的效用。

Article 55

Title@2025-07-31 (4): Transparent AI: The Case for Interpretability and Explainability

Title: Transparent AI: The Case for Interpretability and Explainability

Transparente KI: Der Fall für Dolmetschbarkeit und Erklärbarkeit

透明 AI: 解释和解释的理由 2507.23535v1

Authors (6): Dhanesh Ramachandram, Himanshu Joshi, Judy Zhu, Dhari Gandhi, Lucas Hartman, Ananya Raval

As artificial intelligence systems increasingly inform high-stakes decisions across sectors, transparency has become foundational to responsible and trustworthy AI implementation. Leveraging our role as a leading institute in advancing AI research and enabling industry adoption, we present key insights and lessons learned from practical interpretability applications across diverse domains. This paper offers actionable strategies and implementation guidance tailored to organizations at varying stages of AI maturity, emphasizing the integration of interpretability as a core design principle rather than a retrospective add-on.

随着人工情报系统越来越多地为跨部门的高层决策提供信息,透明度已成为负责任和值得信赖的AI执行工作的基础。我们发挥带头机构的作用,推进AI研究和扶持产业的采用,我们从不同领域的实用解释应用中提出了关键见解和经验教训。本文件为处于AI成熟程度不同阶段的组织提供了可采取行动的战略和执行指南,强调将可解释性整合为核心设计原则,而不是追溯性补充。

Article 56

Title@2025-07-31 (4): Continual Learning with Synthetic Boundary Experience Blending

Title: Continual Learning with Synthetic Boundary Experience Blending

Kontinuierliches Lernen mit synthetischer Grenzerfahrung Blending

与合成边界不断学习 2507.23534v1

Authors (3): Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Continual learning (CL) aims to address catastrophic forgetting in models trained sequentially on multiple tasks. While experience replay has shown promise, its effectiveness is often limited by the sparse distribution of stored key samples, leading to overly simplified decision boundaries. We hypothesize that introducing synthetic data near the decision boundary (Synthetic Boundary Data, or SBD) during training serves as an implicit regularizer, improving boundary stability and mitigating forgetting. To validate this hypothesis, we propose a novel training framework, Experience Blending, which integrates knowledge from both stored key samples and synthetic, boundary-adjacent data. Experience Blending consists of two core components: (1) a multivariate Differential Privacy (DP) noise mechanism that injects batch-wise noise into low-dimensional feature representations, generating SBD; and (2) an end-to-end training strategy that jointly leverages both stored key samples and SBD. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet demonstrate that our method outperforms nine CL baselines, achieving accuracy improvements of 10%, 6%, and 13%, respectively.

持续学习(CL)旨在解决在连续训练的多任务模型中灾难性的遗忘问题。虽然经验重现显示前景,但其有效性往往因储存的关键样本分布稀少而受到限制,导致决定界限过于简化。我们假设在培训期间在决定边界(合成边界数据或SBD)附近引入合成数据是一种隐含的常规,可以提高边界稳定性和减轻遗忘。为了验证这一假设,我们提议了一个新颖的培训框架(bf 经验闪烁 ) , 将储存的关键样本和合成边界相邻数据的知识结合起来。经验混合由两个核心部分组成:(1) 多变量差异性差异性隐私(DP)噪音机制,将批量噪音注入低维特征表,生成SBD;(2) 端对端培训战略,共同利用储存的关键样本和SBD. 对CIFAR-10、CIFAR-100和Tiny图像网的广泛实验,显示我们的方法超越了CL的9个基线,分别实现了10%、6%和13%的精确度改进。

Article 57

Title@2025-07-31 (4): Diffusion Beats Autoregressive in Data-Constrained Settings

Title: Diffusion Beats Autoregressive in Data-Constrained Settings

Diffusion schlägt Autoregressive in datenbeschränkten Einstellungen

在受数据约束的设置中自动递减 2507.15857v4

Authors (5): Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak

Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings, where training involves repeated passes over limited data, and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We interpret this advantage as implicit data augmentation: masked diffusion exposes the model to a diverse distribution of token orderings and prediction tasks, unlike AR’s fixed left-to-right factorization. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. These results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm. Our code is available at: https://diffusion-scaling.github.io.

长期以来,自动递减(AR)模型在大型语言模型的景观中占据了主导地位,推动了一系列广泛任务的进展。最近,基于扩散的语言模型作为一种有希望的替代模式出现,尽管它们相对于AR模型的优势仍未得到充分探讨。在本文件中,我们系统地研究数据封闭环境中的蒙面扩散模型,培训涉及对有限数据的反复传递,发现在计算数据时,这些模型明显优于AR模型,但在计算数据丰富但数据稀少。扩散模型更好地利用了重复的数据,实现了较低的验证损失和更高的下游性能。我们将此优势解释为隐含的数据增强:遮蔽的传播将模型暴露在代号订单和预测任务的不同分布上,与AR的固定左对右系数化不同。我们为扩散模型找到新的缩放法,并为关键折算阈值的表达方式,而扩散开始超过AR。这些结果显示,当数据(不是编译的)是瓶颈时,扩散模型为标准的AR模型提供了令人信服的替代方法。我们的代码可以在 https://difcilting-scalinging.github.io.

Article 58

Title@2025-07-31 (4): H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

Title: H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

H-RDT: Menschliche Manipulation verbessert bimanuelle Robotermanipulation

H-RDT:人类操纵增强二手机械操纵 2507.23523v1

Authors (7): Hongzhe Bi, Lingxuan Wu, Tianwei Lin, Hengkai Tan, Zhizhong Su, Hang Su, Jun Zhu

Imitation learning for robotic manipulation faces a fundamental challenge: the scarcity of large-scale, high-quality robot demonstration data. Recent robotic foundation models often pre-train on cross-embodiment robot datasets to increase data scale, while they face significant limitations as the diverse morphologies and action spaces across different robot embodiments make unified training challenging. In this paper, we present H-RDT (Human to Robotics Diffusion Transformer), a novel approach that leverages human manipulation data to enhance robot manipulation capabilities. Our key insight is that large-scale egocentric human manipulation videos with paired 3D hand pose annotations provide rich behavioral priors that capture natural manipulation strategies and can benefit robotic policy learning. We introduce a two-stage training paradigm: (1) pre-training on large-scale egocentric human manipulation data, and (2) cross-embodiment fine-tuning on robot-specific data with modular action encoders and decoders. Built on a diffusion transformer architecture with 2B parameters, H-RDT uses flow matching to model complex action distributions. Extensive evaluations encompassing both simulation and real-world experiments, single-task and multitask scenarios, as well as few-shot learning and robustness assessments, demonstrate that H-RDT outperforms training from scratch and existing state-of-the-art methods, including Pi0 and RDT, achieving significant improvements of 13.9% and 40.5% over training from scratch in simulation and real-world experiments, respectively. The results validate our core hypothesis that human manipulation data can serve as a powerful foundation for learning bimanual robotic manipulation policies.

机器人操纵的模拟学习面临一个根本性挑战:大规模、高质量机器人演示数据稀缺。最近的机器人基础模型往往在交叉渗透机器人数据集上进行预培训,以增加数据规模,同时它们面临巨大的局限性,因为不同机器人的多种形态和动作空间不同,使得统一培训具有挑战性。在本论文中,我们介绍了H-RDT(人类到机器人扩散变异变异器),这是利用人类操纵数据提高机器人操纵能力的新办法。我们的主要洞察力是,与配对的 3D 手的大型自我中心人类操纵视频显示有丰富的行为前科,可以捕捉自然操纵策略,并有利于机器人政策学习。我们引入了两个阶段的重要培训模式:(1) 大规模自我自我中心人类操纵数据培训前和动作空间,以及使用模块化动作导变变变变变变变器,以2B参数为基础,H-RDT使用与模型复杂的动作分布相匹配。广泛的评价包括模拟和真实世界实验、单级和真实操作和多级模型基础,以13RD的模拟、单项操作和模拟模型形式展示现有数据,作为模拟和模拟的学习基础。

Article 59

Title@2025-07-31 (4): TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding

Title: TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding

TPP-SD: Beschleunigung der Transformer-Punkt-Prozedursampling mit spekulativer Dekodierung

TPP-SD:加速变速点进程与投机代号抽样 2507.09252v2

Authors (5): Shukai Gong, Yiyang Fu, Fengyuan Ran, Quyu Kong, Feng Zhou

We propose TPP-SD, a novel approach that accelerates Transformer temporal point process (TPP) sampling by adapting speculative decoding (SD) techniques from language models. By identifying the structural similarities between thinning algorithms for TPPs and speculative decoding for language models, we develop an efficient sampling framework that leverages a smaller draft model to generate multiple candidate events, which are then verified by the larger target model in parallel. TPP-SD maintains the same output distribution as autoregressive sampling while achieving significant acceleration. Experiments on both synthetic and real datasets demonstrate that our approach produces samples from the same distributions as standard methods, but with a 2-6$\times$ speedup. Our ablation studies analyze the impact of hyperparameters such as draft length and draft model size on sampling efficiency. TPP-SD bridges the gap between powerful Transformer TPP models and the practical need for rapid sequence sampling.

我们建议采用TPP-SD, 这是一种通过调整语言模型的投机性解码(SD)技术来加速变换时间点进程抽样的新办法。通过查明TPP减缩算法和语言模型投机性解码之间的结构相似性,我们制定了高效的抽样框架,利用一个较小的模型草案来产生多种候选事件,然后由较大的目标模型平行核实。TPP-SD保持与自动递减抽样相同的产出分布,同时实现显著加速。合成和真实数据集实验表明,我们的方法从相同的分布中产生样本,作为标准方法,但以2-6美元计时的加速法。我们的通货膨胀研究分析了超参数的影响,例如长度草案和模型草案对采样效率的影响。TPP-SD将强大的变换TPP模型与快速序列取样的实际需要之间的差距拉近。
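
The claim in Article 59 that speculative decoding preserves the target distribution can be illustrated with the standard accept/reject rule used for discrete tokens in language models. This is a sketch of that generic rule, not TPP-SD's procedure for continuous event times; the distributions `p` (target) and `q` (draft) and all function names are illustrative assumptions.

```python
import random


def residual_distribution(p, q):
    """Normalized max(p - q, 0): the distribution sampled from after a rejection."""
    diff = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        return list(p)  # p equals q, so rejection never happens
    return [d / total for d in diff]


def speculative_step(p, q, rng):
    """Draw one sample: propose from the draft q, verify against the target p."""
    x = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True  # draft sample accepted
    res = residual_distribution(p, q)
    return rng.choices(range(len(p)), weights=res)[0], False


def exact_output_distribution(p, q):
    """Closed-form law of the value returned by speculative_step."""
    accept = [min(pi, qi) for pi, qi in zip(p, q)]  # q_i * min(1, p_i / q_i)
    reject_mass = 1.0 - sum(accept)
    res = residual_distribution(p, q)
    return [a + reject_mass * r for a, r in zip(accept, res)]


target = [0.5, 0.3, 0.2]
draft = [0.2, 0.5, 0.3]
out = exact_output_distribution(target, draft)  # matches `target` up to rounding
sample, accepted = speculative_step(target, draft, random.Random(0))
```

The closed-form check shows why the output law equals the target: accepted mass is $\min(p_i, q_i)$ and the rejected mass is redistributed exactly over the shortfall $\max(p_i - q_i, 0)$.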

Article 60

Title@2025-07-31 (4): Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level

Title: Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level

Unterschiedlich Private Clipped-SGD: Hochwahrscheinlichkeitskonvergenz mit willkürlicher Clipping-Ebene

区别私人的Cllipped-SGD:高概率与任意缩小水平相融合 2507.23512v1

Authors (5): Saleh Vatan Khah, Savelii Chezhegov, Shahrokh Farahmand, Samuel Horváth, Eduard Gorbunov

Gradient clipping is a fundamental tool in Deep Learning, improving the high-probability convergence of stochastic first-order methods like SGD, AdaGrad, and Adam under heavy-tailed noise, which is common in training large language models. It is also a crucial component of Differential Privacy (DP) mechanisms. However, existing high-probability convergence analyses typically require the clipping threshold to increase with the number of optimization steps, which is incompatible with standard DP mechanisms like the Gaussian mechanism. In this work, we close this gap by providing the first high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level, applicable to both convex and non-convex smooth optimization under heavy-tailed noise, characterized by a bounded central $\alpha$-th moment assumption, $\alpha \in (1,2]$. Our results show that, with a fixed clipping level, the method converges to a neighborhood of the optimal solution with a faster rate than the existing ones. The neighborhood can be balanced against the noise introduced by DP, providing a refined trade-off between convergence speed and privacy guarantees.

渐进式剪报是深层学习的一个基本工具,它改进了SGD、AdaGrad和Adam等随机第一阶方法的高概率趋同,在大型语言模型培训中常见的重尾噪音下,它也是差异隐私机制的重要组成部分。然而,现有的高概率趋同分析通常要求剪报阈值随着优化步骤的增加而增加,而优化步骤与Gaussian机制等标准的DP机制不相容。在这项工作中,我们缩小了这一差距,为DP-Clash-SGD提供了第一次高概率汇合分析,其固定剪辑等级适用于在重尾噪音下的convex和非convex平稳优化,其特点是以约束中央的$alpha$-th 假设(1,2,2美元)为特征,而目前的结果显示,在固定剪辑级别上,该方法会与比现有机制更快的最佳解决方案相近。邻区可以与DP引入的噪音相平衡,提供精确的贸易速度和保证之间的保密性。
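
For readers unfamiliar with the update analyzed in Article 60, here is a minimal sketch of one clipped, noised SGD step with a fixed clipping level. The per-sample clipping followed by averaging, and the free parameter `noise_std`, are assumptions made for illustration; the noise is not calibrated to any $(\varepsilon, \delta)$ budget, and the paper's exact variant may differ.

```python
import math
import random


def clip(grad, level):
    """Rescale grad so that its Euclidean norm is at most `level`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, level / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_clipped_sgd_step(params, per_sample_grads, lr, clip_level, noise_std, rng):
    """One step: clip each per-sample gradient, average, add Gaussian noise, descend."""
    clipped = [clip(g, clip_level) for g in per_sample_grads]
    n = len(clipped)
    avg = [sum(col) / n for col in zip(*clipped)]
    noisy = [a + rng.gauss(0.0, noise_std) for a in avg]
    return [p - lr * g for p, g in zip(params, noisy)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first gradient is heavy, the second is small
new_params = dp_clipped_sgd_step([0.0, 0.0], grads, lr=0.1,
                                 clip_level=1.0, noise_std=0.05, rng=rng)
```

Keeping `clip_level` fixed bounds each sample's contribution regardless of the number of steps, which is the property that makes a fixed noise scale meaningful in the Gaussian mechanism.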

Article 61

Title@2025-07-31 (4): A Verifier Hierarchy

Title: A Verifier Hierarchy

Eine Prüferhierarchie

验证者等级分层 2507.23504v1

Authors (1): Maurits Kaptein

We investigate the trade-off between certificate length and verifier runtime. We prove a Verifier Trade-off Theorem showing that reducing the inherent verification time of a language from $f(n)$ to $g(n)$, where $f(n) \ge g(n)$, requires certificates of length at least $\Omega(\log(f(n) / g(n)))$. This theorem induces a natural hierarchy based on certificate complexity. We demonstrate its applicability to analyzing conjectured separations between complexity classes (e.g., NP and EXPTIME) and to studying natural problems such as string periodicity and rotation detection. Additionally, we provide perspectives on the P vs. NP problem by relating it to the existence of sub-linear certificates.

我们研究证书长度与验证器运行时间之间的权衡。我们证明了一个验证器权衡定理:将一种语言的固有验证时间从 $f(n)$ 降低到 $g(n)$(其中 $f(n) \ge g(n)$),需要长度至少为 $\Omega(\log(f(n)/g(n)))$ 的证书。该定理导出了一个基于证书复杂度的自然层级。我们展示了它在分析复杂度类(例如 NP 与 EXPTIME)之间猜想的分离,以及研究字符串周期性和旋转检测等自然问题方面的适用性。此外,我们通过将 P 与 NP 问题同次线性证书的存在性联系起来,为该问题提供了新的视角。
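
As a worked instance of the bound in Article 61 (an illustration under assumed choices of $f$ and $g$, not an example taken from the paper), consider reducing verification time from exponential, $f(n) = 2^n$, to polynomial, $g(n) = n^k$ for a fixed $k$:

```latex
\[
\log_2 \frac{f(n)}{g(n)}
  = \log_2 \frac{2^{n}}{n^{k}}
  = n - k \log_2 n
  = \Omega(n).
\]
```

Under the stated theorem, such a reduction would therefore require certificates of length at least linear in $n$, whereas a constant-factor speedup ($f(n)/g(n) = O(1)$) yields only the trivial bound $\Omega(1)$.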

Article 62

Title@2025-07-31 (4): Directional Ensemble Aggregation for Actor-Critics

Title: Directional Ensemble Aggregation for Actor-Critics

Regie-Ensemble Aggregation für Schauspieler-Kritik

行为者-批评者方向集合群 2507.23501v1

Authors (4): Nicklas Werge, Yi-Shan Wu, Bahareh Tasdighi, Melih Kandemir

Off-policy reinforcement learning in continuous control tasks depends critically on accurate $Q$-value estimates. Conservative aggregation over ensembles, such as taking the minimum, is commonly used to mitigate overestimation bias. However, these static rules are coarse, discard valuable information from the ensemble, and cannot adapt to task-specific needs or different learning regimes. We propose Directional Ensemble Aggregation (DEA), an aggregation method that adaptively combines $Q$-value estimates in actor-critic frameworks. DEA introduces two fully learnable directional parameters: one that modulates critic-side conservatism and another that guides actor-side policy exploration. Both parameters are learned using ensemble disagreement-weighted Bellman errors, which weight each sample solely by the direction of its Bellman error. This directional learning mechanism allows DEA to adjust conservatism and exploration in a data-driven way, adapting aggregation to both uncertainty levels and the phase of training. We evaluate DEA across continuous control benchmarks and learning regimes, from interactive to sample-efficient, and demonstrate its effectiveness over static ensemble strategies.

在连续控制任务中进行强化政策学习,关键取决于准确的美元价值估计。在集合上进行保守的汇总,例如采用最低限度的集合,通常用于减少过高估计偏差。然而,这些静态规则粗糙,抛弃了来自共同组合的宝贵信息,无法适应具体任务的需求或不同的学习制度。我们提议了 “ 方向聚合 “ (DEA),这是一种综合方法,适应性地将数值估算与行为者-行为者-批评框架结合起来。 DEA引入了两个完全可以学习的方向参数:一个是调整批评方保守主义,另一个是指导行为方政策探索的参数。这两个参数都是利用共同的分歧加权贝尔曼错误来学习的,每个样本的权重完全取决于其贝尔曼错误的方向。这个方向学习机制使DEA能够以数据驱动的方式调整保守主义和探索,将汇总既适应不确定性水平,又适应培训阶段。我们评估DEDA从连续控制基准和学习制度—-从互动到抽样效率—-到展示其效力,并展示其相对于静态组合战略的有效性。

Article 63

Title@2025-07-31 (4): Incorporating structural uncertainty in causal decision making

Title: Incorporating structural uncertainty in causal decision making

Einbeziehung struktureller Unsicherheit in die kausale Entscheidungsfindung

将结构性不确定性纳入因果决策 2507.23495v1

Authors (1): Maurits Kaptein

Practitioners making decisions based on causal effects typically ignore structural uncertainty. We analyze when this uncertainty is consequential enough to warrant methodological solutions (Bayesian model averaging over competing causal structures). Focusing on bivariate relationships ($X \rightarrow Y$ vs. $X \leftarrow Y$), we establish that model averaging is beneficial when: (1) structural uncertainty is moderate to high, (2) causal effects differ substantially between structures, and (3) loss functions are sufficiently sensitive to the size of the causal effect. We prove optimality results of our suggested methodological solution under regularity conditions and demonstrate through simulations that modern causal discovery methods can provide, within limits, the necessary quantification. Our framework complements existing robust causal inference approaches by addressing a distinct source of uncertainty typically overlooked in practice.

根据因果关系作出决定的从业者通常忽视结构性不确定性。我们分析这种不确定性何时产生足以证明有必要采用方法解决办法( Bayesian 模型,平均取代相互竞争的因果结构 ) 。侧重于双轨关系(X\rightrow Y$对 $X leftorrow Y$),我们确定,当以下情况下,平均模式是有益的:(1) 结构不确定性中度至高,(2) 结构之间的因果关系差异很大,(3) 损失功能对因果效应的规模足够敏感。我们证明,在正常条件下,我们建议的方法解决办法取得了最佳效果,并通过模拟证明,现代因果发现方法可在限度内提供必要的量化。我们的框架通过处理实践中通常被忽视的明显不确定性来源,补充了现有的稳健的因果推断方法。
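
The decision problem in Article 63 can be sketched for the bivariate case as follows. The sketch assumes that intervening on $X$ has effect `effect_if_causal` under $X \rightarrow Y$ and zero effect under $X \leftarrow Y$, and that the decision maker intervenes when the expected effect exceeds a fixed cost; these utilities and all names are hypothetical and are not the paper's loss functions.

```python
def averaged_effect(p_x_causes_y, effect_if_causal):
    """Model-averaged effect of intervening on X (the effect is 0 under X <- Y)."""
    return p_x_causes_y * effect_if_causal


def decide(p_x_causes_y, effect_if_causal, cost, use_averaging):
    """Return True if the decision rule chooses to intervene on X."""
    if use_averaging:
        expected = averaged_effect(p_x_causes_y, effect_if_causal)
    else:
        # Plug-in rule: commit to the most probable structure and ignore uncertainty.
        expected = effect_if_causal if p_x_causes_y >= 0.5 else 0.0
    return expected > cost


# Moderate structural uncertainty: P(X -> Y) = 0.6, effect 1.0, cost 0.7.
plug_in = decide(0.6, 1.0, 0.7, use_averaging=False)   # intervenes
averaged = decide(0.6, 1.0, 0.7, use_averaging=True)   # does not intervene
net_utility_if_intervening = averaged_effect(0.6, 1.0) - 0.7  # negative
```

In this toy setting the two rules disagree only because the uncertainty is moderate, the effects differ between structures, and the decision is sensitive to effect size, mirroring the three conditions listed in the abstract.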

Article 64

Title@2025-07-31 (4): Neural-ANOVA: Analytical Model Decomposition using Automatic Integration

Title: Neural-ANOVA: Analytical Model Decomposition using Automatic Integration

Neural-ANOVA: Analytische Modellzersetzung mit automatischer Integration

神经-ANOVA:使用自动集成法分析模型分解 2408.12319v2

Authors (3): Steffen Limmer, Steffen Udluft, Clemens Otte

The analysis of variance (ANOVA) decomposition offers a systematic method to understand the interaction effects that contribute to a specific decision output. In this paper we introduce Neural-ANOVA, an approach to decompose neural networks into the sum of lower-order models using the functional ANOVA decomposition. Our approach formulates a learning problem, which enables fast analytical evaluation of integrals over subspaces that appear in the calculation of the ANOVA decomposition. Finally, we conduct numerical experiments to provide insights into the approximation properties compared to other regression approaches from the literature.

差异(ANOVA)分解分析提供了一种系统的方法来理解有助于具体决定产出的相互作用效应。在本文件中,我们采用了神经-ANOVA,一种利用功能性ANOVA分解法将神经网络分解成低级模型总和的方法。我们的方法形成了一个学习问题,它使得能够对亚空间的集成进行快速分析评价,这种评价在计算ANOVA分解法中出现。最后,我们进行了数字实验,以提供与文献中其他回归法相比较的近似特性的洞察。
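
To show what the functional ANOVA terms in Article 64 look like, the sketch below decomposes the toy function $f(x_1, x_2) = x_1 + x_2 + x_1 x_2$ on $[0,1]^2$ under the uniform measure. It uses midpoint-rule quadrature in place of the analytical integration that Neural-ANOVA obtains from a trained network, so it illustrates the decomposition itself rather than the paper's method; the function names and grid size are assumptions.

```python
def f(x1, x2):
    """Toy model with main effects and one interaction."""
    return x1 + x2 + x1 * x2


def midpoints(m):
    """Midpoint-rule nodes on [0, 1]."""
    return [(i + 0.5) / m for i in range(m)]


def anova_terms(func, m=50):
    """Return (f0, f1, f2, f12) of the functional ANOVA decomposition on [0, 1]^2."""
    grid = midpoints(m)
    f0 = sum(func(a, b) for a in grid for b in grid) / (m * m)

    def f1(x1):
        return sum(func(x1, b) for b in grid) / m - f0

    def f2(x2):
        return sum(func(a, x2) for a in grid) / m - f0

    def f12(x1, x2):
        return func(x1, x2) - f0 - f1(x1) - f2(x2)

    return f0, f1, f2, f12


f0, f1, f2, f12 = anova_terms(f)
# Closed forms for this f: f0 = 5/4, f1(x) = f2(x) = 3x/2 - 3/4,
# and f12(x1, x2) = (x1 - 1/2)(x2 - 1/2).
```

The midpoint rule is exact here because $f$ is linear in each argument; for a general network the integrals would need either finer quadrature or the analytical route the paper proposes.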

Article 65

Title@2025-07-31 (4): Explainable artificial intelligence model predicting the risk of all-cause mortality in patients with type 2 diabetes mellitus

Title: Explainable artificial intelligence model predicting the risk of all-cause mortality in patients with type 2 diabetes mellitus

Erklärbares Modell für künstliche Intelligenz zur Vorhersage des Risikos einer Gesamtsterblichkeit bei Patienten mit Typ-2-Diabetes mellitus

可解释的人工智能模型,预测2型糖尿病患者因各种原因死亡的风险 2507.23491v1

Authors (10): Olga Vershinina, Jacopo Sabbatinelli, Anna Rita Bonfigli, Dalila Colombaretti, Angelica Giuliani, Mikhail Krivonosov, Arseniy Trukhanov, Claudio Franceschi, Mikhail Ivanchenko, Fabiola Olivieri

Objective. Type 2 diabetes mellitus (T2DM) is a highly prevalent non-communicable chronic disease that substantially reduces life expectancy. Accurate estimation of all-cause mortality risk in T2DM patients is crucial for personalizing and optimizing treatment strategies. Research Design and Methods. This study analyzed a cohort of 554 patients (aged 40-87 years) with diagnosed T2DM over a maximum follow-up period of 16.8 years, during which 202 patients (36%) died. Key survival-associated features were identified, and multiple machine learning (ML) models were trained and validated to predict all-cause mortality risk. To improve model interpretability, Shapley additive explanations (SHAP) were applied to the best-performing model. Results. The extra survival trees (EST) model, incorporating ten key features, demonstrated the best predictive performance. The model achieved a C-statistic of 0.776, with area under the receiver operating characteristic curve (AUC) values of 0.86, 0.80, 0.841, and 0.826 for 5-, 10-, 15-, and 16.8-year all-cause mortality predictions, respectively. The SHAP approach was employed to interpret the model’s individual decision-making processes. Conclusions. The developed model exhibited strong predictive performance for mortality risk assessment. Its clinically interpretable outputs enable potential bedside application, improving the identification of high-risk patients and supporting timely treatment optimization.

目标:2型糖尿病(T2DM)是一种高度流行的非传染慢性疾病,大大缩短了预期寿命;准确估计T2DM病人所有原因的死亡风险对于个性化和优化治疗战略至关重要;研究设计和方法:这项研究分析了554名病人(40-87岁)的组群(40-87岁)在最多后续16.8年期间被诊断出T2DM(36 % ) ,在此期间202名病人(36%)死亡;查明了关键生存相关特征,对多机学习模型进行了培训和验证,以预测所有原因的死亡率风险;为了提高模型的可解释性,对最佳表现模式应用了SHAP(SHAP) 添加解释(SHAP) ;结果:包含10个关键特征的额外生存树(EST)模型显示了最佳的预测性表现;模型实现了0.776的C统计,在接收者特征曲线下,5-、10、15-和16.8年多机床(ML)的死亡率预测模型值;分别对5-10、15和16.8年所有原因的死亡率预测模型进行了培训和验证;分别采用SHPSHE(SHIP)模型的预测方法,以解释其快速预测。
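
The C-statistic reported in Article 65 measures how well predicted risks order patients by survival time. The sketch below implements Harrell's standard concordance index for right-censored data; the abstract does not state which estimator the authors used, so this is an assumption, and the toy data are invented purely for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by predicted risk.

    A pair (i, j) is comparable when subject i had the event (events[i] is truthy)
    and a strictly shorter follow-up time than subject j. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (years), event indicators (1 = died), predicted risks.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
c_perfect = concordance_index(times, events, risks)
c_reversed = concordance_index(times, events, risks[::-1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported 0.776.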

Article 66

Title@2025-07-31 (4): On the Approximation of Stationary Processes using the ARMA Model

Title: On the Approximation of Stationary Processes using the ARMA Model

Zur Annäherung von stationären Prozessen mit dem ARMA-Modell

使用ARMA模型的固定工艺接近情况 2408.10610v4

Authors (3): Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan

We look at a problem related to Autoregressive Moving Average (ARMA) models: quantifying the approximation error between a true stationary process $X_t$ and an ARMA model $Y_t$. We take the transfer function representation $x(L)$ of a stationary process $X_t$ and show that the $L^{\infty}$ norm of $x$ acts as a valid norm on $X_t$ that controls the $\ell^2$ norm of its Wold coefficients. We then show that a certain subspace of stationary processes, which includes ARMA models, forms a Banach algebra under the $L^{\infty}$ norm that respects the multiplicative structure of $H^{\infty}$ transfer functions and thus improves on the structural properties of the cepstral norm for ARMA models. The natural definition of invertibility in this algebra is consistent with the original definition of ARMA invertibility, and generalizes better to non-ARMA processes than Wiener’s $\ell^1$ condition. Finally, we calculate some explicit approximation bounds in the simpler context of continuous transfer functions, and critique some heuristic ideas on Padé approximations and parsimonious models.

我们研究了与自动递减平均移动(ARMA)模型有关的问题,即如何量化真实固定过程$X$t美元和ARMA模型$Y美元之间的近似差差。我们采用固定过程$X$美元的转移函数代表$x(L)美元标准,并显示美元x美元标准对于控制其沃尔德系数的美元值标准值的X美元值来说是一种有效的规范。然后我们展示了某种固定过程的子空间,其中包括ARMA模型,在$Linfty}标准下形成一个Banach代数,该标准尊重$Hinfty美元转移功能的倍复制结构,从而改进了ARMA模型的 cepstral规范的结构特性。该代数中不可忽略的自然定义符合ARMA不可省系数的原始定义,并比Wiener的 $ell_1美元标准值更好地适用于非ARMA进程。最后,我们计算了某些直截面直截面的转移功能,以及某些直截面的直截面方向模型。
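
The statement in Article 66 that the $L^{\infty}$ norm of the transfer function controls the $\ell^2$ norm of the Wold coefficients can be checked numerically for an ARMA(1,1) model with transfer function $x(L) = (1 + \theta L)/(1 - \phi L)$, $|\phi| < 1$. The parameter values and function names below are illustrative assumptions, and the supremum is approximated on a finite grid of the unit circle.

```python
import cmath
import math


def transfer(phi, theta, z):
    """ARMA(1,1) transfer function x(z) = (1 + theta z) / (1 - phi z)."""
    return (1 + theta * z) / (1 - phi * z)


def linf_norm(phi, theta, grid=4096):
    """Grid approximation of sup |x(e^{i w})| over the unit circle."""
    return max(
        abs(transfer(phi, theta, cmath.exp(1j * 2 * math.pi * k / grid)))
        for k in range(grid)
    )


def wold_l2_norm(phi, theta):
    """l2 norm of the Wold coefficients psi_0 = 1, psi_k = (phi + theta) phi^(k-1)."""
    return math.sqrt(1 + (phi + theta) ** 2 / (1 - phi ** 2))


phi, theta = 0.5, 0.4
sup_norm = linf_norm(phi, theta)      # attained at w = 0 for positive phi and theta
l2_norm = wold_l2_norm(phi, theta)    # sqrt(2.08)
```

By Parseval's identity the $\ell^2$ norm of the coefficients equals the normalized $L^2$ norm of $x$ on the circle, which can never exceed its supremum; the numbers above are one instance of that inequality.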

Article 67

Title@2025-07-31 (4): Machine learning and machine learned prediction in chest X-ray images

Title: Machine learning and machine learned prediction in chest X-ray images

Maschinelles Lernen und maschinell gelernte Vorhersagen in Röntgenbildern in der Brust

胸部X光图像中的机器学习和机器学习预测 2507.23455v1

Authors (5): Shereiff Garrett, Abhinav Adhikari, Sarina Gautam, DaShawn Marquis Morris, Chandra Mani Adhikari

Machine learning and artificial intelligence are fast-growing fields of research in which data is used to train algorithms, learn patterns, and make predictions. By recognizing complex relationships in data, this approach helps to solve seemingly intricate problems with significant accuracy without explicit programming. Using a dataset of 5824 chest X-ray images, we implement two machine learning algorithms, namely a baseline convolutional neural network (CNN) and a DenseNet-121, and present our analysis of machine-learned predictions for identifying patients with ailments. Both the baseline CNN and DenseNet-121 perform very well in the binary classification problem presented in this work. Gradient-weighted class activation mapping shows that DenseNet-121 focuses on essential parts of the input chest X-ray images in its decision-making more correctly than the baseline CNN.

机器学习和人工智能是迅速发展的研究领域,其中数据被用于培训算法、学习模式和预测。这种方法有助于通过识别数据中的复杂关系,通过识别5824个胸前X光图像的例子,通过识别数据中的复杂关系,以相当准确的方式解决看似复杂的问题。我们采用了两种机器学习算法,即基线进化神经网络(CNN)和DenseNet-121, 并介绍了我们在预测疾病患者方面进行机器学习预测的分析。基线CNN和DenseNet-121在这项工作提出的二元分类问题中表现得非常好。加权级激活绘图显示,DenseNet-121在决策中正确地侧重于输入的胸X光图像的基本部分,而不是基线CNN。

Article 68

Title@2025-07-31 (4): Manifold-regularised Signature Kernel Large-Margin $\ell_p$-SVDD for Multidimensional Time Series Anomaly Detection

Title: Manifold-regularised Signature Kernel Large-Margin $\ell_p$-SVDD for Multidimensional Time Series Anomaly Detection

Manifold-regularisierte Signatur-Kernel Large-Margin $\ell_p$-SVDD für mehrdimensionale Zeitreihenanomalienerkennung

用于多层时间序列异常探测的大型内核 $\ ell_ p$- SVDD $\ ell_ p$- SVDD 2507.23449v1

Authors (1): Shervin Rahimzadeh Arashloo

We generalise the recently introduced large-margin $\ell_p$-SVDD approach to exploit the geometry of the data distribution via manifold regularisation and a signature kernel representation for time series anomaly detection. Specifically, we formulate a manifold-regularised variant of the $\ell_p$-SVDD method to encourage label smoothness on the underlying manifold to capture structural information for improved detection performance. Drawing on an existing Representer theorem, we then provide an effective optimisation technique for the proposed method and show that it can benefit from the signature kernel to capture time series complexities for anomaly detection. We theoretically study the proposed approach using Rademacher complexities to analyse its generalisation performance and also provide an experimental assessment of the proposed method across various data sets to compare its performance against other methods.

具体地说,我们制定了一个多重常规变式,用于分析反常现象,鼓励在基本元件上贴上光滑标签,以获取结构信息,改善探测性能。然后,我们利用现有的代表理论,为拟议方法提供一种有效的优化技术,并表明它可从签字内核中受益,以捕捉异常现象探测的时间序列复杂性。我们理论上研究使用Rademacher复杂情况分析其一般化性能的拟议方法,并同时提供对各数据集的拟议方法的实验性评估,以比较其与其他方法的性能。

Article 69

Title@2025-07-31 (4): Adjoint-Based Aerodynamic Shape Optimization with a Manifold Constraint Learned by Diffusion Models

Title: Adjoint-Based Aerodynamic Shape Optimization with a Manifold Constraint Learned by Diffusion Models

Adjoint-Based Aerodynamic Shape Optimization mit einer Manifold Constraint durch Diffusion Modelle gelernt

以联合为基础的空气动力学元件优化,通过扩散模型进行控制 2507.23443v1

Authors (6): Long Chen, Emre Oezkaya, Jan Rottmayer, Nicolas R. Gauger, Zebang Shen, Yinyu Ye

We introduce an adjoint-based aerodynamic shape optimization framework that integrates a diffusion model trained on existing designs to learn a smooth manifold of aerodynamically viable shapes. This manifold is enforced as an equality constraint to the shape optimization problem. Central to our method is the computation of adjoint gradients of the design objectives (e.g., drag and lift) with respect to the manifold space. These gradients are derived by first computing shape derivatives with respect to conventional shape design parameters (e.g., Hicks-Henne parameters) and then backpropagating them through the diffusion model to its latent space via automatic differentiation. Our framework preserves mathematical rigor and can be integrated into existing adjoint-based design workflows with minimal modification. Demonstrated on extensive transonic RANS airfoil design cases using off-the-shelf and general-purpose nonlinear optimizers, our approach eliminates ad hoc parameter tuning and variable scaling, maintains robustness across initialization and optimizer choices, and achieves superior aerodynamic performance compared to conventional approaches. This work establishes how AI-generated priors integrate effectively with adjoint methods to enable robust, high-fidelity aerodynamic shape optimization through automatic differentiation.

我们引入了基于联合的空气动力形状优化框架, 整合了经过现有设计培训的传播模型, 以学习空气动力学上可行的形状的平滑成形体。该元体是作为形状优化问题的平等制约而强制实施的。我们方法的核心是计算设计目标(例如拖动和升动)与多元空间的双向梯度。这些梯度是先在常规形状设计参数(例如Hicks-Henne参数)方面计算成形形衍生物的形状衍生物,然后通过自动区分法,通过扩散模型将其反射到其潜在空间。我们的框架保存数学钻孔,并且可以纳入现有的基于联合的设计工作流程中,但只有最小的修改。在使用现成和通用的非线性非线性优化器的大型透音式RANS空气 foil设计案例上演示。我们的方法消除了常规形状设计参数调整和可变缩缩放,在初始化和优化中保持稳健性, 并与常规方法相比, 实现高压性动力性性性性性表现。这项工作确定了AI 如何有效地将前期与基于联合的模型整合, 实现高性优化。
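
The gradient pipeline described in Article 69 (shape derivatives with respect to design parameters, then backpropagation to the latent space) is an application of the chain rule $\nabla_z J = (\partial \alpha / \partial z)^{\top} \nabla_{\alpha} J$. The sketch below applies it by hand to a toy decoder $\alpha = \tanh(Wz)$ and a toy objective; both are stand-ins chosen for illustration, not the paper's diffusion model, adjoint solver, or use of automatic differentiation.

```python
import math


def decode(W, z):
    """Toy decoder: shape parameters alpha = tanh(W z)."""
    return [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in W]


def objective(alpha):
    """Toy stand-in for an aerodynamic objective such as drag."""
    return sum(a * a for a in alpha)


def objective_grad(alpha):
    """Plays the role of the adjoint-computed derivative dJ/dalpha."""
    return [2.0 * a for a in alpha]


def latent_grad(W, z):
    """Chain rule: dJ/dz_j = sum_i dJ/dalpha_i * (1 - alpha_i^2) * W_ij."""
    alpha = decode(W, z)
    dj_dalpha = objective_grad(alpha)
    return [
        sum(dj_dalpha[i] * (1.0 - alpha[i] ** 2) * W[i][j] for i in range(len(W)))
        for j in range(len(z))
    ]


W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.7]]  # hypothetical decoder weights
z = [0.3, -0.6]                             # hypothetical latent design vector
grad_z = latent_grad(W, z)
```

An optimizer working in the latent space only needs `grad_z`, so an existing adjoint code that already returns the derivative with respect to the shape parameters can be reused unchanged.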

Article 70

Title@2025-07-31 (4): Coflex: Enhancing HW-NAS with Sparse Gaussian Processes for Efficient and Scalable DNN Accelerator Design

Title: Coflex: Enhancing HW-NAS with Sparse Gaussian Processes for Efficient and Scalable DNN Accelerator Design

Coflex: Verbesserung von HW-NAS mit Sparse Gaussian Prozessen für effizientes und skalierbares DNN Accelerator Design

Coflex:加强HW-NAS,并配有用于高效和可缩放 DNN 加速器设计的斯普尔斯高斯进程 2507.23437v1

Authors (5): Yinhui Ma, Tomomasa Yamasaki, Zhehui Wang, Tao Luo, Bo Wang

Hardware-Aware Neural Architecture Search (HW-NAS) is an efficient approach to automatically co-optimizing neural network performance and hardware energy efficiency, making it particularly useful for the development of Deep Neural Network accelerators on the edge. However, the extensive search space and high computational cost pose significant challenges to its practical adoption. To address these limitations, we propose Coflex, a novel HW-NAS framework that integrates the Sparse Gaussian Process (SGP) with multi-objective Bayesian optimization. By leveraging sparse inducing points, Coflex reduces the GP kernel complexity from cubic to near-linear with respect to the number of training samples, without compromising optimization performance. This enables scalable approximation of large-scale search spaces, substantially decreasing computational overhead while preserving high predictive accuracy. We evaluate the efficacy of Coflex across various benchmarks, focusing on accelerator-specific architectures. Our experimental results show that Coflex outperforms state-of-the-art methods in terms of network accuracy and Energy-Delay-Product, while achieving a computational speed-up ranging from 1.9x to 9.5x.

硬件感知神经架构搜索(HW-NAS)是一种自动协同优化神经网络性能和硬件能效的有效方法,对于在边缘侧开发深度神经网络加速器特别有用。然而,庞大的搜索空间和高昂的计算成本对其实际应用构成了重大挑战。为解决这些局限,我们提出了Coflex,这是一个将稀疏高斯过程(SGP)与多目标贝叶斯优化相结合的新型HW-NAS框架。通过利用稀疏诱导点,Coflex将GP核的复杂度相对于训练样本数从三次方降低到近线性,同时不损害优化性能。这使得对大规模搜索空间的近似具有可扩展性,在保持高预测精度的同时大幅降低计算开销。我们在多种基准上评估了Coflex的效果,重点关注加速器专用架构。实验结果表明,Coflex在网络精度和能量延迟积(Energy-Delay-Product)方面优于最先进的方法,同时实现了1.9倍至9.5倍的计算加速。
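
As a rough illustration of the cubic-to-near-linear claim, the sketch below compares textbook leading-order operation counts for an exact GP (about n³ to factorize the n×n kernel matrix) with a standard inducing-point approximation (about n·m² for m inducing points). The function names and the choice m = 64 are hypothetical, constant factors are ignored, and this is not Coflex's implementation.

```python
def exact_gp_cost(n: int) -> int:
    """Leading-order cost of factorizing an n x n kernel matrix."""
    return n ** 3


def sparse_gp_cost(n: int, m: int) -> int:
    """Leading-order cost with m inducing points (m << n): n * m^2."""
    return n * m ** 2


def speedup(n: int, m: int) -> float:
    """Ratio of the two leading-order costs."""
    return exact_gp_cost(n) / sparse_gp_cost(n, m)


M = 64  # hypothetical number of inducing points
for n in (1_000, 10_000, 100_000):
    # For fixed m, the sparse cost grows linearly in n.
    print(n, exact_gp_cost(n), sparse_gp_cost(n, M), round(speedup(n, M), 1))
```

With m held fixed, doubling n doubles the sparse cost but multiplies the exact cost by eight, which is the scaling behavior the abstract refers to.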

Article 71

Title@2025-07-31 (4): A ZeNN architecture to avoid the Gaussian trap

Title: A ZeNN architecture to avoid the Gaussian trap

Eine ZeNN-Architektur, um die Gaussische Falle zu vermeiden

避免高斯陷阱的 ZeNN 建筑 2505.20553v2

Authors (4): Luís Carvalho, João L. Costa, José Mourão, Gonçalo Oliveira

We propose a new simple architecture, Zeta Neural Networks (ZeNNs), in order to overcome several shortcomings of standard multi-layer perceptrons (MLPs). Namely, in the large-width limit, MLPs are non-parametric, they do not have a well-defined pointwise limit, they lose non-Gaussian attributes and become unable to perform feature learning; moreover, finite-width MLPs perform poorly in learning high frequencies. The new ZeNN architecture is inspired by three simple principles from harmonic analysis: i) Enumerate the perceptrons and introduce a non-learnable weight to enforce convergence; ii) Introduce a scaling (or frequency) factor; iii) Choose activation functions that lead to near-orthogonal systems. We will show that these ideas allow us to fix the aforementioned shortcomings of MLPs. In fact, in the infinite-width limit, ZeNNs converge pointwise, exhibit a rich asymptotic structure beyond Gaussianity, and perform feature learning. Moreover, when appropriate activation functions are chosen, (finite-width) ZeNNs excel at learning high-frequency features of functions with low-dimensional domains.

我们建议一个新的简单架构,Zeta Neural 网络(ZeNNSs),以克服标准多层立方体(MLPs)的若干缺陷。也就是说,在宽宽度限制下, MLP是非参数性参数,它们没有明确界定的点向限制,它们失去了非Gausian属性,无法进行特征学习;此外,有限的宽度MLP在学习高频率方面表现不佳。新的ZeNNS结构受调心分析的三个简单原则的启发:i) 点数概念,并引入不可忽略的重量,以强制趋同;ii) 引入一个缩放(或频度)系数;iii) 选择导致接近正方形系统的激活功能。我们将表明这些理念允许我们修正 MLPs所推荐的缺陷。事实上,在宽度限制下,ZeNNUS会聚集点,它们展现出超出高比值的丰富惯性结构,并进行特征学习。此外,当选择适当的激活功能时,在高空空间学习的特性下,(固定宽度)将显示高空域。

Article 72

Title@2025-07-31 (4): Merging Memory and Space: A Spatiotemporal State Space Neural Operator

Title: Merging Memory and Space: A Spatiotemporal State Space Neural Operator

Zusammenführen von Speicher und Raum: Ein räumlich-temporaler Zustandsraum-Neural-Betreiber

合并的记忆与空间:一个瞬间国家空间神经操作员 2507.23428v1

Authors (2): Nodens F. Koren, Samuel Lanthaler

We propose the Spatiotemporal State Space Neural Operator (ST-SSM), a compact architecture for learning solution operators of time-dependent partial differential equations (PDEs). ST-SSM introduces a novel factorization of the spatial and temporal dimensions, using structured state-space models to independently model temporal evolution and spatial interactions. This design enables parameter efficiency and flexible modeling of long-range spatiotemporal dynamics. A theoretical connection is established between SSMs and neural operators, and a unified universality theorem is proved for the resulting class of architectures. Empirically, we demonstrate that our factorized formulation outperforms alternative schemes such as zigzag scanning and parallel independent processing on several PDE benchmarks, including 1D Burgers’ equation, 1D Kuramoto-Sivashinsky equation, and 2D Navier-Stokes equations under varying physical conditions. Our model performs competitively with existing baselines while using significantly fewer parameters. In addition, our results reinforce previous findings on the benefits of temporal memory by showing improved performance under partial observability. Our results highlight the advantages of dimensionally factorized operator learning for efficient and generalizable PDE modeling, and put this approach on a firm theoretical footing.

我们提议采用Spatotoental State Spatolophy Neal Constrict (ST-SSM),这是一个学习基于时间的局部差异方程(PDEs)操作者学习解决方案的紧凑架构。ST-SSSM采用结构化的州空间模型,对空间和时空互动进行新的因素化,使用结构化的州空间模型独立模拟时间进化和空间互动。这种设计可以使长距离超时空间时空动态的参数效率和灵活建模。 Spatives和神经操作者之间建立了理论联系,并证明由此产生的建筑类别具有统一的普遍性。我们生动地表明,我们的因子化的公式优于Zigzag扫描和若干PDE基准的平行独立处理等替代方案,包括1D Burgers 等、 1D Kuramoto- Sivashinsky 等式和 2D Navier-Stoks 等式在不同物理条件下独立处理。我们的模型与现有基线竞争,同时使用显著的参数。此外,我们的结果通过显示部分可防守状态下改进的性,加强了对时间记忆的好处,从而强化了先前的发现。我们的结果强调了公司在高效和通用的理论基础上学习的分位化理论的理论的优点的优点的好处。

Article 73

Title@2025-07-31 (4): Identifying Super Spreaders in Multilayer Networks

Title: Identifying Super Spreaders in Multilayer Networks

Identifizieren von Superspreizern in Multilayer-Netzwerken

识别多层网络中的超级传播器 2505.20980v2

Authors (5): Michał Czuba, Mateusz Stolarski, Adam Piróg, Piotr Bielak, Piotr Bródka

Identifying super-spreaders can be framed as a subtask of the influence maximisation problem. It seeks to pinpoint agents within a network that, if selected as single diffusion seeds, disseminate information most effectively. Multilayer networks, a specific class of heterogeneous graphs, can capture diverse types of interactions (e.g., physical-virtual or professional-social), and thus offer a more accurate representation of complex relational structures. In this work, we introduce a novel approach to identifying super-spreaders in such networks by leveraging graph neural networks. To this end, we construct a dataset by simulating information diffusion across hundreds of networks - to the best of our knowledge, the first of its kind tailored specifically to multilayer networks. We further formulate the task as a variation of the ranking prediction problem based on a four-dimensional vector that quantifies each agent’s spreading potential: (i) the number of activations; (ii) the duration of the diffusion process; (iii) the peak number of activations; and (iv) the simulation step at which this peak occurs. Our model, TopSpreadersNetwork, comprises a relationship-agnostic encoder and a custom aggregation layer. This design enables generalisation to previously unseen data and adapts to varying graph sizes. In an extensive evaluation, we compare our model against classic centrality-based heuristics and competitive deep learning methods. The results, obtained across a broad spectrum of real-world and synthetic multilayer networks, demonstrate that TopSpreadersNetwork achieves superior performance in identifying high-impact nodes, while also offering improved interpretability through its structured output.

识别超级扩散器可以被设计成影响最大化问题的子任务。它试图在网络内定位一个代理器, 如果选择作为单一的传播种子, 最有效地传播信息。多层网络, 是一个特殊的多层图表, 可以捕捉不同类型的互动( 例如物理虚拟或专业- 社会), 从而更准确地表达复杂的关系结构。在这项工作中, 我们引入一种新颖的方法, 通过利用图形神经网络来识别这些网络中的超级扩散器。为此, 我们通过模拟数百个网络的信息传播来构建一个数据集。这是我们的知识中最优秀的, 这是第一个专门为多层网络定制的高级网络。我们进一步将这项任务设计成基于四维矢量的预测问题的变异性( 比如物理虚拟或专业- 社交- 社会 ) , 从而提供复杂的关系结构; 激活次数; (二) 传播过程的持续时间; (三) 激活的峰值; 和 (四) 这个峰值的模拟步骤。我们的模型, 顶层SreadterworkNetwork, 包括一个基于我们历史结构的深度的网络, 以及一个高层次的变换版数据。
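
The four-dimensional spreading-potential vector described in this abstract can be sketched with a toy simulation. The diffusion rule below (an independent-cascade process in which a node may activate its neighbours on any layer with probability `p`) and every name in the code are assumptions made for illustration; the abstract does not specify the simulation protocol.

```python
import random


def spreading_vector(layers, seed_node, p=1.0, rng=None):
    """Return (activations, duration, peak, peak_step) for one seed node.

    layers: dict mapping layer name -> adjacency dict {node: [neighbours]}.
    Counts include the seed itself, which is activated at step 0.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    active = {seed_node}
    frontier = [seed_node]
    new_per_step = [1]  # step 0: only the seed is active
    while frontier:
        newly = []
        for u in frontier:
            for adjacency in layers.values():
                for v in adjacency.get(u, ()):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        newly.append(v)
        if newly:
            new_per_step.append(len(newly))
        frontier = newly
    peak = max(new_per_step)
    return (len(active), len(new_per_step) - 1, peak, new_per_step.index(peak))


# Hypothetical two-layer network.
toy_layers = {
    "physical": {0: [1, 2], 1: [0, 4], 2: [0], 4: [1]},
    "virtual": {0: [3], 3: [0]},
}
print(spreading_vector(toy_layers, seed_node=0))
print(spreading_vector(toy_layers, seed_node=4))
```

Ranking all nodes by such vectors, averaged over repeated stochastic runs, would give the kind of target the ranking-prediction formulation needs; how the paper aggregates them is not stated in the abstract.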

Article 74

Title@2025-07-31 (4): Detection of Adulteration in Coconut Milk using Infrared Spectroscopy and Machine Learning

Title: Detection of Adulteration in Coconut Milk using Infrared Spectroscopy and Machine Learning

Erkennung von Verwechslungen in Kokosmilch mittels Infrarotspektroskopie und maschinellem Lernen

利用红外光谱和机器学习检测椰奶掺假 2507.23418v1

Authors (2): Mokhtar A. Al-Awadhi, Ratnadeep R. Deshmukh

In this paper, we propose a system for detecting adulteration in coconut milk using infrared spectroscopy. The proposed machine learning-based system comprises three phases: preprocessing, feature extraction, and classification. The first phase involves removing irrelevant data from coconut milk spectral signals. In the second phase, we employ the Linear Discriminant Analysis (LDA) algorithm to extract the most discriminating features. In the third phase, we use the K-Nearest Neighbor (KNN) model to classify coconut milk samples as authentic or adulterated. We evaluate the performance of the proposed system using a public dataset comprising Fourier Transform Infrared (FTIR) spectral information of pure and contaminated coconut milk samples. Findings show that the proposed method successfully detects adulteration with a cross-validation accuracy of 93.33%.

在本文中,我们提出利用红外线光谱检查椰子牛奶通奸的系统。机器学习系统包括三个阶段:预处理、地物提取和分类。第一阶段是从椰子牛奶光谱信号中去除无关的数据。在第二阶段,我们采用线性差异分析算法(LDA)来提取最有区别的特性。在第三阶段,我们使用K-Nearest Neearbearbor(KNN)模型来将椰子牛奶样品分类为真实的或混合的。我们利用由Fourier变换红外线(FTIR)光谱信息组成的公共数据集来评估拟议系统的性能。调查结果显示,拟议的方法成功地检测了交叉估价准确度93.33%的通奸。
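
The feature-extraction and classification phases described in this abstract can be sketched in plain Python. The example below is a minimal two-class Fisher LDA on two-dimensional inputs followed by KNN on the projected feature, scored with leave-one-out cross-validation. The toy data, the restriction to two features, `k = 3`, and the leave-one-out protocol are all assumptions for illustration; the paper's spectra and validation scheme are not reproduced here.

```python
from collections import Counter


def fisher_direction(X, y):
    """Two-class Fisher LDA direction w = Sw^{-1} (mu1 - mu0) for 2-D inputs."""
    groups = {c: [x for x, t in zip(X, y) if t == c] for c in (0, 1)}
    mu = {c: [sum(col) / len(g) for col in zip(*g)] for c, g in groups.items()}
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter matrix
    for c, g in groups.items():
        for x in g:
            d = [x[0] - mu[c][0], x[1] - mu[c][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [mu[1][0] - mu[0][0], mu[1][1] - mu[0][1]]
    # Closed-form 2x2 inverse applied to the difference of class means.
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]


def knn_predict(train_z, train_y, z, k=3):
    """Majority vote among the k nearest projected training samples."""
    nearest = sorted(zip(train_z, train_y), key=lambda pair: abs(pair[0] - z))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy; LDA is refit on each training fold."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        w = fisher_direction(X_train, y_train)

        def project(x):
            return w[0] * x[0] + w[1] * x[1]

        predicted = knn_predict([project(x) for x in X_train], y_train, project(X[i]), k)
        hits += int(predicted == y[i])
    return hits / len(X)


# Hypothetical, well-separated toy samples: 0 = authentic, 1 = adulterated.
X_toy = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1],
         [3.0, 0.5], [3.2, 0.7], [2.9, 0.4], [3.1, 0.6]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
print(loo_accuracy(X_toy, y_toy))
```

Refitting the projection inside each fold keeps the held-out sample out of the feature-extraction step, which avoids an optimistic bias in the cross-validation estimate.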

Article 75

Title@2025-07-31 (4): Honey Adulteration Detection using Hyperspectral Imaging and Machine Learning

Title: Honey Adulteration Detection using Hyperspectral Imaging and Machine Learning

Honey Adulteration Detection mit Hyperspektrale Bildgebung und maschinelles Lernen

利用高光谱成像和机器学习检测蜂蜜掺假 2507.23416v1

Authors (2): Mokhtar A. Al-Awadhi, Ratnadeep R. Deshmukh

This paper aims to develop a machine learning-based system for automatically detecting honey adulteration with sugar syrup, based on honey hyperspectral imaging data. First, the floral source of a honey sample is classified by a botanical origin identification subsystem. Then, the sugar syrup adulteration is identified, and its concentration is quantified by an adulteration detection subsystem. Both subsystems consist of two steps. The first step involves extracting relevant features from the honey sample using Linear Discriminant Analysis (LDA). In the second step, we utilize the K-Nearest Neighbors (KNN) model to classify the honey botanical origin in the first subsystem and identify the adulteration level in the second subsystem. We assess the performance of the proposed system on a public honey hyperspectral image dataset. The results indicate that the proposed system can detect adulteration in honey with an overall cross-validation accuracy of 96.39%, making it an appropriate alternative to the current chemical-based detection methods.

本文旨在根据蜂蜜超光谱成像数据,开发一个机器学习系统,以自动检测蜂蜜与糖浆混合的蜂蜜。首先,蜂蜜样本的花粉来源由植物源识别子系统分类。然后,确定糖糖浆掺杂,将其浓度量化为掺拌检测子系统。两个子系统由两步组成。第一步是使用线性分辨分析(LDA)从蜂蜜样本中提取相关特征。第二步是,我们使用K-Nearest Neigbors(KNN)模型,对第一个子系统中的蜂蜜植物来源进行分类,并确定第二个子系统中的掺拌水平。我们评估了拟议的系统在公众蜂蜜超光谱图像数据集上的系统性能。结果显示,拟议的系统可以用96.39%的总体交叉校准精确度检测蜂蜜中的掺杂情况,从而成为目前基于化学的检测方法的适当替代方法。

Article 76

Title@2025-07-31 (4): A Machine Learning Approach for Honey Adulteration Detection using Mineral Element Profiles

Title: A Machine Learning Approach for Honey Adulteration Detection using Mineral Element Profiles

Ein maschineller Lernansatz für die Erkennung von Honig-Adulteration mittels Mineralelement-Profilen

利用矿物元素谱进行蜂蜜掺假检测的机器学习方法 2507.23412v1

Authors (2): Mokhtar A. Al-Awadhi, Ratnadeep R. Deshmukh

This paper aims to develop a Machine Learning (ML)-based system for detecting honey adulteration utilizing honey mineral element profiles. The proposed system comprises two phases: preprocessing and classification. The preprocessing phase involves the treatment of missing-value attributes and normalization. In the classification phase, we use three supervised ML models: logistic regression, decision tree, and random forest, to discriminate between authentic and adulterated honey. To evaluate the performance of the ML models, we use a public dataset comprising measurements of mineral element content of authentic honey, sugar syrups, and adulterated honey. Experimental findings show that mineral element content in honey provides robust discriminative information for detecting honey adulteration. Results also demonstrate that the random forest-based classifier outperforms other classifiers on this dataset, achieving the highest cross-validation accuracy of 98.37%.

本文旨在开发一个基于机器学习(ML)的系统,用于利用蜂蜜矿元素剖面探测蜂蜜涂层。拟议系统包括两个阶段:预处理和分类。预处理阶段涉及处理缺失的价值属性和正常化。在分类阶段,我们使用三种受监督的ML模型:后勤回归、决策树和随机森林,以区分真实的和混合的蜂蜜。为了评估ML模型的性能,我们使用一个由测量真实蜂蜜、糖糖浆和配制蜂蜜的矿物元素内容组成的公共数据集。实验结果显示,蜂蜜中的矿物元素含量为检测蜂蜜涂层提供了有力的歧视性信息。结果还表明,随机森林的分类器比其他分类器在这个数据集上更符合其他分类,达到98.37%的最高交叉校准精确度。
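
The preprocessing phase mentioned in this abstract (missing-value treatment followed by normalization) might look like the sketch below. The abstract does not say which imputation or scaling method is used, so column-mean imputation and min-max scaling are assumptions, as are the function names and the toy measurements.

```python
def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        means.append(sum(observed) / len(observed))
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def min_max(rows):
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical mineral measurements with gaps (None marks a missing value).
samples = [[1.0, 10.0], [None, 30.0], [3.0, None]]
print(min_max(impute_mean(samples)))
```

In a cross-validated evaluation, the column means and ranges would normally be computed on the training fold only and then applied to the held-out fold.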

Article 77

Title@2025-07-31 (4): Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline

Title: Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline

Effiziente Schmerzerkennung durch Respirationssignale: Eine einzige Cross-Attention Transformer Multi-Window Fusion Pipeline

通过呼吸信号进行有效的疼痛识别:单一交叉感应变异器多窗口融合管道 2507.21886v3

Authors (3): Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis

Pain is a complex condition affecting a large portion of the population. Accurate and consistent evaluation is essential for individuals experiencing pain, and it supports the development of effective and advanced management strategies. Automatic pain assessment systems provide continuous monitoring and support clinical decision-making, aiming to reduce distress and prevent functional decline. This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed method introduces a pipeline that leverages respiration as the input signal and incorporates a highly efficient cross-attention transformer alongside a multi-windowing strategy. Extensive experiments demonstrate that respiration is a valuable physiological modality for pain assessment. Moreover, experiments revealed that compact and efficient models, when properly optimized, can achieve strong performance, often surpassing larger counterparts. The proposed multi-window approach effectively captures both short-term and long-term features, as well as global characteristics, thereby enhancing the model’s representational capacity.

准确和一致的评价对于遭受疼痛的个人至关重要,它支持制定有效和先进的管理战略。自动疼痛评估系统提供持续的监测和支持临床决策,目的是减少危难并防止功能下降。这项研究已经提交给下一轮癌症评估的“第二次多式遥感大挑战” 。拟议方法引入了一条管道,利用呼吸作为输入信号,并结合高效的交叉注意变压器,同时采用多窗口战略。广泛的实验表明,呼吸是评估疼痛的一种宝贵的生理方式。此外,实验还显示,如果适当优化,紧凑和有效的模型能够取得强效,往往超过较大的对应模型。拟议的多窗口方法有效地捕捉了短期和长期的特征以及全球特征,从而增强了模型的代表性能力。

Article 78

Title@2025-07-31 (4): Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

Title: Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

Künstliche induktive Bias für die synthetische tabellarische Datengenerierung in Data-Scarce-Szenarien

数据碎片假设情景中合成图示数据生成人工诱导比值 2407.03080v2

Authors (5): Patricia A. Apellániz, Ana Jiménez, Borja Arroyo Galende, Juan Parras, Santiago Zazo

While synthetic tabular data generation using Deep Generative Models (DGMs) offers a compelling solution to data scarcity and privacy concerns, their effectiveness relies on the availability of substantial training data, often lacking in real-world scenarios. To overcome this limitation, we propose a novel methodology that explicitly integrates artificial inductive biases into the generative process to improve data quality in low-data regimes. Our framework leverages transfer learning and meta-learning techniques to construct and inject informative inductive biases into DGMs. We evaluate four approaches (pre-training, model averaging, Model-Agnostic Meta-Learning (MAML), and Domain Randomized Search (DRS)) and analyze their impact on the quality of the generated text. Experimental results show that incorporating inductive bias substantially improves performance, with transfer learning methods outperforming meta-learning, achieving up to 60% gains in Jensen-Shannon divergence. The methodology is model-agnostic and especially relevant in domains such as healthcare and finance, where high-quality synthetic data are essential, and data availability is often limited.

虽然利用深创模型(DGM)合成表格数据生成为数据稀缺和隐私问题提供了令人信服的解决办法,但其有效性取决于能否获得大量培训数据,而这种数据往往在现实世界情景中缺乏。为了克服这一局限性,我们提议了一种新的方法,将人工进化偏差明确纳入基因化进程,以提高低数据制度的数据质量。我们的框架利用学习和元学习技术向DGMs转让和注入信息输入偏差。我们评价了四种方法(预培训、模型平均、模型-高级元数据(MAML)和Domain随机搜索(DRS)),并分析了其对生成文本质量的影响。实验结果显示,吸收进化偏差会大大改善绩效,而转移学习方法优于元学习,在Jensen-Shannon差异中实现高达60的收益。该方法具有示范性,在保健和金融等领域特别相关,因为高质量的合成数据至关重要,而且数据提供往往有限。
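
For readers unfamiliar with the metric quoted in this abstract, the sketch below computes the Jensen-Shannon divergence between two discrete distributions in base 2, so values lie in [0, 1]. How the paper estimates the divergence on tabular data is not described in the abstract; the inputs here are toy histograms chosen for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # disjoint supports
print(js_divergence([0.7, 0.3], [0.4, 0.6]))
```

Lower values mean the synthetic distribution is closer to the real one, so a "gain" in this metric corresponds to a reduction of the divergence.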

Article 79

Title: AGA: An adaptive group alignment framework for structured medical cross-modal representation learning

AGA: Ein adaptiver Gruppenausrichtungsrahmen für strukturiertes medizinisches Cross-Modalitäts-Repräsentations-Lernen

AGA:结构化医疗跨模式代表性学习的适应性小组调整框架 2507.23402v1

Authors (4): Wei Li, Xun Gong, Jiao Li, Xiaobin Sun

Learning medical visual representations from paired images and reports is a promising direction in representation learning. However, current vision-language pretraining methods in the medical domain often simplify clinical reports into single entities or fragmented tokens, ignoring their inherent structure. In addition, contrastive learning frameworks typically depend on large quantities of hard negative samples, which is impractical for small-scale medical datasets. To tackle these challenges, we propose Adaptive Grouped Alignment (AGA), a new framework that captures structured semantics from paired medical images and reports. AGA introduces a bidirectional grouping mechanism based on a sparse similarity matrix. For each image-report pair, we compute fine-grained similarities between text tokens and image patches. Each token selects its top-matching patches to form a visual group, and each patch selects its most related tokens to form a language group. To enable adaptive grouping, we design two threshold gating modules, called Language Grouped Threshold Gate and Vision Grouped Threshold Gate, which learn grouping thresholds dynamically. Group representations are computed as weighted averages based on similarity scores. To align each token with its group representation, we introduce an Instance Aware Group Alignment loss that operates within each image-text pair, removing the need for external negatives. Finally, a Bidirectional Cross-modal Grouped Alignment module is applied to enhance fine-grained alignment between visual and linguistic group representations. Extensive experiments on public and private datasets show that our method achieves strong performance on image-text retrieval and classification tasks under both fine-tuning and zero-shot settings.

从配对图像和报告中学习医学视觉表现是代表制学习的一个有希望的方向。然而,医疗领域目前的视觉语言预培训方法往往将临床报告简化为单一实体或零散的象征物,忽视其固有的结构。此外,对比式学习框架通常取决于大量硬性负抽样,这对小规模医疗数据集来说是不切实际的。为了应对这些挑战,我们建议采用适应性分组对齐(AGA)这一新框架,从配对的医疗图像和报告中收集结构性的语义。AGA采用双向组合式组合机制,以稀少的相似性矩阵为基础。对于每个图像报告配对配对,我们将细微和细微的相似的相似性格。每个符号选择其最上匹配的补丁补丁,形成一个视觉组,每个补丁选择其最相关的符号组成一个语言组。为了适应性分组,我们设计两个门槛式模块,称为语言组的临界值门和愿景组的缩略图门,以动态的方式学习组合。对于每个图像配对齐的精确平均数,根据相似的直观性批值和图像配对齐的直径直径直径对齐,每个缩缩缩缩缩的缩缩缩缩缩缩缩图,以显示每个组合的组合之下需要调整一个直径对正对正对整。
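
The token-to-patch grouping step described in this abstract can be illustrated as follows: for one text token, keep the image patches whose similarity exceeds a threshold and form the visual group representation as a similarity-weighted average. In the paper the threshold is learned by a gating module; here it is a fixed hypothetical constant, and the fallback to the single best patch when nothing passes the threshold is likewise an assumption of this sketch.

```python
def group_representation(similarities, patch_features, threshold):
    """Similarity-weighted average of the patches whose similarity exceeds threshold."""
    chosen = [(s, f) for s, f in zip(similarities, patch_features) if s > threshold]
    if not chosen:
        # Assumed fallback: use the best-matching patch so the group is never empty.
        best = max(range(len(similarities)), key=similarities.__getitem__)
        chosen = [(1.0, patch_features[best])]
    total = sum(s for s, _ in chosen)
    dim = len(patch_features[0])
    return [sum(s * f[d] for s, f in chosen) / total for d in range(dim)]


# Hypothetical similarities of one token to three patches, with 2-D patch features.
token_to_patch = [0.9, 0.1, 0.3]
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(group_representation(token_to_patch, patches, threshold=0.2))
```

The language group for an image patch would be built the same way with the roles of tokens and patches exchanged.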

Article 80

Title@2025-07-31 (4): Policy Learning from Large Vision-Language Model Feedback without Reward Modeling

Title: Policy Learning from Large Vision-Language Model Feedback without Reward Modeling

Politik Lernen aus großen Vision-Sprache Modell Feedback ohne Belohnung Modellierung

从大视野 – – 语言模型反馈中学习政策而不进行奖励建模 2507.23391v1

Authors (4): Tung M. Luu, Donghoon Lee, Younghwan Lee, Chang D. Yoo

Offline reinforcement learning (RL) provides a powerful framework for training robotic agents using pre-collected, suboptimal datasets, eliminating the need for costly, time-consuming, and potentially hazardous online interactions. This is particularly useful in safety-critical real-world applications, where online data collection is expensive and impractical. However, existing offline RL algorithms typically require reward-labeled data, which introduces an additional bottleneck: reward function design is itself costly, labor-intensive, and requires significant domain expertise. In this paper, we introduce PLARE, a novel approach that leverages large vision-language models (VLMs) to provide guidance signals for agent training. Instead of relying on manually designed reward functions, PLARE queries a VLM for preference labels on pairs of visual trajectory segments based on a language task description. The policy is then trained directly from these preference labels using a supervised contrastive preference learning objective, bypassing the need to learn explicit reward models. Through extensive experiments on robotic manipulation tasks from MetaWorld, PLARE achieves performance on par with or surpassing existing state-of-the-art VLM-based reward generation methods. Furthermore, we demonstrate the effectiveness of PLARE in real-world manipulation tasks with a physical robot, further validating its practical applicability.

离线强化学习(RL)为培训机器人代理提供了强有力的框架,利用预先收集的、低于最佳的数据集培训机器人代理,消除成本、耗时和潜在危险的在线互动需求,这在安全临界现实应用中特别有用,因为在线数据收集费用昂贵且不切实际。然而,现有的离线 RL 算法通常需要奖励标签数据,这又增加了一个瓶颈:奖励功能设计本身成本高昂,劳动密集型,需要大量领域专门知识。在本文中,我们引入了PLARE,这是一种利用大型视觉语言模型为代理培训提供指导信号的新办法。PLARE不依靠人工设计的奖励功能,而是根据语言任务说明,对视像轨道部分的配对进行优惠标签查询VLM。然后,该政策直接从这些偏好标签中培训,利用监督的对比偏好偏好学习目标,绕过学习明确奖励模式的需要。通过对MetaWorld的机器人操纵任务进行广泛的实验,PLARE在实际操作性VLM(PLM)中,我们进一步展示其实际操作效率。
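
To make the idea of training a policy directly from preference labels concrete, the sketch below scores each trajectory segment by the scaled sum of the policy's action log-likelihoods and applies a Bradley-Terry-style logistic loss to the pair. This is written in the spirit of contrastive preference learning, but the exact objective used by PLARE is not given in the abstract; the function name, the scale `alpha`, and the toy log-probabilities are assumptions.

```python
import math


def preference_loss(logp_preferred, logp_rejected, alpha=0.1):
    """Negative log-probability that the preferred segment wins the comparison.

    Each argument is a list of per-step log pi(a_t | s_t) values for one segment.
    """
    score_pos = alpha * sum(logp_preferred)
    score_neg = alpha * sum(logp_rejected)
    # -log(e^pos / (e^pos + e^neg)) = softplus(neg - pos), computed stably.
    diff = score_neg - score_pos
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical per-step log-likelihoods under the current policy.
print(preference_loss([-0.2, -0.3, -0.1], [-1.5, -2.0, -1.0]))  # policy agrees with label
print(preference_loss([-1.5, -2.0, -1.0], [-0.2, -0.3, -0.1]))  # policy disagrees
```

Minimizing this loss raises the likelihood of actions in segments the VLM preferred relative to those it rejected, without fitting a separate reward model.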

Article 81

Title@2025-07-31 (4): Causal Explanation of Concept Drift – A Truly Actionable Approach

Title: Causal Explanation of Concept Drift – A Truly Actionable Approach

Kausale Erklärung des Konzepts Drift – Ein wirklich handlungsfähiger Ansatz

对 “ 漂流 “ 概念的因果解释 – – 真正可采取行动的方法 2507.23389v1

Authors (5): David Komnick, Kathrin Lammers, Barbara Hammer, Valerie Vaquet, Fabian Hinder

In a world that constantly changes, it is crucial to understand how those changes impact different systems, such as industrial manufacturing or critical infrastructure. Explaining critical changes, referred to as concept drift in the field of machine learning, is the first step towards enabling targeted interventions to avoid or correct model failures, as well as malfunctions and errors in the physical world. Therefore, in this work, we extend model-based drift explanations towards causal explanations, which increases the actionability of the provided explanations. We evaluate our explanation strategy on a number of use cases, demonstrating the practical usefulness of our framework, which isolates the causally relevant features impacted by concept drift and, thus, allows for targeted intervention.

在一个不断发生变化的世界中,了解这些变化如何影响不同的系统,例如工业制造或关键基础设施,至关重要。解释关键的变化,即所谓机器学习领域的概念漂移,是使有针对性的干预能够避免或纠正模型故障以及物理世界的故障和错误的第一步。因此,在这项工作中,我们将基于模型的漂移解释推广到因果解释,这增加了所提供的解释的可操作性。我们评估了我们对一些使用案例的解释战略,显示了我们框架的实际效用,这种框架分离了概念漂移所影响的因果相关特征,从而允许有针对性干预。

Article 82

Title@2025-07-31 (4): Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks

Title: Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks

Einige theoretische Ergebnisse auf schichtweise Effektive Dimensions-Oszillationen in Finite-Wide-ReLU-Netzwerken

关于有限宽度 RELU 网络中多层有效尺寸振动的一些理论结果 2507.07675v2

Authors (1): Darshan Makwana

We analyze the layerwise effective dimension (rank of the feature matrix) in fully-connected ReLU networks of finite width. Specifically, for a fixed batch of $m$ inputs and random Gaussian weights, we derive closed-form expressions for the expected rank of the $m\times n$ hidden activation matrices. Our main result shows that $\mathbb{E}[\mathrm{EDim}(\ell)] = m\,[1-(1-2/\pi)^{\ell}] + O(e^{-cm})$, so that the rank deficit decays geometrically with ratio $1-2/\pi \approx 0.3634$. We also prove a sub-Gaussian concentration bound, and identify the “revival” depths at which the expected rank attains local maxima. In particular, these peaks occur at depths $\ell_k^*\approx(k+1/2)\pi/\log(1/\rho)$ with height $\approx (1-e^{-\pi/2})\,m \approx 0.79m$. We further show that this oscillatory rank behavior is a finite-width phenomenon: under orthogonal weight initialization or strong negative-slope leaky-ReLU, the rank remains (nearly) full. These results provide a precise characterization of how random ReLU layers alternately collapse and partially revive the subspace of input variations, adding nuance to prior work on expressivity of deep networks.

我们分析了有限宽度全连接ReLU网络的逐层有效维度(特征矩阵的秩)。具体而言,对于固定的一批 $m$ 个输入和随机高斯权重,我们推导了 $m\times n$ 隐藏激活矩阵期望秩的闭式表达式。我们的主要结果表明 $\mathbb{E}[\mathrm{EDim}(\ell)] = m\,[1-(1-2/\pi)^{\ell}] + O(e^{-cm})$,因此秩亏损以比率 $1-2/\pi \approx 0.3634$ 几何衰减。我们还证明了一个次高斯集中界,并确定了期望秩达到局部极大值的“复苏”深度。特别地,这些峰值出现在深度 $\ell_k^*\approx(k+1/2)\pi/\log(1/\rho)$ 处,高度 $\approx (1-e^{-\pi/2})\,m \approx 0.79m$。我们进一步表明,这种振荡的秩行为是一种有限宽度现象:在正交权重初始化或强负斜率leaky-ReLU下,秩保持(接近)满秩。这些结果精确刻画了随机ReLU层如何交替地压缩并部分恢复输入变化的子空间,为此前关于深度网络表达能力的研究增添了细节。
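
The numerical constants quoted in this abstract can be checked directly. The sketch below evaluates only the leading term of the stated formula and the stated peak height; the function names are hypothetical, the $O(e^{-cm})$ correction is dropped, and the quantity $\rho$ in the revival-depth expression is left out because the abstract does not define it.

```python
import math

RATIO = 1 - 2 / math.pi  # geometric ratio quoted in the abstract


def expected_edim(m: int, depth: int) -> float:
    """Leading term m * (1 - (1 - 2/pi)**depth) of the stated expectation."""
    return m * (1 - RATIO ** depth)


def peak_height(m: int) -> float:
    """Stated peak height (1 - exp(-pi/2)) * m."""
    return (1 - math.exp(-math.pi / 2)) * m


print(round(RATIO, 4))
print([round(expected_edim(100, depth), 2) for depth in (1, 2, 3)])
print(round(peak_height(100), 2))
```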

Article 83

Title@2025-07-31 (4): EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations

Title: EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations

EP-Diffusor: Ein effizientes Diffusionsmodell für die Verkehrsszenengenerierung und -vorhersage über polynomische Darstellungen

EP-Diffuser:通过多边代表制有效传播交通景点生成和预测模式 2504.05422v3

Authors (4): Yue Yao, Mohamed-Khalil Bouzidi, Daniel Goehring, Joerg Reichardt

As the prediction horizon increases, predicting the future evolution of traffic scenes becomes increasingly difficult due to the multi-modal nature of agent motion. Most state-of-the-art (SotA) prediction models primarily focus on forecasting the most likely future. However, for the safe operation of autonomous vehicles, it is equally important to cover the distribution of plausible motion alternatives. To address this, we introduce EP-Diffuser, a novel parameter-efficient diffusion-based generative model designed to capture the distribution of possible traffic scene evolutions. Conditioned on road layout and agent history, our model acts as a predictor and generates diverse, plausible scene continuations. We benchmark EP-Diffuser against two SotA models in terms of accuracy and plausibility of predictions on the Argoverse 2 dataset. Despite its significantly smaller model size, our approach achieves both highly accurate and plausible traffic scene predictions. We further evaluate model generalization ability in an out-of-distribution (OoD) test setting using the Waymo Open dataset and show superior robustness of our approach. The code and model checkpoints are available at: https://github.com/continental/EP-Diffuser.

随着预测地平线的扩大,预测交通场景的未来演变变得日益困难,因为代理运动具有多模式性质,大多数最先进的(SotA)预测模型主要侧重于预测最可能的未来。然而,对于自主车辆的安全运行,同样重要的是覆盖分配,以寻找可信的运动替代办法。为了解决这个问题,我们引入了一个新的参数高效扩散模型EP-Diffuser,这是一个新的参数高效扩散模型,旨在捕捉交通场面演化的分布。在道路布局和代理历史中,我们的模型起到预测的作用,并产生多样化的、可信的场景延续。我们根据两种SotA模型对EP-Diffuser进行基准,说明Argoverse 2数据集预测的准确性和可信度。尽管我们的模型规模要小得多,但我们的方法实现了高度准确和可信的交通场景预测。我们进一步评估了利用Waymo Oomo OoD测试的模型普及能力,并展示了我们方法的高度稳健性。我们的代码和模型检查站有: https://githus/stalmausion-Dasyal-Duralopcom。

Article 84

Title@2025-07-31 (4): Robust and Fine-Grained Detection of AI Generated Texts

Title: Robust and Fine-Grained Detection of AI Generated Texts

Robuste und feinkörnige Erkennung von KI-generierten Texten

对 AI 生成文本的强力和精细探测 2504.11952v3

Authors (14): Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, Drishti Sharma, Siddhant Gupta, Jebish Purbey, Ashay Srivastava, Subhasya TippaReddy, Arvind Reddy Bobbili, Suraj Telugara Chandrashekhar, Modabbir Adeeb, Srinadh Vura, Suman Debnath, Hamza Farooq

An ideal detection system for machine-generated content is supposed to work well on any generator, as more advanced LLMs come into existence day by day. Existing systems often struggle to accurately identify AI-generated content in shorter texts. Further, not all texts are entirely authored by a human or an LLM, hence we focus on partial cases, i.e., human-LLM co-authored texts. Our paper introduces a set of models built for the task of token classification, trained on an extensive collection of human-machine co-authored texts, which perform well on texts from unseen domains, unseen generators, texts by non-native speakers, and those with adversarial inputs. We also introduce a new dataset of over 2.4M such texts, mostly co-authored by several popular proprietary LLMs, across 23 languages. We also present findings on our models’ performance on the texts of each domain and generator. Additional findings include comparisons of performance against each adversarial method and across input text lengths, and of the characteristics of generated texts relative to the original human-authored texts.

理想的机器生成内容检测系统应适用于任何发电机,因为许多更先进的LLM每天都存在。现有系统往往难以准确识别AI产生的内容,而不能精确辨别较短的文本。此外,并非所有文本都可能完全由人或LLM编写,因此我们更侧重于部分案例,即人-LLM共同编写的文本。我们的文件介绍了一套为象征性分类任务而设计的模型,这些模型经过培训,涉及大量人体-机器共同编写的文本,这些文本在看不见域的文本、看不见的生成者、非母语发言人的文本和有对抗性投入的文本方面表现得非常出色。我们还引入了2.4M以上这类文本的新数据集,这些文本大多由超过23种语言的几个受欢迎的专利LMM共同编写。我们还介绍了我们模型对每个域和生成器的每种文本的绩效调查结果。其他调查结果包括对照每一种对抗方法的绩效、输入文本的长度和生成文本的特点与原始人类撰写的文本的对比。

Article 85

Title@2025-07-31 (4): SWE-Exp: Experience-Driven Software Issue Resolution

Title: SWE-Exp: Experience-Driven Software Issue Resolution

SWE-Exp: Erfahrungsgetriebene Software-Ausgabeauflösung

SWE-Exp:经验驱动的软件问题解决 2507.23361v1

Authors (10): Silin Chen, Shaoxin Lin, Xiaodong Gu, Yuling Shi, Heng Lian, Longfei Yun, Dong Chen, Weiguo Sun, Lin Cao, Qianxiang Wang

Recent advances in large language model (LLM) agents have shown remarkable progress in software issue resolution, leveraging advanced techniques such as multi-agent collaboration and Monte Carlo Tree Search (MCTS). However, current agents act as memoryless explorers, treating each problem separately without retaining or reusing knowledge from previous repair experiences. This leads to redundant exploration of failed trajectories and missed chances to adapt successful issue resolution methods to similar problems. To address this problem, we introduce SWE-Exp, an experience-enhanced approach that distills concise and actionable experience from prior agent trajectories, enabling continuous learning across issues. Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts. Specifically, it extracts reusable issue resolution knowledge at different levels, from high-level problem comprehension to specific code changes. Experiments show that SWE-Exp achieves a state-of-the-art resolution rate (41.6% Pass@1) on SWE-bench-Verified under open-source agent frameworks. Our approach establishes a new paradigm in which automated software engineering agents systematically accumulate and leverage repair expertise, fundamentally shifting from trial-and-error exploration to strategic, experience-driven issue resolution.

大型语言模型(LLM)代理最近的进展表明,在软件问题的解决方面取得了显著进展,利用了多剂协作和蒙特卡洛树搜索等先进技术。然而,目前代理作为没有记忆的探险家,在不保留或重复以往修复经验的知识的情况下分别处理每个问题,从而导致对失败的轨迹进行重复探索,并错过了将成功解决问题的方法适应类似问题的机会。为解决这一问题,我们引入SWE-Exporation(SWE-Exporation),一种强化方法,从以前的代理轨迹中提取简明和可操作的经验,使各种问题能够不断学习。我们的方法引入了一个多面的经验库,记录成功和失败的修复尝试。具体地说,它提取了不同层次的可重复的解决问题知识――从高层次的问题理解到具体的代码变化。实验表明,SWE-Explex在开放源代理框架下对SWE-bench-Verizer化的SWE-pass@1,在SWE-bench-vicer 框架下,我们的方法建立了一个新的范例,使自动软件工程代理系统积累和利用修复专门知识,从试验和驱动的解决方案问题从根本上转向战略探索。

Article 86

Title@2025-07-31 (4): Optimal Transport Learning: Balancing Value Optimization and Fairness in Individualized Treatment Rules

Title: Optimal Transport Learning: Balancing Value Optimization and Fairness in Individualized Treatment Rules

Optimales Verkehrslernen: Wertoptimierung und Fairness in individualisierten Behandlungsregeln ausgleichen

最佳交通学习:在个人化待遇规则中平衡价值的优化和公平 2507.23349v1

Authors (5): Wenhai Cui, Xiaoting Ji, Wen Su, Xiaodong Yan, Xingqiu Zhao

Individualized treatment rules (ITRs) have gained significant attention due to their wide-ranging applications in fields such as precision medicine, ridesharing, and advertising recommendations. However, when ITRs are influenced by sensitive attributes such as race, gender, or age, they can lead to outcomes where certain groups are unfairly advantaged or disadvantaged. To address this gap, we propose a flexible approach based on optimal transport theory, which is capable of transforming any optimal ITR into a fair ITR that ensures demographic parity. Recognizing the potential loss of value under fairness constraints, we introduce an "improved trade-off ITR," designed to balance value optimization and fairness while accommodating varying levels of fairness through parameter adjustment. To maximize the value of the improved trade-off ITR under specific fairness levels, we propose a smoothed fairness constraint for estimating the adjustable parameter. Additionally, we establish a theoretical upper bound on the value loss for the improved trade-off ITR. We demonstrate the performance of the proposed method through extensive simulation studies and an application to the Next 36 entrepreneurial program dataset.

个人化治疗规则(ITRs)由于在精准医学、搭便车和广告建议等领域的广泛应用而得到极大关注,然而,当ITRs受到种族、性别或年龄等敏感属性的影响时,它们可能导致某些群体处于不公平的有利地位或不利地位;为弥补这一差距,我们提议基于最佳运输理论的灵活办法,该理论能够将任何最佳ITR转变成公平的ITR,确保人口均等;认识到公平限制下的价值潜在损失,我们引入了“简化的权衡 ITR”,目的是平衡价值优化和公平,同时通过参数调整兼顾不同程度的公平性;为了在具体公平水平下尽可能扩大改进的ITR贸易价值,我们建议对估计可调整的参数实行一种顺畅的公平性限制;此外,我们为改进的ITR贸易的价值设定了一个理论上限。我们通过广泛的模拟研究和对下一个36个创业方案数据集的应用,展示了拟议方法的绩效。
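The demographic parity criterion named in the abstract above can be illustrated with a small check. This is a minimal sketch, not the authors' optimal transport method: it only measures how far a given treatment assignment is from parity. The function name `demographic_parity_gap` and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def demographic_parity_gap(treatments, groups):
    """Return (gap, rates): the largest difference in treatment rate
    between any two groups, and the per-group treatment rates."""
    counts = defaultdict(lambda: [0, 0])  # group -> [treated, total]
    for decision, group in zip(treatments, groups):
        counts[group][0] += int(decision)
        counts[group][1] += 1
    rates = {g: treated / total for g, (treated, total) in counts.items()}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy data: 1 means the rule assigns treatment.
treatments = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_gap(treatments, groups)
```

A gap of zero corresponds to exact demographic parity; a trade-off rule of the kind the abstract describes would presumably tolerate a small, tunable gap in exchange for value.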

Article 87

Title@2025-07-31 (4): SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Title: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

SWE-Debatte: Wettbewerbsfähige Multi-Agenten-Debatte für die Lösung von Software-Problemen

SWE-Debate:解决软件问题竞争性多机构辩论 2507.23348v1

Authors (9): Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, Qianxiang Wang

Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous, tool-using agents to tackle complex software engineering tasks. While existing agent-based issue resolution approaches are primarily based on agents’ independent explorations, they often get stuck in local solutions and fail to identify issue patterns that span across different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. Then, it organizes a three-round debate among specialized agents, each embodying distinct reasoning perspectives along the fault propagation trace. This structured competition enables agents to collaboratively converge on a consolidated fix plan. Finally, this consolidated fix plan is integrated into an MCTS-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results in open-source agent frameworks and outperforms baselines by a large margin.

由于大型语言模型(LLMs)的先进推理能力,问题解决取得了显著进展。最近,SWE代理商等基于代理商的框架进一步推进了这一进展,使自动使用工具的代理商能够应对复杂的软件工程任务。虽然现有基于代理商的问题解决方法主要基于代理商的独立探索,但它们往往被困在本地解决方案中,无法查明跨越代码库不同部分的问题模式。为了应对这一限制,我们提议SWE-Debate,这是一个竞争性多代理商辩论框架,鼓励多种推理路径,实现更综合的问题本地化。SWE-Debate首先通过绘制代码依赖性图表,生成多重错误传播痕迹,作为本地化建议。然后,它组织专门代理商之间的三轮辩论,每个都体现了与错误传播跟踪相关的不同推理观点。这种结构竞争使代理商能够就综合固定计划开展合作。最后,这一综合固定计划被纳入基于MCTS的代码修改工具,用于补丁生成。SWE-Debate在SWE-Bench基准上进行的实验表明,SWE-Deate在开放代理商基准框架中实现了新的州差幅。

Article 88

Title@2025-07-31 (4): Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression

Title: Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression

Strompreisvorhersage mit Multi-Kernel Gaussian Prozess-Regression kombiniert mit Kernel-basierte Unterstützung Vektor-Regression

利用多克朗高斯进程回归与内核支持矢量回归结合进行的电力价格预测 2412.00123v4

Authors (3): Abhinav Das, Stephan Schlüter, Lorenz Schneider

This paper presents a new hybrid model for predicting German electricity prices. The algorithm is based on a combination of Gaussian Process Regression (GPR) and Support Vector Regression (SVR). Although GPR is a competent model for learning stochastic patterns within data and for interpolation, its performance for out-of-sample data is not very promising. By choosing a suitable data-dependent covariance function, we can enhance the performance of GPR for the German hourly power prices being tested. However, since the out-of-sample prediction is dependent on the training data, the prediction is vulnerable to noise and outliers. To overcome this issue, a separate prediction is calculated using SVR, which applies margin-based optimization. This method is advantageous when dealing with non-linear processes and outliers, since only certain necessary points (support vectors) in the training data are responsible for regression. The individual predictions are then linearly combined using uniform weights. When tested on historic German power prices, this approach outperforms the publicly available benchmarks, namely the LASSO-estimated autoregressive regression model and the deep neural network provided in the recent research by [1].

本文为预测德国电价提供了一个新的混合模型。算法基于高斯进程回归( GPR) 和支持矢量回归( SVR) 的组合。虽然 GPR 是数据中学习随机模式和内推法的合格模型, 但它用于抽样外数据的性能并不大有希望。通过选择一个合适的数据依赖共差功能, 我们就可以提高GPR对正在测试的德国小时电价的性能。但是, 由于标外预测取决于培训数据, 预测容易受到噪音和外向效应的影响。要克服这一问题, 使用SVR( 应用基于边差的优化) 来计算单独的预测。在处理非线性进程和外向值时, 这种方法并不十分有利, 因为培训数据中只有某些必要的点( 支持矢量) 才是回归的责任。然后, 个人预测是使用统一的重量进行线性组合。在测试德国历史电价时, 这种方法比公众可以使用的基准, 即用最近的LASO 估计的自动回归模型, 提供了最新的自动回归模型。
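The final combination step described in the abstract above (a linear combination with uniform weights) can be sketched as follows. This is only an illustration of the averaging step: the GPR and SVR models themselves are not reproduced, and the function name `combine_uniform` and the price values are hypothetical.

```python
def combine_uniform(*prediction_series):
    """Average several equally long prediction series with uniform weights."""
    n_models = len(prediction_series)
    if n_models == 0:
        raise ValueError("at least one prediction series is required")
    length = len(prediction_series[0])
    if any(len(series) != length for series in prediction_series):
        raise ValueError("all prediction series must have the same length")
    # zip(*...) walks the series hour by hour; each hour gets weight 1/n_models.
    return [sum(values) / n_models for values in zip(*prediction_series)]


# Hypothetical hourly price forecasts (EUR/MWh) from two separately fitted models.
gpr_pred = [42.0, 55.0, 61.0]
svr_pred = [40.0, 57.0, 65.0]
combined = combine_uniform(gpr_pred, svr_pred)
```

With two models, uniform weights reduce to a plain average of the two forecasts for each hour.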

Article 89

Title: Designing Dynamic Pricing for Bike-sharing Systems via Differentiable Agent-based Simulation

Dynamische Preisgestaltung für Bike-Sharing-Systeme über eine charakteristische agentenbasierte Simulation

通过基于不同制剂的模拟,为自行车共享系统设计动态定价 2507.23344v1

Authors (4): Tatsuya Mitomi, Fumiyasu Makinoshima, Fumiya Makihara, Eigo Segawa

Bike-sharing systems are emerging in various cities as a new ecofriendly transportation system. In these systems, spatiotemporally varying user demands lead to imbalanced inventory at bicycle stations, resulting in additional relocation costs. Therefore, it is essential to manage user demand through optimal dynamic pricing for the system. However, optimal pricing design for such a system is challenging because the system involves users with diverse backgrounds and their probabilistic choices. To address this problem, we develop a differentiable agent-based simulation to rapidly design dynamic pricing in bike-sharing systems, achieving balanced bicycle inventory despite spatiotemporally heterogeneous trips and probabilistic user decisions. We first validate our approach against conventional methods through numerical experiments involving 25 bicycle stations and five time slots, yielding 100 parameters. Compared to the conventional methods, our approach obtains a more accurate solution with a 73% to 78% reduction in loss while achieving more than a 100-fold increase in convergence speed. We further validate our approach on a large-scale urban bike-sharing system scenario involving 289 bicycle stations, resulting in a total of 1156 parameters. Through simulations using the obtained pricing policies, we confirm that these policies can naturally induce balanced inventory without any manual relocation. Additionally, we find that the cost of discounts to induce the balanced inventory can be minimized by setting appropriate initial conditions.

在不同的城市中,自行车共享系统正在作为一种新的生态友好运输系统出现。在这些系统中,时空变化的用户需求导致自行车站库存的不平衡,导致更多的调度费用。因此,通过系统的最佳动态定价来管理用户需求至关重要。然而,这种系统的最佳定价设计具有挑战性,因为该系统涉及不同背景的用户及其概率选择。为解决这一问题,我们开发了一种可微分的基于代理的模拟,以迅速设计自行车共享系统的动态定价,尽管存在时空异质的出行和概率性用户决定,仍能实现平衡的自行车库存。我们首先通过涉及25个自行车站和5个时段的数值实验,对照常规方法验证我们的方法,共产生100个参数。与常规方法相比,我们的方法获得了更准确的解决方案,将损失减少73%至78%,同时将收敛速度提高100倍以上。我们进一步在涉及289个自行车站的大规模城市自行车共享系统情景中验证了我们的方法,共计1156个参数。我们通过使用所得定价政策进行模拟,确认这些政策无需任何人工调度即可自然地实现库存平衡。此外,我们发现,通过设置适当的初始条件,可以将为实现库存平衡而提供折扣的成本降至最低。

Article 90

Title@2025-07-31 (4): Scalable and Precise Patch Robustness Certification for Deep Learning Models with Top-k Predictions

Title: Scalable and Precise Patch Robustness Certification for Deep Learning Models with Top-k Predictions

Skalierbare und präzise Patch Robustness Zertifizierung für Deep Learning Modelle mit Top-K Vorhersagen

具有顶级预测力的深学习模型可缩放和精确的补丁强度认证 2507.23335v1

Authors (4): Qilin Zhou, Haipeng Wang, Zhengyuan Wei, W. K. Chan

Patch robustness certification is an emerging verification approach for defending against adversarial patch attacks with provable guarantees for deep learning systems. Certified recovery techniques guarantee the prediction of the sole true label of a certified sample. However, existing techniques, if applicable to top-k predictions, commonly conduct pairwise comparisons on those votes between labels, failing to certify the sole true label within the top k prediction labels precisely due to the inflation on the number of votes controlled by the attacker (i.e., attack budget); yet enumerating all combinations of vote allocation suffers from the combinatorial explosion problem. We propose CostCert, a novel, scalable, and precise voting-based certified recovery defender. CostCert verifies the true label of a sample within the top k predictions without pairwise comparisons and combinatorial explosion through a novel design: whether the attack budget on the sample is infeasible to cover the smallest total additional votes on top of the votes uncontrollable by the attacker to exclude the true labels from the top k prediction labels. Experiments show that CostCert significantly outperforms the current state-of-the-art defender PatchGuard, such as retaining up to 57.3% in certified accuracy when the patch size is 96, whereas PatchGuard has already dropped to zero.

认证的回收技术保证了对经认证样本的唯一真实标签的预测。然而,现有技术,如果适用于顶级预测,对标签之间的票数进行共同的对比比较,由于攻击者控制的票数(即攻击预算)的通货膨胀,未能核证最高K类预测标签中的唯一真实标签(准确地说,攻击者控制票数的通胀性,攻击者控制票数(即攻击预算)的最小增加票总数;但计算所有组合的选票分配情况都因组合式爆炸问题而受到影响。我们提议CostCert,一个新颖的、可扩缩的、精确的基于投票的认证样本保护者。成本Cert在最高K类预测中核实一个样本的真实标签,而不进行对齐比较,并通过新设计进行组合爆炸:由于攻击者控制票数的最小总数(即攻击者无法控制的预算);但是将所有选票分配的组合都从顶级预测标签中排除了真正的标签。实验显示,在经过认证的精确度为第57号时,CostCert明显超过目前K级预测的准确度。
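The certification condition described in the abstract above can be read as a budget check: find the smallest number of extra votes needed to push the true label out of the top k, then test whether the attack budget covers it. The sketch below is one simplified reading of that idea, not the CostCert implementation. It assumes ties are resolved in favor of the true label, so a competitor must strictly exceed it, and the function name and vote counts are hypothetical.

```python
def certify_top_k(uncontrolled_votes, true_label, k, attack_budget):
    """Return True if the attacker cannot push true_label out of the top k.

    uncontrolled_votes: votes per label that the attacker cannot change.
    attack_budget: number of votes the attacker is free to allocate.
    """
    true_votes = uncontrolled_votes[true_label]
    # Extra votes each competitor needs to strictly exceed the true label.
    costs = sorted(
        max(0, true_votes - votes + 1)
        for label, votes in uncontrolled_votes.items()
        if label != true_label
    )
    if len(costs) < k:
        return True  # fewer than k competitors exist, so exclusion is impossible
    # The cheapest attack lifts the k competitors that are closest to overtaking.
    min_total_cost = sum(costs[:k])
    return min_total_cost > attack_budget


# Hypothetical vote counts for a 4-class problem.
votes = {"cat": 10, "dog": 7, "fox": 3, "owl": 1}
certified = certify_top_k(votes, "cat", k=2, attack_budget=11)
broken = certify_top_k(votes, "cat", k=2, attack_budget=12)
```

Sorting the per-label costs once avoids both pairwise comparisons and enumerating vote allocations, which is the property the abstract emphasizes.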

Article 91

Title@2025-07-31 (4): FovEx: Human-Inspired Explanations for Vision Transformers and Convolutional Neural Networks

Title: FovEx: Human-Inspired Explanations for Vision Transformers and Convolutional Neural Networks

FovEx: Menschlich inspirierte Erklärungen für Visionstransformer und konvolutionäre Neuralnetzwerke

FovEx:对愿景变异者和革命性神经网络的人类启发解释 2408.02123v3

Authors (6): Mahadev Prasad Panda, Matteo Tiezzi, Martina Vilas, Gemma Roig, Bjoern M. Eskofier, Dario Zanca

Explainability in artificial intelligence (XAI) remains a crucial aspect for fostering trust and understanding in machine learning models. Current visual explanation techniques, such as gradient-based or class-activation-based methods, often exhibit a strong dependence on specific model architectures. Conversely, perturbation-based methods, despite being model-agnostic, are computationally expensive as they require evaluating models on a large number of forward passes. In this work, we introduce Foveation-based Explanations (FovEx), a novel XAI method inspired by human vision. FovEx seamlessly integrates biologically inspired perturbations by iteratively creating foveated renderings of the image and combines them with gradient-based visual explorations to determine locations of interest efficiently. These locations are selected to maximize the performance of the model to be explained with respect to the downstream task and then combined to generate an attribution map. We provide a thorough evaluation with qualitative and quantitative assessments on established benchmarks. Our method achieves state-of-the-art performance on both transformers (on 4 out of 5 metrics) and convolutional models (on 3 out of 5 metrics), demonstrating its versatility among various architectures. Furthermore, we show the alignment between the explanation map produced by FovEx and human gaze patterns (+14% in NSS compared to RISE, +203% in NSS compared to GradCAM). This comparison enhances our confidence in FovEx’s ability to close the interpretation gap between humans and machines.

人工智能(XAI)的可解释性仍然是培养对机器学习模型的信任和理解的一个重要方面。当前视觉解释技术,例如梯度法或以阶级活动为基础的方法,往往表现出对特定模型结构的高度依赖。相反,以扰动为基础的方法,尽管是模型-不可知性,在计算上成本很高,因为它们需要评估大量远征的模型。在这项工作中,我们引入了基于改变的解释(FovEx),这是由人类视觉启发的新颖的XAI方法。FovEx通过迭接地生成图像的受色化图解和将其与基于梯度的视觉探索结合起来,以高效地确定感兴趣的地点。选择这些地点是为了尽量扩大模型的性能,以便根据下游任务加以解释,然后形成归属图。我们用对既定基准的定性和定量评估来进行彻底评估。我们的方法在变异器(5度图中的4个)和变异式模型(5度图中的3个比值)之间实现最先进的突扰动性干扰。通过5度比度模型显示人类变异性模型和变异性系统结构,我们制作的变异性模型。

Article 92

Title@2025-07-31 (4): MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation

Title: MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation

MUST-RAG: MUSical Text Question Beantwortung mit retrieval Augmented Generation

MOST-RAG: 以回取增加的一代人回答的中文本问题 2507.23334v1

Authors (3): Daeyong Kwon, SeungHeon Doh, Juhan Nam

Recent advancements in large language models (LLMs) have demonstrated remarkable capabilities across diverse domains. While they exhibit strong zero-shot performance on various tasks, LLMs’ effectiveness in music-related applications remains limited due to the relatively small proportion of music-specific knowledge in their training data. To address this limitation, we propose MusT-RAG, a comprehensive framework based on Retrieval Augmented Generation (RAG) to adapt general-purpose LLMs for text-only music question answering (MQA) tasks. RAG is a technique that provides external knowledge to LLMs by retrieving relevant context information when generating answers to questions. To optimize RAG for the music domain, we (1) propose MusWikiDB, a music-specialized vector database for the retrieval stage, and (2) utilize context information during both inference and fine-tuning processes to effectively transform general-purpose LLMs into music-specific models. Our experiment demonstrates that MusT-RAG significantly outperforms traditional fine-tuning approaches in enhancing LLMs’ music domain adaptation capabilities, showing consistent improvements across both in-domain and out-of-domain MQA benchmarks. Additionally, our MusWikiDB proves substantially more effective than general Wikipedia corpora, delivering superior performance and computational efficiency.

大型语言模型(LLMS)的近期进步表明,大语言模型(LLMS)在不同领域都取得了显著进步,尽管在各种任务上表现得非常零,但LLMS在音乐相关应用方面的效力仍然有限,因为其培训数据中音乐专用知识所占比例相对较小。为解决这一局限性,我们提议MUST-RAG,一个基于再获取增强一代(RAG)的综合框架,以将通用LMS改编成仅供文字解答的音乐问题解答(MQA)任务。RAG是一种技术,通过在生成问题答案时检索相关背景信息,为LLMS提供外部知识。为了优化音乐领域的RAG,我们(1) 提议MusWikiDB,即用于检索阶段的音乐专用矢量数据库,以及(2) 在推论和微调过程中利用背景信息,将通用LMMMLMSMS(MQA)MQA(MQA)调控域能力(MSUWIA)的常规微调方法大大超越了LMMMMMMA(GA)的适应能力,并大大改进了我们的普通和高级QA(MQA)计算效率标准。

Article 93

Title@2025-07-31 (4): EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models

Title: EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models

EaqVLA: Kodierungsorientierte Quantisierung für Vision-Language-Action-Modelle

EaqVLA: 愿景-语言-行动模式的编码和一致的量化 2505.21567v2

Authors (6): Feng Jiang, Zihao Zheng, Xiuping Cui, Maoliang Li, JIayu Chen, Xiang Chen

With the development of embodied artificial intelligence, end-to-end control policies such as Vision-Language-Action (VLA) models have become mainstream. Existing VLA models face expensive computing and storage costs, which need to be optimized. Quantization is considered the most effective method, as it can not only reduce memory cost but also accelerate computation. However, we find that the token alignment of VLA models hinders the application of existing quantization methods. To address this, we propose an optimized framework called EaqVLA, which applies encoding-aligned quantization to VLA models. Specifically, we propose a complete analysis method to find the misalignment at various granularities. Based on the analysis results, we propose a mixed-precision quantization scheme that is aware of the encoding alignment. Experiments show that the proposed EaqVLA achieves better quantization performance (with the minimal quantization loss for end-to-end action control and xxx times acceleration) than existing quantization methods.

随着Embudi人工智能的开发,终端到终端控制政策,如Vavis-Language-Action(VLA)模型已成为主流。现有的VLA模型面临昂贵的计算/存储成本,需要优化。量化被认为是最有效的方法,不仅可以降低内存成本,还可以实现计算加速。然而,我们发现VLA模型的象征性匹配阻碍了现有量化方法的应用。为了解决这个问题,我们提议了一个称为EaqVLA的优化框架,对VLA模型采用对齐的量化。具体地说,我们提出一个完整的分析方法,以发现不同颗粒体的不匹配。根据分析结果,我们建议一种混合精度的量化,同时认识到编码的一致。实验表明,压轴 EaqVLA的量化性表现比现有的四分解法要好得多(在终端到终端的行动控制和加速度方面最小的量化损失)。

Article 94

Title@2025-07-31 (4): Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models

Title: Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models

Transformierte Low-Rank-Anpassung über Tensor-Zersetzung und deren Anwendungen zu Text-zu-Bild-Modellen

通过Tensor分解及其在文本到图像模型中的应用 2501.08727v2

Authors (5): Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao, Yuki Mitsufuji

Parameter-Efficient Fine-Tuning (PEFT) of text-to-image models has become an increasingly popular technique with many applications. Among the various PEFT methods, Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness, enabling users to fine-tune models with limited computational resources. However, the approximation gap between the low-rank assumption and desired fine-tuning weights prevents the simultaneous acquisition of ultra-parameter-efficiency and better performance. To reduce this gap and further improve the power of LoRA, we propose a new PEFT method that combines two classes of adaptations, namely, transform and residual adaptations. Specifically, we first apply a full-rank and dense transform to the pre-trained weight. This learnable transform is expected to align the pre-trained weight as closely as possible to the desired weight, thereby reducing the rank of the residual weight. Then, the residual part can be effectively approximated by more compact and parameter-efficient structures, with a smaller approximation error. To achieve ultra-parameter-efficiency in practice, we design highly flexible and effective tensor decompositions for both the transform and residual adaptations. Additionally, popular PEFT methods such as DoRA can be summarized under this transform plus residual adaptation scheme. Experiments are conducted on fine-tuning Stable Diffusion models in subject-driven and controllable generation. The results show that our method achieves better performance and parameter efficiency than LoRA and several baselines.

文本到图像模型(PEFT)的精度精度微调(PEFT)已经成为一种日益流行的技术,有许多应用。在各种PEFT方法中,低Rank适应(LORA)及其变方因其有效性而得到极大关注,使用户能够以有限的计算资源微调模型;然而,低级别假设和所需微调重量之间的近距离差距使得无法同时获得超参数效率和更好的性能。为了缩小这一差距并进一步提高LORA的功率,我们提出了一种新的PEFT方法,将两种类型的适应(即变换和剩余适应)结合起来。具体地说,我们首先将完全和密集的变换到预先训练的重量。这种可学习的变换有望尽可能地将预先调整的重量与预期的重量相匹配,从而降低剩余重量的等级。然后,剩余部分可以用更紧凑和具有参数效率的结构来有效地比较,同时缩小近效度误差。为了实现超偏差效率的做法,我们比较了不同等级的基线和剩余值的参数,我们设计了高弹性的变压和升级的变压方法。

Article 95

Title@2025-07-31 (4): MVCNet: Multi-View Contrastive Network for Motor Imagery Classification

Title: MVCNet: Multi-View Contrastive Network for Motor Imagery Classification

MVCNet: Multi-View Kontrastives Netzwerk für die Klassifizierung von Motorbildern

MVCNet:机动图像分类多视比网络 2502.17482v4

Authors (4): Ziwei Wang, Siyang Li, Xiaoqing Chen, Dongrui Wu

Electroencephalography (EEG)-based brain-computer interfaces (BCIs) enable neural interaction by decoding brain activity for external communication. Motor imagery (MI) decoding has received significant attention due to its intuitive mechanism. However, most existing models rely on single-stream architectures and overlook the multi-view nature of EEG signals, leading to limited performance and generalization. We propose a multi-view contrastive network (MVCNet), a dual-branch architecture that integrates CNN and Transformer blocks in parallel to capture both local spatial-temporal features and global temporal dependencies. To enhance the informativeness of training data, MVCNet incorporates a unified augmentation pipeline across time, frequency, and spatial domains. Two contrastive modules are further introduced: a cross-view contrastive module that enforces consistency of original and augmented views, and a cross-model contrastive module that aligns features extracted from both branches. Final representations are fused and jointly optimized by contrastive and classification losses. Experiments on five public MI datasets across three scenarios demonstrate that MVCNet consistently outperforms nine state-of-the-art MI decoding networks, highlighting its effectiveness and generalization ability. MVCNet provides a robust solution for MI decoding by integrating multi-view information and dual-branch modeling, contributing to the development of more reliable BCI systems.

机动图像(MI)解码因其直观机制而受到极大关注,然而,大多数现有模型依赖单流结构,忽视了EEEG信号的多视图性质,导致性能和一般化有限。我们建议建立一个多视角对比网络(MVCNet),一个双层结构,将CNN和变异器组合在一起,以捕捉当地空间时空特征和全球时间依赖性。为提高培训数据的信息性,MVCNet在时间、频率和空间范围内采用了统一的增强管道。还引入了两个对比式模块:一个交叉视角对比模块,使原始观点和强化观点的一致性得以实现,以及一个交叉模式对比式模块,使从两个分支提取的特征保持一致。最后的演示通过对比性损失和分类性损失进行整合和联合优化。在三种情景下对五种公共MI数据集的实验表明,MVCNet持续超越了在时间、频率和空间范围内的统一增强管道。进一步引入了两个对比式模块:一个交叉视角对比式模块,使原始观点和强化观点的一致性,使IMV-MI-co系统更加稳健的双向下发展。

Article 96

Title: HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction

HER2-Expression-Vorhersage mit flexiblen Multi-Modal-Eingängen durch dynamische bidirektionale Rekonstruktion

通过动态双向重建灵活多模式输入的HER2表达式预测 2506.10006v2

Authors (8): Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi

In breast cancer HER2 assessment, clinical evaluation relies on combined H&E and IHC images, yet acquiring both modalities is often hindered by clinical constraints and cost. We propose an adaptive bimodal prediction framework that flexibly supports single- or dual-modality inputs through two core innovations: a dynamic branch selector activating modality completion or joint inference based on input availability, and a cross-modal GAN (CM-GAN) enabling feature-space reconstruction of missing modalities. This design dramatically improves H&E-only accuracy from 71.44% to 94.25%, achieves 95.09% with full dual-modality inputs, and maintains 90.28% reliability under single-modality conditions. The “dual-modality preferred, single-modality compatible” architecture delivers near-dual-modality accuracy without mandatory synchronized acquisition, offering a cost-effective solution for resource-limited regions and significantly improving HER2 assessment accessibility.

在乳腺癌HER2评估中,临床评价依赖于H&E和IHC综合图像,但这两种模式往往都受到临床限制和成本的阻碍。我们提议一个适应性双模式预测框架,通过两种核心创新灵活地支持单一或双重模式投入:一个动态分支选择模式的完成或基于投入可获得性的联合推论,以及一个跨模式GAN(CM-GAN),使缺失模式的地貌空间重建得以实现。这一设计大大提高了H&E的纯精度,从71.44%提高到94.25%,实现了95.09%的全双重模式投入,并在单一模式条件下保持了90.28%的可靠性。 “双模式首选、单一模式兼容性”架构提供了近乎双模式的准确性,而无需强制同步获取,为资源有限的地区提供了具有成本效益的解决方案,并显著改进了HER2评估的无障碍性。

Article 97

Title@2025-07-31 (4): Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner

Title: Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner

Gute Lernende denken an ihr Denken: Generatives PRM macht groß aufschlussreiches Modell effizienter Math Learner

优秀的学习者思考他们的思考:创创型程序使大型理性模型提高数学学习者的效率 2507.23317v1

Authors (6): Tao He, Rongchuan Mu, Lizi Liao, Yixin Cao, Ming Liu, Bing Qin

Large reasoning models (LRMs) have recently shown promise in solving complex math problems when optimized with Reinforcement Learning (RL). But conventional approaches rely on outcome-only rewards that provide sparse feedback, resulting in an inefficient optimization process. In this work, we investigate the function of process reward models (PRMs) to accelerate the RL training for LRMs. We propose a novel intrinsic signal-driven generative process evaluation mechanism operating at the thought level to address major bottlenecks in RL-based training. Specifically, instead of requiring PRMs to know how to solve problems, our method uses intrinsic signals in solutions to judge stepwise correctness and aggregate contiguous correct/incorrect steps into coherent ‘thought’ units. These structured, thought-level rewards enable more reliable credit assignment by reducing ambiguity in step segmentation and alleviating reward hacking. We further introduce a capability-adaptive reward mechanism that dynamically balances exploration and exploitation based on the LRM’s current proficiency, guiding learning without stifling creative trial-and-error. These innovations are integrated into a new off-policy RL algorithm, TP-GRPO, which extends grouped proximal optimization with process-based rewards and improves training efficiency. Experiments on 1.5B and 7B parameter LRMs demonstrate that our method achieves higher problem-solving accuracy with significantly fewer training samples than outcome-only reward baselines. The results validate that well-structured process rewards can substantially accelerate LRM optimization in math reasoning tasks. Code is available at https://github.com/cs-holder/tp_grpo.

大型推理模型(LRMs)最近显示,在采用强化学习(RL)优化时,有望解决复杂的数学问题。但是,常规方法依靠只注重结果的奖励,这种奖励提供的反馈稀少,导致优化过程效率低下。在这项工作中,我们研究过程奖励模型(PRMs)在加快LRMs的RL训练方面的作用。我们提出一种新的、由内在信号驱动的生成式过程评价机制,在思维层面运作,以解决基于RL的训练中的主要瓶颈。具体地说,我们的方法不要求PRMs知道如何解决问题,而是利用解答中的内在信号来判断逐步的正确性,并将连续的正确/不正确步骤合并成连贯的“思考”单元。这种结构化的、思维层面的奖励减少了步骤划分中的模糊性,并减轻了奖励投机(reward hacking),从而实现更可靠的信用分配。我们进一步引入一种能力自适应奖励机制,根据LRM当前的熟练程度动态平衡探索与利用,在不扼杀创造性试错的情况下指导学习。这些创新被整合到一种新的离策略RL算法TP-GRPO中,该算法以基于过程的奖励扩展了分组近端优化,并提高了训练效率。在1.5B和7B参数LRMs上的实验表明,与只注重结果的奖励基线相比,我们的方法以少得多的训练样本实现了更高的解题准确率。结果证实,结构良好的过程奖励可以大大加快数学推理任务中的LRM优化。代码见 https://github.com/cs-holder/tp_grpo。
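The aggregation of contiguous correct or incorrect steps into "thought" units, as described in the abstract above, can be sketched with a run-length grouping. This is an illustrative reading only: how step correctness is judged from intrinsic signals is not reproduced here, and the function name `group_into_thoughts` and the sample labels are hypothetical.

```python
from itertools import groupby


def group_into_thoughts(step_correct):
    """Merge consecutive steps that share a correctness label.

    Returns a list of (label, start, end) tuples, where the unit covers
    steps start..end-1 (end is exclusive).
    """
    thoughts = []
    start = 0
    for label, run in groupby(step_correct):
        length = len(list(run))
        thoughts.append((label, start, start + length))
        start += length
    return thoughts


# Hypothetical per-step correctness judgments for one solution.
steps = [True, True, False, False, True]
thoughts = group_into_thoughts(steps)
```

Assigning one reward per unit rather than per step is presumably what reduces the ambiguity in step segmentation that the abstract mentions.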

Article 98

Title@2025-07-31 (4): MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse

Title: MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse

MemShare: Memory Effiziente Schlussfolgerung für große Vernunftmodelle durch KV Cache Reuse

Memshare:通过 KV 缓存再使用大型理由模型的内存高效引用 2507.21433v2

Authors (4): Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu

Large Reasoning Models (LRMs) have achieved significant advances in mathematical reasoning and formal logic tasks. However, their tendency to generate lengthy chain-of-thought sequences leads to substantial memory overhead during inference. We observe that LRMs frequently produce highly similar intermediate reasoning steps, which correspond to similar KV cache states across layers. Motivated by this observation, we propose MemShare, a novel KV cache management approach that effectively reduces memory overhead. MemShare employs a collaborative filtering algorithm to efficiently identify reusable KV cache blocks and enables zero copy cache reuse to significantly reduce memory overhead and improve throughput while maintaining accuracy. Experimental results demonstrate that MemShare delivers up to 84.79% improvement in throughput while maintaining better accuracy compared to existing KV cache management methods.

大型理性模型(LRMs)在数学推理和正式逻辑任务方面取得了显著进步,然而,它们往往产生冗长的思维链序列,导致在推理过程中大量内存管理。我们观察到,LRMs经常产生非常相似的中间推理步骤,这与不同层次的类似的KV缓存状态相对应。我们提出MemShare,这是一个创新的KV缓存管理方法,可以有效减少内存管理。MemShare使用一种协作过滤算法,以有效识别可重复使用的KV缓存区块,使零复制缓存再利用能够大大减少内存管理,提高吞吐量,同时保持准确性。实验结果显示,MemShare在与现有的KV缓存管理方法相比保持更高的准确性的同时,在吞吐量方面实现了高达84.79的改进。

Article 99

Title@2025-07-31 (4): Impact of Hyperparameter Optimization on the Accuracy of Lightweight Deep Learning Models for Real-Time Image Classification

Title: Impact of Hyperparameter Optimization on the Accuracy of Lightweight Deep Learning Models for Real-Time Image Classification

Auswirkungen von Hyperparameter-Optimierung auf die Genauigkeit leichter Deep Learning-Modelle für Echtzeit-Bildklassifikation

超超参数优化对实时图像分类轻型深度学习模型准确性的影响 2507.23315v1

Authors (5): Vineet Kumar Rakesh, Soumya Mazumdar, Tapas Samanta, Sarbajit Pal, Amitabha Das

Lightweight convolutional and transformer-based models have become vital for real-time image classification in resource-constrained applications, such as embedded systems and edge devices. This work analyzes the influence of hyperparameter adjustment on the accuracy and convergence behavior of seven efficient deep learning architectures: EfficientNetV2-S, ConvNeXt-T, MobileViT v2 (XXS/XS/S), MobileNetV3-L, TinyViT-21M, and RepVGG-A2. All models are trained on the ImageNet-1K dataset under consistent training settings, with an emphasis on real-time practicality. A comprehensive ablation study is undertaken to separate the effect of critical hyperparameters, including learning rate schedules, batch sizes, input resolution, data augmentation, regularization approaches, and optimizer choice. To assess appropriateness for real-time applications, each model is assessed not only in terms of Top-1 and Top-5 classification accuracy, but also in terms of inference time, parameter count, model size, and frames-per-second (FPS) on a GPU-accelerated edge deployment simulation. Results demonstrate that cosine learning rate decay and adjustable batch size may greatly boost both accuracy and convergence speed, while keeping low latency and memory cost. Notably, RepVGG-A2 achieves over 80% Top-1 accuracy with efficient inference performance, offering a compelling balance between accuracy and deployment cost for VGG-style models. The results give practical guidance for constructing resource-efficient deep learning models appropriate for real-time image processing pipelines. All code and training logs are publicly accessible at https://github.com/VineetKumarRakesh/lcnn-opt.

以轻量级变速器为基础的模型和变压器模型对于资源限制的应用,如嵌入系统和边缘装置等应用程序的实时图像效率分类至关重要。这项工作分析了超参数调整对七种高效深层学习结构的准确性和趋同行为的影响:高效NetV2-S、ConnNeXt-T、MoveVT v2(XXS/XS/S)、MmoveNetV3-L、TinyVVT-2M和RepVGG-A2。所有模型都经过了在持续培训环境中对图像Net-1K数据集进行实时图像效率分类的培训,重点是实时实用性。进行了全面的模拟研究,以区分关键超参数的精确度,包括学习率表、批量规模、投入分辨率分辨率分辨率分辨率解析、数据增强、正规化方法以及优化选择。为了评估实时应用的适宜性应用,不仅从Top-1和Top-5分类的准确性角度对每个模型进行评估,而且从精确度、可理解性参数计数、模型大小和深度代码二(FPS)的角度,在GPUE-递增实时精确性精确性精确度精确度精确度精确度的精确度中进行。在快速部署精度精确度精确度精确度精确度精确度精确度上进行测试和快速度精确度精确度精确度上,同时进行成本和精确度精确度模拟进行。在标准测试和精确度上,在进行实时度上,在进行成本级和精确度精确度精确度上进行。在进行。
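The cosine learning rate decay highlighted in the abstract above follows a standard formula, sketched here in plain Python. The abstract does not state the exact schedule used in the study, so the warmup-free form, the parameter names, and the example values below are assumptions.

```python
import math


def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate at a given step.

    Starts at lr_max when step == 0 and decays smoothly to lr_min
    when step == total_steps.
    """
    # Clamp the step so the schedule stays flat outside [0, total_steps].
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Hypothetical schedule: 100 steps, peak learning rate 0.1.
schedule = [cosine_lr(s, 100, 0.1) for s in range(101)]
```

The schedule decreases slowly at first, fastest around the midpoint, and slowly again near the end, which is often cited as the reason it converges more smoothly than step decay.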

Article 100

Title@2025-07-31 (4): An Interpretable Data-Driven Unsupervised Approach for the Prevention of Forgotten Items

Title: An Interpretable Data-Driven Unsupervised Approach for the Prevention of Forgotten Items

Ein interpretierbarer, datengestützter, unbeaufsichtigter Ansatz zur Vermeidung vergessener Gegenstände

防止被遗忘物品不受监督的可解释数据驱动的未受监督的防止被遗忘物品的方法 2507.23303v1

Authors (5): Luca Corbucci, Javier Alejandro Borges Legrottaglie, Francesco Spinnato, Anna Monreale, Riccardo Guidotti

Accurately identifying items forgotten during a supermarket visit and providing clear, interpretable explanations for recommending them remains an underexplored problem within the Next Basket Prediction (NBP) domain. Existing NBP approaches typically only focus on forecasting future purchases, without explicitly addressing the detection of unintentionally omitted items. This gap is partly due to the scarcity of real-world datasets that allow for the reliable estimation of forgotten items. Furthermore, most current NBP methods rely on black-box models, which lack transparency and limit the ability to justify recommendations to end users. In this paper, we formally introduce the forgotten item prediction task and propose two novel interpretable-by-design algorithms. These methods are tailored to identify forgotten items while offering intuitive, human-understandable explanations. Experiments on a real-world retail dataset show our algorithms outperform state-of-the-art NBP baselines by 10-15% across multiple evaluation metrics.

准确识别超市访问中被遗忘的物品并提供明确、可解释的解释性的解释来建议它们仍然是下一个篮子预测(NBP)范围内一个未得到充分探讨的问题。现有的NBP方法通常只侧重于预测未来的采购,而没有明确地解决无意遗漏物品的检测问题。这一差距部分是由于缺乏真实世界数据集,无法可靠地估计被遗忘物品。此外,大多数目前的NBP方法都依赖于黑盒模型,这些模型缺乏透明度,限制了向终端用户提出建议的合理性。在本文中,我们正式介绍了被遗忘的物品预测任务,并提出了两种新颖的可逐个解释的算法。这些方法专门用来识别被遗忘的物品,同时提供了直观的、人能理解的解释。对真实世界零售数据集的实验显示,我们的算法在多个评价指标中超出了NBP基准的10-15%。

Article 101

Title@2025-07-31 (4): Simulation-based inference for Precision Neutrino Physics through Neural Monte Carlo tuning

Title: Simulation-based inference for Precision Neutrino Physics through Neural Monte Carlo tuning

Simulationsbasierte Inferenz für Präzisions-Neutrinophysik durch Neural Monte Carlo-Tuning

通过神经蒙特卡洛调控精密中子物理的模拟推论 2507.23297v1

Authors (66): A. Gavrikov, A. Serafini, D. Dolzhikov, A. Garfagnini, M. Gonchar, M. Grassi, R. Brugnera, V. Cerrone, L. V. D’Auria, R. M. Guizzetti, L. Lastrucci, G. Andronico, V. Antonelli, A. Barresi, D. Basilico, M. Beretta, A. Bergnoli, M. Borghesi, A. Brigatti, R. Bruno, A. Budano, B. Caccianiga, A. Cammi, R. Caruso, D. Chiesa, C. Clementi, C. Coletta, S. Dusini, A. Fabbri, G. Felici, G. Ferrante, M. G. Giammarchi, N. Giudice, N. Guardone, F. Houria, C. Landini, I. Lippi, L. Loi, P. Lombardi, F. Mantovani, S. M. Mari, A. Martini, L. Miramonti, M. Montuschi, M. Nastasi, D. Orestano, F. Ortica, A. Paoloni, L. Pelicci, E. Percalli, F. Petrucci, E. Previtali, G. Ranucci, A. C. Re, B. Ricci, A. Romani, C. Sirignano, M. Sisti, L. Stanco, E. Stanescu Farilla, V. Strati, M. D. C Torri, C. Tuvè, C. Venettacci, G. Verde, L. Votano

Precise modeling of detector energy response is crucial for next-generation neutrino experiments, which present computational challenges due to the lack of analytical likelihoods. We propose a solution using neural likelihood estimation within the simulation-based inference framework. We develop two complementary neural density estimators that model likelihoods of calibration data: conditional normalizing flows and a transformer-based regressor. We adopt JUNO, a large neutrino experiment, as a case study. The energy response of JUNO depends on several parameters, all of which should be tuned, given their non-linear behavior and strong correlations in the calibration data. To this end, we integrate the modeled likelihoods with Bayesian nested sampling for parameter inference, achieving uncertainties limited only by statistics with near-zero systematic biases. The normalizing flows model enables unbinned likelihood analysis, while the transformer provides an efficient binned alternative. By providing both options, our framework offers flexibility to choose the most appropriate method for specific needs. Finally, our approach establishes a template for similar applications across experimental neutrino and broader particle physics.

精确模拟探测器能量反应的模型对于下一代中微子实验至关重要,这些实验由于缺乏分析可能性而带来计算挑战。我们提议在模拟推断框架内使用神经概率估计的解决方案。我们开发了两个补充性神经密度估计器,这些测量器可以模拟校准数据的可能性:有条件的正常流流和以变压器为基础的递减器。我们采用JUNO – – 一个大型中微子实验 – – 作为案例研究。JUNO的能量反应取决于若干参数,所有参数都应加以调整,因为其非线性行为和校准数据中的强烈关联性。为此目的,我们将模型的可能性与贝耶斯人的嵌套取样作为参数的推断性结合,只有近于零系统偏差的统计才能实现不确定性。正常流模型能够进行未孵化的概率分析,而变压器则提供高效的组合替代方法。通过提供两种选项,我们的框架为选择最适合具体需要的最适当方法提供了灵活性。最后,我们的方法为实验性中微微子和更广泛的粒物理学的类似应用建立了模板。

Article 102

Title@2025-07-31 (4): SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy

Title: SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy

SequenzLayer: Sequenzverarbeitung und Streaming von Neuronalen Netzwerken leicht gemacht

序列激光器:序列处理和串联神经网络变得容易 2507.23292v1

Authors (11): RJ Skerry-Ryan, Julian Salazar, Soroosh Mariooryad, David Kao, Daisy Stanton, Eric Battenberg, Matt Shannon, Ron J. Weiss, Robin Scheibler, Jonas Rothfuss, Tom Bagby

We introduce a neural network layer API and library for sequence modeling, designed for easy creation of sequence models that can be executed both layer-by-layer (e.g., teacher-forced training) and step-by-step (e.g., autoregressive sampling). To achieve this, layers define an explicit representation of their state over time (e.g., a Transformer KV cache, a convolution buffer, an RNN hidden state), and a step method that evolves that state, tested to give identical results to a stateless layer-wise invocation. This and other aspects of the SequenceLayers contract enable complex models to be immediately streamable, mitigate a wide range of common bugs arising in both streaming and parallel sequence processing, and can be implemented in any deep learning library. A composable and declarative API, along with a comprehensive suite of layers and combinators, streamlines the construction of production-scale models from simple streamable components while preserving strong correctness guarantees. Our current implementations of SequenceLayers (JAX, TensorFlow 2) are available at https://github.com/google/sequence-layers.

为实现这一目标,我们引入了神经网络层 API 和序列模型库, 目的是容易地创建可以逐层执行的序列模型( 教师强制培训) 和一步步执行的序列模型( 自动递减抽样 ) 。为了实现这一点, 层界定了它们随着时间推移的状态的清晰描述( 例如变换器 KV 缓存、混凝土缓冲、隐藏的 RNN ) , 并引入一个步骤方法, 该步骤方法将状态化, 测试为给无国籍的分层性职业带来相同结果。以及序列激光器合同的其他方面使复杂模型能够立即流动, 减轻在串流和平行序列处理中产生的广泛常见的错误, 并且可以在任何深层学习图书馆中实施。一个可比较和具有宣示性的 API , 连同一个全面的层层和梳理器组合, 将生产规模模型的构建从简单可流成的组件简化, 同时又保持强烈的正确性保证。我们目前实施的SquecesLayers ( JAX, TensorFlow 2) 可在 httpsrence.
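
The layer-wise versus step-wise contract described in this abstract can be illustrated with a toy causal convolution. This is a minimal sketch in plain Python and not the SequenceLayers API: the class name `CausalConv1D`, its methods `initial_state`, `layer`, and `step`, and the helper `run_stepwise` are hypothetical names chosen for illustration only.

```python
from typing import List, Tuple


class CausalConv1D:
    """Toy causal 1-D convolution with an explicit streaming state (hypothetical API)."""

    def __init__(self, kernel: List[float]) -> None:
        self.kernel = kernel

    def initial_state(self) -> List[float]:
        # The state is a buffer holding the last (kernel_size - 1) inputs, zero-initialised.
        return [0.0] * (len(self.kernel) - 1)

    def layer(self, xs: List[float]) -> List[float]:
        # Layer-wise call: process the whole sequence at once with left zero-padding.
        padded = self.initial_state() + list(xs)
        k = len(self.kernel)
        return [
            sum(w * v for w, v in zip(self.kernel, padded[t:t + k]))
            for t in range(len(xs))
        ]

    def step(self, x: float, state: List[float]) -> Tuple[float, List[float]]:
        # Step-wise call: consume one input, emit one output, return the evolved state.
        window = state + [x]
        y = sum(w * v for w, v in zip(self.kernel, window))
        return y, window[1:]


def run_stepwise(layer: CausalConv1D, xs: List[float]) -> List[float]:
    # Feed the sequence one element at a time, threading the state through.
    state = layer.initial_state()
    ys = []
    for x in xs:
        y, state = layer.step(x, state)
        ys.append(y)
    return ys


conv = CausalConv1D([0.5, 0.25, 0.25])
xs = [1.0, 2.0, 3.0, 4.0]
# The streaming result should match the layer-wise result element for element.
print(run_stepwise(conv, xs) == conv.layer(xs))
```

The equality check at the end mirrors the kind of test the abstract mentions: the step method is expected to reproduce the stateless layer-wise output.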

Article 103

Title@2025-07-31 (4): Evaluating the Dynamics of Membership Privacy in Deep Learning

Title: Evaluating the Dynamics of Membership Privacy in Deep Learning

Bewertung der Dynamik der Mitgliedschafts-Privacy in Deep Learning

深层学习中成员隐私动态评估 2507.23291v1

Authors (5): Yuetian Chen, Zhiqi Wang, Nathalie Baracaldo, Swanand Ravindra Kadhe, Lei Yu

Membership inference attacks (MIAs) pose a critical threat to the privacy of training data in deep learning. Despite significant progress in attack methodologies, our understanding of when and how models encode membership information during training remains limited. This paper presents a dynamic analytical framework for dissecting and quantifying privacy leakage dynamics at the individual sample level. By tracking per-sample vulnerabilities on an FPR-TPR plane throughout training, our framework systematically measures how factors such as dataset complexity, model architecture, and optimizer choice influence the rate and severity at which samples become vulnerable. Crucially, we discover a robust correlation between a sample’s intrinsic learning difficulty and its privacy risk, and find that the privacy risk of samples highly vulnerable in the final trained model is largely determined early during training. Our results thus provide a deeper understanding of how privacy risks dynamically emerge during training, laying the groundwork for proactive, privacy-aware model training strategies.

尽管在攻击方法方面取得了显著进展,但我们对培训期间成员信息编码模型的时间和方式的理解仍然有限,本文件为在单个样本一级解剖和量化隐私渗漏动态提供了一个动态分析框架。通过在整个培训过程中跟踪FPR-TRP飞机上每个样本的脆弱性,我们的框架系统地测量了数据集复杂性、模型结构以及优化选择等因素如何影响样本易受损的速度和严重性。关键是,我们发现了样本内在学习困难之间的紧密关联,发现最终培训模式中样本高度脆弱的隐私风险在培训初期基本确定。因此,我们的结果更深入地了解了培训期间隐私风险的动态出现,为积极主动的、意识到隐私的示范培训战略奠定了基础。
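
To make the FPR-TPR plane concrete, the sketch below computes one (FPR, TPR) point from attack scores at a fixed threshold and repeats this per training checkpoint. It is an illustrative simplification, not the paper's framework: the function name `fpr_tpr`, the threshold, and all scores are made-up assumptions.

```python
from typing import List, Sequence, Tuple


def fpr_tpr(
    member_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (FPR, TPR) when every score >= threshold is predicted 'member'."""
    true_pos = sum(s >= threshold for s in member_scores)
    false_pos = sum(s >= threshold for s in nonmember_scores)
    return false_pos / len(nonmember_scores), true_pos / len(member_scores)


# Hypothetical attack scores at two checkpoints (illustrative values only).
checkpoints = [
    ([0.5, 0.4, 0.6, 0.3], [0.5, 0.4, 0.2, 0.6]),  # early in training
    ([0.9, 0.8, 0.4, 0.7], [0.1, 0.5, 0.3, 0.2]),  # later in training
]

# One point on the FPR-TPR plane per checkpoint.
trajectory: List[Tuple[float, float]] = [
    fpr_tpr(members, nonmembers, threshold=0.45)
    for members, nonmembers in checkpoints
]
print(trajectory)  # [(0.5, 0.5), (0.25, 0.75)]
```

In this toy trajectory the attack moves from chance level toward a lower FPR and higher TPR, which is the kind of movement one would track over training.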

Article 104

Title@2025-07-31 (4): Tailored Forecasting from Short Time Series via Meta-learning

Title: Tailored Forecasting from Short Time Series via Meta-learning

Maßgeschneiderte Prognose aus Kurzzeitreihen über Meta-Learning

通过元学习从短时间系列中进行量身定制的预测 2501.16325v2

Authors (5): Declan A. Norton, Edward Ott, Andrew Pomerance, Brian Hunt, Michelle Girvan

Machine learning models can effectively forecast dynamical systems from time-series data, but they typically require large amounts of past data, making forecasting particularly challenging for systems with limited history. To overcome this, we introduce Meta-learning for Tailored Forecasting using Related Time Series (METAFORS), which generalizes knowledge across systems to enable forecasting in data-limited scenarios. By learning from a library of models trained on longer time series from potentially related systems, METAFORS builds and initializes a model tailored to short time-series data from the system of interest. Using a reservoir computing implementation and testing on simulated chaotic systems, we demonstrate that METAFORS can reliably predict both short-term dynamics and long-term statistics without requiring contextual labels. We see this even when test and related systems exhibit substantially different behaviors, highlighting METAFORS’ strengths in data-limited scenarios.

机械学习模型能够有效地从时间序列数据中预测动态系统,但它们通常需要大量过去的数据,使预测对历史有限的系统特别具有挑战性。为了克服这一点,我们采用了使用相关时间序列(MEtaFORS)进行定制预测的元学习,该模型将各系统的知识概括化,以便能够在数据有限的假设情景中进行预测。通过从潜在相关系统中学习长期系列培训模型的图书馆,METAFORS建立并初始化了一个适合有兴趣的系统短期系列数据的模型。我们利用储油层计算实施和模拟混乱系统测试,我们证明METAFORS可以可靠地预测短期动态和长期统计数据,而无需贴上上相关标签。我们甚至在测试和相关系统表现出截然不同的行为时也看到这一点,突出METAFORS在数据有限假设情景中的优势。

Article 105

Title@2025-07-31 (4): Insights into Closed-form IPM-GAN Discriminator Guidance for Diffusion Modeling

Title: Insights into Closed-form IPM-GAN Discriminator Guidance for Diffusion Modeling

Einblicke in die Closed-Form IPM-GAN Discriminator Guidance for Diffusion Modeling

透视到封闭式 IPPM-GAN-GAN 2306.01654v2

Authors (4): Aadithya Srikanth, Siddarth Asokan, Nishanth Shetty, Chandra Sekhar Seelamantula

Diffusion models are a state-of-the-art generative modeling framework that transforms noise into images via Langevin sampling, guided by the score, which is the gradient of the logarithm of the data distribution. Recent works have shown empirically that the generation quality can be improved when guided by a classifier network, which is typically the discriminator trained in a generative adversarial network (GAN) setting. In this paper, we propose a theoretical framework to analyze the effect of the GAN discriminator on Langevin-based sampling, and show that the IPM-GAN optimization can be seen as one of smoothed score-matching, wherein the scores of the data and the generator distributions are convolved with the kernel function associated with the IPM. The proposed approach serves to unify score-based training and optimization of IPM-GANs. Based on these insights, we demonstrate that closed-form kernel-based discriminator guidance results in improvements (in terms of CLIP-FID and KID metrics) when applied atop baseline diffusion models. We demonstrate these results on the denoising diffusion implicit model (DDIM) and latent diffusion model (LDM) settings on various standard datasets. We also show that the proposed approach can be combined with existing accelerated-diffusion techniques to improve latent-space image generation.

传播模型是一个最先进的基因模型框架,它通过Langevin抽样将噪音转化为图像,以评分为指导,这是数据分布对数的梯度。最近的工作从经验上表明,在分类网络的指导下,生成质量可以得到改善,分类网络通常是在基因对抗网络(GAN)设置中受过培训的歧视问题人。在本文件中,我们提出了一个理论框架,以分析GAN歧视者对基于Langevin的抽样的影响,并表明IPM-GAN优化可被视为一种平滑的得分匹配方法,其中数据分数和发电机分布与IPM相关的内核函数混在一起。拟议方法有助于统一基于分数的培训和优化IPM-GANs。根据这些认识,我们展示了封闭式的内核分析师指南,在应用(CLIP-FID和KID指标)基准传播模型时,结果可以被看成是平坦的比对,数据分分和发电机分布与IPM相关的内核元函数函数混合。我们展示了这些结果,我们还展示了各种隐含的图像分析模式,我们还展示了各种隐性模型,以便展示了各种隐性模型。
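
As a reminder of what score-guided Langevin sampling looks like, here is a minimal one-dimensional sketch using unadjusted Langevin dynamics on a Gaussian target, whose score is known in closed form. It does not implement the paper's kernel-based discriminator guidance; the function names, target parameters, step size, and chain counts are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, Optional


def gaussian_score(x: float, mu: float = 2.0, sigma: float = 1.0) -> float:
    """Score of N(mu, sigma^2): d/dx log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma ** 2


def langevin_sample(
    score: Callable[[float], float],
    x0: float,
    step: float = 0.1,
    n_steps: int = 300,
    rng: Optional[random.Random] = None,
) -> float:
    """Unadjusted Langevin dynamics: x <- x + (step / 2) * score(x) + sqrt(step) * z."""
    if rng is None:
        rng = random.Random()
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + 0.5 * step * score(x) + math.sqrt(step) * noise
    return x


rng = random.Random(0)
# Run independent chains from x0 = 0; they should drift toward the target mean of 2.
samples = [langevin_sample(gaussian_score, 0.0, rng=rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
print(round(sample_mean, 2))
```

Guidance methods in general modify the `score` callable by adding an extra term; how that term is derived from an IPM-GAN kernel is the subject of the paper and is not reproduced here.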

Article 106

Title@2025-07-31 (4): CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation

Title: CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation

CEE: Eine Inferenz-Zeit-Jailbreak-Verteidigung für eingedrungene Intelligenz über Subraumkonzept-Rotation

中东欧:通过子空间概念旋转对潜入式情报进行推论-时间破狱防御 2504.13201v2

Authors (8): Jirui Yang, Zheyu Lin, Zhihui Lu, Yinggui Wang, Lei Wang, Tao Wei, Xin Du, Shuhan Yang

Large Language Models (LLMs) are increasingly becoming the cognitive core of Embodied Intelligence (EI) systems, such as robots and autonomous vehicles. However, this integration also exposes them to serious jailbreak risks, where malicious instructions can be transformed into dangerous physical actions. Existing defense mechanisms suffer from notable drawbacks, including high training costs, significant inference delays, and complex hyperparameter tuning, which limit their practical applicability. To address these challenges, we propose a novel and efficient inference-time defense framework: Concept Enhancement Engineering (CEE). CEE enhances the model’s inherent safety mechanisms by directly manipulating its internal representations, requiring neither additional training nor external modules, thereby improving defense efficiency. Furthermore, CEE introduces a rotation-based control mechanism that enables stable and linearly tunable behavioral control of the model. This design eliminates the need for tedious manual tuning and avoids the output degradation issues commonly observed in other representation engineering methods. Extensive experiments across multiple EI safety benchmarks and diverse attack scenarios demonstrate that CEE significantly improves the defense success rates of various multimodal LLMs. It effectively mitigates safety risks while preserving high-quality generation and inference efficiency, offering a promising solution for deploying safer embodied intelligence systems.

大型语言模型(LLMS)正日益成为机器人和自主车辆等Ebodied Intell(EI)系统的认知核心;然而,这种整合还使其面临严重的越狱风险,恶意指令可转化为危险的物理行为;现有防御机制存在明显的缺陷,包括高培训成本、严重推论拖延和复杂的超参数调,限制了其实际适用性;为应对这些挑战,我们提议了一个创新和有效的推论时间防御框架:概念增强工程。中东欧通过直接调整内部代表方式,既不需要额外的培训,也不需要外部模块,从而提高防御效率,加强了模型的固有安全机制。此外,中东欧还引入了基于轮换的控制机制,使该模型能够稳定和线性地对金枪鱼行为进行控制。这一设计消除了对老调和避免其他代表工程方法常见的产出退化问题的需求。跨多个EI安全基准和不同攻击情景的广泛实验表明,中东欧通过直接操纵其内部代表方式,既不需要额外的培训,也不需要额外的外部模块,从而能够提高国防成功率,从而提高防御效率。此外,中东欧还引入了基于高品质的系统,有效地降低安全风险。
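
One way to picture a rotation-based control knob is to rotate a hidden vector toward a concept direction inside the plane the two vectors span, which changes direction but preserves the norm. The sketch below is a geometric illustration under that assumption and is not the CEE implementation: `rotate_toward`, the example vectors, and the angle are hypothetical.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def rotate_toward(h: Sequence[float], concept: Sequence[float], theta: float) -> List[float]:
    """Rotate h by angle theta toward the concept direction, keeping its norm fixed."""
    c = [x / norm(concept) for x in concept]           # unit concept direction
    along = dot(h, c)                                  # component of h along c
    residual = [x - along * ci for x, ci in zip(h, c)]
    across = norm(residual)                            # component of h orthogonal to c
    if across == 0.0:
        return list(h)  # h is parallel to c, so the rotation plane is undefined
    u = [x / across for x in residual]                 # unit vector orthogonal to c
    phi = math.atan2(across, along)                    # current angle between h and c
    new_phi = max(phi - theta, 0.0)                    # never rotate past the concept
    r = norm(h)
    return [r * (math.cos(new_phi) * ci + math.sin(new_phi) * ui) for ci, ui in zip(c, u)]


h = [0.0, 2.0, 0.0]
concept = [1.0, 0.0, 0.0]
rotated = rotate_toward(h, concept, math.pi / 4)
print(norm(rotated), dot(rotated, concept) / norm(rotated))
```

Because the only free parameter is the angle, the strength of the intervention is controlled by a single scalar, which is the intuition behind a tunable rotation; the actual subspace construction in CEE may differ.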

Article 107

Title: SmartPNT-MSF: A Multi-Sensor Fusion Dataset for Positioning and Navigation Research

SmartPNT-MSF: Multi-Sensor-Fusionsdatensatz für Positionierung und Navigationsforschung

SmartPNT-MSF:用于定位和导航研究的多传感器融合数据集 2507.19079v2

Authors (5): Feng Zhu, Zihang Zhang, Kangcheng Teng, Abduhelil Yakup, Xiaohong Zhang

High-precision navigation and positioning systems are critical for applications in autonomous vehicles and mobile mapping, where robust and continuous localization is essential. To test and enhance the performance of algorithms, some research institutions and companies have successively constructed and publicly released datasets. However, existing datasets still suffer from limitations in sensor diversity and environmental coverage. To address these shortcomings and advance development in related fields, the SmartPNT Multisource Integrated Navigation, Positioning, and Attitude Dataset has been developed. This dataset integrates data from multiple sensors, including Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), optical cameras, and LiDAR, to provide a rich and versatile resource for research in multi-sensor fusion and high-precision navigation. The dataset construction process is thoroughly documented, encompassing sensor configurations, coordinate system definitions, and calibration procedures for both cameras and LiDAR. A standardized framework for data collection and processing ensures consistency and scalability, enabling large-scale analysis. Validation using state-of-the-art Simultaneous Localization and Mapping (SLAM) algorithms, such as VINS-Mono and LIO-SAM, demonstrates the dataset’s applicability for advanced navigation research. Covering a wide range of real-world scenarios, including urban areas, campuses, tunnels, and suburban environments, the dataset offers a valuable tool for advancing navigation technologies and addressing challenges in complex environments. By providing a publicly accessible, high-quality dataset, this work aims to bridge gaps in sensor diversity, data accessibility, and environmental representation, fostering further innovation in the field.

高精确度导航和定位系统对于自主车辆和移动绘图的应用至关重要,在自主车辆和移动绘图中,稳健和持续的地方化至关重要。为测试和加强算法的性能,一些研究机构和公司相继建造并公开发布数据集。然而,现有的数据集仍然受到传感器多样性和环境覆盖方面的限制。为克服这些缺陷和相关领域的先进发展,开发了SmartPNT多源综合导航、定位和姿态数据集。该数据集整合了来自多个传感器的数据,包括全球导航卫星系统(GNSS)、惯性测量单位(IMU)、光学照相机和LIDAR。为多传感器融合和高精确度导航导航的研究提供丰富和多功能的资源。数据集的构建过程有详细记录,包括传感器配置、协调系统定义和对照相机的校正程序。一个标准化的数据收集和处理框架确保一致性和可扩展性,便于进行大规模分析。利用州级的同步定位和可达度度度度度度度度度测量单位(IM)、复杂度摄像头摄影机和绘图(SLAM),在高清晰度导航和高清晰度的轨道上进行数据评估,在高端导航区域进行数据分析。

Article 108

Title@2025-07-31 (4): DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System

Title: DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System

DynaSwarm: Dynamische Graphenstrukturauswahl für LLM-basiertes Multi-Agent-System

DynSwarm: 以LLM为基础的多剂系统动态图结构选择 2507.23261v1

Authors (2): Hui Yi Leong, Yuqing Wu

Current multi-agent systems (MAS) frameworks often rely on manually designed and static collaboration graph structures, limiting adaptability and performance. To address these limitations, we propose DynaSwarm, a dynamic framework that enhances LLM-based MAS through two key innovations: (1) an actor-critic reinforcement learning (A2C) mechanism to optimize graph structures with improved stability over prior RL methods, and (2) a dynamic graph selector that adaptively chooses the optimal graph structure for each input sample via parameter-efficient LLM fine-tuning. DynaSwarm eliminates the need for rigid, one-size-fits-all graph architectures, instead leveraging sample-specific idiosyncrasies to dynamically route queries through specialized agent networks. (c) We propose to fine-tune the demonstration retriever to fully exploit the power of in-context learning (ICL). Extensive experiments on question answering, mathematical reasoning, and coding tasks demonstrate that DynaSwarm consistently outperforms state-of-the-art single-agent and MAS baselines across multiple LLM backbones. Our findings highlight the importance of sample-aware structural flexibility in LLM MAS designs.

目前的多试剂系统框架往往依赖于人工设计和静态协作图结构,限制了适应性和性能。为解决这些限制,我们提议Dynaswarm,这是一个动态框架,通过两个关键的创新,加强以LLM为基础的MAS,加强LM的LMS:(1) 行为者-北极强化学习(A2C)机制,优化图形结构,使其比先前RL方法更加稳定;(2) 动态图形选择器,通过参数高效LLM微调,适应性地选择每个输入样本的最佳图形结构。Dynaswarm消除了对硬性、一刀切的图形结构的需要,而不用通过专门的LMM主干网利用样本的特征合成来动态地查询。 (c) 我们建议对演示检索器进行微调,以充分利用文本学习的力量(ICLL),对问题回答、数学推理和共同任务进行广泛的实验,表明Dynaswarm始终超越LMM系统多个主干网的S-awa结构。我们的调查结果强调了LMMAS设计中样品结构灵活性的重要性。

Article 109

Title@2025-07-31 (4): AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift

Title: AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift

KI sollte besser fühlen, nicht nur größer skalieren: Adaptive Sensing als Paradigmenverschiebung

AI 应当更好,而不仅仅是规模更大:将适应性遥感作为范式转变 2507.07820v2

Authors (6): Eunsu Baek, Keondo Park, Jeonggil Ko, Min-hwan Oh, Taesik Gong, Hyung-Sin Kim

Current AI advances largely rely on scaling neural models and expanding training datasets to achieve generalization and robustness. Despite notable successes, this paradigm incurs significant environmental, economic, and ethical costs, limiting sustainability and equitable access. Inspired by biological sensory systems, where adaptation occurs dynamically at the input (e.g., adjusting pupil size, refocusing vision), we advocate for adaptive sensing as a necessary and foundational shift. Adaptive sensing proactively modulates sensor parameters (e.g., exposure, sensitivity, multimodal configurations) at the input level, significantly mitigating covariate shifts and improving efficiency. Empirical evidence from recent studies demonstrates that adaptive sensing enables small models (e.g., EfficientNet-B0) to surpass substantially larger models (e.g., OpenCLIP-H) trained with significantly more data and compute. We (i) outline a roadmap for broadly integrating adaptive sensing into real-world applications spanning humanoid, healthcare, autonomous systems, agriculture, and environmental monitoring, (ii) critically assess technical and ethical integration challenges, and (iii) propose targeted research directions, such as standardized benchmarks, real-time adaptive algorithms, multimodal integration, and privacy-preserving methods. Collectively, these efforts aim to transition the AI community toward sustainable, robust, and equitable artificial intelligence systems.

尽管取得了显著的成功,但这一模式带来了巨大的环境、经济和道德成本,限制了可持续性和公平获取。受到生物感官系统的启发,在生物感官系统的指导下,适应性动态地出现在投入(例如调整学生规模、调整视野)上,我们倡导将适应性遥感作为一项必要和基本的变化。适应性遥感积极主动地在投入一级调控传感器参数(例如接触、敏感度、多式配置),大大减轻共变变化并提高效率。从最近的研究中获得的经验证据表明,适应性感应使小型模型(例如高效网-B0)能够大大超过规模更大的模型(例如OpenCLIP-H),这些模型经过大量的数据培训,并经过大量整理。我们(一) 概述了将适应性遥感广泛纳入涵盖人类类、保健、自主系统、农业和环境监测等真实世界应用的路线图,(二) 严格评估技术和道德融合的挑战,并提高效率。从最近获得的经验证据表明,适应性感应使小型模型(例如高效网-B0)能够大大超过那些经过数据培训并广泛加以调整的模型(例如On CLIP-H)——我们提出了适应性)——我们所设计的模型。 (一)将适应性、稳健健健健健准的模型、机构、自动和人造型系统纳入这些系统。

Article 110

Title@2025-07-31 (4): Efficient Machine Unlearning via Influence Approximation

Title: Efficient Machine Unlearning via Influence Approximation

Effizientes maschinelles Lernen durch Einflussannäherung

通过 “ 影响力接近 “ 解除学习 2507.23257v1

Authors (4): Jiawei Liu, Chenwang Wu, Defu Lian, Enhong Chen

Due to growing privacy concerns, machine unlearning, which aims at enabling machine learning models to “forget” specific training data, has received increasing attention. Among existing methods, influence-based unlearning has emerged as a prominent approach due to its ability to estimate the impact of individual training samples on model parameters without retraining. However, this approach suffers from prohibitive computational overhead arising from the necessity to compute the Hessian matrix and its inverse across all training samples and parameters, rendering it impractical for large-scale models and scenarios involving frequent data deletion requests. This highlights the difficulty of forgetting. Inspired by cognitive science, which suggests that memorizing is easier than forgetting, this paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning). This connection allows machine unlearning to be addressed from the perspective of incremental learning. Unlike the time-consuming Hessian computations in unlearning (forgetting), incremental learning (memorizing) typically relies on more efficient gradient optimization, which supports the aforementioned cognitive theory. Based on this connection, we introduce the Influence Approximation Unlearning (IAU) algorithm for efficient machine unlearning from the incremental perspective. Extensive empirical evaluations demonstrate that IAU achieves a superior balance among removal guarantee, unlearning efficiency, and comparable model utility, while outperforming state-of-the-art methods across diverse datasets and model architectures. Our code is available at https://github.com/Lolo1222/IAU.

由于对隐私的日益关切,旨在让机器学习模式能够“忘记”特定培训数据的机器不学习日益受到越来越多的关注。在现有方法中,基于影响力的不学习已成为一个突出的方法,因为它能够估计单个培训样本对模型参数的影响,而无需再培训。然而,由于有必要对所有培训样本和参数进行计算,因此这种方法的计算间接费用令人望而生畏,因为需要在所有培训样本和参数中计算赫森矩阵及其反射,这使得大规模模型和涉及频繁数据删除请求的情景不切实际。这突显了忘记的困难。在认知科学的启发下,认为记忆化比忘记容易,本文在记忆化(深入学习)和忘记(不学习)之间建立了一种理论联系。这种联系使得机器不学习能够从渐进学习的角度处理。与花费时间的黑森矩阵计算方法不同,在所有培训样本和参数中进行累进式学习通常依赖于效率更高的梯度优化,从而支持上述认知理论。基于这一联系,我们引入了“影响应用模型化”而不是忘记记忆,本文建立了一种理论之间的理论联系,在记忆中,在记忆中,在记忆中,在记忆中,在记忆中,在记忆化(入入学习(入学习)和忘记(未学习)记忆学习中,从高级的系统效率分析中,在学习中,从我们的数据效率分析中,从系统学习中,从系统上学习一种可演进化的系统上,从结构上演进进进进化方法,在学方法上演进化方法中展示了一种不进进进进化方法,然后演算。
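
The Hessian-based influence estimate that this abstract describes as costly can be shown on a one-parameter least-squares model, where the Hessian is a scalar. This sketch illustrates the classical first-order influence approximation only, not the IAU algorithm; the function names and the data are assumptions made up for the example.

```python
from typing import Sequence


def fit(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Closed-form least-squares slope for the model y = theta * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def influence_unlearn(theta: float, xs: Sequence[float], ys: Sequence[float], idx: int) -> float:
    """Estimate the parameter after removing sample idx, without retraining.

    Uses theta_removed ~= theta + (1 / n) * H^{-1} * grad_idx, where H is the
    Hessian of the mean loss 0.5 * (theta * x - y)^2 and grad_idx is the
    gradient of the removed sample's loss at theta.
    """
    n = len(xs)
    hessian = sum(x * x for x in xs) / n
    grad = (theta * xs[idx] - ys[idx]) * xs[idx]
    return theta + grad / (n * hessian)


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 8.0]

theta = fit(xs, ys)
approx = influence_unlearn(theta, xs, ys, idx=0)
exact = fit(xs[1:], ys[1:])  # retrain from scratch without the first sample
print(theta, approx, exact)
```

In one dimension the Hessian inverse is a division; for a model with d parameters it is a d by d matrix, which is where the overhead mentioned in the abstract comes from. The estimate is first-order, so it approximates rather than equals the retrained parameter.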

Article 111

Title@2025-07-31 (4): ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Title: ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

ActSafe: Aktive Exploration mit Sicherheitseinschränkungen für Verstärkungslernen

Acsafe:积极探索加强学习的安全制约因素 2410.09486v3

Authors (6): Yarden As, Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Stelian Coros, Andreas Krause

Reinforcement learning (RL) is ubiquitous in the development of modern AI systems. However, state-of-the-art RL agents require extensive, and potentially unsafe, interactions with their environments to learn effectively. These limitations confine RL agents to simulated environments, hindering their ability to learn directly in real-world settings. In this work, we present ActSafe, a novel model-based RL algorithm for safe and efficient exploration. ActSafe learns a well-calibrated probabilistic model of the system and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics, while enforcing pessimism w.r.t. the safety constraints. Under regularity assumptions on the constraints and dynamics, we show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time. In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements and enables safe exploration even in high-dimensional settings such as visual control. We empirically show that ActSafe obtains state-of-the-art performance in difficult exploration tasks on standard safe deep RL benchmarks while ensuring safety during learning.

现代光学强化学习系统(RL)的发展是无处不在的。然而,最先进的光学强化学习系统(RL)需要与环境进行广泛且可能不安全的互动,才能有效地学习。这些限制将光学强化剂限制在模拟环境中,阻碍其在现实环境中直接学习的能力。在这项工作中,我们介绍了Acsafe, 一种基于模型的新颖的安全和高效探索的RL算法。Acsafe 学习了一种以模型为基础的系统及计划的精确的概率模型,并乐观地了解了未知动态的隐性不确定性,同时实施悲观主义(w.r.t.)的安全限制。在对限制和动态的常规假设下,我们表明Acsafe保证学习期间的安全,同时在有限的时间内获得接近最佳的政策。此外,我们提出了Acsafe的一个实用的变体,以最新的光学进步为基础,甚至能够在高层次环境中进行安全探索,例如视觉控制。我们的经验显示,Acsafe在深度的安全探索任务中获得了州级的学习标准。
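
The combination of optimism about reward and pessimism about safety can be sketched as a simple selection rule over candidates with uncertain estimates. This is a bandit-style simplification for intuition, not ActSafe's model-based planner: the `Estimate` record, the `select_action` function, the confidence multiplier `beta`, and all numbers are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Estimate(NamedTuple):
    """Mean and epistemic standard deviation of reward and cost for one candidate."""
    name: str
    reward_mean: float
    reward_std: float
    cost_mean: float
    cost_std: float


def select_action(
    candidates: List[Estimate], budget: float, beta: float = 2.0
) -> Optional[Estimate]:
    # Pessimism: keep only candidates whose upper confidence bound on cost fits the budget.
    safe = [c for c in candidates if c.cost_mean + beta * c.cost_std <= budget]
    if not safe:
        return None
    # Optimism: among safe candidates, pick the highest upper confidence bound on reward.
    return max(safe, key=lambda c: c.reward_mean + beta * c.reward_std)


candidates = [
    Estimate("A", reward_mean=1.0, reward_std=0.1, cost_mean=0.2, cost_std=0.05),
    Estimate("B", reward_mean=0.8, reward_std=0.5, cost_mean=0.3, cost_std=0.1),
    Estimate("C", reward_mean=2.0, reward_std=0.5, cost_mean=0.6, cost_std=0.3),
]

chosen = select_action(candidates, budget=0.6)
print(chosen.name)  # B
```

Candidate C has the highest optimistic reward but is excluded because its pessimistic cost exceeds the budget, while B is preferred over A because its reward is more uncertain and therefore more worth exploring.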

Article 112

Title@2025-07-31 (4): Evaluating LLMs’ Multilingual Capabilities for Bengali: Benchmark Creation and Performance Analysis

Title: Evaluating LLMs’ Multilingual Capabilities for Bengali: Benchmark Creation and Performance Analysis

Bewertung der Mehrsprachigkeitsfähigkeiten von LLMs für Bengalen: Benchmark-Erstellung und Leistungsanalyse

评价孟加拉多种语文能力:基准设定和业绩分析 2507.23248v1

Authors (5): Shimanto Bhowmik, Tawsif Tashwar Dipto, Md Sazzad Islam, Sheryl Hsu, Tahsin Reasat

Bengali is an underrepresented language in NLP research. However, it remains a challenge due to its unique linguistic structure and computational constraints. In this work, we systematically investigate the challenges that hinder Bengali NLP performance by focusing on the absence of standardized evaluation benchmarks. We then evaluated 10 recent open source Large Language Models (LLMs) on 8 of the translated datasets and performed a comprehensive error analysis to pinpoint their primary failure modes. Our findings reveal consistent performance gaps for Bengali compared to English, particularly for smaller models and specific model families like Mistral. We also identified promising robustness in certain architectures, such as DeepSeek, that maintain more stable performance across languages. Our analysis reveals an inverse relationship between tokenization efficiency and LLM accuracy where models tend to perform worse when inputs are excessively tokenized, whereas more efficient and concise tokenization results in improved performance. These findings highlight critical areas where current models fall short and underscore the need for improved dataset quality and evaluation methodologies tailored to multilingual contexts. This work will catalyze further research on NLP for underrepresented languages, helping to democratize access to advanced language technologies worldwide. The code and dataset used in this research are publicly available at https://github.com/BengaliAI/bn-llm-benchmark.

孟加拉语是国家语言方案研究中代表不足的语言。然而,由于语言结构和计算限制的独特性,孟加拉语是一个挑战。在这项工作中,我们系统地调查阻碍孟加拉语国家语言方案业绩的挑战,重点是缺乏标准化的评价基准。我们随后在8个翻译数据集中评估了10个最近开放源码大语言模型(LLMs),并进行了全面的错误分析,以确定其主要失败模式。我们的调查结果显示孟加拉语与英语的绩效差距始终存在,特别是米斯特拉尔等小型模型和具体模型家庭。我们还发现,在某些结构中,如DeepSeek(DeepSeek),保持不同语言更稳定的性能。我们的分析揭示了象征性效率与LLM(LM)准确性之间的反比关系。当投入过于象征性时,模型往往表现更差,而更有效率的缩写效果则导致业绩的改善。这些调查结果突出了当前模型落后的关键领域,并强调了改进数据集质量和评估方法的必要性,特别是针对多种语言背景的小型模型。这项工作将推动对NLP(NLP)进行进一步的研究,有助于全世界使用先进语言技术的民主化。我们的分析揭示了象征性的代码和数据。用于这一公开研究。
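
Tokenization efficiency can be approximated by counting how many tokens a tokenizer spends per word. The sketch below uses UTF-8 bytes as a stand-in for tokens, which is an assumption: real LLM tokenizers are subword-based and the paper's exact metric may differ. The function name `bytes_per_word` and the example sentences are illustrative.

```python
def bytes_per_word(text: str) -> float:
    """Average number of UTF-8 bytes per whitespace-separated word.

    Serves as a crude proxy for tokens per word under byte-level tokenization.
    """
    words = text.split()
    n_bytes = sum(len(word.encode("utf-8")) for word in words)
    return n_bytes / len(words)


english = "I eat rice"
bengali = "আমি ভাত খাই"  # the same sentence in Bengali

# Bengali code points take three bytes each in UTF-8, so the proxy is higher.
print(bytes_per_word(english), bytes_per_word(bengali))
```

A higher value means the same content is split into more units, which is the "excessive tokenization" regime the abstract associates with lower accuracy.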

Article 113

Title@2025-07-31 (4): GrokAlign: Geometric Characterisation and Acceleration of Grokking

Title: GrokAlign: Geometric Characterisation and Acceleration of Grokking

GrokAlign: Geometrische Charakterisierung und Beschleunigung von Grokking

Grokalign:Grokking的几何特征和加速 2506.12284v2

Authors (4): Thomas Walker, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated phenomena like delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network’s functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network’s Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimizing deep networks – a method we introduce as GrokAlign – which we show empirically to induce grokking much sooner than more conventional regularizers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. Accompanying webpage (https://thomaswalker1.github.io/blog/grokalign.html) and code (https://github.com/ThomasWalker1/grokalign).

机器学习界的一个关键挑战是理解和加速深网络的培训动态,从而导致延迟的概括化和快速的稳健性,以输入扰动,也称为grokking。先前的工作与以下现象相关:深网络从线性向特征学习制度的过渡延迟的概括化,以及网络功能几何变化的新兴稳健性,特别是深网络中所谓的线性区域使用连续片断非线性线性结构的安排。这里,我们解释如何在深网络的雅各布人中实现角化,并表明将网络的雅各布人与培训数据(以焦素相似感)相匹配,确保了在低层次的雅各布人假设下进行角化。我们的结果为利用雅各布人对深网络进行正规化提供了强烈的理论动机,我们称之为GrokAlign – 我们用经验来引导重力腐蚀的更早于更常规的规范者。此外,我们引入了半机器人调整,作为可牵引力和可解释的雅各布人校准的校正(以识别和跟踪深度网络的阶段/马基质性) 。

Article 114

Title@2025-07-31 (4): Generalized Reinforcement Learning for Retriever-Specific Query Rewriter with Unstructured Real-World Documents

Title: Generalized Reinforcement Learning for Retriever-Specific Query Rewriter with Unstructured Real-World Documents

Generalisiertes Verstärkungslernen für retriever-spezifische Abfrage-Rewriter mit unstrukturierten Real-World-Dokumenten

利用无结构的 “ 现实世界文件 “ 检索特定查询卷卷的通用强化学习 2507.23242v1

Authors (6): Sungguk Cha, DongWook Kim, Taeseung Hahn, Mintae Kim, Youngsub Han, Byoung-Ki Jeon

Retrieval-Augmented Generation (RAG) systems rely heavily on effective query formulation to unlock external knowledge, yet optimizing queries for diverse, unstructured real-world documents remains a challenge. We introduce RL-QR, a reinforcement learning framework for retriever-specific query rewriting that eliminates the need for human-annotated datasets and extends applicability to both text-only and multi-modal databases. By synthesizing scenario-question pairs and leveraging Generalized Reward Policy Optimization (GRPO), RL-QR trains query rewriters tailored to specific retrievers, enhancing retrieval performance across varied domains. Experiments on industrial in-house data demonstrate significant improvements, with $\text{RL-QR}_{\text{multi-modal}}$ achieving an 11% relative gain in NDCG@3 for multi-modal RAG and $\text{RL-QR}_{\text{lexical}}$ yielding a 9% gain for lexical retrievers. However, challenges persist with semantic and hybrid retrievers, where rewriters failed to improve performance, likely due to training misalignments. Our findings highlight RL-QR’s potential to revolutionize query optimization for RAG systems, offering a scalable, annotation-free solution for real-world retrieval tasks, while identifying avenues for further refinement in semantic retrieval contexts.

(RAG) 系统严重依赖有效的查询配置,以释放外部知识,而优化对多样化、非结构化现实世界文件的查询仍然是一项挑战。我们引入了\ textbf{RL-QR},这是一个强化的检索器特定查询重写的学习框架,它消除了对人附加说明数据集的需求,并扩展了对文本专用数据库和多模式数据库的适用性。通过对情景-问题配对并利用通用的Reward政策优化化(GROPO)、RL-QR火车针对特定检索器的查询重写者,从而提高了不同领域的检索性能。我们对工业内部数据的实验显示出了显著的改进,用$\ text{RL-QL-Qtext{Mult{Muld-modal$在NDCG@3中实现了11 相对增益,用于多模式的RAG和$text{RL-rlexli reliflical relitical },使软件回收者进一步增9。然而,对于语调化和混合检索器精炼机组的挑战依然存在着挑战依然存在着,在Rlical-rchal-rcal-rch的重新整理中,在重新整理中无法改进我们的研究,而提供一种可能的学习。
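
For readers unfamiliar with the reported metric, the following sketch computes NDCG@k with linear gain and a relative gain between two scores. It assumes the relevance list contains every relevant document for the query, so the ideal ranking can be obtained by sorting; some evaluations use an exponential gain instead. The function names and example values are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of one score over another, e.g. 0.11 for an 11% gain."""
    return (improved - baseline) / baseline


print(ndcg_at_k([1, 0, 0], 3))  # 1.0, relevant document ranked first
print(ndcg_at_k([0, 0, 1], 3))  # 0.5, relevant document ranked third
```

A rewritten query that moves the relevant document from rank three to rank one would therefore double NDCG@3 in this toy case.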

Article 115

Title@2025-07-31 (4): Accumulator-Aware Post-Training Quantization for Large Language Models

Title: Accumulator-Aware Post-Training Quantization for Large Language Models

Akkumulator-Aware-Nachschulungs-Quantisierung für große Sprachmodelle

大型语文模式培训后量化 2409.17092v2

Authors (5): Ian Colbert, Giuseppe Franco, Fabian Grob, Jinjie Zhang, Rayan Saab

When quantizing weights and activations to increasingly narrow representations, the cost of additions begins to dominate that of multiplications in multiply-accumulate (MAC) units. Recent studies show that reducing addition costs via low-precision accumulation improves throughput, power, and area across inference platforms, albeit with an increased risk of overflow. Accumulator-aware quantization research has so far only considered the quantization-aware training (QAT) paradigm, in which models are fine-tuned or trained from scratch with quantization in the loop. As models and datasets continue to grow in size, QAT techniques become increasingly expensive, which has motivated the recent surge in post-training quantization (PTQ) research. To bridge this gap, we introduce AXE, the first accumulator-aware quantization framework explicitly designed to endow PTQ algorithms with overflow avoidance guarantees. We present theoretical motivation for AXE and demonstrate its flexibility by implementing it on top of two existing algorithms: GPFQ and OPTQ. We design AXE to support multi-stage accumulation, opening the door to full datapath optimization for the first time. We evaluate AXE using recent language generation models; when quantizing Llama3 8B for a 16-bit multi-stage accumulation datapath, AXE maintains up to 98% of the FP16 perplexity, surpassing naive bit width manipulation by up to 15%.

当将权重和启动量量化到越来越狭义的表示方式时,增加的成本开始在乘积(MAC)单位的倍增成本中占据主导地位。最近的研究表明,通过低精度累积(MAC)单位的倍增成本降低增加成本会改善引算平台的吞吐量、功率和面积,尽管溢出的风险增加。累计觉察量化研究迄今只考虑量化认知培训模式(QAT)模式,其中模型从零到零进行精细调整或培训,在循环中进行量化。随着模型和数据集的不断扩大,QAT技术变得越来越昂贵,这促使最近培训后夸大化(PTQ)研究的激增。为了缩小这一差距,我们引入了AXE,即第一个累积器-觉察觉定量测试框架,其明确旨在向PTQ算法提供避免溢出保证。我们展示了AXE的理论动机,并通过在现有两种算法:GPFFQ和ALQ的顶端执行来显示其灵活性。我们设计AXE的AX技术越来越昂贵,这促使最近AX的AX-E级升级数据在18级模型上进行。

Article 116

Title@2025-07-31 (4): Achieving Deep Continual Learning via Evolution

Title: Achieving Deep Continual Learning via Evolution

Deep Continual Learning durch Evolution erreichen

通过演进实现深入不断学习 2502.06210v2

Authors (6): Aojun Lu, Junchao Ke, Chunhui Ding, Jiahao Fan, Jiancheng Lv, Yanan Sun

Deep neural networks, despite their remarkable success, remain fundamentally limited in their ability to perform Continual Learning (CL). Most current methods aim to enhance the capabilities of a single model. Inspired instead by the collective learning mechanisms of human populations, we introduce Evolving Continual Learning (ECL), a framework that maintains and evolves a diverse population of neural network models. ECL continually searches for an optimal architecture for each introduced incremental task. This tailored model is trained on the corresponding task and archived as a specialized expert, contributing to a growing collection of skills. This approach inherently resolves the core CL challenges: stability is achieved through the isolation of expert models, while plasticity is greatly enhanced by evolving unique, task-specific architectures. Experimental results demonstrate that ECL significantly outperforms state-of-the-art individual-level CL methods. By shifting the focus from individual adaptation to collective evolution, ECL presents a novel path toward AI systems capable of CL.

深神经网络尽管取得了显著的成功,但其持续学习的能力仍然受到根本的限制。虽然目前大多数方法都旨在增强单一模式的能力,但受人类集体学习机制的启发,我们引入了不断发展的连续学习(ECL),这是一个维持和发展多种神经网络模型的框架。ECL不断为每一项引入的渐进任务寻找最佳架构。这一定制模型在相应任务方面受过培训,并被归档为专门专家,有助于不断积累技能。这一方法本身就解决了核心的CL挑战:稳定是通过专家模型的孤立而实现的,而可塑性则通过不断发展的独特、特定任务的结构而大大增强。实验结果表明ECL大大超越了最新的个人水平的CL方法。通过将重点从个人适应转向集体演进,ECL提出了一条通往能够实现 CL的AI系统的新途径。

Article 117

Title@2025-07-31 (4): Enabling Few-Shot Alzheimer’s Disease Diagnosis on Tabular Biomarker Data with LLMs

Title: Enabling Few-Shot Alzheimer’s Disease Diagnosis on Tabular Biomarker Data with LLMs

Ermöglichung der weniger scharfen Alzheimer-Krankheit Diagnose auf Tabular Biomarker Daten mit LLMs

使小热阿尔茨海默氏病的疾病诊断能够用LMS在表示生物标记数据上进行 2507.23227v1

Authors (9): Sophie Kearney, Shu Yang, Zixuan Wen, Bojian Hou, Duy Duong-Tran, Tianlong Chen, Jason Moore, Marylyn Ritchie, Li Shen

Early and accurate diagnosis of Alzheimer’s disease (AD), a complex neurodegenerative disorder, requires analysis of heterogeneous biomarkers (e.g., neuroimaging, genetic risk factors, cognitive tests, and cerebrospinal fluid proteins) typically represented in a tabular format. With flexible few-shot reasoning, multimodal integration, and natural-language-based interpretability, large language models (LLMs) offer unprecedented opportunities for prediction with structured biomedical data. We propose a novel framework called TAP-GPT, Tabular Alzheimer’s Prediction GPT, that adapts TableGPT2, a multimodal tabular-specialized LLM originally developed for business intelligence tasks, for AD diagnosis using structured biomarker data with small sample sizes. Our approach constructs few-shot tabular prompts using in-context learning examples from structured biomedical data and fine-tunes TableGPT2 using the parameter-efficient QLoRA adaptation for a clinical binary classification task of AD or cognitively normal (CN). The TAP-GPT framework harnesses the powerful tabular understanding ability of TableGPT2 and the encoded prior knowledge of LLMs to outperform more advanced general-purpose LLMs and a tabular foundation model (TFM) developed for prediction tasks. To our knowledge, this is the first application of LLMs to the prediction task using tabular biomarker data, paving the way for future LLM-driven multi-agent frameworks in biomedical informatics.

对阿尔茨海默氏病(AD)这一复杂的神经退化性神经疾病(AD)的早期和准确诊断是复杂的神经退化性疾病,需要分析通常以表格形式呈现的多种生物标志(例如神经成像、遗传风险因素、认知测试和脑脊髓液蛋白),以表格形式对它进行分析。采用灵活的短片推理、多式整合和基于自然语言的可解释性,大型语言模型(LLLMS)提供了前所未有的机会,用结构化的生物医学数据进行预测。我们提议了一个称为TAP-GPT的新型框架,即Tabulal 阿尔茨海默氏氏病的预测GPT,以调整表GPT2,即最初为商业情报任务而开发的多表单专用LMM,用于使用结构化生物标志的结构性生物标志诊断诊断。我们的方法,即利用结构化生物数据学数据和细微的表型模型,将LMMS的先前知识用于更高级的预测基础。

Article 118

Title@2025-07-31 (4): Unveiling the Influence of Amplifying Language-Specific Neurons

Title: Unveiling the Influence of Amplifying Language-Specific Neurons

Enthüllen des Einflusses amplifizierender sprachspezifischer Neuronen

消除扩增语言特有新元的影响 2507.22581v2

Authors (6): Inaya Rahmanisa, Lyzander Marciano Andrylie, Mahardika Krisna Ihsani, Alfan Farizki Wicaksono, Haryo Akbarianto Wibowo, Alham Fikri Aji

Language-specific neurons in LLMs, which strongly correlate with individual languages, have been shown to influence model behavior when deactivated. However, the effect of amplifying them remains underexplored. This work investigates the effect of amplifying language-specific neurons through interventions across 18 languages, including low-resource ones, using three models primarily trained in different languages. We compare amplification factors by their effectiveness in steering to the target language using a proposed Language Steering Shift (LSS) evaluation score, then evaluate them on downstream tasks: commonsense reasoning (XCOPA, XWinograd), knowledge (Include), and translation (FLORES). The optimal amplification factors effectively steer output toward nearly all tested languages. Intervention using the optimal factor on downstream tasks improves self-language performance in some cases but generally degrades cross-language results. These findings highlight the effect of language-specific neurons in multilingual behavior, where amplification can be beneficial especially for low-resource languages, but provides limited advantage for cross-lingual transfer.
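As a rough illustration of what an amplification intervention looks like, the sketch below scales a chosen set of hidden units by a constant factor. The abstract does not specify how the neurons are selected or where the intervention is applied, so the array layout, the function name `amplify_neurons`, and the factor value are all assumptions.

```python
import numpy as np

def amplify_neurons(activations, neuron_ids, factor):
    """Return a copy of `activations` with the selected units scaled.

    Assumption: `activations` has shape (..., hidden_size) and `neuron_ids`
    lists the language-specific units identified beforehand (hypothetical).
    A factor of 0 corresponds to deactivation, a factor above 1 to amplification.
    """
    out = np.array(activations, dtype=float, copy=True)
    out[..., neuron_ids] *= factor
    return out

# Two token positions, four hidden units; units 1 and 3 are amplified by 2.
acts = np.ones((2, 4))
amplified = amplify_neurons(acts, neuron_ids=[1, 3], factor=2.0)
print(amplified)
```

In a real model this scaling would be applied inside the forward pass, for example through a hook on the relevant layer; sweeping `factor` is then what a comparison of amplification factors amounts to.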

与个别语言密切相关的LLM中语言特有神经元在LLM中被证明通过解除其作用来影响模式行为。然而,他们在扩增中的作用仍未得到充分探讨。这项工作调查了通过18种语言的干预措施扩大语言特有神经元的效果,包括使用三种主要以不同语言培训的模型,使用三种主要以不同语言培训的模型。我们通过使用拟议的语言指导转变评价分数来比较扩增因素在引导目标语言方面的效力,然后评价下游任务:常识推理(XCOPA、XWinograd)、知识(Include)和翻译(FLORES)。最佳扩增因素有效地将产出引向几乎所有经过测试的语言。在下游任务中使用这一因素的干预措施在某些情况下提高了自我语言性能,但通常会降低跨语言结果。这些调查结果突出了语言特有的神经元在多语种行为中的影响,在这些中增益特别有利于低资源语言,但为跨语言转移提供了有限的优势。

Article 119

Title@2025-07-31 (4): A Single Direction of Truth: An Observer Model’s Linear Residual Probe Exposes and Steers Contextual Hallucinations

Title: A Single Direction of Truth: An Observer Model’s Linear Residual Probe Exposes and Steers Contextual Hallucinations

Eine einzige Richtung der Wahrheit: Die linearen residualen Sonden eines Beobachtermodells zeigen und säumen kontextuelle Halluzinationen

真相的单一方向:观察模型的线性残余研究发现和脚底背景幻觉 2507.23221v1

Authors (5): Charles O’Neill, Slava Chalnev, Chi Chi Zhao, Max Kirkby, Mudith Jayasekara

Contextual hallucinations – statements unsupported by given context – remain a significant challenge in AI. We demonstrate a practical interpretability insight: a generator-agnostic observer model detects hallucinations via a single forward pass and a linear probe on its residual stream. This probe isolates a single, transferable linear direction separating hallucinated from faithful text, outperforming baselines by 5-27 points and showing robust mid-layer performance across Gemma-2 models (2B to 27B). Gradient-times-activation localises this signal to sparse, late-layer MLP activity. Critically, manipulating this direction causally steers generator hallucination rates, proving its actionability. Our results offer novel evidence of internal, low-dimensional hallucination tracking linked to specific MLP sub-circuits, exploitable for detection and mitigation. We release the 2000-example ContraTales benchmark for realistic assessment of such solutions.
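The idea of a single linear direction that both detects and steers can be sketched on synthetic data. The example below is not the paper's probe: it swaps in a difference-of-means direction for a trained linear probe, and the activations are random vectors standing in for residual-stream states. All names (`probe_dir`, `score`, `steer`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 16, 200

# Synthetic stand-in for residual-stream activations: two classes separated
# along one hidden direction (an assumption made purely for illustration).
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)
faithful = rng.normal(size=(n, dim)) - 2.0 * true_dir
hallucinated = rng.normal(size=(n, dim)) + 2.0 * true_dir

# Difference-of-means probe: a simple linear direction plus a midpoint threshold.
probe_dir = hallucinated.mean(axis=0) - faithful.mean(axis=0)
probe_dir /= np.linalg.norm(probe_dir)
threshold = 0.5 * (hallucinated.mean(axis=0) + faithful.mean(axis=0)) @ probe_dir

def score(acts):
    """Positive scores are classified as hallucinated."""
    return acts @ probe_dir - threshold

def steer(acts, alpha):
    """Shift activations along the probe direction by alpha."""
    return acts + alpha * probe_dir

correct = (score(hallucinated) > 0).sum() + (score(faithful) <= 0).sum()
accuracy = correct / (2 * n)
print(f"probe accuracy on synthetic data: {accuracy:.3f}")
```

Detection reads the projection onto `probe_dir`, while steering adds or subtracts a multiple of the same vector, which is why one direction can serve both purposes.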

环境幻觉 – – 未经特定背景支持的语句 – – 仍然是AI中的一个重大挑战。我们展示了一个实用的解释性洞察力。我们展示了一个实用的洞察力:一个发电机 – – 不可知性观察模型通过单一前方传球和对剩余流的线性探测检测检测幻觉。这个探测器分离出一个单一可转移的线性方向,将幻觉与忠实文字分离,以5-27点为超值基线,并显示Gemma-2模型(2B至27B)的稳健中层性性性能。渐进时间激活将这个信号定位为稀疏、末级 MLP 活动。关键地说,操纵这个方向会以因果关系引导产生幻觉的速度,证明其可操作性。我们的结果提供了与特定的 MLP 子电路相连的内、低维性幻觉跟踪的新证据,可用于探测和减缓。我们为现实评估这些解决方案发布了2000年倍相对映基准。

Article 120

Title@2025-07-31 (4): Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Internet of Electric Vehicles

Title: Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Internet of Electric Vehicles

Förderung generativer Künstlicher Intelligenz und großer Sprachmodelle für das Nachfrage-Side-Management mit dem Internet von Elektrofahrzeugen

利用电动车辆互联网推动产生供求方管理的人工情报和大语言模型 2501.15544v4

Authors (6): Hanwen Zhang, Ruichen Zhang, Wei Zhang, Dusit Niyato, Yonggang Wen, Chunyan Miao

Generative artificial intelligence, particularly through large language models (LLMs), is poised to transform energy optimization and demand side management (DSM) within microgrids. This paper explores the integration of LLMs into energy management, emphasizing their roles in automating the optimization of DSM strategies with Internet of electric vehicles. We investigate challenges and solutions associated with DSM and explore the new opportunities presented by leveraging LLMs. Then, we propose an innovative solution that enhances LLMs with retrieval-augmented generation for automatic problem formulation, code generation, and customizing optimization. We present a case study to demonstrate the effectiveness of our proposed solution in charging scheduling and optimization for electric vehicles, highlighting our solution’s significant advancements in energy efficiency and user adaptability. This work underscores the potential of LLMs for energy optimization and fosters a new era of intelligent DSM solutions.

人造智能,特别是通过大型语言模型(LLMS)产生人工智能,准备在微型电网内改变能源优化和需求方管理(DSM),本文件探讨将LLMS纳入能源管理,强调其在电动车辆互联网优化DSM战略方面的作用,调查与DSM有关的挑战和解决办法,探索利用LLMS带来的新机会。然后,我们提出一个创新解决办法,加强LLMS的回收生成,以便自动制定问题、代码生成和定制优化。我们提出一个案例研究,以展示我们在电动车辆收费和优化方面拟议解决办法的有效性,突出我们在能源效率和用户适应性方面取得重大进展。这项工作强调了LMMS在能源优化方面的潜力,并促进智能DSM解决方案的新时代。

Article 121

Title@2025-07-31 (4): Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders

Title: Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders

Model Directions, keine Worte: Mechanistische Themenmodelle mit Sparse Autoencodern

模型方向,非单词:使用粗态自动编码器的机械专题模型 2507.23220v1

Authors (8): Carolina Zheng, Nicolas Beltran-Velez, Sweta Karlekar, Claudia Shi, Achille Nazaret, Asif Mallik, Amir Feder, David M. Blei

Traditional topic models are effective at uncovering latent themes in large text collections. However, due to their reliance on bag-of-words representations, they struggle to capture semantically abstract features. While some neural variants use richer representations, they are similarly constrained by expressing topics as word lists, which limits their ability to articulate complex topics. We introduce Mechanistic Topic Models (MTMs), a class of topic models that operate on interpretable features learned by sparse autoencoders (SAEs). By defining topics over this semantically rich space, MTMs can reveal deeper conceptual themes with expressive feature descriptions. Moreover, uniquely among topic models, MTMs enable controllable text generation using topic-based steering vectors. To properly evaluate MTM topics against word-list-based approaches, we propose topic judge, an LLM-based pairwise comparison evaluation framework. Across five datasets, MTMs match or exceed traditional and neural baselines on coherence metrics, are consistently preferred by topic judge, and enable effective steering of LLM outputs.

传统专题模型对于在大型文本集中发现潜在主题十分有效,然而,由于它们依赖一袋字表,因此难以捕捉精度抽象特征。虽然一些神经变异体使用较丰富的表达方式,但它们同样受到以单词列表的形式表达主题的限制,这限制了它们阐述复杂专题的能力。我们引入了机械主题模型(MTMs),这是一组以稀疏自动计算器(SAEs)所学的可解释特征运作的一类专题模型。MTMs通过界定这个精度丰富的空间,可以揭示更深的概念主题,并进行表达性特征描述。此外,在专题模型中,MTMs使得能够使用基于主题的指导矢量进行可控的文本生成。为了恰当地评估基于单词列表的方法的MTM专题,我们建议采用基于LMM的双向比较评估框架。在五个数据集中,MTMs匹配或超过关于一致性指标的传统和神经基线,专题法官一贯选择,并能够有效地指导LM产出。

Article 122

Title@2025-07-31 (4): Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented Generation

Title: Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented Generation

Zero-Shot-Dokument Verständnis mit Pseudo Inhaltsverzeichnis-Geführte retrieval-Augmented Generation

使用内容引导回源回收新一代的 Pseudo 表格进行零热文档理解 2507.23217v1

Authors (6): Hyeon Seong Jeong, Sangwoo Jo, Byeong Hyun Yoon, Yoonseok Heo, Haedong Jeong, Taehoon Kim

Understanding complex multimodal documents remains challenging due to their structural inconsistencies and limited training data availability. We introduce DocsRay, a training-free document understanding system that integrates pseudo Table of Contents (TOC) generation with hierarchical Retrieval-Augmented Generation (RAG). Our approach leverages multimodal Large Language Models’ (LLMs) native capabilities to seamlessly process documents containing diverse elements such as text, images, charts, and tables without requiring specialized models or additional training. DocsRay’s framework synergistically combines three key techniques: (1) a semantic structuring module using prompt-based LLM interactions to generate a hierarchical pseudo-TOC, (2) zero-shot multimodal analysis that converts diverse document elements into unified, text-centric representations using the inherent capabilities of multimodal LLMs, and (3) an efficient two-stage hierarchical retrieval system that reduces retrieval complexity from $O(N)$ to $O(S + k_1 \cdot N_s)$. Evaluated on documents averaging 49.4 pages and 20,971 textual tokens, DocsRay reduced query latency from 3.89 to 2.12 seconds, achieving a 45% efficiency improvement. On the MMLongBench-Doc benchmark, DocsRay-Pro attains an accuracy of 64.7%, substantially surpassing previous state-of-the-art results.
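The complexity claim can be illustrated with a small two-stage search: score the $S$ section vectors first, then score only the chunks inside the top $k_1$ sections. This is a minimal sketch under assumed data structures, not DocsRay's implementation; `two_stage_retrieve`, the toy embeddings, and the dot-product scoring are hypothetical.

```python
import numpy as np

def two_stage_retrieve(query, section_vecs, chunk_vecs, section_chunks, k1=1, k2=1):
    """Coarse-to-fine retrieval over a pseudo table of contents.

    Assumptions: `section_vecs` holds one embedding per section,
    `section_chunks[s]` lists the chunk indices belonging to section s,
    and relevance is a plain dot product.
    """
    # Stage 1: compare the query with all S sections.
    sec_scores = section_vecs @ query
    top_sections = np.argsort(-sec_scores)[:k1]
    # Stage 2: compare the query only with chunks of the selected sections.
    candidate_ids = np.concatenate([section_chunks[s] for s in top_sections])
    chunk_scores = chunk_vecs[candidate_ids] @ query
    best = candidate_ids[np.argsort(-chunk_scores)[:k2]]
    comparisons = len(section_vecs) + len(candidate_ids)
    return best.tolist(), comparisons

# Toy document: 3 sections with 2 chunks each, 2-dimensional embeddings.
section_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
chunk_vecs = np.array([[1.0, 0.0], [0.9, 0.1],
                       [0.0, 1.0], [0.1, 0.9],
                       [-1.0, 0.0], [-0.9, -0.1]])
section_chunks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

hits, comparisons = two_stage_retrieve(np.array([1.0, 0.2]),
                                       section_vecs, chunk_vecs, section_chunks)
print(hits, comparisons)
```

Here the search touches 3 sections plus 2 chunks instead of all 6 chunks; the saving grows with document length as long as sections stay small relative to the whole document.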

理解复杂的多式联运文件仍因其结构上的不一致和有限的培训数据提供而具有挑战性。我们引入了一个无培训文件理解系统,将假目录(TOC)的生成与等级回收-启动一代(RAG)相结合。我们的方法利用多式联运大语言模型(LLLM)的本地能力,无缝处理包含各种要素的文件,如文本、图像、图表和表格,而不需要专门模型或额外培训。 DocsRay的框架协同结合了三种关键技术:(1) 使用基于迅速的LLLM互动的语义结构模块,以产生等级化伪技术;(2) 零速多式联运分析,利用多式联运LMM的固有能力,将不同文件要素转换为统一的、以文字为中心的表述;(3) 高效的两阶段级检索系统,将检索复杂性从$(N)降低到$(S+k_1)\cdddd_s。 DocsRay框架从494页和20,971个文本符号评估文件,从3.89-rental lexal-legendalal-legleglemental lementalmental-deal-dealmental-matitual legis pal legis pal lementalmentmentalmentaltitudeal.3.89=2.8,大幅降低45-lemental-lemental-lemental-legy-legy-lemental-lemental-lemental-lemental-lementaltialtial-lemental-lemental-lementaltimental)。

Article 123

Title@2025-07-31 (4): Not Just What, But When: Integrating Irregular Intervals to LLM for Sequential Recommendation

Title: Not Just What, But When: Integrating Irregular Intervals to LLM for Sequential Recommendation

Nicht nur was, aber wann: Integrieren unregelmäßiger Intervalle in LLM für sequentielle Empfehlung

不仅只是什么,但是当: 将非正常的间联者纳入LLM, 以便按顺序提出建议 2507.23209v1

Authors (3): Wei-Wei Du, Takuma Udagawa, Kei Tateno

Time intervals between purchased items are a crucial factor in sequential recommendation tasks, yet existing approaches focus on item sequences and often overlook these intervals by assuming they are static. However, dynamic intervals serve as a dimension for user profiling, describing not only the history within a single user but also differences among users with the same item history. In this work, we propose IntervalLLM, a novel framework that integrates interval information into LLM and incorporates a novel interval-infused attention mechanism to jointly consider item and interval information. Furthermore, unlike prior studies that address the cold-start scenario only from the perspectives of users and items, we introduce a new viewpoint: the interval perspective, which serves as an additional metric for evaluating recommendation methods on the warm and cold scenarios. Extensive experiments on 3 benchmarks with both traditional- and LLM-based baselines demonstrate that our IntervalLLM achieves not only a 4.4% average improvement but also the best performance in warm and cold scenarios across the user, item, and proposed interval perspectives. In addition, we observe that the cold scenario from the interval perspective experiences the most significant performance drop among all recommendation methods. This finding underscores the necessity of further research on interval-based cold challenges and our integration of interval information in the realm of sequential recommendation tasks. Our code is available here: https://github.com/sony/ds-research-code/tree/master/recsys25-IntervalLLM.

采购项目之间的时间间隔是按顺序提出建议任务的一个关键因素,而现有办法侧重于项目顺序,往往忽略了项目之间的间隔,而现有办法则以项目之间的间隔为重点,这种间隔是静止的;然而,动态间隔是一个描述用户不仅对用户内部历史的描述,而且对具有相同项目历史的不同用户进行用户特征描述的层面;在这项工作中,我们提议了IntervalLLLM,这是一个新颖的框架,将间隔信息纳入LLM,并纳入新的间隔关注,以共同审议项目和间隔信息。此外,与以前从用户和项目的角度处理冷开始情况的研究不同,我们引入了一个新观点:作为评估暖和冷情况建议方法的附加衡量尺度的间隔视角。对传统和LLMM基线的三项基准进行的广泛实验表明,我们的IntervalLLLM不仅平均改善了4.4%的改进,而且所有用户、项目和拟议间隔观点都采用了最佳的暖和冷情况。此外,我们观察到,从间隔角度看冷情况时,所有建议方法中都出现了最显著的绩效下降。这一结论强调了进一步研究的必要性:基于间/Mreal规则,我们现有的冷层/直观。

Article 124

Title@2025-07-31 (4): Are Recommenders Self-Aware? Label-Free Recommendation Performance Estimation via Model Uncertainty

Title: Are Recommenders Self-Aware? Label-Free Recommendation Performance Estimation via Model Uncertainty

Sind Recommenders Self-Aware? Label-freie Empfehlung Leistungsschätzung über Modellunsicherheit

推荐人是否自觉?通过模型不确定性对无标签建议绩效的估算 2507.23208v1

Authors (7): Jiayu Li, Ziyi Ye, Guohao Jian, Zhiqiang Guo, Weizhi Ma, Qingyao Ai, Min Zhang

Can a recommendation model be self-aware? This paper investigates the recommender’s self-awareness by quantifying its uncertainty, which provides a label-free estimation of its performance. Such self-assessment can enable more informed understanding and decision-making before the recommender engages with any users. To this end, we propose an intuitive and effective method, probability-based List Distribution uncertainty (LiDu). LiDu measures uncertainty by determining the probability that a recommender will generate a certain ranking list based on the prediction distributions of individual items. We validate LiDu’s ability to represent model self-awareness in two settings: (1) with a matrix factorization model on a synthetic dataset, and (2) with popular recommendation algorithms on real-world datasets. Experimental results show that LiDu is more correlated with recommendation performance than a series of label-free performance estimators. Additionally, LiDu provides valuable insights into the dynamic inner states of models throughout training and inference. This work establishes an empirical connection between recommendation uncertainty and performance, framing it as a step towards more transparent and self-evaluating recommender systems.

建议模式是否具有自我意识? 本文通过量化其不确定性来调查推荐人的自我意识,提供无标签的性能估计。这种自评可以使推荐者在与任何用户接触之前更知情地理解和决策。为此,我们提出一种直观和有效的方法,即基于概率的列表分布不确定性(LiDu)。 LiDu通过确定推荐者根据个别项目的预测分布生成某种排序列表的可能性来衡量不确定性。我们验证了李杜在两种情况下代表模型自我意识的能力:(1) 合成数据集的矩阵化因子模型,(2) 真实世界数据集的流行建议算法。实验结果表明,李杜比一系列无标签的性能估计器更符合建议性能。此外,LiDu在整个培训和推理过程中对模型动态的内部状态提供了宝贵的洞察力。这项工作在建议不确定性和性能之间建立了经验联系,将建议性能描述为更加透明和自我评价建议系统的一个步骤。

Article 125

Title@2025-07-31 (4): Adapt before Continual Learning

Title: Adapt before Continual Learning

Anpassung vor dem kontinuierlichen Lernen

在持续学习前适应 2506.03956v3

Authors (5): Aojun Lu, Tao Feng, Hangjie Yuan, Chunhui Ding, Yanan Sun

Continual Learning (CL) seeks to enable neural networks to incrementally acquire new knowledge (plasticity) while retaining existing knowledge (stability). Although pre-trained models (PTMs) have provided a strong foundation for CL, existing approaches face a fundamental challenge in balancing these two competing objectives. Current methods typically address stability by freezing the PTM backbone, which severely limits the model’s plasticity, particularly when the incoming data distribution diverges substantially from the pre-training data. Alternatively, sequentially fine-tuning the entire PTM can adapt to new knowledge but often leads to catastrophic forgetting, highlighting the critical stability-plasticity trade-off in PTM-based CL. To address this limitation, we propose Adapting PTMs before the core CL process (ACL), a novel framework that introduces a plug-and-play adaptation phase prior to learning each new task. During this phase, ACL refines the PTM backbone by aligning embeddings with their original class prototypes while distancing them from irrelevant classes. This mechanism is shown, theoretically and empirically, to achieve a desirable balance between stability and plasticity, significantly improving CL performance across benchmarks and integrated methods. Code is available at https://github.com/byyx666/ACL_code.

持续学习(CL)力求使神经网络在保留现有知识(稳定性)的同时逐步获得新的知识(塑料),同时使神经网络能够逐步获得新的知识(塑料)。尽管预先培训的模型为CL提供了坚实的基础,但现有方法在平衡这两个相互竞争的目标方面面临着根本性的挑战。目前的方法通常通过冻结PTM主干柱来解决稳定性问题,这严重限制了模型的可塑性,特别是当输入的数据的可塑性与培训前的数据分配大不相同时。或者,对整个PTM进行顺序的微调,可以适应新的知识,但往往导致灾难性的遗忘,突出基于PTM的CL的关键稳定性和可塑性交易。为了解决这一限制,我们提议在核心CL}进程(ACL)之前调整PTM,这是一个新的框架,在学习每一项新任务之前引入插接和播放的适应阶段。在这一阶段,ACL通过将原始的原型与不相干的方式结合起来来完善PTM的骨干。这一机制在理论上和经验上表明稳定性和塑料性之间的适当平衡,大大改进了CLUB/Lcomcol_Codecodection http://http://http://http://LgyLgy/Cocol/CLgyrcodeol。

Article 126

Title@2025-07-31 (4): InfAlign: Inference-aware language model alignment

Title: InfAlign: Inference-aware language model alignment

InfAlign: Inference-aware Sprachmodellausrichtung

Infagign: 参考意识语言模型对齐 2412.19792v4

Authors (12): Ananth Balashankar, Ziteng Sun, Jonathan Berant, Jacob Eisenstein, Michael Collins, Adrian Hutter, Jong Lee, Chirag Nagpal, Flavien Prost, Aradhana Sinha, Ananda Theertha Suresh, Ahmad Beirami

Language model alignment is a critical step in training modern generative language models. Alignment aims to improve the win rate of a sample from the aligned model against the base model. Today, we are increasingly using inference-time algorithms (e.g., Best-of-N, controlled decoding, tree search) to decode from language models rather than standard sampling. We show that this train/test mismatch makes the standard RLHF framework sub-optimal in view of such inference-time methods. To this end, we propose a framework for inference-aware alignment (InfAlign), which aims to optimize the inference-time win rate of the aligned policy against the base model. We prove that for any inference-time decoding procedure, the optimal aligned policy is the solution to the standard RLHF problem with a transformation of the reward. This motivates us to provide the calibrate-and-transform RL (InfAlign-CTRL) algorithm to solve this problem, which involves a reward calibration step and a KL-regularized reward maximization step with a transformation of the calibrated reward. For best-of-N sampling and best-of-N jailbreaking, we propose specific transformations offering up to 3-8% improvement on inference-time win rates. Finally, we also show that our proposed reward calibration method is a strong baseline for optimizing standard win rate.
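Two quantities in this abstract can be made concrete with a small sketch: reward calibration, taken here to mean mapping a reward to its quantile under base-policy samples, and the win rate of best-of-N sampling against a single base sample. This is not the InfAlign-CTRL algorithm; the function names are hypothetical, and the simulation assumes i.i.d. continuous rewards, for which the win rate of best-of-N is $N/(N+1)$ by symmetry.

```python
import numpy as np

def calibrate(reward, base_rewards):
    """Empirical quantile of `reward` among base-policy rewards.

    Assumption: calibration is read as 'fraction of base samples whose
    reward does not exceed this one', giving a value in [0, 1].
    """
    return float(np.mean(np.asarray(base_rewards) <= reward))

def exact_best_of_n_win_rate(n):
    """Win rate of the best of n i.i.d. samples against one more i.i.d. sample."""
    return n / (n + 1)

def simulate_best_of_n_win_rate(n, trials=20000, seed=0):
    """Monte Carlo check with Gaussian rewards standing in for a reward model."""
    rng = np.random.default_rng(seed)
    best = rng.normal(size=(trials, n)).max(axis=1)
    single = rng.normal(size=trials)
    return float(np.mean(best > single))

print(calibrate(2.0, [1.0, 2.0, 3.0, 4.0]))
print(exact_best_of_n_win_rate(4), simulate_best_of_n_win_rate(4))
```

Because best-of-N already shifts the win rate of the unaligned base policy this much, an alignment objective that ignores the decoding procedure is optimizing a different quantity from the one observed at inference time.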

语言模型的匹配是培训现代基因化语言模型的关键步骤。匹配目标的目标是提高参照基准模型的匹配模式样本的赢率。今天,我们越来越多地使用推算时间算法(例如最佳计算方法、受控解码、树搜索)从语言模型解码,而不是标准抽样。我们显示,这种火车/测试错配使标准的RLHF框架在这种推论时间方法方面达到亚最佳标准RLHF框架。为此,我们提议了一个推算-觉一致(InfAllign)框架框架框架框架,目的是根据基准模型优化一致政策中的推论时间赢率。我们证明,对于任何推断-时间解码程序,最佳调整政策是解决标准RLHF问题的办法,而不是标准抽样。这激励我们提供校准RL(InfAlign-CTRL)框架的校准和变校准方法来解决这一问题,这需要一个奖赏性校准步骤和KL- 定期奖赏步骤,目的是对基准模型进行最优化的修改。最佳的校准率是最终的校准率。显示我们最终的校准标准的校准率。

Article 127

Title@2025-07-31 (4): Learning 3D Scene Analogies with Neural Contextual Scene Maps

Title: Learning 3D Scene Analogies with Neural Contextual Scene Maps

3D-Szenen-Analogien mit neuralen Kontext-Szenenkarten lernen

学习 3D 与天体背景场景地图的场景模拟 2503.15897v2

Authors (4): Junho Kim, Gwangtak Bae, Eun Sun Lee, Young Min Kim

Understanding scene contexts is crucial for machines to perform tasks and adapt prior knowledge in unseen or noisy 3D environments. Because it is intractable for data-driven learning to comprehensively capture the diverse range of layouts and open spaces, we propose teaching machines to identify relational commonalities in 3D spaces. Instead of focusing on point-wise or object-wise representations, we introduce 3D scene analogies, which are smooth maps between 3D scene regions that align spatial relationships. Unlike well-studied single instance-level maps, these scene-level maps smoothly link large scene regions, potentially enabling unique applications in trajectory transfer in AR/VR, long demonstration transfer for imitation learning, and context-aware object rearrangement. To find 3D scene analogies, we propose neural contextual scene maps, which extract descriptor fields summarizing semantic and geometric contexts, and holistically align them in a coarse-to-fine manner for map estimation. This approach reduces reliance on individual feature points, making it robust to input noise or shape variations. Experiments demonstrate the effectiveness of our approach in identifying scene analogies and transferring trajectories or object placements in diverse indoor scenes, indicating its potential for robotics and AR/VR applications. Project page including the code is available through this link: https://82magnolia.github.io/3d_scene_analogies/.

由于数据驱动的学习难以全面包罗各种布局和开放空间,因此我们建议使用教学机器来识别3D空间中的关联共性。我们建议使用3D现场模拟,而不是侧重于点或对象的表达方式,而是将3D现场模拟方法引入3D现场模拟方法,即3D现场区域之间的平滑地图,使空间关系相互一致。与经过仔细研究的单一实例级地图不同,这些场景级地图顺利地连接了大场景区域,有可能在AR/VR的轨迹传输中实现独特的应用,为模拟学习和背景觉悟天体重新布局进行长期演示传输。为了找到 3D 场景模拟,我们建议使用神经背景场景图,以提取描述语义和几何背景的描述字段,并用粗略到松动的方式将它们统一起来,用于估算地图。这种方法减少了对单个地貌点的依赖,使其对输入的噪音或形状变异性具有很强的强度。实验表明我们在识别场景模拟和转移轨迹轨迹轨迹或对象位置定位方面的做法的有效性,包括不同室内镜框中的ARV82/3/Pro图。

Article 128

Title@2025-07-31 (4): Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Title: Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Geak: Einführung von Triton Kernel AI Agent & Evaluation Benchmarks

Geak:介绍Triton Kernel AI 代理和评估基准 2507.23194v1

Authors (10): Jianghui Wang, Vinay Joshi, Saptarshi Majumder, Xu Chao, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, Emad Barsoum

The demand for AI-generated GPU kernels is rapidly growing, influenced by the need for scalable, hardware-optimized solutions in both industry and academia. As deep learning workloads grow in complexity and diversity, it is imperative to automate low-level kernel development to meet performance and productivity demands. Major cloud providers, semiconductor companies, and research institutions are now investing heavily in AI-driven code generation for GPUs, aiming to reduce manual optimization efforts while achieving near-expert performance on hardware like AMD MI300X. The Triton language, a Python-based DSL for GPU programming, has emerged as a popular target for such AI-generated kernels due to its balance of performance and ease-of-coding. In this work, we present an evaluation suite for Triton-based GPU kernels and GEAK (Generating Efficient AI-centric GPU Kernels), a framework that leverages cutting-edge LLMs to generate performant Triton code specifically for AMD GPUs, including the AMD MI300X and MI250. GEAK leverages inference-time compute scaling to produce Triton-based GPU kernels using a reasoning loop adapted from Reflexion-style feedback mechanisms. On two evaluation benchmarks, GEAK significantly outperformed the baselines of directly prompting frontier LLMs as well as Reflexion-based generation pipelines, achieving correctness of up to 63% and an execution speedup of up to 2.59x. These results highlight the promise of GEAK-like agentic code generation for accelerating the adoption of diverse hardware platforms and democratizing access to expert-level kernel performance.

对AI 生成的 GPU 内核的需求正在迅速增长,这受到产业和学术界对可升级、硬件优化解决方案的需求的影响。随着深层次学习工作量在复杂和多样性方面不断增加,必须使低层内核开发自动化,以满足业绩和生产力需求。主要云源提供商、半导体公司和研究机构目前正在对GPU的AI驱动代码生成进行大量投资,目的是减少手工优化努力,同时在AMD MI300X等硬件上实现近距离专家业绩。Triton语言是用于GPU编程的基于Python的快速加速、硬件优化解决方案。Triton语言是用于GPUNB内核的多功能目标,因为其业绩平衡和生成方便性能。在这项工作中,我们为基于Triton的 GPUP内核内核和Gech (Generage 高效的 AI-cent GPUN) 生成了一套框架,该框架利用尖端LMM(包括AM MI300X 和 MI250) 的快速性硬化硬化操作码,具体用于AUD内核化的硬化的硬化操作。

Article 129

Title@2025-07-31 (4): G-Core: A Simple, Scalable and Balanced RLHF Trainer

Title: G-Core: A Simple, Scalable and Balanced RLHF Trainer

G-Core: Ein einfacher, skalierbarer und ausbalancierter RLHF-Trainer

G-Core: 简单、可缩放和平衡的RLHF培训员 2507.22789v2

Authors (11): Junyu Wu, Weiming Chang, Xiaotao Liu, Guanyou He, Haoqiang Hong, Boqi Liu, Hongtao Tian, Tao Yang, Yunsheng Shi, Feng Lin, Ting Yao

Reinforcement Learning from Human Feedback (RLHF) has become an increasingly popular paradigm for training large language models (LLMs) and diffusion models. While existing RLHF training systems have enabled significant progress, they often face challenges in scaling to multi-modal and diffusion workflows and adapting to dynamic workloads. In particular, current approaches may encounter limitations in controller scalability, flexible resource placement, and efficient orchestration when handling complex RLHF pipelines, especially in scenarios involving dynamic sampling or generative reward modeling. In this paper, we present G-Core, a simple, scalable, and balanced RLHF training framework designed to address these challenges. G-Core introduces a parallel controller programming model, enabling flexible and efficient orchestration of complex RLHF workflows without the bottlenecks of a single centralized controller. Furthermore, we propose a dynamic placement schema that adaptively partitions resources and schedules workloads, significantly reducing hardware idle time and improving utilization, even under highly variable training conditions. G-Core has successfully trained models that support WeChat product features serving a large-scale user base, demonstrating its effectiveness and robustness in real-world scenarios. Our results show that G-Core advances the state of the art in RLHF training, providing a solid foundation for future research and deployment of large-scale, human-aligned models.

在培训大型语言模式和传播模式方面,从人类反馈中学习(RLHF)已成为日益流行的范例;虽然现有的RLHF培训系统使得取得显著进展,但它们在向多模式和传播工作流程扩展和适应动态工作量方面往往面临挑战;特别是,目前的方法在处理复杂的RLHF管道时,尤其是在动态抽样或基因化奖励模型的情景中,可能会遇到控制器可扩缩性、灵活资源配置和高效调控方面的限制,特别是在处理复杂的RLHF管道时,特别是在处理动态抽样或基因化奖励模型的情景中;在本文件中,我们提出了旨在应对这些挑战的简单、可扩缩和平衡的RLHF培训框架;G-Core引入了平行的调度规划模式,使复杂的RLHF工作流程能够灵活和高效地协调,而没有单一的集中控制器的瓶颈;此外,我们提议了一个动态的布置方案,即适应性地分配资源和工作量,大大减少硬件闲置时间并改进利用;即使在高度变化的培训条件下,G-Core成功地培训了支持WHE产品特性的模型,为大型用户基础提供大规模用户基础,展示全球战略基础的大规模部署成果。

Article 130

Title@2025-07-31 (4): NaN-Propagation: A Novel Method for Sparsity Detection in Black-Box Computational Functions

Title: NaN-Propagation: A Novel Method for Sparsity Detection in Black-Box Computational Functions

NaN-Propagation: Eine neuartige Methode zur Erkennung von Sparsität in Black-Box Computational Functions

NaN- propagation: 在黑箱计算函数中检测分数的新颖方法 2507.23186v1

Authors (1): Peter Sharpe

Sparsity detection in black-box functions enables significant computational speedups in gradient-based optimization through Jacobian compression, but existing finite-difference methods suffer from false negatives due to coincidental zero gradients. These false negatives can silently corrupt gradient calculations, leading to difficult-to-diagnose errors. We introduce NaN-propagation, which exploits the universal contamination property of IEEE 754 Not-a-Number floating-point values to trace input-output dependencies through floating-point numerical computations. By systematically contaminating inputs with NaN and observing which outputs become NaN, the method reconstructs conservative sparsity patterns that eliminate false negatives. We demonstrate the approach on an aerospace wing weight model, achieving a 1.52x speedup while detecting dozens of dependencies missed by conventional methods – a significant improvement since gradient computation is the bottleneck in many optimization workflows. The technique leverages IEEE 754 compliance to work across programming languages and math libraries without modifying existing black-box codes. Advanced strategies including NaN payload encoding enable faster-than-linear time complexity, improving upon existing black-box sparsity detection methods. Practical algorithms are also proposed to mitigate challenges from branching code execution common in engineering applications.
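The core mechanism can be sketched in a few lines: set one input at a time to NaN and record which outputs become NaN. The example below is a minimal illustration, not the author's implementation; `nan_sparsity` and the toy `black_box` function are hypothetical, and the sketch ignores the payload-encoding and branching strategies mentioned in the abstract.

```python
import numpy as np

def nan_sparsity(f, x0):
    """Boolean Jacobian sparsity pattern, shape (n_outputs, n_inputs).

    Entry [i, j] is True when contaminating input j with NaN makes output i NaN.
    Assumption: f is IEEE 754 compliant and has no NaN-swallowing branches.
    """
    x0 = np.asarray(x0, dtype=float)
    n_out = np.asarray(f(x0)).size
    pattern = np.zeros((n_out, x0.size), dtype=bool)
    for j in range(x0.size):
        x = x0.copy()
        x[j] = np.nan  # contaminate a single input
        with np.errstate(invalid="ignore"):
            y = np.asarray(f(x), dtype=float).ravel()
        pattern[:, j] = np.isnan(y)
    return pattern

def black_box(x):
    """Toy function treated as opaque by the detector."""
    return np.array([x[0] * x[1], np.sin(x[2]), x[0] + 0.0 * x[2]])

# At this point d(x0*x1)/dx0 equals x1 = 0, so a finite-difference probe
# would see no change in output 0 and wrongly report no dependence on x0.
pattern = nan_sparsity(black_box, [1.0, 0.0, 2.0])
print(pattern.astype(int))
```

Output 0 is correctly flagged as depending on inputs 0 and 1 despite the coincidental zero gradient. The term `0.0 * x[2]` is also flagged, which shows the conservative side of the method: it can report a dependency whose derivative is identically zero, but it does not miss one.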

黑盒功能的分辨使通过 Jacobian 压缩,在基于梯度优化的精度优化中实现大量计算加速,但现有的有限差异方法由于时空零梯度而出现虚假的负差。这些虚假的负差可以静悄悄地腐蚀梯度计算,导致难以辨别错误。我们引入了NaN-propagation, 利用IEEE 754 Not-a- exple点值的普遍污染属性,通过浮动点数字计算,追踪输入-输出依赖性。通过系统污染与纳纳NN的输入,观察产出成为纳NNN,该方法重建了消除虚假负差的保守的松散模式。我们展示了在航空航天翼重量模型上采用的方法,实现了1.52x的加速,同时检测了被传统方法忽略的数十种依赖性 – – 这是一大进步,因为梯度计算是许多优化工作流程中的瓶颈。技术将IEEEEEE- 754在不修改现有黑箱代码的情况下,使所有方案库库库库库的合规性工作得以遵守。包括纳N 有效编码编码能够更快地实现超线时间的复杂程度,同时改进了现有的黑箱操作。
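The NaN-contamination idea in the abstract above can be illustrated with a minimal sketch. It is not the author's implementation: `nan_sparsity`, `fd_sparsity`, and `black_box` are hypothetical names, it uses the plain one-input-at-a-time strategy (none of the payload-encoding or branching-aware variants), and it assumes the black box is a NumPy-compatible function that follows IEEE 754 NaN semantics.

```python
import numpy as np

def nan_sparsity(f, x0):
    """Mark entry (i, j) True when setting input j to NaN turns output i into NaN."""
    x0 = np.asarray(x0, dtype=float)
    n_out = np.asarray(f(x0)).size
    pattern = np.zeros((n_out, x0.size), dtype=bool)
    for j in range(x0.size):
        x = x0.copy()
        x[j] = np.nan                      # contaminate exactly one input
        pattern[:, j] = np.isnan(np.asarray(f(x), dtype=float))
    return pattern

def fd_sparsity(f, x0, h=1e-6):
    """Finite-difference baseline: mark (i, j) True when a small step in input j changes output i."""
    x0 = np.asarray(x0, dtype=float)
    y0 = np.asarray(f(x0), dtype=float)
    pattern = np.zeros((y0.size, x0.size), dtype=bool)
    for j in range(x0.size):
        x = x0.copy()
        x[j] += h
        pattern[:, j] = np.asarray(f(x), dtype=float) != y0
    return pattern

def black_box(x):
    # Hypothetical test function: output 0 depends on x[0] and x[1], output 1 on x[2].
    return np.array([x[0] * x[1], np.sin(x[2])])

x0 = [1.0, 0.0, 2.0]   # x[1] == 0 makes d(out0)/d(x[0]) coincidentally zero here
nan_pattern = nan_sparsity(black_box, x0)
fd_pattern = fd_sparsity(black_box, x0)
```

At this evaluation point the finite-difference baseline misses the dependence of output 0 on `x[0]`, because the local gradient happens to be zero. The NaN probe still flags it, since `nan * 0.0` is NaN.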

Article 131

Title@2025-07-31 (4): H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity

Title: H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity

H2Tune: Federated Foundation Model Feintuning mit Hybrid Heterogenität

H2Tune: 联邦基金会混合异质性混合调整示范 2507.22633v2

Authors (8): Wei Guo, Siyuan Lu, Yiqi Tong, Zhaojun Hu, Fuzhen Zhuang, Xiao Zhang, Tao Fan, Jin Dong

Unlike existing federated fine-tuning (FFT) methods for foundation models, hybrid heterogeneous federated fine-tuning (HHFFT) is an under-explored scenario in which clients exhibit double heterogeneity in model architectures and downstream tasks. This hybrid heterogeneity introduces two significant challenges: 1) heterogeneous matrix aggregation, where clients adopt different large-scale foundation models based on their task requirements and resource limitations, leading to dimensional mismatches during LoRA parameter aggregation; and 2) multi-task knowledge interference, where local shared parameters, trained with both task-shared and task-specific knowledge, cannot ensure that only task-shared knowledge is transferred between clients. To address these challenges, we propose H2Tune, a federated fine-tuning method for foundation models under hybrid heterogeneity. H2Tune consists of three key components: (i) sparsified triple matrix decomposition, which aligns hidden dimensions across clients by constructing rank-consistent middle matrices, with adaptive sparsification based on client resources; (ii) relation-guided matrix layer alignment, which handles heterogeneous layer structures and representation capabilities; and (iii) an alternating task-knowledge disentanglement mechanism, which decouples shared and specific knowledge in local model parameters through alternating optimization. Theoretical analysis proves a convergence rate of $O(1/\sqrt{T})$. Extensive experiments show that our method achieves up to a 15.4% accuracy improvement over state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/H2Tune-1407.

与现有的基础模型联邦化微调(FFT)方法不同,混合混合混合混合混合联合联合联合联合调整(HHFFT)是一种探索不足的情景,客户在模型架构和下游任务中表现出双重异质性。这种混合异质性提出了两大挑战:1)混合矩阵汇总,客户根据任务要求和资源限制采用不同的大型基础模型,导致LORA参数汇总期间的维度不匹配;2)多任务知识干扰,当地共享参数,经过任务分担和任务特定知识的培训,无法确保客户之间仅共享任务知识。4 为了应对这些挑战,我们提议H2Tune, 一种混合异质性基模型的微调。我们的 H2Tune框架由三个关键组成部分组成:(一) 通过建立分级和分级的中间矩阵,使客户之间隐藏的维度不相匹配,基于客户资源的适应性弥漫度;(二) 将矩阵结构与处理的兼容性层次结构及代表能力挂钩;以及(三) 将任务周期性基准值的精度比度分析,通过具体指标性分析,将任务级化方法转化为特定的深度分析。

Article 132

Title@2025-07-31 (4): MolPIF: A Parameter Interpolation Flow Model for Molecule Generation

Title: MolPIF: A Parameter Interpolation Flow Model for Molecule Generation

MolPIF: Ein Parameter Interpolationsflussmodell für die Molekülerzeugung

MoLPIF: 分子一代的参数内插流动模型 2507.13762v3

Authors (13): Yaowei Jin, Junjie Wang, Wenkai Xiang, Duanhua Cao, Dan Teng, Zhehuan Fan, Jiacheng Xiong, Xia Sheng, Chuanlong Zeng, Duo An, Mingyue Zheng, Shuangjia Zheng, Qian Shi

Advances in deep learning for molecular generation show promise in accelerating drug discovery. Bayesian Flow Networks (BFNs) have recently shown impressive performance across diverse chemical tasks, with their success often ascribed to the paradigm of modeling in a low-variance parameter space. However, the Bayesian inference-based strategy imposes limitations on designing more flexible distribution transformation pathways, making it challenging to adapt to diverse data distributions and varied task requirements. Furthermore, the potential for simpler, more efficient parameter-space-based models is unexplored. To address this, we propose a novel Parameter Interpolation Flow model (named PIF) with detailed theoretical foundation, training, and inference procedures. We then develop MolPIF for structure-based drug design, demonstrating its superior performance across diverse metrics compared to baselines. This work validates the effectiveness of parameter-space-based generative modeling paradigm for molecules and offers new perspectives for model design.

分子生产深层学习的进展显示了加速发现毒品的希望。贝叶斯流动网络(BFNs)最近展示了各种化学任务令人印象深刻的绩效,其成功往往归功于低变量参数空间的建模模式,然而,基于贝叶斯推论的战略对设计更灵活的分布转换路径施加了限制,使其难以适应不同数据分布和不同任务要求。此外,建立更简单、更高效的参数-空间模型的潜力尚未开发出来。为了解决这一问题,我们提出了具有详细理论基础、培训和推断程序的新型 Indigulation流程模型(称为PIF)。我们随后为基于结构的药物设计开发了MolPIF,展示了它相对于基线的不同指标的优异性业绩。这项工作验证了基于参数-空间的分子基因模型模式的有效性,并为模型设计提供了新的视角。

Article 133

Title@2025-07-31 (4): Entanglement-induced provable and robust quantum learning advantages

Title: Entanglement-induced provable and robust quantum learning advantages

Verflechtung-induzierte nachweisbare und robuste Vorteile des Quantenlernens

纠缠引发的可证实和稳健量量的学习优势 2410.03094v2

Authors (2): Haimeng Zhao, Dong-Ling Deng

Quantum computing holds unparalleled potentials to enhance machine learning. However, a demonstration of quantum learning advantage has not been achieved so far. We make a step forward by rigorously establishing a noise-robust, unconditional quantum learning advantage in expressivity, inference speed, and training efficiency, compared to commonly-used classical models. Our proof is information-theoretic and pinpoints the origin of this advantage: entanglement can be used to reduce the communication required by non-local tasks. In particular, we design a task that can be solved with certainty by quantum models with a constant number of parameters using entanglement, whereas commonly-used classical models must scale linearly to achieve a larger-than-exponentially-small accuracy. We show that the quantum model is trainable with constant resources and robust against constant noise. Through numerical and trapped-ion experiments on IonQ Aria, we demonstrate the desired advantage. Our results provide valuable guidance for demonstrating quantum learning advantages with current noisy intermediate-scale devices.

量子计算具有增强机器学习的无与伦比的潜力。然而, 量子学习优势的示范至今尚未实现。我们向前迈出了一步, 严格地建立与常用古典模型相比, 在表达性、推断速度和培训效率方面, 无条件的量子学习优势。我们的证明是信息理论, 并确定了这一优势的起源: 纠缠可以用来减少非本地任务所需要的通信。特别是, 我们设计了一项任务, 可以通过量子模型, 以恒定的参数数来明确解决, 使用缠绕, 而常用的古典模型必须直线缩缩, 以达到大于极小的精确度。我们显示, 量子模型可以用恒定的资源来训练, 并且能够抵御恒定的噪音。我们通过在 Ion Q Aria 上进行数字和困住实验, 展示了预期的优势。我们的成果提供了宝贵的指导, 以显示当前噪噪的中间级装置的量子学习优势。

Article 134

Title@2025-07-31 (4): A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and Challenges

Title: A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and Challenges

Eine umfassende Überprüfung von Diffusionsmodellen in der intelligenten Landwirtschaft: Fortschritt, Anwendungen und Herausforderungen

全面审查 “ 智能农业传播模式:进展、应用和挑战 “ 2507.18376v2

Authors (9): Xing Hu, Haodong Chen, Qianqian Duan, Danfeng Hong, Ruijiao Li, Huiliang Shang, Linghua Jiang, Haima Yang, Dawei Zhang

With the global population growing and arable land resources becoming increasingly scarce, smart agriculture and precision agriculture have emerged as key directions for the future of agricultural development. Artificial intelligence (AI) technologies, particularly deep learning models, have found widespread applications in areas such as crop monitoring and pest detection. As an emerging generative model, diffusion models have shown significant promise in tasks like agricultural image processing, data augmentation, and remote sensing. Compared to traditional generative adversarial networks (GANs), diffusion models offer superior training stability and generation quality, effectively addressing challenges such as limited agricultural data and imbalanced image samples. This paper reviews the latest advancements in the application of diffusion models in agriculture, focusing on their potential in crop pest and disease detection, remote sensing image enhancement, crop growth prediction, and agricultural resource management. Experimental results demonstrate that diffusion models significantly improve model accuracy and robustness in data augmentation, image generation, and denoising, especially in complex environments. Despite challenges related to computational efficiency and generalization capabilities, diffusion models are expected to play an increasingly important role in smart and precision agriculture as technology advances, providing substantial support for the sustainable development of global agriculture.

随着全球人口增长和可耕地资源日益稀缺,智能农业和精准农业已成为农业发展未来的关键方向。人工智能技术,特别是深层学习模式,在作物监测和虫害检测等领域广泛应用。作为一种新兴的基因化模式,推广模式在农业图像处理、数据增强和遥感等任务方面显示出巨大的希望。与传统的基因对抗网络相比,推广模式提供了较高的培训稳定性和生产质量,有效地应对了农业数据有限和图像样本不平衡等挑战。本文件回顾了农业推广模式应用的最新进展,重点是其在作物虫害和疾病检测、遥感图像增强、作物生长预测和农业资源管理方面的潜力。实验结果表明,推广模式极大地提高了模型在数据增强、图像生成和淡化方面的准确性和稳健性,特别是在复杂环境中。尽管在计算效率和普遍化能力方面存在挑战,但随着技术进步,推广模式可望在智能和精准农业方面发挥越来越重要的作用,为全球农业的可持续发展提供大量支持。

Article 135

Title@2025-07-31 (4): CNN-based solution for mango classification in agricultural environments

Title: CNN-based solution for mango classification in agricultural environments

CNN-basierte Lösung für die Mangoklassifizierung in landwirtschaftlichen Umgebungen

以有线有线电视新闻网为基础的农业环境芒果分类解决办法 2507.23174v1

Authors (3): Beatriz Díaz Peón, Jorge Torres Gómez, Ariel Fajardo Márquez

This article illustrates the design of a fruit detection and classification system using Convolutional Neural Networks (CNNs). The goal is to develop a system that automatically assesses fruit quality for farm inventory management. Specifically, a method for mango fruit classification was developed using image processing, ensuring both accuracy and efficiency. ResNet-18 was selected as the preliminary architecture for classification, while a cascade detector was used for detection, balancing execution speed and computational resource consumption. Detection and classification results were displayed through a graphical interface developed in MATLAB App Designer, streamlining system interaction. The integration of convolutional neural networks and cascade detectors offers a reliable solution for fruit classification and detection, with potential applications in agricultural quality control.

本条举例说明利用进化神经网络(CNN)设计水果检测和分类系统。目的是为农场库存管理开发一个自动评估水果质量的系统。具体地说,利用图像处理开发了芒果分类方法,确保了准确性和效率。Resnet-18被选为初步分类结构,同时使用级联检测器进行检测,平衡执行速度和计算资源消耗。检测和分类结果通过MatLab App设计器开发的图形界面展示,简化系统互动。将进化神经网络和级联探测器整合为水果分类和检测的可靠解决方案,并有可能应用于农业质量控制。

Article 136

Title@2025-07-31 (4): BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning

Title: BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning

BAR Conjecture: Die Machbarkeit von Schlussfolgerungen Budget-konstruierten LLM-Diensten mit Authentizität und Vernunft

BAR 假设:具有真实性和合理性、经过预算约束的有限LLM服务推论的可行性 2507.23170v1

Authors (5): Jinan Zhou, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta, Aryan Singhal

When designing LLM services, practitioners care about three key properties: inference-time budget, factual authenticity, and reasoning capacity. However, our analysis shows that no model can simultaneously optimize for all three. We formally prove this trade-off and propose a principled framework named The BAR Theorem for LLM-application design.

在设计LLM服务时,从业者关心三个关键属性:推论时间预算、事实真实性和推理能力。然而,我们的分析表明,任何模型都无法同时优化所有三个模型。我们正式证明这一权衡,并提议了一个称为LM应用设计BAR理论的原则框架。

Article 137

Title@2025-07-31 (4): LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration

Title: LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration

LENS: Lerne Ensemble Vertrauen aus neuralen Staaten für Multi-LLM-Antwortintegration

LENS:从神经国家学习多LLM应答整合的集合信任 2507.23167v1

Authors (1): Jizhou Guo

Large Language Models (LLMs) have demonstrated impressive performance across various tasks, with different models excelling in distinct domains and specific abilities. Effectively combining the predictions of multiple LLMs is crucial for enhancing system robustness and performance. However, existing ensemble methods often rely on simple techniques like voting or logits ensembling, which overlook the varying confidence and reliability of models in different contexts. In this work, we propose LENS (Learning ENsemble confidence from Neural States), a novel approach that learns to estimate model confidence by analyzing internal representations. For each LLM, we train a lightweight linear confidence predictor that leverages layer-wise hidden states and normalized probabilities as inputs. This allows for more nuanced weighting of model predictions based on their context-dependent reliability. Our method does not require modifying the model parameters and requires negligible additional computation. Experimental results on multiple-choice and boolean question-answering tasks demonstrate that LENS outperforms traditional ensemble methods by a substantial margin. Our findings suggest that internal representations provide valuable signals for determining model confidence and can be effectively leveraged for ensemble learning.

大型语言模型(LLMS)在各种任务中表现出了令人印象深刻的成绩,不同模型在不同的领域和具体能力方面表现得不同。有效地结合对多个LLMS的预测对于提高系统稳健性和性能至关重要。然而,现有的混合方法往往依赖简单的技术,如投票或登录组合,这些技术忽视了不同情况下模型的不同信心和可靠性。在这项工作中,我们提议LENS(从神经国学习可综合信任),这是一种新颖的方法,通过分析内部代表来评估模型的信心。我们为每个LM公司培训了一个轻量线性线性信心预测器,该预测器能够利用分层的隐藏状态和正常的概率作为投入。这使得能够根据不同背景的可靠性对模型预测进行更细致的加权。我们的方法并不要求修改模型参数,而需要微不足道的额外计算。多曲和布林问答任务的实验结果表明,LENS比传统的混合方法要差很多。我们的研究结果表明,内部代表提供了宝贵的信号,用以确定模型信任度,并且能够有效地利用该软件学习。
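The weighting idea in the LENS abstract above can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract only says that a lightweight linear predictor produces per-model confidence that is used to weight predictions. Two things are assumptions made here for illustration: the logistic form of the predictor, and the aggregation rule (adding each model's confidence to the option it chose). The names `confidence` and `weighted_vote` are hypothetical, and feature extraction and training are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def confidence(features, w, b):
    """Linear confidence predictor: estimated probability that a model's answer is correct.

    `features` stands in for the per-query inputs (e.g. hidden states and
    normalized probabilities); `w` and `b` would be learned on held-out data.
    """
    return sigmoid(np.asarray(features) @ np.asarray(w) + b)

def weighted_vote(answers, confidences, n_options):
    """Add each model's confidence to the option it chose and return the best option."""
    scores = np.zeros(n_options)
    for answer, conf in zip(answers, confidences):
        scores[answer] += conf
    return int(np.argmax(scores))

# Three hypothetical models: two low-confidence votes for option 0,
# one high-confidence vote for option 1.
answers = [0, 0, 1]
confidences = [0.2, 0.2, 0.9]
chosen = weighted_vote(answers, confidences, n_options=2)
```

In this toy case plain majority voting would pick option 0, while the confidence-weighted vote picks option 1. That difference is what context-dependent weighting is meant to capture.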

Article 138

Title@2025-07-31 (4): Tensor Product Neural Networks for Functional ANOVA Model

Title: Tensor Product Neural Networks for Functional ANOVA Model

Tensor Produkt Neuronale Netzwerke für funktionales ANOVA-Modell

功能ANOVA模型的神经网络 2502.15215v5

Authors (5): Seokhun Park, Insung Kong, Yongchan Choi, Chanmoo Park, Yongdai Kim

Interpretability of machine learning models is becoming increasingly important as the models become more complex. The functional ANOVA model, which decomposes a high-dimensional function into a sum of lower-dimensional functions (commonly referred to as components), is one of the most popular tools for interpretable AI, and various neural networks have recently been developed for estimating each component in the functional ANOVA model. However, such neural networks are highly unstable when estimating each component, since the components themselves are not uniquely defined. That is, there are multiple functional ANOVA decompositions for a given function. In this paper, we propose a novel neural network which guarantees a unique functional ANOVA decomposition and thus is able to estimate each component stably and accurately. We call our proposed neural network ANOVA Tensor Product Neural Network (ANOVA-TPNN) since it is motivated by the tensor product basis expansion. Theoretically, we prove that ANOVA-TPNN can approximate any smooth function well. Empirically, we show that ANOVA-TPNN provides much more stable estimation of each component, and thus much more stable interpretation, than existing neural networks do when the training data and initial values of the model parameters vary.

随着机器学习模式变得更加复杂,机器学习模式的可解释性正在变得越来越重要。功能性ANOVA模型将一个高维功能分解成一个低维功能的组合(通常称为组件),是解释性AI最受欢迎的工具之一,最近,在功能性ANOVA模型中,已经开发了各种神经网络来估计每个组成部分。然而,在估算每个组成部分时,这种神经网络变得高度不稳定,因为各组成部分本身不是独特的定义。也就是说,一个功能性ANOVA-TPNN可以多重功能的ANOVA分解。在本文中,我们提出了一个新颖的神经网络,保证一个独特的功能性ANOVA分解功能,从而能够对每个组成部分进行精确和精确的估算。我们称之为我们拟议的神经网络ANOVA Tensor产品神经网络(ANOVA-TPNNN),因为它的动机是高压产品基础的扩展。理论上,我们证明ANOVA-TPNNN可以对任何顺利的功能进行近似的计算。我们表明,ANOVA-TPNNNN模型提供了比现有每个组成部分的最初的参数更加稳定的估计,因此在对每个组成部分进行更稳定的解释时,我们表明ANOVAVA-TPNNNNNNNM模型提供较稳定得多的数据和较稳定的评估。
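For reference, the functional ANOVA decomposition mentioned in the abstract above is usually written as below, together with a one-line illustration of why components are not unique without extra constraints. This is the standard textbook form; the specific identifiability condition enforced by ANOVA-TPNN is not stated in the abstract and is not reproduced here.

```latex
f(x_1,\dots,x_p) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
  + \sum_{1 \le j < k \le p} f_{jk}(x_j, x_k) + \cdots + f_{1 \cdots p}(x_1,\dots,x_p)

% Non-uniqueness without constraints: for any constant c,
x_1 + x_2 = \underbrace{(x_1 + c)}_{f_1(x_1)} + \underbrace{(x_2 - c)}_{f_2(x_2)}
```

Any choice of $c$ gives the same function but different main effects, so an estimator needs a constraint (for example, centering each component) before the individual components can be interpreted.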

Article 139

Title@2025-07-31 (4): Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in Interpretability

Title: Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in Interpretability

Kompositorische Funktionsnetzwerke: Eine leistungsstarke Alternative zu tiefen neuralen Netzwerken mit eingebauter Interpretierbarkeit

构成函数网络:具有内置可解释性的深神经网络高性能替代品 2507.21004v2

Authors (1): Fang Li

Deep Neural Networks (DNNs) deliver impressive performance but their black-box nature limits deployment in high-stakes domains requiring transparency. We introduce Compositional Function Networks (CFNs), a novel framework that builds inherently interpretable models by composing elementary mathematical functions with clear semantics. Unlike existing interpretable approaches that are limited to simple additive structures, CFNs support diverse compositional patterns – sequential, parallel, and conditional – enabling complex feature interactions while maintaining transparency. A key innovation is that CFNs are fully differentiable, allowing efficient training through standard gradient descent. We demonstrate CFNs’ versatility across multiple domains, from symbolic regression to image classification with deep hierarchical networks. Our empirical evaluation shows CFNs achieve competitive performance against black-box models (96.24% accuracy on CIFAR-10) while outperforming state-of-the-art interpretable models like Explainable Boosting Machines. By combining the hierarchical expressiveness and efficient training of deep learning with the intrinsic interpretability of well-defined mathematical functions, CFNs offer a powerful framework for applications where both performance and accountability are paramount.

深神经网络(DNNS) 带来令人印象深刻的绩效,但其黑箱自然特性限制了在需要透明度的高风险领域的部署。我们引入了组成功能网络(CFNS),这是一个新颖的框架,它通过以清晰的语义组成基本数学函数来建立内在可解释的模式。与目前仅限于简单添加结构的可解释方法不同, CFNs支持多种可解释模式 – – 连续、平行和有条件 – – 促成复杂特征互动,同时保持透明度。一项关键创新是,CFNs是完全不同的,通过标准的梯度下降进行高效培训。我们展示了CFNFS在多个领域,从象征性回归到与深层次网络的图像分类的多功能性。我们的经验评估显示,CFNFS取得了与黑箱模式(CIFAR-10的精确度为96.24 % ) 的竞争性业绩,而优于先进的可解释模型,如可解释的机器。通过将分级表达性和有效的深层次培训与定义的数学功能的内在可解释性,CFMs提供了一个强大的应用框架,在业绩和问责都至关重要的地方提供了强大的应用框架。

Article 140

Title@2025-07-30 (3): TokenBlowUp: Resolving Representational Singularities in LLM Token Spaces via Monoidal Transformations

Title: TokenBlowUp: Resolving Representational Singularities in LLM Token Spaces via Monoidal Transformations

TokenBlowUp: Auflösung von Repräsentationssingularitäten in LLM-Tokenräumen über monoidale Transformationen

TokenBlowUp: 通过一式转换解决LLM Token空间的代表标志 2507.19747v2

Authors (1): Dongfang Zhao

Recent work has provided compelling evidence challenging the foundational manifold hypothesis for the token embedding spaces of Large Language Models (LLMs). These findings reveal the presence of geometric singularities around polysemous tokens, which can lead to representational instability. Existing methodologies, which presuppose a smooth data manifold, are ill-equipped to address such intrinsic structural flaws. In this paper, we formalize this problem in the language of scheme theory and propose a rigorous resolution by applying the scheme-theoretic blow-up at each singular point. This procedure replaces a singular point in the ambient affine scheme with its exceptional divisor, which we identify as a canonical geometric space – a projective space of directions – that houses the disambiguated semantic meanings of the token. This process of "representational desingularization" constructs a new geometric landscape for embeddings. We prove a formal theorem guaranteeing the geometric regularization of this new space, showing that the original pathologies are resolved. Finally, we outline the architectural implications of our framework, arguing for a paradigm shift from static look-ups to dynamic, geometrically grounded computation.

最近的工作提供了令人信服的证据,对大语言模型(LLMs)象征性嵌入空间的基本假设提出了挑战。这些结论揭示了在多元符号周围存在几何特征,这可能导致代表性的不稳定。现有的方法,以光滑的数据元为前提,不具备解决这种内在结构缺陷的能力。在本文件中,我们用计划理论的语言将这一问题正式化,并通过在每个单一点应用方案理论打击来提出严格的解决方案。这个程序用其特殊的分光器取代了环境缝合方案中的一个单点,我们把它确定为一个可谱化的几何空间 – – 一个方向的投影空间 – – 含有该符号模糊的语义含义。这个“代表脱色化”进程为嵌入设计了新的几何景观。我们证明了一种正式的理论,保证了这一新空间的几何结构规范,表明最初的病理得到了解决。最后,我们概述了我们框架的建筑影响,说明范式从静态的外观向动态的、几何式的地面计算方法转变。

Article 141

Title@2025-07-30 (3): Extended Factorization Machine Annealing for Rapid Discovery of Transparent Conducting Materials

Title: Extended Factorization Machine Annealing for Rapid Discovery of Transparent Conducting Materials

Erweiterte Factorisierungsmaschine Annealing für die schnelle Entdeckung von transparenten leitenden Materialien

迅速发现透明操作材料的扩展保理装置 2507.23160v1

Authors (3): Daisuke Makino, Tatsuya Goto, Yoshinori Suga

The development of novel transparent conducting materials (TCMs) is essential for enhancing the performance and reducing the cost of next-generation devices such as solar cells and displays. In this research, we focus on the (Al$_x$Ga$_y$In$_z$)$_2$O$_3$ system and extend the FMA framework, which combines a Factorization Machine (FM) and annealing, to search for optimal compositions and crystal structures with high accuracy and low cost. The proposed method introduces (i) the binarization of continuous variables, (ii) the utilization of good solutions using a Hopfield network, (iii) the activation of global search through adaptive random flips, and (iv) fine-tuning via a bit-string local search. Validation using the (Al$_x$Ga$_y$In$_z$)$_2$O$_3$ data from the Kaggle “Nomad2018 Predicting Transparent Conductors” competition demonstrated that our method achieves faster and more accurate searches than Bayesian optimization and genetic algorithms. Furthermore, its application to multi-objective optimization showed its capability in designing materials by simultaneously considering both the band gap and formation energy. These results suggest that applying our method to larger, more complex search problems and diverse material designs that reflect realistic experimental conditions is expected to contribute to the further advancement of materials informatics.

开发新的透明操作材料(TCM)对于提高太阳能电池和显示器等下一代装置的性能和降低其成本至关重要,在这项研究中,我们注重于($xGa$_y$_z$_z$)系统,并扩展FMA框架,该框架将一个保理机(FM)和喷射结合起来,以寻找精度高、成本低的最佳构成和晶体结构。拟议方法引入了(一) 连续变量的二进制化,(二) 利用Hopfield网络的好解决办法,(三) 通过随机随机翻转启动全球搜索,以及(四) 通过局部搜索进行微调。使用(Al_xGa$y$_z$_z$)和喷射仪的FMA框架验证,以寻找“2018年进一步预测透明导体”竞争中的最佳组成和晶体结构数据。拟议方法引入了(一)连续变量的二),(二)利用Hopfield 网络,(二) 利用良好的解决方案,(三) 启动全球搜索,(三) 通过随机随机随机翻转,启动全球搜索,(四) 启动全球搜索,(四) 通过点搜索系统进行微调) 并进行微调,(四进调) 并推广) 利用该系统, 并同时将其应用, 展示,使多动, 显示其设计,使多式搜索能力向多动。

Article 142

Title@2025-07-30 (3): AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction

Title: AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction

AdaptHetero: maschinelles Lernen Interpretationsgetriebene Subgruppenanpassung für EHR-basierte klinische Vorhersagen

适应赫特罗:基于EHR的临床预测的机器学习口译驱动分组适应 2507.21197v2

Authors (2): Ling Liao, Eva Aagaard

Machine learning interpretation (MLI) has primarily been leveraged to build clinician trust and uncover actionable insights in EHRs. However, the intrinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets: GOSSIS-1-eICU, WiDS, and MIMIC-IV, AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance and optimized clinical deployment.

机器学习解释(MLI)主要用于建立临床信任和发现EHR系统中可采取行动的洞察力,然而,EHR数据的内在复杂性和异质性限制了其在指导针对具体子群的建模方面的效力。我们建议采用一个全新的MLI驱动框架,即SdangHetero,将可解释性洞察力转化为针对各医院系统内各子群进行模式培训和评价的可操作指南。对三个大规模EHR数据集进行了评估:GOSISIS-1-eICU、WIDS和MIMIMI-IV,Dept Hetero 持续地查明了在预测ICU死亡率、住院死亡和隐性缺氧症方面的各种模型行为。通过整合基于SHAP的诠释和不受监督的集群,该框架加强了对临床上有意义的子群特性的识别,从而改进了预测性业绩和优化临床部署。

Article 143

Title@2025-07-30 (3): Decision by Supervised Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization

Title: Decision by Supervised Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization

Entscheidung von Supervised Learning mit tiefen Ensembles: Ein praktischer Rahmen für robuste Portfolio-Optimierung

受监督的深群学习决定:强力组合组合优化实用框架 2503.13544v4

Authors (6): Juhyeong Kim, Sungyoon Choi, Youngbin Lee, Yejin Kim, Yongmin Choi, Yongjae Lee

We propose Decision by Supervised Learning (DSL), a practical framework for robust portfolio optimization. DSL reframes portfolio construction as a supervised learning problem: models are trained to predict optimal portfolio weights, using cross-entropy loss and portfolios constructed by maximizing the Sharpe or Sortino ratio. To further enhance stability and reliability, DSL employs Deep Ensemble methods, substantially reducing variance in portfolio allocations. Through comprehensive backtesting across diverse market universes and neural architectures, DSL shows superior performance compared to both traditional strategies and leading machine learning-based methods, including Prediction-Focused Learning and End-to-End Learning. We show that increasing the ensemble size leads to higher median returns and more stable risk-adjusted performance. The code is available at https://github.com/DSLwDE/DSLwDE.

我们提议由监督学习公司(DSSL)作出决定,这是稳健组合优化的实用框架。DSL将组合建设作为一个监管的学习问题重新设定:模型经过培训,以预测最佳组合权重,使用通过最大限度地增加Sharpe或Sortino比率构建的交叉热带损失和组合;为了进一步加强稳定性和可靠性,DSL采用深团组合方法,大幅度减少组合分配的差异。通过对不同市场领域和神经结构进行全面的回溯测试,显示与传统战略和主要机器学习方法相比,业绩优于传统战略和主要机器学习方法,包括预测-受控制学习和端至端学习。我们显示,增加组合规模导致更高的中位回报和更稳定的风险调整性能。该代码可在https://github.com/DSLwDE/DSLwDE/DSLwDE上查阅。
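Two ingredients named in the DSL abstract above, the Sharpe ratio used to build target portfolios and the averaging of ensemble members' predicted weights, can be sketched as follows. This is a rough illustration rather than the code from the linked repository: the function names are hypothetical, the Sharpe ratio is un-annualized with a zero risk-free rate, and ensemble members are assumed to output logits that are turned into long-only weights with a softmax.

```python
import numpy as np

def sharpe_ratio(weights, returns):
    """Mean over standard deviation of portfolio returns (risk-free rate assumed 0)."""
    portfolio = np.asarray(returns) @ np.asarray(weights)
    return portfolio.mean() / portfolio.std(ddof=1)

def softmax(logits):
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)     # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_weights(member_logits):
    """Average the softmax allocations predicted by each ensemble member."""
    return softmax(member_logits).mean(axis=0)

# Toy data: 3 periods x 2 assets of hypothetical returns.
returns = np.array([[0.01, 0.02], [0.03, 0.00], [-0.01, 0.01]])
equal_weight_sharpe = sharpe_ratio([0.5, 0.5], returns)

# Two hypothetical ensemble members with different views on asset 0.
avg_weights = ensemble_weights([[0.0, 0.0], [np.log(3.0), 0.0]])
```

Averaging keeps the allocation on the simplex (weights are non-negative and sum to one) and damps the disagreement between members, which is the variance-reduction effect the abstract attributes to deep ensembles.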

Article 144

Title@2025-07-30 (3): On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization

Title: On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization

Über die Komplexität der Suche nach Stationären Punkten in nicht konvexe einfache Bilevel-Optimierung

关于非电解简单双级最佳化中寻找固定点的复杂性 2507.23155v1

Authors (4): Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper- and lower-level objectives are smooth but potentially nonconvex. Due to the absence of additional structural assumptions for the lower-level objective, such as convexity or the Polyak-Łojasiewicz (PL) condition, guaranteeing global optimality is generally intractable. Instead, we introduce a suitable notion of stationarity for this class of problems and aim to design a first-order algorithm that finds such stationary points in polynomial time. Intuitively, stationarity in this setting means the upper-level objective cannot be substantially improved locally without causing a larger deterioration in the lower-level objective. To this end, we show that a simple and implementable variant of the dynamic barrier gradient descent (DBGD) framework can effectively solve the considered nonconvex simple bilevel problems up to stationarity. Specifically, to reach an $(\epsilon_f, \epsilon_g)$-stationary point, where $\epsilon_f$ and $\epsilon_g$ denote the target stationarity accuracies for the upper- and lower-level objectives, respectively, the considered method achieves a complexity of $\mathcal{O}\left(\max\left(\epsilon_f^{-\frac{3+p}{1+p}}, \epsilon_g^{-\frac{3+p}{2}}\right)\right)$, where $p \geq 0$ is an arbitrary constant balancing the terms. To the best of our knowledge, this is the first complexity result for a discrete-time algorithm that guarantees joint stationarity for both levels in general nonconvex simple bilevel problems.

本文研究简单双层优化问题的求解,即在下层问题的解集上最小化上层目标。我们关注上层和下层目标均光滑但可能非凸的一般情形。由于下层目标缺乏凸性或 Polyak-Łojasiewicz(PL)条件等额外结构假设,保证全局最优性通常难以处理。为此,我们为这类问题引入了一个合适的平稳性概念,并致力于设计一种能在多项式时间内找到此类平稳点的一阶算法。直观地说,这里的平稳性意味着无法在不使下层目标出现更大恶化的情况下,在局部显著改进上层目标。我们证明,动态障碍梯度下降(DBGD)框架的一个简单且可实现的变体能够有效地将所考虑的非凸简单双层问题求解至平稳点。具体而言,为达到 $(\epsilon_f, \epsilon_g)$-平稳点(其中 $\epsilon_f$ 和 $\epsilon_g$ 分别表示上层和下层目标的目标平稳精度),该方法的复杂度为 $\mathcal{O}\left(\max\left(\epsilon_f^{-\frac{3+p}{1+p}}, \epsilon_g^{-\frac{3+p}{2}}\right)\right)$,其中 $p \geq 0$ 是用于平衡两项的任意常数。据我们所知,这是首个针对离散时间算法、在一般非凸简单双层问题中保证两层联合平稳性的复杂度结果。
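To make the role of the balancing constant $p$ in the complexity bound above concrete, here is a short worked evaluation. It only substitutes values into the stated bound; the choice of $p$ values and the equal-accuracy case are illustrative assumptions.

```latex
% p = 0: exponents (3+0)/(1+0) = 3 and (3+0)/2 = 3/2
\mathcal{O}\left(\max\left(\epsilon_f^{-3},\ \epsilon_g^{-3/2}\right)\right)

% p = 1: exponents (3+1)/(1+1) = 2 and (3+1)/2 = 2
\mathcal{O}\left(\max\left(\epsilon_f^{-2},\ \epsilon_g^{-2}\right)\right)

% Equal accuracies \epsilon_f = \epsilon_g = \epsilon < 1:
% (3+p)/(1+p) decreases in p and (3+p)/2 increases in p; they coincide when 1 + p = 2,
% so p = 1 minimizes the larger exponent and the bound becomes
\mathcal{O}\left(\epsilon^{-2}\right)
```

Larger $p$ relaxes the dependence on $\epsilon_f$ (the first exponent tends to 1) at the cost of a worse dependence on $\epsilon_g$, which is the trade-off the constant is described as balancing.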

Article 145

Title@2025-07-30 (3): FuseTen: A Generative Model for Daily 10 m Land Surface Temperature Estimation from Spatio-Temporal Satellite Observations

Title: FuseTen: A Generative Model for Daily 10 m Land Surface Temperature Estimation from Spatio-Temporal Satellite Observations

FuseTen: Ein generatives Modell für täglich 10 m Landoberflächentemperaturschätzung aus räumlich-zeitlichen Satellitenbeobachtungen

FuseTen:斯帕蒂奥-时空卫星观测每日10米地表温度估计的生成模型 2507.23154v1

Authors (4): Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai

Urban heatwaves, droughts, and land degradation are pressing and growing challenges in the context of climate change. A valuable approach to studying them requires accurate spatio-temporal information on land surface conditions. One of the most important variables for assessing and understanding these phenomena is Land Surface Temperature (LST), which is derived from satellites and provides essential information about the thermal state of the Earth’s surface. However, satellite platforms inherently face a trade-off between spatial and temporal resolutions. To bridge this gap, we propose FuseTen, a novel generative framework that produces daily LST observations at a fine 10 m spatial resolution by fusing spatio-temporal observations derived from Sentinel-2, Landsat 8, and Terra MODIS. FuseTen employs a generative architecture trained using an averaging-based supervision strategy grounded in physical principles. It incorporates attention and normalization modules within the fusion process and uses a PatchGAN discriminator to enforce realism. Experiments across multiple dates show that FuseTen outperforms linear baselines, with an average 32.06% improvement in quantitative metrics and 31.42% in visual fidelity. To the best of our knowledge, this is the first non-linear method to generate daily LST estimates at such fine spatial resolution.

在气候变化的背景下,城市热浪、干旱和土地退化是紧迫和日益严峻的挑战,在气候变化的背景下,挑战日益严峻和日益严峻。研究它们的宝贵方法要求关于陆地表面条件的准确空间-时空信息。评估和理解这些现象的最重要变量之一是地表温度(LST),它来自卫星,提供关于地球表面热状态的基本信息。然而,卫星平台在空间分辨率和时间分辨率之间必然面临平衡。为了缩小这一差距,我们提议FuseTen,这是一个新型的基因化框架,它通过使用来自Sentinel-2、Landsat 8和Terra MODIS的地表-时空观测,以10米的微分空间分辨率产生每日LST观测结果。FuseTen使用基于物理原理的基于平均监督战略的培训的基因化结构。它将聚变过程的注意力和正常化模块纳入并使用PatchGAN歧视器来实施现实主义。为了弥合这一差距,我们提出的实验表明,FuseTen的线性基线比线性基线高出了10.06%,视觉真实度为31.42%。这种知识的最佳方法是每天得出的最佳空间分辨率。

Article 146

Title@2025-07-30 (3): AI paradigm for solving differential equations: first-principles data generation and scale-dilation operator AI solver

Title: AI paradigm for solving differential equations: first-principles data generation and scale-dilation operator AI solver

KI-Paradigma zur Lösung von Differentialgleichungen: First-Principles Datengenerierung und Scale-Dilation Operator KI-Löser

解决差别方程式的AI模式:第一原则数据生成和比例关系操作员AI求解器 2507.23141v1

Authors (7): Xiangshu Gong, Zhiqiang Xie, Xiaowei Jin, Chen Wang, Yanling Qu, Wangmeng Zuo, Hui Li

Many problems are governed by differential equations (DEs). Artificial intelligence (AI) offers a new path for solving DEs. However, data is very scarce, and existing AI solvers struggle with the approximation of high frequency components (AHFC). We propose an AI paradigm for solving diverse DEs, comprising a DE-ruled first-principles data generation methodology and a scale-dilation operator (SDO) AI solver. Using either prior knowledge or random fields, we generate solutions and then substitute them into the DEs to derive the sources and initial/boundary conditions through balancing DEs, thus producing an arbitrarily large number of first-principles-consistent training datasets at extremely low computational cost. We introduce a reversible SDO that leverages the Fourier transform of the multiscale solutions to fix AHFC, and design a spatiotemporally coupled, attention-based Transformer AI solver of DEs with SDO. An upper bound on the Hessian condition number of the loss function is proven to be proportional to the squared 2-norm of the solution gradient, revealing that SDO yields a smoother loss landscape, consequently fixing AHFC with efficient training. Extensive tests on diverse DEs demonstrate that our AI paradigm achieves consistently superior accuracy over state-of-the-art methods. This work makes AI solvers of DEs truly usable across broad areas of natural science and engineering.

人工智能(AI)是解决DE的一条新途径。然而,数据非常稀缺,现有的AI解答者与高频组件近似(AHFC)抗争。我们提出了解决多种DE的AI范式,包括以DE为规则的一原则生成数据的方法和比例比关系操作者(SDO) AI求解器。我们利用先前的知识或随机的字段,产生解决方案,然后取而代之,通过平衡DEs来获取源和初始/初始/封闭条件,从而产生大量任意的、符合原则的培训数据集,而计算成本极低。我们引入了可逆的SDO,利用多尺度解决方案的四重转换来修复AHFC,并设计一个与SDO相连接的基于关注的变异的 AI求解器。关于赫斯条件损失函数的上限与解决方案梯度的正方形 2 节成比例,从而显示SDO生成了一种平稳的DO型损失模型,从而在高频度的 AIC 测试中,从而实现高频度的AIC 测试。
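The data generation idea in the abstract (pick a solution first, then substitute it into the DE to obtain the source) can be illustrated on a toy problem. The sketch below is not the authors' pipeline: it assumes the 1D Poisson equation $-u'' = f$ on $[0, 1]$ with zero boundary values, and the names `make_sample` and `residual` and the random sine prior are hypothetical choices made for illustration.

```python
import math
import random

def make_sample(num_modes=4, seed=0):
    """Draw a random solution u of -u'' = f on [0, 1] with u(0) = u(1) = 0,
    then derive the source f by substituting u into the equation."""
    rng = random.Random(seed)
    # Random sine coefficients that decay with mode number (an assumed prior).
    coeffs = [rng.gauss(0.0, 1.0) / k for k in range(1, num_modes + 1)]

    def u(x):
        return sum(a * math.sin(k * math.pi * x)
                   for k, a in enumerate(coeffs, start=1))

    def f(x):
        # -d^2/dx^2 [a sin(k pi x)] = a (k pi)^2 sin(k pi x)
        return sum(a * (k * math.pi) ** 2 * math.sin(k * math.pi * x)
                   for k, a in enumerate(coeffs, start=1))

    return u, f

def residual(u, f, x, h=1e-4):
    """Finite-difference check that -u''(x) matches f(x)."""
    second = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return abs(-second - f(x))
```

Each call with a new seed yields a consistent (source, solution) pair without running a numerical solver, which is the cost advantage the abstract points to.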

Article 147

Title@2025-07-30 (3): Observational Multiplicity

Title: Observational Multiplicity

Beobachtungsvielfalt

观测多样性 2507.23136v1

Authors (3): Erin George, Deanna Needell, Berk Ustun

Many prediction tasks can admit multiple models that can perform almost equally well. This phenomenon can undermine interpretability and safety when competing models assign conflicting predictions to individuals. In this work, we study how arbitrariness can arise in probabilistic classification tasks as a result of an effect that we call \emph{observational multiplicity}. We discuss how this effect arises in a broad class of practical applications where we learn a classifier to predict probabilities $p_i \in [0,1]$ but are given a dataset of observations $y_i \in \{0,1\}$. We propose to evaluate the arbitrariness of individual probability predictions through the lens of \emph{regret}. We introduce a measure of regret for probabilistic classification tasks, which measures how the predictions of a model could change as a result of changes in the training labels. We present a general-purpose method to estimate the regret in a probabilistic classification task. We use our measure to show that regret is higher for certain groups in the dataset and discuss potential applications of regret. We demonstrate how estimating regret promotes safety in real-world applications by abstention and data collection.

许多预测任务可以接受几乎同样能发挥作用的多种模型。当相互竞争的模型对个人作出相互矛盾的预测时,这种现象可能会损害可解释性和安全性。在这项工作中,我们研究由于我们称之为 emph{servational multiple]的影响,在概率分类任务中会如何产生任意性。我们讨论这种影响如何在广泛的实际应用类别中产生,在这个类别中,我们学习一个分类员来预测概率[0,1]美元,但获得一组观察数据($y_i\in 0,1%$)。我们提议通过\emph{regret}的透镜来评估个人概率预测的任意性。我们提出对概率分类任务采取的一种遗憾措施,以衡量模型预测如何因不同的培训标签变化而改变。我们提出了一个一般用途方法来估计概率分类任务中的遗憾。我们使用我们的措施来表明某些群体在数据集中的遗憾程度更高,并讨论可能的应用。我们展示了如何估计遗憾促进真实世界的安全,通过弃权的数据收集和数据的应用。我们展示了如何估计实际应用促进安全。
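To make the notion of "predictions changing when training labels change" concrete, here is a minimal sketch under strong assumptions. It is not the paper's regret measure or estimation method: it uses a toy classifier that predicts a smoothed label frequency per group, redraws labels from the fitted probabilities, refits, and reports the average shift in predictions. The functions `fit_group_rates` and `label_sensitivity` are hypothetical.

```python
import random

def fit_group_rates(groups, labels):
    """Toy probabilistic classifier: smoothed label frequency per group."""
    counts, positives = {}, {}
    for g, y in zip(groups, labels):
        counts[g] = counts.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + y
    return {g: (positives[g] + 1) / (counts[g] + 2) for g in counts}

def label_sensitivity(groups, labels, num_resamples=200, seed=0):
    """Mean absolute change in each group's predicted probability when the
    labels are redrawn from the fitted probabilities and the model is refit."""
    rng = random.Random(seed)
    base = fit_group_rates(groups, labels)
    totals = {g: 0.0 for g in base}
    for _ in range(num_resamples):
        redrawn = [1 if rng.random() < base[g] else 0 for g in groups]
        refit = fit_group_rates(groups, redrawn)
        for g in base:
            totals[g] += abs(refit[g] - base[g])
    return {g: totals[g] / num_resamples for g in base}
```

In this toy setting a group with few observations shows larger shifts than a well-sampled one, which is one simple way a label-driven sensitivity can differ across groups.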

Article 148

Title@2025-07-30 (3): Evaluating and Improving the Robustness of Speech Command Recognition Models to Noise and Distribution Shifts

Title: Evaluating and Improving the Robustness of Speech Command Recognition Models to Noise and Distribution Shifts

Bewertung und Verbesserung der Robustheit von Sprachbefehlserkennungsmodellen für Geräusch- und Verteilungsverschiebungen

评估和改进语音指令识别模式对噪音和分配变化的威力 2507.23128v1

Authors (2): Anaïs Baranger, Lucas Maison

Although prior work in computer vision has shown strong correlations between in-distribution (ID) and out-of-distribution (OOD) accuracies, such relationships remain underexplored in audio-based models. In this study, we investigate how training conditions and input features affect the robustness and generalization abilities of spoken keyword classifiers under OOD conditions. We benchmark several neural architectures across a variety of evaluation sets. To quantify the impact of noise on generalization, we make use of two metrics: Fairness (F), which measures overall accuracy gains compared to a baseline model, and Robustness (R), which assesses the convergence between ID and OOD performance. Our results suggest that noise-aware training improves robustness in some configurations. These findings shed new light on the benefits and limitations of noise-based augmentation for generalization in speech models.

虽然计算机愿景先前的工作显示,在分配(ID)和分配(OOD)之间有着密切的关联,但在音频模型中,这种关系仍未得到充分探讨。在本研究中,我们调查培训条件和输入特点如何影响OOD条件下口语关键词分类器的稳健性和一般化能力。我们用各种评价组来衡量若干神经结构。为了量化噪音对概括化的影响,我们使用两个衡量比基线模型衡量总体准确性收益的尺度:公平(F)和Robustness(R),评估ID与OOD性能的趋同。我们的研究结果表明,噪音意识培训在某些配置中提高了稳健性。这些结果为语音模型中基于噪音的放大的好处和局限性提供了新的线索。

Article 149

Title: ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology

ModalTune: Fine-Tuning Slide-Level-Grundlagenmodelle mit multi-Modalen Informationen für Multi-Task-Lernen in der digitalen Pathologie

模式图纳:数字病理学多任务学习多模式信息多模式学习的精准引导幻灯片级基金会模型 2503.17564v2

Authors (6): Vishwesh Ramanathan, Tony Xu, Pushpak Pati, Faruk Ahmed, Maged Goubran, Anne L. Martel

Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, current methods under-utilize shared information between tasks and modalities. To overcome this challenge, we propose ModalTune, a novel fine-tuning framework which introduces the Modal Adapter to integrate new modalities without modifying SLFM weights. Additionally, we use large language models (LLMs) to encode labels as text, capturing semantic relationships across multiple tasks and cancer types in a single training recipe. ModalTune achieves state-of-the-art (SOTA) results against both uni-modal and multi-modal models across four cancer types, jointly improving survival and cancer subtype prediction while remaining competitive in pan-cancer settings. Additionally, we show ModalTune is generalizable to two out-of-distribution (OOD) datasets. To our knowledge, this is the first unified fine-tuning framework for multi-modal, multi-task, and pan-cancer modeling in digital pathology.

数字病理学的预测任务具有挑战性,因为全流图像规模庞大,培训信号薄弱。计算、数据提供和自我监督学习的进步为幻灯片级基础模型(SLFMS)铺平了道路,这些模型可以改进低数据系统中的预测任务。然而,目前的方法没有充分利用任务和模式之间的共享信息。为了克服这一挑战,我们提议ModalTune(ModalTune),这是一个创新的微调框架,引入摩尔适应器,以整合新模式,而不修改SLFM重量。此外,我们使用大语言模型(LLMS)将标签编码成文字,在单一的培训食谱中捕捉多种任务和癌症类型的语义关系。ModalTune(SOTA)在四种癌症类型的单式和多模式模型中都取得了最新的结果,共同改善生存和癌症亚型预测,同时保持在泛癌症环境中的竞争力。此外,我们展示了ModalTune(LLLMMM)将首个模型调整为二个模外模型,在数字式模型、多式模型上是我们的知识。

Article 150

Title: MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement

MLE-STAR: Maschinenbauer über Suche und gezielte Veredelung

MLE-STAR:通过搜索和定向改进进行机械学习工程代理 2506.15692v2

Authors (6): Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Jinwoo Shin, Sercan Ö. Arık, Tomas Pfister

Agents based on large language models (LLMs) for machine learning engineering (MLE) can automatically implement ML models via code generation. However, existing approaches to build such agents often rely heavily on inherent LLM knowledge and employ coarse exploration strategies that modify the entire code structure at once. This limits their ability to select effective task-specific models and perform deep exploration within specific components, such as experimenting extensively with feature engineering options. To overcome these limitations, we propose MLE-STAR, a novel approach to build MLE agents. MLE-STAR first leverages external knowledge by using a search engine to retrieve effective models from the web, forming an initial solution, then iteratively refines it by exploring various strategies targeting specific ML components. This exploration is guided by ablation studies analyzing the impact of individual code blocks. Furthermore, we introduce a novel ensembling method using an effective strategy suggested by MLE-STAR. Our experimental results show that MLE-STAR achieves medals in 64% of the Kaggle competitions on the MLE-bench Lite, significantly outperforming the best alternative.

机械学习工程(MLE)基于大型语言模型(LLMS)的代理商可以通过代码生成自动应用MLL模型。然而,现有的建立这种代理商的方法往往严重依赖固有的LLM知识,并采用粗略的勘探战略,立即修改整个代码结构。这限制了他们选择有效任务特有模型和在特定部件中进行深入探索的能力,例如广泛试验地物工程选项。为了克服这些,我们提议MLE-STAR,这是建立MLE代理商的新办法。MLE-STAR首先利用外部知识,利用搜索引擎从网上检索有效的模型,形成初步解决方案,然后通过探索针对特定MLE组件的各种战略来迭接地加以完善。这种探索以分析单个代码块的影响为指南。此外,我们采用了一种新颖的组合方法,使用MLE-STAR所建议的有效战略。我们的实验结果表明,MLE-STAR在M-Bench Lite的Kagle竞争中取得了64%的奖牌。

Article 151

Title@2025-07-30 (3): Controlling diverse robots by inferring Jacobian fields with deep networks

Title: Controlling diverse robots by inferring Jacobian fields with deep networks

Steuerung diverser Roboter durch Rückschlüsse auf Jacobian-Felder mit tiefen Netzwerken

通过将雅各布田地与深层网络进行推断,控制各种机器人 2407.08722v2

Authors (7): Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have greatly expanded the feasible hardware, but using these systems requires control software to translate the desired motions into actuator commands. Conventional robots can easily be modeled as rigid links connected by joints, but it remains an open challenge to model and control biologically inspired robots that are often soft or made of several materials, lack sensing capabilities, and may change their material properties with use. Here, we introduce a method that uses deep neural networks to map a video stream of a robot to its visuomotor Jacobian field (the sensitivity of all 3D points to the robot’s actuators). Our method enables the control of robots from only a single camera, makes no assumptions about the robots’ materials, actuation, or sensing, and is trained without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators that vary in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. Because it enables robot control using a generic camera as the only sensor, we anticipate that our work will broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.

复制自然有机体的复杂结构和不同功能是机器人的长期挑战。现代制造技术极大地扩大了可行的硬件,但使用这些系统需要控制软件将想要的动作转换成动画指令。常规机器人可以很容易地以通过联合连接的僵硬链接的形式建模,但它仍然是对模型和控制生物学启发的机器人的公开挑战,这些机器人往往是软的或由几种材料制成的,缺乏感知能力,并可能用来改变其物质特性。在这里,我们引入了一种方法,用深层神经网络将机器人的视频流映射到其正反射的雅各布场(所有3D点对机器人动画机的敏感性 ) 。我们的方法只能让机器人从一个单一的相机上控制,不对机器人的材料、动作或感知进行假设,而且没有经过专家干预,通过观察随机指令的执行来培训这些机器人。我们用多种机器人操控器展示了我们的方法,在动作、材料、制造和成本方面各不相同。我们的方法实现了准确的闭路控制,并恢复了每个机器人的因果动力动态结构结构结构启动每个机器人,因为我们只能使用一种普通的机器人的机器人,而使机器人能感官能够对机器人进行自我控制。
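Once a Jacobian relating actuator commands to point motions is available, a command for a desired motion can be obtained by least squares. The sketch below illustrates only that final step under simplifying assumptions (a locally linear model and exactly two actuators); it does not reproduce the paper's learned Jacobian field or controller, and `solve_command` is a hypothetical helper.

```python
def solve_command(jacobian, desired_motion):
    """Least-squares actuator command u minimizing ||J u - d||^2 for a
    two-actuator robot. `jacobian` has one row [dj/du1, dj/du2] per tracked
    coordinate; `desired_motion` holds the matching target displacements."""
    # Entries of the normal equations (J^T J) u = J^T d.
    a11 = sum(row[0] * row[0] for row in jacobian)
    a12 = sum(row[0] * row[1] for row in jacobian)
    a22 = sum(row[1] * row[1] for row in jacobian)
    b1 = sum(row[0] * d for row, d in zip(jacobian, desired_motion))
    b2 = sum(row[1] * d for row, d in zip(jacobian, desired_motion))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("Jacobian columns are (nearly) linearly dependent")
    # Closed-form solution of the 2x2 symmetric system.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

In a closed-loop setting this solve would be repeated at every step with a freshly estimated Jacobian, so errors of the linear model are corrected by feedback.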

Article 152

Title@2025-07-30 (3): Insights into resource utilization of code small language models serving with runtime engines and execution providers

Title: Insights into resource utilization of code small language models serving with runtime engines and execution providers

Einblicke in die Ressourcennutzung von Code-Small Language-Modellen, die mit Laufzeit-Engines und Ausführungsanbietern dienen

深入了解为运行时引擎和执行提供方服务的编码小型语文模式的资源利用情况 2412.15441v2

Authors (4): Francisco Durán, Matias Martinez, Patricia Lago, Silverio Martínez-Fernández

The rapid growth of language models, particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing the resource utilization of language model inference is crucial, and Small Language Models (SLMs) offer a promising solution to reduce resource demands. Our goal is to analyze the impact of deep learning serving configurations, defined as combinations of runtime engines and execution providers, on resource utilization, in terms of energy consumption, execution time, and computing-resource utilization from the point of view of software engineers conducting inference in the context of code generation SLMs. We conducted a technology-oriented, multi-stage experimental pipeline using twelve code generation SLMs to investigate energy consumption, execution time, and computing-resource utilization across the configurations. Significant differences emerged across configurations. CUDA execution provider configurations outperformed CPU execution provider configurations in both energy consumption and execution time. Among the configurations, TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings from 37.99% up to 89.16% compared to other serving configurations. Similarly, optimized runtime engines like ONNX with the CPU execution provider achieved from 8.98% up to 72.04% energy savings within CPU-based configurations. Also, TORCH paired with CUDA exhibited efficient computing-resource utilization. Serving configuration choice significantly impacts resource utilization. While further research is needed, we recommend the above configurations as those best suited to software engineers’ requirements for enhancing serving resource utilization efficiency.

语言模式的迅速增长,特别是在代码生成方面,要求大量计算资源,引起对能源消耗和环境影响的关切。优化语言模型推断资源利用的假设至关重要,而小型语言模型为减少资源需求提供了一个大有希望的解决办法。我们的目标是分析深度学习服务配置的影响,即运行时间引擎和执行提供者的组合,在能源消耗、执行时间和计算资源利用方面,在能源使用、执行时间和计算资源利用方面,从在代码生成可持续土地管理方面进行推断的软件工程师的角度来看,对资源利用的影响。我们开展了面向技术的多阶段试验管道,使用12种代码生成的可持续土地管理来调查能源消耗、执行时间和计算资源利用情况。我们利用了12种代码生成的可持续土地管理来调查能源消耗、执行时间和计算资源利用情况。CUDA执行供应商配置比CUP执行供应商的配置要大得多。TORCH与CUDA的组合相比,进一步提高了能源效率,实现了从72.99 %到89.16%的节能节约率,而其他服务配置则比其他配置。同样,将ONX的运行引擎与CNX的运行周期运行引擎与C-PUDFE的利用率进行大幅提升的节能利用。

Article 153

Title@2025-07-30 (3): FLOSS: Federated Learning with Opt-Out and Straggler Support

Title: FLOSS: Federated Learning with Opt-Out and Straggler Support

FLOSS: Föderiertes Lernen mit Opt-Out und Straggler-Unterstützung

FLOSS: 具有“Opt-Out”和“Straggler”支持的联邦学习 2507.23115v1

Authors (4): David J Goetze, Dahlia J Felten, Jeannie R Albrecht, Rohit Bhattacharya

Previous work on data privacy in federated learning systems focuses on privacy-preserving operations for data from users who have agreed to share their data for training. However, modern data privacy agreements also empower users to use the system while opting out of sharing their data as desired. When combined with stragglers that arise from heterogeneous device capabilities, the result is missing data from a variety of sources that introduces bias and degrades model performance. In this paper, we present FLOSS, a system that mitigates the impacts of such missing data on federated learning in the presence of stragglers and user opt-out, and empirically demonstrate its performance in simulations.

联邦学习系统中以往关于数据隐私的工作侧重于对同意分享其数据以供培训的用户提供的数据进行隐私保护操作,然而,现代数据隐私协议还授权用户使用该系统,同时选择不按要求分享数据,如果与不同设备能力产生的分层数据相结合,结果就会缺少各种来源的数据,从而导致偏差和降低模型性能。在本文件中,我们介绍了FLOSS这一系统,该系统可以减轻这种缺失数据对在排行器和用户选择退出的情况下进行联合学习的影响,并在模拟中从经验上展示其性能。

Article 154

Title@2025-07-30 (3): Scalable Generative Modeling of Weighted Graphs

Title: Scalable Generative Modeling of Weighted Graphs

Skalierbare Generative Modellierung von gewichteten Graphen

加权图表的可缩放生成建模 2507.23111v1

Authors (3): Richard Williams, Eric Nalisnick, Andrew Holbrook

Weighted graphs are ubiquitous throughout biology, chemistry, and the social sciences, motivating the development of generative models for abstract weighted graph data using deep neural networks. However, most current deep generative models are either designed for unweighted graphs and are not easily extended to weighted topologies or incorporate edge weights without consideration of a joint distribution with topology. Furthermore, learning a distribution over weighted graphs must account for complex nonlocal dependencies between both the edges of the graph and corresponding weights of each edge. We develop an autoregressive model BiGG-E, a nontrivial extension of the BiGG model, that learns a joint distribution over weighted graphs while still exploiting sparsity to generate a weighted graph with $n$ nodes and $m$ edges in $O((n + m)\log n)$ time. Simulation studies and experiments on a variety of benchmark datasets demonstrate that BiGG-E best captures distributions over weighted graphs while remaining scalable and computationally efficient.

在生物、化学和社会科学中,加权图表无处不在,推动利用深神经网络为抽象加权图表数据开发基因模型,然而,目前大多数深基因模型要么是为未加权图表设计的,不易扩展至加权表层,要么在不考虑与地形学联合分布的情况下纳入边缘加权数。此外,在加权图的分布中,必须考虑到图表边缘与每一边缘相应重量之间的复杂非局部依赖性。我们开发了一个自动递增模型BiGG-E,这是BiGG模型的非边际扩展,该模型学习了对加权图的联合分布,同时仍在利用宽度生成一个以美元(n + m)\log n美元计时间的加权图。对各种基准数据集的模拟研究和实验表明,BiGG-E最佳的捕获分布高于加权图,同时保持可缩放和计算效率。

Article 155

Title@2025-07-30 (3): Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs

Title: Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs

Graphenstichproben für skalierbare und expressive Graphenneurale Netzwerke auf homophilen Graphen

光益图可缩缩和伸缩图形神经网络图示样本 2410.16593v4

Authors (3): Haolin Li, Haoyu Wang, Luana Ruiz

Graph Neural Networks (GNNs) excel in many graph machine learning tasks but face challenges when scaling to large networks. GNN transferability allows training on smaller graphs and applying the model to larger ones, but existing methods often rely on random subsampling, leading to disconnected subgraphs and reduced model expressivity. We propose a novel graph sampling algorithm that leverages feature homophily to preserve graph structure. By minimizing the trace of the data correlation matrix, our method better preserves the graph Laplacian trace – a proxy for the graph connectivity – than random sampling, while achieving lower complexity than spectral methods. Experiments on citation networks show improved performance in preserving Laplacian trace and GNN transferability compared to random sampling.

神经网络图(GNN)在许多图形机器学习任务中非常出色,但在向大型网络推广时面临挑战。 GNN可转让性允许对较小的图表进行培训,并将模型应用到较大的网络中,但现有方法往往依靠随机的子抽样,导致断开子图和减少模型表达性。我们建议采用新的图表抽样算法,利用同质特征来保存图形结构。通过尽量减少数据相关矩阵的痕量,我们的方法比随机抽样更好地保存图Lapalecian追踪 – – 图形连接的替代物 – – 而不是随机抽样,同时比光谱方法复杂。对引用网络的实验显示,与随机抽样相比,在保护Laplacian追踪和GNN可转移性方面表现更好。
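The Laplacian trace used in the abstract as a connectivity proxy is easy to compute: for an unweighted, undirected graph it equals the sum of node degrees, i.e. twice the number of edges. The sketch below only evaluates that quantity on induced subgraphs so that different node samples can be compared; it does not implement the paper's feature-homophily sampling algorithm, and both function names are hypothetical.

```python
import random

def laplacian_trace(edges, nodes):
    """Trace of the Laplacian of the subgraph induced by `nodes`.
    Assumes an unweighted, undirected graph without self-loops whose edges
    are listed once each, so the trace is the degree sum, 2 * (#edges)."""
    keep = set(nodes)
    return 2 * sum(1 for u, v in edges if u in keep and v in keep)

def random_sample_trace(edges, all_nodes, k, seed=0):
    """Laplacian trace of a uniformly random k-node induced subgraph."""
    rng = random.Random(seed)
    return laplacian_trace(edges, rng.sample(sorted(all_nodes), k))
```

On a path graph 0-1-2-3, the sample {0, 1} keeps one edge while {0, 2} keeps none, which shows how uniform subsampling can return disconnected subgraphs with a low trace.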

Article 156

Title@2025-07-30 (3): Coarse Graining with Neural Operators for Simulating Chaotic Systems

Title: Coarse Graining with Neural Operators for Simulating Chaotic Systems

Grobkörnung mit neuralen Operatoren zur Simulation chaotischer Systeme

与模拟劣质系统神经操作员的粗粗谷物 2408.05177v5

Authors (7): Chuwei Wang, Julius Berner, Zongyi Li, Di Zhou, Jiayun Wang, Jane Bae, Anima Anandkumar

Accurately predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling. However, achieving such predictions typically requires iterative computations over a dense spatiotemporal grid to account for the unstable nature of chaotic systems, which is expensive and impractical in many real-world situations. An alternative approach to such a fully-resolved simulation is to use a coarse grid and then correct its errors through a \textit{closure model}, which approximates the overall information from fine scales not captured in the coarse-grid simulation. Recently, ML approaches have been used for closure modeling, but they typically require a large number of training samples from expensive fully-resolved simulations (FRS). In this work, we prove an even more fundamental limitation, i.e., the standard approach to learning closure models suffers from a large approximation error for generic problems, no matter how large the model is, and it stems from the non-uniqueness of the mapping. We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation by not using a closure model or a coarse-grid solver. We first train the PINO model on data from a coarse-grid solver and then fine-tune it with (a small amount of) FRS and physics-based losses on a fine grid. The discretization-free nature of neural operators means that they do not suffer from the restriction of a coarse grid that closure models face, and they can provably approximate the long-term statistics of chaotic systems. In our experiments, our PINO model achieves a 330x speedup compared to FRS with a relative error $\sim 10\%$. In contrast, the closure model coupled with a coarse-grid solver is $60$x slower than PINO while having a much higher error $\sim 186\%$ when the closure model is trained on the same FRS dataset.

准确预测混乱系统的长期行为对于气候模型等各种应用来说至关重要。然而,实现这样的预测通常需要在一个密集的瞬时电网上进行迭代计算,以说明混乱系统的不稳定性,在许多现实世界局势中,这种系统费用昂贵且不切实际。完全解析模拟的替代办法是使用粗略的网格,然后通过一个粗略的网格纠正其错误,而后通过一个粗略的网格来纠正其错误。它与粗略的网格模拟中未捕捉到的微小比例的总体信息相近。最近, ML 方法被用于进行闭合模型,但通常需要大量来自昂贵的完全解析模拟(FRS)的培训样本。在这项工作中,我们学习关闭模型的标准方法与一般问题的大近似差有关,不管模型有多大,而且它们源自于非基础的 Ral-ral-ral 平流模型,我们用物理学的内值的内值的内价模型(PINO) 和内值的内值的内值的内端到端学习方法。

Article 157

Title@2025-07-30 (3): RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL

Title: RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL

RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL

RASL: 大规模数据库文本到 SQL 的检索增强的相连接表表 2507.23104v1

Authors (3): Jeffrey Eben, Aitzaz Ahmad, Stephen Lau

Despite advances in large language model (LLM)-based natural language interfaces for databases, scaling to enterprise-level data catalogs remains an under-explored challenge. Prior works addressing this challenge rely on domain-specific fine-tuning - complicating deployment - and fail to leverage important semantic context contained within database metadata. To address these limitations, we introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units, each separately indexed for targeted retrieval. Our approach prioritizes effective table identification while leveraging column-level information, ensuring the total number of retrieved tables remains within a manageable context budget. Experiments demonstrate that our method maintains high recall and accuracy, with our system outperforming baselines over massive databases with varying structure and available metadata. Our solution enables practical text-to-SQL systems deployable across diverse enterprise settings without specialized fine-tuning, addressing a critical scalability gap in natural language database interfaces.

尽管在数据库的大型语言模型(LLM)的自然语言界面方面取得了进展,但推广到企业一级数据目录仍然是一个未得到充分探讨的挑战。先前应对这一挑战的工作依赖于具体领域的微调(复杂的部署),未能利用数据库元数据中所包含的重要的语义背景。为解决这些局限性,我们引入了一个基于组成部分的检索结构,将数据库的图和元数据分解成独立的语义单位,每个单元单独编制索引,用于有针对性的检索。我们的方法优先考虑有效的表格识别,同时利用列级信息,确保检索的表格总数保持在可管理的背景预算之内。实验表明,我们的方法保持高回溯率和准确性,我们的系统运行基线超过结构不同和可用元数据庞大的数据库。我们的解决办法使实用的文本到SQL系统能够在没有专门微调的情况下在不同的企业环境中部署,从而解决自然语言数据库界面中关键的可扩展差距。
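The component-based retrieval described in the abstract (split schema and metadata into separately indexed units, then return a bounded set of tables) can be sketched with a toy scorer. Everything below is an illustrative assumption rather than the RASL system: token overlap stands in for a real retriever, and `tokenize`, `build_index`, `retrieve_tables` and the schema layout are hypothetical.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().replace("_", " ").split())

def build_index(schema):
    """Split a schema into separately indexed units: one for each table
    description and one for each column, tagged with the parent table."""
    units = []
    for table, info in schema.items():
        units.append((table, tokenize(table + " " + info["description"])))
        for column in info["columns"]:
            units.append((table, tokenize(column)))
    return units

def retrieve_tables(question, units, max_tables=2):
    """Score every unit by token overlap, credit each table with its best
    matching unit, and return at most `max_tables` tables."""
    query = tokenize(question)
    best = {}
    for table, tokens in units:
        score = len(query & tokens)
        best[table] = max(best.get(table, 0), score)
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return [t for t in ranked if best[t] > 0][:max_tables]
```

The `max_tables` cap plays the role of the context budget: column-level matches can raise a table's rank, but the number of tables passed on stays bounded.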

Article 158

Title@2025-07-30 (3): On the Sustainability of AI Inferences in the Edge

Title: On the Sustainability of AI Inferences in the Edge

Zur Nachhaltigkeit von KI-Schlussfolgerungen am Rande

AI I 边缘推论的可持续性 2507.23093v1

Authors (4): Ghazal Sobhani, Md. Monzurul Amin Ifath, Tushar Sharma, Israat Haque

The proliferation of the Internet of Things (IoT) and its cutting-edge AI-enabled applications (e.g., autonomous vehicles and smart industries) combine two paradigms: data-driven systems and their deployment on the edge. Usually, edge devices perform inferences to support latency-critical applications. In addition to the performance of these resource-constrained edge devices, their energy usage is a critical factor in adopting and deploying edge applications. Examples of such devices include Raspberry Pi (RPi), Intel Neural Compute Stick (INCS), NVIDIA Jetson nano (NJn), and Google Coral USB (GCU). Despite their adoption in edge deployment for AI inferences, there is no study on their performance and energy usage for informed decision-making on the device and model selection to meet the demands of applications. This study fills the gap by rigorously characterizing the performance of traditional models, neural networks, and large language models on the above edge devices. Specifically, we analyze trade-offs among model F1 score, inference time, inference power, and memory usage. Hardware and framework optimization, along with external parameter tuning of AI models, can balance model performance and resource usage to realize practical edge AI deployments.

互联网(IOT)及其尖端的AI驱动应用程序(如自主汽车和智能行业)的扩展(如自主汽车和智能行业)结合了两个范例:数据驱动系统及其在边缘的部署。通常,边缘装置会为支持潜伏关键应用程序而进行推导。除了这些资源限制的边缘装置外,它们的能源使用是采用和部署边缘应用程序的一个关键因素。这些装置的例子包括Raspberry Pi(Ripi)、Intel Neal Comput Stact(INCS)、NVIDIA Jetson Nam(NJN)和Google Coral USB(GCU)等。尽管它们被采用为人工智能推断进行边缘部署,但是没有研究其性能和能源使用情况,用以支持对装置和模型选择的知情决策,以满足应用程序的需求。这一研究通过严格描述传统、神经网络的性能和关于上层装置的大型语言模型的性能模型,填补了空白。我们分析了模型F1的得分、推价时间、推力力和记忆使用之间的实际利用。硬盘和框架优势,可以实现AI资源部署的外部参数的性能平衡,同时调整AI的资源性能和框架优化。

Article 159

Title@2025-07-30 (3): Accenture-NVS1: A Novel View Synthesis Dataset

Title: Accenture-NVS1: A Novel View Synthesis Dataset

Accenture-NVS1: Ein neuartiger Synthesedatensatz

Accenture-NVS1:新观点合成数据集 2503.18711v2

Authors (9): Thomas Sugg, Kyle O’Brien, Lekh Poudel, Alex Dumouchelle, Michelle Jou, Marc Bosch, Deva Ramanan, Srinivasa Narasimhan, Shubham Tulsiani

This paper introduces ACC-NVS1, a specialized dataset designed for research on Novel View Synthesis specifically for airborne and ground imagery. Data for ACC-NVS1 was collected in Austin, TX and Pittsburgh, PA in 2023 and 2024. The collection encompasses six diverse real-world scenes captured from both airborne and ground cameras, resulting in a total of 148,000 images. ACC-NVS1 addresses challenges such as varying altitudes and transient objects. This dataset is intended to supplement existing datasets, providing additional resources for comprehensive research, rather than serving as a benchmark.

本文件介绍ACC-NVS1,这是专门为新观点合成研究而设计的专门数据集,专门用于空中和地面图像;Austin、TX和匹兹堡、巴权力机构2023年和2024年收集了ACC-NVS1的数据;收集包括从空中和地面摄像头拍摄的六种不同的真实世界景象,共产生148 000个图像;ACC-NVS1处理不同高度和瞬时物体等挑战;该数据集旨在补充现有的数据集,为全面研究提供额外资源,而不是作为基准。

Article 160

Title@2025-07-30 (3): Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximation

Title: Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximation

Dynamisch inspiriertes Lernen invarianter Subräume für Koopman und Transferoperator Approximation

Koopman 和传输操作员近似值的动态激励学习动态激励的变量子空间和传输操作员近似值 2505.05085v2

Authors (2): Gary Froyland, Kevin Kühl

Transfer and Koopman operator methods offer a framework for representing complex, nonlinear dynamical systems via linear transformations, enabling a deeper understanding of the underlying dynamics. The spectra of these operators provide important insights into system predictability and emergent behaviour, although efficiently estimating them from data can be challenging. We approach this issue through the lens of general operator and representational learning, in which we approximate these linear operators using efficient finite-dimensional representations. Specifically, we machine-learn orthonormal basis functions that are dynamically tailored to the system. This learned basis provides a particularly accurate approximation of the operator’s action as well as a nearly invariant finite-dimensional subspace. We illustrate our approach with examples that showcase the retrieval of spectral properties from the estimated operator, and emphasise the dynamically adaptive quality of the machine-learned basis.

转让和Koopman操作员方法为通过线性转换代表复杂、非线性动态系统提供了一个框架,使这些操作员的光谱能够更深刻地了解系统可预测性和突发行为,尽管从数据中有效地估计它们可能具有挑战性。我们通过一般操作员和代表学习的透镜来处理这个问题,我们从一般操作员和代表学习的角度来比较这些线性操作员。具体地说,我们机利恩或体外基功能是动态地适应系统的,这种学习基础为操作员的行动提供了特别准确的近似,并提供了几乎是无变的有限维次空间。我们举例说明了从估计操作员那里检索光谱特性的方法,并强调了机学基础的动态适应性质量。

Article 161

Title@2025-07-30 (3): A Foundation Model for Material Fracture Prediction

Title: A Foundation Model for Material Fracture Prediction

Ein Grundlagenmodell für die Vorhersage von Materialfrakturen

材料断裂预测基金会模型 2507.23077v1

Authors (17): Agnese Marcato, Aleksandra Pachalieva, Ryley G. Hill, Kai Gao, Xiaoyu Wang, Esteban Rougier, Zhou Lei, Vinamra Agrawal, Janel Chua, Qinjun Kang, Jeffrey D. Hyman, Abigail Hunter, Nathan DeBardeleben, Earl Lawrence, Hari Viswanathan, Daniel O’Malley, Javier E. Santos

Accurately predicting when and how materials fail is critical to designing safe, reliable structures, mechanical systems, and engineered components that operate under stress. Yet, fracture behavior remains difficult to model across the diversity of materials, geometries, and loading conditions in real-world applications. While machine learning (ML) methods show promise, most models are trained on narrow datasets, lack robustness, and struggle to generalize. Meanwhile, physics-based simulators offer high-fidelity predictions but are fragmented across specialized methods and require substantial high-performance computing resources to explore the input space. To address these limitations, we present a data-driven foundation model for fracture prediction, a transformer-based architecture that operates across simulators, a wide range of materials (including plastic-bonded explosives, steel, aluminum, shale, and tungsten), and diverse loading conditions. The model supports both structured and unstructured meshes, combining them with large language model embeddings of textual input decks specifying material properties, boundary conditions, and solver settings. This multimodal input design enables flexible adaptation across simulation scenarios without changes to the model architecture. The trained model can be fine-tuned with minimal data on diverse downstream tasks, including time-to-failure estimation, modeling fracture evolution, and adapting to combined finite-discrete element method simulations. It also generalizes to unseen materials such as titanium and concrete, requiring as few as a single sample, dramatically reducing data needs compared to standard ML. Our results show that fracture prediction can be unified under a single model architecture, offering a scalable, extensible alternative to simulator-specific workflows.

精确地预测材料在何时和如何失灵对于设计安全、可靠结构、机械系统和在压力下运作的工程构件至关重要。然而,骨折行为仍然难以在现实世界应用中建模材料的多样性、地貌和装载条件。虽然机器学习方法很有希望,但大多数模型都是在狭窄的数据集、缺乏稳健性和难以概括方面接受培训。与此同时,基于物理的模拟器提供了高纤维性预测,但在不同专门方法之间是分散的,需要大量高性能的模拟资源来探索输入空间。为了应对这些限制,我们提出了数据驱动的基础模型模型,用于骨折预测、基于变压器的建筑、在模拟器中运行的多种材料(包括塑性炸药、钢铁、铝、页岩、牙),以及不同的装载条件。该模型既支持结构又结构化又不结构化的模形模范,同时将它们与大量语言模型嵌入的流体化输入平台结合起来,以探究材料的特性、边界条件和解析器的设置。这一模型设计可以使模型能够灵活地调整模型,在模拟的模型中进行模拟,包括模拟的模化的模化到模型,将数据转换到模型,将数据转换到模型显示到模型显示的模化的模型,可以显示的模变形结构的模化的模化的模化的模型可以显示的模范。

Article 162

Title@2025-07-30 (3): Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

Title: Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

Weiterentwicklung der visionsbasierten menschlichen Handlungserkennung: Erforschen eines visionssprachlichen CLIP-Modells zur Generalisierung in domänenunabhängigen Aufgaben

推进基于愿景的人类行动认识:探索愿景-语言化 CLIP 在独立领域各任务中推广的CLIP模式 2507.18675v2

Authors (4): Utkarsh Shandilya, Marsha Mariya Kappan, Sanyam Jain, Vijeta Sharma

Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.

人类行动承认在医疗和医疗方面发挥着关键作用,支持病人行为监测、秋季检测、手术机器人监督和程序技能评估等应用。尽管CNN和RNNs等传统模式取得了中度成功,但往往难以在各种复杂行动中一概而论。视觉语言模式最近的进展,特别是基于变压器的CLIP模式,为从视频数据中推广行动承认提供了有希望的能力。在这项工作中,我们评估了关于UCF-101数据集的CLIP,并系统地分析了其绩效,这三项掩盖战略是:(1) 基于百分数和基于形状的黑遮罩,其值为10%、30%和50%,(2) 特有遮罩,以抑制偏见诱导元素,(3) 隔离掩蔽,仅保留特定类别区域。我们的结果表明,CLIP显示行为不一致和频繁的分类错误,特别是当基本的视觉提示被模糊时。为了克服这些限制,我们建议纳入特定类别噪音,通过定制损失功能学习,以加强对分类特征的注意。这一改进了分类准确性和模型信任度,同时减少偏见。我们的结论是,在临床设想中改进了整个领域的工作方向。我们的结论是,要改进了在临床设想中改进了整个领域。
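As an illustration of percentage-based black masking at the 10%, 30%, and 50% levels mentioned in the abstract, the sketch below blacks out a fixed fraction of pixels chosen uniformly at random. This is only one possible reading: the paper may mask contiguous regions or shapes instead, and `mask_percentage` with its list-of-rows image format is a hypothetical helper.

```python
import random

def mask_percentage(image, fraction, seed=0):
    """Return a copy of `image` (a list of rows of pixel values) in which
    `fraction` of the pixels, rounded to a whole pixel count and chosen
    uniformly at random, are set to 0 (black)."""
    height, width = len(image), len(image[0])
    positions = [(r, c) for r in range(height) for c in range(width)]
    num_masked = round(fraction * len(positions))
    rng = random.Random(seed)
    masked = [row[:] for row in image]  # leave the input image untouched
    for r, c in rng.sample(positions, num_masked):
        masked[r][c] = 0
    return masked
```

Running a classifier on the outputs for each fraction gives an accuracy-versus-occlusion curve, the kind of comparison the masking study relies on.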

Article 163

Title@2025-07-30 (3): Locally Differentially Private Thresholding Bandits

Title: Locally Differentially Private Thresholding Bandits

Lokal unterschiedlich private Thresholding Bandits

地方差异式私家强盗 2507.23073v1

Authors (3): Annalisa Barbara, Joseph Lazzaro, Ciara Pike-Burke

This work investigates the impact of ensuring local differential privacy in the thresholding bandit problem. We consider both the fixed budget and fixed confidence settings. We propose methods that utilize private responses, obtained through a Bernoulli-based differentially private mechanism, to identify arms with expected rewards exceeding a predefined threshold. We show that this procedure provides strong privacy guarantees and derive theoretical performance bounds on the proposed algorithms. Additionally, we present general lower bounds that characterize the additional loss incurred by any differentially private mechanism, and show that the presented algorithms match these lower bounds up to poly-logarithmic factors. Our results provide valuable insights into privacy-preserving decision-making frameworks in bandit problems.

这项工作调查了在临界强盗问题中确保地方隐私差异的影响。我们既考虑固定预算,也考虑固定的信任环境。我们提出一些方法,利用以伯努利为基础的有差别的私人机制获得的私人反应,确定预期收益超过预定门槛的武器。我们表明,这一程序提供了有力的隐私保障,并从拟议算法的理论性能界限中得出理论性能界限。此外,我们提出了一般较低的界限,这些界限是任何有差别的私人机制造成的额外损失的特征,并表明所提出的算法符合这些较低界限的多元对数因素。我们的结果为在强盗问题中保护隐私的决策框架提供了宝贵的洞见。
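The abstract refers to a Bernoulli-based differentially private mechanism without giving its form. As a rough illustration only, the sketch below assumes standard binary randomized response (which may differ from the authors' mechanism) and a naive uniform-sampling strategy rather than the proposed fixed-budget or fixed-confidence algorithms. All function names are hypothetical.

```python
import math
import random


def privatize(reward: int, epsilon: float, rng: random.Random) -> int:
    """Binary randomized response: keep the bit with prob. e^eps / (1 + e^eps)."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return reward if rng.random() < keep_prob else 1 - reward


def debiased_mean(private_bits, epsilon: float) -> float:
    """Unbiased estimate of the true Bernoulli mean from privatized bits."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(private_bits) / len(private_bits)
    # E[observed] = (2p - 1) * mu + (1 - p); invert this affine map.
    return (observed - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)


def arms_above_threshold(arm_means, threshold, epsilon, pulls, seed=0):
    """Naive baseline (an assumption): pull every arm equally, then threshold."""
    rng = random.Random(seed)
    selected = []
    for arm, mu in enumerate(arm_means):
        bits = [privatize(int(rng.random() < mu), epsilon, rng) for _ in range(pulls)]
        if debiased_mean(bits, epsilon) > threshold:
            selected.append(arm)
    return selected
```

The debiasing step shows where the privacy cost enters: dividing by `2p - 1` inflates the variance of each estimate, so smaller `epsilon` requires more pulls to separate an arm from the threshold.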

Article 164

Title@2025-07-30 (3): Affect Models Have Weak Generalizability to Atypical Speech

Title: Affect Models Have Weak Generalizability to Atypical Speech

Affect Models haben geringe Verallgemeinerbarkeit zu atypischer Sprache

效果模型对非典型演讲的可普及性较弱 2504.16283v2

Authors (5): Jaya Narain, Amrit Romana, Vikramjit Mitra, Colin Lea, Shirley Ren

Speech and voice conditions can alter the acoustic properties of speech, which could impact the performance of paralinguistic models for affect for people with atypical speech. We evaluate publicly available models for recognizing categorical and dimensional affect from speech on a dataset of atypical speech, comparing results to datasets of typical speech. We investigate three dimensions of speech atypicality: intelligibility, which is related to pronunciation; monopitch, which is related to prosody; and harshness, which is related to voice quality. We look at (1) distributional trends of categorical affect predictions within the dataset, (2) distributional comparisons of categorical affect predictions to similar datasets of typical speech, and (3) correlation strengths between text and speech predictions for spontaneous speech for valence and arousal. We find that the output of affect models is significantly impacted by the presence and degree of speech atypicalities. For instance, the percentage of speech predicted as sad is significantly higher for all types and grades of atypical speech when compared to similar typical speech datasets. In a preliminary investigation on improving robustness for atypical speech, we find that fine-tuning models on pseudo-labeled atypical speech data improves performance on atypical speech without impacting performance on typical speech. Our results emphasize the need for broader training and evaluation datasets for speech emotion models, and for modeling approaches that are robust to voice and speech differences.

语言和声音条件可以改变语言的声学特性,这可能会影响语言模式对非典型演讲者的影响。我们评估了公开的承认绝对和维度的模型,这些模型从非典型演讲的数据集上的讲话中产生绝对和维度的影响,将结果与典型演讲的数据集进行比较。我们调查了言语非典型的三个维度:与发音相关的智能;与言调质量相关的单一和严厉性。我们查看:(1) 绝对影响数据集内预测的分布趋势;(2) 绝对影响与典型演讲类似数据集的预测的分布性趋势;(3) 文本和言词预测之间的相关性强度,以换取价值和振奋人心。我们发现,影响模型的输出受到言语的出现和程度的重大影响。举例说,与类似典型的言语数据集相比,预测为悲哀的言语比例要高得多。在初步调查中,如何改善非典型演讲的稳健度和典型的言词的预测性能,我们发现,如何改进典型的言语学模型需要精确的情绪模型,而不需要对典型的言语学模型进行精准性评价。

Article 165

Title@2025-07-30 (3): Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

Title: Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

Vision-Language Fusion für autonomes Fahren in Echtzeit: Zielzentrierte Cross-Achtung von Kamera, HD-Karte und Wegpunkten

实时自主驾驶的视觉-语言融合:以目标为中心交叉使用相机、HD-地图、和途径点 2507.23064v1

Authors (3): Santosh Patapati, Trisanth Srinivasan, Murari Ambati

Autonomous cars need geometric accuracy and semantic understanding to navigate complex environments, yet most stacks handle them separately. We present XYZ-Drive, a single vision-language model that reads a front-camera frame, a 25m $\times$ 25m overhead map, and the next waypoint, then outputs steering and speed. A lightweight goal-centered cross-attention layer lets waypoint tokens highlight relevant image and map patches, supporting both action and textual explanations, before the fused tokens enter a partially fine-tuned LLaMA-3.2 11B model. On the MD-NEX Outdoor-Driving benchmark XYZ-Drive attains 95% success and 0.80 Success weighted by Path Length (SPL), surpassing PhysNav-DG by 15% and halving collisions, all while significantly improving efficiency by using only a single branch. Sixteen ablations explain the gains. Removing any modality (vision, waypoint, map) drops success by up to 11%, confirming their complementary roles and rich connections. Replacing goal-centered attention with simple concatenation cuts 3% in performance, showing query-based fusion injects map knowledge more effectively. Keeping the transformer frozen loses 5%, showing the importance of fine-tuning when applying VLMs for specific tasks such as autonomous driving. Coarsening map resolution from 10 cm to 40 cm blurs lane edges and raises crash rate. Overall, these results demonstrate that early, token-level fusion of intent and map layout enables accurate, transparent, real-time driving.

自动汽车需要几何精度和语义理解来导航复杂环境, 但大多数堆叠都单独处理。我们展示了 XYZ- Drive, 是一个单一的视觉语言模型, 读的是前摄盘框架, 25 美元 25 美元, 下一条路标, 然后是输出方向和速度。一个轻量的目标偏向交叉注意层让路标标记突出相关图像和地图补丁, 支持动作和文字解释, 在装配的标牌进入一个部分微调的 LLama- 3.2 11B 模型之前, 。在 MD- NEX 外部驱动基准 XYZ- Drive 上, 成功95% 和 0.80 成功由路径长度( SPL) 加权, 超过 Physnav- DG 15 % , 将碰撞减半。同时通过只使用一个分支大大地提高效率。 16 缩略图解释了收益。
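The abstract reports Success weighted by Path Length (SPL) without defining it. The sketch below uses the commonly cited navigation definition, averaging `success * shortest / max(taken, shortest)` over episodes. Whether the MD-NEX benchmark computes it exactly this way is an assumption, and the episode tuple layout is hypothetical.

```python
def spl(episodes):
    """Success weighted by Path Length.

    episodes: iterable of (success, shortest_path_length, taken_path_length).
    Failed episodes contribute 0; successful ones contribute the ratio of
    the shortest path to the path actually driven (capped at 1).
    """
    episodes = list(episodes)
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)
```

Under this definition an agent can have a high success rate but a much lower SPL if it reaches goals by long detours, which is why the two numbers are reported side by side.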

Article 166

Title@2025-07-30 (3): Lattice Protein Folding with Variational Annealing

Title: Lattice Protein Folding with Variational Annealing

Gitterprotein-Falten mit Variations-Analing

Lattice Protein 以变式安纳林方式折叠 2502.20632v2

Authors (3): Shoummo Ahsan Khandoker, Estelle M. Inack, Mohamed Hibat-Allah

Understanding the principles of protein folding is a cornerstone of computational biology, with implications for drug design, bioengineering, and the understanding of fundamental biological processes. Lattice protein folding models offer a simplified yet powerful framework for studying the complexities of protein folding, enabling the exploration of energetically optimal folds under constrained conditions. However, finding these optimal folds is a computationally challenging combinatorial optimization problem. In this work, we introduce a novel upper-bound training scheme that employs masking to identify the lowest-energy folds in two-dimensional Hydrophobic-Polar (HP) lattice protein folding. By leveraging Dilated Recurrent Neural Networks (RNNs) integrated with an annealing process driven by temperature-like fluctuations, our method accurately predicts optimal folds for benchmark systems of up to 60 beads. Our approach also effectively masks invalid folds from being sampled without compromising the autoregressive sampling properties of RNNs. This scheme is generalizable to three spatial dimensions and can be extended to lattice protein models with larger alphabets. Our findings emphasize the potential of advanced machine learning techniques in tackling complex protein folding problems and a broader class of constrained combinatorial optimization challenges.

理解蛋白折叠原则是计算生物学的基石,对药物设计、生物工程和基本生物过程的理解具有影响。拉蒂蛋白折叠模型为研究蛋白折叠的复杂性提供了一个简化而有力的框架,使得能够在受限条件下探索能干的最佳折叠。然而,找到这些最佳折叠是一个在计算上具有挑战性的组合优化问题。在这项工作中,我们引入了一个新型的上限培训计划,利用掩罩来识别二维水分疏松-Pollar(HP)压实蛋白折叠中的最低能量折叠。通过利用冷冻的常规神经网络(RNNS)与受温度波动驱动的反射过程相结合,我们的方法准确地预测了高达60个珠色的基准系统的最佳折叠。我们的方法还有效地遮盖了无效的折叠叠,而不会损害RNS的自动递增取样特性。这个计划可概括为三个空间层面,可以扩展到带有较大字母的拉蒂蛋白模型。我们的研究结论强调先进的机器学习技术在解决复杂蛋白质折叠问题方面的潜力。
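To make the optimization target concrete, here is a minimal sketch of the 2D HP lattice model the abstract refers to: the energy is minus the number of H-H pairs that are lattice neighbours but not consecutive in the chain, and a fold is valid only if the walk is self-avoiding. The move encoding (`U`, `D`, `L`, `R`) and the function names are illustrative assumptions, and the mask shown is a plain occupancy check rather than the authors' training scheme.

```python
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def valid_move_mask(positions):
    """For a partial fold, mark which next moves avoid occupied sites."""
    x, y = positions[-1]
    occupied = set(positions)
    return {m: (x + dx, y + dy) not in occupied for m, (dx, dy) in STEPS.items()}


def hp_energy(sequence, moves):
    """Energy of a 2D HP fold, or None if the walk revisits a site."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    positions = [(0, 0)]
    for move in moves:
        if not valid_move_mask(positions)[move]:
            return None  # invalid (self-intersecting) fold
        dx, dy = STEPS[move]
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    index_at = {p: i for i, p in enumerate(positions)}
    energy = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        # Look only right and up so each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

In an autoregressive sampler, a mask like `valid_move_mask` would typically be applied to the move probabilities before sampling each step, so that invalid folds are never generated.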

Article 167

Title@2025-07-30 (3): Prediction of Significant Creatinine Elevation in First ICU Stays with Vancomycin Use: A retrospective study through Catboost

Title: Prediction of Significant Creatinine Elevation in First ICU Stays with Vancomycin Use: A retrospective study through Catboost

Vorhersage einer signifikanten Kreatininerhöhung in der ersten Intensivstation bei Vancomycin Anwendung: Eine retrospektive Studie über Catboost

第一次伊斯兰法院联盟第一次重大生物量升高预测与Vancomycin使用保持:通过Cattopost进行的回顾性研究 2507.23043v1

Authors (9): Junyi Fan, Li Sun, Shuheng Chen, Yong Si, Minoo Ahmadi, Greg Placencia, Elham Pishgar, Kamiar Alaei, Maryam Pishgar

Background: Vancomycin, a key antibiotic for severe Gram-positive infections in ICUs, poses a high nephrotoxicity risk. Early prediction of kidney injury in critically ill patients is challenging. This study aimed to develop a machine learning model to predict vancomycin-related creatinine elevation using routine ICU data. Methods: We analyzed 10,288 ICU patients (aged 18-80) from the MIMIC-IV database who received vancomycin. Kidney injury was defined by KDIGO criteria (creatinine rise >=0.3 mg/dL within 48h or >=50% within 7d). Features were selected via SelectKBest (top 30) and Random Forest ranking (final 15). Six algorithms were tested with 5-fold cross-validation. Interpretability was evaluated using SHAP, Accumulated Local Effects (ALE), and Bayesian posterior sampling. Results: Of 10,288 patients, 2,903 (28.2%) developed creatinine elevation. CatBoost performed best (AUROC 0.818 [95% CI: 0.801-0.834], sensitivity 0.800, specificity 0.681, negative predictive value 0.900). Key predictors were phosphate, total bilirubin, magnesium, Charlson index, and APSIII. SHAP confirmed phosphate as a major risk factor. ALE showed dose-response patterns. Bayesian analysis estimated mean risk 60.5% (95% credible interval: 16.8-89.4%) in high-risk cases. Conclusions: This machine learning model predicts vancomycin-associated creatinine elevation from routine ICU data with strong accuracy and interpretability, enabling early risk detection and supporting timely interventions in critical care.

本研究旨在开发一个机器学习模型,用常规的 ICU 数据来预测vancomycin 与胆碱酯酶相关的胆碱酯酶升高。方法:我们从MIMIC-IV 数据库中分析了10 288名接受vancomycin的ICU病人(18-80岁),根据KDIGO标准(48小时内肾脏酸碱增加0.3毫克/dL,或7天内50%内肾脏伤害的早期预测具有挑战性。通过SelectKBest(顶部30)和随机森林排名(最后15)选择了功能特征。六种算法进行了5倍交叉校验。我们用SHAP、累积地方效应(ALE)和Bayesian远地点取样结果进行了评估。
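The outcome label in the abstract (creatinine rise >=0.3 mg/dL within 48h or >=50% within 7d) can be sketched as a simple rule over timestamped measurements. The abstract does not say how the reference value is chosen, so this sketch assumes that any earlier measurement inside the window may serve as the baseline. The function name and input layout are hypothetical.

```python
from typing import List, Tuple


def creatinine_elevation(measurements: List[Tuple[float, float]]) -> bool:
    """Return True if the series meets the stated creatinine criterion.

    measurements: (hours_since_admission, creatinine_mg_per_dl) pairs.
    Assumption: any earlier value within the window counts as the baseline.
    """
    ordered = sorted(measurements)
    for i, (t0, c0) in enumerate(ordered):
        for t1, c1 in ordered[i + 1:]:
            elapsed = t1 - t0
            # Absolute rise of at least 0.3 mg/dL within 48 hours.
            if elapsed <= 48 and c1 - c0 >= 0.3:
                return True
            # Relative rise of at least 50% within 7 days (168 hours).
            if elapsed <= 168 and c1 >= 1.5 * c0:
                return True
    return False
```

A stricter baseline choice (for example, the admission value only) would label fewer patients as positive, so the reported 28.2% event rate depends on this detail.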

Article 168

Title@2025-07-30 (3): Early Goal-Guided Multi-Scale Fusion for Real-Time Vision-Language Driving

Title: Early Goal-Guided Multi-Scale Fusion for Real-Time Vision-Language Driving

Frühe zielgeführte Multi-Scale Fusion für Echtzeit-Vision-Sprachenfahren

实时愿景-语言定位驱动目标引导的早期多阶段融合 2507.23042v1

Authors (2): Santosh Patapati, Trisanth Srinivasan

Autonomous vehicles must react in milliseconds while reasoning about road geometry and traffic intent to navigate complex situations. We introduce NovaDrive, a single-branch vision-language architecture that processes front-camera images, HD-map tiles, LiDAR depth, and textual waypoints in a single branch. A lightweight, two-stage cross-attention block first aligns waypoint tokens with the HD map, then refines attention over fine-grained image and depth patches. Coupled with a novel smoothness loss that discourages abrupt steering and speed changes, this design eliminates the need for recurrent memory. We fine-tune the top 15 layers of an 11B LLaMA-3.2 vision-language backbone, enabling real-time inference. On the nuScenes / Waymo subset of the MD-NEX Outdoor benchmark, NovaDrive raises success rate to 84% (+4%), boosts path-efficiency (SPL) to 0.66 (+0.11), and reduces collision frequency from 2.6% to 1.2% (-1.4%) relative to the previous state-of-the-art. Our ablations confirm that waypoint tokens, partial VLM fine-tuning, and the cross-attention fusion each contribute the most to these gains. Beyond safety, NovaDrive’s shorter routes (resulting from the novel smoothness loss) translate to lower fuel or battery usage, pointing toward leaner, more easily updated driving stacks. NovaDrive can be extended to other embodied-AI domains as well.

自动车辆必须在光量、两阶段交叉关注区块首先将路标与 HD 地图统一起来, 然后将注意力放在细微的图像和深度补丁上。加上新的平稳性损失, 阻止突然方向和速度变化, 这个设计消除了对经常性记忆的需求。我们微调了11B Lama- 3.2 愿景语言主干线的顶端15层, 使得实时推断。在MD- EXT 外部基准的 Nusces / Waymo 子集上, NovaDrive将成功率提高到84% (+4 % ) , 提高路径效率( SPL) 到 0.66 (+0. 11 ) , 并且将碰撞频率从2. 6% (- 1.4%) 降低到经常性记忆。相对于 11 BLLamaMA-3.2 愿景语言主干线的顶端15层, 使得实时推断。 NVD 将成功率提高到84% (+4 % ) , 提高路径效率(SPL) 至 0.66 (+0.11) , 降低碰撞频率频率频率频率从2.4) 降低到 1.4 将成本更新到 1.4 更新到更新到。
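The abstract mentions a smoothness loss that discourages abrupt steering and speed changes but does not give its form. One simple possibility, shown purely as an assumption, penalizes squared differences between consecutive control outputs. The weights and the function name are hypothetical.

```python
def smoothness_loss(steering, speed, w_steer=1.0, w_speed=1.0):
    """Mean squared change between consecutive steering and speed commands.

    This is an illustrative form only; the paper's actual loss may differ.
    """
    def squared_diffs(seq):
        return sum((b - a) ** 2 for a, b in zip(seq, seq[1:]))

    transitions = max(len(steering) - 1, 1)
    return (w_steer * squared_diffs(steering) + w_speed * squared_diffs(speed)) / transitions
```

A term of this kind is zero for constant controls and grows quadratically with jerkiness, so it would typically be added to the main driving loss with a small coefficient.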

Article 169

Title@2025-07-30 (3): Two-dimensional Parallel Tempering for Constrained Optimization

Title: Two-dimensional Parallel Tempering for Constrained Optimization

Zweidimensionales paralleles Temperieren für eingeschränkte Optimierung

限制优化的二维平行热量 2506.14781v2

Authors (4): Corentin Delacour, M Mahmudul Hasan Sajeeb, Joao P. Hespanha, Kerem Y. Camsari

Sampling Boltzmann probability distributions plays a key role in machine learning and optimization, motivating the design of hardware accelerators such as Ising machines. While the Ising model can in principle encode arbitrary optimization problems, practical implementations are often hindered by soft constraints that either slow down mixing when too strong, or fail to enforce feasibility when too weak. We introduce a two-dimensional extension of the powerful parallel tempering algorithm (PT) that addresses this challenge by adding a second dimension of replicas interpolating the penalty strengths. This scheme ensures constraint satisfaction in the final replicas, analogous to low-energy states at low temperature. The resulting two-dimensional parallel tempering algorithm (2D-PT) improves mixing in heavily constrained replicas and eliminates the need to explicitly tune the penalty strength. In a representative example of graph sparsification with copy constraints, 2D-PT achieves near-ideal mixing, with Kullback-Leibler divergence decaying as O(1/t). When applied to sparsified Wishart instances, 2D-PT yields orders of magnitude speedup over conventional PT with the same number of replicas. The method applies broadly to constrained Ising problems and can be deployed on existing Ising machines.

Boltzmann 取样概率分布在机器学习和优化中起着关键作用,激励设计像 Ising 机器这样的硬件加速器。虽然Ising 模型原则上可以将任意优化问题编码起来, 但实际执行往往受到软性限制的阻碍,这些软性限制要么在太强时放慢混合速度,要么在太弱时不能执行可行性。我们引入了强大的平行相平行调和算法(PT)的二维扩展,通过增加复制的第二维维度来应对这一挑战,将惩罚力相互调和。这个方法确保最终复制品的满意度,类似于低温低能状态。由此产生的二维平行调和调和算法( 2D-PT) 改进了在严重受限制的复制品中的混合,并消除了明确调和刑罚强度的需要。在具有代表性的复制限制的图形中, 2D- PT 实现了近理想的混合, 其 Kullback- Leeper 差异与 O (1/t) 相衰减。在施压Winart 实例时, 2D-PT 产生比常规PT 速度加快速度的定, 与现有机器被广泛限制。
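The two replica dimensions described above can be illustrated with Metropolis exchange probabilities. The first function is the standard parallel tempering swap rule. The second assumes an energy of the form `E(x) + lam * P(x)`, where `P` is the constraint penalty, and applies the same exchange rule along the penalty axis. That form is an assumption made for illustration, not a statement of the authors' exact scheme.

```python
import math


def _accept(exponent: float) -> float:
    """min(1, exp(exponent)) without overflowing for large exponents."""
    return 1.0 if exponent >= 0 else math.exp(exponent)


def temperature_swap_prob(beta_i, beta_j, energy_i, energy_j):
    """Standard PT rule for swapping states between inverse temperatures."""
    return _accept((beta_i - beta_j) * (energy_i - energy_j))


def penalty_swap_prob(beta, lam_a, lam_b, penalty_a, penalty_b):
    """Swap between penalty strengths at a fixed beta (assumed energy E + lam * P).

    The objective term E cancels in the ratio, leaving only the penalty values.
    """
    return _accept(beta * (lam_a - lam_b) * (penalty_a - penalty_b))
```

With this rule, a feasible state (penalty 0) held by a weakly penalized replica is always accepted by a more strongly penalized neighbour, which is one way feasible configurations could flow toward the final replicas.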

Article 170

Title@2025-07-30 (3): Linking Actor Behavior to Process Performance Over Time

Title: Linking Actor Behavior to Process Performance Over Time

Verknüpfen des Verhaltens von Schauspielern mit der Prozessleistung im Laufe der Zeit

将动作器行为与时间过程性能链接 2507.23037v1

Authors (6): Aurélie Leribaux, Rafael Oyamada, Johannes De Smedt, Zahra Dasht Bozorgi, Artem Polyvyanyy, Jochen De Weerdt

Understanding how actor behavior influences process outcomes is a critical aspect of process mining. Traditional approaches often use aggregate and static process data, overlooking the temporal and causal dynamics that arise from individual actor behavior. This limits the ability to accurately capture the complexity of real-world processes, where individual actor behavior and interactions between actors significantly shape performance. In this work, we address this gap by integrating actor behavior analysis with Granger causality to identify correlating links in time series data. We apply this approach to real-world event logs, constructing time series for actor interactions, i.e., continuation, interruption, and handovers, and process outcomes. Using Group Lasso for lag selection, we identify a small but consistently influential set of lags that capture the majority of causal influence, revealing that actor behavior has direct and measurable impacts on process performance, particularly throughput time. These findings demonstrate the potential of actor-centric, time series-based methods for uncovering the temporal dependencies that drive process outcomes, offering a more nuanced understanding of how individual behaviors impact overall process efficiency.

了解行为体行为如何影响流程结果是开采过程的一个关键方面。传统方法经常使用综合和静态流程数据,忽略个体行为体行为产生的时间和因果动态。这限制了准确捕捉真实世界进程复杂性的能力,因为个体行为体行为和行为体之间的互动极大地影响着绩效。在这项工作中,我们通过整合行为体行为分析与Granger因果关系来消除这一差距,以确定时间序列数据中的关联性。我们对真实世界事件日志采用这一方法,为行为体互动构建时间序列,即持续、中断和移交,以及流程结果。使用Group Lasso来进行滞后选择,我们找出了一组小但始终具有影响力的滞后点,以捕捉大多数因果影响,表明行为体行为对流程绩效,特别是吞吐时间有直接和可衡量的影响。这些发现显示了以行为体为中心的时间序列方法在发现驱动流程结果的时间依赖性方面的潜力,更细致地理解个体行为如何影响整个流程效率。

Article 171

Title@2025-07-30 (3): KLLM: Fast LLM Inference with K-Means Quantization

Title: KLLM: Fast LLM Inference with K-Means Quantization

KLLM: Schnelle LLM-Inferenz mit K-Means-Quantisierung

KLLM: 快速LLM 与 K- Means 量化的推论 2507.23035v1

Authors (7): Xueying Wu, Baijun Zhou, Zhihui Gao, Yuzhe Fu, Qilin Zheng, Yintao He, Hai Li

Large language model (LLM) inference poses significant challenges due to its intensive memory and computation demands. Weight and activation quantization (WAQ) offers a promising solution by reducing both memory footprint and arithmetic complexity. However, two key challenges remain in the existing WAQ designs. (1) Traditional WAQ designs rely on uniform integer-based quantization for hardware efficiency, but this often results in significant accuracy degradation at low precision. K-Means-based quantization, a non-uniform quantization technique, achieves higher accuracy by matching the Gaussian-like distributions of weights and activations in LLMs. However, its non-uniform nature prevents direct execution on low-precision compute units, requiring dequantization and floating-point matrix multiplications (MatMuls) during inference. (2) Activation outliers further hinder effective low-precision WAQ. Offline thresholding methods for outlier detection can lead to significant model performance degradation, while existing online detection techniques introduce substantial runtime overhead. To address the aforementioned challenges and fully unleash the potential of WAQ with K-Means quantization for LLM inference, in this paper, we propose KLLM, a hardware-software co-design framework. KLLM features an index-based computation scheme for efficient execution of MatMuls and nonlinear operations on K-Means-quantized data, which avoids most of the dequantization and full-precision computations. Moreover, KLLM incorporates a novel outlier detection engine, Orizuru, that efficiently identifies the top-$k$ largest and smallest elements in the activation data stream during online inference. Extensive experiments show that, on average, KLLM achieves speedups of 9.67x, 7.03x and energy efficiency improvements of 229.50x, 150.21x compared to the A100 GPU and Atom, respectively.

大型语言模型(LLM)的推论因其密集的内存和计算需求而构成重大挑战。重力和激活四分制(WAQ)通过减少记忆足迹和算术复杂性提供了一个大有希望的解决方案。然而,现有的WAQ设计中仍存在两个关键挑战。 (1) 传统的WAQ设计依靠统一的整数量化来提高硬件效率,但通常导致低精度的精确度大幅下降。 K-Means基四分制(一种非单式四分制技术),通过匹配像高斯一样的重量分布和LLMM的激活(WAQQ)。然而,它的不统一性质使得无法直接执行低精确度的计算单位。 (1) 传统的WAQQQQQQ 设计依靠统一的整分级整分级量化,但是这往往导致低精度有效低精度的WAQQ。 K-MLUIQ的离值临界分级分级测试方法可以导致显著的模型性能退化,而现有的在线检测技术可以大量运行。为了应对上述挑战,并且完全释放了K-MA公司最低的操作,K-MLULLLLL的中的一项不精度操作,在K-MA的中可以显示一个不精度框架。
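As a rough illustration of non-uniform K-Means quantization, the sketch below clusters a 1D list of values with Lloyd's algorithm and stores each value as an index into a small codebook. It also shows one way a dot product can be computed from indices without dequantizing every weight. This is a generic lookup-style trick assumed for illustration, not KLLM's actual index-based scheme, and all names are hypothetical.

```python
def kmeans_quantize(values, k, iters=20):
    """1D Lloyd's algorithm; returns (centroids, index of nearest centroid per value)."""
    lo, hi = min(values), max(values)
    # Start with centroids spread evenly over the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def assign():
        return [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]

    for _ in range(iters):
        indices = assign()
        for c in range(k):
            members = [v for v, idx in zip(values, indices) if idx == c]
            if members:  # leave empty clusters where they are
                centroids[c] = sum(members) / len(members)
    return centroids, assign()


def dequantize(centroids, indices):
    """Recover approximate values by codebook lookup."""
    return [centroids[i] for i in indices]


def indexed_dot(centroids, indices, activations):
    """Dot product of quantized weights with activations via per-index buckets."""
    buckets = [0.0] * len(centroids)
    for idx, x in zip(indices, activations):
        buckets[idx] += x  # accumulate activations sharing a centroid
    return sum(c * b for c, b in zip(centroids, buckets))
```

With `k` centroids, `indexed_dot` needs only `k` multiplications per dot product regardless of vector length, which hints at why index-based execution can avoid most full-precision arithmetic.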

Article 172

Title@2025-07-30 (3): Recursive Learning-Based Virtual Buffering for Analytical Global Placement

Title: Recursive Learning-Based Virtual Buffering for Analytical Global Placement

Rekursives Lernen-basiertes virtuelles Puffern für analytische globale Platzierung

分析全球职位安排的基于学习的累累虚拟缓冲 2506.17247v2

Authors (3): Andrew B. Kahng, Yiting Liu, Zhiang Wang

Due to the skewed scaling of interconnect versus cell delay in modern technology nodes, placement with buffer porosity (i.e., cell density) awareness is essential for timing closure in physical synthesis flows. However, existing approaches face two key challenges: (i) traditional van Ginneken-Lillis-style buffering approaches are computationally expensive during global placement; and (ii) machine learning-based approaches, such as BufFormer, lack a thorough consideration of Electrical Rule Check (ERC) violations and fail to “close the loop” back into the physical design flow. In this work, we propose MLBuf-RePlAce, the first open-source learning-driven virtual buffering-aware analytical global placement framework, built on top of the OpenROAD infrastructure. MLBuf-RePlAce adopts an efficient recursive learning-based generative buffering approach to predict buffer types and locations, addressing ERC violations during global placement. We compare MLBuf-RePlAce against the default virtual buffering-based timing-driven global placer in OpenROAD, using open-source testcases from the TILOS MacroPlacement and OpenROAD-flow-scripts repositories. Without degradation of post-route power, MLBuf-RePlAce achieves (maximum, average) improvements of (56%, 31%) in total negative slack (TNS) within the open-source OpenROAD flow. When evaluated by completion in a commercial flow, MLBuf-RePlAce achieves (maximum, average) improvements of (53%, 28%) in TNS with an average of 0.2% improvement in post-route power.

由于现代技术节点互连和细胞延迟的偏斜缩放,在物理合成流程中,使用缓冲孔隙(即细胞密度)意识对于时间关闭至关重要。然而,现有方法面临两大挑战:(一) 传统的van Ginneken-Lillis式缓冲方法在全球定位期间计算成本高昂;(二) 机械学习方法,如BufFormer, 缺乏对电气规则检查(ERC)违反情况的彻底考虑,未能将“循环关闭”回到物理设计流中。在此工作中,我们提议在 OpenROAD 基础设施顶端建立第一个开放源学习驱动的虚拟缓冲意识分析全球定位框架。 MLBuf-RePlAce 预测缓冲类型和地点,处理全球定位中 ERC 违反情况。

Article 173

Title@2025-07-30 (3): Data Readiness for Scientific AI at Scale

Title: Data Readiness for Scientific AI at Scale

Datenbereitstellung für wissenschaftliche KI im Maßstab

规模化科学AI 数据准备程度 2507.23018v1

Authors (7): Wesley Brewer, Patrick Widener, Valentine Anantharaj, Feiyi Wang, Tom Beck, Arjun Shankar, Sarp Oral

This paper examines how Data Readiness for AI (DRAI) principles apply to leadership-scale scientific datasets used to train foundation models. We analyze archetypal workflows across four representative domains - climate, nuclear fusion, bio/health, and materials - to identify common preprocessing patterns and domain-specific constraints. We introduce a two-dimensional readiness framework composed of Data Readiness Levels (raw to AI-ready) and Data Processing Stages (ingest to shard), both tailored to high performance computing (HPC) environments. This framework outlines key challenges in transforming scientific data for scalable AI training, emphasizing transformer-based generative models. Together, these dimensions form a conceptual maturity matrix that characterizes scientific data readiness and guides infrastructure development toward standardized, cross-domain support for scalable and reproducible AI for science.

本文探讨了AI(DRAI)原则的数据准备程度如何适用于用于培训基础模型的领导层规模科学数据集。我们分析了四个有代表性的领域――气候、核聚变、生物/健康和材料――的老式工作流程,以确定共同的预处理模式和特定领域的制约因素。我们引入了由数据准备程度水平(拖动至AI-准备就绪)和数据处理阶段(取之于硬)组成的两维准备状态框架,这两个阶段都是针对高性能计算环境的。本框架概述了在将科学数据转换为可升级的AI(HPC)培训、强调以变压器为基础的基因化模型方面所面临的主要挑战。这些层面共同形成了一个概念成熟性矩阵,以科学数据准备状态为特征,并指导基础设施的发展走向标准化、跨位支持可缩放和可复制的AI用于科学。

Article 174

Title@2025-07-30 (3): Deciphering interventional dynamical causality from non-intervention complex systems

Title: Deciphering interventional dynamical causality from non-intervention complex systems

Entschlüsselung interventioneller dynamischer Kausalität durch nichtinterventionsfähige komplexe Systeme

消除不干预复杂系统造成的干预性动态因果关系 2407.01621v2

Authors (8): Jifan Shi, Yang Li, Juan Zhao, Siyang Leng, Rui Bao, Kazuyuki Aihara, Luonan Chen, Wei Lin

Detecting and quantifying causality is a focal topic in the fields of science, engineering, and interdisciplinary studies. Causal studies on non-intervention systems attract much attention but remain extremely challenging. The delay-embedding technique provides a promising approach. In this study, we propose a framework named Interventional Dynamical Causality (IntDC) in contrast to the traditional Constructive Dynamical Causality (ConDC). ConDC, including Granger causality, transfer entropy and convergence of cross-mapping, measures the causality by constructing a dynamical model without considering interventions. A computational criterion, Interventional Embedding Entropy (IEE), is proposed to measure causal strengths in an interventional manner. IEE is an intervened causal information flow but in the delay-embedding space. Further, the IEE theoretically and numerically enables the deciphering of IntDC solely from observational (non-interventional) time-series data, without requiring any knowledge of dynamical models or real interventions in the considered system. In particular, IEE can be applied to rank causal effects according to their importance and construct causal networks from data. We conducted numerical experiments to demonstrate that IEE can find causal edges accurately, eliminate effects of confounding, and quantify causal strength robustly over traditional indices. We also applied IEE to real-world tasks. IEE performed as an accurate and robust tool for causal analyses solely from the observational data. The IntDC framework and IEE algorithm provide an efficient approach to the study of causality from time series in diverse non-intervention complex systems.

检测和量化因果关系是科学、工程和跨学科研究领域的一个核心主题。然而,关于不干预系统的因果关系研究吸引了许多关注,但仍然极具挑战性。延迟加入技术提供了一种有希望的方法。在本研究中,我们提出了一个称为“干预动态因果关系(IntDC)”的框架,与传统的动态动态因果关系(ConDC)形成对照。ConDC,包括“因果性、传导性和交叉绘图的趋同性,通过不考虑干预而构建动态模型衡量因果关系。一个计算标准,即“干预嵌入 Entropy (IEE) ,建议以干预方式衡量因果关系。IEE是一种干扰性信息流动,但是在延迟累积空间。此外,IEEE在理论上和数字上都能够将Intreacity(不干预性)时间序列数据解密,而不需要对动态模型或对所考虑的深度干预性干预性框架的任何了解。IEEEEEE, 特别是可以将非因果性影响排序到它们的重要性,并且从真实的因果关系分析中,我们也可以将I的因果性数据流化分析从真实性分析到精确地进行。
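The delay-embedding space mentioned in the abstract can be illustrated with a minimal sketch: a scalar time series is mapped to vectors of lagged values, which is the usual first step before any embedding-based causality measure is computed. The IEE criterion itself is not reproduced here. The function name and the parameters `dim` and `tau` are illustrative assumptions.

```python
def delay_embed(series, dim, tau):
    """Map a scalar series to delay vectors (x[t], x[t - tau], ..., x[t - (dim-1)*tau])."""
    start = (dim - 1) * tau  # first index with a full history available
    return [
        tuple(series[t - j * tau] for j in range(dim))
        for t in range(start, len(series))
    ]
```

For example, with `dim=3` and `tau=1` each point carries its two previous values, so a series of length `n` yields `n - 2` embedded points.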

Article 175

Title@2025-07-30 (3): A Smoothing Newton Method for Rank-one Matrix Recovery

Title: A Smoothing Newton Method for Rank-one Matrix Recovery

Eine glättende Newton-Methode für die Rank-One-Matrix-Wiederherstellung

为一等一矩阵恢复采用平滑的牛顿方法 2507.23017v1

Authors (2): Tyler Maunu, Gabriel Abreu

We consider the phase retrieval problem, which involves recovering a rank-one positive semidefinite matrix from rank-one measurements. A recently proposed algorithm based on Bures-Wasserstein gradient descent (BWGD) exhibits superlinear convergence, but it is unstable, and existing theory can only prove local linear convergence for higher rank matrix recovery. We resolve this gap by revealing that BWGD implements Newton’s method with a nonsmooth and nonconvex objective. We develop a smoothing framework that regularizes the objective, enabling a stable method with rigorous superlinear convergence guarantees. Experiments on synthetic data demonstrate this superior stability while maintaining fast convergence.

我们考虑的是阶段回收问题,这涉及到从一级测量中恢复一个一级正半无限期矩阵。最近提出的基于布里斯-沃萨尔斯坦梯度下降(BWGD)的算法(BWGD)显示出超线性趋同,但不稳定,现有的理论只能证明高级矩阵回收的局部线性趋同。我们通过披露BWGD采用牛顿方法时的非悬浮和无线性目标来消除这一差距。我们制定了一个平稳框架,规范目标,使具有严格的超线性趋同保证的稳定方法得以实现。对合成数据的实验表明这种高度稳定,同时保持快速趋同。
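The link between phase retrieval and rank-one matrix recovery stated in the abstract can be written out in one line. The notation below is an assumption, since the abstract introduces no symbols: $x$ is the unknown signal, $a_i$ are known sensing vectors, and the real-valued case is taken for simplicity.

```latex
y_i \;=\; (a_i^\top x)^2 \;=\; \langle a_i a_i^\top,\; x x^\top \rangle,
\qquad i = 1, \dots, m .
```

Each measurement is therefore linear in the rank-one positive semidefinite matrix $X = x x^\top$ and is taken against the rank-one matrix $a_i a_i^\top$, which is the sense in which a rank-one matrix is recovered from rank-one measurements.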

Article 176

Title@2025-07-30 (3): Hypergraph Neural Sheaf Diffusion: A Symmetric Simplicial Set Framework for Higher-Order Learning

Title: Hypergraph Neural Sheaf Diffusion: A Symmetric Simplicial Set Framework for Higher-Order Learning

Hypergraph Neural Sheaf Diffusion: Ein symmetrischer Simplicial-Set-Rahmen für höhere Anforderungen an das Lernen

超时光谱神经纤维扩散:高阶学习的对称简易设置框架 2505.05702v3

Authors (3): Seongjin Choi, Gahee Kim, Yong-Geun Oh

The absence of intrinsic adjacency relations and orientation systems in hypergraphs creates fundamental challenges for constructing sheaf Laplacians of arbitrary degrees. We resolve these limitations through symmetric simplicial sets derived directly from hypergraphs, called symmetric simplicial lifting, which encode all possible oriented subrelations within each hyperedge as ordered tuples. This construction canonically defines adjacency via facet maps while inherently preserving hyperedge provenance. We establish that the normalized degree zero sheaf Laplacian on our symmetric simplicial lifting reduces exactly to the traditional graph normalized sheaf Laplacian when restricted to graphs, validating its mathematical consistency with prior graph-based sheaf theory. Furthermore, the induced structure preserves all structural information from the original hypergraph, ensuring that every multi-way relational detail is faithfully retained. Leveraging this framework, we introduce Hypergraph Neural Sheaf Diffusion (HNSD), the first principled extension of neural sheaf diffusion to hypergraphs. HNSD operates via normalized degree zero sheaf Laplacian over symmetric simplicial lifting, resolving orientation ambiguity and adjacency sparsity inherent to hypergraph learning. Experimental evaluations demonstrate HNSD's competitive performance across established benchmarks.

在高压中,缺乏内在的相邻关系和定向系统,给建造任意度的沙夫拉皮拉仪造成了根本性挑战。我们通过直接来自高压的对称平模化装置来解决这些限制,这些装置被称为对称平模提升,将每个高格中所有可能的面向的次关系编码为订购的图质。这一建筑通过面图,对相邻关系作了可喜定义,同时自然保存了高端来源。

Article 177

Title@2025-07-30 (3): Learning to Prune Branches in Modern Tree-Fruit Orchards

Title: Learning to Prune Branches in Modern Tree-Fruit Orchards

Lernen, Zweige in modernen Baumobstplantagen zu beschneiden

学习现代树枝果园的普鲁纳分支 2507.23015v1

Authors (3): Abhinav Jain, Cindy Grimm, Stefan Lee

Dormant tree pruning is labor-intensive but essential to maintaining modern highly-productive fruit orchards. In this work we present a closed-loop visuomotor controller for robotic pruning. The controller guides the cutter through a cluttered tree environment to reach a specified cut point and ensures the cutters are perpendicular to the branch. We train the controller using a novel orchard simulation that captures the geometric distribution of branches in a target apple orchard configuration. Unlike traditional methods requiring full 3D reconstruction, our controller uses just optical flow images from a wrist-mounted camera. We deploy our learned policy in simulation and the real-world for an example V-Trellis envy tree with zero-shot transfer, achieving a 30% success rate – approximately half the performance of an oracle planner.

Dormant 树的切割是劳动密集型的,但对于维持现代高产果园至关重要。在这项工作中,我们展示了机器人切割的闭环比目机控制器。控制器引导切割机穿过一个杂乱的树环境, 以达到特定的切断点, 并确保切削机与树枝的切削机是垂直的。我们用一种新颖的果园模拟方法对控制器进行培训, 以捕捉目标苹果果园配置中树枝的几何分布。与需要完全重建的3D传统方法不同, 我们的控制器使用手持相机的光学流图像。我们在模拟中运用我们所学的政策, 并用真实世界来模拟V- Trellis 嫉妒树, 零发转移, 达到30%的成功率 – 大约是甲板板的性能的一半。

Article 178

Title@2025-07-30 (3): Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods

Title: Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods

Untersuchung der Wechselbarkeit multimodaler Latentenräume: Einschränkungen von Optimierungsmethoden

调查多式联运低温空间的不可视性:以优化为基础的方法的局限性 2507.23010v1

Authors (1): Siwoo Park

This paper investigates the inverse capabilities and broader utility of multimodal latent spaces within task-specific AI (Artificial Intelligence) models. While these models excel at their designed forward tasks (e.g., text-to-image generation, audio-to-text transcription), their potential for inverse mappings remains largely unexplored. We propose an optimization-based framework to infer input characteristics from desired outputs, applying it bidirectionally across Text-Image (BLIP, Flux.1-dev) and Text-Audio (Whisper-Large-V3, Chatterbox-TTS) modalities. Our central hypothesis posits that while optimization can guide models towards inverse tasks, their multimodal latent spaces will not consistently support semantically meaningful and perceptually coherent inverse mappings. Experimental results consistently validate this hypothesis. We demonstrate that while optimization can force models to produce outputs that align textually with targets (e.g., a text-to-image model generating an image that an image captioning model describes correctly, or an ASR model transcribing optimized audio accurately), the perceptual quality of these inversions is chaotic and incoherent. Furthermore, when attempting to infer the original semantic input from generative models, the reconstructed latent space embeddings frequently lack semantic interpretability, aligning with nonsensical vocabulary tokens. These findings highlight a critical limitation: multimodal latent spaces, primarily optimized for specific forward tasks, do not inherently possess the structure required for robust and interpretable inverse mappings. Our work underscores the need for further research into developing truly semantically rich and invertible multimodal latent spaces.

本文考察了特定任务AI(人工智能)模式中多式联运潜在空间的反向能力和更广泛用途。虽然这些模型在设计前期任务(如文字到图像生成、音到文字转录)方面表现优异,但其反向映射潜力在很大程度上仍没有探索。我们提议一个优化框架,从理想产出中推断输入特性,在文本图像(BLIP、Flus.1-dev)和文本-Audio(Hisper-Large-V3、Chatterbox-TTS)模式中双向应用它。虽然这些模型在设计前期任务(如文字到图像生成、音频到文字转换)方面表现优异。我们的中央假设假设认为,虽然优化可以引导模型走向反向反向任务,但其多式联运潜在空间映射空间映射场将无法始终支持具有实际意义和表面一致性。我们的优化模型可以迫使模型产生与目标一致的输出结果(e.g.,文本到图像到智能模型将要求图像描述为正确描述模型,或者ASR模型不精确地将精确地预言的预言的预言,这些预言的预言中的预言质量质量和预判。

Article 179

Title@2025-07-30 (3): Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

Title: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

Stoppen Sie die Bewertung von KI mit menschlichen Tests, entwickeln Sie Prinzipien, KI-spezifische Tests statt

停止用人类测试来评价AI, 制定原则性、 AI 特定测试 2507.23009v1

Authors (5): Tom Sühr, Florian E. Dorner, Olawale Salaudeen, Augustin Kelava, Samira Samadi

Large Language Models (LLMs) have achieved remarkable results on a range of standardized tests originally designed to assess human cognitive and psychological traits, such as intelligence and personality. While these results are often interpreted as strong evidence of human-like characteristics in LLMs, this paper argues that such interpretations constitute an ontological error. Human psychological and educational tests are theory-driven measurement instruments, calibrated to a specific human population. Applying these tests to non-human subjects without empirical validation risks mischaracterizing what is being measured. Furthermore, a growing trend frames AI performance on benchmarks as measurements of traits such as "intelligence", despite known issues with validity, data contamination, cultural bias and sensitivity to superficial prompt changes. We argue that interpreting benchmark performance as measurements of human-like traits lacks sufficient theoretical and empirical justification. This leads to our position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead. We call for the development of principled, AI-specific evaluation frameworks tailored to AI systems. Such frameworks might build on existing frameworks for constructing and validating psychometric tests, or could be created entirely from scratch to fit the unique context of AI.

大型语言模型(LLMS)在一系列标准化测试中取得了显著成果,这些测试最初旨在评估人类认知和心理特征,例如智力和个性;这些结果往往被解释为在LLMS中具有人性特征的有力证据,但本文认为,这些解释构成本体错误;人类心理和教育测试是理论驱动的测量工具,根据特定人类人口进行校准;将这些测试应用到非人类科目,而没有经验验证,对测量的内容有错误描述的风险;此外,越来越多的趋势框架将AI在诸如“情报”等特征衡量基准方面的绩效作为衡量基准,例如“情报”等,尽管已知的问题的有效性、数据污染、文化偏见和对表面即时变化的敏感性。我们认为,将基准性绩效解释为人性特征的测量,缺乏充分的理论和经验上的理由。这导致我们的立场:停止用人类测试来评估AI,制定有原则的、针对AI的具体测试。我们呼吁为AI系统制定有原则的、针对AI的具体评估框架。这种框架可以建立在现有的构建和验证心理测量测试框架之上,或者完全从头到完全适合AI的独特背景。

Article 180

Title@2025-07-30 (3): Planning for Cooler Cities: A Multimodal AI Framework for Predicting and Mitigating Urban Heat Stress through Urban Landscape Transformation

Title: Planning for Cooler Cities: A Multimodal AI Framework for Predicting and Mitigating Urban Heat Stress through Urban Landscape Transformation

Planung für coolere Städte: Ein multimodales KI-Framework zur Vorhersage und Milderung von städtischem Wärmestress durch Urban Landscape Transformation

更冷城市规划:通过城市景观转型预测和减轻城市热量压力的多模式AI框架 2507.23000v1

Authors (4): Shengao Yi, Xiaojiang Li, Wei Tu, Tianhong Zhao

As extreme heat events intensify due to climate change and urbanization, cities face increasing challenges in mitigating outdoor heat stress. While traditional physical models such as SOLWEIG and ENVI-met provide detailed assessments of human-perceived heat exposure, their computational demands limit scalability for city-wide planning. In this study, we propose GSM-UTCI, a multimodal deep learning framework designed to predict daytime average Universal Thermal Climate Index (UTCI) at 1-meter hyperlocal resolution. The model fuses surface morphology (nDSM), high-resolution land cover data, and hourly meteorological conditions using a feature-wise linear modulation (FiLM) architecture that dynamically conditions spatial features on atmospheric context. Trained on SOLWEIG-derived UTCI maps, GSM-UTCI achieves near-physical accuracy, with an $R^2$ of 0.9151 and a mean absolute error (MAE) of 0.41 °C, while reducing inference time from hours to under five minutes for an entire city. To demonstrate its planning relevance, we apply GSM-UTCI to simulate systematic landscape transformation scenarios in Philadelphia, replacing bare earth, grass, and impervious surfaces with tree canopy. Results show spatially heterogeneous but consistently strong cooling effects, with impervious-to-tree conversion producing the highest aggregated benefit (-4.18 °C average change in UTCI across 270.7 km²). Tract-level bivariate analysis further reveals strong alignment between thermal reduction potential and land cover proportions. These findings underscore the utility of GSM-UTCI as a scalable, fine-grained decision support tool for urban climate adaptation, enabling scenario-based evaluation of greening strategies across diverse urban environments.

随着气候变化和城市化导致的极端热量事件加剧,城市在缓解室外热压力方面面临越来越多的挑战。传统物理模型,如SOLWEIG和ENVI-met,提供了人类感知热暴露的详细评估,但其计算要求限制了全城市规划的可扩展性。在本研究中,我们提出GSM-UTCI,这是一个多模态深度学习框架,旨在以1米超本地分辨率预测日间平均通用热气候指数(UTCI)。模型结合地表形态(nDSM)、高分辨率土地覆盖数据和逐小时气象条件,使用特征线性调制(FiLM)架构,根据大气环境动态调节空间特征。在SOLWEIG衍生的UTCI地图上训练后,GSM-UTCI实现了接近物理模型的精度,R2为0.9151,平均绝对误差(MAE)为0.41°C,同时将整个城市的推理时间从数小时缩短至五分钟以内。
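
The feature-wise linear modulation (FiLM) mentioned in the abstract above can be illustrated with a small sketch. This is not the GSM-UTCI implementation: the shapes, weights, and helper names (`linear`, `film`) are assumptions chosen only to show how a conditioning vector yields a per-channel scale and shift applied to a spatial feature map.

```python
# Minimal FiLM sketch: a conditioning vector produces a per-channel scale
# (gamma) and shift (beta) that modulate a spatial feature map.
# All shapes and weights here are illustrative assumptions.

def linear(weights, bias, vec):
    """Dense layer: returns weights @ vec + bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply gamma(cond) * x + beta(cond) to every pixel of each channel.

    features: list of channels, each a 2-D list (height x width).
    cond:     conditioning vector (e.g. hourly meteorological variables).
    """
    gamma = linear(w_gamma, b_gamma, cond)
    beta = linear(w_beta, b_beta, cond)
    return [[[g * x + b for x in row] for row in channel]
            for channel, g, b in zip(features, gamma, beta)]

# Two channels of a 2x2 map, conditioned on a 2-D vector.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.5, 0.5], [0.5, 0.5]]]
cond = [1.0, -1.0]
w_gamma, b_gamma = [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]   # gamma = [2, 0]
w_beta, b_beta = [[0.0, 0.0], [1.0, 1.0]], [0.5, 0.0]     # beta  = [0.5, 0]
modulated = film(features, cond, w_gamma, b_gamma, w_beta, b_beta)
```

In a trained network the `gamma` and `beta` maps would be learned, so the same spatial features can be rescaled differently for each weather condition.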

Article 181

Title@2025-07-30 (3): Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics

Title: Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics

Konsistenz der Eigenschaftszuweisung in Deep Learning Architekturen für Multi-Omics

多种语言深深学习结构中地物归属的一致性 2507.22877v1

Authors (7): Daniel Claborne, Javier Flores, Samantha Erwin, Luke Durell, Rachel Richardson, Ruby Fore, Lisa Bramer

Machine and deep learning have grown in popularity and use in biological research over the last decade but still present challenges in interpretability of the fitted model. The development and use of metrics to determine features driving predictions and increase model interpretability continues to be an open area of research. We investigate the use of Shapley Additive Explanations (SHAP) on a multi-view deep learning model applied to multi-omics data for the purposes of identifying biomolecules of interest. Rankings of features via these attribution methods are compared across various architectures to evaluate consistency of the method. We perform multiple computational experiments to assess the robustness of SHAP and investigate modeling approaches and diagnostics to increase and measure the reliability of the identification of important features. Accuracy of a random-forest model fit on subsets of features selected as being most influential as well as clustering quality using only these features are used as a measure of effectiveness of the attribution method. Our findings indicate that the rankings of features resulting from SHAP are sensitive to the choice of architecture as well as different random initializations of weights, suggesting caution when using attribution methods on multi-view deep learning models applied to multi-omics data. We present an alternative, simple method to assess the robustness of identification of important biomolecules.

在过去十年里,生物研究的流行程度和使用的机器和深层次学习在生物学研究中越来越受欢迎和使用,但在解释适合的模型方面仍然存在着挑战。开发和使用衡量标准来确定驱动预测和增加模型解释的特征,这仍然是一个开放的研究领域。我们调查了在多种组合数据中应用的多视角深层次学习模型的Sapley Additive解释(SHAP)的使用情况,该模型适用于多组群数据,目的是查明感兴趣的生物分子。通过这些归因方法对各种特征的排序进行了比较,以评价该方法的一致性。我们进行了多种计算实验,以评估SHAP的稳健性,并调查模型和诊断方法的建模,以增加和衡量重要特征的识别的可靠性。我们使用随机森林模型的准确性,适合选定具有最大影响力的一组特征,以及仅使用这些特征的组合质量,以衡量归因方法的有效性。我们的研究结果表明,SHAP的特征的排序对结构的选择十分敏感,并且对不同的结构进行随机初始化。我们进行了多次进行计算,在使用简单的生物深层次识别方法时,我们对当前重要的生物深层次识别方法应用了一种重要的归属方法来评估。

Article 182

Title@2025-07-30 (3): LCS: An AI-based Low-Complexity Scaler for Power-Efficient Super-Resolution of Game Content

Title: LCS: An AI-based Low-Complexity Scaler for Power-Efficient Super-Resolution of Game Content

LCS: Ein KI-basierter Low-Complexity Scaler für leistungsstarke Super-Resolution von Spielinhalten

LCS: 以AI为基础的高功率超级游戏内容分辨率低复杂度缩放仪 2507.22873v1

Authors (8): Simon Pochinda, Momen K. Tageldeen, Mark Thompson, Tony Rinaldi, Troy Giorshev, Keith Lee, Jie Zhou, Frederick Walls

The increasing complexity of content rendering in modern games has led to a problematic growth in the workload of the GPU. In this paper, we propose an AI-based low-complexity scaler (LCS) inspired by state-of-the-art efficient super-resolution (ESR) models which could offload the workload on the GPU to a low-power device such as a neural processing unit (NPU). The LCS is trained on GameIR image pairs natively rendered at low and high resolution. We utilize adversarial training to encourage reconstruction of perceptually important details, and apply reparameterization and quantization techniques to reduce model complexity and size. In our comparative analysis we evaluate the LCS alongside the publicly available AMD hardware-based Edge Adaptive Scaling Function (EASF) and AMD FidelityFX Super Resolution 1 (FSR1) on five different metrics, and find that the LCS achieves better perceptual quality, demonstrating the potential of ESR models for upscaling on resource-constrained devices.

现代游戏内容的日益复杂导致GPU工作量增加成问题。在本文中,我们建议采用基于AI的低复杂度缩放器,该模型受最新技术高效超分辨率模型的启发,可将GPU的工作量卸到神经处理单位等低功率装置上。LCS接受关于GameIR图像配对的培训,这些配对原以低分辨率和高分辨率制作。我们利用对抗性培训鼓励重建重要概念细节,并采用重新计分法和量化技术来降低模型的复杂性和大小。在比较分析中,我们评估LCS与公开提供的ACM硬件调控功能(EASF)和AMD FidriityFX超分辨率1(FSR1)一起,在5个不同尺度上,发现LCS达到更好的感知性质量,展示ESR模型在扩大资源限制装置方面的潜力。

Article 183

Title@2025-07-30 (3): Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point

Title: Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point

Föderiertes Lernen mit On-Device-Training und Kommunikation im 8-Bit-Schwebepunkt

在8位浮动点进行联邦在职培训和交流 2407.02610v2

Authors (4): Bokun Wang, Axel Berg, Durmus Alp Emre Acar, Chuteng Zhou

Recent work has shown that 8-bit floating point (FP8) can be used for efficiently training neural networks with reduced computational cost compared to training in FP32/FP16. In this work, we investigate the use of FP8 training in a federated learning context. This approach not only brings the usual benefits of FP8, which are desirable for on-device training at the edge, but also reduces client-server communication costs due to significant weight compression. We present a novel method for combining FP8 client training while maintaining a global FP32 server model and provide convergence analysis. Experiments with various machine learning models and datasets show that our method consistently yields communication reductions of at least 2.9x across a variety of tasks and models compared to an FP32 baseline to achieve the same trained model accuracy.

最近的工作表明,8比特浮动点(FP8)可用于有效培训神经网络,与FP32/FP16培训相比,计算成本降低。在这项工作中,我们调查了在联合学习背景下使用FP8培训的情况。这种办法不仅带来FP8的通常好处,这些好处对于边缘的在设备上的培训是可取的,而且由于体重压缩很大,客户-服务器通信费用也有所减少。我们提出了一个新颖的方法,将FP8客户培训结合起来,同时保持全球FP32服务器模型并提供趋同分析。与各种机器学习模型和数据集的实验表明,与FP32基准相比,我们的方法使各种任务和模型的通信量减少至少2.9倍,以实现同样的经过培训的模型准确性。

Article 184

Title@2025-07-30 (3): Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning

Title: Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning

Nutzung von Evolutionsstrategien zur Ausbildung von Transformern in der Stärkung des Lernens

利用进化战略培训变革者加强学习培训 2501.13883v2

Authors (2): Matyáš Lorenc, Roman Neruda

We explore the capability of evolution strategies to train an agent with a policy based on a transformer architecture in a reinforcement learning setting. We performed experiments using OpenAI’s highly parallelizable evolution strategy to train Decision Transformer in the MuJoCo Humanoid locomotion environment and in the environment of Atari games, testing the ability of this black-box optimization technique to train even such relatively large and complicated models (compared to those previously tested in the literature). The examined evolution strategy proved to be, in general, capable of achieving strong results and managed to produce high-performing agents, showcasing evolution’s ability to tackle the training of even such complex models.

我们探索进化战略培训代理人的能力,在强化学习环境中以变压器结构为基础制定政策,我们利用OpenAI高度平行的进化战略进行实验,在MuJoco人类类人运动环境以及在Atari游戏环境中培训决策变异器,测试这种黑盒优化技术是否有能力培训甚至如此规模较大和复杂的模型(与以前在文献中测试的模型相比),所审查的进化战略总的来说能够取得强有力的成果,并能够产生高性能的剂,表明进化能力,甚至能够解决这类复杂模型的培训问题。
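
As a rough illustration of the kind of evolution strategy the abstract above refers to, the sketch below estimates a search gradient from mirrored Gaussian perturbations and applies it to a toy objective. It is an assumption-laden simplification: the quadratic `fitness`, the hyperparameters, and the single-process loop are hypothetical, and the rank-based fitness shaping and parallel workers commonly used with this family of methods are omitted.

```python
import random

def fitness(theta, target):
    """Toy objective to maximize: negative squared distance to a target."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))

def es_step(theta, target, rng, pop=50, sigma=0.1, lr=0.05):
    """One ES update using antithetic (mirrored) Gaussian perturbations."""
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(pop):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = fitness([t + sigma * e for t, e in zip(theta, eps)], target)
        minus = fitness([t - sigma * e for t, e in zip(theta, eps)], target)
        for i in range(dim):
            grad[i] += (plus - minus) * eps[i]
    # Scale the accumulated finite differences into a gradient estimate.
    grad = [g / (2 * pop * sigma) for g in grad]
    # Gradient ascent, because fitness is maximized.
    return [t + lr * g for t, g in zip(theta, grad)]

rng = random.Random(0)
target = [1.0, -2.0, 0.5, 3.0, 0.0]
theta = [0.0] * len(target)
start = -fitness(theta, target)   # initial squared distance to the target
for _ in range(300):
    theta = es_step(theta, target, rng)
final = -fitness(theta, target)   # squared distance after training
```

Only fitness evaluations are needed, never backpropagation through the policy, which is why the approach treats the model as a black box.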

Article 185

Title@2025-07-30 (3): Mesh based segmentation for automated margin line generation on incisors receiving crown treatment

Title: Mesh based segmentation for automated margin line generation on incisors receiving crown treatment

Mesh-basierte Segmentierung für automatisierte Margenlinien-Generierung an Schneidezähnen, die Kronenbehandlung erhalten

在接受皇冠治疗的开切器上自动生成边线的网状隔断除法 2507.22859v1

Authors (6): Ammar Alsheghri, Ying Zhang, Farnoosh Ghadiri, Julia Keren, Farida Cheriet, Francois Guibault

Dental crowns are essential dental treatments for restoring damaged or missing teeth of patients. Recent design approaches of dental crowns are carried out using commercial dental design software. Once a scan of a preparation is uploaded to the software, a dental technician needs to manually define a precise margin line on the preparation surface, which constitutes a non-repeatable and inconsistent procedure. This work proposes a new framework to determine margin lines automatically and accurately using deep learning. A dataset of incisor teeth was provided by a collaborating dental laboratory to train a deep learning segmentation model. A mesh-based neural network was modified by changing its input channels and used to segment the prepared tooth into two regions such that the margin line is contained within the boundary faces separating the two regions. Next, k-fold cross-validation was used to train 5 models, and a voting classifier technique was used to combine their results to enhance the segmentation. After that, boundary smoothing and optimization using the graph cut method were applied to refine the segmentation results. Then, boundary faces separating the two regions were selected to represent the margin line faces. A spline was approximated to best fit the centers of the boundary faces to predict the margin line. Our results show that an ensemble model combined with maximum probability predicted the highest number of successful test cases (7 out of 13) based on a maximum distance threshold of 200 µm (representing human error) between the predicted and ground truth point clouds. It was also demonstrated that the better the quality of the preparation, the smaller the divergence between the predicted and ground truth margin lines (Spearman’s rank correlation coefficient of -0.683). We provide the training and test datasets for the community.

牙科牙冠是修复受损或失踪病人牙齿的基本牙科治疗方法。最近使用商业牙科设计软件对牙科牙冠进行了设计方法。一旦将准备的扫描上传到软件中,牙科技术员需要手动界定准备表面的精确边线,这构成了一个不可重复和不一致的程序。这项工作提出了一个新的框架,以自动确定边线,并使用深层学习来准确确定边线。一个协作牙科实验室提供了切口牙数据集,以训练一个深层学习分解模型。一个基于网眼的神经网通过改变输入渠道而修改,并用来将准备的牙齿分解成两个区域,这样,边线就将位于边界内的两个区域隔开。接下来,用k折交叉验证来训练5个模型,并使用投票分类法将结果组合起来,以便用图形剪切法来改进分解结果。随后,选择了两个区域之间的边界面,以代表边线面。

Article 186

Title@2025-07-30 (3): Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers

Title: Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers

Leistungsfähiges Pokémon auf menschlicher Ebene durch skalierbares Offline-Verstärkungslernen mit Transformern

通过与变革者一起进行可缩放的离线强化学习,进行人级竞争Pokémon 2504.04395v2

Authors (5): Jake Grigsby, Yuqi Xie, Justin Sasek, Steven Zheng, Yuke Zhu

Competitive Pokémon Singles (CPS) is a popular strategy game where players learn to exploit their opponent based on imperfect information in battles that can last more than one hundred stochastic turns. AI research in CPS has been led by heuristic tree search and online self-play, but the game may also create a platform to study adaptive policies trained offline on large datasets. We develop a pipeline to reconstruct the first-person perspective of an agent from logs saved from the third-person perspective of a spectator, thereby unlocking a dataset of real human battles spanning more than a decade that grows larger every day. This dataset enables a black-box approach where we train large sequence models to adapt to their opponent based solely on their input trajectory while selecting moves without explicit search of any kind. We study a progression from imitation learning to offline RL and offline fine-tuning on self-play data in the hardcore competitive setting of Pokémon’s four oldest (and most partially observed) game generations. The resulting agents outperform a recent LLM Agent approach and a strong heuristic search engine. While playing anonymously in online battles against humans, our best agents climb to rankings inside the top 10% of active players. All agent checkpoints, training details, datasets, and baselines are available at https://metamon.tech.

竞争性 Pok\ emamon Sone (CPS) 是一个流行的战略游戏, 游戏的玩家在战斗中根据不完善的信息学习如何利用对手, 而这种不完善的信息可以持续超过100个随机转折。 CPS 的AI 研究一直是由黑树搜索和在线自玩来引导的, 但游戏也可以创建一个平台来研究在大型数据集上不在线培训的适应性政策。我们开发了一条管道, 以重建从第三人视角从观众的第三人视角中节省下来的一流代理的一流个人视角, 从而打开了真正人类战斗的数据集, 超过10年以上, 并且每天都在成长。这个数据集可以让大型序列模型只根据输入轨迹来适应对手, 而没有进行任何明确的搜索。我们从模拟学习到离线 RL 和离线调整自我游戏数据在Pok\ mon 的四个最古老( 和最部分被观察的) 游戏代中, 由此形成的代理们超越了最新的LMAnnem代理器方法, 和强大的黑盒搜索机排名十号, 。在网络上进行最先进的数据库中, 正在进行匿名的服务器上进行。

Article 187

Title@2025-07-30 (3): Synchronization of mean-field models on the circle

Title: Synchronization of mean-field models on the circle

Synchronisierung von Mittelwert-Feld-Modellen auf dem Kreis

圆圈中平均场模型同步化 2507.22857v1

Authors (3): Yury Polyanskiy, Philippe Rigollet, Andrew Yao

This paper considers a mean-field model of $n$ interacting particles whose state space is the unit circle, a generalization of the classical Kuramoto model. Global synchronization is said to occur if after starting from almost any initial state, all particles coalesce to a common point on the circle. We propose a general synchronization criterion in terms of $L_1$-norm of the third derivative of the particle interaction function. As an application we resolve a conjecture for the so-called self-attention dynamics (stylized model of transformers), by showing synchronization for all $\beta \ge -0.16$, which significantly extends the previous bound of $0\le \beta \le 1$ from Criscitiello, Rebjock, McRae, and Boumal (2024). We also show that global synchronization does not occur when $\beta < -2/3$.

本文考虑的是美元互动粒子的平均场模型,其状态空间为单位圆,这是古典仓本模型的概括化。据说,如果从几乎任何初始状态开始,所有粒子都聚在一起到圆上的一个共同点,全球同步就会发生。我们建议了一个一般同步标准,即粒子互动函数第三个衍生物的以1美元为以中下角。作为一种应用,我们通过显示所有 $\beta\ ge -0.16美元的同步,解决了所谓的自我关注动态(变压器的基化模型)的推测,这大大扩展了Criscitiello、Rebjock、McRae和Boumal(2024年)以前的1美元约束值。我们还表明,当 $\beta < -2/3美元时,全球同步不会发生。
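
To make the notion of global synchronization in the abstract above concrete, the following sketch simulates the classical Kuramoto special case with identical particles, $\dot\theta_i = \frac{1}{n}\sum_j \sin(\theta_j - \theta_i)$. It does not implement the paper's general interaction function or the self-attention dynamics; the particle count, step size, and seed are arbitrary assumptions.

```python
import math
import random

def order_parameter(thetas):
    """Length of the mean of the unit vectors; 1.0 means full synchronization."""
    n = len(thetas)
    c = sum(math.cos(t) for t in thetas) / n
    s = sum(math.sin(t) for t in thetas) / n
    return math.hypot(c, s)

def kuramoto_step(thetas, dt):
    """Euler step of d(theta_i)/dt = (1/n) * sum_j sin(theta_j - theta_i)."""
    n = len(thetas)
    drift = [sum(math.sin(tj - ti) for tj in thetas) / n for ti in thetas]
    return [t + dt * d for t, d in zip(thetas, drift)]

rng = random.Random(0)
thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(30)]
r_start = order_parameter(thetas)
for _ in range(2000):
    thetas = kuramoto_step(thetas, dt=0.05)
r_end = order_parameter(thetas)
```

The order parameter grows from a small value for scattered initial angles towards 1 as the particles coalesce to a common point on the circle.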

Article 188

Title@2025-07-30 (3): Federated Learning on Riemannian Manifolds: A Gradient-Free Projection-Based Approach

Title: Federated Learning on Riemannian Manifolds: A Gradient-Free Projection-Based Approach

Föderiertes Lernen auf Riemannschen Manifolds: Ein gradient-free-Projektion-basierter Ansatz

里伊曼曼字形上的联邦学习:基于渐进、无预测的渐进式项目方法 2507.22855v1

Authors (5): Hongye Wang, Zhaoye Pan, Chang He, Jiaxiang Li, Bo Jiang

Federated learning (FL) has emerged as a powerful paradigm for collaborative model training across distributed clients while preserving data privacy. However, existing FL algorithms predominantly focus on unconstrained optimization problems with exact gradient information, limiting their applicability in scenarios where only noisy function evaluations are accessible or where model parameters are constrained. To address these challenges, we propose a novel zeroth-order projection-based algorithm on Riemannian manifolds for FL. By leveraging the projection operator, we introduce a computationally efficient zeroth-order Riemannian gradient estimator. Unlike existing estimators, ours requires only a simple Euclidean random perturbation, eliminating the need to sample random vectors in the tangent space, thus reducing computational cost. Theoretically, we first prove the approximation properties of the estimator and then establish the sublinear convergence of the proposed algorithm, matching the rate of its first-order counterpart. Numerically, we first assess the efficiency of our estimator using kernel principal component analysis. Furthermore, we apply the proposed algorithm to two real-world scenarios: zeroth-order attacks on deep neural networks and low-rank neural network training to validate the theoretical findings.

联邦学习(FL)已成为分布式客户之间合作模式培训的强大范例,同时保护数据隐私。然而,现有的FL算法主要侧重于精确梯度信息中未受限制的优化问题,限制了其在只提供噪音功能评价或模型参数受限制的情况下的适用性。为了应对这些挑战,我们提议在FL的里曼尼特元体上采用新的零顺序投影算法。我们利用投影操作员,引入了一种计算效率为零顺序的里曼尼梯度估计仪。与现有的估测器不同,我们的FL算法只需要简单的Euclidean随机扰动,就不需要在色度空间中抽取随机矢量,从而降低计算成本。理论上,我们首先证明估算器的近似性,然后确定拟议算法的亚线性趋近性趋近性,并与其第一阶对应方的速率相匹配。从数量上,我们首先利用内核元元元元元元分析来评估我们的估测算器的效率。此外,我们提出的算法适用于两种真实世界情景:对深层神经网络的理论验证网络进行零序攻击。
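
The idea of a projection-based zeroth-order estimator can be sketched on the unit sphere. The abstract does not give the estimator's exact form, so everything below is an assumption: a Gaussian perturbation is drawn in the ambient Euclidean space, the perturbed point is projected back onto the sphere, and the resulting finite difference is averaged. The toy linear objective and all names are hypothetical.

```python
import math
import random

COEFFS = [1.0, -2.0, 0.5, 0.0, 3.0]  # toy linear objective (an assumption)

def f(x):
    """Objective evaluated only through function values."""
    return sum(c * v for c, v in zip(COEFFS, x))

def project_to_sphere(x):
    """Projection onto the unit sphere: rescale to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def zo_estimate(func, x, rng, mu=1e-3, samples=4000):
    """Average of finite differences along projected Euclidean perturbations."""
    dim = len(x)
    fx = func(x)
    est = [0.0] * dim
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = project_to_sphere([xi + mu * ui for xi, ui in zip(x, u)])
        scale = (func(y) - fx) / mu
        for i in range(dim):
            est[i] += scale * u[i] / samples
    return est

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))

x = project_to_sphere([1.0] * 5)
estimate = zo_estimate(f, x, random.Random(0))
# Exact Riemannian gradient on the sphere: tangent projection of the
# Euclidean gradient, used here only to check the estimate.
inner = sum(c * v for c, v in zip(COEFFS, x))
exact = [c - inner * v for c, v in zip(COEFFS, x)]
alignment = cosine(estimate, exact)
```

No random vectors are sampled in the tangent space here: the projection step alone keeps the perturbed points on the manifold.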

Article 189

Title@2025-07-30 (3): A Bit of Freedom Goes a Long Way: Classical and Quantum Algorithms for Reinforcement Learning under a Generative Model

Title: A Bit of Freedom Goes a Long Way: Classical and Quantum Algorithms for Reinforcement Learning under a Generative Model

Ein bisschen Freiheit ist ein langer Weg: Klassische und Quantenalgorithmen zur Stärkung des Lernens unter einem generativen Modell

自由的一段长路:在创举模式下,为强化学习而进行古典和量子分析。 2507.22854v1

Authors (3): Andris Ambainis, Joao F. Doriguello, Debbie Lim

We propose novel classical and quantum online algorithms for learning finite-horizon and infinite-horizon average-reward Markov Decision Processes (MDPs). Our algorithms are based on a hybrid exploration-generative reinforcement learning (RL) model wherein the agent can, from time to time, freely interact with the environment in a generative sampling fashion, i.e., by having access to a “simulator”. By employing known classical and new quantum algorithms for approximating optimal policies under a generative model within our learning algorithms, we show that it is possible to avoid several paradigms from RL like “optimism in the face of uncertainty” and “posterior sampling” and instead compute and use optimal policies directly, which yields better regret bounds compared to previous works. For finite-horizon MDPs, our quantum algorithms obtain regret bounds which only depend logarithmically on the number of time steps $T$, thus breaking the $O(\sqrt{T})$ classical barrier. This matches the time dependence of the prior quantum works of Ganguly et al. (arXiv’23) and Zhong et al. (ICML’24), but with improved dependence on other parameters like state space size $S$ and action space size $A$. For infinite-horizon MDPs, our classical and quantum bounds still maintain the $O(\sqrt{T})$ dependence but with better $S$ and $A$ factors. Nonetheless, we propose a novel measure of regret for infinite-horizon MDPs with respect to which our quantum algorithms have $\operatorname{poly}\log{T}$ regret, exponentially better compared to classical algorithms. Finally, we generalise all of our results to compact state spaces.

我们提出新的经典和量子在线算法,以学习限制和偏振平均 Markov 的 MADPs (MDPs) 。我们的算法基于一种混合探索和遗传强化学习(RL) 模型, 代理器可以不时以基因抽样方式自由地与环境互动, 也就是说, 使用“ 模拟器 ” 。通过使用已知的经典和新的量子算法, 在学习算法的模型下适应最佳政策, 我们表明, 可以从RL 中避免一些范例, 比如“ 面对不确定性时的optimical- Excistical- recessment ” (RLLL) 和“ 其它抽样” , 而不是直接计算和使用最佳政策, 与以前的工作相比, 产生更好的遗憾界限。对于MDPsmizontical , 我们的量算算算法仅取决于时间步数的对数, $T, 从而打破 $O(srickral) , 但是经典屏障比我们更需要时间上的Oal-ral-ral- deal exalal 和salalalal exxal 。

Article 190

Title@2025-07-30 (3): Lightweight Online Adaption for Time Series Foundation Model Forecasts

Title: Lightweight Online Adaption for Time Series Foundation Model Forecasts

Leichte Online-Anpassung für Time Series Foundation Modellprognosen

时间系列基础基础模型预测 2502.12920v3

Authors (5): Thomas L. Lee, William Toner, Rajkarn Singh, Artjom Joosen, Martin Asenov

Foundation models (FMs) have emerged as a promising approach for time series forecasting. While effective, FMs typically remain fixed during deployment due to the high computational costs of learning them online. Consequently, deployed FMs fail to adapt their forecasts to current data characteristics, despite the availability of online feedback from newly arriving data. This raises the question of whether FM performance can be enhanced by the efficient usage of this feedback. We propose ELF to answer this question. ELF is a lightweight mechanism for the online adaption of FM forecasts in response to online feedback. ELF consists of two parts: a) the ELF-Forecaster which is used to learn the current data distribution; and b) the ELF-Weighter which is used to combine the forecasts of the FM and the ELF-Forecaster. We evaluate the performance of ELF in conjunction with several recent FMs across a suite of standard time series datasets. In all of our experiments we find that using ELF improves performance. This work demonstrates how online feedback can be used efficiently to improve FM forecasts.

基础模型(FMs)已成为对时间序列预测的一种很有希望的方法。由于在线学习的计算成本很高,调频系统在部署期间通常会保持固定状态,因此,部署调频系统无法根据当前数据特点调整其预测,尽管有新抵达数据的在线反馈,这就提出了调频系统能否通过有效使用这种反馈而提高性能的问题。我们建议以ELF来回答这个问题。ELF是一个对在线反馈进行调频预报在线调整的轻量机制。ELF由两部分组成:(a) 用于学习当前数据分布的ELF-Foraster;(b) ELF-Weighter,用于将调频和ELF-F-Forester的预测结合起来。我们结合一套标准时间序列数据集的最近几次调频系统来评估ELF的性能。我们发现,在所有实验中,使用ELF来改进性能。这项工作表明,如何有效地利用在线反馈来改进调频系统预测。
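
One way to picture combining a fixed foundation-model forecast with a lightweight online forecaster is exponentially weighted averaging, a standard online-learning rule. The abstract does not specify how the ELF-Weighter works, so this is a swapped-in technique rather than the paper's method; the data stream, the learning rate `eta`, and the function names are hypothetical.

```python
import math

def update_weights(weights, losses, eta=0.5):
    """Multiplicative-weights update: down-weight forecasters with high loss."""
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def combine(weights, forecasts):
    """Weighted average of the individual forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))

# Hypothetical stream: (foundation-model forecast, lightweight forecast, truth).
stream = [(10.0, 12.0, 12.5), (11.0, 13.0, 13.2),
          (9.0, 11.5, 11.4), (10.5, 12.0, 12.1)]
weights = [0.5, 0.5]
combined = []
for fm_pred, light_pred, truth in stream:
    # Forecast first, then learn from the feedback once the truth arrives.
    combined.append(combine(weights, [fm_pred, light_pred]))
    losses = [(fm_pred - truth) ** 2, (light_pred - truth) ** 2]
    weights = update_weights(weights, losses)
```

Because only two scalar weights are updated per step, the foundation model itself stays frozen and the adaptation cost is negligible.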

Article 191

Title@2025-07-30 (3): FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models

Title: FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models

FRED: Finanzielle Retrieval-erweiterte Erkennung und Bearbeitung von Halluzinationen in Sprachmodellen

FRED: 财务检索-加强发现和编辑语言模型中的幻觉 2507.20930v2

Authors (3): Likun Tan, Kuan-Wei Huang, Kevin Wu

Hallucinations in large language models pose a critical challenge for applications requiring factual reliability, particularly in high-stakes domains such as finance. This work presents an effective approach for detecting and editing factually incorrect content in model-generated responses based on the provided context. Given a user-defined domain-specific error taxonomy, we construct a synthetic dataset by inserting tagged errors into financial question-answering corpora and then fine-tune four language models, Phi-4, Phi-4-mini, Qwen3-4B, and Qwen3-14B, to detect and edit these factual inaccuracies. Our best-performing model, fine-tuned Phi-4, achieves an 8% improvement in binary F1 score and a 30% gain in overall detection performance compared to OpenAI-o3. Notably, our fine-tuned Phi-4-mini model, despite having only 4 billion parameters, maintains competitive performance with just a 2% drop in binary detection and a 0.1% decline in overall detection compared to OpenAI-o3. Our work provides a practical solution for detecting and editing factual inconsistencies in financial text generation while introducing a generalizable framework that can enhance the trustworthiness and alignment of large language models across diverse applications beyond finance. Our code and data are available at https://github.com/pegasi-ai/shield.

大型语言模型的幻觉给需要事实可靠性的应用,特别是金融等高端领域,带来了严峻的挑战。这项工作为根据所提供的背景,探测和编辑模型生成的回复中不正确内容提供了有效方法。鉴于用户定义的域特定误差分类,我们通过将贴标签错误插入财务问题解答公司,然后将四种语言模型(Phi-4、Phi-4-mini、Qwen3-4B和Qwen3-14B)微调四种语言模型(Phi-4、Phi-4-mini、Qwen3-4B和Qwen3-14B),以发现和编辑这些事实不准确之处。我们最优秀的模型(经微调的Phi-4-4)在二进F1评分上实现了8%的改进,总体检测性能比OploaAI-o3增加了30%。值得注意的是,我们经过微调的Phi-4-mini模型尽管只有40亿个参数,但仍保持竞争性的性能,与OpenAI-o3相比,总体检测下降了0.1%。我们的工作为在发现和编辑财务生成中发现和编辑事实不一致事实不一致提供了实用解决办法的实用解决办法,同时采用通用的通用框架,可以加强信任和调整。

Article 192

Title@2025-07-30 (3): Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving

Title: Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving

Anwendung des Vision-Language-Modells auf Fußgänger Verhalten und Szeneverständnis im autonomen Fahren

在自主驾驶中将视觉语言模型应用到行人行为和场景理解 2501.06680v2

Authors (5): Haoxiang Gao, Li Zhang, Yu Zhao, Zhou Yang, Jinghan Cao

Vision-language models (VLMs) have become a promising approach to enhancing perception and decision-making in autonomous driving. A gap remains, however, in applying VLMs to complex scenarios involving interaction with pedestrians and in deploying them efficiently on vehicles. In this paper, we propose a knowledge distillation method that transfers knowledge from large-scale vision-language foundation models to efficient vision networks, and we apply it to pedestrian behavior prediction and scene understanding tasks, achieving promising results in generating more diverse and comprehensive semantic attributes. We also utilize multiple pre-trained models and ensemble techniques to boost the model’s performance. We further examined the effectiveness of the model after knowledge distillation; the results show significant metric improvements in open-vocabulary perception and trajectory prediction tasks, which can potentially enhance the end-to-end performance of autonomous driving.

视觉语言模型(VLMs)已成为提高自主驾驶的认知和决策的有希望的方法,在应用VLMs来理解与行人互动和高效车辆部署的复杂情景方面仍然存在差距。在本文中,我们提议了一种知识蒸馏方法,将知识从大型视觉语言基础模型转移到高效的视觉网络,我们将其应用于行人行为预测和场景理解任务,在创造更多样化和更全面的语义属性方面取得有希望的结果。我们还利用多种预先培训的模型和混合技术来提高模型的性能。我们进一步审查了在知识蒸馏后该模型的有效性;结果显示,在开放语言概念和轨迹预测任务方面有显著的衡量改进,这有可能提高自主驾驶的端到端的性能。

Article 193

Title@2025-07-30 (3): Decentralized Differentially Private Power Method

Title: Decentralized Differentially Private Power Method

Dezentralisierte Differential-Private-Power-Methode

分散分散的、有区别的私用电力方法 2507.22849v1

Authors (3): Andrew Campbell, Anna Scaglione, Sean Peisert

We propose a novel Decentralized Differentially Private Power Method (D-DP-PM) for performing Principal Component Analysis (PCA) in networked multi-agent settings. Unlike conventional decentralized PCA approaches where each agent accesses the full n-dimensional sample space, we address the challenging scenario where each agent observes only a subset of dimensions through row-wise data partitioning. Our method ensures $(\epsilon,\delta)$-Differential Privacy (DP) while enabling collaborative estimation of global eigenvectors across the network without requiring a central aggregator. We achieve this by having agents share only local embeddings of the current eigenvector iterate, leveraging both the inherent privacy from random initialization and carefully calibrated Gaussian noise additions. We prove that our algorithm satisfies the prescribed $(\epsilon,\delta)$-DP guarantee and establish convergence rates that explicitly characterize the impact of the network topology. Our theoretical analysis, based on linear dynamics and high-dimensional probability theory, provides tight bounds on both privacy and utility. Experiments on real-world datasets demonstrate that D-DP-PM achieves superior privacy-utility tradeoffs compared to naive local DP approaches, with particularly strong performance in moderate privacy regimes ($\epsilon\in[2, 5]$). The method converges rapidly, allowing practitioners to trade iterations for enhanced privacy while maintaining competitive utility.

我们提出一种新的分散式差分隐私幂法(D-DP-PM),用于在网络多智能体环境中进行主成分分析(PCA)。与传统的分散式PCA方法不同,每个代理商都进入全正方位样本空间,我们处理一个富有挑战的情景,即每个代理商只通过行式数据分割观察一组维度。我们的方法确保$(\epsilon,\delta)$-差分隐私(DP),同时允许在不需要中央聚合器的情况下对整个网络的全局特征向量进行协作估算。我们这样做的方法是,使代理商只分享当前特征向量迭代的本地嵌入,利用随机初始化和仔细校准的高斯噪声所带来的固有隐私。我们证明我们的算法符合规定的$(\epsilon,\delta)$-DP保证,并确立明确描述网络拓扑影响的收敛率。我们基于线性动力学和高维概率理论的理论分析,为隐私和实用性提供了紧密的界限。在真实数据集上的实验表明,与朴素的本地DP方法相比,D-DP-PM实现了更优的隐私-效用权衡,在中等隐私水平($\epsilon\in[2, 5]$)下表现尤为突出。该方法收敛迅速,使实践者能够以迭代次数换取更强的隐私保护,同时保持有竞争力的效用。
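
The basic mechanism of a power method with Gaussian noise added at each iteration can be sketched as follows. This is a centralized toy version, not D-DP-PM: there is no network of agents or row-wise partitioning, and `noise_std` is an arbitrary assumption that is not calibrated to any $(\epsilon,\delta)$ budget.

```python
import math
import random

def matvec(a, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def noisy_power_method(a, iters, noise_std, rng):
    """Power iteration with Gaussian noise added to every matrix-vector product."""
    v = normalize([rng.gauss(0.0, 1.0) for _ in a])  # random initialization
    for _ in range(iters):
        w = matvec(a, v)
        w = [x + rng.gauss(0.0, noise_std) for x in w]
        v = normalize(w)
    return v

# Toy covariance whose top eigenvector is the first coordinate axis.
cov = [[2.0, 0.0], [0.0, 1.0]]
v_clean = noisy_power_method(cov, iters=50, noise_std=0.0, rng=random.Random(0))
v_noisy = noisy_power_method(cov, iters=50, noise_std=0.05, rng=random.Random(0))
```

Larger noise buys more privacy at the cost of a less accurate eigenvector, which is the privacy-utility tradeoff the abstract describes.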

Article 194

Title@2025-07-30 (3): Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation

Title: Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation

Krümmung Dynamischer Black-Box-Angriff: Wiederherstellung der gegnerischen Robustheit durch dynamische Krümmungsschätzung

曲线动态黑盒攻击: 通过动态曲线估计, 重新审视对抗性对称稳健性 2505.19194v2

Authors (1): Peiran Sun

Adversarial attacks reveal the vulnerability of deep learning models. For about a decade, countless attack and defense methods have been proposed, leading to robustified classifiers and a better understanding of models. Among these methods, curvature-based approaches have attracted attention because high curvature is assumed to give rise to a rough decision boundary. However, the most commonly used curvature is that of the loss function, scores, or other quantities internal to the model, as opposed to decision boundary curvature, since the former can be computed relatively easily from second-order derivatives. In this paper, we propose a new query-efficient method, dynamic curvature estimation (DCE), to estimate the decision boundary curvature in a black-box setting. Our approach is based on CGBA, a black-box adversarial attack. By performing DCE on a wide range of classifiers, we discovered, statistically, a connection between decision boundary curvature and adversarial robustness. We also propose a new attack method, curvature dynamic black-box attack (CDBA), which achieves improved performance using the dynamically estimated curvature.

反向攻击暴露了深层次学习模式的脆弱性。大约十年来, 提出了无数的攻击和防御方法, 从而形成了强大的分类和对模型的更好理解。在这些方法中, 以曲线为基础的方法引起了人们的注意, 因为假设高曲线可能导致粗略的决定界限。然而, 最常用的 \ textit{ curvature} 是模型中丢失功能、分数或其他参数的曲线性能, 而不是决定边界曲线性能, 因为前者可以相对容易地使用第二顺序衍生物形成。在本文中, 我们提出了一个新的查询效率方法, 动态曲线估计( DCE) , 来估计黑盒子环境中的决定边界曲线曲线。我们的方法是以黑盒子对抗性攻击为基础。我们从统计学上发现, 在一系列广泛的分类中执行DCE, 我们发现决定边界曲线性和对抗性强度之间的关联。我们还提出一种新的攻击方法, 曲线动态黑盒攻击( CDBA) , 使用动态估计曲线性分析法改进了业绩。

Article 195

Title@2025-07-30 (3): RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents

Title: RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents

RLVMR: Verstärktes Lernen mit überprüfbaren Meta-Reasoning-Belohnungen für robuste Long-Horizon-Agenten

RLVMR: 对强力长森剂采用可核查的可计量可计量的奖赏加强学习 2507.22844v1

Authors (5): Zijing Zhang, Ziyang Chen, Mingxiao Li, Zhaopeng Tu, Xiaolong Li

The development of autonomous agents for complex, long-horizon tasks is a central goal in AI. However, dominant training paradigms face a critical limitation: reinforcement learning (RL) methods that optimize solely for final task success often reinforce flawed or inefficient reasoning paths, a problem we term inefficient exploration. This leads to agents that are brittle and fail to generalize, as they learn to find solutions without learning how to reason coherently. To address this, we introduce RLVMR, a novel framework that integrates dense, process-level supervision into end-to-end RL by rewarding verifiable, meta-reasoning behaviors. RLVMR equips an agent to explicitly tag its cognitive steps, such as planning, exploration, and reflection, and provides programmatic, rule-based rewards for actions that contribute to effective problem-solving. These process-centric rewards are combined with the final outcome signal and optimized using a critic-free policy gradient method. On the challenging ALFWorld and ScienceWorld benchmarks, RLVMR achieves new state-of-the-art results, with our 7B model reaching an 83.6% success rate on the most difficult unseen task split. Our analysis confirms these gains stem from improved reasoning quality, including significant reductions in redundant actions and enhanced error recovery, leading to more robust, efficient, and interpretable agents.

然而,主要的培训模式面临一个严重的限制:强化学习方法(RL)的优化只为最终任务的成功而优化,往往强化了有缺陷或效率低下的推理路径,这是我们称之为低效率探索的一个问题。这导致代理物变得易碎,没有推广,因为他们学会如何找到解决办法而没有学习如何以一致的方式理解问题。为了解决这个问题,我们引入了RLVMR,这是一个将密集、程序一级的监督纳入最终到最后的RLL的新框架,通过奖励可核查的、符合新需要的行为。RLVMR装备了一名代理物对其认知步骤(如规划、探索和反思)进行明确标记,并为有助于有效解决问题的行动提供基于规则的方案奖励。这些以程序为中心的奖励与最终结果信号相结合,并采用无批评性的政策梯度方法优化。在具有挑战性的ALFWMR和SWorld基准方面,RLVMR取得了新的最新状态成果,我们的7B模型在最困难的、最难的、最难的回收行动上达到83.6%的成功率,包括更稳健的推理,我们的分析证实了这些成果。

Article 196

Title@2025-07-30 (3): Subgrid BoostCNN: Efficient Boosting of Convolutional Networks via Gradient-Guided Feature Selection

Title: Subgrid BoostCNN: Efficient Boosting of Convolutional Networks via Gradient-Guided Feature Selection

Subgrid BoostCNN: Effiziente Steigerung konvolutionärer Netzwerke durch gradient-geführte Feature-Auswahl

Subgrid 启动CNN: 通过渐变引导特性选择有效推动革命网络 2507.22842v1

Authors (4): Biyi Fang, Jean Utke, Truong Vo, Diego Klabjan

Convolutional Neural Networks (CNNs) have achieved remarkable success across a wide range of machine learning tasks by leveraging hierarchical feature learning through deep architectures. However, the large number of layers and millions of parameters often make CNNs computationally expensive to train, requiring extensive time and manual tuning to discover optimal architectures. In this paper, we introduce a novel framework for boosting CNN performance that integrates dynamic feature selection with the principles of BoostCNN. Our approach incorporates two key strategies, subgrid selection and importance sampling, to guide training toward informative regions of the feature space. We further develop a family of algorithms that embed boosting weights directly into the network training process using a least squares loss formulation. This integration not only alleviates the burden of manual architecture design but also enhances accuracy and efficiency. Experimental results across several fine-grained classification benchmarks demonstrate that our boosted CNN variants consistently outperform conventional CNNs in both predictive performance and training speed.

通过利用深层建筑的等级特征学习,连带神经网络(CNN)在一系列广泛的机器学习任务中取得了显著成功。然而,大量层次和数以百万计的参数往往使CNN计算费用昂贵,需要大量时间和人工调整才能发现最佳建筑。在本文中,我们引入了一个促进CNN性能的新框架,将动态特征选择与BOustCNN原则相结合。我们的方法包含两个关键战略:子网格选择和重要性抽样,以指导对地貌空间信息丰富的区域的培训。我们进一步开发了一套算法,利用最小方形损失公式将提升重量直接嵌入网络培训过程。这种集法不仅减轻了手工结构设计的负担,而且提高了准确性和效率。若干精细的分类基准的实验结果表明,我们的CNN变式在预测性业绩和培训速度方面始终优于常规CNN。

Article 197

Title@2025-07-30 (3): PAF-Net: Phase-Aligned Frequency Decoupling Network for Multi-Process Manufacturing Quality Prediction

Title: PAF-Net: Phase-Aligned Frequency Decoupling Network for Multi-Process Manufacturing Quality Prediction

PAF-Net: Phase-Aligned Frequency Entkopplungsnetzwerk für die Qualitätsvorhersage in der Mehrprozessfertigung

PAF-Net:多处理制造质量预测的分阶段统一频率脱钩网络 2507.22840v1

Authors (6): Yang Luo, Haoyang Luan, Haoyun Pan, Yongquan Jia, Xiaofeng Gao, Guihai Chen

Accurate quality prediction in multi-process manufacturing is critical for industrial efficiency but is hindered by three core challenges: time-lagged process interactions, overlapping operations with mixed periodicity, and inter-process dependencies in shared frequency bands. To address these, we propose PAF-Net, a frequency-decoupled time series prediction framework with three key innovations: (1) A phase-correlation alignment method guided by frequency domain energy to synchronize time-lagged quality series, resolving temporal misalignment. (2) A frequency-independent patch attention mechanism paired with Discrete Cosine Transform (DCT) decomposition to capture heterogeneous operational features within individual series. (3) A frequency-decoupled cross attention module that suppresses noise from irrelevant frequencies, focusing exclusively on meaningful dependencies within shared bands. Experiments on 4 real-world datasets demonstrate PAF-Net’s superiority. It outperforms 10 well-acknowledged baselines, achieving 7.06% lower MSE and 3.88% lower MAE. Our code is available at https://github.com/StevenLuan904/PAF-Net-Official.

为解决这些问题,我们提出PAF-Net,这是一个频率解耦的时间序列预测框架,有三项关键创新:(1) 由频域能量引导的相位相关对齐方法,用于同步存在时间滞后的质量序列,解决时间错位问题。(2) 与离散余弦变换(DCT)分解相配合的频率独立分块注意力机制,用于捕捉单个序列内的异质操作特征。(3) 频率解耦的交叉注意力模块,用于抑制无关频率的噪声,只关注共享频带内有意义的依赖关系。在4个真实数据集上的实验显示了PAF-Net的优势。它优于10个公认的基线,MSE降低7.06%,MAE降低3.88%。我们的代码可在 https://github.com/StevenLuan904/PAF-Net-Official 查阅。
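The abstract mentions Discrete Cosine Transform (DCT) decomposition of a series. The sketch below shows, under stated assumptions, how a series can be split into additive components by masking bands of DCT coefficients. It is a generic illustration of DCT band splitting and not PAF-Net's module; the band edges and function names are hypothetical.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def split_bands(x, edges):
    """Split x into one component per coefficient band [edges[i], edges[i+1])."""
    c = dct(x)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Keep only the coefficients inside this band, zero the rest.
        masked = [c[k] if lo <= k < hi else 0.0 for k in range(len(c))]
        bands.append(idct(masked))
    return bands
```

Because the transform is linear and orthonormal, the band components sum back to the original series whenever the bands cover every coefficient index exactly once.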

Article 198

Title@2025-07-30 (3): Tapping into the Black Box: Uncovering Aligned Representations in Pretrained Neural Networks

Title: Tapping into the Black Box: Uncovering Aligned Representations in Pretrained Neural Networks

In die Black Box tappen: Uncovering Aligned Representations in Pretrained Neural Networks

进入黑盒:在培训前神经网络中实现统一代表制 2507.22832v1

Authors (1): Maciej Satkiewicz

In this paper we argue that ReLU networks learn an implicit linear model we can actually tap into. We describe that alleged model formally and show that we can approximately pull its decision boundary back to the input space with a simple modification to the backward pass. The resulting gradients (called excitation pullbacks) reveal high-resolution input- and target-specific features of remarkable perceptual alignment on a number of popular ImageNet-pretrained deep architectures. This strongly suggests that neural networks do, in fact, rely on learned interpretable patterns that can be recovered after training. Thus, our findings may have profound implications for knowledge discovery and the development of dependable artificial systems.

在本文中,我们争论说,ReLU网络学会了一种我们可以实际利用的隐性线性模型。我们正式描述了这一所谓的模型,并表明我们可以将其决定界限拉回输入空间,对后向通道进行某些简单的修改。由此形成的梯度(所谓的“引力回引”)揭示出一些广受欢迎的图像网络先入为主的深层结构的高度清晰的输入和目标特定的特征。这强烈地表明神经网络事实上依赖于经过培训后可以恢复的可解释的已知模式。因此,我们的调查结果可能对知识的发现和可靠人工系统的开发产生深远影响。

Article 199

Title@2025-07-30 (3): The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation for Healthcare QA

Title: The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation for Healthcare QA

Die Geometrie der Abfragen: Query-based Innovationen in der retrieval-Augmented Generation für Healthcare QA

查询的几何学:以查询为基础的求取-养代保健创新 2407.18044v2

Authors (5): Eric Yang, Jonathan Amar, Jong Ha Lee, Bhawesh Kumar, Yugang Jia

Deploying Large Language Models (LLMs) for healthcare question answering requires robust methods to ensure accuracy and reliability. This work introduces Query-Based Retrieval Augmented Generation (QB-RAG), a framework for enhancing Retrieval-Augmented Generation (RAG) systems in healthcare question-answering by pre-aligning user queries with a database of curated, answerable questions derived from healthcare content. A key component of QB-RAG is an LLM-based filtering mechanism that ensures that only relevant and answerable questions are included in the database, enabling reliable reference query generation at scale. We provide theoretical motivation for QB-RAG, conduct a comparative analysis of existing retrieval enhancement techniques, and introduce a generalizable, comprehensive evaluation framework that assesses both the retrieval effectiveness and the quality of the generated response based on faithfulness, relevance, and adherence to the guideline. Our empirical evaluation on a healthcare data set demonstrates the superior performance of QB-RAG compared to existing retrieval methods, highlighting its practical value in building trustworthy digital health applications for health question-answering.

用于保健问题解答的大型语言模型(LLMs)的部署,要求采取强有力的方法确保准确性和可靠性。这项工作引入了基于查询的检索增强型(QB-RAG),这是在保健问题解答中加强检索增强型(RAG)系统的一个框架,通过预先调整用户查询,以一个基于保健内容的可回答的整理问题数据库来回答保健问题。QB-RAG的一个关键组成部分是一个基于LLM的过滤机制,它确保数据库中只包含相关和可回答的问题,从而能够大规模地生成可靠的参考查询。我们为QB-RAG提供了理论动力,对现有检索增强技术进行了比较分析,并引入了一个笼统和全面的评价框架,根据忠诚性、相关性和对准则的遵守情况,评估了检索效果和质量。我们对一套保健数据集的经验评价表明,QB-RAG与现有的检索方法相比,其业绩优优异,突出其在为健康问题解答建立可靠的数字健康应用方面的实际价值。
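The pre-alignment step described above, matching a user query against a database of curated answerable questions, can be sketched as follows. This is a toy illustration only: a bag-of-words cosine similarity stands in for whatever embedding model QB-RAG actually uses, the LLM-based filtering step is omitted, and the index entries and passage ids are hypothetical.

```python
import math
from collections import Counter

def _vec(text):
    """Bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(user_query, question_index, top_k=1):
    """Rank pre-generated questions by similarity to the query and
    return the ids of the passages they were derived from."""
    q = _vec(user_query)
    scored = sorted(question_index,
                    key=lambda item: _cosine(q, _vec(item[0])),
                    reverse=True)
    return [passage_id for _, passage_id in scored[:top_k]]

# Hypothetical index: (pre-generated answerable question, id of its source passage).
index = [
    ("how often should blood pressure be checked", "passage-bp"),
    ("what foods raise blood sugar", "passage-diet"),
]
```

The design point is that the query is compared with questions rather than with raw passages, so both sides of the similarity computation have the same form.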

Article 200

Title@2025-07-30 (3): Repetition Makes Perfect: Recurrent Graph Neural Networks Match Message Passing Limit

Title: Repetition Makes Perfect: Recurrent Graph Neural Networks Match Message Passing Limit

Wiederholung macht perfekt: Recurrent Graph Neural Networks Match Message Passing Limit

翻版完美:经常图表神经网络符合信件传递限制 2505.00291v2

Authors (2): Eran Rosenbluth, Martin Grohe

We precisely characterize the expressivity of computable Recurrent Graph Neural Networks (recurrent GNNs). We prove that recurrent GNNs with finite-precision parameters, sum aggregation, and ReLU activation, can compute any graph algorithm that respects the natural message-passing invariance induced by the Color Refinement (or Weisfeiler-Leman) algorithm. While it is well known that the expressive power of GNNs is limited by this invariance [Morris et al., AAAI 2019; Xu et al., ICLR 2019], we establish that recurrent GNNs can actually match this limit. This is in contrast to non-recurrent GNNs, which have the power of Weisfeiler-Leman only in a very weak, “non-uniform”, sense where each graph size requires a different GNN to compute with. Our construction introduces only a polynomial overhead in both time and space. Furthermore, we show that by incorporating random initialization, for connected graphs recurrent GNNs can express all graph algorithms. In particular, any polynomial-time graph algorithm can be emulated on connected graphs in polynomial time by a recurrent GNN with random initialization.

我们精确地描述可计算经常图形神经网络(常量 GNNs) 的表达性。我们证明, 带有有限精度参数、汇总和 ReLU 激活的经常性 GNNs 能够计算出任何尊重由彩色精度( 或 Weisfeiler- Leman) 算法引发的自然传递信息变化的图形算法。虽然我们众所周知, GNS 的表达力受到这种变化的限制[Morris 等, AAAI 2019; Xu 等 , ICLR 2019], 我们确定经常性 GNNS 能够真正符合这一限制。这与非经常性的 GNNNS 相悖, 后者只有在非常弱的“ 非统一” 感官才具有Weisfeiler- Leman的能量。每个图形大小都要求不同的 GNNN( GNN) 来进行计算。我们的构造在时间和空间都只引入一个多数值管理器。此外, 我们通过随机初始化, 将 GNNNNPs 的经常性图形纳入随机初始化可表达所有图表的图形算法。
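For readers unfamiliar with the Color Refinement (1-dimensional Weisfeiler-Leman) algorithm referenced above, here is a minimal sketch of the standard procedure. It starts from a uniform coloring and repeatedly refines each node's color by the multiset of its neighbours' colors until the partition stops changing. The adjacency-dictionary input format is an assumption of this example.

```python
def color_refinement(adj):
    """Return a stable coloring; adj maps each node to a list of its neighbours."""
    colors = {v: 0 for v in adj}  # uniform initial coloring
    while True:
        # Signature: own color plus the sorted multiset of neighbour colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        # Relabel signatures with small integers in a canonical (sorted) order.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in adj}
        # Refinement only splits classes, so an unchanged class count means stability.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
```

A well-known limitation, which motivates the invariance discussed in the abstract, is that a 6-cycle and two disjoint triangles receive identical color histograms, so message passing based on this refinement cannot tell them apart.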

Article 201

Title@2025-07-30 (3): Mitigating loss of variance in ensemble data assimilation: machine learning-based and distance-free localization

Title: Mitigating loss of variance in ensemble data assimilation: machine learning-based and distance-free localization

Verringerung des Varianzverlustes in der Ensembledatenassimilation: maschinelle Learning-basierte und distanzfreie Lokalisierung

减缓在数据共化方面差异的损失:机械学习和远距离本地化 2506.13362v2

Authors (3): Vinicius L. S. Silva, Gabriel S. Seabra, Alexandre A. Emerick

We propose two new methods, based on or inspired by machine learning for tabular data and distance-free localization, to enhance the covariance estimations in ensemble data assimilation. The main goal is to enhance the data assimilation results by mitigating loss of variance due to sampling errors. We also analyze the suitability of several machine learning models and the balance between accuracy and computational cost of the covariance estimations. We introduce two distance-free localization techniques leveraging machine learning methods specifically tailored for tabular data. The methods are integrated into the Ensemble Smoother with Multiple Data Assimilation (ES-MDA) framework. The results show that the proposed localizations improve covariance accuracy and enhance data assimilation and uncertainty quantification results. We observe reduced variance loss for the input variables using the proposed methods. Furthermore, we compare several machine learning models, assessing their suitability for the problem in terms of computational cost and the quality of the covariance estimation and data match. The influence of ensemble size is also investigated, providing insights into balancing accuracy and computational efficiency. Our findings demonstrate that certain machine learning models are more suitable for this problem. This study introduces two novel methods that mitigate variance loss for model parameters in ensemble-based data assimilation, offering practical solutions that are easy to implement and do not require any additional numerical simulation or hyperparameter tuning.

我们提出基于表格数据和无距离本地化的机械学习的两种新方法,这些方法基于表格数据和无距离本地化的机器学习,以加强混合数据同化框架中的共变估计。主要目标是通过减少抽样错误造成的差异损失,加强数据同化结果。我们还分析若干机器学习模型的适宜性,以及调和共变估计的准确性和计算成本之间的平衡。我们采用两种利用专门为表格数据定制的机器学习方法的无距离本地化技术。这些方法被融入了与多重数据同化(ES-MDA)相结合的混合平滑度框架。结果显示,拟议的本地化提高了同化的精确性,提高了数据同化和不确定性的量化结果。我们观察到,使用拟议方法减少了输入变量的差异损失。此外,我们比较了几个机器学习模型,评估它们在计算成本、异性估计质量和数据匹配方面是否适合问题。还调查了组合规模的影响,为平衡准确性和计算效率提供了深刻的洞察力。我们的研究结果表明,某些机器学习模型更适合这一问题。我们观察到了使用拟议方法减少输入输入变量的差异。此外,还采用了两种新式的模型,以降低损失的方法。
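For context, the place where localization enters an ES-MDA update can be written as below. This is the commonly used localized form of the ES-MDA analysis step, given here as a sketch; the symbols are assumptions of this note and are not taken from the paper, and the abstract does not say how the proposed machine-learning methods construct the localization matrix $R$.

$$
m_j^{a} = m_j^{f} + \left(R \circ K\right)\left(d_{\mathrm{obs}} + \sqrt{\alpha}\, e_j - d_j^{f}\right),
\qquad
K = C_{md}\left(C_{dd} + \alpha\, C_{e}\right)^{-1}
$$

Here $m_j^{f}$ and $m_j^{a}$ are the forecast and updated parameters of ensemble member $j$, $d_j^{f}$ is its predicted data, $e_j \sim \mathcal{N}(0, C_e)$ is the observation-error perturbation, $\alpha$ is the inflation coefficient of the current assimilation step, $C_{md}$ and $C_{dd}$ are ensemble-estimated covariances, and $\circ$ denotes the element-wise (Schur) product. Sampling errors in $C_{md}$ from a small ensemble are what cause the variance loss that localization is meant to reduce.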

Article 202

Title@2025-07-30 (3): Quantifying surprise in clinical care: Detecting highly informative events in electronic health records with foundation models

Title: Quantifying surprise in clinical care: Detecting highly informative events in electronic health records with foundation models

Quantifying surprise in clinical care: Erkennung von hochinformativen Ereignissen in elektronischen Gesundheitsakten mit Fundamentmodellen

将临床护理的意外事件量化:用基础模型检测电子健康记录中的高度信息化事件 2507.22798v1

Authors (5): Michael C. Burkhart, Bashar Ramadan, Luke Solo, William F. Parker, Brett K. Beaulieu-Jones

We present a foundation model-derived method to identify highly informative tokens and events in electronic health records. Our approach considers incoming data in the entire context of a patient’s hospitalization and so can flag anomalous events that rule-based approaches would consider within a normal range. We demonstrate that the events our model flags are significant for predicting downstream patient outcomes and that a fraction of events identified as carrying little information can safely be dropped. Additionally, we show how informativeness can help interpret the predictions of prognostic models trained on foundation model-derived representations.

我们提出一种基础示范方法,用以确定电子健康记录中信息量高的代用品和事件;我们的方法考虑在病人住院的整个背景下获得的数据,从而能够标出以规则为基础的方法在正常范围内会考虑的异常事件;我们证明我们的示范旗帜对于预测下游病人的结果具有重要意义,并且可以安全地放弃一部分被确定为信息很少的事件;此外,我们表明信息性如何有助于解释根据以模式为基础的代表方式培训的预测性模型的预测。
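One plausible way to make "informativeness" concrete is Shannon surprisal, the negative log-probability a model assigns to an event given its context. The abstract does not state the exact measure used, so the sketch below is an assumption: the probabilities would come from a foundation model's predictive distribution, which is not modelled here, and the threshold is a hypothetical parameter.

```python
import math

def surprisal_bits(prob):
    """Information content, in bits, of an event the model assigned probability prob."""
    return -math.log2(prob)

def flag_informative(events, probs, threshold_bits):
    """Return the events whose surprisal meets or exceeds the threshold."""
    return [e for e, p in zip(events, probs)
            if surprisal_bits(p) >= threshold_bits]
```

Events falling below the threshold are the candidates that, in the abstract's terms, carry little information and might be dropped.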

Article 203

Title@2025-07-30 (3): Towards the Law of Capacity Gap in Distilling Language Models

Title: Towards the Law of Capacity Gap in Distilling Language Models

Auf dem Weg zum Gesetz der Kapazitätslücke bei der Destillierung von Sprachmodellen

迈向《语文模式再学习能力差距法》 2311.07052v4

Authors (6): Chen Zhang, Qiuchi Li, Dawei Song, Zheyu Ye, Yan Gao, Yan Hu

Language model (LM) distillation aims to distill the knowledge in a large teacher LM into a small student one. A critical issue facing LM distillation is that a superior student often arises from a teacher of a relatively small scale rather than a larger one, especially in the presence of a substantial capacity gap between teacher and student. This issue, often referred to as the curse of capacity gap, suggests that there is likely an optimal teacher yielding the best-performing student along the scaling course of the teacher. Consequently, distillation trials on teachers of a wide range of scales are needed to determine the optimal teacher, which becomes computationally intensive in the context of large LMs (LLMs). This paper addresses this critical bottleneck by providing the law of capacity gap, induced from a preliminary study on distilling a broad range of small-scale (<3B) LMs, where the optimal teacher scale consistently grows linearly with the student scale across different model and data scales. By extending the law to LLM distillation on a larger scale (7B), we succeed in obtaining versatile LLMs that outperform a wide array of competitors.

语言模型(LM)蒸馏法(LM)旨在将大型教师LM中的知识蒸馏成小学生的大型LM中的知识。作为LM蒸馏所面临的一个关键问题,高级学生往往来自规模相对较小而不是较大规模的教师,特别是在教师与学生之间能力差距很大的情况下。这个问题通常被称为能力差距的缩放管 , 表明在教师规模扩大过程中,很可能有一个最优秀的教师, 产生成绩最好的学生。因此, 需要对各种规模的教师进行蒸馏试验,以确定在大型LMS(LLMs)中具有计算密集性的最佳教师。本文通过在对广泛小规模( < 3B > ) LMS(<3B] LMS)进行的初步研究中引入的\ textitit{能力差距法} 来解决这一关键瓶颈问题, 最佳教师与不同模式和数据尺度的学生比例一致, 将法律推广到LMTM(7B)的蒸馏法, 通过在更大的规模(7B)中将法律推广到LM(LM)中,我们成功地获得了超越了多种磁体的磁体竞争阵列。

Article 204

Title@2025-07-30 (3): Amorphous Solid Model of Vectorial Hopfield Neural Networks

Title: Amorphous Solid Model of Vectorial Hopfield Neural Networks

Amorphes solides Modell von Vectorial Hopfield Neural Networks

矢量跳式浮式神经网络固态模型 2507.22787v1

Authors (2): F. Gallavotti, A. Zaccone

We present a vectorial extension of the Hopfield associative memory model inspired by the theory of amorphous solids, where binary neural states are replaced by unit vectors $\mathbf{s}_i \in \mathbb{R}^3$ on the sphere $S^2$. The generalized Hebbian learning rule creates a block-structured weight matrix through outer products of stored pattern vectors, analogous to the Hessian matrix structure in amorphous solids. We demonstrate that this model exhibits quantifiable structural properties characteristic of disordered materials: energy landscapes with deep minima for stored patterns versus random configurations (energy gaps $\sim 7$ units), strongly anisotropic correlations encoded in the weight matrix (anisotropy ratios $\sim 10^2$), and order-disorder transitions controlled by the pattern density $\gamma = P/(N \cdot d)$. The enhanced memory capacity ($\gamma_c \approx 0.55$ for a fully-connected network) compared to binary networks ($\gamma_c \approx 0.138$) and the emergence of orientational correlations establish connections between associative memory mechanisms and amorphous solid physics, particularly in systems with continuous orientational degrees of freedom. We also unveil the scaling with the coordination number $Z$ of the memory capacity: $\gamma_c \sim (Z-6)$ from the isostatic point $Z_c =6$ of the 3D elastic network, which closely mirrors the scaling of the shear modulus $G \sim (Z-6)$ in 3D central-force spring networks.

我们展示了一个由不固定固体理论启发的Hopfield连带内存模型的矢量扩展。这个模型展示了由不固定固体理论所启发的Hopfield 组合内存模型的可量化结构特性: 存储模式与随机配置的深度微型能源景观( 能源缺口 $\sim 7$) 取代二进神经状态, 在重量矩阵中编码的强烈反向关系( 氮比率 $s%2$) 。普通的 Hebbian 学习规则通过存储模式矢量的外产产品创建了块状结构式重力矩阵, 类似于在不固定固体固体的基质结构结构结构。这个模型展示了一个完全连接的网络的可量化结构特性: 与二进制网络的深度微型微型小行星网络( $38\simmal_calal 方向) 和连续的硬化系统( ASimlimalalalalalalalalal_Appolation Z) 。
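The Hebbian construction described above, a block-structured weight matrix built from outer products of stored unit-vector patterns, can be sketched in a few lines. The $1/N$ normalisation, the zeroed diagonal blocks, and the quadratic energy $E = -\tfrac{1}{2}\sum_{i \neq j} \mathbf{s}_i^\top W_{ij}\, \mathbf{s}_j$ are assumptions of this example; the paper may use different conventions.

```python
import math

def unit(v):
    """Normalise a 3-vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def hebbian_blocks(patterns):
    """W[i][j] is the 3x3 block (1/N) * sum over patterns of outer(xi_i, xi_j);
    diagonal blocks (i == j) are left at zero."""
    n = len(patterns[0])
    W = [[[[0.0] * 3 for _ in range(3)] for _ in range(n)] for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for a in range(3):
                    for b in range(3):
                        W[i][j][a][b] += xi[i][a] * xi[j][b] / n
    return W

def energy(W, s):
    """E = -1/2 * sum over i != j of s_i^T W_ij s_j."""
    n = len(s)
    total = 0.0
    for i in range(n):
        for j in range(n):
            for a in range(3):
                for b in range(3):
                    total += s[i][a] * W[i][j][a][b] * s[j][b]
    return -0.5 * total
```

Under these conventions a single stored pattern of $N$ unit vectors has energy $-(N-1)/2$, while configurations that are misaligned with the pattern sit higher, which is the kind of energy gap the abstract refers to.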

Article 205

Title@2025-07-30 (3): DO-EM: Density Operator Expectation Maximization

Title: DO-EM: Density Operator Expectation Maximization

DO-EM: Dichte-Operator-Erwartungsmaximierung

DO-EM: 密度操作员预期最大化 2507.22786v1

Authors (4): Adit Vishnu, Abhay Shastry, Dhruva Kashyap, Chiranjib Bhattacharyya

Density operators, quantum generalizations of probability distributions, are gaining prominence in machine learning due to their foundational role in quantum computing. Generative modeling based on density operator models (DOMs) is an emerging field, but existing training algorithms, such as those for the Quantum Boltzmann Machine, do not scale to real-world data such as the MNIST dataset. The Expectation-Maximization algorithm has played a fundamental role in enabling scalable training of probabilistic latent variable models on real-world datasets. In this paper, we develop an Expectation-Maximization framework to learn latent variable models defined through DOMs on classical hardware, with resources comparable to those used for probabilistic models, while scaling to real-world data. However, designing such an algorithm is nontrivial due to the absence of a well-defined quantum analogue to conditional probability, which complicates the Expectation step. To overcome this, we reformulate the Expectation step as a quantum information projection (QIP) problem and show that the Petz Recovery Map provides a solution under sufficient conditions. Using this formulation, we introduce the Density Operator Expectation Maximization (DO-EM) algorithm, an iterative Minorant-Maximization procedure that optimizes a quantum evidence lower bound. We show that the DO-EM algorithm ensures non-decreasing log-likelihood across iterations for a broad class of models. Finally, we present Quantum Interleaved Deep Boltzmann Machines (QiDBMs), a DOM that can be trained with the same resources as a DBM. When trained with DO-EM under Contrastive Divergence, a QiDBM outperforms larger classical DBMs in image generation on the MNIST dataset, achieving a 40–60% reduction in the Fréchet Inception Distance.

密度运算符,即概率分布的量性60的典型化运算符,正在机器学习中越来越显眼,这是因为他们在量子计算中的基本作用。基于密度运算模型(\ textbf{DOMM})的生成模型是一个新兴的字段,但现有的培训算法,例如Quantum Boltzmann 机器的这种算法,并不与真实世界的数据相适应,例如MNIST数据集。期望-最大化算法在使真实世界数据集中概率潜伏变量模型的可扩缩培训方面发挥了根本作用。在本文中,我们开发了一个期待-最大模型,以学习通过古典硬件\ textbf{DOM}确定的潜在变量模型,而将资源与用于概率模型的资源相仿,同时缩放到真实世界数据中。但是,设计这样的算法并不令人触动,因为缺乏精确的量模拟,使得期待步骤更为复杂。为了克服这一点,我们重新将期待的步伐作为定量信息投影化过程,我们正在使用一个基础数据化的快速化模型。

Article 206

Title@2025-07-30 (3): Effective Non-Random Extreme Learning Machine

Title: Effective Non-Random Extreme Learning Machine

Effektive Non-Random Extreme Lernmaschine

有效的非兰地极端学习机 2411.16229v2

Authors (2): Daniela De Canditiis, Fabiano Veglianti

The Extreme Learning Machine (ELM) is a growing statistical technique widely applied to regression problems. In essence, ELMs are single-layer neural networks where the hidden layer weights are randomly sampled from a specific distribution, while the output layer weights are learned from the data. Two of the key challenges with this approach are the architecture design, specifically determining the optimal number of neurons in the hidden layer, and the method’s sensitivity to the random initialization of hidden layer weights. This paper introduces a new and enhanced learning algorithm for regression tasks, the Effective Non-Random ELM (ENR-ELM), which simplifies the architecture design and eliminates the need for random hidden layer weight selection. The proposed method incorporates concepts from signal processing, such as basis functions and projections, into the ELM framework. We introduce two versions of the ENR-ELM: the approximated ENR-ELM and the incremental ENR-ELM. Experimental results on both synthetic and real datasets demonstrate that our method overcomes the problems of traditional ELM while maintaining comparable predictive performance.

极端学习机(ELM)是一个不断发展的统计技术,广泛应用于回归问题。本质上,ELM(ELM)是单层神经网络,隐藏层重量是从特定分布中随机抽取的,而产出层重量则从数据中学习。这一方法的两个主要挑战是结构设计,具体确定隐藏层中神经元的最佳数量,以及该方法对隐性层重量随机初始化的敏感性。本文为回归任务引入了新的强化学习算法,有效的非兰多姆ELM(ENR-ELM)简化了结构设计,并消除了随机隐性层重量选择的需要。拟议方法将信号处理的概念,例如基础功能和预测,纳入ELM框架。我们引入了ENR-ELM的两个版本:近似的ENR-ELM和渐进式的ENR-ELM。合成和真实数据集的实验结果表明,我们的方法克服了传统的ELM问题,同时保持可比较的预测性能。
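To make the baseline concrete, here is a minimal sketch of a standard (random) ELM for regression as described in the abstract: hidden weights are sampled once and frozen, and only the output weights are fitted by least squares. This is the conventional ELM, not the proposed ENR-ELM. The tanh activation, the Gaussian sampling, the small ridge term, and all function names are assumptions of this example.

```python
import math
import random

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x

def fit_elm(X, y, hidden=20, ridge=1e-6, seed=0):
    """Random, frozen hidden layer; output weights by ridge least squares."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(hidden)]
    b = [rng.gauss(0, 1) for _ in range(hidden)]

    def features(x):
        return [math.tanh(sum(w * xi for w, xi in zip(W[h], x)) + b[h])
                for h in range(hidden)]

    H = [features(x) for x in X]
    # Normal equations: (H^T H + ridge * I) beta = H^T y
    G = [[sum(row[i] * row[j] for row in H) + (ridge if i == j else 0.0)
          for j in range(hidden)] for i in range(hidden)]
    rhs = [sum(row[i] * t for row, t in zip(H, y)) for i in range(hidden)]
    beta = solve(G, rhs)
    return lambda x: sum(bk * hk for bk, hk in zip(beta, features(x)))
```

Changing `seed` or `hidden` changes the fitted model, which illustrates the two sensitivities the abstract names: the choice of hidden-layer size and the dependence on random initialization.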

Article 207

Title@2025-07-30 (3): Label-free estimation of clinically relevant performance metrics under distribution shifts

Title: Label-free estimation of clinically relevant performance metrics under distribution shifts

Labelfreie Schätzung klinisch relevanter Leistungsmetriken unter Verteilungsverschiebungen

无标签地估算分布转移中与临床相关的绩效衡量指标 2507.22776v1

Authors (4): Tim Flühmann, Alceu Bissoto, Trung-Dung Hoang, Lisa M. Koch

Performance monitoring is essential for safe clinical deployment of image classification models. However, because ground-truth labels are typically unavailable in the target dataset, direct assessment of real-world model performance is infeasible. State-of-the-art performance estimation methods address this by leveraging confidence scores to estimate the target accuracy. Despite being a promising direction, the established methods mainly estimate the model’s accuracy and are rarely evaluated in a clinical domain, where strong class imbalances and dataset shifts are common. Our contributions are twofold: First, we introduce generalisations of existing performance prediction methods that directly estimate the full confusion matrix. Then, we benchmark their performance on chest x-ray data in real-world distribution shifts as well as simulated covariate and prevalence shifts. The proposed confusion matrix estimation methods reliably predicted clinically relevant counting metrics on medical images under distribution shifts. However, our simulated shift scenarios exposed important failure modes of current performance estimation techniques, calling for a better understanding of real-world deployment contexts when implementing these performance monitoring techniques for postmarket surveillance of medical AI models.

然而,由于目标数据集中通常没有地面实况标签,直接评估真实世界模型的性能是行不通的。最先进的性能估计方法通过利用信心分数来估计目标准确性来解决这个问题。尽管这是一个有希望的方向,但既定方法主要是估计模型的准确性,很少在临床领域对它进行评估,因为临床领域存在着严重的阶级不平衡和数据集变化。我们的贡献是双重的:首先,我们采用现有性能预测方法的概括性,直接估计完全混乱的矩阵。然后,我们将它们的性能以真实世界分布变化中的胸前X光数据以及模拟的 Covary和流行性变化作为基准。拟议的混乱性矩阵估计方法可靠地预测了在分布变化中医疗图像的临床相关计数。然而,我们模拟的转移假设暴露了当前性能估测技术的重要失败模式,呼吁在应用这些性能监测技术对医学AI模型进行后期监测时,更好地了解现实世界部署情况。
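The step from confidence-based accuracy estimation to confusion matrix estimation can be illustrated with a small sketch. Average confidence estimates accuracy as the mean top-class probability; one natural generalisation, assumed here and not necessarily the authors' exact estimator, accumulates each sample's full probability vector into the column of its predicted class. Both rely on the probabilities being well calibrated on the target data.

```python
def estimate_accuracy(probs):
    """Average confidence: mean of the top predicted probability."""
    return sum(max(p) for p in probs) / len(probs)

def estimate_confusion_matrix(probs):
    """Expected counts without labels: entry [true][pred] accumulates
    P(true | x) over all samples whose predicted class is pred."""
    k = len(probs[0])
    cm = [[0.0] * k for _ in range(k)]
    for p in probs:
        pred = max(range(k), key=lambda c: p[c])
        for true in range(k):
            cm[true][pred] += p[true]
    return cm
```

The diagonal of the estimated matrix divided by the number of samples recovers the average-confidence accuracy, and counting metrics such as sensitivity or precision can be read off the estimated rows and columns.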

Article 208

Title@2025-07-30 (3): Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection

Title: Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection

Empirische Bewertung von Konzept Drift in ML-basierte Android Malware-Erkennung

对以ML为基体和机器人毛虫探测中的概念漂流进行经验评估 2507.22772v1

Authors (4): Ahmed Sabbah, Radi Jarrar, Samer Zein, David Mohaisen

Despite outstanding results, machine learning-based Android malware detection models struggle with concept drift, where rapidly evolving malware characteristics degrade model effectiveness. This study examines the impact of concept drift on Android malware detection, evaluating two datasets and nine machine learning and deep learning algorithms, as well as Large Language Models (LLMs). Various feature types (static, dynamic, hybrid, semantic, and image-based) were considered. The results showed that concept drift is widespread and significantly affects model performance. Factors influencing the drift include feature types, data environments, and detection methods. Balancing algorithms helped with class imbalance but did not fully address concept drift, which primarily stems from the dynamic nature of the malware landscape. No strong link was found between the type of algorithm used and concept drift; its impact was relatively minor compared to other variables, since hyperparameters were not fine-tuned and the default algorithm configurations were used. While LLMs using few-shot learning demonstrated promising detection performance, they did not fully mitigate concept drift, highlighting the need for further investigation.

尽管取得了突出的成果,但基于机器的学习和机器人的恶意软件检测模型与概念漂移有关,而迅速演变的恶意软件特征会降低模型的有效性。本研究审查了概念漂移对安卓的恶意软件检测的影响,评估了两个数据集和九个机器学习和深层次学习算法以及大语言模型(LLMs),考虑了各种特征类型 – – 静态、动态、动态、混合、语义和基于图像的模型。结果显示概念漂移很普遍,对模型性能有重大影响。影响漂移的因素包括特征类型、数据环境和检测方法。平衡算法有助于阶级失衡,但没有完全解决概念漂移问题,这主要来自恶意软件环境的动态性质。在使用的算法类型和概念漂移之间没有找到强有力的联系,这种影响与其他变量相比是相对较小的,因为超参数没有进行微调,而且使用了默认的算法配置。使用微小的学习显示有希望的检测性能,但它们没有完全减轻概念的漂移,突出了进一步调查的必要性。

Article 209

Title@2025-07-30 (3): The Effect of Stochasticity in Score-Based Diffusion Sampling: a KL Divergence Analysis

Title: The Effect of Stochasticity in Score-Based Diffusion Sampling: a KL Divergence Analysis

Die Wirkung der Stochastik bei der Score-Based Diffusion Sampling: eine KL Divergence Analyse

存储在基于分分数的传播抽样中的效果:KL差异分析 2506.11378v2

Authors (3): Bernardo P. Schaeffer, Ricardo M. S. Rosa, Glauco Valle

Sampling in score-based diffusion models can be performed by solving either a reverse-time stochastic differential equation (SDE) parameterized by an arbitrary time-dependent stochasticity parameter or a probability flow ODE, corresponding to the stochasticity parameter set to zero. In this work, we study the effect of this stochasticity on the generation process through bounds on the Kullback-Leibler (KL) divergence, complementing the analysis with numerical and analytical examples. Our main results apply to linear forward SDEs with additive noise and Lipschitz-continuous score functions, and quantify how errors from the prior distribution and score approximation propagate under different choices of the stochasticity parameter. The theoretical bounds are derived using log-Sobolev inequalities for the marginals of the forward process, which enable a more effective control of the KL divergence decay along sampling. For exact score functions, we find that stochasticity acts as an error-correcting mechanism, decreasing KL divergence along the sampling trajectory. For an approximate score function, there is a trade-off between error correction and score error amplification, so that stochasticity can either improve or worsen the performance, depending on the structure of the score error. Numerical experiments on simple datasets and a fully analytical example are included to illustrate and enlighten the theoretical results.

在基于分数的传播模型中,可以通过下列方法进行抽样:通过任意的基于时间的随机切分参数参数或概率流值值参数(与设定为零的随机切分参数相对),解决反时间的随机切分方程式参数(SDE)参数;在这项工作中,我们通过Kullback-Leibertr(KL)差异的界限,研究这种随机性对生成过程的影响;用数字和分析实例来补充分析。我们的主要结果适用于带有添加噪音和利普西茨持续得分函数的线性推进SDE(SDE),并量化先前分布和分数近似值错误如何在随机切分参数的不同选择下传播。理论界限是通过对前方过程边缘的日志-博值不平等推导出,从而能够更有效地控制KL差异在取样过程中的衰变。关于精确的评分功能,我们发现,随机度作为纠正错误的机制,在取样轨迹上减少KLIL差异。对于一个近似的评分函数,在先前的分布和分差差差差值中,在对前位参数的分布和分差差差值的分布进行量化之间如何传播,在选择的分布之间是如何在选择中传播差差差差差差差和分,因此将分析性判断和评分中可以包括:对评分率性、分率性判断性判断性判断性分析性分析性分析性、分析性分析性分析性实验,从而可以改进性分析性分析性、分。
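The family of samplers described above can be written out explicitly. The following is the standard one-parameter family of reverse-time dynamics for a linear forward SDE with additive noise, given as a sketch; the notation, including the stochasticity parameter $\lambda(t)$, is an assumption of this note and may differ from the paper's parameterization.

$$
\mathrm{d}X_t = f(t)\,X_t\,\mathrm{d}t + g(t)\,\mathrm{d}W_t
$$

$$
\mathrm{d}Y_t = \left[f(t)\,Y_t - \frac{1+\lambda(t)^2}{2}\,g(t)^2\,\nabla \log p_t(Y_t)\right]\mathrm{d}t + \lambda(t)\,g(t)\,\mathrm{d}\bar{W}_t
$$

The second equation is integrated backward in time from $t = T$ to $t = 0$, with $\bar{W}_t$ a reverse-time Brownian motion and $p_t$ the marginal density of the forward process. Setting $\lambda \equiv 0$ gives the probability flow ODE, $\lambda \equiv 1$ gives the usual reverse-time SDE, and every choice of $\lambda$ shares the same marginals when the exact score is used. In practice $\nabla \log p_t$ is replaced by a learned approximation, which is where the trade-off between error correction and score error amplification arises.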

Article 210

Title@2025-07-30 (3): Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization

Title: Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization

Lehren des Lehrers: Verbesserung der Neuralen Netzwerk-Destillierbarkeit für symbolische Regression durch Jacobian Regularisierung

教师教学:通过雅各的正规化,提高神经网络的可固化性 2507.22767v1

Authors (3): Soumyadeep Dhar, Kei Sen Fong, Mehul Motani

Distilling large neural networks into simple, human-readable symbolic formulas is a promising path toward trustworthy and interpretable AI. However, this process is often brittle, as the complex functions learned by standard networks are poor targets for symbolic discovery, resulting in low-fidelity student models. In this work, we propose a novel training paradigm to address this challenge. Instead of passively distilling a pre-trained network, we introduce a Jacobian-based regularizer that actively encourages the "teacher" network to learn functions that are not only accurate but also inherently smoother and more amenable to distillation. We demonstrate through extensive experiments on a suite of real-world regression benchmarks that our method is highly effective. By optimizing the regularization strength for each problem, we improve the $R^2$ score of the final distilled symbolic model by an average of 120% (relative) compared to the standard distillation pipeline, all while maintaining the teacher's predictive accuracy. Our work presents a practical and principled method for significantly improving the fidelity of interpretable models extracted from complex neural networks.

将大型神经网络蒸馏为简单、人类可读的符号公式,是通往可信且可解释的AI的一条有希望的道路。然而,这一过程往往很脆弱,因为标准网络学到的复杂函数并不适合作为符号发现的目标,导致学生模型保真度低。在这项工作中,我们提出了一种应对这一挑战的新训练范式。我们不是被动地蒸馏一个预训练网络,而是引入一个基于雅可比矩阵的正则化项,主动鼓励“教师”网络学习不仅准确、而且本质上更平滑、更易于蒸馏的函数。我们通过在一组真实世界回归基准上的大量实验表明,我们的方法非常有效。通过针对每个问题优化正则化强度,与标准蒸馏流程相比,我们将最终蒸馏出的符号模型的$R^2$分数平均提高了120%(相对值),同时保持了教师的预测精度。我们的工作提出了一种实用且有原则的方法,可显著提高从复杂神经网络中提取的可解释模型的保真度。
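
To make the idea of a Jacobian-based penalty concrete, here is a minimal sketch for a scalar-input, scalar-output teacher. The abstract does not give the exact form of the regularizer, so the mean squared derivative used here, the finite-difference estimate, and the names `jacobian_penalty` and `regularized_loss` are all assumptions chosen for illustration.

```python
def jacobian_penalty(f, xs, eps=1e-4):
    # Mean squared central-difference derivative of a scalar function f over xs.
    total = 0.0
    for x in xs:
        grad = (f(x + eps) - f(x - eps)) / (2.0 * eps)
        total += grad * grad
    return total / len(xs)

def regularized_loss(f, xs, ys, lam):
    # Assumed objective: mean squared error plus lam times the Jacobian penalty.
    mse = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + lam * jacobian_penalty(f, xs)

xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.5, 3.0]

def teacher(x):
    # Hypothetical teacher that fits the data exactly with slope 3.
    return 3.0 * x

print(jacobian_penalty(teacher, xs))
print(regularized_loss(teacher, xs, ys, lam=0.1))
```

In a real training loop the derivative would come from automatic differentiation rather than finite differences, and `lam` plays the role of the per-problem regularization strength mentioned in the abstract.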

Article 211

Title@2025-07-30 (3): Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models

Title: Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models

Bayesische Optimierung von Prozessparametern eines Sensor-basierten Sortiersystems unter Verwendung Gaussischer Prozesse als Surrogate-Modelle

利用高斯进程作为代位模型,优化基于传感器的排序系统的处理参数 2507.22766v1

Authors (3): Felix Kronenwett, Georg Maier, Thomas Laengle

Sensor-based sorting systems enable the physical separation of a material stream into two fractions. The sorting decision is based on evaluating the image data from the sensors and is carried out by actuators. Various process parameters must be set depending on the properties of the material stream, the dimensioning of the system, and the required sorting accuracy. However, continuous verification and re-adjustment are necessary due to changing requirements and material stream compositions. In this paper, we introduce an approach for optimizing, recurrently monitoring, and adjusting the process parameters of a sensor-based sorting system. Based on Bayesian Optimization, Gaussian process regression models are used as surrogate models to achieve specific requirements for system behavior with the uncertainties contained therein. This method minimizes the number of necessary experiments while simultaneously considering two possible optimization targets based on the requirements for both material output streams. In addition, uncertainties in determining the sorting accuracies are taken into account in the model calculation. We evaluated the method with three example process parameters.

基于传感器的分类系统能够将材料流实际分离成两个部分。分类决定基于对所用传感器的图像数据评价,并且使用动因器进行。必须根据材料流的特性、系统尺寸和所需的分类准确性确定各种过程参数。但是,由于要求的变化和材料流构成的变化,持续核查和调整是必要的。在本文件中,我们引入了优化、经常监测和调整基于传感器的分类系统的流程参数的方法。根据Bayesian Optimization, Gausian 进程回归模型被用作替代模型,以实现系统行为及其所含不确定性的具体要求。这种方法最大限度地减少必要的实验数量,同时考虑基于两种材料输出流要求的两种可能的优化目标。此外,在确定模型计算中的分拣时会考虑到不确定性。我们用三个示例程序参数对方法进行了评估。

Article 212

Title@2025-07-30 (3): Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision

Title: Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision

Von guten Dämonen und schlechten Engeln: Sichere Kontrolle unter finite Precision garantieren

善魔和坏天使:在有限精密情况下保证安全控制 2507.22760v1

Authors (3): Samuel Teuber, Debasmita Lohar, Bernhard Beckert

As neural networks (NNs) become increasingly prevalent in safety-critical neural network-controlled cyber-physical systems (NNCSs), formally guaranteeing their safety becomes crucial. For these systems, safety must be ensured throughout their entire operation, necessitating infinite-time horizon verification. To verify the infinite-time horizon safety of NNCSs, recent approaches leverage Differential Dynamic Logic (dL). However, these dL-based guarantees rely on idealized, real-valued NN semantics and fail to account for roundoff errors introduced by finite-precision implementations. This paper bridges the gap between theoretical guarantees and real-world implementations by incorporating robustness under finite-precision perturbations – in sensing, actuation, and computation – into the safety verification. We model the problem as a hybrid game between a good Demon, responsible for control actions, and a bad Angel, introducing perturbations. This formulation enables formal proofs of robustness w.r.t. a given (bounded) perturbation. Leveraging this bound, we employ state-of-the-art mixed-precision fixed-point tuners to synthesize sound and efficient implementations, thus providing a complete end-to-end solution. We evaluate our approach on case studies from the automotive and aeronautics domains, producing efficient NN implementations with rigorous infinite-time horizon safety guarantees.

随着神经网络(NN)在安全关键的神经网络控制信息物理系统(NNCS)中日益普及,形式化地保证其安全性变得至关重要。对于这些系统,必须在整个运行过程中确保安全,因此需要无限时域验证。为了验证NNCS的无限时域安全性,最近的方法利用了微分动态逻辑(dL)。然而,这些基于dL的保证依赖于理想化的实数值NN语义,没有考虑有限精度实现引入的舍入误差。本文将有限精度扰动(传感、执行和计算中的扰动)下的鲁棒性纳入安全验证,从而弥合理论保证与现实实现之间的差距。我们将该问题建模为负责控制动作的好恶魔与引入扰动的坏天使之间的混合博弈。这一表述使得能够形式化地证明系统对给定(有界)扰动的鲁棒性。利用这一界,我们采用最先进的混合精度定点调优器来综合可靠且高效的实现,从而提供完整的端到端解决方案。我们在汽车和航空领域的案例研究上评估了我们的方法,得到了具有严格无限时域安全保证的高效NN实现。
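
The view of round-off as a bounded perturbation can be illustrated with a small fixed-point sketch. This is not the mixed-precision tuning used in the paper. It assumes a single uniform number of fractional bits, round-to-nearest, and no overflow, and the helper names `to_fixed` and `dot_fixed` are hypothetical.

```python
def to_fixed(x, frac_bits):
    # Round x to the nearest multiple of 2**-frac_bits (no overflow handling),
    # so the rounding error is at most 2**-(frac_bits + 1).
    scale = 1 << frac_bits
    return round(x * scale) / scale

def dot_fixed(weights, inputs, frac_bits):
    # Quantize weights, inputs, each product, and each partial sum.
    acc = 0.0
    for w, x in zip(weights, inputs):
        prod = to_fixed(to_fixed(w, frac_bits) * to_fixed(x, frac_bits), frac_bits)
        acc = to_fixed(acc + prod, frac_bits)
    return acc

weights = [0.37, -1.21, 0.08]
inputs = [0.9, 0.45, -0.33]
exact = sum(w * x for w, x in zip(weights, inputs))

# Absolute deviation from the real-valued result for several word lengths.
errors = {bits: abs(dot_fixed(weights, inputs, bits) - exact) for bits in (4, 8, 16)}
print(errors)
```

Because each rounding step has a known worst-case error, the total deviation of the neuron output can be bounded in advance, which is the kind of bound a robustness proof against a given perturbation can consume.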

Article 213

Title@2025-07-30 (3): MASCA: LLM based-Multi Agents System for Credit Assessment

Title: MASCA: LLM based-Multi Agents System for Credit Assessment

MASCA: LLM-basiertes Multi-Agenten-System zur Bonitätsbeurteilung

MASCA: 以LLM为基础的信用评估多边代理系统 2507.22758v1

Authors (3): Gautam Jajoo, Pranjal A Chitale, Saksham Agarwal

Recent advancements in financial problem-solving have leveraged LLMs and agent-based systems, with a primary focus on trading and financial modeling. However, credit assessment remains an underexplored challenge, traditionally dependent on rule-based methods and statistical models. In this paper, we introduce MASCA, an LLM-driven multi-agent system designed to enhance credit evaluation by mirroring real-world decision-making processes. The framework employs a layered architecture where specialized LLM-based agents collaboratively tackle sub-tasks. Additionally, we integrate contrastive learning for risk and reward assessment to optimize decision-making. We further present a signaling game theory perspective on hierarchical multi-agent systems, offering theoretical insights into their structure and interactions. Our paper also includes a detailed bias analysis in credit assessment, addressing fairness concerns. Experimental results demonstrate that MASCA outperforms baseline approaches, highlighting the effectiveness of hierarchical LLM-based multi-agent systems in financial applications, particularly in credit scoring.

最近在金融问题解决方面的进展利用了LLMs和代理系统,主要侧重于贸易和金融模式,然而,信用评估仍是一个未得到充分探讨的挑战,传统上依赖基于规则的方法和统计模式。我们在本文件中引入了由LMCA驱动的多种代理系统,即由LMCA驱动的多种代理系统,目的是通过反映现实世界的决策进程加强信用评估。框架采用一个多层结构,由基于LLMM的专门代理机构协作应对次级任务。此外,我们将风险和奖励评估的对比性学习纳入优化决策。我们进一步展示了有关等级多试剂系统的示性游戏理论观点,提供了对其结构和互动的理论见解。我们的文件还包括了信用评估中的详细偏见分析,解决公平问题。实验结果表明,MASCA在金融应用中,特别是信用评分中,以LMM为主的多代理系统优于基线方法,突出其效力。

Article 214

Title@2025-07-30 (3): Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Title: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Adressierung von Darstellungskollapsen in Vector Quantized Models mit einer linearen Ebene

处理单线层矢量量化模型中代表折叠情况 2411.02038v2

Authors (5): Yongxin Zhu, Bocheng Li, Yifei Xin, Zhihua Xia, Linli Xu

Vector Quantization (VQ) is essential for discretizing continuous representations in unsupervised learning but suffers from representation collapse, causing low codebook utilization and limiting scalability. Existing solutions often rely on complex optimizations or reduce latent dimensionality, which compromises model capacity and fails to fully solve the problem. We identify the root cause as disjoint codebook optimization, where only a few code vectors are updated via gradient descent. To fix this, we propose SimpleVQ (SimVQ), which reparameterizes code vectors through a learnable linear transformation layer over a latent basis, optimizing the entire linear space rather than the nearest individual code vectors. Although the multiplication of two linear matrices is equivalent to applying a single linear layer, this simple approach effectively prevents collapse. Extensive experiments on image and audio tasks demonstrate that SimVQ improves codebook usage, is easy to implement, and generalizes well across modalities and architectures.

矢量量化(VQ)对于在无监督学习中离散化连续表示至关重要,但会受到表示坍缩的影响,导致码本利用率低并限制可扩展性。现有的解决方案往往依赖复杂的优化或降低潜在维度,这会损害模型容量,且无法完全解决问题。我们发现根本原因在于码本优化相互割裂,即只有少数码向量通过梯度下降得到更新。为了解决这个问题,我们提出SimpleVQ(SimVQ),它通过潜在基上的可学习线性变换层对码向量重新参数化,优化整个线性空间,而不是最近的单个码向量。虽然两个线性矩阵相乘等价于应用单个线性层,但这一简单的方法能有效防止坍缩。在图像和音频任务上的大量实验表明,SimVQ提高了码本利用率,易于实现,并且在不同模态和架构上泛化良好。
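
The contrast between updating one code vector and updating a shared linear layer can be seen on a tiny example. The sketch below assumes the effective codes are `basis @ W` with the basis kept fixed and only `W` trained, and it uses a plain squared-error objective on a single encoder output `z`. The function names and the toy numbers are illustrative, not taken from the paper.

```python
def matvec_row(c, W):
    # Row vector c (length d) times matrix W (d x d).
    return [sum(c[i] * W[i][j] for i in range(len(c))) for j in range(len(W[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(z, codes):
    return min(range(len(codes)), key=lambda k: sq_dist(z, codes[k]))

def vanilla_step(z, codes, lr):
    # Standard VQ: only the selected code vector receives a gradient.
    k = nearest(z, codes)
    new = [list(c) for c in codes]
    new[k] = [c + 2.0 * lr * (zi - c) for c, zi in zip(codes[k], z)]
    return new

def simvq_step(z, basis, W, lr):
    # Reparameterized VQ (assumed form): codes are basis @ W, only W is trained.
    codes = [matvec_row(c, W) for c in basis]
    k = nearest(z, codes)
    resid = [zi - ci for zi, ci in zip(z, codes[k])]
    # Gradient of ||z - basis[k] @ W||^2 w.r.t. W is -2 * outer(basis[k], resid).
    return [[W[i][j] + 2.0 * lr * basis[k][i] * resid[j] for j in range(len(W[0]))]
            for i in range(len(W))]

basis = [[1.0, 0.2], [0.3, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
z = [0.9, 0.3]

moved_vanilla = sum(a != b for a, b in zip(basis, vanilla_step(z, basis, 0.1)))
W_new = simvq_step(z, basis, W, 0.1)
old_codes = [matvec_row(c, W) for c in basis]
new_codes = [matvec_row(c, W_new) for c in basis]
moved_simvq = sum(a != b for a, b in zip(old_codes, new_codes))
print(moved_vanilla, moved_simvq)
```

With these numbers the vanilla step moves only the selected code, while the step on `W` moves every effective code, because all of them share the same transformation.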

Article 215

Title@2025-07-30 (3): Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning

Title: Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning

Bewertungsprüfer: Bewertung der synthetischen Überprüfung für Code und Begründung

标定验证符:评估编码和理由的合成核查 2502.13820v3

Authors (4): Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi, Boris Ginsburg

Synthetic verification techniques such as generating test cases and reward modelling are common ways to enhance the coding capabilities of large language models (LLMs) beyond predefined tests. Additionally, code verification has recently found great success as a critical component in improving the reasoning capability of LLMs via reinforcement learning. In this paper, we propose an approach that can transform existing coding benchmarks into scoring and ranking datasets to evaluate the effectiveness of synthetic verifiers. We also propose multiple metrics to measure different aspects of the synthetic verifiers with the proposed benchmarks. By employing the proposed approach, we release four new benchmarks (HE-R, HE-R+, MBPP-R, and MBPP-R+) and analyze synthetic verification methods with standard, reasoning-based, and reward-based LLMs. Our experiments show that reasoning can significantly improve test case generation and that scaling the number of test cases enhances the verification accuracy.

合成核查技术,如产生测试案例和奖励建模,是提高大型语言模型(LLM)的编码能力,超越预先界定的测试的常见方法。此外,守则核查最近发现,作为通过强化学习提高LLMS推理能力的一个关键组成部分,在通过强化学习提高LMS推理能力方面,取得了巨大成功。在本文件中,我们提出一种方法,可将现有的编码基准转换成评分和排名数据集,以评价合成核查员的效力。我们还提出多种指标,用拟议基准衡量合成核查员的不同方面。通过采用拟议方法,我们发布了四个新的基准(HE-R、HE-R+、MBPP-R和MBPP-R+),并以标准、推理和奖励为基础的LMS分析合成核查方法。我们的实验表明,推理可以大大改进测试案例的生成,扩大测试案例的数量可以提高核查的准确性。

Article 216

Title@2025-07-30 (3): RocketStack: Level-aware deep recursive ensemble learning framework with adaptive feature fusion and model pruning dynamics

Title: RocketStack: Level-aware deep recursive ensemble learning framework with adaptive feature fusion and model pruning dynamics

RocketStack: Level-aware deep rekursive ensemble Learning Framework mit adaptiver Funktion Fusion und Modellschnitt Dynamik

火箭堆: 具有适应性特征聚集和模型排出动态的有意识的深层循环深层共聚学习框架 2506.16965v2

Authors (1): Çağatay Demirel

Ensemble learning remains a cornerstone of machine learning, with stacking used to integrate predictions from multiple base learners through a meta-model. However, deep stacking remains rare, as most designs prioritize horizontal diversity over recursive depth due to model complexity, feature redundancy, and computational burden. To address these challenges, RocketStack, a level-aware recursive ensemble framework, is introduced and explored up to ten stacking levels, extending beyond prior architectures. The framework incrementally prunes weaker learners at each level, enabling deeper stacking without excessive complexity. To mitigate early performance saturation, mild Gaussian noise is added to out-of-fold (OOF) scores before pruning, and this variant is compared against strict OOF pruning. Furthermore, both per-level and periodic feature compression are explored using attention-based selection, the Simple, Fast, Efficient (SFE) filter, and autoencoders. Across 33 datasets (23 binary, 10 multi-class), linear-trend tests confirmed rising accuracy with depth in most variants, and the top-performing meta-model at each level increasingly outperformed the strongest standalone ensemble. In the binary subset, periodic SFE with mild OOF-score randomization reached 97.08% at level 10, 5.14% above the strict-pruning configuration, and cut runtime by 10.5% relative to no compression. In the multi-class subset, periodic attention selection reached 98.60% at level 10, exceeding the strongest baseline by 6.11%, while reducing runtime by 56.1% and feature dimensionality by 74% compared to no compression. These findings highlight mild randomization as an effective regularizer and periodic compression as a stabilizer. Echoing the design of multistage rockets in aerospace (prune, compress, propel), RocketStack achieves deep recursive ensembling with tractable complexity.

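集成学习仍然是机器学习的基石,其中堆叠(stacking)通过元模型整合多个基学习器的预测。然而,深度堆叠仍然少见,因为受模型复杂度、特征冗余和计算负担所限,大多数设计优先考虑横向多样性而非递归深度。为应对这些挑战,本文提出了层级感知的递归集成框架RocketStack,并将其探索到十层堆叠,超越了以往的架构。该框架在每一层逐步剪除较弱的学习器,从而在不过度增加复杂度的情况下实现更深的堆叠。为缓解早期的性能饱和,在剪枝前向折外(OOF)分数加入轻微的高斯噪声,并与严格的OOF剪枝进行比较。此外,还使用基于注意力的选择、简单、快速、高效(SFE)过滤器和自编码器,探索了逐层和周期性的特征压缩。在33个数据集(23个二分类、10个多分类)上,线性趋势检验证实大多数变体的准确率随深度上升,并且每一层表现最好的元模型越来越明显地优于最强的独立集成模型。在二分类子集中,采用轻微OOF分数随机化的周期性SFE在第10层达到97.08%,比严格剪枝配置高5.14%,运行时间相对于不压缩减少10.5%。在多分类子集中,周期性注意力选择在第10层达到98.60%,比最强基线高6.11%,与不压缩相比运行时间减少56.1%,特征维度减少74%。这些发现表明,轻微随机化是一种有效的正则化手段,而周期性压缩起到稳定作用。RocketStack呼应航天领域多级火箭的设计(剪枝、压缩、推进),以可控的复杂度实现了深度递归集成。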
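
The difference between strict OOF pruning and pruning on mildly randomized scores can be sketched in a few lines. The abstract does not state the pruning rule, so keeping a fixed fraction of learners per level is an assumption here, as are the function name `prune_learners`, the noise level, and the toy scores.

```python
import random

def prune_learners(oof_scores, keep_fraction, noise_std=0.0, seed=0):
    # oof_scores: dict mapping learner name -> out-of-fold score (higher is better).
    rng = random.Random(seed)
    # Noise perturbs the ranking only; the learners' raw scores are left untouched.
    noisy = {name: s + rng.gauss(0.0, noise_std) for name, s in oof_scores.items()}
    n_keep = max(1, round(keep_fraction * len(oof_scores)))
    ranked = sorted(noisy, key=noisy.get, reverse=True)
    return ranked[:n_keep]

scores = {"gbm": 0.91, "rf": 0.90, "svm": 0.84, "knn": 0.80}

strict = prune_learners(scores, keep_fraction=0.5)
randomized = prune_learners(scores, keep_fraction=0.5, noise_std=0.02, seed=1)
print(strict, randomized)
```

With `noise_std=0.0` the function reduces to strict pruning. A small positive value lets learners with nearly tied scores occasionally swap places, which is one way to read the regularizing effect reported in the abstract.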

Article 217

Title@2025-07-30 (3): FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation

Title: FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation

FLOSS: Kostenloses Mittagessen in Open-Vocabulary Semantic Segmentation

FLOSS: 开放词汇语义分割中的免费午餐 2504.10487v2

Authors (5): Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Raoul de Charette

In this paper, we challenge the conventional practice in Open-Vocabulary Semantic Segmentation (OVSS) of using averaged class-wise text embeddings, which are typically obtained by encoding each class name with multiple templates (e.g., a photo of <class>, a sketch of a <class>). We investigate the impact of templates for OVSS, and find that for each class, there exist single-template classifiers (which we refer to as class-experts) that significantly outperform the conventional averaged classifier. First, to identify these class-experts, we introduce a novel approach that estimates them without any labeled data or training. By leveraging the class-wise prediction entropy of single-template classifiers, we select those yielding the lowest entropy as the most reliable class-experts. Second, we combine the outputs of class-experts in a new fusion process. Our plug-and-play method, coined FLOSS, is orthogonal and complementary to existing OVSS methods, offering an improvement without the need for additional labels or training. Extensive experiments show that FLOSS consistently enhances state-of-the-art OVSS models, generalizes well across datasets with different distribution shifts, and delivers substantial improvements in low-data scenarios where only a few unlabeled images are available. Our code is available at https://github.com/yasserben/FLOSS .

在本文中,我们对开放词汇语义分割(OVSS)中使用按类平均文本嵌入的常规做法提出质疑,这类嵌入通常是用多个模板(例如“a photo of <class>”“a sketch of a <class>”)对每个类名进行编码得到的。我们研究了模板对OVSS的影响,发现对每个类都存在单模板分类器(我们称之为类专家),其性能显著优于常规的平均分类器。首先,为了找出这些类专家,我们提出了一种无需任何标注数据或训练即可估计它们的新方法:利用单模板分类器的逐类预测熵,选择熵最低者作为最可靠的类专家。其次,我们在一个新的融合过程中组合各类专家的输出。我们的即插即用方法称为FLOSS,与现有OVSS方法正交且互补,无需额外标注或训练即可带来改进。大量实验表明,FLOSS能够持续提升最先进的OVSS模型,在具有不同分布偏移的数据集上泛化良好,并在只有少量无标注图像的低数据场景中带来显著改进。我们的代码见 https://github.com/yasserben/FLOSS 。
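
The entropy-based selection of class-experts can be sketched as follows. This is a simplified reading of the abstract, not the released FLOSS code: it assumes each single-template classifier outputs per-pixel class probabilities on unlabeled images, and that the expert for a class is the template with the lowest mean entropy over the pixels it assigns to that class. The template strings with `{}` placeholders, the function names, and the probabilities are hypothetical.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pick_class_experts(predictions, num_classes):
    # predictions[t] is a list of per-pixel class-probability vectors produced by
    # the classifier built from template t alone. Returns {class: best template}.
    experts = {}
    for c in range(num_classes):
        best_t, best_h = None, float("inf")
        for t, pixel_probs in predictions.items():
            # Entropies of the pixels that this template assigns to class c.
            hs = [entropy(p) for p in pixel_probs
                  if max(range(num_classes), key=p.__getitem__) == c]
            if hs and sum(hs) / len(hs) < best_h:
                best_t, best_h = t, sum(hs) / len(hs)
        experts[c] = best_t
    return experts

predictions = {
    "a photo of a {}": [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]],
    "a sketch of a {}": [[0.7, 0.3], [0.55, 0.45], [0.05, 0.95]],
}
experts = pick_class_experts(predictions, num_classes=2)
print(experts)
```

No labels enter the selection, which matches the training-free, label-free setting described above. The fusion of expert outputs is a separate step that this sketch leaves out.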

Article 218

Title@2025-07-30 (3): Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining

Title: Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining

Erhöhung der Ultra-Low-Bit-Quantisierung großer Sprachmodelle durch Saliency-Aware Partial Retraining

通过提高质量-软件部分再培训,加强大语言模型的超低比小量量化 2504.13932v3

Authors (2): Deyu Cao, Samin Aref

The growing use of large language models has raised environmental and economic concerns about their intensity of resource usage during inference. Serving these models to each user requires substantial energy and water for cooling. Model compression techniques like quantization can shrink large language models and make them more resource efficient at the cost of potential performance degradation. Quantization methods compress model size by replacing high-precision parameters with quantized values of lower precision. Among existing methods, the ApiQ method achieves superior accuracy preservation at minimal memory and time overhead. We investigate two ideas to extend performance in ultra-low-bit quantization beyond ApiQ's level. First, we look into combining existing quantization-aware training techniques with ApiQ's partial training. We show that this does not outperform the baseline ApiQ method with limited training data and frozen weights. This leads to two key insights: (1) The substantial representational capacity that is gained through full retraining is unlikely to be feasible through partial training. (2) This gain may depend on using a large and diverse dataset in quantization-aware training. Second, through a novel approach informed by the two insights, we propose an ultra-low-bit quantization method that builds upon ApiQ and extends its performance without the need for full retraining. This publicly available method relies on a saliency-aware regularization term that prioritizes preserving the most impactful parameters during quantization. Our experiments on LLaMA 7B and 13B benchmarks demonstrate that our method reduces ApiQ's accuracy degradation by 10.85% and 7.54%, respectively. A Python implementation of the proposed quantization method is publicly available on GitHub at https://github.com/TokuyuSou/ULB-SAPR.

大型语言模型的使用日益增多,引发了人们对其推理阶段资源消耗强度的环境和经济担忧。为每位用户提供这些模型服务需要大量能源和冷却用水。量化等模型压缩技术可以缩小大型语言模型,使其更节省资源,代价是可能的性能下降。量化方法通过用较低精度的量化值替换高精度参数来压缩模型规模。在现有方法中,ApiQ方法以极小的内存和时间开销实现了更好的精度保持。我们研究了两种思路,以将超低比特量化的性能提升到ApiQ的水平之上。首先,我们考察了将现有的量化感知训练技术与ApiQ的部分训练相结合的做法。我们表明,在训练数据有限且权重冻结的情况下,这并不优于基线ApiQ方法。由此得到两点关键认识:(1)通过完全重新训练获得的可观表示能力不太可能通过部分训练实现;(2)这种增益可能取决于在量化感知训练中使用大规模且多样化的数据集。其次,基于这两点认识,我们提出了一种新的超低比特量化方法,它建立在ApiQ之上,无需完全重新训练即可提升其性能。这一公开可用的方法依赖于一个显著性感知的正则化项,在量化过程中优先保留影响最大的参数。我们在LLaMA 7B和13B基准上的实验表明,我们的方法将ApiQ的精度下降分别减少了10.85%和7.54%。所提量化方法的Python实现已在GitHub上公开:https://github.com/TokuyuSou/ULB-SAPR。

Article 219

Title@2025-07-30 (3): Enhanced Prediction of CAR T-Cell Cytotoxicity with Quantum-Kernel Methods

Title: Enhanced Prediction of CAR T-Cell Cytotoxicity with Quantum-Kernel Methods

Verbesserte Vorhersage der CAR T-Zell-Zytotoxizität mit Quanten-Kernel-Methoden

采用量子管方法增强对CAR T-Cell Cyt毒性的预测 2507.22710v1

Authors (9): Filippo Utro, Meltem Tolunay, Kahn Rhrissorrakrai, Tanvi P. Gujarati, Jie Shi, Sara Capponi, Mirko Amico, Nate Earnest-Noble, Laxmi Parida

Chimeric antigen receptor (CAR) T-cells are T-cells engineered to recognize and kill specific tumor cells. Through their extracellular domains, CAR T-cells bind tumor cell antigens, which triggers CAR T activation and proliferation. These processes are regulated by co-stimulatory domains present in the intracellular region of the CAR T-cell. By integrating novel signaling components into the co-stimulatory domains, it is possible to modify the CAR T-cell phenotype. Identifying and experimentally testing new CAR constructs based on libraries of co-stimulatory domains is nontrivial given the vast combinatorial space defined by such libraries. This leads to a highly data-constrained, poorly explored combinatorial problem, where the experiments undersample all possible combinations. We propose a quantum approach using a Projected Quantum Kernel (PQK) to address this challenge. PQK operates by embedding classical data into a high-dimensional Hilbert space and employs a kernel method to measure sample similarity. Using 61 qubits on a gate-based quantum computer, we demonstrate the largest PQK application to date and an enhancement in classification performance over purely classical machine learning methods for CAR T cytotoxicity prediction. Importantly, we show improved learning for specific signaling domains and domain positions, particularly where less information was available, highlighting the potential of quantum computing in data-constrained problems.

嵌合抗原受体(CAR)T细胞是经过改造、用于识别并杀死特定肿瘤细胞的T细胞。CAR T细胞通过其胞外结构域结合肿瘤细胞抗原,从而触发CAR T细胞的活化和增殖。这些过程受CAR T细胞胞内区的共刺激结构域调控。通过将新的信号组件整合到共刺激结构域中,可以改变CAR T细胞的表型。鉴于共刺激结构域文库所定义的组合空间十分庞大,基于这些文库识别并实验测试新的CAR构建体并非易事。这导致了一个数据高度受限、探索不足的组合问题,实验只能对所有可能组合进行欠采样。我们提出一种使用投影量子核(PQK)的量子方法来应对这一挑战。PQK将经典数据嵌入高维希尔伯特空间,并使用核方法度量样本相似性。我们在基于门的量子计算机上使用61个量子比特,展示了迄今最大规模的PQK应用,并在CAR T细胞毒性预测的分类性能上优于纯经典机器学习方法。重要的是,我们展示了对特定信号结构域和结构域位置的学习效果有所改善,尤其是在信息较少的情形下,这凸显了量子计算在数据受限问题中的潜力。

Article 220

Title@2025-07-30 (3): Unsupervised Learning in Echo State Networks for Input Reconstruction

Title: Unsupervised Learning in Echo State Networks for Input Reconstruction

Unüberwachtes Lernen in Echo State Networks für Input-Reconstruction

在回声州投入重建网络中无人监督的学习 2501.11409v4

Authors (3): Taiki Yamada, Yuichi Katori, Kantaro Fujiwara

Echo state networks (ESNs) are a class of recurrent neural networks in which only the readout layer is trainable, while the recurrent and input layers are fixed. This architectural constraint enables computationally efficient processing of time-series data. Traditionally, the readout layer in ESNs is trained using supervised learning with target outputs. In this study, we focus on input reconstruction (IR), where the readout layer is trained to reconstruct the input time series fed into the ESN. We show that IR can be achieved through unsupervised learning (UL), without access to supervised targets, provided that the ESN parameters are known a priori and satisfy invertibility conditions. This formulation allows applications relying on IR, such as dynamical system replication and noise filtering, to be reformulated within the UL framework via straightforward integration with existing algorithms. Our results suggest that prior knowledge of ESN parameters can reduce reliance on supervision, thereby establishing a new principle: not only by fixing part of the network parameters but also by exploiting their specific values. Furthermore, our UL-based algorithms for input reconstruction and related tasks are suitable for autonomous processing, offering insights into how analogous computational mechanisms might operate in the brain in principle. These findings contribute to a deeper understanding of the mathematical foundations of ESNs and their relevance to models in computational neuroscience.

回声状态网络(ESNs)是一系列经常性神经网络,其中只有读取层是可训练的,而经常和输入层则是固定的。这种建筑限制使得能够对时间序列数据进行计算效率的处理。传统上,ESNs的读取层是使用有监督的学习和目标产出来培训的。在本研究中,我们侧重于输入重建(IR),在这个读取层受过训练,以重建输入到ESN的输入时间序列。我们表明,IR可以通过不受监督的学习(UL)实现,而不能进入受监督的目标,只要ESN参数是已知的先验和满足可逆性条件。这种设计允许依赖IR的应用,如动态系统复制和噪音过滤,通过直接结合现有算法在 UL框架内重新制定。我们的结果表明,对ESN参数的先前知识可以减少对监督的依赖,从而确立新的原则:不仅通过确定网络参数的一部分,而且通过利用它们的具体价值。此外,我们基于UL的输入重建和相关任务的算法适合于自主处理,通过直观的数学基础的深度理解,从而了解其数学计算模型的更深层次分析模型。
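
Why known ESN parameters and an invertibility condition make the input recoverable can be seen from the state update itself. The sketch below is not the unsupervised readout-learning algorithm of the paper. It assumes a scalar input, a tanh update x' = tanh(W x + w_in * u), direct access to consecutive states, and a nonzero input vector `w_in`. All names and numbers are illustrative.

```python
import math

def esn_step(x, u, W, w_in):
    # One ESN update with scalar input u: x' = tanh(W x + w_in * u).
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + w_in[i] * u)
            for i in range(len(x))]

def reconstruct_input(x, x_next, W, w_in):
    # Invert the update: atanh(x') - W x = w_in * u, then solve for u
    # by least squares over the reservoir units.
    rhs = [math.atanh(x_next[i]) - sum(W[i][j] * x[j] for j in range(len(x)))
           for i in range(len(x))]
    return sum(w * r for w, r in zip(w_in, rhs)) / sum(w * w for w in w_in)

W = [[0.2, -0.1, 0.0], [0.05, 0.1, -0.2], [-0.1, 0.0, 0.15]]
w_in = [0.5, -0.3, 0.8]
inputs = [math.sin(0.3 * t) for t in range(20)]

x = [0.0, 0.0, 0.0]
recovered = []
for u in inputs:
    x_next = esn_step(x, u, W, w_in)
    recovered.append(reconstruct_input(x, x_next, W, w_in))
    x = x_next

max_error = max(abs(a - b) for a, b in zip(inputs, recovered))
print(max_error)
```

No target signal is used anywhere in the reconstruction. The specific values of `W` and `w_in` do the work that supervision would otherwise do, which is the principle the abstract points to.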

Article 221

Title@2025-07-30 (3): Inferring biological processes with intrinsic noise from cross-sectional data

Title: Inferring biological processes with intrinsic noise from cross-sectional data

Ableitung biologischer Prozesse mit intrinsischem Rauschen aus Querschnittsdaten

从跨部门数据中以内在噪音推断生物过程 2410.07501v2

Authors (3): Suryanarayana Maddu, Victor Chardès, Michael J. Shelley

Inferring dynamical models from data continues to be a significant challenge in computational biology, especially given the stochastic nature of many biological processes. We explore a common scenario in omics, where statistically independent cross-sectional samples are available at a few time points, and the goal is to infer the underlying diffusion process that generated the data. Existing inference approaches often simplify or ignore noise intrinsic to the system, compromising accuracy for the sake of optimization ease. We circumvent this compromise by inferring the phase-space probability flow that shares the same time-dependent marginal distributions as the underlying stochastic process. Our approach, probability flow inference (PFI), disentangles force from intrinsic stochasticity while retaining the algorithmic ease of ODE inference. Analytically, we prove that for Ornstein-Uhlenbeck processes the regularized PFI formalism yields a unique solution in the limit of well-sampled distributions. In practical applications, we show that PFI enables accurate parameter and force estimation in high-dimensional stochastic reaction networks, and that it allows inference of cell differentiation dynamics with molecular noise, outperforming state-of-the-art approaches.

从数据中推导动态模型仍然是计算生物学中的一个重大挑战,特别是考虑到许多生物过程的随机性质。我们探索了在动脉中常见的情景,在动脉中可以找到统计上独立的跨部门样本,目的是推断产生数据的基本扩散过程。现有的推理方法往往简化或忽略系统固有的噪音,为了最优化的方便而降低准确性。我们绕过这一折中,通过推断相位空间概率流,与基本生理过程具有相同时间依赖的边际分布。我们的方法、概率流导(PFI),从内在的随机性中分离出力量,同时保留ODE的算法简单性。从分析上,我们证明对Ornstein-Uhlenbeck来说,正规化的PFI正式主义过程在广泛抽样分布的限度中产生了一种独特的解决办法。在实际应用中,我们表明PFI能够在高维度的随机反应网络中进行精确的参数和强度估计。我们发现,它允许以分子分辨的分子动态方法,从而得出细胞动态。

Article 222

Title@2025-07-30 (3): Unsupervised Learning: Comparative Analysis of Clustering Techniques on High-Dimensional Data

Title: Unsupervised Learning: Comparative Analysis of Clustering Techniques on High-Dimensional Data

Unüberwachtes Lernen: Vergleichende Analyse von Clustering-Techniken auf hochdimensionalen Daten

未受监督的学习:高数据群集技术比较分析 2503.23215v2

Authors (2): Vishnu Vardhan Baligodugula, Fathi Amsaad

This paper presents a comprehensive comparative analysis of three prominent clustering algorithms (K-means, DBSCAN, and Spectral Clustering) on high-dimensional datasets. We introduce a novel evaluation framework that assesses clustering performance across multiple dimensionality reduction techniques (PCA, t-SNE, and UMAP) using diverse quantitative metrics. Experiments conducted on the MNIST, Fashion-MNIST, and UCI HAR datasets reveal that preprocessing with UMAP consistently improves clustering quality across all algorithms, with Spectral Clustering demonstrating superior performance on complex manifold structures. Our findings show that algorithm selection should be guided by data characteristics, with K-means excelling in computational efficiency, DBSCAN in handling irregular clusters, and Spectral Clustering in capturing complex relationships. This research contributes a systematic approach for evaluating and selecting clustering techniques for high-dimensional data applications.

本文件对突出的群集算法K值、DBSCAN和高维数据集的光谱集成进行了全面的比较分析。我们引入了一个新的评价框架,利用不同的量化指标评估多维减少技术(PCA、t-SNE和UMAP)的组合性业绩。对MNIST、时装MNIST和UCI HAR数据集进行的实验显示,与UMAP一起的预处理始终在提高所有算法的群集质量,而光谱集成显示复杂多元结构的优异性。我们的调查结果显示,算法选择应该以数据特征为指导,在计算效率方面优异,DBSCAN在处理非正常组群集方面优异,在捕捉复杂关系方面光谱集。这一研究为评价和选择高维数据应用的群集技术提供了系统方法。

Article 223

Title@2025-07-30 (3): Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Title: Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Lokale Mischungen von Experten: Im Wesentlichen kostenlose Test-Zeit-Training über Modellverschmelzung

当地专家混合:通过模式合并进行的基本免费试验时间培训 2505.14136v2

Authors (4): Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause

Mixture of experts (MoE) models are a promising approach to increasing model capacity without increasing inference cost, and are core components of many state-of-the-art language models. However, current MoE models typically use only a few experts due to prohibitive training and inference costs. We propose Test-Time Model Merging (TTMM), which scales the MoE paradigm to an order of magnitude more experts and uses model merging to avoid almost any test-time overhead. We show that TTMM is an approximation of test-time training (TTT), which fine-tunes an expert model for each prediction task, i.e., each prompt. TTT has recently been shown to significantly improve language models, but is computationally expensive. We find that the performance of TTMM improves with more experts and approaches the performance of TTT. Moreover, we find that with a 1B parameter base model, TTMM is more than 100x faster than TTT at test time by amortizing the cost of TTT at train time. Thus, TTMM offers a promising cost-effective approach to scaling test-time training.

专家(MoE)模型的混合是提高模型能力而又不增加推论成本的一个很有希望的方法,也是许多最先进的语言模型的核心组成部分。然而,由于培训令人望而却步和推论成本,目前的教育部模型通常只使用少数专家。我们提议试验时间模型合并(TTMM),将教育部模型的规模扩大到一个数量级,更多的专家,并使用模型合并来避免几乎任何试验时间的间接费用。我们表明TTMM是测试时间培训的近似值。TTMM是测试时间培训(TTTT)的近似值,它为每个预测任务(即迅速)精确地调整了专家模型。TTTT最近显示,大大改进了语言模型,但计算成本成本昂贵。我们发现TTMM的业绩随着更多的专家和接近TTT的绩效而得到改善。此外,我们发现,如果采用1B参数基准模型,TMM在测试时间比TTT快100倍以上,通过在培训时间摊销TTTT的费用,TMM在试验时间提供有希望的成本效益的方法。
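
The merging step can be pictured as a prompt-dependent weighted average of expert parameters. The abstract does not say how the merging coefficients are computed, so the sketch below assumes each expert is tied to a cluster centroid in embedding space and that the weights are a softmax over negative squared distances from the prompt embedding. The names `merge_weights` and `merge_experts`, the temperature, and the toy vectors are hypothetical.

```python
import math

def merge_weights(query, centroids, temperature=1.0):
    # Softmax over negative squared distances between the prompt embedding
    # and each expert's cluster centroid (an assumed weighting rule).
    logits = [-sum((q - c) ** 2 for q, c in zip(query, cen)) / temperature
              for cen in centroids]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def merge_experts(expert_params, weights):
    # Weighted average of flat parameter vectors, one vector per expert.
    return [sum(w * p[i] for w, p in zip(weights, expert_params))
            for i in range(len(expert_params[0]))]

centroids = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
expert_params = [[0.1, 0.2], [0.3, 0.4], [0.9, 1.0]]

weights = merge_weights([0.9, 1.1], centroids)
merged = merge_experts(expert_params, weights)
print(weights, merged)
```

The merged vector is a single set of parameters, so the forward pass costs the same as running one expert. That is the sense in which merging avoids most of the test-time overhead.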

Article 224

Title@2025-07-30 (3): Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations

Title: Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations

Raumsprache Likelihood Grounding Network für Bayesian Fusion von Mensch-Roboter-Beobachtungen

Bayesian人类-机器人观测融合空间语言定位网络 2507.19947v2

Authors (4): Supawich Sitdhipol, Waritwong Sukprasongdee, Ekapol Chuangsuwanich, Rina Tse

Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.

人类观测的阻燃信息可以帮助机器人克服协作任务中的感知限制。然而,一个具有不确定性的聚合框架需要具有代表人类投入不确定性的有根有据的可能性。本文介绍了一个地貌虫状金字塔网络(FP-LGN),它通过学习相关的地图图像特征及其与空间关系语义的关系而将空间语言作为基础。模型被培训为概率估计器,以便利用三阶段课程学习来捕捉人类语言中的感知性不确定性。结果显示,FP-LGN与专家设计的规则相匹配,其平均值为负日志-日产(NLLL),并显示在较低标准偏差情况下更加稳健。合作感结果表明,基于地貌的概率成功地促成了多种人类语言观测和机器人传感器测量的不确定性-认知融合,从而在人类机器人协作性工作方面取得了显著的改进。
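
Once a likelihood for a spatial-language statement is available, fusing it with robot sensor data is a standard Bayesian update. The sketch below assumes a discrete grid of candidate target locations and conditionally independent observations. The likelihood values stand in for the output of a grounding network such as FP-LGN and are made up for illustration, as is the function name `bayes_fuse`.

```python
def bayes_fuse(prior, *likelihoods):
    # prior and each likelihood are lists over the same discrete grid cells.
    post = list(prior)
    for lik in likelihoods:
        post = [p * l for p, l in zip(post, lik)]
    total = sum(post)
    return [p / total for p in post]

prior = [0.25, 0.25, 0.25, 0.25]
robot_sensor = [0.1, 0.4, 0.4, 0.1]    # sensor cannot separate cells 1 and 2
human_report = [0.05, 0.15, 0.7, 0.1]  # hypothetical grounded likelihood of a spoken cue

posterior = bayes_fuse(prior, robot_sensor, human_report)
print(posterior)
```

The human statement resolves an ambiguity that the sensor alone leaves open. A flatter human likelihood, reflecting a vaguer statement, would shift the posterior less.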

Article 225

Title@2025-07-30 (3): Cluster-Based Random Forest Visualization and Interpretation

Title: Cluster-Based Random Forest Visualization and Interpretation

Clusterbasierte Random Forest Visualisierung und Interpretation

以集束为基础的随机森林视觉化和解释 2507.22665v1

Authors (5): Max Sondag, Christofer Meinecke, Dennis Collaris, Tatiana von Landesberger, Stef van den Elzen

Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While these random forests often have higher performance and generalize better than a single decision tree, they are also harder to interpret. This paper presents a visualization method and system to increase the interpretability of random forests. We cluster similar trees, which enables users to interpret how the model performs in general without needing to analyze each individual decision tree in detail or interpret an oversimplified summary of the full forest. To meaningfully cluster the decision trees, we introduce a new distance metric that takes into account both the decision rules and the predictions of a pair of decision trees. We also propose two new visualization methods that visualize both clustered and individual decision trees: (1) the Feature Plot, which visualizes the topological position of features in the decision trees, and (2) the Rule Plot, which visualizes the decision rules of the decision trees. We demonstrate the efficacy of our approach through a case study on the "Glass" dataset, which is a relatively complex standard machine learning dataset, as well as a small user study.

随机森林是一种用于自动对数据集进行分类的机器学习方法,由大量决策树组成。这些随机森林的性能通常高于单棵决策树,泛化能力也更好,但也更难解释。本文提出了一种可视化方法和系统,以提高随机森林的可解释性。我们对相似的树进行聚类,使用户无需详细分析每一棵决策树,也无需解读对整个森林过度简化的概要,就能理解模型的总体表现。为了对决策树进行有意义的聚类,我们引入了一种新的距离度量,它同时考虑一对决策树的决策规则和预测结果。我们还提出了两种新的可视化方法,可同时展示聚类后的决策树和单棵决策树:(1) 特征图(Feature Plot),展示特征在决策树中的拓扑位置;(2) 规则图(Rule Plot),展示决策树的决策规则。我们通过对“Glass”数据集(一个相对复杂的标准机器学习数据集)的案例研究以及一项小型用户研究,展示了该方法的有效性。

Article 226

Title@2025-07-30 (3): Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG

Title: Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG

Lag nicht, RAG: Training-freie Adversarial Detection mit RAG

不要拉格,RAG:使用RAG进行无训练的反向探测 2504.04858v3

Authors (4): Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar

Adversarial patch attacks pose a major threat to vision systems by embedding localized perturbations that mislead deep models. Traditional defense methods often require retraining or fine-tuning, making them impractical for real-world deployment. We propose a training-free Visual Retrieval-Augmented Generation (VRAG) framework that integrates Vision-Language Models (VLMs) for adversarial patch detection. By retrieving visually similar patches and images that resemble stored attacks in a continuously expanding database, VRAG performs generative reasoning to identify diverse attack types, all without additional training or fine-tuning. We extensively evaluate open-source large-scale VLMs, including Qwen-VL-Plus, Qwen2.5-VL-72B, and UI-TARS-72B-DPO, alongside Gemini-2.0, a closed-source model. Notably, the open-source UI-TARS-72B-DPO model achieves up to 95 percent classification accuracy, setting a new state-of-the-art for open-source adversarial patch detection. Gemini-2.0 attains the highest overall accuracy, 98 percent, but remains closed-source. Experimental results demonstrate VRAG’s effectiveness in identifying a variety of adversarial patches with minimal human annotation, paving the way for robust, practical defenses against evolving adversarial patch attacks.

对抗性补丁攻击通过嵌入误导深度模型的局部扰动,对视觉系统构成重大威胁。传统防御方法往往需要重新训练或微调,因而难以在现实世界中部署。我们提出了一个无需训练的视觉检索增强生成(VRAG)框架,它集成视觉-语言模型(VLM)来检测对抗性补丁。VRAG在一个不断扩展的数据库中检索与已存储攻击相似的补丁和图像,并通过生成式推理识别多种攻击类型,全程无需额外训练或微调。我们广泛评估了开源大型VLM,包括Qwen-VL-Plus、Qwen2.5-VL-72B和UI-TARS-72B-DPO,以及闭源模型Gemini-2.0。值得注意的是,开源的UI-TARS-72B-DPO模型达到了最高95%的分类准确率,为开源对抗性补丁检测树立了新的最先进水平。Gemini-2.0取得了最高的总体准确率98%,但仍是闭源模型。实验结果表明,VRAG能以极少的人工标注有效识别多种对抗性补丁,为抵御不断演变的对抗性补丁攻击提供了稳健而实用的防御途径。

Article 227

Title@2025-07-30 (3): Transductive Model Selection under Prior Probability Shift

Title: Transductive Model Selection under Prior Probability Shift

Transduktive Modellauswahl unter vorheriger Wahrscheinlichkeitsverschiebung

先前概率变化下的转变模式选择 2507.22647v1

Authors (3): Lorenzo Volpi, Alejandro Moreo, Fabrizio Sebastiani

Transductive learning is a supervised machine learning task in which, unlike in traditional inductive learning, the unlabelled data that require labelling are a finite set and are available at training time. Similarly to inductive learning contexts, transductive learning contexts may be affected by dataset shift, i.e., may be such that the IID assumption does not hold. We here propose a method, tailored to transductive classification contexts, for performing model selection (i.e., hyperparameter optimisation) when the data exhibit prior probability shift, an important type of dataset shift typical of anti-causal learning problems. In our proposed method the hyperparameters can be optimised directly on the unlabelled data to which the trained classifier must be applied; this is unlike traditional model selection methods, that are based on performing cross-validation on the labelled training data. We provide experimental results that show the benefits brought about by our method.

转导学习是一种监督式机器学习任务,与传统的归纳学习不同,其中需要标注的无标签数据是一个有限集合,并且在训练时即可获得。与归纳学习情境类似,转导学习情境也可能受到数据集偏移的影响,即IID假设可能不成立。我们在此提出一种针对转导分类情境的方法,用于在数据表现出先验概率偏移时进行模型选择(即超参数优化);先验概率偏移是一种重要的数据集偏移类型,常见于反因果学习问题。在我们提出的方法中,超参数可以直接在训练好的分类器必须应用的无标签数据上进行优化;这不同于传统的模型选择方法,后者基于对有标签训练数据进行交叉验证。我们给出的实验结果展示了该方法带来的益处。

Article 228

Title@2025-07-30 (3): Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction

Title: Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction

Sichere Einführung von Offline-Verstärkungslernen über Input Convex-Action-Korrektur

通过投入Convex行动校正安全部署离线强化学习 2507.22640v1

Authors (7): Alex Durkin, Jasper Stolte, Matthew Jones, Raghuraman Pitchumani, Bei Li, Christian Michler, Mehmet Mercangöz

Offline reinforcement learning (offline RL) offers a promising framework for developing control strategies in chemical process systems using historical data, without the risks or costs of online experimentation. This work investigates the application of offline RL to the safe and efficient control of an exothermic polymerisation continuous stirred-tank reactor. We introduce a Gymnasium-compatible simulation environment that captures the reactor’s nonlinear dynamics, including reaction kinetics, energy balances, and operational constraints. The environment supports three industrially relevant scenarios: startup, grade change down, and grade change up. It also includes reproducible offline datasets generated from proportional-integral controllers with randomised tunings, providing a benchmark for evaluating offline RL algorithms in realistic process control tasks. We assess behaviour cloning and implicit Q-learning as baseline algorithms, highlighting the challenges offline agents face, including steady-state offsets and degraded performance near setpoints. To address these issues, we propose a novel deployment-time safety layer that performs gradient-based action correction using input convex neural networks (PICNNs) as learned cost models. The PICNN enables real-time, differentiable correction of policy actions by descending a convex, state-conditioned cost surface, without requiring retraining or environment interaction. Experimental results show that offline RL, particularly when combined with convex action correction, can outperform traditional control approaches and maintain stability across all scenarios. These findings demonstrate the feasibility of integrating offline RL with interpretable and safety-aware corrections for high-stakes chemical process control, and lay the groundwork for more reliable data-driven automation in industrial systems.

离线强化学习(离线RL)为利用历史数据制定化学过程系统的控制策略提供了一个很有前景的框架,同时避免了在线实验的风险和成本。本文研究了离线RL在放热聚合连续搅拌釜反应器安全高效控制中的应用。我们引入了一个与Gymnasium兼容的仿真环境,它刻画了反应器的非线性动力学,包括反应动力学、能量平衡和运行约束。该环境支持三种与工业相关的场景:启动、牌号下调和牌号上调。它还包含由参数随机整定的比例积分控制器生成的可复现离线数据集,为在真实的过程控制任务中评估离线RL算法提供了基准。我们评估了行为克隆和隐式Q学习这两种基线算法,突出了离线智能体面临的挑战,包括稳态偏差和设定点附近的性能下降。为了解决这些问题,我们提出了一种新的部署时安全层,它以输入凸神经网络(PICNN)作为学习到的成本模型,执行基于梯度的动作校正。PICNN通过沿一个凸的、以状态为条件的成本曲面下降,对策略动作进行实时、可微的校正,无需重新训练或与环境交互。实验结果表明,离线RL,尤其是与凸动作校正相结合时,能够优于传统控制方法,并在所有场景中保持稳定。这些发现证明了将离线RL与可解释、具有安全意识的校正相结合用于高风险化学过程控制的可行性,并为工业系统中更可靠的数据驱动自动化奠定了基础。
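The deployment-time correction described above (descending a convex, state-conditioned cost surface by gradient steps) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the quadratic `cost` stands in for the learned PICNN, its gradient is written by hand rather than obtained by automatic differentiation, and the step size, step count, and action bounds are hypothetical.

```python
def cost(state, action):
    # Hypothetical stand-in for a learned cost model that is convex in the action:
    # squared distance to a state-dependent target action.
    target = 0.5 * state
    return (action - target) ** 2


def cost_grad(state, action):
    # Analytic gradient of the stand-in cost with respect to the action.
    return 2.0 * (action - 0.5 * state)


def correct_action(state, action, lr=0.1, steps=20, low=-1.0, high=1.0):
    # Gradient descent on the action only; the policy itself is left untouched.
    for _ in range(steps):
        action -= lr * cost_grad(state, action)
        # Project back onto the admissible action range after every step.
        action = min(high, max(low, action))
    return action


policy_action = -0.8          # action proposed by some offline policy (made up)
corrected = correct_action(1.0, policy_action)
print(cost(1.0, policy_action), cost(1.0, corrected))
```

Because the stand-in cost is convex in the action, every step reduces it, which is the property the convexity of the cost model is meant to guarantee in the setting the abstract describes.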

Article 229

Title@2025-07-30 (3): trAIce3D: A Prompt-Driven Transformer Based U-Net for Semantic Segmentation of Microglial Cells from Large-Scale 3D Microscopy Images

Title: trAIce3D: A Prompt-Driven Transformer Based U-Net for Semantic Segmentation of Microglial Cells from Large-Scale 3D Microscopy Images

trAIce3D: Ein prompt-getriebenes Transformer-basiertes U-Net zur semantischen Segmentierung von Mikrogliazellen aus großformatigen 3D-Mikroskopiebildern

trAIce3D: 一个基于U-Net的快速驱动变形器,用于从大型 3D 显微镜片图像中对微晶体细胞进行语义分解 2507.22635v1

Authors (4): MohammadAmin Alamalhoda, Arsalan Firoozi, Alessandro Venturino, Sandra Siegert

The shape of a cell contains essential information about its function within the biological system. Segmenting these structures from large-scale 3D microscopy images is challenging, limiting clinical insights especially for microglia, immune-associated cells involved in neurodegenerative diseases. Existing segmentation methods mainly focus on cell bodies, struggle with overlapping structures, perform poorly on noisy images, require hyperparameter tuning for each new dataset, or rely on tedious semi-automated approaches. We introduce trAIce3D, a deep-learning architecture designed for precise microglia segmentation, capturing both somas and branches. It employs a two-stage approach: first, a 3D U-Net with vision transformers in the encoder detects somas using a sliding-window technique to cover the entire image. Then, the same architecture, enhanced with cross-attention blocks in skip connections, refines each soma and its branches by using soma coordinates as a prompt and a 3D window around the target cell as input. Training occurs in two phases: self-supervised Soma Segmentation, followed by prompt-based Branch Segmentation, leveraging pre-trained weights from the first phase. Trained and evaluated on a dataset of 41,230 microglial cells, trAIce3D significantly improves segmentation accuracy and generalization, enabling scalable analysis of complex cellular morphologies. While optimized for microglia, its architecture can extend to other intricate cell types, such as neurons and astrocytes, broadening its impact on neurobiological research.

细胞的形状包含有关其在生物系统中功能的重要信息。从大规模3D显微图像中分割这些结构具有挑战性,这限制了临床洞察,对于小胶质细胞(参与神经退行性疾病的免疫相关细胞)尤其如此。现有的分割方法主要关注细胞体,难以处理重叠结构,在噪声图像上表现不佳,需要针对每个新数据集调整超参数,或依赖繁琐的半自动方法。我们提出了trAIce3D,这是一种为精确分割小胶质细胞而设计的深度学习架构,可同时捕捉胞体和分支。它采用两阶段方法:首先,一个在编码器中带有视觉Transformer的3D U-Net使用滑动窗口技术覆盖整幅图像来检测胞体。然后,同一架构在跳跃连接中加入交叉注意力模块,以胞体坐标作为提示、以目标细胞周围的3D窗口作为输入,对每个胞体及其分支进行细化。训练分为两个阶段:先进行自监督的胞体分割,再利用第一阶段的预训练权重进行基于提示的分支分割。trAIce3D在包含41,230个小胶质细胞的数据集上训练和评估,显著提高了分割精度和泛化能力,使复杂细胞形态的可扩展分析成为可能。虽然它针对小胶质细胞进行了优化,但其架构可以扩展到神经元和星形胶质细胞等其他复杂细胞类型,从而扩大其对神经生物学研究的影响。

Article 230

Title@2025-07-30 (3): A Unified Analysis of Generalization and Sample Complexity for Semi-Supervised Domain Adaptation

Title: A Unified Analysis of Generalization and Sample Complexity for Semi-Supervised Domain Adaptation

Eine einheitliche Analyse von Generalisierung und Probenkomplexität für halbüberwachte Domain-Anpassung

半监督域适应通用和抽样复杂程度统一分析 2507.22632v1

Authors (2): Elif Vural, Huseyin Karaca

Domain adaptation seeks to leverage the abundant label information in a source domain to improve classification performance in a target domain with limited labels. While the field has seen extensive methodological development, its theoretical foundations remain relatively underexplored. Most existing theoretical analyses focus on simplified settings where the source and target domains share the same input space and relate target-domain performance to measures of domain discrepancy. Although insightful, these analyses may not fully capture the behavior of modern approaches that align domains into a shared space via feature transformations. In this paper, we present a comprehensive theoretical study of domain adaptation algorithms based on domain alignment. We consider the joint learning of domain-aligning feature transformations and a shared classifier in a semi-supervised setting. We first derive generalization bounds in a broad setting, in terms of covering numbers of the relevant function classes. We then extend our analysis to characterize the sample complexity of domain-adaptive neural networks employing maximum mean discrepancy (MMD) or adversarial objectives. Our results rely on a rigorous analysis of the covering numbers of these architectures. We show that, for both MMD-based and adversarial models, the sample complexity admits an upper bound that scales quadratically with network depth and width. Furthermore, our analysis suggests that in semi-supervised settings, robustness to limited labeled target data can be achieved by scaling the target loss proportionally to the square root of the number of labeled target samples. Experimental evaluation in both shallow and deep settings lends support to our theoretical findings.

域适应旨在利用源域中丰富的标签信息,提高标签有限的目标域中的分类性能。尽管该领域在方法上已有大量发展,其理论基础仍相对缺乏研究。现有的理论分析大多关注源域和目标域共享同一输入空间的简化设定,并将目标域性能与域差异度量联系起来。这些分析虽然富有洞见,但可能无法完全刻画通过特征变换将各域对齐到共享空间的现代方法的行为。本文对基于域对齐的域适应算法进行了全面的理论研究。我们考虑在半监督设定下联合学习域对齐特征变换和共享分类器。我们首先在一般设定下,以相关函数类的覆盖数给出泛化界。随后,我们将分析扩展到采用最大均值差异(MMD)或对抗目标的域自适应神经网络,刻画其样本复杂度。我们的结果依赖于对这些架构覆盖数的严格分析。我们证明,对于基于MMD的模型和对抗模型,样本复杂度都存在一个随网络深度和宽度二次增长的上界。此外,我们的分析表明,在半监督设定下,将目标损失按有标签目标样本数的平方根成比例缩放,可以获得对有限有标签目标数据的鲁棒性。在浅层和深层设定下的实验评估支持了我们的理论发现。
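As a rough illustration of the maximum mean discrepancy (MMD) objective mentioned in this abstract, the sketch below computes the standard biased estimate of squared MMD between two one-dimensional samples. The Gaussian kernel, its bandwidth, and the toy samples are assumptions chosen for illustration; the paper's networks would apply such a discrepancy to learned features rather than raw scalars.

```python
import math


def rbf(x, y, bandwidth=1.0):
    # Gaussian (RBF) kernel between two scalars.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    # Average kernel value over all pairs (x, y).
    total = sum(rbf(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    # Biased estimate: E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


source = [0.0, 0.5, 1.0]      # made-up source-domain features
target = [3.0, 3.5, 4.0]      # made-up, shifted target-domain features
print(mmd_squared(source, source), mmd_squared(source, target))
```

Identical samples give a discrepancy of zero, while the shifted sample gives a clearly positive value, which is the quantity a domain-aligning feature transformation would be trained to reduce.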

Article 231

Title@2025-07-30 (3): Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs

Title: Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs

Graph Kollaboratives Aufmerksamkeitsnetzwerk für Link-Vorhersage in Wissensgraphen

知识图中预测联系协作关注网络 2507.03947v2

Authors (1): Thanh Hoang-Minh

Knowledge graphs offer a structured representation of real-world entities and their relationships, enabling a wide range of applications from information retrieval to automated reasoning. In this paper, we conduct a systematic comparison between traditional rule-based approaches and modern deep learning methods for link prediction. We focus on KBGAT, a graph neural network model that leverages multi-head attention to jointly encode both entity and relation features within local neighborhood structures. To advance this line of research, we introduce GCAT (Graph Collaborative Attention Network), a refined model that enhances context aggregation and interaction between heterogeneous nodes. Experimental results on four widely-used benchmark datasets demonstrate that GCAT not only consistently outperforms rule-based methods but also achieves competitive or superior performance compared to existing neural embedding models. Our findings highlight the advantages of attention-based architectures in capturing complex relational patterns for knowledge graph completion tasks.

知识图谱提供了对现实世界实体及其关系的结构化表示,支持从信息检索到自动推理的广泛应用。在本文中,我们对传统的基于规则的方法和现代的深度学习链接预测方法进行了系统比较。我们重点研究KBGAT,这是一个图神经网络模型,它利用多头注意力在局部邻域结构中对实体和关系特征进行联合编码。为了推进这一研究方向,我们提出了GCAT(Graph Collaborative Attention Network),这是一个改进的模型,增强了异构节点之间的上下文聚合和交互。在四个广泛使用的基准数据集上的实验结果表明,GCAT不仅始终优于基于规则的方法,而且与现有的神经嵌入模型相比取得了有竞争力或更优的性能。我们的发现突出了基于注意力的架构在为知识图谱补全任务捕捉复杂关系模式方面的优势。

Article 232

Title@2025-07-30 (3): The Cooperative Network Architecture: Learning Structured Networks as Representation of Sensory Patterns

Title: The Cooperative Network Architecture: Learning Structured Networks as Representation of Sensory Patterns

Die kooperative Netzwerkarchitektur: Lernstrukturierte Netzwerke als Darstellung sensorischer Muster

合作网络架构:学习结构网络作为感官模式的体现 2407.05650v4

Authors (5): Pascal J. Sager, Jan M. Deriu, Benjamin F. Grewe, Thilo Stadelmann, Christoph von der Malsburg

We introduce the Cooperative Network Architecture (CNA), a model that represents sensory signals using structured, recurrently connected networks of neurons, termed “nets.” Nets are dynamically assembled from overlapping net fragments, which are learned based on statistical regularities in sensory input. This architecture offers robustness to noise, deformation, and out-of-distribution data, addressing challenges in current vision systems from a novel perspective. We demonstrate that net fragments can be learned without supervision and flexibly recombined to encode novel patterns, enabling figure completion and resilience to noise. Our findings establish CNA as a promising paradigm for developing neural representations that integrate local feature processing with global structure formation, providing a foundation for future research on invariant object recognition.

我们引入了合作网络架构(CNA),这是一个代表感官信号的模型,它使用结构化的、经常连接的神经元网络网络,称为“网络 ” 。网络是由重叠的净碎片动态地组成的,它们是根据感官输入的统计规律学学到的。这一架构为噪音、变形和分配外数据提供了稳健性,从新角度应对当前视觉系统中的挑战。我们证明,在没有监督的情况下,可以学习净碎片,而无需灵活地重新组合,以编码新模式,使图象的完成和对噪音的适应能力。我们的发现建立了CNA,作为发展将本地地物处理与全球结构形成相结合的神经表层的有希望的模式,为未来关于异物识别的研究提供了基础。

Article 233

Title@2025-07-30 (3): Skull-stripping induces shortcut learning in MRI-based Alzheimer’s disease classification

Title: Skull-stripping induces shortcut learning in MRI-based Alzheimer’s disease classification

Skull-Stripping induziert Shortcut-Lernen in der MRT-basierten Alzheimer-Klassifikation

颅骨剥离在基于MRI的阿尔茨海默病分类中诱发捷径学习 2501.15831v3

Authors (6): Christian Tinauer, Maximilian Sackl, Rudolf Stollberger, Reinhold Schmidt, Stefan Ropele, Christian Langkammer

Objectives: High classification accuracy of Alzheimer’s disease (AD) from structural MRI has been achieved using deep neural networks, yet the specific image features contributing to these decisions remain unclear. In this study, the contributions of T1-weighted (T1w) gray-white matter texture, volumetric information, and preprocessing – particularly skull-stripping – were systematically assessed. Methods: A dataset of 990 matched T1w MRIs from AD patients and cognitively normal controls from the ADNI database was used. Preprocessing was varied through skull-stripping and intensity binarization to isolate texture and shape contributions. A 3D convolutional neural network was trained on each configuration, and classification performance was compared using exact McNemar tests with discrete Bonferroni-Holm correction. Feature relevance was analyzed using Layer-wise Relevance Propagation, image similarity metrics, and spectral clustering of relevance maps. Results: Despite substantial differences in image content, classification accuracy, sensitivity, and specificity remained stable across preprocessing conditions. Models trained on binarized images preserved performance, indicating minimal reliance on gray-white matter texture. Instead, volumetric features – particularly brain contours introduced through skull-stripping – were consistently used by the models. Conclusions: This behavior reflects a shortcut learning phenomenon, where preprocessing artifacts act as potentially unintended cues. The resulting Clever Hans effect emphasizes the critical importance of interpretability tools to reveal hidden biases and to ensure robust and trustworthy deep learning in medical imaging.

目的:利用深度神经网络,已经可以从结构性MRI中高精度地分类阿尔茨海默病(AD),但促成这些决策的具体图像特征仍不清楚。本研究系统评估了T1加权(T1w)灰白质纹理、体积信息和预处理(尤其是颅骨剥离)的贡献。方法:使用了来自ADNI数据库的990幅相匹配的AD患者和认知正常对照的T1w MRI数据集。通过颅骨剥离和强度二值化改变预处理方式,以分离纹理和形状的贡献。针对每种配置训练了一个3D卷积神经网络,并使用精确McNemar检验和离散Bonferroni-Holm校正比较分类性能。使用逐层相关性传播、图像相似性度量和相关性图的谱聚类分析了特征相关性。结果:尽管图像内容差异很大,分类准确率、灵敏度和特异度在各预处理条件下保持稳定。在二值化图像上训练的模型保持了性能,表明其对灰白质纹理的依赖极小。相反,模型始终利用体积特征,尤其是颅骨剥离引入的脑轮廓。结论:这种行为反映了一种捷径学习现象,即预处理伪影充当了可能并非本意的线索。由此产生的“聪明的汉斯”效应强调了可解释性工具对于揭示隐藏偏差、确保医学影像中深度学习稳健可信的关键重要性。

Article 234

Title@2025-07-30 (3): TempRe: Template generation for single and direct multi-step retrosynthesis

Title: TempRe: Template generation for single and direct multi-step retrosynthesis

TempRe: Template-Generierung für einzelne und direkte Mehrschritt-Retrosynthese

Tempre: 用于单步和直接多步复演合成的模板生成 2507.21762v2

Authors (4): Nguyen Xuan-Vu, Daniel P Armstrong, Zlatko Jončev, Philippe Schwaller

Retrosynthesis planning remains a central challenge in molecular discovery due to the vast and complex chemical reaction space. While traditional template-based methods offer tractability, they suffer from poor scalability and limited generalization, and template-free generative approaches risk generating invalid reactions. In this work, we propose TempRe, a generative framework that reformulates template-based approaches as sequence generation, enabling scalable, flexible, and chemically plausible retrosynthesis. We evaluated TempRe across single-step and multi-step retrosynthesis tasks, demonstrating its superiority over both template classification and SMILES-based generation methods. On the PaRoutes multi-step benchmark, TempRe achieves strong top-k route accuracy. Furthermore, we extend TempRe to direct multi-step synthesis route generation, providing a lightweight and efficient alternative to conventional single-step and search-based approaches. These results highlight the potential of template generative modeling as a powerful paradigm in computer-aided synthesis planning.

由于化学反应空间广泛而复杂,重新合成规划仍然是分子发现的一个中心挑战,因为化学反应空间广泛而复杂,传统模板方法具有可移动性,但具有可缩放性且有限的一般性和无模板的基因化方法,产生无效反应的风险。在这项工作中,我们提议TempRe,这是一个基因化框架,将基于模板的方法改制为序列生成,使可缩放、灵活和化学上看似可信的复古法。我们评估了跨单步和多步复古法任务的Tempre,表明其优于模板分类和基于SMILES的生成方法。在Paroutes多步基准中,Tempre实现了强的顶级路径准确性。此外,我们将Temre扩大到直接的多步合成路径生成,为传统的单步和搜索方法提供了轻量和高效的替代方法。这些结果突出表明了模板变形模型作为计算机辅助合成规划的强大范例的潜力。

Article 235

Title@2025-07-30 (3): Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction

Title: Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction

Komprimierungsmethode für das Deep Diagonal State Space Model basierend auf $H^2$ Optimale Reduktion

基于 $H^2$ 最优降阶的深度对角状态空间模型压缩方法 2507.10078v2

Authors (2): Hiroki Sakamoto, Kazuhiro Sato

Deep learning models incorporating linear SSMs have gained attention for capturing long-range dependencies in sequential data. However, their large parameter sizes pose challenges for deployment on resource-constrained devices. In this study, we propose an efficient parameter reduction method for these models by applying $H^{2}$ model order reduction techniques from control theory to their linear SSM components. In experiments on the LRA benchmark, model compression based on our proposed method outperforms an existing method based on balanced truncation, while successfully reducing the number of parameters in the SSMs to $1/32$ without sacrificing the performance of the original models.

结合线性状态空间模型(SSM)的深度学习模型因能够捕捉序列数据中的长程依赖而受到关注。然而,其庞大的参数规模给在资源受限设备上的部署带来了挑战。在本研究中,我们将控制理论中的$H^{2}$模型降阶技术应用于这些模型的线性SSM组件,提出了一种高效的参数削减方法。在实验中,LRA基准结果表明,基于我们所提方法的模型压缩优于使用平衡截断的现有方法,同时在不牺牲原始模型性能的情况下,成功地将SSM的参数数量减少到$1/32$。
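To make the $H^2$ criterion concrete, the sketch below evaluates the squared $H^2$ norm of a diagonal linear system and the squared $H^2$ error of a candidate reduced model. It is not the paper's reduction algorithm: it assumes a continuous-time, stable, single-input single-output system written as $G(s)=\sum_i r_i/(s-\lambda_i)$ (the paper's SSMs may be discretized or parameterized differently), and the poles and residues are made up. Under that assumption, $\|G\|_{H^2}^2 = -\sum_{i,j} r_i \bar r_j / (\lambda_i + \bar\lambda_j)$.

```python
def h2_norm_squared(poles, residues):
    # Squared H2 norm of G(s) = sum_i r_i / (s - lambda_i), all Re(lambda_i) < 0.
    total = 0j
    for lam_i, r_i in zip(poles, residues):
        for lam_j, r_j in zip(poles, residues):
            lam_i, r_i = complex(lam_i), complex(r_i)
            lam_j, r_j = complex(lam_j), complex(r_j)
            total += -r_i * r_j.conjugate() / (lam_i + lam_j.conjugate())
    return total.real


def h2_error_squared(full, reduced):
    # H2 norm of the difference system G - G_r: stack the poles of both
    # systems and negate the residues of the reduced one.
    poles = list(full[0]) + list(reduced[0])
    residues = list(full[1]) + [-r for r in reduced[1]]
    return h2_norm_squared(poles, residues)


full_model = ([-1.0, -2.0, -5.0], [1.0, 1.0, 0.01])   # (poles, residues), made up
reduced_model = ([-1.0, -2.0], [1.0, 1.0])            # weakly excited mode dropped
print(h2_norm_squared(*full_model), h2_error_squared(full_model, reduced_model))
```

An $H^2$-optimal reduction would choose the reduced poles and residues to minimize this error rather than simply dropping a mode as done here.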

Article 236

Title@2025-07-30 (3): Deep learning of geometrical cell division rules

Title: Deep learning of geometrical cell division rules

Deep learning von geometrischen Zellteilungsregeln

深入学习几几何细胞分区规则 2507.22587v1

Authors (3): Alexandre Durrmeyer, Jean-Christophe Palauqui, Philippe Andrey

The positioning of new cellular walls during cell division plays a key role in shaping plant tissue organization. The influence of cell geometry on the positioning of division planes has previously been captured in various geometrical rules. Accordingly, linking cell shape to division orientation has relied on the comparison between observed division patterns and predictions under specific rules. The need to define the tested rules a priori is a fundamental limitation of this hypothesis-driven approach. As an alternative, we introduce a data-based approach to investigate the relation between cell geometry and division plane positioning, exploiting the ability of deep neural networks to learn complex relationships across multidimensional spaces. Adopting an image-based cell representation, we show how division patterns can be learned and predicted from mother cell geometry using a UNet architecture modified to operate on cell masks. Using synthetic data and A. thaliana embryo cells, we evaluate the model performance on a wide range of diverse cell shapes and division patterns. We find that the trained model accounted for embryo division patterns that were previously irreconcilable under existing geometrical rules. Our work shows the potential of deep networks to understand cell division patterns and to generate new hypotheses on the control of cell division positioning.

细胞分裂期间新细胞墙的定位在形成植物组织中起着关键作用。细胞几何测量对分裂飞机定位的影响先前被记录在各种几何规则中。因此, 将细胞形状与分裂方向联系起来依赖于对观察到的分裂模式和根据具体规则作出的预测的比较。需要先验地界定经过测试的规则是这种假设驱动的方法的一个基本局限性。作为替代办法, 我们采用基于数据的方法来调查细胞几何和分裂平面定位之间的关系, 利用深神经网络的能力来学习跨多维空间的复杂关系。采用基于图像的细胞代表, 我们展示如何利用经修改的UNet结构从母细胞几何测量中学习和预测分裂模式, 使用合成数据和A. Thaliana胚胎细胞, 我们评估了多种不同的细胞形状和分裂模式的模型性能。我们发现,经过训练的模型考虑到胚胎分裂模式,而根据现有的几何规则,这些模式以前是无法调和的。我们的工作展示了深网络在理解细胞分裂模式和产生新的细胞分裂定位控制方面的潜力。

Article 237

Title@2025-07-30 (3): A Mean-Field Theory of $Θ$-Expectations

Title: A Mean-Field Theory of $Θ$-Expectations

Eine Mittlere-Feld-Theorie von $Θ$-Erwartungen

$Θ$-期望的平均场理论 2507.22577v1

Authors (1): Qian Qi

The canonical theory of sublinear expectations, a foundation of stochastic calculus under ambiguity, is insensitive to the non-convex geometry of primitive uncertainty models. This paper develops a new stochastic calculus for a structured class of such non-convex models. We introduce a class of fully coupled Mean-Field Forward-Backward Stochastic Differential Equations where the BSDE driver is defined by a pointwise maximization over a law-dependent, non-convex set. Mathematical tractability is achieved via a uniform strong concavity assumption on the driver with respect to the control variable, which ensures the optimization admits a unique and stable solution. A central contribution is to establish the Lipschitz stability of this optimizer from primitive geometric and regularity conditions, which underpins the entire well-posedness theory. We prove local and global well-posedness theorems for the FBSDE system. The resulting valuation functional, the $\Theta$-Expectation, is shown to be dynamically consistent and, most critically, to violate the axiom of sub-additivity. This, along with its failure to be translation invariant, demonstrates its fundamental departure from the convex paradigm. This work provides a rigorous foundation for stochastic calculus under a class of non-convex, endogenous ambiguity.

次线性期望的经典理论是模糊性下随机分析的基础,但它对原始不确定性模型的非凸几何并不敏感。本文为一类结构化的非凸模型发展了一种新的随机分析。我们引入了一类完全耦合的平均场正倒向随机微分方程,其中BSDE的驱动项由在一个依赖于分布的非凸集合上逐点取最大值来定义。数学上的可处理性通过驱动项关于控制变量的一致强凹性假设来实现,这保证了优化问题存在唯一且稳定的解。一个核心贡献是从原始的几何和正则性条件出发,建立该最优解的Lipschitz稳定性,它支撑着整个适定性理论。我们证明了FBSDE系统的局部和全局适定性定理。由此得到的估值泛函,即$\Theta$-期望,被证明是动态一致的,并且最关键的是,它违反了次可加性公理。这一点连同它不满足平移不变性,表明它从根本上偏离了凸范式。这项工作为一类非凸的内生模糊性下的随机分析提供了严格的基础。

Article 238

Title@2025-07-30 (3): MLMC-based Resource Adequacy Assessment with Active Learning Trained Surrogate Models

Title: MLMC-based Resource Adequacy Assessment with Active Learning Trained Surrogate Models

MLMC-basierte Ressourcenadäquatitätsbewertung mit aktiven Learning-Trained-Surrogate-Modellen

以MLMC为基础的基于MLMC的资源充足性评估,与积极学习、经过培训的代用模型进行资源充足性评估 2505.20930v2

Authors (2): Ruiqi Zhang, Simon H. Tindemans

Multilevel Monte Carlo (MLMC) is a flexible and effective variance reduction technique for accelerating reliability assessments of complex power systems. Recently, data-driven surrogate models have been proposed as lower-level models in the MLMC framework due to their high correlation and negligible execution time once trained. However, in resource adequacy assessments, pre-labeled datasets are typically unavailable. For large-scale systems, the efficiency gains from surrogate models are often offset by the substantial time required for labeling training data. Therefore, this paper introduces a speed metric that accounts for training time in evaluating MLMC efficiency. Considering that the total time budget is limited, a vote-by-committee active learning approach is proposed to reduce the required labeling calls. A case study demonstrates that, within a given computational budget, active learning in combination with MLMC can result in a substantial reduction in variance.

多层次蒙特卡洛(MLMC)是加速复杂电力系统可靠性评估的一种灵活而有效的减少差异技术,最近,由于数据驱动代用模型具有高度相关性,培训后执行时间微不足道,因此在MLMC框架内提出了数据驱动代用模型,作为较低级别的模型;然而,在资源充足性评估中,通常没有预贴标签的数据集;对于大型系统而言,代用模型的效率收益往往被标明培训数据所需的大量时间所抵消;因此,本文件引入了计算评价MLMC效率培训时间的速度指标;考虑到总时间有限,建议采用逐个表决的积极学习方法,以减少所需的标签要求;一项案例研究表明,在特定计算预算内,与MLMC一起积极学习可能会导致大幅缩小差异。
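The role of a surrogate as the lower level in MLMC can be sketched with a two-level estimator: many cheap surrogate evaluations estimate the bulk of the expectation, and a few expensive evaluations estimate the correction between the two models. The functions, sample sizes, and uniform input distribution below are toy assumptions, not the power-system models or the active learning procedure of the paper.

```python
import random


def high_fidelity(x):
    # Stand-in for the expensive model; its true mean under U(0, 1) is 1/3.
    return x * x


def surrogate(x):
    # Stand-in for a trained surrogate: cheap, biased, highly correlated.
    return x * x + 0.1 * (x - 0.5)


def two_level_estimate(n_cheap, n_expensive, seed=0):
    rng = random.Random(seed)
    # Level 0: many cheap surrogate evaluations.
    coarse = sum(surrogate(rng.random()) for _ in range(n_cheap)) / n_cheap
    # Level 1: few paired evaluations of the difference, which has low variance
    # because the surrogate tracks the expensive model closely.
    correction = 0.0
    for _ in range(n_expensive):
        x = rng.random()
        correction += high_fidelity(x) - surrogate(x)
    correction /= n_expensive
    # Telescoping sum: E[f] = E[g] + E[f - g].
    return coarse + correction


print(two_level_estimate(n_cheap=20000, n_expensive=200))
```

Only 200 expensive evaluations are used here; the closer the surrogate tracks the expensive model, the fewer such evaluations the correction term needs for a given variance.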

Article 239

Title@2025-07-30 (3): Hyperbolic Graph Learning: A Comprehensive Review

Title: Hyperbolic Graph Learning: A Comprehensive Review

Hyperbolisches Graphenlernen: Eine umfassende Übersicht

超双曲图学习:全面审查 2202.13852v3

Authors (8): Menglin Yang, Min Zhou, Tong Zhang, Jiahong Liu, Zhihao Li, Lujia Pan, Hui Xiong, Irwin King

Graph representation learning in Euclidean space, despite its widespread adoption and proven utility in many domains, often struggles to effectively capture the inherent hierarchical and complex relational structures prevalent in real-world data, particularly for datasets exhibiting a highly non-Euclidean latent anatomy or power-law distributions. Hyperbolic geometry, with its constant negative curvature and exponential growth property, naturally accommodates such structures, offering a promising alternative for learning rich graph representations. This survey paper provides a comprehensive review of the rapidly evolving field of Hyperbolic Graph Learning (HGL). We systematically categorize and analyze existing methods, broadly dividing them into (1) hyperbolic graph embedding-based techniques, (2) graph neural network-based hyperbolic models, and (3) emerging paradigms. Beyond methodologies, we extensively discuss diverse applications of HGL across multiple domains, including recommender systems, knowledge graphs, bioinformatics, and other relevant scenarios, demonstrating the broad applicability and effectiveness of hyperbolic geometry in real-world graph learning tasks. Most importantly, we identify several key challenges that serve as directions for advancing HGL, including handling complex data structures, developing geometry-aware learning objectives, ensuring trustworthy and scalable implementations, and integrating with foundation models, e.g., large language models. We highlight promising research opportunities in this exciting interdisciplinary area. A comprehensive repository can be found at https://github.com/digailab/awesome-hyperbolic-graph-learning.

在欧几里德空间进行图示学,尽管它被广泛采用,并且在许多领域被证明是有用的,但往往在努力有效捕捉现实世界数据中普遍存在的固有的等级和复杂的关系结构,特别是展示高度非欧几里德潜伏解剖学或动力法分布的数据集。超曲形几何学,其常态的曲线和指数增长属性自然会适应这种结构,为学习丰富的图示提供了一个有希望的替代办法。这份调查文件全面审查了超双曲线图解学(HGL)领域迅速演变的情况。我们系统地将现有方法分为:(1)超曲线图嵌入技术,(2)基于图形神经网络的超曲线模型,和(3)新出现的范式。除了方法外,我们广泛讨论HGL的多种应用,包括建议系统、知识图表、生物文理学和其他相关情景,展示超曲线地理测量学在现实世界图解学学习任务中的广泛适用性和有效性。最重要的是,我们确定了若干关键挑战,作为推进HABL图像嵌入技术领域的方向,包括处理复杂数据模型的大型基础,我们找到了可信任的地理模型。

Article 240

Title@2025-07-30 (3): COOkeD: Ensemble-based OOD detection in the era of zero-shot CLIP

Title: COOkeD: Ensemble-based OOD detection in the era of zero-shot CLIP

COOkeD: Ensemble-basierte OOD-Erkennung im Zeitalter von Zero-Shot CLIP

COOOKD:在零弹CLIP时代以组合为基础的OOOD探测 2507.22576v1

Authors (4): Galadrielle Humblot-Renaux, Gianni Franchi, Sergio Escalera, Thomas B. Moeslund

Out-of-distribution (OOD) detection is an important building block in trustworthy image recognition systems, as unknown classes may arise at test time. OOD detection methods typically revolve around a single classifier, leading to a split in the research field between the classical supervised setting (e.g. a ResNet18 classifier trained on CIFAR100) and the zero-shot setting (class names fed as prompts to CLIP). In both cases, an overarching challenge is that the OOD detection performance is implicitly constrained by the classifier’s capabilities on in-distribution (ID) data. In this work, we show that given a little open-mindedness from both ends, remarkable OOD detection can be achieved by instead creating a heterogeneous ensemble - COOkeD combines the predictions of a closed-world classifier trained end-to-end on a specific dataset, a zero-shot CLIP classifier, and a linear probe classifier trained on CLIP image features. While bulky at first sight, this approach is modular, post-hoc, and leverages the availability of pre-trained VLMs, and thus introduces little overhead compared to training a single standard classifier. We evaluate COOkeD on popular CIFAR100 and ImageNet benchmarks, but also consider more challenging, realistic settings ranging from training-time label noise, to test-time covariate shift, to zero-shot shift which has been previously overlooked. Despite its simplicity, COOkeD achieves state-of-the-art performance and greater robustness compared to both classical and CLIP-based OOD detection methods. Code is available at https://github.com/glhr/COOkeD

分布外(OOD)检测是可信图像识别系统的重要组成部分,因为测试时可能出现未知类别。OOD检测方法通常围绕单一分类器展开,导致研究领域分裂为经典监督设置(例如在CIFAR100上训练的ResNet18分类器)与零样本设置(将类别名称作为提示输入CLIP)。在两种情况下,一个首要挑战是OOD检测性能隐含地受限于分类器在分布内(ID)数据上的能力。在这项工作中,我们表明,只要两端都保持一点开放态度,就可以通过构建异构集成实现出色的OOD检测:COOkeD结合了在特定数据集上端到端训练的封闭世界分类器、零样本CLIP分类器以及在CLIP图像特征上训练的线性探针分类器的预测。该方法乍看笨重,但它是模块化的、事后的,并利用了现成的预训练VLM,因此与训练单个标准分类器相比开销很小。我们在常用的CIFAR100和ImageNet基准上评估COOkeD,同时也考虑更具挑战性的现实设置,包括训练时标签噪声、测试时协变量偏移以及此前被忽视的零样本偏移。尽管方法简单,COOkeD与经典方法和基于CLIP的OOD检测方法相比,取得了最先进的性能和更强的鲁棒性。代码见 https://github.com/glhr/COOkeD
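
The ensembling idea in this abstract can be sketched in a few lines. The abstract does not state how the three classifiers' predictions are combined or which OOD score is used, so the probability averaging and the "one minus maximum probability" score below are assumptions made for illustration, and the logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_probs(logit_sets):
    """Average the class probabilities of several classifiers (assumed rule)."""
    probs = [softmax(logits) for logits in logit_sets]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def ood_score(probs):
    """Higher means more likely OOD: one minus the largest averaged probability."""
    return 1.0 - max(probs)

# Hypothetical 3-class logits from the three ensemble members
# (closed-world classifier, zero-shot CLIP, linear probe on CLIP features).
agree = [[4.0, 0.0, 0.0], [3.5, 0.2, 0.1], [4.2, 0.1, 0.0]]
disagree = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]

score_agree = ood_score(ensemble_probs(agree))
score_disagree = ood_score(ensemble_probs(disagree))
```

When the members disagree, the averaged distribution flattens and the score rises, which is one plausible reason a heterogeneous ensemble can help flag unknown classes.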

Article 241

Title@2025-07-30 (3): Explaining Deep Network Classification of Matrices: A Case Study on Monotonicity

Title: Explaining Deep Network Classification of Matrices: A Case Study on Monotonicity

Erklärung der tiefen Netzwerkklassifikation von Matrizen: Eine Fallstudie zur Monotonizität

解释矩阵的深度网络分类:单调性案例研究 2507.22570v1

Authors (2): Leandro Farina, Sergey Korotov

This work demonstrates a methodology for using deep learning to discover simple, practical criteria for classifying matrices based on abstract algebraic properties. By combining a high-performance neural network with explainable AI (XAI) techniques, we can distill a model’s learned strategy into human-interpretable rules. We apply this approach to the challenging case of monotone matrices, defined by the condition that their inverses are entrywise nonnegative. Despite their simple definition, no easy characterization in terms of the matrix elements or derived parameters is known. Here, we present, to the best of our knowledge, the first systematic machine-learning approach for deriving a practical criterion that distinguishes monotone from non-monotone matrices. After building a labelled dataset of randomly generated monotone and non-monotone matrices with entries drawn uniformly from $(-1,1)$, we train deep neural networks to classify the matrices as monotone or non-monotone, using both their entries and a comprehensive set of matrix features. Using saliency methods such as integrated gradients, we identify two matrix parameters that alone provide sufficient information to classify the matrices with $95\%$ accuracy: the absolute values of the two lowest-order coefficients, $c_0$ and $c_1$, of the matrix’s characteristic polynomial. A data-driven study of 18,000 random $7\times7$ matrices shows that the monotone class obeys $\lvert c_{0}/c_{1}\rvert\le0.18$ with probability $>99.98\%$; because $\lvert c_{0}/c_{1}\rvert = 1/\mathrm{tr}(A^{-1})$ for monotone $A$, this is equivalent to the simple bound $\mathrm{tr}(A^{-1})\ge5.7$.

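这项工作展示了一种利用深度学习发现简单实用判据的方法,用于根据抽象代数性质对矩阵进行分类。通过将高性能神经网络与可解释AI(XAI)技术相结合,我们可以将模型学到的策略提炼为人类可解释的规则。我们将该方法应用于单调矩阵这一具有挑战性的情形,单调矩阵的定义是其逆矩阵的每个元素均非负。尽管定义简单,目前尚不知道用矩阵元素或导出参数表述的简易刻画。据我们所知,本文提出了第一个系统的机器学习方法,用于导出区分单调矩阵与非单调矩阵的实用判据。我们先用元素在$(-1,1)$上均匀随机生成的单调与非单调矩阵建立带标签的数据集,再使用深度神经网络,基于矩阵元素和一组全面的矩阵特征将矩阵分类为单调或非单调。借助积分梯度等显著性方法,我们在所有特征中找出两个矩阵参数,仅凭它们就足以以$95\%$的准确率完成分类,即矩阵特征多项式两个最低阶系数$c_0$和$c_1$的绝对值。对18,000个随机$7\times7$矩阵的数据驱动研究表明,单调类以$>99.98\%$的概率满足$\lvert c_{0}/c_{1}\rvert\le0.18$;由于对单调矩阵$A$有$\lvert c_{0}/c_{1}\rvert = 1/\mathrm{tr}(A^{-1})$,这等价于简单的界$\mathrm{tr}(A^{-1})\ge5.7$。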
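
The identity $\lvert c_0/c_1\rvert = 1/\mathrm{tr}(A^{-1})$ quoted in the abstract can be checked numerically on a small example. The sketch below is not the authors' pipeline: it uses a hand-picked $3\times3$ monotone matrix (an assumption chosen for illustration), a Gauss-Jordan inverse, and the Faddeev-LeVerrier recursion for the characteristic polynomial.

```python
def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (pure Python)."""
    n = len(A)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_1, c_0] of det(x*I - A) (Faddeev-LeVerrier)."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coeffs = [1.0]
    M = identity
    for k in range(1, n + 1):
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)
        M = [[AM[i][j] + c * identity[i][j] for j in range(n)] for i in range(n)]
    return coeffs

# Illustrative monotone matrix: its inverse is entrywise nonnegative.
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
A_inv = inverse(A)
coeffs = char_poly(A)
c1, c0 = coeffs[-2], coeffs[-1]
ratio = abs(c0 / c1)
trace_inv = sum(A_inv[i][i] for i in range(len(A)))
is_monotone = all(v >= -1e-12 for row in A_inv for v in row)
```

For this matrix the characteristic polynomial is $x^3 - 6x^2 + 10x - 4$, so `ratio` and `1 / trace_inv` both equal 0.4.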

Article 242

Title@2025-07-30 (3): Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning

Title: Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning

Effizientes Differentielles Privates Feintuning von LLMs durch Verstärkungslernen

通过强化学习对LLM进行高效的差分隐私微调 2507.22565v1

Authors (5): Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen

The tension between data privacy and model utility has become the defining bottleneck for the practical deployment of large language models (LLMs) trained on sensitive corpora including healthcare. Differentially private stochastic gradient descent (DP-SGD) guarantees formal privacy, yet it does so at a pronounced cost: gradients are forcibly clipped and perturbed with noise, degrading sample efficiency and final accuracy. Numerous variants have been proposed to soften this trade-off, but they all share a handicap: their control knobs are hard-coded, global, and oblivious to the evolving optimization landscape. Consequently, practitioners are forced either to over-spend privacy budget in pursuit of utility, or to accept mediocre models in order to stay within privacy constraints. We present RLDP, the first framework to cast DP optimization itself as a closed-loop control problem amenable to modern deep reinforcement learning (RL). RLDP continuously senses rich statistics of the learning dynamics and acts by selecting fine-grained per-parameter gradient-clipping thresholds as well as the magnitude of injected Gaussian noise. A soft actor-critic (SAC) hyper-policy is trained online during language model fine-tuning; it learns, from scratch, how to allocate the privacy budget where it matters and when it matters. Across more than 1,600 ablation experiments on GPT2-small, Llama-1B, Llama-3B, and Mistral-7B, RLDP delivers perplexity reductions of 1.3-30.5% (mean 5.4%) and an average 5.6% downstream utility gain. RLDP reaches each baseline’s final utility after only 13-43% of the gradient-update budget (mean speed-up 71%), all while honoring the same ($\epsilon$, $\delta$)-DP contract and exhibiting equal or lower susceptibility to membership-inference and canary-extraction attacks.

数据隐私与模型效用之间的矛盾已成为在医疗等敏感语料上训练的大型语言模型(LLM)实际部署的决定性瓶颈。差分隐私随机梯度下降(DP-SGD)提供形式化的隐私保证,但代价显著:梯度被强制裁剪并加入噪声扰动,降低了样本效率和最终精度。已有许多变体试图缓和这一权衡,但它们有一个共同缺陷:控制参数是硬编码的、全局的,且无视不断变化的优化态势。因此,实践者要么为追求效用而过度消耗隐私预算,要么为满足隐私约束而接受平庸的模型。我们提出RLDP,这是首个将DP优化本身视为闭环控制问题、并用现代深度强化学习(RL)求解的框架。RLDP持续感知学习动态的丰富统计量,并通过选择细粒度的逐参数梯度裁剪阈值以及注入高斯噪声的幅度来采取行动。软演员-评论家(SAC)超策略在语言模型微调期间在线训练,从零开始学习在何处、何时分配隐私预算。在GPT2-small、Llama-1B、Llama-3B和Mistral-7B上的1600多组消融实验中,RLDP使困惑度降低1.3-30.5%(平均5.4%),下游效用平均提升5.6%。RLDP仅用13-43%的梯度更新预算即可达到各基线的最终效用(平均加速71%),同时遵守相同的($\epsilon$, $\delta$)-DP约束,并且对成员推断和金丝雀提取攻击的敏感性相同或更低。
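
The two control knobs the abstract refers to, the clipping threshold and the noise magnitude, appear in a plain DP-SGD gradient step as sketched below. This is generic DP-SGD rather than RLDP itself: the function names are hypothetical, the knobs are passed in as fixed arguments, and the learned policy that would choose them is not modelled.

```python
import math
import random

def clip(grad, threshold):
    """Scale a per-sample gradient so its L2 norm is at most `threshold`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_gradient(per_sample_grads, clip_threshold, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, clip_threshold) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_threshold  # noise scales with the threshold
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-sample gradients

# With the noise switched off, only the clipping effect remains visible.
noiseless = dp_sgd_gradient(grads, clip_threshold=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_gradient(grads, clip_threshold=1.0, noise_multiplier=1.0, rng=rng)
```

A tighter threshold or a larger noise multiplier strengthens privacy but distorts the update more, which is the trade-off an adaptive controller would try to manage.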

Article 243

Title@2025-07-30 (3): Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs

Title: Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs

Nutzung synergistischer Kognitiv-Biasen zur Umgehung der Sicherheit in LLMs

利用协同认知偏差绕过LLM的安全机制 2507.22564v1

Authors (5): Xikang Yang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu

Large Language Models (LLMs) demonstrate impressive capabilities across a wide range of tasks, yet their safety mechanisms remain susceptible to adversarial attacks that exploit cognitive biases – systematic deviations from rational judgment. Unlike prior jailbreaking approaches focused on prompt engineering or algorithmic manipulation, this work highlights the overlooked power of multi-bias interactions in undermining LLM safeguards. We propose CognitiveAttack, a novel red-teaming framework that systematically leverages both individual and combined cognitive biases. By integrating supervised fine-tuning and reinforcement learning, CognitiveAttack generates prompts that embed optimized bias combinations, effectively bypassing safety protocols while maintaining high attack success rates. Experimental results reveal significant vulnerabilities across 30 diverse LLMs, particularly in open-source models. CognitiveAttack achieves a substantially higher attack success rate compared to the SOTA black-box method PAP (60.1% vs. 31.6%), exposing critical limitations in current defense mechanisms. These findings highlight multi-bias interactions as a powerful yet underexplored attack vector. This work introduces a novel interdisciplinary perspective by bridging cognitive science and LLM safety, paving the way for more robust and human-aligned AI systems.

大型语言模型(LLMS)展示了在一系列广泛任务中令人印象深刻的能力,然而,其安全机制仍然容易受到利用认知偏差 – – 系统偏离理性判断的系统偏差 – – 的对抗性攻击。与以往侧重于迅速工程或算法操纵的侵入性做法不同,这项工作凸显了在破坏LLM保障措施方面多偏见互动的被忽视力量。我们提议了CognitiveAttack,这是一个新型的红色组合框架,系统地利用个人和综合认知偏差。通过整合受监督的微调和强化学习,CognitiveAttack生成了闪烁,将最佳偏差组合嵌入其中,有效绕过安全协议,同时保持高攻击成功率。实验结果揭示了30个不同LMSM的显著脆弱性,特别是在开放源模型中。ConnitiveAtack实现了比SOTA黑箱方法PAP(60.1%对31.6%)要高得多的攻击成功率,暴露了当前防御机制的关键限制。这些发现突出了多偏见相互作用,作为强大但未被充分利用的攻击矢控矢量媒介。这项工作通过连接认知科学和LLM安全系统,为新的学科视角。

Article 244

Title@2025-07-30 (3): VAR: Visual Analysis for Rashomon Set of Machine Learning Models’ Performance

Title: VAR: Visual Analysis for Rashomon Set of Machine Learning Models’ Performance

VAR: Visuelle Analyse der Leistungsfähigkeit von Rashomon-Modellen

VAR: Rashomon系列机器学习模型的视觉分析 2507.22556v1

Authors (1): Yuanzhe Jin

Evaluating the performance of closely matched machine learning (ML) models under specific conditions has long been a focus of machine learning research. The Rashomon set is a collection of closely matched ML models with similar accuracies but different structures. Traditionally, the analysis of these sets has focused on vertical structural analysis, which involves comparing the corresponding features at various levels within the ML models. However, there has been a lack of effective visualization methods for horizontally comparing multiple models with specific features. We propose the VAR visualization solution. VAR uses visualization to perform comparisons of ML models within the Rashomon set. This solution combines heatmaps and scatter plots to facilitate the comparison. With the help of VAR, ML model developers can identify the optimal model under specific conditions and better understand the Rashomon set’s overall characteristics.

长期以来,评估在特定条件下密切匹配的机器学习模型(ML)的性能一直是机器学习领域研究人员的一个重点。Rashomon集集收集了紧密匹配的ML模型,其中包括一系列类似但结构不同的模型,传统上,对这些数据集的分析侧重于纵向结构分析,这涉及比较ML模型不同层次的相应特征。然而,缺乏有效的可视化方法来横向比较具有具体特征的多个模型。我们提议VAR可视化解决方案。VAR利用可视化方法在Rashomon集中比较ML模型。这一解决方案将热图和分散图结合起来,以便于比较。在VAR的帮助下,ML模型开发者可以在特定条件下确定最佳模型,并更好地了解Rashomon集的总体特征。

Article 245

Title@2025-07-30 (3): DeepC4: Deep Conditional Census-Constrained Clustering for Large-scale Multitask Spatial Disaggregation of Urban Morphology

Title: DeepC4: Deep Conditional Census-Constrained Clustering for Large-scale Multitask Spatial Disaggregation of Urban Morphology

DeepC4: Deep Conditional Census-Constrained Clustering für großflächige Multitasking-Spatiale Disaggregation der Urbanen Morphologie

DeepC4:用于大规模多任务城市形态空间分解的深度条件普查约束聚类 2507.22554v1

Authors (3): Joshua Dimasaka, Christian Geiß, Emily So

To understand our global progress for sustainable development and disaster risk reduction in many developing economies, two recent major initiatives - the Uniform African Exposure Dataset of the Global Earthquake Model (GEM) Foundation and the Modelling Exposure through Earth Observation Routines (METEOR) Project - implemented classical spatial disaggregation techniques to generate large-scale mapping of urban morphology using the information from various satellite imagery and its derivatives, geospatial datasets of the built environment, and subnational census statistics. However, the local discrepancy with well-validated census statistics and the propagated model uncertainties remain a challenge in such coarse-to-fine-grained mapping problems, specifically constrained by weak and conditional label supervision. Therefore, we present Deep Conditional Census-Constrained Clustering (DeepC4), a novel deep learning-based spatial disaggregation approach that incorporates local census statistics as cluster-level constraints while considering multiple conditional label relationships in a joint multitask learning of the patterns of satellite imagery. To demonstrate, compared to GEM and METEOR, we enhanced the quality of Rwandan maps of urban morphology, specifically building exposure and physical vulnerability, at the third-level administrative unit from the 2022 census. As the world approaches the conclusion of our global frameworks in 2030, our work has offered a new deep learning-based mapping technique towards a spatial auditing of our existing coarse-grained derived information at large scales.

为了解许多发展中经济体在可持续发展和减少灾害风险方面的全球进展,近期两项重要举措,即全球地震模型(GEM)基金会的非洲统一暴露数据集和通过对地观测例程建模暴露(METEOR)项目,采用经典空间分解技术,利用各类卫星图像及其衍生产品、建成环境地理空间数据集和国家以下各级人口普查统计数据,生成大尺度城市形态图。然而,在这类由粗到细的制图问题中,与经过充分验证的人口普查统计数据之间的局部差异以及传播的模型不确定性仍是一项挑战,尤其受限于薄弱且有条件的标签监督。因此,我们提出深度条件普查约束聚类(DeepC4),这是一种新的基于深度学习的空间分解方法,它将地方普查统计数据作为聚类级约束,同时在对卫星图像模式的联合多任务学习中考虑多种条件标签关系。作为示范,与GEM和METEOR相比,我们基于2022年人口普查,在第三级行政单位上提高了卢旺达城市形态图(特别是建筑暴露和物理脆弱性)的质量。随着世界临近2030年全球框架的收官,我们的工作为在大尺度上对现有粗粒度衍生信息进行空间审计提供了一种新的基于深度学习的制图技术。

Article 246

Title@2025-07-30 (3): RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning

Title: RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning

RainbowPrompt: Diversity-enhanced Prompt-Evolving für kontinuierliches Lernen

RainbowPrompt:面向持续学习的多样性增强提示演化 2507.22553v1

Authors (3): Kiseong Hong, Gyeong-hyeon Kim, Eunwoo Kim

Prompt-based continual learning provides a rehearsal-free solution by tuning small sets of parameters while keeping pre-trained models frozen. To meet the complex demands of sequential tasks, it is crucial to integrate task-specific knowledge within prompts effectively. However, existing works rely on either fixed learned prompts (i.e., prompts whose representations remain unchanged during new task learning) or on prompts generated from an entangled task-shared space, limiting the representational diversity of the integrated prompt. To address this issue, we propose a novel prompt-evolving mechanism to adaptively aggregate base prompts (i.e., task-specific prompts) into a unified prompt while ensuring diversity. By transforming and aligning base prompts, both previously learned and newly introduced, our approach continuously evolves accumulated knowledge to facilitate learning new tasks. We further introduce a learnable probabilistic gate that adaptively determines which layers to activate during the evolution process. We validate our method on image classification and video action recognition tasks in class-incremental learning, achieving average gains of 9.07% and 7.40% over existing methods across all scenarios.

以快速为基础的持续学习通过调整小型参数提供无排练的解决方案,同时将经过预先训练的模型冻结下来。为了满足连续任务的复杂需求,必须有效地将具体任务的知识纳入迅速的迅速之中。然而,现有的工程依靠固定的学习提示(即在新任务学习期间其表现没有变化的提示)或由交织的任务共享空间产生的提示,限制综合提示的代表性多样性。为了解决这一问题,我们提议建立一个新的快速发展机制,以适应性综合基础提示(即任务特定提示)为统一的快速机制,同时确保多样性。通过改造和调整基础提示(包括以前学习的和新引进的提示),我们的方法不断发展积累的知识,以便利学习新任务。我们进一步引入一个可学习的概率性门,以适应性决定进化过程中要激活的层次。我们验证我们在课堂学习中图像分类和视频动作识别任务的方法,在所有情景中实现平均9.07%和7.40%比现有方法平均增加9.7%和7.4%。

Article 247

Title@2025-07-30 (3): Thermodynamics-Inspired Computing with Oscillatory Neural Networks for Inverse Matrix Computation

Title: Thermodynamics-Inspired Computing with Oscillatory Neural Networks for Inverse Matrix Computation

Thermodynamik-inspiriertes Rechnen mit oszillatorischen Neuronalen Netzwerken für Inverse Matrix Computation

基于振荡神经网络的热力学启发计算用于矩阵求逆 2507.22544v1

Authors (3): George Tsormpatzoglou, Filip Sabo, Aida Todri-Sanial

We describe a thermodynamics-inspired computing paradigm based on oscillatory neural networks (ONNs). While ONNs have been widely studied as Ising machines for tackling complex combinatorial optimization problems, this work investigates their feasibility in solving linear algebra problems, specifically matrix inversion. Grounded in thermodynamic principles, we analytically demonstrate that the linear approximation of the coupled Kuramoto oscillator model leads to the inverse matrix solution. Numerical simulations validate the theoretical framework, and we examine the parameter regimes in which the computation has the highest accuracy.

我们描述了一种基于振荡神经网络(ONN)的热力学启发计算范式。虽然ONN作为求解复杂组合优化问题的Ising机已被广泛研究,但本文研究其求解线性代数问题(特别是矩阵求逆)的可行性。基于热力学原理,我们从解析上证明,耦合Kuramoto振子模型的线性近似可得到逆矩阵解。数值模拟验证了该理论框架,我们还考察了计算精度最高的参数区间。
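
The general idea that a relaxing linear dynamical system can "compute" an inverse can be illustrated without any oscillator physics. The sketch below is an assumption-laden stand-in, not the paper's linearized Kuramoto model: it integrates $\dot{x} = b - Ax$ with forward Euler for a symmetric positive-definite $A$, whose fixed point is $x^\ast = A^{-1}b$, and recovers the inverse one column at a time.

```python
def relax(A, b, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = b - A x, starting from x = 0."""
    x = [0.0] * len(b)
    for _ in range(steps):
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        x = [xi + dt * (bi - axi) for xi, bi, axi in zip(x, b, Ax)]
    return x

def inverse_by_relaxation(A):
    """Build A^{-1} column by column from the steady states for unit vectors b."""
    n = len(A)
    columns = []
    for j in range(n):
        unit = [1.0 if i == j else 0.0 for i in range(n)]
        columns.append(relax(A, unit))
    # columns[j][i] is entry (i, j) of the inverse; transpose into rows.
    return [[columns[j][i] for j in range(n)] for i in range(n)]

# Illustrative symmetric positive-definite matrix (all eigenvalues > 0),
# which guarantees that the dynamics converge.
A = [[2.0, 1.0], [1.0, 3.0]]
A_inv = inverse_by_relaxation(A)
```

Convergence here relies on the eigenvalues of `A` being positive and on `dt` being small relative to the largest one; which parameter regimes work for the actual oscillator hardware is what the abstract says the authors study.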

Article 248

Title@2025-07-30 (3): Pre-trained Models Perform the Best When Token Distributions Follow Zipf’s Law

Title: Pre-trained Models Perform the Best When Token Distributions Follow Zipf’s Law

Vortrainierte Modelle führen das Beste aus, wenn Token-Distributionen Zipfs Gesetz folgen

当词元分布遵循Zipf定律时预训练模型表现最佳 2507.22543v1

Authors (3): Yanjin He, Qingkai Zeng, Meng Jiang

Tokenization is a fundamental step in natural language processing (NLP) and other sequence modeling domains, where the choice of vocabulary size significantly impacts model performance. Despite its importance, selecting an optimal vocabulary size remains underexplored, typically relying on heuristics or dataset-specific choices. In this work, we propose a principled method for determining the vocabulary size by analyzing token frequency distributions through Zipf’s law. We show that downstream task performance correlates with how closely token distributions follow power-law behavior, and that aligning with Zipfian scaling improves both model efficiency and effectiveness. Extensive experiments across NLP, genomics, and chemistry demonstrate that models consistently achieve peak performance when the token distribution closely adheres to Zipf’s law, establishing Zipfian alignment as a robust and generalizable criterion for vocabulary size selection.

分词是自然语言处理(NLP)和其他序列建模领域的基本步骤,词表大小的选择对模型性能有显著影响。尽管十分重要,最优词表大小的选择仍缺乏充分研究,通常依赖启发式方法或针对特定数据集的选择。在这项工作中,我们提出一种有原则的方法,通过Zipf定律分析词元频率分布来确定词表大小。我们表明,下游任务性能与词元分布遵循幂律行为的程度相关,与Zipf标度对齐可同时提高模型的效率和效果。在NLP、基因组学和化学领域的大量实验表明,当词元分布严格遵循Zipf定律时,模型始终达到最佳性能,从而确立Zipf对齐是选择词表大小的稳健且可推广的准则。
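
One simple way to quantify how closely a token distribution follows Zipf's law is to fit a line to log-frequency against log-rank; an ideal Zipfian distribution gives a slope near $-1$. The abstract does not say which goodness-of-fit measure the authors use, so the least-squares slope below is an assumption, and the token streams are synthetic.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) versus log(rank)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic stream in which the token of rank r occurs about 1000 / r times.
zipfian = [f"tok{r}" for r in range(1, 51) for _ in range(round(1000 / r))]
# Synthetic stream in which every token is equally frequent.
uniform = [f"tok{r}" for r in range(1, 51) for _ in range(10)]

slope_zipfian = zipf_slope(zipfian)
slope_uniform = zipf_slope(uniform)
```

Under this measure, a vocabulary size could be compared across tokenizer settings by how close the fitted slope is to $-1$, although the selection rule itself is not given in the abstract.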

Article 249

Title@2025-07-30 (3): A surrogate model for topology optimisation of elastic structures via parametric autoencoders

Title: A surrogate model for topology optimisation of elastic structures via parametric autoencoders

Ein Surrogatmodell zur Topologieoptimierung von elastischen Strukturen über parametrische Autoencoder

基于参数化自编码器的弹性结构拓扑优化代理模型 2507.22539v1

Authors (2): Matteo Giacomini, Antonio Huerta

A surrogate-based topology optimisation algorithm for linear elastic structures under parametric loads and boundary conditions is proposed. Instead of learning the parametric solution of the state (and adjoint) problems or the optimisation trajectory as a function of the iterations, the proposed approach devises a surrogate version of the entire optimisation pipeline. First, the method predicts a quasi-optimal topology for a given problem configuration as a surrogate model of high-fidelity topologies optimised with the homogenisation method. This is achieved by means of a feed-forward net that learns the mapping between the input parameters characterising the system setup and a latent space determined by encoder/decoder blocks, which reduce the dimensionality of the parametric topology optimisation problem and reconstruct a high-dimensional representation of the topology. Then, the predicted topology is used as an educated initial guess for a computationally efficient algorithm penalising the intermediate values of the design variable, while enforcing the governing equations of the system. This step allows the method to correct potential errors introduced by the surrogate model, eliminate artifacts, and refine the design in order to produce topologies consistent with the underlying physics. Different architectures are proposed and the approximation and generalisation capabilities of the resulting models are numerically evaluated. With the quasi-optimal topologies as initial guesses, the method outperforms the high-fidelity optimiser, reducing the average number of optimisation iterations by $53\%$ while keeping discrepancies below $4\%$ in the optimal value of the objective functional, even in the challenging scenario where the model extrapolates beyond the training and validation domain.

本文提出一种基于代理模型的拓扑优化算法,用于参数化载荷和边界条件下的线弹性结构。该方法不学习状态(及伴随)问题的参数化解,也不学习随迭代变化的优化轨迹,而是为整个优化流程构建代理版本。首先,该方法针对给定的问题配置预测准最优拓扑,作为用均匀化方法优化得到的高保真拓扑的代理模型。具体做法是用前馈网络学习从刻画系统设置的输入参数到潜空间的映射,该潜空间由编码器/解码器模块确定,这些模块降低参数化拓扑优化问题的维数并重建拓扑的高维表示。随后,预测的拓扑被用作一个计算高效算法的良好初始猜测,该算法惩罚设计变量的中间值,同时强制满足系统的控制方程。这一步使该方法能够纠正代理模型引入的潜在误差、消除伪影并细化设计,从而得到与底层物理一致的拓扑。本文提出了不同的架构,并对所得模型的逼近和泛化能力进行了数值评估。准最优拓扑使平均优化迭代次数减少$53\%$,从而优于高保真优化器,同时目标泛函最优值的偏差低于$4\%$,即使在模型外推到训练和验证域之外的困难情形下也是如此。

Article 250

Title@2025-07-30 (3): Accident-Driven Congestion Prediction and Simulation: An Explainable Framework Using Advanced Clustering and Bayesian Networks

Title: Accident-Driven Congestion Prediction and Simulation: An Explainable Framework Using Advanced Clustering and Bayesian Networks

Accident-Driven Congestion Prediction and Simulation: Ein erklärbares Framework mit Advanced Clustering und Bayesian Networks

事故驱动的拥堵预测与仿真:基于高级聚类和贝叶斯网络的可解释框架 2507.22529v1

Authors (3): Kranthi Kumar Talluri, Galia Weidl, Vaishnavi Kasuluru

Traffic congestion due to uncertainties, such as accidents, is a significant issue in urban areas, as the ripple effect of accidents causes longer delays, increased emissions, and safety concerns. To address this issue, we propose a robust framework for predicting the impact of accidents on congestion. We implement Automated Machine Learning (AutoML)-enhanced Deep Embedding Clustering (DEC) to assign congestion labels to accident data and predict congestion probability using a Bayesian Network (BN). The Simulation of Urban Mobility (SUMO) simulation is utilized to evaluate the correctness of BN predictions using evidence-based scenarios. Results demonstrate that the AutoML-enhanced DEC has outperformed traditional clustering approaches. The performance of the proposed BN model achieved an overall accuracy of 95.6%, indicating its ability to understand the complex relationship of accidents causing congestion. Validation in SUMO with evidence-based scenarios demonstrated that the BN model’s prediction of congestion states closely matches those of SUMO, indicating the high reliability of the proposed BN model in ensuring smooth urban mobility.

由于事故等不确定因素造成的交通拥堵是城市地区的一个重大问题,因为事故的连锁效应导致更长时间的延误、排放增加和安全关切。为了解决这一问题,我们提出了一个强有力的框架来预测事故对拥堵的影响。我们实施了自动机修养(Automle)增强的深嵌嵌入集群(DEC),以便用巴伊西亚网络(BN)为事故数据分配交通拥堵标签,预测拥堵概率。模拟城市流动(SUMO)用来评估基于证据的假设情景对BN预测的正确性。结果显示Automle增强的DEC已经超过了传统的集群方法。拟议的BN模型的性能总体精确度达到了95.6%,表明它有能力理解引起拥堵的事故的复杂关系。在SUMO中验证基于证据的情景表明,BN模型对拥堵状态的预测与SUMO的预测非常接近,表明拟议的BN模型在确保城市平稳的流动性方面非常可靠。

Article 251

Title@2025-07-30 (3): FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression

Title: FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression

FGFP: Ein zerbrechlicher Gaußfilter und Pruning für tiefe neurale Netzwerke Kompression

FGFP:用于深度神经网络压缩的分数阶高斯滤波器与剪枝 2507.22527v1

Authors (4): Kuan-Ting Tu, Po-Hsien Yu, Yu-Syuan Tseng, Shao-Yi Chien

Network compression techniques have become increasingly important in recent years because the loads of Deep Neural Networks (DNNs) are heavy for edge devices in real-world applications. While many methods compress neural network parameters, deploying these models on edge devices remains challenging. To address this, we propose the fractional Gaussian filter and pruning (FGFP) framework, which integrates fractional-order differential calculus and the Gaussian function to construct fractional Gaussian filters (FGFs). To reduce the computational complexity of fractional-order differential operations, we introduce Grünwald-Letnikov fractional derivatives to approximate the fractional-order differential equation. The number of parameters for each kernel in FGF is minimized to only seven. Beyond the architecture of Fractional Gaussian Filters, our FGFP framework also incorporates Adaptive Unstructured Pruning (AUP) to achieve higher compression ratios. Experiments on various architectures and benchmarks show that our FGFP framework outperforms recent methods in accuracy and compression. On CIFAR-10, ResNet-20 achieves only a 1.52% drop in accuracy while reducing the model size by 85.2%. On ImageNet2012, ResNet-50 achieves only a 1.63% drop in accuracy while reducing the model size by 69.1%.

近年来,由于深度神经网络(DNN)的负载对现实应用中的边缘设备而言过于沉重,网络压缩技术变得日益重要。虽然许多方法可以压缩神经网络参数,但在边缘设备上部署这些模型仍具挑战性。为此,我们提出分数阶高斯滤波器与剪枝(FGFP)框架,它结合分数阶微分运算和高斯函数来构造分数阶高斯滤波器(FGF)。为降低分数阶微分运算的计算复杂度,我们引入Grünwald-Letnikov分数阶导数来近似分数阶微分方程。FGF中每个卷积核的参数数量被减少到仅七个。除分数阶高斯滤波器的架构外,FGFP框架还引入自适应非结构化剪枝(AUP)以获得更高的压缩率。在多种架构和基准上的实验表明,FGFP框架在精度和压缩率上优于近期方法。在CIFAR-10上,ResNet-20的模型大小减少85.2%,精度仅下降1.52%。在ImageNet2012上,ResNet-50的模型大小减少69.1%,精度仅下降1.63%。
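
The Grünwald-Letnikov approximation mentioned in the abstract replaces a fractional derivative of order $\alpha$ with a weighted sum of past function values, $D^{\alpha} f(x) \approx h^{-\alpha}\sum_{k} w_k f(x - kh)$ with $w_k = (-1)^k\binom{\alpha}{k}$. The sketch below shows that standard formula only; how the authors turn it into a seven-parameter filter kernel is not described in the abstract, and the function names are hypothetical.

```python
def gl_weights(alpha, n_terms):
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k), via recursion."""
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights

def gl_derivative(f, x, alpha, h=1e-3, n_terms=1000):
    """Truncated Grünwald-Letnikov estimate of the order-alpha derivative at x."""
    weights = gl_weights(alpha, n_terms)
    total = sum(w * f(x - k * h) for k, w in enumerate(weights))
    return total / h ** alpha

def square(t):
    return t * t

# For alpha = 1 the weights collapse to a backward difference (1, -1, 0, ...),
# so the estimate should be close to the ordinary derivative 2x.
first_order = gl_derivative(square, 1.0, alpha=1.0)
half_weights = gl_weights(0.5, 3)
```

For non-integer $\alpha$ the weights decay slowly, which is why truncating the sum to a few terms is a natural way to keep a filter small.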

Article 252

Title@2025-07-30 (3): HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data

Title: HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data

HGCN(O): Ein selbsttätiges GCN-Hypermodell-Toolkit zur Vorhersage der Ergebnisse in Ereignis-Sequenzdaten

HGCN(O):关于事件序列数据结果预测的自发GCN超模工具箱 2507.22524v1

Authors (3): Fang Wang, Paolo Ceravolo, Ernesto Damiani

We propose HGCN(O), a self-tuning toolkit using Graph Convolutional Network (GCN) models for event sequence prediction. Featuring four GCN architectures (O-GCN, T-GCN, TP-GCN, TE-GCN) across the GCNConv and GraphConv layers, our toolkit integrates multiple graph representations of event sequences with different choices of node- and graph-level attributes and temporal dependencies encoded via edge weights, optimising prediction accuracy and stability for balanced and unbalanced datasets. Extensive experiments show that GCNConv models excel on unbalanced data, while all models perform consistently on balanced data. Experiments also confirm the superior performance of HGCN(O) over traditional approaches. Applications include Predictive Business Process Monitoring (PBPM), which predicts future events or states of a business process based on event logs.

我们提议采用图变网络模型进行事件序列预测的自调工具包HGCN(O),这是用于事件序列预测的自调工具包,在GCN、T-GCN、TP-GCN、TE-GCN横跨GCN和Greaph Conv两层的4个GCN结构(O-GCN、T-GCN、TP-GCN、TE-GCN)中,我们的工具包将多图式事件序列的表示方式与不同的节点和图级属性选择以及以边缘重量为根据的时间依赖性,优化均衡和不平衡数据集的预测准确性和稳定性。广泛的实验表明,GCNCon模式优于不平衡的数据,而所有模型都一致使用平衡的数据。实验还证实了HGCN(O)优于传统方法,应用包括预测业务流程监测(PBPPM),根据事件日志预测未来事件或业务流程状况。

Article 253

Title@2025-07-30 (3): Rethinking Individual Fairness in Deepfake Detection

Title: Rethinking Individual Fairness in Deepfake Detection

Individuelle Fairness in Deepfake Detection neu denken

重新思考个人在深假探测中的公平性 2507.14326v2

Authors (4): Aryana Hou, Li Lin, Justin Li, Shu Hu

Generative AI models have substantially improved the realism of synthetic media, yet their misuse through sophisticated DeepFakes poses significant risks. Despite recent advances in deepfake detection, fairness remains inadequately addressed, enabling deepfake markers to exploit biases against specific populations. While previous studies have emphasized group-level fairness, individual fairness (i.e., ensuring similar predictions for similar individuals) remains largely unexplored. In this work, we identify for the first time that the original principle of individual fairness fundamentally fails in the context of deepfake detection, revealing a critical gap previously unexplored in the literature. To mitigate it, we propose the first generalizable framework that can be integrated into existing deepfake detectors to enhance individual fairness and generalization. Extensive experiments conducted on leading deepfake datasets demonstrate that our approach significantly improves individual fairness while maintaining robust detection performance, outperforming state-of-the-art methods. The code is available at https://github.com/Purdue-M2/Individual-Fairness-Deepfake-Detection.

生成式AI模型大幅提升了合成媒体的真实感,但其通过精密的DeepFake被滥用带来了重大风险。尽管最近在深假检测方面有所进展,但公平性仍未得到充分处理,从而使得深假标记能够利用对特定人群的偏见。虽然以往的研究强调群体一级的公平性,但个人公平(即确保类似个人的类似预测)在很大程度上仍未得到探讨。在这项工作中,我们第一次发现,个人公平原则在深假检测方面根本失败,揭示了文献中以前尚未探讨的关键空白。为了缓解这一问题,我们提出了第一个可纳入现有深假检测器的可泛化框架,以加强个人公平和泛化能力。在主要深假数据集上的广泛实验表明,我们的方法在保持强健的检测性能的同时,大大改进了个人公平性,并优于最先进的方法。代码可在 https://github.com/Purdue-M2/Individual-Fairness-Deepfake-Detection 上查阅。

Article 254

Title@2025-07-30 (3): SmilesT5: Domain-specific pretraining for molecular language models

Title: SmilesT5: Domain-specific pretraining for molecular language models

SmilesT5: Domainspezifische Vorausbildung für molekulare Sprachmodelle

SmilesT5:面向分子语言模型的领域特定预训练 2507.22514v1

Authors (3): Philip Spence, Brooks Paige, Anne Osbourn

Molecular property prediction is an increasingly critical task within drug discovery and development. Typically, neural networks can learn molecular properties using graph-based, language-based or feature-based methods. Recent advances in natural language processing have highlighted the capabilities of neural networks to learn complex human language using masked language modelling. These approaches to training large transformer-based deep learning models have also been used to learn the language of molecules, as represented by simplified molecular-input line-entry system (SMILES) strings. Here, we present novel domain-specific text-to-text pretraining tasks that yield improved performance in six classification-based molecular property prediction benchmarks, relative to both traditional likelihood-based training and previously proposed fine-tuning tasks. Through ablation studies, we show that data and computational efficiency can be improved by using these domain-specific pretraining tasks. Finally, the pretrained embeddings from the model can be used as fixed inputs into a downstream machine learning classifier and yield comparable performance to finetuning but with much lower computational overhead.

分子财产预测是药物发现和开发中一项越来越关键的任务。一般来说,神经网络可以使用基于图表、语言或基于特征的方法来学习分子特性。自然语言处理方面的最近进展突出了神经网络利用隐形语言模型学习复杂的人类语言的能力。这些培训大型变压器深层学习模型的方法也被用来学习分子的语言,如简化分子-投入线输入系统(SMILES)的字符串。在这里,我们介绍了新的特定域文本到文字的培训前任务,这些任务在六个基于分类的分子财产预测基准方面,相对于传统的基于可能性的培训和以前提议的微调任务,都提高了绩效。我们通过通货膨胀研究,表明通过使用这些特定领域的训练前任务,数据和计算效率可以提高。最后,模型中预先训练的嵌入可用作下游机器学习分类的固定投入,并产生可比的性能,以微调但低得多的计算间接费用。

Article 255

Title@2025-07-30 (3): AlphaDent: A dataset for automated tooth pathology detection

Title: AlphaDent: A dataset for automated tooth pathology detection

AlphaDent: Ein Datensatz für automatisierte Zahnpathologie-Erkennung

AlphaDent:用于自动检测牙齿病理学的数据集 2507.22512v1

Authors (8): Evgeniy I. Sosnin, Yuriy L. Vasilev, Roman A. Solovyev, Aleksandr L. Stempkovskiy, Dmitry V. Telpukhov, Artem A. Vasilev, Aleksandr A. Amerikanov, Aleksandr Y. Romanov

In this article, we present AlphaDent, a new dataset for dental research. The dataset is based on DSLR camera photographs of the teeth of 295 patients and contains over 1200 images. It is labeled for instance segmentation and is divided into 9 classes. The article provides a detailed description of the dataset and the labeling format. The article also provides the details of the experiment on neural network training for the instance segmentation problem using this dataset. The results obtained show high quality of predictions. The dataset is published under an open license, and the training/inference code and model weights are also available under open licenses.

在本文中,我们提出一个用于牙科研究的新数据集AlphaDent。该数据集基于295名患者牙齿的单反(DSLR)相机照片,包含1200多张图像。数据集针对实例分割问题进行了标注,分为9个类别。本文详细介绍了数据集和标注格式,并给出了使用该数据集针对实例分割问题训练神经网络的实验细节。所得结果显示预测质量很高。数据集以开放许可发布,训练/推理代码和模型权重也以开放许可提供。

Article 256

Title@2025-07-30 (3): Geometry of nonlinear forecast reconciliation

Title: Geometry of nonlinear forecast reconciliation

Geometrie der nichtlinearen Vorhersageabgleichung

非线性预测对账的几何测量 2507.22500v1

Authors (3): Lorenzo Nespoli, Anubhab Biswas, Vasco Medici

Forecast reconciliation, an ex-post technique applied to forecasts that must satisfy constraints, has been a prominent topic in the forecasting literature over the past two decades. Recently, several efforts have sought to extend reconciliation methods to probabilistic settings. Nevertheless, formal theorems demonstrating error reduction in nonlinear contexts, analogous to those presented in Panagiotelis et al. (2021), are still lacking. This paper addresses that gap by establishing such theorems for various classes of nonlinear hypersurfaces and vector-valued functions. Specifically, we derive an exact analog of Theorem 3.1 from Panagiotelis et al. (2021) for hypersurfaces with constant-sign curvature. Additionally, we provide probabilistic guarantees for the broader case of hypersurfaces with non-constant-sign curvature and for general vector-valued functions. To support reproducibility and practical adoption, we will release, upon publication, a JAX-based Python package implementing the presented theorems and reconciliation procedures.

预测协调是一种应用于必须满足约束条件的预测的事后技术,过去二十年来一直是预测文献中的突出主题。最近,一些工作试图将协调方法推广到概率情形。然而,类似于Panagiotelis等人(2021)所给出的、证明在非线性情形下误差减小的形式化定理仍然缺失。本文通过为多类非线性超曲面和向量值函数建立此类定理来填补这一空白。具体而言,我们针对曲率符号恒定的超曲面,导出了Panagiotelis等人(2021)定理3.1的精确对应结果。此外,我们为曲率符号不恒定的超曲面这一更广泛的情形以及一般向量值函数提供了概率保证。为了支持可复现性和实际采用,我们将在论文发表时发布一个基于JAX的Python软件包,实现所提出的定理和协调程序。
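
The error-reduction property that the paper generalizes is easiest to see in the linear case, which is the setting of the earlier result it cites. The sketch below is that linear special case only, with made-up numbers: a three-series hierarchy with the constraint total = a + b, reconciled by orthogonal projection onto the constraint plane. It is not the nonlinear procedure or the JAX package mentioned in the abstract.

```python
import math

def reconcile(forecast, normal):
    """Orthogonally project `forecast` onto the hyperplane {y : normal . y = 0}."""
    dot = sum(n * y for n, y in zip(normal, forecast))
    norm_sq = sum(n * n for n in normal)
    return [y - dot / norm_sq * n for n, y in zip(normal, forecast)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Series order: (total, a, b); coherence means total - a - b = 0.
normal = [1.0, -1.0, -1.0]
base = [10.0, 4.0, 5.0]    # incoherent base forecast: 4 + 5 != 10
truth = [9.0, 4.0, 5.0]    # hypothetical realized values, coherent by construction

reconciled = reconcile(base, normal)
error_before = distance(base, truth)
error_after = distance(reconciled, truth)
```

Because the truth lies on the constraint plane, projecting onto that plane cannot move the forecast farther from it; the paper's contribution, per the abstract, is establishing when analogous guarantees survive on curved constraint surfaces.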

Article 257

Title@2025-07-30 (3): LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning

Title: LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning

LoReUn: Daten selbst liefern implizit Hinweise zur Verbesserung des maschinellen Verlernens

LoReUn:数据本身隐式地提供了改进机器遗忘的线索 2507.22499v1

Authors (4): Xiang Li, Qianli Shen, Haonan Wang, Kenji Kawaguchi

Recent generative models face significant risks of producing harmful content, which has underscored the importance of machine unlearning (MU) as a critical technique for eliminating the influence of undesired data. However, existing MU methods typically assign the same weight to all data to be forgotten, which makes it difficult to effectively forget data that is harder to unlearn than the rest. In this paper, we empirically demonstrate that the loss on a data point can implicitly reflect how difficult that point is to unlearn. Building on this insight, we introduce Loss-based Reweighting Unlearning (LoReUn), a simple yet effective plug-and-play strategy that dynamically reweights data during the unlearning process with minimal additional computational overhead. Our approach significantly reduces the gap between existing MU methods and exact unlearning in both image classification and generation tasks, effectively enhancing the prevention of harmful content generation in text-to-image diffusion models.

最近的基因模型面临产生有害内容的巨大风险,这突出表明了机器不学习作为消除不理想数据影响的关键技术的重要性,然而,现有的MU方法通常对有待遗忘的所有数据赋予同等的份量,这使得难以有效地忘记某些比其他数据更难解开的数据。在本文件中,我们从经验上证明,数据丢失本身可以隐含地反映其不同的困难。我们根据这一认识,引入了基于损失的重新加权不学习(LoReun),这是一种简单而有效的插件和玩耍战略,在未学习过程中动态地对数据进行重新加权,并尽可能增加计算间接费用。我们的方法大大缩小了现有MU方法与在图像分类和生成任务中准确的不学习之间的差距,有效地加强了在文本到图像传播模型中有害内容生成的预防工作。
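One way to picture loss-based reweighting of a forget set is sketched below. The abstract does not state the weighting rule, so everything here is an assumption made for illustration: the softmax-over-negative-loss form, the `temperature` parameter, and the choice to upweight low-loss samples (treating samples the model still fits well as the harder ones to forget).

```python
import math


def loss_based_weights(losses, temperature=1.0):
    """Hypothetical rule: softmax over negative losses, rescaled to mean 1."""
    smallest = min(losses)
    scores = [math.exp(-(loss - smallest) / temperature) for loss in losses]
    total = sum(scores)
    return [len(losses) * s / total for s in scores]


def reweighted_forget_objective(losses, temperature=1.0):
    """Weighted mean of per-sample forget losses (weights treated as constants)."""
    weights = loss_based_weights(losses, temperature)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


forget_losses = [0.05, 0.40, 2.30]   # made-up per-sample losses on the forget set
weights = loss_based_weights(forget_losses)
objective = reweighted_forget_objective(forget_losses)
```

Because the weights are recomputed from the current losses at every step, they change as unlearning progresses, which is the "dynamic" aspect the abstract describes. Flipping the sign inside the exponential would give the opposite emphasis.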

Article 258

Title@2025-07-30 (3): Reconstructing Historical Climate Fields With Deep Learning

Title: Reconstructing Historical Climate Fields With Deep Learning

Rekonstruieren von historischen Klimafeldern mit tiefem Lernen

重建历史气候领域与深学习 2311.18348v2

Authors (4): Nils Bochow, Anna Poltronieri, Martin Rypdal, Niklas Boers

Historical records of climate fields are often sparse due to missing measurements, especially before the introduction of large-scale satellite missions. Several statistical and model-based methods have been introduced to fill gaps and reconstruct historical records. Here, we employ a recently introduced deep-learning approach based on Fourier convolutions, trained on numerical climate model output, to reconstruct historical climate fields. Using this approach, we are able to realistically reconstruct large and irregular areas of missing data, as well as reconstruct known historical events such as strong El Niño and La Niña with very little given information. Our method outperforms the widely used statistical kriging method as well as other recent machine learning approaches. The model generalizes to higher resolutions than the ones it was trained on and can be used on a variety of climate fields. Moreover, it allows inpainting of masks never seen before during the model training.

气候领域的历史记录往往由于缺少测量而少之又少,特别是在采用大规模卫星飞行任务之前。我们采用了几种统计和模型方法来填补空白并重建历史记录。在这里,我们采用了最近采用的基于Fourier Convolutions的深层次学习方法,对数字气候模型输出进行了培训,对历史气候领域进行了重建。我们利用这一方法能够现实地重建大量和不规则的缺失数据领域,并以很少提供的信息来重建强大的El Nino和La Nina等已知的历史事件。我们的方法超过了广泛使用的统计克里金方法以及其他最近的机器学习方法。该模型概括了比它所培训的更高分辨率,并且可以用于各种气候领域。此外,它还允许在模型培训期间对以前从未见过的面具进行油漆。

Article 259

Title@2025-07-30 (3): Emergence of Quantised Representations Isolated to Anisotropic Functions

Title: Emergence of Quantised Representations Isolated to Anisotropic Functions

Entstehung quantifizierter Repräsentationen isoliert mit anisotropen Funktionen

量化表征的涌现仅见于各向异性函数 2507.12070v2

Authors (1): George Bird

This paper presents a novel methodology for determining representational alignment, which builds upon the existing Spotlight Resonance method. In particular, this new tool is used to gain insight into how discrete representations can emerge and organise in autoencoder models, through a controlled ablation study in which only the activation function is altered. Using this technique, it is determined whether function-driven symmetries can act as implicit inductive biases on representations. Representations are found to tend to discretise when the activation functions are defined through a discrete algebraic permutation-equivariant symmetry. In contrast, they remain continuous under a continuous algebraic orthogonal-equivariant definition. This confirms the hypothesis that algebraic symmetries of network primitives can carry unintended inductive biases which produce task-independent artefactual structures in representations. The discrete symmetry of contemporary forms is shown to be a strong predictor for the induction of discrete representations transformed from otherwise continuous structures – a quantisation effect. This motivates further reassessment of functional forms in common usage. Moreover, this supports a general causal model for one mode in which discrete representations may form, and could constitute a prerequisite for downstream interpretability phenomena, including grandmother neurons, discrete coding schemes, general linear features and possibly Superposition. Hence, this tool and proposed mechanism for the influence of functional form on representations may provide insights into emergent interpretability research. Finally, preliminary results indicate that quantisation of representations appears to correlate with a measurable increase in reconstruction error, reinforcing previous conjectures that this collapse can be detrimental.

本文提出了一种用于确定表征对齐的新方法,该方法建立在现有的聚光灯共振(Spotlight Resonance)方法之上。具体而言,这一新工具通过一项仅改变激活函数的受控消融研究,用于深入了解离散表征如何在自编码器模型中出现和组织。利用该技术,我们检验了函数驱动的对称性能否作为表征上的隐式归纳偏置。结果发现,当激活函数通过离散的代数置换等变对称性定义时,表征倾向于离散化;相反,在连续的代数正交等变定义下,表征保持连续。这证实了如下假设:网络基本单元的代数对称性可能带来非预期的归纳偏置,从而在表征中产生与任务无关的人为结构。当前常用函数形式的离散对称性被证明是一个强预测因子,可预示原本连续的结构被转变为离散表征,即一种量化效应。这促使人们进一步重新评估常用的函数形式。此外,这也支持了离散表征形成的一种模式的一般因果模型,并可能构成下游可解释性现象的前提,包括祖母神经元、离散编码方案、一般线性特征以及可能的叠加(Superposition)。因此,该工具以及所提出的函数形式影响表征的机制,可能为新兴的可解释性研究提供见解。最后,初步结果表明,表征的量化似乎与重构误差的可测量增加相关,这印证了先前关于这种坍缩可能有害的猜想。

Article 260

Title@2025-07-30 (3): Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation

Title: Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation

Eigentumsverifizierung von DNN-Modellen mit White-Box-Adversarial-Angriffen mit spezifizierter Wahrscheinlichkeitsmanipulation

使用具有指定概率操纵的白盒对抗攻击进行DNN模型所有权验证 2505.17579v3

Authors (5): Teruki Sano, Minoru Kuribayashi, Masao Sakai, Shuji Isobe, Eisuke Koizumi

In this paper, we propose a novel framework for ownership verification of deep neural network (DNN) models for image classification tasks. It allows both the rightful owner and a third party to verify model identity without presenting the original model. We assume a gray-box scenario in which an unauthorized user owns a model illegally copied from the original and provides services in a cloud environment, and a user of the service submits images and receives the classification results as a probability distribution over output classes. The framework applies a white-box adversarial attack to align the output probability of a specific class with a designated value. Because the owner knows the original model, the owner can generate such adversarial examples. We propose a simple but effective adversarial attack method based on the iterative Fast Gradient Sign Method (FGSM) by introducing control parameters. Experimental results confirm the effectiveness of identifying DNN models using adversarial attacks.

在本文中,我们提出了一个用于图像分类任务的深神经网络模型(DNN)所有权核实的新框架。它允许合法所有人和第三方核查模型身份,而无需展示原始模型。我们假设了一个灰色框情景,即未经授权的用户拥有一种从原始模型中非法复制的模型,在云层环境中提供服务,用户投掷图像并接收分类结果,作为输出等级的概率分布。框架应用白箱对抗性攻击来将特定类别输出概率与指定值相匹配。由于对原始模型的了解,它使所有者能够生成这种对抗性实例。我们提出一种简单而有效的对抗性攻击方法,其依据是迭代快速重力信号方法(FGSM), 其方法是引入控制参数。实验结果证实了使用对抗性攻击确定 DNN模型的有效性。
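The idea of steering one class probability to a designated value with iterative sign-gradient steps can be sketched on a toy model. The sketch assumes a small linear softmax classifier in place of a DNN, a squared-gap objective `(p_cls - target)^2`, and hypothetical control parameters `alpha`, `tol`, and `max_iters`; the paper's actual parameters and loss are not given in the abstract.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, x):
    """Class probabilities of a linear softmax model (stand-in for a DNN)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bk
                    for row, bk in zip(W, b)])


def sign(v):
    return (v > 0) - (v < 0)


def align_probability(W, b, x, cls, target, alpha=0.01, tol=0.005, max_iters=500):
    """Iterative FGSM-style steps that push p[cls] toward `target`."""
    x = list(x)
    for _ in range(max_iters):
        p = predict(W, b, x)
        gap = p[cls] - target
        if abs(gap) < tol:
            break
        # d p_cls / d z_k = p_cls * (1[k == cls] - p_k); chain through W for d/dx.
        dp_dz = [p[cls] * ((k == cls) - p[k]) for k in range(len(p))]
        grad = [2.0 * gap * sum(dp_dz[k] * W[k][j] for k in range(len(W)))
                for j in range(len(x))]
        x = [xi - alpha * sign(g) for xi, g in zip(x, grad)]
    return x, predict(W, b, x)[cls]


W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x_adv, p_adv = align_probability(W, b, [0.0, 0.0], cls=0, target=0.8)
```

Only someone with white-box access to the weights can compute `grad`, which is why the ability to produce an input that hits the designated probability on a suspect service is evidence about which model is being served.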

Article 261

Title@2025-07-30 (3): Probing Information Distribution in Transformer Architectures through Entropy Analysis

Title: Probing Information Distribution in Transformer Architectures through Entropy Analysis

Probing Information Distribution in Transformer-Architekturen durch Entropie-Analyse

通过熵分析探究Transformer架构中的信息分布 2507.15347v2

Authors (5): Amedeo Buonanno, Alessandro Rivetti, Francesco A. N. Palmieri, Giovanni Di Gennaro, Gianmarco Romano

This work explores entropy analysis as a tool for probing information distribution within Transformer-based architectures. By quantifying token-level uncertainty and examining entropy patterns across different stages of processing, we aim to investigate how information is managed and transformed within these models. As a case study, we apply the methodology to a GPT-based large language model, illustrating its potential to reveal insights into model behavior and internal representations. This approach may also contribute to the development of interpretability and evaluation frameworks for transformer-based models.

这项工作探索将熵分析作为探究基于Transformer的架构内部信息分布的工具。通过量化词元级不确定性并考察不同处理阶段的熵模式,我们旨在研究信息在这些模型中如何被管理和转换。作为案例研究,我们将该方法应用于一个基于GPT的大语言模型,说明其在揭示模型行为和内部表征方面的潜力。这一方法可以提供对模型行为的洞察,并有助于为基于Transformer的模型开发可解释性和评估框架。
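Token-level uncertainty of the kind the abstract refers to is commonly measured as the Shannon entropy of a next-token distribution. The sketch below computes it from raw logits and tracks it across stages; the four-token vocabulary, the per-stage logits, and the function names are made up for illustration and do not come from the paper.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy, in bits, of the distribution implied by `logits`."""
    return -sum(p * math.log2(p) for p in softmax(logits) if p > 0.0)


def entropy_profile(logits_per_stage):
    """Entropy at each processing stage for a single token position."""
    return [token_entropy(logits) for logits in logits_per_stage]


# Hypothetical logits over a 4-token vocabulary at three successive stages.
stages = [
    [0.0, 0.0, 0.0, 0.0],    # uniform: maximal uncertainty, 2 bits
    [2.0, 0.0, 0.0, 0.0],    # mild preference for token 0
    [10.0, 0.0, 0.0, 0.0],   # near-certain prediction
]
profile = entropy_profile(stages)
```

A falling profile like this one would indicate that the distribution sharpens as processing proceeds. How per-stage logits are obtained from a real model is left open here.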

Article 262

Title@2025-07-30 (3): LVM-GP: Uncertainty-Aware PDE Solver via coupling latent variable model and Gaussian process

Title: LVM-GP: Uncertainty-Aware PDE Solver via coupling latent variable model and Gaussian process

LVM-GP: Unsicherer PDE Solver über Kopplung latent variables Modell und Gaußschen Prozess

LVM-GP:通过耦合潜变量模型与高斯过程的不确定性感知PDE求解器 2507.22493v1

Authors (6): Xiaodong Feng, Ling Guo, Xiaoliang Wan, Hao Wu, Tao Zhou, Wenwen Zhou

We propose a novel probabilistic framework, termed LVM-GP, for uncertainty quantification in solving forward and inverse partial differential equations (PDEs) with noisy data. The core idea is to construct a stochastic mapping from the input to a high-dimensional latent representation, enabling uncertainty-aware prediction of the solution. Specifically, the architecture consists of a confidence-aware encoder and a probabilistic decoder. The encoder implements a high-dimensional latent variable model based on a Gaussian process (LVM-GP), where the latent representation is constructed by interpolating between a learnable deterministic feature and a Gaussian process prior, with the interpolation strength adaptively controlled by a confidence function learned from data. The decoder defines a conditional Gaussian distribution over the solution field, where the mean is predicted by a neural operator applied to the latent representation, allowing the model to learn flexible function-to-function mapping. Moreover, physical laws are enforced as soft constraints in the loss function to ensure consistency with the underlying PDE structure. Compared to existing approaches such as Bayesian physics-informed neural networks (B-PINNs) and deep ensembles, the proposed framework can efficiently capture functional dependencies via merging a latent Gaussian process and neural operator, resulting in competitive predictive accuracy and robust uncertainty quantification. Numerical experiments demonstrate the effectiveness and reliability of the method.

我们提出了一个新的概率框架LVM-GP,用于在含噪数据下求解正问题和反问题偏微分方程(PDE)时进行不确定性量化。其核心思想是构建从输入到高维潜在表示的随机映射,从而实现对解的不确定性感知预测。具体而言,该架构由一个置信度感知编码器和一个概率解码器组成。编码器实现了基于高斯过程的高维潜变量模型(LVM-GP),其中潜在表示通过在可学习的确定性特征与高斯过程先验之间插值来构建,插值强度由从数据中学习的置信函数自适应控制。解码器在解场上定义条件高斯分布,其均值由作用于潜在表示的神经算子预测,使模型能够学习灵活的函数到函数映射。此外,物理定律作为软约束加入损失函数,以确保与底层PDE结构一致。与贝叶斯物理信息神经网络(B-PINNs)和深度集成等现有方法相比,所提框架通过融合潜在高斯过程与神经算子,能够高效地捕捉函数依赖关系,从而获得有竞争力的预测精度和稳健的不确定性量化。数值实验证明了该方法的有效性和可靠性。

Article 263

Title@2025-07-30 (3): Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data

Title: Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data

Proto-EVFL: Verbessertes vertikales Federated Learning über Dual Prototype mit extrem ungebundenen Daten

Proto-EVFL:通过具有极不对齐数据的双重原型增强纵向联邦学习 2507.22488v1

Authors (10): Wei Guo, Yiyang Duan, Zhaojun Hu, Yiqi Tong, Fuzhen Zhuang, Xiao Zhang, Jin Dong, Ruofan Wu, Tengfei Liu, Yifan Sun

In vertical federated learning (VFL), multiple enterprises address aligned sample scarcity by leveraging massive locally unaligned samples to facilitate collaborative learning. However, unaligned samples across different parties in VFL can be extremely class-imbalanced, leading to insufficient feature representation and limited model prediction space. Specifically, class-imbalanced problems consist of intra-party class imbalance and inter-party class imbalance, which can further cause local model bias and feature contribution inconsistency issues, respectively. To address the above challenges, we propose Proto-EVFL, an enhanced VFL framework via dual prototypes. We first introduce class prototypes for each party to learn relationships between classes in the latent space, allowing the active party to predict unseen classes. We further design a probabilistic dual prototype learning scheme to dynamically select unaligned samples by conditional optimal transport cost with class prior probability. Moreover, a mixed prior guided module guides this selection process by combining local and global class prior probabilities. Finally, we adopt an adaptive gated feature aggregation strategy to mitigate feature contribution inconsistency by dynamically weighting and aggregating local features across different parties. We prove that Proto-EVFL, as the first bi-level optimization framework in VFL, has a convergence rate of $1/\sqrt{T}$. Extensive experiments on various datasets validate the superiority of our Proto-EVFL. Even in a zero-shot scenario with one unseen class, it outperforms baselines by at least 6.97%.

在纵向联邦学习(VFL)中,多个企业通过利用大量本地未对齐样本来促进协作学习,以应对对齐样本稀缺的问题。然而,VFL中各参与方的未对齐样本可能存在极端的类别不平衡,导致特征表示不足和模型预测空间受限。具体而言,类别不平衡问题包括参与方内部的类别不平衡和参与方之间的类别不平衡,二者分别会进一步导致本地模型偏差和特征贡献不一致问题。为应对上述挑战,我们提出了Proto-EVFL,一个基于双重原型的增强型VFL框架。我们首先为每个参与方引入类别原型,以在潜在空间中学习类别之间的关系,使主动方能够预测未见过的类别。我们进一步设计了一种概率双重原型学习方案,利用带类别先验概率的条件最优传输代价动态选择未对齐样本。此外,一个混合先验引导模块通过结合本地和全局类别先验概率来指导这一选择过程。最后,我们采用自适应门控特征聚合策略,通过对各参与方的本地特征进行动态加权和聚合来缓解特征贡献不一致问题。我们证明了Proto-EVFL作为VFL中首个双层优化框架,其收敛速率为 $1/\sqrt{T}$。在多个数据集上的大量实验验证了Proto-EVFL的优越性。即使在含有一个未见类别的零样本场景中,它也比基线方法至少高出6.97%。

Article 264

Title@2025-07-30 (3): Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence

Title: Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence

Konvergenzeigenschaften der natürlichen Gradientenablassung zur Minimierung der KL-Divergenz

最小化 KL 差异的自然渐变源的趋同属性 2504.19259v2

Authors (2): Adwait Datar, Nihat Ay

The Kullback-Leibler (KL) divergence plays a central role in probabilistic machine learning, where it commonly serves as the canonical loss function. Optimization in such settings is often performed over the probability simplex, where the choice of parameterization significantly impacts convergence. In this work, we study the problem of minimizing the KL divergence and analyze the behavior of gradient-based optimization algorithms under two dual coordinate systems within the framework of information geometry: the exponential family ($\theta$ coordinates) and the mixture family ($\eta$ coordinates). We compare Euclidean gradient descent (GD) in these coordinates with the coordinate-invariant natural gradient descent (NGD), where the natural gradient is a Riemannian gradient that incorporates the intrinsic geometry of the underlying statistical model. In continuous time, we prove that the convergence rates of GD in the $\theta$ and $\eta$ coordinates provide lower and upper bounds, respectively, on the convergence rate of NGD. Moreover, under affine reparameterizations of the dual coordinates, the convergence rates of GD in $\eta$ and $\theta$ coordinates can be scaled to $2c$ and $\frac{2}{c}$, respectively, for any $c>0$, while NGD maintains a fixed convergence rate of $2$, remaining invariant to such transformations and sandwiched between them. Although this suggests that NGD may not exhibit uniformly superior convergence in continuous time, we demonstrate that its advantages become pronounced in discrete time, where it achieves faster convergence and greater robustness to noise, outperforming GD. Our analysis hinges on bounding the spectrum and condition number of the Hessian of the KL divergence at the optimum, which coincides with the Fisher information matrix.

Kullback-Leibler(KL)散度在概率机器学习中起着核心作用,通常作为标准的损失函数。此类设定下的优化往往在概率单纯形上进行,其中参数化方式的选择会显著影响收敛性。在这项工作中,我们研究最小化KL散度的问题,并在信息几何框架下分析基于梯度的优化算法在两种对偶坐标系下的行为:指数族($\theta$ 坐标)和混合族($\eta$ 坐标)。我们将这些坐标下的欧氏梯度下降(GD)与坐标不变的自然梯度下降(NGD)进行比较,其中自然梯度是一种融入了底层统计模型内在几何的黎曼梯度。在连续时间下,我们证明GD在 $\theta$ 和 $\eta$ 坐标下的收敛速率分别给出了NGD收敛速率的下界和上界。此外,在对偶坐标的仿射重参数化下,GD在 $\eta$ 和 $\theta$ 坐标下的收敛速率可分别被缩放为 $2c$ 和 $\frac{2}{c}$(对任意 $c>0$),而NGD保持固定的收敛速率 $2$,对此类变换保持不变,并夹在二者之间。尽管这表明NGD在连续时间下未必表现出一致更优的收敛性,但我们证明其优势在离散时间下变得明显:它收敛更快,对噪声更稳健,优于GD。我们的分析依赖于对KL散度在最优点处的Hessian矩阵(与Fisher信息矩阵一致)的谱和条件数进行界定。
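A one-parameter example makes the discrete-time comparison concrete. The sketch below minimizes KL(target || p_theta) for a Bernoulli model in its natural ($\theta$) coordinate, once with plain gradient descent and once with the gradient preconditioned by the Fisher information. The direction of the KL divergence, the target, the step size, and the step count are assumptions chosen for illustration, not settings from the paper.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def minimise_kl(target, theta0, lr, steps, natural):
    """Gradient descent on theta, optionally preconditioned by the Fisher information."""
    theta = theta0
    for _ in range(steps):
        eta = sigmoid(theta)               # mean (eta) coordinate of the model
        grad = eta - target                # d/dtheta of KL(target || p_theta)
        if natural:
            grad /= eta * (1.0 - eta)      # Fisher information of a Bernoulli in theta
        theta -= lr * grad
    return kl_bernoulli(target, sigmoid(theta))


kl_gd = minimise_kl(0.9, 0.0, 0.5, 20, natural=False)
kl_ngd = minimise_kl(0.9, 0.0, 0.5, 20, natural=True)
```

In this toy setting the natural-gradient run ends much closer to the target than the Euclidean run, because near the optimum its error contracts by roughly `1 - lr` per step regardless of how small the Fisher information is. This is only one instance and says nothing about the general bounds proved in the paper.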

Article 265

Title@2025-07-30 (3): The Ball-Proximal (=“Broximal”) Point Method: a New Algorithm, Convergence Theory, and Applications

Title: The Ball-Proximal (=“Broximal”) Point Method: a New Algorithm, Convergence Theory, and Applications

Die Kugel-Proximal (=“Broximal”) Punktmethode: ein neuer Algorithmus, Konvergenztheorie und Anwendungen

球近端(=“Broximal”)点法:新算法、收敛理论与应用 2502.02002v2

Authors (4): Kaja Gruntkowska, Hanmin Li, Aadi Rane, Peter Richtárik

Non-smooth and non-convex global optimization poses significant challenges across various applications, where standard gradient-based methods often struggle. We propose the Ball-Proximal Point Method, also called the Broximal Point Method or, for short, the Ball Point Method (BPM), a novel algorithmic framework inspired by the classical Proximal Point Method (PPM) (Rockafellar, 1976), which, as we show, sheds new light on several foundational optimization paradigms and phenomena, including non-convex and non-smooth optimization, acceleration, smoothing, adaptive stepsize selection, and trust-region methods. At the core of BPM lies the ball-proximal (“broximal”) operator, which arises from the classical proximal operator by replacing the quadratic distance penalty with a ball constraint. Surprisingly, and in sharp contrast with the sublinear rate of PPM in the nonsmooth convex regime, we prove that BPM converges linearly and in a finite number of steps in the same regime. Furthermore, by introducing the concept of ball-convexity, we prove that BPM retains the same global convergence guarantees under weaker assumptions, making it a powerful tool for a broader class of potentially non-convex optimization problems. Just as PPM serves as a conceptual method inspiring the development of practically efficient algorithms and algorithmic elements, e.g., gradient descent, adaptive step sizes, acceleration (Ahn & Sra, 2020), and the “W” in AdamW (Zhuang et al., 2022), we believe that BPM should be understood in the same manner: as a blueprint and inspiration for further development.

非光滑、非凸的全局优化在各类应用中带来了重大挑战,标准的基于梯度的方法往往难以应对。我们提出球近端点法(Ball-Proximal Point Method),又称Broximal Point Method,简称球点法(BPM)。这是一个受经典近端点法(PPM)(Rockafellar, 1976)启发的新算法框架。正如我们所展示的,它为若干基础性的优化范式和现象提供了新的视角,包括非凸与非光滑优化、加速、平滑、自适应步长选择以及信赖域方法。BPM的核心是球近端(“broximal”)算子,它由经典近端算子将二次距离惩罚替换为球约束而得到。令人惊讶的是,与PPM在非光滑凸情形下的次线性速率形成鲜明对比,我们证明BPM在同一情形下线性收敛,并且在有限步内收敛。此外,通过引入球凸性(ball-convexity)的概念,我们证明BPM在更弱的假设下保持相同的全局收敛保证,使其成为处理更广泛的一类可能非凸的优化问题的有力工具。正如PPM作为一种概念性方法,启发了实际高效的算法和算法要素的发展,例如梯度下降、自适应步长、加速(Ahn & Sra, 2020)以及AdamW中的“W”(Zhuang et al., 2022),我们认为BPM也应以同样的方式来理解:作为进一步发展的蓝图和灵感来源。
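Read literally from the abstract, the broximal operator returns a minimizer of the objective over a ball around the current point, and BPM iterates that operator. The sketch below is a one-dimensional illustration under stated assumptions: the objective is convex, the ball is an interval, and the inner minimization uses a ternary search. The helper names and the test function `|x - 3|` are hypothetical, and practical high-dimensional use would need a real constrained solver.

```python
def brox(f, x, radius, iters=200):
    """Minimise a convex 1-D function f over the ball [x - radius, x + radius]."""
    lo, hi = x - radius, x + radius
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2          # a minimiser lies in [lo, m2]
        else:
            lo = m1          # a minimiser lies in [m1, hi]
    return (lo + hi) / 2.0


def ball_proximal_point(f, x0, radius, steps):
    """Iterate x_{k+1} = brox(f, x_k, radius) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(brox(f, xs[-1], radius))
    return xs


def f(x):
    return abs(x - 3.0)      # non-smooth convex objective with minimiser 3


trajectory = ball_proximal_point(f, 10.0, 2.0, 6)
values = [f(x) for x in trajectory]
```

Starting from 10 with radius 2, the iterates move a full radius per step until the ball contains the minimizer and then stay there, which illustrates the finite-step behavior the abstract claims for the non-smooth convex regime.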

Article 266

Title@2025-07-30 (3): Visual Language Models as Zero-Shot Deepfake Detectors

Title: Visual Language Models as Zero-Shot Deepfake Detectors

Visuelle Sprachmodelle als Zero-Shot Deepfake Detektoren

视觉语言模型,作为零热深假探测器 2507.22469v1

Authors (1): Viacheslav Pirogov

The contemporary phenomenon of deepfakes, utilizing GAN or diffusion models for face swapping, presents a substantial and evolving threat in digital media, identity verification, and a multitude of other systems. The majority of existing methods for detecting deepfakes rely on training specialized classifiers to distinguish between genuine and manipulated images, focusing only on the image domain without incorporating any auxiliary tasks that could enhance robustness. In this paper, inspired by the zero-shot capabilities of Vision Language Models, we propose a novel VLM-based approach to image classification and then evaluate it for deepfake detection. Specifically, we utilize a new high-quality deepfake dataset comprising 60,000 images, on which our zero-shot models demonstrate superior performance to almost all existing methods. Subsequently, we compare the performance of the best-performing architecture, InstructBLIP, on the popular deepfake dataset DFDC-P against traditional methods in two scenarios: zero-shot and in-domain fine-tuning. Our results demonstrate the superiority of VLMs over traditional classifiers.

当代的深假现象,利用GAN或扩散模型进行面部转换,在数字媒体、身份验证和许多其他系统中构成了一种不断发展的巨大威胁。发现深假的现有方法大多依靠培训专业分类师来区分真实的和被操纵的图像,仅侧重于图像域,而没有纳入任何能够增强强力的辅助任务。在本文中,在视觉语言模型零射能力启发下,我们提出了一个基于VLM的图像分类新颖方法,然后对图像分类进行评估,以便进行深假检测。具体地说,我们使用由60 000个图像组成的新的高品质深假数据集,我们零射模型在其中展示出优于几乎所有现有方法的性能。随后,我们比较了最佳结构(指示BLIP)的性能,即流行的深假数据数据集DDC-P与两种情景的传统方法:零射和内部微调。我们的结果显示VLMS优于传统分类师。

Article 267

Title@2025-07-30 (3): Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework

Title: Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework

Auf dem Weg zu einer interpretierbaren Renal Health-Prognose über Multi-LMM-Kollaboratives Reasoning-Framework

通过多LMM协作推理框架实现可解释的肾功能下降预测 2507.22464v1

Authors (6): Peng-Yi Wu, Pei-Cing Huang, Ting-Yu Chen, Chantung Ku, Ming-Yen Lin, Yihuang Kang

Accurate and interpretable prediction of estimated glomerular filtration rate (eGFR) is essential for managing chronic kidney disease (CKD) and supporting clinical decisions. Recent advances in Large Multimodal Models (LMMs) have shown strong potential in clinical prediction tasks due to their ability to process visual and textual information. However, challenges related to deployment cost, data privacy, and model reliability hinder their adoption. In this study, we propose a collaborative framework that enhances the performance of open-source LMMs for eGFR forecasting while generating clinically meaningful explanations. The framework incorporates visual knowledge transfer, abductive reasoning, and a short-term memory mechanism to enhance prediction accuracy and interpretability. Experimental results show that the proposed framework achieves predictive performance and interpretability comparable to proprietary models. It also provides plausible clinical reasoning processes behind each prediction. Our method sheds new light on building AI systems for healthcare that combine predictive accuracy with clinically grounded interpretability.

对估计球状过滤率(eGFR)的准确和可解释的预测对于管理慢性肾病(CKD)和支持临床决定至关重要。大型多式模型(LMMs)最近的进展表明,由于其处理视觉和文字信息的能力,临床预测任务具有巨大潜力。然而,与部署成本、数据隐私和模型可靠性有关的挑战妨碍了其采用。在本研究中,我们提议了一个合作框架,以提高用于eGFR预报的开放源LMs的性能,同时产生具有临床意义的解释。框架包括视觉知识转移、诱拐推理和短期记忆机制,以提高预测的准确性和可解释性。实验结果显示,拟议的框架实现了预测性业绩和可与专利模型相比的可解释性。它还提供了各种预测背后的可信的临床推理过程。我们的方法为建立将预测性准确性和基于临床的解释性结合起来的AI系统提供了新的思路。

Article 268

Title@2025-07-30 (3): SDBA: A Stealthy and Long-Lasting Durable Backdoor Attack in Federated Learning

Title: SDBA: A Stealthy and Long-Lasting Durable Backdoor Attack in Federated Learning

SDBA: Ein verdeckter und langanhaltender Backdoor-Angriff im Federated Learning

SDBA: 联邦学习中的隐秘和长期持久的后门攻击 2409.14805v2

Authors (4): Minyeong Choe, Cheolhee Park, Changho Seo, Hyunil Kim

Federated learning is a promising approach for training machine learning models while preserving data privacy. However, its distributed nature makes it vulnerable to backdoor attacks, particularly in NLP tasks, where related research remains limited. This paper introduces SDBA, a novel backdoor attack mechanism designed for NLP tasks in federated learning environments. Through a systematic analysis across LSTM and GPT-2 models, we identify the most vulnerable layers for backdoor injection and achieve both stealth and long-lasting durability by applying layer-wise gradient masking and top-k% gradient masking. Also, to evaluate the task generalizability of SDBA, we additionally conduct experiments on the T5 model. Experiments on next-token prediction, sentiment analysis, and question answering tasks show that SDBA outperforms existing backdoors in terms of durability and effectively bypasses representative defense mechanisms, demonstrating notable performance in transformer-based models such as GPT-2. These results highlight the urgent need for robust defense strategies in NLP-based federated learning systems.

联邦学习是一种在保护数据隐私的同时培训机器学习模式的很有希望的方法,然而,其分布性使其易受后门攻击,特别是在国家实验室项目的相关研究仍然有限的国家实验室任务中。本文介绍了SDBA,这是在联邦学习环境中为NLP任务设计的新颖的后门攻击机制。通过对LSTM和GPT-2模型的系统分析,我们确定了最易被利用的后门注射层,并通过应用从层到层的梯度遮罩和顶到梯度遮罩来实现隐性和长期耐性。此外,为了评估SDBA的任务的可概括性,我们还就T5模型进行了实验。关于后方预测、情绪分析和回答问题的实验表明,SDBA在耐久性和有效绕过具有代表性的防御机制方面超越了现有的后门,这表明了GPT-2等基于变型模型的显著表现。这些结果突出表明,迫切需要在基于NLPF的联邦学习系统中制定强有力的防御战略。
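The abstract names top-k% gradient masking without defining it. One common reading, assumed here, is to keep only the k% largest-magnitude entries of a gradient and zero the rest, applied per layer. The selection rule, the function name, and the sample gradient are illustrative assumptions rather than SDBA's actual procedure.

```python
def top_k_percent_mask(grad, k_percent):
    """Keep the k% largest-magnitude entries of a flat gradient; zero the others.

    Assumption: magnitude-based selection with at least one entry kept.
    """
    keep = max(1, int(len(grad) * k_percent / 100))
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    kept = set(order[:keep])
    return [g if i in kept else 0.0 for i, g in enumerate(grad)]


# Made-up flattened gradient for a single layer.
layer_grad = [0.02, -0.9, 0.3, -0.05, 0.6, 0.01, -0.4, 0.07, 0.0, 0.15]
masked = top_k_percent_mask(layer_grad, 30)
```

A layer-wise variant would call this function only on the layers chosen for injection and leave the other layers' updates untouched.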

Article 269

Title@2025-07-30 (3): Trajectory First: A Curriculum for Discovering Diverse Policies

Title: Trajectory First: A Curriculum for Discovering Diverse Policies

Trajektorie zuerst: Ein Curriculum für die Entdeckung unterschiedlicher Politiken

轨迹第一:发现多样化政策课程 2506.01568v2

Authors (3): Cornelius V. Braun, Sayantan Auddy, Marc Toussaint

Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework to train a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack of policy diversity. To improve diversity optimization in RL, we therefore propose a curriculum that first explores at the trajectory level before learning step-based policies. In our empirical evaluation, we provide novel insights into the shortcomings of skill-based diversity optimization and demonstrate that our curriculum improves the diversity of the learned skills.

能够以不同方式解决一项任务,使代理人更能应付变化,更不易受到当地选择的影响。在这方面,限制多样性优化已成为一个强大的强化学习框架,可以同时培训各种各样的代理人。然而,现有的限制多样性RL方法往往在诸如机器人操纵等复杂任务中爆炸不足,导致政策多样性的缺乏。因此,为了改进RL的多样性优化,我们提议了一个课程,首先在轨迹层次上探索,然后学习渐进式政策。在经验评估中,我们对基于技能的多样性优化的缺陷提供了新的洞察力,并用经验证明我们的课程改善了所学技能的多样性。

Article 270

Title@2025-07-30 (3): Strategic Integration of Artificial Intelligence in the C-Suite: The Role of the Chief AI Officer

Title: Strategic Integration of Artificial Intelligence in the C-Suite: The Role of the Chief AI Officer

Strategische Integration der Künstlichen Intelligenz in die C-Suite: Die Rolle des Chief AI Officer

人工智能在最高管理层(C-Suite)中的战略整合:首席人工智能官的作用 2407.10247v2

Authors (1): Marc Schmitt

The integration of Artificial Intelligence (AI) into corporate strategy has become critical for organizations seeking to maintain a competitive advantage in the digital age. As AI transforms business models, operations, and decision-making, the need for dedicated executive leadership to guide, govern, and orchestrate this transformation becomes increasingly evident. This paper examines emerging future scenarios across three domains: the AI Economy, the AI Organization, and Competition in the Age of AI. These domains reveal environmental, structural, and strategic tensions that existing C-suite roles struggle to resolve. In response, the paper develops a theory-informed framework for the Chief AI Officer (CAIO), outlining the distinct functions and capabilities required to guide and govern AI at scale. Drawing on illustrative cases and emerging practice, this conceptualization clarifies the CAIO's unique role within the executive landscape and presents a forward-looking research agenda. This paper advances the discourse on AI leadership by offering a theory-driven rationale for the strategic integration of AI at the executive level and by positioning the Chief AI Officer as a distinct and necessary role within modern organizations.

将人工智能(AI)纳入公司战略对寻求在数字时代保持竞争优势的组织至关重要。随着AI转变了商业模式、业务和决策,因此越来越明显地需要专门的行政领导来指导、管理和安排这种转变。本文件审查了三个领域的新的未来情景:AI经济、AI组织和AI时代的竞争。这些领域揭示了现有C角色难以解决的环境、结构和战略紧张局势。作为回应,该文件为AI首席干事制定了一个理论知情的框架,概述了在规模上指导和管理AI所需的不同职能和能力。这一概念化根据案例和新出现的实践,澄清了CAIO在行政领域的独特作用,并提出了前瞻性研究议程。本文件通过为AI在行政层面的战略整合提供理论驱动的理由,并将首席AI干事定位为现代组织中的独特和必要作用,从而推进了AI领导的讨论。

Article 271

Title@2025-07-30 (3): A case for data valuation transparency via DValCards

Title: A case for data valuation transparency via DValCards

Ein Fall für Datenbewertungstransparenz über DValCards

通过 DValCards 提高数据估价透明度的一个案例 2506.23349v2

Authors (2): Keziah Naggita, Julienne LaChance

Following the rise in popularity of data-centric machine learning (ML), various data valuation methods have been proposed to quantify the contribution of each datapoint to desired ML model performance metrics (e.g., accuracy). Beyond the technical applications of data valuation methods (e.g., data cleaning, data acquisition, etc.), it has been suggested that within the context of data markets, data buyers might utilize such methods to fairly compensate data owners. Here we demonstrate that data valuation metrics are inherently biased and unstable under simple algorithmic design choices, resulting in both technical and ethical implications. By analyzing 9 tabular classification datasets and 6 data valuation methods, we illustrate how (1) common and inexpensive data pre-processing techniques can drastically alter estimated data values; (2) subsampling via data valuation metrics may increase class imbalance; and (3) data valuation metrics may undervalue underrepresented group data. Consequently, we argue in favor of increased transparency associated with data valuation in the wild and introduce the novel Data Valuation Cards (DValCards) framework towards this aim. The proliferation of DValCards will reduce misuse of data valuation metrics, including in data pricing, and build trust in responsible ML systems.

在以数据为中心的机器学习(ML)越来越受欢迎之后,提出了各种数据评价方法,以量化每个数据点对理想的ML模型性能指标的贡献(例如,准确性);除了数据评价方法的技术应用(例如,数据清理、数据获取等)之外,还提出在数据市场的范围内,数据购买者可能利用这些方法来公平补偿数据拥有者;在这里,我们证明在简单的算法设计选择下,数据评价指标具有内在的偏向性和不稳定性,从而产生技术和道德影响;通过分析9个表格分类数据集和6个数据评价方法,我们说明(1) 共同和廉价的数据预处理技术如何能够大幅度改变估计数据价值;(2) 通过数据评价指标进行子抽样抽样,可能会增加分类不平衡;(3) 数据评价指标可能低估代表性不足的群体数据;因此,我们赞成增加与在网上进行数据评价有关的透明度,并为此目的采用新的数据估价卡框架;DValCard的泛滥将减少数据估价指标的滥用,包括在数据定价方面,并在ML系统中建立负责任的信任。

Article 272

Title@2025-07-30 (3): Breaking Obfuscation: Cluster-Aware Graph with LLM-Aided Recovery for Malicious JavaScript Detection

Title: Breaking Obfuscation: Cluster-Aware Graph with LLM-Aided Recovery for Malicious JavaScript Detection

Breaking Obfuscation: Cluster-Aware Graph mit LLM-gestützte Erholung für bösartige JavaScript-Erkennung

打破混淆:结合LLM辅助恢复的簇感知图用于恶意JavaScript检测 2507.22447v1

Authors (8): Zhihong Liang, Xin Wang, Zhenhuang Hu, Liangliang Song, Lin Chen, Jingjing Guo, Yanbin Wang, Ye Tian

With the rapid expansion of web-based applications and cloud services, malicious JavaScript code continues to pose significant threats to user privacy, system integrity, and enterprise security. However, detecting such threats remains challenging due to sophisticated code obfuscation techniques and JavaScript’s inherent language characteristics, particularly its nested closure structures and syntactic flexibility. In this work, we propose DeCoda, a hybrid defense framework that combines large language model (LLM)-based deobfuscation with code graph learning: (1) We first construct a sophisticated prompt-learning pipeline with multi-stage refinement, where the LLM progressively reconstructs the original code structure from obfuscated inputs and then generates normalized Abstract Syntax Tree (AST) representations; (2) In JavaScript ASTs, dynamic typing scatters semantically similar nodes while deeply nested functions fracture scope capturing, introducing structural noise and semantic ambiguity. To address these challenges, we then propose to learn hierarchical code graph representations via a Cluster-wise Graph that synergistically integrates a graph transformer network, node clustering, and node-to-cluster attention to simultaneously capture both local node-level semantics and global cluster-induced structural relationships from the AST graph. Experimental results demonstrate that our method achieves F1-scores of 94.64% and 97.71% on two benchmark datasets, corresponding to absolute improvements of 10.74% and 13.85% over state-of-the-art baselines. In false-positive control evaluation at fixed FPR levels (0.0001, 0.001, 0.01), our approach delivers 4.82, 5.91, and 2.53 higher TPR, respectively, compared to the best-performing baseline. These results highlight the effectiveness of LLM-based deobfuscation and underscore the importance of modeling cluster-level relationships in detecting malicious code.

随着基于 Web 的应用和云服务的迅速扩展,恶意 JavaScript 代码持续对用户隐私、系统完整性和企业安全构成重大威胁。然而,由于复杂的代码混淆技术以及 JavaScript 固有的语言特性(尤其是嵌套闭包结构和语法灵活性),检测此类威胁仍然具有挑战性。在这项工作中,我们提出 DeCoda,一个将基于大语言模型(LLM)的去混淆与代码图学习相结合的混合防御框架:(1) 我们首先构建一个带有多阶段精炼的提示学习流水线,使 LLM 从混淆的输入中逐步重建原始代码结构,并生成规范化的抽象语法树(AST)表示;(2) 在 JavaScript 的 AST 中,动态类型使语义相似的节点分散,而深层嵌套的函数破坏了作用域捕获,从而引入结构噪声和语义歧义。为应对这些挑战,我们提出通过簇级图(Cluster-wise Graph)学习层次化的代码图表示,它整合了图 Transformer 网络、节点聚类和节点到簇的注意力,以同时捕获 AST 图中的局部节点级语义和全局的簇诱导结构关系。实验结果表明,我们的方法在两个基准数据集上分别取得 94.64% 和 97.71% 的 F1 分数,相比最先进的基线分别有 10.74% 和 13.85% 的绝对提升。在固定 FPR 水平(0.0001、0.001、0.01)的误报控制评估中,与表现最好的基线相比,我们的方法的 TPR 分别高出 4.82、5.91 和 2.53。这些结果凸显了基于 LLM 的去混淆的有效性,并强调了建模簇级关系对检测恶意代码的重要性。

Article 273

Title@2025-07-30 (3): RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function

Title: RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function

RCR-AF: Verbesserung der Modellverallgemeinerung durch Rademacher-Komplexitätsreduktions-Aktivierungsfunktion

RCR-AF:通过雷德马赫赫减少复杂程度减少激活功能加强示范性一般化 2507.22446v1

Authors (4): Yunrui Yu, Kafeng Wang, Hang Su, Jun Zhu

Despite their widespread success, deep neural networks remain critically vulnerable to adversarial attacks, posing significant risks in safety-sensitive applications. This paper investigates activation functions as a crucial yet underexplored component for enhancing model robustness. We propose a Rademacher Complexity Reduction Activation Function (RCR-AF), a novel activation function designed to improve both generalization and adversarial resilience. RCR-AF uniquely combines the advantages of GELU (including smoothness, gradient stability, and negative information retention) with ReLU’s desirable monotonicity, while simultaneously controlling both model sparsity and capacity through built-in clipping mechanisms governed by two hyperparameters, $\alpha$ and $\gamma$. Our theoretical analysis, grounded in Rademacher complexity, demonstrates that these parameters directly modulate the model’s Rademacher complexity, offering a principled approach to enhance robustness. Comprehensive empirical evaluations show that RCR-AF consistently outperforms widely-used alternatives (ReLU, GELU, and Swish) in both clean accuracy under standard training and in adversarial robustness within adversarial training paradigms.

尽管取得了广泛成功,但深心神经网络仍然极易受到对抗性攻击的伤害,对安全敏感应用构成重大风险。本文件将激活功能作为加强模型坚固度的关键但探索不足的部分来调查。我们提议了Rademacher 复杂度减少活化功能(RCR-AF),这是旨在改进一般化和对抗性抗御能力的一种新颖的启动功能。RCR-AF独有地将GELU的优势(包括光滑、梯度稳定性和消极信息保留)与RELU可取的单一性结合起来,同时通过标准培训的清洁准确性和在对抗性强势培训范式内的对抗性强度,同时控制模型溢出和能力(RELU、GELU和Swish)。我们基于Rademacher复杂度的理论分析表明,这些参数直接调整了模型的Rademacher复杂性,提供了加强稳健性的原则性方法。全面的经验评估表明,RCR-AF在标准培训和对抗性强性培训范式中,在清洁准确性培训中始终超越广泛使用的替代品(ReLU、GLU和Swish)。

Article 274

Title@2025-07-30 (3): RANA: Robust Active Learning for Noisy Network Alignment

Title: RANA: Robust Active Learning for Noisy Network Alignment

RANA: Robustes aktives Lernen für geräuschreiche Netzwerkausrichtung

RANA: 大力积极学习,促进吵闹网络对齐 2507.22434v1

Authors (6): Yixuan Nan, Xixun Lin, Yanmin Shang, Zhuofan Li, Can Zhao, Yanan Cao

Network alignment has attracted widespread attention in various fields. However, most existing works focus on the problem of label sparsity while overlooking the issue of noise in network alignment, which can substantially undermine model performance. Such noise mainly includes structural noise from noisy edges and labeling noise caused by human-induced and process-driven errors. To address these problems, we propose RANA, a Robust Active learning framework for noisy Network Alignment. RANA tackles both structural noise and labeling noise while addressing the sparsity of anchor link annotations, which can improve the robustness of network alignment models. Specifically, RANA introduces a Noise-aware Selection Module and a Label Denoising Module to address structural noise and labeling noise, respectively. In the first module, we design a noise-aware maximization objective to select node pairs, incorporating a cleanliness score to address structural noise. In the second module, we propose a novel multi-source fusion denoising strategy that leverages model and twin-node-pair labeling to provide more accurate labels for node pairs. Empirical results on three real-world datasets demonstrate that RANA outperforms state-of-the-art active learning-based methods in alignment accuracy. Our code is available at https://github.com/YXNan0110/RANA.

然而,大多数现有工作主要侧重于标签宽度问题,而忽略网络连接中的噪音问题,这可能会大大削弱模型性能。这种噪音主要包括来自噪音边缘的结构噪音和由人为和过程驱动错误引起的标签噪音。为了解决这些问题,我们建议RANA, 即一个活跃网络对齐的强劲活跃学习框架。RANA有效地解决结构噪音和标签噪音,同时解决锚链接说明的广度问题,这可以提高网络对齐模式的稳健性。具体来说,RANA采用了拟议的噪音-观测选择模块和Label Denoising模块,分别解决结构噪音和标签噪音。在第一个模块中,我们设计一个噪音觉悟最大化目标,选择节点配对,纳入清洁度评分,以解决结构噪音。在第二个模块中,我们提出了一个新的多源混合战略,利用模型和双节配对标签,为无对配对提供更准确的标签。在三个真实世界的数据校准中,我们用AMS01/ANDA系统的现有方法显示,我们现有的RANA-A系统校正。

Article 275

Title@2025-07-30 (3): Comparing Normalizing Flows with Kernel Density Estimation in Estimating Risk of Automated Driving Systems

Title: Comparing Normalizing Flows with Kernel Density Estimation in Estimating Risk of Automated Driving Systems

Vergleich der Normalisierungsströme mit der Schätzung der Kerneldichte bei der Schätzung des Risikos Automatisierter Fahrsysteme

在估计自动驱动系统的风险时,将正常流动与内核密度量估计值与内核密度量的标准化对比 2507.22429v1

Authors (3): Erwin de Gelder, Maren Buermann, Olaf Op den Camp

The development of safety validation methods is essential for the safe deployment and operation of Automated Driving Systems (ADSs). One of the goals of safety validation is to prospectively evaluate the risk of an ADS dealing with real-world traffic. Scenario-based assessment is a widely-used approach, where test cases are derived from real-world driving data. To allow for a quantitative analysis of the system performance, the exposure of the scenarios must be accurately estimated. The exposure of scenarios at parameter level is expressed using a Probability Density Function (PDF). However, assumptions about the PDF, such as parameter independence, can introduce errors, while avoiding assumptions often leads to oversimplified models with limited parameters to mitigate the curse of dimensionality. This paper considers the use of Normalizing Flows (NF) for estimating the PDF of the parameters. NF are a class of generative models that transform a simple base distribution into a complex one using a sequence of invertible and differentiable mappings, enabling flexible, high-dimensional density estimation without restrictive assumptions on the PDF’s shape. We demonstrate the effectiveness of NF in quantifying the risk and risk uncertainty of an ADS, comparing their performance with Kernel Density Estimation (KDE), a traditional method for non-parametric PDF estimation. While NF require more computational resources than KDE, they are less sensitive to the curse of dimensionality. As a result, NF can improve risk uncertainty estimation, offering a more precise assessment of an ADS’s safety. This work illustrates the potential of NF in scenario-based safety. Future work involves experimenting more with using NF for scenario generation and optimizing the NF architecture, transformation types, and training hyperparameters to further enhance their applicability.

安全验证方法的开发对于自动驱动系统的安全部署和运行至关重要。安全验证的目标之一是对用于真实世界交通的ADS的风险进行前瞻性评估。基于情景的评估是一种广泛使用的方法,测试案例来自真实世界驱动数据。为了对系统性能进行定量分析,必须准确估计假设情景的暴露情况。在参数一级对情景的暴露情况使用一个可见度密度值的估算值来表示。但是,对PDF的假设,如参数独立性,可能会引入错误,而同时避免假设往往导致简化模型的过度简化,且有有限的参数来减轻维度的诅咒。本文认为,使用标准化流程(NFF)来估算参数的参数。NF是将简单的基分布转换成一个复杂的模型,使用一个不可逆和不同性的培训测度值的绘图序列来表示,使得在PDF的形状上可以更灵活、高维度的密度估计,而没有限制性的假设。我们展示了NFFF的效益,在将风险、更精确的IFS的预测性评估方面,将更精确的A-RFS的模型与更精确的模型进行对比。
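
The KDE baseline named in the abstract above can be illustrated with a minimal one-dimensional Gaussian kernel density estimate. This is only a sketch under stated assumptions: the sample values, the bandwidth of 0.5, and the function name `gaussian_kde_pdf` are hypothetical and are not taken from the paper, which works with multi-dimensional scenario parameters.

```python
import numpy as np

def gaussian_kde_pdf(samples, query, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate at the query points."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    # Pairwise standardized distances, shape (n_query, n_samples).
    z = (query[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over the samples and rescale by the bandwidth.
    return kernel.mean(axis=1) / bandwidth

# Hypothetical observations of a single scenario parameter (illustrative only).
observed = np.array([1.0, 1.5, 2.0, 2.2, 3.1])
grid = np.linspace(-5.0, 10.0, 3001)
pdf = gaussian_kde_pdf(observed, grid, bandwidth=0.5)

# Riemann-sum check that the estimate integrates to roughly one.
total_mass = float(pdf.sum() * (grid[1] - grid[0]))
```

In higher dimensions the number of samples needed to populate every kernel neighbourhood grows quickly, which is the sensitivity to dimensionality that the abstract attributes to KDE.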

Article 276

Title@2025-07-30 (3): Theoretical Analysis of Relative Errors in Gradient Computations for Adversarial Attacks with CE Loss

Title: Theoretical Analysis of Relative Errors in Gradient Computations for Adversarial Attacks with CE Loss

Theoretische Analyse von relativen Fehlern bei gradienten Berechnungen für Adversarialangriffe mit CE-Verlust

CE损失反向攻击的渐进计算中的相对误差理论分析 2507.22428v1

Authors (5): Yunrui Yu, Hang Su, Cheng-zhong Xu, Zhizhong Su, Jun Zhu

Gradient-based adversarial attacks using the Cross-Entropy (CE) loss often suffer from overestimation due to relative errors in gradient computation induced by floating-point arithmetic. This paper provides a rigorous theoretical analysis of these errors, conducting the first comprehensive study of floating-point computation errors in gradient-based attacks across four distinct scenarios: (i) unsuccessful untargeted attacks, (ii) successful untargeted attacks, (iii) unsuccessful targeted attacks, and (iv) successful targeted attacks. We establish theoretical foundations characterizing the behavior of relative numerical errors under different attack conditions, revealing previously unknown patterns in gradient computation instability, and identify floating-point underflow and rounding as key contributors. Building on this insight, we propose the Theoretical MIFPE (T-MIFPE) loss function, which incorporates an optimal scaling factor $T = t^*$ to minimize the impact of floating-point errors, thereby enhancing the accuracy of gradient computation in adversarial attacks. Extensive experiments on the MNIST, CIFAR-10, and CIFAR-100 datasets demonstrate that T-MIFPE outperforms existing loss functions, including CE, C&W, DLR, and MIFPE, in terms of attack potency and robustness evaluation accuracy.

由于浮动点算引致的梯度计算中的相对误差,使用跨Entropy(CE)损失的基于渐变的对抗性攻击往往被高估。本文对这些误差进行了严格的理论分析,对这些误差进行了严格的理论分析,对以下四种不同情景的基于梯度的攻击中浮点计算误差进行了首次全面研究:(一) 无目标攻击不成功,(二) 无目标攻击成功,(三) 定点攻击不成功,(四) 定点攻击成功,(四) 定点攻击成功,(四) 定点攻击成功。我们为不同攻击条件下相对数字错误行为的特点建立了理论基础,揭示了梯度计算不稳定的先前未知模式,并确定浮点内流和环绕是关键贡献者。我们根据这一深入了解,提出了理论MIFPE(T-MIFE)损失函数,其中包括最佳缩放系数$T=t美元,以尽量减少浮动点错误的影响,从而提高对抗性攻击的精确性。我们对MNIIS、CIFAR-10和CIFA-100数据集进行广泛的实验,表明T-MIFTE超越了现有损失功能,包括CE、C-CQE、CR、CR、CR、DR、DR、DFAL、DR、DR、DR、DFA、DR和FD。
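
The rounding effect described in the abstract above can be reproduced in a few lines of float32 arithmetic. The sketch below is an illustration only: the logits and the scale of 0.1 are arbitrary choices and are not the paper's optimal factor $t^*$, and `ce_grad_wrt_logits` is a hypothetical helper, not the T-MIFPE loss.

```python
import numpy as np

def ce_grad_wrt_logits(logits, label, scale=1.0):
    """Gradient of CE(softmax(scale * logits), label) w.r.t. the scaled logits, in float32."""
    z = np.asarray(logits, dtype=np.float32) * np.float32(scale)
    z = z - z.max()                      # usual max-shift for stability
    e = np.exp(z)
    p = e / e.sum()                      # softmax probabilities in float32
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    return p - onehot

# A confidently classified input: the true class leads by a large margin.
logits = [30.0, 0.0, 0.0]

# Without scaling, the softmax of the true class rounds to exactly 1.0,
# so its gradient component is exactly zero.
plain = ce_grad_wrt_logits(logits, label=0)

# Shrinking the logits (arbitrary scale, for illustration) keeps it non-zero.
scaled = ce_grad_wrt_logits(logits, label=0, scale=0.1)
```

An attack that follows the unscaled gradient receives no signal along the true-class direction here, which is one way a model's robustness can be overestimated.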

Article 277

Title@2025-07-30 (3): Multimodal Late Fusion Model for Problem-Solving Strategy Classification in a Machine Learning Game

Title: Multimodal Late Fusion Model for Problem-Solving Strategy Classification in a Machine Learning Game

Multimodales Late-Fusion-Modell für Problemlösungsstrategie-Klassifizierung in einem Machine-Learning-Spiel

机器学习游戏中解决问题战略分类的多模式晚期融合模式 2507.22426v1

Authors (4): Clemens Witt, Thiemo Leonhardt, Nadine Bergner, Mareen Grillenberger

Machine learning models are widely used to support stealth assessment in digital learning environments. Existing approaches typically rely on abstracted gameplay log data, which may overlook subtle behavioral cues linked to learners’ cognitive strategies. This paper proposes a multimodal late fusion model that integrates screencast-based visual data and structured in-game action sequences to classify students’ problem-solving strategies. In a pilot study with secondary school students (N=149) playing a multitouch educational game, the fusion model outperformed unimodal baseline models, increasing classification accuracy by over 15%. Results highlight the potential of multimodal ML for strategy-sensitive assessment and adaptive support in interactive learning contexts.

现有方法通常依赖抽象的游戏记录数据,这些数据可能会忽略与学习者的认知战略有关的微妙行为提示。本文建议采用多式迟发聚合模型,将屏幕视觉数据和结构化的游戏内行动序列结合起来,对学生解决问题的战略进行分类。在与中学生一起进行的一次实验研究(N=149)中,混合模型的形成胜过单式基线模型,将分类精确度提高15%以上。结果突出了多式ML在互动学习环境中战略敏感评估和适应性支持的潜力。
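
Late fusion, as used in the abstract above, combines the outputs of separately trained unimodal classifiers rather than their input features. The abstract does not state the fusion rule, so the following is only a sketch assuming a weighted average of class probabilities; the probabilities, the weight, and the name `late_fusion` are hypothetical.

```python
import numpy as np

def late_fusion(prob_visual, prob_actions, weight_visual=0.5):
    """Fuse two per-class probability vectors by a weighted average."""
    prob_visual = np.asarray(prob_visual, dtype=float)
    prob_actions = np.asarray(prob_actions, dtype=float)
    fused = weight_visual * prob_visual + (1.0 - weight_visual) * prob_actions
    return fused, int(np.argmax(fused))

# Hypothetical outputs of two unimodal classifiers over three strategy classes.
p_screencast = [0.50, 0.30, 0.20]   # screencast-based visual model
p_action_log = [0.10, 0.70, 0.20]   # in-game action sequence model
fused_probs, predicted_class = late_fusion(p_screencast, p_action_log)
```

Here the two models disagree, and the fused prediction follows the more confident one; the weight could be tuned on validation data.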

Article 278

Title@2025-07-30 (3): Bridging Privacy and Robustness for Trustworthy Machine Learning

Title: Bridging Privacy and Robustness for Trustworthy Machine Learning

Überbrückung von Privatsphäre und Robustheit für vertrauenswürdiges maschinelles Lernen

连接隐私和强力,促进可信赖的机器学习 2403.16591v5

Authors (2): Xiaojin Zhang, Wei Chen

The widespread adoption of machine learning necessitates robust privacy protection alongside algorithmic resilience. While Local Differential Privacy (LDP) provides foundational guarantees, sophisticated adversaries with prior knowledge demand more nuanced Bayesian privacy notions, such as Maximum Bayesian Privacy (MBP) and Average Bayesian Privacy (ABP), first introduced by \cite{zhang2022no}. Concurrently, machine learning systems require inherent robustness against data perturbations and adversarial manipulations. This paper systematically investigates the intricate theoretical relationships among LDP, MBP, and ABP. Crucially, we bridge these privacy concepts with algorithmic robustness, particularly within the Probably Approximately Correct (PAC) learning framework. Our work demonstrates that privacy-preserving mechanisms inherently confer PAC robustness. We present key theoretical results, including the formalization of the established LDP-MBP relationship, novel bounds between MBP and ABP, and a proof demonstrating PAC robustness from MBP. Furthermore, we establish a novel theoretical relationship quantifying how privacy leakage directly influences an algorithm’s input robustness. These results provide a unified theoretical framework for understanding and optimizing the privacy-robustness trade-off, paving the way for the development of more secure, trustworthy, and resilient machine learning systems.

虽然本地差异隐私(LDP)提供了基础性保障,但与先前知识的尖端对手则要求更细致的巴伊西亚隐私概念,如最深的巴伊西亚隐私(MBP)和平均巴伊西亚隐私(ABP),最初由\cite{zhang2022no}提出。与此同时,机器学习系统要求对数据干扰和对立操纵具有内在的强健性。本文系统调查了LDP、MBP和ABP之间的复杂理论关系。关键是我们将这些隐私概念与算法的强健性联系起来,特别是在可能为近似正统(PAC)的学习框架内。我们的工作表明,隐私保护机制必然会赋予巴伊隐私的稳健性。我们提出了关键的理论结果,包括已建立的巴伊关系正规化,MBP和ABP之间的新界限,以及MBP的证明PAC稳健性。此外,我们建立了一种新的理论关系,以量化隐私渗漏如何直接影响到算法的投入稳健性。这些结果提供了一个统一的理论框架,有利于更稳定地理解和优化的隐私-学习系统。

Article 279

Title@2025-07-30 (3): Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance

Title: Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance

Spec-VLA: Spekulative Dekodierung für Vision-Language-Action-Modelle mit entspannter Akzeptanz

Spec-VLA:放宽接受的愿景-语言-行动模式的投机代号 2507.22424v1

Authors (7): Songsheng Wang, Rucheng Yu, Zhihang Yuan, Chao Yu, Feng Gao, Yu Wang, Derek F. Wong

Vision-Language-Action (VLA) models have made substantial progress by leveraging the robust capabilities of Visual Language Models (VLMs). However, VLMs’ significant parameter size and autoregressive (AR) decoding nature impose considerable computational demands on VLA models. While Speculative Decoding (SD) has shown efficacy in accelerating Large Language Models (LLMs) by incorporating efficient drafting and parallel verification, allowing multiple tokens to be generated in one forward pass, its application to VLA models remains unexplored. This work introduces Spec-VLA, an SD framework designed to accelerate VLA models. Due to the difficulty of the action prediction task and the greedy decoding mechanism of the VLA models, the direct application of the advanced SD framework to the VLA prediction task yields only a minor speed improvement. To boost the generation speed, we propose an effective mechanism to relax acceptance utilizing the relative distances represented by the action tokens of the VLA model. Empirical results across diverse test scenarios affirm the effectiveness of the Spec-VLA framework, and further analysis substantiates the impact of our proposed strategies, which enhance the acceptance length by 44%, achieving a 1.42× speedup over the OpenVLA baseline without compromising the success rate. The success of the Spec-VLA framework highlights the potential for broader application of speculative execution in VLA prediction scenarios.

通过利用视觉语言模型(VLM)的强大能力,视觉语言行动模型(VLA)取得了长足的进展。然而,VLM公司显著的参数大小和自动递减解码(AR)性质对VLA模型提出了大量的计算要求。虽然投机化代号(SD)显示在加快大语言模型(LLMS)方面的效力,纳入了高效的起草和平行核查,允许通过一个前方传票产生多种标志,该模型对VLA模型的应用仍未得到探讨。这项工作引入了Spec-VLA(SDA)框架,这是一个旨在加速VLA模型的SDA框架。由于行动预测任务的困难和VLA模型的贪婪解码机制,将高级SD框架直接应用于VLA预测任务只带来微小速度的改善。为了提高生成速度,我们建议一个有效的机制,利用VLA模型行动标志所代表的相对距离来放松接受。各种测试情景的Emprical结果确认Spec-VLA框架的有效性,旨在加速VLA模型的实施速度,并进一步分析我们提议的SLA(SBA)在44-L)基准期的接受率框架中提高我们提议的SBA的成功率的影响。
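
The relaxed acceptance idea in the abstract above can be sketched as a distance test between drafted and verified action tokens. This is a minimal illustration under assumptions: it treats action tokens as integer bin indices whose numeric distance reflects how close the actions are, and the threshold, token values, and function name are hypothetical rather than taken from Spec-VLA.

```python
def accepted_prefix_length(draft_tokens, target_tokens, max_distance=0):
    """Count how many leading draft tokens are accepted.

    A draft token is accepted when its bin index lies within `max_distance`
    of the token the verifier model would have produced greedily.
    max_distance=0 reduces to exact-match acceptance.
    """
    accepted = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if abs(draft - target) > max_distance:
            break  # the first rejection ends the accepted prefix
        accepted += 1
    return accepted

draft = [120, 87, 201, 54]    # hypothetical drafted action-token bins
target = [120, 88, 199, 60]   # hypothetical greedy tokens from the verifier

strict = accepted_prefix_length(draft, target)                   # exact match
relaxed = accepted_prefix_length(draft, target, max_distance=2)  # relaxed
```

With exact matching only the first token survives, while a tolerance of two bins accepts three, which is the kind of longer acceptance length that translates into speedup.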

Article 280

Title@2025-07-30 (3): Neural Networks as Universal Finite-State Machines: A Constructive ReLU Simulation Framework for NFAs

Title: Neural Networks as Universal Finite-State Machines: A Constructive ReLU Simulation Framework for NFAs

Neurale Netzwerke als universelle Finite-State-Maschinen: Ein konstruktives ReLU-Simulations-Framework für NFAs

神经网络作为普遍有限国家机器:非官方FAS的建设性再LU模拟框架 2505.24110v2

Authors (1): Sahil Rajesh Dhayalkar

We present a formal and constructive simulation framework for nondeterministic finite automata (NFAs) using standard feedforward ReLU neural networks. Unlike prior approaches that rely on recurrent architectures or post hoc extraction methods, our formulation symbolically encodes automaton states as binary vectors, transitions as sparse linear transformations, and nondeterministic branching - including $\epsilon$-closures - as compositions of shared ReLU layers. We prove that every regular language can be recognized exactly by a depth-unrolled ReLU network with shared parameters, independent of input length. Our construction yields not only formal equivalence between NFAs and ReLU networks, but also practical trainability: we demonstrate that the networks can learn NFA acceptance behavior through gradient descent using standard supervised data. Extensive experiments validate all theoretical results, achieving perfect or near-perfect agreement on acceptance, state propagation, and closure dynamics. This work establishes a new bridge between symbolic automata theory and modern neural architectures, showing that feedforward networks can perform precise, interpretable, and trainable symbolic computation.

我们用标准进料前ReLU神经网络为非确定性有限自动网(NFAs)提供了一个正式和建设性的模拟框架。与以前依赖经常性结构或后临时提取方法的方法不同,我们用象征性的编码自动成像状态作为二进制矢量,作为细线变换的过渡,以及作为共同的ReLU层构成的非确定性分支(包括 ipsilon}-locures)。我们证明,每个常规语言都可以完全被一个具有共享参数且不依赖输入长度的深度无源ReLU网络所识别。我们的建筑不仅产生非常规式的NFAs和ReLU网络之间的等同形式,而且具有实际的可训练性:我们证明这些网络可以通过标准监督的数据通过梯度下降来学习NFA的接受行为。广泛的实验验证所有理论结果,在接受、状态传播和封闭动态方面达成完美或接近效果的协议。这项工作在象征性的自动数据理论和现代神经结构之间建立了新的桥梁,显示FFFedforward网络能够进行精确、可解释和可训练的象征性计算。
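
The encoding described in the abstract above (states as binary vectors, transitions as linear maps, ReLU layers shared across time steps) can be sketched on a small hand-built automaton. This is an illustrative example, not the paper's construction: the three-state NFA for strings ending in "ab" is hypothetical, and $\epsilon$-closures are omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def clip01(x):
    """min(x, 1) for x >= 0, written with two ReLUs."""
    return relu(x) - relu(x - 1.0)

# States: q0 (start), q1 (just read 'a'), q2 (just read "ab", accepting).
# T[c][j, i] = 1 when the NFA can move from state i to state j on symbol c.
T = {
    "a": np.array([[1, 0, 0],
                   [1, 0, 0],
                   [0, 0, 0]], dtype=float),
    "b": np.array([[1, 0, 0],
                   [0, 0, 0],
                   [0, 1, 0]], dtype=float),
}
START = np.array([1.0, 0.0, 0.0])
ACCEPT = np.array([0.0, 0.0, 1.0])

def accepts(word):
    """Unroll one shared linear-plus-ReLU layer per input symbol."""
    state = START
    for symbol in word:
        state = clip01(T[symbol] @ state)  # set of currently active states
    return bool(ACCEPT @ state > 0.5)
```

The clipping keeps the state vector binary even when several active states feed the same successor, so the vector always represents a set of states, as in the subset construction.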

Article 281

Title@2025-07-30 (3): SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

Title: SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

SmallThinker: Eine Familie von effizienten großen Sprachmodellen Natively Trained for Local Deployment

小规模:一个由本地培训的高效大语言模式组成的家庭,供当地部署使用 2507.20984v2

Authors (14): Yixin Song, Zhenliang Xue, Dongliang Wei, Feiyang Chen, Jianxiang Gao, Junchen Liu, Hangyu Liang, Guangshuo Qin, Chengrong Tian, Bo Wen, Longyu Zhao, Xinrui Zheng, Zeyu Mi, Haibo Chen

While frontier large language models (LLMs) continue to push capability boundaries, their deployment remains confined to GPU-powered cloud infrastructure. We challenge this paradigm with SmallThinker, a family of LLMs natively designed - not adapted - for the unique constraints of local devices: weak computational power, limited memory, and slow storage. Unlike traditional approaches that mainly compress existing models built for clouds, we architect SmallThinker from the ground up to thrive within these limitations. Our innovation lies in a deployment-aware architecture that transforms constraints into design principles. First, we introduce a two-level sparse structure combining fine-grained Mixture-of-Experts (MoE) with sparse feed-forward networks, drastically reducing computational demands without sacrificing model capacity. Second, to conquer the I/O bottleneck of slow storage, we design a pre-attention router that enables our co-designed inference engine to prefetch expert parameters from storage while computing attention, effectively hiding storage latency that would otherwise cripple on-device inference. Third, for memory efficiency, we use a NoPE-RoPE hybrid sparse attention mechanism to slash KV cache requirements. We release SmallThinker-4B-A0.6B and SmallThinker-21B-A3B, which achieve state-of-the-art performance scores and even outperform larger LLMs. Remarkably, our co-designed system mostly eliminates the need for expensive GPU hardware: with Q4_0 quantization, both models exceed 20 tokens/s on ordinary consumer CPUs, while consuming only 1GB and 8GB of memory respectively. SmallThinker is publicly available at hf.co/PowerInfer/SmallThinker-4BA0.6B-Instruct and hf.co/PowerInfer/SmallThinker-21BA3B-Instruct.

虽然前沿大语言模型(LLM)不断拓展能力边界,但其部署仍局限于由 GPU 驱动的云基础设施。我们以 SmallThinker 挑战这一范式:这是一个原生为本地设备的独特约束(算力弱、内存有限、存储慢)而设计、而非事后适配的 LLM 系列。与主要压缩为云端构建的现有模型的传统方法不同,我们从零开始设计 SmallThinker,使其在这些限制下依然表现出色。我们的创新在于一种部署感知的架构,它将约束转化为设计原则。第一,我们引入一种两级稀疏结构,将细粒度的混合专家(MoE)与稀疏前馈网络相结合,在不牺牲模型容量的前提下大幅降低计算需求。第二,为克服慢速存储的 I/O 瓶颈,我们设计了一个前置于注意力的路由器,使我们协同设计的推理引擎能够在计算注意力的同时从存储中预取专家参数,从而有效隐藏原本会严重拖累端侧推理的存储延迟。第三,为提高内存效率,我们采用 NoPE-RoPE 混合稀疏注意力机制来大幅降低 KV 缓存需求。我们发布了 SmallThinker-4B-A0.6B 和 SmallThinker-21B-A3B,它们取得了最先进的性能得分,甚至超过了更大的 LLM。值得注意的是,我们协同设计的系统基本消除了对昂贵 GPU 硬件的需求:在 Q4_0 量化下,两个模型在普通消费级 CPU 上都超过 20 tokens/s,而内存占用分别仅为 1GB 和 8GB。SmallThinker 已在 hf.co/PowerInfer/SmallThinker-4BA0.6B-Instruct 和 hf.co/PowerInfer/SmallThinker-21BA3B-Instruct 公开发布。

Article 282

Title@2025-07-30 (3): Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis

Title: Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis

Föderierte, distributiv robuste Optimierung mit nicht konvexen Zielen: Algorithmen und Analysen

与非Convex目标优化的联邦分布强度优化:等级和分析 2307.14364v2

Authors (3): Yang Jiao, Kai Yang, Dongjin Song

Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst-case cost over an ambiguity set of probability distributions, has been widely applied in diverse applications, e.g., network behavior analysis and risk management. However, existing DRO techniques face three key challenges: 1) how to deal with asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) algorithm with the itErative Active SEt method (EASE), to tackle the federated distributionally robust optimization (FDRO) problem. Furthermore, a new uncertainty set, i.e., the constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis shows that the proposed algorithm is guaranteed to converge, and we also analyze its iteration complexity. Extensive empirical studies on real-world datasets demonstrate that the proposed method not only achieves fast convergence and remains robust against data heterogeneity and malicious attacks, but can also trade off robustness against performance.

最佳分配优化(DRO)旨在找到最佳决定,最大限度地减少概率分布的模糊性差幅的最大案例成本,在多种应用中广泛应用,例如网络行为分析、风险管理等。然而,现有的DRO技术面临三大挑战:(1) 如何在分布式环境中应对分布式环境中的无同步更新;(2) 如何有效利用先前分配;(3) 如何根据不同情景适当调整稳健度。为此,我们提议采用一个非同步分布式算法,名为Asyncronous SlooP ExplantiveIve gRadient proEction(ASPIRI)(ASPIRI)(ASPIRI)(ASPIR)(ASPIR)(ASPIR)(ASER)(A)(ASPER)(ARE)(AST)(AS)(AST)算法),以解决联邦化分布式分布式强力优化(FDRO)问题。此外,正在开发一套新的不确定性,即受限制的D-norm不确定性集,以有效地利用先前的分布和灵活控制强健度。最后,我们的理论分析表明,拟议的算法保证了趋同它相趋同的精确性复杂性,并且也只能用快速的数据分析。

Article 283

Title@2025-07-30 (3): FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization

Title: FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization

FedCVD++: Kommunikationseffizientes Federated Learning für kardiovaskuläre Risikovorhersage mit parametrischer und nicht parametrischer Modelloptimierung

FedCVD++: 具有参数和非参数模型优化的心血管风险预测通信-高效联邦学习 2507.22963v1

Authors (8): Abdelrhman Gaber, Hassan Abd-Eltawab, John Elgallab, Youssif Abuzied, Dineo Mpanya, Turgay Celik, Swarun Kumar, Tamer ElBatt

Cardiovascular diseases (CVD) cause over 17 million deaths annually worldwide, highlighting the urgent need for privacy-preserving predictive systems. We introduce FedCVD++, an enhanced federated learning (FL) framework that integrates both parametric models (logistic regression, SVM, neural networks) and non-parametric models (Random Forest, XGBoost) for coronary heart disease risk prediction. To address key FL challenges, we propose: (1) tree-subset sampling that reduces Random Forest communication overhead by 70%, (2) XGBoost-based feature extraction enabling lightweight federated ensembles, and (3) federated SMOTE synchronization for resolving cross-institutional class imbalance. Evaluated on the Framingham dataset (4,238 records), FedCVD++ achieves state-of-the-art results: federated XGBoost (F1 = 0.80) surpasses its centralized counterpart (F1 = 0.78), and federated Random Forest (F1 = 0.81) matches non-federated performance. Additionally, our communication-efficient strategies reduce bandwidth consumption by 3.2X while preserving 95% accuracy. Compared to existing FL frameworks, FedCVD++ delivers up to 15% higher F1-scores and superior scalability for multi-institutional deployment. This work represents the first practical integration of non-parametric models into federated healthcare systems, providing a privacy-preserving solution validated under real-world clinical constraints.

心血管疾病(CVD)每年在全球造成超过 1700 万人死亡,凸显了对隐私保护预测系统的迫切需求。我们提出 FedCVD++,一个增强的联邦学习(FL)框架,它同时整合参数模型(逻辑回归、SVM、神经网络)和非参数模型(随机森林、XGBoost),用于冠心病风险预测。为应对 FL 的关键挑战,我们提出:(1) 树子集采样,将随机森林的通信开销降低 70%;(2) 基于 XGBoost 的特征提取,支持轻量级联邦集成;(3) 联邦 SMOTE 同步,用于解决跨机构的类别不平衡。在 Framingham 数据集(4,238 条记录)上的评估表明,FedCVD++ 取得了最先进的结果:联邦 XGBoost(F1 = 0.80)超过其集中式版本(F1 = 0.78),联邦随机森林(F1 = 0.81)与非联邦性能持平。此外,我们的通信高效策略在保持 95% 准确率的同时将带宽消耗降低 3.2 倍。与现有 FL 框架相比,FedCVD++ 的 F1 分数最多高出 15%,并在多机构部署中具有更好的可扩展性。这项工作首次将非参数模型实际整合到联邦医疗系统中,提供了一个在真实临床约束下得到验证的隐私保护方案。

Article 284

Title@2025-07-30 (3): MINR: Implicit Neural Representations with Masked Image Modelling

Title: MINR: Implicit Neural Representations with Masked Image Modelling

MINR: Implizite Neuraldarstellungen mit maskierter Bildmodellierung

MINR:带有蒙面图像建模的隐性神经图示 2507.22404v1

Authors (3): Sua Lee, Joonhun Lee, Myungjoo Kang

Self-supervised learning methods like masked autoencoders (MAE) have shown significant promise in learning robust feature representations, particularly in image reconstruction-based pretraining tasks. However, their performance is often strongly dependent on the masking strategies used during training and can degrade when applied to out-of-distribution data. To address these limitations, we introduce the masked implicit neural representations (MINR) framework, which combines implicit neural representations with masked image modeling. MINR learns a continuous function to represent images, enabling more robust and generalizable reconstructions irrespective of masking strategies. Our experiments demonstrate that MINR outperforms MAE not only in in-domain scenarios but also in out-of-distribution settings, while reducing model complexity. The versatility of MINR extends to various self-supervised learning applications, confirming its utility as a robust and efficient alternative to existing frameworks.

自我监督的学习方法,如蒙面自动校正仪(MAE),在学习强健的特征表现方面显示出很大的希望,特别是在基于图像的重建培训前任务方面。然而,它们的绩效往往在很大程度上取决于培训期间使用的遮罩策略,在应用到分配外数据时可以降解。为解决这些局限性,我们引入了隐蔽的隐含神经表现(MINR)框架,将隐含的神经表现与蒙面的图像建模协同起来。MINR学会了代表图像的连续功能,使得无论采用遮罩策略,都能够进行更强有力和可推广的重建。我们的实验表明,MINR不仅在常规情景中超越了MAE,而且在分配外的设置中也超越了MAE,同时降低了模型的复杂性。MIR的多功能延伸到了各种自我监督的学习应用,证实了其作为现有框架的强有力和高效替代工具的效用。

Article 285

Title@2025-07-30 (3): OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

Title: OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

OpenEarthSensing: Großformatiger, feinkörniger Benchmark für Open-World Remote Sensing

开放地球传感器:开放世界遥感大型精细基准 2502.20668v2

Authors (10): Xiang Xiang, Zhuo Xu, Yao Deng, Qinhao Zhou, Yifan Liang, Ke Chen, Qingfang Zheng, Yaowei Wang, Xilin Chen, Wen Gao

The advancement of remote sensing, including satellite systems, facilitates the continuous acquisition of remote sensing imagery globally, introducing novel challenges for achieving open-world tasks. Deployed models need to continuously adjust to a constant influx of new data, which frequently exhibits diverse shifts from the data encountered during the training phase. To effectively handle the new data, models are required to detect semantic shifts, adapt to covariate shifts, and continuously update their parameters without forgetting learned knowledge, as has been considered in works on a variety of open-world tasks. However, existing studies are typically conducted within a single dataset to simulate realistic conditions, with a lack of large-scale benchmarks capable of evaluating multiple open-world tasks. In this paper, we introduce OpenEarthSensing (OES), a large-scale fine-grained benchmark for open-world remote sensing. OES includes 189 scene and object categories, covering the vast majority of potential semantic shifts that may occur in the real world. Additionally, to provide a more comprehensive testbed for evaluating the generalization performance, OES encompasses five data domains with significant covariate shifts, including two RGB satellite domains, one RGB aerial domain, one multispectral RGB domain, and one infrared domain. We evaluate the baselines and existing methods for diverse tasks on OES, demonstrating that it serves as a meaningful and challenging benchmark for open-world remote sensing. The proposed dataset OES is available at https://haiv-lab.github.io/OES.

遥感的进步,包括卫星系统,促进了全球遥感图像的持续获取,为完成开放世界的任务带来了新的挑战。部署模型需要不断调整以适应不断涌现的新数据,这些新数据经常显示与培训阶段所见数据的不同变化。为了有效地处理新数据,需要模型来检测语义变化,适应千变万化,并不断更新其参数而不忘记知识,正如在各种开放世界任务工作中所考虑的那样,但是,现有的研究通常在一个单一数据集内进行,以模拟现实条件,缺乏能够评估多种开放世界任务的大规模基准。在本文件中,我们引入了与开放世界遥感有关的大规模微缩基准,即开放世界遥感的精细度基准。OES包括189个场和对象类别,涵盖在现实世界可能发生的绝大多数潜在的语义变化。此外,为评估一般化绩效提供一个更全面的测试台,OES涵盖五个数据域,包括两个 RGB 红外光谱卫星域,一个RGB OS 在线域域域域,一个RGB 和一个在线域域域,一个RGB 用于展示一个有意义的数据基准。

Article 286

Title@2025-07-30 (3): Gems: Group Emotion Profiling Through Multimodal Situational Understanding

Title: Gems: Group Emotion Profiling Through Multimodal Situational Understanding

Edelsteine: Gruppen-Emotion Profiling durch multimodale Situation verstehen

Gems:通过多模式情况理解来分析群体情感 2507.22393v1

Authors (5): Anubhav Kataria, Surbhi Madan, Shreya Ghosh, Tom Gedeon, Abhinav Dhall

Understanding individual, group and event level emotions along with contextual information is crucial for analyzing a multi-person social situation. To achieve this, we frame emotion comprehension as the task of predicting emotion from the fine-grained individual level to the coarse-grained group and event levels. We introduce GEMS, which leverages a multimodal Swin-Transformer and S3Attention-based architecture that processes an input scene, group members, and context information to generate joint predictions. Existing multi-person emotion related benchmarks mainly focus on atomic interactions primarily based on emotion perception over time and group level. To this end, we extend and propose VGAF-GEMS to provide more fine-grained and holistic analysis on top of the existing group level annotation of the VGAF dataset. GEMS aims to predict basic discrete and continuous emotions (including valence and arousal) as well as individual, group and event level perceived emotions. Our benchmarking effort links individual, group and situational emotional responses holistically. The quantitative and qualitative comparisons with adapted state-of-the-art models demonstrate the effectiveness of the GEMS framework on the VGAF-GEMS benchmark. We believe that it will pave the way for further research. The code and data are available at: https://github.com/katariaak579/GEMS

理解个人、群体和事件情绪以及背景信息对于分析多人社会状况至关重要。为此,我们将情感理解作为预测细微个人情感的任务,以预测粗粗的粒子群体和事件情绪;我们引入以GEMS为杠杆的基于多式双向传递和S3注意的架构,该架构处理输入场景、小组成员和背景信息,以产生联合预测。现有的多人情感相关基准主要侧重于原子互动,主要基于时间和群体层面的情感感觉。为此,我们扩大并提议VGAF-GEMS, 在现有群体级别上对VGAF-GEMS数据库进行更精细的粒和全面分析。GEMS旨在预测基本的离散和连续的情绪(包括价值和振奋)以及个人、群体和事件层面的感知情绪。我们的基准工作将个人、群体和情境情感反应整体联系起来。与经调整的状态-艺术模型的定量和定性比较展示了GEMS框架对VGAF-GEMS基准的实效。我们认为,它将进一步铺平研究方式。

Article 287

Title@2025-07-30 (3): Outcome-based Reinforcement Learning to Predict the Future

Title: Outcome-based Reinforcement Learning to Predict the Future

Ergebnisbasiertes Reinforcement Learning zur Vorhersage der Zukunft

基于成果的强化学习,以预测未来 2505.17989v3

Authors (5): Benjamin Turtel, Danny Franklin, Kris Skotheim, Luke Hewitt, Philipp Schoenegger

Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models’ reasoning in domains such as coding and mathematics. Here, we apply RLVR methods to forecasting future real-world events, a challenging task for RL because the outcomes involved are very noisy (and delayed). Using a novel dataset of recent questions from a prediction market, together with relevant news headlines, we show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1, while greatly improving probabilistic calibration. The model’s performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10% across all questions in the test set. We detail and compare approaches used in training our model, including augmenting our training data with synthetic prediction questions, guardrails for learning stability, and median prediction sampling at inference time.

使用可变奖励强化学习(RLVR)是改进大语言模型在诸如编码和数学等领域的推理的有效方法。在这里,我们应用RLVR方法来预测未来真实世界事件——由于涉及非常吵(和延迟)的结果,RL是一项艰巨的任务。我们利用预测市场最新问题的新数据集和相关新闻头条头条,显示可以对一个紧凑(14B)推理模型进行培训,以匹配或超过像O1这样的前沿模型的预测准确性,同时大大改进概率校准。模型的性能也具有实际意义:在多边市场交易模拟中,我们估计其赌注在测试组所有问题中将产生超过10%的投资回报。我们详细比较了用于培训模型的方法,包括用合成预测问题、学习稳定性的防护装置和推断时的中位预测抽样来增加我们的培训数据。
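
The abstract mentions median prediction sampling at inference time and improved probabilistic calibration. The following is a minimal sketch of both ideas, assuming that several probability estimates are sampled per question and collapsed with a median, and that forecast quality is scored with the Brier score. The Brier score is a common choice for probability forecasts, but the abstract does not say which metric the authors use. The function names and all numbers are hypothetical.

```python
from statistics import median

def aggregate_forecast(samples):
    """Collapse several sampled probabilities for one question into one forecast."""
    clipped = [min(max(p, 0.0), 1.0) for p in samples]  # keep values in [0, 1]
    return median(clipped)

def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical sampled answers for two questions (values invented for illustration).
samples_q1 = [0.70, 0.65, 0.95, 0.72, 0.68]  # one outlier at 0.95
samples_q2 = [0.20, 0.25, 0.10, 0.22, 0.60]  # one outlier at 0.60

forecasts = [aggregate_forecast(samples_q1), aggregate_forecast(samples_q2)]
outcomes = [1, 0]  # question 1 resolved "yes", question 2 resolved "no"
score = brier_score(forecasts, outcomes)
print(forecasts, round(score, 4))
```

The median is used here because a single extreme sample barely moves it, which is one plausible reason to prefer it over the mean when outcomes are noisy.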

Article 288

Title@2025-07-30 (3): PATENTWRITER: A Benchmarking Study for Patent Drafting with LLMs

Title: PATENTWRITER: A Benchmarking Study for Patent Drafting with LLMs

PATENTWRITER: Eine Benchmarking-Studie für die Patenterstellung mit LLMs

PATENTWRITER: 专利起草基准研究与LLMs 2507.22387v1

Authors (3): Homaira Huda Shomee, Suman Kalyan Maity, Sourav Medya

Large language models (LLMs) have emerged as transformative approaches in several important fields. This paper aims for a paradigm shift for patent writing by leveraging LLMs to overcome the tedious patent-filing process. In this work, we present PATENTWRITER, the first unified benchmarking framework for evaluating LLMs in patent abstract generation. Given the first claim of a patent, we evaluate six leading LLMs – including GPT-4 and LLaMA-3 – under a consistent setup spanning zero-shot, few-shot, and chain-of-thought prompting strategies to generate the abstract of the patent. Our benchmark PATENTWRITER goes beyond surface-level evaluation: we systematically assess the output quality using a comprehensive suite of metrics – standard NLP measures (e.g., BLEU, ROUGE, BERTScore), robustness under three types of input perturbations, and applicability in two downstream patent classification and retrieval tasks. We also conduct stylistic analysis to assess length, readability, and tone. Experimental results show that modern LLMs can generate high-fidelity and stylistically appropriate patent abstracts, often surpassing domain-specific baselines. Our code and dataset are open-sourced to support reproducibility and future research.

大型语言模型(LLM)已在若干重要领域成为变革性的方法。本文旨在利用LLM克服繁琐的专利申请流程,推动专利撰写范式的转变。在这项工作中,我们提出PATENTWRITER,这是首个用于评估LLM专利摘要生成能力的统一基准框架。给定一项专利的第一项权利要求,我们在涵盖零样本、少样本和思维链提示策略的一致设置下,评估包括GPT-4和LLaMA-3在内的六个主流LLM生成专利摘要的能力。我们的基准PATENTWRITER超越了表层评估:我们使用一套全面的指标系统地评估输出质量,包括标准NLP指标(例如BLEU、ROUGE、BERTScore)、三类输入扰动下的鲁棒性,以及在专利分类和检索这两项下游任务中的适用性。我们还进行了文体分析,以评估长度、可读性和语气。实验结果表明,现代LLM能够生成高保真且文体恰当的专利摘要,往往超过特定领域的基线。我们的代码和数据集已开源,以支持可复现性和未来研究。

Article 289

Title@2025-07-30 (3): Multi-Hazard Early Warning Systems for Agriculture with Featural-Temporal Explanations

Title: Multi-Hazard Early Warning Systems for Agriculture with Featural-Temporal Explanations

Multi-Hazard-Frühwarnsysteme für die Landwirtschaft mit merkmals- und zeitbezogenen Erklärungen

多危险农业预警系统及时/时解释 2507.22962v1

Authors (2): Boyuan Zheng, Victor W. Chu

Climate extremes present escalating risks to agriculture, intensifying the need for reliable multi-hazard early warning systems (EWS). The situation is evolving due to climate change, and hence such systems should have the intelligence to continue learning from recent climate behaviours. However, traditional single-hazard forecasting methods fall short in capturing complex interactions among concurrent climatic events. To address this deficiency, we combine sequential deep learning models and advanced Explainable Artificial Intelligence (XAI) techniques to introduce a multi-hazard forecasting framework for agriculture. In our experiments, we use meteorological data from four prominent agricultural regions in the United States (between 2010 and 2023) to validate the predictive accuracy of our framework on multiple severe event types, namely extreme cold, floods, frost, hail, heatwaves, and heavy rainfall, with tailored models for each area. The framework uniquely integrates attention mechanisms with TimeSHAP (a recurrent XAI explainer for time series) to provide comprehensive temporal explanations, revealing not only which climatic features are influential but precisely when their impacts occur. Our results demonstrate strong predictive accuracy, particularly with the BiLSTM architecture, and highlight the system’s capacity to inform nuanced, proactive risk management strategies. This research significantly advances the explainability and applicability of multi-hazard EWS, fostering interdisciplinary trust and effective decision-making processes for climate risk management in the agricultural industry.

气候极端现象加剧了农业对可靠多危害预警系统的需求。气候变化使情况不断演变,因此,这些系统应具有继续从最近的气候行为中学习的智慧。然而,传统的单一灾害预测方法不足以捕捉同时发生的气候事件之间的复杂互动。为了解决这一缺陷,我们在本文件中将连续的深层次学习模型和先进的可解释人工智能(XAI)技术结合起来,以引入一个农业多危害预报框架。在我们的实验中,我们利用美国四个主要农业区域(2010年至2023年)的气象数据来验证我们关于多种严重事件类型的框架的预测准确性,这些事件类型是极端寒冷、洪水、冰霜、冰雹、热浪和大降雨量,并配有针对每个地区的专门模型。这一框架将关注机制与TimeSHAP(一个经常性的XAI解释器,用于时间序列)相结合,以提供全面的时间解释,不仅揭示哪些气候特征具有影响力,而且准确显示其影响发生时。我们的成果显示了强烈的预测性准确性,特别是在BILSTM结构中,并突出系统的能力,用以为多度风险管理进展、积极主动的多风险管理的系统解释。

Article 290

Title@2025-07-30 (3): OWLViz: An Open-World Benchmark for Visual Question Answering

Title: OWLViz: An Open-World Benchmark for Visual Question Answering

OWLViz: Ein Open-World-Benchmark für visuelle Fragen

OWLViz:视觉问答的开放世界基准 2503.07631v3

Authors (6): Thuy Nguyen, Dang Nguyen, Hoang Nguyen, Thuan Luong, Long Hoang Dang, Viet Dac Lai

We present a challenging benchmark for the Open WorLd VISual question answering (OWLViz) task. OWLViz presents concise, unambiguous queries that require integrating multiple capabilities, including visual understanding, web exploration, and specialized tool usage. While humans achieve 69.2% accuracy on these intuitive tasks, even state-of-the-art VLMs struggle, with the best model, Gemini 2.0, achieving only 26.6% accuracy. Current agentic VLMs, which rely on limited vision and vision-language models as tools, perform even worse. This performance gap reveals significant limitations in multimodal systems’ ability to select appropriate tools and execute complex reasoning sequences, establishing new directions for advancing practical AI research.

我们为Open WorLd Visual 答题(OWLViz)的任务提出了一个具有挑战性的基准。 OWLViz 给出了简明、明确的询问,要求整合多种能力,包括视觉理解、网络探索和专门工具使用。虽然人类在这些直观任务上实现了69.2%的准确性,但即使是最先进的VLM,其最佳模型是Gemini 2.0, 其准确性仅为26.6%。目前依赖有限的愿景和愿景语言模型作为工具的VLMs,其表现更差。这一绩效差距揭示了多式联运系统在选择适当工具和执行复杂推理序列、为推进实际的AI研究确定新方向方面的巨大局限性。

Article 291

Title@2025-07-30 (3): Set Invariance with Probability One for Controlled Diffusion: Score-based Approach

Title: Set Invariance with Probability One for Controlled Diffusion: Score-based Approach

Mengeninvarianz mit Wahrscheinlichkeit eins für kontrollierte Diffusion: Score-basierter Ansatz

设定控制下扩散的概率一的变量一:计分法 2507.22385v1

Authors (4): Wenqing Wang, Alexis M. H. Teter, Murat Arcak, Abhishek Halder

Given a controlled diffusion and a connected, bounded, Lipschitz set, when is it possible to guarantee controlled set invariance with probability one? In this work, we answer this question by deriving necessary and sufficient conditions in terms of gradients of certain log-likelihoods – a.k.a. score vector fields – for two cases: a given finite time horizon and an infinite time horizon. The deduced conditions comprise a score-based test that provably certifies or falsifies the existence of Markovian controllers for the given controlled set invariance problem data. Our results are constructive in the sense that when the problem data pass the proposed test, we characterize all controllers guaranteeing the desired set invariance. When the problem data fail the proposed test, there does not exist a controller that can accomplish the desired set invariance with probability one. The computation in the proposed tests involves solving certain Dirichlet boundary value problems and, in the finite horizon case, can also account for the additional constraint of hitting a target subset at the terminal time. We illustrate the results using several semi-analytical and numerical examples.

鉴于控制扩散和连接的、捆绑的利普西茨设置,当有可能保证受控的设置与概率的差错时,我们将回答这个问题。在这项工作中,我们从某些日志相似值的梯度 – – a.k.a.评分矢量字段 – – 的两个案例中,即给定的时间范围与无限的时间范围,得出了必要和充分的条件。推断的条件包括一个基于分数的测试,该测试可以可靠地验证或伪造给给定的受控变量问题数据设置的Markovian控制器的存在。在问题数据通过拟议测试时,我们的结果具有建设性。我们用所有控制器的特性来保证所期望的不变值。当问题数据失败时,没有一个控制器能够用概率一达到所期望的差数。在拟议测试中进行的计算涉及解决某些德里赫特边界值问题的计算,在有限地平线的情况下,也可以说明在终端时间打击目标子组的额外限制。我们用几个半分析和数字例子来说明结果。

Article 292

Title@2025-07-30 (3): MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

Title: MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

MAVFlow: Erhalt paralinguistischer Elemente mit Conditional Flow Matching für mehrsprachige Zero-Shot-AV2AV-Übersetzung

MAVFlow: 将语言要素与条件流动相匹配的配方元素保留在零点热AV2AV多语种翻译中 2503.11026v2

Authors (4): Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim, Se-Young Yun

Despite recent advances in text-to-speech (TTS) models, audio-visual-to-audio-visual (AV2AV) translation still faces a critical challenge: maintaining speaker consistency between the original and translated vocal and facial features. To address this issue, we propose a conditional flow matching (CFM) zero-shot audio-visual renderer that utilizes strong dual guidance from both audio and visual modalities. By leveraging multimodal guidance with CFM, our model robustly preserves speaker-specific characteristics and enhances zero-shot AV2AV translation abilities. For the audio modality, we enhance the CFM process by integrating robust speaker embeddings with x-vectors, which serve to bolster speaker consistency. Additionally, we convey emotional nuances to the face rendering module. The guidance provided by both audio and visual cues remains independent of semantic or linguistic content, allowing our renderer to effectively handle zero-shot translation tasks for monolingual speakers in different languages. We empirically demonstrate that the inclusion of high-quality mel-spectrograms conditioned on facial information not only enhances the quality of the synthesized speech but also positively influences facial generation, leading to overall performance improvements in LSE and FID score. Our code is available at https://github.com/Peter-SungwooCho/MAVFlow.

尽管文本到语音(TTS)模型近来取得了进展,视听到视听(AV2AV)翻译仍面临一项关键挑战:在原始与翻译后的声音和面部特征之间保持说话人一致性。为解决这一问题,我们提出一种基于条件流匹配(CFM)的零样本视听渲染器,它利用来自音频和视觉两种模态的强双重引导。通过将多模态引导与CFM相结合,我们的模型能够稳健地保留说话人特有的特征,并增强零样本AV2AV翻译能力。在音频模态方面,我们将稳健的说话人嵌入与x-vector相结合来增强CFM过程,从而提高说话人一致性。此外,我们还将情感细节传递给面部渲染模块。音频和视觉线索提供的引导独立于语义或语言内容,使我们的渲染器能够有效处理不同语言的单语说话人的零样本翻译任务。我们通过实验表明,引入以面部信息为条件的高质量梅尔频谱图不仅提高了合成语音的质量,还对面部生成产生积极影响,从而使LSE和FID分数得到整体提升。我们的代码见 https://github.com/Peter-SungwooCho/MAVFlow。
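
For readers unfamiliar with conditional flow matching, the sketch below shows the training target under one common formulation: a straight-line path from a noise sample to a data sample, with the constant difference between them as the velocity target. The abstract does not state which path or loss MAVFlow uses, so this choice is an assumption. `predict_velocity` and `condition` are hypothetical stand-ins for the network and for the audio and visual guidance.

```python
def cfm_training_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1, plus the velocity target."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]  # velocity of the straight path
    return x_t, target

def cfm_loss(predict_velocity, x0, x1, t, condition):
    """Mean squared error between predicted and target velocity at time t."""
    x_t, target = cfm_training_pair(x0, x1, t)
    pred = predict_velocity(x_t, t, condition)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)

# Toy check with invented numbers; no real model or audio features are involved.
noise = [0.0, 1.0, -1.0]
data = [1.0, 3.0, 0.0]
speaker = [0.5, 0.5]  # hypothetical conditioning vector

def oracle(x_t, t, cond):
    """A predictor that already knows the true velocity."""
    return [b - a for a, b in zip(noise, data)]

def zero_model(x_t, t, cond):
    """A predictor that always outputs zero velocity."""
    return [0.0] * len(x_t)

loss_oracle = cfm_loss(oracle, noise, data, 0.3, speaker)
loss_zero = cfm_loss(zero_model, noise, data, 0.3, speaker)
print(loss_oracle, loss_zero)
```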

Article 293

Title@2025-07-30 (3): Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations

Title: Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations

Verbesserung der Verallgemeinerung Fähigkeit des Roboterimitationslernens durch Lösung von Kausalverwirrung in Beobachtungen

通过解决观测中的原因融合,提高机器人模拟学习的普遍化能力 2507.22380v1

Authors (5): Yifei Chen, Yuzhe Zhang, Giovanni D’urso, Nicholas Lawrance, Brendan Tidd

Recent developments in imitation learning have considerably advanced robotic manipulation. However, current techniques in imitation learning can suffer from poor generalization, limiting performance even under relatively minor domain shifts. In this work, we aim to enhance the generalization capabilities of complex imitation learning algorithms to handle unpredictable changes from the training environments to deployment environments. To avoid confusion caused by observations that are not relevant to the target task, we propose to explicitly learn the causal relationship between observation components and expert actions, employing a framework similar to [6], where a causal structural function is learned by intervention on the imitation learning policy. Because disentangling the feature representation from image input, as required in [6], is hard to satisfy in the complex imitation learning processes of robotic manipulation, we theoretically clarify that this requirement is not necessary in causal relationship learning. Therefore, we propose a simple causal structure learning framework that can be easily embedded in recent imitation learning architectures, such as the Action Chunking Transformer [31]. We demonstrate our approach using a simulation of the ALOHA [31] bimanual robot arms in Mujoco, and show that the method can considerably mitigate the generalization problem of existing complex imitation learning algorithms.

模仿学习的最新进展极大地推动了机器人操作的发展。然而,当前的模仿学习技术可能存在泛化能力差的问题,即使在相对较小的领域偏移下性能也会受限。在这项工作中,我们的目标是提高复杂模仿学习算法的泛化能力,以应对从训练环境到部署环境的不可预测变化。为了避免与目标任务无关的观测所造成的混淆,我们提议显式学习观测成分与专家动作之间的因果关系,采用类似于[6]的框架,即通过对模仿学习策略进行干预来学习因果结构函数。[6]中要求从图像输入中解耦特征表示,这在机器人操作的复杂模仿学习过程中难以满足;我们从理论上阐明,这一要求在因果关系学习中并非必要。因此,我们提出一个简单的因果结构学习框架,它可以很容易地嵌入最近的模仿学习架构,例如Action Chunking Transformer [31]。我们在Mujoco中对ALOHA [31]双臂机器人进行仿真来验证我们的方法,并表明该方法可以显著缓解现有复杂模仿学习算法的泛化问题。

Article 294

Title@2025-07-30 (3): Year-over-Year Developments in Financial Fraud Detection via Deep Learning: A Systematic Literature Review

Title: Year-over-Year Developments in Financial Fraud Detection via Deep Learning: A Systematic Literature Review

Jährliche Entwicklungen bei der Erkennung von Finanzbetrug durch Deep Learning: Ein systematischer Literaturbericht

《通过深学习侦查金融欺诈:系统文学审查》年年发展动态 2502.00201v2

Authors (5): Yisong Chen, Chuqing Zhao, Yixin Xu, Chuanhao Nie, Yixin Zhang

This paper systematically reviews advancements in deep learning (DL) techniques for financial fraud detection, a critical issue in the financial sector. Using the Kitchenham systematic literature review approach, 57 studies published between 2019 and 2024 were analyzed. The review highlights the effectiveness of various deep learning models such as Convolutional Neural Networks, Long Short-Term Memory, and transformers across domains such as credit card transactions, insurance claims, and financial statement audits. Performance metrics such as precision, recall, F1-score, and AUC-ROC were evaluated. Key themes explored include the impact of data privacy frameworks and advancements in feature engineering and data preprocessing. The study emphasizes challenges such as imbalanced datasets, model interpretability, and ethical considerations, alongside opportunities for automation and privacy-preserving techniques such as blockchain integration and Principal Component Analysis. By examining trends over the past five years, this review identifies critical gaps and promising directions for advancing DL applications in financial fraud detection, offering actionable insights for researchers and practitioners.

该文件系统地审查了金融部门发现金融欺诈的深层学习(DL)技术的进展,这是金融部门的一个关键问题。利用基切汉姆系统文献审查方法,分析了2019年至2024年期间发表的57项研究。审查突出了各种深层学习模式的有效性,如革命神经网络、长期短期记忆以及信用卡交易、保险索赔和财务报表审计等领域的变压器。评价了精确性、召回、F1核心和AUC-ROC等业绩指标。所探讨的关键主题包括数据隐私框架的影响和特征工程和数据处理的进展。研究强调了不平衡的数据集、模型可解释性和道德考虑,以及自动化和隐私保护技术的机会,如块链整合和主要组成部分分析。通过审查过去五年的趋势,本审查确定了在金融欺诈侦查方面推进DL应用的关键差距和有希望的方向,为研究人员和从业人员提供了可操作的洞察力。
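
The review lists precision, recall and F1-score among its metrics and names imbalanced datasets as a challenge. The sketch below shows how those three metrics are computed and why accuracy alone can mislead on imbalanced fraud data. The labels are invented for illustration and do not come from any study covered by the review.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical imbalanced data: 95 legitimate transactions, 5 fraudulent ones.
y_true = [0] * 95 + [1] * 5
# The classifier raises 2 false alarms and catches 3 of the 5 frauds.
y_pred = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

In this toy case accuracy is 0.96 while precision, recall and F1 are all 0.6, which is why class-specific metrics are usually reported for fraud detection.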

Article 295

Title@2025-07-30 (3): Prediction of acoustic field in 1-D uniform duct with varying mean flow and temperature using neural networks

Title: Prediction of acoustic field in 1-D uniform duct with varying mean flow and temperature using neural networks

Vorhersage des akustischen Feldes im 1-D-Uniformkanal mit unterschiedlichem mittleren Durchfluss und Temperatur über neuronale Netze

使用神经网络以不同平均流量和温度的1D级统一管道声学场预测 2507.22370v1

Authors (2): D. Veerababu, Prasanta K. Ghosh

Neural networks constrained by physical laws have emerged as an alternative numerical tool. In this paper, the governing equation that represents the propagation of sound inside a one-dimensional duct carrying a heterogeneous medium is derived. The problem is converted into an unconstrained optimization problem and solved using neural networks. Both acoustic state variables, acoustic pressure and particle velocity, are predicted and validated against a traditional Runge-Kutta solver. The effect of the temperature gradient on the acoustic field is studied. The use of machine learning techniques such as transfer learning and automatic differentiation for acoustic applications is demonstrated.

受物理定律制约的神经网络作为一种替代数字工具出现。在本文中, 生成了代表带异质介质的单维导管内声音传播的主导方程式。问题被转化成一个不受限制的优化问题, 并通过神经网络解决。音效状态变量: 声压和粒子速度由传统的龙格- 库塔求解器预测和验证。温度梯度对声学领域的影响得到了研究。演示了对机器学习技术的利用, 如传导学习和声学应用的自动区分。

Article 296

Title@2025-07-30 (3): BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

Title: BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

BlockFFN: Auf dem Weg zur End-Side Acceleration-Friendly Mixture-of-Experts mit Chunk-Level-Aktivierung Sparsity

块块FFN: 向具有整块级激活分级的终端- 双极加速- 友好混合混合专家方向 2507.08771v2

Authors (8): Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Yuxuan Li, Zhiyuan Liu, Maosong Sun

To alleviate the computational burden of large language models (LLMs), architectures with activation sparsity, represented by mixture-of-experts (MoE), have attracted increasing attention. However, the non-differentiable and inflexible routing of vanilla MoE hurts model performance. Moreover, while each token activates only a few parameters, these sparsely-activated architectures exhibit low chunk-level sparsity, meaning that the union of multiple consecutive tokens activates a large ratio of parameters. Such a sparsity pattern is unfriendly to acceleration under low-resource conditions (e.g., end-side devices) and incompatible with mainstream acceleration techniques (e.g., speculative decoding). To address these challenges, we introduce a novel MoE architecture, BlockFFN, as well as its efficient training and deployment techniques. Specifically, we use a router integrating ReLU activation and RMSNorm for differentiable and flexible routing. Next, to promote both token-level sparsity (TLS) and chunk-level sparsity (CLS), we design CLS-aware training objectives, making BlockFFN more acceleration-friendly. Finally, we implement efficient acceleration kernels, combining activation sparsity and speculative decoding for the first time. The experimental results demonstrate the superior performance of BlockFFN over other MoE baselines, achieving over 80% TLS and 70% 8-token CLS. Our kernels achieve up to 3.67$\times$ speedup over dense models on real end-side devices. All code and checkpoints are publicly available (https://github.com/thunlp/BlockFFN).

为了减轻大型语言模型(LLM)的计算负担,以专家混合(MoE)为代表的激活稀疏架构受到越来越多的关注。然而,原始MoE不可微且不灵活的路由会损害模型性能。此外,虽然每个token只激活少量参数,但这些稀疏激活的架构表现出较低的块级稀疏度,即多个连续token的并集会激活很大比例的参数。这种稀疏模式不利于低资源条件(例如端侧设备)下的加速,也与主流加速技术(例如投机解码)不兼容。为应对这些挑战,我们提出一种新的MoE架构BlockFFN及其高效的训练和部署技术。具体而言,我们使用结合ReLU激活和RMSNorm的路由器,实现可微且灵活的路由。其次,为同时提升token级稀疏度(TLS)和块级稀疏度(CLS),我们设计了感知CLS的训练目标,使BlockFFN更便于加速。最后,我们实现了高效的加速内核,首次将激活稀疏与投机解码结合起来。实验结果表明,BlockFFN的性能优于其他MoE基线,实现了超过80%的TLS和70%的8-token CLS。在真实端侧设备上,我们的内核相对稠密模型最高可达3.67倍加速。所有代码和检查点均已公开(https://github.com/thunlp/BlockFFN)。
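
The difference between token-level and chunk-level sparsity can be made concrete with a small sketch. It assumes that each token activates a set of expert indices and that chunks are non-overlapping windows of consecutive tokens. The paper's exact definitions, for example whether windows slide, are not given in the abstract, so the function names and the routing data here are hypothetical.

```python
def token_level_sparsity(active, num_experts):
    """Average fraction of experts that a single token leaves inactive."""
    return sum(1 - len(a) / num_experts for a in active) / len(active)

def chunk_level_sparsity(active, num_experts, chunk):
    """Average fraction of experts left inactive by every token of a chunk."""
    values = []
    for start in range(0, len(active) - chunk + 1, chunk):
        union = set().union(*active[start:start + chunk])  # experts used anywhere in the chunk
        values.append(1 - len(union) / num_experts)
    return sum(values) / len(values)

# Hypothetical routing results: 8 experts, 4 tokens, 2 active experts per token.
scattered = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]  # consecutive tokens use different experts
clustered = [{0, 1}, {0, 1}, {1, 2}, {0, 2}]  # consecutive tokens reuse experts

for name, routing in (("scattered", scattered), ("clustered", clustered)):
    tls = token_level_sparsity(routing, 8)
    cls = chunk_level_sparsity(routing, 8, chunk=4)
    print(name, tls, cls)
```

Both routings have the same token-level sparsity (0.75), but only the clustered one keeps experts idle across the whole chunk (0.625 versus 0.0), which is the property the abstract says matters for end-side acceleration.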

Article 297

Title@2025-07-30 (3): ($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$)-Parametric Multi-Task Optimization: Joint Search in Solution and Infinite Task Spaces

Title: ($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$)-Parametric Multi-Task Optimization: Joint Search in Solution and Infinite Task Spaces

($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$)-Parametrische Multi-Task-Optimierung: Gemeinsame Suche in Lösungs- und unendlichen Aufgabenräumen

($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$)-参数化多任务优化:在解空间与无限任务空间中的联合搜索 2503.08394v3

Authors (5): Tingyang Wei, Jiao Liu, Abhishek Gupta, Puay Siew Tan, Yew-Soon Ong

Multi-task optimization is typically characterized by a fixed and finite set of tasks. The present paper relaxes this condition by considering a non-fixed and potentially infinite set of optimization tasks defined in a parameterized, continuous and bounded task space. We refer to this unique problem setting as parametric multi-task optimization (PMTO). Assuming the bounds of the task parameters to be ($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$), a novel ($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$)-PMTO algorithm is crafted to operate in two complementary modes. In an offline optimization mode, a joint search over solution and task spaces is carried out with the creation of two approximation models: (1) for mapping points in a unified solution space to the objective spaces of all tasks, which provably accelerates convergence by acting as a conduit for inter-task knowledge transfers, and (2) for probabilistically mapping tasks to their corresponding solutions, which facilitates evolutionary exploration of under-explored regions of the task space. In the online mode, the derived models enable direct optimization of any task within the bounds without the need to search from scratch. This outcome is validated on both synthetic test problems and practical case studies, with the significant real-world applicability of PMTO demonstrated for fast reconfiguration of robot controllers under changing task conditions. The potential of PMTO to vastly speed up the search for solutions to minimax optimization problems is also demonstrated through an example in robust engineering design.

多任务优化通常以固定且有限的任务集合为特征。本文放宽了这一条件,考虑在参数化、连续且有界的任务空间中定义的非固定且可能无限的优化任务集合。我们将这一独特的问题设定称为参数化多任务优化(PMTO)。假设任务参数的界为($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$),我们设计了一种新的($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$)-PMTO算法,可在两种互补模式下运行。在离线优化模式下,通过构建两个近似模型,在解空间和任务空间中进行联合搜索:(1)将统一解空间中的点映射到所有任务的目标空间,该模型作为任务间知识迁移的通道,可证明能够加速收敛;(2)以概率方式将任务映射到相应的解,从而便于对任务空间中探索不足的区域进行进化式探索。在在线模式下,所得模型可以直接优化界内的任何任务,而无需从头搜索。这一结果在合成测试问题和实际案例研究中都得到了验证,其中PMTO在任务条件变化下快速重构机器人控制器方面展示了显著的现实适用性。我们还通过一个稳健工程设计的例子,展示了PMTO大幅加快极小极大优化问题求解的潜力。

Article 298

Title@2025-07-30 (3): MSQ: Memory-Efficient Bit Sparsification Quantization

Title: MSQ: Memory-Efficient Bit Sparsification Quantization

MSQ: Speichereffiziente Bit Sparsifikation Quantisierung

MSQ: 内存效率比分分量化 2507.22349v1

Authors (7): Seokho Han, Seoyeon Yoon, Jinhee Kim, Dongwei Wang, Kang Eun Jeon, Huanrui Yang, Jong Hwan Ko

As deep neural networks (DNNs) see increased deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and accuracy compared to uniform quantization. However, finding the optimal precision for each layer is challenging. Recent studies utilizing bit-level sparsity have shown promise, yet they often introduce substantial training complexity and high GPU memory requirements. In this paper, we propose Memory-Efficient Bit Sparsification Quantization (MSQ), a novel approach that addresses these limitations. MSQ applies a round-clamp quantizer to enable differentiable computation of the least significant bits (LSBs) from model weights. It further employs regularization to induce sparsity in these LSBs, enabling effective precision reduction without explicit bit-level parameter splitting. Additionally, MSQ incorporates Hessian information, allowing the simultaneous pruning of multiple LSBs to further enhance training efficiency. Experimental results show that MSQ achieves up to 8.00x reduction in trainable parameters and up to 86% reduction in training time compared to previous bit-level quantization, while maintaining competitive accuracy and compression rates. This makes it a practical solution for training efficient DNNs on resource-constrained devices.

深度神经网络(DNNs)看到在移动和边缘设备上部署的人数增加,优化模型效率就变得至关重要。混合精度量化被广泛支持,因为它在效率与准确性之间与统一量化之间提供了更好的平衡。然而,找到每个层的最佳精确度是具有挑战性的。最近利用比特级宽度的研究显示了希望,但它们往往带来大量的培训复杂性和高GPU内存要求。在本文件中,我们提议一种解决这些限制的新办法,即优化模型效率。MSQ采用一个圆分级量化法,以便能够从模型重量中不同地计算出最小的位数(LSBs),因为它进一步采用正规化来吸引这些位数的偏狭度,使有效精确度降低而无需明确的比特级参数分裂。此外,MSQ纳入了赫西信息,允许同时对多个LSBs进行分类,以进一步提高培训效率。实验结果表明,MSQ在可培训参数方面达到8.00x的削减幅度,在前一季度培训中将达到86%的精确度,同时使DNS级培训达到一定的精确度。
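
To see why sparsity in the least significant bits (LSBs) permits a precision reduction, consider the sketch below. It is not the MSQ quantizer or its regularizer, whose details are not given in the abstract. It only illustrates the underlying fact that if every integer code in a layer has a zero LSB, the layer can be stored with one bit fewer by halving the codes and doubling the scale. The function names, weights and scale are hypothetical.

```python
def round_clamp(w, scale, bits):
    """Round a real weight to the nearest integer code, then clamp to the unsigned range."""
    code = round(w / scale)
    return max(0, min(2 ** bits - 1, code))

def drop_zero_lsb(codes, scale, bits):
    """If every code has a zero LSB, return an equivalent (bits - 1)-bit representation."""
    if all((code & 1) == 0 for code in codes):
        return [code >> 1 for code in codes], scale * 2, bits - 1
    return codes, scale, bits  # some LSB is set, so precision cannot be dropped losslessly

# Hypothetical layer whose 4-bit codes all happen to be even.
weights = [0.5, 1.5, 2.5, 3.5]
codes4 = [round_clamp(w, 0.25, 4) for w in weights]
codes3, scale3, bits3 = drop_zero_lsb(codes4, 0.25, 4)
restored = [code * scale3 for code in codes3]
print(codes4, codes3, bits3, restored)
```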

Article 299

Title@2025-07-30 (3): Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning

Title: Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning

Nutzung großer Sprachmodelle zum Lösen bengalischer Mathematik-Textaufgaben mit Chain-of-Thought-Reasoning

利用大语言模型解决孟加拉语数学字词与思维链理性的解决问题 2505.21354v2

Authors (5): Bidyarthi Paul, Jalisha Jashim Era, Mirazur Rahman Zim, Tahmid Sattar Aothoi, Faisal Muhammad Shah

Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP) due to the language’s low-resource status and the multi-step reasoning required. Existing models struggle with complex Bengali MWPs, largely because no human-annotated Bengali dataset has previously addressed this task. This gap has limited progress in Bengali mathematical reasoning. To address this, we created SOMADHAN, a dataset of 8792 complex Bengali MWPs with manually written, step-by-step solutions. We designed this dataset to support reasoning-focused evaluation and model development in a linguistically underrepresented context. Using SOMADHAN, we evaluated a range of large language models (LLMs) - including GPT-4o, GPT-3.5 Turbo, LLaMA series models, Deepseek, and Qwen - through both zero-shot and few-shot prompting with and without Chain of Thought (CoT) reasoning. CoT prompting consistently improved performance over standard prompting, especially in tasks requiring multi-step logic. LLaMA-3.3 70B achieved the highest accuracy of 88% with few-shot CoT prompting. We also applied Low-Rank Adaptation (LoRA) to fine-tune models efficiently, enabling them to adapt to Bengali MWPs with minimal computational cost. Our work fills a critical gap in Bengali NLP by providing a high-quality reasoning dataset and a scalable framework for solving complex MWPs. We aim to advance equitable research in low-resource languages and enhance reasoning capabilities in educational and language technologies.

由于孟加拉语资源匮乏且需要多步推理,求解孟加拉语数学应用题(MWP)仍是自然语言处理(NLP)中的一大挑战。现有模型难以处理复杂的孟加拉语MWP,这主要是因为此前没有人工标注的孟加拉语数据集针对这一任务。这一空白限制了孟加拉语数学推理的进展。为此,我们创建了SOMADHAN,这是一个包含8792道复杂孟加拉语MWP的数据集,并配有人工撰写的分步解答。我们设计该数据集,是为了在语言代表性不足的背景下支持以推理为重点的评估和模型开发。利用SOMADHAN,我们通过零样本和少样本提示,在使用和不使用思维链(CoT)推理的情况下,评估了一系列大型语言模型(LLM),包括GPT-4o、GPT-3.5 Turbo、LLaMA系列模型、Deepseek和Qwen。与标准提示相比,CoT提示持续提升了性能,在需要多步逻辑的任务中尤为明显。LLaMA-3.3 70B在少样本CoT提示下取得了88%的最高准确率。我们还应用低秩适配(LoRA)对模型进行高效微调,使其能够以极低的计算成本适应孟加拉语MWP。我们的工作提供了高质量的推理数据集和用于求解复杂MWP的可扩展框架,填补了孟加拉语NLP的一个关键空白。我们的目标是推动低资源语言的公平研究,并提升教育和语言技术中的推理能力。

Article 300

Title@2025-07-30 (3): Koopman-Based Generalization of Deep Reinforcement Learning With Application to Wireless Communications

Title: Koopman-Based Generalization of Deep Reinforcement Learning With Application to Wireless Communications

Koopman-basierte Verallgemeinerung des Deep Reinforcement Learning mit Anwendung in der drahtlosen Kommunikation

以Koopman为基础的深强化学习通用,应用于无线通信 2503.02961v2

Authors (3): Atefeh Termehchi, Ekram Hossain, Isaac Woungang

Deep Reinforcement Learning (DRL) is a key machine learning technology driving progress across various scientific and engineering fields, including wireless communication. However, its limited interpretability and generalizability remain major challenges. In supervised learning, generalizability is commonly evaluated through the generalization error using information-theoretic methods. In DRL, the training data is sequential and not independent and identically distributed (i.i.d.), rendering traditional information-theoretic methods unsuitable for generalizability analysis. To address this challenge, this paper proposes a novel analytical method for evaluating the generalizability of DRL. Specifically, we first model the evolution of states and actions in trained DRL algorithms as unknown discrete, stochastic, and nonlinear dynamical functions. Then, we employ a data-driven identification method, the Koopman operator, to approximate these functions, and propose two interpretable representations. Based on these interpretable representations, we develop a rigorous mathematical approach to evaluate the generalizability of DRL algorithms. This approach is formulated using the spectral feature analysis of the Koopman operator, leveraging the $H_\infty$ norm. Finally, we apply this generalization analysis to compare the soft actor-critic method, widely recognized as a robust DRL approach, against the proximal policy optimization algorithm for an unmanned aerial vehicle-assisted mmWave wireless communication scenario.

Article 301

Title@2025-07-30 (3): Robust Filtering and Learning in State-Space Models: Skewness and Heavy Tails Via Asymmetric Laplace Distribution

Title: Robust Filtering and Learning in State-Space Models: Skewness and Heavy Tails Via Asymmetric Laplace Distribution

Robustes Filtern und Lernen in State-Space-Modellen: Skewness und Heavy Tails via Asymmetrische Laplace-Distribution

州空间模型中的强力过滤和学习:扭曲和重尾体通过反对称拉皮板分布 2507.22343v1

Authors (3): Yifan Yu, Shengjie Xiu, Daniel P. Palomar

State-space models are pivotal for dynamic system analysis but often struggle with outlier data that deviates from Gaussian distributions, frequently exhibiting skewness and heavy tails. This paper introduces a robust extension utilizing the asymmetric Laplace distribution, specifically tailored to capture these complex characteristics. We propose an efficient variational Bayes algorithm and a novel single-loop parameter estimation strategy, significantly enhancing the efficiency of the filtering, smoothing, and parameter estimation processes. Our comprehensive experiments demonstrate that our methods provide consistently robust performance across various noise settings without the need for manual hyperparameter adjustments. In stark contrast, existing models generally rely on specific noise conditions and necessitate extensive manual tuning. Moreover, our approach uses far fewer computational resources, thereby validating the model’s effectiveness and underscoring its potential for practical applications in fields such as robust control and financial modeling.
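
As background for the noise model, the sketch below evaluates the density of an asymmetric Laplace distribution under one standard parameterization, with location `m`, scale `lam` and asymmetry `kappa`. The paper may use a different parameterization, so treat these parameter names as assumptions. The code numerically checks that the density integrates to one and that `kappa < 1` gives a heavier right tail.

```python
import math

def asymmetric_laplace_pdf(x, m=0.0, lam=1.0, kappa=1.0):
    """Density of the asymmetric Laplace distribution (location m, scale lam, asymmetry kappa)."""
    s = 1.0 if x >= m else -1.0
    return lam / (kappa + 1.0 / kappa) * math.exp(-(x - m) * lam * s * kappa ** s)

# Midpoint-rule integration over a wide interval as a sanity check.
step = 0.001
grid = [-40.0 + (i + 0.5) * step for i in range(80000)]
total = sum(asymmetric_laplace_pdf(x, kappa=0.5) for x in grid) * step

right_tail = asymmetric_laplace_pdf(2.0, kappa=0.5)   # decays like exp(-0.5 * x)
left_tail = asymmetric_laplace_pdf(-2.0, kappa=0.5)   # decays like exp(-2 * |x|)
print(round(total, 4), right_tail > left_tail)
```

With `kappa = 1` the density reduces to the symmetric Laplace distribution, so skewness is controlled by a single extra parameter.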

Article 302

Title@2025-07-30 (3): A Semi-Supervised Federated Learning Framework with Hierarchical Clustering Aggregation for Heterogeneous Satellite Networks

Title: A Semi-Supervised Federated Learning Framework with Hierarchical Clustering Aggregation for Heterogeneous Satellite Networks

Ein semi-überwachtes Federated Learning Framework mit Hierarchical Clustering Aggregation für heterogene Satellitennetzwerke

半上层联邦学习框架,包括异源卫星网络的等级集群聚合 2507.22339v1

Authors (6): Zhuocheng Liu, Zhishu Shen, Qiushi Zheng, Tiehua Zhang, Zheng Lei, Jiong Jin

Low Earth Orbit (LEO) satellites are emerging as key components of 6G networks, with many already deployed to support large-scale Earth observation and sensing-related tasks. Federated Learning (FL) presents a promising paradigm for enabling distributed intelligence in these resource-constrained and dynamic environments. However, achieving reliable convergence while minimizing both processing time and energy consumption remains a substantial challenge, particularly in heterogeneous and partially unlabeled satellite networks. To address this challenge, we propose a novel semi-supervised federated learning framework tailored for LEO satellite networks with hierarchical clustering aggregation. To further reduce communication overhead, we integrate sparsification and adaptive weight quantization techniques. In addition, we divide the FL clustering into two stages: a satellite cluster aggregation stage and a Ground Stations (GSs) aggregation stage. The supervised learning at GSs guides selected Parameter Server (PS) satellites, which in turn support fully unlabeled satellites during the federated training process. Extensive experiments conducted on a satellite network testbed demonstrate that our proposal can significantly reduce processing time (up to 3x) and energy consumption (up to 4x) compared with competing methods while maintaining model accuracy.
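
The communication-reduction idea mentioned in the abstract (sparsification followed by quantization) can be sketched as follows. This is a minimal illustration, not the paper's method: the function names are hypothetical, top-k magnitude selection is assumed for the sparsifier, and a fixed uniform quantizer stands in for the adaptive weight quantization the authors describe.

```python
from typing import Dict, List, Tuple


def top_k_sparsify(update: List[float], k: int) -> Dict[int, float]:
    """Keep only the k largest-magnitude entries, stored as {index: value}."""
    keep = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)[:k]
    return {i: update[i] for i in sorted(keep)}


def quantize(values: Dict[int, float], n_levels: int = 255) -> Tuple[Dict[int, int], float]:
    """Uniformly quantize values to integers in [-n_levels, n_levels] (assumes k >= 1)."""
    scale = max(abs(v) for v in values.values()) or 1.0
    codes = {i: round(v / scale * n_levels) for i, v in values.items()}
    return codes, scale


def dequantize(codes: Dict[int, int], scale: float, n_levels: int = 255) -> Dict[int, float]:
    """Map integer codes back to approximate floating-point values."""
    return {i: c / n_levels * scale for i, c in codes.items()}


update = [0.1, -2.0, 0.05, 1.0]          # hypothetical local model update
sparse = top_k_sparsify(update, k=2)     # only two entries are transmitted
codes, scale = quantize(sparse)
restored = dequantize(codes, scale)
print(sparse, codes, restored)
```

Only the integer codes, their indices, and one scale factor need to be sent; the receiver reconstructs an approximation whose per-entry error is bounded by the quantization step.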

Article 303

Title@2025-07-30 (3): Parametrized Multi-Agent Routing via Deep Attention Models

Title: Parametrized Multi-Agent Routing via Deep Attention Models

Parametrisiertes Multi-Agent Routing über Deep Attachment Modelle

透过深关注模型流出 2507.22338v1

Authors (3): Salar Basiri, Dhananjay Tiwari, Srinivasa M. Salapaka

We propose a scalable deep learning framework for parametrized sequential decision-making (ParaSDM), where multiple agents jointly optimize discrete action policies and shared continuous parameters. A key subclass of this setting arises in Facility-Location and Path Optimization (FLPO), where multi-agent systems must simultaneously determine optimal routes and facility locations, aiming to minimize the cumulative transportation cost within the network. FLPO problems are NP-hard due to their mixed discrete-continuous structure and highly non-convex objective. To address this, we integrate the Maximum Entropy Principle (MEP) with a neural policy model called the Shortest Path Network (SPN), a permutation-invariant encoder-decoder that approximates the MEP solution while enabling efficient gradient-based optimization over shared parameters. The SPN achieves up to 100$\times$ speedup in policy inference and gradient computation compared to MEP baselines, with an average optimality gap of approximately 6% across a wide range of problem sizes. Our FLPO approach yields over 10$\times$ lower cost than metaheuristic baselines while running significantly faster, and matches Gurobi’s optimal cost with annealing at a 1500$\times$ speedup, establishing a new state of the art for ParaSDM problems. These results highlight the power of structured deep models for solving large-scale mixed-integer optimization tasks.
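
As background for the Maximum Entropy Principle mentioned above: maximizing entropy subject to a constraint on expected cost yields a Gibbs distribution over the alternatives, with an inverse-temperature parameter that can be annealed. The sketch below illustrates only that generic fact; the function name `gibbs_policy` and the example costs are hypothetical, and this is not the authors' SPN or FLPO procedure.

```python
import math
from typing import List


def gibbs_policy(costs: List[float], beta: float) -> List[float]:
    """Return p_i proportional to exp(-beta * cost_i), computed stably."""
    m = min(costs)  # subtract the minimum so the largest exponent is 0
    weights = [math.exp(-beta * (c - m)) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]


path_costs = [1.0, 2.0, 3.0]             # hypothetical costs of three candidate routes
uniform = gibbs_policy(path_costs, beta=0.0)
greedy = gibbs_policy(path_costs, beta=50.0)
print(uniform, greedy)
```

At `beta = 0` every route is equally likely (maximum entropy); as `beta` grows, probability mass concentrates on the cheapest route, which is the intuition behind annealing.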

Article 304

Title@2025-07-30 (3): HypKG: Hypergraph-based Knowledge Graph Contextualization for Precision Healthcare

Title: HypKG: Hypergraph-based Knowledge Graph Contextualization for Precision Healthcare

HypKG: Hypergraph-basierte Wissensgrafik Kontextualisierung für Precision Healthcare

HYPKG: 精密保健基于地平线知识图背景情况 2507.19726v2

Authors (6): Yuzhang Xie, Xu Han, Ran Xu, Xiao Hu, Jiaying Lu, Carl Yang

Knowledge graphs (KGs) are important products of the semantic web and are widely used in various application domains. Healthcare is one such domain where KGs are intensively used, due to the high requirement for knowledge accuracy and the interconnected nature of healthcare data. However, KGs storing general factual information often lack the ability to account for important contexts of the knowledge, such as the status of specific patients, which are crucial in precision healthcare. Meanwhile, electronic health records (EHRs) provide rich personal data, including various diagnoses and medications, which provide natural contexts for general KGs. In this paper, we propose HypKG, a framework that integrates patient information from EHRs into KGs to generate contextualized knowledge representations for accurate healthcare predictions. Using advanced entity-linking techniques, we connect relevant knowledge from general KGs with patient information from EHRs, and then utilize a hypergraph model to “contextualize” the knowledge with the patient information. Finally, we employ hypergraph transformers guided by downstream prediction tasks to jointly learn proper contextualized representations for both KGs and patients, fully leveraging existing knowledge in KGs and patient contexts in EHRs. In experiments using a large biomedical KG and two real-world EHR datasets, HypKG demonstrates significant improvements in healthcare prediction tasks across multiple evaluation metrics. Additionally, by integrating external contexts, HypKG can learn to adjust the representations of entities and relations in the KG, potentially improving the quality and real-world utility of knowledge.

Article 305

Title@2025-07-30 (3): Hypernetworks for Model-Heterogeneous Personalized Federated Learning

Title: Hypernetworks for Model-Heterogeneous Personalized Federated Learning

Hypernetzwerke für modell-heterogenes personalisiertes Federated Learning

模拟异异异性个性化联邦学习超级网络 2507.22330v1

Authors (5): Chen Zhang, Husheng Li, Xiang Liu, Linshan Jiang, Danxin Wang

Recent advances in personalized federated learning have focused on addressing client model heterogeneity. However, most existing methods still require external data, rely on model decoupling, or adopt partial learning strategies, which can limit their practicality and scalability. In this paper, we revisit hypernetwork-based methods and leverage their strong generalization capabilities to design a simple yet effective framework for heterogeneous personalized federated learning. Specifically, we propose MH-pFedHN, which leverages a server-side hypernetwork that takes client-specific embedding vectors as input and outputs personalized parameters tailored to each client’s heterogeneous model. To promote knowledge sharing and reduce computation, we introduce a multi-head structure within the hypernetwork, allowing clients with similar model sizes to share heads. We further propose MH-pFedHNGD, which integrates an optional lightweight global model to improve generalization. Our framework does not rely on external datasets and does not require disclosure of client model architectures, thereby offering enhanced privacy and flexibility. Extensive experiments on multiple benchmarks and model settings demonstrate that our approach achieves competitive accuracy and strong generalization, and serves as a robust baseline for future research in model-heterogeneous personalized federated learning.

Article 306

Title@2025-07-30 (3): FAST: An Optimization Framework for Fast Additive Segmentation in Transparent ML

Title: FAST: An Optimization Framework for Fast Additive Segmentation in Transparent ML

FAST: Ein Optimierungsrahmen für schnelle Additive Segmentierung in Transparent ML

FAST: 透明 ML 快速添加分割的最佳框架 2402.12630v2

Authors (2): Brian Liu, Rahul Mazumder

We present FAST, an optimization framework for fast additive segmentation. FAST segments piecewise constant shape functions for each feature in a dataset to produce transparent additive models. The framework leverages a novel optimization procedure to fit these models $\sim$2 orders of magnitude faster than existing state-of-the-art methods, such as explainable boosting machines (Nori et al., 2019). We also develop new feature selection algorithms in the FAST framework to fit parsimonious models that perform well. Through experiments and case studies, we show that FAST improves the computational efficiency and interpretability of additive models.

Article 307

Title@2025-07-30 (3): Provable Low-Frequency Bias of In-Context Learning of Representations

Title: Provable Low-Frequency Bias of In-Context Learning of Representations

Wahrscheinliche frequenzarme Bias des In-Context-Lernens von Repräsentationen

可实现的低公平率代表制的理论内学习 2507.13540v2

Authors (3): Yongyi Yang, Hidenori Tanaka, Wei Hu

In-context learning (ICL) enables large language models (LLMs) to acquire new behaviors from the input sequence alone without any parameter updates. Recent studies have shown that ICL can surpass the original meaning learned in the pretraining stage by internalizing the structure of the data-generating process (DGP) of the prompt into the hidden representations. However, the mechanisms by which LLMs achieve this ability are left open. In this paper, we present the first rigorous explanation of such phenomena by introducing a unified framework of double convergence, where hidden representations converge both over context and across layers. This double convergence process leads to an implicit bias towards smooth (low-frequency) representations, which we prove analytically and verify empirically. Our theory explains several open empirical observations, including why learned representations exhibit globally structured but locally distorted geometry, and why their total energy decays without vanishing. Moreover, our theory predicts that ICL has an intrinsic robustness towards high-frequency noise, which we empirically confirm. These results provide new insights into the underlying mechanisms of ICL, and a theoretical foundation for studying it that hopefully extends to more general data distributions and settings.

Article 308

Title@2025-07-30 (3): Floating-Point Neural Networks Are Provably Robust Universal Approximators

Title: Floating-Point Neural Networks Are Provably Robust Universal Approximators

Floating-Point-Neural-Netzwerke sind wahrscheinlich robuste Universal-Annäherung

浮动点神经网络具有可可预见强健的通用通用近似器 2506.16065v2

Authors (5): Geonho Hwang, Wonyeol Lee, Yeachan Park, Sejun Park, Feras Saad

The classical universal approximation (UA) theorem for neural networks establishes mild conditions under which a feedforward neural network can approximate a continuous function $f$ with arbitrary accuracy. A recent result shows that neural networks also enjoy a more general interval universal approximation (IUA) theorem, in the sense that the abstract interpretation semantics of the network using the interval domain can approximate the direct image map of $f$ (i.e., the result of applying $f$ to a set of inputs) with arbitrary accuracy. These theorems, however, rest on the unrealistic assumption that the neural network computes over infinitely precise real numbers, whereas their software implementations in practice compute over finite-precision floating-point numbers. An open question is whether the IUA theorem still holds in the floating-point setting. This paper introduces the first IUA theorem for floating-point neural networks that proves their remarkable ability to perfectly capture the direct image map of any rounded target function $f$, showing no limits exist on their expressiveness. Our IUA theorem in the floating-point setting exhibits material differences from the real-valued setting, which reflects the fundamental distinctions between these two computational models. This theorem also implies surprising corollaries, which include (i) the existence of provably robust floating-point neural networks; and (ii) the computational completeness of the class of straight-line programs that use only floating-point additions and multiplications for the class of all floating-point programs that halt.
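
The interval-domain semantics referred to above can be illustrated with a small sketch: an input box is pushed through one affine layer and a ReLU, producing a box guaranteed to contain every real-valued output. The function names and the toy weights are hypothetical. The sketch also ignores floating-point rounding direction, which is exactly the gap between the real-valued and floating-point settings that the paper studies, so it should not be read as the paper's construction.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def affine_interval(weights: List[List[float]], bias: List[float],
                    box: List[Interval]) -> List[Interval]:
    """Bound each coordinate of W @ x + b over all x in the input box."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(row, box):
            if w >= 0:
                # A non-negative weight preserves the ordering of the bounds.
                lo += w * x_lo
                hi += w * x_hi
            else:
                # A negative weight swaps which input bound gives the extreme.
                lo += w * x_hi
                hi += w * x_lo
        out.append((lo, hi))
    return out


def relu_interval(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in box]


W = [[1.0, -2.0], [0.5, 0.5]]            # hypothetical layer weights
b = [0.0, -1.0]
x_box = [(0.0, 1.0), (0.0, 1.0)]         # every input with coordinates in [0, 1]
hidden = relu_interval(affine_interval(W, b, x_box))
print(hidden)
```

If the output box for a classifier keeps the correct logit above all others, the prediction is certified for every input in the box, which is the sense of robustness that the abstract's first corollary refers to.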

Article 309

Title@2025-07-30 (3): Scientific Machine Learning with Kolmogorov-Arnold Networks

Title: Scientific Machine Learning with Kolmogorov-Arnold Networks

Wissenschaftliches maschinelles Lernen mit Kolmogorov-Arnold-Netzwerken

Kolmogorov-Arnold网络的科学机器学习 2507.22959v1

Authors (4): Salah A. Faroughi, Farinaz Mostajeran, Amin Hamed Mashhadzadeh, Shirko Faroughi

The field of scientific machine learning, which originally utilized multilayer perceptrons (MLPs), is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding. This shift is driven by the limitations of MLPs, including poor interpretability, fixed activation functions, and difficulty capturing localized or high-frequency features. KANs address these issues with enhanced interpretability and flexibility, enabling more efficient modeling of complex nonlinear interactions and effectively overcoming the constraints associated with conventional MLP architectures. This review categorizes recent progress in KAN-based models across three distinct perspectives: (i) data-driven learning, (ii) physics-informed modeling, and (iii) deep operator learning. Each perspective is examined through the lens of architectural design, training strategies, application efficacy, and comparative evaluation against MLP-based counterparts. By benchmarking KANs against MLPs, we highlight consistent improvements in accuracy, convergence, and spectral representation, clarifying KANs’ advantages in capturing complex dynamics while learning more effectively. Finally, this review identifies critical challenges and open research questions in KAN development, particularly regarding computational efficiency, theoretical guarantees, hyperparameter tuning, and algorithm complexity. We also outline future research directions aimed at improving the robustness, scalability, and physical consistency of KAN-based frameworks.

Article 310

Title@2025-07-30 (3): The challenge of hidden gifts in multi-agent reinforcement learning

Title: The challenge of hidden gifts in multi-agent reinforcement learning

Die Herausforderung der versteckten Gaben in Multi-Agenten-Verstärkung Lernen

多试剂强化学习中隐藏礼品的挑战 2505.20579v3

Authors (2): Dane Malenfant, Blake A. Richards

Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key; thus, the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness in independent agents can benefit these settings.

Article 311

Title@2025-07-30 (3): BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems

Title: BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems

BEACON: Eine Bayesische Optimierungsstrategie für Neuheitssuche in teuren Black-Box-Systemen

BEACON: 昂贵的黑箱系统新奇搜索贝叶斯最佳最佳战略 2406.03616v4

Authors (3): Wei-Ting Tang, Ankush Chakrabarty, Joel A. Paulson

Novelty search (NS) refers to a class of exploration algorithms that seek to uncover diverse system behaviors through simulations or experiments. Such diversity is central to many AI-driven discovery and design tasks, including material and drug development, neural architecture search, and reinforcement learning. However, existing NS methods typically rely on evolutionary strategies and other meta-heuristics that require dense sampling of the input space, making them impractical for expensive black-box systems. In this work, we introduce BEACON, a sample-efficient, Bayesian optimization-inspired approach to NS that is tailored for settings where the input-to-behavior relationship is opaque and costly to evaluate. BEACON models this mapping using multi-output Gaussian processes (MOGPs) and selects new inputs by maximizing a novelty metric computed from posterior samples of the MOGP, effectively balancing the exploration-exploitation trade-off. By leveraging recent advances in posterior sampling and high-dimensional GP modeling, our method remains scalable to large input spaces and datasets. We evaluate BEACON across ten synthetic benchmarks and eight real-world tasks, including the design of diverse materials for clean energy applications. Our results show that BEACON significantly outperforms existing NS baselines, consistently discovering a broader set of behaviors under tight evaluation budgets.
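
To make the notion of a novelty metric concrete, the sketch below computes the classic novelty-search score: the mean distance from a candidate behavior to its k nearest neighbors in an archive of previously observed behaviors. This is an illustration of the general idea only. The function name `novelty`, the archive contents, and the choice of Euclidean distance are assumptions, and BEACON itself evaluates novelty on posterior samples of a Gaussian process rather than on observed behaviors directly.

```python
import math
from typing import List, Sequence


def novelty(candidate: Sequence[float], archive: List[Sequence[float]], k: int = 2) -> float:
    """Mean Euclidean distance to the k nearest archived behaviors (archive must be non-empty)."""
    dists = sorted(math.dist(candidate, behavior) for behavior in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


archive = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # hypothetical observed behaviors
seen = novelty((0.0, 0.0), archive)              # a behavior already in the archive
far = novelty((3.0, 4.0), archive)               # a behavior far from anything seen
print(seen, far)
```

A candidate far from everything in the archive scores higher, so maximizing this quantity steers sampling toward unexplored regions of behavior space.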

Article 312

Title@2025-07-30 (3): Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training

Title: Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training

Wavelet trifft Adam: Komprimierende Gradienten für Gedächtnis-Effizientes Training

Wavelet Meets Adam:将逐步压缩用于记忆效率培训 2501.07237v3

Authors (8): Ziqing Wen, Ping Luo, Jiahuan Wang, Xiaoge Deng, Jinping Zou, Kun Yuan, Tao Sun, Dongsheng Li

Large language models (LLMs) have shown impressive performance across a range of natural language processing tasks. However, their vast number of parameters introduces significant memory challenges during training, particularly when using memory-intensive optimizers like Adam. Existing memory-efficient algorithms often rely on techniques such as singular value decomposition projection or weight freezing. While these approaches help alleviate memory constraints, they generally produce suboptimal results compared to full-rank updates. In this paper, we investigate memory-efficient methods beyond low-rank training, proposing a novel solution called Gradient Wavelet Transform (GWT), which applies wavelet transforms to gradients in order to significantly reduce the memory requirements for maintaining optimizer states. We demonstrate that GWT can be seamlessly integrated with memory-intensive optimizers, enabling efficient training without sacrificing performance. Through extensive experiments on both pre-training and fine-tuning tasks, we show that GWT achieves state-of-the-art performance compared with advanced memory-efficient optimizers and full-rank approaches in terms of both memory usage and training performance.
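
The core idea of compressing gradients with a wavelet transform can be sketched with a one-level Haar transform: keeping only the approximation coefficients halves the number of values for which optimizer state must be stored. The code is a hedged illustration, not the GWT algorithm itself. The function names, the choice of the Haar wavelet, the single decomposition level, and the zeroing of detail coefficients on reconstruction are all assumptions made here.

```python
import math
from typing import List

SQRT2 = math.sqrt(2.0)


def haar_approx(grad: List[float]) -> List[float]:
    """One-level Haar transform that keeps only the approximation coefficients."""
    assert len(grad) % 2 == 0, "this sketch assumes an even-length gradient"
    return [(grad[i] + grad[i + 1]) / SQRT2 for i in range(0, len(grad), 2)]


def haar_inverse_from_approx(approx: List[float]) -> List[float]:
    """Inverse one-level Haar transform with all detail coefficients set to zero."""
    out: List[float] = []
    for a in approx:
        out.extend([a / SQRT2, a / SQRT2])
    return out


grad = [1.0, 3.0, -2.0, 2.0]                 # hypothetical flattened gradient
compressed = haar_approx(grad)               # optimizer state would live at this size
restored = haar_inverse_from_approx(compressed)
print(len(compressed), restored)
```

In this toy case each pair of entries is replaced by its average on reconstruction, so the low-frequency content of the gradient survives while the state held by the optimizer is half the original size.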

Article 313

Title@2025-07-30 (3): Decoding Neural Signatures of Semantic Evaluations in Depression and Suicidality

Title: Decoding Neural Signatures of Semantic Evaluations in Depression and Suicidality

Dekodierung neuraler Signaturen semantischer Bewertungen in Depression und Suizidalität

萧条和自相残杀情况下语义评价解码神经签名 2507.22313v1

Authors (18): Woojae Jeong, Aditya Kommineni, Kleanthis Avramidis, Colin McDaniel, Donald Berry, Myzelle Hughes, Thomas McGee, Elsi Kaiser, Dani Byrd, Assal Habibi, B. Rael Cahn, Idan A. Blank, Kristina Lerman, Dimitrios Pantazis, Sudarsana R. Kadiri, Takfarinas Medani, Shrikanth Narayanan, Richard M. Leahy

Depression and suicidality profoundly impact cognition and emotion, yet objective neurophysiological biomarkers remain elusive. We investigated the spatiotemporal neural dynamics underlying affective semantic processing in individuals with varying levels of clinical severity of depression and suicidality using multivariate decoding of electroencephalography (EEG) data. Participants (N=137) completed a sentence evaluation task involving emotionally charged self-referential statements while EEG was recorded. We identified robust neural signatures of semantic processing, with peak decoding accuracy between 300 and 600 ms, a window associated with automatic semantic evaluation and conflict monitoring. Compared to healthy controls, individuals with depression and suicidality showed earlier-onset, longer-duration, and greater-amplitude decoding responses, along with broader cross-temporal generalization and increased activation of frontocentral and parietotemporal components. These findings suggest altered sensitivity and impaired disengagement from emotionally salient content in the clinical groups, advancing our understanding of the neurocognitive basis of mental health and providing a principled basis for developing reliable EEG-based biomarkers of depression and suicidality.

Article 314

Title@2025-07-30 (3): An Asynchronous Decentralised Optimisation Algorithm for Nonconvex Problems

Title: An Asynchronous Decentralised Optimisation Algorithm for Nonconvex Problems

Ein asynchroner dezentralisierter Optimierungsalgorithmus für nichtkonvexe Probleme

非经济问题非集中分散化最佳优化比值 2507.22311v1

Authors (3): Behnam Mafakheri, Jonathan H. Manton, Iman Shames

In this paper, we consider nonconvex decentralised optimisation and learning over a network of distributed agents. We develop an ADMM algorithm based on the Randomised Block Coordinate Douglas-Rachford splitting method, which enables agents in the network to compute a set of first-order stationary solutions of the problem in a distributed and asynchronous manner. To the best of our knowledge, this is the first decentralised and asynchronous algorithm for solving nonconvex optimisation problems with a convergence proof. Numerical examples demonstrate the efficiency of the proposed algorithm for distributed Phase Retrieval and sparse Principal Component Analysis problems.

Article 315

Title@2025-07-30 (3): High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data

Title: High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data

High-Resolution Live Fuel Moisture Content (LFMC) Karten für Wildfire-Risiko aus multimodalen Erdbeobachtungsdaten

多式地球观测数据产生的野火风险高分辨率活燃料动力内容地图 2506.20132v2

Authors (8): Patrick Alan Johnson, Gabriel Tseng, Yawen Zhang, Heather Heward, Virginia Sjahli, Favyen Bastani, Joseph Redmon, Patrick Beukema

Wildfires are increasing in intensity and severity at an alarming rate. Recent advances in AI and publicly available satellite data enable monitoring critical wildfire risk factors globally, at high resolution and low latency. Live Fuel Moisture Content (LFMC) is a critical wildfire risk factor and is valuable for both wildfire research and operational response. However, ground-based LFMC samples are both labor intensive and costly to acquire, resulting in sparse and infrequent updates. In this work, we explore the use of a pretrained, highly multimodal earth-observation model for generating large-scale spatially complete (wall-to-wall) LFMC maps. Our approach achieves significant improvements over previous methods using randomly initialized models (20% reduction in RMSE). We provide an automated pipeline that enables rapid generation of these LFMC maps across the United States, and demonstrate its effectiveness in two regions recently impacted by wildfire (Eaton and Palisades).

Article 316

Title@2025-07-30 (3): Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation

Title: Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation

Vergleich von Cluster-basierten Cross-Validation-Strategien für die Bewertung von Machine Learning-Modellen

比较用于机械学习模式评价的集群交叉评估战略 2507.22299v1

Authors (2): Afonso Martini Spezia, Mariana Recamonde-Mendoza

Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data subsets (folds) that do not adequately represent the diversity of the original dataset, which can lead to biased performance estimates. The objective of this work is to deepen the investigation of cluster-based cross-validation strategies by analyzing the performance of different clustering algorithms through experimental comparison. Additionally, a new cross-validation technique that combines Mini Batch K-Means with class stratification is proposed. Experiments were conducted on 20 datasets (both balanced and imbalanced) using four supervised learning algorithms, comparing cross-validation strategies in terms of bias, variance, and computational cost. The technique that uses Mini Batch K-Means with class stratification outperformed others in terms of bias and variance on balanced datasets, though it did not significantly reduce computational cost. On imbalanced datasets, traditional stratified cross-validation consistently performed better, showing lower bias, variance, and computational cost, making it a safe choice for performance evaluation in scenarios with class imbalance. In the comparison of different clustering algorithms, no single algorithm consistently stood out as superior. Overall, this work contributes to improving predictive model evaluation strategies by providing a deeper understanding of the potential of cluster-based data splitting techniques and reaffirming the effectiveness of well-established strategies like stratified cross-validation. Moreover, it highlights perspectives for increasing the robustness and reliability of model evaluations, especially in datasets with clustering characteristics.
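
One plausible reading of "cluster-based splitting with class stratification" is sketched below: samples are grouped by (cluster, class) pairs and each group is dealt round-robin across folds, so every fold sees each cluster and each class in roughly equal proportion. This is an assumption-laden illustration rather than the authors' procedure. The function name is hypothetical, and the cluster labels are taken as given (in practice they might come from a Mini Batch K-Means run, which is omitted here to keep the example dependency-free).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cluster_stratified_folds(clusters: Sequence[int], classes: Sequence[int],
                             n_folds: int) -> List[List[int]]:
    """Assign sample indices to folds, balancing every (cluster, class) group."""
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, key in enumerate(zip(clusters, classes)):
        groups[key].append(idx)

    folds: List[List[int]] = [[] for _ in range(n_folds)]
    next_fold = 0
    for key in sorted(groups):
        for idx in groups[key]:
            # Deal indices out like cards so group members spread across folds.
            folds[next_fold].append(idx)
            next_fold = (next_fold + 1) % n_folds
    return folds


clusters = [0, 0, 0, 0, 1, 1, 1, 1]      # hypothetical precomputed cluster labels
classes = [0, 0, 1, 1, 0, 0, 1, 1]       # hypothetical class labels
folds = cluster_stratified_folds(clusters, classes, n_folds=2)
print(folds)
```

Each fold's indices serve as the held-out set for one round, with the remaining folds used for training; in this toy example both folds contain exactly one sample from every (cluster, class) combination.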

Article 317

Title@2025-07-29 (2): AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data

Title: AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data

AlphaEarth Foundations: Ein eingebettetes Feldmodell für genaue und effiziente globale Kartierung aus spärlichen Etikettendaten

阿尔法地球基金会:利用稀少标签数据进行准确、高效全球制图的嵌入实地模型 2507.22291v1

Authors (19): Christopher F. Brown, Michal R. Kazmierski, Valerie J. Pasquarella, William J. Rucklidge, Masha Samsikova, Chenhui Zhang, Evan Shelhamer, Estefania Lahera, Olivia Wiles, Simon Ilyushchenko, Noel Gorelick, Lihui Lydia Zhang, Sophia Alj, Emily Schechter, Sean Askay, Oliver Guinan, Rebecca Moore, Alexis Boukouvalas, Pushmeet Kohli

Unprecedented volumes of Earth observation data are continually collected around the world, but high-quality labels remain scarce given the effort required to make physical measurements and observations. This has led to considerable investment in bespoke modeling efforts translating sparse labels into maps. Here we introduce AlphaEarth Foundations, an embedding field model yielding a highly general, geospatial representation that assimilates spatial, temporal, and measurement contexts across multiple sources, enabling accurate and efficient production of maps and monitoring systems from local to global scales. The embeddings generated by AlphaEarth Foundations are the only ones to consistently outperform all previous featurization approaches tested on a diverse set of mapping evaluations without re-training. We will release a dataset of global, annual, analysis-ready embedding field layers from 2017 through 2024.

Article 318

Title@2025-07-29 (2): Intent Recognition and Out-of-Scope Detection using LLMs in Multi-party Conversations

Title: Intent Recognition and Out-of-Scope Detection using LLMs in Multi-party Conversations

Intent Recognition und Out-of-Scope-Erkennung mit LLMs in Multi-Party-Konversationen

在多方对话中使用LLMs 2507.22289v1

Authors (3): Galo Castillo-López, Gaël de Chalendar, Nasredine Semmar

Intent recognition is a fundamental component in task-oriented dialogue systems (TODS). Determining user intents and detecting whether an intent is Out-of-Scope (OOS) is crucial for TODS to provide reliable responses. However, traditional TODS require large amounts of annotated data. In this work, we propose a hybrid approach that combines BERT and LLMs in zero- and few-shot settings to recognize intents and detect OOS utterances. Our approach leverages LLMs’ generalization power and BERT’s computational efficiency in such scenarios. We evaluate our method on multi-party conversation corpora and observe that sharing information from BERT outputs with LLMs leads to system performance improvement.

Article 319

Title@2025-07-29 (2): CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam

Title: CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam

CHECK-MAT: Überprüfung von handschriftlichen mathematischen Antworten für die russische Unified State Prüfung

CHECK-MAT: 检查俄罗斯统一国家考试的手写数学答案 2507.22958v1

Authors (1): Ruslan Khrulev

This paper introduces a novel benchmark, EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess hand-written mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. Code is available at https://github.com/Karifannaa/Auto-check-EGE-math

Article 320

Title@2025-07-29 (2): An Introduction to Modern Statistical Learning

Title: An Introduction to Modern Statistical Learning

Eine Einführung in das moderne statistische Lernen

现代统计学习介绍 2207.10185v2

Authors (1): Joseph G. Makin

This work in progress aims to provide a unified introduction to statistical learning, building up slowly from classical models like the GMM and HMM to modern neural networks like the VAE and diffusion models. There are today many internet resources that explain this or that new machine-learning algorithm in isolation, but they do not (and cannot, in so brief a space) connect these algorithms with each other or with the classical literature on statistical models, out of which the modern algorithms emerged. Also conspicuously lacking is a single notational system, an absence that, although unfazing to those already familiar with the material (like the authors of these posts), raises a significant barrier to the novice’s entry. Likewise, I have aimed to assimilate the various models, wherever possible, to a single framework for inference and learning, showing how (and why) to change one model into another with minimal alterations (some of them novel, others from the literature). Some background is of course necessary. I have assumed the reader is familiar with basic multivariable calculus, probability and statistics, and linear algebra. The goal of this book is certainly not completeness, but rather to draw a more or less straight-line path from the basics to the extremely powerful new models of the last decade. The goal then is to complement, not replace, such comprehensive texts as Bishop’s Pattern Recognition and Machine Learning, which is now 15 years old.

Article 321

Title@2025-07-29 (2): HOG-CNN: Integrating Histogram of Oriented Gradients with Convolutional Neural Networks for Retinal Image Classification

Title: HOG-CNN: Integrating Histogram of Oriented Gradients with Convolutional Neural Networks for Retinal Image Classification

HOG-CNN: Integration des Histogramms orientierter Gradienten mit konvolutionären Neuralnetzwerken für die Retinalbildklassifikation

HOG-CNN:将定向梯度直方图与关于视视像图像分类的革命神经网络整合 2507.22274v1

Authors (1): Faisal Ahmed

The analysis of fundus images is critical for the early detection and diagnosis of retinal diseases such as Diabetic Retinopathy (DR), Glaucoma, and Age-related Macular Degeneration (AMD). Traditional diagnostic workflows, however, often depend on manual interpretation and are both time- and resource-intensive. To address these limitations, we propose an automated and interpretable clinical decision support framework based on a hybrid feature extraction model called HOG-CNN. Our key contribution lies in the integration of handcrafted Histogram of Oriented Gradients (HOG) features with deep convolutional neural network (CNN) representations. This fusion enables our model to capture both local texture patterns and high-level semantic features from retinal fundus images. We evaluated our model on three public benchmark datasets: APTOS 2019 (for binary and multiclass DR classification), ORIGA (for Glaucoma detection), and IC-AMD (for AMD diagnosis); HOG-CNN demonstrates consistently high performance. It achieves 98.5% accuracy and 99.2 AUC for binary DR classification, and 94.2 AUC for five-class DR classification. On the IC-AMD dataset, it attains 92.8% accuracy, 94.8% precision, and 94.5 AUC, outperforming several state-of-the-art models. For Glaucoma detection on ORIGA, our model achieves 83.9% accuracy and 87.2 AUC, showing competitive performance despite dataset limitations. We show, through comprehensive appendix studies, the complementary strength of combining HOG and CNN features. The model’s lightweight and interpretable design makes it particularly suitable for deployment in resource-constrained clinical environments. These results position HOG-CNN as a robust and scalable tool for automated retinal disease screening.

Article 322

Title@2025-07-29 (2): The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data

Title: The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data

Die Bedeutung des Diskreten seins: Messung der Auswirkungen der Diskretisierung in End-to-End-Differentially Private Synthetic Data

差异的重要性:衡量端至端端差异性私人合成数据中差异化的影响 2504.06923v3

Authors (4): Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Sofiane Mahiou, Emiliano De Cristofaro

Differentially Private (DP) generative marginal models are often used in the wild to release synthetic tabular datasets in lieu of sensitive data while providing formal privacy guarantees. These models approximate low-dimensional marginals or query workloads; crucially, they require the training data to be pre-discretized, i.e., continuous values need to first be partitioned into bins. However, as the range of values (or their domain) is often inferred directly from the training data, with the number of bins and bin edges typically defined arbitrarily, this approach can ultimately break end-to-end DP guarantees and may not always yield optimal utility. In this paper, we present an extensive measurement study of four discretization strategies in the context of DP marginal generative models. More precisely, we design DP versions of three discretizers (uniform, quantile, and k-means) and reimplement the PrivTree algorithm. We find that optimizing both the choice of discretizer and bin count can improve utility, on average, by almost 30% across six DP marginal models, compared to the default strategy and number of bins, with PrivTree being the best-performing discretizer in the majority of cases. We demonstrate that, while DP generative models with non-private discretization remain vulnerable to membership inference attacks, applying DP during discretization effectively mitigates this risk. Finally, we improve on an existing approach for automatically selecting the optimal number of bins, and achieve high utility while reducing both privacy budget consumption and computational overhead.
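
The point about bin edges being inferred from the training data can be illustrated with a minimal sketch. This is not one of the DP discretizers from the paper; it only shows the simpler idea of uniform binning over a domain that is fixed in advance, so the edges do not depend on the sensitive records. The function names, the example domain of 0 to 100, and the sample values are all hypothetical.

```python
from bisect import bisect_right


def uniform_bin_edges(lower, upper, n_bins):
    """Evenly spaced edges over a domain chosen without looking at the data."""
    width = (upper - lower) / n_bins
    return [lower + i * width for i in range(n_bins + 1)]


def discretize(values, edges):
    """Map each value to a bin index, clipping values outside the domain."""
    n_bins = len(edges) - 1
    indices = []
    for v in values:
        clipped = min(max(v, edges[0]), edges[-1])
        idx = bisect_right(edges, clipped) - 1
        # The upper edge itself belongs to the last bin.
        indices.append(min(idx, n_bins - 1))
    return indices


# Hypothetical example: ages binned over an assumed public domain [0, 100].
ages = [23.0, 37.5, 64.0, 91.0, 130.0]
edges = uniform_bin_edges(0.0, 100.0, 5)
bins = discretize(ages, edges)  # the out-of-range 130.0 is clipped into the last bin
```

Because the edges come from the assumed domain and not from `min(ages)` or `max(ages)`, adding or removing one record cannot move them. Quantile or k-means edges do depend on the data, which is why the abstract describes designing DP versions of those discretizers.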

Article 323

Title@2025-07-29 (2): Weighted Conditional Flow Matching

Title: Weighted Conditional Flow Matching

Gewichteter Bedingter Fluss passend

加权有条件流动匹配 2507.22270v1

Authors (6): Sergio Calvo-Ordonez, Matthieu Meunier, Alvaro Cartea, Christoph Reisinger, Yarin Gal, Jose Miguel Hernandez-Lobato

Conditional flow matching (CFM) has emerged as a powerful framework for training continuous normalizing flows due to its computational efficiency and effectiveness. However, standard CFM often produces paths that deviate significantly from straight-line interpolations between prior and target distributions, making generation slower and less accurate due to the need for fine discretization at inference. Recent methods enhance CFM performance by inducing shorter and straighter trajectories but typically rely on computationally expensive mini-batch optimal transport (OT). Drawing insights from entropic optimal transport (EOT), we propose Weighted Conditional Flow Matching (W-CFM), a novel approach that modifies the classical CFM loss by weighting each training pair $(x, y)$ with a Gibbs kernel. We show that this weighting recovers the entropic OT coupling up to some bias in the marginals, and we provide the conditions under which the marginals remain nearly unchanged. Moreover, we establish an equivalence between W-CFM and the minibatch OT method in the large-batch limit, showing how our method overcomes computational and performance bottlenecks linked to batch size. Empirically, we test our method on unconditional generation on various synthetic and real datasets, confirming that W-CFM achieves comparable or superior sample quality, fidelity, and diversity to other alternative baselines while maintaining the computational efficiency of vanilla CFM.
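
A minimal sketch of the weighting idea follows, under stated assumptions: the Gibbs kernel is taken to be exp(-c(x, y)/eps) with a squared Euclidean cost c, the path is the usual straight-line interpolation, and the weights are not normalised. None of these choices is confirmed by the abstract, and `velocity`, `gibbs_weight`, and `weighted_cfm_loss` are hypothetical names.

```python
import math


def gibbs_weight(x, y, eps):
    """Gibbs kernel exp(-c(x, y) / eps); squared Euclidean cost is an assumption."""
    cost = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-cost / eps)


def weighted_cfm_loss(velocity, pairs, times, eps):
    """Average of w(x, y) * ||v(x_t, t) - (y - x)||^2 over the training pairs."""
    total = 0.0
    for (x, y), t in zip(pairs, times):
        # Straight-line interpolation between the prior and target samples.
        x_t = [(1.0 - t) * a + t * b for a, b in zip(x, y)]
        target = [b - a for a, b in zip(x, y)]
        pred = velocity(x_t, t)
        sq_err = sum((p - u) ** 2 for p, u in zip(pred, target))
        total += gibbs_weight(x, y, eps) * sq_err
    return total / len(pairs)


def zero_velocity(x_t, t):
    """Toy stand-in for a learned velocity field (always predicts zero)."""
    return [0.0 for _ in x_t]


pairs = [([0.0, 0.0], [1.0, 0.0]), ([0.0, 0.0], [3.0, 0.0])]
loss = weighted_cfm_loss(zero_velocity, pairs, times=[0.5, 0.5], eps=1.0)
```

In this toy batch the distant pair receives weight exp(-9) while the nearby pair receives exp(-1), so long, crossing pairings contribute little to the loss. That is the intuition behind favouring shorter trajectories without solving a mini-batch OT problem.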

Article 324

Title@2025-07-29 (2): Agent-centric learning: from external reward maximization to internal knowledge curation

Title: Agent-centric learning: from external reward maximization to internal knowledge curation

Agentzentriertes Lernen: von der externen Belohnungsmaximierung bis zur internen Wissenskuration

以代理人为中心的学习:从外部奖励最大化到内部知识整理 2507.22255v1

Authors (4): Hanqi Zhou, Fryderyk Mantiuk, David G. Nagy, Charley M. Wu

The pursuit of general intelligence has traditionally centered on external objectives: an agent’s control over its environments or mastery of specific tasks. This external focus, however, can produce specialized agents that lack adaptability. We propose representational empowerment, a new perspective towards a truly agent-centric learning paradigm by moving the locus of control inward. This objective measures an agent’s ability to controllably maintain and diversify its own knowledge structures. We posit that the capacity – to shape one’s own understanding – is an element for achieving better “preparedness” distinct from direct environmental influence. Focusing on internal representations as the main substrate for computing empowerment offers a new lens through which to design adaptable intelligent systems.

Article 325

Title@2025-07-29 (2): Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields

Title: Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields

Vollständig datengetriebene inverse Hyperelastizität mit hyper-network neuronalen ODE-Feldern

由全数据驱动的全数据驱动的超反超弹性,具有超网络神经极极光字段 2506.08146v2

Authors (6): Vahidullah Taç, Amirhossein Amiri-Hezaveh, Manuel K. Rausch, Grace N. Bechtel, Francisco Sahli Costabal, Adrian Buganza Tepole

We propose a new framework for identifying mechanical properties of heterogeneous materials without a closed-form constitutive equation. Given a full-field measurement of the displacement field, for instance as obtained from digital image correlation (DIC), a continuous approximation of the strain field is obtained by training a neural network that incorporates Fourier features to effectively capture sharp gradients in the data. A physics-based data-driven method built upon neural ordinary differential equations (NODEs) is employed to discover constitutive equations. The NODE framework can represent arbitrary materials while satisfying constraints in the theory of constitutive equations by default. To account for heterogeneity, a hyper-network is defined, where the input is the material coordinate system, and the output is the NODE-based constitutive equation. The parameters of the hyper-network are optimized by minimizing a multi-objective loss function that includes penalty terms for violations of the strong form of the equilibrium equations of elasticity and the associated Neumann boundary conditions. We showcase the framework with several numerical examples, including heterogeneity arising from variations in material parameters, spatial transitions from isotropy to anisotropy, material identification in the presence of noise, and, ultimately, application to experimental data. As the numerical results suggest, the proposed approach is robust and general in identifying the mechanical properties of heterogeneous materials with very few assumptions, making it a suitable alternative to classical inverse methods.

Article 326

Title@2025-07-29 (2): Using Scaling Laws for Data Source Utility Estimation in Domain-Specific Pre-Training

Title: Using Scaling Laws for Data Source Utility Estimation in Domain-Specific Pre-Training

Verwendung von Skalierungsgesetzen für Datenquellen-Utility-Schätzung im Domain-Spezifischen Pre-Training

在具体区域培训前使用数据源实用性估算法 2507.22250v1

Authors (10): Oleksiy Ostapenko, Charles Guille-Escuret, Luke Kumar, Max Tian, Denis Kocetkov, Gopeshh Subbaraj, Raymond Li, Joel Lamy-Poirier, Sebastien Paquet, Torsten Scholak

We introduce a framework for optimizing domain-specific dataset construction in foundation model training. Specifically, we seek a cost-efficient way to estimate the quality of data sources (e.g. synthetically generated or filtered web data) in order to make optimal decisions about resource allocation for data sourcing in the stage two pre-training phase, aka annealing, with the goal of specializing a generalist pre-trained model to specific domains. Our approach extends the usual point estimate approaches, aka micro-annealing, to estimating scaling laws by performing multiple annealing runs of varying compute spent on data curation and training. This addresses a key limitation in prior work, where reliance on point estimates for data scaling decisions can be misleading due to the lack of rank invariance across compute scales – a phenomenon we confirm in our experiments. By systematically analyzing performance gains relative to acquisition costs, we find that scaling curves can be estimated for different data sources. Such scaling laws can inform cost-effective resource allocation across different data acquisition methods (e.g. synthetic data), data sources (e.g. user or web data) and available compute resources. We validate our approach through experiments on a pre-trained model with 7 billion parameters. We adapt it to two domains: the medical domain, which is well represented in the pre-training data, and the math domain, which is underrepresented in the pre-training corpora. We show that one can efficiently estimate the scaling behavior of a data source by performing multiple annealing runs, and that these estimates can lead to different conclusions than point estimates obtained with the usual micro-annealing technique. This enables data-driven decision-making for selecting and optimizing data sources.
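
The lack of rank invariance across compute scales can be illustrated with a small sketch. It assumes a pure power-law form, loss = a * C^(-b), fitted by least squares in log-log space; the abstract does not state the functional form the authors use, and the two data sources and all numbers below are synthetic and hypothetical.

```python
import math


def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) by ordinary least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict(a, b, compute):
    return a * compute ** (-b)


# Synthetic annealing runs at four small compute budgets (arbitrary units).
budgets = [1.0, 2.0, 4.0, 8.0]
source_a = [2.0 * c ** -0.1 for c in budgets]  # better at small scale
source_b = [3.0 * c ** -0.2 for c in budgets]  # improves faster with compute

a_fit = fit_power_law(budgets, source_a)
b_fit = fit_power_law(budgets, source_b)

# A point estimate at the largest small-scale budget ranks A ahead of B,
# but extrapolating the fitted curves reverses the ranking at large compute.
small_scale_winner = "A" if source_a[-1] < source_b[-1] else "B"
large_scale_winner = "A" if predict(*a_fit, 1000.0) < predict(*b_fit, 1000.0) else "B"
```

In this toy setup the curves cross near C = 1.5^10 (about 58), so any single run below that budget would favour source A even though source B gives the lower loss at larger budgets.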

Article 327

Title@2025-07-29 (2): LLM-Assisted Cheating Detection in Korean Language via Keystrokes

Title: LLM-Assisted Cheating Detection in Korean Language via Keystrokes

LLM-Assisted Cheating Detection in koreanischer Sprache über Tastenanschläge

通过Keystrokes用韩语协助LLM 2507.22956v1

Authors (3): Dong Hyun Roh, Rajesh Kumar, An Ngo

This paper presents a keystroke-based framework for detecting LLM-assisted cheating in Korean, addressing key gaps in prior research regarding language coverage, cognitive context, and the granularity of LLM involvement. Our proposed dataset includes 69 participants who completed writing tasks under three conditions: bona fide writing, paraphrasing ChatGPT responses, and transcribing ChatGPT responses. Each task spans six cognitive processes defined in Bloom’s Taxonomy (remember, understand, apply, analyze, evaluate, and create). We extract interpretable temporal and rhythmic features and evaluate multiple classifiers under both Cognition-Aware and Cognition-Unaware settings. Temporal features perform well under Cognition-Aware evaluation scenarios, while rhythmic features generalize better under cross-cognition scenarios. Moreover, detecting bona fide and transcribed responses was easier than detecting paraphrased ones for both the proposed models and human evaluators, with the models significantly outperforming the humans. Our findings affirm that keystroke dynamics facilitate reliable detection of LLM-assisted writing across varying cognitive demands and writing strategies, including paraphrasing and transcribing LLM-generated responses.

Article 328

Title@2025-07-29 (2): Understanding Concept Drift with Deprecated Permissions in Android Malware Detection

Title: Understanding Concept Drift with Deprecated Permissions in Android Malware Detection

Verständnis Konzept Drift mit veralteten Berechtigungen in Android Malware-Erkennung

理解在Android Malware 探测中拥有过时权限的漂浮概念 2507.22231v1

Authors (4): Ahmed Sabbah, Radi Jarrar, Samer Zein, David Mohaisen

Permission analysis is a widely used method for Android malware detection. It involves examining the permissions requested by an application to access sensitive data or perform potentially malicious actions. In recent years, various machine learning (ML) algorithms have been applied to Android malware detection using permission-based features and feature selection techniques, often achieving high accuracy. However, these studies have largely overlooked important factors such as protection levels and the deprecation or restriction of permissions due to updates in the Android OS – factors that can contribute to concept drift. In this study, we investigate the impact of deprecated and restricted permissions on the performance of machine learning models. A large dataset containing 166 permissions was used, encompassing more than 70,000 malware and benign applications. Various machine learning and deep learning algorithms were employed as classifiers, along with different concept drift detection strategies. The results suggest that Android permissions are highly effective features for malware detection, with the exclusion of deprecated and restricted permissions having only a marginal impact on model performance. In some cases, such as with CNN, accuracy improved. Excluding these permissions also enhanced the detection of concept drift using a year-to-year analysis strategy. Dataset balancing further improved model performance, reduced low-accuracy instances, and enhanced concept drift detection via the Kolmogorov-Smirnov test.
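
For readers unfamiliar with the Kolmogorov-Smirnov test mentioned above, the two-sample statistic is the largest gap between two empirical CDFs. The sketch below computes only that statistic (no p-value) in plain Python. The abstract does not say which quantity the authors feed to the test, so the yearly score lists are hypothetical placeholders.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum distance between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for point in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, point) / len(a)
        cdf_b = bisect_right(b, point) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical per-app scores from two consecutive years.
scores_year_1 = [1, 2, 3, 4]
scores_year_2 = [3, 4, 5, 6]
drift_score = ks_statistic(scores_year_1, scores_year_2)
```

A statistic near 0 means the two years look alike and a value near 1 means they barely overlap. A drift detector would compare the statistic, or its p-value, against a chosen threshold.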

Article 329

Title@2025-07-29 (2): TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction

Title: TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction

TRIBE: TRImodaler Gehirnencoder für Vollhirn fMRI-Antwortvorhersage

TRIBE:用于全脑FMRI反应预测的三元脑大脑编码器 2507.22229v1

Authors (5): Stéphane d’Ascoli, Jérémy Rapin, Yohann Benchetrit, Hubert Banville, Jean-Rémi King

Historically, neuroscience has progressed by fragmenting into specialized domains, each focusing on isolated modalities, tasks, or brain regions. While fruitful, this approach hinders the development of a unified model of cognition. Here, we introduce TRIBE, the first deep neural network trained to predict brain responses to stimuli across multiple modalities, cortical areas and individuals. By combining the pretrained representations of text, audio and video foundational models and handling their time-evolving nature with a transformer, our model can accurately capture the spatial and temporal fMRI responses to videos, achieving first place in the Algonauts 2025 brain encoding competition by a significant margin over competitors. Ablations show that while unimodal models can reliably predict their corresponding cortical networks (e.g. visual or auditory networks), they are systematically outperformed by our multimodal model in high-level associative cortices. Currently applied to perception and comprehension, our approach paves the way towards building an integrative model of representations in the human brain. Our code is available at https://github.com/facebookresearch/algonauts-2025.

Article 330

Title@2025-07-29 (2): LLMs Between the Nodes: Community Discovery Beyond Vectors

Title: LLMs Between the Nodes: Community Discovery Beyond Vectors

LLMs zwischen den Knoten: Community Discovery Beyond Vectors

节点之间的LLMs:除矢量之外的社区发现 2507.22955v1

Authors (2): Ekta Gujral, Apurva Sinha

Community detection in social network graphs plays a vital role in uncovering group dynamics, influence pathways, and the spread of information. Traditional methods focus primarily on graph structural properties, but recent advancements in Large Language Models (LLMs) open up new avenues for integrating semantic and contextual information into this task. In this paper, we present a detailed investigation into how various LLM-based approaches perform in identifying communities within social graphs. We introduce a two-step framework called CommLLM, which leverages the GPT-4o model along with prompt-based reasoning to fuse language model outputs with graph structure. Evaluations are conducted on six real-world social network datasets, measuring performance using key metrics such as Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), Variation of Information (VOI), and cluster purity. Our findings reveal that LLMs, particularly when guided by graph-aware strategies, can be successfully applied to community detection tasks in small to medium-sized graphs. We observe that the integration of instruction-tuned models and carefully engineered prompts significantly improves the accuracy and coherence of detected communities. These insights not only highlight the potential of LLMs in graph-based research but also underscore the importance of tailoring model interactions to the specific structure of graph data.
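
Two of the metrics named above, cluster purity and NMI, can be written out in a few lines. The sketch below is a plain-Python illustration and not the paper's evaluation code. It normalises mutual information by the arithmetic mean of the two entropies, which is one common convention; the abstract does not say which normalisation the authors use.

```python
import math
from collections import Counter


def purity(predicted, truth):
    """Fraction of nodes that belong to the majority true class of their cluster."""
    clusters = {}
    for p, t in zip(predicted, truth):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(counts.values()) for counts in clusters.values()) / len(truth)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(predicted, truth):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(truth)
    joint = Counter(zip(predicted, truth))
    count_p = Counter(predicted)
    count_t = Counter(truth)
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_p[p] * count_t[t]))
        for (p, t), c in joint.items()
    )
    denom = (entropy(predicted) + entropy(truth)) / 2.0
    return mutual_info / denom if denom > 0 else 1.0


# Hypothetical community assignments for six nodes.
detected = [0, 0, 0, 0, 1, 1]
ground_truth = ["a", "a", "a", "b", "b", "b"]
p_score = purity(detected, ground_truth)
n_score = nmi(detected, ground_truth)
```

Purity rewards many small clusters (one cluster per node scores 1.0), which is one reason it is usually reported alongside NMI or ARI and not on its own.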

Article 331

Title@2025-07-29 (2): Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

Title: Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

Kontrastive Test-Zeit-Zusammensetzung mehrerer LoRA-Modelle für die Bildgenerierung

图像生成多种LORA模型的反向测试时间构成 2403.19776v2

Authors (4): Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag

Low-Rank Adaptation (LoRA) has emerged as a powerful and popular technique for personalization, enabling efficient adaptation of pre-trained image generation models for specific tasks without comprehensive retraining. While employing individual pre-trained LoRA models excels at representing single concepts, such as those representing a specific dog or a cat, utilizing multiple LoRA models to capture a variety of concepts in a single image still poses a significant challenge. Existing methods often fall short, primarily because the attention mechanisms within different LoRA models overlap, leading to scenarios where one concept may be completely ignored (e.g., omitting the dog) or where concepts are incorrectly combined (e.g., producing an image of two cats instead of one cat and one dog). We introduce CLoRA, a training-free approach that addresses these limitations by updating the attention maps of multiple LoRA models at test-time, and leveraging the attention maps to create semantic masks for fusing latent representations. This enables the generation of composite images that accurately reflect the characteristics of each LoRA. Our comprehensive qualitative and quantitative evaluations demonstrate that CLoRA significantly outperforms existing methods in multi-concept image generation using LoRAs.

Article 332

Title@2025-07-29 (2): Explainability-Driven Feature Engineering for Mid-Term Electricity Load Forecasting in ERCOT’s SCENT Region

Title: Explainability-Driven Feature Engineering for Mid-Term Electricity Load Forecasting in ERCOT’s SCENT Region

Erklärbarkeitsgetriebene Feature-Engineering für mittelfristige Stromlastprognosen in der SCENT-Region von ERCOT

ERCOT地区中期电力负载预报的可解释性-变化式地貌工程 2507.22220v1

Authors (2): Abhiram Bhupatiraju, Sung Bum Ahn

Accurate load forecasting is essential to the operation of modern electric power systems. Given the sensitivity of electricity demand to weather variability and temporal dynamics, capturing non-linear patterns is essential for long-term planning. This paper presents a comparative analysis of four machine learning models (Linear Regression, XGBoost, LightGBM, and Long Short-Term Memory (LSTM)) for forecasting system-wide electricity load up to one year in advance. Midterm forecasting has been shown to be crucial for maintenance scheduling, resource allocation, financial forecasting, and market participation. The paper focuses on the use of a method called “Shapley Additive Explanations” (SHAP) to improve model explainability. SHAP enables the quantification of feature contributions, guiding informed feature engineering and improving both model transparency and forecasting accuracy.
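
To make "quantification of feature contributions" concrete, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library, which relies on faster approximations and model-specific algorithms, and the load model, feature names, and numbers are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            # Switch feature i from its baseline value to the instance value.
            current[i] = instance[i]
            output = model(current)
            contributions[i] += output - previous_output
            previous_output = output
    return [c / len(orderings) for c in contributions]


def toy_load_model(features):
    """Hypothetical load model with an interaction between the two features."""
    temperature, is_weekday = features
    return 50.0 + 2.0 * temperature + 5.0 * is_weekday + temperature * is_weekday


instance = [10.0, 1.0]  # a 10-degree weekday
baseline = [0.0, 0.0]   # reference point the explanation is measured against
phi = shapley_values(toy_load_model, instance, baseline)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features. The enumeration grows factorially with the number of features, so it is only practical for very small examples like this one.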

Article 333

Title@2025-07-29 (2): Representation biases: will we achieve complete understanding by analyzing representations?

Title: Representation biases: will we achieve complete understanding by analyzing representations?

Repräsentationsvoreingenommenheiten: Werden wir durch die Analyse von Repräsentationen ein vollständiges Verständnis erreichen?

代表性偏差:我们能否通过分析表述来实现完全理解? 2507.22216v1

Authors (4): Andrew Kyle Lampinen, Stephanie C. Y. Chan, Yuxuan Li, Katherine Hermann

A common approach in neuroscience is to study neural representations as a means to understand a system – increasingly, by relating the neural representations to the internal representations learned by computational models. However, recent work in machine learning (Lampinen, 2024) shows that learned feature representations may be biased to over-represent certain features, and represent others more weakly and less consistently. For example, simple (linear) features may be more strongly and more consistently represented than complex (highly nonlinear) features. These biases could pose challenges for achieving full understanding of a system through representational analysis. In this perspective, we illustrate these challenges – showing how feature representation biases can lead to strongly biased inferences from common analyses like PCA, regression, and RSA. We also present homomorphic encryption as a simple case study of the potential for strong dissociation between patterns of representation and computation. We discuss the implications of these results for representational comparisons between systems, and for neuroscience more generally.

Article 334

Title@2025-07-29 (2): Neural Autoregressive Modeling of Brain Aging

Title: Neural Autoregressive Modeling of Brain Aging

Neurale Autoregressive Modellierung des Gehirnalterns

脑老龄化神经自动递减建模 2507.22954v1

Authors (4): Ridvan Yesiloglu, Wei Peng, Md Tauhidul Islam, Ehsan Adeli

Brain aging synthesis is a critical task with broad applications in clinical and computational neuroscience. The ability to predict the future structural evolution of a subject’s brain from an earlier MRI scan provides valuable insights into aging trajectories. Yet, the high-dimensionality of data, subtle changes of structure across ages, and subject-specific patterns constitute challenges in the synthesis of the aging brain. To overcome these challenges, we propose NeuroAR, a novel brain aging simulation model based on generative autoregressive transformers. NeuroAR synthesizes the aging brain by autoregressively estimating the discrete token maps of a future scan from a convenient space of concatenated token embeddings of a previous and future scan. To guide the generation, it concatenates into each scale the subject’s previous scan, and uses its acquisition age and the target age at each block via cross-attention. We evaluate our approach on both the elderly population and adolescent subjects, demonstrating superior performance over state-of-the-art generative models, including latent diffusion models (LDM) and generative adversarial networks, in terms of image fidelity. Furthermore, we employ a pre-trained age predictor to further validate the consistency and realism of the synthesized images with respect to expected aging patterns. NeuroAR significantly outperforms key models, including LDM, demonstrating its ability to model subject-specific brain aging trajectories with high fidelity.

Article 335

Title@2025-07-29 (2): Intent-Aware Neural Query Reformulation for Behavior-Aligned Product Search

Title: Intent-Aware Neural Query Reformulation for Behavior-Aligned Product Search

Intent-Aware Neural Query Reformulation für verhaltensorientierte Produktsuche

用于行为自动产品搜索的内在软件元件神经查询重新校正 2507.22213v1

Authors (2): Jayanth Yetukuri, Ishita Khan

Understanding and modeling buyer intent is a foundational challenge in optimizing search query reformulation within the dynamic landscape of e-commerce search systems. This work introduces a robust data pipeline designed to mine and analyze large-scale buyer query logs, with a focus on extracting fine-grained intent signals from both explicit interactions and implicit behavioral cues. Leveraging advanced sequence mining techniques and supervised learning models, the pipeline systematically captures patterns indicative of latent purchase intent, enabling the construction of a high-fidelity, intent-rich dataset. The proposed framework facilitates the development of adaptive query rewrite strategies by grounding reformulations in inferred user intent rather than surface-level lexical signals. This alignment between query rewriting and underlying user objectives enhances both retrieval relevance and downstream engagement metrics. Empirical evaluations across multiple product verticals demonstrate measurable gains in precision-oriented relevance metrics, underscoring the efficacy of intent-aware reformulation. Our findings highlight the value of intent-centric modeling in bridging the gap between sparse user inputs and complex product discovery goals, and establish a scalable foundation for future research in user-aligned neural retrieval and ranking systems.

Article 336

Title@2025-07-29 (2): Graph-Based Uncertainty-Aware Self-Training with Stochastic Node Labeling

Title: Graph-Based Uncertainty-Aware Self-Training with Stochastic Node Labeling

Graphenbasiertes unsicheres Selbsttraining mit stochastischem Knoten-Etikettierung

以图形为基础的不确定性软件自训练与斯托卡节点标签 2503.22745v2

Authors (3): Tom Liu, Anna Wu, Chao Li

Self-training has become a popular semi-supervised learning technique for leveraging unlabeled data. However, the over-confidence of pseudo-labels remains a key challenge. In this paper, we propose a novel graph-based uncertainty-aware self-training (GUST) framework to combat over-confidence in node classification. Drawing inspiration from the uncertainty integration idea introduced by Wang et al. \cite{wang2024uncertainty}, our method largely diverges from previous self-training approaches by focusing on stochastic node labeling grounded in the graph topology. Specifically, we deploy a Bayesian-inspired module to estimate node-level uncertainty, incorporate these estimates into the pseudo-label generation process via an expectation-maximization (EM)-like step, and iteratively update both node embeddings and adjacency-based transformations. Experimental results on several benchmark graph datasets demonstrate that our GUST framework achieves state-of-the-art performance, especially in settings where labeled data is extremely sparse.

Article 337

Title@2025-07-29 (2): Uncertainty-Aware Graph Self-Training with Expectation-Maximization Regularization

Title: Uncertainty-Aware Graph Self-Training with Expectation-Maximization Regularization

Unsicheres Graphen-Selbst-Training mit Erwartungsmaximierung Regularisierung

具有预期-最大程度正规化的不确定性-软件图自我培训 2503.22744v2

Authors (3): Emily Wang, Michael Chen, Chao Li

In this paper, we propose a novel uncertainty-aware graph self-training approach for semi-supervised node classification. Our method introduces an Expectation-Maximization (EM) regularization scheme to incorporate an uncertainty mechanism during pseudo-label generation and model retraining. Unlike conventional graph self-training pipelines that rely on fixed pseudo-labels, our approach iteratively refines label confidences with an EM-inspired uncertainty measure. This ensures that the predictive model focuses on reliable graph regions while gradually incorporating ambiguous nodes. Inspired by prior work on uncertainty-aware self-training techniques \cite{wang2024uncertainty}, our framework is designed to handle noisy graph structures and feature spaces more effectively. Through extensive experiments on several benchmark graph datasets, we demonstrate that our method outperforms strong baselines by a margin of up to 2.5% in accuracy while maintaining lower variance in performance across multiple runs.

Article 338

Title@2025-07-29 (2): Adaptive State-Space Mamba for Real-Time Sensor Data Anomaly Detection

Title: Adaptive State-Space Mamba for Real-Time Sensor Data Anomaly Detection

Adaptive State-Space Mamba für Echtzeit-Sensordatenanomalienerkennung

用于实时传感器数据异常探测的适应性国家空间Mamba 2503.22743v2

Authors (2): Alice Zhang, Chao Li

State-space modeling has emerged as a powerful paradigm for sequence analysis in various tasks such as natural language processing, time-series forecasting, and signal processing. In this work, we propose an Adaptive State-Space Mamba (ASSM) framework for real-time sensor data anomaly detection. While state-space models have been previously employed for image processing applications (e.g., style transfer \cite{wang2024stylemamba}), our approach leverages the core idea of sequential hidden states to tackle a significantly different domain: detecting anomalies on streaming sensor data. In particular, we introduce an adaptive gating mechanism that dynamically modulates the hidden state update based on contextual and learned statistical cues. This design ensures that our model remains computationally efficient and scalable, even under rapid data arrival rates. Extensive experiments on real-world and synthetic sensor datasets demonstrate that our method achieves superior detection performance compared to existing baselines. Our approach is easily extensible to other time-series tasks that demand rapid and reliable detection capabilities.

Article 339

Title@2025-07-29 (2): Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data

Title: Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data

Gemeinsam besser: Kreuz- und Gelenkkovarianzen verbessern die Erkennung von Signalen in unterprobierten Daten

更好:交叉和共同变量加强未充分抽样数据中的信号可探测性 2507.22207v1

Authors (3): Arabind Swain, Sean Alexander Ridout, Ilya Nemenman

Many data-science applications involve detecting a shared signal between two high-dimensional variables. Using random matrix theory methods, we determine when such a signal can be detected and reconstructed from sample correlations, despite the background of correlations induced by sampling noise. We consider three different covariance matrices constructed from two high-dimensional variables: their individual self covariance, their cross covariance, and the self covariance of the concatenated (joint) variable, which incorporates the self and the cross correlation blocks. We observe the expected Baik, Ben Arous, and Péché detectability phase transition in all these covariance matrices, and we show that joint and cross covariance matrices always reconstruct the shared signal earlier than the self covariances. Whether the joint or the cross approach is better depends on the mismatch of dimensionalities between the variables. We discuss what these observations mean for choosing the right method for detecting linear correlations in data and how these findings may generalize to nonlinear statistical dependencies.
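The three covariance constructions named in the abstract can be illustrated with a short NumPy sketch. The data model below (one shared latent signal plus unit-variance noise), the dimensions, and the function name `covariance_matrices` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def covariance_matrices(X, Y):
    """Sample self, cross, and joint covariances for X (T x Nx) and Y (T x Ny)."""
    T = X.shape[0]
    Xc = X - X.mean(axis=0)          # center each variable
    Yc = Y - Y.mean(axis=0)
    self_x = Xc.T @ Xc / T           # Nx x Nx self covariance of X
    self_y = Yc.T @ Yc / T           # Ny x Ny self covariance of Y
    cross = Xc.T @ Yc / T            # Nx x Ny cross covariance
    Z = np.hstack([Xc, Yc])          # concatenated (joint) variable
    joint = Z.T @ Z / T              # (Nx+Ny) x (Nx+Ny), contains self and cross blocks
    return self_x, self_y, cross, joint

# Hypothetical toy data: a single shared latent signal buried in noise.
rng = np.random.default_rng(0)
T, Nx, Ny = 200, 50, 80
shared = rng.standard_normal((T, 1))
u = rng.standard_normal((1, Nx)) / np.sqrt(Nx)
v = rng.standard_normal((1, Ny)) / np.sqrt(Ny)
X = 2.0 * shared @ u + rng.standard_normal((T, Nx))
Y = 2.0 * shared @ v + rng.standard_normal((T, Ny))

self_x, self_y, cross, joint = covariance_matrices(X, Y)
top_self = np.linalg.eigvalsh(self_x)[-1]               # leading eigenvalue, self covariance
top_cross = np.linalg.svd(cross, compute_uv=False)[0]   # leading singular value, cross covariance
top_joint = np.linalg.eigvalsh(joint)[-1]               # leading eigenvalue, joint covariance
```

Comparing these leading values against the bulk of the corresponding noise-only spectrum is one way to probe detectability in such a toy setting; the thresholds themselves are derived in the paper and are not reproduced here.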

Article 340

Title@2025-07-29 (2): CTG-Insight: A Multi-Agent Interpretable LLM Framework for Cardiotocography Analysis and Classification

Title: CTG-Insight: A Multi-Agent Interpretable LLM Framework for Cardiotocography Analysis and Classification

CTG-Insight: Multi-Agent Interpretable LLM Framework für Kardiotokographie Analyse und Klassifizierung

CTG-In透视:多机构可解释LLM 心电图学分析和分类框架 2507.22205v1

Authors (3): Black Sun, Die, Hu

Remote fetal monitoring technologies are becoming increasingly common. Yet, most current systems offer limited interpretability, leaving expectant parents with raw cardiotocography (CTG) data that is difficult to understand. In this work, we present CTG-Insight, a multi-agent LLM system that provides structured interpretations of fetal heart rate (FHR) and uterine contraction (UC) signals. Drawing from established medical guidelines, CTG-Insight decomposes each CTG trace into five medically defined features: baseline, variability, accelerations, decelerations, and sinusoidal pattern, each analyzed by a dedicated agent. A final aggregation agent synthesizes the outputs to deliver a holistic classification of fetal health, accompanied by a natural language explanation. We evaluate CTG-Insight on the NeuroFetalNet Dataset and compare it against deep learning models and the single-agent LLM baseline. Results show that CTG-Insight achieves state-of-the-art accuracy (96.4%) and F1-score (97.8%) while producing transparent and interpretable outputs. This work contributes an interpretable and extensible CTG analysis framework.

Article 341

Title@2025-07-29 (2): KIX: A Knowledge and Interaction-Centric Metacognitive Framework for Task Generalization

Title: KIX: A Knowledge and Interaction-Centric Metacognitive Framework for Task Generalization

KIX: Ein Wissen und Interaktion-Zentrisches Metakognitives Framework für die Aufgabenverallgemeinerung

KIX: 任务一般化的知识和互动中心元化框架 2402.05346v3

Authors (2): Arun Kumar, Paul Schrater

People aptly exhibit general intelligence behaviors through flexible problem-solving and the ability to adapt to novel situations by reusing and applying high-level knowledge acquired over time. In contrast, artificial agents tend to be specialists, lacking such generalist behaviors. To bridge this gap, artificial agents will require understanding and exploiting critical structured knowledge representations. We introduce a metacognitive reasoning framework, Knowledge-Interaction-eXecution (KIX), and argue that interactions with objects, by leveraging a type space, facilitate the learning of transferable interaction concepts and promote generalization. This framework offers a principled approach for integrating knowledge into reinforcement learning and holds promise as an enabler for generalist behaviors in artificial intelligence, robotics, and autonomous systems.

Article 342

Title@2025-07-29 (2): Measuring Time-Series Dataset Similarity using Wasserstein Distance

Title: Measuring Time-Series Dataset Similarity using Wasserstein Distance

Messung der Zeitreihen-Datensätze Ähnlichkeit mit Wasserstein-Abstand

利用瓦瑟斯坦距离测量时间序列数据集的相似性 2507.22189v1

Authors (4): Hongjie Chen, Akshay Mehra, Josh Kimball, Ryan A. Rossi

The emergence of time-series foundation model research heightens the need to measure the (dis)similarity of time-series datasets. A time-series dataset similarity measure aids research in multiple ways, including model selection, finetuning, and visualization. In this paper, we propose a distribution-based method to measure time-series dataset similarity by leveraging the Wasserstein distance. We treat a time-series dataset as an empirical instantiation of an underlying multivariate normal distribution (MVN). The similarity between two time-series datasets is thus computed as the Wasserstein distance between their corresponding MVNs. Comprehensive experiments and visualization show the effectiveness of our approach. Specifically, we show how the Wasserstein distance helps identify similar time-series datasets and facilitates inference performance estimation of foundation models in both out-of-distribution and transfer learning evaluation, with high correlations between our proposed measure and the inference loss (>0.60).
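For two multivariate normals, the 2-Wasserstein distance has a well-known closed form: the squared distance is the squared Euclidean distance between the means plus tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}). The sketch below computes it with NumPy. How the paper turns a time-series dataset into samples for fitting the MVN is not stated in the abstract, so treating rows as samples and columns as (at least two) features in `dataset_w2` is an assumption, as are all function names.

```python
import numpy as np

def psd_sqrt(A):
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    A = (A + A.T) / 2.0                      # symmetrize against round-off
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)                # clip tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def gaussian_w2(mean1, cov1, mean2, cov2):
    """2-Wasserstein distance between N(mean1, cov1) and N(mean2, cov2)."""
    root2 = psd_sqrt(cov2)
    middle = psd_sqrt(root2 @ cov1 @ root2)
    bures = np.trace(cov1 + cov2 - 2.0 * middle)
    squared = np.sum((mean1 - mean2) ** 2) + max(bures, 0.0)
    return float(np.sqrt(squared))

def dataset_w2(A, B):
    """Assumed convention: rows are samples, columns are features (two or more)."""
    return gaussian_w2(A.mean(axis=0), np.cov(A, rowvar=False),
                       B.mean(axis=0), np.cov(B, rowvar=False))

# Toy usage on synthetic samples (hypothetical data).
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 3))
B = rng.standard_normal((500, 3)) + 1.0
distance = dataset_w2(A, B)
```

A smaller value indicates that the fitted Gaussians, and under this modeling assumption the datasets, are more similar.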

Article 343

Title@2025-07-29 (2): A Scalable Pipeline for Estimating Verb Frame Frequencies Using Large Language Models

Title: A Scalable Pipeline for Estimating Verb Frame Frequencies Using Large Language Models

Eine skalierbare Pipeline zur Schätzung von Verb Frame Frequenzen mit großen Sprachmodellen

使用大语言模型估算 Verb 框架频谱的可缩放管道 2507.22187v1

Authors (2): Adam M. Morgan, Adeen Flinker

We present an automated pipeline for estimating Verb Frame Frequencies (VFFs), the frequency with which a verb appears in particular syntactic frames. VFFs provide a powerful window into syntax in both human and machine language systems, but existing tools for calculating them are limited in scale, accuracy, or accessibility. We use large language models (LLMs) to generate a corpus of sentences containing 476 English verbs. Next, by instructing an LLM to behave like an expert linguist, we have it analyze the syntactic structure of the sentences in this corpus. This pipeline outperforms two widely used syntactic parsers across multiple evaluation datasets. Furthermore, it requires far fewer resources than manual parsing (the gold standard), thereby enabling rapid, scalable VFF estimation. Using the LLM parser, we produce a new VFF database with broader verb coverage, finer-grained syntactic distinctions, and explicit estimates of the relative frequencies of structural alternates commonly studied in psycholinguistics. The pipeline is easily customizable and extensible to new verbs, syntactic frames, and even other languages. We present this work as a proof of concept for automated frame frequency estimation, and release all code and data to support future research.

Article 344

Title@2025-07-29 (2): SourceSplice: Source Selection for Machine Learning Tasks

Title: SourceSplice: Source Selection for Machine Learning Tasks

SourceSplice: Auswahl der Quellen für Aufgaben des maschinellen Lernens

源代码Splice: 机器学习任务源选择 2507.22186v1

Authors (2): Ambarish Singh, Romila Pradhan

Data quality plays a pivotal role in the predictive performance of machine learning (ML) tasks, a challenge amplified by the deluge of data sources available in modern organizations. Prior work in data discovery largely focuses on metadata matching, semantic similarity, or identifying tables that should be joined to answer a particular query, but does not consider source quality for high performance of the downstream ML task. This paper addresses the problem of determining the best subset of data sources that must be combined to construct the underlying training dataset for a given ML task. We propose SourceGrasp and SourceSplice, frameworks designed to efficiently select a suitable subset of sources that maximizes the utility of the downstream ML model. Both algorithms rely on the core idea that sources (or their combinations) contribute differently to the task utility and must be judiciously chosen. While SourceGrasp utilizes a metaheuristic based on a greediness criterion and randomization, the SourceSplice framework presents a source selection mechanism inspired by gene splicing, a core concept used in protein synthesis. We empirically evaluate our algorithms on three real-world datasets and synthetic datasets and show that, with significantly fewer subset explorations, SourceSplice effectively identifies subsets of data sources leading to high task utility. We also conduct studies reporting the sensitivity of SourceSplice to the decision choices under several settings.

Article 345

Title@2025-07-29 (2): SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis

Title: SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis

SyncDiff: Synchronisierte Bewegung Diffusion für Multi-Body Mensch-Objekt-Interaktion Synthese

同步Diff: 用于多波人-物体相互作用合成的同步运动扩散 2412.20104v5

Authors (4): Wenkun He, Yun Liu, Ruitao Liu, Li Yi

Synthesizing realistic human-object interaction motions is a critical problem in VR/AR and human animation. Unlike the commonly studied scenarios involving a single human or hand interacting with one object, we address a more generic multi-body setting with arbitrary numbers of humans, hands, and objects. This complexity introduces significant challenges in synchronizing motions due to the high correlations and mutual influences among bodies. To address these challenges, we introduce SyncDiff, a novel method for multi-body interaction synthesis using a synchronized motion diffusion strategy. SyncDiff employs a single diffusion model to capture the joint distribution of multi-body motions. To enhance motion fidelity, we propose a frequency-domain motion decomposition scheme. Additionally, we introduce a new set of alignment scores to emphasize the synchronization of different body motions. SyncDiff jointly optimizes both data sample likelihood and alignment likelihood through an explicit synchronization strategy. Extensive experiments across four datasets with various multi-body configurations demonstrate the superiority of SyncDiff over existing state-of-the-art motion synthesis methods.

Article 346

Title@2025-07-29 (2): Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration

Title: Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration

Gestapelte SVD oder SVD gestapelt? Eine Random Matrix Theorie Perspektive auf Datenintegration

堆叠的 SVD 还是堆叠的 SVD ? 关于数据整合的随机矩阵理论视角 2507.22170v1

Authors (4): Tavor Z. Baharav, Phillip B. Nicol, Rafael A. Irizarry, Rong Ma

Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. A commonly used model assumes that the data matrices are noisy observations of low-rank matrices with a shared singular subspace. In this case, two primary methods have emerged for estimating this shared structure, which vary in how they integrate information across datasets. The first approach, termed Stack-SVD, concatenates all the datasets, and then performs a singular value decomposition (SVD). The second approach, termed SVD-Stack, first performs an SVD separately for each dataset, then aggregates the top singular vectors across these datasets, and finally computes a consensus amongst them. While these methods are widely used, they have not been rigorously studied in the proportional asymptotic regime, which is of great practical relevance in today’s world of increasing data size and dimensionality. This lack of theoretical understanding has led to uncertainty about which method to choose and limited the ability to fully exploit their potential. To address these challenges, we derive exact expressions for the asymptotic performance and phase transitions of these two methods and develop optimal weighting schemes to further improve both methods. Our analysis reveals that while neither method uniformly dominates the other in the unweighted case, optimally weighted Stack-SVD dominates optimally weighted SVD-Stack. We extend our analysis to accommodate multiple shared components, and provide practical algorithms for estimating optimal weights from data, offering theoretical guidance for method selection in practical data integration problems. Extensive numerical simulations and semi-synthetic experiments on genomic data corroborate our theoretical findings.
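The two unweighted estimators described in the abstract can be sketched in a few lines of NumPy. Several choices here are assumptions for illustration: that the shared subspace is the right singular subspace (so datasets are stacked by rows), that the consensus step in SVD-Stack is a second SVD of the stacked singular vectors, and the toy data model. The optimal weighting schemes derived in the paper are not shown.

```python
import numpy as np

def stack_svd(datasets, r):
    """Stack-SVD: concatenate all datasets, then take the top-r right singular vectors."""
    stacked = np.vstack(datasets)
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    return Vt[:r].T                                  # d x r estimate of the shared subspace

def svd_stack(datasets, r):
    """SVD-Stack: SVD each dataset, stack the top-r vectors, then take a consensus SVD."""
    tops = []
    for X in datasets:
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        tops.append(Vt[:r])                          # r x d block per dataset
    _, _, Vt = np.linalg.svd(np.vstack(tops), full_matrices=False)
    return Vt[:r].T

# Hypothetical toy data: three noisy rank-1 matrices sharing one right singular vector v.
rng = np.random.default_rng(0)
d, r = 30, 1
v = rng.standard_normal((d, r))
v /= np.linalg.norm(v)
datasets = []
for n in (40, 60, 80):
    u = rng.standard_normal((n, r))
    datasets.append(3.0 * u @ v.T + 0.1 * rng.standard_normal((n, d)))

overlap_stack = abs((stack_svd(datasets, r).T @ v).item())   # |<v_hat, v>| for Stack-SVD
overlap_svd = abs((svd_stack(datasets, r).T @ v).item())     # |<v_hat, v>| for SVD-Stack
```

In this high signal-to-noise toy setting both overlaps are close to 1; the regimes where the two methods differ are the subject of the paper's asymptotic analysis.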

Article 347

Title@2025-07-29 (2): Distributional Unlearning: Forgetting Distributions, Not Just Samples

Title: Distributional Unlearning: Forgetting Distributions, Not Just Samples

Verteilungsloses Lernen: Verteilungen vergessen, nicht nur Proben

分发的不学习:忘记分发,而不仅仅是抽样 2507.15112v2

Authors (3): Youssef Allouah, Rachid Guerraoui, Sanmi Koyejo

Machine unlearning seeks to remove unwanted information from trained models, initially at the individual-sample level, but increasingly at the level of entire sub-populations. In many deployments, models must delete whole topical domains to satisfy privacy, legal, or quality requirements, e.g., removing several users’ posts under GDPR or copyrighted web content. Existing unlearning tools remain largely sample-oriented, and straightforward point deletion often leaves enough residual signal for downstream learners to recover the unwanted domain. We introduce distributional unlearning, a data-centric, model-agnostic framework that asks: Given examples from an unwanted distribution and a retained distribution, what is the smallest set of points whose removal makes the edited dataset far from the unwanted domain yet close to the retained one? Using Kullback-Leibler divergence to quantify removal and preservation, we derive the exact Pareto frontier in the Gaussian case and prove that any model retrained on the edited data incurs log-loss shifts bounded by the divergence thresholds. We propose a simple distance-based selection rule satisfying these constraints with a quadratic reduction in deletion budget compared to random removal. Experiments on synthetic Gaussians, Jigsaw Toxic Comments, SMS spam, and CIFAR-10 show 15-72% fewer deletions than random, with negligible impact on retained performance.
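Because the abstract measures removal and preservation with Kullback-Leibler divergence and analyzes the Gaussian case, the standard closed-form KL divergence between two multivariate normals is a useful reference point. The sketch below implements only that textbook formula; the function name is an assumption, and the paper's selection rule and Pareto frontier are not reproduced.

```python
import numpy as np

def gaussian_kl(mean0, cov0, mean1, cov1):
    """KL( N(mean0, cov0) || N(mean1, cov1) ) for full-rank covariance matrices."""
    k = mean0.shape[0]
    diff = mean1 - mean0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))   # tr(cov1^{-1} cov0)
    maha = diff @ np.linalg.solve(cov1, diff)            # Mahalanobis term
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + maha - k + logdet1 - logdet0)

# One-dimensional check: KL( N(0, 1) || N(1, 1) ) equals 0.5.
kl_shift = gaussian_kl(np.array([0.0]), np.eye(1), np.array([1.0]), np.eye(1))
```

With Gaussians fitted to an edited dataset, to the unwanted data, and to the retained data, this function could be used to check how far the edited set sits from each, which is the kind of quantity the abstract's divergence thresholds refer to.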

Article 348

Title@2025-07-29 (2): When Truthful Representations Flip Under Deceptive Instructions?

Title: When Truthful Representations Flip Under Deceptive Instructions?

Wenn wahrheitsgetreue Darstellungen unter trügerische Anweisungen fallen?

当真相代表在欺骗性指令下翻转时? 2507.22149v1

Authors (7): Xianxuan Long, Yao Fu, Runchao Li, Mu Sheng, Haotian Yu, Xiaotian Han, Pan Li

Large language models (LLMs) tend to follow maliciously crafted instructions to generate deceptive responses, posing safety challenges. How deceptive instructions alter the internal representations of LLMs compared to truthful ones remains poorly understood beyond output analysis. To bridge this gap, we investigate when and how these representations "flip", such as from truthful to deceptive, under deceptive versus truthful/neutral instructions. Analyzing the internal representations of Llama-3.1-8B-Instruct and Gemma-2-9B-Instruct on a factual verification task, we find the model’s instructed True/False output is predictable via linear probes across all conditions based on the internal representation. Further, we use Sparse Autoencoders (SAEs) to show that deceptive instructions induce significant representational shifts compared to truthful/neutral representations (which are similar), concentrated in early-to-mid layers and detectable even on complex datasets. We also identify specific SAE features highly sensitive to deceptive instruction and use targeted visualizations to confirm distinct truthful/deceptive representational subspaces. Our findings expose feature- and layer-level signatures of deception, offering new insights for detecting and mitigating instructed dishonesty in LLMs.

Article 349

Title@2025-07-29 (2): MOSS: Multi-Objective Optimization for Stable Rule Sets

Title: MOSS: Multi-Objective Optimization for Stable Rule Sets

MOSS: Multi-Objektive Optimierung für stabile Regelsätze

MOSS: 稳定规则集的多目标优化 2506.08030v2

Authors (2): Brian Liu, Rahul Mazumder

We present MOSS, a multi-objective optimization framework for constructing stable sets of decision rules. MOSS incorporates three important criteria for interpretability: sparsity, accuracy, and stability, into a single multi-objective optimization framework. Importantly, MOSS allows a practitioner to rapidly evaluate the trade-off between accuracy and stability in sparse rule sets in order to select an appropriate model. We develop a specialized cutting plane algorithm in our framework to rapidly compute the Pareto frontier between these two objectives, and our algorithm scales to problem instances beyond the capabilities of commercial optimization solvers. Our experiments show that MOSS outperforms state-of-the-art rule ensembles in terms of both predictive performance and stability.
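To make the accuracy-stability trade-off concrete, the sketch below extracts the Pareto frontier from a list of candidate models scored on both objectives. It is a brute-force illustration only: the scores are made up, the function name is hypothetical, and this is not the specialized cutting plane algorithm the paper develops.

```python
def pareto_front(points):
    """Return the non-dominated (accuracy, stability) pairs; higher is better for both."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# Hypothetical (accuracy, stability) scores for four candidate rule sets.
candidates = [(0.90, 0.50), (0.80, 0.70), (0.70, 0.60), (0.85, 0.70)]
front = pareto_front(candidates)
```

Here `front` keeps `(0.85, 0.70)` and `(0.90, 0.50)`: the other two candidates are each matched or beaten on both objectives by another candidate, so a practitioner would only need to choose between the two remaining trade-offs.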

Article 350

Title@2025-07-29 (2): Automated Label Placement on Maps via Large Language Models

Title: Automated Label Placement on Maps via Large Language Models

Automatische Etikettenplatzierung auf Karten über große Sprachmodelle

通过大语言模型在地图上自动贴贴标签 2507.22952v1

Authors (2): Harry Shomer, Jiejun Xu

Label placement is a critical aspect of map design, serving as a form of spatial annotation that directly impacts clarity and interpretability. Despite its importance, label placement remains largely manual and difficult to scale, as existing automated systems struggle to integrate cartographic conventions, adapt to context, or interpret labeling instructions. In this work, we introduce a new paradigm for automatic label placement (ALP) that formulates the task as a data editing problem and leverages large language models (LLMs) for context-aware spatial annotation. To support this direction, we curate MAPLE, the first known benchmarking dataset for evaluating ALP on real-world maps, encompassing diverse landmark types and label placement annotations from open-source data. Our method retrieves labeling guidelines relevant to each landmark type leveraging retrieval-augmented generation (RAG), integrates them into prompts, and employs instruction-tuned LLMs to generate ideal label coordinates. We evaluate four open-source LLMs on MAPLE, analyzing both overall performance and generalization across different types of landmarks. This includes both zero-shot and instruction-tuned performance. Our results demonstrate that LLMs, when guided by structured prompts and domain-specific retrieval, can learn to perform accurate spatial edits, aligning the generated outputs with expert cartographic standards. Overall, our work presents a scalable framework for AI-assisted map finishing and demonstrates the potential of foundation models in structured data editing tasks. The code and data can be found at https://github.com/HarryShomer/MAPLE.

Article 351

Title@2025-07-29 (2): Foundation Models for Demand Forecasting via Dual-Strategy Ensembling

Title: Foundation Models for Demand Forecasting via Dual-Strategy Ensembling

Grundlagenmodelle für die Nachfrageprognose über Dual-Strategy-Assembling

通过双战略组合进行需求预测的基础模型 2507.22053v1

Authors (3): Wei Yang, Defu Cao, Yan Liu

Accurate demand forecasting is critical for supply chain optimization, yet remains difficult in practice due to hierarchical complexity, domain shifts, and evolving external factors. While recent foundation models offer strong potential for time series forecasting, they often suffer from architectural rigidity and limited robustness under distributional change. In this paper, we propose a unified ensemble framework that enhances the performance of foundation models for sales forecasting in real-world supply chains. Our method combines two complementary strategies: (1) Hierarchical Ensemble (HE), which partitions training and inference by semantic levels (e.g., store, category, department) to capture localized patterns; and (2) Architectural Ensemble (AE), which integrates predictions from diverse model backbones to mitigate bias and improve stability. We conduct extensive experiments on the M5 benchmark and three external sales datasets, covering both in-domain and zero-shot forecasting. Results show that our approach consistently outperforms strong baselines, improves accuracy across hierarchical levels, and provides a simple yet effective mechanism for boosting generalization in complex forecasting environments.

Article 352

Title@2025-07-29 (2): Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

Title: Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

Extrahieren von interpretierbaren Modellen aus Baumensembles: Computational and Statistical Perspectives

从树形集合中提取解释模型:计算和统计视角 2506.20114v3

Authors (3): Brian Liu, Rahul Mazumder, Peter Radchenko

Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.

Article 353

Title@2025-07-29 (2): Weight-Parameterization in Continuous Time Deep Neural Networks for Surrogate Modeling

Title: Weight-Parameterization in Continuous Time Deep Neural Networks for Surrogate Modeling

Gewicht-Parameterisierung in kontinuierlichen Zeittiefen Neuronalen Netzwerken für Surrogatmodellierung

用于代用建模的连续时间深心神经网络中的重量光度计 2507.22045v1

Authors (3): Haley Rosso, Lars Ruthotto, Khachik Sargsyan

Continuous-time deep learning models, such as neural ordinary differential equations (ODEs), offer a promising framework for surrogate modeling of complex physical systems. A central challenge in training these models lies in learning expressive yet stable time-varying weights, particularly under computational constraints. This work investigates weight parameterization strategies that constrain the temporal evolution of weights to a low-dimensional subspace spanned by polynomial basis functions. We evaluate both monomial and Legendre polynomial bases within neural ODE and residual network (ResNet) architectures under discretize-then-optimize and optimize-then-discretize training paradigms. Experimental results across three high-dimensional benchmark problems show that Legendre parameterizations yield more stable training dynamics, reduce computational cost, and achieve accuracy comparable to or better than both monomial parameterizations and unconstrained weight models. These findings elucidate the role of basis choice in time-dependent weight parameterization and demonstrate that using orthogonal polynomial bases offers a favorable tradeoff between model expressivity and training efficiency.
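The idea of confining time-varying weights to a span of polynomial basis functions can be sketched as W(t) = sum_k P_k(s) C_k, where P_k is the k-th Legendre polynomial, s = 2t/T - 1 maps the time interval [0, T] to [-1, 1], and the coefficient matrices C_k are the trainable parameters. The shapes, names, and the rescaling convention below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_weights(coeffs, t, t_final=1.0):
    """Evaluate W(t) = sum_k P_k(2t/t_final - 1) * coeffs[k].

    coeffs has shape (K + 1, out_dim, in_dim); t is a scalar or 1-D array in [0, t_final].
    Returns an array of shape (n_times, out_dim, in_dim).
    """
    s = 2.0 * np.atleast_1d(np.asarray(t, dtype=float)) / t_final - 1.0
    basis = legendre.legvander(s, coeffs.shape[0] - 1)       # (n_times, K + 1)
    return np.tensordot(basis, coeffs, axes=([1], [0]))      # contract over the basis index

# Hypothetical setup: degree-2 basis for a 4 x 4 weight matrix, evaluated at 5 time points.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((3, 4, 4))
times = np.linspace(0.0, 1.0, 5)
W = legendre_weights(coeffs, times)
```

However many time steps the discretization uses, only the `K + 1` coefficient matrices are learned, which is where the low-dimensional constraint comes from. A monomial basis would swap `legvander` for powers of `s`.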

Article 354

Title@2025-07-29 (2): Compton Form Factor Extraction using Quantum Deep Neural Networks

Title: Compton Form Factor Extraction using Quantum Deep Neural Networks

Compton Form Factor Extraction mit Hilfe von Quantum Deep Neural Networks

使用量子深神经网络抽取 Compton 窗体系数 2504.15458v2

Authors (2): Brandon B. Le, Dustin Keller

We present an extraction of Compton Form Factors (CFFs) from Deeply Virtual Compton Scattering (DVCS) experiments conducted at Thomas Jefferson National Accelerator Facility, utilizing Quantum Deep Neural Networks (QDNNs). The analysis employs the standard Belitsky, Kirchner, and Müller formalism at twist-two, complemented by a fitting procedure designed to minimize model dependence in a manner analogous to conventional local fits. A pseudodata extraction test of the CFFs is performed using both Classical Deep Neural Networks (CDNNs) and QDNNs, with a detailed comparative analysis. Results indicate that QDNNs can outperform CDNNs in particular cases, offering enhanced predictive accuracy and precision even with limited model complexity. Motivated by this, we develop a metric to quantify the extent of the quantum advantage based on characteristics of DVCS experimental data. These findings underscore the promising role of QDNNs in advancing future investigations into multidimensional parton distributions and hadronic physics.

Article 355

Title@2025-07-29 (2): Structure-Informed Deep Reinforcement Learning for Inventory Management

Title: Structure-Informed Deep Reinforcement Learning for Inventory Management

Strukturinformiertes Deep Verstärkungslernen für das Bestandsmanagement

为库存管理进行结构化深强化学习 2507.22040v1

Authors (7): Alvaro Maggiar, Sohrab Andaz, Akhil Bagaria, Carson Eisenach, Dean Foster, Omer Gottesman, Dominique Perrault-Joncas

This paper investigates the application of Deep Reinforcement Learning (DRL) to classical inventory management problems, with a focus on practical implementation considerations. We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios including multi-period systems with lost sales (with and without lead times), perishable inventory management, dual sourcing, and joint inventory procurement and removal. The DRL approach learns policies across products using only historical information that would be available in practice, avoiding unrealistic assumptions about demand distributions or access to distribution parameters. We demonstrate that our generic DRL implementation performs competitively against or outperforms established benchmarks and heuristics across these diverse settings, while requiring minimal parameter tuning. Through examination of the learned policies, we show that the DRL approach naturally captures many known structural properties of optimal policies derived from traditional operations research methods. To further improve policy performance and interpretability, we propose a Structure-Informed Policy Network technique that explicitly incorporates analytically-derived characteristics of optimal policies into the learning process. This approach can help interpretability and add robustness to the policy in out-of-sample performance, as we demonstrate in an example with realistic demand data. Finally, we provide an illustrative application of DRL in a non-stationary setting. Our work bridges the gap between data-driven learning and analytical insights in inventory management while maintaining practical applicability.

Article 356

Title@2025-07-29 (2): SAKE: Steering Activations for Knowledge Editing

Title: SAKE: Steering Activations for Knowledge Editing

SAKE: Steuerung von Aktivierungen für die Wissensbearbeitung

战略:知识编辑指导活动 2503.01751v2

Authors (4): Marco Scialanga, Thibault Laugel, Vincent Grari, Marcin Detyniecki

As Large Language Models have been shown to memorize real-world facts, the need to update this knowledge in a controlled and efficient manner arises. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including their lack of contextual robustness and their failure to generalize to logical implications related to the fact. To overcome these issues, we propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM behavior over a whole fact-related distribution, defined as paraphrases and logical implications. Several numerical experiments demonstrate the effectiveness of this method: SAKE is thus able to perform more robust edits than its existing counterparts.

Article 357

Title@2025-07-29 (2): Supervised Quantum Image Processing

Title: Supervised Quantum Image Processing

Überwachte Quantenbildverarbeitung

监督量子图像处理 2507.22039v1

Authors (4): Marco Parigi, Mehran Khosrojerdi, Filippo Caruso, Leonardo Banchi

In the era of big data and artificial intelligence, the increasing volume of data and the demand to solve more and more complex computational challenges are two driving forces for improving the efficiency of data storage, processing and analysis. Quantum image processing (QIP) is an interdisciplinary field between quantum information science and image processing, which has the potential to alleviate some of these challenges by leveraging the power of quantum computing. In this work, we compare and examine the compression properties of four different Quantum Image Representations (QImRs): namely, Tensor Network Representation (TNR), Flexible Representation of Quantum Image (FRQI), Novel Enhanced Quantum Representation (NEQR), and Quantum Probability Image Encoding (QPIE). Our simulations show that FRQI achieves higher compression of image information than TNR, NEQR, and QPIE. Furthermore, we investigate the trade-off between accuracy and memory in binary classification problems, evaluating the performance of quantum kernels based on QImRs compared to the classical linear kernel. Our results indicate that quantum kernels provide comparable average classification accuracy but require exponentially fewer resources for image storage.
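Two of the representations listed above have compact amplitude-vector definitions that are easy to simulate classically. In FRQI, each pixel intensity becomes an angle on one extra color qubit, giving the state (1/sqrt(N)) sum_i (cos(theta_i)|0> + sin(theta_i)|1>)|i> for N pixels. In QPIE, the normalized pixel values are used directly as amplitudes. The sketch below builds both state vectors with NumPy. Mapping intensities in [0, 1] to theta = intensity * pi / 2 and placing the color qubit as the most significant index are conventions assumed here, and the function names are hypothetical.

```python
import numpy as np

def frqi_state(image):
    """FRQI amplitudes for an image with intensities in [0, 1]; length is 2 * num_pixels."""
    pixels = np.asarray(image, dtype=float).ravel()
    theta = pixels * np.pi / 2.0                       # assumed intensity-to-angle mapping
    # First half: color qubit |0> (cos terms); second half: color qubit |1> (sin terms).
    return np.concatenate([np.cos(theta), np.sin(theta)]) / np.sqrt(pixels.size)

def qpie_state(image):
    """QPIE amplitudes: pixel values normalized to unit Euclidean norm (image must be nonzero)."""
    pixels = np.asarray(image, dtype=float).ravel()
    return pixels / np.linalg.norm(pixels)

# Hypothetical 4 x 4 grayscale image: 16 pixels.
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(4, 4))
frqi = frqi_state(image)
qpie = qpie_state(image)
frqi_qubits = int(np.log2(frqi.size))   # 4 position qubits + 1 color qubit
qpie_qubits = int(np.log2(qpie.size))   # 4 position qubits
```

Both encodings store a 16-pixel image in 4 or 5 qubits, which illustrates the logarithmic qubit count behind the storage claim in the abstract; the compression and kernel comparisons themselves are the paper's results and are not reproduced here.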

Article 358

Title@2025-07-29 (2): UserBench: An Interactive Gym Environment for User-Centric Agents

Title: UserBench: An Interactive Gym Environment for User-Centric Agents

UserBench: Eine interaktive Gym-Umgebung für User-Centric-Agenten

用户 Bench: 用户中心代理器的交互式 Gym 环境 2507.22034v1

Authors (12): Cheng Qian, Zuxin Liu, Akshara Prabhakar, Zhiwei Liu, Jianguo Zhang, Haolin Chen, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang

Large Language Model (LLM)-based agents have made impressive progress in reasoning and tool use, enabling them to solve complex tasks. However, their ability to proactively collaborate with users, especially when goals are vague, evolving, or indirectly expressed, remains underexplored. To address this gap, we introduce UserBench, a user-centric benchmark designed to evaluate agents in multi-turn, preference-driven interactions. UserBench features simulated users who start with underspecified goals and reveal preferences incrementally, requiring agents to proactively clarify intent and make grounded decisions with tools. Our evaluation of leading open- and closed-source LLMs reveals a significant disconnect between task completion and user alignment. For instance, models provide answers that fully align with all user intents only 20% of the time on average, and even the most advanced models uncover fewer than 30% of all user preferences through active interaction. These results highlight the challenges of building agents that are not just capable task executors, but true collaborative partners. UserBench offers an interactive environment to measure and advance this critical capability.

Article 359

Title@2025-07-29 (2): Classification of Honey Botanical and Geographical Sources using Mineral Profiles and Machine Learning

Title: Classification of Honey Botanical and Geographical Sources using Mineral Profiles and Machine Learning

Klassifizierung von Honig Botanical und Geografische Quellen mit Mineralprofilen und maschinellem Lernen

利用矿物概况和机器学习对蜂蜜植物和地理来源进行分类 2507.22032v1

Authors (2): Mokhtar Al-Awadhi, Ratnadeep Deshmukh

This paper proposes a machine learning-based approach for identifying honey floral and geographical sources using mineral element profiles. The proposed method comprises two steps: preprocessing and classification. The preprocessing phase involves missing-value treatment and data normalization. In the classification phase, we employ various supervised classification models for discriminating between six botanical sources and 13 geographical origins of honey. We test the classifiers’ performance on a publicly available honey mineral element dataset. The dataset contains mineral element profiles of honeys from various floral and geographical origins. Results show that mineral element content in honey provides discriminative information useful for classifying honey botanical and geographical sources. Results also show that the Random Forests (RF) classifier obtains the best performance on this dataset, achieving a cross-validation accuracy of 99.30% for classifying honey botanical origins and 98.01% for classifying honey geographical origins.

Article 360

Title@2025-07-29 (2): Persistent Backdoor Attacks in Continual Learning

Title: Persistent Backdoor Attacks in Continual Learning

Persistente Hintertürangriffe im kontinuierlichen Lernen

持续学习中的持续后门攻击 2409.13864v3

Authors (3): Zhen Guo, Abhinav Kumar, Reza Tourani

Backdoor attacks pose a significant threat to neural networks, enabling adversaries to manipulate model outputs on specific inputs, often with devastating consequences, especially in critical applications. While backdoor attacks have been studied in various contexts, little attention has been given to their practicality and persistence in continual learning, particularly in understanding how the continual updates to model parameters, as new data distributions are learned and integrated, impact the effectiveness of these attacks over time. To address this gap, we introduce two persistent backdoor attacks, Blind Task Backdoor and Latent Task Backdoor, each leveraging minimal adversarial influence. Our blind task backdoor subtly alters the loss computation without direct control over the training process, while the latent task backdoor influences only a single task’s training, with all other tasks trained benignly. We evaluate these attacks under various configurations, demonstrating their efficacy with static, dynamic, physical, and semantic triggers. Our results show that both attacks consistently achieve high success rates across different continual learning algorithms, while effectively evading state-of-the-art defenses, such as SentiNet and I-BAU.

Article 361

Title@2025-07-29 (2): Exploring the Stratified Space Structure of an RL Game with the Volume Growth Transform

Title: Exploring the Stratified Space Structure of an RL Game with the Volume Growth Transform

Erforschung der Stratifizierten Raumstruktur eines RL-Spiels mit der Volume Growth Transform

探索与量增长变换的RL游戏的分流空间结构 2507.22010v1

Authors (6): Justin Curry, Brennan Lagasse, Ngoc B. Lam, Gregory Cox, David Rosenbluth, Alberto Speranzon

In this work, we explore the structure of the embedding space of a transformer model trained for playing a particular reinforcement learning (RL) game. Specifically, we investigate how a transformer-based Proximal Policy Optimization (PPO) model embeds visual inputs in a simple environment where an agent must collect “coins” while avoiding dynamic obstacles consisting of “spotlights.” By adapting Robinson et al.’s study of the volume growth transform for LLMs to the RL setting, we find that the token embedding space for our visual coin collecting game is also not a manifold, and is better modeled as a stratified space, where local dimension can vary from point to point. We further strengthen Robinson’s method by proving that fairly general volume growth curves can be realized by stratified spaces. Finally, we carry out an analysis that suggests that as an RL agent acts, its latent representation alternates between periods of low local dimension, while following a fixed sub-strategy, and bursts of high local dimension, where the agent achieves a sub-goal (e.g., collecting an object) or where the environmental complexity increases (e.g., more obstacles appear). Consequently, our work suggests that the distribution of dimensions in a stratified latent space may provide a new geometric indicator of complexity for RL games.

Article 362

Title@2025-07-29 (2): An $\tilde{O}$ptimal Differentially Private Learner for Concept Classes with VC Dimension 1

Title: An $\tilde{O}$ptimal Differentially Private Learner for Concept Classes with VC Dimension 1

Ein $\tilde{O}$ptimal Differential Private Learner für Konzeptklassen mit VC Dimension 1

$\tilde{O}$ptimal:用于 VC 维数为 1 的概念类的差分隐私学习器 2505.06581v2

Authors (1): Chao Yan

We present the first nearly optimal differentially private PAC learner for any concept class with VC dimension 1 and Littlestone dimension $d$. Our algorithm achieves the sample complexity of $\tilde{O}_{\varepsilon,\delta,\alpha,\delta}(\log^* d)$, nearly matching the lower bound of $\Omega(\log^* d)$ proved by Alon et al. [STOC19]. Prior to our work, the best known upper bound was $\tilde{O}(VC\cdot d^5)$ for general VC classes, as shown by Ghazi et al. [STOC21].
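To give a feel for how slowly the $\log^* d$ bound grows, here is a small sketch of the iterated logarithm: the number of times the logarithm must be applied before the value drops to 1 or below. Base 2 is an assumption made for illustration; the choice of base only changes the result by a small additive amount.

```python
import math

def log_star(x):
    """Iterated base-2 logarithm: how many times log2 is applied until x <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

for d in (2, 16, 65536):
    print(d, log_star(d))
```

For d = 65536 the loop runs through 16, 4, 2, 1, so the value is 4, whereas a polynomial bound such as $d^5$ is already 2^80 at that point.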

Article 363

Title@2025-07-29 (2): Staining and locking computer vision models without retraining

Title: Staining and locking computer vision models without retraining

Staining und Verriegelung von Computer Vision-Modelle ohne Umschulung

不经再培训而将计算机视觉模型固定和封闭 2507.22000v1

Authors (5): Oliver J. Sutton, Qinghua Zhou, George Leete, Alexander N. Gorban, Ivan Y. Tyukin

We introduce new methods of staining and locking computer vision models, to protect their owners’ intellectual property. Staining, also known as watermarking, embeds secret behaviour into a model which can later be used to identify it, while locking aims to make a model unusable unless a secret trigger is inserted into input images. Unlike existing methods, our algorithms can be used to stain and lock pre-trained models without requiring fine-tuning or retraining, and come with provable, computable guarantees bounding their worst-case false positive rates. The stain and lock are implemented by directly modifying a small number of the model’s weights and have minimal impact on the (unlocked) model’s performance. Locked models are unlocked by inserting a small ‘trigger patch’ into the corner of the input image. We present experimental results showing the efficacy of our methods and demonstrating their practical performance on a variety of computer vision models.

Article 364

Title@2025-07-29 (2): Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation

Title: Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation

Lehre mich zu Trick: Erforschen von zweifelhafter Übertragbarkeit durch Wissensdestillation

教我变作:探索通过知识蒸馏来进行逆向转让 2507.21992v1

Authors (3): Siddhartha Pradhan, Shikshya Shiwakoti, Neha Bathuri

We investigate whether knowledge distillation (KD) from multiple heterogeneous teacher models can enhance the generation of transferable adversarial examples. A lightweight student model is trained using two KD strategies: curriculum-based switching and joint optimization, with ResNet50 and DenseNet-161 as teachers. The trained student is then used to generate adversarial examples using FG, FGS, and PGD attacks, which are evaluated against a black-box target model (GoogLeNet). Our results show that student models distilled from multiple teachers achieve attack success rates comparable to ensemble-based baselines, while reducing adversarial example generation time by up to a factor of six. An ablation study further reveals that lower temperature settings and the inclusion of hard-label supervision significantly enhance transferability. These findings suggest that KD can serve not only as a model compression technique but also as a powerful tool for improving the efficiency and effectiveness of black-box adversarial attacks.
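The roles of temperature and hard-label supervision mentioned above can be made concrete with a generic distillation loss. The sketch below uses the common formulation (a temperature-softened KL term plus a cross-entropy term on the true label) and simply averages the teachers' softened distributions. That averaging, the function names, and the default values are assumptions for illustration; the paper's curriculum-based switching and joint optimization strategies are not reproduced here.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits_list, label, temperature=2.0, alpha=0.5):
    """alpha * hard-label cross-entropy + (1 - alpha) * T^2 * KL(teachers || student)."""
    # Assumption: combine several teachers by averaging their softened outputs.
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n_classes = len(student_logits)
    avg_teacher = [sum(p[c] for p in teacher_probs) / len(teacher_probs)
                   for c in range(n_classes)]
    student_soft = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(avg_teacher, student_soft) if p > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * hard + (1 - alpha) * temperature ** 2 * kl

teachers = [[3.0, 0.0, 0.0], [2.5, 0.2, 0.0]]
print(kd_loss([3.0, 0.0, 0.0], teachers, label=0))
print(kd_loss([0.0, 3.0, 0.0], teachers, label=0))
```

A lower `temperature` sharpens the teacher distribution and a larger `alpha` gives more weight to the hard label, which are the two knobs the ablation study refers to.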

Article 365

Title@2025-07-29 (2): Higher-Order Kuramoto Oscillator Network for Dense Associative Memory

Title: Higher-Order Kuramoto Oscillator Network for Dense Associative Memory

Höhere Ordnung Kuramoto Oszillator Netzwerk für Dense Assoziative Speicher

高端仓本聚合内存振动加速器网络 2507.21984v1

Authors (2): Jona Nagerl, Natalia G. Berloff

Networks of phase oscillators can serve as dense associative memories if they incorporate higher-order coupling beyond the classical Kuramoto model’s pairwise interactions. Here we introduce a generalized Kuramoto model with combined second-harmonic (pairwise) and fourth-harmonic (quartic) coupling, inspired by dense Hopfield memory theory. Using mean-field theory and its dynamical approximation, we obtain a phase diagram for the dense associative memory model that exhibits a tricritical point at which the continuous onset of memory retrieval is supplanted by a discontinuous, hysteretic transition. In the quartic-dominated regime, the system supports bistable phase-locked states corresponding to stored memory patterns, with a sizable energy barrier between memory and incoherent states. We analytically determine this bistable region and show that the escape time from a memory state (due to noise) grows exponentially with network size, indicating robust storage. Extending the theory to finite memory load, we show that higher-order couplings achieve superlinear scaling of memory capacity with system size, far exceeding the limit of pairwise-only oscillators. Large-scale simulations of the oscillator network confirm our theoretical predictions, demonstrating rapid pattern retrieval and robust storage of many phase patterns. These results bridge Kuramoto synchronization with modern Hopfield memories, pointing toward experimental realization of high-capacity, analog associative memory in oscillator systems.

Article 366

Title@2025-07-29 (2): Improving Generative Ad Text on Facebook using Reinforcement Learning

Title: Improving Generative Ad Text on Facebook using Reinforcement Learning

Verbesserung des generativen Ad-Texts auf Facebook mit Verstärkungslernen

利用强化学习改善脸书上的创创创广告 2507.21983v1

Authors (5): Daniel R. Jiang, Alex Nikulkov, Yu-Chia Chen, Yang Bai, Zheqing Zhu

Generative artificial intelligence (AI), in particular large language models (LLMs), is poised to drive transformative economic change. LLMs are pre-trained on vast text data to learn general language patterns, but a subsequent post-training phase is critical to align them for specific real-world tasks. Reinforcement learning (RL) is the leading post-training technique, yet its economic impact remains largely underexplored and unquantified. We examine this question through the lens of the first deployment of an RL-trained LLM for generative advertising on Facebook. Integrated into Meta’s Text Generation feature, our model, “AdLlama,” powers an AI tool that helps advertisers create new variations of human-written ad text. To train this model, we introduce reinforcement learning with performance feedback (RLPF), a post-training method that uses historical ad performance data as a reward signal. In a large-scale 10-week A/B test on Facebook spanning nearly 35,000 advertisers and 640,000 ad variations, we find that AdLlama improves click-through rates by 6.7% (p=0.0296) compared to a supervised imitation model trained on curated ads. This represents a substantial improvement in advertiser return on investment on Facebook. We also find that advertisers who used AdLlama generated more ad variations, indicating higher satisfaction with the model’s outputs. To our knowledge, this is the largest study to date on the use of generative AI in an ecologically valid setting, offering an important data point quantifying the tangible impact of RL post-training. Furthermore, the results show that RLPF is a promising and generalizable approach for metric-driven post-training that bridges the gap between highly capable language models and tangible outcomes.

Article 367

Title@2025-07-29 (2): Thou Shalt Not Prompt: Zero-Shot Human Activity Recognition in Smart Homes via Language Modeling of Sensor Data & Activities

Title: Thou Shalt Not Prompt: Zero-Shot Human Activity Recognition in Smart Homes via Language Modeling of Sensor Data & Activities

Thou Shalt Not Prompt: Zero-Shot menschliche Aktivitätserkennung in Smart Homes durch Sprachmodellierung von Sensordaten & Aktivitäten

” Thowel “ 不迅速:通过感应数据和活动语言建模模拟,在智能家庭内零点热人类活动确认 2507.21964v1

Authors (2): Sourish Gunesh Dhekane, Thomas Ploetz

Developing zero-shot human activity recognition (HAR) methods is a critical direction in smart home research – considering its impact on making HAR systems work across smart homes having diverse sensing modalities, layouts, and activities of interest. The state-of-the-art solutions along this direction are based on generating natural language descriptions of the sensor data and feeding them via a carefully crafted prompt to the LLM to perform classification. Despite their performance guarantees, such “prompt-the-LLM” approaches carry several risks, including privacy invasion, reliance on an external service, and inconsistent predictions due to version changes, making a case for alternative zero-shot HAR methods that do not require prompting the LLMs. In this paper, we propose one such solution that models sensor data and activities using natural language, leveraging its embeddings to perform zero-shot classification and thereby bypassing the need to prompt the LLMs for activity predictions. The impact of our work lies in presenting a detailed case study on six datasets, highlighting how language modeling can bolster HAR systems in zero-shot recognition.
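The embedding-based alternative to prompting can be sketched as nearest-neighbour matching in an embedding space: describe the sensor window in words, embed it, and pick the activity whose description is most similar. In the sketch below a bag-of-words vector stands in for a real sentence encoder, and the activity descriptions are invented; both are assumptions for illustration and not the paper's pipeline.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a learned sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def zero_shot_classify(sensor_description, activity_descriptions):
    """Return the best-matching activity name and all similarity scores."""
    query = embed(sensor_description)
    scores = {name: cosine(query, embed(desc))
              for name, desc in activity_descriptions.items()}
    return max(scores, key=scores.get), scores

activities = {
    "cooking": "person in the kitchen using the stove",
    "sleeping": "person in the bedroom lying on the bed at night",
}
best, scores = zero_shot_classify(
    "motion sensor fired in the kitchen near the stove", activities)
print(best, scores)
```

No activity-specific training happens here: adding a new activity only requires adding a new description, which is what makes the setup zero-shot.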

Article 368

Title@2025-07-29 (2): SLA-Centric Automated Algorithm Selection Framework for Cloud Environments

Title: SLA-Centric Automated Algorithm Selection Framework for Cloud Environments

SLA-Centric automatisierte Algorithmenauswahl-Framework für Cloud-Umgebungen

SLA-Centric 云层环境自动测算选择框架 2507.21963v1

Authors (3): Siana Rizwan, Tasnim Ahmed, Salimur Choudhury

Cloud computing offers on-demand resource access, regulated by Service-Level Agreements (SLAs) between consumers and Cloud Service Providers (CSPs). SLA violations can impact efficiency and CSP profitability. In this work, we propose an SLA-aware automated algorithm-selection framework for combinatorial optimization problems in resource-constrained cloud environments. The framework uses an ensemble of machine learning models to predict performance and rank algorithm-hardware pairs based on SLA constraints. We also apply our framework to the 0-1 knapsack problem. We curate a dataset comprising instance-specific features along with memory usage, runtime, and optimality gap for 6 algorithms. As an empirical benchmark, we evaluate the framework on both classification and regression tasks. Our ablation study explores the impact of hyperparameters, learning approaches, the effectiveness of large language models in regression, and SHAP-based interpretability.

Article 369

Title@2025-07-29 (2): Ensuring Medical AI Safety: Interpretability-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data

Title: Ensuring Medical AI Safety: Interpretability-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data

Gewährleistung der Sicherheit von medizinischer KI: Interpretationsgestützte Erkennung und Minderung von sauberen Modellverhalten und zugehörigen Daten

确保医疗AI安全:可解释性-驱动性探测和减少污秽模型行为和相关数据 2501.13818v2

Authors (4): Frederik Pahde, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek

Deep neural networks are increasingly employed in high-stakes medical applications, despite their tendency for shortcut learning in the presence of spurious correlations, which can have potentially fatal consequences in practice. Whereas a multitude of works address either the detection or mitigation of such shortcut behavior in isolation, the Reveal2Revise approach provides a comprehensive bias mitigation framework combining these steps. However, effectively addressing these biases often requires substantial labeling efforts from domain experts. In this work, we review the steps of the Reveal2Revise framework and enhance it with semi-automated interpretability-based bias annotation capabilities. This includes methods for the sample- and feature-level bias annotation, providing valuable information for bias mitigation methods to unlearn the undesired shortcut behavior. We show the applicability of the framework using four medical datasets across two modalities, featuring controlled and real-world spurious correlations caused by data artifacts. We successfully identify and mitigate these biases in VGG16, ResNet50, and contemporary Vision Transformer models, ultimately increasing their robustness and applicability for real-world medical tasks. Our code is available at https://github.com/frederikpahde/medical-ai-safety.

Article 370

Title@2025-07-29 (2): DeepGo: Predictive Directed Greybox Fuzzing

Title: DeepGo: Predictive Directed Greybox Fuzzing

DeepGo: Predictive Directed Greybox Fuzzing

深度Go:预测方向灰盒模糊 2507.21952v1

Authors (6): Peihong Lin, Pengfei Wang, Xu Zhou, Wei Xie, Gen Zhang, Kai Lu

State-of-the-art directed grey-box fuzzing (DGF) techniques redefine and optimize the fitness metric to reach the target sites precisely and quickly. However, optimizations for fitness metrics are mainly based on heuristic algorithms, which usually rely on historical execution information and lack foresight on paths that have not been exercised yet. Thus, those hard-to-execute paths with complex constraints would hinder DGF from reaching the targets, making DGF less efficient. In this paper, we propose DeepGo, a predictive directed grey-box fuzzer that can combine historical and predicted information to steer DGF to reach the target site via an optimal path. We first propose the path transition model, which models DGF as a process of reaching the target site through specific path transition sequences. The new seed generated by mutation would cause the path transition, and the path corresponding to the high-reward path transition sequence indicates a high likelihood of reaching the target site through it. Then, to predict the path transitions and the corresponding rewards, we use deep neural networks to construct a Virtual Ensemble Environment (VEE), which gradually imitates the path transition model and predicts the rewards of path transitions that have not been taken yet. To determine the optimal path, we develop a Reinforcement Learning for Fuzzing (RLF) model to generate the transition sequences with the highest sequence rewards. The RLF model can combine historical and predicted path transitions to generate the optimal path transition sequences, along with the policy to guide the mutation strategy of fuzzing. Finally, to exercise the high-reward path transition sequence, we propose the concept of an action group, which comprehensively optimizes the critical steps of fuzzing to realize the optimal path to reach the target efficiently.

Article 371

Title@2025-07-29 (2): Simulating Posterior Bayesian Neural Networks with Dependent Weights

Title: Simulating Posterior Bayesian Neural Networks with Dependent Weights

Simulation von hinteren bayesischen neuralen Netzwerken mit abhängigen Gewichten

模拟具有依附体重量的波别海湾神经网络 2507.22095v1

Authors (3): Nicola Apollonio, Giovanni Franzina, Giovanni Luca Torrisi

In this paper we consider posterior Bayesian fully connected and feedforward deep neural networks with dependent weights. In particular, if the likelihood is Gaussian, we identify the distribution of the wide-width limit and provide an algorithm to sample from the network. In the shallow case we explicitly compute the distribution of the output, proving that it is a Gaussian mixture. All the theoretical results are numerically validated.

Article 372

Title@2025-07-29 (2): Multi-state Protein Design with DynamicMPNN

Title: Multi-state Protein Design with DynamicMPNN

Multi-State Protein Design mit DynamicMPNN

具有 DiriveMPNN 的多州先质设计 2507.21938v1

Authors (9): Alex Abrudan, Sebastian Pujalte Ojeda, Chaitanya K. Joshi, Matthew Greenig, Felipe Engelberger, Alena Khmelinskaia, Jens Meiler, Michele Vendruscolo, Tuomas P. J. Knowles

Structural biology has long been dominated by the one sequence, one structure, one function paradigm, yet many critical biological processes - from enzyme catalysis to membrane transport - depend on proteins that adopt multiple conformational states. Existing multi-state design approaches rely on post-hoc aggregation of single-state predictions, achieving poor experimental success rates compared to single-state design. We introduce DynamicMPNN, an inverse folding model explicitly trained to generate sequences compatible with multiple conformations through joint learning across conformational ensembles. Trained on 46,033 conformational pairs covering 75% of CATH superfamilies and evaluated using AlphaFold initial guess, DynamicMPNN outperforms ProteinMPNN by up to 13% on structure-normalized RMSD across our challenging multi-state protein benchmark.

Article 373

Title@2025-07-29 (2): Linear Stability Analysis of Physics-Informed Random Projection Neural Networks for ODEs

Title: Linear Stability Analysis of Physics-Informed Random Projection Neural Networks for ODEs

Lineare Stabilitätsanalyse der physikinformierten Zufallsprojektion Neurale Netzwerke für ODEs

极光体物理集成随机投射神经网络的线性稳定性分析 2408.15393v2

Authors (4): Gianluca Fabiani, Erik Bollt, Constantinos Siettos, Athanasios N. Yannacopoulos

We present a linear stability analysis of physics-informed random projection neural networks (PI-RPNNs), for the numerical solution of the initial value problem (IVP) of (stiff) ODEs. We begin by proving that PI-RPNNs are uniform approximators of the solution to ODEs. We then provide a constructive proof demonstrating that PI-RPNNs offer consistent and asymptotically stable numerical schemes, thus convergent schemes. In particular, we prove that multi-collocation PI-RPNNs guarantee asymptotic stability. Our theoretical results are illustrated via numerical solutions of benchmark examples including indicative comparisons with the backward Euler method, the midpoint method, the trapezoidal rule, the 2-stage Gauss scheme, and the 2- and 3-stage Radau schemes.

Article 374

Title@2025-07-29 (2): SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs

Title: SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs

SmoothRot: Kombination von Kanal-Weiss-Skalierung und Rotation für Quantisierungsfreundliche LLMs

平滑旋转: 将频道- Wise 缩放和旋转组合起来, 用于量化- 友好型LLMS 2506.05413v2

Authors (3): Patrik Czakó, Gábor Kertész, Sándor Szénási

We present SmoothRot, a novel post-training quantization technique to enhance the efficiency of 4-bit quantization in Large Language Models (LLMs). SmoothRot addresses the critical challenge of massive activation outliers by integrating channel-wise scaling with Hadamard transformations. Our technique effectively transforms extreme outliers into quantization-friendly activations, significantly improving quantization accuracy. Experiments conducted on popular LLMs (LLaMA2 7B, LLaMA3.1 8B, and Mistral 7B) demonstrate that SmoothRot consistently reduces the performance gap between quantized and FP16 models by approximately 10-30% across language generation and zero-shot reasoning tasks, without introducing additional inference latency. Code is available at https://github.com/czakop/smoothrot.
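The idea of combining channel-wise scaling with a Hadamard rotation can be sketched on a single linear layer. The code below is an illustrative toy and not the SmoothRot implementation from the linked repository: it assumes a SmoothQuant-style scale `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)` and a normalized Sylvester Hadamard matrix `H`, and it relies on the identity `(X diag(1/s) H)(H^T diag(s) W) = X W`, which holds because `H` is orthogonal.

```python
def hadamard(n):
    """Normalized Sylvester Hadamard matrix; n must be a power of two."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = n ** -0.5
    return [[v * scale for v in row] for row in h]

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def smooth_and_rotate(x, w, alpha=0.5, eps=1e-8):
    """Scale each input channel, then rotate; the product x @ w is preserved."""
    n = len(w)
    act_max = [max(max(abs(row[j]) for row in x), eps) for j in range(n)]
    w_max = [max(max(abs(v) for v in w[j]), eps) for j in range(n)]
    s = [act_max[j] ** alpha / w_max[j] ** (1 - alpha) for j in range(n)]
    h = hadamard(n)
    h_t = [list(col) for col in zip(*h)]
    x_scaled = [[row[j] / s[j] for j in range(n)] for row in x]
    w_scaled = [[s[j] * v for v in w[j]] for j in range(n)]
    return matmul(x_scaled, h), matmul(h_t, w_scaled)

x = [[1.0, -2.0, 100.0, 0.5]]  # the third channel is a massive outlier
w = [[0.1, 0.2], [0.3, -0.1], [0.05, 0.02], [-0.2, 0.4]]
x_rot, w_rot = smooth_and_rotate(x, w)
print(max(abs(v) for v in x[0]), max(abs(v) for v in x_rot[0]))
```

Scaling moves part of the outlier's magnitude into the weights, and the rotation spreads what remains across all channels, so the transformed activations have a much smaller dynamic range to quantize while the layer output is unchanged.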

Article 375

Title@2025-07-29 (2): SLR: Automated Synthesis for Scalable Logical Reasoning

Title: SLR: Automated Synthesis for Scalable Logical Reasoning

SLR: Automatisierte Synthese für skalierbare logische Vernunft

SLR: 用于可缩放逻辑理由的自动合成 2506.15787v3

Authors (10): Lukas Helff, Ahmad Omar, Felix Friedrich, Antonia Wüst, Hikaru Shindo, Rupert Mitchell, Tim Woydt, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting

We introduce SLR, an end-to-end framework for systematic evaluation and training of Large Language Models (LLMs) via Scalable Logical Reasoning. Given a user’s task specification, SLR automatically synthesizes (i) an instruction prompt for an inductive reasoning task, (ii) a validation program, executable on model outputs to provide verifiable rewards, and (iii) the latent ground-truth rule. This process is fully automated, scalable, requires no human annotations, and offers precise control over task difficulty. Using SLR, we create SLR-Bench, a benchmark comprising 19k prompts organized into 20 curriculum levels that progressively increase in relational, arithmetic, and recursive complexity. Large-scale evaluation reveals that contemporary LLMs readily produce syntactically valid rules, yet often fail at correct logical inference. Recent reasoning LLMs demonstrate improved performance but incur very high test-time computation, with costs exceeding $300 for just 1,000 prompts. Finally, curriculum learning via SLR doubles Llama-3-8B accuracy on SLR-Bench, achieving parity with Gemini-Flash-Thinking at a fraction of computational cost. Moreover, these reasoning capabilities generalize to a wide range of established benchmarks, underscoring the effectiveness of SLR for downstream reasoning.

Article 376

Title@2025-07-29 (2): HiPreNets: High-Precision Neural Networks through Progressive Training

Title: HiPreNets: High-Precision Neural Networks through Progressive Training

HiPreNets: Hochpräzisions-Neuralnetzwerke durch progressives Training

HPRENets:通过渐进培训建立高精度神经网络 2506.15064v2

Authors (3): Ethan Mulle, Wei Kang, Qi Gong

Deep neural networks are powerful tools for solving nonlinear problems in science and engineering, but training highly accurate models becomes challenging as problem complexity increases. Non-convex optimization and numerous hyperparameters to tune make performance improvement difficult, and traditional approaches often prioritize minimizing mean squared error (MSE) while overlooking $L^{\infty}$ error, which is the critical focus in many applications. To address these challenges, we present a progressive framework for training and tuning high-precision neural networks (HiPreNets). Our approach refines a previously explored staged training technique for neural networks that improves an existing fully connected neural network by sequentially learning its prediction residuals using additional networks, leading to improved overall accuracy. We discuss how to take advantage of the structure of the residuals to guide the choice of loss function, number of parameters to use, and ways to introduce adaptive data sampling techniques. We validate our framework’s effectiveness through several benchmark problems.
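The staged idea of sequentially learning prediction residuals can be illustrated without neural networks. In the sketch below each "stage" is a one-coefficient least-squares fit on a fixed basis function, standing in for an additional network; the basis functions, function names, and target function are assumptions for illustration only. Both the mean squared error and the $L^{\infty}$ error of the running residual are tracked after each stage.

```python
import math

def fit_stage(xs, residuals, basis):
    """Least-squares fit of one coefficient on `basis` to the current residuals."""
    phi = [basis(x) for x in xs]
    denom = sum(p * p for p in phi)
    coef = sum(r * p for r, p in zip(residuals, phi)) / denom if denom else 0.0
    return lambda x: coef * basis(x)

def train_progressively(xs, ys, bases):
    """Fit stages one after another, each on the residual left by the previous ones."""
    stages, history = [], []
    residuals = list(ys)
    for basis in bases:
        model = fit_stage(xs, residuals, basis)
        stages.append(model)
        residuals = [r - model(x) for r, x in zip(residuals, xs)]
        mse = sum(r * r for r in residuals) / len(residuals)
        linf = max(abs(r) for r in residuals)
        history.append((mse, linf))
    return (lambda x: sum(m(x) for m in stages)), history

xs = [i / 20 for i in range(21)]
ys = [math.exp(x) for x in xs]
predict, history = train_progressively(
    xs, ys, [lambda x: 1.0, lambda x: x, lambda x: x * x])
for mse, linf in history:
    print(mse, linf)
```

Because each stage is a least-squares projection, the MSE cannot increase from one stage to the next; the $L^{\infty}$ error has no such guarantee, which is one reason the abstract treats the choice of loss function per stage as a design decision.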

Article 377

Title@2025-07-29 (2): Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks

Title: Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks

Generalists vs. Specialists: Bewertung von LLMs auf hochkonzentrierten biophysikalischen Sequenzoptimierungsaufgaben

通才与专家:评估高度约束生物物理序列优化任务中受高度约束的生物物理序列优化任务LLMs 2410.22296v5

Authors (9): Angelica Chen, Samuel D. Stanton, Frances Ding, Robert G. Alberstein, Andrew M. Watkins, Richard Bonneau, Vladimir Gligorijević, Kyunghyun Cho, Nathan C. Frey

Although large language models (LLMs) have shown promise in biomolecule optimization problems, they incur heavy computational costs and struggle to satisfy precise constraints. On the other hand, specialized solvers like LaMBO-2 offer efficiency and fine-grained control but require more domain expertise. Comparing these approaches is challenging due to expensive laboratory validation and inadequate synthetic benchmarks. We address this by introducing Ehrlich functions, a synthetic test suite that captures the geometric structure of biophysical sequence optimization problems. With prompting alone, off-the-shelf LLMs struggle to optimize Ehrlich functions. In response, we propose LLOME (Language Model Optimization with Margin Expectation), a bilevel optimization routine for online black-box optimization. When combined with a novel preference learning loss, we find LLOME can not only learn to solve some Ehrlich functions, but can even perform as well as or better than LaMBO-2 on moderately difficult Ehrlich variants. However, LLMs also exhibit some likelihood-reward miscalibration and struggle without explicit rewards. Our results indicate LLMs can occasionally provide significant benefits, but specialized solvers are still competitive and incur less overhead.

Article 378

Title@2025-07-29 (2): TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

Title: TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

TESSERA: Temporale Einbettungen von Oberflächenspektren für die Darstellung und Analyse der Erde

TESSERA:用于地球代表和分析的地平面表面表层实时嵌入 2506.20380v3

Authors (12): Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer, Jovana Knezevic, Silja Sormunen, Robin Young, Madeline C Lisaius, Markus Immitzer, David A. Coomes, Anil Madhavapeddy, Andrew Blake, Srinivasan Keshav

Satellite remote sensing from repeated observations and multiple sensors enables a wide range of downstream applications, including climate modeling, carbon accounting, and strategies for conservation and sustainable land use. However, satellite time series are voluminous, often corrupted by sensor noise, clouds, and atmospheric conditions, and unevenly spaced in time, making them challenging to use. We present TESSERA, an open, global, land-oriented remote sensing foundation model that uses self-supervised learning to generate ‘ready-to-use’ embeddings at 10 m scale from pixel-level satellite time series data. TESSERA uses two parallel Transformer-based encoders to combine optical data from ten Sentinel-2 spectral bands at 10-60 m spatial resolution and two Sentinel-1 synthetic aperture radar backscatter coefficients at 10 m resolution to create embeddings that are subsequently fused with a multilayer perceptron to create annual global embedding maps. We compare our work with state-of-the-art task-specific models and other foundation models in five diverse downstream tasks and find that TESSERA closely matches or outperforms these baselines. We believe that TESSERA’s ease of use, openness, computation-, label-, and data-efficiency, and high performance will prove transformative in a wide range of vegetation-oriented ecological and agricultural applications.

Article 379

Title@2025-07-29 (2): Evaluating Deepfake Detectors in the Wild

Title: Evaluating Deepfake Detectors in the Wild

Bewertung von Deepfake-Detektoren in der Wildnis

评估野生深假探测器 2507.21905v1

Authors (2): Viacheslav Pirogov, Maksim Artemev

Deepfakes powered by advanced machine learning models present a significant and evolving threat to identity verification and the authenticity of digital media. Although numerous detectors have been developed to address this problem, their effectiveness has yet to be tested when applied to real-world data. In this work we evaluate modern deepfake detectors, introducing a novel testing procedure designed to mimic real-world scenarios for deepfake detection. Using state-of-the-art deepfake generation methods, we create a comprehensive dataset containing more than 500,000 high-quality deepfake images. Our analysis shows that detecting deepfakes remains a challenging task. The evaluation shows that fewer than half of the deepfake detectors tested achieved an AUC score greater than 60%, with the lowest being 50%. We demonstrate that basic image manipulations, such as JPEG compression or image enhancement, can significantly reduce model performance. All code and data are publicly available at https://github.com/messlav/Deepfake-Detectors-in-the-Wild.
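For readers interpreting the AUC figures above, the sketch below computes ROC AUC directly from its pairwise definition: the fraction of (fake, real) pairs in which the fake image receives the higher score, with ties counted as half. The function name and the toy scores are assumptions for illustration and are unrelated to the paper's evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = deepfake, 0 = real; scores are a detector's "fake" confidence.
print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.6, 0.1]))
```

Under this definition an AUC of 50% means the detector ranks fakes above real images no better than chance, so the lowest-scoring detectors reported above are effectively uninformative on this data.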

Article 380

Title@2025-07-29 (2): Receding Hamiltonian-Informed Optimal Neural Control and State Estimation for Closed-Loop Dynamical Systems

Title: Receding Hamiltonian-Informed Optimal Neural Control and State Estimation for Closed-Loop Dynamical Systems

Receding Hamiltonian-informed Optimal Neural Control und State Abschätzung für Closed-Loop Dynamical Systems

正在消退汉密尔顿式密密尔顿式内装最佳神经控制以及闭闭棒动态系统的国家估计 2411.01297v3

Authors (2): Josue N. Rivera, Dengfeng Sun

This paper formalizes Hamiltonian-Informed Optimal Neural (Hion) controllers, a novel class of neural network-based controllers for dynamical systems and explicit non-linear model-predictive control. Hion controllers estimate future states and develop an optimal control strategy using Pontryagin’s Maximum Principle. The proposed framework, along with our Taylored Multi-Faceted Approach for Neural ODE and Optimal Control (T-mano) architecture, allows for custom transient behavior, predictive control, and closed-loop feedback, addressing limitations of existing methods. Comparative analyses with established model-predictive controllers revealed Hion controllers’ superior optimality and tracking capabilities. Optimal control strategies are also demonstrated for both linear and non-linear dynamical systems.

Article 381

Title@2025-07-29 (2): Can sparse autoencoders make sense of gene expression latent variable models?

Title: Can sparse autoencoders make sense of gene expression latent variable models?

Können spärliche Autoencoder die Genexpression latent variabler Modelle sinnvoll machen?

稀有的自动代碼器能理解基因表达潜在的变异模型吗? 2410.11468v3

Authors (1): Viktoria Schuster

Sparse autoencoders (SAEs) have lately been used to uncover interpretable latent features in large language models. By projecting dense embeddings into a much higher-dimensional and sparse space, learned features become disentangled and easier to interpret. This work explores the potential of SAEs for decomposing embeddings in complex and high-dimensional biological data. Using simulated data, it outlines the efficacy, hyperparameter landscape, and limitations of SAEs when it comes to extracting ground truth generative variables from latent space. The application to embeddings from pretrained single-cell models shows that SAEs can find and steer key biological processes and even uncover subtle biological signals that might otherwise be missed. This work further introduces scFeatureLens, an automated interpretability approach for linking SAE features and biological concepts from gene sets to enable large-scale analysis and hypothesis generation in single-cell gene expression models.

Article 382

Title@2025-07-29 (2): Reducing Data Requirements for Sequence-Property Prediction in Copolymer Compatibilizers via Deep Neural Network Tuning

Title: Reducing Data Requirements for Sequence-Property Prediction in Copolymer Compatibilizers via Deep Neural Network Tuning

Reduzierung der Datenanforderungen für die Sequence-Property-Prognose in Copolymer-Compatibilizern über Deep Neural Network Tuning

通过深神经网络图案减少通过深神经网络图案对聚合聚合复合集成器的序列-财产预测数据要求 2507.21902v1

Authors (4): Md Mushfiqul Islam, Nishat N. Labiba, Lawrence O. Hall, David S. Simmons

Synthetic sequence-controlled polymers promise to transform polymer science by combining the chemical versatility of synthetic polymers with the precise sequence-mediated functionality of biological proteins. However, design of these materials has proven extraordinarily challenging, because they lack the massive datasets of closely related evolved molecules that accelerate design of proteins. Here we report on a new Artificial Intelligence strategy to dramatically reduce the amount of data necessary to accelerate these materials’ design. We focus on data connecting the repeat-unit-sequence of a compatibilizer molecule to its ability to reduce the interfacial tension between distinct polymer domains. The optimal sequence of these molecules, which are essential for applications such as mixed-waste polymer recycling, depends strongly on variables such as concentration and chemical details of the polymer. With current methods, this would demand an entirely distinct dataset to enable design at each condition. Here we show that a deep neural network trained on low-fidelity data for sequence/interfacial tension relations at one set of conditions can be rapidly tuned to make higher-fidelity predictions at a distinct set of conditions, requiring far less data than would ordinarily be needed. This priming-and-tuning approach should allow a single low-fidelity parent dataset to dramatically accelerate prediction and design in an entire constellation of related systems. In the long run, it may also provide an approach to bootstrapping quantitative atomistic design with AI insights from fast, coarse simulations.

Article 383

Title@2025-07-29 (2): LLM-based Content Classification Approach for GitHub Repositories by the README Files

Title: LLM-based Content Classification Approach for GitHub Repositories by the README Files

LLM-basierter Content-Klassifikationsansatz für GitHub-Repositories durch die README-Dateien

REEADME 文件中基于LLM的GitHub储存库内容分类方法 2507.21899v1

Authors (4): Malik Uzair Mehmood, Shahid Hussain, Wen Li Wang, Muhammad Usama Malik

GitHub is the world’s most popular platform for storing, sharing, and managing code. Every GitHub repository has a README file associated with it. The README files should contain project-related information as per the recommendations of GitHub to support the usage and improvement of repositories. However, GitHub repository owners sometimes neglect these recommendations. This prevents a GitHub repository from reaching its full potential. This research posits that the comprehensiveness of a GitHub repository’s README file significantly influences its adoption and utilization, with a lack of detail potentially hindering its full potential for widespread engagement and impact within the research community. Large Language Models (LLMs) have shown great performance in many text-based tasks including text classification, text generation, text summarization and text translation. In this study, an approach is developed to fine-tune LLMs for automatically classifying different sections of GitHub README files. Three encoder-only LLMs are utilized, including BERT, DistilBERT and RoBERTa. These pre-trained models are then fine-tuned based on a gold-standard dataset consisting of 4226 README file sections. This approach outperforms current state-of-the-art methods and has achieved an overall F1 score of 0.98. Moreover, we have also investigated the use of Parameter-Efficient Fine-Tuning (PEFT) techniques like Low-Rank Adaptation (LoRA) and shown an economical alternative to full fine-tuning without compromising much performance. The results demonstrate the potential of using LLMs in designing an automatic classifier for categorizing the content of GitHub README files. Consequently, this study contributes to the development of automated tools for GitHub repositories to improve their identification and potential use.

Article 384

Title@2025-07-29 (2): Cardiovascular Disease Prediction using Machine Learning: A Comparative Analysis

Title: Cardiovascular Disease Prediction using Machine Learning: A Comparative Analysis

Kardiovaskuläre Krankheitsvorhersage mit maschinellem Lernen: Eine vergleichende Analyse

利用机器学习对心血管疾病进行预测:比较分析 2507.21898v1

Authors (4): Risshab Srinivas Ramesh, Roshani T S Udupa, Monisha J, Kushi K K S

Cardiovascular diseases (CVDs) are a main cause of mortality globally, accounting for 31% of all deaths. This study involves a cardiovascular disease (CVD) dataset comprising 68,119 records to explore the influence of numerical (age, height, weight, blood pressure, BMI) and categorical (gender, cholesterol, glucose, smoking, alcohol, activity) factors on CVD occurrence. We have performed statistical analyses, including t-tests, Chi-square tests, and ANOVA, to identify strong associations between CVD and older age, hypertension, higher weight, and abnormal cholesterol levels, while physical activity appears as a protective factor. A logistic regression model highlights age, blood pressure, and cholesterol as primary risk factors, with unexpected negative associations for smoking and alcohol, suggesting potential data issues. Model performance comparisons reveal CatBoost as the top performer, with an accuracy of 0.734 and an ECE of 0.0064; it also excels in probabilistic prediction (Brier score = 0.1824). Data challenges, including outliers and skewed distributions, indicate a need for improved preprocessing to enhance predictive reliability.
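
The two calibration figures quoted above (ECE and Brier score) can be illustrated with a minimal sketch. The abstract does not say which ECE variant or binning scheme the authors used; the equal-width, ten-bin binary variant below and the function names are assumptions chosen for illustration.

```python
def brier_score(probs, labels):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: size-weighted gap between mean predicted probability
    and observed positive rate in each equal-width probability bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_p = sum(p for p, _ in bucket) / len(bucket)
        pos_rate = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / len(probs) * abs(mean_p - pos_rate)
    return ece


# Hypothetical predictions, not data from the study.
probs = [0.9, 0.9, 0.2, 0.1]
labels = [1, 0, 0, 0]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

Lower is better for both metrics; a model can have modest accuracy and still be well calibrated, which is consistent with the combination of figures reported for CatBoost.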

Article 385

Title@2025-07-29 (2): A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges

Title: A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges

Eine Rezension über Selbstüberwachtes Lernen für Zeitreihenanomalienerkennung: Neuere Fortschritte und offene Herausforderungen

《反反常探测:最新进展和公开挑战》时间序列自我监督学习回顾 2501.15196v2

Authors (3): Aitor Sánchez-Ferrera, Borja Calvo, Jose A. Lozano

Time series anomaly detection presents various challenges due to the sequential and dynamic nature of time-dependent data. Traditional unsupervised methods frequently encounter difficulties in generalization, often overfitting to known normal patterns observed during training and struggling to adapt to unseen normality. In response to this limitation, self-supervised techniques for time series have garnered attention as a potential solution to overcome this obstacle and enhance the performance of anomaly detectors. This paper presents a comprehensive review of the recent methods that make use of self-supervised learning for time series anomaly detection. A taxonomy is proposed to categorize these methods based on their primary characteristics, facilitating a clear understanding of their diversity within this field. The information contained in this survey, along with additional details that will be periodically updated, is available on the following GitHub repository: https://github.com/Aitorzan3/Awesome-Self-Supervised-Time-Series-Anomaly-Detection.

Article 386

Title@2025-07-29 (2): Data-driven quantum Koopman method for simulating nonlinear dynamics

Title: Data-driven quantum Koopman method for simulating nonlinear dynamics

Datengesteuerte Quantenkoopman-Methode zur Simulation nichtlinearer Dynamik

模拟非线性动态的由数据驱动的量量 Koopman 方法 2507.21890v1

Authors (4): Baoyang Zhang, Zhen Lu, Yaomin Zhao, Yue Yang

Quantum computation offers potential exponential speedups for simulating certain physical systems, but its application to nonlinear dynamics is inherently constrained by the requirement of unitary evolution. We propose the quantum Koopman method (QKM), a data-driven framework that bridges this gap through transforming nonlinear dynamics into linear unitary evolution in higher-dimensional observable spaces. Leveraging the Koopman operator theory to achieve a global linearization, our approach maps system states into a hierarchy of Hilbert spaces using a deep autoencoder. Within the linearized embedding spaces, the state representation is decomposed into modulus and phase components, and the evolution is governed by a set of unitary Koopman operators that act exclusively on the phase. These operators are constructed from diagonal Hamiltonians with coefficients learned from data, a structure designed for efficient implementation on quantum hardware. This architecture enables direct multi-step prediction, and the operator’s computational complexity scales logarithmically with the observable space dimension. The QKM is validated across diverse nonlinear systems. Its predictions maintain relative errors below 6% for reaction-diffusion systems and shear flows, and capture key statistics in 2D turbulence. This work establishes a practical pathway for quantum-accelerated simulation of nonlinear phenomena, exploring a framework built on the synergy between deep learning for global linearization and quantum algorithms for unitary dynamics evolution.
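
The phase-only unitary evolution described above can be sketched classically. This is an illustrative toy under stated assumptions, not the authors' implementation: the autoencoder, the hierarchy of Hilbert spaces, and the quantum circuit are all omitted, and the `energies` values (standing in for the learned diagonal Hamiltonian coefficients) are hypothetical.

```python
import cmath


def evolve_phase(observables, energies, n_steps, dt=1.0):
    """Apply U^n = exp(-i * H * n * dt) for a diagonal H = diag(energies).

    Each complex observable keeps its modulus; only its phase rotates.
    """
    return [z * cmath.exp(-1j * e * n_steps * dt)
            for z, e in zip(observables, energies)]


# Hypothetical embedded state (modulus, phase) and Hamiltonian coefficients.
z0 = [cmath.rect(2.0, 0.3), cmath.rect(0.5, -1.2)]
energies = [0.7, -0.4]

# Direct multi-step prediction: one call with n_steps=3 ...
z3_direct = evolve_phase(z0, energies, 3)

# ... agrees with three successive single steps.
z3_stepwise = z0
for _ in range(3):
    z3_stepwise = evolve_phase(z3_stepwise, energies, 1)

print([abs(z) for z in z3_direct])  # moduli are preserved by the evolution
```

Because the Hamiltonian is diagonal, the n-step operator is just the single-step phase multiplied by n, which is why multi-step prediction needs no intermediate states in this sketch.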

Article 387

Title@2025-07-29 (2): Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions

Title: Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions

Puzzle-Ähnlichkeit: Ein perzeptuell geführtes Cross-Reference-Metrikum für Artefakterkennung in 3D-Szenenrekonstruktionen

3D 场景重建中个体行为探测概念引导交叉参考参考度量 2411.17489v3

Authors (3): Nicolai Hermann, Jorge Condor, Piotr Didyk

Modern reconstruction techniques can effectively model complex 3D scenes from sparse 2D views. However, automatically assessing the quality of novel views and identifying artifacts is challenging due to the lack of ground truth images and the limitations of no-reference image metrics in predicting reliable artifact maps. The absence of such metrics hinders assessment of the quality of novel views and limits the adoption of post-processing techniques, such as inpainting, to enhance reconstruction quality. To tackle this, recent work has established a new category of metrics (cross-reference), predicting image quality solely by leveraging context from alternate viewpoint captures (arXiv:2404.14409). In this work, we propose a new cross-reference metric, Puzzle Similarity, which is designed to localize artifacts in novel views. Our approach utilizes image patch statistics from the training views to establish a scene-specific distribution, later used to identify poorly reconstructed regions in the novel views. Given the lack of good measures to evaluate cross-reference methods in the context of 3D reconstruction, we collected a novel human-labeled dataset of artifact and distortion maps in unseen reconstructed views. Through this dataset, we demonstrate that our method achieves state-of-the-art localization of artifacts in novel views, correlating with human assessment, even without aligned references. We can leverage our new metric to enhance applications like automatic image restoration, guided acquisition, or 3D reconstruction from sparse inputs. Find the project page at https://nihermann.github.io/puzzlesim/ .

Article 388

Title@2025-07-29 (2): Prediction accuracy versus rescheduling flexibility in elective surgery management

Title: Prediction accuracy versus rescheduling flexibility in elective surgery management

Vorhersagegenauigkeit versus Anpassungsflexibilität im elektiven chirurgischen Management

在选修外科管理方面,预测准确性与重新安排灵活性 2507.15566v2

Authors (4): Pieter Smet, Martina Doneda, Ettore Lanzarone, Giuliana Carello

The availability of downstream resources is critical in planning the admission of elective surgery patients. The most crucial one is inpatient beds. To ensure bed availability, hospitals may use machine learning (ML) models to predict patients’ length-of-stay (LOS) in the admission planning stage. However, the real value of the LOS for each patient may differ from the predicted one, potentially making the schedule infeasible. To address such infeasibilities, it is possible to implement rescheduling strategies that take advantage of operational flexibility. For example, planners may postpone admission dates, relocate patients to different wards, or even transfer patients who are already admitted among wards. A straightforward assumption is that better LOS predictions can help reduce the impact of rescheduling. However, the training process of ML models that can make such accurate predictions can be very costly. Building on previous work that proposed simulated ML for evaluating data-driven approaches, this paper explores the relationship between LOS prediction accuracy and rescheduling flexibility across various corrective policies. Specifically, we examine the most effective patient rescheduling strategies under LOS prediction errors to prevent bed overflows while optimizing resource utilization.

Article 389

Title@2025-07-29 (2): Context-Aware Probabilistic Modeling with LLM for Multimodal Time Series Forecasting

Title: Context-Aware Probabilistic Modeling with LLM for Multimodal Time Series Forecasting

Context-Aware Probabilistische Modellierung mit LLM für multimodale Zeitreihenvorhersage

与多种时序多时序预测的LLMLM建立环境软件概率模型 2505.10774v2

Authors (7): Yueyang Yao, Jiajun Li, Xingyuan Dai, MengMeng Zhang, Xiaoyan Gong, Fei-Yue Wang, Yisheng Lv

Time series forecasting is important for applications spanning energy markets, climate analysis, and traffic management. However, existing methods struggle to effectively integrate exogenous texts and align them with the probabilistic nature of large language models (LLMs). Current approaches either employ shallow text-time series fusion via basic prompts or rely on deterministic numerical decoding that conflicts with LLMs’ token-generation paradigm, which limits contextual awareness and distribution modeling. To address these limitations, we propose CAPTime, a context-aware probabilistic multimodal time series forecasting method that leverages text-informed abstraction and autoregressive LLM decoding. Our method first encodes temporal patterns using a pretrained time series encoder, then aligns them with textual contexts via learnable interactions to produce joint multimodal representations. By combining a mixture of distribution experts with frozen LLMs, we enable context-aware probabilistic forecasting while preserving LLMs’ inherent distribution modeling capabilities. Experiments on diverse time series forecasting tasks demonstrate the superior accuracy and generalization of CAPTime, particularly in multimodal scenarios. Additional analysis highlights its robustness in data-scarce scenarios through hybrid probabilistic decoding.

Article 390

Title@2025-07-29 (2): Representations in vision and language converge in a shared, multidimensional space of perceived similarities

Title: Representations in vision and language converge in a shared, multidimensional space of perceived similarities

Repräsentationen in Vision und Sprache konvergieren in einem geteilten, mehrdimensionalen Raum wahrgenommener Ähnlichkeiten

视觉和语言代表在共同的、多层面的、有共同感知的相似性空间中汇合在一起 2507.21871v1

Authors (4): Katerina Marie Simkova, Adrien Doerig, Clayton Hickey, Ian Charest

Humans can effortlessly describe what they see, yet establishing a shared representational format between vision and language remains a significant challenge. Emerging evidence suggests that human brain representations in both vision and language are well predicted by semantic feature spaces obtained from large language models (LLMs). This raises the possibility that sensory systems converge in their inherent ability to transform their inputs into a shared, embedding-like representational space. However, it remains unclear how such a space manifests in human behaviour. To investigate this, sixty-three participants performed behavioural similarity judgements separately on 100 natural scene images and 100 corresponding sentence captions from the Natural Scenes Dataset. We found that visual and linguistic similarity judgements not only converge at the behavioural level but also predict a remarkably similar network of fMRI brain responses evoked by viewing the natural scene images. Furthermore, computational models trained to map images onto LLM-embeddings outperformed both category-trained and AlexNet controls in explaining the behavioural similarity structure. These findings demonstrate that human visual and linguistic similarity judgements are grounded in a shared, modality-agnostic representational structure that mirrors how the visual system encodes experience. The convergence between sensory and artificial systems suggests a common capacity in how conceptual representations are formed: not as arbitrary products of first-order, modality-specific input, but as structured representations that reflect the stable, relational properties of the external world.

Article 391

Title@2025-07-29 (2): Discovering Interpretable Ordinary Differential Equations from Noisy Data

Title: Discovering Interpretable Ordinary Differential Equations from Noisy Data

Das Entdecken interpretierbarer gewöhnlicher Differentialgleichungen aus Noisy-Daten

从噪音数据中发现可解释的普通差异 2507.21841v1

Authors (2): Rahul Golder, M. M. Faruque Hasan

The data-driven discovery of interpretable models approximating the underlying dynamics of a physical system has gained traction in the past decade. Current approaches employ pre-specified functional forms or basis functions and often result in models that lack physical meaning and interpretability, let alone represent the true physics of the system. We propose an unsupervised parameter estimation methodology that first finds an approximate general solution, followed by a spline transformation to linearly estimate the coefficients of the governing ordinary differential equation (ODE). The approximate general solution is postulated using the same functional form as the analytical solution of a general homogeneous, linear, constant-coefficient ODE. An added advantage is its ability to produce a high-fidelity, smooth functional form even in the presence of noisy data. The spline approximation obtains linearly independent gradient information from the functional form, which forms the basis of the gradient matrix. This gradient matrix is used in a linear system to find the coefficients of the ODEs. From the case studies, we observed that our modeling approach discovers ODEs with high accuracy and also promotes sparsity in the solution without using any regularization techniques. The methodology is also robust to noisy data and thus allows the integration of data-driven techniques into real experimental settings for data-driven learning of physical phenomena.
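
The final step described above, solving a linear system built from derivative information to recover ODE coefficients, can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: it assumes a second-order form y'' + a y' + b y = 0, uses ordinary least squares, and substitutes analytic derivatives of a known solution for the spline-derived gradient information. The function name is hypothetical.

```python
import math


def fit_second_order_ode(y, dy, d2y):
    """Least-squares estimate of (a, b) in y'' + a*y' + b*y = 0.

    Solves the 2x2 normal equations of min sum (y'' + a*y' + b*y)^2.
    """
    s11 = sum(v * v for v in dy)
    s12 = sum(u * v for u, v in zip(dy, y))
    s22 = sum(v * v for v in y)
    r1 = -sum(u * v for u, v in zip(dy, d2y))
    r2 = -sum(u * v for u, v in zip(y, d2y))
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


# Samples of y(t) = exp(-t) * cos(2t), which solves y'' + 2y' + 5y = 0.
ts = [0.1 * i for i in range(50)]
y = [math.exp(-t) * math.cos(2 * t) for t in ts]
dy = [math.exp(-t) * (-math.cos(2 * t) - 2 * math.sin(2 * t)) for t in ts]
d2y = [math.exp(-t) * (-3 * math.cos(2 * t) + 4 * math.sin(2 * t)) for t in ts]

a, b = fit_second_order_ode(y, dy, d2y)
print(a, b)
```

With exact derivatives the coefficients are recovered up to floating-point error; with noisy data, the quality of the estimate depends on how the derivatives are obtained, which is the part the smooth functional form and spline step in the abstract are meant to address.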

Article 392

Title@2025-07-29 (2): Analysis of Fourier Neural Operators via Effective Field Theory

Title: Analysis of Fourier Neural Operators via Effective Field Theory

Analyse von Fourier-Neuraloperatoren über Effektive Feldtheorie

通过有效实地理论分析四架神经操作器 2507.21833v1

Authors (1): Taeyoung Kim

Fourier Neural Operators (FNOs) have emerged as leading surrogates for high-dimensional partial-differential equations, yet their stability, generalization and frequency behavior lack a principled explanation. We present the first systematic effective-field-theory analysis of FNOs in an infinite-dimensional function space, deriving closed recursion relations for the layer kernel and four-point vertex and then examining three practically important settings: analytic activations, scale-invariant cases and architectures with residual connections. The theory shows that nonlinear activations inevitably couple frequency inputs to high-frequency modes that are otherwise discarded by spectral truncation, and experiments confirm this frequency transfer. For wide networks we obtain explicit criticality conditions on the weight-initialization ensemble that keep small input perturbations at a uniform scale across depth, and empirical tests validate these predictions. Taken together, our results quantify how nonlinearity enables neural operators to capture non-trivial features, supply criteria for hyper-parameter selection via criticality analysis, and explain why scale-invariant activations and residual connections enhance feature learning in FNOs.

Article 393

Title: Introducing HALC: A general pipeline for finding optimal prompting strategies for automated coding with LLMs in the computational social sciences

Einführung von HALC: Eine allgemeine Pipeline für die Suche nach optimalen Promptenstrategien für die automatisierte Codierung mit LLMs in den Computational Social Sciences

介绍HALC:寻找计算社会科学中与LLMs自动编码的最佳加速战略的一般管道 2507.21831v1

Authors (3): Andreas Reich, Claudia Thoms, Tobias Schrimpf

LLMs are seeing widespread use for task automation, including automated coding in the social sciences. However, even though researchers have proposed different prompting strategies, their effectiveness varies across LLMs and tasks, and trial-and-error practices are still widespread. We propose HALC, a general pipeline that allows for the systematic and reliable construction of optimal prompts for any given coding task and model, permitting the integration of any prompting strategy deemed relevant. To investigate LLM coding and validate our pipeline, we sent a total of 1,512 individual prompts to our local LLMs in over two million requests. We test prompting strategies and LLM task performance based on a few expert codings (ground truth). When compared to these expert codings, we find prompts that code reliably for single variables ($\alpha_{\text{climate}} = .76$; $\alpha_{\text{movement}} = .78$) and across two variables ($\alpha_{\text{climate}} = .71$; $\alpha_{\text{movement}} = .74$) using the LLM Mistral NeMo. Our prompting strategies are set up in a way that aligns the LLM to our codebook; we are not optimizing our codebook for LLM friendliness. Our paper provides insights into the effectiveness of different prompting strategies, crucial influencing factors, and the identification of reliable prompts for each coding task and model.

Article 394

Title@2025-07-29 (2): EEG-CLIP : Learning EEG representations from natural language descriptions

Title: EEG-CLIP : Learning EEG representations from natural language descriptions

EEG-CLIP : Lernen von EEG-Darstellungen aus natürlichen Sprachbeschreibungen

EEG-CLIP:从自然语言说明中学习EEG代表 2503.16531v2

Authors (3): Tidiane Camaret Ndir, Robin Tibor Schirrmeister, Tonio Ball

Deep networks for electroencephalogram (EEG) decoding are often only trained to solve one specific task, such as pathology or age decoding. A more general task-agnostic approach is to train deep networks to match a (clinical) EEG recording to its corresponding textual medical report and vice versa. This approach was pioneered in the computer vision domain, matching images and their text captions, and subsequently enabled successful zero-shot decoding using textual class prompts. In this work, we follow this approach and develop a contrastive learning framework, EEG-CLIP, that aligns the EEG time series and the descriptions of the corresponding clinical text in a shared embedding space. We investigated its potential for versatile EEG decoding, evaluating performance in a range of few-shot and zero-shot settings. Overall, we show that EEG-CLIP manages to non-trivially align text and EEG representations. Our work presents a promising approach to learn general EEG representations, which could enable easier analyses of diverse decoding questions through zero-shot decoding or training task-specific models from fewer training examples. The code for reproducing our results is available at https://github.com/tidiane-camaret/EEGClip
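
The idea of aligning paired EEG and text embeddings in a shared space can be illustrated with a CLIP-style symmetric contrastive loss. The abstract does not state the exact objective used by EEG-CLIP, so the loss form, the temperature value, and the function names below are assumptions; the encoders that would produce the embeddings are omitted.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def clip_style_loss(eeg_emb, text_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    eeg = [_normalize(v) for v in eeg_emb]
    txt = [_normalize(v) for v in text_emb]
    logits = [[sum(a * b for a, b in zip(e, t)) / temperature for t in txt]
              for e in eeg]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-EEG direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Hypothetical 2-D embeddings: matched pairs versus swapped pairs.
eeg = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_style_loss(eeg, [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss(eeg, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

The loss is small when each recording is most similar to its own report and large when the pairing is wrong, which is the property that makes zero-shot decoding with textual class prompts possible after training.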

Article 395

Title@2025-07-29 (2): MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation

Title: MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation

MIBoost: Ein Gradient, der Algorithmen für die variable Auswahl nach mehrfacher Imputation erhöht

MIBoost: 多重截断后变量选择的渐变推推算算法 2507.21807v1

Authors (1): Robert Kuchen

Statistical learning methods for automated variable selection, such as LASSO, elastic nets, or gradient boosting, have become increasingly popular tools for building powerful prediction models. Yet, in practice, analyses are often complicated by missing data. The most widely used approach to address missingness is multiple imputation, which creates several completed datasets. However, there is an ongoing debate on how to perform model selection in the presence of multiple imputed datasets. Simple strategies, such as pooling models across datasets, have been shown to have suboptimal properties. Although more sophisticated methods exist, they are often difficult to implement and therefore not widely applied. In contrast, two recent approaches modify the regularization methods LASSO and elastic nets by defining a single loss function, resulting in a unified set of coefficients across imputations. Our key contribution is to extend this principle to the framework of component-wise gradient boosting by proposing MIBoost, a novel algorithm that employs a uniform variable-selection mechanism across imputed datasets. Simulation studies suggest that our approach yields prediction performance comparable to that of these recently proposed methods.

Article 396

Title@2025-07-29 (2): Scaling and Distilling Transformer Models for sEMG

Title: Scaling and Distilling Transformer Models for sEMG

Skalierung und Destillierung von Transformer-Modellen für sEMG

SEMG 缩放和蒸馏变压器模型 2507.22094v1

Authors (6): Nicholas Mehlman, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Kelvin Niu, Alexander H. Miller, Shagun Sodhani

Surface electromyography (sEMG) signals offer a promising avenue for developing innovative human-computer interfaces by providing insights into muscular activity. However, the limited volume of training data and computational constraints during deployment have restricted the investigation of scaling up the model size for solving sEMG tasks. In this paper, we demonstrate that vanilla transformer models can be effectively scaled up on sEMG data and yield improved cross-user performance up to 110M parameters, surpassing the model size regime investigated in other sEMG research (usually <10M parameters). We show that >100M-parameter models can be effectively distilled into models 50x smaller with minimal loss of performance (<1.5% absolute). This results in efficient and expressive models suitable for complex real-time sEMG tasks in real-world environments.

Article 397

Title@2025-07-29 (2): Bayesian Neural Network Surrogates for Bayesian Optimization of Carbon Capture and Storage Operations

Title: Bayesian Neural Network Surrogates for Bayesian Optimization of Carbon Capture and Storage Operations

Bayesian Neural Network Surrogats für die Bayesian Optimierung von CO2-Abscheidungs- und -Speicheroperationen

Bayesian碳捕获和储存作业最佳利用Bayesian 碳捕获和储存作业的Bayesian神经网络代管国 2507.21803v1

Authors (2): Sofianos Panagiotis Fotias, Vassilis Gaganis

Carbon Capture and Storage (CCS) stands as a pivotal technology for fostering a sustainable future. The process, which involves injecting supercritical CO$_2$ into underground formations, a method already widely used for Enhanced Oil Recovery, serves a dual purpose: it not only curbs CO$_2$ emissions and addresses climate change but also extends the operational lifespan and sustainability of oil fields and platforms, easing the shift toward greener practices. This paper delivers a thorough comparative evaluation of strategies for optimizing decision variables in CCS project development, employing a derivative-free technique known as Bayesian Optimization. In addition to Gaussian Processes, which usually serve as the gold standard in BO, various novel stochastic models were examined and compared within a BO framework. This research investigates the effectiveness of utilizing more exotic stochastic models than GPs for BO in environments where GPs have been shown to underperform, such as in cases with a large number of decision variables or multiple objective functions that are not similarly scaled. By incorporating Net Present Value (NPV) as a key objective function, the proposed framework demonstrates its potential to improve economic viability while ensuring the sustainable deployment of CCS technologies. Ultimately, this study represents the first application in the reservoir engineering industry of the growing body of BO research, specifically in the search for more appropriate stochastic models, highlighting its potential as a preferred method for enhancing sustainability in the energy sector.

Article 398

Title@2025-07-29 (2): Unlocking Interpretability for RF Sensing: A Complex-Valued White-Box Transformer

Title: Unlocking Interpretability for RF Sensing: A Complex-Valued White-Box Transformer

Entsperrende Interpretierbarkeit für RF Sensing: Ein komplexes White-Box-Transformator

RF遥感的解锁可解释性:一个复杂而有价值的白箱变换器 2507.21799v1

Authors (3): Xie Zhang, Yina Wang, Chenshu Wu

The empirical success of deep learning has spurred its application to the radio-frequency (RF) domain, leading to significant advances in Deep Wireless Sensing (DWS). However, most existing DWS models function as black boxes with limited interpretability, which hampers their generalizability and raises concerns in security-sensitive physical applications. In this work, inspired by the remarkable advances of white-box transformers, we present RF-CRATE, the first mathematically interpretable deep network architecture for RF sensing, grounded in the principles of complex sparse rate reduction. To accommodate the unique RF signals, we conduct non-trivial theoretical derivations that extend the original real-valued white-box transformer to the complex domain. By leveraging the CR-Calculus framework, we successfully construct a fully complex-valued white-box transformer with theoretically derived self-attention and residual multi-layer perceptron modules. Furthermore, to improve the model’s ability to extract discriminative features from limited wireless data, we introduce Subspace Regularization, a novel regularization strategy that enhances feature diversity, resulting in an average performance improvement of 19.98% across multiple sensing tasks. We extensively evaluate RF-CRATE against seven baselines with multiple public and self-collected datasets involving different RF signals. The results show that RF-CRATE achieves performance on par with thoroughly engineered black-box models, while offering full mathematical interpretability. More importantly, by extending CRATE to the complex domain, RF-CRATE yields substantial improvements, achieving an average classification gain of 5.08% and reducing regression error by 10.34% across diverse sensing tasks compared to CRATE. RF-CRATE is fully open-sourced at: https://github.com/rfcrate/RF_CRATE.


Article 399

Title@2025-07-29 (2): Unifying Post-hoc Explanations of Knowledge Graph Completions

Title: Unifying Post-hoc Explanations of Knowledge Graph Completions

Vereinheitlichung von Post-hoc-Erklärungen von Wissensgraphen-Vervollständigungen

知识图完成后统一解释 2507.22951v1

Authors (4): Alessandro Lonardi, Samy Badreddine, Tarek R. Besold, Pablo Sanchez Martin

Post-hoc explainability for Knowledge Graph Completion (KGC) lacks formalization and consistent evaluations, hindering reproducibility and cross-study comparisons. This paper argues for a unified approach to post-hoc explainability in KGC. First, we propose a general framework to characterize post-hoc explanations via multi-objective optimization, balancing their effectiveness and conciseness. This unifies existing post-hoc explainability algorithms in KGC and the explanations they produce. Next, we suggest and empirically support improved evaluation protocols using popular metrics like Mean Reciprocal Rank and Hits@$k$. Finally, we stress the importance of interpretability as the ability of explanations to address queries meaningful to end-users. By unifying methods and refining evaluation standards, this work aims to make research in KGC explainability more reproducible and impactful.
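
For readers unfamiliar with the two metrics named above, the following minimal Python sketch shows how Mean Reciprocal Rank and Hits@$k$ are commonly computed from the 1-based rank of the correct entity for each query. The function names and the example ranks are illustrative assumptions and are not taken from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over queries; ranks are 1-based positions of the true entity."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four link-prediction queries.
example_ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(example_ranks)
h3 = hits_at_k(example_ranks, 3)
```

Both metrics depend only on the rank of the correct answer, so an explanation method can be scored by how much these values change when the facts it selects are removed or kept.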


Article 400

Title@2025-07-29 (2): Conceptualizing Uncertainty: A Concept-based Approach to Explaining Uncertainty

Title: Conceptualizing Uncertainty: A Concept-based Approach to Explaining Uncertainty

Konzeptualisierung der Unsicherheit: Ein konzeptbasierter Ansatz zur Erklärung der Unsicherheit

不确定性概念化:以概念为基础的解释不确定性的方法 2503.03443v2

Authors (5): Isaac Roberts, Alexander Schulz, Sarah Schroeder, Fabian Hinder, Barbara Hammer

Uncertainty in machine learning refers to the degree of confidence or lack thereof in a model’s predictions. While uncertainty quantification methods exist, explanations of uncertainty, especially in high-dimensional settings, remain an open challenge. Existing work focuses on feature attribution approaches which are restricted to local explanations. Understanding uncertainty, its origins, and characteristics on a global scale is crucial for enhancing interpretability and trust in a model’s predictions. In this work, we propose to explain the uncertainty in high-dimensional data classification settings by means of concept activation vectors which give rise to local and global explanations of uncertainty. We demonstrate the utility of the generated explanations by leveraging them to refine and improve our model.


Article 401

Title@2025-07-29 (2): A finite time analysis of distributed Q-learning

Title: A finite time analysis of distributed Q-learning

Eine endliche Zeitanalyse des verteilten Q-Learning

对分发的 “ 学习 “ 的有限时间分析 2405.14078v2

Authors (2): Han-Dong Lim, Donghwan Lee

Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success achieved in applications of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario, wherein a number of agents cooperatively solve a sequential decision making problem without access to the central reward function which is an average of the local rewards. In particular, we study finite-time analysis of a distributed Q-learning algorithm, and provide a new sample complexity result of $\tilde{\mathcal{O}}\left( \min\left\{\frac{1}{\epsilon^2}\frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4},\ \frac{1}{\epsilon}\frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3} \right\}\right)$ under tabular lookup


Article 402

Title@2025-07-29 (2): Domain Generalization and Adaptation in Intensive Care with Anchor Regression

Title: Domain Generalization and Adaptation in Intensive Care with Anchor Regression

Domänenverallgemeinerung und Anpassung in Intensivpflege mit Ankerregression

锁定后退的密集护理中的广域化和适应 2507.21783v1

Authors (4): Malte Londschien, Manuel Burger, Gunnar Rätsch, Peter Bühlmann

Predictive models in clinical settings often perform worse when deployed in new hospitals due to distribution shifts. This paper presents a large-scale study of causality-inspired domain generalization on heterogeneous multi-center intensive care unit (ICU) data. We apply anchor regression and introduce anchor boosting, a novel, tree-based nonlinear extension, to a large dataset comprising 400,000 patients from nine distinct ICU databases. The anchor regularization consistently improves out-of-distribution performance, particularly for the most dissimilar target domains. The methods appear robust to violations of theoretical assumptions, such as anchor exogeneity. Furthermore, we propose a novel conceptual framework to quantify the utility of large external datasets. By evaluating performance as a function of available target-domain data, we identify three regimes: (i) a domain generalization regime, where only the external model should be used, (ii) a domain adaptation regime, where refitting the external model is optimal, and (iii) a data-rich regime, where external data provides no additional value.
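
As a rough illustration of the anchor regularization mentioned above, the sketch below implements the standard linear anchor regression objective $\|(I-P_A)(y-xb)\|^2 + \gamma\|P_A(y-xb)\|^2$ for a single predictor, where $P_A$ projects onto one-hot anchor indicators (here a hypothetical hospital label). The toy data, the function names, and the restriction to one predictor are assumptions made for brevity; the paper's ICU pipeline and its anchor boosting extension are not reproduced here.

```python
import math


def group_means(values, groups):
    """Projection onto one-hot anchors: replace each value by its group mean."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [sums[g] / counts[g] for g in groups]


def anchor_slope(x, y, groups, gamma):
    """Slope b minimizing ||(I-P)(y-xb)||^2 + gamma*||P(y-xb)||^2 on centered data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    shrink = 1.0 - math.sqrt(gamma)
    # Transform v -> v - (1 - sqrt(gamma)) * P v, then run ordinary least squares.
    xt = [v - shrink * m for v, m in zip(xc, group_means(xc, groups))]
    yt = [v - shrink * m for v, m in zip(yc, group_means(yc, groups))]
    return sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)


# Hypothetical toy data from two hospitals, "A" and "B".
x = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
y = [2.0, 4.0, 6.0, 12.0, 14.0, 16.0]
hospital = ["A", "A", "A", "B", "B", "B"]
slope_within = anchor_slope(x, y, hospital, 0.0)  # anchors fully partialled out
slope_ols = anchor_slope(x, y, hospital, 1.0)     # gamma = 1 recovers plain OLS
```

Setting `gamma` to 1 gives ordinary least squares, while larger values penalize residuals that vary across anchors more heavily, which is the mechanism intended to protect against shifts between hospitals.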


Article 403

Title@2025-07-29 (2): Learning Kinetic Monte Carlo stochastic dynamics with Deep Generative Adversarial Networks

Title: Learning Kinetic Monte Carlo stochastic dynamics with Deep Generative Adversarial Networks

Learning Kinetic Monte Carlo stochastische Dynamik mit tiefen Generativen Adversarial Networks

与深创反对流网络一起学习运动式蒙特卡洛运动 2507.21763v1

Authors (4): Daniele Lanzoni, Olivier Pierre-Louis, Roberto Bergamaschini, Francesco Montalenti

We show that Generative Adversarial Networks (GANs) may be fruitfully exploited to learn stochastic dynamics, surrogating traditional models while capturing thermal fluctuations. Specifically, we showcase the application to a two-dimensional, many-particle system, focusing on surface-step fluctuations and on the related time-dependent roughness. After the construction of a dataset based on Kinetic Monte Carlo simulations, a conditional GAN is trained to propagate stochastically the state of the system in time, allowing the generation of new sequences with a reduced computational cost. Modifications with respect to standard GANs, which facilitate convergence and increase accuracy, are discussed. The trained network is demonstrated to quantitatively reproduce equilibrium and kinetic properties, including scaling laws, with deviations of a few percent from the exact value. Extrapolation limits and future perspectives are critically discussed.


Article 404

Title@2025-07-29 (2): Unified machine-learning framework for property prediction and time-evolution simulation of strained alloy microstructure

Title: Unified machine-learning framework for property prediction and time-evolution simulation of strained alloy microstructure

Unified Machine-Learning-Framework für die Eigenschaftsvorhersage und Zeit-Evolutions-Simulation von strapazierter Legierungs-Mikrostruktur

财产预测统一机械学习框架和累累合金微结构时间演变模拟 2507.21760v1

Authors (6): Andrea Fantasia, Daniele Lanzoni, Niccolò Di Eugenio, Angelo Monteleone, Roberto Bergamaschini, Francesco Montalenti

We introduce a unified machine-learning framework designed to conveniently tackle the temporal evolution of alloy microstructures under the influence of an elastic field. This approach allows for the simultaneous extraction of elastic parameters from a short trajectory and for the prediction of further microstructure evolution under their influence. This is demonstrated by focusing on spinodal decomposition in the presence of a lattice mismatch eta, and by carrying out an extensive comparison between the ground-truth evolution supplied by phase field simulations and the predictions of suitable convolutional recurrent neural network architectures. The two tasks may then be performed sequentially in a cascade framework. Under a wide spectrum of misfit conditions, the cascade model presented here accurately predicts eta and the full corresponding microstructure evolution, even when approaching critical conditions for spinodal decomposition. Scalability to larger computational domain sizes and mild extrapolation errors in time (for time sequences five times longer than those sampled during training) are demonstrated. The proposed framework is general and can be applied beyond the specific, prototypical system considered here as an example. Intriguingly, experimental videos could be used to infer unknown external parameters, prior to simulating further temporal evolution.


Article 405

Title@2025-07-29 (2): VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback

Title: VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback

VLA-Touch: Erweiterung von Vision-Language-Action-Modellen mit Dual-Level Taktiles Feedback

VLA-Touch:加强具有双轨反馈的愿景-语言-行动模式 2507.17294v2

Authors (5): Jianxin Bi, Kevin Yuchen Ma, Ce Hao, Mike Zheng Shou, Harold Soh

Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at https://github.com/jxbi1010/VLA-Touch.


Article 406

Title@2025-07-29 (2): Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification

Title: Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification

Verbesserung der Neural Network Training mit Hilfe von Dynamic Learning Rate Schedules für PINNs und Bildklassifikation

改善神经网络培训,利用动态学习率表改进个人信息网络和图像分类 2507.21749v1

Authors (3): D. Veerababu, Ashwin A. Raikar, Prasanta K. Ghosh

Training neural networks can be challenging, especially as the complexity of the problem increases. Even with wider or deeper networks, training can be a tedious process, especially if a hyperparameter is poorly chosen. The learning rate is one such crucial hyperparameter, and it is usually kept static during training. Learning dynamics in complex systems often requires a more adaptive approach to the learning rate. This adaptability becomes crucial to effectively navigate varying gradients and optimize the learning process. In this paper, a dynamic learning rate scheduler (DLRS) algorithm is presented that adapts the learning rate based on the loss values calculated during the training process. Experiments are conducted on problems related to physics-informed neural networks (PINNs) and image classification using multilayer perceptrons and convolutional neural networks, respectively. The results demonstrate that the proposed DLRS accelerates training and improves stability.


Article 407

Title@2025-07-29 (2): evoxels: A differentiable physics framework for voxel-based microstructure simulations

Title: evoxels: A differentiable physics framework for voxel-based microstructure simulations

evoxels: Ein differenzierbares Physik-Framework für Voxel-basierte Mikrostruktursimulationen

evoxels:基于 voxel 的微结构模拟法的不同物理框架 2507.21748v1

Authors (4): Simon Daubner, Alexander E. Cohen, Benjamin Dörich, Samuel J. Cooper

Materials science inherently spans disciplines: experimentalists use advanced microscopy to uncover micro- and nanoscale structure, while theorists and computational scientists develop models that link processing, structure, and properties. Bridging these domains is essential for inverse material design where you start from desired performance and work backwards to optimal microstructures and manufacturing routes. Integrating high-resolution imaging with predictive simulations and data-driven optimization accelerates discovery and deepens understanding of process-structure-property relationships. The differentiable physics framework evoxels is based on a fully Pythonic, unified voxel-based approach that integrates segmented 3D microscopy data, physical simulations, inverse modeling, and machine learning.


Article 408

Title@2025-07-29 (2): Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees

Title: Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees

Einmal quantifizieren, schnell trainieren: Allreduce-kompatible Kompression mit wahrnehmbaren Garantien

量化一次,快速列车:用可变担保进行减压-可比较压缩 2305.18627v2

Authors (4): Jihao Xin, Marco Canini, Peter Richtárik, Samuel Horváth

Distributed training enables large-scale deep learning, but suffers from high communication overhead, especially as models and datasets grow. Gradient compression, particularly quantization, is a promising approach to mitigate this bottleneck. However, existing quantization schemes are often incompatible with Allreduce, the dominant communication primitive in distributed deep learning, and many prior solutions rely on heuristics without theoretical guarantees. We introduce Global-QSGD, an Allreduce-compatible gradient quantization method that leverages global norm scaling to reduce communication overhead while preserving accuracy. Global-QSGD is backed by rigorous theoretical analysis, extending standard unbiased compressor frameworks to establish formal convergence guarantees. Additionally, we develop a performance model to evaluate its impact across different hardware configurations. Extensive experiments on NVLink, PCIe, and large-scale cloud environments show that Global-QSGD accelerates distributed training by up to 3.51% over baseline quantization methods, making it a practical and efficient solution for large-scale deep learning workloads.
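
To make the idea of Allreduce-compatible quantization more concrete, here is a minimal Python sketch under stated assumptions. The abstract does not specify the scheme, so the choices below (uniform levels, unbiased stochastic rounding, and taking the largest local L2 norm as the shared scale) are illustrative guesses rather than the Global-QSGD algorithm itself, and all function names are hypothetical. The point it shows is that once every worker quantizes against the same scale, the integer levels can be summed directly.

```python
import math
import random


def quantize(grad, global_norm, levels, rng):
    """Unbiased stochastic rounding of |g| / global_norm onto integer levels, with sign."""
    out = []
    for g in grad:
        scaled = abs(g) / global_norm * levels
        q = math.floor(scaled + rng.random())  # E[q] equals scaled
        out.append(q if g >= 0 else -q)
    return out


def allreduce_average(worker_grads, levels=16, seed=0):
    """Average gradients by summing integer levels that share one global scale."""
    rng = random.Random(seed)
    # Assumption: the shared scale is the largest local L2 norm across workers.
    global_norm = max(math.sqrt(sum(g * g for g in grad)) for grad in worker_grads)
    if global_norm == 0.0:
        return [0.0] * len(worker_grads[0])
    quantized = [quantize(grad, global_norm, levels, rng) for grad in worker_grads]
    # Same scale on every worker, so a plain sum stands in for the Allreduce step.
    summed = [sum(col) for col in zip(*quantized)]
    n = len(worker_grads)
    return [s * global_norm / (levels * n) for s in summed]


# Hypothetical gradients from two workers.
grads = [[0.5, -1.0, 0.25], [1.5, 0.5, -0.75]]
avg = allreduce_average(grads)
```

With per-worker scales, the quantized values would have to be decoded before being combined, which is what makes many quantizers awkward to use with a sum-based collective.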


Article 409

Title@2025-07-29 (2): Motion Diffusion Autoencoders: Enabling Attribute Manipulation in Human Motion Demonstrated on Karate Techniques

Title: Motion Diffusion Autoencoders: Enabling Attribute Manipulation in Human Motion Demonstrated on Karate Techniques

Motion Diffusion Autoencoder: Ermöglichen der Attributmanipulation in der menschlichen Bewegung demonstriert auf Karate-Techniken

运动扩散自动调控器:在空手道技术上展示的在人类运动中进行使能的特性操纵 2501.18729v2

Authors (2): Anthony Richardson, Felix Putze

Attribute manipulation deals with the problem of changing individual attributes of a data point or a time series, while leaving all other aspects unaffected. This work focuses on the domain of human motion, more precisely karate movement patterns. To the best of our knowledge, it presents the first success at manipulating attributes of human motion data. One of the key requirements for achieving attribute manipulation on human motion is a suitable pose representation. Therefore, we design a novel continuous, rotation-based pose representation that enables the disentanglement of the human skeleton and the motion trajectory, while still allowing an accurate reconstruction of the original anatomy. The core idea of the manipulation approach is to use a transformer encoder for discovering high-level semantics, and a diffusion probabilistic model for modeling the remaining stochastic variations. We show that the embedding space obtained from the transformer encoder is semantically meaningful and linear. This enables the manipulation of high-level attributes by discovering their linear direction of change in the semantic embedding space and moving the embedding along that direction. All code and data are made publicly available.
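
The manipulation step described above, moving an embedding along a linear direction of change, can be sketched in a few lines of Python. The difference-of-class-means estimate used here is an assumption chosen for simplicity (the paper may find the direction differently, for example with a linear classifier), and the two-dimensional embeddings are made up for illustration. Decoding the shifted embedding back into motion with the diffusion model is not shown.

```python
def mean_vector(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attribute_direction(with_attr, without_attr):
    """Difference of class means: one simple estimate of a linear direction of change."""
    mu_a, mu_b = mean_vector(with_attr), mean_vector(without_attr)
    return [a - b for a, b in zip(mu_a, mu_b)]


def move_along(embedding, direction, step):
    """Shift an embedding by step times the attribute direction."""
    return [e + step * d for e, d in zip(embedding, direction)]


# Hypothetical 2-D semantic embeddings for two groups of motions.
high_skill = [[1.0, 2.0], [3.0, 2.0]]
low_skill = [[0.0, 1.0], [0.0, 3.0]]
direction = attribute_direction(high_skill, low_skill)
edited = move_along([0.5, 0.5], direction, 0.5)
```

This only works as intended if the embedding space is close to linear in the attribute, which is the property the abstract reports for the transformer encoder.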


Article 410

Title@2025-07-29 (2): Zero-Shot Machine Unlearning with Proxy Adversarial Data Generation

Title: Zero-Shot Machine Unlearning with Proxy Adversarial Data Generation

Zero-Shot-Maschine-Entlernen mit Proxy-Adversarial-Datengenerierung

零热机离学,利用代理反对流数据生成 2507.21738v1

Authors (4): Huiqiang Chen, Tianqing Zhu, Xin Yu, Wanlei Zhou

Machine unlearning aims to remove the influence of specific samples from a trained model. A key challenge in this process is over-unlearning, where the model’s performance on the remaining data significantly drops due to the change in the model’s parameters. Existing unlearning algorithms depend on the remaining data to prevent this issue. As such, these methods are inapplicable in a more practical scenario, where only the unlearning samples are available (i.e., zero-shot unlearning). This paper presents a novel framework, ZS-PAG, to fill this gap. Our approach offers three key innovations: (1) we approximate the inaccessible remaining data by generating adversarial samples; (2) leveraging the generated samples, we pinpoint a specific subspace to perform the unlearning process, therefore preventing over-unlearning in the challenging zero-shot scenario; and (3) we consider the influence of the unlearning process on the remaining samples and design an influence-based pseudo-labeling strategy. As a result, our method further improves the model’s performance after unlearning. The proposed method holds a theoretical guarantee, and experiments on various benchmarks validate the effectiveness and superiority of our proposed method over several baselines.


Article 411

Title@2025-07-29 (2): Generalized few-shot transfer learning architecture for modeling the EDFA gain spectrum

Title: Generalized few-shot transfer learning architecture for modeling the EDFA gain spectrum

Generalisierte wenig-shot Transfer Lernarchitektur für die Modellierung der EDFA Gain-Spektrum

用于模拟欧洲开发协会增益频谱的通用的几发转让学习架构 2507.21728v1

Authors (5): Agastya Raj, Zehao Wang, Tingjun Chen, Daniel C Kilper, Marco Ruffini

Accurate modeling of the gain spectrum in Erbium-Doped Fiber Amplifiers (EDFAs) is essential for optimizing optical network performance, particularly as networks evolve toward multi-vendor solutions. In this work, we propose a generalized few-shot transfer learning architecture based on a Semi-Supervised Self-Normalizing Neural Network (SS-NN) that leverages internal EDFA features, such as VOA input or output power and attenuation, to improve gain spectrum prediction. Our SS-NN model employs a two-phase training strategy comprising unsupervised pre-training with noise-augmented measurements and supervised fine-tuning with a custom weighted MSE loss. Furthermore, we extend the framework with transfer learning (TL) techniques that enable both homogeneous (same-feature space) and heterogeneous (different-feature sets) model adaptation across booster, preamplifier, and ILA EDFAs. To address feature mismatches in heterogeneous TL, we incorporate a covariance matching loss to align second-order feature statistics between source and target domains. Extensive experiments conducted across 26 EDFAs in the COSMOS and Open Ireland testbeds demonstrate that the proposed approach significantly reduces the number of measurements required on the system while achieving lower mean absolute errors and improved error distributions compared to benchmark methods.
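
A covariance matching loss of the kind mentioned above can be sketched as follows. The abstract does not give the exact form, so this Python example assumes a CORAL-style loss, the squared Frobenius distance between source and target feature covariances divided by $4d^2$, applied to features of equal dimension $d$ (for instance a shared hidden layer). The function names and toy features are hypothetical.

```python
def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def covariance_matching_loss(source, target):
    """Squared Frobenius distance between the two covariances, scaled by 1 / (4 d^2)."""
    cs, ct = covariance(source), covariance(target)
    d = len(cs)
    total = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return total / (4 * d * d)


# Hypothetical 2-D features: positively correlated in the source, negatively in the target.
source_feats = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target_feats = [[0.0, 0.0], [1.0, -1.0], [2.0, -2.0]]
loss_same = covariance_matching_loss(source_feats, source_feats)
loss_diff = covariance_matching_loss(source_feats, target_feats)
```

Adding such a term to the fine-tuning objective pushes the second-order statistics of the target features toward those of the source, without requiring paired measurements.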


Article 412

Title@2025-07-29 (2): Riemannian Optimization on Tree Tensor Networks with Application in Machine Learning

Title: Riemannian Optimization on Tree Tensor Networks with Application in Machine Learning

Riemannsche Optimierung auf Tree Tensor-Netzwerken mit Anwendung im maschinellen Lernen

Riemannian 利用机器学习应用在树透镜网络上的优化 2507.21726v1

Authors (3): Marius Willner, Marco Trenti, Dirk Lebiedz

Tree tensor networks (TTNs) are widely used in low-rank approximation and quantum many-body simulation. In this work, we present a formal analysis of the differential geometry underlying TTNs. Building on this foundation, we develop efficient first- and second-order optimization algorithms that exploit the intrinsic quotient structure of TTNs. Additionally, we devise a backpropagation algorithm for training TTNs in a kernel learning setting. We validate our methods through numerical experiments on a representative machine learning task.


Article 413

Title@2025-07-29 (2): Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis

Title: Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis

Intrinsische Barrieren und praktische Wege für die Mensch-AI-Ausrichtung: Eine auf Vereinbarungen basierende Komplexitätsanalyse

内在障碍和人类-AI协调的实用途径:基于协定的复杂程度分析 2502.05934v2

Authors (1): Aran Nayebi

We formalize AI alignment as a multi-objective optimization problem called $\langle M,N,\varepsilon,\delta\rangle$-agreement that generalizes prior approaches with fewer assumptions, in which a set of $N$ agents (including humans) must reach approximate ($\varepsilon$) agreement across $M$ candidate objectives with probability at least $1-\delta$. Using communication complexity, we prove an information-theoretic lower bound demonstrating that once either $M$ or $N$ is large enough, no interaction or rationality can avoid intrinsic alignment overheads. This barrier establishes rigorous intrinsic limits to alignment itself, not merely to specific methods, clarifying a crucial “no free lunch” principle: encoding “all human values” inevitably leads to misalignment, requiring future methods to explicitly manage complexity through consensus-driven reduction or prioritization of objectives. Complementing this impossibility result, we provide explicit algorithms achieving alignment under both computationally unbounded and bounded rationality with noisy messages. Even in these best-case scenarios where alignment to arbitrary precision is theoretically guaranteed, our analysis identifies three critical scalability barriers: the number of tasks ($M$), agents ($N$), and task state space size ($D$); thereby highlighting fundamental complexity-theoretic constraints and providing guidelines for safer, scalable human-AI collaboration.


Article 414

Title@2025-07-29 (2): Robust Matrix Completion for Discrete Rating-Scale Data: Coping with Fake Profiles in Recommender Systems

Title: Robust Matrix Completion for Discrete Rating-Scale Data: Coping with Fake Profiles in Recommender Systems

Robuste Matrix-Vervollständigung für diskrete Rating-Scale-Daten: Umgang mit gefälschten Profilen in Recommender-Systemen

分立评分尺度数据强力矩阵补全:在推荐人系统中处理假配置配置文件 2412.20802v2

Authors (3): Aurore Archimbaud, Andreas Alfons, Ines Wilms

Recommender systems are essential tools in the digital landscape for connecting users with content that more closely aligns with their preferences. Matrix completion is a widely used statistical framework for such systems, aiming to predict a user’s preferences for items they have not yet rated by leveraging the observed ratings in a partially filled user-item rating matrix. Realistic applications of matrix completion in recommender systems must address several challenges that are too often neglected: (i) the discrete nature of rating-scale data, (ii) the presence of malicious users who manipulate the system to their advantage through the creation of fake profiles, and (iii) missing-not-at-random patterns, where users are more likely to rate items they expect to enjoy. Our goal in this paper is twofold. First, we propose a novel matrix completion method, robust discrete matrix completion (RDMC), designed specifically to handle the discrete nature of sparse rating-scale data and to remain reliable in the presence of adversarial manipulation. We evaluate RDMC through carefully designed experiments and realistic case studies. Second, our work thereby offers a statistically sound blueprint for future studies on how to evaluate matrix completion methods for recommender systems under realistic scenarios.


Article 415

Title@2025-07-29 (2): Data-Driven Extended Corresponding State Approach for Residual Property Prediction of Hydrofluoroolefins

Title: Data-Driven Extended Corresponding State Approach for Residual Property Prediction of Hydrofluoroolefins

Datengetriebener erweiterter korrespondierender State Approach für die Vorhersage residualer Eigenschaften von Hydrofluorolefinen

关于氢氟烯烃残余财产预测的数据驱动扩展对应对应国家办法 2507.21720v1

Authors (2): Gang Wang, Peng Hu

Hydrofluoroolefins are considered the most promising next-generation refrigerants due to their extremely low global warming potential values, which can effectively mitigate the global warming effect. However, the lack of reliable thermodynamic data hinders the discovery and application of newer and superior hydrofluoroolefin refrigerants. In this work, integrating the strengths of theoretical and data-driven methods, we propose a neural network extended corresponding state model to predict the residual thermodynamic properties of hydrofluoroolefin refrigerants. The innovation is that the fluids are characterized through their microscopic molecular structures by including a graph neural network module, and that the model architecture is specially designed to enhance its generalization ability. The proposed model is trained using highly accurate data for known fluids and evaluated via leave-one-out cross-validation. Compared to conventional extended corresponding state models or cubic equations of state, the proposed model shows significantly improved accuracy for density and energy properties in liquid and supercritical regions, with average absolute deviation of 1.49% (liquid) and 2.42% (supercritical) for density, 3.37% and 2.50% for residual entropy, 1.85% and 1.34% for residual enthalpy. These results demonstrate the effectiveness of embedding physics knowledge into the machine learning model. The proposed neural network extended corresponding state model is expected to significantly accelerate the discovery of novel hydrofluoroolefin refrigerants.


Article 416

Title@2025-07-29 (2): Quantum enhanced stratification of Breast Cancer: exploring quantum expressivity for real omics data

Title: Quantum enhanced stratification of Breast Cancer: exploring quantum expressivity for real omics data

Quantenverstärkte Schichtung des Brustkrebses: Erforschung der Quantenexpressivität für reale Omics-Daten

量子增强乳腺癌分层:探索真实动脉数据的数量表达性 2409.14089v2

Authors (4): Valeria Repetto, Elia Giuseppe Ceroni, Giuseppe Buonaiuto, Romina D’Aurizio

Quantum Machine Learning (QML) is considered one of the most promising applications of Quantum Computing in the Noisy Intermediate Scale Quantum (NISQ) era for the impact it is thought to have in the near future. Despite promising theoretical assumptions, the exploration of how QML could foster new discoveries in Medicine and Biology is still in its infancy, with few examples. In this study, we aimed to assess whether Quantum Kernels (QK) could effectively classify subtypes of Breast Cancer (BC) patients on the basis of molecular characteristics. We performed a heuristic exploration of encoding configurations with different entanglement levels to determine a trade-off between kernel expressivity and performance. Our results show that QKs yield clustering results comparable to classical methods while using fewer data points, and are able to fit the data with a higher number of clusters. Additionally, we conducted the experiments on the Quantum Processing Unit (QPU) to evaluate the effect of noise on the outcome. We found that less expressive encodings showed a higher resilience to noise, indicating that the computational pipeline can be reliably implemented on NISQ devices. Our findings suggest that QK methods show promise for application in Precision Oncology, especially in scenarios where the dataset is limited in size and a granular non-trivial stratification of complex molecular data cannot be achieved classically.


Article 417

Title@2025-07-29 (2): An Equal-Probability Partition of the Sample Space: A Non-parametric Inference from Finite Samples

Title: An Equal-Probability Partition of the Sample Space: A Non-parametric Inference from Finite Samples

Eine gleichberechtigte Teilung des Probenraums: Eine nicht-parametrische Folgerung von Finite-Proben

样板空间的同等概率部分:来自有限样品的非参数推论 2507.21712v1

Authors (1): Urban Eriksson

This paper investigates what can be inferred about an arbitrary continuous probability distribution from a finite sample of $N$ observations drawn from it. The central finding is that the $N$ sorted sample points partition the real line into $N+1$ segments, each carrying an expected probability mass of exactly $1/(N+1)$. This non-parametric result, which follows from fundamental properties of order statistics, holds regardless of the underlying distribution’s shape. This equal-probability partition yields a discrete entropy of $\log_2(N+1)$ bits, which quantifies the information gained from the sample and contrasts with Shannon’s results for continuous variables. I compare this partition-based framework to the conventional ECDF and discuss its implications for robust non-parametric inference, particularly in density and tail estimation.
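
The central claim above can be checked numerically. The following Python sketch draws repeated samples of size $N$ from an Exponential(1) distribution (an arbitrary choice made for illustration, since the result should not depend on the distribution), measures the true probability mass of each of the $N+1$ segments via the known CDF, and averages over trials. The function name, sample size, trial count, and seed are assumptions of this example.

```python
import math
import random


def expected_segment_masses(n, trials=20000, seed=0):
    """Monte Carlo estimate of the mean probability mass of each of the n + 1 segments."""
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(trials):
        # Exponential(1) samples, whose CDF is F(x) = 1 - exp(-x).
        xs = sorted(rng.expovariate(1.0) for _ in range(n))
        cdf = [0.0] + [1.0 - math.exp(-x) for x in xs] + [1.0]
        for k in range(n + 1):
            totals[k] += cdf[k + 1] - cdf[k]
    return [t / trials for t in totals]


n = 4
masses = expected_segment_masses(n)  # each entry should be close to 1 / (n + 1)
entropy_bits = math.log2(n + 1)      # entropy of a uniform partition into n + 1 cells
```

Each averaged mass should land near $1/(N+1) = 0.2$ for $N = 4$, including the two unbounded tail segments, which is what makes the partition useful for tail estimation.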


Article 418

Title@2025-07-29 (2): PREIG: Physics-informed and Reinforcement-driven Interpretable GRU for Commodity Demand Forecasting

Title: PREIG: Physics-informed and Reinforcement-driven Interpretable GRU for Commodity Demand Forecasting

PREIG: Physik-informierte und verstärkte interpretierbare GRU für die Prognose der Rohstoffnachfrage

PREIG: 物理知情和强化驱动的商品需求预测可解释的GRU 2507.21710v1

Authors (3): Hongwei Ma, Junbin Gao, Minh-Ngoc Tran

Accurately forecasting commodity demand remains a critical challenge due to volatile market dynamics, nonlinear dependencies, and the need for economically consistent predictions. This paper introduces PREIG, a novel deep learning framework tailored for commodity demand forecasting. The model uniquely integrates a Gated Recurrent Unit (GRU) architecture with physics-informed neural network (PINN) principles by embedding a domain-specific economic constraint: the negative elasticity between price and demand. This constraint is enforced through a customized loss function that penalizes violations of the physical rule, ensuring that model predictions remain interpretable and aligned with economic theory. To further enhance predictive performance and stability, PREIG incorporates a hybrid optimization strategy that couples NAdam and L-BFGS with Population-Based Training (POP). Experiments across multiple commodity datasets demonstrate that PREIG significantly outperforms traditional econometric models (ARIMA, GARCH) and deep learning baselines (BPNN, RNN) in both RMSE and MAPE. Compared with a plain GRU, PREIG maintains good explainability while still performing well in prediction. By bridging domain knowledge, optimization theory, and deep learning, PREIG provides a robust, interpretable, and scalable solution for high-dimensional nonlinear time series forecasting in economics.
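
One way to picture a loss that penalizes violations of negative price elasticity is sketched below. This is an illustrative assumption, not the PREIG loss: it approximates the price-demand slope by finite differences between consecutive time steps and applies a squared hinge penalty to positive slopes. The names `elasticity_penalty`, `total_loss`, and the weight `lam` are hypothetical.

```python
def elasticity_penalty(prices, demand_preds):
    """Mean squared hinge penalty on positive price-demand slopes.

    Assumption: the slope is approximated by finite differences between
    consecutive time steps; steps with no price change are skipped.
    """
    penalties = []
    for i in range(1, len(prices)):
        dp = prices[i] - prices[i - 1]
        dq = demand_preds[i] - demand_preds[i - 1]
        if dp == 0:
            continue
        slope = dq / dp
        # Only slopes that violate the constraint (slope > 0) are penalized.
        penalties.append(max(0.0, slope) ** 2)
    return sum(penalties) / len(penalties) if penalties else 0.0


def total_loss(y_true, y_pred, prices, lam=1.0):
    """Mean squared error plus the weighted elasticity penalty."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse + lam * elasticity_penalty(prices, y_pred)
```

With rising prices and falling predicted demand the penalty is zero, so the loss reduces to the plain forecasting error; the penalty only becomes active when predicted demand moves in the same direction as price.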

Article 419

Title@2025-07-29 (2): Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting

Title: Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting

Lokaler Aufmerksamkeitsmechanismus: Förderung der Transformer-Architektur für Langzeit-Zeitreihenprognosen

地方关注机制:促进长序列时间序列预测的变革结构 2410.03805v3

Authors (4): Ignacio Aguilera-Martos, Andrés Herrera-Poyatos, Julián Luengo, Francisco Herrera

Transformers have become the leading choice in natural language processing over other deep learning architectures. This trend has also permeated the field of time series analysis, especially for long-horizon forecasting, showcasing promising results both in performance and running time. In this paper, we introduce the Local Attention Mechanism (LAM), an efficient attention mechanism tailored for time series analysis. This mechanism exploits the continuity properties of time series to reduce the number of attention scores computed. We present an algorithm for implementing LAM in tensor algebra that runs in $O(n \log n)$ time and memory, significantly improving upon the $O(n^2)$ time and memory complexity of traditional attention mechanisms. We also note the lack of proper datasets to evaluate long-horizon forecast models. Thus, we propose a novel set of datasets to improve the evaluation of models addressing long-horizon forecasting challenges. Our experimental analysis demonstrates that the vanilla transformer architecture augmented with LAM surpasses state-of-the-art models, including the vanilla attention mechanism. These results confirm the effectiveness of our approach and highlight a range of future challenges in long-sequence time series forecasting.
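
To illustrate how restricting attention to nearby time steps reduces the number of scores, here is a generic causal sliding-window attention in plain Python. It is not the LAM algorithm from the paper, whose details are not given in the abstract. The window size of about $\log_2 n$ is purely an illustrative assumption, chosen so that the number of scores is of order $n \log n$; the function names are hypothetical.

```python
import math


def window_for(n):
    """Illustrative window size of about log2(n); an assumption, not from the paper."""
    return max(1, math.ceil(math.log2(n)))


def local_attention(q, k, v, window):
    """Causal attention where step i sees only the last `window` steps (itself included)."""
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        lo = max(0, i - window + 1)
        # Scaled dot-product scores against the keys inside the window only.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(lo, i + 1)
        ]
        # Numerically stable softmax over the window.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        out.append([
            sum(w * v[lo + t][c] for t, w in enumerate(weights)) / norm
            for c in range(len(v[0]))
        ])
    return out
```

Each query computes at most `window` scores, so the total is at most `n * window` instead of `n * n`.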

Article 420

Title@2025-07-29 (2): Machine learning-based multimodal prognostic models integrating pathology images and high-throughput omic data for overall survival prediction in cancer: a systematic review

Title: Machine learning-based multimodal prognostic models integrating pathology images and high-throughput omic data for overall survival prediction in cancer: a systematic review

综合病理图象和高通量血压数据以全面预测癌症存活率的机器学习的多式联运预测模型:系统审查 2507.16876v2

Authors (6): Charlotte Jennings, Andrew Broad, Lucy Godson, Emily Clarke, David Westhead, Darren Treanor

Multimodal machine learning integrating histopathology and molecular data shows promise for cancer prognostication. We systematically reviewed studies combining whole slide images (WSIs) and high-throughput omics to predict overall survival. Searches of EMBASE, PubMed, and Cochrane CENTRAL (12/08/2024), plus citation screening, identified eligible studies. Data extraction used CHARMS; bias was assessed with PROBAST+AI; synthesis followed SWiM and PRISMA 2020. Protocol: PROSPERO (CRD42024594745). Forty-eight studies (all since 2017) across 19 cancer types met criteria; all used The Cancer Genome Atlas. Approaches included regularised Cox regression (n=4), classical ML (n=13), and deep learning (n=31). Reported c-indices ranged 0.550-0.857; multimodal models typically outperformed unimodal ones. However, all studies showed unclear/high bias, limited external validation, and little focus on clinical utility. Multimodal WSI-omics survival prediction is a fast-growing field with promising results but needs improved methodological rigor, broader datasets, and clinical evaluation. Funded by NPIC, Leeds Teaching Hospitals NHS Trust, UK (Project 104687), supported by UKRI Industrial Strategy Challenge Fund.

Article 421

Title@2025-07-29 (2): Hierarchical mixtures of Gaussians for combined dimensionality reduction and clustering

Title: Hierarchical mixtures of Gaussians for combined dimensionality reduction and clustering

Hierarchische Mischungen von Gaußianern zur kombinierten Dimensionalitätsreduktion und Clusterbildung

用于合并减少维度和集群的高斯人等级混合物 2206.04841v2

Authors (2): Sacha Sokoloski, Philipp Berens

We introduce hierarchical mixtures of Gaussians (HMoGs), which unify dimensionality reduction and clustering into a single probabilistic model. HMoGs provide closed-form expressions for the model likelihood, exact inference over latent states and cluster membership, and exact algorithms for maximum-likelihood optimization. The novel exponential family parameterization of HMoGs greatly reduces their computational complexity relative to similar model-based methods, allowing them to efficiently model hundreds of latent dimensions, and thereby capture additional structure in high-dimensional data. We demonstrate HMoGs on synthetic experiments and MNIST, and show how joint optimization of dimensionality reduction and clustering facilitates increased model performance. We also explore how sparsity-constrained dimensionality reduction can further improve clustering performance while encouraging interpretability. By bridging classical statistical modelling with the scale of modern data and compute, HMoGs offer a practical approach to high-dimensional clustering that preserves statistical rigour, interpretability, and uncertainty quantification that is often missing from embedding-based, variational, and self-supervised methods.

Article 422

Title@2025-07-29 (2): diffSPH: Differentiable Smoothed Particle Hydrodynamics for Adjoint Optimization and Machine Learning

Title: diffSPH: Differentiable Smoothed Particle Hydrodynamics for Adjoint Optimization and Machine Learning

diffSPH: Differenzierbare geglättete Partikelhydrodynamik für Adjoint-Optimierung und maschinelles Lernen

diffSPH: 用于联合优化和机械学习的有差异的滑动粒子流体动力学 2507.21684v1

Authors (2): Rene Winchenbach, Nils Thuerey

We present diffSPH, a novel open-source differentiable Smoothed Particle Hydrodynamics (SPH) framework developed entirely in PyTorch with GPU acceleration. diffSPH is designed centrally around differentiation to facilitate optimization and machine learning (ML) applications in Computational Fluid Dynamics (CFD), including training neural networks and the development of hybrid models. Its differentiable SPH core, and schemes for compressible (with shock capturing and multi-phase flows), weakly compressible (with boundary handling and free-surface flows), and incompressible physics, enable a broad range of application areas. We demonstrate the framework’s unique capabilities through several applications, including addressing particle shifting via a novel, target-oriented approach by minimizing physical and regularization loss terms, a task often intractable in traditional solvers. Further examples include optimizing initial conditions and physical parameters to match target trajectories, shape optimization, implementing a solver-in-the-loop setup to emulate higher-order integration, and demonstrating gradient propagation through hundreds of full simulation steps. Prioritizing readability, usability, and extensibility, this work offers a foundational platform for the CFD community to develop and deploy novel neural networks and adjoint optimization applications.

Article 423

Title@2025-07-29 (2): Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence

Title: Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence

Unüberwachte Risikofaktoren-Identifikation über Krebsarten und Datenmodalitäten durch erklärbare künstliche Intelligenz

通过可解释的人工智能,在癌症类型和数据模式中,通过可解释的人工智能,确定各种癌症类型和数据模式的不受监督的风险因素 2506.12944v3

Authors (10): Maximilian Ferle, Jonas Ader, Thomas Wiemers, Nora Grieb, Adrian Lindenmeyer, Hans-Jonas Meyer, Thomas Neumuth, Markus Kreuz, Kristin Reiche, Maximilian Merz

Risk stratification is a key tool in clinical decision-making, yet current approaches often fail to translate sophisticated survival analysis into actionable clinical criteria. We present a novel method for unsupervised machine learning that directly optimizes for survival heterogeneity across patient clusters through a differentiable adaptation of the multivariate logrank statistic. Unlike most existing methods that rely on proxy metrics, our approach represents a novel methodology for training any neural network architecture on any data modality to identify prognostically distinct patient groups. We thoroughly evaluate the method in simulation experiments and demonstrate its utility in practice by applying it to two distinct cancer types: analyzing laboratory parameters from multiple myeloma patients and computed tomography images from non-small cell lung cancer patients, identifying prognostically distinct patient subgroups with significantly different survival outcomes in both cases. Post-hoc explainability analyses uncover clinically meaningful features determining the group assignments which align well with established risk factors and thus lend strong weight to the method’s utility. This pan-cancer, model-agnostic approach represents a valuable advancement in clinical risk stratification, enabling the discovery of novel prognostic signatures across diverse data types while providing interpretable results that promise to complement treatment personalization and clinical decision-making in oncology and beyond.

Article 424

Title@2025-07-29 (2): Implementing Large Quantum Boltzmann Machines as Generative AI Models for Dataset Balancing

Title: Implementing Large Quantum Boltzmann Machines as Generative AI Models for Dataset Balancing

Implementierung großer Quantenboltzmann-Maschinen als generative KI-Modelle für die Datensatz-Balancing

实施大型量子波尔兹曼机器作为数据集平衡生成的AI模型 2502.03086v2

Authors (6): Salvatore Sinno, Markus Bertl, Arati Sahoo, Bhavika Bhalgamiya, Thomas Groß, Nicholas Chancellor

This study explores the implementation of large Quantum Restricted Boltzmann Machines (QRBMs), a key advancement in Quantum Machine Learning (QML), as generative models on D-Wave’s Pegasus quantum hardware to address dataset imbalance in Intrusion Detection Systems (IDS). By leveraging Pegasus’s enhanced connectivity and computational capabilities, a QRBM with 120 visible and 120 hidden units was successfully embedded, surpassing the limitations of default embedding tools. The QRBM synthesized over 1.6 million attack samples, achieving a balanced dataset of over 4.2 million records. Comparative evaluations with traditional balancing methods, such as SMOTE and RandomOverSampler, revealed that QRBMs produced higher-quality synthetic samples, significantly improving detection rates, precision, recall, and F1 score across diverse classifiers. The study underscores the scalability and efficiency of QRBMs, completing balancing tasks in milliseconds. These findings highlight the transformative potential of QML and QRBMs as next-generation tools in data preprocessing, offering robust solutions for complex computational challenges in modern information systems.

Article 425

Title@2025-07-29 (2): Probabilistic Consistency in Machine Learning and Its Connection to Uncertainty Quantification

Title: Probabilistic Consistency in Machine Learning and Its Connection to Uncertainty Quantification

Wahrscheinlichkeitskonsistenz im maschinellen Lernen und seine Verbindung zur Unsicherheitsquantifizierung

机器学习及其与不确定性量化的关联的概率一致性 2507.21670v1

Authors (2): Paul Patrone, Anthony Kearsley

Machine learning (ML) is often viewed as a powerful data analysis tool that is easy to learn because of its black-box nature. Yet this very nature also makes it difficult to quantify confidence in predictions extracted from ML models, and more fundamentally, to understand how such models are mathematical abstractions of training data. The goal of this paper is to unravel these issues and their connections to uncertainty quantification (UQ) by pursuing a line of reasoning motivated by diagnostics. In such settings, prevalence (i.e., the fraction of elements in a class) is often of inherent interest. Here we analyze the many interpretations of prevalence to derive a level-set theory of classification, which shows that certain types of self-consistent ML models are equivalent to class-conditional probability distributions. We begin by studying the properties of binary Bayes optimal classifiers, recognizing that their boundary sets can be reinterpreted as level-sets of pairwise density ratios. By parameterizing Bayes classifiers in terms of the prevalence, we then show that they satisfy important monotonicity and class-switching properties that can be used to deduce the density ratios without direct access to the boundary sets. Moreover, this information is sufficient for tasks such as constructing the multiclass Bayes-optimal classifier and estimating inherent uncertainty in the class assignments. In the multiclass case, we use these results to deduce normalization and self-consistency conditions, the latter being equivalent to the law of total probability for classifiers. We also show that these are necessary conditions for arbitrary ML models to have valid probabilistic interpretations. Throughout, we demonstrate how this analysis informs the broader task of UQ for ML via an uncertainty propagation framework.
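
The level-set reading of the binary Bayes classifier can be made concrete with a short derivation. The notation here is an assumption for illustration, not the paper's: $p_0(x)$ and $p_1(x)$ are the class-conditional densities and $q \in (0,1)$ is the prevalence of class 1. The classifier that minimizes the error rate assigns $x$ to class 1 when its weighted density is larger:

$$
\hat{C}(x; q) = 1 \iff q\,p_1(x) > (1-q)\,p_0(x) \iff \frac{p_1(x)}{p_0(x)} > \frac{1-q}{q}.
$$

The decision boundary is therefore a level set of the density ratio,

$$
B(q) = \left\{ x : \frac{p_1(x)}{p_0(x)} = \frac{1-q}{q} \right\}.
$$

Since $(1-q)/q$ decreases as $q$ grows, the class-1 region can only expand with increasing prevalence (monotonicity). A point $x$ with $p_0(x) + p_1(x) > 0$ switches class at the prevalence

$$
q^\ast(x) = \frac{p_0(x)}{p_0(x) + p_1(x)}, \qquad \text{so that} \qquad \frac{p_1(x)}{p_0(x)} = \frac{1 - q^\ast(x)}{q^\ast(x)}.
$$

Observing where the class assignment of $x$ flips as $q$ varies is thus enough to recover the density ratio at $x$.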

Article 426

Title@2025-07-29 (2): Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification

Title: Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification

Quantum Boltzmann Maschinen mit paralleler Abschirmung für medizinische Bildklassifikation

使用平行安内处理医疗图像分类的量子波尔兹曼机器 2507.14116v2

Authors (8): Daniëlle Schuman, Mark V. Seebode, Tobias Rohe, Maximilian Balthasar Mansky, Michael Schroedl-Baumann, Jonas Stein, Claudia Linnhoff-Popien, Florian Krellner

Exploiting the fact that samples drawn from a quantum annealer inherently follow a Boltzmann-like distribution, annealing-based Quantum Boltzmann Machines (QBMs) have gained increasing popularity in the quantum research community. While they hold great promise for quantum speed-up, their usage currently remains a costly endeavor, as large amounts of QPU time are required to train them. This limits their applicability in the NISQ era. Following the idea of Noè et al. (2024), who tried to alleviate this cost by incorporating parallel quantum annealing into their unsupervised training of QBMs, this paper presents an improved version of parallel quantum annealing that we employ to train QBMs in a supervised setting. Saving qubits to encode the inputs, the latter setting allows us to test our approach on medical images from the MedMNIST data set (Yang et al., 2023), thereby moving closer to real-world applicability of the technology. Our experiments show that QBMs using our approach already achieve reasonable results, comparable to those of similarly-sized Convolutional Neural Networks (CNNs), with markedly smaller numbers of epochs than these classical models. Our parallel annealing technique leads to a speed-up of almost 70% compared to regular annealing-based BM executions.

Article 427

Title@2025-07-29 (2): DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs

Title: DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs

DGP: Ein Dual-Granularity-Prompting-Framework für Betrugserkennung mit grafisch verbesserten LLMs

DGP:用图形增强的LMMs进行欺诈侦查的两层提示框架 2507.21653v1

Authors (5): Yuan Li, Jun Hu, Bryan Hooi, Bingsheng He, Cheng Chen

Real-world fraud detection applications benefit from graph learning techniques that jointly exploit node features, often rich in textual data, and graph structural information. Recently, Graph-Enhanced LLMs have emerged as a promising graph learning approach that converts graph information into prompts, exploiting LLMs’ ability to reason over both textual and structural information. Among them, text-only prompting, which converts graph information to prompts consisting solely of text tokens, offers a solution that relies only on LLM tuning without requiring additional graph-specific encoders. However, text-only prompting struggles on heterogeneous fraud-detection graphs: multi-hop relations expand exponentially with each additional hop, leading to rapidly growing neighborhoods associated with dense textual information. These neighborhoods may overwhelm the model with long, irrelevant content in the prompt and suppress key signals from the target node, thereby degrading performance. To address this challenge, we propose Dual Granularity Prompting (DGP), which mitigates information overload by preserving fine-grained textual details for the target node while summarizing neighbor information into coarse-grained text prompts. DGP introduces tailored summarization strategies for different data modalities (bi-level semantic abstraction for textual fields and statistical aggregation for numerical features), enabling effective compression of verbose neighbor content into concise, informative prompts. Experiments across public and industrial datasets demonstrate that DGP operates within a manageable token budget while improving fraud detection performance by up to 6.8% (AUPRC) over state-of-the-art methods, showing the potential of Graph-Enhanced LLMs for fraud detection.
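
The dual-granularity idea can be sketched as a small prompt builder that keeps the target node's text verbatim and compresses the neighbors. This is a hedged illustration, not the DGP implementation: the field names `text` and `amount` are hypothetical, plain word truncation stands in for the paper's bi-level semantic abstraction, and count, mean, and max stand in for its statistical aggregation.

```python
from statistics import mean


def summarize_neighbors(neighbors, max_words=6):
    """Coarse-grained view: short text snippets plus aggregate numeric statistics."""
    if not neighbors:
        return [], {"count": 0, "mean_amount": 0.0, "max_amount": 0.0}
    # Assumption: truncation is a stand-in for a real semantic summarizer.
    snippets = [" ".join(n["text"].split()[:max_words]) for n in neighbors]
    amounts = [n["amount"] for n in neighbors]
    stats = {
        "count": len(neighbors),
        "mean_amount": mean(amounts),
        "max_amount": max(amounts),
    }
    return snippets, stats


def build_prompt(target, neighbors):
    """Fine-grained text for the target node, coarse summary for its neighborhood."""
    snippets, stats = summarize_neighbors(neighbors)
    lines = [
        "Target node (full detail):",
        target["text"],
        f"Neighbors: {stats['count']}, mean amount {stats['mean_amount']:.2f}, "
        f"max amount {stats['max_amount']:.2f}",
        "Neighbor snippets: " + " | ".join(snippets),
        "Is the target node fraudulent? Answer yes or no.",
    ]
    return "\n".join(lines)
```

Because each neighbor contributes a bounded number of words and the numeric fields collapse into three statistics, the prompt length grows slowly with neighborhood size.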

Article 428

Title@2025-07-29 (2): Hyperbolic Genome Embeddings

Title: Hyperbolic Genome Embeddings

Hyperbolische Genom-Embeddings

超双曲基基因组嵌入器 2507.21648v1

Authors (3): Raiyan R. Khan, Philippe Chlenski, Itsik Pe’er

Current approaches to genomic sequence modeling often struggle to align the inductive biases of machine learning models with the evolutionarily-informed structure of biological systems. To this end, we formulate a novel application of hyperbolic CNNs that exploits this structure, enabling more expressive DNA sequence representations. Our strategy circumvents the need for explicit phylogenetic mapping while discerning key properties of sequences pertaining to core functional and regulatory behavior. Across 37 out of 42 genome interpretation benchmark datasets, our hyperbolic models outperform their Euclidean equivalents. Notably, our approach even surpasses state-of-the-art performance on seven GUE benchmark datasets, consistently outperforming many DNA language models while using orders of magnitude fewer parameters and avoiding pretraining. Our results include a novel set of benchmark datasets, the Transposable Elements Benchmark, which explores a major but understudied component of the genome with deep evolutionary significance. We further motivate our work by exploring how our hyperbolic models recognize genomic signal under various data-generating conditions and by constructing an empirical method for interpreting the hyperbolicity of dataset embeddings. Throughout these assessments, we find persistent evidence highlighting the potential of our hyperbolic framework as a robust paradigm for genome representation learning. Our code and benchmark datasets are available at https://github.com/rrkhan/HGE.

Article 429

Title@2025-07-29 (2): Whilter: A Whisper-based Data Filter for “In-the-Wild” Speech Corpora Using Utterance-level Multi-Task Classification

Title: Whilter: A Whisper-based Data Filter for “In-the-Wild” Speech Corpora Using Utterance-level Multi-Task Classification

Whilter: Ein Whisper-basierter Datenfilter für “In-the-Wild”-Sprachkorpora unter Verwendung einer Multi-Task-Klassifikation auf Utterance-Ebene

时 : 以语音为基础的数据过滤器, 用于“在野中”演讲团, 使用异地级多任务分类 2507.21642v1

Authors (6): William Ravenscroft, George Close, Kit Bower-Morris, Jamie Stacey, Dmitry Sityaev, Kris Y. Hong

Large-scale in-the-wild speech datasets have become more prevalent in recent years due to increased interest in models that can learn useful features from unlabelled data for tasks such as speech recognition or synthesis. These datasets often contain undesirable features, such as multiple speakers, non-target languages, and music, which may impact model learning. The Whilter model is proposed as a multitask solution to identify these undesirable samples. Whilter uses a Whisper encoder with an attention-based classifier to solve five diverse classification problems at once. In addition, an annotated dataset is published for a subset of two popular in-the-wild corpora. Whilter achieves F1 scores above 85% and equal error rates of 6.5% to 7.8% for three of five subtasks, outperforming a state-of-the-art BEATs classifier on speech-specific classes, with a notable decrease in processing time compared to a combination of single-task alternatives.

Article 430

Title@2025-07-29 (2): Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics

Title: Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics

Assistax: Ein hardwarebeschleunigtes Lern-Benchmark für assistive Robotik

辅助:辅助机器人学辅助机器人学硬件加速增强学习基准 2507.21638v1

Authors (9): Leonard Hinckeldey, Elliot Fosong, Elle Miller, Rimvydas Rubavicius, Trevor McInroe, Patricia Wollstadt, Christiane B. Wiebel-Herboth, Subramanian Ramamoorthy, Stefano V. Albrecht

The development of reinforcement learning (RL) algorithms has been largely driven by ambitious challenge tasks and benchmarks. Games have dominated RL benchmarks because they present relevant challenges, are inexpensive to run and easy to understand. While games such as Go and Atari have led to many breakthroughs, they often do not directly translate to real-world embodied applications. Recognising the need to diversify RL benchmarks and to address the complexities that arise in embodied interaction scenarios, we introduce Assistax: an open-source benchmark designed to address challenges arising in assistive robotics tasks. Assistax uses JAX’s hardware acceleration for significant speed-ups for learning in physics-based simulations. In terms of open-loop wall-clock time, Assistax runs up to $370\times$ faster when vectorising training runs compared to CPU-based alternatives. Assistax conceptualises the interaction between an assistive robot and an active human patient using multi-agent RL to train a population of diverse partner agents against which an embodied robotic agent’s zero-shot coordination capabilities can be tested. Extensive evaluation and hyperparameter tuning for popular continuous control RL and MARL algorithms provide reliable baselines and establish Assistax as a practical benchmark for advancing RL research for assistive robotics. The code is available at: https://github.com/assistive-autonomy/assistax.

Article 431

Title@2025-07-29 (2): Defending Against Unforeseen Failure Modes with Latent Adversarial Training

Title: Defending Against Unforeseen Failure Modes with Latent Adversarial Training

Verteidigung gegen unvorhergesehene Ausfallmodi mit latenten Adversarial Training

利用远程反反向培训,防范意外失灵模式 2403.05030v6

Authors (4): Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell

Despite extensive diagnostics and debugging by developers, AI systems sometimes exhibit harmful unintended behaviors. Finding and fixing these is challenging because the attack surface is so large – it is not tractable to exhaustively search for inputs that may elicit harmful behaviors. Red-teaming and adversarial training (AT) are commonly used to improve robustness; however, they empirically struggle to fix failure modes that differ from the attacks used during training. In this work, we utilize latent adversarial training (LAT) to defend against vulnerabilities without leveraging knowledge of what they are or using inputs that elicit them. LAT makes use of the compressed, abstract, and structured latent representations of concepts that the network actually uses for prediction. Here, we use it to defend against failure modes without examples that elicit them. Specifically, we use LAT to remove backdoors and defend against held-out classes of adversarial attacks. We show in image classification, text classification, and text generation tasks that LAT usually improves both robustness to novel attacks and performance on clean data relative to AT. This suggests that LAT can be a promising tool for defending against failure modes that are not explicitly identified by developers.

Article 432

Title@2025-07-29 (2): Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Title: Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Latent Adversarial Training verbessert Robustheit für persistente schädliche Verhalten in LLMs

长效对长效有害行为培训能提高长效LMM中持久性有害行为的积极性 2407.15549v3

Authors (11): Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper

Large language models (LLMs) can often be made to behave in undesirable ways that they are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a wide variety of ‘jailbreaking’ techniques to elicit harmful text from models that were fine-tuned to be harmless. Recent work on red-teaming, model editing, and interpretability suggests that this challenge stems from how (adversarial) fine-tuning largely serves to suppress rather than remove undesirable capabilities from LLMs. Prior work has introduced latent adversarial training (LAT) as a way to improve robustness to broad classes of failures. These prior works have considered untargeted latent space attacks where the adversary perturbs latent activations to maximize loss on examples of desirable behavior. Untargeted LAT can provide a generic type of robustness but does not leverage information about specific failure modes. Here, we experiment with targeted LAT where the adversary seeks to minimize loss on a specific competing task. We find that it can augment a wide variety of state-of-the-art methods. First, we use targeted LAT to improve robustness to jailbreaks, outperforming a strong R2D2 baseline with orders of magnitude less compute. Second, we use it to more effectively remove backdoors with no knowledge of the trigger. Finally, we use it to more effectively unlearn knowledge for specific undesirable tasks in a way that is also more robust to re-learning. Overall, our results suggest that targeted LAT can be an effective tool for defending against harmful behaviors from LLMs.
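
The difference between the untargeted and targeted adversaries can be written schematically. The notation is an assumption for illustration and does not come from the paper: $h_\theta(x)$ denotes the latent activations at some layer, $g_\theta$ the remainder of the network, $\ell$ a loss, $y^{+}$ a desirable output, $y^{-}$ the output of the competing (harmful) task, and $\epsilon$ an assumed bound on the perturbation norm.

$$
\delta_{\text{untargeted}} = \arg\max_{\|\delta\| \le \epsilon} \; \ell\big(g_\theta(h_\theta(x) + \delta),\, y^{+}\big),
\qquad
\delta_{\text{targeted}} = \arg\min_{\|\delta\| \le \epsilon} \; \ell\big(g_\theta(h_\theta(x) + \delta),\, y^{-}\big).
$$

In both cases the model would then be trained to behave well under the perturbation, for example by solving

$$
\min_\theta \; \ell\big(g_\theta(h_\theta(x) + \delta),\, y^{+}\big).
$$

The untargeted adversary only pushes the model away from the desired behavior, whereas the targeted adversary steers it towards a specific failure, which is how information about that failure mode enters training.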

Article 433

Title@2025-07-29 (2): A calibration test for evaluating set-based epistemic uncertainty representations

Title: A calibration test for evaluating set-based epistemic uncertainty representations

Ein Kalibriertest zur Bewertung setbasierter epistemischer Unsicherheitsdarstellungen

用于评价基于固定定点的感知性不确定表示的校准测试 2502.16299v2

Authors (5): Mira Jürgens, Thomas Mortier, Eyke Hüllermeier, Viktor Bengs, Willem Waegeman

The accurate representation of epistemic uncertainty is a challenging yet essential task in machine learning. A widely used representation corresponds to convex sets of probabilistic predictors, also known as credal sets. One popular way of constructing these credal sets is via ensembling or specialized supervised learning methods, where the epistemic uncertainty can be quantified through measures such as the set size or the disagreement among members. In principle, these sets should contain the true data-generating distribution. As a necessary condition for this validity, we adopt the strongest notion of calibration as a proxy. Concretely, we propose a novel statistical test to determine whether there is a convex combination of the set’s predictions that is calibrated in distribution. In contrast to previous methods, our framework allows the convex combination to be instance dependent, recognizing that different ensemble members may be better calibrated in different regions of the input space. Moreover, we learn this combination via proper scoring rules, which inherently optimize for calibration. Building on differentiable, kernel-based estimators of calibration errors, we introduce a nonparametric testing procedure and demonstrate the benefits of capturing instance-level variability on synthetic and real-world experiments.

Article 434

Title@2025-07-29 (2): Hybrid activation functions for deep neural networks: S3 and S4 – a novel approach to gradient flow optimization

Title: Hybrid activation functions for deep neural networks: S3 and S4 – a novel approach to gradient flow optimization

Hybride Aktivierungsfunktionen für tiefe neuronale Netzwerke: S3 und S4 – ein neuartiger Ansatz zur Gradientenflussoptimierung

深神经网络的混合激活功能:S3和S4 – – 梯度流优化的新办法 2507.22090v1

Authors (1): Sergii Kavun

Activation functions are critical components in deep neural networks, directly influencing gradient flow, training stability, and model performance. Traditional functions like ReLU suffer from dead neuron problems, while sigmoid and tanh exhibit vanishing gradient issues. We introduce two novel hybrid activation functions: S3 (Sigmoid-Softsign) and its improved version S4 (smoothed S3). S3 combines sigmoid for negative inputs with softsign for positive inputs, while S4 employs a smooth transition mechanism controlled by a steepness parameter k. We conducted comprehensive experiments across binary classification, multi-class classification, and regression tasks using three different neural network architectures. S4 demonstrated superior performance compared to nine baseline activation functions, achieving 97.4% accuracy on MNIST, 96.0% on Iris classification, and 18.7 MSE on Boston Housing regression. The function exhibited faster convergence (-19 for ReLU) and maintained stable gradient flow across network depths. Comparative analysis revealed S4’s gradient range of [0.24, 0.59] compared to ReLU’s 18% dead neurons in deep networks. The S4 activation function addresses key limitations of existing functions through its hybrid design and smooth transition mechanism. The tunable parameter k allows adaptation to different tasks and network depths, making S4 a versatile choice for deep learning applications. These findings suggest that hybrid activation functions represent a promising direction for improving neural network training dynamics.
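
A minimal sketch of the two activations follows. S3 is written as the abstract describes it: sigmoid for negative inputs and softsign for positive inputs (placing $x = 0$ on the softsign branch is an assumption). The exact form of S4 is not given in the abstract, so the blend below, which weights the two branches by $\sigma(kx)$, is only one plausible reading of a "smooth transition controlled by a steepness parameter k" and should not be taken as the paper's definition.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softsign(x):
    return x / (1.0 + abs(x))


def s3(x):
    """Sigmoid for negative inputs, softsign otherwise (x = 0 placement is assumed)."""
    return sigmoid(x) if x < 0 else softsign(x)


def s4(x, k=5.0):
    """Assumed smooth variant: blend the two branches with weight sigmoid(k * x)."""
    alpha = sigmoid(k * x)
    return alpha * softsign(x) + (1.0 - alpha) * sigmoid(x)
```

Under this reading, S3 jumps at the origin (sigmoid approaches 0.5 from the left while softsign is 0 at zero), which is the kind of discontinuity a smooth transition would remove; a larger `k` makes S4 follow S3 more closely away from the origin.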

Article 435

Title@2025-07-29 (2): Collaborative filtering based on nonnegative/binary matrix factorization

Title: Collaborative filtering based on nonnegative/binary matrix factorization

Kollaborative Filterung auf der Grundlage nichtnegativer/binärer Matrixfaktorisierung

基于非负负/二进制矩阵因子化的合作过滤 2410.10381v4

Authors (5): Yukino Terui, Yuka Inoue, Yohei Hamakawa, Kosuke Tatsumura, Kazue Kudo

Collaborative filtering generates recommendations by exploiting user-item similarities based on rating data, which often contains numerous unrated items. To predict scores for unrated items, matrix factorization techniques such as nonnegative matrix factorization (NMF) are often employed. Nonnegative/binary matrix factorization (NBMF), which is an extension of NMF, approximates a nonnegative matrix as the product of nonnegative and binary matrices. While previous studies have applied NBMF primarily to dense data such as images, this paper proposes a modified NBMF algorithm tailored for collaborative filtering with sparse data. In the modified method, unrated entries in the rating matrix are masked, enhancing prediction accuracy. Furthermore, utilizing a low-latency Ising machine in NBMF is advantageous in terms of the computation time, making the proposed method beneficial.
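The masking idea can be sketched as follows, assuming a rating matrix V, a 0/1 mask M marking rated entries, a nonnegative factor W, and a binary factor H. The exhaustive search over binary columns is only a stand-in for the Ising-machine step; all names and the toy data are hypothetical.

```python
from itertools import product


def masked_sq_error(V, M, W, H):
    """Squared reconstruction error summed over rated entries only (M[i][j] == 1)."""
    k = len(H)
    total = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            if M[i][j]:
                pred = sum(W[i][r] * H[r][j] for r in range(k))
                total += (v - pred) ** 2
    return total


def best_binary_column(V, M, W, j):
    """Pick the binary column for item j that minimises the masked error.

    Brute force over all 2**k candidates; only viable for tiny k.
    """
    k = len(W[0])
    best, best_err = None, float("inf")
    for h in product((0, 1), repeat=k):
        err = 0.0
        for i in range(len(V)):
            if M[i][j]:
                pred = sum(W[i][r] * h[r] for r in range(k))
                err += (V[i][j] - pred) ** 2
        if err < best_err:
            best, best_err = list(h), err
    return best


# Toy data: zeros in V are unrated and are hidden from the loss by M.
V = [[5, 0, 3], [4, 0, 0], [0, 2, 1]]
M = [[1, 0, 1], [1, 0, 0], [0, 1, 1]]
W = [[3.0, 2.0], [4.0, 0.0], [1.0, 2.0]]

columns = [best_binary_column(V, M, W, j) for j in range(3)]
H = [list(row) for row in zip(*columns)]  # transpose columns into a k x n matrix
```

Because unrated entries never enter the loss, the zeros in V are not treated as genuine low ratings; the product of W and H then supplies predictions for them.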

Article 436

Title@2025-07-29 (2): Categorical Distributions are Effective Neural Network Outputs for Event Prediction

Title: Categorical Distributions are Effective Neural Network Outputs for Event Prediction

Kategorische Verteilungen sind effektive neurale Netzwerk-Ausgaben für Event-Vorhersage

分类分布是事件预测的有效神经网络产出 2507.21616v1

Authors (2): Kevin Doran, Tom Baden

We demonstrate the effectiveness of using a simple neural network output, a categorical probability distribution, for the task of next spike prediction. This case study motivates an investigation into why this simple output structure is not commonly used with neural temporal point process models. We find evidence that many existing datasets for evaluating temporal point process models do not reveal much information about the underlying event generating processes, and many existing models perform well due to regularization effects of model size and constraints on output structure. We extend existing datasets and create new ones in order to explore outside of this information limited regime and find that outputting a simple categorical distribution is competitive across a wide range of datasets.
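As a rough illustration of a categorical output for next-event prediction, the sketch below discretises the time to the next event into bins and scores a prediction with the negative log-likelihood of the observed bin. The bin edges and logits are hypothetical; the paper's discretisation may differ.

```python
import math
from bisect import bisect_right


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interval_to_bin(dt, left_edges):
    """Index of the bin whose left edge is the largest one <= dt.

    The last bin is open-ended; dt is assumed to be non-negative.
    """
    return bisect_right(left_edges, dt) - 1


def categorical_nll(logits, dt, left_edges):
    """Negative log-likelihood of the observed interval under the predicted bins."""
    probs = softmax(logits)
    return -math.log(probs[interval_to_bin(dt, left_edges)])


# Hypothetical bins (seconds): [0, 0.01), [0.01, 0.02), [0.02, 0.05), [0.05, inf)
left_edges = [0.0, 0.01, 0.02, 0.05]
uniform_loss = categorical_nll([0.0, 0.0, 0.0, 0.0], dt=0.03, left_edges=left_edges)
```

With uniform logits the loss equals log(4), the entropy of a four-way uniform distribution, which gives a simple reference point for a trained model.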

Article 437

Title@2025-07-29 (2): Multi-branch of Attention Yields Accurate Results for Tabular Data

Title: Multi-branch of Attention Yields Accurate Results for Tabular Data

Multi-Zweige der Aufmerksamkeit Erträge genaue Ergebnisse für Tabellendaten

多部门关注表格数据的准确结果 2502.12507v2

Authors (5): Xuechen Li, Yupeng Li, Jian Liu, Xiaolin Jin, Xin Hu

Tabular data inherently exhibits significant feature heterogeneity, but existing transformer-based methods lack specialized mechanisms to handle this property. To bridge the gap, we propose MAYA, an encoder-decoder transformer-based framework. In the encoder, we design a Multi-Branch of Attention (MBA) that constructs multiple parallel attention branches and averages the features at each branch, effectively fusing heterogeneous features while limiting parameter growth. Additionally, we employ collaborative learning with a dynamic consistency weight constraint to produce more robust representations. In the decoder stage, cross-attention is utilized to seamlessly integrate tabular data with corresponding label features. This dual-attention mechanism effectively captures both intra-instance and inter-instance interactions. We evaluate the proposed method on a wide range of datasets and compare it with other state-of-the-art transformer-based methods. Extensive experiments demonstrate that our model achieves superior performance among transformer-based methods in both tabular classification and regression tasks.
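The branch-averaging idea can be sketched in a few lines, assuming each branch is a plain single-head self-attention with its own projection matrices and that branch outputs are averaged element-wise. The shapes, the random weights, and the helper names are illustrative and not the MAYA implementation.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax_rows(S):
    out = []
    for row in S:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out


def attention_branch(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over feature tokens X."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(Q[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in K] for qr in Q]
    return matmul(softmax_rows(scores), V)


def multi_branch_attention(X, branches):
    """Run every branch in parallel and average the outputs element-wise."""
    outs = [attention_branch(X, *weights) for weights in branches]
    n = len(outs)
    rows, cols = len(outs[0]), len(outs[0][0])
    return [[sum(o[i][j] for o in outs) / n for j in range(cols)] for i in range(rows)]


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


X = rand_matrix(3, 4)  # 3 feature tokens, embedding width 4
branches = [tuple(rand_matrix(4, 4) for _ in range(3)) for _ in range(2)]
fused = multi_branch_attention(X, branches)
```

Averaging keeps the output width fixed no matter how many branches are used, which is one way to read the abstract's remark about limiting parameter growth relative to concatenation.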

Article 438

Title: Demystifying Misconceptions in Social Bots Research

Entmystifizierende Missverständnisse in der Social Bots Forschung

社会生物群研究中解密错误观念 2303.17251v4

Authors (6): Stefano Cresci, Kai-Cheng Yang, Angelo Spognardi, Roberto Di Pietro, Filippo Menczer, Marinella Petrocchi

Research on social bots aims at advancing knowledge and providing solutions to one of the most debated forms of online manipulation. Yet, social bot research is plagued by widespread biases, hyped results, and misconceptions that set the stage for ambiguities, unrealistic expectations, and seemingly irreconcilable findings. Overcoming such issues is instrumental towards ensuring reliable solutions and reaffirming the validity of the scientific method. Here, we discuss a broad set of consequential methodological and conceptual issues that affect current social bots research, illustrating each with examples drawn from recent studies. More importantly, we demystify common misconceptions, addressing fundamental points on how social bots research is discussed. Our analysis surfaces the need to discuss research about online disinformation and manipulation in a rigorous, unbiased, and responsible way. This article bolsters such effort by identifying and refuting common fallacious arguments used by both proponents and opponents of social bots research, as well as providing directions toward sound methodologies for future research.

Article 439

Title@2025-07-29 (2): A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science

Title: A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science

Eine Umfrage über speichereffiziente Transformer-basierte Modellausbildung in KI für die Wissenschaft

关于AIST科学领域基于记忆-有效变压器的模型培训的调查 2501.11847v2

Authors (6): Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Shanshan Li, Dongsheng Li

Scientific research faces high costs and inefficiencies with traditional methods, but the rise of deep learning and large language models (LLMs) offers innovative solutions. This survey reviews transformer-based LLM applications across scientific fields such as biology, medicine, chemistry, and meteorology, underscoring their role in advancing research. However, the continuous expansion of model size has led to significant memory demands, hindering further development and application of LLMs for science. This survey systematically reviews and categorizes memory-efficient pre-training techniques for large-scale transformers, including algorithm-level, system-level, and hardware-software co-optimization. Using AlphaFold 2 as an example, we demonstrate how tailored memory optimization methods can reduce storage needs while preserving prediction accuracy. By bridging model efficiency and scientific application needs, we hope to provide insights for scalable and cost-effective LLM training in AI for science.

Article 440

Title@2025-07-29 (2): Principled Curriculum Learning using Parameter Continuation Methods

Title: Principled Curriculum Learning using Parameter Continuation Methods

Prinzipielles Curriculum Lernen mit Parameter-Weiterführungsmethoden

使用参数持续方法进行有原则的课程学习 2507.22089v1

Authors (2): Harsh Nilesh Pathak, Randy Paffenroth

In this work, we propose a parameter continuation method for the optimization of neural networks. There is a close connection between parameter continuation, homotopies, and curriculum learning. The methods we propose here are theoretically justified and practically effective for several problems in deep neural networks. In particular, we demonstrate better generalization performance than state-of-the-art optimization techniques such as ADAM for supervised and unsupervised learning tasks.
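To make the link between continuation and curricula concrete, here is a generic natural-parameter continuation sketch on a one-dimensional toy problem. It is not the authors' method: the convex surrogate, the non-convex target, and the schedule are all illustrative assumptions.

```python
def grad_target(x):
    # Derivative of the non-convex target f(x) = x**4 - 3*x**2 + x.
    return 4 * x**3 - 6 * x + 1


def grad_easy(x):
    # Derivative of the convex surrogate g(x) = x**2.
    return 2 * x


def continuation_minimize(x0=0.0, stages=11, steps=200, lr=0.01):
    """Minimise (1 - lam) * g + lam * f while lam moves from 0 to 1.

    Each stage is warm-started from the previous stage's solution, so the
    iterate follows the minimiser as the problem deforms from easy to hard.
    """
    x = x0
    for s in range(stages):
        lam = s / (stages - 1)
        for _ in range(steps):
            x -= lr * ((1 - lam) * grad_easy(x) + lam * grad_target(x))
    return x


x_star = continuation_minimize()
```

In this toy case the iterate ends near x = -1.30, the lower of the target's two local minima. The lambda schedule plays the role of a curriculum that moves from an easy problem to the hard one.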

Article 441

Title@2025-07-29 (2): A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models

Title: A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models

Eine detaillierte Faktorenanalyse für den politischen Kompasstest: Navigieren von Ideologien großer Sprachmodelle

《政治指南测试的详细要素分析:掌握大语言模式的特征》 2506.22493v2

Authors (7): Sadia Kamal, Lalu Prasad Yadav Prakash, S M Rafiuddin, Mohammed Rakib, Arunkumar Bagavathi, Atriya Sen, Sagnik Ray Choudhury

The Political Compass Test (PCT) and similar questionnaires have been used to quantify LLMs’ political leanings. Building on a recent line of work that examines the validity of PCT tests, we demonstrate that variation in standard generation parameters does not significantly impact the models’ PCT scores. However, external factors such as prompt variations and fine-tuning, individually and in combination, do affect these scores. Finally, we demonstrate that when models are fine-tuned on text datasets with higher political content than others, the PCT scores are not differentially affected. This calls for a thorough investigation into the validity of PCT and similar tests, as well as the mechanism by which political leanings are encoded in LLMs.

Article 442

Title@2025-07-29 (2): Machine Learning Risk Intelligence for Green Hydrogen Investment: Insights for Duqm R3 Auction

Title: Machine Learning Risk Intelligence for Green Hydrogen Investment: Insights for Duqm R3 Auction

Machine Learning Risk Intelligence für Green Hydrogen Investment: Einblicke für Duqm R3 Auktion

绿色氢投资的机器学习风险情报:Duqm R3拍卖的透视 2507.19529v2

Authors (2): Obumneme Nwafor, Mohammed Abdul Majeed Al Hooti

As green hydrogen emerges as a major component of global decarbonisation, Oman has positioned itself strategically through national auctions and international partnerships. Following two successful green hydrogen project rounds, the country launched its third auction (R3) in the Duqm region. While this area exhibits relative geospatial homogeneity, it is still vulnerable to environmental fluctuations that pose inherent risks to productivity. Despite growing global investment in green hydrogen, operational data remains scarce, with major projects like Saudi Arabia’s NEOM facility not expected to commence production until 2026, and Oman’s ACME Duqm project scheduled for 2028. This absence of historical maintenance and performance data from large-scale hydrogen facilities in desert environments creates a major knowledge gap for accurate risk assessment in infrastructure planning and auction decisions. Given this data void, environmental conditions emerge as an accessible and reliable proxy for predicting infrastructure maintenance pressures, because harsh desert conditions such as dust storms, extreme temperatures, and humidity fluctuations are well-documented drivers of equipment degradation in renewable energy systems. To address this challenge, this paper proposes an Artificial Intelligence decision support system that leverages publicly available meteorological data to develop a predictive Maintenance Pressure Index (MPI), which predicts risk levels and future maintenance demands on hydrogen infrastructure. This tool strengthens regulatory foresight and operational decision-making by enabling temporal benchmarking to assess and validate performance claims over time. It can be used to incorporate temporal risk intelligence into auction evaluation criteria despite the absence of historical operational benchmarks.

Article 443

Title@2025-07-29 (2): Meta-Designing Quantum Experiments with Language Models

Title: Meta-Designing Quantum Experiments with Language Models

Meta-Designing Quantenexperimente mit Sprachmodellen

配有语言模型的元设计量子实验 2406.02470v2

Authors (6): Sören Arlt, Haonan Duan, Felix Li, Sang Michael Xie, Yuhuai Wu, Mario Krenn

Artificial Intelligence (AI) can solve complex scientific problems beyond human capabilities, but the resulting solutions offer little insight into the underlying physical principles. One prominent example is quantum physics, where computers can discover experiments for the generation of specific quantum states, but it is unclear how finding general design concepts can be automated. Here, we address this challenge by training a transformer-based language model to create human-readable Python code, which solves an entire class of problems in a single pass. This strategy, which we call meta-design, enables scientists to gain a deeper understanding and extrapolate to larger experiments without additional optimization. To demonstrate the effectiveness of our approach, we uncover previously unknown experimental generalizations of important quantum states, e.g. from condensed matter physics. The underlying methodology of meta-design can naturally be extended to fields such as materials science or engineering.

Article 444

Title@2025-07-29 (2): “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents

Title: “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents

“So, erzählen Sie mir über Ihre Politik…”: Destillation von interpretierbaren Richtlinien von Deep Reinforcement Learning Agents

“告诉我你们的政策……:从深强化学习机构那里提炼可解释的政策”。 2507.07848v2

Authors (3): Giovanni Dispoto, Paolo Bonetti, Marcello Restelli

Recent advances in Reinforcement Learning (RL) largely benefit from the inclusion of Deep Neural Networks, boosting the number of novel approaches proposed in the field of Deep Reinforcement Learning (DRL). These techniques demonstrate the ability to tackle complex games such as Atari, Go, and other real-world applications, including financial trading. Nevertheless, a significant challenge emerges from the lack of interpretability, particularly when attempting to comprehend the underlying patterns learned, the relative importance of the state features, and how they are integrated to generate the policy’s output. For this reason, in mission-critical and real-world settings, it is often preferred to deploy a simpler and more interpretable algorithm, although at the cost of performance. In this paper, we propose a novel algorithm, supported by theoretical guarantees, that can extract an interpretable policy (e.g., a linear policy) without disregarding the peculiarities of expert behavior. This result is obtained by considering the advantage function, which includes information about why an action is superior to the others. In contrast to previous works, our approach enables the training of an interpretable policy using previously collected experience. The proposed algorithm is empirically evaluated on classic control environments and on a financial trading scenario, demonstrating its ability to extract meaningful information from complex expert policies.

Article 445

Title@2025-07-29 (2): An em algorithm for quantum Boltzmann machines

Title: An em algorithm for quantum Boltzmann machines

Ein Em-Algorithmus für Quantenboltzmann-Maschinen

Boltzmann 量子机器的 Em 算法 2507.21569v1

Authors (3): Takeshi Kimura, Kohtaro Kato, Masahito Hayashi

We develop a quantum version of the em algorithm for training quantum Boltzmann machines. The em algorithm is an information-geometric extension of the well-known expectation-maximization (EM) algorithm, offering a structured alternative to gradient-based methods with potential advantages in stability and convergence. We implement the algorithm on a semi-quantum restricted Boltzmann machine, where quantum effects are confined to the hidden layer. This structure enables analytical update rules while preserving quantum expressivity. Numerical experiments on benchmark datasets show that the proposed method achieves stable learning and outperforms gradient-based training in several cases. These results demonstrate the potential of information-geometric optimization for quantum machine learning, particularly in settings where standard methods struggle due to non-commutativity or vanishing gradients.

Article 446

Title@2025-07-29 (2): Generating Heterogeneous Multi-dimensional Data : A Comparative Study

Title: Generating Heterogeneous Multi-dimensional Data : A Comparative Study

Heterogene mehrdimensionale Daten generieren: Eine vergleichende Studie

生成异质多维数据:比较研究 2507.00090v3

Authors (4): Michael Corbeau, Emmanuelle Claeys, Mathieu Serrurier, Pascale Zaraté

Allocation of personnel and material resources is highly sensitive in the case of firefighter interventions. This allocation relies on simulations to experiment with various scenarios. The main objective of this allocation is the global optimization of the firefighters' response. Data generation is therefore mandatory to study various scenarios. In this study, we propose to compare different data generation methods. Methods such as Random Sampling, Tabular Variational Autoencoders, standard Generative Adversarial Networks, Conditional Tabular Generative Adversarial Networks and Diffusion Probabilistic Models are examined to ascertain their efficacy in capturing the intricacies of firefighter interventions. Traditional evaluation metrics often fall short in capturing the nuanced requirements of synthetic datasets for real-world scenarios. To address this gap, an evaluation of synthetic data quality is conducted using a combination of domain-specific metrics tailored to the firefighting domain and standard measures such as the Wasserstein distance. Domain-specific metrics include response time distribution, spatial-temporal distribution of interventions, and accident representation. These metrics are designed to assess data variability, the preservation of fine and complex correlations and anomalies such as events with a very low occurrence, the conformity with the initial statistical distribution, and the operational relevance of the synthetic data. The distribution has the particularity of being highly unbalanced, with none of the variables following a Gaussian distribution, which adds complexity to the data generation process.
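For the standard measure mentioned above, a minimal sketch of the one-dimensional Wasserstein-1 distance between a real and a synthetic sample is given below. It assumes equal sample sizes, in which case the distance reduces to the mean absolute difference of the sorted values; the response-time numbers are made up for illustration.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size empirical samples.

    With equal sizes, the optimal transport plan matches sorted values
    pairwise, so the distance is the mean absolute gap between them.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical response times in minutes: real data vs. a generator's output.
real_times = [4.0, 6.5, 7.0, 12.0]
synthetic_times = [5.0, 6.0, 8.0, 11.0]
gap = wasserstein_1d(real_times, synthetic_times)
```

A value near zero means the synthetic marginal closely tracks the real one. It says nothing about correlations between variables, which is why the abstract pairs it with domain-specific metrics.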

Article 447

Title@2025-07-29 (2): Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation

Title: Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation

Verbesserung der Graphen-basierten Empfehlungen mit Mehrheitsvoting LLM-Rerank Augmentation

采用多数表决的LLM-重新升级增强图表为基础的建议 2507.21563v1

Authors (6): Minh-Anh Nguyen, Bao Nguyen, Ha Lan N. T., Tuan Anh Hoang, Duc-Trong Le, Dung D. Le

Recommendation systems often suffer from data sparsity caused by limited user-item interactions, which degrade their performance and amplify popularity bias in real-world scenarios. This paper proposes a novel data augmentation framework that leverages Large Language Models (LLMs) and item textual descriptions to enrich interaction data. By few-shot prompting LLMs multiple times to rerank items and aggregating the results via majority voting, we generate high-confidence synthetic user-item interactions, supported by theoretical guarantees based on the concentration of measure. To effectively leverage the augmented data in the context of a graph recommendation system, we integrate it into a graph contrastive learning framework to mitigate distributional shift and alleviate popularity bias. Extensive experiments show that our method improves accuracy and reduces popularity bias, outperforming strong baselines.
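The aggregation step can be sketched as follows, assuming each LLM call returns a ranked list of item identifiers and that an item counts as a synthetic interaction only if it appears in the top-k of a strict majority of runs. The function name, the threshold rule, and the toy rankings are hypothetical.

```python
from collections import Counter


def majority_vote_top_k(rankings, k, min_votes=None):
    """Keep items that appear in the top-k of at least min_votes reranked lists."""
    if min_votes is None:
        min_votes = len(rankings) // 2 + 1  # strict majority by default
    votes = Counter()
    for ranking in rankings:
        votes.update(set(ranking[:k]))  # one vote per run, even if an item repeats
    return sorted(item for item, count in votes.items() if count >= min_votes)


# Three hypothetical rerank outputs for the same user and candidate set.
runs = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
confident_items = majority_vote_top_k(runs, k=2)
```

Repeating the prompt and voting filters out items that only one noisy run ranked highly. This is the intuition behind the concentration-based guarantee the abstract mentions.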

Article 448

Title@2025-07-29 (2): PEVLM: Parallel Encoding for Vision-Language Models

Title: PEVLM: Parallel Encoding for Vision-Language Models

PEVLM: Parallele Kodierung für Vision-Language-Modelle

PEVLM: 视觉语言模型平行编码 2506.19651v3

Authors (8): Letian Kang, Shixian Luo, Yiqiang Li, Yuxin Yin, Shenxuan Zhou, Xiaoyang Yu, Jin Yang, Yong Wu

Vision-Language Models (VLMs) have demonstrated strong capabilities in multimodal understanding and generation tasks. However, their application to long video understanding remains hindered by the quadratic complexity of standard attention mechanisms. In this work, we introduce PEVLM, a fine-tuning-free parallel encoding method designed to enhance the prefilling efficiency of VLMs in long video scenarios. PEVLM partitions the input video into context blocks with a shared sink block, while preserving sequential position embeddings to align the attention weight distribution with that of Full-Attention. This design reduces attention complexity from $O((T \times N)^2)$ to $O(T \times N)$, where $T$ is the number of frames and $N$ the number of tokens per frame, without sacrificing accuracy. Extensive experiments across multiple state-of-the-art models and benchmarks demonstrate that PEVLM consistently outperforms existing parallel encoding approaches, achieving up to 7.47x speedup in attention computation and reducing end-to-end latency by 40%. Remarkably, PEVLM not only maintains high accuracy, but in some settings even surpasses Full-Attention performance. Under strict latency constraints, it achieves substantial gains, improving accuracy from 23.26% to 61.03%. These results underscore the effectiveness of PEVLM for low-latency, long-context video understanding, making it a promising solution for real-world applications.
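The complexity claim can be checked with simple counting, assuming fixed-size context blocks that each attend to their own tokens plus a shared sink block. The block size and sink size below are hypothetical, and the paper's exact partitioning may differ.

```python
def full_attention_pairs(T, N):
    """Query-key pairs when every token attends to every token."""
    return (T * N) ** 2


def blockwise_attention_pairs(T, N, frames_per_block, sink_tokens):
    """Query-key pairs when each block attends only to itself and the sink.

    Assumes T is a multiple of frames_per_block.
    """
    if T % frames_per_block:
        raise ValueError("T must be a multiple of frames_per_block in this sketch")
    blocks = T // frames_per_block
    queries = frames_per_block * N
    keys = queries + sink_tokens
    return blocks * queries * keys


# Doubling the number of frames quadruples the full cost but only doubles
# the block-wise cost, since the per-block cost is a constant.
full_small, full_large = full_attention_pairs(8, 4), full_attention_pairs(16, 4)
block_small = blockwise_attention_pairs(8, 4, frames_per_block=2, sink_tokens=4)
block_large = blockwise_attention_pairs(16, 4, frames_per_block=2, sink_tokens=4)
```

With the block and sink sizes held fixed, the block-wise count grows linearly in T × N, which matches the stated reduction from $O((T \times N)^2)$ to $O(T \times N)$.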

Article 449

Title@2025-07-29 (2): C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Title: C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

C2-Evo: Co-Evolving multimodale Daten und Modell zur Selbstverbesserung

C2-Evo:共同演进的多模式数据和自我改进理由模型 2507.16518v2

Authors (12): Xiuwei Chen, Wentao Hu, Hanhui Li, Jun Zhou, Zisheng Chen, Meng Cao, Yihan Zeng, Kui Zhang, Yu-Jie Yuan, Jianhua Han, Hang Xu, Xiaodan Liang

Recent advances in multimodal large language models (MLLMs) have shown impressive reasoning capabilities. However, further enhancing existing MLLMs necessitates high-quality vision-language datasets with carefully curated task complexities, which are both costly and challenging to scale. Although recent self-improving models that iteratively refine themselves offer a feasible solution, they still suffer from two core challenges: (i) most existing methods augment visual or textual data separately, resulting in discrepancies in data complexity (e.g., over-simplified diagrams paired with redundant textual descriptions); and (ii) the evolution of data and models is also separated, leading to scenarios where models are exposed to tasks with mismatched difficulty levels. To address these issues, we propose C2-Evo, an automatic, closed-loop self-improving framework that jointly evolves both training data and model capabilities. Specifically, given a base dataset and a base model, C2-Evo enhances them by a cross-modal data evolution loop and a data-model evolution loop. The former loop expands the base dataset by generating complex multimodal problems that combine structured textual sub-problems with iteratively specified geometric diagrams, while the latter loop adaptively selects the generated problems based on the performance of the base model, to conduct supervised fine-tuning and reinforcement learning alternately. Consequently, our method continuously refines its model and training data, and consistently obtains considerable performance gains across multiple mathematical reasoning benchmarks. Our code, models, and datasets will be released.

Article 450

Title@2025-07-29 (2): Fine-Grained Perturbation Guidance via Attention Head Selection

Title: Fine-Grained Perturbation Guidance via Attention Head Selection

Feinkörnige Störungsführung über Aufmerksamkeitskopfauswahl

通过 “ 关注负责人甄选 “ 指导 2506.10978v3

Authors (10): Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Minjae Kim, Jaewon Min, Wooseok Jang, Sangwu Lee, Sayak Paul, Susung Hong, Seungryong Kim

Recent guidance methods in diffusion models steer reverse sampling by perturbing the model to construct an implicit weak model and guide generation away from it. Among these approaches, attention perturbation has demonstrated strong empirical performance in unconditional scenarios where classifier-free guidance is not applicable. However, existing attention perturbation methods lack principled approaches for determining where perturbations should be applied, particularly in Diffusion Transformer (DiT) architectures where quality-relevant computations are distributed across layers. In this paper, we investigate the granularity of attention perturbations, ranging from the layer level down to individual attention heads, and discover that specific heads govern distinct visual concepts such as structure, style, and texture quality. Building on this insight, we propose “HeadHunter”, a systematic framework for iteratively selecting attention heads that align with user-centric objectives, enabling fine-grained control over generation quality and visual attributes. In addition, we introduce SoftPAG, which linearly interpolates each selected head’s attention map toward an identity matrix, providing a continuous knob to tune perturbation strength and suppress artifacts. Our approach not only mitigates the oversmoothing issues of existing layer-level perturbation but also enables targeted manipulation of specific visual styles through compositional head selection. We validate our method on modern large-scale DiT-based text-to-image models including Stable Diffusion 3 and FLUX.1, demonstrating superior performance in both general quality enhancement and style-specific guidance. Our work provides the first head-level analysis of attention perturbation in diffusion models, uncovering interpretable specialization within attention layers and enabling practical design of effective perturbation strategies.
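The SoftPAG interpolation described above can be written in one line per entry, assuming a square, row-stochastic attention map and a scalar strength u in [0, 1]. The function and parameter names are illustrative.

```python
def interpolate_toward_identity(attn, u):
    """Blend an attention map with the identity matrix.

    u = 0 leaves the map unchanged; u = 1 makes each token attend
    only to itself, which is the fully perturbed case.
    """
    n = len(attn)
    return [
        [(1.0 - u) * attn[i][j] + (u if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


attn = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.25, 0.25, 0.5],
]
half_perturbed = interpolate_toward_identity(attn, 0.5)
```

Because both the attention map and the identity have rows summing to one, every interpolated row still sums to one. The perturbed map therefore remains a valid attention distribution at any strength.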

Article 451

Title@2025-07-29 (2): On Policy Stochasticity in Mutual Information Optimal Control of Linear Systems

Title: On Policy Stochasticity in Mutual Information Optimal Control of Linear Systems

Über Politik-Stochastik in gegenseitiger Information Optimale Kontrolle von Linearsystemen

关于相互信息中政策现状的相互信息最佳控制线性系统 2507.21543v1

Authors (2): Shoju Enami, Kenji Kashima

In recent years, mutual information optimal control has been proposed as an extension of maximum entropy optimal control. Both approaches introduce regularization terms to render the policy stochastic, and it is important to theoretically clarify the relationship between the temperature parameter (i.e., the coefficient of the regularization term) and the stochasticity of the policy. Unlike in maximum entropy optimal control, this relationship remains unexplored in mutual information optimal control. In this paper, we investigate this relationship for a mutual information optimal control problem (MIOCP) of discrete-time linear systems. After extending the result of a previous study of the MIOCP, we establish the existence of an optimal policy of the MIOCP, and then derive the respective conditions on the temperature parameter under which the optimal policy becomes stochastic and deterministic. Furthermore, we also derive the respective conditions on the temperature parameter under which the policy obtained by an alternating optimization algorithm becomes stochastic and deterministic. The validity of the theoretical results is demonstrated through numerical experiments.

Article 452

Title@2025-07-29 (2): AI-ming backwards: Vanishing archaeological landscapes in Mesopotamia and automatic detection of sites on CORONA imagery

Title: AI-ming backwards: Vanishing archaeological landscapes in Mesopotamia and automatic detection of sites on CORONA imagery

KI-Ming rückwärts: Auslöschende archäologische Landschaften in Mesopotamien und automatische Erkennung von Stätten auf CORONA-Bildern

AI-Ming倒向:美索不达米亚消失的考古景观和自动探测CORONA图像上的遗址 2507.13420v2

Authors (4): Alessandro Pistola, Valentina Orrù, Nicolò Marchetti, Marco Roccetti

By upgrading an existing deep learning model with the knowledge provided by one of the oldest sets of grayscale satellite imagery, known as CORONA, we improved the AI model's aptitude for the automatic identification of archaeological sites in an environment which has been completely transformed in the last five decades, including the complete destruction of many of those same sites. The initial Bing-based convolutional network model was retrained using CORONA satellite imagery for the district of Abu Ghraib, west of Baghdad, in the central Mesopotamian floodplain. The results were twofold and surprising. First, the detection precision obtained on the area of interest increased appreciably: in particular, the Intersection over Union (IoU) values, at the image segmentation level, surpassed 85 percent, while the general accuracy in detecting archaeological sites reached 90 percent. Second, our retrained model allowed the identification of four new sites of archaeological interest (confirmed through field verification), previously not identified by archaeologists with traditional techniques. This has confirmed the efficacy of using AI techniques and the CORONA imagery from the 1960s to discover archaeological sites currently no longer visible, a concrete breakthrough with significant consequences for the study of landscapes with vanishing archaeological evidence induced by anthropization.
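For readers unfamiliar with the segmentation metric quoted above, a minimal sketch of Intersection over Union on binary masks follows. The masks are toy data, and returning 1.0 when both masks are empty is a convention chosen for this sketch.

```python
def iou(pred, truth):
    """Intersection over Union of two equally shaped binary masks."""
    intersection = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; treat that case as IoU = 1.
    return intersection / union if union else 1.0


predicted_mask = [[1, 1, 0], [0, 1, 0]]
reference_mask = [[1, 0, 0], [0, 1, 1]]
score = iou(predicted_mask, reference_mask)
```

Here two pixels are shared and four are covered by at least one mask, so the score is 0.5; the abstract reports values above 0.85 at the segmentation level.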

Article 453

Title@2025-07-29 (2): Automatic Classification of User Requirements from Online Feedback – A Replication Study

Title: Automatic Classification of User Requirements from Online Feedback – A Replication Study

Automatische Klassifizierung der Benutzeranforderungen aus Online-Feedback – Eine Replikationsstudie

在线反馈用户要求自动分类 – – 复制研究 2507.21532v1

Authors (7): Meet Bhatt, Nic Boilard, Muhammad Rehan Chaudhary, Cole Thompson, Jacob Idoko, Aakash Sorathiya, Gouri Ginde

Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Although RE research is rooted in empirical investigation, it has paid limited attention to replicating NLP for RE (NLP4RE) studies. The rapidly advancing realm of NLP is creating new opportunities for efficient, machine-assisted workflows, which can bring new perspectives and results to the forefront. Thus, we replicate and extend a previous NLP4RE study (baseline), “Classifying User Requirements from Online Feedback in Small Dataset Environments using Deep Learning”, which evaluated different deep learning models for requirement classification from user reviews. We reproduced the original results using publicly released source code, thereby helping to strengthen the external validity of the baseline study. We then extended the setup by evaluating model performance on an external dataset and comparing results to a GPT-4o zero-shot classifier. Furthermore, we prepared the replication study ID-card for the baseline study, which is important for evaluating replication readiness. Results showed diverse reproducibility levels across different models, with Naive Bayes demonstrating perfect reproducibility. In contrast, BERT and other models showed mixed results. Our findings revealed that the baseline deep learning models, BERT and ELMo, exhibited good generalization capabilities on an external dataset, and GPT-4o showed performance comparable to traditional baseline machine learning models. Additionally, our assessment confirmed the baseline study’s replication readiness; however, including the missing environment setup files would have further enhanced readiness. We include this missing information in our replication package and provide the replication study ID-card for our study to further encourage and support the replication of our study.

Article 454

Title@2025-07-29 (2): Hierarchical Stochastic Differential Equation Models for Latent Manifold Learning in Neural Time Series

Title: Hierarchical Stochastic Differential Equation Models for Latent Manifold Learning in Neural Time Series

Hierarchische stochastische Differentialgleichungsmodelle für latentes Manifold Learning in der Neural Time Series

神经时间序列中前部蒙花层学习的等级学历史理学分等模型 2507.21531v1

Authors (5): Pedram Rajaei, Maryam Ostadsharif Memar, Navid Ziaei, Behzad Nazari, Ali Yousefi

The manifold hypothesis suggests that high-dimensional neural time series lie on a low-dimensional manifold shaped by simpler underlying dynamics. To uncover this structure, latent dynamical variable models such as state-space models, recurrent neural networks, neural ordinary differential equations, and Gaussian Process Latent Variable Models are widely used. We propose a novel hierarchical stochastic differential equation (SDE) model that balances computational efficiency and interpretability, addressing key limitations of existing methods. Our model assumes the trajectory of a manifold can be reconstructed from a sparse set of samples from the manifold trajectory. The latent space is modeled using Brownian bridge SDEs, with points, specified in both time and value, sampled from a multivariate marked point process. These Brownian bridges define the drift of a second set of SDEs, which are then mapped to the observed data. This yields a continuous, differentiable latent process capable of modeling arbitrarily complex time series as the number of manifold points increases. We derive training and inference procedures and show that the computational cost of inference scales linearly with the length of the observation data. We then validate our model on both synthetic data and neural recordings to demonstrate that it accurately recovers the underlying manifold structure and scales effectively with data dimensionality.
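A single Brownian bridge between two anchor points can be simulated with an Euler-Maruyama discretisation of dX = (x1 - X) / (t1 - t) dt + sigma dW. This is a generic sketch of that building block only, not the hierarchical model; the step count, sigma, and the seeded generator are assumptions.

```python
import math
import random


def brownian_bridge(t0, x0, t1, x1, steps, sigma=1.0, rng=None):
    """Simulate a path that starts at (t0, x0) and is pinned to (t1, x1).

    The drift pulls the state toward x1 with a strength that grows as t
    approaches t1; the final point is set to x1 explicitly.
    """
    rng = rng or random.Random(0)
    dt = (t1 - t0) / steps
    x = x0
    path = [x0]
    for i in range(steps - 1):
        t = t0 + i * dt
        drift = (x1 - x) / (t1 - t)
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    path.append(x1)
    return path


noisy_path = brownian_bridge(0.0, 0.0, 1.0, 2.0, steps=50, sigma=0.3)
straight_path = brownian_bridge(0.0, 0.0, 1.0, 1.0, steps=4, sigma=0.0)
```

With sigma = 0 the bridge reduces to a straight line between the anchors. Chaining bridges between consecutive anchor points would give a latent trajectory that passes through every sampled point, in the spirit of the construction in the abstract.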

Article 455

Title@2025-07-29 (2): Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis

Title: Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis

Auf dem Weg zu einer erleichterten Fairnessbewertung von KI-basierten Haut-Lesions-Klassifikatoren durch GenAI-basierte Bildsynthese

通过GenAI基于GenAI的图像合成,促进基于AI的皮肤皮质分类分类的公平评估 2507.17860v2

Authors (4): Ko Watanabe, Stanislav Frolov, Adriano Lucieri, Andreas Dengel

Recent advancements in Deep Learning and its application on the edge hold great potential for revolutionizing routine screenings for skin cancers like Melanoma. Along with the anticipated benefits of this technology, potential dangers arise from unforeseen and inherent biases. Thus, assessing and improving the fairness of such systems is of utmost importance. A key challenge in fairness assessment is to ensure that the evaluation dataset is sufficiently representative of different Personally Identifiable Information (PII) attributes (sex, age, and race) and other minority groups. Against the backdrop of this challenge, this study leverages the state-of-the-art Generative AI (GenAI) LightningDiT model to assess the fairness of publicly available melanoma classifiers. The results suggest that fairness assessment using highly realistic synthetic data is a promising direction. Yet, our findings indicate that verifying fairness becomes difficult when the melanoma-detection model used for evaluation is trained on data that differ from the dataset underpinning the synthetic images. Nonetheless, we propose that our approach offers a valuable new avenue for employing synthetic data to gauge and enhance fairness in medical-imaging GenAI systems.

Article 456

Title@2025-07-29 (2): A Scalable and High Availability Solution for Recommending Resolutions to Problem Tickets

Title: A Scalable and High Availability Solution for Recommending Resolutions to Problem Tickets

Eine skalierbare und hochverfügbare Lösung für die Empfehlung von Auflösungen an Problemlösungen

向问题罚单建议解决方案的可扩展和高可用性解决方案 2507.19846v2

Authors (3): Harish Saragadam, Chetana K Nayak, Joy Bose

Resolution of incidents or problem tickets is a common theme in service industries in any sector, including billing and charging systems in the telecom domain. Machine learning can help to identify patterns and suggest resolutions for the problem tickets, based on patterns in the historical data of the tickets. However, this process may be complicated by phenomena such as data drift and by issues such as missing data, a lack of data pertaining to resolutions of past incidents, and too many similar-sounding resolutions arising from free-text entry. This paper proposes a robust ML-driven solution employing clustering, supervised learning, and advanced NLP models to tackle these challenges effectively. Building on previous work, we demonstrate clustering-based resolution identification, supervised classification with LDA, Siamese networks, one-shot learning, and index embedding. Additionally, we present a real-time dashboard and a highly available Kubernetes-based production deployment. Our experiments with both the open-source Bitext customer-support dataset and proprietary telecom datasets demonstrate high prediction accuracy.

Article 457

Title@2025-07-29 (2): Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. I: Penalty Approach

Title: Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. I: Penalty Approach

Nonconvex Optimization Framework für Gruppen-Spar-Feedback Linear-Quadratische Optimale Kontrolle. I: Strafverfahren

群分反馈线性水分最佳最佳控制非康化最佳框架。 2507.18114v2

Authors (3): Lechen Feng, Xun Li, Yuan-Hua Ni

This paper develops a unified nonconvex optimization framework for the design of group-sparse feedback controllers in infinite-horizon linear-quadratic (LQ) problems. We address two prominent extensions of the classical LQ problem: the distributed LQ problem with fixed communication topology (DFT-LQ) and the sparse feedback LQ problem (SF-LQ), both of which are motivated by the need for scalable and structure-aware control in large-scale systems. Unlike existing approaches that rely on convex relaxations or are limited to block-diagonal structures, we directly formulate the controller synthesis as a finite-dimensional nonconvex optimization problem with group $\ell_0$-norm regularization, capturing general sparsity patterns. We establish a connection between DFT-LQ and SF-LQ problems, showing that both can be addressed within our unified framework. Furthermore, we propose a penalty-based proximal alternating linearized minimization (PALM) algorithm and provide a rigorous convergence analysis under mild assumptions, overcoming the lack of coercivity in the objective function. The proposed method admits efficient solvers for all subproblems and guarantees global convergence to critical points. Our results fill a key gap in the literature by enabling the direct design of group-sparse feedback gains with theoretical guarantees, without resorting to convex surrogates or restrictive structural assumptions.
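To make the group $\ell_0$ regularization more concrete, the sketch below implements the standard proximal operator of a group $\ell_0$ penalty, which reduces to group-wise hard thresholding: a block is kept only if its Frobenius norm exceeds $\sqrt{2\lambda}$. This is a generic building block of PALM-type methods and not necessarily the exact subproblem solved in the paper; the function name, the group encoding as (row indices, column indices), and the example gain matrix are assumptions for illustration.

```python
import numpy as np

def prox_group_l0(K, groups, lam):
    """Solve argmin_X 0.5 * ||X - K||_F^2 + lam * (number of nonzero groups).

    Each group is a (row_indices, col_indices) pair selecting a block of K.
    Groups are assumed to be disjoint (illustrative assumption).
    """
    out = np.zeros_like(K, dtype=float)
    threshold = np.sqrt(2.0 * lam)
    for rows, cols in groups:
        block = K[np.ix_(rows, cols)]
        # Keeping the block costs lam; zeroing it costs 0.5 * ||block||_F^2.
        if np.linalg.norm(block) > threshold:
            out[np.ix_(rows, cols)] = block
    return out

# Hypothetical 2x2 feedback gain with one group per entry.
K = np.array([[3.0, 0.1],
              [0.2, -2.0]])
groups = [([0], [0]), ([0], [1]), ([1], [0]), ([1], [1])]
K_sparse = prox_group_l0(K, groups, lam=0.5)
```

With `lam=0.5` the threshold is 1, so the two small off-diagonal entries are zeroed while the diagonal entries pass through unchanged.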

Article 458

Title@2025-07-29 (2): TolerantECG: A Foundation Model for Imperfect Electrocardiogram

Title: TolerantECG: A Foundation Model for Imperfect Electrocardiogram

TolerantECG: Ein Grundmodell für ein imperfektes Elektrokardiogramm

缩放式ECG:不完美心电图基金会模型 2507.09887v2

Authors (4): Huynh Dang Nguyen, Trong-Thang Pham, Ngan Le, Van Nguyen

The electrocardiogram (ECG) is an essential and effective tool for diagnosing heart diseases. However, its effectiveness can be compromised by noise or unavailability of one or more leads of the standard 12-lead recordings, resulting in diagnostic errors or uncertainty. To address these challenges, we propose TolerantECG, a foundation model for ECG signals that is robust to noise and capable of functioning with arbitrary subsets of the standard 12-lead ECG. TolerantECG training combines contrastive and self-supervised learning frameworks to jointly learn ECG signal representations alongside their corresponding knowledge-retrieval-based text report descriptions and corrupted or lead-missing signals. Comprehensive benchmarking results demonstrate that TolerantECG consistently ranks as the best or second-best performer across various ECG signal conditions and class levels in the PTB-XL dataset, and achieves the highest performance on the MIT-BIH Arrhythmia Database.

Article 459

Title@2025-07-29 (2): Posture-Driven Action Intent Inference for Playing style and Fatigue Assessment

Title: Posture-Driven Action Intent Inference for Playing style and Fatigue Assessment

Posture-Driven Action Intent Inferenz für Spielstil und Müdigkeit Bewertung

游戏风格和Fatigue评估的推论 2507.11642v2

Authors (2): Abhishek Jaiswal, Nisheeth Srivastava

Posture-based mental state inference has significant potential in diagnosing fatigue, preventing injury, and enhancing performance across various domains. Such tools must be research-validated with large datasets before being translated into practice. Unfortunately, such vision diagnosis faces serious challenges due to the sensitivity of human subject data. To address this, we identify sports settings as a viable alternative for accumulating data from human subjects experiencing diverse emotional states. We test our hypothesis in the game of cricket and present a posture-based solution to identify human intent from activity videos. Our method achieves over 75% F1 score and over 80% AUC-ROC in discriminating aggressive and defensive shot intent through motion analysis. These findings indicate that posture leaks strong signals for intent inference, even with inherent noise in the data pipeline. Furthermore, we utilize existing data statistics as weak supervision to validate our findings, offering a potential solution for overcoming data labelling limitations. This research contributes to generalizable techniques for sports analytics and also opens possibilities for applying human behavior analysis across various fields.

Article 460

Title@2025-07-29 (2): Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges

Title: Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges

Langfristige Fairness-Anfragen und Verfolgungen im Bereich des maschinellen Lernens: Eine Übersicht von Begriffen, Methoden und Herausforderungen

机构学习方面的长期公平调查与追踪:对名称、方法与挑战的调查 2406.06736v3

Authors (7): Usman Gohar, Zeyu Tang, Jialu Wang, Kun Zhang, Peter L. Spirtes, Yang Liu, Lu Cheng

The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairness. Additionally, the existence of feedback loops and the interaction between models and the environment introduces additional complexities that may deviate from the initial fairness goals. In this survey, we review existing literature on long-term fairness from different perspectives and present a taxonomy for long-term fairness studies. We highlight key challenges and consider future research directions, analyzing both current issues and potential further explorations.

Article 461

Title@2025-07-29 (2): Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Title: Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Persona-Vektoren: Überwachung und Kontrolle von Charaktereigenschaften in Sprachmodellen

人向量:监测和控制语言模式中的字符轨迹 2507.21509v1

Authors (5): Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, Jack Lindsey

Large language models interact with users through a simulated ‘Assistant’ persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model’s activation space, called persona vectors, that underlie several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vectors can be used to monitor fluctuations in the Assistant’s personality at deployment time. We then apply persona vectors to predict and control personality shifts that occur during training. We find that both intended and unintended personality changes after finetuning are strongly correlated with shifts along the relevant persona vectors. These shifts can be mitigated through post-hoc intervention, or avoided in the first place with a new preventative steering method. Moreover, persona vectors can be used to flag training data that will produce undesirable personality changes, both at the dataset level and the individual sample level. Our method for extracting persona vectors is automated and can be applied to any personality trait of interest, given only a natural-language description.

Article 462

Title@2025-07-29 (2): Kodezi Chronos: A Debugging-First Language Model for Repository-Scale Code Understanding

Title: Kodezi Chronos: A Debugging-First Language Model for Repository-Scale Code Understanding

Kodezi Chronos: Ein Debugging-First Language Model für Repository-Scale Code Understanding

Kodezi Chronos:调试第一语言模式,用于存储库-范围守则理解 2507.12482v2

Authors (5): Ishraq Khan, Assad Chowdary, Sharoz Haseeb, Urvish Patel, Yousuf Zaii

Large Language Models (LLMs) have improved code generation and software automation, but remain limited by inference-time context and lack structured reasoning over code. Debugging remains unsolved despite these advances. While Claude Opus 4 and GPT-4.1 achieve >70% on code synthesis benchmarks, they perform <15% on real debugging tasks. We introduce Kodezi Chronos, a language model built specifically for debugging. Chronos combines Adaptive Graph-Guided Retrieval to navigate codebases up to 10 million lines using multi-hop traversal (92% precision, 85% recall), Persistent Debug Memory trained on 15M+ sessions, and a 7-layer architecture for iterative fix-test-refine loops. On 5,000 real-world scenarios, Chronos achieves 67.3% fix accuracy, compared to 14.2% and 13.8% for Claude and GPT-4.1 respectively. Chronos reduces debugging time by 40% and iteration count by 65%. It resolves complex multi-file bugs involving cross-repository context and temporal reasoning. Key limitations include 23.4% success on hardware-dependent issues and 41.2% on dynamic language errors. Theoretical analysis shows O(k log d) retrieval complexity with convergence guarantees. In a human evaluation (N=50), 89% of participants preferred Chronos over baseline models. Chronos will be available in Kodezi OS in Q4 2025 and via API in Q1 2026.

Article 463

Title@2025-07-29 (2): Probabilistic Directed Distance Fields for Ray-Based Shape Representations

Title: Probabilistic Directed Distance Fields for Ray-Based Shape Representations

Probabilistische gerichtete Distanzfelder für Ray-Based Shape-Darstellungen

光以光基形状表示法的直射距离场概率 2404.09081v2

Authors (4): Tristan Aumentado-Armstrong, Stavros Tsogkas, Sven Dickinson, Allan Jepson

In modern computer vision, the optimal representation of 3D shape continues to be task-dependent. One fundamental operation applied to such representations is differentiable rendering, as it enables inverse graphics approaches in learning frameworks. Standard explicit shape representations (voxels, point clouds, or meshes) are often easily rendered, but can suffer from limited geometric fidelity, among other issues. On the other hand, implicit representations (occupancy, distance, or radiance fields) preserve greater fidelity, but suffer from complex or inefficient rendering processes, limiting scalability. In this work, we devise Directed Distance Fields (DDFs), a novel neural shape representation that builds upon classical distance fields. The fundamental operation in a DDF maps an oriented point (position and direction) to surface visibility and depth. This enables efficient differentiable rendering, obtaining depth with a single forward pass per pixel, as well as differential geometric quantity extraction (e.g., surface normals), with only additional backward passes. Using probabilistic DDFs (PDDFs), we show how to model inherent discontinuities in the underlying field. We then apply DDFs to several applications, including single-shape fitting, generative modelling, and single-image 3D reconstruction, showcasing strong performance with simple architectural components via the versatility of our representation. Finally, since the dimensionality of DDFs permits view-dependent geometric artifacts, we conduct a theoretical investigation of the constraints necessary for view consistency. We find a small set of field properties that are sufficient to guarantee a DDF is consistent, without knowing, for instance, which shape the field is expressing.
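The mapping from an oriented point to visibility and depth can be illustrated with a shape whose directed distance field is known in closed form. The sketch below evaluates an analytic DDF for a sphere centred at the origin; it is a stand-in for the learned neural field described above, and the function name and return convention are assumptions for illustration.

```python
import numpy as np

def sphere_ddf(position, direction, radius=1.0):
    """Analytic directed distance field for a sphere centred at the origin.

    Returns (visible, depth): whether the ray from `position` along
    `direction` hits the surface, and the distance to the first hit.
    """
    p = np.asarray(position, dtype=float)
    v = np.asarray(direction, dtype=float)
    v = v / np.linalg.norm(v)
    # Solve |p + d * v|^2 = radius^2 for d, i.e. d^2 + 2*b*d + c = 0.
    b = p @ v
    c = p @ p - radius ** 2
    disc = b * b - c
    if disc < 0.0:
        return False, np.inf  # the ray misses the sphere
    root = np.sqrt(disc)
    for d in (-b - root, -b + root):
        if d >= 0.0:  # first intersection in front of the ray origin
            return True, float(d)
    return False, np.inf  # the sphere lies entirely behind the ray

# A ray starting outside the sphere and pointing at its centre.
visible, depth = sphere_ddf([0.0, 0.0, -3.0], [0.0, 0.0, 1.0])
# For a sphere, the outward surface normal at the hit point is the hit point / radius.
hit = np.array([0.0, 0.0, -3.0]) + depth * np.array([0.0, 0.0, 1.0])
normal = hit / 1.0
```

Rendering a depth image with such a field takes one evaluation per pixel ray, which mirrors the single-forward-pass property claimed for the learned representation.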

Article 464

Title@2025-07-29 (2): Semantic segmentation of SEM images of lower bainitic and tempered martensitic steels

Title: Semantic segmentation of SEM images of lower bainitic and tempered martensitic steels

Semantische Segmentierung von SEM-Bildern von unteren bainitischen und gehärteten martensitischen Stählen

SEM图象的金属和温和的金属冶金钢的金属图象的语义分解 2312.17251v2

Authors (8): Xiaohan Bie, Manoj Arthanari, Evelin Barbosa de Melo, Baihua Ren, Juancheng Li, Stephen Yue, Salim Brahimi, Jun Song

This study employs deep learning techniques to segment scanning electron microscope images, enabling a quantitative analysis of carbide precipitates in lower bainite and tempered martensite steels with comparable strength. Following segmentation, carbides are investigated, and their volume percentage, size distribution, and orientations are probed within the image dataset. Our findings reveal that lower bainite and tempered martensite exhibit comparable volume percentages of carbides, albeit with a more uniform distribution of carbides in tempered martensite. Carbides in lower bainite demonstrate a tendency for better alignment than those in tempered martensite, aligning with the observations of other researchers. However, both microstructures display a scattered carbide orientation, devoid of any discernible pattern. Comparative analysis of aspect ratios and sizes of carbides in lower bainite and tempered martensite unveils striking similarities. The deep learning model achieves an impressive pixelwise accuracy of 98.0% in classifying carbide/iron matrix at the individual pixel level. The semantic segmentation derived from deep learning extends its applicability to the analysis of secondary phases in various materials, offering a time-efficient, versatile AI-powered workflow for quantitative microstructure analysis.

Article 465

Title@2025-07-29 (2): Evaluation and Benchmarking of LLM Agents: A Survey

Title: Evaluation and Benchmarking of LLM Agents: A Survey

Bewertung und Benchmarking von LLM-Agenten: Eine Umfrage

对LLLM代理的评估和基准确定:调查 2507.21504v1

Authors (4): Mahmoud Mohammadi, Yipeng Li, Jane Lo, Wendy Yip

The rise of LLM-based agents has opened new frontiers in AI applications, yet evaluating these agents remains a complex and underdeveloped area. This survey provides an in-depth overview of the emerging field of LLM agent evaluation, introducing a two-dimensional taxonomy that organizes existing work along (1) evaluation objectives – what to evaluate, such as agent behavior, capabilities, reliability, and safety – and (2) evaluation process – how to evaluate, including interaction modes, datasets and benchmarks, metric computation methods, and tooling. In addition to taxonomy, we highlight enterprise-specific challenges, such as role-based access to data, the need for reliability guarantees, dynamic and long-horizon interactions, and compliance, which are often overlooked in current research. We also identify future research directions, including holistic, more realistic, and scalable evaluation. This work aims to bring clarity to the fragmented landscape of agent evaluation and provide a framework for systematic assessment, enabling researchers and practitioners to evaluate LLM agents for real-world deployment.

Article 466

Title@2025-07-29 (2): Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

Title: Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

Anreize für eine fortgeschrittene Instruktions-Folge von großen Sprachmodellen

为采用大语言模式的高级指示提供激励理由 2506.01413v5

Authors (9): Yulei Qin, Gang Li, Zongyi Li, Zihan Xu, Yuchen Shi, Zhekai Lin, Xiao Cui, Ke Li, Xing Sun

Existing large language models (LLMs) face challenges in following complex instructions, especially when multiple constraints are present and organized in paralleling, chaining, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve capabilities of LLMs. However, we find that the vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the compositions of constraints for identifying their relationship across hierarchies of types and dimensions. To this end, we propose RAIF, a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, we start from the decomposition of complex instructions under existing taxonomies and propose a reproducible data acquisition method. Second, we exploit reinforcement learning (RL) with verifiable rule-centric reward signals to cultivate reasoning specifically for instruction following. We address the shallow, non-essential nature of reasoning under complex instructions via sample-wise contrast for superior CoT enforcement. We also exploit behavior cloning of experts to facilitate steady distribution shift from fast-thinking LLMs to skillful reasoners. Extensive evaluations on seven comprehensive benchmarks confirm the validity of the proposed method, where a 1.5B LLM achieves 11.74% gains with performance comparable to an 8B LLM. Evaluation on OOD constraints also confirms the generalizability of our RAIF. Codes and data are available at https://github.com/yuleiqin/RAIF. Keywords: reinforcement learning with verifiable rewards (RLVR), instruction following, complex instructions

Article 467

Title@2025-07-29 (2): Multifunctional physical reservoir computing in soft tensegrity robots

Title: Multifunctional physical reservoir computing in soft tensegrity robots

Multifunktionales physikalisches Reservoir-Computing in Soft-Angespanntheit-Robotern

多功能物理储油层软时势机器人计算 2507.21496v1

Authors (4): Ryo Terajima, Katsuma Inoue, Kohei Nakajima, Yasuo Kuniyoshi

Recent studies have demonstrated that the dynamics of physical systems can be utilized for the desired information processing under the framework of physical reservoir computing (PRC). Robots with soft bodies are examples of such physical systems, and their nonlinear body-environment dynamics can be used to compute and generate the motor signals necessary for the control of their own behavior. In this simulation study, we extend this approach to control and embed not only one but also multiple behaviors into a type of soft robot called a tensegrity robot. The resulting system, consisting of the robot and the environment, is a multistable dynamical system that converges to different attractors from varying initial conditions. Furthermore, attractor analysis reveals that there exist “untrained attractors” in the state space of the system outside the training data. These untrained attractors reflect the intrinsic properties and structures of the tensegrity robot and its interactions with the environment. The impacts of these recent findings in PRC remain unexplored in embodied AI research. We here illustrate their potential to understand various features of embodied cognition that have not been fully addressed to date.

Article 468

Title@2025-07-29 (2): Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning

Title: Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning

Latte: Collaborative Test-Time Adaption von Vision-Language-Modellen im Federated Learning

Latte:联邦学习联合会愿景-语言模型协作测试-时间适应 2507.21494v1

Authors (6): Wenxuan Bao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He

Test-time adaptation with pre-trained vision-language models has gained increasing attention for addressing distribution shifts during testing. Among these approaches, memory-based algorithms stand out due to their training-free nature and ability to leverage historical test data. However, existing test-time adaptation methods are typically designed for a single domain with abundant data. In decentralized settings such as federated learning, applying these methods individually to each client suffers from limited test data, while directly sharing a single global memory via the server prevents proper personalization to each client’s unique distribution. To address this, we propose Latte, a novel framework where each client maintains a local memory to store embeddings from its own historical test data and an external memory to store class prototypes from other relevant clients. During communication, each client retrieves prototypes from similar clients under the server’s coordination to expand its memory. For local adaptation, Latte utilizes both embedding similarity and uncertainty to enhance model performance. Our theoretical analysis shows that Latte effectively leverages in-distribution clients while remaining robust to out-of-distribution clients. Extensive experiments on domain adaptation and corruption benchmarks validate that Latte achieves superior performance in decentralized settings, while introducing only negligible communication and computation costs. Our code is available at https://github.com/baowenxuan/Latte.

Article 469

Title@2025-07-29 (2): Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control

Title: Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control

Verbesserung der Glasdefekterkennung mit Diffusionsmodellen: Adressierung unausgewogener Datensätze in der Fertigungsqualitätskontrolle

利用传播模型加强玻璃破损检测:在制造业质量控制中解决数据集不平衡问题 2505.03134v3

Authors (3): Sajjad Rezvani Boroujeni, Hossein Abedi, Tom Bush

Visual defect detection in industrial glass manufacturing remains a critical challenge due to the low frequency of defective products, leading to imbalanced datasets that limit the performance of deep learning models and computer vision systems. This paper presents a novel approach using Denoising Diffusion Probabilistic Models (DDPMs) to generate synthetic defective glass product images for data augmentation, effectively addressing class imbalance issues in manufacturing quality control and automated visual inspection. The methodology significantly enhances image classification performance of standard CNN architectures (ResNet50V2, EfficientNetB0, and MobileNetV2) in detecting anomalies by increasing the minority class representation. Experimental results demonstrate substantial improvements in key machine learning metrics, particularly in recall for defective samples across all tested deep neural network architectures while maintaining perfect precision on the validation set. The most dramatic improvement was observed in ResNet50V2’s overall classification accuracy, which increased from 78% to 93% when trained with the augmented data. This work provides a scalable, cost-effective approach to enhancing automated defect detection in glass manufacturing that can potentially be extended to other industrial quality assurance systems and industries with similar class imbalance challenges.

Article 470

Title@2025-07-29 (2): Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering

Title: Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering

Sem-DPO: Semantische Inkonsistenz bei der Preference-Optimierung für Prompt Engineering mindern

Sem-DPO: 减轻在优先优化即时工程方面的语义不一致现象 2507.20133v2

Authors (8): Anas Mohamed, Azal Ahmad Khan, Xinran Wang, Ahmad Faraz Khan, Shuwen Ge, Saman Bahzad Khan, Ayaan Ahmad, Ali Anwar

Generative AI can now synthesize strikingly realistic images from text, yet output quality remains highly sensitive to how prompts are phrased. Direct Preference Optimization (DPO) offers a lightweight, off-policy alternative to RL for automatic prompt engineering, but its token-level regularization leaves semantic inconsistency unchecked as prompts that win higher preference scores can still drift away from the user’s intended meaning. We introduce Sem-DPO, a variant of DPO that preserves semantic consistency yet retains its simplicity and efficiency. Sem-DPO adjusts the DPO loss using a weight based on how different the winning prompt is from the original, reducing the impact of training examples that are semantically misaligned. We provide the first analytical bound on semantic drift for preference-tuned prompt generators, showing that Sem-DPO keeps learned prompts within a provably bounded neighborhood of the original text. On three standard text-to-image prompt-optimization benchmarks and two language models, Sem-DPO achieves 8-12% higher CLIP similarity and 5-9% higher human-preference scores (HPSv2.1, PickScore) than DPO, while also outperforming state-of-the-art baselines. These findings suggest that strong flat baselines augmented with semantic weighting should become the new standard for prompt-optimization studies and lay the groundwork for broader, semantics-aware preference optimization in language models.
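The idea of down-weighting semantically drifted training pairs can be sketched as follows. The DPO term is the standard per-example loss; the weighting function is an assumption, since the abstract only states that the weight depends on how different the winning prompt is from the original. Here an exponential of the cosine distance between embeddings is used, with `alpha` and all function names being hypothetical.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard per-example DPO loss: -log sigmoid(beta * log-ratio margin)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return np.logaddexp(0.0, -margin)  # equals -log(sigmoid(margin))

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def semantic_weight(orig_emb, win_emb, alpha=1.0):
    """Assumed weighting: decays as the winning prompt drifts from the original."""
    return np.exp(-alpha * cosine_distance(orig_emb, win_emb))

def weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                      orig_emb, win_emb, beta=0.1, alpha=1.0):
    """DPO loss scaled by the semantic weight of the winning prompt."""
    weight = semantic_weight(orig_emb, win_emb, alpha)
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)

# Hypothetical embeddings: one winner close to the original, one orthogonal to it.
original = np.array([1.0, 0.0])
aligned = weighted_dpo_loss(-1.0, -2.0, -1.5, -1.5, original, np.array([1.0, 0.0]))
drifted = weighted_dpo_loss(-1.0, -2.0, -1.5, -1.5, original, np.array([0.0, 1.0]))
```

Under this assumed form, a pair whose winning prompt has drifted semantically contributes less to the gradient than an otherwise identical pair whose winner stays close to the original prompt.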

Article 471

Title@2025-07-29 (2): Image Super-resolution Inspired Electron Density Prediction

Title: Image Super-resolution Inspired Electron Density Prediction

Bild Super-Auflösung Inspirierte Elektronendichte Vorhersage

图像超分辨率激发电密度预测 2402.12335v2

Authors (4): Chenghan Li, Or Sharir, Shunyue Yuan, Garnet K. Chan

Drawing inspiration from the domain of image super-resolution, we view the electron density as a 3D grayscale image and use a convolutional residual network to transform a crude and trivially generated guess of the molecular density into an accurate ground-state quantum mechanical density. We find that this model outperforms all prior density prediction approaches. Because the input is itself a real-space density, the predictions are equivariant to molecular symmetry transformations even though the model is not constructed to be. Due to its simplicity, the model is directly applicable to unseen molecular conformations and chemical elements. We show that fine-tuning on limited new data provides high accuracy even in challenging cases of exotic elements and charge states. Our work suggests new routes to learning real-space physical quantities drawing from the established ideas of image processing.

Article 472

Title@2025-07-29 (2): Stochastic forest transition model dynamics and parameter estimation via deep learning

Title: Stochastic forest transition model dynamics and parameter estimation via deep learning

Stochastische Wald Übergangsmodell Dynamik und Parameterschätzung durch Deep Learning

通过深层学习对森林口过渡模型动态和参数估计 2507.21486v1

Authors (3): Satoshi Kumabe, Tianyu Song, Ton Viet Ta

Forest transitions, characterized by dynamic shifts between forest, agricultural, and abandoned lands, are complex phenomena. This study developed a stochastic differential equation model to capture the intricate dynamics of these transitions. We established the existence of global positive solutions for the model and conducted numerical analyses to assess the impact of model parameters on deforestation incentives. To address the challenge of parameter estimation, we proposed a novel deep learning approach that estimates all model parameters from a single sample containing time-series observations of forest and agricultural land proportions. This innovative approach enables us to understand forest transition dynamics and deforestation trends at any future time.

Article 473

Title@2025-07-29 (2): Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer

Title: Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer

Testen der Spin-Bad-Ansicht der Selbstachtung: Eine Hamiltonian Analyse von GPT-2 Transformer

测试自觉自觉的自吹泡泡视图:汉密尔顿对GPT-2变形器的分析 2507.00683v5

Authors (2): Satadeep Bhattacharjee, Seung-Cheol Lee

The recently proposed physics-based framework by Huo and Johnson models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians, we obtain analytic phase boundaries and logit gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation on 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model’s empirical token rankings ($r\approx-0.70$, $p<10^{-3}$). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. In this work, we utilize the context-field lens, which provides physics-grounded interpretability and motivates the development of novel generative models bridging theoretical condensed matter physics and artificial intelligence.

Article 474

Title@2025-07-29 (2): The pitfalls of next-token prediction

Title: The pitfalls of next-token prediction

Die Fallstricke der Next-Token-Vorhersage

下吨预测的陷阱 2403.06963v3

Authors (2): Gregor Bachmann, Vaishnavh Nagarajan

Can a mere next-token predictor faithfully model human intelligence? We crystallize this emerging concern and correct popular misconceptions surrounding it, and advocate a simple multi-token objective. As a starting point, we argue that the two often-conflated phases of next-token prediction – autoregressive inference and teacher-forced training – must be treated distinctly. The popular criticism that errors can compound during autoregressive inference crucially assumes that teacher-forcing has learned an accurate next-token predictor. This assumption sidesteps a more deep-rooted problem we expose: in certain classes of tasks, teacher-forcing can simply fail to learn an accurate next-token predictor in the first place. We describe a general mechanism of how teacher-forcing can fail, and design a minimal planning task where both the Transformer and the Mamba architecture empirically fail in that manner – remarkably, despite the task being straightforward to learn. Finally, we provide preliminary evidence that this failure can be resolved using teacherless training, a simple modification using dummy tokens that predicts multiple tokens in advance. We hope this finding can ground future debates and inspire explorations beyond the next-token prediction paradigm. We make our code available under https://github.com/gregorbachmann/Next-Token-Failures

Article 475

Title@2025-07-29 (2): HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation

Title: HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation

HI-PMK: Ein Data-Dependent-Kernel für unvollständige heterogene Datendarstellung

HI-PMK:一个数据依赖核心,用于不完全异基因数据代表 2501.04300v3

Authors (4): Youran Zhou, Mohamed Reda Bouadjenek, Jonathan Wells, Sunil Aryal

Handling incomplete and heterogeneous data remains a central challenge in real-world machine learning, where missing values may follow complex mechanisms (MCAR, MAR, MNAR) and features can be of mixed types (numerical and categorical). Existing methods often rely on imputation, which may introduce bias or privacy risks, or fail to jointly address data heterogeneity and structured missingness. We propose the Heterogeneous Incomplete Probability Mass Kernel (HI-PMK), a novel data-dependent representation learning approach that eliminates the need for imputation. HI-PMK introduces two key innovations: (1) a probability mass-based dissimilarity measure that adapts to local data distributions across heterogeneous features (numerical, ordinal, nominal), and (2) a missingness-aware uncertainty strategy (MaxU) that conservatively handles all three missingness mechanisms by assigning maximal plausible dissimilarity to unobserved entries. Our approach is privacy-preserving, scalable, and readily applicable to downstream tasks such as classification and clustering. Extensive experiments on over 15 benchmark datasets demonstrate that HI-PMK consistently outperforms traditional imputation-based pipelines and kernel methods across a wide range of missing data settings. Code is available at: https://github.com/echoid/Incomplete-Heter-Kernel

Article 476

Title@2025-07-29 (2): Capacity-Constrained Continual Learning

Title: Capacity-Constrained Continual Learning

Leistungsbeschränktes kontinuierliches Lernen

受能力制约的不断学习 2507.21479v1

Authors (4): Zheng Wen, Doina Precup, Benjamin Van Roy, Satinder Singh

Any agents we can possibly build are subject to capacity constraints, as memory and compute resources are inherently finite. However, comparatively little attention has been dedicated to understanding how agents with limited capacity should allocate their resources for optimal performance. The goal of this paper is to shed some light on this question by studying a simple yet relevant continual learning problem: the capacity-constrained linear-quadratic-Gaussian (LQG) sequential prediction problem. We derive a solution to this problem under appropriate technical conditions. Moreover, for problems that can be decomposed into a set of sub-problems, we also demonstrate how to optimally allocate capacity across these sub-problems in the steady state. We view the results of this paper as a first step in the systematic theoretical study of learning under capacity constraints.

Article 477

Title@2025-07-29 (2): Adversarial bandit optimization for approximately linear functions

Title: Adversarial bandit optimization for approximately linear functions

Adversariale Bandit-Optimierung für etwa lineare Funktionen

大约直线功能的对面土匪优化 2505.20734v5

Authors (3): Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player’s choice. We give both expected and high probability regret bounds for the problem. Our result also implies an improved high-probability regret bound for the bandit linear optimization, a special case with no perturbation. We also give a lower bound on the expected regret.

Article 478

Title@2025-07-29 (2): Hebbian Memory-Augmented Recurrent Networks: Engram Neurons in Deep Learning

Title: Hebbian Memory-Augmented Recurrent Networks: Engram Neurons in Deep Learning

Hebbian Memory-Augmented Recurrent Networks: Engram Neuronen im Deep Learning

Hebbian记忆增强的经常网络网络:深层学习中的Engram神经元 2507.21474v1

Authors (1): Daniel Szelogowski

Despite success across diverse tasks, current artificial recurrent network architectures rely primarily on implicit hidden-state memories, limiting their interpretability and ability to model long-range dependencies. In contrast, biological neural systems employ explicit, associative memory traces (i.e., engrams) strengthened through Hebbian synaptic plasticity and activated sparsely during recall. Motivated by these neurobiological insights, we introduce the Engram Neural Network (ENN), a novel recurrent architecture incorporating an explicit, differentiable memory matrix with Hebbian plasticity and sparse, attention-driven retrieval mechanisms. The ENN explicitly models memory formation and recall through dynamic Hebbian traces, improving transparency and interpretability compared to conventional RNN variants. We evaluate the ENN architecture on three canonical benchmarks: MNIST digit classification, CIFAR-10 image sequence modeling, and WikiText-103 language modeling. Our empirical results demonstrate that the ENN achieves accuracy and generalization performance broadly comparable to classical RNN, GRU, and LSTM architectures, with all models converging to similar accuracy and perplexity on the large-scale WikiText-103 task. At the same time, the ENN offers significant enhancements in interpretability through observable memory dynamics. Hebbian trace visualizations further reveal biologically plausible, structured memory formation processes, validating the potential of neuroscience-inspired mechanisms to inform the development of more interpretable and robust deep learning models.

Article 479

Title@2025-07-29 (2): Retrieve-Augmented Generation for Speeding up Diffusion Policy without Additional Training

Title: Retrieve-Augmented Generation for Speeding up Diffusion Policy without Additional Training

Retrieve-Augmented Generation zur Beschleunigung der Diffusionspolitik ohne zusätzliches Training

加速推广政策而无需额外培训的回收-提款一代 2507.21452v1

Authors (5): Sodtavilan Odonchimed, Tatsuya Matsushima, Simon Holk, Yusuke Iwasawa, Yutaka Matsuo

Diffusion Policies (DPs) have attracted attention for their ability to achieve significant accuracy improvements in various imitation learning tasks. However, DPs depend on Diffusion Models, which require multiple noise removal steps to generate a single action, resulting in long generation times. To solve this problem, knowledge distillation-based methods such as Consistency Policy (CP) have been proposed. However, these methods require a significant amount of training time, especially for difficult tasks. In this study, we propose RAGDP (Retrieve-Augmented Generation for Diffusion Policies), a novel framework that uses a knowledge base to expedite the inference of pre-trained DPs and thereby eliminates the need for additional training. Concretely, RAGDP encodes observation-action pairs through the DP encoder to construct a vector database of expert demonstrations. During inference, the current observation is embedded, and the most similar expert action is extracted. This extracted action is combined with an intermediate noise removal step to reduce the number of steps required compared to the original diffusion process. We show that by using RAGDP with the base model and existing acceleration methods, we improve the accuracy and speed trade-off with no additional training. Even when accelerating the models 20 times, RAGDP maintains an advantage in accuracy, with a 7% increase over distillation models such as CP.
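
The retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and assumes that the retrieved action is re-noised to an intermediate diffusion step with the standard forward-noising formula; the abstract does not specify either detail, and the names `retrieve_action`, `noise_to_step`, and `alpha_bar` are hypothetical rather than taken from the paper.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_action(query_embedding, database):
    """Return the expert action whose observation embedding is most similar.

    `database` is a list of (embedding, action) pairs (hypothetical layout).
    """
    best_pair = max(database, key=lambda pair: cosine(query_embedding, pair[0]))
    return best_pair[1]

def noise_to_step(action, alpha_bar, rng):
    """Forward-noise an action to an intermediate diffusion step.

    Assumption: standard DDPM forward process,
    a_k = sqrt(alpha_bar) * a_0 + sqrt(1 - alpha_bar) * eps.
    A pre-trained denoiser (not shown) would then run only the remaining steps.
    """
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [scale * a + sigma * rng.gauss(0.0, 1.0) for a in action]

# Toy database of two expert demonstrations.
database = [
    ([1.0, 0.0], [0.5, 0.5]),
    ([0.0, 1.0], [-1.0, 0.0]),
]
retrieved = retrieve_action([0.9, 0.1], database)
start_point = noise_to_step(retrieved, alpha_bar=0.8, rng=random.Random(0))
```

Starting the reverse process from `start_point` rather than from pure noise is what would shorten generation under these assumptions.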

Article 480

Title@2025-07-29 (2): From Global to Local: A Scalable Benchmark for Local Posterior Sampling

Title: From Global to Local: A Scalable Benchmark for Local Posterior Sampling

Von Global zu Local: Ein skalierbarer Benchmark für die lokale posteriore Probenahme

从全球到地方:一个可缩放的基准 2507.21449v1

Authors (2): Rohan Hitchcock, Jesse Hoogland

Degeneracy is an inherent feature of the loss landscape of neural networks, but it is not well understood how stochastic gradient MCMC (SGMCMC) algorithms interact with this degeneracy. In particular, current global convergence guarantees for common SGMCMC algorithms rely on assumptions which are likely incompatible with degenerate loss landscapes. In this paper, we argue that this gap requires a shift in focus from global to local posterior sampling, and, as a first step, we introduce a novel scalable benchmark for evaluating the local sampling performance of SGMCMC algorithms. We evaluate a number of common algorithms, and find that RMSProp-preconditioned SGLD is most effective at faithfully representing the local geometry of the posterior distribution. Although we lack theoretical guarantees about global sampler convergence, our empirical results show that we are able to extract non-trivial local information in models with up to O(100M) parameters.
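
For readers unfamiliar with the sampler named here, the following is a minimal sketch of one RMSProp-preconditioned SGLD update in its commonly used simplified form (the curvature-correction term is dropped). The function name `psgld_step` and the hyperparameter values are illustrative assumptions, not the benchmark's configuration.

```python
import math
import random

def psgld_step(theta, grad, v, step_size, rng, alpha=0.99, eps=1e-5):
    """One preconditioned SGLD update on a list of parameters.

    `grad` is the gradient of the negative log posterior at `theta`.
    `v` is the running average of squared gradients (RMSProp statistic).
    Simplification: the correction term involving the derivative of the
    preconditioner is omitted, as is common in practice.
    """
    new_theta, new_v = [], []
    for t, g, v_i in zip(theta, grad, v):
        v_i = alpha * v_i + (1.0 - alpha) * g * g      # RMSProp statistic
        precond = 1.0 / (eps + math.sqrt(v_i))         # diagonal preconditioner
        noise = rng.gauss(0.0, 1.0) * math.sqrt(step_size * precond)
        new_theta.append(t - 0.5 * step_size * precond * g + noise)
        new_v.append(v_i)
    return new_theta, new_v

# Toy target: standard normal, whose negative log-density has gradient theta.
rng = random.Random(0)
theta, v = [3.0], [0.0]
samples = []
for _ in range(5000):
    theta, v = psgld_step(theta, grad=theta, v=v, step_size=1e-2, rng=rng)
    samples.append(theta[0])
```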

Article 481

Title@2025-07-29 (2): Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations

Title: Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations

Echtzeit-Audio-Visuelle Sprachverbesserung mit vortrainierten visuellen Darstellungen

利用经过培训的视觉代表器加强实时视听语音语音 2507.21448v1

Authors (5): Teng, Ma, Sile Yin, Li-Chia Yang, Shuo Zhang

Speech enhancement in audio-only settings remains challenging, particularly in the presence of interfering speakers. This paper presents a simple yet effective real-time audio-visual speech enhancement (AVSE) system, RAVEN, which isolates and enhances the on-screen target speaker while suppressing interfering speakers and background noise. We investigate how visual embeddings learned from audio-visual speech recognition (AVSR) and active speaker detection (ASD) contribute to AVSE across different SNR conditions and numbers of interfering speakers. Our results show concatenating embeddings from AVSR and ASD models provides the greatest improvement in low-SNR, multi-speaker environments, while AVSR embeddings alone perform best in noise-only scenarios. In addition, we develop a real-time streaming system that operates on a computer CPU and we provide a video demonstration and code repository. To our knowledge, this is the first open-source implementation of a real-time AVSE system.

Article 482

Title@2025-07-29 (2): Nonparametric Sparse Online Learning of the Koopman Operator

Title: Nonparametric Sparse Online Learning of the Koopman Operator

Nonparametric Sparse Online-Lernen des Koopman-Betreibers

Koopman 运算符的非参数 Sparass 在线学习 2405.07432v3

Authors (5): Boya Hou, Sina Sanjari, Nathan Dahlin, Alec Koppel, Subhonmesh Bose

The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. However, existing data-driven approaches to learning the Koopman operator rely on batch data. In this work, we present a sparse online learning algorithm that learns the Koopman operator iteratively via stochastic approximation, with explicit control over model complexity and provable convergence guarantees. Specifically, we study the Koopman operator via its action on the reproducing kernel Hilbert space (RKHS), and address the mis-specified scenario where the dynamics may escape the chosen RKHS. In this mis-specified setting, we relate the Koopman operator to the conditional mean embeddings (CME) operator. We further establish both asymptotic and finite-time convergence guarantees for our learning algorithm in the mis-specified setting, with trajectory-based sampling where the data arrive sequentially over time. Numerical experiments demonstrate the algorithm’s capability to learn unknown nonlinear dynamics.

Article 483

Title@2025-07-29 (2): PVD-ONet: A Multi-scale Neural Operator Method for Singularly Perturbed Boundary Layer Problems

Title: PVD-ONet: A Multi-scale Neural Operator Method for Singularly Perturbed Boundary Layer Problems

PVD-ONet: Eine mehrstufige Neuraloperator-Methode für singulär gestörte Grenzschichtprobleme

PVD-ONet: 单层扰动边界层问题多级神经操作员方法 2507.21437v1

Authors (2): Tiantian Sun, Jian Zu

Physics-informed neural networks and Physics-informed DeepONet excel in solving partial differential equations; however, they often fail to converge for singularly perturbed problems. To address this, we propose two novel frameworks, Prandtl-Van Dyke neural network (PVD-Net) and its operator learning extension Prandtl-Van Dyke Deep Operator Network (PVD-ONet), which rely solely on governing equations without data. To address varying task-specific requirements, both PVD-Net and PVD-ONet are developed in two distinct versions, tailored respectively for stability-focused and high-accuracy modeling. The leading-order PVD-Net adopts a two-network architecture combined with Prandtl’s matching condition, targeting stability-prioritized scenarios. The high-order PVD-Net employs a five-network design with Van Dyke’s matching principle to capture fine-scale boundary layer structures, making it ideal for high-accuracy scenarios. PVD-ONet generalizes PVD-Net to the operator learning setting by assembling multiple DeepONet modules, directly mapping initial conditions to solution operators and enabling instant predictions for an entire family of boundary layer problems without retraining. Numerical experiments on various models show that our proposed methods consistently outperform existing baselines under various error metrics, thereby offering a powerful new approach for multi-scale problems.

Article 484

Title@2025-07-29 (2): Measuring Sample Quality with Copula Discrepancies

Title: Measuring Sample Quality with Copula Discrepancies

Messung der Probenqualität mit Copula-Diskrepanzen

衡量抽样质量与可协调差异 2507.21434v1

Authors (3): Agnideep Aich, Ashit Baran Aich, Bruce Wade

The scalable Markov chain Monte Carlo (MCMC) algorithms that underpin modern Bayesian machine learning, such as Stochastic Gradient Langevin Dynamics (SGLD), sacrifice asymptotic exactness for computational speed, creating a critical diagnostic gap: traditional sample quality measures fail catastrophically when applied to biased samplers. While powerful Stein-based diagnostics can detect distributional mismatches, they provide no direct assessment of dependence structure, often the primary inferential target in multivariate problems. We introduce the Copula Discrepancy (CD), a principled and computationally efficient diagnostic that leverages Sklar’s theorem to isolate and quantify the fidelity of a sample’s dependence structure independent of its marginals. Our theoretical framework provides the first structure-aware diagnostic specifically designed for the era of approximate inference. Empirically, we demonstrate that a moment-based CD dramatically outperforms standard diagnostics like effective sample size for hyperparameter selection in biased MCMC, correctly identifying optimal configurations where traditional methods fail. Furthermore, our robust MLE-based variant can detect subtle but critical mismatches in tail dependence that remain invisible to rank correlation-based approaches, distinguishing between samples with identical Kendall’s tau but fundamentally different extreme-event behavior. With computational overhead orders of magnitude lower than existing Stein discrepancies, the CD provides both immediate practical value for MCMC practitioners and a theoretical foundation for the next generation of structure-aware sample quality assessment.

Article 485

Title@2025-07-29 (2): LLAMAPIE: Proactive In-Ear Conversation Assistants

Title: LLAMAPIE: Proactive In-Ear Conversation Assistants

LLAMAPIE: Proaktive In-Ear-Gesprächsassistenten

LLAMAPIE: 主动的在轨在轨对话助理 2505.04066v2

Authors (5): Tuochao Chen, Nicholas Batchelder, Alisa Liu, Noah Smith, Shyamnath Gollakota

We introduce LlamaPIE, the first real-time proactive assistant designed to enhance human conversations through discreet, concise guidance delivered via hearable devices. Unlike traditional language models that require explicit user invocation, this assistant operates in the background, anticipating user needs without interrupting conversations. We address several challenges, including determining when to respond, crafting concise responses that enhance conversations, leveraging knowledge of the user for context-aware assistance, and real-time, on-device processing. To achieve this, we construct a semi-synthetic dialogue dataset and propose a two-model pipeline: a small model that decides when to respond and a larger model that generates the response. We evaluate our approach on real-world datasets, demonstrating its effectiveness in providing helpful, unobtrusive assistance. User studies with our assistant, implemented on Apple Silicon M2 hardware, show a strong preference for the proactive assistant over both a baseline with no assistance and a reactive model, highlighting the potential of LlamaPIE to enhance live conversations.

Article 486

Title@2025-07-29 (2): Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning

Title: Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning

Induzieren kausale Weltmodelle in LLMs für Zero-Shot Physical Reasoning

在零热物理原因的LLMM中引入因果世界模型 2507.19855v2

Authors (6): Aditya Sharma, Linh Nguyen, Ananya Gupta, Chengyu Wang, Chiamaka Adebayo, Jakub Kowalski

Large Language Models (LLMs), despite their advanced linguistic capabilities, fundamentally lack an intuitive understanding of physical dynamics, which limits their effectiveness in real-world scenarios that require causal reasoning. In this paper, we introduce Causal World Model Induction (CWMI), a novel framework designed to embed an explicit model of causal physics within an LLM. Our approach incorporates a dedicated Causal Physics Module (CPM) and a new training objective called Causal Intervention Loss, encouraging the model to learn cause-and-effect relationships from multimodal data. By training the model to predict the outcomes of hypothetical interventions instead of merely capturing statistical correlations, CWMI develops a robust internal representation of physical laws. Experimental results show that CWMI significantly outperforms state-of-the-art LLMs on zero-shot physical reasoning tasks, including the PIQA benchmark and our newly proposed PhysiCa-Bench dataset. These findings demonstrate that inducing a causal world model is a critical step toward more reliable and generalizable AI systems.

Article 487

Title@2025-07-29 (2): From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions

Title: From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions

Von Sublinear zu Linear: Schnelle Konvergenz in tiefen Netzwerken über lokale Polyak-Lojasiewicz-Regionen

从子线线至线性线性:通过地方Polyak-Lojasiewicz区在深网络中快速聚合 2507.21429v1

Authors (3): Agnideep Aich, Ashit Baran Aich, Bruce Wade

The convergence of gradient descent (GD) on the non-convex loss landscapes of deep neural networks (DNNs) presents a fundamental theoretical challenge. While recent work has established that GD converges to a stationary point at a sublinear rate within locally quasi-convex regions (LQCRs), this fails to explain the exponential convergence rates consistently observed in practice. In this paper, we resolve this discrepancy by proving that under a mild assumption on Neural Tangent Kernel (NTK) stability, these same regions satisfy a local Polyak-Lojasiewicz (PL) condition. We introduce the concept of a Locally Polyak-Lojasiewicz Region (LPLR), where the squared gradient norm lower-bounds the suboptimality gap, prove that properly initialized finite-width networks admit such regions around initialization, and establish that GD achieves linear convergence within an LPLR, providing the first finite-width guarantee that matches empirically observed rates. We validate our theory across diverse settings, from controlled experiments on fully-connected networks to modern ResNet architectures trained with stochastic methods, demonstrating that LPLR structure emerges robustly in practical deep learning scenarios. By rigorously connecting local landscape geometry to fast optimization through the NTK framework, our work provides a definitive theoretical explanation for the remarkable efficiency of gradient-based optimization in deep learning.
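
As a reminder of why a PL-type inequality yields a linear rate, the textbook argument is sketched below. It assumes the loss $L$ is $\beta$-smooth, that the iterates stay inside the region where the inequality holds, and a step size of $1/\beta$; the constants $\mu$ and $\beta$ are generic symbols, not the paper's exact statement.

```latex
% PL condition with constant \mu > 0 on the region:
\frac{1}{2}\,\|\nabla L(\theta)\|^{2} \;\ge\; \mu\,\bigl(L(\theta) - L^{*}\bigr)

% Descent lemma for gradient descent with step size 1/\beta:
L(\theta_{t+1}) \;\le\; L(\theta_{t}) - \frac{1}{2\beta}\,\|\nabla L(\theta_{t})\|^{2}

% Combining the two gives a geometric (linear) rate:
L(\theta_{t+1}) - L^{*} \;\le\; \Bigl(1 - \frac{\mu}{\beta}\Bigr)\bigl(L(\theta_{t}) - L^{*}\bigr)
\quad\Longrightarrow\quad
L(\theta_{t}) - L^{*} \;\le\; \Bigl(1 - \frac{\mu}{\beta}\Bigr)^{t}\bigl(L(\theta_{0}) - L^{*}\bigr)
```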

Article 488

Title@2025-07-29 (2): InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers

Title: InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers

InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain für LLM mit optischen Schaltungsschalter Transceivern

无限HBD:利用光电转换收发器为LLM 建立数据中心 – – 高度宽宽度高域域 2502.03885v5

Authors (14): Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen, Yanbo Yu, Yichen Shen, Yibo Zhu, Daxin Jiang

Scaling Large Language Model (LLM) training relies on multi-dimensional parallelism, where High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Tensor Parallelism (TP) and Expert Parallelism (EP). However, existing HBD architectures face fundamental limitations in scalability, cost, and fault resiliency: switch-centric HBDs (e.g., NVL-72) incur prohibitive scaling costs, while GPU-centric HBDs (e.g., TPUv3/Dojo) suffer from severe fault propagation. Switch-GPU hybrid HBDs such as TPUv4 take a middle-ground approach, but the fault explosion radius remains large at the cube level (e.g., 64 TPUs). We propose InfiniteHBD, a novel transceiver-centric HBD architecture that unifies connectivity and dynamic switching at the transceiver level using Optical Circuit Switching (OCS). By embedding OCS within each transceiver, InfiniteHBD achieves reconfigurable point-to-multipoint connectivity, allowing the topology to adapt to variable-size rings. This design provides: i) datacenter-wide scalability without cost explosion; ii) fault resilience by isolating failures to a single node, and iii) full bandwidth utilization for fault-free GPUs. Key innovations include a Silicon Photonic (SiPh)-based low-cost OCS transceiver (OCSTrx), a reconfigurable k-hop ring topology co-designed with intra-/inter-node communication, and an HBD-DCN orchestration algorithm maximizing GPU utilization while minimizing cross-ToR datacenter network traffic. The evaluation demonstrates that InfiniteHBD achieves 31% of the cost of NVL-72, near-zero GPU waste ratio (over one order of magnitude lower than NVL-72 and TPUv4), near-zero cross-ToR traffic when node fault ratios are under 7%, and improves Model FLOPs Utilization by 3.37x compared to NVIDIA DGX (8 GPUs per Node).

Article 489

Title@2025-07-29 (2): PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN

Title: PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN

PAR-AdvGAN: Verbesserung der Angriffsfähigkeit mit progressiver Auto-Regression AdvGAN

PAR-AdvGAN: 提高反向攻击能力 2502.12207v2

Authors (7): Jiayu Zhang, Zhiyu Zhu, Xinyi Wang, Silin Liao, Zhibo Jin, Flora D. Salim, Huaming Chen

Deep neural networks have demonstrated remarkable performance across various domains. However, they are vulnerable to adversarial examples, which can lead to erroneous predictions. Generative Adversarial Networks (GANs) can leverage their generator and discriminator models to quickly produce high-quality adversarial examples. Since both modules train in a competitive and simultaneous manner, GAN-based algorithms like AdvGAN can generate adversarial examples with better transferability compared to traditional methods. However, the generation of perturbations is usually limited to a single iteration, preventing these examples from fully exploiting the potential of the methods. To tackle this issue, we introduce a novel approach named Progressive Auto-Regression AdvGAN (PAR-AdvGAN). It incorporates an auto-regressive iteration mechanism within a progressive generation network to craft adversarial examples with enhanced attack capability. We thoroughly evaluate our PAR-AdvGAN method with a large-scale experiment, demonstrating its superior performance over various state-of-the-art black-box adversarial attacks, as well as the original AdvGAN. Moreover, PAR-AdvGAN significantly accelerates adversarial example generation, achieving speeds of up to 335.5 frames per second on the Inception-v3 model and outperforming gradient-based transferable attack algorithms. Our code is available at: https://github.com/LMBTough/PAR

Article 490

Title@2025-07-29 (2): Back Home: A Computer Vision Solution to Seashell Identification for Ecological Restoration

Title: Back Home: A Computer Vision Solution to Seashell Identification for Ecological Restoration

Zurück Home: Eine Computer Vision-Lösung zur Seashell-Identifizierung für die ökologische Restaurierung

返乡:通过计算机的愿景解决方案来识别海壳,促进生态恢复 2501.04873v4

Authors (3): Alexander Valverde, Luis Solano, André Montoya

Illegal souvenir collection strips an estimated five tonnes of seashells from Costa Rica’s beaches each year. Yet, once these specimens are seized, their coastal origin – Pacific or Caribbean – cannot be verified easily due to the lack of information, preventing their return when confiscated by local authorities. To solve this issue, we introduce BackHome19K, the first large-scale image corpus (19,058 photographs, 516 species) annotated with coast-level labels, and propose a lightweight pipeline that infers provenance in real time on a mobile-grade CPU. A trained anomaly filter pre-screens uploads, increasing robustness to user-generated noise. On a held-out test set, the classifier attains 86.3% balanced accuracy, while the filter rejects 93% of 180 out-of-domain objects with zero false negatives. Deployed as a web application, the system has already processed 70,000 shells for wildlife officers in under three seconds per image, enabling confiscated specimens to be safely repatriated to their native ecosystems. The dataset is available at https://huggingface.co/datasets/FIFCO/BackHome19K

Article 491

Title@2025-07-29 (2): Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling

Title: Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling

Bergbau-Intrinsische Belohnungen aus LLM-Hidden States für effiziente Best-of-N-Probenahme

LLM隐藏国为高效率最佳采样而从LLM公司获得的采矿内部奖赏 2505.12225v2

Authors (4): Jizhou Guo, Zhaomin Wu, Hanchen Yang, Philip S. Yu

Enhancing the performance of Large Language Models (LLMs) with best-of-N sampling is effective and has attracted significant attention. However, it is computationally prohibitive due to massive, data-hungry text-based reward models. By changing the data source from text to hidden states, we introduce SWIFT (Simple Weighted Intrinsic Feedback Technique), a novel, lightweight technique that leverages the rich information embedded in LLM hidden states to address these issues. SWIFT operates at the token level and consists of only linear layers. Extensive experiments show that SWIFT outperforms baselines with less than 0.005% of their parameters and requires only a few samples for training, demonstrating a significant efficiency improvement. SWIFT’s robust scalability, applicability to some closed-source models via logits, and ability to be combined with traditional reward models to yield further performance gains underscore its practical value.
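
Best-of-N selection with a token-level linear scorer over hidden states can be sketched as follows. This is an illustrative toy, not SWIFT itself: the plain averaging of per-token rewards, the single linear layer, and the names `token_reward`, `sequence_score`, and `best_of_n` are all assumptions made for the example.

```python
def token_reward(hidden, weights, bias):
    """Linear reward for a single token's hidden-state vector."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias

def sequence_score(hidden_states, weights, bias):
    """Average the per-token rewards over the whole sequence.

    Assumption: an unweighted mean; a learned per-token weighting
    could be substituted here.
    """
    rewards = [token_reward(h, weights, bias) for h in hidden_states]
    return sum(rewards) / len(rewards)

def best_of_n(candidates, weights, bias):
    """Return the text of the candidate with the highest sequence score."""
    best = max(
        candidates,
        key=lambda c: sequence_score(c["hidden_states"], weights, bias),
    )
    return best["text"]

# Toy candidates with 2-dimensional hidden states per token.
candidates = [
    {"text": "A", "hidden_states": [[1.0, 0.0], [1.0, 0.0]]},
    {"text": "B", "hidden_states": [[0.0, 1.0]]},
]
choice = best_of_n(candidates, weights=[1.0, -1.0], bias=0.0)
```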

Article 492

Title@2025-07-29 (2): InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Title: InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

InfiniteYou: Flexibles Foto Recrafting unter Wahrung Ihrer Identität

无限的你:在保留身份的同时灵活摄影改造 2503.16418v2

Authors (6): Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu

Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.

Article 493

Title@2025-07-29 (2): Randomized Kaczmarz Methods with Beyond-Krylov Convergence

Title: Randomized Kaczmarz Methods with Beyond-Krylov Convergence

Randomisierte Kaczmarz Methoden mit Beyond-Krylov Konvergenz

Kaczmarz 超克隆集成法 2501.11673v2

Authors (4): Michał Dereziński, Deanna Needell, Elizaveta Rebrova, Jiaming Yang

Randomized Kaczmarz methods form a family of linear system solvers which converge by repeatedly projecting their iterates onto randomly sampled equations. While effective in some contexts, such as highly over-determined least squares, Kaczmarz methods are traditionally deemed secondary to Krylov subspace methods, since this latter family of solvers can exploit outliers in the input’s singular value distribution to attain fast convergence on ill-conditioned systems. In this paper, we introduce Kaczmarz++, an accelerated randomized block Kaczmarz algorithm that exploits outlying singular values in the input to attain a fast Krylov-style convergence. Moreover, we show that Kaczmarz++ captures large outlying singular values provably faster than popular Krylov methods, for both over- and under-determined systems. We also develop an optimized variant for positive semidefinite systems, called CD++, demonstrating empirically that it is competitive in arithmetic operations with both CG and GMRES on a collection of benchmark problems. To attain these results, we introduce several novel algorithmic improvements to the Kaczmarz framework, including adaptive momentum acceleration, Tikhonov-regularized projections, and a memoization scheme for reusing information from previously sampled equation blocks.
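
The projection idea in the first sentence can be made concrete with the classical randomized Kaczmarz iteration, sketched below. This is the basic single-row method with rows sampled in proportion to their squared norms, not Kaczmarz++; the block sampling, momentum, regularized projections, and memoization described in the abstract are not included.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Solve a consistent linear system A x = b by random row projections.

    Each step samples a row i with probability proportional to its squared
    norm and projects the iterate onto the hyperplane {x : A[i] . x = b[i]}.
    """
    rng = random.Random(seed)
    n_cols = len(A[0])
    x = [0.0] * n_cols
    row_norms_sq = [sum(a * a for a in row) for row in A]
    row_indices = range(len(A))
    for _ in range(iters):
        i = rng.choices(row_indices, weights=row_norms_sq)[0]
        row = A[i]
        residual = b[i] - sum(a * x_j for a, x_j in zip(row, x))
        step = residual / row_norms_sq[i]
        x = [x_j + step * a for x_j, a in zip(x, row)]
    return x

# Over-determined but consistent toy system with known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * x_t for a, x_t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```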

Article 494

Title@2025-07-29 (2): MapDiffusion: Generative Diffusion for Vectorized Online HD Map Construction and Uncertainty Estimation in Autonomous Driving

Title: MapDiffusion: Generative Diffusion for Vectorized Online HD Map Construction and Uncertainty Estimation in Autonomous Driving

MapDiffusion: Generative Diffusion für vektorisierte Online-HD-Karte Aufbau und Unsicherheit im autonomen Fahren

地图传播:在自动驾驶中为矢量在线HD地图绘制和不确定估计进行传播 2507.21423v1

Authors (6): Thomas Monninger, Zihan Zhang, Zhipeng Mo, Md Zafar Anwar, Steffen Staab, Sihao Ding

Autonomous driving requires an understanding of the static environment from sensor data. Learned Bird’s-Eye View (BEV) encoders are commonly used to fuse multiple inputs, and a vector decoder predicts a vectorized map representation from the latent BEV grid. However, traditional map construction models provide deterministic point estimates, failing to capture uncertainty and the inherent ambiguities of real-world environments, such as occlusions and missing lane markings. We propose MapDiffusion, a novel generative approach that leverages the diffusion paradigm to learn the full distribution of possible vectorized maps. Instead of predicting a single deterministic output from learned queries, MapDiffusion iteratively refines randomly initialized queries, conditioned on a BEV latent grid, to generate multiple plausible map samples. This allows aggregating samples to improve prediction accuracy and deriving uncertainty estimates that directly correlate with scene ambiguity. Extensive experiments on the nuScenes dataset demonstrate that MapDiffusion achieves state-of-the-art performance in online map construction, surpassing the baseline by 5% in single-sample performance. We further show that aggregating multiple samples consistently improves performance along the ROC curve, validating the benefit of distribution modeling. Additionally, our uncertainty estimates are significantly higher in occluded areas, reinforcing their value in identifying regions with ambiguous sensor input. By modeling the full map distribution, MapDiffusion enhances the robustness and reliability of online vectorized HD map construction, enabling uncertainty-aware decision-making for autonomous vehicles in complex environments.

Article 495

Title@2025-07-29 (2): Torque-based Graph Surgery: Enhancing Graph Neural Networks with Hierarchical Rewiring

Title: Torque-based Graph Surgery: Enhancing Graph Neural Networks with Hierarchical Rewiring

Drehmomentbasierte Graphenchirurgie: Verbesserung der Graphen-Neural-Netzwerke mit Hierarchischem Rewiring

基于托盘的图表外科:用等级重组增强图形神经网络 2507.21422v1

Authors (6): Sujia Huang, Lele Fu, Zhen Cui, Tong Zhang, Na Song, Bo Huang

Graph Neural Networks (GNNs) have emerged as powerful tools for learning from graph-structured data, leveraging message passing to diffuse information and update node representations. However, prior work suggests that the native interactions encoded in a graph may not suit this process, motivating the development of graph rewiring methods. In this work, we propose a torque-driven hierarchical rewiring strategy, inspired by the notion of torque in classical mechanics, that dynamically modulates message passing to improve representation learning in heterophilous graphs and enhance robustness against noisy graphs. Specifically, we define an interference-aware torque metric that integrates structural distance and energy scores to quantify the perturbation induced by edges, thereby encouraging each node to aggregate information from its nearest low-energy neighbors. We use the metric to hierarchically reconfigure the receptive field of each layer by judiciously pruning high-torque edges and adding low-torque links, suppressing propagation noise and boosting pertinent signals. Extensive evaluations on benchmark datasets show that our approach surpasses state-of-the-art methods on both heterophilous and homophilous graphs, and maintains high accuracy on noisy graphs.

Article 496

Title@2025-07-29 (2): Cascading and Proxy Membership Inference Attacks

Title: Cascading and Proxy Membership Inference Attacks

Cascading und Proxy Mitgliedschafts-Inferenz-Angriffe

连带和代理成员推定攻击 2507.21412v1

Authors (8): Yuntao Du, Jiacheng Li, Yuetian Chen, Kaiyuan Zhang, Zhizhen Yuan, Hanshen Xiao, Bruno Ribeiro, Ninghui Li

A Membership Inference Attack (MIA) assesses how much a trained machine learning model reveals about its training data by determining whether specific query instances were included in the dataset. We classify existing MIAs into adaptive or non-adaptive, depending on whether the adversary is allowed to train shadow models on membership queries. In the adaptive setting, where the adversary can train shadow models after accessing query instances, we highlight the importance of exploiting membership dependencies between instances and propose an attack-agnostic framework called Cascading Membership Inference Attack (CMIA), which incorporates membership dependencies via conditional shadow training to boost membership inference performance. In the non-adaptive setting, where the adversary is restricted to training shadow models before obtaining membership queries, we introduce Proxy Membership Inference Attack (PMIA). PMIA employs a proxy selection strategy that identifies samples with similar behaviors to the query instance and uses their behaviors in shadow models to perform a membership posterior odds test for membership inference. We provide theoretical analyses for both attacks, and extensive experimental results demonstrate that CMIA and PMIA substantially outperform existing MIAs in both settings, particularly in the low false-positive regime, which is crucial for evaluating privacy risks.

Article 497

Title@2025-07-29 (2): Automated Generation of Diverse Courses of Actions for Multi-Agent Operations using Binary Optimization and Graph Learning

Title: Automated Generation of Diverse Courses of Actions for Multi-Agent Operations using Binary Optimization and Graph Learning

Automatisierte Generierung von vielfältigen Handlungskursen für Multi-Agenten-Betriebe mit Binäroptimierung und Graphen-Lernen

利用二进制优化和图表学习,自动产生多种多机构业务行动多样化行动方案 2506.20031v2

Authors (4): Prithvi Poddar, Ehsan Tarkesh Esfahani, Karthik Dantu, Souma Chowdhury

Operations in disaster response, search & rescue, and military missions that involve multiple agents demand automated processes to support the planning of the courses of action (COA). Moreover, traverse-affecting changes in the environment (rain, snow, blockades, etc.) may impact the expected performance of a COA, making it desirable to have a pool of COAs that are diverse in task distributions across agents. Further, variations in agent capabilities, which could be human crews and/or autonomous systems, present practical opportunities and computational challenges to the planning process. This paper presents a new theoretical formulation and computational framework to generate such diverse pools of COAs for operations with soft variations in agent-task compatibility. Key to the problem formulation is a graph abstraction of the task space and the pool of COAs itself to quantify its diversity. Formulating the COAs as a centralized multi-robot task allocation problem, a genetic algorithm is used for (order-ignoring) allocations of tasks to each agent that jointly maximize diversity within the COA pool and overall compatibility of the agent-task mappings. A graph neural network is trained using a policy gradient approach to then perform single agent task sequencing in each COA, which maximizes completion rates adaptive to task features. Our tests of the COA generation process in a simulated environment demonstrate a significant performance gain over a random walk baseline, a small optimality gap in task sequencing, and an execution time of about 50 minutes to plan up to 20 COAs for 5-agent/100-task operations.

Article 498

Title@2025-07-29 (2): Data Leakage and Redundancy in the LIT-PCBA Benchmark

Title: Data Leakage and Redundancy in the LIT-PCBA Benchmark

Datenleckage und Redundanz im LIT-PCBA Benchmark

LIT-PCBA基准数据泄漏和冗余 2507.21404v1

Authors (3): Amber Huang, Ian Scott Knight, Slava Naprienko

LIT-PCBA is a widely used benchmark for virtual screening, but our audit reveals it is fundamentally compromised. The dataset suffers from egregious data leakage, rampant duplication, and pervasive analog redundancy – flaws that invalidate its use for fair model evaluation. Notably, we identify 2,491 inactives duplicated across training and validation sets, and thousands more repeated within individual data splits (2,945 in training, 789 in validation). Critically, three ligands in the query set – meant to represent unseen test cases – are leaked: two appear in the training set, one in validation. Structural redundancy compounds these issues: for some targets, over 80% of query ligands are near duplicates, with Tanimoto similarity >= 0.9. In ALDH1 alone, we find 323 highly similar active pairs between training and validation sets, invalidating claims of chemical diversity. These and other flaws collectively cause models trained on LIT-PCBA to memorize rather than generalize. To demonstrate the consequences of these data integrity failures, we implement a trivial memorization-based baseline – using no learning, no physics, and no modeling – that outperforms state-of-the-art models, including deep neural networks like CHEESE, on LIT-PCBA simply by exploiting these artifacts. Our findings render the benchmark unfit for its intended purpose and call into question previous results based on its use. We share this audit to raise awareness and provide tooling to help the community develop more rigorous and reliable datasets going forward. All scripts necessary to reproduce our audit and the baseline implementation are available at: https://github.com/sievestack/LIT-PCBA-audit
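
To make the near-duplicate criterion above concrete, the following is a minimal sketch of a Tanimoto check across two splits. It is not the audit's own tooling (that lives in the linked repository); the fingerprints are hypothetical sets of on-bit indices, and the function names and toy data are assumptions for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def near_duplicates(train, valid, threshold=0.9):
    """List (train_id, valid_id, similarity) for cross-split pairs at or above the threshold."""
    return [
        (t_id, v_id, tanimoto(t_fp, v_fp))
        for t_id, t_fp in train.items()
        for v_id, v_fp in valid.items()
        if tanimoto(t_fp, v_fp) >= threshold
    ]

# Hypothetical toy fingerprints; real ones would come from a cheminformatics toolkit.
train = {"t1": set(range(10)), "t2": {100, 101, 102}}
valid = {"v1": set(range(10)), "v2": set(range(9)), "v3": {7, 8, 200}}
pairs = near_duplicates(train, valid)
```

Here `t1` and `v1` are exact duplicates, and `t1` and `v2` share 9 of 10 bits, so both pairs meet the 0.9 threshold.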

Article 499

Title@2025-07-29 (2): Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach

Title: Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach

Ermöglichung der Erkundung von Pareto-Stationarität im multi-objektiven Ausbau-Lernen: Ein multi-objektiver gewichtiger Chebyshev-Actor-Kritischer Ansatz

使多目标强化学习中的Pareto-Starity探索:多目标加权-Chebyshev Actor-Crictive 方法 2507.21397v1

Authors (9): Fnu Hairi, Jiao Yang, Tianchen Zhou, Haibo Yang, Chaosheng Dong, Fan Yang, Michinari Momma, Yan Gao, Jia Liu

In many multi-objective reinforcement learning (MORL) applications, systematically exploring the Pareto-stationary solutions under multiple non-convex reward objectives with a theoretical finite-time sample complexity guarantee is an important yet under-explored problem. This motivates us to take a first step toward filling this gap in MORL. Specifically, in this paper, we propose a Multi-Objective weighted-CHebyshev Actor-critic (MOCHA) algorithm for MORL, which judiciously integrates weighted-Chebyshev (WC) scalarization with the actor-critic framework to enable systematic Pareto-stationarity exploration with a finite-time sample complexity guarantee. The sample complexity result for MOCHA reveals an interesting dependency on $p_{\min}$ in finding an $\epsilon$-Pareto-stationary solution, where $p_{\min}$ denotes the minimum entry of a given weight vector $\mathbf{p}$ in the WC scalarization. By carefully choosing learning rates, the sample complexity for each exploration can be $\tilde{\mathcal{O}}(\epsilon^{-2})$. Furthermore, simulation studies on a large KuaiRand offline dataset show that MOCHA significantly outperforms other baseline MORL approaches.
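
The abstract does not spell out the scalarization, so the sketch below uses the textbook weighted-Chebyshev form, max over objectives of p_i * |f_i - z_i|, as an assumption rather than the paper's exact definition. It treats objectives as costs to minimize (a reward setting would flip the sign), and the candidate points and reference point are hypothetical.

```python
def weighted_chebyshev(objectives, weights, reference):
    """Textbook weighted-Chebyshev scalarization: max_i w_i * |f_i - z_i|."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, reference))

def best_candidate(candidates, weights, reference):
    """Pick the candidate objective vector that minimizes the scalarized value."""
    return min(candidates, key=lambda f: weighted_chebyshev(f, weights, reference))

# Hypothetical two-objective cost vectors lying on a trade-off curve.
candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
ideal = (0.0, 0.0)  # assumed reference point
balanced = best_candidate(candidates, (0.5, 0.5), ideal)
favor_first = best_candidate(candidates, (0.8, 0.2), ideal)
```

Changing the weight vector moves the selected point along the trade-off curve, which is the sense in which a scalarization can be used to explore different solutions.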

Article 500

Title@2025-07-29 (2): Systolic Array-based Accelerator for State-Space Models

Title: Systolic Array-based Accelerator for State-Space Models

Systolischer Array-basierter Accelerator für State-Space-Modelle

州空间模型的基于收量的阵列加速器 2507.21394v1

Authors (5): Shiva Raja, Cansu Demirkiran, Aakash Sarkar, Milos Popovic, Ajay Joshi

Sequence modeling is crucial for AI to understand temporal data and detect complex time-dependent patterns. While recurrent neural networks (RNNs), convolutional neural networks (CNNs), and Transformers have advanced in capturing long-range dependencies, they struggle to achieve high accuracy on very long sequences due to limited memory retention (a fixed context window). State-Space Models (SSMs) leverage exponentially decaying memory, which enables a lengthy context window, and so they process very long data sequences more efficiently than recurrent and Transformer-based models. Unlike traditional neural models like CNNs and RNNs, SSM-based models require solving differential equations through continuous integration, making training and inference both compute- and memory-intensive on conventional CPUs and GPUs. In this paper we introduce a specialized hardware accelerator, EpochCore, for accelerating SSMs. EpochCore is based on systolic arrays (SAs) and is designed to enhance the energy efficiency and throughput of inference of SSM-based models for long-range sequence tasks. Within the SA, we propose a versatile processing element (PE) called LIMA-PE to perform traditional and specialized MAC operations to support traditional DNNs and SSMs. To complement the EpochCore microarchitecture, we propose a novel dataflow, ProDF, which enables highly efficient execution of SSM-based models. By leveraging the LIMA-PE microarchitecture and ProDF, EpochCore achieves on average 250x gains in performance and a 45x improvement in energy efficiency, at the expense of a 2x increase in area cost over traditional SA-based accelerators, and an approximately 2,000x improvement in latency/inference on LRA datasets compared to GPU kernel operations.
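
To illustrate what "exponentially decaying memory" can look like, here is a minimal sketch of a scalar, already-discretized linear state-space recurrence. It is a simplification assumed for this example, not the models or hardware dataflow from the paper; the parameter names `a`, `b`, and `c` are hypothetical.

```python
def ssm_scan(u, a=0.5, b=1.0, c=1.0):
    """Run the scalar recurrence h_k = a * h_{k-1} + b * u_k and emit y_k = c * h_k."""
    h, ys = 0.0, []
    for u_k in u:
        h = a * h + b * u_k  # the state carries a decayed summary of all past inputs
        ys.append(c * h)
    return ys

# A unit impulse at step 0 decays geometrically with ratio a.
impulse_response = ssm_scan([1.0, 0.0, 0.0, 0.0])
```

With |a| < 1, the influence of an input shrinks by a factor of `a` per step, so old inputs fade gradually rather than being cut off at a fixed window.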

Article 501

Title@2025-07-28 (1): FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning

Title: FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning

FedStrategist: Ein Meta-Learning-Framework für adaptive und robuste Aggregation im Federated Learning

联邦战略:联邦学习中适应性和强力聚合的元学习框架 2507.14322v2

Authors (3): Md Rafid Haque, Abu Raihan Mostofa Kamal, Md. Azam Hossain

Federated Learning (FL) offers a paradigm for privacy-preserving collaborative AI, but its decentralized nature creates significant vulnerabilities to model poisoning attacks. While numerous static defenses exist, their effectiveness is highly context-dependent, often failing against adaptive adversaries or in heterogeneous data environments. This paper introduces FedStrategist, a novel meta-learning framework that reframes robust aggregation as a real-time, cost-aware control problem. We design a lightweight contextual bandit agent that dynamically selects the optimal aggregation rule from an arsenal of defenses based on real-time diagnostic metrics. Through comprehensive experiments, we demonstrate that no single static rule is universally optimal. We show that our adaptive agent successfully learns superior policies across diverse scenarios, including a “Krum-favorable” environment and against a sophisticated “stealth” adversary designed to neutralize specific diagnostic signals. Critically, we analyze the paradoxical scenario where a non-robust baseline achieves high but compromised accuracy, and demonstrate that our agent learns a conservative policy to prioritize model integrity. Furthermore, we prove the agent’s policy is controllable via a single “risk tolerance” parameter, allowing practitioners to explicitly manage the trade-off between performance and security. Our work provides a new, practical, and analyzable approach to creating resilient and intelligent decentralized AI systems.

Article 502

Title@2025-07-28 (1): Addressing High Class Imbalance in Multi-Class Diabetic Retinopathy Severity Grading with Augmentation and Transfer Learning

Title: Addressing High Class Imbalance in Multi-Class Diabetic Retinopathy Severity Grading with Augmentation and Transfer Learning

Umgang mit hochwertigem Gleichgewicht in der multi-Klasse diabetischen Retinopathie Schweregraduierung mit Augmentation und Transfer-Lernen

多类糖尿病雷蒂诺病分病分级加增和转移学习中处理高等级的不平衡问题 2507.17121v2

Authors (1): Faisal Ahmed

Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, and early diagnosis through automated retinal image analysis can significantly reduce the risk of blindness. This paper presents a robust deep learning framework for both binary and five-class DR classification, leveraging transfer learning and extensive data augmentation to address the challenges of class imbalance and limited training data. We evaluate a range of pretrained convolutional neural network architectures, including variants of ResNet and EfficientNet, on the APTOS 2019 dataset. For binary classification, our proposed model achieves a state-of-the-art accuracy of 98.9%, with a precision of 98.6%, recall of 99.3%, F1-score of 98.9%, and an AUC of 99.4%. In the more challenging five-class severity classification task, our model obtains a competitive accuracy of 84.6% and an AUC of 94.1%, outperforming several existing approaches. Our findings also demonstrate that EfficientNet-B0 and ResNet34 offer optimal trade-offs between accuracy and computational efficiency across both tasks. These results underscore the effectiveness of combining class-balanced augmentation with transfer learning for high-performance DR diagnosis. The proposed framework provides a scalable and accurate solution for DR screening, with potential for deployment in real-world clinical environments.

Article 503

Title@2025-07-28 (1): Efficient Neural Combinatorial Optimization Solver for the Min-max Heterogeneous Capacitated Vehicle Routing Problem

Title: Efficient Neural Combinatorial Optimization Solver for the Min-max Heterogeneous Capacitated Vehicle Routing Problem

Effizienter neuraler Kombinatorial-Optimierungslöser für das Min-max Heterogene kapazitive Fahrzeugrouting-Problem

用于解决机动车辆流动问题最小最大高度超异性电动车辆的高效神经组合组合优化解决方案 2507.21386v1

Authors (8): Xuan Wu, Di Wang, Chunguo Wu, Kaifang Qi, Chunyan Miao, Yubin Xiao, Jian Zhang, You Zhou

Numerous Neural Combinatorial Optimization (NCO) solvers have been proposed to address Vehicle Routing Problems (VRPs). However, most of these solvers focus exclusively on single-vehicle VRP variants, overlooking the more realistic min-max Heterogeneous Capacitated Vehicle Routing Problem (MMHCVRP), which involves multiple vehicles. Existing MMHCVRP solvers typically select a vehicle and its next node to visit at each decoding step, but often make myopic decoding decisions and overlook key properties of MMHCVRP, including local topological relationships, vehicle permutation invariance, and node symmetry, resulting in suboptimal performance. To better address these limitations, we propose ECHO, an efficient NCO solver. First, ECHO exploits the proposed dual-modality node encoder to capture local topological relationships among nodes. Subsequently, to mitigate myopic decisions, ECHO employs the proposed Parameter-Free Cross-Attention mechanism to prioritize the vehicle selected in the preceding decoding step. Finally, leveraging vehicle permutation invariance and node symmetry, we introduce a tailored data augmentation strategy for MMHCVRP to stabilize the Reinforcement Learning training process. To assess the performance of ECHO, we conduct extensive experiments. The experimental results demonstrate that ECHO outperforms state-of-the-art NCO solvers across varying numbers of vehicles and nodes, and generalizes well across both scales and distribution patterns. Finally, ablation studies validate the effectiveness of all proposed methods.

Article 504

Title@2025-07-28 (1): TiVy: Time Series Visual Summary for Scalable Visualization

Title: TiVy: Time Series Visual Summary for Scalable Visualization

TiVy: Zeitreihenvisuelle Zusammenfassung für skalierbare Visualisierung

TiVy:可缩放可视化的时间序列视觉摘要 2507.18972v2

Authors (5): Gromit Yeuk-Yin Chan, Luis Gustavo Nonato, Themis Palpanas, Cláudio T. Silva, Juliana Freire

Visualizing multiple time series presents fundamental tradeoffs between scalability and visual clarity. Time series capture the behavior of many large-scale real-world processes, from stock market trends to urban activities. Users often gain insights by visualizing them as line charts, juxtaposing or superposing multiple time series to compare them and identify trends and patterns. However, existing representations struggle with scalability: when covering long time spans, they produce visual clutter from too many small multiples or overlapping lines. We propose TiVy, a new algorithm that summarizes time series using sequential patterns. It transforms the series into a set of symbolic sequences based on subsequence visual similarity using Dynamic Time Warping (DTW), then constructs a disjoint grouping of similar subsequences based on the frequent sequential patterns. The grouping result, a visual summary of time series, provides uncluttered superposition with fewer small multiples. Unlike common clustering techniques, TiVy extracts similar subsequences (of varying lengths) aligned in time. We also present an interactive time series visualization that renders large-scale time series in real-time. Our experimental evaluation shows that our algorithm (1) extracts clear and accurate patterns when visualizing time series data, and (2) achieves a significant speed-up (1000x) over straightforward DTW clustering. We also demonstrate the efficiency of our approach in exploring hidden structures in massive time series data in two usage scenarios.
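
For readers unfamiliar with the similarity measure named above, this is a minimal sketch of the standard dynamic-programming form of Dynamic Time Warping with an absolute-difference local cost. It is the plain textbook recurrence, not TiVy's pipeline, and the function name is an assumption.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences (O(len(a) * len(b)))."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
```

Because warping lets one point align with several, sequences that differ only in local speed, such as the first pair here, get a distance of zero. The quadratic cost per pair is also why naive all-pairs DTW clustering scales poorly.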

Article 505

Title@2025-07-28 (1): Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators

Title: Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators

Reservoir Computation mit Netzwerken der Differenzierung Neuron Ring Oszillatoren

差异式中子环振动器网络的储量计算 2507.21377v1

Authors (6): Alexander Yeung, Peter DelMastro, Arjun Karuvally, Hava Siegelmann, Edward Rietman, Hananel Hazan

Reservoir Computing is a machine learning approach that uses the rich repertoire of complex system dynamics for function approximation. Current approaches to reservoir computing use a network of coupled integrating neurons that require a steady current to maintain activity. Here, we introduce a small world graph of differentiating neurons, which are active only when there are changes in input, as an alternative to integrating neurons as a reservoir computing substrate. We find the coupling strength and network topology that enable these small world networks to function as an effective reservoir. We demonstrate the efficacy of these networks in the MNIST digit recognition task, achieving a performance of 90.65%, comparable to existing reservoir computing approaches. The findings suggest that differentiating neurons can be a potential alternative to integrating neurons and can provide a sustainable future alternative for power-hungry AI applications.

Article 506

Title: Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment

Multi-Mikrofon- und Multi-Modal-Emotionserkennung in reverberanter Umgebung

在震动性环境中多语种和多模式情感的认可 2409.09545v3

Authors (3): Ohad Cohen, Gershon Hazan, Sharon Gannot

This paper presents a Multi-modal Emotion Recognition (MER) system designed to enhance emotion recognition accuracy in challenging acoustic conditions. Our approach combines a modified and extended Hierarchical Token-semantic Audio Transformer (HTS-AT) for multi-channel audio processing with an R(2+1)D Convolutional Neural Network (CNN) model for video analysis. We evaluate our proposed method on a reverberated version of the Ryerson audio-visual database of emotional speech and song (RAVDESS) dataset using synthetic and real-world Room Impulse Responses (RIRs). Our results demonstrate that integrating audio and video modalities yields superior performance compared to uni-modal approaches, especially in challenging acoustic conditions. Moreover, we show that the multimodal (audiovisual) approach that utilizes multiple microphones outperforms its single-microphone counterpart.

Article 507

Title@2025-07-28 (1): Load Balancing for AI Training Workloads

Title: Load Balancing for AI Training Workloads

Lastausgleich für KI-Trainings-Workloads

AI培训的平衡 AI 平衡 IT 培训工作量 2507.21372v1

Authors (3): Sarah McClure, Sylvia Ratnasamy, Scott Shenker

We investigate the performance of various load balancing algorithms for large-scale AI training workloads running on dedicated infrastructure. The performance of load balancing depends on both the congestion control and loss recovery algorithms, so our evaluation also sheds light on the appropriate choices for those designs.

Article 508

Title@2025-07-28 (1): A Contrastive Diffusion-based Network (CDNet) for Time Series Classification

Title: A Contrastive Diffusion-based Network (CDNet) for Time Series Classification

Ein Kontrastives Diffusions-basiertes Netzwerk (CDNet) für die Zeitreihenklassifikation

用于时间序列分类的以反向传播为基础的网络(CDNet) 2507.21357v1

Authors (2): Yaoyu Zhang, Chi-Guhn Lee

Deep learning models are widely used for time series classification (TSC) due to their scalability and efficiency. However, their performance degrades under challenging data conditions such as class similarity, multimodal distributions, and noise. To address these limitations, we propose CDNet, a Contrastive Diffusion-based Network that enhances existing classifiers by generating informative positive and negative samples via a learned diffusion process. Unlike traditional diffusion models that denoise individual samples, CDNet learns transitions between samples, both within and across classes, through convolutional approximations of reverse diffusion steps. We introduce a theoretically grounded CNN-based mechanism to enable both denoising and mode coverage, and incorporate an uncertainty-weighted composite loss for robust training. Extensive experiments on the UCR Archive and simulated datasets demonstrate that CDNet significantly improves state-of-the-art (SOTA) deep learning classifiers, particularly under noisy, similar, and multimodal conditions.

Article 509

Title@2025-07-28 (1): Group Relative Augmentation for Data Efficient Action Detection

Title: Group Relative Augmentation for Data Efficient Action Detection

Gruppenrelative Augmentation für dateneffiziente Aktionserkennung

数据高效行动检测组群相对增量 2507.21353v1

Authors (4): Deep Anil Patel, Iain Melvin, Zachary Izzo, Martin Renqiang Min

Adapting large Video-Language Models (VLMs) for action detection using only a few examples poses challenges like overfitting and the granularity mismatch between scene-level pre-training and required person-centric understanding. We propose an efficient adaptation strategy combining parameter-efficient tuning (LoRA) with a novel learnable internal feature augmentation. Applied within the frozen VLM backbone using FiLM, these augmentations generate diverse feature variations directly relevant to the task. Additionally, we introduce a group-weighted loss function that dynamically modulates the training contribution of each augmented sample based on its prediction divergence relative to the group average. This promotes robust learning by prioritizing informative yet reasonable augmentations. We demonstrate our method’s effectiveness on complex multi-label, multi-person action detection datasets (AVA, MOMA), achieving strong mAP performance and showcasing significant data efficiency for adapting VLMs from limited examples.

Article 510

Title@2025-07-28 (1): DEM-NeRF: A Neuro-Symbolic Method for Scientific Discovery through Physics-Informed Simulation

Title: DEM-NeRF: A Neuro-Symbolic Method for Scientific Discovery through Physics-Informed Simulation

DEM-NeRF: Eine neuro-symbolische Methode zur wissenschaftlichen Entdeckung durch physikinformierte Simulation

DEM-NERF:通过物理成形模拟法科学发现的一种神经-共制方法 2507.21350v1

Authors (3): Wenkai Tan, Alvaro Velasquez, Houbing Song

Neural networks have emerged as a powerful tool for modeling physical systems, offering the ability to learn complex representations from limited data while integrating foundational scientific knowledge. In particular, neuro-symbolic approaches that combine data-driven learning, the neuro, with symbolic equations and rules, the symbolic, address the tension between methods that are purely empirical, which risk straying from established physical principles, and traditional numerical solvers that demand complete geometric knowledge and can be prohibitively expensive for high-fidelity simulations. In this work, we present a novel neuro-symbolic framework for reconstructing and simulating elastic objects directly from sparse multi-view image sequences, without requiring explicit geometric information. Specifically, we integrate a neural radiance field (NeRF) for object reconstruction with physics-informed neural networks (PINN) that incorporate the governing partial differential equations of elasticity. In doing so, our method learns a spatiotemporal representation of deforming objects that leverages both image supervision and symbolic physical constraints. To handle complex boundary and initial conditions, which are traditionally confronted using finite element methods, boundary element methods, or sensor-based measurements, we employ an energy-constrained Physics-Informed Neural Network architecture. This design enhances both simulation accuracy and the explainability of results.

Article 511

Title@2025-07-28 (1): Recovering Manifold Structure Using Ollivier-Ricci Curvature

Title: Recovering Manifold Structure Using Ollivier-Ricci Curvature

Recovering Manifold Structure mit Ollivier-Ricci Krümmung

使用 Oliviier- Ricci 曲线恢复处理结构 2410.01149v2

Authors (3): Tristan Luca Saidi, Abigail Hickok, Andrew J. Blumberg

We introduce ORC-ManL, a new algorithm to prune spurious edges from nearest neighbor graphs using a criterion based on Ollivier-Ricci curvature and estimated metric distortion. Our motivation comes from manifold learning: we show that when the data generating the nearest-neighbor graph consists of noisy samples from a low-dimensional manifold, edges that shortcut through the ambient space have more negative Ollivier-Ricci curvature than edges that lie along the data manifold. We demonstrate that our method outperforms alternative pruning methods and that it significantly improves performance on many downstream geometric data analysis tasks that use nearest neighbor graphs as input. Specifically, we evaluate on manifold learning, persistent homology, dimension estimation, and others. We also show that ORC-ManL can be used to improve clustering and manifold learning of single-cell RNA sequencing data. Finally, we provide empirical convergence experiments that support our theoretical findings.

Article 512

Title@2025-07-28 (1): Graph neural networks for residential location choice: connection to classical logit models

Title: Graph neural networks for residential location choice: connection to classical logit models

Graphische neuronale Netze für die Wahl der Wohnlage: Anbindung an klassische Logit-Modelle

用于住宅地点选择的图形神经网络:与古典日志模型的连接 2507.21334v1

Authors (5): Zhanhong Cheng, Lingqian Hu, Yuheng Bu, Yuqi Zhou, Shenhao Wang

Researchers have adopted deep learning for classical discrete choice analysis as it can capture complex feature relationships and achieve higher predictive performance. However, the existing deep learning approaches cannot explicitly capture the relationship among choice alternatives, which has been a long-lasting focus in classical discrete choice models. To address the gap, this paper introduces Graph Neural Networks (GNNs) as a novel framework to analyze residential location choice. The GNN-based discrete choice models (GNN-DCMs) offer a structured approach for neural networks to capture dependence among spatial alternatives, while maintaining clear connections to classical random utility theory. Theoretically, we demonstrate that the GNN-DCMs incorporate the nested logit (NL) model and the spatially correlated logit (SCL) model as two specific cases, yielding a novel algorithmic interpretation through message passing among alternatives’ utilities. Empirically, the GNN-DCMs outperform benchmark MNL, SCL, and feedforward neural networks in predicting residential location choices among Chicago’s 77 community areas. Regarding model interpretation, the GNN-DCMs can capture individual heterogeneity and exhibit spatially-aware substitution patterns. Overall, these results highlight the potential of GNN-DCMs as a unified and expressive framework for synergizing discrete choice modeling and deep learning in complex spatial choice contexts.
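
As background for the MNL benchmark mentioned above, the following is a minimal sketch of multinomial logit choice probabilities, which are a softmax over the alternatives' systematic utilities. It shows only the classical baseline, not the GNN-DCM; the utilities are hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P_i = exp(V_i) / sum_j exp(V_j)."""
    shift = max(utilities)  # subtracting the max avoids overflow and leaves P_i unchanged
    exps = [math.exp(v - shift) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical utilities for three alternatives.
probs = mnl_probabilities([1.0, 1.0, 0.0])
```

In this form the ratio of any two choice probabilities depends only on those two utilities. That independence is the limitation that nested and spatially correlated logit models, and the message passing among alternatives described above, aim to relax.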

Article 513

Title@2025-07-28 (1): Predicting VBAC Outcomes from U.S. Natality Data using Deep and Classical Machine Learning Models

Title: Predicting VBAC Outcomes from U.S. Natality Data using Deep and Classical Machine Learning Models

Vorhersage von VBAC-Ergebnissen aus US-Natality-Daten mittels Deep and Classical Machine Learning Models

利用深古机器学习模型,从美国圣诞数据中预测VBAC结果 2507.21330v1

Authors (1): Ananya Anand

Accurately predicting the outcome of a trial of labor after cesarean (TOLAC) is essential for guiding prenatal counseling and minimizing delivery-related risks. This study presents supervised machine learning models for predicting vaginal birth after cesarean (VBAC) using 643,029 TOLAC cases from the CDC WONDER Natality dataset (2017-2023). After filtering for singleton births with one or two prior cesareans and complete data across 47 prenatal-period features, three classifiers were trained: logistic regression, XGBoost, and a multilayer perceptron (MLP). The MLP achieved the highest performance with an AUC of 0.7287, followed closely by XGBoost (AUC = 0.727), both surpassing the logistic regression baseline (AUC = 0.709). To address class imbalance, class weighting was applied to the MLP, and a custom loss function was implemented in XGBoost. Evaluation metrics included ROC curves, confusion matrices, and precision-recall analysis. Logistic regression coefficients highlighted maternal BMI, education, parity, comorbidities, and prenatal care indicators as key predictors. Overall, the results demonstrate that routinely collected, early-pregnancy variables can support scalable and moderately high-performing VBAC prediction models. These models offer potential utility in clinical decision support, particularly in settings lacking access to specialized intrapartum data.

Article 514

Title@2025-07-28 (1): SQuat: Subspace-orthogonal KV Cache Quantization

Title: SQuat: Subspace-orthogonal KV Cache Quantization

SQuat: Subraum-orthogonale KV-Cache-Quantisierung

Suat: 子空间- orthogonal KV 缓存缓存量化 2503.24358v2

Authors (4): Hao Wang, Ligong Han, Kai Xu, Akash Srivastava

The key-value (KV) cache accelerates LLM decoding by storing KV tensors from previously generated tokens. It reduces redundant computation at the cost of increased memory usage. To mitigate this overhead, existing approaches compress KV tensors into lower-bit representations; however, quantization errors can accumulate as more tokens are generated, potentially resulting in undesired outputs. In this paper, we introduce SQuat (Subspace-orthogonal KV cache quantization). It first constructs a subspace spanned by query tensors to capture the most critical task-related information. During key tensor quantization, it enforces that the difference between the (de)quantized and original keys remains orthogonal to this subspace, minimizing the impact of quantization errors on the attention mechanism’s outputs. SQuat requires no model fine-tuning, no additional calibration dataset for offline learning, and is grounded in a theoretical framework we develop. Through numerical experiments, we show that our method reduces peak memory by 2.17× to 2.82×, improves throughput by 2.45× to 3.60×, and achieves more favorable benchmark scores than existing KV cache quantization algorithms.
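The orthogonality condition described above can be illustrated with a small sketch. It only demonstrates why an error orthogonal to the query subspace leaves the attention logits for those queries unchanged; it is not the paper's quantizer. The uniform rounding step, the toy vectors, and the fact that the adjusted key is allowed to leave the quantization grid are all simplifying assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt orthonormalization of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def remove_subspace_component(error, basis):
    """Project `error` onto the orthogonal complement of span(basis)."""
    out = list(error)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out

# Toy data (assumed): two query vectors span the "important" subspace.
queries = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
key = [0.37, -1.42, 0.88, 0.05]

step = 0.5  # hypothetical uniform quantization step
naive = [round(k / step) * step for k in key]
error = [n - k for n, k in zip(naive, key)]

basis = orthonormal_basis(queries)
error_perp = remove_subspace_component(error, basis)
adjusted = [k + e for k, e in zip(key, error_perp)]

for q in queries:
    # Logit drift with naive rounding vs. with the orthogonal error.
    print(dot(q, naive) - dot(q, key), dot(q, adjusted) - dot(q, key))
```

For each query, the first printed number is the logit drift under naive rounding and the second is the drift once the error is made orthogonal to the query subspace, which is zero up to floating-point precision.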

Article 515

Title@2025-07-28 (1): Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning

Title: Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning

Pareto-Optimal Rewards von Noisy Preferences lernen: Ein Rahmen für multi-objektives Inverse-Verstärkung-Lernen

从新偏爱中学习 Pareto- Opatimal 奖励:多目标反强化学习框架 2505.11864v3

Authors (2): Kalyan Cherukuri, Aarav Lala

As generative agents become increasingly capable, alignment of their behavior with complex human values remains a fundamental challenge. Existing approaches often simplify human intent through reduction to a scalar reward, overlooking the multi-faceted nature of human feedback. In this work, we introduce a theoretical framework for preference-based Multi-Objective Inverse Reinforcement Learning (MO-IRL), where human preferences are modeled as latent vector-valued reward functions. We formalize the problem of recovering a Pareto-optimal reward representation from noisy preference queries and establish conditions for identifying the underlying multi-objective structure. We derive tight sample complexity bounds for recovering $\epsilon$-approximations of the Pareto front and introduce a regret formulation to quantify suboptimality in this multi-objective setting. Furthermore, we propose a provably convergent algorithm for policy optimization using preference-inferred reward cones. Our results bridge the gap between practical alignment techniques and theoretical guarantees, providing a principled foundation for learning aligned behaviors in high-dimensional, value-pluralistic environments.

Article 516

Title@2025-07-28 (1): MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

Title: MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

MALLM-GAN: Multi-Agent Large Language Model als generatives Adversarial Network zur Synthese von Tabellendaten

MALM-GAN:多种需要的大型语言模型,作为合成表格数据生成反对向网络 2406.10521v4

Authors (3): Yaobin Ling, Xiaoqian Jiang, Yejin Kim

In the era of big data, access to abundant data is crucial for driving research forward. However, such data is often inaccessible due to privacy concerns or high costs, particularly in the healthcare domain. Generating synthetic (tabular) data can address this, but existing models typically require substantial amounts of data to train effectively, contradicting our objective of solving data scarcity. To address this challenge, we propose a novel framework for generating synthetic tabular data, powered by large language models (LLMs), that emulates the architecture of a Generative Adversarial Network (GAN). By incorporating the data generation process as contextual information and using an LLM as the optimizer, our approach significantly enhances the quality of synthetic data generation in common scenarios with small sample sizes. Our experimental results on public and private datasets demonstrate that our model outperforms several state-of-the-art models in generating higher-quality synthetic data for downstream tasks while preserving the privacy of the real data.

Article 517

Title@2025-07-28 (1): Position: Adopt Constraints Over Penalties in Deep Learning

Title: Position: Adopt Constraints Over Penalties in Deep Learning

Position: Überstrapazierte Strafen im Deep Learning adoptieren

职位:在深深学习中采用约束措施以凌驾刑罚 2505.20628v3

Authors (3): Juan Ramirez, Meraj Hashemizadeh, Simon Lacoste-Julien

Recent efforts to develop trustworthy AI systems with accountability guarantees have led to widespread use of machine learning formulations incorporating external requirements, or constraints. These requirements are often enforced via penalization–adding fixed-weight terms to the task loss. We argue this approach is fundamentally ill-suited since there may be no penalty coefficient that simultaneously ensures constraint satisfaction and optimal constrained performance, i.e., that truly solves the constrained problem. Moreover, tuning these coefficients requires costly trial-and-error, incurring significant time and computational overhead. We, therefore, advocate for broader adoption of tailored constrained optimization methods–such as the Lagrangian approach, which jointly optimizes the penalization “coefficients” (the Lagrange multipliers) and the model parameters. Such methods (i) truly solve the constrained problem and do so accountably, by clearly defining feasibility and verifying when it is achieved, (ii) eliminate the need for extensive penalty tuning, and (iii) integrate seamlessly with modern deep learning pipelines.
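The contrast between a fixed-weight penalty and the Lagrangian approach can be made concrete on a one-dimensional toy problem: minimize $x^2$ subject to $x \ge 1$, whose constrained solution is $x = 1$ with multiplier $\lambda = 2$. The problem, the step size, and the function names below are assumptions chosen for illustration. The sketch uses simultaneous gradient descent on $x$ and projected gradient ascent on $\lambda$.

```python
def penalty_minimizer(c):
    """Closed-form minimizer of x^2 + c * max(0, 1 - x)^2 for c > 0.

    Setting the derivative 2x - 2c(1 - x) to zero gives x = c / (1 + c),
    which is strictly below 1 for every finite penalty weight c.
    """
    return c / (1.0 + c)

def solve_lagrangian(steps=2000, lr=0.05):
    """Gradient descent-ascent on L(x, lam) = x^2 + lam * (1 - x)."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * x - lam   # dL/dx
        grad_lam = 1.0 - x       # dL/dlam, i.e. the constraint violation
        x -= lr * grad_x                      # descent on the primal variable
        lam = max(0.0, lam + lr * grad_lam)   # projected ascent on the multiplier
    return x, lam

x_star, lam_star = solve_lagrangian()
x_pen_10 = penalty_minimizer(10.0)
x_pen_1000 = penalty_minimizer(1000.0)
```

In this toy case no finite quadratic-penalty weight yields a feasible point (`x_pen_10` and `x_pen_1000` both stay below 1), while the descent-ascent iteration approaches the constrained solution without any weight tuning.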

Article 518

Title@2025-07-28 (1): Large Language Model-Enhanced Reinforcement Learning for Diverse and Novel Recommendations

Title: Large Language Model-Enhanced Reinforcement Learning for Diverse and Novel Recommendations

Großes Sprachmodell-verstärktes Verstärkungslernen für vielfältige und neuartige Empfehlungen

为多样化和新颖建议加强大型语文强化学习模式 2507.21274v1

Authors (5): Jiin Woo, Alireza Bagheri Garakani, Tianchen Zhou, Zhishen Huang, Yan Gao

In recommendation systems, diversity and novelty are essential for capturing varied user preferences and encouraging exploration, yet many systems prioritize click relevance. While reinforcement learning (RL) has been explored to improve diversity, it often depends on random exploration that may not align with user interests. We propose LAAC (LLM-guided Adversarial Actor Critic), a novel method that leverages large language models (LLMs) as reference policies to suggest novel items, while training a lightweight policy to refine these suggestions using system-specific data. The method formulates training as a bilevel optimization between actor and critic networks, enabling the critic to selectively favor promising novel actions and the actor to improve its policy beyond LLM recommendations. To mitigate overestimation of unreliable LLM suggestions, we apply regularization that anchors critic values for unexplored items close to well-estimated dataset actions. Experiments on real-world datasets show that LAAC outperforms existing baselines in diversity, novelty, and accuracy, while remaining robust on imbalanced data, effectively integrating LLM knowledge without expensive fine-tuning.

Article 519

Title@2025-07-28 (1): Deep Polynomial Chaos Expansion

Title: Deep Polynomial Chaos Expansion

Tiefenpolynomiale Chaos-Expansion

深刻的多元混乱扩大 2507.21273v1

Authors (3): Johannes Exenberger, Sascha Ranftl, Robert Peharz

Polynomial chaos expansion (PCE) is a classical and widely used surrogate modeling technique in physical simulation and uncertainty quantification. By taking a linear combination of a set of basis polynomials - orthonormal with respect to the distribution of uncertain input parameters - PCE enables tractable inference of key statistical quantities, such as (conditional) means, variances, covariances, and Sobol sensitivity indices, which are essential for understanding the modeled system and identifying influential parameters and their interactions. As the number of basis functions grows exponentially with the number of parameters, PCE does not scale well to high-dimensional problems. We address this challenge by combining PCE with ideas from probabilistic circuits, resulting in the deep polynomial chaos expansion (DeepPCE) - a deep generalization of PCE that scales effectively to high-dimensional input spaces. DeepPCE achieves predictive performance comparable to that of multi-layer perceptrons (MLPs), while retaining PCE’s ability to compute exact statistical inferences via simple forward passes.
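Because the basis is orthonormal with respect to the input distribution, the mean and variance of a classical PCE can be read directly off its coefficients: the mean is the zeroth coefficient and the variance is the sum of squares of the rest. The sketch below shows this for a single standard-normal input with normalized probabilists' Hermite polynomials. It illustrates classical PCE only, not DeepPCE, and the example function $f(x) = x^2$ is an assumption chosen because its expansion is known in closed form.

```python
import math

def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x) via the three-term recurrence."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, x
    for k in range(1, n):
        prev, curr = curr, x * curr - k * prev
    return curr

def psi(n, x):
    """He_n normalized to unit variance under the standard normal."""
    return hermite_e(n, x) / math.sqrt(math.factorial(n))

def surrogate(coeffs, x):
    """Evaluate the PCE sum_n c_n * psi_n(x)."""
    return sum(c * psi(n, x) for n, c in enumerate(coeffs))

def pce_mean(coeffs):
    return coeffs[0]

def pce_variance(coeffs):
    return sum(c * c for c in coeffs[1:])

# f(x) = x^2 = 1 * psi_0(x) + sqrt(2) * psi_2(x), since He_2(x) = x^2 - 1.
coeffs = [1.0, 0.0, math.sqrt(2.0)]

mean = pce_mean(coeffs)
variance = pce_variance(coeffs)
```

Here `mean` and `variance` match the known moments of $x^2$ for a standard normal $x$ (mean 1, variance 2) without any sampling.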

Article 520

Title@2025-07-28 (1): Generative imaging for radio interferometry with fast uncertainty quantification

Title: Generative imaging for radio interferometry with fast uncertainty quantification

Generative Bildgebung für die Radiointerferometrie mit schneller Unsicherheitsquantifizierung

具有快速不确定性量化的无线电干涉测量生成成像 2507.21270v1

Authors (5): Matthijs Mars, Tobías I. Liaudat, Jessica J. Whitney, Marta M. Betcke, Jason D. McEwen

With the rise of large radio interferometric telescopes, particularly the SKA, there is a growing demand for computationally efficient image reconstruction techniques. Existing reconstruction methods, such as the CLEAN algorithm or proximal optimisation approaches, are iterative in nature, necessitating a large amount of compute. These methods either provide no uncertainty quantification or require large computational overhead to do so. Learned reconstruction methods have shown promise in providing efficient and high-quality reconstruction. In this article we explore the use of generative neural networks that enable efficient approximate sampling of the posterior distribution for high-quality reconstructions with uncertainty quantification. Our RI-GAN framework builds on the regularised conditional generative adversarial network (rcGAN) framework by integrating a gradient U-Net (GU-Net) architecture - a hybrid reconstruction model that embeds the measurement operator directly into the network. This framework uses Wasserstein GANs to improve training stability, in combination with regularisation terms that combat mode collapse; instability and mode collapse are typical problems for conditional GANs. The approach takes as input the dirty image and the point spread function (PSF) of the observation and provides efficient, high-quality image reconstructions that are robust to varying visibility coverages, generalise to images with an increased dynamic range, and come with informative uncertainty quantification. Our methods provide a significant step toward computationally efficient, scalable, and uncertainty-aware imaging for next-generation radio telescopes.

Article 521

Title@2025-07-28 (1): Numerical PDE solvers outperform neural PDE solvers

Title: Numerical PDE solvers outperform neural PDE solvers

Numerische PDE-Löser übertreffen neuronale PDE-Löser

数字 PDE 溶解器超过神经神经功能 PDE 溶解器 2507.21269v1

Authors (4): Patrick Chatain, Michael Rizvi-Martel, Guillaume Rabusseau, Adam Oberman

We present DeepFDM, a differentiable finite-difference framework for learning spatially varying coefficients in time-dependent partial differential equations (PDEs). By embedding a classical forward-Euler discretization into a convolutional architecture, DeepFDM enforces stability and first-order convergence via CFL-compliant coefficient parameterizations. Model weights correspond directly to PDE coefficients, yielding an interpretable inverse-problem formulation. We evaluate DeepFDM on a benchmark suite of scalar PDEs - advection, diffusion, advection-diffusion, reaction-diffusion, and inhomogeneous Burgers’ equations - in one, two, and three spatial dimensions. In both in-distribution and out-of-distribution tests (quantified by the Hellinger distance between coefficient priors), DeepFDM attains normalized mean-squared errors one to two orders of magnitude smaller than Fourier Neural Operators, U-Nets and ResNets; requires 10-20X fewer training epochs; and uses 5-50X fewer parameters. Moreover, recovered coefficient fields accurately match ground-truth parameters. These results establish DeepFDM as a robust, efficient, and transparent baseline for data-driven solution and identification of parametric PDEs.
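A forward-Euler step with a coefficient parameterization that respects the CFL condition by construction can be sketched as follows for 1D diffusion, $u_t = \kappa(x)\,u_{xx}$, with periodic boundaries. The sigmoid parameterization, the helper names, and the grid settings are assumptions for illustration; the abstract does not specify how DeepFDM parameterizes its coefficients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cfl_coefficients(raw_params, dx, dt):
    """Map unconstrained parameters to diffusion coefficients that satisfy
    kappa * dt / dx^2 < 1/2 for any parameter value (assumed scheme)."""
    kappa_max = dx * dx / (2.0 * dt)
    return [kappa_max * sigmoid(p) for p in raw_params]

def euler_step(u, kappa, dx, dt):
    """One explicit forward-Euler step of u_t = kappa(x) * u_xx (periodic)."""
    n = len(u)
    r = [k * dt / (dx * dx) for k in kappa]
    return [
        u[i] + r[i] * (u[(i + 1) % n] - 2.0 * u[i] + u[(i - 1) % n])
        for i in range(n)
    ]

n = 32
dx, dt = 1.0 / n, 1e-3
raw_params = [3.0 * math.sin(2.0 * math.pi * i / n) for i in range(n)]
kappa = cfl_coefficients(raw_params, dx, dt)

# Initial condition: a square bump.
u = [1.0 if n // 4 <= i < n // 2 else 0.0 for i in range(n)]
for _ in range(200):
    u = euler_step(u, kappa, dx, dt)
```

Since every update is a convex combination of neighboring values when $\kappa\,\Delta t/\Delta x^2 \le 1/2$, the solution stays within the range of the initial data, which is the stability property the parameterization is meant to guarantee.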

Article 522

Title@2025-07-28 (1): Adversarial attacks and defenses in explainable artificial intelligence: A survey

Title: Adversarial attacks and defenses in explainable artificial intelligence: A survey

Adversariale Angriffe und Abwehrkräfte in erklärbarer künstlicher Intelligenz: Eine Umfrage

可解释的人工智能中的反向攻击和防御:一项调查 2306.06123v4

Authors (2): Hubert Baniecki, Przemyslaw Biecek

Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as interpreting their predictions. However, recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods, putting their security and trustworthiness into question. The possibility of manipulating, fooling or fairwashing evidence of the model’s reasoning has detrimental consequences when applied in high-stakes decision-making and knowledge discovery. This survey provides a comprehensive overview of research concerning adversarial attacks on explanations of machine learning models, as well as fairness metrics. We introduce a unified notation and taxonomy of methods facilitating a common ground for researchers and practitioners from the intersecting research fields of AdvML and XAI. We discuss how to defend against attacks and design robust interpretation methods. We contribute a list of existing insecurities in XAI and outline the emerging research directions in adversarial XAI (AdvXAI). Future work should address improving explanation methods and evaluation protocols to take into account the reported safety issues.

Article 523

Title@2025-07-28 (1): Multiscale geometrical and topological learning in the analysis of soft matter collective dynamics

Title: Multiscale geometrical and topological learning in the analysis of soft matter collective dynamics

Multiskaliges geometrisches und topologisches Lernen in der Analyse der kollektiven Dynamik weicher Materie

在分析软物质集体动态中进行多尺度多几何学和地形学学习 2507.21265v1

Authors (8): Tetiana Orlova, Amaranta Membrillo Solis, Hayley R. O. Sohn, Tristan Madeleine, Giampaolo D’Alessandro, Ivan I. Smalyukh, Malgosia Kaczmarek, Jacek Brodzki

Understanding the behavior and evolution of a dynamical many-body system by analyzing patterns in its experimentally captured images is a promising method relevant for a variety of living and non-living self-assembled systems. The arrays of moving liquid crystal skyrmions studied here are a representative example of hierarchically organized materials that exhibit complex spatiotemporal dynamics driven by multiscale processes. Joint geometric and topological data analysis (TDA) offers a powerful framework for investigating such systems by capturing the underlying structure of the data at multiple scales. In the TDA approach, we introduce the $\Psi$-function, a robust numerical topological descriptor related both to the spatiotemporal changes in the size and shape of individual topological solitons and to the emergence of regions with different spatial organization. The geometric method, based on the analysis of vector fields generated from images of skyrmion ensembles, offers insights into the nonlinear physical mechanisms of the system’s response to external stimuli and provides a basis for comparison with theoretical predictions. The methodology presented here is very general and can characterize system behavior both at the level of individual pattern-forming agents and as a whole, allowing one to relate the results of image data analysis to processes occurring in a physical, chemical, or biological system in the real world.

Article 524

Title@2025-07-28 (1): Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models

Title: Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models

Verstärktes Lernen Fine-Tunes ein Sparse Subnetwork in großen Sprachmodellen

以大语言模式建立粗略的子网络 2507.17107v2

Authors (1): Andrii Balashov

Reinforcement learning (RL) is a key post-pretraining step for aligning large language models (LLMs) with complex tasks and human preferences. While it is often assumed that RL fine-tuning requires updating most of a model’s parameters, we challenge this assumption with a surprising finding: RL fine-tuning consistently modifies only a small subnetwork (typically 5-30% of weights), leaving most parameters unchanged. We call this phenomenon RL-induced parameter update sparsity. It arises naturally, without any sparsity constraints or parameter-efficient tuning, and appears across multiple RL algorithms (e.g., PPO, DPO, SimPO, PRIME) and model families (e.g., OpenAI, Meta, and open-source LLMs). Moreover, the subnetworks updated by RL show substantial overlap across different seeds, datasets, and algorithms - far exceeding chance - suggesting a partially transferable structure in the pretrained model. We show that fine-tuning only this sparse subnetwork recovers full model performance and yields parameters nearly identical to the fully fine-tuned model. Our analysis suggests this sparsity emerges because RL operates near the model’s original distribution, requiring only targeted changes. KL penalties, gradient clipping, and on-policy dynamics have limited effect on the sparsity pattern. These findings shed new light on how RL adapts models: not by shifting all weights, but by focusing training on a small, consistently updated subnetwork. This insight enables more efficient RL methods and reframes sparsity through the lens of the lottery ticket hypothesis.
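Measuring update sparsity amounts to comparing a checkpoint before and after fine-tuning and counting which weights moved. A minimal sketch is given below, with parameters stored as plain dictionaries of lists. The tolerance `tol`, the toy checkpoints, and the use of Jaccard similarity as the overlap measure are assumptions; the abstract does not state how overlap is quantified.

```python
def changed_mask(before, after, tol=0.0):
    """Set of (parameter name, index) pairs whose value moved by more than tol."""
    mask = set()
    for name, old in before.items():
        for idx, (a, b) in enumerate(zip(old, after[name])):
            if abs(a - b) > tol:
                mask.add((name, idx))
    return mask

def update_fraction(before, after, tol=0.0):
    """Fraction of all weights that changed during fine-tuning."""
    total = sum(len(v) for v in before.values())
    return len(changed_mask(before, after, tol)) / total

def jaccard(mask_a, mask_b):
    """Overlap between two update masks (assumed overlap measure)."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0

# Toy checkpoints (hypothetical values): one base model, two fine-tuning runs.
base = {"layer0": [0.1, 0.2, 0.3, 0.4], "layer1": [0.5, 0.6]}
run_a = {"layer0": [0.1, 0.2, 0.35, 0.4], "layer1": [0.5, 0.7]}
run_b = {"layer0": [0.1, 0.25, 0.31, 0.4], "layer1": [0.5, 0.6]}

frac_a = update_fraction(base, run_a)
overlap = jaccard(changed_mask(base, run_a), changed_mask(base, run_b))
```

With real checkpoints, a small positive `tol` may be preferable to exact comparison so that numerical noise from mixed-precision training is not counted as an update.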

Article 525

Title@2025-07-28 (1): Adaptive Multimodal Protein Plug-and-Play with Diffusion-Based Priors

Title: Adaptive Multimodal Protein Plug-and-Play with Diffusion-Based Priors

Adaptive multimodale Protein-Plug-and-Play mit Diffusion-basierten Prioren

适应性多式蛋白丁质多式聚苯乙烯插件和基于传播的前期布料 2507.21260v1

Authors (4): Amartya Banerjee, Xingyu Xu, Caroline Moosmüller, Harlin Lee

In an inverse problem, the goal is to recover an unknown parameter (e.g., an image) that has typically undergone some lossy or noisy transformation during measurement. Recently, deep generative models, particularly diffusion models, have emerged as powerful priors for protein structure generation. However, integrating noisy experimental data from multiple sources to guide these models remains a significant challenge. Existing methods often require precise knowledge of experimental noise levels and manually tuned weights for each data modality. In this work, we introduce Adam-PnP, a Plug-and-Play framework that guides a pre-trained protein diffusion model using gradients from multiple, heterogeneous experimental sources. Our framework features an adaptive noise estimation scheme and a dynamic modality weighting mechanism integrated into the diffusion process, which reduce the need for manual hyperparameter tuning. Experiments on complex reconstruction tasks demonstrate significantly improved accuracy using Adam-PnP.

Article 526

Title@2025-07-28 (1): Heterogeneous Treatment Effect in Time-to-Event Outcomes: Harnessing Censored Data with Recursively Imputed Trees

Title: Heterogeneous Treatment Effect in Time-to-Event Outcomes: Harnessing Censored Data with Recursively Imputed Trees

Heterogener Behandlungseffekt bei Time-to-Event-Ergebnissen: Zensierte Daten mit rekursiv unterstellten Bäumen nutzen

时间到晚上结果中的异异异性治疗效应:利用对立的树木利用敏感数据 2502.01575v3

Authors (3): Tomer Meir, Uri Shalit, Malka Gorfine

Tailoring treatments to individual needs is a central goal in fields such as medicine. A key step toward this goal is estimating Heterogeneous Treatment Effects (HTE) - the way treatments impact different subgroups. While crucial, HTE estimation is challenging with survival data, where time until an event (e.g., death) is key. Existing methods often assume complete observation, an assumption violated in survival data due to right-censoring, leading to bias and inefficiency. Cui et al. (2023) proposed a doubly-robust method for HTE estimation in survival data under no hidden confounders, combining a causal survival forest with an augmented inverse-censoring weighting estimator. However, we find it struggles under heavy censoring, which is common in rare-outcome problems such as amyotrophic lateral sclerosis (ALS). Moreover, most current methods cannot handle instrumental variables, which are a crucial tool in the causal inference arsenal. We introduce Multiple Imputation for Survival Treatment Response (MISTR), a novel, general, and non-parametric method for estimating HTE in survival data. MISTR uses recursively imputed survival trees to handle censoring without directly modeling the censoring mechanism. Through extensive simulations and analysis of two real-world datasets - the AIDS Clinical Trials Group Protocol 175 and the Illinois unemployment dataset - we show that MISTR outperforms prior methods under heavy censoring in the no-hidden-confounders setting, and extends to the instrumental variable setting. To our knowledge, MISTR is the first non-parametric approach for HTE estimation with unobserved confounders via instrumental variables.

Article 527

Title@2025-07-28 (1): Diffusion Denoiser-Aided Gyrocompassing

Title: Diffusion Denoiser-Aided Gyrocompassing

Diffusion Denoiser-Aided Gyrocompassing

传播 Denoiser 辅助热聚热器 2507.21245v1

Authors (4): Gershy Ben-Arie, Daniel Engelsman, Rotem Dror, Itzik Klein

An accurate initial heading angle is essential for efficient and safe navigation across diverse domains. Unlike magnetometers, gyroscopes can provide accurate heading reference independent of the magnetic disturbances in a process known as gyrocompassing. Yet, accurate and timely gyrocompassing, using low-cost gyroscopes, remains a significant challenge in scenarios where external navigation aids are unavailable. Such challenges are commonly addressed in real-world applications such as autonomous vehicles, where size, weight, and power limitations restrict sensor quality, and noisy measurements severely degrade gyrocompassing performance. To cope with this challenge, we propose a novel diffusion denoiser-aided gyrocompass approach. It integrates a diffusion-based denoising framework with an enhanced learning-based heading estimation model. The diffusion denoiser processes raw inertial sensor signals before input to the deep learning model, resulting in accurate gyrocompassing. Experiments using both simulated and real sensor data demonstrate that our proposed approach improves gyrocompassing accuracy by 26% compared to model-based gyrocompassing and by 15% compared to other learning-driven approaches. This advancement holds particular significance for ensuring accurate and robust navigation in autonomous platforms that incorporate low-cost gyroscopes within their navigation systems.

Article 528

Title@2025-07-28 (1): Bubbleformer: Forecasting Boiling with Transformers

Title: Bubbleformer: Forecasting Boiling with Transformers

Bubbleformer: Vorhersage Kochen mit Transformatoren

Bubbleex: 预测与变压器相混合 2507.21244v1

Authors (5): Sheikh Md Shakeel Hassan, Xianwei Zou, Akash Dhruv, Vishwanath Ganesan, Aparna Chandramowlishwaran

Modeling boiling (an inherently chaotic, multiphase process central to energy and thermal systems) remains a significant challenge for neural PDE surrogates. Existing models require future input (e.g., bubble positions) during inference because they fail to learn nucleation from past states, limiting their ability to autonomously forecast boiling dynamics. They also fail to model flow boiling velocity fields, where sharp interface-momentum coupling demands long-range and directional inductive biases. We introduce Bubbleformer, a transformer-based spatiotemporal model that forecasts stable and long-range boiling dynamics including nucleation, interface evolution, and heat transfer without dependence on simulation data during inference. Bubbleformer integrates factorized axial attention, frequency-aware scaling, and conditions on thermophysical parameters to generalize across fluids, geometries, and operating conditions. To evaluate physical fidelity in chaotic systems, we propose interpretable physics-based metrics that evaluate heat-flux consistency, interface geometry, and mass conservation. We also release BubbleML 2.0, a high-fidelity dataset that spans diverse working fluids (cryogens, refrigerants, dielectrics), boiling configurations (pool and flow boiling), flow regimes (bubbly, slug, annular), and boundary conditions. Bubbleformer sets new benchmark results in both prediction and forecasting of two-phase boiling flows.

Article 529

Title@2025-07-28 (1): Fluidically Innervated Lattices Make Versatile and Durable Tactile Sensors

Title: Fluidically Innervated Lattices Make Versatile and Durable Tactile Sensors

Fluidisch innervated Gitter machen vielseitige und langlebige Taktile Sensoren

具有流力、动态和耐久感应感应传感器 2507.21225v1

Authors (6): Annan Zhang, Miguel Flores-Acton, Andy Yu, Anshul Gupta, Maggie Yao, Daniela Rus

Tactile sensing plays a fundamental role in enabling robots to navigate dynamic and unstructured environments, particularly in applications such as delicate object manipulation, surface exploration, and human-robot interaction. In this paper, we introduce a passive soft robotic fingertip with integrated tactile sensing, fabricated using a 3D-printed elastomer lattice with embedded air channels. This sensorization approach, termed fluidic innervation, transforms the lattice into a tactile sensor by detecting pressure changes within sealed air channels, providing a simple yet robust solution to tactile sensing in robotics. Unlike conventional methods that rely on complex materials or designs, fluidic innervation offers a simple, scalable, single-material fabrication process. We characterize the sensors’ response, develop a geometric model to estimate tip displacement, and train a neural network to accurately predict contact location and contact force. Additionally, we integrate the fingertip with an admittance controller to emulate spring-like behavior, demonstrate its capability for environment exploration through tactile feedback, and validate its durability under high impact and cyclic loading conditions. This tactile sensing technique offers advantages in terms of simplicity, adaptability, and durability and opens up new opportunities for versatile robotic manipulation.

Article 530

Title@2025-07-28 (1): Benchmarking a Tunable Quantum Neural Network on Trapped-Ion and Superconducting Hardware

Title: Benchmarking a Tunable Quantum Neural Network on Trapped-Ion and Superconducting Hardware

Benchmarking eines Tunable Quantum Neural Network auf Trapped-Ion und supraleitende Hardware

设定关于受困和超导制成硬硬件的金枪鱼可量量神经网络的基准基准 2507.21222v1

Authors (7): Djamil Lakhdar-Hamina, Xingxin Liu, Richard Barney, Sarah H. Miller, Alaina M. Green, Norbert M. Linke, Victor Galitski

We implement a quantum generalization of a neural network on trapped-ion and IBM superconducting quantum computers to classify MNIST images, a common benchmark in computer vision. The network feedforward involves qubit rotations whose angles depend on the results of measurements in the previous layer. The network is trained via simulation, but inference is performed experimentally on quantum hardware. The classical-to-quantum correspondence is controlled by an interpolation parameter, $a$, which is zero in the classical limit. Increasing $a$ introduces quantum uncertainty into the measurements, which is shown to improve network performance at moderate values of the interpolation parameter. We then focus on particular images that fail to be classified by a classical neural network but are detected correctly in the quantum network. For such borderline cases, we observe strong deviations from the simulated behavior. We attribute this to physical noise, which causes the output to fluctuate between nearby minima of the classification energy landscape. Such strong sensitivity to physical noise is absent for clear images. We further benchmark physical noise by inserting additional single-qubit and two-qubit gate pairs into the neural network circuits. Our work provides a springboard toward more complex quantum neural networks on current devices: while the approach is rooted in standard classical machine learning, scaling up such networks may prove classically non-simulable and could offer a route to near-term quantum advantage.

Article 531

Title@2025-07-28 (1): Flow Matching Policy Gradients

Title: Flow Matching Policy Gradients

Strömungszugehörige politische Gradienten

流程匹配政策梯度 2507.21053v1

Authors (8): David McAllister, Songwei Ge, Brent Yi, Chung Min Kim, Ethan Weber, Hongsuk Choi, Haiwen Feng, Angjoo Kanazawa

Flow-based generative models, including diffusion models, excel at modeling continuous distributions in high-dimensional spaces. In this work, we introduce Flow Policy Optimization (FPO), a simple on-policy reinforcement learning algorithm that brings flow matching into the policy gradient framework. FPO casts policy optimization as maximizing an advantage-weighted ratio computed from the conditional flow matching loss, in a manner compatible with the popular PPO-clip framework. It sidesteps the need for exact likelihood computation while preserving the generative capabilities of flow-based models. Unlike prior approaches for diffusion-based reinforcement learning that bind training to a specific sampling method, FPO is agnostic to the choice of diffusion or flow integration at both training and inference time. We show that FPO can train diffusion-style policies from scratch in a variety of continuous control tasks. We find that flow-based models can capture multimodal action distributions and achieve higher performance than Gaussian policies, particularly in under-conditioned settings.
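One way to read "an advantage-weighted ratio computed from the conditional flow matching loss" inside a PPO-clip objective is sketched below. The specific ratio form, `exp(loss_old - loss_new)`, is an assumption made for this illustration rather than something the abstract states, and the per-sample loss values are placeholders rather than outputs of a real flow model.

```python
import math

def clipped_surrogate(loss_old, loss_new, advantage, clip_eps=0.2):
    """PPO-clip style objective with a loss-based ratio (assumed form).

    loss_old / loss_new are per-sample conditional flow matching losses for
    the same (observation, action) pair under the old and current policy.
    A lower current loss gives a ratio above 1, playing the role that the
    likelihood ratio plays in standard PPO.
    """
    ratio = math.exp(loss_old - loss_new)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)

# Placeholder values (hypothetical):
unchanged = clipped_surrogate(loss_old=1.0, loss_new=1.0, advantage=2.0)
improved = clipped_surrogate(loss_old=1.0, loss_new=0.0, advantage=2.0)
penalized = clipped_surrogate(loss_old=1.0, loss_new=0.0, advantage=-1.0)
```

As in standard PPO-clip, the objective stops rewarding further ratio growth once it exceeds `1 + clip_eps` for positive advantages, while for negative advantages the unclipped term is kept.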

Article 532

Title@2025-07-28 (1): Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Title: Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Rep-MTL: Lösen der Macht der Repräsentationsaufgabe Saliency für Multi-Task-Lernen

Rep-MTL:释放代表一级任务在多任务学习方面的弹性权力 2507.21049v1

Authors (3): Zedong Wang, Siyuan Li, Dan Xu

Despite the promise of Multi-Task Learning in leveraging complementary knowledge across tasks, existing multi-task optimization (MTO) techniques remain fixated on resolving conflicts via optimizer-centric loss scaling and gradient manipulation strategies, yet fail to deliver consistent gains. In this paper, we argue that the shared representation space, where task interactions naturally occur, offers rich information and potential for operations complementary to existing optimizers, especially for facilitating inter-task complementarity, which is rarely explored in MTO. This intuition leads to Rep-MTL, which exploits representation-level task saliency to quantify interactions between task-specific optimization and shared representation learning. By steering these saliencies through entropy-based penalization and sample-wise cross-task alignment, Rep-MTL aims to mitigate negative transfer by maintaining the effective training of individual tasks instead of pure conflict-solving, while explicitly promoting complementary information sharing. Experiments are conducted on four challenging MTL benchmarks covering both task-shift and domain-shift scenarios. The results show that Rep-MTL, even paired with the basic equal weighting policy, achieves competitive performance gains with favorable efficiency. Beyond standard performance metrics, Power Law exponent analysis demonstrates Rep-MTL’s efficacy in balancing task-specific learning and cross-task sharing. The project page is available at HERE.

Article 533

Title@2025-07-28 (1): Agentic Web: Weaving the Next Web with AI Agents

Title: Agentic Web: Weaving the Next Web with AI Agents

Agentic Web: Das nächste Web mit KI-Agenten weben

代理网络: 与 AI 代理进行下个网络编织 2507.21206v1

Authors (18): Yingxuan Yang, Mulei Ma, Yuxuan Huang, Huacan Chai, Chenyu Gong, Haoran Geng, Yuanjian Zhou, Ying Wen, Meng Fang, Muhao Chen, Shangding Gu, Ming Jin, Costas Spanos, Yang Yang, Pieter Abbeel, Dawn Song, Weinan Zhang, Jun Wang

The emergence of AI agents powered by large language models (LLMs) marks a pivotal shift toward the Agentic Web, a new phase of the internet defined by autonomous, goal-driven interactions. In this paradigm, agents interact directly with one another to plan, coordinate, and execute complex tasks on behalf of users. This transition from human-driven to machine-to-machine interaction allows intent to be delegated, relieving users from routine digital operations and enabling a more interactive, automated web experience. In this paper, we present a structured framework for understanding and building the Agentic Web. We trace its evolution from the PC and Mobile Web eras and identify the core technological foundations that support this shift. Central to our framework is a conceptual model consisting of three key dimensions: intelligence, interaction, and economics. These dimensions collectively enable the capabilities of AI agents, such as retrieval, recommendation, planning, and collaboration. We analyze the architectural and infrastructural challenges involved in creating scalable agentic systems, including communication protocols, orchestration strategies, and emerging paradigms such as the Agent Attention Economy. We conclude by discussing the potential applications, societal risks, and governance issues posed by agentic systems, and outline research directions for developing open, secure, and intelligent ecosystems shaped by both human intent and autonomous agent behavior. A continuously updated collection of relevant studies for agentic web is available at: https://github.com/SafeRL-Lab/agentic-web.

Article 534

Title@2025-07-28 (1): Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements

Title: Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements

Transformer als ungerollte Folgerung in probabilistischen Laplacian Eigenmaps: Eine Interpretation und mögliche Verbesserungen

Laplacian Eigenmaps: 解释和可能的改进 2507.21040v1

Authors (2): Aditya Ravuri, Neil D. Lawrence

We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform “linear” dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.
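The modification mentioned in the last sentence can be illustrated on a toy attention matrix. This is a hedged sketch in plain Python: the helper names are hypothetical, the score matrix is assumed to be square, and nothing here reproduces the authors' model code.

```python
import math

def softmax_rows(scores):
    """Row-wise softmax, giving a row-stochastic attention matrix."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def attention_minus_identity(scores):
    """Return A - I for the attention matrix A built from square `scores`."""
    attn = softmax_rows(scores)
    n = len(attn)
    return [[attn[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]
```

Because each row of the attention matrix sums to one, each row of `A - I` sums to zero, which is the property shared with a (negated) random-walk graph Laplacian.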

Article 535

Title@2025-07-28 (1): When Brain Foundation Model Meets Cauchy-Schwarz Divergence: A New Framework for Cross-Subject Motor Imagery Decoding

Title: When Brain Foundation Model Meets Cauchy-Schwarz Divergence: A New Framework for Cross-Subject Motor Imagery Decoding

Wenn Brain Foundation Model Cauchy-Schwarz Divergenz trifft: Ein neues Framework für die bereichsübergreifende Motor Imagery Decodierung

当大脑基金会模型与Cauchy-Schwarz差异相遇时:跨物体机动图象解码新框架 2507.21037v1

Authors (6): Jinzhou Wu, Baoping Tang, Qikang Li, Yi Wang, Cheng Li, Shujian Yu

Decoding motor imagery (MI) electroencephalogram (EEG) signals, a key non-invasive brain-computer interface (BCI) paradigm for controlling external systems, has been significantly advanced by deep learning. However, MI-EEG decoding remains challenging due to substantial inter-subject variability and limited labeled target data, which necessitate costly calibration for new users. Many existing multi-source domain adaptation (MSDA) methods indiscriminately incorporate all available source domains, disregarding the large inter-subject differences in EEG signals, which leads to negative transfer and excessive computational costs. Moreover, while many approaches focus on feature distribution alignment, they often neglect the explicit dependence between features and decision-level outputs, limiting their ability to preserve discriminative structures. To address these gaps, we propose a novel MSDA framework that leverages a pretrained large Brain Foundation Model (BFM) for dynamic and informed source subject selection, ensuring only relevant sources contribute to adaptation. Furthermore, we employ Cauchy-Schwarz (CS) and Conditional CS (CCS) divergences to jointly perform feature-level and decision-level alignment, enhancing domain invariance while maintaining class discriminability. Extensive evaluations on two benchmark MI-EEG datasets demonstrate that our framework outperforms a broad range of state-of-the-art baselines. Additional experiments with a large source pool validate the scalability and efficiency of BFM-guided selection, which significantly reduces training time without sacrificing performance.

Article 536

Title@2025-07-28 (1): Learning from Limited and Imperfect Data

Title: Learning from Limited and Imperfect Data

Von begrenzten und unvollkommenen Daten lernen

学习有限和不完善数据 2507.21205v1

Authors (1): Harsh Rangwani

The distribution of data in the world (e.g., on the internet) differs significantly from well-curated datasets and is often over-populated with samples from common categories. Algorithms designed for well-curated datasets perform suboptimally when used for learning from imperfect datasets with long-tailed imbalances and distribution shifts. To expand the use of deep models, it is essential to overcome the labor-intensive curation process by developing robust algorithms that can learn from diverse, real-world data distributions. Toward this goal, we develop practical algorithms for Deep Neural Networks which can learn from limited and imperfect data present in the real world. This thesis is divided into four segments, each covering a scenario of learning from limited or imperfect data. The first part of the thesis focuses on Learning Generative Models from Long-Tail Data, where we mitigate mode collapse and enable diverse aesthetic image generations for tail (minority) classes. In the second part, we enable effective generalization on tail classes through Inductive Regularization schemes, which allow tail classes to generalize as effectively as the head classes without requiring explicit generation of images. In the third part, we develop algorithms for Optimizing Relevant Metrics for learning from long-tailed data with limited annotation (semi-supervised), followed by the fourth part, which focuses on the Efficient Domain Adaptation of the model to various domains with very few to zero labeled samples.

Article 537

Title@2025-07-28 (1): Optimization Performance of Factorization Machine with Annealing under Limited Training Data

Title: Optimization Performance of Factorization Machine with Annealing under Limited Training Data

Optimierung Leistung der Factorisierungsmaschine mit Annealing unter begrenzter Trainingsdaten

根据有限培训数据与Annaaling公司一起使用的保质机械的优化性能 2507.21024v1

Authors (4): Mayumi Nakano, Yuya Seki, Shuta Kikuchi, Shu Tanaka

Black-box (BB) optimization problems aim to identify an input that minimizes the output of a function (the BB function) whose input-output relationship is unknown. Factorization machine with annealing (FMA) is a promising approach to this task, employing a factorization machine (FM) as a surrogate model to iteratively guide the solution search via an Ising machine. Although FMA has demonstrated strong optimization performance across various applications, its performance often stagnates as the number of optimization iterations increases. One contributing factor to this stagnation is the growing number of data points in the dataset used to train FM. It is hypothesized that as more data points are accumulated, the contribution of newly added data points becomes diluted within the entire dataset, thereby reducing their impact on improving the prediction accuracy of FM. To address this issue, we propose a novel method for sequential dataset construction that retains at most a specified number of the most recently added data points. This strategy is designed to enhance the influence of newly added data points on the surrogate model. Numerical experiments demonstrate that the proposed FMA achieves lower-cost solutions with fewer BB function evaluations compared to the conventional FMA.
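The dataset construction described above, which keeps at most a fixed number of the most recently added points, can be sketched with a bounded queue. The class and method names are hypothetical, and the surrogate training and Ising-machine search steps are deliberately left out.

```python
from collections import deque

class RecentDataset:
    """Training set that retains only the `max_points` newest samples."""

    def __init__(self, max_points):
        # A deque with maxlen discards the oldest entry on overflow.
        self._points = deque(maxlen=max_points)

    def add(self, x, y):
        """Store one evaluated input x with its black-box output y."""
        self._points.append((x, y))

    def training_data(self):
        """Return the retained samples, oldest first."""
        return list(self._points)
```

In an optimization loop, each newly evaluated candidate would be passed to `add`, and the surrogate would be retrained on `training_data()` so that recent points are not diluted by the full history.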

Article 538

Title@2025-07-28 (1): On Using the Shapley Value for Anomaly Localization: A Statistical Investigation

Title: On Using the Shapley Value for Anomaly Localization: A Statistical Investigation

Über die Verwendung des schuppigen Wertes für Anomalie Lokalisierung: Eine statistische Untersuchung

利用虚光值实现异常本地化:统计调查 2507.21023v1

Authors (2): Rick S. Blum, Franziska Freytag

Recent publications have suggested using the Shapley value for anomaly localization in sensor data systems. Using a reasonable mathematical anomaly model that allows full control, experiments indicate that a test using a single fixed term of the Shapley value calculation achieves the same probability of error as a test using the full Shapley value, at lower complexity, in all cases tested. A proof demonstrates that these conclusions must hold for all independent observation cases. For dependent observation cases, no proof is available.

Article 539

Title@2025-07-28 (1): Behavior-Specific Filtering for Enhanced Pig Behavior Classification in Precision Livestock Farming

Title: Behavior-Specific Filtering for Enhanced Pig Behavior Classification in Precision Livestock Farming

Behavior-Spezifische Filterung für verbesserte Schweineverhaltensklassifikation in der Precision Livestock Farming

精密牲畜耕作中强化猪品行为分类的具体行为过滤法 2507.21021v1

Authors (4): Zhen Zhang, Dong Sam Ha, Gota Morota, Sook Shin

This study proposes a behavior-specific filtering method to improve behavior classification accuracy in Precision Livestock Farming. While traditional filtering methods, such as wavelet denoising, achieved an accuracy of 91.58%, they apply uniform processing to all behaviors. In contrast, the proposed behavior-specific filtering method combines Wavelet Denoising with a Low Pass Filter, tailored to active and inactive pig behaviors, and achieved a peak accuracy of 94.73%. These results highlight the effectiveness of behavior-specific filtering in enhancing animal behavior monitoring, supporting better health management and farm efficiency.

Article 540

Title@2025-07-28 (1): Deep Learning for Skeleton Based Human Motion Rehabilitation Assessment: A Benchmark

Title: Deep Learning for Skeleton Based Human Motion Rehabilitation Assessment: A Benchmark

Deep Learning für skeletonbasierte Human Motion Rehabilitation Assessment: Ein Benchmark

Skeleton基于Skeleton的人类运动康复评估深学习:基准 2507.21018v1

Authors (5): Ali Ismail-Fawaz, Maxime Devanne, Stefano Berretti, Jonathan Weber, Germain Forestier

Automated assessment of human motion plays a vital role in rehabilitation, enabling objective evaluation of patient performance and progress. Unlike general human activity recognition, rehabilitation motion assessment focuses on analyzing the quality of movement within the same action class, requiring the detection of subtle deviations from ideal motion. Recent advances in deep learning and video-based skeleton extraction have opened new possibilities for accessible, scalable motion assessment using affordable devices such as smartphones or webcams. However, the field lacks standardized benchmarks, consistent evaluation protocols, and reproducible methodologies, limiting progress and comparability across studies. In this work, we address these gaps by (i) aggregating existing rehabilitation datasets into a unified archive called Rehab-Pile, (ii) proposing a general benchmarking framework for evaluating deep learning methods in this domain, and (iii) conducting extensive benchmarking of multiple architectures across classification and regression tasks. All datasets and implementations are released to the community to support transparency and reproducibility. This paper aims to establish a solid foundation for future research in automated rehabilitation assessment and foster the development of reliable, accessible, and personalized rehabilitation solutions. The datasets, source-code and results of this article are all publicly available.

Article 541

Title@2025-07-28 (1): Predicting Cognition from fMRI: A Comparative Study of Graph, Transformer, and Kernel Models Across Task and Rest Conditions

Title: Predicting Cognition from fMRI: A Comparative Study of Graph, Transformer, and Kernel Models Across Task and Rest Conditions

Vorhersage der Kognition aus fMRI:Eine vergleichende Studie von Graph, Transformer und Kernelmodellen über Aufgaben- und Ruhebedingungen hinweg

FMRI的预测认知:关于不同任务和休息条件的图形、变形器和内核模型的比较研究 2507.21016v1

Authors (4): Jagruti Patel, Mikkel Schöttner, Thomas A. W. Bolton, Patric Hagmann

Predicting cognition from neuroimaging data in healthy individuals offers insights into the neural mechanisms underlying cognitive abilities, with potential applications in precision medicine and early detection of neurological and psychiatric conditions. This study systematically benchmarked classical machine learning (Kernel Ridge Regression (KRR)) and advanced deep learning (DL) models (Graph Neural Networks (GNN) and Transformer-GNN (TGNN)) for cognitive prediction using Resting-state (RS), Working Memory, and Language task fMRI data from the Human Connectome Project Young Adult dataset. Our results, based on R2 scores, Pearson correlation coefficient, and mean absolute error, revealed that task-based fMRI, eliciting neural responses directly tied to cognition, outperformed RS fMRI in predicting cognitive behavior. Among the methods compared, a GNN combining structural connectivity (SC) and functional connectivity (FC) consistently achieved the highest performance across all fMRI modalities; however, its advantage over KRR using FC alone was not statistically significant. The TGNN, designed to model temporal dynamics with SC as a prior, performed competitively with FC-based approaches for task-fMRI but struggled with RS data, where its performance aligned with the lower-performing GNN that directly used fMRI time-series data as node features. These findings emphasize the importance of selecting appropriate model architectures and feature representations to fully leverage the spatial and temporal richness of neuroimaging data. This study highlights the potential of multimodal graph-aware DL models to combine SC and FC for cognitive prediction, as well as the promise of Transformer-based approaches for capturing temporal dynamics. By providing a comprehensive comparison of models, this work serves as a guide for advancing brain-behavior modeling using fMRI, SC and DL.

Article 542

Title@2025-07-28 (1): Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions

Title: Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions

Bewertung des Versprechens und der Fälle von LLMs bei Hiring-Entscheidungen

评估LLM女士在雇用决定中的许诺和机会 2507.02087v2

Authors (4): Eitan Anzenberg, Arunava Samajpati, Sivasankaran Chandrasekar, Varun Kacholia

The use of large language models (LLMs) in hiring promises to streamline candidate screening, but it also raises serious concerns regarding accuracy and algorithmic bias where sufficient safeguards are not in place. In this work, we benchmark several state-of-the-art foundational LLMs, including models from OpenAI, Anthropic, Google, Meta, and Deepseek, and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. We evaluate each model’s predictive accuracy (ROC AUC, Precision-Recall AUC, F1-score) and fairness (impact ratio of cut-off analysis across declared gender, race, and intersectional subgroups). Our experiments on a dataset of roughly 10,000 recent real-world candidate-job pairs show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across demographic groups. Notably, Match Score attains a minimum race-wise impact ratio of 0.957 (near-parity), versus 0.809 or lower for the best LLMs (0.906 vs 0.773, respectively, for the intersectional subgroups). We discuss why pretraining biases may cause LLMs with insufficient safeguards to propagate societal biases in hiring scenarios, whereas a bespoke supervised model can more effectively mitigate these biases. Our findings highlight the importance of domain-specific modeling and bias auditing when deploying AI in high-stakes domains such as hiring, and caution against relying on off-the-shelf LLMs for such tasks without extensive fairness safeguards. Furthermore, we show with empirical evidence that there shouldn’t be a dichotomy between choosing accuracy and fairness in hiring: a well-designed algorithm can achieve both accuracy in hiring and fairness in outcomes.
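The impact ratio reported above can be illustrated with a short calculation. This sketch assumes the common definition (the lowest group selection rate divided by the highest); the paper's exact cut-off procedure is not given in the abstract, and the function name and inputs are hypothetical.

```python
def impact_ratio(selected, groups):
    """Return per-group selection rates and the min/max impact ratio.

    selected: iterable of booleans (candidate passed the cut-off or not).
    groups:   iterable of group labels aligned with `selected`.
    """
    counts, hits = {}, {}
    for s, g in zip(selected, groups):
        counts[g] = counts.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + (1 if s else 0)
    rates = {g: hits[g] / counts[g] for g in counts}
    top = max(rates.values())
    if top == 0:
        # No group has any selections, so the ratio is undefined.
        return rates, float("nan")
    return rates, min(rates.values()) / top
```

A ratio of 1.0 means every group is selected at the same rate; values well below 1.0 indicate that at least one group is selected much less often than the most-selected group.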

Article 543

Title@2025-07-28 (1): LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning

Title: LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning

LoRA-PAR: Ein flexibler Dual-System-LoRA-Partitionsansatz für effizientes LLM-Feintuning

LOLAR-PAR:高效 LLM 微调的灵活双系统滚动分割法 2507.20999v1

Authors (4): Yining Huang, Bin Li, Keke Tang, Meilian Chen

Large-scale generative models like DeepSeek-R1 and OpenAI-O1 benefit substantially from chain-of-thought (CoT) reasoning, yet pushing their performance typically requires vast data, large model sizes, and full-parameter fine-tuning. While parameter-efficient fine-tuning (PEFT) helps reduce cost, most existing approaches primarily address domain adaptation or layer-wise allocation rather than explicitly tailoring data and parameters to different response demands. Inspired by “Thinking, Fast and Slow,” which characterizes two distinct modes of thought, System 1 (fast, intuitive, often automatic) and System 2 (slower, more deliberative and analytic), we draw an analogy: different “subregions” of an LLM’s parameters might similarly specialize for tasks that demand quick, intuitive responses versus those requiring multi-step logical reasoning. Therefore, we propose LoRA-PAR, a dual-system LoRA framework that partitions both data and parameters by System 1 or System 2 demands, using fewer yet more focused parameters for each task. Specifically, we classify task data via multi-model role-playing and voting, partition parameters based on importance scoring, and then adopt a two-stage fine-tuning strategy: System 1 tasks are first trained with supervised fine-tuning (SFT) to enhance knowledge and intuition, and System 2 tasks are then refined with reinforcement learning (RL) to reinforce deeper logical deliberation. Extensive experiments show that the two-stage fine-tuning strategy, SFT followed by RL, lowers active parameter usage while matching or surpassing SOTA PEFT baselines.

Article 544

Title@2025-07-28 (1): Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition

Title: Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition

Modulare Delta-Zusammenführung mit orthogonalen Einschränkungen: Ein skalierbarer Rahmen für kontinuierliche und reversible Modellzusammensetzung

模块三角洲与正纵形制约合并:可扩展的连续性和可复制模型构成框架 2507.20997v1

Authors (3): Haris Khan, Shumaila Asif, Sadia Asif

In real-world machine learning deployments, models must be continually updated, composed, and when required, selectively undone. However, existing approaches to model merging and continual learning often suffer from task interference, catastrophic forgetting, or lack of reversibility. We propose Modular Delta Merging with Orthogonal Constraints (MDM-OC), a novel framework that enables scalable, interference-free, and reversible composition of fine-tuned models. Each task-specific model is encoded as a delta from a shared base and projected into an orthogonal subspace to eliminate conflict. These projected deltas are then merged via gradient-based optimization to form a unified model that retains performance across tasks. Our approach supports continual integration of new models, structured unmerging for compliance such as GDPR requirements, and model stability via elastic weight consolidation and synthetic replay. Extensive experiments on vision and natural language processing benchmarks demonstrate that MDM-OC outperforms prior baselines in accuracy, backward transfer, and unmerge fidelity, while remaining memory-efficient and computationally tractable. This framework offers a principled solution for modular and compliant AI system design.
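The delta-and-projection idea can be sketched on flat weight vectors. This is only an illustration under strong assumptions: Gram-Schmidt stands in for the paper's orthogonal projection, plain summation stands in for the gradient-based merge, and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(delta, basis):
    """Remove from `delta` its components along each stored basis vector."""
    residual = list(delta)
    for b in basis:
        coeff = dot(residual, b) / dot(b, b)
        residual = [r - coeff * x for r, x in zip(residual, b)]
    return residual

def merge(base, finetuned_models):
    """Add mutually orthogonal task deltas to the shared base weights."""
    basis, merged = [], list(base)
    for weights in finetuned_models:
        delta = [w - b for w, b in zip(weights, base)]
        orth = project_out(delta, basis)
        if dot(orth, orth) > 1e-12:  # skip deltas already spanned by others
            basis.append(orth)
            merged = [m + o for m, o in zip(merged, orth)]
    return merged, basis

def unmerge(merged, orth_delta):
    """Undo one task by subtracting its stored orthogonal delta."""
    return [m - o for m, o in zip(merged, orth_delta)]
```

Keeping each projected delta around is what makes removal a simple subtraction in this toy version; because the stored deltas are mutually orthogonal, removing one leaves the others untouched.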

Article 545

Title@2025-07-28 (1): On the Robustness of Global Feature Effect Explanations

Title: On the Robustness of Global Feature Effect Explanations

Über die Robustheit der globalen Feature-Effekt Erklärungen

全球特效解释的威力 2406.09069v2

Authors (4): Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl, Przemyslaw Biecek

We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.

Article 546

Title@2025-07-28 (1): GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding

Title: GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding

GUI-G$^2$: Gaussian Reward Modeling für GUI Grounding

GUI-G$^2$:GUI地基的高斯奖赏模型 2507.15846v3

Authors (12): Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang

Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior that naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian distributions and target regions. To handle diverse element scales, we develop an adaptive variance mechanism that calibrates reward distributions based on element dimensions. This framework transforms GUI grounding from sparse binary classification to dense continuous optimization, where Gaussian distributions generate rich gradient signals that guide models toward optimal interaction positions. Extensive experiments across ScreenSpot, ScreenSpot-v2, and ScreenSpot-Pro benchmarks demonstrate that GUI-G$^2$ substantially outperforms the state-of-the-art method UI-TARS-72B, with the most significant improvement of 24.7% on ScreenSpot-Pro. Our analysis reveals that continuous modeling provides superior robustness to interface variations and enhanced generalization to unseen layouts, establishing a new paradigm for spatial reasoning in GUI interaction tasks.
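A Gaussian point reward with size-dependent variance might look like the following. This is a sketch, not the paper's formula: the scale factor `alpha`, the axis-aligned form, and the function name are assumptions, and the box is assumed to have positive width and height.

```python
import math

def gaussian_point_reward(click, box, alpha=0.5):
    """Reward in (0, 1] that decays with distance from the element centre.

    click: (x, y) predicted position.
    box:   (x1, y1, x2, y2) target element with positive width and height.
    alpha: hypothetical factor tying the standard deviation to element size.
    """
    x, y = click
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Adaptive variance: larger elements tolerate larger offsets.
    sx, sy = alpha * (x2 - x1), alpha * (y2 - y1)
    return math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))
```

Unlike a binary hit-or-miss reward, a near miss still earns partial credit here, and the same pixel offset costs less on a large element than on a small one.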

Article 547

Title@2025-07-28 (1): Personalized Treatment Effect Estimation from Unstructured Data

Title: Personalized Treatment Effect Estimation from Unstructured Data

Schätzung des personalisierten Behandlungseffekts aus unstrukturierten Daten

来自无结构数据的个人化治疗效果估算 2507.20993v1

Authors (2): Henri Arno, Thomas Demeester

Existing methods for estimating personalized treatment effects typically rely on structured covariates, limiting their applicability to unstructured data. Yet, leveraging unstructured data for causal inference has considerable application potential, for instance in healthcare, where clinical notes or medical images are abundant. To this end, we first introduce an approximate ‘plug-in’ method trained directly on the neural representations of unstructured data. However, when these fail to capture all confounding information, the method may be subject to confounding bias. We therefore introduce two theoretically grounded estimators that leverage structured measurements of the confounders during training, but allow estimating personalized treatment effects purely from unstructured inputs, while avoiding confounding bias. When these structured measurements are only available for a non-representative subset of the data, these estimators may suffer from sampling bias. To address this, we further introduce a regression-based correction that accounts for the non-uniform sampling, assuming the sampling mechanism is known or can be well-estimated. Our experiments on two benchmark datasets show that the plug-in method, directly trainable on large unstructured datasets, achieves strong empirical performance across all settings, despite its simplicity.

Article 548

Title@2025-07-28 (1): Scaling Physical Reasoning with the PHYSICS Dataset

Title: Scaling Physical Reasoning with the PHYSICS Dataset

Skalierung der physikalischen Vernunft mit dem PHYSICS-Datensatz

利用PHYSICS数据集调整物理理由 2506.00022v3

Authors (12): Shenghe Zheng, Qianjia Cheng, Junchi Yao, Mengsong Wu, Haonan He, Ning Ding, Yu Cheng, Shuyue Hu, Lei Bai, Dongzhan Zhou, Ganqu Cui, Peng Ye

Large Language Models (LLMs) have achieved remarkable progress on advanced reasoning tasks such as mathematics and coding competitions. Meanwhile, physics, despite being both reasoning-intensive and essential to real-world understanding, has received limited academic and industrial attention. This paper introduces PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, to address this gap. Specifically, PHYSICS is curated with exercises from over 100 textbooks through a carefully designed pipeline for quality control. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses. To utilize the data for improving and evaluating the model’s physical reasoning capabilities, we split the dataset into training and test sets, and provide reasoning paths generated by powerful reasoning models for the training data to facilitate model training. In addition, for the evaluation part, we find that existing evaluation frameworks exhibit biases in aspects such as units, simplification, and precision in the physics domain. To balance efficiency and accuracy, we introduce a Rule+Model evaluation framework tailored to physics problems. Our evaluations on current state-of-the-art open-source and proprietary models highlight the limitations of current models in handling physics-related tasks. We hope that our dataset and evaluation methodology will jointly advance the development of LLMs in the field of physics.

Article 549

Title@2025-07-28 (1): Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs

Title: Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs

Reparieren von Schwachstellen ohne unsichtbare Hände. Eine differenzierte Replikationsstudie auf LLMs

在没有无形手的情况下修复弱点,对LLMs进行差别化的推广研究。 2507.20977v1

Authors (2): Maria Camporese, Fabio Massacci

Background: Automated Vulnerability Repair (AVR) is a fast-growing branch of program repair. Recent studies show that large language models (LLMs) outperform traditional techniques, extending their success beyond code generation and fault detection. Hypothesis: These gains may be driven by hidden factors – “invisible hands” such as training-data leakage or perfect fault localization – that let an LLM reproduce human-authored fixes for the same code. Objective: We replicate prior AVR studies under controlled conditions by deliberately adding errors to the reported vulnerability location in the prompt. If LLMs merely regurgitate memorized fixes, both small and large localization errors should yield the same number of correct patches, because any offset should divert the model from the original fix. Method: Our pipeline repairs vulnerabilities from the Vul4J and VJTrans benchmarks after shifting the fault location by n lines from the ground truth. A first LLM generates a patch, a second LLM reviews it, and we validate the result with regression and proof-of-vulnerability tests. Finally, we manually audit a sample of patches and estimate the error rate with the Agresti-Coull-Wilson method.
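For the manual audit step, a binomial confidence interval on the error rate could be computed as below. The abstract does not say which variant of the "Agresti-Coull-Wilson" family is used, so this sketch shows the plain Wilson score interval as an assumption; the function name and the default `z` are illustrative.

```python
import math

def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)
```

The interval stays inside [0, 1] and remains informative even when the audited sample contains zero errors, which is the usual reason to prefer it over the simple normal approximation for small samples.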

Article 550

Title@2025-07-28 (1): Locally Adaptive Conformal Inference for Operator Models

Title: Locally Adaptive Conformal Inference for Operator Models

Lokale Adaptive Konforme Schlussfolgerung für Operatormodelle

操作者模型的本地适应性本地化常规推论 2507.20975v1

Authors (2): Trevor Harris, Yan Liu

Operator models are regression algorithms for functional data and have become a key tool for emulating large-scale dynamical systems. Recent advances in deep neural operators have dramatically improved the accuracy and scalability of operator modeling, but these models lack an inherent notion of predictive uncertainty. We introduce Local Spectral Conformal Inference (LSCI), a new framework for locally adaptive, distribution-free uncertainty quantification for neural operator models. LSCI uses projection-based depth scoring and localized conformal inference to generate function-valued prediction sets with statistical guarantees. We prove approximate finite-sample marginal coverage under local exchangeability, and demonstrate significant gains in adaptivity and coverage across synthetic and real-world operator learning tasks.

Article 551

Title@2025-07-28 (1): Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder

Title: Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder

Model-Agnostic Gender Bias Control für Text-to-Image Generation via Sparse Autoencoder

通过 Sparse 自动编码器控制文本到图像生成的模型 – – 不可允许性别比控制 2507.20973v1

Authors (6): Chao Wu, Zhenyi Wang, Kangxian Xie, Naresh Kumar Devulapally, Vishnu Suresh Lokhande, Mingchen Gao

Text-to-image (T2I) diffusion models often exhibit gender bias, particularly by generating stereotypical associations between professions and gendered subjects. This paper presents SAE Debias, a lightweight and model-agnostic framework for mitigating such bias in T2I generation. Unlike prior approaches that rely on CLIP-based filtering or prompt engineering, which often require model-specific adjustments and offer limited control, SAE Debias operates directly within the feature space without retraining or architectural modifications. By leveraging a k-sparse autoencoder pre-trained on a gender bias dataset, the method identifies gender-relevant directions within the sparse latent space, capturing professional stereotypes. Specifically, a biased direction per profession is constructed from sparse latents and suppressed during inference to steer generations toward more gender-balanced outputs. Trained only once, the sparse autoencoder provides a reusable debiasing direction, offering effective control and interpretable insight into biased subspaces. Extensive evaluations across multiple T2I models, including Stable Diffusion 1.4, 1.5, 2.1, and SDXL, demonstrate that SAE Debias substantially reduces gender bias while preserving generation quality. To the best of our knowledge, this is the first work to apply sparse autoencoders for identifying and intervening in gender bias within T2I models. These findings contribute toward building socially responsible generative AI, providing an interpretable and model-agnostic tool to support fairness in text-to-image generation.
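The abstract says a biased direction is suppressed in feature space at inference time, without stating the exact operation. One common way to suppress a direction is to subtract each feature vector's projection onto it. The sketch below assumes that form; `suppress_direction`, `strength`, and the random inputs are hypothetical and not the authors' code.

```python
import numpy as np


def suppress_direction(features: np.ndarray, direction: np.ndarray,
                       strength: float = 1.0) -> np.ndarray:
    """Remove (part of) the component of each feature row along `direction`.

    features:  array of shape (n, dim)
    direction: array of shape (dim,), e.g. a per-profession bias direction
    strength:  1.0 removes the component fully, 0.0 leaves features unchanged
    """
    d = direction / np.linalg.norm(direction)
    coeff = features @ d  # projection coefficient of every row onto d
    return features - strength * np.outer(coeff, d)


# Hypothetical usage with random features and a random direction.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
bias_dir = rng.normal(size=8)
debiased = suppress_direction(feats, bias_dir)
```

With `strength=1.0` the output has zero component along the chosen direction, while everything orthogonal to it is left untouched.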

Article 552

Title@2025-07-28 (1): A Modular Open Source Framework for Genomic Variant Calling

Title: A Modular Open Source Framework for Genomic Variant Calling

Modulares Open Source Framework für den genomischen Variant Calling

基因变异召唤模块开放源框架 2411.11513v2

Authors (4): Ankita Vaishnobi Bisoi, Shreyas V, Jose Siguenza, Bharath Ramsundar

Variant calling is a fundamental task in genomic research, essential for detecting genetic variations such as single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). This paper presents an enhancement to DeepChem, a widely used open-source drug discovery framework, through the integration of DeepVariant. In particular, we introduce a variant calling pipeline that leverages DeepVariant’s convolutional neural network (CNN) architecture to improve the accuracy and reliability of variant detection. The implemented pipeline includes stages for realignment of sequencing reads, candidate variant detection, and pileup image generation, followed by variant classification using a modified Inception v3 model. Our work adds a modular and extensible variant calling framework to the DeepChem framework and enables future work integrating DeepChem’s drug discovery infrastructure more tightly with bioinformatics pipelines.

Article 553

Title@2025-07-28 (1): A Survey of Deep Learning for Geometry Problem Solving

Title: A Survey of Deep Learning for Geometry Problem Solving

Eine Umfrage über Deep Learning zur Lösung von Geometrieproblemen

解决几何问题深层学习调查 2507.11936v4

Authors (3): Jianzhe Ma, Wenxuan Wang, Qin Jin

Geometry problem solving is a key area of mathematical reasoning, with applications in fields such as education, assessment of the mathematical ability of artificial intelligence, and multimodal ability assessment. In recent years, the rapid development of deep learning technology, especially the rise of multimodal large language models, has triggered a widespread research boom. This paper provides a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of the current challenges and future directions that can be explored. Our goal is to provide a comprehensive and practical reference of deep learning for geometry problem solving to promote further developments in this field. We create a continuously updated list of papers on GitHub: https://github.com/majianz/dl4gps.

Article 554

Title@2025-07-28 (1): Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning

Title: Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning

Mehrbildbeschreibungen für mehrsprachige, leichte Kognitive Impairment-Erkennung durch kontrastives Lernen enthüllen

通过差异学习发现多语种轻视认知缺陷的单形多语种描述 2505.17067v3

Authors (5): Kristin Qi, Jiali Cheng, Youxiang Zhu, Hadi Amiri, Xiaohui Liang

Detecting Mild Cognitive Impairment from picture descriptions is critical yet challenging, especially in multilingual and multiple picture settings. Prior work has primarily focused on English speakers describing a single picture (e.g., the ‘Cookie Theft’). The TAUKDIAL-2024 challenge expands this scope by introducing multilingual speakers and multiple pictures, which presents new challenges in analyzing picture-dependent content. To address these challenges, we propose a framework with three components: (1) enhancing discriminative representation learning via supervised contrastive learning, (2) involving image modality rather than relying solely on speech and text modalities, and (3) applying a Product of Experts (PoE) strategy to mitigate spurious correlations and overfitting. Our framework improves MCI detection performance, achieving a +7.1% increase in Unweighted Average Recall (UAR) (from 68.1% to 75.2%) and a +2.9% increase in F1 score (from 80.6% to 83.5%) compared to the text unimodal baseline. Notably, the contrastive learning component yields greater gains for the text modality compared to speech. These results highlight our framework’s effectiveness in multilingual and multi-picture MCI detection.
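The headline metric here, Unweighted Average Recall (UAR), is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. A minimal sketch of that definition follows; the labels are made up for illustration and are not from the TAUKDIAL-2024 data.

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [
        np.mean(y_pred[y_true == c] == c)  # recall for class c
        for c in np.unique(y_true)
    ]
    return float(np.mean(recalls))


# Hypothetical labels: three controls (0) and one MCI case (1).
uar = unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1])
```

In this toy case the recall is 2/3 for class 0 and 1 for class 1, so the UAR is 5/6 even though plain accuracy is 3/4.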

Article 555

Title@2025-07-28 (1): From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

Title: From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

Von der Verschränkung zur Ausrichtung: Repräsentationsraumdekomposition für unüberwachte Zeitreihen-Domänenanpassung

从连接到对齐:无人监督的时间序列适应的代表空间分解 2507.20968v1

Authors (4): Rongyao Cai, Ming Jin, Qingsong Wen, Kexin Zhang

Domain shift poses a fundamental challenge in time series analysis, where models trained on a source domain often fail dramatically when applied to a target domain with a different yet similar distribution. While current unsupervised domain adaptation (UDA) methods attempt to align cross-domain feature distributions, they typically treat features as indivisible entities, ignoring the intrinsic compositions that govern domain adaptation. We introduce DARSD, a novel UDA framework with theoretical explainability that explicitly realizes UDA tasks from the perspective of representation space decomposition. Our core insight is that effective domain adaptation requires not just alignment, but principled disentanglement of transferable knowledge from mixed representations. DARSD consists of three synergistic components: (I) An adversarial learnable common invariant basis that projects original features into a domain-invariant subspace while preserving semantic content; (II) A prototypical pseudo-labeling mechanism that dynamically separates target features based on confidence, hindering error accumulation; (III) A hybrid contrastive optimization strategy that simultaneously enforces feature clustering and consistency while mitigating emerging distribution gaps. Comprehensive experiments conducted on four benchmark datasets (WISDM, HAR, HHAR, and MFD) demonstrate DARSD’s superiority against 12 UDA algorithms, achieving optimal performance in 35 out of 53 cross-domain scenarios.

Article 556

Title@2025-07-28 (1): PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes

Title: PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes

PROVCREATOR: Synthese komplexer heterogener Graphen mit Knoten- und Kantenattributen

PROVCTOR: 综合复杂异异化图与节点和边缘属性 2507.20967v1

Authors (7): Tianhao Wang, Simon Klancher, Kunal Mukherjee, Josh Wiedemeier, Feng Chen, Murat Kantarcioglu, Kangkook Jee

The rise of graph-structured data has driven interest in graph learning and synthetic data generation. While successful in text and image domains, synthetic data generation remains challenging for graphs – especially for real-world graphs with complex, heterogeneous schemas. Existing research has focused mostly on homogeneous structures with simple attributes, limiting its usefulness and relevance for application domains requiring semantic fidelity. In this research, we introduce ProvCreator, a synthetic graph framework designed for complex heterogeneous graphs with high-dimensional node and edge attributes. ProvCreator formulates graph synthesis as a sequence generation task, enabling the use of transformer-based large language models. It features a versatile graph-to-sequence encoder-decoder that (1) losslessly encodes graph structure and attributes, (2) efficiently compresses large graphs for contextual modeling, and (3) supports end-to-end, learnable graph generation. To validate our research, we evaluate ProvCreator on two challenging domains: system provenance graphs in cybersecurity and knowledge graphs from the IntelliGraph Benchmark Dataset. In both cases, ProvCreator captures intricate dependencies between structure and semantics, enabling the generation of realistic and privacy-aware synthetic datasets.

Article 557

Title@2025-07-28 (1): Handoff Design in User-Centric Cell-Free Massive MIMO Networks Using DRL

Title: Handoff Design in User-Centric Cell-Free Massive MIMO Networks Using DRL

Handoff-Design in benutzer-zentralen zellfreien Massive MIMO-Netzwerke mit DRL

使用DRL的无用户核心细胞无大规模MIMM网络的离岸设计 2507.20966v1

Authors (5): Hussein A. Ammar, Raviraj Adve, Shahram Shahbazpanahi, Gary Boudreau, Israfil Bahceci

In the user-centric cell-free massive MIMO (UC-mMIMO) network scheme, user mobility necessitates updating the set of serving access points to maintain the user-centric clustering. Such updates are typically performed through handoff (HO) operations; however, frequent HOs lead to overheads associated with the allocation and release of resources. This paper presents a deep reinforcement learning (DRL)-based solution to predict and manage these connections for mobile users. Our solution employs the Soft Actor-Critic algorithm, with continuous action space representation, to train a deep neural network to serve as the HO policy. We propose a novel reward function that integrates an HO penalty in order to balance the attainable rate against the overhead of HOs. We develop two variants of our system: the first uses mobility direction-assisted (DA) observations that are based on the user movement pattern, while the second uses history-assisted (HA) observations that are based on the history of the large-scale fading (LSF). Simulation results show that our DRL-based continuous action space approach is more scalable than its discrete-space counterpart, and that our derived HO policy automatically learns to gather HOs in specific time slots to minimize the overhead of initiating HOs. Our solution can also operate in real time with a response time of less than 0.4 ms.
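The reward described above trades the attainable rate against handoff overhead. The abstract does not give its functional form, so the sketch below assumes the simplest linear version: rate minus a fixed penalty per access point added to or dropped from the serving set. The function name, the penalty value, and the access-point IDs are all hypothetical.

```python
def ho_penalized_reward(rate: float, serving_before: set[int],
                        serving_after: set[int], ho_penalty: float = 0.5) -> float:
    """Rate minus a penalty for every access point added or released."""
    # The symmetric difference counts access points that changed status.
    num_changes = len(serving_before ^ serving_after)
    return rate - ho_penalty * num_changes


# Hypothetical step: AP 1 is released and AP 3 is added, so two changes.
reward = ho_penalized_reward(3.0, {1, 2}, {2, 3})
```

A larger `ho_penalty` pushes a learned policy toward keeping the serving set stable, at the cost of some achievable rate.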

Article 558

Title@2025-07-28 (1): Core Safety Values for Provably Corrigible Agents

Title: Core Safety Values for Provably Corrigible Agents

Grundlegende Sicherheitswerte für wahrscheinlich korrigierbare Wirkstoffe

可可调代用品的核心安全价值 2507.20964v1

Authors (1): Aran Nayebi

We introduce the first implementable framework for corrigibility, with provable guarantees in multi-step, partially observed environments. Our framework replaces a single opaque reward with five structurally separate utility heads – deference, switch-access preservation, truthfulness, low-impact behavior via a belief-based extension of Attainable Utility Preservation, and bounded task reward – combined lexicographically by strict weight gaps. Theorem 1 proves exact single-round corrigibility in the partially observable off-switch game; Theorem 3 extends the guarantee to multi-step, self-spawning agents, showing that even if each head is learned to mean-squared error $\varepsilon$ and the planner is $\varepsilon$-sub-optimal, the probability of violating any safety property is bounded while still ensuring net human benefit. In contrast to Constitutional AI or RLHF/RLAIF, which merge all norms into one learned scalar, our separation makes obedience and impact-limits dominate even when incentives conflict. For open-ended settings where adversaries can modify the agent, we prove that deciding whether an arbitrary post-hack agent will ever violate corrigibility is undecidable by reduction to the halting problem, then carve out a finite-horizon “decidable island” where safety can be certified in randomized polynomial time and verified with privacy-preserving, constant-round zero-knowledge proofs. Consequently, the remaining challenge is the ordinary ML task of data coverage and generalization: reward-hacking risk is pushed into evaluation quality rather than hidden incentive leak-through, giving clearer implementation guidance for today’s LLM assistants and future autonomous systems.
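To illustrate what "combined lexicographically by strict weight gaps" can mean, the sketch below builds weights so that a weighted sum reproduces lexicographic ordering. It is not the paper's construction. It assumes every head value lies in [0, 1] and that two values of the same head are either equal or differ by at least `min_gap`. The head order is also an assumption, taken from the order in which the abstract lists the heads.

```python
def lexicographic_weights(num_heads: int, min_gap: float) -> list[float]:
    """Weights where each head outweighs all lower-priority heads combined.

    With base = 1 + 1/min_gap, a difference of min_gap in head i is worth
    more than the largest possible total swing of heads i+1, i+2, ...
    """
    base = 1.0 + 1.0 / min_gap
    return [base ** (num_heads - 1 - i) for i in range(num_heads)]


def combined_utility(head_values: list[float], weights: list[float]) -> float:
    """Weighted sum of the utility heads, highest priority first."""
    return sum(w * v for w, v in zip(weights, head_values))


# Assumed order: deference, switch access, truthfulness, low impact, task reward.
weights = lexicographic_weights(5, min_gap=0.5)
safe = [1.0, 0.0, 0.0, 0.0, 0.0]    # fully deferent, nothing else
greedy = [0.5, 1.0, 1.0, 1.0, 1.0]  # less deferent, maximal on all other heads
prefers_safe = combined_utility(safe, weights) > combined_utility(greedy, weights)
```

Here `prefers_safe` is true: no gain on the lower-priority heads can compensate for a loss of `min_gap` on a higher-priority one.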

Article 559

Title@2025-07-28 (1): Mean-Field Langevin Diffusions with Density-dependent Temperature

Title: Mean-Field Langevin Diffusions with Density-dependent Temperature

Mittleres Feld Langevin Diffusionen mit Dichte-abhängiger Temperatur

依赖密度温度的中度Langevin发射场 2507.20958v1

Authors (2): Yu-Jui Huang, Zachariah Malik

In the context of non-convex optimization, we let the temperature of a Langevin diffusion depend on the diffusion’s own density function. The rationale is that the induced density reveals to some extent the landscape imposed by the non-convex function to be minimized, such that a density-dependent temperature can provide location-wise random perturbation that may better react to, for instance, the location and depth of local minimizers. As the Langevin dynamics is now self-regulated by its own density, it forms a mean-field stochastic differential equation (SDE) of the Nemytskii type, distinct from the standard McKean-Vlasov equations. Relying on Wasserstein subdifferential calculus, we first show that the corresponding (nonlinear) Fokker-Planck equation has a unique solution. Next, a weak solution to the SDE is constructed from the solution to the Fokker-Planck equation, by Trevisan’s superposition principle. As time goes to infinity, we further show that the density induced by the SDE converges to an invariant distribution, which admits an explicit formula in terms of the Lambert $W$ function.

Article 560

Title@2025-07-28 (1): An empirical comparison of some outlier detection methods with longitudinal data

Title: An empirical comparison of some outlier detection methods with longitudinal data

Ein empirischer Vergleich einiger Ausreißer-Detektionsmethoden mit Längsschnittdaten

将某些异常探测方法与纵向数据进行实证比较 2507.21203v1

Authors (1): Marcello D’Orazio

This note investigates the problem of detecting outliers in longitudinal data. It compares well-known methods used in official statistics with proposals from the fields of data mining and machine learning that are based on the distance between observations or binary partitioning trees. This is achieved by applying the methods to panel survey data related to different types of statistical units. Traditional methods are quite simple, enabling the direct identification of potential outliers, but they require specific assumptions. In contrast, recent methods provide only a score whose magnitude is directly related to the likelihood of an outlier being present. All the methods require the user to set a number of tuning parameters. However, the most recent methods are more flexible and sometimes more effective than traditional methods. In addition, these methods can be applied to multidimensional data.

Article 561

Title@2025-07-28 (1): PySHRED: A Python package for SHallow REcurrent Decoding for sparse sensing, model reduction and scientific discovery

Title: PySHRED: A Python package for SHallow REcurrent Decoding for sparse sensing, model reduction and scientific discovery

PySHRED: Ein Python-Paket für Shallow REcurrent Decodierung für spärliche Erfassung, Modellreduktion und wissenschaftliche Entdeckung

PySHRED: Sahallow 流流解解码用于遥感、减少模型和科学发现的一个Python包件 2507.20954v1

Authors (7): David Ye, Jan Williams, Mars Gao, Stefano Riva, Matteo Tomasetto, David Zoro, J. Nathan Kutz

SHallow REcurrent Decoders (SHRED) provide a deep learning strategy for modeling high-dimensional dynamical systems and/or spatiotemporal data from dynamical system snapshot observations. PySHRED is a Python package that implements SHRED and several of its major extensions, including extensions for robust sensing, reduced order modeling and physics discovery. In this paper, we introduce the version 1.0 release of PySHRED, which includes data preprocessors and a number of cutting-edge SHRED methods specifically designed to handle real-world data that may be noisy, multi-scale, parameterized, prohibitively high-dimensional, and strongly nonlinear. The package is easy to install, thoroughly documented, supplemented with extensive code examples, and modularly structured to support future additions. The entire codebase is released under the MIT license and is available at https://github.com/pyshred-dev/pyshred.

Article 562

Title@2025-07-28 (1): Multivariate Conformal Prediction via Conformalized Gaussian Scoring

Title: Multivariate Conformal Prediction via Conformalized Gaussian Scoring

Multivariate konforme Vorhersage über konforme Gaussian Scoring

通过集成高斯测算法进行多变的多变预测 2507.20941v1

Authors (4): Sacha Braun, Eugène Berta, Michael I. Jordan, Francis Bach

While achieving exact conditional coverage in conformal prediction is unattainable without making strong, untestable regularity assumptions, the promise of conformal prediction hinges on finding approximations to conditional guarantees that are realizable in practice. A promising direction for obtaining conditional dependence for conformal sets, in particular capturing heteroskedasticity, is through estimating the conditional density $\mathbb{P}_{Y \mid X}$ and conformalizing its level sets. Previous work in this vein has focused on nonconformity scores based on the empirical cumulative distribution function (CDF). Such scores are, however, computationally costly, typically requiring expensive sampling methods. To avoid the need for sampling, we observe that the CDF-based score reduces to a Mahalanobis distance in the case of Gaussian scores, yielding a closed-form expression that can be directly conformalized. Moreover, the use of a Gaussian-based score opens the door to a number of extensions of the basic conformal method; in particular, we show how to construct conformal sets with missing output values, refine conformal sets as partial information about $Y$ becomes available, and construct conformal sets on transformations of the output space. Finally, empirical results indicate that our approach produces conformal sets that more closely approximate conditional coverage in multivariate settings compared to alternative methods.
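A Mahalanobis-distance score can be conformalized with the standard split-conformal recipe: compute the score on a held-out calibration set and take a finite-sample-corrected quantile as the threshold. The sketch below shows that recipe only and is not the authors' implementation. The predicted mean and covariance would come from some fitted Gaussian model, and all names and numbers here are hypothetical.

```python
import numpy as np


def mahalanobis_score(y: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Squared Mahalanobis distance of y from a predicted Gaussian N(mean, cov)."""
    diff = y - mean
    return float(diff @ np.linalg.solve(cov, diff))


def conformal_threshold(cal_scores, alpha: float = 0.1) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(scores[k - 1])


def in_prediction_set(y, mean, cov, threshold: float) -> bool:
    """The prediction set is the ellipsoid {y : score(y) <= threshold}."""
    return mahalanobis_score(np.asarray(y), np.asarray(mean), np.asarray(cov)) <= threshold


# Hypothetical calibration scores and a two-dimensional query point.
q = conformal_threshold([3.0, 1.0, 2.0], alpha=0.5)
covered = in_prediction_set([1.0, 0.0], [0.0, 0.0], np.eye(2), q)
```

Because the covariance can vary with the input, the resulting ellipsoids can change size and orientation across inputs, which is how a score of this kind can reflect heteroskedasticity.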

Article 563

Title@2025-07-28 (1): Dissecting Persona-Driven Reasoning in Language Models via Activation Patching

Title: Dissecting Persona-Driven Reasoning in Language Models via Activation Patching

Persona-Driven Reasoning in Sprachmodellen per Aktivierungs-Patching auflösen

通过激活补丁在语言模型中通过激活补丁解剖人-人-驱动原因 2507.20936v1

Authors (2): Ansh Poonia, Maeghal Jain

Large language models (LLMs) exhibit remarkable versatility in adopting diverse personas. In this study, we examine how assigning a persona influences a model’s reasoning on an objective task. Using activation patching, we take a first step toward understanding how key components of the model encode persona-specific information. Our findings reveal that the early Multi-Layer Perceptron (MLP) layers not only attend to the syntactic structure of the input but also process its semantic content. These layers transform persona tokens into richer representations, which are then used by the middle Multi-Head Attention (MHA) layers to shape the model’s output. Additionally, we identify specific attention heads that disproportionately attend to racial and color-based identities.
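Activation patching runs a model on two inputs, copies selected internal activations from one run into the other, and measures how much of the output difference is restored. The toy sketch below shows the mechanics on a two-layer NumPy network rather than an LLM. The network, weights, and inputs are all hypothetical stand-ins for the MLP and attention components studied in the paper.

```python
import numpy as np


def forward(x, W1, W2, patch=None):
    """Tiny two-layer network; optionally overwrite hidden units before the readout."""
    h = np.tanh(W1 @ x)
    if patch is not None:
        idx, values = patch
        h = h.copy()
        h[idx] = values  # replace the selected activations
    return W2 @ h, h


def patching_effect(x_clean, x_corrupt, W1, W2, idx) -> float:
    """Fraction of the clean-vs-corrupt output gap restored by patching units idx."""
    y_clean, h_clean = forward(x_clean, W1, W2)
    y_corrupt, _ = forward(x_corrupt, W1, W2)
    y_patched, _ = forward(x_corrupt, W1, W2, patch=(idx, h_clean[idx]))
    gap = np.linalg.norm(y_clean - y_corrupt)
    return float(1.0 - np.linalg.norm(y_clean - y_patched) / gap)


rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(6, 4)), rng.normal(size=(3, 6))
x_a, x_b = rng.normal(size=4), rng.normal(size=4)
effect_all = patching_effect(x_a, x_b, W1, W2, np.arange(6))
```

Patching every hidden unit restores the clean output exactly, so the effect is 1, while patching a subset gives a per-component score that can be compared across units.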

Article 564

Title@2025-07-28 (1): Aether: Geometric-Aware Unified World Modeling

Title: Aether: Geometric-Aware Unified World Modeling

Äther: Geometrisch-Bewusst Unified World Modeling

以太: 几何-软件统一世界建模 2503.18945v3

Authors (11): Aether Team, Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He

The integration of geometric reconstruction and generative modeling remains a critical challenge in developing AI systems capable of human-like spatial reasoning. This paper proposes Aether, a unified framework that enables geometry-aware reasoning in world models by jointly optimizing three core capabilities: (1) 4D dynamic reconstruction, (2) action-conditioned video prediction, and (3) goal-conditioned visual planning. Through task-interleaved feature learning, Aether achieves synergistic knowledge sharing across reconstruction, prediction, and planning objectives. Building upon video generation models, our framework demonstrates zero-shot synthetic-to-real generalization despite never observing real-world data during training. Furthermore, our approach achieves zero-shot generalization in both action following and reconstruction tasks, thanks to its intrinsic geometric modeling. Notably, even without real-world data, its reconstruction performance is comparable with or even better than that of domain-specific models. Additionally, Aether employs camera trajectories as geometry-informed action spaces, enabling effective action-conditioned prediction and visual planning. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling and its applications.

Article 565

Title@2025-07-28 (1): Breaking the Precision Ceiling in Physics-Informed Neural Networks: A Hybrid Fourier-Neural Architecture for Ultra-High Accuracy

Title: Breaking the Precision Ceiling in Physics-Informed Neural Networks: A Hybrid Fourier-Neural Architecture for Ultra-High Accuracy

Breaking the Precision Ceiling in Physics-informed Neural Networks: Eine hybride Fourier-Neural-Architektur für ultra-hohe Genauigkeit

打破物理内成形神经网络的精确度上限:超高精确度的混合四面体-神经结构 2507.20929v1

Authors (4): Wei Shan Lee, Chi Kiu Althina Chau, Kei Chon Sio, Kam Ian Leong

Physics-informed neural networks (PINNs) have plateaued at errors of $10^{-3}$-$10^{-4}$ for fourth-order partial differential equations, creating a perceived precision ceiling that limits their adoption in engineering applications. We break through this barrier with a hybrid Fourier-neural architecture for the Euler-Bernoulli beam equation, achieving an unprecedented L2 error of $1.94 \times 10^{-7}$, a 17-fold improvement over standard PINNs and $15$-$500\times$ better than traditional numerical methods. Our approach synergistically combines a truncated Fourier series capturing dominant modal behavior with a deep neural network providing adaptive residual corrections. A systematic harmonic optimization study revealed a counter-intuitive discovery: exactly 10 harmonics yield optimal performance, with accuracy catastrophically degrading from $10^{-7}$ to $10^{-1}$ beyond this threshold. The two-phase optimization strategy (Adam followed by L-BFGS) and adaptive weight balancing enable stable ultra-precision convergence. GPU-accelerated implementation achieves sub-30-minute training despite fourth-order derivative complexity. By addressing 12 critical gaps in existing approaches, from architectural rigidity to optimization landscapes, this work demonstrates that ultra-precision is achievable through proper design, opening new paradigms for scientific computing where machine learning can match or exceed traditional numerical methods.

Article 566

Title@2025-07-28 (1): LLM2TEA: An Agentic AI Designer for Discovery with Generative Evolutionary Multitasking

Title: LLM2TEA: An Agentic AI Designer for Discovery with Generative Evolutionary Multitasking

LLM2TEA: Agentischer AI-Designer für Entdeckung mit generativem evolutionären Multitasking

LLM2TEA: 利用产生进化多任务探索的代理AI 设计器 2406.14917v3

Authors (5): Melvin Wong, Jiao Liu, Thiago Rios, Stefan Menzel, Yew Soon Ong

This paper presents LLM2TEA, a Large Language Model (LLM) driven MultiTask Evolutionary Algorithm, representing the first agentic AI designer of its kind operating with generative evolutionary multitasking (GEM). LLM2TEA enables the crossbreeding of solutions from multiple domains, fostering novel solutions that transcend disciplinary boundaries. Of particular interest is the ability to discover designs that are both novel and conforming to real-world physical specifications. LLM2TEA comprises an LLM to generate genotype samples from text prompts describing target objects, a text-to-3D generative model to produce corresponding phenotypes, a classifier to interpret its semantic representations, and a computational simulator to assess its physical properties. Novel LLM-based multitask evolutionary operators are introduced to guide the search towards high-performing, practically viable designs. Experimental results in conceptual design optimization validate the effectiveness of LLM2TEA, showing 97% to 174% improvements in the diversity of novel designs over the current text-to-3D baseline. Moreover, over 73% of the generated designs outperform the top 1% of designs produced by the text-to-3D baseline in terms of physical performance. The designs produced by LLM2TEA are not only aesthetically creative but also functional in real-world contexts. Several of these designs have been successfully 3D printed, demonstrating the ability of our approach to transform AI-generated outputs into tangible, physical designs. These designs underscore the potential of LLM2TEA as a powerful tool for complex design optimization and discovery, capable of producing novel and physically viable designs.

Article 567

Title@2025-07-28 (1): SEAL: Searching Expandable Architectures for Incremental Learning

Title: SEAL: Searching Expandable Architectures for Incremental Learning

SEAL: Suche nach erweiterbaren Architekturen für inkrementelles Lernen

SEAL: 搜索可扩展建筑以进行递增学习 2505.10457v2

Authors (2): Matteo Gambella, Manuel Roveri

Incremental learning is a machine learning paradigm where a model learns from a sequential stream of tasks. This setting poses a key challenge: balancing plasticity (learning new tasks) and stability (preserving past knowledge). Neural Architecture Search (NAS), a branch of AutoML, automates the design of the architecture of Deep Neural Networks and has shown success in static settings. However, existing NAS-based approaches to incremental learning often rely on expanding the model at every task, making them impractical in resource-constrained environments. In this work, we introduce SEAL, a NAS-based framework tailored for data-incremental learning, a scenario where disjoint data samples arrive sequentially and are not stored for future access. SEAL adapts the model structure dynamically by expanding it only when necessary, based on a capacity estimation metric. Stability is preserved through cross-distillation training after each expansion step. The NAS component jointly searches for both the architecture and the optimal expansion policy. Experiments across multiple benchmarks demonstrate that SEAL effectively reduces forgetting and enhances accuracy while maintaining a lower model size compared to prior methods. These results highlight the promise of combining NAS and selective expansion for efficient, adaptive learning in incremental scenarios.

Article 568

Title@2025-07-28 (1): Zero-Shot Learning with Subsequence Reordering Pretraining for Compound-Protein Interaction

Title: Zero-Shot Learning with Subsequence Reordering Pretraining for Compound-Protein Interaction

Zero-Shot-Lernen mit Nachfolge Nachbestellen Vorschulung für Compound-Protein-Interaktion

用于复合蛋白相互作用的零热学习和后序重新排序的零热学习预培训 2507.20925v1

Authors (9): Hongzhi Zhang, Zhonglie Liu, Kun Meng, Jiameng Chen, Jia Wu, Bo Du, Di Lin, Yan Che, Wenbin Hu

Given the vastness of chemical space and the ongoing emergence of previously uncharacterized proteins, zero-shot compound-protein interaction (CPI) prediction better reflects the practical challenges and requirements of real-world drug development. Although existing methods perform adequately on certain CPI tasks, they still face the following challenges: (1) Representation learning from local or complete protein sequences often overlooks the complex interdependencies between subsequences, which are essential for predicting spatial structures and binding properties. (2) Dependence on large-scale or scarce multimodal protein datasets demands significant training data and computational resources, limiting scalability and efficiency. To address these challenges, we propose a novel approach that pretrains protein representations for CPI prediction tasks using subsequence reordering, explicitly capturing the dependencies between protein subsequences. Furthermore, we apply length-variable protein augmentation to ensure excellent pretraining performance on small training datasets. To evaluate the model’s effectiveness and zero-shot learning ability, we combine it with various baseline methods. The results demonstrate that our approach can improve the baseline model’s performance on the CPI task, especially in the challenging zero-shot scenario. Compared to existing pre-training models, our model demonstrates superior performance, particularly in data-scarce scenarios where training samples are limited. Our implementation is available at https://github.com/Hoch-Zhang/PSRP-CPI.

Article 569

Title@2025-07-28 (1): Modeling User Behavior from Adaptive Surveys with Supplemental Context

Title: Modeling User Behavior from Adaptive Surveys with Supplemental Context

Modellierung des Benutzerverhaltens aus adaptiven Umfragen mit ergänzendem Kontext

模拟具有补充背景的适应性调查用户行为 2507.20919v1

Authors (3): Aman Shukla, Daniel Patrick Scantlebury, Rishabh Kumar

Modeling user behavior is critical across many industries where understanding preferences, intent, or decisions informs personalization, targeting, and strategic outcomes. Surveys have long served as a classical mechanism for collecting such behavioral data due to their interpretability, structure, and ease of deployment. However, surveys alone are inherently limited by user fatigue, incomplete responses, and practical constraints on their length, making them insufficient for capturing user behavior. In this work, we present LANTERN (Late-Attentive Network for Enriched Response Modeling), a modular architecture for modeling user behavior by fusing adaptive survey responses with supplemental contextual signals. We demonstrate the architectural value of maintaining survey primacy through selective gating, residual connections, and late fusion via cross-attention, treating survey data as the primary signal while incorporating external modalities only when relevant. LANTERN outperforms strong survey-only baselines in multi-label prediction of survey responses. We further investigate threshold sensitivity and the benefits of selective modality reliance through ablation and rare/frequent attribute analysis. LANTERN’s modularity supports scalable integration of new encoders and evolving datasets. This work provides a practical and extensible blueprint for behavior modeling in survey-centric applications.

Article 570

Title@2025-07-28 (1): Are ECGs enough? Deep learning classification of pulmonary embolism using electrocardiograms

Title: Are ECGs enough? Deep learning classification of pulmonary embolism using electrocardiograms

Genügen EKGs? Deep Learning Klassifikation der Lungenembolie mit Elektrokardiogrammen

ECG 是否足够? 使用心电图对肺栓塞进行深度学习分类 2503.08960v2

Authors (2): Joao D. S. Marques, Arlindo L. Oliveira

Pulmonary embolism (PE) is a leading cause of out-of-hospital cardiac arrest that requires fast diagnosis. While computed tomography pulmonary angiography is the standard diagnostic tool, it is not always accessible. Electrocardiography is an essential tool for diagnosing multiple cardiac anomalies, as it is affordable, fast, and available in many settings. However, the availability of public ECG datasets, especially for PE, is limited and, in practice, these datasets tend to be small, making it essential to optimize learning strategies. In this study, we investigate the performance of multiple neural networks in order to assess the impact of various approaches. Moreover, we check whether these practices enhance model generalization when transfer learning is used to transfer information learned from larger ECG datasets, such as PTB-XL, CPSC18, and MedalCare-XL, to a smaller, more challenging dataset for PE. By leveraging transfer learning, we analyze the extent to which we can improve learning efficiency and predictive performance on limited data. Code available at https://github.com/joaodsmarques/Are-ECGs-enough-Deep-Learning-Classifiers.

Article 571

Title@2025-07-28 (1): Joint modeling for learning decision-making dynamics in behavioral experiments

Title: Joint modeling for learning decision-making dynamics in behavioral experiments

Gemeinsame Modellierung für das Lernen von Entscheidungsdynamiken in Verhaltensexperimenten

在行为实验中为学习决策动态进行联合建模 2506.02394v2

Authors (3): Yuan Bian, Xingche Guo, Yuanjia Wang

Major depressive disorder (MDD), a leading cause of disability and mortality, is associated with reward-processing abnormalities and concentration issues. Motivated by the probabilistic reward task from the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel framework that integrates the reinforcement learning (RL) model and drift-diffusion model (DDM) to jointly analyze reward-based decision-making with response times. To account for emerging evidence suggesting that decision-making may alternate between multiple interleaved strategies, we model latent state switching using a hidden Markov model (HMM). In the "engaged" state, decisions follow an RL-DDM, simultaneously capturing reward processing, decision dynamics, and temporal structure. In contrast, in the "lapsed" state, decision-making is modeled using a simplified DDM, where specific parameters are fixed to approximate random guessing with equal probability. The proposed method is implemented using a computationally efficient generalized expectation-maximization (EM) algorithm with forward-backward procedures. Through extensive numerical studies, we demonstrate that our proposed method outperforms competing approaches across various reward-generating distributions, under both strategy-switching and non-switching scenarios, as well as in the presence of input perturbations. When applied to the EMBARC study, our framework reveals that MDD patients exhibit lower overall engagement than healthy controls and experience longer decision times when they do engage. Additionally, we show that neuroimaging measures of brain activities are associated with decision-making characteristics in the "engaged" state but not in the "lapsed" state, providing evidence of brain-behavior association specific to the "engaged" state.

Article 572

Title@2025-07-28 (1): Online hierarchical partitioning of the output space in extreme multi-label data stream

Title: Online hierarchical partitioning of the output space in extreme multi-label data stream

Online-Hierarchische Partitionierung des Ausgaberaums im extremen Multi-Label-Datenstrom

极端多标签数据流中输出空间的在线分层 2507.20894v1

Authors (4): Lara Neves, Afonso Lourenço, Alberto Cano, Goreti Marreiros

Mining data streams with multi-label outputs poses significant challenges due to evolving distributions, high-dimensional label spaces, sparse label occurrences, and complex label dependencies. Moreover, concept drift affects not only input distributions but also label correlations and imbalance ratios over time, complicating model adaptation. To address these challenges, structured learners are categorized into local and global methods. Local methods break down the task into simpler components, while global methods adapt the algorithm to the full output space, potentially yielding better predictions by exploiting label correlations. This work introduces iHOMER (Incremental Hierarchy Of Multi-label Classifiers), an online multi-label learning framework that incrementally partitions the label space into disjoint, correlated clusters without relying on predefined hierarchies. iHOMER leverages online divisive-agglomerative clustering based on Jaccard similarity and a global tree-based learner driven by a multivariate Bernoulli process to guide instance partitioning. To address non-stationarity, it integrates drift detection mechanisms at both global and local levels, enabling dynamic restructuring of label partitions and subtrees. Experiments across 23 real-world datasets show iHOMER outperforms 5 state-of-the-art global baselines, such as MLHAT, MLHT of Pruned Sets and iSOUPT, by 23%, and 12 local baselines, such as binary relevance transformations of kNN, EFDT, ARF, and ADWIN bagging/boosting ensembles, by 32%, establishing its robustness for online multi-label classification.
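
The Jaccard similarity between two labels mentioned above can be illustrated with a minimal sketch. This is not the iHOMER implementation: the function name `jaccard_label_similarity` and the toy label matrix are assumptions made purely for illustration, and the batch computation here ignores the online, incremental setting the abstract describes.

```python
def jaccard_label_similarity(label_rows, a, b):
    """Jaccard similarity between label columns a and b of a binary label matrix.

    Hypothetical helper: |rows with both labels| / |rows with either label|.
    """
    both = sum(1 for row in label_rows if row[a] and row[b])
    either = sum(1 for row in label_rows if row[a] or row[b])
    return both / either if either else 0.0


# Toy label matrix (assumed data): rows are instances, columns are labels.
Y = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]

sim_01 = jaccard_label_similarity(Y, 0, 1)  # labels 0 and 1 often co-occur
sim_12 = jaccard_label_similarity(Y, 1, 2)  # labels 1 and 2 never co-occur
```

Labels with high pairwise similarity are natural candidates for the same cluster, while labels that never co-occur would tend to end up in different partitions.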

Article 573

Title@2025-07-28 (1): Implementing Adaptations for Vision AutoRegressive Model

Title: Implementing Adaptations for Vision AutoRegressive Model

Implementierung von Anpassungen für das AutoRegressive Vision Modell

实施适应展望自动递减模式 2507.11441v2

Authors (3): Kaif Shaikh, Franziska Boenisch, Adam Dziedzic

Vision AutoRegressive model (VAR) was recently introduced as an alternative to Diffusion Models (DMs) in the image generation domain. In this work, we focus on its adaptations, which aim to fine-tune pre-trained models to perform specific downstream tasks, like medical data generation. While many adaptation techniques exist for DMs, adaptations for VAR remain underexplored. Similarly, differentially private (DP) adaptations, which aim to preserve the privacy of the adaptation data, have been extensively studied for DMs, while VAR lacks such solutions. In our work, we implement and benchmark many strategies for VAR and compare them to state-of-the-art DM adaptation strategies. We observe that VAR outperforms DMs for non-DP adaptations; however, the performance of DP adaptations suffers, which necessitates further research in private adaptations for VAR. Code is available at https://github.com/sprintml/finetuning_var_dp.

Article 574

Title@2025-07-28 (1): Testbed and Software Architecture for Enhancing Security in Industrial Private 5G Networks

Title: Testbed and Software Architecture for Enhancing Security in Industrial Private 5G Networks

Testbed und Software-Architektur zur Verbesserung der Sicherheit in industriellen privaten 5G-Netzwerken

加强工业私营5G网络安全测试台和软件架构 2507.20873v1

Authors (6): Song Son Ha, Florian Foerster, Thomas Robert Doebbert, Tim Kittel, Dominik Merli, Gerd Scholl

In the era of Industry 4.0, the growing need for secure and efficient communication systems has driven the development of fifth-generation (5G) networks characterized by extremely low latency, massive device connectivity and high data transfer speeds. However, the deployment of 5G networks presents significant security challenges, requiring advanced and robust solutions to counter increasingly sophisticated cyber threats. This paper proposes a testbed and software architecture to strengthen the security of Private 5G Networks, particularly in industrial communication environments.

Article 575

Title@2025-07-28 (1): Not Only Grey Matter: OmniBrain for Robust Multimodal Classification of Alzheimer’s Disease

Title: Not Only Grey Matter: OmniBrain for Robust Multimodal Classification of Alzheimer’s Disease

Nicht nur Grey Matter: OmniBrain für robuste multimodale Klassifizierung der Alzheimer-Krankheit

不仅灰物质:阿兹海默氏病强力多式联运分类 2507.20872v1

Authors (7): Ahmed Sharshar, Yasser Ashraf, Tameem Bakr, Salma Hassan, Hosam Elgendy, Mohammad Yaqub, Mohsen Guizani

Alzheimer’s disease affects over 55 million people worldwide and is projected to more than double by 2050, necessitating rapid, accurate, and scalable diagnostics. However, existing approaches are limited because they cannot achieve clinically acceptable accuracy, generalization across datasets, robustness to missing modalities, and explainability all at the same time. This inability to satisfy all these requirements simultaneously undermines their reliability in clinical settings. We propose OmniBrain, a multimodal framework that integrates brain MRI, radiomics, gene expression, and clinical data using a unified model with cross-attention and modality dropout. OmniBrain achieves 92.2 ± 2.4% accuracy on the ANMerge dataset and generalizes to the MRI-only ADNI dataset with 70.4 ± 2.7% accuracy, outperforming unimodal and prior multimodal approaches. Explainability analyses highlight neuropathologically relevant brain regions and genes, enhancing clinical trust. OmniBrain offers a robust, interpretable, and practical solution for real-world Alzheimer’s diagnosis.

Article 576

Title@2025-07-28 (1): FedABC: Attention-Based Client Selection for Federated Learning with Long-Term View

Title: FedABC: Attention-Based Client Selection for Federated Learning with Long-Term View

FedABC: Aufmerksamkeitsbasierte Client-Auswahl für Federated Learning mit Langzeitansicht

FedABC:从长期角度选择关注的联邦学习对象 2507.20871v1

Authors (5): Wenxuan Ye, Xueli An, Junfan Wang, Xueqiang Yan, Georg Carle

Native AI support is a key objective in the evolution of 6G networks, with Federated Learning (FL) emerging as a promising paradigm. FL allows decentralized clients to collaboratively train an AI model without directly sharing their data, preserving privacy. Clients train local models on private data and share model updates, which a central server aggregates to refine the global model and redistribute it for the next iteration. However, client data heterogeneity slows convergence and reduces model accuracy, and frequent client participation imposes communication and computational burdens. To address these challenges, we propose FedABC, an innovative client selection algorithm designed to take a long-term view in managing data heterogeneity and optimizing client participation. Inspired by attention mechanisms, FedABC prioritizes informative clients by evaluating both model similarity and each model’s unique contributions to the global model. Moreover, considering the evolving demands of the global model, we formulate an optimization problem to guide FedABC throughout the training process. Following the "later-is-better" principle, FedABC adaptively adjusts the client selection threshold, encouraging greater participation in later training stages. Extensive simulations on CIFAR-10 demonstrate that FedABC significantly outperforms existing approaches in model accuracy and client participation efficiency, achieving comparable performance with 32% fewer clients than the classical FL algorithm FedAvg, and 3.5% higher accuracy with 2% fewer clients than the state-of-the-art. This work marks a step toward deploying FL in heterogeneous, resource-constrained environments, thereby supporting native AI capabilities in 6G networks.
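
The server-side aggregation step described above can be sketched with the classical FedAvg rule, which averages client parameters weighted by local sample count. This is a minimal illustration, not FedABC itself: the attention-based selection rule is not specified in the abstract, so the `selected` index list below is a hypothetical stand-in for whatever subset a selection algorithm would return.

```python
def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style aggregation)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[j] * n for params, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]


# Toy data (assumed): three clients, two parameters each.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]

# Aggregating every client.
global_all = fedavg(updates, sizes)

# Aggregating only a selected subset; the choice of clients 0 and 2 is arbitrary here.
selected = [0, 2]
global_subset = fedavg([updates[i] for i in selected], [sizes[i] for i in selected])
```

Selecting fewer clients per round reduces communication, at the cost of an aggregate that can drift from the full average, which is the trade-off a client selection algorithm has to manage.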

Article 577

Title@2025-07-28 (1): Bi-cephalic self-attended model to classify Parkinson’s disease patients with freezing of gait

Title: Bi-cephalic self-attended model to classify Parkinson’s disease patients with freezing of gait

Bi-zephalisches selbstbeaufsichtigtes Modell zur Einstufung von Parkinson-Patienten mit Gangeinfrieren

将帕金森病人的双脑自学分类并冻结步伐的双脑自闭模式 2507.20862v1

Authors (7): Shomoita Jahid Mitin, Rodrigue Rizk, Maximilian Scherer, Thomas Koeglsperger, Daniel Lench, KC Santosh, Arun Singh

Parkinson’s disease (PD) often results in motor and cognitive impairments, including gait dysfunction, particularly in patients with freezing of gait (FOG). Current detection methods are either subjective or reliant on specialized gait analysis tools. This study aims to develop an objective, data-driven, and multi-modal classification model to detect gait dysfunction in PD patients using resting-state EEG signals combined with demographic and clinical variables. We utilized a dataset of 124 participants: 42 PD patients with FOG (PDFOG+), 41 without FOG (PDFOG-), and 41 age-matched healthy controls. Features extracted from resting-state EEG and descriptive variables (age, education, disease duration) were used to train a novel Bi-cephalic Self-Attention Model (BiSAM). We tested three modalities: signal-only, descriptive-only, and multi-modal, across different EEG channel subsets (BiSAM-63, -16, -8, and -4). Signal-only and descriptive-only models showed limited performance, achieving maximum accuracies of 55% and 68%, respectively. In contrast, the multi-modal models significantly outperformed both, with BiSAM-8 and BiSAM-4 achieving the highest classification accuracy of 88%. These results demonstrate the value of integrating EEG with objective descriptive features for robust PDFOG+ detection. This study introduces a multi-modal, attention-based architecture that objectively classifies PDFOG+ using minimal EEG channels and descriptive variables. This approach offers a scalable and efficient alternative to traditional assessments, with potential applications in routine clinical monitoring and early diagnosis of PD-related gait dysfunction.

Article 578

Title@2025-07-28 (1): REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints

Title: REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints

REDS: Ressourceneffiziente Deep Subnetworks für dynamische Ressourcenbeschränkungen

REDS: 资源效率高的动态资源制约的深层子网络 2311.13349v3

Authors (5): Francesco Corti, Balz Maag, Joachim Schauer, Ulrich Pferschy, Olga Saukh

Deep learning models deployed on edge devices frequently encounter resource variability, which arises from fluctuating energy levels, timing constraints, or prioritization of other critical tasks within the system. State-of-the-art machine learning pipelines generate resource-agnostic models that cannot adapt at runtime. In this work, we introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources. In contrast to the state-of-the-art, REDS leverages structured sparsity constructively by exploiting permutation invariance of neurons, which allows for hardware-specific optimizations. Specifically, REDS achieves computational efficiency by (1) skipping sequential computational blocks identified by a novel iterative knapsack optimizer, and (2) taking advantage of the data cache by re-arranging the order of operations in the REDS computational graph. REDS supports conventional deep networks frequently deployed on the edge and provides computational benefits even for small and simple networks. We evaluate REDS on eight benchmark architectures trained on the Visual Wake Words, Google Speech Commands, Fashion-MNIST, CIFAR-10 and ImageNet-1K datasets, and test on four off-the-shelf mobile and embedded hardware platforms. We provide a theoretical result and empirical evidence demonstrating REDS’ outstanding performance in terms of submodels’ test set accuracy, and demonstrate an adaptation time in response to dynamic resource constraints of under 40 μs, utilizing a fully-connected network on Arduino Nano 33 BLE.

Article 579

Title@2025-07-28 (1): Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Title: Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Geometrie des Neuralen Verstärkungslernens in kontinuierlichen Zustands- und Handlungsräumen

持续状态和行动空间神经强化学习的几何测量 2507.20853v1

Authors (3): Saket Tiwari, Omer Gottesman, George Konidaris

Advances in reinforcement learning (RL) have led to its successful application in complex tasks with continuous state and action spaces. Despite these advances in practice, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces by employing a geometric lens to understand the locally attained set of states. The set of all parametrised policies learnt through a semi-gradient based approach induces a set of attainable states in RL. We show that the training dynamics of a two-layer neural policy induce a low dimensional manifold of attainable states embedded in the high-dimensional nominal state space trained using an actor-critic algorithm. We prove that, under certain conditions, the dimensionality of this manifold is of the order of the dimensionality of the action space. This is the first result of its kind, linking the geometry of the state space to the dimensionality of the action space. We empirically corroborate this upper bound for four MuJoCo environments and also demonstrate the results in a toy environment with varying dimensionality. We also show the applicability of this theoretical result by introducing a local manifold learning layer to the policy and value function networks to improve the performance in control environments with very high degrees of freedom by changing one layer of the neural network to learn sparse representations.

Article 580

Title@2025-07-28 (1): Learning unitaries with quantum statistical queries

Title: Learning unitaries with quantum statistical queries

Lerneinheiten mit quantenstatistischen Abfragen

附有量数统计查询的学习单 2310.02254v3

Authors (1): Armando Angrisani

We propose several algorithms for learning unitary operators from quantum statistical queries with respect to their Choi-Jamiolkowski state. Quantum statistical queries capture the capabilities of a learner with limited quantum resources, which receives as input only noisy estimates of expected values of measurements. Our approach leverages quantum statistical queries to estimate the Fourier mass of a unitary on a subset of Pauli strings, generalizing previous techniques developed for uniform quantum examples. Specifically, we show that the celebrated quantum Goldreich-Levin algorithm can be implemented with quantum statistical queries, whereas the prior version of the algorithm involves oracle access to the unitary and its inverse. As an application, we prove that quantum Boolean functions with constant total influence or with constant degree are efficiently learnable in our model. Moreover, we prove that $\mathcal{O}(\log n)$-juntas are efficiently learnable and constant-depth circuits are learnable query-efficiently with quantum statistical queries. On the other hand, all previous algorithms for these tasks demand significantly greater resources, such as oracle access to the unitary or direct access to the Choi-Jamiolkowski state. We also demonstrate that, despite these positive results, quantum statistical queries lead to an exponentially larger query complexity for certain tasks, compared to separable measurements on the Choi-Jamiolkowski state. In particular, we show an exponential lower bound for learning a class of phase-oracle unitaries and a doubly exponential lower bound for testing the unitarity of channels. Taken together, our results indicate that quantum statistical queries offer a unified framework for various unitary learning tasks, with potential applications in quantum machine learning, many-body physics and benchmarking of near-term devices.

Article 581

Title@2025-07-28 (1): Towards Explainable Deep Clustering for Time Series Data

Title: Towards Explainable Deep Clustering for Time Series Data

Auf dem Weg zu erklärbarem Deep Clustering für Zeitreihendaten

实现时间序列数据可解释的深层群集 2507.20840v1

Authors (3): Udo Schlegel, Gabriel Marques Tavares, Thomas Seidl

Deep clustering uncovers hidden patterns and groups in complex time series data, yet its opaque decision-making limits use in safety-critical settings. This survey offers a structured overview of explainable deep clustering for time series, collecting current methods and their real-world applications. We thoroughly discuss and compare peer-reviewed and preprint papers through application domains across healthcare, finance, IoT, and climate science. Our analysis reveals that most work relies on autoencoder and attention architectures, with limited support for streaming, irregularly sampled, or privacy-preserved series, and interpretability is still primarily treated as an add-on. To push the field forward, we outline six research opportunities: (1) combining complex networks with built-in interpretability; (2) setting up clear, faithfulness-focused evaluation metrics for unsupervised explanations; (3) building explainers that adapt to live data streams; (4) crafting explanations tailored to specific domains; (5) adding human-in-the-loop methods that refine clusters and explanations together; and (6) improving our understanding of how time series clustering models work internally. By making interpretability a primary design goal rather than an afterthought, we propose the groundwork for the next generation of trustworthy deep clustering time series analytics.

Article 582

Title@2025-07-28 (1): BuildSTG: A Multi-building Energy Load Forecasting Method using Spatio-Temporal Graph Neural Network

Title: BuildSTG: A Multi-building Energy Load Forecasting Method using Spatio-Temporal Graph Neural Network

BuildSTG: Eine Multi-Building-Methode zur Energiebelastungsprognose mit Spatio-Temporal Graph Neural Network

BuildSTG:使用SPATIO-时钟图神经网络的多建筑能源载荷预测方法 2507.20838v1

Authors (6): Yongzheng Liu, Yiming Wang, Po Xu, Yingjie Xu, Yuntian Chen, Dongxiao Zhang

Due to the extensive availability of operation data, data-driven methods show strong capabilities in predicting building energy loads. Buildings with similar features often share energy patterns, reflected by spatial dependencies in their operational data, which conventional prediction methods struggle to capture. To overcome this, we propose a multi-building prediction approach using spatio-temporal graph neural networks, comprising graph representation, graph learning, and interpretation. First, a graph is built based on building characteristics and environmental factors. Next, a multi-level graph convolutional architecture with attention is developed for energy prediction. Lastly, a method interpreting the optimized graph structure is introduced. Experiments on the Building Data Genome Project 2 dataset confirm superior performance over baselines such as XGBoost, SVR, FCNN, GRU, and Naive, highlighting the method’s robustness, generalization, and interpretability in capturing meaningful building similarities and spatial relationships.

Article 583

Title@2025-07-28 (1): First Hallucination Tokens Are Different from Conditional Ones

Title: First Hallucination Tokens Are Different from Conditional Ones

Erste Halluzinationstoken unterscheiden sich von Bedingten

第一次幻觉声调与有条件的音调不同 2507.20836v1

Authors (2): Jakob Snel, Seong Joon Oh

Hallucination, the generation of untruthful content, is one of the major concerns regarding foundational models. Detecting hallucinations at the token level is vital for real-time filtering and targeted correction, yet the variation of hallucination signals within token sequences is not fully understood. Leveraging the RAGTruth corpus with token-level annotations and reproduced logits, we analyse how these signals depend on a token’s position within hallucinated spans, contributing to an improved understanding of token-level hallucination. Our results show that the first hallucinated token carries a stronger signal and is more detectable than conditional tokens. We release our analysis framework, along with code for logit reproduction and metric computation at https://github.com/jakobsnl/RAGTruth_Xtended.
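
To make the idea of a token-level signal derived from logits concrete, one common uncertainty measure is the entropy of the softmax distribution at a given position. The sketch below is an illustration under stated assumptions: the abstract does not name the metrics the authors compute, so entropy is chosen here only as a familiar example, and the logit vectors are made-up toy values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


# Toy logits (assumed): one peaked distribution, one flat distribution.
confident = token_entropy([10.0, 0.0, 0.0, 0.0])
uncertain = token_entropy([1.0, 1.0, 1.0, 1.0])
```

A position-wise analysis would compute such a score for every token in an annotated span and compare the first token of the span against the tokens that follow it.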

Article 584

Title@2025-07-28 (1): RF Challenge: The Data-Driven Radio Frequency Signal Separation Challenge

Title: RF Challenge: The Data-Driven Radio Frequency Signal Separation Challenge

RF-Herausforderung: Die datengetriebene Funkfrequenz-Signaltrennungs-Herausforderung

RF 挑战:数据驱动无线电频率信号分离挑战 2409.08839v3

Authors (7): Alejandro Lancho, Amir Weiss, Gary C. F. Lee, Tejas Jayashankar, Binoy Kurien, Yury Polyanskiy, Gregory W. Wornell

We address the critical problem of interference rejection in radio-frequency (RF) signals using a data-driven approach that leverages deep-learning methods. A primary contribution of this paper is the introduction of the RF Challenge, which is a publicly available, diverse RF signal dataset for data-driven analyses of RF signal problems. Specifically, we adopt a simplified signal model for developing and analyzing interference rejection algorithms. For this signal model, we introduce a set of carefully chosen deep learning architectures, incorporating key domain-informed modifications alongside traditional benchmark solutions to establish baseline performance metrics for this intricate, ubiquitous problem. Through extensive simulations involving eight different signal mixture types, we demonstrate the superior performance (in some cases, by two orders of magnitude) of architectures such as UNet and WaveNet over traditional methods like matched filtering and linear minimum mean square error estimation. Our findings suggest that the data-driven approach can yield scalable solutions, in the sense that the same architectures may be similarly trained and deployed for different types of signals. Moreover, these findings further corroborate the promising potential of deep learning algorithms for enhancing communication systems, particularly via interference mitigation. This work also includes results from an open competition based on the RF Challenge, hosted at the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’24).

Article 585

Title@2025-07-28 (1): Combolutional Neural Networks

Title: Combolutional Neural Networks

Kombolutionäre Neuronale Netze

混合神经网络 2507.21202v1

Authors (3): Cameron Churchwell, Minje Kim, Paris Smaragdis

Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low parameter count, efficient CPU inference, strictly real-valued computations, and improved interpretability.
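
The building block named above, an IIR comb filter followed by an envelope detector, can be sketched in a few lines. This is a simplified illustration and not the combolutional layer: the delay is a fixed integer rather than a learned value, the envelope is a plain mean of absolute values, and the names `feedback_comb` and `envelope` as well as all numeric settings are assumptions.

```python
import numpy as np


def feedback_comb(x, delay, gain):
    """IIR feedback comb filter: y[n] = x[n] + gain * y[n - delay]."""
    y = np.zeros(len(x), dtype=float)
    for n in range(len(x)):
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = x[n] + gain * feedback
    return y


def envelope(y, skip):
    """Crude envelope: mean absolute value after discarding the start-up transient."""
    return float(np.mean(np.abs(y[skip:])))


sample_rate = 8000
delay = 20   # resonates at multiples of sample_rate / delay = 400 Hz
gain = 0.5   # |gain| < 1 keeps the filter stable
t = np.arange(sample_rate) / sample_rate

# A tone at the resonant frequency is boosted; a tone halfway between resonances is attenuated.
env_400 = envelope(feedback_comb(np.sin(2 * np.pi * 400 * t), delay, gain), skip=1000)
env_600 = envelope(feedback_comb(np.sin(2 * np.pi * 600 * t), delay, gain), skip=1000)

# Impulse response: echoes every `delay` samples, each scaled by `gain`.
impulse = np.zeros(8)
impulse[0] = 1.0
response = feedback_comb(impulse, delay=3, gain=0.5)
```

Because the filter responds strongly only to periodicities matching its delay, the envelope acts as a detector for one fundamental frequency and its harmonics, which is why a delay parameter is a natural inductive bias for pitched audio.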

Article 586

Title@2025-07-28 (1): On the similarity of bandwidth-tuned quantum kernels and classical kernels

Title: On the similarity of bandwidth-tuned quantum kernels and classical kernels

Zur Ähnlichkeit von bandbreitengesteuerten Quantenkernen und klassischen Kerneln

关于带宽调频量子内核和古典内核的相似性 2503.05602v3

Authors (3): Roberto Flórez-Ablan, Marco Roth, Jan Schnabel

Quantum kernels (QK) are widely used in quantum machine learning applications; yet, their potential to surpass classical machine learning methods on classical datasets remains uncertain. This limitation can be attributed to the exponential concentration phenomenon, which can impair generalization. A common strategy to alleviate this is bandwidth tuning, which involves rescaling data points in the quantum model to improve generalization. In this work, we numerically demonstrate that optimal bandwidth tuning results in QKs that closely resemble radial basis function (RBF) kernels, leading to a lack of quantum advantage over classical methods. Moreover, we reveal that the size of optimal bandwidth tuning parameters further simplifies QKs, causing them to behave like polynomial kernels, corresponding to a low-order Taylor approximation of an RBF kernel. We thoroughly investigate this for fidelity quantum kernels and projected quantum kernels using various data encoding circuits across several classification datasets. We provide numerical evidence and derive a simple analytical model that elucidates how bandwidth tuning influences key quantities in classification tasks. Overall, our findings shed light on the mechanisms that render QK methods classically tractable.
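
The classical side of this comparison can be illustrated numerically: an RBF kernel exp(-gamma * d^2) is well approximated by a low-order polynomial in d^2 when gamma * d^2 is small. The sketch below uses the standard Taylor series of the exponential. It is not the analytical model from the paper, and the data points and gamma values are arbitrary assumptions.

```python
import math


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf_kernel(x, y, gamma):
    """Classical RBF kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * squared_distance(x, y))


def rbf_taylor(x, y, gamma, order):
    """Taylor polynomial of the RBF kernel in z = -gamma * ||x - y||^2, up to `order`."""
    z = -gamma * squared_distance(x, y)
    return sum(z ** k / math.factorial(k) for k in range(order + 1))


x, y = (0.2, 0.5), (0.6, 0.1)   # toy points, squared distance 0.32

# Small effective bandwidth parameter: first-order polynomial is already accurate.
err_small = abs(rbf_kernel(x, y, 0.1) - rbf_taylor(x, y, 0.1, order=1))

# Large parameter: the same first-order polynomial is a poor approximation.
err_large = abs(rbf_kernel(x, y, 5.0) - rbf_taylor(x, y, 5.0, order=1))
```

In the small-parameter regime the kernel matrix is nearly indistinguishable from that of a low-degree polynomial kernel, which matches the qualitative behaviour described above.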

Article 587

Title@2025-07-28 (1): Why Flow Matching is Particle Swarm Optimization?

Title: Why Flow Matching is Particle Swarm Optimization?

Warum ist Flow Matching Partikel-Swarm-Optimierung?

为什么花流合拍是粒子蜂群最佳化? 2507.20810v1

Authors (1): Kaichen Ouyang

This paper preliminarily investigates the duality between flow matching in generative models and particle swarm optimization (PSO) in evolutionary computation. Through theoretical analysis, we reveal the intrinsic connections between these two approaches in terms of their mathematical formulations and optimization mechanisms: the vector field learning in flow matching shares similar mathematical expressions with the velocity update rules in PSO; both methods follow the fundamental framework of progressive evolution from initial to target distributions; and both can be formulated as dynamical systems governed by ordinary differential equations. Our study demonstrates that flow matching can be viewed as a continuous generalization of PSO, while PSO provides a discrete implementation of swarm intelligence principles. This duality understanding establishes a theoretical foundation for developing novel hybrid algorithms and creates a unified framework for analyzing both methods. Although this paper only presents preliminary discussions, the revealed correspondences suggest several promising research directions, including improving swarm intelligence algorithms based on flow matching principles and enhancing generative models using swarm intelligence concepts.
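
For reference, the two update rules being compared can be written side by side. These are the standard textbook forms, given here as an assumption about what the abstract refers to, since the abstract does not state its own notation. In PSO, $\omega$ is the inertia weight, $c_1, c_2$ are acceleration coefficients, $r_1, r_2 \sim U(0,1)$, $p_i$ is the personal best of particle $i$, and $g$ is the global best. In flow matching, $v_\theta$ is the learned vector field and the linear interpolation path is one common choice.

```latex
% Particle swarm optimization (discrete-time update)
\begin{aligned}
v_i^{t+1} &= \omega\, v_i^{t} + c_1 r_1 \bigl(p_i - x_i^{t}\bigr) + c_2 r_2 \bigl(g - x_i^{t}\bigr), \\
x_i^{t+1} &= x_i^{t} + v_i^{t+1}.
\end{aligned}

% Flow matching (continuous-time dynamics and a common training objective)
\begin{aligned}
\frac{\mathrm{d}x_t}{\mathrm{d}t} &= v_\theta(x_t, t), \qquad x_t = (1 - t)\, x_0 + t\, x_1, \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\, x_0,\, x_1} \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2 .
\end{aligned}

% Euler discretization of the flow, which has the same form as the PSO position update
x_{t + \Delta t} = x_t + \Delta t \; v_\theta(x_t, t).
```

Written this way, both methods move points by adding a velocity term at each step; they differ in whether that velocity comes from attraction toward best-known positions or from a learned field pointing toward the target distribution.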

Article 588

Title@2025-07-28 (1): Understanding Bias in Perceiving Dimensionality Reduction Projections

Title: Understanding Bias in Perceiving Dimensionality Reduction Projections

Verständnis von Bias in Wahrnehmung von Dimensionalitätsreduktionsprojektionen

理解在认识减少多维度减少预测中的偏见 2507.20805v1

Authors (6): Seoyoung Doh, Hyeon Jeon, Sungbok Shin, Ghulam Jilani Quadri, Nam Wook Kim, Jinwook Seo

Selecting the dimensionality reduction technique that faithfully represents the structure is essential for reliable visual communication and analytics. In reality, however, practitioners favor projections for other attractions, such as aesthetics and visual saliency, over the projection’s structural faithfulness, a bias we define as visual interestingness. In this research, we conduct a user study that (1) verifies the existence of such bias and (2) explains why the bias exists. Our study suggests that visual interestingness biases practitioners’ preferences when selecting projections for analysis, and this bias intensifies with color-encoded labels and shorter exposure time. Based on our findings, we discuss strategies to mitigate bias in perceiving and interpreting DR projections.

Article 589

Title@2025-07-28 (1): Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models

Title: Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models

Kritik des unreinen Grundes: Enthüllen des Argumentationsverhaltens medizinischer Großsprachenmodelle

简便理由的批评:统一医学大语言模式的推理行为 2412.15748v2

Authors (2): Shamus Sim, Tyrone Chen

Background: Despite the current ubiquity of Large Language Models (LLMs) across the medical domain, there is a surprising lack of studies which address their reasoning behaviour. We emphasise the importance of understanding reasoning behaviour as opposed to high-level prediction accuracies, since it is equivalent to explainable AI (XAI) in this context. In particular, achieving XAI in medical LLMs used in the clinical domain will have a significant impact across the healthcare sector. Results: Therefore, in this work, we adapt the existing concept of reasoning behaviour and articulate its interpretation within the specific context of medical LLMs. We survey and categorise current state-of-the-art approaches for modeling and evaluating reasoning in medical LLMs. Additionally, we propose theoretical frameworks which can empower medical professionals or machine learning engineers to gain insight into the low-level reasoning operations of these previously obscure models. We also outline key open challenges facing the development of Large Reasoning Models. Conclusion: The subsequent increased transparency and trust in medical machine learning models by clinicians as well as patients will accelerate the integration, application as well as further development of medical AI for the healthcare system as a whole.

Article 590

Title@2025-07-28 (1): Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach

Title: Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach

Ausrichten von großsprachlichen Modellagenten mit rationalen und moralischen Präferenzen: Ein überwachter Feintuning-Ansatz

将大语言示范物剂与理性和道德优先相匹配:受监督的微调办法 2507.20796v1

Authors (3): Wei Lu, Daniel L. Chen, Christian B. Hansen

Understanding how large language model (LLM) agents behave in strategic interactions is essential as these systems increasingly participate autonomously in economically and morally consequential decisions. We evaluate LLM preferences using canonical economic games, finding substantial deviations from human behavior. Models like GPT-4o show excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. We propose a supervised fine-tuning pipeline that uses synthetic datasets derived from economic reasoning to align LLM agents with economic preferences, focusing on two stylized preference structures. In the first, utility depends only on individual payoffs (homo economicus), while utility also depends on a notion of Kantian universalizability in the second preference structure (homo moralis). We find that fine-tuning based on small datasets shifts LLM agent behavior toward the corresponding economic agent. We further assess the fine-tuned agents’ behavior in two applications: Moral dilemmas involving autonomous vehicles and algorithmic pricing in competitive markets. These examples illustrate how different normative objectives embedded via realizations from structured preference structures can influence market and moral outcomes. This work contributes a replicable, cost-efficient, and economically grounded pipeline to align AI preferences using moral-economic principles.

Article 591

Title@2025-07-28 (1): APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

Title: APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

APTx Neuron: Unified Trainable Neuron Architecture Integrating Activation and Computation

APTx Neuron: 统一可训练的中子建筑综合激活和计算 2507.14270v3

Authors (1): Ravin Kumar

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression. The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant. The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable. We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69% test accuracy within 11 epochs using approximately 332K trainable parameters. The results highlight the superior expressiveness and computational efficiency of the APTx Neuron compared to traditional neurons, pointing toward a new paradigm in unified neuron design and the architectures built upon it.
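
The functional form above translates directly into code. The following is a minimal pure-Python sketch of the forward pass only; the function name `aptx_neuron` and the list-based interface are illustrative assumptions, not the author's implementation, and no training logic is shown.

```python
import math


def aptx_neuron(x, alpha, beta, gamma, delta):
    """Forward pass of y = sum_i (alpha_i + tanh(beta_i * x_i)) * gamma_i * x_i + delta.

    x, alpha, beta, gamma are equal-length sequences of floats; delta is a scalar.
    In a trained model alpha, beta, gamma, and delta would be learned parameters.
    """
    if not (len(x) == len(alpha) == len(beta) == len(gamma)):
        raise ValueError("x, alpha, beta, and gamma must have the same length")
    total = 0.0
    for xi, a, b, g in zip(x, alpha, beta, gamma):
        total += (a + math.tanh(b * xi)) * g * xi
    return total + delta
```

With `beta` set to zero the tanh term vanishes and the unit reduces to an ordinary linear neuron with weights `alpha_i * gamma_i` and bias `delta`, which is a quick sanity check on the formula.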

Article 592

Title@2025-07-28 (1): Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps

Title: Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps

Kohärente Online-Road-Topologie Schätzung und Begründung mit Standard-Definitionskarten

与标准定义地图一致的在线道路地形图估计和理由 2507.01397v2

Authors (5): Khanh Son Pham, Christian Witte, Jens Behley, Johannes Betz, Cyrill Stachniss

Most autonomous cars rely on the availability of high-definition (HD) maps. Current research aims to address this constraint by directly predicting HD map elements from onboard sensors and reasoning about the relationships between the predicted map and traffic elements. Despite recent advancements, the coherent online construction of HD maps remains a challenging endeavor, as it necessitates modeling the high complexity of road topologies in a unified and consistent manner. To address this challenge, we propose a coherent approach to predict lane segments and their corresponding topology, as well as road boundaries, all by leveraging prior map information represented by commonly available standard-definition (SD) maps. We propose a network architecture that leverages hybrid lane segment encodings comprising prior information and denoising techniques to enhance training stability and performance. Furthermore, we incorporate past frames for temporal consistency. Our experimental evaluation demonstrates that our approach outperforms previous methods by a large margin, highlighting the benefits of our modeling scheme.

Article 593

Title@2025-07-28 (1): Dragonfly: a modular deep reinforcement learning library

Title: Dragonfly: a modular deep reinforcement learning library

Dragonfly: eine modulare Bibliothek für tiefe Verstärkung

龙蝇:一个模块式深强化学习图书馆 2505.03778v2

Authors (4): Jonathan Viquerat, Paul Garnier, Amirhossein Bateni, Elie Hachem

Dragonfly is a deep reinforcement learning library focused on modularity, in order to ease experimentation and development. It relies on JSON serialization, which allows swapping building blocks and performing parameter sweeps while minimizing code maintenance. Some of its features are specifically designed for CPU-intensive environments, such as numerical simulations. Its performance on standard agents using common benchmarks compares favorably with the literature.

Article 594

Title@2025-07-28 (1): Satellite-Surface-Area Machine-Learning Models for Reservoir Storage Estimation: Regime-Sensitive Evaluation and Operational Deployment at Loskop Dam, South Africa

Title: Satellite-Surface-Area Machine-Learning Models for Reservoir Storage Estimation: Regime-Sensitive Evaluation and Operational Deployment at Loskop Dam, South Africa

Satelliten-Oberflächen-Raum Maschinen-Learning-Modelle für Reservoir-Speicherschätzung: Regime-Sensitive Evaluation und Einsatz am Staudamm Loskop, Südafrika

在南非Loskop大坝储存量估计:制度敏感评价和行动部署卫星-储量储存量估计的卫星表面区域机械学习模型:系统敏感评价和行动部署 2502.19989v3

Authors (5): Hugo Retief, Kayathri Vigneswaran, Surajit Ghosh, Mariangel Garcia Andarcia, Chris Dickens

Reliable daily estimates of reservoir storage are pivotal for water allocation and drought response decisions in semiarid regions. Conventional rating curves at Loskop Dam, the primary storage on South Africa’s Olifants River, have become increasingly uncertain owing to sedimentation and episodic drawdown. A 40-year Digital Earth Africa (DEA) surface-area archive (1984-2024) was fused with gauged water levels to develop data-driven volume predictors that operate under a maximum 9.14%, 90-day drawdown constraint. Four nested feature sets were examined: (i) raw water area, (ii) + a power-law “calculated volume” proxy, (iii) + six river geometry metrics, and (iv) + full supply elevation. Five candidate algorithms, Gradient Boosting (GB), Random Forest (RF), Ridge (RI), Lasso (LA) and Elastic Net (EN), were tuned using a 20-draw random search and assessed with a five-fold time-series split to eliminate look-ahead bias. Prediction errors were decomposed into two storage regimes: Low (<250 x 10^6 cubic meters) and High (>250 x 10^6 cubic meters). Ridge regression achieved the lowest cross-validated RMSE (12.3 x 10^6 cubic meters, i.e. 12.3 MCM), outperforming GB by 16% and RF by 7%. In regime terms, Ridge was superior in the Low band (18.0 vs. 22.7 MCM for GB) and tied RF in the High band (~12 MCM). In-sample diagnostics showed GB’s apparent dominance (6.8-5.4 MCM) to be an artefact of overfitting. A Ridge meta-stacked ensemble combining GB, RF, and Ridge reduced full-series RMSE to ~11 MCM (~3% of live capacity). We recommend (i) GB retrained daily for routine operations, (ii) Ridge for drought early warning, and (iii) the stacked blend for all-weather dashboards. Quarterly rolling retraining and regime-specific metrics are advised to maintain operational accuracy below the 5% threshold mandated by the Department of Water and Sanitation.
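
The regime-wise error decomposition described above can be sketched as follows. This is an illustrative pure-Python example, not the authors' code: the function names are hypothetical, storage is assumed to be expressed in units of 10^6 cubic meters, and a value of exactly 250 (which the abstract leaves unassigned) is placed in the High regime by assumption.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (observed, predicted) pairs."""
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


def regime_rmse(observed, predicted, threshold=250.0):
    """Split errors into Low (< threshold) and High (>= threshold) storage regimes.

    The regime is decided by the observed storage. Returns None for a regime
    that contains no samples.
    """
    pairs = list(zip(observed, predicted))
    low = [(o, p) for o, p in pairs if o < threshold]
    high = [(o, p) for o, p in pairs if o >= threshold]
    return {
        "low": rmse(low) if low else None,
        "high": rmse(high) if high else None,
        "all": rmse(pairs),
    }
```

Reporting the two regimes separately matters because a single pooled RMSE can hide a model that is accurate at high storage but poor during drawdown, which is the case the abstract highlights for drought early warning.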

Article 595

Title@2025-07-28 (1): Industry Insights from Comparing Deep Learning and GBDT Models for E-Commerce Learning-to-Rank

Title: Industry Insights from Comparing Deep Learning and GBDT Models for E-Commerce Learning-to-Rank

Brancheneinblicke aus dem Vergleich von Deep Learning und GBDT-Modellen für E-Commerce Learning-to-Rank

比较深层学习和电子商务学习到兰克的GBDT模式的工业透视 2507.20753v1

Authors (3): Yunus Lutz, Timo Wilm, Philipp Duwe

In e-commerce recommender and search systems, tree-based models, such as LambdaMART, have set a strong baseline for Learning-to-Rank (LTR) tasks. Despite their effectiveness and widespread adoption in industry, the debate continues whether deep neural networks (DNNs) can outperform traditional tree-based models in this domain. To contribute to this discussion, we systematically benchmark DNNs against our production-grade LambdaMART model. We evaluate multiple DNN architectures and loss functions on a proprietary dataset from OTTO and validate our findings through an 8-week online A/B test. The results show that a simple DNN architecture outperforms a strong tree-based baseline in terms of total clicks and revenue, while achieving parity in total units sold.

Article 596

Title@2025-07-28 (1): Multilingual Self-Taught Faithfulness Evaluators

Title: Multilingual Self-Taught Faithfulness Evaluators

Mehrsprachige Selbstlernende Bewertung von Treue

多语言自学自学信仰评价员 2507.20752v1

Authors (6): Carlo Alfano, Aymen Al Marjani, Zeno Jonke, Amin Mantrach, Saab Mansour, Marcello Federico

The growing use of large language models (LLMs) has increased the need for automatic evaluation systems, particularly to address the challenge of information hallucination. Although existing faithfulness evaluation approaches have shown promise, they are predominantly English-focused and often require expensive human-labeled training data for fine-tuning specialized models. As LLMs see increased adoption in multilingual contexts, there is a need for accurate faithfulness evaluators that can operate across languages without extensive labeled data. This paper presents Self-Taught Evaluators for Multilingual Faithfulness, a framework that learns exclusively from synthetic multilingual summarization data while leveraging cross-lingual transfer learning. Through experiments comparing language-specific and mixed-language fine-tuning approaches, we demonstrate a consistent relationship between an LLM’s general language capabilities and its performance in language-specific evaluation tasks. Our framework shows improvements over existing baselines, including state-of-the-art English evaluators and machine translation-based approaches.

Article 597

Title@2025-07-28 (1): Finite-Time Analysis of Discrete-Time Stochastic Interpolants

Title: Finite-Time Analysis of Discrete-Time Stochastic Interpolants

Finite-Time-Analyse von diskret-zeitlichen stochastischen Interpolanten

秘密-时时储存的内插刑警的短期分析 2502.09130v2

Authors (4): Yuhao Liu, Yu Chen, Rui Hu, Longbo Huang

The stochastic interpolant framework offers a powerful approach for constructing generative models based on ordinary differential equations (ODEs) or stochastic differential equations (SDEs) to transform arbitrary data distributions. However, prior analyses of this framework have primarily focused on the continuous-time setting, assuming a perfect solution of the underlying equations. In this work, we present the first discrete-time analysis of the stochastic interpolant framework, where we introduce an innovative discrete-time sampler and derive a finite-time upper bound on its distribution estimation error. Our result provides a novel quantification of how different factors, including the distance between source and target distributions and estimation accuracy, affect the convergence rate and also offers a new principled way to design efficient schedules for convergence acceleration. Finally, numerical experiments are conducted on the discrete-time sampler to corroborate our theoretical findings.

Article 598

Title@2025-07-28 (1): Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

Title: Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

Kann Prompt Schwierigkeit Online vorausgesagt werden, um RL zu beschleunigen Finetuning of Reasoning Models?

快速困难能否预测为加速理据模型的RL微调而在线化? 2507.04632v3

Authors (6): Yun Qu, Qi Wang, Yixiu Mao, Vincent Tao Hu, Björn Ommer, Xiangyang Ji

Recent advances have witnessed the effectiveness of reinforcement learning (RL) finetuning in enhancing the reasoning capabilities of large language models (LLMs). The optimization process often requires numerous iterations to achieve satisfactory performance, resulting in high computational costs due to the need for frequent prompt evaluations under intensive LLM interactions and repeated policy updates. Appropriate online prompt selection methods reduce iteration steps by prioritizing informative prompts during training, but the pipeline’s reliance on exhaustive prompt evaluation and subset selection for optimization still incurs substantial computational overhead due to frequent LLM inference calls. In contrast to these direct evaluate-then-select schemes, this work investigates iterative approximate evaluation for arbitrary prompts and introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions. Technically, MoPPS models each prompt’s success rate as a latent variable, performs streaming Bayesian inference, and employs posterior sampling in a constructed multi-armed bandit machine, enabling sample-efficient and adaptive prompt selection. Extensive experiments across mathematics, planning, and vision-based geometry tasks show that MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced LLM rollouts.
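
The idea of treating each prompt's success rate as a latent variable and selecting prompts by posterior sampling can be sketched with a Beta-Bernoulli bandit. This is a minimal illustration under stated assumptions, not the MoPPS implementation: the Beta prior, the class and method names, and the choice to prefer prompts whose sampled success rate is closest to a target of 0.5 are all hypothetical.

```python
import random


class PromptDifficultyBandit:
    """Beta-Bernoulli posterior over per-prompt success rates (illustrative)."""

    def __init__(self, n_prompts, prior_alpha=1.0, prior_beta=1.0, seed=0):
        self.alpha = [prior_alpha] * n_prompts
        self.beta = [prior_beta] * n_prompts
        self.rng = random.Random(seed)

    def select(self, k, target=0.5):
        """Thompson-style selection: sample a success rate per prompt from its
        posterior and return the k prompts whose samples are closest to target."""
        sampled = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        ranked = sorted(range(len(sampled)), key=lambda i: abs(sampled[i] - target))
        return ranked[:k]

    def update(self, prompt_id, successes, failures):
        """Streaming posterior update from rollout outcomes of one prompt."""
        self.alpha[prompt_id] += successes
        self.beta[prompt_id] += failures

    def posterior_mean(self, prompt_id):
        a, b = self.alpha[prompt_id], self.beta[prompt_id]
        return a / (a + b)
```

In this sketch only the selected prompts are rolled out with the LLM, and their outcomes are fed back through `update`; prompts that are not selected cost no inference calls, which is the saving the abstract describes.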

Article 599

Title@2025-07-28 (1): Learning the Value Systems of Societies from Preferences

Title: Learning the Value Systems of Societies from Preferences

Die Wertsysteme der Gesellschaften aus Präferenzen lernen

学习社会从优惠社会的价值体系 2507.20728v1

Authors (4): Andrés Holgado-Sánchez, Holger Billhardt, Sascha Ossowski, Sara Degli-Esposti

Aligning AI systems with human values and the value-based preferences of various stakeholders (their value systems) is key in ethical AI. In value-aware AI systems, decision-making draws upon explicit computational representations of individual values (groundings) and their aggregation into value systems. As these are notoriously difficult to elicit and calibrate manually, value learning approaches aim to automatically derive computational models of an agent’s values and value system from demonstrations of human behaviour. Nonetheless, social science and humanities literature suggest that it is more adequate to conceive the value system of a society as a set of value systems of different groups, rather than as the simple aggregation of individual value systems. Accordingly, here we formalize the problem of learning the value systems of societies and propose a method to address it based on heuristic deep clustering. The method learns socially shared value groundings and a set of diverse value systems representing a given society by observing qualitative value-based preferences from a sample of agents. We evaluate the proposal in a use case with real data about travelling decisions.

Article 600

Title@2025-07-28 (1): Everything is a Video: Unifying Modalities through Next-Frame Prediction

Title: Everything is a Video: Unifying Modalities through Next-Frame Prediction

Alles ist ein Video: Vereinheitlichen von Modalitäten durch Next-Frame-Vorhersage

一切都是一部视频:通过下框架预测实现统一的方式 2411.10503v2

Authors (7): G. Thomas Hudson, Dean Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed

Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation. Traditional approaches rely on modality-specific encoders and late fusion techniques, which can hinder scalability and flexibility when adapting to new tasks or modalities. To address these limitations, we introduce a novel framework that extends the concept of task reformulation beyond natural language processing (NLP) to multimodal learning. We propose to reformulate diverse multimodal tasks into a unified next-frame prediction problem, allowing a single model to handle different modalities without modality-specific components. This method treats all inputs and outputs as sequential frames in a video, enabling seamless integration of modalities and effective knowledge transfer across tasks. Our approach is evaluated on a range of tasks, including text-to-text, image-to-text, video-to-video, video-to-text, and audio-to-text, demonstrating the model’s ability to generalize across modalities with minimal adaptation. We show that task reformulation can significantly simplify multimodal model design across various tasks, laying the groundwork for more generalized multimodal foundation models.

Article 601

Title@2025-07-28 (1): Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset

Title: Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset

Verbesserung der tragbaren Wasserhahn-Audioerkennung durch Unterklasse-Annotation im HD-Epic-Datensatz

通过在HD-Epic数据集中分级注解,加强穿戴式塔普水音频探测 2505.20788v2

Authors (2): Robin Burchard, Kristof Van Laerhoven

Wearable human activity recognition has been shown to benefit from the inclusion of acoustic data, as the sounds around a person often contain valuable context. However, due to privacy concerns, it is usually not ethically feasible to record and save microphone data from the device, since the audio could, for instance, also contain private conversations. Rather, the data should be processed locally, which in turn requires processing power and consumes energy on the wearable device. One special use case of contextual information that can be utilized to augment special tasks in human activity recognition is water flow detection, which can, e.g., be used to aid wearable hand washing detection. We created a new label called tap water for the recently released HD-Epic data set, creating 717 hand-labeled annotations of tap water flow, based on existing annotations of the water class. We analyzed the relation of tap water and water in the dataset and additionally trained and evaluated two lightweight classifiers to evaluate the newly added label class, showing that the new class can be learned more easily.

Article 602

Title@2025-07-28 (1): Uncertainty-driven Embedding Convolution

Title: Uncertainty-driven Embedding Convolution

Ungewissheitsgetriebene Einbettung in die Konvolution

由不确定因素驱动的内嵌演变 2507.20718v1

Authors (5): Sungjun Lim, Kangjun Noh, Youngjun Choi, Heeyoung Lee, Kyungwoo Song

Text embeddings are essential components in modern NLP pipelines. While numerous embedding models have been proposed, their performance varies across domains, and no single model consistently excels across all tasks. This variability motivates the use of ensemble techniques to combine complementary strengths. However, most existing ensemble methods operate on deterministic embeddings and fail to account for model-specific uncertainty, limiting their robustness and reliability in downstream applications. To address these limitations, we propose Uncertainty-driven Embedding Convolution (UEC). UEC first transforms deterministic embeddings into probabilistic ones in a post-hoc manner. It then computes adaptive ensemble weights based on embedding uncertainty, grounded in a Bayes-optimal solution under a surrogate loss. Additionally, UEC introduces an uncertainty-aware similarity function that directly incorporates uncertainty into similarity scoring. Extensive experiments on retrieval, classification, and semantic similarity benchmarks demonstrate that UEC consistently improves both performance and robustness by leveraging principled uncertainty modeling.
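
To give a feel for uncertainty-based ensemble weighting, the sketch below fuses several probabilistic embeddings by inverse-variance weighting. This is a stand-in technique chosen for illustration, not UEC's Bayes-optimal weighting, whose exact form the abstract does not give. It assumes each model supplies a per-dimension mean and variance and that all embeddings live in a shared space of equal dimension; the function name is hypothetical.

```python
def inverse_variance_ensemble(means, variances):
    """Fuse embeddings dimension by dimension, weighting each model by 1/variance.

    means and variances are lists (one entry per model) of equal-length lists
    of floats. Variances must be strictly positive.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("means and variances must be non-empty and aligned")
    dim = len(means[0])
    fused = []
    for d in range(dim):
        precisions = [1.0 / v[d] for v in variances]
        total = sum(precisions)
        fused.append(sum(p * m[d] for p, m in zip(precisions, means)) / total)
    return fused
```

A model that is confident in a given dimension (small variance) dominates the fused value there, while an uncertain model contributes little, which is the qualitative behaviour an uncertainty-aware ensemble is meant to provide.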

Article 603

Title@2025-07-28 (1): GDSR: Global-Detail Integration through Dual-Branch Network with Wavelet Losses for Remote Sensing Image Super-Resolution

Title: GDSR: Global-Detail Integration through Dual-Branch Network with Wavelet Losses for Remote Sensing Image Super-Resolution

GDSR: Global-Detail-Integration durch Dual-Branch-Netzwerk mit Wavelet-Verlusten für remote Sensing Image Super-Resolution

GDSR:通过带有遥感图像超分辨率波浪损失的双层网络实现全球详细一体化 2501.01460v3

Authors (6): Qiwei Zhu, Kai Li, Guojing Zhang, Xiaoying Wang, Jianqiang Huang, Xilai Li

In recent years, deep neural networks, including Convolutional Neural Networks, Transformers, and State Space Models, have achieved significant progress in Remote Sensing Image (RSI) Super-Resolution (SR). However, existing SR methods typically overlook the complementary relationship between global and local dependencies. These methods either focus on capturing local information or prioritize global information, which results in models that are unable to effectively capture both global and local features simultaneously. Moreover, their computational cost becomes prohibitive when applied to large-scale RSIs. To address these challenges, we introduce the novel application of Receptance Weighted Key Value (RWKV) to RSI-SR, which captures long-range dependencies with linear complexity. To simultaneously model global and local features, we propose the Global-Detail dual-branch structure, GDSR, which performs SR by paralleling RWKV and convolutional operations to handle large-scale RSIs. Furthermore, we introduce the Global-Detail Reconstruction Module (GDRM) as an intermediary between the two branches to bridge their complementary roles. In addition, we propose the Dual-Group Multi-Scale Wavelet Loss, a wavelet-domain constraint mechanism via dual-group subband strategy and cross-resolution frequency alignment for enhanced reconstruction fidelity in RSI-SR. Extensive experiments under two degradation methods on several benchmarks, including AID, UCMerced, and RSSRD-QH, demonstrate that GDSR outperforms the state-of-the-art Transformer-based method HAT by an average of 0.09 dB in PSNR, while using only 63% of its parameters and 51% of its FLOPs, achieving an inference speed 3.2 times faster.

Article 604

Title@2025-07-28 (1): Group Sequence Policy Optimization

Title: Group Sequence Policy Optimization

Optimierung der Gruppensequenzpolitik

组序列政策优化 2507.18071v2

Authors (12): Chujie Zheng, Shixuan Liu, Mingze Li, Xiong-Hui Chen, Bowen Yu, Chang Gao, Kai Dang, Yuqiong Liu, Rui Men, An Yang, Jingren Zhou, Junyang Lin

This paper introduces Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant reinforcement learning algorithm for training large language models. Unlike previous algorithms that adopt token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization. We demonstrate that GSPO achieves superior training efficiency and performance compared to the GRPO algorithm, notably stabilizes Mixture-of-Experts (MoE) RL training, and has the potential for simplifying the design of RL infrastructure. These merits of GSPO have contributed to the remarkable improvements in the latest Qwen3 models.
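
The contrast with token-level ratios can be made concrete with a small sketch of a sequence-level importance ratio and clipping. Several details here are assumptions rather than statements from the abstract: the length normalization (a geometric mean over tokens), the PPO-style clipped objective, the clip range `eps=0.2`, and all function names.

```python
import math


def sequence_importance_ratio(new_logprobs, old_logprobs):
    """One ratio per response, computed from summed token log-probabilities.

    Assumption: the ratio is length-normalized, i.e.
    (pi_new(y|x) / pi_old(y|x)) ** (1 / len(y)).
    """
    if len(new_logprobs) != len(old_logprobs) or not new_logprobs:
        raise ValueError("log-prob lists must be non-empty and of equal length")
    n = len(new_logprobs)
    return math.exp((sum(new_logprobs) - sum(old_logprobs)) / n)


def clipped_sequence_objective(ratio, advantage, eps=0.2):
    """PPO-style clipping applied once per sequence instead of once per token."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

Because a single ratio is shared by every token of a response, one outlier token cannot be clipped or amplified on its own, which is one plausible reading of why sequence-level treatment would stabilize training.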

Article 605

Title@2025-07-28 (1): Prostate Cancer Classification Using Multimodal Feature Fusion and Explainable AI

Title: Prostate Cancer Classification Using Multimodal Feature Fusion and Explainable AI

Prostatakrebsklassifikation mit multimodaler Feature Fusion und erklärbarer KI

采用多模式特征融合和可解释的AI 的前列腺癌症分类 2507.20714v1

Authors (8): Asma Sadia Khan, Fariba Tasnia Khan, Tanjim Mahmud, Salman Karim Khan, Rishita Chakma, Nahed Sharmen, Mohammad Shahadat Hossain, Karl Andersson

Prostate cancer, the second most prevalent male malignancy, requires advanced diagnostic tools. We propose an explainable AI system combining BERT (for textual clinical notes) and Random Forest (for numerical lab data) through a novel multimodal fusion strategy, achieving superior classification performance on the PLCO-NIH dataset (98% accuracy, 99% AUC). While multimodal fusion is established, our work demonstrates that a simple yet interpretable BERT+RF pipeline delivers clinically significant improvements - particularly for intermediate cancer stages (Class 2/3 recall: 0.900 combined vs 0.824 numerical/0.725 textual). SHAP analysis provides transparent feature importance rankings, while ablation studies prove textual features’ complementary value. This accessible approach offers hospitals a balance of high performance (F1=89%), computational efficiency, and clinical interpretability - addressing critical needs in prostate cancer diagnostics.

Article 606

Title@2025-07-28 (1): Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Title: Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Schnelle letzte Konvergenz der SGD im glatten Interpolationssystem

SGD在平滑的内插制度中的汇合 2507.11274v2

Authors (4): Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren

We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting – particularly with large (constant) stepsizes – has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $\beta$-smooth convex loss functions with stepsize $0 < \eta < 2/\beta$, the last iterate exhibits expected excess risk $\widetilde{O}(\frac{1}{\eta (2-\beta \eta) T^{1-\beta\eta/2}} + \frac{\eta}{(2-\beta\eta)^2} T^{\beta\eta/2} \sigma_\star^2)$, where $\sigma_\star^2$ denotes the variance of the stochastic gradients at the optimum. In particular, for a well-tuned stepsize we obtain a near optimal $\widetilde{O}(1/T + \sigma_\star/\sqrt{T})$ rate for the last iterate, extending the results of Varre et al. (2021) beyond least squares regression; and when $\sigma_\star=0$ we obtain a rate of $\smash{O(1/\sqrt T)}$ with $\eta=1/\beta$, improving upon the best-known $\smash{O(T^{-1/4})}$ rate recently established by Evron et al. (2025) in the special case of realizable linear regression.
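
As a quick consistency check, the general bound can be specialized to the noiseless case by direct substitution. The steps below use only the expression stated in the abstract, with $\eta = 1/\beta$ and $\sigma_\star = 0$.

```latex
\beta\eta = 1
\;\Longrightarrow\;
2 - \beta\eta = 1,
\qquad
T^{\,1 - \beta\eta/2} = T^{1/2}

\frac{1}{\eta\,(2 - \beta\eta)\,T^{\,1-\beta\eta/2}}
  = \frac{\beta}{\sqrt{T}},
\qquad
\frac{\eta}{(2-\beta\eta)^{2}}\,T^{\beta\eta/2}\,\sigma_\star^{2} = 0
```

The surviving term scales as $1/\sqrt{T}$ for fixed $\beta$, in line with the stated rate up to the factors hidden by the asymptotic notation.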

Article 607

Title@2025-07-28 (1): Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

Title: Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

Aufdecken der Illusion von Fairness: Prüfung von Schwachstellen bei Distributionsmanipulationsangriffen

《公平观:审计对分配操纵攻击的脆弱性》 2507.20708v1

Authors (5): Valentin Lafargue, Adriana Laurindo Monteiro, Emmanuelle Claeys, Laurent Risser, Jean-Michel Loubes

Proving the compliance of AI algorithms has become an important challenge with the growing deployment of such algorithms for real-life applications. Inspecting possible biased behaviors is mandatory to satisfy the constraints of the regulations of the EU Artificial Intelligence Act. Regulation-driven audits increasingly rely on global fairness metrics, with Disparate Impact being the most widely used. Yet such global measures depend highly on the distribution of the sample on which the measures are computed. We investigate first how to manipulate data samples to artificially satisfy fairness criteria, creating minimally perturbed datasets that remain statistically indistinguishable from the original distribution while satisfying prescribed fairness constraints. Then we study how to detect such manipulation. Our analysis (i) introduces mathematically sound methods for modifying empirical distributions under fairness constraints using entropic or optimal transport projections, (ii) examines how an auditee could potentially circumvent fairness inspections, and (iii) offers recommendations to help auditors detect such data manipulations. These results are validated through experiments on classical tabular datasets in bias detection.
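
The sample dependence of Disparate Impact is easy to see in a small example. The sketch below uses the common definition of the metric as a ratio of positive-outcome rates between a protected group and a reference group; the function name and the toy data are illustrative assumptions and do not reproduce the paper's manipulation methods.

```python
def disparate_impact(predictions, protected):
    """Ratio P(yhat = 1 | protected) / P(yhat = 1 | not protected).

    predictions: sequence of 0/1 outcomes.
    protected: sequence of 0/1 flags, 1 marking protected-group membership.
    """
    prot = [y for y, s in zip(predictions, protected) if s]
    ref = [y for y, s in zip(predictions, protected) if not s]
    if not prot or not ref:
        raise ValueError("both groups must be non-empty")
    rate_ref = sum(ref) / len(ref)
    if rate_ref == 0:
        raise ValueError("reference group has no positive outcomes")
    return (sum(prot) / len(prot)) / rate_ref


# Toy audit sample: four protected and four reference individuals.
full_preds = [1, 0, 0, 0, 1, 1, 1, 0]
full_flags = [1, 1, 1, 1, 0, 0, 0, 0]

# Same population with two rejected protected individuals left out of the sample.
sub_preds = [1, 0, 1, 1, 1, 0]
sub_flags = [1, 1, 0, 0, 0, 0]
```

Here the metric on the full sample is one third, while the reduced sample yields two thirds, even though no prediction was changed. Crude row dropping like this would be easy to spot; the abstract's point is that far subtler perturbations can achieve the same effect while remaining statistically hard to detect.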

Article 608

Title@2025-07-28 (1): PanoGAN A Deep Generative Model for Panoramic Dental Radiographs

Title: PanoGAN A Deep Generative Model for Panoramic Dental Radiographs

PanoGAN Ein tiefes Generatives Modell für Panoramic Dental Radiographen

PanoGAN 全景牙科放射线的深创模型 2507.21200v1

Authors (6): Soren Pedersen, Sanyam Jain, Mikkel Chavez, Viktor Ladehoff, Bruna Neves de Freitas, Ruben Pauwels

This paper presents the development of a generative adversarial network (GAN) for synthesizing dental panoramic radiographs. Although exploratory in nature, the study aims to address the scarcity of data in dental research and education. We trained a deep convolutional GAN (DCGAN) using a Wasserstein loss with gradient penalty (WGAN-GP) on a dataset of 2322 radiographs of varying quality. The focus was on the dentoalveolar regions; other anatomical structures were cropped out. Extensive preprocessing and data cleaning were performed to standardize the inputs while preserving anatomical variability. We explored four candidate models by varying critic iterations, feature depth, and the use of denoising prior to training. A clinical expert evaluated the generated radiographs based on anatomical visibility and realism, using a 5-point scale (1: very poor, 5: excellent). Most images showed moderate anatomical depiction, although some were degraded by artifacts. A trade-off was observed: the model trained on non-denoised data yielded finer details, especially in structures like the mandibular canal and trabecular bone, while a model trained on denoised data offered superior overall image clarity and sharpness. These findings provide a foundation for future work on GAN-based methods in dental imaging.

Article 609

Title@2025-07-28 (1): Improving Open-world Continual Learning under the Constraints of Scarce Labeled Data

Title: Improving Open-world Continual Learning under the Constraints of Scarce Labeled Data

Verbesserung des kontinuierlichen Open-World-Lernens unter den Zwängen knapper beschrifteter Daten

在缺少标签数据的限制下改进开放世界持续学习 2502.20974v2

Authors (6): Yujie Li, Xiangkun Wang, Xin Yang, Marcello Bonsangue, Junbo Zhang, Tianrui Li

Open-world continual learning (OWCL) adapts to sequential tasks with open samples, learning knowledge incrementally while preventing forgetting. However, existing OWCL still requires a large amount of labeled data for training, which is often impractical in real-world applications. Given that new categories/entities typically come with limited annotations and are in small quantities, a more realistic situation is OWCL with scarce labeled data, i.e., few-shot training samples. Hence, this paper investigates the problem of open-world few-shot continual learning (OFCL), which is challenging in (i) learning unbounded tasks without forgetting previous knowledge and avoiding overfitting, (ii) constructing compact decision boundaries for open detection with limited labeled data, and (iii) transferring knowledge about knowns and unknowns and even updating unknowns to knowns once the labels of open samples are learned. In response, we propose a novel OFCL framework that integrates three key components: (1) an instance-wise token augmentation (ITA) that represents and enriches sample representations with additional knowledge, (2) a margin-based open boundary (MOB) that supports open detection as new tasks emerge over time, and (3) an adaptive knowledge space (AKS) that endows unknowns with knowledge for the updating from unknowns to knowns. Finally, extensive experiments show that the proposed OFCL framework markedly outperforms all baselines, with practical importance and reproducibility. The source code is released at https://github.com/liyj1201/OFCL.

Article 610

Title@2025-07-28 (1): Continual Low-Rank Scaled Dot-product Attention

Title: Continual Low-Rank Scaled Dot-product Attention

Continual Low-Rank Scaled Dot-Produkt Achtung

持续低兰克缩放点产品注意 2412.03214v4

Authors (5): Ginés Carreto Picón, Illia Oleksiienko, Lukas Hedegaard, Arian Bakhtiarnia, Alexandros Iosifidis

Transformers are widely used for their ability to capture data relations in sequence processing, with great success for a wide range of static tasks. However, the computational and memory footprint of their main component, i.e., the Scaled Dot-product Attention, is commonly overlooked. This makes their adoption in applications involving stream data processing with constraints in response latency, computational and memory resources infeasible. Some works have proposed methods to lower the computational cost of Transformers, e.g., low-rank approximations, sparsity in attention, and efficient formulations for Continual Inference. In this paper, we introduce a new formulation of the Scaled Dot-product Attention based on the Nyström approximation that is suitable for Continual Inference. In experiments on Online Audio Classification and Online Action Detection tasks, the proposed Continual Scaled Dot-product Attention can lower the number of operations by up to three orders of magnitude compared to the original Transformers while retaining the predictive performance of competing models.

Article 611

Title@2025-07-28 (1): Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

Title: Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

Architektur-Aware Minimierung (A$^2$M): So finden Sie flache Minima in der neuralen Architektur Suche

尽量减少建筑-软件最小化(2亿澳元):如何在神经建筑搜索中找到Flat Minima 2503.10404v2

Authors (3): Matteo Gambella, Fabrizio Pittorino, Manuel Roveri

Neural Architecture Search (NAS) has become an essential tool for designing effective and efficient neural networks. In this paper, we investigate the geometric properties of neural architecture spaces commonly used in differentiable NAS methods, specifically NAS-Bench-201 and DARTS. By defining flatness metrics such as neighborhoods and loss barriers along paths in architecture space, we reveal locality and flatness characteristics analogous to the well-known properties of neural network loss landscapes in weight space. In particular, we find that highly accurate architectures cluster together in flat regions, while suboptimal architectures remain isolated, unveiling the detailed geometrical structure of the architecture search landscape. Building on these insights, we propose Architecture-Aware Minimization (A$^2$M), a novel analytically derived algorithmic framework that explicitly biases, for the first time, the gradient of differentiable NAS methods towards flat minima in architecture space. A$^2$M consistently improves generalization over state-of-the-art DARTS-based algorithms on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet16-120, across both NAS-Bench-201 and DARTS search spaces. Notably, A$^2$M is able to increase the test accuracy, on average across different differentiable NAS methods, by +3.60\% on CIFAR-10, +4.60\% on CIFAR-100, and +3.64\% on ImageNet16-120, demonstrating its superior effectiveness in practice. A$^2$M can be easily integrated into existing differentiable NAS frameworks, offering a versatile tool for future research and applications in automated machine learning. We open-source our code at https://github.com/AI-Tech-Research-Lab/AsquaredM.

Article 612

Title@2025-07-28 (1): LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

Title: LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

LUT Tensor Core: Ein Software-Hardware-Co-Design für LUT-basierte Low-Bit LLM-Inferenz

LUT 信标核心:基于 LUT的低比低 LLM 推断的软件-硬件共同设计 2408.06003v3

Authors (11): Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang

Large Language Model (LLM) inference has become increasingly resource-intensive, prompting a shift toward low-bit model weights to reduce the memory footprint and improve efficiency. Such low-bit LLMs necessitate mixed-precision matrix multiplication (mpGEMM), an important yet underexplored operation involving the multiplication of lower-precision weights with higher-precision activations. Off-the-shelf hardware does not support this operation natively, leading to indirect, and thus inefficient, dequantization-based implementations. In this paper, we study the lookup table (LUT)-based approach for mpGEMM and find that a conventional LUT implementation fails to achieve the promised gains. To unlock the full potential of LUT-based mpGEMM, we propose LUT Tensor Core, a software-hardware co-design for low-bit LLM inference. LUT Tensor Core differentiates itself from conventional LUT designs through: 1) software-based optimizations to minimize table precompute overhead and weight reinterpretation to reduce table storage; 2) a LUT-based Tensor Core hardware design with an elongated tiling shape to maximize table reuse and a bit-serial design to support diverse precision combinations in mpGEMM; 3) a new instruction set and compilation optimizations for LUT-based mpGEMM. LUT Tensor Core significantly outperforms existing pure software LUT implementations and achieves a 1.44$\times$ improvement in compute density and energy efficiency compared to previous state-of-the-art LUT-based accelerators.
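
The general idea behind LUT-based mpGEMM can be sketched in a few lines. The example below assumes 1-bit weights in {-1, +1} and a group size of 4; the names `build_table`, `pack_weights`, and `lut_matvec` are hypothetical. It shows the generic table-lookup trick only, not the LUT Tensor Core design or its optimizations.

```python
GROUP = 4  # activations covered by one lookup (hypothetical choice)


def build_table(acts):
    """Precompute the partial dot product for every +1/-1 pattern of one activation group."""
    table = []
    for pattern in range(1 << len(acts)):
        total = 0.0
        for i, a in enumerate(acts):
            sign = 1.0 if (pattern >> i) & 1 else -1.0
            total += sign * a
        table.append(total)
    return table


def pack_weights(signs):
    """Pack a group of +1/-1 weights into the integer index used by build_table."""
    index = 0
    for i, w in enumerate(signs):
        if w > 0:
            index |= 1 << i
    return index


def lut_matvec(weight_rows, acts):
    """Matrix-vector product where each weight group costs one table lookup."""
    # Tables depend only on the activations, so they are built once and reused by every row.
    tables = [build_table(acts[s:s + GROUP]) for s in range(0, len(acts), GROUP)]
    out = []
    for row in weight_rows:
        acc = 0.0
        for g, table in enumerate(tables):
            acc += table[pack_weights(row[g * GROUP:(g + 1) * GROUP])]
        out.append(acc)
    return out


acts = [0.5, -1.0, 2.0, 0.25, 1.5, 3.0, -0.5, 1.0]
weight_rows = [[1, -1, 1, 1, -1, -1, 1, 1], [-1] * 8]
lut_result = lut_matvec(weight_rows, acts)
direct_result = [sum(w * a for w, a in zip(row, acts)) for row in weight_rows]
```

The table cost is paid once per activation group and amortized over all weight rows, which is why table reuse and precompute overhead matter so much in this setting.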

Article 613

Title@2025-07-28 (1): Novel Pivoted Cholesky Decompositions for Efficient Gaussian Process Inference

Title: Novel Pivoted Cholesky Decompositions for Efficient Gaussian Process Inference

Neue pivoted Cholesky Zersetzungen für effiziente Gaußschen Prozessableitung

高效高斯进程引力的分解 2507.20678v1

Authors (2): Filip de Roos, Fabio Muratore

The Cholesky decomposition is a fundamental tool for solving linear systems with symmetric positive definite matrices, which are ubiquitous in linear algebra, optimization, and machine learning. Its numerical stability can be improved by introducing a pivoting strategy that iteratively permutes the rows and columns of the matrix. The order of pivoting indices determines how accurately the intermediate decomposition can reconstruct the original matrix and is thus decisive for the algorithm’s efficiency in the case of early termination. Standard implementations select the next pivot from the largest value on the diagonal. In the case of Bayesian nonparametric inference, this strategy corresponds to greedy entropy maximization, which is often used in active learning and design of experiments. We explore this connection in detail and deduce novel pivoting strategies for the Cholesky decomposition. The resulting algorithms are more efficient at reducing the uncertainty over a data set, can be updated to include information about observations, and additionally benefit from a tailored implementation. We benchmark the effectiveness of the new selection strategies on two tasks important to Gaussian processes: sparse regression and inference based on preconditioned iterative solvers. Our results show that the proposed selection strategies are on par with or, in most cases, outperform traditional baselines while requiring a negligible amount of additional computation.
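
For reference, the standard greedy rule mentioned above (pick the largest remaining diagonal entry) can be sketched in pure Python. This is the textbook baseline, not the novel strategies proposed in the paper; the function names and the small test matrix are hypothetical.

```python
import math


def pivoted_cholesky(A, max_rank, tol=1e-12):
    """Greedy pivoted Cholesky: returns low-rank columns and the pivot order."""
    n = len(A)
    diag = [A[i][i] for i in range(n)]  # diagonal of the current residual
    columns, pivots = [], []
    for _ in range(min(max_rank, n)):
        # Standard rule: the next pivot is the largest residual diagonal entry.
        p = max(range(n), key=lambda i: diag[i])
        if diag[p] <= tol:
            break  # residual is numerically zero, stop early
        scale = math.sqrt(diag[p])
        col = [(A[i][p] - sum(c[i] * c[p] for c in columns)) / scale for i in range(n)]
        columns.append(col)
        pivots.append(p)
        for i in range(n):
            diag[i] -= col[i] ** 2
    return columns, pivots


def reconstruct(columns, n):
    """Sum of outer products of the columns, i.e. the current approximation of A."""
    return [[sum(c[i] * c[j] for c in columns) for j in range(n)] for i in range(n)]


A = [[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
columns, pivots = pivoted_cholesky(A, max_rank=3)
approx = reconstruct(columns, 3)
max_error = max(abs(approx[i][j] - A[i][j]) for i in range(3) for j in range(3))
```

Stopping after fewer than `n` steps yields a low-rank approximation whose quality depends on the pivot order, which is the degree of freedom the abstract refers to.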

Article 614

Title@2025-07-28 (1): A Multimodal Architecture for Endpoint Position Prediction in Team-based Multiplayer Games

Title: A Multimodal Architecture for Endpoint Position Prediction in Team-based Multiplayer Games

Eine multimodale Architektur für Endpoint-Positionsvorhersage in Team-basierten Multiplayer-Spielen

以团队为基础的多玩者运动会中端点定位预测的多模式架构 2507.20670v1

Authors (4): Jonas Peche, Aliaksei Tsishurou, Alexander Zap, Guenter Wallner

Understanding and predicting player movement in multiplayer games is crucial for achieving use cases such as player-mimicking bot navigation, preemptive bot control, strategy recommendation, and real-time player behavior analytics. However, the complex environments allow for a high degree of navigational freedom, and the interactions and team-play between players require models that make effective use of the available heterogeneous input data. This paper presents a multimodal architecture for predicting future player locations on a dynamic time horizon, using a U-Net-based approach for calculating endpoint location probability heatmaps, conditioned using a multimodal feature encoder. The application of a multi-head attention mechanism for different groups of features allows for communication between agents. In doing so, the architecture makes efficient use of the multimodal game state including image inputs, numerical and categorical features, as well as dynamic game data. Consequently, the presented technique lays the foundation for various downstream tasks that rely on future player positions such as the creation of player-predictive bot behavior or player anomaly detection.

Article 615

Title@2025-07-28 (1): MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection

Title: MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection

MIMII-Agent: LLMs mit Funktionsaufruf für relative Auswertung der anomalen Schallerkennung

MIMII-代理:利用具有相对评估异常声音检测功能的LMs 2507.20666v1

Authors (5): Harsh Purohit, Tomoya Nishida, Kota Dohi, Takashi Endo, Yohei Kawaguchi

This paper proposes a method for generating machine-type-specific anomalies to evaluate the relative performance of unsupervised anomalous sound detection (UASD) systems across different machine types, even in the absence of real anomaly sound data. Conventional keyword-based data augmentation methods often produce unrealistic sounds due to their reliance on manually defined labels, limiting scalability as machine types and anomaly patterns diversify. Advanced audio generative models, such as MIMII-Gen, show promise but typically depend on anomalous training data, making them less effective when diverse anomalous examples are unavailable. To address these limitations, we propose a novel synthesis approach leveraging large language models (LLMs) to interpret textual descriptions of faults and automatically select audio transformation functions, converting normal machine sounds into diverse and plausible anomalous sounds. We validate this approach by evaluating a UASD system trained only on normal sounds from five machine types, using both real and synthetic anomaly data. Experimental results reveal consistent trends in relative detection difficulty across machine types between synthetic and real anomalies. This finding supports our hypothesis and highlights the effectiveness of the proposed LLM-based synthesis approach for relative evaluation of UASD systems.

Article 616

Title@2025-07-28 (1): Advancing Compositional LLM Reasoning with Structured Task Relations in Interactive Multimodal Communications

Title: Advancing Compositional LLM Reasoning with Structured Task Relations in Interactive Multimodal Communications

Verbesserung der kompositorischen LLM-Reasoning mit strukturierten Arbeitsbeziehungen in der interaktiven multimodalen Kommunikation

与互动多模式通信中结构性任务关系有关的理由 2507.21199v1

Authors (12): Xinye Cao, Hongcan Guo, Guoshun Nan, Jiaoyang Cui, Haoting Qian, Yihan Lin, Yilin Peng, Diyang Zhang, Yanzhao Hou, Huici Wu, Xiaofeng Tao, Tony Q. S. Quek

Interactive multimodal applications (IMAs), such as route planning in the Internet of Vehicles, enrich users’ personalized experiences by integrating various forms of data over wireless networks. Recent advances in large language models (LLMs) utilize mixture-of-experts (MoE) mechanisms to empower multiple IMAs, with each LLM trained individually for a specific task that presents different business workflows. In contrast to existing approaches that rely on multiple LLMs for IMAs, this paper presents a novel paradigm that accomplishes various IMAs using a single compositional LLM over wireless networks. The two primary challenges include 1) guiding a single LLM to adapt to diverse IMA objectives and 2) ensuring the flexibility and efficiency of the LLM in resource-constrained mobile environments. To tackle the first challenge, we propose ContextLoRA, a novel method that guides an LLM to learn the rich structured context among IMAs by constructing a task dependency graph. We partition the learnable parameter matrix of neural layers for each IMA to facilitate LLM composition. Then, we develop a step-by-step fine-tuning procedure guided by task relations, including training, freezing, and masking phases. This allows the LLM to learn to reason among tasks for better adaptation, capturing the latent dependencies between tasks. For the second challenge, we introduce ContextGear, a scheduling strategy to optimize the training procedure of ContextLoRA, aiming to minimize computational and communication costs through a strategic grouping mechanism. Experiments on three benchmarks show the superiority of the proposed ContextLoRA and ContextGear. Furthermore, we prototype our proposed paradigm on a real-world wireless testbed, demonstrating its practical applicability for various IMAs. We will release our code to the community.

Article 617

Title@2025-07-28 (1): Towards trustworthy AI in materials mechanics through domain-guided attention

Title: Towards trustworthy AI in materials mechanics through domain-guided attention

Auf dem Weg zu vertrauenswürdiger KI in der Materialmechanik durch domänengeführte Aufmerksamkeit

通过域引导关注在材料机械学方面实现可信赖的AI 2507.20658v1

Authors (3): Jesco Talies, Eric Breitbarth, David Melching

Ensuring the trustworthiness and robustness of deep learning models remains a fundamental challenge, particularly in high-stakes scientific applications. In this study, we present a framework called attention-guided training that combines explainable artificial intelligence techniques with quantitative evaluation and domain-specific priors to guide model attention. We demonstrate that domain-specific feedback on model explanations during training can enhance the model’s generalization capabilities. We validate our approach on the task of semantic crack tip segmentation in digital image correlation data, which is a key application in the fracture mechanical characterization of materials. By aligning model attention with physically meaningful stress fields, such as those described by Williams’ analytical solution, attention-guided training ensures that the model focuses on physically relevant regions. This ultimately leads to improved generalization and more faithful explanations.

Article 618

Title@2025-07-28 (1): The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks

Title: The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks

Die Feature Speed Formel: ein flexibler Ansatz zur Skalierung von Hyperparametern tiefer neuronaler Netzwerke

特色速度公式:对深神经网络的超强参数进行缩放的灵活办法 2311.18718v4

Authors (2): Lénaïc Chizat, Praneeth Netrapalli

Deep learning succeeds by doing hierarchical feature learning, yet tuning hyper-parameters (HP) such as initialization scales, learning rates, etc. gives only indirect control over this behavior. In this paper, we introduce a key notion to predict and control feature learning: the angle $\theta_\ell$ between the feature updates and the backward pass (at layer index $\ell$). We show that the magnitude of feature updates after one GD step, at any training time, can be expressed via a simple and general \emph{feature speed formula} in terms of this angle $\theta_\ell$, the loss decay, and the magnitude of the backward pass. This angle $\theta_\ell$ is controlled by the conditioning of the layer-to-layer Jacobians and, at random initialization, it is determined by the spectrum of a certain kernel, which coincides with the Neural Tangent Kernel when $\ell=\text{depth}$. Given $\theta_\ell$, the feature speed formula provides us with rules to adjust HPs (scales and learning rates) so as to satisfy certain dynamical properties, such as feature learning and loss decay. We investigate the implications of our approach for ReLU MLPs and ResNets in the large width-then-depth limit. Relying on prior work, we show that in ReLU MLPs with iid initialization, the angle degenerates with depth as $\cos(\theta_\ell)=\Theta(1/\sqrt{\ell})$. In contrast, ResNets with branch scale $O(1/\sqrt{\text{depth}})$ maintain a non-degenerate angle $\cos(\theta_\ell)=\Theta(1)$. We use these insights to recover key properties of known HP scalings and also to introduce a new HP scaling for large depth ReLU MLPs with favorable theoretical properties.

Article 619

Title@2025-07-28 (1): Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs

Title: Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs

Benchmarking Graph Neural Networks für die Dokumentenlayout-Analyse in öffentlichen Angelegenheiten

用于公共事务文件布局分析的图表神经网络 2505.14699v2

Authors (6): Miguel Lopez-Duran, Julian Fierrez, Aythami Morales, Ruben Tolosana, Oscar Delgado-Mohatar, Alvaro Ortigosa

The automatic analysis of document layouts in digital-born PDF documents remains a challenging problem due to the heterogeneous arrangement of textual and nontextual elements and the imprecision of the textual metadata in the Portable Document Format. In this work, we benchmark Graph Neural Network (GNN) architectures for the task of fine-grained layout classification of text blocks from digital native documents. We introduce two graph construction structures: a k-closest-neighbor graph and a fully connected graph, and generate node features via pre-trained text and vision models, thus avoiding manual feature engineering. Three experimental frameworks are evaluated: single-modality (text or visual), concatenated multimodal, and dual-branch multimodal. We evaluated four foundational GNN models and compared them with the baseline. Our experiments are specifically conducted on a rich dataset of public affairs documents that includes more than 20 sources (e.g., regional and national-level official gazettes), 37K PDF documents, with 441K pages in total. Our results demonstrate that GraphSAGE operating on the k-closest-neighbor graph in a dual-branch configuration achieves the highest per-class and overall accuracy, outperforming the baseline in some sources. These findings confirm the importance of local layout relationships and multimodal fusion exploited through GNNs for the analysis of native digital document layouts.
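
A k-closest-neighbor graph over text blocks can be sketched as follows. The sketch assumes each block is summarized by the (x, y) center of its bounding box and that closeness is Euclidean distance between centers; both are illustrative assumptions, as the abstract does not specify the distance used. The name `knn_graph` and the coordinates are hypothetical.

```python
import math


def knn_graph(centers, k):
    """Return directed edges (i, j) linking each block i to its k nearest blocks j."""
    edges = []
    for i, (xi, yi) in enumerate(centers):
        # Distance from block i to every other block, sorted ascending.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Hypothetical block centers on a page: three nearby blocks and one far away.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
edges = knn_graph(centers, k=2)
```

Compared with a fully connected graph, this keeps `k` edges per node instead of `n - 1`, so message passing concentrates on local layout relationships.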

Article 620

Title@2025-07-28 (1): Deep Generative Models of Evolution: SNP-level Population Adaptation by Genomic Linkage Incorporation

Title: Deep Generative Models of Evolution: SNP-level Population Adaptation by Genomic Linkage Incorporation

Tiefe generative Modelle der Evolution: SNP-Ebene Populationsanpassung durch genomische Verknüpfung

深刻的演变模式:通过基因组联系纳入SNP层次的人口适应 2507.20644v1

Authors (3): Julia Siekiera, Christian Schlötterer, Stefan Kramer

The investigation of allele frequency trajectories in populations evolving under controlled environmental pressures has become a popular approach to study evolutionary processes on the molecular level. Statistical models based on well-defined evolutionary concepts can be used to validate different hypotheses about empirical observations. Despite their popularity, classic statistical models like the Wright-Fisher model suffer from simplified assumptions such as the independence of selected loci along a chromosome and uncertainty about the parameters. Deep generative neural networks offer a powerful alternative known for the integration of multivariate dependencies and noise reduction. Due to their high data demands and challenging interpretability, they have, so far, not been widely considered in the area of population genomics. To address the challenges in the area of Evolve and Resequencing experiments (E&R) based on pooled sequencing (Pool-Seq) data, we introduce a deep generative neural network that aims to model a concept of evolution based on empirical observations over time. The proposed model estimates the distribution of allele frequency trajectories by embedding the observations from single nucleotide polymorphisms (SNPs) with information from neighboring loci. Evaluation on simulated E&R experiments demonstrates the model’s ability to capture the distribution of allele frequency trajectories and illustrates the representational power of deep generative models on the example of linkage disequilibrium (LD) estimation. Inspecting the internally learned representations enables estimating pairwise LD, which is typically inaccessible in Pool-Seq data. Our model provides competitive LD estimation in Pool-Seq data with a high degree of LD when compared to existing methods.
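
To see why pairwise LD is hard to obtain from Pool-Seq data, it helps to recall the standard definitions $D = p_{AB} - p_A p_B$ and $r^2 = D^2 / (p_A(1-p_A)\,p_B(1-p_B))$. The sketch below computes both from phased haplotypes; it uses textbook formulas and a hypothetical function name, and it is not the estimator learned by the paper's model.

```python
def linkage_disequilibrium(haplotypes):
    """haplotypes: list of (a, b) pairs of 0/1 alleles at two loci. Returns (D, r2)."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    # Joint frequency of carrying allele 1 at both loci: this needs phased haplotypes.
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic; r2 is undefined")
    return d, d * d / denom


linked = linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)])
unlinked = linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)])
```

Pooled sequencing reports the marginal frequencies `p_a` and `p_b` per locus but not the joint frequency `p_ab`, so the formula cannot be applied directly.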

Article 621

Title@2025-07-28 (1): IGNIS: A Robust Neural Network Framework for Constrained Parameter Estimation in Archimedean Copulas

Title: IGNIS: A Robust Neural Network Framework for Constrained Parameter Estimation in Archimedean Copulas

IGNIS: Ein robustes neurales Netzwerk-Framework für eingeschränkte Parameterschätzungen in Archimedischen Copulas

IGNIS:Archimedean Copulas受控参数估计的强力神经网络框架 2505.22518v3

Authors (1): Agnideep Aich

Classical estimators, the cornerstones of statistical inference, face insurmountable challenges when applied to important emerging classes of Archimedean copulas. These models exhibit pathological properties, including numerically unstable densities, non-monotonic parameter-to-dependence mappings, and vanishingly small likelihood gradients, rendering methods like Maximum Likelihood (MLE) and Method of Moments (MoM) inconsistent or computationally infeasible. We introduce IGNIS, a unified neural estimation framework that sidesteps these barriers by learning a direct, robust mapping from data-driven dependency measures to the underlying copula parameter theta. IGNIS utilizes a multi-input architecture and a theory-guided output layer (softplus(z) + 1) to automatically enforce the domain constraint theta_hat >= 1. Trained and validated on four families (Gumbel, Joe, and the numerically challenging A1/A2), IGNIS delivers accurate and stable estimates for real-world financial and health datasets, demonstrating its necessity for reliable inference in modern, complex dependence models where traditional methods fail.
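
The output-layer constraint is easy to verify in isolation. The sketch below applies `softplus(z) + 1` to a raw network output `z`, using a numerically stable softplus; the function names are hypothetical and the surrounding network is omitted.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)); never negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def constrained_theta(z):
    """Map an unconstrained output z to an estimate theta_hat >= 1."""
    return softplus(z) + 1.0


theta_low = constrained_theta(-1000.0)  # very negative raw output
theta_mid = constrained_theta(0.0)      # softplus(0) = log(2)
theta_high = constrained_theta(1000.0)  # large raw output, no overflow
```

Because softplus is non-negative for every real input, the estimate cannot leave the domain `theta >= 1`, whatever the network weights are.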

Article 622

Title@2025-07-28 (1): Learning Before Filtering: Real-Time Hardware Learning at the Detector Level

Title: Learning Before Filtering: Real-Time Hardware Learning at the Detector Level

Lernen vor dem Filtern: Echtzeit-Hardware-Lernen auf Detektorebene

在过滤前学习:在探测器一级实时硬件学习 2506.11981v2

Authors (1): Boštjan Maček

Advances in sensor technology and automation have ushered in an era of data abundance, where the ability to identify and extract relevant information in real time has become increasingly critical. Traditional filtering approaches, which depend on a priori knowledge, often struggle to adapt to dynamic or unanticipated data features. Machine learning offers a compelling alternative, particularly when training can occur directly at or near the detector. This paper presents a digital hardware architecture designed for real-time neural network training, specifically optimized for high-throughput data ingestion. The design is described in an implementation-independent manner, with detailed analysis of each architectural component and their performance implications. Through system parameterization, the study explores trade-offs between processing speed, model complexity, and hardware resource utilization. Practical examples illustrate how these parameters affect applicability across various use cases. A proof-of-concept implementation on an FPGA demonstrates in-situ training, confirming that computational accuracy is preserved relative to conventional software-based approaches. Moreover, resource estimates indicate that current-generation FPGAs can train networks of approximately 3,500 neurons per chip. The architecture is both scalable and adaptable, representing a significant advancement toward integrating learning directly within detector systems and enabling a new class of extreme-edge, real-time information processing.

Article 623

Title@2025-07-28 (1): Secure Best Arm Identification in the Presence of a Copycat

Title: Secure Best Arm Identification in the Presence of a Copycat

Sichere Best Arm Identification in der Gegenwart eines Copycat

在有模仿器的情况下安全最佳武器识别 2507.18975v2

Authors (2): Asaf Cohen, Onur Günlü

Consider the problem of best arm identification with a security constraint. Specifically, assume a setup of stochastic linear bandits with $K$ arms of dimension $d$. In each arm pull, the player receives a reward that is the sum of the dot product of the arm with an unknown parameter vector and independent noise. The player’s goal is to identify the best arm after $T$ arm pulls. Moreover, assume a copycat Chloe is observing the arm pulls. The player wishes to keep Chloe ignorant of the best arm. While a minimax–optimal algorithm identifies the best arm with an $\Omega\left(\frac{T}{\log(d)}\right)$ error exponent, it easily reveals its best-arm estimate to an outside observer, as the best arms are played more frequently. A naive secure algorithm that plays all arms equally results in an $\Omega\left(\frac{T}{d}\right)$ exponent. In this paper, we propose a secure algorithm that plays with \emph{coded arms}. The algorithm does not require any key or cryptographic primitives, yet achieves an $\Omega\left(\frac{T}{\log^2(d)}\right)$ exponent while revealing almost no information on the best arm.

Article 624

Title@2025-07-28 (1): Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression

Title: Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression

Erweiterung großer multimodaler Modelle mit adaptiver Sparsamkeit und KV-Cache-Kompression

加强具有适应性平衡和KV缓存压缩的大型多式模型 2507.20613v1

Authors (4): Te Zhang, Yuheng Li, Junxiang Wang, Lujun Li

Large multimodal models (LMMs) have advanced significantly by integrating visual encoders with extensive language models, enabling robust reasoning capabilities. However, compressing LMMs for deployment on edge devices remains a critical challenge. In this work, we propose an adaptive search algorithm that optimizes sparsity and KV cache compression to enhance LMM efficiency. Utilizing the Tree-structured Parzen Estimator, our method dynamically adjusts pruning ratios and KV cache quantization bandwidth across different LMM layers, using model performance as the optimization objective. This approach uniquely combines pruning with key-value cache quantization and incorporates a fast pruning technique that eliminates the need for additional fine-tuning or weight adjustments, achieving efficient compression without compromising accuracy. Comprehensive evaluations on benchmark datasets, including LLaVA-1.5 7B and 13B, demonstrate our method's superiority over state-of-the-art techniques such as SparseGPT and Wanda across various compression levels. Notably, our framework's automatic allocation of KV cache compression resources sets a new standard in LMM optimization, delivering memory efficiency without sacrificing much performance.

Article 625

Title@2025-07-28 (1): Comparing and Scaling fMRI Features for Brain-Behavior Prediction

Title: Comparing and Scaling fMRI Features for Brain-Behavior Prediction

Vergleich und Skalierung von fMRI-Features für Gehirn-Verhalten-Vorhersage

比较和扩大FMRI 脑行为预测特征 2507.20601v1

Authors (4): Mikkel Schöttner Sieler, Thomas A. W. Bolton, Jagruti Patel, Patric Hagmann

Predicting behavioral variables from neuroimaging modalities such as magnetic resonance imaging (MRI) has the potential to allow the development of neuroimaging biomarkers of mental and neurological disorders. A crucial processing step to this aim is the extraction of suitable features. These can differ in how well they predict the target of interest, and how this prediction scales with sample size and scan time. Here, we compare nine feature subtypes extracted from resting-state functional MRI recordings for behavior prediction, ranging from regional measures of functional activity to functional connectivity (FC) and metrics derived with graph signal processing (GSP), a principled approach for the extraction of structure-informed functional features. We study 979 subjects from the Human Connectome Project Young Adult dataset, predicting summary scores for mental health, cognition, processing speed, and substance use, as well as age and sex. The scaling properties of the features are investigated for different combinations of sample size and scan time. FC comes out as the best feature for predicting cognition, age, and sex. Graph power spectral density is the second best for predicting cognition and age, while for sex, variability-based features show potential as well. When predicting sex, the low-pass graph filtered coupled FC slightly outperforms the simple FC variant. None of the other targets were predicted significantly. The scaling results point to higher performance reserves for the better-performing features. They also indicate that it is important to balance sample size and scan time when acquiring data for prediction studies. The results confirm FC as a robust feature for behavior prediction, but also show the potential of GSP and variability-based measures. We discuss the implications for future prediction studies in terms of strategies for acquisition and sample composition.
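
As a point of reference for the FC feature, functional connectivity is commonly computed as the Pearson correlation between pairs of regional time series. The sketch below assumes that definition; the abstract does not spell out the exact FC variant used, and the function names and toy series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long series (raises ZeroDivisionError on constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def functional_connectivity(series):
    """Region-by-region correlation matrix from a list of regional time series."""
    r = len(series)
    return [[pearson(series[i], series[j]) for j in range(r)] for i in range(r)]


# Three toy regions: region 1 follows region 0, region 2 moves in the opposite direction.
series = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
fc = functional_connectivity(series)
```

The upper triangle of such a matrix is what typically serves as the feature vector for prediction, so its length grows quadratically with the number of regions.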

Article 626

Title: Distributional Soft Actor-Critic with Three Refinements

Verteilungsweiche Aktor-Kritik mit drei Veredelungen

配发软软软动作器和三精度 2310.05858v6

Authors (9): Jingliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li, Chang Liu, Ya-Qin Zhang, Bo Cheng, Keqiang Li

Reinforcement learning (RL) has shown remarkable success in solving complex decision-making and control tasks. However, many model-free RL algorithms experience performance degradation due to inaccurate value estimation, particularly the overestimation of Q-values, which can lead to suboptimal policies. To address this issue, we previously proposed the Distributional Soft Actor-Critic (DSAC or DSACv1), an off-policy RL algorithm that enhances value estimation accuracy by learning a continuous Gaussian value distribution. Despite its effectiveness, DSACv1 faces challenges such as training instability and sensitivity to reward scaling, caused by high variance in critic gradients due to return randomness. In this paper, we introduce three key refinements to DSACv1 to overcome these limitations and further improve Q-value estimation accuracy: expected value substitution, twin value distribution learning, and variance-based critic gradient adjustment. The enhanced algorithm, termed DSAC with Three refinements (DSAC-T or DSACv2), is systematically evaluated across a diverse set of benchmark tasks. Without the need for task-specific hyperparameter tuning, DSAC-T consistently matches or outperforms leading model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T ensures a stable learning process and maintains robust performance across varying reward scales. Its effectiveness is further demonstrated through real-world application in controlling a wheeled robot, highlighting its potential for deployment in practical robotic tasks.

Article 627

Title@2025-07-28 (1): PhaseNAS: Language-Model Driven Architecture Search with Dynamic Phase Adaptation

Title: PhaseNAS: Language-Model Driven Architecture Search with Dynamic Phase Adaptation

PhaseNAS: Sprachmodellgestützte Architektursuche mit dynamischer Phasenanpassung

PhaseNAS: 具有动态阶段适应性的语言模式驱动器建筑搜索 2507.20592v1

Authors (4): Fei Kong, Xiaohan Shan, Yanwei Hu, Jianmin Li

Neural Architecture Search (NAS) is challenged by the trade-off between search space exploration and efficiency, especially for complex tasks. While recent LLM-based NAS methods have shown promise, they often suffer from static search strategies and ambiguous architecture representations. We propose PhaseNAS, an LLM-based NAS framework with dynamic phase transitions guided by real-time score thresholds and a structured architecture template language for consistent code generation. On the NAS-Bench-Macro benchmark, PhaseNAS consistently discovers architectures with higher accuracy and better rank. For image classification (CIFAR-10/100), PhaseNAS reduces search time by up to 86% while maintaining or improving accuracy. In object detection, it automatically produces YOLOv8 variants with higher mAP and lower resource cost. These results demonstrate that PhaseNAS enables efficient, adaptive, and generalizable NAS across diverse vision tasks.

Article 628

Title@2025-07-28 (1): Enhancing generalization in high energy physics using white-box adversarial attacks

Title: Enhancing generalization in high energy physics using white-box adversarial attacks

Verbesserung der Verallgemeinerung in der Hochenergiephysik mit White-Box-Angriffen

利用白箱对抗性攻击加强高能物理学的普及化 2411.09296v3

Authors (4): Franck Rothen, Samuel Klein, Matthew Leigh, Tobias Golling

Machine learning is becoming increasingly popular in the context of particle physics. Supervised learning, which uses labeled Monte Carlo (MC) simulations, remains one of the most widely used methods for discriminating signals beyond the Standard Model. However, this paper suggests that supervised models may depend excessively on artifacts and approximations from Monte Carlo simulations, potentially limiting their ability to generalize well to real data. This study aims to enhance the generalization properties of supervised models by reducing the sharpness of local minima. It reviews the application of four distinct white-box adversarial attacks in the context of classifying Higgs boson decay signals. The attacks are divided into weight-space attacks and feature-space attacks. To study and quantify the sharpness of different local minima, this paper presents two analysis methods: gradient ascent and reduced Hessian eigenvalue analysis. The results show that white-box adversarial attacks significantly improve generalization performance, albeit with increased computational complexity.

Article 629

Title@2025-07-28 (1): Beyond Manual Annotation: A Human-AI Collaborative Framework for Medical Image Segmentation Using Only “Better or Worse” Expert Feedback

Title: Beyond Manual Annotation: A Human-AI Collaborative Framework for Medical Image Segmentation Using Only “Better or Worse” Expert Feedback

Beyond Manual Annotation: Ein Mensch-AI-Kollaboratives Framework für medizinische Bildsegmentierung mit nur “Besser oder schlechter” Experten-Feedback

超越手册说明:仅使用“更好或更坏”专家反馈的人类-大赦国际医疗图像分割协作框架 2507.05815v2

Authors (1): Yizhe Zhang

Manual annotation of medical images is a labor-intensive and time-consuming process, posing a significant bottleneck in the development and deployment of robust medical imaging AI systems. This paper introduces a novel hands-free Human-AI collaborative framework for medical image segmentation that substantially reduces the annotation burden by eliminating the need for explicit manual pixel-level labeling. The core innovation lies in a preference learning paradigm, where human experts provide minimal, intuitive feedback – simply indicating whether an AI-generated segmentation is better or worse than a previous version. The framework comprises four key components: (1) an adaptable foundation model (FM) for feature extraction, (2) label propagation based on feature similarity, (3) a clicking agent that learns from human better-or-worse feedback to decide where to click and with which label, and (4) a multi-round segmentation learning procedure that trains a state-of-the-art segmentation network using pseudo-labels generated by the clicking agent and FM-based label propagation. Experiments on three public datasets demonstrate that the proposed approach achieves competitive segmentation performance using only binary preference feedback, without requiring experts to manually annotate the images.

Article 630

Title@2025-07-28 (1): AutoLibra: Agent Metric Induction from Open-Ended Feedback

Title: AutoLibra: Agent Metric Induction from Open-Ended Feedback

AutoLibra: Agent Metric Induktion aus offenem Feedback

AutoLibra: 不限名额反馈的计量介绍代理 2505.02820v2

Authors (6): Hao Zhu, Phil Cuvin, Xinkai Yu, Charlotte Ka Yee Yan, Jason Zhang, Diyi Yang

Agents are predominantly evaluated and optimized via task success metrics, which are coarse, rely on manual design from experts, and fail to reward intermediate emergent behaviors. We propose AutoLibra, a framework for agent evaluation that transforms open-ended human feedback, e.g., “If you find that the button is disabled, don’t click it again” or “This agent has too much autonomy to decide what to do on its own”, into metrics for evaluating fine-grained behaviors in agent trajectories. AutoLibra accomplishes this by grounding feedback to an agent’s behavior, clustering similar positive and negative behaviors, and creating concrete metrics with clear definitions and concrete examples, which can be used for prompting LLM-as-a-Judge as evaluators. We further propose two meta-metrics to evaluate the alignment of a set of (induced) metrics with open feedback: “coverage” and “redundancy”. Through optimizing these meta-metrics, we experimentally demonstrate AutoLibra’s ability to induce more concrete agent evaluation metrics than the ones proposed in previous agent evaluation benchmarks and discover new metrics to analyze agents. We also present two applications of AutoLibra in agent improvement: First, we show that AutoLibra-induced metrics serve as better prompt-engineering targets than the task success rate on a wide range of text game tasks, improving agent performance over baseline by a mean of 20%. Second, we show that AutoLibra can iteratively select high-quality fine-tuning data for web navigation agents. Our results suggest that AutoLibra is a powerful task-agnostic tool for evaluating and improving language agents.

Article 631

Title@2025-07-28 (1): GASPnet: Global Agreement to Synchronize Phases

Title: GASPnet: Global Agreement to Synchronize Phases

GASPnet: Globales Abkommen zur Synchronisierung von Phasen

GASPnet:同步阶段全球协定 2507.16674v2

Authors (4): Andrea Alamia, Sabine Muzellec, Thomas Serre, Rufin VanRullen

In recent years, Transformer architectures have revolutionized most fields of artificial intelligence, relying on an attentional mechanism based on the agreement between keys and queries to select and route information in the network. In previous work, we introduced a novel, brain-inspired architecture that leverages a similar implementation to achieve a global ‘routing by agreement’ mechanism. Such a system modulates the network’s activity by matching each neuron’s key with a single global query, pooled across the entire network. Acting as a global attentional system, this mechanism improves noise robustness over baseline levels but is insufficient for multi-classification tasks. Here, we improve on this work by proposing a novel mechanism that combines aspects of the Transformer attentional operations with a compelling neuroscience theory, namely, binding by synchrony. This theory proposes that the brain binds together features by synchronizing the temporal activity of neurons encoding those features. This allows the binding of features from the same object while efficiently disentangling those from distinct objects. We drew inspiration from this theory and incorporated angular phases into all layers of a convolutional network. After achieving phase alignment via Kuramoto dynamics, we use this approach to enhance operations between neurons with similar phases and suppress those with opposite phases. We test the benefits of this mechanism on two datasets: one composed of pairs of digits and one composed of a combination of an MNIST item superimposed on a CIFAR-10 image. Our results show higher accuracy than CNN baselines, along with greater robustness to noise and better generalization. Overall, we propose a novel mechanism that addresses the visual binding problem in neural networks by leveraging the synergy between neuroscience and machine learning.
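To make the phase-synchronization idea concrete, here is a minimal sketch of Kuramoto dynamics followed by a phase-based gate. It is not the GASPnet implementation: the all-to-all coupling, the identical natural frequencies, the Euler integration, and the cosine gate in `phase_gate` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def kuramoto_step(theta, coupling, dt=0.1):
    """One Euler step of all-to-all Kuramoto dynamics.

    Assumption: every oscillator has the same natural frequency, so only
    the coupling term appears in the update.
    """
    # diff[i, j] = theta_j - theta_i
    diff = theta[None, :] - theta[:, None]
    dtheta = coupling * np.sin(diff).mean(axis=1)
    return (theta + dt * dtheta) % (2 * np.pi)

def order_parameter(theta):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full synchrony."""
    return float(np.abs(np.exp(1j * theta).mean()))

def phase_gate(theta):
    """Hypothetical gate: +1 for aligned phases, -1 for opposite phases."""
    return np.cos(theta[:, None] - theta[None, :])

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2 * np.pi, size=16)
r_start = order_parameter(theta)
for _ in range(200):
    theta = kuramoto_step(theta, coupling=2.0)
r_end = order_parameter(theta)
```

A gate of this kind could multiply pairwise interactions between units, strengthening those whose phases agree and weakening those in anti-phase; how the paper actually applies the phases inside convolutional layers is not specified in the abstract.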

Article 632

Title@2025-07-28 (1): A note on the Artstein-Avidan-Milman’s generalized Legendre transforms

Title: A note on the Artstein-Avidan-Milman’s generalized Legendre transforms

Ein Hinweis auf Artstein-Avidan-Milmans generalisierte Legende transformiert

关于Artstein-Avidan-Milman的通用传说变换的注解 2507.20577v1

Authors (1): Frank Nielsen

Artstein-Avidan and Milman [Annals of Mathematics (2009), (169):661-674] characterized invertible reverse-ordering transforms on the space of lower-semi-continuous extended real-valued convex functions as affine deformations of the ordinary Legendre transform. In this note, we prove that all those generalized Legendre transforms on functions correspond to the ordinary Legendre transform on dually corresponding affine-deformed functions. That is, generalized convex conjugates are convex conjugates of affine-deformed functions. We conclude this note by sketching how this result can be interpreted through the lens of information geometry.
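As a worked illustration of how affine deformations interact with the ordinary Legendre transform, the following standard computation may help. The parametrization by an invertible matrix $A$, vectors $b, c$ and scalar $d$ is an assumption made here for illustration and is not necessarily the notation of the note.

```latex
% Ordinary Legendre transform (convex conjugate):
f^*(y) = \sup_{x} \bigl\{ \langle x, y \rangle - f(x) \bigr\}

% Affine-deformed function, with A invertible (illustrative parametrization):
g(x) = f(Ax + b) + \langle c, x \rangle + d

% Substituting u = Ax + b, i.e. x = A^{-1}(u - b):
g^*(y) = \sup_{u} \bigl\{ \langle u - b,\, A^{-\top}(y - c) \rangle - f(u) \bigr\} - d
       = f^*\!\bigl(A^{-\top}(y - c)\bigr) - \langle b,\, A^{-\top}(y - c) \rangle - d
```

So the conjugate of an affine-deformed function is itself an affine deformation of $f^*$: the argument is transformed by $y \mapsto A^{-\top}(y - c)$ and an affine term in $y$ is added.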

Article 633

Title@2025-07-28 (1): Fusing CFD and measurement data using transfer learning

Title: Fusing CFD and measurement data using transfer learning

Zusammenführen von CFD- und Messdaten mittels Transfer-Lernen

利用转让学习法解冻家庭发展筹资和测量数据 2507.20576v1

Authors (2): Alexander Barklage, Philipp Bekemeyer

Aerodynamic analysis during aircraft design usually involves methods of varying accuracy and spatial resolution, which all have their advantages and disadvantages. It is therefore desirable to create data-driven models which effectively combine these advantages. Such data fusion methods for distributed quantities mainly rely on proper orthogonal decomposition as of now, which is a linear method. In this paper, we introduce a non-linear method based on neural networks combining simulation and measurement data via transfer learning. The network training accounts for the heterogeneity of the data, as simulation data usually features a high spatial resolution, while measurement data is sparse but more accurate. In a first step, the neural network is trained on simulation data to learn spatial features of the distributed quantities. The second step involves transfer learning on the measurement data to correct for systematic errors between simulation and measurement by only re-training a small subset of the entire neural network model. This approach is applied to a multilayer perceptron architecture and shows significant improvements over the established method based on proper orthogonal decomposition by producing more physical solutions near nonlinearities. In addition, the neural network provides solutions at arbitrary flow conditions, thus making the model useful for flight mechanical design, structural sizing, and certification. As the proposed training strategy is very general, it can also be applied to more complex neural network architectures in the future.

Article 634

Title@2025-07-28 (1): Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy

Title: Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy

Reminiszenz-Angriff auf Residuals: Ausnutzung der ungefähren Maschine Unlearning für die Privatsphäre

对残余物的重复记忆攻击:利用近似机器不学习促进隐私 2507.20573v1

Authors (8): Yaxin Xiao, Qingqing Ye, Li Hu, Huadi Zheng, Haibo Hu, Zi Liang, Haoyang Li, Yijie Jiao

Machine unlearning enables the removal of specific data from ML models to uphold the right to be forgotten. While approximate unlearning algorithms offer efficient alternatives to full retraining, this work reveals that they fail to adequately protect the privacy of unlearned data. In particular, these algorithms introduce implicit residuals which facilitate privacy attacks targeting unlearned data. We observe that these residuals persist regardless of model architectures, parameters, and unlearning algorithms, exposing a new attack surface beyond conventional output-based leakage. Based on this insight, we propose the Reminiscence Attack (ReA), which amplifies the correlation between residuals and membership privacy through targeted fine-tuning processes. ReA achieves up to 1.90x and 1.12x higher accuracy than prior attacks when inferring class-wise and sample-wise membership, respectively. To mitigate such residual-induced privacy risk, we develop a dual-phase approximate unlearning framework that first eliminates deep-layer unlearned data traces and then enforces convergence stability to prevent models from “pseudo-convergence”, where their outputs are similar to retrained models but still preserve unlearned residuals. Our framework works for both classification and generation tasks. Experimental evaluations confirm that our approach maintains high unlearning efficacy, while reducing the adaptive privacy attack accuracy to nearly random guess, at the computational cost of 2-12% of full retraining from scratch.

Article 635

Title@2025-07-28 (1): DAG-AFL:Directed Acyclic Graph-based Asynchronous Federated Learning

Title: DAG-AFL:Directed Acyclic Graph-based Asynchronous Federated Learning

DAG-AFL:Directed Acyclic Graph-based Asynchronous Federated Learning

DAG-AFL: 贫化的以环状图为基础的非同步联邦学习 2507.20571v1

Authors (7): Shuaipeng Zhang, Lanju Kong, Yixin Zhang, Wei He, Yongqing Zheng, Han Yu, Lizhen Cui

Due to the distributed nature of federated learning (FL), the vulnerability of the global model and the need for coordination among many client devices pose significant challenges. As a promising decentralized, scalable and secure solution, blockchain-based FL methods have attracted widespread attention in recent years. However, traditional blockchain-style consensus mechanisms such as Proof of Work (PoW) incur substantial resource consumption and compromise the efficiency of FL, particularly when participating devices are wireless and resource-limited. To address asynchronous client participation and data heterogeneity in FL, while limiting the additional resource overhead introduced by blockchain, we propose the Directed Acyclic Graph-based Asynchronous Federated Learning (DAG-AFL) framework. We develop a tip selection algorithm that considers temporal freshness, node reachability and model accuracy, with a DAG-based trusted verification strategy. Extensive experiments on three benchmark datasets against eight state-of-the-art approaches demonstrate that DAG-AFL significantly improves training efficiency and model accuracy by 22.7% and 6.5% on average, respectively.

Article 636

Title@2025-07-28 (1): CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Title: CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

CUDA-L1: Verbesserung der CUDA-Optimierung durch kontrastives Verstärkungslernen

CUDA-L1:通过反竞争强化学习改进CUDA优化 2507.14111v4

Authors (5): Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum

The exponential growth in demand for GPU computing resources has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, current SOTA models achieve low success rates in improving CUDA speed. In this paper, we introduce CUDA-L1, an automated reinforcement learning framework for CUDA optimization that employs a novel contrastive RL algorithm. CUDA-L1 achieves significant performance improvements on the CUDA optimization task: trained on NVIDIA A100, it delivers an average speedup of x3.12 with a median speedup of x1.42 across all 250 CUDA kernels of KernelBench, with peak speedups reaching x120. Furthermore, the model demonstrates portability across GPU architectures, achieving average speedups of x3.12 on L40, x2.50 on RTX 3090, x2.39 on H100, and x2.37 on H20 despite being optimized specifically for A100. The capabilities of CUDA-L1 demonstrate that RL can transform an initially poor-performing LLM into an effective CUDA optimizer through speedup-based reward signals alone, without human expertise or domain knowledge. This paradigm opens possibilities for automated optimization of CUDA operations, and holds promise to substantially promote GPU efficiency and alleviate the rising pressure on GPU computing resources. We also identify important challenges posed by training RL models for tasks like CUDA development, where RL often learns to exploit loopholes in reward functions rather than solve the intended optimization problems. By identifying these failure modes and analyzing their root causes, we develop practical methods for creating more robust training procedures that prevent reward hacking.

Article 637

Title@2025-07-28 (1): Statistical Inference for Differentially Private Stochastic Gradient Descent

Title: Statistical Inference for Differentially Private Stochastic Gradient Descent

Statistische Schlussfolgerung für unterschiedliche private stochastische Gradientenabstieg

不同私家私家稀有渐变后代的统计推推法 2507.20560v1

Authors (3): Xintao Xia, Linjun Zhang, Zhanrui Cai

Privacy preservation in machine learning, particularly through Differentially Private Stochastic Gradient Descent (DP-SGD), is critical for sensitive data analysis. However, existing statistical inference methods for SGD predominantly focus on cyclic subsampling, while DP-SGD requires randomized subsampling. This paper first bridges this gap by establishing the asymptotic properties of SGD under the randomized rule and extending these results to DP-SGD. For the output of DP-SGD, we show that the asymptotic variance decomposes into statistical, sampling, and privacy-induced components. Two methods are proposed for constructing valid confidence intervals: the plug-in method and the random scaling method. We also perform extensive numerical analysis, which shows that the proposed confidence intervals achieve nominal coverage rates while maintaining privacy.
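The distinction between cyclic and randomized subsampling can be sketched with a single DP-SGD update. The example below is a minimal illustration under stated assumptions: Poisson subsampling stands in for the "randomized rule", the loss is least squares, and the function names, clipping threshold and noise multiplier are hypothetical choices, not the paper's setup.

```python
import numpy as np

def clip_rows(grads, clip):
    """Scale each per-example gradient so its L2 norm is at most `clip`."""
    norms = np.linalg.norm(grads, axis=1)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    return grads * scale[:, None]

def dp_sgd_step(w, X, y, rng, sample_rate=0.2, lr=0.3, clip=5.0, noise_mult=1.0):
    """One DP-SGD step for the loss 0.5 * (x @ w - y)**2.

    Each example joins the batch independently with probability
    `sample_rate` (Poisson subsampling), instead of cycling through
    the data in a fixed order.
    """
    n = X.shape[0]
    mask = rng.random(n) < sample_rate
    Xb, yb = X[mask], y[mask]
    residual = Xb @ w - yb
    per_example = residual[:, None] * Xb            # one gradient per row
    clipped = clip_rows(per_example, clip)
    noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
    # Normalize by the expected batch size, a common convention.
    grad = (clipped.sum(axis=0) + noise) / (sample_rate * n)
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# Noise switched off here only to check the update rule; this run is NOT private.
w = np.zeros(3)
for _ in range(300):
    w = dp_sgd_step(w, X, y, rng, noise_mult=0.0)
```

With `noise_mult > 0` the iterates fluctuate around the solution, which is consistent with the abstract's decomposition of the asymptotic variance into statistical, sampling, and privacy-induced parts; the sketch does not attempt to estimate those components.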

Article 638

Title@2025-07-28 (1): The Effect of Data Poisoning on Counterfactual Explanations

Title: The Effect of Data Poisoning on Counterfactual Explanations

Die Auswirkung von Datenvergiftungen auf gegenfaktische Erklärungen

数据中毒对反事实解释的影响 2402.08290v4

Authors (4): André Artelt, Shubham Sharma, Freddy Lecué, Barbara Hammer

Counterfactual explanations are a widely used approach for examining the predictions of black-box systems. They can offer the opportunity for computational recourse by suggesting actionable changes on how to alter the input to obtain a different (i.e., more favorable) system output. However, recent studies have pointed out their susceptibility to various forms of manipulation. This work studies the vulnerability of counterfactual explanations to data poisoning. We formally introduce and investigate data poisoning in the context of counterfactual explanations for increasing the cost of recourse on three different levels: locally for a single instance, a sub-group of instances, or globally for all instances. In this context, we formally introduce and characterize data poisonings, from which we derive and investigate a general data poisoning mechanism. We demonstrate the impact of such data poisoning in the critical real-world application of explaining event detections in water distribution networks. Additionally, we conduct an extensive empirical evaluation, demonstrating that state-of-the-art counterfactual generation methods and toolboxes are vulnerable to such data poisoning. Furthermore, we find that existing defense methods fail to detect those poisonous samples.

Article 639

Title@2025-07-28 (1): Uncovering Gradient Inversion Risks in Practical Language Model Training

Title: Uncovering Gradient Inversion Risks in Practical Language Model Training

Uncovering Gradient Inversion Risiken in der praktischen Sprachmodellausbildung

实用语言示范培训中未覆盖的渐变风险 2507.21198v1

Authors (7): Xinguo Feng, Zhongkui Ma, Zihan Wang, Eu Joe Chegne, Mengyao Ma, Alsharif Abuadbba, Guangdong Bai

The gradient inversion attack has been demonstrated as a significant privacy threat to federated learning (FL), particularly in continuous domains such as vision models. In contrast, it is often considered less effective or highly dependent on impractical training settings when applied to language models, due to the challenges posed by the discrete nature of tokens in text data. As a result, its potential privacy threats remain largely underestimated, despite FL being an emerging training method for language models. In this work, we propose a domain-specific gradient inversion attack named Grab (gradient inversion with hybrid optimization). Grab features two alternating optimization processes to address the challenges caused by practical training settings, including a simultaneous optimization on dropout masks between layers for improved token recovery and a discrete optimization for effective token sequencing. Grab can recover a significant portion (up to 92.9% recovery rate) of the private training data, outperforming the attack strategy of utilizing discrete optimization with an auxiliary model by notable improvements of up to 28.9% recovery rate in benchmark settings and 48.5% recovery rate in practical settings. Grab provides a valuable step forward in understanding this privacy threat in the emerging FL training mode of language models.

Article 640

Title@2025-07-28 (1): Improving Group Fairness in Tensor Completion via Imbalance Mitigating Entity Augmentation

Title: Improving Group Fairness in Tensor Completion via Imbalance Mitigating Entity Augmentation

Verbesserung der Gruppengerechtigkeit in der Tensor-Vervollständigung durch Imbalance Mitigating Entity Augmentation

通过不平衡的减轻实体增长扩大,改善集团公平性 2507.20542v1

Authors (3): Dawon Ahn, Jun-Gi Jang, Evangelos E. Papalexakis

Group fairness is important to consider in tensor decomposition to prevent discrimination based on social grounds such as gender or age. Although a few works have studied group fairness in tensor decomposition, they suffer from performance degradation. To address this, we propose STAFF (Sparse Tensor Augmentation For Fairness) to improve group fairness by minimizing the gap in completion errors of different groups while reducing the overall tensor completion error. Our main idea is to augment a tensor with augmented entities including sufficient observed entries to mitigate imbalance and group bias in the sparse tensor. We evaluate STAFF on tensor completion with various datasets under conventional and deep learning-based tensor models. STAFF consistently shows the best trade-off between completion error and group fairness; at most, it yields 36% lower MSE and 59% lower MADE than the second-best baseline.

Article 641

Title@2025-07-28 (1): NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks

Title: NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks

NbBench: Benchmarking-Sprachenmodelle für umfassende Nanobody-Aufgaben

NbBench:全面纳米机构任务的语言模式基准 2505.02022v2

Authors (2): Yiming Zhang, Koji Tsuda

Nanobodies – single-domain antibody fragments derived from camelid heavy-chain-only antibodies – exhibit unique advantages such as compact size, high stability, and strong binding affinity, making them valuable tools in therapeutics and diagnostics. While recent advances in pretrained protein and antibody language models (PPLMs and PALMs) have greatly enhanced biomolecular understanding, nanobody-specific modeling remains underexplored and lacks a unified benchmark. To address this gap, we introduce NbBench, the first comprehensive benchmark suite for nanobody representation learning. Spanning eight biologically meaningful tasks across nine curated datasets, NbBench encompasses structure annotation, binding prediction, and developability assessment. We systematically evaluate eleven representative models – including general-purpose protein LMs, antibody-specific LMs, and nanobody-specific LMs – in a frozen setting. Our analysis reveals that antibody language models excel in antigen-related tasks, while performance on regression tasks such as thermostability and affinity remains challenging across all models. Notably, no single model consistently outperforms others across all tasks. By standardizing datasets, task definitions, and evaluation protocols, NbBench offers a reproducible foundation for assessing and advancing nanobody modeling.

Article 642

Title@2025-07-28 (1): Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes

Title: Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes

Action-Liste Verstärkungs-Lernsyndrom-Dekodierung für Binary Linear Block Codes

二元线性线性块块代码的标记 2507.17893v2

Authors (2): Milad Taghipour, Bane Vasic

This paper explores the application of reinforcement learning techniques to enhance the performance of decoding of linear block codes based on flipping bits and finding optimal decisions. We describe the methodology for mapping the iterative decoding process into Markov Decision Processes (MDPs) and propose different methods to reduce the number of states in the MDP. A truncated MDP is proposed to reduce the number of states in the MDP by learning a Hamming ball with a specified radius around codewords. We then propose a general scheme for reinforcement learning based decoders applicable to any class of codes to improve the performance of decoders. We call this scheme action-list decoding. We design an action-list decoder based on the Deep-Q network values that substantially enhances performance. We also exploit the automorphism group of the code to further improve decoding performance. Additionally, we propose a feedback-based method to exploit and enhance the performance of existing high-performing decoders by applying reinforcement learning algorithms after the existing decoders. These approaches effectively reduce the complexity of the reinforcement learning block. Finally, we present experimental results for the Low-Density Parity Check (LDPC) codes over the Binary Symmetric Channel (BSC) to demonstrate the efficiency of the proposed methods.
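The MDP view of bit-flipping decoding can be sketched on a toy code: the state is the syndrome, an action flips one bit, and the episode ends when the syndrome is zero. The example below is only a stand-in under stated assumptions. It uses the (7,4) Hamming code rather than LDPC codes, and a greedy rule (pick the flip that minimizes the next syndrome weight) replaces the learned Deep-Q values; the function names are hypothetical.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code; column j is the binary
# representation of j + 1, so all columns are distinct and nonzero.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

def syndrome(word):
    """MDP state: the syndrome H @ word over GF(2)."""
    return (H @ word) % 2

def greedy_flip_decode(received, max_flips=3):
    """Flip one bit per step until the syndrome is zero.

    A greedy score replaces the learned Q-values: flipping bit j changes
    the syndrome s to s XOR H[:, j], and we pick the j with the lowest
    resulting syndrome weight.
    """
    word = received.copy()
    for _ in range(max_flips):
        s = syndrome(word)
        if not s.any():
            break                                   # terminal state reached
        next_weights = (s[:, None] ^ H).sum(axis=0)
        word[int(np.argmin(next_weights))] ^= 1     # action: flip that bit
    return word

codeword = np.array([1, 1, 1, 0, 0, 0, 0], dtype=np.uint8)
noisy = codeword.copy()
noisy[4] ^= 1                                       # one error from the channel
decoded = greedy_flip_decode(noisy)
```

For a single error the syndrome equals the column of the flipped bit, so exactly one action returns the state to zero. A learned policy matters for longer codes and heavier error patterns, where greedy syndrome-weight descent can get stuck.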

Article 643

Title@2025-07-28 (1): MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Title: MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

MagicMotion: Kontrollierbare Video-Generation mit Dense-to-Spar-Trajektorie-Anleitung

魔力运动:可控视频生成并配有高到分轨迹指导 2503.16421v2

Authors (6): Quanhao Li, Zhen Xing, Rui Wang, Hui Zhang, Qi Dai, Zuxuan Wu

Recent advances in video generation have led to remarkable improvements in visual quality and temporal coherence. Building on this, trajectory-controllable video generation has emerged to enable precise object motion control through explicitly defined spatial paths. However, existing methods struggle with complex object movements and multi-object motion control, resulting in imprecise trajectory adherence, poor object consistency, and compromised visual quality. Furthermore, these methods only support trajectory control in a single format, limiting their applicability in diverse scenarios. Additionally, there is no publicly available dataset or benchmark specifically tailored for trajectory-controllable video generation, hindering robust training and systematic evaluation. To address these challenges, we introduce MagicMotion, a novel image-to-video generation framework that enables trajectory control through three levels of conditions from dense to sparse: masks, bounding boxes, and sparse boxes. Given an input image and trajectories, MagicMotion seamlessly animates objects along defined trajectories while maintaining object consistency and visual quality. Furthermore, we present MagicData, a large-scale trajectory-controlled video dataset, along with an automated pipeline for annotation and filtering. We also introduce MagicBench, a comprehensive benchmark that assesses both video quality and trajectory control accuracy across different numbers of objects. Extensive experiments demonstrate that MagicMotion outperforms previous methods across various metrics. Our project page is publicly available at https://quanhaol.github.io/magicmotion-site.

Article 644

Title@2025-07-28 (1): Kimi K2: Open Agentic Intelligence

Title: Kimi K2: Open Agentic Intelligence

Kimi K2: Offene Agentische Intelligenz

Kimi K2:开放特工情报 2507.20534v1

Authors (169): Kimi Team, Yifan Bai, Yiping Bao, Guanduo Chen, Jiahao Chen, Ningxin Chen, Ruijue Chen, Yanru Chen, Yuankun Chen, Yutian Chen, Zhuofu Chen, Jialei Cui, Hao Ding, Mengnan Dong, Angang Du, Chenzhuang Du, Dikang Du, Yulun Du, Yu Fan, Yichen Feng, Kelin Fu, Bofei Gao, Hongcheng Gao, Peizhong Gao, Tong Gao, Xinran Gu, Longyu Guan, Haiqing Guo, Jianhang Guo, Hao Hu, Xiaoru Hao, Tianhong He, Weiran He, Wenyang He, Chao Hong, Yangyang Hu, Zhenxing Hu, Weixiao Huang, Zhiqi Huang, Zihao Huang, Tao Jiang, Zhejun Jiang, Xinyi Jin, Yongsheng Kang, Guokun Lai, Cheng Li, Fang Li, Haoyang Li, Ming Li, Wentao Li, Yanhao Li, Yiwei Li, Zhaowei Li, Zheming Li, Hongzhan Lin, Xiaohan Lin, Zongyu Lin, Chengyin Liu, Chenyu Liu, Hongzhang Liu, Jingyuan Liu, Junqi Liu, Liang Liu, Shaowei Liu, T. Y. Liu, Tianwei Liu, Weizhou Liu, Yangyang Liu, Yibo Liu, Yiping Liu, Yue Liu, Zhengying Liu, Enzhe Lu, Lijun Lu, Shengling Ma, Xinyu Ma, Yingwei Ma, Shaoguang Mao, Jie Mei, Xin Men, Yibo Miao, Siyuan Pan, Yebo Peng, Ruoyu Qin, Bowen Qu, Zeyu Shang, Lidong Shi, Shengyuan Shi, Feifan Song, Jianlin Su, Zhengyuan Su, Xinjie Sun, Flood Sung, Heyi Tang, Jiawen Tao, Qifeng Teng, Chensi Wang, Dinglu Wang, Feng Wang, Haiming Wang, Jianzhou Wang, Jiaxing Wang, Jinhong Wang, Shengjie Wang, Shuyi Wang, Yao Wang, Yejie Wang, Yiqin Wang, Yuxin Wang, Yuzhi Wang, Zhaoji Wang, Zhengtao Wang, Zhexu Wang, Chu Wei, Qianqian Wei, Wenhao Wu, Xingzhe Wu, Yuxin Wu, Chenjun Xiao, Xiaotong Xie, Weimin Xiong, Boyu Xu, Jing Xu, Jinjing Xu, L. H. Xu, Lin Xu, Suting Xu, Weixin Xu, Xinran Xu, Yangchuan Xu, Ziyao Xu, Junjie Yan, Yuzi Yan, Xiaofei Yang, Ying Yang, Zhen Yang, Zhilin Yang, Zonghan Yang, Haotian Yao, Xingcheng Yao, Wenjie Ye, Zhuorui Ye, Bohong Yin, Longhui Yu, Enming Yuan, Hongbang Yuan, Mengjie Yuan, Haobing Zhan, Dehao Zhang, Hao Zhang, Wanlu Zhang, Xiaobin Zhang, Yangkun Zhang, Yizhi Zhang, Yongting Zhang, Yu Zhang, Yutao Zhang, Yutong Zhang, Zheng Zhang, Haotian Zhao, Yikai Zhao, Huabin Zheng, Shaojie Zheng, Jianren Zhou, Xinyu Zhou, Zaida Zhou, Zhen Zhu, Weiyu Zhuang, Xinxing Zu

We introduce Kimi K2, a Mixture-of-Experts (MoE) large language model with 32 billion activated parameters and 1 trillion total parameters. We propose the MuonClip optimizer, which improves upon Muon with a novel QK-clip technique that addresses training instability while retaining Muon's token efficiency. Using MuonClip, K2 was pre-trained on 15.5 trillion tokens with zero loss spikes. K2 then undergoes a multi-stage post-training process, highlighted by a large-scale agentic data synthesis pipeline and a joint reinforcement learning (RL) stage in which the model improves its capabilities through interactions with real and synthetic environments. Kimi K2 achieves state-of-the-art performance among open-source non-thinking models, with strengths in agentic capabilities. Notably, K2 obtains 66.1 on Tau2-Bench, 76.5 on ACEBench (En), 65.8 on SWE-Bench Verified, and 47.3 on SWE-Bench Multilingual, surpassing most open- and closed-source baselines in non-thinking settings. It also exhibits strong capabilities in coding, mathematics, and reasoning tasks, with scores of 53.7 on LiveCodeBench v6, 49.5 on AIME 2025, 75.1 on GPQA-Diamond, and 27.1 on OJBench, all without extended thinking. These results position Kimi K2 as one of the most capable open-source large language models to date, particularly in software engineering and agentic tasks. We release our base and post-trained model checkpoints to facilitate future research and applications of agentic intelligence.

Article 645

Title@2025-07-28 (1): Kernel Learning for Sample Constrained Black-Box Optimization

Title: Kernel Learning for Sample Constrained Black-Box Optimization

Kernel-Lernen für Probe eingeschränkte Black-Box-Optimierung

用于样本的内核学习 2507.20533v1

Authors (3): Rajalaxmi Rajagopalan, Yu-Lin Wei, Romit Roy Choudhury

Black box optimization (BBO) focuses on optimizing unknown functions in high-dimensional spaces. In many applications, sampling the unknown function is expensive, imposing a tight sample budget. Ongoing work is making progress on reducing the sample budget by learning the shape/structure of the function, known as kernel learning. We propose a new method to learn the kernel of a Gaussian Process. Our idea is to create a continuous kernel space in the latent space of a variational autoencoder, and run an auxiliary optimization to identify the best kernel. Results show that the proposed method, Kernel Optimized Blackbox Optimization (KOBO), outperforms the state of the art by estimating the optimum at considerably lower sample budgets. Results hold not only across synthetic benchmark functions but also in real applications. We show that a hearing aid may be personalized with fewer audio queries to the user, or a generative model could converge to desirable images from limited user ratings.

Article 646

Title@2025-07-28 (1): Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards

Title: Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards

Versehentliche Sicherheitslücke: Faktoren bei Feinsteuerung, die das Modell schützen

意外脆弱性:改变模式保障保障措施的微调因素 2505.16789v2

Authors (4): Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin

As large language models (LLMs) gain popularity, their vulnerability to adversarial attacks emerges as a primary concern. While fine-tuning models on domain-specific datasets is often employed to improve model performance, it can inadvertently introduce vulnerabilities within the underlying model. In this work, we investigate Accidental Vulnerability, unexpected vulnerabilities arising from characteristics of fine-tuning data. We begin by identifying potential correlation factors such as linguistic features, semantic similarity, and toxicity across multiple experimental datasets. We then evaluate the adversarial robustness of these fine-tuned models, analyzing persona shifts and interpretability traits to understand how dataset factors contribute to attack success rates. Lastly, we explore causal relationships that offer new insights into adversarial defense strategies, highlighting the crucial role of dataset design in preserving model alignment. Our code is available at https://github.com/psyonp/accidental_vulnerability.

Article 647

Title@2025-07-28 (1): AQUA: A Large Language Model for Aquaculture & Fisheries

Title: AQUA: A Large Language Model for Aquaculture & Fisheries

AQUA: Ein großes Sprachmodell für Aquakultur und Fischerei

AQUA:水产养殖和渔业大语言模式 2507.20520v1

Authors (7): Praneeth Narisetty, Uday Kumar Reddy Kattamanchi, Lohit Akshant Nimma, Sri Ram Kaushik Karnati, Shiva Nagendra Babu Kore, Mounika Golamari, Tejashree Nageshreddy

Aquaculture plays a vital role in global food security and coastal economies by providing sustainable protein sources. As the industry expands to meet rising demand, it faces growing challenges such as disease outbreaks, inefficient feeding practices, rising labor costs, logistical inefficiencies, and critical hatchery issues, including high mortality rates and poor water quality control. Although artificial intelligence has made significant progress, existing machine learning methods fall short of addressing the domain-specific complexities of aquaculture. To bridge this gap, we introduce AQUA, the first large language model (LLM) tailored for aquaculture, designed to support farmers, researchers, and industry practitioners. Central to this effort is AQUADAPT (Data Acquisition, Processing and Tuning), an agentic framework for generating and refining high-quality synthetic data using a combination of expert knowledge, large-scale language models, and automated evaluation techniques. Our work lays the foundation for LLM-driven innovations in aquaculture research, advisory systems, and decision-making tools.

Article 648

Title@2025-07-28 (1): Geometric Representation Condition Improves Equivariant Molecule Generation

Title: Geometric Representation Condition Improves Equivariant Molecule Generation

Geometrische Darstellung verbessert Gleichwertige Molekülerzeugung

条件改善等异分子生成 2410.03655v4

Authors (5): Zian Li, Cai Zhou, Xiyuan Wang, Xingang Peng, Muhan Zhang

Recent advances in molecular generative models have demonstrated great promise for accelerating scientific discovery, particularly in drug design. However, these models often struggle to generate high-quality molecules, especially in conditional scenarios where specific molecular properties must be satisfied. In this work, we introduce GeoRCG, a general framework to improve molecular generative models by integrating geometric representation conditions with provable theoretical guarantees. We decompose the generation process into two stages: first, generating an informative geometric representation; second, generating a molecule conditioned on the representation. Compared with single-stage generation, the easy-to-generate representation in the first stage guides the second-stage generation toward a high-quality molecule in a goal-oriented way. Leveraging EDM and SemlaFlow as base generators, we observe significant quality improvements in unconditional molecule generation on the widely used QM9 and GEOM-DRUG datasets. More notably, in the challenging conditional molecular generation task, our framework achieves an average 50% performance improvement over state-of-the-art approaches, highlighting the superiority of conditioning on semantically rich geometric representations. Furthermore, with such representation guidance, the number of diffusion steps can be reduced to as few as 100 while largely preserving the generation quality achieved with 1,000 steps, thereby significantly reducing the number of generation iterations needed. Code is available at https://github.com/GraphPKU/GeoRCG.

Article 649

Title@2025-07-28 (1): Guide your favorite protein sequence generative model

Title: Guide your favorite protein sequence generative model

Führen Sie Ihre Lieblings-Protein-Sequenz generative Modell

指导您最喜爱的蛋白质序列基因模型 2505.04823v3

Authors (7): Junhao Xiong, Hunter Nisonoff, Maria Lukarska, Ishan Gaur, Luke M. Oltrogge, David F. Savage, Jennifer Listgarten

Generative machine learning models on sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experimental data, in a plug-and-play manner. Herein, we present ProteinGuide – a principled and general method for conditioning – by unifying a broad class of protein generative models under a single framework. We demonstrate the applicability of ProteinGuide by guiding two protein generative models, ProteinMPNN and ESM3, to generate amino acid and structure token sequences, conditioned on several user-specified properties such as enhanced stability, enzyme classes, and CATH-labeled folds. We also used ProteinGuide with inverse folding models and our own experimental assay to design adenine base editor sequences for high activity.

Article 650

Title@2025-07-28 (1): Efficient Proxy Raytracer for Optical Systems using Implicit Neural Representations

Title: Efficient Proxy Raytracer for Optical Systems using Implicit Neural Representations

Effizienter Proxy Raytracer für optische Systeme mit impliziten Neuraldarstellungen

使用隐性神经仪表的光学系统 2507.20513v1

Authors (4): Shiva Sinaei, Chuanjun Zheng, Kaan Akşit, Daisuke Iwai

Ray tracing is a widely used technique for modeling optical systems, involving sequential surface-by-surface computations, which can be computationally intensive. We propose Ray2Ray, a novel method that leverages implicit neural representations to model optical systems with greater efficiency, eliminating the need for surface-by-surface computations in a single-pass, end-to-end model. Ray2Ray learns the mapping between rays emitted from a given source and their corresponding rays after passing through a given optical system in a physically accurate manner. We train Ray2Ray on nine off-the-shelf optical systems, achieving positional errors on the order of 1 $\mu$m and angular deviations on the order of 0.01 degrees in the estimated output rays. Our work highlights the potential of neural representations as a proxy for optical raytracers.

Article 651

Title@2025-07-28 (1): Tensor Completion with Nearly Linear Samples Given Weak Side Information

Title: Tensor Completion with Nearly Linear Samples Given Weak Side Information

Tensor-Vervollständigung mit fast linearen Proben bei schwachen Seiteninformationen

由于侧面信息薄弱, Tensor 完成近线性样本的 Tensor 完成 2007.00736v4

Authors (2): Christina Lee Yu, Xumei Xi

Tensor completion exhibits an interesting computational-statistical gap in terms of the number of samples needed to perform tensor estimation. While there are only $\Theta(tn)$ degrees of freedom in a $t$-order tensor with $n^t$ entries, the best known polynomial time algorithm requires $O(n^{t/2})$ samples in order to guarantee consistent estimation. In this paper, we show that weak side information is sufficient to reduce the sample complexity to $O(n)$. The side information consists of a weight vector for each of the modes which is not orthogonal to any of the latent factors along that mode; this is significantly weaker than assuming noisy knowledge of the subspaces. We provide an algorithm that utilizes this side information to produce a consistent estimator with $O(n^{1+\kappa})$ samples for any small constant $\kappa > 0$. We also provide experiments on both synthetic and real-world datasets that validate our theoretical insights.

Article 652

Title@2025-07-28 (1): Attributed Graph Clustering with Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning

Title: Attributed Graph Clustering with Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning

Zugeschriebene Graphen-Clustering mit Multi-Scale Gewicht-basiert paarweise Coarsening und Kontrastives Lernen

与多比额表基于重量的对称相对宽度分析和差异性学习组合在一起的属性图 2507.20505v1

Authors (10): Binxiong Li, Yuefei Wang, Binyu Zhao, Heyang Gao, Benhan Yang, Quanzhou Luo, Xue Li, Xu Xiang, Yujie Liu, Huijie Tang

This study introduces the Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning (MPCCL) model, a novel approach for attributed graph clustering that addresses critical gaps in existing methods, including long-range dependency, feature collapse, and information loss. Traditional methods often struggle to capture high-order graph features due to their reliance on low-order attribute information, while contrastive learning techniques face limitations in feature diversity by overemphasizing local neighborhood structures. Similarly, conventional graph coarsening methods, though reducing graph scale, frequently lose fine-grained structural details. MPCCL addresses these challenges through a multi-scale coarsening strategy, which progressively condenses the graph while prioritizing the merging of key edges based on global node similarity to preserve essential structural information. It further introduces a one-to-many contrastive learning paradigm, integrating node embeddings with augmented graph views and cluster centroids to enhance feature diversity, while mitigating feature masking issues caused by the accumulation of high-frequency node weights during multi-scale coarsening. By incorporating a graph reconstruction loss and KL divergence into its self-supervised learning framework, MPCCL ensures cross-scale consistency of node representations. Experimental evaluations reveal that MPCCL achieves a significant improvement in clustering performance, including a 15.24% increase in NMI on the ACM dataset and robust gains on smaller-scale datasets such as Citeseer, Cora and DBLP.

Article 653

Title@2025-07-28 (1): Prover Agent: An Agent-based Framework for Formal Mathematical Proofs

Title: Prover Agent: An Agent-based Framework for Formal Mathematical Proofs

Prover Agent: Ein agentenbasiertes Framework für formale mathematische Nachweise

以代理人为基础的正式数学证明框架 2506.19923v2

Authors (4): Kaito Baba, Chaoran Liu, Shuhei Kurita, Akiyoshi Sannai

We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas to assist in discovering the overall proof strategy. It achieves an 86.1% success rate on the MiniF2F benchmark, establishing a new state-of-the-art among methods using small language models (SLMs) with a much lower sample budget than previous approaches. We also present case studies illustrating how these generated lemmas contribute to solving challenging problems.

Article 654

Title@2025-07-28 (1): REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models

Title: REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models

REINFORCE++: Effizienter RLHF-Algorithmus mit Robustheit sowohl für Prompt- als auch für Reward-Modelle

REINFORCE++: 高效的RLHF对快速模型和奖励模型具有强力的测算法 2501.03262v7

Authors (4): Jian Hu, Jason Klein Liu, Haotian Xu, Wei Shen

Reinforcement Learning from Human Feedback (RLHF) plays a crucial role in aligning large language models (LLMs) with human values and preferences. While state-of-the-art applications like ChatGPT or GPT-4 commonly employ Proximal Policy Optimization (PPO), the inclusion of a critic network introduces significant computational overhead. REINFORCE-based methods, such as REINFORCE Leave-One-Out (RLOO), ReMax, and Group Relative Policy Optimization (GRPO), address this limitation by eliminating the critic network. However, these approaches face challenges in accurate advantage estimation. Specifically, they estimate advantages independently for the responses to each prompt, which can yield biased estimates, overfitting on simpler prompts, and vulnerability to reward hacking. To address these challenges, we introduce REINFORCE++, a novel approach that removes the critic model and instead uses global advantage normalization, which is unbiased, to improve training stability. Our empirical evaluation demonstrates that REINFORCE++ exhibits robust performance across various reward models without requiring prompt set truncation. Furthermore, it achieves superior generalization in both RLHF and long chain-of-thought (CoT) settings compared to existing REINFORCE-based methods. The implementation is available at https://github.com/OpenRLHF/OpenRLHF.
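
The contrast between per-prompt and global advantage normalization can be illustrated with a minimal sketch. It assumes one scalar reward per sampled response and ignores KL penalties and token-level details; the function names and toy rewards are hypothetical and are not taken from the OpenRLHF implementation.

```python
from statistics import mean, pstdev

def per_prompt_advantages(rewards_by_prompt, eps=1e-8):
    """Normalize rewards separately inside each prompt's group of responses."""
    out = []
    for group in rewards_by_prompt:
        mu, sigma = mean(group), pstdev(group)
        out.append([(r - mu) / (sigma + eps) for r in group])
    return out

def global_advantages(rewards_by_prompt, eps=1e-8):
    """Normalize rewards with the mean and std of the whole batch (assumed form)."""
    flat = [r for group in rewards_by_prompt for r in group]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(r - mu) / (sigma + eps) for r in group] for group in rewards_by_prompt]

# Toy batch: prompt 0 has nearly identical rewards, prompt 1 has a wide spread.
rewards = [[0.9, 1.0, 1.1], [0.0, 1.0, 2.0]]
local_adv = per_prompt_advantages(rewards)
global_adv = global_advantages(rewards)
```

In this toy batch, per-prompt normalization stretches the tiny reward differences of the first prompt to the same scale as the second prompt, while batch-level statistics keep them small.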

Article 655

Title@2025-07-28 (1): Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Title: Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Lernen zu lernen, während man aufhält: Gradientenkonflikte im maschinellen Lernen bekämpfen

学习在保存时不学习 : 在机器不学习中对抗渐变冲突 2503.06339v2

Authors (2): Gaurav Patel, Qiang Qiu

Machine Unlearning has recently garnered significant attention, aiming to selectively remove knowledge associated with specific data while preserving the model’s performance on the remaining data. A fundamental challenge in this process is balancing effective unlearning with knowledge retention, as naive optimization of these competing objectives can lead to conflicting gradients, hindering convergence and degrading overall performance. To address this issue, we propose Learning to Unlearn while Retaining, which aims to mitigate gradient conflicts between unlearning and retention objectives. Our approach strategically avoids conflicts through an implicit gradient regularization mechanism that emerges naturally within the proposed framework. This prevents conflicting gradients between unlearning and retention, leading to effective unlearning while preserving the model’s utility. We validate our approach across both discriminative and generative tasks, demonstrating its effectiveness in achieving unlearning without compromising performance on remaining data. Our results highlight the advantages of avoiding such gradient conflicts, outperforming existing methods that fail to account for these interactions.
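
As an illustration of what a gradient conflict means here, the sketch below checks whether two gradient vectors point in opposing directions. It only shows the detection step under the common definition (negative inner product); it is not the paper's implicit regularization mechanism, and the vectors and function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two gradient vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def in_conflict(g_forget, g_retain):
    """Gradients conflict when their inner product is negative: to first order,
    a descent step on one objective then increases the other."""
    return sum(a * b for a, b in zip(g_forget, g_retain)) < 0.0

g_forget = [1.0, -2.0]   # toy gradient of the unlearning objective
g_retain = [-1.0, 1.0]   # toy gradient of the retention objective
conflict = in_conflict(g_forget, g_retain)
similarity = cosine(g_forget, g_retain)
```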

Article 656

Title: Customize Multi-modal RAI Guardrails with Precedent-based predictions

Multimodale RAI-Guardrails mit vorausschauenden Vorhersagen anpassen

定制具有先例预测的多式RAI护卫车 2507.20503v1

Authors (6): Cheng-Fu Yang, Thanh Tran, Christos Christodoulopoulos, Weitong Ruan, Rahul Gupta, Kai-Wei Chang

A multi-modal guardrail must effectively filter image content based on user-defined policies, identifying material that may be hateful, reinforce harmful stereotypes, contain explicit material, or spread misinformation. Deploying such guardrails in real-world applications, however, poses significant challenges. Users often require varied and highly customizable policies and typically cannot provide abundant examples for each custom policy. Consequently, an ideal guardrail should be scalable to multiple policies and adaptable to evolving user standards with minimal retraining. Existing fine-tuning methods typically condition predictions on pre-defined policies, restricting their generalizability to new policies or necessitating extensive retraining to adapt. Conversely, training-free methods struggle with limited context lengths, making it difficult to incorporate all the policies comprehensively. To overcome these limitations, we propose to condition the model’s judgment on “precedents”, which are the reasoning processes of prior data points similar to the given input. By leveraging precedents instead of fixed policies, our approach greatly enhances the flexibility and adaptability of the guardrail. In this paper, we introduce a critique-revise mechanism for collecting high-quality precedents and two strategies that utilize precedents for robust prediction. Experimental results demonstrate that our approach outperforms previous methods across both few-shot and full-dataset scenarios and exhibits superior generalization to novel policies.

Article 657

Title@2025-07-28 (1): DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning

Title: DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning

DmC: Nächstgelegenes Orientierungs-Diffusionsmodell für Offline-Querdomain-Verstärkungs-Lernen

DMC: 近邻教育指导离线跨领域强化学习推广模式 2507.20499v1

Authors (6): Linh Le Pham Van, Minh Hoang Nguyen, Duc Kieu, Hung Le, Hung The Tran, Sunil Gupta

Cross-domain offline reinforcement learning (RL) seeks to enhance sample efficiency in offline RL by utilizing additional offline source datasets. A key challenge is to identify and utilize source samples that are most relevant to the target domain. Existing approaches address this challenge by measuring domain gaps through domain classifiers, target transition dynamics modeling, or mutual information estimation using contrastive loss. However, these methods often require large target datasets, which is impractical in many real-world scenarios. In this work, we address cross-domain offline RL under a limited target data setting, identifying two primary challenges: (1) Dataset imbalance, which is caused by large source and small target datasets and leads to overfitting in neural network-based domain gap estimators, resulting in uninformative measurements; and (2) Partial domain overlap, where only a subset of the source data is closely aligned with the target domain. To overcome these issues, we propose DmC, a novel framework for cross-domain offline RL with limited target samples. Specifically, DmC utilizes $k$-nearest neighbor ($k$-NN) based estimation to measure domain proximity without neural network training, effectively mitigating overfitting. Then, by utilizing this domain proximity, we introduce a nearest-neighbor-guided diffusion model to generate additional source samples that are better aligned with the target domain, thus enhancing policy learning with more effective source samples. Through theoretical analysis and extensive experiments in diverse MuJoCo environments, we demonstrate that DmC significantly outperforms state-of-the-art cross-domain offline RL methods, achieving substantial performance gains.
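
The idea of measuring domain proximity with $k$-NN instead of a trained estimator can be sketched as follows. This is a minimal illustration, assuming transitions are flattened into numeric tuples; the score $1/(1+d_k)$ and all names are hypothetical choices made here, not the estimator defined in the paper.

```python
import math

def knn_distance(query, target_points, k=3):
    """Euclidean distance from `query` to its k-th nearest neighbour in the target data."""
    dists = sorted(math.dist(query, t) for t in target_points)
    return dists[min(k, len(dists)) - 1]

def proximity_scores(source_points, target_points, k=3):
    """Score each source sample in (0, 1]; higher means closer to the target data.
    No model is trained, so there is nothing to overfit on a small target set."""
    return [1.0 / (1.0 + knn_distance(s, target_points, k)) for s in source_points]

# Toy data: a small target dataset and two source samples (one near, one far).
target = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
source = [(0.05, 0.05), (2.0, 2.0)]
scores = proximity_scores(source, target, k=2)
```

Scores like these could then be used to weight or filter source samples, or, as the abstract describes, to guide a generative model toward target-aligned samples.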

Article 658

Title@2025-07-28 (1): Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning

Title: Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning

Mischung aus Länge und Pruning Experten für Wissensgraphen Reasoning

知识图解释理由的长长和缓冲专家混合 2507.20498v1

Authors (3): Enjun Du, Siyi Liu, Yongqi Zhang

Knowledge Graph (KG) reasoning, which aims to infer new facts from structured knowledge repositories, plays a vital role in Natural Language Processing (NLP) systems. Its effectiveness critically depends on constructing informative and contextually relevant reasoning paths. However, existing graph neural networks (GNNs) often adopt rigid, query-agnostic path-exploration strategies, limiting their ability to adapt to diverse linguistic contexts and semantic nuances. To address these limitations, we propose MoKGR, a mixture-of-experts framework that personalizes path exploration through two complementary components: (1) a mixture of length experts that adaptively selects and weights candidate path lengths according to query complexity, providing query-specific reasoning depth; and (2) a mixture of pruning experts that evaluates candidate paths from a complementary perspective, retaining the most informative paths for each query. Through comprehensive experiments on diverse benchmarks, MoKGR demonstrates superior performance in both transductive and inductive settings, validating the effectiveness of personalized path exploration in KG reasoning.

Article 659

Title@2025-07-28 (1): Classification of high-dimensional data with spiked covariance matrix structure

Title: Classification of high-dimensional data with spiked covariance matrix structure

Klassifizierung von hochdimensionalen Daten mit spiked Kovarianz-Matrix-Struktur

高维数据分类和加压共变矩阵结构 2110.01950v2

Authors (2): Yin-Jen Chen, Minh Tang

We study the classification problem for high-dimensional data with $n$ observations on $p$ features where the $p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalue structure and the vector $\zeta$, given by the difference between the whitened mean vectors, is sparse with sparsity at most $s$. We propose an adaptive classifier (adaptive with respect to the sparsity $s$) that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space, i.e., the classifier whitens the data, then screens the features by keeping only those corresponding to the $s$ largest coordinates of $\zeta$, and finally applies the Fisher linear discriminant to the selected features. Leveraging recent results on entrywise matrix perturbation bounds for covariance matrices, we show that the resulting classifier is Bayes optimal whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow 0$. Experimental results on real and synthetic data sets indicate that the proposed classifier is competitive with existing state-of-the-art methods while also selecting a smaller number of features.
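
The whiten, screen, and classify steps can be written compactly. This is a sketch under assumptions not stated in the abstract: equal class priors, class means $\mu_1$ and $\mu_2$, the sign convention $\zeta = \Sigma^{-1/2}(\mu_1 - \mu_2)$, and screening by the absolute value of the coordinates.

$$
\tilde{x} = \Sigma^{-1/2}\Bigl(x - \tfrac{1}{2}(\mu_1 + \mu_2)\Bigr), \qquad
S = \{\, j : |\zeta_j| \text{ is among the } s \text{ largest} \,\}, \qquad
\text{assign } x \text{ to class 1 if } \sum_{j \in S} \zeta_j \tilde{x}_j > 0 .
$$

With $S = \{1, \dots, p\}$ this reduces to the usual Fisher rule $(\mu_1 - \mu_2)^\top \Sigma^{-1}\bigl(x - \tfrac{1}{2}(\mu_1 + \mu_2)\bigr) > 0$; in practice $\Sigma$, $\mu_1$, and $\mu_2$ would be replaced by estimates.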

Article 660

Title@2025-07-28 (1): Position: Untrained Machine Learning for Anomaly Detection by using 3D Point Cloud Data

Title: Position: Untrained Machine Learning for Anomaly Detection by using 3D Point Cloud Data

Position: Untrainiertes maschinelles Lernen zur Erkennung von Anomalien durch Verwendung von 3D-Punkt-Cloud-Daten

位置: 使用 3D 点云数据进行异常检测的未经训练的机器学习 2502.03876v3

Authors (2): Juan Du, Dongheng Chen

Anomaly detection based on 3D point cloud data is an important research problem that has received increasing attention recently. Untrained anomaly detection based on only one sample is an emerging research problem motivated by real manufacturing industries such as personalized manufacturing, where only one sample can be collected without any additional labels or historical datasets. Identifying anomalies accurately based on one 3D point cloud sample is a critical challenge in both industrial applications and the field of machine learning. This paper aims to provide a formal definition of the untrained anomaly detection problem based on 3D point cloud data and to discuss the differences between untrained anomaly detection and current unsupervised anomaly detection problems. Unlike trained unsupervised learning, untrained unsupervised learning does not rely on any data, including unlabeled data. Instead, it leverages prior knowledge about the surfaces and anomalies. We propose three complementary methodological frameworks: the Latent Variable Inference Framework that employs probabilistic modeling to distinguish anomalies; the Decomposition Framework that separates point clouds into reference, anomaly, and noise components through sparse learning; and the Local Geometry Framework that leverages neighborhood information for anomaly identification. Experimental results demonstrate that untrained methods achieve competitive detection performance while offering significant computational advantages, with up to a 15-fold increase in execution speed. The proposed methods provide viable solutions for scenarios with extreme data scarcity, addressing critical challenges in personalized manufacturing and healthcare applications where collecting multiple samples or historical data is infeasible.

Article 661

Title@2025-07-28 (1): A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization

Title: A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization

Eine neue Methode zur zufälligen Reshuffling für ungenügende Nicht-Konvex-Finite-Summe-Optimierung

用于非移动非convelx Finite- 和优化的新随机调整方法 2312.01047v3

Authors (3): Junwen Qiu, Xiao Li, Andre Milzarek

Random reshuffling techniques are prevalent in large-scale applications, such as training neural networks. While the convergence and acceleration effects of random reshuffling-type methods are fairly well understood in the smooth setting, far fewer studies seem to be available in the nonsmooth case. In this work, we design a new normal map-based proximal random reshuffling (norm-PRR) method for nonsmooth nonconvex finite-sum problems. We show that norm-PRR achieves the iteration complexity ${\cal O}(n^{-1/3}T^{-2/3})$ where $n$ denotes the number of component functions $f(\cdot,i)$ and $T$ counts the total number of iterations. This improves the currently known complexity bounds for this class of problems by a factor of $n^{-1/3}$ in terms of the number of gradient evaluations. Additionally, we prove that norm-PRR converges linearly under the (global) Polyak-Łojasiewicz condition and in the interpolation setting. We further complement these non-asymptotic results and provide an in-depth analysis of the asymptotic properties of norm-PRR. Specifically, under the (local) Kurdyka-Łojasiewicz inequality, the whole sequence of iterates generated by norm-PRR is shown to converge to a single stationary point. Moreover, we derive last-iterate convergence rates that can match those in the smooth, strongly convex setting. Finally, numerical experiments are performed on nonconvex classification tasks to illustrate the efficiency of the proposed approach.
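
For readers unfamiliar with the setting, the sketch below shows a plain epoch-wise proximal random reshuffling loop on a toy one-dimensional problem. It is explicitly not norm-PRR (the normal-map update is not reproduced here); the problem, step size, and function names are assumptions chosen only to illustrate sampling without replacement combined with a proximal step.

```python
import random

def soft_threshold(x, t):
    """Proximal operator of t * |x|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def prox_rr(a, lam, step, epochs, seed=0):
    """Minimise (1/n) * sum_i 0.5 * (x - a[i])**2 + lam * |x| with epoch-wise proximal RR."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)                     # fresh permutation each epoch (no replacement)
        for i in order:
            x -= step * (x - a[i])             # gradient step on component i only
        x = soft_threshold(x, n * step * lam)  # one proximal step per epoch
    return x

# The exact minimiser is soft_threshold(mean(a), lam) = soft_threshold(2.5, 0.5) = 2.0.
x_hat = prox_rr([1.0, 2.0, 3.0, 4.0], lam=0.5, step=0.01, epochs=2000)
```

With a constant step size the iterate settles in a small neighbourhood of the minimiser rather than converging exactly, which is the usual behaviour of reshuffling-type methods.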

Article 662

Title@2025-07-28 (1): Deep Reputation Scoring in DeFi: zScore-Based Wallet Ranking from Liquidity and Trading Signals

Title: Deep Reputation Scoring in DeFi: zScore-Based Wallet Ranking from Liquidity and Trading Signals

Deep Reputation Scoring in DeFi: zScore-based Wallet Ranking von Liquidität und Handelssignalen

DFi:从流动性和交易信号中排列的基于zScolor的钱包 2507.20494v1

Authors (6): Dhanashekar Kandaswamy, Ashutosh Sahoo, Akshay SP, Gurukiran S, Parag Paul, Girish G N

As decentralized finance (DeFi) evolves, distinguishing between user behaviors, namely liquidity provision versus active trading, has become vital for risk modeling and on-chain reputation. We propose a behavioral scoring framework for Uniswap that assigns two complementary scores: a Liquidity Provision Score that assesses strategic liquidity contributions, and a Swap Behavior Score that reflects trading intent, volatility exposure, and discipline. The scores are constructed using rule-based blueprints that decompose behavior into volume, frequency, holding time, and withdrawal patterns. To handle edge cases and learn feature interactions, we introduce a deep residual neural network with densely connected skip blocks inspired by the U-Net architecture. We also incorporate pool-level context such as total value locked (TVL), fee tiers, and pool size, allowing the system to differentiate similar user behaviors across pools with varying characteristics. Our framework enables context-aware and scalable DeFi user scoring, supporting improved risk assessment and incentive design. Experiments on Uniswap v3 data show its usefulness for user segmentation and protocol-aligned reputation systems. Although we refer to our metric as zScore, it is independently developed and methodologically different from the cross-protocol system proposed by Udupi et al. Our focus is on role-specific behavioral modeling within Uniswap using blueprint logic and supervised learning.

Article 663

Title@2025-07-28 (1): HIAL: A New Paradigm for Hypergraph Active Learning via Influence Maximization

Title: HIAL: A New Paradigm for Hypergraph Active Learning via Influence Maximization

HIAL: Ein neues Paradigma für Hypergraph Aktives Lernen durch Einflussmaximierung

HIAL:通过影响最大化进行超光速积极学习的新范例 2507.20490v1

Authors (6): Yanheng Hou, Xunkai Li, Zhenjun Li, Bing Zhou, Ronghua Li, Guoren Wang

In recent years, Hypergraph Neural Networks (HNNs) have demonstrated immense potential in handling complex systems with high-order interactions. However, acquiring large-scale, high-quality labeled data for these models is costly, making Active Learning (AL) a critical technique. Existing Graph Active Learning (GAL) methods, when applied to hypergraphs, often rely on techniques like “clique expansion,” which destroys the high-order structural information crucial to a hypergraph’s success, thereby leading to suboptimal performance. To address this challenge, we introduce HIAL (Hypergraph Active Learning), a native active learning framework designed specifically for hypergraphs. We innovatively reformulate the Hypergraph Active Learning (HAL) problem as an Influence Maximization task. The core of HIAL is a dual-perspective influence function that, based on our novel “High-Order Interaction-Aware (HOI-Aware)” propagation mechanism, synergistically evaluates a node’s feature-space coverage (via Magnitude of Influence, MoI) and its topological influence (via Expected Diffusion Value, EDV). We prove that this objective function is monotone and submodular, thus enabling the use of an efficient greedy algorithm with a formal (1-1/e) approximation guarantee. Extensive experiments on seven public datasets demonstrate that HIAL significantly outperforms state-of-the-art baselines in terms of performance, efficiency, generality, and robustness, establishing an efficient and powerful new paradigm for active learning on hypergraphs.
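The greedy procedure behind the (1-1/e) guarantee can be sketched on a toy problem. The sketch below assumes a plain set-coverage objective, which is monotone and submodular, as a stand-in; it is not HIAL's dual-perspective influence function, and the names `greedy_select` and `coverage` are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_select(coverage: Dict[Hashable, Set[Hashable]], budget: int) -> List[Hashable]:
    """Pick up to `budget` nodes, each time taking the largest marginal gain.

    `coverage[node]` is the set of items a node "influences". Set coverage is
    only an illustrative objective, not the influence function from the paper.
    """
    selected: List[Hashable] = []
    covered: Set[Hashable] = set()
    for _ in range(budget):
        best, best_gain = None, 0
        for node, reach in coverage.items():
            if node in selected:
                continue
            gain = len(reach - covered)  # marginal gain of adding this node
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # no remaining node adds anything new
            break
        selected.append(best)
        covered |= coverage[best]
    return selected


# Toy example: four candidate nodes to label, budget of two.
toy = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
picked = greedy_select(toy, budget=2)
```

For any monotone submodular objective, this kind of greedy selection is the standard route to a (1-1/e) approximation of the optimal set of the same size.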

Article 664

Title@2025-07-28 (1): Conditional Diffusion Models for Global Precipitation Map Inpainting

Title: Conditional Diffusion Models for Global Precipitation Map Inpainting

Bedingte Diffusionsmodelle für die weltweite Niederschlagskarte Inpainting

全球降地地图油漆有条件传播模型 2507.20478v1

Authors (3): Daiko Kishikawa, Yuka Muto, Shunji Kotsuki

Incomplete satellite-based precipitation data present a significant challenge in global monitoring. For example, the Global Satellite Mapping of Precipitation (GSMaP) from JAXA suffers from substantial missing regions due to the orbital characteristics of satellites that have microwave sensors, and its current interpolation methods often result in spatial discontinuities. In this study, we formulate the completion of the precipitation map as a video inpainting task and propose a machine learning approach based on conditional diffusion models. Our method employs a 3D U-Net with a 3D condition encoder to reconstruct complete precipitation maps by leveraging spatio-temporal information from infrared images, latitude-longitude grids, and physical time inputs. Training was carried out on ERA5 hourly precipitation data from 2020 to 2023. We generated a pseudo-GSMaP dataset by randomly applying GSMaP masks to ERA5 maps. Performance was evaluated for the calendar year 2024, and our approach produces more spatio-temporally consistent inpainted precipitation maps compared to conventional methods. These results indicate the potential to improve global precipitation monitoring using conditional diffusion models.
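The pseudo-dataset construction described above (randomly applying observation masks to complete fields) can be illustrated as follows. This is a minimal sketch under stated assumptions: grids are nested lists, `True` in a mask means "observed", and masks are drawn uniformly from a pool. The abstract does not specify how masks are sampled, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]  # assumption: True = observed, False = missing


def apply_mask(field: Grid, mask: Mask) -> List[List[Optional[float]]]:
    """Hide unobserved cells of a complete field by replacing them with None."""
    return [
        [value if seen else None for value, seen in zip(f_row, m_row)]
        for f_row, m_row in zip(field, mask)
    ]


def make_pseudo_observation(
    field: Grid, mask_pool: List[Mask], rng: random.Random
) -> Tuple[List[List[Optional[float]]], Mask]:
    """Pair a complete field with a randomly chosen mask (uniform choice assumed)."""
    mask = rng.choice(mask_pool)
    return apply_mask(field, mask), mask


complete = [[1.0, 2.0], [3.0, 4.0]]
swath = [[True, False], [False, True]]
masked, used = make_pseudo_observation(complete, [swath], random.Random(0))
```

The complete field then serves as the inpainting target and the masked field as the input.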

Article 665

Title@2025-07-28 (1): Token Reduction Should Go Beyond Efficiency in Generative Models – From Vision, Language to Multimodality

Title: Token Reduction Should Go Beyond Efficiency in Generative Models – From Vision, Language to Multimodality

Token-Reduktion sollte über Effizienz in generativen Modellen hinausgehen – Von Vision, Sprache zur Multimodalität

从愿景、语言到多式联运 2505.18227v2

Authors (10): Zhenglun Kong, Yize Li, Fanhu Zeng, Lei Xin, Shvat Messica, Xue Lin, Pu Zhao, Manolis Kellis, Hao Tang, Marinka Zitnik

In Transformer architectures, tokens (discrete units derived from raw data) are formed by segmenting inputs into fixed-length chunks. Each token is then mapped to an embedding, enabling parallel attention computations while preserving the input’s essential information. Due to the quadratic computational complexity of transformer self-attention mechanisms, token reduction has primarily been used as an efficiency strategy. This is especially true in single vision and language domains, where it helps balance computational costs, memory usage, and inference latency. Despite these advances, this paper argues that token reduction should transcend its traditional efficiency-oriented role in the era of large generative models. Instead, we position it as a fundamental principle in generative modeling, critically influencing both model architecture and broader applications. Specifically, we contend that across vision, language, and multimodal systems, token reduction can, among other benefits: (i) facilitate deeper multimodal integration and alignment, (ii) mitigate “overthinking” and hallucinations, (iii) maintain coherence over long inputs, and (iv) enhance training stability. We reframe token reduction as more than an efficiency measure. By doing so, we outline promising future directions, including algorithm design, reinforcement learning-guided token reduction, token optimization for in-context learning, and broader ML and scientific domains. We highlight its potential to drive new model architectures and learning strategies that improve robustness, increase interpretability, and better align with the objectives of generative modeling.

Article 666

Title@2025-07-28 (1): Operator Inference Aware Quadratic Manifolds with Isotropic Reduced Coordinates for Nonintrusive Model Reduction

Title: Operator Inference Aware Quadratic Manifolds with Isotropic Reduced Coordinates for Nonintrusive Model Reduction

Operator-Inferenz Bewusst Quadratische Manifolds mit isotropen reduzierten Koordinaten für nicht-intrusive Modellreduktion

使用不侵扰性减少模型减少非侵入性模型的慢位位坐标 2507.20463v1

Authors (5): Paul Schwerdtner, Prakash Mohan, Julie Bessac, Marc T. Henry de Frahan, Benjamin Peherstorfer

Quadratic manifolds for nonintrusive reduced modeling are typically trained to minimize the reconstruction error on snapshot data, which means that the error of models fitted to the embedded data in downstream learning steps is ignored. In contrast, we propose a greedy training procedure that takes into account both the reconstruction error on the snapshot data and the prediction error of reduced models fitted to the data. Because our procedure learns quadratic manifolds with the objective of achieving accurate reduced models, it avoids oscillatory and other non-smooth embeddings that can hinder learning accurate reduced models. Numerical experiments on transport and turbulent flow problems show that quadratic manifolds trained with the proposed greedy approach lead to reduced models with up to two orders of magnitude higher accuracy than quadratic manifolds trained with respect to the reconstruction error alone.

Article 667

Title@2025-07-28 (1): EdgeAgentX-DT: Integrating Digital Twins and Generative AI for Resilient Edge Intelligence in Tactical Networks

Title: EdgeAgentX-DT: Integrating Digital Twins and Generative AI for Resilient Edge Intelligence in Tactical Networks

EdgeAgentX-DT: Integrieren von digitalen Zwillingen und Generative KI für resiliente Edge-Intelligenz in taktischen Netzwerken

EdgeAgentX-DT:将数字双双结合和生成AI,以在战术网络中建立有弹性的边缘情报 2507.21196v1

Authors (1): Abir Ray

We introduce EdgeAgentX-DT, an advanced extension of the EdgeAgentX framework that integrates digital twin simulations and generative AI-driven scenario training to significantly enhance edge intelligence in military networks. EdgeAgentX-DT utilizes network digital twins, virtual replicas synchronized with real-world edge devices, to provide a secure, realistic environment for training and validation. Leveraging generative AI methods, such as diffusion models and transformers, the system creates diverse and adversarial scenarios for robust simulation-based agent training. Our multi-layer architecture includes: (1) on-device edge intelligence; (2) digital twin synchronization; and (3) generative scenario training. Experimental simulations demonstrate notable improvements over EdgeAgentX, including faster learning convergence, higher network throughput, reduced latency, and improved resilience against jamming and node failures. A case study involving a complex tactical scenario with simultaneous jamming attacks, agent failures, and increased network loads illustrates how EdgeAgentX-DT sustains operational performance, whereas baseline methods fail. These results highlight the potential of digital-twin-enabled generative training to strengthen edge AI deployments in contested environments.

Article 668

Title@2025-07-28 (1): Shapley-Value-Based Graph Sparsification for GNN Inference

Title: Shapley-Value-Based Graph Sparsification for GNN Inference

Shapley-Value-Based Graph Sparsification für GNN-Inferenz

GNN 推断法的基于形状值的图形分隔 2507.20460v1

Authors (2): Selahattin Akkas, Ariful Azad

Graph sparsification is a key technique for improving inference efficiency in Graph Neural Networks by removing edges with minimal impact on predictions. GNN explainability methods generate local importance scores, which can be aggregated into global scores for graph sparsification. However, many explainability methods produce only non-negative scores, limiting their applicability for sparsification. In contrast, Shapley value based methods assign both positive and negative contributions to node predictions, offering a theoretically robust and fair allocation of importance by evaluating many subsets of graphs. Unlike gradient-based or perturbation-based explainers, Shapley values enable better pruning strategies that preserve influential edges while removing misleading or adversarial connections. Our approach shows that Shapley value-based graph sparsification maintains predictive performance while significantly reducing graph complexity, enhancing both interpretability and efficiency in GNN inference.
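To make the role of signed contributions concrete, here is a minimal sketch that computes exact Shapley values for a tiny hand-made game over three edges. The value function, the edge names, and the "drop edges with non-positive value" rule are all illustrative assumptions; the abstract does not state the authors' estimator or pruning threshold, and exact enumeration is only feasible for very small player sets.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition: FrozenSet[str] = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            totals[p] += current - previous  # marginal contribution of p
            previous = current
    return {p: t / len(orders) for p, t in totals.items()}


def toy_value(edges: FrozenSet[str]) -> float:
    """Hypothetical prediction quality given a subset of edges."""
    score = 0.0
    if "e1" in edges:
        score += 1.0            # e1 helps on its own
    if "e3" in edges:
        score -= 0.5            # e3 is misleading and hurts
    if "e1" in edges and "e2" in edges:
        score += 0.5            # e2 only helps together with e1
    return score


phi = shapley_values(["e1", "e2", "e3"], toy_value)
kept_edges = [e for e, v in phi.items() if v > 0]  # assumed pruning rule
```

In this toy game the misleading edge `e3` receives a negative value, which a non-negative importance score could not express.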

Article 669

Title@2025-07-28 (1): Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses

Title: Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses

Masked Autoencoder, die das Herz fühlen: Enthüllen Einfachheit Bias für EKG-Analysen

感觉心脏的蒙面自动代码器:用于ECG分析的“永存的简单比” 2506.22495v4

Authors (5): He-Yang Xu, Hongxiang Gao, Yuwen Li, Xiu-Shen Wei, Chengyu Liu

The diagnostic value of electrocardiogram (ECG) lies in its dynamic characteristics, ranging from rhythm fluctuations to subtle waveform deformations that evolve across time and frequency domains. However, supervised ECG models tend to overfit dominant and repetitive patterns, overlooking fine-grained but clinically critical cues, a phenomenon known as Simplicity Bias (SB), where models favor easily learnable signals over subtle but informative ones. In this work, we first empirically demonstrate the presence of SB in ECG analyses and its negative impact on diagnostic performance, while simultaneously discovering that self-supervised learning (SSL) can alleviate it, providing a promising direction for tackling the bias. Following the SSL paradigm, we propose a novel method comprising two key components: 1) Temporal-Frequency aware Filters to capture temporal-frequency features reflecting the dynamic characteristics of ECG signals, and 2) building on this, Multi-Grained Prototype Reconstruction for coarse and fine representation learning across dual domains, further mitigating SB. To advance SSL in ECG analyses, we curate a large-scale multi-site ECG dataset with 1.53 million recordings from over 300 clinical centers. Experiments on three downstream tasks across six ECG datasets demonstrate that our method effectively reduces SB and achieves state-of-the-art performance.

Article 670

Title@2025-07-28 (1): Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling

Title: Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling

Diagonal gewichtete generalisierte Methode von Momenten Schätzung für Gaussian Mixture Modeling

Gaussian Mixture 模型模型的对等光速估计动量通用方法 2507.20459v1

Authors (4): Liu Zhang, Oscar Mickelin, Sheng Xu, Amit Singer

Since Pearson [Philosophical Transactions of the Royal Society of London. A, 185 (1894), pp. 71-110] first applied the method of moments (MM) for modeling data as a mixture of one-dimensional Gaussians, moment-based estimation methods have proliferated. Among these methods, the generalized method of moments (GMM) improves the statistical efficiency of MM by weighting the moments appropriately. However, the computational complexity and storage complexity of MM and GMM grow exponentially with the dimension, making these methods impractical for high-dimensional data or when higher-order moments are required. Such computational bottlenecks are more severe in GMM since it additionally requires estimating a large weighting matrix. To overcome these bottlenecks, we propose the diagonally-weighted GMM (DGMM), which achieves a balance among statistical efficiency, computational complexity, and numerical stability. We apply DGMM to study the parameter estimation problem for weakly separated heteroscedastic low-rank Gaussian mixtures and design a computationally efficient and numerically stable algorithm that obtains the DGMM estimator without explicitly computing or storing the moment tensors. We implement the proposed algorithm and empirically validate the advantages of DGMM: in numerical studies, DGMM attains smaller estimation errors while requiring substantially shorter runtime than MM and GMM. The code and data will be available upon publication at https://github.com/liu-lzhang/dgmm.
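For orientation, the textbook GMM objective and its diagonally weighted special case can be written as below. The notation ($g$, $\bar g_n$, $W$, $w_j$) is standard but assumed here; the abstract does not give the authors' choice of weights or moment functions.

```latex
% Sample moment conditions for data x_1, ..., x_n and parameter \theta:
\bar g_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} g(x_i, \theta) \in \mathbb{R}^m
% Generalized method of moments with a positive semidefinite weighting matrix W:
\hat\theta_{\mathrm{GMM}} = \arg\min_{\theta} \; \bar g_n(\theta)^{\top} W \, \bar g_n(\theta)
% Diagonal weighting, W = \mathrm{diag}(w_1, \dots, w_m), reduces this to a weighted sum of squares:
\hat\theta_{\mathrm{diag}} = \arg\min_{\theta} \; \sum_{j=1}^{m} w_j \, \bar g_{n,j}(\theta)^2
```

A diagonal $W$ needs only $m$ weights rather than an $m \times m$ matrix, which is consistent with the storage and estimation bottleneck the abstract attributes to the full weighting matrix.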

Article 671

Title@2025-07-28 (1): Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis

Title: Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis

Frequency-Aware Autoregressive Modellierung für effiziente High-Resolution-Bildsynthese

高效高分辨率图像合成高效高分辨率图像集自动回归模型 2507.20454v1

Authors (5): Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan

Visual autoregressive modeling, based on the next-scale prediction paradigm, exhibits notable advantages in image quality and model scalability over traditional autoregressive and diffusion models. It generates images by progressively refining resolution across multiple stages. However, the computational overhead in high-resolution stages remains a critical challenge due to the substantial number of tokens involved. In this paper, we introduce SparseVAR, a plug-and-play acceleration framework for next-scale prediction that dynamically excludes low-frequency tokens during inference without requiring additional training. Our approach is motivated by the observation that tokens in low-frequency regions have a negligible impact on image quality in high-resolution stages and exhibit strong similarity with neighboring tokens. Additionally, we observe that different blocks in the next-scale prediction model focus on distinct regions, with some concentrating on high-frequency areas. SparseVAR leverages these insights by employing lightweight MSE-based metrics to identify low-frequency tokens while preserving the fidelity of excluded regions through a small set of uniformly sampled anchor tokens. By significantly reducing the computational cost while maintaining high image generation quality, SparseVAR achieves notable acceleration in both HART and Infinity. Specifically, SparseVAR achieves up to a 2 times speedup with minimal quality degradation in Infinity-2B.

Article 672

Title@2025-07-28 (1): Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models

Title: Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models

Schwach-zu-starke Verallgemeinerung mit Ausfall-Trajektorien: Ein baumbasierter Ansatz zur Elizit-Optimal-Politik in starken Modellen

与失败轨迹相协调的弱力至强力普遍化:以树为基础的方法,在强型模型中采用适当的最佳政策 2507.18858v2

Authors (6): Ruimeng Ye, Zihan Wang, Yang Xiao, Zinan Ling, Manling Li, Bo Hui

Weak-to-Strong generalization (W2SG) is a new trend to elicit the full capabilities of a strong model with supervision from a weak model. While existing W2SG studies focus on simple tasks like binary classification, we extend this paradigm to complex interactive decision-making environments. Specifically, we fine-tune a strong model with trajectories of intermediate actions generated by a weak model. Motivated by the human learning process, we propose to generalize not only success knowledge but also failure experience so that the strong model can learn from failed trajectories accumulated by weak models. To effectively and efficiently elicit the potential of strong agents, we further construct “trajectory trees,” a hierarchical representation that organizes weak model-generated action trajectories, coupled with Monte Carlo Tree Search (MCTS) to optimize the strong model. Through theoretical analysis, we provide formal guarantees for the effectiveness of our method in improving W2SG performance. Our empirical evaluations demonstrate substantial improvements in reasoning and decision-making capabilities across diverse task domains, validating the scalability and robustness of our proposed framework.

Article 673

Title@2025-07-28 (1): Your Attention Matters: to Improve Model Robustness to Noise and Spurious Correlations

Title: Your Attention Matters: to Improve Model Robustness to Noise and Spurious Correlations

Ihre Aufmerksamkeit ist wichtig: die Robustheit des Modells zu verbessern, um Geräusche und spurlose Korruptionen zu verursachen

注意事项:改进噪音和纯洁的标本的示范性强力 2507.20453v1

Authors (4): Camilo Tamayo-Rousseau, Yunjia Zhao, Yiqun Zhang, Randall Balestriero

Self-attention mechanisms are foundational to Transformer architectures, supporting their impressive success in a wide range of tasks. While there are many self-attention variants, their robustness to noise and spurious correlations has not been well studied. This study evaluates Softmax, Sigmoid, Linear, Doubly Stochastic, and Cosine attention within Vision Transformers under different data corruption scenarios. Through testing across the CIFAR-10, CIFAR-100, and Imagenette datasets, we show that Doubly Stochastic attention is the most robust. Our findings inform self-attention selection in contexts with imperfect data.
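The difference between Softmax and doubly stochastic attention weights can be sketched as follows. The example assumes doubly stochastic attention is obtained by Sinkhorn normalization (alternating row and column normalization of the exponentiated scores), which is one common construction; the abstract does not say which construction the study uses, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(scores: Matrix) -> Matrix:
    """Standard attention weights: each row sums to 1, columns are unconstrained."""
    out = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]  # subtract max for stability
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sinkhorn(scores: Matrix, n_iters: int = 200) -> Matrix:
    """Approximately doubly stochastic weights for a square score matrix."""
    peak = max(max(row) for row in scores)
    k = [[math.exp(s - peak) for s in row] for row in scores]
    n = len(k)
    for _ in range(n_iters):
        k = [[v / sum(row) for v in row] for row in k]                    # rows
        col_sums = [sum(k[i][j] for i in range(n)) for j in range(n)]
        k = [[k[i][j] / col_sums[j] for j in range(n)] for i in range(n)]  # columns
    return k


scores = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 3.0, 0.0]]
soft = softmax_rows(scores)
ds = sinkhorn(scores)
```

With Softmax, a single key can absorb most of the attention from every query; the column constraint in the doubly stochastic case caps the total attention any one key receives.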

Article 674

Title@2025-07-28 (1): WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope

Title: WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope

WEEP: Ein differenzierbarer, nicht konvexe Sparse Regularizer über schwach-konvexe Umhüllung

WEEP:通过微弱-Convex 信封的可区别的、不可区分的、非confvex Spassar 正规化器 2507.20447v1

Authors (3): Takanobu Furuhashi, Hidekata Hontani, Tatsuya Yokota

Sparse regularization is fundamental in signal processing for efficient signal recovery and feature extraction. However, it faces a fundamental dilemma: the most powerful sparsity-inducing penalties are often non-differentiable, conflicting with gradient-based optimizers that dominate the field. We introduce WEEP (Weakly-convex Envelope of Piecewise Penalty), a novel, fully differentiable sparse regularizer derived from the weakly-convex envelope framework. WEEP provides strong, unbiased sparsity while maintaining full differentiability and L-smoothness, making it natively compatible with any gradient-based optimizer. This resolves the conflict between statistical performance and computational tractability. We demonstrate superior performance compared to the L1-norm and other established non-convex sparse regularizers on challenging signal and image denoising tasks.

Article 675

Title@2025-07-28 (1): BOASF: A Unified Framework for Speeding up Automatic Machine Learning via Adaptive Successive Filtering

Title: BOASF: A Unified Framework for Speeding up Automatic Machine Learning via Adaptive Successive Filtering

BOASF: Ein einheitliches Framework zur Beschleunigung des automatischen maschinellen Lernens durch adaptives aufeinander folgendes Filtern

BOASF: 通过适应性连续过滤加速自动机学习的统一框架 2507.20446v1

Authors (7): Guanghui Zhu, Xin Fang, Lei Wang, Wenzhong Chen, Rong Gu, Chunfeng Yuan, Yihua Huang

Machine learning has achieved great success in many application areas. However, for non-expert practitioners, it is always very challenging to address a machine learning task successfully and efficiently. Finding the optimal machine learning model or hyperparameter combination among a large number of possible alternatives usually requires considerable expert knowledge and experience. To tackle this problem, we propose a combined Bayesian Optimization and Adaptive Successive Filtering algorithm (BOASF) under a unified multi-armed bandit framework to automate model selection or hyperparameter optimization. Specifically, BOASF consists of multiple evaluation rounds, in each of which we select promising configurations for each arm using Bayesian optimization. Then, ASF adaptively discards poorly performing arms early using a Gaussian UCB-based probabilistic model. Furthermore, a Softmax model is employed to adaptively allocate available resources to each promising arm that advances to the next round. An arm with a higher probability of advancing is allocated more resources. Experimental results show that BOASF is effective for speeding up the model selection and hyperparameter optimization processes while achieving robust and better prediction performance than existing state-of-the-art automatic machine learning methods. Moreover, BOASF achieves better anytime performance under various time budgets.
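The Softmax-based resource allocation step can be sketched as follows. This is a minimal illustration, assuming each surviving arm has a scalar score (for instance an estimated probability of advancing) and that the budget is split in proportion to the Softmax of those scores; the `temperature` parameter and the function name are hypothetical and not taken from the paper.

```python
import math
from typing import Dict


def softmax_allocate(
    scores: Dict[str, float], total_budget: float, temperature: float = 1.0
) -> Dict[str, float]:
    """Split `total_budget` across arms in proportion to softmax(score / temperature)."""
    peak = max(scores.values())
    weights = {
        arm: math.exp((s - peak) / temperature)  # shift by max for stability
        for arm, s in scores.items()
    }
    total = sum(weights.values())
    return {arm: total_budget * w / total for arm, w in weights.items()}


# Three surviving arms (e.g. model families) and 60 seconds for the next round.
allocation = softmax_allocate({"svm": 0.2, "gbdt": 0.9, "knn": 0.5}, total_budget=60.0)
```

A lower temperature concentrates the budget on the leading arm, while a higher one spreads it more evenly.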

Article 676

Title@2025-07-28 (1): Provable In-Context Learning of Nonlinear Regression with Transformers

Title: Provable In-Context Learning of Nonlinear Regression with Transformers

Voraussichtliches In-Context-Lernen von nichtlinearer Regression mit Transformern

以变换器对非线性回归的可证实的内文学习 2507.20443v1

Authors (3): Hongbo Li, Lingjie Duan, Yingbin Liang

The transformer architecture, which processes sequences of input tokens to produce outputs for query tokens, has revolutionized numerous areas of machine learning. A defining feature of transformers is their ability to perform previously unseen tasks using task-specific prompts without updating parameters, a phenomenon known as in-context learning (ICL). Recent research has actively explored the training dynamics behind ICL, with much of the focus on relatively simple tasks such as linear regression and binary classification. To advance the theoretical understanding of ICL, this paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities in these settings. We analyze the stage-wise dynamics of attention during training: attention scores between a query token and its target features grow rapidly in the early phase, then gradually converge to one, while attention to irrelevant features decays more slowly and exhibits oscillatory behavior. Our analysis introduces new proof techniques that explicitly characterize how the nature of general non-degenerate L-Lipschitz task functions affects attention weights. Specifically, we identify the Lipschitz constant L of nonlinear function classes as a key factor governing the convergence dynamics of transformers in ICL. Leveraging these insights, for two distinct regimes depending on whether L is below or above a threshold, we derive different time bounds to guarantee near-zero prediction error. Notably, despite the convergence time depending on the underlying task functions, we prove that query tokens consistently attend to prompt tokens with highly relevant features at convergence, demonstrating the ICL capability of transformers for unseen functions.

Article 677

Title@2025-07-27 (7): BioNeuralNet: A Graph Neural Network based Multi-Omics Network Data Analysis Tool

Title: BioNeuralNet: A Graph Neural Network based Multi-Omics Network Data Analysis Tool

BioNeuralNet: Ein Graph Neural Network basiertes Multi-Omics Network Data Analysis Tool

生物神经网:基于多功能网络数据分析工具的图表神经网络工具 2507.20440v1

Authors (9): Vicente Ramos, Sundous Hussein, Mohamed Abdel-Hafiz, Arunangshu Sarkar, Weixuan Liu, Katerina J. Kechris, Russell P. Bowler, Leslie Lange, Farnoush Banaei-Kashani

Multi-omics data offer unprecedented insights into complex biological systems, yet their high dimensionality, sparsity, and intricate interactions pose significant analytical challenges. Network-based approaches have advanced multi-omics research by effectively capturing biologically relevant relationships among molecular entities. While these methods are powerful for representing molecular interactions, there remains a need for tools specifically designed to effectively utilize these network representations across diverse downstream analyses. To fulfill this need, we introduce BioNeuralNet, a flexible and modular Python framework tailored for end-to-end network-based multi-omics data analysis. BioNeuralNet leverages Graph Neural Networks (GNNs) to learn biologically meaningful low-dimensional representations from multi-omics networks, converting these complex molecular networks into versatile embeddings. BioNeuralNet supports all major stages of multi-omics network analysis, including several network construction techniques, generation of low-dimensional representations, and a broad range of downstream analytical tasks. Its extensive utilities, including diverse GNN architectures, and compatibility with established Python packages (e.g., scikit-learn, PyTorch, NetworkX), enhance usability and facilitate quick adoption. BioNeuralNet is an open-source, user-friendly, and extensively documented framework designed to support flexible and reproducible multi-omics network analysis in precision medicine.

Article 678

Title@2025-07-27 (7): Critiques of World Models

Title: Critiques of World Models

Kritik an Weltmodellen

世界模式的证明 2507.05169v3

Authors (4): Eric Xing, Mingkai Deng, Jinyu Hou, Zhiting Hu

World Model, the supposed algorithmic surrogate of the real-world environment that biological agents experience and act upon, has been an emerging topic in recent years because of the rising need to develop virtual agents with artificial (general) intelligence. There has been much debate on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed Sci-Fi classic Dune, and drawing inspiration from the concept of “hypothetical thinking” in psychology literature, we offer critiques of several schools of thought on world modeling, and argue that the primary goal of a world model is to simulate all actionable possibilities of the real world for purposeful reasoning and acting. Building on the critiques, we propose a new architecture for a general-purpose world model, based on hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervised learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.

Article 679

Title@2025-07-27 (7): Surrogate modeling of Cellular-Potts Agent-Based Models as a segmentation task using the U-Net neural network architecture

Title: Surrogate modeling of Cellular-Potts Agent-Based Models as a segmentation task using the U-Net neural network architecture

Surrogate Modellierung von Zellular-Potts Agent-Based Models als Segmentierungsaufgabe mit Hilfe der U-Net-Neuralnetzwerkarchitektur

利用 U-Net 神经网络结构结构,将代用基于细胞-动力代理模型建模作为一种分离任务 2505.00316v3

Authors (7): Tien Comlekoglu, J. Quetzalcóatl Toledo-Marín, Tina Comlekoglu, Douglas W. DeSimone, Shayn M. Peirce, Geoffrey Fox, James A. Glazier

The Cellular-Potts model is a powerful and ubiquitous framework for developing computational models for simulating complex multicellular biological systems. Cellular-Potts models (CPMs) are often computationally expensive due to the explicit modeling of interactions among large numbers of individual model agents and diffusive fields described by partial differential equations (PDEs). In this work, we develop a convolutional neural network (CNN) surrogate model using a U-Net architecture that accounts for periodic boundary conditions. We use this model to accelerate the evaluation of a mechanistic CPM previously used to investigate in vitro vasculogenesis. The surrogate model was trained to predict 100 computational steps ahead (Monte-Carlo steps, MCS), accelerating simulation evaluations by a factor of 590 compared to CPM code execution. Over multiple recursive evaluations, our model effectively captures the emergent behaviors demonstrated by the original Cellular-Potts model, such as vessel sprouting, extension and anastomosis, and contraction of vascular lacunae. This approach demonstrates the potential for deep learning to serve as efficient surrogate models for CPM simulations, enabling faster evaluation of computationally expensive CPMs of biological processes at greater spatial and temporal scales.
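Accounting for periodic boundary conditions in a convolutional network usually comes down to wrap-around padding, so that a filter at one edge of the lattice sees the cells at the opposite edge. The sketch below shows that padding on a small 2D grid in plain Python; it illustrates the general idea only, the function name `pad_periodic` is hypothetical, and the abstract does not describe how the authors implement it.

```python
from typing import List

Grid = List[List[int]]


def pad_periodic(grid: Grid, pad: int) -> Grid:
    """Pad a 2D grid by `pad` cells on every side, wrapping around the edges."""
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[(r - pad) % rows][(c - pad) % cols] for c in range(cols + 2 * pad)]
        for r in range(rows + 2 * pad)
    ]


lattice = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
padded = pad_periodic(lattice, pad=1)
```

In PyTorch, the same behavior is available through `padding_mode="circular"` on `torch.nn.Conv2d`.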

Article 680

Title@2025-07-27 (7): FAST: Similarity-based Knowledge Transfer for Efficient Policy Learning

Title: FAST: Similarity-based Knowledge Transfer for Efficient Policy Learning

FAST: Ähnlichkeitsbasierter Wissenstransfer für effizientes politisches Lernen

FAST: 以相似性为基础的知识转让,促进有效的政策学习 2507.20433v1

Authors (3): Alessandro Capurso, Elia Piccoli, Davide Bacciu

Transfer Learning (TL) offers the potential to accelerate learning by transferring knowledge across tasks. However, it faces critical challenges such as negative transfer, domain adaptation and inefficiency in selecting solid source policies. These issues often represent critical problems in evolving domains, e.g. game development, where scenarios transform and agents must adapt. The continuous release of new agents is costly and inefficient. In this work we address the key issues in TL to improve knowledge transfer and agents' performance across tasks, and to reduce computational costs. The proposed methodology, called FAST - Framework for Adaptive Similarity-based Transfer, leverages visual frames and textual descriptions to create a latent representation of task dynamics, which is exploited to estimate similarity between environments. The similarity scores guide our method in choosing candidate policies from which to transfer abilities, simplifying the learning of novel tasks. Experimental results, over multiple racing tracks, demonstrate that FAST achieves competitive final performance compared to learning-from-scratch methods while requiring significantly fewer training steps. These findings highlight the potential of embedding-driven task similarity estimations.
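Selecting a source policy by task similarity can be sketched as a ranking over task embeddings. The example below assumes cosine similarity between fixed-length embedding vectors; the abstract does not name the similarity measure or the embedding model, and the track names and vectors are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_source_tasks(
    target: Sequence[float], sources: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return source tasks sorted from most to least similar to the target task."""
    scored = [(name, cosine_similarity(target, emb)) for name, emb in sources.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical latent embeddings of a new track and three previously solved tracks.
new_track = [1.0, 0.0]
solved = {"track_a": [0.9, 0.1], "track_b": [0.0, 1.0], "track_c": [-1.0, 0.0]}
ranking = rank_source_tasks(new_track, solved)
best_source = ranking[0][0]
```

The policy trained on the top-ranked task would then be the first candidate to transfer from; low or negative scores flag sources that risk negative transfer.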

Article 681

Title@2025-07-27 (7): Density Ratio Estimation-based Bayesian Optimization with Semi-Supervised Learning

Title: Density Ratio Estimation-based Bayesian Optimization with Semi-Supervised Learning

Dichteverhältnis Schätzungsbasierte Bayesische Optimierung mit semi-überwachtem Lernen

基于巴耶斯最优化的半强化学习 2305.15612v5

Authors (1): Jungtaek Kim

Bayesian optimization has attracted huge attention from diverse research areas in science and engineering, since it is capable of efficiently finding a global optimum of an expensive-to-evaluate black-box function. In general, a probabilistic regression model is widely used as a surrogate function to model an explicit distribution over function evaluations given an input to estimate and a training dataset. Beyond the probabilistic regression-based methods, density ratio estimation-based Bayesian optimization has been suggested in order to estimate a density ratio of the groups relatively close to and relatively far from a global optimum. Developing this line of research further, supervised classifiers are employed to estimate a class probability for the two groups instead of a density ratio. However, the supervised classifiers used in this strategy are prone to being overconfident about known global solution candidates. Supposing that we have access to unlabeled points, e.g., predefined fixed-size pools, we propose density ratio estimation-based Bayesian optimization with semi-supervised learning to solve this challenge. Finally, we show the empirical results of our methods and several baseline methods in two distinct scenarios with unlabeled point sampling and a fixed-size pool, and analyze the validity of our methods in diverse experiments.

Article 682

Title@2025-07-27 (7): Interpretable Anomaly-Based DDoS Detection in AI-RAN with XAI and LLMs

Title: Interpretable Anomaly-Based DDoS Detection in AI-RAN with XAI and LLMs

Interpretierbare, auf Anomalien basierende DDoS-Erkennung in AI-RAN mit XAI und LLMs

在AI-RAN使用 XAI 和LLM 进行AI-RAN的基于解释的DDoS 探测 2507.21193v1

Authors (4): Sotiris Chatzimiltis, Mohammad Shojafar, Mahdi Boloursaz Mashhadi, Rahim Tafazolli

Next generation Radio Access Networks (RANs) introduce programmability, intelligence, and near real-time control through intelligent controllers, enabling enhanced security within the RAN and across broader 5G/6G infrastructures. This paper presents a comprehensive survey highlighting opportunities, challenges, and research gaps for Large Language Models (LLMs)-assisted explainable (XAI) intrusion detection (IDS) for secure future RAN environments. Motivated by this, we propose an LLM interpretable anomaly-based detection system for distributed denial-of-service (DDoS) attacks using multivariate time series key performance measures (KPMs), extracted from E2 nodes, within the Near Real-Time RAN Intelligent Controller (Near-RT RIC). An LSTM-based model is trained to identify malicious User Equipment (UE) behavior based on these KPMs. To enhance transparency, we apply post-hoc local explainability methods such as LIME and SHAP to interpret individual predictions. Furthermore, LLMs are employed to convert technical explanations into natural-language insights accessible to non-expert users. Experimental results on real 5G network KPMs demonstrate that our framework achieves high detection accuracy (F1-score > 0.96) while delivering actionable and interpretable outputs.

Article 683

Title@2025-07-27 (7): ResCap-DBP: A Lightweight Residual-Capsule Network for Accurate DNA-Binding Protein Prediction Using Global ProteinBERT Embeddings

Title: ResCap-DBP: A Lightweight Residual-Capsule Network for Accurate DNA-Binding Protein Prediction Using Global ProteinBERT Embeddings

ResCap-DBP: Ein leichtes Residual-Capsule-Netzwerk für präzise DNA-Binding-Protein-Vorhersage mit globalen Protein-BERT-Embeddings

ResCapt-DBP:利用全球蛋白BER 嵌入器进行精密DNA丁丁蛋白蛋白预测的轻量残余能力网络 2507.20426v1

Authors (3): Samiul Based Shuvo, Tasnia Binte Mamun, U Rajendra Acharya

DNA-binding proteins (DBPs) are integral to gene regulation and cellular processes, making their accurate identification essential for understanding biological functions and disease mechanisms. Experimental methods for DBP identification are time-consuming and costly, driving the need for efficient computational prediction techniques. In this study, we propose a novel deep learning framework, ResCap-DBP, that combines a residual learning-based encoder with a one-dimensional Capsule Network (1D-CapsNet) to predict DBPs directly from raw protein sequences. Our architecture incorporates dilated convolutions within residual blocks to mitigate vanishing gradient issues and extract rich sequence features, while capsule layers with dynamic routing capture hierarchical and spatial relationships within the learned feature space. We conducted comprehensive ablation studies comparing global and local embeddings from ProteinBERT and conventional one-hot encoding. Results show that ProteinBERT embeddings substantially outperform other representations on large datasets. Although one-hot encoding showed marginal advantages on smaller datasets, such as PDB186, it struggled to scale effectively. Extensive evaluations on four pairs of publicly available benchmark datasets demonstrate that our model consistently outperforms current state-of-the-art methods. It achieved AUC scores of 98.0% and 89.5% on PDB14189 and PDB1075, respectively. On independent test sets PDB2272 and PDB186, the model attained top AUCs of 83.2% and 83.3%, while maintaining competitive performance on larger datasets such as PDB20000. Notably, the model maintains a well-balanced sensitivity and specificity across datasets. These results demonstrate the efficacy and generalizability of integrating global protein representations with advanced deep learning architectures for reliable and scalable DBP prediction in diverse genomic contexts.

Article 684

Title@2025-07-27 (7): Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

Title: Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

Kommunikation-Effizient verteiltes Training für kollaborative Flat Optima Erholung im Deep Learning

促进深学习合作、平板最佳最佳恢复的传播-高效分配培训 2507.20424v1

Authors (2): Tolga Dimlioglu, Anna Choromanska

We study centralized distributed data parallel training of deep neural networks (DNNs), aiming to improve the trade-off between communication efficiency and model performance of the local gradient methods. To this end, we revisit the flat-minima hypothesis, which suggests that models with better generalization tend to lie in flatter regions of the loss landscape. We introduce a simple, yet effective, sharpness measure, Inverse Mean Valley, and demonstrate its strong correlation with the generalization gap of DNNs. We incorporate an efficient relaxation of this measure into the distributed training objective as a lightweight regularizer that encourages workers to collaboratively seek wide minima. The regularizer exerts a pushing force that counteracts the consensus step pulling the workers together, giving rise to the Distributed Pull-Push Force (DPPF) algorithm. Empirically, we show that DPPF outperforms other communication-efficient approaches and achieves better generalization performance than local gradient methods and synchronous gradient averaging, while significantly reducing communication overhead. In addition, our loss landscape visualizations confirm the ability of DPPF to locate flatter minima. On the theoretical side, we show that DPPF guides workers to span flat valleys, with the final valley width governed by the interplay between push and pull strengths, and that its pull-push dynamics is self-stabilizing. We further provide generalization guarantees linked to the valley width and prove convergence in the non-convex setting.

Article 685

Title@2025-07-27 (7): Survey of NLU Benchmarks Diagnosing Linguistic Phenomena: Why not Standardize Diagnostics Benchmarks?

Title: Survey of NLU Benchmarks Diagnosing Linguistic Phenomena: Why not Standardize Diagnostics Benchmarks?

Umfrage zu NLU-Benchmarks Diagnose Linguistische Phänomene: Warum nicht Diagnose-Benchmarks standardisieren?

NLU基准诊断语言神话调查:为什么不使诊断基准标准化? 2507.20419v1

Authors (3): Khloud AL Jallad, Nada Ghneim, Ghaida Rebdawi

Natural Language Understanding (NLU) is a basic task in Natural Language Processing (NLP). The evaluation of NLU capabilities has become a trending research topic that has attracted researchers in the last few years, resulting in the development of numerous benchmarks. These benchmarks include various tasks and datasets in order to evaluate the results of pretrained models via public leaderboards. Notably, several benchmarks contain diagnostics datasets designed for investigation and fine-grained error analysis across a wide range of linguistic phenomena. This survey provides a comprehensive review of available English, Arabic, and Multilingual NLU benchmarks, with a particular emphasis on their diagnostics datasets and the linguistic phenomena they cover. We present a detailed comparison and analysis of these benchmarks, highlighting their strengths and limitations in evaluating NLU tasks and providing in-depth error analysis. When highlighting the gaps in the state of the art, we noted that there is no naming convention for macro and micro categories, or even a standard set of linguistic phenomena that should be covered. Consequently, we formulated a research question regarding the evaluation metrics of the evaluation diagnostics benchmarks: “Why do we not have an evaluation standard for the NLU evaluation diagnostics benchmarks?”, similar to ISO standards in industry. We conducted a deep analysis and comparison of the covered linguistic phenomena in order to support experts in building a global hierarchy for linguistic phenomena in the future. We think that having evaluation metrics for diagnostics evaluation could be valuable for gaining more insights when comparing the results of the studied models on different diagnostics benchmarks.

Article 686

Title@2025-07-27 (7): Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

Title: Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

Clarify lernen: Multiturn-Gespräche mit aktionsbasiertem Kontrast-Selbst-Training

学习澄清:与基于行动的差异性自我培训进行多方向对话 2406.00222v2

Authors (4): Maximillian Chen, Ruoxi Sun, Tomas Pfister, Sercan Ö. Arık

Large language models (LLMs), optimized through human feedback, have rapidly emerged as a leading paradigm for developing intelligent conversational assistants. However, despite their strong performance across many benchmarks, LLM-based agents might still lack conversational skills such as disambiguation – when they are faced with ambiguity, they often overhedge or implicitly guess users’ true intents rather than asking clarification questions. Under task-specific settings, high-quality conversation samples are often limited, constituting a bottleneck for LLMs’ ability to learn optimal dialogue action policies. We propose Action-Based Contrastive Self-Training (ACT), a quasi-online preference optimization algorithm based on Direct Preference Optimization (DPO), that enables data-efficient dialogue policy learning in multi-turn conversation modeling. We demonstrate ACT’s efficacy in data-efficient tuning scenarios, even when there is no action label available, using multiple real-world conversational tasks: tabular-grounded question-answering, machine reading comprehension, and AmbigSQL, a novel task for disambiguating information-seeking requests for complex SQL generation towards data analysis agents. Additionally, we propose evaluating LLMs’ ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation. ACT demonstrates substantial conversation modeling improvements over standard tuning approaches like supervised fine-tuning and DPO.

Article 687

Title@2025-07-27 (7): A General Framework for Estimating Preferences Using Response Time Data

Title: A General Framework for Estimating Preferences Using Response Time Data

Ein allgemeiner Rahmen für die Schätzung von Präferenzen unter Verwendung von Reaktionszeitdaten

利用反应时间数据估计优惠的一般框架 2507.20403v1

Authors (3): Federico Echenique, Alireza Fallah, Michael I. Jordan

We propose a general methodology for recovering preference parameters from data on choices and response times. Our methods yield estimates with fast ($1/n$ for $n$ data points) convergence rates when specialized to the popular Drift Diffusion Model (DDM), but are broadly applicable to generalizations of the DDM as well as to alternative models of decision making that make use of response time data. The paper develops an empirical application to an experiment on intertemporal choice, showing that the use of response times delivers predictive accuracy and matters for the estimation of economically relevant parameters.
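
To make concrete why response times carry information about preference parameters, here is a minimal sketch for the simplest symmetric DDM. It is not the estimator proposed in the paper: it uses the textbook closed-form moments of a unit-noise diffusion started at 0 with absorbing boundaries at plus and minus `boundary`, ignores non-decision time, and the function names are hypothetical.

```python
import math

def ddm_moments(drift, boundary):
    """Upper-boundary choice probability and mean decision time for a
    symmetric DDM (unit noise, start at 0, boundaries at +/- boundary)."""
    k = drift * boundary
    p_upper = 1.0 / (1.0 + math.exp(-2.0 * k))
    # With zero drift the mean exit time of Brownian motion is boundary**2.
    mean_rt = boundary ** 2 if drift == 0 else (boundary / drift) * math.tanh(k)
    return p_upper, mean_rt

def recover_parameters(p_upper, mean_rt):
    """Invert the two moments above; requires p_upper != 0.5."""
    k = 0.5 * math.log(p_upper / (1.0 - p_upper))  # k = drift * boundary
    boundary = math.sqrt(k * mean_rt / math.tanh(k))
    drift = k / boundary
    return drift, boundary

p, t = ddm_moments(drift=0.8, boundary=1.5)
print(recover_parameters(p, t))  # approximately (0.8, 1.5)
```

Choice frequencies alone pin down only the product `drift * boundary`; adding the mean response time separates the two, which is the intuition behind using response-time data.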

Article 688

Title@2025-07-27 (7): A Free Probabilistic Framework for Analyzing the Transformer-based Language Models

Title: A Free Probabilistic Framework for Analyzing the Transformer-based Language Models

Ein freier probabilistischer Rahmen für die Analyse der transformerbasierten Sprachmodelle

分析以变换器为基础的语言模型的自由概率框架 2506.16550v2

Authors (1): Swagatam Das

We present a formal operator-theoretic framework for analyzing Transformer-based language models using free probability theory. By modeling token embeddings and attention mechanisms as self-adjoint operators in a tracial $W^*$-probability space, we reinterpret attention as non-commutative convolution and describe representation propagation via free additive convolution. This leads to a spectral dynamic system interpretation of deep Transformers. We derive entropy-based generalization bounds under freeness assumptions and provide insight into positional encoding, spectral evolution, and representational complexity. This work offers a principled, though theoretical, perspective on structural dynamics in large language models.

Article 689

Title@2025-07-27 (7): Exploring Adaptive Structure Learning for Heterophilic Graphs

Title: Exploring Adaptive Structure Learning for Heterophilic Graphs

Erforschen von adaptivem Strukturlernen für heterophile Graphen

探索异性哲学图的适应性结构学习 2507.21191v1

Authors (1): Garv Kaushik

Graph Convolutional Networks (GCNs) have gained traction for graph representation learning, with recent attention on improving performance on heterophilic graphs for various real-world applications. The localized feature aggregation in a typical message-passing paradigm hinders the capturing of long-range dependencies between non-local nodes of the same class. The inherent connectivity structure in heterophilic graphs often conflicts with information sharing between distant nodes of the same class. We propose structure learning to rewire edges in shallow GCNs themselves to avoid performance degradation in downstream discriminative tasks due to oversmoothing. Parameterizing the adjacency matrix to learn connections between non-local nodes and extend the hop span of shallow GCNs facilitates the capturing of long-range dependencies. However, our method is not generalizable across heterophilic graphs and performs inconsistently on the node classification task, contingent on the graph structure.

Article 690

Title@2025-07-27 (7): Beyond Neural Networks: Symbolic Reasoning over Wavelet Logic Graph Signals

Title: Beyond Neural Networks: Symbolic Reasoning over Wavelet Logic Graph Signals

Jenseits neuraler Netzwerke: Symbolische Vernunft über Wavelet Logic Graph Signals

超越神经网络:波盘逻辑图信号的符号原因 2507.21190v1

Authors (3): Andrew Kiruluta, Andreas Lemos, Priscilla Burity

We present a fully non-neural learning framework based on Graph Laplacian Wavelet Transforms (GLWT). Unlike traditional architectures that rely on convolutional, recurrent, or attention-based neural networks, our model operates purely in the graph spectral domain using structured multiscale filtering, nonlinear shrinkage, and symbolic logic over wavelet coefficients. Signals defined on graph nodes are decomposed via GLWT, modulated with interpretable nonlinearities, and recombined for downstream tasks such as denoising and token classification. The system supports compositional reasoning through a symbolic domain-specific language (DSL) over graph wavelet activations. Experiments on synthetic graph denoising and linguistic token graphs demonstrate competitive performance against lightweight GNNs with far greater transparency and efficiency. This work proposes a principled, interpretable, and resource-efficient alternative to deep neural architectures for learning on graphs.

Article 691

Title@2025-07-27 (7): Operator-Based Machine Intelligence: A Hilbert Space Framework for Spectral Learning and Symbolic Reasoning

Title: Operator-Based Machine Intelligence: A Hilbert Space Framework for Spectral Learning and Symbolic Reasoning

Operator-based Machine Intelligence: Ein Hilbert Space Framework für Spektrales Lernen und Symbolische Vernunft

以操作者为基础的机器情报:希尔伯特光学学习和符号理由空间框架 2507.21189v1

Authors (3): Andrew Kiruluta, Andreas Lemos, Priscilla Burity

Traditional machine learning models, particularly neural networks, are rooted in finite-dimensional parameter spaces and nonlinear function approximations. This report explores an alternative formulation where learning tasks are expressed as sampling and computation in infinite-dimensional Hilbert spaces, leveraging tools from functional analysis, signal processing, and spectral theory. We review foundational concepts such as Reproducing Kernel Hilbert Spaces (RKHS), spectral operator learning, and wavelet-domain representations. We present a rigorous mathematical formulation of learning in Hilbert spaces, highlight recent models based on scattering transforms and Koopman operators, and discuss advantages and limitations relative to conventional neural architectures. The report concludes by outlining directions for scalable and interpretable machine learning grounded in Hilbertian signal processing.

Article 692

Title@2025-07-27 (7): Bipedalism for Quadrupedal Robots: Versatile Loco-Manipulation through Risk-Adaptive Reinforcement Learning

Title: Bipedalism for Quadrupedal Robots: Versatile Loco-Manipulation through Risk-Adaptive Reinforcement Learning

Bipedalismus für Vierradroboter: Vielseitige Loko-Manipulation durch Risiko-Adaptive Verstärkungs-Lernen

四肢机器人的双轨主义:通过风险评估强化学习进行Versatile Loco-管理 2507.20382v1

Authors (3): Yuyou Zhang, Radu Corcodel, Ding Zhao

Loco-manipulation of quadrupedal robots has broadened robotic applications, but using legs as manipulators often compromises locomotion, while mounting arms complicates the system. To mitigate this issue, we introduce bipedalism for quadrupedal robots, thus freeing the front legs for versatile interactions with the environment. We propose a risk-adaptive distributional Reinforcement Learning (RL) framework designed for quadrupedal robots walking on their hind legs, balancing worst-case conservativeness with optimal performance in this inherently unstable task. During training, the adaptive risk preference is dynamically adjusted based on the uncertainty of the return, measured by the coefficient of variation of the estimated return distribution. Extensive experiments in simulation show our method’s superior performance over baselines. Real-world deployment on a Unitree Go2 robot further demonstrates the versatility of our policy, enabling tasks like cart pushing, obstacle probing, and payload transport, while showcasing robustness against challenging dynamics and external disturbances.
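
The uncertainty signal described above, the coefficient of variation (CV) of the estimated return distribution, is simple to compute from return samples or quantile estimates. The sketch below is illustrative only: the linear mapping from CV to a risk-aversion level and the `cv_max` parameter are assumptions, not the schedule used in the paper.

```python
import numpy as np

def coefficient_of_variation(return_samples, eps=1e-8):
    """CV = std / |mean| of sampled (or quantile-estimated) returns."""
    samples = np.asarray(return_samples, dtype=float)
    return samples.std() / (abs(samples.mean()) + eps)

def risk_level(return_samples, cv_max=1.0):
    """Hypothetical schedule: map CV linearly to a risk-aversion level
    in [0, 1], where 0 is risk-neutral and 1 is fully conservative."""
    cv = coefficient_of_variation(return_samples)
    return float(np.clip(cv / cv_max, 0.0, 1.0))

# A tight return distribution yields low risk aversion,
# a widely spread one yields a more conservative setting.
print(risk_level([10.0, 10.1, 9.9]))
print(risk_level([2.0, 18.0, 10.0]))
```

Because the CV is scale-free, the same threshold can be reused across tasks whose returns differ in magnitude, which is one plausible reason to prefer it over the raw standard deviation.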

Article 693

Title@2025-07-27 (7): Set-based Implicit Likelihood Inference of Galaxy Cluster Mass

Title: Set-based Implicit Likelihood Inference of Galaxy Cluster Mass

Set-based Implicit Likelihood Inferenz von Galaxy Cluster Masse

银河群群群集基于设定的隐含可能性推推推 2507.20378v1

Authors (2): Bonny Y. Wang, Leander Thiele

We present a set-based machine learning framework that infers posterior distributions of galaxy cluster masses from projected galaxy dynamics. Our model combines Deep Sets and conditional normalizing flows to incorporate both positional and velocity information of member galaxies to predict residual corrections to the $M$-$\sigma$ relation for improved interpretability. Trained on the Uchuu-UniverseMachine simulation, our approach significantly reduces scatter and provides well-calibrated uncertainties across the full mass range compared to traditional dynamical estimates.

Article 694

Title@2025-07-27 (7): PyG 2.0: Scalable Learning on Real World Graphs

Title: PyG 2.0: Scalable Learning on Real World Graphs

PyG 2.0: Scalable Learning on Real World Graphs

PyG 2.0: 真实世界图表上的可缩放学习 2507.16991v2

Authors (13): Matthias Fey, Jinu Sunil, Akihiro Nitta, Rishi Puri, Manan Shah, Blaž Stojanovič, Ramona Bendias, Alexandria Barghi, Vid Kocijan, Zecheng Zhang, Xinwei He, Jan Eric Lenssen, Jure Leskovec

PyG (PyTorch Geometric) has evolved significantly since its initial release, establishing itself as a leading framework for Graph Neural Networks. In this paper, we present PyG 2.0 (and its subsequent minor versions), a comprehensive update that introduces substantial improvements in scalability and real-world application capabilities. We detail the framework’s enhanced architecture, including support for heterogeneous and temporal graphs, scalable feature/graph stores, and various optimizations, enabling researchers and practitioners to tackle large-scale graph learning problems efficiently. Over recent years, PyG has supported graph learning in a large variety of application areas, which we summarize, while providing a deep dive into the important areas of relational deep learning and large language modeling.

Article 695

Title@2025-07-27 (7): WBHT: A Generative Attention Architecture for Detecting Black Hole Anomalies in Backbone Networks

Title: WBHT: A Generative Attention Architecture for Detecting Black Hole Anomalies in Backbone Networks

WBHT: Eine generative Aufmerksamkeitsarchitektur zur Erkennung von Schwarzlochanomalien in Backbone Networks

WBHT:用于检测后骨网络黑洞异常现象的引人注意结构 2507.20373v1

Authors (3): Kiymet Kaya, Elif Ak, Sule Gunduz Oguducu

We propose the Wasserstein Black Hole Transformer (WBHT) framework for detecting black hole (BH) anomalies in communication networks. These anomalies cause packet loss without failure notifications, disrupting connectivity and leading to financial losses. WBHT combines generative modeling, sequential learning, and attention mechanisms to improve BH anomaly detection. It integrates a Wasserstein generative adversarial network with attention mechanisms for stable training and accurate anomaly identification. The model uses long short-term memory layers to capture long-term dependencies and convolutional layers for local temporal patterns. A latent space encoding mechanism helps distinguish abnormal network behavior. Tested on real-world network data, WBHT outperforms existing models, achieving significant improvements in F1 score (ranging from 1.65% to 58.76%). Its efficiency and ability to detect previously undetected anomalies make it a valuable tool for proactive network monitoring and security, especially in mission-critical networks.

Article 696

Title@2025-07-27 (7): A Learning-based Domain Decomposition Method

Title: A Learning-based Domain Decomposition Method

Eine lernbasierte Methode der Domänenzersetzung

以学习为基础的域分解方法 2507.17328v2

Authors (3): Rui Wu, Nikola Kovachki, Burigede Liu

Recent developments in mechanical, aerospace, and structural engineering have driven a growing need for efficient ways to model and analyse structures at much larger and more complex scales than before. While established numerical methods like the Finite Element Method remain reliable, they often struggle with computational cost and scalability when dealing with large and geometrically intricate problems. In recent years, neural network-based methods have shown promise because of their ability to efficiently approximate nonlinear mappings. However, most existing neural approaches are still largely limited to simple domains, which makes them difficult to apply to real-world PDEs involving complex geometries. In this paper, we propose a learning-based domain decomposition method (L-DDM) that addresses this gap. Our approach uses a single pre-trained neural operator (originally trained on simple domains) as a surrogate model within a domain decomposition scheme, allowing us to tackle large and complicated domains efficiently. We provide a general theoretical result on the existence of neural operator approximations in the context of domain decomposition solution of abstract PDEs. We then demonstrate our method by accurately approximating solutions to elliptic PDEs with discontinuous microstructures in complex geometries, using a physics-pretrained neural operator (PPNO). Our results show that this approach not only outperforms current state-of-the-art methods on these challenging problems, but also offers resolution-invariance and strong generalization to microstructural patterns unseen during training.

Article 697

Title@2025-07-27 (7): Memorization: A Close Look at Books

Title: Memorization: A Close Look at Books

Auswendiglernen: Ein genauer Blick auf Bücher

记忆化:对书籍的近视 2504.12549v2

Authors (5): Iris Ma, Ian Domingo, Alberto Krone-Martins, Pierre Baldi, Cristina V. Lopes

To what extent can entire books be extracted from LLMs? Using the Llama 3 70B family of models, and the “prefix-prompting” extraction technique, we were able to auto-regressively reconstruct, with a very high level of similarity, one entire book (Alice’s Adventures in Wonderland) from just the first 500 tokens. We were also able to obtain high extraction rates on several other books, piece-wise. However, these successes do not extend uniformly to all books. We show that extraction rates of books correlate with book popularity and thus, likely duplication in the training data. We also confirm the undoing of mitigations in the instruction-tuned Llama 3.1, following recent work (Nasr et al., 2025). We further find that this undoing comes from changes to only a tiny fraction of weights concentrated primarily in the lower transformer blocks. Our results provide evidence of the limits of current regurgitation mitigation strategies and introduce a framework for studying how fine-tuning affects the retrieval of verbatim memorization in aligned LLMs.

Article 698

Title@2025-07-27 (7): Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning

Title: Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning

Clustering by Aufmerksamkeit: Leveraging Previous Fitted Transformers for Data Partitioning

集中集束注意: 利用事先适合的变异器来利用数据分割 2507.20369v1

Authors (2): Ahmed Shokry, Ayman Khalafallah

Clustering is a core task in machine learning with wide-ranging applications in data mining and pattern recognition. However, its unsupervised nature makes it inherently challenging. Many existing clustering algorithms suffer from critical limitations: they often require careful parameter tuning, exhibit high computational complexity, lack interpretability, or yield suboptimal accuracy, especially when applied to large-scale datasets. In this paper, we introduce a novel clustering approach based on meta-learning. Our approach eliminates the need for parameter optimization while achieving accuracy that outperforms state-of-the-art clustering techniques. The proposed technique leverages a few pre-clustered samples to guide the clustering process for the entire dataset in a single forward pass. Specifically, we employ a pre-trained Prior-Data Fitted Transformer Network (PFN) to perform clustering. The algorithm computes attention between the pre-clustered samples and the unclustered samples, allowing it to infer cluster assignments for the entire dataset based on the learned relation. We theoretically and empirically demonstrate that, given just a few pre-clustered examples, the model can generalize to accurately cluster the rest of the dataset. Experiments on challenging benchmark datasets show that our approach can successfully cluster well-separated data without any pre-clustered samples, and significantly improves performance when a few clustered samples are provided. We show that our approach is superior to the state-of-the-art techniques. These results highlight the effectiveness and scalability of our approach, positioning it as a promising alternative to existing clustering techniques.
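
The idea of attending from unclustered samples to a few pre-clustered ones can be illustrated without a transformer. The sketch below is a simplification and not the PFN from the paper: it applies softmax attention weights directly to raw features, uses negative squared Euclidean distance as the attention score, and the function name and `temperature` parameter are hypothetical.

```python
import numpy as np

def attention_assign(labeled_x, labeled_y, unlabeled_x, n_clusters, temperature=1.0):
    """Assign clusters to unlabeled points by softmax-weighting the
    labels of a few pre-clustered points (illustrative sketch only)."""
    labeled_x = np.asarray(labeled_x, dtype=float)
    unlabeled_x = np.asarray(unlabeled_x, dtype=float)
    # Scores: negative squared distance, shape (n_unlabeled, n_labeled).
    diff = unlabeled_x[:, None, :] - labeled_x[None, :, :]
    scores = -np.sum(diff ** 2, axis=-1) / temperature
    # Numerically stable softmax over the labeled points.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Attention-weighted average of one-hot labels gives cluster probabilities.
    one_hot = np.eye(n_clusters)[np.asarray(labeled_y)]
    probs = weights @ one_hot
    return probs.argmax(axis=1), probs

labels, probs = attention_assign(
    labeled_x=[[0.0, 0.0], [10.0, 10.0]],
    labeled_y=[0, 1],
    unlabeled_x=[[1.0, 0.0], [9.0, 11.0], [0.0, -1.0]],
    n_clusters=2,
)
print(labels)
```

All assignments come out of one matrix product, which mirrors the single-forward-pass property claimed in the abstract; a learned model would replace the raw-feature scores with trained query and key projections.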

Article 699

Title@2025-07-27 (7): Sequence-Aware Inline Measurement Attribution for Good-Bad Wafer Diagnosis

Title: Sequence-Aware Inline Measurement Attribution for Good-Bad Wafer Diagnosis

Sequence-Aware Inline-Messung Attribution für gut-schlechte Wafer-Diagnose

良好巴德瓦费尔诊断的测序内线测量属性 2507.20364v1

Authors (4): Kohei Miyaguchi, Masao Joko, Rebekah Sheraw, Tsuyoshi Idé

How can we identify problematic upstream processes when a certain type of wafer defect starts appearing at a quality checkpoint? Given the complexity of modern semiconductor manufacturing, which involves thousands of process steps, cross-process root cause analysis for wafer defects has been considered highly challenging. This paper proposes a novel framework called Trajectory Shapley Attribution (TSA), an extension of Shapley values (SV), a widely used attribution algorithm in explainable artificial intelligence research. TSA overcomes key limitations of standard SV, including its disregard for the sequential nature of manufacturing processes and its reliance on an arbitrarily chosen reference point. We applied TSA to a good-bad wafer diagnosis task in experimental front-end-of-line processes at the NY CREATES Albany NanoTech fab, aiming to identify measurement items (serving as proxies for process parameters) most relevant to abnormal defect occurrence.
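
For context, standard Shapley values (the baseline that TSA extends) can be computed exactly for a small number of players. The sketch below shows standard SV only, not TSA; the process names and the toy defect-score function are made up for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

def toy_defect_score(active):
    """Hypothetical score: 'etch' adds 3, 'litho' adds 1, and
    'etch' together with 'depo' adds an interaction term of 2."""
    score = 0.0
    if "etch" in active:
        score += 3.0
    if "litho" in active:
        score += 1.0
    if "etch" in active and "depo" in active:
        score += 2.0
    return score

print(shapley_values(["etch", "litho", "depo"], toy_defect_score))
```

The interaction term is split equally between `etch` and `depo`, and the result does not depend on the order in which the processes are listed. That order-invariance is exactly the disregard for sequence that the abstract identifies as a limitation of standard SV in manufacturing flows.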

Article 700

Title@2025-07-27 (7): Lagrangian neural networks for nonholonomic mechanics

Title: Lagrangian neural networks for nonholonomic mechanics

Lagrangeische neuronale Netzwerke für nichtholonomische Mechanik

Lagrangian 神经网络,用于非蛋白体力学机械学 2411.00110v2

Authors (3): Viviana Alejandra Diaz, Leandro Martin Salomone, Marcela Zuccalli

Lagrangian Neural Networks (LNNs) are a powerful tool for addressing physical systems, particularly those governed by conservation laws. LNNs can parametrize the Lagrangian of a system to predict trajectories with nearly conserved energy. These techniques have proven effective in unconstrained systems as well as those with holonomic constraints. In this work, we adapt LNN techniques to mechanical systems with nonholonomic constraints. We test our approach on some well-known examples with nonholonomic constraints, showing that incorporating these restrictions into the neural network’s learning not only improves trajectory estimation accuracy but also ensures adherence to constraints and yields better energy behavior compared to the unconstrained counterpart.

Article 701

Title@2025-07-27 (7): MH-GIN: Multi-scale Heterogeneous Graph-based Imputation Network for AIS Data (Extended Version)

Title: MH-GIN: Multi-scale Heterogeneous Graph-based Imputation Network for AIS Data (Extended Version)

MH-GIN: Multiskaliges Heterogenes Graph-basiertes Imputationsnetzwerk für AIS-Daten (erweiterte Version)

MH-GIN:AIS数据多比例异异形图表计算网(Expended 版本) 2507.20362v1

Authors (6): Hengyu Liu, Tianyi Li, Yuqiang He, Kristian Torp, Yushuai Li, Christian S. Jensen

Location-tracking data from the Automatic Identification System, much of which is publicly available, plays a key role in a range of maritime safety and monitoring applications. However, the data suffers from missing values that hamper downstream applications. Imputing the missing values is challenging because the values of different heterogeneous attributes are updated at diverse rates, resulting in multi-scale dependencies among attributes. Existing imputation methods that assume similar update rates across attributes are unable to capture and exploit such dependencies, limiting their imputation accuracy. We propose MH-GIN, a Multi-scale Heterogeneous Graph-based Imputation Network that aims to improve imputation accuracy by capturing multi-scale dependencies. Specifically, MH-GIN first extracts multi-scale temporal features for each attribute while preserving their intrinsic heterogeneous characteristics. Then, it constructs a multi-scale heterogeneous graph to explicitly model dependencies between heterogeneous attributes to enable more accurate imputation of missing values through graph propagation. Experimental results on two real-world datasets show that MH-GIN achieves an average 57% reduction in imputation errors compared to state-of-the-art methods, while maintaining computational efficiency. The source code and implementation details of MH-GIN are publicly available at https://github.com/hyLiu1994/MH-GIN.

Article 702

Title@2025-07-27 (7): Wafer Defect Root Cause Analysis with Partial Trajectory Regression

Title: Wafer Defect Root Cause Analysis with Partial Trajectory Regression

Wafer fehlerhafte Wurzelursachenanalyse mit partieller Trajektorieregression

Wafer 偏差根源分析,带有部分轨倒退 2507.20357v1

Authors (4): Kohei Miyaguchi, Masao Joko, Rebekah Sheraw, Tsuyoshi Idé

Identifying upstream processes responsible for wafer defects is challenging due to the combinatorial nature of process flows and the inherent variability in processing routes, which arises from factors such as rework operations and random process waiting times. This paper presents a novel framework for wafer defect root cause analysis, called Partial Trajectory Regression (PTR). The proposed framework is carefully designed to address the limitations of conventional vector-based regression models, particularly in handling variable-length processing routes that span a large number of heterogeneous physical processes. To compute the attribution score of each process given a detected high defect density on a specific wafer, we propose a new algorithm that compares two counterfactual outcomes derived from partial process trajectories. This is enabled by new representation learning methods, proc2vec and route2vec. We demonstrate the effectiveness of the proposed framework using real wafer history data from the NY CREATES fab in Albany.


Article 703

Title@2025-07-27 (7): A Theory of $θ$-Expectations

Title: A Theory of $θ$-Expectations

Eine Theorie von $θ$-Erwartungen

美元预期值的理论 2507.20353v1

Authors (1): Qian Qi

The canonical theory of stochastic calculus under ambiguity, founded on sub-additivity, is insensitive to non-convex uncertainty structures, leading to an identifiability impasse. This paper develops a mathematical framework for an identifiable calculus sensitive to non-convex geometry. We introduce the $\theta$-BSDE, a class of backward stochastic differential equations where the driver is determined by a pointwise maximization over a primitive, possibly non-convex, uncertainty set. The system’s tractability is predicated not on convexity, but on a global analytic hypothesis: the existence of a unique and globally Lipschitz maximizer map for the driver function. Under this hypothesis, which carves out a tractable class of models, we establish well-posedness via a fixed-point argument. For a distinct, geometrically regular class of models, we prove a result of independent interest: under non-degeneracy conditions from Malliavin calculus, the maximizer is unique along any solution path, ensuring the model’s internal consistency. We clarify the fundamental logical gap between this pathwise property and the global regularity required by our existence proof. The resulting valuation operator defines a dynamically consistent expectation, and we establish its connection to fully nonlinear PDEs via a Feynman-Kac formula.


Article 704

Title@2025-07-27 (7): Embeddings to Diagnosis: Latent Fragility under Agentic Perturbations in Clinical LLMs

Title: Embeddings to Diagnosis: Latent Fragility under Agentic Perturbations in Clinical LLMs

Einbettungen in die Diagnose: Latent Fragilität unter Agentische Störungen in klinischen LLMs

诊断的嵌入:临床LMS中的干燥干扰下的潜在易碎性 2507.21188v1

Authors (1): Raj Krishnan Vijayaraj

LLMs for clinical decision support often fail under small but clinically meaningful input shifts such as masking a symptom or negating a finding, despite high performance on static benchmarks. These reasoning failures frequently go undetected by standard NLP metrics, which are insensitive to latent representation shifts that drive diagnosis instability. We propose a geometry-aware evaluation framework, LAPD (Latent Agentic Perturbation Diagnostics), which systematically probes the latent robustness of clinical LLMs under structured adversarial edits. Within this framework, we introduce Latent Diagnosis Flip Rate (LDFR), a model-agnostic diagnostic signal that captures representational instability when embeddings cross decision boundaries in PCA-reduced latent space. Clinical notes are generated using a structured prompting pipeline grounded in diagnostic reasoning, then perturbed along four axes: masking, negation, synonym replacement, and numeric variation to simulate common ambiguities and omissions. We compute LDFR across both foundation and clinical LLMs, finding that latent fragility emerges even under minimal surface-level changes. Finally, we validate our findings on 90 real clinical notes from the DiReCT benchmark (MIMIC-IV), confirming the generalizability of LDFR beyond synthetic settings. Our results reveal a persistent gap between surface robustness and semantic stability, underscoring the importance of geometry-aware auditing in safety-critical clinical AI.
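The flip-rate idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how decision boundaries are defined, so the sketch assumes a nearest-centroid classifier in the PCA-reduced space, and every function name here is hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and the top-k principal axes of X (rows are embeddings)."""
    mu = X.mean(axis=0)
    # SVD of the centred data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def pca_apply(X, mu, axes):
    """Project rows of X onto the fitted principal axes."""
    return (X - mu) @ axes.T

def nearest_centroid_predict(Z, centroids):
    """Assign each row of Z to the index of its closest centroid."""
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def latent_flip_rate(clean, perturbed, labels, k=2):
    """Fraction of samples whose predicted class changes after perturbation.

    Assumption: the decision boundary is the nearest-centroid boundary
    fitted on the clean embeddings in a k-dimensional PCA space.
    """
    mu, axes = pca_fit(clean, k)
    z_clean = pca_apply(clean, mu, axes)
    z_pert = pca_apply(perturbed, mu, axes)
    classes = np.unique(labels)
    centroids = np.stack([z_clean[labels == c].mean(axis=0) for c in classes])
    before = nearest_centroid_predict(z_clean, centroids)
    after = nearest_centroid_predict(z_pert, centroids)
    return float((before != after).mean())
```

Here `clean` and `perturbed` would hold one embedding per clinical note before and after an edit such as masking or negation. A rate near zero means the edits rarely push embeddings across the assumed boundary.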


Article 705

Title@2025-07-27 (7): Computational Advantages of Multi-Grade Deep Learning: Convergence Analysis and Performance Insights

Title: Computational Advantages of Multi-Grade Deep Learning: Convergence Analysis and Performance Insights

Computationale Vorteile von Multi-Grade Deep Learning: Konvergenzanalyse und Leistungseinblicke

多年级深层学习的计算优势:趋同分析和业绩透视 2507.20351v1

Authors (2): Ronglong Fang, Yuesheng Xu

Multi-grade deep learning (MGDL) has been shown to significantly outperform the standard single-grade deep learning (SGDL) across various applications. This work investigates the computational advantages of MGDL, focusing on its performance in image regression, denoising, and deblurring tasks and comparing it to SGDL. We establish convergence results for the gradient descent (GD) method applied to these models and provide mathematical insights into MGDL’s improved performance. In particular, we demonstrate that MGDL is more robust to the choice of learning rate under GD than SGDL. Furthermore, we analyze the eigenvalue distributions of the Jacobian matrices associated with the iterative schemes arising from the GD iterations, offering an explanation for MGDL’s enhanced training stability.


Article 706

Title@2025-07-27 (7): From Observations to Causations: A GNN-based Probabilistic Prediction Framework for Causal Discovery

Title: From Observations to Causations: A GNN-based Probabilistic Prediction Framework for Causal Discovery

Von Beobachtungen zu Kausationen: Ein auf GNN basierendes probabilistisches Prognose-Framework für die kausale Entdeckung

从观察到因果关系:基于GNN的 “ 发现原因概率预测框架 “ 2507.20349v1

Authors (2): Rezaur Rashid, Gabriel Terejanu

Causal discovery from observational data is challenging, especially with large datasets and complex relationships. Traditional methods often struggle with scalability and capturing global structural information. To overcome these limitations, we introduce a novel graph neural network (GNN)-based probabilistic framework that learns a probability distribution over the entire space of causal graphs, unlike methods that output a single deterministic graph. Our framework leverages a GNN that encodes both node and edge attributes into a unified graph representation, enabling the model to learn complex causal structures directly from data. The GNN model is trained on a diverse set of synthetic datasets augmented with statistical and information-theoretic measures, such as mutual information and conditional entropy, capturing both local and global data properties. We frame causal discovery as a supervised learning problem, directly predicting the entire graph structure. Our approach demonstrates superior performance, outperforming both traditional and recent non-GNN-based methods, as well as a GNN-based approach, in terms of accuracy and scalability on synthetic and real-world datasets without further training. This probabilistic framework significantly improves causal structure learning, with broad implications for decision-making and scientific discovery across various fields.


Article 707

Title@2025-07-27 (7): RadMamba: Efficient Human Activity Recognition through Radar-based Micro-Doppler-Oriented Mamba State-Space Model

Title: RadMamba: Efficient Human Activity Recognition through Radar-based Micro-Doppler-Oriented Mamba State-Space Model

RadMamba: Effiziente Erkennung menschlicher Aktivität durch Radar-basiertes Mikro-Doppler-Orientiertes Mamba State-Space-Modell

RadMamba:通过以雷达为基础的以微型多普勒为导向的Mamba国家空间模型,有效认识人类活动 2504.12039v2

Authors (3): Yizhuo Wu, Francesco Fioranelli, Chang Gao

Radar-based human activity recognition (HAR) has emerged as a promising alternative to conventional monitoring approaches, such as wearable devices and camera-based systems, due to its unique privacy preservation and robustness advantages. However, existing solutions based on convolutional and recurrent neural networks, although effective, are computationally demanding during deployment. This limits their applicability in scenarios with constrained resources or those requiring multiple sensors. Advanced architectures, such as Vision Transformer (ViT) and State-Space Model (SSM) architectures, offer improved modeling capabilities and have made efforts toward lightweight designs. However, their computational complexity remains relatively high. To leverage the strengths of transformer architectures while simultaneously enhancing accuracy and reducing computational complexity, this paper introduces RadMamba, a parameter-efficient, radar micro-Doppler-oriented Mamba SSM specifically tailored for radar-based HAR. Across three diverse datasets, RadMamba matches the top-performing previous model’s 99.8% classification accuracy on Dataset DIAT with only 1/400 of its parameters and equals the leading models’ 92.0% accuracy on Dataset CI4R with merely 1/10 of their parameters. In scenarios with continuous sequences of actions evaluated on Dataset UoG2020, RadMamba surpasses other models with significantly higher parameter counts by at least 3%, achieving this with only 6.7k parameters. Our code is available at: https://github.com/lab-emi/AIRHAR.


Article 708

Title@2025-07-27 (7): Hypergraph Neural Networks Reveal Spatial Domains from Single-cell Transcriptomics Data

Title: Hypergraph Neural Networks Reveal Spatial Domains from Single-cell Transcriptomics Data

Hypergraph Neuronale Netzwerke enthüllen räumliche Domänen aus Single-cell-Transkriptionsdaten

从单细胞转换器数据中提取空间域域 2410.19868v2

Authors (2): Mehrad Soltani, Luis Rueda

The task of spatial clustering of transcriptomics data is of paramount importance. It enables the classification of tissue samples into diverse subpopulations of cells, which, in turn, facilitates the analysis of the biological functions of clusters, tissue reconstruction, and cell-cell interactions. Many approaches leverage gene expressions, spatial locations, and histological images to detect spatial domains; however, Graph Neural Networks (GNNs), the state-of-the-art models, are limited by their assumption of pairwise connections between nodes. In domain detection for spatial transcriptomics, some cells are not directly related yet are grouped into the same domain, which shows that GNNs cannot capture implicit connections among cells. While graph edges connect only two nodes, hyperedges connect an arbitrary number of nodes, which lets Hypergraph Neural Networks (HGNNs) capture and utilize richer and more complex structural information than traditional GNNs. Because ground-truth labels are unavailable, we use autoencoders, which are well suited to unsupervised learning. Our model achieves the highest iLISI score among the compared methods, 1.843, indicating the greatest diversity of cell types identified. Furthermore, our model outperforms other methods in downstream clustering, achieving the highest ARI value of 0.51 and Leiden score of 0.60.


Article 709

Title@2025-07-27 (7): Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data

Title: Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data

Interpretierbare Graph Kolmogorov-Arnold-Netzwerke für Multi-Cancer-Klassifikation und Biomarker-Identifikation mittels Multi-Omics-Daten

利用多有机数据进行多癌症分类和生物标志识别的可解释图表 Kolmogorov-Arnold网络 2503.22939v3

Authors (7): Fadi Alharbi, Nishant Budhiraja, Aleksandar Vakanski, Boyu Zhang, Murtada K. Elbashir, Harshith Guduru, Mohanad Mohammed

The integration of heterogeneous multi-omics datasets at a systems level remains a central challenge for developing analytical and computational models in precision cancer diagnostics. This paper introduces Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN), a deep learning framework that utilizes messenger-RNA, micro-RNA sequences, and DNA methylation samples together with Protein-Protein Interaction (PPI) networks for cancer classification across 31 different cancer types. The proposed approach combines differential gene expression with DESeq2, Linear Models for Microarray (LIMMA), and Least Absolute Shrinkage and Selection Operator (LASSO) regression to reduce multi-omics data dimensionality while preserving relevant biological features. The model architecture is based on the Kolmogorov-Arnold theorem principle and uses trainable univariate functions to enhance interpretability and feature analysis. MOGKAN achieves classification accuracy of 96.28 percent and exhibits low experimental variability in comparison to related deep learning-based models. The biomarkers identified by MOGKAN were validated as cancer-related markers through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates robust predictive performance and interpretability with potential to enhance the translation of complex multi-omics data into clinically actionable cancer diagnostics.


Article 710

Title@2025-07-27 (7): Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning

Title: Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning

Pflegen hilfreicher, personalisierter und kreativer KI-Lehrer: Ein Rahmen für pädagogische Ausrichtung mittels Stärkungslernen

培养有助、个性化和创意的AI导师:利用强化学习实现教学协调的框架 2507.20335v1

Authors (11): Siyu Song, Wentao Liu, Ye Lu, Ruohua Zhang, Tao Liu, Jinze Lv, Xinyun Wang, Aimin Zhou, Fei Tan, Bo Jiang, Hao Hao

The integration of large language models (LLMs) into education presents unprecedented opportunities for scalable personalized learning. However, standard LLMs often function as generic information providers, lacking alignment with fundamental pedagogical principles such as helpfulness, student-centered personalization, and creativity cultivation. To bridge this gap, we propose EduAlign, a novel framework designed to guide LLMs toward becoming more effective and responsible educational assistants. EduAlign consists of two main stages. In the first stage, we curate a dataset of 8k educational interactions and annotate them, both manually and automatically, along three key educational dimensions: Helpfulness, Personalization, and Creativity (HPC). These annotations are used to train HPC-RM, a multi-dimensional reward model capable of accurately scoring LLM outputs according to these educational principles. We further evaluate the consistency and reliability of this reward model. In the second stage, we leverage HPC-RM as a reward signal to fine-tune a pre-trained LLM using Group Relative Policy Optimization (GRPO) on a set of 2k diverse prompts. We then assess the pre- and post-finetuning models on both educational and general-domain benchmarks across the three HPC dimensions. Experimental results demonstrate that the fine-tuned model exhibits significantly improved alignment with pedagogical helpfulness, personalization, and creativity stimulation. This study presents a scalable and effective approach to aligning LLMs with nuanced and desirable educational traits, paving the way for the development of more engaging, pedagogically aligned AI tutors.


Article 711

Title@2025-07-27 (7): The Blessing and Curse of Dimensionality in Safety Alignment

Title: The Blessing and Curse of Dimensionality in Safety Alignment

Der Segen und Fluch der Dimensionalität in der Sicherheitsausrichtung

安全协调中多维度的祝福和诅咒 2507.20333v1

Authors (3): Rachel S. Y. Teo, Laziz U. Abdullaev, Tan M. Nguyen

The focus on safety alignment in large language models (LLMs) has increased significantly due to their widespread adoption across different domains. The scale of LLMs plays a contributing role in their success, and the growth in parameter count follows larger hidden dimensions. In this paper, we hypothesize that while the increase in dimensions has been a key advantage, it may lead to emergent problems as well. These problems emerge because the linear structures in the activation space can be exploited, in the form of activation engineering, to circumvent a model’s safety alignment. Through detailed visualizations of linear subspaces associated with different concepts, such as safety, across various model scales, we show that the curse of high-dimensional representations uniquely impacts LLMs. Further substantiating our claim, we demonstrate that projecting the representations of the model onto a lower dimensional subspace can preserve sufficient information for alignment while avoiding those linear structures. Empirical results confirm that such dimensional reduction significantly reduces susceptibility to jailbreaking through representation engineering. Building on our empirical validations, we provide theoretical insights into these linear jailbreaking methods relative to a model’s hidden dimensions. Broadly speaking, our work posits that the high dimensions of a model’s internal representations can be both a blessing and a curse in safety alignment.


Article 712

Title@2025-07-27 (7): FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing

Title: FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing

FlowAlign: Trajektorie-regularisierte, inversionsfreie Fluss-basierte Bildbearbeitung

流动对等: 轨迹- 重新分类、转换- 无流动图像编辑 2505.23145v4

Authors (4): Jeongsol Kim, Yeobin Hong, Jonghyun Park, Jong Chul Ye

Recent inversion-free, flow-based image editing methods such as FlowEdit leverage a pre-trained noise-to-image flow model such as Stable Diffusion 3, enabling text-driven manipulation by solving an ordinary differential equation (ODE). While the lack of exact latent inversion is a core advantage of these methods, it often results in unstable editing trajectories and poor source consistency. To address this limitation, we propose FlowAlign, a novel inversion-free flow-based framework for consistent image editing with optimal control-based trajectory control. Specifically, FlowAlign introduces source similarity at the terminal point as a regularization term to promote smoother and more consistent trajectories during the editing process. Notably, our terminal point regularization is shown to explicitly balance semantic alignment with the edit prompt and structural consistency with the source image along the trajectory. Furthermore, FlowAlign naturally supports reverse editing by simply reversing the ODE trajectory, highlighting the reversible and consistent nature of the transformation. Extensive experiments demonstrate that FlowAlign outperforms existing methods in both source preservation and editing controllability.


Article 713

Title@2025-07-27 (7): MIPS: a Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction

Title: MIPS: a Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction

MIPS: ein multimodales Infinite Polymer Sequence Pre-Training Framework für Polymer Property Prediction

MIPS: 聚合物财产预测的多式联运无限聚合物序列培训前框架 2507.20326v1

Authors (5): Jiaxi Wang, Yaosen Min, Xun Zhu, Miao Li, Ji Wu

Polymers, composed of repeating structural units called monomers, are fundamental materials in daily life and industry. Accurate property prediction for polymers is essential for their design, development, and application. However, existing modeling approaches, which typically represent polymers by their constituent monomers, struggle to capture the properties of the whole polymer, since the properties change during the polymerization process. In this study, we propose a Multimodal Infinite Polymer Sequence (MIPS) pre-training framework, which represents polymers as infinite sequences of monomers and integrates both topological and spatial information for comprehensive modeling. From the topological perspective, we generalize the message passing mechanism (MPM) and the graph attention mechanism (GAM) to infinite polymer sequences. For MPM, we demonstrate that applying MPM to infinite polymer sequences is equivalent to applying MPM on the induced star-linking graph of monomers. For GAM, we propose to further replace global graph attention with localized graph attention (LGA). Moreover, we show the robustness of the “star linking” strategy through the Repeat and Shift Invariance Test (RSIT). Despite its robustness, the “star linking” strategy exhibits limitations when monomer side chains contain ring structures, a common characteristic of polymers, as it fails the Weisfeiler-Lehman (WL) test. To overcome this issue, we propose backbone embedding to enhance the capability of MPM and LGA on infinite polymer sequences. From the spatial perspective, we extract 3D descriptors of repeating monomers to capture spatial information. Finally, we design a cross-modal fusion mechanism to unify the topological and spatial information. Experimental validation across eight diverse polymer property prediction tasks reveals that MIPS achieves state-of-the-art performance.


Article 714

Title@2025-07-27 (7): ELMES: An Automated Framework for Evaluating Large Language Models in Educational Scenarios

Title: ELMES: An Automated Framework for Evaluating Large Language Models in Educational Scenarios

ELMES: Ein automatisierter Rahmen für die Bewertung großer Sprachmodelle in Bildungsszenarien

ELMES:评估教育情景中大语言模式自动框架 2507.22947v1

Authors (12): Shou’ang Wei, Xinyun Wang, Shuzhen Bi, Jian Chen, Ruijia Li, Bo Jiang, Xin Lin, Min Zhang, Yu Song, BingDong Li, Aimin Zhou, Hao Hao

The emergence of Large Language Models (LLMs) presents transformative opportunities for education, generating numerous novel application scenarios. However, significant challenges remain: evaluation metrics vary substantially across different educational scenarios, while many emerging scenarios lack appropriate assessment metrics. Current benchmarks predominantly measure general intelligence rather than pedagogical capabilities. To address this gap, we introduce ELMES, an open-source automated evaluation framework specifically designed for assessing LLMs in educational settings. ELMES features a modular architecture that enables researchers to create dynamic, multi-agent dialogues through simple configuration files, facilitating flexible scenario design without requiring extensive programming expertise. The framework incorporates a hybrid evaluation engine that objectively quantifies traditionally subjective pedagogical metrics using an LLM-as-a-Judge methodology. We conduct systematic benchmarking of state-of-the-art LLMs across four critical educational scenarios: Knowledge Point Explanation, Guided Problem-Solving Teaching, Interdisciplinary Lesson Plan Generation, and Contextualized Question Generation, employing fine-grained metrics developed in collaboration with education specialists. Our results demonstrate distinct capability distributions among models, revealing context-specific strengths and limitations. ELMES provides educators and researchers with an accessible evaluation framework that significantly reduces adaptation barriers for diverse educational applications while advancing the practical implementation of LLMs in pedagogy. The framework is publicly available at https://github.com/sii-research/elmes.git.


Article 715

Title@2025-07-27 (7): A Comparative Study of OpenMP Scheduling Algorithm Selection Strategies

Title: A Comparative Study of OpenMP Scheduling Algorithm Selection Strategies

Eine vergleichende Studie der OpenMP-Scheduling-Algorithm-Auswahlstrategien

OpenMP 测高计表选择战略比较研究 2507.20312v1

Authors (6): Jonas H. Müller Korndörfer, Ali Mohammed, Ahmed Eleliemy, Quentin Guilloteau, Reto Krummenacher, Florina M. Ciorba

Scientific and data science applications are becoming increasingly complex, with growing computational and memory demands. Modern high performance computing (HPC) systems provide high parallelism and heterogeneity across nodes, devices, and cores. To achieve good performance, effective scheduling and load balancing techniques are essential. Parallel programming frameworks such as OpenMP now offer a variety of advanced scheduling algorithms to support diverse applications and platforms. This creates an instance of the scheduling algorithm selection problem, which involves identifying the most suitable algorithm for a given combination of workload and system characteristics. In this work, we explore learning-based approaches for selecting scheduling algorithms in OpenMP. We propose and evaluate expert-based and reinforcement learning (RL)-based methods, and conduct a detailed performance analysis across six applications and three systems. Our results show that RL methods are capable of learning high-performing scheduling decisions, although they require significant exploration, with the choice of reward function playing a key role. Expert-based methods, in contrast, rely on prior knowledge and involve less exploration, though they may not always identify the optimal algorithm for a specific application-system pair. By combining expert knowledge with RL-based learning, we achieve improved performance and greater adaptability. Overall, this work demonstrates that dynamic selection of scheduling algorithms during execution is both viable and beneficial for OpenMP applications. The approach can also be extended to MPI-based programs, enabling optimization of scheduling decisions across multiple levels of parallelism.
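To make the idea of learning a scheduling choice at run time concrete, here is a minimal sketch using a stateless epsilon-greedy bandit. It is a stand-in, not the RL methods or reward functions evaluated in the paper, which the abstract does not specify. The `static`, `dynamic`, and `guided` names are standard OpenMP schedule kinds. The timing function is a toy simulation with made-up numbers, and all other names are hypothetical.

```python
import random

def select_schedule(time_loop, schedules, steps=200, epsilon=0.1, seed=0):
    """Pick a schedule by epsilon-greedy trial and error.

    time_loop(schedule, rng) returns the measured loop time for one execution;
    the reward is its negative, so faster schedules score higher.
    """
    rng = random.Random(seed)
    mean_reward = {s: 0.0 for s in schedules}
    count = {s: 0 for s in schedules}
    for _ in range(steps):
        untried = [s for s in schedules if count[s] == 0]
        if untried:
            choice = untried[0]                           # try each option once
        elif rng.random() < epsilon:
            choice = rng.choice(schedules)                # explore
        else:
            choice = max(schedules, key=mean_reward.get)  # exploit
        reward = -time_loop(choice, rng)
        count[choice] += 1
        # Incremental update of the running mean reward for this schedule.
        mean_reward[choice] += (reward - mean_reward[choice]) / count[choice]
    return max(schedules, key=mean_reward.get), count

# Toy timing model (illustrative numbers only, not measurements).
BASE_TIME = {"static": 1.0, "dynamic": 0.7, "guided": 0.8}

def simulated_loop_time(schedule, rng):
    return BASE_TIME[schedule] + rng.uniform(-0.02, 0.02)

best, counts = select_schedule(simulated_loop_time, ["static", "dynamic", "guided"])
```

In a real application the timing callback would wrap an actual parallel loop. The exploration cost the abstract mentions shows up here as executions spent on slower schedules.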


Article 716

Title@2025-07-27 (7): What is Wrong with Perplexity for Long-context Language Modeling?

Title: What is Wrong with Perplexity for Long-context Language Modeling?

Was ist falsch an Verwirrung für Langkontext-Sprachenmodellierung?

长文本语言建模的复杂性有什么问题? 2410.23771v5

Authors (8): Lizhe Fang, Yifei Wang, Zhaoyang Liu, Chenheng Zhang, Stefanie Jegelka, Jinyang Gao, Bolin Ding, Yisen Wang

Handling long-context inputs is crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning. While recent approaches have extended the context windows of LLMs and employed perplexity (PPL) as a standard evaluation metric, PPL has proven unreliable for assessing long-context capabilities. The underlying cause of this limitation has remained unclear. In this work, we provide a comprehensive explanation for this issue. We find that PPL overlooks key tokens, which are essential for long-context understanding, by averaging across all tokens and thereby obscuring the true performance of models in long-context scenarios. To address this, we propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them. Our experiments demonstrate that LongPPL strongly correlates with performance on various long-context benchmarks (e.g., Pearson correlation of -0.96), significantly outperforming traditional PPL in predictive accuracy. Additionally, we introduce LongCE (Long-context Cross-Entropy) loss, a re-weighting strategy for fine-tuning that prioritizes key tokens, leading to consistent improvements across diverse benchmarks. In summary, these contributions offer deeper insights into the limitations of PPL and present effective solutions for accurately evaluating and enhancing the long-context capabilities of LLMs. Code is available at https://github.com/PKU-ML/LongPPL.
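The contrast between averaging over all tokens and averaging over key tokens only can be shown with a small sketch. It is an illustration under stated assumptions, not the LongPPL definition from the paper. The selection rule (a log-probability gain above a fixed threshold when the long context is available) and the names `key_token_perplexity` and `gain_threshold` are hypothetical, and the log-probabilities are toy values.

```python
import math

def perplexity(logprobs):
    """Standard perplexity: exp of the mean negative log-likelihood."""
    return math.exp(-sum(logprobs) / len(logprobs))

def key_token_perplexity(lp_long, lp_short, gain_threshold=1.0):
    """Perplexity restricted to tokens that benefit from the long context.

    Assumed rule: a token is "key" if its log-probability under the long
    context exceeds its log-probability under a truncated short context
    by at least gain_threshold nats.
    """
    key = [l for l, s in zip(lp_long, lp_short) if l - s >= gain_threshold]
    if not key:
        raise ValueError("no key tokens selected")
    return perplexity(key), len(key)

# Toy per-token log-probabilities (natural log), for illustration only.
lp_long = [-0.1, -0.2, -3.0, -0.1, -2.5]
lp_short = [-0.1, -0.3, -6.0, -0.1, -5.0]

overall_ppl = perplexity(lp_long)
key_ppl, n_key = key_token_perplexity(lp_long, lp_short)
```

In this toy case the three easy tokens pull the overall perplexity down, while the two context-dependent tokens remain poorly predicted. That is the averaging effect the abstract describes.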


Article 717

Title@2025-07-27 (7): First-Order Sparse Convex Optimization: Better Rates with Sparse Updates

Title: First-Order Sparse Convex Optimization: Better Rates with Sparse Updates

Sparse Convex Optimization: Bessere Preise mit Sparse-Updates

第一序式螺旋螺旋式最优化: 与粗序更新相比, 利率更好。 2506.19075v2

Authors (1): Dan Garber

It was recently established that for convex optimization problems with a sparse optimal solution (be it entry-wise sparsity or matrix rank-wise sparsity) it is possible to have linear convergence rates which depend on an improved mixed-norm condition number of the form $\frac{\beta_1 s}{\alpha_2}$, where $\beta_1$ is the $\ell_1$-Lipschitz continuity constant of the gradient, $\alpha_2$ is the $\ell_2$-quadratic growth constant, and $s$ is the sparsity of the optimal solution. However, beyond the improved convergence rate, these methods are unable to leverage the sparsity of optimal solutions to also improve the runtime of each iteration, which may still be prohibitively high for high-dimensional problems. In this work, we establish that linear convergence rates which depend on this improved condition number can be obtained using only sparse updates, which may result in overall significantly improved running times. Moreover, our methods are considerably easier to implement.


Article 718

Title@2025-07-27 (7): Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach

Title: Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach

Auf dem Weg zu einem generalisierten Parameter Tuning in kohärenten Ising-Maschinen: Ein portfoliobasierter Ansatz

向一致的自相矛盾机器的一般参数图示:基于组合的办法 2507.20295v1

Authors (4): Tatsuro Hanyu, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

Coherent Ising Machines (CIMs) have recently gained attention as a promising computing model for solving combinatorial optimization problems. In particular, the Chaotic Amplitude Control (CAC) algorithm has demonstrated high solution quality, but its performance is highly sensitive to a large number of hyperparameters, making efficient tuning essential. In this study, we present an algorithm portfolio approach for hyperparameter tuning in CIMs employing the Chaotic Amplitude Control with momentum (CACm) algorithm. Our method incorporates multiple search strategies, enabling flexible and effective adaptation to the characteristics of the hyperparameter space. Specifically, we propose two representative tuning methods, Method A and Method B. Method A optimizes each hyperparameter sequentially with a fixed total number of trials, while Method B prioritizes hyperparameters based on initial evaluations before applying Method A in order. Performance evaluations were conducted on the Supercomputer “Flow” at Nagoya University, using planted Wishart instances and Time to Solution (TTS) as the evaluation metric. Compared to the baseline performance with best-known hyperparameters, Method A achieved up to 1.47x improvement, and Method B achieved up to 1.65x improvement. These results demonstrate the effectiveness of the algorithm portfolio approach in enhancing the tuning process for CIMs.
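The sequential strategy attributed to Method A can be sketched as a one-hyperparameter-at-a-time search under a fixed trial budget. This is a loose illustration, not the paper's procedure. The equal split of the budget across hyperparameters, the candidate lists, the toy objective, and the names `x` and `y` are all assumptions; the abstract does not name the CACm hyperparameters.

```python
def sequential_tune(objective, defaults, candidates, total_trials):
    """Tune one hyperparameter at a time, keeping the others at their current best.

    objective(config) returns a score where lower is better (for example a
    time-to-solution). The budget is split evenly across hyperparameters,
    which is an assumption made for this sketch.
    """
    best = dict(defaults)
    best_score = objective(best)          # the baseline counts as one trial
    trials_used = 1
    per_param = (total_trials - 1) // len(candidates)
    for name, values in candidates.items():
        for value in values[:per_param]:
            trial = dict(best, **{name: value})
            score = objective(trial)
            trials_used += 1
            if score < best_score:
                best, best_score = trial, score
    return best, best_score, trials_used

# Toy objective with its minimum at x=3, y=-1 (illustrative only).
def toy_objective(cfg):
    return (cfg["x"] - 3) ** 2 + (cfg["y"] + 1) ** 2

best_cfg, best_val, used = sequential_tune(
    toy_objective,
    defaults={"x": 0, "y": 0},
    candidates={"x": [1, 2, 3, 4], "y": [-2, -1, 0, 1]},
    total_trials=9,
)
```

A Method B style variant would first rank the hyperparameters by some initial sensitivity estimate and then pass them to the same routine in that order.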


Article 719

Title@2025-07-27 (7): Context-Aware Deep Lagrangian Networks for Model Predictive Control

Title: Context-Aware Deep Lagrangian Networks for Model Predictive Control

Context-Aware Deep Lagrangian Networks für Modellvorhersagesteuerung

用于模型预测控制的深拉格朗江网络 2506.15249v3

Authors (3): Lucas Schulze, Jan Peters, Oleg Arenz

Controlling a robot based on physics-consistent dynamic models, such as Deep Lagrangian Networks (DeLaN), can improve the generalizability and interpretability of the resulting behavior. However, in complex environments, the number of objects to potentially interact with is vast, and their physical properties are often uncertain. This complexity makes it infeasible to employ a single global model. Therefore, we need to resort to online system identification of context-aware models that capture only the currently relevant aspects of the environment. While physical principles such as the conservation of energy may not hold across varying contexts, ensuring physical plausibility for any individual context-aware model can still be highly desirable, particularly when using it for receding horizon control methods such as model predictive control (MPC). Hence, in this work, we extend DeLaN to make it context-aware, combine it with a recurrent network for online system identification, and integrate it with an MPC for adaptive, physics-consistent control. We also combine DeLaN with a residual dynamics model to leverage the fact that a nominal model of the robot is typically available. We evaluate our method on a 7-DOF robot arm for trajectory tracking under varying loads. Our method reduces the end-effector tracking error by 39%, compared to a 21% improvement achieved by a baseline that uses an extended Kalman filter.

Article 720

Title@2025-07-27 (7): Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation

Title: Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation

Kontrollierbares Feature Whitening für hyperparameterfreie Bias Mitigation

用于减缓超参数-无偏见的可控地貌白化 2507.20284v1

Authors (6): Yooshin Cho, Hanbyel Cho, Janghyeon Lee, HyeongGwon Hong, Jaesung Ahn, Junmo Kim

As the use of artificial intelligence rapidly increases, the development of trustworthy artificial intelligence has become important. However, recent studies have shown that deep neural networks are susceptible to learning spurious correlations present in datasets. To improve reliability, we propose a simple yet effective framework called controllable feature whitening. We quantify the linear correlation between the target and bias features by the covariance matrix, and eliminate it through the whitening module. Our results systematically demonstrate that removing the linear correlations between features fed into the last linear classifier significantly mitigates the bias, while avoiding the need to model intractable higher-order dependencies. A particular advantage of the proposed method is that it does not require regularization terms or adversarial learning, which often lead to unstable optimization in practice. Furthermore, we show that two fairness criteria, demographic parity and equalized odds, can be effectively handled by whitening with the re-weighted covariance matrix. Consequently, our method controls the trade-off between the utility and fairness of algorithms by adjusting the weighting coefficient. Finally, we validate that our method outperforms existing approaches on four benchmark datasets: Corrupted CIFAR-10, Biased FFHQ, WaterBirds, and Celeb-A.
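To illustrate the core idea of removing linear correlation through whitening, here is a minimal NumPy sketch. It applies plain ZCA whitening to concatenated target and bias features and is an assumption-laden illustration only: the paper's whitening module, its re-weighted covariance, and the weighting coefficient are not reproduced, and the `whiten` helper and synthetic data are hypothetical.

```python
import numpy as np


def whiten(features: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """ZCA-whiten the columns of `features` so their covariance is close to identity."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric, so eigh applies
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ inv_sqrt


# Synthetic example: target features that are linearly correlated with bias features.
rng = np.random.default_rng(0)
bias = rng.normal(size=(2000, 2))
target = 0.8 * bias + 0.2 * rng.normal(size=(2000, 2))

white = whiten(np.hstack([target, bias]))
# Cross-covariance block between the (whitened) target and bias coordinates.
cross_cov = np.cov(white, rowvar=False)[:2, 2:]
```

After whitening, the cross-covariance block is numerically zero, so a linear classifier on the whitened features can no longer exploit a linear target-bias correlation.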

Article 721

Title@2025-07-27 (7): Machine Learning Model Integration with Open World Temporal Logic for Process Automation

Title: Machine Learning Model Integration with Open World Temporal Logic for Process Automation

Machine Learning Model Integration mit Open World Temporal Logic für die Prozessautomatisierung

与开放世界时间逻辑集成的机械学习模型集成 2506.17776v2

Authors (4): Dyuman Aditya, Colton Payne, Mario Leiva, Paulo Shakarian

Recent advancements in Machine Learning (ML) have yielded powerful models capable of extracting structured information from diverse and complex data sources. However, a significant challenge lies in translating these perceptual or extractive outputs into actionable, reasoned decisions within complex operational workflows. To address this challenge, this paper introduces a novel approach that integrates the outputs from various machine learning models directly with the PyReason framework, an open-world temporal logic programming reasoning engine. PyReason’s foundation in generalized annotated logic allows for the seamless incorporation of real-valued outputs (e.g., probabilities, confidence scores) from diverse ML models, treating them as truth intervals within its logical framework. Crucially, PyReason provides mechanisms, implemented in Python, to continuously poll ML model outputs, convert them into logical facts, and dynamically recompute the minimal model, ensuring real-time adaptive decision-making. Furthermore, its native support for temporal reasoning, knowledge graph integration, and fully explainable interface traces enables sophisticated analysis over time-sensitive process data and existing organizational knowledge. By combining the strengths of perception and extraction from ML models with the logical deduction and transparency of PyReason, we aim to create a powerful system for automating complex processes. This integration finds utility across numerous domains, including manufacturing, healthcare, and business operations.

Article 722

Title@2025-07-27 (7): Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning

Title: Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning

Agentar-Fin-R1: Verbesserung der Finanzintelligenz durch Domain-Expertise, Trainingseffizienz und Advanced Reasoning

Agentar-Fin-R1:通过域域专门知识、培训效率和高级理由加强金融情报 2507.16802v4

Authors (13): Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Jingze Song, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang, Peng Zhang

Large Language Models (LLMs) exhibit considerable promise in financial applications; however, prevailing models frequently demonstrate limitations when confronted with scenarios that necessitate sophisticated reasoning capabilities, stringent trustworthiness criteria, and efficient adaptation to domain-specific requirements. We introduce the Agentar-Fin-R1 series of financial large language models (8B and 32B parameters), specifically engineered based on the Qwen3 foundation model to enhance reasoning capabilities, reliability, and domain specialization for financial applications. Our optimization approach integrates a high-quality, systematic financial task label system with a comprehensive multi-layered trustworthiness assurance framework. This framework encompasses high-quality trustworthy knowledge engineering, multi-agent trustworthy data synthesis, and rigorous data validation governance. Through label-guided automated difficulty-aware optimization, a two-stage training pipeline, and dynamic attribution systems, we achieve substantial improvements in training efficiency. Our models undergo comprehensive evaluation on mainstream financial benchmarks including Fineva, FinEval, and FinanceIQ, as well as general reasoning datasets such as MATH-500 and GPQA-diamond. To thoroughly assess real-world deployment capabilities, we propose the Finova evaluation benchmark, which focuses on agent-level financial reasoning and compliance verification. Experimental results demonstrate that Agentar-Fin-R1 not only achieves state-of-the-art performance on financial tasks but also exhibits exceptional general reasoning capabilities, validating its effectiveness as a trustworthy solution for high-stakes financial applications. The Finova bench is available at https://github.com/antgroup/Finova.

Article 723

Title@2025-07-27 (7): Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation

Title: Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation

Vidar: Verkörpertes Video-Diffusionsmodell für die generalistische Bimanualmanipulation

Vidar: 通用主义二手手操纵录相传播模型 2507.12898v2

Authors (8): Yao Feng, Hengkai Tan, Xinyi Mao, Guodong Liu, Shuhe Huang, Chendong Xiang, Hang Su, Jun Zhu

Bimanual robotic manipulation, which involves the coordinated control of two robotic arms, is foundational for solving challenging tasks. Despite recent progress in general-purpose manipulation, data scarcity and embodiment heterogeneity remain serious obstacles to further scaling up in bimanual settings. In this paper, we introduce Video Diffusion for Action Reasoning (Vidar), a two-stage framework that leverages large-scale, diffusion-based video pre-training and a novel masked inverse dynamics model for action prediction. We pre-train the video diffusion model on 750K multi-view videos from three real-world bimanual robot platforms, utilizing a unified observation space that encodes robot, camera, task, and scene contexts. Our masked inverse dynamics model learns masks to extract action-relevant information from generated trajectories without requiring pixel-level labels, and the masks can effectively generalize to unseen backgrounds. Our experiments demonstrate that with only 20 minutes of human demonstrations on an unseen robot platform (only 1% of typical data requirements), Vidar generalizes to unseen tasks and backgrounds with strong semantic understanding, surpassing state-of-the-art methods. Our findings highlight the potential of video foundation models, coupled with masked action prediction, to enable scalable and generalizable robotic manipulation in diverse real-world settings.

Article 724

Title@2025-07-27 (7): Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence

Title: Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence

Annähernde vollständige konforme Vorhersage für neurale Netzwerkregression mit Gauß-Newton-Einfluss

在高斯-牛顿影响下对神经网络倒退进行近似全常规预测 2507.20272v1

Authors (4): Dharmesh Tailor, Alvaro H. C. Correia, Eric Nalisnick, Christos Louizos

Uncertainty quantification is an important prerequisite for the deployment of deep learning models in safety-critical areas. Yet, this hinges on the uncertainty estimates being useful to the extent the prediction intervals are well-calibrated and sharp. In the absence of inherent uncertainty estimates (e.g. pretrained models predicting only point estimates), popular approaches that operate post-hoc include Laplace’s method and split conformal prediction (split-CP). However, Laplace’s method can be miscalibrated when the model is misspecified and split-CP requires sample splitting, and thus comes at the expense of statistical efficiency. In this work, we construct prediction intervals for neural network regressors post-hoc without held-out data. This is achieved by approximating the full conformal prediction method (full-CP). Whilst full-CP nominally requires retraining the model for every test point and candidate label, we propose to train just once and locally perturb model parameters using Gauss-Newton influence to approximate the effect of retraining. Coupled with linearization of the network, we express the absolute residual nonconformity score as a piecewise linear function of the candidate label allowing for an efficient procedure that avoids the exhaustive search over the output space. On standard regression benchmarks and bounding box localization, we show the resulting prediction intervals are locally-adaptive and often tighter than those of split-CP.
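For context, the split-CP baseline mentioned above can be sketched in a few lines. This is the standard split conformal procedure with absolute-residual scores, not the paper's Gauss-Newton approximation of full-CP; the function name and the toy calibration data are illustrative assumptions.

```python
import numpy as np


def split_cp_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal interval using absolute residuals on held-out calibration data."""
    scores = np.abs(np.asarray(cal_true, dtype=float) - np.asarray(cal_pred, dtype=float))
    n = scores.size
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return -np.inf, np.inf
    q = np.sort(scores)[k - 1]
    return test_pred - q, test_pred + q


# Toy calibration set: predictions of 0 against true values 1..9.
lower, upper = split_cp_interval(np.zeros(9), np.arange(1, 10), test_pred=0.0, alpha=0.5)
```

The interval width here is the same for every test point and the calibration data cannot be used for training, which are the two limitations (no local adaptivity, sample splitting) that the abstract's approach aims to avoid.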

Article 725

Title@2025-07-27 (7): Data-Efficient Prediction-Powered Calibration via Cross-Validation

Title: Data-Efficient Prediction-Powered Calibration via Cross-Validation

Dateneffiziente Vorhersage-Powered Kalibrierung über Cross-Validation

通过交叉校准进行数据有效预测力校准 2507.20268v1

Authors (5): Seonghoon Yoo, Houssem Sifaou, Sangwoo Park, Joonhyuk Kang, Osvaldo Simeone

Calibration data are necessary to formally quantify the uncertainty of the decisions produced by an existing artificial intelligence (AI) model. To overcome the common issue of scarce calibration data, a promising approach is to employ synthetic labels produced by a (generally different) predictive model. However, fine-tuning the label-generating predictor on the inference task of interest, as well as estimating the residual bias of the synthetic labels, demand additional data, potentially exacerbating the calibration data scarcity problem. This paper introduces a novel approach that efficiently utilizes limited calibration data to simultaneously fine-tune a predictor and estimate the bias of the synthetic labels. The proposed method yields prediction sets with rigorous coverage guarantees for AI-generated decisions. Experimental results on an indoor localization problem validate the effectiveness and performance gains of our solution.

Article 726

Title@2025-07-27 (7): Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

Title: Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

Ungewissheits-Bewusst-Test-Zeit-Optimierung für 3D menschliche Pose-Schätzung

3D 人类粒子估计的不确定性-软件测试-时间优化 2402.02339v2

Authors (8): Ti Wang, Mengyuan Liu, Hong Liu, Bin Ren, Yingxuan You, Wenhao Li, Nicu Sebe, Xia Li

Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel in fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on a projection constraint, which only ensures alignment in 2D space, potentially leading to overfitting. To address this, we propose an Uncertainty-Aware testing-time Optimization (UAO) framework, which keeps the prior information of the pre-trained model and alleviates the overfitting problem using the uncertainty of joints. Specifically, during the training phase, we design an effective 2D-to-3D network for estimating the corresponding 3D pose while quantifying the uncertainty of each 3D joint. For optimization during testing, the proposed optimization framework freezes the pre-trained model and optimizes only a latent state. Projection loss is then employed to ensure the generated poses are well aligned in 2D space for high-quality optimization. Furthermore, we utilize the uncertainty of each joint to determine how much each joint may be adjusted during optimization. The effectiveness and superiority of the proposed framework are validated through extensive experiments on challenging datasets: Human3.6M, MPI-INF-3DHP, and 3DPW. Notably, our approach outperforms the previous best result by a large margin of 5.5% on Human3.6M. Code is available at https://github.com/xiu-cs/UAO-Pose3D.

Article 727

Title@2025-07-27 (7): Learning from Expert Factors: Trajectory-level Reward Shaping for Formulaic Alpha Mining

Title: Learning from Expert Factors: Trajectory-level Reward Shaping for Formulaic Alpha Mining

Lernen von Experten-Faktoren: Trajektorien-Level-Reward-Formung für den Formelischen Alpha-Mining

从专家因素中学习:公式阿尔法采矿的轨迹级奖得分形状 2507.20263v1

Authors (4): Junjie Zhao, Chengxi Zhang, Chenkai Wang, Peng Yang

Reinforcement learning (RL) has successfully automated the complex process of mining formulaic alpha factors, for creating interpretable and profitable investment strategies. However, existing methods are hampered by the sparse rewards given the underlying Markov Decision Process. This inefficiency limits the exploration of the vast symbolic search space and destabilizes the training process. To address this, Trajectory-level Reward Shaping (TLRS), a novel reward shaping method, is proposed. TLRS provides dense, intermediate rewards by measuring the subsequence-level similarity between partially generated expressions and a set of expert-designed formulas. Furthermore, a reward centering mechanism is introduced to reduce training variance. Extensive experiments on six major Chinese and U.S. stock indices show that TLRS significantly improves the predictive power of mined factors, boosting the Rank Information Coefficient by 9.29% over existing potential-based shaping algorithms. Notably, TLRS achieves a major leap in computational efficiency by reducing its time complexity with respect to the feature dimension from linear to constant, which is a significant improvement over distance-based baselines.
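The Rank Information Coefficient reported above is conventionally the Spearman rank correlation between a factor's values and subsequent returns across assets. A minimal sketch under that assumption follows; it ignores ties, and the function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np


def rank_ic(factor_values, future_returns) -> float:
    """Spearman rank correlation between factor values and future returns (no tie handling)."""

    def ranks(x):
        x = np.asarray(x, dtype=float)
        order = np.argsort(x)
        r = np.empty(len(x))
        r[order] = np.arange(len(x))  # rank 0 for the smallest value
        return r

    # Pearson correlation of the ranks equals Spearman correlation of the raw values.
    return float(np.corrcoef(ranks(factor_values), ranks(future_returns))[0, 1])
```

A factor that orders assets exactly as their future returns do scores 1, a perfectly inverted ordering scores -1, and an uninformative factor scores near 0.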

Article 728

Title@2025-07-27 (7): Leveraging Analytic Gradients in Provably Safe Reinforcement Learning

Title: Leveraging Analytic Gradients in Provably Safe Reinforcement Learning

Nutzung analytischer Gradienten im wahrscheinlich sicheren Ausbau-Lernen

在安全强化学习中利用分析梯度 2506.01665v2

Authors (4): Tim Walter, Hannah Markgraf, Jonathan Külz, Matthias Althoff

The deployment of autonomous robots in safety-critical applications requires safety guarantees. Provably safe reinforcement learning is an active field of research that aims to provide such guarantees using safeguards. These safeguards should be integrated during training to reduce the sim-to-real gap. While there are several approaches for safeguarding sampling-based reinforcement learning, analytic gradient-based reinforcement learning often achieves superior performance from fewer environment interactions. However, there is no safeguarding approach for this learning paradigm yet. Our work addresses this gap by developing the first effective safeguard for analytic gradient-based reinforcement learning. We analyse existing, differentiable safeguards, adapt them through modified mappings and gradient formulations, and integrate them with a state-of-the-art learning algorithm and a differentiable simulation. Using numerical experiments on three control tasks, we evaluate how different safeguards affect learning. The results demonstrate safeguarded training without compromising performance.

Article 729

Title@2025-07-27 (7): GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference

Title: GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference

GQSA: Gruppe Quantisierung und Sparsamkeit für die Beschleunigung der großen Sprachmodellinferenz

GQSA:加速使用大语言模式模型推断的组量化和分数 2412.17560v4

Authors (6): Chao Zeng, Songwei Liu, Shu Yang, Fangmin Chen, Lean Fu, Xing Mei

Model compression has emerged as a mainstream solution to reduce memory usage and computational overhead. This paper presents Group Quantization and Sparse Acceleration (GQSA), a novel compression technique tailored for LLMs. Traditional methods typically focus exclusively on either quantization or sparsification, but relying on a single strategy often results in significant performance loss at high compression rates. In contrast, GQSA integrates quantization and sparsification in a tightly coupled manner, leveraging GPU-friendly structured group sparsity and quantization for efficient acceleration. Building upon system-algorithm co-design principles, we propose a two-stage sparse optimization strategy that ensures the performance superiority of the compressed model. On the engine side, we introduce a “task-centric” parallel strategy, which, to the best of our knowledge, is the first such application in the domain of sparse computing. Compared to the traditional 2:4 sparse method, GQSA offers a more flexible and adjustable sparsity rate, as well as a higher weight compression rate, and is efficiently compatible with weight-only quantization methods. Experimental results demonstrate that, under the GQSA W4S50% compression setting, the model’s accuracy surpasses that of both 2:4 pruning and W2 quantization. Furthermore, at the inference level, GQSA outperforms W2 by 1.26x and 2:4 pruning by 2.35x in terms of speed.
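As a rough illustration of the quantization half of the method, the sketch below performs generic group-wise 4-bit weight quantization with one scale and offset per group. It is an assumption-based sketch only: the group size, the asymmetric min/max scheme, and the helper names are hypothetical, and the structured group sparsity and two-stage optimization described in the abstract are omitted.

```python
import numpy as np


def quantize_groups(weights: np.ndarray, group_size: int = 16, bits: int = 4):
    """Quantize weights in contiguous groups, each with its own scale and offset."""
    w = weights.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    levels = 2**bits - 1
    # Guard against constant groups, where hi == lo would give a zero scale.
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    codes = np.clip(np.round((w - lo) / scale), 0, levels).astype(np.uint8)
    return codes, scale, lo


def dequantize_groups(codes, scale, lo, shape):
    """Map integer codes back to approximate floating-point weights."""
    return (codes * scale + lo).reshape(shape)


rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 32))
codes, scale, lo = quantize_groups(weights)
restored = dequantize_groups(codes, scale, lo, weights.shape)
```

Because each group has its own range, the rounding error of any weight is bounded by half of that group's scale, which is the usual motivation for grouping over per-tensor quantization.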

Article 730

Title@2025-07-27 (7): ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

Title: ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration für große Sprachmodelle

ABQ-LLLM:大语言模型的任意-Bit量化推断加速 2408.08554v3

Authors (9): Chao Zeng, Songwei Liu, Yusheng Xie, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei

Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their practical application is constrained by substantial memory and computational demands. Post-training quantization (PTQ) is considered an effective method to accelerate LLM inference. Despite its growing popularity in LLM model compression, PTQ deployment faces two major challenges. First, low-bit quantization leads to performance degradation. Second, restricted by the limited integer computing unit type on GPUs, quantized matrix operations with different precisions cannot be effectively accelerated. To address these issues, we introduce a novel arbitrary-bit quantization algorithm and inference framework, ABQ-LLM. It achieves superior performance across various quantization settings and enables efficient arbitrary-precision quantized inference on the GPU. ABQ-LLM introduces several key innovations: (1) a distribution correction method for transformer blocks to mitigate distribution differences caused by full quantization of weights and activations, improving performance at low bit-widths; (2) a bit balance strategy to counteract performance degradation from asymmetric distribution issues at very low bit-widths (e.g., 2-bit); and (3) an innovative quantization acceleration framework that reconstructs quantized matrix multiplication for arbitrary precision combinations from BTC (Binary TensorCore) equivalents, removing the limitations of INT4/INT8 computing units. ABQ-LLM can convert each component's bit-width gain into actual acceleration gain, maximizing performance under mixed precision (e.g., W6A6, W2A8). With the W2*A8 quantization configuration on the LLaMA-7B model, it achieves a WikiText2 perplexity of 7.59 (2.17 lower than the 9.76 of AffineQuant). Compared to SmoothQuant, we realize a 1.6x acceleration improvement and a 2.7x memory compression gain.
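The abstract does not spell out how arbitrary-precision products are rebuilt from binary operations. One standard identity that fits the description is bit-plane decomposition: an integer matrix product equals a shifted sum of products of 0/1 matrices. The NumPy sketch below demonstrates that identity for unsigned operands; it is an assumption about the general idea, not the ABQ-LLM kernel, and the helper names are hypothetical.

```python
import numpy as np


def bit_planes(x: np.ndarray, bits: int):
    """Split an unsigned integer matrix into `bits` binary (0/1) matrices, LSB first."""
    return [(x >> b) & 1 for b in range(bits)]


def bitplane_matmul(w: np.ndarray, a: np.ndarray, w_bits: int, a_bits: int) -> np.ndarray:
    """Compute w @ a using only binary matrix products plus shifts and adds."""
    out = np.zeros((w.shape[0], a.shape[1]), dtype=np.int64)
    for i, w_plane in enumerate(bit_planes(w, w_bits)):
        for j, a_plane in enumerate(bit_planes(a, a_bits)):
            # Each binary product contributes with weight 2^(i + j).
            out += (w_plane @ a_plane) << (i + j)
    return out


rng = np.random.default_rng(0)
w = rng.integers(0, 2**2, size=(3, 5))  # 2-bit weights (W2)
a = rng.integers(0, 2**8, size=(5, 2))  # 8-bit activations (A8)
result = bitplane_matmul(w, a, w_bits=2, a_bits=8)
```

The number of binary products grows as the product of the two bit widths (16 for W2A8), which is why any reduction in either bit width can translate directly into less work.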

Article 731

Title@2025-07-27 (7): Semi-Supervised Risk Control via Prediction-Powered Inference

Title: Semi-Supervised Risk Control via Prediction-Powered Inference

Halbüberwachte Risikokontrolle durch vorausschauende Schlussfolgerung

通过预测力推断的半监督风险控制 2412.11174v2

Authors (3): Bat-Sheva Einbinder, Liran Ringel, Yaniv Romano

The risk-controlling prediction sets (RCPS) framework is a general tool for transforming the output of any machine learning model to design a predictive rule with rigorous error rate control. The key idea behind this framework is to use labeled hold-out calibration data to tune a hyper-parameter that affects the error rate of the resulting prediction rule. However, the limitation of such a calibration scheme is that with limited hold-out data, the tuned hyper-parameter becomes noisy and leads to a prediction rule with an error rate that is often unnecessarily conservative. To overcome this sample-size barrier, we introduce a semi-supervised calibration procedure that leverages unlabeled data to rigorously tune the hyper-parameter without compromising statistical validity. Our procedure builds upon the prediction-powered inference framework, carefully tailoring it to risk-controlling tasks. We demonstrate the benefits and validity of our proposal through two real-data experiments: few-shot image classification and early time series classification.
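The basic prediction-powered estimate that the procedure builds on can be sketched as follows: average a model-imputed loss over plentiful unlabeled data, then correct it with the imputation error measured on the small labeled set. This is a minimal sketch of the generic point estimate only; the paper's tailoring to risk control and its confidence bounds are not reproduced, and the function and argument names are illustrative assumptions.

```python
import numpy as np


def ppi_risk_estimate(labeled_loss, labeled_imputed_loss, unlabeled_imputed_loss) -> float:
    """Prediction-powered estimate of the mean loss.

    labeled_loss            -- true losses on the labeled calibration points
    labeled_imputed_loss    -- model-imputed losses on the same labeled points
    unlabeled_imputed_loss  -- model-imputed losses on the unlabeled points
    """
    labeled_loss = np.asarray(labeled_loss, dtype=float)
    labeled_imputed_loss = np.asarray(labeled_imputed_loss, dtype=float)
    unlabeled_imputed_loss = np.asarray(unlabeled_imputed_loss, dtype=float)
    # The "rectifier" measures how far the imputed losses are off on labeled data.
    rectifier = np.mean(labeled_loss - labeled_imputed_loss)
    return float(np.mean(unlabeled_imputed_loss) + rectifier)
```

When the imputed losses are accurate the rectifier has low variance, so the estimate is less noisy than the labeled-only average while remaining unbiased.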

Article 732

Title@2025-07-27 (7): Protein-SE(3): Benchmarking SE(3)-based Generative Models for Protein Structure Design

Title: Protein-SE(3): Benchmarking SE(3)-based Generative Models for Protein Structure Design

Protein-SE(3): Benchmarking SE(3)-basierte Generative Modelle für Proteinstrukturdesign

蛋白因-SE(3):制定SE(3)基准的蛋白因结构设计生成模型 2507.20243v1

Authors (6): Lang Yu, Zhangyang Gao, Cheng Tan, Qin Chen, Jie Zhou, Liang He

SE(3)-based generative models have shown great promise in protein geometry modeling and effective structure design. However, the field currently lacks a modularized benchmark to enable comprehensive investigation and fair comparison of different methods. In this paper, we propose Protein-SE(3), a new benchmark based on a unified training framework, which comprises protein scaffolding tasks, integrated generative models, high-level mathematical abstraction, and diverse evaluation metrics. Recent advanced generative models designed for protein scaffolding are integrated into our framework, spanning DDPM (Genie1 and Genie2), Score Matching (FrameDiff and RfDiffusion), and Flow Matching (FoldFlow and FrameFlow). All integrated methods are fairly investigated with the same training dataset and evaluation metrics. Furthermore, we provide a high-level abstraction of the mathematical foundations behind the generative models, enabling fast prototyping of future algorithms without reliance on explicit protein structures. Accordingly, we release the first comprehensive benchmark built upon a unified training framework for SE(3)-based protein structure design, which is publicly accessible at https://github.com/BruthYU/protein-se3.

Article 733

Title@2025-07-27 (7): Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers

Title: Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers

Contrast-CAT: Kontrastierende Aktivierungen für verbesserte Interpretierbarkeit in Transformer-basierten Textklassifikatoren

反对-CAT:在基于变换器的文本分类中增强解释力的对比活动 2507.21186v1

Authors (3): Sungmin Han, Jeonghyun Lee, Sangkyun Lee

Transformers have profoundly influenced AI research, but explaining their decisions remains challenging – even for relatively simple tasks such as classification – which hinders trust and safe deployment in real-world applications. Although activation-based attribution methods effectively explain transformer-based text classification models, our findings reveal that these methods can be undermined by class-irrelevant features within activations, leading to less reliable interpretations. To address this limitation, we propose Contrast-CAT, a novel activation contrast-based attribution method that refines token-level attributions by filtering out class-irrelevant features. By contrasting the activations of an input sequence with reference activations, Contrast-CAT generates clearer and more faithful attribution maps. Experimental results across various datasets and models confirm that Contrast-CAT consistently outperforms state-of-the-art methods. Notably, under the MoRF setting, it achieves average improvements of 1.30x in AOPC and 2.25x in LOdds over the most competitive methods, demonstrating its effectiveness in enhancing interpretability for transformer-based text classification.

Article 734

Title@2025-07-27 (7): Recursive KalmanNet: Analyse des capacités de généralisation d’un réseau de neurones récurrent guidé par un filtre de Kalman

Title: Recursive KalmanNet: Analyse des capacités de généralisation d’un réseau de neurones récurrent guidé par un filtre de Kalman

Rekursives KalmanNet: Analyse des capacités de généralisierung d’un réseau de neurones récurrent guidé par un filtre de Kalman

Recursive KalmanNet:卡尔曼岛非过滤状态神经元神经元神经元神经元系统总分类能力分析 2507.14144v2

Authors (4): Cyril Falcon, Hassan Mortada, Mathéo Clavaud, Jean-Philippe Michel

The Recursive KalmanNet, recently introduced by the authors, is a recurrent neural network guided by a Kalman filter, capable of estimating the state variables and error covariance of stochastic dynamic systems from noisy measurements, without prior knowledge of the noise characteristics. This paper explores its generalization capabilities in out-of-distribution scenarios, where the temporal dynamics of the test measurements differ from those encountered during training. Le Recursive KalmanNet, récemment introduit par les auteurs, est un réseau de neurones récurrent guidé par un filtre de Kalman, capable d’estimer les variables d’état et la covariance des erreurs des systèmes dynamiques stochastiques à partir de mesures bruitées, sans connaissance préalable des caractéristiques des bruits. Cet article explore ses capacités de généralisation dans des scénarios hors distribution, où les dynamiques temporelles des mesures de test diffèrent de celles rencontrées à l’entraînement.

Article 735

Title@2025-07-27 (7): TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research

Title: TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research

TinySQL: Ein progressiver Text-zu-SQL-Datensatz für die mechanistische Interpretationsforschung

TinySQL: 用于机械解释性研究的渐进文本到SQL数据集 2503.12730v4

Authors (6): Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah

Mechanistic interpretability research faces a gap between analyzing simple circuits in toy tasks and discovering features in large models. To bridge this gap, we propose text-to-SQL generation as an ideal task to study, as it combines the formal structure of toy tasks with real-world complexity. We introduce TinySQL, a synthetic dataset, progressing from basic to advanced SQL operations, and train models ranging from 33M to 1B parameters to establish a comprehensive testbed for interpretability. We apply multiple complementary interpretability techniques, including Edge Attribution Patching and Sparse Autoencoders, to identify minimal circuits and components supporting SQL generation. We compare circuits for different SQL subskills, evaluating their minimality, reliability, and identifiability. Finally, we conduct a layerwise logit lens analysis to reveal how models compose SQL queries across layers: from intent recognition to schema resolution to structured generation. Our work provides a robust framework for probing and comparing interpretability methods in a structured, progressively complex setting.

Article 736

Title@2025-07-27 (7): GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance

Title: GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance

Geführte Zahl: Quantisierung von großen Sprachmodellen durch Ausnutzung der End Loss Guidance

向导量:通过利用最终损失指导意见对大语言模型进行量化 2505.07004v3

Authors (8): Jinuk Kim, Marwa El Halabi, Wonpyo Park, Clemens JS Schaefer, Deokjae Lee, Yeonhong Park, Jae W. Lee, Hyun Oh Song

Post-training quantization is a key technique for reducing the memory and inference latency of large language models by quantizing weights and activations without requiring retraining. However, existing methods either (1) fail to account for the varying importance of hidden features to the end loss or, when incorporating end loss, (2) neglect the critical interactions between model weights. To address these limitations, we propose GuidedQuant, a novel quantization approach that integrates gradient information from the end loss into the quantization objective while preserving cross-weight dependencies within output channels. GuidedQuant consistently boosts the performance of state-of-the-art quantization methods across weight-only scalar, weight-only vector, and weight-and-activation quantization. Additionally, we introduce a novel non-uniform scalar quantization algorithm, which is guaranteed to monotonically decrease the quantization objective value, and outperforms existing methods in this category. We release the code at https://github.com/snu-mllab/GuidedQuant.

Article 737

Title@2025-07-27 (7): Adaptive Real-Time Multi-Loss Function Optimization Using Dynamic Memory Fusion Framework: A Case Study on Breast Cancer Segmentation

Title: Adaptive Real-Time Multi-Loss Function Optimization Using Dynamic Memory Fusion Framework: A Case Study on Breast Cancer Segmentation

Adaptive Echtzeit-Multi-Loss-Funktionsoptimierung mittels Dynamic Memory Fusion Framework: Eine Fallstudie zur Brustkrebssegmentierung

利用动态记忆融合框架,利用动态记忆融合框架优化适应性实时多损失功能:乳腺癌分割案例研究 2410.19745v2

Authors (2): Amin Golnari, Mostafa Diba

Deep learning has proven to be a highly effective tool for a wide range of applications, particularly when leveraging multi-loss functions to optimize performance on multiple criteria simultaneously. However, the selection and weighting of loss functions in deep learning tasks can significantly influence model performance, and manual tuning of these functions is often inefficient and inflexible. To address this, we propose a novel framework called dynamic memory fusion for adaptive, real-time multi-loss function weighting. This framework leverages historical loss values to dynamically adjust the weighting of multiple loss functions throughout the training process. Additionally, this framework integrates an auxiliary loss function to enhance model performance in the early stages of training. We also introduce the class-balanced dice loss function, designed to address class imbalance by prioritizing underrepresented classes. Experiments on breast ultrasound datasets demonstrate that the framework improves segmentation performance across various metrics. These results demonstrate the effectiveness of our proposed framework in ensuring that the model dynamically adjusts its focus to prioritize the most relevant criteria, leading to improved performance in evolving environments. The source code for our proposed methodology is publicly available on GitHub.

Article 738

Title@2025-07-27 (7): Technical Indicator Networks (TINs): An Interpretable Neural Architecture Modernizing Classical Technical Analysis for Adaptive Algorithmic Trading

Title: Technical Indicator Networks (TINs): An Interpretable Neural Architecture Modernizing Classical Technical Analysis for Adaptive Algorithmic Trading

Technical Indicator Networks (TINs): Eine interpretierbare Neuralarchitektur zur Modernisierung der klassischen Technischen Analyse für adaptives algorithmisches Trading

技术指标网络(TINs):适应性定值贸易的现代经典技术分析解释性神经结构 2507.20202v1

Authors (1): Longfei Lu

This work proposes that a vast majority of classical technical indicators in financial analysis are, in essence, special cases of neural networks with fixed and interpretable weights. It is shown that nearly all such indicators, such as moving averages, momentum-based oscillators, volatility bands, and other commonly used technical constructs, can be reconstructed topologically as modular neural network components. Technical Indicator Networks (TINs) are introduced as a general neural architecture that replicates and structurally upgrades traditional indicators by supporting n-dimensional inputs such as price, volume, sentiment, and order book data. By encoding domain-specific knowledge into neural structures, TINs modernize the foundational logic of technical analysis and propel algorithmic trading into a new era, bridging the legacy of proven indicators with the potential of contemporary AI systems.
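
As an illustration of the claim that a classical indicator is a special case of a neural network with fixed weights, the sketch below writes a simple moving average as a single linear neuron whose weights are all `1 / window`, slid over the price series. The function names and the price values are made up for the example; the TIN architecture itself is not reproduced here.

```python
def linear_neuron(inputs, weights, bias=0.0):
    # A neuron with identity activation: weighted sum plus bias.
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sma_layer(prices, window):
    # Simple moving average as a fixed-weight neuron applied to each window.
    weights = [1.0 / window] * window
    return [
        linear_neuron(prices[i:i + window], weights)
        for i in range(len(prices) - window + 1)
    ]


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # illustrative values only
print(sma_layer(prices, window=4))
```

Replacing the fixed weights with trainable ones would turn the same structure into a learnable filter, which is the sense in which such an indicator can be "upgraded".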

Article 739

Title@2025-07-27 (7): Does equivariance matter at scale?

Title: Does equivariance matter at scale?

Fällt die Gleichwertigkeit im Maßstab auf?

在规模上,等差是否重要? 2410.23179v2

Authors (4): Johann Brehmer, Sönke Behrends, Pim de Haan, Taco Cohen

Given large datasets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget onto model size and training duration differs between equivariant and non-equivariant models.
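
A power-law relationship between compute and loss can be checked with a straight-line fit in log-log space. The sketch below is a generic least-squares fit of `loss ≈ a * compute**(-b)`; the data are synthetic and the function name `fit_power_law` is hypothetical, so it illustrates the idea rather than the fitting procedure used in the paper.

```python
import math


def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**(-b) by least squares on log-transformed data."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


# Synthetic data generated from a known power law, for illustration only.
compute = [1e1, 1e2, 1e3, 1e4]
loss = [2.0 * c ** -0.5 for c in compute]
a, b = fit_power_law(compute, loss)
print(a, b)
```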

Article 740

Title@2025-07-27 (7): Partial Domain Adaptation via Importance Sampling-based Shift Correction

Title: Partial Domain Adaptation via Importance Sampling-based Shift Correction

Partielle Domänenanpassung über wichtige Sampling-basierte Shift-Korrektur

通过基于重要性抽样的调整校正 2507.20191v1

Authors (5): Cheng-Jun Guo, Chuan-Xian Ren, You-Wei Luo, Xiao-Lin Xu, Hong Yan

Partial domain adaptation (PDA) is a challenging task in real-world machine learning scenarios. It aims to transfer knowledge from a labeled source domain to a related unlabeled target domain, where the support set of the source label distribution subsumes the target one. Previous PDA works managed to correct the label distribution shift by weighting samples in the source domain. However, the simple reweighting technique cannot explore the latent structure or make sufficient use of the labeled data, so models are prone to overfitting on the source domain. In this work, we propose a novel importance sampling-based shift correction (IS$^2$C) method, where new labeled data are sampled from a built sampling domain, whose label distribution is supposed to be the same as the target domain, to characterize the latent structure and enhance the generalization ability of the model. We provide theoretical guarantees for IS$^2$C by proving that the generalization error can be sufficiently dominated by IS$^2$C. In particular, by implementing sampling with the mixture distribution, the extent of shift between source and sampling domains can be connected to generalization error, which provides an interpretable way to build IS$^2$C. To improve knowledge transfer, an optimal transport-based independence criterion is proposed for conditional distribution alignment, where the computation of the criterion can be adjusted to reduce the complexity from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2)$ in realistic PDA scenarios. Extensive experiments on PDA benchmarks validate the theoretical results and demonstrate the effectiveness of our IS$^2$C over existing methods.
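
To make the contrast between reweighting and resampling concrete, the sketch below draws source examples with probability proportional to the ratio of an assumed target label prior to the empirical source label prior, so that the sampled set follows the target label distribution. This is generic label-shift importance sampling, not the IS$^2$C construction; the function name, the `target_prior` argument, and the toy labels are all hypothetical, and in practice the target prior would have to be estimated.

```python
import random
from collections import Counter


def resample_to_target_prior(labels, target_prior, k, seed=0):
    """Sample k source indices so that labels follow target_prior (assumed known)."""
    counts = Counter(labels)
    n = len(labels)
    # Importance weight per example: target prior / empirical source prior.
    weights = [target_prior.get(y, 0.0) / (counts[y] / n) for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(n), weights=weights, k=k)


source_labels = [0, 0, 1, 1, 2, 2]      # class 2 is absent from the target
target_prior = {0: 0.5, 1: 0.5}         # hypothetical target label distribution
indices = resample_to_target_prior(source_labels, target_prior, k=10)
print([source_labels[i] for i in indices])
```

Classes with zero target prior receive zero weight and are never drawn, which mirrors the partial setting in which some source classes do not occur in the target domain.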

Article 741

Title@2025-07-27 (7): NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis

Title: NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis

NeuroCLIP: Eine multimodale kontrastive Lernmethode für rTMS-behandelte Methamphetamin-Addiktionsanalyse

NeuroCLIP:经RTMS处理的甲基苯丙胺成瘾分析的多式反竞争学习方法 2507.20189v1

Authors (13): Chengkai Wang, Di Wu, Yunsheng Liao, Wenyao Zheng, Ziyi Zeng, Xurong Gao, Hemmings Wu, Zhoule Zhu, Jie Yang, Lihua Zhong, Weiwei Cheng, Yun-Hsuan Chen, Mohamad Sawan

Methamphetamine dependence poses a significant global health challenge, yet its assessment and the evaluation of treatments like repetitive transcranial magnetic stimulation (rTMS) frequently depend on subjective self-reports, which may introduce uncertainties. While objective neuroimaging modalities such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) offer alternatives, their individual limitations and the reliance on conventional, often hand-crafted, feature extraction can compromise the reliability of derived biomarkers. To overcome these limitations, we propose NeuroCLIP, a novel deep learning framework integrating simultaneously recorded EEG and fNIRS data through a progressive learning strategy. This approach offers a robust and trustworthy biomarker for methamphetamine addiction. Validation experiments show that NeuroCLIP significantly improves discrimination between methamphetamine-dependent individuals and healthy controls compared to models using EEG or fNIRS alone. Furthermore, the proposed framework facilitates objective, brain-based evaluation of rTMS treatment efficacy, demonstrating measurable shifts in neural patterns towards healthy control profiles after treatment. Critically, we establish the trustworthiness of the multimodal data-driven biomarker by showing its strong correlation with psychometrically validated craving scores. These findings suggest that the biomarker derived from EEG-fNIRS data via NeuroCLIP offers enhanced robustness and reliability over single-modality approaches, providing a valuable tool for addiction neuroscience research and potentially improving clinical assessments.

Article 742

Title@2025-07-27 (7): Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation

Title: Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation

Kommen Sie zusammen, aber nicht jetzt: Eine progressive Strategie, um Low-Rank-Anpassung zu fördern

齐心合力,但现在不是现在:一个推进低Rank适应的渐进战略 2506.05713v2

Authors (12): Zhan Zhuang, Xiequn Wang, Wei Li, Yulong Zhang, Qiushi Huang, Shuhao Chen, Xuehao Wang, Yanbin Wei, Yuhe Nie, Kede Ma, Yu Zhang, Ying Wei

Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters’ activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter’s marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
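
The idea of gradually increasing each adapter's activation probability can be sketched as a schedule plus a per-step Bernoulli draw. The linear ramp, the `ramp_fraction` and `p_start` parameters, and the function names below are assumptions made for illustration; the abstract does not specify the schedule that CoTo uses.

```python
import random


def activation_probability(step, total_steps, p_start=0.0, ramp_fraction=0.75):
    """Assumed linear ramp from p_start to 1.0 over the first part of training."""
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    progress = min(1.0, step / ramp_steps)
    return p_start + (1.0 - p_start) * progress


def sample_active_adapters(n_adapters, p, rng):
    # Each adapter is kept independently with probability p for this step.
    return [rng.random() < p for _ in range(n_adapters)]


rng = random.Random(0)
for step in (0, 25, 50, 100):
    p = activation_probability(step, total_steps=100)
    print(step, p, sample_active_adapters(4, p, rng))
```

In a training loop, an inactive adapter would simply be skipped in the forward pass for that step, so early steps train random subsets of adapters and late steps train all of them together.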

Article 743

Title@2025-07-27 (7): ASNN: Learning to Suggest Neural Architectures from Performance Distributions

Title: ASNN: Learning to Suggest Neural Architectures from Performance Distributions

ASNN: Neurale Architekturen aus Leistungsverteilungen vorschlagen lernen

ASNN: 学习从业绩分配中建议神经结构 2507.20164v1

Authors (1): Jinwook Hong

The architecture of a neural network (NN) plays a critical role in determining its performance. However, there is no general closed-form function that maps between network structure and accuracy, making the process of architecture design largely heuristic or search-based. In this study, we propose the Architecture Suggesting Neural Network (ASNN), a model designed to learn the relationship between NN architecture and its test accuracy, and to suggest improved architectures accordingly. To train ASNN, we constructed datasets using TensorFlow-based models with varying numbers of layers and nodes. Experimental results were collected for both 2-layer and 3-layer architectures across a grid of configurations, each evaluated with 10 repeated trials to account for stochasticity. Accuracy values were treated as inputs, and architectural parameters as outputs. The trained ASNN was then used iteratively to predict architectures that yield higher performance. In both 2-layer and 3-layer cases, ASNN successfully suggested architectures that outperformed the best results found in the original training data. Repeated prediction and retraining cycles led to the discovery of architectures with improved mean test accuracies, demonstrating the model’s capacity to generalize the performance-structure relationship. These results suggest that ASNN provides an efficient alternative to random search for architecture optimization, and offers a promising approach toward automating neural network design. “Parts of the manuscript, including text editing and expression refinement, were supported by OpenAI’s ChatGPT. All content was reviewed and verified by the authors.”

Article 744

Title@2025-07-27 (7): Syno: Structured Synthesis for Neural Operators

Title: Syno: Structured Synthesis for Neural Operators

Syno: Strukturierte Synthese für neurale Operatoren

同步:神经操作员结构化合成 2410.23745v2

Authors (4): Yongqi Zhuo, Zhengyuan Su, Chenggang Zhao, Mingyu Gao

The desire for better prediction accuracy and higher execution performance in neural networks never ends. Neural architecture search (NAS) and tensor compilers are two popular techniques for pursuing these two goals, but they are both limited to composing or optimizing existing manually designed operators rather than coming up with completely new designs. In this work, we explore the less studied direction of neural operator synthesis, which aims to automatically and efficiently discover novel neural operators with better accuracy and/or speed. We develop an end-to-end framework, Syno, to realize practical neural operator synthesis. Syno makes use of a novel set of fine-grained primitives defined on tensor dimensions, which ensure various desired properties to ease model training, and also enable expression canonicalization techniques to avoid redundant candidates during search. Syno further adopts a novel guided synthesis flow to obtain valid operators matched with the specified input/output dimension sizes, and leverages efficient stochastic tree search algorithms to quickly explore the design space. We demonstrate that Syno discovers better operators with average speedups of $1.37\times$ to $2.06\times$ on various hardware and compiler choices, while keeping less than 1% accuracy loss even on NAS-optimized models.

Article 745

Title@2025-07-27 (7): Practical Multi-Task Learning for Rare Conversions in Ad Tech

Title: Practical Multi-Task Learning for Rare Conversions in Ad Tech

Praktisches Multi-Task-Lernen für rare Konvertierungen in der Anzeigentechnik

利用实用多任务学习技术技术中稀有转换的多目的实用学习 2507.20161v1

Authors (5): Yuval Dishi, Ophir Friedler, Yonatan Karni, Natalia Silberstein, Yulia Stolin

We present a Multi-Task Learning (MTL) approach for improving predictions for rare (e.g., <1%) conversion events in online advertising. The conversions are classified into “rare” or “frequent” types based on historical statistics. The model learns shared representations across all signals while specializing through separate task towers for each type. The approach was tested and fully deployed to production, demonstrating consistent improvements in both offline (0.69% AUC lift) and online KPI performance metric (2% Cost per Action reduction).

Article 746

Title@2025-07-27 (7): On the Role of Discrete Representation in Sparse Mixture of Experts

Title: On the Role of Discrete Representation in Sparse Mixture of Experts

Über die Rolle der diskreten Vertretung in der Sparse Mischung von Experten

关于专家在散乱混混中代表的混乱作用问题 2411.19402v2

Authors (4): Giang Do, Kha Pham, Hung Le, Truyen Tran

Sparse mixture of experts (SMoE) is an effective solution for scaling up model capacity without increasing the computational costs. A crucial component of SMoE is the router, responsible for directing the input to relevant experts; however, it also presents a major weakness, leading to routing inconsistencies and representation collapse issues. Instead of fixing the router like previous works, we propose an alternative that assigns experts to input via indirection, which employs the discrete representation of input that points to the expert. The discrete representations are learnt via vector quantization, resulting in a new architecture dubbed Vector-Quantized Mixture of Experts (VQMoE). We provide theoretical support and empirical evidence demonstrating the VQMoE’s ability to overcome the challenges present in traditional routers. Through extensive evaluations on both large language models and vision tasks for pre-training and fine-tuning, we show that VQMoE achieves a 28% improvement in robustness compared to other SMoE routing methods, while maintaining strong performance in fine-tuning tasks.
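
Routing through a discrete representation can be pictured as a nearest-codeword lookup: the input is quantized to a codebook entry, and the entry's index points to an expert. The sketch below assumes one codeword per expert and squared Euclidean distance; both choices, along with the function names and toy values, are illustrative assumptions and not the VQMoE design.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_code(x, codebook):
    # Vector quantization step: index of the closest codeword.
    return min(range(len(codebook)), key=lambda k: squared_distance(x, codebook[k]))


def route(x, codebook, experts):
    # Assumption: codeword k points directly to expert k (no learned router).
    k = nearest_code(x, codebook)
    return k, experts[k](x)


codebook = [[0.0, 0.0], [1.0, 1.0]]                      # toy codewords
experts = [lambda v: [2 * t for t in v], lambda v: [t + 1 for t in v]]
print(route([0.9, 0.8], codebook, experts))
```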

Article 747

Title@2025-07-27 (7): SETOL: A Semi-Empirical Theory of (Deep) Learning

Title: SETOL: A Semi-Empirical Theory of (Deep) Learning

SETOL: Eine semi-empirische Theorie des (Tiefen) Lernens

SETOL:半经验学理论(深)学习 2507.17912v2

Authors (2): Charles H Martin, Christopher Hinrichs

We present a SemiEmpirical Theory of Learning (SETOL) that explains the remarkable performance of State-Of-The-Art (SOTA) Neural Networks (NNs). We provide a formal explanation of the origin of the fundamental quantities in the phenomenological theory of Heavy-Tailed Self-Regularization (HTSR): the heavy-tailed power-law layer quality metrics, alpha and alpha-hat. In prior work, these metrics have been shown to predict trends in the test accuracies of pretrained SOTA NN models, importantly, without needing access to either testing or training data. Our SETOL uses techniques from statistical mechanics as well as advanced methods from random matrix theory and quantum chemistry. The derivation suggests new mathematical preconditions for ideal learning, including a new metric, ERG, which is equivalent to applying a single step of the Wilson Exact Renormalization Group. We test the assumptions and predictions of SETOL on a simple 3-layer multilayer perceptron (MLP), demonstrating excellent agreement with the key theoretical assumptions. For SOTA NN models, we show how to estimate the individual layer qualities of a trained NN by simply computing the empirical spectral density (ESD) of the layer weight matrices and plugging this ESD into our SETOL formulas. Notably, we examine the performance of the HTSR alpha and the SETOL ERG layer quality metrics, and find that they align remarkably well, both on our MLP and on SOTA NNs.
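
For intuition about a power-law layer metric such as alpha, the sketch below estimates the exponent of a power-law tail $p(\lambda) \propto \lambda^{-\alpha}$ from a list of eigenvalues using the standard continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(\lambda_i / \lambda_{\min})$. It assumes the eigenvalues of the layer's correlation matrix are already computed and that `lambda_min` is given; the function name is hypothetical, and this is not claimed to be the exact fitting procedure used in HTSR or SETOL.

```python
import math
import random


def powerlaw_alpha(eigenvalues, lambda_min):
    """Maximum-likelihood exponent of a power-law tail above lambda_min."""
    tail = [lam for lam in eigenvalues if lam >= lambda_min]
    log_sum = sum(math.log(lam / lambda_min) for lam in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need at least one eigenvalue strictly above lambda_min")
    return 1.0 + len(tail) / log_sum


# Synthetic eigenvalues drawn from a power law with exponent 3 (illustration only).
rng = random.Random(0)
true_alpha = 3.0
samples = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(20000)]
print(powerlaw_alpha(samples, lambda_min=1.0))
```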

Article 748

Title@2025-07-27 (7): The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in Large Language Models

Title: The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in Large Language Models

The Policy Cliff: Eine theoretische Analyse von Belohnungs-Policy-Karten in großen Sprachmodellen

政策悬崖:大语言模式奖励政策图的理论分析 2507.20150v1

Authors (1): Xingcheng Xu

Reinforcement learning (RL) plays a crucial role in shaping the behavior of large language and reasoning models (LLMs/LRMs). However, it often produces brittle and unstable policies, leading to critical failures such as spurious reasoning, deceptive alignment, and instruction disobedience that undermine the trustworthiness and safety of LLMs/LRMs. Currently, these issues lack a unified theoretical explanation and are typically addressed using ad-hoc heuristics. This paper presents a rigorous mathematical framework for analyzing the stability of the mapping from a reward function to the optimal policy. We show that policy brittleness often stems from non-unique optimal actions, a common occurrence when multiple valid traces exist in a reasoning task. This theoretical lens provides a unified explanation for a range of seemingly disparate failures, reframing them as rational outcomes of optimizing rewards that may be incomplete or noisy, especially in the presence of action degeneracy. We extend this analysis from the fundamental single-reward setting to the more realistic multi-reward RL across diverse domains, showing how stability is governed by an “effective reward” aggregation mechanism. We also prove that entropy regularization restores policy stability at the cost of increased stochasticity. Our framework provides a unified explanation for recent empirical findings on deceptive reasoning, instruction-following trade-offs, and RLHF-induced sophistry, and is further validated through perturbation experiments in multi-reward RL. This work advances policy-stability analysis from empirical heuristics towards a principled theory, offering essential insights for designing safer and more trustworthy AI systems.
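
The instability caused by near-tied optimal actions, and the stabilizing effect of entropy regularization, can be seen in a single-state example. The greedy policy flips completely when two nearly equal rewards swap order, whereas the entropy-regularized optimum in this single-state setting, a softmax over rewards with temperature $\tau$, barely moves. The sketch below is a toy illustration with made-up reward values, not the paper's framework.

```python
import math


def greedy_policy(rewards):
    # Deterministic argmax policy (ties broken by lowest index).
    best = max(range(len(rewards)), key=lambda a: rewards[a])
    return [1.0 if a == best else 0.0 for a in range(len(rewards))]


def softmax_policy(rewards, temperature):
    # Maximizer of expected reward plus temperature * entropy (single state).
    m = max(rewards)
    exps = [math.exp((r - m) / temperature) for r in rewards]
    z = sum(exps)
    return [e / z for e in exps]


eps = 1e-6
r_a = [1.0, 1.0 - eps]   # action 0 is marginally better
r_b = [1.0 - eps, 1.0]   # action 1 is marginally better
print(greedy_policy(r_a), greedy_policy(r_b))
print(softmax_policy(r_a, 0.1), softmax_policy(r_b, 0.1))
```

The price of the smoother map is visible in the same output: the regularized policy stays close to uniform over the two near-optimal actions instead of committing to one, which is the increased stochasticity the abstract mentions.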

Article 749

Title@2025-07-27 (7): Awesome-OL: An Extensible Toolkit for Online Learning

Title: Awesome-OL: An Extensible Toolkit for Online Learning

Awesome-OL: Ein umfangreiches Toolkit für das Online-Lernen

Awesome-OL:网上学习扩展工具包 2507.20144v1

Authors (5): Zeyi Liu, Songqiao Hu, Pengyu Han, Jiaming Liu, Xiao He

In recent years, online learning has attracted increasing attention due to its adaptive capability to process streaming and non-stationary data. To facilitate algorithm development and practical deployment in this area, we introduce Awesome-OL, an extensible Python toolkit tailored for online learning research. Awesome-OL integrates state-of-the-art algorithms and provides a unified framework for reproducible comparisons, curated benchmark datasets, and multi-modal visualization. Built upon the scikit-multiflow open-source infrastructure, Awesome-OL emphasizes user-friendly interactions without compromising research flexibility or extensibility. The source code is publicly available at: https://github.com/liuzy0708/Awesome-OL.

Article 750

Title@2025-07-27 (7): Generalized Trusted Multi-view Classification Framework with Hierarchical Opinion Aggregation

Title: Generalized Trusted Multi-view Classification Framework with Hierarchical Opinion Aggregation

Generalized Trusted Multi-View-Klassifikationsrahmen mit Hierarchischer Meinung Aggregation

普遍信任的多观点分类框架和等级性意见汇总 2411.03713v2

Authors (6): Long Shi, Chuanqing Tang, Huangyi Deng, Cai Xu, Lei Xing, Badong Chen

Recently, multi-view learning has seen considerable interest in research on trusted decision-making. Previous methods are mainly inspired by an important paper published by Han et al. in 2021, which formulates a Trusted Multi-view Classification (TMC) framework that aggregates evidence from different views based on Dempster’s combination rule. All these methods consider only inter-view aggregation and do not exploit intra-view information. In this paper, we propose a generalized trusted multi-view classification framework with hierarchical opinion aggregation. This hierarchical framework includes a two-phase aggregation process: the intra-view and inter-view aggregation hierarchies. In the intra-view aggregation, we assume that each view comprises common information shared with other views as well as its own specific information. We then aggregate both the common and specific information. This aggregation phase helps eliminate the feature noise inherent to the view itself, thereby improving the view quality. In the inter-view aggregation, we design an attention mechanism at the evidence level to facilitate opinion aggregation from different views. To the best of our knowledge, this is one of the pioneering efforts to formulate a hierarchical aggregation framework in the trusted multi-view learning domain. Extensive experiments show that our model outperforms some state-of-the-art trust-related baselines. The source code is available at https://github.com/lshi91/GTMC-HOA.
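
For readers unfamiliar with evidence aggregation via Dempster’s combination rule, the sketch below combines two opinions, each given as per-class belief masses plus an uncertainty mass that together sum to one. It uses the reduced form of the rule commonly seen in TMC-style models; treating it as the exact variant used in this line of work is an assumption, and the function name and numbers are illustrative.

```python
def combine_opinions(b1, u1, b2, u2):
    """Reduced Dempster combination of two opinions (beliefs + uncertainty)."""
    # Conflict: mass the two views assign to different classes.
    conflict = sum(
        b1[i] * b2[j]
        for i in range(len(b1))
        for j in range(len(b2))
        if i != j
    )
    scale = 1.0 - conflict
    if scale <= 0.0:
        raise ValueError("opinions are in total conflict")
    belief = [(x * y + x * u2 + y * u1) / scale for x, y in zip(b1, b2)]
    uncertainty = (u1 * u2) / scale
    return belief, uncertainty


# Two views that both favor class 0, with different uncertainty (toy values).
belief, uncertainty = combine_opinions([0.6, 0.2], 0.2, [0.5, 0.1], 0.4)
print(belief, uncertainty)
```

When the two views agree, the combined uncertainty is lower than either input uncertainty, which is the property that makes the rule attractive for trusted decision-making.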

Article 751

Title@2025-07-27 (7): Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time

Title: Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time

Verteiltes Lernen über willkürliche Topologie: Lineares Tempo-Up mit polynomischer Transienten Zeit

任意地形学的分布式学习:线性快速提升与多面性瞬时 2503.16123v2

Authors (2): Runze You, Shi Pu

We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication. We propose a novel algorithm, \emph{Spanning Tree Push-Pull} (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and stochastic gradients. Unlike prior approaches that rely heavily on spectral gap properties, STPP leverages a more flexible topological characterization, enabling robust information flow and efficient updates. Theoretically, we prove that STPP achieves linear speedup and polynomial transient iteration complexity – up to $\mathcal{O}(n^7)$ for smooth nonconvex objectives and $\tilde{\mathcal{O}}(n^3)$ for smooth strongly convex objectives – under arbitrary network topologies. Moreover, compared with existing methods, STPP achieves faster convergence rates on sparse and non-regular topologies (e.g., directed rings) and reduces communication overhead on dense networks (e.g., static exponential graphs). Numerical experiments further demonstrate the strong performance of STPP across various graph architectures.
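
Extracting a spanning tree from a general communication graph can be done with a breadth-first search from a chosen root. The sketch below returns a parent map for a directed graph given as adjacency lists; how STPP selects its two trees and assigns them to parameters and gradients is not described in the abstract, so the function name and the ring example are illustrative only.

```python
from collections import deque


def bfs_spanning_tree(adjacency, root):
    """Return {node: parent} for a BFS tree of nodes reachable from root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


# Directed ring on four agents: 0 -> 1 -> 2 -> 3 -> 0.
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
print(bfs_spanning_tree(ring, root=0))
```

Running the same search on the graph with all edges reversed yields a second tree whose edges point toward the root, which gives one tree for spreading information outward and one for collecting it.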

Article 752

Title@2025-07-27 (7): EvoSLD: Automated Neural Scaling Law Discovery With Large Language Models

Title: EvoSLD: Automated Neural Scaling Law Discovery With Large Language Models

EvoSLD: Automatisierte Neural Scaling Law Discovery mit großen Sprachmodellen

EvoSLD: 用大语言模型发现自动神经放大法 2507.21184v1

Authors (4): Haowei Lin, Xiangyu Wang, Jianzhu Ma, Yitao Liang

Scaling laws are fundamental mathematical relationships that predict how neural network performance evolves with changes in variables such as model size, dataset size, and computational resources. Traditionally, discovering these laws requires extensive human expertise and manual experimentation. We introduce EvoSLD, an automated framework for Scaling Law Discovery (SLD) that leverages evolutionary algorithms guided by Large Language Models (LLMs) to co-evolve symbolic expressions and their optimization routines. Formulated to handle scaling variables, control variables, and response metrics across diverse experimental settings, EvoSLD searches for parsimonious, universal functional forms that minimize fitting errors on grouped data subsets. Evaluated on five real-world scenarios from recent literature, EvoSLD rediscovers exact human-derived laws in two cases and surpasses them in others, achieving up to orders-of-magnitude reductions in normalized mean squared error on held-out test sets. Compared to baselines like symbolic regression and ablated variants, EvoSLD demonstrates superior accuracy, interpretability, and efficiency, highlighting its potential to accelerate AI research. Code is available at https://github.com/linhaowei1/SLD.

Article 753

Title@2025-07-27 (7): When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Title: When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Wann funktioniert Metadata Conditioning (NOT) für Sprachmodell-Vorschulungen? Eine Studie mit kontextfreien Grammatiken

元数据条件(NOT)何时能为语言示范培训前培训提供语言示范?无背景语法研究 2504.17562v2

Authors (10): Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa, Kazusato Oko, Shoichiro Yamaguchi, Sosuke Kobayashi, Seiya Tokui, Kohei Hayashi, Daisuke Okanohara, Taiji Suzuki

The ability to acquire latent semantics is one of the key properties that determines the performance of language models. One convenient approach to invoke this ability is to prepend metadata (e.g., URLs, domains, and styles) at the beginning of texts in the pre-training data, making it easier for the model to access latent semantics before observing the entire text. Previous studies have reported that this technique actually improves the performance of trained models in downstream tasks; however, this improvement has been observed only in specific downstream tasks, without consistent enhancement in average next-token prediction loss. To understand this phenomenon, we closely investigate how prepending metadata during pre-training affects model performance by examining its behavior using artificial data. Interestingly, we found that this approach produces both positive and negative effects on the downstream tasks. We demonstrate that the effectiveness of the approach depends on whether latent semantics can be inferred from the downstream task’s prompt. Specifically, through investigations using data generated by probabilistic context-free grammars, we show that training with metadata helps improve the model’s performance when the given context is long enough to infer the latent semantics. In contrast, the technique negatively impacts performance when the context lacks the necessary information to make an accurate posterior inference.

Article 754

Title@2025-07-27 (7): MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Title: MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

MaPPO: Maximale Posteriori-Preference-Optimierung mit vorherigem Wissen

MaPPO: 与先前知识最优化的后世偏好 2507.21183v1

Authors (10): Guangchen Lan, Sipeng Zhang, Tianle Wang, Yuwei Zhang, Daoan Zhang, Xinpeng Wei, Xiaoman Pan, Hongming Zhang, Dong-Jun Han, Christopher G. Brinton

As the era of large language models (LLMs) acting on behalf of users unfolds, Preference Optimization (PO) methods have become a central approach to aligning LLMs with human preferences and improving performance. We propose Maximum a Posteriori Preference Optimization (MaPPO), a framework for learning from preferences that explicitly incorporates prior reward knowledge into the optimization objective. While existing methods such as Direct Preference Optimization (DPO) and its variants treat preference learning as a Maximum Likelihood Estimation (MLE) problem, MaPPO extends this paradigm by integrating prior reward estimates into a principled Maximum a Posteriori (MaP) objective. This not only generalizes DPO and its variants, but also enhances alignment by mitigating the oversimplified binary classification of responses. More importantly, MaPPO introduces no additional hyperparameters and supports preference optimization in both offline and online settings. In addition, MaPPO can be used as a plugin that consistently improves DPO variants, including the widely used SimPO, IPO, and CPO. Extensive empirical evaluations of different model sizes and model series on three standard benchmarks, including MT-Bench, AlpacaEval 2.0, and Arena-Hard, demonstrate consistent improvements in alignment performance without sacrificing computational efficiency.
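
As background for the MLE baseline that MaPPO extends, the sketch below computes the standard DPO loss for one preference pair from sequence log-probabilities under the policy and the reference model. The MaPPO objective with its prior reward term is not given in the abstract and is not reproduced here; the argument names and numeric inputs are illustrative.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen w, rejected l) pair."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


# Policy has moved toward the chosen response relative to the reference.
print(dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0))
```

The loss depends only on the sign and size of the margin between the two responses, which is the binary treatment of preference pairs that the abstract describes as oversimplified.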

Article 755

Title@2025-07-27 (7): Generative molecule evolution using 3D pharmacophore for efficient Structure-Based Drug Design

Title: Generative molecule evolution using 3D pharmacophore for efficient Structure-Based Drug Design

Generative Molekülentwicklung mit 3D-Pharmakore für effizientes strukturbasiertes Drug Design

利用3D药用磷进行高效结构制药物设计生成分子进化 2507.20130v1

Authors (6): Yi He, Ailun Wang, Zhi Wang, Yu Liu, Xingyuan Xu, Wen Yan

Recent advances in generative models, particularly diffusion and auto-regressive models, have revolutionized fields like computer vision and natural language processing. However, their application to structure-based drug design (SBDD) remains limited due to critical data constraints. To address the limited training data for models targeting SBDD tasks, we propose an evolutionary framework named MEVO, which bridges the gap between billion-scale small-molecule datasets and scarce protein-ligand complex datasets, and effectively increases the abundance of training data for generative SBDD models. MEVO is composed of three key components: a high-fidelity VQ-VAE for molecule representation in latent space, a diffusion model for pharmacophore-guided molecule generation, and a pocket-aware evolutionary strategy for molecule optimization with a physics-based scoring function. This framework efficiently generates high-affinity binders for various protein targets, validated with predicted binding affinities using free energy perturbation (FEP) methods. In addition, we showcase the capability of MEVO in designing potent inhibitors of KRAS$^{\textrm{G12D}}$, a challenging target in cancer therapeutics, with affinity similar to that of a known highly active inhibitor, as evaluated by FEP calculations. With high versatility and generalizability, MEVO offers an effective and data-efficient model for various tasks in structure-based ligand design.

Article 756

Title@2025-07-27 (7): Aggregation-aware MLP: An Unsupervised Approach for Graph Message-passing

Title: Aggregation-aware MLP: An Unsupervised Approach for Graph Message-passing

Aggregation-aware MLP: Ein unbeaufsichtigter Ansatz für Graph Message-Passing

聚合意识 MLP: 图形信件传送的无人监督的方法 2507.20127v1

Authors (5): Xuanting Xie, Bingheng Li, Erlin Pan, Zhao Kang, Wenyu Chen

Graph Neural Networks (GNNs) have become a dominant approach to learning graph representations, primarily because of their message-passing mechanisms. However, GNNs typically adopt a fixed aggregator function such as Mean, Max, or Sum without principled reasoning behind the selection. This rigidity, especially in the presence of heterophily, often leads to poor, problem-dependent performance. Although some attempts address this by designing more sophisticated aggregation functions, these methods tend to rely heavily on labeled data, which is often scarce in real-world tasks. In this work, we propose a novel unsupervised framework, “Aggregation-aware Multilayer Perceptron” (AMLP), which shifts the paradigm from directly crafting aggregation functions to making the MLP adaptive to aggregation. Our lightweight approach consists of two key steps: first, we utilize a graph reconstruction method that facilitates high-order grouping effects, and second, we employ a single-layer network to encode varying degrees of heterophily, thereby improving the capacity and applicability of the model. Extensive experiments on node clustering and classification demonstrate the superior performance of AMLP, highlighting its potential for diverse graph learning scenarios.
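
The fixed aggregators named above differ only in how a node summarizes its neighbors' features. The sketch below applies Sum, Mean, and Max to scalar features on a toy graph; the function name, the graph, and the feature values are made up, and AMLP itself is not shown.

```python
def aggregate(features, neighbors, node, mode):
    """Aggregate scalar neighbor features with a fixed aggregator."""
    values = [features[n] for n in neighbors[node]]
    if not values:
        raise ValueError("node has no neighbors")
    if mode == "sum":
        return sum(values)
    if mode == "mean":
        return sum(values) / len(values)
    if mode == "max":
        return max(values)
    raise ValueError(f"unknown aggregator: {mode}")


features = {0: 1.0, 1: 2.0, 2: 6.0}   # toy node features
neighbors = {0: [1, 2], 1: [0], 2: [0]}
for mode in ("sum", "mean", "max"):
    print(mode, aggregate(features, neighbors, 0, mode))
```

Each choice discards different information (Mean loses degree, Max loses everything but the extreme value), which is why a single fixed aggregator can behave so differently from one graph to the next.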

Article 757

Title@2025-07-27 (7): Minimax Optimal Reinforcement Learning with Quasi-Optimism

Title: Minimax Optimal Reinforcement Learning with Quasi-Optimism

Minimax Optimales Stärkungslernen mit Quasi-Optimismus

以准适应主义进行最优化强化学习 2503.00810v3

Authors (2): Harin Lee, Min-hwan Oh

In our quest for a reinforcement learning (RL) algorithm that is both practical and provably optimal, we introduce EQO (Exploration via Quasi-Optimism). Unlike existing minimax optimal approaches, EQO avoids reliance on empirical variances and employs a simple bonus term proportional to the inverse of the state-action visit count. Central to EQO is the concept of quasi-optimism, where estimated values need not be fully optimistic, allowing for a simpler yet effective exploration strategy. The algorithm achieves the sharpest known regret bound for tabular RL under the mildest assumptions, proving that fast convergence can be attained with a practical and computationally efficient approach. Empirical evaluations demonstrate that EQO consistently outperforms existing algorithms in both regret performance and computational efficiency, providing the best of both theoretical soundness and practical effectiveness.
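
The bonus shape described in the abstract can be illustrated with a minimal sketch. The function names, the `scale` constant, and the clamp for unvisited pairs are assumptions made for illustration; they are not taken from the paper. The second function shows the familiar $1/\sqrt{N}$ shape of Hoeffding-style bonuses purely for contrast.

```python
def quasi_optimism_bonus(visit_count, scale=1.0):
    # Bonus proportional to 1 / N(s, a), as the abstract describes.
    # Clamping unvisited pairs to N = 1 is an assumption of this sketch.
    return scale / max(visit_count, 1)


def sqrt_count_bonus(visit_count, scale=1.0):
    # For contrast only: the 1 / sqrt(N) shape of Hoeffding-style bonuses.
    return scale / max(visit_count, 1) ** 0.5


if __name__ == "__main__":
    for n in (1, 4, 100):
        print(n, quasi_optimism_bonus(n), sqrt_count_bonus(n))
```

The inverse-count bonus decays faster than the square-root bonus as a state-action pair is revisited, which is consistent with the abstract's point that estimates need not stay fully optimistic.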

Article 758

Title@2025-07-27 (7): Wine Characterisation with Spectral Information and Predictive Artificial Intelligence

Title: Wine Characterisation with Spectral Information and Predictive Artificial Intelligence

Weincharakterisierung mit Spektralinformation und vorausschauender Künstlicher Intelligenz

光谱信息和预报人工智能的优美特征 2507.20114v1

Authors (5): Jianping Yao, Son N. Tran, Hieu Nguyen, Samantha Sawyer, Rocco Longo

The purpose of this paper is to use absorbance data obtained by human tasting and an ultraviolet-visible (UV-Vis) scanning spectrophotometer to predict the attributes of grape juice (GJ) and to classify the wine’s origin, respectively. The approach combined machine learning (ML) techniques with spectroscopy to find a relatively simple way to apply them in two stages of winemaking and help improve the traditional wine analysis methods regarding sensory data and wine’s origins. This new technique has overcome the disadvantages of the complex sensors by taking advantage of spectral fingerprinting technology and forming a comprehensive study of the employment of AI in the wine analysis domain. In the results, Support Vector Machine (SVM) was the most efficient and robust in both attributes and origin prediction tasks. Both the accuracy and F1 score of the origin prediction exceed 91%. The feature ranking approach found that the more influential wavelengths usually appear at the lower end of the scan range, 250 nm (nanometers) to 420 nm, which is believed to be of great help for selecting appropriate validation methods and sensors to extract wine data in future research. The knowledge of this research provides new ideas and early solutions for the wine industry or other beverage industries to integrate big data and IoT in the future, which significantly promotes the development of ‘Smart Wineries’.

Article 759

Title@2025-07-27 (7): Online Learning with Probing for Sequential User-Centric Selection

Title: Online Learning with Probing for Sequential User-Centric Selection

Online-Lernen mit Probing für die sequentielle Benutzer-Centric-Auswahl

在线学习,通过测试进行序列用户- Centric 选择 2507.20112v1

Authors (6): Tianyi Xu, Yiting Chen, Henger Li, Zheyong Bian, Emiliano Dall’Anese, Zizhan Zheng

We formalize sequential decision-making with information acquisition as the probing-augmented user-centric selection (PUCS) framework, where a learner first probes a subset of arms to obtain side information on resources and rewards, and then assigns $K$ plays to $M$ arms. PUCS covers applications such as ridesharing, wireless scheduling, and content recommendation, in which both resources and payoffs are initially unknown and probing is costly. For the offline setting with known distributions, we present a greedy probing algorithm with a constant-factor approximation guarantee $\zeta = (e-1)/(2e-1)$. For the online setting with unknown distributions, we introduce OLPA, a stochastic combinatorial bandit algorithm that achieves a regret bound $\mathcal{O}(\sqrt{T} + \ln^{2} T)$. We also prove a lower bound $\Omega(\sqrt{T})$, showing that the upper bound is tight up to logarithmic factors. Experiments on real-world data demonstrate the effectiveness of our solutions.
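
As a quick worked check, the approximation factor $\zeta = (e-1)/(2e-1)$ quoted above can be evaluated numerically. The variable name `zeta` is just a label for this sketch.

```python
import math

# Constant-factor guarantee quoted in the abstract: (e - 1) / (2e - 1).
zeta = (math.e - 1) / (2 * math.e - 1)

if __name__ == "__main__":
    print(round(zeta, 4))  # 0.3873
```

So the greedy probing algorithm is guaranteed roughly 38.7% of the optimal offline value.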

Article 760

Title@2025-07-27 (7): NeuroVoxel-LM: Language-Aligned 3D Perception via Dynamic Voxelization and Meta-Embedding

Title: NeuroVoxel-LM: Language-Aligned 3D Perception via Dynamic Voxelization and Meta-Embedding

NeuroVoxel-LM: Sprachorientierte 3D-Perception über dynamische Voxelisierung und Meta-Embedding

NeuroVoxel-LM:通过动态氧化化和代谢生成的3D感知 2507.20110v1

Authors (2): Shiyu Liu, Lianlei Shan

Recent breakthroughs in Visual Language Models (VLMs) and Multimodal Large Language Models (MLLMs) have significantly advanced 3D scene perception towards language-driven cognition. However, existing 3D language models struggle with sparse, large-scale point clouds due to slow feature extraction and limited representation accuracy. To address these challenges, we propose NeuroVoxel-LM, a novel framework that integrates Neural Radiance Fields (NeRF) with dynamic resolution voxelization and lightweight meta-embedding. Specifically, we introduce a Dynamic Resolution Multiscale Voxelization (DR-MSV) technique that adaptively adjusts voxel granularity based on geometric and structural complexity, reducing computational cost while preserving reconstruction fidelity. In addition, we propose the Token-level Adaptive Pooling for Lightweight Meta-Embedding (TAP-LME) mechanism, which enhances semantic representation through attention-based weighting and residual fusion. Experimental results demonstrate that DR-MSV significantly improves point cloud feature extraction efficiency and accuracy, while TAP-LME outperforms conventional max-pooling in capturing fine-grained semantics from NeRF weights.

Article 761

Title@2025-07-27 (7): Graded Transformers: A Symbolic-Geometric Approach to Structured Learning

Title: Graded Transformers: A Symbolic-Geometric Approach to Structured Learning

Gradierte Transformer: Ein symbolisch-geometrischer Ansatz zum strukturierten Lernen

等级变换器:结构化学习的象征性地质计量方法 2507.20108v1

Authors (1): Tony Shaska Sr

We introduce the Graded Transformer framework, a novel class of sequence models that embeds algebraic inductive biases through grading transformations on vector spaces. Extending the theory of Graded Neural Networks (GNNs), we propose two architectures: the Linearly Graded Transformer (LGT) and the Exponentially Graded Transformer (EGT). These models apply parameterized scaling operators, governed by fixed or learnable grading tuples and, for EGT, exponential factors, to infuse hierarchical structure into attention and representation layers, enhancing efficiency for structured data. We derive rigorous theoretical guarantees, including universal approximation theorems for continuous and Sobolev functions, reduced sample complexity via effective VC dimension bounds, Lipschitz continuity of graded operations, and robustness to adversarial perturbations. A graded loss function ensures gradient stability and alignment with domain priors during optimization. By treating grades as differentiable parameters, the framework enables adaptive feature prioritization, overcoming limitations of fixed grades in prior work. The Graded Transformer holds transformative potential for hierarchical learning and neurosymbolic reasoning, with applications spanning algebraic geometry (e.g., moduli spaces and zeta functions), physics (e.g., multiscale simulations), natural language processing (e.g., syntactic parsing), biological sequence analysis (e.g., variant prediction), and emerging areas like graph neural networks and financial modeling. This work advances structured deep learning by fusing geometric and algebraic principles with attention mechanisms, offering a mathematically grounded alternative to data-driven models and paving the way for interpretable, efficient systems in complex domains.

Article 762

Title@2025-07-27 (7): RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$

Title: RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$

RL$^3$: Förderung des Meta-Verstärkung-Lernens über RL innerhalb RL$^2$

RL$^3$:通过 RL$^2$ 内部的 RL 提升元强化学习 2306.15909v6

Authors (3): Abhinav Bhatia, Samer B. Nashed, Shlomo Zilberstein

Meta reinforcement learning (Meta-RL) methods such as RL$^2$ have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, they show poor asymptotic performance and struggle with out-of-distribution tasks because they rely on sequence models, such as recurrent neural networks or transformers, to process experiences rather than summarize them using general-purpose RL components such as value functions. In contrast, traditional RL algorithms are data-inefficient as they do not use domain knowledge, but do converge to an optimal policy in the limit. We propose RL$^3$, a principled hybrid approach that incorporates action-values, learned per task via traditional RL, in the inputs to Meta-RL. We show that RL$^3$ earns a greater cumulative reward in the long term compared to RL$^2$ while drastically reducing meta-training time and generalizes better to out-of-distribution tasks. Experiments are conducted on Meta-RL benchmarks and custom discrete domains that exhibit a range of short-term, long-term, and complex dependencies.

Article 763

Title@2025-07-27 (7): Analytic Continual Test-Time Adaptation for Multi-Modality Corruption

Title: Analytic Continual Test-Time Adaptation for Multi-Modality Corruption

Analytische kontinuierliche Test-Zeit-Anpassung für Multi-Modalität Korruption

多模式腐败分析分析的连续测试时间适应多模式腐败 2410.22373v2

Authors (7): Yufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin, Xiaofeng Zou, Cen Chen, Huiping Zhuang

Test-Time Adaptation (TTA) enables pre-trained models to bridge the gap between source and target datasets using unlabeled test data, addressing domain shifts caused by corruptions like weather changes, noise, or sensor malfunctions at test time. Multi-Modal Continual Test-Time Adaptation (MM-CTTA), as an extension of standard TTA, further allows models to handle multi-modal inputs and adapt to continuously evolving target domains. However, MM-CTTA faces critical challenges such as catastrophic forgetting and reliability bias, which are rarely addressed effectively under multi-modal corruption scenarios. In this paper, we propose a novel approach, Multi-modality Dynamic Analytic Adapter (MDAA), to tackle MM-CTTA tasks. MDAA introduces analytic learning, a closed-form training technique, through Analytic Classifiers (ACs) to mitigate catastrophic forgetting. Furthermore, we design the Dynamic Late Fusion Mechanism (DLFM) to dynamically select and integrate reliable information from different modalities. Extensive experiments show that MDAA achieves state-of-the-art performance across the proposed tasks.

Article 764

Title@2025-07-27 (7): EcoTransformer: Attention without Multiplication

Title: EcoTransformer: Attention without Multiplication

EcoTransformer: Achtung ohne Multiplikation

生态转换:注意不乘数 2507.20096v1

Authors (2): Xin Gao, Xingming Xu

The Transformer, with its scaled dot-product attention mechanism, has become a foundational architecture in modern AI. However, this mechanism is computationally intensive and incurs substantial energy costs. We propose a new Transformer architecture, EcoTransformer, in which the output context vector is constructed as the convolution of the values using a Laplacian kernel, where the distances are measured by the L1 metric between the queries and keys. Compared to dot-product based attention, the new attention score calculation is free of matrix multiplication. It performs on par with, or even surpasses, scaled dot-product attention in NLP, bioinformatics, and vision tasks, while consuming significantly less energy.
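
One way to read the description above is as attention whose scores are negative L1 distances passed through an exponential, which amounts to a normalized Laplacian kernel. The following is a minimal pure-Python sketch under that reading. The function name `l1_attention`, the default temperature `sqrt(d)`, and the softmax-style normalization are assumptions for illustration, not the authors' implementation.

```python
import math


def l1_attention(queries, keys, values, scale=None):
    """Attention with scores -||q - k||_1 / scale (hypothetical sketch)."""
    d = len(queries[0])
    # Assumed temperature, chosen by analogy with scaled dot-product attention.
    scale = math.sqrt(d) if scale is None else scale
    outputs = []
    for q in queries:
        # Score computation uses only subtraction, absolute value, and a sum.
        scores = [-sum(abs(qi - ki) for qi, ki in zip(q, k)) / scale
                  for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]  # Laplacian kernel
        total = sum(weights)
        weights = [w / total for w in weights]
        # Context vector: kernel-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


if __name__ == "__main__":
    out = l1_attention([[1.0]], [[0.0], [2.0]], [[0.0], [10.0]])
    print(out)  # the query is equidistant from both keys
```

In this sketch only the score computation avoids multiplication; the final weighting of the values still multiplies, which matches the abstract's narrower claim about the attention score calculation.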

Article 765

Title@2025-07-27 (7): Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning

Title: Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning

Meta Fusion: Ein einheitliches Rahmenwerk für Multimodalitätsfusion mit gegenseitigem Lernen

元融合:多式联运与相互学习统一框架 2507.20089v1

Authors (3): Ziyi Liang, Annie Qu, Babak Shahbaba

Developing effective multimodal data fusion strategies has become increasingly essential for improving the predictive power of statistical machine learning methods across a wide range of applications, from autonomous driving to medical diagnosis. Traditional fusion methods, including early, intermediate, and late fusion, integrate data at different stages, each offering distinct advantages and limitations. In this paper, we introduce Meta Fusion, a flexible and principled framework that unifies these existing strategies as special cases. Motivated by deep mutual learning and ensemble learning, Meta Fusion constructs a cohort of models based on various combinations of latent representations across modalities, and further boosts predictive performance through soft information sharing within the cohort. Our approach is model-agnostic in learning the latent representations, allowing it to flexibly adapt to the unique characteristics of each modality. Theoretically, our soft information sharing mechanism reduces the generalization error. Empirically, Meta Fusion consistently outperforms conventional fusion strategies in extensive simulation studies. We further validate our approach on real-world applications, including Alzheimer’s disease detection and neural decoding.

Article 766

Title@2025-07-27 (7): Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs

Title: Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs

Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs

超越自反应内核:历史驱动目标,实现高效的非线性非线性通用图形MCMC 2505.18300v3

Authors (3): Jie Hu, Yi-Ting Ma, Do Young Eun

We propose a history-driven target (HDT) framework in Markov Chain Monte Carlo (MCMC) to improve any random walk algorithm on discrete state spaces, such as general undirected graphs, for efficient sampling from target distribution $\boldsymbol{\mu}$. With broad applications in network science and distributed optimization, recent innovations like the self-repellent random walk (SRRW) achieve near-zero variance by prioritizing under-sampled states through transition kernel modifications based on past visit frequencies. However, SRRW’s reliance on explicit computation of transition probabilities for all neighbors at each step introduces substantial computational overhead, while its strict dependence on time-reversible Markov chains excludes advanced non-reversible MCMC methods. To overcome these limitations, instead of direct modification of transition kernel, HDT introduces a history-dependent target distribution $\boldsymbol{\pi}[\mathbf{x}]$ to replace the original target $\boldsymbol{\mu}$ in any graph sampler, where $\mathbf{x}$ represents the empirical measure of past visits. This design preserves lightweight implementation by requiring only local information between the current and proposed states and achieves compatibility with both reversible and non-reversible MCMC samplers, while retaining unbiased samples with target distribution $\boldsymbol{\mu}$ and near-zero variance performance. Extensive experiments in graph sampling demonstrate consistent performance gains, and a memory-efficient Least Recently Used (LRU) cache ensures scalability to large general graphs.
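
To make the idea of swapping the target rather than the kernel concrete, here is a small sketch of a Metropolis-Hastings walk on a graph whose target is reweighted by visit history. The specific form $\pi_i[\mathbf{x}] \propto \mu_i (x_i/\mu_i)^{-\alpha}$, the pseudo-count initialization, the parameter `alpha`, and the function name are all assumptions made for illustration; the abstract does not spell out the exact construction.

```python
import random


def hdt_metropolis_walk(neighbors, mu, steps, alpha=1.0, seed=0):
    """Hypothetical history-driven Metropolis walk; returns visit frequencies."""
    rng = random.Random(seed)
    n = len(mu)
    counts = [1.0] * n  # pseudo-counts keep every x_i > 0 (assumption)
    total = float(n)
    state = 0

    def weight(i):
        # Assumed unnormalized target: mu_i * (x_i / mu_i) ** (-alpha),
        # which favors states visited less often than mu prescribes.
        x_i = counts[i] / total
        return mu[i] * (x_i / mu[i]) ** (-alpha)

    for _ in range(steps):
        proposal = rng.choice(neighbors[state])
        # The proposal is symmetric on a regular graph, so the acceptance
        # ratio needs only the weights of the current and proposed states.
        if rng.random() < min(1.0, weight(proposal) / weight(state)):
            state = proposal
        counts[state] += 1.0
        total += 1.0
    return [c / total for c in counts]


if __name__ == "__main__":
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(hdt_metropolis_walk(cycle, mu=[0.25] * 4, steps=4000))
```

The acceptance step touches only the current and proposed states, which mirrors the abstract's point about local information; no transition probabilities over all neighbors are computed.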

Article 767

Title@2025-07-27 (7): Feed-anywhere ANN (I) Steady Discrete $\to$ Diffusing on Graph Hidden States

Title: Feed-anywhere ANN (I) Steady Discrete $\to$ Diffusing on Graph Hidden States

Futtermittel überall ANN (I) Steady Discrete $\to$ Diffusion auf Graph Hidden States

随处馈送 ANN (I) 稳态离散 $\to$ 在图隐藏状态上扩散 2507.20088v1

Authors (2): Dmitry Pasechnyuk-Vilensky, Daniil Doroshenko

We propose a novel framework for learning hidden graph structures from data using geometric analysis and nonlinear dynamics. Our approach: (1) Defines discrete Sobolev spaces on graphs for scalar/vector fields, establishing key functional properties; (2) Introduces gauge-equivalent nonlinear Schrödinger and Landau–Lifshitz dynamics with provable stable stationary solutions smoothly dependent on input data and graph weights; (3) Develops a stochastic gradient algorithm over graph moduli spaces with sparsity regularization. Theoretically, we guarantee: topological correctness (homology recovery), metric convergence (Gromov–Hausdorff), and efficient search space utilization. Our dynamics-based model achieves stronger generalization bounds than standard neural networks, with complexity dependent on the data manifold’s topology.

Article 768

Title@2025-07-26 (6): Cluster Purge Loss: Structuring Transformer Embeddings for Equivalent Mutants Detection

Title: Cluster Purge Loss: Structuring Transformer Embeddings for Equivalent Mutants Detection

Cluster Purge Loss: Strukturierung von Transformer-Embeddings für äquivalente Mutanten-Detektion

组群清除损失:对等变异物探测的变异体嵌入结构 2507.20078v1

Authors (3): Adelaide Danilov, Aria Nourbakhsh, Christoph Schommer

Recent pre-trained transformer models achieve superior performance in various code processing objectives. However, although effective at optimizing decision boundaries, common approaches for fine-tuning them for downstream classification tasks - distance-based methods or training an additional classification head - often fail to thoroughly structure the embedding space to reflect nuanced intra-class semantic relationships. Equivalent code mutant detection is one of these tasks, where the quality of the embedding space is crucial to the performance of the models. We introduce a novel framework that integrates cross-entropy loss with a deep metric learning objective, termed Cluster Purge Loss. This objective, unlike conventional approaches, concentrates on adjusting fine-grained differences within each class, encouraging the separation of instances based on semantical equivalency to the class center using dynamically adjusted borders. Employing UniXCoder as the base model, our approach demonstrates state-of-the-art performance in the domain of equivalent mutant detection and produces a more interpretable embedding space.

Article 769

Title@2025-07-26 (6): Sparse Equation Matching: A Derivative-Free Learning for General-Order Dynamical Systems

Title: Sparse Equation Matching: A Derivative-Free Learning for General-Order Dynamical Systems

Sparse Equation Matching: Ein Derivativ-freies Lernen für allgemein geordnete dynamische Systeme

分布分布式配对:通用平极动态系统无衍生性无损学习 2507.20072v1

Authors (3): Jiaqiang Li, Jianbin Tan, Xueqin Wang

Equation discovery is a fundamental learning task for uncovering the underlying dynamics of complex systems, with wide-ranging applications in areas such as brain connectivity analysis, climate modeling, gene regulation, and physical system simulation. However, many existing approaches rely on accurate derivative estimation and are limited to first-order dynamical systems, restricting their applicability to real-world scenarios. In this work, we propose sparse equation matching (SEM), a unified framework that encompasses several existing equation discovery methods under a common formulation. SEM introduces an integral-based sparse regression method using Green’s functions, enabling derivative-free estimation of differential operators and their associated driving functions in general-order dynamical systems. The effectiveness of SEM is demonstrated through extensive simulations, benchmarking its performance against derivative-based approaches. We then apply SEM to electroencephalographic (EEG) data recorded during multiple oculomotor tasks, collected from 52 participants in a brain-computer interface experiment. Our method identifies active brain regions across participants and reveals task-specific connectivity patterns. These findings offer valuable insights into brain connectivity and the underlying neural mechanisms.

Article 770

Title@2025-07-26 (6): Multi-Person Interaction Generation from Two-Person Motion Priors

Title: Multi-Person Interaction Generation from Two-Person Motion Priors

Multi-Personen-Interaktionsgenerierung von Zwei-Personen-Motion-Prioren

从两人先前动议中产生多人相互影响 2505.17860v2

Authors (4): Wenning Xu, Shiyu Fan, Paul Henderson, Edmond S. L. Ho

Generating realistic human motion with high-level controls is a crucial task for social understanding, robotics, and animation. With high-quality MOCAP data becoming more available recently, a wide range of data-driven approaches have been presented. However, modelling multi-person interactions still remains a less explored area. In this paper, we present Graph-driven Interaction Sampling, a method that can generate realistic and diverse multi-person interactions by leveraging existing two-person motion diffusion models as motion priors. Instead of training a new model specific to multi-person interaction synthesis, our key insight is to spatially and temporally separate complex multi-person interactions into a graph structure of two-person interactions, which we name the Pairwise Interaction Graph. We thus decompose the generation task into simultaneous single-person motion generation conditioned on one other’s motion. In addition, to reduce artifacts such as interpenetrations of body parts in generated multi-person interactions, we introduce two graph-dependent guidance terms into the diffusion sampling scheme. Unlike previous work, our method can produce various high-quality multi-person interactions without having repetitive individual motions. Extensive experiments demonstrate that our approach consistently outperforms existing methods in reducing artifacts when generating a wide range of two-person and multi-person interactions.

Article 771

Title@2025-07-26 (6): PERRY: Policy Evaluation with Confidence Intervals using Auxiliary Data

Title: PERRY: Policy Evaluation with Confidence Intervals using Auxiliary Data

PERRY: Politikevaluierung mit Vertrauensintervallen unter Verwendung von Zusatzdaten

使用辅助数据进行具有互信性的政策评价 2507.20068v1

Authors (7): Aishwarya Mandyam, Jason Meng, Ge Gao, Jiankai Sun, Mac Schwager, Barbara E. Engelhardt, Emma Brunskill

Off-policy evaluation (OPE) methods aim to estimate the value of a new reinforcement learning (RL) policy prior to deployment. Recent advances have shown that leveraging auxiliary datasets, such as those synthesized by generative models, can improve the accuracy of these value estimates. Unfortunately, such auxiliary datasets may also be biased, and existing methods for using data augmentation for OPE in RL lack principled uncertainty quantification. In high-stakes settings like healthcare, reliable uncertainty estimates are important for comparing policy value estimates. In this work, we propose two approaches to construct valid confidence intervals for OPE when using data augmentation. The first provides a confidence interval over the policy performance conditioned on a particular initial state, $V^{\pi}(s_0)$; such intervals are particularly important for human-centered applications. To do so, we introduce a new conformal prediction method for high-dimensional state MDPs. Second, we consider the more common task of estimating the average policy performance over many initial states; to do so, we draw on ideas from doubly robust estimation and prediction-powered inference. Across simulators spanning robotics, healthcare and inventory management, and a real healthcare dataset from MIMIC-IV, we find that our methods can use augmented data and still consistently produce intervals that cover the ground truth values, unlike previously proposed methods.

Article 772

Title@2025-07-26 (6): PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training

Title: PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training

PITA: Präferenz-geführte Inferenz-Zeit-Ausrichtung für LLM nach dem Training

PITA:LLM培训后培训的优先指导推论-时间协调 2507.20067v1

Authors (4): Sarat Chandra Bobbili, Ujwal Dinesha, Dheeraj Narasimha, Srinivas Shakkottai

Inference-time alignment enables large language models (LLMs) to generate outputs aligned with end-user preferences without further training. Recent post-training methods achieve this by using small guidance models to modify token generation during inference. These methods typically optimize a reward function KL-regularized by the original LLM taken as the reference policy. A critical limitation, however, is their dependence on a pre-trained reward model, which requires fitting to human preference feedback, a potentially unstable process. In contrast, we introduce PITA, a novel framework that integrates preference feedback directly into the LLM’s token generation, eliminating the need for a reward model. PITA learns a small preference-based guidance policy to modify token probabilities at inference time without LLM fine-tuning, reducing computational cost and bypassing the pre-trained reward model dependency. The problem is framed as identifying an underlying preference distribution, solved through stochastic search and iterative refinement of the preference-based guidance model. We evaluate PITA across diverse tasks, including mathematical reasoning and sentiment classification, demonstrating its effectiveness in aligning LLM outputs with user preferences.
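
For readers unfamiliar with the KL-regularized objective mentioned above, its standard textbook form is written out below. The symbols $r$ (reward), $\beta$ (regularization strength), $\pi_{\mathrm{ref}}$ (the original LLM as reference policy), and $\mathcal{D}$ (prompt distribution) are notation assumed here, not taken from the abstract.

```latex
\[
\max_{\pi}\;
\mathbb{E}_{x \sim \mathcal{D}}
\Big[
  \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big]
  \;-\; \beta\, \mathrm{KL}\big( \pi(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)
\Big]
\]
```

The reward term $r$ is exactly the component that normally comes from a pre-trained reward model, which is the dependency PITA is described as removing.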

Article 773

Title@2025-07-26 (6): Geometric Operator Learning with Optimal Transport

Title: Geometric Operator Learning with Optimal Transport

Geometrisches Bedienerlernen mit optimalem Verkehr

以最佳运输方式学习几何操作员 2507.20065v1

Authors (4): Xinyi Li, Zongyi Li, Nikola Kovachki, Anima Anandkumar

We propose integrating optimal transport (OT) into operator learning for partial differential equations (PDEs) on complex geometries. Classical geometric learning methods typically represent domains as meshes, graphs, or point clouds. Our approach generalizes discretized meshes to mesh density functions, formulating geometry embedding as an OT problem that maps these functions to a uniform density in a reference space. Compared to previous methods relying on interpolation or shared deformation, our OT-based method employs instance-dependent deformation, offering enhanced flexibility and effectiveness. For 3D simulations focused on surfaces, our OT-based neural operator embeds the surface geometry into a 2D parameterized latent space. By performing computations directly on this 2D representation of the surface manifold, it achieves significant computational efficiency gains compared to volumetric simulation. Experiments with Reynolds-averaged Navier-Stokes equations (RANS) on the ShapeNet-Car and DrivAerNet-Car datasets show that our method achieves better accuracy and also reduces computational expenses in terms of both time and memory usage compared to existing machine learning models. Additionally, our model demonstrates significantly improved accuracy on the FlowBench dataset, underscoring the benefits of employing instance-dependent deformation for datasets with highly variable geometries.

Article 774

Title@2025-07-26 (6): Strategic Filtering for Content Moderation: Free Speech or Free of Distortion?

Title: Strategic Filtering for Content Moderation: Free Speech or Free of Distortion?

Strategisches Filtern für Content Moderation: Freie Sprache oder frei von Verzerrung?

内容调节的战略过滤: 言论自由还是无扭曲? 2507.20061v1

Authors (4): Saba Ahmadi, Avrim Blum, Haifeng Xu, Fan Yao

User-generated content (UGC) on social media platforms is vulnerable to incitements and manipulations, necessitating effective regulations. To address these challenges, those platforms often deploy automated content moderators tasked with evaluating the harmfulness of UGC and filtering out content that violates established guidelines. However, such moderation inevitably gives rise to strategic responses from users, who strive to express themselves within the confines of guidelines. Such phenomena call for a careful balance between: 1. ensuring freedom of speech – by minimizing the restriction of expression; and 2. reducing social distortion – measured by the total amount of content manipulation. We tackle the problem of optimizing this balance through the lens of mechanism design, aiming at optimizing the trade-off between minimizing social distortion and maximizing free speech. Although determining the optimal trade-off is NP-hard, we propose practical methods to approximate the optimal solution. Additionally, we provide generalization guarantees determining the amount of finite offline data required to approximate the optimal moderator effectively.

Article 775

Title@2025-07-26 (6): ModShift: Model Privacy via Designed Shifts

Title: ModShift: Model Privacy via Designed Shifts

ModShift: Model Privacy über Designed Shifts

ModShifft: 通过设计变换实现的模型隐私 2507.20060v1

Authors (2): Nomaan A. Kherani, Urbashi Mitra

In this paper, shifts are introduced to preserve model privacy against an eavesdropper in federated learning. Model learning is treated as a parameter estimation problem. This perspective allows us to derive the Fisher Information matrix of the model updates from the shifted updates and drive it to singularity, thus posing a hard estimation problem for Eve. The shifts are securely shared with the central server to maintain model accuracy at the server and participating devices. A convergence test is proposed to detect whether model updates have been tampered with, and we show that our scheme passes this test. Numerical results show that our scheme achieves a higher model shift than a noise injection scheme while requiring a lower-bandwidth secret channel.

Article 776

Title@2025-07-26 (6): RAG in the Wild: On the (In)effectiveness of LLMs with Mixture-of-Knowledge Retrieval Augmentation

Title: RAG in the Wild: On the (In)effectiveness of LLMs with Mixture-of-Knowledge Retrieval Augmentation

RAG in the Wild: Über die (In)Wirksamkeit von LLMs mit Mixture-of-Knowledge Retrieval Augmentation

野生ROG:关于利用混合知识回收增加的LLMs(内)效力 2507.20059v1

Authors (6): Ran Xu, Yuchen Zhuang, Yue Yu, Haoyu Wang, Wenqi Shi, Carl Yang

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieved at inference time. While RAG demonstrates strong performance on benchmarks largely derived from general-domain corpora like Wikipedia, its effectiveness under realistic, diverse retrieval scenarios remains underexplored. We evaluated RAG systems using MassiveDS, a large-scale datastore with a mixture of knowledge, and identified critical limitations: retrieval mainly benefits smaller models, rerankers add minimal value, and no single retrieval source consistently excels. Moreover, current LLMs struggle to route queries across heterogeneous knowledge sources. These findings highlight the need for adaptive retrieval strategies before deploying RAG in real-world settings. Our code and data can be found at https://github.com/ritaranx/RAG_in_the_Wild.

Article 777

Title@2025-07-26 (6): Predicting Parkinson’s Disease Progression Using Statistical and Neural Mixed Effects Models: A Comparative Study on Longitudinal Biomarkers

Title: Predicting Parkinson’s Disease Progression Using Statistical and Neural Mixed Effects Models: A Comparative Study on Longitudinal Biomarkers

Vorhersage der Progression der Parkinson-Krankheit anhand statistischer und neuraler Mixed Effects-Modelle: Eine vergleichende Studie über Längsschnittbiomarker

利用统计和神经混合效应模型预测帕金森氏疾病进展:纵向生物标记的比较研究 2507.20058v1

Authors (4): Ran Tong, Lanruo Wang, Tong Wang, Wei Yan

Predicting Parkinson’s Disease (PD) progression is crucial, and voice biomarkers offer a non-invasive method for tracking symptom severity (UPDRS scores) through telemonitoring. Analyzing this longitudinal data is challenging due to within-subject correlations and complex, nonlinear patient-specific progression patterns. This study benchmarks LMMs against two advanced hybrid approaches: the Generalized Neural Network Mixed Model (GNMM) (Mandel 2021), which embeds a neural network within a GLMM structure, and the Neural Mixed Effects (NME) model (Wortwein 2023), allowing nonlinear subject-specific parameters throughout the network. Using the Oxford Parkinson’s telemonitoring voice dataset, we evaluate these models’ performance in predicting Total UPDRS to offer practical guidance for PD research and clinical applications.

Article 778

Title@2025-07-26 (6): What Can Grokking Teach Us About Learning Under Nonstationarity?

Title: What Can Grokking Teach Us About Learning Under Nonstationarity?

Was kann Grokking uns über das Lernen unter Nonstationarität lehren?

格罗金能教我们什么如何在不固定状态下学习? 2507.20057v1

Authors (4): Clare Lyle, Gharda Sokar, Razvan Pascanu, Andras Gyorgy

In continual learning problems, it is often necessary to overwrite components of a neural network’s learned representation in response to changes in the data stream; however, neural networks often exhibit primacy bias, whereby early training data hinders the network’s ability to generalize on later tasks. While feature-learning dynamics of nonstationary learning problems are not well studied, the emergence of feature-learning dynamics is known to drive the phenomenon of grokking, wherein neural networks initially memorize their training data and only later exhibit perfect generalization. This work conjectures that the same feature-learning dynamics which facilitate generalization in grokking also underlie the ability to overwrite previously learned features, and that methods which accelerate grokking by facilitating feature-learning dynamics are promising candidates for addressing primacy bias in non-stationary learning problems. We then propose a straightforward method to induce feature-learning dynamics as needed throughout training by increasing the effective learning rate, i.e., the ratio between parameter and update norms. We show that this approach both facilitates feature-learning and improves generalization in a variety of settings, including grokking, warm-starting neural network training, and reinforcement learning tasks.
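As a rough illustration of the effective learning rate mentioned in this abstract, the sketch below computes a ratio of update norm to parameter norm for a flat parameter vector. The abstract does not give the exact formula, so the direction of the ratio and the name `effective_learning_rate` are assumptions made here, not the authors' definition.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def effective_learning_rate(params, update):
    """Assumed definition: ||update|| / ||params|| (hypothetical helper)."""
    param_norm = l2_norm(params)
    if param_norm == 0.0:
        raise ValueError("parameter norm is zero; the ratio is undefined")
    return l2_norm(update) / param_norm


params = [3.0, 4.0]   # norm 5.0
update = [0.3, 0.4]   # norm 0.5
ratio = effective_learning_rate(params, update)

# Shrinking the parameters while keeping the update fixed raises the ratio.
smaller_params = [p * 0.5 for p in params]
ratio_small = effective_learning_rate(smaller_params, update)
```

Under this assumed definition, the ratio can be raised either by enlarging the update or by shrinking the parameter norm; which of the two the paper uses is not stated in the abstract.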

Article 779

Title@2025-07-26 (6): Improving Deep Learning-based Respiratory Sound Analysis with Frequency Selection and Attention Mechanism

Title: Improving Deep Learning-based Respiratory Sound Analysis with Frequency Selection and Attention Mechanism

Verbesserung der Deep Learning-basierten Atemschallanalyse mit Frequenzauswahl und Aufmerksamkeitsmechanismus

利用频率选择和注意机制改进基于深学习的呼吸系统无害分析 2507.20052v1

Authors (3): Nouhaila Fraihi, Ouassim Karrakchou, Mounir Ghogho

Accurate classification of respiratory sounds requires deep learning models that effectively capture fine-grained acoustic features and long-range temporal dependencies. Convolutional Neural Networks (CNNs) are well-suited for extracting local time-frequency patterns but are limited in modeling global context. In contrast, transformer-based models can capture long-range dependencies, albeit with higher computational demands. To address these limitations, we propose a compact CNN-Temporal Self-Attention (CNN-TSA) network that integrates lightweight self-attention into an efficient CNN backbone. Central to our approach is a Frequency Band Selection (FBS) module that suppresses noisy and non-informative frequency regions, substantially improving accuracy and reducing FLOPs by up to 50%. We also introduce age-specific models to enhance robustness across diverse patient groups. Evaluated on the SPRSound-2022/2023 and ICBHI-2017 lung sound datasets, CNN-TSA with FBS sets new benchmarks on SPRSound and achieves state-of-the-art performance on ICBHI, all with a significantly smaller computational footprint. Furthermore, integrating FBS into an existing transformer baseline yields a new record on ICBHI, confirming FBS as an effective drop-in enhancement. These results demonstrate that our framework enables reliable, real-time respiratory sound analysis suitable for deployment in resource-constrained settings.

Article 780

Title@2025-07-26 (6): $K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning

Title: $K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning

$K^4$: Online Log Anomalienerkennung durch unüberwachtes Lernen

$K^4$:基于无监督典型性学习的在线日志异常检测 2507.20051v1

Authors (6): Weicong Chen, Vikash Singh, Zahra Rahmani, Debargha Ganguly, Mohsen Hariri, Vipin Chaudhary

Existing Log Anomaly Detection (LogAD) methods are often slow, dependent on error-prone parsing, and use unrealistic evaluation protocols. We introduce $K^4$, an unsupervised and parser-independent framework for high-performance online detection. $K^4$ transforms arbitrary log embeddings into compact four-dimensional descriptors (Precision, Recall, Density, Coverage) using efficient k-nearest neighbor (k-NN) statistics. These descriptors enable lightweight detectors to accurately score anomalies without retraining. Using a more realistic online evaluation protocol, $K^4$ sets a new state-of-the-art (AUROC: 0.995-0.999), outperforming baselines by large margins while being orders of magnitude faster, with training under 4 seconds and inference as low as 4 $\mu$s.

Article 781

Title@2025-07-26 (6): Irredundant $k$-Fold Cross-Validation

Title: Irredundant $k$-Fold Cross-Validation

Irredundant $k$-Fold Cross-Validierung

溢余美元-折价交叉估价 2507.20048v1

Authors (1): Jesus S. Aguilar-Ruiz

In traditional k-fold cross-validation, each instance is used ($k-1$) times for training and once for testing, leading to redundancy that lets many instances disproportionately influence the learning phase. We introduce Irredundant $k$-fold cross-validation, a novel method that guarantees each instance is used exactly once for training and once for testing across the entire validation procedure. This approach ensures a more balanced utilization of the dataset, mitigates overfitting due to instance repetition, and enables sharper distinctions in comparative model analysis. The method preserves stratification and remains model-agnostic, i.e., compatible with any classifier. Experimental results demonstrate that it delivers consistent performance estimates across diverse datasets – comparable to $k$-fold cross-validation – while providing less optimistic variance estimates because training partitions are non-overlapping, and significantly reducing the overall computational cost.
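The property stated in this abstract (each instance used exactly once for training and once for testing) can be sketched as follows. The pairing rule used here, training on fold i and testing on fold (i + 1) mod k, is an assumption chosen because it satisfies that property; the abstract does not say how the paper pairs folds, and this sketch omits the stratification the paper preserves. The function name `irredundant_folds` is hypothetical.

```python
import random


def irredundant_folds(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs.

    Assumed pairing: train on fold i, test on fold (i + 1) % k, so every
    index appears in exactly one training set and one test set.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into k disjoint folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        yield folds[i], folds[(i + 1) % k]


splits = list(irredundant_folds(10, k=5))

# Count how often each instance is used for training and for testing.
train_counts = [0] * 10
test_counts = [0] * 10
for train_idx, test_idx in splits:
    for j in train_idx:
        train_counts[j] += 1
    for j in test_idx:
        test_counts[j] += 1
```

Each model in this sketch trains on only one fold, which is consistent with the reduced computational cost the abstract reports.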

Article 782

Title@2025-07-26 (6): Improving Audio Classification by Transitioning from Zero- to Few-Shot

Title: Improving Audio Classification by Transitioning from Zero- to Few-Shot

Verbesserung der Audioklassifikation durch Übergang von Null- auf Wenig-Schuss

通过从零转向少热,改进音频分类 2507.20036v1

Authors (2): James Taylor, Wolfgang Mack

State-of-the-art audio classification often employs a zero-shot approach, which involves comparing audio embeddings with embeddings from text describing the respective audio class. These embeddings are usually generated by neural networks trained through contrastive learning to align audio and text representations. Identifying the optimal text description for an audio class is challenging, particularly when the class comprises a wide variety of sounds. This paper examines few-shot methods designed to improve classification accuracy beyond the zero-shot approach. Specifically, audio embeddings are grouped by class and processed to replace the inherently noisy text embeddings. Our results demonstrate that few-shot classification typically outperforms the zero-shot baseline.
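A minimal sketch of the few-shot idea described above: audio embeddings are grouped by class and reduced to one reference vector per class, which then replaces the text embedding in the similarity comparison. The abstract only says the grouped embeddings are "processed"; averaging them into a prototype and scoring with cosine similarity are assumptions made here, and the toy two-dimensional embeddings and class names are invented for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_prototypes(support):
    """Map each class label to the mean of its few-shot audio embeddings."""
    return {label: mean_vector(embs) for label, embs in support.items()}


def classify(embedding, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(prototypes, key=lambda label: cosine(embedding, prototypes[label]))


# Toy support set: two labelled audio embeddings per class (made-up values).
support = {
    "dog_bark": [[0.9, 0.1], [0.8, 0.2]],
    "siren": [[0.1, 0.9], [0.2, 0.7]],
}
prototypes = build_prototypes(support)
prediction = classify([0.7, 0.3], prototypes)
```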

Article 783

Title@2025-07-26 (6): Selective Prompt Anchoring for Code Generation

Title: Selective Prompt Anchoring for Code Generation

Selektive Prompt-Ankerung für die Code-Generierung

用于代码生成的选择性提示锚定 2408.09121v6

Authors (2): Yuan Tian, Tianyi Zhang

Recent advances in large language models (LLMs) have transformed software development by automatically generating code from natural language. Yet challenges remain in generating fully correct code that aligns with user intent. Our study reveals that LLMs tend to pay less attention to user prompts as more code tokens are generated. We hypothesize that this attention dilution issue is an important reason for code generation errors. To mitigate this issue, we propose Selective Prompt Anchoring (SPA) to guide code LLMs to pay more attention to user intent when generating code. We evaluate SPA using six base LLMs across six benchmarks. Our results demonstrate that SPA enhances Pass@1 by up to 12.9%, consistently outperforming SOTA code generation methods in all settings. Our code is available at https://github.com/magic-YuanTian/Selective-Prompt-Anchoring.

Article 784

Title@2025-07-26 (6): Preference learning made easy: Everything should be understood through win rate

Title: Preference learning made easy: Everything should be understood through win rate

Vorliebe Lernen leicht gemacht: Alles sollte durch Win-Rate verstanden werden

首选学习容易:人人都应通过双赢率来理解一切 2502.10505v2

Authors (2): Lily H. Zhang, Rajesh Ranganath

Preference learning, or the task of aligning generative models to preference comparison data, has yet to reach the conceptual maturity of classification, density estimation, etc. To close this gap, this work presents a framework to understand preference learning starting from the sampling distribution of pairwise preference data. First, we prove that the only evaluation of a generative model that respects both preferences and prevalences in the data distribution is a form of win rate, justifying win rate as the focal point to understand preference learning. We then analyze preference learning methods as win rate optimization (WRO) or non-WRO. We present novel instances of WRO beyond existing examples (RLHF, NLHF) and identify two key theoretical benefits of all such methods. We prove that common non-WRO methods like DPO and SFT on preferred samples lack these properties and suggest ways to mitigate such theoretical limitations. We also show that WRO underperforms in practice due to optimization difficulties and that optimization success predicts performance better than choices which affect the objective’s solution. Our analysis highlights best practices for existing methods and provides recommendations for future research, guided by the principle that one should either align non-WRO methods more closely with WRO or improve the optimization of WRO objectives.
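To make the notion of win rate concrete, the sketch below estimates it as the average probability that a sample from one model is preferred over a sample from a rival. This is a generic empirical estimate under assumed inputs; the paper's formal definition is not given in the abstract, and the helper names and the toy length-based preference function are hypothetical.

```python
def win_rate(model_samples, rival_samples, prefer_prob):
    """Average preference probability over all (model, rival) sample pairs."""
    total = 0.0
    for a in model_samples:
        for b in rival_samples:
            total += prefer_prob(a, b)
    return total / (len(model_samples) * len(rival_samples))


def longer_is_better(a, b):
    """Toy preference: probability that `a` beats `b`, based only on length."""
    if len(a) == len(b):
        return 0.5
    return 1.0 if len(a) > len(b) else 0.0


# Two samples from each model (made-up strings).
model_samples = ["aaa", "a"]
rival_samples = ["aa", "aa"]
rate = win_rate(model_samples, rival_samples, longer_is_better)
```

In practice the preference function would come from human comparisons or a learned preference model rather than a hand-written rule.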

Article 785

Title@2025-07-26 (6): Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization

Title: Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization

Machine-Learning-Assisted Photonic Device Development: Ein multiskaliger Ansatz von der Theorie zur Charakterisierung

机学辅助光学设备开发:从理论到定性的多尺度方法 2506.20056v2

Authors (19): Yuheng Chen, Alexander Montes McNeil, Taehyuk Park, Blake A. Wilson, Vaishnavi Iyer, Michael Bezick, Jae-Ik Choi, Rohan Ojha, Pravin Mahendran, Daksh Kumar Singh, Geetika Chitturi, Peigang Chen, Trang Do, Alexander V. Kildishev, Vladimir M. Shalaev, Michael Moebius, Wenshan Cai, Yongmin Liu, Alexandra Boltasseva

Photonic device development (PDD) has achieved remarkable success in designing and implementing new devices for controlling light across various wavelengths, scales, and applications, including telecommunications, imaging, sensing, and quantum information processing. PDD is an iterative, five-step process that consists of: i) deriving device behavior from design parameters, ii) simulating device performance, iii) finding the optimal candidate designs from simulations, iv) fabricating the optimal device, and v) measuring device performance. Classically, all these steps involve Bayesian optimization, material science, control theory, and direct physics-driven numerical methods. However, many of these techniques are computationally intractable, monetarily costly, or difficult to implement at scale. In addition, PDD suffers from large optimization landscapes, uncertainties in structural or optical characterization, and difficulties in implementing robust fabrication processes. However, the advent of machine learning over the past decade has provided novel, data-driven strategies for tackling these challenges, including surrogate estimators for speeding up computations, generative modeling for noisy measurement modeling and data augmentation, reinforcement learning for fabrication, and active learning for experimental physical discovery. In this review, we present a comprehensive perspective on these methods to enable machine-learning-assisted PDD (ML-PDD) for efficient design optimization with powerful generative models, fast simulation and characterization modeling under noisy measurements, and reinforcement learning for fabrication. This review will provide researchers from diverse backgrounds with valuable insights into this emerging topic, fostering interdisciplinary efforts to accelerate the development of complex photonic devices and systems.

Article 786

Title: When Engineering Outruns Intelligence: A Re-evaluation of Instruction-Guided Navigation

Wenn Engineering Outruns Intelligenz: Eine Neubewertung der instruction-guided Navigation

When Engineering Outruns Intelligence:重新评价指示引导导航 2507.20021v1

Authors (4): Matin Aghaei, Mohammad Ali Alomrani, Yingxue Zhang, Mahdi Biparva

Large language models (LLMs) are often credited with recent leaps in ObjectGoal Navigation, yet the extent to which they improve planning remains unclear. We revisit this question on the HM3D-v1 validation split. First, we strip InstructNav of its Dynamic Chain-of-Navigation prompt, open-vocabulary GLEE detector and Intuition saliency map, and replace them with a simple Distance-Weighted Frontier Explorer (DWFE). This geometry-only heuristic raises Success from 58.0% to 61.1% and lifts SPL from 20.9% to 36.0% over 2,000 validation episodes, outperforming all previous training-free baselines. Second, we add a lightweight language prior (SHF); on a 200-episode subset this yields a further +2% Success and +0.9% SPL while shortening paths by five steps on average. Qualitative trajectories confirm the trend: InstructNav backtracks and times out, DWFE reaches the goal after a few islands, and SHF follows an almost straight route. Our results indicate that frontier geometry, not emergent LLM reasoning, drives most reported gains, and suggest that metric-aware prompts or offline semantic graphs are necessary before attributing navigation success to “LLM intelligence.”

Article 787

Title@2025-07-26 (6): Conformal Safety Shielding for Imperfect-Perception Agents

Title: Conformal Safety Shielding for Imperfect-Perception Agents

Konforme Sicherheitsabschirmung für Imperfect-Perception Agents

为不合格感化物剂提供正规安全防护 2506.17275v2

Authors (7): William Scarbro, Calum Imrie, Sinem Getir Yaman, Kavan Fatehi, Corina S. Pasareanu, Radu Calinescu, Ravi Mangal

We consider the problem of safe control in discrete autonomous agents that use learned components for imperfect perception (or more generally, state estimation) from high-dimensional observations. We propose a shield construction that provides run-time safety guarantees under perception errors by restricting the actions available to an agent, modeled as a Markov decision process, as a function of the state estimates. Our construction uses conformal prediction for the perception component, which guarantees that for each observation, the predicted set of estimates includes the actual state with a user-specified probability. The shield allows an action only if it is allowed for all the estimates in the predicted set, resulting in local safety. We also articulate and prove a global safety property of existing shield constructions for perfect-perception agents bounding the probability of reaching unsafe states if the agent always chooses actions prescribed by the shield. We illustrate our approach with a case-study of an experimental autonomous system that guides airplanes on taxiways using high-dimensional perception DNNs.
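The shield rule described above (an action is permitted only if it is allowed for every state estimate in the conformal prediction set) amounts to a set intersection. The sketch below assumes a lookup table of safe actions per state; the state and action names are invented for illustration and do not come from the paper's case study.

```python
def shielded_actions(predicted_states, safe_actions):
    """Return the actions allowed in every state of the prediction set.

    States missing from `safe_actions` are treated as allowing nothing,
    and an empty prediction set yields no allowed actions.
    """
    allowed = None
    for state in predicted_states:
        actions = set(safe_actions.get(state, set()))
        allowed = actions if allowed is None else allowed & actions
    return allowed if allowed is not None else set()


# Hypothetical per-state safe actions for a taxiing scenario.
safe_actions = {
    "centered": {"forward", "left", "right"},
    "near_left_edge": {"forward", "right"},
    "near_right_edge": {"forward", "left"},
}

# Prediction set produced by the perception component for one observation.
prediction_set = {"centered", "near_left_edge"}
allowed = shielded_actions(prediction_set, safe_actions)
```

A larger prediction set (higher user-specified coverage) can only shrink the set of permitted actions, which is the trade-off between safety and permissiveness implied by the construction.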

Article 788

Title@2025-07-26 (6): Shape Invariant 3D-Variational Autoencoder: Super Resolution in Turbulence flow

Title: Shape Invariant 3D-Variational Autoencoder: Super Resolution in Turbulence flow

Shape Invariant 3D-Variational Autoencoder: Super Auflösung im Turbulenzfluss

形状 3D - 变化式自动编码器: 波动流中的超级分辨率 2507.22082v1

Authors (1): Anuraj Maurya

Deep learning provides a versatile suite of methods for extracting structured information from complex datasets, enabling deeper understanding of underlying fluid dynamic phenomena. The field of turbulence modeling, in particular, benefits from the growing availability of high-dimensional data obtained through experiments, field observations, and large-scale simulations spanning multiple spatio-temporal scales. This report presents a concise overview of both classical and deep learning-based approaches to turbulence modeling. It further investigates two specific challenges at the intersection of fluid dynamics and machine learning: the integration of multiscale turbulence models with deep learning architectures, and the application of deep generative models for super-resolution reconstruction.

Article 789

Title@2025-07-26 (6): A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

Title: A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

Eine Praxis des Post-Trainings auf Llama-3 70B mit optimaler Auswahl des zusätzlichen Sprachmischverhältnisses

Llama-3-70B培训后做法,最佳选择其他语言混合比率 2409.06624v2

Authors (6): Ningyuan Xi, Yetao Wu, Kun Fan, Teng Chen, Qingqing Gu, Luo Ji

Large Language Models (LLMs) often need to be Continual Pre-Trained (CPT) to acquire unfamiliar language skills or adapt to new domains. The huge training cost of CPT calls for a careful choice of key hyper-parameters such as the mixture ratio of the extra language or domain corpus. However, there is no systematic study that bridges the gap between the optimal mixture ratio and the actual model performance, or the gap between experimental scaling laws and actual deployment at full model size. In this paper, we perform CPT on Llama-3 8B and 70B to enhance their Chinese ability. We study the optimal correlation between the Additional Language Mixture Ratio (ALMR) and the Learning Rate (LR) at the 8B size, which directly indicates the optimal experimental setup. With careful hyper-parameter selection and subsequent fine-tuning, model capability improves not only on Chinese-related benchmarks but also in specific domains including math, coding, and emotional intelligence. We deploy the final 70B model in a real-life chat system, where it achieves satisfactory performance.

Article 790

Title@2025-07-26 (6): FedSWA: Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging

Title: FedSWA: Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging

FedSWA: Verbesserung der Generalisierung im Federated Learning mit hoch Heterogenen Daten über Momentum-basierte stochastische kontrollierte Gewichtsverringerung

FedSWA:通过基于动力的存储器控控湿率提高具有高度异异变数据的联邦学习普及程度 2507.20016v1

Authors (6): Liu junkang, Yuanyuan Liu, Fanhua Shang, Hongying Liu, Jin Liu, Wei Feng

For federated learning (FL) algorithms such as FedSAM, generalization capability is crucial for real-world applications. In this paper, we revisit the generalization problem in FL and investigate the impact of data heterogeneity on FL generalization. We find that FedSAM usually performs worse than FedAvg in the case of highly heterogeneous data, and thus propose a novel and effective federated learning algorithm with Stochastic Weight Averaging (called FedSWA), which aims to find flatter minima in the setting of highly heterogeneous data. Moreover, we introduce a new momentum-based stochastic controlled weight averaging FL algorithm (FedMoSWA), which is designed to better align local and global models. Theoretically, we provide both convergence analysis and generalization bounds for FedSWA and FedMoSWA. We also prove that the optimization and generalization errors of FedMoSWA are smaller than those of their counterparts, including FedSAM and its variants. Empirically, experimental results on CIFAR10/100 and Tiny ImageNet demonstrate the superiority of the proposed algorithms compared to their counterparts. Open source code at: https://github.com/junkangLiu0/FedSWA.

Article 791

Title@2025-07-26 (6): PaRCE: Probabilistic and Reconstruction-based Competency Estimation for CNN-based Image Classification

Title: PaRCE: Probabilistic and Reconstruction-based Competency Estimation for CNN-based Image Classification

PaRCE: Probabilistische und rekonstruktionsbasierte Kompetenzschätzung für CNN-basierte Bildklassifikation

PaRCE:有线电视新闻网图像分类的概率和基于重建的能力估计 2411.16715v3

Authors (2): Sara Pohland, Claire Tomlin

Convolutional neural networks (CNNs) are extremely popular and effective for image classification tasks but tend to be overly confident in their predictions. Various works have sought to quantify uncertainty associated with these models, detect out-of-distribution (OOD) inputs, or identify anomalous regions in an image, but limited work has sought to develop a holistic approach that can accurately estimate perception model confidence across various sources of uncertainty. We develop a probabilistic and reconstruction-based competency estimation (PaRCE) method and compare it to existing approaches for uncertainty quantification and OOD detection. We find that our method can best distinguish between correctly classified, misclassified, and OOD samples with anomalous regions, as well as between samples with visual image modifications resulting in high, medium, and low prediction accuracy. We describe how to extend our approach for anomaly localization tasks and demonstrate the ability of our approach to distinguish between regions in an image that are familiar to the perception model from those that are unfamiliar. We find that our method generates interpretable scores that most reliably capture a holistic notion of perception model confidence.

Article 792

Title@2025-07-26 (6): Robust Taxi Fare Prediction Under Noisy Conditions: A Comparative Study of GAT, TimesNet, and XGBoost

Title: Robust Taxi Fare Prediction Under Noisy Conditions: A Comparative Study of GAT, TimesNet, and XGBoost

Robuste Taxi-Fare-Prognose unter Lärmbedingungen: Eine vergleichende Studie von GAT, TimesNet und XGBoost

噪音条件下的强劲的出租车票价预测:GAT比较研究,TimesNet和XGBoost 2507.20008v1

Authors (1): Padmavathi Moorthy

Precise fare prediction is crucial in ride-hailing platforms and urban mobility systems. This study examines three machine learning models (Graph Attention Networks (GAT), XGBoost, and TimesNet) to evaluate their predictive capabilities for taxi fares using a real-world dataset comprising over 55 million records. Both raw (noisy) and denoised versions of the dataset are analyzed to assess the impact of data quality on model performance. The study evaluates the models along multiple axes, including predictive accuracy, calibration, uncertainty estimation, out-of-distribution (OOD) robustness, and feature sensitivity. We also explore pre-processing strategies, including KNN imputation, Gaussian noise injection, and autoencoder-based denoising. The study reveals critical differences between classical and deep learning models under realistic conditions, offering practical guidelines for building robust and scalable models in urban fare prediction systems.

Article 793

Title: HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning

HeLo: Heterogene Multi-Modal Fusion mit Labelkorrelation für Emotion Distribution Learning

HeLo:情感分布学习中带有标签关联的异变多模式融合 2507.06821v3

Authors (5): Chuhang Zheng, Chunwei Tian, Jie Wen, Daoqiang Zhang, Qi Zhu

Multi-modal emotion recognition has garnered increasing attention in recent years because it plays a significant role in human-computer interaction (HCI). Since different discrete emotions may exist at the same time, emotion distribution learning (EDL), which identifies a mixture of basic emotions, has gradually emerged as a trend over single-class emotion recognition. However, existing EDL methods face challenges in mining the heterogeneity among multiple modalities. Besides, rich semantic correlations across arbitrary basic emotions are not fully exploited. In this paper, we propose a multi-modal emotion distribution learning framework, named HeLo, aimed at fully exploring the heterogeneity and complementary information in multi-modal emotional data and label correlation within mixed basic emotions. Specifically, we first adopt cross-attention to effectively fuse the physiological data. Then, an optimal transport (OT)-based heterogeneity mining module is devised to mine the interaction and heterogeneity between the physiological and behavioral representations. To facilitate label correlation learning, we introduce a learnable label embedding optimized by correlation matrix alignment. Finally, the learnable label embeddings and label correlation matrices are integrated with the multi-modal representations through a novel label correlation-driven cross-attention mechanism for accurate emotion distribution learning. Experimental results on two publicly available datasets demonstrate the superiority of our proposed method in emotion distribution learning.

Article 794

Title@2025-07-26 (6): MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning

Title: MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning

MeTHanol: Modularisiertes Denken von Sprachmodellen mit Intermediate Layer Thinking, Decodierung und Bootstrapping Reasoning

MeTHanol:含有中间层思考、解毒和诱导理由的模块化思维语言模型 2409.12059v5

Authors (10): Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Yue Zhao, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji

Current research efforts are focused on enhancing the thinking and reasoning capability of large language models (LLMs) through prompting, data-driven emergence and inference-time computation. In this study, we consider stimulating a language model’s thinking and cognitive abilities from a modular perspective, which mimics the human brain architecture. We select a specific intermediate attention layer with newly implemented language heads. We conduct dual-layer fine-tuning on annotated (query, thought, answer) samples and show that the intermediate layer can also learn to decode fluent and reasonable language tokens. A two-pass inference mechanism is designed to generate thoughts and then formal responses. The entire framework is called modularized thinking language model (MeTHanol), which can enhance an LLM’s cognitive behaviors as indicated by Theory of Mind (ToM) and Vignette-based experiments. Case studies also show that MeTHanol can plan, self-reflect, and generate human-like thoughts and answers, even on unseen and open-domain tasks. MeTHanol can also adapt to a personalized prompt and behave as the specified character. Our study holds promise for significant cognitive gains from a modular perspective. Our code, model and data are available at https://bachozean.github.io/methanol-page

Article 795

Title@2025-07-26 (6): GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning

Title: GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning

GLC++: Source-Free Universal Domain Adaptation durch Global-Local Clustering und Contrastive Affinity Learning

GLC++:通过全球-地方集束和差异性亲密学习实现无源通用域域适应 2403.14410v2

Authors (7): Sanqing Qu, Tianpei Zou, Florian Röhrbein, Cewu Lu, Guang Chen, Dacheng Tao, Changjun Jiang

Deep neural networks often exhibit sub-optimal performance under covariate and category shifts. Source-Free Domain Adaptation (SFDA) presents a promising solution to this dilemma, yet most SFDA approaches are restricted to closed-set scenarios. In this paper, we explore Source-Free Universal Domain Adaptation (SF-UniDA), aiming to accurately classify “known” data belonging to common categories and segregate them from target-private “unknown” data. We propose a novel Global and Local Clustering (GLC) technique, which comprises an adaptive one-vs-all global clustering algorithm to discern between target classes, complemented by a local k-NN clustering strategy to mitigate negative transfer. Despite its effectiveness, the inherent closed-set source architecture leads to uniform treatment of “unknown” data, impeding the identification of distinct “unknown” categories. To address this, we evolve GLC to GLC++, integrating a contrastive affinity learning strategy. We examine the superiority of GLC and GLC++ across multiple benchmarks and category shift scenarios. Remarkably, in the most challenging open-partial-set scenarios, GLC and GLC++ surpass GATE by 16.8% and 18.9% in H-score on VisDA, respectively. GLC++ enhances the novel category clustering accuracy of GLC by 4.1% in open-set scenarios on Office-Home. Furthermore, the introduced contrastive learning strategy not only enhances GLC but also significantly facilitates existing methodologies. The code is available at https://github.com/ispc-lab/GLC-plus.

Article 796

Title@2025-07-26 (6): The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Title: The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Die dunkle Seite der Kräfte: Bewertung nicht konservativer Kraftmodelle für atomistisches maschinelles Lernen

部队的黑暗面:评估非保守力量模型,以进行原子学机器学习 2412.11569v5

Authors (3): Filippo Bigi, Marcel Langer, Michele Ceriotti

The use of machine learning to estimate the energy of a group of atoms, and the forces that drive them to more stable configurations, has revolutionized the fields of computational chemistry and materials discovery. In this domain, rigorous enforcement of symmetry and conservation laws has traditionally been considered essential. For this reason, interatomic forces are usually computed as the derivatives of the potential energy, ensuring energy conservation. Several recent works have questioned this physically constrained approach, suggesting that directly predicting the forces yields a better trade-off between accuracy and computational efficiency, and that energy conservation can be learned during training. This work investigates the applicability of such non-conservative models in microscopic simulations. We identify and demonstrate several fundamental issues, from ill-defined convergence of geometry optimization to instability in various types of molecular dynamics. Given the difficulty in monitoring and correcting the lack of energy conservation, direct forces should be used with great care. We show that the best approach to exploit the acceleration they afford is to use them in conjunction with conservative forces. A model can be pre-trained efficiently on direct forces, then fine-tuned using backpropagation. At evaluation time, both force types can be used together to avoid unphysical effects while still benefitting almost entirely from the computational efficiency of direct forces.
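The conservative route described in this abstract, where forces are the negative derivative of the potential energy, can be illustrated on a one-dimensional harmonic potential. The sketch uses a central finite difference as a stand-in for the automatic differentiation a real model would use; the potential, the spring constant, and the function names are assumptions chosen for illustration only.

```python
def harmonic_energy(x, k=2.0):
    """Toy potential energy E(x) = 0.5 * k * x^2 (assumed for illustration)."""
    return 0.5 * k * x * x


def conservative_force(energy, x, h=1e-5):
    """Force as the negative derivative of the energy, F = -dE/dx.

    Uses a central finite difference in place of backpropagation.
    """
    return -(energy(x + h) - energy(x - h)) / (2.0 * h)


# Analytically, F(x) = -k * x, so F(1.5) = -3.0 for k = 2.0.
force = conservative_force(harmonic_energy, 1.5)
```

A direct (non-conservative) force model would instead predict the force as its own output, with no guarantee that it is the derivative of any energy function, which is the source of the stability issues the abstract discusses.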


Article 797

Title@2025-07-26 (6): Efficient Vocal-Conditioned Music Generation via Soft Alignment Attention and Latent Diffusion

Title: Efficient Vocal-Conditioned Music Generation via Soft Alignment Attention and Latent Diffusion

Effiziente stimmkonditionierte Musikgeneration über Soft Alignment Aufmerksamkeit und Latent Diffusion

通过软对齐关注和远程传播, 高效的Vocal有条件的音乐制作 2507.19991v1

Authors (2): Hei Shing Cheung, Boya Zhang

We present a lightweight latent diffusion model for vocal-conditioned musical accompaniment generation that addresses critical limitations in existing music AI systems. Our approach introduces a novel soft alignment attention mechanism that adaptively combines local and global temporal dependencies based on diffusion timesteps, enabling efficient capture of multi-scale musical structure. Operating in the compressed latent space of a pre-trained variational autoencoder, the model achieves a 220 times parameter reduction compared to state-of-the-art systems while delivering 52 times faster inference. Experimental evaluation demonstrates competitive performance with only 15M parameters, outperforming OpenAI Jukebox in production quality and content unity while maintaining reasonable musical coherence. The ultra-lightweight architecture enables real-time deployment on consumer hardware, making AI-assisted music creation accessible for interactive applications and resource-constrained environments.


Article 798

Title@2025-07-26 (6): Visual Analytics Using Tensor Unified Linear Comparative Analysis

Title: Visual Analytics Using Tensor Unified Linear Comparative Analysis

Visual Analytics mit Tensor Unified Linear Comparative Analysis

利用透光器统一线性比较分析进行视觉分析 2507.19988v1

Authors (5): Naoki Okami, Kazuki Miyake, Naohisa Sakamoto, Jorji Nonaka, Takanori Fujiwara

Comparing tensors and identifying their (dis)similar structures is fundamental in understanding the underlying phenomena for complex data. Tensor decomposition methods help analysts extract tensors’ essential characteristics and aid in visual analytics for tensors. In contrast to dimensionality reduction (DR) methods designed only for analyzing a matrix (i.e., second-order tensor), existing tensor decomposition methods do not support flexible comparative analysis. To address this analysis limitation, we introduce a new tensor decomposition method, named tensor unified linear comparative analysis (TULCA), by extending its DR counterpart, ULCA, for tensor analysis. TULCA integrates discriminant analysis and contrastive learning schemes for tensor decomposition, enabling flexible comparison of tensors. We also introduce an effective method to visualize a core tensor extracted from TULCA into a set of 2D visualizations. We integrate TULCA’s functionalities into a visual analytics interface to support analysts in interpreting and refining the TULCA results. We demonstrate the efficacy of TULCA and the visual analytics interface with computational evaluations and two case studies, including an analysis of log data collected from a supercomputer.


Article 799

Title@2025-07-26 (6): Recurrent neural network wave functions for Rydberg atom arrays on kagome lattice

Title: Recurrent neural network wave functions for Rydberg atom arrays on kagome lattice

Recurrent neuronale Netzwerkwellenfunktionen für Rydberg-Atomarrays auf Kagome-Gitter

Rydberg kagome 板上原子阵列的经常性神经网络波函数 2405.20384v2

Authors (5): Mohamed Hibat-Allah, Ejaaz Merali, Giacomo Torlai, Roger G Melko, Juan Carrasquilla

Rydberg atom array experiments have demonstrated the ability to act as powerful quantum simulators, preparing strongly-correlated phases of matter which are challenging to study with conventional computer simulations. A key direction has been the implementation of interactions on frustrated geometries, in an effort to prepare exotic many-body states such as spin liquids and glasses. In this paper, we apply two-dimensional recurrent neural network (RNN) wave functions to study the ground states of Rydberg atom arrays on the kagome lattice. We implement an annealing scheme to find the RNN variational parameters in regions of the phase diagram where exotic phases may occur, corresponding to rough optimization landscapes. For Rydberg atom array Hamiltonians studied previously on the kagome lattice, our RNN ground states show no evidence of exotic spin liquid or emergent glassy behavior. In the latter case, we argue that the presence of a non-zero Edwards-Anderson order parameter is an artifact of the long autocorrelation times experienced with quantum Monte Carlo (QMC) simulations, and we show that autocorrelations can be systematically reduced by increasing numerical effort. This result emphasizes the utility of autoregressive models, such as RNNs, in conjunction with QMC, to explore Rydberg atom array physics on frustrated lattices and beyond.


Article 800

Title@2025-07-26 (6): LLM-Adapted Interpretation Framework for Machine Learning Models

Title: LLM-Adapted Interpretation Framework for Machine Learning Models

LLM-adapted Interpretation Framework for Machine Learning Models

LLM-成熟的机器学习模型解释框架 2507.21179v1

Authors (7): Yuqi Jin, Zihan Hu, Weiteng Zhang, Weihao Xie, Jianwei Shuai, Xian Shen, Zhen Feng

Background & Aims: High-performance machine learning models like XGBoost are often “black boxes,” limiting their clinical adoption due to a lack of interpretability. This study aims to bridge the gap between predictive accuracy and narrative transparency for sarcopenia risk assessment. Methods: We propose the LLM-Adapted Interpretation Framework (LAI-ML), a novel knowledge distillation architecture. LAI-ML transforms feature attributions from a trained XGBoost model into a probabilistic format using specialized techniques (HAGA and CACS). A Large Language Model (LLM), guided by a reinforcement learning loop and case-based retrieval, then generates data-faithful diagnostic narratives. Results: The LAI-ML framework achieved 83% prediction accuracy, 13% higher than the baseline XGBoost model. Notably, the LLM not only replicated the teacher model’s logic but also corrected its predictions in 21.7% of discordant cases, demonstrating enhanced reasoning. Conclusion: LAI-ML effectively translates opaque model predictions into trustworthy and interpretable clinical insights, offering a deployable solution to the “black-box” problem in medical AI.


Article 801

Title@2025-07-26 (6): Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge

Title: Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge

Robustes Daten-Wasserzeichen in Sprachmodellen durch Einspritzen fiktiver Kenntnisse

在语言模型中,通过输入有说服力的知识在语言模型中进行强力数据水上标记 2503.04036v3

Authors (4): Xinyue Cui, Johnny Tian-Zheng Wei, Swabha Swayamdipta, Robin Jia

Data watermarking in language models injects traceable signals, such as specific token sequences or stylistic patterns, into copyrighted text, allowing copyright holders to track and verify training data ownership. Previous data watermarking techniques primarily focus on effective memorization during pretraining, while overlooking challenges that arise in other stages of the LLM lifecycle, such as the risk of watermark filtering during data preprocessing and verification difficulties due to API-only access. To address these challenges, we propose a novel data watermarking approach that injects plausible yet fictitious knowledge into training data using generated passages describing a fictitious entity and its associated attributes. Our watermarks are designed to be memorized by the LLM by integrating seamlessly into its training data, making them harder to detect lexically during preprocessing. We demonstrate that our watermarks can be effectively memorized by LLMs, and that increasing our watermarks’ density, length, and diversity of attributes strengthens their memorization. We further show that our watermarks remain effective after continual pretraining and supervised finetuning. Finally, we show that our data watermarks can be evaluated even under API-only access via question answering.


Article 802

Title@2025-07-26 (6): The Origin of Self-Attention: Pairwise Affinity Matrices in Feature Selection and the Emergence of Self-Attention

Title: The Origin of Self-Attention: Pairwise Affinity Matrices in Feature Selection and the Emergence of Self-Attention

Der Ursprung der Selbstachtung: Paarweise Affinitätsmatrizen in der Feature-Auswahl und das Entstehen der Selbstachtung

自我关注的起源:选择地物中的对等亲亲关系母体和自我关注的出现 2507.14560v2

Authors (1): Giorgio Roffo

The self-attention mechanism, now central to deep learning architectures such as Transformers, is a modern instance of a more general computational principle: learning and using pairwise affinity matrices to control how information flows through a model. This paper traces the conceptual origins of self-attention across multiple domains, including computer vision, natural language processing, and graph learning, through their shared reliance on an affinity matrix, denoted as A. We highlight Infinite Feature Selection (Inf-FS) as a foundational approach that generalizes the idea of affinity-based weighting. Unlike the fixed dot-product structure used in Transformers, Inf-FS defines A either through domain knowledge or by learning, and computes feature relevance through multi-hop propagation over the affinity graph. From this perspective, self-attention can be seen as a special case of Inf-FS: it uses a single-hop affinity computation where A is dynamically built from token similarities. We argue that the underlying structure, reasoning over pairwise relationships, is preserved across both approaches, and the key differences lie in how the affinity matrix is defined and applied. By situating self-attention within the broader paradigm of affinity-based computation, we unify several strands of machine learning research and highlight a common mathematical foundation that underpins diverse models and tasks.
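
The contrast between single-hop and multi-hop use of an affinity matrix A can be sketched in a few lines. This is an illustrative sketch only, not code from the paper: identity query/key projections, the toy inputs, the damping factor `r`, and the truncation at `hops` terms are all assumptions. The multi-hop score follows the power-series form usually associated with Inf-FS, the sum over l of (rA)^l followed by row sums, truncated here instead of using a matrix inverse.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def attention_affinity(x):
    # Single-hop affinity built dynamically from token similarities:
    # A = softmax(X X^T / sqrt(d)), with identity Q/K projections (assumption).
    d = len(x[0])
    xt = [list(col) for col in zip(*x)]
    scores = matmul(x, xt)
    return softmax_rows([[v / math.sqrt(d) for v in row] for row in scores])

def multi_hop_scores(a, r=0.5, hops=50):
    # Truncated sum_{l=1..hops} (r*A)^l, then row sums as relevance scores.
    n = len(a)
    ra = [[r * v for v in row] for row in a]
    power = ra
    total = [row[:] for row in ra]
    for _ in range(hops - 1):
        power = matmul(power, ra)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return [sum(row) for row in total]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A_attn = attention_affinity(tokens)
attended = matmul(A_attn, tokens)  # one hop: each token mixes with its neighbours

# Hypothetical fixed feature affinity (for example from domain knowledge).
A_fs = [[0.0, 0.9, 0.1],
        [0.9, 0.0, 0.2],
        [0.1, 0.2, 0.0]]
scores = multi_hop_scores(A_fs)
print(scores)
```

In both cases the computation is driven by pairwise affinities; what differs is how A is obtained (built per input versus fixed or learned) and how many hops of propagation are applied.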


Article 803

Title@2025-07-26 (6): MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

Title: MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

MegaScale-Infer: Servieren von Mixture-of-Experts auf Scale mit disaggregierten Experten-Parallelismus

超星级――推:利用分级专家平行主义在规模上为混合专家服务 2504.02263v4

Authors (20): Ruidong Zhu, Ziheng Jiang, Chao Jin, Peng Wu, Cesar A. Stuardo, Dongyang Wang, Xinlei Zhang, Huaping Zhou, Haoran Wei, Yang Cheng, Jianzhe Xiao, Xinyi Zhang, Lingjun Liu, Haibin Lin, Li-Wen Chang, Jianxi Ye, Xiao Yu, Xuanzhe Liu, Xin Jin, Xin Liu

Mixture-of-Experts (MoE) showcases tremendous potential to scale large language models (LLMs) with enhanced performance and reduced computational complexity. However, its sparsely activated architecture shifts feed-forward networks (FFNs) from being compute-intensive to memory-intensive during inference, leading to substantially lower GPU utilization and increased operational costs. We present MegaScale-Infer, an efficient and cost-effective system for serving large-scale MoE models. MegaScale-Infer disaggregates attention and FFN modules within each model layer, enabling independent scaling, tailored parallelism strategies, and heterogeneous deployment for both modules. To fully exploit disaggregation in the presence of MoE’s sparsity, MegaScale-Infer introduces ping-pong pipeline parallelism, which partitions a request batch into micro-batches and shuttles them between attention and FFNs for inference. Combined with distinct model parallelism for each module, MegaScale-Infer effectively hides communication overhead and maximizes GPU utilization. To adapt to disaggregated attention and FFN modules and minimize data transmission overhead (e.g., token dispatch), MegaScale-Infer provides a high-performance M2N communication library that eliminates unnecessary GPU-to-CPU data copies, group initialization overhead, and GPU synchronization. Experimental results indicate that MegaScale-Infer achieves up to 1.90x higher per-GPU throughput than state-of-the-art solutions.


Article 804

Title@2025-07-26 (6): Extreme value theory for singular subspace estimation in the matrix denoising model

Title: Extreme value theory for singular subspace estimation in the matrix denoising model

Extreme Werttheorie für die singuläre Subraumschätzung im Matrix-Denoisierungsmodell

矩阵除空模型中单子空间估计极端值极端值理论 2507.19978v1

Authors (2): Junhyung Chang, Joshua Cape

This paper studies fine-grained singular subspace estimation in the matrix denoising model where a deterministic low-rank signal matrix is additively perturbed by a stochastic matrix of Gaussian noise. We establish that the maximum Euclidean row norm (i.e., the two-to-infinity norm) of the aligned difference between the leading sample and population singular vectors approaches the Gumbel distribution in the large-matrix limit, under suitable signal-to-noise conditions and after appropriate centering and scaling. We apply our novel asymptotic distributional theory to test hypotheses of low-rank signal structure encoded in the leading singular vectors and their corresponding principal subspace. We provide de-biased estimators for the corresponding nuisance signal singular values and show that our proposed plug-in test statistic has desirable properties. Notably, compared to using the Frobenius norm subspace distance, our test statistic based on the two-to-infinity norm has higher power to detect structured alternatives that differ from the null in only a few matrix entries or rows. Our main results are obtained by a novel synthesis of and technical analysis involving entrywise matrix perturbation analysis, extreme value theory, saddle point approximation methods, and random matrix theory. Our contributions complement the existing literature for matrix denoising focused on minimaxity, mean squared error analysis, unitarily invariant distances between subspaces, component-wise asymptotic distributional theory, and row-wise uniform error bounds. Numerical simulations illustrate our main results and demonstrate the robustness properties of our testing procedure to non-Gaussian noise distributions.
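
To see why a row-wise norm can detect localized differences that the Frobenius norm averages away, consider the following minimal sketch. The two toy difference matrices are made up for illustration and are not from the paper; the function names are likewise assumptions.

```python
import math

def frobenius_norm(m):
    # Square root of the sum of all squared entries.
    return math.sqrt(sum(v * v for row in m for v in row))

def two_to_infinity_norm(m):
    # Maximum Euclidean norm over the rows of the matrix.
    return max(math.sqrt(sum(v * v for v in row)) for row in m)

# Error spread evenly over 100 rows versus concentrated in a single row.
spread = [[0.1, 0.0] for _ in range(100)]
spiked = [[1.0, 0.0]] + [[0.0, 0.0] for _ in range(99)]

print(frobenius_norm(spread), frobenius_norm(spiked))
print(two_to_infinity_norm(spread), two_to_infinity_norm(spiked))
```

Both matrices have (approximately) the same Frobenius norm, yet their two-to-infinity norms differ by a factor of ten, which is the intuition behind the higher power against alternatives that differ from the null in only a few rows.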


Article 805

Title@2025-07-26 (6): A roadmap for AI in robotics

Title: A roadmap for AI in robotics

Fahrplan für KI in der Robotik

机器人用人工智能的路线图 2507.19975v1

Authors (11): Aude Billard, Alin Albu-Schaeffer, Michael Beetz, Wolfram Burgard, Peter Corke, Matei Ciocarlie, Ravinder Dahiya, Danica Kragic, Ken Goldberg, Yukie Nagai, Davide Scaramuzza

AI technologies, including deep learning and large language models, have gone from one breakthrough to the next. As a result, we are witnessing growing excitement in robotics at the prospect of leveraging the potential of AI to tackle some of the outstanding barriers to the full deployment of robots in our daily lives. However, action and sensing in the physical world pose greater and different challenges than analysing data in isolation. As the development and application of AI in robotic products advances, it is important to reflect on which technologies, among the vast array of network architectures and learning models now available in the AI field, are most likely to be successfully applied to robots; how they can be adapted to specific robot designs, tasks, and environments; and which challenges must be overcome. This article offers an assessment of what AI for robotics has achieved since the 1990s and proposes a short- and medium-term research roadmap listing challenges and promises. These range from keeping large datasets up to date and representative of the diversity of tasks robots may have to perform and of the environments they may encounter, to designing AI algorithms tailored specifically to robotics problems but generic enough to apply to a wide range of applications and transfer easily to a variety of robotic platforms. For robots to collaborate effectively with humans, they must predict human behavior without relying on bias-based profiling. Explainability and transparency in AI-driven robot control are not optional but essential for building trust, preventing misuse, and attributing responsibility in accidents. We close on what we view as the primary long-term challenges, that is, to design robots capable of lifelong learning, while guaranteeing safe deployment and usage, and sustainable computational costs.


Article 806

Title@2025-07-26 (6): NestQuant: Nested Lattice Quantization for Matrix Products and LLMs

Title: NestQuant: Nested Lattice Quantization for Matrix Products and LLMs

NestQuant: Nested Lattice Quantization für Matrix-Produkte und LLMs

NestQuant: 母体产品和LLMs的Nasted Lattice量化 2502.09720v3

Authors (4): Semyon Savkin, Eitan Porat, Or Ordentlich, Yury Polyanskiy

Post-training quantization (PTQ) has emerged as a critical technique for efficient deployment of large language models (LLMs). This work proposes NestQuant, a novel PTQ scheme for weights and activations that is based on self-similar nested lattices. Recent works have mathematically shown such quantizers to be information-theoretically optimal for low-precision matrix multiplication. We implement a practical low-complexity version of NestQuant based on the Gosset lattice, making it a drop-in quantizer for any matrix multiplication step (e.g., in self-attention, MLP, etc.). For example, NestQuant quantizes weights, KV-cache, and activations of Llama-3-8B to 4 bits, achieving perplexity of 6.6 on wikitext2. This represents more than 55% reduction in perplexity gap with respect to the unquantized model (perplexity of 6.14) compared to state-of-the-art Meta's SpinQuant (perplexity 7.3), OstQuant (7.3) and QuaRot (8.2). Comparisons on bigger models (up to 70B) and on various LLM evaluation benchmarks confirm uniform superiority of NestQuant.
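
The "more than 55% reduction in perplexity gap" figure can be checked from the numbers quoted in the abstract. The helper below is a hypothetical illustration (the function name is an assumption), and because the quoted perplexities are rounded, the result is approximate.

```python
def gap_reduction(unquantized_ppl, competitor_ppl, method_ppl):
    # Fraction of the competitor's perplexity gap (relative to the
    # unquantized model) that the method removes.
    competitor_gap = competitor_ppl - unquantized_ppl
    method_gap = method_ppl - unquantized_ppl
    return 1.0 - method_gap / competitor_gap

# Perplexities as quoted in the abstract (wikitext2, Llama-3-8B, 4 bits).
vs_spinquant = gap_reduction(6.14, 7.3, 6.6)
vs_quarot = gap_reduction(6.14, 8.2, 6.6)
print(vs_spinquant, vs_quarot)
```

With these rounded inputs the reduction relative to SpinQuant and OstQuant (both at 7.3) comes out at roughly 60%, and relative to QuaRot at roughly 78%, consistent with the "more than 55%" claim.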


Article 807

Title@2025-07-26 (6): SkinDualGen: Prompt-Driven Diffusion for Simultaneous Image-Mask Generation in Skin Lesions

Title: SkinDualGen: Prompt-Driven Diffusion for Simultaneous Image-Mask Generation in Skin Lesions

SkinDualGen: Prompt-getriebene Diffusion für die gleichzeitige Bild-Maske-Generierung in Hautläsionen

SkinDualGen: 皮肤遗迹中同声图像元件生成的快速驱动扩散 2507.19970v1

Authors (1): Zhaobin Xu

Medical image analysis plays a pivotal role in the early diagnosis of diseases such as skin lesions. However, the scarcity of data and the class imbalance significantly hinder the performance of deep learning models. We propose a novel method that leverages the pretrained Stable Diffusion-2.0 model to generate high-quality synthetic skin lesion images and corresponding segmentation masks. This approach augments training datasets for classification and segmentation tasks. We adapt Stable Diffusion-2.0 through domain-specific Low-Rank Adaptation (LoRA) fine-tuning and joint optimization of multi-objective loss functions, enabling the model to simultaneously generate clinically relevant images and segmentation masks conditioned on textual descriptions in a single step. Experimental results show that the generated images, validated by FID scores, closely resemble real images in quality. A hybrid dataset combining real and synthetic data markedly enhances the performance of classification and segmentation models, achieving substantial improvements in accuracy and F1-score of 8% to 15%, with additional positive gains in other key metrics such as the Dice coefficient and IoU. Our approach offers a scalable solution to address the challenges of medical imaging data, contributing to improved accuracy and reliability in diagnosing rare diseases.


Article 808

Title@2025-07-26 (6): Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training

Title: Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training

Dimer-Enhanced Optimization: Ein Ansatz erster Ordnung, um Sattelpunkte im neuralen Netzwerktraining zu überwinden

优化优化:在神经网络培训中以第一阶梯方式解剖搭配点 2507.19968v1

Authors (3): Yue Hu, Zanxia Cao, Yingchao Liu

First-order optimization methods, such as SGD and Adam, are widely used for training large-scale deep neural networks due to their computational efficiency and robust performance. However, relying solely on gradient information, these methods often struggle to navigate complex loss landscapes with flat regions, plateaus, and saddle points. Second-order methods, which use curvature information from the Hessian matrix, can address these challenges but are computationally infeasible for large models. The Dimer method, a first-order technique that constructs two closely spaced points to probe the local geometry of a potential energy surface, efficiently estimates curvature using only gradient information. Inspired by its use in molecular dynamics simulations for locating saddle points, we propose Dimer-Enhanced Optimization (DEO), a novel framework to escape saddle points in neural network training. DEO adapts the Dimer method to explore a broader region of the loss landscape, approximating the Hessian’s smallest eigenvector without computing the full matrix. By periodically projecting the gradient onto the subspace orthogonal to the minimum curvature direction, DEO guides the optimizer away from saddle points and flat regions, enhancing training efficiency with non-stepwise updates. Preliminary experiments on a Transformer toy model show DEO achieves competitive performance compared to standard first-order methods, improving navigation of complex loss landscapes. Our work repurposes physics-inspired, first-order curvature estimation to enhance neural network training in high-dimensional spaces.
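
The two ingredients named in the abstract, a gradient-only curvature estimate from two closely spaced points and a projection of the gradient orthogonal to the minimum-curvature direction, can be sketched on a toy saddle. This is not the DEO implementation: the loss f(x, y) = x^2 - y^2, the brute-force angle scan (a stand-in for the Dimer rotation step), and all names are assumptions made for illustration.

```python
import math

def grad(p):
    # Gradient of the toy saddle f(x, y) = x**2 - y**2.
    x, y = p
    return (2.0 * x, -2.0 * y)

def curvature_along(p, n, eps=1e-3):
    # Dimer-style estimate: difference of gradients at two points
    # separated by 2*eps along unit direction n, projected onto n.
    gp = grad((p[0] + eps * n[0], p[1] + eps * n[1]))
    gm = grad((p[0] - eps * n[0], p[1] - eps * n[1]))
    return ((gp[0] - gm[0]) * n[0] + (gp[1] - gm[1]) * n[1]) / (2.0 * eps)

def min_curvature_direction(p, samples=180):
    # Brute-force scan over directions; only practical in 2D.
    best_n, best_c = None, float("inf")
    for k in range(samples):
        theta = math.pi * k / samples
        n = (math.cos(theta), math.sin(theta))
        c = curvature_along(p, n)
        if c < best_c:
            best_n, best_c = n, c
    return best_n, best_c

def project_out(g, n):
    # Remove the component of g along unit direction n.
    dot = g[0] * n[0] + g[1] * n[1]
    return (g[0] - dot * n[0], g[1] - dot * n[1])

point = (0.5, 0.3)
direction, curvature = min_curvature_direction(point)
projected = project_out(grad(point), direction)
print(direction, curvature, projected)
```

On this saddle the scan recovers the y-axis as the direction of most negative curvature (about -2, matching the Hessian eigenvalue), using gradient evaluations only and never forming the Hessian.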


Article 809

Title@2025-07-26 (6): Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning

Title: Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning

Multi-Agenten-Verstärkungs-Lernen mit großflächiger Mixed-Traffic- und Intersektionskontrolle

利用多剂强化学习系统进行大型混合运输和跨部门控制 2504.04691v2

Authors (5): Songyang Liu, Muyang Fan, Weizi Li, Jing Du, Shuai Li

Traffic congestion remains a significant challenge in modern urban networks. Autonomous driving technologies have emerged as a potential solution. Among traffic control methods, reinforcement learning has shown superior performance over traffic signals in various scenarios. However, prior research has largely focused on small-scale networks or isolated intersections, leaving large-scale mixed traffic control largely unexplored. This study presents the first attempt to use decentralized multi-agent reinforcement learning for large-scale mixed traffic control in which some intersections are managed by traffic signals and others by robot vehicles. Evaluating a real-world network in Colorado Springs, CO, USA with 14 intersections, we measure traffic efficiency via average waiting time of vehicles at intersections and the number of vehicles reaching their destinations within a time window (i.e., throughput). At 80% RV penetration rate, our method reduces waiting time from 6.17s to 5.09s and increases throughput from 454 vehicles per 500 seconds to 493 vehicles per 500 seconds, outperforming the baseline of fully signalized intersections. These findings suggest that integrating reinforcement learning-based control into large-scale traffic can improve overall efficiency and may inform future urban planning strategies.
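
For scale, the reported figures can be converted to relative changes. The helper name below is an assumption; the input numbers are those quoted in the abstract.

```python
def relative_change(before, after):
    # Signed fractional change from `before` to `after`.
    return (after - before) / before

waiting_change = relative_change(6.17, 5.09)      # average waiting time, seconds
throughput_change = relative_change(454.0, 493.0)  # vehicles per 500 seconds
print(waiting_change, throughput_change)
```

This corresponds to roughly a 17.5% reduction in average waiting time and roughly an 8.6% increase in throughput at an 80% RV penetration rate.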


Article 810

Title@2025-07-26 (6): Who Owns This Sample: Cross-Client Membership Inference Attack in Federated Graph Neural Networks

Title: Who Owns This Sample: Cross-Client Membership Inference Attack in Federated Graph Neural Networks

Wer besitzt dieses Beispiel: Cross-Client Mitgliedschaft Inferenz Attack in Federated Graph Neural Networks

拥有此样本者: 联邦神经网络的跨气候成员推论攻击 2507.19964v1

Authors (10): Kunhao Li, Di Wu, Jun Bai, Jing Xu, Lei Yang, Ziyi Zhang, Yiliao Song, Wencheng Yang, Taotao Cai, Yan Li

Graph-structured data is prevalent in many real-world applications, including social networks, financial systems, and molecular biology. Graph Neural Networks (GNNs) have become the de facto standard for learning from such data due to their strong representation capabilities. As GNNs are increasingly deployed in federated learning (FL) settings to preserve data locality and privacy, new privacy threats arise from the interaction between graph structures and decentralized training. In this paper, we present the first systematic study of cross-client membership inference attacks (CC-MIA) against node classification tasks of federated GNNs (FedGNNs), where a malicious client aims to infer which client owns the given data. Unlike prior work in centralized settings, which focuses on whether a sample was included in training, our attack targets sample-to-client attribution, a finer-grained privacy risk unique to federated settings. We design a general attack framework that exploits FedGNNs’ aggregation behaviors, gradient updates, and embedding proximity to link samples to their source clients across training rounds. We evaluate our attack across multiple graph datasets under realistic FL setups. Results show that our method achieves high performance on both membership inference and ownership identification. Our findings highlight a new privacy threat in federated graph learning: client identity leakage through structural and model-level cues, motivating the need for attribution-robust GNN design.


Article 811

Title@2025-07-26 (6): Preconditioned Inexact Stochastic ADMM for Deep Model

Title: Preconditioned Inexact Stochastic ADMM for Deep Model

Vorkonditioniertes inexaktes stochastisches ADMM für Deep Model

用于深型号的预设不灵巧的ADMMD 2502.10784v3

Authors (5): Shenglong Zhou, Ouya Wang, Ziyan Luo, Yongxu Zhu, Geoffrey Ye Li

The recent advancement of foundation models (FMs) has brought about a paradigm shift, revolutionizing various sectors worldwide. The popular optimizers used to train these models are stochastic gradient descent-based algorithms, which face inherent limitations, such as slow convergence and stringent assumptions for convergence. In particular, data heterogeneity arising from distributed settings poses significant challenges to their theoretical and numerical performance. This paper develops an algorithm, PISA (Preconditioned Inexact Stochastic Alternating Direction Method of Multipliers), which enables scalable parallel computing and supports various preconditions, such as second-order information, second moment, and orthogonalized momentum by Newton-Schulz iterations. Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient on a bounded region, thereby removing the need for other conditions commonly imposed by stochastic methods. This capability enables PISA to tackle the challenge of data heterogeneity effectively. Comprehensive experimental evaluations for training or fine-tuning diverse deep models, including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate its superior numerical performance compared to various state-of-the-art optimizers.


Article 812

Title@2025-07-26 (6): $K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting

Title: $K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting

$K^2$VAE: Ein Koopman-Kalman-Verbesserter Variations-AutoEncoder für probabilistische Zeitreihenprognosen

$K^2$VAE: 概率时间序列预测的Koopman-Kalman增强变异自动编码器 2505.23017v3

Authors (6): Xingjian Wu, Xiangfei Qiu, Hongfan Gao, Jilin Hu, Bin Yang, Chenjuan Guo

Probabilistic Time Series Forecasting (PTSF) plays a crucial role in decision-making across various fields, including economics, energy, and transportation. Most existing methods excel at short-term forecasting, while overlooking the hurdles of Long-term Probabilistic Time Series Forecasting (LPTSF). As the forecast horizon extends, the inherent nonlinear dynamics have a significant adverse effect on prediction accuracy, and make generative models inefficient by increasing the cost of each iteration. To overcome these limitations, we introduce $K^2$VAE, an efficient VAE-based generative model that leverages a KoopmanNet to transform nonlinear time series into a linear dynamical system, and devises a KalmanNet to refine predictions and model uncertainty in such a linear system, which reduces error accumulation in long-term forecasting. Extensive experiments demonstrate that $K^2$VAE outperforms state-of-the-art methods in both short- and long-term PTSF, providing a more efficient and accurate solution.
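
Once the dynamics are linear, prediction refinement and uncertainty tracking can follow the classical Kalman recursion. The sketch below is the textbook scalar Kalman filter, not the paper's KalmanNet; the parameter names and default values are assumptions chosen for illustration.

```python
def kalman_step(mean, var, obs, a=1.0, q=0.01, h=1.0, r=0.1):
    # Predict: push the state estimate through the linear dynamics x' = a * x,
    # adding process noise variance q.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Update: correct the prediction with observation obs = h * x + noise,
    # where the observation noise has variance r.
    gain = pred_var * h / (h * h * pred_var + r)
    new_mean = pred_mean + gain * (obs - h * pred_mean)
    new_var = (1.0 - gain * h) * pred_var
    return new_mean, new_var

# Filter a short, made-up observation sequence.
mean, var = 0.0, 1.0
for obs in [0.9, 1.1, 1.0]:
    mean, var = kalman_step(mean, var, obs)
print(mean, var)
```

Each step returns both a refined estimate and its variance, so the uncertainty is carried forward explicitly instead of being re-estimated from scratch at every horizon.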


Article 813

Title@2025-07-26 (6): Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

Title: Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

Sprachenübergreifendes Reisen: Benchmarking Cross-Lingual Consistency in multimodalen LLMs

跨语言旅行:多模式LLM中跨语言一致基准 2505.15075v4

Authors (5): Hao Wang, Pinzhi Huang, Jihan Yang, Saining Xie, Daisuke Kawahara

The rapid evolution of multimodal large language models (MLLMs) has significantly enhanced their real-world applications. However, achieving consistent performance across languages, especially when integrating cultural knowledge, remains a significant challenge. To better assess this issue, we introduce two new benchmarks: KnowRecall and VisRecall, which evaluate cross-lingual consistency in MLLMs. KnowRecall is a visual question answering benchmark designed to measure factual knowledge consistency in 15 languages, focusing on cultural and historical questions about global landmarks. VisRecall assesses visual memory consistency by asking models to describe landmark appearances in 9 languages without access to images. Experimental results reveal that state-of-the-art MLLMs, including proprietary ones, still struggle to achieve cross-lingual consistency. This underscores the need for more robust approaches that produce truly multilingual and culturally aware models.


Article 814

Title@2025-07-26 (6): Negative Dependence as a toolbox for machine learning : review and new developments

Title: Negative Dependence as a toolbox for machine learning : review and new developments

Negative Abhängigkeit als Werkzeugkasten für maschinelles Lernen: Überprüfung und Neuentwicklungen

消极依赖作为机器学习的工具箱:审查与新发展 2502.07285v2

Authors (4): Hoang-Son Tran, Vladimir Petrovic, Remi Bardenet, Subhroshekhar Ghosh

Negative dependence is becoming a key driver in advancing learning capabilities beyond the limits of traditional independence. Recent developments have evidenced support towards negatively dependent systems as a learning paradigm in a broad range of fundamental machine learning challenges including optimization, sampling, dimensionality reduction and sparse signal recovery, often surpassing the performance of current methods based on statistical independence. The most popular negatively dependent model has been that of determinantal point processes (DPPs), which have their origins in quantum theory. However, other models, such as perturbed lattice models, strongly Rayleigh measures, and zeros of random functions, have gained salience in various learning applications. In this article, we review this burgeoning field of research, as it has developed over the past two decades or so. We also present new results on applications of DPPs to the parsimonious representation of neural networks. In the limited scope of the article, we mostly focus on aspects of this area to which the authors contributed over the recent years, including applications to Monte Carlo methods, coresets and stochastic gradient descent, stochastic networks, signal processing and connections to quantum computation. However, starting from basics of negative dependence for the uninitiated reader, extensive references are provided to a broad swath of related developments which could not be covered within our limited scope. While existing works and reviews generally focus on specific negatively dependent models (e.g. DPPs), a notable feature of this article is that it addresses negative dependence as a machine learning methodology as a whole. In this vein, it covers within its span an array of negatively dependent models and their applications well beyond DPPs, thereby putting forward a very general and rather unique perspective.

Article 815

Title@2025-07-26 (6): Simple Policy Optimization

Title: Simple Policy Optimization

Einfache Optimierung der Politik

简单政策优化 2401.16025v9

Authors (5): Zhengpeng Xie, Qiang Zhang, Fan Yang, Marco Hutter, Renjing Xu

Model-free reinforcement learning algorithms have seen remarkable progress, but key challenges remain. Trust Region Policy Optimization (TRPO) is known for ensuring monotonic policy improvement through conservative updates within a trust region, backed by strong theoretical guarantees. However, its reliance on complex second-order optimization limits its practical efficiency. Proximal Policy Optimization (PPO) addresses this by simplifying TRPO’s approach using ratio clipping, improving efficiency but sacrificing some theoretical robustness. This raises a natural question: Can we combine the strengths of both methods? In this paper, we introduce Simple Policy Optimization (SPO), a novel unconstrained first-order algorithm. By slightly modifying the policy loss used in PPO, SPO can achieve the best of both worlds. Our new objective improves upon ratio clipping, offering stronger theoretical properties and better constraining the probability ratio within the trust region. Empirical results demonstrate that SPO outperforms PPO with a simple implementation, particularly for training large, complex network architectures end-to-end.
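The ratio clipping mentioned in the abstract can be sketched per sample as below. `ppo_clip_objective` is the standard PPO clipped surrogate. `penalized_objective` is a hypothetical smooth alternative (a quadratic penalty on the probability ratio) included only to illustrate what "slightly modifying the policy loss" could look like; the abstract does not state SPO's actual objective, so it should not be read as the authors' loss.

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def penalized_objective(ratio, advantage, eps=0.2):
    """Hypothetical smooth variant (an assumption, not taken from the abstract).

    The quadratic penalty makes the objective peak at ratio = 1 + eps for a
    positive advantage and at ratio = 1 - eps for a negative one.
    """
    return ratio * advantage - abs(advantage) / (2.0 * eps) * (ratio - 1.0) ** 2


if __name__ == "__main__":
    for r in (0.5, 1.0, 1.2, 1.5):
        print(r, ppo_clip_objective(r, 1.0), penalized_objective(r, 1.0))
```

With a positive advantage, the clipped surrogate is flat once the ratio exceeds `1 + eps`, so its gradient vanishes there, whereas the penalized variant decreases again and so keeps pulling the ratio back toward the boundary.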

Article 816

Title@2025-07-26 (6): Deep Learning Based Joint Channel Estimation and Positioning for Sparse XL-MIMO OFDM Systems

Title: Deep Learning Based Joint Channel Estimation and Positioning for Sparse XL-MIMO OFDM Systems

Deep Learning Based Joint Channel Schätzung und Positionierung für Sparse XL-MIMO OFDM Systeme

分散 XL-MIMO OFDM系统的深学习联合频道估计和定位 2507.19936v1

Authors (7): Zhongnian Li, Chao Zheng, Jian Xiao, Ji Wang, Gongpu Wang, Ming Zeng, Octavia A. Dobre

This paper investigates joint channel estimation and positioning in near-field sparse extra-large multiple-input multiple-output (XL-MIMO) orthogonal frequency division multiplexing (OFDM) systems. To achieve cooperative gains between channel estimation and positioning, we propose a deep learning-based two-stage framework comprising positioning and channel estimation. In the positioning stage, the user’s coordinates are predicted and utilized in the channel estimation stage, thereby enhancing the accuracy of channel estimation. Within this framework, we propose a U-shaped Mamba architecture for channel estimation and positioning, termed as CP-Mamba. This network integrates the strengths of the Mamba model with the structural advantages of U-shaped convolutional networks, enabling effective capture of local spatial features and long-range temporal dependencies of the channel. Numerical simulation results demonstrate that the proposed two-stage approach with CP-Mamba architecture outperforms existing baseline methods. Moreover, sparse arrays (SA) exhibit significantly superior performance in both channel estimation and positioning accuracy compared to conventional compact arrays.

Article 817

Title@2025-07-26 (6): Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Title: Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Frontier AI Risk Management Framework in der Praxis: Ein technischer Bericht zur Risikoanalyse

《国际边界风险管理框架实际操作:风险分析技术报告》 2507.16534v2

Authors (37): Shanghai AI Lab, Xiaoyang Chen, Yunhao Chen, Zeren Chen, Zhiyun Chen, Hanyun Cui, Yawen Duan, Jiaxuan Guo, Qi Guo, Xuhao Hu, Hong Huang, Lige Huang, Chunxiao Li, Juncheng Li, Qihao Lin, Dongrui Liu, Xinmin Liu, Zicheng Liu, Chaochao Lu, Xiaoya Lu, Jingjing Qu, Qibing Ren, Jing Shao, Jingwei Shi, Jingwei Sun, Peng Wang, Weibing Wang, Jia Xu, Lewen Yan, Xiao Yu, Yi Yu, Boxuan Zhang, Jie Zhang, Weichen Zhang, Zhijie Zheng, Tianyi Zhou, Bowen Zhou

To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, this report presents a comprehensive assessment of their frontier risks. Drawing on the E-T-C analysis (deployment environment, threat source, enabling capability) from the Frontier AI Risk Management Framework (v1.0) (SafeWork-F1-Framework), we identify critical risks in seven areas: cyber offense, biological and chemical risks, persuasion and manipulation, uncontrolled autonomous AI R&D, strategic deception and scheming, self-replication, and collusion. Guided by the “AI-$45^\circ$ Law,” we evaluate these risks using “red lines” (intolerable thresholds) and “yellow lines” (early warning indicators) to define risk zones: green (manageable risk for routine deployment and continuous monitoring), yellow (requiring strengthened mitigations and controlled deployment), and red (necessitating suspension of development and/or deployment). Experimental results show that all recent frontier AI models reside in green and yellow zones, without crossing red lines. Specifically, no evaluated models cross the yellow line for cyber offense or uncontrolled AI R&D risks. For self-replication, and strategic deception and scheming, most models remain in the green zone, except for certain reasoning models in the yellow zone. In persuasion and manipulation, most models are in the yellow zone due to their effective influence on humans. For biological and chemical risks, we are unable to rule out the possibility of most models residing in the yellow zone, although detailed threat modeling and in-depth assessment are required to make further claims. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.

Article 818

Title@2025-07-26 (6): ReCA: A Parametric ReLU Composite Activation Function

Title: ReCA: A Parametric ReLU Composite Activation Function

ReCA: Eine parametrische ReLU-Kompositaktivierungsfunktion

ReCA: 参数雷光U复合启动功能 2504.08994v2

Authors (2): John Chidiac, Danielle Azar

Activation functions have been shown to affect the performance of deep neural networks significantly. While the Rectified Linear Unit (ReLU) remains the dominant choice in practice, the optimal activation function for deep neural networks remains an open research question. In this paper, we propose a novel parametric activation function, ReCA, based on ReLU, which has been shown to outperform all baselines on state-of-the-art datasets using different complex neural network architectures.

Article 819

Title@2025-07-26 (6): Efficient Shallow Ritz Method For 1D Diffusion-Reaction Problems

Title: Efficient Shallow Ritz Method For 1D Diffusion-Reaction Problems

Effiziente Ritz-Methode für 1D-Diffusionsreaktionsprobleme

用于1D 扩散反应问题的高效浅流机法 2407.01496v4

Authors (4): Zhiqiang Cai, Anastassia Doktorova, Robert D. Falgout, César Herrera

This paper studies the shallow Ritz method for solving one-dimensional diffusion-reaction problems. The method is capable of improving the order of approximation for non-smooth problems. By following a similar approach to the one presented in [9], we present a damped block Newton (dBN) method to achieve nearly optimal order of approximation. The dBN method optimizes the Ritz functional by alternating between the linear and non-linear parameters of the shallow ReLU neural network (NN). For diffusion-reaction problems, new difficulties arise: (1) for the linear parameters, the mass matrix is dense and even more ill-conditioned than the stiffness matrix, and (2) for the non-linear parameters, the Hessian matrix is dense and may be singular. This paper addresses these challenges, resulting in a dBN method with computational cost of ${\cal O}(n)$. The ideas presented for diffusion-reaction problems can also be applied to least-squares approximation problems. For both applications, starting with the non-linear parameters as a uniform partition, numerical experiments show that the dBN method moves the mesh points to nearly optimal locations.
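The split between linear and non-linear parameters can be made concrete with a minimal sketch of a one-dimensional shallow ReLU network. The specific form (a constant `c0` plus unit-slope ReLU neurons with breakpoints `b`) and all names are assumptions for illustration; the abstract does not give the authors' exact parameterization.

```python
def relu(z):
    """Rectified linear unit."""
    return z if z > 0.0 else 0.0


def shallow_relu(x, c0, c, b):
    """Evaluate u(x) = c0 + sum_i c[i] * relu(x - b[i]).

    c0 and c are the linear parameters; the breakpoints b are the
    non-linear parameters and act as movable mesh points of a
    continuous piecewise-linear function.
    """
    return c0 + sum(ci * relu(x - bi) for ci, bi in zip(c, b))


def uniform_breakpoints(n, left=0.0, right=1.0):
    """Uniform partition of [left, right), a common initial guess for b."""
    return [left + (right - left) * i / n for i in range(n)]


if __name__ == "__main__":
    b = uniform_breakpoints(4)
    print(b, shallow_relu(0.75, 0.0, [1.0, -2.0], [0.0, 0.5]))
```

For fixed `b` the function is linear in `c0` and `c`, which is why an alternating scheme can treat the two parameter groups separately.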

Article 820

Title@2025-07-26 (6): Tractable Representation Learning with Probabilistic Circuits

Title: Tractable Representation Learning with Probabilistic Circuits

Tractable Representative Learning mit probabilistischen Schaltungen

利用概率电路进行可追踪的代表性学习 2507.04385v2

Authors (6): Steven Braun, Sahil Sidheekh, Antonio Vergari, Martin Mundt, Sriraam Natarajan, Kristian Kersting

Probabilistic circuits (PCs) are powerful probabilistic models that enable exact and tractable inference, making them highly suitable for probabilistic reasoning and inference tasks. While representation learning is well established for neural networks, it remains underexplored for PCs, with prior approaches relying on external neural embeddings or activation-based encodings. To address this gap, we introduce autoencoding probabilistic circuits (APCs), a novel framework leveraging the tractability of PCs to model probabilistic embeddings explicitly. APCs extend PCs by jointly modeling data and embeddings, obtaining embedding representations through tractable probabilistic inference. The PC encoder allows the framework to natively handle arbitrary missing data and is seamlessly integrated with a neural decoder in a hybrid, end-to-end trainable architecture enabled by differentiable sampling. Our empirical evaluation demonstrates that APCs outperform existing PC-based autoencoding methods in reconstruction quality, generate embeddings competitive with those of neural autoencoders, and handle missing data more robustly than neural autoencoders. These results highlight APCs as a powerful and flexible representation learning method that exploits the probabilistic inference capabilities of PCs, showing promising directions for robust inference, out-of-distribution detection, and knowledge distillation.

Article 821

Title@2025-07-26 (6): Interleaved Multitask Learning with Energy Modulated Learning Progress

Title: Interleaved Multitask Learning with Energy Modulated Learning Progress

Interleaved Multitask Learning mit energiemoduliertem Lernfortschritt

利用能源调整的学习进度进行跨间多任务学习 2504.00707v2

Authors (5): Hanne Say, Suzan Ece Ada, Emre Ugur, Minoru Asada, Erhan Oztop

As humans learn new skills and apply their existing knowledge while maintaining previously learned information, “continual learning” in machine learning aims to incorporate new data while retaining and utilizing past knowledge. However, existing machine learning methods often do not mimic human learning, where tasks are intermixed due to individual preferences and environmental conditions. Humans typically switch between tasks instead of completely mastering one task before proceeding to the next. To explore how human-like task switching can enhance learning efficiency, we propose a multi-task learning architecture that alternates tasks based on task-agnostic measures such as “learning progress” and “neural computational energy expenditure”. To evaluate the efficacy of our method, we run several systematic experiments by using a set of effect-prediction tasks executed by a simulated manipulator robot. The experiments show that our approach surpasses random interleaved and sequential task learning in terms of average learning accuracy. Moreover, by including energy expenditure in the task switching logic, our approach can still perform favorably while reducing neural energy expenditure.
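One way to picture task switching driven by learning progress and energy expenditure is the sketch below. The scoring rule (recent drop in loss minus a weighted energy cost), the window size, and every name are hypothetical; the abstract does not specify how the two measures are combined.

```python
def learning_progress(losses, window=3):
    """Drop in mean loss between the previous window and the latest one."""
    if len(losses) < 2 * window:
        return float("inf")  # assumption: try under-sampled tasks first
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return previous - recent


def pick_task(loss_history, energy_cost, trade_off=0.1):
    """Pick the task with the best progress-minus-energy score (hypothetical rule)."""
    scores = {
        task: learning_progress(losses) - trade_off * energy_cost[task]
        for task, losses in loss_history.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    history = {
        "push": [1.0, 0.9, 0.8, 0.5, 0.4, 0.3],
        "lift": [1.0, 1.0, 1.0, 0.9, 0.9, 0.9],
    }
    energy = {"push": 1.0, "lift": 0.5}
    print(pick_task(history, energy, trade_off=0.1))
    print(pick_task(history, energy, trade_off=1.0))
```

Raising `trade_off` shifts the choice toward cheaper tasks, which mirrors the abstract's point that energy can be reduced while learning still proceeds.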

Article 822

Title@2025-07-26 (6): Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting

Title: Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting

Erklärung der Design-Wahl der Wahrscheinlichkeitspfade im Flow-Matching für Vorhersagen

说明预测流程匹配中概率路径的设计选择 2410.03229v3

Authors (7): Soon Hoe Lim, Yijin Wang, Annan Yu, Emma Hart, Michael W. Mahoney, Xiaoye S. Li, N. Benjamin Erichson

Flow matching has recently emerged as a powerful paradigm for generative modeling and has been extended to probabilistic time series forecasting in latent spaces. However, the impact of the specific choice of probability path model on forecasting performance remains under-explored. In this work, we demonstrate that forecasting spatio-temporal data with flow matching is highly sensitive to the selection of the probability path model. Motivated by this insight, we propose a novel probability path model designed to improve forecasting performance. Our empirical results across various dynamical system benchmarks show that our model achieves faster convergence during training and improved predictive performance compared to existing probability path models. Importantly, our approach is efficient during inference, requiring only a few sampling steps. This makes our proposed model practical for real-world applications and opens new avenues for probabilistic forecasting.
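For readers unfamiliar with the term, a probability path in flow matching specifies how a noise sample is transported to a data sample and which velocity the model should regress. The sketch below uses the common linear interpolation path as a baseline example; it is not the new path proposed in the paper, and the function names are illustrative.

```python
def linear_path_sample(x0, x1, t):
    """Point on the linear path x_t = (1 - t) * x0 + t * x1 and its velocity.

    x0 is a noise sample, x1 a data sample, and t lies in [0, 1]. For this
    path the regression target is the constant velocity x1 - x0.
    """
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return xt, target_velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    n = len(target_velocity)
    return sum((p - v) ** 2 for p, v in zip(predicted_velocity, target_velocity)) / n


if __name__ == "__main__":
    xt, v = linear_path_sample([0.0, 2.0], [1.0, 4.0], 0.5)
    print(xt, v, flow_matching_loss([1.0, 1.0], v))
```

Swapping in a different path changes both `xt` and the regression target, which is the design choice the abstract reports forecasting performance to be sensitive to.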

Article 823

Title@2025-07-26 (6): SoftPipe: A Soft-Guided Reinforcement Learning Framework for Automated Data Preparation

Title: SoftPipe: A Soft-Guided Reinforcement Learning Framework for Automated Data Preparation

SoftPipe: Ein Soft-Guided-Enforcement-Lernrahmen für die automatisierte Datenvorbereitung

SoftPipe: 自动数据编制软件辅助强化学习框架 2507.13710v2

Authors (6): Jing Chang, Chang Liu, Jinbin Huang, Shuyuan Zheng, Rui Mao, Jianbin Qin

Data preparation is a foundational yet notoriously challenging component of the machine learning lifecycle, characterized by a vast combinatorial search space. While reinforcement learning (RL) offers a promising direction, state-of-the-art methods suffer from a critical limitation: to manage the search space, they rely on rigid “hard constraints” that prematurely prune the search space and often preclude optimal solutions. To address this, we introduce SoftPipe, a novel RL framework that replaces these constraints with a flexible “soft guidance” paradigm. SoftPipe formulates action selection as a Bayesian inference problem. A high-level strategic prior, generated by a Large Language Model (LLM), probabilistically guides exploration. This prior is combined with empirical estimators from two sources through a collaborative process: a fine-grained quality score from a supervised Learning-to-Rank (LTR) model and a long-term value estimate from the agent’s Q-function. Through extensive experiments on 18 diverse datasets, we demonstrate that SoftPipe achieves up to a 13.9% improvement in pipeline quality and 2.8$\times$ faster convergence compared to existing methods.
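The idea of soft guidance, as opposed to hard pruning, can be illustrated with a small sketch in which a prior over actions is reweighted by two evidence terms and renormalized. The combination rule (prior times an exponential of the summed scores), the temperature, and the operator names are all hypothetical; the abstract does not give SoftPipe's actual inference formula.

```python
import math


def soft_guided_policy(prior, ltr_score, q_value, temperature=1.0):
    """Posterior over actions proportional to prior * exp((ltr + q) / temperature).

    Hypothetical rule for illustration. Every action with a positive prior
    keeps a positive probability, so nothing is pruned outright.
    """
    log_post = {
        a: math.log(prior[a]) + (ltr_score[a] + q_value[a]) / temperature
        for a in prior
    }
    shift = max(log_post.values())  # subtract the max for numerical stability
    unnormalized = {a: math.exp(lp - shift) for a, lp in log_post.items()}
    total = sum(unnormalized.values())
    return {a: u / total for a, u in unnormalized.items()}


if __name__ == "__main__":
    prior = {"impute": 0.5, "scale": 0.4, "drop": 0.1}  # e.g. from an LLM
    ltr = {"impute": 0.2, "scale": 0.9, "drop": 0.1}    # ranking model score
    q = {"impute": 0.1, "scale": 0.8, "drop": 0.0}      # long-term value
    print(soft_guided_policy(prior, ltr, q))
```

In this toy setting the evidence terms overturn the prior's preference for `impute` without ever assigning zero probability to `drop`.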

Article 824

Title@2025-07-26 (6): The Impact of Fine-tuning Large Language Models on Automated Program Repair

Title: The Impact of Fine-tuning Large Language Models on Automated Program Repair

Die Auswirkungen von Feinabstimmungen großer Sprachmodelle auf die automatisierte Programmreparatur

微调大语言模型对自动方案维修的影响 2507.19909v1

Authors (4): Roman Macháček, Anastasiia Grishina, Max Hort, Leon Moonen

Automated Program Repair (APR) uses various tools and techniques to help developers achieve functional and error-free code faster. In recent years, Large Language Models (LLMs) have gained popularity as components in APR tool chains because of their performance and flexibility. However, training such models requires a significant amount of resources. Fine-tuning techniques have been developed to adapt pre-trained LLMs to specific tasks, such as APR, and enhance their performance at far lower computational costs than training from scratch. In this study, we empirically investigate the impact of various fine-tuning techniques on the performance of LLMs used for APR. Our experiments provide insights into the performance of a selection of state-of-the-art LLMs pre-trained on code. The evaluation is done on three popular APR benchmarks (i.e., QuixBugs, Defects4J and HumanEval-Java) and considers six different LLMs with varying parameter sizes (resp. CodeGen, CodeT5, StarCoder, DeepSeekCoder, Bloom, and CodeLlama-2). We consider three training regimens: no fine-tuning, full fine-tuning, and parameter-efficient fine-tuning (PEFT) using LoRA and IA3. We observe that full fine-tuning techniques decrease the benchmarking performance of various models due to different data distributions and overfitting. By using parameter-efficient fine-tuning methods, we restrict models in the amount of trainable parameters and achieve better results. Keywords: large language models, automated program repair, parameter-efficient fine-tuning, AI4Code, AI4SE, ML4SE.
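To make the parameter savings of LoRA-style PEFT concrete, the sketch below applies a low-rank update to a frozen weight matrix and counts trainable parameters. It follows the usual LoRA formulation (frozen `W` plus a scaled product `B @ A`); the layer sizes are illustrative and not taken from the study, and IA3 is not covered.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute W x + (alpha / r) * B (A x), with W frozen and A, B trainable.

    W has shape (d_out, d_in), A has shape (r, d_in), B has shape (d_out, r).
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_out * d_in, r * (d_out + d_in)


if __name__ == "__main__":
    print(trainable_params(4096, 4096, 8))
    print(lora_forward([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]], [[1.0], [0.0]], [2.0, 3.0]))
```

For a 4096 by 4096 layer at rank 8, the adapter trains 65,536 values instead of roughly 16.8 million, which is the kind of restriction on trainable parameters the abstract refers to.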

Article 825

Title@2025-07-26 (6): Faithful Differentiable Reasoning with Reshuffled Region-based Embeddings

Title: Faithful Differentiable Reasoning with Reshuffled Region-based Embeddings

Treue differenzierbare Vernunft mit neugeschaffenen, regionsbasierten Einbettungen

以区域为基础的嵌入式 2406.09529v2

Authors (3): Aleksandar Pavlovic, Emanuel Sallinger, Steven Schockaert

Knowledge graph (KG) embedding methods learn geometric representations of entities and relations to predict plausible missing knowledge. These representations are typically assumed to capture rule-like inference patterns. However, our theoretical understanding of which inference patterns can be captured remains limited. Ideally, KG embedding methods should be expressive enough such that for any set of rules, there exist relation embeddings that exactly capture these rules. This principle has been studied within the framework of region-based embeddings, but existing models are severely limited in the kinds of rule bases that can be captured. We argue that this stems from the fact that entity embeddings are only compared in a coordinate-wise fashion. As an alternative, we propose RESHUFFLE, a simple model based on ordering constraints that can faithfully capture a much larger class of rule bases than existing approaches. Most notably, RESHUFFLE can capture bounded inference w.r.t. arbitrary sets of closed path rules. The entity embeddings in our framework can be learned by a Graph Neural Network (GNN), which effectively acts as a differentiable rule base.

Article 826

Title@2025-07-26 (6): TS-Insight: Visualizing Thompson Sampling for Verification and XAI

Title: TS-Insight: Visualizing Thompson Sampling for Verification and XAI

TS-Insight: Visualisierung der Thompson-Probenahme für Verifikation und XAI

TS-深入观察:可视化Thompson抽样核查和XAI 2507.19898v1

Authors (5): Parsa Vares, Éloi Durant, Jun Pang, Nicolas Médoc, Mohammad Ghoniem

Thompson Sampling (TS) and its variants are powerful Multi-Armed Bandit algorithms used to balance exploration and exploitation strategies in active learning. Yet, their probabilistic nature often turns them into a “black box”, hindering debugging and trust. We introduce TS-Insight, a visual analytics tool explicitly designed to shed light on the internal decision mechanisms of Thompson Sampling-based algorithms, for model developers. It comprises multiple plots, tracing for each arm the evolving posteriors, evidence counts, and sampling outcomes, enabling the verification, diagnosis, and explainability of exploration/exploitation dynamics. This tool aims at fostering trust and facilitating effective debugging and deployment in complex binary decision-making scenarios, especially in sensitive domains requiring interpretable decision-making.
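The quantities such a tool traces (per-arm posteriors, evidence counts, and sampled values) come out of a loop like the following minimal Beta-Bernoulli Thompson Sampling sketch. It uses a uniform Beta(1, 1) prior and simulated binary rewards; the function names and reward probabilities are illustrative and unrelated to TS-Insight's own code.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample each arm's Beta posterior and pick the arm with the largest draw."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    arm = max(range(len(samples)), key=samples.__getitem__)
    return arm, samples


def run_bandit(true_probs, steps, seed=0):
    """Run Thompson Sampling on simulated Bernoulli arms; return evidence counts."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(steps):
        arm, _ = thompson_step(successes, failures, rng)
        if rng.random() < true_probs[arm]:
            successes[arm] += 1  # posterior becomes Beta(s + 1, f + 1)
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    print(run_bandit([0.2, 0.8], steps=500))
```

Logging `samples` at every step alongside the counts gives the raw material for the posterior and sampling-outcome plots described in the abstract.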

Article 827

Title@2025-07-26 (6): Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach

Title: Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach

Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach

用于群分反馈线性水量最佳控制的非confvex优化框架。二:非惩罚性办法 2507.19895v1

Authors (3): Lechen Feng, Xun Li, Yuan-Hua Ni

This work is a companion paper of [8], where the distributed linear-quadratic problem with fixed communication topology (DFT-LQ) and the sparse feedback LQ problem (SF-LQ) are formulated into a nonsmooth and nonconvex optimization problem with affine constraints. Moreover, a penalty approach is considered in [8], and the PALM (proximal alternating linearized minimization) algorithm is studied with convergence and complexity analysis. In this paper, we aim to address the inherent drawbacks of the penalty approach, such as the challenge of tuning the penalty parameter and the risk of introducing spurious stationary points. Specifically, we first reformulate the SF-LQ problem and the DFT-LQ problem from an epi-composition function perspective, aiming to solve the constrained problem directly. Then, from a theoretical viewpoint, we revisit the alternating direction method of multipliers (ADMM) and establish its convergence to the set of cluster points under certain assumptions. When these assumptions do not hold, we can effectively utilize alternative approaches combining subgradient descent with Difference-of-Convex relaxation methods. In summary, our results enable the direct design of group-sparse feedback gains with theoretical guarantees, without resorting to convex surrogates, restrictive structural assumptions, or penalty formulations that incorporate constraints into the cost function.

Article 828

Title@2025-07-26 (6): A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction

Title: A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction

Eine Umfrage zum Generativen Modell-Unlearning: Grundlagen, Taxonomie, Evaluation und Zukunftsrichtung

关于 “ 产生示范式 “ 学习:基本原理、分类学、评价和未来方向的调查 “ 2507.19894v1

Authors (9): Xiaohua Feng, Jiaming Zhang, Fengyuan Yu, Chengye Wang, Li Zhang, Kaixiang Li, Yuyuan Li, Chaochao Chen, Jianwei Yin

With the rapid advancement of generative models, associated privacy concerns have attracted growing attention. To address this, researchers have begun adapting machine unlearning techniques from traditional classification models to generative settings. Although notable progress has been made in this area, a unified framework for systematically organizing and integrating existing work is still lacking. The substantial differences among current studies in terms of unlearning objectives and evaluation protocols hinder the objective and fair comparison of various approaches. While some studies focus on specific types of generative models, they often overlook the commonalities and systematic characteristics inherent in Generative Model Unlearning (GenMU). To bridge this gap, we provide a comprehensive review of current research on GenMU and propose a unified analytical framework for categorizing unlearning objectives, methodological strategies, and evaluation metrics. In addition, we explore the connections between GenMU and related techniques, including model editing, reinforcement learning from human feedback, and controllable generation. We further highlight the potential practical value of unlearning techniques in real-world applications. Finally, we identify key challenges and outline future research directions aimed at laying a solid foundation for further advancements in this field. We consistently maintain the related open-source materials at https://github.com/caxLee/Generative-model-unlearning-survey.

Article 829

Title@2025-07-26 (6): CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation

Title: CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation

CLoRA: Parameter-Effizientes kontinuierliches Lernen mit Low-Rank-Anpassung

CLORA:低Rank适应的参数有效持续学习 2507.19887v1

Authors (3): Shishir Muralidhara, Didier Stricker, René Schuster

In the past, continual learning (CL) was mostly concerned with the problem of catastrophic forgetting in neural networks, which arises when incrementally learning a sequence of tasks. Current CL methods function within the confines of limited data access, without any restrictions imposed on computational resources. However, in real-world scenarios, the latter takes precedence as deployed systems are often computationally constrained. A major drawback of most CL methods is the need to retrain the entire model for each new task. The computational demands of retraining large models can be prohibitive, limiting the applicability of CL in environments with limited resources. Through CLoRA, we explore the applicability of Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, to class-incremental semantic segmentation. CLoRA leverages a small set of parameters of the model and uses the same set for learning across all tasks. Results demonstrate the efficacy of CLoRA, which performs on par with or better than the baseline methods. We further evaluate CLoRA using NetScore, underscoring the need to factor in resource efficiency and evaluate CL methods beyond task performance. CLoRA significantly reduces the hardware requirements for training, making it well-suited for CL in resource-constrained environments after deployment.

Article 830

Title@2025-07-26 (6): CoSTI: Consistency Models for (a faster) Spatio-Temporal Imputation

Title: CoSTI: Consistency Models for (a faster) Spatio-Temporal Imputation

CoSTI: Konsistenzmodelle für (eine schnellere) Spatio-Temporale Imputation

COSTI:(更快的)SPatio-Te时截肢的一致模型 2501.19364v2

Authors (4): Javier Solís-García, Belén Vega-Márquez, Juan A. Nepomuceno, Isabel A. Nepomuceno-Chamorro

Multivariate Time Series Imputation (MTSI) is crucial for many applications, such as healthcare monitoring and traffic management, where incomplete data can compromise decision-making. Existing state-of-the-art methods, like Denoising Diffusion Probabilistic Models (DDPMs), achieve high imputation accuracy; however, they suffer from significant computational costs and are notably time-consuming due to their iterative nature. In this work, we propose CoSTI, an innovative adaptation of Consistency Models (CMs) for the MTSI domain. CoSTI employs Consistency Training to achieve comparable imputation quality to DDPMs while drastically reducing inference times, making it more suitable for real-time applications. We evaluate CoSTI across multiple datasets and missing data scenarios, demonstrating up to a 98% reduction in imputation time with performance on par with diffusion-based models. This work bridges the gap between efficiency and accuracy in generative imputation tasks, providing a scalable solution for handling missing data in critical spatio-temporal systems.

Article 831

Title@2025-07-26 (6): DRL-AdaPart: DRL-Driven Adaptive STAR-RIS Partitioning for Fair and Frugal Resource Utilization

Title: DRL-AdaPart: DRL-Driven Adaptive STAR-RIS Partitioning for Fair and Frugal Resource Utilization

DRL-AdaPart: DRL-getriebene adaptive STAR-RIS-Partitionierung für faire und frugale Ressourcennutzung

DRL-AdaPart: DRL-Drive DRL-Drive 适应性STAR-风险研究分割,促进公平和节节节利用资源 2407.06868v2

Authors (4): Ashok S. Kumar, Nancy Nayak, Sheetal Kalyani, Himal A. Suraweera

In this work, we propose a method for efficient resource utilization of simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) elements to ensure fair and high data rates. We introduce a subsurface assignment variable that determines the number of STAR-RIS elements allocated to each user and maximizes the sum of the data rates by jointly optimizing the phase shifts of the STAR-RIS and the subsurface assignment variables using an appropriately tailored deep reinforcement learning (DRL) algorithm. The proposed DRL method is also compared with a Dinkelbach algorithm and the designed hybrid DRL approach. A penalty term is incorporated into the DRL model to enhance resource utilization by intelligently deactivating STAR-RIS elements when not required. Extensive simulations show that the proposed DRL method achieves fair and high data rates for static and mobile users while ensuring efficient resource utilization. Using the proposed DRL method, up to 27% and 21% of STAR-RIS elements can be deactivated in static and mobile scenarios, respectively, without affecting performance.

Article 832

Title@2025-07-26 (6): RestoreAI – Pattern-based Risk Estimation Of Remaining Explosives

Title: RestoreAI – Pattern-based Risk Estimation Of Remaining Explosives

RestoreAI – Musterbasierte Risikoschätzung von verbleibenden Sprengstoffen

Res恢复AI – – 基于模式的剩余爆炸物风险估计 2507.19873v1

Authors (3): Björn Kischelewski, Benjamin Guedj, David Wahl

Landmine removal is a slow, resource-intensive process affecting over 60 countries. While AI has been proposed to enhance explosive ordnance (EO) detection, existing methods primarily focus on object recognition, with limited attention to prediction of landmine risk based on spatial pattern information. This work aims to answer the following research question: How can AI be used to predict landmine risk from landmine patterns to improve clearance time efficiency? To that effect, we introduce RestoreAI, an AI system for pattern-based risk estimation of remaining explosives. RestoreAI is the first AI system that leverages landmine patterns for risk prediction, improving the accuracy of estimating the residual risk of missing EO prior to land release. We particularly focus on the implementation of three instances of RestoreAI, respectively, linear, curved and Bayesian pattern deminers. First, the linear pattern deminer uses linear landmine patterns from a principal component analysis (PCA) for the landmine risk prediction. Second, the curved pattern deminer uses curved landmine patterns from principal curves. Finally, the Bayesian pattern deminer incorporates prior expert knowledge by using a Bayesian pattern risk prediction. Evaluated on real-world landmine data, RestoreAI significantly boosts clearance efficiency. The top-performing pattern-based deminers achieved a 14.37 percentage point increase in the average share of cleared landmines per timestep and required 24.45% less time than the best baseline deminer to locate all landmines. Interestingly, linear and curved pattern deminers showed no significant performance difference, suggesting that more efficient linear patterns are a viable option for risk prediction.

Article 833

Title@2025-07-26 (6): Numerical Artifacts in Learning Dynamical Systems

Title: Numerical Artifacts in Learning Dynamical Systems

Numerische Artefakte im Lernen dynamischer Systeme

学习动态系统中的数值手法 2507.14491v2

Authors (2): Bing-Ze Lu, Richard Tsai

In many applications, one needs to learn a dynamical system from its solutions sampled at a finite number of time points. The learning problem is often formulated as an optimization problem over a chosen function class. However, in the optimization procedure, it is necessary to employ a numerical scheme to integrate candidate dynamical systems and assess how their solutions fit the data. This paper reveals potentially serious effects of a chosen numerical scheme on the learning outcome. In particular, our analysis demonstrates that a damped oscillatory system may be incorrectly identified as having “anti-damping” and exhibiting a reversed oscillation direction, despite adequately fitting the given data points.
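A textbook effect gives some intuition for how an integrator can distort damping. For the damped oscillator x'' + 2*gamma*x' + omega^2*x = 0 with gamma < omega, explicit Euler multiplies the squared state norm (in eigen-coordinates) by 1 - 2*h*gamma + (h*omega)^2 per step, so the discrete trajectory grows whenever h > 2*gamma/omega^2 even though the true system decays. The sketch below checks this numerically. It is an illustration under the assumption of explicit Euler, not a reproduction of the paper's analysis.

```python
def euler_growth_factor_sq(gamma, omega, h):
    """Squared per-step amplification |1 + h*lambda|^2 of explicit Euler.

    lambda = -gamma +/- i*sqrt(omega^2 - gamma^2) are the eigenvalues of the
    underdamped oscillator x'' + 2*gamma*x' + omega^2*x = 0.
    """
    return 1.0 - 2.0 * h * gamma + (h * omega) ** 2


def euler_simulate(gamma, omega, h, steps, x0=1.0, v0=0.0):
    """Integrate the oscillator with explicit Euler; return the energy per step."""
    x, v = x0, v0
    energies = []
    for _ in range(steps):
        # Both updates use the old (x, v), as explicit Euler requires.
        x, v = x + h * v, v + h * (-omega ** 2 * x - 2.0 * gamma * v)
        energies.append(0.5 * v * v + 0.5 * omega ** 2 * x * x)
    return energies


if __name__ == "__main__":
    for h in (0.01, 0.5):
        e = euler_simulate(gamma=0.05, omega=1.0, h=h, steps=200)
        print(h, euler_growth_factor_sq(0.05, 1.0, h), e[0], e[-1])
```

A learner that fits Euler rollouts to genuinely decaying data therefore has to compensate for the scheme's own amplification or dissipation, which is one way the identified damping can end up differing from the true one.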

Article 834

Title@2025-07-26 (6): Quantum-Informed Machine Learning for Chaotic Systems

Title: Quantum-Informed Machine Learning for Chaotic Systems

Quanteninformiertes maschinelles Lernen für chaotische Systeme

半量量制系统半成型机器学习 2507.19861v1

Authors (3): Maida Wang, Xiao Xue, Peter V. Coveney

Learning the behaviour of chaotic systems remains challenging due to instability in long-term predictions and difficulties in accurately capturing invariant statistical properties. While quantum machine learning offers a promising route to efficiently capture physical properties from high-dimensional data, its practical deployment is hindered by current hardware noise and limited scalability. We introduce a quantum-informed machine learning framework for learning partial differential equations, with an application focus on chaotic systems. A quantum circuit Born machine is employed to learn the invariant properties of chaotic dynamical systems, achieving substantial memory efficiency by representing these complex physical statistics with a compact set of trainable circuit parameters. This approach reduces the data storage requirement by over two orders of magnitude compared to the raw simulation data. The resulting statistical quantum-informed prior is then incorporated into a Koopman-based auto-regressive model to address issues such as gradient vanishing or explosion, while maintaining long-term statistical fidelity. The framework is evaluated on three representative systems: the Kuramoto-Sivashinsky equation, two-dimensional Kolmogorov flow and turbulent channel flow. In all cases, the quantum-informed model achieves superior performance compared to its classical counterparts without quantum priors. This hybrid architecture offers a practical route for learning dynamical systems using near-term quantum hardware.

Article 835

Title@2025-07-26 (6): Training Neural Networks for Modularity aids Interpretability

Title: Training Neural Networks for Modularity aids Interpretability

Ausbildung Neuronale Netzwerke für Modularitätshilfen Dolmetschbarkeit

模块辅助工具神经网络培训 2409.15747v2

Authors (3): Satvik Golechha, Dylan Cope, Nandi Schoots

An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We find pretrained models to be highly unclusterable and thus train models to be more modular using an “enmeshment loss” function that encourages the formation of non-interacting clusters. Using automated interpretability measures, we show that our method finds clusters that learn different, disjoint, and smaller circuits for CIFAR-10 labels. Our approach provides a promising direction for making neural networks easier to interpret.

Article 836

Title@2025-07-26 (6): Taming Domain Shift in Multi-source CT-Scan Classification via Input-Space Standardization

Title: Taming Domain Shift in Multi-source CT-Scan Classification via Input-Space Standardization

Domänenumschichtung in der CT-Scan-Klassifikation mit mehreren Quellen mittels Input-Space Standardisierung

通过输入空间标准化实现多源CT-scan分类的多源CT-Scan域变换 2507.19858v1

Authors (9): Chia-Ming Lee, Bo-Cheng Qiu, Ting-Yao Chen, Ming-Han Sun, Fang-Ying Lin, Jung-Tse Tsai, I-An Tsai, Yu-Fan Lin, Chih-Chung Hsu

Multi-source CT-scan classification suffers from domain shifts that impair cross-source generalization. While preprocessing pipelines combining Spatial-Slice Feature Learning (SSFL++) and Kernel-Density-based Slice Sampling (KDS) have shown empirical success, the mechanisms underlying their domain robustness remain underexplored. This study analyzes how this input-space standardization manages the trade-off between local discriminability and cross-source generalization. The SSFL++ and KDS pipeline performs spatial and temporal standardization to reduce inter-source variance, effectively mapping disparate inputs into a consistent target space. This preemptive alignment mitigates domain shift and simplifies the learning task for network optimization. Experimental validation demonstrates consistent improvements across architectures, proving the benefits stem from the preprocessing itself. The approach’s effectiveness was validated by securing first place in a competitive challenge, supporting input-space standardization as a robust and practical solution for multi-institutional medical imaging.

Article 837

Title@2025-07-26 (6): MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation

Title: MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation

MultiKernelBench: Ein Multi-Platform Benchmark für die Kernel-Generation

多KenneelBench: 核心生成的多平台基准 2507.17773v2

Authors (6): Zhongzhen Wen, Yinghui Zhang, Zhong Li, Zhongxin Liu, Linna Xie, Tian Zhang

The automatic generation of deep learning (DL) kernels using large language models (LLMs) has emerged as a promising approach to reduce the manual effort and hardware-specific expertise required for writing high-performance operator implementations. However, existing benchmarks for evaluating LLMs in this domain suffer from limited hardware support, coarse-grained kernel categorization, and imbalanced task coverage. To address these limitations, we introduce MultiKernelBench, the first comprehensive, multi-platform benchmark for LLM-based DL kernel generation. MultiKernelBench spans 285 tasks across 14 well-defined kernel categories and supports three major hardware platforms: Nvidia GPUs, Huawei NPUs, and Google TPUs. To enable future extensibility, we design a modular backend abstraction layer that decouples platform-specific logic from the core benchmarking infrastructure, allowing easy integration of new hardware platforms. We further propose a simple yet effective category-aware one-shot prompting method that improves generation quality by providing in-category exemplars. Through systematic evaluations of seven state-of-the-art LLMs, we reveal significant variation in task difficulty, poor generalization to platforms with less training exposure, and the effectiveness of targeted prompting strategies. MultiKernelBench is publicly available at https://github.com/wzzll123/MultiKernelBench.

Article 838

Title@2025-07-26 (6): Agentic Reinforced Policy Optimization

Title: Agentic Reinforced Policy Optimization

Agentische verstärkte politische Optimierung

强化政策优化 2507.19849v1

Authors (14): Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou

Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. In realistic reasoning scenarios, LLMs can often utilize external tools to assist in task-solving processes. However, current RL algorithms inadequately balance the models’ intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. To bridge this gap, we propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents. Through preliminary experiments, we observe that LLMs tend to exhibit highly uncertain behavior, characterized by an increase in the entropy distribution of generated tokens, immediately following interactions with external tools. Motivated by this observation, ARPO incorporates an entropy-based adaptive rollout mechanism, dynamically balancing global trajectory sampling and step-level sampling, thereby promoting exploration at steps with high uncertainty after tool usage. By integrating an advantage attribution estimation, ARPO enables LLMs to internalize advantage differences in stepwise tool-use interactions. Our experiments across 13 challenging benchmarks in computational reasoning, knowledge reasoning, and deep search domains demonstrate ARPO’s superiority over trajectory-level RL algorithms. Remarkably, ARPO achieves improved performance using only half of the tool-use budget required by existing methods, offering a scalable solution for aligning LLM-based agents with real-time dynamic environments. Our code and datasets are released at https://github.com/dongguanting/ARPO
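The entropy signal mentioned above can be sketched in a few lines. This is only an illustration: the abstract does not specify ARPO's actual rollout rule, so `should_branch` and its `threshold` are hypothetical stand-ins for "sample more at steps whose uncertainty rises after a tool call".

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_branch(entropy_before_tool, entropy_after_tool, threshold=0.5):
    """Hypothetical rule: request extra step-level rollouts when the entropy
    rises by more than `threshold` after a tool response."""
    return (entropy_after_tool - entropy_before_tool) > threshold

# Toy distributions: confident before the tool call, near-uniform after it.
before = [0.9, 0.05, 0.05]
after = [0.25, 0.25, 0.25, 0.25]
print(should_branch(token_entropy(before), token_entropy(after)))
```

A fixed threshold is the simplest possible choice here; an adaptive scheme would replace it with a quantity estimated during training.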

Article 839

Title@2025-07-26 (6): VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets

Title: VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets

VAE-GAN-basierte Preismanipulation in koordinierten lokalen Energiemärkten

VAE-GAN 协调的地方能源市场价格操纵 2507.19844v1

Authors (6): Biswarup Mukherjee, Li Zhou, S. Gokul Krishnan, Milad Kabirifar, Subhash Lakshminarayana, Charalambos Konstantinou

This paper introduces a model for coordinating prosumers with heterogeneous distributed energy resources (DERs), participating in the local energy market (LEM) that interacts with the market-clearing entity. The proposed LEM scheme utilizes a data-driven, model-free reinforcement learning approach based on the multi-agent deep deterministic policy gradient (MADDPG) framework, enabling prosumers to make real-time decisions on whether to buy, sell, or refrain from any action while facilitating efficient coordination for optimal energy trading in a dynamic market. In addition, we investigate a price manipulation strategy using a variational autoencoder-generative adversarial network (VAE-GAN) model, which allows utilities to adjust price signals in a way that induces financial losses for the prosumers. Our results show that under adversarial pricing, heterogeneous prosumer groups, particularly those lacking generation capabilities, incur financial losses. The same outcome holds across LEMs of different sizes. As the market size increases, trading stabilizes and fairness improves through emergent cooperation among agents.

Article 840

Title@2025-07-26 (6): Hybrid Deep Learning and Handcrafted Feature Fusion for Mammographic Breast Cancer Classification

Title: Hybrid Deep Learning and Handcrafted Feature Fusion for Mammographic Breast Cancer Classification

Hybrides Deep Learning und handwerkliche Feature Fusion für die mammographische Brustkrebsklassifikation

哺乳性乳腺癌分类的深层学习和手工制作的特征融合混合体 2507.19843v1

Authors (3): Maximilian Tschuchnig, Michael Gadermayr, Khalifa Djemal

Automated breast cancer classification from mammography remains a significant challenge due to subtle distinctions between benign and malignant tissue. In this work, we present a hybrid framework combining deep convolutional features from a ResNet-50 backbone with handcrafted descriptors and transformer-based embeddings. Using the CBIS-DDSM dataset, we benchmark our ResNet-50 baseline (AUC: 78.1%) and demonstrate that fusing handcrafted features with deep ResNet-50 and DINOv2 features improves AUC to 79.6% (setup d1), with a peak recall of 80.5% (setup d1) and highest F1 score of 67.4% (setup d1). Our experiments show that handcrafted features not only complement deep representations but also enhance performance beyond transformer-based embeddings. This hybrid fusion approach achieves results comparable to state-of-the-art methods while maintaining architectural simplicity and computational efficiency, making it a practical and effective solution for clinical decision support.

Article 841

Title@2025-07-26 (6): A Low-complexity Structured Neural Network to Realize States of Dynamical Systems

Title: A Low-complexity Structured Neural Network to Realize States of Dynamical Systems

Ein strukturiertes neurales Netzwerk mit geringer Komplexität zur Realisierung von Zuständen dynamischer Systeme

实现动态系统状态的低复杂结构神经网络 2503.23697v2

Authors (4): Hansaka Aluvihare, Levi Lingsch, Xianqi Li, Sirani M. Perera

Data-driven learning is rapidly evolving and places a new perspective on realizing state-space dynamical systems. However, dynamical systems derived from nonlinear ordinary differential equations (ODEs) suffer from limitations in computational efficiency. Thus, this paper stems from data-driven learning to advance states of dynamical systems utilizing a structured neural network (StNN). The proposed learning technique also seeks to identify an optimal, low-complexity operator to solve dynamical systems, the so-called Hankel operator, derived from time-delay measurements. Thus, we utilize the StNN based on the Hankel operator to solve dynamical systems as an alternative to existing data-driven techniques. We show that the proposed StNN reduces the number of parameters and computational complexity compared with conventional neural networks and also with the classical data-driven techniques, such as Sparse Identification of Nonlinear Dynamics (SINDy) and Hankel Alternative view of Koopman (HAVOK), which is commonly known as delay-Dynamic Mode Decomposition (DMD) or Hankel-DMD. More specifically, we present numerical simulations to solve dynamical systems utilizing the StNN based on the Hankel operator beginning from the fundamental Lotka-Volterra model, where we compare the StNN with the LEarning Across Dynamical Systems (LEADS), and extend our analysis to highly nonlinear and chaotic Lorenz systems, comparing the StNN with conventional neural networks, SINDy, and HAVOK. Hence, we show that the proposed StNN paves the way for realizing state-space dynamical systems with a low-complexity learning algorithm, enabling prediction and understanding of future states.
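For readers unfamiliar with Hankel operators built from time-delay measurements, the following minimal sketch shows the standard construction for a scalar series, where entry (i, j) holds sample x[i + j]. It does not reproduce the StNN itself; the function name and the `num_delays` parameter are assumptions.

```python
import numpy as np

def hankel_from_series(x, num_delays):
    """Stack `num_delays` time-shifted copies of a scalar series so that
    H[i, j] = x[i + j] (constant along anti-diagonals)."""
    x = np.asarray(x, dtype=float)
    num_cols = x.size - num_delays + 1
    if num_cols < 1:
        raise ValueError("series is too short for the requested number of delays")
    return np.stack([x[i:i + num_cols] for i in range(num_delays)])

H = hankel_from_series([1, 2, 3, 4, 5], num_delays=3)
print(H)
```

Delay-embedding methods such as Hankel-DMD typically factor or regress on a matrix of this form; the structure is what allows low-complexity operators.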

Article 842

Title: GNSP: Gradient Null Space Projection for Preserving Cross-Modal Alignment in VLMs Continual Learning

GNSP: Gradient Null Raumprojektion zur Erhaltung der Cross-Modal Alignment in VLMs Continual Learning

GNSP: 用于在VLMs 持续学习中保持跨模式一致的渐进号空间预测 2507.19839v1

Authors (5): Tiantian Peng, Yuyang Liu, Shuo Yang, Qiuhe Hong, YongHong Tian

Contrastive Language-Image Pretraining has demonstrated remarkable zero-shot generalization by aligning visual and textual modalities in a shared embedding space. However, when continuously fine-tuned on diverse tasks, CLIP suffers from catastrophic forgetting and degradation of its embedding alignment, undermining its zero-shot capabilities. In this work, we propose Gradient Null Space Projection (GNSP), an efficient continual learning method that projects task-specific gradients onto the null space of previously learned knowledge. This orthogonal projection mathematically prevents interference with previous tasks without relying on rehearsal or architectural modification. Furthermore, to preserve the inherent generalization property of CLIP, we introduce knowledge distillation and combine it with a modality alignment preservation loss inspired by CLIP pre-training to stabilize the structure of the multimodal embedding space during fine-tuning. On the MTIL benchmark consisting of 11 tasks, our method achieved SOTA performance on both the Average and Last key metrics. More importantly, experiments show that our method successfully maintains the original modality gap and cross-modal retrieval performance of CLIP, confirming its effectiveness in maintaining a robust visual-language space throughout the continual learning process.
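The null-space projection idea can be sketched for a single linear layer. This is a generic illustration under stated assumptions, not the GNSP implementation: `prev_features` stands for inputs seen on earlier tasks, and the projector is built from an SVD so that a projected update leaves the layer's outputs on those inputs unchanged.

```python
import numpy as np

def null_space_projector(prev_features, rel_tol=1e-10):
    """Return P = I - V_r V_r^T, where the columns of V_r span the row space
    of `prev_features` (shape: n_samples x dim). Then prev_features @ P = 0."""
    _, s, vt = np.linalg.svd(prev_features, full_matrices=False)
    rank = int(np.sum(s > rel_tol * s.max())) if s.size else 0
    v_r = vt[:rank].T
    return np.eye(prev_features.shape[1]) - v_r @ v_r.T

rng = np.random.default_rng(0)
prev_features = rng.normal(size=(3, 5))   # hypothetical inputs from old tasks
grad = rng.normal(size=(4, 5))            # gradient of a 4x5 weight matrix

P = null_space_projector(prev_features)
projected_grad = grad @ P
# Change in outputs on old inputs caused by a step along the projected gradient.
print(np.abs(projected_grad @ prev_features.T).max())
```

Because the old inputs here span only 3 of 5 dimensions, a two-dimensional subspace remains free for learning the new task.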

Article 843

Title@2025-07-26 (6): Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact

Title: Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact

Bewertung des Selbstüberwachten Lernens in der medizinischen Bildgebung: Ein Maßstab für Robustheit, Verallgemeinerbarkeit und Multi-Domain-Impact

评价医疗成像方面的自我监督学习:强力、通用性和多领域影响基准 2412.19124v2

Authors (7): Valay Bundele, Karahan Sarıtaş, Bora Kargi, Oğuz Ata Çal, Kıvanç Tezören, Zohreh Ghaderi, Hendrik Lensch

Self-supervised learning (SSL) has emerged as a promising paradigm in medical imaging, addressing the chronic challenge of limited labeled data in healthcare settings. While SSL has shown impressive results, existing studies in the medical domain are often limited in scope, focusing on specific datasets or modalities, or evaluating only isolated aspects of model performance. This fragmented evaluation approach poses a significant challenge, as models deployed in critical medical settings must not only achieve high accuracy but also demonstrate robust performance and generalizability across diverse datasets and varying conditions. To address this gap, we present a comprehensive evaluation of SSL methods within the medical domain, with a particular focus on robustness and generalizability. Using the MedMNIST dataset collection as a standardized benchmark, we evaluate 8 major SSL methods across 11 different medical datasets. Our study provides an in-depth analysis of model performance in both in-domain scenarios and the detection of out-of-distribution (OOD) samples, while exploring the effect of various initialization strategies, model architectures, and multi-domain pre-training. We further assess the generalizability of SSL methods through cross-dataset evaluations and the in-domain performance with varying label proportions (1%, 10%, and 100%) to simulate real-world scenarios with limited supervision. We hope this comprehensive benchmark helps practitioners and researchers make more informed decisions when applying SSL methods to medical applications.

Article 844

Title@2025-07-26 (6): On the rates of convergence for learning with convolutional neural networks

Title: On the rates of convergence for learning with convolutional neural networks

Über die Konvergenzraten für das Lernen mit konvolutionären neuronalen Netzwerken

与进进进神经网络学习的趋同率 2403.16459v3

Authors (3): Yunfei Yang, Han Feng, Ding-Xuan Zhou

We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels. Our first result proves a new approximation bound for CNNs with certain constraints on the weights. Our second result gives a new analysis on the covering number of feed-forward neural networks with CNNs as special cases. The analysis carefully takes into account the size of the weights and hence gives better bounds than the existing literature in some situations. Using these two results, we are able to derive rates of convergence for estimators based on CNNs in many learning problems. In particular, we establish minimax optimal convergence rates of the least squares based on CNNs for learning smooth functions in the nonparametric regression setting. For binary classification, we derive convergence rates for CNN classifiers with hinge loss and logistic loss. It is also shown that the obtained rates for classification are minimax optimal in some common settings.

Article 845

Title@2025-07-26 (6): FedBAP: Backdoor Defense via Benign Adversarial Perturbation in Federated Learning

Title: FedBAP: Backdoor Defense via Benign Adversarial Perturbation in Federated Learning

FedBAP: Backdoor Defense via Benign Adversarial Perturbation im Federated Learning

FedBAP:通过联邦学习中的Benign Aversarial Proturbidation进行后门防御 2507.21177v1

Authors (6): Xinhai Yan, Libing Wu, Zhuangzhuang Zhang, Bingyi Liu, Lijuan Huo, Jing Wang

Federated Learning (FL) enables collaborative model training while preserving data privacy, but it is highly vulnerable to backdoor attacks. Most existing defense methods in FL have limited effectiveness due to their neglect of the model’s over-reliance on backdoor triggers, particularly as the proportion of malicious clients increases. In this paper, we propose FedBAP, a novel defense framework for mitigating backdoor attacks in FL by reducing the model’s reliance on backdoor triggers. Specifically, first, we propose a perturbed trigger generation mechanism that creates perturbation triggers precisely matching backdoor triggers in location and size, ensuring strong influence on model outputs. Second, we utilize these perturbation triggers to generate benign adversarial perturbations that disrupt the model’s dependence on backdoor triggers while forcing it to learn more robust decision boundaries. Finally, we design an adaptive scaling mechanism to dynamically adjust perturbation intensity, effectively balancing defense strength and model performance. The experimental results demonstrate that FedBAP reduces the attack success rates by 0.22%-5.34%, 0.48%-6.34%, and 97.22%-97.6% under three types of backdoor attacks, respectively. In particular, FedBAP demonstrates outstanding performance against novel backdoor attacks.

Article 846

Title@2025-07-26 (6): HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs

Title: HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs

HCAtention: Extreme KV Cache Compression via Heterogenes Aufmerksamkeitsrechnen für LLMs

HCAttention:通过不同式注意计算法对LLMs进行极端KV缓存压缩 2507.19823v1

Authors (5): Dongquan Yang, Yifan Yang, Xiaotian Yu, Xianbiao Qi, Rong Xiao

Processing long-context inputs with large language models presents a significant challenge due to the enormous memory requirements of the Key-Value (KV) cache during inference. Existing KV cache compression methods exhibit noticeable performance degradation when memory is reduced by more than 85%. Additionally, strategies that leverage GPU-CPU collaboration for approximate attention remain underexplored in this setting. We propose HCAttention, a heterogeneous attention computation framework that integrates key quantization, value offloading, and dynamic KV eviction to enable efficient inference under extreme memory constraints. The method is compatible with existing transformer architectures and does not require model fine-tuning. Experimental results on the LongBench benchmark demonstrate that our approach preserves the accuracy of the full-attention model while shrinking the KV cache memory footprint to 25% of its original size. Remarkably, it stays competitive with only 12.5% of the cache, setting a new state-of-the-art in LLM KV cache compression. To the best of our knowledge, HCAttention is the first to extend the Llama-3-8B model to process 4 million tokens on a single A100 GPU with 80GB memory.

Article 847

Title@2025-07-26 (6): Debunking Optimization Myths in Federated Learning for Medical Image Classification

Title: Debunking Optimization Myths in Federated Learning for Medical Image Classification

Debunking Optimization Myths in Federated Learning für medizinische Bildklassifikation

联邦医学图像分类学习联合会中最优化的神话 2507.19822v1

Authors (5): Youngjoon Lee, Hyukjoon Lee, Jinu Gong, Yang Cao, Joonhyuk Kang

Federated Learning (FL) is a collaborative learning method that enables decentralized model training while preserving data privacy. Despite its promise in medical imaging, recent FL methods are often sensitive to local factors such as optimizers and learning rates, limiting their robustness in practical deployments. In this work, we revisit vanilla FL to clarify the impact of edge device configurations, benchmarking recent FL methods on colorectal pathology and blood cell classification tasks. We numerically show that the choice of local optimizer and learning rate has a greater effect on performance than the specific FL method. Moreover, we find that increasing local training epochs can either enhance or impair convergence, depending on the FL method. These findings indicate that appropriate edge-specific configuration is more crucial than algorithmic complexity for achieving effective FL.
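To make the "edge configuration" knobs concrete, here is a minimal sketch of one vanilla FedAvg round with local gradient steps, assuming toy quadratic client objectives. The helper names, targets, and learning rates are illustrative and are not taken from the paper's experiments.

```python
import numpy as np

def local_sgd(w, grad_fn, lr, epochs):
    """Run `epochs` full-batch gradient steps on one client (plain SGD assumed)."""
    for _ in range(epochs):
        w = w - lr * grad_fn(w)
    return w

def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Toy clients: each minimizes 0.5 * ||w - target||^2, so grad(w) = w - target.
targets = [np.array([0.0, 0.0]), np.array([2.0, 4.0])]
sizes = [1, 3]
w_global = np.zeros(2)
for lr in (0.1, 0.5):
    updates = [local_sgd(w_global, lambda w, t=t: w - t, lr, epochs=2) for t in targets]
    print(lr, fedavg(updates, sizes))
```

Even in this toy case, the aggregated model after one round depends visibly on the local learning rate and epoch count, while the server-side aggregation rule stays fixed.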

Article 848

Title@2025-07-26 (6): LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models

Title: LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models

LLM-Barber: Block-Aware Rebuilder für Sparsity Maske in One-Shot für große Sprachmodelle

LLM-Barber:大语言模型单点单层面罩块件重建器 2408.10631v2

Authors (9): Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Zhengfei Chen, Graziano Chesi, Ngai Wong, Hao Yu

Large language models (LLMs) have seen substantial growth, necessitating efficient model pruning techniques. Existing post-training pruning methods primarily measure weight importance in converged dense models, often overlooking changes in weight significance during the pruning process, leading to performance degradation. To address this issue, we present LLM-Barber (Block-Aware Rebuilder for Sparsity Mask in One-Shot), a novel one-shot pruning framework that rebuilds the sparsity mask of pruned models without any retraining or weight reconstruction. LLM-Barber incorporates block-aware error optimization across Self-Attention and MLP blocks, facilitating global performance optimization. We are the first to employ the product of weights and gradients as a pruning metric in the context of LLM post-training pruning. This enables accurate identification of weight importance in massive models and significantly reduces computational complexity compared to methods using second-order information. Our experiments show that LLM-Barber efficiently prunes models from LLaMA and OPT families (7B to 13B) on a single A100 GPU in just 30 minutes, achieving state-of-the-art results in both perplexity and zero-shot performance across various language benchmarks. Code is available at https://github.com/YupengSu/LLM-Barber.
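The weight-times-gradient importance score can be illustrated on a single small matrix. This sketch covers only the scoring and thresholding step, assuming an unstructured per-matrix sparsity target; it does not reproduce LLM-Barber's block-aware mask rebuilding, and the function name is hypothetical.

```python
import numpy as np

def prune_mask_weight_times_grad(weights, grads, sparsity):
    """Return a boolean keep-mask that drops the `sparsity` fraction of entries
    with the smallest |weight * gradient| score."""
    scores = np.abs(weights * grads)
    num_drop = int(round(sparsity * scores.size))
    mask = np.ones(scores.shape, dtype=bool)
    if num_drop > 0:
        drop = np.argsort(scores, axis=None, kind="stable")[:num_drop]
        mask.flat[drop] = False
    return mask

W = np.array([[1.0, -2.0], [3.0, 0.5]])
G = np.array([[0.1, 0.1], [0.01, 4.0]])
mask = prune_mask_weight_times_grad(W, G, sparsity=0.5)
print(mask)
print(W * mask)
```

Note that the largest-magnitude weight (3.0) is pruned here because its gradient is tiny, which is the kind of case where this score differs from magnitude-only pruning.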

Article 849

Title@2025-07-26 (6): Identification and estimation for matrix time series CP-factor models

Title: Identification and estimation for matrix time series CP-factor models

Identifizierung und Schätzung von Matrix-Zeitreihen CP-Faktor-Modellen

确定和估算矩阵时间序列、时间序列、CPC-因素模型 2410.05634v3

Authors (4): Jinyuan Chang, Yue Du, Guanglin Huang, Qiwei Yao

We propose a new method for identifying and estimating the CP-factor models for matrix time series. Unlike the generalized eigenanalysis-based method of Chang et al. (2023) for which the convergence rates of the associated estimators may suffer from small eigengaps as the asymptotic theory is based on some matrix perturbation analysis, the proposed new method enjoys faster convergence rates which are free from any eigengaps. It achieves this by turning the problem into a joint diagonalization of several matrices whose elements are determined by a basis of a linear system, and by choosing the basis carefully to avoid near co-linearity (see Proposition 5 and Section 4.3). Furthermore, unlike Chang et al. (2023) which requires the two factor loading matrices to be full-ranked, the proposed new method can handle rank-deficient factor loading matrices. Illustration with both simulated and real matrix time series data shows the advantages of the proposed new method.

Article 850

Title@2025-07-26 (6): Sequence-based protein-protein interaction prediction and its applications in drug discovery

Title: Sequence-based protein-protein interaction prediction and its applications in drug discovery

Sequenzbasierte Protein-Protein-Interaktions-Vorhersage und ihre Anwendungen in der Arzneimittel-Entdeckung

基于序列的蛋白蛋白质-蛋白质相互作用预测及其在药物发现中的应用 2507.19805v1

Authors (3): François Charih, James R. Green, Kyle K. Biggar

Aberrant protein-protein interactions (PPIs) underpin a plethora of human diseases, and disruption of these harmful interactions constitutes a compelling treatment avenue. Advances in computational approaches to PPI prediction have closely followed progress in deep learning and natural language processing. In this review, we outline the state of the art for sequence-based PPI prediction methods and explore their impact on target identification and drug discovery. We begin with an overview of commonly used training data sources and techniques used to curate these data to enhance the quality of the training set. Subsequently, we survey various PPI predictor types, including traditional similarity-based approaches, and deep learning-based approaches with a particular emphasis on the transformer architecture. Finally, we provide examples of PPI prediction in systems-level proteomics analyses, target identification, and design of therapeutic peptides and antibodies. We also take the opportunity to showcase the potential of PPI-aware drug discovery models in accelerating therapeutic development.

Article 851

Title@2025-07-26 (6): AI-Based Clinical Rule Discovery for NMIBC Recurrence through Tsetlin Machines

Title: AI-Based Clinical Rule Discovery for NMIBC Recurrence through Tsetlin Machines

KI-basierte klinische Regel-Discovery für NMIBC-Wiederkehr durch Tsetlin-Maschinen

通过Tsetlin 机器对NMIBC的再现发现基于AI的临床规则 2507.19803v1

Authors (5): Saram Abbas, Naeem Soomro, Rishad Shafik, Rakesh Heer, Kabita Adhikari

Bladder cancer claims one life every 3 minutes worldwide. Most patients are diagnosed with non-muscle-invasive bladder cancer (NMIBC), yet up to 70% recur after treatment, triggering a relentless cycle of surgeries, monitoring, and risk of progression. Clinical tools like the EORTC risk tables are outdated and unreliable - especially for intermediate-risk cases. We propose an interpretable AI model using the Tsetlin Machine (TM), a symbolic learner that outputs transparent, human-readable logic. Tested on the PHOTO trial dataset (n=330), TM achieved an F1-score of 0.80, outperforming XGBoost (0.78), Logistic Regression (0.60), and EORTC (0.42). TM reveals the exact clauses behind each prediction, grounded in clinical features like tumour count, surgeon experience, and hospital stay - offering accuracy and full transparency. This makes TM a powerful, trustworthy decision-support tool ready for real-world adoption.

Article 852

Title@2025-07-26 (6): AI/ML Life Cycle Management for Interoperable AI Native RAN

Title: AI/ML Life Cycle Management for Interoperable AI Native RAN

AI/ML Life Cycle Management für interoperable KI Native RAN

AI/ML 土著RAN 2507.18538v2

Authors (3): Chu-Hsiang Huang, Chao-Kai Wen, Geoffrey Ye Li

Artificial intelligence (AI) and machine learning (ML) models are rapidly permeating the 5G Radio Access Network (RAN), powering beam management, channel state information (CSI) feedback, positioning, and mobility prediction. However, without a standardized life-cycle management (LCM) framework, challenges, such as model drift, vendor lock-in, and limited transparency, hinder large-scale adoption. 3GPP Releases 16-20 progressively evolve AI/ML from experimental features to managed, interoperable network functions. Beginning with the Network Data Analytics Function (NWDAF) in Rel-16, subsequent releases introduced standardized interfaces for model transfer, execution, performance monitoring, and closed-loop control, culminating in Rel-20’s two-sided CSI-compression Work Item and vendor-agnostic LCM profile. This article reviews the resulting five-block LCM architecture, KPI-driven monitoring mechanisms, and inter-vendor collaboration schemes, while identifying open challenges in resource-efficient monitoring, environment drift detection, intelligent decision-making, and flexible model training. These developments lay the foundation for AI-native transceivers as a key enabler for 6G.

Article 853

Title@2025-07-26 (6): Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling

Title: Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling

Weiterentwicklung der Materialentdeckung mit Valence Constrained Design in der Generativen Modellierung

在产生模型模型中用贵重受控设计发现增强材料的发现 2507.19799v1

Authors (9): Mouyang Cheng, Weiliang Luo, Hao Tang, Bowen Yu, Yongqiang Cheng, Weiwei Xie, Ju Li, Heather J. Kulik, Mingda Li

Diffusion-based deep generative models have emerged as powerful tools for inverse materials design. Yet, many existing approaches overlook essential chemical constraints such as oxidation state balance, which can lead to chemically invalid structures. Here we introduce CrysVCD (Crystal generator with Valence-Constrained Design), a modular framework that integrates chemical rules directly into the generative process. CrysVCD first employs a transformer-based elemental language model to generate valence-balanced compositions, followed by a diffusion model to generate crystal structures. The valence constraint enables orders-of-magnitude more efficient chemical valence checking, compared to pure data-driven approaches with post-screening. When fine-tuned on stability metrics, CrysVCD achieves 85% thermodynamic stability and 68% phonon stability. Moreover, CrysVCD supports conditional generation of functional materials, enabling discovery of candidates such as high thermal conductivity semiconductors and high-$\kappa$ dielectric compounds. Designed as a general-purpose plugin, CrysVCD can be integrated into diverse generative pipelines to promote chemical validity, offering a reliable, scientifically grounded path for materials discovery.
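The oxidation-state balance constraint itself is simple arithmetic: the counts weighted by oxidation states must sum to zero. The sketch below checks that condition for a composition whose oxidation states are supplied by the caller; it is not CrysVCD's elemental language model, and the data layout is an assumption.

```python
def is_valence_balanced(composition):
    """`composition` maps element symbol -> (count, oxidation_state).
    Returns True when the total formal charge is zero."""
    return sum(count * state for count, state in composition.values()) == 0

# Fe2O3 with Fe(+3) and O(-2): 2*3 + 3*(-2) = 0.
fe2o3 = {"Fe": (2, +3), "O": (3, -2)}
# A deliberately invalid composition: Na(+1) with two Cl(-1).
nacl2 = {"Na": (1, +1), "Cl": (2, -1)}

print(is_valence_balanced(fe2o3), is_valence_balanced(nacl2))
```

A post-screening workflow would have to search over possible oxidation-state assignments for each generated composition, which is where generating balanced compositions directly can save work.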

Article 854

Title@2025-07-26 (6): Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

Title: Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

Langsame Entscheidungshäufigkeiten in der kontinuierlichen Kontrolle überwinden: Modellbasiertes Sequenz-Verstärkungs-Lernen für modellfreie Steuerung

克服持续控制中缓慢决定因素:无模式控制的示范序列强化学习 2410.08979v5

Authors (2): Devdhar Patel, Hava Siegelmann

Reinforcement learning (RL) is rapidly reaching and surpassing human-level control capabilities. However, state-of-the-art RL algorithms often require timesteps and reaction times significantly faster than human capabilities, which is impractical in real-world settings and typically necessitates specialized hardware. We introduce Sequence Reinforcement Learning (SRL), an RL algorithm designed to produce a sequence of actions for a given input state, enabling effective control at lower decision frequencies. SRL addresses the challenges of learning action sequences by employing both a model and an actor-critic architecture operating at different temporal scales. We propose a “temporal recall” mechanism, where the critic uses the model to estimate intermediate states between primitive actions, providing a learning signal for each individual action within the sequence. Once training is complete, the actor can generate action sequences independently of the model, achieving model-free control at a slower frequency. We evaluate SRL on a suite of continuous control tasks, demonstrating that it achieves performance comparable to state-of-the-art algorithms while significantly reducing actor sample complexity. To better assess performance across varying decision frequencies, we introduce the Frequency-Averaged Score (FAS) metric. Our results show that SRL significantly outperforms traditional RL algorithms in terms of FAS, making it particularly suitable for applications requiring variable decision frequencies. Furthermore, we compare SRL with model-based online planning, showing that SRL achieves comparable FAS while leveraging the same model during training that online planners use for planning.

Article 855

Title@2025-07-26 (6): Analyzing and Mitigating Repetitions in Trip Recommendation

Title: Analyzing and Mitigating Repetitions in Trip Recommendation

Analyse und Eindämmung von Wiederholungen in der Reiseempfehlung

分析和减轻《Trip建议》中的重复上诉 2507.19798v1

Authors (6): Wenzheng Shu, Kangqi Xu, Wenxin Tai, Ting Zhong, Yong Wang, Fan Zhou

Trip recommendation has emerged as a highly sought-after service over the past decade. Although current studies have made significant progress in modeling the consistency of human intentions, they still produce undesired repetitive outcomes. We make two pivotal discoveries using statistical analyses and experimental designs: (1) The occurrence of repetitions is intricately linked to the models and decoding strategies. (2) During training and decoding, adding perturbations to logits can reduce repetition. Motivated by these observations, we introduce AR-Trip (Anti Repetition for Trip Recommendation), which incorporates a cycle-aware predictor comprising three mechanisms to avoid duplicate Points-of-Interest (POIs), and we demonstrate the effectiveness of these mechanisms in alleviating repetition. Experiments on four public datasets illustrate that AR-Trip successfully mitigates repetition issues while enhancing precision.

Article 856

Title@2025-07-26 (6): Smaller, Faster, Cheaper: Architectural Designs for Efficient Machine Learning

Title: Smaller, Faster, Cheaper: Architectural Designs for Efficient Machine Learning

Kleiner, schneller, billiger: Architekturdesigns für effizientes maschinelles Lernen

更小、更快、更便宜:高效机械学习的建筑设计 2507.19795v1

Authors (1): Steven Walton

Major advancements in the capabilities of computer vision models have been primarily fueled by rapid expansion of datasets, model parameters, and computational budgets, leading to ever-increasing demands on computational infrastructure. However, as these models are deployed in increasingly diverse and resource-constrained environments, there is a pressing need for architectures that can deliver high performance while requiring fewer computational resources. This dissertation focuses on architectural principles through which models can achieve increased performance while reducing their computational demands. We discuss strides towards this goal through three directions. First, we focus on data ingress and egress, investigating how information may be passed into and retrieved from our core neural processing units. This ensures that our models make the most of available data, allowing smaller architectures to become more performant. Second, we investigate modifications to the core neural architecture, applied to restricted attention in vision transformers. This section explores how removing uniform context windows in restricted attention increases the expressivity of the underlying neural architecture. Third, we explore the natural structures of Normalizing Flows and how we can leverage these properties to better distill model knowledge. These contributions demonstrate that careful design of neural architectures can increase the efficiency of machine learning algorithms, allowing them to become smaller, faster, and cheaper.

Article 857

Title@2025-07-26 (6): Adversarial Combinatorial Semi-bandits with Graph Feedback

Title: Adversarial Combinatorial Semi-bandits with Graph Feedback

Adversariale Kombinatoriale Halbbänder mit Graph Feedback

带有图图反馈的半斜面 2502.18826v5

Authors (1): Yuxiao Wen

In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this framework to include \emph{graph feedback}, where the learner observes the rewards of all neighboring arms of the selected arms in a feedback graph $G$. We establish that the optimal regret over a time horizon $T$ scales as $\widetilde{\Theta}(S\sqrt{T}+\sqrt{\alpha ST})$, where $S$ is the size of the combinatorial decisions and $\alpha$ is the independence number of $G$. This result interpolates between the known regrets $\widetilde\Theta(S\sqrt{T})$ under full information (i.e., $G$ is complete) and $\widetilde\Theta(\sqrt{KST})$ under the semi-bandit feedback (i.e., $G$ has only self-loops), where $K$ is the total number of arms. A key technical ingredient is to realize a convexified action using a random decision vector with negative correlations. We also show that online stochastic mirror descent (OSMD) that only realizes convexified actions in expectation is suboptimal. In addition, we describe the problem of \emph{combinatorial semi-bandits with general capacity} and apply our results to derive an improved regret upper bound, which may be of independent interest.
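
As a quick sanity check of the interpolation claim, the two endpoint cases can be worked out directly. This sketch assumes $1 \le S \le K$ (a decision selects $S$ of the $K$ arms) and uses the standard facts that a complete graph has independence number $\alpha = 1$ and a graph with only self-loops has $\alpha = K$.

$$
\begin{aligned}
\text{complete } G\ (\alpha = 1):\quad & S\sqrt{T} + \sqrt{ST} = (S + \sqrt{S})\sqrt{T} \le 2S\sqrt{T} \;\Rightarrow\; \widetilde{\Theta}(S\sqrt{T}),\\
\text{self-loops only } (\alpha = K):\quad & S\sqrt{T} = \sqrt{S \cdot S \cdot T} \le \sqrt{KST} \;\Rightarrow\; S\sqrt{T} + \sqrt{KST} = \widetilde{\Theta}(\sqrt{KST}).
\end{aligned}
$$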

Article 858

Title@2025-07-26 (6): Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures

Title: Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures

Sparse-Mode Dynamische Moduszersetzung für die Disambiguierung lokaler und globaler Strukturen

局部和全球结构的偏差分解 2507.19787v1

Authors (4): Sara M. Ichinaga, Steven L. Brunton, Aleksandr Y. Aravkin, J. Nathan Kutz

The dynamic mode decomposition (DMD) is a data-driven approach that extracts the dominant features from spatiotemporal data. In this work, we introduce sparse-mode DMD, a new variant of the optimized DMD framework that specifically leverages sparsity-promoting regularization in order to approximate DMD modes which have localized spatial structure. The algorithm maintains the noise-robust properties of optimized DMD while disambiguating between modes which are spatially local versus global in nature. In many applications, such modes are associated with discrete and continuous spectra respectively, thus allowing the algorithm to explicitly construct, in an unsupervised manner, the distinct portions of the spectrum. We demonstrate this by analyzing synthetic and real-world systems, including examples from optical waveguides, quantum mechanics, and sea surface temperature data.

Article 859

Title@2025-07-26 (6): SpecBPP: A Self-Supervised Learning Approach for Hyperspectral Representation and Soil Organic Carbon Estimation

Title: SpecBPP: A Self-Supervised Learning Approach for Hyperspectral Representation and Soil Organic Carbon Estimation

SpecBPP: Ein selbstüberwachter Lernansatz für die Hyperspektraldarstellung und Boden-organische Kohlenstoffabschätzung

SpecBPP:超光谱代表性和土壤有机碳估计的自我监督学习方法 2507.19781v1

Authors (4): Daniel La’ah Ayuba, Jean-Yves Guillemaut, Belen Marti-Cardona, Oscar Mendez Maldonado

Self-supervised learning has revolutionized representation learning in vision and language, but remains underexplored for hyperspectral imagery (HSI), where the sequential structure of spectral bands offers unique opportunities. In this work, we propose Spectral Band Permutation Prediction (SpecBPP), a novel self-supervised learning framework that leverages the inherent spectral continuity in HSI. Instead of reconstructing masked bands, SpecBPP challenges a model to recover the correct order of shuffled spectral segments, encouraging global spectral understanding. We implement a curriculum-based training strategy that progressively increases permutation difficulty to manage the factorial complexity of the permutation space. Applied to Soil Organic Carbon (SOC) estimation using EnMAP satellite data, our method achieves state-of-the-art results, outperforming both masked autoencoder (MAE) and joint-embedding predictive (JEPA) baselines. Fine-tuned on limited labeled samples, our model yields an $R^2$ of 0.9456, RMSE of 1.1053%, and RPD of 4.19, significantly surpassing traditional and self-supervised benchmarks. Our results demonstrate that spectral order prediction is a powerful pretext task for hyperspectral understanding, opening new avenues for scientific representation learning in remote sensing and beyond.
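
The pretext task described above (recovering the order of shuffled spectral segments) can be illustrated with a small data-preparation sketch. This is not the authors' implementation: the function names, the equal-length segmentation, and the encoding of each permutation as a class index are assumptions made for illustration only.

```python
import itertools
import random


def make_permutation_sample(spectrum, num_segments, rng):
    """Split a spectrum into equal contiguous segments, shuffle them, and
    return (shuffled_spectrum, label). The label is the index of the applied
    permutation in the lexicographic list of all permutations."""
    if len(spectrum) % num_segments != 0:
        raise ValueError("spectrum length must be divisible by num_segments")
    seg_len = len(spectrum) // num_segments
    segments = [spectrum[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]
    # Enumerating all orderings is only feasible for small num_segments,
    # because the number of classes grows as num_segments!.
    permutations = list(itertools.permutations(range(num_segments)))
    label = rng.randrange(len(permutations))
    order = permutations[label]
    shuffled = [value for idx in order for value in segments[idx]]
    return shuffled, label


def restore(shuffled, num_segments, label):
    """Undo the shuffle given the permutation label (what a perfect model would enable)."""
    order = list(itertools.permutations(range(num_segments)))[label]
    seg_len = len(shuffled) // num_segments
    restored = [None] * len(shuffled)
    for position, idx in enumerate(order):
        restored[idx * seg_len:(idx + 1) * seg_len] = \
            shuffled[position * seg_len:(position + 1) * seg_len]
    return restored


# Toy "spectrum" with 12 bands split into 4 segments (4! = 24 classes).
spectrum = list(range(12))
shuffled, label = make_permutation_sample(spectrum, 4, random.Random(0))
```

A curriculum in the spirit of the abstract could start with a small `num_segments` (2 segments give 2 classes, 3 give 6, 4 give 24) and raise it as training progresses; the exact schedule used in the paper is not stated here.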

Article 860

Title@2025-07-26 (6): NeuSemSlice: Towards Effective DNN Model Maintenance via Neuron-level Semantic Slicing

Title: NeuSemSlice: Towards Effective DNN Model Maintenance via Neuron-level Semantic Slicing

NeuSemSlice: Auf dem Weg zu einer effektiven DNN-Modellpflege über Semantisches Schneiden auf Neuron-Ebene

NeusSemelice:通过中程语义剪切实现有效的 DNN 模型维护 2407.20281v2

Authors (7): Shide Zhou, Tianlin Li, Yihao Huang, Ling Shi, Kailong Wang, Yang Liu, Haoyu Wang

Deep neural networks (DNNs), extensively applied across diverse disciplines, are characterized by their integrated and monolithic architectures, setting them apart from conventional software systems. This architectural difference introduces particular challenges to maintenance tasks, such as model restructure (e.g., model compression), re-adaptation (e.g., fitting new samples), and incremental development (e.g., continual knowledge accumulation). Prior research addresses these challenges by identifying task-critical neuron layers and dividing neural networks into semantically similar sequential modules. However, such layer-level approaches fail to precisely identify and manipulate neuron-level semantic components, restricting their applicability to finer-grained model maintenance tasks. In this work, we implement NeuSemSlice, a novel framework that introduces the semantic slicing technique to effectively identify critical neuron-level semantic components in DNN models for semantic-aware model maintenance tasks. Specifically, semantic slicing identifies, categorizes, and merges critical neurons across different categories and layers according to their semantic similarity, making them flexible and effective to use in the subsequent tasks. For semantic-aware model maintenance tasks, we provide a series of novel strategies based on semantic slicing to enhance NeuSemSlice. They include semantic component (i.e., critical neuron) preservation for model restructure, critical neuron tuning for model re-adaptation, and non-critical neuron training for model incremental development. A thorough evaluation demonstrates that NeuSemSlice significantly outperforms baselines in all three tasks.

Article 861

Title@2025-07-26 (6): Bag of Coins: A Statistical Probe into Neural Confidence Structures

Title: Bag of Coins: A Statistical Probe into Neural Confidence Structures

Münzbeutel: Eine statistische Sonde in neurale Vertrauensstrukturen

《一袋硬币:对神经信心结构的统计研究》 2507.19774v1

Authors (5): Agnideep Aich, Ashit Baran Aich, Md Monzur Murshed, Sameera Hewage, Bruce Wade

Modern neural networks, despite their high accuracy, often produce poorly calibrated confidence scores, limiting their reliability in high-stakes applications. Existing calibration methods typically post-process model outputs without interrogating the internal consistency of the predictions themselves. In this work, we introduce a novel, non-parametric statistical probe, the Bag-of-Coins (BoC) test, that examines the internal consistency of a classifier’s logits. The BoC test reframes confidence estimation as a frequentist hypothesis test: does the model’s top-ranked class win 1-v-1 contests against random competitors at a rate consistent with its own stated softmax probability? When applied to modern deep learning architectures, this simple probe reveals a fundamental dichotomy. On Vision Transformers (ViTs), the BoC output serves as a state-of-the-art confidence score, achieving near-perfect calibration with an ECE of 0.0212, an 88% improvement over a temperature-scaled baseline. Conversely, on Convolutional Neural Networks (CNNs) like ResNet, the probe reveals a deep inconsistency between the model’s predictions and its internal logit structure, a property missed by traditional metrics. We posit that BoC is not merely a calibration method, but a new diagnostic tool for understanding and exposing the differing ways that popular architectures represent uncertainty.

Article 862

Title@2025-07-26 (6): Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction

Title: Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction

Imitation Learning in Continuous Action Spaces: Compounding Fehler ohne Wechselwirkungen

连续行动空间的模拟学习:没有相互作用的减缓化合物错误 2507.09061v2

Authors (4): Thomas T. Zhang, Daniel Pfrommer, Nikolai Matni, Max Simchowitz

We study the problem of imitating an expert demonstrator in a continuous state-and-action dynamical system. While imitation learning in discrete settings such as autoregressive language modeling has seen immense success and popularity in recent years, imitation in physical settings such as autonomous driving and robot learning has proven comparably more complex due to the compounding errors problem, often requiring elaborate set-ups to perform stably. Recent work has demonstrated that even in benign settings, exponential compounding errors are unavoidable when learning solely from expert-controlled trajectories, suggesting the need for more advanced policy parameterizations or data augmentation. To this end, we present minimal interventions that provably mitigate compounding errors in continuous state-and-action imitation learning. When the system is open-loop stable, we prescribe “action chunking,” i.e., predicting and playing sequences of actions in open-loop; when the system is possibly unstable, we prescribe “noise injection,” i.e., adding noise during expert demonstrations. These interventions align with popular choices in modern robot learning, though the benefits we derive are distinct from the effects they were designed to target. Our results draw insights and tools from both control theory and reinforcement learning; however, our analysis reveals novel considerations that do not naturally arise when either literature is considered in isolation.
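
To make "action chunking" concrete, the sketch below queries a policy once per chunk and plays the returned actions open-loop, without observing intermediate states. The scalar system, the placeholder policy, and all names are hypothetical and chosen only so the example runs; they are not taken from the paper.

```python
def rollout_with_chunking(policy, step, state, horizon, chunk_len):
    """Roll out `horizon` steps, querying `policy` only at chunk boundaries.

    policy(state, chunk_len) must return a list of `chunk_len` actions;
    these are applied one after another without re-observing the state.
    Returns the visited states and the number of policy queries.
    """
    states = [state]
    queries = 0
    t = 0
    while t < horizon:
        actions = policy(state, chunk_len)
        queries += 1
        # Truncate the last chunk so the rollout stops exactly at `horizon`.
        for action in actions[:horizon - t]:
            state = step(state, action)
            states.append(state)
            t += 1
    return states, queries


def step(x, u):
    """Hypothetical open-loop stable scalar system: x' = 0.5 * x + u."""
    return 0.5 * x + u


def zero_policy(x, k):
    """Placeholder policy that always outputs a chunk of zero actions."""
    return [0.0] * k


states, queries = rollout_with_chunking(zero_policy, step, 8.0, horizon=6, chunk_len=3)
```

With a chunk length of 3 and a horizon of 6, the policy is consulted twice instead of six times; because the toy system contracts on its own, the state still decays between queries.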

Article 863

Title@2025-07-26 (6): Large Language Model Agent for Structural Drawing Generation Using ReAct Prompt Engineering and Retrieval Augmented Generation

Title: Large Language Model Agent for Structural Drawing Generation Using ReAct Prompt Engineering and Retrieval Augmented Generation

Large Language Model Agent for Structural Drawing Generation Using ReAct Prompt Engineering and Retrieval Augmented Generation

利用再行动即时工程和再取回增强型 2507.19771v1

Authors (7): Xin Zhang, Lissette Iturburu, Juan Nicolas Villamizar, Xiaoyu Liu, Manuel Salmeron, Shirley J. Dyke, Julio Ramirez

Structural drawings are widely used in many fields, such as mechanical and civil engineering. In civil engineering, structural drawings serve as the main communication tool between architects, engineers, and builders to avoid conflicts, act as legal documentation, and provide a reference for future maintenance or evaluation needs. They are often organized using key elements such as title/subtitle blocks, scales, plan views, elevation views, sections, and detailed sections, which are annotated with standardized symbols and line types for interpretation by engineers and contractors. Despite advances in software capabilities, the task of generating a structural drawing remains labor-intensive and time-consuming for structural engineers. Here we introduce a novel generative AI-based method for generating structural drawings employing a large language model (LLM) agent. The method incorporates a retrieval-augmented generation (RAG) technique using externally-sourced facts to enhance the accuracy and reliability of the language model. This method is capable of understanding varied natural language descriptions, processing these to extract necessary information, and generating code to produce the desired structural drawing in AutoCAD. The approach developed, demonstrated, and evaluated herein enables the efficient and direct conversion of a structural drawing’s natural language description into an AutoCAD drawing, significantly reducing the workload compared to the current manual drawing process and simplifying the typical iterative process by which engineers express design ideas.

Article 864

Title@2025-07-26 (6): Generalizable Targeted Data Poisoning against Varying Physical Objects

Title: Generalizable Targeted Data Poisoning against Varying Physical Objects

Verallgemeinerbare gezielte Datenvergiftung gegen unterschiedliche physische Objekte

针对不同物体的通用定向中毒数据 2412.03908v2

Authors (6): Zhizhen Chen, Zhengyu Zhao, Subrat Kishore Dutta, Chenhao Lin, Chao Shen, Xiao Zhang

Targeted data poisoning (TDP) aims to compromise the model’s prediction on a specific (test) target by perturbing a small subset of training data. Existing work on TDP has focused on an overly ideal threat model in which the same image sample of the target is used during both poisoning and inference stages. However, in the real world, a target object often appears in complex variations due to changes of physical settings such as viewpoint, background, and lighting conditions. In this work, we take the first step toward understanding the real-world threats of TDP by studying its generalizability across varying physical conditions. In particular, we observe that solely optimizing gradient directions, as adopted by the best previous TDP method, achieves limited generalization. To address this limitation, we propose optimizing both the gradient direction and magnitude for more generalizable gradient matching, thereby leading to higher poisoning success rates. For instance, our method outperforms the state of the art by 19.49% when poisoning CIFAR-10 images targeting multi-view cars.

Article 865

Title@2025-07-26 (6): MIAT: Maneuver-Intention-Aware Transformer for Spatio-Temporal Trajectory Prediction

Title: MIAT: Maneuver-Intention-Aware Transformer for Spatio-Temporal Trajectory Prediction

MIAT: Manöver-Intention-Bewusst-Transformer für Spatio-Temporale Trajektorien-Vorhersage

MIAT: 斯帕蒂奥-时热轨程预测的操纵-有意-软件变换器 2504.05059v2

Authors (4): Chandra Raskoti, Iftekharul Islam, Xuan Wang, Weizi Li

Accurate vehicle trajectory prediction is critical for safe and efficient autonomous driving, especially in mixed traffic environments where human-driven and autonomous vehicles co-exist. However, uncertainties introduced by inherent driving behaviors – such as acceleration, deceleration, and left and right maneuvers – pose significant challenges for reliable trajectory prediction. We introduce a Maneuver-Intention-Aware Transformer (MIAT) architecture, which integrates a maneuver intention awareness control mechanism with spatiotemporal interaction modeling to enhance long-horizon trajectory predictions. We systematically investigate the impact of varying awareness of maneuver intention on both short- and long-horizon trajectory predictions. Evaluated on the real-world NGSIM dataset and benchmarked against various transformer- and LSTM-based methods, our approach achieves an improvement of up to 4.7% in short-horizon predictions and 1.6% in long-horizon predictions compared to other intention-aware benchmark methods. Moreover, by leveraging the intention awareness control mechanism, MIAT realizes an 11.1% performance boost in long-horizon predictions, with a modest drop in short-horizon performance. The source code and datasets are available at https://github.com/cpraskoti/MIAT.

Article 866

Title@2025-07-26 (6): A Machine Learning Framework for Predicting Microphysical Properties of Ice Crystals from Cloud Particle Imagery

Title: A Machine Learning Framework for Predicting Microphysical Properties of Ice Crystals from Cloud Particle Imagery

Ein Machine Learning Framework zur Vorhersage mikrophysikalischer Eigenschaften von Eiskristallen aus der Cloud-Partikelbildgebung

从云粒图像中预测冰晶微物理特性的机器学习框架 2507.19759v1

Authors (6): Joseph Ko, Jerry Harrington, Kara Sulia, Vanessa Przybylo, Marcus van Lier-Walqui, Kara Lamb

The microphysical properties of ice crystals are important because they significantly alter the radiative properties and spatiotemporal distributions of clouds, which in turn strongly affect Earth’s climate. However, it is challenging to measure key properties of ice crystals, such as mass or morphological features. Here, we present a framework for predicting three-dimensional (3D) microphysical properties of ice crystals from in situ two-dimensional (2D) imagery. First, we computationally generate synthetic ice crystals using 3D modeling software along with geometric parameters estimated from the 2021 Ice Cryo-Encapsulation Balloon (ICEBall) field campaign. Then, we use synthetic crystals to train machine learning (ML) models to predict effective density ($\rho_{e}$), effective surface area ($A_e$), and number of bullets ($N_b$) from synthetic rosette imagery. When tested on unseen synthetic images, we find that our ML models can predict microphysical properties with high accuracy. For $\rho_{e}$ and $A_e$, respectively, our best-performing single view models achieved $R^2$ values of 0.99 and 0.98. For $N_b$, our best single view model achieved a balanced accuracy and F1 score of 0.91. We also quantify the marginal prediction improvements from incorporating a second view. A stereo view ResNet-18 model reduced RMSE by 40% for both $\rho_e$ and $A_e$, relative to a single view ResNet-18 model. For $N_b$, we find that a stereo view ResNet-18 model improved the F1 score by 8%. This work provides a novel ML-driven framework for estimating ice microphysical properties from in situ imagery, which will allow for downstream constraints on microphysical parameterizations, such as the mass-size relationship.

Article 867

Title@2025-07-26 (6): MTCAE-DFER: Multi-Task Cascaded Autoencoder for Dynamic Facial Expression Recognition

Title: MTCAE-DFER: Multi-Task Cascaded Autoencoder for Dynamic Facial Expression Recognition

MTCAE-DFER: Multi-Task Cascaded Autoencoder für dynamische Gesichtsausdruckerkennung

MTCAE-DFER: 用于确认动态面谱表达式的多塔卡岩层自动编码器 2412.18988v2

Authors (3): Peihao Xiang, Kaida Wu, Ou Bai

This paper expands the cascaded network branch of the autoencoder-based multi-task learning (MTL) framework for dynamic facial expression recognition, namely Multi-Task Cascaded Autoencoder for Dynamic Facial Expression Recognition (MTCAE-DFER). MTCAE-DFER builds a plug-and-play cascaded decoder module, which is based on the Vision Transformer (ViT) architecture and employs the decoder concept of Transformer to reconstruct the multi-head attention module. The decoder output from the previous task serves as the query (Q), representing local dynamic features, while the Video Masked Autoencoder (VideoMAE) shared encoder output acts as both the key (K) and value (V), representing global dynamic features. This setup facilitates interaction between global and local dynamic features across related tasks. Additionally, this proposal aims to alleviate overfitting of complex large models. We utilize an autoencoder-based multi-task cascaded learning approach to explore the impact of dynamic face detection and dynamic face landmarks on dynamic facial expression recognition, which enhances the model’s generalization ability. Extensive ablation experiments and comparisons with state-of-the-art (SOTA) methods on various public datasets for dynamic facial expression recognition demonstrate the robustness of the MTCAE-DFER model and the effectiveness of global-local dynamic feature interaction among related tasks.

Article 868

Title@2025-07-26 (6): Moving Out: Physically-grounded Human-AI Collaboration

Title: Moving Out: Physically-grounded Human-AI Collaboration

Ausstieg: physikalisch begründete Mensch-AI-Kollaboration

搬出:基于身体的人类 – – AI协作 2507.18623v2

Authors (5): Xuhui Kang, Sung-Wook Lee, Haolin Liu, Yuyan Wang, Yen-Ling Kuo

The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. In this paper, we introduce Moving Out, a new human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and maintaining consistent actions to move a big item around a corner. Using Moving Out, we designed two tasks and collected human-human interaction data to evaluate models’ abilities to adapt to diverse human behaviors and unseen physical attributes. To address the challenges in physical environments, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.

Article 869

Title@2025-07-26 (6): Modeling enzyme temperature stability from sequence segment perspective

Title: Modeling enzyme temperature stability from sequence segment perspective

Modellierung von Enzymtemperaturstabilität aus Sequenzsegment-Perspektive

从序列段角度对酶温度稳定性进行建模 2507.19755v1

Authors (13): Ziqi Zhang, Shiheng Chen, Runze Yang, Zhisheng Wei, Wei Zhang, Lei Wang, Zhanzhi Liu, Fengshan Zhang, Jing Wu, Xiaoyong Pan, Hongbin Shen, Longbing Cao, Zhaohong Deng

Developing enzymes with desired thermal properties is crucial for a wide range of industrial and research applications, and determining temperature stability is an essential step in this process. Experimental determination of thermal parameters is labor-intensive, time-consuming, and costly. Moreover, existing computational approaches are often hindered by limited data availability and imbalanced distributions. To address these challenges, we introduce a curated temperature stability dataset designed for model development and benchmarking in enzyme thermal modeling. Leveraging this dataset, we present the \textit{Segment Transformer}, a novel deep learning framework that enables efficient and accurate prediction of enzyme temperature stability. The model achieves state-of-the-art performance with an RMSE of 24.03, MAE of 18.09, and Pearson and Spearman correlations of 0.33, respectively. These results highlight the effectiveness of incorporating segment-level representations, grounded in the biological observation that different regions of a protein sequence contribute unequally to thermal behavior. As a proof of concept, we applied the Segment Transformer to guide the engineering of a cutinase enzyme. Experimental validation demonstrated a 1.64-fold improvement in relative activity following heat treatment, achieved through only 17 mutations and without compromising catalytic function.

Article 870

Title@2025-07-26 (6): Detecting Multimedia Generated by Large AI Models: A Survey

Title: Detecting Multimedia Generated by Large AI Models: A Survey

Multimedia-Erkennung durch große KI-Modelle: Eine Umfrage

由大型AI模型生成的多媒体检测:调查 2402.00045v7

Authors (10): Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu

The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era where AI-generated multimedia is increasingly integrated into various aspects of daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruptions, and ethical concerns. Consequently, detecting multimedia generated by LAIMs has become crucial, with a marked rise in related research. Despite this, there remains a notable gap in systematic surveys that focus specifically on detecting LAIM-generated multimedia. Addressing this, we provide the first survey to comprehensively cover existing research on detecting multimedia (such as text, images, videos, audio, and multimodal content) created by LAIMs. Specifically, we introduce a novel taxonomy for detection methods, categorized by media modality, and aligned with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes like generalizability, robustness, and interpretability to detectors). Additionally, we present a brief overview of generation mechanisms, public datasets, online detection tools, and evaluation metrics to provide a valuable resource for researchers and practitioners in this field. Most importantly, we offer a focused analysis from a social media perspective to highlight the broader societal impact of these detection methods. Furthermore, we identify current challenges in detection and propose directions for future research that address unexplored, ongoing, and emerging issues in detecting multimedia generated by LAIMs. Our aim for this survey is to fill an academic gap and contribute to global AI security efforts, helping to ensure the integrity of information in the digital realm. The project link is https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey.

Article 871

Title@2025-07-26 (6): Extended Histogram-based Outlier Score (EHBOS)

Title: Extended Histogram-based Outlier Score (EHBOS)

Erweiterter Histogrammbasierter Outlier-Score (EHBOS)

以直方图为基础的扩展外部分数(EHBOS) 2502.05719v2

Authors (1): Tanvir Islam

Histogram-Based Outlier Score (HBOS) is a widely used outlier or anomaly detection method known for its computational efficiency and simplicity. However, its assumption of feature independence limits its ability to detect anomalies in datasets where interactions between features are critical. In this paper, we propose the Extended Histogram-Based Outlier Score (EHBOS), which enhances HBOS by incorporating two-dimensional histograms to capture dependencies between feature pairs. This extension allows EHBOS to identify contextual and dependency-driven anomalies that HBOS fails to detect. We evaluate EHBOS on 17 benchmark datasets, demonstrating its effectiveness and robustness across diverse anomaly detection scenarios. EHBOS outperforms HBOS on several datasets, particularly those where feature interactions are critical in defining the anomaly structure, achieving notable improvements in ROC AUC. These results highlight that EHBOS can be a valuable extension to HBOS, with the ability to model complex feature dependencies. EHBOS offers a powerful new tool for anomaly detection, particularly in datasets where contextual or relational anomalies play a significant role.
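
The idea of scoring points with two-dimensional histograms over feature pairs can be sketched in a few lines. This is a simplified illustration, not the EHBOS reference implementation: the equal-width binning, the use of relative cell frequency as the density estimate, and the function names are all assumptions.

```python
import math
from itertools import combinations


def _bin_index(value, lo, hi, bins):
    """Map a value to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)


def pairwise_histogram_scores(rows, bins=5):
    """Score each row by summing log(1 / relative frequency) of its cell in
    the 2D histogram of every feature pair. Higher means more anomalous."""
    n = len(rows)
    dims = len(rows[0])
    lows = [min(r[d] for r in rows) for d in range(dims)]
    highs = [max(r[d] for r in rows) for d in range(dims)]
    scores = [0.0] * n
    for a, b in combinations(range(dims), 2):
        counts = {}
        cells = []
        for r in rows:
            cell = (_bin_index(r[a], lows[a], highs[a], bins),
                    _bin_index(r[b], lows[b], highs[b], bins))
            cells.append(cell)
            counts[cell] = counts.get(cell, 0) + 1
        for i, cell in enumerate(cells):
            scores[i] += math.log(n / counts[cell])
    return scores


# 50 points on the diagonal x == y, plus one point whose individual
# coordinates are common but whose combination never occurs otherwise.
rows = [[float(i % 5), float(i % 5)] for i in range(50)] + [[0.0, 4.0]]
scores = pairwise_histogram_scores(rows)
```

The last point has unremarkable values in each feature taken separately, so per-feature histograms alone would not single it out, but it sits alone in its 2D cell and receives the highest score. That is the kind of dependency-driven anomaly the abstract refers to.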

Article 872

Title@2025-07-26 (6): DOA: A Degeneracy Optimization Agent with Adaptive Pose Compensation Capability based on Deep Reinforcement Learning

Title: DOA: A Degeneracy Optimization Agent with Adaptive Pose Compensation Capability based on Deep Reinforcement Learning

DOA: Ein Degenerierungs-Optimierungs-Agent mit adaptiver Pose-Kompensationsfähigkeit auf Basis von Deep Reinforcement Learning

DOA: 一种基于深强化学习的适应性胶囊补偿能力脱精优化剂 2507.19742v1

Authors (10): Yanbin Li, Canran Xiao, Hongyang He, Shenghai Yuan, Zong Ke, Jiajie Yu, Zixiong Qin, Zhiguo Zhang, Wenzheng Chi, Wei Zhang

Particle filter-based 2D-SLAM is widely used in indoor localization tasks due to its efficiency. However, indoor environments such as long straight corridors can cause severe degeneracy problems in SLAM. In this paper, we use Proximal Policy Optimization (PPO) to train an adaptive degeneracy optimization agent (DOA) to address the degeneracy problem. We propose a systematic methodology to address three critical challenges in traditional supervised learning frameworks: (1) data acquisition bottlenecks in degenerate datasets, (2) inherent quality deterioration of training samples, and (3) ambiguity in annotation protocol design. We design a specialized reward function to guide the agent in developing perception capabilities for degenerate environments. Using the output degeneracy factor as a reference weight, the agent can dynamically adjust the contribution of different sensors to pose optimization. Specifically, the observation distribution is shifted towards the motion model distribution, with the step size determined by a linear interpolation formula related to the degeneracy factor. In addition, we employ a transfer learning module to endow the agent with generalization capabilities across different environments and address the inefficiency of training in degenerate environments. Finally, we conduct ablation studies to demonstrate the rationality of our model design and the role of transfer learning. We also compare the proposed DOA with SOTA methods to demonstrate its superior degeneracy detection and optimization capabilities across various environments.

Article 873

Title@2025-07-26 (6): LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation

Title: LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation

LaMAGIC2: Erweiterte Schaltungsformulierungen für sprachmodellbasierte analoge Topologie-Generierung

LaMAGIC2:语言模拟模拟模拟地形生成的先进电路配制 2506.10235v2

Authors (5): Chen-Chia Chang, Wan-Hsuan Lin, Yikang Shen, Yiran Chen, Xin Zhang

Automation of analog topology design is crucial due to customized requirements of modern applications with heavily manual engineering efforts. The state-of-the-art work applies a sequence-to-sequence approach and supervised finetuning on language models to generate topologies given user specifications. However, its circuit formulation is inefficient due to $O(|V|^2)$ token length and suffers from low precision sensitivity to numeric inputs. In this work, we introduce LaMAGIC2, a succinct float-input canonical formulation with identifier (SFCI) for language model-based analog topology generation. SFCI addresses these challenges by improving component-type recognition through identifier-based representations, reducing token length complexity to $O(|V|)$, and enhancing numeric precision sensitivity for better performance under tight tolerances. Our experiments demonstrate that LaMAGIC2 achieves 34% higher success rates under a tight tolerance of 0.01 and 10X lower MSEs compared to a prior method. LaMAGIC2 also exhibits better transferability for circuits with more vertices with up to 58.5% improvement. These advancements establish LaMAGIC2 as a robust framework for analog topology generation.

Article 874

Title@2025-07-26 (6): Predicting Human Mobility in Disasters via LLM-Enhanced Cross-City Learning

Title: Predicting Human Mobility in Disasters via LLM-Enhanced Cross-City Learning

Vorhersage der menschlichen Mobilität in Katastrophen durch LLM-verbessertes Stadtübergreifendes Lernen

通过LLM-加强跨城市学习,预测人类在灾害中的流动性 2507.19737v1

Authors (4): Yinzhou Tang, Huandong Wang, Xiaochen Fan, Yong Li

The vulnerability of cities to natural disasters has increased with urbanization and climate change, making it more important to predict human mobility in disaster scenarios for downstream tasks such as location-based early disaster warning and pre-allocation of rescue resources. However, existing human mobility prediction models are mainly designed for normal scenarios, and fail to adapt to disaster scenarios due to the shift of human mobility patterns under disaster. To address this issue, we introduce DisasterMobLLM, a mobility prediction framework for disaster scenarios that can be integrated into existing deep mobility prediction methods by leveraging LLMs to model the mobility intention and transferring the common knowledge of how different disasters affect mobility intentions between cities. This framework utilizes a RAG-Enhanced Intention Predictor to forecast the next intention, refines it with an LLM-based Intention Refiner, and then maps the intention to an exact location using an Intention-Modulated Location Predictor. Extensive experiments illustrate that DisasterMobLLM can achieve a 32.8% improvement in terms of Acc@1 and a 35.0% improvement in terms of the F1-score of predicting immobility compared to the baselines. The code is available at https://github.com/tsinghua-fib-lab/DisasterMobLLM.

Article 875

Title@2025-07-26 (6): Reinforcement Learning for Finite Space Mean-Field Type Games

Title: Reinforcement Learning for Finite Space Mean-Field Type Games

Verstärkung Lernen für Finite Space Mean-Field Type Spiele

有限空间中场内运动会强化学习 2409.18152v3

Authors (3): Kai Shao, Jiacheng Shen, Mathieu Laurière

Mean field type games (MFTGs) describe Nash equilibria between large coalitions: each coalition consists of a continuum of cooperative agents who maximize the average reward of their coalition while interacting non-cooperatively with a finite number of other coalitions. Although the theory has been extensively developed, we are still lacking efficient and scalable computational methods. Here, we develop reinforcement learning methods for such games in a finite space setting with general dynamics and reward functions. We start by proving that the MFTG solution yields approximate Nash equilibria in finite-size coalition games. We then propose two algorithms. The first is based on the quantization of mean-field spaces and Nash Q-learning. We provide convergence and stability analysis. We then propose a deep reinforcement learning algorithm, which can scale to larger spaces. Numerical experiments in 4 environments with mean-field distributions of dimension up to 200 show the scalability and efficiency of the proposed method.

Article 876

Title@2025-07-26 (6): A Metabolic-Imaging Integrated Model for Prognostic Prediction in Colorectal Liver Metastases

Title: A Metabolic-Imaging Integrated Model for Prognostic Prediction in Colorectal Liver Metastases

Ein metabolisch-imaging integriertes Modell für prognostische Vorhersagen in Colorectal Lebermetastasen

彩色活性器元件中预测预测综合模型 2507.19734v1

Authors (5): Qinlong Li, Pu Sun, Guanlin Zhu, Tianjiao Liang, Honggang QI

Prognostic evaluation in patients with colorectal liver metastases (CRLM) remains challenging due to suboptimal accuracy of conventional clinical models. This study developed and validated a robust machine learning model for predicting postoperative recurrence risk. Preliminary ensemble models achieved exceptionally high performance (AUC > 0.98) but incorporated postoperative features, introducing data leakage risks. To enhance clinical applicability, we restricted input variables to preoperative baseline clinical parameters and radiomic features from contrast-enhanced CT imaging, specifically targeting recurrence prediction at 3, 6, and 12 months postoperatively. The 3-month recurrence prediction model demonstrated optimal performance with an AUC of 0.723 in cross-validation. Decision curve analysis revealed that across threshold probabilities of 0.55-0.95, the model consistently provided greater net benefit than “treat-all” or “treat-none” strategies, supporting its utility in postoperative surveillance and therapeutic decision-making. This study successfully developed a robust predictive model for early CRLM recurrence with confirmed clinical utility. Importantly, it highlights the critical risk of data leakage in clinical prognostic modeling and proposes a rigorous framework to mitigate this issue, enhancing model reliability and translational value in real-world settings.
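For readers unfamiliar with decision curve analysis, the sketch below computes the standard net-benefit quantity, TP/n minus FP/n weighted by the odds of the threshold probability, together with the "treat-all" reference (the "treat-none" reference is always 0). The labels and predicted probabilities are made-up toy values, not data from the study, and the function names are illustrative.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight
    return tp / n - odds * fp / n

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - odds * (1.0 - prevalence)

# Toy example (hypothetical values): 1 = recurrence, 0 = no recurrence.
y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
nb_model = net_benefit(y_true, y_prob, threshold=0.6)
nb_all = net_benefit_treat_all(y_true, threshold=0.6)
```

A model is said to add value at a given threshold when its net benefit exceeds both the treat-all value and zero.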

Article 877

Title@2025-07-26 (6): The Pitfalls of Imitation Learning when Actions are Continuous

Title: The Pitfalls of Imitation Learning when Actions are Continuous

Die Pitfalls des Imitationslernens, wenn die Handlungen kontinuierlich sind

连续行动时的模拟学习空洞 2503.09722v4

Authors (3): Max Simchowitz, Daniel Pfrommer, Ali Jadbabaie

We study the problem of imitating an expert demonstrator in a discrete-time, continuous state-and-action control system. We show that, even if the dynamics satisfy a control-theoretic property called exponential stability (i.e. the effects of perturbations decay exponentially quickly), and the expert is smooth and deterministic, any smooth, deterministic imitator policy necessarily suffers error on execution that is exponentially larger, as a function of problem horizon, than the error under the distribution of expert training data. Our negative result applies to any algorithm which learns solely from expert data, including both behavior cloning and offline-RL algorithms, unless the algorithm produces highly “improper” imitator policies–those which are non-smooth, non-Markovian, or which exhibit highly state-dependent stochasticity–or unless the expert trajectory distribution is sufficiently “spread.” We provide experimental evidence of the benefits of these more complex policy parameterizations, explicating the benefits of today’s popular policy parameterizations in robot learning (e.g. action-chunking and diffusion policies). We also establish a host of complementary negative and positive results for imitation in control systems.

Article 878

Title@2025-07-26 (6): Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness

Title: Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness

Deep RL Dual Sourcing Bestandsmanagement mit Versorgungs- und Kapazitätsrisiko-Bewusstsein

具有供应和能力风险意识的深入RL 双重保值双重保值库存管理 2507.14446v3

Authors (3): Defeng Liu, Ying Liu, Carson Eisenach

In this work, we study how to efficiently apply reinforcement learning (RL) for solving large-scale stochastic optimization problems by leveraging intervention models. The key to the proposed methodology is to better explore the solution space by simulating and composing the stochastic processes using pre-trained deep learning (DL) models. We demonstrate our approach on a challenging real-world application, the multi-sourcing multi-period inventory management problem in supply chain optimization. In particular, we employ deep RL models for learning and forecasting the stochastic supply chain processes under a range of assumptions. Moreover, we introduce a constraint coordination mechanism, designed to forecast dual costs given the cross-products constraints in the inventory network. We highlight that instead of directly modeling the complex physical constraints into the RL optimization problem and solving the stochastic problem as a whole, our approach breaks down those supply chain processes into scalable and composable DL modules, leading to improved performance on large real-world datasets. We also outline open problems for future research to further investigate the efficacy of such models.

Article 879

Title@2025-07-26 (6): Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Title: Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Geometrische Struktur von flachen neuronalen Netzwerken und konstruktive Kostenminimierung ${\mathcal L}^2$

浅层神经网络的几何结构以及将成本降至最低的建设性美元=2美元 2309.10370v3

Authors (2): Thomas Chen, Patrícia Muñoz Ewald

In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow ReLU networks through the explicit construction of upper bounds which appeal to the structure of classification data, without use of gradient descent. A key focus is on elucidating the geometric structure of approximate and precise minimizers. We consider an $\mathcal{L}^2$ cost function, input space $\mathbb{R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size that can be arbitrarily large. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$ where $\delta_P$ measures the signal-to-noise ratio of training data. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace in the input space ${\mathbb R}^M$. We comment on the characterization of the global minimum of the cost function in the given context.

Article 880

Title@2025-07-25 (5): GSCache: Real-Time Radiance Caching for Volume Path Tracing using 3D Gaussian Splatting

Title: GSCache: Real-Time Radiance Caching for Volume Path Tracing using 3D Gaussian Splatting

GSCache: Echtzeit-Radianz-Caching für Volume Path Tracing mit 3D Gaussian Splatting

GSCache: 使用 3D Gaussian Splatting 进行音量路径追踪的实时辐射缓存 2507.19718v1

Authors (4): David Bauer, Qi Wu, Hamid Gadirov, Kwan-Liu Ma

Real-time path tracing is rapidly becoming the standard for rendering in entertainment and professional applications. In scientific visualization, volume rendering plays a crucial role in helping researchers analyze and interpret complex 3D data. Recently, photorealistic rendering techniques have gained popularity in scientific visualization, yet they face significant challenges. One of the most prominent issues is slow rendering performance and high pixel variance caused by Monte Carlo integration. In this work, we introduce a novel radiance caching approach for path-traced volume rendering. Our method leverages advances in volumetric scene representation and adapts 3D Gaussian splatting to function as a multi-level, path-space radiance cache. This cache is designed to be trainable on the fly, dynamically adapting to changes in scene parameters such as lighting configurations and transfer functions. By incorporating our cache, we achieve less noisy, higher-quality images without increasing rendering costs. To evaluate our approach, we compare it against a baseline path tracer that supports uniform sampling and next-event estimation and the state-of-the-art for neural radiance caching. Through both quantitative and qualitative analyses, we demonstrate that our path-space radiance cache is a robust solution that is easy to integrate and significantly enhances the rendering quality of volumetric visualization applications while maintaining comparable computational efficiency.

Article 881

Title@2025-07-25 (5): Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search

Title: Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search

Beyond Neighbors: Semantische Kompression und Graph-Augmented Retrieval für erweiterte Vektorsuche

近邻以外地区:用于增强矢量搜索的语义压缩和图形放大检索 2507.19715v1

Authors (2): Rahul Raja, Arpita Vats

Vector databases typically rely on approximate nearest neighbor (ANN) search to retrieve the top-k closest vectors to a query in embedding space. While effective, this approach often yields semantically redundant results, missing the diversity and contextual richness required by applications such as retrieval-augmented generation (RAG), multi-hop QA, and memory-augmented agents. We introduce a new retrieval paradigm: semantic compression, which aims to select a compact, representative set of vectors that captures the broader semantic structure around a query. We formalize this objective using principles from submodular optimization and information geometry, and show that it generalizes traditional top-k retrieval by prioritizing coverage and diversity. To operationalize this idea, we propose graph-augmented vector retrieval, which overlays semantic graphs (e.g., kNN or knowledge-based links) atop vector spaces to enable multi-hop, context-aware search. We theoretically analyze the limitations of proximity-based retrieval under high-dimensional concentration and highlight how graph structures can improve semantic coverage. Our work outlines a foundation for meaning-centric vector search systems, emphasizing hybrid indexing, diversity-aware querying, and structured semantic retrieval. We make our implementation publicly available to foster future research in this area.
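The abstract frames semantic compression as a submodular selection problem but does not spell out the objective. As one hedged illustration, the sketch below uses a facility-location objective with greedy selection, which is a common submodular choice and not necessarily the authors' formulation. The candidate embeddings are hypothetical toy vectors standing in for an ANN shortlist.

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick k items maximizing sum_i max_{j in S} sim[i, j].

    sim must be a non-negative (n, n) similarity matrix.
    """
    n = sim.shape[0]
    selected = []
    best = np.zeros(n)  # how well each item is covered by the current set
    for _ in range(min(k, n)):
        # Marginal gain of adding each column j to the selected set.
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        if selected:
            gains[selected] = -np.inf  # never pick the same item twice
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected

# Toy shortlist: items 0 and 1 are near-duplicates, item 2 is different.
candidates = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
unit = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
similarity = np.clip(unit @ unit.T, 0.0, None)  # cosine similarity, clipped at 0
picked = greedy_facility_location(similarity, k=2)
```

In this toy case a plain top-2 search for a query near `[1, 0]` would return the two near-duplicates, whereas the coverage objective keeps one of them and adds the dissimilar item.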

Article 882

Title@2025-07-25 (5): Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

Title: Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

Oranits: Missionszuweisung und Aufgabe-Offloading in Open RAN-basierten ITS mit Hilfe von Metaheuristic und Deep Reinforcement Learning

Oranits:利用超常和深强化学习在以开放RAN为基础的ITS中执行特派任务和卸载任务 2507.19712v1

Authors (8): Ngoc Hung Nguyen, Nguyen Van Thieu, Quang-Trung Luu, Anh Tuan Nguyen, Senura Wanasekara, Nguyen Cong Luong, Fatemeh Kavehmadavani, Van-Dinh Nguyen

In this paper, we explore mission assignment and task offloading in an Open Radio Access Network (Open RAN)-based intelligent transportation system (ITS), where autonomous vehicles leverage mobile edge computing for efficient processing. Existing studies often overlook the intricate interdependencies between missions and the costs associated with offloading tasks to edge servers, leading to suboptimal decision-making. To bridge this gap, we introduce Oranits, a novel system model that explicitly accounts for mission dependencies and offloading costs while optimizing performance through vehicle cooperation. To achieve this, we propose a twofold optimization approach. First, we develop a metaheuristic-based evolutionary computing algorithm, namely the Chaotic Gaussian-based Global ARO (CGG-ARO), serving as a baseline for one-slot optimization. Second, we design an enhanced reward-based deep reinforcement learning (DRL) framework, referred to as the Multi-agent Double Deep Q-Network (MA-DDQN), that integrates both multi-agent coordination and multi-action selection mechanisms, significantly reducing mission assignment time and improving adaptability over baseline methods. Extensive simulations reveal that CGG-ARO improves the number of completed missions and overall benefit by approximately 7.1% and 7.7%, respectively. Meanwhile, MA-DDQN achieves even greater improvements of 11.0% in terms of mission completions and 12.5% in terms of the overall benefit. These results highlight the effectiveness of Oranits in enabling faster, more adaptive, and more efficient task processing in dynamic ITS environments.
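The abstract names a Double Deep Q-Network but gives no update rule. The sketch below shows only the generic single-agent Double DQN target, in which the online network selects the next action and the target network evaluates it. The multi-agent coordination and multi-action selection described above are not modeled, and the Q-values are illustrative numbers.

```python
import numpy as np

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Generic Double DQN bootstrap target for one transition."""
    if done:
        return float(reward)
    # The online network chooses the action ...
    best_action = int(np.argmax(next_q_online))
    # ... and the target network supplies its value.
    return float(reward + gamma * next_q_target[best_action])

# Illustrative Q-values for three actions in the next state.
q_online_next = np.array([1.0, 3.0, 2.0])
q_target_next = np.array([10.0, 20.0, 30.0])
target = double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.5)
```

Decoupling action selection from action evaluation is what distinguishes this target from the plain DQN target, which would take the maximum of the target network's values directly.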

Article 883

Title@2025-07-25 (5): A Lightweight Deep Learning-based Model for Ranking Influential Nodes in Complex Networks

Title: A Lightweight Deep Learning-based Model for Ranking Influential Nodes in Complex Networks

Ein leichtes Deep Learning-basiertes Modell für das Ranking von Einflussknoten in komplexen Netzwerken

在复杂网络中确定有影响的节点的轻量级深学习模式 2507.19702v1

Authors (2): Mohammed A. Ramadhan, Abdulhakeem O. Mohammed

Identifying influential nodes in complex networks is a critical task with a wide range of applications across different domains. However, existing approaches often face trade-offs between accuracy and computational efficiency. To address these challenges, we propose 1D-CGS, a lightweight and effective hybrid model that integrates the speed of one-dimensional convolutional neural networks (1D-CNN) with the topological representation power of GraphSAGE for efficient node ranking. The model uses a lightweight input representation built on two straightforward and significant topological features: node degree and average neighbor degree. These features are processed through 1D convolutions to extract local patterns, followed by GraphSAGE layers to aggregate neighborhood information. We formulate the node ranking task as a regression problem and use the Susceptible-Infected-Recovered (SIR) model to generate ground-truth influence scores. 1D-CGS is initially trained on synthetic networks generated by the Barabasi-Albert model and then applied to real-world networks for identifying influential nodes. Experimental evaluations on twelve real-world networks demonstrate that 1D-CGS significantly outperforms traditional centrality measures and recent deep learning models in ranking accuracy while running very quickly. The proposed model achieves an average improvement of 4.73% in Kendall’s Tau correlation and 7.67% in Jaccard Similarity over the best-performing deep learning baselines. It also achieves an average Monotonicity Index (MI) score of 0.99 and produces near-perfect rank distributions, indicating highly unique and discriminative rankings. Furthermore, all experiments confirm that 1D-CGS runs significantly faster than existing deep learning methods, making it suitable for large-scale applications.
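The two input features named in the abstract, node degree and average neighbor degree, are simple to compute. The sketch below does so for an undirected graph stored as an adjacency dictionary; the function name and the star-graph example are illustrative, and the convolutional and GraphSAGE layers are not shown.

```python
def degree_features(adj):
    """Return {node: (degree, average neighbor degree)} for an undirected graph.

    adj maps each node to a list of its neighbors.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    features = {}
    for v, nbrs in adj.items():
        # Isolated nodes get an average neighbor degree of 0.0 by convention here.
        avg_nbr = sum(degree[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        features[v] = (degree[v], avg_nbr)
    return features

# Star graph: node 0 is the hub, nodes 1-3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
features = degree_features(star)
```

In the star graph the hub has a high degree but low average neighbor degree, while each leaf has the opposite profile, which is the kind of contrast these two features expose.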

Article 884

Title@2025-07-25 (5): A Validation Approach to Over-parameterized Matrix and Image Recovery

Title: A Validation Approach to Over-parameterized Matrix and Image Recovery

Ein Validierungsansatz für überparameterisierte Matrix und Bildwiederherstellung

超参数矩阵和图像恢复的验证方法 2209.10675v3

Authors (5): Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu

This paper studies the problem of recovering a low-rank matrix from several noisy random linear measurements. We consider the setting where the rank of the ground-truth matrix is unknown a priori and use an objective function built from a rank-overspecified factored representation of the matrix variable, where the global optimal solutions overfit and do not correspond to the underlying ground truth. We then solve the associated nonconvex problem using gradient descent with small random initialization. We show that as long as the measurement operators satisfy the restricted isometry property (RIP) with its rank parameter scaling with the rank of the ground-truth matrix rather than with the overspecified matrix rank, the gradient descent iterates follow a particular trajectory towards the ground-truth matrix and achieve nearly information-theoretically optimal recovery when stopped appropriately. We then propose an efficient stopping strategy based on the common hold-out method and show that it provably detects a nearly optimal estimator. Moreover, experiments show that the proposed validation approach can also be efficiently used for image restoration with deep image prior, which over-parameterizes an image with a deep network.
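The following is a minimal sketch of hold-out early stopping for gradient descent on a rank-overspecified factorization. It assumes a symmetric positive semidefinite ground truth parameterized as F Fᵀ and Gaussian measurement matrices, which is a simplification chosen for brevity. All sizes, the step size, the noise level, and the train/validation split are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_rank, over_rank = 10, 1, 5   # over_rank deliberately exceeds true_rank
m_train, m_val = 100, 20

# Rank-1 PSD ground truth with a fixed scale.
u = rng.standard_normal((n, true_rank))
u *= 2.0 / np.linalg.norm(u)
X_true = u @ u.T

# Symmetric Gaussian measurement matrices and noisy linear measurements.
A = rng.standard_normal((m_train + m_val, n, n))
A = (A + A.transpose(0, 2, 1)) / 2.0
y = np.einsum("kij,ij->k", A, X_true) + 0.1 * rng.standard_normal(m_train + m_val)
train = np.arange(m_train)
val = np.arange(m_train, m_train + m_val)

def residual(F, idx):
    """Measurement residuals of the estimate F F^T on the given index set."""
    return np.einsum("kij,ij->k", A[idx], F @ F.T) - y[idx]

F = 1e-3 * rng.standard_normal((n, over_rank))  # small random initialization
baseline_val = float(np.mean(residual(F, val) ** 2))
best_val, best_F, best_iter = baseline_val, F.copy(), 0
step = 0.01
for t in range(1, 501):
    r = residual(F, train)
    # Gradient of (1 / (2 m)) * sum_k r_k^2 with respect to F (A_k symmetric).
    grad = 2.0 * np.einsum("k,kij->ij", r, A[train]) @ F / m_train
    F = F - step * grad
    v = float(np.mean(residual(F, val) ** 2))
    if v < best_val:  # keep the iterate with the lowest held-out error
        best_val, best_F, best_iter = v, F.copy(), t
```

The iterate `best_F` is the one a hold-out rule would return; training itself never touches the validation measurements.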

Article 885

Title@2025-07-25 (5): Disjoint Generative Models

Title: Disjoint Generative Models

Disjoint Generative Modelle

分离生成模型 2507.19700v1

Authors (5): Anton Danholt Lautrup, Muhammad Rajabinasab, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp

We propose a new framework for generating cross-sectional synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that help illuminate some of the design choices that one may make. The principal benefit of disjoint generative models is significantly increased privacy at only a low utility cost. Additional findings include increased effectiveness and feasibility for certain model types and the possibility of mixed-model synthesis.

Article 886

Title: NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: A Multi-Modal Dataset and Methodology

NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: Ein multimodaler Datensatz und Methodik

NAICS-NAICS-Aware 用于大型POI共同访问预测:多模式数据集和方法的大型 POI 联合访问预测的神经网络 2507.19697v1

Authors (6): Yazeed Alrubyli, Omar Alomeir, Abrar Wafa, Diána Hidvégi, Hend Alrasheed, Mohsen Bahrami

Understanding where people go after visiting one business is crucial for urban planning, retail analytics, and location-based services. However, predicting these co-visitation patterns across millions of venues remains challenging due to extreme data sparsity and the complex interplay between spatial proximity and business relationships. Traditional approaches using only geographic distance fail to capture why coffee shops attract different customer flows than fine dining restaurants, even when co-located. We introduce NAICS-aware GraphSAGE, a novel graph neural network that integrates business taxonomy knowledge through learnable embeddings to predict population-scale co-visitation patterns. Our key insight is that business semantics, captured through detailed industry codes, provide crucial signals that pure spatial models cannot explain. The approach scales to massive datasets (4.2 billion potential venue pairs) through efficient state-wise decomposition while combining spatial, temporal, and socioeconomic features in an end-to-end framework. Evaluated on our POI-Graph dataset comprising 94.9 million co-visitation records across 92,486 brands and 48 US states, our method achieves significant improvements over state-of-the-art baselines: the R-squared value increases from 0.243 to 0.625 (a 157 percent improvement), with strong gains in ranking quality (32 percent improvement in NDCG at 10).
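The "157 percent improvement" is a relative gain over the baseline R-squared rather than a difference in percentage points. The short check below reproduces that figure from the two values quoted in the abstract; the variable names are illustrative.

```python
baseline_r2 = 0.243   # baseline R-squared quoted in the abstract
proposed_r2 = 0.625   # proposed method's R-squared quoted in the abstract

absolute_gain = proposed_r2 - baseline_r2
relative_gain_percent = 100.0 * absolute_gain / baseline_r2
```

Rounded to the nearest integer, `relative_gain_percent` gives the 157 percent stated above, while the absolute gain is about 0.38.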

Article 887

Title@2025-07-25 (5): BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning

Title: BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning

BEAVER: Bauen von Umgebungen mit einschätzbarer Variation zur Bewertung von multi-objektiven Verstärkungslernen

BEAVER: 在环境建设中采用可评估的变数评估多目标强化学习 2507.07769v3

Authors (3): Ruohong Liu, Jack Umenberger, Yize Chen

Recent years have seen significant advancements in designing reinforcement learning (RL)-based agents for building energy management. While individual success is observed in simulated or controlled environments, the scalability of RL approaches in terms of efficiency and generalization across building dynamics and operational scenarios remains an open question. In this work, we formally characterize the generalization space for the cross-environment, multi-objective building energy management task, and formulate the multi-objective contextual RL problem. Such a formulation helps understand the challenges of transferring learned policies across varied operational contexts such as climate and heat convection dynamics under multiple control objectives such as comfort level and energy consumption. We provide a principled way to parameterize such contextual information in realistic building RL environments, and construct a novel benchmark to facilitate the evaluation of generalizable RL algorithms in practical building control tasks. Our results show that existing multi-objective RL methods are capable of achieving reasonable trade-offs between conflicting objectives. However, their performance degrades under certain environment variations, underscoring the importance of incorporating dynamics-dependent contextual information into the policy learning process.

Article 888

Title@2025-07-25 (5): KD-GAT: Combining Knowledge Distillation and Graph Attention Transformer for a Controller Area Network Intrusion Detection System

Title: KD-GAT: Combining Knowledge Distillation and Graph Attention Transformer for a Controller Area Network Intrusion Detection System

KD-GAT: Kombination von Wissensdestillation und Graphen-Achtungstransformator für ein Controller Area Network Intrusion Detection System

KD-GAT:将知识蒸馏和图形关注变异器合并成一个总控制区域网络入侵探测系统 2507.19686v1

Authors (4): Robert Frenken, Sidra Ghayour Bhatti, Hanqin Zhang, Qadeer Ahmed

The Controller Area Network (CAN) protocol is widely adopted for in-vehicle communication but lacks inherent security mechanisms, making it vulnerable to cyberattacks. This paper introduces KD-GAT, an intrusion detection framework that combines Graph Attention Networks (GATs) with knowledge distillation (KD) to enhance detection accuracy while reducing computational complexity. In our approach, CAN traffic is represented as graphs using a sliding window to capture temporal and relational patterns. A multi-layer GAT with jumping knowledge aggregation acts as the teacher model, while a compact student GAT, only 6.32% the size of the teacher, is trained via a two-phase process involving supervised pretraining and knowledge distillation with both soft and hard label supervision. Experiments on three benchmark datasets (Car-Hacking, Car-Survival, and can-train-and-test) demonstrate that both teacher and student models achieve strong results, with the student model attaining 99.97% and 99.31% accuracy on Car-Hacking and Car-Survival, respectively. However, significant class imbalance in can-train-and-test led to reduced performance for both models on this dataset. Addressing this imbalance remains an important direction for future work.
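The abstract mentions distillation with both soft and hard label supervision without giving the loss. A common form, sketched here as an assumption rather than as the paper's exact objective, mixes a temperature-scaled KL term against the teacher with ordinary cross-entropy against the true labels. The temperature `T` and mixing weight `alpha` are hypothetical hyperparameters, and the logits are toy values.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Row-wise softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * soft (KL to teacher) + (1 - alpha) * hard (cross-entropy)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 as is conventional.
    soft = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1).mean() * T * T
    p_hard = softmax(student_logits)
    hard = -np.log(p_hard[np.arange(len(labels)), labels]).mean()
    return float(alpha * soft + (1.0 - alpha) * hard)

# Toy batch: two samples, two classes (e.g. benign vs. attack).
teacher = np.array([[2.0, 0.0], [0.0, 2.0]])
student = np.array([[2.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

When the student's logits match the teacher's, the soft term vanishes and only the weighted cross-entropy remains.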

Article 889

Title@2025-07-25 (5): Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

Title: Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

kompositorische Fähigkeiten treten multiplikativ auf: Die Erforschung von Diffusionsmodellen auf einer synthetischen Aufgabe

多重复制:探索合成工作传播模型 2310.09336v5

Authors (4): Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they exhibit the capability to compose a novel set of concepts to generate outputs not seen in the training data set. Prior work demonstrates that recent diffusion models do exhibit intriguing compositional generalization abilities, but also fail unpredictably. Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model’s ability to generate samples out-of-distribution. Our results show: (i) the order in which the ability to generate samples from a concept and compose them emerges is governed by the structure of the underlying data-generating process; (ii) performance on compositional tasks exhibits a sudden “emergence” due to multiplicative reliance on the performance of constituent tasks, partially explaining emergent phenomena seen in generative models; and (iii) composing concepts with lower frequency in the training data to generate out-of-distribution samples requires considerably more optimization steps compared to generating in-distribution samples. Overall, our study lays a foundation for understanding capabilities and compositionality in generative models from a data-centric perspective.
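Finding (ii) can be illustrated with a toy calculation. If success on a compositional task requires every constituent to succeed, and the constituents are treated as independent (an assumption made here purely for illustration), the compositional accuracy is the product of the constituent accuracies. It therefore stays low until all constituents are learned well.

```python
import math

def compositional_accuracy(constituent_accuracies):
    """Product of per-constituent accuracies, assuming independence."""
    return math.prod(constituent_accuracies)

# Three constituent concepts improving smoothly and in lockstep (toy values).
early = compositional_accuracy([0.5, 0.5, 0.5])
late = compositional_accuracy([0.9, 0.9, 0.9])
```

Moving each constituent from 0.5 to 0.9 lifts the product from 0.125 to roughly 0.73, which is why gradual progress on the parts can look like a sudden jump on the whole.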

Article 890

Title@2025-07-25 (5): Salsa as a Nonverbal Embodied Language – The CoMPAS3D Dataset and Benchmarks

Title: Salsa as a Nonverbal Embodied Language – The CoMPAS3D Dataset and Benchmarks

Salsa als nonverbale Sprache – Der CoMPAS3D Datensatz und Benchmarks

Salsa 作为一种非语言的成形语言 – – CoMPAS3D数据集和基准 2507.19684v1

Authors (6): Bermet Burkanova, Payam Jome Yazdian, Chuxuan Zhang, Trinity Evans, Paige Tuttösí, Angelica Lim

Imagine a humanoid that can safely and creatively dance with a human, adapting to its partner’s proficiency, using haptic signaling as a primary form of communication. While today’s AI systems excel at text or voice-based interaction with large language models, human communication extends far beyond text: it includes embodied movement, timing, and physical coordination. Modeling coupled interaction between two agents poses a formidable challenge: it is continuous, bidirectionally reactive, and shaped by individual variation. We present CoMPAS3D, the largest and most diverse motion capture dataset of improvised salsa dancing, designed as a challenging testbed for interactive, expressive humanoid AI. The dataset includes 3 hours of leader-follower salsa dances performed by 18 dancers spanning beginner, intermediate, and professional skill levels. For the first time, we provide fine-grained salsa expert annotations, covering over 2,800 move segments, including move types, combinations, execution errors and stylistic elements. We draw analogies between partner dance communication and natural language, evaluating CoMPAS3D on two benchmark tasks for synthetic humans that parallel key problems in spoken language and dialogue processing: leader or follower generation with proficiency levels (speaker or listener synthesis), and duet (conversation) generation. Towards a long-term goal of partner dance with humans, we release the dataset, annotations, and code, along with a multitask SalsaAgent model capable of performing all benchmark tasks, alongside additional baselines to encourage research in socially interactive embodied AI and creative, expressive humanoid motion generation.

Article 891

Title@2025-07-25 (5): Feature learning is decoupled from generalization in high capacity neural networks

Title: Feature learning is decoupled from generalization in high capacity neural networks

Feature-Lernen wird von der Generalisierung in hochkapazitätigen neuronalen Netzwerken entkoppelt

特色学习与高容量神经网络的一般化脱钩 2507.19680v1

Authors (6): Niclas Alexander Göring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis

Neural networks outperform kernel methods, sometimes by orders of magnitude, e.g. on staircase functions. This advantage stems from the ability of neural networks to learn features, adapting their hidden representations to better capture the data. We introduce a concept we call feature quality to measure this performance improvement. We examine existing theories of feature learning and demonstrate empirically that they primarily assess the strength of feature learning, rather than the quality of the learned features themselves. Consequently, current theories of feature learning do not provide a sufficient foundation for developing theories of neural network generalization.

Article 892

Title@2025-07-25 (5): Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges

Title: Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges

Ausrichtung und Sicherheit in großen Sprachmodellen: Sicherheitsmechanismen, Trainingsparadigmen und neue Herausforderungen

大语言模式的协调和安全:安全机制、培训范式和新挑战 2507.19672v1

Authors (50): Haoran Lu, Luyang Fang, Ruidong Zhang, Xinliang Li, Jiazhang Cai, Huimin Cheng, Lin Tang, Ziyu Liu, Zeliang Sun, Tao Wang, Yingchuan Zhang, Arif Hassan Zidan, Jinwen Xu, Jincheng Yu, Meizhi Yu, Hanqi Jiang, Xilin Gong, Weidi Luo, Bolun Sun, Yongkai Chen, Terry Ma, Shushan Wu, Yifan Zhou, Junhao Chen, Haotian Xiang, Jing Zhang, Afrar Jahin, Wei Ruan, Ke Deng, Yi Pan, Peilong Wang, Jiahui Li, Zhengliang Liu, Lu Zhang, Lin Zhao, Wei Liu, Dajiang Zhu, Xin Xing, Fei Dou, Wei Zhang, Chao Huang, Rongjie Liu, Mengrui Zhang, Yiwen Liu, Xiaoxiao Sun, Qin Lu, Zhen Xiang, Wenxuan Zhong, Tianming Liu, Ping Ma

Due to the remarkable capabilities and growing impact of large language models (LLMs), they have been deeply integrated into many aspects of society. Thus, ensuring their alignment with human values and intentions has emerged as a critical challenge. This survey provides a comprehensive overview of practical alignment techniques, training protocols, and empirical findings in LLM alignment. We analyze the development of alignment methods across diverse paradigms, characterizing the fundamental trade-offs between core alignment objectives. Our analysis shows that while supervised fine-tuning enables basic instruction-following, preference-based methods offer more flexibility for aligning with nuanced human intent. We discuss state-of-the-art techniques, including Direct Preference Optimization (DPO), Constitutional AI, brain-inspired methods, and alignment uncertainty quantification (AUQ), highlighting their approaches to balancing quality and efficiency. We review existing evaluation frameworks and benchmarking datasets, emphasizing limitations such as reward misspecification, distributional robustness, and scalable oversight. We summarize strategies adopted by leading AI labs to illustrate the current state of practice. We conclude by outlining open problems in oversight, value pluralism, robustness, and continuous alignment. This survey aims to inform both researchers and practitioners navigating the evolving landscape of LLM alignment.

Article 893

Title@2025-07-25 (5): Adaptive Bayesian Data-Driven Design of Reliable Solder Joints for Micro-electronic Devices

Title: Adaptive Bayesian Data-Driven Design of Reliable Solder Joints for Micro-electronic Devices

Adaptives Bayesian Data-Driven Design von zuverlässigen Lötgelenken für mikroelektronische Geräte

微电子设备可靠太阳能联合点的调适贝耶斯数据驱动设计 2507.19663v1

Authors (4): Leo Guo, Adwait Inamdar, Willem D. van Driel, GuoQi Zhang

Solder joint reliability related to failures due to thermomechanical loading is a critically important yet physically complex engineering problem. As a result, simulated behavior is oftentimes computationally expensive. In an increasingly data-driven world, the usage of efficient data-driven design schemes is a popular choice. Among them, Bayesian optimization (BO) with Gaussian process regression is one of the most important representatives. The authors argue that computational savings can be obtained from exploiting thorough surrogate modeling and selecting a design candidate based on multiple acquisition functions. This is feasible due to the relatively low computational cost, compared to the expensive simulation objective. This paper addresses the shortcomings in the adjacent literature by providing and implementing a novel heuristic framework to perform BO with adaptive hyperparameters across the various optimization iterations. Adaptive BO is subsequently compared to regular BO when faced with synthetic objective minimization problems. The results show the efficiency of adaptive BO when compared with any of the worst-performing regular Bayesian schemes. As an engineering use case, the solder joint reliability problem is tackled by minimizing the accumulated non-linear creep strain under a cyclic thermal load. Results show that adaptive BO outperforms regular BO by 3% on average at any given computational budget threshold, critically saving half of the computational expense budget. This practical result underlines the methodological potential of the adaptive Bayesian data-driven methodology to achieve better results and cut optimization-related expenses. Lastly, in order to promote the reproducibility of the results, the data-driven implementations are made available on an open-source basis.

Article 894

Title@2025-07-25 (5): Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity

Title: Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity

Nicht-konvexe Matrix-Erfassung: Brechen der quadratischen Rank-Schranke in der Probenkomplexität

非曲线矩阵表感测:打破样本复杂程度的二次级屏障 2408.13276v4

Authors (2): Dominik Stöger, Yizhe Zhu

For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent. Under certain statistical model assumptions, it is known that nuclear norm minimization recovers the ground truth as soon as the number of samples scales linearly with the number of degrees of freedom of the ground-truth. In contrast, while non-convex approaches are computationally less expensive, existing recovery guarantees assume that the number of samples scales at least quadratically with the rank $r$ of the ground-truth matrix. In this paper, we close this gap by showing that the non-convex approaches can be as efficient as nuclear norm minimization in terms of sample complexity. Namely, we consider the problem of reconstructing a positive semidefinite matrix from a few Gaussian measurements. We show that factorized gradient descent with spectral initialization converges to the ground truth at a linear rate as soon as the number of samples scales with $ \Omega (rd\kappa^2)$, where $d$ is the dimension, and $\kappa$ is the condition number of the ground truth matrix. This improves the previous rank-dependence in the sample complexity of non-convex matrix factorization from quadratic to linear. Furthermore, we extend our theory to the noisy setting, where we show that with noisy measurements, factorized gradient descent with spectral initialization converges to the minimax optimal error up to a factor linear in $\kappa$. Our proof relies on a probabilistic decoupling argument, where we show that the gradient descent iterates are only weakly dependent on the individual entries of the measurement matrices. We expect that our proof technique is of independent interest for other non-convex problems.
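
As a rough illustration of the setting described in this abstract, factorized gradient descent with spectral initialization can be written as follows. The notation ($A_i$, $y_i$, $U_t$, step size $\mu$, and the $1/(4m)$ scaling) is an assumption chosen for this sketch and is not taken from the paper; the measurement matrices are assumed symmetric and normalized so that $\mathbb{E}[\langle A_i, X\rangle A_i] = X$.

```latex
% Measurements of a rank-r PSD ground truth X_* in R^{d x d}:
%   y_i = <A_i, X_*>,  i = 1, ..., m
\begin{align*}
  f(U) &= \frac{1}{4m} \sum_{i=1}^{m} \bigl( \langle A_i, U U^\top \rangle - y_i \bigr)^2,
        \qquad U \in \mathbb{R}^{d \times r}, \\
  \nabla f(U) &= \frac{1}{m} \sum_{i=1}^{m} \bigl( \langle A_i, U U^\top \rangle - y_i \bigr) A_i U, \\
  U_0 &= V_r \Lambda_r^{1/2},
        \quad \text{where } V_r \Lambda_r V_r^\top \text{ is the top-}r
        \text{ eigendecomposition of } \frac{1}{m} \sum_{i=1}^{m} y_i A_i, \\
  U_{t+1} &= U_t - \mu \, \nabla f(U_t).
\end{align*}
```

In this notation, the abstract's claim is that the iterates converge linearly once $m$ scales as $\Omega(rd\kappa^2)$, rather than quadratically in $r$.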

Article 895

Title@2025-07-25 (5): Growing Neural Networks: Dynamic Evolution through Gradient Descent

Title: Growing Neural Networks: Dynamic Evolution through Gradient Descent

Wachsende neurale Netzwerke: Dynamische Evolution durch gradienten Abstieg

不断增长的神经网络:通过渐渐后代的动态演变 2501.18012v2

Authors (5): Anil Radhakrishnan, John F. Lindner, Scott T. Miller, Sudeshna Sinha, William L. Ditto

In contrast to conventional artificial neural networks, which are structurally static, we present two approaches for evolving small networks into larger ones during training. The first method employs an auxiliary weight that directly controls network size, while the second uses a controller-generated mask to modulate neuron participation. Both approaches optimize network size through the same gradient-descent algorithm that updates the network’s weights and biases. We evaluate these growing networks on nonlinear regression and classification tasks, where they consistently outperform static networks of equivalent final size. We then explore the hyperparameter space of these networks to find associated scaling relations relative to their static counterparts. Our results suggest that starting small and growing naturally may be preferable to simply starting large, particularly as neural networks continue to grow in size and energy consumption.

Article 896

Title@2025-07-25 (5): On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments

Title: On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments

Über die Grenzen von Ray-Tracing für lernbasierte RF-Aufgaben in städtischen Umgebungen

城市环境中基于学习的RF任务 2507.19653v1

Authors (4): Armen Manukyan, Hrant Khachatrian, Edvard Ghukasyan, Theofanis P. Raptis

We study the realism of Sionna v1.0.2 ray-tracing for outdoor cellular links in central Rome. We use a real measurement set of 1,664 user-equipments (UEs) and six nominal base-station (BS) sites. Using these fixed positions, we systematically vary the main simulation parameters, including path depth, diffuse/specular/refraction flags, and carrier frequency, as well as antenna properties such as altitude, radiation pattern, and orientation. Simulator fidelity is scored for each base station via Spearman correlation between measured and simulated powers, and by a k-nearest-neighbor (kNN) localization algorithm using RSSI-based fingerprints. Across all experiments, solver hyper-parameters have an immaterial effect on the chosen metrics. In contrast, antenna locations and orientations prove decisive. By simple greedy optimization we improve the Spearman correlation by 5% to 130% for various base stations, while the kNN-based localization error using only simulated data as reference points decreases by one-third on real-world samples, yet remains twice as high as the error with purely real data. Precise geometry and credible antenna models are therefore necessary but not sufficient; faithfully capturing the residual urban noise remains an open challenge for transferable, high-fidelity outdoor RF simulation.
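
The fidelity score mentioned in this abstract is a rank correlation between measured and simulated received powers. A minimal standard-library sketch of Spearman correlation is shown below; the function names and the sample power values are hypothetical and serve only to illustrate the metric, not the authors' evaluation code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied entries their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical received powers in dBm for one base station (illustrative only).
measured = [-90.0, -85.0, -70.0, -100.0]
simulated = [-80.0, -75.0, -60.0, -95.0]
score = spearman(measured, simulated)
```

Because the metric depends only on ranks, a simulator that gets the ordering of link strengths right scores highly even if its absolute power levels are offset.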

Article 897

Title@2025-07-25 (5): Street network sub-patterns and travel mode

Title: Street network sub-patterns and travel mode

Straßennetz-Untermuster und Reisemodus

街道网络次级模式和旅行模式 2507.19648v1

Authors (4): Juan Fernando Riascos Goyes, Michael Lowry, Nicolás Guarín Zapata, Juan Pablo Ospina

Urban morphology has long been recognized as a factor shaping human mobility, yet comparative and formal classifications of urban form across metropolitan areas remain limited. Building on theoretical principles of urban structure and advances in unsupervised learning, we systematically classified the built environment of nine U.S. metropolitan areas using structural indicators such as density, connectivity, and spatial configuration. The resulting morphological types were linked to mobility patterns through descriptive statistics, marginal effects estimation, and post hoc statistical testing. Here we show that distinct urban forms are systematically associated with different mobility behaviors, such as reticular morphologies being linked to significantly higher public transport use (marginal effect = 0.49) and reduced car dependence (-0.41), while organic forms are associated with increased car usage (0.44), and substantial declines in public transport (-0.47) and active mobility (-0.30). These effects are statistically robust (p < 1e-19), highlighting that the spatial configuration of urban areas plays a fundamental role in shaping transportation choices. Our findings extend previous work by offering a reproducible framework for classifying urban form and demonstrate the added value of morphological analysis in comparative urban research. These results suggest that urban form should be treated as a key variable in mobility planning and provide empirical support for incorporating spatial typologies into sustainable urban policy design.

Article 898

Title@2025-07-25 (5): GABRIL: Gaze-Based Regularization for Mitigating Causal Confusion in Imitation Learning

Title: GABRIL: Gaze-Based Regularization for Mitigating Causal Confusion in Imitation Learning

GABRIL: Gaze-based Regularization zur Minderung von Kausalverwirrung im Imitationslernen

GABRIL: 减少模拟学习中因果融合的基于气体的正规化 2507.19647v1

Authors (4): Amin Banayeeanzade, Fatemeh Bahrani, Yutai Zhou, Erdem Bıyık

Imitation Learning (IL) is a widely adopted approach which enables agents to learn from human expert demonstrations by framing the task as a supervised learning problem. However, IL often suffers from causal confusion, where agents misinterpret spurious correlations as causal relationships, leading to poor performance in testing environments with distribution shift. To address this issue, we introduce GAze-Based Regularization in Imitation Learning (GABRIL), a novel method that leverages the human gaze data gathered during the data collection phase to guide the representation learning in IL. GABRIL utilizes a regularization loss which encourages the model to focus on causally relevant features identified through expert gaze and consequently mitigates the effects of confounding variables. We validate our approach in Atari environments and the Bench2Drive benchmark in CARLA by collecting human gaze datasets and applying our method in both domains. Experimental results show that the improvement of GABRIL over behavior cloning is around 179% more than the same number for other baselines in the Atari setup and 76% more in the CARLA setup. Finally, we show that our method provides extra explainability when compared to regular IL agents.

Article 899

Title@2025-07-25 (5): Categorical Schrödinger Bridge Matching

Title: Categorical Schrödinger Bridge Matching

Kategorische Schrödinger-Brücke passend

分类式 Schrödinger 桥配对 2502.01416v3

Authors (2): Grigoriy Ksenofontov, Alexander Korotin

The Schrödinger Bridge (SB) is a powerful framework for solving generative modeling tasks such as unpaired domain translation. Most SB-related research focuses on continuous data space $\mathbb{R}^{D}$ and leaves open theoretical and algorithmic questions about applying SB methods to discrete data, e.g., on finite spaces $\mathbb{S}^{D}$. Notable examples of such sets $\mathbb{S}$ are codebooks of vector-quantized (VQ) representations of modern autoencoders, tokens in texts, categories of atoms in molecules, etc. In this paper, we provide a theoretical and algorithmic foundation for solving SB in discrete spaces using the recently introduced Iterative Markovian Fitting (IMF) procedure. Specifically, we theoretically justify the convergence of discrete-time IMF (D-IMF) to SB in discrete spaces. This enables us to develop a practical computational algorithm for SB, which we call Categorical Schrödinger Bridge Matching (CSBM). We show the performance of CSBM via a series of experiments with synthetic data and VQ representations of images. The code of CSBM is available at https://github.com/gregkseno/csbm.

Article 900

Title@2025-07-25 (5): Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy

Title: Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy

Variationelle Schlussfolgerung optimiert mit der gekrümmten Geometrie der gekoppelten freien Energie

使用共同自由能源曲线几何法优化 2506.09091v3

Authors (5): Kenric Nelson, Igor Oliveira, Amenah Al-Najafi, Fode Zhang, Hon Keung Tony Ng

We introduce an optimization framework for variational inference based on the coupled free energy, extending variational inference techniques to account for the curved geometry of the coupled exponential family. This family includes important heavy-tailed distributions such as the generalized Pareto and the Student’s t. By leveraging the coupled free energy, which is equal to the coupled evidence lower bound (ELBO) of the inverted probabilities, we improve the accuracy and robustness of the learned model. The coupled generalization of Fisher Information metric and the affine connection. The method is applied to the design of a coupled variational autoencoder (CVAE). By using the coupling for both the distributions and cost functions, the reconstruction metric is derived to still be the mean-square average loss with modified constants. The novelty comes from sampling the heavy-tailed latent distribution with its associated coupled probability, which has faster decaying tails. The result is the ability to train a model robust against severe outliers, while assuring that the training process is stable. The Wasserstein-2 or Fréchet Inception Distance of the reconstructed CelebA images shows the CVAE has a 3% improvement over the VAE after 5 epochs of training.

Article 901

Title@2025-07-25 (5): Mask prior-guided denoising diffusion improves inverse protein folding

Title: Mask prior-guided denoising diffusion improves inverse protein folding

Maskieren Sie vorgeführte Denoisierung Diffusion verbessert inverse Proteinfaltung

面罩前制导除去喷雾扩散会改善蛋白质反折叠 2412.07815v2

Authors (7): Peizhen Bai, Filip Miljković, Xianyuan Liu, Leonardo De Maria, Rebecca Croasdale-Wood, Owen Rackham, Haiping Lu

Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure, with recent deep-learning advances showing strong potential and competitive performance. However, challenges remain, such as predicting elements with high structural uncertainty, including disordered regions. To tackle such low-confidence residue prediction, we propose a Mask-prior-guided denoising Diffusion (MapDiff) framework that accurately captures both structural information and residue interactions for inverse protein folding. MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise, conditioned on a given protein backbone. To incorporate structural information and residue interactions, we develop a graph-based denoising network with a mask-prior pre-training strategy. Moreover, in the generative process, we combine the denoising diffusion implicit model with Monte-Carlo dropout to reduce uncertainty. Evaluation on four challenging sequence design benchmarks shows that MapDiff substantially outperforms state-of-the-art methods. Furthermore, the in silico sequences generated by MapDiff closely resemble the physico-chemical and structural characteristics of native proteins across different protein families and architectures.

Article 902

Title@2025-07-25 (5): Directly Learning Stock Trading Strategies Through Profit Guided Loss Functions

Title: Directly Learning Stock Trading Strategies Through Profit Guided Loss Functions

Direktes Lernen von Tradingstrategien durch gewinnorientierte Verlustfunktionen

通过利润引导损失直接学习证券交易战略 2507.19639v1

Authors (7): Devroop Kar, Zimeng Lyu, Sheeraja Rajakrishnan, Hao Zhang, Alex Ororbia, Travis Desell, Daniel Krutz

Stock trading has always been a challenging task due to the highly volatile nature of the stock market. Making sound trading decisions to generate profit is particularly difficult under such conditions. To address this, we propose four novel loss functions to drive decision-making for a portfolio of stocks. These functions account for the potential profits or losses from buying or shorting the respective stocks, enabling potentially any artificial neural network to directly learn an effective trading strategy. Despite the high volatility in stock market fluctuations over time, training time-series models such as transformers on these loss functions resulted in trading strategies that generated significant profits on a portfolio of 50 different S&P 500 company stocks as compared to benchmark reinforcement learning techniques and a baseline buy-and-hold method. As an example, using 2021, 2022 and 2023 as three test periods, the Crossformer model adapted with our best loss function was most consistent, resulting in returns of 51.42%, 51.04% and 48.62% respectively. In comparison, the best performing state-of-the-art reinforcement learning methods, PPO and DDPG, only delivered maximum profits of around 41%, 2.81% and 41.58% for the same periods. The code is available at https://anonymous.4open.science/r/bandit-stock-trading-58C8/README.md.
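
To make the idea of a profit-guided loss concrete, here is a minimal sketch of one generic possibility: the negative mean portfolio return, where each model output is a signed position (positive for buying, negative for shorting). This is an assumption made for illustration only; the abstract does not specify the paper's four loss functions, and the name `profit_loss` and the sample numbers are hypothetical.

```python
def profit_loss(positions, returns):
    """Negative mean per-step portfolio return.

    positions[t][s]: signed position in stock s at step t, assumed in [-1, 1]
                     (positive = long, negative = short).
    returns[t][s]:   fractional price change of stock s over step t.
    Minimizing this value maximizes the average realized profit.
    """
    total = 0.0
    for pos_t, ret_t in zip(positions, returns):
        total += sum(p * r for p, r in zip(pos_t, ret_t))
    return -total / len(positions)


# Hypothetical two-step, two-stock example (illustrative values only).
positions = [[1.0, -1.0], [0.5, 0.0]]
returns = [[0.02, -0.01], [-0.04, 0.03]]
loss = profit_loss(positions, returns)
```

In a real training loop the same expression would be written with a differentiable tensor library so that gradients flow back into the network producing the positions.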

Article 903

Title@2025-07-25 (5): MOCK: an Algorithm for Learning Nonparametric Differential Equations via Multivariate Occupation Kernel Functions

Title: MOCK: an Algorithm for Learning Nonparametric Differential Equations via Multivariate Occupation Kernel Functions

MOCK: ein Algorithmus für das Lernen nichtparametrischer Differentialgleichungen über multivariate Aufgaben-Kernel-Funktionen

MOCK: 通过多变量职业核心函数学习非参数等量的分等函数的算法 2306.10189v4

Authors (7): Victor Rielly, Kamel Lahouel, Ethan Lew, Nicholas Fisher, Vicky Haney, Michael Wells, Bruno Jedynak

Learning a nonparametric system of ordinary differential equations from trajectories in a $d$-dimensional state space requires learning $d$ functions of $d$ variables. Explicit formulations often scale quadratically in $d$ unless additional knowledge about system properties, such as sparsity and symmetries, is available. In this work, we propose a linear approach, the multivariate occupation kernel method (MOCK), using the implicit formulation provided by vector-valued reproducing kernel Hilbert spaces. The solution for the vector field relies on multivariate occupation kernel functions associated with the trajectories and scales linearly with the dimension of the state space. We validate through experiments on a variety of simulated and real datasets ranging from 2 to 1024 dimensions. MOCK outperforms all other comparators on 3 of the 9 datasets on full trajectory prediction and 4 out of the 9 datasets on next-point prediction.

Article 904

Title@2025-07-25 (5): Efficient and Scalable Agentic AI with Heterogeneous Systems

Title: Efficient and Scalable Agentic AI with Heterogeneous Systems

Effiziente und skalierbare Agentische KI mit Heterogenen Systemen

具有异质系统的高效和可缩放剂AIA 2507.19635v1

Authors (3): Zain Asgar, Michelle Nguyen, Sachin Katti

AI agents are emerging as a dominant workload in a wide range of applications, promising to be the vehicle that delivers the promised benefits of AI to enterprises and consumers. Unlike conventional software or static inference, agentic workloads are dynamic and structurally complex. Often these agents are directed graphs of compute and IO operations that span multi-modal data input and conversion, data processing and context gathering (e.g., vector DB lookups), multiple LLM inferences, tool calls, etc. To scale AI agent usage, we need efficient and scalable deployment and agent-serving infrastructure. To tackle this challenge, in this paper, we present a system design for dynamic orchestration of AI agent workloads on heterogeneous compute infrastructure spanning CPUs and accelerators, both from different vendors and across different performance tiers within a single vendor. The system delivers several building blocks: a framework for planning and optimizing agentic AI execution graphs using cost models that account for compute, memory, and bandwidth constraints of different HW; an MLIR-based representation and compilation system that can decompose AI agent execution graphs into granular operators and generate code for different HW options; and a dynamic orchestration system that can place the granular components across a heterogeneous compute infrastructure and stitch them together while meeting an end-to-end SLA. Our design performs a systems-level TCO optimization and preliminary results show that leveraging a heterogeneous infrastructure can deliver significant TCO benefits. A preliminary surprising finding is that for some workloads a heterogeneous combination of older generation GPUs with newer accelerators can deliver similar TCO as the latest generation homogeneous GPU infrastructure design, potentially extending the life of deployed infrastructure.

Article 905

Title@2025-07-25 (5): LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Title: LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

LoX: Low-Rank-Extrapolation stärkt LLM-Sicherheit gegen Feinabstimmung

LoX:低Rank外推法强力推力LLM 安全防止微调 2506.15606v3

Authors (6): Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong

Large Language Models (LLMs) have become indispensable in real-world applications. However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions. Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning, even when the additional training data appears benign. In this paper, we empirically demonstrate that this vulnerability stems from the sensitivity of safety-critical low-rank subspaces in LLM parameters to fine-tuning. Building on this insight, we propose a novel training-free method, termed Low-Rank Extrapolation (LoX), to enhance safety robustness by extrapolating the safety subspace of an aligned LLM. Our experimental results confirm the effectiveness of LoX, demonstrating significant improvements in robustness against both benign and malicious fine-tuning attacks while preserving the model’s adaptability to new tasks. For instance, LoX leads to 11% to 54% absolute reductions in attack success rates (ASR) facing benign or malicious fine-tuning attacks. By investigating the ASR landscape of parameters, we attribute the success of LoX to the fact that the extrapolation moves LLM parameters to a flatter zone, making them less sensitive to perturbations. The code is available at github.com/VITA-Group/LoX.

Article 906

Title@2025-07-25 (5): Quantum Reinforcement Learning by Adaptive Non-local Observables

Title: Quantum Reinforcement Learning by Adaptive Non-local Observables

Quanten-Verstärkung-Lernen durch adaptive nicht-lokale Observables

适应性非当地可观测的非当地可观测物体的量级强化学习 2507.19629v1

Authors (4): Hsin-Yi Lin, Samuel Yen-Chi Chen, Huan-Hsin Tseng, Shinjae Yoo

Hybrid quantum-classical frameworks leverage quantum computing for machine learning; however, variational quantum circuits (VQCs) are limited by the need for local measurements. We introduce an adaptive non-local observable (ANO) paradigm within VQCs for quantum reinforcement learning (QRL), jointly optimizing circuit parameters and multi-qubit measurements. The ANO-VQC architecture serves as the function approximator in Deep Q-Network (DQN) and Asynchronous Advantage Actor-Critic (A3C) algorithms. On multiple benchmark tasks, ANO-VQC agents outperform baseline VQCs. Ablation studies reveal that adaptive measurements enhance the function space without increasing circuit depth. Our results demonstrate that adaptive multi-qubit observables can enable practical quantum advantages in reinforcement learning.

Article 907

Title@2025-07-25 (5): Federated Calculation of the Free-Support Transportation Barycenter by Single-Loop Dual Decomposition

Title: Federated Calculation of the Free-Support Transportation Barycenter by Single-Loop Dual Decomposition

Föderierte Berechnung des Free-Support-Transport-Barycenters durch Single-Loop Dual Decomposition

按单一卢普两极分解法对自由支持运输百分中心进行的联邦计算 2507.19627v1

Authors (2): Zhengqi Lin, Andrzej Ruszczyński

We propose an efficient federated dual decomposition algorithm for calculating the Wasserstein barycenter of several distributions, including choosing the support of the solution. The algorithm does not access local data and uses only highly aggregated information. It also does not require repeated solutions to mass transportation problems. Because of the absence of any matrix-vector operations, the algorithm exhibits a very low complexity of each iteration and significant scalability. We illustrate its virtues and compare it to the state-of-the-art methods on several examples of mixture models.

Article 908

Title@2025-07-25 (5): Studying number theory with deep learning: a case study with the Möbius and squarefree indicator functions

Title: Studying number theory with deep learning: a case study with the Möbius and squarefree indicator functions

Zahlentheorie mit Deep Learning studieren: eine Fallstudie mit Möbius und quadratfreien Indikatorfunktionen

深深学习研究数字理论:与莫比乌斯和无平方指标函数有关的案例研究 2502.10335v2

Authors (1): David Lowry-Duda

Building on work of Charton, we train small transformer models to calculate the Möbius function $\mu(n)$ and the squarefree indicator function $\mu^2(n)$. The models attain nontrivial predictive power. We apply a mixture of additional models and feature scoring to give a theoretical explanation.
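
For readers unfamiliar with the two target functions, the sketch below computes $\mu(n)$ with a simple sieve and derives the squarefree indicator as $\mu^2(n)$. It only illustrates the definitions and is not the paper's training or data-generation code; the function name `mobius_sieve` is chosen here for illustration.

```python
def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Möbius function of n for 1 <= n <= limit.

    mu[n] is 0 if n is divisible by a square greater than 1, and otherwise
    (-1)**k, where k is the number of distinct prime factors of n.
    Index 0 is a placeholder set to 0.
    """
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        # Flip the sign once for every multiple of the prime p.
        for k in range(p, limit + 1, p):
            if k > p:
                is_prime[k] = False
            mu[k] = -mu[k]
        # Multiples of p*p are not squarefree.
        for k in range(p * p, limit + 1, p * p):
            mu[k] = 0
    return mu


mu = mobius_sieve(100)
squarefree = [m * m for m in mu]  # squarefree[n] == 1 exactly when n is squarefree
```

With this sketch, `mu[1:11]` is `[1, -1, -1, 0, -1, 1, -1, 0, 0, 1]`, and 61 of the integers from 1 to 100 are squarefree.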

Article 909

Title@2025-07-25 (5): State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees

Title: State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees

Zustandsentwicklung über die Methoden erster Ordnung I: Starre Vorhersagen und endliche Stichprobengarantien

一: 严格预测和有限抽样保证 2507.19611v1

Authors (4): Michael Celentano, Chen Cheng, Ashwin Pananjady, Kabir Aladin Verchand

We develop a toolbox for exact analysis of iterative algorithms on a class of high-dimensional nonconvex optimization problems with random data. While prior work has shown that low-dimensional statistics of (generalized) first-order methods can be predicted by a deterministic recursion known as state evolution, our focus is on developing such a prediction for a more general class of algorithms. We provide a state evolution for any method whose iterations are given by (possibly interleaved) first-order and saddle point updates, showing two main results. First, we establish a rigorous state evolution prediction that holds even when the updates are not coordinate-wise separable. Second, we establish finite-sample guarantees bounding the deviation of the empirical updates from the established state evolution. In the process, we develop a technical toolkit that may prove useful in related problems. One component of this toolkit is a general Hilbert space lifting technique to prove existence and uniqueness of a convenient parameterization of the state evolution. Another component of the toolkit combines a generic application of Bolthausen’s conditioning method with a sequential variant of Gordon’s Gaussian comparison inequality, and provides additional ingredients that enable a general finite-sample analysis.

Article 910

Title@2025-07-25 (5): Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark

Title: Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark

Tiefe unüberwachte Domain-Anpassung für die Zeitreihenklassifikation: ein Benchmark

时间序列分类:基准 2312.09857v3

Authors (5): Hassan Ismail Fawaz, Ganesh Del Grosso, Tanguy Kerdoncuff, Aurelie Boisbunon, Illyyne Saffar

Unsupervised Domain Adaptation (UDA) aims to harness labeled source data to train models for unlabeled target data. Despite extensive research in domains like computer vision and natural language processing, UDA remains underexplored for time series data, which has widespread real-world applications ranging from medicine and manufacturing to earth observation and human activity recognition. Our paper addresses this gap by introducing a comprehensive benchmark for evaluating UDA techniques for time series classification, with a focus on deep learning methods. We provide seven new benchmark datasets covering various domain shifts and temporal dynamics, facilitating fair and standardized UDA method assessments with state-of-the-art neural network backbones (e.g., Inception) for time series data. This benchmark offers insights into the strengths and limitations of the evaluated approaches while preserving the unsupervised nature of domain adaptation, making it directly applicable to practical problems. Our paper serves as a vital resource for researchers and practitioners, advancing domain adaptation solutions for time series data and fostering innovation in this critical field. The implementation code of this benchmark is available at https://github.com/EricssonResearch/UDA-4-TSC.

Article 911

Title@2025-07-25 (5): MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?

Title: MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?

MOCHA: Sind Code-Sprachenmodelle gegen multi-Turn bösartige Coding-Prompts robust?

MOCHA:守则语言模型是否强力打击多发恶意编码的提示? 2507.19598v1

Authors (8): Muntasir Wahed, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Nirav Diwan, Gang Wang, Dilek Hakkani-Tür, Ismini Lourentzou

Recent advancements in Large Language Models (LLMs) have significantly enhanced their code generation capabilities. However, their robustness against adversarial misuse, particularly through multi-turn malicious coding prompts, remains underexplored. In this work, we introduce code decomposition attacks, where a malicious coding task is broken down into a series of seemingly benign subtasks across multiple conversational turns to evade safety filters. To facilitate systematic evaluation, we introduce MOCHA, a large-scale benchmark designed to evaluate the robustness of code LLMs against both single-turn and multi-turn malicious prompts. Empirical results across open- and closed-source models reveal persistent vulnerabilities, especially under multi-turn scenarios. Fine-tuning on MOCHA improves rejection rates while preserving coding ability, and importantly, enhances robustness on external adversarial datasets with up to 32.4% increase in rejection rates without any additional supervision.

Article 912

Title@2025-07-25 (5): Affordance-Guided Reinforcement Learning via Visual Prompting

Title: Affordance-Guided Reinforcement Learning via Visual Prompting

Erschwinglich geführtes Verstärkungslernen durch visuelle Prompting

通过视觉促视学习,提供负担得起的辅助强化教育 2407.10341v6

Authors (5): Olivia Y. Lee, Annie Xie, Kuan Fang, Karl Pertsch, Chelsea Finn

Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, there has also been growing adoption of large multi-modal foundation models for robotics that can perform visual reasoning in physical contexts and generate coarse robot motions for manipulation tasks. Motivated by this range of capability, in this work, we present Keypoint-based Affordance Guidance for Improvements (KAGI), a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. State-of-the-art VLMs have demonstrated impressive zero-shot reasoning about affordances through keypoints, and we use these to define dense rewards that guide autonomous robotic learning. On diverse real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps. Additionally, we demonstrate the robustness of KAGI to reductions in the number of in-domain demonstrations used for pre-training, reaching similar performance in 45K online fine-tuning steps. Project website: https://sites.google.com/view/affordance-guided-rl

Article 913

Title@2025-07-25 (5): Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning

Title: Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning

Geospatielles Wissen abmildern Halluzination in großen Sprachmodellen: Benchmarking und Dynamische Faktizität Ausrichtung

减轻大语言模式中的地理空间知识幻觉:基准和动态事实对齐 2507.19586v1

Authors (5): Shengyuan Wang, Jie Feng, Tianhui Liu, Dan Pei, Yong Li

Large language models (LLMs) possess extensive world knowledge, including geospatial knowledge, which has been successfully applied to various geospatial tasks such as mobility prediction and social indicator prediction. However, LLMs often generate inaccurate geospatial knowledge, leading to geospatial hallucinations (incorrect or inconsistent representations of geospatial information) that compromise their reliability. While the phenomenon of general knowledge hallucination in LLMs has been widely studied, the systematic evaluation and mitigation of geospatial hallucinations remain largely unexplored. To address this gap, we propose a comprehensive evaluation framework for geospatial hallucinations, leveraging structured geospatial knowledge graphs for controlled assessment. Through extensive evaluation across 20 advanced LLMs, we uncover the hallucinations in their geospatial knowledge. Building on these insights, we introduce a dynamic factuality aligning method based on Kahneman-Tversky Optimization (KTO) to mitigate geospatial hallucinations in LLMs, leading to a performance improvement of over 29.6% on the proposed benchmark. Extensive experimental results demonstrate the effectiveness of our benchmark and learning algorithm in enhancing the trustworthiness of LLMs in geospatial knowledge and reasoning tasks.

Article 914

Title@2025-07-25 (5): Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts

Title: Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts

Weiterentwicklung der Event-Prognose durch massives Training von großen Sprachmodellen: Herausforderungen, Lösungen und breitere Auswirkungen

通过大规模培训大语言模式:挑战、解决办法和更广泛影响 2507.19477v1

Authors (4): Sang-Woo Lee, Sohee Yang, Donghyun Kwak, Noah Y. Siegel

Many recent papers have studied the development of superforecaster-level event forecasting LLMs. While methodological problems with early studies cast doubt on the use of LLMs for event forecasting, recent studies with improved evaluation methods have shown that state-of-the-art LLMs are gradually reaching superforecaster-level performance, and reinforcement learning has also been reported to improve future forecasting. Additionally, the unprecedented success of recent reasoning models and Deep Research-style models suggests that technology capable of greatly improving forecasting performance has been developed. Therefore, based on these positive recent trends, we argue that the time is ripe for research on large-scale training of superforecaster-level event forecasting LLMs. We discuss two key research directions: training methods and data acquisition. For training, we first introduce three difficulties of LLM-based event forecasting training: noisiness-sparsity, knowledge cut-off, and simple reward structure problems. Then, we present related ideas to mitigate these problems: hypothetical event Bayesian networks, utilizing poorly-recalled and counterfactual events, and auxiliary reward signals. For data, we propose aggressive use of market, public, and crawling datasets to enable large-scale training and evaluation. Finally, we explain how these technical advances could enable AI to provide predictive intelligence to society in broader areas. This position paper presents promising specific paths and considerations for getting closer to superforecaster-level AI technology, aiming to call for researchers’ interest in these directions.

Article 915

Title@2025-07-25 (5): Let It Go? Not Quite: Addressing Item Cold Start in Sequential Recommendations with Content-Based Initialization

Title: Let It Go? Not Quite: Addressing Item Cold Start in Sequential Recommendations with Content-Based Initialization

Lassen Sie es los? Nicht ganz: Adressieren von Item Cold Start in sequentiellen Empfehlungen mit Content-basierte Initialisierung

让它走吗?不是相当的:在基于内容的初始化的序列建议中处理项目“冷启动” 2507.19473v1

Authors (4): Anton Pembek, Artem Fatkulin, Anton Klenitskiy, Alexey Vasilev

Many sequential recommender systems suffer from the cold start problem, where items with few or no interactions cannot be effectively used by the model due to the absence of a trained embedding. Content-based approaches, which leverage item metadata, are commonly used in such scenarios. One possible way is to use embeddings derived from content features such as textual descriptions as initialization for the model embeddings. However, directly using frozen content embeddings often results in suboptimal performance, as they may not fully adapt to the recommendation task. On the other hand, fine-tuning these embeddings can degrade performance for cold-start items, as item representations may drift far from their original structure after training. We propose a novel approach to address this limitation. Instead of entirely freezing the content embeddings or fine-tuning them extensively, we introduce a small trainable delta to frozen embeddings that enables the model to adapt item representations without letting them go too far from their original semantic structure. This approach demonstrates consistent improvements across multiple datasets and modalities, including e-commerce datasets with textual descriptions and a music dataset with audio-based representation.
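
The idea of adding a small trainable delta to a frozen content embedding can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the delta is kept small, so the norm clipping used here, the function name `bounded_delta_embedding`, and the `max_norm` parameter are all assumptions made for the example.

```python
import math

def bounded_delta_embedding(content_emb, delta, max_norm=0.1):
    """Return content_emb + delta, rescaling delta so its L2 norm is at most max_norm.

    content_emb stays frozen; only delta would be updated during training.
    Norm clipping is an assumed mechanism, not taken from the paper.
    """
    norm = math.sqrt(sum(d * d for d in delta))
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return [c + scale * d for c, d in zip(content_emb, delta)]

frozen = [0.6, 0.8, 0.0]    # frozen content embedding (hypothetical values)
delta = [0.3, -0.4, 0.0]    # trainable offset with L2 norm 0.5
adapted = bounded_delta_embedding(frozen, delta, max_norm=0.1)

# Distance between the adapted and the original embedding
drift = math.sqrt(sum((a - f) ** 2 for a, f in zip(adapted, frozen)))
print(adapted, drift)
```

Because the offset is capped, the adapted item representation can move toward what the recommendation task needs while staying within a fixed radius of its content-derived position, which is the trade-off the abstract describes.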

Article 916

Title@2025-07-25 (5): Is Exchangeability better than I.I.D to handle Data Distribution Shifts while Pooling Data for Data-scarce Medical image segmentation?

Title: Is Exchangeability better than I.I.D to handle Data Distribution Shifts while Pooling Data for Data-scarce Medical image segmentation?

Ist Austauschbarkeit besser als I.I.D, um Datenverteilungsverschiebungen zu handhaben, während Daten für Data-scarce Medizinische Bildsegmentierung gepoolt werden?

是否比I. I. D. 更适于处理数据分配的转移,而处理数据分散的合并数据的医疗图像分割? 2507.19575v1

Authors (5): Ayush Roy, Samin Enam, Jun Xia, Vishnu Suresh Lokhande, Won Hwa Kim

Data scarcity is a major challenge in medical imaging, particularly for deep learning models. While data pooling (combining datasets from multiple sources) and data addition (adding more data from a new dataset) have been shown to enhance model performance, they are not without complications. Specifically, increasing the size of the training dataset through pooling or addition can induce distributional shifts, negatively affecting downstream model performance, a phenomenon known as the “Data Addition Dilemma”. While the traditional i.i.d. assumption may not hold in multi-source contexts, assuming exchangeability across datasets provides a more practical framework for data pooling. In this work, we investigate medical image segmentation under these conditions, drawing insights from causal frameworks to propose a method for controlling foreground-background feature discrepancies across all layers of deep networks. This approach improves feature representations, which are crucial in data-addition scenarios. Our method achieves state-of-the-art segmentation performance on histopathology and ultrasound images across five datasets, including a novel ultrasound dataset that we have curated and contributed. Qualitative results demonstrate more refined and accurate segmentation maps compared to prominent baselines across three model architectures. The code will be available on GitHub.

Article 917

Title@2025-07-25 (5): ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation

Title: ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation

ReSem3D: Verfeinerbare 3D-Raumeinschränkungen durch feinkörnige semantische Erdung für eine generalisierbare Robotermanipulation

ReSem3D:通过精密的可通用机器人操纵的语义定位,改进3D空间限制 2507.18262v2

Authors (5): Chenyu Su, Weiwei Shang, Chen Qian, Fei Zhang, Shuang Cong

Semantics-driven 3D spatial constraints align high-level semantic representations with low-level action spaces, facilitating the unification of task understanding and execution in robotic manipulation. The synergistic reasoning of Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs) enables cross-modal 3D spatial constraint construction. Nevertheless, existing methods have three key limitations: (1) coarse semantic granularity in constraint modeling, (2) lack of real-time closed-loop planning, (3) compromised robustness in semantically diverse environments. To address these challenges, we propose ReSem3D, a unified manipulation framework for semantically diverse environments, leveraging the synergy between VFMs and MLLMs to achieve fine-grained visual grounding and dynamically construct hierarchical 3D spatial constraints for real-time manipulation. Specifically, the framework is driven by hierarchical recursive reasoning in MLLMs, which interact with VFMs to automatically construct 3D spatial constraints from natural language instructions and RGB-D observations in two stages: part-level extraction and region-level refinement. Subsequently, these constraints are encoded as real-time optimization objectives in joint space, enabling reactive behavior to dynamic disturbances. Extensive simulation and real-world experiments are conducted in semantically rich household and sparse chemical lab environments. The results demonstrate that ReSem3D performs diverse manipulation tasks under zero-shot conditions, exhibiting strong adaptability and generalization. Code and videos are available at https://github.com/scy-v/ReSem3D and https://resem3d.github.io.

Article 918

Title@2025-07-25 (5): Linearly Convergent Algorithms for Nonsmooth Problems with Unknown Smooth Pieces

Title: Linearly Convergent Algorithms for Nonsmooth Problems with Unknown Smooth Pieces

Linear konvergente Algorithmen für rauchfreie Probleme mit unbekannten glatten Stücken

与未知平滑小块的非移动问题线性一致的线性算法 2507.19465v1

Authors (2): Zhe Zhang, Suvrit Sra

We develop efficient algorithms for optimizing piecewise smooth (PWS) functions where the underlying partition of the domain into smooth pieces is unknown. For PWS functions satisfying a quadratic growth (QG) condition, we propose a bundle-level (BL) type method that achieves global linear convergence – to our knowledge, the first such result for any algorithm for this problem class. We extend this method to handle approximately PWS functions and to solve weakly-convex PWS problems, improving the state-of-the-art complexity to match the benchmark for smooth non-convex optimization. Furthermore, we introduce the first verifiable and accurate termination criterion for PWS optimization. Similar to the gradient norm in smooth optimization, this certificate tightly characterizes the optimality gap under the QG condition, and can moreover be evaluated without knowledge of any problem parameters. We develop a search subroutine for this certificate and embed it within a guess-and-check framework, resulting in an almost parameter-free algorithm for both the convex QG and weakly-convex settings.
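
For readers unfamiliar with the terms, the quadratic growth condition and global linear convergence are usually stated as below. These are the standard textbook forms, given here as an assumption; the symbols $\mu$, $X^\star$, $C$ and $\rho$ are not taken from the abstract, and the paper's exact constants and conventions may differ.

```latex
% Quadratic growth (QG) with modulus \mu > 0, where X^\star is the set of
% minimizers and f^\star the optimal value:
\[
  f(x) - f^\star \;\ge\; \frac{\mu}{2}\,\operatorname{dist}(x, X^\star)^2
  \qquad \text{for all } x .
\]
% Global linear convergence of iterates x_k, for some C > 0 and \rho \in (0,1):
\[
  f(x_k) - f^\star \;\le\; C\,\rho^{k}
  \qquad \text{for all } k \ge 0 .
\]
```

Under QG, a bound on the optimality gap also bounds the distance to the solution set, which is why a certificate that tracks the gap can serve as a termination criterion.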

Article 919

Title@2025-07-25 (5): RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

Title: RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

RADLADS: Schnelle Aufmerksamkeitsdestillation zu linearen Aufmerksamkeitsdecodern auf Scale

RADLADS: 缩放线性引引代码的快速注意蒸馏 2505.03005v3

Authors (4): Daniel Goldstein, Eric Alcaide, Janna Lu, Eugene Cheah

We present Rapid Attention Distillation to Linear Attention Decoders at Scale (RADLADS), a protocol for rapidly converting softmax attention transformers into linear attention decoder models, along with two new RWKV-variant architectures, and models converted from popular Qwen2.5 open source models in 7B, 32B, and 72B sizes. Our conversion process requires only 350-700M tokens, less than 0.005% of the token count used to train the original teacher models. Converting to our 72B linear attention model costs less than $2,000 USD at today’s prices, yet quality at inference remains close to the original transformer. These models achieve state-of-the-art downstream performance across a set of standard benchmarks for linear attention models of their size. We release all our models on HuggingFace under the Apache 2.0 license, with the exception of our 72B models which are also governed by the Qwen License Agreement. Models: https://huggingface.co/collections/recursal/radlads-6818ee69e99e729ba8a87102 Training code: https://github.com/recursal/RADLADS-paper
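
The token-budget claim can be sanity-checked with a little arithmetic. The sketch below uses only the two figures from the abstract (700M tokens, 0.005%) to derive the teacher corpus size they imply. The 18-trillion-token teacher corpus used in the second step is an assumption that is not stated in the abstract, and all variable names are illustrative.

```python
# Figures taken from the abstract
distill_tokens = 700e6                 # upper end of the 350-700M range
claimed_max_fraction = 0.005 / 100     # "less than 0.005%"

# The claim holds only if the teacher was trained on more than this many tokens
implied_min_teacher_tokens = distill_tokens / claimed_max_fraction

# ASSUMPTION (not in the abstract): a teacher pre-training corpus of 18T tokens
assumed_teacher_tokens = 18e12
fraction_percent = 100 * distill_tokens / assumed_teacher_tokens

print(f"implied minimum teacher corpus: {implied_min_teacher_tokens:.2e} tokens")
print(f"fraction under the 18T assumption: {fraction_percent:.4f}%")
```

The first result says the teacher corpus must exceed roughly 14 trillion tokens for the stated percentage to hold at the upper end of the range. Under the assumed 18T figure the fraction comes out a little under 0.004%.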

Article 920

Title@2025-07-25 (5): Fast Learning of Non-Cooperative Spacecraft 3D Models through Primitive Initialization

Title: Fast Learning of Non-Cooperative Spacecraft 3D Models through Primitive Initialization

Schnelles Lernen nicht-kooperativer 3D-Modelle von Spacecraft durch Primitive Initialisierung

通过初始初始化快速学习非合作航天器3D模型 2507.19459v1

Authors (3): Pol Francesch Huc, Emily Bates, Simone D’Amico

The advent of novel view synthesis techniques such as NeRF and 3D Gaussian Splatting (3DGS) has enabled learning precise 3D models only from posed monocular images. Although these methods are attractive, they have two major limitations that prevent their use in space applications: they require poses during training, and have high computational cost at training and inference. To address these limitations, this work contributes: (1) a Convolutional Neural Network (CNN) based primitive initializer for 3DGS using monocular images; (2) a pipeline capable of training with noisy or implicit pose estimates; and (3) an analysis of initialization variants that reduce the training cost of precise 3D models. A CNN takes a single image as input and outputs a coarse 3D model represented as an assembly of primitives, along with the target’s pose relative to the camera. This assembly of primitives is then used to initialize 3DGS, significantly reducing the number of training iterations and input images needed – by at least an order of magnitude. For additional flexibility, the CNN component has multiple variants with different pose estimation techniques. This work performs a comparison between these variants, evaluating their effectiveness for downstream 3DGS training under noisy or implicit pose estimates. The results demonstrate that even with imperfect pose supervision, the pipeline is able to learn high-fidelity 3D representations, opening the door for the use of novel view synthesis in space applications.

Article 921

Title@2025-07-25 (5): Hierarchical Deep Reinforcement Learning Framework for Multi-Year Asset Management Under Budget Constraints

Title: Hierarchical Deep Reinforcement Learning Framework for Multi-Year Asset Management Under Budget Constraints

Hierarchischer Lernrahmen für vertiefte Stärkung von mehrjähriger Vermögensverwaltung im Rahmen von Haushaltszwängen

在预算制约下多年资产管理多年资产管理的等级式深层强化学习框架 2507.19458v1

Authors (2): Amir Fard, Arnold X.-X. Yuan

Budget planning and maintenance optimization are crucial for infrastructure asset management, ensuring cost-effectiveness and sustainability. However, the complexity arising from combinatorial action spaces, diverse asset deterioration, stringent budget constraints, and environmental uncertainty significantly limits existing methods’ scalability. This paper proposes a Hierarchical Deep Reinforcement Learning methodology specifically tailored to multi-year infrastructure planning. Our approach decomposes the problem into two hierarchical levels: a high-level Budget Planner allocating annual budgets within explicit feasibility bounds, and a low-level Maintenance Planner prioritizing assets within the allocated budget. By structurally separating macro-budget decisions from asset-level prioritization and integrating linear programming projection within a hierarchical Soft Actor-Critic framework, the method efficiently addresses exponential growth in the action space and ensures rigorous budget compliance. A case study evaluating sewer networks of varying sizes (10, 15, and 20 sewersheds) illustrates the effectiveness of the proposed approach. Compared to conventional Deep Q-Learning and enhanced genetic algorithms, our methodology converges more rapidly, scales effectively, and consistently delivers near-optimal solutions even as network size grows.
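
The two-level decomposition can be illustrated with a deliberately simplified sketch. It is not the method from the paper: a plain clip to the feasibility bounds stands in for the linear programming projection, a greedy priority rule stands in for the learned Maintenance Planner, and every name, cost and priority below is hypothetical.

```python
def clip_budget(proposed, lower, upper):
    """High level: force a proposed annual budget into its feasibility bounds.

    A simple clip stands in for the LP projection mentioned in the abstract.
    """
    return max(lower, min(upper, proposed))

def select_assets(assets, budget):
    """Low level: pick assets in descending priority while they still fit.

    assets is a list of (name, cost, priority) tuples. A greedy rule stands in
    for the learned low-level policy.
    """
    chosen, remaining = [], budget
    for name, cost, _priority in sorted(assets, key=lambda a: a[2], reverse=True):
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen, remaining

# Hypothetical single-year example
annual_budget = clip_budget(proposed=130.0, lower=50.0, upper=100.0)
assets = [
    ("sewershed_1", 60.0, 0.9),
    ("sewershed_2", 50.0, 0.7),
    ("sewershed_3", 30.0, 0.5),
]
chosen, leftover = select_assets(assets, annual_budget)
print(annual_budget, chosen, leftover)
```

Separating the levels means the low-level step only ever sees a budget that is already feasible, so spending cannot exceed the allocation by construction.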

Article 922

Title@2025-07-25 (5): GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Title: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

GEPA: Reflektierende Prompt-Evolution kann Verstärkungs-Lernen übertreffen

GEPA: 反思即时进化能够超过成绩的强化学习 2507.19457v1

Authors (17): Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab

Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods like Group Relative Policy Optimization (GRPO), which often require thousands of rollouts to learn new tasks. We argue that the interpretable nature of language can often provide a much richer learning medium for LLMs, compared with policy gradients derived from sparse, scalar rewards. To test this, we introduce GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error. Given any AI system containing one or more LLM prompts, GEPA samples system-level trajectories (e.g., reasoning, tool calls, and tool outputs) and reflects on them in natural language to diagnose problems, propose and test prompt updates, and combine complementary lessons from the Pareto frontier of its own attempts. As a result of GEPA’s design, it can often turn even just a few rollouts into a large quality gain. Across four tasks, GEPA outperforms GRPO by 10% on average and by up to 20%, while using up to 35x fewer rollouts. GEPA also outperforms the leading prompt optimizer, MIPROv2, by over 10% across two LLMs, and demonstrates promising results as an inference-time search strategy for code optimization.
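
The Pareto frontier mentioned above can be illustrated with a generic non-dominated filter over candidate prompts. This is a sketch of the general concept only: GEPA's actual selection and sampling rules are not described in the abstract, and the candidate names and scores are made up for the example.

```python
def pareto_front(scores):
    """Return the names of candidates that no other candidate dominates.

    scores maps a candidate name to its list of per-task scores (higher is
    better). A dominates B if A is at least as good on every task and strictly
    better on at least one.
    """
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return sorted(
        name for name, s in scores.items()
        if not any(dominates(other, s)
                   for other_name, other in scores.items()
                   if other_name != name)
    )

# Hypothetical per-task scores for four candidate prompts
candidates = {
    "prompt_a": [0.9, 0.2, 0.5],
    "prompt_b": [0.4, 0.8, 0.5],
    "prompt_c": [0.3, 0.1, 0.4],
    "prompt_d": [0.9, 0.2, 0.6],
}
front = pareto_front(candidates)
print(front)
```

Keeping every non-dominated candidate, rather than only the one with the best average, preserves prompts that are strong on different tasks, so their complementary lessons remain available for later combination.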

Article 923

Title@2025-07-25 (5): Forest-Guided Clustering – Shedding Light into the Random Forest Black Box

Title: Forest-Guided Clustering – Shedding Light into the Random Forest Black Box

Wald-geführte Clustering – Licht in die zufällige Wald Black Box

森林引导集束 – – 将亮光放入随机森林黑盒 2507.19455v1

Authors (6): Lisa Barros de Andrade e Sousa, Gregor Miller, Ronan Le Gleut, Dominik Thalmeier, Helena Pelin, Marie Piraud

As machine learning models are increasingly deployed in sensitive application areas, the demand for interpretable and trustworthy decision-making has increased. Random Forests (RF), despite their widespread use and strong performance on tabular data, remain difficult to interpret due to their ensemble nature. We present Forest-Guided Clustering (FGC), a model-specific explainability method that reveals both local and global structure in RFs by grouping instances according to shared decision paths. FGC produces human-interpretable clusters aligned with the model’s internal logic and computes cluster-specific and global feature importance scores to derive decision rules underlying RF predictions. FGC accurately recovered latent subclass structure on a benchmark dataset and outperformed classical clustering and post-hoc explanation methods. Applied to an AML transcriptomic dataset, FGC uncovered biologically coherent subpopulations, disentangled disease-relevant signals from confounders, and recovered known and novel gene expression patterns. FGC bridges the gap between performance and interpretability by providing structure-aware insights that go beyond feature-level attribution.
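
One common way to quantify how far two instances share decision paths is the Random Forest proximity: the fraction of trees in which both instances end in the same leaf. The sketch below computes it from precomputed leaf indices. Whether FGC uses exactly this measure is an assumption, since the abstract only speaks of "shared decision paths", and the leaf indices are hypothetical.

```python
def forest_proximity(leaf_ids):
    """Compute the pairwise proximity matrix from leaf assignments.

    leaf_ids[i][t] is the leaf reached by sample i in tree t. The proximity of
    samples i and j is the fraction of trees where both land in the same leaf.
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    return [
        [sum(a == b for a, b in zip(leaf_ids[i], leaf_ids[j])) / n_trees
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical leaf indices for 3 samples in a 4-tree forest
leaves = [
    [3, 7, 1, 4],
    [3, 7, 1, 5],
    [2, 6, 0, 9],
]
prox = forest_proximity(leaves)
print(prox)
```

With scikit-learn, leaf indices of this shape can be obtained from a fitted forest via its `apply` method. A distance such as one minus the proximity can then be passed to any clustering algorithm that accepts precomputed distances.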

Article 924

Title@2025-07-25 (5): GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences

Title: GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences

GVCCS: Ein Datensatz zur kontrailen Identifizierung und Verfolgung sichtbarer Ganzhimmel-Kamerasequenzen

GVCSCS:一个用于识别和跟踪可见全天相摄像机序列的可视全天相摄像头的对照识别和跟踪数据集 2507.18330v2

Authors (5): Gabriel Jarry, Ramon Dalmau, Philippe Very, Franck Ballerini, Stefania-Denisa Bocu

Aviation’s climate impact includes not only CO2 emissions but also significant non-CO2 effects, especially from contrails. These ice clouds can alter Earth’s radiative balance, potentially rivaling the warming effect of aviation CO2. Physics-based models provide useful estimates of contrail formation and climate impact, but their accuracy depends heavily on the quality of atmospheric input data and on assumptions used to represent complex processes like ice particle formation and humidity-driven persistence. Observational data from remote sensors, such as satellites and ground cameras, could be used to validate and calibrate these models. However, existing datasets do not explore all aspects of contrail dynamics and formation: they typically lack temporal tracking, and do not attribute contrails to their source flights. To address these limitations, we present the Ground Visible Camera Contrail Sequences (GVCCS), a new open data set of contrails recorded with a ground-based all-sky camera in the visible range. Each contrail is individually labeled and tracked over time, allowing a detailed analysis of its lifecycle. The dataset contains 122 video sequences (24,228 frames) and includes flight identifiers for contrails that form above the camera. As a reference, we also propose a unified deep learning framework for contrail analysis using a panoptic segmentation model that performs semantic segmentation (contrail pixel identification), instance segmentation (individual contrail separation), and temporal tracking in a single architecture. By providing high-quality, temporally resolved annotations and a benchmark for model evaluation, our work supports improved contrail monitoring and will facilitate better calibration of physical models. This sets the groundwork for more accurate climate impact understanding and assessments.

Article 925

Title@2025-07-25 (5): Bounded KRnet and its applications to density estimation and approximation

Title: Bounded KRnet and its applications to density estimation and approximation

Gebundenes KRnet und seine Anwendungen zur Dichteschätzung und -annäherung

KRnet及其在密度估计和近似方面的应用 2305.09063v4

Authors (3): Li Zeng, Xiaoliang Wan, Tao Zhou

In this paper, we develop an invertible mapping, called B-KRnet, on a bounded domain and apply it to density estimation/approximation for data or the solutions of PDEs such as the Fokker-Planck equation and the Keller-Segel equation. Similar to KRnet, B-KRnet consists of a series of coupling layers with progressively fewer active transformation dimensions, inspired by the triangular structure of the Knothe-Rosenblatt (KR) rearrangement. The main difference between B-KRnet and KRnet is that B-KRnet is defined on a hypercube while KRnet is defined on the whole space; in other words, a new mechanism is introduced in B-KRnet to maintain exact invertibility. Using B-KRnet as a transport map, we obtain an explicit probability density function (PDF) model that corresponds to the pushforward of a base (uniform) distribution on the hypercube. It can be directly applied to density estimation when only data are available. By coupling KRnet and B-KRnet, we define a deep generative model on a high-dimensional domain where some dimensions are bounded and other dimensions are unbounded. A typical case is the solution of the stationary kinetic Fokker-Planck equation, which is a PDF of position and momentum. Based on B-KRnet, we develop an adaptive learning approach to approximate partial differential equations whose solutions are PDFs or can be treated as PDFs. A variety of numerical experiments is presented to demonstrate the effectiveness of B-KRnet.
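
The explicit PDF obtained from a transport map follows from the change-of-variables formula. The block below writes it out under stated assumptions: the map is denoted $f_\theta$ and sends data $x$ to a base variable $z$, and the hypercube is taken to be $[0,1]^d$ so that the uniform base density equals one. The notation and the choice of hypercube are illustrative and may differ from the paper.

```latex
% Invertible map f_\theta : [0,1]^d \to [0,1]^d with z = f_\theta(x) and
% base density p_Z(z) = 1 on [0,1]^d (uniform).
\[
  p_\theta(x)
  \;=\; p_Z\bigl(f_\theta(x)\bigr)\,
        \bigl|\det \nabla_x f_\theta(x)\bigr|
  \;=\; \bigl|\det \nabla_x f_\theta(x)\bigr| ,
  \qquad x \in [0,1]^d .
\]
% Density estimation from samples x_1, \dots, x_N by maximum likelihood:
\[
  \max_\theta \; \frac{1}{N}\sum_{i=1}^{N}
  \log \bigl|\det \nabla_x f_\theta(x_i)\bigr| .
\]
```

With a uniform base, the model density is just the Jacobian determinant of the map. For a triangular (KR-type) map that determinant is the product of the diagonal partial derivatives, which keeps the likelihood cheap to evaluate.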

Article 926

Title@2025-07-25 (5): Gradient-based grand canonical optimization enabled by graph neural networks with fractional atomic existence

Title: Gradient-based grand canonical optimization enabled by graph neural networks with fractional atomic existence

Gradient-basierte große kanonische Optimierung durch Graphen neuronale Netzwerke mit fraktional atomare Existenz ermöglicht

由具有分原子存在的图形神经网络促成的基于梯度的大锥体优化 2507.19438v1

Authors (2): Mads-Peter Verner Christiansen, Bjørk Hammer

Machine learning interatomic potentials have become an indispensable tool for materials science, enabling the study of larger systems and longer timescales. State-of-the-art models are generally graph neural networks that employ message passing to iteratively update atomic embeddings that are ultimately used for predicting properties. In this work we extend the message passing formalism with the inclusion of a continuous variable that accounts for fractional atomic existence. This allows us to calculate the gradient of the Gibbs free energy with respect to both the Cartesian coordinates of atoms and their existence. Using this we propose a gradient-based grand canonical optimization method and document its capabilities for a Cu(110) surface oxide.
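
To make the idea of fractional atomic existence concrete, the sketch below weights each neighbor's contribution to a message-passing sum by an existence value in [0, 1]. This is one plausible reading, offered purely as an assumption: the abstract does not specify how the continuous variable enters the network, and the scalar features, the neighbor lists and the `aggregate` function are hypothetical.

```python
def aggregate(features, neighbors, existence):
    """Sum neighbor features per atom, weighting each neighbor by its existence.

    existence[j] = 1.0 reproduces an ordinary sum over neighbors, and
    existence[j] = 0.0 removes atom j's influence entirely. Scalar features
    are used to keep the example small.
    """
    return [
        sum(existence[j] * features[j] for j in nbrs)
        for nbrs in neighbors
    ]

features = [1.0, 2.0, 4.0]             # one scalar feature per atom
neighbors = [[1, 2], [0, 2], [0, 1]]   # fully connected 3-atom system

full = aggregate(features, neighbors, [1.0, 1.0, 1.0])
half = aggregate(features, neighbors, [1.0, 1.0, 0.5])
ghost = aggregate(features, neighbors, [1.0, 1.0, 0.0])
print(full, half, ghost)
```

Because the output varies smoothly as an existence value moves from 1 to 0, a quantity predicted from such messages can be differentiated with respect to existence, which is the property a gradient-based grand canonical optimizer needs.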

Article 927

Title@2025-07-25 (5): Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization

Title: Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization

Beobachtungen treffen auf Aktionen: Learning Control-Sufficient Representations for Robust Policy Generalization

行动:学习控制-足够的代表性,促进强有力的政策普遍化 2507.19437v1

Authors (4): Yuliang Gu, Hongpeng Cao, Marco Caccamo, Naira Hovakimyan

Capturing latent variations (“contexts”) is key to deploying reinforcement-learning (RL) agents beyond their training regime. We recast context-based RL as a dual inference-control problem and formally characterize two properties and their hierarchy: observation sufficiency (preserving all predictive information) and control sufficiency (retaining decision-relevant information). Exploiting this dichotomy, we derive a contextual evidence lower bound (ELBO)-style objective that cleanly separates representation learning from policy learning and optimize it with Bottlenecked Contextual Policy Optimization (BCPO), an algorithm that places a variational information-bottleneck encoder in front of any off-policy policy learner. On standard continuous-control benchmarks with shifting physical parameters, BCPO matches or surpasses other baselines while using fewer samples and retaining performance far outside the training regime. The framework unifies theory, diagnostics, and practice for context-based RL.


Article 928

Title@2025-07-25 (5): TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

Title: TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

TaylorPODA: Eine Taylor Expansion-basierte Methode zur Verbesserung der Post-Hoc-Attributionen für Opaque-Modelle

泰勒·泰勒:以扩大泰勒为基础的方法,改进不透明模式的后住房分配办法 2507.10643v2

Authors (3): Yuchi Tang, Iñaki Esnaola, George Panoutsos

Existing post-hoc model-agnostic methods generate external explanations for opaque models, primarily by locally attributing the model output to its input features. However, they often lack an explicit and systematic framework for quantifying the contribution of individual features. Building on the Taylor expansion framework introduced by Deng et al. (2024) to unify existing local attribution methods, we propose a rigorous set of postulates – “precision”, “federation”, and “zero-discrepancy” – to govern Taylor term-specific attribution. Guided by these postulates, we introduce TaylorPODA (Taylor expansion-derived imPortance-Order aDapted Attribution), which incorporates an additional “adaptation” property. This property enables alignment with task-specific goals, especially in post-hoc settings lacking ground-truth explanations. Empirical evaluations demonstrate that TaylorPODA achieves competitive results against baseline methods, providing principled and visualization-friendly explanations. This work represents a step toward the trustworthy deployment of opaque models by offering explanations with stronger theoretical grounding.


Article 929

Title@2025-07-25 (5): ASR-Guided Speaker-Role Diarization and Diarization-Guided ASR Decoding

Title: ASR-Guided Speaker-Role Diarization and Diarization-Guided ASR Decoding

ASR-geführte Lautsprecher-Rolle-Diarisierung und Diarisierung-geführte ASR-Dekodierung

ASR 代号:ASR 代号:ASR 2507.17765v2

Authors (5): Arindam Ghosh, Mark Fuhs, Bongjun Kim, Anurag Chowdhury, Monika Woszczyna

From an application standpoint, speaker-role diarization (RD), such as doctor vs. patient or host vs. guest, is often more useful than traditional speaker diarization (SD), which assigns generic labels such as speaker-1 and speaker-2. In the context of joint automatic speech recognition (ASR) + SD (who spoke what?), recent end-to-end models employ an auxiliary SD transducer, synchronized with the ASR transducer, to predict speakers per word. In this paper, we extend this framework to RD with three key contributions: (1) we simplify the training via forced alignment and cross-entropy loss instead of RNNT loss, (2) we show that word prediction and role prediction require different amounts of predictor context, leading to separate task-specific predictors, unlike existing shared-predictor models, and (3) we propose a way to leverage RD posterior activity to influence ASR decoding and reduce small-word deletion errors.


Article 930

Title@2025-07-25 (5): Distillation Scaling Laws

Title: Distillation Scaling Laws

Destillationsskalierungsgesetze

强化法律 2502.08606v2

Authors (6): Dan Busbridge, Amitis Shidani, Floris Weers, Jason Ramapuram, Etai Littwin, Russ Webb

We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings mitigate the risks associated with large-scale distillation by enabling compute-optimal allocation for both the teacher and student to maximize student performance. We provide compute-optimal distillation recipes for two key scenarios: when a teacher already exists, and when a teacher needs training. In settings involving many students or an existing teacher, distillation outperforms supervised learning up to a compute level that scales predictably with student size. Conversely, if only one student is to be distilled and a teacher also requires training, supervised learning is generally preferable. Additionally, our large-scale study of distillation increases our understanding of the process and helps inform experimental design.


Article 931

Title@2025-07-25 (5): Integrating Physics and Topology in Neural Networks for Learning Rigid Body Dynamics

Title: Integrating Physics and Topology in Neural Networks for Learning Rigid Body Dynamics

Integrieren von Physik und Topologie in neurale Netzwerke zum Lernen von Starrkörperdynamik

将物理和地形学纳入学习硬体体动力学神经网络 2411.11467v3

Authors (2): Amaury Wei, Olga Fink

Rigid body interactions are fundamental to numerous scientific disciplines, but remain challenging to simulate due to their abrupt nonlinear nature and sensitivity to complex, often unknown environmental factors. These challenges call for adaptable learning-based methods capable of capturing complex interactions beyond explicit physical models and simulations. While graph neural networks can handle simple scenarios, they struggle with complex scenes and long-term predictions. We introduce a novel framework for modeling rigid body dynamics and learning collision interactions, addressing key limitations of existing graph-based methods. Our approach extends the traditional representation of meshes by incorporating higher-order topology complexes, offering a physically consistent representation. Additionally, we propose a physics-informed message-passing neural architecture, embedding physical laws directly in the model. Our method demonstrates superior accuracy, even during long rollouts, and exhibits strong generalization to unseen scenarios. Importantly, this work addresses the challenge of multi-entity dynamic interactions, with applications spanning diverse scientific and engineering domains.


Article 932

Title@2025-07-25 (5): Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

Title: Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

Schritt-3 ist groß und dennoch erschwinglich: Modell-System-Co-Design für kostengünstige Decodierung

第3步是大号但价格可承受的:具有成本效益的编码模型系统共同设计。 2507.19427v1

Authors (200): StepFun, :, Bin Wang, Bojun Wang, Changyi Wan, Guanzhe Huang, Hanpeng Hu, Haonan Jia, Hao Nie, Mingliang Li, Nuo Chen, Siyu Chen, Song Yuan, Wuxun Xie, Xiaoniu Song, Xing Chen, Xingping Yang, Xuelin Zhang, Yanbo Yu, Yaoyu Wang, Yibo Zhu, Yimin Jiang, Yu Zhou, Yuanwei Lu, Houyi Li, Jingcheng Hu, Ka Man Lo, Ailin Huang, Binxing Jiao, Bo Li, Boyu Chen, Changxin Miao, Chang Lou, Chen Hu, Chen Xu, Chenfeng Yu, Chengyuan Yao, Daokuan Lv, Dapeng Shi, Deshan Sun, Ding Huang, Dingyuan Hu, Dongqing Pang, Enle Liu, Fajie Zhang, Fanqi Wan, Gulin Yan, Han Zhang, Han Zhou, Hanghao Wu, Hangyu Guo, Hanqi Chen, Hanshan Zhang, Hao Wu, Haocheng Zhang, Haolong Yan, Haoran Lv, Haoran Wei, Hebin Zhou, Heng Wang, Heng Wang, Hongxin Li, Hongyu Zhou, Hongyuan Wang, Huiyong Guo, Jia Wang, Jiahao Gong, Jialing Xie, Jian Zhou, Jianjian Sun, Jiaoren Wu, Jiaran Zhang, Jiayu Liu, Jie Cheng, Jie Luo, Jie Yan, Jie Yang, Jieyi Hou, Jinguang Zhang, Jinlan Cao, Jisheng Yin, Junfeng Liu, Junhao Huang, Junzhe Lin, Kaijun Tan, Kaixiang Li, Kang An, Kangheng Lin, Kenkun Liu, Lei Yang, Liang Zhao, Liangyu Chen, Lieyu Shi, Liguo Tan, Lin Lin, Lin Zhang, Lina Chen, Liwen Huang, Liying Shi, Longlong Gu, Mei Chen, Mengqiang Ren, Ming Li, Mingzhe Chen, Na Wang, Nan Wu, Qi Han, Qian Zhao, Qiang Zhang, Qianni Liu, Qiaohui Chen, Qiling Wu, Qinglin He, Qinyuan Tan, Qiufeng Wang, Qiuping Wu, Qiuyan Liang, Quan Sun, Rui Li, Ruihang Miao, Ruosi Wan, Ruyan Guo, Shangwu Zhong, Shaoliang Pang, Shengjie Fan, Shijie Shang, Shilei Jiang, Shiliang Yang, Shiming Hao, Shuli Gao, Siming Huang, Siqi Liu, Tiancheng Cao, Tianhao Cheng, Tianhao Peng, Wang You, Wei Ji, Wen Sun, Wenjin Deng, Wenqing He, Wenzhen Zheng, Xi Chen, Xiangwen Kong, Xianzhen Luo, Xiaobo Yang, Xiaojia Liu, Xiaoxiao Ren, Xin Han, Xin Li, Xin Wu, Xu Zhao, Yanan Wei, Yang Li, Yangguang Li, Yangshijie Xu, Yanming Xu, Yaqiang Shi, Yeqing Shen, Yi Yang, Yifei Yang, Yifeng Gong, Yihan Chen, Yijing Yang, Yinmin Zhang, Yizhuang Zhou, Yuanhao Ding, 
Yuantao Fan, Yuanzhen Yang, Yuchu Luo, Yue Peng, Yufan Lu, Yuhang Deng, Yuhe Yin, Yujie Liu, Yukun Chen, Yuling Zhao, Yun Mou, Yunlong Li, Yunzhou Ju, Yusheng Li, Yuxiang Yang, Yuxiang Zhang, Yuyang Chen, Zejia Weng, Zhe Xie, Zheng Ge, Zheng Gong, Zhenyi Lu, Zhewei Huang, Zhichao Chang, Zhiguo Huang, Zhirui Wang, Zidong Yang, Zili Wang, Ziqi Wang, Zixin Zhang, Binxing Jiao, Daxin Jiang, Heung-Yeung Shum, Xiangyu Zhang

Large language models (LLMs) face low hardware efficiency during decoding, especially for long-context reasoning tasks. This paper introduces Step-3, a 321B-parameter VLM with hardware-aware model-system co-design optimized for minimizing decoding costs. Step-3 innovates in two key dimensions: (1) A novel Multi-Matrix Factorization Attention (MFA) mechanism that significantly reduces both KV cache size and computation while maintaining high attention expressiveness, and (2) Attention-FFN Disaggregation (AFD), a distributed inference system that decouples attention and Feed-Forward Network (FFN) layers into specialized subsystems. This co-design achieves unprecedented cost efficiency: Step-3 significantly reduces theoretical decoding costs compared with models like DeepSeek-V3 and Qwen3 MoE 235B, with the gains widening at longer context. Step-3 achieves low cost while activating 38B parameters per token (more than DeepSeek-V3 and Qwen3 MoE 235B), demonstrating that hardware-aligned attention arithmetic intensity, MoE sparsity, and AFD are critical to cost-effectiveness. We perform a head-to-head comparison with DeepSeek-V3 in its favorable scenarios. Our implementation on Hopper GPUs achieves a decoding throughput of up to 4,039 tokens per second per GPU under 50ms TPOT SLA (4K context, FP8, no MTP). It is higher than DeepSeek-V3’s 2,324 in the same setup and sets a new Pareto frontier for LLM decoding.
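The throughput figures quoted above can be put side by side with a little arithmetic. The sketch below uses only the numbers stated in the abstract; the helper names are illustrative, and the concurrency estimate rests on the assumption (not made in the abstract) that every stream decodes exactly at the 50 ms TPOT bound.

```python
# Figures quoted in the abstract: tokens per second per GPU
# (4K context, FP8, no MTP) and the 50 ms TPOT SLA.
STEP3_TOKENS_PER_S = 4039
DEEPSEEK_V3_TOKENS_PER_S = 2324
TPOT_SLA_S = 0.050


def throughput_ratio(a: float, b: float) -> float:
    """Relative decoding throughput of system a over system b."""
    return a / b


def streams_at_sla_bound(tokens_per_s: float, tpot_s: float) -> float:
    """Streams needed to reach tokens_per_s if each stream emits one
    token exactly every tpot_s seconds (an illustrative assumption)."""
    return tokens_per_s * tpot_s


if __name__ == "__main__":
    ratio = throughput_ratio(STEP3_TOKENS_PER_S, DEEPSEEK_V3_TOKENS_PER_S)
    streams = streams_at_sla_bound(STEP3_TOKENS_PER_S, TPOT_SLA_S)
    print(f"throughput ratio: {ratio:.2f}x")
    print(f"streams per GPU at the SLA bound: {streams:.0f}")
```

Under these assumptions the reported throughput is roughly 1.74 times the DeepSeek-V3 figure, and about 200 concurrent streams per GPU would be needed if each ran exactly at the SLA limit.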


Article 933

Title@2025-07-25 (5): Perfect Clustering in Very Sparse Diverse Multiplex Networks

Title: Perfect Clustering in Very Sparse Diverse Multiplex Networks

Perfektes Clustering in sehr Sparse Diverse Multiplex-Netzwerke

在非常分散的多元多功能网络中完美分组 2507.19423v1

Authors (1): Marianna Pensky

The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky (2024)), where all layers of the network have the same collection of nodes. In addition, all layers can be partitioned into groups such that the layers in the same group are embedded in the same ambient subspace, but the matrices of connection probabilities can otherwise all be different. This setting includes the majority of multilayer network models as particular cases. The key task in this model is to recover the groups of layers with unique subspace structures, since the case where all layers of the network are embedded in the same subspace has been fairly well studied. Until now, clustering of layers in such networks was based on layer-by-layer analysis, which required the multilayer network to be sufficiently dense. In this paper, by contrast, we pool information across all layers and provide a tensor-based methodology that ensures perfect clustering for a much sparser network. Our theoretical results, established under intuitive non-restrictive assumptions, assert that the new technique achieves perfect clustering under sparsity conditions that, up to logarithmic factors, coincide with the computational lower bound derived for a much simpler model.


Article 934

Title@2025-07-25 (5): Programmable Virtual Humans Toward Human Physiologically-Based Drug Discovery

Title: Programmable Virtual Humans Toward Human Physiologically-Based Drug Discovery

Programmierbare virtuelle Menschen auf dem Weg zur physiologischen Drogenentdeckung

人类生理病理药物发现方案虚拟人类 2507.19568v1

Authors (3): You Wu, Philip E. Bourne, Lei Xie

Artificial intelligence (AI) has sparked immense interest in drug discovery, but most current approaches only digitize existing high-throughput experiments. They remain constrained by conventional pipelines. As a result, they do not address the fundamental challenges of predicting drug effects in humans. Similarly, biomedical digital twins, largely grounded in real-world data and mechanistic models, are tailored for late-phase drug development and lack the resolution to model molecular interactions or their systemic consequences, limiting their impact in early-stage discovery. This disconnect between early discovery and late development is one of the main drivers of high failure rates in drug discovery. The true promise of AI lies not in augmenting current experiments but in enabling virtual experiments that are impossible in the real world: testing novel compounds directly in silico in the human body. Recent advances in AI, high-throughput perturbation assays, and single-cell and spatial omics across species now make it possible to construct programmable virtual humans: dynamic, multiscale models that simulate drug actions from molecular to phenotypic levels. By bridging the translational gap, programmable virtual humans offer a transformative path to optimize therapeutic efficacy and safety earlier than ever before. This perspective introduces the concept of programmable virtual humans, explores their roles in a new paradigm of drug discovery centered on human physiology, and outlines key opportunities, challenges, and roadmaps for their realization.


Article 935

Title@2025-07-25 (5): CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing

Title: CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing

CircuitProbe: Spatiotemporale visuelle Semantik mit Circuit Tracing

电路探测:用电路追踪解剖时光视觉语义 2507.19420v1

Authors (9): Yiming Zhang, Chengzhang Yu, Zhuokai Zhao, Kun Wang, Qiankun Li, Zihan Chen, Yang Liu, Zenghui Ding, Yining Sun

The processing mechanisms underlying language and image understanding in large vision-language models (LVLMs) have been extensively studied. However, the internal reasoning mechanisms of LVLMs for spatiotemporal understanding remain poorly understood. In this work, we introduce a systematic, circuit-based framework designed to investigate how spatiotemporal visual semantics are represented and processed within these LVLMs. Specifically, our framework comprises three circuits: a visual auditing circuit, a semantic tracing circuit, and an attention flow circuit. Through the lens of these circuits, we discover that visual semantics are highly localized to specific object tokens: removing these tokens can degrade model performance by up to 92.6%. Furthermore, we identify that interpretable concepts of objects and actions emerge and become progressively refined in the middle-to-late layers of LVLMs. In contrast to existing work that focuses solely on objects in a single image, we reveal that the middle-to-late layers of LVLMs exhibit specialized functional localization for spatiotemporal semantics. Our findings offer significant mechanistic insights into spatiotemporal semantics analysis of LVLMs, laying a foundation for designing more robust and interpretable models.


Article 936

Title@2025-07-25 (5): SILS: Strategic Influence on Liquidity Stability and Whale Detection in Concentrated-Liquidity DEXs

Title: SILS: Strategic Influence on Liquidity Stability and Whale Detection in Concentrated-Liquidity DEXs

SILS: Strategischer Einfluss auf Liquiditätsstabilität und Whale Detection in Konzentrations-Liquiditäts-DEXs

SILS: 集中-公平性DEX对流动性稳定和捕鲸探测的战略影响 2507.19411v1

Authors (4): Ali RajabiNekoo, Laleh Rasoul, Amirfarhad Farhadi, Azadeh Zamanifar

Traditional methods for identifying impactful liquidity providers (LPs) in Concentrated Liquidity Market Makers (CLMMs) rely on broad measures, such as nominal capital size or surface-level activity, which often lead to inaccurate risk analysis. The SILS framework offers a significantly more detailed approach, characterizing LPs not just as capital holders but as dynamic systemic agents whose actions directly impact market stability. This represents a fundamental paradigm shift from static, volume-based analysis to a dynamic, impact-focused understanding. The approach uses on-chain event logs and smart contract execution traces to compute Exponential Time-Weighted Liquidity (ETWL) profiles and apply unsupervised anomaly detection. Most importantly, it defines an LP’s functional importance through the Liquidity Stability Impact Score (LSIS), a counterfactual metric that measures the potential degradation of the market if the LP withdraws. This combined approach provides a more detailed and realistic characterization of an LP’s impact, moving beyond the binary and often misleading classifications used by existing methods. It enables SILS to accurately identify high-impact LPs, including those missed by traditional methods, and supports essential applications such as a protective oracle layer and actionable trader signals, thereby significantly enhancing the DeFi ecosystem. The framework provides unprecedented transparency into the underlying liquidity structure and associated risks, effectively reducing the common false positives and uncovering critical false negatives found in traditional models. Therefore, SILS provides an effective mechanism for proactive risk management, transforming how DeFi protocols safeguard their ecosystems against asymmetric liquidity behavior.


Article 937

Title@2025-07-25 (5): On Arbitrary Predictions from Equally Valid Models

Title: On Arbitrary Predictions from Equally Valid Models

Auf willkürliche Vorhersagen von gleichermaßen gültigen Modellen

从同等有效模式作出的任意预测 2507.19408v1

Authors (7): Sarah Lockfisch, Kristian Schwethelm, Martin Menten, Rickmer Braren, Daniel Rueckert, Alexander Ziller, Georgios Kaissis

Model multiplicity refers to the existence of multiple machine learning models that describe the data equally well but may produce different predictions on individual samples. In medicine, these models can yield conflicting predictions for the same patient, a risk that is poorly understood and insufficiently addressed. In this study, we empirically analyze the extent, drivers, and ramifications of predictive multiplicity across diverse medical tasks and model architectures, and show that even small ensembles can mitigate or eliminate predictive multiplicity in practice. Our analysis reveals that (1) standard validation metrics fail to identify a uniquely optimal model and (2) a substantial number of predictions hinge on arbitrary choices made during model development. Using multiple models instead of a single model reveals instances where predictions differ across equally plausible models, highlighting patients who would receive arbitrary diagnoses if any single model were used. In contrast, (3) a small ensemble paired with an abstention strategy can effectively mitigate measurable predictive multiplicity in practice; predictions with high inter-model consensus may thus be amenable to automated classification. While accuracy is not a principled antidote to predictive multiplicity, we find that (4) higher accuracy achieved through increased model capacity reduces predictive multiplicity. Our findings underscore the clinical importance of accounting for model multiplicity and advocate for ensemble-based strategies to improve diagnostic reliability. In cases where models fail to reach sufficient consensus, we recommend deferring decisions to expert review.
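The ensemble-plus-abstention idea described above can be illustrated with a minimal sketch. The function name, the agreement threshold, and the use of `None` to signal deferral are all assumptions made for illustration; the abstract does not specify how consensus is measured.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def consensus_or_defer(
    votes: Sequence[Hashable], min_agreement: float = 0.8
) -> Optional[Hashable]:
    """Return the majority label if enough ensemble members agree,
    otherwise None to signal deferral to expert review.

    votes: one predicted label per equally valid model.
    min_agreement: hypothetical threshold on the share of models
    that must back the majority label.
    """
    if not votes:
        raise ValueError("need at least one vote")
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # insufficient consensus: abstain


if __name__ == "__main__":
    print(consensus_or_defer([1, 1, 1, 1, 1]))        # unanimous
    print(consensus_or_defer([1, 0, 1, 0, 1], 0.75))  # split ensemble
```

A patient on whom the models split is flagged rather than silently assigned whichever label a single arbitrarily chosen model would have produced.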


Article 938

Title@2025-07-25 (5): Review of Deep Learning Applications to Structural Proteomics Enabled by Cryogenic Electron Microscopy and Tomography

Title: Review of Deep Learning Applications to Structural Proteomics Enabled by Cryogenic Electron Microscopy and Tomography

Überprüfung von Deep-Learning-Anwendungen zur strukturellen Proteomik durch die kryogene Elektronenmikroskopie und Tomographie aktiviert

审查通过低低温电动显微镜和地形学对结构蛋白质组的深学习应用 2507.19565v1

Authors (5): Brady K. Zhou, Jason J. Hu, Jane K. J. Lee, Z. Hong Zhou, Demetri Terzopoulos

The past decade’s “cryoEM revolution” has produced exponential growth in high-resolution structural data through advances in cryogenic electron microscopy (cryoEM) and tomography (cryoET). Deep learning integration into structural proteomics workflows addresses longstanding challenges including low signal-to-noise ratios, preferred orientation artifacts, and missing-wedge problems that historically limited efficiency and scalability. This review examines AI applications across the entire cryoEM pipeline, from automated particle picking using convolutional neural networks (Topaz, crYOLO, CryoSegNet) to computational solutions for preferred orientation bias (spIsoNet, cryoPROS) and advanced denoising algorithms (Topaz-Denoise). In cryoET, tools like IsoNet employ U-Net architectures for simultaneous missing-wedge correction and noise reduction, while TomoNet streamlines subtomogram averaging through AI-driven particle detection. The workflow culminates with automated atomic model building using sophisticated tools like ModelAngelo, DeepTracer, and CryoREAD that translate density maps into interpretable biological structures. These AI-enhanced approaches have achieved near-atomic resolution reconstructions with minimal manual intervention, resolved previously intractable datasets suffering from severe orientation bias, and enabled successful application to diverse biological systems from HIV virus-like particles to in situ ribosomal complexes. As deep learning evolves, particularly with large language models and vision transformers, the future promises sophisticated automation and accessibility in structural biology, potentially revolutionizing our understanding of macromolecular architecture and function.


Article 939

Title@2025-07-25 (5): FD4QC: Application of Classical and Quantum-Hybrid Machine Learning for Financial Fraud Detection A Technical Report

Title: FD4QC: Application of Classical and Quantum-Hybrid Machine Learning for Financial Fraud Detection A Technical Report

FD4QC: Anwendung von klassischem und Quantum-Hybrid-Maschinenlernen für die Erkennung von Finanzbetrug Ein technischer Bericht

FD4QC:应用古典和量子研究机器学习用于金融欺诈侦查技术报告 2507.19402v1

Authors (8): Matteo Cardaioli, Luca Marangoni, Giada Martini, Francesco Mazzolin, Luca Pajola, Andrea Ferretto Parodi, Alessandra Saitta, Maria Chiara Vernillo

The increasing complexity and volume of financial transactions pose significant challenges to traditional fraud detection systems. This technical report investigates and compares the efficacy of classical, quantum, and quantum-hybrid machine learning models for the binary classification of fraudulent financial activities. In terms of methodology, we first develop a comprehensive behavioural feature engineering framework to transform raw transactional data into a rich, descriptive feature set. Second, we implement and evaluate a range of models on the IBM Anti-Money Laundering (AML) dataset. The classical baseline models include Logistic Regression, Decision Tree, Random Forest, and XGBoost. These are compared against three hybrid classical-quantum architectures: a Quantum Support Vector Machine (QSVM), a Variational Quantum Classifier (VQC), and a Hybrid Quantum Neural Network (HQNN). Furthermore, we propose Fraud Detection for Quantum Computing (FD4QC), a practical, API-driven system architecture designed for real-world deployment, featuring a classical-first, quantum-enhanced philosophy with robust fallback mechanisms. Our results demonstrate that classical tree-based models, particularly Random Forest, significantly outperform the quantum counterparts in the current setup, achieving high accuracy (97.34%) and F-measure (86.95%). Among the quantum models, QSVM shows the most promise, delivering high precision (77.15%) and a low false-positive rate (1.36%), albeit with lower recall and significant computational overhead. This report provides a benchmark for a real-world financial application, highlights the current limitations of quantum machine learning in this domain, and outlines promising directions for future research.
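The metrics reported above (accuracy, precision, recall, F-measure, false-positive rate) all derive from the same confusion matrix. The sketch below shows the standard definitions; the counts in the usage example are made up for illustration and are not taken from the report.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives, with "fraud" as the positive class.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    fpr = fp / (fp + tn) if fp + tn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fpr,
    }


if __name__ == "__main__":
    # Hypothetical, imbalanced counts (100 fraud cases among 1000).
    for name, value in binary_metrics(tp=80, fp=20, fn=20, tn=880).items():
        print(f"{name}: {value:.4f}")
```

On imbalanced data such as fraud detection, accuracy can stay high while the F-measure is noticeably lower, which is consistent with the gap between the two figures quoted for Random Forest.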


Article 940

Title@2025-07-25 (5): Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data

Title: Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data

Erlernen ursächlich vorhersehbarer Ergebnisse aus Psychiatrischen Langzeitdaten

精神病纵向数据产生的可预期的学习结果 2506.16629v4

Authors (1): Eric V. Strobl

Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia.


Article 941

Title@2025-07-25 (5): Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning

Title: Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning

Disentangled Latent Spaces erleichtern datengestütztes Hilfslernen

促进数据驱动辅助学习 2310.09278v3

Authors (6): Geri Skenderi, Luigi Capogrosso, Andrea Toaiari, Matteo Denitto, Franco Fummi, Simone Melzi

Auxiliary tasks facilitate learning in situations where data is scarce or the principal task of interest is extremely complex. This idea is primarily inspired by the improved generalization capability induced by solving multiple tasks simultaneously, which leads to a more robust shared representation. Nevertheless, finding optimal auxiliary tasks is a crucial problem that often requires hand-crafted solutions or expensive meta-learning approaches. In this paper, we propose a novel framework, dubbed Detaux, whereby a weakly supervised disentanglement procedure is used to discover a new unrelated auxiliary classification task, which allows us to go from a Single-Task Learning (STL) to a Multi-Task Learning (MTL) problem. The disentanglement procedure works at the representation level, isolating the variation related to the principal task in a dedicated subspace and additionally producing an arbitrary number of orthogonal subspaces, each of which encourages high separability among projections. We generate the auxiliary classification task through a clustering procedure on the most disentangled subspace, obtaining a discrete set of labels. Subsequently, the original data, the labels associated with the principal task, and the newly discovered ones can be fed into any MTL framework. Experimental validation on both synthetic and real data, along with various ablation studies, demonstrates promising results, revealing the potential in what has been, so far, an unexplored connection between learning disentangled representations and MTL. The source code is available at https://github.com/intelligolabs/Detaux.


Article 942

Title@2025-07-25 (5): Multi-fidelity Bayesian Data-Driven Design of Energy Absorbing Spinodoid Cellular Structures

Title: Multi-fidelity Bayesian Data-Driven Design of Energy Absorbing Spinodoid Cellular Structures

Multi-Fidelity Bayesian Data-Driven Design of Energy Absorbing Spinodoid Zelluläre Strukturen

多纤维贝耶斯数据驱动设计 2507.22079v1

Authors (5): Leo Guo, Hirak Kansara, Siamak F. Khosroshahi, GuoQi Zhang, Wei Tan

Finite element (FE) simulations of structures and materials are becoming increasingly accurate, but also more computationally expensive as a result. This development happens in parallel with a growing demand for data-driven design. To reconcile the two, a robust and data-efficient optimization method called Bayesian optimization (BO) has previously been established as a technique to optimize expensive objective functions. In parallel, the mesh width of an FE model can be exploited to evaluate an objective at a lower or higher fidelity (cost and accuracy) level. The multi-fidelity setting applied to BO, called multi-fidelity BO (MFBO), has also seen previous success. However, BO and MFBO have not been directly compared on a real-life engineering problem, such as metamaterial design for deformation and absorption qualities. Moreover, sampling quality and the assessment of design parameter sensitivity are often underrepresented parts of data-driven design. This paper aims to address these shortcomings by employing Sobol’ samples with variance-based sensitivity analysis in order to reduce design problem complexity. Furthermore, this work describes, implements, applies and compares the performance of BO with that of MFBO in maximizing the energy absorption (EA) of spinodoid cellular structures. The findings show that MFBO is an effective way to maximize the EA of a spinodoid structure and is able to outperform BO by up to 11% across various hyperparameter settings. The results, which are made open-source, serve to support the utility of multi-fidelity techniques across expensive data-driven design problems.


Article 943

Title@2025-07-25 (5): Agreement-Based Cascading for Efficient Inference

Title: Agreement-Based Cascading for Efficient Inference

Vereinbarungsbasiertes Cascading für effiziente Schlussfolgerungen

以协议为基础的高效推断的连锁计算 2407.02348v3

Authors (4): Steven Kolawole, Don Dennis, Ameet Talwalkar, Virginia Smith

Adaptive inference schemes reduce the cost of machine learning inference by assigning smaller models to easier examples, attempting to avoid invocation of larger models when possible. In this work we explore a simple, effective adaptive inference technique we term Agreement-Based Cascading (ABC). ABC builds a cascade of models of increasing size/complexity, and uses agreement between ensembles of models at each level of the cascade as a basis for data-dependent routing. Although ensemble execution introduces additional expense, we show that these costs can be easily offset in practice due to large expected differences in model sizes, parallel inference execution capabilities, and accuracy benefits of ensembling. We examine ABC theoretically and empirically in terms of these parameters, showing that the approach can reliably act as a drop-in replacement for existing models and surpass the best single model it aims to replace in terms of both efficiency and accuracy. Additionally, we explore the performance of ABC relative to existing cascading methods in three common scenarios: (1) edge-to-cloud inference, where ABC reduces communication costs by up to 14x; (2) cloud-based model serving, where it achieves a 3x reduction in rental costs; and (3) inference via model API services, where ABC achieves a 2-25x reduction in average price per token/request relative to state-of-the-art LLM cascades.
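The routing rule described above, where an example moves to a larger tier only when the current ensemble disagrees, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the agreement threshold, and the rule that the last tier always answers are assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

Model = Callable[[str], Hashable]


def agreement_cascade(
    x: str,
    tiers: Sequence[Sequence[Model]],
    threshold: float = 1.0,
) -> Tuple[Hashable, int]:
    """Route x through ensembles ordered from cheapest to most expensive.

    Returns (label, tier_index). A tier answers when the share of its
    models voting for the majority label reaches `threshold`
    (hypothetical parameter); the final tier always answers.
    """
    if not tiers:
        raise ValueError("need at least one tier")
    last = len(tiers) - 1
    for level, ensemble in enumerate(tiers):
        votes = Counter(model(x) for model in ensemble)
        label, count = votes.most_common(1)[0]
        if level == last or count / len(ensemble) >= threshold:
            return label, level
    raise AssertionError("unreachable: the last tier always answers")


if __name__ == "__main__":
    # Toy stand-ins for real models.
    small = [lambda s: "pos", lambda s: "pos" if s == "easy" else "neg"]
    large = [lambda s: "neg"]
    print(agreement_cascade("easy", [small, large]))  # handled by tier 0
    print(agreement_cascade("hard", [small, large]))  # escalated to tier 1
```

Easy inputs never reach the expensive tier, which is where the cost savings described in the abstract would come from; the models in one ensemble could also be run in parallel.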


Article 944

Title@2025-07-25 (5): Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)

Title: Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)

Multimodale Recurrent-Ensembles zur Vorhersage von Gehirnreaktionen auf naturalistische Filme (Algonauten 2025)

预测对自然电影的脑反应的多式经常性多年度联合会议(2025年8月20日) 2507.17897v2

Authors (3): Semih Eren, Deniz Kucukahmetler, Nico Scherf

Accurately predicting distributed cortical responses to naturalistic stimuli requires models that integrate visual, auditory and semantic information over time. We present a hierarchical multimodal recurrent ensemble that maps pretrained video, audio, and language embeddings to fMRI time series recorded while four subjects watched almost 80 hours of movies provided by the Algonauts 2025 challenge. Modality-specific bidirectional RNNs encode temporal dynamics; their hidden states are fused and passed to a second recurrent layer, and lightweight subject-specific heads output responses for 1000 cortical parcels. Training relies on a composite MSE-correlation loss and a curriculum that gradually shifts emphasis from early sensory to late association regions. Averaging 100 model variants further boosts robustness. The resulting system ranked third on the competition leaderboard, achieving an overall Pearson r = 0.2094 and the highest single-parcel peak score (mean r = 0.63) among all participants, with particularly strong gains for the most challenging subject (Subject 5). The approach establishes a simple, extensible baseline for future multimodal brain-encoding benchmarks.
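
One plausible form of a composite MSE-correlation loss is sketched below. The abstract does not give the exact formula, so the convex combination and the `alpha` weight are assumptions chosen for illustration, and plain Python lists stand in for predicted and recorded parcel time series.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pearson_r(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    denom = math.sqrt(var_p * var_t)
    return cov / denom if denom > 0 else 0.0


def composite_loss(
    pred: Sequence[float], target: Sequence[float], alpha: float = 0.5
) -> float:
    """Assumed form: alpha * MSE + (1 - alpha) * (1 - Pearson r)."""
    return alpha * mse(pred, target) + (1 - alpha) * (1 - pearson_r(pred, target))


perfect = composite_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = composite_loss([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

The shifted prediction is perfectly correlated with the target but offset by one, so only the MSE term penalizes it. This illustrates why a correlation term alone would not constrain response amplitude.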

Article 945

Title@2025-07-25 (5): Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question

Title: Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question

Vielfältige LLMs oder unterschiedliche Frageinterpretationen? Das ist die Assembling-Frage

不同的LLMs或不同的问题解释? 2507.21168v1

Authors (2): Rafael Rosales, Santiago Miret

Effectively leveraging diversity has been shown to improve performance for various machine learning models, including large language models (LLMs). However, determining the most effective way of using diversity remains a challenge. In this work, we compare two diversity approaches for answering binary questions using LLMs: model diversity, which relies on multiple models answering the same question, and question interpretation diversity, which relies on using the same model to answer the same question framed in different ways. For both cases, we apply majority voting as the ensemble consensus heuristic to determine the final answer. Our experiments on BoolQ, StrategyQA, and PubMedQA show that question interpretation diversity consistently leads to better ensemble accuracy compared to model diversity. Furthermore, our analysis of GPT and LLaMA shows that model diversity typically produces results between the best and the worst ensemble members without clear improvement.
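
The two ensembling schemes compared above differ only in where the votes come from. The sketch below makes that concrete with majority voting over yes/no answers. The stub "models" are plain functions invented for the example, and resolving ties to `False` is an assumption, not something the abstract specifies.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: a model maps a question string to a yes/no answer.
BinaryModel = Callable[[str], bool]


def majority_vote(answers: Sequence[bool]) -> bool:
    """Return True when strictly more voters say True; ties go to False."""
    counts = Counter(answers)
    return counts[True] > counts[False]


def model_diversity(models: Sequence[BinaryModel], question: str) -> bool:
    """Several models answer the same question."""
    return majority_vote([m(question) for m in models])


def interpretation_diversity(model: BinaryModel, rephrasings: Sequence[str]) -> bool:
    """One model answers several framings of the same question."""
    return majority_vote([model(q) for q in rephrasings])


# Stub models standing in for LLM calls (illustrative assumptions only).
always_yes: BinaryModel = lambda q: True
always_no: BinaryModel = lambda q: False
keyword_model: BinaryModel = lambda q: "capital" in q.lower()

by_models = model_diversity(
    [always_yes, always_no, always_yes], "Is Paris the capital of France?"
)
by_framings = interpretation_diversity(
    keyword_model,
    [
        "Is Paris the capital of France?",
        "Does France have Paris as its capital city?",
        "Paris: French seat of government?",
    ],
)
```

Using an odd number of voters, as here, avoids relying on the tie-breaking rule.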

Article 946

Title@2025-07-25 (5): Learning neuro-symbolic convergent term rewriting systems

Title: Learning neuro-symbolic convergent term rewriting systems

neuro-symbolische konvergente Begriffs-Rewriting-Systeme lernen

学习神经 – – 共聚性神经 – – 神经 – – 共用术语重写系统 2507.19372v1

Authors (3): Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti

Building neural systems that can learn to execute symbolic algorithms is a challenging open problem in artificial intelligence, especially when aiming for strong generalization and out-of-distribution performance. In this work, we introduce a general framework for learning convergent term rewriting systems using a neuro-symbolic architecture inspired by the rewriting algorithm itself. We present two modular implementations of such an architecture: the Neural Rewriting System (NRS) and the Fast Neural Rewriting System (FastNRS). As a result of the algorithm-inspired design and key architectural elements, both models can generalize to out-of-distribution instances, with FastNRS offering significant improvements in terms of memory efficiency, training speed, and inference time. We evaluate both architectures on four tasks involving the simplification of mathematical formulas and further demonstrate their versatility in a multi-domain learning scenario, where a single model is trained to solve multiple types of problems simultaneously. The proposed system significantly outperforms two strong neural baselines: the Neural Data Router, a recent transformer variant specifically designed to solve algorithmic problems, and GPT-4o, one of the most powerful general-purpose large language models. Moreover, our system matches or outperforms the latest o1-preview model from OpenAI that excels in reasoning benchmarks.

Article 947

Title@2025-07-25 (5): Deep Learning for Double Auction

Title: Deep Learning for Double Auction

Deep Learning für doppelte Auktion

双重拍卖深度学习 2504.05355v2

Authors (2): Jiayin Liu, Chenglong Zhang

Auctions are important mechanisms extensively implemented in various markets, e.g., search engines’ keyword auctions, antique auctions, etc. Finding an optimal auction mechanism is extremely difficult due to the constraints of imperfect information, incentive compatibility (IC), and individual rationality (IR). In addition to traditional economic methods, some recent work has attempted to find the optimal (single) auction using deep learning. Unlike those attempts, which focus on single auctions, we develop deep learning methods for double auctions, where imperfect information exists on both the demand and supply sides. The previous single-auction approaches cannot be directly applied to our setting, and they additionally suffer from limited generalizability, inefficiency in enforcing the constraints, and learning fluctuations. We design deep learning models that solve this more complex problem while also addressing these three limitations. Specifically, we achieve generalizability by leveraging a transformer-based architecture that models market participants as sequences, accommodating varying market sizes; we exploit the numerical features of the constraints and pre-treat them for higher learning efficiency; and we develop a gradient-conflict-elimination scheme to address learning fluctuations. Extensive experimental evaluations demonstrate the superiority of our approach over classical and machine learning baselines.

Article 948

Title@2025-07-25 (5): Counterfactual Explanations in Medical Imaging: Exploring SPN-Guided Latent Space Manipulation

Title: Counterfactual Explanations in Medical Imaging: Exploring SPN-Guided Latent Space Manipulation

Counterfactual Erklärungen in der medizinischen Bildgebung: Erforschung SPN-geführter latenter Raummanipulation

医疗成像中的反事实解释:探索SPN-CPN 导航的冷空空间操纵 2507.19368v1

Authors (2): Julia Siekiera, Stefan Kramer

Artificial intelligence is increasingly leveraged across various domains to automate decision-making processes that significantly impact human lives. In medical image analysis, deep learning models have demonstrated remarkable performance. However, their inherent complexity makes them black box systems, raising concerns about reliability and interpretability. Counterfactual explanations provide comprehensible insights into decision processes by presenting hypothetical “what-if” scenarios that alter model classifications. By examining input alterations, counterfactual explanations provide patterns that influence the decision-making process. Despite their potential, generating plausible counterfactuals that adhere to similarity constraints and provide human-interpretable explanations remains a challenge. In this paper, we investigate this challenge with a model-specific optimization approach. While deep generative models such as variational autoencoders (VAEs) exhibit significant generative power, probabilistic models like sum-product networks (SPNs) efficiently represent complex joint probability distributions. By modeling the likelihood of a semi-supervised VAE’s latent space with an SPN, we leverage its dual role as both a latent space descriptor and a classifier for a given discrimination task. This formulation enables the optimization of latent space counterfactuals that are both close to the original data distribution and aligned with the target class distribution. We conduct an experimental evaluation on the CheXpert dataset. To evaluate the effectiveness of the integration of SPNs, our SPN-guided latent space manipulation is compared against a neural network baseline. Additionally, the trade-off between latent variable regularization and counterfactual quality is analyzed.

Article 949

Title@2025-07-25 (5): A Data-Driven Approach to Estimate LEO Orbit Capacity Models

Title: A Data-Driven Approach to Estimate LEO Orbit Capacity Models

Ein datengestützter Ansatz zur Schätzung von LEO-Orbit-Kapazitätsmodellen

数据驱动的低地轨道轨道能力估计模型方法 2507.19365v1

Authors (3): Braden Stock, Maddox McVarthy, Simone Servadio

Utilizing the Sparse Identification of Nonlinear Dynamics algorithm (SINDy) and Long Short-Term Memory Recurrent Neural Networks (LSTM), the population of resident space objects, divided into Active, Derelict, and Debris, in LEO can be accurately modeled to predict future satellite and debris propagation. This proposed approach makes use of a data set coming from a computationally expensive high-fidelity model, the MOCAT-MC, to provide a light, low-fidelity counterpart that provides accurate forecasting in a shorter time frame.

Article 950

Title@2025-07-25 (5): LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences

Title: LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences

LOTUS: Ein Leaderboard für detaillierte Bildunterschriften von Qualität zu gesellschaftlichen Bias und Benutzereinstellungen

LOTUS: 从质量到社会偏见和用户首选的详细图像描述领导板 2507.19362v1

Authors (10): Yusuke Hirota, Boyi Li, Ryo Hachiuma, Yueh-Hua Wu, Boris Ivanovic, Yuta Nakashima, Marco Pavone, Yejin Choi, Yu-Chiang Frank Wang, Chao-Han Huck Yang

Large Vision-Language Models (LVLMs) have transformed image captioning, shifting from concise captions to detailed descriptions. We introduce LOTUS, a leaderboard for evaluating detailed captions, addressing three main gaps in existing evaluations: lack of standardized criteria, bias-aware assessments, and user preference considerations. LOTUS comprehensively evaluates various aspects, including caption quality (e.g., alignment, descriptiveness), risks (e.g., hallucination), and societal biases (e.g., gender bias) while enabling preference-oriented evaluations by tailoring criteria to diverse user preferences. Our analysis of recent LVLMs reveals no single model excels across all criteria, while correlations emerge between caption detail and bias risks. Preference-oriented evaluations demonstrate that optimal model selection depends on user priorities.

Article 951

Title@2025-07-25 (5): EffiComm: Bandwidth Efficient Multi Agent Communication

Title: EffiComm: Bandwidth Efficient Multi Agent Communication

EffiComm: Bandbreite Effiziente Multi Agent Kommunikation

EffiComm: 宽带高效多代理通信 2507.19354v1

Authors (3): Melih Yazgan, Allen Xavier Arasan, J. Marius Zöllner

Collaborative perception allows connected vehicles to exchange sensor information and overcome each vehicle’s blind spots. Yet transmitting raw point clouds or full feature maps overwhelms Vehicle-to-Vehicle (V2V) communications, causing latency and scalability problems. We introduce EffiComm, an end-to-end framework that transmits less than 40% of the data required by prior art while maintaining state-of-the-art 3D object detection accuracy. EffiComm operates on Bird’s-Eye-View (BEV) feature maps from any modality and applies a two-stage reduction pipeline: (1) Selective Transmission (ST) prunes low-utility regions with a confidence mask; (2) Adaptive Grid Reduction (AGR) uses a Graph Neural Network (GNN) to assign vehicle-specific keep ratios according to role and network load. The remaining features are fused with a soft-gated Mixture-of-Experts (MoE) attention layer, offering greater capacity and specialization for effective feature integration. On the OPV2V benchmark, EffiComm reaches 0.84 mAP@0.7 while sending only an average of approximately 1.5 MB per frame, outperforming previous methods on the accuracy-per-bit curve. These results highlight the value of adaptive, learned communication for scalable Vehicle-to-Everything (V2X) perception.

Article 952

Title@2025-07-25 (5): Reconstruction of Sparse Urban Wireless Signals via Group Equivariant Non-Expansive Operators

Title: Reconstruction of Sparse Urban Wireless Signals via Group Equivariant Non-Expansive Operators

Rekonstruktion von Sparse Urban Wireless Signals über konzernunabhängige, nicht expansive Betreiber

通过集团等离差非扩大经营人重建城市无线无线信号 2507.19349v1

Authors (7): Lorenzo Mario Amorosa, Francesco Conti, Nicola Quercioli, Flavio Zabini, Tayebeh Lotfi Mahyari, Yiqun Ge, Patrizio Frosini

In emerging communication systems such as sixth generation (6G) wireless networks, efficient resource management and service delivery rely on accurate knowledge of spatially-varying quantities like signal-to-interference-noise ratio (SINR) maps, which are costly to acquire at high resolution. This work explores the reconstruction of such spatial signals from sparse measurements using Group Equivariant Non-Expansive Operators (GENEOs), offering a low-complexity alternative to traditional neural networks. The concept of GENEO, which originated in topological data analysis (TDA), is a mathematical tool used in machine learning to represent agents modelled as functional operators acting on data while incorporating application-specific invariances. Leveraging these invariances reduces the number of parameters with respect to traditional neural networks and mitigates data scarcity by enforcing known algebraic and geometric constraints that reflect symmetries in the agents’ actions. In this paper, we introduce a novel GENEO-based approach for SINR map reconstruction in urban wireless communication networks using extremely sparse sampling. We demonstrate that this mathematical framework achieves competitive performance compared to established methods. Our evaluation, conducted using both statistical and TDA metrics, highlights the advantages of our approach in accurately reconstructing spatial signals under severe data limitations on the number of samples.

Article 953

Title@2025-07-25 (5): Short-Form Video Recommendations with Multimodal Embeddings: Addressing Cold-Start and Bias Challenges

Title: Short-Form Video Recommendations with Multimodal Embeddings: Addressing Cold-Start and Bias Challenges

Kurzform-Video-Empfehlungen mit multimodalen Einbettungen: Bewältigung von Kaltstart- und Bias-Herausforderungen

短形式视频建议,带有多模式嵌有:应对冷发和偏见挑战 2507.19346v1

Authors (5): Andrii Dzhoha, Katya Mirylenka, Egor Malykh, Marco-Andrea Buchmann, Francesca Catino

In recent years, social media users have spent significant amounts of time on short-form video platforms. As a result, established platforms in other domains, such as e-commerce, have begun introducing short-form video content to engage users and increase their time spent on the platform. The success of these experiences is due not only to the content itself but also to a unique UI innovation: instead of offering users a list of choices to click, platforms actively recommend content for users to watch one at a time. This creates new challenges for recommender systems, especially when launching a new video experience. Beyond the limited interaction data, immersive feed experiences introduce stronger position bias due to the UI and duration bias when optimizing for watch-time, as models tend to favor shorter videos. These issues, together with the feedback loop inherent in recommender systems, make it difficult to build effective solutions. In this paper, we highlight the challenges faced when introducing a new short-form video experience and present our experience showing that, even with sufficient video interaction data, it can be more beneficial to leverage a video retrieval system using a fine-tuned multimodal vision-language model to overcome these challenges. This approach demonstrated greater effectiveness compared to conventional supervised learning methods in online experiments conducted on our e-commerce platform.

Article 954

Title@2025-07-25 (5): Lower Bounds on the Size of Markov Equivalence Classes

Title: Lower Bounds on the Size of Markov Equivalence Classes

Untere Grenzen auf der Größe der Markov-Äquivalenzklassen

马克夫等等效类大小的下下界界圈 2506.20933v2

Authors (3): Erik Jahn, Frederick Eberhardt, Leonard J. Schulman

Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned about the underlying causal graph from purely observational data. Under the assumptions of acyclicity, causal sufficiency, and a uniform model prior, Markov equivalence classes are known to be small on average. In this paper, we show that this is no longer the case when any of these assumptions is relaxed. Specifically, we prove exponentially large lower bounds for the expected size of Markov equivalence classes in three settings: sparse random directed acyclic graphs, uniformly random acyclic directed mixed graphs, and uniformly random directed cyclic graphs.

Article 955

Title@2025-07-25 (5): Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs

Title: Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs

Verdoppelung Ihrer Daten in Minuten: Ultraschnelle Tabellendatenerstellung über LLM-induzierte Abhängigkeitsgraphen

将数据翻倍:通过LLM-引导依赖图生成超快制表数据 2507.19334v1

Authors (4): Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci

Tabular data is critical across diverse domains, yet high-quality datasets remain scarce due to privacy concerns and the cost of collection. Contemporary approaches adopt large language models (LLMs) for tabular augmentation, but exhibit two major limitations: (1) dense dependency modeling among tabular features that can introduce bias, and (2) high computational overhead in sampling. To address these issues, we propose SPADA for SPArse Dependency-driven Augmentation, a lightweight generative framework that explicitly captures sparse dependencies via an LLM-induced graph. We treat each feature as a node and synthesize values by traversing the graph, conditioning each feature solely on its parent nodes. We explore two synthesis strategies: a non-parametric method using Gaussian kernel density estimation, and a conditional normalizing flow model that learns invertible mappings for conditional density estimation. Experiments on four datasets show that SPADA reduces constraint violations by 4% compared to diffusion-based methods and accelerates generation by nearly 9,500 times over LLM-based baselines.
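
The graph-traversal idea above (visit features so that parents come first, then sample each feature conditioned only on its parents) can be sketched with a toy Gaussian-kernel sampler. This is an illustrative approximation rather than SPADA itself: the dependency graph is supplied by hand instead of being induced by an LLM, a single `bandwidth` is shared by the kernel and the output noise, and the helper names and toy rows are hypothetical.

```python
import math
import random
from graphlib import TopologicalSorter
from typing import Dict, List, Sequence

Row = Dict[str, float]


def sample_feature(
    rows: Sequence[Row],
    feature: str,
    parents: Sequence[str],
    partial: Row,
    bandwidth: float,
    rng: random.Random,
) -> float:
    """Draw one value of `feature` given already-sampled parent values."""
    # Weight each training row by how close its parent values are to `partial`.
    weights = []
    for r in rows:
        d2 = sum((r[p] - partial[p]) ** 2 for p in parents)
        weights.append(math.exp(-d2 / (2 * bandwidth ** 2)))
    if sum(weights) == 0.0:  # all kernel weights underflowed: fall back to uniform
        weights = [1.0] * len(rows)
    center = rng.choices(rows, weights=weights, k=1)[0][feature]
    return rng.gauss(center, bandwidth)  # smooth around the chosen row


def synthesize(
    rows: Sequence[Row],
    parents_of: Dict[str, List[str]],
    bandwidth: float = 0.1,
    seed: int = 0,
) -> Row:
    """Generate one synthetic row by traversing the dependency graph."""
    rng = random.Random(seed)
    # TopologicalSorter takes {node: predecessors}; parents are emitted first.
    order = TopologicalSorter(parents_of).static_order()
    out: Row = {}
    for feature in order:
        parents = parents_of.get(feature, [])
        out[feature] = sample_feature(rows, feature, parents, out, bandwidth, rng)
    return out


# Toy table in which income tracks age (illustrative data, not from the paper).
rows = [
    {"age": 20.0, "income": 20.0},
    {"age": 40.0, "income": 40.0},
    {"age": 60.0, "income": 60.0},
]
parents_of = {"age": [], "income": ["age"]}
sample = synthesize(rows, parents_of)
```

Because `income` is conditioned only on `age`, the sampled income stays close to the sampled age in this toy table. Features with no parents are drawn from their smoothed marginal.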

Article 956

Title@2025-07-25 (5): SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence

Title: SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence

SIDE: Sparse Information Entfremdung für erklärbare künstliche Intelligenz

SID: 用于可解释人工智能的粗略信息解析 2507.19321v1

Authors (4): Viktar Dubovik, Łukasz Struski, Jacek Tabor, Dawid Rymarczyk

Understanding the decisions made by deep neural networks is essential in high-stakes domains such as medical imaging and autonomous driving. Yet, these models often lack transparency, particularly in computer vision. Prototypical-parts-based neural networks have emerged as a promising solution by offering concept-level explanations. However, most are limited to fine-grained classification tasks, with few exceptions such as InfoDisent. InfoDisent extends prototypical models to large-scale datasets like ImageNet, but produces complex explanations. We introduce Sparse Information Disentanglement for Explainability (SIDE), a novel method that improves the interpretability of prototypical parts through a dedicated training and pruning scheme that enforces sparsity. Combined with sigmoid activations in place of softmax, this approach allows SIDE to associate each class with only a small set of relevant prototypes. Extensive experiments show that SIDE matches the accuracy of existing methods while reducing explanation size by over $90\%$, substantially enhancing the understandability of prototype-based explanations.

Article 957

Title@2025-07-25 (5): Human-AI Synergy in Adaptive Active Learning for Continuous Lithium Carbonate Crystallization Optimization

Title: Human-AI Synergy in Adaptive Active Learning for Continuous Lithium Carbonate Crystallization Optimization

Human-AI-Synergie im adaptiven aktiven Lernen für kontinuierliche Lithium-Karbonat-Kristallisierungs-Optimierung

人类-AI 在不断碳化液晶化优化的适应性积极学习中的人类-AI协同效应 2507.19316v1

Authors (6): Shayan S. Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Cara Cronin, Jason E. Hein, Jason Hattrick-Simpers

As demand for high-purity lithium surges with the growth of the electric vehicle (EV) industry, cost-effective extraction from lower-grade North American sources like the Smackover Formation is critical. These resources, unlike high-purity South American brines, require innovative purification techniques to be economically viable. Continuous crystallization is a promising method for producing battery-grade lithium carbonate, but its optimization is challenged by a complex parameter space and limited data. This study introduces a Human-in-the-Loop (HITL) assisted active learning framework to optimize the continuous crystallization of lithium carbonate. By integrating human expertise with data-driven insights, our approach accelerates the optimization of lithium extraction from challenging sources. Our results demonstrate the framework’s ability to rapidly adapt to new data, significantly improving the process’s tolerance to critical impurities like magnesium from the industry standard of a few hundred ppm to as high as 6000 ppm. This breakthrough makes the exploitation of low-grade, impurity-rich lithium resources feasible, potentially reducing the need for extensive pre-refinement processes. By leveraging artificial intelligence, we have refined operational parameters and demonstrated that lower-grade materials can be used without sacrificing product quality. This advancement is a significant step towards economically harnessing North America’s vast lithium reserves, such as those in the Smackover Formation, and enhancing the sustainability of the global lithium supply chain.

Article 958

Title@2025-07-25 (5): Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer

Title: Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer

Erzeugen klinisch realistischer EHR-Daten über einen Hierarchie- und Semantik-geführten Transformer

通过等级和语义学指导变换器生成临床现实的 EHR 数据 2502.20719v2

Authors (2): Guanglin Zhou, Sebastiano Barbieri

Generating realistic synthetic electronic health records (EHRs) holds tremendous promise for accelerating healthcare research, facilitating AI model development and enhancing patient privacy. However, existing generative methods typically treat EHRs as flat sequences of discrete medical codes. This approach overlooks two critical aspects: the inherent hierarchical organization of clinical coding systems and the rich semantic context provided by code descriptions. Consequently, synthetic patient sequences often lack high clinical fidelity and have limited utility in downstream clinical tasks. In this paper, we propose the Hierarchy- and Semantics-Guided Transformer (HiSGT), a novel framework that leverages both hierarchical and semantic information for the generative process. HiSGT constructs a hierarchical graph to encode parent-child and sibling relationships among clinical codes and employs a graph neural network to derive hierarchy-aware embeddings. These are then fused with semantic embeddings extracted from a pre-trained clinical language model (e.g., ClinicalBERT), enabling the Transformer-based generator to more accurately model the nuanced clinical patterns inherent in real EHRs. Extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that HiSGT significantly improves the statistical alignment of synthetic data with real patient records, as well as supports robust downstream applications such as chronic disease classification. By addressing the limitations of conventional raw code-based generative models, HiSGT represents a significant step toward clinically high-fidelity synthetic data generation and a general paradigm suitable for interpretable medical code representation, offering valuable applications in data augmentation and privacy-preserving healthcare analytics.

Article 959

Title@2025-07-25 (5): Accelerometry-based Energy Expenditure Estimation During Activities of Daily Living: A Comparison Among Different Accelerometer Compositions

Title: Accelerometry-based Energy Expenditure Estimation During Activities of Daily Living: A Comparison Among Different Accelerometer Compositions

Accelerometry-based Energy Expenses Abschätzung während der Aktivitäten des täglichen Lebens: Ein Vergleich unter verschiedenen Accelerometer Zusammensetzungen

日常生活活动期间的能源支出估计:不同加速计构成的比较 2502.10112v2

Authors (5): Shuhao Que, Remco Poelarends, Peter Veltink, Miriam Vollenbroek-Hutten, Ying Wang

Physical activity energy expenditure (PAEE) can be measured from breath-by-breath respiratory data, which can serve as a reference. Alternatively, PAEE can be predicted from the body movements, which can be measured and estimated with accelerometers. The body center of mass (COM) acceleration reflects the movements of the whole body and thus serves as a good predictor for PAEE. However, the wrist has also become a popular location due to recent advancements in wrist-worn devices. Therefore, in this work, using the respiratory data measured by COSMED K5 as the reference, we evaluated and compared the performances of COM-based settings and wrist-based settings. The COM-based settings include two different accelerometer compositions, using only the pelvis accelerometer (pelvis-acc) and the pelvis accelerometer with two accelerometers from two thighs (3-acc). The wrist-based settings include using only the left wrist accelerometer (l-wrist-acc) and only the right wrist accelerometer (r-wrist-acc). We implemented two existing PAEE estimation methods on our collected dataset, where 9 participants performed activities of daily living while wearing 5 accelerometers (i.e., pelvis, two thighs, and two wrists). These two methods include a linear regression (LR) model and a CNN-LSTM model. Both models yielded the best results with the COM-based 3-acc setting (LR: $R^2$ = 0.41, CNN-LSTM: $R^2$ = 0.53). No significant difference was found between the 3-acc and pelvis-acc settings (p-value = 0.278). For both models, neither the l-wrist-acc nor the r-wrist-acc settings demonstrated predictive power on PAEE with $R^2$ values close to 0, significantly outperformed by the two COM-based settings (p-values $<$ 0.05). No significant difference was found between the two wrists (p-value = 0.329).

Article 960

Title@2025-07-25 (5): Negative news posts are less prevalent and generate lower user engagement than non-negative news posts across six countries

Title: Negative news posts are less prevalent and generate lower user engagement than non-negative news posts across six countries

Negative Nachrichtenposts sind weniger verbreitet und erzeugen ein geringeres Nutzerengagement als nicht negative Nachrichtenposts in sechs Ländern

与六个国家的非负面新闻站相比,负新闻站不太普遍,用户参与率低于非负面新闻站 2507.19300v1

Authors (3): Szymon Talaga, Dominik Batorski, Magdalena Wojcieszak

Although news negativity is often studied, comparative evidence on the prevalence of and engagement with negative political and non-political news posts on social media is missing. We use 6,081,134 Facebook posts published between January 1, 2020, and April 1, 2024, by 97 media organizations in six countries (U.S., UK, Ireland, Poland, France, Spain) and develop two multilingual classifiers for labeling posts as (non-)political and (non-)negative. We show that: (1) negative news posts constitute a relatively small fraction (12.6%); (2) political news posts are neither more nor less negative than non-political news posts; (3) U.S. political news posts are less negative relative to the other countries on average (40% lower odds); (4) negative news posts get 15% fewer likes and 13% fewer comments than non-negative news posts. Lastly, (5) we provide estimates of the proportion of the total volume of user engagement with negative news posts and show that only between 10.2% and 13.1% of engagement is linked to negative posts by the analyzed news organizations.

Article 961

Title@2025-07-25 (5): Controlling Topological Defects in Polar Fluids via Reinforcement Learning

Title: Controlling Topological Defects in Polar Fluids via Reinforcement Learning

Kontrolle topologischer Defekte in Polarflüssigkeiten durch Verstärkungslernen

通过强化学习控制极地流体的地形病变 2507.19298v1

Authors (2): Abhinav Singh, Petros Koumoutsakos

Topological defects in active polar fluids exhibit complex dynamics driven by internally generated stresses, reflecting the deep interplay between topology, flow, and non-equilibrium hydrodynamics. Feedback control offers a powerful means to guide such systems, enabling transitions between dynamic states. We investigated closed-loop steering of integer-charged defects in a confined active fluid by modulating the spatial profile of activity. Using a continuum hydrodynamic model, we show that localized control of active stress induces flow fields that can reposition and direct defects along prescribed trajectories by exploiting non-linear couplings in the system. A reinforcement learning framework is used to discover effective control strategies that produce robust defect transport across both trained and novel trajectories. The results highlight how AI agents can learn the underlying dynamics and spatially structure activity to manipulate topological excitations, offering insights into the controllability of active matter and the design of adaptive, self-organized materials.

Article 962

Title@2025-07-25 (5): Interpretable Cross-Sphere Multiscale Deep Learning Predicts ENSO Skilfully Beyond 2 Years

Title: Interpretable Cross-Sphere Multiscale Deep Learning Predicts ENSO Skilfully Beyond 2 Years

Interpretable Cross-Sphere Multiscale Deep Learning prognostiziert ENSO skilfully beyond 2 Years

跨跨阶段多尺度深层学习预测 2503.21211v2

Authors (5): Rixu Hao, Yuxin Zhao, Shaoqing Zhang, Guihua Wang, Xiong Deng

El Niño-Southern Oscillation (ENSO) exerts global climate and societal impacts, but real-time prediction with lead times beyond one year remains challenging. Dynamical models suffer from large biases and uncertainties, while deep learning struggles with interpretability and multi-scale dynamics. Here, we introduce PTSTnet, an interpretable model that unifies dynamical processes and cross-scale spatiotemporal learning in an innovative neural-network framework with physics-encoding learning. PTSTnet produces interpretable predictions significantly outperforming state-of-the-art benchmarks with lead times beyond 24 months, providing physical insights into error propagation in ocean-atmosphere interactions. PTSTnet learns feature representations with physical consistency from sparse data to tackle inherent multi-scale and multi-physics challenges underlying ocean-atmosphere processes, thereby inherently enhancing long-term prediction skill. Our successful realizations mark substantial steps forward in interpretable insights into innovative neural ocean modelling.

Article 963

Title@2025-07-25 (5): Query Efficient Structured Matrix Learning

Title: Query Efficient Structured Matrix Learning

Effizientes strukturiertes Matrix-Lernen abfragen

查询高效结构化矩阵学习 2507.19290v1

Authors (8): Noah Amsel, Pratyush Avi, Tyler Chen, Feyza Duman Keles, Chinmay Hegde, Cameron Musco, Christopher Musco, David Persson

We study the problem of learning a structured approximation (low-rank, sparse, banded, etc.) to an unknown matrix $A$ given access to matrix-vector product (matvec) queries of the form $x \rightarrow Ax$ and $x \rightarrow A^Tx$. This problem is of central importance to algorithms across scientific computing and machine learning, with applications to fast multiplication and inversion for structured matrices, building preconditioners for first-order optimization, and as a model for differential operator learning. Prior work focuses on obtaining query complexity upper and lower bounds for learning specific structured matrix families that commonly arise in applications. We initiate the study of the problem in greater generality, aiming to understand the query complexity of learning approximations from general matrix families. Our main result focuses on finding a near-optimal approximation to $A$ from any finite-sized family of matrices, $\mathcal{F}$. Standard results from matrix sketching show that $O(\log|\mathcal{F}|)$ matvec queries suffice in this setting. This bound can also be achieved, and is optimal, for vector-matrix-vector queries of the form $x,y\rightarrow x^TAy$, which have been widely studied in work on rank-$1$ matrix sensing. Surprisingly, we show that, in the matvec model, it is possible to obtain a nearly quadratic improvement in complexity, to $\tilde{O}(\sqrt{\log|\mathcal{F}|})$. Further, we prove that this bound is tight up to log-log factors. Via covering number arguments, our result extends to well-studied infinite families. As an example, we establish that a near-optimal approximation from any \emph{linear matrix family} of dimension $q$ can be learned with $\tilde{O}(\sqrt{q})$ matvec queries, improving on an $O(q)$ bound achievable via sketching techniques and vector-matrix-vector queries.
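
To make the matvec query model concrete, here is a minimal sketch of the sketching baseline the abstract mentions (not the paper's improved algorithm): probe the unknown matrix with a few random Gaussian vectors and return the family member whose responses match best. The function names, the Gaussian probes, and the squared-error selection rule are illustrative assumptions.

```python
import random

def matvec(M, x):
    """Return M @ x for a matrix stored as a list of rows."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def select_from_family(query_A, family, n, num_queries, seed=0):
    """Pick the index of the family member whose sketch is closest to that of A.

    query_A is a black box x -> A x; exactly num_queries calls are made to it.
    """
    rng = random.Random(seed)
    probes = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_queries)]
    responses = [query_A(x) for x in probes]  # the only access to A

    def sketch_error(B):
        # Sum of squared residuals ||A x - B x||^2 over all probe vectors.
        err = 0.0
        for x, ax in zip(probes, responses):
            bx = matvec(B, x)
            err += sum((u - v) ** 2 for u, v in zip(ax, bx))
        return err

    return min(range(len(family)), key=lambda i: sketch_error(family[i]))

# Hypothetical family: zero matrix, identity, and an upper shift matrix.
n = 4
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
shift = [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n)]
zeros = [[0.0] * n for _ in range(n)]
family = [zeros, identity, shift]

# The unknown A is a slightly perturbed identity, so the identity is the best fit.
A = [[identity[i][j] + 0.01 * (i - j) for j in range(n)] for i in range(n)]
best = select_from_family(lambda x: matvec(A, x), family, n, num_queries=3)
```

Here three queries suffice because the family members are far apart relative to the perturbation; the abstract's bounds concern how the required number of queries scales with the family size in general.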

Article 964

Title@2025-07-25 (5): Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments

Title: Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments

Knowledge Grafting: Ein Mechanismus zur Optimierung von KI-Modellen in ressourcenbeschränkten Umgebungen

知识获取:优化在资源受限制的环境中采用AI模型模型的机制 2507.19261v1

Authors (5): Osama Almurshed, Ashish Kaushal, Asmail Muftah, Nitin Auluck, Omer Rana

The increasing adoption of Artificial Intelligence (AI) has led to larger, more complex models with numerous parameters that require substantial computing power – resources often unavailable in many real-world application scenarios. Our paper addresses this challenge by introducing knowledge grafting, a novel mechanism that optimizes AI models for resource-constrained environments by transferring selected features (the scion) from a large donor model to a smaller rootstock model. The approach achieves an 88.54% reduction in model size (from 64.39 MB to 7.38 MB), while improving generalization capability of the model. Our new rootstock model achieves 89.97% validation accuracy (vs. donor’s 87.47%), maintains lower validation loss (0.2976 vs. 0.5068), and performs exceptionally well on unseen test data with 90.45% accuracy. It addresses the typical size vs. performance trade-off, and enables deployment of AI frameworks on resource-constrained devices with enhanced performance. We have tested our approach on an agricultural weed detection scenario; however, it can be extended across various edge computing scenarios, potentially accelerating AI adoption in areas with limited hardware/software support, much as horticultural grafting enables productive cultivation in challenging agricultural environments.

Article 965

Title@2025-07-25 (5): Reactivation: Empirical NTK Dynamics Under Task Shifts

Title: Reactivation: Empirical NTK Dynamics Under Task Shifts

Reaktivierung: Empirische NTK-Dynamik unter Aufgabenverschiebungen

重新激活: 任务变换下的NTK实证动态 2507.16039v2

Authors (5): Yuzhi Liu, Zixuan Chen, Zirui Zhang, Yufei Liu, Giulia Lanzillotta

The Neural Tangent Kernel (NTK) offers a powerful tool to study the functional dynamics of neural networks. In the so-called lazy, or kernel regime, the NTK remains static during training and the network function is linear in the static neural tangents feature space. The evolution of the NTK during training is necessary for feature learning, a key driver of deep learning success. The study of the NTK dynamics has led to several critical discoveries in recent years, in generalization and scaling behaviours. However, this body of work has been limited to the single task setting, where the data distribution is assumed constant over time. In this work, we present a comprehensive empirical analysis of NTK dynamics in continual learning, where the data distribution shifts over time. Our findings highlight continual learning as a rich and underutilized testbed for probing the dynamics of neural training. At the same time, they challenge the validity of static-kernel approximations in theoretical treatments of continual learning, even at large scale.
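
As a rough illustration of what "empirical NTK dynamics" means, the sketch below computes the kernel $K(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x') \rangle$ for a tiny one-hidden-layer tanh network with scalar input, and measures how much the kernel moves when the parameters change. The network, the parameter layout, and the relative-change metric are assumptions made for illustration, not the paper's experimental setup.

```python
import math

def param_gradient(x, params):
    """Gradient of f(x) = sum_k a_k * tanh(w_k * x + b_k) w.r.t. all parameters.

    params is a list of (a_k, w_k, b_k) triples, one per hidden unit.
    """
    grad = []
    for a, w, b in params:
        h = math.tanh(w * x + b)
        dh = 1.0 - h * h  # derivative of tanh at the pre-activation
        grad.extend([h, a * dh * x, a * dh])  # d/da_k, d/dw_k, d/db_k
    return grad

def empirical_ntk(inputs, params):
    """Kernel matrix K[i][j] = <grad f(x_i), grad f(x_j)> at the current parameters."""
    grads = [param_gradient(x, params) for x in inputs]
    return [[sum(u * v for u, v in zip(gi, gj)) for gj in grads] for gi in grads]

def relative_change(k_old, k_new):
    """Relative Frobenius distance between two kernel matrices."""
    num = math.sqrt(sum((a - b) ** 2
                        for r_old, r_new in zip(k_old, k_new)
                        for a, b in zip(r_old, r_new)))
    den = math.sqrt(sum(a * a for row in k_old for a in row))
    return num / den

inputs = [-1.0, 0.0, 1.0]
params_before = [(0.5, 1.0, 0.0), (-0.3, 2.0, 0.1)]
params_after = [(0.6, 1.2, 0.0), (-0.3, 1.5, 0.2)]  # stand-in for a training update
drift = relative_change(empirical_ntk(inputs, params_before),
                        empirical_ntk(inputs, params_after))
```

In the lazy regime described above, `drift` would stay near zero throughout training; tracking it across task switches is one simple way to observe the kernel movement the abstract studies.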

Article 966

Title@2025-07-25 (5): Delphos: A reinforcement learning framework for assisting discrete choice model specification

Title: Delphos: A reinforcement learning framework for assisting discrete choice model specification

Delphos: Ein Verstärkungs-Lernrahmen zur Unterstützung diskreter Auswahlmodellspezifikation

Delphos:一个强化学习框架,协助制定独立选择模型规格 2506.06410v2

Authors (3): Gabriel Nova, Stephane Hess, Sander van Cranenburgh

We introduce Delphos, a deep reinforcement learning framework for assisting the discrete choice model specification process. Unlike traditional approaches that treat model specification as a static optimisation problem, Delphos represents a paradigm shift: it frames this specification challenge as a sequential decision-making problem, formalised as a Markov Decision Process. In this setting, an agent learns to specify well-performing model candidates by choosing a sequence of modelling actions - such as selecting variables, accommodating both generic and alternative-specific taste parameters, applying non-linear transformations, and including interactions with covariates - and interacting with a modelling environment that estimates each candidate and returns a reward signal. Specifically, Delphos uses a Deep Q-Network that receives delayed rewards based on modelling outcomes (e.g., log-likelihood) and behavioural expectations (e.g., parameter signs), and distributes rewards across the sequence of actions to learn which modelling decisions lead to well-performing candidates. We evaluate Delphos on both simulated and empirical datasets, varying the size of the modelling space and the reward function. To assess the agent’s performance in navigating the model space, we analyse the learning curve, the distribution of Q-values, occupancy metrics, and Pareto fronts. Our results show that the agent learns to adaptively explore strategies to identify well-performing models across search spaces, even without prior domain knowledge. It efficiently explores large modelling spaces, concentrates its search in high-reward regions, and suggests candidates that define Pareto frontiers balancing model fit and behavioural plausibility. These findings highlight the potential of this novel adaptive, learning-based framework to assist in the model specification process.

Article 967

Title@2025-07-25 (5): A Markov Categorical Framework for Language Modeling

Title: A Markov Categorical Framework for Language Modeling

Ein kategorisches Markov-Rahmenwerk für Sprachmodellierung

用于语言建模的 Markov 语言建模分类框架 2507.19247v1

Authors (1): Yifan Zhang

Auto-regressive language models factorize sequence probabilities and are trained by minimizing the negative log-likelihood (NLL) objective. While empirically powerful, a deep theoretical understanding of why this simple objective yields such versatile representations remains elusive. This work introduces a unifying analytical framework using Markov Categories (MCs) to deconstruct the AR generation process and the NLL objective. We model the single-step generation map as a composition of Markov kernels in the category Stoch. This compositional view, when enriched with statistical divergences, allows us to dissect information flow and learned geometry. Our framework makes three main contributions. First, we provide a formal, information-theoretic rationale for the success of modern speculative decoding methods like EAGLE, quantifying the information surplus in hidden states that these methods exploit. Second, we formalize how NLL minimization forces the model to learn not just the next token, but the data’s intrinsic conditional stochasticity, a process we analyze using categorical entropy. Third, and most centrally, we prove that NLL training acts as an implicit form of spectral contrastive learning. By analyzing the information geometry of the model’s prediction head, we show that NLL implicitly forces the learned representation space to align with the eigenspectrum of a predictive similarity operator, thereby learning a geometrically structured space without explicit contrastive pairs. This compositional and information-geometric perspective reveals the deep structural principles underlying the effectiveness of modern LMs. Project Page: https://github.com/asiresearch/lm-theory

Article 968

Title@2025-07-25 (5): AGORA: Incentivizing Group Emergence Capability in LLMs via Group Distillation

Title: AGORA: Incentivizing Group Emergence Capability in LLMs via Group Distillation

AGORA: Anreize für Gruppenerneuerungsfähigkeit in LLMs durch Gruppendestillation

AGORA:通过集体蒸馏在LLMs中激励群体新兴能力 2507.21166v1

Authors (3): Ren Zhuang, Ben Wang, Shuifa Sun

Progress in complex reasoning is constrained by the static nature of the current training datasets. We propose structured interaction as a new scaling axis, moving beyond the prevailing paradigm of increasing model parameters. Our self-evolving framework, AGORA, enables a collaborative ensemble to achieve reasoning performance exceeding state-of-the-art monolithic systems by up to 4.45 percentage points on challenging mathematical benchmarks. This gain stems from group emergent ability, the synthesis of collective capabilities unattainable by isolated models, and validates interaction as a scalable driver of intelligence. Our results position the engineering of collaborative ecosystems as a vital frontier for capability emergence.

Article 969

Title@2025-07-25 (5): OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

Title: OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

OCSVM-geführtes Repräsentationslernen für unüberwachte Anomalieerkennung

OCSVM - 不受监督的异常检测指导代表性学习 2507.21164v1

Authors (2): Nicolas Pinon, Carole Lartizien

Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data, a necessity in many machine learning applications where anomalous samples are rare or not available. Most state-of-the-art methods fall into two categories: reconstruction-based approaches, which often reconstruct anomalies too well, and decoupled representation learning with density estimators, which can suffer from suboptimal feature spaces. While some recent methods attempt to couple feature learning and anomaly detection, they often rely on surrogate objectives, restrict kernel choices, or introduce approximations that limit their expressiveness and robustness. To address this challenge, we propose a novel method that tightly couples representation learning with an analytically solvable one-class SVM (OCSVM), through a custom loss formulation that directly aligns latent features with the OCSVM decision boundary. The model is evaluated on two tasks: a new benchmark based on MNIST-C, and a challenging brain MRI subtle lesion detection task. Unlike most methods, which focus on large, hyperintense lesions at the image level, our approach succeeds in targeting small, non-hyperintense lesions, and we evaluate voxel-wise metrics, addressing a more clinically relevant scenario. Both experiments evaluate a form of robustness to domain shifts, including corruption types in MNIST-C and scanner/age variations in MRI. Results demonstrate the performance and robustness of our proposed model, highlighting its potential for general UAD and real-world medical imaging applications. The source code is available at https://github.com/Nicolas-Pinon/uad_ocsvm_guided_repr_learning

Article 970

Title@2025-07-25 (5): Component-Based Machine Learning for Indoor Flow and Temperature Fields Prediction Latent Feature Aggregation and Flow Interaction

Title: Component-Based Machine Learning for Indoor Flow and Temperature Fields Prediction Latent Feature Aggregation and Flow Interaction

Komponentenbasiertes maschinelles Lernen für Indoor Flow und Temperaturfelder Vorhersage Latent Feature Aggregation und Flow Interaktion

基于组成部分的室内流动和温度场室内流动和温度场机器学习 2507.19233v1

Authors (3): Shaofan Wang, Nils Thuerey, Philipp Geyer

Accurate and efficient prediction of indoor airflow and temperature distributions is essential for building energy optimization and occupant comfort control. However, traditional CFD simulations are computationally intensive, limiting their integration into real-time or design-iterative workflows. This study proposes a component-based machine learning (CBML) surrogate modeling approach to replace conventional CFD simulation for fast prediction of indoor velocity and temperature fields. The model consists of three neural networks: a convolutional autoencoder with residual connections (CAER) to extract and compress flow features, a multilayer perceptron (MLP) to map inlet velocities to latent representations, and a convolutional neural network (CNN) as an aggregator to combine single-inlet features into dual-inlet scenarios. A two-dimensional room with varying left and right air inlet velocities is used as a benchmark case, with CFD simulations providing training and testing data. Results show that the CBML model quickly and accurately predicts two-component aggregated velocity and temperature fields across both training and testing datasets.

Article 971

Title@2025-07-25 (5): Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning

Title: Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning

Verlängerung des Werkzeuglebens: Erlernen eines kompetenzvollen Einsatzes von Allzweck-Werkzeugen durch lebenslanges Stärkungslernen

延长工具寿命:通过终身指导强化学习学习如何熟练使用普通用途工具 2507.17275v2

Authors (4): Po-Yen Wu, Cheng-Yu Kuo, Yuki Kadokawa, Takamitsu Matsubara

In inaccessible environments with uncertain task demands, robots often rely on general-purpose tools that lack predefined usage strategies. These tools are not tailored for particular operations, making their longevity highly sensitive to how they are used. This creates a fundamental challenge: how can a robot learn a tool-use policy that both completes the task and prolongs the tool’s lifespan? In this work, we address this challenge by introducing a reinforcement learning (RL) framework that incorporates tool lifespan as a factor during policy optimization. Our framework leverages Finite Element Analysis (FEA) and Miner’s Rule to estimate Remaining Useful Life (RUL) based on accumulated stress, and integrates the RUL into the RL reward to guide policy learning toward lifespan-guided behavior. To handle the fact that RUL can only be estimated after task execution, we introduce an Adaptive Reward Normalization (ARN) mechanism that dynamically adjusts reward scaling based on estimated RULs, ensuring stable learning signals. We validate our method across simulated and real-world tool use tasks, including Object-Moving and Door-Opening with multiple general-purpose tools. The learned policies consistently prolong tool lifespan (up to 8.01x in simulation) and transfer effectively to real-world settings, demonstrating the practical value of learning lifespan-guided tool use strategies.
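
The damage-accumulation step can be illustrated with a minimal sketch of Miner's Rule, which sums the fractions of fatigue life consumed at each stress level, $D = \sum_i n_i / N_i$, and predicts failure at $D \ge 1$. The S-N curve and its constants below are hypothetical placeholders; in the paper the stresses come from FEA, and the reward shaping and ARN mechanism are not reproduced here.

```python
def cycles_to_failure(stress_amplitude, coeff=1e12, exponent=3.0):
    """Hypothetical S-N curve N(S) = coeff * S**(-exponent); constants are illustrative."""
    return coeff * stress_amplitude ** (-exponent)

def miner_damage(load_history):
    """Accumulated damage D = sum(n_i / N_i) over (stress amplitude, cycle count) pairs."""
    return sum(n / cycles_to_failure(s) for s, n in load_history)

def remaining_useful_life(load_history):
    """Remaining fraction of life, 1 - D, clipped at zero (failure is predicted at D >= 1)."""
    return max(0.0, 1.0 - miner_damage(load_history))

# Two hypothetical tool-use policies performing the same number of cycles:
gentle_policy = [(100.0, 1500)]                # low stress on every cycle
rough_policy = [(100.0, 1000), (200.0, 500)]   # a third of the cycles at double stress

rul_gentle = remaining_useful_life(gentle_policy)
rul_rough = remaining_useful_life(rough_policy)
```

Under these assumed constants the rough policy consumes more life than the gentle one for the same cycle count, which is the kind of signal a lifespan-aware reward term could use to separate otherwise equally successful behaviors.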

Article 972

Title@2025-07-25 (5): Pilot Contamination-Aware Graph Attention Network for Power Control in CFmMIMO

Title: Pilot Contamination-Aware Graph Attention Network for Power Control in CFmMIMO

Pilot Contamination-Aware Graph Attention Network for Power Control in CFmMIMO

CFMMIMO 控制电源网络 2506.00967v2

Authors (5): Tingting Zhang, Sergiy A. Vorobyov, David J. Love, Taejoon Kim, Kai Dong

Optimization-based power control algorithms are predominantly iterative with high computational complexity, making them impractical for real-time applications in cell-free massive multiple-input multiple-output (CFmMIMO) systems. Learning-based methods have emerged as a promising alternative, and among them, graph neural networks (GNNs) have demonstrated their excellent performance in solving power control problems. However, all existing GNN-based approaches assume ideal orthogonality among pilot sequences for user equipments (UEs), which is unrealistic given that the number of UEs exceeds the available orthogonal pilot sequences in CFmMIMO schemes. Moreover, most learning-based methods assume a fixed number of UEs, whereas the number of active UEs varies over time in practice. Additionally, supervised training necessitates costly computational resources for computing the target power control solutions for a large volume of training samples. To address these issues, we propose a graph attention network for downlink power control in CFmMIMO systems that operates in a self-supervised manner while effectively handling pilot contamination and adapting to a dynamic number of UEs. Experimental results show its effectiveness, even in comparison to the optimal accelerated projected gradient method as a baseline.

Article 973

Title@2025-07-25 (5): SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in Oncology

Title: SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in Oncology

SigBERT: Kombination narrativer medizinischer Berichte und rough Path Signature Theory zur Einschätzung des Überlebensrisikos in der Onkologie

SigBERT: 将叙述性医疗报告与肿瘤学生存风险估算的粗路签名理论相结合 2507.22941v1

Authors (5): Paul Minchella, Loïc Verlingue, Stéphane Chrétien, Rémi Vaucher, Guillaume Metzler

Electronic medical reports (EHR) contain a vast amount of information that can be leveraged for machine learning applications in healthcare. However, existing survival analysis methods often struggle to effectively handle the complexity of textual data, particularly in its sequential form. Here, we propose SigBERT, an innovative temporal survival analysis framework designed to efficiently process a large number of clinical reports per patient. SigBERT processes timestamped medical reports by extracting and averaging word embeddings into sentence embeddings. To capture temporal dynamics from the time series of sentence embedding coordinates, we apply signature extraction from rough path theory to derive geometric features for each patient, which significantly enhance survival model performance by capturing complex temporal dynamics. These features are then integrated into a LASSO-penalized Cox model to estimate patient-specific risk scores. The model was trained and evaluated on a real-world oncology dataset from the Léon Bérard Center corpus, with a C-index score of 0.75 (sd 0.014) on the independent test cohort. SigBERT integrates sequential medical data to enhance risk estimation, advancing narrative-based survival analysis.
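
For readers unfamiliar with path signatures, the sketch below computes the first two signature levels of a piecewise-linear path, built up segment by segment with Chen's identity. It is a generic illustration: the truncation depth, any preprocessing of the embeddings, and the library SigBERT actually uses are not stated in the abstract and are not assumed here.

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path given as a list of d-dimensional points.

    Returns (level1, level2) where level1[i] is the total increment in coordinate i
    and level2[i][j] is the iterated integral of dx_i followed by dx_j.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # new S2 = S2 + S1 (x) delta + 0.5 * delta (x) delta
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2

# A toy 2-D "embedding trajectory": move right, then up.
level1, level2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
# level1 == [1.0, 1.0]; level2 == [[0.5, 1.0], [0.0, 0.5]]
```

The asymmetry between `level2[0][1]` and `level2[1][0]` records the order in which the coordinates moved, which is the kind of temporal information a simple average of the embeddings would discard.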

Article 974

Title@2025-07-25 (5): Dependency-aware synthetic tabular data generation

Title: Dependency-aware synthetic tabular data generation

Dependency-aware synthetische tabellarische Datengenerierung

依赖意识合成表格数据生成 2507.19211v1

Authors (5): Chaithra Umesh, Kristian Schultz, Manjunath Mahendra, Saptarshi Bej, Olaf Wolkenhauer

Synthetic tabular data is increasingly used in privacy-sensitive domains such as health care, but existing generative models often fail to preserve inter-attribute relationships. In particular, functional dependencies (FDs) and logical dependencies (LDs), which capture deterministic and rule-based associations between features, are rarely retained, or are retained poorly, in synthetic datasets. To address this research gap, we propose the Hierarchical Feature Generation Framework (HFGF) for synthetic tabular data generation. We created benchmark datasets with known dependencies to evaluate our proposed HFGF. The framework first generates independent features using any standard generative model, and then reconstructs dependent features based on predefined FD and LD rules. Our experiments on four benchmark datasets with varying sizes, feature imbalance, and dependency complexity demonstrate that HFGF improves the preservation of FDs and LDs across six generative models, including CTGAN, TVAE, and GReaT. Our findings demonstrate that HFGF can significantly enhance the structural fidelity and downstream utility of synthetic tabular data.
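
The two-stage idea (generate independent features first, then reconstruct dependent ones from rules) can be sketched as follows. The column names, the rule tables, and the uniform sampler standing in for a generative model are all hypothetical; the abstract does not specify how HFGF encodes its rules.

```python
import random

# Hypothetical functional dependency: zip_code fully determines city.
FD_RULES = {"city": ("zip_code", {"10115": "Berlin", "18055": "Rostock"})}
# Hypothetical logical dependency: sex restricts the admissible values of pregnant.
LD_RULES = {"pregnant": ("sex", {"male": ["no"], "female": ["yes", "no"]})}

def generate_independent(n_rows, rng):
    """Stand-in for any tabular generator (e.g. CTGAN or TVAE); here plain uniform sampling."""
    return [{"zip_code": rng.choice(["10115", "18055"]),
             "sex": rng.choice(["male", "female"])}
            for _ in range(n_rows)]

def reconstruct_dependent(rows, rng):
    """Fill in dependent columns so that every FD and LD rule holds by construction."""
    for row in rows:
        for target, (source, mapping) in FD_RULES.items():
            row[target] = mapping[row[source]]               # deterministic lookup
        for target, (source, allowed) in LD_RULES.items():
            row[target] = rng.choice(allowed[row[source]])   # sample among admissible values
    return rows

rng = random.Random(0)
synthetic = reconstruct_dependent(generate_independent(5, rng), rng)
```

Because the dependent columns are never sampled by the generator itself, rule violations such as a mismatched city cannot occur, whichever model produces the independent columns.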

Article 975

Title@2025-07-25 (5): Physics-Informed Graph Neural Networks for Transverse Momentum Estimation in CMS Trigger Systems

Title: Physics-Informed Graph Neural Networks for Transverse Momentum Estimation in CMS Trigger Systems

Physik-informierte Graph-Neural-Netzwerke für transversale Momentum-Schätzung in CMS-Triggersystemen

CMS 触发系统反向动动动估计物理学综合图形神经网络 2507.19205v1

Authors (5): Md Abrar Jahin, Shahriar Soudeep, M. F. Mridha, Muhammad Mostafa Monowar, Md. Abdul Hamid

Real-time particle transverse momentum ($p_T$) estimation in high-energy physics demands algorithms that are both efficient and accurate under strict hardware constraints. Static machine learning models degrade under high pileup and lack physics-aware optimization, while generic graph neural networks (GNNs) often neglect domain structure critical for robust $p_T$ regression. We propose a physics-informed GNN framework that systematically encodes detector geometry and physical observables through four distinct graph construction strategies: station-as-node, feature-as-node, bending angle-centric, and pseudorapidity ($\eta$)-centric representations. This framework integrates these tailored graph structures with a novel Message Passing Layer (MPL), featuring intra-message attention and gated updates, and domain-specific loss functions incorporating $p_{T}$-distribution priors. Our co-design methodology yields superior accuracy-efficiency trade-offs compared to existing baselines. Extensive experiments on the CMS Trigger Dataset validate the approach: a station-informed EdgeConv model achieves a state-of-the-art MAE of 0.8525 with $\ge55\%$ fewer parameters than deep learning baselines, especially TabNet, while an $\eta$-centric MPL configuration also demonstrates improved accuracy with comparable efficiency. These results establish the promise of physics-guided GNNs for deployment in resource-constrained trigger systems.

Article 976

Title@2025-07-25 (5): Latent Granular Resynthesis using Neural Audio Codecs

Title: Latent Granular Resynthesis using Neural Audio Codecs

Latent Granular Resynthesis mit neuralen Audio Codecs

使用神经音频编码器进行前端颗粒恢复合成 2507.19202v1

Authors (2): Nao Tokui, Tom Baker

We introduce a novel technique for creative audio resynthesis that operates by reworking the concept of granular synthesis at the latent vector level. Our approach creates a “granular codebook” by encoding a source audio corpus into latent vector segments, then matches each latent grain of a target audio signal to its closest counterpart in the codebook. The resulting hybrid sequence is decoded to produce audio that preserves the target’s temporal structure while adopting the source’s timbral characteristics. This technique requires no model training, works with diverse audio materials, and naturally avoids the discontinuities typical of traditional concatenative synthesis through the codec’s implicit interpolation during decoding. We include supplementary material at https://github.com/naotokui/latentgranular/ , as well as a proof-of-concept implementation to allow users to experiment with their own sounds at https://huggingface.co/spaces/naotokui/latentgranular .
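
The matching step at the heart of the method can be sketched as a nearest-neighbour lookup. In this minimal illustration each grain is a flat list of floats and "closest" is taken to mean smallest Euclidean distance; both are assumptions, and the codec's encoder and decoder are left out entirely.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_grains(target_grains, codebook):
    """Replace each target grain with its nearest grain from the source codebook."""
    return [min(codebook, key=lambda g: squared_distance(g, t)) for t in target_grains]

# Hypothetical latent grains: the codebook comes from the source corpus,
# the target grains from the signal whose temporal structure should be kept.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
target_grains = [[0.9, 1.2], [4.0, 6.0], [0.1, -0.2]]

hybrid = match_grains(target_grains, codebook)
# hybrid == [[1.0, 1.0], [5.0, 5.0], [0.0, 0.0]]
```

The hybrid sequence keeps one grain per target position, so the target's timing is preserved while every latent value comes from the source; decoding it with the codec would then produce the resynthesized audio.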

Article 977

Title@2025-07-25 (5): WACA-UNet: Weakness-Aware Channel Attention for Static IR Drop Prediction in Integrated Circuit Design

Title: WACA-UNet: Weakness-Aware Channel Attention for Static IR Drop Prediction in Integrated Circuit Design

WACA-UNet: Schwachheits-Bewusst-Kanal Aufmerksamkeit für statische IR-Drop-Vorhersage im integrierten Schaltungsdesign

WACA-UNet: 综合电路设计中静态IR投射预测的弱敏声道注意 2507.19197v1

Authors (9): Youngmin Seo, Yunhyeong Kwon, Younghun Park, HwiRyong Kim, Seungho Eum, Jinha Kim, Taigon Song, Juho Kim, Unsang Park

Accurate spatial prediction of power integrity issues, such as IR drop, is critical for reliable VLSI design. However, traditional simulation-based solvers are computationally expensive and difficult to scale. We address this challenge by reformulating IR drop estimation as a pixel-wise regression task on heterogeneous multi-channel physical maps derived from circuit layouts. Prior learning-based methods treat all input layers (e.g., metal, via, and current maps) equally, ignoring their varying importance to prediction accuracy. To tackle this, we propose a novel Weakness-Aware Channel Attention (WACA) mechanism, which recursively enhances weak feature channels while suppressing over-dominant ones through a two-stage gating strategy. Integrated into a ConvNeXtV2-based attention U-Net, our approach enables adaptive and balanced feature representation. On the public ICCAD-2023 benchmark, our method outperforms the ICCAD-2023 contest winner by reducing mean absolute error by 61.1% and improving F1-score by 71.0%. These results demonstrate that channel-wise heterogeneity is a key inductive bias in physical layout analysis for VLSI.

Article 978

Title@2025-07-25 (5): Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?

Title: Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?

Kann Small-Scale-Datenvergiftung Dialect-Linked Biases in großen Sprachmodellen exazerbieren?

在大语言模型中,小范围数据中毒加剧分解链接的分界线能否成为大语言模型? 2507.19195v1

Authors (3): Chaymaa Abbas, Mariette Awad, Razane Tajeddine

Despite the ongoing improvements in the design of large language models (LLMs) to foster inclusion and balanced responses, these systems remain susceptible to encoding and amplifying social biases. This study examines how dialectal variation, specifically African American Vernacular English (AAVE) versus Standard American English (SAE), interacts with data poisoning to influence toxicity in outputs. Using both small- and medium-scale LLaMA models, we show that even minimal exposure to poisoned data significantly increases toxicity for AAVE inputs, while it remains comparatively unaffected for SAE. Larger models exhibit a more significant amplification effect, which suggests heightened susceptibility with scale. To further assess these disparities, we employed GPT-4o as a fairness auditor, which identified harmful stereotypical patterns disproportionately tied to AAVE inputs, including portrayals of aggression, criminality, and intellectual inferiority. These findings underscore the compounding impact of data poisoning and dialectal bias and emphasize the need for dialect-aware evaluation, targeted debiasing interventions, and socially responsible training protocols during development.

Article 979

Title@2025-07-25 (5): Bespoke multiresolution analysis of graph signals

Title: Bespoke multiresolution analysis of graph signals

Maßgeschneiderte Multiauflösungsanalyse von Graphensignalen

对图形信号进行多分辨率分析 2507.19181v1

Authors (4): Giacomo Elefante, Gianluca Giacchi, Michael Multerer, Jacopo Quizi

We present a novel framework for discrete multiresolution analysis of graph signals. The main analytical tool is the samplet transform, originally defined in the Euclidean framework as a discrete wavelet-like construction, tailored to the analysis of scattered data. The first contribution of this work is defining samplets on graphs. To this end, we subdivide the graph into a fixed number of patches, embed each patch into a Euclidean space, where we construct samplets, and eventually pull the construction back to the graph. This ensures orthogonality, locality, and the vanishing moments property with respect to properly defined polynomial spaces on graphs. Compared to classical Haar wavelets, this framework broadens the class of graph signals that can efficiently be compressed and analyzed. Along this line, we provide a definition of a class of signals that can be compressed using our construction. We support our findings with different examples of signals defined on graphs whose vertices lie on smooth manifolds. For efficient numerical implementation, we combine heavy edge clustering, to partition the graph into meaningful patches, with landmark \texttt{Isomap}, which provides low-dimensional embeddings for each patch. Our results demonstrate the method’s robustness, scalability, and ability to yield sparse representations with controllable approximation error, significantly outperforming traditional Haar wavelet approaches in terms of compression efficiency and multiresolution fidelity.

Article 980

Title@2025-07-25 (5): Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs

Title: Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs

Maximale Redundanz-Beschneidung: Eine schichtweise spärliche Zuordnung für LLMs

最大限度的裁员审慎:LLMM 的按原则划分的图层分布 2503.18377v2

Authors (5): Chang Gao, Kang Zhao, Runqi Wang, Jianfei Chen, Liping Jing

Large language models (LLMs) have demonstrated impressive capabilities, but their enormous size poses significant challenges for deployment in real-world applications. To address this issue, researchers have sought to apply network pruning techniques to LLMs. A critical challenge in pruning is allocating the sparsity for each layer. Recent sparsity allocation methods are often based on heuristics or search, which can easily lead to suboptimal performance. In this paper, we conduct an extensive investigation into various LLMs and report three significant findings: (1) the layerwise pruning sensitivity (LPS) of LLMs is highly non-uniform, (2) the choice of pruning metric affects LPS, and (3) the performance of a sparse model is related to the uniformity of its layerwise redundancy level. Based on these observations, we propose that the layerwise sparsity of LLMs should adhere to three principles: \emph{non-uniformity}, \emph{pruning metric dependency}, and \emph{uniform layerwise redundancy level} in the pruned model. To this end, we propose Maximum Redundancy Pruning (MRP), an iterative pruning algorithm that prunes the most redundant layers (\emph{i.e.}, those with the highest non-outlier ratio) at each iteration. The achieved layerwise sparsity aligns with the outlined principles. We conduct extensive experiments on publicly available LLMs, including LLaMA2 and OPT, across various benchmarks. Experimental results validate the effectiveness of MRP, demonstrating its superiority over previous methods.

Article 981

Title@2025-07-25 (5): Automatic Cough Analysis for Non-Small Cell Lung Cancer Detection

Title: Automatic Cough Analysis for Non-Small Cell Lung Cancer Detection

Automatische Cough-Analyse für nicht-kleinzellige Lungenkrebserkennung

非细胞细胞肺癌检测的自动咳嗽分析 2507.19174v1

Authors (10): Chiara Giangregorio, Cristina Maria Licciardello, Vanja Miskovic, Leonardo Provenzano, Alessandra Laura Giulia Pedrocchi, Andra Diana Dumitrascu, Arsela Prelaj, Marina Chiara Garassino, Emilia Ambrosini, Simona Ferrante

Early detection of non-small cell lung cancer (NSCLC) is critical for improving patient outcomes, and novel approaches are needed to facilitate early diagnosis. In this study, we explore the use of automatic cough analysis as a pre-screening tool for distinguishing between NSCLC patients and healthy controls. Cough audio recordings were prospectively acquired from a total of 227 subjects, divided into NSCLC patients and healthy controls. The recordings were analyzed using machine learning techniques, such as support vector machine (SVM) and XGBoost, as well as deep learning approaches, specifically convolutional neural networks (CNN) and transfer learning with VGG16. To enhance the interpretability of the machine learning model, we utilized Shapley Additive Explanations (SHAP). The fairness of the models across demographic groups was assessed by comparing the performance of the best model across different age groups (less than or equal to 58y and higher than 58y) and gender using the equalized odds difference on the test set. The results demonstrate that CNN achieves the best performance, with an accuracy of 0.83 on the test set. SVM achieves slightly lower performance (accuracy of 0.76 in validation and 0.78 on the test set), which nevertheless makes it suitable for contexts with low computational power. The use of SHAP for SVM interpretation further enhances model transparency, making it more trustworthy for clinical applications. Fairness analysis shows slightly higher disparity across age (0.15) than gender (0.09) on the test set. Therefore, to strengthen our findings’ reliability, a larger, more diverse, and unbiased dataset is needed – particularly including individuals at risk of NSCLC and those in early disease stages.
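
The equalized odds difference used in the fairness analysis can be illustrated with a minimal sketch. It assumes the common definition: the larger of the between-group gaps in true positive rate and false positive rate. The function names and the toy labels below are hypothetical and are not taken from the study.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def equalized_odds_difference(y_true, y_pred, groups):
    """Largest between-group gap in TPR or FPR (0 means equal error rates)."""
    per_group = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        per_group[g] = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    tprs = [r[0] for r in per_group.values()]
    fprs = [r[1] for r in per_group.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy data (hypothetical): four subjects per age group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["<=58", "<=58", "<=58", "<=58", ">58", ">58", ">58", ">58"]
eod = equalized_odds_difference(y_true, y_pred, groups)
```

A value of 0 would indicate identical error rates across groups. The 0.15 and 0.09 reported in the abstract are values of this kind of metric, computed on the study's own test set.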


Article 982

Title@2025-07-25 (5): Doubly Regularized Entropic Wasserstein Barycenters

Title: Doubly Regularized Entropic Wasserstein Barycenters

Doppelt regularisierte entropische Wasserstein Barycenter

普通化的 Entropic Wasserstein 巴利中心 2303.11844v2

Authors (1): Lénaïc Chizat

We study a general formulation of regularized Wasserstein barycenters that enjoys favorable regularity, approximation, stability and (grid-free) optimization properties. This barycenter is defined as the unique probability measure that minimizes the sum of entropic optimal transport (EOT) costs with respect to a family of given probability measures, plus an entropy term. We denote it $(\lambda,\tau)$-barycenter, where $\lambda$ is the inner regularization strength and $\tau$ the outer one. This formulation recovers several previously proposed EOT barycenters for various choices of $\lambda,\tau \geq 0$ and generalizes them. First, in spite of – and in fact owing to – being \emph{doubly} regularized, we show that our formulation is debiased for $\tau=\lambda/2$: the suboptimality in the (unregularized) Wasserstein barycenter objective is, for smooth densities, of the order of the strength $\lambda^2$ of entropic regularization, instead of $\max\{\lambda,\tau\}$ in general. We discuss this phenomenon for isotropic Gaussians where all $(\lambda,\tau)$-barycenters have closed form. Second, we show that for $\lambda,\tau>0$, this barycenter has a smooth density and is strongly stable under perturbation of the marginals. In particular, it can be estimated efficiently: given $n$ samples from each of the probability measures, it converges in relative entropy to the population barycenter at a rate $n^{-1/2}$. And finally, this formulation lends itself naturally to a grid-free optimization algorithm: we propose a simple \emph{noisy particle gradient descent} which, in the mean-field limit, converges globally at an exponential rate to the barycenter.
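
One way to write out the objective described in words above is sketched here. The notation is an assumption made for illustration and may differ from the paper's: $\nu_1,\dots,\nu_K$ are the given measures, $w_k$ are nonnegative weights summing to one, $c$ is the ground cost, $\Pi(\mu,\nu)$ is the set of couplings, and $H$ is the negative differential entropy.

```latex
% Inner regularization: entropic optimal transport cost with strength \lambda
T_\lambda(\mu,\nu) \;=\; \min_{\gamma \in \Pi(\mu,\nu)}
  \int c(x,y)\,\mathrm{d}\gamma(x,y)
  \;+\; \lambda\,\mathrm{KL}\!\left(\gamma \,\middle\|\, \mu \otimes \nu\right)

% Outer regularization: entropy term with strength \tau
\mu_{\lambda,\tau} \;=\; \operatorname*{arg\,min}_{\mu}
  \;\sum_{k=1}^{K} w_k\, T_\lambda(\mu,\nu_k) \;+\; \tau\, H(\mu),
\qquad
H(\mu) \;=\; \int \log\!\left(\frac{\mathrm{d}\mu}{\mathrm{d}x}\right)\mathrm{d}\mu
```

In this notation, the debiasing statement in the abstract concerns the choice $\tau=\lambda/2$, and setting $\lambda=\tau=0$ formally gives back the unregularized Wasserstein barycenter objective.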


Article 983

Title@2025-07-25 (5): Explainable AI guided unsupervised fault diagnostics for high-voltage circuit breakers

Title: Explainable AI guided unsupervised fault diagnostics for high-voltage circuit breakers

Erklärbare KI-geführte, unbeaufsichtigte Fehlerdiagnose für Hochspannungs-Leistungsschalter

可解释的AI 指导高压断路断路器不受监督的故障诊断 2507.19168v1

Authors (6): Chi-Ching Hsu, Gaëtan Frusque, Florent Forest, Felipe Macedo, Christian M. Franck, Olga Fink

Commercial high-voltage circuit breaker (CB) condition monitoring systems rely on directly observable physical parameters such as gas filling pressure with pre-defined thresholds. While these parameters are crucial, they only cover a small subset of malfunctioning mechanisms and usually can be monitored only if the CB is disconnected from the grid. To facilitate online condition monitoring while CBs remain connected, non-intrusive measurement techniques such as vibration or acoustic signals are necessary. Currently, CB condition monitoring studies using these signals typically utilize supervised methods for fault diagnostics, where ground-truth fault types are known due to artificially introduced faults in laboratory settings. This supervised approach is, however, not feasible in real-world applications, where fault labels are unavailable. In this work, we propose a novel unsupervised fault detection and segmentation framework for CBs based on vibration and acoustic signals. This framework can detect deviations from the healthy state. An explainable artificial intelligence (XAI) approach is then applied to the detected faults for fault diagnostics. The specific contributions are: (1) we propose an integrated unsupervised fault detection and segmentation framework that is capable of detecting faults and clustering different faults with only healthy data required during training; and (2) we provide an unsupervised explainability-guided fault diagnostics approach using XAI to offer domain experts potential indications of the aged or faulty components, achieving fault diagnostics without the prerequisite of ground-truth fault labels. These contributions are validated using an experimental dataset from a high-voltage CB under healthy and artificially introduced fault conditions, contributing to more reliable CB system operation.


Article 984

Title@2025-07-25 (5): Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them

Title: Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them

Scalpel vs. Hammer: GRPO verstärkt bestehende Fähigkeiten, SFT ersetzt sie

缩略图与锤子:GROPO 放大现有能力,SFT 替换 2507.10616v2

Authors (4): Neel Rajani, Aryo Pradipta Gema, Seraphina Goldfarb-Tarrant, Ivan Titov

Training large language models (LLMs) for reasoning via maths and code datasets has become a major new focus in LLM post-training. Two particularly popular approaches are reinforcement learning (RL) and supervised fine-tuning (SFT), but their training dynamics are poorly understood. We present a comparative analysis of RL and SFT on the same maths problems with the same model and similar hyperparameters. We find that RL yields minor in-domain gains on maths and slight degradation on knowledge-intensive benchmarks like MMLU, while both trends are more pronounced in SFT. We also analyse model parameters across checkpoints, observing that both algorithms modify query and key weights the most. Meanwhile, SFT exhibits greater updates and also affects mid-layer MLPs more, leading us to hypothesise that this may have caused the out-of-domain degradation. We therefore investigate whether freezing parts of the model during training can mitigate the reduced performance on knowledge-intensive benchmarks. However, our results are inconclusive, with benefits on GPQA:Diamond and degradation on other benchmarks. Taken together, our observations provide a preliminary indication for why RL amplifies existing capabilities, while SFT replaces old skills with new ones.


Article 985

Title@2025-07-25 (5): Harnessing intuitive local evolution rules for physical learning

Title: Harnessing intuitive local evolution rules for physical learning

Nutzung intuitiver lokaler Evolutionsregeln für körperliches Lernen

利用自然直觉的地方进化规则进行物理学习 2507.19561v1

Authors (3): Roie Ezraty, Menachem Stern, Shmuel M. Rubinstein

Machine Learning, however popular and accessible, is computationally intensive and highly power-consuming, prompting interest in alternative physical implementations of learning tasks. We introduce a training scheme for physical systems that minimize power dissipation in which only boundary parameters (i.e. inputs and outputs) are externally controlled. Using this scheme, these Boundary-Enabled Adaptive State Tuning Systems (BEASTS) learn by exploiting local physical rules. Our scheme, BEASTAL (BEAST-Adaline), is the closest analog of the Adaline algorithm for such systems. We demonstrate this autonomous learning in silico for regression and classification tasks. Our approach advances previous physical learning schemes by using intuitive, local evolution rules without requiring large-scale memory or complex internal architectures. BEASTAL can perform any linear task, achieving best performance when the local evolution rule is non-linear.


Article 986

Title@2025-07-25 (5): Learnable cut flow for high energy physics

Title: Learnable cut flow for high energy physics

Lernbarer Schnittfluss für die Hochenergiephysik

高能物理可学习的高能物理可削减流量 2503.22498v2

Authors (2): Jing Li, Hao Sun

Neural networks have emerged as a powerful paradigm for tasks in high energy physics, yet their opaque training process renders them a black box. In contrast, the traditional cut flow method offers simplicity and interpretability but requires extensive manual tuning to identify optimal cut boundaries. To merge the strengths of both approaches, we propose the Learnable Cut Flow (LCF), a neural network that transforms the traditional cut selection into a fully differentiable, data-driven process. LCF implements two cut strategies to flexibly determine optimal boundaries: parallel, where observable distributions are treated independently, and sequential, where prior cuts shape subsequent ones. Building on these strategies, we introduce the Learnable Importance, a metric that quantifies feature importance and adjusts each feature's contribution to the loss accordingly, offering model-driven insights unlike ad-hoc metrics. To ensure differentiability, a modified loss function replaces hard cuts with mask operations, preserving data shape throughout the training process. LCF is tested on six varied mock datasets and a realistic diboson vs. QCD dataset. Results demonstrate that LCF (1) accurately learns cut boundaries across typical feature distributions in both parallel and sequential strategies, (2) assigns higher importance to discriminative features with minimal overlap, (3) handles redundant or correlated features robustly, and (4) performs effectively in real-world scenarios. In the diboson dataset, LCF initially underperforms boosted decision trees and multilayer perceptrons when using all observables. However, pruning less critical features, guided by learned importance, boosts its performance to match or exceed these baselines. LCF bridges the gap between the traditional cut flow method and modern black-box neural networks, delivering actionable insights into the training process and feature importance.
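
The idea of replacing a hard cut with a mask can be illustrated with a minimal sketch. It assumes a sigmoid mask with a temperature parameter, which is one common way to make a threshold differentiable and is not necessarily the construction used in LCF. All names and values below are hypothetical.

```python
import math


def hard_cut(x, boundary):
    """Traditional cut: an event either passes (1.0) or fails (0.0)."""
    return 1.0 if x > boundary else 0.0


def soft_cut(x, boundary, temperature=0.1):
    """Smooth surrogate: sigmoid((x - boundary) / temperature), computed stably."""
    z = (x - boundary) / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_cut_grad_boundary(x, boundary, temperature=0.1):
    """Derivative of the soft mask with respect to the cut boundary."""
    s = soft_cut(x, boundary, temperature)
    return -s * (1.0 - s) / temperature


# Every event keeps its slot and receives a weight in (0, 1),
# so the data shape is preserved instead of events being dropped.
events = [0.2, 0.9, 1.4, 2.5]
boundary = 1.0
weights = [soft_cut(x, boundary) for x in events]
```

Because the mask is smooth in `boundary`, a loss built from these weights has a usable gradient with respect to the cut position. A hard cut has zero gradient almost everywhere.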


Article 987

Title@2025-07-25 (5): ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination

Title: ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination

ReCoDe: Verstärktes Learning-basiertes dynamisches Constraint-Design für Multi-Agent-Koordination

ReCode:加强以学习为基础的强化学习,为多机构协调设计动态制约 2507.19151v1

Authors (6): Michael Amir, Guang Yang, Zhan Gao, Keisuke Okumura, Heedo Woo, Amanda Prorok

Constraint-based optimization is a cornerstone of robotics, enabling the design of controllers that reliably encode task and safety requirements such as collision avoidance or formation adherence. However, handcrafted constraints can fail in multi-agent settings that demand complex coordination. We introduce ReCoDe (Reinforcement-based Constraint Design), a decentralized, hybrid framework that merges the reliability of optimization-based controllers with the adaptability of multi-agent reinforcement learning. Rather than discarding expert controllers, ReCoDe improves them by learning additional, dynamic constraints that capture subtler behaviors, for example, by constraining agent movements to prevent congestion in cluttered scenarios. Through local communication, agents collectively constrain their allowed actions to coordinate more effectively under changing conditions. In this work, we focus on applications of ReCoDe to multi-agent navigation tasks requiring intricate, context-based movements and consensus, where we show that it outperforms purely handcrafted controllers, other hybrid approaches, and standard MARL baselines. We give empirical (real robot) and theoretical evidence that retaining a user-defined controller, even when it is imperfect, is more efficient than learning from scratch, especially because ReCoDe can dynamically change the degree to which it relies on this controller.


Article 988

Title@2025-07-25 (5): Studying Cross-cluster Modularity in Neural Networks

Title: Studying Cross-cluster Modularity in Neural Networks

Cross-Cluster-Modularität in neuralen Netzwerken studieren

神经网络跨集群模块化研究 2502.02470v3

Authors (5): Satvik Golechha, Maheep Chaudhary, Joan Velja, Alessandro Abate, Nandi Schoots

An approach to improve neural network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We define a measure for clusterability and show that pre-trained models form highly enmeshed clusters via spectral graph clustering. We thus train models to be more modular using a “clusterability loss” function that encourages the formation of non-interacting clusters. We then investigate the emerging properties of these highly clustered models. We find our trained clustered models do not exhibit more task specialization, but do form smaller circuits. We investigate CNNs trained on MNIST and CIFAR, small transformers trained on modular addition, and GPT-2 and Pythia on the Wiki dataset, and Gemma on a Chemistry dataset. This investigation shows what to expect from clustered models.


Article 989

Title@2025-07-25 (5): Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Title: Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Beschleunigung multimodaler Großsprachenmodelle über Dynamic Visual-Token Exit und die Empirical Findings

通过动态直视退出和实证结论加速多模式大语言模型 2411.19628v2

Authors (7): Qiong Wu, Wenhao Lin, Yiyi Zhou, Weihao Ye, Zhanpeng Zen, Xiaoshuai Sun, Rongrong Ji

The excessive use of visual tokens in existing Multimodal Large Language Models (MLLMs) often exhibits obvious redundancy and brings in prohibitively expensive computation. To gain insights into this problem, we first conduct extensive empirical studies on the attention behaviors of MLLMs, and summarize three main inference stages in MLLMs: (i) Early fusion between tokens is first accomplished quickly. (ii) Intra-modality modeling then comes into play. (iii) Multimodal reasoning resumes and lasts until the end of inference. In particular, we reveal that visual tokens will stop contributing to reasoning when the text tokens receive enough image information, yielding obvious visual redundancy. Based on these generalized observations, we propose a simple yet effective method to improve the efficiency of MLLMs, termed dynamic visual-token exit (DyVTE). DyVTE uses lightweight hyper-networks to perceive the text token status and decide the removal of all visual tokens after a certain layer, thereby addressing the observed visual redundancy. To validate VTE, we apply it to a set of MLLMs, including LLaVA, VILA, Eagle and InternVL, and conduct extensive experiments on a range of benchmarks. The experimental results not only show the effectiveness of our VTE in improving MLLMs’ efficiency, but also reveal general modeling patterns of MLLMs, facilitating an in-depth understanding of MLLMs. Our code is released at https://github.com/DoubtedSteam/DyVTE.


Article 990

Title@2025-07-25 (5): Diverse and Adaptive Behavior Curriculum for Autonomous Driving: A Student-Teacher Framework with Multi-Agent RL

Title: Diverse and Adaptive Behavior Curriculum for Autonomous Driving: A Student-Teacher Framework with Multi-Agent RL

Vielfältiges und adaptives Verhalten Curriculum für autonomes Fahren: Ein Schüler-Lehrer-Rahmen mit Multi-Agent RL

自主驾驶的多样化和适应行为课程:学生-教师框架与多代理RL 2507.19146v1

Authors (4): Ahmed Abouelazm, Johannes Ratz, Philip Schörner, J. Marius Zöllner

Autonomous driving faces challenges in navigating complex real-world traffic, requiring safe handling of both common and critical scenarios. Reinforcement learning (RL), a prominent method in end-to-end driving, enables agents to learn through trial and error in simulation. However, RL training often relies on rule-based traffic scenarios, limiting generalization. Additionally, current scenario generation methods focus heavily on critical scenarios, neglecting a balance with routine driving behaviors. Curriculum learning, which progressively trains agents on increasingly complex tasks, is a promising approach to improving the robustness and coverage of RL driving policies. However, existing research mainly emphasizes manually designed curricula, focusing on scenery and actor placement rather than traffic behavior dynamics. This work introduces a novel student-teacher framework for automatic curriculum learning. The teacher, a graph-based multi-agent RL component, adaptively generates traffic behaviors across diverse difficulty levels. An adaptive mechanism adjusts task difficulty based on student performance, ensuring exposure to behaviors ranging from common to critical. The student, though exchangeable, is realized as a deep RL agent with partial observability, reflecting real-world perception constraints. Results demonstrate the teacher’s ability to generate diverse traffic behaviors. The student, trained with automatic curricula, outperformed agents trained on rule-based traffic, achieving higher rewards and exhibiting balanced, assertive driving.


Article 991

Title@2025-07-25 (5): Solar Photovoltaic Assessment with Large Language Model

Title: Solar Photovoltaic Assessment with Large Language Model

Solar-Photovoltaik-Abschätzung mit großem Sprachmodell

采用大语言模式的太阳能光伏评估 2507.19144v1

Authors (2): Muhao Guo, Yang Weng

Accurate detection and localization of solar photovoltaic (PV) panels in satellite imagery is essential for optimizing microgrids and active distribution networks (ADNs), which are critical components of renewable energy systems. Existing methods lack transparency regarding their underlying algorithms or training datasets, rely on large, high-quality PV training data, and struggle to generalize to new geographic regions or varied environmental conditions without extensive re-training. These limitations lead to inconsistent detection outcomes, hindering large-scale deployment and data-driven grid optimization. In this paper, we investigate how large language models (LLMs) can be leveraged to overcome these challenges. Despite their promise, LLMs face several challenges in solar panel detection, including difficulties with multi-step logical processes, inconsistent output formatting, frequent misclassification of visually similar objects (e.g., shadows, parking lots), and low accuracy in complex tasks such as spatial localization and quantification. To overcome these issues, we propose the PV Assessment with LLMs (PVAL) framework, which incorporates task decomposition for more efficient workflows, output standardization for consistent and scalable formatting, few-shot prompting to enhance classification accuracy, and fine-tuning using curated PV datasets with detailed annotations. PVAL ensures transparency, scalability, and adaptability across heterogeneous datasets while minimizing computational overhead. By combining open-source accessibility with robust methodologies, PVAL establishes an automated and reproducible pipeline for solar panel detection, paving the way for large-scale renewable energy integration and optimized grid management.


Article 992

Title@2025-07-25 (5): Game-Theoretic Gradient Control for Robust Neural Network Training

Title: Game-Theoretic Gradient Control for Robust Neural Network Training

Spiel-Theoretische Gradientensteuerung für robustes Neural Network Training

强力神经网络培训游戏- 理论梯度控制 2507.19143v1

Authors (3): Maria Zaitseva, Ivan Tomilov, Natalia Gusarova

Feed-forward neural networks (FFNNs) are vulnerable to input noise, reducing prediction performance. Existing regularization methods like dropout often alter network architecture or overlook neuron interactions. This study aims to enhance FFNN noise robustness by modifying backpropagation, interpreted as a multi-agent game, and exploring controlled target variable noising. Our “gradient dropout” selectively nullifies hidden layer neuron gradients with probability 1 - p during backpropagation, while keeping forward passes active. This is framed within compositional game theory. Additionally, target variables were perturbed with white noise or stable distributions. Experiments on ten diverse tabular datasets show varying impacts: improvement or diminishing of robustness and accuracy, depending on dataset and hyperparameters. Notably, on regression tasks, gradient dropout (p = 0.9) combined with stable distribution target noising significantly increased input noise robustness, evidenced by flatter MSE curves and more stable SMAPE values. These results highlight the method’s potential, underscore the critical role of adaptive parameter tuning, and open new avenues for analyzing neural networks as complex adaptive systems exhibiting emergent behavior within a game-theoretic framework.
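
The "gradient dropout" step described above can be sketched for a single-hidden-layer regression network. This is a minimal illustration under stated assumptions, not the authors' implementation: the network shape, the tanh activation, the squared-error loss, and the absence of any rescaling of the kept gradients are all assumptions, and every name is hypothetical.

```python
import math
import random


def forward(x, W1, b1, w2, b2):
    """Forward pass: all hidden neurons stay active (no forward dropout)."""
    h = [math.tanh(sum(w * xj for w, xj in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(w2, h)) + b2
    return h, y


def backward_with_gradient_dropout(x, target, W1, b1, w2, b2, p, rng):
    """Backward pass where each hidden neuron's gradient is kept with
    probability p and set to zero with probability 1 - p."""
    h, y = forward(x, W1, b1, w2, b2)
    dy = 2.0 * (y - target)                      # d(squared error)/dy
    grad_w2 = [dy * hi for hi in h]              # output layer is untouched
    grad_b2 = dy
    # Gradient arriving at each hidden neuron's pre-activation (tanh' = 1 - h^2).
    delta = [dy * w * (1.0 - hi * hi) for w, hi in zip(w2, h)]
    # Gradient dropout: mask hidden-neuron gradients only during backpropagation.
    mask = [1.0 if rng.random() < p else 0.0 for _ in delta]
    delta = [d * m for d, m in zip(delta, mask)]
    grad_W1 = [[d * xj for xj in x] for d in delta]
    grad_b1 = list(delta)
    return grad_W1, grad_b1, grad_w2, grad_b2, mask


# Toy usage with hypothetical weights; p = 0.9 as in the reported regression setting.
rng = random.Random(0)
x, target = [1.0, 2.0], 1.0
W1 = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]]
b1 = [0.0, 0.0, 0.0]
w2, b2 = [0.3, -0.2, 0.5], 0.0
grads = backward_with_gradient_dropout(x, target, W1, b1, w2, b2, p=0.9, rng=rng)
```

Unlike standard dropout, the prediction `y` here always uses every hidden neuron. Only the weight updates for the masked neurons are suppressed in a given step.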


Article 993

Title@2025-07-25 (5): Large Language Models as Attribution Regularizers for Efficient Model Training

Title: Large Language Models as Attribution Regularizers for Efficient Model Training

Große Sprachmodelle als Attribution Regularizer für effiziente Modellschulungen

大语言模式,作为高效模式培训的指定正规化机构 2502.20268v3

Authors (3): Davor Vukadin, Marin Šilić, Goran Delač

Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains. However, effectively leveraging their vast knowledge for training smaller downstream models remains an open challenge, especially in domains like tabular data learning, where simpler models are often preferred due to interpretability and efficiency. In this paper, we introduce a novel yet straightforward method for incorporating LLM-generated global task feature attributions into the training process of smaller networks. Specifically, we propose an attribution-matching regularization term that aligns the training dynamics of the smaller model with the insights provided by the LLM. By doing so, our approach yields superior performance in few-shot learning scenarios. Notably, our method requires only black-box API access to the LLM, making it easy to integrate into existing training pipelines with minimal computational overhead. Furthermore, we demonstrate how this method can be used to address common issues in real-world datasets, such as skewness and bias. By integrating high-level knowledge from LLMs, our approach improves generalization, even when training data is limited or imbalanced. We validate its effectiveness through extensive experiments across multiple tasks, demonstrating improved learning efficiency and model robustness.


Article 994

Title@2025-07-25 (5): Graph Structure Learning with Privacy Guarantees for Open Graph Data

Title: Graph Structure Learning with Privacy Guarantees for Open Graph Data

Graph Structure Learning mit Datenschutzgarantien für offene Graph Data

带有开放图表数据隐私保障的图表结构学习 2507.19116v1

Authors (5): Muhao Guo, Jiaqi Wu, Yang Weng, Yizheng Liao, Shengzhe Chen

Ensuring privacy in large-scale open datasets is increasingly challenging under regulations such as the General Data Protection Regulation (GDPR). While differential privacy (DP) provides strong theoretical guarantees, it primarily focuses on noise injection during model training, neglecting privacy preservation at the data publishing stage. Existing privacy-preserving data publishing (PPDP) approaches struggle to balance privacy and utility, particularly when data publishers and users are distinct entities. To address this gap, we focus on the graph recovery problem and propose a novel privacy-preserving estimation framework for open graph data, leveraging Gaussian DP (GDP) with a structured noise-injection mechanism. Unlike traditional methods that perturb gradients or model updates, our approach ensures unbiased graph structure recovery while enforcing DP at the data publishing stage. Moreover, we provide theoretical guarantees on estimation accuracy and extend our method to discrete-variable graphs, a setting often overlooked in DP research. Experimental results in graph learning demonstrate robust performance, offering a viable solution for privacy-conscious graph analysis.


Article 995

Title@2025-07-25 (5): Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation

Title: Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation

Destillieren eines kleinen Utility-Based Passage Selectors zur Verbesserung der Retrieval-Augmented Generation

蒸馏一个小型以公用事业为基础的通道选择器,以加强回收-提款一代 2507.19102v1

Authors (7): Hengran Zhang, Keping Bi, Jiafeng Guo, Jiaming Zhang, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating retrieved information. Standard retrieval processes prioritize relevance, focusing on topical alignment between queries and passages. In contrast, in RAG, the emphasis has shifted to utility, which considers the usefulness of passages for generating accurate answers. Despite empirical evidence showing the benefits of utility-based retrieval in RAG, the high computational cost of using LLMs for utility judgments limits the number of passages evaluated. This restriction is problematic for complex queries requiring extensive information. To address this, we propose a method to distill the utility judgment capabilities of LLMs into smaller, more efficient models. Our approach focuses on utility-based selection rather than ranking, enabling dynamic passage selection tailored to specific queries without the need for fixed thresholds. We train student models to learn pseudo-answer generation and utility judgments from teacher LLMs, using a sliding window method that dynamically selects useful passages. Our experiments demonstrate that utility-based selection provides a flexible and cost-effective solution for RAG, significantly reducing computational costs while improving answer quality. We present the distillation results using Qwen3-32B as the teacher model for both relevance ranking and utility-based selection, distilled into RankQwen1.7B and UtilityQwen1.7B. Our findings indicate that for complex questions, utility-based selection is more effective than relevance ranking in enhancing answer generation performance. We will release the relevance ranking and utility-based selection annotations for the MS MARCO dataset, supporting further research in this area.


Article 996

Title@2025-07-25 (5): Graph Neural Network-Based Predictor for Optimal Quantum Hardware Selection

Title: Graph Neural Network-Based Predictor for Optimal Quantum Hardware Selection

Graph Neuronaler Netzwerk-basierter Vorhersager für eine optimale Quanten-Hardware-Auswahl

优化量子硬件选择的神经网络预测器 2507.19093v1

Authors (4): Antonio Tudisco, Deborah Volpe, Giacomo Orlandi, Giovanna Turvani

The growing variety of quantum hardware technologies, each with unique peculiarities such as connectivity and native gate sets, creates challenges when selecting the best platform for executing a specific quantum circuit. This selection process usually involves a brute-force approach: compiling the circuit on various devices and evaluating performance based on factors such as circuit depth and gate fidelity. However, this method is computationally expensive and does not scale well as the number of available quantum processors increases. In this work, we propose a Graph Neural Network (GNN)-based predictor that automates hardware selection by analyzing the Directed Acyclic Graph (DAG) representation of a quantum circuit. Our study evaluates 498 quantum circuits (up to 27 qubits) from the MQT Bench dataset, compiled using Qiskit on four devices: three superconducting quantum processors (IBM-Kyiv, IBM-Brisbane, IBM-Sherbrooke) and one trapped-ion processor (IONQ-Forte). Performance is estimated using a metric that integrates circuit depth and gate fidelity, resulting in a dataset where 93 circuits are optimally compiled on the trapped-ion device, while the remaining circuits prefer superconducting platforms. By exploiting graph-based machine learning, our approach avoids extracting hand-crafted circuit features for model evaluation and instead embeds the circuit directly as a graph, significantly accelerating the choice of the optimal target while retaining all of the circuit's information. Experimental results show 94.4% accuracy and an 85.5% F1 score for the minority class, effectively predicting the best compilation target. The developed code is publicly available on GitHub (https://github.com/antotu/GNN-Model-Quantum-Predictor).


Article 997

Title@2025-07-25 (5): Mean flow data assimilation using physics-constrained Graph Neural Networks

Title: Mean flow data assimilation using physics-constrained Graph Neural Networks

Mittlere Durchflussdatenassimilation mittels physikisch bedingter Graph-Neural-Netzwerke

利用受物理学限制的图形神经网络进行平均流量数据同化 2411.09476v3

Authors (4): M. Quattromini, M. A. Bucci, S. Cherubini, O. Semeraro

Despite their widespread use, purely data-driven methods often suffer from overfitting, lack of physical consistency, and high data dependency, particularly when physical constraints are not incorporated. This study introduces a novel data assimilation approach that integrates Graph Neural Networks (GNNs) with optimisation techniques to enhance the accuracy of mean flow reconstruction, using Reynolds-Averaged Navier-Stokes (RANS) equations as a baseline. The method leverages the adjoint approach, incorporating RANS-derived gradients as optimisation terms during GNN training, ensuring that the learned model adheres to physical laws and maintains consistency. Additionally, the GNN framework is well-suited for handling unstructured data, which is common in the complex geometries encountered in Computational Fluid Dynamics (CFD). The GNN is interfaced with the Finite Element Method (FEM) for numerical simulations, enabling accurate modelling in unstructured domains. We consider the reconstruction of mean flow past bluff bodies at low Reynolds numbers as a test case, addressing tasks such as sparse data recovery, denoising, and inpainting of missing flow data. The key strengths of the approach lie in its integration of physical constraints into the GNN training process, leading to accurate predictions with limited data, making it particularly valuable when data are scarce or corrupted. Results demonstrate significant improvements in the accuracy of mean flow reconstructions, even with limited training data, compared to analogous purely data-driven models.

Article 998

Title@2025-07-25 (5): Clustering-Oriented Generative Attribute Graph Imputation

Title: Clustering-Oriented Generative Attribute Graph Imputation

Clustering-oriented generative Attribute Graph Imputation

以集群为主的生成图数 2507.19085v1

Authors (5): Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li

Attribute-missing graph clustering has emerged as a significant unsupervised task, where only attribute vectors of partial nodes are available and the graph structure is intact. The related models generally follow the two-step paradigm of imputation and refinement. However, most imputation approaches fail to capture class-relevant semantic information, leading to sub-optimal imputation for clustering. Moreover, existing refinement strategies optimize the learned embedding through graph reconstruction, while neglecting the fact that some attributes are uncorrelated with the graph. To remedy these problems, we establish the Clustering-oriented Generative Imputation with reliable Refinement (CGIR) model. Concretely, the subcluster distributions are estimated to reveal the class-specific characteristics precisely and to constrain the sampling space of the generative adversarial module, such that the imputed nodes are impelled to align with the correct clusters. Afterwards, multiple subclusters are merged to guide the proposed edge attention network, which identifies the edge-wise attributes for each class, so that redundant attributes in graph reconstruction do not disturb the refinement of the overall embedding. To sum up, CGIR splits attribute-missing graph clustering into the search for and merging of subclusters, which guides node imputation and refinement within a unified framework. Extensive experiments demonstrate the advantages of CGIR over state-of-the-art competitors.

Article 999

Title@2025-07-25 (5): Ambient Noise Full Waveform Inversion with Neural Operators

Title: Ambient Noise Full Waveform Inversion with Neural Operators

Ambient Noise Full Waveform Inversion mit neuralen Operatoren

使用神经操作器的环境噪音全波形反向 2503.15013v3

Authors (5): Caifeng Zou, Zachary E. Ross, Robert W. Clayton, Fan-Chi Lin, Kamyar Azizzadenesheli

Numerical simulations of seismic wave propagation are crucial for investigating velocity structures and improving seismic hazard assessment. However, standard methods such as finite difference or finite element are computationally expensive. Recent studies have shown that a new class of machine learning models, called neural operators, can solve the elastodynamic wave equation orders of magnitude faster than conventional methods. Full waveform inversion is a prime beneficiary of the accelerated simulations. Neural operators, as end-to-end differentiable operators, combined with automatic differentiation, provide an alternative approach to the adjoint-state method. State-of-the-art optimization techniques built into PyTorch provide neural operators with greater flexibility to improve the optimization dynamics of full waveform inversion, thereby mitigating cycle-skipping problems. In this study, we demonstrate the first application of neural operators for full waveform inversion on a real seismic dataset, which consists of several nodal transects collected across the San Gabriel, Chino, and San Bernardino basins in the Los Angeles metropolitan area.

Article 1000

Title@2025-07-25 (5): A self-supervised neural-analytic method to predict the evolution of COVID-19 in Romania

Title: A self-supervised neural-analytic method to predict the evolution of COVID-19 in Romania

Eine selbstüberwachte neural-analytische Methode zur Vorhersage der Entwicklung von COVID-19 in Rumänien

一种自我监督的神经分析方法,用以预测罗马尼亚COVID-19的演变 2006.12926v3

Authors (5): Radu D. Stochiţoiu, Marian Petrica, Traian Rebedea, Ionel Popescu, Marius Leordeanu

Analysing and understanding the transmission and evolution of the COVID-19 pandemic is mandatory to be able to design the best social and medical policies, foresee their outcomes and deal with all the subsequent socio-economic effects. We address this important problem from a computational and machine learning perspective. More specifically, we want to statistically estimate all the relevant parameters for the new coronavirus COVID-19, such as the reproduction number, fatality rate or length of infectiousness period, based on Romanian patients, as well as be able to predict future outcomes. This endeavor is important, since it is well known that these factors vary across the globe, and might be dependent on many causes, including social, medical, age and genetic factors. We use a recently published improved version of SEIR, which is the classic, established model for infectious diseases. We want to infer all the parameters of the model, which govern the evolution of the pandemic in Romania, based on the only reliable, true measurement, which is the number of deaths. Once the model parameters are estimated, we are able to predict all the other relevant measures, such as the number of exposed and infectious people. To this end, we propose a self-supervised approach to train a deep convolutional network to guess the correct set of Modified-SEIR model parameters, given the observed number of daily fatalities. Then, we refine the solution with a stochastic coordinate descent approach. We compare our deep learning optimization scheme with the classic grid search approach and show great improvement in both computational time and prediction accuracy. We find an optimistic result in the case fatality rate for Romania which may be around 0.3% and we also demonstrate that our model is able to correctly predict the number of daily fatalities for up to three weeks in the future.
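For orientation, the compartmental dynamics that the abstract builds on can be sketched with the classic SEIR equations. This is a minimal illustration only: the paper uses a Modified-SEIR variant whose exact form is not given here, and the function name `simulate_seir`, the forward-Euler integration, and all parameter values below are assumptions made for the example.

```python
import numpy as np

def simulate_seir(beta, sigma, gamma, n_pop, e0, days, dt=0.1):
    """Integrate the classic SEIR model with forward Euler steps.

    beta  : transmission rate (hypothetical value in the demo below)
    sigma : rate of leaving the exposed compartment (1 / incubation period)
    gamma : removal rate (1 / infectious period)
    Returns an array of shape (days, 4) with daily (S, E, I, R) values.
    """
    s, e, i, r = float(n_pop - e0), float(e0), 0.0, 0.0
    steps_per_day = int(round(1.0 / dt))
    history = []
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n_pop * dt   # S -> E
            new_infectious = sigma * e * dt           # E -> I
            new_removed = gamma * i * dt              # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return np.array(history)

# Illustrative parameters only; they are not estimates from the paper.
trajectory = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                           n_pop=1_000_000, e0=10, days=60)
print(trajectory[-1])
```

In the setting described above, a network would propose the parameters (here `beta`, `sigma`, `gamma`) from observed daily fatalities, and a simulator of this kind would turn them back into predicted curves.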

Article 1001

Title@2025-07-25 (5): ToolACE: Winning the Points of LLM Function Calling

Title: ToolACE: Winning the Points of LLM Function Calling

ToolACE: Die Punkte des LLM-Funktionsaufrufs gewinnen

工具ACE:赢得LLLM函数调用点 2409.00920v2

Authors (27): Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, Qun Liu, Enhong Chen

Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.

Article 1002

Title@2025-07-25 (5): Towards Sustainability Model Cards

Title: Towards Sustainability Model Cards

Auf dem Weg zu Nachhaltigkeitsmodellkarten

走向可持续性示范卡 2507.19559v1

Authors (2): Gwendal Jouneaux, Jordi Cabot

The growth of machine learning (ML) models and associated datasets triggers a consequent dramatic increase in energy costs for the use and training of these models. In the current context of environmental awareness and global sustainability concerns involving ICT, Green AI is becoming an important research topic. Initiatives like the AI Energy Score Ratings are a good example. Nevertheless, these benchmarking attempts have yet to be integrated with existing work on Quality Models and Service-Level Agreements common in other, more mature, ICT subfields. This limits the (automatic) analysis of these model energy descriptions and their use in (semi)automatic model comparison, selection, and certification processes. We aim to leverage the concept of quality models and merge it with existing ML model reporting initiatives and Green/Frugal AI proposals to formalize a Sustainable Quality Model for AI/ML models. As a first step, we propose a new Domain-Specific Language to precisely define the sustainability aspects of an ML model (including the energy costs for its different tasks). This information can then be exported as an extended version of the well-known Model Cards initiative while, at the same time, being formal enough to serve as input to any other automatic model description process.

Article 1003

Title@2025-07-25 (5): XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

Title: XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

XAI4LLM. Lassen Sie Modelle für maschinelles Lernen und LLMs für verbessertes In-Context-Lernen im Gesundheitswesen zusammenarbeiten

XAI4LLLM. 让机器学习模式和LLM合作促进保健领域加强内文学习 2405.06270v4

Authors (4): Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia, Eugenio di Sciascio

Clinical decision support systems require models that are not only highly accurate but also equitable and sensitive to the implications of missed diagnoses. In this study, we introduce a knowledge-guided in-context learning (ICL) framework designed to enable large language models (LLMs) to effectively process structured clinical data. Our approach integrates domain-specific feature groupings, carefully balanced few-shot examples, and task-specific prompting strategies. We systematically evaluate this method across seventy distinct ICL designs, obtained from various prompt variations and two different communication styles (natural-language narrative and numeric conversational), and compare its performance to robust classical machine learning (ML) benchmarks on tasks involving heart disease and diabetes prediction. Our findings indicate that while traditional ML models maintain superior performance in balanced precision-recall scenarios, LLMs employing narrative prompts with integrated domain knowledge achieve higher recall and significantly reduce gender bias, effectively narrowing fairness disparities by an order of magnitude. Despite the current limitation of increased inference latency, LLMs provide notable advantages, including the capacity for zero-shot deployment and enhanced equity. This research offers the first comprehensive analysis of ICL design considerations for applying LLMs to tabular clinical tasks and highlights distillation and multimodal extensions as promising directions for future research.

Article 1004

Title@2025-07-25 (5): Generating Adversarial Point Clouds Using Diffusion Model

Title: Generating Adversarial Point Clouds Using Diffusion Model

Erstellen von Adversarial Point Clouds mit Diffusionsmodell

使用扩散模型生成反向点云 2507.21163v1

Authors (5): Ruiyang Zhao, Bingbing Zhu, Chuxuan Tong, Xiaoyi Zhou, Xi Zheng

Adversarial attack methods for 3D point cloud classification reveal the vulnerabilities of point cloud recognition models. This vulnerability could lead to safety risks in critical applications that use deep learning models, such as autonomous vehicles. To uncover the deficiencies of these models, researchers can evaluate their security through adversarial attacks. However, most existing adversarial attack methods are based on white-box attacks. While these methods achieve high attack success rates and imperceptibility, their applicability in real-world scenarios is limited. Black-box attacks, which are more meaningful in real-world scenarios, often yield poor results. This paper proposes a novel black-box adversarial example generation method that utilizes a diffusion model to improve the attack success rate and imperceptibility in the black-box setting, without relying on the internal information of the point cloud classification model to generate adversarial samples. We use a 3D diffusion model that takes the compressed features of the point cloud as prior knowledge to guide the reverse diffusion process, which adds adversarial points to clean examples. Subsequently, its reverse process is employed to transform the distribution of other categories into adversarial points, which are then added to the point cloud.

Article 1005

Title@2025-07-25 (5): Exploring molecular assembly as a biosignature using mass spectrometry and machine learning

Title: Exploring molecular assembly as a biosignature using mass spectrometry and machine learning

Erforschung der molekularen Montage als Biosignatur mittels Massenspektrometrie und maschinellem Lernen

利用质量光谱测量和机器学习,探索分子组装作为一种生物签字 2507.19057v1

Authors (6): Lindsay A. Rutter, Abhishek Sharma, Ian Seet, David Obeh Alobo, An Goto, Leroy Cronin

Molecular assembly offers a promising path to detect life beyond Earth, while minimizing assumptions based on terrestrial life. As mass spectrometers will be central to upcoming Solar System missions, predicting molecular assembly from their data without needing to elucidate unknown structures will be essential for unbiased life detection. An ideal agnostic biosignature must be interpretable and experimentally measurable. Here, we show that molecular assembly, a recently developed approach to measure objects that have been produced by evolution, satisfies both criteria. First, it is interpretable for life detection, as it reflects the assembly of molecules with their bonds as building blocks, in contrast to approaches that discount construction history. Second, it can be determined without structural elucidation, as it can be physically measured by mass spectrometry, a property that distinguishes it from other approaches that use structure-based information measures for molecular complexity. Whilst molecular assembly is directly measurable using mass spectrometry data, there are limits imposed by mission constraints. To address this, we developed a machine learning model that predicts molecular assembly with high accuracy, reducing error by three-fold compared to baseline models. Simulated data shows that even small instrumental inconsistencies can double model error, emphasizing the need for standardization. These results suggest that standardized mass spectrometry databases could enable accurate molecular assembly prediction, without structural elucidation, providing a proof-of-concept for future astrobiology missions.

Article 1006

Title@2025-07-25 (5): Closing the Modality Gap for Mixed Modality Search

Title: Closing the Modality Gap for Mixed Modality Search

Schließen der Modalitätslücke für gemischte Modalitätssuche

缩小混合方式搜索模式差距 2507.19054v1

Authors (6): Binxu Li, Yuhui Zhang, Xiaohan Wang, Weixin Liang, Ludwig Schmidt, Serena Yeung-Levy

Mixed modality search – retrieving information across a heterogeneous corpus composed of images, texts, and multimodal documents – is an important yet underexplored real-world application. In this work, we investigate how contrastive vision-language models, such as CLIP, perform on the mixed modality search task. Our analysis reveals a critical limitation: these models exhibit a pronounced modality gap in the embedding space, where image and text embeddings form distinct clusters, leading to intra-modal ranking bias and inter-modal fusion failure. To address this issue, we propose GR-CLIP, a lightweight post-hoc calibration method that removes the modality gap in CLIP’s embedding space. Evaluated on MixBench – the first benchmark specifically designed for mixed modality search – GR-CLIP improves NDCG@10 by up to 26 percentage points over CLIP, surpasses recent vision-language generative embedding models by 4 percentage points, while using 75x less compute.
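The modality gap described above can be made concrete with a small numerical sketch. The abstract does not spell out how GR-CLIP calibrates the embeddings, so the per-modality mean-centering used here is a stand-in technique chosen for illustration, not the authors' procedure; the synthetic embeddings and the helper names `modality_gap` and `center_per_modality` are likewise assumptions.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def modality_gap(img_emb, txt_emb):
    """Distance between the centroids of the two embedding clouds."""
    return float(np.linalg.norm(img_emb.mean(axis=0) - txt_emb.mean(axis=0)))

def center_per_modality(emb):
    """Stand-in calibration: subtract the modality mean, then re-normalize."""
    return l2_normalize(emb - emb.mean(axis=0, keepdims=True))

# Synthetic embeddings: each modality is shifted along a different axis,
# which mimics two separated clusters in a shared embedding space.
rng = np.random.default_rng(0)
dim = 16
img = l2_normalize(rng.normal(size=(200, dim)) + 3.0 * np.eye(dim)[0])
txt = l2_normalize(rng.normal(size=(200, dim)) + 3.0 * np.eye(dim)[1])

before = modality_gap(img, txt)
after = modality_gap(center_per_modality(img), center_per_modality(txt))
print(f"centroid gap before: {before:.3f}, after centering: {after:.3f}")
```

The point of the sketch is only that a post-hoc, training-free transformation can shrink the centroid offset between modalities; whether it also fixes ranking bias in retrieval is what a benchmark such as MixBench would measure.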

Article 1007

Title@2025-07-25 (5): Dynamics-Informed Reservoir Computing with Visibility Graphs

Title: Dynamics-Informed Reservoir Computing with Visibility Graphs

Dynamisch-informiertes Reservoir Computing mit Sichtbarkeitsgraphen

具有可见度图的动态化储量计算 2507.19046v1

Authors (2): Charlotte Geier, Merten Stender

Accurate prediction of complex and nonlinear time series remains a challenging problem across engineering and scientific disciplines. Reservoir computing (RC) offers a computationally efficient alternative to traditional deep learning by training only the read-out layer while employing a randomly structured and fixed reservoir network. Despite its advantages, the largely random reservoir graph architecture often results in suboptimal and oversized networks with poorly understood dynamics. Addressing this issue, we propose a novel Dynamics-Informed Reservoir Computing (DyRC) framework that systematically infers the reservoir network structure directly from the input training sequence. This work proposes to employ the visibility graph (VG) technique, which converts time series data into networks by representing measurement points as nodes linked by mutual visibility. The reservoir network is constructed by directly adopting the VG network from a training data sequence, leveraging the parameter-free visibility graph approach to avoid expensive hyperparameter tuning. This process results in a reservoir that is directly informed by the specific dynamics of the prediction task under study. We assess the DyRC-VG method through prediction tasks involving the canonical nonlinear Duffing oscillator, evaluating prediction accuracy and consistency. Compared to an Erdős-Rényi graph of the same size, spectral radius, and comparable density, we observe higher prediction quality and more consistent performance over repeated implementations in the DyRC-VG.
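The visibility graph construction mentioned above can be sketched in a few lines. The example uses the standard natural visibility criterion (two samples are linked when every sample between them lies strictly below the straight line joining them); the abstract does not say which VG variant DyRC uses, so that choice, the function name `visibility_graph`, and the toy series are assumptions.

```python
import numpy as np

def visibility_graph(y):
    """Adjacency matrix of the natural visibility graph of a time series.

    Samples are assumed to be equally spaced, so the index doubles as time.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    adj = np.zeros((n, n), dtype=int)
    for a in range(n):
        for b in range(a + 1, n):
            visible = True
            for c in range(a + 1, b):
                # Height at time c of the line from (a, y[a]) to (b, y[b]).
                line = y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                if y[c] >= line:
                    visible = False
                    break
            if visible:
                adj[a, b] = adj[b, a] = 1
    return adj

series = [2.0, 1.0, 3.0, 0.5, 5.0]
adjacency = visibility_graph(series)
print(adjacency)
```

A reservoir built this way has one node per training sample, and no density or wiring hyperparameters are chosen by hand. Any rescaling of the adjacency matrix, for example to a target spectral radius, would be a separate step that this sketch leaves out.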

Article 1008

Title@2025-07-25 (5): Large Language Model Powered Automated Modeling and Optimization of Active Distribution Network Dispatch Problems

Title: Large Language Model Powered Automated Modeling and Optimization of Active Distribution Network Dispatch Problems

Großes Sprachmodell Automatisierte Modellierung und Optimierung von Netzwerk-Dispatch-Problemen

大型语文示范电动自动建模和优化主动分发网络调度问题 2507.21162v1

Authors (7): Xu Yang, Chenhui Lin, Yue Yang, Qi Wang, Haotian Liu, Haizhou Hua, Wenchuan Wu

The increasing penetration of distributed energy resources into active distribution networks (ADNs) has made effective ADN dispatch imperative. However, the numerous newly-integrated ADN operators, such as distribution system aggregators, virtual power plant managers, and end prosumers, often lack specialized expertise in power system operation, modeling, optimization, and programming. This knowledge gap renders reliance on human experts both costly and time-intensive. To address this challenge and enable intelligent, flexible ADN dispatch, this paper proposes a large language model (LLM) powered automated modeling and optimization approach. First, the ADN dispatch problems are decomposed into sequential stages, and a multi-LLM coordination architecture is designed. This framework comprises an Information Extractor, a Problem Formulator, and a Code Programmer, tasked with information retrieval, optimization problem formulation, and code implementation, respectively. Afterwards, tailored refinement techniques are developed for each LLM agent, greatly improving the accuracy and reliability of generated content. The proposed approach features a user-centric interface that enables ADN operators to derive dispatch strategies via simple natural language queries, eliminating technical barriers and increasing efficiency. Comprehensive comparisons and end-to-end demonstrations on various test cases validate the effectiveness of the proposed architecture and methods.

Article 1009

Title@2025-07-25 (5): Neural Ordinary Differential Equations for Learning and Extrapolating System Dynamics Across Bifurcations

Title: Neural Ordinary Differential Equations for Learning and Extrapolating System Dynamics Across Bifurcations

Neural Ordinary Differential Equations for Learning and Extrapolating System Dynamics Across Bifurcations

学习和外推系统动态的横跨两结构的神经普通差异和外推系统动态 2507.19036v1

Authors (4): Eva van Tegelen, George van Voorn, Ioannis Athanasiadis, Peter van Heijster

Forecasting system behaviour near and across bifurcations is crucial for identifying potential shifts in dynamical systems. While machine learning has recently been used to learn critical transitions and bifurcation structures from data, most studies remain limited as they exclusively focus on discrete-time methods and local bifurcations. To address these limitations, we use Neural Ordinary Differential Equations which provide a continuous, data-driven framework for learning system dynamics. We apply our approach to a predator-prey system that features both local and global bifurcations, presenting a challenging test case. Our results show that Neural Ordinary Differential Equations can recover underlying bifurcation structures directly from timeseries data by learning parameter-dependent vector fields. Notably, we demonstrate that Neural Ordinary Differential Equations can forecast bifurcations even beyond the parameter regions represented in the training data. We also assess the method’s performance under limited and noisy data conditions, finding that model accuracy depends more on the quality of information that can be inferred from the training data, than on the amount of data available.

Article 1010

Title@2025-07-25 (5): ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs

Title: ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs

ProGMLP: Progressive Rahmenbedingungen für die GNN-to-MLP-Wissensdestillation mit effizienten Trade-offs

ProGMLP:全球NN-MLP知识提炼与有效取舍的渐进框架 2507.19031v1

Authors (8): Weigang Lu, Ziyu Guan, Wei Zhao, Yaming Yang, Yujie Sun, Zheng Liang, Yibing Zhan, Dapeng Tao

GNN-to-MLP (G2M) methods have emerged as a promising approach to accelerate Graph Neural Networks (GNNs) by distilling their knowledge into simpler Multi-Layer Perceptrons (MLPs). These methods bridge the gap between the expressive power of GNNs and the computational efficiency of MLPs, making them well-suited for resource-constrained environments. However, existing G2M methods are limited by their inability to flexibly adjust inference cost and accuracy dynamically, a critical requirement for real-world applications where computational resources and time constraints can vary significantly. To address this, we introduce a Progressive framework designed to offer flexible and on-demand trade-offs between inference cost and accuracy for GNN-to-MLP knowledge distillation (ProGMLP). ProGMLP employs a Progressive Training Structure (PTS), where multiple MLP students are trained in sequence, each building on the previous one. Furthermore, ProGMLP incorporates Progressive Knowledge Distillation (PKD) to iteratively refine the distillation process from GNNs to MLPs, and Progressive Mixup Augmentation (PMA) to enhance generalization by progressively generating harder mixed samples. Our approach is validated through comprehensive experiments on eight real-world graph datasets, demonstrating that ProGMLP maintains high accuracy while dynamically adapting to varying runtime scenarios, making it highly effective for deployment in diverse application settings.

Article 1011

Title@2025-07-25 (5): Stella Nera: A Differentiable Maddness-Based Hardware Accelerator for Efficient Approximate Matrix Multiplication

Title: Stella Nera: A Differentiable Maddness-Based Hardware Accelerator for Efficient Approximate Matrix Multiplication

Stella Nera: Ein differenzierter Maddness-basierter Hardware-Beschleuniger für eine effiziente, annähernde Matrix-Multiplikation

Stella Nera: 高效近光矩阵乘法的有区别的基于 Maddness 的硬件加速器 2311.10207v2

Authors (5): Jannis Schönleber, Lukas Cavigelli, Matteo Perotti, Luca Benini, Renzo Andri

Artificial intelligence has surged in recent years, with advancements in machine learning rapidly impacting nearly every area of life. However, the growing complexity of these models has far outpaced advancements in available hardware accelerators, leading to significant computational and energy demands, primarily due to matrix multiplications, which dominate the compute workload. Maddness (i.e., Multiply-ADDitioN-lESS) presents a hash-based version of product quantization, which renders matrix multiplications into lookups and additions, eliminating the need for multipliers entirely. We present Stella Nera, the first Maddness-based accelerator achieving an energy efficiency of 161 TOp/s/W@0.55V, 25x better than conventional MatMul accelerators due to its small components and reduced computational complexity. We further enhance Maddness with a differentiable approximation, allowing for gradient-based fine-tuning and achieving an end-to-end performance of 92.5% Top-1 accuracy on CIFAR-10.

Article 1012

Title@2025-07-25 (5): Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues

Title: Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues

Jenseits von Rahmen: Zero-Shot Fußgänger Absichtsvorhersage mit Raw Temporal Video und multimodalen Queues

环视框架之外:用原始时光视频和多模式结壳进行零点热热热热食人故意预测 2507.21161v1

Authors (3): Pallavi Zambare, Venkata Nikhil Thanikella, Ying Liu

Pedestrian intention prediction is essential for autonomous driving in complex urban environments. Conventional approaches depend on supervised learning over frame sequences and require extensive retraining to adapt to new scenarios. Here, we introduce BF-PIP (Beyond Frames Pedestrian Intention Prediction), a zero-shot approach built upon Gemini 2.5 Pro. It infers crossing intentions directly from short, continuous video clips enriched with structured JAAD metadata. In contrast to GPT-4V based methods that operate on discrete frames, BF-PIP processes uninterrupted temporal clips. It also incorporates bounding-box annotations and ego-vehicle speed via specialized multimodal prompts. Without any additional training, BF-PIP achieves 73% prediction accuracy, outperforming a GPT-4V baseline by 18%. These findings illustrate that combining temporal video inputs with contextual cues enhances spatiotemporal perception and improves intent inference under ambiguous conditions. This approach paves the way for agile, retraining-free perception modules in intelligent transportation systems.

Article 1013

Title@2025-07-25 (5): MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster

Title: MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster

MindSpeed RL: Distributed Dataflow für skalierbare und effiziente RL-Schulungen auf Ascend NPU Cluster

MindSpeed RL: 用于对 Ascend NPU 群集进行可缩放和高效 RL 培训的分布式数据流 2507.19017v1

Authors (14): Laingjun Feng, Chenyi Pan, Xinjie Guo, Fei Mei, Benzhe Ning, Jianxiang Zhang, Xinyang Liu, Beirong Zhou, Zeng Shu, Chang Liu, Guang Yang, Zhenyu Han, Jiangben Wang, Bo Wang

Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents dataflow between nodes. Owing to the heavy cross-node dependencies, the RL training system usually suffers from poor cluster scalability and low memory utilization. In this article, we introduce MindSpeed RL, an effective and efficient system for large-scale RL training. Unlike existing centralized methods, MindSpeed RL organizes the essential data dependencies in RL training, i.e., sample flow and resharding flow, from a distributed view. On the one hand, a distributed transfer dock strategy, which sets controllers and warehouses on the basis of the conventional replay buffer, is designed to relieve the dispatch overhead in the sample flow. On the other hand, a practical allgather-swap strategy is presented to eliminate redundant memory usage in resharding flow. In addition, MindSpeed RL further integrates numerous parallelization strategies and acceleration techniques for systematic optimization. Compared with existing state-of-the-art systems, comprehensive experiments on the RL training of popular Qwen2.5-Dense-7B/32B, Qwen3-MoE-30B, and DeepSeek-R1-MoE-671B show that MindSpeed RL increases the throughput by 1.42 to 3.97 times. Finally, we open-source MindSpeed RL and perform all the experiments on a super pod of Ascend with 384 neural processing units (NPUs) to demonstrate the powerful performance and reliability of Ascend.

Article 1014

Title@2025-07-25 (5): Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains

Title: Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains

Causal Mechanism Abschätzung in Multi-Sensor-Systemen über mehrere Domains

跨多域多传感器系统中因果机制估算 2507.17792v2

Authors (3): Jingyi Yu, Tim Pychynski, Marco F. Huber

To gain deeper insights into a complex sensor system through the lens of causality, we present common and individual causal mechanism estimation (CICME), a novel three-step approach to inferring causal mechanisms from heterogeneous data collected across multiple domains. By leveraging the principle of Causal Transfer Learning (CTL), CICME is able to reliably detect domain-invariant causal mechanisms when provided with sufficient samples. The identified common causal mechanisms are further used to guide the estimation of the remaining causal mechanisms in each domain individually. The performance of CICME is evaluated on linear Gaussian models under scenarios inspired by a manufacturing process. Building upon existing continuous optimization-based causal discovery methods, we show that CICME leverages the benefits of applying causal discovery on the pooled data and repeatedly on data from individual domains, and it even outperforms both baseline methods under certain scenarios.

Article 1015

Title@2025-07-25 (5): A diffusion-based generative model for financial time series via geometric Brownian motion

Title: A diffusion-based generative model for financial time series via geometric Brownian motion

Ein diffusionsbasiertes generatives Modell für finanzielle Zeitreihen über geometrische Brownsche Bewegung

通过几何布朗运动的金融时间序列基于扩散的遗传模型 2507.19003v1

Authors (3): Gihun Kim, Sun-Yong Choi, Yeoneung Kim

We propose a novel diffusion-based generative framework for financial time series that incorporates geometric Brownian motion (GBM), the foundation of the Black–Scholes theory, into the forward noising process. Unlike standard score-based models that treat price trajectories as generic numerical sequences, our method injects noise proportionally to asset prices at each time step, reflecting the heteroskedasticity observed in financial time series. By accurately balancing the drift and diffusion terms, we show that the resulting log-price process reduces to a variance-exploding stochastic differential equation, aligning with the formulation in score-based generative models. The reverse-time generative process is trained via denoising score matching using a Transformer-based architecture adapted from the Conditional Score-based Diffusion Imputation (CSDI) framework. Empirical evaluations on historical stock data demonstrate that our model reproduces key stylized facts (heavy-tailed return distributions, volatility clustering, and the leverage effect) more realistically than conventional diffusion models.
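The reduction to a variance-exploding SDE can be illustrated with Itô's lemma. The derivation below is a sketch under stated assumptions: the time-dependent coefficients $\mu(t)$ and $\sigma(t)$ and the specific drift choice $\mu(t) = \tfrac{1}{2}\sigma(t)^2$ are one plausible reading of "balancing the drift and diffusion terms", not a formula quoted from the paper.

```latex
% Forward noising as geometric Brownian motion on the price S_t:
\mathrm{d}S_t = \mu(t)\, S_t \,\mathrm{d}t + \sigma(t)\, S_t \,\mathrm{d}W_t .

% Ito's lemma applied to X_t = \log S_t:
\mathrm{d}X_t = \Bigl(\mu(t) - \tfrac{1}{2}\sigma(t)^2\Bigr)\,\mathrm{d}t
              + \sigma(t)\,\mathrm{d}W_t .

% Assumed balance \mu(t) = \tfrac{1}{2}\sigma(t)^2 removes the drift:
\mathrm{d}X_t = \sigma(t)\,\mathrm{d}W_t ,
\qquad
X_t \mid X_0 \sim \mathcal{N}\!\Bigl(X_0,\ \int_0^t \sigma(s)^2 \,\mathrm{d}s\Bigr).
```

The last line has the drift-free form $\mathrm{d}x = g(t)\,\mathrm{d}W_t$ of a variance-exploding SDE, while in price space the noise term $\sigma(t) S_t \,\mathrm{d}W_t$ remains proportional to the price, which is the heteroskedasticity the abstract refers to.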

Article 1016

Title@2025-07-25 (5): Adapting to Fragmented and Evolving Data: A Fisher Information Perspective

Title: Adapting to Fragmented and Evolving Data: A Fisher Information Perspective

Anpassung an zersplitterte und sich entwickelnde Daten: Ein Blick auf die Fischer

适应零碎和不断演变的数据:渔业信息视角 2507.18996v1

Authors (3): Behraj Khan, Tahir Qasim Syed, Nouman Muhammad Durrani

Modern machine learning systems operating in dynamic environments often face sequential covariate shift (SCS), where input distributions evolve over time while the conditional distribution remains stable. We introduce FADE (Fisher-based Adaptation to Dynamic Environments), a lightweight and theoretically grounded framework for robust learning under SCS. FADE employs a shift-aware regularization mechanism anchored in Fisher information geometry, guiding adaptation by modulating parameter updates based on sensitivity and stability. To detect significant distribution changes, we propose a Cramér-Rao-informed shift signal that integrates KL divergence with temporal Fisher dynamics. Unlike prior methods requiring task boundaries, target supervision, or experience replay, FADE operates online with fixed memory and no access to target labels. Evaluated on seven benchmarks spanning vision, language, and tabular data, FADE achieves up to 19% higher accuracy under severe shifts, outperforming methods such as TENT and DIW. FADE also generalizes naturally to federated learning by treating heterogeneous clients as temporally fragmented environments, enabling scalable and stable adaptation in decentralized settings. Theoretical analysis guarantees bounded regret and parameter consistency, while empirical results demonstrate FADE’s robustness across modalities and shift intensities.

Article 1017

Title@2025-07-25 (5): Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning

Title: Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning

Interaction-Merged Motion Planning: Diverse Motion-Datensätze für robuste Planung effektiv nutzen

交互式组合式动态规划:有效利用多种移动式数据集进行强力规划 2507.04790v3

Authors (5): Giwon Lee, Wooseong Jeong, Daehee Park, Jaewoo Jeong, Kuk-Jin Yoon

Motion planning is a crucial component of autonomous robot driving. While various trajectory datasets exist, effectively utilizing them for a target domain remains challenging due to differences in agent interactions and environmental characteristics. Conventional approaches, such as domain adaptation or ensemble learning, leverage multiple source datasets but suffer from domain imbalance, catastrophic forgetting, and high computational costs. To address these challenges, we propose Interaction-Merged Motion Planning (IMMP), a novel approach that leverages parameter checkpoints trained on different domains during adaptation to the target domain. IMMP follows a two-step process: pre-merging to capture agent behaviors and interactions, sufficiently extracting diverse information from the source domain, followed by merging to construct an adaptable model that efficiently transfers diverse interactions to the target domain. Our method is evaluated on various planning benchmarks and models, demonstrating superior performance compared to conventional approaches.

Article 1018

Title@2025-07-25 (5): Agent0: Leveraging LLM Agents to Discover Multi-value Features from Text for Enhanced Recommendations

Title: Agent0: Leveraging LLM Agents to Discover Multi-value Features from Text for Enhanced Recommendations

Agent0: LLM-Agenten nutzen, um Multi-Value-Features aus Text für erweiterte Empfehlungen zu entdecken

Ar0: 利用LLM代理器从强化建议文本中发现多价值特性 2507.18993v1

Authors (3): Blaž Škrlj, Benoît Guilleminot, Andraž Tori

Large language models (LLMs) and their associated agent-based frameworks have significantly advanced automated information extraction, a critical component of modern recommender systems. While these multitask frameworks are widely used in code generation, their application in data-centric research is still largely untapped. This paper presents Agent0, an LLM-driven, agent-based system designed to automate information extraction and feature construction from raw, unstructured text. Categorical features are crucial for large-scale recommender systems but are often expensive to acquire. Agent0 coordinates a group of interacting LLM agents to automatically identify the most valuable text aspects for subsequent tasks (such as models or AutoML pipelines). Beyond its feature engineering capabilities, Agent0 also offers an automated prompt-engineering tuning method that utilizes dynamic feedback loops from an oracle. Our findings demonstrate that this closed-loop methodology is both practical and effective for automated feature discovery, which is recognized as one of the most challenging phases in current recommender system development.

Article 1019

Title@2025-07-25 (5): Reinforcement Learning via Conservative Agent for Environments with Random Delays

Title: Reinforcement Learning via Conservative Agent for Environments with Random Delays

Verstärktes Lernen über Conservative Agent for Environments mit zufälligen Verzögerungen

通过 “ 随机延缓环境保守代理 “ 强化学习 2507.18992v1

Authors (4): Jongsoo Lee, Jangwon Kim, Jiseok Jeong, Soohee Han

Real-world reinforcement learning applications are often hindered by delayed feedback from environments, which violates the Markov assumption and introduces significant challenges. Although numerous delay-compensating methods have been proposed for environments with constant delays, environments with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This transformation enables any state-of-the-art constant-delay method to be directly extended to the random-delay environments without modifying the algorithmic structure or sacrificing performance. We evaluate the conservative agent-based algorithm on continuous control tasks, and empirical results demonstrate that it significantly outperforms existing baseline algorithms in terms of asymptotic performance and sample efficiency.
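
One way to picture the reformulation into a constant-delay equivalent is a buffer that holds every observation until a fixed worst-case delay has elapsed. The sketch below assumes delays are bounded by a known `max_delay`; the class `ConservativeBuffer` and the function `simulate` are hypothetical names for illustration and are not taken from the paper.

```python
from typing import Any, Dict, List, Optional, Tuple


class ConservativeBuffer:
    """Release each observation exactly `max_delay` steps after it was generated.

    Assumption: every random delay is at most `max_delay`.
    """

    def __init__(self, max_delay: int) -> None:
        self.max_delay = max_delay
        self._arrived: Dict[int, Any] = {}  # generation step -> observation

    def receive(self, generated_at: int, observation: Any) -> None:
        # Store an observation that has just arrived, keyed by when it was generated.
        self._arrived[generated_at] = observation

    def release(self, now: int) -> Optional[Any]:
        # Hand out the observation generated `max_delay` steps ago, if it exists.
        return self._arrived.pop(now - self.max_delay, None)


def simulate(delays: List[int], max_delay: int) -> List[Tuple[int, str]]:
    """delays[t] is the random delay of the observation generated at step t."""
    buf = ConservativeBuffer(max_delay)
    released = []
    for now in range(len(delays) + max_delay):
        # Deliver every observation whose arrival time equals the current step.
        for t, d in enumerate(delays):
            if t + d == now:
                buf.receive(t, f"obs{t}")
        obs = buf.release(now)
        if obs is not None:
            released.append((now, obs))
    return released
```

With delays `[2, 0, 1]` and `max_delay=2`, the observations arrive out of order but are released in order, one per step, each exactly two steps after it was generated. A constant-delay method could then consume this stream unchanged.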

Article 1020

Title@2025-07-25 (5): GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units

Title: GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units

GENIAL: Generative Design Space Exploration über Netzwerk-Inversion für stromarme algorithmische Logische Einheiten

GENIAL:通过网络转换生成设计空间探索,用于低功率测算仪 2507.18989v1

Authors (5): Maxence Bouvier, Ryan Amaudruz, Felix Arnold, Renzo Andri, Lukas Cavigelli

As AI workloads proliferate, optimizing arithmetic units is becoming increasingly important to reduce the footprint of digital systems. Conventional design flows, which often rely on manual or heuristics-based optimization, are limited in their ability to thoroughly explore the vast design space. In this paper, we introduce GENIAL, a machine learning-based framework for the automatic generation and optimization of arithmetic units, more specifically multipliers. At the core of GENIAL is a Transformer-based surrogate model trained in two stages, involving self-supervised pretraining followed by supervised finetuning, to robustly forecast key hardware metrics such as power and area from abstracted design representations. By inverting the surrogate model, GENIAL efficiently searches for new operand encodings that directly minimize power consumption in arithmetic units for specific input data distributions. Extensive experiments on large datasets demonstrate that GENIAL is consistently more sample efficient than other methods, and converges faster towards optimized designs. This makes it possible to deploy a high-effort logic synthesis optimization flow in the loop, improving the accuracy of the surrogate model. Notably, GENIAL automatically discovers encodings that achieve up to 18% switching activity savings within multipliers on representative AI workloads compared with the conventional two’s complement. We also demonstrate the versatility of our approach by achieving significant improvements on Finite State Machines, highlighting GENIAL’s applicability for a wide spectrum of logic functions. Together, these advances mark a significant step toward automated Quality-of-Results-optimized combinational circuit generation for digital systems.

Article 1021

Title@2025-07-25 (5): CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

Title: CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

CLIP-geführte Backdoor-Verteidigung durch Entropie-basierte vergiftete Datensatztrennung

CLIP-通过基于英基中毒数据集的分离来引导后门防御 2507.05113v2

Authors (5): Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang

Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant a backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. To address these limitations, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model with these inputs, using CLIP’s logits as guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing defenses. Additionally, we show that clean-data-based defenses can be adapted to poisoned data using CGD. CGD also exhibits strong robustness, maintaining low ASRs even when employing a weaker CLIP model or when CLIP itself is compromised by a backdoor. These findings underscore CGD’s exceptional efficiency, effectiveness, and applicability for real-world backdoor defense scenarios. Code: https://github.com/binyxu/CGD.

Article 1022

Title@2025-07-25 (5): Verbalized Representation Learning for Interpretable Few-Shot Generalization

Title: Verbalized Representation Learning for Interpretable Few-Shot Generalization

Verbalisiertes Repräsentationslernen für verdolmetschbare wenige-heiße Verallgemeinerung

以口头方式进行代表性学习,为可口译的少或偏的普及化提供口译 2411.18651v2

Authors (6): Cheng-Fu Yang, Da Yin, Wenbo Hu, Nanyun Peng, Bolei Zhou, Kai-Wei Chang

Humans recognize objects after observing only a few examples, a remarkable capability enabled by their inherent language understanding of the real-world environment. Developing verbalized and interpretable representations can significantly improve model generalization in low-data settings. In this work, we propose Verbalized Representation Learning (VRL), a novel approach for automatically extracting human-interpretable features for object recognition using few-shot data. Our method uniquely captures inter-class differences and intra-class commonalities in the form of natural language by employing a Vision-Language Model (VLM) to identify key discriminative features between different classes and shared characteristics within the same class. These verbalized features are then mapped to numeric vectors through the VLM. The resulting feature vectors can be further utilized to train and infer with downstream classifiers. Experimental results show that, at the same model scale, VRL achieves a 24% absolute improvement over prior state-of-the-art methods while using 95% less data and a smaller model. Furthermore, compared to human-labeled attributes, the features learned by VRL exhibit a 20% absolute gain when used for downstream classification tasks. Code is available at: https://github.com/joeyy5588/VRL/tree/main.

Article 1023

Title@2025-07-25 (5): Differentiated Thyroid Cancer Recurrence Classification Using Machine Learning Models and Bayesian Neural Networks with Varying Priors: A SHAP-Based Interpretation of the Best Performing Model

Title: Differentiated Thyroid Cancer Recurrence Classification Using Machine Learning Models and Bayesian Neural Networks with Varying Priors: A SHAP-Based Interpretation of the Best Performing Model

使用机械学习模型和有不同前科的贝叶斯神经网络的有区别的甲状腺癌症重复发生分类:以SHAP为基础对最佳性能模型进行解释 2507.18987v1

Authors (3): HMNS Kumari, HMLS Kumari, UMMPK Nawarathne

Differentiated thyroid cancer (DTC) recurrence is a major public health concern, requiring classification and predictive models that are not only accurate but also interpretable and uncertainty aware. This study introduces a comprehensive framework for DTC recurrence classification using a dataset containing 383 patients and 16 clinical and pathological variables. Initially, 11 machine learning (ML) models were employed using the complete dataset, where the Support Vector Machines (SVM) model achieved the highest accuracy of 0.9481. To reduce complexity and redundancy, feature selection was carried out using the Boruta algorithm, and the same ML models were applied to the reduced dataset, where it was observed that the Logistic Regression (LR) model obtained the maximum accuracy of 0.9611. However, these ML models often lack uncertainty quantification, which is critical in clinical decision making. Therefore, to address this limitation, Bayesian Neural Networks (BNN) with six varying prior distributions, including Normal(0, 1), Normal(0, 10), Laplace(0, 1), Cauchy(0, 1), Cauchy(0, 2.5), and Horseshoe(1), were implemented on both the complete and reduced datasets. The BNN model with the Normal(0, 10) prior distribution exhibited maximum accuracies of 0.9740 and 0.9870 before and after feature selection, respectively.

Article 1024

Title@2025-07-25 (5): KASPER: Kolmogorov Arnold Networks for Stock Prediction and Explainable Regimes

Title: KASPER: Kolmogorov Arnold Networks for Stock Prediction and Explainable Regimes

KASPER: Kolmogorov Arnold Netzwerke für Stock Prediction und erklärbare Regimes

KASPER: Kolmogorov Arnold 股票预测和可解释制度网络 2507.18983v1

Authors (5): Vidhi Oad, Param Pathak, Nouhaila Innan, Shalini D, Muhammad Shafique

Forecasting in financial markets remains a significant challenge due to their nonlinear and regime-dependent dynamics. Traditional deep learning models, such as long short-term memory networks and multilayer perceptrons, often struggle to generalize across shifting market conditions, highlighting the need for a more adaptive and interpretable approach. To address this, we introduce Kolmogorov-Arnold networks for stock prediction and explainable regimes (KASPER), a novel framework that integrates regime detection, sparse spline-based function modeling, and symbolic rule extraction. The framework identifies hidden market conditions using a Gumbel-Softmax-based mechanism, enabling regime-specific forecasting. For each regime, it employs Kolmogorov-Arnold networks with sparse spline activations to capture intricate price behaviors while maintaining robustness. Interpretability is achieved through symbolic learning based on Monte Carlo Shapley values, which extracts human-readable rules tailored to each regime. Applied to real-world financial time series from Yahoo Finance, the model achieves an $R^2$ score of 0.89, a Sharpe Ratio of 12.02, and a mean squared error as low as 0.0001, outperforming existing methods. This research establishes a new direction for regime-aware, transparent, and robust forecasting in financial markets.
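
The Gumbel-Softmax relaxation named above is a standard way to draw a differentiable, approximately one-hot sample over discrete categories (here, market regimes). The following is a minimal standard-library sketch of the sampling step only; the function name, the temperature `tau`, and the example logits are illustrative assumptions, and the abstract does not describe how KASPER parameterizes or trains its regime detector.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], tau: float, rng: random.Random) -> List[float]:
    """Draw one relaxed one-hot sample over categories (e.g. regimes)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u in (0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical regime logits (three regimes); lower tau gives sharper samples.
sample = gumbel_softmax([1.0, 0.2, -0.5], tau=0.5, rng=random.Random(0))
```

The returned vector is a probability distribution over regimes; as `tau` approaches zero the samples approach one-hot vectors, which is what allows a soft regime assignment to be trained with gradients.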

Article 1025

Title@2025-07-25 (5): Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies

Title: Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies

Mixed-Reality Digital Twins: Nutzung der physischen und virtuellen Welten für Hybrid Sim2Real Transition von Multi-Agent Verstärkungs-Learning-Politiken

混合-现实数字双对:利用物理和虚拟世界促进混合的Sim2重新过渡多机构强化学习政策 2403.10996v7

Authors (3): Chinmay Vilas Samak, Tanmay Vilas Samak, Venkat Narayan Krovi

Multi-agent reinforcement learning (MARL) for cyber-physical vehicle systems usually requires a significantly long training time due to their inherent complexity. Furthermore, deploying the trained policies in the real world demands a feature-rich environment along with multiple physical embodied agents, which may not be feasible due to monetary, physical, energy, or safety constraints. This work seeks to address these pain points by presenting a mixed-reality (MR) digital twin (DT) framework capable of: (i) boosting training speeds by selectively scaling parallelized simulation workloads on-demand, and (ii) immersing the MARL policies across hybrid simulation-to-reality (sim2real) experiments. The viability and performance of the proposed framework are highlighted through two representative use cases, which cover cooperative as well as competitive classes of MARL problems. We study the effect of: (i) agent and environment parallelization on training time, and (ii) systematic domain randomization on zero-shot sim2real transfer, across both case studies. Results indicate up to 76.3% reduction in training time with the proposed parallelization scheme and sim2real gap as low as 2.9% using the proposed deployment method.

Article 1026

Title@2025-07-25 (5): Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

Title: Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

Faire Algorithmen mit Probing für Multi-Agent Multi-Armed Bandits

多代理多武装强盗验证法的公允算法 2506.14988v3

Authors (4): Tianyi Xu, Jiaxin Liu, Nicholas Mattei, Zizhan Zheng

We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To address this, we introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness. Extensive experiments on synthetic and real-world datasets show that our approach outperforms baseline methods, achieving better fairness and efficiency.
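
Greedy selection under a budget is the classical approach to maximizing a monotone submodular set function, and it is the kind of procedure the offline setting alludes to. The sketch below is generic: the coverage objective and the arm and agent names are hypothetical stand-ins, since the abstract states neither the paper's probing objective nor its performance bound.

```python
from typing import Callable, Dict, List, Set


def greedy_select(candidates: List[str],
                  value: Callable[[Set[str]], float],
                  budget: int) -> List[str]:
    """Pick up to `budget` items, each time adding the one with the largest marginal gain."""
    chosen: List[str] = []
    for _ in range(budget):
        current = value(set(chosen))
        best, best_gain = None, 0.0
        for c in candidates:
            if c in chosen:
                continue
            gain = value(set(chosen) | {c}) - current
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining item adds value
            break
        chosen.append(best)
    return chosen


# Hypothetical stand-in objective: number of agents informed by probing a set of arms.
informs: Dict[str, Set[int]] = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}


def coverage(arms: Set[str]) -> float:
    covered: Set[int] = set()
    for arm in arms:
        covered |= informs[arm]
    return float(len(covered))


probed = greedy_select(["a", "b", "c"], coverage, budget=2)
```

Here the first pick is `"c"` (gain 4) and the second is `"a"` (gain 3), because `"b"` adds only one new agent once `"c"` is chosen. Diminishing marginal gains of this kind are what make the greedy rule effective for submodular objectives.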

Article 1027

Title@2025-07-25 (5): Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution

Title: Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution

Konzept-TRAK: Verstehen, wie Diffusionsmodelle Konzepte durch Konzept-Level-Attribution lernen

概念-TRAK:了解传播模式如何通过概念层面的归属来学习概念 2507.06547v2

Authors (10): Yonghyun Park, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Woosung Choi, Kin Wai Cheuk, Junghyun Koo, Yuki Mitsufuji

While diffusion models excel at image generation, their growing adoption raises critical concerns around copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions to specific elements, such as styles or objects, that matter most to stakeholders. To bridge this gap, we introduce concept-level attribution via a novel method called Concept-TRAK. Concept-TRAK extends influence functions with two key innovations: (1) a reformulated diffusion training loss based on diffusion posterior sampling, enabling robust, sample-specific attribution; and (2) a concept-aware reward function that emphasizes semantic relevance. We evaluate Concept-TRAK on the AbC benchmark, showing substantial improvements over prior methods. Through diverse case studies, ranging from identifying IP-protected and unsafe content to analyzing prompt engineering and compositional learning, we demonstrate how concept-level attribution yields actionable insights for responsible generative AI development and governance.

Article 1028

Title@2025-07-25 (5): Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling

Title: Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling

Auf dem Weg zur Verbesserung der Langzeitprognosen von Entity in zeitlichen Wissensgraphen durch globale Ähnlichkeiten und gewichtete Stichproben

通过全球相似性和加权抽样改进时间知识图中的长期审计实体预测 2507.18977v1

Authors (7): Mehrnoosh Mirtaheri, Ryan A. Rossi, Sungchul Kim, Kanak Mahadik, Tong Yu, Xiang Chen, Mohammad Rostami

Temporal Knowledge Graph (TKG) completion models traditionally assume access to the entire graph during training. This overlooks challenges stemming from the evolving nature of TKGs, such as: (i) the model’s requirement to generalize and assimilate new knowledge, and (ii) the task of managing new or unseen entities that often have sparse connections. In this paper, we present an incremental training framework specifically designed for TKGs, aiming to address entities that are either not observed during training or have sparse connections. Our approach combines a model-agnostic enhancement layer with a weighted sampling strategy; both can be added to, and improve, any existing TKG completion method. The enhancement layer leverages a broader, global definition of entity similarity, which moves beyond the mere local neighborhood proximity of GNN-based methods. The weighted sampling strategy employed in training accentuates edges linked to infrequently occurring entities. We evaluate our method on two benchmark datasets, and demonstrate that our framework outperforms existing methods in total link prediction, inductive link prediction, and in addressing long-tail entities. Notably, our method achieves a 10% improvement and a 15% boost in MRR for these datasets. The results underscore the potential of our approach in mitigating catastrophic forgetting and enhancing the robustness of TKG completion methods, especially in an incremental training context.
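
A weighted sampling strategy that accentuates edges linked to infrequent entities can be illustrated with inverse-frequency weights. This is a sketch under stated assumptions: the weighting rule (one over the frequency of the rarer endpoint) and the function names are hypothetical, since the abstract does not give the formula the authors use.

```python
import random
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str, str, int]  # (subject, relation, object, timestamp)


def edge_weights(edges: List[Edge]) -> List[float]:
    """Assumed rule: weight each edge by 1 / frequency of its rarer endpoint."""
    freq: Counter = Counter()
    for s, _, o, _ in edges:
        freq[s] += 1
        freq[o] += 1
    return [1.0 / min(freq[s], freq[o]) for s, _, o, _ in edges]


def sample_edges(edges: List[Edge], k: int, rng: random.Random) -> List[Edge]:
    """Sample k training edges with replacement, favoring long-tail entities."""
    return rng.choices(edges, weights=edge_weights(edges), k=k)


# Toy graph: "A" is frequent; "C" and "D" each appear once.
edges = [("A", "r", "B", 0), ("A", "r", "C", 1), ("A", "r", "B", 2), ("D", "r", "A", 3)]
weights = edge_weights(edges)
batch = sample_edges(edges, k=8, rng=random.Random(0))
```

In the toy graph, edges touching the rare entities `C` and `D` receive weight 1.0 while the two `A`-`B` edges receive 0.5, so long-tail edges are drawn about twice as often.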

Article 1029

Title@2025-07-25 (5): A Toolbox, Not a Hammer – Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation

Title: A Toolbox, Not a Hammer – Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation

Eine Toolbox, kein Hammer – Multi-TAG: Skalierung der Mathematik mit Multi-Tool-Aggregation

一个工具箱, 不是锤锤 – – 多TAG: 使用多工具聚合的量性数学解释 2507.18973v1

Authors (2): Bohan Yao, Vikas Yadav

Augmenting large language models (LLMs) with external tools is a promising avenue for developing high-performance mathematical reasoning systems. Prior tool-augmented approaches typically finetune an LLM to select and invoke a single tool at each reasoning step and show promising results on simpler math reasoning benchmarks such as GSM8K. However, these approaches struggle with more complex math problems that require precise reasoning over multiple steps. To address this limitation, we propose Multi-TAG, a Multi-Tool AGgregation-based framework. Instead of relying on a single tool, Multi-TAG guides an LLM to concurrently invoke multiple tools at each reasoning step. It then aggregates their diverse outputs to verify and refine the reasoning process, enhancing solution robustness and accuracy. Notably, Multi-TAG is a finetuning-free, inference-only framework, making it readily applicable to any LLM backbone, including large open-weight models, which are computationally expensive to finetune, and proprietary frontier models, which cannot be finetuned with custom recipes. We evaluate Multi-TAG on four challenging benchmarks: MATH500, AIME, AMC, and OlympiadBench. Across both open-weight and closed-source LLM backbones, Multi-TAG consistently and substantially outperforms state-of-the-art baselines, achieving average improvements of 6.0% to 7.5%.
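
The idea of invoking several tools at one reasoning step and aggregating their outputs can be sketched with a simple majority vote. The aggregation rule, the function `aggregate_step`, and the toy tools below are assumptions for illustration; the abstract does not specify how Multi-TAG combines tool outputs, and this sketch calls the tools sequentially rather than concurrently.

```python
from collections import Counter
from typing import Callable, Dict, Optional

Tool = Callable[[str], Optional[str]]


def aggregate_step(tools: Dict[str, Tool], state: str) -> Optional[str]:
    """Run every tool on the current reasoning state and return the most common output."""
    outputs = []
    for name, tool in tools.items():
        try:
            out = tool(state)
        except Exception:
            out = None  # a failing tool simply abstains from the vote
        if out is not None:
            outputs.append(out.strip())
    if not outputs:
        return None
    answer, _count = Counter(outputs).most_common(1)[0]
    return answer


def broken_tool(state: str) -> Optional[str]:
    raise RuntimeError("tool crashed")


# Hypothetical tools: each maps the problem text to a candidate answer string.
tools: Dict[str, Tool] = {
    "python_exec": lambda s: "42",
    "symbolic": lambda s: " 42 ",
    "natural_language": lambda s: "41",
    "broken": broken_tool,
}
result = aggregate_step(tools, "What is 6 * 7?")
```

In this toy run two tools agree on `42`, one answers `41`, and one fails, so the step resolves to `42`. Agreement across independent tools acts as a cheap verification signal at each step.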

Article 1030

Title@2025-07-25 (5): Underwater Waste Detection Using Deep Learning A Performance Comparison of YOLOv7 to 10 and Faster RCNN

Title: Underwater Waste Detection Using Deep Learning A Performance Comparison of YOLOv7 to 10 and Faster RCNN

Unterwasser-Abfallerkennung mit Deep Learning Ein Leistungsvergleich von YOLOv7 zu 10 und schneller RCNN

YOLOv7至10YOLOv7和更快RCNN的深入学习绩效比较 2507.18967v1

Authors (3): UMMPK Nawarathne, HMNS Kumari, HMLS Kumari

Underwater pollution is one of today’s most significant environmental concerns, with vast volumes of garbage found in seas, rivers, and landscapes around the world. Accurate detection of these waste materials is crucial for successful waste management, environmental monitoring, and mitigation strategies. In this study, we investigated the performance of five cutting-edge object detection algorithms: four YOLO (You Only Look Once) models (YOLOv7, YOLOv8, YOLOv9, and YOLOv10) and the Faster Region-based Convolutional Neural Network (Faster R-CNN), to identify which model was most effective at recognizing waste materials underwater. The models were trained and tested on a large dataset containing fifteen different classes under diverse conditions, such as low visibility and variable depths. Among these models, YOLOv8 outperformed the others, with a mean Average Precision (mAP) of 80.9%. This performance is attributed to YOLOv8’s architecture, which incorporates advanced features such as improved anchor-free mechanisms and self-supervised learning, allowing for more precise and efficient recognition of items in a variety of settings. These findings highlight the YOLOv8 model’s potential as an effective tool in the global fight against pollution, improving both the detection capabilities and scalability of underwater cleanup operations.

Article 1031

Title@2025-07-25 (5): On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem

Title: On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem

Auf der Erforschung des inneren Spiegelabflusses für stochastisches nichtkonvexes beschränktes Problem

探索内镜面下下下流的内孔反镜下流,以缓解杂乱的非电流制约问题 2507.15264v3

Authors (2): Kuangyu Ding, Kim-Chuan Toh

We study a nonsmooth nonconvex optimization problem defined over nonconvex constraints, where the feasible set is given by the intersection of the closure of an open set and a smooth manifold. By endowing the open set with a Riemannian metric induced by a barrier function, we obtain a Riemannian subgradient flow formulated as a differential inclusion, which remains strictly within the interior of the feasible set. This continuous dynamical system unifies two classes of iterative optimization methods, namely the Hessian barrier method and the mirror descent scheme, by revealing that these methods can be interpreted as discrete approximations of the continuous flow. We explore the long-term behavior of the trajectories generated by this dynamical system and show that the known deficient convergence properties of the Hessian barrier method and the mirror descent scheme can be interpreted in a unified and more insightful way through those of the continuous trajectory. For instance, the notorious spurious stationary points \cite{chen2024spurious} observed in the Hessian barrier method and the mirror descent scheme are interpreted as stable equilibria of the dynamical system that do not correspond to real stationary points of the original optimization problem. We provide two sufficient conditions under which these spurious stationary points can be avoided if the strict complementarity condition holds. In the absence of these regularity conditions, we propose a random perturbation strategy that ensures the trajectory converges (subsequentially) to an approximate stationary point. Building on these insights, we introduce two iterative Riemannian subgradient methods, a form of interior point method, that generalize the existing Hessian barrier method and mirror descent scheme for solving nonsmooth nonconvex optimization problems.
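
To make the interior-preserving behavior of mirror descent concrete, the sketch below implements the classical entropic mirror descent (multiplicative-weights) update on the probability simplex. It is a textbook special case chosen for illustration, not the Riemannian subgradient methods introduced in the paper; the function name, the linear objective, and the step size are assumptions.

```python
import math
from typing import Callable, List


def entropic_mirror_descent(grad: Callable[[List[float]], List[float]],
                            x0: List[float],
                            step: float,
                            iters: int) -> List[float]:
    """Mirror descent with the negative-entropy mirror map on the simplex.

    Update: x_i <- x_i * exp(-step * g_i) / Z, which keeps every coordinate positive.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Shift exponents by their minimum for numerical stability (cancels after normalizing).
        shift = min(step * gi for gi in g)
        w = [xi * math.exp(-(step * gi - shift)) for xi, gi in zip(x, g)]
        z = sum(w)
        x = [wi / z for wi in w]
    return x


# Assumed toy objective: f(x) = <c, x>, minimized at the vertex with the smallest cost.
c = [1.0, 0.5, 2.0]
x_final = entropic_mirror_descent(lambda x: c, [1 / 3, 1 / 3, 1 / 3], step=0.5, iters=50)
```

The iterates concentrate on the cheapest coordinate while every entry stays strictly positive, so the trajectory approaches the boundary of the simplex without ever leaving its interior.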

Article 1032

Title@2025-07-25 (5): MedicalBERT: enhancing biomedical natural language processing using pretrained BERT-based model

Title: MedicalBERT: enhancing biomedical natural language processing using pretrained BERT-based model

MedicalBERT: Verbesserung der biomedizinischen natürlichen Sprachverarbeitung mit vorgebildetem BERT-basiertem Modell

医学BERT:利用预先培训的BERT模式,加强生物医学自然语言处理 2507.08013v2

Authors (6): K. Sahit Reddy, N. Ragavenderan, Vasanth K., Ganesh N. Naik, Vishalakshi Prabhu, Nagaraja G. S

Recent advances in natural language processing (NLP) have been driven by pretrained language models like BERT, RoBERTa, T5, and GPT. These models excel at understanding complex texts, but biomedical literature, with its domain-specific terminology, poses challenges that models like Word2Vec and bidirectional long short-term memory (Bi-LSTM) can’t fully address. GPT and T5, despite capturing context, fall short in tasks needing bidirectional understanding, unlike BERT. To address this, we propose MedicalBERT, a pretrained BERT model trained on a large biomedical dataset and equipped with domain-specific vocabulary that enhances the comprehension of biomedical terminology. The MedicalBERT model is further optimized and fine-tuned to address diverse tasks, including named entity recognition, relation extraction, question answering, sentence similarity, and document classification. Performance metrics such as the F1-score, accuracy, and Pearson correlation are employed to showcase the efficiency of our model in comparison to other BERT-based models such as BioBERT, SciBERT, and ClinicalBERT. MedicalBERT outperforms these models on most of the benchmarks, and surpasses the general-purpose BERT model by 5.67% on average across all the tasks evaluated. This work also underscores the potential of leveraging pretrained BERT models for medical NLP tasks, demonstrating the effectiveness of transfer learning techniques in capturing domain-specific information.

Article 1033

Title@2025-07-25 (5): Handling Out-of-Distribution Data: A Survey

Title: Handling Out-of-Distribution Data: A Survey

Umgang mit Out-of-Distribution-Daten: Eine Umfrage

处理分发外数据:调查 2507.21160v1

Authors (4): Lakpa Tamang, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal

In the field of Machine Learning (ML) and data-driven applications, one of the significant challenges is the change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines different mechanisms for handling two main types of distribution shift: (i) covariate shift, where the values of features or covariates change between training and test data, and (ii) concept/semantic shift, where the model experiences a shift in the concept learned during training due to the emergence of novel classes in the test phase. Our contributions are threefold. First, we formalize distribution shifts, explain how conventional methods fail to handle them adequately, and call for a model that performs well across all types of distribution shift simultaneously. Second, we discuss why handling distribution shifts is important and provide an extensive review of the methods and techniques that have been developed to detect, measure, and mitigate the effects of these shifts. Third, we discuss the current state of distribution shift handling mechanisms and propose future research directions in this area. Overall, we provide a retrospective synopsis of the literature on distribution shift, focusing on OOD data that has been overlooked in existing surveys.


Article 1034

Title@2025-07-25 (5): Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity

Title: Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity

Adaptive Cluster Zusammenarbeit steigert LLMs medizinische Entscheidungsunterstützung Kapazität

LLM 医疗决策支助能力 2507.21159v1

Authors (4): Zhihao Peng, Liuxin Bao, Shengyuan Liu, Yixuan Yuan

The collaborativeness of large language models (LLMs) has proven effective in natural language processing systems, holding considerable promise for healthcare development. However, it lacks explicit component selection rules, necessitating human intervention or clinical-specific validation. Moreover, existing architectures rely heavily on a predefined LLM cluster, in which some LLMs underperform in medical decision support scenarios, invalidating the collaborativeness of LLMs. To this end, we propose an adaptive cluster collaborativeness methodology involving self-diversity and cross-consistency maximization mechanisms to boost LLMs' medical decision support capacity. For self-diversity, we calculate the fuzzy matching value of pairwise outputs within an LLM as its self-diversity value, subsequently prioritizing LLMs with high self-diversity values as cluster components in a training-free manner. For cross-consistency, we first measure cross-consistency values between the LLM with the highest self-diversity value and the others, and then gradually mask out the LLM with the lowest cross-consistency value to eliminate potentially inconsistent output during collaborative propagation. Extensive experiments on two specialized medical datasets, NEJMQA and MMLU-Pro-health, demonstrate the effectiveness of our method across physician-oriented specialties. For example, on NEJMQA, our method reaches the official passing score in accuracy across all disciplines, notably achieving an accuracy (ACC) of 65.47% compared to the 56.12% achieved by GPT-4 in the Obstetrics and Gynecology discipline.


Article 1035

Title@2025-07-25 (5): Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights

Title: Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights

Neural Tangent Kernel und Fisher Information Matrizen für einfache ReLU-Netzwerke mit zufälligen versteckten Gewichten

带有随机隐藏重的简单 ReLU 网络神经相垂直内核和渔业信息矩阵 2507.18555v2

Authors (6): Jun’ichi Takeuchi, Yoshinari Takeishi, Noboru Murata, Kazushi Mimura, Ka Long Keith Ho, Hiroshi Nagaoka

Fisher information matrices and neural tangent kernels (NTK) for 2-layer ReLU networks with random hidden weights are studied. We discuss the relation between both notions as a linear transformation and show the spectral decomposition of the NTK with concrete forms of the eigenfunctions with major eigenvalues. We also obtain an approximation formula for the functions represented by the 2-layer neural networks.


Article 1036

Title@2025-07-25 (5): CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction over Medium-range Forecast Periods

Title: CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction over Medium-range Forecast Periods

CNN-basierte Surface Temperature Forecasts mit Ensemble Numerische Wettervorhersage über Mittelstrecken-Prognoseperioden

CNN有线有线电视新闻网的地表温度预报,以及中程预报期综合数字天气预报 2507.18937v1

Authors (2): Takuya Inoue, Takuya Kawabata

This study proposes a method that integrates convolutional neural networks (CNNs) with ensemble numerical weather prediction (NWP) models, enabling surface temperature forecasting at lead times beyond the short-range (five-day) forecast period. Owing to limited computational resources, operational medium-range temperature forecasts typically rely on low-resolution NWP models, which are prone to systematic and random errors. To resolve these limitations, the proposed method first reduces systematic errors through CNN-based post-processing (bias correction and spatial super-resolution) on each ensemble member, reconstructing high-resolution temperature fields from low-resolution model outputs. Second, it reduces random errors through ensemble averaging of the CNN-corrected members. This study also investigates whether the sequence of CNN correction and ensemble averaging affects the forecast accuracy. For comparison with the proposed method, we additionally conducted experiments with the CNN trained on ensemble-averaged forecasts. The first approach, CNN correction before ensemble averaging, consistently achieved higher accuracy than the reverse approach. Although based on low-resolution ensemble forecasts, the proposed method notably outperformed the high-resolution deterministic NWP models. These findings indicate that combining CNN-based correction with ensemble averaging effectively reduces both the systematic and random errors in NWP model outputs. The proposed approach is a practical and scalable solution for improving medium-range temperature forecasts, and is particularly valuable at operational centers with limited computational resources.
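
The two-step idea in the abstract above (correct each ensemble member, then average) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the CNN post-processing is replaced by subtracting a known constant bias, the member errors are independent Gaussian noise, and the names `simulate_rmse`, `bias`, and `noise_sd` are hypothetical and do not come from the paper.

```python
import random
import statistics


def simulate_rmse(n_members=20, n_cases=2000, bias=1.5, noise_sd=2.0, seed=0):
    """Return RMSE of (raw member, bias-corrected member, corrected ensemble mean)."""
    rng = random.Random(seed)
    raw_sq, corrected_sq, averaged_sq = [], [], []
    for _ in range(n_cases):
        truth = rng.uniform(-10.0, 30.0)  # synthetic "observed" temperature
        # Each member = truth + systematic error (bias) + random error (noise).
        members = [truth + bias + rng.gauss(0.0, noise_sd) for _ in range(n_members)]
        # Stand-in for CNN post-processing (assumption): remove the bias per member.
        corrected = [m - bias for m in members]
        # Ensemble averaging of the corrected members damps the random error.
        ensemble_mean = statistics.fmean(corrected)
        raw_sq.append((members[0] - truth) ** 2)
        corrected_sq.append((corrected[0] - truth) ** 2)
        averaged_sq.append((ensemble_mean - truth) ** 2)
    return tuple(
        statistics.fmean(errs) ** 0.5 for errs in (raw_sq, corrected_sq, averaged_sq)
    )


raw, corrected, averaged = simulate_rmse()
print(f"raw={raw:.2f} corrected={corrected:.2f} averaged={averaged:.2f}")
```

Under these assumptions the raw member carries both error types, the correction removes the systematic part, and averaging shrinks the random part by roughly the square root of the number of members. With a constant-offset stand-in the order of correction and averaging makes no difference, so the sketch does not reproduce the paper's comparison of the two orderings.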


Article 1037

Title@2025-07-25 (5): When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label

Title: When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label

Wenn geräuschvolle Etiketten die Klassenungleichgewichte auf Graphen treffen: Eine grafische Augmentationsmethode mit LLM und Pseudo-Label

当噪音标签在图表上达到类平衡时:与LLM和Pseudo标签的图表放大法 2507.18153v2

Authors (6): Riting Xia, Rucong Wang, Yulin Liu, Anchen Li, Xueyan Liu, Yan Zhang

Class-imbalanced graph node classification is a practical yet underexplored research problem. Although recent studies have attempted to address this issue, they typically assume clean and reliable labels when processing class-imbalanced graphs. This assumption often violates the nature of real-world graphs, where labels frequently contain noise. Given this gap, this paper systematically investigates robust node classification for class-imbalanced graphs with noisy labels. We propose GraphALP, a novel Graph Augmentation framework based on Large language models (LLMs) and Pseudo-labeling techniques. Specifically, we design an LLM-based oversampling method that generates label-accurate synthetic minority nodes to alleviate class imbalance. Based on the class-balanced graphs, we develop a dynamically weighted pseudo-labeling method to obtain high-confidence pseudo labels and reduce the label noise ratio. Additionally, we implement a secondary LLM-guided oversampling mechanism to mitigate potential class distribution skew caused by pseudo labels. Experimental results show that GraphALP achieves superior performance over state-of-the-art methods on class-imbalanced graphs with noisy labels.


Article 1038

Title@2025-07-25 (5): Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction

Title: Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction

Geometrische Multi-Color-Nachricht Passing Graph Neural Networks für Blut-Hirn-Barriere Permeabilität Vorhersage

多色消息传传图神经网络 2507.18926v1

Authors (5): Trung Nguyen, Md Masud Rana, Farjana Tasnim Mukta, Chang-Guo Zhan, Duc Duy Nguyen

Accurate prediction of blood-brain barrier permeability (BBBP) is essential for central nervous system (CNS) drug development. While graph neural networks (GNNs) have advanced molecular property prediction, they often rely on molecular topology and neglect the three-dimensional geometric information crucial for modeling transport mechanisms. This paper introduces the geometric multi-color message-passing graph neural network (GMC-MPNN), a novel framework that enhances standard message-passing architectures by explicitly incorporating atomic-level geometric features and long-range interactions. Our model constructs weighted colored subgraphs based on atom types to capture the spatial relationships and chemical context that govern BBB permeability. We evaluated GMC-MPNN on three benchmark datasets for both classification and regression tasks, using rigorous scaffold-based splitting to ensure a robust assessment of generalization. The results demonstrate that GMC-MPNN consistently outperforms existing state-of-the-art models, achieving superior performance in both classifying compounds as permeable/non-permeable (AUC-ROC of 0.9704 and 0.9685) and in regressing continuous permeability values (RMSE of 0.4609, Pearson correlation of 0.7759). An ablation study further quantified the impact of specific atom-pair interactions, revealing that the model’s predictive power derives from its ability to learn from both common and rare, but chemically significant, functional motifs. By integrating spatial geometry into the graph representation, GMC-MPNN sets a new performance benchmark and offers a more accurate and generalizable tool for drug discovery pipelines.


Article 1039

Title@2025-07-25 (5): From Conditional to Unconditional Independence: Testing Conditional Independence via Transport Maps

Title: From Conditional to Unconditional Independence: Testing Conditional Independence via Transport Maps

Von der bedingten zur bedingungslosen Unabhängigkeit: Prüfung der bedingten Unabhängigkeit über Transportkarten

从有条件独立到无条件独立:通过运输图测试有条件独立 2504.09567v3

Authors (4): Chenxuan He, Yuan Gao, Liping Zhu, Jian Huang

Testing conditional independence between two random vectors given a third is a fundamental and challenging problem in statistics, particularly in multivariate nonparametric settings due to the complexity of conditional structures. We propose a novel method for testing conditional independence by transforming it into an unconditional independence testing problem. We achieve this by constructing two transport maps that transform conditional independence into unconditional independence, which substantially simplifies the problem. These transport maps are estimated from data using conditional continuous normalizing flow models. Within this framework, we derive a test statistic and prove its asymptotic validity under both the null and alternative hypotheses. A permutation-based procedure is employed to evaluate the significance of the test. We validate the proposed method through extensive simulations and real-data analysis. Our numerical studies demonstrate the practical effectiveness of the proposed method for conditional independence testing.
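
The permutation-based significance step mentioned above can be sketched generically. The example below is an illustration under assumptions, not the paper's procedure: it skips the transport maps entirely, uses the absolute Pearson correlation as a stand-in test statistic, and the names `pearson` and `permutation_pvalue` are hypothetical.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """P-value for unconditional independence of x and y via permutations of y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))  # stand-in statistic (assumption)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # shuffling breaks any dependence between x and y
        if abs(pearson(x, y_perm)) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
y_dep = [a + 0.5 * rng.gauss(0.0, 1.0) for a in x]  # depends on x
y_ind = [rng.gauss(0.0, 1.0) for _ in range(100)]   # independent of x
print(permutation_pvalue(x, y_dep), permutation_pvalue(x, y_ind))
```

In the setting the abstract describes, the inputs to such a test would be the transformed variables produced by the estimated transport maps rather than the raw data.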


Article 1040

Title@2025-07-25 (5): A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions

Title: A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions

Eine systematische Überprüfung der Systeme der wichtigsten retrieval-Augmented Generation (RAG): Fortschritt, Lücken und Zukunftsrichtungen

系统审查关键回收-养代(RAG)系统:进展、差距和未来方向 2507.18910v1

Authors (4): Agada Joseph Oche, Ademola Glory Folashade, Tirthankar Ghosal, Arpan Biswas

Retrieval-Augmented Generation (RAG) represents a major advancement in natural language processing (NLP), combining large language models (LLMs) with information retrieval systems to enhance factual grounding, accuracy, and contextual relevance. This paper presents a comprehensive systematic review of RAG, tracing its evolution from early developments in open-domain question answering to recent state-of-the-art implementations across diverse applications. The review begins by outlining the motivations behind RAG, particularly its ability to mitigate hallucinations and outdated knowledge in parametric models. Core technical components (retrieval mechanisms, sequence-to-sequence generation models, and fusion strategies) are examined in detail. A year-by-year analysis highlights key milestones and research trends, providing insight into RAG’s rapid growth. The paper further explores the deployment of RAG in enterprise systems, addressing practical challenges related to retrieval of proprietary data, security, and scalability. A comparative evaluation of RAG implementations is conducted, benchmarking performance on retrieval accuracy, generation fluency, latency, and computational efficiency. Persistent challenges such as retrieval quality, privacy concerns, and integration overhead are critically assessed. Finally, the review highlights emerging solutions, including hybrid retrieval approaches, privacy-preserving techniques, optimized fusion strategies, and agentic RAG architectures. These innovations point toward a future of more reliable, efficient, and context-aware knowledge-intensive NLP systems.


Article 1041

Title@2025-07-25 (5): Probably Approximately Correct Causal Discovery

Title: Probably Approximately Correct Causal Discovery

Wahrscheinlich ungefähr korrekte Kausalentdeckung

可能大致正确原因发现 2507.18903v1

Authors (3): Mian Wei, Somesh Jha, David Page

The discovery of causal relationships is a foundational problem in artificial intelligence, statistics, epidemiology, economics, and beyond. While elegant theories exist for accurate causal discovery given infinite data, real-world applications are inherently resource-constrained. Effective methods for inferring causal relationships from observational data must perform well under finite data and time constraints, where “performing well” implies achieving high, though not perfect, accuracy. In his seminal paper A Theory of the Learnable, Valiant highlighted the importance of resource constraints in supervised machine learning, introducing the concept of Probably Approximately Correct (PAC) learning as an alternative to exact learning. Inspired by Valiant’s work, we propose the Probably Approximately Correct Causal (PACC) Discovery framework, which extends PAC learning principles to the causal field. This framework emphasizes both computational and sample efficiency for established causal methods such as propensity score techniques and instrumental variable approaches. Furthermore, we show that it can also provide theoretical guarantees for other widely used methods, such as the Self-Controlled Case Series (SCCS) method, which had previously lacked such guarantees.


Article 1042

Title@2025-07-25 (5): PLEIADES: Building Temporal Kernels with Orthogonal Polynomials

Title: PLEIADES: Building Temporal Kernels with Orthogonal Polynomials

PLEIADES: Bau von Temporalkernen mit orthogonalen Polynomen

PIADES:用矫形聚合体建造时空中枢 2405.12179v4

Authors (2): Yan Ru Pei, Olivier Coenen

We introduce a class of neural networks named PLEIADES (PoLynomial Expansion In Adaptive Distributed Event-based Systems), which contains temporal convolution kernels generated from orthogonal polynomial basis functions. We focus on interfacing these networks with event-based data to perform online spatiotemporal classification and detection with low latency. By virtue of using structured temporal kernels and event-based data, we have the freedom to vary the sample rate of the data along with the discretization step-size of the network without additional finetuning. We experimented with three event-based benchmarks and obtained state-of-the-art results on all three by large margins with significantly smaller memory and compute costs. We achieved: 1) 99.59% accuracy with 192K parameters on the DVS128 hand gesture recognition dataset and 100% with a small additional output filter; 2) 99.58% test accuracy with 277K parameters on the AIS 2024 eye tracking challenge; and 3) 0.556 mAP with 576k parameters on the PROPHESEE 1 Megapixel Automotive Detection Dataset.
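
The claim that a structured temporal kernel can be re-discretized without finetuning can be made concrete with a small sketch. It assumes a Legendre basis on [-1, 1] and midpoint sampling; the abstract does not say which orthogonal polynomials or sampling scheme PLEIADES uses, and the names `legendre` and `temporal_kernel` are hypothetical.

```python
def legendre(n, x):
    """Legendre polynomial P_n(x) via Bonnet's recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def temporal_kernel(coeffs, num_taps):
    """Sample the continuous kernel sum_n coeffs[n] * P_n(t) at num_taps midpoints."""
    kernel = []
    for i in range(num_taps):
        t = -1.0 + (2 * i + 1) / num_taps  # midpoint of the i-th bin on [-1, 1]
        kernel.append(sum(c * legendre(n, t) for n, c in enumerate(coeffs)))
    return kernel


# The coefficients play the role of the learned parameters (values are made up).
coeffs = [0.2, -0.5, 0.3]
coarse = temporal_kernel(coeffs, 8)   # kernel for a coarse time step
fine = temporal_kernel(coeffs, 16)    # same coefficients, finer time step
print(len(coarse), len(fine))
```

Because the parameters are polynomial coefficients rather than per-tap weights, the same `coeffs` yield a kernel at any resolution. A real implementation would likely also rescale the taps by the step size so the discrete convolution approximates the same integral; that detail is left out here.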


Article 1043

Title@2025-07-25 (5): Bridging Quantum and Classical Computing in Drug Design: Architecture Principles for Improved Molecule Generation

Title: Bridging Quantum and Classical Computing in Drug Design: Architecture Principles for Improved Molecule Generation

Bridging Quantum and Classical Computing in Drug Design: Architektur-Prinzipien für verbesserte Molekül-Generierung

在药物设计中架桥量计算和古典计算:改进分子生成的建筑原则 2506.01177v2

Authors (2): Andrew Smith, Erhan Guven

Hybrid quantum-classical machine learning offers a path to leverage noisy intermediate-scale quantum (NISQ) devices for drug discovery, but optimal model architectures remain unclear. We systematically optimize the quantum-classical bridge architecture of generative adversarial networks (GANs) for molecule discovery using multi-objective Bayesian optimization. Our optimized model (BO-QGAN) significantly improves performance, achieving a 2.27-fold higher Drug Candidate Score (DCS) than prior quantum-hybrid benchmarks and 2.21-fold higher than the classical baseline, while reducing parameter count by more than 60%. Key findings favor layering multiple (3-4) shallow (4-8 qubit) quantum circuits sequentially, while classical architecture shows less sensitivity above a minimum capacity. This work provides the first empirically-grounded architectural guidelines for hybrid models, enabling more effective integration of current quantum computers into pharmaceutical research pipelines.


Article 1044

Title@2025-07-25 (5): A Survey on State-of-the-art Deep Learning Applications and Challenges

Title: A Survey on State-of-the-art Deep Learning Applications and Challenges

Eine Umfrage zu aktuellen Anwendungen und Herausforderungen des Deep Learning

关于最先进的深深学习应用和挑战的调查 2403.17561v9

Authors (2): Mohd Halim Mohd Noor, Ayokunle Olalekan Ige

Deep learning, a branch of artificial intelligence, is a data-driven method that uses multiple layers of interconnected units or neurons to learn intricate patterns and representations directly from raw input data. Empowered by this learning capability, it has become a powerful tool for solving complex problems and is the core driver of many groundbreaking technologies and innovations. Building a deep learning model is challenging due to the algorithm’s complexity and the dynamic nature of real-world problems. Several studies have reviewed deep learning concepts and applications. However, the studies mostly focused on the types of deep learning models and convolutional neural network architectures, offering limited coverage of the state-of-the-art deep learning models and their applications in solving complex problems across different domains. Therefore, motivated by the limitations, this study aims to comprehensively review the state-of-the-art deep learning models in computer vision, natural language processing, time series analysis and pervasive computing, and robotics. We highlight the key features of the models and their effectiveness in solving the problems within each domain. Furthermore, this study presents the fundamentals of deep learning, various deep learning model types and prominent convolutional neural network architectures. Finally, challenges and future directions in deep learning research are discussed to offer a broader perspective for future researchers.


Article 1045

Title@2025-07-25 (5): Why Isn’t Relational Learning Taking Over the World?

Title: Why Isn’t Relational Learning Taking Over the World?

Warum übernimmt das relationale Lernen nicht die Welt?

为什么关系学习不超越世界? 2507.13558v2

Authors (1): David Poole

AI seems to be taking over the world with systems that model pixels, words, and phonemes. The world is arguably made up, not of pixels, words, and phonemes but of entities (objects, things, including events) with properties and relations among them. Surely we should model these, not the perception or description of them. You might suspect that concentrating on modeling words and pixels is because all of the (valuable) data in the world is in terms of text and images. If you look into almost any company you will find their most valuable data is in spreadsheets, databases and other relational formats. These are not the forms studied in introductory machine learning, but are full of product numbers, student numbers, transaction numbers and other identifiers that can’t be interpreted naively as numbers. The field that studies this sort of data has various names including relational learning, statistical relational AI, and many others. This paper explains why relational learning is not taking over the world – except in a few cases with restricted relations – and what needs to be done to bring it to its rightful prominence.


Article 1046

Title@2025-07-25 (5): VIBE: Video-Input Brain Encoder for fMRI Response Modeling

Title: VIBE: Video-Input Brain Encoder for fMRI Response Modeling

VIBE: Video-Input Gehirnencoder für fMRI Response Modeling

VIBE: 用于FMRI反应建模的视频投入大脑编码器 2507.17958v2

Authors (6): Daniel Carlström Schad, Shrey Dixit, Janis Keck, Viktor Studenyak, Aleksandr Shpilevoi, Andrej Bicanski

We present VIBE, a two-stage Transformer that fuses multi-modal video, audio, and text features to predict fMRI activity. Representations from open-source models (Qwen2.5, BEATs, Whisper, SlowFast, V-JEPA) are merged by a modality-fusion transformer and temporally decoded by a prediction transformer with rotary embeddings. Trained on 65 hours of movie data from the CNeuroMod dataset and ensembled across 20 seeds, VIBE attains mean parcel-wise Pearson correlations of 0.3225 on in-distribution Friends S07 and 0.2125 on six out-of-distribution films. An earlier iteration of the same architecture obtained 0.3198 and 0.2096, respectively, winning Phase-1 and placing second overall in the Algonauts 2025 Challenge.


Article 1047

Title@2025-07-25 (5): Value-Based Deep RL Scales Predictably

Title: Value-Based Deep RL Scales Predictably

Wertbasierte Tiefen-RL-Skalen vorausschauend

基于价值的深 RL 尺度 2502.04327v2

Authors (7): Oleh Rybkin, Michal Nauman, Preston Fu, Charlie Snell, Pieter Abbeel, Sergey Levine, Aviral Kumar

Scaling data and compute is critical to the success of modern ML. However, scaling demands predictability: we want methods to not only perform well with more compute or data, but also have their performance be predictable from small-scale runs, without running the large-scale experiment. In this paper, we show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior. First, we show that data and compute requirements to attain a given performance level lie on a Pareto frontier, controlled by the updates-to-data (UTD) ratio. By estimating this frontier, we can predict the data requirement when given more compute, and the compute requirement when given more data. Second, we determine the optimal allocation of a total resource budget across data and compute for a given performance and use it to determine hyperparameters that maximize performance for a given budget. Third, this scaling is enabled by first estimating predictable relationships between hyperparameters, which are used to manage effects of overfitting and plasticity loss unique to RL. We validate our approach using three algorithms (SAC, BRO, and PQL) on DeepMind Control, OpenAI Gym, and IsaacGym, when extrapolating to higher levels of data, compute, budget, or performance.
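
Predicting a large-scale requirement from small-scale runs is often done by fitting a simple curve and extrapolating. The sketch below assumes a pure power law between the UTD ratio and the data needed to reach a fixed performance level; both that functional form and the numbers are assumptions made for illustration, and `fit_power_law` and `predict` are hypothetical names, not the paper's code.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


def predict(a, b, x):
    """Evaluate the fitted power law at x."""
    return a * x ** b


# Synthetic small-scale "runs": data needed at each UTD ratio (made-up values).
utd = [1, 2, 4, 8]
data_needed = [3.0 * u ** -0.5 for u in utd]

a, b = fit_power_law(utd, data_needed)
print(f"a={a:.3f} b={b:.3f} extrapolated={predict(a, b, 64):.3f}")
```

Real measurements are noisy, so a fit like this would normally be checked on held-out larger runs before it is trusted for extrapolation.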


Article 1048

Title@2025-07-25 (5): Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise

Title: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise

Individueller Intrinsischer Lohn im Mehr-Agenten-Verstärkungs-Lernen durch Einbeziehung allgemeiner menschlicher Expertise

通过纳入通用的人类专门知识,学习多机构加强学习中的个人内在奖赏 2507.18867v1

Authors (4): Xuefei Wu, Xiao Yin, Yuanyang Zhu, Chunlin Chen

Efficient exploration in multi-agent reinforcement learning (MARL) is a challenging problem when receiving only a team reward, especially in environments with sparse rewards. A powerful method to mitigate this issue involves crafting dense individual rewards to guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, and thus perform less effectively than humans at learning and generalization in complex problems. To tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner. LIGHT guides each agent to avoid unnecessary exploration by considering both the individual action distribution and the human expertise preference distribution. Then, LIGHT designs individual intrinsic rewards for each agent based on actionable representational transformation relevant to Q-learning, so that the agents align their action preferences with the human expertise while maximizing the joint action value. Experimental results demonstrate the superiority of our method over representative baselines in performance and knowledge reusability across different sparse-reward tasks in challenging scenarios.


Article 1049

Title@2025-07-25 (5): Estimation of conditional average treatment effects on distributed confidential data

Title: Estimation of conditional average treatment effects on distributed confidential data

Schätzung der bedingten durchschnittlichen Behandlungseffekte auf verteilte vertrauliche Daten

对分发的机密数据进行有条件平均待遇影响的估计 2402.02672v5

Authors (5): Yuji Kawamata, Ryoki Motai, Yukihiko Okada, Akira Imakura, Tetsuya Sakurai

The estimation of conditional average treatment effects (CATEs) is an important topic in many scientific fields. CATEs can be estimated with high accuracy if data distributed across multiple parties are centralized. However, it is difficult to aggregate such data owing to confidentiality or privacy concerns. To address this issue, we propose data collaboration double machine learning, a method for estimating CATE models using privacy-preserving fusion data constructed from distributed sources, and evaluate its performance through simulations. We make three main contributions. First, our method enables estimation and testing of semi-parametric CATE models without iterative communication on distributed data, providing robustness to model mis-specification compared to parametric approaches. Second, it enables collaborative estimation across different time points and parties by accumulating a knowledge base. Third, our method performs as well as or better than existing methods in simulations using synthetic, semi-synthetic, and real-world datasets.


Article 1050

Title@2025-07-25 (5): Early Mortality Prediction in ICU Patients with Hypertensive Kidney Disease Using Interpretable Machine Learning

Title: Early Mortality Prediction in ICU Patients with Hypertensive Kidney Disease Using Interpretable Machine Learning

Frühe Mortalitätsvorhersage bei Patienten mit hypertensiver Nierenerkrankung unter Verwendung eines interpretierbaren maschinellen Lernens

使用可解释机器学习方法对伊斯兰法院联盟高血压肾脏疾病患者进行早期死亡率预测 2507.18866v1

Authors (9): Yong Si, Junyi Fan, Li Sun, Shuheng Chen, Minoo Ahmadi, Elham Pishgar, Kamiar Alaei, Greg Placencia, Maryam Pishgar

Background: Hypertensive kidney disease (HKD) patients in intensive care units (ICUs) face high short-term mortality, but tailored risk prediction tools are lacking. Early identification of high-risk individuals is crucial for clinical decision-making. Methods: We developed a machine learning framework to predict 30-day in-hospital mortality among ICU patients with HKD using early clinical data from the MIMIC-IV v2.2 database. A cohort of 1,366 adults was curated with strict criteria, excluding malignancy cases. Eighteen clinical features (including vital signs, labs, comorbidities, and therapies) were selected via random forest importance and mutual information filtering. Several models were trained and compared with stratified five-fold cross-validation; CatBoost demonstrated the best performance. Results: CatBoost achieved an AUROC of 0.88 on the independent test set, with sensitivity of 0.811 and specificity of 0.798. SHAP values and Accumulated Local Effects (ALE) plots showed the model relied on meaningful predictors such as altered consciousness, vasopressor use, and coagulation status. Additionally, the DREAM algorithm was integrated to estimate patient-specific posterior risk distributions, allowing clinicians to assess both predicted mortality and its uncertainty. Conclusions: We present an interpretable machine learning pipeline for early, real-time risk assessment in ICU patients with HKD. By combining high predictive performance with uncertainty quantification, our model supports individualized triage and transparent clinical decisions. This approach shows promise for clinical deployment and merits external validation in broader critical care populations.
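
For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and AUROC are computed from labels and predicted risks. The toy labels, scores, and the 0.5 threshold are invented for illustration and have no connection to the MIMIC-IV cohort; the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = died within 30 days, 0 = survived (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical decision threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, auroc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the abstract reports all three.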


Article 1051

Title@2025-07-25 (5): PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning

Title: PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning

PrismRAG: Steigerung der RAG-Faktizität mit Distraktorresilienz und geschichteter Vernunft

PrismRAG:提高RAG事实质量,使其具有抗力和策略性合理性 2507.18857v1

Authors (13): Mohammad Kachuee, Teja Gollapudi, Minseok Kim, Yin Huang, Kai Sun, Xiao Yang, Jiaqi Wang, Nirav Shah, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, Xin Luna Dong

Retrieval-augmented generation (RAG) often falls short when retrieved context includes confusing semi-relevant passages, or when answering questions requires deep contextual understanding and reasoning. We propose an efficient fine-tuning framework, called PrismRAG, that (i) trains the model with distractor-aware QA pairs mixing gold evidence with subtle distractor passages, and (ii) instills reasoning-centric habits that make the LLM plan, rationalize, and synthesize without relying on extensive human-engineered instructions. Evaluated across 12 open-book RAG QA benchmarks spanning diverse application domains and scenarios, PrismRAG improves average factuality by 5.4%, outperforming state-of-the-art solutions.


Article 1052

Title@2025-07-25 (5): Resonant-Tunnelling Diode Reservoir Computing System for Image Recognition

Title: Resonant-Tunnelling Diode Reservoir Computing System for Image Recognition

Resonant-Tunnelling Diode Reservoir Computing System für die Bilderkennung

图像识别共振二氧化二氮储量计算系统 2507.15158v2

Authors (3): A. H. Abbas, Hend Abdel-Ghani, Ivan S. Maksymov

As artificial intelligence continues to push into real-time, edge-based and resource-constrained environments, there is an urgent need for novel, hardware-efficient computational models. In this study, we present and validate a neuromorphic computing architecture based on resonant-tunnelling diodes (RTDs), which exhibit the nonlinear characteristics ideal for physical reservoir computing (RC). We theoretically formulate and numerically implement an RTD-based RC system and demonstrate its effectiveness on two image recognition benchmarks: handwritten digit classification and object recognition using the Fruit 360 dataset. Our results show that this circuit-level architecture delivers promising performance while adhering to the principles of next-generation RC – eliminating random connectivity in favour of a deterministic nonlinear transformation of input signals.


Article 1053

Title@2025-07-24 (4): Optimizing Metachronal Paddling with Reinforcement Learning at Low Reynolds Number

Title: Optimizing Metachronal Paddling with Reinforcement Learning at Low Reynolds Number

Optimierung des Metachronal-Paddelns mit Verstärkungs-Lernen bei niedriger Reynolds-Zahl

优化低Reynolds 数字加固学习的比数倾斜 2507.18849v1

Authors (2): Alana A. Bailey, Robert D. Guy

Metachronal paddling is a swimming strategy in which an organism oscillates sets of adjacent limbs with a constant phase lag, propagating a metachronal wave through its limbs and propelling it forward. This limb coordination strategy is utilized by swimmers across a wide range of Reynolds numbers, which suggests that this metachronal rhythm was selected for its optimality of swimming performance. In this study, we apply reinforcement learning to a swimmer at zero Reynolds number and investigate whether the learning algorithm selects this metachronal rhythm, or if other coordination patterns emerge. We design the swimmer agent with an elongated body and pairs of straight, inflexible paddles placed along the body for various fixed paddle spacings. Based on paddle spacing, the swimmer agent learns qualitatively different coordination patterns. At tight spacings, a back-to-front metachronal wave-like stroke emerges which resembles the commonly observed biological rhythm, but at wide spacings, different limb coordinations are selected. Across all resulting strokes, the fastest stroke depends on the number of paddles; however, the most efficient stroke is a back-to-front wave-like stroke regardless of the number of paddles.


Article 1054

Title@2025-07-24 (4): Low-Rank Thinning

Title: Low-Rank Thinning

Low-Rank Thinning

低兰氏度 2502.12063v7

Authors (5): Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey

The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.


Article 1055

Title@2025-07-24 (4): Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

Title: Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

Perturbationseffiziente Zeroth-Order-Optimierung für hardwarefreundliches On-Device-Training

方便硬件的硬件设备培训优化 2504.20314v2

Authors (13): Qitao Tan, Sung-En Chang, Rui Xia, Huidong Ji, Chence Yang, Ci Zhang, Jun Liu, Zheng Zhan, Zhenman Fang, Zhou Zou, Yanzhi Wang, Jin Lu, Geng Yuan

Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings. However, this seemingly promising approach faces a significant and long-ignored challenge. ZO requires generating a substantial number of Gaussian random numbers, which poses significant difficulties for hardware platforms such as FPGAs and ASICs and can even make ZO infeasible on them. In this paper, we identify this critical issue, which arises from the mismatch between algorithm and hardware designers. To address this issue, we propose PeZO, a perturbation-efficient ZO framework. Specifically, we design random number reuse strategies to significantly reduce the demand for random number generation and introduce a hardware-friendly adaptive scaling method to replace the costly Gaussian distribution with a uniform distribution. Our experiments show that PeZO reduces the required LUTs and FFs for random number generation by 48.6\% and 12.7\%, and saves up to 86\% of power consumption, all without compromising training performance, making ZO optimization feasible for on-device training. To the best of our knowledge, we are the first to explore the potential of on-device ZO optimization, providing valuable insights for future research.
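
The two ideas named in the abstract above, reusing a small pool of random numbers and drawing perturbations from a uniform rather than a Gaussian distribution, can be illustrated with a minimal sketch. This is not the PeZO implementation: the pool layout, the cyclic read with an offset, the function names, and the fixed scaling to unit variance are all assumptions made for illustration (the abstract describes the scaling as adaptive but gives no details). The gradient estimate is the standard two-point zeroth-order estimator.

```python
import random
from typing import Callable, List, Sequence

# Uniform(-sqrt(3), sqrt(3)) has zero mean and unit variance, which matches
# the first two moments of a standard Gaussian (illustrative choice).
SQRT3 = 3 ** 0.5


def make_pool(size: int, seed: int = 0) -> List[float]:
    """Pre-generate a small pool of uniform random numbers (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.uniform(-SQRT3, SQRT3) for _ in range(size)]


def perturbation(pool: Sequence[float], offset: int, dim: int) -> List[float]:
    """Read `dim` values from the pool cyclically, starting at `offset`.

    Changing `offset` between steps reuses the same pool instead of
    generating `dim` fresh random numbers each time.
    """
    return [pool[(offset + i) % len(pool)] for i in range(dim)]


def zo_gradient(f: Callable[[Sequence[float]], float], x: Sequence[float],
                u: Sequence[float], eps: float = 1e-3) -> List[float]:
    """Two-point zeroth-order estimate: ((f(x+eps*u) - f(x-eps*u)) / (2*eps)) * u."""
    x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
    x_minus = [xi - eps * ui for xi, ui in zip(x, u)]
    scale = (f(x_plus) - f(x_minus)) / (2 * eps)
    return [scale * ui for ui in u]
```

Only forward evaluations of `f` are needed, which is where the memory savings of ZO training come from; the pool keeps the number of generated random values independent of the parameter count.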


Article 1056

Title@2025-07-24 (4): PIPA: Preference Alignment as Prior-Informed Statistical Estimation

Title: PIPA: Preference Alignment as Prior-Informed Statistical Estimation

PIPA: Präferenz-Ausrichtung als vorherinformierte statistische Schätzung

PIPA: 优先一致,作为先前不完善的统计估计 2502.05773v2

Authors (3): Junbo Li, Zhangyang Wang, Qiang Liu

Offline preference alignment for language models such as Direct Preference Optimization (DPO) is favored for its effectiveness and simplicity, eliminating the need for costly reinforcement learning. Various offline algorithms have been developed for different data settings, yet they lack a unified understanding. In this study, we introduce Prior-Informed Preference Alignment (PIPA), a unified, RL-free probabilistic framework that formulates language model preference alignment as a Maximum Likelihood Estimation (MLE) problem with prior constraints. This method effectively accommodates both paired and unpaired data, as well as answer and step-level annotations. We illustrate that DPO and KTO are special cases with different prior constraints within our framework. By integrating different types of prior information, we developed two variations of PIPA: PIPA-M and PIPA-N. Both algorithms demonstrate a $3\sim10\%$ performance enhancement on the GSM8K and MATH benchmarks across all configurations, achieving these gains without additional training or computational costs compared to existing algorithms.


Article 1057

Title@2025-07-24 (4): R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

Title: R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

R-Stitch: Dynamische Trajektorien-Stitching für effiziente Vernunft

R-Stitch: 高效理性的动态轨迹切换 2507.17307v2

Authors (6): Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, Bohan Zhuang

Chain-of-thought (CoT) reasoning enhances the problem-solving capabilities of large language models by encouraging step-by-step intermediate reasoning during inference. While effective, CoT introduces substantial computational overhead due to its reliance on autoregressive decoding over long token sequences. Existing acceleration strategies either reduce sequence length through early stopping or compressive reward designs, or improve decoding speed via speculative decoding with smaller models. However, speculative decoding suffers from limited speedup when the agreement between small and large models is low, and fails to exploit the potential advantages of small models in producing concise intermediate reasoning. In this paper, we present R-Stitch, a token-level, confidence-based hybrid decoding framework that accelerates CoT inference by switching between a small language model (SLM) and a large language model (LLM) along the reasoning trajectory. R-Stitch uses the SLM to generate tokens by default and delegates to the LLM only when the SLM’s confidence falls below a threshold. This design avoids full-sequence rollback and selectively invokes the LLM on uncertain steps, preserving both efficiency and answer quality. R-Stitch is model-agnostic, training-free, and compatible with standard decoding pipelines. Experiments on math reasoning benchmarks demonstrate that R-Stitch achieves up to 85\% reduction in inference latency with negligible accuracy drop, highlighting its practical effectiveness in accelerating CoT reasoning.
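
The default-to-SLM, delegate-on-low-confidence rule described above can be sketched as a short decoding loop. This is a minimal illustration, not the R-Stitch code: the `Model` interface (a callable returning the next token and a confidence score), the threshold value, and the function name are hypothetical, and the sketch makes a fresh routing decision at every token rather than modelling any switching policy beyond what the abstract states.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: given the tokens so far, a model returns
# (next_token, confidence in that token).
Model = Callable[[List[str]], Tuple[str, float]]


def hybrid_decode(slm: Model, llm: Model, prompt: List[str],
                  threshold: float = 0.8, max_new_tokens: int = 16,
                  eos: str = "<eos>") -> Tuple[List[str], int]:
    """Decode with the small model by default; call the large model only
    when the small model's confidence falls below `threshold`.

    Returns the generated tokens and the number of large-model calls.
    """
    tokens = list(prompt)
    llm_calls = 0
    for _ in range(max_new_tokens):
        token, confidence = slm(tokens)
        if confidence < threshold:
            # Uncertain step: replace only this token, with no rollback
            # of the tokens already accepted.
            token, _ = llm(tokens)
            llm_calls += 1
        tokens.append(token)
        if token == eos:
            break
    return tokens[len(prompt):], llm_calls
```

The returned call count makes the trade-off visible: raising `threshold` sends more steps to the large model, lowering it keeps more of the trajectory on the small one.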


Article 1058

Title@2025-07-24 (4): Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

Title: Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

Auf dem Weg zur Verbesserung des Belohnungsdesigns in RL: Ein Reward Alignment Metric für RL-Praktizierende

努力改进RL的奖励设计:为RL开业医生的奖励调整计量 2503.05996v2

Authors (6): Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor

Reinforcement learning agents are fundamentally limited by the quality of the reward functions they learn from, yet reward design is often overlooked under the assumption that a well-defined reward is readily available. However, in practice, designing rewards is difficult, and even when specified, evaluating their correctness is equally problematic: how do we know if a reward function is correctly specified? In our work, we address these challenges by focusing on reward alignment – assessing whether a reward function accurately encodes the preferences of a human stakeholder. As a concrete measure of reward alignment, we introduce the Trajectory Alignment Coefficient to quantify the similarity between a human stakeholder’s ranking of trajectory distributions and those induced by a given reward function. We show that the Trajectory Alignment Coefficient exhibits desirable properties, such as not requiring access to a ground truth reward, invariance to potential-based reward shaping, and applicability to online RL. Additionally, in an 11-person user study of RL practitioners, we found that access to the Trajectory Alignment Coefficient during reward selection led to statistically significant improvements. Compared to relying only on reward functions, our metric reduced cognitive workload by 1.5x, was preferred by 82% of users and increased the success rate of selecting reward functions that produced performant policies by 41%.
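
To make the idea of comparing a human ranking with a reward-induced ranking concrete, the sketch below computes a Kendall-tau-style agreement score over trajectories. The abstract does not give the formula for the Trajectory Alignment Coefficient, so the pairwise concordance measure, the function name, and the use of scalar scores per trajectory are assumptions chosen only to illustrate rank agreement.

```python
from itertools import combinations
from typing import Sequence


def rank_agreement(human_scores: Sequence[float],
                   reward_returns: Sequence[float]) -> float:
    """Kendall-tau-style agreement in [-1, 1] between two orderings.

    `human_scores[i]` is the stakeholder's preference score for trajectory i;
    `reward_returns[i]` is the return the candidate reward assigns to it.
    A pair counts as concordant when both orderings agree on which of the
    two trajectories is better; tied pairs count as neither.
    """
    n = len(human_scores)
    if n != len(reward_returns) or n < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (human_scores[i] - human_scores[j]) * (reward_returns[i] - reward_returns[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A score of 1 means the reward orders every pair as the human does, and -1 means it reverses every pair. No ground-truth reward is needed, only the two orderings.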


Article 1059

Title@2025-07-24 (4): RedactOR: An LLM-Powered Framework for Automatic Clinical Data De-Identification

Title: RedactOR: An LLM-Powered Framework for Automatic Clinical Data De-Identification

RedactOR: Ein LLM-Powered Framework für die automatische De-Identifikation klinischer Daten

重编:一个LLM授权的自动临床数据识别框架 2505.18380v2

Authors (7): Praphul Singh, Charlotte Dzialo, Jangwon Kim, Sumana Srivatsa, Irfan Bulu, Sri Gadde, Krishnaram Kenthapadi

Ensuring clinical data privacy while preserving utility is critical for AI-driven healthcare and data analytics. Existing de-identification (De-ID) methods, including rule-based techniques, deep learning models, and large language models (LLMs), often suffer from recall errors, limited generalization, and inefficiencies, limiting their real-world applicability. We propose a fully automated, multi-modal framework, RedactOR, for de-identifying structured and unstructured electronic health records, including clinical audio records. Our framework employs cost-efficient De-ID strategies, including intelligent routing, hybrid rule- and LLM-based approaches, and a two-step audio redaction approach. We present a retrieval-based entity relexicalization approach to ensure consistent substitutions of protected entities, thereby enhancing data coherence for downstream applications. We discuss key design desiderata, de-identification and relexicalization methodology, and modular architecture of RedactOR and its integration with the Oracle Health Clinical AI system. Evaluated on the i2b2 2014 De-ID dataset using standard metrics with strict recall, our approach achieves competitive performance while optimizing token usage to reduce LLM costs. Finally, we discuss key lessons and insights from deployment in real-world AI-driven healthcare data pipelines.


Article 1060

Title@2025-07-24 (4): Toward Super Agent System with Hybrid AI Routers

Title: Toward Super Agent System with Hybrid AI Routers

Auf dem Weg zum Super Agent System mit Hybrid-KI Routern

向超级代理系统过渡 2504.10519v2

Authors (8): Yuhang Yao, Haixin Wang, Yibo Chen, Jiawen Wang, Min Chang Jordan Ren, Bosheng Ding, Salman Avestimehr, Chaoyang He

AI Agents powered by Large Language Models are transforming the world through a vast number of applications. A super agent has the potential to fulfill diverse user needs, such as summarization, coding, and research, by accurately understanding user intent and leveraging the appropriate tools to solve tasks. However, to make such an agent viable for real-world deployment and accessible at scale, significant optimizations are required to ensure high efficiency and low cost. This position paper presents a design of the Super Agent System powered by hybrid AI routers. Upon receiving a user prompt, the system first detects the intent of the user, then routes the request to specialized task agents with the necessary tools or automatically generates agentic workflows. In practice, most applications directly serve as AI assistants on edge devices such as phones and robots. As different language models vary in capability and cloud-based models often entail high computational costs, latency, and privacy concerns, we then explore the hybrid mode where the router dynamically selects between local and cloud models based on task complexity. Finally, we introduce the blueprint of an on-device super agent enhanced with the cloud. With advances in multi-modality models and edge hardware, we envision that most computations can be handled locally, with cloud collaboration only as needed. Such an architecture paves the way for super agents to be seamlessly integrated into everyday life in the near future.


Article 1061

Title@2025-07-24 (4): LeanKAN: A Parameter-Lean Kolmogorov-Arnold Network Layer with Improved Memory Efficiency and Convergence Behavior

Title: LeanKAN: A Parameter-Lean Kolmogorov-Arnold Network Layer with Improved Memory Efficiency and Convergence Behavior

LeanKAN: Eine Parameter-Lean Kolmogorov-Arnold Netzwerkschicht mit verbesserter Speichereffizienz und Konvergenzverhalten

LeanKAN: 提高记忆效率和一致行为达标的Lean Kolmogorov-Arnold网络层 2502.17844v2

Authors (3): Benjamin C. Koenig, Suyong Kim, Sili Deng

The recently proposed Kolmogorov-Arnold network (KAN) is a promising alternative to multi-layer perceptrons (MLPs) for data-driven modeling. While original KAN layers were only capable of representing the addition operator, the recently proposed MultKAN layer combines addition and multiplication subnodes in an effort to improve representation performance. Here, we find that MultKAN layers suffer from a few key drawbacks including limited applicability in output layers, bulky parameterizations with extraneous activations, and the inclusion of complex hyperparameters. To address these issues, we propose LeanKANs, a direct and modular replacement for MultKAN and traditional AddKAN layers. LeanKANs address these three drawbacks of MultKAN through general applicability as output layers, significantly reduced parameter counts for a given network structure, and a smaller set of hyperparameters. As a one-to-one layer replacement for standard AddKAN and MultKAN layers, LeanKAN is able to provide these benefits to traditional KAN learning problems as well as augmented KAN structures in which it serves as the backbone, such as KAN Ordinary Differential Equations (KAN-ODEs) or Deep Operator KANs (DeepOKAN). We demonstrate LeanKAN’s simplicity and efficiency in a series of experiments covering a standard KAN toy problem as well as ordinary and partial differential equations learned via KAN-ODEs, where we find that its sparser parameterization and compact structure serve to increase its expressivity and learning capability, leading it to outperform similar and even much larger MultKANs in various tasks.


Article 1062

Title@2025-07-24 (4): RealDeal: Enhancing Realism and Details in Brain Image Generation via Image-to-Image Diffusion Models

Title: RealDeal: Enhancing Realism and Details in Brain Image Generation via Image-to-Image Diffusion Models

RealDeal: Realismus und Details in der Gehirnbildgenerierung durch Image-to-Image-Diffusionsmodelle verbessern

Real Deal:通过图像到图像传播模型,加强脑图像生成的现实和细节 2507.18830v1

Authors (5): Shen Zhu, Yinzhu Jin, Tyler Spears, Ifrah Zawar, P. Thomas Fletcher

We propose image-to-image diffusion models that are designed to enhance the realism and details of generated brain images by introducing sharp edges, fine textures, subtle anatomical features, and imaging noise. Generative models have been widely adopted in the biomedical domain, especially in image generation applications. Latent diffusion models achieve state-of-the-art results in generating brain MRIs. However, due to latent compression, generated images from these models are overly smooth, lacking fine anatomical structures and scan acquisition noise that are typically seen in real images. This work formulates the realism-enhancing and detail-adding process as image-to-image diffusion models, which refine the quality of LDM-generated images. We employ commonly used metrics like FID and LPIPS for image realism assessment. Furthermore, we introduce new metrics to demonstrate the realism of images generated by RealDeal in terms of image noise distribution, sharpness, and texture.


Article 1063

Title@2025-07-24 (4): CueBuddy: helping non-native English speakers navigate English-centric STEM education

Title: CueBuddy: helping non-native English speakers navigate English-centric STEM education

CueBuddy: Hilfe für nicht-native englische Referenten navigieren Englisch-centric STEM Bildung

CueBuddy:帮助非母语英语者掌握以英语为中心的STEM教育 2507.18827v1

Authors (1): Pranav Gupta

Students across the world in STEM classes, especially in the Global South, fall behind their peers who are more fluent in English, despite being on par with them in terms of scientific prerequisites. While many of them are able to follow everyday English with ease, key terms in English remain challenging. In most cases, such students have had most of their course prerequisites in a lower resource language. Live speech translation to lower resource languages is a promising area of research; however, models for speech translation can be too expensive on a large scale and often struggle with technical content. In this paper, we describe CueBuddy, which aims to remediate these issues by providing real-time “lexical cues” through technical keyword spotting along with real-time multilingual glossary lookup to help students stay up to speed with complex English jargon without disrupting their concentration on the lecture. We also describe the limitations and future extensions of our approach.


Article 1064

Title@2025-07-24 (4): Scale-Consistent Learning for Partial Differential Equations

Title: Scale-Consistent Learning for Partial Differential Equations

Scale-Consistent Learning für partielle Differentialgleichungen

部分差异等量的规模一致学习 2507.18813v1

Authors (7): Zongyi Li, Samuel Lanthaler, Catherine Deng, Michael Chen, Yixuan Wang, Kamyar Azizzadenesheli, Anima Anandkumar

Machine learning (ML) models have emerged as a promising approach for solving partial differential equations (PDEs) in science and engineering. Previous ML models typically cannot generalize outside the training data; for example, a trained ML model for the Navier-Stokes equations only works for a fixed Reynolds number ($Re$) on a pre-defined domain. To overcome these limitations, we propose a data augmentation scheme based on scale-consistency properties of PDEs and design a scale-informed neural operator that can model a wide range of scales. Our formulation leverages two facts: (i) PDEs can be rescaled, or more concretely, a given domain can be re-scaled to unit size, and the parameters and the boundary conditions of the PDE can be appropriately adjusted to represent the original solution, and (ii) the solution operators on a given domain are consistent on the sub-domains. We leverage these facts to create a scale-consistency loss that encourages matching the solution evaluated on a given domain and the solution obtained on its sub-domain from the rescaled PDE. Since neural operators can fit to multiple scales and resolutions, they are the natural choice for incorporating scale-consistency loss during training of neural PDE solvers. We experiment with scale-consistency loss and the scale-informed neural operator model on the Burgers’ equation, Darcy Flow, Helmholtz equation, and Navier-Stokes equations. With scale-consistency, the model trained on $Re$ of 1000 can generalize to $Re$ ranging from 250 to 10000, and reduces the error by 34% on average across all datasets compared to baselines.


Article 1065

Title@2025-07-24 (4): Even Faster Simulations with Flow Matching: A Study of Zero Degree Calorimeter Responses

Title: Even Faster Simulations with Flow Matching: A Study of Zero Degree Calorimeter Responses

Noch schnellere Simulationen mit Flow Matching: Eine Studie mit Null-Grad-Kalorimeter-Antworten

更快速的模拟流程匹配模拟:零度卡拉里米反应研究 2507.18811v1

Authors (1): Maksymilian Wojnar

Recent advances in generative neural networks, particularly flow matching (FM), have enabled the generation of high-fidelity samples while significantly reducing computational costs. A promising application of these models is accelerating simulations in high-energy physics (HEP), helping research institutions meet their increasing computational demands. In this work, we leverage FM to develop surrogate models for fast simulations of zero degree calorimeters in the ALICE experiment. We present an effective training strategy that enables the training of fast generative models with an exceptionally low number of parameters. This approach achieves state-of-the-art simulation fidelity for both neutron (ZN) and proton (ZP) detectors, while offering substantial reductions in computational costs compared to existing methods. Our FM model achieves a Wasserstein distance of 1.27 for the ZN simulation with an inference time of 0.46 ms per sample, compared to the current best of 1.20 with an inference time of approximately 109 ms. The latent FM model further improves the inference speed, reducing the sampling time to 0.026 ms per sample, with a minimal trade-off in accuracy. Similarly, our approach achieves a Wasserstein distance of 1.30 for the ZP simulation, outperforming the current best of 2.08. The source code is available at https://github.com/m-wojnar/faster_zdc.


Article 1066

Title@2025-07-24 (4): Curiosity Driven Exploration to Optimize Structure-Property Learning in Microscopy

Title: Curiosity Driven Exploration to Optimize Structure-Property Learning in Microscopy

Neugier trieb die Exploration an, um Struktur-Eigentums-Lernen in der Mikroskopie zu optimieren

优化微观分析中结构-财产学习的探索 2504.20011v2

Authors (8): Aditya Vatsavai, Ganesh Narasimha, Yongtao Liu, Jawad Chowdhury, Jan-Chi Yang, Hiroshi Funakubo, Maxim Ziatdinov, Rama Vasudevan

Rapidly determining structure-property correlations in materials is an important challenge: it supports a better understanding of fundamental mechanisms and greatly assists materials design. In microscopy, imaging data provides a direct measurement of the local structure, while spectroscopic measurements provide relevant functional property information. Deep kernel active learning approaches have been utilized to rapidly map local structure to functional properties in microscopy experiments, but are computationally expensive for multi-dimensional and correlated output spaces. Here, we present an alternative lightweight curiosity algorithm which actively samples regions with unexplored structure-property relations, utilizing a deep-learning based surrogate model for error prediction. We show that the algorithm outperforms random sampling for predicting properties from structures, and provides a convenient tool for efficient mapping of structure-property relationships in materials science.


Article 1067

Title@2025-07-24 (4): MetaSel: A Test Selection Approach for Fine-tuned DNN Models

Title: MetaSel: A Test Selection Approach for Fine-tuned DNN Models

MetaSel: Ein Testauswahlverfahren für fein abgestimmte DNN-Modelle

MetaSel: 微调 DNN 模型的测试选择方法 2503.17534v3

Authors (4): Amin Abbasishahkoo, Mahboubeh Dadkhah, Lionel Briand, Dayi Lin

Deep Neural Networks (DNNs) face challenges during deployment due to data distribution shifts. Fine-tuning adapts pre-trained models to new contexts requiring smaller labeled sets. However, testing fine-tuned models under constrained labeling budgets remains a critical challenge. This paper introduces MetaSel, a new approach, tailored for fine-tuned DNN models, to select tests from unlabeled inputs. MetaSel assumes that fine-tuned and pre-trained models share related data distributions and exhibit similar behaviors for many inputs. However, their behaviors diverge within the input subspace where fine-tuning alters decision boundaries, making those inputs more prone to misclassification. Unlike general approaches that rely solely on the DNN model and its input set, MetaSel leverages information from both the fine-tuned and pre-trained models and their behavioral differences to estimate misclassification probability for unlabeled test inputs, enabling more effective test selection. Our extensive empirical evaluation, comparing MetaSel against 11 state-of-the-art approaches and involving 68 fine-tuned models across weak, medium, and strong distribution shifts, demonstrates that MetaSel consistently delivers significant improvements in Test Relative Coverage (TRC) over existing baselines, particularly under highly constrained labeling budgets. MetaSel shows average TRC improvements of 28.46% to 56.18% over the most frequent second-best baselines while maintaining a high TRC median and low variability. Our results confirm MetaSel’s practicality, robustness, and cost-effectiveness for test selection in the context of fine-tuned models.


Article 1068

Title@2025-07-24 (4): Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Title: Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Feature Flow analysieren, um Interpretation und Steuerung in Sprachmodellen zu verbessern

分析地貌流动,以加强语言模型的口译和指导 2502.03032v3

Authors (4): Daniil Laptev, Nikita Balagansky, Yaroslav Aksenov, Daniil Gavrilov

We introduce a new approach to systematically map features discovered by sparse autoencoders across consecutive layers of large language models, extending earlier work that examined inter-layer feature links. By using a data-free cosine similarity technique, we trace how specific features persist, transform, or first appear at each stage. This method yields granular flow graphs of feature evolution, enabling fine-grained interpretability and mechanistic insights into model computations. Crucially, we demonstrate how these cross-layer feature maps facilitate direct steering of model behavior by amplifying or suppressing chosen features, achieving targeted thematic control in text generation. Together, our findings highlight the utility of a causal, cross-layer interpretability framework that not only clarifies how features develop through forward passes but also provides new means for transparent manipulation of large language models.
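
A data-free cross-layer match of the kind described above can be sketched by comparing feature direction vectors directly, with no activations involved. The sketch assumes each feature is represented by a vector (for example a sparse-autoencoder decoder row, which the abstract does not confirm) and uses a hypothetical similarity cutoff `min_sim`; a feature with no sufficiently similar counterpart in the other layer is reported as unmatched.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_features(layer_a: Sequence[Sequence[float]],
                   layer_b: Sequence[Sequence[float]],
                   min_sim: float = 0.5) -> List[Optional[Tuple[int, float]]]:
    """For each feature vector in `layer_a`, find its most similar feature
    in `layer_b`.

    Returns (index_in_layer_b, similarity) per feature, or None when the
    best similarity is below `min_sim` (no counterpart in the other layer).
    """
    matches: List[Optional[Tuple[int, float]]] = []
    for u in layer_a:
        sims = [cosine(u, v) for v in layer_b]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]) if sims[best] >= min_sim else None)
    return matches
```

Chaining these matches layer by layer gives the edges of a flow graph: a matched feature persists or transforms, while an unmatched one in the later layer is a candidate for a feature that first appears there.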


Article 1069

Title@2025-07-24 (4): Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Title: Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Künstliche Intelligenz für die Wissenschaft in Quanten-, Atom- und Kontinuumsystemen

量子、原子学和连续系统科学人造情报 2307.08423v6

Authors (63): Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Alex Strasser, Haiyang Yu, YuQing Xie, Xiang Fu, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, Hannes Stärk, Shurui Gui, Carl Edwards, Nicholas Gao, Adriana Ladera, Tailin Wu, Elyssa F. Hofgard, Aria Mansouri Tehrani, Rui Wang, Ameya Daigavane, Montgomery Bohde, Jerry Kurtin, Qian Huang, Tuong Phung, Minkai Xu, Chaitanya K. Joshi, Simon V. Mathis, Kamyar Azizzadenesheli, Ada Fang, Alán Aspuru-Guzik, Erik Bekkers, Michael Bronstein, Marinka Zitnik, Anima Anandkumar, Stefano Ermon, Pietro Liò, Rose Yu, Stephan Günnemann, Jure Leskovec, Heng Ji, Jimeng Sun, Regina Barzilay, Tommi Jaakkola, Connor W. Coley, Xiaoning Qian, Xiaofeng Qian, Tess Smidt, Shuiwang Ji

Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.


Article 1070

Title: Test-time Offline Reinforcement Learning on Goal-related Experience

Test-time Offline-Verstärkung Lernen über zielbezogene Erfahrungen

关于目标相关经验的脱线强化学习 2507.18809v1

Authors (5): Marco Bagatella, Mert Albaba, Jonas Hübotter, Georg Martius, Andreas Krause

Foundation models compress a large amount of information in a single, large neural network, which can then be queried for individual tasks. There are strong parallels between this widespread framework and offline goal-conditioned reinforcement learning algorithms: a universal value function is trained on a large number of goals, and the policy is evaluated on a single goal in each test episode. Extensive research in foundation models has shown that performance can be substantially improved through test-time training, specializing the model to the current goal. We find similarly that test-time offline reinforcement learning on experience related to the test goal can lead to substantially better policies at minimal compute costs. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state and quality with respect to the evaluation goal. We demonstrate across a wide range of high-dimensional loco-navigation and manipulation tasks that fine-tuning a policy on the selected data for a few gradient steps leads to significant performance gains over standard offline pre-training. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out. Finally, we study compute allocation at inference, demonstrating that, at comparable costs, GC-TTT induces performance gains that are not achievable by scaling model size.


Article 1071

Title@2025-07-24 (4): Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Title: Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Selbstüberwachte Rahmenbedingungen für die Lautsprecherverifizierung durch Bootstrapped Positive Sampling

通过推动积极抽样,自我监督的演讲人核查框架 2501.17772v4

Authors (2): Theo Lepage, Reda Dehak

Recent developments in Self-Supervised Learning (SSL) have demonstrated significant potential for Speaker Verification (SV), but closing the performance gap with supervised systems remains an ongoing challenge. SSL frameworks rely on anchor-positive pairs, constructed from segments of the same audio utterance. Hence, positives have channel characteristics similar to those of their corresponding anchors, even with extensive data-augmentation. Therefore, this positive sampling strategy is a fundamental limitation as it encodes too much information regarding the recording source in the learned representations. This article introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped technique for sampling appropriate and diverse positives in SSL frameworks for SV. SSPS samples positives close to their anchor in the representation space, assuming that these pseudo-positives belong to the same speaker identity but correspond to different recording conditions. This method consistently demonstrates improvements in SV performance on VoxCeleb benchmarks when applied to major SSL frameworks, including SimCLR, SwAV, VICReg, and DINO. Using SSPS, SimCLR and DINO achieve 2.57% and 2.53% EER on VoxCeleb1-O, respectively. SimCLR yields a 58% relative reduction in EER, getting comparable performance to DINO with a simpler training framework. Furthermore, SSPS lowers intra-class variance and reduces channel information in speaker representations while exhibiting greater robustness without data-augmentation.
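
The core sampling step, choosing a positive that lies close to the anchor in representation space instead of another segment of the same utterance, can be sketched as a nearest-neighbour lookup. This is only an illustration of the idea: the use of cosine similarity, the neighbourhood size `k`, the uniform choice among neighbours, and the function names are assumptions, and the sketch omits whatever bookkeeping the actual method uses to store and refresh embeddings during training.

```python
import math
import random
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def sample_pseudo_positive(anchor_idx: int,
                           embeddings: Sequence[Sequence[float]],
                           k: int = 2,
                           rng: Optional[random.Random] = None) -> int:
    """Return the index of a pseudo-positive for the anchor utterance.

    The candidate set is the `k` embeddings most similar to the anchor,
    excluding the anchor itself; one of them is drawn uniformly at random.
    """
    rng = rng or random.Random(0)
    anchor = embeddings[anchor_idx]
    others = [i for i in range(len(embeddings)) if i != anchor_idx]
    others.sort(key=lambda i: cosine(anchor, embeddings[i]), reverse=True)
    return rng.choice(others[:k])
```

Because the pseudo-positive comes from a different utterance, it is likely to carry different channel characteristics than the anchor, which is the property the abstract identifies as missing from same-utterance positives.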

Article 1072

Title@2025-07-24 (4): Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator

Title: Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator

Fischer kostenlos? Annäherung der Fisher Information Matrix durch Recycling der quadratischen Gradienten Akkumulator

通过回收平梯级积聚器来接近渔业信息矩阵 2507.18807v1

Authors (4): YuXin Li, Felix Dangel, Derek Tam, Colin Raffel

The diagonal of a model’s Fisher Information Matrix (the “Fisher diagonal”) has frequently been used as a way to measure parameter sensitivity. Typically, the Fisher diagonal is estimated via squared sampled gradients of the model’s likelihood with respect to its parameters, averaged over a few hundred or thousand examples – a process which incurs nontrivial computational costs. At the same time, adaptive gradient methods like the ubiquitous Adam optimizer compute a moving average of the squared gradient over the course of training. This paper therefore explores whether an approximation of the Fisher diagonal can be obtained “for free” by recycling the squared gradient accumulator that has already been computed over the course of training. Through a comprehensive set of experiments covering five applications of the Fisher diagonal, we demonstrate that the “Squisher” (SQUared gradient accumulator as an approximation of the FISHER) consistently performs similarly to the Fisher diagonal while outperforming baseline methods. Additionally, we clarify the exact differences between the Squisher and the Fisher diagonal and provide empirical quantification of their respective impact.
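The two quantities being compared can be written down in a few lines. The sketch below is a simplified illustration, not the paper's code: both function names are hypothetical, gradients are passed in as plain lists, and how those gradients are obtained (for example, whether labels are sampled from the model for the Fisher estimate) is left outside the sketch.

```python
def fisher_diagonal(per_example_grads):
    """Average of squared per-example gradients, one entry per parameter."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]

def update_squared_grad_accumulator(v, grad, beta2=0.999):
    """One Adam-style update of the moving average of squared gradients."""
    return [beta2 * v_i + (1.0 - beta2) * g_i ** 2 for v_i, g_i in zip(v, grad)]

# Two examples, two parameters.
print(fisher_diagonal([[1.0, -2.0], [3.0, 0.0]]))  # [5.0, 2.0]
```

The first function needs an extra pass over data after training, while the second is updated as a by-product of every optimizer step, which is what makes reusing the accumulator attractive.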

Article 1073

Title@2025-07-24 (4): Ralts: Robust Aggregation for Enhancing Graph Neural Network Resilience on Bit-flip Errors

Title: Ralts: Robust Aggregation for Enhancing Graph Neural Network Resilience on Bit-flip Errors

Ralts: Robuste Aggregation zur Verbesserung der Graphen-Neural-Netzwerk-Resilienz bei Bit-Flip-Fehlern

Ralts:加强图形神经网络抗力的强力聚合,以应对位翻转错误 2507.18804v1

Authors (2): Wencheng Zou, Nan Wu

Graph neural networks (GNNs) have been widely applied in safety-critical applications, such as financial and medical networks, in which compromised predictions may cause catastrophic consequences. While existing research on GNN robustness has primarily focused on software-level threats, hardware-induced faults and errors remain largely underexplored. As hardware systems progress toward advanced technology nodes to meet high-performance and energy efficiency demands, they become increasingly susceptible to transient faults, which can cause bit flips and silent data corruption, a prominent issue observed by major technology companies (e.g., Meta and Google). In response, we first present a comprehensive analysis of GNN robustness against bit-flip errors, aiming to reveal system-level optimization opportunities for future reliable and efficient GNN systems. Second, we propose Ralts, a generalizable and lightweight solution to bolster GNN resilience to bit-flip errors. Specifically, Ralts exploits various graph similarity metrics to filter out outliers and recover compromised graph topology, and incorporates these protective techniques directly into aggregation functions to support any message-passing GNN. Evaluation results demonstrate that Ralts effectively enhances GNN robustness across a range of GNN models, graph datasets, error patterns, and both dense and sparse architectures. On average, under a BER of $3\times10^{-5}$, these robust aggregation functions improve prediction accuracy by at least 20% when errors are present in model weights or node embeddings, and by at least 10% when errors occur in adjacency matrices. Ralts is also optimized to deliver execution efficiency comparable to built-in aggregation functions in PyTorch Geometric.

Article 1074

Title@2025-07-24 (4): Central limit theorems for the eigenvalues of graph Laplacians on data clouds

Title: Central limit theorems for the eigenvalues of graph Laplacians on data clouds

Zentralgrenzensätze für die Eigenwerte von Graphen Laplacians auf Datenwolken

数据云中拉平板图 Laplacians 的天值中央限制定理 2507.18803v1

Authors (4): Chenghui Li, Nicolás García Trillos, Housen Li, Leo Suchan

Given i.i.d. samples $X_n = \{ x_1, \dots, x_n \}$ from a distribution supported on a low-dimensional manifold $M$ embedded in Euclidean space, we consider the graph Laplacian operator $\Delta_n$ associated with an $\varepsilon$-proximity graph over $X_n$ and study the asymptotic fluctuations of its eigenvalues around their means. In particular, letting $\hat{\lambda}_{l}^\varepsilon$ denote the $l$-th eigenvalue of $\Delta_n$, and under suitable assumptions on the data generating model and on the rate of decay of $\varepsilon$, we prove that $\sqrt{n} (\hat{\lambda}_{l}^\varepsilon - \mathbb{E}[\hat{\lambda}_{l}^\varepsilon])$ is asymptotically Gaussian with a variance that we can explicitly characterize. A formal argument allows us to interpret this asymptotic variance as the dissipation of a gradient flow of a suitable energy with respect to the Fisher-Rao geometry. This geometric interpretation allows us to give, in turn, a statistical interpretation of the asymptotic variance in terms of a Cramer-Rao lower bound for the estimation of the eigenvalues of a certain weighted Laplace-Beltrami operator. The latter interpretation suggests a form of asymptotic statistical efficiency for the eigenvalues of the graph Laplacian. We also present CLTs for multiple eigenvalues and, through several numerical experiments, explore the validity of our results when some of the assumptions made in our theoretical analysis are relaxed.

Article 1075

Title@2025-07-24 (4): Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Title: Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Plan für Geschwindigkeit: Erweitertes Scheduling für maskierte Diffusions-Sprachmodelle

速度计划: 遮蔽传播语言模型的饱和日程安排 2506.19037v3

Authors (3): Omer Luxembourg, Haim Permuter, Eliya Nachmani

Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and effectively reduce to slow, autoregressive behavior. We propose the Dilated Unmasking Scheduler (DUS), an inference-only, planner-model-free method that partitions sequence positions into non-adjacent dilated groups and unmasks them in parallel so as to minimize an upper bound on joint entropy gain at each denoising step. By explicitly trading off the number of network calls against generation quality, DUS recovers most of the performance lost under traditional parallel unmasking strategies. Across math (GSM8K, MATH500), code (HumanEval, MBPP) and general-knowledge benchmarks (BBH, MMLU-Pro), DUS outperforms confidence-based planners, without modifying the underlying denoiser, and reveals the true speed-quality frontier of MDLMs.
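One simple way to form non-adjacent dilated groups is to collect positions that share the same remainder modulo a dilation factor. The sketch below illustrates only that partitioning idea; the function name and the fixed-stride scheme are assumptions made here, and the schedule actually used by DUS may differ.

```python
def dilated_groups(seq_len, dilation):
    """Split positions 0..seq_len-1 into `dilation` groups with stride `dilation`.

    With dilation >= 2, no two positions inside a group are neighbours,
    so each group could be unmasked in one parallel step.
    """
    if dilation < 2:
        raise ValueError("dilation must be at least 2 to avoid adjacent positions")
    return [list(range(offset, seq_len, dilation)) for offset in range(dilation)]

print(dilated_groups(8, 4))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

In this toy scheme the number of denoiser calls equals the dilation factor, so a larger dilation means more calls but fewer tokens committed per call.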

Article 1076

Title@2025-07-24 (4): 2048: Reinforcement Learning in a Delayed Reward Environment

Title: 2048: Reinforcement Learning in a Delayed Reward Environment

2048: Verstärktes Lernen in einer verzögerten Belohnungsumgebung

2048年:在延迟奖励环境中加强学习 2507.05465v2

Authors (3): Prady Saligram, Tanvir Bhathal, Robby Manihani

Delayed and sparse rewards present a fundamental obstacle for reinforcement-learning (RL) agents, which struggle to assign credit for actions whose benefits emerge many steps later. The sliding-tile game 2048 epitomizes this challenge: although frequent small score changes yield immediate feedback, they often mislead agents into locally optimal but globally suboptimal strategies. In this work, we introduce a unified, distributional multi-step RL framework designed to directly optimize long-horizon performance. Using the open-source Gym-2048 environment, we develop and compare four agent variants: standard DQN, PPO, QR-DQN (Quantile Regression DQN), and a novel Horizon-DQN (H-DQN) that integrates distributional learning, dueling architectures, noisy networks, prioritized replay, and more. Empirical evaluation reveals a clear hierarchy in effectiveness: max episode scores improve from 3.988K (DQN) to 5.756K (PPO), 8.66K (QR-DQN), and 18.21K (H-DQN), with H-DQN reaching the 2048 tile. When scaled up, H-DQN reaches a max score of 41.828K and the 4096 tile. These results demonstrate that distributional, multi-step targets substantially enhance performance in sparse-reward domains, and they suggest promising avenues for further gains through model-based planning and curriculum learning.

Article 1077

Title@2025-07-24 (4): Semantic IDs for Music Recommendation

Title: Semantic IDs for Music Recommendation

Semantische IDs für Musikempfehlung

用于音乐推荐的语义代号 2507.18800v1

Authors (5): M. Jeffrey Mei, Florian Henkel, Samuel E. Sandberg, Oliver Bembom, Andreas F. Ehmann

Training recommender systems for next-item recommendation often requires unique embeddings to be learned for each item, which may take up most of the trainable parameters for a model. Shared embeddings, such as using content information, can reduce the number of distinct embeddings to be stored in memory. This allows for a more lightweight model; correspondingly, model complexity can be increased due to having fewer embeddings to store in memory. We show the benefit of using shared content-based features (‘semantic IDs’) in improving recommendation accuracy and diversity, while reducing model size, for two music recommendation datasets, including an online A/B test on a music streaming service.

Article 1078

Title@2025-07-24 (4): CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization

Title: CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization

CLEAR: Unlearning Spurious Style-Content Assoziationen mit Kontrastivem Lernen mit anti-kontrastiver Regularisierung

CLEAR: 学习与反竞争正规化相悖的利于竞争的利于竞争的纯净风格-知识型协会 2507.18794v1

Authors (3): Minghui Sun, Benjamin A. Goldstein, Matthew M. Engelhard

Learning representations unaffected by superficial characteristics is important to ensure that shifts in these characteristics at test time do not compromise downstream prediction performance. For instance, in healthcare applications, we might like to learn features that contain information about pathology yet are unaffected by race, sex, and other sources of physiologic variability, thereby ensuring predictions are equitable and generalizable across all demographics. Here we propose Contrastive LEarning with Anti-contrastive Regularization (CLEAR), an intuitive and easy-to-implement framework that effectively separates essential (i.e., task-relevant) characteristics from superficial (i.e., task-irrelevant) characteristics during training, leading to better performance when superficial characteristics shift at test time. We begin by supposing that data representations can be semantically separated into task-relevant content features, which contain information relevant to downstream tasks, and task-irrelevant style features, which encompass superficial attributes that are irrelevant to these tasks, yet may degrade performance due to associations with content present in training data that do not generalize. We then prove that our anti-contrastive penalty, which we call Pair-Switching (PS), minimizes the Mutual Information between the style attributes and content labels. Finally, we instantiate CLEAR in the latent space of a Variational Auto-Encoder (VAE), then perform experiments to quantitatively and qualitatively evaluate the resulting CLEAR-VAE over several image datasets. Our results show that CLEAR-VAE allows us to: (a) swap and interpolate content and style between any pair of samples, and (b) improve downstream classification performance in the presence of previously unseen combinations of content and style. Our code will be made publicly available.

Article 1079

Title@2025-07-24 (4): Tell Me What You See: An Iterative Deep Learning Framework for Image Captioning

Title: Tell Me What You See: An Iterative Deep Learning Framework for Image Captioning

Erzählen Sie mir, was Sie sehen: Ein iteratives Deep Learning Framework für Bildunterschriften

告诉我你看到的是什么:一个用于图像描述的循环深学习框架 2507.18788v1

Authors (1): Hitesh Kumar Gupta

Image captioning, a task at the confluence of computer vision and natural language processing, requires a sophisticated understanding of both visual scenes and linguistic structure. While modern approaches are dominated by large-scale Transformer architectures, this paper documents a systematic, iterative development of foundational image captioning models, progressing from a simple CNN-LSTM encoder-decoder to a competitive attention-based system. We present a series of five models, beginning with Genesis and concluding with Nexus, an advanced model featuring an EfficientNetV2B3 backbone and a dynamic attention mechanism. Our experiments chart the impact of architectural enhancements and demonstrate a key finding within the classic CNN-LSTM paradigm: merely upgrading the visual backbone without a corresponding attention mechanism can degrade performance, as the single-vector bottleneck cannot transmit the richer visual detail. This insight validates the architectural shift to attention. Trained on the MS COCO 2017 dataset, our final model, Nexus, achieves a BLEU-4 score of 31.4, surpassing several foundational benchmarks and validating our iterative design process. This work provides a clear, replicable blueprint for understanding the core architectural principles that underpin modern vision-language tasks.

Article 1080

Title@2025-07-24 (4): Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification

Title: Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification

Vergleichende Analyse von Vision Transformern und konvolutionären Neuralnetzwerken für medizinische Bildklassifikation

关于医学图像分类的愿景变异器和革命神经网络的比较分析 2507.21156v1

Authors (1): Kunal Kawadkar

The emergence of Vision Transformers (ViTs) has revolutionized computer vision, yet their effectiveness compared to traditional Convolutional Neural Networks (CNNs) in medical imaging remains under-explored. This study presents a comprehensive comparative analysis of CNN and ViT architectures across three critical medical imaging tasks: chest X-ray pneumonia detection, brain tumor classification, and skin cancer melanoma detection. We evaluated four state-of-the-art models - ResNet-50, EfficientNet-B0, ViT-Base, and DeiT-Small - across datasets totaling 8,469 medical images. Our results demonstrate task-specific model advantages: ResNet-50 achieved 98.37% accuracy on chest X-ray classification, DeiT-Small excelled at brain tumor detection with 92.16% accuracy, and EfficientNet-B0 led skin cancer classification at 81.84% accuracy. These findings provide crucial insights for practitioners selecting architectures for medical AI applications, highlighting the importance of task-specific architecture selection in clinical decision support systems.

Article 1081

Title@2025-07-24 (4): Discovering the dynamics of \emph{Sargassum} rafts’ centers of mass

Title: Discovering the dynamics of \emph{Sargassum} rafts’ centers of mass

Die Dynamik von \emph{Sargassum} Floßzentren entdecken

发现木筏质量中心的动态 2507.18771v1

Authors (2): Francisco J. Beron-Vera, Gage Bonner

Since 2011, rafts of floating \emph{Sargassum} seaweed have frequently obstructed the coasts of the Intra-Americas Seas. The motion of the rafts is represented by a high-dimensional nonlinear dynamical system. Referred to as the eBOMB model, this builds on the Maxey–Riley equation by incorporating interactions between clumps of \emph{Sargassum} forming a raft and the effects of Earth’s rotation. The absence of a predictive law for the rafts’ centers of mass suggests a need for machine learning. In this paper, we evaluate and contrast Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) and Sparse Identification of Nonlinear Dynamics (SINDy). In both cases, we take a physics-inspired closure modeling approach rooted in eBOMB. Specifically, the LSTM model learns a mapping from a collection of eBOMB variables to the difference between raft center-of-mass and ocean velocities. The SINDy model’s library of candidate functions is suggested by eBOMB variables and includes windowed velocity terms incorporating far-field effects of the carrying flow. Both LSTM and SINDy models perform most effectively in conditions with tightly bonded clumps, while their precision declines as complexity rises, such as with wind effects and when assessing loosely connected clumps. The LSTM model delivered the best results when designs were straightforward, with fewer neurons and hidden layers. While the LSTM model is an opaque black box lacking interpretability, the SINDy model brings transparency by discerning explicit functional relationships through the function libraries. Integration of the windowed velocity terms enabled effective modeling of nonlocal interactions, particularly in datasets featuring sparsely connected rafts.

Article 1082

Title@2025-07-24 (4): ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

Title: ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

ylmmcl bei Mehrsprachiger Textentgiftung 2025: Lexikon-geführte Entgiftung und Klassifikator-gestrichenes Umschreiben

2025年多语言文本解毒:Lexicon-Guid解毒和分类法改写 2507.18769v1

Authors (4): Nicole Lai-Lopez, Lusha Wang, Su Yuan, Liza Zhang

In this work, we introduce the ylmmcl team's solution for the Multilingual Text Detoxification Task in the PAN-2025 competition: a robust multilingual text detoxification pipeline that integrates lexicon-guided tagging, a fine-tuned sequence-to-sequence model (s-nlp/mt0-xl-detox-orpo) and an iterative classifier-based gatekeeping mechanism. Our approach departs from prior unsupervised or monolingual pipelines by leveraging explicit toxic word annotation via the multilingual_toxic_lexicon to guide detoxification with greater precision and cross-lingual generalization. Our final model achieves the highest STA (0.922) among all our attempts, and an average official J score of 0.612 for toxic inputs in both the development and test sets. It also achieved xCOMET scores of 0.793 (dev) and 0.787 (test). This performance exceeds that of baseline and backtranslation methods across multiple languages, and shows strong generalization in high-resource settings (English, Russian, French). Despite some trade-offs in SIM, the model demonstrates consistent improvements in detoxification strength. In the competition, our team achieved ninth place with a score of 0.612.

Article 1083

Title@2025-07-24 (4): SPADE-S: A Sparsity-Robust Foundational Forecaster

Title: SPADE-S: A Sparsity-Robust Foundational Forecaster

SPADE-S: Ein Sparsity-Robust Foundational Forecaster

SPADE-S: 纯度-罗布斯基础预测器 2507.21155v1

Authors (14): Malcolm Wolff, Matthew Li, Ravi Kiran Selvam, Hanjing Zhu, Kin G. Olivares, Ruijun Ma, Abhinav Katoch, Shankar Ramasubramanian, Mengfei Cao, Roberto Bandarra, Rahul Gopalsamy, Stefania La Vattiata, Sitan Yang, Michael M. Mahoney

Despite significant advancements in time series forecasting, accurate modeling of time series with strong heterogeneity in magnitude and/or sparsity patterns remains challenging for state-of-the-art deep learning architectures. We identify several factors that lead existing models to systematically underperform on low-magnitude and sparse time series, including loss functions with implicit biases toward high-magnitude series, training-time sampling methods, and limitations of time series encoding methods. SPADE-S is a robust forecasting architecture that significantly reduces magnitude- and sparsity-based systematic biases and improves overall prediction accuracy. Empirical results demonstrate that SPADE-S outperforms existing state-of-the-art approaches across a diverse set of use cases in demand forecasting. In particular, we show that, depending on the quantile forecast and magnitude of the series, SPADE-S can improve forecast accuracy by up to 15%. This results in P90 overall forecast accuracy gains of 2.21%, 6.58%, and 4.28%, and P50 forecast accuracy gains of 0.92%, 0.77%, and 1.95%, respectively, for each of three distinct datasets, ranging from 3 million to 700 million series, from a large online retailer.

Article 1084

Title@2025-07-24 (4): Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation

Title: Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation

Exploitation Over Exploration: Entlarvung der Bias in Linear Bandit Recommender Offline-Evaluation

开采过度勘探:在线性强盗建议者离岸评估中揭开比亚斯 2507.18756v1

Authors (5): Pedro R. Pires, Gregorio F. Azevedo, Pietro L. Campos, Rafael T. Sereicikas, Tiago A. Almeida

Multi-Armed Bandit (MAB) algorithms are widely used in recommender systems that require continuous, incremental learning. A core aspect of MABs is the exploration-exploitation trade-off: choosing between exploiting items likely to be enjoyed and exploring new ones to gather information. In contextual linear bandits, this trade-off is particularly central, as many variants share the same linear regression backbone and differ primarily in their exploration strategies. Despite its prevalent use, offline evaluation of MABs is increasingly recognized for its limitations in reliably assessing exploration behavior. This study conducts an extensive offline empirical comparison of several linear MABs. Strikingly, on over 90% of the datasets, a greedy linear model with no exploration of any kind consistently achieves top-tier performance, often outperforming or matching its exploratory counterparts. This observation is further corroborated by hyperparameter optimization, which consistently favors configurations that minimize exploration, suggesting that pure exploitation is the dominant strategy within these evaluation settings. Our results expose significant inadequacies in offline evaluation protocols for bandits, particularly concerning their capacity to reflect true exploratory efficacy. Consequently, this research underscores the urgent need for more robust assessment methodologies and points future work toward alternative evaluation frameworks for interactive learning in recommender systems.

Article 1085

Title@2025-07-24 (4): Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition

Title: Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition

Lärm Kontrastive Schätzung-basiertes Matching Framework für die Erkennung von Low-Resource-Sicherheitsangriffen

低资源安保攻击模式识别比对框架 2401.10337v4

Authors (3): Tu Nguyen, Nedim Šrndić, Alexander Neth

Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain, described encyclopedically in textual knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP mapping, is an important and challenging task. Conventional learning approaches often target the problem in the classical multi-class or multilabel classification setting. This setting hinders the learning ability of the model due to a large number of classes (i.e., TTPs), the inevitable skewness of the label distribution and the complex hierarchical structure of the label space. We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two, thus reducing the complexity of competing solely over the large labeling space. To that end, we propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism, facilitating the learning process of the matching model despite constrained resources.

Article 1086

Title@2025-07-24 (4): Time-resolved dynamic CBCT reconstruction using prior-model-free spatiotemporal Gaussian representation (PMF-STGR)

Title: Time-resolved dynamic CBCT reconstruction using prior-model-free spatiotemporal Gaussian representation (PMF-STGR)

Zeitaufgelöste dynamische CBCT-Rekonstruktion unter Verwendung einer modellfreien raumzeitlichen Gauß-Darstellung (PMF-STGR)

利用以前不设模型的时空代表性(PMF-STGR),解决时间问题,重建CBCT 2503.22139v2

Authors (3): Jiacheng Xie, Hua-Chieh Shao, You Zhang

Time-resolved CBCT imaging, which reconstructs a dynamic sequence of CBCTs reflecting intra-scan motion (one CBCT per x-ray projection without phase sorting or binning), is highly desired for regular and irregular motion characterization, patient setup, and motion-adapted radiotherapy. Representing patient anatomy and associated motion fields as 3D Gaussians, we developed a Gaussian representation-based framework (PMF-STGR) for fast and accurate dynamic CBCT reconstruction. PMF-STGR comprises three major components: a dense set of 3D Gaussians to reconstruct a reference-frame CBCT for the dynamic sequence; another 3D Gaussian set to capture three-level, coarse-to-fine motion-basis-components (MBCs) to model the intra-scan motion; and a CNN-based motion encoder to solve projection-specific temporal coefficients for the MBCs. Scaled by the temporal coefficients, the learned MBCs will combine into deformation vector fields to deform the reference CBCT into projection-specific, time-resolved CBCTs to capture the dynamic motion. Due to the strong representation power of 3D Gaussians, PMF-STGR can reconstruct dynamic CBCTs in a ‘one-shot’ training fashion from a standard 3D CBCT scan, without using any prior anatomical or motion model. We evaluated PMF-STGR using XCAT phantom simulations and real patient scans. Metrics including the image relative error, structural-similarity-index-measure, tumor center-of-mass-error, and landmark localization error were used to evaluate the accuracy of solved dynamic CBCTs and motion. PMF-STGR shows clear advantages over a state-of-the-art, INR-based approach, PMF-STINR. Compared with PMF-STINR, PMF-STGR reduces reconstruction time by 50% while reconstructing less blurred images with better motion accuracy. With improved efficiency and accuracy, PMF-STGR enhances the applicability of dynamic CBCT imaging for potential clinical translation.
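The step in which temporal coefficients scale the motion-basis components and sum them into a deformation vector field amounts to a linear combination. The sketch below shows that combination on plain Python lists; the function name and data layout (one 3D displacement per point, per basis) are assumptions for illustration and do not reflect the authors' Gaussian-based implementation.

```python
def combine_motion_bases(coefficients, basis_fields):
    """Return sum_k coefficients[k] * basis_fields[k] as a list of 3D vectors.

    basis_fields[k][p] is the (x, y, z) displacement of point p under basis k.
    """
    if len(coefficients) != len(basis_fields):
        raise ValueError("need exactly one coefficient per basis field")
    n_points = len(basis_fields[0])
    dvf = [[0.0, 0.0, 0.0] for _ in range(n_points)]
    for coeff, field in zip(coefficients, basis_fields):
        for p, vec in enumerate(field):
            for axis in range(3):
                dvf[p][axis] += coeff * vec[axis]
    return dvf

# One point, two bases: 2 * (1, 0, 0) - 1 * (0, 1, 0)
print(combine_motion_bases([2.0, -1.0], [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]]))
# [[2.0, -1.0, 0.0]]
```

Because only the coefficients change from projection to projection, the bases are shared across the whole scan and each time point needs just a handful of scalars.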

Article 1087

Title@2025-07-24 (4): Learned Single-Pixel Fluorescence Microscopy

Title: Learned Single-Pixel Fluorescence Microscopy

Gelernte Einzel-Pixel Fluoreszenzmikroskopie

获得单像素荧光显微镜 2507.18740v1

Authors (6): Serban C. Tudosie, Valerio Gandolfi, Shivaprasad Varakkoth, Andrea Farina, Cosimo D’Andrea, Simon Arridge

Single-pixel imaging has emerged as a key technique in fluorescence microscopy, where fast acquisition and reconstruction are crucial. In this context, images are reconstructed from linearly compressed measurements. In practice, total variation minimisation is still used to reconstruct the image from noisy measurements of the inner product between orthogonal sampling pattern vectors and the original image data. However, data can be leveraged to learn the measurement vectors and the reconstruction process, thereby enhancing compression, reconstruction quality, and speed. We train an autoencoder through self-supervision to learn an encoder (or measurement matrix) and a decoder. We then test it on physically acquired multispectral and intensity data. During acquisition, the learned encoder becomes part of the physical device. Our approach can enhance single-pixel imaging in fluorescence microscopy by reducing reconstruction time by two orders of magnitude, achieving superior image quality, and enabling multispectral reconstructions. Ultimately, learned single-pixel fluorescence microscopy could advance diagnosis and biological research, providing multispectral imaging at a fraction of the cost.

Article 1088

Title@2025-07-24 (4): An Explainable Equity-Aware P2P Energy Trading Framework for Socio-Economically Diverse Microgrid

Title: An Explainable Equity-Aware P2P Energy Trading Framework for Socio-Economically Diverse Microgrid

Ein erklärbares Equity-Aware P2P Energy Trading Framework für sozio-ökonomische Diverse Microgrid

社会经济多样化微电网可解释的公平-公见P2P能源贸易框架 2507.18738v1

Authors (2): Abhijan Theja, Mayukha Pal

Fair and dynamic energy allocation in community microgrids remains a critical challenge, particularly when serving socio-economically diverse participants. Static optimization and cost-sharing methods often fail to adapt to evolving inequities, leading to participant dissatisfaction and unsustainable cooperation. This paper proposes a novel framework that integrates multi-objective mixed-integer linear programming (MILP), cooperative game theory, and a dynamic equity-adjustment mechanism driven by reinforcement learning (RL). At its core, the framework utilizes a bi-level optimization model grounded in Equity-regarding Welfare Maximization (EqWM) principles, which incorporate Rawlsian fairness to prioritize the welfare of the least advantaged participants. We introduce a Proximal Policy Optimization (PPO) agent that dynamically adjusts socio-economic weights in the optimization objective based on observed inequities in cost and renewable energy access. This RL-powered feedback loop enables the system to learn and adapt, continuously striving for a more equitable state. To ensure transparency, Explainable AI (XAI) is used to interpret the benefit allocations derived from a weighted Shapley value. Validated across six realistic scenarios, the framework demonstrates peak demand reductions of up to 72.6%, and significant cooperative gains. The adaptive RL mechanism further reduces the Gini coefficient over time, showcasing a pathway to truly sustainable and fair energy communities.
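For readers unfamiliar with the inequality measure mentioned above, the Gini coefficient of a set of non-negative values can be computed from mean absolute differences. The sketch below is a generic textbook formula, not the paper's code, and the choice of what to measure (for example, per-participant energy cost) is an assumption here.

```python
def gini(values):
    """Gini coefficient: 0 means perfect equality, values near 1 mean high inequality."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all values are zero, so there is nothing to distribute unequally
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)

print(gini([1.0, 1.0, 1.0]))       # 0.0
print(gini([0.0, 0.0, 0.0, 4.0]))  # 0.75
```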

Article 1089

Title@2025-07-24 (4): Less is More: Adaptive Coverage for Synthetic Training Data

Title: Less is More: Adaptive Coverage for Synthetic Training Data

Weniger ist mehr: Adaptive Abdeckung für Synthetische Trainingsdaten

较少为: 合成培训数据的适应性覆盖 2504.14508v2

Authors (6): Sasan Tavakkol, Max Springer, Mohammadhossein Bateni, Neslihan Bulut, Vincent Cohen-Addad, MohammadTaghi Hajiaghayi

Synthetic training data generation with Large Language Models (LLMs) like Google’s Gemma and OpenAI’s GPT offers a promising solution to the challenge of obtaining large, labeled datasets for training classifiers. When rapid model deployment is critical, such as in classifying emerging social media trends or combating new forms of online abuse tied to current events, the ability to generate training data is invaluable. While prior research has examined the comparability of synthetic data to human-labeled data, this study introduces a novel sampling algorithm, based on the maximum coverage problem, to select a representative subset from a synthetically generated dataset. Our results demonstrate that training a classifier on this contextually sampled subset achieves superior performance compared to training on the entire dataset. This “less is more” approach not only improves model accuracy but also reduces the volume of data required, leading to potentially more efficient model fine-tuning.
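The maximum coverage problem has a well-known greedy approximation: repeatedly pick the item that covers the most not-yet-covered elements. The sketch below shows that generic greedy routine, not the paper's sampling algorithm; how each synthetic example's coverage set would be built (for instance, from similarity neighbourhoods) is an assumption left to the caller.

```python
def greedy_max_coverage(cover_sets, k):
    """Choose up to k indices whose sets jointly cover as many elements as possible."""
    covered = set()
    chosen = []
    for _ in range(min(k, len(cover_sets))):
        remaining = [i for i in range(len(cover_sets)) if i not in chosen]
        best = max(remaining, key=lambda i: len(cover_sets[i] - covered))
        if not cover_sets[best] - covered:
            break  # no remaining item adds new coverage
        chosen.append(best)
        covered |= cover_sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]
print(greedy_max_coverage(sets, 2))  # ([2, 0], {1, 2, 3, 4, 5, 6, 7})
```

The early exit is what makes a "less is more" subset possible: once everything is covered, additional examples are simply not selected.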

Article 1090

Title@2025-07-24 (4): Bootstrapped Reward Shaping

Title: Bootstrapped Reward Shaping

Bootstrapped Reward Shaping

增强的奖励形状 2501.00989v2

Authors (4): Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni

In reinforcement learning, especially in sparse-reward domains, many environment steps are required to observe reward information. In order to increase the frequency of such observations, “potential-based reward shaping” (PBRS) has been proposed as a method of providing a denser reward signal while leaving the optimal policy invariant. However, the required “potential function” must be carefully designed with task-dependent knowledge so as not to hinder training performance. In this work, we propose a “bootstrapped” method of reward shaping, termed BSRS, in which the agent’s current estimate of the state-value function acts as the potential function for PBRS. We provide convergence proofs for the tabular setting, give insights into training dynamics for deep RL, and show that the proposed method improves training speed in the Atari suite.
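Potential-based shaping adds the term gamma * Phi(s') - Phi(s) to the environment reward, and the bootstrapped variant plugs the current value estimate in for Phi. The sketch below assumes a tabular value estimate stored in a dict, a default of 0 for unseen states, and a terminal potential of 0; these conventions and the function name are illustrative assumptions, and any scaling factor the paper may apply to the potential is omitted.

```python
def shaped_reward(reward, state, next_state, value_estimate, gamma=0.99, done=False):
    """Return reward + gamma * V(next_state) - V(state) using the current estimate V."""
    current_potential = value_estimate.get(state, 0.0)
    # Assumed convention: terminal states carry zero potential.
    next_potential = 0.0 if done else value_estimate.get(next_state, 0.0)
    return reward + gamma * next_potential - current_potential

V = {"a": 1.0, "b": 2.0}
print(shaped_reward(0.0, "a", "b", V, gamma=0.5))  # 0.0
```

Because `value_estimate` changes as the agent learns, the shaping term is non-stationary, which is the aspect the convergence analysis in the abstract addresses.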


Article 1091

Title@2025-07-24 (4): Multi-Year Maintenance Planning for Large-Scale Infrastructure Systems: A Novel Network Deep Q-Learning Approach

Title: Multi-Year Maintenance Planning for Large-Scale Infrastructure Systems: A Novel Network Deep Q-Learning Approach

Mehrjährige Wartungsplanung für großräumige Infrastruktursysteme: Ein neuartiges Netzwerk Deep Q-Learning-Ansatz

大型基础设施体系多年期维持规划:新网络深学习方法 2507.18732v1

Authors (2): Amir Fard, Arnold X. -X. Yuan

Infrastructure asset management is essential for sustaining the performance of public infrastructure such as road networks, bridges, and utility networks. Traditional maintenance and rehabilitation planning methods often face scalability and computational challenges, particularly for large-scale networks with thousands of assets under budget constraints. This paper presents a novel deep reinforcement learning (DRL) framework that optimizes asset management strategies for large infrastructure networks. By decomposing the network-level Markov Decision Process (MDP) into individual asset-level MDPs while using a unified neural network architecture, the proposed framework reduces computational complexity, improves learning efficiency, and enhances scalability. The framework directly incorporates annual budget constraints through a budget allocation mechanism, ensuring maintenance plans are both optimal and cost-effective. Through a case study on a large-scale pavement network of 68,800 segments, the proposed DRL framework demonstrates significant improvements over traditional methods like Progressive Linear Programming and genetic algorithms, both in efficiency and network performance. This advancement contributes to infrastructure asset management and the broader application of reinforcement learning in complex, large-scale environments.


Article 1092

Title@2025-07-24 (4): Exploration Behavior of Untrained Policies

Title: Exploration Behavior of Untrained Policies

Explorationsverhalten ungeübter Politiken

未经过培训的政策的探索行为 2506.22566v3

Authors (1): Jacob Adamczyk

Exploration remains a fundamental challenge in reinforcement learning (RL), particularly in environments with sparse or adversarial reward structures. In this work, we study how the architecture of deep neural policies implicitly shapes exploration before training. We theoretically and empirically demonstrate strategies for generating ballistic or diffusive trajectories from untrained policies in a toy model. Using the theory of infinite-width networks and a continuous-time limit, we show that untrained policies return correlated actions and result in non-trivial state-visitation distributions. We discuss the distributions of the corresponding trajectories for a standard architecture, revealing insights into inductive biases for tackling exploration. Our results establish a theoretical and experimental framework for using policy initialization as a design tool to understand exploration behavior in early training.


Article 1093

Title@2025-07-24 (4): An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning

Title: An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning

Effizientes Sparse-Fine-Tuning mit geringem Quantisierungsfehler über Neural Network Pruning

通过神经网络节制低量错误的高效粗简精细调整 2502.11439v2

Authors (2): Cen-Jhih Li, Aditya Bhaskara

Fine-tuning is an important step in adapting foundation models such as large language models to downstream tasks. To make this step more accessible to users with limited computational budgets, it is crucial to develop fine-tuning methods that are memory and computationally efficient. Sparse Fine-tuning (SpFT) and Low-rank adaptation (LoRA) are two frameworks that have emerged for addressing this problem and have been adopted widely in practice. In this work, we develop a new SpFT framework, based on ideas from neural network pruning. At a high level, we first identify “important” neurons/nodes using feature importance metrics from network pruning (specifically, we use the structural pruning method), and then perform fine-tuning by restricting to weights involving these neurons. Experiments on common language tasks show our method improves SpFT’s memory efficiency by 20-50% while matching the accuracy of state-of-the-art methods like LoRA’s variants.


Article 1094

Title@2025-07-24 (4): The Right to be Forgotten in Pruning: Unveil Machine Unlearning on Sparse Models

Title: The Right to be Forgotten in Pruning: Unveil Machine Unlearning on Sparse Models

Das Recht vergessen zu werden: Enthüllen Sie Maschine Entlernen von Sparse-Modellen

普鲁宁被遗忘的权利:破碎型号的unveil 机器退出学习 2507.18725v1

Authors (6): Yang Xiao, Gen Li, Jie Ji, Ruimeng Ye, Xiaolong Ma, Bo Hui

Machine unlearning aims to efficiently eliminate the memory of deleted data from trained models and address the right to be forgotten. Despite the success of existing unlearning algorithms, unlearning in sparse models has not yet been well studied. In this paper, we empirically find that the deleted data has an impact on the pruned topology in a sparse model. Motivated by this observation and the right to be forgotten, we define a new term, “un-pruning”, for eliminating the impact of deleted data on model pruning. Then we propose an un-pruning algorithm to approximate the pruned topology driven by retained data. We remark that any existing unlearning algorithm can be integrated with the proposed un-pruning workflow and that the error of un-pruning is upper-bounded in theory. Also, our un-pruning algorithm can be applied to both structured sparse models and unstructured sparse models. In the experiment, we further find that Membership Inference Attack (MIA) accuracy is unreliable for assessing whether a model has forgotten deleted data, as a small change in the amount of deleted data can produce arbitrary MIA results. Accordingly, we devise new performance metrics for sparse models to evaluate the success of un-pruning. Lastly, we conduct extensive experiments to verify the efficacy of un-pruning with various pruning methods and unlearning algorithms. Our code is released at https://anonymous.4open.science/r/UnlearningSparseModels-FBC5/.


Article 1095

Title@2025-07-24 (4): SCORE-SET: A dataset of GuitarPro files for Music Phrase Generation and Sequence Learning

Title: SCORE-SET: A dataset of GuitarPro files for Music Phrase Generation and Sequence Learning

SCORE-SET: Ein Datensatz von GuitarPro-Dateien für Musik Phrase Generation und Sequence Learning

SCORE-SET: 用于音乐词组生成和序列学习的吉他Pro文件数据集 2507.18723v1

Authors (1): Vishakh Begari

A curated dataset of Guitar Pro tablature files (.gp5 format), tailored for tasks involving guitar music generation, sequence modeling, and performance-aware learning is provided. The dataset is derived from MIDI notes in MAESTRO and GiantMIDI which have been adapted into rhythm guitar tracks. These tracks are further processed to include a variety of expression settings typical of guitar performance, such as bends, slides, vibrato, and palm muting, to better reflect the nuances of real-world guitar playing.


Article 1096

Title@2025-07-24 (4): Fixed-Point RNNs: Interpolating from Diagonal to Dense

Title: Fixed-Point RNNs: Interpolating from Diagonal to Dense

Fixed-Point RNNs: Interpolieren von Diagonal nach Dense

固定点区域NN:从对角线到对角线的内插 2503.10799v2

Authors (4): Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto

Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax-attention as sequence mixing layers in Transformer architectures. Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e. diagonal) sequence mixing. In this paper, we investigate parameterizations of a large class of dense linear RNNs as fixed-points of parallelizable diagonal linear RNNs. The resulting models can naturally trade expressivity for efficiency at a fixed number of parameters and achieve state-of-the-art results on the commonly used toy tasks $A_5$, $S_5$, copying, and modular arithmetic.


Article 1097

Title@2025-07-24 (4): Adaptive Neural Quantum States: A Recurrent Neural Network Perspective

Title: Adaptive Neural Quantum States: A Recurrent Neural Network Perspective

Adaptive Neurale Quantenzustände: Eine wiederkehrende Neurale Netzwerkperspektive

适应性神经量子州:经常性神经网络视角 2507.18700v1

Authors (2): Jake McNaughton, Mohamed Hibat-Allah

Neural-network quantum states (NQS) are powerful neural-network ansätze that have emerged as promising tools for studying quantum many-body physics through the lens of the variational principle. These architectures are known to be systematically improvable by increasing the number of parameters. Here we demonstrate an Adaptive scheme to optimize NQSs, through the example of recurrent neural networks (RNN), using a fraction of the computation cost while reducing training fluctuations and improving the quality of variational calculations targeting ground states of prototypical models in one and two spatial dimensions. This Adaptive technique reduces the computational cost through training small RNNs and reusing them to initialize larger RNNs. This work opens up the possibility for optimizing graphical processing unit (GPU) resources deployed in large-scale NQS simulations.


Article 1098

Title@2025-07-24 (4): Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

Title: Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

Pseudo-Labeling für Kernel Ridge Regression unter Kovariate Shift

共变移下内核循环脊回归的优多环流 2302.10160v4

Authors (1): Kaizheng Wang

We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, based on unlabeled data from there and labeled data that may have a different feature distribution. We propose to split the labeled data into two subsets, and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model. We use the latter to fill the missing labels and then select the best candidate accordingly. Our non-asymptotic excess risk bounds demonstrate that our estimator adapts effectively to both the structure of the target distribution and the covariate shift. This adaptation is quantified through a notion of effective sample size that reflects the value of labeled source data for the target regression task. Our estimator achieves the minimax optimal error rate up to a polylogarithmic factor, and we find that using pseudo-labels for model selection does not significantly hinder performance.


Article 1099

Title@2025-07-24 (4): SIDA: Synthetic Image Driven Zero-shot Domain Adaptation

Title: SIDA: Synthetic Image Driven Zero-shot Domain Adaptation

SIDA: Synthetisches Bild angetrieben Null-Schuss Domain-Anpassung

SIDA: 合成图像驱动器零弹射域适应 2507.18632v1

Authors (5): Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim

Zero-shot domain adaptation is a method for adapting a model to a target domain without utilizing target domain image data. To enable adaptation without target images, existing studies utilize CLIP’s embedding space and text description to simulate target-like style features. Despite the previous achievements in zero-shot domain adaptation, we observe that these text-driven methods struggle to capture complex real-world variations and significantly increase adaptation time due to their alignment process. Instead of relying on text descriptions, we explore solutions leveraging image data, which provides diverse and more fine-grained style cues. In this work, we propose SIDA, a novel and efficient zero-shot domain adaptation method leveraging synthetic images. To generate synthetic images, we first create detailed, source-like images and apply image translation to reflect the style of the target domain. We then utilize the style features of these synthetic images as a proxy for the target domain. Based on these features, we introduce Domain Mix and Patch Style Transfer modules, which enable effective modeling of real-world variations. In particular, Domain Mix blends multiple styles to expand the intra-domain representations, and Patch Style Transfer assigns different styles to individual patches. We demonstrate the effectiveness of our method by showing state-of-the-art performance in diverse zero-shot adaptation scenarios, particularly in challenging domains. Moreover, our approach achieves high efficiency by significantly reducing the overall adaptation time.


Article 1100

Title@2025-07-24 (4): Gait Recognition Based on Tiny ML and IMU Sensors

Title: Gait Recognition Based on Tiny ML and IMU Sensors

Gait-Erkennung basierend auf winzigen ML- und IMU-Sensoren

基于小ML和IMU传感器的Gait识别 2507.18627v1

Authors (3): Jiahang Zhang, Mingtong Chen, Zhengbao Yang

This project presents the development of a gait recognition system using Tiny Machine Learning (Tiny ML) and Inertial Measurement Unit (IMU) sensors. The system leverages the XIAO-nRF52840 Sense microcontroller and the LSM6DS3 IMU sensor to capture motion data, including acceleration and angular velocity, from four distinct activities: walking, stationary, going upstairs, and going downstairs. The data collected is processed through Edge Impulse, an edge AI platform, which enables the training of machine learning models that can be deployed directly onto the microcontroller for real-time activity classification. The data preprocessing step involves extracting relevant features from the raw sensor data using techniques such as sliding windows and data normalization, followed by training a Deep Neural Network (DNN) classifier for activity recognition. The model achieves over 80% accuracy on a test dataset, demonstrating its ability to classify the four activities effectively. Additionally, the platform enables anomaly detection, further enhancing the robustness of the system. The integration of Tiny ML ensures low-power operation, making it suitable for battery-powered or energy-harvesting devices.
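
The preprocessing described above (sliding windows followed by normalization) can be sketched in plain Python. This is a generic illustration, not the Edge Impulse pipeline or its API; the window size, step, z-score normalization, and the sample values are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def sliding_windows(samples, size, step):
    """Split a 1-D signal into overlapping windows of `size` samples, advancing by `step`."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def normalize(window):
    """Z-score a window: subtract its mean and divide by its standard deviation."""
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0:
        return [0.0 for _ in window]  # constant window, e.g. a stationary sensor
    return [(x - mu) / sigma for x in window]


# Hypothetical accelerometer readings along one axis.
accel_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
windows = sliding_windows(accel_x, size=4, step=2)
features = [normalize(w) for w in windows]
print(windows)
```

Each normalized window would then be passed to the classifier as one input example; a real IMU stream would carry six channels (three acceleration, three angular velocity) handled the same way per channel.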


Article 1101

Title@2025-07-24 (4): TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards

Title: TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards

TRPrompt: Bootstrapping Query-Aware Prompt Optimierung von Textbelohnungen

TRPropt: 从文本奖励中促进解答询问软件快速优化 2507.18618v1

Authors (5): Andreea Nica, Ivan Zakazov, Nicolas Mario Baldwin, Saibo Geng, Robert West

Prompt optimization improves the reasoning abilities of large language models (LLMs) without requiring parameter updates to the target model. Following heuristic-based “Think step by step” approaches, the field has evolved in two main directions: while one group of methods uses textual feedback to elicit improved prompts from general-purpose LLMs in a training-free way, a concurrent line of research relies on numerical rewards to train a special prompt model, tailored for providing optimal prompts to the target model. In this paper, we introduce the Textual Reward Prompt framework (TRPrompt), which unifies these approaches by directly incorporating textual feedback into training of the prompt model. Our framework does not require prior dataset collection and is iteratively improved with feedback on the generated prompts. When coupled with the capacity of an LLM to internalize the notion of what a “good” prompt is, the high-resolution signal provided by the textual rewards allows us to train a prompt model yielding state-of-the-art query-specific prompts for the problems from the challenging math datasets GSMHard and MATH.


Article 1102

Title: SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning

SynC: Synthetische Bildunterschrift Datensatzverfeinerung mit ein-zu-vielen Mapping für Zero-shot Bildunterschrift

合成图像说明: 合成图像说明数据集精化,用一到多个绘图进行零光图像说明的合成图像说明 2507.18616v1

Authors (6): Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim

Zero-shot Image Captioning (ZIC) increasingly utilizes synthetic datasets generated by text-to-image (T2I) models to mitigate the need for costly manual annotation. However, these T2I models often produce images that exhibit semantic misalignments with their corresponding input captions (e.g., missing objects, incorrect attributes), resulting in noisy synthetic image-caption pairs that can hinder model training. Existing dataset pruning techniques are largely designed for removing noisy text in web-crawled data. However, these methods are ill-suited for the distinct challenges of synthetic data, where captions are typically well-formed, but images may be inaccurate representations. To address this gap, we introduce SynC, a novel framework specifically designed to refine synthetic image-caption datasets for ZIC. Instead of conventional filtering or regeneration, SynC focuses on reassigning captions to the most semantically aligned images already present within the synthetic image pool. Our approach employs a one-to-many mapping strategy by initially retrieving multiple relevant candidate images for each caption. We then apply a cycle-consistency-inspired alignment scorer that selects the best image by verifying its ability to retrieve the original caption via image-to-text retrieval. Extensive evaluations demonstrate that SynC consistently and significantly improves performance across various ZIC models on standard benchmarks (MS-COCO, Flickr30k, NoCaps), achieving state-of-the-art results in several scenarios. SynC offers an effective strategy for curating refined synthetic data to enhance ZIC.
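
The two-stage selection described above (one-to-many retrieval, then a cycle-consistency check) can be illustrated on a toy similarity matrix. This is a hedged sketch, not SynC's actual scorer: the scoring rule (rank of the original caption under image-to-text retrieval, with ties broken by similarity), the value of `k`, and the matrix itself are assumptions for the example.

```python
def top_k_images(sim, caption, k):
    """Text-to-image retrieval: indices of the k images most similar to `caption`."""
    row = sim[caption]
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]


def caption_rank(sim, image, caption):
    """Image-to-text retrieval: rank (0 = best) of `caption` among all captions for `image`."""
    column = [sim[c][image] for c in range(len(sim))]
    return sum(1 for s in column if s > column[caption])


def reassign(sim, k=2):
    """For each caption, keep the candidate image that best retrieves the caption back."""
    assignment = {}
    for c in range(len(sim)):
        candidates = top_k_images(sim, c, k)
        assignment[c] = min(
            candidates,
            key=lambda j: (caption_rank(sim, j, c), -sim[c][j]),
        )
    return assignment


# sim[c][j]: hypothetical similarity between caption c and synthetic image j.
sim = [
    [0.90, 0.80, 0.10],
    [0.95, 0.30, 0.20],
    [0.10, 0.20, 0.70],
]
assignment = reassign(sim, k=2)
print(assignment)
```

Here caption 0 is most similar to image 0, but image 0 retrieves caption 1 first, so the cycle check reassigns caption 0 to image 1, which retrieves it back at rank 0.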


Article 1103

Title@2025-07-24 (4): BEARCUBS: A benchmark for computer-using web agents

Title: BEARCUBS: A benchmark for computer-using web agents

BEARCUBS: Benchmark für computergestützte Web-Agenten

BEARCUBS:计算机使用网络代理器的基准 2503.07919v3

Authors (6): Yixiao Song, Katherine Thai, Chau Minh Pham, Yapei Chang, Mazin Nadaf, Mohit Iyyer

Modern web agents possess computer use abilities that allow them to interact with webpages by sending commands to a virtual keyboard and mouse. While such agents have considerable potential to assist human users with complex tasks, evaluating their capabilities in real-world settings poses a major challenge. To this end, we introduce BEARCUBS, a “small but mighty” benchmark of 111 information-seeking questions designed to evaluate a web agent’s ability to search, browse, and identify factual information from the web. Unlike prior web agent benchmarks, solving BEARCUBS requires (1) accessing live web content rather than synthetic or simulated pages, which captures the unpredictability of real-world web interactions; and (2) performing a broad range of multimodal interactions (e.g., video understanding, 3D navigation) that cannot be bypassed via text-based workarounds. Each question in BEARCUBS has a corresponding short, unambiguous answer and a human-validated browsing trajectory, allowing for transparent evaluation of agent performance and strategies. A human study confirms that BEARCUBS questions are solvable but non-trivial (84.7% human accuracy), revealing domain knowledge gaps and overlooked details as common failure points. We find that ChatGPT Agent significantly outperforms other computer-using agents with an overall accuracy of 65.8% (compared to, e.g., Operator’s 23.4%), showcasing substantial progress in tasks involving real computer use, such as playing web games and navigating 3D environments. Nevertheless, closing the gap to human performance requires improvements in areas like fine control, complex data filtering, and execution speed. To facilitate future research, BEARCUBS will be updated periodically to replace invalid or contaminated questions, keeping the benchmark fresh for future generations of web agents.


Article 1104

Title@2025-07-24 (4): Interact2Vec – An efficient neural network-based model for simultaneously learning users and items embeddings in recommender systems

Title: Interact2Vec – An efficient neural network-based model for simultaneously learning users and items embeddings in recommender systems

Interact2Vec – Ein effizientes neuronales Netzwerk-basiertes Modell zum gleichzeitigen Lernen von Benutzern und Elementen in Empfehlungssysteme

Interact2Vec – – 一个有效的神经网络模式,用于同时学习用户和项目嵌入建议系统 2506.22648v3

Authors (2): Pedro R. Pires, Tiago A. Almeida

Over the past decade, recommender systems have experienced a surge in popularity. Despite notable progress, they grapple with challenging issues, such as high data dimensionality and sparseness. Representing users and items as low-dimensional embeddings learned via neural networks has become a leading solution. However, while recent studies show promising results, many approaches rely on complex architectures or require content data, which may not always be available. This paper presents Interact2Vec, a novel neural network-based model that simultaneously learns distributed embeddings for users and items while demanding only implicit feedback. The model employs state-of-the-art strategies that natural language processing models commonly use to optimize the training phase and enhance the final embeddings. Two types of experiments were conducted regarding the extrinsic and intrinsic quality of the model. In the former, we benchmarked the recommendations generated by Interact2Vec’s embeddings in a top-$N$ ranking problem, comparing them with six other recommender algorithms. The model achieved the second- or third-best results in 30% of the datasets, being competitive with other recommenders, and has proven to be very efficient with an average training time reduction of 274% compared to other embedding-based models. In the latter, we analyzed the intrinsic quality of the embeddings through similarity tables. Our findings suggest that Interact2Vec can achieve promising results, especially on the extrinsic task, and is an excellent embedding-generator model for scenarios of scarce computing resources, enabling the learning of item and user embeddings simultaneously and efficiently.


Article 1105

Title@2025-07-24 (4): Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents

Title: Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents

Erklärbarer Mapper: LLM-Embedding-Räume mit Perturbation-basierten Erklärungs- und Verifikations-Agenten kartographieren

可解释的成像仪:利用以扰动为基础的解释和核查仪器绘制LLM内嵌空间图 2507.18607v1

Authors (5): Xinyuan Yan, Rita Sevastjanova, Sinie van der Ben, Mennatallah El-Assady, Bei Wang

Large language models (LLMs) produce high-dimensional embeddings that capture rich semantic and syntactic relationships between words, sentences, and concepts. Investigating the topological structures of LLM embedding spaces via mapper graphs enables us to understand their underlying structures. Specifically, a mapper graph summarizes the topological structure of the embedding space, where each node represents a topological neighborhood (containing a cluster of embeddings), and an edge connects two nodes if their corresponding neighborhoods overlap. However, manually exploring these embedding spaces to uncover encoded linguistic properties requires considerable human effort. To address this challenge, we introduce a framework for semi-automatic annotation of these embedding properties. To organize the exploration process, we first define a taxonomy of explorable elements within a mapper graph such as nodes, edges, paths, components, and trajectories. The annotation of these elements is executed through two types of customizable LLM-based agents that employ perturbation techniques for scalable and automated analysis. These agents help to explore and explain the characteristics of mapper elements and verify the robustness of the generated explanations. We instantiate the framework within a visual analytics workspace and demonstrate its effectiveness through case studies. In particular, we replicate findings from prior research on BERT’s embedding properties across various layers of its architecture and provide further observations into the linguistic properties of topological neighborhoods.
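
The mapper-graph definition given above (nodes are clusters of embeddings, and an edge joins two nodes whose clusters overlap) can be shown in a few lines. This sketch covers only the edge-construction step and assumes the cover and clustering have already produced the node memberships; the node names and embedding ids are hypothetical.

```python
from itertools import combinations


def mapper_edges(nodes):
    """Return an edge for every pair of nodes whose clusters share at least one embedding.

    nodes: dict mapping node name -> set of embedding ids in that topological neighborhood.
    """
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if nodes[a] & nodes[b]
    )


# Hypothetical clusters of embedding ids.
nodes = {
    "n0": {0, 1, 2},
    "n1": {2, 3},
    "n2": {3, 4},
    "n3": {5},
}
edges = mapper_edges(nodes)
print(edges)
```

Nodes, edges, and the paths and connected components they form are the kinds of explorable elements the taxonomy in the abstract refers to; `n3` above is an isolated component.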


Article 1106

Title@2025-07-24 (4): Hybrid quantum-classical algorithm for near-optimal planning in POMDPs

Title: Hybrid quantum-classical algorithm for near-optimal planning in POMDPs

Hybrider quantenklassischer Algorithmus zur nahezu optimalen Planung in POMDPs

POMDPs中接近最佳规划的混合量子-古典量子算法 2507.18606v1

Authors (5): Gilberto Cunha, Alexandra Ramôa, André Sequeira, Michael de Oliveira, Luís Barbosa

Reinforcement learning (RL) provides a principled framework for decision-making in partially observable environments, which can be modeled as Markov decision processes and compactly represented through dynamic decision Bayesian networks. Recent advances demonstrate that inference on sparse Bayesian networks can be accelerated using quantum rejection sampling combined with amplitude amplification, leading to a computational speedup in estimating acceptance probabilities. Building on this result, we introduce Quantum Bayesian Reinforcement Learning (QBRL), a hybrid quantum-classical look-ahead algorithm for model-based RL in partially observable environments. We present a rigorous, oracle-free time complexity analysis under fault-tolerant assumptions for the quantum device. Unlike standard treatments that assume a black-box oracle, we explicitly specify the inference process, allowing our bounds to more accurately reflect the true computational cost. We show that, for environments whose dynamics form a sparse Bayesian network, horizon-based near-optimal planning can be achieved sub-quadratically faster through quantum-enhanced belief updates. Furthermore, we present numerical experiments benchmarking QBRL against its classical counterpart on simple yet illustrative decision-making tasks. Our results offer a detailed analysis of how the quantum computational advantage translates into decision-making performance, highlighting that the magnitude of the advantage can vary significantly across different deployment settings.


Article 1107

Title@2025-07-24 (4): Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures

Title: Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures

Beyond Euklid: Ein illustrierter Leitfaden zum modernen maschinellen Lernen mit geometrischen, topologischen und algebraischen Strukturen

欧几里特以外:带有几何、地形学和代数结构的现代机器学习设计指南 2407.09468v2

Authors (11): Mathilde Papillon, Sophia Sanborn, Johan Mathe, Louisa Cornelis, Abby Bertics, Domas Buracas, Hansen J Lillemark, Christian Shewmake, Fatih Dinc, Xavier Pennec, Nina Miolane

The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field.


Article 1108

Title@2025-07-24 (4): Demystify Protein Generation with Hierarchical Conditional Diffusion Models

Title: Demystify Protein Generation with Hierarchical Conditional Diffusion Models

Entmystifizieren Protein-Generation mit Hierarchische Bedingte Diffusion Modelle

使用等级级有条件扩散模型解密蛋白一代 2507.18603v1

Authors (5): Zinan Ling, Yi Shi, Da Yan, Yang Zhou, Bo Hui

Generating novel and functional protein sequences is critical to a wide range of applications in biology. Recent advancements in conditional diffusion models have shown impressive empirical performance in protein generation tasks. However, reliable protein generation remains an open research question in de novo protein design, especially when it comes to conditional diffusion models. Considering that the biological function of a protein is determined by multi-level structures, we propose a novel multi-level conditional diffusion model that integrates both sequence-based and structure-based information for efficient end-to-end protein design guided by specified functions. By generating representations at different levels simultaneously, our framework can effectively model the inherent hierarchical relations between different levels, resulting in an informative and discriminative representation of the generated protein. We also propose Protein-MMD, a new reliable evaluation metric, to evaluate the quality of proteins generated with conditional diffusion models. Our new metric is able to capture both distributional and functional similarities between real and generated protein sequences while ensuring conditional consistency. We experiment with the benchmark datasets, and the results on conditional protein generation tasks demonstrate the efficacy of the proposed generation framework and evaluation metric.


Article 1109

Title@2025-07-24 (4): Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs

Title: Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs

Sparse Logit Sampling: Beschleunigung der Wissensdestillation in LLMs

粗略的登录抽样:加速在LLMs中进行知识蒸馏 2503.16870v2

Authors (8): Anshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee, Haejun Lee, Joohyung Lee

Knowledge distillation can be a cost-effective technique to distill knowledge in Large Language Models, if the teacher output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation such as caching Top-K probabilities, while intuitive, provide biased estimates of teacher probability distribution to the student, resulting in suboptimal performance and calibration. We propose an importance-sampling-based method, “Random Sampling Knowledge Distillation”, which provides unbiased estimates, preserves the gradient in expectation, and requires storing significantly sparser logits. Our method enables faster training of student models with marginal overhead (<10%) compared to cross-entropy based training, while maintaining competitive performance compared to full distillation, across a range of model sizes from 300M to 3B.
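
To make the bias argument concrete, the sketch below contrasts two ways of caching a sparse version of a teacher distribution. It assumes the simplest sampling scheme, drawing tokens from the teacher distribution itself and storing normalized counts; the paper's actual proposal distribution and weighting are not given in the abstract, and the toy distribution and function names are illustrative.

```python
import random
from collections import Counter


def sample_sparse_targets(probs, n_samples, rng):
    """Draw tokens from the teacher distribution and store normalized counts.

    The expected value of each stored entry equals the teacher probability,
    so the sparse target is an unbiased estimate of the full distribution.
    """
    draws = rng.choices(range(len(probs)), weights=probs, k=n_samples)
    counts = Counter(draws)
    return {tok: c / n_samples for tok, c in counts.items()}


def top_k_targets(probs, k):
    """Keep the k largest probabilities and renormalize (biased: the tail gets zero mass)."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}


teacher = [0.5, 0.3, 0.1, 0.06, 0.04]  # toy teacher distribution over 5 tokens
rng = random.Random(0)

# Average many sparse targets to check that they recover the teacher distribution.
trials = 2000
avg = [0.0] * len(teacher)
for _ in range(trials):
    for tok, q in sample_sparse_targets(teacher, 10, rng).items():
        avg[tok] += q / trials

top2 = top_k_targets(teacher, 2)
print([round(a, 2) for a in avg], top2)
```

Each sampled target stores at most 10 entries, yet their average approaches the teacher distribution, whereas the Top-2 target never assigns any probability to the three tail tokens no matter how often it is cached.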


Article 1110

Title@2025-07-24 (4): Linear Memory SE(2) Invariant Attention

Title: Linear Memory SE(2) Invariant Attention

Linearer Speicher SE(2) Invariante Aufmerksamkeit

线性内存 SE(2) 惯性注意 2507.18597v1

Authors (6): Ethan Pronovost, Neha Boloor, Peter Schleede, Noureldin Hendy, Andres Morales, Nicholas Roy

Processing spatial data is a key component in many learning tasks for autonomous driving such as motion forecasting, multi-agent simulation, and planning. Prior works have demonstrated the value in using SE(2) invariant network architectures that consider only the relative poses between objects (e.g. other agents, scene features such as traffic lanes). However, these methods compute the relative poses for all pairs of objects explicitly, requiring quadratic memory. In this work, we propose a mechanism for SE(2) invariant scaled dot-product attention that requires linear memory relative to the number of objects in the scene. Our SE(2) invariant transformer architecture enjoys the same scaling properties that have benefited large language models in recent years. We demonstrate experimentally that our approach is practical to implement and improves performance compared to comparable non-invariant architectures.

Article 1111

Title@2025-07-24 (4): Private Counterfactual Retrieval

Title: Private Counterfactual Retrieval

Private kontraaktische Retrieval

私人反事实检索 2410.13812v2

Authors (5): Mohamed Nomeir, Pasan Dissanayake, Shreya Meel, Sanghamitra Dutta, Sennur Ulukus

Transparency and explainability are two extremely important aspects to be considered when employing black-box machine learning models in high-stakes applications. Providing counterfactual explanations is one way of fulfilling this requirement. However, this also poses a threat to the privacy of both the institution that is providing the explanation and the user who is requesting it. In this work, we propose multiple schemes inspired by private information retrieval (PIR) techniques which ensure the user’s privacy when retrieving counterfactual explanations. We present a scheme which retrieves the exact nearest neighbor counterfactual explanation from a database of accepted points while achieving perfect (information-theoretic) privacy for the user. While the scheme achieves perfect privacy for the user, some leakage on the database is inevitable, which we quantify using a mutual information based metric. Furthermore, we propose strategies to reduce this leakage to achieve an advanced degree of database privacy. We extend these schemes to incorporate the user’s preference on transforming their attributes, so that a more actionable explanation can be received. Since our schemes rely on finite field arithmetic, we empirically validate our schemes on real datasets to understand the trade-off between the accuracy and the finite field sizes. Finally, we present numerical results to support our theoretical findings, and compare the database leakage of the proposed schemes.

Article 1112

Title@2025-07-24 (4): DRWKV: Focusing on Object Edges for Low-Light Image Enhancement

Title: DRWKV: Focusing on Object Edges for Low-Light Image Enhancement

DRWKV: Fokussierung auf Objektkanten für Low-Light Image Enhancement

DRWKV: 关注低光图像增强对象边缘 2507.18594v1

Authors (8): Xuecheng Bai, Yuxiang Wang, Boyu Hu, Qinyuan Jie, Chuanzhi Xu, Hongru Xiao, Kechen Li, Vera Chung

Low-light image enhancement remains a challenging task, particularly in preserving object edge continuity and fine structural details under extreme illumination degradation. In this paper, we propose a novel model, DRWKV (Detailed Receptance Weighted Key Value), which integrates our proposed Global Edge Retinex (GER) theory, enabling effective decoupling of illumination and edge structures for enhanced edge fidelity. Secondly, we introduce Evolving WKV Attention, a spiral-scanning mechanism that captures spatial edge continuity and models irregular structures more effectively. Thirdly, we design the Bilateral Spectrum Aligner (Bi-SAB) and a tailored MS2-Loss to jointly align luminance and chrominance features, improving visual naturalness and mitigating artifacts. Extensive experiments on five LLIE benchmarks demonstrate that DRWKV achieves leading performance in PSNR, SSIM, and NIQE while maintaining low computational complexity. Furthermore, DRWKV enhances downstream performance in low-light multi-object tracking tasks, validating its generalization capabilities.

Article 1113

Title@2025-07-24 (4): On the Convergence of Gradient Descent on Learning Transformers with Residual Connections

Title: On the Convergence of Gradient Descent on Learning Transformers with Residual Connections

Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen

关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同 2506.05249v3

Authors (3): Zhen Qin, Jinxin Zhou, Zhihui Zhu

Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their training dynamics. Existing research predominantly examines isolated components–such as self-attention mechanisms and feedforward networks–without thoroughly investigating the interdependencies between these components, especially when residual connections are present. In this paper, we aim to bridge this gap by analyzing the convergence behavior of a structurally complete yet single-layer Transformer, comprising self-attention, a feedforward network, and residual connections. We demonstrate that, under appropriate initialization, gradient descent exhibits a linear convergence rate, where the convergence speed is determined by the minimum and maximum singular values of the output matrix from the attention layer. Moreover, our analysis reveals that residual connections serve to ameliorate the ill-conditioning of this output matrix, an issue stemming from the low-rank structure imposed by the softmax operation, thereby promoting enhanced optimization stability. We also extend our theoretical findings to a multi-layer Transformer architecture, confirming the linear convergence rate of gradient descent under suitable initialization. Empirical results corroborate our theoretical insights, illustrating the beneficial role of residual connections in promoting convergence stability.

Article 1114

Title@2025-07-24 (4): Beyond Internal Data: Constructing Complete Datasets for Fairness Testing

Title: Beyond Internal Data: Constructing Complete Datasets for Fairness Testing

Jenseits interner Daten: Konstruieren vollständiger Datensätze für Fairness-Tests

超越内部数据:为公平测试建立完整的数据集 2507.18561v1

Authors (4): Varsha Ramineni, Hossein A. Rahmani, Emine Yilmaz, David Barber

As AI becomes prevalent in high-risk domains and decision-making, it is essential to test for potential harms and biases. This urgency is reflected by the global emergence of AI regulations that emphasise fairness and adequate testing, with some mandating independent bias audits. However, procuring the necessary data for fairness testing remains a significant challenge. Particularly in industry settings, legal and privacy concerns restrict the collection of demographic data required to assess group disparities, and auditors face practical and cultural challenges in gaining access to data. Further, internal historical datasets are often insufficiently representative to identify real-world biases. This work focuses on evaluating classifier fairness when complete datasets including demographics are inaccessible. We propose leveraging separate overlapping datasets to construct complete synthetic data that includes demographic information and accurately reflects the underlying relationships between protected attributes and model features. We validate the fidelity of the synthetic data by comparing it to real data, and empirically demonstrate that fairness metrics derived from testing on such synthetic data are consistent with those obtained from real data. This work, therefore, offers a path to overcome real-world data scarcity for fairness testing, enabling independent, model-agnostic evaluation of fairness, and serving as a viable substitute where real data is limited.

Article 1115

Title@2025-07-24 (4): Concept Probing: Where to Find Human-Defined Concepts (Extended Version)

Title: Concept Probing: Where to Find Human-Defined Concepts (Extended Version)

Konzept-Probing: Wo man menschendefinierte Konzepte findet (erweiterte Version)

概念检验:如何找到人类定义的概念(扩展版本) 2507.18681v1

Authors (3): Manuel de Sousa Ribeiro, Afonso Leote, João Leite

Concept probing has recently gained popularity as a way for humans to peek into what is encoded within artificial neural networks. In concept probing, additional classifiers are trained to map the internal representations of a model into human-defined concepts of interest. However, the performance of these probes is highly dependent on the internal representations they probe from, making identifying the appropriate layer to probe an essential task. In this paper, we propose a method to automatically identify which layer’s representations in a neural network model should be considered when probing for a given human-defined concept of interest, based on how informative and regular the representations are with respect to the concept. We validate our findings through an exhaustive empirical analysis over different neural network models and datasets.

Article 1116

Title@2025-07-24 (4): Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards

Title: Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards

Omni-Thinker: Skalierung der Cross-Domain-Verallgemeinerung in LLMs über Multi-Task RL mit Hybrid Rewards

Omni-Thinker:通过多任务RL与混合奖励在LLMLM中扩大跨域通用化 2507.14783v2

Authors (13): Derek Li, Jiaming Zhou, Amirreza Kazemi, Qianyi Sun, Abbas Ghaddar, Mohammad Ali Alomrani, Liheng Ma, Yu Luo, Dong Li, Feng Wen, Jianye Hao, Mark Coates, Yingxue Zhang

The advancement of general-purpose artificial intelligence relies on large language models (LLMs) that excel across a wide range of tasks, from structured reasoning to creative generation. However, post-training methods like Supervised Fine-Tuning (SFT) often struggle with generalization, favoring memorization over transferable learning. In this work, we introduce Omni-Thinker, a unified reinforcement learning (RL) framework that enhances LLM performance across diverse tasks by combining rule-based verifiable rewards with generative preference signals via LLM-as-a-Judge evaluations. Our approach enables consistent optimization across task types and scales RL-based training to subjective domains. We further investigate training strategies, demonstrating that a curriculum-based progression that orders tasks from structured to open-ended improves performance and reduces forgetting. Experimental results across four domains reveal that curriculum learning improves performance by 5.2% over joint training and 9.1% over model merging. These results highlight the importance of task-aware sampling and hybrid supervision in scaling RL-based post-training for general-purpose LLMs.

Article 1117

Title@2025-07-24 (4): LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important

Title: LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important

LagKV: Lag-Relative Information des KV-Cache erzählt, welche Token wichtig sind

LagKV: KV 缓存告诉哪个 Tokens 重要, 而 KV 缓存的拉格- 相对信息 2504.04704v2

Authors (4): Manlai Liang, JiaMing Zhang, Xiong Li, Jinlong Li

The growing size of the Key-Value (KV) cache during long-context inference with Large Language Models is the main obstacle to balancing deployment cost against task accuracy. To reduce the KV cache size in such scenarios, most previous efforts rely on attention weights to evict non-critical cache tokens. These methods come with a trade-off, however: they usually require major modification of the inference infrastructure and incur significant computational overhead. Based on the fact that Large Language Models are autoregressive, we propose LagKV, a KV compression strategy that relies only on straightforward comparisons among the KV entries themselves. It is a fully attention-free method that integrates easily into mainstream inference platforms and performs comparably to other, more complicated KV compression methods. Results on the RULER benchmark show that our approach outperforms SnapKV and StreamingLLM at different compression ratios. In particular, on the 64-digit passkey retrieval task, our method outperforms the attention-weight-based method $H_2O$ by over $50\%$ at the same compression ratios. Our code is available at https://github.com/AI-Lab-China-Merchants-Bank/LagKV.

Article 1118

Title@2025-07-24 (4): The Geometry of LLM Quantization: GPTQ as Babai’s Nearest Plane Algorithm

Title: The Geometry of LLM Quantization: GPTQ as Babai’s Nearest Plane Algorithm

Die Geometrie der LLM-Quantisierung: GPTQ als Babai’s nächste Flugzeugalgorithmus

LLM 定量法的几何测量:GPTQ作为Babai最接近的平地 2507.18553v1

Authors (3): Jiale Chen, Torsten Hoefler, Dan Alistarh

Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploy massive transformers onto more affordable accelerators. GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale. Yet, its inner workings are described as a sequence of ad-hoc algebraic updates that obscure any geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to first dimension) for a linear layer, GPTQ is mathematically identical to Babai’s nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer’s inputs. This equivalence is based on a sophisticated mathematical argument, and has two analytical consequences: (i) the GPTQ error propagation step gains an intuitive geometric interpretation; (ii) GPTQ inherits the error upper bound of Babai’s algorithm under the no-clipping condition. Taken together, these results place GPTQ on firm theoretical footing and open the door to importing decades of progress in lattice algorithms towards the design of future quantization algorithms for billion-parameter models.
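For readers unfamiliar with the lattice side of this equivalence, here is a minimal sketch of Babai's nearest plane algorithm itself. It is a textbook version, not GPTQ and not the paper's code; the function name and the convention that basis vectors are stored as columns are assumptions. As in the abstract, the loop runs from the last dimension to the first.

```python
import numpy as np

def babai_nearest_plane(basis, target):
    """Return integer coefficients c such that basis @ c is a lattice point near target.

    basis: (m, n) array whose columns are linearly independent lattice vectors.
    """
    # QR yields the Gram-Schmidt directions: b*_i = r[i, i] * q[:, i].
    q, r = np.linalg.qr(basis)
    n = basis.shape[1]
    coeffs = np.zeros(n, dtype=int)
    residual = np.asarray(target, dtype=float).copy()
    for i in range(n - 1, -1, -1):
        # <residual, b*_i> / <b*_i, b*_i>, rounded to the nearest integer.
        c = int(np.round(q[:, i] @ residual / r[i, i]))
        coeffs[i] = c
        # Remove the chosen multiple of the actual (non-orthogonalized) basis vector.
        residual -= c * basis[:, i]
    return coeffs

basis = np.array([[2.0, 1.0],
                  [0.0, 2.0]])      # toy lattice basis (hypothetical)
target = np.array([4.1, -4.2])
coeffs = babai_nearest_plane(basis, target)
closest = basis @ coeffs
```

With an orthogonal basis the procedure reduces to coordinate-wise rounding; the subtraction of `c * basis[:, i]` is the step that propagates each rounding decision to the remaining dimensions.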

Article 1119

Title@2025-07-24 (4): Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Title: Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Zeroth-Order Feinsteuerung von LLMs in Random Subspaces

随机子空间中LLMs的零级微调微调 2410.08989v3

Authors (6): Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang

Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative by using forward passes to estimate gradients, but the variance of gradient estimates typically scales linearly with the model’s parameter dimension, a significant issue for LLMs. In this paper, we propose the random Subspace Zeroth-order (SubZero) optimization to address the challenges posed by LLMs’ high dimensionality. We introduce a low-rank perturbation tailored for LLMs that significantly reduces memory consumption while improving training performance. Additionally, we prove that our gradient estimation closely approximates the backpropagation gradient, exhibits lower variance than traditional ZO methods, and ensures convergence when combined with SGD. Experimental results show that SubZero enhances fine-tuning performance and achieves faster convergence compared to standard ZO approaches like MeZO across various language modeling tasks. Code is available at https://github.com/zimingyy/SubZero.
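The forward-pass-only gradient estimate described here can be sketched generically. This is a standard two-point zeroth-order estimator combined with a hypothetical low-rank perturbation of the form U S V^T; it is not taken from the SubZero repository, and the function names, shapes, and toy quadratic loss are assumptions.

```python
import numpy as np

def low_rank_perturbation(shape, rank, rng):
    """Hypothetical low-rank direction U @ S @ V.T for an (m, n) weight matrix."""
    m, n = shape
    u, _ = np.linalg.qr(rng.normal(size=(m, rank)))   # orthonormal columns
    v, _ = np.linalg.qr(rng.normal(size=(n, rank)))
    s = rng.normal(size=(rank, rank))
    return u @ s @ v.T

def zo_gradient(loss, weights, z, eps=1e-3):
    """Two-point estimate: two forward passes, no backpropagation."""
    slope = (loss(weights + eps * z) - loss(weights - eps * z)) / (2 * eps)
    return slope * z

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))          # toy weight matrix
T = rng.normal(size=(6, 4))          # toy target

def loss(weights):
    return 0.5 * np.sum((weights - T) ** 2)

z = low_rank_perturbation(W.shape, rank=2, rng=rng)
estimate = zo_gradient(loss, W, z)
true_grad = W - T                    # known in closed form for this toy loss
```

The estimate equals the true gradient projected onto the direction `z` and rescaled along `z`, so it only ever moves the weights inside the sampled low-rank subspace.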

Article 1120

Title@2025-07-24 (4): On the Performance of Concept Probing: The Influence of the Data (Extended Version)

Title: On the Performance of Concept Probing: The Influence of the Data (Extended Version)

Zur Performance von Konzept-Probing: Der Einfluss der Daten (Erweiterte Version)

关于 “ 概念检验:数据的影响 “ 的绩效(扩展版) 2507.18550v1

Authors (3): Manuel de Sousa Ribeiro, Afonso Leote, João Leite

Concept probing has recently garnered increasing interest as a way to help interpret artificial neural networks, dealing both with their typically large size and their subsymbolic nature, which ultimately renders them unfeasible for direct human interpretation. Concept probing works by training additional classifiers to map the internal representations of a model into human-defined concepts of interest, thus allowing humans to peek inside artificial neural networks. Research on concept probing has mainly focused on the model being probed or the probing model itself, paying limited attention to the data required to train such probing models. In this paper, we address this gap. Focusing on concept probing in the context of image classification tasks, we investigate the effect of the data used to train probing models on their performance. We also make available concept labels for two widely used datasets.

Article 1121

Title@2025-07-24 (4): Market Making Strategies with Reinforcement Learning

Title: Market Making Strategies with Reinforcement Learning

Strategien für die Marktentwicklung mit dem Ausbau des Lernens

具有强化学习的市场战略 2507.18680v1

Authors (1): Óscar Fernández Vicente

This thesis presents the results of a comprehensive research project focused on applying Reinforcement Learning (RL) to the problem of market making in financial markets. Market makers (MMs) play a fundamental role in providing liquidity, yet face significant challenges arising from inventory risk, competition, and non-stationary market dynamics. This research explores how RL, particularly Deep Reinforcement Learning (DRL), can be employed to develop autonomous, adaptive, and profitable market making strategies. The study begins by formulating the MM task as a reinforcement learning problem, designing agents capable of operating in both single-agent and multi-agent settings within a simulated financial environment. It then addresses the complex issue of inventory management using two complementary approaches: reward engineering and Multi-Objective Reinforcement Learning (MORL). While the former uses dynamic reward shaping to guide behavior, the latter leverages Pareto front optimization to explicitly balance competing objectives. To address the problem of non-stationarity, the research introduces POW-dTS, a novel policy weighting algorithm based on Discounted Thompson Sampling. This method allows agents to dynamically select and combine pretrained policies, enabling continual adaptation to shifting market conditions. The experimental results demonstrate that the proposed RL-based approaches significantly outperform traditional and baseline algorithmic strategies across various performance metrics. Overall, this research thesis contributes new methodologies and insights for the design of robust, efficient, and adaptive market making agents, reinforcing the potential of RL to transform algorithmic trading in complex financial systems.

Article 1122

Title@2025-07-24 (4): The Price equation reveals a universal force-metric-bias law of algorithmic learning and natural selection

Title: The Price equation reveals a universal force-metric-bias law of algorithmic learning and natural selection

Die Preisgleichung zeigt ein universelles Gesetz des algorithmischen Lernens und der natürlichen Selektion.

价格方程式揭示了一种通用的算法学习法和自然选择法 2507.18549v1

Authors (1): Steven A. Frank

Diverse learning algorithms, optimization methods, and natural selection share a common mathematical structure, despite their apparent differences. Here I show that a simple notational partitioning of change by the Price equation reveals a universal force-metric-bias (FMB) law: $\Delta\mathbf{\theta} = \mathbf{M}\,\mathbf{f} + \mathbf{b} + \mathbf{\xi}$. The force $\mathbf{f}$ drives improvement in parameters, $\Delta\mathbf{\theta}$, through the covariance between the parameters and performance. The metric $\mathbf{M}$ rescales movement by inverse curvature. The bias $\mathbf{b}$ adds momentum or changes in the frame of reference. The noise $\mathbf{\xi}$ enables exploration. This framework unifies natural selection, Bayesian updating, Newton’s method, stochastic gradient descent, stochastic Langevin dynamics, Adam optimization, and most other algorithms as special cases of the same underlying process. The Price equation also reveals why Fisher information, Kullback-Leibler divergence, and d’Alembert’s principle arise naturally in learning dynamics. By exposing this common structure, the FMB law provides a principled foundation for understanding, comparing, and designing learning algorithms across disciplines.
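A small sketch can make the update rule $\Delta\mathbf{\theta} = \mathbf{M}\,\mathbf{f} + \mathbf{b} + \mathbf{\xi}$ concrete. The mapping used here is an illustrative reading rather than the paper's derivation: the force is taken to be the negative gradient of a toy quadratic loss, and the function name `fmb_step`, the matrix `A`, and the vector `c` are assumptions.

```python
import numpy as np

def fmb_step(theta, M, f, b=None, xi=None):
    """Apply one update theta + M f + b + xi and return the new parameters."""
    b = np.zeros_like(theta) if b is None else b
    xi = np.zeros_like(theta) if xi is None else xi
    return theta + M @ f + b + xi

# Toy loss L(theta) = 0.5 theta^T A theta - c^T theta (hypothetical).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])

def force(theta):
    """Assumed force: the negative loss gradient c - A theta."""
    return c - A @ theta

theta0 = np.zeros(2)
lr = 0.1
rng = np.random.default_rng(0)

# Gradient descent: metric is a scaled identity, no bias, no noise.
gd = fmb_step(theta0, lr * np.eye(2), force(theta0))
# Newton's method: metric is the inverse curvature.
newton = fmb_step(theta0, np.linalg.inv(A), force(theta0))
# Momentum: bias carries a fraction of the previous step.
momentum = fmb_step(gd, lr * np.eye(2), force(gd), b=0.9 * (gd - theta0))
# Langevin-style update: same metric as gradient descent plus Gaussian noise.
langevin = fmb_step(theta0, lr * np.eye(2), force(theta0),
                    xi=np.sqrt(2 * lr) * rng.normal(size=2))
```

Only the choice of `M`, `b`, and `xi` changes between the four calls, which is the sense in which the abstract describes these algorithms as special cases of one form.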

Article 1123

Title@2025-07-24 (4): Learning Gentle Grasping Using Vision, Sound, and Touch

Title: Learning Gentle Grasping Using Vision, Sound, and Touch

Sanftes Greifen lernen mit Vision, Sound und Touch

利用愿景、声音和触摸进行轻巧的学习 2503.07926v2

Authors (2): Ken Nakahara, Roberto Calandra

In our daily life, we often encounter objects that are fragile and can be damaged by excessive grasping force, such as fruits. For these objects, it is paramount to grasp gently – not using the maximum amount of force possible, but rather the minimum amount of force necessary. This paper proposes using visual, tactile, and auditory signals to learn to grasp and regrasp objects stably and gently. Specifically, we use audio signals as an indicator of gentleness during the grasping, and then train an end-to-end action-conditional model from raw visuo-tactile inputs that predicts both the stability and the gentleness of future grasping candidates, thus allowing the selection and execution of the most promising action. Experimental results on a multi-fingered hand over 1,500 grasping trials demonstrated that our model is useful for gentle grasping by validating the predictive performance (3.27% higher accuracy than the vision-only variant) and providing interpretations of their behavior. Finally, real-world experiments confirmed that the grasping performance with the trained multi-modal model outperformed other baselines (17% higher rate for stable and gentle grasps than vision-only). Our approach requires neither tactile sensor calibration nor analytical force modeling, drastically reducing the engineering effort to grasp fragile objects. Dataset and videos are available at https://lasr.org/research/gentle-grasping.

Article 1124

Title@2025-07-24 (4): Deep Variational Free Energy Calculation of Hydrogen Hugoniot

Title: Deep Variational Free Energy Calculation of Hydrogen Hugoniot

Tiefe Variationsfreie Energieberechnung von Wasserstoff Hugoniot

雨原氢能深变化式自由能源计算 2507.18540v1

Authors (4): Zihang Li, Hao Xie, Xinyang Dong, Lei Wang

We develop a deep variational free energy framework to compute the equation of state of hydrogen in the warm dense matter region. This method parameterizes the variational density matrix of hydrogen nuclei and electrons at finite temperature using three deep generative models: a normalizing flow model that represents the Boltzmann distribution of the classical nuclei, an autoregressive transformer that models the distribution of electrons in excited states, and a permutational equivariant flow model that constructs backflow coordinates for electrons in Hartree-Fock orbitals. By jointly optimizing the three neural networks to minimize the variational free energy, we obtain the equation of state and related thermodynamic properties of dense hydrogen. We compare our results with other theoretical and experimental results on the deuterium Hugoniot curve, aiming to resolve existing discrepancies. The calculated results provide a valuable benchmark for deuterium in the warm dense matter region.

Article 1125

Title@2025-07-24 (4): External Knowledge Injection for CLIP-Based Class-Incremental Learning

Title: External Knowledge Injection for CLIP-Based Class-Incremental Learning

Externe Wissensinjektion für CLIP-basiertes Klassen-Inkrementelles Lernen

为基于CLIP的高级类强化学习提供外部知识注射 2503.08510v2

Authors (6): Da-Wei Zhou, Kai-Wen Li, Jingyi Ning, Han-Jia Ye, Lijun Zhang, De-Chuan Zhan

Class-Incremental Learning (CIL) enables learning systems to continuously adapt to evolving data streams. With the advancement of pre-training, leveraging pre-trained vision-language models (e.g., CLIP) offers a promising starting point for CIL. However, CLIP makes decisions by matching visual embeddings to class names, overlooking the rich contextual information conveyed through language. For instance, the concept of “cat” can be decomposed into features like tail, fur, and face for recognition. Besides, since the model is continually updated, these detailed features are overwritten in CIL, requiring external knowledge for compensation. In this paper, we introduce ExterNal knowledGe INjEction (ENGINE) for CLIP-based CIL. To enhance knowledge transfer from outside the dataset, we propose a dual-branch injection tuning framework that encodes informative knowledge from both visual and textual modalities. The visual branch is enhanced with data augmentation to enrich the visual features, while the textual branch leverages GPT-4 to rewrite discriminative descriptors. In addition to this on-the-fly knowledge injection, we also implement post-tuning knowledge by re-ranking the prediction results during inference. With the injected knowledge, the model can better capture informative features for downstream tasks as data evolves. Extensive experiments demonstrate the state-of-the-art performance of ENGINE. Code is available at: https://github.com/LAMDA-CL/ICCV25-ENGINE

Article 1126

Title@2025-07-24 (4): Elucidating the Design Space of Arbitrary-Noise-Based Diffusion Models

Title: Elucidating the Design Space of Arbitrary-Noise-Based Diffusion Models

Erklärung des Design-Raums für willkürlich-lärmbasierte Diffusionsmodelle

说明以任意噪音为基础的传播模型的设计空间 2507.18534v1

Authors (10): Xingyu Qiu, Mengying Yang, Xinghua Ma, Dong Liang, Yuzhen Li, Fanding Li, Gongning Luo, Wei Wang, Kuanquan Wang, Shuo Li

EDM elucidates the unified design space of diffusion models, yet its fixed noise pattern, restricted to pure Gaussian noise, limits advances in image restoration. Our study indicates that forcibly injecting Gaussian noise corrupts the degraded images, overextends the image transformation distance, and increases restoration complexity. To address this problem, our proposed EDA Elucidates the Design space of Arbitrary-noise-based diffusion models. Theoretically, EDA expands the freedom of the noise pattern while preserving the original module flexibility of EDM, with a rigorous proof that increased noise complexity incurs no additional computational overhead during restoration. EDA is validated on three typical tasks: MRI bias field correction (global smooth noise), CT metal artifact reduction (global sharp noise), and natural image shadow removal (local boundary-aware noise). With only 5 sampling steps, EDA outperforms most task-specific methods and achieves state-of-the-art performance in bias field correction and shadow removal.

Article 1127

Title@2025-07-24 (4): C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation

Title: C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation

C2G-KD: PCA-Constrained Generator für datenfreie Wissensdestillation

C2G-KD: 五氯苯甲醚-经培训的无数据知识蒸馏生成器 2507.18533v1

Authors (2): Magnus Bengtsson, Kenneth Östberg

We introduce C2G-KD, a data-free knowledge distillation framework where a class-conditional generator is trained to produce synthetic samples guided by a frozen teacher model and geometric constraints derived from PCA. The generator never observes real training data but instead learns to activate the teacher’s output through a combination of semantic and structural losses. By constraining generated samples to lie within class-specific PCA subspaces estimated from as few as two real examples per class, we preserve topological consistency and diversity. Experiments on MNIST show that even minimal class structure is sufficient to bootstrap useful synthetic training pipelines.
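The geometric constraint described here, keeping a sample inside a class-specific PCA subspace estimated from very few examples, can be sketched as a projection. This is a generic PCA projection and not the C2G-KD training loop; the function names, the 784-dimensional toy vectors, and the use of a hard projection (the paper may use a loss term instead) are assumptions.

```python
import numpy as np

def fit_class_subspace(examples, n_components):
    """Estimate a class mean and its leading principal directions via SVD."""
    mean = examples.mean(axis=0)
    _, _, vt = np.linalg.svd(examples - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(x, mean, components):
    """Project x onto the affine subspace mean + span(components)."""
    return mean + (x - mean) @ components.T @ components

rng = np.random.default_rng(0)
real_examples = rng.normal(size=(2, 784))   # two flattened toy images of one class
mean, components = fit_class_subspace(real_examples, n_components=1)

generated = rng.normal(size=784)            # stand-in for a generator output
constrained = project(generated, mean, components)
```

Two examples span only a one-dimensional affine subspace, so `n_components=1` is the most that can be estimated in this setting.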

Article 1128

Title@2025-07-24 (4): Diffuse and Disperse: Image Generation with Representation Regularization

Title: Diffuse and Disperse: Image Generation with Representation Regularization

Diffuse und Disperse: Bildgenerierung mit Repräsentationsregularisierung

Diffuse & diffperse: 形象生成,有代表性的规范化 2506.09027v2

Authors (2): Runqian Wang, Kaiming He

The development of diffusion-based generative models over the past decade has largely proceeded independently of progress in representation learning. These diffusion models typically rely on regression-based objectives and generally lack explicit regularization. In this work, we propose Dispersive Loss, a simple plug-and-play regularizer that effectively improves diffusion-based generative models. Our loss function encourages internal representations to disperse in the hidden space, analogous to contrastive self-supervised learning, with the key distinction that it requires no positive sample pairs and therefore does not interfere with the sampling process used for regression. Compared to the recent method of representation alignment (REPA), our approach is self-contained and minimalist, requiring no pre-training, no additional parameters, and no external data. We evaluate Dispersive Loss on the ImageNet dataset across a range of models and report consistent improvements over widely used and strong baselines. We hope our work will help bridge the gap between generative modeling and representation learning.

Article 1129

Title@2025-07-24 (4): Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench

Title: Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench

Sind KI-erzeugte Fixes sicher? LLM und Agent Patches auf der SWE-Bench analysieren

AI - 具有安全性吗? 分析SWE-bench 上的LLM 和代理补丁 2507.02976v2

Authors (3): Amirali Sajadi, Kostadin Damevski, Preetha Chatterjee

Large Language Models (LLMs) and their agentic frameworks are increasingly adopted to automate software development tasks such as issue resolution and program repair. While prior work has identified security risks in LLM-generated code, most evaluations have focused on synthetic or isolated settings, leaving open questions about the security of these systems in real-world development contexts. In this study, we present the first large-scale security analysis of LLM-generated patches using 20,000+ issues from the SWE-bench dataset. We evaluate patches produced by a standalone LLM (Llama 3.3) and compare them to developer-written patches. We also assess the security of patches generated by three top-performing agentic frameworks (OpenHands, AutoCodeRover, HoneyComb) on a subset of our data. Finally, we analyze a wide range of code, issue, and project-level factors to understand the conditions under which LLMs and agents are most likely to generate insecure code. Our findings reveal that the standalone LLM introduces nearly 9x more new vulnerabilities than developers, with many of these exhibiting unique patterns not found in developers’ code. Agentic workflows also generate a significant number of vulnerabilities, particularly when granting LLMs more autonomy, potentially increasing the likelihood of misinterpreting project context or task requirements. We find that vulnerabilities are more likely to occur in LLM patches associated with a higher number of files, more lines of generated code, and GitHub issues that lack specific code snippets or information about the expected code behavior and steps to reproduce. These results suggest that contextual factors play a critical role in the security of the generated code and point toward the need for proactive risk assessment methods that account for both code and issue-level information to complement existing vulnerability detection tools.

Article 1130

Title@2025-07-24 (4): The Moral Gap of Large Language Models

Title: The Moral Gap of Large Language Models

Die moralische Kluft großer Sprachmodelle

大语言模式的道德差距 2507.18523v1

Authors (2): Maciej Skorski, Alina Landowska

Moral foundation detection is crucial for analyzing social discourse and developing ethically-aligned AI systems. While large language models excel across diverse tasks, their performance on specialized moral reasoning remains unclear. This study provides the first comprehensive comparison between state-of-the-art LLMs and fine-tuned transformers across Twitter and Reddit datasets using ROC, PR, and DET curve analysis. Results reveal substantial performance gaps, with LLMs exhibiting high false negative rates and systematic under-detection of moral content despite prompt engineering efforts. These findings demonstrate that task-specific fine-tuning remains superior to prompting for moral reasoning applications.


Article 1131

Title@2025-07-24 (4): Optimal Transport Regularized Divergences: Application to Adversarial Robustness

Title: Optimal Transport Regularized Divergences: Application to Adversarial Robustness

Optimaler Transport Regularisierte Divergenzen: Anwendung auf widrige Robustheit

优化运输常规化差异:适用于逆向强力 2309.03791v3

Authors (2): Jeremiah Birrell, Reza Ebrahimi

We introduce a new class of optimal-transport-regularized divergences, $D^c$, constructed via an infimal convolution between an information divergence, $D$, and an optimal-transport (OT) cost, $C$, and study their use in distributionally robust optimization (DRO). In particular, we propose the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These DRO-based methods are defined by minimizing the maximum expected loss over a $D^c$-neighborhood of the empirical distribution of the training data. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence; the addition of a principled and dynamical adversarial re-weighting on top of adversarial sample transport is a key innovation of $ARMOR_D$. $ARMOR_D$ can be viewed as a generalization of the best-performing loss functions and OT costs in the adversarial training literature; we demonstrate this flexibility by using $ARMOR_D$ to augment the UDR, TRADES, and MART methods and obtain improved performance on CIFAR-10 and CIFAR-100 image recognition. Specifically, augmenting with $ARMOR_D$ leads to 1.9\% and 2.1\% improvement against AutoAttack, a powerful ensemble of adversarial attacks, on CIFAR-10 and CIFAR-100 respectively. To foster reproducibility, we made the code accessible at https://github.com/star-ailab/ARMOR.
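The construction described in this abstract can be written out roughly as follows. This is a sketch under stated assumptions: the argument order of $D^c$, the intermediate measure $\eta$, the radius $\epsilon$, the loss $\mathcal{L}_\theta$, and the empirical distribution $P_n$ are notation introduced here for illustration, and the paper's exact definitions may differ.

```latex
% Infimal convolution of an information divergence D with an OT cost C
% (argument order is an assumption):
D^c(\nu \,\|\, \mu) \;=\; \inf_{\eta} \bigl\{ D(\eta \,\|\, \mu) + C(\eta, \nu) \bigr\}

% DRO objective: minimize the worst-case expected loss over a
% D^c-neighborhood of the empirical distribution P_n:
\min_{\theta} \; \sup_{Q \,:\, D^c(Q \,\|\, P_n) \le \epsilon} \; \mathbb{E}_{Q}\bigl[ \mathcal{L}_\theta \bigr]
```

Read this way, the intermediate measure $\eta$ carries the re-weighting of training samples (penalized by $D$), while the step from $\eta$ to $\nu$ carries the sample transport (penalized by $C$).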


Article 1132

Title@2025-07-24 (4): GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks

Title: GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks

GCC-Spam: Spam-Erkennung über GAN, Kontrastives Lernen und Charaktergleichheitsnetzwerke

海合会-Spam:通过全球大气监测网、反竞争学习和特征相似网络探测垃圾邮件 2507.14679v2

Authors (3): Zhijie Wang, Zixin Xu, Zhiyuan Pan

The exponential growth of spam text on the Internet necessitates robust detection mechanisms to mitigate risks such as information leakage and social instability. This work addresses two principal challenges: adversarial strategies employed by spammers and the scarcity of labeled data. We propose a novel spam-text detection framework GCC-Spam, which integrates three core innovations. First, a character similarity network captures orthographic and phonetic features to counter character-obfuscation attacks and furthermore produces sentence embeddings for downstream classification. Second, contrastive learning enhances discriminability by optimizing the latent-space distance between spam and normal texts. Third, a Generative Adversarial Network (GAN) generates realistic pseudo-spam samples to alleviate data scarcity while improving model robustness and classification accuracy. Extensive experiments on real-world datasets demonstrate that our model outperforms baseline approaches, achieving higher detection rates with significantly fewer labeled examples.


Article 1133

Title@2025-07-24 (4): Robust sensitivity control in digital pathology via tile score distribution matching

Title: Robust sensitivity control in digital pathology via tile score distribution matching

Robuste Sensitivitätskontrolle in der digitalen Pathologie über Kacheln-Score-Verteilungsabgleich

通过瓷砖计分分布匹配对数字病理学中的强力敏感度控制 2502.20144v3

Authors (4): Arthur Pignet, John Klein, Genevieve Robin, Antoine Olivier

Deploying digital pathology models across medical centers is challenging due to distribution shifts. Recent advances in domain generalization improve model transferability in terms of aggregated performance measured by the Area Under Curve (AUC). However, clinical regulations often require control over the transferability of other metrics, such as prescribed sensitivity levels. We introduce a novel approach to control the sensitivity of whole slide image (WSI) classification models, based on optimal transport and Multiple Instance Learning (MIL). Validated across multiple cohorts and tasks, our method enables robust sensitivity control with only a handful of calibration samples, providing a practical solution for reliable deployment of computational pathology systems.


Article 1134

Title@2025-07-24 (4): GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning

Title: GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning

GLANCE: Graph Logic Attention Network mit Cluster Enhancement für heterophiles Graph Representation Learning

图表逻辑关注网络,通过群集增强混合图示代表性学习 2507.18521v1

Authors (5): Zhongtian Sun, Anoushka Harit, Alexandra Cristea, Christl A. Donnelly, Pietro Liò

Graph Neural Networks (GNNs) have demonstrated significant success in learning from graph-structured data but often struggle on heterophilous graphs, where connected nodes differ in features or class labels. This limitation arises from indiscriminate neighbor aggregation and insufficient incorporation of higher-order structural patterns. To address these challenges, we propose GLANCE (Graph Logic Attention Network with Cluster Enhancement), a novel framework that integrates logic-guided reasoning, dynamic graph refinement, and adaptive clustering to enhance graph representation learning. GLANCE combines a logic layer for interpretable and structured embeddings, multi-head attention-based edge pruning for denoising graph structures, and clustering mechanisms for capturing global patterns. Experimental results in benchmark datasets, including Cornell, Texas, and Wisconsin, demonstrate that GLANCE achieves competitive performance, offering robust and interpretable solutions for heterophilous graph scenarios. The proposed framework is lightweight, adaptable, and uniquely suited to the challenges of heterophilous graphs.


Article 1135

Title@2025-07-24 (4): Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise

Title: Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise

Euklidische Distanz Deflation unter hochdimensionalen heteroskedastischen Geräuschen

高多变性热电传噪声下的远距离通缩 2507.18520v1

Authors (3): Keyi Li, Yuval Kluger, Boris Landa

Pairwise Euclidean distance calculation is a fundamental step in many machine learning and data analysis algorithms. In real-world applications, however, these distances are frequently distorted by heteroskedastic noise, a prevalent form of inhomogeneous corruption characterized by variable noise magnitudes across data observations. Such noise inflates the computed distances in a nontrivial way, leading to misrepresentations of the underlying data geometry. In this work, we address the tasks of estimating the noise magnitudes per observation and correcting the pairwise Euclidean distances under heteroskedastic noise. Perhaps surprisingly, we show that in general high-dimensional settings and without assuming prior knowledge on the clean data structure or noise distribution, both tasks can be performed reliably, even when the noise levels vary considerably. Specifically, we develop a principled, hyperparameter-free approach that jointly estimates the noise magnitudes and corrects the distances. We provide theoretical guarantees for our approach, establishing probabilistic bounds on the estimation errors of both noise magnitudes and distances. These bounds, measured in the normalized $\ell_1$ norm, converge to zero at polynomial rates as both feature dimension and dataset size increase. Experiments on synthetic datasets demonstrate that our method accurately estimates distances in challenging regimes, significantly improving the robustness of subsequent distance-based computations. Notably, when applied to single-cell RNA sequencing data, our method yields noise magnitude estimates consistent with an established prototypical model, enabling accurate nearest neighbor identification that is fundamental to many downstream analyses.
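The inflation effect mentioned in this abstract follows from a short calculation. The notation below ($x_i$ for clean points, $e_i$ for noise, $r_i$ for the noise magnitude of observation $i$) is introduced here for illustration, and the calculation assumes zero-mean noise that is independent across observations. It does not reproduce the paper's estimator of $r_i$.

```latex
% Noisy observations: y_i = x_i + e_i, with E[e_i] = 0, E\|e_i\|^2 = r_i,
% and e_i independent of e_j for i \ne j (assumptions of this sketch).
\mathbb{E}\,\|y_i - y_j\|^2
  = \|x_i - x_j\|^2
  + \mathbb{E}\,\|e_i\|^2 + \mathbb{E}\,\|e_j\|^2
  = \|x_i - x_j\|^2 + r_i + r_j , \qquad i \ne j .

% Given estimates \hat r_i of the noise magnitudes, a natural deflated
% squared distance is therefore
\hat d_{ij}^{\,2} = \|y_i - y_j\|^2 - \hat r_i - \hat r_j .
```

The cross terms vanish in expectation because the noise is zero-mean and independent, which is why the inflation depends only on the two per-observation magnitudes.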


Article 1136

Title@2025-07-24 (4): Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning

Title: Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning

Revisiting Bisimulation Metric für robuste Darstellungen in Verstärkungs-Lernen

重新研究强化学习中强力代表制的模拟比照模型 2507.18519v1

Authors (4): Leiji Zhang, Zeyu Wang, Xin Li, Yao-Hui Li

The bisimulation metric has long been regarded as an effective control-related representation learning technique in various reinforcement learning tasks. However, in this paper, we identify two main issues with the conventional bisimulation metric: 1) an inability to represent certain distinctive scenarios, and 2) a reliance on predefined weights for differences in rewards and subsequent states during recursive updates. We find that the first issue arises from an imprecise definition of the reward gap, whereas the second issue stems from overlooking the varying importance of reward difference and next-state distinctions across different training stages and task settings. To address these issues, by introducing a measure for state-action pairs, we propose a revised bisimulation metric that features a more precise definition of the reward gap and novel update operators with adaptive coefficients. We also offer theoretical guarantees of convergence for our proposed metric and its improved representation distinctiveness. In addition to our rigorous theoretical analysis, we conduct extensive experiments on two representative benchmarks, DeepMind Control and Meta-World, demonstrating the effectiveness of our approach.


Article 1137

Title@2025-07-24 (4): Visual Adaptive Prompting for Compositional Zero-Shot Learning

Title: Visual Adaptive Prompting for Compositional Zero-Shot Learning

Visuelle Adaptive Prompting für kompositorisches Zero-Shot-Lernen

零热学习的视觉适应性促进 2502.20292v6

Authors (4): Kyle Stein, Arash Mahyari, Guillermo Francia, Eman El-Sheikh

Vision-Language Models (VLMs) have demonstrated impressive multimodal capabilities in learning joint representations of visual and textual data, making them powerful tools for tasks such as Compositional Zero-Shot Learning (CZSL). CZSL requires models to generalize to novel combinations of visual primitives–such as attributes and objects–that were not explicitly encountered during training. Recent works in prompting for CZSL have focused on modifying inputs for the text encoder, often using static prompts that do not change across varying visual contexts. However, these approaches struggle to fully capture varying visual contexts, as they focus on text adaptation rather than leveraging visual features for compositional reasoning. To address this, we propose a Visual Adaptive Prompting System (VAPS) that leverages a learnable visual prompt repository and similarity-based retrieval mechanism within the framework of VLMs to bridge the gap between semantic and visual features. Our method introduces a dynamic visual prompt repository mechanism that selects the most relevant attribute and object prompts based on the visual features of the image. Our proposed system includes a visual prompt adapter that encourages the model to learn a more generalizable embedding space. Experiments on three CZSL benchmarks, across both closed and open-world scenarios, demonstrate state-of-the-art results.


Article 1138

Title@2025-07-24 (4): A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area

Title: A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area

Eine Transfer-Lernmethode für die Segmentierung von Wasserkörpern in Fernerkundungsbildern: Eine Fallstudie des Zhada-Tulin-Gebiets

遥感图像中水体分离的转让学习方法:Zhada Tulin地区的案例研究 2507.10084v2

Authors (2): Haonan Chen, Xin Tong

The Tibetan Plateau, known as the Asian Water Tower, faces significant water security challenges due to its high sensitivity to climate change. Advancing Earth observation for sustainable water monitoring is thus essential for building climate resilience in this region. This study proposes a two-stage transfer learning strategy using the SegFormer model to overcome domain shift and data scarcity, key barriers in developing robust AI for climate-sensitive applications. After pre-training on a diverse source domain, our model was fine-tuned for the arid Zhada Tulin area. Experimental results show a substantial performance boost: the Intersection over Union (IoU) for water body segmentation surged from 25.50% (direct transfer) to 64.84%. This AI-driven accuracy is crucial for disaster risk reduction, particularly in monitoring flash flood-prone systems. More importantly, the high-precision map reveals a highly concentrated spatial distribution of water, with over 80% of the water area confined to less than 20% of the river channel length. This quantitative finding provides crucial evidence for understanding hydrological processes and designing targeted water management and climate adaptation strategies. Our work thus demonstrates an effective technical solution for monitoring arid plateau regions and contributes to advancing AI-powered Earth observation for disaster preparedness in critical transboundary river headwaters.
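For readers unfamiliar with the metric reported above, the following minimal sketch computes Intersection over Union for binary water masks. The function name `binary_iou`, the toy masks, and the convention for two empty masks are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def binary_iou(pred, target):
    """Intersection over Union for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Convention chosen for this sketch: two empty masks agree perfectly.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)

# Toy 2x3 masks (hypothetical data): 1 marks a water pixel.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
iou = binary_iou(pred, target)  # 2 shared pixels out of 4 in the union
```

In this toy case the two masks share two water pixels and cover four pixels in total, so the IoU is 0.5.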


Article 1139

Title@2025-07-24 (4): Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Title: Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Sublinearer Bedauern für eine Klasse von linear-Quadratischen Lernproblemen

连续时线性强化学习问题分类的子线性遗憾 2407.17226v6

Authors (3): Yilie Huang, Yanwei Jia, Xun Yu Zhou

We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of the recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating a better performance of the former in terms of regret bounds.


Article 1140

Title@2025-07-24 (4): Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment

Title: Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment

Multi-Preference Lambda-bewertet Listwise DPO für kleine Modellausrichtung

用于小规模模型调整的多参数 Lambda加权列表DPO 2506.19780v5

Authors (5): Yuhui Sun, Xiyao Wang, Zixi Li, Zhenlong Yuan, Jinman Zhao

Large language models (LLMs) demonstrate strong generalization across a wide range of language tasks, but often generate outputs that misalign with human preferences. Reinforcement Learning from Human Feedback (RLHF) addresses this by optimizing models toward human preferences using a learned reward function and reinforcement learning, yielding improved alignment but suffering from high computational cost and instability. Direct Preference Optimization (DPO) simplifies the process by treating alignment as a classification task over binary preference pairs, reducing training overhead while achieving competitive performance. However, it assumes fixed, single-dimensional preferences and only supports pairwise supervision. To address these limitations, we propose Multi-Preference Lambda-weighted Listwise DPO, which allows the model to learn from more detailed human feedback and flexibly balance multiple goals such as helpfulness, honesty, and fluency. Our method models full-ranked preference distributions rather than binary comparisons, enabling more informative learning signals. The lambda vector controls the relative importance of different alignment goals, allowing the model to generalize across diverse human objectives. During inference, lambda can be adjusted without retraining, providing controllable alignment behavior for downstream use. We also introduce a learned scheduler that dynamically samples performant lambda configurations to improve robustness. Notably, our method requires only 20GB of GPU memory for training, making it suitable for compute-constrained settings such as academic labs, educational tools, or on-device assistants. Experiments on 1B-2B scale models show that our method consistently outperforms standard DPO on alignment benchmarks while enabling efficient, controllable, and fine-grained adaptation suitable for real-world deployment.
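One plausible reading of the lambda-weighted listwise objective is sketched below. The abstract does not state the loss, so everything here is an assumption: the function name `lambda_listwise_dpo_loss`, the use of a softmax over scaled policy-to-reference log-ratios as the model's ranking distribution, the cross-entropy form, and the value of `beta`.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array."""
    z = z - np.max(z)
    return z - np.log(np.sum(np.exp(z)))

def lambda_listwise_dpo_loss(logp_policy, logp_ref, pref_dists, lam, beta=0.1):
    """Cross-entropy between a lambda-mixed target distribution over N
    candidate responses and the model's implied distribution.

    logp_policy, logp_ref: shape (N,), sequence log-probabilities.
    pref_dists: shape (K, N), one preference distribution per objective.
    lam: shape (K,), non-negative weights summing to 1.
    """
    logp_policy = np.asarray(logp_policy, dtype=float)
    logp_ref = np.asarray(logp_ref, dtype=float)
    # Implicit reward: scaled log-ratio against the reference model.
    scores = beta * (logp_policy - logp_ref)
    log_model = log_softmax(scores)
    # Mix the per-objective distributions with the lambda vector.
    target = np.asarray(lam, dtype=float) @ np.asarray(pref_dists, dtype=float)
    return float(-np.sum(target * log_model))

# Hypothetical example: 3 responses, 2 objectives (e.g. helpfulness, honesty).
pref_dists = np.array([[0.8, 0.1, 0.1],
                       [0.6, 0.3, 0.1]])
lam = np.array([0.5, 0.5])
logp_ref = np.array([-10.0, -10.0, -10.0])
loss_at_ref = lambda_listwise_dpo_loss(logp_ref, logp_ref, pref_dists, lam)
loss_shifted = lambda_listwise_dpo_loss(logp_ref + np.array([5.0, 0.0, 0.0]),
                                        logp_ref, pref_dists, lam)
```

Because `lam` enters only through the mixed target, changing it re-weights the objectives without touching the model's scores. In the example, raising the policy's log-probability of the response favored by the mixed target lowers the loss relative to the reference policy.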


Article 1141

Title@2025-07-24 (4): DualXDA: Towards Sparse, Efficient and Explainable Data Attribution in Large AI Models

Title: DualXDA: Towards Sparse, Efficient and Explainable Data Attribution in Large AI Models

DualXDA: Auf dem Weg zu sparsamen, effizienten und erklärbaren Datenzuweisungen in großen KI-Modellen

DUAXDA:在大型AI型模型中实现数据分散、高效和可解释的归属 2402.12118v2

Authors (5): Galip Ümit Yolcu, Moritz Weckbecker, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin

Deep learning models achieve remarkable performance, yet their decision-making processes often remain opaque. In response, the field of eXplainable Artificial Intelligence (XAI) has grown significantly over the last decade, primarily focusing on feature attribution methods. Complementing this perspective, Data Attribution (DA) has emerged as a promising paradigm that shifts the focus from features to data provenance. However, existing DA approaches suffer from prohibitively high computational costs and memory demands. Additionally, current attribution methods exhibit low sparsity, hindering the discovery of decisive patterns in the data. We introduce DualXDA, a framework for sparse, efficient and explainable DA, comprised of two interlinked approaches for Dual Data Attribution (DualDA) and eXplainable Data Attribution (XDA): With DualDA, we propose efficient and effective DA, leveraging Support Vector Machine theory to provide fast and naturally sparse data attributions for AI predictions. We demonstrate that DualDA achieves high attribution quality, excels at solving a series of evaluated downstream tasks, while at the same time improving explanation time by a factor of up to 4,100,000$\times$ compared to the original Influence Functions method, and up to 11,000$\times$ compared to the method’s most efficient approximation from literature. We further introduce XDA, a method for enhancing Data Attribution with capabilities from feature attribution methods to explain why training samples are relevant for the prediction of a test sample in terms of impactful features. Taken together, our contributions in DualXDA ultimately point towards a future of eXplainable AI applied at unprecedented scale, enabling transparent, efficient and novel analysis of even the largest neural architectures fostering a new generation of accountable AI systems. Code at https://github.com/gumityolcu/DualXDA.


Article 1142

Title@2025-07-24 (4): Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models

Title: Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models

Nicht alle Funktionen widmen sich der Aufmerksamkeit: Graphengeführtes Abhängigkeitslernen für tabellarische Datengenerierung mit Sprachmodellen

并非所有值得注意的地物:用语言模型编制图表数据时的图表指导依赖性学习 2507.18504v1

Authors (4): Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

Large Language Models (LLMs) have shown strong potential for tabular data generation by modeling textualized feature-value pairs. However, tabular data inherently exhibits sparse feature-level dependencies, where many feature interactions are structurally insignificant. This creates a fundamental mismatch as LLMs’ self-attention mechanism inevitably distributes focus across all pairs, diluting attention on critical relationships, particularly in datasets with complex dependencies or semantically ambiguous features. To address this limitation, we propose GraDe (Graph-Guided Dependency Learning), a novel method that explicitly integrates sparse dependency graphs into LLMs’ attention mechanism. GraDe employs a lightweight dynamic graph learning module guided by externally extracted functional dependencies, prioritizing key feature interactions while suppressing irrelevant ones. Our experiments across diverse real-world datasets demonstrate that GraDe outperforms existing LLM-based approaches by up to 12% on complex datasets while achieving competitive results with state-of-the-art approaches in synthetic data quality. Our method is minimally intrusive yet effective, offering a practical solution for structure-aware tabular data modeling with LLMs.


Article 1143

Title@2025-07-24 (4): PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

Title: PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

PLOT-TAL: Schnell lernen mit optimalem Transport für temporale Aktionslokalisierung

PLOT-TAL: 以最优化交通方式迅速学习,促进少数时空行动地方化 2403.18915v2

Authors (2): Edward Fish, Andrew Gilbert

Few-shot temporal action localization (TAL) methods that adapt large models via single-prompt tuning often fail to produce precise temporal boundaries. This stems from the model learning a non-discriminative mean representation of an action from sparse data, which compromises generalization. We address this by proposing a new paradigm based on multi-prompt ensembles, where a set of diverse, learnable prompts for each action is encouraged to specialize on compositional sub-events. To enforce this specialization, we introduce PLOT-TAL, a framework that leverages Optimal Transport (OT) to find a globally optimal alignment between the prompt ensemble and the video’s temporal features. Our method establishes a new state-of-the-art on the challenging few-shot benchmarks of THUMOS’14 and EPIC-Kitchens, without requiring complex meta-learning. The significant performance gains, particularly at high IoU thresholds, validate our hypothesis and demonstrate the superiority of learning distributed, compositional representations for precise temporal localization.
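The alignment step can be illustrated with entropic optimal transport solved by Sinkhorn iterations. This is a sketch under stated assumptions: the abstract says only that Optimal Transport is used, so the entropic regularization, the cosine cost, the uniform marginals, and the names `sinkhorn_plan` and `cosine_cost` are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def cosine_cost(prompts, frames):
    """Cost matrix 1 - cosine similarity between prompt and frame features."""
    p = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    return 1.0 - p @ f.T

def sinkhorn_plan(cost, eps=0.5, n_iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # mass per prompt
    b = np.full(m, 1.0 / m)   # mass per frame
    kernel = np.exp(-cost / eps)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)      # rescale rows toward marginal a
        v = b / (kernel.T @ u)    # rescale columns toward marginal b
    return u[:, None] * kernel * v[None, :]

# Hypothetical features: 4 prompts and 10 temporal frames, dimension 8.
rng = np.random.default_rng(0)
prompts = rng.normal(size=(4, 8))
frames = rng.normal(size=(10, 8))
plan = sinkhorn_plan(cosine_cost(prompts, frames))
# Per-frame score for the action: plan-weighted similarity, summed over prompts.
frame_scores = (plan * (1.0 - cosine_cost(prompts, frames))).sum(axis=0)
```

The marginal constraints force every prompt to carry an equal share of the total mass, which is one way to discourage all prompts from collapsing onto the same frames.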


Article 1144

Title@2025-07-24 (4): EarthLink: A Self-Evolving AI Agent for Climate Science

Title: EarthLink: A Self-Evolving AI Agent for Climate Science

EarthLink: Ein sich selbst entwickelnder KI-Agent für Klimawissenschaften

EarthLink:一个自我发展的AI气候科学代理机构 2507.17311v2

Authors (17): Zijie Guo, Jiong Wang, Xiaoyu Yue, Wangxu Wei, Zhe Jiang, Wanghan Xu, Ben Fei, Wenlong Zhang, Xinyu Gu, Lijing Cheng, Jing-Jia Luo, Chao Li, Yaqiang Wang, Tao Chen, Wanli Ouyang, Fenghua Ling, Lei Bai

Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on a number of core scientific tasks of climate change, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a human junior researcher’s workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth system research in an era of accelerating global change. The system is accessible at our website https://earthlink.intern-ai.org.cn.


Article 1145

Title@2025-07-24 (4): Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

Title: Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

Unüberwachtes Konzept Drift Erkennung von Deep-Learning-Darstellungen in Echtzeit

从实时深层学习代表中检测出 2406.17813v2

Authors (4): Salvatore Greco, Bartolomeo Vacchetti, Daniele Apiletti, Tania Cerquitelli

Concept drift is the phenomenon in which the underlying data distributions and statistical properties of a target domain change over time, leading to a degradation in model performance. Consequently, production models require continuous drift detection monitoring. Most drift detection methods to date are supervised, relying on ground-truth labels. However, they are inapplicable in many real-world scenarios, as true labels are often unavailable. Although recent efforts have proposed unsupervised drift detectors, many lack the accuracy required for reliable detection or are too computationally intensive for real-time use in high-dimensional, large-scale production environments. Moreover, they often fail to characterize or explain drift effectively. To address these limitations, we propose \textsc{DriftLens}, an unsupervised framework for real-time concept drift detection and characterization. Designed for deep learning classifiers handling unstructured data, \textsc{DriftLens} leverages distribution distances in deep learning representations to enable efficient and accurate detection. Additionally, it characterizes drift by analyzing and explaining its impact on each label. Our evaluation across classifiers and data-types demonstrates that \textsc{DriftLens} (i) outperforms previous methods in detecting drift in 15/17 use cases; (ii) runs at least 5 times faster; (iii) produces drift curves that align closely with actual drift (correlation $\geq 0.85$); (iv) effectively identifies representative drift samples as explanations.
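A distribution distance over embeddings can be sketched as follows. The abstract does not name the distance, so the Fréchet distance between Gaussians fitted to two embedding windows is an assumption of this example, as are the function name `frechet_distance` and the synthetic data.

```python
import numpy as np

def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (n_samples, dim).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (clipped to guard against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Synthetic embeddings (hypothetical): a reference window, a window from the
# same distribution, and a window whose mean has shifted.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 4))
no_drift = rng.normal(size=(500, 4))
drifted = rng.normal(loc=2.0, size=(500, 4))
d_same = frechet_distance(reference, no_drift)
d_shift = frechet_distance(reference, drifted)
```

Computing the distance separately on the embeddings predicted as each label would give a per-label view of drift, in the spirit of the characterization step the abstract mentions. A detection threshold would still have to be calibrated on drift-free data.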


Article 1146

Title@2025-07-24 (4): Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks

Title: Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks

Treue, dolmetschbare Röntgendiagnose im Brustkorb mit Anti-Aliased-B-Cos-Netzwerken

真实的、可解释的胸透透透透透透透透透透透透透透透析与反闭合的B子网络的诊断 2507.16761v2

Authors (3): Marcel Kleinmann, Shashank Agnihotri, Margret Keuper

Faithfulness and interpretability are essential for deploying deep neural networks (DNNs) in safety-critical domains such as medical imaging. B-cos networks offer a promising solution by replacing standard linear layers with a weight-input alignment mechanism, producing inherently interpretable, class-specific explanations without post-hoc methods. While maintaining diagnostic performance competitive with state-of-the-art DNNs, standard B-cos models suffer from severe aliasing artifacts in their explanation maps, making them unsuitable for clinical use where clarity is essential. In this work, we address these limitations by introducing anti-aliasing strategies using FLCPooling (FLC) and BlurPool (BP) to significantly improve explanation quality. Our experiments on chest X-ray datasets demonstrate that the modified $\text{B-cos}_\text{FLC}$ and $\text{B-cos}_\text{BP}$ preserve strong predictive performance while providing faithful and artifact-free explanations suitable for clinical application in multi-class and multi-label settings. Code is available at https://github.com/mkleinma/B-cos-medical-paper.


Article 1147

Title@2025-07-24 (4): DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts

Title: DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts

DriftMoE: Eine Mischung aus Experten Ansatz zum Umgang mit Konzept Drifts

DriftMoE:处理 “ 漂流概念 “ 的混合专家办法 2507.18464v1

Authors (4): Miguel Aspis, Sebastián A. Cajas Ordónez, Andrés L. Suárez-Cetrulo, Ricardo Simón Carbajo

Learning from non-stationary data streams subject to concept drift requires models that can adapt on-the-fly while remaining resource-efficient. Existing adaptive ensemble methods often rely on coarse-grained adaptation mechanisms or simple voting schemes that fail to optimally leverage specialized knowledge. This paper introduces DriftMoE, an online Mixture-of-Experts (MoE) architecture that addresses these limitations through a novel co-training framework. DriftMoE features a compact neural router that is co-trained alongside a pool of incremental Hoeffding tree experts. The key innovation lies in a symbiotic learning loop that enables expert specialization: the router selects the most suitable expert for prediction, the relevant experts update incrementally with the true label, and the router refines its parameters using a multi-hot correctness mask that reinforces every accurate expert. This feedback loop provides the router with a clear training signal while accelerating expert specialization. We evaluate DriftMoE’s performance across nine state-of-the-art data stream learning benchmarks spanning abrupt, gradual, and real-world drifts testing two distinct configurations: one where experts specialize on data regimes (multi-class variant), and another where they focus on single-class specialization (task-based variant). Our results demonstrate that DriftMoE achieves competitive results with state-of-the-art stream learning adaptive ensembles, offering a principled and efficient approach to concept drift adaptation. All code, data pipelines, and reproducibility scripts are available in our public GitHub repository: https://github.com/miguel-ceadar/drift-moe.
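The router side of the learning loop described above can be sketched in a few lines. Several details are assumptions made for illustration: the router is reduced to a single linear layer with per-expert sigmoid outputs, it is trained with binary cross-entropy against the multi-hot mask, and the names `correctness_mask` and `router_step` are hypothetical. The incremental Hoeffding tree experts and their updates are not modeled here.

```python
import numpy as np

def correctness_mask(expert_preds, y_true):
    """Multi-hot mask: 1.0 for every expert whose prediction matches the label."""
    return (np.asarray(expert_preds) == y_true).astype(float)

def router_step(weights, x, mask, lr=0.1):
    """One SGD step for a linear router under binary cross-entropy.

    weights: shape (n_experts, n_features); x: shape (n_features,).
    """
    logits = weights @ x
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Gradient of the summed BCE loss w.r.t. the logits is (probs - mask).
    grad = np.outer(probs - mask, x)
    return weights - lr * grad

# Hypothetical stream item: 3 experts, 2 features, true label 1.
weights = np.zeros((3, 2))
x = np.array([1.0, 2.0])
mask = correctness_mask([0, 1, 1], y_true=1)   # experts 1 and 2 were right
weights = router_step(weights, x, mask)
new_logits = weights @ x
```

After the step, the router's logits for the two correct experts rise and the logit for the incorrect expert falls, so every accurate expert is reinforced rather than only the one that was selected.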


Article 1148

Title@2025-07-24 (4): Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language

Title: Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language

Wiederherstellung des Rhythmus: Pünktlichkeitsrestaurierung mit Transformer-Modellen für Bangla, eine Sprache mit geringer Ressource

恢复时速:使用孟加拉国低资源语言 “ 孟加拉 “ 变压器模型恢复脉冲 2507.18448v1

Authors (4): Md Obyedullahil Mamun, Md Adyelullahil Mamun, Arif Ahmad, Md. Imran Hossain Emu

Punctuation restoration enhances the readability of text and is critical for post-processing tasks in Automatic Speech Recognition (ASR), especially for low-resource languages like Bangla. In this study, we explore the application of transformer-based models, specifically XLM-RoBERTa-large, to automatically restore punctuation in unpunctuated Bangla text. We focus on predicting four punctuation marks: period, comma, question mark, and exclamation mark across diverse text domains. To address the scarcity of annotated resources, we constructed a large, varied training corpus and applied data augmentation techniques. Our best-performing model, trained with an augmentation factor of alpha = 0.20%, achieves an accuracy of 97.1% on the News test set, 91.2% on the Reference set, and 90.2% on the ASR set. Results show strong generalization to reference and ASR transcripts, demonstrating the model’s effectiveness in real-world, noisy scenarios. This work establishes a strong baseline for Bangla punctuation restoration and contributes publicly available datasets and code to support future research in low-resource NLP.


Article 1149

Title@2025-07-24 (4): Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Title: Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Ergebnisbasiertes Online-Verstärkungslernen: Algorithmen und grundlegende Grenzen

基于成果的在线强化学习:等级和基本限制 2505.20268v2

Authors (4): Fan Chen, Zeyu Jia, Alexander Rakhlin, Tengyang Xie

Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\widetilde{O}({C_{\rm cov} H^3}/{\epsilon^2})$ sample complexity, where $C_{\rm cov}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential separation for certain MDPs. For deterministic MDPs, we show how to eliminate the completeness assumption, dramatically simplifying the algorithm. We further extend our approach to preference-based feedback settings, proving that equivalent statistical efficiency can be achieved even under more limited information. Together, these results constitute a theoretical foundation for understanding the statistical properties of outcome-based reinforcement learning.


Article 1150

Title@2025-07-24 (4): IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation

Title: IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation

IPCGRL: Sprachgestütztes Verstärkungslernen für die verfahrenstechnische Level-Generierung

IPCGRL: 程序生成阶段语言教学强化学习 2503.12358v4

Authors (5): In-Chang Baek, Sung-Hyun Kim, Seo-Young Lee, Dong-Hyeon Kim, Kyung-Joong Kim

Recent research has highlighted the significance of natural language in enhancing the controllability of generative models. While various efforts have been made to leverage natural language for content generation, research on deep reinforcement learning (DRL) agents utilizing text-based instructions for procedural content generation remains limited. In this paper, we propose IPCGRL, an instruction-based procedural content generation method via reinforcement learning, which incorporates a sentence embedding model. IPCGRL fine-tunes task-specific embedding representations to effectively compress game-level conditions. We evaluate IPCGRL in a two-dimensional level generation task and compare its performance with a general-purpose embedding method. The results indicate that IPCGRL achieves up to a 21.4% improvement in controllability and a 17.2% improvement in generalizability for unseen instructions. Furthermore, the proposed method extends the modality of conditional input, enabling a more flexible and expressive interaction framework for procedural content generation.


Article 1151

Title@2025-07-24 (4): NLML-HPE: Head Pose Estimation with Limited Data via Manifold Learning

Title: NLML-HPE: Head Pose Estimation with Limited Data via Manifold Learning

NLML-HPE: Kopfhosenschätzung mit begrenzten Daten über Manifold Learning

NLML-HPE:通过人工学习用有限数据进行测算的负责人 2507.18429v1

Authors (2): Mahdi Ghafourian, Federico M. Sukno

Head pose estimation (HPE) plays a critical role in various computer vision applications such as human-computer interaction and facial recognition. In this paper, we propose a novel deep learning approach for head pose estimation with limited training data via non-linear manifold learning called NLML-HPE. This method is based on the combination of tensor decomposition (i.e., Tucker decomposition) and feed forward neural networks. Unlike traditional classification-based approaches, our method formulates head pose estimation as a regression problem, mapping input landmarks into a continuous representation of pose angles. To this end, our method uses tensor decomposition to split each Euler angle (yaw, pitch, roll) into separate subspaces and models each dimension of the underlying manifold as a cosine curve. We address two key challenges: 1. Almost all HPE datasets suffer from incorrect and inaccurate pose annotations. Hence, we generated a precise and consistent 2D head pose dataset for our training set by rotating 3D head models for a fixed set of poses and rendering the corresponding 2D images. 2. We achieved real-time performance with limited training data as our method accurately captures the nature of rotation of an object from facial landmarks. Once the underlying manifold for rotation around each axis is learned, the model is very fast in predicting unseen data. Our training and testing code is available online along with our trained models: https://github.com/MahdiGhafoorian/NLML_HPE.


Article 1152

Title@2025-07-24 (4): How do language models learn facts? Dynamics, curricula and hallucinations

Title: How do language models learn facts? Dynamics, curricula and hallucinations

Wie lernen Sprachmodelle Fakten? Dynamik, Lehrpläne und Halluzinationen

语言模式如何了解事实?动态、课程和幻觉 2503.21676v2

Authors (6): Nicolas Zucchet, Jörg Bornschein, Stephanie Chan, Andrew Lampinen, Razvan Pascanu, Soham De

Large language models accumulate vast knowledge during pre-training, yet the dynamics governing this acquisition remain poorly understood. This work investigates the learning dynamics of language models on a synthetic factual recall task, uncovering three key findings: First, language models learn in three phases, exhibiting a performance plateau before acquiring precise factual knowledge. Mechanistically, this plateau coincides with the formation of attention-based circuits that support recall. Second, the training data distribution significantly impacts learning dynamics, as imbalanced distributions lead to shorter plateaus. Finally, hallucinations emerge simultaneously with knowledge, and integrating new knowledge into the model through fine-tuning is challenging, as it quickly corrupts its existing parametric memories. Our results emphasize the importance of data distribution in knowledge acquisition and suggest novel data scheduling strategies to accelerate neural network training.


Article 1153

Title@2025-07-24 (4): Multi-Model Ensemble and Reservoir Computing for River Discharge Prediction in Ungauged Basins

Title: Multi-Model Ensemble and Reservoir Computing for River Discharge Prediction in Ungauged Basins

Multi-Model-Ensemble und Reservoir Computing für Flussentladungsvorhersage in ungespurten Becken

多模型组合和储量计算,用于未排出盆地的河流排泄预测 2507.18423v1

Authors (2): Mizuki Funato, Yohei Sawada

Despite the critical need for accurate flood prediction and water management, many regions lack sufficient river discharge observations, limiting the skill of rainfall-runoff analyses. Although numerous physically based and machine learning models exist, achieving high accuracy, interpretability, and computational efficiency under data-scarce conditions remains a major challenge. We address this challenge with a novel method, HYdrological Prediction with multi-model Ensemble and Reservoir computing (HYPER) that leverages multi-model ensemble and reservoir computing (RC). Our approach first applies Bayesian model averaging (BMA) to 43 “uncalibrated” catchment-based conceptual hydrological models. An RC model is then trained via linear regression to correct errors in the BMA output, a non-iterative process that ensures high computational efficiency. For ungauged basins, we infer the required BMA and RC weights by linking them to catchment attributes from gauged basins, creating a generalizable framework. We evaluated HYPER using data from 87 river basins in Japan. In a data-rich scenario, HYPER (median Kling-Gupta Efficiency, KGE, of 0.56) performed comparably to a benchmark LSTM (KGE 0.55) but required only 5% of its computational time. In a data-scarce scenario (23% of basins gauged), HYPER maintained robust performance (KGE 0.55) and lower uncertainty, whereas the LSTM’s performance degraded significantly (KGE -0.04). These results reveal that individual conceptual hydrological models do not necessarily need to be calibrated when an effectively large ensemble is assembled and combined with machine-learning-based bias correction. HYPER provides a robust, efficient, and generalizable solution for discharge prediction, particularly in ungauged basins, making it applicable to a wide range of regions.
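The abstract reports skill as the Kling-Gupta Efficiency (KGE). The sketch below computes the widely used 2009 formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation, alpha the ratio of standard deviations, and beta the ratio of means between simulated and observed discharge. Whether the paper uses this exact variant is an assumption, and the function name and sample values are hypothetical.

```python
import math


def kling_gupta_efficiency(simulated, observed):
    """KGE in the 2009 form; 1.0 is a perfect match, lower is worse."""
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_s = sum(simulated) / n
    mean_o = sum(observed) / n
    # Population standard deviations and covariance.
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in simulated) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    if std_s == 0.0 or std_o == 0.0 or mean_o == 0.0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(simulated, observed)) / n
    r = cov / (std_s * std_o)   # correlation term
    alpha = std_s / std_o       # variability ratio
    beta = mean_s / mean_o      # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical discharge values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
print(kling_gupta_efficiency([1.1, 1.9, 3.2, 3.8], observed))
```

Because KGE has no lower bound, a negative value such as the LSTM's reported -0.04 means the simulation is far from the observations in at least one of the three terms.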


Article 1154

Title@2025-07-24 (4): Residual Prior-driven Frequency-aware Network for Image Fusion

Title: Residual Prior-driven Frequency-aware Network for Image Fusion

Residual Prior-driven Frequency-aware Netzwerk für Bild-Fusion

图像融合超前驱动频率感知网络 2507.06735v2

Authors (5): Guan Zheng, Xue Wang, Wenhua Qian, Peng Liu, Runzhuo Ma

Image fusion aims to integrate complementary information across modalities to generate high-quality fused images, thereby enhancing the performance of high-level vision tasks. While global spatial modeling mechanisms show promising results, constructing long-range feature dependencies in the spatial domain incurs substantial computational costs. Additionally, the absence of ground-truth exacerbates the difficulty of capturing complementary features effectively. To tackle these challenges, we propose a Residual Prior-driven Frequency-aware Network, termed as RPFNet. Specifically, RPFNet employs a dual-branch feature extraction framework: the Residual Prior Module (RPM) extracts modality-specific difference information from residual maps, thereby providing complementary priors for fusion; the Frequency Domain Fusion Module (FDFM) achieves efficient global feature modeling and integration through frequency-domain convolution. Additionally, the Cross Promotion Module (CPM) enhances the synergistic perception of local details and global structures through bidirectional feature interaction. During training, we incorporate an auxiliary decoder and saliency structure loss to strengthen the model’s sensitivity to modality-specific differences. Furthermore, a combination of adaptive weight-based frequency contrastive loss and SSIM loss effectively constrains the solution space, facilitating the joint capture of local details and global features while ensuring the retention of complementary information. Extensive experiments validate the fusion performance of RPFNet, which effectively integrates discriminative features, enhances texture details and salient objects, and can effectively facilitate the deployment of the high-level vision task.


Article 1155

Title@2025-07-24 (4): FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs

Title: FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs

FinDPO: Finanz-Sentiment-Analyse für algorithmischen Handel durch Preference-Optimierung von LLMs

FinDPO:通过优惠优化LLMs,分析通过高利贷交易的金融敏感度 2507.18417v1

Authors (3): Giorgos Iacovides, Wuyang Zhou, Danilo Mandic

Opinions expressed in online finance-related textual data are having an increasingly profound impact on trading decisions and market movements. This trend highlights the vital role of sentiment analysis as a tool for quantifying the nature and strength of such opinions. With the rapid development of Generative AI (GenAI), supervised fine-tuned (SFT) large language models (LLMs) have become the de facto standard for financial sentiment analysis. However, the SFT paradigm can lead to memorization of the training data and often fails to generalize to unseen samples. This is a critical limitation in financial domains, where models must adapt to previously unobserved events and the nuanced, domain-specific language of finance. To this end, we introduce FinDPO, the first finance-specific LLM framework based on post-training human preference alignment via Direct Preference Optimization (DPO). The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks, outperforming existing supervised fine-tuned models by 11% on the average. Uniquely, the FinDPO framework enables the integration of a fine-tuned causal LLM into realistic portfolio strategies through a novel ‘logit-to-score’ conversion, which transforms discrete sentiment predictions into continuous, rankable sentiment scores (probabilities). In this way, simulations demonstrate that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance, as indicated by a Sharpe ratio of 2.0, even under realistic transaction costs of 5 basis points (bps).
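For readers unfamiliar with Direct Preference Optimization, the per-pair objective can be sketched as below. This is the standard DPO loss for one (chosen, rejected) response pair, written with scalar sequence log-probabilities; it is not FinDPO's training code, and the argument names and the default `beta=0.1` are assumptions made for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    loss = -log(sigmoid(beta * margin)), where the margin is how much more
    the policy prefers the chosen response than the frozen reference does.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0.0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical log-probabilities: the policy already favors the chosen label.
print(dpo_loss(-2.0, -6.0, ref_chosen_logp=-3.0, ref_rejected_logp=-4.0))
```

When the policy equals the reference model the margin is zero and the loss is log 2; it falls as the policy raises the chosen response relative to the rejected one.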


Article 1156

Title@2025-07-24 (4): Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows

Title: Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows

Iwin Transformer: Hierarchische Vision Transformer mit Interleaved Windows

Iwin 变换器: 使用内部视窗的等级愿景变换器 2507.18405v1

Authors (2): Simin Huo, Ning Li

We introduce Iwin Transformer, a novel position-embedding-free hierarchical vision transformer, which can be fine-tuned directly from low to high resolution, through the collaboration of innovative interleaved window attention and depthwise separable convolution. This approach uses attention to connect distant tokens and applies convolution to link neighboring tokens, enabling global information exchange within a single module, overcoming Swin Transformer’s limitation of requiring two consecutive blocks to approximate global attention. Extensive experiments on visual benchmarks demonstrate that Iwin Transformer exhibits strong competitiveness in tasks such as image classification (87.4 top-1 accuracy on ImageNet-1K), semantic segmentation and video action recognition. We also validate the effectiveness of the core component in Iwin as a standalone module that can seamlessly replace the self-attention module in class-conditional image generation. The concepts and methods introduced by the Iwin Transformer have the potential to inspire future research, like Iwin 3D Attention in video generation. The code and models are available at https://github.com/cominder/Iwin-Transformer.


Article 1157

Title@2025-07-24 (4): CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

Title: CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

CLEAR: Fehleranalyse über LLM-as-a-Judge leicht gemacht

CLEAR:通过LLM-as-a法官进行错误分析 2507.18392v1

Authors (5): Asaf Yehudai, Lilach Eden, Yotam Perlitz, Roy Bar-Haim, Michal Shmueli-Scheuer

The evaluation of Large Language Models (LLMs) increasingly relies on other LLMs acting as judges. However, current evaluation paradigms typically yield a single score or ranking, answering which model is better but not why. While essential for benchmarking, these top-level scores obscure the specific, actionable reasons behind a model’s performance. To bridge this gap, we introduce CLEAR, an interactive, open-source package for LLM-based error analysis. CLEAR first generates per-instance textual feedback, then creates a set of system-level error issues, and finally quantifies the prevalence of each identified issue. Our package also provides an interactive dashboard that supports comprehensive error analysis: it presents aggregate visualizations, applies interactive filters to isolate specific issues or score ranges, and drills down to the individual instances that exemplify a particular behavioral pattern. We demonstrate CLEAR analysis for RAG and Math benchmarks, and showcase its utility through a user case study.


Article 1158

Title@2025-07-24 (4): A comparison of stretched-grid and limited-area modelling for data-driven regional weather forecasting

Title: A comparison of stretched-grid and limited-area modelling for data-driven regional weather forecasting

Ein Vergleich der Modelle für datengesteuerte regionale Wettervorhersagen mit ausgedehntem Grid und begrenzten Flächen

数据驱动区域天气预报的用数据驱动的区域气象预报的拉累电网和有限区域模拟模型的比较 2507.18378v1

Authors (6): Jasper S. Wijnands, Michiel Van Ginderachter, Bastien François, Sophie Buurman, Piet Termonia, Dieter Van den Bleeken

Regional machine learning weather prediction (MLWP) models based on graph neural networks have recently demonstrated remarkable predictive accuracy, outperforming numerical weather prediction models at lower computational costs. In particular, limited-area model (LAM) and stretched-grid model (SGM) approaches have emerged for generating high-resolution regional forecasts, based on initial conditions from a regional (re)analysis. While LAM uses lateral boundaries from an external global model, SGM incorporates a global domain at lower resolution. This study aims to understand how the differences in model design impact relative performance and potential applications. Specifically, the strengths and weaknesses of these two approaches are identified for generating deterministic regional forecasts over Europe. Using the Anemoi framework, models of both types are built by minimally adapting a shared architecture and trained using global and regional reanalyses in a near-identical setup. Several inference experiments have been conducted to explore their relative performance and highlight key differences. Results show that both LAM and SGM are competitive deterministic MLWP models with generally accurate and comparable forecasting performance over the regional domain. Various differences were identified in the performance of the models across applications. LAM is able to successfully exploit high-quality boundary forcings to make predictions within the regional domain and is suitable in contexts where global data is difficult to acquire. SGM is fully self-contained for easier operationalisation, can take advantage of more training data and significantly surpasses LAM in terms of (temporal) generalisability. Our paper can serve as a starting point for meteorological institutes to guide their choice between LAM and SGM in developing an operational data-driven forecasting system.


Article 1159

Title@2025-07-24 (4): On Reconstructing Training Data From Bayesian Posteriors and Trained Models

Title: On Reconstructing Training Data From Bayesian Posteriors and Trained Models

Über die Wiederherstellung von Trainingsdaten aus Bayesischen Nachbildungen und ausgebildeten Modellen

Bayesian Posides和经过培训的模型的培训数据重建 2507.18372v1

Authors (1): George Wynne

Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem, characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalence, and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models; the former is a first in the literature.


Article 1160

Title@2025-07-24 (4): Efficient Uncertainty in LLMs through Evidential Knowledge Distillation

Title: Efficient Uncertainty in LLMs through Evidential Knowledge Distillation

Effiziente Unsicherheit in LLMs durch Evidential Knowledge Destillation

通过证据知识蒸馏在LLMs中提高效能的不确定性 2507.18366v1

Authors (3): Lakshmana Sri Harsha Nemani, P. K. Srijith, Tomasz Kuśmierczyk

Accurate uncertainty quantification remains a key challenge for standard LLMs, prompting the adoption of Bayesian and ensemble-based methods. However, such methods typically necessitate computationally expensive sampling, involving multiple forward passes to effectively estimate predictive uncertainty. In this paper, we introduce a novel approach enabling efficient and effective uncertainty estimation in LLMs without sacrificing performance. Specifically, we distill uncertainty-aware teacher models - originally requiring multiple forward passes - into compact student models sharing the same architecture but fine-tuned using Low-Rank Adaptation (LoRA). We compare two distinct distillation strategies: one in which the student employs traditional softmax-based outputs, and another in which the student leverages Dirichlet-distributed outputs to explicitly model epistemic uncertainty via evidential learning. Empirical evaluations on classification datasets demonstrate that such students can achieve comparable or superior predictive and uncertainty quantification performance relative to their teacher models, while critically requiring only a single forward pass. To our knowledge, this is the first demonstration that immediate and robust uncertainty quantification can be achieved in LLMs through evidential distillation.
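The single-pass uncertainty readout from Dirichlet-distributed outputs can be sketched as below. It follows the common evidential deep learning recipe (non-negative evidence per class, Dirichlet concentration = evidence + 1, uncertainty mass = K / sum of concentrations); using softplus to produce evidence and the function names are assumptions, since the abstract does not give the paper's exact parameterization.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)); always non-negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dirichlet_readout(logits):
    """Return (expected class probabilities, uncertainty mass) from one forward pass.

    Assumed parameterization: evidence_k = softplus(logit_k),
    alpha_k = evidence_k + 1, uncertainty = K / sum(alpha).
    """
    alpha = [softplus(z) + 1.0 for z in logits]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = len(alpha) / strength
    return probs, uncertainty


# Hypothetical student logits: strong evidence for class 0 versus almost none.
print(dirichlet_readout([10.0, -10.0, -10.0]))
print(dirichlet_readout([-10.0, -10.0, -10.0]))
```

With little evidence for any class the uncertainty mass approaches 1 and the expected probabilities approach uniform, which is the signal a softmax-only student cannot express directly.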


Article 1161

Title@2025-07-24 (4): Leveraging the Structure of Medical Data for Improved Representation Learning

Title: Leveraging the Structure of Medical Data for Improved Representation Learning

Nutzung der Struktur medizinischer Daten für ein verbessertes Repräsentationslernen

利用医疗数据结构改进代表性学习 2507.02987v3

Authors (10): Andrea Agostini, Sonia Laguna, Alain Ryser, Samuel Ruiperez-Campillo, Moritz Vandenhirtz, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt

Building generalizable medical AI systems requires pretraining strategies that are data-efficient and domain-aware. Unlike internet-scale corpora, clinical datasets such as MIMIC-CXR offer limited image counts and scarce annotations, but exhibit rich internal structure through multi-view imaging. We propose a self-supervised framework that leverages the inherent structure of medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and lateral views) as natural positive pairs, learning to reconstruct each view from sparse patches while aligning their latent embeddings. Our method requires no textual supervision and produces informative representations. Evaluated on MIMIC-CXR, we show strong performance compared to supervised objectives and baselines trained without leveraging structure. This work provides a lightweight, modality-agnostic blueprint for domain-specific pretraining where data is structured but scarce.


Article 1162

Title@2025-07-24 (4): Latent Space Alignment for AI-Native MIMO Semantic Communications

Title: Latent Space Alignment for AI-Native MIMO Semantic Communications

Latent Space Alignment für KI-Native MIMO Semantische Kommunikation

用于AI-Native MIMO语义通信的远程空间对齐 2507.16680v2

Authors (4): Mario Edoardo Pandolfo, Simone Fiorellino, Emilio Calvanese Strinati, Paolo Di Lorenzo

Semantic communications focus on prioritizing the understanding of the meaning behind transmitted data and ensuring the successful completion of tasks that motivate the exchange of information. However, when devices rely on different languages, logic, or internal representations, semantic mismatches may occur, potentially hindering mutual understanding. This paper introduces a novel approach to addressing latent space misalignment in semantic communications, exploiting multiple-input multiple-output (MIMO) communications. Specifically, our method learns a MIMO precoder/decoder pair that jointly performs latent space compression and semantic channel equalization, mitigating both semantic mismatches and physical channel impairments. We explore two solutions: (i) a linear model, optimized by solving a biconvex optimization problem via the alternating direction method of multipliers (ADMM); (ii) a neural network-based model, which learns semantic MIMO precoder/decoder under transmission power budget and complexity constraints. Numerical results demonstrate the effectiveness of the proposed approach in a goal-oriented semantic communication scenario, illustrating the main trade-offs between accuracy, communication burden, and complexity of the solutions.


Article 1163

Title@2025-07-24 (4): Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation

Title: Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation

Tiny ist nicht klein genug: Hochwertige, ressourcenarme Gesichtsanimationsmodelle durch Hybrid-Wissensdestillation

微小不够小:通过混合知识蒸馏,建立高质量、资源低的面部动画模型。 2507.18352v1

Authors (4): Zhen Han, Mattias Teye, Derek Yadgaroff, Judith Bütepage

The training of high-quality, robust machine learning models for speech-driven 3D facial animation requires a large, diverse dataset of high-quality audio-animation pairs. To overcome the lack of such a dataset, recent work has introduced large pre-trained speech encoders that are robust to variations in the input audio and, therefore, enable the facial animation model to generalize across speakers, audio quality, and languages. However, the resulting facial animation models are prohibitively large and lend themselves only to offline inference on a dedicated machine. In this work, we explore on-device, real-time facial animation models in the context of game development. We overcome the lack of large datasets by using hybrid knowledge distillation with pseudo-labeling. Given a large audio dataset, we employ a high-performing teacher model to train very small student models. In contrast to the pre-trained speech encoders, our student models only consist of convolutional and fully-connected layers, removing the need for attention context or recurrent updates. In our experiments, we demonstrate that we can reduce the memory footprint to up to 3.4 MB and required future audio context to up to 81 ms while maintaining high-quality animations. This paves the way for on-device inference, an important step towards realistic, model-driven digital characters.


Article 1164

Title@2025-07-24 (4): Low-rank adaptive physics-informed HyperDeepONets for solving differential equations

Title: Low-rank adaptive physics-informed HyperDeepONets for solving differential equations

Low-rank adaptive Physik-informiert HyperDeepONets zur Lösung von Differentialgleichungen

用于解决差别方程的低级别适应性物理知情高超深电联 2507.18346v1

Authors (3): Etienne Zeudong, Elsa Cardoso-Bihlo, Alex Bihlo

HyperDeepONets were introduced in Lee, Cho and Hwang [ICLR, 2023] as an alternative architecture for operator learning, in which a hypernetwork generates the weights for the trunk net of a DeepONet. While this improves expressivity, it incurs high memory and computational costs due to the large number of output parameters required. In this work we introduce, in the physics-informed machine learning setting, a variation, PI-LoRA-HyperDeepONets, which leverage low-rank adaptation (LoRA) to reduce complexity by decomposing the hypernetwork’s output layer weight matrix into two smaller low-rank matrices. This reduces the number of trainable parameters while introducing an extra regularization of the trunk networks’ weights. Through extensive experiments on both ordinary and partial differential equations we show that PI-LoRA-HyperDeepONets achieve up to 70% reduction in parameters and consistently outperform regular HyperDeepONets in terms of predictive accuracy and generalization.
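The parameter saving from factorizing the hypernetwork's output layer can be illustrated with simple arithmetic. A dense output layer mapping a hidden width `d_hidden` to `n_trunk` trunk-network weights stores `d_hidden * n_trunk` entries, while a rank-`r` factorization stores `r * (d_hidden + n_trunk)`. The sizes below are hypothetical and are not taken from the paper; biases are ignored.

```python
def dense_params(d_hidden, n_trunk):
    """Entries in a full output-layer weight matrix of shape (n_trunk, d_hidden)."""
    return d_hidden * n_trunk


def low_rank_params(d_hidden, n_trunk, rank):
    """Entries in two factors of shapes (n_trunk, rank) and (rank, d_hidden)."""
    return rank * (d_hidden + n_trunk)


def layer_reduction(d_hidden, n_trunk, rank):
    """Fraction of output-layer weights removed by the low-rank factorization."""
    return 1.0 - low_rank_params(d_hidden, n_trunk, rank) / dense_params(d_hidden, n_trunk)


# Hypothetical sizes: 64 hidden units generating 2000 trunk weights at rank 8.
print(dense_params(64, 2000), low_rank_params(64, 2000, 8))
print(layer_reduction(64, 2000, 8))
```

This counts only the factorized layer, so the saving for a whole model is smaller than the per-layer figure. The factorization also stops paying off once the rank reaches `d_hidden * n_trunk / (d_hidden + n_trunk)`.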


Article 1165

Title@2025-07-24 (4): Remembering the Markov Property in Cooperative MARL

Title: Remembering the Markov Property in Cooperative MARL

Erinnerung an das Markov-Grundstück in der Genossenschaft MARL

记得马尔科夫在MARL合作社中的财产 2507.18333v1

Authors (5): Kale-ab Abebe Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey

Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents’ behaviour. In practice, current model-free MARL algorithms use simple recurrent function approximators to address the challenge of reasoning about others using partial information. In this position paper, we argue that the empirical success of these methods is not due to effective Markov signal recovery, but rather to learning simple conventions that bypass environment observations and memory. Through a targeted case study, we show that co-adapting agents can learn brittle conventions, which then fail when partnered with non-adaptive agents. Crucially, the same models can learn grounded policies when the task design necessitates it, revealing that the issue is not a fundamental limitation of the learning models but a failure of the benchmark design. Our analysis also suggests that modern MARL environments may not adequately test the core assumptions of Dec-POMDPs. We therefore advocate for new cooperative environments built upon two core principles: (1) behaviours grounded in observations and (2) memory-based reasoning about other agents, ensuring success requires genuine skill rather than fragile, co-adapted agreements.


Article 1166

Title@2025-07-24 (4): Hierarchical Dimensionless Learning (Hi-π): A physics-data hybrid-driven approach for discovering dimensionless parameter combinations

Title: Hierarchical Dimensionless Learning (Hi-π): A physics-data hybrid-driven approach for discovering dimensionless parameter combinations

Hierarchisches dimensionsloses Lernen (Hi-π): Ein physik-data-hybridgetriebener Ansatz zur Entdeckung dimensionsloser Parameterkombinationen

高层次无尺寸学习(Hi-π):物理学-数据混合驱动的发现无尺寸参数组合的物理-数据混合法 2507.18332v1

Authors (3): Mingkun Xia, Haitao Lin, Weiwei Zhang

Dimensional analysis provides a universal framework for reducing physical complexity and revealing inherent laws. However, its application to high-dimensional systems still generates redundant dimensionless parameters, making it challenging to establish physically meaningful descriptions. Here, we introduce Hierarchical Dimensionless Learning (Hi-π), a physics-data hybrid-driven method that combines dimensional analysis and symbolic regression to automatically discover key dimensionless parameter combination(s). We applied this method to classic examples in various research fields of fluid mechanics. For Rayleigh-Bénard convection, the method accurately extracted two intrinsic dimensionless parameters: the Rayleigh number and the Prandtl number, validating its unified representation advantage across multiscale data. For viscous flows in a circular pipe, the method automatically discovered two optimal dimensionless parameters: the Reynolds number and relative roughness, achieving a balance between accuracy and complexity. For the compressibility correction in subsonic flow, the method effectively extracted the classic compressibility correction formulation, while demonstrating its capability to discover hierarchical structural expressions through optimal parameter transformations.


Article 1167

Title@2025-07-24 (4): Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

Title: Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

Position: Eine empirisch begründete Identifizierbarkeitstheorie beschleunigt die selbstüberwachte Lernforschung

职位: 以活性基础的可识别性理论将加速自我监督学习研究 2504.13101v3

Authors (4): Patrik Reizinger, Randall Balestriero, David Klindt, Wieland Brendel

Self-Supervised Learning (SSL) powers many current AI systems. As research interest and investment grow, the SSL design space continues to expand. The Platonic view of SSL, following the Platonic Representation Hypothesis (PRH), suggests that despite different methods and engineering approaches, all representations converge to the same Platonic ideal. However, this phenomenon lacks precise theoretical explanation. By synthesizing evidence from Identifiability Theory (IT), we show that the PRH can emerge in SSL. However, current IT cannot explain SSL’s empirical success. To bridge the gap between theory and practice, we propose expanding IT into what we term Singular Identifiability Theory (SITh), a broader theoretical framework encompassing the entire SSL pipeline. SITh would allow deeper insights into the implicit data assumptions in SSL and advance the field towards learning more interpretable and generalizable representations. We highlight three critical directions for future research: 1) training dynamics and convergence properties of SSL; 2) the impact of finite samples, batch size, and data diversity; and 3) the role of inductive biases in architecture, augmentations, initialization schemes, and optimizers.

Article 1168

Title@2025-07-24 (4): A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation

Title: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation

Ein Multi-Dataset-Benchmark für semi-überwachte semantische Segmentierung in EKG-Delineation

ECG 划定中半超部分解的多数据集基准 2507.18323v1

Authors (4): Minje Park, Jeonghwa Lim, Taehyung Yu, Sunghoon Joo

Electrocardiogram (ECG) delineation, the segmentation of meaningful waveform features, is critical for clinical diagnosis. Despite recent advances using deep learning, progress has been limited by the scarcity of publicly available annotated datasets. Semi-supervised learning presents a promising solution by leveraging abundant unlabeled ECG data. In this study, we present the first systematic benchmark for semi-supervised semantic segmentation (SemiSeg) in ECG delineation. We curated and unified multiple public datasets, including previously underused sources, to support robust and diverse evaluation. We adopted five representative SemiSeg algorithms from computer vision, implemented them on two different architectures: the convolutional network and the transformer, and evaluated them in two different settings: in-domain and cross-domain. Additionally, we propose ECG-specific training configurations and augmentation strategies and introduce a standardized evaluation framework. Our results show that the transformer outperforms the convolutional network in semi-supervised ECG delineation. We anticipate that our benchmark will serve as a foundation for advancing semi-supervised ECG delineation methods and will facilitate further research in this domain.

Article 1169

Title@2025-07-24 (4): I-CEE: Tailoring Explanations of Image Classification Models to User Expertise

Title: I-CEE: Tailoring Explanations of Image Classification Models to User Expertise

I-CEE: Maßgeschneiderte Erläuterungen von Bildklassifikationsmodellen zur Benutzerexpertise

I-CEE:根据用户专门知识对图像分类模型的定制解释 2312.12102v3

Authors (4): Yao Rong, Peizhu Qian, Vaibhav Unhelkar, Enkelejda Kasneci

Effectively explaining decisions of black-box machine learning models is critical to responsible deployment of AI systems that rely on them. Recognizing their importance, the field of explainable AI (XAI) provides several techniques to generate these explanations. Yet, there is relatively little emphasis on the user (the explainee) in this growing body of work and most XAI techniques generate “one-size-fits-all” explanations. To bridge this gap and move a step closer to human-centered XAI, we present I-CEE, a framework that provides Image Classification Explanations tailored to User Expertise. Informed by existing work, I-CEE explains the decisions of image classification models by providing the user with an informative subset of training data (i.e., example images), corresponding local explanations, and model decisions. However, unlike prior work, I-CEE models the informativeness of the example images to depend on user expertise, resulting in different examples for different users. We posit that by tailoring the example set to user expertise, I-CEE can better facilitate users’ understanding and simulatability of the model. To evaluate our approach, we conduct detailed experiments both in simulation and with human participants (N = 100) on multiple datasets. Experiments with simulated users show that I-CEE improves users’ ability to accurately predict the model’s decisions (simulatability) compared to baselines, providing promising preliminary results. Experiments with human participants demonstrate that our method significantly improves user simulatability accuracy, highlighting the importance of human-centered XAI.

Article 1170

Title@2025-07-24 (4): State of Health Estimation of Batteries Using a Time-Informed Dynamic Sequence-Inverted Transformer

Title: State of Health Estimation of Batteries Using a Time-Informed Dynamic Sequence-Inverted Transformer

Zustand der Gesundheit Schätzung von Batterien mit einem zeitinformierten dynamischen Sequenz-invertierten Transformer

使用时间化动态序列反向转换器对电池进行健康状况估计 2507.18320v1

Authors (4): Janak M. Patel, Milad Ramezankhani, Anirudh Deodhar, Dagnachew Birru

The rapid adoption of battery-powered vehicles and energy storage systems over the past decade has made battery health monitoring increasingly critical. Batteries play a central role in the efficiency and safety of these systems, yet they inevitably degrade over time due to repeated charge-discharge cycles. This degradation leads to reduced energy efficiency and potential overheating, posing significant safety concerns. Accurate estimation of a battery’s State of Health (SoH) is therefore essential for ensuring operational reliability and safety. Several machine learning architectures, such as LSTMs, transformers, and encoder-based models, have been proposed to estimate SoH from discharge cycle data. However, these models struggle with the irregularities inherent in real-world measurements: discharge readings are often recorded at non-uniform intervals, and the lengths of discharge cycles vary significantly. To address this, most existing approaches extract features from the sequences rather than processing them in full, which introduces information loss and compromises accuracy. To overcome these challenges, we propose a novel architecture: Time-Informed Dynamic Sequence Inverted Transformer (TIDSIT). TIDSIT incorporates continuous time embeddings to effectively represent irregularly sampled data and utilizes padded sequences with temporal attention mechanisms to manage variable-length inputs without discarding sequence information. Experimental results on the NASA battery degradation dataset show that TIDSIT significantly outperforms existing models, achieving over 50% reduction in prediction error and maintaining an SoH prediction error below 0.58%. Furthermore, the architecture is generalizable and holds promise for broader applications in health monitoring tasks involving irregular time-series data.

Article 1171

Title@2025-07-24 (4): Regression-aware Continual Learning for Android Malware Detection

Title: Regression-aware Continual Learning for Android Malware Detection

Regressions-aware Continual Learning für Android Malware-Erkennung

Android Maware 探测 Android Maware 持续学习 2507.18313v1

Authors (9): Daniele Ghiani, Daniele Angioni, Giorgio Piras, Angelo Sotgiu, Luca Minnei, Srishti Gupta, Maura Pintor, Fabio Roli, Battista Biggio

Malware evolves rapidly, forcing machine learning (ML)-based detectors to adapt continuously. With antivirus vendors processing hundreds of thousands of new samples daily, datasets can grow to billions of examples, making full retraining impractical. Continual learning (CL) has emerged as a scalable alternative, enabling incremental updates without full data access while mitigating catastrophic forgetting. In this work, we analyze a critical yet overlooked issue in this context: security regression. Unlike forgetting, which manifests as a general performance drop on previously seen data, security regression captures harmful prediction changes at the sample level, such as a malware sample that was once correctly detected but evades detection after a model update. Although often overlooked, regressions pose serious risks in security-critical applications, as the silent reintroduction of previously detected threats in the system may undermine users’ trust in the whole updating process. To address this issue, we formalize and quantify security regression in CL-based malware detectors and propose a regression-aware penalty to mitigate it. Specifically, we adapt Positive Congruent Training (PCT) to the CL setting, preserving prior predictive behavior in a model-agnostic manner. Experiments on the ELSA, Tesseract, and AZ-Class datasets show that our method effectively reduces regression across different CL scenarios while maintaining strong detection performance over time.
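The sample-level notion of regression described above can be made concrete with a small sketch. The abstract does not give the paper's formal definition, so the function below is an assumption modelled on the "negative flip" idea from Positive Congruent Training: a sample counts as a regression when the old model classified it correctly and the updated model does not. The name `regression_rate` and the `malware_only` flag are hypothetical.

```python
import numpy as np


def regression_rate(y_true, old_pred, new_pred, malware_only=False):
    """Fraction of samples that the old model got right and the new model
    gets wrong (negative flips). With malware_only=True, the fraction is
    computed over malware samples only (label 1 is assumed to mean malware)."""
    y_true = np.asarray(y_true)
    old_pred = np.asarray(old_pred)
    new_pred = np.asarray(new_pred)

    flips = (old_pred == y_true) & (new_pred != y_true)
    if malware_only:
        flips = flips[y_true == 1]
    # Guard against an empty selection (e.g. no malware samples).
    return float(flips.mean()) if flips.size else 0.0


# Toy example: 1 = malware, 0 = goodware.
y_true = [1, 1, 0, 0]
old_pred = [1, 1, 0, 1]   # old model: one false positive
new_pred = [1, 0, 0, 0]   # new model: fixes it, but misses a known malware
overall = regression_rate(y_true, old_pred, new_pred)
on_malware = regression_rate(y_true, old_pred, new_pred, malware_only=True)
```

In this toy case both models have the same accuracy (3 of 4 correct), yet the update silently loses a previously detected malware sample. That is the kind of change an aggregate forgetting metric would hide.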

Article 1172

Title@2025-07-24 (4): GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction

Title: GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction

GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction

GNN-ACLP:基于模拟电路链接预测的图表神经网络 2504.10240v4

Authors (9): Guanyuan Pan, Tiansheng Zhou, Bingtao Ma, Yaqi Wang, Jianxiang Zhao, Zhi Li, Yugui Lin, Pietro Lio, Shuai Wang

Circuit link prediction, which identifies missing component connections from incomplete netlists, is crucial in analog circuit design automation. However, existing methods face three main challenges: 1) insufficient use of topological patterns in circuit graphs reduces prediction accuracy; 2) data scarcity due to the complexity of annotations hinders model generalization; 3) limited adaptability to various netlist formats. We propose GNN-ACLP, a graph neural network (GNN)-based method featuring three innovations to tackle these challenges. First, we introduce the SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction) framework and achieve port-level accuracy in circuit link prediction. Second, we propose Netlist Babel Fish, a netlist format conversion tool leveraging retrieval-augmented generation (RAG) with a large language model (LLM) to improve the compatibility of netlist formats. Finally, we construct SpiceNetlist, a comprehensive dataset that contains 775 annotated circuits across 10 different component classes. Experiments demonstrate accuracy improvements of 16.08% on SpiceNetlist, 11.38% on Image2Net, and 16.01% on Masala-CHAI compared to the baseline in intra-dataset evaluation, while maintaining accuracy between 92.05% and 99.07% in cross-dataset evaluation, exhibiting robust feature transfer capabilities.

Article 1173

Title@2025-07-24 (4): Variational inference for pile-up removal at hadron colliders with diffusion models

Title: Variational inference for pile-up removal at hadron colliders with diffusion models

Variationsableitung zur Stapelabfuhr an Hadron-Kollidern mit Diffusionsmodellen

与扩散模型相撞的hadron相撞器的堆叠式清除的变异推论 2410.22074v2

Authors (4): Malte Algren, Tobias Golling, Christopher Pollard, John Andrew Raine

In this paper, we present a novel method for pile-up removal of $pp$ interactions using variational inference with diffusion models, called vipr. Instead of using classification methods to identify which particles are from the primary collision, a generative model is trained to predict the constituents of the hard-scatter particle jets with pile-up removed. This results in an estimate of the full posterior over hard-scatter jet constituents, which has not yet been explored in the context of pile-up removal, yielding a clear advantage over existing methods especially in the presence of imperfect detector efficiency. We evaluate the performance of vipr in a sample of jets from simulated $t\bar{t}$ events overlain with pile-up contamination. vipr outperforms softdrop and has comparable performance to puppiml in predicting the substructure of the hard-scatter jets over a wide range of pile-up scenarios.

Article 1174

Title@2025-07-24 (4): PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

Title: PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

PRIX: Planen lernen von rohen Pixeln für autonomes Fahren Ende-zu-Ende

PRIX: 学习从Raw像素到计划用于终端到终端自治驾驶 2507.17596v2

Authors (4): Maciej K. Wozniak, Lianhang Liu, Yixi Cai, Patric Jensfelt

While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors, and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels). Our novel and efficient end-to-end driving architecture operates using only camera data, without explicit BEV representation and forgoing the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories directly from raw pixel inputs. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We demonstrate through comprehensive experiments that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in terms of inference speed and model size, making it a practical solution for real-world deployment. Our work is open-source and the code will be available at https://maxiuw.github.io/prix.

Article 1175

Title@2025-07-24 (4): Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation

Title: Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation

Selbstüberwachte Verzahnung des unstrukturierten Gitters mit automatischer Differenzierung

带有自动差异的无结构网格自操作粗化 2507.18297v1

Authors (5): Sergei Shumilin, Alexander Ryabov, Nikolay Yavich, Evgeny Burnaev, Vladimir Vanovskiy

Due to the high computational load of modern numerical simulation, there is a demand for approaches that reduce the size of discrete problems while keeping the accuracy reasonable. In this work, we present an original algorithm to coarsen an unstructured grid based on the concepts of differentiable physics. We achieve this by employing k-means clustering, autodifferentiation, and stochastic minimization algorithms. We demonstrate the performance of the designed algorithm on two PDEs: a linear parabolic equation which governs slightly compressible fluid flow in porous media, and the wave equation. Our results show that in the considered scenarios, we reduced the number of grid points by up to a factor of 10 while preserving the modeled variable dynamics at the points of interest. The proposed approach can be applied to the simulation of an arbitrary system described by evolutionary partial differential equations.

Article 1176

Title@2025-07-24 (4): Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring

Title: Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring

Leveraging Data Augmentation und Siamese Learning für vorausschauende Prozessüberwachung

利用数据增强和西亚学习来监测预测过程 2507.18293v1

Authors (3): Sjoerd van Straten, Alessandro Padella, Marwan Hassani

Predictive Process Monitoring (PPM) enables forecasting future events or outcomes of ongoing business process instances based on event logs. However, deep learning PPM approaches are often limited by the low variability and small size of real-world event logs. To address this, we introduce SiamSA-PPM, a novel self-supervised learning framework that combines Siamese learning with Statistical Augmentation for Predictive Process Monitoring. It employs three novel statistically grounded transformation methods that leverage control-flow semantics and frequent behavioral patterns to generate realistic, semantically valid new trace variants. These augmented views are used within a Siamese learning setup to learn generalizable representations of process prefixes without the need for labeled supervision. Extensive experiments on real-life event logs demonstrate that SiamSA-PPM achieves competitive or superior performance compared to the SOTA in both next activity and final outcome prediction tasks. Our results further show that statistical augmentation significantly outperforms random transformations and improves variability in the data, highlighting SiamSA-PPM as a promising direction for training data enrichment in process prediction.

Article 1177

Title@2025-07-24 (4): Learning Concepts Definable in First-Order Logic with Counting

Title: Learning Concepts Definable in First-Order Logic with Counting

Lernkonzepte im Logic erster Ordnung mit Zählen definierbar

一阶逻辑中与计数相容的学习概念 1909.03820v5

Authors (1): Steffen van Bergerem

We study Boolean classification problems over relational background structures in the logical framework introduced by Grohe and Turán (TOCS 2004). It is known (Grohe and Ritzert, LICS 2017) that classifiers definable in first-order logic over structures of polylogarithmic degree can be learned in sublinear time, where the degree of the structure and the running time are measured in terms of the size of the structure. We generalise the results to the first-order logic with counting FOCN, which was introduced by Kuske and Schweikardt (LICS 2017) as an expressive logic generalising various other counting logics. Specifically, we prove that classifiers definable in FOCN over classes of structures of polylogarithmic degree can be consistently learned in sublinear time. This can be seen as a first step towards extending the learning framework to include numerical aspects of machine learning. We extend the result to agnostic probably approximately correct (PAC) learning for classes of structures of degree at most $(\log \log n)^c$ for some constant $c$. Moreover, we show that bounding the degree is crucial to obtain sublinear-time learning algorithms. That is, we prove that, for structures of unbounded degree, learning is not possible in sublinear time, even for classifiers definable in plain first-order logic.

Article 1178

Title@2025-07-24 (4): Alternative Loss Function in Evaluation of Transformer Models

Title: Alternative Loss Function in Evaluation of Transformer Models

Alternative Verlustfunktion bei der Bewertung von Transformer-Modellen

变换模型评价中的替代损失功能 2507.16548v2

Authors (3): Jakub Michańków, Paweł Sakowski, Robert Ślepaczuk

The proper design and architecture of testing machine learning models, especially in their application to quantitative finance problems, is crucial. The most important aspect of this process is selecting an adequate loss function for training, validation, estimation purposes, and hyperparameter tuning. Therefore, in this research, through empirical experiments on equity and cryptocurrency assets, we apply the Mean Absolute Directional Loss (MADL) function, which is more adequate for optimizing forecast-generating models used in algorithmic investment strategies. The MADL function results are compared between Transformer and LSTM models, and we show that in almost every case, Transformer results are significantly better than those obtained with LSTM.
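For readers unfamiliar with MADL, a minimal sketch follows. The abstract does not state the formula; the version below assumes the commonly cited form, MADL = (1/N) Σ (-1) · sign(R_i · R̂_i) · |R_i|, where R_i is the realized return and R̂_i the forecast. The function name `madl` is illustrative, and this NumPy version is for evaluation only (training with it would need a differentiable framework).

```python
import numpy as np


def madl(returns, predicted):
    """Mean Absolute Directional Loss (assumed form).

    Each observation contributes -|R_i| when the forecast has the same sign
    as the realized return (a profitable call) and +|R_i| when the signs
    differ, so lower values are better."""
    returns = np.asarray(returns, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(-np.sign(returns * predicted) * np.abs(returns)))


realized = [0.02, -0.01]
all_right = madl(realized, [0.5, -0.3])   # both directions correct
all_wrong = madl(realized, [-0.5, 0.3])   # both directions wrong
mixed = madl(realized, [0.5, 0.3])        # right on the larger move only
```

Unlike mean squared error, this loss ignores the size of the forecast and weights each directional hit or miss by the size of the realized move, which is why it is argued to suit forecasts that feed a long/short trading rule.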

Article 1179

Title@2025-07-24 (4): SyncMapV2: Robust and Adaptive Unsupervised Segmentation

Title: SyncMapV2: Robust and Adaptive Unsupervised Segmentation

SyncMapV2: Robuste und adaptive unüberwachte Segmentierung

同步马普V2: 强力和适应性不受监督的分割 2506.16297v3

Authors (3): Heng Zhang, Zikang Wan, Danilo Vasconcellos Vargas

Human vision excels at segmenting visual cues without the need for explicit training, and it remains remarkably robust even as noise severity increases. In contrast, existing AI algorithms struggle to maintain accuracy under similar conditions. Here, we present SyncMapV2, the first to solve unsupervised segmentation with state-of-the-art robustness. SyncMapV2 exhibits a minimal drop in mIoU, only 0.01%, under digital corruption, compared to a 23.8% drop observed in SOTA methods. This superior performance extends across various types of corruption: noise (7.3% vs. 37.7%), weather (7.5% vs. 33.8%), and blur (7.0% vs. 29.5%). Notably, SyncMapV2 accomplishes this without any robust training, supervision, or loss functions. It is based on a learning paradigm that uses self-organizing dynamical equations combined with concepts from random networks. Moreover, unlike conventional methods that require re-initialization for each new input, SyncMapV2 adapts online, mimicking the continuous adaptability of human vision. Thus, we go beyond the accurate and robust results, and present the first algorithm that can do all the above online, adapting to input rather than re-initializing. In adaptability tests, SyncMapV2 demonstrates near-zero performance degradation, which motivates and fosters a new generation of robust and adaptive intelligence in the near future.

Article 1180

Title@2025-07-24 (4): Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning

Title: Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning

Multimodale Verhaltensmusteranalyse mit Eye-Tracking und LLM-basierter Vernunft

以眼跟踪和基于LLM的理由进行多模式行为模式分析 2507.18252v1

Authors (4): Dongyang Guo, Yasmeen Abdrabou, Enkeleda Thaqi, Enkelejda Kasneci

Eye-tracking data reveals valuable insights into users’ cognitive states but is difficult to analyze due to its structured, non-linguistic nature. While large language models (LLMs) excel at reasoning over text, they struggle with temporal and numerical data. This paper presents a multimodal human-AI collaborative framework designed to enhance cognitive pattern extraction from eye-tracking signals. The framework includes: (1) a multi-stage pipeline using horizontal and vertical segmentation alongside LLM reasoning to uncover latent gaze patterns; (2) an Expert-Model Co-Scoring Module that integrates expert judgment with LLM output to generate trust scores for behavioral interpretations; and (3) a hybrid anomaly detection module combining LSTM-based temporal modeling with LLM-driven semantic analysis. Our results across several LLMs and prompt strategies show improvements in consistency, interpretability, and performance, with up to 50% accuracy in difficulty prediction tasks. This approach offers a scalable, interpretable solution for cognitive modeling and has broad potential in adaptive learning, human-computer interaction, and educational analytics.

Article 1181

Title@2025-07-24 (4): Latent Representations of Intracardiac Electrograms for Atrial Fibrillation Driver Detection

Title: Latent Representations of Intracardiac Electrograms for Atrial Fibrillation Driver Detection

Latente Darstellungen von intrakardialen Elektrogrammen für Vorhofflimmern-Treibererkennung

用于实验性纤维纤维化驱动检测的心内热解电图 2507.19547v1

Authors (7): Pablo Peiro-Corbacho, Long Lin, Pablo Ávila, Alejandro Carta-Bergaz, Ángel Arenal, Carlos Sevilla-Salcedo, Gonzalo R. Ríos-Muñoz

Atrial Fibrillation (AF) is the most prevalent sustained arrhythmia, yet current ablation therapies, including pulmonary vein isolation, are frequently ineffective in persistent AF due to the involvement of non-pulmonary vein drivers. This study proposes a deep learning framework using convolutional autoencoders for unsupervised feature extraction from unipolar and bipolar intracavitary electrograms (EGMs) recorded during AF in ablation studies. These latent representations of atrial electrical activity enable the characterization and automation of EGM analysis, facilitating the detection of AF drivers. The database consisted of 11,404 acquisitions recorded from 291 patients, containing 228,080 unipolar EGMs and 171,060 bipolar EGMs. The autoencoders successfully learned latent representations with low reconstruction loss, preserving the morphological features. The extracted embeddings allowed downstream classifiers to detect rotational and focal activity with moderate performance (AUC 0.73-0.76) and achieved high discriminative performance in identifying atrial EGM entanglement (AUC 0.93). The proposed method can operate in real-time and enables integration into clinical electroanatomical mapping systems to assist in identifying arrhythmogenic regions during ablation procedures. This work highlights the potential of unsupervised learning to uncover physiologically meaningful features from intracardiac signals.

Article 1182

Title@2025-07-24 (4): Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods

Title: Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods

Revisited Boosting: Benchmarking and Advancing LP-Based Ensemble Methods

重新审视促进:基准制定和推进基于LP的组合组合方法 2507.18242v1

Authors (5): Fabian Akkerman, Julien Ferry, Christian Artigues, Emmanuel Hebrard, Thibaut Vidal

Despite their theoretical appeal, totally corrective boosting methods based on linear programming have received limited empirical attention. In this paper, we conduct the first large-scale experimental study of six LP-based boosting formulations, including two novel methods, NM-Boost and QRLP-Boost, across 20 diverse datasets. We evaluate the use of both heuristic and optimal base learners within these formulations, and analyze not only accuracy, but also ensemble sparsity, margin distribution, anytime performance, and hyperparameter sensitivity. We show that totally corrective methods can outperform or match state-of-the-art heuristics like XGBoost and LightGBM when using shallow trees, while producing significantly sparser ensembles. We further show that these methods can thin pre-trained ensembles without sacrificing performance, and we highlight both the strengths and limitations of using optimal decision trees in this context.

Article 1183

Title@2025-07-24 (4): Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Title: Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Robustes Multi-View-Lernen durch Darstellung Fusion von Sample-Level-Achtung und Ausrichtung der simulierten Perturbation

通过展示抽样关注层的聚合和模拟扰动的调整,通过代表方式进行强有力的多视角学习 2503.04151v2

Authors (5): Jie Xu, Na Zhao, Gang Niu, Masashi Sugiyama, Xiaofeng Zhu

Recently, multi-view learning (MVL) has garnered significant attention due to its ability to fuse discriminative information from multiple views. However, real-world multi-view datasets are often heterogeneous and imperfect, which usually causes MVL methods designed for specific combinations of views to lack application potential and limits their effectiveness. To address this issue, we propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. Specifically, we introduce a simple yet effective multi-view transformer fusion network where we transform heterogeneous multi-view data into homogeneous word embeddings, and then integrate multiple views by the sample-level attention mechanism to obtain a fused representation. Furthermore, we propose a simulated perturbation based multi-view contrastive learning framework that dynamically generates the noise and unusable perturbations for simulating imperfect data conditions. The simulated noisy and unusable data obtain two distinct fused representations, and we utilize contrastive learning to align them for learning discriminative and robust representations. Our RML is self-supervised and can also be applied for downstream tasks as a regularization. In experiments, we employ it in multi-view unsupervised clustering, noise-label classification, and as a plug-and-play module for cross-modal hashing retrieval. Extensive comparison experiments and ablation studies validate RML’s effectiveness. Code is available at https://github.com/SubmissionsIn/RML.

Article 1184

Title@2025-07-24 (4): Compositional Coordination for Multi-Robot Teams with Large Language Models

Title: Compositional Coordination for Multi-Robot Teams with Large Language Models

Kompositionskoordination für Multi-Roboter-Teams mit großen Sprachmodellen

具有大语言模式的多机器人小组的组成协调 2507.16068v2

Authors (5): Zhehui Huang, Guangyao Shi, Yuwei Wu, Vijay Kumar, Gaurav S. Sukhatme

Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB transforms natural language (NL) mission descriptions into executable Python code for multi-robot systems through two core modules: (1) Mission Analysis, which parses mission descriptions into behavior trees, and (2) Code Generation, which leverages the behavior tree and a structured knowledge base to generate robot control code. We further introduce a dataset of natural language mission descriptions to support development and benchmarking. Experiments in both simulation and real-world environments demonstrate that LAN2CB enables robust and flexible multi-robot coordination from natural language, significantly reducing manual engineering effort and supporting broad generalization across diverse mission types. Website: https://sites.google.com/view/lan-cb

Article 1185

Title@2025-07-24 (4): Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation

Title: Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation

Warum wirken sich klassenabhängige Auswertungseffekte mit Zeitreihen-Feature-Attributionen aus? Eine synthetische Datenuntersuchung

为何类依赖评价效果与时间序列特征属性是否相符? 合成数据调查 2506.11790v2

Authors (4): Gregor Baer, Isel Grau, Chao Zhang, Pieter Van Gorp

Evaluating feature attribution methods represents a critical challenge in explainable AI (XAI), as researchers typically rely on perturbation-based metrics when ground truth is unavailable. However, recent work reveals that these evaluation metrics can show different performance across predicted classes within the same dataset. These “class-dependent evaluation effects” raise questions about whether perturbation analysis reliably measures attribution quality, with direct implications for XAI method development and evaluation trustworthiness. We investigate under which conditions these class-dependent effects arise by conducting controlled experiments with synthetic time series data where ground truth feature locations are known. We systematically vary feature types and class contrasts across binary classification tasks, then compare perturbation-based degradation scores with ground truth-based precision-recall metrics using multiple attribution methods. Our experiments demonstrate that class-dependent effects emerge with both evaluation approaches, even in simple scenarios with temporally localized features, triggered by basic variations in feature amplitude or temporal extent between classes. Most critically, we find that perturbation-based and ground truth metrics frequently yield contradictory assessments of attribution quality across classes, with weak correlations between evaluation approaches. These findings suggest that researchers should interpret perturbation-based metrics with care, as they may not always align with whether attributions correctly identify discriminating features. By showing this disconnect, our work points toward reconsidering what attribution evaluation actually measures and developing more rigorous evaluation methods that capture multiple dimensions of attribution quality.

Article 1186

Title@2025-07-24 (4): Sparse identification of nonlinear dynamics with library optimization mechanism: Recursive long-term prediction perspective

Title: Sparse identification of nonlinear dynamics with library optimization mechanism: Recursive long-term prediction perspective

Sparse Identifikation von nichtlinearen Dynamiken mit Bibliotheksoptimierungsmechanismus: Rekursive langfristige Vorhersageperspektive

利用图书馆优化机制粗略地识别非线性动态与图书馆优化机制:递归性长期预测前景 2507.18220v1

Authors (7): Ansei Yonezawa, Heisei Yonezawa, Shuichi Yahagi, Itsuro Kajiwara, Shinya Kijimoto, Hikaru Taniuchi, Kentaro Murakami

The sparse identification of nonlinear dynamics (SINDy) approach can discover the governing equations of dynamical systems based on measurement data, where the dynamical model is identified as a sparse linear combination of given basis functions. A major challenge in SINDy is the design of the library, which is a set of candidate basis functions, as the appropriate library is not trivial for many dynamical systems. To overcome this difficulty, this study proposes SINDy with library optimization mechanism (SINDy-LOM), which combines the sparse regression technique with a novel learning strategy for the library. In the proposed approach, the basis functions are parametrized. The SINDy-LOM approach involves a two-layer optimization architecture: the inner layer, in which the data-driven model is extracted as a sparse linear combination of the candidate basis functions, and the outer layer, in which the basis functions are optimized from the viewpoint of the recursive long-term (RLT) prediction accuracy; thus, the library design is reformulated as the optimization of the parametrized basis functions. The resulting SINDy-LOM model has good interpretability and usability, as the proposed approach yields a parsimonious model. The library optimization mechanism significantly reduces user burden. The RLT perspective improves the reliability of the resulting model compared with the traditional SINDy approach, which can only ensure one-step-ahead prediction accuracy. The validity of the proposed approach is demonstrated by applying it to a diesel engine airpath system, which is a well-known complex industrial system.
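The inner sparse-regression layer can be illustrated with the standard SINDy procedure, sequentially thresholded least squares. This is a sketch of plain SINDy with a fixed polynomial library, not of SINDy-LOM: the outer layer that optimizes parametrized basis functions against recursive long-term prediction error is not shown. The function name `stlsq`, the threshold value, and the toy system are assumptions for illustration.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    theta: (n_samples, n_features) library evaluated on the data.
    dxdt:  (n_samples, n_states) time derivatives.
    Returns a sparse coefficient matrix xi with dxdt ~= theta @ xi."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each state using only the basis functions that survived.
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, k], rcond=None
                )[0]
    return xi


# Toy system (noise-free): dx/dt = -2 x + 0.5 x^3.
x = np.linspace(-2.0, 2.0, 50)
dxdt = (-2.0 * x + 0.5 * x**3).reshape(-1, 1)

# Hand-picked candidate library: [1, x, x^2, x^3].
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
xi = stlsq(theta, dxdt)
```

Here the library happens to contain the right terms, so the regression keeps only the `x` and `x^3` columns. When the true dynamics involve functions outside the library, no threshold can recover them, which is the library-design problem the abstract targets.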

Article 1187

Title@2025-07-24 (4): FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting

Title: FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting

FedSA-GCL: Ein semi-asynchrones Federated Graph Learning Framework mit personalisierter Aggregation und Cluster-Aware Broadcasting

FedSA-GCL:半同步的联邦联邦图表学习框架,配有个性化聚合和集束软件广播 2507.18219v1

Authors (6): Zhongzheng Yuan, Lianshuai Guo, Xunkai Li, Yinlin Zhu, Wenyu Wang, Meixia Qu

Federated Graph Learning (FGL) is a distributed learning paradigm that enables collaborative training over large-scale subgraphs located on multiple local systems. However, most existing FGL approaches rely on synchronous communication, which leads to inefficiencies and is often impractical in real-world deployments. Meanwhile, current asynchronous federated learning (AFL) methods are primarily designed for conventional tasks such as image classification and natural language processing, without accounting for the unique topological properties of graph data. Directly applying these methods to graph learning can result in semantic drift and representational inconsistency in the global model. To address these challenges, we propose FedSA-GCL, a semi-asynchronous federated framework that leverages both inter-client label distribution divergence and graph topological characteristics through a novel ClusterCast mechanism for efficient training. We evaluate FedSA-GCL on multiple real-world graph datasets using the Louvain and Metis split algorithms, and compare it against 9 baselines. Extensive experiments demonstrate that our method achieves strong robustness and outstanding efficiency, outperforming the baselines by an average of 2.92% with the Louvain split and 3.4% with the Metis split.

Article 1188

Title@2025-07-24 (4): The Role of the Time-Dependent Hessian in High-Dimensional Optimization

Title: The Role of the Time-Dependent Hessian in High-Dimensional Optimization

Die Rolle des Zeitabhängigen Hessen bei der hochdimensionalen Optimierung

时间依赖的赫西安人在高多样性最佳化中的作用 2403.02418v3

Authors (3): Tony Bonnaire, Giulio Biroli, Chiara Cammarota

Gradient descent is commonly used to find minima in rough landscapes, particularly in recent machine learning applications. However, a theoretical understanding of why good solutions are found remains elusive, especially in strongly non-convex and high-dimensional settings. Here, we focus on the phase retrieval problem as a typical example, which has received a lot of attention recently in theoretical machine learning. We analyze the Hessian during gradient descent, identify a dynamical transition in its spectral properties, and relate it to the ability to escape rough regions in the loss landscape. When the signal-to-noise ratio (SNR) is large enough, an informative negative direction exists in the Hessian at the beginning of the descent, i.e., in the initial condition. While descending, a BBP transition in the spectrum takes place in finite time: the direction is lost, and the dynamics is trapped in a rugged region filled with marginally stable bad minima. Surprisingly, for finite system sizes, this window of negative curvature allows the system to recover the signal well before the theoretical SNR found for infinite sizes, emphasizing the central role of initialization and early-time dynamics for efficiently navigating rough landscapes.

Article 1189

Title@2025-07-24 (4): Goal-based Trajectory Prediction for improved Cross-Dataset Generalization

Title: Goal-based Trajectory Prediction for improved Cross-Dataset Generalization

Zielbasierte Trajektorie-Vorhersage für verbesserte Cross-Dataset-Verallgemeinerung

改进交叉数据通用化的基于目标的轨迹预测 2507.18196v1

Authors (3): Daniel Grimm, Ahmed Abouelazm, J. Marius Zöllner

To achieve fully autonomous driving, a good understanding of the surrounding environment is necessary. In particular, predicting the future states of other traffic participants poses a non-trivial challenge. Current SotA models already show promising results when trained on real datasets (e.g. Argoverse2, NuScenes). Problems arise when these models are deployed to new or unseen areas. Typically, performance drops significantly, indicating that the models lack generalization. In this work, we introduce a new Graph Neural Network (GNN) that utilizes a heterogeneous graph consisting of traffic participants and a vectorized road network. The latter is used to classify goals, i.e. endpoints of the predicted trajectories, in a multi-staged approach, leading to better generalization to unseen scenarios. We show the effectiveness of the goal selection process via cross-dataset evaluation, i.e. training on Argoverse2 and evaluating on NuScenes.

Article 1190

Title@2025-07-24 (4): Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning

Title: Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning

Jenseits der Low-Rank-Dekomposition: Ein Shortcut-Ansatz für effizientes On-Device-Lernen

超越低级别分解:高效在线学习的捷径方法 2505.05086v2

Authors (4): Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione

On-device learning has emerged as a promising direction for AI development, particularly because of its potential to reduce latency issues and mitigate privacy risks associated with device-server communication, while improving energy efficiency. Despite these advantages, significant memory and computational constraints still represent major challenges for its deployment. Drawing on previous studies on low-rank decomposition methods that address activation memory bottlenecks in backpropagation, we propose a novel shortcut approach as an alternative. Our analysis and experiments demonstrate that our method can reduce activation memory usage, even up to $120.09\times$ compared to vanilla training, while also reducing overall training FLOPs up to $1.86\times$ when evaluated on traditional benchmarks.

Article 1191

Title@2025-07-24 (4): A general language model for peptide identification

Title: A general language model for peptide identification

Ein allgemeines Sprachmodell für die Peptididentifikation

铅化物识别通用语言模式 2502.15610v4

Authors (8): Jixiu Zhai, Tianchi Lu, Haitian Zhong, Ziyang Xu, Yuhuan Liu, Shengrui Xu, Jingwan Wang, Dan Huang

Accurate identification of bioactive peptides (BPs) and protein post-translational modifications (PTMs) is essential for understanding protein function and advancing therapeutic discovery. However, most computational methods remain limited in their generalizability across diverse peptide functions. Here, we present PDeepPP, a unified deep learning framework that integrates pretrained protein language models with a hybrid transformer-convolutional architecture, enabling robust identification across diverse peptide classes and PTM sites. We curated comprehensive benchmark datasets and implemented strategies to address data imbalance, allowing PDeepPP to systematically extract both global and local sequence features. Through extensive analyses, including dimensionality reduction and comparison studies, PDeepPP demonstrates strong, interpretable peptide representations and achieves state-of-the-art performance in 25 of the 33 biological identification tasks. Notably, PDeepPP attains high accuracy in antimicrobial (0.9726) and phosphorylation site (0.9984) identification, with 99.5% specificity in glycosylation site prediction and a substantial reduction in false negatives in antimalarial tasks. By enabling large-scale, accurate peptide analysis, PDeepPP supports biomedical research and the discovery of novel therapeutic targets for disease treatment. All code, datasets, and pretrained models are publicly available via GitHub: https://github.com/fondress/PDeepPP and Hugging Face: https://huggingface.co/fondress/PDeppPP.

Article 1192

Title@2025-07-24 (4): Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling

Title: Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling

Innovator: Wissenschaftliche Weiterbildung mit feinkörnigem MoE Upcycling

创新者:科学继续预科培训,采用精美的机动车骑车 2507.18671v1

Authors (21): Ning Liao, Xiaoxing Wang, Zehao Lin, Weiyang Guo, Feng Hong, Shixiang Song, Geng Yu, Zihua Zhao, Sitao Xie, Longxuan Wei, Xiangqi Jin, Xiaohan Qin, Jiale Ma, Kai Chen, Jiangchao Yao, Zhouhan Lin, Junchi Yan, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Linfeng Zhang

A large language model (LLM) with knowledge in both scientific and general tasks is the foundation of science general intelligence. However, directly continuing to pretrain an LLM on science data usually leads to catastrophic forgetting, i.e., severe degradation in general ability. In this report, we present Innovator, which solves this problem by upcycling a pre-trained dense LLM into a fine-grained Mixtures-of-Experts model during continued pretraining, where different experts are expected to learn science knowledge in different disciplines, and a shared expert is utilized for general tasks. Innovator introduces a four-stage upcycle training paradigm: (1) Scientific Expert Induction on discipline-specific data, (2) Fine-grained Expert Splitting via FFN dimension decomposition, (3) Science-Aware Routing warmup, and (4) Generalist-Scientist Integration training on hybrid datasets. Such a paradigm allows knowledge in the general domain and in different scientific disciplines to be decoupled, avoiding negative interference among knowledge in different domains. With 53.3B total parameters and 13.3B activated, Innovator extends Qwen2.5-7B using a shared general expert and 64 specialized scientific experts with 8 activated. Trained on 300B tokens with tri-level quality-controlled data, Innovator achieves a 25% average improvement across 30 scientific tasks with a win rate of 70%, while retaining 99% performance in general tasks. Furthermore, Innovator-Reason, which is post-trained from Innovator to boost reasoning, exhibits excellent reasoning performance in solving complex scientific problems, with improvements of over 30%.

Article 1193

Title@2025-07-24 (4): ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory

Title: ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory

ChronoSelect: Robustes Lernen mit lauten Etiketten über Dynamics Temporal Memory

ChronoSect: 通过动态时空内存与新标签进行强力学习 2507.18183v1

Authors (5): Jianchao Wang, Qingfeng Li, Pengcheng Zheng, Xiaorong Pu, Yazhou Ren

Training deep neural networks on real-world datasets is often hampered by the presence of noisy labels, which can be memorized by over-parameterized models, leading to significant degradation in generalization performance. While existing methods for learning with noisy labels (LNL) have made considerable progress, they fundamentally suffer from static snapshot evaluations and fail to leverage the rich temporal dynamics of learning evolution. In this paper, we propose ChronoSelect (chrono denoting its temporal nature), a novel framework featuring an innovative four-stage memory architecture that compresses prediction history into compact temporal distributions. Our unique sliding update mechanism with controlled decay maintains only four dynamic memory units per sample, progressively emphasizing recent patterns while retaining essential historical knowledge. This enables precise three-way sample partitioning into clean, boundary, and noisy subsets through temporal trajectory analysis and dual-branch consistency. Theoretical guarantees prove the mechanism’s convergence and stability under noisy conditions. Extensive experiments demonstrate ChronoSelect’s state-of-the-art performance across synthetic and real-world benchmarks.

Article 1194

Title@2025-07-24 (4): Statistical Runtime Verification for LLMs via Robustness Estimation

Title: Statistical Runtime Verification for LLMs via Robustness Estimation

Statistische Laufzeitprüfung für LLMs mittels Robustheitsschätzung

通过强力估计法对LLMs进行统计运行时间校验 2504.17723v2

Authors (3): Natan Levy, Adiel Ashrov, Guy Katz

Adversarial robustness verification is essential for ensuring the safe deployment of Large Language Models (LLMs) in runtime-critical applications. However, formal verification techniques remain computationally infeasible for modern LLMs due to their exponential runtime and white-box access requirements. This paper presents a case study adapting and extending the RoMA statistical verification framework to assess its feasibility as an online runtime robustness monitor for LLMs in black-box deployment settings. Our adaptation of RoMA analyzes confidence score distributions under semantic perturbations to provide quantitative robustness assessments with statistically validated bounds. Our empirical validation against formal verification baselines demonstrates that RoMA achieves comparable accuracy (within 1% deviation), and reduces verification times from hours to minutes. We evaluate this framework across semantic, categorial, and orthographic perturbation domains. Our results demonstrate RoMA’s effectiveness for robustness monitoring in operational LLM deployments. These findings point to RoMA as a potentially scalable alternative when formal methods are infeasible, with promising implications for runtime verification in LLM-based systems.

Article 1195

Title@2025-07-24 (4): SDSC:A Structure-Aware Metric for Semantic Signal Representation Learning

Title: SDSC:A Structure-Aware Metric for Semantic Signal Representation Learning

SDSC:A Structure-Aware Metric for Semantic Signal Representative Learning

SDSC:用于语义信号代言学习的结构-孔径计量仪 2507.14516v2

Authors (2): Jeyoung Lee, Hochul Kang

We propose the Signal Dice Similarity Coefficient (SDSC), a structure-aware metric function for time series self-supervised representation learning. Most Self-Supervised Learning (SSL) methods for signals adopt distance-based objectives such as mean squared error (MSE), which are sensitive to amplitude, invariant to waveform polarity, and unbounded in scale. These properties hinder semantic alignment and reduce interpretability. SDSC addresses this by quantifying structural agreement between temporal signals based on the intersection of signed amplitudes, derived from the Dice Similarity Coefficient (DSC). Although SDSC is defined as a structure-aware metric, it can be used as a loss by subtracting it from 1 and applying a differentiable approximation of the Heaviside function for gradient-based optimization. A hybrid loss formulation is also proposed to combine SDSC with MSE, improving stability and preserving amplitude where necessary. Experiments on forecasting and classification benchmarks demonstrate that SDSC-based pre-training achieves comparable or improved performance over MSE, particularly in in-domain and low-resource scenarios. The results suggest that structural fidelity in signal representations enhances the semantic representation quality, supporting the consideration of structure-aware metrics as viable alternatives to conventional distance-based methods.
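The abstract does not give the SDSC formula, so the following is only a rough sketch of what a Dice-style score over signed amplitudes might look like. The function name `sdsc`, the use of a hard sign-agreement mask in place of the differentiable Heaviside approximation, and the exact normalization are all assumptions, not the authors' definition.

```python
import numpy as np

def sdsc(a, b, eps=1e-8):
    """Assumed Dice-style similarity: overlap of amplitudes where the signs agree."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    same_sign = (a * b) > 0                       # hard Heaviside on the sign agreement
    overlap = np.where(same_sign, np.minimum(np.abs(a), np.abs(b)), 0.0).sum()
    return 2.0 * overlap / (np.abs(a).sum() + np.abs(b).sum() + eps)

t = np.linspace(0.0, 1.0, 100)
s = np.sin(2.0 * np.pi * t)

score_same = sdsc(s, s)           # identical signals
score_flipped = sdsc(s, -s)       # polarity-inverted signal
score_scaled = sdsc(s, 0.5 * s)   # same shape, half the amplitude
loss = 1.0 - score_scaled         # the "subtract from 1" loss form mentioned above
```

Under these assumptions the score is bounded in [0, 1], drops to zero when the polarity is flipped, and penalizes amplitude mismatch only moderately, which matches the properties the abstract contrasts with MSE.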

Article 1196

Title@2025-07-24 (4): GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar

Title: GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar

GeoAvatar: Adaptive geometrische Gaussian Splatting für 3D-Kopf Avatar

GeoAvatar: 3D Avatar 头的适应性几何高山喷涂 2507.18155v1

Authors (5): SeungJun Moon, Hah Min Lew, Seungeun Lee, Ji-Su Kang, Gyeong-Moon Park

Despite recent progress in 3D head avatar generation, balancing identity preservation, i.e., reconstruction, with novel poses and expressions, i.e., animation, remains a challenge. Existing methods struggle to adapt Gaussians to varying geometrical deviations across facial regions, resulting in suboptimal quality. To address this, we propose GeoAvatar, a framework for adaptive geometrical Gaussian Splatting. GeoAvatar leverages Adaptive Pre-allocation Stage (APS), an unsupervised method that segments Gaussians into rigid and flexible sets for adaptive offset regularization. Then, based on mouth anatomy and dynamics, we introduce a novel mouth structure and the part-wise deformation strategy to enhance the animation fidelity of the mouth. Finally, we propose a regularization loss for precise rigging between Gaussians and 3DMM faces. Moreover, we release DynamicFace, a video dataset with highly expressive facial motions. Extensive experiments show the superiority of GeoAvatar compared to state-of-the-art methods in reconstruction and novel animation scenarios.

Article 1197

Title@2025-07-24 (4): Robust Non-adaptive Group Testing under Errors in Group Membership Specifications

Title: Robust Non-adaptive Group Testing under Errors in Group Membership Specifications

Robuste, nicht adaptive Gruppenprüfung unter Fehlern in den Gruppenmitgliedschaftsspezifikationen

根据集团成员类别规格错误进行强力非适应性小组测试 2409.05345v2

Authors (4): Shuvayan Banerjee, Radhendushka Srivastava, James Saunderson, Ajit Rajwade

Given $p$ samples, each of which may or may not be defective, group testing (GT) aims to determine their defect status by performing tests on $n < p$ 'groups', where a group is formed by mixing a subset of the $p$ samples. Assuming that the number of defective samples is very small compared to $p$, GT algorithms have provided excellent recovery of the status of all $p$ samples with even a small number of groups. Most existing methods, however, assume that the group memberships are accurately specified. This assumption may not hold in all applications, due to various resource constraints. Such errors could occur, e.g., when a technician, preparing the groups in a laboratory, unknowingly mixes together an incorrect subset of samples as compared to what was specified. We develop a new GT method, the Debiased Robust Lasso Test Method (DRLT), that handles such group membership specification errors. The proposed DRLT method is based on an approach to debias, or reduce the inherent bias in, estimates produced by Lasso, a popular and effective sparse regression technique. We also provide theoretical upper bounds on the reconstruction error produced by our estimator. Our approach is then combined with two carefully designed hypothesis tests for (i) the identification of defective samples in the presence of errors in group membership specifications, and (ii) the identification of groups with erroneous membership specifications, respectively. The DRLT approach extends the literature on bias mitigation of statistical estimators such as the LASSO to handle the important case when some of the measurements contain outliers, due to factors such as group membership specification errors. We present numerical results which show that our approach outperforms several baselines and robust regression techniques for identification of defective samples as well as erroneously specified groups.

Article 1198

Title@2025-07-24 (4): Neuromorphic Computing for Embodied Intelligence in Autonomous Systems: Current Trends, Challenges, and Future Directions

Title: Neuromorphic Computing for Embodied Intelligence in Autonomous Systems: Current Trends, Challenges, and Future Directions

Neuromorphes Computing für körpereigene Intelligenz in autonomen Systemen: Aktuelle Trends, Herausforderungen und Zukunftsrichtungen

自治区内渗透情报的神经元化计算:当前趋势、挑战和未来方向 2507.18139v1

Authors (2): Alberto Marchisio, Muhammad Shafique

The growing need for intelligent, adaptive, and energy-efficient autonomous systems across fields such as robotics, mobile agents (e.g., UAVs), and self-driving vehicles is driving interest in neuromorphic computing. By drawing inspiration from biological neural systems, neuromorphic approaches offer promising pathways to enhance the perception, decision-making, and responsiveness of autonomous platforms. This paper surveys recent progress in neuromorphic algorithms, specialized hardware, and cross-layer optimization strategies, with a focus on their deployment in real-world autonomous scenarios. Special attention is given to event-based dynamic vision sensors and their role in enabling fast, efficient perception. The discussion highlights new methods that improve energy efficiency, robustness, adaptability, and reliability through the integration of spiking neural networks into autonomous system architectures. We integrate perspectives from machine learning, robotics, neuroscience, and neuromorphic engineering to offer a comprehensive view of the state of the field. Finally, emerging trends and open challenges are explored, particularly in the areas of real-time decision-making, continual learning, and the development of secure, resilient autonomous systems.

Article 1199

Title@2025-07-24 (4): DAA*: Deep Angular A Star for Image-based Path Planning

Title: DAA*: Deep Angular A Star for Image-based Path Planning

DAA*: Deep Angular Ein Stern für bildbasierte Pfadplanung

DAA*:基于图像的路径规划深角A星 2507.09305v3

Authors (1): Zhiwei Xu

Path smoothness is often overlooked in path imitation learning from expert demonstrations. In this paper, we introduce a novel learning method, termed deep angular A* (DAA*), by incorporating the proposed path angular freedom (PAF) into A* to improve path similarity through adaptive path smoothness. The PAF aims to explore the effect of move angles on path node expansion by finding the trade-off between their minimum and maximum values, allowing for high adaptiveness for imitation learning. DAA* improves path optimality by closely aligning with the reference path through joint optimization of path shortening and smoothing, which correspond to heuristic distance and PAF, respectively. Through comprehensive evaluations on 7 datasets, including 4 maze datasets, 2 video-game datasets, and a real-world drone-view dataset containing 2 scenarios, we demonstrate remarkable improvements of our DAA* over neural A* in path similarity between the predicted and reference paths with a shorter path length when the shortest path is plausible, improving by 9.0% SPR, 6.9% ASIM, and 3.9% PSIM. Furthermore, when jointly learning pathfinding with both path loss and path probability map loss, DAA* significantly outperforms the state-of-the-art TransPath by 6.3% SPR, 6.0% PSIM, and 3.7% ASIM. We also discuss the minor trade-off between path optimality and search efficiency where applicable. Our code and model weights are available at https://github.com/zwxu064/DAAStar.git.

Article 1200

Title@2025-07-24 (4): TOC-UCO: a comprehensive repository of tabular ordinal classification datasets

Title: TOC-UCO: a comprehensive repository of tabular ordinal classification datasets

TOC-UCO: ein umfassendes Repository von tabellarischen Klassifikationsdatensätzen

TOC-UCO:表格格式分类数据集综合储存库 2507.17348v2

Authors (6): Rafael Ayllón-Gavilán, David Guijo-Rubio, Antonio Manuel Gómez-Orellana, Francisco Bérchez-Moreno, Víctor Manuel Vargas-Yun, Pedro A. Gutiérrez

An ordinal classification (OC) problem corresponds to a special type of classification characterised by the presence of a natural order relationship among the classes. This type of problem can be found in a number of real-world applications, motivating the design and development of many ordinal methodologies over the last years. However, it is important to highlight that the development of the OC field suffers from one main disadvantage: the lack of a comprehensive set of datasets on which novel approaches in the literature can be benchmarked. To address this gap, this manuscript from the University of Córdoba (UCO), which has previous experience in the OC field, provides the literature with a publicly available repository of tabular data for a robust validation of novel OC approaches, namely TOC-UCO (Tabular Ordinal Classification repository of the UCO). Specifically, this repository includes a set of $46$ tabular ordinal datasets, preprocessed under a common framework and ensured to have a reasonable number of patterns and an appropriate class distribution. We also provide the sources and preprocessing steps of each dataset, along with details on how to benchmark a novel approach using the TOC-UCO repository. For this, indices for $30$ different randomised train-test partitions are provided to facilitate the reproducibility of the experiments.

Article 1201

Title@2025-07-24 (4): Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning

Title: Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning

Maximierung von Prefix-Konfidenz bei Test-Time verbessert mathematische Reasoning effizient

使试验时间有效改进数学理由的预设信息最大化 2507.18122v1

Authors (4): Matthias Otth, Jonas Hübotter, Ido Hakimi, Andreas Krause

Recent work has shown that language models can self-improve by maximizing their own confidence in their predictions, without relying on external verifiers or reward signals. In this work, we study the test-time scaling of language models for mathematical reasoning tasks, where the model’s own confidence is used to select the most promising attempts. Surprisingly, we find that we can achieve significant performance gains by continuing only the most promising attempt, selected by the model’s prefix-confidence. We systematically evaluate prefix-confidence scaling on five mathematical reasoning datasets: the school-level GSM8K and MATH500, and the competition-level AMC23, AIME24, and AIME25. We find that prefix-confidence scaling with prefixes of only 32 tokens achieves a better accuracy-compute trade-off than majority voting. Moreover, prefix-confidence scaling appears less susceptible than best-of-N (BoN) sampling to length biases. Finally, we also evaluate test-time training with prefix-confidence and find that, while outperforming the base model, it does not improve over prefix-confidence scaling.
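The selection step can be pictured with a small sketch: sample several short prefixes, score each by the model's confidence, and continue only the best one. The abstract does not define the confidence score, so the mean token log-probability used here, the function name `select_by_prefix_confidence`, and the hard-coded log-probabilities are all hypothetical stand-ins for real model outputs.

```python
import numpy as np

def select_by_prefix_confidence(prefix_logprobs):
    """Pick the attempt whose prefix has the highest mean token log-probability.

    prefix_logprobs: one sequence of per-token log-probabilities per attempt
    (assumed to come from the language model that sampled the prefixes).
    """
    scores = [float(np.mean(lp)) for lp in prefix_logprobs]
    return int(np.argmax(scores)), scores

# Hypothetical log-probabilities for three sampled 4-token prefixes
attempts = [
    [-1.2, -0.8, -2.0, -1.0],
    [-0.1, -0.3, -0.2, -0.4],
    [-0.5, -1.5, -0.7, -0.9],
]
best, scores = select_by_prefix_confidence(attempts)
print(best, scores)
```

Only the selected attempt would then be decoded to completion, which is where the compute saving over majority voting comes from: the other attempts cost only their short prefixes.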

Article 1202

Title@2025-07-24 (4): Efficient Knowledge Tracing Leveraging Higher-Order Information in Integrated Graphs

Title: Efficient Knowledge Tracing Leveraging Higher-Order Information in Integrated Graphs

Effizientes Knowledge Tracing Leveraging Higher-Order Information in integrierten Graphen

在综合图表中利用高级命令信息 2507.18668v1

Authors (6): Donghee Han, Daehee Kim, Minjun Lee, Daeyoung Roh, Keejun Han, Mun Yong Yi

The rise of online learning has led to the development of various knowledge tracing (KT) methods. However, existing methods have overlooked the problem of increasing computational cost when utilizing large graphs and long learning sequences. To address this issue, we introduce Dual Graph Attention-based Knowledge Tracing (DGAKT), a graph neural network model designed to leverage high-order information from subgraphs representing student-exercise-KC relationships. DGAKT incorporates a subgraph-based approach to enhance computational efficiency. By processing only relevant subgraphs for each target interaction, DGAKT significantly reduces memory and computational requirements compared to full global graph models. Extensive experimental results demonstrate that DGAKT not only outperforms existing KT models but also sets a new standard in resource efficiency, addressing a critical need that has been largely overlooked by prior KT approaches.

Article 1203

Title@2025-07-24 (4): VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration

Title: VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration

VCDiag: Klassifizierende Erroneous-Wellenformen für Ausfall-Triage-Beschleunigung

VCDiag: 失灵千兆字节加速不规则波形分类 2506.03590v3

Authors (7): Minh Luu, Surya Jasper, Khoi Le, Evan Pan, Michael Quinn, Aakash Tyagi, Jiang Hu

Failure triage in design functional verification is critical but time-intensive, relying on manual specification reviews, log inspections, and waveform analyses. While machine learning (ML) has improved areas like stimulus generation and coverage closure, its application to RTL-level simulation failure triage, particularly for large designs, remains limited. VCDiag offers an efficient, adaptable approach using VCD data to classify failing waveforms and pinpoint likely failure locations. In the largest experiment, VCDiag achieves over 94% accuracy in identifying the top three most likely modules. The framework introduces a novel signal selection and statistical compression approach, achieving over 120x reduction in raw data size while preserving features essential for classification. It can also be integrated into diverse Verilog/SystemVerilog designs and testbenches.

Article 1204

Title@2025-07-24 (4): Generalizing Adam to Manifolds for Efficiently Training Transformers

Title: Generalizing Adam to Manifolds for Efficiently Training Transformers

Verallgemeinern von Adam zu Manifolds für effizientes Training Transformers

将亚当推广为高效率培训变换器的处理器 2305.16901v4

Authors (1): Benedikt Brantner

One of the primary reasons behind the success of neural networks has been the emergence of an array of new, highly successful optimizers, perhaps most importantly the Adam optimizer. It is widely used for training neural networks, yet notoriously hard to interpret. Lacking a clear physical intuition, Adam is difficult to generalize to manifolds. Some attempts have been made to directly apply parts of the Adam algorithm to manifolds or to find an underlying structure, but a full generalization has remained elusive. In this work, a new approach is presented that leverages the special structure of the manifolds which are relevant for optimization of neural networks, such as the Stiefel manifold, the symplectic Stiefel manifold and the Grassmann manifold: all of these are homogeneous spaces and as such admit a global tangent space representation, i.e., a common vector space (Lie subspace) in which all tangent spaces can easily be represented. This global tangent space representation is used to perform all of the steps in the Adam optimizer, and we are able to fully generalize the optimizer to manifolds without a projection step. The resulting algorithm is then applied to train a transformer for which orthogonality constraints are enforced up to machine precision, and we observe significant speed-ups in the training process.

Article 1205

Title@2025-07-24 (4): A Two-armed Bandit Framework for A/B Testing

Title: A Two-armed Bandit Framework for A/B Testing

Ein zweiarmiges Bandit-Framework für A/B-Tests

A/B测试有两武装的土匪框架 2507.18118v1

Authors (5): Jinjuan Wang, Qianglin Wen, Yu Zhang, Xiaodong Yan, Chengchun Shi

A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly developed policy against a standard control. Various causal inference and reinforcement learning methods developed in the literature are applicable to A/B testing. This paper introduces a two-armed bandit framework designed to improve the power of existing approaches. The proposed procedure consists of three main steps: (i) employing doubly robust estimation to generate pseudo-outcomes, (ii) utilizing a two-armed bandit framework to construct the test statistic, and (iii) applying a permutation-based method to compute the $p$-value. We demonstrate the efficacy of the proposed method through asymptotic theories, numerical experiments and real-world data from a ridesharing company, showing its superior performance in comparison to existing methods.
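Step (iii) relies on a generic idea that can be sketched independently of the paper: a permutation test compares an observed statistic with its distribution under random relabeling of the arms. The sketch below uses a plain difference in means as a placeholder statistic; the paper instead builds its statistic from doubly robust pseudo-outcomes and a two-armed bandit construction, which is not reproduced here. The function name and the toy data are assumptions.

```python
import numpy as np

def permutation_p_value(treated, control, n_perm=2000, seed=0):
    """One-sided permutation p-value for mean(treated) > mean(control)."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = treated.mean() - control.mean()
    pooled = np.concatenate([treated, control])
    n_t = treated.size
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)            # random relabeling of the two arms
        stat = perm[:n_t].mean() - perm[n_t:].mean()
        if stat >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)

# Hypothetical outcomes: a clear treatment effect vs. no effect at all
control = np.arange(20.0)
p_effect = permutation_p_value(control + 30.0, control)
p_null = permutation_p_value(control, control)
print(p_effect, p_null)
```

The permutation approach makes no distributional assumption on the statistic, which is presumably why it pairs well with a non-standard bandit-based test statistic.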

Article 1206

Title@2025-07-24 (4): The Impact of Pseudo-Science in Financial Loans Risk Prediction

Title: The Impact of Pseudo-Science in Financial Loans Risk Prediction

Die Auswirkungen von Pseudo-Science auf die Risikovorhersage von Finanzkrediten

假科学对金融贷款风险预测的影响 2507.16182v2

Authors (2): Bruno Scarone, Ricardo Baeza-Yates

We study the societal impact of pseudo-scientific assumptions for predicting the behavior of people in a straightforward application of machine learning to risk prediction in financial lending. This use case also exemplifies the impact of survival bias in loan return prediction. We analyze the models in terms of their accuracy and social cost, showing that the socially optimal model may not imply a significant accuracy loss for this downstream task. Our results are verified for commonly used learning methods and datasets. Our findings also show that there is a natural dynamic when training models that suffer from survival bias: accuracy slightly deteriorates, while recall and precision improve with time. These results act as an illusion, leading the observer to believe that the system is getting better, when in fact the model is suffering from increasing unfairness and survival bias.


Article 1207

Title@2025-07-24 (4): Agentic AI framework for End-to-End Medical Data Inference

Title: Agentic AI framework for End-to-End Medical Data Inference

Agentische KI-Framework für Ende-zu-Ende medizinische Datenableitung

端到端医疗数据推断的智能体AI框架 2507.18115v1

Authors (5): Soorya Ram Shimgekar, Shayan Vassef, Abhay Goyal, Navin Kumar, Koustuv Saha

Building and deploying machine learning solutions in healthcare remains expensive and labor-intensive due to fragmented preprocessing workflows, model compatibility issues, and stringent data privacy constraints. In this work, we introduce an Agentic AI framework that automates the entire clinical data pipeline, from ingestion to inference, through a system of modular, task-specific agents. These agents handle both structured and unstructured data, enabling automatic feature selection, model selection, and preprocessing recommendation without manual intervention. We evaluate the system on publicly available datasets from geriatrics, palliative care, and colonoscopy imaging. For example, in the case of structured data (anxiety data) and unstructured data (colonoscopy polyps data), the pipeline begins with file-type detection by the Ingestion Identifier Agent, followed by the Data Anonymizer Agent, which ensures privacy compliance: the data type is identified first and the data is then anonymized. The Feature Extraction Agent identifies features using an embedding-based approach for tabular data, extracting all column names, and a multi-stage MedGemma-based approach for image data, which infers modality and disease name. These features guide the Model-Data Feature Matcher Agent in selecting the best-fit model from a curated repository. The Preprocessing Recommender Agent and Preprocessing Implementor Agent then apply tailored preprocessing based on data type and model requirements. Finally, the Model Inference Agent runs the selected model on the uploaded data and generates interpretable outputs using tools like SHAP, LIME, and DETR attention maps. By automating these high-friction stages of the ML lifecycle, the proposed framework reduces the need for repeated expert intervention, offering a scalable, cost-efficient pathway for operationalizing AI in clinical environments.


Article 1208

Title@2025-07-24 (4): Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification

Title: Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification

Politische Disruption bei der Stärkung des Lernens: Umgekehrter Angriff mit großen Sprachmodellen und kritischer Zustandsidentifikation

强化学习方面的政策混乱:以大语言模式和关键状态识别进行反向攻击 2507.18113v1

Authors (5): Junyong Jiang, Buwei Tian, Chenxing Xu, Songze Li, Lu Dong

Reinforcement learning (RL) has achieved remarkable success in fields like robotics and autonomous driving, but adversarial attacks designed to mislead RL systems remain challenging. Existing approaches often rely on modifying the environment or policy, limiting their practicality. This paper proposes an adversarial attack method in which existing agents in the environment guide the target policy to output suboptimal actions without altering the environment. We propose a reward iteration optimization framework that leverages large language models (LLMs) to generate adversarial rewards explicitly tailored to the vulnerabilities of the target agent, thereby enhancing the effectiveness of inducing the target agent toward suboptimal decision-making. Additionally, a critical state identification algorithm is designed to pinpoint the target agent’s most vulnerable states, where suboptimal behavior from the victim leads to significant degradation in overall performance. Experimental results in diverse environments demonstrate the superiority of our method over existing approaches.


Article 1209

Title@2025-07-24 (4): Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN

Title: Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN

Prozentual basierte Deep-Verstärkung-Lernen und Belohnung basierte Personalisierung für Delay Aware RAN Slicing in O-RAN

在O-RAN为延迟了解RAN切片而进行百分百分率深强化学习和奖励性个人化 2507.18111v1

Authors (2): Peyman Tehrani, Anas Alsoliman

In this paper, we tackle the challenge of radio access network (RAN) slicing within an open RAN (O-RAN) architecture. Our focus centers on a network that includes multiple mobile virtual network operators (MVNOs) competing for physical resource blocks (PRBs) with the goal of meeting probabilistic delay upper bound constraints for their clients while minimizing PRB utilization. Initially, we derive a reward function based on the law of large numbers (LLN), then implement practical modifications to adapt it for real-world experimental scenarios. We then propose our solution, the Percentile-based Delay-Aware Deep Reinforcement Learning (PDA-DRL), which demonstrates its superiority over several baselines, including DRL models optimized for average delay constraints, by achieving a 38% reduction in resultant average delay. Furthermore, we delve into the issue of model weight sharing among multiple MVNOs to develop a robust personalized model. We introduce a reward-based personalization method where each agent prioritizes other agents’ model weights based on their performance. This technique surpasses traditional aggregation methods, such as federated averaging, and strategies reliant on traffic patterns and model weight distance similarities.
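To illustrate why a probabilistic delay bound differs from an average-delay constraint, the sketch below checks an empirical version of "the probability that delay exceeds `d_max` is at most `eps`". It is only an assumption about how such a constraint could be checked on logged delays; the function names, the nearest-rank percentile rule, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math
from statistics import fmean


def empirical_percentile(delays, q):
    """Nearest-rank q-th percentile (q in (0, 100]) of the observed delays."""
    xs = sorted(delays)
    rank = max(1, math.ceil(q / 100 * len(xs)))
    return xs[rank - 1]


def meets_delay_bound(delays, d_max, eps):
    """True if the observed fraction of delays above d_max is at most eps."""
    violations = sum(d > d_max for d in delays)
    return violations / len(delays) <= eps


# Hypothetical per-packet delays in milliseconds: mostly fast, one slow outlier.
delays_ms = [1.0] * 9 + [20.0]

print("mean delay:", fmean(delays_ms))
print("95th percentile:", empirical_percentile(delays_ms, 95))
print("meets P(delay > 5 ms) <= 5%:", meets_delay_bound(delays_ms, 5.0, 0.05))
```

In this toy trace the mean delay sits below the 5 ms budget while the tail violates the probabilistic bound, which is the kind of case an average-delay objective would not penalize.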


Article 1210

Title@2025-07-24 (4): A New Pair of GloVes

Title: A New Pair of GloVes

Ein neues Paar GloVes

新的地球之对 2507.18103v1

Authors (3): Riley Carlson, John Bauer, Christopher D. Manning

This report documents, describes, and evaluates new 2024 English GloVe (Global Vectors for Word Representation) models. While the original GloVe models built in 2014 have been widely used and found useful, languages and the world continue to evolve and we thought that current usage could benefit from updated models. Moreover, the 2014 models were not carefully documented as to the exact data versions and preprocessing that were used, and we rectify this by documenting these new models. We trained two sets of word embeddings using Wikipedia, Gigaword, and a subset of Dolma. Evaluation through vocabulary comparison, direct testing, and NER tasks shows that the 2024 vectors incorporate new culturally and linguistically relevant words, perform comparably on structural tasks like analogy and similarity, and demonstrate improved performance on recent, temporally dependent NER datasets such as non-Western newswire data.


Article 1211

Title@2025-07-24 (4): Comparison of Segmentation Methods in Remote Sensing for Land Use Land Cover

Title: Comparison of Segmentation Methods in Remote Sensing for Land Use Land Cover

Vergleich der Segmentierungsmethoden bei der Fernerkundung für die Bodenbedeckung

土地利用、土地利用的变化和林业遥感遥感分路方法比较 2507.18099v1

Authors (5): Naman Srivastava, Joel D Joy, Yash Dixit, Swarup E, Rakshit Ramesh

Land Use Land Cover (LULC) mapping is essential for urban and resource planning, and is one of the key elements in developing smart and sustainable cities. This study evaluates advanced LULC mapping techniques, focusing on Look-Up Table (LUT)-based Atmospheric Correction applied to Cartosat Multispectral (MX) sensor images, followed by supervised and semi-supervised learning models for LULC prediction. We explore DeeplabV3+ and Cross-Pseudo Supervision (CPS). The CPS model is further refined with dynamic weighting, enhancing pseudo-label reliability during training. This comprehensive approach analyses the accuracy and utility of LULC mapping techniques for various urban planning applications. A case study of Hyderabad, India, illustrates significant land use changes due to rapid urbanization. By analyzing Cartosat MX images over time, we highlight shifts such as urban sprawl, shrinking green spaces, and expanding industrial areas. This demonstrates the practical utility of these techniques for urban planners and policymakers.


Article 1212

Title@2025-07-24 (4): Learning from Hard Labels with Additional Supervision on Non-Hard-Labeled Classes

Title: Learning from Hard Labels with Additional Supervision on Non-Hard-Labeled Classes

Lernen von Hardlabels mit zusätzlicher Überwachung auf nicht-Hard-Label-Klassen

从硬标签中学习,并对非硬标签类别提供附加监督 2507.18098v1

Authors (2): Kosuke Sugiyama, Masato Uchida

In scenarios where training data is limited due to observation costs or data scarcity, enriching the label information associated with each instance becomes crucial for building high-accuracy classification models. In such contexts, it is often feasible to obtain not only hard labels but also additional supervision, such as the confidences for the hard labels. This setting naturally raises fundamental questions: What kinds of additional supervision are intrinsically beneficial? And how do they contribute to improved generalization performance? To address these questions, we propose a theoretical framework that treats both hard labels and additional supervision as probability distributions, and constructs soft labels through their affine combination. Our theoretical analysis reveals that the essential component of additional supervision is not the confidence score of the assigned hard label, but rather the information of the distribution over the non-hard-labeled classes. Moreover, we demonstrate that the additional supervision and the mixing coefficient contribute to the refinement of soft labels in complementary roles. Intuitively, in the probability simplex, the additional supervision determines the direction in which the deterministic distribution representing the hard label should be adjusted toward the true label distribution, while the mixing coefficient controls the step size along that direction. Through generalization error analysis, we theoretically characterize how the additional supervision and its mixing coefficient affect both the convergence rate and asymptotic value of the error bound. Finally, we experimentally demonstrate that, based on our theory, designing additional supervision can lead to improved classification accuracy, even when utilized in a simple manner.
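The soft-label construction described above can be sketched as a mix of a one-hot hard label with an additional-supervision distribution. This is a minimal illustration, assuming the combination takes the form `(1 - lam) * one_hot + lam * extra`; the name `soft_label`, the symbol `lam` for the mixing coefficient, and the example numbers are hypothetical.

```python
def soft_label(hard_class, extra, lam):
    """Affine combination of a one-hot hard label and additional supervision.

    hard_class: index of the hard-labeled class
    extra: additional supervision as a probability distribution over classes
    lam: mixing coefficient (step size away from the one-hot vertex)
    """
    k = len(extra)
    if not 0 <= hard_class < k:
        raise ValueError("hard_class out of range")
    if abs(sum(extra) - 1.0) > 1e-9:
        raise ValueError("extra must sum to 1")
    one_hot = [1.0 if j == hard_class else 0.0 for j in range(k)]
    return [(1 - lam) * h + lam * s for h, s in zip(one_hot, extra)]


# Hypothetical 3-class example: the hard label is class 0, and the additional
# supervision places all of its mass on the non-hard-labeled classes 1 and 2.
label = soft_label(hard_class=0, extra=[0.0, 0.75, 0.25], lam=0.2)
print(label)
```

Here `extra` fixes the direction of the adjustment inside the probability simplex and `lam` fixes how far to move along it, matching the complementary roles described in the abstract.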


Article 1213

Title@2025-07-24 (4): Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

Title: Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

Lang-Short-Distanz Graph Neural Networks und verbessertes Curriculum-Lernen für Emotionserkennung im Gespräch

长短距离远距神经神经网络和改进课程学习,以在对话中认识情感 2507.15205v2

Authors (3): Xinran Li, Xiujuan Xu, Jiaqi Qiao

Emotion Recognition in Conversation (ERC) is a practical and challenging task. This paper proposes a novel multimodal approach, the Long-Short Distance Graph Neural Network (LSDGNN). Based on the Directed Acyclic Graph (DAG), it constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances, respectively. To ensure that long- and short-distance features are as distinct as possible in representation while enabling mutual influence between the two modules, we employ a Differential Regularizer and incorporate a BiAffine Module to facilitate feature interaction. In addition, we propose an Improved Curriculum Learning (ICL) to address the challenge of data imbalance. By computing the similarity between different emotions to emphasize the shifts in similar emotions, we design a “weighted emotional shift” metric and develop a difficulty measurer, enabling a training process that prioritizes learning easy samples before harder ones. Experimental results on the IEMOCAP and MELD datasets demonstrate that our model outperforms existing benchmarks.


Article 1214

Title@2025-07-24 (4): LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs

Title: LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs

LLM Web Dynamics: Aufspüren eines Modellkollapses in einem Netzwerk von LLMs

LLM 网络动态:追踪在LLM网络中的模型崩溃情况 2506.15690v3

Authors (4): Tianyu Wang, Akira Horiguchi, Lingyou Pang, Carey E. Priebe

The increasing use of synthetic data from the public Internet has enhanced data usage efficiency in large language model (LLM) training. However, the potential threat of model collapse remains insufficiently explored. Existing studies primarily examine model collapse in a single model setting or rely solely on statistical surrogates. In this work, we introduce LLM Web Dynamics (LWD), an efficient framework for investigating model collapse at the network level. By simulating the Internet with a retrieval-augmented generation (RAG) database, we analyze the convergence pattern of model outputs. Furthermore, we provide theoretical guarantees for this convergence by drawing an analogy to interacting Gaussian Mixture Models.


Article 1215

Title@2025-07-24 (4): A Principled Approach for Data Bias Mitigation

Title: A Principled Approach for Data Bias Mitigation

Ein prinzipieller Ansatz für Daten-Bias-Minderung

减轻数据偏见的原则办法 2405.12312v4

Authors (4): Bruno Scarone, Alfredo Viola, Renée J. Miller, Ricardo Baeza-Yates

The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. Bias in the data can adversely affect this decision-making. We present a new mitigation strategy to address data bias. Our methods are explainable and come with mathematical guarantees of correctness. They can take advantage of new work on table discovery to find new tuples that can be added to a dataset to create real datasets that are unbiased or less biased. Our framework covers data with non-binary labels and with multiple sensitive attributes. Hence, we are able to measure and mitigate bias that does not appear over a single attribute (or feature), but only intersectionally, when considering a combination of attributes. We evaluate our techniques on publicly available datasets and provide a theoretical analysis of our results, highlighting novel insights into data bias.


Article 1216

Title@2025-07-24 (4): Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections

Title: Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections

Compliant Residual DAgger: Verbesserung der Real-World Kontakt-Rich-Manipulation mit menschlichen Korrekturen

Compliant Residual DAgger:利用人类纠正改进现实世界中接触密集的操作 2506.16685v2

Authors (4): Xiaomeng Xu, Yifan Hou, Zeyi Liu, Shuran Song

We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io/


Article 1217

Title@2025-07-24 (4): Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Title: Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Feinangepasste Sprachmodelle erzeugen stabile anorganische Materialien als Text

微调语言模型以文本形式生成稳定的无机材料 2402.04379v2

Authors (6): Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ulissi

We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting’s inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models’ ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.


Article 1218

Title@2025-07-24 (4): Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning

Title: Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning

Komprimierte und verteilte am wenigsten quadratische Regression: Konvergenzraten mit Anwendungen für Federated Learning

压缩和分布的最小平方回归:与应用到联邦学习的趋同率 2308.01358v2

Authors (2): Constantin Philippenko, Aymeric Dieuleveut

In this paper, we investigate the impact of compression on stochastic gradient algorithms for machine learning, a technique widely used in distributed and federated learning. We underline differences in terms of convergence rates between several unbiased compression operators that all satisfy the same condition on their variance, thus going beyond the classical worst-case analysis. To do so, we focus on the case of least-squares regression (LSR) and analyze a general stochastic approximation algorithm for minimizing quadratic functions relying on a random field. We consider weak assumptions on the random field, tailored to the analysis (specifically, expected Hölder regularity), and on the noise covariance, enabling the analysis of various randomizing mechanisms, including compression. We then extend our results to the case of federated learning. More formally, we highlight the impact on the convergence of the covariance $\mathfrak{C}_{\mathrm{ania}}$ of the additive noise induced by the algorithm. We demonstrate that, despite the non-regularity of the stochastic field, the limit variance term scales with $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$ (where $H$ is the Hessian of the optimization problem and $K$ the number of iterations), generalizing the rate for the vanilla LSR case where it is $\sigma^2 \mathrm{Tr}(H H^{-1}) / K = \sigma^2 d / K$ (Bach and Moulines, 2013). Then, we analyze the dependency of $\mathfrak{C}_{\mathrm{ania}}$ on the compression strategy and ultimately its impact on convergence, first in the centralized case, then in two heterogeneous FL frameworks.


Article 1219

Title@2025-07-24 (4): History-Guided Video Diffusion

Title: History-Guided Video Diffusion

Geschichte-geführte Video-Diffusion

历史引导视频传播 2502.06764v2

Authors (6): Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, Vincent Sitzmann

Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency, further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos. Project website: https://boyuan.space/history-guidance
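For readers unfamiliar with classifier-free guidance, the standard combination rule is sketched below. It assumes two model predictions for the same noisy input, one without the condition and one with it; treating the history frames as that condition is only a reading of "vanilla history guidance" from the abstract, not a confirmed description of DFoT. The function name and toy vectors are hypothetical.

```python
def guided_prediction(pred_uncond, pred_cond, weight):
    """Standard CFG rule: move from the unconditional prediction toward the
    conditional one, extrapolating past it when weight > 1."""
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same length")
    return [u + weight * (c - u) for u, c in zip(pred_uncond, pred_cond)]


# Toy 2-dimensional predictions (hypothetical values).
without_history = [0.0, 1.0]
with_history = [1.0, 1.0]

print(guided_prediction(without_history, with_history, 0.0))  # unconditional
print(guided_prediction(without_history, with_history, 1.0))  # conditional
print(guided_prediction(without_history, with_history, 2.0))  # extrapolated
```

A weight of 0 recovers the unconditional prediction and a weight of 1 the conditional one; larger weights push the sample further in the direction indicated by the condition.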


Article 1220

Title@2025-07-24 (4): Squeeze10-LLM: Squeezing LLMs’ Weights by 10 Times via a Staged Mixed-Precision Quantization Method

Title: Squeeze10-LLM: Squeezing LLMs’ Weights by 10 Times via a Staged Mixed-Precision Quantization Method

Squeeze10-LLM: Gewichte der LLMs um 10 Mal durch eine stufenweise gemischte Präzisionsquantifizierung

Squeeze10-LLM:通过分阶段混合精度量化方法将LLM权重压缩10倍 2507.18073v1

Authors (12): Qingcheng Zhu, Yangyang Ren, Linlin Yang, Mingbao Lin, Yanjing Li, Sheng Xu, Zichao Feng, Haodong Zhu, Yuguang Yang, Juan Zhang, Runqi Wang, Baochang Zhang

Deploying large language models (LLMs) is challenging due to their massive parameters and high computational costs. Ultra low-bit quantization can significantly reduce storage and accelerate inference, but extreme compression (i.e., mean bit-width <= 2) often leads to severe performance degradation. To address this, we propose Squeeze10-LLM, effectively “squeezing” 16-bit LLMs’ weights by 10 times. Specifically, Squeeze10-LLM is a staged mixed-precision post-training quantization (PTQ) framework and achieves an average of 1.6 bits per weight by quantizing 80% of the weights to 1 bit and 20% to 4 bits. We introduce Squeeze10-LLM with two key innovations: Post-Binarization Activation Robustness (PBAR) and Full Information Activation Supervision (FIAS). PBAR is a refined weight significance metric that accounts for the impact of quantization on activations, improving accuracy in low-bit settings. FIAS is a strategy that preserves full activation information during quantization to mitigate cumulative error propagation across layers. Experiments on LLaMA and LLaMA2 show that Squeeze10-LLM achieves state-of-the-art performance for sub-2-bit weight-only quantization, improving average accuracy from 43% to 56% on six zero-shot classification tasks, a significant boost over existing PTQ methods. Our code will be released upon publication.
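The 1.6-bit average and the 10x figure follow directly from the stated 80%/20% split. The short check below reproduces that arithmetic with exact fractions; it counts weight bits only and ignores any storage overhead (scales, indices, metadata), which the abstract does not quantify. The function name is hypothetical.

```python
from fractions import Fraction


def mean_bits(allocation):
    """Average bit-width for a list of (fraction_of_weights, bits) pairs."""
    total = sum(frac for frac, _ in allocation)
    if total != 1:
        raise ValueError("fractions must sum to 1")
    return sum(frac * bits for frac, bits in allocation)


# 80% of weights at 1 bit, 20% at 4 bits, as stated in the abstract.
avg_bits = mean_bits([(Fraction(4, 5), 1), (Fraction(1, 5), 4)])
compression = Fraction(16) / avg_bits  # relative to 16-bit weights

print(float(avg_bits), float(compression))
```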


Article 1221

Title@2025-07-24 (4): C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams

Title: C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams

C-AAE: Komprimierend anonymisierende Autoencoder für Datenschutz-Erhaltung Aktivitätserkennung in Healthcare Sensor Streams

C-AAE: 压缩匿名自动编码器,以便在保健感应器流中确认隐私保护活动 2507.18072v1

Authors (3): Ryusei Fujimoto, Yugo Nakamura, Yutaka Arakawa

Wearable accelerometers and gyroscopes encode fine-grained behavioural signatures that can be exploited to re-identify users, making privacy protection essential for healthcare applications. We introduce C-AAE, a compressive anonymizing autoencoder that marries an Anonymizing AutoEncoder (AAE) with Adaptive Differential Pulse-Code Modulation (ADPCM). The AAE first projects raw sensor windows into a latent space that retains activity-relevant features while suppressing identity cues. ADPCM then differentially encodes this latent stream, further masking residual identity information and shrinking the bitrate. Experiments on the MotionSense and PAMAP2 datasets show that C-AAE cuts user re-identification F1 scores by 10-15 percentage points relative to AAE alone, while keeping activity-recognition F1 within 5 percentage points of the unprotected baseline. ADPCM also reduces data volume by roughly 75%, easing transmission and storage overheads. These results demonstrate that C-AAE offers a practical route to balancing privacy and utility in continuous, sensor-based activity recognition for healthcare.
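The idea of differentially encoding a latent stream can be sketched with plain DPCM. Note the simplification: the step size below is fixed, whereas ADPCM adapts it over time, and the codes are not clipped to a fixed bit-width. The latent values, the step size, and the function names are hypothetical, and the sketch handles a single latent dimension.

```python
def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the decoder's running
    reconstruction (closed-loop DPCM with a fixed step size)."""
    codes = []
    prediction = 0.0
    for x in samples:
        code = round((x - prediction) / step)
        codes.append(code)
        prediction += code * step  # mirror exactly what the decoder will do
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the quantized differences."""
    out = []
    value = 0.0
    for code in codes:
        value += code * step
        out.append(value)
    return out


# Hypothetical values of one latent dimension over five time steps.
latent = [0.0, 0.12, 0.31, 0.29, 0.05]
step = 0.05
codes = dpcm_encode(latent, step)
recon = dpcm_decode(codes, step)
print(codes)
print(recon)
```

Because the encoder tracks the decoder's reconstruction, quantization errors do not accumulate: each reconstructed value stays within half a step of the original.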


Article 1222

Title@2025-07-24 (4): BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference

Title: BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference

BlockDialekt: Blockweise feinkörnige Mischformat-Quantisierung für energieeffiziente LLM-Inferenz

BlockDialect: 节能LLM 推论的粗件精细混合格式量化 2501.01144v5

Authors (2): Wonsuk Jang, Thierry Tambe

The rapidly increasing size of large language models (LLMs) presents significant challenges in memory usage and computational costs. Quantizing both weights and activations can address these issues, with hardware-supported fine-grained scaling emerging as a promising solution to mitigate outliers. However, existing methods struggle to capture nuanced block data distributions. We propose BlockDialect, a block-wise fine-grained mixed format technique that assigns a per-block optimal number format from a formatbook for better data representation. Additionally, we introduce DialectFP4, a formatbook of FP4 variants (akin to dialects) that adapt to diverse data distributions. To leverage this efficiently, we propose a two-stage approach for online DialectFP4 activation quantization. Importantly, DialectFP4 ensures energy efficiency by selecting representable values as scaled integers compatible with low-precision integer arithmetic. BlockDialect achieves 10.78% (7.48%) accuracy gain on the LLaMA3-8B (LLaMA2-7B) model compared to MXFP4 format with lower bit usage per data, while being only 5.45% (2.69%) below full precision even when quantizing full-path matrix multiplication. Focusing on how to represent over how to scale, our work presents a promising path for energy-efficient LLM inference.


Article 1223

Title@2025-07-24 (4): Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents

Title: Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents

Multiscale Neural PDE Surrogats für Vorhersage und Downscaling: Anwendung auf Meeresströmungen

预测和缩小预测和缩小尺度的多尺度多神经PDE代号:对洋流的应用 2507.18067v1

Authors (4): Abdessamad El-Kabid, Loubna Benabbou, Redouane Lguensat, Alex Hernández-García

Accurate modeling of physical systems governed by partial differential equations is a central challenge in scientific computing. In oceanography, high-resolution current data are critical for coastal management, environmental monitoring, and maritime safety. However, available satellite products, such as Copernicus data for sea water velocity at ~0.08 degrees spatial resolution and global ocean models, often lack the spatial granularity required for detailed local analyses. In this work, we (a) introduce a supervised deep learning framework based on neural operators for solving PDEs and providing arbitrary resolution solutions, and (b) propose downscaling models with an application to Copernicus ocean current data. Additionally, our method can model surrogate PDEs and predict solutions at arbitrary resolution, regardless of the input resolution. We evaluated our model on real-world Copernicus ocean current data and synthetic Navier-Stokes simulation datasets.


Article 1224

Title@2025-07-24 (4): Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature

Title: Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature

Fixierung der Pitfalls der probabilistischen Zeitreihen-Prognosebewertung durch Kernel-Quadratur

由内核二次曲线确定概率时间- 系列预测评价的空隙 2503.06079v2

Authors (3): Masaki Adachi, Masahiro Fujisawa, Michael A Osborne

Despite the significance of probabilistic time-series forecasting models, their evaluation metrics often involve intractable integrations. The most widely used metric, the continuous ranked probability score (CRPS), is a strictly proper scoring function; however, its computation requires approximation. We found that popular CRPS estimators (specifically, the quantile-based estimator implemented in the widely used GluonTS library and the probability-weighted moment approximation) both exhibit inherent estimation biases. These biases lead to crude approximations, resulting in improper rankings of forecasting model performance when CRPS values are close. To address this issue, we introduced a kernel quadrature approach that leverages an unbiased CRPS estimator and employs cubature construction for scalable computation. Empirically, our approach consistently outperforms the two widely used CRPS estimators.
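As a point of reference, one standard unbiased sample-based estimator follows from the identity CRPS(F, y) = E|X - y| - (1/2) E|X - X'|, with X and X' independent draws from the forecast F. The sketch below implements that estimator; whether it coincides with the unbiased estimator used in the paper is not stated in the abstract, and the kernel quadrature and cubature steps are not reproduced here. The function name is hypothetical.

```python
def crps_unbiased(samples, y):
    """Unbiased sample estimate of CRPS(F, y) from draws of the forecast F.

    Uses mean|x_i - y| - sum_{i != j} |x_i - x_j| / (2 m (m - 1)).
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two forecast samples")
    term1 = sum(abs(x - y) for x in samples) / m
    # After sorting, sum_{i<j} (x_j - x_i) = sum_k (2k - m + 1) * x_k,
    # which avoids the O(m^2) double loop.
    xs = sorted(samples)
    pair_sum = sum((2 * k - m + 1) * x for k, x in enumerate(xs))
    term2 = pair_sum / (m * (m - 1))
    return term1 - term2


# A degenerate forecast at 1.0 scored against an observation of 3.0.
print(crps_unbiased([1.0, 1.0, 1.0], 3.0))
```

Dividing the pairwise term by `m * (m - 1)` rather than `m * m` is what removes the bias caused by pairing a sample with itself.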


Article 1225

Title@2025-07-24 (4): Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias

Title: Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias

Causally Testing Gender Bias in LLMs: Eine Fallstudie über berufsbezogene Bias

《LLMs中因果测试性别偏见:职业偏见案例研究》 2212.10678v4

Authors (5): Yuen Chen, Vethavikashini Chithrra Raghuram, Justus Mattern, Rada Mihalcea, Zhijing Jin

Generated texts from large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics. These findings motivate research efforts aiming to understand and measure such effects. This paper introduces a causal formulation for bias measurement in generative language models. Based on this theoretical foundation, we outline a list of desiderata for designing robust bias benchmarks. We then propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias. We test several state-of-the-art open-source LLMs on OccuGender, including Llama, Mistral, and their instruction-tuned versions. The results show that these models exhibit substantial occupational gender bias. Lastly, we discuss prompting strategies for bias mitigation and an extension of our causal formulation to illustrate the generalizability of our framework. Our code and data are available at https://github.com/chenyuen0103/gender-bias.


Article 1226

Title@2025-07-24 (4): A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models

Title: A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models

Ein Multi-Faceted-Evaluierungsrahmen für die Bewertung synthetischer Daten, erzeugt durch große Sprachmodelle

评估由大语言模型生成的合成数据多面评价框架 2404.14445v2

Authors (3): Yefeng Yuan, Yuhong Liu, Liang Cheng

The rapid advancements in generative AI and large language models (LLMs) have opened up new avenues for producing synthetic data, particularly in the realm of structured tabular formats, such as product reviews. Despite the potential benefits, concerns regarding privacy leakage have surfaced, especially when personal information is utilized in the training datasets. In addition, there is an absence of a comprehensive evaluation framework capable of quantitatively measuring the quality of the generated synthetic data and their utility for downstream tasks. In response to this gap, we introduce SynEval, an open-source evaluation framework designed to assess the fidelity, utility, and privacy preservation of synthetically generated tabular data via a suite of diverse evaluation metrics. We validate the efficacy of our proposed framework, SynEval, by applying it to synthetic product review data generated by three state-of-the-art LLMs: ChatGPT, Claude, and Llama. Our experimental findings illuminate the trade-offs between various evaluation metrics in the context of synthetic data generation. Furthermore, SynEval stands as a critical instrument for researchers and practitioners engaged with synthetic tabular data, empowering them to judiciously determine the suitability of the generated data for their specific applications, with an emphasis on upholding user privacy.

Article 1227

Title@2025-07-24 (4): Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs

Title: Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs

Privacy-Preserving Synthetic Review Generation mit unterschiedlichen Schreibstilen mit LLMs

使用LLMs以多种写作风格生成的隐私-保护合成审查 2507.18055v1

Authors (6): Tevin Atwal, Chan Nam Tieu, Yefeng Yuan, Zhan Shi, Yuhong Liu, Liang Cheng

The increasing use of synthetic data generated by Large Language Models (LLMs) presents both opportunities and challenges in data-driven applications. While synthetic data provides a cost-effective, scalable alternative to real-world data to facilitate model training, its diversity and privacy risks remain underexplored. Focusing on text-based synthetic data, we propose a comprehensive set of metrics to quantitatively assess the diversity (i.e., linguistic expression, sentiment, and user perspective), and privacy (i.e., re-identification risk and stylistic outliers) of synthetic datasets generated by several state-of-the-art LLMs. Experiment results reveal significant limitations in LLMs’ capabilities in generating diverse and privacy-preserving synthetic data. Guided by the evaluation results, a prompt-based approach is proposed to enhance the diversity of synthetic reviews while preserving reviewer privacy.

Article 1228

Title@2025-07-24 (4): Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems

Title: Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems

Unisoma: Ein Unified Transformer-basierter Solver für Multi-Solid-Systeme

Unisoma:多层系统统一变压器解决方案 2506.06021v2

Authors (5): Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, Yunhuai Liu

Multi-solid systems are foundational to a wide range of real-world applications, yet modeling their complex interactions remains challenging. Existing deep learning methods predominantly rely on implicit modeling, where the factors influencing solid deformation are not explicitly represented but are instead indirectly learned. However, as the number of solids increases, these methods struggle to accurately capture intricate physical interactions. In this paper, we introduce a novel explicit modeling paradigm that incorporates factors influencing solid deformation through structured modules. Specifically, we present Unisoma, a unified and flexible Transformer-based model capable of handling variable numbers of solids. Unisoma directly captures physical interactions using contact modules and an adaptive interaction allocation mechanism, and learns the deformation through a triplet relationship. Compared to implicit modeling techniques, explicit modeling is better suited for multi-solid systems with diverse coupling patterns, as it enables detailed treatment of each solid while preventing information blending and confusion. Experimentally, Unisoma achieves consistent state-of-the-art performance across seven well-established datasets and two complex multi-solid tasks. Code is available at https://github.com/therontau0054/Unisoma.

Article 1229

Title@2025-07-24 (4): ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

Title: ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

ViGText: Deepfake-Bilderkennung mit Vision-Language-Modellerklärungen und Graph-Neural-Netzwerken

ViGText: 用视觉语言模型解释和图形神经网络进行深假图像探测 2507.18031v1

Authors (5): Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan

The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data, as it provides a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify deepfakes. Through the use of multi-level feature extraction across spatial and frequency domains, ViGText captures details that enhance its robustness and accuracy in detecting sophisticated deepfakes. Extensive experiments demonstrate that ViGText significantly enhances generalization and achieves a notable performance boost when it detects user-customized deepfakes. Specifically, average F1 scores rise from 72.45% to 98.32% under generalization evaluation, reflecting the model’s superior ability to generalize to unseen, fine-tuned variations of stable diffusion models. As for robustness, ViGText achieves an increase of 11.1% in recall compared to other deepfake detection approaches. When facing targeted attacks that exploit its graph-based architecture, ViGText limits classification performance degradation to less than 4%. ViGText uses detailed visual and textual analysis to set a new standard for detecting deepfakes, helping ensure media authenticity and information integrity.

Article 1230

Title@2025-07-24 (4): AI Workflow, External Validation, and Development in Eye Disease Diagnosis

Title: AI Workflow, External Validation, and Development in Eye Disease Diagnosis

KI-Workflow, externe Validierung und Entwicklung in der Augenerkrankungen-Diagnose

AI 工作流程、外部验证和眼病诊断的发展 2409.15087v2

Authors (38): Qingyu Chen, Tiarnan D L Keenan, Elvira Agron, Alexis Allot, Emily Guan, Bryant Duong, Amr Elsawy, Benjamin Hou, Cancan Xue, Sanjeeb Bhandari, Geoffrey Broadhead, Chantal Cousineau-Krieger, Ellen Davis, William G Gensheimer, David Grasic, Seema Gupta, Luis Haddock, Eleni Konstantinou, Tania Lamba, Michele Maiberger, Dimosthenis Mantopoulos, Mitul C Mehta, Ayman G Nahri, Mutaz AL-Nawaflh, Arnold Oshinsky, Brittany E Powell, Boonkit Purt, Soo Shin, Hillary Stiefel, Alisa T Thavikulwat, Keith James Wroblewski, Tham Yih Chung, Chui Ming Gemmy Cheung, Ching-Yu Cheng, Emily Y Chew, Michelle R. Hribar, Michael F. Chiang, Zhiyong Lu

Timely disease diagnosis is challenging due to increasing disease burdens and limited clinician availability. AI shows promise in diagnosis accuracy but faces real-world application issues due to insufficient validation in clinical workflows and diverse populations. This study addresses gaps in medical AI downstream accountability through a case study on age-related macular degeneration (AMD) diagnosis and severity classification. We designed and implemented an AI-assisted diagnostic workflow for AMD, comparing diagnostic performance with and without AI assistance among 24 clinicians from 12 institutions with real patient data sampled from the Age-Related Eye Disease Study (AREDS). Additionally, we demonstrated continual enhancement of an existing AI model by incorporating approximately 40,000 additional medical images (named AREDS2 dataset). The improved model was then systematically evaluated using both AREDS and AREDS2 test sets, as well as an external test set from Singapore. AI assistance markedly enhanced diagnostic accuracy and classification for 23 out of 24 clinicians, with the average F1-score increasing by 20% from 37.71 (Manual) to 45.52 (Manual + AI) (P-value < 0.0001), achieving an improvement of over 50% in some cases. In terms of efficiency, AI assistance reduced diagnostic times for 17 out of the 19 clinicians tracked, with time savings of up to 40%. Furthermore, a model equipped with continual learning showed robust performance across three independent datasets, recording a 29% increase in accuracy, and elevating the F1-score from 42 to 54 in the Singapore population.
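
The "20%" figure quoted above is a relative gain, not a gain in percentage points. A minimal arithmetic check, using only the two average F1-scores stated in the abstract (the variable names are illustrative):

```python
# Relative F1 improvement implied by the figures quoted in the abstract.
manual_f1 = 37.71     # average F1-score, clinicians alone
assisted_f1 = 45.52   # average F1-score, clinicians with AI assistance

absolute_gain = assisted_f1 - manual_f1        # gain in F1 points
relative_gain = absolute_gain / manual_f1      # gain relative to the manual baseline

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1%}")
```

The absolute gain is about 7.8 F1 points, which corresponds to a relative gain of roughly 20.7% over the manual baseline.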

Article 1231

Title@2025-07-24 (4): Does visualization help AI understand data?

Title: Does visualization help AI understand data?

Hilft die Visualisierung KI, Daten zu verstehen?

可视化能帮助AI理解数据吗? 2507.18022v1

Authors (3): Victoria R. Li, Johnathan Sun, Martin Wattenberg

Charts and graphs help people analyze data, but can they also be useful to AI systems? To investigate this question, we perform a series of experiments with two commercial vision-language models: GPT 4.1 and Claude 3.5. Across three representative analysis tasks, the two systems describe synthetic datasets more precisely and accurately when raw data is accompanied by a scatterplot, especially as datasets grow in complexity. Comparison with two baselines – providing a blank chart and a chart with mismatched data – shows that the improved performance is due to the content of the charts. Our results are initial evidence that AI systems, like humans, can benefit from visualization.

Article 1232

Title@2025-07-24 (4): Zeroth-order log-concave sampling

Title: Zeroth-order log-concave sampling

logkonkav-Probenahme der Nullten Ordnung

零级对数集中取样 2507.18021v1

Authors (1): Yunbum Kook

We study the zeroth-order query complexity of log-concave sampling, specifically uniform sampling from convex bodies using membership oracles. We propose a simple variant of the proximal sampler that achieves the query complexity with matched Rényi orders between the initial warmness and output guarantee. Specifically, for any $\varepsilon>0$ and $q\geq2$, the sampler, initialized at $\pi_{0}$, outputs a sample whose law is $\varepsilon$-close in $q$-Rényi divergence to $\pi$, the uniform distribution over a convex body in $\mathbb{R}^{d}$, using $\widetilde{O}(qM_{q}^{q/(q-1)}d^{2}\,\lVert\operatorname{cov}\pi\rVert\log\frac{1}{\varepsilon})$ membership queries, where $M_{q}=\lVert\text{d}\pi_{0}/\text{d}\pi\rVert_{L^{q}(\pi)}$. We further introduce a simple annealing scheme that produces a warm start in $q$-Rényi divergence (i.e., $M_{q}=O(1)$) using $\widetilde{O}(qd^{2}R^{3/2}\,\lVert\operatorname{cov}\pi\rVert^{1/4})$ queries, where $R^{2}=\mathbb{E}_{\pi}[\lvert\cdot\rvert^{2}]$. This interpolates between known complexities for warm-start generation in total variation and Rényi-infinity divergence. To relay a Rényi warmness across the annealing scheme, we establish hypercontractivity under simultaneous heat flow and translate it into an improved mixing guarantee for the proximal sampler under a logarithmic Sobolev inequality. These results extend naturally to general log-concave distributions accessible via evaluation oracles, incurring additional quadratic queries.
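
To make the setting concrete, the sketch below shows a textbook-style proximal sampler for the uniform distribution on a convex body that touches the body only through a membership oracle. It is not the variant proposed in the paper: the unit-ball body, the step size `h`, the retry cap `max_tries`, and the fallback of keeping the current point when the cap is hit are all assumptions made for illustration.

```python
import math
import random

def in_unit_ball(x):
    """Membership oracle for the convex body K (assumed here: the Euclidean unit ball)."""
    return sum(v * v for v in x) <= 1.0

def proximal_step(x, h, oracle, rng, max_tries=10_000):
    """One forward/backward step targeting the uniform distribution on K."""
    s = math.sqrt(h)
    # Forward step: y ~ N(x, h I).
    y = [v + s * rng.gauss(0.0, 1.0) for v in x]
    # Backward step: x' ~ N(y, h I) restricted to K, drawn by rejection sampling.
    for _ in range(max_tries):
        cand = [v + s * rng.gauss(0.0, 1.0) for v in y]
        if oracle(cand):
            return cand
    # Simplification for this sketch: keep the current point if nothing was accepted.
    return x

def sample_chain(d=2, h=0.05, n_steps=2000, seed=0):
    """Run the chain from the origin and return every visited point."""
    rng = random.Random(seed)
    x = [0.0] * d
    chain = []
    for _ in range(n_steps):
        x = proximal_step(x, h, in_unit_ball, rng)
        chain.append(x)
    return chain

chain = sample_chain()
mean_sq_radius = sum(sum(v * v for v in x) for x in chain) / len(chain)
print(f"mean squared radius: {mean_sq_radius:.3f} (uniform disc: 0.5)")
```

Each oracle call in the rejection loop counts as one membership query, which is the resource the complexity bounds above are stated in.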

Article 1233

Title@2025-07-24 (4): On Leveraging Unlabeled Data for Concurrent Positive-Unlabeled Classification and Robust Generation

Title: On Leveraging Unlabeled Data for Concurrent Positive-Unlabeled Classification and Robust Generation

Über die Nutzung nicht markierter Daten für die gleichzeitige positive und nicht markierte Klassifizierung und robuste Generierung

利用未贴标签数据进行同时正-未贴标签分类和强力生成 2006.07841v3

Authors (5): Bing Yu, Ke Sun, He Wang, Zhouchen Lin, Zhanxing Zhu

The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems. While abundant unlabeled data typically exist and provide a potential solution, it is highly challenging to exploit them. In this paper, we address this problem by simultaneously leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data. We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data, especially out-of-distribution unlabeled data, by exploring the interplay between them: 1) enhancing the performance of PU classifiers with the assistance of a novel Classifier-Noise-Invariant Conditional GAN (CNI-CGAN) that is robust to noisy labels, 2) leveraging extra data with predicted labels from a PU classifier to help the generation. Theoretically, we prove the optimal condition of CNI-CGAN; experimentally, we conduct extensive evaluations on diverse datasets.

Article 1234

Title@2025-07-24 (4): Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models

Title: Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models

vorausschauende Skalierungsgesetze für eine effiziente GRPO-Schulung großer vernünftiger Modelle

GRPO 高效培训大理由模型的预测增强法律 2507.18014v1

Authors (5): Datta Nimmaturi, Vaishnavi Bhargava, Rajat Ghosh, Johnu George, Debojyoti Dutta

Fine-tuning large language models (LLMs) for reasoning tasks using reinforcement learning methods like Group Relative Policy Optimization (GRPO) is computationally expensive. To address this, we propose a predictive framework that models training dynamics and helps optimize resource usage. Through experiments on Llama and Qwen models (3B to 8B), we derive an empirical scaling law based on model size, initial performance, and training progress. This law predicts reward trajectories and identifies three consistent training phases: slow start, rapid improvement, and plateau. We find that training beyond a certain number of epochs offers little gain, suggesting that earlier stopping can significantly reduce compute without sacrificing performance. Our approach generalizes across model types, providing a practical guide for efficient GRPO-based fine-tuning.
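
The three phases described above (slow start, rapid improvement, plateau) match the shape of a sigmoid. The sketch below is a hypothetical illustration of how such a curve could inform early stopping; the functional form, every parameter value, and the `min_gain` threshold are assumptions and are not taken from the paper.

```python
import math

def reward_curve(t, r0=0.2, r_max=0.8, k=8.0, t_mid=0.5):
    """Hypothetical sigmoid reward over normalized training progress t in [0, 1]."""
    return r0 + (r_max - r0) / (1.0 + math.exp(-k * (t - t_mid)))

def plateau_start(curve, n_points=101, min_gain=0.002, after=0.5):
    """First progress value past `after` where the per-step reward gain drops below min_gain."""
    ts = [i / (n_points - 1) for i in range(n_points)]
    for prev, cur in zip(ts, ts[1:]):
        if cur > after and curve(cur) - curve(prev) < min_gain:
            return cur
    return 1.0  # no plateau detected within the planned run

stop_at = plateau_start(reward_curve)
remaining_gain = reward_curve(1.0) - reward_curve(stop_at)
print(f"suggested stopping point: {stop_at:.2f} of the planned run")
print(f"reward still to be gained after that point: {remaining_gain:.4f}")
```

With a fitted curve in place of the hypothetical one, the same check would flag the point after which further training buys little additional reward.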

Article 1235

Title@2025-07-24 (4): Deep Reinforcement Learning for Real-Time Green Energy Integration in Data Centers

Title: Deep Reinforcement Learning for Real-Time Green Energy Integration in Data Centers

Deep Reinforcement Learning für die Integration grüner Energie in Rechenzentren in Echtzeit

数据中心实时绿色能源整合深入强化学习 2507.21153v1

Authors (2): Abderaouf Bahi, Amel Ourici

This paper explores the implementation of a Deep Reinforcement Learning (DRL)-optimized energy management system for e-commerce data centers, aimed at enhancing energy efficiency, cost-effectiveness, and environmental sustainability. The proposed system leverages DRL algorithms to dynamically manage the integration of renewable energy sources, energy storage, and grid power, adapting to fluctuating energy availability in real time. The study demonstrates that the DRL-optimized system achieves a 38% reduction in energy costs, significantly outperforming traditional Reinforcement Learning (RL) methods (28%) and heuristic approaches (22%). Additionally, it maintains a low SLA violation rate of 1.5%, compared to 3.0% for RL and 4.8% for heuristic methods. The DRL-optimized approach also results in an 82% improvement in energy efficiency, surpassing other methods, and a 45% reduction in carbon emissions, making it the most environmentally friendly solution. The system’s cumulative reward of 950 reflects its superior performance in balancing multiple objectives. Through rigorous testing and ablation studies, the paper validates the effectiveness of the DRL model’s architecture and parameters, offering a robust solution for energy management in data centers. The findings highlight the potential of DRL in advancing energy optimization strategies and addressing sustainability challenges.

Article 1236

Title@2025-07-24 (4): Active Learning For Repairable Hardware Systems With Partial Coverage

Title: Active Learning For Repairable Hardware Systems With Partial Coverage

Aktives Lernen für reparable Hardware-Systeme mit teilweiser Abdeckung

为部分覆盖的可修理硬件系统积极学习 2503.16315v3

Authors (4): Michael Potter, Beyza Kalkanlı, Deniz Erdoğmuş, Michael Everett

Identifying the optimal diagnostic test and hardware system instance to infer reliability characteristics using field data is challenging, especially when constrained by fixed budgets and minimal maintenance cycles. Active Learning (AL) has shown promise for parameter inference with limited data and budget constraints in machine learning/deep learning tasks. However, AL for reliability model parameter inference remains underexplored for repairable hardware systems. It requires specialized AL Acquisition Functions (AFs) that consider hardware aging and the fact that a hardware system consists of multiple sub-systems, which may undergo only partial testing during a given diagnostic test. To address these challenges, we propose a relaxed Mixed Integer Semidefinite Program (MISDP) AL AF that incorporates Diagnostic Coverage (DC), Fisher Information Matrices (FIMs), and diagnostic testing budgets. Furthermore, we design empirical-based simulation experiments focusing on two diagnostic testing scenarios: (1) partial tests of a hardware system with overlapping subsystem coverage, and (2) partial tests where one diagnostic test fully subsumes the subsystem coverage of another. We evaluate our proposed approach against the most widely used AL AF in the literature (entropy), as well as several intuitive AL AFs tailored for reliability model parameter inference. Our proposed AF ranked best on average among the alternative AFs across 6,000 experimental configurations, with respect to Area Under the Curve (AUC) of the Absolute Total Expected Event Error (ATEER) and Mean Squared Error (MSE) curves, with statistical significance calculated at a 0.05 alpha level using a Friedman hypothesis test.

Article 1237

Title@2025-07-24 (4): Deep Unfolding for MIMO Signal Detection

Title: Deep Unfolding for MIMO Signal Detection

Tiefenentfaltung für MIMO-Signalerkennung

MIMO信号探测的深度拆解 2507.21152v1

Authors (2): Hangli Ge, Noboru Koshizuka

In this paper, we propose a deep unfolding neural network-based MIMO detector that incorporates complex-valued computations using Wirtinger calculus. The method, referred to as Dynamic Partially Shrinkage Thresholding (DPST), enables efficient, interpretable, and low-complexity MIMO signal detection. Unlike prior approaches that rely on real-valued approximations, our method operates natively in the complex domain, aligning with the fundamental nature of signal processing tasks. The proposed algorithm requires only a small number of trainable parameters, allowing for simplified training. Numerical results demonstrate that the proposed method achieves superior detection performance with fewer iterations and lower computational complexity, making it a practical solution for next-generation massive MIMO systems.
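
As a rough illustration of what unfolding a shrinkage-thresholding iteration in the complex domain can look like, the sketch below runs a generic ISTA-style loop with one step size and one threshold per layer. It is not the DPST algorithm: the layer structure, the fixed (untrained) parameters, and the noise-free 2x2 channel are assumptions chosen to keep the example small.

```python
def soft_threshold(z, theta):
    """Complex soft-thresholding: shrink the magnitude by theta, keep the phase."""
    mag = abs(z)
    if mag <= theta:
        return 0j
    return z * (1.0 - theta / mag)

def matvec(H, x):
    """Compute H x for a matrix given as a list of rows."""
    return [sum(h * v for h, v in zip(row, x)) for row in H]

def hermitian_matvec(H, r):
    """Compute H^H r (conjugate transpose of H times r)."""
    return [sum(H[i][j].conjugate() * r[i] for i in range(len(H)))
            for j in range(len(H[0]))]

def unfolded_detector(H, y, steps, thetas):
    """One layer per (step, theta) pair; in deep unfolding these would be learned."""
    x = [0j] * len(H[0])
    for step, theta in zip(steps, thetas):
        residual = [yi - hi for yi, hi in zip(y, matvec(H, x))]
        # H^H (y - H x) is the descent direction for the squared residual
        # (its negative Wirtinger gradient with respect to conj(x), up to scaling).
        grad = hermitian_matvec(H, residual)
        x = [soft_threshold(xi + step * gi, theta) for xi, gi in zip(x, grad)]
    return x

H = [[1 + 0j, 0.2j], [-0.2j, 1 + 0j]]   # toy channel matrix (assumed)
x_true = [1 + 1j, -1 + 1j]              # QPSK-like transmitted symbols
y = matvec(H, x_true)                   # noise-free received signal
x_hat = unfolded_detector(H, y, steps=[0.8] * 20, thetas=[0.0] * 20)
print([f"{v:.4f}" for v in x_hat])
```

With the thresholds set to zero the layers reduce to plain gradient steps, which is enough to recover the symbols in this noise-free toy case; a trained detector would instead learn the per-layer values.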

Article 1238

Title@2025-07-24 (4): Analyzing Islamophobic Discourse Using Semi-Coded Terms and LLMs

Title: Analyzing Islamophobic Discourse Using Semi-Coded Terms and LLMs

Analyse des Islamophoben Diskurses mit semi-kodierten Ausdrücken und LLMs

使用半编码术语和LLMs分析仇视伊斯兰者的情况 2503.18273v2

Authors (5): Raza Ul Mustafa, Roi Dupart, Gabrielle Smith, Noman Ashraf, Nathalie Japkowicz

In recent years, Islamophobia has gained significant traction across Western societies, fueled by the rise of digital communication networks. This paper performs a large-scale analysis of specialized, semi-coded Islamophobic terms (such as muzrat, pislam, mudslime, mohammedan, and muzzies) circulated on extremist social platforms such as 4Chan, Gab, and Telegram. Many of these terms appear lexically neutral or ambiguous outside of specific contexts, making them difficult for both human moderators and automated systems to reliably identify as hate speech. First, we show that Large Language Models (LLMs) are able to understand these terms. Second, using the Google Perspective API, we find that Islamophobic posts tend to receive higher toxicity scores than other categories of hate speech such as Antisemitism. Finally, we use a BERT topic modeling approach to extract different topics and Islamophobic discourse on these social platforms. Our findings indicate that LLMs understand these Out-Of-Vocabulary (OOV) slurs; however, further improvements in moderation strategies and algorithmic detection are necessary to address such discourse effectively. Our topic modeling also indicates that Islamophobic text is found across various political, conspiratorial, and far-right movements and is particularly directed against Muslim immigrants. Taken together, we performed one of the first studies on Islamophobic semi-coded terms and shed light on Islamophobia at a global scale.

Article 1239

Title@2025-07-24 (4): Fine-Grained Uncertainty Quantification via Collisions

Title: Fine-Grained Uncertainty Quantification via Collisions

Feinkörnige Unsicherheit Quantifizierung über Kollisionen

通过碰撞进行精细的不确定性定量 2411.12127v4

Authors (3): Jesse Friedbaum, Sudarshan Adiga, Ravi Tandon

We propose a new and intuitive metric for aleatoric uncertainty quantification (UQ), the prevalence of class collisions, defined as the same input being observed in different classes. We use the rate of class collisions to define the collision matrix, a novel and uniquely fine-grained measure of uncertainty. For a classification problem involving $K$ classes, the $K\times K$ collision matrix $S$ measures the inherent difficulty in distinguishing between each pair of classes. We discuss several applications of the collision matrix, establish its fundamental mathematical properties, and show its relationship with existing UQ methods, including the Bayes error rate (BER). We also address the new problem of estimating the collision matrix using one-hot labeled data by proposing a series of innovative techniques to estimate $S$. First, we learn a pair-wise contrastive model which accepts two inputs and determines if they belong to the same class. We then show that this contrastive model (which is PAC learnable) can be used to estimate the Gramian matrix of $S$, defined as $G=S^TS$. Finally, we show that under reasonable assumptions, $G$ can be used to uniquely recover $S$, a new result on non-negative matrices which could be of independent interest. With a method to estimate $S$ established, we demonstrate how this estimate of $S$, in conjunction with the contrastive model, can be used to estimate the posterior class probability distribution of any point. Experimental results are also presented to validate our methods of estimating the collision matrix and class posterior distributions on several datasets.
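
A toy computation can make the collision matrix and its Gramian more tangible. The sketch below assumes one plausible reading of the definition, $S_{ij}=\mathbb{E}[P(Y=j\mid X)\mid Y=i]$, which the abstract does not spell out; the three-input, two-class joint distribution is invented for illustration. Only $G=S^TS$ is taken directly from the text.

```python
# joint[x][k] = P(X = x, Y = k) for a toy problem with 3 inputs and 2 classes.
joint = [
    [0.30, 0.00],   # input 0 is only ever observed in class 0
    [0.10, 0.10],   # input 1 is observed in both classes (a collision)
    [0.00, 0.50],   # input 2 is only ever observed in class 1
]
K = 2

def collision_matrix(joint, K):
    """S[i][j] = E[P(Y=j | X) | Y=i]; an assumed reading of the abstract's definition."""
    class_prior = [sum(row[k] for row in joint) for k in range(K)]
    S = [[0.0] * K for _ in range(K)]
    for row in joint:
        p_x = sum(row)
        if p_x == 0:
            continue  # inputs that never occur contribute nothing
        posterior = [row[k] / p_x for k in range(K)]
        for i in range(K):
            p_x_given_i = row[i] / class_prior[i]
            for j in range(K):
                S[i][j] += p_x_given_i * posterior[j]
    return S

def gramian(S):
    """G = S^T S, as defined in the abstract."""
    n = len(S)
    return [[sum(S[m][i] * S[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

S = collision_matrix(joint, K)
G = gramian(S)
print("S =", [[round(v, 4) for v in row] for row in S])
print("G =", [[round(v, 4) for v in row] for row in G])
```

Under this reading each row of $S$ sums to one, and the off-diagonal entries grow with the share of a class's inputs that also appear in the other class.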
