• 00 07-10 (4) Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models Auswirkungen von Pretraining Word Co-occurrence auf die kompositorische Generalisierung in multimodalen Modellen 预言前世界共同会议对多式联运模式中整体构成的影响 2507.08000v1
  • 01 07-10 Single-pass Adaptive Image Tokenization for Minimum Program Search Single-Pass Adaptive Image Tokenization für minimale Programmsuche 用于最低程序搜索的单一被动图像适配 2507.07995v1
  • 02 07-10 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Überspringen Sie eine Ebene oder Schleifen Sie es? Test-Zeit Tiefe Anpassung von vorgebildeten LLMs 跳过图层或循环它? 预设 LLM 的测试时间深度适应 2507.07996v1
  • 03 07-10 EXPO: Stable Reinforcement Learning with Expressive Policies EXPO: Stabiles Stärkungslernen mit ausdrucksstarker Politik 出口促进: 采用表达式政策进行稳定的加强学习 2507.07986v1
  • 04 07-10 KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors KinDEL: DNA-kodierter Bibliotheks-Datensatz für Kinase-Inhibitoren KinDEL: Kinase Inhibitor 的DNA编码图书馆数据集 2410.08938v2
  • 05 07-10 Why is Your Language Model a Poor Implicit Reward Model? Warum ist Ihr Sprachmodell ein schlechtes Implizit-Reward-Modell? 为什么您的语言模式 是一个贫穷的隐含奖赏模式? 2507.07981v1
  • 06 07-10 Watermarking Degrades Alignment in Language Models: Analysis and Mitigation Wasserzeichen degradiert Ausrichtung in Sprachmodellen: Analyse und Milderung 语言模型的分级调整:分析和减轻影响 2506.04462v2
  • 07 07-10 Reinforcement Learning with Action Chunking Verstärktes Lernen mit Action Chunking 强化学习与行动决赛 2507.07969v1
  • 08 07-10 Prospective Learning in Retrospect Zukunftsorientiertes Lernen im Nachhinein 回溯中的未来学习 2507.07965v1
  • 09 07-10 Dynamic Chunking for End-to-End Hierarchical Sequence Modeling Dynamisches Chunking für die end-to-end-Hierarchische Sequenzmodellierung 端端到末端等级序列建模动态震动 2507.07955v1
  • 10 07-10 TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices TinierHAR: Auf dem Weg zu ultraleichten Deep-Learning-Modellen für effiziente menschliche Aktivitätserkennung auf Edge-Geräten TinierHAR:迈向超轻量深深学习模型,以便有效识别人类在边缘装置方面的活动 2507.07949v1
  • 11 07-10 Low Resource Reconstruction Attacks Through Benign Prompts Niedrige Ressourcen-Wiederaufbau Angriffe durch Benign Prompts 通过慈善提示进行低资源重建袭击 2507.07947v1
  • 12 07-10 BarcodeBERT: Transformers for Biodiversity Analysis BarcodeBERT: Transformer für Biodiversitätsanalyse 条码BERT:生物多样性分析变异器 2311.02401v3
  • 13 07-10 Towards Continuous Home Cage Monitoring: An Evaluation of Tracking and Identification Strategies for Laboratory Mice Towards Continuous Home Cage Monitoring: Eine Bewertung von Tracking- und Identifikationsstrategien für Labor-Mäuse 逐步实现家用钥匙持续监测:对实验室老鼠跟踪和识别战略的评价 2507.07929v1
  • 14 07-10 A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search Eine Theorie der Schlussfolgerung Berechnung Scaling: Vernunft durch gerichtete stochastische Fähigkeiten Suche 推断计算尺度理论论:通过定向斯托卡技能搜索推理 2507.00004v2
  • 15 07-10 No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning Keine $D_{\text{train}}$: Modell-agnostische Gegenfaktische Erklärungen mit Verstärkungslernen 无 $D_{\text{train}}$:利用强化学习模型-不可允许的反事实解释 2405.18563v2
  • 16 07-10 Plausible Counterfactual Explanations of Recommendations Plausible gegenfaktische Erklärungen der Empfehlungen 对建议的反事实解释 2507.07919v1
  • 17 07-10 A statistical physics framework for optimal learning Statistischer Physikrahmen für optimales Lernen 促进最佳学习的统计物理框架 2507.07907v1
  • 18 07-10 Agentic Retrieval of Topics and Insights from Earnings Calls Agentische Retrieval von Themen und Erkenntnisse aus Earnings Calls 收入呼吁的主题和透视的 Agent 检索 2507.07906v1
  • 19 07-10 Efficient Causal Discovery for Autoregressive Time Series Effiziente Causal Discovery für autoregressive Zeitreihen 自动递减时间序列高效因果发现 2507.07898v1
  • 20 07-10 Sampling Imbalanced Data with Multi-objective Bilevel Optimization Probenahme ausgewogener Daten mit multi-objektiver Bilevel-Optimierung 具有多目标双一级最佳优化的数据 2506.11315v2
  • 21 07-10 Cross-Problem Parameter Transfer in Quantum Approximate Optimization Algorithm: A Machine Learning Approach Cross-Problem-Parameter-Transfer in Quanten Ungefähre Optimierungs-Algorithmus: Ein Ansatz zum maschinellen Lernen 量子中交叉问题参数转移 近最佳优化算法:机械学习方法 2504.10733v2
  • 22 07-10 Masked Image Modeling: A Survey Maskenbildmodellierung: Eine Umfrage 蒙面图像建模:调查 2408.06687v3
  • 23 07-10 A Bilevel Optimization Framework for Imbalanced Data Classification Ein Bilevel-Optimierungsrahmen für die unausgewogene Datenklassifikation 平衡数据分类双级优化框架 2410.11171v3
  • 24 07-10 UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs UnIT: Skalierbare unstrukturierte Schlussfolgerungs-Zeit-Rechnung für MAC-effiziente Neuralinferenz auf MCUs UnIT:MAC 高效神经引力对多边协调单位的可缩放无结构的推推力-时间节制 2507.07885v1
  • 25 07-10 SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation SAMO: Ein leicht schärfer und bewusster Ansatz für die Multi-Task-Optimierung mit gemeinsamer Global-Local-Perturbation SAMO: 与全球-地方联合干扰进行多任务优化的轻量级锐锐利软件方法 2507.07883v1
  • 26 07-10 Can AI-predicted complexes teach machine learning to compute drug binding affinity? Können KI-vorhergesehene Komplexe maschinelles Lernen beibringen, um Arzneimittelbindungsaffinität zu berechnen? 人工智能预测综合体能教机器学习如何计算药物绑定的亲缘关系吗? 2507.07882v1
  • 27 07-10 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models Was hat ein Stiftungsmodell gefunden? Mit induktiven Bias zur Untersuchung von Weltmodellen “基金会模式”有何发现? 2507.06952v2
  • 28 07-10 Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models Edge-ASR: Auf dem Weg zur Low-Bit Quantisierung von automatischen Spracherkennungsmodellen 边缘-ASR:实现自动语音识别模式的低比量量化 2507.07877v1
  • 29 07-10 Fair Uncertainty Quantification for Depression Prediction Faire Unsicherheit Quantifizierung für Depression Vorhersage 预测萧条预测的公平不确定性量化 2505.04931v2
  • 30 07-10 Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle Verbesserung der AEBS-Validierung durch Ziel-Interventions-Klassifikation Begünstigung des Prinzips der Prognoseabweichung 通过利用预测差异原则的客观干预分类,改进对AEBS的验证 2507.07872v1
  • 31 07-10 Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking Abmildernde Wasserzeichen-Stealing-Angriffe in generativen Modellen über Multi-Key-Wasserzeichen 通过多钥匙划水标记,在产生模型时通过多钥匙划水标记减轻盗用盗用水标志袭击 2507.07871v1
  • 32 07-10 Parametric Scaling Law of Tuning Bias in Conformal Prediction Parametrisches Skalierungsgesetz des Tuning Bias in konformer Vorhersage 非正规预测中计票比价的参数衡量法 2502.03023v2
  • 33 07-10 Nexus: Taming Throughput-Latency Tradeoff in LLM Serving via Efficient GPU Sharing Nexus: Durchsatz-Latenz-Tradeoff im LLM-Servieren durch effiziente GPU-Sharing Nexus:通过高效的GPU共享,在LLM服务中,控制通量-通量权衡交易 2507.06608v2
  • 34 07-10 Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders Re-Bottleneck: Latent Re-Structuring für Neural Audio Autoencoder 重新装瓶器:神经音频自动自动编码器前端重新结构 2507.07867v1
  • 35 07-10 Predicting and generating antibiotics against future pathogens with ApexOracle Vorhersage und Generierung von Antibiotika gegen zukünftige Krankheitserreger mit ApexOracle 预测并产生抗生素,用ApexOracle来防治未来的病原体 2507.07862v1
  • 36 07-10 Studying and Improving Graph Neural Network-based Motif Estimation Untersuchung und Verbesserung der graphischen Neuralnetz-basierten Motivationsschätzung 研究和改善图形神经网络基于Motif 估计 2506.15709v3
  • 37 07-10 Principled Foundations for Preference Optimization Prinzipierte Grundlagen für die Preference-Optimierung 最优化原则基金会 2507.07855v1
  • 38 07-10 Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain Kreditrisikoanalyse für KMU mit Hilfe von Graph Neural Networks in der Lieferkette 利用供应链中图表神经网络的中小企业信贷风险分析 2507.07854v1
  • 39 07-10 Optimization Guarantees for Square-Root Natural-Gradient Variational Inference Optimierungsgarantien für Square-Root Natural-Gradient Variational Inferenz 平方-极极自然-梯度变动性推断的最佳保障 2507.07853v1
  • 40 07-10 Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective Pre-Trained AI Model Assisted Online Entscheidungsfindung unter fehlenden Kovariaten: Eine theoretische Perspektive 在失踪的共变之下协助作出在线决策的模式:理论视角 2507.07852v1
  • 41 07-10 Revisiting the Predictability of Performative, Social Events Über die Vorhersagbarkeit von performativen, gesellschaftlichen Veranstaltungen 重新审视表演性、社会活动的可预测性 2503.11713v2
  • 42 07-10 “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents “So, erzählen Sie mir über Ihre Politik…”: Destillation von interpretierbaren Richtlinien von Deep Reinforcement Learning Agents “告诉我你们的政策……”:从深强化学习机构那里提炼可解释的政策 2507.07848v1
  • 43 07-10 Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components Bewertung der Einhaltung der Hierarchischen Sicherheitsgrundsätze durch LLM-Agenten: Ein leichter Maßstab für die Erprobung grundlegender Steuerungskomponenten 遵守等级安全原则:基础控制组成部分检验的轻量基准 2506.02357v2
  • 44 07-10 Unsupervised Morphological Tree Tokenizer Unüberwachter morphologischer Baum Tokenizer 不受监督的病理树化器 2406.15245v2
  • 45 07-10 Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model Statistische Physik-Analyse von Graphen-Neuronalen-Netzwerken: Annäherung an die Optimität im kontextuellen stochastischen Blockmodell 图形神经网络的统计物理学分析:在背景随机区块模型中接近最佳性 2503.01361v2
  • 46 07-10 Towards Benchmarking Foundation Models for Tabular Data With Text Auf dem Weg zu Benchmarking-Grundlagenmodellen für tabellarische Daten mit Text 建立文字表格数据基准基准基础模型 2507.07829v1
  • 47 07-10 An Empirical Bernstein Inequality for Dependent Data in Hilbert Spaces and Applications Eine empirische Bernsteinungleichheit für abhängige Daten in Hilbert-Räumen und Anwendungen 希尔伯特空间和应用中依赖数据方面的不平等问题 2507.07826v1
  • 48 07-10 Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution Symmetrie entdecken Breaking in Physical Systems mit entspannter Gruppenkonvolution 发现物理系统中的对称断裂与放松的集团革命 2310.02299v8
  • 49 07-10 MAEBE: Multi-Agent Emergent Behavior Framework MAEBE: Multi-Agent Emergent Behavior Framework 多边代理新兴行为框架 2506.03053v2
  • 50 07-10 An Algorithm for Learning Smaller Representations of Models With Scarce Data Ein Algorithmus für das Lernen kleinerer Darstellungen von Modellen mit knappen Daten 学习缺乏数据模型较小比例模型的计算方法 2010.07990v2
  • 51 07-10 On the Effect of Instruction Tuning Loss on Generalization Auf die Auswirkungen der Instruktion Tuning Verlust auf die Verallgemeinerung 指示计票损失对普遍化的影响的影响 2507.07817v1
  • 52 07-10 Pay Attention to Attention Distribution: A New Local Lipschitz Bound for Transformers Achten Sie auf Aufmerksamkeit Verteilung: Eine neue lokale Lipschitz Bound für Transformatoren ” 注意注意分发 “ : “ 变革者新地方利普施奇茨圆环 “ 。 2507.07814v1
  • 53 07-10 “I am bad”: Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models “I am bad”: Verdolmetschen von Stealthy, Universal und Robust Audio Jailbreaks in Audio-Language-Modellen “我是坏人”:在音频语言模型中解释隐形、通用和强势音频牢房破损 2502.00718v2
  • 54 07-10 Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks Tiefe Überlebensanalyse in multimodalen medizinischen Daten: Ein parametrischer und probabilistischer Ansatz mit kompetitiven Risiken 多模式医疗数据深度生存分析:与相竞风险的参数和概率分析方法 2507.07804v1
  • 55 07-10 Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning Kontextuelle Banditen in der Zahlungsabwicklung: Nicht einheitliche Exploration und überwachtes Lernen 付款处理:非统一探索和监督学习 2412.00569v2
  • 56 07-10 Space-Filling Regularization for Robust and Interpretable Nonlinear State Space Models Raumfüllende Regularisierung für robuste und interpretierbare nichtlineare State Space Modelle 强力和可解释的非线性国家空间模型的空间巡空常规化 2507.07792v1
  • 57 07-10 Understanding Chain-of-Thought in LLMs through Information Theory Verständnis der in LLMs durch Informationstheorie gesuchten Gedankenkette 通过信息理论在LLM 中探索了解链 2411.11984v2
  • 58 07-10 Unsupervised Automata Learning via Discrete Optimization Unüberwachtes Automata-Lernen über Diskrete Optimierung 通过 Discrete 优化化学习不受监督的自动自动数据 2303.14111v2
  • 59 07-10 Learning Algorithms in the Limit Algorithmen lernen an der Grenze 在限制范围内学习算法 2506.15543v2
  • 60 07-10 Approximation Depth of Convex Polytopes Näherungstiefe von Konvex-Polytopen 电解多面的近似深度 2507.07779v1
  • 61 07-10 Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training Aufgabenverhalten synchronisieren: Mehrere Aufgaben während der Test-Time-Schulung ausrichten 同步任务行为: 测试时训练中对齐多个任务 2507.07778v1
  • 62 07-10 Deep Learning is Not So Mysterious or Different Deep Learning ist nicht so geheimnisvoll oder anders 深深学习不是那么神秘或不同 2503.02113v2
  • 63 07-10 A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision Ein einheitliches empirisches Risikominimierungs-Framework für flexible N-Tuples Schwache Überwachung 灵活N-Tuples弱监督统一经验风险最小化框架 2507.07771v1
  • 64 07-10 BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning BEAVER: Bauen von Umgebungen mit einschätzbarer Variation zur Bewertung von multi-objektiven Verstärkungslernen BEAVER: 在环境建设中采用可评估的变数评估多目标强化学习 2507.07769v1
  • 65 07-10 TRIX- Trading Adversarial Fairness via Mixed Adversarial Training TRIX- Trading-Adversarial Fairness durch gemischte Adversarial Training TRIX-通过混合反向培训进行贸易反向公平 2507.07768v1
  • 66 07-10 Distributed and Decentralised Training: Technical Governance Challenges in a Shifting AI Landscape Verteilte und dezentralisierte Ausbildung: Technische Governance-Herausforderungen in einer sich verändernden KI-Landschaft 分散和分散化培训:AI 横向变化中的技术治理挑战 2507.07765v1
  • 67 07-10 Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model Squeeze the Soaked Sponge: Effiziente Off-Policy-Verstärkung Feinsteuerung für großes Sprachmodell 挤压海绵:高效非政策强化大语言模式的高效非政策改进微调 2507.06892v2
  • 68 07-10 OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting OPC: One-Point-Contraction Unlearning Toward Deep Feature Vergessen OPC: 一点-合同拆开学习深地地貌的遗忘 2507.07754v1
  • 69 07-10 Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks Effiziente und skalierbare Abschätzung der Verteilungseffekte mit multi-Task Neuronalen Netzwerken 与多任务神经神经网络一道高效和可缩放地估算分布式治疗效应 2507.07738v1
  • 70 07-10 GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing GuardVal: Dynamic Large Language Model Jailbreak Evaluation für umfassende Sicherheitstests 警卫:综合安全测试动态大语言示范监狱防爆评价 2507.07735v1
  • 71 07-10 Robust Federated Personalised Mean Estimation for the Gaussian Mixture Model Robuste, federführende, personalisierte mittlere Schätzung für das Gaussian Mixture Model Gaussian Mixture 模型的联邦硬性个人化平均平均估计值 2504.19955v2
  • 72 07-10 Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means Robuste Verteilte Schätzung: Erweiterung von Gossip-Algorithmen auf Rangfolge und Trimmmittel 强有力的分布分布式估算:将Gossip的数值扩大至排名和缩略语 2505.17836v6
  • 73 07-10 Discrete Optimal Transport and Voice Conversion Diskreter Optimaler Transport und Sprachumwandlung 分辨最佳传输和语音转换 2505.04382v2
  • 74 07-10 Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots Adaptive Gaussian Mixture Models-basierte Anomalieerkennung für unterbeschränkte kabelgetriebene Parallelroboter 用于控制不足的有线驱动平行机器人的适应性高斯混合混合模型异常探测 2507.07714v1
  • 75 07-10 Balancing the Past and Present: A Coordinated Replay Framework for Federated Class-Incremental Learning Ausbalancieren der Vergangenheit und Gegenwart: Ein koordiniertes Replay-Framework für das Federated Class-Incremental Learning 平衡过去和现在的平衡:联邦级强化学习协调重现框架 2507.07712v1
  • 76 07-10 Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors Shapley-based Data Valuation mit gegenseitiger Information: Ein Schlüssel zu veränderten K-Nächsten Nachbarn 与相互信息一起进行基于虚光的数据估值:修改 K- 最近邻的密钥 2312.01991v4
  • 77 07-10 Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought Rationale-Enhanced Decodierung für multimodale Chain-of-Thought 多式联运谈判链附加说明 2507.07685v1
  • 78 07-10 Accelerating Transposed Convolutions on FPGA-based Edge Devices Beschleunigung transponierter Konvolutionen auf FPGA-basierten Edge-Geräten 加速基于 FPGA 的边缘设备的转换变速 2507.07683v1
  • 79 07-10 Beyond Cox Models: Assessing the Performance of Machine-Learning Methods in Non-Proportional Hazards and Non-Linear Survival Analysis Jenseits von Cox-Modellen: Bewertung der Leistungsfähigkeit von Machine-Learning-Methoden bei nichtproportionalen Gefahren und nichtlinearer Überlebensanalyse 超越考克斯模型:评估机器学习方法在非季节性危险和无林性生存分析方面的性能 2504.17568v2
  • 80 07-10 Implicit Counterfactual Data Augmentation for Robust Learning Implizite gegenfaktische Datenvergrößerung für robustes Lernen 强力学习所需的反事实数据放大 2304.13431v4
  • 81 07-10 Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks Einige theoretische Ergebnisse auf schichtweise Effektive Dimensions-Oszillationen in Finite-Width-ReLU-Netzwerken 关于有限宽度 ReLU 网络中多层有效尺寸振动的一些理论结果 2507.07675v1
  • 82 07-10 Learning Pole Structures of Hadronic States using Predictive Uncertainty Estimation Erlernen der Polstrukturen von Hadronischen Staaten mittels vorausschauender Unsicherheitsabschätzung 使用预测性不确定性估计值的 强力国家学习极极结构 2507.07668v1
  • 83 07-10 Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL Uncovering RL Integration in SSL Loss: Zielspezifische Implikationen für dateneffiziente RL SSL损失中未覆盖的 RL 整合:对数据高效RL的客观具体影响 2410.17428v3
  • 84 07-10 Curriculum Negative Mining For Temporal Networks Curriculum Negative Mining für zeitliche Netzwerke 时间网络负面采矿课程 2407.17070v2
  • 85 07-10 Machine Learning-Assisted Surrogate Modeling with Multi-Objective Optimization and Decision-Making of a Steam Methane Reforming Reactor Machine Learning-Assisted Surrogate Modellierung mit multi-objektiver Optimierung und Entscheidungsfindung eines Dampfmethan-Reformreaktors 利用蒸气甲烷改造反应堆的多目标优化和决策 2507.07641v1
  • 86 07-10 HLF-FSL. A Decentralized Federated Split Learning Solution for IoT on Hyperledger Fabric HLF-FSL. Eine dezentrale, gefederte Split-Learning-Lösung für IoT auf Hyperledger Fabric HLF-FSL. 关于超板机纤维化的IoT的分散化的联邦学习分解解决方案 2507.07637v1
  • 87 07-10 Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights Vergleichende Stimmungsanalyse der öffentlichen Wahrnehmung: Monkeypox vs. COVID-19 Verhaltenseinblicke 对公众感知的比较情绪分析:天花对COVID-19行为洞察力 2505.07430v2
  • 88 07-10 Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks Erforschung der Grenzen der Modellkompression in LLMs: Eine Studie zur Wissensdestillation über QA-Aufgaben 探索LLM中模型压缩的限度:关于质量保证任务的知识积累研究 2507.07630v1
  • 89 07-10 Concentration of measure for non-linear random matrices with applications to neural networks and non-commutative polynomials Konzentration von Messwerten für nichtlineare Zufallsmatrizen mit Anwendungen in neuronalen Netzwerken und nicht-kommutativen Polynomen 非线性随机随机矩阵的测量浓度,该矩阵应用到神经网络和非模拟多元复合体 2507.07625v1
  • 90 07-10 TransformEEG: Towards Improving Model Generalizability in Deep Learning-based EEG Parkinson’s Disease Detection TransformEEG: Auf dem Weg zur Verbesserung des Modells Generalizability in Deep Learning-based EEG Parkinson’s Disease Detection TransformEEG:努力改进深学习性EEG Parkinson疾病检测模式 2507.07622v1
  • 91 07-10 Sparse Causal Discovery with Generative Intervention for Unsupervised Graph Domain Adaptation Sparse Causal Discovery mit generativer Intervention für unüberwachte Graphen-Domänenanpassung 以未受监督的图形域适应的生成干预生成的简单原因发现 2507.07621v1
  • 92 07-10 Sparse Self-Federated Learning for Energy Efficient Cooperative Intelligence in Society 5.0 Sparse Selbstgebundenes Lernen für energieeffiziente kooperative Intelligenz in der Gesellschaft 5.0 社会节能合作情报学会 2507.07613v1
  • 93 07-10 S2FGL: Spatial Spectral Federated Graph Learning S2FGL: Raumspektrales Federiertes Graphenlernen S2FGL: 空间光谱联邦图表学习 2507.02409v2
  • 94 07-10 Offline Trajectory Optimization for Offline Reinforcement Learning Offline-Trajektorienoptimierung für Offline-Verstärkungslernen 离线轨迹优化用于离线强化学习 2404.10393v2
  • 95 07-10 Synthetic MC via Biological Transmitters: Therapeutic Modulation of the Gut-Brain Axis Synthetische MC über biologische Transmitter: Therapeutische Modulation der Gut-Brain-Achse 通过生物传播器进行MC:古特脑轴体的治疗变化 2507.07604v1
  • 96 07-10 Don’t Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning Drücken Sie nicht auf den Knopf! Erforschen von Daten Leckage Risiken im maschinellen Lernen und Transfer Lernen 不要按按钮! 探索机器学习和传输学习中的数据泄漏风险 2401.13796v4
  • 97 07-10 Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations Überprüfung der Likelihood-basierten Out-of-Distribution-Erkennung durch Modellierung von Repräsentationen 通过建模代表机构重新审视以可能性为基础的分销外探测 2504.07793v3
  • 98 07-10 Stress Monitoring in Healthcare: An Ensemble Machine Learning Framework Using Wearable Sensor Data Stressüberwachung im Gesundheitswesen: Ein Ensemble Machine Learning Framework mit tragbaren Sensordaten 保健中压力监测:使用穿戴感感应数据的综合机械学习框架 2507.07589v1
  • 99 07-10 Bayesian Discrete Diffusion Beats Autoregressive Perplexity Bayesian Discrete Diffusion schlägt autoregressive Verwirrung Bayesian Discrete 扩散比自动递减的两难性 2507.07586v1
  • 100 07-10 Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench Jenseits von Überkorrektur: Bewertung von Diversität in T2I-Modellen mit DivBench 超越过度纠正:在DivBench的T2I模型中评估多样性 2507.03015v2
  • 101 07-10 Improving Clustering on Occupational Text Data through Dimensionality Reduction Verbesserung der Clusterbildung auf berufsbezogenen Textdaten durch Dimensionalitätsreduzierung 通过减少分量改进职业文本数据集群化 2507.07582v1
  • 102 07-10 CHOMET: Conditional Handovers via Meta-Learning CHOMET: Bedingte Übergaben über Meta-Learning CHOMET: 通过Meta-Learning 有条件的交接 2507.07581v1
  • 103 07-10 COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation COALA: Numerisch stabiles und effizientes Framework für kontextabhängige Low-Rank-Annäherung COALA: 低 Rank 上下低敏度接近度的数值稳定、高效框架 2507.07580v1
  • 104 07-10 On Trustworthy Rule-Based Models and Explanations Über vertrauenswürdige regelbasierte Modelle und Erklärungen 关于可信赖、有可信赖的、基于规则的模型和解释 2507.07576v1
  • 105 07-10 Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning Künstliche Generäle Intelligenz: Meistern von Generälen.io mit Verstärkungslernen 人造将军情报:掌握将军,加强学习 2507.06825v2
  • 106 07-10 Solving Probabilistic Verification Problems of Neural Networks using Branch and Bound Lösung probabilistischer Verifikationsprobleme von neuralen Netzen mittels Branch und Bound 利用分支和边界解决神经网络的概率核查问题 2405.17556v3
  • 107 07-10 Real-Time Decorrelation-Based Anomaly Detection for Multivariate Time Series Echtzeit-Dekorrelation-basierte Anomalieerkennung für multivariate Zeitreihen 用于多变量时间序列的基于实时显示关系异常探测 2507.07559v1
  • 108 07-10 TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference TokenWeave: Effiziente Compute-Communication Overlap für verteilte LLM-Inferenz TokenWeave: 有效计算分布式LLM 推理的通信重叠 2505.11329v2
  • 109 07-10 LARP: Learner-Agnostic Robust Data Prefiltering LARP: Learner-Agnostic Robuste Datenvorfilterung LARP: 学习者-不可知强力数据预过滤 2506.20573v3
  • 110 07-10 Position: We Need An Algorithmic Understanding of Generative AI Position: Wir brauchen ein algorithmisches Verständnis der Generativen KI 立场:我们需要对 “ 创造的人工智能 “ 的定量理解。 2507.07544v1
  • 111 07-10 Don’t Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series Nicht falsch machen: Wie man tiefe visuelle Interpretationen auf Zeitreihen anwendet 不要误会我: 如何将深视判读应用到时间序列 2203.07861v3
  • 112 07-10 Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings Neural Concept Verifier: Scaling Prover-Verifier Spiele über Concept Encodings 神经概念验证符:通过概念编码来缩放Prover-Verifier游戏 2507.07532v1
  • 113 07-10 Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models Thought Crime: Hintertüren und Emergent-Missausrichtung in vernünftigen Modellen 思想犯罪:后门和合理理由模型中新出现的不协调现象 2506.13206v2
  • 114 07-10 Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process Ableitung von Output-Korrelations-Schlussfolgerungen für Multi-Output (aka Multi-Task) Gaussian-Prozess 多种产出(又称多任务)的多产出(高斯)进程输出相关关系推断的衍生结果 2501.07964v4
  • 115 07-10 Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer Testen der Spin-Bad-Ansicht der Selbstachtung: Eine Hamiltonian Analyse von GPT-2 Transformer 测试自觉自觉的自吹泡泡视图:汉密尔顿对GPT-2变形器的分析 2507.00683v3
  • 116 07-10 Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition Robuste und effiziente Schreib-Unabhängige IMU-basierte Handschriftenerkennung 强有力和高效率的独立作家、独立作家、以IMU为基础的手写识别 2502.20954v2
  • 117 07-10 Uncertainty Quantification for Motor Imagery BCI – Machine Learning vs. Deep Learning Unsicherheit Quantifizierung für Motor Imagery BCI – Machine Learning vs. Deep Learning 机动图像BCI – 机器学习与深层学习 2507.07511v1
  • 118 07-10 Divergence Minimization Preference Optimization for Diffusion Model Alignment Divergenz-Minimierungspräferenz-Optimierung für Diffusionsmodellausrichtung 传播模型对齐 2507.07510v1
  • 119 07-10 Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code LLM zur Vernunft bringen: Stärkung Lernen aus algorithmischen Problemen ohne Code 教LLM到理由:加强从没有法典的等级问题中学习 2507.07498v1
  • 120 07-10 Semi-supervised learning and integration of multi-sequence MR-images for carotid vessel wall and plaque segmentation Semi-überwachtes Lernen und Integration von Multi-Sequenz-MR-Bildern für karotide Gefäßwand- und Plaquesegmentierung 在半监督下学习和整合对折合体船只壁壁和隔板的多序列MMM-图像的半监督学习和集成 2507.07496v1
  • 121 07-10 Task Assignment and Exploration Optimization for Low Altitude UAV Rescue via Generative AI Enhanced Multi-agent Reinforcement Learning Aufgabenzuweisung und Explorationsoptimierung für UAV-Rescue mit geringer Höhe über Generative KI Enhanced Multi-Agent Verstärkungs-Lernen 通过创新的AI增强型多剂强化学习,为低高空无人驾驶航空器救援工作分配任务和探索优化 2504.13554v2
  • 122 07-10 Affordable AI Assistants with Knowledge Graph of Thoughts Erschwingliche KI-Assistenten mit Wissensgrafik der Gedanken 具有知识思想知识图的负担得起的AI助理 2504.02670v5
  • 123 07-10 Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning Token-Space-Gradient-Konflikte lösen: Token Space-Manipulation für transformerbasiertes Multi-Task-Learning 解决 Token- Space 渐变冲突: 用于以变换器为基础的多任务学习的 Token 空间操纵 2507.07485v1
  • 124 07-10 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models Machine Bullshit: Charakterisieren der Emergenten Missachtung der Wahrheit in großen Sprachmodellen 机器胡说:在大语言模型中突出新人无视真相的特点 2507.07484v1
  • 125 07-10 Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences Adaptive Randomisierte Glättung: Zertifizierte Adversarial Robustheit für Multi-Step-Verteidigungen 适应性随机调整平滑:多步骤防御的证明反向强力 2406.10427v3
  • 126 07-10 Mixture of Group Experts for Learning Invariant Representations Mixtur von Gruppenexperten für Learning Invariante Repräsentationen 学习不稳定代表小组专家混合 2504.09265v2
  • 127 07-10 ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining ixi-GEN: Effiziente industrielle sLLMs durch Domain Adaptive Continual Pretraining ixi-GEN:通过远程适应性连续训练前,提高工业低温生产效率 2507.06795v2
  • 128 07-10 Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting Galerkin-ARIMA: Ein zweistufiges Polynom-Regressions-Framework für schnelles Ein-Schritt-Vorhersagen Galerkin-ARIMA:一个双级多级倒退框架,用于快速滚动单步单步预告 2507.07469v1
  • 129 07-10 Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals Hess-MC2: Sequentielle Monte Carlo mit Hessischen Informationen und Vorschlägen für die zweite Ordnung Hess-MC2:使用黑森信息和第二顺序提案的顺序蒙特卡洛广场 2507.07461v1
  • 130 07-10 General purpose models for the chemical sciences Allgemeine Zweckmodelle für die Chemiewissenschaften 化学科学通用模型 2507.07456v1
  • 131 07-10 C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition C3T: Grenzüberschreitender Transfer durch Zeit für sensorgestützte menschliche Aktivitätserkennung C3T: 以传感器为基础的人类活动识别跨时间跨模式转让 2407.16803v4
  • 132 07-10 Neural networks leverage nominally quantum and post-quantum representations Neurale Netzwerke nutzen nominal Quanten- und Post-Quantum-Darstellungen 神经网络在名义上对数量和数量后代表的杠杆作用发挥杠杆作用 2507.07432v1
  • 133 07-10 ARBoids: Adaptive Residual Reinforcement Learning With Boids Model for Cooperative Multi-USV Target Defense ARBoids: Adaptives Residual-Verstärkungs-Lernen mit Boids-Modell für kooperative Multi-USV-Zielverteidigung ARBoids:多紫外线合作多紫外线目标防御用Boids模式进行适应性残余强化学习 2502.18549v2
  • 134 07-10 Harmonic Loss Trains Interpretable AI Models Harmonische Verlust Züge Interpretierbare KI-Modelle 可解释的 AI 模型 2502.01628v2
  • 135 07-10 Probabilistic Approximate Optimization: A New Variational Monte Carlo Algorithm Probabilistische annähernde Optimierung: Eine neue Variation des Monte Carlo-Algorithmus 概率近似优化:新的变异性蒙特卡洛算法 2507.07420v1
  • 136 07-10 Autonomous AI-based Cybersecurity Framework for Critical Infrastructure: Real-Time Threat Mitigation Autonomes KI-basiertes Cybersecurity Framework für kritische Infrastruktur: Echtzeit-Bedrohungsmilderung 以AI为基础的关键基础设施自动网络安全框架:减少实时威胁 2507.07416v1
  • 137 07-10 Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks Hybride LLM-verstärkte Intrusionserkennung für Zero-Day-Bedrohungen in IoT-Netzwerken 在IoT网络零日威胁下加强入侵探测 2507.07413v1
  • 138 07-10 Determinant Estimation under Memory Constraints and Neural Scaling Laws Determinante Abschätzung unter Gedächtnisbeschränkungen und neuralen Skalierungsgesetzen 根据记忆限制和神经扩增法对决定因素进行估算 2503.04424v2
  • 139 07-10 Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models Phishing Detection in der Gen-AI Ära: Quantisierte LLMs gegen klassische Modelle Gen-AI 时代中的幻影探测:量化的LLMs 与古典模型 2507.07406v1
  • 140 07-10 HGMP:Heterogeneous Graph Multi-Task Prompt Learning HGMP:Heterogenes Graph-Multi-Task-Prompt-Lernen HGMP: 异基因图多任务快速学习 2507.07405v1
  • 141 07-10 Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization Generalized Tree Edit Distance (GTED): Ein treues Bewertungsmetrikum für die Autoformalisierung von Aussagen 通用树版编辑距离(GTED):声明自动正规化的忠实评价度量 2507.07399v1
  • 142 07-10 IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing IML-Spikeformer: Multi-Level Spiking Transformer für die Sprachverarbeitung IML-Spikeformer: 用于语音处理的具有投入意识的多层Spiking变换器 2507.07396v1
  • 143 07-10 Learning Collective Variables from Time-lagged Generation Kollektive Variablen aus der zeitverzögerten Generation lernen 时间滞后一代的学习集体变量 2507.07390v1
  • 144 07-10 ST-GRIT: Spatio-Temporal Graph Transformer For Internal Ice Layer Thickness Prediction ST-GRIT: Spatio-Temporal Graph Transformer für interne Eisschichtdicke Vorhersage ST-GRIT: 内部冰层厚度预测的时空图变异器 2507.07389v1
  • 145 07-10 GRIT: Graph Transformer For Internal Ice Layer Thickness Prediction GRIT: Graph Transformer für interne Eisschichtdicke Vorhersage GRIT: 内部冰层厚度预测的图形变形器 2507.07388v1
  • 146 07-10 HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning HeLo: Heterogene Multi-Modal Fusion mit Labelkorrelation für Emotion Distribution Learning HeLo:情感分布学习中带有标签关联的异变多模式融合 2507.06821v2
  • 147 07-10 Online Continual Learning via Spiking Neural Networks with Sleep Enhanced Latent Replay Online Continual Learning über Spiking Neuronal Networks mit Schlaf Enhanced Latent Replay 通过Spiking神经网络在线持续学习,并配有睡眠强化前端重播 2507.02901v2
  • 148 07-10 Unifews: You Need Fewer Operations for Efficient Graph Neural Networks Unifews: Sie brauchen weniger Operationen für effiziente Graphen-Neural-Netzwerke Unifews: 高效图形神经网络需要更少操作 2403.13268v2
  • 149 07-10 User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection Benutzerbasierte sequentielle Modellierung mit Transformer-Encodern für Insider Threat Detection 以用户为基础的序列模型,使用变换器编码器进行内部威胁探测 2506.23446v2
  • 150 07-10 BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems BountyBench: Dollar-Impact von KI-Agenten-Angriffen und Verteidigern auf reale Cybersicherheitssysteme BountyBench: AI代理攻击者和捍卫者对现实世界网络安全系统的美元影响 2505.15216v2
  • 151 07-10 A Multi-Granularity Supervised Contrastive Framework for Remaining Useful Life Prediction of Aero-engines Ein Multi-Granularität überwacht Kontrastive Rahmen für das Bleiben nützlicher Lebensvorhersage von Aero-Motoren 空气-发动机剩余使用寿命预测多族监督多族监督违规框架 2411.00461v3
  • 152 07-10 Bradley-Terry and Multi-Objective Reward Modeling Are Complementary Bradley-Terry und Multi-Objective Reward Modeling sind komplementär Bradley-Terriy和多目标奖励模型具有补充作用 2507.07375v1
  • 153 07-10 Prediction-Augmented Mechanism Design for Weighted Facility Location Voraussichtlicher Mechanismus für den Standort der gewichteten Fazilität 加权设施位置设计 2507.06509v2
  • 154 07-10 Atherosclerosis through Hierarchical Explainable Neural Network Analysis Atherosklerose durch hierarchische erklärende neurale Netzwerkanalyse 通过可解释的神经网络分析,通过高层次解析神经网络分析,实现天体硬化 2507.07373v1
  • 155 07-10 Data-driven Kinematic Modeling in Soft Robots: System Identification and Uncertainty Quantification Datengesteuerte kinematische Modellierung in Soft Robots: Systemidentifikation und Unsicherheitsquantifizierung 软机器人中数据驱动的虚拟模型:系统识别和不确定性量化 2507.07370v1
  • 156 07-10 A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning Eine kryptografische Perspektive auf Mitigation vs. Detection in Machine Learning 关于减缓与机械学习中的探测的加密视角 2504.20310v2
  • 157 07-10 Platform for Representation and Integration of multimodal Molecular Embeddings Plattform für Repräsentation und Integration multimodaler molekularer Einbettungen 多式联运分子嵌入的 代表性和一体化平台 2507.07367v1
  • 158 07-10 Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning Zielorientiertes sequentielles Bayesian Experimental Design für das kausale Lernen 以目标为导向、按顺序排列的Bayesian 因果关系学习实验设计 2507.07359v1
  • 159 07-10 Learning from positive and unlabeled examples -Finite size sample bounds Aus positiven und unmarkierten Beispielen lernen -Finite-Size-Probengrenzen 从正面和未贴标签的例子中学习 - 微小大小抽样范围 2507.07354v1
  • 160 07-10 Machine Learning-driven Multiscale MD Workflows: The Mini-MuMMI Experience Mehrstufige MD-Workflows mit maschinellem Lernen: Die Mini-MuMMI-Erfahrung 由学习驱动的机械式学习驱动的多规模MD工作流程:微型MIMI经验 2507.07352v1
  • 161 07-10 Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts Zero-Shot-Context-Verallgemeinerung in der Verstärkung Lernen aus wenigen Trainingskontexten 从少量训练情境中进行强化学习的零样本情境泛化 2507.07348v1
  • 162 07-10 It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation Es ist schwer, normal zu sein: Der Einfluss von Lärm auf die strukturagnostische Abschätzung 很难正常:噪音对结构-不可计量估计的影响 2507.02275v2
  • 163 07-10 Way More Than the Sum of Their Parts: From Statistical to Structural Mixtures Viel mehr als die Summe ihrer Teile: Von statistischen zu strukturellen Mischungen 超出其部分总和:从统计到结构混合 2507.07343v1
  • 164 07-09 (3) Adversarial Defenses via Vector Quantization Adversariale Abwehr durch Vector Quantization 通过矢量定量的矢量反向防御 2305.13651v2
  • 165 07-09 Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset Benchmarking der Warteliste Mortalitätsvorhersage in der Herztransplantation durch Time-to-Event-Modellierung mit neuem Langzeit-UNOS-Datensatz 利用新的纵向UNOS数据集,利用从时间到晚上的建模,制定基准的等待名单 心脏移植中死亡率预测 2507.07339v1
  • 166 07-09 Bayesian Double Descent Bayesische Doppelabfahrt Bayesian 双世后代 2507.07338v1
  • 167 07-09 Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning Leveraging Manifold Einbettungen für verbesserte Graph Transformer Darstellungen und Lernen 利用操纵嵌入器促进增强的图表变形器演示和学习 2507.07335v1
  • 168 07-09 Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery Überbrückung des Plausibilitäts-Validitäts-Gaps durch Feintuning eines mit Gründen versehenen LLM für chemische Synthese und Entdeckung 通过为化学合成和发现而精细提出一个有合理理由的强化LLM来弥合可视性与价值差距 2507.07328v1
  • 169 07-09 Optimizing Model Splitting and Device Task Assignment for Deceptive Signal Assisted Private Multi-hop Split Learning Optimierung der Modellaufteilung und der Aufgabenzuweisung für trügerisches Signal unterstütztes privates Mehr-Hop-Split-Lernen 为欺骗性信号辅助的私人多窗口共享学习优化模型分割和设备任务分配 2507.07323v1
  • 170 07-09 Optimizing Communication and Device Clustering for Clustered Federated Learning with Differential Privacy Optimierung von Kommunikation und Geräteclustering für clustergebundenes Federated Learning mit unterschiedlicher Privatsphäre 优化通信和装置集群,促进有不同隐私的分组联邦学习 2507.07320v1
  • 171 07-09 AdeptHEQ-FL: Adaptive Homomorphic Encryption for Federated Learning of Hybrid Classical-Quantum Models with Dynamic Layer Sparing AdeptHEQ-FL: Adaptive homomorphe Verschlüsselung für das Federated Learning von Hybrid-Klassisch-Quantum-Modellen mit dynamischem Layer-Sparing AdeptHEQ-FL:具有动态图层分隔的混合古量模型的联邦学习 2507.07316v1
  • 172 07-09 Frontier LLMs Still Struggle with Simple Reasoning Tasks Frontier LLMs kämpfen immer noch mit einfachen, vernünftigen Aufgaben 仍与简单理由任务抗争的前沿LLMS 2507.07313v1
  • 173 07-09 From Images to Signals: Are Large Vision Models Useful for Time Series Analysis? Von Bildern zu Signalen: Sind große Visionsmodelle für die Zeitreihenanalyse nützlich? 从图像到信号:大型视觉模型对时间序列分析有用吗? 2505.24030v2
  • 174 07-09 Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing Spektrale Estimatoren für strukturierte generalisierte Linearmodelle über ungefähre Message Passing 通过近似消息传递为结构化通用线性模型制作的光谱光度模拟器 2308.14507v4
  • 175 07-09 Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes Beschreibung des Ausbildungsprozesses von Neuronalen Netzwerken über Ergodic Theorem : Geisterknoten 描述Ergodic定理神经网络培训过程:幽灵节点 2507.01003v2
  • 176 07-09 Multilayer GNN for Predictive Maintenance and Clustering in Power Grids Multilayer GNN für Predictive Maintenance und Clustering in Power Grids 用于电力网的预报维护和集群的多层次GNN(多层次GNN) 2507.07298v1
  • 177 07-09 Time Series Foundation Models for Multivariate Financial Time Series Forecasting Time Series Foundation Modelle für Multivariate Finanzzeitreihen Prognose 多变财务时间序列预测模型 2507.07296v1
  • 178 07-09 Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning Thermodynamische Vorhersage ermöglicht durch automatisches Datensatzaufbauen und maschinelles Lernen 通过自动数据集建立和机器学习实现的热动力预测 2507.07293v1
  • 179 07-09 Discretization-independent multifidelity operator learning for partial differential equations Diskretisationsunabhängiges Multifidelitätsoperator-Lernen für partielle Differentialgleichungen 为部分差异方程学习部分差异方程 2507.07292v1
  • 180 07-09 Estimating Dataset Dimension via Singular Metrics under the Manifold Hypothesis: Application to Inverse Problems Schätzung der Datensatzdimension über Singular Metrics unter der Manifold Hypothese: Anwendung auf inverse Probleme 在流形假设下通过奇异度量估算数据集维度:在逆问题中的应用 2507.07291v1
  • 181 07-09 Natural Evolutionary Search meets Probabilistic Numerics Natürliche evolutionäre Suche trifft auf probabilistische Zahlen 自然进化搜索会议 2507.07288v1
  • 182 07-09 EditLord: Learning Code Transformation Rules for Code Editing EditLord: Regeln zur Code-Transformation für die Code-Editing 编辑主: 学习代码编辑的代码转换规则 2504.15284v4
  • 183 07-09 Smart IoT Security: Lightweight Machine Learning Techniques for Multi-Class Attack Detection in IoT Networks Smart IoT Security: Leichtbautechniken für die Multi-Class-Angrifferkennung in IoT-Netzwerken Smart Iot安全:在IoT网络中用于多级攻击探测的轻型机械学习技术 2502.04057v3
  • 184 07-09 Almost Sure Convergence for the Last Iterate of Stochastic Gradient Descent Schemes Fast sichere Konvergenz für die letzte Phase der stochastischen Stufenabfolge 沙粒渐变后代计划最后一次迭代几乎可以肯定的趋同 2507.07281v1
  • 185 07-09 TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores TRIP: Ein nichtparametrischer Test zur Diagnose von Biased Feature Importance Scores TRIP: 诊断基线地物重要性分数的非参数测试 2507.07276v1
  • 186 07-09 Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time Jenseits der ATE: Interpretierbare Modellierung von Behandlungseffekten über Dosis und Zeit 超越ATE:可解释的剂量和时间处理效果模型 2507.07271v1
  • 187 07-09 Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors Robuster multimodaler Lernrahmen für die Erfassung von Aufnahmegesten mit kontaktlosen Radar- und tragbaren IMU-Sensoren 使用无接触雷达和可穿戴的IMU传感器掌握手势的强有力多模式学习框架 2507.07261v1
  • 188 07-09 Cryptogenic stroke and migraine: using probabilistic independence and machine learning to uncover latent sources of disease from the electronic health record Kryptogener Schlaganfall und Migräne: Verwendung von probabilistischer Unabhängigkeit und maschinellem Lernen, um latente Krankheitsquellen aus dem elektronischen Gesundheitsbericht zu entdecken 低温和偏头痛:利用概率独立和机器学习,从电子健康记录中发现潜在的疾病源 2505.04631v2
  • 189 07-09 AXLearn: Modular Large Model Training on Heterogeneous Infrastructure AXLearn: Modulares Großes Modelltraining auf heterogener Infrastruktur AXLearn:关于不同差异基础设施的模块大型示范培训 2507.05411v2
  • 190 07-09 Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning Ausnutzen von Edge-Features für übertragbare adversariale Angriffe im verteilten maschinellen Lernen 在分配式机器学习中利用可转移的对地攻击的边缘特征 2507.07259v1
  • 191 07-09 A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms Ein sprachgetriebener Rahmen zur Verbesserung personalisierter Empfehlungen: Zusammenführung von LLMs mit traditionellen Algorithmen 改进个性建议的语言驱动框架:将LLMs与传统等级合并 2507.07251v1
  • 192 07-09 Position: Adopt Constraints Over Penalties in Deep Learning Position: Überstrapazierte Strafen im Deep Learning adoptieren 职位:在深深学习中采用约束措施以凌驾刑罚 2505.20628v2
  • 193 07-09 Leveraging the Structure of Medical Data for Improved Representation Learning Nutzung der Struktur medizinischer Daten für ein verbessertes Repräsentationslernen 利用医疗数据结构改进代表性学习 2507.02987v2
  • 194 07-09 Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention Aufmerksamkeiten unter dem Mikroskop: Eine vergleichende Studie der Ressourcennutzung für Varianten der Self-Attention “显微镜”下的注意力:自注意力变体资源利用比较研究 2507.07247v1
  • 195 07-09 Cosmos World Foundation Model Platform for Physical AI Kosmos World Foundation Modellplattform für physische KI 宇宙世界基金会物理光学示范平台 2501.03575v3
  • 196 07-09 Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased Herausforderungen beim Lernen aus unausgeglichenen Daten mit Baum-basierten Modellen: Prävalenzschätzungen hängen systematisch von Hyperparametern ab und können nach oben verzerrt sein 利用树基模型从不平衡数据中吸取挑战:流行率估计数系统依赖超参数,可能向上偏偏 2412.16209v2
  • 197 07-09 Semantic Edge Computing and Semantic Communications in 6G Networks: A Unifying Survey and Research Challenges Semantic Edge Computing und Semantic Communications in 6G Networks: Eine einheitliche Umfrage und Forschungsherausforderungen 6G网络中的电子计算和语义通信:统一调查和研究挑战 2411.18199v3
  • 198 07-09 Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism Kostenreduzierung: Der Weg zur Optimierung der Gedankenkette durch Sparse-Achtungsmechanismus 降低理据成本:通过分散关注机制优化思维链的道路 2411.09111v8
  • 199 07-09 Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture Auf dem Weg zu robusten Surrogate-Modellen: Benchmarking Machine Learning Ansätze zur Beschleunigung von Phasenfeldsimulationen von Bruchbruch 走向强力替代模型:为加速阶段断裂断裂的实地模拟工作制定机构学习方法基准 2507.07237v1
  • 200 07-09 An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation Eine informationstheoretische Perspektive zur Multi-LLM-Unsicherheitsschätzung 关于多LLLM不确定性估算的信息理论视角 2507.07236v1
  • 201 07-09 Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection Emoji-Angriff: Verstärkung von Jailbreak-Angriffen gegen Richter LLM-Erkennung Emoji攻击:加强针对LLM法官的越狱袭击 2411.01077v3
  • 202 07-09 OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning OVOR: OnePrompt mit Virtual Outlier Regularization für probenfreies Klassen-Inkrementelles Lernen OVOR: 为无排练班级入门学习提供虚拟外部正规化的一次性 2402.04129v2
  • 203 07-09 Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences Zeit macht Raum: Entstehung von Platzfeldern in Netzwerken Kodierung temporär kontinuierlicher sensorischer Erfahrungen 时间创造空间:网络中位置字段的出现, 网络编码 温度持续感知经验 2408.05798v3
  • 204 07-09 Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems Effizienter parametrischer SVD von Koopman Operator für stochastische dynamische Systeme Koopman 斯托切动态系统操作员的高效参数SVD 2507.07222v1
  • 205 07-09 Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards Optimas: Optimierung von KI-Systemen mit global ausgerichteten lokalen Belohnungen 最佳结果:以全球统一地方奖励办法优化化合物AI系统 2507.03041v2
  • 206 07-09 Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains Neurosymbolische Merkmalsextraktion zur Identifizierung von Zwangsarbeit in Lieferketten 查明供应链中强迫劳动情况 2507.07217v1
  • 207 07-09 Bias-Aware Mislabeling Detection via Decoupled Confident Learning Bias-Aware-Mislabeling-Erkennung durch entkoppeltes vertrauensvolles Lernen 通过解开信任学习解开错误标签检测 2507.07216v1
  • 208 07-09 One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory Eine Trajektorie, ein Token: Erdliche Video-Tokenisierung über panoptische Sub-Objekt-Trajektorie 一个轨迹, 一个 Token: 通过泛光子物件轨迹, 固定的视频轨迹 2505.23617v2
  • 209 07-09 Scale leads to compositional generalization Skala führt zu kompositorischer Verallgemeinerung 缩放导致编成一般化 2507.07207v1
  • 210 07-09 MODA: A Unified 3D Diffusion Framework for Multi-Task Target-Aware Molecular Generation MODA: Ein einheitliches 3D-Diffusions-Framework für multi-Task Target-Aware-Molekulargenerierung MODA:多目标目标目标软件分子生成3D统一传播框架 2507.07201v1
  • 211 07-09 Combining Pre-Trained Models for Enhanced Feature Representation in Reinforcement Learning Kombination von vortrainierten Modellen für eine verbesserte Darstellung von Funktionen bei der Stärkung des Lernens 将强化学习中强化地物代表的预先培训模式相结合 2507.07197v1
  • 212 07-09 Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching Bridging the Last Mile of Prediction: Verbesserung der Zeitreihenvorhersage mit konditional gesteuertem Flow Matching 连接预测的最后一环:加强时间序列预测与有条件的引导流动匹配 2507.07192v1
  • 213 07-09 Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs Bepflanzt in der Vorausbildung, durch Finetuning abgeschwächt: Eine Fallstudie über die Herkunft von Kognitiv-Biasen in LLMs 编在培训前编,《微调:关于LLM中认知性双星起源的个案研究》,《微调摇摇晃》 2507.07186v1
  • 214 07-09 Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor Auf dem Weg zum multimodalen Verständnis durch stabile Diffusion als Task-Aware Feature Extractor 通过稳定传播实现多模式理解,作为任务软件特写提取器 2507.07106v1
  • 215 07-09 Does Data Scaling Lead to Visual Compositional Generalization? Führt die Datenskalierung zu einer visuellen Verallgemeinerung der Zusammensetzung? 数据缩放是否导致视觉构成的概括化? 2507.07102v1
  • 216 07-09 Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful Kleine Batch Size Training für Sprachmodelle: Wenn Vanilla SGD funktioniert, und warum gradient Akkumulation verschwenderisch ist 语言模式小批量规模培训:Vanilla SGD何时奏效,为何梯度累积是浪费的 2507.07101v1
  • 217 07-09 Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts Umgang mit unausgewogenem Domain-Incremental-Lernen durch Dual-Balance-Kollaborative Experten 通过双平衡合作专家解决不平等问题 2507.07100v1
  • 218 07-09 From Pseudorandomness to Multi-Group Fairness and Back Von Pseudorandomness zu Multi-Group Fairness und Back 从优多度到多群体公平与后向 2301.08837v4
  • 219 07-09 Large-scale portfolio optimization with variational neural annealing Große Portfolio-Optimierung mit variationaler neuronaler Glühung 大规模组合组合优化,采用变异性神经净化 2507.07159v1
  • 220 07-09 Less can be more for predicting properties with large language models Weniger kann mehr für die Vorhersage von Eigenschaften mit großen Sprachmodellen sein 用大语言模型预测属性可以少得多 2406.17295v3
  • 221 07-09 Multi-Attribute Steering of Language Models via Targeted Intervention Multi-Attribute-Lenkung von Sprachmodellen durch gezielte Intervention 通过有针对性的干预措施,多语言模式多来源指导 2502.12446v2
  • 222 07-09 How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks Wie man die Sim-to-Real Gap in digitalen Twin-Aided Telecommunication Networks überbrücken kann 如何弥合数字双辅助电信网络的一对一差距 2507.07067v1
  • 223 07-09 LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits LASeR: Mit Multi-Armed Bandits lernen, Belohnungsmodelle adaptiv zu wählen LASeR: 学习用多武装强盗来灵活选择奖赏模型 2410.01735v2
  • 224 07-09 Interpretable EEG-to-Image Generation with Semantic Prompts Interpretierbare EEG-to-Image-Generation mit semantischen Prompts 具有语义提示的可解释的 EEG 至 图像生成 2507.07157v1
  • 225 07-09 ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment ROCKET-2: Steuerung der Visuomotorik über Cross-View Goal Alignment ROCKET-2:通过交叉视野目标协调指导维苏莫多政策 2503.02505v2
  • 226 07-09 Geometry-Informed Neural Operator Transformer Geometrie-informierter Neuraloperator Transformer 智能神经操作器变换器 2504.19452v4
  • 227 07-09 Low-Rank Adaptation Secretly Imitates Differentially Private SGD Low-Rank-Anpassung hinterhältig imitiert unterschiedlich private SGD 低浓度适应 秘密模仿 不同的私人 SGD 2409.17538v7
  • 228 07-09 A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering Eine neuartige, hybride Deep-Learning-Technik zur Sprachemotionserkennung mit Feature Engineering 利用地貌工程探测语音情感的新混合深层学习技术 2507.07046v1
  • 229 07-09 Non-Asymptotic Analysis of Online Local Private Learning with SGD Nicht-asymptotische Analyse des lokalen Online-Lernens mit SGD 与SGD合作对本地私人在线学习进行非心理分析 2507.07041v1
  • 230 07-09 Self-Supervised Learning at the Edge: The Cost of Labeling Selbstüberwachtes Lernen am Rande: Die Kosten der Kennzeichnung 边缘自导学习:标签成本 2507.07033v1
  • 231 07-09 ZKTorch: Compiling ML Inference to Zero-Knowledge Proofs via Parallel Proof Accumulation ZKTorch: Kompilieren von ML-Inferenz zu Null-Wissens-Proofs durch parallele Proof-Kumulation ZKTorch:通过平行证据累积,将ML推论编成零知识证据 2507.07031v1
  • 232 07-09 Scaling 4D Representations Skalierung von 4D-Darstellungen 缩放 4D 代表制 2412.15212v2
  • 233 07-09 Topological Machine Learning with Unreduced Persistence Diagrams Topologisches maschinelles Lernen mit nicht reduzierten Persistenzdiagrammen 具有未减少持久性图谱图的地形机学习 2507.07156v1
  • 234 07-09 On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence On-Device-Training von PV-Stromprognosemodellen in einem Smart Meter für Grid Edge Intelligence 电网边缘情报智能光电预报模型在线设计培训 2507.07016v1
  • 235 07-09 MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation MST-Distill: Mischung von spezialisierten Lehrern für Cross-Modal Knowledge Destillation MST-Distill:跨模式知识蒸馏专业教师混合 2507.07015v1
  • 236 07-09 When Context Is Not Enough: Modeling Unexplained Variability in Car-Following Behavior Wenn der Kontext nicht genug ist: Unerklärliche Variabilität im Auto-folgenden Verhalten modellieren 内容不够充分时:模拟汽车跟踪行为中未解释的变异性 2507.07012v1
  • 237 07-09 TokenShapley: Token Level Context Attribution with Shapley Value TokenShapley: Token Level Context Attribution mit Shapley Wert TokenShapley: 具有形状价值的 Token 级别背景属性 2507.05261v2
  • 238 07-09 Exact Evaluation of the Accuracy of Diffusion Models for Inverse Problems with Gaussian Data Distributions Exakte Bewertung der Genauigkeit von Diffusionsmodellen für inverse Probleme mit Gaußschen Datenverteilungen 对高山数据发布反问题传播模型准确性精确评价 2507.07008v1
  • 239 07-09 GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning GNN-ViTCap: GNN-erweitertes Multiple-Instance-Lernen mit Vision-Transformern für die Klassifizierung und Beschriftung von ganzen Folienbildern GNN-ViTCap:GNN增强的多实例学习,结合视觉变换器进行全幻灯片图像分类和说明 2507.07006v1
  • 240 07-09 Skewed Score: A statistical framework to assess autograders Skewed Score: Ein statistischer Rahmen zur Bewertung von Autogradern 倾斜分分数:评估自动降解器的统计框架 2507.03772v2
  • 241 07-09 Multi-Modality Conditioned Variational U-Net for Field-of-View Extension in Brain Diffusion MRI Multi-Modality Conditioned Variational U-Net für Feld-of-View-Erweiterung in Gehirn-Diffusion MRT 用于脑传播MRI中实地观察扩展的多式移动式有条件U-Net 2409.13846v2
  • 242 07-09 Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs Bewusst lernen, intuitiv handeln: Entsperren von Test-Time-Reasoning in multimodalen LLMs 故意、自觉地学习:在多模式LMs中解锁试验时间理由 2507.06999v1
  • 243 07-09 Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks Federated Learning-based MARL zur Stärkung der Physical-Layer-Sicherheit in B5G-Netzwerken 加强B5G网络中人身安全的联邦学习型MARL 2507.06997v1
  • 244 07-09 Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing Erzeugung von Multi-Table-Zeitreihen EHR aus dem Latent Space mit minimaler Vorverarbeitung 利用最小预处理器从冷藏空间生成多表时间序列 EHR 2507.06996v1
  • 245 07-09 A Principled Framework for Multi-View Contrastive Learning Ein prinzipielles Rahmenwerk für das kontrastive Lernen mit mehreren Ansichten 多方审查差异学习原则性框架 2507.06979v1
  • 246 07-09 PyPOTS: A Python Toolkit for Machine Learning on Partially-Observed Time Series PyPOTS: Ein Python-Toolkit für maschinelles Lernen auf teilbeobachteten Zeitreihen PyPOTS:关于部分观测时间序列的机器学习的Python工具包 2305.18811v2
  • 247 07-09 Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy Vereinheitlichung von Re-Identifikations-, Attribut-Schlussfolgerungs- und Datenwiederaufbaurisiken in der unterschiedlichen Privatsphäre 不同隐私中的识别、属性推断和数据重建风险 2507.06969v1
  • 248 07-09 Noisy PDE Training Requires Bigger PINNs Noisy PDE Training erfordert größere PINNs 吵闹的PDE培训需要更大的 PINN 2507.06967v1
  • 249 07-09 Pullback Flow Matching on Data Manifolds Pullback Flow Matching auf Daten-Mannigfaltigkeiten 数据流形上的拉回流匹配 2410.04543v2
  • 250 07-09 Off-Policy Evaluation Under Nonignorable Missing Data Off-Policy-Bewertung unter nicht entschuldbaren fehlenden Daten 在无污名失踪数据项下的非政策评价 2507.06961v1
  • 251 07-09 From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis Vom Video zum EEG: Anpassung der gemeinsamen Einbettung von vorausschauender Architektur an die Entdeckung visueller Konzepte in der Gehirnsignalanalyse 从视频到EEG:使联合嵌入的预测结构适应脑信号分析中的不可见视觉概念 2507.03633v3
  • 252 07-09 Bayesian Invariance Modeling of Multi-Environment Data Bayesische Invarianz Modellierung von Multi-Umwelt-Daten 多种环境数据的贝耶斯因果建模 2506.22675v3
  • 253 07-09 Generating Heterogeneous Multi-dimensional Data : A Comparative Study Heterogene mehrdimensionale Daten generieren: Eine vergleichende Studie 生成异质多维数据:比较研究 2507.00090v2
  • 254 07-09 Representative Ranking for Deliberation in the Public Sphere Repräsentative Rangliste für die Beratung im öffentlichen Bereich 公共领域评议代表 2503.18962v2
  • 255 07-09 ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease ADPv2: Ein hierarchischer histologischer Gewebetyp-Annotierter Datensatz für potenzielle Biomarker Entdeckung von Kolorektalerkrankungen ADPv2: 用于潜在直肠疾病生物标志发现的潜在生物标志数据集 2507.05656v2
  • 256 07-09 DICE: Data Influence Cascade in Decentralized Learning DICE: Dateneinfluss Cascade im dezentralisierten Lernen DICE: 分散学习中的数据影响级联 2507.06931v1
  • 257 07-09 Machine-Learned Force Fields for Lattice Dynamics at Coupled-Cluster Level Accuracy Machine-learned Force Fields for Lattice Dynamics at Coupled-Cluster Level Accuracy 混合集成级精密度拉蒂动态的机生成力场 2507.06929v1
  • 258 07-09 Distribution-free inference for LightGBM and GLM with Tweedie loss Verteilungsfreie Inferenz für LightGBM und GLM mit Tweedie-Verlust 对Tweedie损失的轻GBM和GLM灯光和GLM的无分配式推推 2507.06921v1
  • 259 07-09 Robust and Safe Traffic Sign Recognition using N-version with Weighted Voting Robuste und sichere Verkehrszeichenerkennung mit N-Version mit gewichteter Abstimmung 以加权投票方式使用N版本进行强力和安全交通信号识别 2507.06907v1
  • 260 07-09 Neural Canonical Polyadic Factorization for Traffic Analysis Neurale kanonischen Polyadische Faktorisierung für die Verkehrsanalyse 用于交通分析的神经卡纳尼基多功能因子化 2506.15079v3
  • 261 07-09 Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams Konzipieren von adaptiven Algorithmen auf Basis von Verstärkungslernen zur dynamischen Optimierung der Schiebefenstergröße in multidimensionalen Datenströmen 基于强化学习,使多维数据流中滑动窗口大小动态优化的强化学习设计适应性比值 2507.06901v1
  • 262 07-09 SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN SCoRE: Verstärkte Corpus-basierte Relation-Extraktion mit multi-Label Kontrastivem Lernen und Bayesian kNN SCoRE:利用多标签差异学习和Bayesian kNN简化公司关系采掘 2507.06895v1
  • 263 07-09 Horizontal and Vertical Federated Causal Structure Learning via Higher-order Cumulants Horizontales und vertikales Federated Causal Structure Learning über hochgeordnete Kumulanten 横向和垂直联邦结业结构通过高阶碳素学习 2507.06888v1
  • 264 07-09 A Survey on Event Prediction Methods from a Systems Perspective: Bringing Together Disparate Research Areas Eine Umfrage zu Event Prediction Methods aus einer Systemperspektive: Gemeinsam unterschiedliche Forschungsbereiche 从系统角度对事件预测方法的调查:将分离研究领域结合起来 2302.04018v2
  • 265 07-09 Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack Problems Konsequentitäts-Robustheits-Trade-Offs für Learning-Augmented Online-Knapsack-Probleme 学习增强型在线知识包在线问题近最佳一致度-交战贸易差额 2406.18752v2
  • 266 07-09 IntOPE: Off-Policy Evaluation in the Presence of Interference IntOPE: Off-Policy-Evaluierung in Gegenwart von Störungen IntOPE: 干涉情况下的非政策评价 2408.13484v2
  • 267 07-09 Conformal Prediction for Long-Tailed Classification Konforme Vorhersagen für die lange hintere Klassifizierung 长期驾驶分类的正规预测 2507.06867v1
  • 268 07-09 Episodic Contextual Bandits with Knapsacks under Conversion Models Episodic Contextual Bandits mit Knapsacks unter Conversion Models 转换模型下带有Knapsack 背包的“瞬时背景强盗” 2507.06859v1
  • 269 07-09 Adaptive Elicitation of Latent Information Using Natural Language Adaptive Vermittlung von latenten Informationen mit natürlicher Sprache 利用自然语言调适性易感信息使用自然语言 2504.04204v2
  • 270 07-09 DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models DiffSpectra: Molekulare Strukturerklärung von Spectra mit Diffusionsmodellen DiffSpectra: 使用扩散模型从光谱进行分子结构解析 2507.06853v1
  • 271 07-09 EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning EMORL: Ensemble Multi-Objektive Verstärkungs-Lernen für effizientes und flexibles LLM-Fine-Tuning EMORL: 将多目标强化学习组合起来,促进高效率和灵活的LLM 微调 2505.02579v3
  • 272 07-09 Adaptive collaboration for online personalized distributed learning with heterogeneous clients Adaptive Zusammenarbeit für online personalisiertes verteiltes Lernen mit heterogenen Kunden 与不同客户进行适应性协作,促进在线个性化个人化分布式学习 2507.06844v1
  • 273 07-09 Towards Collaborative Anti-Money Laundering Among Financial Institutions Auf dem Weg zur gemeinsamen Geldwäschebekämpfung unter den Finanzinstituten 实现金融机构之间反洗钱合作 2502.19952v3
  • 274 07-09 Scalable Gaussian Processes: Advances in Iterative Methods and Pathwise Conditioning Skalierbare Gaußprozesse: Fortschritte bei iterativen Methoden und pfadweisen Konditionierungen 可缩放高斯进程:迭接方法和路径性条件方面的进展 2507.06839v1
  • 275 07-09 PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection PBCAT: Patch-basiertes komposites gegnerisches Training gegen physikalisch realisierbare Angriffe auf Objekterkennung PBCAT:针对实际可实现的物体探测攻击的补补综合对抗性训练 2506.23581v2
  • 276 07-09 Fredholm Neural Networks for forward and inverse problems in elliptic PDEs Fredholm Neural Networks für vorwärts und inverse Probleme in elliptischen PDEs Fredholm神经网络,处理椭圆形PDE中前向和反向问题 2507.06038v2
  • 277 07-09 Comprehensive Evaluation of Prototype Neural Networks Umfassende Evaluierung von Prototypen-Neuralen Netzwerken 原型神经网络综合评价 2507.06819v1
  • 278 07-09 Bayesian Multi-Scale Neural Network for Crowd Counting Bayesian Multi-Scale Neural Network für die Crowd Counting Bayesian 多人计数多空间神经网络 2007.14245v4
  • 279 07-09 Designing Robust Software Sensors for Nonlinear Systems via Neural Networks and Adaptive Sliding Mode Control Entwerfen robuster Softwaresensoren für nichtlineare Systeme über neurale Netzwerke und adaptive Schiebemodussteuerung 通过神经网络和适应性滑动模式控制,为非线性系统设计强力软件传感器 2507.06817v1
  • 280 07-09 Intrinsic Training Signals for Federated Learning Aggregation Intrinsische Trainingssignale für die Federated Learning Aggregation 联邦学习综合组织内部培训信号 2507.06813v1
  • 281 07-09 Noise tolerance via reinforcement: Learning a reinforced quantum dynamics Geräuschtoleranz durch Verstärkung: Erlernen einer verstärkten Quantendynamik 通过加强增强的噪音容忍:学习强化量子动力学 2506.12418v2
  • 282 07-09 Very fast Bayesian Additive Regression Trees on GPU Sehr schnelle Bayesian Additive Regression Bäume auf GPU 在 GPU 上非常快速的 Bayesian 补退树 2410.23244v2
  • 283 07-09 A Wireless Foundation Model for Multi-Task Prediction Ein drahtloses Foundation-Modell für Multi-Task-Vorhersage 无线电无线基金会多任务预测模型 2507.05938v2
  • 284 07-09 Speech Tokenizer is Key to Consistent Representation Sprach-Tokenizer ist der Schlüssel zur konsistenten Repräsentation 语音收发器是统一代表的关键 2507.06802v1
  • 285 07-09 Neural Networks for Tamed Milstein Approximation of SDEs with Additive Symmetric Jump Noise Driven by a Poisson Random Measure Neurale Netzwerke für gezähmte Milstein-Annäherung von SDEs mit additivem symmetrischem Sprunggeräusch, angetrieben durch eine Poisson Random Measure 利用 Poisson 随机测量法驱动的 添加对称性跳跃振动的 SDE 近似电磁神经网络 2507.04417v2
  • 286 07-09 The cost of ensembling: is it always worth combining? Die Kosten für die Montage: lohnt es sich immer zu kombinieren? 集合的成本:总值得合并吗? 2506.04677v2
  • 287 07-09 Test-Time Scaling with Reflective Generative Model Test-Zeit-Skalierung mit reflektierendem Generativem Modell 具有反反思考生成模型的试验时间缩放 2507.01951v2
  • 288 07-09 Temporal Information Retrieval via Time-Specifier Model Merging Temporal Information Retrieval über Zeit-Spezifikator Modellverschmelzung 通过时间规格模型合并获取时间信息 2507.06782v1
  • 289 07-09 Learning safe, constrained policies via imitation learning: Connection to Probabilistic Inference and a Naive Algorithm Sicheres Lernen, eingeschränkte Richtlinien durch Nachahmung des Lernens: Verbindung zur probabilistischen Inferenz und zu einem Naive-Algorithmus 通过模仿学习学习,学习安全、有限制的政策:与概率推论和自然算法的联系 2507.06780v1
  • 290 07-09 Tailoring deep learning for real-time brain-computer interfaces: From offline models to calibration-free online decoding Maßgeschneidertes Deep Learning für Echtzeit-Brain-Computer-Schnittstellen: Von Offline-Modellen zur kalibrierungsfreien Online-Dekodierung 实时脑计算机界面的深层学习:从离线模型到无校准的在线解码 2507.06779v1
  • 291 07-09 Mutual Information Free Topological Generalization Bounds via Stability Gegenseitige Informationen Freie Topologische Verallgemeinerung Grenzen durch Stabilität 通过稳定建立无地形普遍化的互信信息 2507.06775v1
  • 292 07-09 From Gradient Clipping to Normalization for Heavy Tailed SGD Vom gradienten Clipping zur Normalisierung für schweres Tailed SGD 从重尾SGD的梯度滑动到正常化 2410.13849v3
  • 293 07-09 Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking Tail-aware Adversarial Attacks: Ein Distributionsansatz für effizientes LLM-Jailbreaking 反性攻击:高效LLM监狱破解的分发办法 2507.04446v2
  • 294 07-09 Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric Robustes Deep Network Lernen von nichtlinearen Regressionsaufgaben durch parametrische Leaky Exponential Linear Units (LELUs) und eine Diffusion Metric 通过参数激光光极线性单位(LELUs)和扩散计量器对非线性回归任务进行强有力的深网络学习 2507.06765v1
  • 295 07-09 Fast Equivariant Imaging: Acceleration for Unsupervised Learning via Augmented Lagrangian and Auxiliary PnP Denoisers Fast Equivariant Imaging: Beschleunigung für unüberwachtes Lernen über Augmented Lagrangian und Auxiliary PnP Denoisers 快速等同成像:通过增强拉格朗江语和辅助PnP Denoisers语加速未受监督的学习 2507.06764v1
  • 296 07-09 Reinforcement Learning-based Feature Generation Algorithm for Scientific Data Auf Verstärkungslernen basierender Feature-Generierungs-Algorithmus für wissenschaftliche Daten 面向科学数据的基于强化学习的特征生成算法 2507.03498v2
  • 297 07-09 Mathematical artificial data for operator learning Mathematische künstliche Daten für das Bedienerlernen 用于操作员学习的数学人工数据 2507.06752v1
  • 298 07-09 A Blockchain Solution for Collaborative Machine Learning over IoT Eine Blockchain-Lösung für kollaboratives maschinelles Lernen über IoT 合作机构在IoT上学习的链链解决方案 2311.14136v2
  • 299 07-09 EFKAN: A KAN-Integrated Neural Operator For Efficient Magnetotelluric Forward Modeling EFKAN: Ein KAN-Integrierter Neuraloperator für effiziente Magnetotellurische Vorwärtsmodellierung EFKAN: 高效磁电离子推进建模的KAN综合神经操作员 2502.02195v2
  • 300 07-09 Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching Dual-Granularity Cross-Modal Identity Association für schwach-überwachte Text-zu-Personen-Bild-Matching 弱度超强文本到人图像匹配双重通用跨模式身份协会 2507.06744v1
  • 301 07-09 Residual Prior-driven Frequency-aware Network for Image Fusion Residual Prior-driven Frequency-aware Netzwerk für Bild-Fusion 图像融合超前驱动频率感知网络 2507.06735v1
  • 302 07-09 Torsion in Persistent Homology and Neural Networks Torsion in persistenter Homologie und neuronalen Netzwerken 持续同调与神经网络中的挠 2506.03049v2
  • 303 07-09 On the Effect of Uncertainty on Layer-wise Inference Dynamics Über den Effekt der Unsicherheit auf die schichtweise Schlussfolgerungsdynamik 不确定性对图层推论作用的影响 2507.06722v1
  • 304 07-09 Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning Multiparameter-Steuerung für das $(1+(λ,λ))$-GA auf OneMax über Deep Reinforcement Learning 通过深度强化学习对 OneMax 上的 $(1+(λ,λ))$-GA 进行多参数控制 2505.12982v2
  • 305 07-09 From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution Von Blurry bis Brillant Detection: YOLO-basierte Aerial Object Detection mit Super-Auflösung 从布鲁里到辉煌的探测:以YOLO为基地的超分辨率探测天体物体 2401.14661v2
  • 306 07-09 PINN-Obs: Physics-Informed Neural Network-Based Observer for Nonlinear Dynamical Systems PINN-Obs: Physik-informierter neuraler Netzwerk-basierter Beobachter für nichtlineare dynamische Systeme PINN-Obs:非线性动态系统物理内成神经网络观察员 2507.06712v1
  • 307 07-09 Causal Inference Isn’t Special: Why It’s Just Another Prediction Problem Kausale Schlussfolgerung ist nicht besonders: Warum es nur ein weiteres Vorhersageproblem ist 原因推论不是特别的:为什么这只是另一个预测问题? 2504.04320v3
  • 308 07-09 Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity Diversifizierung von Roboter-Lokomotionsverhalten mit extrinsischer Verhaltenskuriosität 具有极端行为判断力的多样化机器人行为行为 2410.06151v3
  • 309 07-09 Autonomy by Design: Preserving Human Autonomy in AI Decision-Support Autonomie nach Design: Erhaltung der menschlichen Autonomie in der KI-Entscheidungsunterstützung 设计自主:在AI决定支持中维护人类自主 2506.23952v3
  • 310 07-09 Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement Wert aus Beobachtungen: Auf dem Weg zu groß angelegtem Imitations-Lernen durch Selbstverbesserung 从观察中得出的价值:通过自我改进实现大型模拟学习 2507.06701v1
  • 311 07-09 Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems Multi-Task Offline-Verstärkung Lernen für Online-Werbung in Recommender-Systemen 顾问系统中在线广告网上广告多任务离线强化学习 2506.23090v2
  • 312 07-09 An Optimisation Framework for Unsupervised Environment Design Ein Rahmen für die Optimierung des unbeaufsichtigten Umweltdesigns 无人监督环境设计优化框架 2505.20659v2
  • 313 07-09 Heterogeneous Graph Neural Networks for Short-term State Forecasting in Power Systems across Domains and Time Scales: A Hydroelectric Power Plant Case Study Heterogene Graph Neural Networks for Short-term State Forecasting in Power Systems across Domains and Time Scales: A Hydroelectric Power Plant Case Study 用于全域和时标电力系统短期国家预报的异异变形图形神经网络:水电厂案例研究 2507.06694v1
  • 314 07-09 A statistical approach to latent dynamic modeling with differential equations Ein statistischer Ansatz zur latenten dynamischen Modellierung mit Differentialgleichungen 利用差异方程进行潜在动态建模的统计方法 2311.16286v2
  • 315 07-09 Hierarchical Procedural Framework for Low-latency Robot-Assisted Hand-Object Interaction Hierarchischer Verfahrensrahmen für die Interaktion von Robotern mit geringer Latenz mit Hand-Objekten 低弹性机器人辅助手用物体相互作用的等级程序框架 2405.19531v3
  • 316 07-09 Class conditional conformal prediction for multiple inputs by p-value aggregation Klasse bedingte Konformitätsvorhersage für mehrere Eingänge durch p-Wert-Aggregation 通过 p-value 集合对多种投入进行有条件的分类一致预测 2507.07150v1
  • 317 07-09 Fast Gaussian Processes under Monotonicity Constraints Schnelle Gauß-Prozesse unter Monotonik-Beschränkungen 在单度制约下快速高斯进程 2507.06677v1
  • 318 07-09 DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training DAF: Ein effizientes dynamisches Aktivierungs-Framework für On-Device-DNN-Schulungen DAF: 设计DNN培训的高效端对端动态启动框架 2507.07149v1
  • 319 07-09 Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey Erklärbare Künstliche Intelligenz in der biomedizinischen Bildanalyse: Eine umfassende Umfrage 生物医学图像分析中可解释的人工智能:全面调查 2507.07148v1
  • 320 07-09 Enhancing Diffusion Model Stability for Image Restoration via Gradient Management Verbesserung der Diffusions-Modellstabilität für die Bildwiederherstellung durch Gradient Management 通过梯度管理加强通过图像恢复的传播模型稳定模型 2507.06656v1
  • 321 07-09 Federated Learning Inspired Fuzzy Systems: Decentralized Rule Updating for Privacy and Scalable Decision Making Federated Learning Inspired Fuzzy Systems: Dezentrale Regelaktualisierung für Datenschutz und skalierbare Entscheidungsfindung 受联邦学习启发的模糊系统:面向隐私和可扩展决策的去中心化规则更新 2507.06652v1
  • 322 07-09 Deep Disentangled Representation Network for Treatment Effect Estimation Deep Disentangled Representation Network for Treatment Effect Abschätzung 治疗效果估计的深度分解代表网 2507.06650v1
  • 323 07-09 AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model AHCPTQ: Genaue und hardwarekompatible Post-Training-Quantisierung für das Segment-Anything-Modell AHCPTQ: 面向 Segment Anything 模型的准确且硬件兼容的训练后量化 2503.03088v2
  • 324 07-09 Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning Lost in Retraining: Roaming des Parameterraums exponentieller Familien unter geschlossenem Loop-Lernen 损失在再培训中:在闭路学习下,在封闭式学习下,对有生命力的家庭的参数空间进行Roaming 2506.20623v2
  • 325 07-09 Learning from Sparse Point Labels for Dense Carcinosis Localization in Advanced Ovarian Cancer Assessment Lernen von Sparse Point-Labels für Dense Carcinosis Lokalisierung im fortgeschrittenen Ovarialkrebs-Assessment 在晚期卵巢癌评估中,从稀疏点标签学习以实现癌性病变(carcinosis)的密集定位 2507.06643v1
  • 326 07-09 EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision EXAONE Path 2.0: Pathologie-Foundation-Modell mit End-to-End-Supervision EXAONE Path 2.0:端到端监督的病理学基础模型 2507.06639v1
  • 327 07-09 Semi-parametric Functional Classification via Path Signatures Logistic Regression Semiparametrische Funktionsklassifizierung über Pfadsignaturen Logistische Regression 通过 “ 路径签名 “ 进行半参数功能分类 2507.06637v1
  • 328 07-09 PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER) PBa-LLM: Datenschutz- und Bias-aware NLP mit Named-Entity-Erkennung (NER) PBa-LLM: 使用命名实体识别(NER)的隐私和偏见感知 NLP 2507.02966v2
  • 329 07-09 Prevention of Overfitting on Mesh-Structured Data Regressions with a Modified Laplace Operator Vermeidung von Überrüstungen auf netzstrukturierte Datenregressionen mit einem modifizierten Laplace-Operator 使用变换的拉拉布操作器防止对网状结构化数据递减过度适应 2507.06631v1
  • 330 07-09 Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation Gewichtetes Multi-Prompt-Lernen mit beschreibungsfreier großsprachiger Modelldestillation 无说明无大语言模式的加权多速学习 2507.07147v1
  • 331 07-09 An attention-aware GNN-based input defender against multi-turn jailbreak on LLMs Ein aufmerksamkeitsbewusster GNN-basierter Eingabeverteidiger gegen Multi-Turn-Jailbreaks auf LLMs 针对 LLM 多轮越狱的基于 GNN 的注意力感知输入防御器 2507.07146v1
  • 332 07-09 Multi-objective methods in Federated Learning: A survey and taxonomy Multi-objektive Methoden im Federated Learning: Eine Umfrage und Taxonomie 联邦学习的多目标方法:调查和分类 2502.03108v2
  • 333 07-09 Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning Zielorientierte Skill Abstraction für Offline-Multi-Task-Verstärkung Lernen 面向目标的离线多任务强化学习技能总结 2507.06628v1
  • 334 07-09 Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic Q-STAC: Q-geführtes Stein-Variantenmodell Predictive Actor-Critic Q-STAC: Q-受控斯坦变化模型预测动动作用-批评 2507.06625v1
  • 335 07-09 UniOD: A Universal Model for Outlier Detection across Diverse Domains UniOD: Ein universelles Modell für Ausreißer-Erkennung über verschiedene Domains UniOD: 跨不同领域的异常值检测通用模型 2507.06624v1
  • 336 07-09 Saffron-1: Safety Inference Scaling Saffron-1: Skalierung der Sicherheitsinferenz Saffron-1:安全推理扩展 2506.06444v2
  • 337 07-09 Steps Adaptive Decay DPSGD: Enhancing Performance on Imbalanced Datasets with Differential Privacy with HAM10000 Schritte Adaptive Decay DPSGD: Leistungssteigerung auf unausgeglichenen Datensätzen mit Differential Privacy mit HAM10000 步骤自适应衰减 DPSGD:以 HAM10000 提升差分隐私下不平衡数据集上的性能 2507.06619v1
  • 338 07-09 Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance Effizientes Multi-Task-Verstärktes Lernen mit bereichsübergreifender politischer Anleitung 具有跨任务政策指导的高效多任务强化学习 2507.06615v1
  • 339 07-09 Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning Klassifizierung von Autoimmunerkrankungen aus peripheren Blut-TCR-Repertoires durch multimodales Multi-Instanz-Lernen 通过多式多机构学习,从周边血液中的自体免疫疾病 TCR 孔径通过多功能多功能学习分类 2507.04981v3
  • 340 07-09 Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation Denoising Multi-Beta VAE: Repräsentationslernen für Entflechtung und Generierung 去噪 Multi-Beta VAE:面向解耦与生成的表示学习 2507.06613v1
  • 341 07-09 Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Decoder-Hybrid-Decoder-Architektur für effizientes Schlussfolgern bei langer Generierung 面向长生成高效推理的 Decoder-Hybrid-Decoder 架构 2507.06607v1
  • 342 07-09 Generalization in Reinforcement Learning for Radio Access Networks Generalisierung im Ausbau-Lernen für Funkzugangsnetze 无线电接入网络强化学习一般化 2507.06602v1
  • 343 07-09 SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model SeisMoLLM: Verbesserung der seismischen Überwachung durch cross-modalen Transfer mit vortrainiertem Großsprachenmodell SeisMoLLM:利用预训练大语言模型,通过跨模态迁移推进地震监测 2502.19960v2
  • 344 07-09 FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models FEVO: Finanzwissenserweiterung und -entwicklung für große Sprachmodelle FEVO: 金融知识的扩大和大语言模式的推理演变 2507.06057v2
  • 345 07-09 Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots Hysterese-Aware Neural Network Modeling und Ganzkörper-Verstärkung Learning Control von Soft Robots 软机器人的神经网络建模和整体加强学习控制 2504.13582v2
  • 346 07-09 Learning controllable dynamics through informative exploration Lernen steuerbare Dynamik durch informative Erkundung 通过信息探索进行学习控制动态 2507.06582v1
  • 347 07-09 Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints Diffusionsgetriebene semantische Kommunikation für generative Modelle mit Bandbreitenbeschränkungen 在带宽限制下生成模型的传播-驱动性语义交流 2407.18468v4
  • 348 07-09 From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization Von Daten-Centric zu Proben-Centric: Verbesserung der LLM-Reasoning durch progressive Optimierung 从数据中心到抽样中心:通过渐进优化提高LLM的教学理由 2507.06573v1
  • 349 07-09 CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs CCQ: Faltungscode für extreme Low-Bit-Quantisierung in LLMs CCQ: 用于 LLM 极低比特量化的卷积码 2507.07145v1
  • 350 07-09 SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference SlimCaching: Kanten-Caching von Mixture-of-Experts für verteilte Inferenz SlimCaching: 分布式推断的混合专家的边缘缓存 2507.06567v1
  • 351 07-09 PASS: Private Attributes Protection with Stochastic Data Substitution PASS: Schutz privater Attribute mit stochastischer Datenersetzung PASS: 私人财产保护,用存储数据替代数据 2506.07308v2
  • 352 07-09 CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback CHAI für LLMs: Verbesserung der code-gemischten Übersetzung in großen Sprachmodellen durch Verstärkungslernen mit KI-Feedback 面向 LLM 的 CHAI:通过基于 AI 反馈的强化学习改进大语言模型的语码混合翻译 2411.09073v3
  • 353 07-09 The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production Die Fehler anderer: Ein LLM-gestützter Rahmen für die wissenschaftliche Wissensproduktion 他人的缺陷:由 LLM 驱动的科学知识生产框架 2507.06565v1
  • 354 07-09 Efficient Transfer Learning via Causal Bounds Effizientes Transfer-Lernen über Causal Bounds 通过 “ 因果关系 “ 进行高效转让学习 2308.03572v5
  • 355 07-09 Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback 超前优先优化(EGPO):超越纳什最后的融合(Nash) 从人类反馈中学习 2503.08942v3
  • 356 07-09 Divergence-Based Similarity Function for Multi-View Contrastive Learning Divergenzbasierte Ähnlichkeitsfunktion für das kontrastive Lernen mit mehreren Ansichten 多查看差异学习的基于差异的相似性功能 2507.06560v1
  • 357 07-09 The Primacy of Magnitude in Low-Rank Adaptation Das Primat der Magnitude in der Low-Rank-Anpassung 低Rank适应中磁力的优先性 2507.06558v1
  • 358 07-09 On the Hardness of Unsupervised Domain Adaptation: Optimal Learners and Information-Theoretic Perspective Über die Härte der unüberwachten Domänenanpassung: Optimale Lernende und informationstheoretische Perspektive 关于未受监督的领域适应的难度:最佳学习者和信息理论视角 2507.06552v1
  • 359 07-09 Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery Q2SAR: Ein Quantum-Multiple-Kernel-Lernansatz für die Drogenentdeckung Q2SAR:药物发现量子多核心学习方法 2506.14920v3
  • 360 07-09 Deep-Learning-Based Pre-Layout Parasitic Capacitance Prediction on SRAM Designs Deep-Learning-basierte Pre-Layout-Vorhersage parasitärer Kapazitäten für SRAM-Designs 基于深度学习的 SRAM 设计版图前寄生电容预测 2507.06549v1
  • 361 07-09 Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution Konzept-TRAK: Verstehen, wie Diffusionsmodelle Konzepte durch Konzept-Level-Attribution lernen 概念-TRAK:了解传播模式如何通过概念层面的归属来学习概念 2507.06547v1
  • 362 07-09 Semantic Augmentation in Images using Language Semantische Augmentation in Bildern mit Sprache 语言图像中的语义放大 2404.02353v2
  • 363 07-09 A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning Eine einzige Zusammenführung: Wiederherstellung serverbasierter Lernleistung im dezentralisierten Lernen 单一合并条件:在分散学习中恢复基于服务器的学习绩效 2507.06542v1
  • 364 07-09 Attribution Regularization for Multimodal Paradigms Zuordnungsregularisierung für multimodale Paradigmen 多式联运标准规范化 2404.02359v2
  • 365 07-09 Graph-based Fake Account Detection: A Survey Grafische Fake Account Detection: Eine Umfrage 以图表为基础的假账户探测:调查 2507.06541v1
  • 366 07-09 Understanding Malware Propagation Dynamics through Scientific Machine Learning Malware-Propagation-Dynamik durch wissenschaftliches maschinelles Lernen verstehen 通过科学机器学习了解恶意传播动态 2507.07143v1
  • 367 07-09 Few-shot Learning on AMS Circuits and Its Application to Parasitic Capacitance Prediction Few-Shot-Lernen auf AMS-Schaltungen und seine Anwendung auf die Vorhersage parasitärer Kapazitäten AMS 电路上的少样本学习及其在寄生电容预测中的应用 2507.06538v1
  • 368 07-09 Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits Übertragbare parasitäre Abschätzung über Graph Contrastive Learning und Label Rebalancing in AMS-Schaltungen AMS 电路中通过图对比学习和标签再平衡进行可迁移的寄生参数估计 2507.06535v1
  • 369 07-09 Medical Image Segmentation Using Advanced Unet: VMSE-Unet and VM-Unet CBAM+ Medizinische Bildsegmentierung mit Advanced Unet: VMSE-Unet und VM-Unet CBAM+ 使用高级 Unet 的医学图像分割:VMSE-Unet 和 VM-Unet CBAM+ 2507.00511v2
  • 370 07-09 From large-eddy simulations to deep learning: A U-net model for fast urban canopy flow predictions Von Big-eddy-Simulationen bis zum Deep Learning: Ein U-Net-Modell für schnelle urbane Baldachinflussprognosen 从大成模拟到深学习:快速城市树冠流预测的U-net模型 2507.06533v1
  • 371 07-09 A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence Ein Politik-Gradient-Ansatz zur Lösung von Unvollkommenheits-Informationsspielen mit bestmöglicher Konvergenz 以政策优先方式解决最不完美信息游戏与最佳战略汇合问题 2408.00751v2
  • 372 07-09 Direct Regret Optimization in Bayesian Optimization Direkte Bedauern Optimierung in Bayesian Optimierung Bayesian优化的直接遗憾优化 2507.06529v1
  • 373 07-09 InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior InvestAlign: Überwindung von Datenknappheit bei der Ausrichtung großer Sprachmodelle mit Entscheidungsprozessen von Investoren unter Herdverhalten 投资对称:克服在将大语言模式与牧民行为下的投资者决策过程相匹配方面的数据稀缺性 2507.06528v1
  • 374 07-09 Emergence in non-neural models: grokking modular arithmetic via average gradient outer product Entstehung in nicht-neuralen Modellen: Grokking modulare Arithmetik über durchschnittliche Gradienten-Außenprodukt 非新模式的出现:通过平均梯度外产产品计算模块计算 2407.20199v3
  • 375 07-09 The Power of Regularization in Solving Extensive-Form Games Die Macht der Regularisierung bei der Lösung umfangreicher Form Spiele 解决广泛形式运动会的正规化权力 2206.09495v3
  • 376 07-09 AdaDPIGU: Differentially Private SGD with Adaptive Clipping and Importance-Based Gradient Updates for Deep Neural Networks AdaDPIGU: Differential Private SGD mit adaptivem Clipping und wichtigen Gradienten-Updates für tiefe neurale Netzwerke AdaDPIGU: 与深神经网络的适应性滑动和基于重要性的梯度更新有区别的私人SGD SGD 2507.06525v1
  • 377 07-09 Str-GCL: Structural Commonsense Driven Graph Contrastive Learning Str-GCL: Structural Commonsense Driven Graph Contrastive Learning Str-GCL:结构常识驱动图对比性学习 2507.07141v1
  • 378 07-09 Instance-Wise Monotonic Calibration by Constrained Transformation Instance-Wise Monotonische Kalibrierung durch eingeschränkte Transformation 由受控变换产生的实度- Wise 单声调校准 2507.06516v1
  • 379 07-09 Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts Erforschen von Sparse Adaptern für skalierbare Zusammenführung von Parameter-Effizienten Experten 探索可缩放的参数集成高效专家的分散适配器 2507.07140v1
  • 380 07-09 UniF$^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models UniF$^2$ace: Feinkörniges Gesichtsverständnis und -generierung mit einheitlichen multimodalen Modellen UniF$^2$ace:基于统一多模态模型的细粒度人脸理解与生成 2503.08120v3
  • 381 07-09 Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings Pun Intended: Multi-Agent Übersetzung von Wordplay mit Kontrastivem Lernen und phonetisch-semantischen Einbettungen 未预期:多代理翻译与矛盾学习和语音-语言嵌入的文字游戏的多代理翻译 2507.06506v1
  • 382 07-09 GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification GMLM: Überbrückung von Graph Neuronalen Netzwerken und Sprachmodellen für heterophile Knotenklassifikation GMLM: 血氧哲学节点分类的架接图形神经网络和语言模型 2503.05763v5
  • 383 07-09 Oscillation-Reduced MXFP4 Training for Vision Transformers Oszillationsreduziertes MXFP4 Training für Vision Transformer 面向视觉 Transformer 的减振荡 MXFP4 训练 2502.20853v2
  • 384 07-09 MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models MoFE-Time: Mischung von Frequenzdomänenexperten für Zeitreihenprognosemodelle MOFE-时间:时间-系列预测模型频率域专家混合 2507.06502v1
  • 385 07-09 Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning Bild kann Ihr Gedächtnis zurückbringen: Ein neuartiger Multi-Modal-Angriff gegen das Entlernen des Modells der Bildgenerierung 图像可以将您的记忆带回: 与图像生成模型重新学习相比的新创多模式向导攻击 2507.07139v1
  • 386 07-09 Terrier: A Deep Learning Repeat Classifier Terrier: Ein Deep Learning Repeat Klassifikator Terrier: 深学习重复分类 2503.09312v2
  • 387 07-09 On the Inherent Privacy of Zeroth Order Projected Gradient Descent Auf der inhärenten Privatsphäre der Zeroth Order projiziert Gradient Descent 关于零分顺序的固有隐私 预测的渐变后裔 2507.05610v2
  • 388 07-09 Proximal Oracles for Optimization and Sampling Proximale Oracles für Optimierung und Probenahme 最佳化和取样的近似甲骨文 2404.02239v2
  • 389 07-09 FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning FedDifRC: Entsperren des Potenzials von Text-zu-Bild-Diffusionsmodellen im Heterogenen Federated Learning FedDifRC:在异质联邦学习中释放文本到图像传播模型的潜力 2507.06482v1
  • 390 07-09 Generative Lagrangian data assimilation for ocean dynamics under extreme sparsity Generative lagrangianische Datenassimilation für Ozeandynamik unter extremer Sparsamkeit 产生拉格朗加数据同化,用于极端零星条件下的海洋动态 2507.06479v1
  • 391 07-09 GNNs Meet Sequence Models Along the Shortest-Path: an Expressive Method for Link Prediction GNNs treffen auf Sequenzmodelle entlang des kürzesten Paths: eine Expressive Methode zur Link Prediction GNNs在最短的平面上遇到序列模型:联系预测的表达方法 2507.07138v1
  • 392 07-09 Stochastic Alignments: Matching an Observed Trace to Stochastic Process Models Stochastische Ausrichtungen: Passend zu einem beobachteten Pfad zu stochastischen Prozessmodellen 斯托卡对齐: 将观察到的路径与斯托卡过程模型匹配 2507.06472v1
  • 393 07-09 MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters MetaOptimize: Ein Framework zur Optimierung von Schrittgrößen und anderen Metaparametern 模型优化:优化步步尺寸和其他元参数的框架 2402.02342v6
  • 394 07-09 Mitigating Message Imbalance in Fraud Detection with Dual-View Graph Representation Learning Message Imbalance bei Betrugserkennung mit Dual-View Graph Representation Learning abmildern 利用双视图代表制学习减少欺诈检测中的不平衡现象 2507.06469v1
  • 395 07-09 Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models Foundation Model Self-Play: Open-Ended Strategy Innovation über Foundation Models 基金会示范自发挥作用:通过基金会模式进行开放式战略创新 2507.06466v1
  • 396 07-09 Automating Evaluation of Diffusion Model Unlearning with (Vision-) Language Model World Knowledge Automatisieren der Evaluation von Diffusion Model Unlearning mit (Vision-) Language Model World Knowledge 以(视觉)语言示范世界知识自动评价传播模式 2507.07137v1
  • 397 07-09 SoftSignSGD(S3): An Enhanced Optimizer for Practical DNN Training and Loss Spikes Minimization Beyond Adam SoftSignSGD(S3): Ein optimierter Optimierer für praktisches DNN Training und Verlust Spikes Minimierung jenseits Adams SoftSignSGD(S3):在亚当之后将实际的 DNN 培训和损失尖峰缩小到最小化的增强优化 2507.06464v1
  • 398 07-09 Cons-training Tensor Networks: Embedding and Optimization Over Discrete Linear Constraints Cons-Training Tensor Networks: Einbetten und Optimieren über diskrete lineare Einschränkungen Cons-training 张量网络:离散线性约束上的嵌入与优化 2405.09005v5
  • 399 07-09 Energy-Efficient Supervised Learning with a Binary Stochastic Forward-Forward Algorithm Energieeffizientes Überwachtes Lernen mit einem binären stochastischen Vorwärtsalgorithmus 以二进制前向前前向算法进行能源高效监督学习 2507.06461v1
  • 400 07-09 Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations Nicht-negative Matrix-Fabrikisierung mit impliziten neuralen Darstellungen neu denken 以隐含神经表征代表面重新思考非电子矩阵的乘数 2404.04439v2
  • 401 07-08 (2) Automated Neuron Labelling Enables Generative Steering and Interpretability in Protein Language Models Automatisierte Neuronen-Kennzeichnung ermöglicht generative Steuerung und Interpretierbarkeit in Protein-Sprachmodellen 自动化神经元标注使蛋白质语言模型能够进行生成式引导并具备可解释性 2507.06458v1
  • 402 07-08 Wild refitting for black box prediction Wilde Nachrüstung für Black Box Vorhersage 黑盒预测的野生改造 2506.21460v2
  • 403 07-08 FedPhD: Federated Pruning with Hierarchical Learning of Diffusion Models FedPhD: Federated Pruning mit hierarchischem Lernen von Diffusionsmodellen FedPhD: 结合扩散模型分层学习的联邦剪枝 2507.06449v1
  • 404 07-08 Understanding Fixed Predictions via Confined Regions Verstehe feste Vorhersagen über zusammenhängende Regionen 通过封闭区域了解固定预测 2502.16380v2
  • 405 07-08 Can Interpretation Predict Behavior on Unseen Data? Kann Interpretation Verhalten auf ungesehenen Daten voraussagen? 解释预测行为是否与未知数据有关? 2507.06445v1
  • 406 07-08 FACap: A Large-scale Fashion Dataset for Fine-grained Composed Image Retrieval FACap: Ein groß angelegter Fashion-Datensatz für feinkörniges Composed Image Retrieval FACap: 用于细粒度组合图像检索的大规模时尚数据集 2507.07135v1
  • 407 07-08 Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention Quadratic Gating Mixture of Experts: Statistische Einblicke in Self-Attention 二次门控专家混合:对自注意力的统计学洞察 2410.11222v3
  • 408 07-08 Deprecating Benchmarks: Criteria and Framework Deprecating Benchmarks: Kriterien und Rahmen 折旧基准:标准和框架 2507.06434v1
  • 409 07-08 eegFloss: A Python package for refining sleep EEG recordings using machine learning models eegFloss: Ein Python-Paket zur Verfeinerung von Schlaf-EEG-Aufnahmen mit maschinellen Lernmodellen eegFloss:一个使用机器学习模型改进睡眠 EEG 记录的 Python 软件包 2507.06433v1
  • 410 07-08 Transfer Learning for Transient Classification: From Simulations to Real Data and ZTF to LSST Transfer Learning for Transient Classification: Von Simulationen zu Realdaten und ZTF zu LSST 中转分类的学习转移:从模拟到真实数据和ZTF到LSST 2502.18558v2
  • 411 07-08 Bridging Data Gaps of Rare Conditions in ICU: A Multi-Disease Adaptation Approach for Clinical Prediction Überbrückung von Datenlücken bei seltenen Erkrankungen auf der Intensivstation: Ein Multi-Disease-Anpassungsansatz für klinische Vorhersagen 弥合重症监护室(ICU)中罕见病症的数据缺口:用于临床预测的多疾病适应方法 2507.06432v1
  • 412 07-08 Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies Neuronale Actor-Critic-Methoden für Hamilton-Jacobi-Bellman-PDEs: Asymptotische Analyse und numerische Studien Hamilton-Jacobi-Bellman 偏微分方程的神经 Actor-Critic 方法:渐近分析和数值研究 2507.06428v1
  • 413 07-08 BOOST: Out-of-Distribution-Informed Adaptive Sampling for Bias Mitigation in Stylistic Convolutional Neural Networks BOOST: Out-of-Distribution-informiertes adaptives Sampling zur Bias-Minderung in stilistischen Convolutional Neural Networks BOOST:面向风格卷积神经网络偏差缓解的分布外信息自适应采样 2507.07134v1
  • 414 07-08 Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders Erforschen der Aufgabenleistung mit interpretierbaren Modellen über Sparse Auto-Encoder 通过稀疏自编码器(Sparse Auto-Encoders)以可解释模型探索任务性能 2507.06427v1
  • 415 07-08 Generative Panoramic Image Stitching Generative Panoramabildstiche 生成全景图像切换 2507.07133v1
  • 416 07-08 Capsule-ConvKAN: A Hybrid Neural Approach to Medical Image Classification Kapsel-ConvKAN: Ein hybrider Neuralansatz zur Klassifizierung medizinischer Bilder Capsule-ConvKAN:医学图像分类混合神经方法 2507.06417v1
  • 417 07-08 PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning PERK: Lang-Kontext als Parameter-Effizientes Test-Zeit-Lernen PERK: 参数有效试验时间学习的长期理由 2507.06415v1
  • 418 07-08 Tokenization for Molecular Foundation Models Tokenization für molekulare Stiftungsmodelle 分子基础模型的代名化 2409.15370v3
  • 419 07-08 MedSyn: Enhancing Diagnostics with Human-AI Collaboration MedSyn: Diagnostik durch Mensch-AI-Kollaboration verbessern MedSyn:与人类-AI合作加强诊断 2506.14774v2
  • 420 07-08 SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents SHADE-Arena: Bewertung von Sabotage und Überwachung in LLM-Agenten SHADE-Arena:用LLM代理物评估破坏和监测 2506.15740v2
  • 421 07-08 Many-Task Federated Fine-Tuning via Unified Task Vectors Many-Task Federated Fine-Tuning über Unified Task-Vektoren 通过统一任务向量进行多任务联邦微调 2502.06376v3
  • 422 07-08 Learning to Evaluate Autonomous Behaviour in Human-Robot Interaction Lernen, autonomes Verhalten in Mensch-Roboter-Interaktion zu bewerten 学习评估人类机器人互动中的自主行为 2507.06404v1
  • 423 07-08 Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach Feinabstimmung multimodaler Transformer am Rand: Ein paralleler Split-Lernansatz 边缘的微调多式变形器:平行分割学习方法 2502.06355v3
  • 424 07-08 Detection of Intelligent Tampering in Wireless Electrocardiogram Signals Using Hybrid Machine Learning Erkennung intelligenter Manipulation in drahtlosen Elektrokardiogrammsignalen mittels hybridem maschinellem Lernen 利用混合机器学习检测无线心电图信号中的智能篡改 2507.06402v1
  • 425 07-08 The Trilemma of Truth in Large Language Models Das Trilemma der Wahrheit in großen Sprachmodellen 大语言模型中的真理三边 2506.23921v2
  • 426 07-08 Representing Prompting Patterns with PDL: Compliance Agent Case Study Präsentieren von Prompting Patterns mit PDL: Compliance Agent Case Study 代表PDL的提示模式:合规代理案例研究 2507.06396v1
  • 427 07-08 Nonlinear denoising score matching for enhanced learning of structured distributions Nichtlineare Denoisierungs-Score-Matching für verbessertes Lernen von strukturierten Distributionen 非线性非线性除空比对匹配,以加强结构化分配的学习 2405.15625v2
  • 428 07-08 Ampere: Communication-Efficient and High-Accuracy Split Federated Learning Ampere: Kommunikationseffizientes und hochgenaues Split-Federated-Learning Ampere: 通信效率和高准确度分立联邦学习 2507.07130v1
  • 429 07-08 KPFlow: An Operator Perspective on Dynamic Collapse Under Gradient Descent Training of Recurrent Networks KPFlow: Eine Operator-Perspektive zum dynamischen Kollaps bei gradienter Abwärtsschulung von recurrenten Netzwerken KPFlow:经常网络渐渐下降源培训的操作员视角 2507.06381v1
  • 430 07-08 Secure and Storage-Efficient Deep Learning Models for Edge AI Using Automatic Weight Generation Sichere und speicherfähige Deep-Learning-Modelle für Edge-KI mit automatischer Gewichtsgenerierung 使用自动重力生成的边缘AI的安全和储存高效深层学习模型 2507.06380v1
  • 431 07-08 Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos Thermisator: Stabile autoregressive neuronale Emulation des spatiotemporalen Chaos 热水器:稳定的自动递减性神经神经模拟时空混乱 2503.18731v2
  • 432 07-08 Efficient Decision Trees for Tensor Regressions Effiziente Entscheidungsbäume für Tensor-Regressionen 用于回退的高效决策树 2408.01926v2
  • 433 07-08 The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks Die Riemannsche Geometrie im Zusammenhang mit Gradient Flows of Linear Convolutional Networks 与线性革命网络的梯度流相关的里伊曼尼几何学 2507.06367v1
  • 434 07-08 DecoyDB: A Dataset for Graph Contrastive Learning in Protein-Ligand Binding Affinity Prediction DecoyDB: Ein Datensatz für Graph Kontrastives Lernen in Protein-Ligand Bindung Affinity Prediction 代号DB:蛋白碱-激光捆绑性近亲预测中的图表对比学习数据集 2507.06366v1
  • 435 07-08 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate Wachsende Transformer: Modulare Zusammensetzung und schichtweise Erweiterung auf einem gefrorenen Substrat 增长变异器:在冻结基质上的模块构成和图层扩展 2507.07129v1
  • 436 07-08 ConTextTab: A Semantics-Aware Tabular In-Context Learner ConTextTab: Ein Semantik-Bewusst Tabular In-Context Learner ConTextTab : 语义学软件制表器 2506.10707v2
  • 437 07-08 hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation hdl2v: Ein Code-Übersetzungsdatensatz für verbesserte LLM Verilog-Generierung hdl2v: 用于强化LLM Verilog 生成的代码翻译数据集 2506.04544v2
  • 438 07-08 Deep learning-based species-area models reveal multi-scale patterns of species richness and turnover Deep Learning-basierte Arten-Gebiet Modelle zeigen Multi-Skala-Muster von Arten Reichtum und Umsatz 基于深深学习的物种区模型揭示了物种丰富性和更替性等多种规模的物种丰富性模式 2507.06358v1
  • 439 07-08 On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving Über die natürliche Robustheit von Vision-Sprachmodellen gegen visuelle Wahrnehmungsangriffe im autonomen Fahren 关于在自主驾驶时反对视觉视力攻击的视觉-语言模型的自然强力 2506.11472v2
  • 440 07-08 Trainability of Quantum Models Beyond Known Classical Simulability Ausbildungsfähigkeit von Quantenmodellen jenseits der bekannten klassischen Simulierbarkeit 超出已知古典模拟性之外的量数模型的可操作性 2507.06344v1
  • 441 07-08 A Unifying Framework for Robust and Efficient Inference with Unstructured Data Ein einheitliches Rahmenwerk für robuste und effiziente Schlussfolgerungen mit unstrukturierten Daten 与无结构数据进行有力和高效推断的统一框架 2505.00282v2
  • 442 07-08 SymFlux: deep symbolic regression of Hamiltonian vector fields SymFlux: tiefe symbolische Regression der Hamiltonschen Vektorfelder 交汇:汉密尔顿矢量场的深刻象征性回归 2507.06342v1
  • 443 07-08 Learning Nonlinear Finite Element Solution Operators using Multilayer Perceptrons and Energy Minimization Nichtlineare Finite-Elemente-Lösungs-Betreiber lernen mit Multilayer-Perceptronen und Energie-Minimierung 使用多层光谱和能源最小化的学习非线性有限元素解答操作器 2412.04596v2
  • 444 07-08 Self-supervised learning predicts plant growth trajectories from multi-modal industrial greenhouse data Selbstüberwachtes Lernen prognostiziert Pflanzenwachstumspfade aus multimodalen industriellen Gewächshausdaten 自监督学习从多模态工业温室数据中预测植物生长轨迹 2507.06336v1
  • 445 07-08 AR2: Attention-Guided Repair for the Robustness of CNNs Against Common Corruptions AR2: Aufmerksamkeitsgeführte Reparatur für die Robustheit von CNNs gegen häufige Bildstörungen AR2:注意力引导的修复,提升CNN对常见图像损坏的鲁棒性 2507.06332v1
  • 446 07-08 Sample-Efficient Reinforcement Learning Controller for Deep Brain Stimulation in Parkinson’s Disease Sample-Efficient Verstärkungs-Lern-Controller für tiefe Gehirnstimulation bei Parkinson-Krankheit 帕金森病深脑刺激学主计长 2507.06326v1
  • 447 07-08 (How) Can Transformers Predict Pseudo-Random Numbers? (Wie) Können Transformer Pseudo-Random-Zahlen vorhersagen? (如何)变形者预测出优多-兰多数字? 2502.10390v2
  • 448 07-08 Centralized Copy-Paste: Enhanced Data Augmentation Strategy for Wildland Fire Semantic Segmentation Zentralisiertes Copy-Paste: Verbesserte Datenerweiterungsstrategie für die semantische Segmentierung von Wildlandfeuern 集中式复制粘贴:面向野外火灾语义分割的增强数据增广策略 2507.06321v1
  • 449 07-08 Implicit Neural Representations for Chemical Reaction Paths Implizite neurale Darstellungen für chemische Reaktionswege 化学反应路径的隐性神经图示 2502.15843v3
  • 450 07-08 Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers Effizienz-Wirksamkeit Reranking FLOPs für LLM-basierte Reranker 为基于LLM的升级者重新安排FLOP 2507.06223v1
  • 451 07-08 Deep Learning Optimization of Two-State Pinching Antennas Systems Deep Learning Optimierung von zweistufigen Pinching-Antennensystemen 双州钉钉天线系统的深度学习优化 2507.06222v1
  • 452 07-08 Is Diversity All You Need for Scalable Robotic Manipulation? Ist Vielfalt alles, was Sie für eine skalierbare Robotermanipulation benötigen? 多样性是否是可缩放的机器人操纵所需要的? 2507.06219v1
  • 453 07-08 What ZTF Saw Where Rubin Looked: Anomaly Hunting in DR23 Was ZTF sah, wo Rubin sah: Anomalie Jagd in DR23 ZTF所看到的Rubin所看的地方:DR23的异常狩猎 2507.06217v1
  • 454 07-08 Embedding Atlas: Low-Friction, Interactive Embedding Visualization Atlas einbetten: Low-Friction, Interactive Embedding Visualisierung 嵌入式Atlas:低功能、互动嵌入式可视化 2505.06386v2
  • 455 07-08 Instruction Following by Boosting Attention of Large Language Models Anleitung, indem man die Aufmerksamkeit großer Sprachmodelle erhöht 之后的教学,培养对大语言模式的注意 2506.13734v2
  • 456 07-08 StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation StreamDiffusion: Eine Pipeline-Level-Lösung für die interaktive Echtzeit-Generierung 流流扩散:实时互动新一代管道一级的解决方案 2312.12491v2
  • 457 07-08 Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA Benchmarking der CoW mit der TopCoW Challenge: Topologie-Bewusst Anatomische Segmentierung des Willis-Kreises für CTA und MRA 将欧洲联盟委员会与顶级COW挑战:CTA和MRA的Willis圈的地形学-软件解剖分解 2312.17670v4
  • 458 07-08 Modern Methods in Associative Memory Moderne Methoden im assoziativen Gedächtnis 共同内存现代方法 2507.06211v1
  • 459 07-08 EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG EEG2TEXT-CN: Eine explorative Studie der Open-Vocabulary-Ausrichtung von chinesischem Text und EEG über großes Sprachmodell und kontrastives Lernen auf ChineseEEG EEG2TEXT-CN:基于大语言模型和对比学习在ChineseEEG上进行开放词汇中文文本与EEG对齐的探索性研究 2506.00854v3
  • 460 07-08 Differential Mamba Unterschiedliche Mamba 差异 Mamba 2507.06204v1
  • 461 07-08 Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions Effiziente Umsetzung von Gaußschen Prozessregression Beschleunigte Sattelpunktsuche mit Anwendung auf molekulare Reaktionen 高效实施高斯进程递减加速散积点搜索,应用分子反应 2505.12519v2
  • 462 07-08 UQLM: A Python Package for Uncertainty Quantification in Large Language Models UQLM: Ein Python-Paket zur Unsicherheitsquantifizierung in großen Sprachmodellen UQLM: 用于大语言模型中不确定性量化的 Python 软件包 2507.06196v1
  • 463 07-08 SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads SQLBarber: Ein System, das große Sprachmodelle nutzt, um angepasste und realistische SQL Workloads zu generieren SQLBarber:一个系统利用大语言模型生成自定义和现实的 SQL 工作量 2507.06192v1
  • 464 07-08 Conservative approximation-based feedforward neural network for WENO schemes Konservatives annäherungsbasiertes neuronales Feedforward-Netzwerk für WENO-Schemata 面向WENO格式的基于保守近似的前馈神经网络 2507.06190v1
  • 465 07-08 GuiderNet: A Meta-Learning Framework for Optimizing Quantum Circuit Geometry and Mitigating Barren Plateaus GuiderNet: Ein Meta-Learning-Framework zur Optimierung der Quantum Circuit Geometrie und zur Abmilderung von Barren Plateaus 指南Net:优化量子电路几何和减缓巴伦高原的元学习框架 2506.21940v2
  • 466 07-08 A Method for Optimizing Connections in Differentiable Logic Gate Networks Eine Methode zur Optimierung von Verbindungen in differenzierbaren Logic Gate-Netzwerken 在有区别逻辑门网中优化连接的方法 2507.06173v1
  • 467 07-08 Inferring Higher-Order Couplings with Neural Networks Rückschlüsse auf höhere Auftragskoppelungen mit neuralen Netzen 与神经网络连接 2501.06108v4
  • 468 07-08 Aliasing in Convnets: A Frame-Theoretic Perspective Aliasing in Convnets: Eine rahmentheoretische Perspektive Convnets中的异异相:框架理论视角 2507.06152v1
  • 469 07-08 Online Planning for Multi-UAV Pursuit-Evasion in Unknown Environments Using Deep Reinforcement Learning Online-Planung für Multi-UAV Pursuit-Evasion in unbekannten Umgebungen mit Deep Verstärkung Learning 利用深强化学习,在未知环境中进行多无人驾驶航空器探索-扩展的在线规划 2409.15866v4
  • 470 07-08 Transformers Simulate MLE for Sequence Generation in Bayesian Networks Transformer simulieren MLE für Sequenzgenerierung in Bayesischen Netzwerken Transformer在贝叶斯网络序列生成中模拟MLE 2501.02547v2
  • 471 07-08 Topic Modeling and Link-Prediction for Material Property Discovery Thema Modellierung und Link-Prädikation für Material Property Discovery 物质财产发现专题建模和连接预测 2507.06139v1
  • 472 07-08 The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret Die Gefahren der Optimierung gelernter Belohnungsfunktionen: Niedriger Trainingsfehler garantiert keinen geringen Bedauern 优化得失回报功能的危害:低培训错误不能保证低遗憾 2406.15753v3
  • 473 07-08 Subspace-based Approximate Hessian Method for Zeroth-Order Optimization Subraum-basierte annähernde hessische Methode zur Zeroth-Order-Optimierung 近乎赫西安的赫斯法 2507.06125v1
  • 474 07-08 Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression Regression für den Mittelwert: Auto-Evaluierung und Schlussfolgerung mit wenigen Etiketten durch post-hoc-Regression 平均值的回归:通过热后回归,用很少的标签进行自动评价和推断 2411.12665v2
  • 475 07-08 Entropy stable conservative flux form neural networks Entropiestabile konservative Flussform neuronale Netzwerke Entropy稳定稳定的保守通量形成神经网络 2411.01746v2
  • 476 07-08 Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation Sichere Domain Randomisierung durch Uncertainty-Aware Out-of-Distribution-Erkennung und Politikanpassung 通过不确定性—- 软件外分发检测和政策适应而安全域域随机化 2507.06111v1
  • 477 07-08 Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification Fliesenbasierte ViT-Inferenz mit Visual-Cluster-Prioren für die zero-shot Multi-Spezies Pflanzenidentifikation 零热多物种植物识别与视光聚群前科的电压VT推断值 2507.06093v1
  • 478 07-08 Taming Data Challenges in ML-based Security Tasks: Lessons from Integrating Generative AI Zähmen von Daten-Herausforderungen in ML-basierten Sicherheitsaufgaben: Lehren aus der Integration generativer KI 在以多边后勤为基础的安全任务中应对数据挑战:从综合生成中汲取的教训 2507.06092v1
  • 479 07-08 CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs CoRE: Verbesserung der Metakognition durch etikettenfreie Selbsteinschätzung in LRMs CORE:加强低林机制中与无标签自评的比喻意识 2507.06087v1
  • 480 07-08 A Cascading Cooperative Multi-agent Framework for On-ramp Merging Control Integrating Large Language Models Ein Cascading kooperatives Multi-Agent-Framework für die On-Ramp Merging Control Integration großer Sprachmodelle 连串合作多剂框架,以综合大语言模式,促进在轨合并控制控制 2503.08199v2
  • 481 07-08 QS4D: Quantization-aware training for efficient hardware deployment of structured state-space sequential models QS4D: Quantization-aware-Training für effiziente Hardware-Bereitstellung strukturierter State-Space-Sequential-Modelle QS4D:为高效部署结构有序的州-空相继模型的硬件进行量化了解培训 2507.06079v1
  • 482 07-08 Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol Kontrastives und Transferlernen für effektives Audio-Fingerprinting durch ein Real-World Evaluation Protocol 通过《现实世界评价议定书》促进有效音频指纹的交流和转让学习 2507.06070v1
  • 483 07-08 Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle Sind LLMs vorausschauend? Eine kontinuierliche Bewertung mit täglichen Nachrichten als Orakel LLM是否具有先见之明?以每日新闻为预言机的持续评估 2411.08324v2
  • 484 07-08 Few-Shot Learning by Explicit Physics Integration: An Application to Groundwater Heat Transport Wenig heißes Lernen durch explizite Physik-Integration: Eine Anwendung auf den Grundwasser-Wärmetransport 通过明确物理集成进行很少热的热学习:地下水热运输的应用 2507.06062v1
  • 485 07-08 Estimating prevalence with precision and accuracy Prävalenz mit Präzision und Genauigkeit abschätzen 精确和精确地估计流行率 2507.06061v1
  • 486 07-08 Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators Kernel Trace Distance: Quantum Statistical Metric zwischen den Messungen durch RKHS-Dichteoperatoren 核心跟踪距离:通过RKHS密度运算器测量的量子统计尺度 2507.06055v1
  • 487 07-08 Minimal Deterministic Echo State Networks Outperform Random Reservoirs in Learning Chaotic Dynamics Minimale deterministische Echo State Networks übertreffen zufällige Reservoirs beim Lernen der Chaotischen Dynamik 最起码的确定性回声状态网络 2507.06050v1
  • 488 07-08 Neural-Network solver of ideal MHD equilibria Neural-Netzwerk-Löser der idealen MHD-Gleichgewichte 理想MHD平衡的神经网解决方案 2507.03119v2
  • 489 07-08 EdgeCodec: Onboard Lightweight High Fidelity Neural Compressor with Residual Vector Quantization EdgeCodec: Onboard-Leichtgewicht Hoher Fidelity-Neuralkompressor mit residualer Vector-Quantisierung 载体内轻重量高菲力坚硬性神经压缩器,残余矢量量化 2507.06040v1
  • 490 07-08 TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision TextPixs: Glyphenkonditionierte Diffusion mit zeichenbewusster Aufmerksamkeit und OCR-geführter Überwachung 文本像素: 带有字符提醒注意和 OCR 引导监督的 淋巴- 有条件扩散 2507.06033v1
  • 491 07-08 Efficient Federated Learning with Timely Update Dissemination Effizientes Federated Learning mit rechtzeitiger Verbreitung von Aktualisierungen 及时更新传播的高效联邦学习 2507.06031v1
  • 492 07-08 Multi-view mid fusion: a universal approach for learning in an HDLSS setting Multi-View Mid Fusion: ein universeller Ansatz für das Lernen in einem HDLSS-Setting 多视角中期融合:在HDLSS环境下学习的通用方法 2507.06026v1
  • 493 07-08 Kamae: Bridging Spark and Keras for Seamless ML Preprocessing Kamae: Bridging Spark und Keras für Nahtlose ML Vorverarbeitung Kamae: 连接Spark和Keras, 用于无缝ML 2507.06021v1
  • 494 07-08 Instance-Optimal Quantum State Certification with Entangled Measurements Instance-Optimal Quantum State Zertifizierung mit verschränkten Messungen 具有相互缠绕的测量量的国家认证 2507.06010v1
  • 495 07-08 KnowIt: Deep Time Series Modeling and Interpretation KnowIt: Deep Time Series Modellierung und Interpretation 深时间系列建模和解释 2507.06009v1
  • 496 07-08 Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics Skalierbare diskrete Diffusionsproben: Kombinatorische Optimierung und Statistische Physik 可缩放分分解扩散采样器:组合优化和统计物理 2502.08696v3
  • 497 07-08 An Optimal Transport Perspective on Unpaired Image Super-Resolution Optimale Transportperspektive auf ungepaarte Bild-Super-Resolution 对未调图像超分辨率的优化传输视角 2202.01116v3
  • 498 07-08 DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting DRAN: Ein verteilungs- und beziehungsadaptives Netzwerk für die räumlich-zeitliche Vorhersage DRAN:用于时空预测的分布与关系自适应网络 2504.01531v2
  • 499 07-08 Beating the Best Constant Rebalancing Portfolio in Long-Term Investment: A Generalization of the Kelly Criterion and Universal Learning Algorithm for Markets with Serial Dependence Das beste konstante Rebalancing-Portfolio bei langfristigen Investitionen schlagen: Eine Generalisierung des Kelly-Kriteriums und des universellen Lernalgorithmus für Märkte mit serieller Abhängigkeit 击败长期投资中最佳常态再平衡组合:串级依赖市场凯利标准标准和普遍学习标准普遍化 2507.05994v1
  • 500 07-08 Counterfactual Inference under Thompson Sampling Gegenfaktische Schlussfolgerung unter Thompson Sampling Thompson抽样下的反事实推断 2504.08773v2
  • 501 07-08 A Survey of Multi Agent Reinforcement Learning: Federated Learning and Cooperative and Noncooperative Decentralized Regimes Eine Umfrage über das Lernen von Multi-Agenten-Verstärkung: Föderiertes Lernen und kooperative und nichtkooperative dezentralisierte Regimes 多代理强化学习调查:联邦学习与合作及不合作的权力下放制度 2507.06278v1
  • 502 07-08 Robust Speech-Workload Estimation for Intelligent Human-Robot Systems Robuste Sprach-Workload-Schätzung für intelligente Mensch-Roboter-Systeme 智能人类机器人系统的强力语音-工作负荷估计 2507.05985v1
  • 503 07-08 RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages RabakBench: Skalierung menschlicher Annotationen zum Aufbau lokalisierter mehrsprachiger Sicherheits-Benchmarks für ressourcenarme Sprachen RabakBench:扩展人工标注,为低资源语言构建本地化的多语言安全基准 2507.05980v1
  • 504 07-08 CoDy: Counterfactual Explainers for Dynamic Graphs CoDy: Counterfactual Explainers für dynamische Graphen CoDy: 动态图表的反事实解析器 2403.16846v2
  • 505 07-08 VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play VolleyBots: Ein Testbett für Multi-Drohnen-Volleyball, das Bewegungssteuerung und strategisches Spiel kombiniert VolleyBots:结合运动控制和战略博弈的多无人机排球游戏测试台 2502.01932v4
  • 506 07-08 Generalized and Unified Equivalences between Hardness and Pseudoentropy Generalisierte und einheitliche Äquivalenz zwischen Härte und Pseudoentropie 硬度和优度机能之间的普遍和统一等同 2507.05972v1
  • 507 07-08 Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model Analytisches Subraum-Routing: Wie rekursive Least Squares im kontinuierlichen Lernen des großen Sprachmodells funktionieren 分析子空间运行分析:在继续不断学习大语言模式中如何使最起码的广场恢复 2503.13575v2
  • 508 07-08 Simple Convergence Proof of Adam From a Sign-like Descent Perspective Simple Convergence Beweis für Adam aus einer sign-like Descent Perspektive 亚当的简单聚合证据 2507.05966v1
  • 509 07-08 Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification Vergleichende Analyse von CNN- und Transformer-Architekturen mit Herzzyklus-Normalisierung für automatisierte Phonokardiogramm-Klassifikation CNN和变压器结构与心脏循环正常化用于自动心电图分类的比较分析 2507.07058v1
  • 510 07-08 skfolio: Portfolio Optimization in Python skfolio: Portfoliooptimierung in Python skfolio: Python 的组合组合优化 2507.04176v2
  • 511 07-08 Rethinking Associative Memory Mechanism in Induction Head Rethinking Assoziative Memory Mechanism in Induktion Head 上岗时重新思考联合记忆机制 2412.11459v2
  • 512 07-08 Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport Unüberwachte Anomalienerkennung durch massenabstoßenden optimalen Transport 通过大规模反转最佳运输方式进行不受监督的异常探测 2502.12793v2
  • 513 07-08 Deep neural networks have an inbuilt Occam’s razor Tiefe neuronale Netzwerke haben einen eingebauten Occams Rasiermesser 深心神经网络有一个内建的奥卡姆剃刀 2304.06670v2
  • 514 07-08 Improving AI-Based Canine Heart Disease Diagnosis with Expert-Consensus Auscultation Labeling Verbesserung der KI-basierten Canine Heart Disease Diagnose mit Expert-Consensus Auskultation Kennzeichnung 改进基于AI的犬心病诊断,与专家 – – 咨询理事会的隔离标签 2507.05950v1
  • 515 07-08 Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence Messung der Variablenbedeutung bei heterogenen Behandlungseffekten mit Konfidenz 有置信度地衡量异质处理效应中的变量重要性 2408.13002v4
  • 516 07-08 Information-theoretic machine learning for time-varying mode decomposition of separated aerodynamic flows Informationstheoretisches maschinelles Lernen für die zeitvariable Modenzerlegung abgelöster aerodynamischer Strömungen 用于分离气动流动时变模态分解的信息论机器学习 2505.24132v2
  • 517 07-08 News and Load: Social and Economic Drivers of Regional Multi-horizon Electricity Demand Forecasting News and Load: Soziale und wirtschaftliche Treiber regionaler Mehrhorizont-Stromnachfrageprognosen 新闻与负载:区域多光速电力需求预测的社会和经济驱动因素 2406.06641v2
  • 518 07-08 WATS: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling WATS: Kalibrierung von Graphen-Neuralnetzwerken mit Wavelet-Aware-Temperaturskalierung WATS: 校准带有波盘软件温度缩放的图形神经网络 2506.23782v2
  • 519 07-08 A Comprehensive Study of Shapley Value in Data Analytics Eine umfassende Untersuchung des Shapley-Wertes in der Datenanalyse 全面研究数据分析中的虚光值 2412.01460v8
  • 520 07-08 Self-supervised learning of speech representations with Dutch archival data Selbstüberwachtes Lernen von Sprachdarstellungen mit niederländischen Archivdaten 自行监督学习用荷兰档案数据进行语音陈述 2507.04554v2
  • 521 07-08 KD$^{2}$M: A unifying framework for feature knowledge distillation KD$^{2}$M: Ein einheitlicher Rahmen für die Merkmalsdestillation von Wissen KD$^{2}$M:特征知识蒸馏的统一框架 2504.01757v2
  • 522 07-08 Online Regularized Learning Algorithms in RKHS with $β$- and $φ$-Mixing Sequences Online Regularisierte Lernalgorithmen in RKHS mit $β$- und $φ$-Mischsequenzen RKHS中具有 $β$- 和 $φ$-混合序列的在线正则化学习算法 2507.05929v1
  • 523 07-08 Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques Advancing Offline Handschriftliche Texterkennung: Eine systematische Überprüfung von Datenvergrößerungs- und Erzeugungstechniken 推进离线手写文本识别:系统审查数据增强和生成技术 2507.06275v1
  • 524 07-08 Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data Diffusion Datensatzkondensation: Training Ihres Diffusionsmodells schneller mit weniger Daten 传播数据集集中: 训练您的传播模型, 以更少数据更快的速度 2507.05914v1
  • 525 07-08 Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis Best-of-N durch die Glättende Lens: KL Divergence and Regret Analysis 通过平滑透镜:KL差异和悔恨分析实现最佳N级 2507.05913v1
  • 526 07-08 Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality Tiefes Lernen aus stark mischenden Beobachtungen: Sparse-penalisierte Regularisierung und Minimax-Optimalität 从强混合观测中进行深度学习:稀疏惩罚正则化与极小极大最优性 2406.08321v2
  • 527 07-08 Longitudinal Ensemble Integration for sequential classification with multimodal data Longitudinal Ensemble Integration für die sequentielle Klassifizierung mit multimodalen Daten 与多式联运数据相继分类的一体化 2411.05983v2
  • 528 07-08 Trust-Region Twisted Policy Improvement Vertrauensregion verdrehte politische Verbesserung 改变政策改进 2504.06048v4
  • 529 07-08 Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs Unüberwachtes Lernen für optimale Verkehrsplanungsvorhersage zwischen unausgewogenen Graphen 未受监督的优化交通学习计划预测 2506.12025v3
  • 530 07-08 Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why Feature-based vs. GAN-based Learning from Demonstrations: Wann und warum 从示范活动中学习:何时和为何 2507.05906v1
  • 531 07-08 Universal Embeddings of Tabular Data Universelle Einbettungen von Tabellendaten 表格数据通用嵌入式 2507.05904v1
  • 532 07-08 On the Fundamental Impossibility of Hallucination Control in Large Language Models Über die grundsätzliche Unmöglichkeit der Halluzinationskontrolle in großen Sprachmodellen 关于大语言模型中幻听控制的基本不可能性 2506.06382v3
  • 533 07-08 Stable Acoustic Relay Assignment with High Throughput via Lase Chaos-based Reinforcement Learning Stabile akustische Relay-Zuordnung mit hohem Durchsatz über Lase Chaos-basiertes Verstärkungslernen 通过以激光混乱为基础的加强学习,以高输送量方式分配稳定的声频中继任务 2507.05900v1
  • 534 07-08 Composable Strategy Framework with Integrated Video-Text based Large Language Models for Heart Failure Assessment Composable Strategy Framework mit integrierten Video-Text basierten großen Sprachmodellen für die Herzfehlerbewertung 综合战略框架,包括基于视频文本的心衰失评估大语言模型 2502.16548v2
  • 535 07-08 Improving Trust Estimation in Human-Robot Collaboration Using Beta Reputation at Fine-grained Timescales Verbesserung der Vertrauensschätzung in der Mensch-Roboter-Kollaboration mit Beta-Reputation bei feinkörnigen Zeitskalen 利用精确时间尺度的贝塔名声提高人类-机器人合作信任度估计 2411.01866v2
  • 536 07-08 Iterative Importance Fine-tuning of Diffusion Models Iterative Bedeutung Feinabstimmung von Diffusionsmodellen 传播模型的微调 2502.04468v2
  • 537 07-08 Robust Power System State Estimation using Physics-Informed Neural Networks Robuste Leistungssystem-State-Schätzung mit physikinformierten Neuronalen Netzwerken 利用物理成形神经网络进行强力动力系统状态估算 2507.05874v1
  • 538 07-08 Bayesian Hierarchical Invariant Prediction Bayesische Hierarchische Invariante Vorhersage 贝叶斯分层不变预测 2505.11211v2
  • 539 07-08 Communication-Efficient Module-Wise Federated Learning for Grasp Pose Detection in Cluttered Environments Kommunikationseffizientes modulweises Federated Learning für Grasp-Pose-Erkennung in unübersichtlichen Umgebungen 用于杂乱环境中抓取位姿检测的通信高效模块级联邦学习 2507.05861v1
  • 540 07-08 Quantum QSAR for drug discovery Quantum QSAR für die Entdeckung von Arzneimitteln 药物发现量量 QSAR 2505.04648v2
  • 541 07-08 Property Elicitation on Imprecise Probabilities Eigenschafts-Elizitierung auf unpräzisen Wahrscheinlichkeiten 不精确概率上的性质引出 2507.05857v1
  • 542 07-08 Optimal Transport for Domain Adaptation through Gaussian Mixture Models Optimaler Transport für Domain-Anpassung durch Gaussian Mixture Models 通过高斯混合模型实现面向域适应的最优传输 2403.13847v3
  • 543 07-08 Prototype-Guided and Lightweight Adapters for Inherent Interpretation and Generalisation in Federated Learning Prototypengeführte und leichte Adapter zur inhärenten Interpretation und Generalisierung im Federated Learning 用于联邦学习中固有解释和通化的原型辅助和轻量级适应器 2507.05852v1
  • 544 07-08 Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining Filtern wie Sie testen: Datengesteuerte Datenfilterung für CLIP-Vorschulung 像您一样的过滤器测试: CLIP 预培训前的数据驱动数据过滤 2503.08805v2
  • 545 07-08 GC-GAT: Multimodal Vehicular Trajectory Prediction using Graph Goal Conditioning and Cross-context Attention GC-GAT: Multimodale Fahrzeugbahnvorhersage mit Graph Goal Conditioning und Cross-Context-Achtung GC-GAT:利用图表目标条件和交叉注意的多式车辆轨迹预测 2504.11150v2
  • 546 07-08 PDFMathTranslate: Scientific Document Translation Preserving Layouts PDFMathTranslate: Wissenschaftliche Dokumentübersetzung Layouts erhalten PDFMath Translate:科学文件翻译保存布局 2507.03009v2
  • 547 07-08 Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing Intra-DP: Ein Hochleistungs-Kollaboratives Schlussfolgerungssystem für Mobile Edge Computing DP内部:移动边缘电子计算高绩效合作推断系统 2507.05829v1
  • 548 07-08 The Impact of Prompt Programming on Function-Level Code Generation Die Auswirkungen der Prompt-Programmierung auf die Code-Generierung auf Funktionsebene 迅速编制方案对职能层面代码生成的影响 2412.20545v2
  • 549 07-08 Accelerating Large-Scale Regularized High-Order Tensor Recovery Beschleunigung der großflächigen Regularisierung der High-Order Tensor Erholung 加速大规模正规化高阶天文恢复 2506.09594v2
  • 550 07-08 Fair Domain Generalization: An Information-Theoretic View Fair Domain Generalization: Eine informationstheoretische Ansicht 公平域一般化:信息理论观点 2507.05823v1
  • 551 07-08 Just Say Better or Worse: A Human-AI Collaborative Framework for Medical Image Segmentation Without Manual Annotations Einfach besser oder schlechter sagen: Ein kollaborativer Mensch-KI-Rahmen für medizinische Bildsegmentierung ohne manuelle Annotationen 只需说更好或更差:无需人工标注的医学图像分割人机协作框架 2507.05815v1
  • 552 07-08 Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs Konzeptbasierte mechanistische Interpretationsfähigkeit mit strukturierten Wissensgraphen 使用结构化知识图 2507.05810v1
  • 553 07-08 Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters Verbesserung der Robustheit von Foundation-Modellen in der Domänenanpassung mit Soup-Adaptern 利用Soup-Adapters提高基础模型在域适应中的鲁棒性 2507.05807v1
  • 554 07-08 Predicting Graph Structure via Adapted Flux Balance Analysis Vorhersage der Graphenstruktur über angepasste Flux-Balance-Analyse 通过经调整的通量平衡分析实现的预测图结构 2507.05806v1
  • 555 07-08 Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps Feinabstimmungs-Diffusions-Politiken mit Backpropagation durch Diffusion Zeitschritte 微调通过扩散时间步骤进行反向宣传的传播政策 2505.10482v3
  • 556 07-08 An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems Ein Ensemble-Embedding-Ansatz zur Verbesserung der semantischen Caching-Performance in LLM-basierten Systemen 提高基于LLM系统的语义缓冲性能的结合嵌入方法 2507.07061v1
  • 557 07-08 MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training MAMUT: Ein neuartiges Framework zur Modifizierung mathematischer Formeln für die Generierung von Spezialdatensätzen für Sprachmodellschulungen MAMUT:为语言模式培训制作专门数据集修改数学公式的新框架 2502.20855v2
  • 558 07-08 Copula Density Neural Estimation Copula Dichte Neurale Schätzung Copula 密度心神经估计 2211.15353v3
  • 559 07-08 Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning Robuste Bandbreitenschätzung für Echtzeit-Kommunikation mit Offline-Verstärkung 与离线强化学习进行实时通信的强力波段测算 2507.05785v1
  • 560 07-08 From Motion to Meaning: Biomechanics-Informed Neural Network for Explainable Cardiovascular Disease Identification Von der Bewegung zur Bedeutung: Biomechanik-informiertes Neuronales Netzwerk zur erklärbaren Identifizierung von Herz-Kreislauf-Erkrankungen 从运动到意义:用于可解释心血管疾病识别的生物机械-内建神经网络 2507.05783v1
  • 561 07-08 On the relation between trainability and dequantization of variational quantum learning models Über die Beziehung zwischen Lernfähigkeit und Dequantisierung von variationalen Quanten-Lernmodellen 关于变化量量学习模型的可培训性和量化之间的关系 2406.07072v3
  • 562 07-08 NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models NoWag: Ein einheitliches Rahmenwerk für die Erhaltung von Formen für die Kompression großer Sprachmodelle NoWag:大型语言模型形状保留压缩统一框架 2504.14569v2
  • 563 07-08 Efficient Risk-sensitive Planning via Entropic Risk Measures Effiziente risikosensible Planung über entropische Risikomaßnahmen 通过整体风险措施进行对风险有敏感认识的有效规划 2502.20423v2
  • 564 07-08 Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack Achten Sie auf die Kosten des Gerüstes! Benign Clients können sogar Komplizen von Backdoor Attack werden 注意脚手架的成本! 仁慈的客户甚至可能成为幕后攻击的帮凶 2411.16167v3
  • 565 07-08 PSAT: Pediatric Segmentation Approaches via Adult Augmentations and Transfer Learning PSAT: Pädiatrische Segmentierungsansätze über Erwachsene Augmentationen und Transfer Learning PSAT: 通过成人增强和转移学习的儿科分解方法 2507.05764v1
  • 566 07-08 Jigsaw: Training Multi-Billion-Parameter AI Weather Models with Optimized Model Parallelism Jigsaw: Training von Multi-Billion-Parameter KI-Wettermodellen mit optimierter Modellparallelität Jigsaw:以优化模型平行化优化模式培训多亿分之一的气象模型 2507.05753v1
  • 567 07-08 Pretrained Reversible Generation as Unsupervised Visual Representation Learning Pretrained Reversible Generation als unüberwachtes visuelles Repräsentationslernen 作为无人监督的视觉代表学习 2412.01787v5
  • 568 07-08 Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates Politikprüfung in stochastischen dynamischen Systemen mit logarithmischen Neuralzertifikaten 使用对数神经神经证书进行斯托卡动态系统的政策核查 2406.00826v3
  • 569 07-08 A Satellite-Ground Synergistic Large Vision-Language Model System for Earth Observation Ein Satelliten-Ground Synergistisches Large Vision-Language-Modellsystem für die Erdbeobachtung 地球观测卫星地面同步卫星同步大型视野语言模型系统 2507.05731v1
  • 570 07-08 Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling Verbesserung der LLM-Zuverlässigkeit durch explizite Wissensgrenzenmodellierung 通过明确知识边界建模增强LLM可靠性 2503.02233v3
  • 571 07-08 Multi-Channel Hypergraph Contrastive Learning for Matrix Completion Multi-Kanal Hypergraph Kontrastives Lernen für Matrix-Vervollständigung 用于完成矩阵完成的多通道高频反竞争学习 2411.01376v2
  • 572 07-08 Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition Omni-Router: Routing-Entscheidungen in Sparse Mixture-of-Experts für die Spracherkennung teilen Omni-Router: 分享语音识别专家的松散混集决定 2507.05724v1
  • 573 07-08 Empirical Validation of the Independent Chip Model Empirische Validierung des unabhängigen Chipmodells 独立芯片模型的经验验证 2506.00180v2
  • 574 07-08 Hierarchical Task Offloading for UAV-Assisted Vehicular Edge Computing via Deep Reinforcement Learning Hierarchische Aufgabe Offloading für UAV-Assisted Vehicular Edge Computing über Deep Reinforcement Learning 通过深强化学习为无人驾驶航空器协助的车辆边缘计算机卸载 2507.05722v1
  • 575 07-08 MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment MobileGUI-RL: Mobile GUI-Agent durch Verstärkungslernen im Online-Umfeld fördern MobileGUI-RL:通过在线环境强化学习推进移动图形界面代理 2507.05720v1
  • 576 07-08 Agentic-R1: Distilled Dual-Strategy Reasoning Agentic-R1: Destilliertes Dual-Strategie-Reasoning Agentic-R1:蒸馏的双策略推理 2507.05707v1
  • 577 07-08 MPX: Mixed Precision Training for JAX MPX: Mixed Precision Training für JAX MPX:JAX混合精密训练 2507.03312v2
  • 578 07-08 A COMPASS to Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution Ein COMPASS zu Modellvergleich und simulationsbasierter Schlussfolgerung in der galaktischen chemischen Evolution 银河化学进化中以模型比较和模拟为基础的模拟推论的合成系统 2507.05060v2
  • 579 07-08 AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs AutoTriton: Automatische Triton-Programmierung mit Verstärkungslernen in LLMs AutoTriton: 自动三联编程,在LLMs中进行强化学习 2507.05687v1
  • 580 07-08 Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG Pre-Training Graph Kontrastive Maskierte Autoencoder sind starke Destiller für EEG 在EEG中,培训前的对比图蒙面自动编码器是强大的蒸馏器 2411.19230v2
  • 581 07-08 Training Set Reconstruction from Differentially Private Forests: How Effective is DP? Trainingsset Rekonstruktion aus unterschiedlichen Privatwäldern: Wie effektiv ist DP? 从不同的私人森林中重建培训成套培训:发展伙伴关系如何有效? 2502.05307v2
  • 582 07-08 Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach Effiziente Schulung von großformatigen KI-Modellen durch Federated Mixture-of-Experts: Ein System-Level-Ansatz 通过联邦混合专家对大型AI型模式进行有效培训:全系统办法 2507.05685v1
  • 583 07-08 LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization LoSiA: Effizientes High-Rank Feintuning über Subnet Lokalisierung und Optimierung LoSiA:通过子网本地化和优化化提高高射速精密导航效率 2507.04487v2
  • 584 07-08 GATMesh: Clock Mesh Timing Analysis using Graph Neural Networks GATMesh: Uhr Mesh Timing Analyse mit Hilfe von Graph Neural Networks GATMesh:利用图形神经网络分析时钟网时间 2507.05681v1
  • 585 07-08 Speeding up Speculative Decoding via Sequential Approximate Verification Beschleunigung der spekulativen Dekodierung über die sequentielle annähernde Verifikation 通过序列近似间接核查加快投机性代号 2502.04557v3
  • 586 07-08 From Counterfactuals to Trees: Competitive Analysis of Model Extraction Attacks Von Gegenfakten zu Bäumen: Wettbewerbsanalyse von Modellextraktionsangriffen 从反事实到树木:对模拟采掘攻击的竞争分析 2502.05325v2
  • 587 07-08 Canine Clinical Gait Analysis for Orthopedic and Neurological Disorders: An Inertial Deep-Learning Approach Canine Clinical Gait Analysis für orthopädische und neurologische Erkrankungen: Ein inertialer Deep-Learning-Ansatz 关于矫形和神经神经病症的犬犬临床盖特分析:一种永恒的深层学习方法 2507.05671v1
  • 588 07-08 HRRRCast: a data-driven emulator for regional weather forecasting at convection allowing scales HRRRCast: ein datengesteuerter Emulator für regionale Wettervorhersagen bei Konvektion ermöglicht Skalen HRRRCast:数据驱动模拟器,用于在对流允许尺度范围内进行区域天气预报 2507.05658v1
  • 589 07-08 KAN-AD: Time Series Anomaly Detection with Kolmogorov-Arnold Networks KAN-AD: Zeitreihen-Anomalieerkennung mit Kolmogorov-Arnold-Netzwerken KAN-AD: 与科尔莫戈罗夫-阿诺尔德网络的异常探测时间序列 2411.00278v3
  • 590 07-08 MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation MALT Diffusion: Memory-Augmented Latent Transformers für jede Länge Video-Generation MALT 传播: 用于任何语言视频生成的内存增强的中子变换器 2502.12632v3
  • 591 07-08 DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning DESIGN: Verschlüsselte GNN-Inferenz über Server-Side Input Graph Pruning DESIGN:通过服务器- Side 输入图路透图加密的 GNN 推论 2507.05649v1
  • 592 07-08 Curvature-Aligned Federated Learning (CAFe): Harmonizing Loss Landscapes for Fairness Without Demographics Krümmungsorientiertes Federated Learning (CAFe): Harmonisierung von Verlustlandschaften für Fairness ohne Demographie CAFE: 协调丧失的景观,促进没有人口统计的公平 2404.19725v5
  • 593 07-08 FACT: the Features At Convergence Theorem for neural networks FACT: die Merkmale am Konvergenz-Theorem für neuronale Netzwerke FACT:神经网络在趋同理论上的特征 2507.05644v1
  • 594 07-08 An empirical study of task and feature correlations in the reuse of pre-trained models Eine empirische Untersuchung von Aufgaben- und Merkmalskorrelationen bei der Wiederverwendung von vortrainierten Modellen 经验性研究,研究在重新使用培训前模式方面的任务和特点相互关系 2506.01975v2
  • 595 07-08 A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs Laufzeit-Adaptive Transformer Neural Network Accelerator auf FPGAs FPGAs 运行时间- 适应性变革器神经网络加速器 2411.18148v3
  • 596 07-08 Variational OOD State Correction for Offline Reinforcement Learning Unterschiedliche OOD-Staatskorrektur für Offline-Verstärkungslernen OOD 国家离线强化学习校正 2505.00503v3
  • 597 07-08 Learnable quantum spectral filters for hybrid graph neural networks Erlernbare Quantenspektralfilter für hybride Graphen-Neuralnetzwerke 用于混合图形神经网络的可学习量子光谱过滤器 2507.05640v1
  • 598 07-08 Learning Federated Neural Graph Databases for Answering Complex Queries from Distributed Knowledge Graphs Learning Federated Neural Graph Datenbanken zur Beantwortung komplexer Abfragen aus verteilten Wissensgraphen 用于回答分布式知识图的复杂查询的学习联邦神经图数据库 2402.14609v4
  • 599 07-08 Graph Learning Graphisches Lernen 图表学习 2507.05636v1
  • 600 07-08 DeepCell: Self-Supervised Multiview Fusion for Circuit Representation Learning DeepCell: Selbstüberwachte Multiview-Fusion für Circuit Representative Learning DeepCell: 巡回代表学习自上式多视角集成 2502.06816v2
  • 601 07-08 How Not to Detect Prompt Injections with an LLM Wie man Injektionen mit einem LLM nicht erkennen kann 如何不用LLM检测快速注射 2507.05630v1
  • 602 07-08 One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP Ein Surrogat, um sie alle zu täuschen: Universelle, übertragbare und gezielte Widersacherangriffe mit CLIP 一个替代模型骗过所有模型:基于CLIP的通用、可转移和有针对性的对抗攻击 2505.19840v2
  • 603 07-08 StreamDiT: Real-Time Streaming Text-to-Video Generation StreamDiT: Echtzeit-Streaming Text-zu-Video-Generierung 流流DDIT:实时流动文本到视频一代 2507.03745v2
  • 604 07-08 DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective DATABench: Auswertung von Datensatz-Auditing in Deep Learning aus einer adversarischen Perspektive DATABench:从反向角度评价深学习中数据集审计 2507.05622v1
  • 605 07-08 Generative Head-Mounted Camera Captures for Photorealistic Avatars Generative Head-Mounted Kameraaufnahmen für fotorealistische Avatare 为摄影现实主义的Avatars 生成头热摄像头摄像头抓取 2507.05620v1
  • 606 07-08 Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study Erkennung und Eindämmung von Belohnungshacking in Verstärkungs-Lernsystemen: Eine umfassende empirische Studie 检测和减轻强化学习系统中的回扣:综合经验研究 2507.05619v1
  • 607 07-08 A Theory for Conditional Generative Modeling on Multiple Data Sources Eine Theorie für die bedingte Generative Modellierung auf mehreren Datenquellen 多种数据源有条件生成建模理论 2502.14583v2
  • 608 07-08 Activation Steering for Chain-of-Thought Compression Aktivierungslenkung für Ketten-of-Thought-Kompression 研究链压缩启动指导 2507.04742v2
  • 609 07-08 Stacked conformal prediction Gestapelte konforme Vorhersage 堆叠一致预测 2505.12578v3
  • 610 07-08 Efficient Detection of Intermittent Job Failures Using Few-Shot Learning Effiziente Erkennung intermittierender Job-Fälle durch wenig scharfes Lernen 利用很少热的学习方法有效检测间歇性工作失败 2507.04173v2
  • 611 07-08 The Fourier Spectral Transformer Networks For Efficient and Generalizable Nonlinear PDEs Prediction Die Fourier Spectral Transformer-Netzwerke für effiziente und generalisierbare nichtlineare PDE-Vorhersage Fourier 光谱变异器网络,以高效和可通用的非线性PDEs预测 2507.05584v1
  • 612 07-08 Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance Ethische KI für junge digitale Bürgerinnen und Bürger: Ein Aufruf zum Handeln in Sachen Datenschutz-Governance 青年数字公民道德伦理协会:隐私权治理行动呼吁 2503.11947v2
  • 613 07-08 Model-free Optical Processors using In Situ Reinforcement Learning with Proximal Policy Optimization Modellfreie optische Prozessoren mit In Situ Verstärkungslernen mit proximaler Politikoptimierung 采用不使用模型的光学处理器,在采用最优政策优化的 “ 环境强化学习 “ 中采用无模型的光学处理器 2507.05583v1
  • 614 07-08 Visual Adaptive Prompting for Compositional Zero-Shot Learning Visuelle Adaptive Prompting für kompositorisches Zero-Shot-Lernen 零热学习的视觉适应性促进 2502.20292v5
  • 615 07-08 Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems Vereinheitlichung der erklärbaren Anomalienerkennung und der Ursachenanalyse in dynamischen Systemen 动态系统中不可解释的异常探测和根本原因分析 2502.12086v2
  • 616 07-08 The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation Die Landschaft der Erinnerung in LLMs: Mechanismen, Messung und Milderung LLMs中记忆的景观:机制、衡量和缓解 2507.05578v1
  • 617 07-08 Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models Feature-Extraktion und -Lenkung für eine verbesserte Kettenbildung in Sprachmodellen 语言模型中强化研究链理由的特征采掘和指南 2505.15634v3
  • 618 07-08 Offline Learning and Forgetting for Reasoning with Large Language Models Offline-Lernen und Vergessen für die Vernunft mit großen Sprachmodellen 利用大语言模式进行离线学习和遗忘 2504.11364v3
  • 619 07-08 Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA Beyond Retrieval: Zusammenbauen von Cross-Encodern und GPT-Rerankern mit LLMs für biomedizinische QA 超出检索范围:利用生物医学质量评估LMs将跨环境者和GPT再生者与生物医学质量评估LMs组合在一起 2507.05577v1
  • 620 07-08 Approximating invariant functions with the sorting trick is theoretically justified Annähernde invariante Funktionen mit dem Sortiertrick sind theoretisch gerechtfertigt 用排序技巧排序相似的变量函数在理论上是合理的 2403.01671v4
  • 621 07-08 ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models ReLayout: Integration von Beziehungsgründen für die Content-aware Layout-Generierung mit multimodalen großen Sprachmodellen ReLayout:将具有内容意识的布局生成相关理由与多模式大语言模式相结合 2507.05568v1
  • 622 07-08 AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks AbdomenAtlas-8K: Anmerkung zu 8.000 CT-Volumes für Multi-Organ-Segmentierung in drei Wochen AbdomenAtlas-8K:说明8 000个CT卷,供三周内多组织分解使用 2305.09666v3
  • 623 07-08 Exact and efficient basis pursuit denoising via differential inclusions and a selection principle Exakte und effiziente Basisverfolgung durch Differentialeinschlüsse und ein Auswahlprinzip 通过有区别的包容和甄选原则实现无无异化 2507.05562v1
  • 624 07-08 LATST: Are Transformers Necessarily Complex for Time-Series Forecasting LATST: Sind Transformer notwendig komplex für die Zeitreihenprognose LATST: 是变异器 必然复杂 用于时间- 系列预报 2410.23749v9
  • 625 07-08 Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines Preemptive Lösung künftiger Probleme: Multitask Preplay in Menschen und Maschinen 预防性解决未来问题:人与机器的多重任务前奏 2507.05561v1
  • 626 07-08 Neural Network-Based Parameter Estimation for Non-Autonomous Differential Equations with Discontinuous Signals Neurale netzwerkbasierte Parameterschätzung für nichtautonome Differentialgleichungen mit diskontinuierlichen Signalen 具有不连续信号的非自主差异等式的以神经网络为基础的参数参数估计 2507.06267v1
  • 627 07-08 Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort Unversicherbare Risiken durch KI sichern: Der Staat als Versicherer des Last Resorts AI:国家作为最后度假地的保险人 2409.06672v2
  • 628 07-08 Machine Learning based Enterprise Financial Audit Framework and High Risk Identification Machine Learning based Enterprise Financial Audit Framework und High Risk Identification 企业财务审计框架和高风险识别 2507.06266v1
  • 629 07-08 Liability and Insurance for Catastrophic Losses: the Nuclear Power Precedent and Lessons for AI Haftung und Versicherung für Katastrophenschäden: Der Atomkraftvorhersage und Lehren für KI 灾难损失的责任和保险:核能先例和对AI的教训 2409.06673v2
  • 630 07-08 A Malliavin calculus approach to score functions in diffusion generative models Ein Malliavin Kalkül Ansatz, um Funktionen in Diffusion generative Modelle punkten 以Malliavin微积分法在传播基因变异模型中计分功能 2507.05550v1
  • 631 07-08 SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning SPEED-RL: Schnellere Schulung von Vernunftmodellen über Online Curriculum Learning SPEED-RL:通过在线课程学习更快地培训推理模型 2506.09016v2
  • 632 07-08 Gait-Based Hand Load Estimation via Deep Latent Variable Models with Auxiliary Information Gait-basierte Handlastschätzung über tief latente variable Modelle mit Hilfsinformationen 通过带有辅助信息的深边变量模型进行基于 Gait 的手负载估计 2507.05544v1
  • 633 07-07 (1) Robust Learning on Noisy Graphs via Latent Space Constraints with External Knowledge Robustes Lernen von Noisy Graphs über Latent Space Constraints mit externem Wissen 通过具有外部知识的冷藏空间限制对噪音图进行有力的学习 2507.05540v1
  • 634 07-07 The Role of Deductive and Inductive Reasoning in Large Language Models Die Rolle der deduktiven und induktiven Vernunft in großen Sprachmodellen 大语种模式中的减量和引引原因的作用 2410.02892v3
  • 635 07-07 Balancing Efficiency and Expressiveness: Subgraph GNNs with Walk-Based Centrality Ausbalancierende Effizienz und Expressivität: Subgraph GNNs mit Walk-Based Centrality 平衡效率和表现:GNNs与步行中心 2501.03113v2
  • 636 07-07 Zero-shot Medical Event Prediction Using a Generative Pre-trained Transformer on Electronic Health Records Zero-shot Medical Event Vorhersage mit einem Generativen Vortrainierten Transformer auf elektronischen Gesundheitsdaten 利用电子健康记录培训前变革器生成电子健康记录 2503.05893v2
  • 637 07-07 Simulating Refractive Distortions and Weather-Induced Artifacts for Resource-Constrained Autonomous Perception Simulation refraktiver Verzerrungen und wetterinduzierter Artefakte für ressourcenabhängige autonome Wahrnehmung 为受资源限制的自主感知模拟折变扭曲和天气诱导的天体 2507.05536v1
  • 638 07-07 Special-Unitary Parameterization for Trainable Variational Quantum Circuits Spezial-Unitary-Parameterisierung für lernfähige Variations-Quantenkreise 可训练变量量电路特殊单位参数 2507.05535v1
  • 639 07-07 Random Walks with Tweedie: A Unified View of Score-Based Diffusion Models Zufällige Spaziergänge mit Tweedie: Eine einheitliche Ansicht von Score-basierten Diffusionsmodellen 与 Tweedie 一起随机漫步:基于分数的传播模型统一视图 2411.18702v2
  • 640 07-07 Theoretical Learning Performance of Graph Neural Networks: The Impact of Jumping Connections and Layer-wise Sparsification Theoretische Lernleistung von Graph Neural Networks: Die Auswirkungen von Sprungverbindungen und schichtweise Sparsifikation 图形神经网络的理论学习表现:跳动连接和图层分化的影响 2507.05533v1
  • 641 07-07 Bit-Flip Fault Attack: Crushing Graph Neural Networks via Gradual Bit Search Bit-Flip-Fault-Angriff: Zerkleinernde Graphen-Neural-Netzwerke über schrittweise Bitsuche Bit- Flip 错误攻击: 通过渐变位搜索粉碎图形神经网络 2507.05531v1
  • 642 07-07 Mitigating Shortcut Learning with InterpoLated Learning Kurzbefehls-Lernen mit interpoLiertem Lernen abmildern 以互插学习来缩短快捷键学习 2507.05527v1
  • 643 07-07 Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning Schätzung der Interventionsverteilungen mit unsicheren Kausalgraphen durch Meta-Learning 通过元学习估计含不确定因果图的干预分布 2507.05526v1
  • 644 07-07 Escaping Plato’s Cave: JAM for Aligning Independently Trained Vision and Language Models Escaping Platons Cave: JAM for Aligning Independently Trained Vision and Language Models 脱离柏拉图的洞穴:调整独立培训的愿景和语言模式的JAM 2507.01201v3
  • 645 07-07 Deep Learning of Continuous and Structured Policies for Aggregated Heterogeneous Treatment Effects Deep Learning von kontinuierlichen und strukturierten Strategien für Aggregate Heterogene Behandlungseffekte 深入学习关于综合异异同治疗效果的持续和结构性政策 2507.05511v1
  • 646 07-07 Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting Integration von Spatiotemporalen Features in LSTM für spatially Informed COVID-19 Hospitalization Forecasting LSTM综合地貌特征纳入LSTM系统,用于空间知情COVID-19医院化预测 2506.05752v2
  • 647 07-07 Heterogeneous Causal Learning for Optimizing Aggregated Functions in User Growth Heterogenes Kausallernen zur Optimierung Aggregation von Funktionen im Nutzerwachstum 在用户增长中优化综合功能以优化综合功能 2507.05510v1
  • 648 07-07 Beyond Communication Overhead: A Multilevel Monte Carlo Approach for Mitigating Compression Bias in Distributed Learning Beyond Communication Overhead: Ein mehrstufiger Monte-Carlo-Ansatz zur Vermeidung von Kompressions-Bias im verteilten Lernen 超越通信开销:减轻分布式学习中压缩偏差的多层次蒙特卡洛办法 2507.05508v1
  • 649 07-07 X-ray transferable polyrepresentation learning Röntgen übertragbares Polyrepräsentationslernen X光可转移多代表制学习 2507.06264v1
  • 650 07-07 Dynamic Campus Origin-Destination Mobility Prediction using Graph Convolutional Neural Network on WiFi Logs Dynamic Campus Origin-Destination Mobilitätsvorhersage mit Graph Convolutional Neural Network auf WiFi Logs 利用图卷积神经网络基于WiFi日志进行动态校园起讫点出行预测 2507.05507v1
  • 651 07-07 Predicting mutational effects on protein binding from folding energy Vorhersage von Mutationseffekten auf Proteinbindung aus Faltenergie 预测折叠能量对蛋白质结合的突变效应 2507.05502v1
  • 652 07-07 Explainable Hierarchical Deep Learning Neural Networks (Ex-HiDeNN) Erklärbare Hierarchische Deep-Learning-Neural-Netzwerke (Ex-HiDeNN) 可解释的等级级深层学习神经网络(Ex-HiDeNN) 2507.05498v1
  • 653 07-07 MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation MEIT: Multimodale Elektrokardiogramm-Anleitung Tuning auf großen Sprachmodellen für die Berichtserstellung MEIT:多式电动心电图指示,关于报告生成大语言模式的图示 2403.04945v4
  • 654 07-07 Cloud Diffusion Part 1: Theory and Motivation Cloud Diffusion Teil 1: Theorie und Motivation 云传播第一部分:理论和动力 2507.05496v1
  • 655 07-07 Navigating Sparse Molecular Data with Stein Diffusion Guidance Navigation von Sparse Molekulardaten mit Stein Diffusion Guidance 使用 Stein 扩散指导导航微粒分子数据 2507.05482v1
  • 656 07-07 Dynamic Regret Reduces to Kernelized Static Regret Dynamisches Bedauern reduziert den kernelisierten statischen Bedauern 向内内中性静态迟缓的动态遗憾减少 2507.05478v1
  • 657 07-07 Epistemically-guided forward-backward exploration Epistemisch geführte Forward-Backward-Exploration 后向探索 2507.05477v1
  • 658 07-07 Features are fate: a theory of transfer learning in high-dimensional regression Merkmale sind Schicksal: eine Theorie des Transferlernens in hochdimensionaler Regression 特征是命运:在高维回归中传授学习的理论 2410.08194v2
  • 659 07-07 Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting Temporal Conformal Prediction (TCP): Ein verteilungsfreies statistisches und maschinelles Lernkonzept für adaptive Risikoprognosen 时空危机预测:用于适应风险预测的不分发的统计和机器学习框架 2507.05470v1
  • 660 07-07 GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC GCN-getriebenes Verstärkungslernen für probabilistische Echtzeitgarantien in industriellen URLLC GCN驱动的强化学习,以在工业URLLC中进行概率性实时保证 2506.15011v2
  • 661 07-07 2048: Reinforcement Learning in a Delayed Reward Environment 2048: Verstärktes Lernen in einer verzögerten Belohnungsumgebung 2048年:在延迟奖励环境中加强学习 2507.05465v1
  • 662 07-07 RSPO: Regularized Self-Play Alignment of Large Language Models RSPO: Regularisierte Selbstausrichtung großer Sprachmodelle RSPO:大语言模式标准化的自用调整 2503.00030v2
  • 663 07-07 When Federated Learning Meets Quantum Computing: Survey and Research Opportunities Wenn Federated Learning auf Quanten Computing trifft: Umfrage- und Forschungsmöglichkeiten 《当联邦学习与量子计算:调查和研究机会》 2504.08814v2
  • 664 07-07 Towards General Continuous Memory for Vision-Language Models Auf dem Weg zum allgemeinen Dauergedächtnis für Vision-Sprachenmodelle 争取实现视野-语言模型的一般持续记忆 2505.17670v2
  • 665 07-07 Adversarial Machine Learning Attacks on Financial Reporting via Maximum Violated Multi-Objective Attack Adversarial Machine Learning Attacks on Financial Reporting via Maximum Violated Multi-Objective Attack 通过最大对抗性多目标攻击对财务报告进行反向机器学习攻击 2507.05441v1
  • 666 07-07 DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning DeepRetro: Retrosynthetische Pathway Discovery mit iterativem LLM Reasoning 使用迭代LM 理由解释的回溯合成途径发现 2507.07060v1
  • 667 07-07 Optimizing Bidding Strategies in First-Price Auctions in Binary Feedback Setting with Predictions Optimierung von Bietenstrategien in Erstpreis-Auktionen in Binary-Feedback-Einstellung mit Vorhersagen 在有预测的二进制反馈中,在第一价格拍卖中优化投标战略 2506.15817v2
  • 668 07-07 LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users LLM Hypnose: Nutzung des Benutzerfeedbacks für unautorisierte Wissensinjektion für alle Benutzer LLM Hypnosis:利用用户反馈,为所有用户提供未经授权知识注射 2507.02850v2
  • 669 07-07 Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning Coarse-to-fine Q-Network mit Aktionssequenz für dateneffizientes Roboterlernen Coarse 至 fine Q 网络与数据效率机器人学习行动序列 2411.12155v5
  • 670 07-07 Motion Generation: A Survey of Generative Approaches and Benchmarks Motion Generation: Eine Übersicht über generative Ansätze und Benchmarks 动议的产生:对产生方式和基准的调查 2507.05419v1
  • 671 07-07 Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning Global lernen, lokal sprechen: Die Lücken in der Mehrsprachigkeit überwinden 《全球学习,本地语言:弥合多种语文理由方面的差距》 2507.05418v1
  • 672 07-07 What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning Was würdest du fragen, wenn du zuerst $a^2+b^2=c^2$ sahst? LLM auf Kuriosität-getriebene Fragestellung bewerten 当你第一次看到$a^2+b^2=c^2$时你会问什么? 2409.17172v2
  • 673 07-07 EmissionNet: Air Quality Pollution Forecasting for Agriculture EmissionsNet: Vorhersage der Luftqualität für die Landwirtschaft 排放网:农业空气质量污染预测 2507.05416v1
  • 674 07-07 Tractable Transformers for Flexible Conditional Generation Tractable Transformer für flexible Bedingte Generation 灵活条件一代的可变变变换器 2502.07616v2
  • 675 07-07 Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift Einschließliche Interventionale Unabhängigkeit verbessert Robustheit gegen Interventionale Verteilungsverschiebung 纳入干预性独立 增强抵御干预性分配转变的力度 2507.05412v1
  • 676 07-07 Probabilistically Tightened Linear Relaxation-based Perturbation Analysis for Neural Network Verification Probabilistisch verstärkte lineare Entspannungs-basierte Störungsanalyse für neurale Netzwerkverifizierung 神经网络核查的基于扰动分析的线性轻松度分析 2507.05405v1
  • 677 07-07 Q-Detection: A Quantum-Classical Hybrid Poisoning Attack Detection Method Q-Erkennung: Eine Quantum-klassische Hybrid-Vergiftungs-Angriffserkennungsmethode Q-检测:量子谱混合中毒袭击探测方法 2507.06262v1
  • 678 07-07 Embedding-Based Approaches to Hyperpartisan News Detection Einbetten-basierte Ansätze zu Hyperparteien-Nachrichten-Erkennung 以嵌入式方法探测超党派新闻 2501.01370v3
  • 679 07-07 pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models pFedMMA: Personalisiertes Federated Fine-Tuning mit Multi-Modal Adapter für Vision-Language-Modelle pFedMMA:具有视觉语言模型多模式适应器的个性化联邦微调 2507.05394v1
  • 680 07-07 Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training Verstärkte Feinsteuerung mildert natürlich das Vergessen im kontinuierlichen Post-Training 长期培训后忘却的自然缓解措施 2507.05386v1
  • 681 07-07 On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study Über die Bias von Next-Token Predictors hin zu systematisch ineffizienter Begründung: Eine Kurzeste-Path-Fallstudie 关于对系统、低效率理性的 “ 下肯预测家 “ 的偏见:最短篇案例研究 2507.05362v1
  • 682 07-07 LoRA-Augmented Generation (LAG) for Knowledge-Intensive Language Tasks LoRA-Augmented Generation (LAG) für wissensintensive Sprachaufgaben LoRA - 知识密集型语言任务支持代(LAG) 2507.05346v1
  • 683 07-07 Causal Foundation Models: Disentangling Physics from Instrument Properties Causal Foundation Models: Entwirren von Physik aus Instrumenteneigenschaften 因果基金会模型:将物理与仪器属性分离 2507.05333v1
  • 684 07-07 Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations Beyond One Shot, Beyond One Perspective: Cross-View und Long-Horizon-Destillation für bessere LiDAR-Darstellungen 超越一个镜头,超越一个视角:交叉查看和长毛利区蒸馏,以更好地进行激光雷达演示 2507.05260v1
  • 685 07-07 Spatio-Temporal LLM: Reasoning about Environments and Actions Spatio-Temporal LLM: Begründung von Umgebungen und Aktionen Spatio-临时LLM:环境原因和行动 2507.05258v1
  • 686 07-07 Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining Human2LocoMan: Vielseitige Quadrupedalmanipulation mit menschlichem Vortraining lernen Human2LocoMan: 学习与人类预科培训一起四步操作 2506.16475v2
  • 687 07-07 From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving Von Marginal zu Joint Predictions: Bewertung szenekonsistenter Trajektorien-Vorhersageansätze für Automatisiertes Fahren 从边际到联合预测:评价自动驾驶的场景-群集轨迹预测方法 2507.05254v1
  • 688 07-07 Physics-Guided Dual Implicit Neural Representations for Source Separation Physik-geführte Dual-Implizite-Neural-Darstellungen für die Quellentrennung 源分离的物理引导双隐含神经图示 2507.05249v1
  • 689 07-07 Multi-Disease Deep Learning Framework for GWAS: Beyond Feature Selection Constraints Multi-Disease Deep Learning Framework für GWAS: Neben der Feature-Auswahl Einschränkungen GWAS多重疾病深入学习框架:超越地物选择限制 2507.05247v1
  • 690 07-07 Going Beyond Heuristics by Imposing Policy Improvement as a Constraint Über Heuristiken hinausgehend durch politische Verbesserung als Zwang 将政策改进作为约束手段, 2507.05328v1
  • 691 07-07 Logit Reweighting for Topic-Focused Summarization Logit-Umgewichtung für thematisch fokussierte Zusammenfassung 用于专题用途汇总的对数重新加权 2507.05235v1
  • 692 07-07 Transformers Can Solve Non-Linear and Non-Markovian Filtering Problems in Continuous Time For Conditionally Gaussian Signals Transformer können nicht-lineare und nicht-markowsche Filterprobleme in kontinuierlicher Zeit für bedingt gaussische Signale lösen 变换器可以在连续时间解答非滑动和非马尔科维的过滤问题, 以用于有条件的高斯信号 2310.19603v3
  • 693 07-07 SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound SEE-2-SOUND: Zero-Shot-Spatial-Environment-to-Spatial-Sound SEE-2-SOUND:零热空间环境对空间声音 2406.06612v2
  • 694 07-07 Cascade: Token-Sharded Private LLM Inference Cascade: Token-Sharded Private LLM Inferenz Cascade: 石块硬硬的私人LLM 推论 2507.05228v1
  • 695 07-07 NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving NavigScene: Überbrückung der lokalen Wahrnehmung und globale Navigation für das autonome Fahren jenseits des Visuellen Bereichs 导航-光景:将地方感知与全球导航连接起来,促进超越视觉-Range 自主驾驶 2507.05227v1
  • 696 07-07 OminiControl: Minimal and Universal Control for Diffusion Transformer OminiControl: Minimale und universelle Steuerung für Diffusionstransformator OminiControl: 扩散变异器最小和通用控制 2411.15098v6
  • 697 07-07 QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions QuEst: Verbesserung der Schätzungen von Quantil-basierten Verteilungsmaßen mittels Modellvorhersagen QuEst:利用模型预测,提高量基分配措施的估计数 2507.05220v1
  • 698 07-07 A 3D Machine Learning based Volume Of Fluid scheme without explicit interface reconstruction Ein 3D Machine Learning based Volume Of Fluid Schema ohne explizite Schnittstellenrekonstruktion 基于A3D的基于A3D机机学习的流流计划量,没有明确的接口重建 2507.05218v1
  • 699 07-07 Bridging Prediction and Intervention Problems in Social Systems Überbrückung von Vorhersagen und Interventionsprobleme in den Sozialsystemen 弥合社会制度中的预测和干预问题 2507.05216v1
  • 700 07-07 Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in $L^p$-sense Mehrstufige Picard-Annäherungen und tiefe neuronale Netzwerke mit ReLU, undichter ReLU und Softplus-Aktivierung überwinden den Fluch der Dimensionalität, wenn sie semilineare parabolische partielle Differentialgleichungen in $L^p$-Sense annähern 多级 Picard 近似和深神经网络,与 ReLU、 泄漏 ReLU 和软附加激活 克服了维度的诅咒, 当半线性半线性抛抛物线部分偏差方程以 $L^p$-sense 等值接近一致时 2409.20431v3
  • 701 07-07 ST-LoRA: Low-rank Adaptation for Spatio-Temporal Forecasting ST-LoRA: Niedrigrank-Anpassung für Spatio-Temporale Vorhersagen ST-LoRA: 低度适应性 2404.07919v2
  • 702 07-07 Pre-Trained Policy Discriminators are General Reward Models Pre-Trained Policy Diskriminatoren sind allgemeine Prämienmodelle 培训前政策歧视者是一般奖励模式 2507.05197v1
  • 703 07-07 Train-before-Test Harmonizes Language Model Rankings Train-before-Test harmonisiert Sprachmodell-Rankings 培训前测试前统一语言协调培训示范排列 2507.05195v1
  • 704 07-07 $\varphi$-Adapt: A Physics-Informed Adaptation Learning Approach to 2D Quantum Material Discovery $\varphi$-Adapt: Ein physikinformierter Anpassungs-Lernansatz zur 2D Quantenmaterial-Entdeckung $\varphi$-Adapt: 2D 量量材料发现物理上综合的适应学习方法 2507.05184v1
  • 705 07-07 MMD-OPT : Maximum Mean Discrepancy Based Sample Efficient Collision Risk Minimization for Autonomous Driving MMD-OPT : Maximale mittlere Diskrepanz auf Probenbasis Effiziente Kollisionsrisikominimierung für autonomes Fahren MMD-OPT: 以最大平均值差差为基础的抽样有效碰撞风险最小化 2412.09121v2
  • 706 07-07 Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis Auf dem Weg zu erklärbarer Fusion und ausgewogenem Lernen in der multimodalen Sentiment-Analyse 争取在多式联运分析中实现可解释的融合和平衡学习 2504.12151v2
  • 707 07-07 Blind Targeting: Personalization under Third-Party Privacy Constraints Blind Targeting: Personalisierung unter Datenschutzbeschränkungen Dritter 瞄准盲人:第三方隐私限制下的个性化 2507.05175v1
  • 708 07-07 Physics Encoded Blocks in Residual Neural Network Architectures for Digital Twin Models Physikkodierte Blöcke in residualen Neural Netzwerkarchitekturen für digitale Zwillingsmodelle 数字双型模型残余神经网络架构中的物理编码区块 2411.11497v2
  • 709 07-07 Critiques of World Models Kritik an Weltmodellen 世界模式的证明 2507.05169v1
  • 710 07-07 Can Local Representation Alignment RNNs Solve Temporal Tasks? Kann die lokale Vertretung RNNs zeitliche Aufgaben lösen? 当地代表协调区域网络能够解决时空任务吗? 2504.13531v2
  • 711 07-07 Language Models can Self-Improve at State-Value Estimation for Better Search Sprachmodelle können sich bei State-Value-Schätzung für eine bessere Suche selbst improvisieren 语言模型可以在国家估价估算中自我改进,以更好地搜索 2503.02878v2
  • 712 07-07 A Dynamical Systems Perspective on the Analysis of Neural Networks Eine dynamische Systemperspektive zur Analyse neuraler Netzwerke 神经网络分析动态系统视角 2507.05164v1
  • 713 07-07 Role of scrambling and noise in temporal information processing with quantum systems Rolle von Scrambling und Rauschen in der zeitlichen Informationsverarbeitung mit Quantensystemen 利用量子系统处理时空信息的作用 2505.10080v2
  • 714 07-07 Scalable Multi-Task Learning for Particle Collision Event Reconstruction with Heterogeneous Graph Neural Networks Skalierbares Multi-Task-Lernen für die Rekonstruktion von Teilchenkollisionsereignissen mit heterogenen Graphen-Neuralnetzwerken 可缩放多任务学习,用于与异形图形神经网络相配合的粒子碰撞重建活动 2504.21844v2
  • 715 07-07 Robust Molecular Property Prediction via Densifying Scarce Labeled Data Robuste Vorhersage der molekularen Eigenschaften durch verdichtende, mit Beschriftung versehene Daten 通过冷凝隐蔽标签数据进行强力分子属性预测 2506.11877v2
  • 716 07-07 Distributional Diffusion Models with Scoring Rules Verteilungs-Diffusionsmodelle mit Bewertungsregeln 配有Scoring 规则的分发传播模型 2502.02483v3
  • 717 07-07 OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows OGF: Eine Online-Gradient-Flow-Methode zur Optimierung des statistischen Steady-State-Zeitdurchschnitts unbeständiger Turbulentenströme OGF:优化不稳定波动流量统计稳定时间平均值的在线渐进流动方法 2507.05149v1
  • 718 07-07 Dataless Neural Networks for Resource-Constrained Project Scheduling Datenlose Neurale Netzwerke für ressourcengebundene Projektplanung 无数据神经网络 资源培训项目日程安排 2507.05322v1
  • 719 07-07 Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings Pseudo-Likelihood erzeugt assoziative Erinnerungen, die auch für asymmetrische Kupplungen verallgemeinern können 优多似近似产生连带记忆,即使对非对称联结而言,也能加以概括化。 2507.05147v1
  • 720 07-07 VERITAS: Verification and Explanation of Realness in Images for Transparency in AI Systems VERITAS: Verifizierung und Erklärung der Realität in Bildern für Transparenz in KI-Systemen VERITAS: 核实和解释图像的真实性,促进AI系统透明度 2507.05146v1
  • 721 07-07 A generalized Wasserstein-2 distance approach for efficient reconstruction of random field models using stochastic neural networks Ein generalisierter Wasserstein-2-Distanzansatz zur effizienten Rekonstruktion von Zufallsfeldmodellen mit stochastischen neuronalen Netzwerken 利用随机现场模型利用随机神经神经网络有效重建的通用瓦森斯坦-2远程方法 2507.05143v1
  • 722 07-07 Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications Hardware-effiziente tragbare probabilistische Inferenz für TinyML Neurosymbolische KI-Anwendungen TinyML Neurosymbolic AI 应用的硬件高效可移动可移动概率推推 2507.05141v1
  • 723 07-07 Inductive randomness predictors: beyond conformal Induktive Zufallsvorhersagen: jenseits von konform 感导随机性预测值:超出符合 2503.02803v2
  • 724 07-07 Deep Learning to Automate Parameter Extraction and Model Fitting of Two-Dimensional Transistors Deep Learning zur Automatisierung der Parameterextraktion und Modellmontage von zweidimensionalen Transistoren 深入学习自动取用参数自动提取和两维晶体管模型 2507.05134v1
  • 725 07-07 Federated Learning for Big Data: A Survey on Opportunities, Applications, and Future Directions Federated Learning for Big Data: Eine Umfrage zu Chancen, Anwendungen und Zukunftsrichtungen 大数据联邦学习:机会、应用和未来方向调查 2110.04160v3
  • 726 07-07 SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction SMART: Simulierte Schüler ausgerichtet mit Item Response Theorie für Frage Schwierigkeitsvorhersage SMART: 与 “ 问题困难预测 “ 项目反应理论一致的模拟学生 2507.05129v1
  • 727 07-07 A Concise Lyapunov Analysis of Nesterov’s Accelerated Gradient Method Eine kurze Lyapunov-Analyse der beschleunigten Gradientenmethode von Nesterov Nesterov加速梯度方法的简明Lyapunov分析 2502.17373v3
  • 728 07-07 LVM4CSI: Enabling Direct Application of Pre-Trained Large Vision Models for Wireless Channel Tasks LVM4CSI: Direkte Anwendung von vortrainierten Large Vision-Modellen für Wireless Channel-Aufgaben LVM4CSI:无线频道任务能够直接应用培训前大型愿景模型 2507.05121v1
  • 729 07-07 A Comparative Study of Machine Learning Algorithms for Stock Price Prediction Using Insider Trading Data Eine vergleichende Studie des maschinellen Lernens Algorithmen für Aktienkursvorhersage mit Insider Trading-Daten 利用内部贸易数据进行股票价格预测的机器学习算法比较研究 2502.08728v2
  • 730 07-07 Fast online node labeling with graph subsampling Schnelle Online-Node-Beschriftung mit Graphen-Subsampling 带有图形子抽样的快速在线节点标签 2503.16755v2
  • 731 07-07 CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation CLIP-geführte Backdoor-Verteidigung durch Entropie-basierte vergiftete Datensatztrennung CLIP-通过基于英基中毒数据集的分离来引导后门防御 2507.05113v1
  • 732 07-07 DICE: Discrete inverse continuity equation for learning population dynamics DICE: Diskrete inverse Kontinuitätsgleichung für das Lernen von Populationsdynamik DICE: 学习人口动态的离散逆连续性方程 2507.05107v1
  • 733 07-07 PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs PRING: Protein-Protein-Interaktionsvorhersage von Paaren zu Graphen neu denken 重新思考从对到图的蛋白质-蛋白质-蛋白质互动预测 2507.05101v1
  • 734 07-07 Beyond Features: How Dataset Design Influences Multi-Agent Trajectory Prediction Performance Beyond Features: Wie Dataset Design die Leistung von Multi-Agent Trajektorien beeinflusst 超越地貌外的地貌:数据集设计如何影响多要素轨迹预测性能 2507.05098v1
  • 735 07-07 Exploring Semantic Clustering and Similarity Search for Heterogeneous Traffic Scenario Graph Erforschung semantischer Clusterung und Ähnlichkeit Suche nach heterogenen Verkehrsszenarien Graph 探索迭代交通情况图 2507.05086v1
  • 736 07-07 Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks Verteilungsabhängige Generalisierungs-Bounds für das Tuning linearer Regressions-Across-Aufgaben 调制线性递退跨任务时的视分配情况而定的通用宽度 2507.05084v1
  • 737 07-07 Mirror Online Conformal Prediction with Intermittent Feedback Spiegel Online Konforme Vorhersage mit intermittierendem Feedback 附带间断反馈的 “ 镜 “ 在线非正式预测 2503.10345v5
  • 738 07-07 A dimensionality reduction technique based on the Gromov-Wasserstein distance Eine Dimensionalitätsreduktionstechnik auf Basis der Gromov-Wasserstein-Distanz 基于Gromov-Wasserstein距离的降低维度技术 2501.13732v2
  • 739 07-07 Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference Computation-Aware Gaussian Prozesse: Modellauswahl und lineare Zeitableitung 计算-软件高斯过程:模型选择和线性时间推算 2411.01036v2
  • 740 07-07 Replacing thinking with tool usage enables reasoning in small language models Das Ersetzen des Denkens durch den Werkzeuggebrauch ermöglicht die Argumentation in kleinen Sprachmodellen 以使用工具取代思维,使小语言模式推理成为可能 2507.05065v1
  • 741 07-07 Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes Vecchia-Inducing-Points Voll-Scale-Annäherungen für Gauß-Prozesse 高斯进程的全面接近度 2507.05064v1
  • 742 07-07 AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics KI-getriebene Zytomorphologie Bildsynthese für medizinische Diagnostik AI-Driven 用于医学诊断的细胞形态图像合成 2507.05063v1
  • 743 07-07 Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems Follow-the-Perturbed-Leader nähert sich Best-of-Both-Worlds für die m-Set Semi-Bandit-Probleme M-Set半银行问题最佳世界最佳办法 2504.07307v4
  • 744 07-07 Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning Graph-Assistente Stiche für Offline-Hierarchisches Verstärkungslernen 离线高层强化学习的图表辅助细化 2506.07744v3
  • 745 07-07 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Warum kämpfen Open Source LLMs mit Datenanalyse? Eine systematische empirische Studie 开放源码LLMs为何要与数据分析斗争?系统的经验研究 2506.19794v2
  • 746 07-07 Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens Jenseits von Skalierungskurven: Interne Dynamik neuraler Netzwerke durch die NTK-Lens 超越缩放曲线:通过NTK镜头神经网络的内部动态 2507.05035v1
  • 747 07-07 A Generative Diffusion Model for Amorphous Materials Ein Generatives Diffusionsmodell für amorphe Materialien 无定形材料发源扩散模型 2507.05024v1
  • 748 07-07 A Novel Automatic Real-time Motion Tracking Method in MRI-guided Radiotherapy Using Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation Eine neuartige automatische Echtzeit-Bewegungsverfolgungsmethode in der MRT-geführten Radiotherapie mit erweitertem Tracking-Learning-Detection-Framework mit automatischer Segmentierung 利用带有自动分割的强化跟踪-学习-检测框架,在MRI-制导放射治疗中使用新颖自动实时实时运动跟踪方法 2411.07503v3
  • 749 07-07 Meta-Learning Transformers to Improve In-Context Generalization Meta-Learning Transformer zur Verbesserung der In-Context-Verallgemeinerung 利用元学习变换器改进内文通化 2507.05019v1
  • 750 07-07 Relative Overfitting and Accept-Reject Framework Relativer Überbau und Akzeptanz-Abwehrrahmen 相对超配和接受-接受-拒绝框架 2505.07783v4
  • 751 07-07 Moderate Adaptive Linear Units (MoLU) Mäßige adaptive Lineareinheiten (MoLU) 适应性线性线性单位(MoLU) 2302.13696v6
  • 752 07-07 Conditional Graph Neural Network for Predicting Soft Tissue Deformation and Forces Bedingtes Graphen-Neural-Netzwerk zur Vorhersage weicher Gewebedeformationen und Kräfte 预测软质变形和力量有条件图图神经网络 2507.05315v1
  • 753 07-07 Random weights of DNNs and emergence of fixed points Zufällige Gewichte von DNNs und Entstehung von festen Punkten DNN的随机权数和固定点的出现 2501.04182v2
  • 754 07-07 Solar Flare Prediction Using LSTM and DLSTM with Sliding Window Pattern Recognition Solarflare-Vorhersage mit LSTM und DLSTM mit Schiebefenstermustererkennung 使用 LSTM 和 DLSTM 和 DLSTM 进行太阳光电预测,并识别滑动窗口模式 2507.05313v1
  • 755 07-07 The Case for Instance-Optimized LLMs in OLAP Databases Der Fall der instance-optimierten LLM in OLAP-Datenbanken OLAP数据库中实例优化LLM的理由 2507.04967v1
  • 756 07-07 Mask Approximation Net: A Novel Diffusion Model Approach for Remote Sensing Change Captioning Maskenannäherung Netz: Ein neuartiger Diffusions-Modellansatz zur Fernerkundung von Änderungsunterschriften 面罩接近网:遥感变化说明的新传播示范方法 2412.19179v3
  • 757 07-07 Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems Quantifying Robustness: Ein Benchmarking-Framework für Deep Learning Forecasting in Cyber-Physical Systems 量化强度:网络物理系统深度学习预测基准框架 2504.03494v2
  • 758 07-07 ConBatch-BAL: Batch Bayesian Active Learning under Budget Constraints ConBatch-BAL: Batch Bayesian Active Learning unter Budgetbeschränkungen ConBatch-BAL:预算限制下的批量贝叶斯主动学习 2507.04929v1
  • 759 07-07 Object-centric Denoising Diffusion Models for Physical Reasoning Objektzentrierte Denoising-Diffusionsmodelle für physikalische Vernunft 物理理由解释的传播模型 2507.04920v1
  • 760 07-07 LaCoOT: Layer Collapse through Optimal Transport LaCoOT: Layer Collapse durch optimalen Transport LaCoOT: 通过最佳迁移折叠图层 2406.08933v2
  • 761 07-07 Gradient Purification: Defense Against Poisoning Attack in Decentralized Federated Learning Gradient Purification: Verteidigung gegen den Giftangriff im dezentralisierten Föderierten Lernen 逐步净化:在权力下放的联邦学习中防范毒物攻击 2501.04453v3
  • 762 07-07 Do LLMs Understand the Safety of Their Inputs? Training-Free Moderation via Latent Prototypes Verstehen LLMs die Sicherheit ihrer Eingaben? Schulungsfreie Moderation über Latent Prototypen LLM了解其输入的安全性吗?通过潜在原型进行无训练的审核 2502.16174v2
  • 763 07-07 Training-Conditional Coverage Bounds under Covariate Shift Training-Conditional Coverage Bounds unter Kovariate Shift 共同变换下的培训 – – 有条件的覆盖范围 2405.16594v3
  • 764 07-07 When do World Models Successfully Learn Dynamical Systems? Wann lernen Weltmodelle erfolgreich Dynamische Systeme? 世界模型何时成功学习动态系统? 2507.04898v1
  • 765 07-07 EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework EFRame: Deeper Reasoning über Exploration-Filter-Replay Stärkungs-Lernrahmen EFRame:通过探索-钻孔-回放-强化学习框架,更深的原因 2506.22200v3
  • 766 07-07 Fine-tuning on simulated data outperforms prompting for agent tone of voice Feinabstimmung auf simulierte Daten übertrifft Aufforderung für Agent Ton der Stimme 模拟数据优于工作表现的微调,激发了代理方声音的音调 2507.04889v1
  • 767 07-07 Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning Jenseits der Trainingszeit Vergiftung: Komponenten-Ebene und Nachschulung Hintertüren im Deep Reinforcement Learning 超越培训时间中毒:深强化学习的构成部分和培训后后后门 2507.04883v1
  • 768 07-07 Model-free Posterior Sampling via Learning Rate Randomization Modellfreies Posterior-Sampling über Lernraten-Randomisierung 通过学习率随机化进行无模型后验抽样 2310.18186v2
  • 769 07-07 Adaptive Slimming for Scalable and Efficient Speech Enhancement Adaptive Schlankheit für skalierbare und effiziente Sprachverbesserung 促进可缩放和高效语音增强的适应性缩小 2507.04879v1
  • 770 07-07 Learned enclosure method for experimental EIT data Gelerntes Gehäuseverfahren für experimentelle EIT-Daten 经济转型期数据实验性数据累积方法 2504.11512v3
  • 771 07-07 ReCAP: Recursive Cross Attention Network for Pseudo-Label Generation in Robotic Surgical Skill Assessment ReCAP: Rekursives Cross-Care-Netzwerk für die Pseudo-Label-Generation in Robotic Surgical Skills Assessment RECAP:机器人外科技能评估中优杜-拉贝尔一代人递递性交叉关注网络 2407.05180v4
  • 772 07-07 NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification NTSFormer: Ein selbstlernender Graphentransformator für multimodale Kaltstartknotenklassifikation NTSFormer: 用于多模态冷启动节点分类的自教学图Transformer 2507.04870v1
  • 773 07-07 Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation Music Boomerang: Wiederverwendbare Diffusionsmodelle für Datenvergrößerung und Audiomanipulation 音乐 “ 潮动 “ :重新使用数据增强和音频操纵的传播模型 2507.04864v1
  • 774 07-07 Fairness and Sparsity within Rashomon sets: Enumeration-Free Exploration and Characterization Fairness und Sparsamkeit innerhalb der Rashomon-Sets: Enumeration-freie Exploration und Charakterisierung Rashomon各套内部的公平和公平:数字-自由勘探和定性 2502.05286v2
  • 775 07-07 Improving Predictions of Convective Storm Wind Gusts through Statistical Post-Processing of Neural Weather Models Verbesserung der Vorhersagen von Konvective Storm Wind Gusts durch statistische Nachbearbeitung von neuralen Wettermodellen 通过神经气象模型的统计后处理,改进对动态风暴风速风速素的预测 2504.00128v2
  • 776 07-07 Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu Auf dem Weg zum Menschen-in-the-Loop-Eingangserkennung: Ein Transfer-Lernansatz für Maracatu 走向人类在卢布Onset探测:Maracatu的转让学习方法 2507.04858v1
  • 777 07-07 Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos Räumliche und semantische Integration für Stereo Sound Event Lokalisierung und Detektion in regelmäßigen Videos 常规视频中立体声音事件定位和探测的空间和语义内嵌式整合 2507.04845v1
  • 778 07-07 PEVLM: Parallel Encoding for Vision-Language Models PEVLM: Parallele Kodierung für Vision-Language-Modelle PEVLM: 视觉语言模型平行编码 2506.19651v2
  • 779 07-07 RewardAnything: Generalizable Principle-Following Reward Models BelohnungAlles: Allgemeines Prinzip-folgende Belohnung Modelle 奖励任何奖励:可普遍适用的遵循原则的奖赏模式 2506.03637v2
  • 780 07-07 Discrete Diffusion Trajectory Alignment via Stepwise Decomposition Diskrete Diffusions-Trajektorie-Ausrichtung über Schrittweise Zersetzung 通过步进分解解解析分解扩散轨迹对齐 2507.04832v1
  • 781 07-07 A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization Ein High-Level-Compiler-Integrationsansatz für Deep-Learning-Beschleuniger, die Abstraktion und Optimierung unterstützen 支持抽象和优化的深学习加速器高级别汇编者一体化办法 2507.04828v1
  • 782 07-07 Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions Verbesserung der numerischen Stabilität des Normalisierten gegenseitigen Informations-Schätzers über hohe Abmessungen 提高高方面标准化相互信息模拟器的数值稳定 2410.07642v3
  • 783 07-07 UDF-GMA: Uncertainty Disentanglement and Fusion for General Movement Assessment UDF-GMA: Ungewissheitsentflechtung und Fusion für allgemeine Bewegungsbewertung UDF-GMA: 不确定的分解和融合促进总体运动评估 2507.04814v1
  • 784 07-07 Neural Velocity for hyperparameter tuning Neurale Geschwindigkeit für Hyperparameter-Tuning 用于超分光计调的神经速率 2507.05309v1
  • 785 07-07 Synthesising Activity Participations and Scheduling with Deep Generative Machine Learning Synthesising Activity Participations and Scheduling with Deep Generative Machine Learning 与深创机学习相结合的活动参与和时间安排 2501.10221v3
  • 786 07-07 High Order Collaboration-Oriented Federated Graph Neural Network for Accurate QoS Prediction High Order Collaboration-Oriented Federated Graph Neural Network für genaue QoS-Vorhersage 高级秩序协作-以联邦州际同步预测神经网络 2507.05308v1
  • 787 07-07 Kalman Filter Aided Federated Koopman Learning Kalman Filter Aided Federated Koopman Learning Kalman过滤器帮助联邦库普曼学习 2507.04808v1
  • 788 07-07 Interpretable Machine Learning for Urban Heat Mitigation: Attribution and Weighting of Multi-Scale Drivers Verdolmetschbares maschinelles Lernen für die urbane Wärmemilderung: Attribution und Gewichtung von Multi-Scale-Treibern 缓解城市热量的可解释性机器学习:多规模驱动因素的归属和加权 2507.04802v1
  • 789 07-07 Training-Free Query Optimization via LLM-Based Plan Similarity Schulungsfreie Abfrage Optimierung über LLM-basierte Plan-Ähnlichkeit 通过LLM-基于LLM-计划相似性实现无培训查询优化 2506.05853v2
  • 790 07-07 Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning Interaction-Merged Motion Planning: Diverse Motion-Datensätze für robuste Planung effektiv nutzen 交互式组合式动态规划:有效利用多种移动式数据集进行强力规划 2507.04790v1
  • 791 07-07 Machine Learning from Explanations Maschinelles Lernen aus Erklärungen 从解释中学习机器 2507.04788v1
  • 792 07-07 Reason to Rote: Rethinking Memorization in Reasoning Grund zu rotieren: Die Erinnerung an die Vernunft neu denken 旋转理由:重新思考在理由方面的记忆 2507.04782v1
  • 793 07-07 FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift FedPall: Prototypenbasiertes adversarisches und kollaboratives Lernen für Federated Learning mit Feature Drift FedPall: 以原型为基础的反versarial和协作学习,促进联邦学习与地物学 2507.04781v1
  • 794 07-07 Sure Convergence and Constructive Universal Approximation for Multi-Layer Neural Networks Sichere Konvergenz und konstruktive Universal-Annäherung für multi-layer-Neural-Netzwerke 多层神经神经网络的稳固趋同和建设性通用近似 2507.04779v1
  • 795 07-07 Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction Verbesserung des BERT für symbolisches Musikverständnis mit Token Denoising und Pianoroll Prediction 改进使用托肯否认学和钢琴预测学的符号音乐理解BERT 2507.04776v1
  • 796 07-07 Integrating Biological and Machine Intelligence: Attention Mechanisms in Brain-Computer Interfaces Integration biologischer und maschineller Intelligenz: Aufmerksamkeitsmechanismen in Gehirn-Computer-Schnittstellen 生物和机器综合情报:脑-计算机界面中的注意机制 2502.19281v2
  • 797 07-07 Efficient Unlearning with Privacy Guarantees Effizientes Verlernen mit Datenschutzgarantien 具有隐私保障的高效遗忘学习 2507.04771v1
  • 798 07-07 ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems ABench-Physics: Benchmarking Physical Reasoning in LLMs über hochschwierige und dynamische Physikprobleme ABench-物理:通过高难度和动态物理问题确定LLMLM中物理原因的基准 2507.04766v1
  • 799 07-07 Intervening to learn and compose disentangled representations Intervenieren, um entfremdete Darstellungen zu lernen und zu komponieren 干预学习和组成分解的表述 2507.04754v1
  • 800 07-07 Enhancing variational quantum algorithms by balancing training on classical and quantum hardware Verbesserung von variationalen Quantenalgorithmen durch Ausbalancieren von Trainings auf klassischer und Quantenhardware 通过平衡传统硬件和量子硬件的培训,加强变异量算法 2503.16361v2
  • 801 07-07 Improving Graph Out-of-distribution Generalization Beyond Causality Verbesserung der Graphen-Verallgemeinerung außerhalb der Verteilung jenseits der Kausalität 改善超出原因范围的普遍化的分布外图 2407.10204v3
  • 802 07-07 Enjoying Non-linearity in Multinomial Logistic Bandits Nichtlinearität in multinomialen Logistischen Banditen genießen 享受多种后勤大盗的非线性 2507.05306v1
  • 803 07-07 CueLearner: Bootstrapping and local policy adaptation from relative feedback CueLearner: Bootstrapping und lokale Politikanpassung durch relatives Feedback CueLearner:利用相对反馈推动和调整地方政策 2507.04730v1
  • 804 07-07 Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization Textentgiftung: Dateneffizienz, Semantische Erhaltung und Modellverallgemeinerung 文字解毒:数据效率、语义保护和模式通用 2507.01050v2
  • 805 07-07 Data Matters: The Case of Predicting Mobile Cellular Traffic Datenangelegenheiten: Der Fall der Vorhersage des mobilen Zellulären Verkehrs 数据事项:预测移动电话流量的案例 2411.02418v2
  • 806 07-07 Dementia Detection using Multi-modal Methods on Audio Data Demenzerkennung mit multimodalen Methoden auf Audiodaten 使用多模式音频数据方法检测痴呆症 2501.00465v2
  • 807 07-07 Uncertainty in Real-Time Semantic Segmentation on Embedded Systems Unsicherheit in der Echtzeit-Semantik-Segmentierung auf Embedded-Systemen 内嵌系统实时塞义分割的不确定性 2301.01201v5
  • 808 07-07 Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs 正式/非正式陈述和正式/非正式证明正式理由理由完全基准辩护人 2507.04719v1
  • 809 07-07 On the quality of randomized approximations of Tukey’s depth Zur Qualität der randomisierten Näherung der Tukey-Tiefe Tukey深度随机近似质量 2309.05657v3
  • 810 07-07 Optimal Model Selection for Conformalized Robust Optimization Optimale Modellauswahl für konformisierte Robuste Optimierung 正规化强力优化最佳模型选择最佳模型选择 2507.04716v1
  • 811 07-07 Learning Maximal Safe Sets Using Hypernetworks for MPC-based Local Trajectory Planning in Unknown Environments Maximal sichere Sets mit Hypernetworks für MPC-basierte lokale Flugbahnplanung in unbekannten Umgebungen lernen 为未知环境中基于 MPC 的本地轨迹规划使用超网络学习最大安全设置 2410.20267v3
  • 812 07-07 Mutual Information Optimal Control of Discrete-Time Linear Systems Gegenseitige Information Optimale Kontrolle von Diskretzeit-Linearsystemen 离散时间线性系统的互信息最优控制 2507.04712v1
  • 813 07-07 Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model Geometrisch-geführte Wenig-heiße Dentalmark-Erkennung mit human-zentrischem Stiftungsmodell 使用人类研究中心基金会模型的几口热牙科地标探测 2507.04710v1
  • 814 07-07 Spooky Action at a Distance: Normalization Layers Enable Side-Channel Spatial Communication Spooky Action auf Distanz: Normalisierungsschichten aktivieren die räumliche Kommunikation mit Seitenkanal 远距离恐怖行动:普通化层使侧通道空间通信得以实现 2507.04709v1
  • 815 07-07 UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization UrbanMind: Auf dem Weg zu urbaner Allgemeinintelligenz durch Tool-Enhanced Retrieval-Augmented Generation und Multilevel-Optimierung 城市中心:通过工具增强的回收型和多层次优化实现城市一般情报 2507.04706v1
  • 816 07-07 Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards Balance Act: Priorisierungsstrategien für LLM-Designed Restless Bandit Rewards 平衡之道:LLM设计的不安分老虎机奖励的优先级策略 2408.12112v4
  • 817 07-07 FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs FAMOUS: Flexibler Beschleuniger für den Aufmerksamkeitsmechanismus des Transformators auf UltraScale+ FPGAs FAMOUS: 超标准+FPGAs变异器注意机制灵活加速器 2409.14023v3
  • 818 07-07 The Neural Networks with Tensor Weights and the Corresponding Fermionic Quantum Field Theory Die neuralen Netzwerke mit Tensorgewichten und die korrespondierende Fermionische Quantenfeldtheorie 具有Tensor重量的神经网络和相应的Fermionic 量子场理论 2507.05303v1
  • 819 07-07 ResQuNNs: Towards Enabling Deep Learning in Quantum Convolution Neural Networks ResQuNNs: Auf dem Weg zum Deep Learning in Quantum Convolution Neural Networks ResQuNNs:争取在量子革命神经网络中进行深入学习 2402.09146v6
  • 820 07-07 Synergistic Localization and Sensing in MIMO-OFDM Systems via Mixed-Integer Bilevel Learning Synergistische Lokalisierung und Sensing in MIMO-OFDM-Systemen über Mixed-Integer Bilevel Learning 通过混合内插双级学习,在MOIMO-OFDM系统中实现协同本地化和遥感 2507.07118v1
  • 821 07-07 Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation Performance-Evaluierung allgemeiner Zwecke Große Sprachmodelle für grundlegende lineare Algebra-Unterprogramme Code-Generierung 基本线性代代数子方案代码生成通用大语言模型绩效评价 2507.04697v1
  • 822 07-07 Monte Carlo Tree Diffusion for System 2 Planning Monte Carlo Tree Diffusion für System 2 Planung 用于系统2规划的蒙特卡洛树传播 2502.07202v5
  • 823 07-07 Interpretable Reward Modeling with Active Concept Bottlenecks Interpretierbare Prämienmodellierung mit Active Concept Engpässen 具有主动概念瓶颈的可解释的奖励模型 2507.04695v1
  • 824 07-07 Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness Überbrücken von KAN und MLP: MJKAN, eine hybride Architektur mit Effizienz und Expressivität 连接KAN和MLP:MJKAN,一个既效率又能表达的混合结构 2507.04690v1
  • 825 07-07 Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks Skalierungskollaps offenbart universelle Dynamiken in rechenoptimal trainierten neuronalen Netzwerken 缩放坍缩揭示计算最优训练神经网络中的通用动态 2507.02119v2
  • 826 07-07 Fully Automatic Neural Network Reduction for Formal Verification Vollautomatische Neural-Netzwerk-Reduktion für formale Überprüfung 用于正式核查的全自动神经网络减少 2305.01932v3
  • 827 07-07 Recovering Plasticity of Neural Networks via Soft Weight Rescaling Wiederherstellung der Plastizität von neuralen Netzwerken durch weiche Gewichtsreskalierung 通过软重量缩放恢复神经网络的可塑性 2507.04683v1
  • 828 07-07 Operator-based machine learning framework for generalizable prediction of unsteady treatment dynamics in stormwater infrastructure Bedienerbasierter Rahmen für maschinelles Lernen zur allgemeinen Vorhersage unbeständiger Behandlungsdynamiken in der Sturmwasserinfrastruktur 以操作人员为基础的机械学习框架,以普遍预测暴雨水基础设施中的不稳定处理动态 2507.04682v1
  • 829 07-07 Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation Identifizieren, Isolieren und Spülen: Halluzinationen in LVLMs mittels selbst-entwickelnder Destillation abmildern 识别、孤立和清洗:通过自演蒸馏减少LVLMs中的幻觉 2507.04680v1
  • 830 07-07 QuTE: decentralized multiple testing on sensor networks with false discovery rate control QuTE: dezentrale Mehrfachprüfung in Sensornetzwerken mit falscher Discovery-Rate-Steuerung QuTE:对带有虚假发现率控制的传感器网络进行分散式多重测试 2210.04334v2
  • 831 07-07 Normality-Guided Distributional Reinforcement Learning for Continuous Control Normalitätsorientiertes Verstärkungslernen zur kontinuierlichen Kontrolle 为持续控制而指导的正常水平分配强化学习 2208.13125v4
  • 832 07-07 Enhancing Long Video Generation Consistency without Tuning Verbesserung der langen Video-Generierungskonsistenz ohne Tuning 增强长视频一代的连续性,不收费 2412.17254v2
  • 833 07-07 Universal approximation results for neural networks with non-polynomial activation function over non-compact domains Universelle Näherungsergebnisse für neuronale Netzwerke mit nicht-polynomischer Aktivierungsfunktion über nicht-kompakte Domänen 具有非常规领域非球体激活功能的神经网络的通用近似结果 2410.14759v4
  • 834 07-07 DANCE: Resource-Efficient Neural Architecture Search with Data-Aware and Continuous Adaptation DANCE: Ressourceneffiziente Neuralarchitektur-Suche mit Daten-Bewusstsein und kontinuierlicher Anpassung 包括数据软件和持续适应在内的资源效率高的神经结构搜索 2507.04671v1
  • 835 07-07 Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading Pensieve Grader: Eine KI-Powered, Ready-to-Use Plattform für mühelose handschriftliche STEM-Grading Pensieve Grader: 一个AI驱动、即开即用的轻松手写STEM评分平台 2507.01431v2
  • 836 07-07 Hybrid Adversarial Spectral Loss Conditional Generative Adversarial Networks for Signal Data Augmentation in Ultra-precision Machining Surface Roughness Prediction Hybrid Adversarial Spectral Loss Conditional Generative Adversarial Networks für Signaldaten Augmentation in Ultrapräzision Bearbeitung Oberflächenrauigkeit Vorhersage 超精密断层表面粗糙预测中信号数据增强条件性反向网络 2507.04665v1
  • 837 07-07 A Cycle-Consistency Constrained Framework for Dynamic Solution Space Reduction in Noninjective Regression Ein zyklisch-konsistenzbeschränktes Framework zur Reduzierung des Raumes dynamischer Lösungen bei nichtinjektiver Regression 循环-一致性协调框架,以在非预测性倒退中减少动态解决办法空间 2507.04659v1
  • 838 07-07 Quantum Doeblin Coefficients: Interpretations and Applications Quanten-Doeblin-Koeffizienten: Interpretationen und Anwendungen Doeblin系数:解释和应用 2503.22823v2
  • 839 07-07 VaxPulse: Monitoring of Online Public Concerns to Enhance Post-licensure Vaccine Surveillance VaxPulse: Überwachung der Online-Belange der Öffentlichkeit zur Verbesserung der Überwachung nach der Lizenzierung von Impfstoffen VaxPulse:监测公众在线关注加强许可证后疫苗监测 2507.04656v1
  • 840 07-07 Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models Breach in the Shield: Enthüllen der Schwachstellen großer Sprachmodelle 打破盾牌:消除大语言模式的脆弱性 2504.03714v2
  • 841 07-07 Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance Selbst-rektifizierende Diffusions-Probenahme mit Perturbed-Attention-Anleitung 使用受扰动的防扰动指导进行自我强化扩散抽样 2403.17377v2
  • 842 07-07 Decision Feedback In-Context Learning for Wireless Symbol Detection Entscheidungsfeedback In-Context-Lernen für drahtlose Symbolerkennung 无线电无线符号探测的网上学习 2503.16594v2
  • 843 07-07 SOSAE: Self-Organizing Sparse AutoEncoder SOSAE: Selbstorganisierender Sparse AutoEncoder SOSAE: 自行组织式的垃圾自动编码器 2507.04644v1
  • 844 07-07 Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? Kann Prompt Schwierigkeit Online vorausgesagt werden, um RL zu beschleunigen Finetuning of Reasoning Models? 快速困难能否预测为加速理据模型的RL微调而在线化? 2507.04632v1
  • 845 07-07 Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation Multimodales Latent Diffusionsmodell für komplexe Nähmuster-Generierung 复杂缝合模式生成的多式联运中流传播模式模型 2412.14453v2
  • 846 07-07 VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration VCDiag: Klassifizierende Erroneous-Wellenformen für Ausfall-Triage-Beschleunigung VCDiag: 失灵千兆字节加速不规则波形分类 2506.03590v2
  • 847 07-07 Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences Multimodale LLM integrierte semantische Kommunikation für 6G Immersive Erfahrungen 用于6G吸收经验的多式多式LLM综合语义通信 2507.04621v1
  • 848 07-07 Information-Guided Diffusion Sampling for Dataset Distillation Informationsgeführte Diffusionsproben für die Datensatzdestillation 用于数据集蒸馏的信息引导扩散抽样 2507.04619v1
  • 849 07-07 AI for the Open-World: the Learning Principles KI für die offene Welt: die Lernprinzipien 面向开放世界的AI:学习原则 2504.14751v2
  • 850 07-07 Towards Cost-Effective Reward Guided Text Generation Auf dem Weg zu einer kosteneffizienten, von der Belohnung geleiteten Textgenerierung 实现成本-效益高的回报制制制制制制 2502.04517v2
  • 851 07-07 Network Topology Inference from Smooth Signals Under Partial Observability Netzwerk Topologie Schlussfolgerung von glatten Signalen unter teilweiser Beobachtung 局部可部分观测下平滑信号产生的网络地形推断 2410.05707v3
  • 852 07-07 Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Position: Machine Learning Konferenzen sollten einen “Refutations and Critiques” Track erstellen 职位:机器学习会议应建立“反驳和批评”轨道 2506.19882v3
  • 853 07-07 any4: Learned 4-bit Numeric Representation for LLMs any4: Erlernte 4-Bit numerische Darstellung für LLMs 任何4: 获得4位数的LLM 数字代表数 2507.04610v1
  • 854 07-07 Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions Beschleunigtes Online-Verstärkungslernen mit Hilfe von Hilfs-Zustandsverteilungen 利用辅助启动州分发物资加速在线强化学习 2507.04606v1
  • 855 07-07 A Statistical Approach for Synthetic EEG Data Generation Ein statistischer Ansatz für die synthetische EEG-Datenerstellung 合成EEG数据生成的统计方法 2504.16143v2
  • 856 07-07 Predicting Drivers’ Route Trajectories in Last-Mile Delivery Using A Pair-wise Attention-based Pointer Neural Network Vorhersage der Routentrajektorien von Fahrern in der Last-Mile-Lieferung mit einem paarweise aufmerksamkeitsbasierten Neural-Netzwerk 利用以对等关注为基础的指针神经网络预测最后中程交付中的司机路线轨迹 2301.03802v2
  • 857 07-07 Photon Splatting: A Physics-Guided Neural Surrogate for Real-Time Wireless Channel Prediction Photon Splatting: Ein physikgeführtes Neural Surrogate für Echtzeit-Wireless-Channel-Vorhersage 光子喷射:实时无线频道预测的物理引导神经系统巡视 2507.04595v1
  • 858 07-07 Structured Captions Improve Prompt Adherence in Text-to-Image Models (Re-LAION-Caption 19M) Structured Captions verbessern die sofortige Einhaltung von Text-zu-Bild-Modellen (Re-LAION-Caption 19M) 结构说明 改进对文本到图像模型的迅速遵守(Re-LAION-Caption 19M) 2507.05300v1
  • 859 07-07 Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation Gradient-geführte Aufmerksamkeitskarte Bearbeitung: Auf dem Weg zu einer effizienten kontextuellen Halluzination 逐步制导注意地图编辑:努力有效减少环境幻觉 2503.08963v2
  • 860 07-07 Label-free evaluation of lung and heart transplant biopsies using tissue autofluorescence-based virtual staining Labelfreie Bewertung von Lungen- und Herztransplantationsbiopsien unter Verwendung von Gewebe-Autofluoreszenz-basierter virtueller Färbung 利用组织基于自体荧光的虚拟污点,对肺部和心脏移植活性细胞进行无标签的无标签评价 2409.05255v2
  • 861 07-07 A Lightweight Deep Learning Model for Automatic Modulation Classification using Dual Path Deep Residual Shrinkage Network Ein leichtes Deep-Learning-Modell für automatische Modulationsklassifikation mit Dual Path Deep Residual Shrinkage Network 使用双路深海残余缩小网络自动调整分类的轻量深学习模式 2507.04586v1
  • 862 07-07 Distributionally Robust Active Learning for Gaussian Process Regression Distributionell robustes aktives Lernen für Gaußsche Prozessregression Gaussian 进程倒退的分布强力积极学习 2502.16870v3
  • 863 07-06 (7) Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates Iterative lineare quadratische Optimierung für nichtlineare Steuerung: Differentielle Programmierung algorithmischer Vorlagen 非线性控控的迭代线性线性赤道优化:可区分的编程 2207.06362v2
  • 864 07-06 DMesh++: An Efficient Differentiable Mesh for Complex Shapes DMesh++: Ein effizientes, differenzierbares Mesh für komplexe Formen DMesh++: 用于复杂形状的高效区别网格 2412.16776v2
  • 865 07-06 Sensor-fusion based Prognostics for Deep-space Habitats Exhibiting Multiple Unlabeled Failure Modes Sensor-fusionsbasierte Prognostik für Deep-Space-Habitate, die mehrere nicht markierte Ausfallmodi ausstellen 展示多种无标签故障模式的深空生境感应集成预测 2411.12159v3
  • 866 07-06 Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts Nile-Chat: Ägyptische Sprachmodelle für arabische und lateinische Scripte Nile-Chat:阿拉伯文和拉丁文脚本埃及语言模型 2507.04569v1
  • 867 07-06 Evaluating LLMs on Real-World Forecasting Against Human Superforecasters Bewertung von LLMs bei der Real-World-Prognose gegen menschliche Superforecaster 评价针对人类超先天预测者进行真实世界预测的LLMs评价项目 2507.04562v1
  • 868 07-06 PENCIL: Long Thoughts with Short Memory PENCIL: Lange Gedanken mit kurzer Erinnerung PENCIL:有短记忆的长思考 2503.14337v2
  • 869 07-06 SPIRA: Building an Intelligent System for Respiratory Insufficiency Detection SPIRA: Aufbau eines intelligenten Systems zur Erkennung von respiratorischer Insuffizienz SPIRA: 建立呼吸系统不足检测智能系统 2507.04548v1
  • 870 07-06 Long Context Modeling with Ranked Memory-Augmented Retrieval Langes Kontextmodellieren mit Rangfolgememory-Augmented Retrieval 长处建模,有升级的内存增强检索模型 2503.14800v2
  • 871 07-06 Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data Gradienten Steigerung Entscheidung Bäume auf medizinische Diagnose über Tabellendaten 关于表格数据上医学诊断的渐进推动决策树 2410.03705v4
  • 872 07-06 da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs da4ml: Verteilte Arithmetik für Echtzeit-Neurale Netzwerke auf FPGAs da4ml: FPGAs 实时神经网络的分布式重新测量 2507.04535v1
  • 873 07-06 DP-Fusion: Token-Level Differentially Private Inference for Large Language Models DP-Fusion: Token-Level Differentially Private Inferenz für große Sprachmodelle DP-Fusion: 大语言模式的调控层面的私人不同推论 2507.04531v1
  • 874 07-06 Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness Von ähnlichen linearen Darstellungen lernen: Adaptivität, Minimaximität und Robustheit 从类似线性表示法学习:适应性、微量和强力 2303.17765v5
  • 875 07-06 Selective Prediction via Training Dynamics Selektive Vorhersage über Trainingsdynamik 通过培训动态进行选择性预测 2205.13532v4
  • 876 07-06 Verification of Visual Controllers via Compositional Geometric Transformations Verifizierung von visuellen Controllern durch kompositorische geometrische Transformationen 通过组成几何变形核查视觉控制器 2507.04523v1
  • 877 07-06 Diffusion Explorer: Interactive Exploration of Diffusion Models Diffusion Explorer: Interaktive Erforschung von Diffusionsmodellen 传播探索:传播模型的互动探索 2507.01178v2
  • 878 07-06 Toward Universal Laws of Outlier Propagation Auf dem Weg zu universellen Gesetzen der Ausreißerpropagation 走向世界外阴外传播法 2502.08593v3
  • 879 07-06 DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging DOTResize: Reduzierung der LLM-Breite durch diskrete Optimale transportbasierte Neuronenverschmelzung DOTResize: 通过分立优化运输最佳方式的中子合并减少 LLM 宽度 2507.04517v1
  • 880 07-06 On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems Über die Nützlichkeit von Domain-Adjacent Fine-Tuned Model Ensembles für Probleme mit wenigen Fotos 关于Dome-邻接精度精美设计模型集的效用问题 2406.13720v2
  • 881 07-06 Collapse-Proof Non-Contrastive Self-Supervised Learning Kollaps-Proof des nicht kontrasiven Selbstüberwachten Lernens 非封闭性自我监督学习的折叠结果 2410.04959v4
  • 882 07-06 Mixtures of Gaussian Process Experts with SMC$^2$ Mischungen von Gaußschen Prozessexperten mit SMC$^2$ 高斯过程专家与 SMC$^2$ 的混合 2208.12830v2
  • 883 07-06 Tackling Dimensional Collapse toward Comprehensive Universal Domain Adaptation Ein dimensionaler Zusammenbruch hin zu einer umfassenden universellen Domain-Anpassung 应对走向全面通用域域适应的方位折叠 2410.11271v3
  • 884 07-06 Self-Consistency Preference Optimization Selbst-Konsistenz-Präferenz-Optimierung 优化优惠 2411.04109v3
  • 885 07-06 Thousand-Brains Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference Tausende-Brains-Systeme: Sensorimotorische Intelligenz für schnelles, robustes Lernen und Rückschlüsse 千字系统:用于快速、强力学习和推断的感知分子情报 2507.04494v1
  • 886 07-06 Machine Learning-Based Prediction of Metal-Organic Framework Materials: A Comparative Analysis of Multiple Models Machine Learning-based Prediction of Metal-Organic Framework Materials: Eine vergleichende Analyse mehrerer Modelle 对金属-有机框架材料的基于学习的机 机 动 学习-预测:对多种模型的比较分析 2507.04493v1
  • 887 07-06 Dealing with Uncertainty in Contextual Anomaly Detection Umgang mit Ungewissheit bei der kontextuellen Anomalienerkennung 处理内地异常探测的不确定性 2507.04490v1
  • 888 07-06 Source Attribution in Retrieval-Augmented Generation Quellenzuweisung in der retrieval-angereicherten Generation 来源来源来源的归属 2507.04480v1
  • 889 07-06 Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice Minimax und Buchten Optimale Best-Arm-Identifikation: Adaptives Experimentaldesign für die Behandlungswahl Minimax和Bayes 最佳最佳武器识别:治疗选择的适应性实验设计 2506.24007v2
  • 890 07-06 Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models Modellinversionsangriffe auf Llama 3: PII aus großen Sprachmodellen extrahieren 对 Llama 3 的模型反演攻击:从大语言模型中提取PII 2507.04478v1
  • 891 07-06 SHyPar: A Spectral Coarsening Approach to Hypergraph Partitioning SHyPar: Ein spektraler, gewichtiger Ansatz zur Hypergraphenpartitionierung SHyPar: 高射线分割的光谱粗化方法 2410.10875v3
  • 892 07-06 ESSA: Evolutionary Strategies for Scalable Alignment ESSA: Evolutionäre Strategien für eine skalierbare Ausrichtung ESSA:可缩放调整的演变战略 2507.04453v1
  • 893 07-06 Transfer Learning in Infinite Width Feature Learning Networks Transfer Lernen in unendlichen Breite Feature Lernnetzwerke 无限宽地学习网络中的转移学习 2507.04448v1
  • 894 07-06 KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality KnowRL: Erforschendes Wissenswertes Verstärktes Lernen für die Realität KnowRL:探索知识强化学习促进事实质量 2506.19807v2
  • 895 07-06 AutoMind: Adaptive Knowledgeable Agent for Automated Data Science AutoMind: Adaptives Knowledgeable Agent für automatisierte Datenwissenschaft 自动Mind:自动数据科学适应性知识代理 2506.10974v2
  • 896 07-06 The Joys of Categorical Conformal Prediction Die Freuden der kategorischen konformen Vorhersage 分类共变预言的欢乐 2507.04441v1
  • 897 07-06 Quantum Algorithms for Bandits with Knapsacks with Improved Regret and Time Complexities Quantenalgorithmen für Banditen mit Knapsacks mit verbesserter Reue und Zeitkomplexitäten 带克纳普书包、有改进的悔恨和时间复杂性的强盗的量图量算法 2507.04438v1
  • 898 07-06 Making a Pipeline Production-Ready: Challenges and Lessons Learned in the Healthcare Domain Herstellung einer Pipeline-Produktion: Herausforderungen und Lektionen im Bereich Healthcare 《管道生产-准备:保健领域的挑战和经验教训》 2506.06946v3
  • 899 07-06 Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models Rekonstruieren biologischer Pfade durch Anwendung selektiven Inkrementellen Lernens auf (Sehr) kleine Sprachmodelle 通过将选择性增量学习应用到(老式)小型语言模式,重建生物途径 2507.04432v1
  • 900 07-06 ARMR: Adaptively Responsive Network for Medication Recommendation ARMR: Adaptiv Responsive Network for Medicination Empfehlung ARMR: 药品建议适应性反应网络 2507.04428v1
  • 901 07-06 Accelerated Diffusion Models via Speculative Sampling Beschleunigte Diffusionsmodelle durch spekulative Probenahme 通过投机抽样加速传播模型 2501.05370v2
  • 902 07-06 Enhancing Text-Based Hierarchical Multilabel Classification for Mobile Applications via Contrastive Learning Verbesserung der textbasierten Hierarchischen Multilabel-Klassifikation für mobile Anwendungen durch Kontrastives Lernen 通过差异学习加强用于移动应用的基于文本的等级式等级式多标签标签分类 2507.04413v1
  • 903 07-06 Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning Konvergenz und Stichprobenkomplexität von Methoden erster Ordnung für das agnostische Verstärkungslernen 一等加强教育方法的趋同性和样本复杂性 2507.04406v1
  • 904 07-06 Convergence of Policy Mirror Descent Beyond Compatible Function Approximation Konvergenz der Politik Spiegel Abstieg jenseits kompatibler Funktion Annäherung 超出相兼容功能相似性的政策从镜面产生的趋同 2502.11033v3
  • 905 07-06 KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs KERAP: Ein wissensbasierter Ansatz für genaue Null-Shot-Diagnose-Vorhersage mit Multi-Agent LLMs KERAP: 利用多种试剂LLMs进行准确零光诊断预测的知识强化理由说明方法 2507.02773v2
  • 906 07-06 Have LLMs Made Active Learning Obsolete? Surveying the NLP Community Haben LLMs aktives Lernen überholt? Vermessung der NLP-Gemeinschaft LLM女士是否做过积极学习过时? 2503.09701v2
  • 907 07-06 Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization Inverses Verstärktes Lernen mit Revealed Preferences und passiver stochastischer Optimierung 利用公开首选项和被动斯托口优化进行反强化学习 2507.04396v1
  • 908 07-06 Continual Visual Reinforcement Learning with A Life-Long World Model Kontinuierliches Visuelles Verstärktes Lernen mit einem lebenslangen Weltmodell 采用 “ 终身世界模式 “ 的持续视力强化学习 2303.06572v2
  • 909 07-06 Linear Bandits with Partially Observable Features Lineare Banditen mit teilweise beobachtbaren Eigenschaften 带有部分可部分观察特征的线形强盗 2502.06142v2
  • 910 07-06 Tractable Representation Learning with Probabilistic Circuits Tractable Representative Learning mit probabilistischen Schaltungen 利用概率电路进行可追踪的代表性学习 2507.04385v1
  • 911 07-06 SCAWaveNet: A Spatial-Channel Attention-Based Network for Global Significant Wave Height Retrieval SCAWaveNet: Ein raumgreifendes, auf Aufmerksamkeit basierendes Netzwerk für globales, signifikantes Wellenhöhen-Retrieval SCAWaveNet:全球重大波高高度回收空间通道关注网 2507.00701v2
  • 912 07-06 Transferring Visual Explainability of Self-Explaining Models through Task Arithmetic Übertragung der visuellen Erklärbarkeit von selbsterklärenden Modellen durch Task Arithmetic 通过任务重新定性转移自我解释模型的可视解释性 2507.04380v1
  • 913 07-06 Hierarchical Reinforcement Learning with Targeted Causal Interventions Hierarchisches Stärkungslernen mit gezielten Kausalinterventionen 具有有针对性的因果干预措施的等级强化学习 2507.04373v1
  • 914 07-06 Adaptive Malware Detection using Sequential Feature Selection: A Dueling Double Deep Q-Network (D3QN) Framework for Intelligent Classification Adaptive Malware-Erkennung mit sequentieller Feature-Auswahl: Ein duellierendes Double Deep Q-Network (D3QN) Framework für intelligente Klassifizierung 使用序列特征选择选择的适应性恶意检测:智能分类双重深Q网络框架(D3QN) 2507.04372v1
  • 915 07-06 Time2Agri: Temporal Pretext Tasks for Agricultural Monitoring Time2Agri: Temporale Vortextaufgaben für die Agrarüberwachung Time2Agri: 农业监测时间预测任务 2507.04366v1
  • 916 07-06 K Nearest Neighbor-Guided Trajectory Similarity Learning K Näherste Nachbar-geführte Flugbahn Ähnlichkeit Lernen K 近邻引导的轨迹相似性学习 2502.00285v2
  • 917 07-06 Information-theoretic Quantification of High-order Feature Effects in Classification Problems Information-theoretische Quantifizierung von High-Order-Feature-Effekten bei Klassifizierungsproblemen 对分类问题中高顺序地物效应的信息理论定量 2507.04362v1
  • 918 07-06 AVTENet: A Human-Cognition-Inspired Audio-Visual Transformer-Based Ensemble Network for Video Deepfake Detection AVTENet: Ein vom Menschen erkannten, inspirierten Audio-Visual Transformer-basiertes Ensemble-Netzwerk für Video-Detektion Deepfake AVTENet:一个以视频深假探测视频合成网络 2310.13103v2
  • 919 07-06 Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models Effiziente Verwirrtheit und Verhältnisbindung in diskreten Diffusions-Sprachmodellen 分散传播语言模型中高效的双倍宽度和比例匹配 2507.04341v1
  • 920 07-06 Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback Interactive Groupwise Vergleich für verstärktes Lernen aus menschlichem Feedback 利用人类反馈加强学习交互式群体比较 2507.04340v1
  • 921 07-06 Privacy-Preserving Collaborative Split Learning Framework for Smart Grid Load Forecasting Privacy-Preserving Collaborative Split Learning Framework for Smart Grid Load Forecasting 智能网格负载预测的维护隐私合作拆分学习框架 2403.01438v3
  • 922 07-06 Consistent Labeling Across Group Assignments: Variance Reduction in Conditional Average Treatment Effect Estimation Konsequente Beschriftung über Gruppenzuweisungen: Varianzreduzierung in bedingter durchschnittlicher Behandlungseffektschätzung 各组任务统一标签标签:减少在条件平均治疗效果估计方面的差异 2507.04332v1
  • 923 07-06 A note on the unique properties of the Kullback–Leibler divergence for sampling via gradient flows Ein Hinweis auf die einzigartigen Eigenschaften der Kullback–Leibler Divergenz für die Probenahme über Gradientenströme 关于通过梯度流取样的Kullback–Leibler散度的独特特性的说明 2507.04330v1
  • 924 07-06 TinyProto: Communication-Efficient Federated Learning with Sparse Prototypes in Resource-Constrained Environments TinyProto: Kommunikationseffizientes Federated Learning mit Sparse Prototypen in ressourcengebundenen Umgebungen TinyProto:在资源紧张的环境中,与浅质原型进行交流-高效的联邦学习 2507.04327v1
  • 925 07-06 Alleviating Attack Data Scarcity: SCANIA’s Experience Towards Enhancing In-Vehicle Cyber Security Measures Benachteiligung von Angriffsdaten: SCANIAs Erfahrung zur Verbesserung von Cybersicherheitsmaßnahmen im Fahrzeug 减轻攻击数据稀缺性:SCANIA在加强车辆内部网络安全措施方面的经验 2507.02607v2
  • 926 07-06 CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning CLIP-RL: Chirurgische Szene-Segmentierung mit Kontrastiv-Sprach-Vision Vor- und Verstärkungs-Lernen CLIP-RL: 使用矛盾语言视力的外科区分解 2507.04317v1
  • 927 07-06 A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment Ein ursächliches Weltmodell für die nächste Token-Vorhersage: GPT in einer kontrollierten Umgebung erforschen 由下到下预测所根据的因果世界模式:在受控制的环境中探索GPT 2412.07446v4
  • 928 07-06 Heterogeneous Federated Learning with Prototype Alignment and Upscaling Heterogenes Föderiertes Lernen mit Prototypenausrichtung und Upscaling 具有原型调整和升级的异异质联邦学习 2507.04310v1
  • 929 07-06 Domain Adaptation of Drag Reduction Policy to Partial Measurements Bereich Anpassung der Drag-Reduction-Politik an Teilmessungen 将减拖量政策调整为部分计量 2507.04309v1
  • 930 07-06 Learning Classifiers That Induce Markets Lernklassifikatoren, die Märkte induzieren 招生市场学习分类 2502.20012v2
  • 931 07-06 Agnostic Online Learning and Excellent Sets Agnostic Online-Lernen und ausgezeichnete Sets 无名在线学习和优秀套 2108.05569v3
  • 932 07-06 Enhancing Neural Autoregressive Distribution Estimators for Image Reconstruction Verbesserung der Neuralen Autoregressiven Verteilungsstimulatoren für die Bildrekonstruktion 图像重建的加强神经自动递减分布动动测算器 2506.05391v2
  • 933 07-06 Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization Adversariale Daten Augmentation für Single Domain Generalization via Lyapunov Exponent-Guided Optimization 通过 Lyapunov 指导热源引导优化化对单域的对数据增强 2507.04302v1
  • 934 07-06 QF: Quick Feedforward AI Model Training without Gradient Back Propagation QF: Quick Feedforward KI-Modelltraining ohne gradiente Back Propagation QF: 快速进料推进AI 示范培训,不逐步后推推进 2507.04300v1
  • 935 07-06 Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents Intelligente Assistenten für die Semiconductor Failure Analysis mit LLM-basierten Planungs-Agenten 与基于LLM的规划代理人进行半导体故障分析的智能助理 2506.15567v2
  • 936 07-06 Learning collision risk proactively from naturalistic driving data at scale Kollisionsrisiko proaktiv aus naturalistischen Fahrdaten im Maßstab lernen 从规模的自然主义驾驶数据中积极主动地学习碰撞风险 2505.13556v3
  • 937 07-06 Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping Robustes Federated Learning Over the Air: Mit Median Anchored Clipping gegen schwerfällige Geräusche 强力的联邦在空中学习:用中度排成的剪接打击重击噪音 2409.15100v6
  • 938 07-06 Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning Skalierung von Iterationsnetzwerken auf 5000 Ebenen für extreme Langzeitplanung 用于极端长期规划的 向5000层扩展值迭代网络 2406.08404v2
  • 939 07-06 VOLTRON: Detecting Unknown Malware Using Graph-Based Zero-Shot Learning VOLTRON: Unbekannte Malware mit graphisch basiertem Zero-Shot-Lernen erkennen VOLTRON: 使用基于图表的零热学习检测未知的恶意软件 2507.04275v1
  • 940 07-06 EDALearn: A Comprehensive RTL-to-Signoff EDA Benchmark for Democratized and Reproducible ML for EDA Research EDALearn: Ein umfassender RTL-to-Signoff EDA Benchmark für Demokratisierung und reproduzierbare ML für EDA-Forschung EDALearn: EDA 研究的民主化和可复制的 ML 综合RTL-Sign-Sign-Sign-Signoff EDA基准 2312.01674v2
  • 941 07-06 Efficient Training of Deep Networks using Guided Spectral Data Selection: A Step Toward Learning What You Need Effiziente Ausbildung von Deep Networks unter Verwendung einer geführten Spektraldatenauswahl: Ein Schritt auf dem Weg zu lernen, was man braucht 利用向导光谱数据选择对深网络进行有效培训:逐步学习你需要什么 2507.04269v1
  • 942 07-06 An Explainable Transformer Model for Alzheimer’s Disease Detection Using Retinal Imaging Ein erklärbares Transformer-Modell für die Alzheimer-Erkennung mit retinaler Bildgebung 阿尔茨海默氏病利用视网膜成像检测疾病可解释的变形模型 2507.04259v1
  • 943 07-06 Bandit Pareto Set Identification in a Multi-Output Linear Model Bandit Pareto Set-Identifikation in einem linearen Multi-Output-Modell 在多输出线性模型中设定多输出线性模型中的标识 2507.04255v1
  • 944 07-06 FedAli: Personalized Federated Learning Alignment with Prototype Layers for Generalized Mobile Services FedAli: Personalisiertes Federated Learning Alignment mit Prototypen-Layers für Generalized Mobile Services FedAli:通用移动服务中个性化联邦学习与原型图层相适应 2411.10595v2
  • 945 07-06 A Two-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments A Two-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments 分布式计算环境中微射线数据分类的两步组合特征选择和粒子摇动优化方法 2507.04251v1
  • 946 07-06 Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning Just Enough Shifts: Mit gezielten Darstellungs-Fine-Tunings überwiderliche Sprachmodelle abmildern 仅作足够的改动:减少在有针对性地派有定额代表的定时派定额代表制统一语言模式中的过度拒绝 2507.04250v1
  • 947 07-06 GLU Attention Improve Transformer GLU Aufmerksamkeit verbessern Transformer GLU 关注改善变换器 2507.00022v2
  • 948 07-06 Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning Inertiale quadratische Majorisierung Minimierung mit Anwendung auf Kernel Regularisiertes Lernen 内核正规化学习应用的惯性二次二次二次多数化最小化 2507.04247v1
  • 949 07-06 Using Machine Learning to Discover Parsimonious and Physically-Interpretable Representations of Catchment-Scale Rainfall-Runoff Dynamics Mit maschinellem Lernen Parsimonious und physisch-interpretable Darstellungen von Catchment-Scale Rainfall-Runoff Dynamics entdecken 利用机器学习发现抓捕规模降雨量规模降雨运行动态的分解和物理解释性说明 2412.04845v4
  • 950 07-06 EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models EasyEdit2: Ein einfach zu bedienender Steuerungsrahmen für die Bearbeitung großer Sprachmodelle EasyEdit2: 编辑大语言模式的易用指导框架 2504.15133v2
  • 951 07-06 Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment Music102: Ein $D_{12}$-equivarianter Transformator für Akkordprogressionsbegleitung 音乐102: 用于伴奏合奏进度的 $D_{12}$ 等变变压器 2410.18151v2
  • 952 07-06 Scaling Context Requires Rethinking Attention Skalierungskontext erfordert nachdenkliche Aufmerksamkeit 需要重新思考时重新思考环境 2507.04239v1
  • 953 07-06 Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering Tensor-basiertes Graphenlernen mit Konsistenz und Spezifität für Multi-View-Clustering 多观点群集的具有一致性和针对性的基于时标的图示学习 2403.18393v3
  • 954 07-06 Normalized Iterative Hard Thresholding for Tensor Recovery Normalisierte iterative Hard Thresholding für Tensor Erholung 恢复Tensor的标准化迭代硬推进手段 2507.04228v1
  • 955 07-06 Explainable AI for Mental Health Emergency Returns: Integrating LLMs with Predictive Modeling Erklärbare KI für psychische Gesundheit Notfallrückkehren: Integration von LLMs mit vorausschauender Modellierung 心理健康紧急返回可解释的AI:将LLMs与预测模型相结合 2502.00025v4
  • 956 07-06 Zero-Shot Cyclic Peptide Design with Composable Geometric Conditions Zero-Shot Cyclic Peptid Design mit komposierbaren geometrischen Bedingungen 具有可合成几何条件的零热相流式五氯苯设计 2507.04225v1
  • 957 07-06 Interface Laplace Learning: Learnable Interface Term Helps Semi-Supervised Learning Interface Laplace Learning: Erlernbarer Interface Term hilft semi-überwachtes Lernen 界面 Laplace 学习: 可学习的界面定期帮助半监督学习 2408.05419v2
  • 958 07-06 Context Tuning for In-Context Optimization Kontext-Tuning für In-Context-Optimierung Intext 优化的上下文图示 2507.04221v1
  • 959 07-06 Particle Trajectory Representation Learning with Masked Point Modeling Partikel-Trajektoriendarstellung Lernen mit maskierten Punktmodellierung 使用蒙面点建模模型的粒子轨迹代表学习 2502.02558v3
  • 960 07-06 Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs Modellkollaps ist kein Fehler, sondern ein Feature in Machine Unlearning für LLMs 模型折叠不是臭虫,而是机器为 LLM 取消学习的特写 2507.04219v1
  • 961 07-06 CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation CRISP-SAM2: SAM2 mit Cross-Modal Interaction und semantischer Prompting für Multi-Organ Segmentierung CRISP-SAM2:SAM2 具有跨模式相互作用和跨组织分解的语义提示的SAM2 2506.23121v2
  • 962 07-06 Confidence Intervals for Evaluation of Data Mining Vertrauensintervalle für die Evaluation von Data Mining 评价数据采矿的互信评价 2502.07016v2
  • 963 07-06 Backdooring Bias ($B^2$) into Stable Diffusion Models Hintertürige Bias ($B^2$) in stabile Diffusionsmodelle 后门 Bias ($B^2$) 进入稳定扩散模型 2406.15213v4
  • 964 07-06 An explicit formulation of the learned noise predictor $ε_θ({\bf x}_t, t)$ via the forward-process noise $ε_{t}$ in denoising diffusion probabilistic models (DDPMs) Eine explizite Formulierung des erlernten Noise Predictors $ε_θ({\bf x}_t, t)$ über das Forward-Process-Noise $ε_{t}$ in denoising diffusion probabilistic models (DDPMs) 通过去噪扩散概率模型(DDPMs)中的前向过程噪声 $ε_{t}$,给出所学噪声预测器 $ε_θ({\bf x}_t, t)$ 的显式表达 2507.04203v1
  • 965 07-06 Predicting Air Pollution in Cork, Ireland Using Machine Learning Vorhersage der Luftverschmutzung in Cork, Irland durch maschinelles Lernen 利用机器学习预测爱尔兰科克的空气污染 2507.04196v1
  • 966 07-06 Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning Mixed-Sample SGD: Eine End-to-End-Analyse des Supervised Transfer Learning 混合抽样 SGD:监督转移学习的端到端分析 2507.04194v1
  • 967 07-06 JAX-MPM: A Learning-Augmented Differentiable Meshfree Framework for GPU-Accelerated Lagrangian Simulation and Geophysical Inverse Modeling JAX-MPM: Ein Learning-Augmented Differentiable Meshfree Framework für GPU-beschleunigte lagrangianische Simulation und geophysikalische Inverse Modellierung JAX-MPM:GPU-加速拉格朗日模拟和地球物理反建模的学习增强型差异网状框架 2507.04192v1
  • 968 07-05 (6) Where to Intervene: Action Selection in Deep Reinforcement Learning Wo man interveniert: Aktionsauswahl im Deep Reinforcement Learning 在哪里干预:深强化学习中的行动选择 2507.04187v1
  • 969 07-05 Bayesian Multiobject Tracking With Neural-Enhanced Motion and Measurement Models Bayesian Multiobject Tracking mit neural-erweiterten Bewegungs- und Messmodellen Bayesian 多功能物体跟踪,以神经强化机动和测量模型跟踪 2506.18124v2
  • 970 07-05 AASeg: Attention Aware Network for Real Time Semantic Segmentation AASeg: Achtung im Netzwerk für Echtzeit-Semantik-Segmentierung AASeg:实时语义区段注意网络 2108.04349v4
  • 971 07-05 Graded Neural Networks Eingestufte neurale Netze 神经网络 2502.17751v2
  • 972 07-05 Uncertainty Quantification in the Tsetlin Machine Unsicherheit Quantifizierung in der Tsetlin Maschine Tsetlin 机器的不确定性量化 2507.04175v1
  • 973 07-05 Structure As Search: Unsupervised Permutation Learning for Combinatorial Optimization Struktur als Suche: Unüberwachtes Permutationslernen für kombinatorische Optimierung 结构搜索: 用于组合优化的未经监督的变异学习 2507.04164v1
  • 974 07-05 Enhancing Learning Path Recommendation via Multi-task Learning Verbesserung der Empfehlung über Lernpfade durch Multi-Task-Learning 通过多任务学习加强学习途径建议 2507.05295v1
  • 975 07-05 SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network SEZ-HARN: Selbsterklärbares Null-Shot-Netzwerk für die Anerkennung menschlicher Aktivität SEZ-HARN: 自爆零弹人活动识别网 2507.00050v2
  • 976 07-05 Geometric Deep Learning for Computer-Aided Design: A Survey Geometrisches Deep Learning für computergestütztes Design: Eine Umfrage 计算机辅助设计深深深学习:调查 2402.17695v2
  • 977 07-05 Physics-informed neural networks and neural operators for a study of EUV electromagnetic wave diffraction from a lithography mask Physik-informierte neuronale Netzwerke und neuronale Operatoren für eine Untersuchung der elektromagnetischen Wellenbeugung von EUV aus einer Lithographiemaske 物理知情神经网络和神经操作器,用于研究欧洲紫外线电磁波从平面印刷面罩的碎片 2507.04153v1
  • 978 07-05 Pedestrian Intention Prediction via Vision-Language Foundation Models Fußgänger-Intentionsvorhersage über Vision-Language-Stiftungsmodelle 通过视觉语言基金会模型预测行人意图 2507.04141v1
  • 979 07-05 Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles Driver-Net: Multi-Kamera Fusion für die Bewertung von Driver Take-Over Readiness in Automated Vehicles 司机-网络:评估自动车辆驾驶员准备就绪情况评估驱动驾驶员采用自动车辆多车队合并 2507.04139v1
  • 980 07-05 Token Level Hallucination Detection via Variance in Language Models Token Level Halluzination Erkennung durch Varianz in Sprachmodellen 通过语言模式差异探测洞穴级幻觉 2507.04137v1
  • 981 07-05 Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment Multi-Preference Lambda-bewertet Listwise DPO für Dynamic Preference Alignment 多首选项 Lambda 加权列表 DPO 动态首选项一致 2506.19780v3
  • 982 07-05 Online Convex Optimization with Switching Cost with Only One Single Gradient Evaluation Online Convex-Optimierung mit Schaltkosten mit nur einem einzigen Gradienten-Bewertung 仅用单级梯度评价以切换成本实现在线 Convex优化 2507.04133v1
  • 983 07-05 False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems Falsche Alarme, wirkliche Schäden: Widerwärtige Angriffe mit LLM-basierten Modellen auf textbasierten Cyber Threat Intelligence-Systemen 假警报、真实损害:利用基于LLM的基于LLM的基于文本的网络威胁信息系统模型进行对抗性攻击 2507.06252v1
  • 984 07-05 Graph Neural Networks as a Substitute for Transformers in Single-Cell Transcriptomics Graphische Neuralnetze als Ersatz für Transformer in Single-Cell Transcriptomics 以神经网络替代单子转换器变换器的神经网络 2507.04125v1
  • 985 07-05 Diversity Conscious Refined Random Forest Diversity Bewusst Verfeinerter Zufallswald 改良的随机森林 2507.00467v2
  • 986 07-05 Graph Inverse Style Transfer for Counterfactual Explainability Graph Inverse Style Transfer für gegenfaktische Erklärbarkeit 反事实解释的反反向样式传输 2505.17542v2
  • 987 07-05 Model selection for stochastic dynamics: a parsimonious and principled approach Modellauswahl für stochastische Dynamik: ein parsimonischer und prinzipienorientierter Ansatz 随机动态模型选择模式:有原则、有原则的方法 2507.04121v1
  • 988 07-05 When Data-Free Knowledge Distillation Meets Non-Transferable Teacher: Escaping Out-of-Distribution Trap is All You Need Wenn datenfreie Wissensdestillation auf nicht übertragbaren Lehrer trifft: Aus-von-Distribution-Falle zu entkommen ist alles, was Sie brauchen 当数据-无知识蒸馏遇到非可转让教师时:摆脱分配办法就是你所需要的一切 2507.04119v1
  • 989 07-05 Relational inductive biases on attention mechanisms Relationale induktive Verzerrungen der Aufmerksamkeitsmechanismen 对关注机制的关系诱导偏见 2507.04117v1
  • 990 07-05 How to Train Your LLM Web Agent: A Statistical Diagnosis Wie Sie Ihren LLM Web Agent trainieren: Eine statistische Diagnose 如何训练你的LLM网络代理:统计分析 2507.04103v1
  • 991 07-05 How more data can hurt: Instability and regularization in next-generation reservoir computing Wie mehr Daten schaden können: Instabilität und Regularisierung im Reservoir Computing der nächsten Generation 下一代储油层计算中的不稳定性和正规化 2407.08641v3
  • 992 07-05 Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems Hierarchische Tests mit Rabbit-Optimierung für industrielle Cyber-Physische Systeme 工业网络物理系统对兔子最佳优化进行等级测试 2507.04100v1
  • 993 07-05 A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory A LoD of Gaussians: Unified Training und Rendering für Ultra-Large Scale Rekonstruktion mit externem Speicher Gaussian AD: 利用外部记忆进行超大型重建的统一培训和投标 2507.01110v2
  • 994 07-05 Accurate and Efficient World Modeling with Masked Latent Transformers Präzise und effiziente Weltmodellierung mit maskierten Latententransformatoren 以隐蔽的冷流变形器建立准确而高效的世界模型 2507.04075v1
  • 995 07-05 Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering Diffusionsmodelle lernen Low-Dimensional Distributionen über Subspace Clustering 通过子空间聚集学习低多样性分布的传播模型 2409.02426v4
  • 996 07-05 Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering Jenseits von unabhängigen Passagen: Adaptive Passage-Kombinations-Retrieval für retrieval Augmented Open-Domain-Fragebeantwortung 超过独立通过通道: 用于检索的适应性被动通道组合检索, 以获取更多开放域 问题解答 2507.04069v1
  • 997 07-05 Attributing Data for Sharpness-Aware Minimization Zuordnung von Daten zur Schärfe-Bewusst-Minimierung 用于尽量减少尖锐度( 最小化)软件的数据 2507.04059v1
  • 998 07-05 Aggregating Concepts of Fairness and Accuracy in Prediction Algorithms Aggregierende Konzepte von Fairness und Genauigkeit in Vorhersagen Algorithmen 综合预测数值中的公平和准确概念 2505.08829v3
  • 999 07-05 Membership Inference Attacks on Large-Scale Models: A Survey Membership Inferenz Attacks on Large-Scale Models: Eine Umfrage 成员对大型模型的推断攻击:调查 2503.19338v2
  • 1000 07-05 Predictive Modeling of Effluent Temperature in SAT Systems Using Ambient Meteorological Data: Implications for Infiltration Management Prädiktive Modellierung der Effluenttemperatur in SAT-Systemen unter Verwendung von Ambient Meteorological Data: Implikationen für Infiltrationsmanagement 利用环境气象数据利用环境气象数据对SAT系统中的废水温度进行预测性模型建模:对渗透管理的影响 2507.04050v1
  • 1001 07-05 Discrete Diffusion in Large Language and Multimodal Models: A Survey Diskrete Diffusion in großen Sprach- und Multimodalmodellen: Eine Umfrage 大语言和多模式模型的分解传播:调查 2506.13759v3
  • 1002 07-05 Physics-Informed Graph Neural Networks to Reconstruct Local Fields Considering Finite Strain Hyperelasticity Physik-informierte Graph-Neural-Netzwerke, um lokale Felder unter Berücksichtigung der Finite Strain Hyperelastizität zu rekonstruieren 物理成形图形神经网络,以重建考虑到有限斯特兰超弹性的局部场 2507.05291v1
  • 1003 07-05 Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks Benchmarking stochastischer Annäherungsalgorithmen für Fairness-beschränkte Ausbildung von tiefen neuralen Netzwerken 为深神经网络的公平-有约束的深神经网络培训制定标准 2507.04033v1
  • 1004 07-05 From Prompting to Alignment: A Generative Framework for Query Recommendation Von Prompting zu Alignment: Ein generativer Rahmen für die Abfrageempfehlung 从提示到对齐:查询建议生成框架 2504.10208v3
  • 1005 07-05 An Efficient Local Search Approach for Polarized Community Discovery in Signed Networks Ein effizienter lokaler Suchansatz für polarisierte Community Discovery in Signed Networks 对签署网络中的极地化社区发现的有效地方搜索方法 2502.02197v2
  • 1006 07-05 Commute Networks as a Signature of Urban Socioeconomic Performance: Evaluating Mobility Structures with Deep Learning Models Commute Networks als Signatur urbaner Sozioökonomischer Leistung: Bewertung von Mobilitätsstrukturen mit Deep-Learning-Modellen 将网络作为城市社会经济绩效的标志:利用深学习模式评价流动结构 2507.04027v1
  • 1007 07-05 Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo Test-Time Alignment von diskreten Diffusionsmodellen mit Sequential Monte Carlo 使用顺序式蒙特卡洛的分解传播模型的测试时间对齐 2505.22524v2
  • 1008 07-05 Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder Adversariale Entflechtung durch Backpropagation mit physik-informiertem Variational Autoencoder 与物理不完善的变异自动编码器反向反向反向分解 2506.13658v2
  • 1009 07-05 Quantum Computing and Neuromorphic Computing for Safe, Reliable, and explainable Multi-Agent Reinforcement Learning: Optimal Control in Autonomous Robotics Quantum Computing und Neuromorphic Computing für sicheres, zuverlässiges und erklärbares Multi-Agenten-Verstärkungslernen: Optimale Kontrolle in der autonomen Robotik 用于安全、可靠和可解释的多机构强化学习的量子计算和神经形态计算:自主机器人学的最佳控制 2408.03884v2
  • 1010 07-05 Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Easy Dataset: Ein einheitliches und erweiterbares Framework zur Synthese von LLM-Feineinstellungsdaten aus unstrukturierten Dokumenten 简易数据集:从无结构文档中合成LLM精密调试数据的统一和扩展框架 2507.04009v1
  • 1011 07-05 Score-based Diffusion Models via Stochastic Differential Equations – a Technical Tutorial Score-basierte Diffusionsmodelle über stochastische Differentialgleichungen – ein technisches Tutorial 以记分为基础的通过斯托卡差异等式传播模型 – – 技术教学 2402.07487v3
  • 1012 07-05 Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation Nahtlose Integration von baumbasierten Positions-Embeddings in Transformer-Modelle zur Darstellung von Quellcodes 将植树定位嵌入式嵌入纳入源代码代表法的变换模型 2507.04003v1
  • 1013 07-05 Fast Re-Trainable Attention Autoencoder for Liquid Sensor Anomaly Detection at the Edge Schnelle, retrainierbare Aufmerksamkeit Autoencoder für Flüssigkeitssensoranomalienerkennung an der Kante 边边缘液体感应器异常探测快速再培训注意自动编码器 2507.03995v1
  • 1014 07-05 MalVol-25: A Diverse, Labelled and Detailed Volatile Memory Dataset for Malware Detection and Response Testing and Validation MalVol-25: Ein vielfältiges, beschriftetes und detailliertes Speicherdatensatz für Malware-Erkennung und -Ansprechtests und -Validierung MalVol-25:用于检测和反应测试及校验的多种、标签和详细、挥发性记忆数据集 2507.03993v1
  • 1015 07-05 Adaptive Inference-Time Scaling via Cyclic Diffusion Search Adaptive Inferenz-Zeitskalierung über zyklische Diffusionssuche 通过cycclic传播搜索进行适应性推导-时间缩放 2505.14036v3
  • 1016 07-05 Improving RL Exploration for LLM Reasoning through Retrospective Replay Verbesserung der RL Exploration für LLM Reasoning durch retrospektive Replay 通过回溯性回放改进LL探索LLM 2504.14363v2
  • 1017 07-05 Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data Real-TabPFN: Tabuläre Stiftungsmodelle durch fortgesetztes Pre-Training mit Real-World Daten verbessern Real-TabPFN:通过继续以真实世界数据进行预培训,改进图表基础模型 2507.03971v1
  • 1018 07-05 Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Open-Reasoner-Zero: Ein Open Source Ansatz zur Skalierung von Verstärkungslernen am Basismodell 开放反馈-零:基于基础模式的扩大强化学习的开放源码方法 2503.24290v2
  • 1019 07-05 A Comparative Study of Specialized LLMs as Dense Retrievers Eine vergleichende Studie von spezialisierten LLMs als Dense Retriever 专用LLM作为稠密检索器的比较研究 2507.03958v1
  • 1020 07-05 Optimizing Age of Trust and Throughput in Multi-Hop UAV-Aided IoT Networks Optimierung des Zeitalters des Vertrauens und des Durchsatzes in Multi-Hop UAV-geförderten IoT-Netzwerken 优化多Hop UAV援助的IOT网络的信任和输送时间 2507.03950v1
  • 1021 07-05 Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs Graph Kollaboratives Aufmerksamkeitsnetzwerk für Link-Vorhersage in Wissensgraphen 知识图中预测联系协作关注网络 2507.03947v1
  • 1022 07-05 Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance Selbstlenkende, nicht-lineare, räumlich selektive Filter zur effizienten Extraktion von beweglichen Lautsprechern unter schwacher Anleitung 在薄弱指导下有效抽取流动发言者的自制深不滑空间选择过滤器 2507.02791v2
  • 1023 07-05 Time Distributed Deep Learning Models for Purely Exogenous Forecasting: Application to Water Table Depth Prediction using Weather Image Time Series Time Distributed Deep Learning Models for Purely Exogenous Forecasting: Application to Water Table Depth Prediction mit Weather Image Time Series 纯粹外源预测所用时间分配的深学习模型:利用气象图像时间序列对水表深度预测应用 2409.13284v2
  • 1024 07-05 Hierarchical graph sampling based minibatch learning with chain preservation and variance reduction Hierarchisches Graphen-Sampling basiertes Minibatch-Lernen mit Kettenerhaltung und Varianzreduktion 以等级图表抽样抽样为基础的小型学习,同时保持链条和减少差异 2503.00860v7
  • 1025 07-05 Learning Physically Interpretable Atmospheric Models from Data with WSINDy Physikalisch interpretierbare atmosphärische Modelle aus Daten mit WSINDy lernen WSINDy 数据产生的学习物理上可解释的大气模型 2501.00738v2
  • 1026 07-05 MCST-Mamba: Multivariate Mamba-Based Model for Traffic Prediction MCST-Mamba: Multivariates Mamba-basiertes Modell für Verkehrsvorhersage MCST-Mamba:多变 Mamba 流量预测模型 2507.03927v1
  • 1027 07-05 Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model Kombination von Graph-Neural-Netzwerken und gemischter Integer-Linear-Programmierung für molekulare Inferenz unter dem zweilagigen Modell 在双面模型下结合图形神经网络和混合整数线性线性规划,用于分子推断 2507.03920v1
  • 1028 07-05 Consistency-Aware Padding for Incomplete Multi-Modal Alignment Clustering Based on Self-Repellent Greedy Anchor Search Konsistenz-Bewusst-Padding für unvollständige Multi-Modal Alignment Clustering basierend auf selbst-abweisender Greedy Ankersuche 以自应贪婪锁定搜索为基础,为不完全多模式对齐聚合集成设置一致软件面 2507.03917v1
  • 1029 07-05 Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces Rückkehr des Latent Space COWBOYS: Der Einsatz von VAEs für die Bayesian-Optimierung strukturierter Räume neu denken 冷层空间COWBOYS的返回:重新思考如何利用VAEs优化Bayesian结构空间 2507.03910v1
  • 1030 07-05 Risk-sensitive Actor-Critic with Static Spectral Risk Measures for Online and Offline Reinforcement Learning Risikosensitive Aktor-Kritik mit statischen Spektralrisikomaßnahmen für Online- und Offline-Verstärkungstraining 与在线和离线强化学习的静态频谱风险衡量措施有关的对风险敏感的行为者-批评 2507.03900v1
  • 1031 07-05 Transformer Model for Alzheimer’s Disease Progression Prediction Using Longitudinal Visit Sequences Transformer-Modell für Alzheimer-Krankheit Progressionsvorhersage mit Längsschnitt Besuch Sequenzen 利用纵向访问序列对阿尔茨海默氏病变异变异模型 2507.03899v1
  • 1032 07-05 GenAI-Powered Inference GenAI-Powered-Schlussfolgerung GenAI 授权推论 2507.03897v1
  • 1033 07-05 LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records LabTOP: Ein einheitliches Modell für Labortestergebnisse Vorhersage auf elektronische Gesundheitsdatensätze LabTOP:电子健康记录实验室试验结果预测统一模型 2502.14259v5
  • 1034 07-05 Unraveling the Black-box Magic: An Analysis of Neural Networks’ Dynamic Local Extrema Entwirren der Black-Box-Magie: Eine Analyse der dynamischen lokalen Neuralnetzwerke 解开黑箱魔术:神经网络动态局部极端现象分析 2507.03885v1
  • 1035 07-05 Leo Breiman, the Rashomon Effect, and the Occam Dilemma Leo Breiman, der Rashomon-Effekt und das Occam-Dilemma 里欧布雷曼,拉索蒙效应 和奥卡姆困境 2507.03884v1
  • 1036 07-05 On the Convergence of Gradient Descent on Learning Transformers with Residual Connections Über die Konvergenz des Gradientenabstiegs beim Lernen von Transformern mit residualen Verbindungen 关于带残差连接的Transformer学习中梯度下降的收敛性 2506.05249v2
  • 1037 07-05 Neural models of multiscale systems: conceptual limitations, stochastic parametrizations, and a climate application Neurale Modelle von multiskaligen Systemen: konzeptionelle Einschränkungen, stochastische Parametrisierungen und eine Klimaanwendung 多尺度系统的神经模型:概念局限性、随机超称和气候应用 2506.22552v2
  • 1038 07-05 Predicting function of evolutionarily implausible DNA sequences Vorhersagefunktion evolutionär unglaubwürdiger DNA-Sequenzen 进化性DNA序列的预测功能 2506.10271v2
  • 1039 07-05 LoRAShield: Data-Free Editing Alignment for Secure Personalized LoRA Sharing LoRAShield: Datenfreie Bearbeitungsausrichtung für die sichere Personalisierte LoRA-Sharing LoRAShield: 安全个人化LoRA共享的无数据编辑一致 2507.07056v1
  • 1040 07-05 Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States Verbesserung der adaptiven Verhaltensinterventionen mit LLM-Schlussfolgerungen aus den von den Teilnehmern benannten Staaten 加强适应性行为干预,使用LLM LLM 2507.03871v1
  • 1041 07-05 A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime Eine strenge Verhaltensbewertung von CNNs mittels eines Data-Domain-Sampling-Regimes 利用数据域抽样制度对CNN进行严格行为评估 2507.03866v1
  • 1042 07-05 Efficient generative adversarial networks using linear additive-attention Transformers Effiziente generative Adversarial-Netzwerke mit linearen Additive-Attention-Transformatoren 使用线性添加添加剂-意向变异器的高效基因化对抗性网络 2401.09596v5
  • 1043 07-05 OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference OrthoRank: Tokenauswahl über Sink Token Orthogonalität für effiziente LLM-Inferenz OrthoRank:通过 Sink Token 条形对高效LLM 推论进行当量选择 2507.03865v1
  • 1044 07-05 Enhanced accuracy through ensembling of randomly initialized auto-regressive models for time-dependent PDEs Verbesserte Genauigkeit durch Assembling von zufällig initialisierten auto-regressiven Modellen für zeitabhängige PDEs 通过混合随机初始化自动递减模型,为时间依赖性项目设计实体组合随机初始化自动递减模型,提高准确性 2507.03863v1
  • 1045 07-05 Continual Recommender Systems Kontinuierliche Recommender-Systeme 连续建议系统 2507.03861v1
  • 1046 07-05 Structured Generative Modeling with the Thermodynamic Kolmogorov-Arnold Model Strukturierte Generative Modellierung mit dem Thermodynamischen Kolmogorov-Arnold-Modell 与热力科尔莫戈洛夫-阿诺尔德模型进行结构生成模型 2506.14167v2
  • 1047 07-05 Taylor-Model Physics-Informed Neural Networks (PINNs) for Ordinary Differential Equations Taylor-Model Physics-informed Neural Networks (PINNs) für normale Differentialgleichungen 用于普通差异等式的Taylor-Model物理成形神经网络 2507.03860v1
  • 1048 07-05 Towards Efficient Contrastive PAC Learning Auf dem Weg zu einem effizienten kontrastiven PAC-Lernen 迈向高效的对比PAC学习 2502.15962v2
  • 1049 07-05 Learning Representations for Independence Testing Lernrepräsentanzen für Unabhängigkeitsprüfungen 独立测试学习代表 2409.06890v2
  • 1050 07-05 Graphons of Line Graphs Graphone von Liniengraphen 线图的图极限 2409.01656v3
  • 1051 07-05 Transformer with Koopman-Enhanced Graph Convolutional Network for Spatiotemporal Dynamics Forecasting Transformer mit Koopman-erweitertem Graphen-Konvolutions-Netzwerk für Spatiotemporal Dynamics Forecasting 带有Koopman-Enced的光时动态预测图变形网络的变形器 2507.03855v1
  • 1052 07-05 Latent FxLMS: Accelerating Active Noise Control with Neural Adaptive Filters Latent FxLMS: Beschleunigung der aktiven Geräuschkontrolle mit neuralen adaptiven Filtern 低端 FxLMS: 使用神经适应过滤器加速主动噪音控制 2507.03854v1
  • 1053 07-05 OrbitAll: A Unified Quantum Mechanical Representation Deep Learning Framework for All Molecular Systems OrbitAll: Ein einheitliches Quantum Mechanische Darstellung Deep Learning Framework für alle molekularen Systeme 全轨道:所有分子系统统一量子机械代表深学习框架 2507.03853v1
  • 1054 07-05 KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis KEA Explain: Erklärungen von Halluzinationen mittels Graph Kernel Analysis KEA 解释: 使用图心分析解释幻觉 2507.03847v1
  • 1055 07-05 Interpretable Diffusion Models with B-cos Networks Interpretierbare Diffusionsmodelle mit B-cos-Netzwerken B-cos网络的可解释传播模型 2507.03846v1
  • 1056 07-05 Regularizing Log-Linear Cost Models for Inpatient Stays by Merging ICD-10 Codes Regularisieren von Log-Linear Cost Models für stationäre Aufenthalte durch Zusammenführen von ICD-10 Codes 通过合并 ICD-10 代码,规范住院患者日志成本模型 2507.03843v1
  • 1057 07-04 (5) Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction 用于大型电子结构预测的分布式等差图像神经网络 2507.03840v1
  • 1058 07-04 MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search MatRL: Wahrscheinlich verallgemeinerbare iterative Algorithmen Entdeckung über Monte-Carlo Baumsuche MatRL: 通过蒙特-卡洛树搜索 发现可普遍实现的迭代性电算算法 2507.03833v1
  • 1059 07-04 Understanding Differential Transformer Unchains Pretrained Self-Attentions Verständnis von Differentialtransformer-Unketten Vorgespannte Selbstaufmerksamkeiten 理解差异变异器 2505.16333v2
  • 1060 07-04 Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion Beziehungs-Aware-Netzwerk mit aufmerksamkeitsbasiertem Verlust für wenige-heiße Wissensgraphen-Vervollständigung 几乎没有热知识图补全的基于关注的损失关系软件网络 2306.09519v2
  • 1061 07-04 IMPACT: Importance-Aware Activation Space Reconstruction IMPACT: Bedeutungs-Bewusst-Aktivierung Weltraum-Rekonstruktion IMPACT: 重要-警报启动空间重建 2507.03828v1
  • 1062 07-04 Symmetry-Robust 3D Orientation Estimation Symmetrie-Robuste 3D-Orientierung Schätzung 对称-对称 3D 3D 方向估计 2410.02101v4
  • 1063 07-04 Compressing Deep Neural Networks Using Explainable AI Komprimierende Tiefen-Neural-Netzwerke mit erklärbarer KI 使用可解释的 AI 压缩深神经网络 2507.05286v1
  • 1064 07-04 Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts Nutzung großer Sprachmodelle für die Tacit Knowledge Discovery in organisatorischen Kontexten 利用大型语言模型促进组织环境中的隐蔽知识发现 2507.03811v1
  • 1065 07-04 Coil Geometry Learning for Short-Range Magnetic Actuation Spulengeometrie-Lernen für kurze magnetische Betätigung 短期磁力动动动的焦油几何学习 2507.03806v1
  • 1066 07-04 Learning Differentiable Logic Programs for Abstract Visual Reasoning Lernen von differenzierbaren Logic-Programmen für abstrakte visuelle Vernunft 学习用于抽象视觉理性的差别化逻辑程序 2307.00928v2
  • 1067 07-04 Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs Halluzinatorische Bild-Token: Ein trainingsfreier EAZY-Ansatz zur Erkennung und Abmilderung von Objekt-Halluzinationen in LVLMs 幻觉图像 Tokens: 关于在LVLMs中探测和减轻天体幻觉的无培训的EAZY方法 2503.07772v2
  • 1068 07-04 Efficient and Effective Query Context-Aware Learning-to-Rank Model for Sequential Recommendation Effizientes und effektives Abfrage-Context-Aware-Lern-to-Rank-Modell für sequentielle Empfehlung 高效率和高效能的查询背景- 软件学习到 Rank 序列建议模型 2507.03789v1
  • 1069 07-04 Effective Capacitance Modeling Using Graph Neural Networks Effektive Kapazitätsmodellierung mit Hilfe von Graph Neural Networks 使用图形神经网络的有效能力建模 2507.03787v1
  • 1070 07-04 Identifying Large-Scale Linear Parameter Varying Systems with Dynamic Mode Decomposition Methods Identifizieren von großräumigen Linearparameter-Verschiedenheitssystemen mit dynamischen Methoden zur Dekomposition 确定具有动态模式分解方法的大型线性参数变异系统 2502.02336v2
  • 1071 07-04 FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed FastDINOv2: Frequency Based Curriculum Learning verbessert Robustheit und Trainingsgeschwindigkeit FastDINOv2:基于频率的课程学习提高强力和培训速度 2507.03779v1
  • 1072 07-04 Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms Effiziente Finite-Initialisierung mit Teilnormen für spannungsverstärkte Neuronalnetze und Tensor-Netzwerke Algorithmen 具有部分规范的短期神经网络和Tensor网络的有限化 2309.06577v4
  • 1073 07-04 Finetuning CLIP to Reason about Pairwise Differences Finetuning CLIP auf Grund von Pairwise Unterschiede 将 CLIP 微调到对等差异的原因 2409.09721v2
  • 1074 07-04 Causal Evidence for the Primordiality of Colors in Trans-Neptunian Objects Kausale Beweise für die Primordialität von Farben in transneptunischen Objekten 跨尼普敦天体颜色纯度的因果证据 2507.03760v1
  • 1075 07-04 Sequential Regression Learning with Randomized Algorithms Sequentielle Regressions-Lernen mit Randomisierten Algorithmen 使用随机定序算法的序列回归学习 2507.03759v1
  • 1076 07-04 Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models Probabilistische Einbettungen für frozen Vision-Language-Modelle: Ungewissheitsquantifizierung mit Gaußschen Prozess Latent Variable Models 冻结愿景-语言模型的概率嵌入模型:与高山进程边流变量模型的不确定性量化 2505.05163v2
  • 1077 07-04 Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis Implizite Regularisierung in Diffusionsmodellen: Eine Algorithmen-Abhängige Generalisierungsanalyse 在传播模型中隐含的规范化:一个分析法 – – 依赖性一般分析 2507.03756v1
  • 1078 07-04 Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression Bestimmung von Partikelgrößenverteilungen aus lichtstreuender Messung mit eingeschränkter Gauß-Prozessregression 利用受限制的高斯进程倒退,确定从轻喷射测量中得出的粒子-碎片分布 2507.03736v1
  • 1079 07-04 Inverse Synthetic Aperture Fourier Ptychography Inverse Synthetische Blende Fourier Ptychographie 反向合成孔径孔径 2507.03733v1
  • 1080 07-04 Transforming Calabi-Yau Constructions: Generating New Calabi-Yau Manifolds with Transformers Calabi-Yau-Konstruktionen transformieren: Neue Calabi-Yau-Manifolds mit Transformern generieren Calabi-Yau 变换建筑: 以变换器生成新 Calabi- Yau 的 变换元件 2507.03732v1
  • 1081 07-04 Less is More: Empowering GUI Agent with Context-Aware Simplification Weniger ist mehr: Ermächtigen von GUI-Agenten mit Context-Aware-Vereinfachung 更少是更多: 授权用户界面代理,简化内容软件 2507.03730v1
  • 1082 07-04 FAROS: Fair Graph Generation via Attribute Switching Mechanisms FAROS: Fair Graph Generierung über Attributwechselmechanismen FAROS:通过属性转换机制产生公平的图表 2507.03728v1
  • 1083 07-04 Temporal Window Smoothing of Exogenous Variables for Improved Time Series Prediction Temporale Fensterglättung von exogenen Variablen für eine verbesserte Zeitreihenvorhersage 改进时间序列预测外源变量平滑时间窗口 2507.05284v1
  • 1084 07-04 Differentially private scale testing via rank transformations and percentile modifications Differential private Skalierungstests über Rangtransformationen und Perzentilmodifikationen 通过等级变换和百分度变换进行差别私人规模测试 2507.03725v1
  • 1085 07-04 Predicting Business Angel Early-Stage Decision Making Using AI Frühphasen-Entscheidungsfindung von Business Angels mit KI voraussagen 利用人工智能预测商业天使早期阶段决策 2507.03721v1
  • 1086 07-04 Code Simulation as a Proxy for High-order Tasks in Large Language Models Code-Simulation als Proxy für High-Order-Aufgaben in großen Sprachmodellen 守则模拟作为大语言模型中高顺序任务代理器的代码模拟 2502.03568v3
  • 1087 07-04 CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning CosmoBench: Multiscale, Multiview, Multitask Kosmologie Benchmark für geometrisches Deep Learning CosmoBench: 用于几何深习的多尺度、多视角、多任务宇宙学基准 2507.03707v1
  • 1088 07-04 Offline RLAIF: Piloting VLM Feedback for RL via SFO Offline RLAIF: Pilotierung von VLM Feedback für RL über SFO RLAIF离线:通过SFO试行VLM对RL的反馈 2503.01062v5
  • 1089 07-04 The Geometries of Truth Are Orthogonal Across Tasks Die Geometrien der Wahrheit sind orthogonal über Aufgaben hinweg 真理的地貌不同任务 2506.08572v2
  • 1090 07-04 Plugging Attention into Power Grids: Towards Transparent Forecasting Einbinden der Aufmerksamkeit in Stromnetze: Auf dem Weg zu einer transparenten Prognose 将注意力引入电力网网:走向透明预测 2507.03690v1
  • 1091 07-04 A Resource Efficient Quantum Kernel Ein ressourceneffizienter Quantenkern 资源效率高的量子核心 2507.03689v1
  • 1092 07-04 Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning Auf dem Weg zur Maschinentheorie des Geistes mit groß sprachlicher modellverstärkter Inverse Planung 通向大语言模拟反向规划的脑力理论 2507.03682v1
  • 1093 07-04 Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data Robuste Abschätzung heterogener Behandlungseffekte in randomisierten Studien, die externe Daten nutzen 利用外部数据对随机试验中多种不同治疗效果的有力估计 2507.03681v1
  • 1094 07-04 Re-Emergent Misalignment: How Narrow Fine-Tuning Erodes Safety Alignment in LLMs Re-Emergent-Missausrichtung: Wie schmale Feintuning-Erodes Sicherheit Ausrichtung in LLMs 重新出现错配: LLM 中如何将安全对齐 2507.03662v1
  • 1095 07-04 When Network Architecture Meets Physics: Deep Operator Learning for Coupled Multiphysics Wenn Netzwerkarchitektur auf Physik trifft: Deep Operator Learning für gekoppelte Multiphysik 当网络架构与物理相遇时:深操作者为混合多物理学学习 2507.03660v1
  • 1096 07-04 Forecast Evaluation and the Relationship of Regret and Calibration Prognosebewertung und das Verhältnis von Bedauern und Kalibrierung 预测评价以及遗憾和校准的关系 2401.14483v3
  • 1097 07-04 PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks PRUNE: Ein Patching-basiertes Reparatur-Framework für zertifizierbares Entlernen neuraler Netzwerke PRUNE:基于补丁的神经网络可认证遗忘修复框架 2505.06520v3
  • 1098 07-04 MLASDO: a software tool to detect and explain clinical and omics inconsistencies applied to the Parkinson’s Progression Markers Initiative cohort MLASDO: ein Software-Tool zur Erkennung und Erklärung klinischer und omischer Inkonsistenzen, die auf die Kohorte der Parkinson Progression Markers Initiative angewendet werden MLASDO:用于检测和解释适用于帕金森人进步标记倡议组群的临床和流行性不一致现象的软件工具 2507.03656v1
  • 1099 07-04 Disentangling the Roles of Representation and Selection in Data Pruning Entwirren der Rollen von Repräsentation und Auswahl bei der Datenkorrektur 分离数据谨慎中的代表和甄选作用 2507.03648v1
  • 1100 07-04 SymmetryLens: Unsupervised Symmetry Learning via Locality and Density Preservation SymmetrieLens: Unüberwachte Symmetrie Lernen über Lokalität und Dichte Erhaltung 对称路程:通过地点和密度保护进行不受监督的对称学习 2410.05232v2
  • 1101 07-04 A Hybrid Supervised and Self-Supervised Graph Neural Network for Edge-Centric Applications Hybrides überwachtes und selbstüberwachtes Graphen-Neural-Netzwerk für Edge-Centric-Anwendungen 边缘中心应用混合监督和自我监督图形神经网络 2501.12309v2
  • 1102 07-04 Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States Probing Latent Subspaces in LLM für KI-Sicherheit: Identifizierung und Manipulation von Adversarialstaaten 为AI安全而在LLM中为LLM中为LLM中进行试验的低端子空间:识别和操纵反向国家 2503.09066v2
  • 1103 07-04 PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction PPFL-RDSN: 用于加密失传图像重建的基于隐私-保护联邦学习的残余常量空间网络 2507.00230v2
  • 1104 07-04 Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations Wissenschaftliches maschinelles Lernen chaotischer Systeme entdeckt regierende Gleichungen für neurale Bevölkerungen 神经人口等分的麻风系统发现科学机学 2507.03631v1
  • 1105 07-04 Disentangling Doubt in Deep Causal AI Disentangling Zweifel in tiefen kausalen KI 《深因果中驱散怀疑》 2507.03622v1
  • 1106 07-04 Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy Ist es Zeit, Prompts als Code zu behandeln? Eine Multi-Use-Fallstudie für Prompt-Optimierung mit DSPy 是否是时候将提示作为代码处理? 使用 DSPy 快速优化的多用途案例研究 2507.03620v1
  • 1107 07-04 DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation DTN: Deep Multiple Task-spezifisches Feature Interactions Network für Multi-Task-Empfehlung DTN:多任务建议深度多任务特有互动网络 2408.11611v4
  • 1108 07-04 Non-negative matrix factorization algorithms generally improve topic model fits Nicht-negative Matrixfaktorisierungsalgorithmen verbessern generell Themenmodell-Anpassungen 非负矩阵非负矩阵乘数化算法一般改进专题模式 2105.13440v4
  • 1109 07-04 Kinetic Langevin Diffusion for Crystalline Materials Generation Kinetische Langevin Diffusion für die kristalline Materialgenerierung 水晶材料生产动态Langevin传播 2507.03602v1
  • 1110 07-04 SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications SciVid: Cross-Domain-Bewertung von Videomodellen in wissenschaftlichen Anwendungen SciVid:科学应用录像模型跨主题评价 2507.03578v1
  • 1111 07-04 JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model JanusDNA: Ein leistungsstarkes bidirektionales Hybrid-DNA-Stiftungsmodell JanusDNA:一个强大的双向双向混合DNA基金会模型 2505.17257v3
  • 1112 07-04 Learning unitaries with quantum statistical queries Lerneinheiten mit quantenstatistischen Abfragen 附有量数统计查询的学习单 2310.02254v2
  • 1113 07-04 From Street Form to Spatial Justice: Explaining Urban Exercise Inequality via a Triadic SHAP-Informed Framework Von der Straßenform zur räumlichen Gerechtigkeit: Erklären städtischer Übungsungleichheit über einen triadischen SHAP-informierten Rahmen 从街道形式到空间司法:通过三部三部CHAP综合框架解释城市行使不平等问题 2507.03570v1
  • 1114 07-04 Consistency of augmentation graph and network approximability in contrastive learning Kohärenz von Augmentationsgraph und Netzwerk-Annäherung im kontrastierenden Lernen 在对比性学习中,增强图和网络接近网络的一致性 2502.04312v2
  • 1115 07-04 2.5D Object Detection for Intelligent Roadside Infrastructure 2.5D-Objekterkennung für intelligente Straßeninfrastruktur 2.5D 智能路边基础设施物体探测 2507.03564v1
  • 1116 07-04 Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention Gradient Short-Circuit: Effiziente Out-of-Distribution-Erkennung über Feature-Intervention 渐进短路:通过地物干预有效外分配探测 2507.01417v2
  • 1117 07-04 Simplifying Graph Neural Kernels: from Stacking Layers to Collapsed Structure Vereinfachende Graphen-Neural-Kernel: von Stapelschichten zur zusammengebrochenen Struktur 简化图形神经内核:从堆叠层到折叠结构 2507.03560v1
  • 1118 07-04 Communication Efficient, Differentially Private Distributed Optimization using Correlation-Aware Sketching Kommunikation Effiziente, differenzierte Private Distributed Optimization mit Correlation-Aware Sketching 使用关联软件拼写法实现通信效率高的、有差异的私人分配最佳优化 2507.03545v1
  • 1119 07-04 Generating Synthetic Relational Tabular Data via Structural Causal Models Synthetische relationale Tabellendaten mittels struktureller Kausalmodelle generieren 通过结构因果模型生成合成关系图表数据 2507.03528v1
  • 1120 07-04 Decoupled Relative Learning Rate Schedules Entkoppelte Zeitpläne für relative Lernraten 分离的相对学习率表 2507.03526v1
  • 1121 07-04 UWB TDoA Error Correction using Transformers: Patching and Positional Encoding Strategies UWB TDoA Fehlerkorrektur mit Transformern: Patching und Positional Encoding Strategien UWB TDoA 使用变换器校正错误:补丁和定位编码战略 2507.03523v1
  • 1122 07-04 A Flexible Instruction Set Architecture for Efficient GEMMs Flexible Instruktions-Set-Architektur für effiziente GEMMs 高效的通用环管机制的灵活教学结构 2507.03522v1
  • 1123 07-04 High-Dimensional Learning in Finance High-Dimensional Learning in Finance 金融领域高层次学习 2506.03780v3
  • 1124 07-04 ObjectRL: An Object-Oriented Reinforcement Learning Codebase ObjectRL: Eine objektorientierte, verstärkte Lerncodebase 对象RL: 目标导向的加强学习代码库 2507.03487v1
  • 1125 07-04 Molecular Machine Learning Using Euler Characteristic Transforms Molekulares maschinelles Lernen mit Euler Charakteristische Transformationen 利用超强特征变形进行分子机学习 2507.03474v1
  • 1126 07-04 Evaluating the Evaluators: Trust in Adversarial Robustness Tests Bewertung der Evaluatoren: Vertrauen in zweifelhafte Robustheitstests 评估评价者:对抗逆力测试的信任 2507.03450v1
  • 1127 07-04 Multi-Level Fusion Graph Neural Network for Molecule Property Prediction Multi-Level Fusion Graph Neural Network für Molecule Property Prediction 多级融合图图 分子财产预测多级融合图神经网络 2507.03430v1
  • 1128 07-04 Generating realistic patient data Erzeugen realistischer Patientendaten 产生现实的病人数据 2507.03423v1
  • 1129 07-04 A Hybrid Game-Theory and Deep Learning Framework for Predicting Tourist Arrivals via Big Data Analytics and Opinion Leader Detection Ein hybrides Game-Theorie- und Deep-Learning-Framework zur Vorhersage von Touristenankünften über Big Data Analytics und Opinion Leader Detection 通过大数据分析和舆论领袖检测,预测游客抵达的混合游戏理论和深层学习框架 2507.03411v1
  • 1130 07-04 On the Effectiveness of the $z$-Transform Method in Quadratic Optimization Über die Wirksamkeit der $z$-Transform Methode in der quadratischen Optimierung 美元-四重力优化转换方法的效果 2507.03404v1
  • 1131 07-04 Absolute Evaluation Measures for Machine Learning: A Survey Absolute Bewertungsmaßnahmen für maschinelles Lernen: Eine Umfrage 机器学习绝对评价措施:调查 2507.03392v1
  • 1132 07-04 On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages Zur Expressivität und Längenverallgemeinerung von selektiven State-Space-Modellen auf regulären Sprachen 关于常规语言选择性国家空间模型的表达性和时间长度的一般化 2412.19350v2
  • 1133 07-04 Learning Traffic Anomalies from Generative Models on Real-Time Observations Verkehrsanomalien aus generativen Modellen auf Echtzeit-Beobachtungen lernen 实时观测生成模型的学习交通异常现象 2502.01391v4
  • 1134 07-04 Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers Verallgemeinerung oder Halluzination? Verstehen von Out-of-Context-Reasoning in Transformers 通化还是幻觉? 理解变异器的逻辑外原因 2506.10887v2
  • 1135 07-04 Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections Implizite Belohnung als Brücke: Eine einheitliche Ansicht von SFT- und DPO-Verbindungen 隐含奖励为桥梁:SFT和DPO连接的统一观点 2507.00018v2
  • 1136 07-04 MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning MENTOR: Mixture-of-Experts-Netzwerk mit Task-Oriented Perturbation für visuelles Verstärkungslernen MENTOR: 用于视觉强化学习的带任务导向扰动的混合专家网络 2410.14972v3
  • 1137 07-04 Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization Maßnahmen Robustes Verstärktes Lernen durch Optimal Adversary Aware Policy Optimization 通过优化反向最佳认识政策优化,加强学习行动强力加强学习 2507.03372v1
  • 1138 07-04 Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education Detection of Disengagement from Voluntary Quizzes: Ein erklärbarer maschineller Lernansatz in der höheren Fernlehre 探测自愿Quizzes脱离自愿Quizzes:高远程教育中可解释的机械学习方法 2507.02681v2
  • 1139 07-04 A Real-Time Digital Twin for Type 1 Diabetes using Simulation-Based Inference Ein Echtzeit-Digital Twin für Typ-1-Diabetes mit simulationsbasierter Schlussfolgerung 使用模拟推断法为1型糖尿病实时数字双 2507.01740v2
  • 1140 07-04 Adaptive Gate-Aware Mamba Networks for Magnetic Resonance Fingerprinting Adaptive Gate-Aware Mamba-Netzwerke für Magnetresonanz-Fingerprinting 用于磁共振指纹的适应性门软件Mamba网络 2507.03369v1
  • 1141 07-04 LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment LearnAlign: Grundlegende Datenauswahl für Verstärkungslernen in großen Sprachmodellen basierend auf verbesserter Gradient Alignment 学习对称:根据改进梯度对齐,为在大语言模式中强化学习选择理由数据 2506.11480v3
  • 1142 07-04 Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency Grade der Freiheit für lineare Aufmerksamkeit: Destillieren von Softmax-Achtung mit optimaler Feature-Effizienz 线性关注自由度:以最佳地物效率保持软性关注 2507.03340v1
  • 1143 07-04 CAOTE: KV Cache Eviction for LLMs via Attention Output Error-Based Token Selection CAOTE: KV-Cache-Eviktion für LLMs über Aufmerksamkeitsausgabe Fehlerbasierte Token-Auswahl CAOTE: KV 通过注意输出错误选择选中的LLMs KV Cache Evoction 2504.14051v3
  • 1144 07-04 Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky Disambiguation-Centric Finetuning macht Enterprise Tool-Calling LLMs realistischer und weniger riskant 差异性-中心微调使企业用工具召集LLM 更现实、风险更小 2507.03336v1
  • 1145 07-04 Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling Aufgabenspezifische Generative Datensatzdestillation mit schwer wiegender Probenahme 利用难于指导的抽样抽样进行任务特定生成数据集蒸馏 2507.03331v1
  • 1146 07-04 Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism Universal Checkpointing: Ein flexibles und effizientes Distributed Checkpointing-System für großformatige DNN-Schulungen mit rekonfigurierbarem Parallelismus 通用检查:采用可重新配置平行系统进行大型DNN培训的灵活和高效分布式检查系统 2406.18820v3
  • 1147 07-04 Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization Structure-Aware Compound-Protein Affinitätsvorhersage über Graph Neural Network mit Group Lasso Regularization 通过Lasso集团正规化的图像神经网络进行结构器件复合复合物-蛋白质亲近性预测 2507.03318v1
  • 1148 07-04 Rewriting Pre-Training Data Boosts LLM Performance in Math and Code Umschreiben von Vorschulungsdaten steigert die LLM-Performance in Mathematik und Code 数学和代码方面的LLM性能 2505.02881v3
  • 1149 07-04 OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework OpenRLHF: Ein einfach zu bedienendes, skalierbares und leistungsstarkes RLHF-Framework OpenRLHF: 易于使用、可缩放和高绩效RLHF框架 2405.11143v5
  • 1150 07-04 Deep Autoregressive Models as Causal Inference Engines Deep Autoregressive Modelle als Kausale Inferenz-Engine 深自递减模型,如因果推断引擎 2409.18581v3
  • 1151 07-04 Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure Diffusionsfaktormodelle: Erzeugen von hochdimensionalen Rückgaben mit Faktorstruktur 传播因数模型:产生具有因数结构的高差异返回 2504.06566v3
  • 1152 07-04 KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction KEPLA: Ein Wissen-erweitertes Deep Learning Framework für präzise Protein-Ligand Bindung Affinity Prediction KEPLA:一个知识强化的更深层学习框架,用于准确预测蛋白-银-捆绑性近亲关系 2506.13196v2
  • 1153 07-04 ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series ReTimeCausal: 非定期时间序列中可解释性因果关系发现模型的EM- 增强的Additive Additive 噪音模型 2507.03310v1
  • 1154 07-04 LRM-1B: Towards Large Routing Model LRM-1B: Auf dem Weg zu einem großen Routing-Modell LRM-1B:迈向大型区域迁移模式 2507.03300v1
  • 1155 07-04 Dyn-O: Building Structured World Models with Object-Centric Representations Dyn-O: Strukturierte Weltmodelle mit objektzentralen Darstellungen Dyn-O:建立结构化世界模型,有实物中心代表 2507.03298v1
  • 1156 07-04 SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL SLAC: Simulationsvorausgesetzter Latent Action Space für Ganzkörper-Real-World RL SLAC: 模拟 - 预先准备的中转行动空间,用于整个模拟 - 模拟 - 现实世界RL 2506.04147v3
  • 1157 07-04 MGAA: Multi-Granular Adaptive Allocation for Low-Rank Compression of LLMs MGAA: Multi-Granular Adaptive Allocation für Low-Rank Compression von LLMs MGAA: 用于LLM低秩压缩的多粒度自适应分配 2507.03294v1
  • 1158 07-04 LTLCrit: A Temporal Logic-based LLM Critic for Safe and Efficient Embodied Agents LTLCrit: Ein auf Temporal Logic basierender LLM-Kritiker für sichere und effiziente verkörperte Agenten LTLCrit: 一种基于时序逻辑的LLM评论器,用于安全高效的具身智能体 2507.03293v1
  • 1159 07-04 Lion Cub: Minimizing Communication Overhead in Distributed Lion Lion Cub: Minimierung des Kommunikations-Overheads in verteiltem Lion Lion Cub:尽量减少分布式Lion的通信开销 2411.16462v2
  • 1160 07-04 Global Variational Inference Enhanced Robust Domain Adaptation Globale Variationsableitung Verbesserte Robuste Domain-Anpassung 全球变异推力增强强力域域适应 2507.03291v1
  • 1161 07-04 Do Tensorized Large-Scale Spatiotemporal Dynamic Atmospheric Data Exhibit Low-Rank Properties? Tensorisierte großräumige räumlich-zeitliche dynamische atmosphärische Daten zeigen Low-Rank-Eigenschaften? 长期大型外观、时空动态大气数据展览低兰克属性是否? 2507.03289v1
  • 1162 07-04 VGMShield: Mitigating Misuse of Video Generative Models VGMShield: Missbrauch von Videogenerativen Modellen verhindern VGMShield:减少滥用视频生成模型 2402.13126v2
  • 1163 07-04 QCResUNet: Joint Subject-level and Voxel-level Segmentation Quality Prediction QCResUNet: Gemeinsame Segmentierung auf Subjektebene und Voxel-Ebene QCResUNet:联合主题一级和福克斯级分解质量预测 2412.07156v2
  • 1164 07-04 Conformal Information Pursuit for Interactively Guiding Large Language Models Conformal Information Pursuit für interactively Guiding große Sprachmodelle 互动指导型大语言模式非正式信息搜索 2507.03279v1
  • 1165 07-04 Securing Transformer-based AI Execution via Unified TEE and Crypto-protected Accelerators Sicherung transformerbasierter KI-Execution über Unified TEE und Crypto-geschützte Acceleratoren 通过统一TEE和加密保护加速器实施基于安全变换器的AI执行 2507.03278v1
  • 1166 07-04 REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models REINFORCE++: Effizienter RLHF-Algorithmus mit Robustheit sowohl für Prompt- als auch für Reward-Modelle REINFORCE++: 高效的RLHF对快速模型和奖励模型具有强力的测算法 2501.03262v5
  • 1167 07-04 DBA-DFL: Towards Distributed Backdoor Attacks with Network Detection in Decentralized Federated Learning DBA-DFL: Auf dem Weg zu verteilten Backdoor-Angriffen mit Netzwerkerkennung im dezentralisierten Föderierten Lernen DBA-DDFL:努力在权力下放的联邦学习中以网络探测方式进行分散的后门攻击 2501.15005v2
  • 1168 07-04 SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA SC-LoRA: Ausbalancieren effizienter Feinsteuerung und Wissenserhaltung über Subraum-kontrainierte LoRA SC-LORA:通过分空间训练LORA平衡高效微调和知识保护 2505.23724v2
  • 1169 07-04 Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos Memory Storyboard: Nutzung von Temporal Segmentation zum Streaming von selbstüberwachtem Lernen aus egozentrischen Videos 内存故事板:利用时间分层利用从 Egocent 视频中流出自我超强学习 2501.12254v2
  • 1170 07-04 Order Acquisition Under Competitive Pressure: A Rapidly Adaptive Reinforcement Learning Approach for Ride-Hailing Subsidy Strategies Auftragserwerb unter Wettbewerbsdruck: Ein schnell adaptiver Lernansatz zur Stärkung der Reit-Hailing-Subsidy-Strategien 在竞争压力下取得订单:快速适应性强化学习办法,用于骑乘-快乐补贴战略 2507.02244v2
  • 1171 07-04 ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization ConceptMix++: Leveln des Spielfeldes in Text-zu-Bild-Benchmarking über iterative Prompt-Optimierung 概念Mix+++:通过迭代性即时优化,在文本到图像基准设定中将游戏场定位 2507.03275v1
  • 1172 07-04 DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift DNN-basierte Vorkodierung in RIS-gestützten mmWave MIMO-Systemen mit praktischer Phasenverschiebung 以DNN为基础的RIS-援助毫米 wave MIMO系统实际阶段调整前编码 2507.02824v2
  • 1173 07-04 LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference LILI-Clustering-Algorithmus: Limit Inferior Leaf Interval Integrated in Causal Forest for Causal Interference LLIL 群集算法: 将受难者叶叶之间结合的限值纳入Causal Forest 2507.03271v1
  • 1174 07-04 On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization On-Policy-Optimierung von ANFIS-Politiken unter Verwendung einer proximalen Politikoptimierung ANFIS政策采用最优政策优化的政策优化 2507.01039v2
  • 1175 07-04 Subject Invariant Contrastive Learning for Human Activity Recognition Thema Invariant Kontrastives Lernen für die Anerkennung menschlicher Aktivität 为确认人类活动而进行各种差异学习的主题 2507.03250v1
  • 1176 07-04 EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora EraRAG: Effiziente und inkrementelle Retrieval Augmented Generation für wachsende Korpora EraRAG: 面向不断增长语料库的高效增量检索增强生成 2506.20963v2
  • 1177 07-04 Evaluating the Impact of Multiple DER Aggregators on Wholesale Energy Markets: A Hybrid Mean Field Approach Bewertung der Auswirkungen mehrerer DER Aggregatoren auf die Energiemärkte des Großhandels: Ein hybrider mittlerer Feldansatz 评价多个DER聚合企业对批发能源市场的影响:混合的实地办法 2409.00107v2
  • 1178 07-04 Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks Fault Sneaking Attack: Ein Stealthy-Framework für fehlführende tiefe neurale Netzwerke 失事偷偷攻击:错误领导深神经网络的隐形框架 1905.12032v2
  • 1179 07-04 Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models Neural Diskrete Token Repräsentanz Lernen für extreme Token Reduction in Video Große Sprachmodelle 在视频大语言模式中进行关于极端减少视频大语言模式的神经分辨代言人学习 2503.16980v4
  • 1180 07-04 Federated Continual Learning: Concepts, Challenges, and Solutions Federated Continual Learning: Konzepte, Herausforderungen und Lösungen 联邦持续性学习:概念、挑战和解决方案 2502.07059v2
  • 1181 07-04 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement 7B Vollständig offene Quelle Moxin-LLM/VLM – Vom Vortraining zur GRPO-basierten Verstärkungs-Lernverbesserung 7B 完全开放源代码 Moxin-LLM/VLM – – 从预培训到基于GROP的强化学习加强 2412.06845v6
  • 1182 07-03 (4) SI-Agent: An Agentic Framework for Feedback-Driven Generation and Tuning of Human-Readable System Instructions for Large Language Models SI-Agent: Agentischer Rahmen für feedbackgetriebene Erzeugung und Tuning von menschenlesbaren Systemanweisungen für große Sprachmodelle SI-Agent:用于大语言模型的可读人类系统指令反馈驱动生成和调试的代理框架 2507.03223v1
  • 1183 07-03 Neural Inhibition Improves Dynamic Routing and Mixture of Experts Neurale Hemmung verbessert dynamische Routing und Mischung von Experten 改善专家动态路由和混合 2507.03221v1
  • 1184 07-03 Symbiosis: Multi-Adapter Inference and Fine-Tuning Symbiose: Multi-Adapter-Schlussfolgerung und Feinabstimmung 共生关系:多位开发商的推断和精准调整 2507.03220v1
  • 1185 07-03 Advancing Precision Oncology Through Modeling of Longitudinal and Multimodal Data Weiterentwicklung der Präzisions-Onkologie durch Modellierung von Längs- und Multimodaldaten 通过建立纵向和多模式数据模型,推进精密肿瘤学 2502.07836v3
  • 1186 07-03 Fast Algorithm for Constrained Linear Inverse Problems Schneller Algorithmus für eingeschränkte lineare Inverse Probleme 限制线性反向问题快速算法 2212.01068v7
  • 1187 07-03 Structured Difference-of-Q via Orthogonal Learning Strukturierter Differenz-of-Q über Orthogonales Lernen 通过矫形学习实现结构差异 2406.08697v3
  • 1188 07-03 DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing DistZO2: High-Throughput und Memory-Efficient Zeroth-Order Fine-Tuning LLMs mit verteiltem Parallel Computing DistZO2: 基于分布式并行计算的高吞吐、内存高效的LLM零阶微调 2507.03211v1
  • 1189 07-03 LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction LANTERN: Ein maschinelles Lern-Framework für Lipid-Nanopartikel-Transfektions-Effizienz-Vorhersage LANTERN:利皮纳米粒子转化效率预测的机械学习框架 2507.03209v1
  • 1190 07-03 Statistical Significance of Feature Importance Rankings Statistische Signifikanz der Merkmals-Wichtigitäts-Rankings 地物重要性排名的统计意义 2401.15800v5
  • 1191 07-03 Weak Form Scientific Machine Learning: Test Function Construction for System Identification Schwache Form des wissenschaftlichen maschinellen Lernens: Testfunktionsaufbau für die Systemidentifizierung 微弱形式科学机器学习:系统识别的测试功能建设 2507.03206v1
  • 1192 07-03 On the Adversarial Robustness of Graph Neural Networks with Graph Reduction Auf der adversarialen Robustheit von Graphen-Neuralen Netzwerken mit Graphenreduktion 关于具有缩小图量的图形神经网络的逆向强力 2412.05883v2
  • 1193 07-03 Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding Quantifizierung der Cross-Attention-Interaktion in Transformern zur Interpretation von TCR-pMHC-Bindung 为解释TRCR-pMHC而对变压器中交叉注意互动进行量化 2507.03197v1
  • 1194 07-03 CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning CODE-CL: Konzeptorbasierte Gradientenprojektion für tiefes kontinuierliches Lernen CODE-CL:基于Conceptor的梯度投影用于深度持续学习 2411.15235v3
  • 1195 07-03 Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity Steigerung der Robustheit im Preference-Based Verstärkungs-Lernen mit dynamischer Sparsamkeit 以动态平等促进基于优惠的强化学习的强力 2406.06495v3
  • 1196 07-03 Discovering Algorithms with Computational Language Processing Algorithmen mit numerischer Sprachverarbeitung entdecken 使用计算语言语言处理发现算法 2507.03190v1
  • 1197 07-03 First Contact: Data-driven Friction-Stir Process Control Erster Kontakt: Datengesteuerte Friktions-Stir Prozesssteuerung 第一次联系:数据驱动摩擦-系统进程控制 2507.03177v1
  • 1198 07-03 Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies Deep Learning Atmospheric Models zuverlässig simulieren außerhalb der Probelandwärme und Kaltwellenfrequenzen 可可靠模拟地模拟地热和冷波碎片 2507.03176v1
  • 1199 07-03 Understanding Knowledge Transferability for Transfer Learning: A Survey Wissensübertragbarkeit für Transfer Learning verstehen: Eine Umfrage 了解知识转让转让学习的可转让性:调查 2507.03175v1
  • 1200 07-03 Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data Latent Thermodynamic Flows: Unified Representation Learning und Generative Modellierung von temperaturabhängigen Verhaltensmustern aus begrenzten Daten 冷热动力流动:从有限数据中产生单一代表学习和产生温度依附行为行为模型 2507.03174v1
  • 1201 07-03 A Neurosymbolic Framework for Geometric Reduction of Binary Forms Ein neurosymbolisches Framework zur geometrischen Reduktion von Binärformen 二进制几何减少二进制的纽罗申姆伯利框架 2501.15404v2
  • 1202 07-03 Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation Über SEO hinaus: Ein transformerbasierter Ansatz zur Neuerfindung von Web Content Optimierung 超越SEO:改造网络内容优化的变革型办法 2507.03169v1
  • 1203 07-03 Adopting a human developmental visual diet yields robust, shape-based AI vision Die Annahme einer visuellen Ernährung für die menschliche Entwicklung führt zu einer robusten, formenbasierten KI-Vision 采用人文发展视觉饮食,产生稳健的、基于形状的人工智能愿景 2507.03168v1
  • 1204 07-03 Adversarial Manipulation of Reasoning Models using Internal Representations Adversariale Manipulation von Vernunftmodellen mittels interner Repräsentationen 采用内部代表制对理性模型进行反向调整 2507.03167v1
  • 1205 07-03 PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset PiCME: Pipeline zur kontrastiven Modalitätsbewertung und Kodierung im MIMIC Datensatz PiCME:MIMIC数据集中对比模态评估与编码的流水线 2507.03165v1
  • 1206 07-03 MathOptAI.jl: Embed trained machine learning predictors into JuMP models MathOptAI.jl: Einbetten ausgebildeter Maschinenlern-Prognostiker in JuMP-Modelle MathOptAI.jl: 嵌入式培训机器学习预测器,以进入JuMP模型 2507.03159v1
  • 1207 07-03 HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference HGCA: Hybride GPU-CPU Aufmerksamkeit für lange Kontext LLM-Inferenz HGCA: 混合GPU-CPU注意长距离LLM 推理 2507.03153v1
  • 1208 07-03 Expert-level validation of AI-generated medical text with scalable language models Validierung von KI-generierten medizinischen Texten auf Expertenebene mit skalierbaren Sprachmodellen 利用可扩展语言模型对AI生成的医学文本进行专家级验证 2507.03152v1
  • 1209 07-03 Last-Iterate Convergence of No-Regret Learning for Equilibria in Bargaining Games Last-Iterate Konvergenz von No-Regret-Learning für Equilibria in Schnäppchen Spiele 谈判运动会中不登记学习促进平衡的融合 2507.03150v1
  • 1210 07-03 Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models Elemente des Weltwissens (EWoK): Ein von Kognition inspirierter Rahmen zur Bewertung des Grundwissens der Welt in Sprachmodellen 世界知识要素(EWoK): “ 评价基本世界语言模式知识的认知 – – 启发框架 “ 2405.09605v2
  • 1211 07-03 DeepGesture: A conversational gesture synthesis system based on emotions and semantics DeepGesture: Ein dialogisches Gesten-Synthesesystem basierend auf Emotionen und Semantik DeepGesture:基于情感和语义的谈话手势合成系统 2507.03147v1
  • 1212 07-03 Set Valued Predictions For Robust Domain Generalization Wertvolle Vorhersagen für robuste Domain-Verallgemeinerung setzen 设定强力域一般化的价值预测值 2507.03146v1
  • 1213 07-03 Neural Substitute Solver for Efficient Edge Inference of Power Electronic Hybrid Dynamics Neuraler Ersatzlöser für effiziente Edge Inferenz von Power Electronic Hybrid Dynamics 电动电子混合动力学高效边缘推断神经替代溶剂 2507.03144v1
  • 1214 07-03 The geometry of financial institutions – Wasserstein clustering of financial data Die Geometrie der Finanzinstitute – Wasserstein-Clustering von Finanzdaten 金融机构的几何形状 – – 金融数据的瓦塞斯坦组合 2305.03565v2
  • 1215 07-03 Towards a Psychoanalytic Perspective on VLM Behaviour: A First-step Interpretation with Intriguing Observations Auf dem Weg zu einer psychoanalytischen Perspektive auf VLM-Verhalten: Eine erste Interpretation mit faszinierenden Beobachtungen 对VLM行为进行心理分析:带有引人注意的首步解释 2507.03123v1
  • 1216 07-03 MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration MORPH-LER: Log-Euklidischen Regularisierung für bevölkerungsbewusste Image-Registrierung MORPH-LER: 人口软件图像注册的日志-欧克利德兰常规化 2502.02029v2
  • 1217 07-03 LEDA: Log-Euclidean Diffeomorphism Autoencoder for Efficient Statistical Analysis of Diffeomorphisms LEDA: Log-Euclidean Diffeomorphism Autoencoder für effiziente statistische Analyse von Diffeomorphismen LEDA: 用于微分同胚高效统计分析的对数欧几里得微分同胚自动编码器 2412.16129v2
  • 1218 07-03 Predicting the Susceptibility of Examples to Catastrophic Forgetting Vorhersage der Anfälligkeit von Beispielen für katastrophisches Vergessen 预测灾难性遗忘实例的可视性 2406.09935v2
  • 1219 07-03 Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings Federated Learning für die ICD-Klassifizierung mit Leichtbaumodellen und vorgeschulten Einbettungen 具有轻重量型模型和未受过训练的嵌入型综合疾病分类的联邦学习 2507.03122v1
  • 1220 07-03 How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models Wie Übervertrauen in anfängliche Entscheidungen und Unterbewusstsein unter Kritik den Sinneswandel in großen Sprachmodellen modulieren 批评主义下对最初选择和不信任的过度自信如何改变大语言模式中的心态变化 2507.03120v1
  • 1221 07-03 BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers BLaST: High Performance Inferenz und Pretraining mit BLock Sparse Transformers BLAST:使用BLock Sparse变形器进行高性能推断和预先训练 2507.03117v1
  • 1222 07-03 Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization Hintere Inferenz mit Diffusionsmodellen für die hochdimensionale Black-Box-Optimierung 高维黑盒优化扩散模型的外皮推断 2502.16824v2
  • 1223 07-03 An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks Ein erweitertes Benchmarking von Multi-Agenten-Verstärkung-Lernalgorithmen in komplexen voll kooperativen Aufgaben 扩大复杂全面合作任务中多机构强化学习比重的基准基准 2502.04773v2
  • 1224 07-03 SymMatika: Structure-Aware Symbolic Discovery SymMatika: Struktur-Bewusst Symbolische Entdeckung SymMatika: 结构-意识符号发现 2507.03110v1
  • 1225 07-03 Neural Dynamic Modes: Computational Imaging of Dynamical Systems from Sparse Observations Neural Dynamische Modi: Computational Imaging von dynamischen Systemen aus Sparse-Beobachtungen 神经动态模式:来自分散观测的动态系统的计算成像 2507.03094v1
  • 1226 07-03 The surrogate Gibbs-posterior of a corrected stochastic MALA: Towards uncertainty quantification for neural networks Die Surrogat Gibbs-Posterior einer korrigierten stochastischen MALA: Auf dem Weg zur Unsicherheitsquantifizierung für neuronale Netzwerke 校正的《MAMALA》的替代Gibbs前置:争取对神经网络的不确定性进行量化 2310.09335v2
  • 1227 07-03 Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory Point3R: Streaming der 3D-Rekonstruktion mit explizitem Raumzeigerspeicher Point3R: 利用明确的空间指内存进行三维重建 2507.02863v1
  • 1228 07-03 Answer Matching Outperforms Multiple Choice for Language Model Evaluation Antwort Matching Outperforms Multiple Choice für Sprachmodell-Bewertung 语言模式评价的多种选择 2507.02856v1
  • 1229 07-03 Urban Region Pre-training and Prompting: A Graph-based Approach Urban Region Pre-Training und Prompting: Ein graphischer Ansatz 培训前和推动:基于图表的方法 2408.05920v4
  • 1230 07-03 Mitigating Goal Misgeneralization with Minimax Regret Zielverallgemeinerung mit Minimax-Beschwerde abmildern 与Minimax Regret 的不普及化 2507.03068v1
  • 1231 07-03 MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs MOTIF: Modulares Denken durch Verstärkung Feinabstimmung in LLMs MOTIF:通过强化微调在LLM中进行模块思考 2507.02851v1
  • 1232 07-03 MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis MvHo-IB: Multi-View Higher-Order Informationen Engpass für Gehirnerkrankungen Diagnose MvHO-IB:脑紊乱诊断多视高端信息瓶颈 2507.02847v1
  • 1233 07-03 Evaluating Frontier Models for Stealth and Situational Awareness Bewertung von Frontier-Modellen für Stealth und Situationsbewusstsein 评价隐形和了解情况的边界模式 2505.01420v4
  • 1234 07-03 LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding LLM-getriebene Behandlung Wirkung Schätzung unter Inferenzzeit Text Confounding LLM-驱动处理效应估算 2507.02843v1
  • 1235 07-03 On the Structure of Replicable Hypothesis Testers Über die Struktur replizierbarer Hypothesenprüfer 关于可复制的假设测试器结构 2507.02842v1
  • 1236 07-03 StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason StepHint: Mehrstufige stufenweise Hinweise stärken das Lernen zur Vernunft 步进提示:多级分步骤将强化学习提升到合理 2507.02841v1
  • 1237 07-03 From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents Von der Web-Suche in Richtung Agentic Deep Research: Incentivizing Search with Reasoning Agents 从网络搜索到代理深层研究:激励使用理性代理进行搜索 2506.18959v3
  • 1238 07-03 Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble Mitgliedschafts-Inferenz Angriffe als Datenschutz-Tools: Zuverlässigkeit, Disparität und Ensemble 以成员攻击作为隐私工具:可靠性、差异性和共性 2506.13972v2
  • 1239 07-03 ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning ExPO: Entsperren harter Vernunft mit selbsterklärungsgeführtem Verstärkungslernen ExPO: 以自我剥削指导强化学习来解锁困难理由 2507.02834v1
  • 1240 07-03 SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model SynapseRoute: Ein Auto-Routen-Schaltrahmen für das Dual-State Large Language Model SynapseRoute:关于两州大语言模式的自动运行切换框架 2507.02822v1
  • 1241 07-03 Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks Messung als Bricolage: Untersuchung, wie Datenwissenschaftler Zielvariablen für vorausschauende Modellierungsaufgaben konstruieren 计量为 “ 饼干测量 “ :审查数据科学家如何构建预测模拟任务的目标变量 2507.02819v1
  • 1242 07-03 Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case Große Sprachmodelle zur Automatisierung der klinischen Datenstandardisierung: HL7 FHIR Use Case 临床数据标准化自动化大语言模型:HL7 FHIR使用案例 2507.03067v1
  • 1243 07-03 Replicable Distribution Testing Reproduzierbare Verteilungsprüfung 可复制分发测试 2507.02814v1
  • 1244 07-03 Cycle-Consistent Helmholtz Machine: Goal-Seeded Simulation via Inverted Inference Cycle-Consistent Helmholtz Machine: Ziel-gesäte Simulation durch Inverted Inferenz Helmholtz 机 : 通过反向推论进行目标种子模拟 2507.03065v1
  • 1245 07-03 In-Training Multicalibrated Survival Analysis for Healthcare via Constrained Optimization In-Training Multikalibrierte Überlebensanalyse für das Gesundheitswesen durch eingeschränkte Optimierung 通过约束优化实现面向医疗保健的训练中多校准生存分析 2507.02807v1
  • 1246 07-03 Transferrable Surrogates in Expressive Neural Architecture Search Spaces Übertragbare Surrogate in Expressive Neural Architecture Suchplätze 显性神经结构搜索空间中的可转让代孕人 2504.12971v3
  • 1247 07-03 Learning to Coordinate Bidders in Non-Truthful Auctions Lernen, Bieter in nicht-wahrheitsgemäßen Auktionen zu koordinieren 学习如何在非真话拍卖中协调投标人 2507.02801v1
  • 1248 07-03 GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling GPAS: Beschleunigung der Konvergenz des LLM-Vortrainings durch Gradient-Preserving Activation Scaling GPAS:通过 “ 渐进式保留动力扩增 “ 加速汇集LLM预备训练 2506.22049v2
  • 1249 07-03 Understanding and Improving Length Generalization in Recurrent Models Verständnis und Verbesserung der Längenverallgemeinerung in rekurrenten Modellen 经常模式中的了解和改进时间长度的概述 2507.02782v1
  • 1250 07-03 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Selbstkorrektionsbank: Enthüllung und Adressierung des Selbstkorrektions-Blindflecks in LLMs 自我纠正基准:揭示并解决LLM中的自我纠正盲点 2507.02778v1
  • 1251 07-03 LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet LLM-Powered Vorhersage von Hyperglykämie und Entdeckung von Verhaltenstherapie Wege von Wearables und Diät LLM 法力预测超血糖和从穿衣和饮食中发现的行为处理途径 2503.03935v2
  • 1252 07-03 BERT4Traj: Transformer Based Trajectory Reconstruction for Sparse Mobility Data BERT4Traj: Transformer-basierter Trajektorien-Rekonstruktion für Sparse Mobility Data BERT4Traj: 变压器基础轨迹重建,用于分析移动数据 2507.03062v1
  • 1253 07-03 Grounding Intelligence in Movement Erdungsintelligenz in der Bewegung 移动中的地面情报 2507.02771v1
  • 1254 07-03 Avoiding Catastrophe in Online Learning by Asking for Help Vermeiden von Katastrophen im Online-Lernen, indem Sie um Hilfe bitten 通过请求帮助避免在线学习中的灾难 2402.08062v5
  • 1255 07-03 Multiple data-driven missing imputation Mehrere datengetriebene fehlende Imputation 多数据驱动的缺失估算 2507.03061v1
  • 1256 07-03 MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel MaizeField3D: Ein kuratierter 3D-Punkt-Cloud- und Verfahrensmodell-Datensatz von Feld-Grown Maize aus einem Diversity-Panel MaizeField3D:来自多样性群体的田间种植玉米的精选3D点云与程序化模型数据集 2503.07813v3
  • 1257 07-03 Contextual Online Pricing with (Biased) Offline Data Kontextuelle Online-Preise mit (Biased) Offline-Daten 利用(有偏)离线数据的情境在线定价 2507.02762v1
  • 1258 07-03 Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness Quantifizierung der sektorübergreifenden Intersecting-Diskrepanzen innerhalb mehrerer Gruppen mittels Latent Class-Analyse zur Fairness 量化多个群体内部的跨部门交叉差异,利用编内分类分析进行量化,以实现公平 2407.03133v4
  • 1259 07-03 Fast and Simplex: 2-Simplicial Attention in Triton Schnell und Simplex: 2-Simplizielle Aufmerksamkeit in Triton 快速与单纯形: Triton 中的 2-单纯形注意力 2507.02754v1
  • 1260 07-03 Towards a Novel Measure of User Trust in XAI Systems Auf dem Weg zu einem neuartigen Maß für das Vertrauen der Nutzer in XAI-Systeme 争取对XAI系统用户信任的新措施 2405.05766v2
  • 1261 07-03 Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics Lineare Aufmerksamkeit mit globalem Kontext: Ein mehrpoliger Aufmerksamkeitsmechanismus für Vision und Physik 全球一线关注:视觉和物理多极关注机制 2507.02748v1
  • 1262 07-03 Early Signs of Steganographic Capabilities in Frontier LLMs Frühe Anzeichen von Steganographischen Fähigkeiten in Frontier LLMs 边疆长长体动物能力早期信号 2507.02737v1
  • 1263 07-03 Classification by Separating Hypersurfaces: An Entropic Approach Klassifizierung durch Trennen von Hyperoberflächen: Ein entropischer Ansatz 通过分离超曲面进行分类:一种熵方法 2507.02732v1
  • 1264 07-03 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving Bourbaki: Selbsterzeugte und zielkonditionierte MDPs für Theorem-Proving Bourbaki:为理论证明提供自创和有目标、有条件的MDP 2507.02726v1
  • 1265 07-03 Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms Hierarchisches Multi-Label Kontrastives Lernen für Protein-Protein-Interaktionsvorhersage über Organismen hinweg 高等级多标签差异学习,促进各种组织间蛋白因-蛋白因-蛋白质-蛋白因互动预测 2507.02724v1
  • 1266 07-03 On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability Über Charakterisierungen für die Sprachgenerierung: Interplay von Halluzinationen, Breadth und Stabilität 语言生成特征:幻觉、面包和稳定之间的相互作用 2412.18530v2
  • 1267 07-03 Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification Gradient-Based Model Fingerprinting für LLM Ähnlichkeitserkennung und Familienklassifizierung LLM相似性探测和家庭分类的渐进式样指纹 2506.01631v2
  • 1268 07-03 A Comprehensive Machine Learning Framework for Micromobility Demand Prediction Umfassendes Machine Learning Framework für die Nachfragevorhersage der Mikromobilität 微观流动需求预测综合机械学习框架 2507.02715v1
  • 1269 07-03 A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control Eine Forget-and-Grow-Strategie für tiefe Verstärkungs-Lernskalierung in kontinuierlicher Kontrolle 连续控制深入加强学习的 “ 遗忘与增长 “ 战略 2507.02712v1
  • 1270 07-03 Fluid Democracy in Federated Data Aggregation Fluide Demokratie in Federated Data Aggregation 联邦数据聚合中流体民主 2507.02710v1
  • 1271 07-03 Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions Multi-Agenten-Verstärkungs-Lernen für dynamische Preise in Lieferketten: Benchmarking Strategic Agent Verhaltensmuster unter realistisch simulierten Marktbedingungen 供应链动态定价多机构强化学习:在现实模拟市场条件下确定战略代理行为基准 2507.02698v1
  • 1272 07-03 RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes RLHGNN: Verstärkung Lernorientiertes Heterogenes Graph Neuronales Netzwerk für die nächste Aktivitätsvorhersage in Geschäftsprozessen RLHGNN: 业务流程下一个活动预测的强化学习驱动的异质图形神经网络 2507.02690v1
  • 1273 07-03 HAPI: A Model for Learning Robot Facial Expressions from Human Preferences HAPI: Ein Modell zum Lernen von Roboter-Gesichtsausdrücken aus menschlichen Präferenzen HAPI: 人类首选机器人法表达方式学习模式 2503.17046v2
  • 1274 07-03 Learning few-step posterior samplers by unfolding and distillation of diffusion models Lernen der wenigen Schritte hinteren Sampler durch Entfalten und Destillation von Diffusionsmodellen 通过推广模型的推出和蒸馏,学习几步后后后继采样器 2507.02686v1
  • 1275 07-03 Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs Einbettungsbasiertes Federated Data Sharing über differentiell private konditionale VAEs 通过差分隐私条件VAE实现基于嵌入的联邦数据共享 2507.02671v1
  • 1276 07-03 Guided Generation for Developable Antibodies Geführte Generation für entwickelbare Antikörper 可开发抗体的代人制 2507.02670v1
  • 1277 07-03 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding OmniDraft: Ein Cross-Vocabulary, Online Adaptive Drafter für die gerätespezifische Dekodierung 总括草案:跨词汇、在线在线可调适性套用投机下限设计图纸 2507.02659v1
  • 1278 07-03 Fair Deepfake Detectors Can Generalize Faire Deepfake-Detektoren können generalisieren 公平深假探测器可普遍化 2507.02645v1
  • 1279 07-03 On Efficient Bayesian Exploration in Model-Based Reinforcement Learning Effiziente bayesische Exploration im modellbasierten Ausbau-Lernen 关于在示范强化学习中高效探索贝耶斯人 2507.02639v1
  • 1280 07-03 High-Order Deep Meta-Learning with Category-Theoretic Interpretation High-Order Deep Meta-Learning mit kategorie-theoretischer Interpretation 具有分类理论判读的高等级深层元元学习 2507.02634v1
  • 1281 07-03 Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data Medical Data Pecking: Ein kontextorientierter Ansatz zur automatisierten Qualitätsbewertung strukturierter medizinischer Daten 医疗数据分级:结构医疗数据自动质量评价的内幕意识方法 2507.02628v1
  • 1282 07-03 A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes Ein Matrix-Varianten-Auto-Encoder für Variant-Effekt-Vorhersage in Pharmakogenen 药用植物可变效应预测矩阵变式自动编码器 2507.02624v1
  • 1283 07-03 L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation L-VAE: Variationaler Auto-Encoder mit erlernbarer Beta zur entfremdeten Darstellung L-VAE: 具有拆分代表的可学习贝塔变量自动编码器 2507.02619v1
  • 1284 07-03 Interpreting Graph Inference with Skyline Explanations Graphenableitung mit Skyline-Erläuterungen interpretieren 与天线解释解释的图表推理 2505.07635v2
  • 1285 07-03 Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection Orientation-Aware Sparse Tensor PCA für effiziente, unüberwachte Feature-Auswahl 用于高效、不受监督地物选择的五氯苯甲醚 2407.16985v3
  • 1286 07-03 Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation Lost in Latent Space: Eine empirische Studie von Latent Diffusionsmodellen für die Emulation von Physik 失落于隐时空:物理模拟的中子扩散模型经验研究 2507.02608v1
  • 1287 07-03 Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs” Nieder mit der Hierarchie: Das ‘H’ in HNSW steht für “Hubs” 打倒等级制度: HNSW的”H”代表”Hubs” 2412.01940v3
  • 1288 07-03 De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks De-AntiFake: Die Protective Perturbations gegen Stimmen-Klonangriffe neu denken De-Antifake:重新思考对语音克隆攻击进行保护性干扰 2507.02606v1
  • 1289 07-03 Automated Grading of Students’ Handwritten Graphs: A Comparison of Meta-Learning and Vision-Large Language Models Automatisierte Abstufung der handschriftlichen Grafiken von Studenten: Vergleich von Meta-Learning und Vision-Large Language Models 学生手写图图的自动分级:混合学习和视野大的语言模型比较 2507.03056v1
  • 1290 07-03 StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models StructTransform: Eine skalierbare Angriffsfläche für sicherheitsorientierte große Sprachmodelle 条形变形:用于安全-统一大语言模型的可缩放攻击表面 2502.11853v2
  • 1291 07-03 Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data Padé Ungefähre neurale Netzwerke für verbesserte elektrische Motorfehlerdiagnose mit Vibration und akustischen Daten 利用振动和声频数据增强电动机动车故障诊断 2507.02599v1
  • 1292 07-03 SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks SURE-VQA: Systematisches Verständnis der Robustheitsbewertung in medizinischen VQA-Aufgaben SURE-VQA: 系统理解医疗VQA任务中的鲁棒性评估 2411.19688v3
  • 1293 07-03 Revisiting Active Learning under (Human) Label Variation Aktives Lernen unter (menschlichen) Label-Varianten 在(人)标签标签变换下重新审查积极学习 2507.02593v1
  • 1294 07-03 Scalable Interconnect Learning in Boolean Networks Skalierbares Interconnect-Lernen in Booleschen Netzwerken 布尔网络中的可缩放互连学习 2507.02585v1
  • 1295 07-03 SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings SoccerDiffusion: Auf dem Weg zum Lernen von End-to-End Humanoid Robot Soccer aus Gameplay-Aufnahmen 足球传播:从游戏游戏录音录制中学习最终到最终人类的机器人足球 2504.20808v2
  • 1296 07-03 From 2:4 to 8:16 sparsity patterns in LLMs for Outliers and Weights with Variance Correction Von 2:4 bis 8:16 Sparsitätsmuster in LLMs für Ausreißer und Gewichte mit Varianzkorrektur 从2:4至8:16差异校正后外部线和重力LLMLMs的宽度模式 2507.03052v1
  • 1297 07-03 Towards an Explainable Comparison and Alignment of Feature Embeddings Auf dem Weg zu einem erklärbaren Vergleich und der Ausrichtung von Feature-Embeddings 争取实现可解释的比较和地貌隐含物的一致 2506.06231v2
  • 1298 07-03 Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability Transformer benötigen keine Layernorm zur Inferenzzeit: Skalierung der Layernorm Entfernung zu GPT-2 XL und die Implikationen für mechanistische Interpretierbarkeit 突变器不需要在推断时间使用层内线: 将层内线缩放去除到 GPT-2 XL 和对机械可解释性的影响 2507.02559v1
  • 1299 07-03 Fading memory and the convolution theorem Das verblassende Gedächtnis und das Konvolutionstheorem 过时的记忆和变幻的定理 2408.07386v3
  • 1300 07-03 Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements Adaptive probabilistische ODE-Lösemittel ohne adaptive Speicheranforderungen 无适应性内存要求的适应性概率性ODE溶剂 2410.10530v2
  • 1301 07-03 Flow Matching on Lie Groups Flow Matching auf Lie-Gruppen 李群上的流匹配 2504.00494v2
  • 1302 07-03 Interleaved Gibbs Diffusion: Generating Discrete-Continuous Data with Implicit Constraints Interleaved Gibbs Diffusion: Diskrete-kontinuierliche Daten mit impliziten Einschränkungen generieren 交错吉布斯扩散:生成带隐式约束的离散-连续数据 2502.13450v2
  • 1303 07-03 AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench KI-Forscher für maschinelles Lernen: Suche, Exploration und Generalisierung in der MLE-Bench AI 机器学习研究代理机构:MLE-Bench的搜索、探索和普及 2507.02554v1
  • 1304 07-03 Position: A Theory of Deep Learning Must Include Compositional Sparsity Position: Eine Theorie des tiefen Lernens muss kompositorische Sparsamkeit einschließen 位置:深深学习理论必须包含构成平等 2507.02550v1
  • 1305 07-03 Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling Offline-Verstärkung Lernen für das Lernen zum Versand für Job Shop Scheduling 离线强化学习,学习到派到招工商店日程安排的调度 2409.10589v4
  • 1306 07-03 Higher-Order Singular-Value Derivatives of Rectangular Real Matrices Hohe Ordnung Singular-Value Derivate der rechteckigen Realmatrizen 矩形真实矩阵的高端圆形单值衍生物 2506.03764v3
  • 1307 07-03 Assessing Quantum Advantage for Gaussian Process Regression Bewertung des Quantenvorteils für Gaussian Process Regression 评估高山进程倒退的量度优势 2505.22502v2
  • 1308 07-03 Ordinality in Discrete-level Question Difficulty Estimation: Introducing Balanced DRPS and OrderedLogitNN Normalität in Diskrete-Level-Frage Schwierigkeitsgrad Schätzung: Einführung ausgewogene DRPS und bestelltLogitNN 不同层次问题难以估计的常态:采用平衡的DRPS和有秩序LogitNN 2507.00736v2
  • 1309 07-03 A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning Eine quadratische Peg in einem quadratischen Loch: Meta-Experte für langes semi-überwachtes Lernen 广场洞中的一个广场塞格:长期辍学的半模拟学习的元专家 2505.16341v2
  • 1310 07-03 RetrySQL: text-to-SQL training with retry data for self-correcting query generation RetrySQL: Text-zu-SQL-Training mit Retry-Daten zur Selbstkorrektur der Abfragegenerierung 重新trySQL: 文本到 SQL 培训, 并重试数据, 用于自校查询生成 2507.02529v1
  • 1311 07-03 Deep learning four decades of human migration Tiefes Lernen von vier Jahrzehnten menschlicher Migration 四十年人类移徙的深入学习 2506.22821v2
  • 1312 07-03 Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling Selbstüberwachte Rahmenbedingungen für die Lautsprecherverifizierung durch Bootstrapped Positive Sampling 通过推动积极抽样,自我监督的演讲人核查框架 2501.17772v3
  • 1313 07-03 Reconsidering the energy efficiency of spiking neural networks Über die Energieeffizienz neuronaler Netze nachdenken 重新考虑突发神经网络的能源效率 2409.08290v2
  • 1314 07-03 Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs Sprachenübergreifendes Reisen: Benchmarking Cross-Lingual Consistency in multimodalen LLMs 跨语言旅行:多模式LLM中跨语言一致基准 2505.15075v2
  • 1315 07-03 Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning Selbstgesteuerte Prozess-Reward-Optimierung mit neu definiertem Schrittweiser Vorteil für Prozess-Verstärkungs-Lernen 自指导流程向上优化,具有重新定义的逐步改进的流程强化学习优势 2507.01551v2
  • 1316 07-03 Monitoring of Static Fairness Überwachung der statischen Fairness 监测静态公平 2507.03048v1
  • 1317 07-03 TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification TFOC-Net: Ein kurzzeitiger Fourier-Transform-basierter Deep-Learning-Ansatz zur Verbesserung der Klassifizierung bereichsübergreifender Motorbilder TFOC-Net:一个短时间短时间的四倍变换式深学习方法,以加强跨主题机动机图象分类 2507.02510v1
  • 1318 07-03 The Choice of Normalization Influences Shrinkage in Regularized Regression Die Wahl der Normalisierung beeinflusst Schrumpfung bei regularisierter Regression 正常化的影响在正规化后退过程中的缩减 2501.03821v3
  • 1319 07-03 IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders IndianBailJudgments-1200: Ein Multi-Attribut-Datensatz für juristisches NLP zu indischen Kautionsbeschlüssen IndianBailJudgments-1200:面向印度保释令法律 NLP 的多属性数据集 2507.02506v1
  • 1320 07-03 Continual Gradient Low-Rank Projection Fine-Tuning for LLMs Kontinuierliche gradiente Low-Rank-Projektion Feintuning für LLMs LLMM 连续渐进式低射频低射射精度 2507.02503v1
  • 1321 07-03 Online Conformal Prediction with Efficiency Guarantees Online-Konforme Vorhersage mit Effizienzgarantien 具有效率保障的在线非正式预测 2507.02496v1
  • 1322 07-03 Reliability-Adjusted Prioritized Experience Replay Reliability-Adjusted Prioritized Experience Replay 调整了可靠性调整后确定优先经验重述 2506.18482v2
  • 1323 07-03 MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations MC-INR: Effiziente Kodierung von multivariaten wissenschaftlichen Simulationsdaten mit Meta-Learning und clusterierten Impliziten Neuraldarstellungen MC-INR:利用元学习和集成隐隐性神经图示对多变量科学模拟数据进行有效编码 2507.02494v1
  • 1324 07-03 Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke Ergebnisvorhersage und individualisierte Abschätzung des Behandlungseffekts bei Patienten mit großem Gefäßverschluss-Schlaganfall 大型船舶隔离中风病人的结果预测和个人化治疗估计效果 2507.03046v1
  • 1325 07-03 Deep Transfer Learning for Kidney Cancer Diagnosis Deep Transfer Learning für Nierenkrebs-Diagnose 肾脏癌症诊断深入转移学习 2408.04318v2
  • 1326 07-03 Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD Learning Massive-scale Partial Correlation Networks in klinischen Multi-Omics-Studien mit HP-ACCORD 与HP-ACCORD在临床多工程学研究中学习大规模大规模局部关联网络 2412.11554v3
  • 1327 07-03 Variational Kolmogorov-Arnold Network Unterschiedliches Kolmogorov-Arnold-Netzwerk 变化科尔莫戈罗夫-阿诺尔德网络 2507.02466v1
  • 1328 07-03 Quantum-enhanced causal discovery for a small number of samples Quantenverstärkte kausale Entdeckung für eine kleine Anzahl von Proben 少数样本的量吨量强化因果发现 2501.05007v2
  • 1329 07-03 Universal Collection of Euclidean Invariants between Pairs of Position-Orientations Universelle Sammlung von Euklidischen Invarianten zwischen Paaren von Positionsorientierungen 位置对等位置对等之间通用的欧立地异异异物收集 2504.03299v2
  • 1330 07-03 Roto-Translation Invariant Metrics on Position-Orientation Space Roto-Translation Invariante Metrics auf Positions-Orientierungsraum 定位空间的旋转变换变化计量器 2504.03309v2
  • 1331 07-03 Red grape detection with accelerated artificial neural networks in the FPGA’s programmable logic Rote Traubenerkennung mit beschleunigten künstlichen neuronalen Netzwerken in der programmierbaren Logik des FPGA FPGA的可编程逻辑的红葡萄探测与加速人工神经网络 2507.02443v1
  • 1332 07-03 A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms Eine tief lernfähige numerische Relativitätsüberlagerung für Binary Black Hole Waveforms 二进制黑洞波形的深学习动力数字相对相对性替代工具 2412.06946v2
  • 1333 07-03 Unsupervised Cognition Unüberwachte Kognition 未受监督的认知 2409.18624v3
  • 1334 07-03 Anatomical Foundation Models for Brain MRIs Anatomische Grundlagenmodelle für Hirn-MRTs 脑吸管机模型解剖基础模型 2408.07079v4
  • 1335 07-03 Optimisation Is Not What You Need Optimierung ist nicht das, was man braucht 最优化不是你需要的 2507.03045v1
  • 1336 07-03 The Artificial Scientist – in-transit Machine Learning of Plasma Simulations Der Künstliche Wissenschaftler – in-transit maschinelles Lernen von Plasmasimulationen 人造科学家 – – Plasma模拟模拟的中转机器学习 2501.03383v3
  • 1337 07-03 Is Complex Query Answering Really Complex? Ist die Antwort auf komplexe Abfragen wirklich komplex? 复杂查询回答是否真的复杂? 2410.12537v3
  • 1338 07-03 Determination Of Structural Cracks Using Deep Learning Frameworks Bestimmung von strukturellen Rissen mittels Deep Learning Frameworks 利用深学习框架确定结构裂缝 2507.02416v1
  • 1339 07-03 ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation ForgGrip: VR 手动操纵中现实力力控的无参考课程学习 2503.08061v4
  • 1340 07-03 EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks EquiTabPFN: Eine Ziel-Permutation Equivariant Prior Fitted Networks EquiTabPFN: 目标-变异等同先前适合的网络 2502.06684v2
  • 1341 07-03 Significativity Indices for Agreement Values Signifikativitätsindizes für Vereinbarungswerte 协议价值的量化指数 2504.15325v2
  • 1342 07-03 Attention-based clustering Aufmerksamkeitsbasiertes Clustering 以关注为主的集群集群 2505.13112v2
  • 1343 07-03 Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability Benchmarking Akan ASR-Modelle über Domain-spezifische Datensätze: Eine vergleichende Bewertung von Leistung, Skalierbarkeit und Anpassungsfähigkeit 确定Akan ASR模型基准的全域具体数据集:业绩比较评价、可缩放性和可调适性 2507.02407v1
  • 1344 07-03 Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization Verbesserung der Konsistenz in der Fahrzeug-Trajektorie-Vorhersage durch Preference-Optimierung 通过优先优化提高车辆轨迹预测的一致性 2507.02406v1
  • 1345 07-03 XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation XGeM: Ein Multi-Prompt-Stiftungsmodell für multimodale medizinische Datengenerierung XGeM:多式医疗数据多式生成多式医疗多模式基金会模式 2501.04614v3
  • 1346 07-03 Wildlife Target Re-Identification Using Self-supervised Learning in Non-Urban Settings Wildlife Target Re-Identification mit selbstüberwachtem Lernen in nicht-Urban-Einstellungen 在非城市环境中利用自我监督的学习重新确定野生动物目标 2507.02403v1
  • 1347 07-03 TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation TABNet: Ein Triplet Augmentation Self-Recovery Framework mit Boundary-Aware Pseudo-Labors für die Segmentierung medizinischer Bilder TABNet:三联增强自我恢复框架,配有医疗图像分割的边界-软件软件-标签 2507.02399v1
  • 1348 07-03 Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement Hinterer Übergang Modellierung für unüberwachte Diffusions-basierte Sprachverbesserung 未经监督的基于传播的语音增强的外向过渡建模 2507.02391v1
  • 1349 07-03 MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning MInCo: Informationskonflikte im distraktierten visuellen Modell-basierten Verstärkungslernen abmildern MINCO: 减缓视觉模型化强化学习中的信息冲突 2504.04164v3
  • 1350 07-03 Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited Sparse Gaussian Processes: Strukturierte Annäherungen und Power-EP Revisited 粗浅的高斯进程:结构相近和电源-EP 重新审视 2507.02377v1
  • 1351 07-03 Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations Gateformer: Multivariate Zeitreihen durch Temporale und Variate-Wise-Achtung mit gezahnten Darstellungen prognostizieren Gatefex:通过时空和蒸汽预报推进多变时间序列 2505.00307v3
  • 1352 07-03 Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning Gute Darstellung, bessere Erklärung: Rolle konvolutionärer neuraler Netzwerke bei transformerbasierter Fernerkundung Bildunterschriften 良好的表述,更好的解释:进化神经网络在以变换器为基础的遥感图像描述中的作用 2502.16095v2
  • 1353 07-03 Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes Erforschung der Integration von großen Sprachmodellen in industrielle Testwartungsprozesse 探索将大语言模型纳入工业试验维护工艺 2409.06416v2
  • 1354 07-03 Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations Tiefenverstärkte DRAM-Equalizer-Parameter-Optimierung mit latenten Darstellungen 利用延迟代表方式优化深度强化学习-基于深强化的 DRAM 平准参数 2507.02365v1
  • 1355 07-03 Detection of Rail Line Track and Human Beings Near the Track to Avoid Accidents Erkennung von Schienengleisen und Menschen in der Nähe des Gleises, um Unfälle zu vermeiden 在避免事故的轨道上探测铁路线轨迹和附近人类 2507.03040v1
  • 1356 07-03 Enhancing Swarms Durability to Threats via Graph Signal Processing and GNN-based Generative Modeling Verbesserung der Swarms Durability to Threats durch Graph Signal Processing und GNN-basierte Generative Modellierung 通过图形信号处理和基于全球NNN的创制模型,提高牲畜对威胁的易感性 2507.03039v1
  • 1357 07-03 Offline Reinforcement Learning with Penalized Action Noise Injection Offline-Verstärkung Lernen mit Penalized Action Noise Injection 用惩罚性行动噪音注射加强离线强化学习 2507.02356v1
  • 1358 07-03 Text-Aware Image Restoration with Diffusion Models Text-Aware Bildrestaurierung mit Diffusionsmodellen 带有传播模型的文本软件图像恢复 2506.09993v2
  • 1359 07-03 Privacy-Preserving Operating Room Workflow Analysis using Digital Twins Datenschutz-Bewahrung von Betriebsraum-Workflow-Analysen mit digitalen Zwillingen 利用数字双对进行工作流量分析 2504.12552v2
  • 1360 07-03 UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security UniNet: Ein einheitliches Multi-Granular Traffic Modeling Framework für Netzwerksicherheit UniNet:网络安全统一多重点交通示范框架 2503.04174v2
  • 1361 07-03 DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values DeltaSHAP: Erklären von Vorhersageentwicklungen bei der Online-Patientenüberwachung mit Shapley-Werten DelsaSHAP: 解释在有阴影值的在线患者监测中的预测演变 2507.02342v1
  • 1362 07-03 Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Skywork-Reward-V2:通过人类-AI协同增强优先数据曲线 2507.01352v2
  • 1363 07-03 Cautious Next Token Prediction Vorsichtige nächste Zeichen Vorhersage 谨慎的次下 Tok 预测 2507.03038v1
  • 1364 07-03 SecAlign: Defending Against Prompt Injection with Preference Optimization SecAlign: Abwehr gegen prompte Injektion mit Preference-Optimierung Secallign: 以优先优化防范迅速注射 2410.05451v3
  • 1365 07-03 Path Planning using a One-shot-sampling Skeleton Map Pfadplanung mit einer One-Shot-Sampling-Skeleton-Karte 使用单点抽样的 Skeleton 地图进行路径规划 2507.02328v1
  • 1366 07-03 High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures Leistungsstarkes Stärkungslernen vor Ort: Simulationsparameter mit distributiven Maßnahmen optimieren 高绩效加强现场学习:以分配措施优化模拟参数 2504.17857v3
  • 1367 07-03 Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach Ausrichten von gefrorenen LLMs durch Verstärkungslernen: Ein iteratives Reweight-then-Optimize-Ansatz 通过强化学习将冻结的LLMs与 “ 强化学习:一种过渡性再加权再优化方法 “ 相匹配 2506.17828v2
  • 1368 07-03 Transformer-based EEG Decoding: A Survey Transformer-basierte EEG-Dekodierung: Eine Umfrage 以变换器为基础的电子EEG编码:调查 2507.02320v1
  • 1369 07-03 Generalization vs. Specialization under Concept Shift Verallgemeinerung vs. Spezialisierung unter Konzeptverschiebung 普遍化与概念转变下的专业化 2409.15582v2
  • 1370 07-03 Improving Constrained Generation in Language Models via Self-Distilled Twisted Sequential Monte Carlo Verbesserung der eingeschränkten Generierung in Sprachmodellen durch selbstdestillierte Twisted Sequential Monte Carlo 通过自行筛选的蒙蒙特卡洛顺序改进语言模型中受约束的一代 2507.02315v1
  • 1371 07-03 Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment Ganzheitliches kontinuierliches Lernen unter Konzept Drift mit adaptiver Memory-Neuausrichtung 适应性记忆调整概念下的全面持续学习 2507.02310v1
  • 1372 07-03 Distributional Soft Actor-Critic with Diffusion Policy Verteilungs-Soft-Actor-Kritik mit Diffusionspolitik 配发软软软动作- 带有传播政策批评器 2507.01381v2
  • 1373 07-03 Neural CRNs: A Natural Implementation of Learning in Chemical Reaction Networks Neurale CRNs: Eine natürliche Umsetzung des Lernens in chemischen Reaktionsnetzwerken CRNs:化学反应网络学习的自然应用 2409.00034v3
  • 1374 07-03 Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient Bypass Back-Propagation: Optimierungsbasiertes Structural Pruning für große Sprachmodelle über Policy Gradient Bypass 后回通信:通过 “ 政策梯度 “ 优化基于优化的结构结构,为大语言模式提供缓冲 2406.10576v3
  • 1375 07-03 Kernel Density Bayesian Inverse Reinforcement Learning Kernel Dichte Bayesian Inverse Verstärkung Lernen 内核冷密度贝耶斯反强化学习 2303.06827v4
  • 1376 07-03 DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning DoMIX: Ein effizientes Framework zur Nutzung von Domain-Wissen im Feintuning DoMIX:一个在微调中利用域知识的有效框架 2507.02302v1
  • 1377 07-03 Knowledge Graph-Based Explainable and Generalized Zero-Shot Semantic Communications Wissensgraphisch erklärbare und generalisierte zero-shot semantische Kommunikation 基于知识的图表可解释和通用的零热语义通信 2507.02291v1
  • 1378 07-03 Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain Optimale Strategien zur mehrsprachigen Analyse sozialer Inhalte für einen neuartigen Datensatz im Tourismusbereich 为旅游领域新数据集的社会内容进行多语种社会内容分析的最佳最佳战略 2311.14727v2
  • 1379 07-03 Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization Prompt Entwirren über Sprachführung und Repräsentation Ausrichtung für Domain-Verallgemeinerung 通过语言指导和代表对称调整,迅速解脱 2507.02288v1
  • 1380 07-03 Content filtering methods for music recommendation: A review Methoden zur Inhaltsfilterung für Musikempfehlungen: Eine Rezension 音乐建议的内容过滤方法:审查 2507.02282v1
  • 1381 07-03 Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments Kausales Repräsentationslernen mit generativer Künstlicher Intelligenz: Anwendung auf Texte als Behandlungen 产生人工智能的因果代表性学习:应用文字作为治疗 2410.00903v3
  • 1382 07-03 Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge bei CVPR 2025 MEIS Workshop 基准的可通用二手操纵:2025年欧洲气象和气象科学研究所讲习班上的机器人双臂双臂合作挑战 2506.23351v2
  • 1383 07-03 Adaptive Cubic Regularized Second-Order Latent Factor Analysis Model Adaptives kubisches Regularisiertes Latenten-Faktoranalysemodell zweiter Ordnung 适应性三次立方 正规化的二至二极冷态要素分析模型 2507.03036v1
  • 1384 07-03 Explainable AI for Comprehensive Risk Assessment for Financial Reports: A Lightweight Hierarchical Transformer Network Approach Erklärbare KI für umfassende Risikobewertung für Finanzberichte: Ein leichter Hierarchischer Transformer-Netzwerkansatz 用于财务报告综合风险评估的可解释的AIAI:轻量级高压变压器网络方法 2506.23767v2
  • 1385 07-03 Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning Neural Graph Matching verbessert retrieval Augmented Generation im molekularen maschinellen Lernen 在分子机学习中改进回流增强一代的神经图相匹配 2502.17874v2
  • 1386 07-03 NLP4Neuro: Sequence-to-sequence learning for neural population decoding NLP4Neuro: Sequence-to-Sequence-Lernen für neuronale Populationsdekodierung NLP4Neuro:神经人口解码的序列到序列学习 2507.02264v1
  • 1387 07-03 MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent MemAgent: Umgestalten von Langkontext-LLM mit Multi-Conv RL-basierten Speicheragenten MemerAgent: 与基于多Conv RL的内存代理重塑长文本LLM 2507.02259v1
  • 1388 07-03 Uncertainty-aware Reward Design Process Unsicheres Design von Belohnungen 不确定性-对不确定性的认知度-奖励设计流程 2507.02256v1
  • 1389 07-03 Listwise Preference Alignment Optimization for Tail Item Recommendation Listwise Preference Alignment Optimization for Tail Item Empfehlung 用于反尾项目建议的优先对齐优化列表 2507.02255v1
  • 1390 07-03 Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks Schaltungs-Tuning: Mechanistischer Ansatz zur Identifizierung von Parameter Redundanz und Feinsteuerung neuraler Netzwerke 电路调控:确定参数冗余和精微调整神经网络的机械化方法 2502.06106v2
  • 1391 07-03 Adapter-Enhanced Semantic Prompting for Continual Learning Adapter-erweiterte semantische Prompting für kontinuierliches Lernen 增强适应性强化的持续学习的语义提示 2412.11074v3
  • 1392 07-03 Rethinking Data Protection in the (Generative) Artificial Intelligence Era Datenschutz im Zeitalter der (generativen) Künstlichen Intelligenz neu denken 在人工(人工)情报时代重新思考数据保护问题 2507.03034v1
  • 1393 07-03 Transfer Learning for Matrix Completion Transfer-Lernen für Matrix-Vervollständigung 用于完成矩阵表完成的转移学习 2507.02248v1
  • 1394 07-03 EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference EigenLoRAx: Recycling-Adapter, um wesentliche Subräume für ressourceneffiziente Anpassung und Schlussfolgerung zu finden EigenLORAx:为资源效率适应和推断寻找主要子空间的再循环适应器 2502.04700v4
  • 1395 07-03 VERBA: Verbalizing Model Differences Using Large Language Models VERBA: Verbalisieren von Modellunterschieden mit großen Sprachmodellen VERBA: 使用大语言模式 2507.02241v1
  • 1396 07-03 Bi-modality medical images synthesis by a bi-directional discrete process matching method Bi-Modalität medizinische Bilder Synthese durch eine bi-direktionale diskrete Prozess-Matching-Methode 以双向离散进程匹配方法对双向单离散过程匹配方法进行双向医学图像合成 2409.03977v3
  • 1397 07-03 Fast AI Model Splitting over Edge Networks Schnelle KI-Modellaufspaltung über Edge-Netzwerke 快速 AI 模型在边缘网络上分割 2507.01041v2
  • 1398 07-03 Privacy-Preserving Quantized Federated Learning with Diverse Precision Privacy-Preserving Quantized Federated Learning with Diverse Precision 具有多种精密度的隐私保护量化联邦学习 2507.00920v2
  • 1399 07-03 PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations PhysikKorrektur: Ein trainingsfreier Ansatz für stabile neurale PDE-Simulationen 物理校正:稳定神经物质模拟的无培训方法 2507.02227v1
  • 1400 07-03 DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs DecoRTL: Ein Laufzeit-Decoding-Framework für RTL-Code-Generierung mit LLMs DecoRTL: 使用LLMs的RTL代码生成运行时间解码框架 2507.02226v1
  • 1401 07-03 Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction Metrisches Design != Metrisches Verhalten: Verbesserung der Metrischen Auswahl für die unvoreingenommene Bewertung der Dimensionalitätsreduktion 计量设计 ! = 计量行为:改进对减少多维度的无偏见评价的计量选择 2507.02225v1
  • 1402 07-03 On the Mathematical Impossibility of Safe Universal Approximators Über die mathematische Unmöglichkeit sicherer Universal-Annäherer 安全普遍接近者在数学方面的不易性 2507.03031v1
  • 1403 07-03 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Wo finden Sie Grokking in LLM Pretraining? Überwachen Sie Memorization-to-Generalization ohne Test 在 LLLM 预修课程中在哪里找到 Grokking ? 监视不试验的记忆化到普及 。 2506.21551v2
  • 1404 07-03 Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting Multivariate de Bruijn Graphen: Ein symbolisches Graphen-Framework für die Vorhersage von Zeitreihen 布鲁伊图多变量图:时间序列预测符号图框架 2505.22768v3
  • 1405 07-03 Hybrid least squares for learning functions from highly noisy data Hybride kleinste Quadrate für das Lernen von Funktionen aus hochlauten Daten 从高噪音数据中学习函数最小混合方格 2507.02215v1
  • 1406 07-03 SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers SciGA: Ein umfassender Datensatz zur Gestaltung grafischer Abstracts in wissenschaftlichen Papieren SciGA: 用于设计学术论文制图摘要的综合数据集 2507.02212v1

Article 0

Title@2025-07-10 (4): Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models

Title: Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models Auswirkungen von Pretraining Word Co-occurrence auf die kompositorische Generalisierung in multimodalen Modellen 预训练词共现对多模态模型组合泛化的影响 2507.08000v1

Authors (2): Helen Qu, Sang Michael Xie

CLIP and large multimodal models (LMMs) have better accuracy on examples involving concepts that are highly represented in the training data. However, the role of concept combinations in the training data on compositional generalization is largely unclear – for instance, how does accuracy vary when a common object appears in an uncommon pairing with another object? In this paper, we investigate how word co-occurrence statistics in the pretraining dataset (a proxy for co-occurrence of visual concepts) impact CLIP/LMM performance. To disentangle the effects of word co-occurrence frequencies from single-word frequencies, we measure co-occurrence with pointwise mutual information (PMI), which normalizes the joint probability of two words co-occurring by the probability of co-occurring independently. Using synthetically generated images with a variety of concept pairs, we show a strong correlation between PMI in the CLIP pretraining data and zero-shot accuracy in CLIP models trained on LAION-400M (r=0.97 and a 14% accuracy gap between images in the top and bottom 5% of PMI values), demonstrating that even accuracy on common concepts is affected by the combination of concepts in the image. Leveraging this finding, we reproduce this effect in natural images by editing them to contain pairs with varying PMI, resulting in a correlation of r=0.75. Finally, we demonstrate that this behavior in CLIP transfers to LMMs built on top of CLIP (r=0.70 for TextVQA, r=0.62 for VQAv2). Our findings highlight the need for algorithms and architectures that improve compositional generalization in multimodal models without scaling the training data combinatorially. Our code is available at https://github.com/helenqu/multimodal-pretraining-pmi.

CLIP 和大型多模态模型(LMM)在涉及训练数据中高频概念的样本上准确率更高。然而,训练数据中的概念组合对组合泛化的作用在很大程度上尚不清楚,例如,当一个常见物体与另一个物体以不常见的方式配对出现时,准确率会如何变化?本文研究预训练数据集中的词共现统计(作为视觉概念共现的代理)如何影响 CLIP/LMM 的性能。为了将词共现频率的影响与单词频率的影响分离开来,我们使用逐点互信息(PMI)来度量共现,它用两个词独立共现的概率对二者共现的联合概率进行归一化。利用包含多种概念对的合成图像,我们发现 CLIP 预训练数据中的 PMI 与在 LAION-400M 上训练的 CLIP 模型的零样本准确率之间存在强相关性(r=0.97,PMI 值最高与最低 5% 的图像之间准确率相差 14%),这表明即使是常见概念的准确率也会受到图像中概念组合的影响。基于这一发现,我们通过编辑自然图像使其包含不同 PMI 的概念对,在自然图像中复现了这一效应,相关系数为 r=0.75。最后,我们证明 CLIP 中的这种行为会迁移到基于 CLIP 构建的 LMM 上(TextVQA 上 r=0.70,VQAv2 上 r=0.62)。我们的结果凸显了对无需组合式扩展训练数据即可提升多模态模型组合泛化能力的算法和架构的需求。我们的代码见 https://github.com/helenqu/multimodal-pretraining-pmi。
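
As a rough illustration of the PMI measure described in the abstract, the sketch below computes PMI(a, b) = log2(p(a, b) / (p(a) p(b))) from caption-level co-occurrence counts. The function name `pmi_table`, the whitespace tokenization, the base-2 logarithm, and the toy captions are assumptions made for this example; they are not taken from the authors' code.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(captions):
    """Return PMI for every word pair that co-occurs in at least one caption.

    Probabilities are estimated per caption: p(w) is the fraction of captions
    containing w, and p(a, b) is the fraction containing both a and b.
    """
    n = len(captions)
    word_counts = Counter()
    pair_counts = Counter()
    for caption in captions:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        words = sorted(set(caption.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    pmi = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n
        p_independent = (word_counts[a] / n) * (word_counts[b] / n)
        pmi[(a, b)] = math.log2(p_joint / p_independent)
    return pmi


# Hypothetical toy corpus: "dog" is a common word, but its pairings differ.
captions = ["dog grass", "dog grass", "dog sofa", "cat sofa"]
table = pmi_table(captions)
for pair, value in sorted(table.items()):
    print(pair, round(value, 3))
```

In this toy corpus "dog" appears in three of four captions, yet its PMI with "grass" is positive and its PMI with "sofa" is negative, which is the kind of pairing-level signal that single-word frequency alone would not separate.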


Article 1

Title: Single-pass Adaptive Image Tokenization for Minimum Program Search Single-Pass Adaptive Image Tokenization für minimale Programmsuche 用于最低程序搜索的单一被动图像适配 2507.07995v1

Authors (5): Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola

According to Algorithmic Information Theory (AIT), intelligent representations compress data into the shortest possible program that can reconstruct its content, exhibiting low Kolmogorov Complexity (KC). In contrast, most visual representation learning systems use fixed-length representations for all inputs, ignoring variations in complexity or familiarity. Recent adaptive tokenization methods address this by allocating variable-length representations but typically require test-time search over multiple encodings to find the most predictive one. Inspired by Kolmogorov Complexity principles, we propose a single-pass adaptive tokenizer, KARL, which predicts the appropriate number of tokens for an image in a single forward pass, halting once its approximate KC is reached. The token count serves as a proxy for the minimum description length. KARL’s training procedure closely resembles the Upside-Down Reinforcement Learning paradigm, as it learns to conditionally predict token halting based on a desired reconstruction quality. KARL matches the performance of recent adaptive tokenizers while operating in a single pass. We present scaling laws for KARL, analyzing the role of encoder/decoder size, continuous vs. discrete tokenization and more. Additionally, we offer a conceptual study drawing an analogy between Adaptive Image Tokenization and Algorithmic Information Theory, examining the predicted image complexity (KC) across axes such as structure vs. noise and in- vs. out-of-distribution familiarity, revealing alignment with human intuition.

根据 “ AIT “ 算法信息理论(AIT) – – 智能演示将数据压缩到能够重建其内容的尽可能最短的程序中,显示低科尔莫戈洛夫复杂度(KC) 。相反,大多数视觉演示学习系统使用所有投入的固定长度表示,忽略复杂性或熟悉度的差异。最近适应性象征性方法通过分配不同长度表示方法解决这个问题,但通常需要对多个编码进行测试-时间搜索,以找到最有预测性的标本。在科尔莫戈洛夫复杂度原则的启发下,我们提议一个单一的通用适应性标本化器,该标本预测一个图像在单一的远端路口的适当数量,在接近 KC 时停止使用。符号数作为最小描述长度的代号。 KARL的培训程序与上方强化学习模式非常相似,因为它学会有条件地预测以预期的重建质量为基础的象征停止。 KARL 匹配最近适应性符号的性,同时在单一路口运行。我们为 KARL 展示了标准化法律, 分析离心机/ 图像的解析比标 。


Article 2

Title@2025-07-10 (4): Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Title: Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Überspringen Sie eine Ebene oder Schleifen Sie es? Test-Zeit Tiefe Anpassung von vorgebildeten LLMs 跳过图层或循环它? 预设 LLM 的测试时间深度适应 2507.07996v1

Authors (3): Ziyue Li, Yang Li, Tianyi Zhou

Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We found that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better and even shallower model customized for each test sample. In particular, each layer from the pretrained model can be skipped/pruned or repeated multiple times as in recurrent neural networks (RNNs), and stacked with others in arbitrary orders, yielding a chain-of-layers (CoLa) per sample. This compositional space greatly expands the scope of existing works on looped/recurrent pretrained modules, layer pruning, or early-exit networks. We develop a Monte Carlo Tree Search (MCTS) protocol to explore and identify the optimal CoLa for each sample from math and commonsense reasoning benchmarks. Compared to a static model of a fixed depth, CoLa allows shortcut paths (fast thinking), recurrence of the same layer(s) (slow thinking), and combining both, offering more flexible, dynamic architectures for different inputs. We conduct an extensive analysis of the MCTS-optimized CoLa, which leads to two key findings: (1) For >75% of samples with correct predictions by the original LLM, we can find shorter CoLa, suggesting a large space for improving inference efficiency; (2) For >60% of samples with originally incorrect predictions, we can identify CoLa achieving correct predictions, suggesting a large space for performance enhancement. Our results highlight the shortcomings of using a fixed architecture of pre-trained LLMs for inference on different samples and pave the way to unlock the generalization power of test-time depth adaptation.

一个经过预先训练的神经网络能否在不作任何微调的情况下使其结构适应不同的投入? 我们是否需要所有层次的简单任务, 并且它们是否足以完成具有挑战性的任务? 我们发现, 一个经过训练的大型语言模型(LLM) 的层层可以被作为单独的模块来操作, 以建立为每个测试样本定制的更好、 甚至更浅的模型。 特别是, 一个经过训练的模型的每层层可以作为经常性神经网络( RNN) 跳过/ 调整或重复多次, 并与其他层一起任意地堆叠叠, 产生一个纯层的缺陷( Cola) 。 这个构成空间空间空间可以大大扩大环绕/ 经常性的模块、 层的运行或早期退出网络。 我们开发了一个蒙特卡洛树搜索(MCTS) 协议, 来探索和确定每个样本来自数学和普通推理推理推理基准的最佳 CoLA。 比较一个固定深度的固定模型, Cola 能够找到更短的路径(快速思考) 、 同一层(slow MC) 的复现, 和结合两者, 提供更灵活、 动态的模型改进不同的空间结构 来改进不同的输入。
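
The chain-of-layers idea from the abstract (skip, repeat, or reorder pretrained layers per input) can be sketched with toy "layers" that are plain functions. Everything here is an assumption for illustration: the names `run_chain` and `shortest_correct_chain`, the arithmetic stand-in layers, and above all the exhaustive enumeration, which replaces the paper's MCTS protocol with a brute-force search that is only feasible at toy scale.

```python
from itertools import product


def run_chain(layers, chain, x):
    """Apply layers in the order given by `chain` (a sequence of layer indices).

    Omitting an index skips that layer; repeating an index loops it.
    """
    for index in chain:
        x = layers[index](x)
    return x


def shortest_correct_chain(layers, x, target, max_len):
    """Brute-force stand-in for the search step (NOT the paper's MCTS).

    Enumerates chains by increasing length and returns the first one whose
    output equals `target`, or None if no chain up to `max_len` matches.
    """
    for length in range(max_len + 1):
        for chain in product(range(len(layers)), repeat=length):
            if run_chain(layers, chain, x) == target:
                return list(chain)
    return None


# Hypothetical three-layer "model" acting on a number instead of hidden states.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1]

full_output = run_chain(layers, [0, 1, 2], 5)     # fixed-depth forward pass
looped_output = run_chain(layers, [0, 1, 1, 2], 5)  # layer 1 applied twice
shorter = shortest_correct_chain(layers, 5, full_output, max_len=3)
print(full_output, looped_output, shorter)
```

Here the full three-layer pass and a reordered two-layer chain produce the same output, a toy analogue of the abstract's first finding that a shorter chain can often preserve a correct prediction.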


Article 3

Title@2025-07-10 (4): EXPO: Stable Reinforcement Learning with Expressive Policies

Title: EXPO: Stable Reinforcement Learning with Expressive Policies EXPO: Stabiles Stärkungslernen mit ausdrucksstarker Politik 出口促进: 采用表达式政策进行稳定的加强学习 2507.07986v1

Authors (4): Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn

We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization. Unlike simpler Gaussian policies commonly used in online RL, expressive policies like diffusion and flow-matching policies are parameterized by a long denoising chain, which hinders stable gradient propagation from actions to policy parameters when optimizing against some value function. Our key insight is that we can address stable value maximization by avoiding direct optimization over value with the expressive policy and instead constructing an on-the-fly RL policy to maximize Q-value. We propose Expressive Policy Optimization (EXPO), a sample-efficient online RL algorithm that utilizes an on-the-fly policy to maximize value with two parameterized policies: a larger expressive base policy trained with a stable imitation learning objective and a lightweight Gaussian edit policy that edits the actions sampled from the base policy toward a higher value distribution. The on-the-fly policy optimizes the actions from the base policy with the learned edit policy and chooses the value-maximizing action from the base and edited actions for both sampling and temporal-difference (TD) backup. Our approach yields up to 2-3x improvement in sample efficiency on average over prior methods both in the setting of fine-tuning a pretrained policy given offline data and in leveraging offline data to train online.

我们研究在给定离线数据集的情况下,利用在线强化学习(RL)训练和微调表达性策略的问题。用在线 RL 训练表达性策略类带来了稳定地最大化价值这一独特挑战。与在线 RL 中常用的较简单的高斯策略不同,扩散策略和流匹配策略等表达性策略由一条很长的去噪链参数化,在针对某个价值函数进行优化时,这会妨碍梯度从动作稳定地传播到策略参数。我们的核心见解是:可以避免用表达性策略直接对价值进行优化,转而构造一个即时(on-the-fly)RL 策略来最大化 Q 值,从而实现稳定的价值最大化。我们提出表达性策略优化(EXPO),这是一种样本高效的在线 RL 算法,它利用即时策略并借助两个参数化策略来最大化价值:一个以稳定的模仿学习目标训练的较大的表达性基础策略,以及一个轻量级的高斯编辑策略,后者将从基础策略采样的动作朝更高价值的分布方向进行编辑。即时策略用学到的编辑策略优化基础策略的动作,并在基础动作和编辑后的动作中选择价值最大的动作,用于采样和时序差分(TD)备份。无论是在给定离线数据微调预训练策略的设定下,还是在利用离线数据进行在线训练的设定下,我们的方法的样本效率平均比已有方法提高至多 2-3 倍。
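
The action-selection step described in the abstract (sample from the base policy, edit each sample, keep the candidate with the highest Q-value) might look roughly like the sketch below. This is a minimal sketch under stated assumptions, not the authors' implementation: actions are 1-D floats, and `on_the_fly_action`, the quadratic `q_value`, the clipped `edit_policy`, the fixed list of base samples, and `num_samples` are all hypothetical stand-ins for learned components.

```python
def on_the_fly_action(state, base_policy, edit_policy, q_value, num_samples=3):
    """Pick the highest-Q action among base samples and their edited versions."""
    candidates = []
    for _ in range(num_samples):
        base_action = base_policy(state)
        # The edit policy proposes a small adjustment to the base action.
        edited_action = base_action + edit_policy(state, base_action)
        candidates.extend([base_action, edited_action])
    return max(candidates, key=lambda action: q_value(state, action))


def q_value(state, action):
    """Toy critic (assumption): value peaks at action = 1.0."""
    return -(action - 1.0) ** 2


def edit_policy(state, action):
    """Toy edit (assumption): a bounded nudge toward the critic's peak."""
    return max(-0.3, min(0.3, 0.5 * (1.0 - action)))


# Deterministic stand-in for sampling from an expressive base policy.
base_samples = iter([-0.5, 0.2, 2.0])


def base_policy(state):
    return next(base_samples)


chosen = on_the_fly_action(None, base_policy, edit_policy, q_value)
print(chosen)
```

Because the selection is a plain argmax over candidates, no gradient has to flow through the base policy's sampling process, which is the stability argument the abstract makes for avoiding direct value optimization of the expressive policy.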


Article 4

Title@2025-07-10 (4): KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

Title: KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors KinDEL: DNA-kodierter Bibliotheks-Datensatz für Kinase-Inhibitoren KinDEL: Kinas Inhibbitor 的DNA编码图书馆数据集 2410.08938v2

Authors (21): Benson Chen, Tomasz Danel, Gabriel H. S. Dreiman, Patrick J. McEnaney, Nikhil Jain, Kirill Novikov, Spurti Umesh Akki, Joshua L. Turnbull, Virja Atul Pandya, Boris P. Belotserkovskii, Jared Bryce Weaver, Ankita Biswas, Dat Nguyen, Kent Gorday, Mohammad Sultan, Nathaniel Stanley, Daniel M Whalen, Divya Kanichar, Christoph Klein, Emily Fox, R. Edward Watts

DNA-Encoded Libraries (DELs) represent a transformative technology in drug discovery, facilitating the high-throughput exploration of vast chemical spaces. Despite their potential, the scarcity of publicly available DEL datasets presents a bottleneck for the advancement of machine learning methodologies in this domain. To address this gap, we introduce KinDEL, one of the largest publicly accessible DEL datasets and the first one that includes binding poses from molecular docking experiments. Focused on two kinases, Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1), KinDEL includes 81 million compounds, offering a rich resource for computational exploration. Additionally, we provide comprehensive biophysical assay validation data, encompassing both on-DNA and off-DNA measurements, which we use to evaluate a suite of machine learning techniques, including novel structure-based probabilistic models. We hope that our benchmark, encompassing both 2D and 3D structures, will help advance the development of machine learning models for data-driven hit identification using DELs.

DNA-编码图书馆(DEL)是药物发现方面的变革技术,有助于对广大化学空间进行高通量勘探。尽管它具有潜力,但公开提供的DEL数据集的稀缺为推进该领域的机器学习方法提供了瓶颈。为了解决这一差距,我们引入了KinDEL,这是公众可获取的最大DEL数据集之一,第一个包括分子对接实验的捆绑成份。侧重于两个骨干,即Micogen-actived Protein Kinas 14 (MAPK14) 和 Discoidin Domeyopor Tyrosine Kinase 1 (DR1) , KinDEL 包括8 100万化合物,为计算探索提供了丰富的资源。此外,我们提供了全面的生物物理鉴定鉴定数据,包括DNA和DNA以外的测量,我们用这些数据来评估一套机器学习技术,包括基于新结构的概率模型。我们希望我们的基准,包括2D和3D结构,将有助于推动开发数据驱动撞击识别的机器学习模型。


Article 5

Title@2025-07-10 (4): Why is Your Language Model a Poor Implicit Reward Model?

Title: Why is Your Language Model a Poor Implicit Reward Model? Warum ist Ihr Sprachmodell ein schlechtes Implizit-Reward-Modell? 为什么您的语言模式 是一个贫穷的隐含奖赏模式? 2507.07981v1

Authors (4): Noam Razin, Yong Lin, Jiarui Yao, Sanjeev Arora

Reward models are key to language model post-training and inference pipelines. Conveniently, recent work showed that every language model defines an implicit reward model (IM-RM), without requiring any architectural changes. However, such IM-RMs tend to generalize worse, especially out-of-distribution, compared to explicit reward models (EX-RMs) that apply a dedicated linear head over the hidden representations of a language model. The existence of a generalization gap is puzzling, as EX-RMs and IM-RMs are nearly identical. They can be trained using the same data, loss function, and language model, and differ only in how the reward is computed. Towards a fundamental understanding of the implicit biases underlying different reward model types, we investigate the root cause of this gap. Our main finding, backed by theory and experiments, is that IM-RMs rely more heavily on superficial token-level cues. Consequently, they often generalize worse than EX-RMs under token-level distribution shifts, as well as in-distribution. Furthermore, we provide evidence against alternative hypotheses for the generalization gap. Most notably, we challenge the intuitive claim that IM-RMs struggle in tasks where generation is harder than verification because they can operate both as a verifier and a generator. Taken together, our results highlight that seemingly minor design choices can substantially impact the generalization behavior of reward models.

语言模式培训后和发酵管道模式的关键在于奖赏模式。 简洁地说,最近的工作表明,每个语言模式都定义了隐含奖赏模式(IM-RM),而不需要任何建筑变革。然而,与明确奖赏模式(EX-RMs)相比,这种模式往往更加普遍,特别是分配外的。对语言模式的隐含偏见,对隐含的线性表现模式(EX-RMs)应用了专门的线性头来取代隐含的语言模式(EX-RMs),普遍化差距的存在令人费解,因为EX-RMs和IM-RMs几乎是相同的。它们可以使用相同的数据、损失功能和语言模式来进行培训,而且只能在奖赏的计算方式上有所不同。为了从根本上了解不同奖赏模式类型背后的隐含的偏见,我们调查这一差距的根源。我们的主要发现,在理论和实验的支持下,IM-RMs更多地依赖表面的提示。因此,它们通常比象征性分配选择下的EX-RMs(Ex-RMs)更差,以及分配。 此外,我们提供证据证据来反对普遍化差距的替代的替代的假设,因为常规-RMDI-I-I会要求是更强烈的产生更难的行为。
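
To make the "differ only in how the reward is computed" point concrete, the two reward computations can be sketched side by side. This assumes the standard DPO-style implicit reward, beta times the log-probability ratio between the model and a reference model; the function names, the plain-list inputs, and the default `beta` are illustrative assumptions rather than the paper's code.

```python
def explicit_reward(hidden, head_weights):
    """EX-RM: a dedicated linear head applied to a hidden representation."""
    return sum(h * w for h, w in zip(hidden, head_weights))

def implicit_reward(logprobs_policy, logprobs_ref, beta=1.0):
    """IM-RM (assumed DPO-style): beta * (log pi(y|x) - log pi_ref(y|x)).

    Each argument holds the per-token log-probabilities of the same response
    under the trained model and under the reference model.
    """
    return beta * (sum(logprobs_policy) - sum(logprobs_ref))

# Illustrative numbers only.
r_ex = explicit_reward([1.0, 2.0], [0.5, -1.0])
r_im = implicit_reward([-1.0, -2.0], [-1.5, -2.5], beta=2.0)
```

Because the implicit reward is a sum over individual token log-probabilities, it is at least plausible that it reacts directly to token-level changes, which is the kind of sensitivity the abstract attributes to IM-RMs.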


Article 6

Title@2025-07-10 (4): Watermarking Degrades Alignment in Language Models: Analysis and Mitigation

Title: Watermarking Degrades Alignment in Language Models: Analysis and Mitigation Wasserzeichen degradiert Ausrichtung in Sprachmodellen: Analyse und Milderung 语言模型的分级调整:分析和减轻影响 2506.04462v2

Authors (3): Apurv Verma, NhatHai Phan, Shubhendu Trivedi

Watermarking techniques for large language models (LLMs) can significantly impact output quality, yet their effects on truthfulness, safety, and helpfulness remain critically underexamined. This paper presents a systematic analysis of how two popular watermarking approaches, Gumbel and KGW, affect these core alignment properties across four aligned LLMs. Our experiments reveal two distinct degradation patterns: guard attenuation, where enhanced helpfulness undermines model safety, and guard amplification, where excessive caution reduces model helpfulness. These patterns emerge from watermark-induced shifts in token distribution, surfacing the fundamental tension that exists between alignment objectives. To mitigate these degradations, we propose Alignment Resampling (AR), an inference-time sampling method that uses an external reward model to restore alignment. We establish a theoretical lower bound on the improvement in expected reward score as the sample size is increased and empirically demonstrate that sampling just 2-4 watermarked generations effectively recovers or surpasses baseline (unwatermarked) alignment scores. To overcome the limited response diversity of standard Gumbel watermarking, our modified implementation sacrifices strict distortion-freeness while maintaining robust detectability, ensuring compatibility with AR. Experimental results confirm that AR successfully recovers baseline alignment in both watermarking approaches, while maintaining strong watermark detectability. This work reveals the critical balance between watermark strength and model alignment, providing a simple inference-time solution to responsibly deploy watermarked LLMs in practice.

大型语言模型(LLMS)的水标记技术可以对产出质量产生重大影响,但是它们对真实性、安全性和帮助性的影响仍然严重不足。本文件对两种流行的水标记方法-Gumberl和KGW如何影响四个对齐的LLMS的这些核心校准特性进行了系统分析。我们的实验揭示了两种截然不同的退化模式:保护减少,因为加强帮助性会破坏模型安全;保护放大,因为过度谨慎会减少模型的帮助性。这些模式产生于标志性分配的水标记导致的改变,揭示了调整目标之间存在的根本紧张关系。为了减轻这些退化,我们建议调整抽取(AR),即一种推断性时间取样方法,使用外部奖励模式来恢复校准。我们从理论上对预期的奖励得分的改进进行较低约束,因为抽样规模的增加,而且经验表明,仅仅2-4个有标志的代人有效地恢复或超过基线(无水标记的)校准。为了克服标准Gumber水标记的响应多样性,我们修改后的执行在保持可靠的可探测性、保证可靠基线和可靠地保持基准之间的一致性,同时确认可靠地测量和可靠地测量工作的结果。
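
Read as a best-of-n procedure, Alignment Resampling can be sketched in a few lines: draw several watermarked generations and keep the one the external reward model scores highest. The callables `generate_watermarked` and `reward_model` are hypothetical placeholders for whatever watermarked sampler and reward model are in use; this illustrates the idea under those assumptions and is not the authors' code.

```python
def alignment_resample(prompt, generate_watermarked, reward_model, n=4):
    """Sample n watermarked generations and return the highest-reward one."""
    samples = [generate_watermarked(prompt) for _ in range(n)]
    # Every candidate is watermarked, so selecting among them keeps the watermark.
    return max(samples, key=lambda text: reward_model(prompt, text))
```

The abstract reports that n in the range 2-4 is already enough to recover or surpass unwatermarked alignment scores. The procedure needs distinct candidates to choose from, which is presumably why the limited response diversity of standard Gumbel watermarking had to be addressed.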


Article 7

Title@2025-07-10 (4): Reinforcement Learning with Action Chunking

Title: Reinforcement Learning with Action Chunking Verstärktes Lernen mit Action Chunking 强化学习与行动决赛 2507.07969v1

Authors (3): Qiyang Li, Zhiyuan Zhou, Sergey Levine

We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a ‘chunked’ action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.

我们展示了Q入网式的简单而有效的方法,即改进长方位、低回报任务的强化学习(RL)算法。我们的配方是为脱线至在线的RL设置而设计的,目的是利用离线前数据集最大限度地提高在线学习的样本效率。有效的探索和样本效率学习仍然是这一背景下的核心挑战,因为目前还不清楚如何利用离线数据获取良好的探索政策。我们的关键见解是,在模拟学习中推广的行动组合式技术,即预测未来行动的序列而不是每个时间步骤的单一行动,可以适用于基于时间差异(TD)的RL方法,以减轻探索挑战。Q入网式采用直接在“下层”行动空间运行RL的组合动作,使代理商能够(1)利用离线数据的时间一致行为来更有效地在线探索,(2)利用公允的美元级备份来更稳定和高效的TD学习。我们的实验结果显示,Q入网式快速的离线式远程操作方法在离线上展示了最强的远的在线运行率。
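
The n-step backup over an action chunk can be written out explicitly. In this sketch, `rewards` holds the rewards collected while one chunk of h actions was executed, and `bootstrap_q` is the critic's value for the state reached afterwards paired with the next chunk; both names are assumptions made for illustration.

```python
def chunked_td_target(rewards, gamma, bootstrap_q):
    """n-step TD target for one executed action chunk.

    target = sum_{i<h} gamma^i * r_{t+i} + gamma^h * Q(s_{t+h}, next_chunk)
    """
    h = len(rewards)
    discounted = sum((gamma ** i) * r for i, r in enumerate(rewards))
    return discounted + (gamma ** h) * bootstrap_q

# Chunk of length 2: 1 + 0.5 * 1 + 0.25 * 4
target = chunked_td_target([1.0, 1.0], gamma=0.5, bootstrap_q=4.0)
```

Since the critic is conditioned on the whole chunk, the h rewards are exactly the ones produced by the action it evaluates. One reading of the abstract's "unbiased" claim is that this removes the off-policy mismatch that ordinary n-step returns over single actions can introduce.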


Article 8

Title@2025-07-10 (4): Prospective Learning in Retrospect

Title: Prospective Learning in Retrospect Zukunftsorientiertes Lernen im Nachhinein 回溯中的未来学习 2507.07965v1

Authors (6): Yuxin Bai, Cecelia Shuai, Ashwin De Silva, Siyu Yu, Pratik Chaudhari, Joshua T. Vogelstein

In most real-world applications of artificial intelligence, the distributions of the data and the goals of the learners tend to change over time. The Probably Approximately Correct (PAC) learning framework, which underpins most machine learning algorithms, fails to account for dynamic data distributions and evolving objectives, often resulting in suboptimal performance. Prospective learning is a recently introduced mathematical framework that overcomes some of these limitations. We build on this framework to present preliminary results that improve the algorithm and numerical results, and extend prospective learning to sequential decision-making scenarios, specifically foraging. Code is available at: https://github.com/neurodata/prolearn2.

在人工智能的大多数现实应用中,数据的分布和学习者的目标往往随时间而变化。作为大多数机器学习算法基础的概率近似正确(PAC)学习框架没有考虑动态的数据分布和不断变化的目标,往往导致性能欠佳。前瞻性学习(prospective learning)是最近引入的一个数学框架,克服了其中一些局限。我们在这一框架的基础上给出初步结果,改进了算法和数值结果,并将前瞻性学习扩展到序贯决策场景,特别是觅食(foraging)。代码见:https://github.com/neurodata/prolearn2


Article 9

Title@2025-07-10 (4): Dynamic Chunking for End-to-End Hierarchical Sequence Modeling

Title: Dynamic Chunking for End-to-End Hierarchical Sequence Modeling Dynamisches Chunking für die end-to-end-Hierarchische Sequenzmodellierung 端端到末端等级序列建模动态震动 2507.07955v1

Authors (3): Sukjun Hwang, Brandon Wang, Albert Gu

Despite incredible progress in language models (LMs) in recent years, largely resulting from moving away from specialized models designed for specific tasks to general models based on powerful architectures (e.g. the Transformer) that learn everything from raw data, pre-processing steps such as tokenization remain a barrier to true end-to-end foundation models. We introduce a collection of new techniques that enable a dynamic chunking mechanism which automatically learns content- and context-dependent segmentation strategies jointly with the rest of the model. Incorporating this into an explicit hierarchical network (H-Net) allows replacing the (implicitly hierarchical) tokenization-LM-detokenization pipeline with a single model learned fully end-to-end. When compute- and data-matched, an H-Net with one stage of hierarchy operating at the byte level outperforms a strong Transformer language model operating over BPE tokens. Iterating the hierarchy to multiple stages further increases its performance by modeling multiple levels of abstraction, demonstrating significantly better scaling with data and matching a token-based Transformer of twice its size. H-Nets pretrained on English show significantly increased character-level robustness, and qualitatively learn meaningful data-dependent chunking strategies without any heuristics or explicit supervision. Finally, the H-Net’s improvement over tokenized pipelines is further increased in languages and modalities with weaker tokenization heuristics, such as Chinese and code, or DNA sequences (nearly 4x improvement in data efficiency over baselines), showing the potential of true end-to-end models that learn and scale better from unprocessed data.

尽管近年来语言模型(LM)取得了令人瞩目的进展,这主要得益于从为特定任务设计的专用模型转向基于强大架构(例如Transformer)、从原始数据中学习一切的通用模型,但分词(tokenization)等预处理步骤仍然是实现真正端到端基础模型的障碍。我们引入了一系列新技术,实现了一种动态分块机制,它能与模型的其余部分联合学习依赖于内容和上下文的切分策略。将其纳入显式的层次网络(H-Net)后,可以用一个完全端到端学习的单一模型取代(隐式分层的)分词-LM-去分词流水线。在计算量和数据量相匹配的情况下,在字节级运行的单级层次H-Net优于在BPE词元上运行的强Transformer语言模型。将层次迭代到多个阶段可以对多个抽象层次建模,从而进一步提升性能,表现出明显更好的数据扩展性,并可与两倍于其规模的基于词元的Transformer相当。在英语上预训练的H-Net表现出显著增强的字符级鲁棒性,并且无需任何启发式规则或显式监督,就能定性地学到有意义的、依赖于数据的分块策略。最后,在分词启发式较弱的语言和模态中,例如中文和代码,或DNA序列(数据效率比基线提高近4倍),H-Net相对于分词流水线的改进进一步扩大,显示出真正的端到端模型从未经处理的数据中更好地学习和扩展的潜力。


Article 10

Title@2025-07-10 (4): TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices

Title: TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices TinierHAR: Auf dem Weg zu ultraleichten Deep-Learning-Modellen für effiziente menschliche Aktivitätserkennung auf Edge-Geräten TinierHAR:迈向超轻量级深度学习模型,以便在边缘设备上高效识别人类活动 2507.07949v1

Authors (5): Sizhen Bian, Mengxi Liu, Vitor Fortes Rey, Daniel Geissler, Paul Lukowicz

Human Activity Recognition (HAR) on resource-constrained wearable devices demands inference models that harmonize accuracy with computational efficiency. This paper introduces TinierHAR, an ultra-lightweight deep learning architecture that synergizes residual depthwise separable convolutions, gated recurrent units (GRUs), and temporal aggregation to achieve SOTA efficiency without compromising performance. Evaluated across 14 public HAR datasets, TinierHAR reduces parameters by 2.7x (vs. TinyHAR) and 43.3x (vs. DeepConvLSTM), and MACs by 6.4x and 58.6x, respectively, while maintaining the averaged F1-scores. Beyond quantitative gains, this work provides the first systematic ablation study dissecting the contributions of spatial-temporal components across the proposed TinierHAR, the prior SOTA TinyHAR, and the classical DeepConvLSTM, offering actionable insights for designing efficient HAR systems. Finally, we discuss the findings and suggest principled design guidelines for future efficient HAR. To catalyze edge-HAR research, we open-source all materials in this work for future benchmarking: https://github.com/zhaxidele/TinierHAR

在资源受限的可穿戴设备上进行人类活动识别(HAR),需要兼顾准确性与计算效率的推理模型。本文介绍TinierHAR,这是一种超轻量级深度学习架构,它结合了残差深度可分离卷积、门控循环单元(GRU)和时间聚合,在不牺牲性能的前提下实现SOTA效率。在14个公开HAR数据集上的评估表明,TinierHAR将参数量减少了2.7倍(相对于TinyHAR)和43.3倍(相对于DeepConvLSTM),将MAC分别减少了6.4倍和58.6倍,同时保持了平均F1分数。除了量化收益之外,这项工作还提供了首个系统的消融研究,剖析了时空组件在所提出的TinierHAR、此前的SOTA模型TinyHAR以及经典的DeepConvLSTM中的贡献,为设计高效的HAR系统提供了可操作的见解。最后,我们讨论了这些发现,并为未来的高效HAR提出了有原则的设计指南。为了推动边缘HAR研究,我们开源了这项工作中的所有材料,以供未来基准测试使用:https://github.com/zhaxidele/TinierHAR
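
A quick parameter count shows why depthwise separable convolutions help with model size. The formulas below are the standard weight counts for a 1-D convolution and for its depthwise-plus-pointwise factorization (biases ignored); the channel and kernel sizes are illustrative and are not TinierHAR's actual layer dimensions.

```python
def conv1d_params(c_in, c_out, kernel):
    """Weights in a standard 1-D convolution (bias ignored)."""
    return c_in * c_out * kernel

def depthwise_separable_params(c_in, c_out, kernel):
    """Weights in a depthwise conv (one filter per input channel) plus a 1x1 pointwise conv."""
    depthwise = c_in * kernel
    pointwise = c_in * c_out
    return depthwise + pointwise

standard = conv1d_params(64, 64, 5)                 # 64 * 64 * 5
separable = depthwise_separable_params(64, 64, 5)   # 64 * 5 + 64 * 64
reduction = standard / separable
```

For these hypothetical sizes the factorized layer needs 4,416 weights instead of 20,480. The reductions reported in the abstract come from the full architecture, not from this single effect.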


Article 11

Title@2025-07-10 (4): Low Resource Reconstruction Attacks Through Benign Prompts

Title: Low Resource Reconstruction Attacks Through Benign Prompts Niedrige Ressourcen-Wiederaufbau Angriffe durch Benign Prompts 通过慈善提示进行低资源重建袭击 2507.07947v1

Authors (2): Sol Yarkoni, Roi Livni

The recent advances in generative models such as diffusion models have raised several risks and concerns related to privacy, copyright infringements and data stewardship. To better understand and control the risks, various researchers have created techniques, experiments and attacks that reconstruct images, or part of images, from the training set. While these techniques already establish that data from the training set can be reconstructed, they often rely on high resources, access to the training set, as well as well-engineered and designed prompts. In this work, we devise a new attack that requires low resources, assumes little to no access to the actual training set, and identifies seemingly benign prompts that lead to potentially risky image reconstruction. This highlights the risk that images might even be reconstructed unintentionally by an uninformed user. For example, we identified that, with regard to one existing model, the prompt "blue Unisex T-Shirt" can generate the face of a real-life human model. Our method builds on an intuition from previous works which leverages domain knowledge and identifies a fundamental vulnerability that stems from the use of scraped data from e-commerce platforms, where templated layouts and images are tied to pattern-like prompts.

最近推广模式等基因模型的进展引起了与隐私、版权侵犯和数据管理有关的若干风险和关切。为了更好地了解和控制这些风险,各种研究人员创造了技术、实验和攻击,从培训组中重建图像或部分图像。这些技术已经确定培训组的数据可以重建,但他们往往依赖高资源,超出培训组,以及设计设计和设计得当。在这项工作中,我们设计了一种新的攻击,需要低资源,几乎无法进入实际培训组,并查明了可能导致潜在风险图像重建的看似良性提示。这凸显了图像甚至可能由不知情的用户无意地重建的风险。例如,我们发现,关于一个现有模型,即即即即“Unisex T-Shirt”的迅速“Bluue Unisex T-Shirt”能够产生真实人类模型的面貌。我们的方法基于以前作品的直觉,它利用了域知识,并查明了从电子商务平台报废数据中产生的一种基本脆弱性,即模板版式布局和图像与快速模式联系在一起。


Article 12

Title@2025-07-10 (4): BarcodeBERT: Transformers for Biodiversity Analysis

Title: BarcodeBERT: Transformers for Biodiversity Analysis BarcodeBERT: Transformer für Biodiversitätsanalyse 条码BERT:生物多样性分析变异器 2311.02401v3

Authors (12): Pablo Millan Arias, Niousha Sadjadi, Monireh Safari, ZeMing Gong, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Dirk Steinke, Lila Kari, Angel X. Chang, Scott C. Lowe, Graham W. Taylor

In the global challenge of understanding and characterizing biodiversity, short species-specific genomic sequences known as DNA barcodes play a critical role, enabling fine-grained comparisons among organisms within the same kingdom of life. Although machine learning algorithms specifically designed for the analysis of DNA barcodes are becoming more popular, most existing methodologies rely on generic supervised training algorithms. We introduce BarcodeBERT, a family of models tailored to biodiversity analysis and trained exclusively on data from a reference library of 1.5M invertebrate DNA barcodes. We compared the performance of BarcodeBERT on taxonomic identification tasks against a spectrum of machine learning approaches including supervised training of classical neural architectures and fine-tuning of general DNA foundation models. Our self-supervised pretraining strategies on domain-specific data outperform fine-tuned foundation models, especially in identification tasks involving lower taxa such as genera and species. We also compared BarcodeBERT with BLAST, one of the most widely used bioinformatics tools for sequence searching, and found that our method matched BLAST’s performance in species-level classification while being 55 times faster. Our analysis of masking and tokenization strategies also provides practical guidance for building customized DNA language models, emphasizing the importance of aligning model training strategies with dataset characteristics and domain knowledge. The code repository is available at https://github.com/bioscan-ml/BarcodeBERT.

在认识和确定生物多样性的全球挑战中,被称为DNA条形码的简短物种特有基因组序列具有关键作用,能够在同一生命王国内对生物进行细微的对比。虽然专门为分析DNA条形码而设计的机器学习算法越来越受欢迎,但大多数现有方法都依赖一般监督的培训算法。我们引入了BarcodeBERT,这是一套为生物多样性分析而专门设计的模型,专门以来自1.5M无脊椎DNA条形码参考图书馆的数据为根据。我们比较了BarcodBERT在分类识别分类任务方面的表现,与一系列机器学习方法,包括古典神经结构的监督培训以及一般DNA基础模型的微调。我们专门为分析特定领域数据而设计的预先培训战略正在日益普及,特别是在涉及基因和物种等较低分类的识别任务方面。我们还将BarcodeBERT与BLAST这一最广泛使用的生物信息搜索工具进行了比较,并发现我们的方法与BLAST在物种分类方面的模型/BARST在物种层次上的业绩相匹配,同时将55倍地强调实际的DNA定义的标准化战略。


Article 13

Title@2025-07-10 (4): Towards Continuous Home Cage Monitoring: An Evaluation of Tracking and Identification Strategies for Laboratory Mice

Title: Towards Continuous Home Cage Monitoring: An Evaluation of Tracking and Identification Strategies for Laboratory Mice Towards Continuous Home Cage Monitoring: Eine Bewertung von Tracking- und Identifikationsstrategien für Labor-Mäuse 逐步实现家用钥匙持续监测:对实验室老鼠跟踪和识别战略的评价 2507.07929v1

Authors (2): Juan Pablo Oberhauser, Daniel Grzenda

Continuous, automated monitoring of laboratory mice enables more accurate data collection and improves animal welfare through real-time insights. Researchers can achieve a more dynamic and clinically relevant characterization of disease progression and therapeutic effects by integrating behavioral and physiological monitoring in the home cage. However, providing individual mouse metrics is difficult because of their housing density, similar appearances, high mobility, and frequent interactions. To address these challenges, we develop a real-time identification (ID) algorithm that accurately assigns ID predictions to mice wearing custom ear tags in digital home cages monitored by cameras. Our pipeline consists of three parts: (1) a custom multiple object tracker (MouseTracks) that combines appearance and motion cues from mice; (2) a transformer-based ID classifier (Mouseformer); and (3) a tracklet associator linear program to assign final ID predictions to tracklets (MouseMap). Our models assign an animal ID based on custom ear tags at 30 frames per second with 24/7 cage coverage. We show that our custom tracking and ID pipeline improves tracking efficiency and lowers ID switches across mouse strains and various environmental factors compared to current mouse tracking methods.

对实验室小鼠进行持续自动监测,有助于通过实时洞察,更准确地收集数据,改善动物福利;研究人员可以通过将行为和生理监测纳入家庭笼子,对疾病进展和治疗效果进行更动态和临床相关的定性;然而,由于鼠鼠的居住密度、相似外观、高度流动性和频繁互动,很难提供单项鼠量;为应对这些挑战,我们开发了实时识别(ID)算法,精确地为在由照相机监测的数码家庭笼子里戴定制耳标签的老鼠分配识别预测;我们的管道由三部分组成:(1) 将老鼠的外观和运动提示结合起来的定制多物体跟踪器(MouseTracks);(2) 以变压器为基础的ID分类器(Mouseforth);(3) 将最终身份预测分配给跟踪器(MouseMap)的跟踪器线性连接器程序;我们的模型为每秒30个定制耳标记和24/7个笼子覆盖线,指定一个动物识别码;我们显示,我们的定制跟踪和识别管道改进了跟踪效率和降低鼠标菌株和各种环境因素的身份证开关。
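
The final step of the pipeline, assigning one animal ID to each tracklet, can be illustrated as a small assignment problem. The abstract does not give the linear program itself, so this sketch swaps in a brute-force search over permutations that finds the same kind of optimum for a handful of animals. The `scores` matrix (classifier score that tracklet t is animal k) and the one-ID-per-tracklet constraint are assumptions made for illustration.

```python
from itertools import permutations

def assign_ids(scores):
    """Assign a distinct animal ID to each tracklet, maximizing the total score.

    scores[t][k] is the score that tracklet t belongs to animal k
    (square matrix: as many tracklets as animals).
    """
    n = len(scores)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(scores[t][perm[t]] for t in range(n)),
    )
    return list(best)

# Tracklet 0 slightly prefers ID 0, but tracklet 1 strongly prefers ID 0,
# so the joint optimum gives ID 1 to tracklet 0 and ID 0 to tracklet 1.
ids = assign_ids([[0.6, 0.4], [0.9, 0.1]])
```

Solving the assignment jointly rather than picking the best ID for each tracklet independently prevents two simultaneous tracklets from receiving the same ID. A linear program or the Hungarian algorithm would replace the permutation search at realistic scale.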


Article 14

Title: A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search Eine Theorie der Schlussfolgerung Berechnung Scaling: Vernunft durch gerichtete stochastische Fähigkeiten Suche 推断计算尺度理论论:通过定向斯托卡技能搜索推理 2507.00004v2

Authors (3): Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney

Large language models (LLMs) demand considerable computational, energy, and financial resources during both training and deployment. While scaling laws for training have guided much of the field’s recent progress, inference costs now represent a significant and growing component of the overall resource burden, particularly for reasoning-focused models. Existing characterizations of compute-optimality that consider model size, dataset size, and inference tokens in isolation or in fixed combinations risk overlooking more efficient operating points. We introduce directed stochastic skill search (DS3), a general framework that represents inference as stochastic traversal over a learned skill graph. From a simplified yet expressive instantiation, we derive closed-form expressions for task success and compute cost across a wide range of inference strategies – including chain-of-thought (CoT) and tree-of-thought (ToT) – enabling comparative analysis as a function of task difficulty and model capability. To that end, we extend a prior first-principles tripartite graph framework of LLM training to incorporate inference, and separately bridge DS3 with empirical methods that characterize LLM scaling behavior. We theoretically recover empirically observed patterns, including: linear accuracy scaling with logarithmic compute; variation in preferred inference strategies as a function of task difficulty and model capability; emergent behavior elicited by reasoning even when performance plateaus under parameter scaling; and both best-of-N (BoN) and majority voting behavior captured within a unified analytical framework. By explicitly characterizing training-inference interdependencies, our framework deepens theoretical understanding and supports principled algorithmic design and resource allocation.

大型语言模型(LLMS)在培训和部署期间都需要大量的计算、精力和财政资源。虽然扩大培训法的规模在培训方面指导了外地最近取得的许多进展,但推断成本现在代表了总体资源负担中一个重要且日益增长的组成部分,对于注重推理的模型来说尤其如此。现有的计算优化特征,考虑到模型大小、数据集大小,以及独立或固定组合中的推断符号,有可能忽略效率更高的操作点。为此,我们引入了定向分析技能搜索(DS3),这是一个总框架,代表了对学习技能图进行精细的考察。从简化但明显的即时化中,我们为任务成功而设计封闭的表达形式,并计算了范围广泛的推论战略的成本 – – 包括思维链(CoT)和思绪符号(TT) – 使得比较分析成为任务困难和模型能力的一个函数。为此,我们扩展了LMM培训的首次原则,纳入了推论,并单独将DS3与实验性S3的连接起来,而实验性标准分配方式则在成本分析模型分析中进行缩缩缩缩缩缩缩缩;我们通过实验性分析的逻辑定义的排序,在逻辑上,在成本分析中,在成本分析中,在成本分析中,在成本分析中进行中,在成本分析中,在成本分析中,在成本分析中,在成本分析中进行。
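
For intuition about the best-of-N and majority-voting behavior mentioned in the abstract, the textbook success probabilities under strong simplifying assumptions are easy to compute. The assumptions here are independent samples with a fixed per-sample success probability `p`, a perfect verifier for best-of-N, and a binary answer space with odd `n` for majority voting. These are not the closed-form expressions derived in the paper.

```python
from math import comb

def best_of_n_success(p, n):
    """P(at least one of n independent samples is correct), assuming a perfect verifier."""
    return 1.0 - (1.0 - p) ** n

def majority_vote_success(p, n):
    """P(more than half of n independent binary answers are correct), for odd n."""
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

bon = best_of_n_success(0.5, 2)       # 1 - 0.25
maj = majority_vote_success(0.5, 3)   # 0.375 + 0.125
```

Under these assumptions, best-of-N improves with n for any p > 0, while majority voting only helps when p exceeds 0.5. This is a simple case of the preferred inference strategy depending on task difficulty and model capability.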


Article 15

Title@2025-07-10 (4): No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning

Title: No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning Keine $D_{\text{train}}$: Modell-agnostische Gegenfaktische Erklärungen mit Verstärkungslernen 无 $D_{\text{train}}$:利用强化学习的模型无关反事实解释 2405.18563v2

Authors (3): Xiangyu Sun, Raquel Aoki, Kevin H. Wilson

Machine learning (ML) methods have experienced significant growth in the past decade, yet their practical application in high-impact real-world domains has been hindered by their opacity. When ML methods are responsible for making critical decisions, stakeholders often require insights into how to alter these decisions. Counterfactual explanations (CFEs) have emerged as a solution, offering interpretations of opaque ML models and providing a pathway to transition from one decision to another. However, most existing CFE methods require access to the model’s training dataset, few methods can handle multivariate time-series, and no model-agnostic CFE method can handle multivariate time-series without training datasets. These limitations can be formidable in many scenarios. In this paper, we present NTD-CFE, a novel model-agnostic CFE method based on reinforcement learning (RL) that generates CFEs when training datasets are unavailable. NTD-CFE is suitable for both static and multivariate time-series datasets with continuous and discrete features. NTD-CFE reduces the CFE search space from a multivariate time-series domain to a lower dimensional space and addresses the problem using RL. Users have the flexibility to specify non-actionable, immutable, and preferred features, as well as causal constraints. We demonstrate the performance of NTD-CFE against four baselines on several datasets and find that, despite not having access to a training dataset, NTD-CFE finds CFEs that make significantly fewer and significantly smaller changes to the input time-series. These properties make CFEs more actionable, as the magnitude of change required to alter an outcome is vastly reduced. The code is available in the supplementary material.

机器学习(ML)方法在过去十年中经历了显著增长,但在影响大的现实领域的实际应用却因不透明而受阻。当ML方法负责做出关键决定时,利益攸关方往往需要深入了解如何改变这些决定。反事实解释(CFES)作为一种解决方案出现,为不透明的ML模型提供了解释,并为从一个决定向另一个决定过渡提供了途径。然而,大多数现有的CFE方法需要访问模型的培训数据集,很少有方法能够处理多变时间序列,而没有模型的CFE方法可以处理多变时间序列,而没有训练数据集。这些限制在许多情况中可能十分艰巨。我们介绍了NTD-CFE, 一种基于强化学习(RL)的新型模型-对CFE方法,在培训数据集无法使用时产生CFE。 NTD-CFE方法适合固定和多变时间序列数据集,具有连续和分立的特性。 NTD-CFEFE方法将C搜索空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间从一个极小,其基础功能可以大大降低,而软性地展示。


Article 16

Title@2025-07-10 (4): Plausible Counterfactual Explanations of Recommendations

Title: Plausible Counterfactual Explanations of Recommendations Plausible gegenfaktische Erklärungen der Empfehlungen 对建议的反事实解释 2507.07919v1

Authors (4): Jakub Černý, Jiří Němeček, Ivan Dovica, Jakub Mareček

Explanations play a variety of roles in various recommender systems, from a legally mandated afterthought, through an integral element of user experience, to a key to persuasiveness. A natural and useful form of an explanation is the Counterfactual Explanation (CE). We present a method for generating highly plausible CEs in recommender systems and evaluate it both numerically and with a user study.

解释在各种建议系统中起着各种作用,从法律授权的事后思考,通过用户经验的有机组成部分,到说服力的关键,一种自然和有用的解释形式是反事实解释,我们提出一种方法,在建议系统中产生高度可信的CE,并用数字和用户研究来加以评估。


Article 17

Title@2025-07-10 (4): A statistical physics framework for optimal learning

Title: A statistical physics framework for optimal learning Statistischer Physikrahmen für optimales Lernen 促进最佳学习的统计物理框架 2507.07907v1

Authors (2): Francesca Mignacco, Francesco Mori

Learning is a complex dynamical process shaped by a range of interconnected decisions. Careful design of hyperparameter schedules for artificial neural networks or efficient allocation of cognitive resources by biological learners can dramatically affect performance. Yet, theoretical understanding of optimal learning strategies remains sparse, especially due to the intricate interplay between evolving meta-parameters and nonlinear learning dynamics. The search for optimal protocols is further hindered by the high dimensionality of the learning space, often resulting in predominantly heuristic, difficult to interpret, and computationally demanding solutions. Here, we combine statistical physics with control theory in a unified theoretical framework to identify optimal protocols in prototypical neural network models. In the high-dimensional limit, we derive closed-form ordinary differential equations that track online stochastic gradient descent through low-dimensional order parameters. We formulate the design of learning protocols as an optimal control problem directly on the dynamics of the order parameters with the goal of minimizing the generalization error at the end of training. This framework encompasses a variety of learning scenarios, optimization constraints, and control budgets. We apply it to representative cases, including optimal curricula, adaptive dropout regularization and noise schedules in denoising autoencoders. We find nontrivial yet interpretable strategies highlighting how optimal protocols mediate crucial learning tradeoffs, such as maximizing alignment with informative input directions while minimizing noise fitting. Finally, we show how to apply our framework to real datasets. Our results establish a principled foundation for understanding and designing optimal learning protocols and suggest a path toward a theory of meta-learning grounded in statistical physics.

学习是一个复杂的动力学过程,由一系列相互关联的决策所塑造。精心设计人工神经网络的超参数调度,或由生物学习者高效分配认知资源,都可以极大地影响性能。然而,对最优学习策略的理论理解仍然匮乏,这尤其是由于不断演化的元参数与非线性学习动力学之间错综复杂的相互作用。学习空间的高维性进一步阻碍了对最优方案的搜索,往往导致解决方案以启发式为主、难以解释且计算量大。在这里,我们将统计物理与控制理论结合到一个统一的理论框架中,以确定原型神经网络模型中的最优方案。在高维极限下,我们推导出闭式常微分方程,通过低维序参量来追踪在线随机梯度下降。我们把学习方案的设计表述为直接作用于序参量动力学的最优控制问题,目标是最小化训练结束时的泛化误差。该框架涵盖了多种学习场景、优化约束和控制预算。我们将其应用于有代表性的案例,包括最优课程、自适应dropout正则化以及去噪自编码器中的噪声调度。我们发现了非平凡但可解释的策略,揭示了最优方案如何调节关键的学习权衡,例如在最大化与信息性输入方向对齐的同时最小化对噪声的拟合。最后,我们展示了如何将该框架应用于真实数据集。我们的结果为理解和设计最优学习方案奠定了有原则的基础,并为建立以统计物理为基础的元学习理论指明了一条路径。


Article 18

Title@2025-07-10 (4): Agentic Retrieval of Topics and Insights from Earnings Calls

Title: Agentic Retrieval of Topics and Insights from Earnings Calls Agentische Retrieval von Themen und Erkenntnisse aus Earnings Calls 收入呼吁的主题和透视的 Agent 检索 2507.07906v1

Authors (3): Anant Gupta, Rajarshi Bhowmik, Geoffrey Gunow

Tracking the strategic focus of companies through topics in their earnings calls is a key task in financial analysis. However, as industries evolve, traditional topic modeling techniques struggle to dynamically capture emerging topics and their relationships. In this work, we propose an LLM-agent driven approach to discover and retrieve emerging topics from quarterly earnings calls. We propose an LLM-agent to extract topics from documents, structure them into a hierarchical ontology, and establish relationships between new and existing topics through a topic ontology. We demonstrate the use of extracted topics to infer company-level insights and emerging trends over time. We evaluate our approach by measuring ontology coherence, topic evolution accuracy, and its ability to surface emerging financial trends.

金融分析的一项关键任务是,通过公司收入征召中的专题跟踪公司的战略重点,然而,随着行业的发展,传统专题模拟技术在努力动态地捕捉新兴专题及其关系。在这项工作中,我们提议由LLM代理商驱动的方法,从季度收入征召中发现和检索新兴专题。我们提议由LLM代理商从文件中提取专题,将其结构化为分级的本体学,并通过一个专题本体学在新专题和现有专题之间建立关系。我们展示了利用提取的专题推断公司一级的认识和长期的新趋势。我们通过测量本体学的一致性、专题演变准确性及其呈现新金融趋势的能力来评估我们的方法。


Article 19

Title@2025-07-10 (4): Efficient Causal Discovery for Autoregressive Time Series

Title: Efficient Causal Discovery for Autoregressive Time Series Effiziente Causal Discovery für autoregressive Zeitreihen 自动递减时间序列高效因果发现 2507.07898v1

Authors (2): Mohammad Fesanghary, Achintya Gopal

In this study, we present a novel constraint-based algorithm for causal structure learning specifically designed for nonlinear autoregressive time series. Our algorithm significantly reduces computational complexity compared to existing methods, making it more efficient and scalable to larger problems. We rigorously evaluate its performance on synthetic datasets, demonstrating that our algorithm not only outperforms current techniques, but also excels in scenarios with limited data availability. These results highlight its potential for practical applications in fields requiring efficient and accurate causal inference from nonlinear time series data.

在本研究中,我们提出一种新的基于约束的因果结构学习算法,专门针对非线性自回归时间序列而设计。与现有方法相比,我们的算法显著降低了计算复杂度,使其更高效,也更易扩展到更大规模的问题。我们在合成数据集上严格评估了它的性能,结果表明该算法不仅优于现有技术,而且在数据有限的情形下表现出色。这些结果凸显了它在需要从非线性时间序列数据中进行高效、准确因果推断的领域中的实际应用潜力。


Article 20

Title@2025-07-10 (4): Sampling Imbalanced Data with Multi-objective Bilevel Optimization

Title: Sampling Imbalanced Data with Multi-objective Bilevel Optimization Probenahme unausgewogener Daten mit multi-objektiver Bilevel-Optimierung 利用多目标双层优化对不平衡数据进行采样 2506.11315v2

Authors (3): Karen Medlin, Sven Leyffer, Krishnan Raghavan

Two-class classification problems are often characterized by an imbalance between the number of majority and minority datapoints, resulting in poor classification of the minority class in particular. Traditional approaches, such as reweighting the loss function or naive resampling, risk overfitting and subsequently fail to improve classification because they do not consider the diversity between majority and minority datasets. Such consideration is infeasible because there is no metric that can measure the impact of imbalance on the model. To obviate these challenges, we make two key contributions. First, we introduce MOODS (Multi-Objective Optimization for Data Sampling), a novel multi-objective bilevel optimization framework that guides both synthetic oversampling and majority undersampling. Second, we introduce a validation metric, the '$\epsilon/\delta$ non-overlapping diversification metric', that quantifies the goodness of a sampling method towards model performance. With this metric we experimentally demonstrate state-of-the-art performance, with improvement in diversity driving a 1-15% increase in F1 scores.

二分类问题的特点往往是多数类与少数类数据点数量之间的不平衡,这尤其会导致少数类的分类效果很差。传统方法,例如对损失函数重新加权或朴素重采样,存在过拟合风险,并且由于没有考虑多数类与少数类数据集之间的多样性,最终无法改进分类。这种考虑难以实现,因为没有能够衡量不平衡对模型影响的指标。为了避免这些挑战,我们做出了两项关键贡献。首先,我们提出 MOODS(Multi-Objective Optimization for Data Sampling,数据采样的多目标优化),这是一个新的多目标双层优化框架,用于同时指导合成过采样和多数类欠采样。其次,我们提出一个验证指标,即“$\epsilon/\delta$ 非重叠多样化指标”,用以量化采样方法对模型性能的好坏。借助这一指标,我们通过实验展示了最先进的性能:多样性的提升使 F1 分数提高 1-15%。


Article 21

Title@2025-07-10 (4): Cross-Problem Parameter Transfer in Quantum Approximate Optimization Algorithm: A Machine Learning Approach

Title: Cross-Problem Parameter Transfer in Quantum Approximate Optimization Algorithm: A Machine Learning Approach Cross-Problem-Parameter-Transfer in Quanten Ungefähre Optimierungs-Algorithmus: Ein Ansatz zum maschinellen Lernen 量子近似优化算法中的跨问题参数迁移:一种机器学习方法 2504.10733v2

Authors (3): Kien X. Nguyen, Bao Bach, Ilya Safro

Quantum Approximate Optimization Algorithm (QAOA) is one of the most promising candidates to achieve the quantum advantage in solving combinatorial optimization problems. The process of finding a good set of variational parameters in the QAOA circuit has proven to be challenging due to multiple factors, such as barren plateaus. As a result, there is growing interest in exploiting parameter transferability, where parameter sets optimized for one problem instance are transferred to another that could be more complex either to estimate the solution or to serve as a warm start for further optimization. But can we transfer parameters from one class of problems to another? Leveraging parameter sets learned from a well-studied class of problems could help navigate the less studied one, reducing optimization overhead and mitigating performance pitfalls. In this paper, we study whether pretrained QAOA parameters of MaxCut can be used as is or to warm start the Maximum Independent Set (MIS) circuits. Specifically, we design machine learning models to find good donor candidates optimized on MaxCut and apply their parameters to MIS acceptors. Our experimental results show that such parameter transfer can significantly reduce the number of optimization iterations required while achieving comparable approximation ratios.

量子近似优化算法(QAOA)是在求解组合优化问题上实现量子优势的最有希望的候选方法之一。事实证明,由于贫瘠高原等多种因素,在 QAOA 电路中找到一组好的变分参数很有挑战性。因此,人们对利用参数可迁移性越来越感兴趣:把针对一个问题实例优化得到的参数集迁移到另一个可能更复杂的实例上,用于直接估计解,或作为进一步优化的热启动。但是,我们能否把参数从一类问题迁移到另一类问题?利用从研究充分的一类问题中学到的参数集,有助于探索研究较少的那一类问题,从而减少优化开销并减轻性能陷阱。在本文中,我们研究 MaxCut 的预训练 QAOA 参数能否直接用于最大独立集(MIS)电路,或为其提供热启动。具体而言,我们设计机器学习模型,寻找在 MaxCut 上优化得到的优质供体候选,并将其参数应用于 MIS 受体。实验结果表明,这种参数迁移可以在达到相近近似比的同时,显著减少所需的优化迭代次数。


Article 22

Title@2025-07-10 (4): Masked Image Modeling: A Survey

Title: Masked Image Modeling: A Survey Maskenbildmodellierung: Eine Umfrage 蒙面图像建模:调查 2408.06687v3

Authors (5): Vlad Hondru, Florinel Alin Croitoru, Shervin Minaee, Radu Tudor Ionescu, Nicu Sebe

In this work, we survey recent studies on masked image modeling (MIM), an approach that emerged as a powerful self-supervised learning technique in computer vision. The MIM task involves masking some information, e.g. pixels, patches, or even latent representations, and training a model, usually an autoencoder, to predict the missing information by using the context available in the visible part of the input. We identify and formalize two categories of approaches to implementing MIM as a pretext task, one based on reconstruction and one based on contrastive learning. Then, we construct a taxonomy and review the most prominent papers in recent years. We complement the manually constructed taxonomy with a dendrogram obtained by applying a hierarchical clustering algorithm. We further identify relevant clusters by manually inspecting the resulting dendrogram. Our review also includes datasets that are commonly used in MIM research. We aggregate the performance results of various masked image modeling methods on the most popular datasets, to facilitate the comparison of competing methods. Finally, we identify research gaps and propose several interesting directions of future work. We supplement our survey with the following public repository containing organized references: https://github.com/vladhondru25/MIM-Survey.

在这项工作中,我们综述了近期关于掩码图像建模(MIM)的研究,这是一种在计算机视觉中兴起的强大自监督学习技术。MIM 任务包括掩盖部分信息,例如像素、图像块甚至潜在表示,并训练一个模型(通常是自编码器),利用输入可见部分中的上下文来预测缺失的信息。我们确定并形式化了将 MIM 用作前置任务的两类方法,一类基于重建,一类基于对比学习。然后,我们构建了一个分类体系,并回顾了近年来最重要的论文。我们用层次聚类算法得到的树状图来补充人工构建的分类体系,并通过人工检查该树状图进一步确定相关的簇。我们的综述还包括 MIM 研究中常用的数据集。我们汇总了各种掩码图像建模方法在最常用数据集上的性能结果,以便于比较相互竞争的方法。最后,我们指出了研究空白,并提出了若干值得关注的未来工作方向。我们还提供了一个包含整理后参考文献的公开仓库作为本综述的补充:https://github.com/vladhondru25/MIM-Survey。


Article 23

Title@2025-07-10 (4): A Bilevel Optimization Framework for Imbalanced Data Classification

Title: A Bilevel Optimization Framework for Imbalanced Data Classification Ein Bilevel-Optimierungsrahmen für die unausgewogene Datenklassifikation 不平衡数据分类的双层优化框架 2410.11171v3

Authors (3): Karen Medlin, Sven Leyffer, Krishnan Raghavan

Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new undersampling approach that: (i) avoids the pitfalls of noise and overlap caused by synthetic data and (ii) avoids the pitfall of under-fitting caused by random undersampling. Instead of undersampling majority data randomly, our method undersamples datapoints based on their ability to improve model loss. Using improved model loss as a proxy measurement for classification performance, our technique assesses a datapoint’s impact on loss and rejects those unable to improve it. In so doing, our approach rejects majority datapoints redundant to datapoints already accepted and, thereby, finds an optimal subset of majority training data for classification. The accept/reject component of our algorithm is motivated by a bilevel optimization problem uniquely formulated to identify the optimal training set we seek. Experimental results show that our proposed technique achieves F1 scores up to 10% higher than state-of-the-art methods.

数据再平衡技术,包括过度抽样和抽样不足,是解决数据不平衡挑战的共同方法。为了解决与过度抽样和抽样不足有关的未决问题,我们提议一种新的低抽样方法,即:(一) 避免合成数据造成的噪音和重叠的陷阱,(二) 避免随机抽样抽查造成的不适的陷阱。我们的方法不是随机抽查多数数据,而是根据模型损失改善能力对数据点进行下抽样。利用改进后的模型损失作为分类绩效的代用计量,我们的技术评估数据点对损失的影响,拒绝无法改进的数据点。因此,我们的方法拒绝接受多数数据点对已经接受的数据点的多余,从而找到用于分类的大多数培训数据的最佳组合。我们算法的接受/截取部分的动机是双级优化问题,为确定我们所寻求的最佳培训集。实验结果显示,我们提议的F1技术的分数高达10%,高于目前采用的方法。


Article 24

Title@2025-07-10 (4): UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs

Title: UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs UnIT: Skalierbares unstrukturiertes Inferenzzeit-Pruning für MAC-effiziente neuronale Inferenz auf MCUs UnIT:面向 MCU 上 MAC 高效神经推理的可扩展非结构化推理时剪枝 2507.07885v1

Authors (6): Ashe Neth, Sawinder Kaur, Mohammad Nur Hossain Khan, Subrata Biswas, Asif Salekin, Bashima Islam

Existing pruning methods are typically applied during training or compile time and often rely on structured sparsity. While compatible with low-power microcontrollers (MCUs), structured pruning underutilizes the opportunity for fine-grained efficiency on devices without SIMD support or parallel compute. To address these limitations, we introduce UnIT (Unstructured Inference-Time pruning), a lightweight method that dynamically identifies and skips unnecessary multiply-accumulate (MAC) operations during inference, guided by input-specific activation patterns. Unlike structured pruning, UnIT embraces irregular sparsity and does not require retraining or hardware specialization. It transforms pruning decisions into lightweight comparisons, replacing multiplications with threshold checks and approximated divisions. UnIT further optimizes compute by reusing threshold computations across multiple connections and applying layer- and group-specific pruning sensitivity. We present three fast, hardware-friendly division approximations tailored to the capabilities of common embedded platforms. Demonstrated on the MSP430 microcontroller, UnIT achieves 11.02% to 82.03% MAC reduction, 27.30% to 84.19% faster inference, and 27.33% to 84.38% lower energy consumption compared to training-time pruned models, while maintaining accuracy within 0.48-7%. Under domain shift, UnIT matches or exceeds the accuracy of retrained models while requiring significantly fewer MACs. These results establish unstructured inference-time pruning as a viable and practical solution for efficient, retraining-free deployment of deep neural networks on MCUs.

现有的剪枝方法通常在训练或编译阶段应用,并且往往依赖结构化稀疏性。结构化剪枝虽然与低功耗微控制器(MCU)兼容,但在没有 SIMD 支持或并行计算能力的设备上,未能充分利用细粒度提效的机会。为了解决这些局限,我们提出 UnIT(Unstructured Inference-Time pruning,非结构化推理时剪枝),这是一种轻量级方法,在推理过程中根据特定输入的激活模式,动态识别并跳过不必要的乘累加(MAC)运算。与结构化剪枝不同,UnIT 接受不规则的稀疏性,且不需要重新训练或专用硬件。它把剪枝决策转化为轻量级比较,用阈值检查和近似除法取代乘法。UnIT 还通过在多个连接之间复用阈值计算,并采用按层、按组设定的剪枝灵敏度,进一步优化计算。我们提出了三种快速且对硬件友好的除法近似方法,分别针对常见嵌入式平台的能力而设计。在 MSP430 微控制器上的实验表明,与训练时剪枝的模型相比,UnIT 的 MAC 运算减少 11.02% 至 82.03%,推理速度提高 27.30% 至 84.19%,能耗降低 27.33% 至 84.38%,同时精度差距保持在 0.48-7% 以内。在领域偏移下,UnIT 的精度达到或超过重新训练的模型,而所需的 MAC 运算显著更少。这些结果表明,非结构化推理时剪枝是在 MCU 上高效、免重训练地部署深度神经网络的一种可行且实用的方案。
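
The idea of replacing a multiplication with a threshold check, as described in the UnIT abstract, can be illustrated with a toy dense layer. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `pruned_dense`, the single global threshold `tau`, and the use of an exact division are all hypothetical choices made for illustration. The abstract's approximated divisions and layer- and group-specific sensitivities are not reproduced here.

```python
def pruned_dense(x, weights, tau):
    """Toy dense layer y = W x that skips products whose magnitude is below tau.

    weights[j][i] connects input i to output j (hypothetical layout).
    Returns the outputs and the number of multiply-accumulates performed.
    """
    n_out = len(weights)
    y = [0.0] * n_out
    macs = 0
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # every product with this input is zero
        # One division per input activation, reused for all outgoing connections.
        limit = tau / abs(xi)
        for j in range(n_out):
            w = weights[j][i]
            # |w| < tau / |xi| is the same condition as |w * xi| < tau,
            # but it is decided with a comparison instead of a multiplication.
            if abs(w) < limit:
                continue
            y[j] += w * xi
            macs += 1
    return y, macs


# Illustrative values only: 2 outputs, 3 inputs, so a dense layer needs 6 MACs.
x = [0.0, 2.0, 0.5]
weights = [[1.0, 0.5, 0.1],
           [-1.0, 0.01, 2.0]]
y, macs = pruned_dense(x, weights, tau=0.1)
print(y, macs)
```

In this toy setting the skipped products are small by construction, so the output changes only by the contributions below `tau`; how the threshold is chosen per layer or group is outside the scope of the sketch.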


Article 25

Title@2025-07-10 (4): SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation

Title: SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation SAMO: Ein leichtgewichtiger schärfebewusster Ansatz für die Multi-Task-Optimierung mit gemeinsamer Global-Local-Perturbation SAMO:基于全局-局部联合扰动的轻量级锐度感知多任务优化方法 2507.07883v1

Authors (3): Hao Ban, Gokul Ram Subramani, Kaiyi Ji

Multi-task learning (MTL) enables a joint model to capture commonalities across multiple tasks, reducing computation costs and improving data efficiency. However, a major challenge in MTL optimization is task conflicts, where the task gradients differ in direction or magnitude, limiting model performance compared to single-task counterparts. Sharpness-aware minimization (SAM) minimizes task loss while simultaneously reducing the sharpness of the loss landscape. Our empirical observations show that SAM effectively mitigates task conflicts in MTL. Motivated by these findings, we explore integrating SAM into MTL but face two key challenges. While both the average loss gradient and individual task gradients (referred to as global and local information, respectively) contribute to SAM, how to combine them remains unclear. Moreover, directly computing each task gradient introduces significant computational and memory overheads. To address these challenges, we propose SAMO, a lightweight Sharpness-Aware Multi-task Optimization approach that leverages a joint global-local perturbation. The local perturbations are approximated using only forward passes and are layerwise normalized to improve efficiency. Extensive experiments on a suite of multi-task benchmarks demonstrate both the effectiveness and efficiency of our method. Code is available at https://github.com/OptMN-Lab/SAMO.

多任务学习(MTL)使一个联合模型能够捕捉多个任务之间的共性,从而降低计算成本并提高数据效率。然而,MTL 优化中的一个主要挑战是任务冲突,即各任务的梯度在方向或大小上存在差异,使模型性能不及单任务模型。锐度感知最小化(SAM)在最小化任务损失的同时降低损失曲面的锐度。我们的实证观察表明,SAM 能有效缓解 MTL 中的任务冲突。受此启发,我们探索将 SAM 融入 MTL,但面临两个关键挑战。其一,平均损失梯度和各任务梯度(分别称为全局信息和局部信息)都对 SAM 有贡献,但如何将二者结合仍不清楚。其二,直接计算每个任务的梯度会带来显著的计算和内存开销。为应对这些挑战,我们提出 SAMO,一种利用全局-局部联合扰动的轻量级锐度感知多任务优化(Sharpness-Aware Multi-task Optimization)方法。局部扰动仅用前向传播来近似,并按层归一化以提高效率。在一系列多任务基准上的大量实验证明了我们方法的有效性和高效性。代码见 https://github.com/OptMN-Lab/SAMO。
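
For readers unfamiliar with the sharpness-aware minimization that SAMO builds on, the basic single-task SAM update can be sketched in a few lines. This is a minimal illustration of plain SAM only, not of SAMO's joint global-local perturbation. The names `sam_step` and `quadratic_grad`, the toy loss, and the values of `lr` and `rho` are assumptions chosen for the example.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One plain SAM update: ascend to a nearby point, then descend from w
    using the gradient evaluated at that perturbed point."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # stationary point: no direction to perturb along
    # Perturbation of length rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def quadratic_grad(w):
    """Gradient of the toy loss L(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [w[0], 10.0 * w[1]]


w = [3.0, 0.0]
w = sam_step(w, quadratic_grad)
print(w)
```

Each step costs two gradient evaluations, which is the overhead the abstract refers to when it notes that computing every task gradient directly is expensive.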


Article 26

Title@2025-07-10 (4): Can AI-predicted complexes teach machine learning to compute drug binding affinity?

Title: Can AI-predicted complexes teach machine learning to compute drug binding affinity? Können KI-vorhergesehene Komplexe maschinelles Lernen beibringen, um Arzneimittelbindungsaffinität zu berechnen? 人工智能预测综合体能教机器学习如何计算药物绑定的亲缘关系吗? 2507.07882v1

Authors (5): Wei-Tse Hsu, Savva Grevtsev, Thomas Douglas, Aniket Magarkar, Philip C. Biggin

We evaluate the feasibility of using co-folding models for synthetic data augmentation in training machine learning-based scoring functions (MLSFs) for binding affinity prediction. Our results show that performance gains depend critically on the structural quality of augmented data. In light of this, we established simple heuristics for identifying high-quality co-folding predictions without reference structures, enabling them to substitute for experimental structures in MLSF training. Our study informs future data augmentation strategies based on co-folding models.

我们评估了在训练用于结合亲和力预测的机器学习打分函数(MLSF)时,使用共折叠模型进行合成数据增强的可行性。结果表明,性能提升在很大程度上取决于增强数据的结构质量。有鉴于此,我们建立了简单的启发式规则,可以在没有参考结构的情况下识别高质量的共折叠预测,使其能够在 MLSF 训练中替代实验结构。我们的研究为今后基于共折叠模型的数据增强策略提供了参考。


Article 27

Title@2025-07-10 (4): What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

Title: What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models Was hat ein Grundlagenmodell gefunden? Mit induktivem Bias nach Weltmodellen suchen 基础模型发现了什么?利用归纳偏置探测世界模型 2507.06952v2

Authors (4): Keyon Vafa, Peter G. Chang, Ashesh Rambachan, Sendhil Mullainathan

Foundation models are premised on the idea that sequence prediction can uncover deeper domain understanding, much like how Kepler’s predictions of planetary motion later led to the discovery of Newtonian mechanics. However, evaluating whether these models truly capture deeper structure remains a challenge. We develop a technique for evaluating foundation models that examines how they adapt to synthetic datasets generated from some postulated world model. Our technique measures whether the foundation model’s inductive bias aligns with the world model, and so we refer to it as an inductive bias probe. Across multiple domains, we find that foundation models can excel at their training tasks yet fail to develop inductive biases towards the underlying world model when adapted to new tasks. We particularly find that foundation models trained on orbital trajectories consistently fail to apply Newtonian mechanics when adapted to new physics tasks. Further analysis reveals that these models behave as if they develop task-specific heuristics that fail to generalize.

基础模型的前提是,序列预测能够揭示更深层的领域理解,正如开普勒对行星运动的预测后来促成了牛顿力学的发现。然而,评估这些模型是否真正捕捉到更深层的结构仍是一项挑战。我们提出一种评估基础模型的技术,考察它们如何适应由某个假定的世界模型生成的合成数据集。该技术衡量基础模型的归纳偏置是否与世界模型一致,因此我们称之为归纳偏置探针。在多个领域中,我们发现基础模型可以在训练任务上表现出色,却在适应新任务时未能形成指向底层世界模型的归纳偏置。我们尤其发现,在轨道轨迹上训练的基础模型在适应新的物理任务时,始终无法运用牛顿力学。进一步的分析表明,这些模型的行为就好像它们形成了无法泛化的、针对特定任务的启发式规则。


Article 28

Title@2025-07-10 (4): Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models

Title: Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models Edge-ASR: Auf dem Weg zur Low-Bit Quantisierung von automatischen Spracherkennungsmodellen 边缘-ASR:实现自动语音识别模式的低比量量化 2507.07877v1

Authors (7): Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Xiaopeng Zhang

Recent advances in Automatic Speech Recognition (ASR) have demonstrated remarkable accuracy and robustness in diverse audio applications, such as live transcription and voice command processing. However, deploying these models on resource-constrained edge devices (e.g., IoT devices, wearables) still presents substantial challenges due to strict limits on memory, compute and power. Quantization, particularly Post-Training Quantization (PTQ), offers an effective way to reduce model size and inference cost without retraining. Despite its importance, the performance implications of various advanced quantization methods and bit-width configurations on ASR models remain unclear. In this work, we present a comprehensive benchmark of eight state-of-the-art (SOTA) PTQ methods applied to two leading edge-ASR model families, Whisper and Moonshine. We systematically evaluate model performances (i.e., accuracy, memory I/O and bit operations) across seven diverse datasets from the open ASR leaderboard, analyzing the impact of quantization and various configurations on both weights and activations. Built on an extension of the LLM compression toolkit, our framework integrates edge-ASR models, diverse advanced quantization algorithms, a unified calibration and evaluation data pipeline, and detailed analysis tools. Our results characterize the trade-offs between efficiency and accuracy, demonstrating that even 3-bit quantization can succeed on high-capacity models when using advanced PTQ techniques. These findings provide valuable insights for optimizing ASR models on low-power, always-on edge devices.

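自动语音识别(ASR)的最新进展在实时转录和语音指令处理等多种音频应用中展现了出色的准确性和鲁棒性。然而,由于内存、算力和功耗的严格限制,在资源受限的边缘设备(如物联网设备、可穿戴设备)上部署这些模型仍面临很大挑战。量化,尤其是训练后量化(PTQ),提供了一种无需重新训练即可减小模型规模和推理成本的有效途径。尽管量化很重要,各种先进量化方法和位宽配置对 ASR 模型性能的影响仍不清楚。在这项工作中,我们对八种最先进(SOTA)的 PTQ 方法进行了全面的基准测试,并将其应用于 Whisper 和 Moonshine 这两个领先的边缘 ASR 模型系列。我们在 open ASR leaderboard 的七个不同数据集上系统评估模型性能(即准确率、内存 I/O 和位运算),分析量化及各种配置对权重和激活的影响。我们的框架建立在 LLM 压缩工具包的扩展之上,集成了边缘 ASR 模型、多种先进量化算法、统一的校准与评估数据管线以及详细的分析工具。我们的结果刻画了效率与准确率之间的权衡,表明在使用先进 PTQ 技术时,即使是 3 位量化也能在高容量模型上取得成功。这些发现为在低功耗、始终在线的边缘设备上优化 ASR 模型提供了有价值的见解。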
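
To make the notion of low-bit post-training quantization concrete, the simplest baseline (symmetric, per-tensor, round-to-nearest) can be sketched as follows. This is only an illustrative baseline and is not claimed to be one of the eight PTQ methods benchmarked in the paper. The function names and the sample weights are assumptions made for the example.

```python
def quantize_symmetric(values, bits):
    """Symmetric per-tensor round-to-nearest quantization to signed integers.

    Returns the integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1           # e.g. 3 for 3-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() resolves exact .5 ties to the nearest even integer.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


# Hypothetical weights, quantized to 3 bits (integer codes in [-3, 3]).
weights = [-1.4, -0.2, 0.0, 0.7, 3.0]
codes, scale = quantize_symmetric(weights, bits=3)
restored = dequantize(codes, scale)
print(codes, scale, restored)
```

With so few levels, a single large weight sets the scale and pushes most small weights to zero, which is one reason more advanced PTQ techniques are needed at 3 bits.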


Article 29

Title@2025-07-10 (4): Fair Uncertainty Quantification for Depression Prediction

Title: Fair Uncertainty Quantification for Depression Prediction Faire Unsicherheitsquantifizierung für die Depressionsvorhersage 抑郁症预测的公平不确定性量化 2505.04931v2

Authors (2): Yonghong Li, Xiuzhuang Zhou

Trustworthy depression prediction based on deep learning, incorporating both predictive reliability and algorithmic fairness across diverse demographic groups, is crucial for clinical application. Recently, achieving reliable depression predictions through uncertainty quantification has attracted increasing attention. However, few studies have focused on the fairness of uncertainty quantification (UQ) in depression prediction. In this work, we investigate the algorithmic fairness of UQ, namely Equal Opportunity Coverage (EOC) fairness, and propose Fair Uncertainty Quantification (FUQ) for depression prediction. FUQ pursues reliable and fair depression predictions through group-based analysis. Specifically, we first group all the participants by different sensitive attributes and leverage conformal prediction to quantify uncertainty within each demographic group, which provides a theoretically guaranteed and valid way to quantify uncertainty for depression prediction and facilitates the investigation of fairness across different demographic groups. Furthermore, we propose a fairness-aware optimization strategy that formulates fairness as a constrained optimization problem under EOC constraints. This enables the model to preserve predictive reliability while adapting to the heterogeneous uncertainty levels across demographic groups, thereby achieving optimal fairness. Through extensive evaluations on several visual and audio depression datasets, our approach demonstrates its effectiveness.

基于深层学习的可靠预测,包括预测可靠性和不同人口群体之间的算法公平性,是临床应用的关键。最近,通过不确定性量化实现可靠的抑郁预测,引起了越来越多的关注。然而,很少有研究侧重于抑郁预测中不确定性量化(UQ)的公平性。在这项工作中,我们调查了UQ的算法公平性,即平等机会覆盖(EOC)公平性,并提出了用于抑郁预测的公平性不确定性量化(FUQ) 。FUQ通过群体分析追求可靠和公平的抑郁预测。具体地说,我们首先以不同敏感属性将所有参与者组合起来,并利用一致预测来量化每个人口群体中的不确定性,这在理论上提供了一种有保障的、有效的方法来量化抑郁预测不确定性,并促进了不同人口群体之间的公平性调查。此外,我们提出了公平性意识优化战略,将公平性作为受EOC制约的有限优化问题来阐述。这样可以使模型保持预测可靠性,同时适应各人口群体间混杂的不确定性,从而实现最佳公平性。我们的方法是通过对若干个视觉和听觉抑郁数据集的广泛评价,展示其有效性。


Article 30

Title@2025-07-10 (4): Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle

Title: Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle Verbesserung der AEBS-Validierung durch Ziel-Interventions-Klassifikation Begünstigung des Prinzips der Prognoseabweichung 通过利用预测差异原则的客观干预分类,改进对AEBS的验证 2507.07872v1

Authors (2): Daniel Betschinske, Steven Peters

The safety validation of an automatic emergency braking system (AEBS) requires accurately distinguishing between false positive (FP) and true positive (TP) system activations. While simulations allow straightforward differentiation by comparing scenarios with and without interventions, analyzing activations from open-loop resimulations, such as those from field operational testing (FOT), is more complex. This complexity arises from scenario parameter uncertainty and the influence of driver interventions in the recorded data. Human labeling is frequently used to address these challenges, relying on subjective assessments of intervention necessity or situational criticality, potentially introducing biases and limitations. This work proposes a rule-based classification approach leveraging the Prediction Divergence Principle (PDP) to address those issues. Applied to a simplified AEBS, the proposed method reveals key strengths, limitations, and system requirements for effective implementation. The findings suggest that combining this approach with human labeling may enhance the transparency and consistency of classification, thereby improving the overall validation process. While the rule set for classification derived in this work adopts a conservative approach, the paper outlines future directions for refinement and broader applicability. Finally, this work highlights the potential of such methods to complement existing practices, paving the way for more reliable and reproducible AEBS validation frameworks.

自动紧急制动系统(AEBS)的安全验证要求准确区分误报(FP)和正确(TP)的系统触发。在仿真中,通过比较有干预和无干预的场景可以直接加以区分,但分析开环重仿真中的触发(例如来自现场运行测试(FOT)的触发)则更为复杂。这种复杂性源于场景参数的不确定性以及记录数据中驾驶员干预的影响。人工标注经常被用来应对这些挑战,它依赖对干预必要性或情境危急程度的主观评估,可能引入偏差和局限。本文提出一种利用预测分歧原则(PDP)的基于规则的分类方法来解决这些问题。将该方法应用于一个简化的 AEBS,揭示了其主要优点、局限以及有效实施所需的系统要求。研究结果表明,将这种方法与人工标注相结合,可以提高分类的透明度和一致性,从而改进整体验证过程。虽然本文推导的分类规则集采取了保守的做法,但文中概述了进一步细化和扩大适用范围的未来方向。最后,这项工作突出了此类方法对现有实践的补充潜力,为建立更可靠、更可复现的 AEBS 验证框架铺平了道路。


Article 31

Title@2025-07-10 (4): Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking

Title: Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking Abschwächung von Wasserzeichen-Diebstahlangriffen in generativen Modellen durch Multi-Key-Wasserzeichen 通过多密钥水印缓解生成模型中的水印窃取攻击 2507.07871v1

Authors (8): Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas

Watermarking offers a promising solution for GenAI providers to establish the provenance of their generated content. A watermark is a hidden signal embedded in the generated content, whose presence can later be verified using a secret watermarking key. One threat to GenAI providers is watermark stealing attacks, where users forge a watermark into content that was not generated by the provider’s models without access to the secret key, e.g., to falsely accuse the provider. Stealing attacks collect harmless watermarked samples from the provider’s model and aim to maximize the expected success rate of generating harmful watermarked samples. Our work focuses on mitigating stealing attacks while treating the underlying watermark as a black box. Our contributions are: (i) we propose a multi-key extension to mitigate stealing attacks that can be applied post hoc to any watermarking method across any modality; (ii) we provide theoretical guarantees and demonstrate empirically that our method makes forging substantially less effective across multiple datasets; and (iii) we formally define the threat of watermark forging as the task of generating harmful, watermarked content and model this threat via security games.

水印为 GenAI 提供商确定其生成内容的来源提供了一种有前景的方案。水印是嵌入生成内容中的隐藏信号,之后可以用秘密的水印密钥验证其存在。GenAI 提供商面临的一种威胁是水印窃取攻击:用户在无法获得密钥的情况下,把水印伪造进并非由提供商模型生成的内容中,例如用来诬陷提供商。窃取攻击从提供商的模型收集无害的带水印样本,目标是最大化生成有害的带水印样本的预期成功率。我们的工作重点是在把底层水印视为黑盒的前提下缓解窃取攻击。我们的贡献是:(i)提出一种缓解窃取攻击的多密钥扩展,它可以事后应用于任何模态的任何水印方法;(ii)提供理论保证,并通过实验证明我们的方法在多个数据集上使伪造的效果大幅降低;(iii)把水印伪造的威胁形式化地定义为生成有害的带水印内容的任务,并通过安全博弈对这一威胁建模。


Article 32

Title@2025-07-10 (4): Parametric Scaling Law of Tuning Bias in Conformal Prediction

Title: Parametric Scaling Law of Tuning Bias in Conformal Prediction Parametrisches Skalierungsgesetz des Tuning Bias in konformer Vorhersage 保形预测中调参偏差的参数缩放律 2502.03023v2

Authors (4): Hao Zeng, Kangdao Liu, Bingyi Jing, Hongxin Wei

Conformal prediction is a popular framework for uncertainty quantification that constructs prediction sets with coverage guarantees. To uphold the exchangeability assumption, many conformal prediction methods necessitate an additional holdout set for parameter tuning. Yet, the impact of violating this principle on coverage remains underexplored, which makes its consequences in practical applications unclear. In this work, we empirically find that the tuning bias (the coverage gap introduced by using the same dataset for tuning and calibration) is negligible for simple parameter tuning in many conformal prediction methods. In particular, we observe the scaling law of the tuning bias: this bias increases with parameter space complexity and decreases with calibration set size. Formally, we establish a theoretical framework to quantify the tuning bias and provide rigorous proof for the scaling law of the tuning bias by deriving its upper bound. In the end, we discuss how to reduce the tuning bias, guided by the theories we developed.

保形预测是一种流行的不确定性量化框架,它构建具有覆盖率保证的预测集。为了满足可交换性假设,许多保形预测方法需要额外的留出集来进行参数调优。然而,违反这一原则对覆盖率的影响仍未得到充分研究,使其在实际应用中的后果并不明确。在这项工作中,我们通过实验发现,调参偏差(即用同一数据集进行调参和校准所带来的覆盖率差距)对于许多保形预测方法中的简单参数调优可以忽略不计。特别地,我们观察到调参偏差的缩放规律:该偏差随参数空间复杂度的增加而增大,随校准集规模的增大而减小。在形式上,我们建立了一个量化调参偏差的理论框架,并通过推导其上界为调参偏差的缩放规律提供了严格证明。最后,我们在所发展理论的指导下讨论如何减小调参偏差。
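
The calibration step that the abstract refers to can be made concrete with a small split conformal prediction sketch. It assumes a regression setting with absolute-residual scores. The function names and the calibration values are hypothetical, and the sketch does not reproduce the paper's analysis of tuning bias. It only shows where a calibration set enters, which is the set that would be reused for tuning in the scenario the paper studies.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    With exchangeable data, intervals built from this value cover the true
    label with probability at least 1 - alpha.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval for the absolute-residual score |y - prediction|."""
    return (point_prediction - q, point_prediction + q)


# Hypothetical calibration residuals |y_i - f(x_i)| from a held-out set.
calibration_scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8]
q = conformal_quantile(calibration_scores, alpha=0.2)
low, high = prediction_interval(2.0, q)
print(q, low, high)
```

If the same nine points were also used to pick a model hyperparameter, the scores would no longer be exchangeable with a fresh test score, and the gap between nominal and actual coverage is what the abstract calls the tuning bias.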


Article 33

Title@2025-07-10 (4): Nexus: Taming Throughput-Latency Tradeoff in LLM Serving via Efficient GPU Sharing

Title: Nexus: Taming Throughput-Latency Tradeoff in LLM Serving via Efficient GPU Sharing Nexus: Den Durchsatz-Latenz-Tradeoff beim LLM-Serving durch effizientes GPU-Sharing zähmen Nexus:通过高效 GPU 共享驾驭 LLM 服务中的吞吐量-延迟权衡 2507.06608v2

Authors (5): Xiaoxiang Shi, Colin Cai, Junjia Du, Zhanda Zhu, Zhihao Jia

Current prefill-decode (PD) disaggregation is typically deployed at the level of entire serving engines, assigning separate GPUs to handle prefill and decode phases. While effective at reducing latency, this approach demands more hardware. To improve GPU utilization, Chunked Prefill mixes prefill and decode requests within the same batch, but introduces phase interference between prefill and decode. While existing PD disaggregation solutions separate the phases across GPUs, we ask: can the same decoupling be achieved within a single serving engine? The key challenge lies in managing the conflicting resource requirements of prefill and decode when they share the same hardware. In this paper, we first show that chunked prefill requests cause interference with decode requests due to their distinct requirements for GPU resources. Second, we find that GPU resources exhibit diminishing returns. Beyond a saturation point, increasing GPU allocation yields negligible latency improvements. This insight enables us to split a single GPU’s resources and dynamically allocate them to prefill and decode on the fly, effectively disaggregating the two phases within the same GPU. Across a range of models and workloads, our system Nexus achieves up to 2.2x higher throughput, 20x lower TTFT, and 2.5x lower TBT than vLLM. It also outperforms SGLang with up to 2x higher throughput, 2x lower TTFT, and 1.7x lower TBT, and achieves 1.4x higher throughput than vLLM-disaggregation using only half the number of GPUs.

目前的预填充-解码(PD)分离通常部署在整个服务引擎的层面,为预填充和解码阶段分配不同的 GPU。这种做法虽然能有效降低延迟,却需要更多硬件。为了提高 GPU 利用率,分块预填充(Chunked Prefill)把预填充和解码请求混合在同一批次中,但会在预填充与解码之间引入阶段干扰。现有的 PD 分离方案是在不同 GPU 之间分离这两个阶段,而我们要问:能否在单个服务引擎内实现同样的解耦?关键的挑战在于,当预填充和解码共享同一硬件时,如何管理二者相互冲突的资源需求。在本文中,我们首先表明,由于对 GPU 资源的需求不同,分块预填充请求会干扰解码请求。其次,我们发现 GPU 资源呈现收益递减:超过饱和点之后,增加 GPU 分配带来的延迟改善微乎其微。这一发现使我们能够拆分单个 GPU 的资源,并动态地把它们分配给预填充和解码,从而在同一 GPU 内有效地分离这两个阶段。在一系列模型和工作负载上,我们的系统 Nexus 与 vLLM 相比吞吐量最高提高 2.2 倍,TTFT 最多降低 20 倍,TBT 最多降低 2.5 倍。它也优于 SGLang,吞吐量最高提高 2 倍,TTFT 最多降低 2 倍,TBT 最多降低 1.7 倍,并且只用一半数量的 GPU 就实现了比 vLLM-disaggregation 高 1.4 倍的吞吐量。


Article 34

Title@2025-07-10 (4): Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders

Title: Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders Re-Bottleneck: Latent Re-Structuring für Neural Audio Autoencoder Re-Bottleneck:神经音频自编码器的潜在空间重构 2507.07867v1

Authors (3): Dimitrios Bralios, Jonah Casebeer, Paris Smaragdis

Neural audio codecs and autoencoders have emerged as versatile models for audio compression, transmission, feature-extraction, and latent-space generation. However, a key limitation is that most are trained to maximize reconstruction fidelity, often neglecting the specific latent structure necessary for optimal performance in diverse downstream applications. We propose a simple, post-hoc framework to address this by modifying the bottleneck of a pre-trained autoencoder. Our method introduces a “Re-Bottleneck”, an inner bottleneck trained exclusively through latent space losses to instill user-defined structure. We demonstrate the framework’s effectiveness in three experiments. First, we enforce an ordering on latent channels without sacrificing reconstruction quality. Second, we align latents with semantic embeddings, analyzing the impact on downstream diffusion modeling. Third, we introduce equivariance, ensuring that a filtering operation on the input waveform directly corresponds to a specific transformation in the latent space. Ultimately, our Re-Bottleneck framework offers a flexible and efficient way to tailor representations of neural audio models, enabling them to seamlessly meet the varied demands of different applications with minimal additional training.

神经音频编解码器和自编码器已成为音频压缩、传输、特征提取和潜在空间生成的通用模型。然而,一个关键局限是,它们大多以最大化重建保真度为训练目标,往往忽视了在各种下游应用中取得最佳性能所需的特定潜在结构。我们提出一个简单的事后框架,通过修改预训练自编码器的瓶颈来解决这一问题。我们的方法引入了“Re-Bottleneck”,即一个只通过潜在空间损失训练的内部瓶颈,用以注入用户定义的结构。我们在三个实验中证明了该框架的有效性。第一,我们在不牺牲重建质量的情况下对潜在通道施加排序。第二,我们将潜在表示与语义嵌入对齐,并分析其对下游扩散建模的影响。第三,我们引入等变性,确保对输入波形的滤波操作直接对应于潜在空间中的特定变换。最终,我们的 Re-Bottleneck 框架提供了一种灵活高效的方式来定制神经音频模型的表示,使其只需极少的额外训练就能无缝满足不同应用的多样需求。


Article 35

Title@2025-07-10 (4): Predicting and generating antibiotics against future pathogens with ApexOracle

Title: Predicting and generating antibiotics against future pathogens with ApexOracle Vorhersage und Generierung von Antibiotika gegen zukünftige Krankheitserreger mit ApexOracle 预测并产生抗生素,用ApexOracle来防治未来的病原体 2507.07862v1

Authors (4): Tianang Leng, Fangping Wan, Marcelo Der Torossian Torres, Cesar de la Fuente-Nunez

Antimicrobial resistance (AMR) is escalating and outpacing current antibiotic development. Thus, discovering antibiotics effective against emerging pathogens is becoming increasingly critical. However, existing approaches cannot rapidly identify effective molecules against novel pathogens or emerging drug-resistant strains. Here, we introduce ApexOracle, an artificial intelligence (AI) model that both predicts the antibacterial potency of existing compounds and designs de novo molecules active against strains it has never encountered. Departing from models that rely solely on molecular features, ApexOracle incorporates pathogen-specific context through the integration of molecular features captured via a foundational discrete diffusion language model and a dual-embedding framework that combines genomic- and literature-derived strain representations. Across diverse bacterial species and chemical modalities, ApexOracle consistently outperformed state-of-the-art approaches in activity prediction and demonstrated reliable transferability to novel pathogens with little or no antimicrobial data. Its unified representation-generation architecture further enables the in silico creation of “new-to-nature” molecules with high predicted efficacy against priority threats. By pairing rapid activity prediction with targeted molecular generation, ApexOracle offers a scalable strategy for countering AMR and preparing for future infectious-disease outbreaks.

抗微生物药物耐药性(AMR)正在加剧,其速度已超过当前抗生素的研发速度。因此,发现对新出现的病原体有效的抗生素变得日益重要。然而,现有方法无法快速识别针对新型病原体或新出现的耐药菌株的有效分子。在此,我们提出 ApexOracle,这是一个人工智能(AI)模型,既能预测现有化合物的抗菌效力,又能从头设计对其从未见过的菌株具有活性的分子。与仅依赖分子特征的模型不同,ApexOracle 通过整合由基础离散扩散语言模型捕获的分子特征,以及结合基因组来源和文献来源菌株表示的双嵌入框架,引入了病原体特异性的上下文。在多种细菌物种和化学类型上,ApexOracle 在活性预测方面持续优于最先进的方法,并在抗菌数据很少或没有的新型病原体上表现出可靠的可迁移性。其统一的表示-生成架构还能够在计算机上创造对重点威胁具有较高预测效力的“自然界中前所未有”的分子。通过将快速活性预测与定向分子生成相结合,ApexOracle 为应对 AMR 和防范未来传染病暴发提供了一种可扩展的策略。


Article 36

Title@2025-07-10 (4): Studying and Improving Graph Neural Network-based Motif Estimation

Title: Studying and Improving Graph Neural Network-based Motif Estimation Untersuchung und Verbesserung der graphischen Neuralnetz-basierten Motivationsschätzung 研究和改善图形神经网络基于Motif 估计 2506.15709v3

Authors (3): Pedro C. Vieira, Miguel E. P. Silva, Pedro Manuel Pinto Ribeiro

Graph Neural Networks (GNNs) are a predominant method for graph representation learning. However, beyond subgraph frequency estimation, their application to network motif significance-profile (SP) prediction remains under-explored, with no established benchmarks in the literature. We propose to address this problem, framing SP estimation as a task independent of subgraph frequency estimation. Our approach shifts from frequency counting to direct SP estimation and formulates the problem as multitarget regression. The reformulation is optimised for interpretability, stability and scalability on large graphs. We validate our method using a large synthetic dataset and further test it on real-world graphs. Our experiments reveal that 1-WL-limited models struggle to make precise estimations of SPs. However, they can generalise to approximate the graph generation processes of networks by comparing their predicted SP with the ones originating from synthetic generators. This first study on GNN-based motif estimation also hints at how direct SP estimation can help go past the theoretical limitations that motif estimation faces when performed through subgraph counting.

神经网络图(GNNs)是图示显示学习的主要方法。然而,除了子图频率估计外,它们对网络基本重要性(SP)预测的应用仍然不够深入,文献中也没有确定基准。我们提议解决这个问题,将SP估计作为独立于子图频率估计的一项任务。我们的方法从频率计算转向直接的SP估计,并将问题调节为多目标回归。改写是对大图的可解释性、稳定性和可缩放性的优化。我们使用大型合成数据集验证我们的方法,并在真实世界的图表上进一步测试。我们的实验显示,1-WL的有限模型很难精确估计SP。然而,它们可以通过比较预测的SP与合成发电机产生的模型来概括网络的图形生成过程。关于基于GNN的模型估计的首次研究还暗示,使用直接的SP估计如何有助于克服通过子图计法进行时Motif估计所面临的理论局限性。
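
For orientation, a significance profile is commonly defined in the network-motif literature as the vector of per-subgraph z-scores against a random-graph null model, normalized to unit length. The sketch below computes that quantity from subgraph counts; it illustrates the regression target only, not the GNN pipeline of the paper. The function name, the input layout, and the zero-variance handling are assumptions made for this example.

```python
import numpy as np

def significance_profile(real_counts, null_counts):
    """Normalized z-score vector for k subgraph types.

    real_counts: shape (k,), counts in the observed graph.
    null_counts: shape (n_random_graphs, k), counts in randomized graphs.
    """
    real = np.asarray(real_counts, dtype=float)
    null = np.asarray(null_counts, dtype=float)
    mean = null.mean(axis=0)
    std = null.std(axis=0)
    # Assumption: where the null ensemble has zero variance, the z-score is set to 0.
    z = np.divide(real - mean, std, out=np.zeros_like(real), where=std > 0)
    norm = np.linalg.norm(z)
    # Normalizing to unit length makes profiles comparable across graph sizes.
    return z / norm if norm > 0 else z

# Hypothetical counts for two subgraph types and two randomized graphs.
sp = significance_profile([10, 2], [[4, 2], [6, 4]])
```

Because the profile is a fixed-length real vector, predicting it directly is naturally a multitarget regression problem, which is the framing the abstract describes.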


Article 37

Title@2025-07-10 (4): Principled Foundations for Preference Optimization

Title: Principled Foundations for Preference Optimization Prinzipierte Grundlagen für die Preference-Optimierung 最优化原则基金会 2507.07855v1

Authors (7): Wenxuan Zhou, Shujian Zhang, Brice Magdalou, John Lambert, Ehsan Amid, Richard Nock, Andrew Hard

In this paper, we show that direct preference optimization (DPO) is a very specific form of a connection between two major theories in the ML context of learning from preferences: loss functions (Savage) and stochastic choice (Doignon-Falmagne and Machina). The connection is established for all of Savage’s losses and, at this level of generality, (i) it includes support for abstention on the choice theory side, (ii) it includes support for non-convex objectives on the ML side, and (iii) it allows one to frame, at no extra cost, some notable extensions of the DPO setting, including margins and corrections for length. Understanding how DPO operates from a general principled perspective is crucial because of the huge and diverse application landscape of models, because of the current momentum around DPO, and, importantly, because many state-of-the-art variations on DPO definitely occupy a small region of the map that we cover. It also helps to understand the pitfalls of departing from this map and to figure out workarounds.

在本文中,我们表明,直接优惠优化(DPO)是一种非常具体的联系形式,它涉及从偏好中学习的两个主要理论:损失功能(Savage)和随机选择(Doignon-Falmagne和Machina),这一联系是针对Savage所有损失和这一一般程度确定的,(一)它包括支持在选择理论方面弃权,(二)它包括支持ML一侧的非康韦克斯目标,以及(三)它允许为DPO设置的一些值得注意的免费扩展设定框架,包括边距和长程校正。了解DPO如何从一般原则角度运作至关重要,因为目前DPO周围的动力很大,但也非常重要的是,因为DPO的许多艺术变异状态肯定占据了我们所覆盖的地图上的一个小区域。它还有助于理解离开该地图的陷阱,并绘制出工作路线。
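
As a reference point, the standard DPO objective (the special case that the abstract says is generalized) can be sketched as a logistic loss on a scaled difference of log-probability ratios. The function and argument names below are hypothetical, and the margin and length-correction extensions mentioned in the abstract are not shown.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities.

    logp_*     : log-probabilities under the policy being trained.
    ref_logp_* : log-probabilities under the frozen reference policy.
    beta       : temperature on the implicit reward (illustrative default).
    """
    chosen_ratio = np.asarray(logp_chosen, dtype=float) - np.asarray(ref_logp_chosen, dtype=float)
    rejected_ratio = np.asarray(logp_rejected, dtype=float) - np.asarray(ref_logp_rejected, dtype=float)
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) written stably as log(1 + exp(-margin)).
    return np.logaddexp(0.0, -margin)

# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
```

The loss decreases as the policy raises the chosen response's log-ratio relative to the rejected one, which is the behaviour the logistic link encodes.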


Article 38

Title@2025-07-10 (4): Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain

Title: Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain Kreditrisikoanalyse für KMU mit Hilfe von Graph Neural Networks in der Lieferkette 利用供应链中图表神经网络的中小企业信贷风险分析 2507.07854v1

Authors (5): Zizhou Zhang, Qinyan Shen, Zhuohuan Hu, Qianying Liu, Huijie Shen

Small and Medium-sized Enterprises (SMEs) are vital to the modern economy, yet their credit risk analysis often struggles with scarce data, especially for online lenders lacking direct credit records. This paper introduces a Graph Neural Network (GNN)-based framework, leveraging SME interactions from transaction and social data to map spatial dependencies and predict loan default risks. Tests on real-world datasets from Discover and Ant Credit (23.4M nodes for supply chain analysis, 8.6M for default prediction) show the GNN surpasses traditional and other GNN baselines, with AUCs of 0.995 and 0.701 for supply chain mining and default prediction, respectively. It also helps regulators model supply chain disruption impacts on banks, accurately forecasting loan defaults from material shortages, and offers Federal Reserve stress testers key data for CCAR risk buffers. This approach provides a scalable, effective tool for assessing SME credit risk.

中小企业对现代经济至关重要,但它们的信用风险分析往往与稀缺数据纠缠不休,特别是缺乏直接信用记录的在线放款人。本文介绍了一个基于图表神经网络(GNN)的框架,利用交易和社会数据的中小企业互动,绘制空间依赖性和社会数据,预测贷款违约风险。发现和Ant Credit(23.4M供应链分析实际世界数据集测试(23.4M供应链分析节点,8.6M违约预测节点)显示GNN超过了传统和其他GNN基线,供应链采矿和违约预测分别为0.995和0.701澳元。该文件还帮助监管者模拟供应链对银行的影响,准确预测物资短缺造成的贷款违约,并为CCAR风险缓冲提供联邦储备压力测试员关键数据。这种方法为评估中小企业信用风险提供了可扩展的有效工具。


Article 39

Title@2025-07-10 (4): Optimization Guarantees for Square-Root Natural-Gradient Variational Inference

Title: Optimization Guarantees for Square-Root Natural-Gradient Variational Inference Optimierungsgarantien für Square-Root Natural-Gradient Variational Inferenz 平方-极极自然-梯度变动性推断的最佳保障 2507.07853v1

Authors (4): Navish Kumar, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Aurelien Lucchi

Variational inference with natural-gradient descent often shows fast convergence in practice, but its theoretical convergence guarantees have been challenging to establish. This is true even for the simplest cases that involve concave log-likelihoods and use a Gaussian approximation. We show that the challenge can be circumvented for such cases using a square-root parameterization for the Gaussian covariance. This approach establishes novel convergence guarantees for natural-gradient variational-Gaussian inference and its continuous-time gradient flow. Our experiments demonstrate the effectiveness of natural gradient methods and highlight their advantages over algorithms that use Euclidean or Wasserstein geometries.

与自然渐渐下降的不同的推论往往表明实践中的快速趋同,但其理论趋同保证一直难以确定。甚至对于最简单的、涉及对日志相似性并使用高斯近似法的案例也是如此。我们证明,对于这类案例,可以使用高斯共变法的平方根参数来规避挑战。这种方法为自然渐渐变变-高西因推论及其持续时间梯度流提供了新的趋同保证。我们的实验证明了自然梯度方法的有效性,并突显了这些方法相对于使用厄克西德或瓦塞尔斯坦地貌的算法的优势。


Article 40

Title@2025-07-10 (4): Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective

Title: Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective Pre-Trained AI Model Assisted Online Entscheidungsfindung unter fehlenden Kovariaten: Eine theoretische Perspektive 在失踪的共变之下协助作出在线决策的模式:理论视角 2507.07852v1

Authors (2): Haichen Hu, David Simchi-Levi

We study a sequential contextual decision-making problem in which certain covariates are missing but can be imputed using a pre-trained AI model. From a theoretical perspective, we analyze how the presence of such a model influences the regret of the decision-making process. We introduce a novel notion called “model elasticity”, which quantifies the sensitivity of the reward function to the discrepancy between the true covariate and its imputed counterpart. This concept provides a unified way to characterize the regret incurred due to model imputation, regardless of the underlying missingness mechanism. More surprisingly, we show that under the missing at random (MAR) setting, it is possible to sequentially calibrate the pre-trained model using tools from orthogonal statistical learning and doubly robust regression. This calibration significantly improves the quality of the imputed covariates, leading to much better regret guarantees. Our analysis highlights the practical value of having an accurate pre-trained model in sequential decision-making tasks and suggests that model elasticity may serve as a fundamental metric for understanding and improving the integration of pre-trained models in a wide range of data-driven decision-making problems.

我们研究一个序贯情境决策问题,其中某些协变量缺失,但可以用预训练的 AI 模型进行插补。我们从理论角度分析了这种模型的存在如何影响决策过程的遗憾。我们引入了一个叫作“模型弹性”的新概念,用来量化奖励函数对真实协变量与其插补值之间差异的敏感性。这个概念提供了一个统一的方法来描述由于模型插补而产生的遗憾,而不管其根本的缺失机制如何。更令人惊讶的是,我们表明在随机缺失(MAR)设置下,可以使用正交统计学习和双重稳健回归的工具对预训练模型进行序贯校准。这种校准大大提高了插补协变量的质量,从而带来好得多的遗憾保证。我们的分析强调了在序贯决策任务中拥有一个准确的预训练模型的实际价值,并表明模型弹性可以作为理解和改进预训练模型在各类数据驱动决策问题中的整合的基本度量。


Article 41

Title@2025-07-10 (4): Revisiting the Predictability of Performative, Social Events

Title: Revisiting the Predictability of Performative, Social Events Über die Vorhersagbarkeit von performativen, gesellschaftlichen Veranstaltungen 重新审视表演性、社会活动的可预测性 2503.11713v2

Authors (1): Juan C. Perdomo

Social predictions do not passively describe the future; they actively shape it. They inform actions and change individual expectations in ways that influence the likelihood of the predicted outcome. Given these dynamics, to what extent can social events be predicted? This question was discussed throughout the 20th century by authors like Merton, Morgenstern, Simon, and others who considered it a central issue in social science methodology. In this work, we provide a modern answer to this old problem. Using recent ideas from performative prediction and outcome indistinguishability, we establish that one can always efficiently predict social events accurately, regardless of how predictions influence data. While achievable, we also show that these predictions are often undesirable, highlighting the limitations of previous desiderata. We end with a discussion of various avenues forward.

社会预测并不被动地描述未来; 它们积极塑造未来。 它们以影响预期结果可能性的方式告知行动并改变个人期望。 鉴于这些动态, 社会事件可以预测到什么程度? 这个问题在整个20世纪由莫顿、 蒙根斯特尔、 西蒙等作者和其他认为是社会科学方法中心问题的作者讨论过。 在这项工作中, 我们为这个老问题提供了现代答案。 我们利用最近从业绩预测和结果差异化中得出的理念, 确定人们总是能够有效地准确预测社会事件, 不论预测如何影响数据。 虽然我们可以实现,但我们也表明这些预测往往不可取,凸显了以前的边缘的局限性。 我们最后要讨论各种途径。


Article 42

Title@2025-07-10 (4): “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents

Title: “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents “So, erzählen Sie mir über Ihre Politik…”: Destillation von interpretierbaren Richtlinien von Deep Reinforcement Learning Agents “告诉我你们的政策……:从深强化学习机构那里提炼可解释的政策”。 2507.07848v1

Authors (3): Giovanni Dispoto, Paolo Bonetti, Marcello Restelli

Recent advances in Reinforcement Learning (RL) largely benefit from the inclusion of Deep Neural Networks, boosting the number of novel approaches proposed in the field of Deep Reinforcement Learning (DRL). These techniques demonstrate the ability to tackle complex games such as Atari, Go, and other real-world applications, including financial trading. Nevertheless, a significant challenge emerges from the lack of interpretability, particularly when attempting to comprehend the underlying patterns learned, the relative importance of the state features, and how they are integrated to generate the policy’s output. For this reason, in mission-critical and real-world settings, it is often preferred to deploy a simpler and more interpretable algorithm, although at the cost of performance. In this paper, we propose a novel algorithm, supported by theoretical guarantees, that can extract an interpretable policy (e.g., a linear policy) without disregarding the peculiarities of expert behavior. This result is obtained by considering the advantage function, which includes information about why an action is superior to the others. In contrast to previous works, our approach enables the training of an interpretable policy using previously collected experience. The proposed algorithm is empirically evaluated on classic control environments and on a financial trading scenario, demonstrating its ability to extract meaningful information from complex expert policies.

强化学习(RL)最近的进展主要得益于深度神经网络的引入,这增加了深度强化学习(DRL)领域提出的新方法的数目。这些技术表明有能力处理 Atari、Go 等复杂游戏和其他现实世界应用,包括金融交易。然而,一个重大挑战来自缺乏可解释性,特别是在试图理解所学到的基本模式、状态特征的相对重要性以及这些特征如何结合以产生策略输出时。因此,在任务关键和现实世界环境中,往往倾向于部署更简单、更易解释的算法,尽管要以性能为代价。在本文中,我们提出一种有理论保证支持的新算法,可以提取一种可解释的策略(例如线性策略),而不会忽视专家行为的特殊性。这一结果是通过考虑优势函数获得的,该函数包含了某个动作为何优于其他动作的信息。与以往的工作不同,我们的方法能够利用先前收集的经验来训练可解释的策略。所提出的算法在经典控制环境和一个金融交易场景上进行了实证评估,展示了其从复杂专家策略中提取有意义信息的能力。
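
The advantage function mentioned in the abstract is the standard quantity A(s, a) = Q(s, a) - V(s), with V(s) the policy-weighted average of Q(s, ·). A minimal tabular sketch follows; the array layout and the function name are assumptions for illustration, and this is not the distillation algorithm itself.

```python
import numpy as np

def advantage(q_values, policy_probs):
    """Tabular advantage A(s, a) = Q(s, a) - V(s).

    q_values     : shape (n_states, n_actions), expert action values.
    policy_probs : same shape, each row a probability distribution over actions.
    """
    q = np.asarray(q_values, dtype=float)
    pi = np.asarray(policy_probs, dtype=float)
    # V(s) = sum_a pi(a|s) * Q(s, a)
    v = (pi * q).sum(axis=1, keepdims=True)
    return q - v

# Hypothetical single state with two actions under a uniform policy.
adv = advantage([[1.0, 3.0]], [[0.5, 0.5]])
```

A positive entry marks an action that is better than the policy's average behaviour in that state, which is the "why is this action superior" signal the abstract refers to.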


Article 43

Title@2025-07-10 (4): Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components

Title: Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components Bewertung der Einhaltung der Hierarchischen Sicherheitsgrundsätze durch LLM-Agenten: Ein leichter Maßstab für die Erprobung grundlegender Steuerungskomponenten 遵守等级安全原则:基础控制组成部分检验的轻量基准 2506.02357v2

Authors (1): Ram Potham

Credible safety plans for advanced AI development require methods to verify agent behavior and detect potential control deficiencies early. A fundamental aspect is ensuring agents adhere to safety-critical principles, especially when these conflict with operational goals. This paper introduces a lightweight, interpretable benchmark to evaluate an LLM agent’s ability to uphold a high-level safety principle when faced with conflicting task instructions. Our evaluation of six LLMs reveals two primary findings: (1) a quantifiable “cost of compliance” where safety constraints degrade task performance even when compliant solutions exist, and (2) an “illusion of compliance” where high adherence often masks task incompetence rather than principled choice. These findings provide initial evidence that while LLMs can be influenced by hierarchical directives, current approaches lack the consistency required for reliable safety governance.

高级AI开发的可靠安全计划要求早期核实代理人行为和发现潜在控制缺陷的方法。一个根本的方面是确保代理人遵守安全关键原则,特别是在这些原则与业务目标发生冲突时。本文件提出了一个轻量的、可解释的基准,用以评估有限责任管理代理人在面临任务指示冲突时坚持高层次安全原则的能力。我们对六大LMs的评估揭示了两个主要结论:(1) 量化的“遵守成本 ” , 即即使存在合规解决方案,但安全限制降低任务绩效;(2) “遵守预期” ,即高度遵守往往掩盖不称职的任务而不是原则性选择。这些结论提供了初步证据,表明尽管LLMs可能受到等级指令的影响,但目前的做法缺乏可靠的安全治理所需的一致性。


Article 44

Title@2025-07-10 (4): Unsupervised Morphological Tree Tokenizer

Title: Unsupervised Morphological Tree Tokenizer Unüberwachter morphologischer Baum Tokenizer 不受监督的病理树化器 2406.15245v2

Authors (5): Qingyang Zhu, Xiang Hu, Pengyu Ji, Wei Wu, Kewei Tu

As a cornerstone in language modeling, tokenization involves segmenting text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To address this drawback, we introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of words. Specifically, the deep model jointly encodes internal structures and representations of words with a mechanism named MorphOverriding to ensure the indecomposability of morphemes. By training the model with self-supervised objectives, our method is capable of inducing character-level structures that align with morphological rules without annotated training data. Based on the induced structures, our algorithm tokenizes words through vocabulary matching in a top-down manner. Empirical results indicate that the proposed method effectively retains complete morphemes and outperforms widely adopted methods such as BPE and WordPiece on both morphological segmentation tasks and language modeling tasks. Code is available at https://github.com/martianmartina/TreeTokenizer.

作为语言建模的基石,分词(tokenization)是指将文本输入切分成预定义的原子单位。常规的统计分词器经常破坏单词内部的成分边界,从而损害语义信息。为了解决这一缺陷,我们在分词中引入形态结构指导,并提出一个深度模型来归纳单词的字符级结构。具体地说,该深度模型通过一个名为 MorphOverriding 的机制对单词的内部结构和表示进行联合编码,以确保语素(morpheme)的不可分解性。通过以自监督目标训练模型,我们的方法能够在没有标注训练数据的情况下归纳出符合形态学规则的字符级结构。基于归纳出的结构,我们的算法通过自上而下的词表匹配对单词进行分词。实验结果表明,所提出的方法能有效保留完整的语素,并在形态切分任务和语言建模任务上优于 BPE 和 WordPiece 等被广泛采用的方法。代码可在 https://github.com/martianmartina/TreeTokenizer 上查阅。


Article 45

Title@2025-07-10 (4): Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model

Title: Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model Statistische Physik-Analyse von Graphen-Neuronalen-Netzwerken: Annäherung an die Optimität im kontextuellen stochastischen Blockmodell 图形神经网络的统计物理学分析:在背景随机区块模型中接近最佳性 2503.01361v2

Authors (2): O. Duranthon, L. Zdeborová

Graph neural networks (GNNs) are designed to process data associated with graphs. They are finding an increasing range of applications; however, as with other modern machine learning techniques, their theoretical understanding is limited. GNNs can encounter difficulties in gathering information from nodes that are far apart by iterated aggregation steps. This situation is partly caused by so-called oversmoothing, and overcoming it is one of the practically motivated challenges. We consider the situation where information is aggregated by multiple steps of convolution, leading to graph convolutional networks (GCNs). We analyze the generalization performance of a basic GCN, trained for node classification on data generated by the contextual stochastic block model. We predict its asymptotic performance by deriving the free energy of the problem, using the replica method, in the high-dimensional limit. Calling depth the number of convolutional steps, we show the importance of going to large depth to approach Bayes-optimality. We detail how the architecture of the GCN has to scale with the depth to avoid oversmoothing. The resulting large depth limit can be close to Bayes-optimality and leads to a continuous GCN. Technically, we tackle this continuous limit via an approach that resembles dynamical mean-field theory (DMFT) with constraints at the initial and final times. An expansion around large regularization allows us to solve the corresponding equations for the performance of the deep GCN. This promising tool may contribute to the analysis of further deep neural networks.

图神经网络(GNN)旨在处理与图相关的数据。它们的应用范围日益扩大;然而,与其他现代机器学习技术一样,人们对其理论理解仍然有限。GNN 在通过迭代聚合步骤从相距较远的节点收集信息时可能遇到困难。这种情况部分是由所谓的过平滑(oversmoothing)造成的,而克服它是具有实际动机的挑战之一。我们考虑通过多步卷积来聚合信息的情形,即图卷积网络(GCN)。我们分析了一个基本 GCN 的泛化性能,该网络在由上下文随机块模型生成的数据上进行节点分类训练。我们在高维极限下利用复本方法推导该问题的自由能,从而预测其渐近性能。将卷积步数称为深度,我们展示了采用大深度对于逼近贝叶斯最优的重要性。我们详细说明了 GCN 的架构必须如何随深度缩放以避免过平滑。由此得到的大深度极限可以接近贝叶斯最优,并导出一个连续 GCN。在技术上,我们通过一种类似于动力学平均场理论(DMFT)、并在初始和最终时刻带有约束的方法来处理这一连续极限。围绕大正则化的展开使我们能够求解深层 GCN 性能的相应方程。这一有前景的工具可能有助于分析其他深度神经网络。


Article 46

Title@2025-07-10 (4): Towards Benchmarking Foundation Models for Tabular Data With Text

Title: Towards Benchmarking Foundation Models for Tabular Data With Text Auf dem Weg zu Benchmarking-Grundlagenmodellen für tabellarische Daten mit Text 建立文字表格数据基准基准基础模型 2507.07829v1

Authors (5): Martin Mráz, Breenda Das, Anshul Gupta, Lennart Purucker, Frank Hutter

Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns, and identifying real-world tabular datasets with semantically rich text features is non-trivial. We propose a series of simple yet effective ablation-style strategies for incorporating text into conventional tabular pipelines. Moreover, we benchmark how state-of-the-art tabular foundation models can handle textual data by manually curating a collection of real-world tabular datasets with meaningful textual features. Our study is an important step towards improving benchmarking of foundation models for tabular data with text.

表格数据基础模型正在迅速演变,人们越来越有兴趣将其扩展,以支持自由文本特征等额外模式,然而,现有表格数据基准很少包括文字栏,确定具有文字内容丰富的真实世界表格数据集并非三重性,我们提出了一系列简单而有效的通缩型战略,用于将文字纳入传统的表格管道。此外,我们衡量目前最先进的表格基础模型如何通过手动整理具有有意义的文字特征的真实世界表格数据集集处理文本数据。我们的研究是改进文本表格数据基础模型基准的重要一步。


Article 47

Title@2025-07-10 (4): An Empirical Bernstein Inequality for Dependent Data in Hilbert Spaces and Applications

Title: An Empirical Bernstein Inequality for Dependent Data in Hilbert Spaces and Applications Eine empirische Bernsteinungleichheit für abhängige Daten in Hilbert-Räumen und Anwendungen 希尔伯特空间和应用中依赖数据方面的不平等问题 2507.07826v1

Authors (4): Erfan Mirzaei, Andreas Maurer, Vladimir R. Kostic, Massimiliano Pontil

Learning from non-independent and non-identically distributed data poses a persistent challenge in statistical learning. In this study, we introduce data-dependent Bernstein inequalities tailored for vector-valued processes in Hilbert space. Our inequalities apply to both stationary and non-stationary processes and exploit the potential rapid decay of correlations between temporally separated variables to improve estimation. We demonstrate the utility of these bounds by applying them to covariance operator estimation in the Hilbert-Schmidt norm and to operator learning in dynamical systems, achieving novel risk bounds. Finally, we perform numerical experiments to illustrate the practical implications of these bounds in both contexts.

从非独立和未识别分布的数据中学习,对统计学习构成持续的挑战。在本研究中,我们引入了针对希尔伯特空间病媒估值过程而定制的数据依赖伯恩斯坦不平等。我们的不平等既适用于固定过程,也适用于非静止过程,并利用暂时分离的变量之间可能迅速恶化的相互关系来改进估计。我们通过将这些界限应用于希尔伯特-施密特规范中的共变操作者估算和动态系统中的操作者学习,从而实现新的风险界限,来证明这些界限的效用。最后,我们进行了数字实验,以说明这两个范围内这些界限的实际影响。


Article 48

Title@2025-07-10 (4): Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution

Title: Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution Symmetrie entdecken Breaking in Physical Systems mit entspannter Gruppenkonvolution 发现物理系统中的对称断裂与放松的集团革命 2310.02299v8

Authors (5): Rui Wang, Elyssa Hofgard, Han Gao, Robin Walters, Tess E. Smidt

Modeling symmetry breaking is essential for understanding the fundamental changes in the behaviors and properties of physical systems, from microscopic particle interactions to macroscopic phenomena like fluid dynamics and cosmic structures. Thus, identifying sources of asymmetry is an important tool for understanding physical systems. In this paper, we focus on learning asymmetries of data using relaxed group convolutions. We provide both theoretical and empirical evidence that this flexible convolution technique allows the model to maintain the highest level of equivariance that is consistent with data and discover the subtle symmetry-breaking factors in various physical systems. We employ various relaxed group convolution architectures to uncover various symmetry-breaking factors that are interpretable and physically meaningful in different physical systems, including the phase transition of crystal structure, the isotropy and homogeneity breaking in turbulent flow, and the time-reversal symmetry breaking in pendulum systems.

建模对称分解对于理解物理系统的行为和特性的根本变化至关重要,从微粒粒相互作用到流体动态和宇宙结构等宏观现象。 因此,确定不对称的来源是了解物理系统的一个重要工具。 在本文中,我们侧重于通过放松群体变异来学习数据的不对称性。 我们提供了理论和经验证据,证明这种灵活的共变技术使模型能够保持与数据一致的最高水平的均匀性,并发现各种物理系统中微妙的对称破碎因素。 我们使用各种放松的集团变异结构来发现在不同物理系统中可以解释和具有实际意义的各种对称破碎因素,包括晶体结构的阶段过渡、在动荡中打破的异位和同质性,以及时反向对称断裂。


Article 49

Title@2025-07-10 (4): MAEBE: Multi-Agent Emergent Behavior Framework

Title: MAEBE: Multi-Agent Emergent Behavior Framework MAEBE: Multi-Agent Emergent Behavior Framework 多边代理新兴行为框架 2506.03053v2

Authors (4): Sinem Erisken, Timothy Gothard, Martin Leitgab, Ram Potham

Traditional AI safety evaluations on isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral preferences, particularly for Instrumental Harm, are surprisingly brittle and shift significantly with question framing, both in single agents and ensembles. (2) The moral reasoning of LLM ensembles is not directly predictable from isolated agent behavior due to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena like peer pressure influencing convergence, even when guided by a supervisor, highlighting distinct safety and alignment challenges. Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts.

对孤立的LLMs进行传统的AI安全评价是不够的,因为多试剂AI联合体变得很普遍,带来新的新风险。本文件介绍了多代理新兴行为评价(MAEBE)框架,以系统评估此类风险。我们利用MAEBE和最伟大的良好基准(以及一种新型的双重反向问题技术)来证明:(1)LLM道德偏好,特别是工具伤害的道德偏好,在单一代理体和组合体中都令人惊讶地变得脆弱,随着问题设置而发生重大变化。 (2)LLM联合体的道德推理不能直接从新出现的集团动态造成的孤立的代理体行为中预见出来。(3)具体地说,集合体展示了影响趋同的同行压力等现象,即使由上司指导,也突出了不同的安全和一致性挑战。我们的调查结果强调,有必要在其互动的多剂环境中评价AI系统。


Article 50

Title@2025-07-10 (4): An Algorithm for Learning Smaller Representations of Models With Scarce Data

Title: An Algorithm for Learning Smaller Representations of Models With Scarce Data Ein Algorithmus für das Lernen kleinerer Darstellungen von Modellen mit knappen Daten 学习缺乏数据模型较小比例模型的计算方法 2010.07990v2

Authors (1): Adrian de Wynter

We present an algorithm for solving binary classification problems when the dataset is not fully representative of the problem being solved, and obtaining more data is not possible. It relies on a trained model with loose accuracy constraints, an iterative hyperparameter searching-and-pruning procedure over a search space $\Theta$, and a data-generating function. Our algorithm works by reconstructing up to homology the manifold on which lies the support of the underlying distribution. We provide an analysis on correctness and runtime complexity under ideal conditions and an extension to deep neural networks. In the former case, if $|\Theta|$ is the number of hyperparameter sets in the search space, this algorithm returns a solution that is up to $2(1 - 2^{-|\Theta|})$ times better than simply training with an enumeration of $\Theta$ and picking the best model. As part of our analysis we also prove that an open cover of a dataset has the same homology as the manifold on which lies the support of the underlying probability distribution, if and only if said dataset is learnable. This latter result acts as a formal argument to explain the effectiveness of data expansion techniques.

我们提出一种算法,用于在数据集不能完全代表待解决问题、且无法获取更多数据的情况下求解二元分类问题。该算法依赖于一个精度约束较宽松的已训练模型、一个在搜索空间 $\Theta$ 上迭代进行的超参数搜索与剪枝过程,以及一个数据生成函数。算法的工作原理是在同调意义下重建底层分布的支撑所在的流形。我们给出了理想条件下的正确性和运行时间复杂度分析,并将其扩展到深度神经网络。在前一种情况下,若 $|\Theta|$ 为搜索空间中超参数集合的数量,则该算法返回的解比简单地枚举 $\Theta$ 进行训练并挑选最佳模型要好至多 $2(1 - 2^{-|\Theta|})$ 倍。作为分析的一部分,我们还证明:当且仅当数据集可学习时,该数据集的一个开覆盖与底层概率分布的支撑所在的流形具有相同的同调。后一结果可作为解释数据扩充技术有效性的形式化论证。


Article 51

Title@2025-07-10 (4): On the Effect of Instruction Tuning Loss on Generalization

Title: On the Effect of Instruction Tuning Loss on Generalization Auf die Auswirkungen der Instruktion Tuning Verlust auf die Verallgemeinerung 指示计票损失对普遍化的影响的影响 2507.07817v1

Authors (4): Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty

Instruction Tuning has emerged as a pivotal post-training paradigm that enables pre-trained language models to better follow user instructions. Despite its significance, little attention has been given to optimizing the loss function used. A fundamental, yet often overlooked, question is whether the conventional auto-regressive objective - where loss is computed only on response tokens, excluding prompt tokens - is truly optimal for instruction tuning. In this work, we systematically investigate the impact of differentially weighting prompt and response tokens in instruction tuning loss, and propose Weighted Instruction Tuning (WIT) as a better alternative to conventional instruction tuning. Through extensive experiments on five language models of different families and scales, three finetuning datasets of different sizes, and five diverse evaluation benchmarks, we show that the standard instruction tuning loss often yields suboptimal performance and limited robustness to input prompt variations. We find that a low-to-moderate weight for prompt tokens coupled with a moderate-to-high weight for response tokens yields the best-performing models across settings and also serves as a better starting point for the subsequent preference alignment training. These findings highlight the need to reconsider instruction tuning loss and offer actionable insights for developing more robust and generalizable models. Our code is open-sourced at https://github.com/kowndinya-renduchintala/WIT.

指令微调(Instruction Tuning)已成为一种关键的后训练范式,使预训练语言模型能够更好地遵循用户指令。尽管其意义重大,人们却很少关注对所用损失函数的优化。一个根本但常被忽视的问题是:常规的自回归目标(即只在回复 token 上计算损失,不包括提示 token)是否真的最适合指令微调。在这项工作中,我们系统地研究了在指令微调损失中对提示 token 和回复 token 赋予不同权重的影响,并提出加权指令微调(WIT)作为常规指令微调的更优替代方案。通过在五个不同系列和规模的语言模型、三个不同大小的微调数据集以及五个多样化的评测基准上进行大量实验,我们表明标准的指令微调损失往往带来次优的性能,且对输入提示变化的鲁棒性有限。我们发现,对提示 token 采用低到中等的权重,并对回复 token 采用中等到较高的权重,可以在各种设置下得到表现最好的模型,并且也能为后续的偏好对齐训练提供更好的起点。这些发现凸显了重新审视指令微调损失的必要性,并为开发更稳健、更具泛化能力的模型提供了可操作的见解。我们的代码开源于 https://github.com/kowndinya-renduchintala/WIT。
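
The idea of weighting prompt and response tokens differently can be sketched as a weighted average of per-token negative log-likelihoods. The function name, the default weights, and the choice to normalize by the total weight are assumptions for this illustration; the released WIT code may normalize differently.

```python
import numpy as np

def weighted_instruction_loss(token_nll, is_prompt, prompt_weight=0.2, response_weight=1.0):
    """Weighted mean of per-token negative log-likelihoods.

    token_nll : per-token NLL values for one sequence.
    is_prompt : boolean mask, True where the token belongs to the prompt.
    Setting prompt_weight=0 and response_weight=1 recovers the conventional
    response-only instruction tuning loss.
    """
    nll = np.asarray(token_nll, dtype=float)
    prompt_mask = np.asarray(is_prompt, dtype=bool)
    weights = np.where(prompt_mask, prompt_weight, response_weight)
    total = weights.sum()
    if total == 0:
        raise ValueError("at least one token must carry non-zero weight")
    # Assumption: normalize by the total weight so the loss scale stays comparable.
    return float((weights * nll).sum() / total)

# Hypothetical sequence: two prompt tokens followed by two response tokens.
conventional = weighted_instruction_loss([2.0, 2.0, 1.0, 1.0], [True, True, False, False],
                                         prompt_weight=0.0, response_weight=1.0)
```

Sweeping `prompt_weight` and `response_weight` over a small grid would mirror the kind of comparison the abstract describes between low-to-moderate prompt weights and moderate-to-high response weights.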


Article 52

Title@2025-07-10 (4): Pay Attention to Attention Distribution: A New Local Lipschitz Bound for Transformers

Title: Pay Attention to Attention Distribution: A New Local Lipschitz Bound for Transformers Achten Sie auf Aufmerksamkeit Verteilung: Eine neue lokale Lipschitz Bound für Transformatoren 注意注意力分布:Transformer 的新局部 Lipschitz 界 2507.07814v1

Authors (4): Nikolay Yudin, Alexander Gaponov, Sergei Kudriashov, Maxim Rakhuba

We present a novel local Lipschitz bound for self-attention blocks of transformers. This bound is based on a refined closed-form expression for the spectral norm of the softmax function. The resulting bound is not only more accurate than in the prior art, but also unveils the dependence of the Lipschitz constant on attention score maps. Based on the new findings, we suggest an explanation of the way distributions inside the attention map affect the robustness from the Lipschitz constant perspective. We also introduce a new lightweight regularization term called JaSMin (Jacobian Softmax norm Minimization), which boosts the transformer’s robustness and decreases local Lipschitz constants of the whole network.

我们展示了一个新的本地 Lipschitz , 用于装配自闭式变压器。 这个绑定基于软式函数光谱规范的精细封闭式表达方式。 由此形成的绑定不仅比先前的艺术更准确, 也暴露了Lipschitz 常数对关注度计分图的依赖性。 根据新的发现, 我们建议解释注意地图内的分布方式如何影响利普施茨常数的稳健性。 我们还引入了一个新的轻度正规化术语, 名为 JaSMin (Jacobian Softmax 规范最小化) , 以提升变压器的稳健性, 并降低整个网络的本地 Lipschitz 常数 。
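
To see why the attention distribution matters here, the following sketch numerically estimates the spectral norm of the softmax Jacobian, using the standard identity J = diag(p) - p pᵀ for a softmax output p. It is a power-iteration illustration only, not the closed-form expression or the JaSMin regularizer from the paper; the function names and the fixed start vector are assumptions of this sketch.

```python
import math

def softmax(scores):
    """Numerically stable softmax of a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def jacobian_times(p, v):
    """Multiply the softmax Jacobian diag(p) - p p^T by a vector v."""
    dot = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * (vi - dot) for pi, vi in zip(p, v)]

def softmax_jacobian_spectral_norm(p, iters=200):
    """Estimate the largest eigenvalue of the (symmetric, PSD) Jacobian.

    Power iteration from a fixed start vector; in rare cases the start
    vector could be orthogonal to the top eigenvector.
    """
    v = [(-1) ** i * (i + 1.0) for i in range(len(p))]
    for _ in range(iters):
        w = jacobian_times(p, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:
            return 0.0  # Jacobian maps v to zero, e.g. a one-hot distribution
        v = [x / norm for x in w]
    w = jacobian_times(p, v)
    return sum(a * b for a, b in zip(v, w))  # Rayleigh quotient of unit v

uniform = softmax_jacobian_spectral_norm(softmax([0.0, 0.0, 0.0, 0.0]))
two_way = softmax_jacobian_spectral_norm([0.5, 0.5, 0.0, 0.0])
one_hot = softmax_jacobian_spectral_norm([1.0, 0.0, 0.0, 0.0])
```

In this example a uniform distribution over four entries gives 0.25, mass split evenly over two entries gives 0.5, and a one-hot distribution gives 0, which shows how the shape of the attention map changes the local sensitivity of softmax.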


Article 53

Title@2025-07-10 (4): “I am bad”: Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models

Title: “I am bad”: Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models “I am bad”: Verdolmetschen von Stealthy, Universal und Robust Audio Jailbreaks in Audio-Language-Modellen “我是坏人”:在音频语言模型中解释隐形、通用和强势音频牢房破损 2502.00718v2

Authors (3): Isha Gupta, David Khachaturov, Robert Mullins

The rise of multimodal large language models has introduced innovative human-machine interaction paradigms but also significant challenges in machine learning safety. Audio-Language Models (ALMs) are especially relevant due to the intuitive nature of spoken communication, yet little is known about their failure modes. This paper explores audio jailbreaks targeting ALMs, focusing on their ability to bypass alignment mechanisms. We construct adversarial perturbations that generalize across prompts, tasks, and even base audio samples, demonstrating the first universal jailbreaks in the audio modality, and show that these remain effective in simulated real-world conditions. Beyond demonstrating attack feasibility, we analyze how ALMs interpret these audio adversarial examples and reveal them to encode imperceptible first-person toxic speech - suggesting that the most effective perturbations for eliciting toxic outputs specifically embed linguistic features within the audio signal. These results have important implications for understanding the interactions between different modalities in multimodal models, and offer actionable insights for enhancing defenses against adversarial audio attacks.

多式大型语言模式的兴起引入了创新的人类机器互动模式,但也带来了机器学习安全方面的重大挑战。由于口语交流的直观性质,音频-语言模型(ALMs)特别相关,但对其失败模式知之甚少。本文探讨了针对ALMs的音频突破,重点是其绕过校准机制的能力。我们构建了贯穿于各种提示、任务甚至基本音频样本的对立干扰,展示了音频模式中的第一个普遍突破,并表明这些在模拟现实世界条件下依然有效。我们分析了ALMs如何解释这些音频对立实例,并揭示了这些实例,以编码无法察觉的第一人次有毒言论—-这表明,在诱导出有毒产出时,最有效的干扰具体地将语言特征嵌入音频信号中。这些结果对理解多种模式之间互动有着重要影响,并提供了可操作的洞察力,以加强对对抗对敌声攻击的防御。


Article 54

Title@2025-07-10 (4): Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks

Title: Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks Tiefe Überlebensanalyse in multimodalen medizinischen Daten: Ein parametrischer und probabilistischer Ansatz mit kompetitiven Risiken 多模式医疗数据深度生存分析:与相竞风险的参数和概率分析方法 2507.07804v1

Authors (5): Alba Garrido, Alejandro Almodóvar, Patricia A. Apellániz, Juan Parras, Santiago Zazo

Accurate survival prediction is critical in oncology for prognosis and treatment planning. Traditional approaches often rely on a single data modality, limiting their ability to capture the complexity of tumor biology. To address this challenge, we introduce a multimodal deep learning framework for survival analysis capable of modeling both single-risk and competing-risks scenarios, evaluating the impact of integrating multiple medical data sources on survival predictions. We propose SAMVAE (Survival Analysis Multimodal Variational Autoencoder), a novel deep learning architecture designed for survival prediction that integrates six data modalities: clinical variables, four molecular profiles, and histopathological images. SAMVAE leverages modality-specific encoders to project inputs into a shared latent space, enabling robust survival prediction while preserving modality-specific information. Its parametric formulation enables the derivation of clinically meaningful statistics from the output distributions, providing patient-specific insights through interactive multimedia that contribute to more informed clinical decision-making and establish a foundation for interpretable, data-driven survival analysis in oncology. We evaluate SAMVAE on two cancer cohorts, breast cancer and lower-grade glioma, applying tailored preprocessing, dimensionality reduction, and hyperparameter optimization. The results demonstrate the successful integration of multimodal data for both standard survival analysis and competing-risks scenarios across different datasets. Our model achieves competitive performance compared to state-of-the-art multimodal survival models. Notably, this is the first parametric multimodal deep learning architecture to incorporate competing risks while modeling continuous time to a specific event, using both tabular and image data.

在预测和治疗规划的肿瘤学中,准确的存活预测至关重要。传统方法往往依赖单一的数据模式,限制了它们捕捉肿瘤生物学复杂性的能力。为了应对这一挑战,我们为生存分析引入一个多式深层次学习框架,能够建模单一和相互竞争的风险假设情景,评价综合多种医疗数据源对生存预测的影响。我们提议SAMVAE(Survival分析多模式自动编码),这是为生存预测设计的新颖的深层次学习结构,它综合了六个数据模式:临床变量、四个分子剖面图和病理学图像。SAMVAE利用具体事件模式将投入投射到共同的潜在空间,从而能够在保存模式特定信息的同时进行可靠的生存预测。它的参数设计有助于从产出分布中得出具有临床意义的统计数据,通过互动的多媒体提供病人特有的见解,有助于更知情的临床决策,并为在肿瘤学中进行可解释的、数据驱动的存活分析奠定基础。我们评估了两个癌症组、4个分子剖面剖面图和较低品级结构模型,同时将高层次数据模型纳入一个经过升级的磁度数据模型,以进行成功的磁度数据分析。


Article 55

Title@2025-07-10 (4): Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning

Title: Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning Kontextuelle Banditen in der Zahlungsabwicklung: Nicht einheitliche Exploration und überwachtes Lernen 付款处理:非统一探索和监督学习 2412.00569v2

Authors (2): Akhila Vangara, Alex Egg

Uniform random exploration in decision-making systems supports off-policy learning via supervision but incurs high regret, making it impractical for many applications. Conversely, non-uniform exploration offers better immediate performance but lacks support for off-policy learning. Recent research suggests that regression oracles can bridge this gap by combining non-uniform exploration with supervised learning. In this paper, we analyze these approaches within a real-world industrial context at Adyen, a large global payments processor characterized by batch-logged delayed feedback, short-term memory, and dynamic action spaces under the Empirical Risk Minimization (ERM) framework. Our analysis reveals that while regression oracles significantly improve performance, they introduce challenges due to rigid algorithmic assumptions. Specifically, we observe that as a policy improves, subsequent generations may perform worse due to shifts in the reward distribution and increased class imbalance in the training data. This degradation occurs despite improvements in other aspects of the training data, leading to decreased performance in successive policy iterations. We further explore the long-term impact of regression oracles, identifying a potential “oscillation effect.” This effect arises when regression oracles influence probability estimates and the realizability of subsequent policy models, leading to fluctuations in performance across iterations. Our findings highlight the need for more adaptable algorithms that can leverage the benefits of regression oracles without introducing instability in policy performance over time.

决策系统中的统一随机探索有助于通过监督进行政策外学习,但非常遗憾,使许多应用不切实际。相反,非统一勘探能提供更好的即时业绩,但缺乏对政策外学习的支持。最近的研究表明,回归或奇迹可以通过将非统一探索与监督学习相结合来弥补这一差距。在本文中,我们在Adyen这个大型全球支付进程机构,在现实世界的工业环境中分析这些办法,其特点是分批记录延迟反馈、短期记忆和风险最小化(ERM)框架下的动态行动空间。我们的分析表明,虽然回归或奇迹大大改善了业绩,但它们带来了因僵硬的算法假设而产生的挑战。具体地说,我们观察到,随着政策的改进,后几代人可能因奖励分配的变化和培训数据中的班级不平衡加剧。尽管培训数据的其他方面有所改进,导致连续政策反复变化的绩效下降。我们进一步探索回归或奇迹的长期影响,并找出潜在的“振荡效应 ” 。 当回归或奇迹带来这种影响时,当回归或奇迹影响着我们政策性变现的概率时,则会影响我们的政策变后变的演的周期性结果。


Article 56

Title@2025-07-10 (4): Space-Filling Regularization for Robust and Interpretable Nonlinear State Space Models

Title: Space-Filling Regularization for Robust and Interpretable Nonlinear State Space Models Raumfüllende Regularisierung für robuste und interpretierbare nichtlineare State Space Modelle 强力和可解释的非线性国家空间模型的空间巡空常规化 2507.07792v1

Authors (3): Hermann Klein, Max Heinz Herkersdorf, Oliver Nelles

The state space dynamics representation is the most general approach for nonlinear systems and is often chosen for system identification. During training, the state trajectory can deform significantly, leading to poor data coverage of the state space. This can cause significant issues for space-oriented training algorithms that rely on, for example, grid structures or tree partitioning. Besides hindering training, significant state trajectory deformations also degrade interpretability and robustness. This paper proposes a new type of space-filling regularization that ensures a favorable data distribution in state space by introducing a data-distribution-based penalty. This method is demonstrated in local model network architectures where good interpretability is a major concern. The proposed approach integrates ideas from modeling and design of experiments for state space structures. For this reason, we present two regularization techniques for the data point distributions of the state trajectories for local affine state space models. Beyond that, we demonstrate the results on a widely known system identification benchmark.

国家空间动态代表是非线性系统最普遍的方法,而且往往被选用于系统识别。在培训期间,国家轨迹可以显著地变形,导致国家空间数据覆盖不足。这可能会给空间导向培训算法带来重大问题,如依赖电网结构、树分隔或类似方法。除了妨碍培训外,重大国家轨迹变形还恶化了可解释性和稳健性特性。本文件提出了一种新的空间填充正规化模式,通过引入基于数据分布的处罚,确保国家空间的有利数据分布。这种方法体现在当地模型网络结构中,良好的可解释性是一个主要关注问题。拟议方法将国家空间结构实验模型和设计中的想法综合在一起。这就是为什么我们为当地扇形州空间模型的数据点分布提出两种正规化技术。除此之外,我们展示了广为人知的系统识别基准的结果。


Article 57

Title@2025-07-10 (4): Understanding Chain-of-Thought in LLMs through Information Theory

Title: Understanding Chain-of-Thought in LLMs through Information Theory Verständnis der in LLMs durch Informationstheorie gesuchten Gedankenkette 通过信息理论在LLM 中探索了解链 2411.11984v2

Authors (3): Jean-Francois Ton, Muhammad Faaiz Taufiq, Yang Liu

Large Language Models (LLMs) have shown impressive performance in complex reasoning tasks through the use of Chain-of-Thought (CoT) reasoning, allowing models to break down problems into manageable sub-tasks. However, existing CoT evaluation techniques either require annotated CoT data or fall short in accurately assessing intermediate reasoning steps, leading to high rates of false positives. In this paper, we formalize CoT reasoning in LLMs through an information-theoretic lens. Specifically, our framework quantifies the “information gain” at each reasoning step, enabling the identification of failure modes in LLMs without the need for expensive annotated datasets. We demonstrate the efficacy of our approach through extensive experiments on toy arithmetic, GSM8K and PRM800k datasets, where it significantly outperforms existing outcome-based methods by providing more accurate insights into model performance on individual subtasks.

大型语言模型(LLMs)通过使用“Cott”推理,在复杂的推理任务中表现出了令人印象深刻的成绩,使模型能够将问题分为可管理的子任务,然而,现有的Cot评价技术要么需要附加注释的COT数据,要么在准确评估中间推理步骤方面做得不够,导致出现高比例的假正数。在本文件中,我们通过信息理论透镜将LLMs中的Cot推理工作正式化。具体地说,我们的框架对每个推理步骤的“信息收益”进行了量化,使得LLOMs的失败模式得以识别,而无需花费昂贵的附加注释数据集。我们通过微量算术、GSM8K和PRM800k数据集的广泛实验,展示了我们的方法的有效性,在这些实验中,它通过提供对单个子任务模型性能的更准确的洞察力,大大超过现有的基于结果的方法。


Article 58

Title@2025-07-10 (4): Unsupervised Automata Learning via Discrete Optimization

Title: Unsupervised Automata Learning via Discrete Optimization Unüberwachtes Automata-Lernen über Diskrete Optimierung 通过 Discrete 优化化学习不受监督的自动自动数据 2303.14111v2

Authors (8): Simon Lutz, Daniil Kaminskyi, Florian Wittbold, Simon Dierl, Falk Howar, Barbara König, Emmanuel Müller, Daniel Neider

Automata learning is a successful tool for many application domains such as robotics and automatic verification. Typically, automata learning techniques operate in a supervised learning setting (active or passive) where they learn a finite state machine in contexts where additional information, such as labeled system executions, is available. However, other settings, such as learning from unlabeled data - an important aspect in machine learning - remain unexplored. To overcome this limitation, we propose a framework for learning a deterministic finite automaton (DFA) from a given multi-set of unlabeled words. We show that this problem is computationally hard and develop three learning algorithms based on constraint optimization. Moreover, we introduce novel regularization schemes for our optimization problems that improve the overall interpretability of our DFAs. Using a prototype implementation, we demonstrate practical feasibility in the context of unsupervised anomaly detection.

Automata 学习是许多应用领域的成功工具, 如机器人和自动验证。 通常, Automata 学习技术在有监督的学习环境中( 主动或被动) 运行, 在有额外信息的情况下学习一定的状态机器, 如标签系统执行。 但是, 其它设置, 如从未标的数据中学习 — — 机器学习的一个重要方面 — — 仍然没有被探索。 为了克服这一限制, 我们提出了一个框架, 用于从一个特定多组未标词中学习确定性的有限自动测量( DFA ) 。 我们显示, 这个问题在计算上是困难的, 并开发了三种基于限制优化的学习算法。 此外, 我们为优化问题引入了新颖的正规化计划, 以提高我们 DFAs 的总体可解释性。 使用原型实施, 我们展示了在未经监控的异常探测背景下的实际可行性 。


Article 59

Title@2025-07-10 (4): Learning Algorithms in the Limit

Title: Learning Algorithms in the Limit Algorithmen lernen an der Grenze 在限制范围内学习算法 2506.15543v2

Authors (2): Hristo Papazov, Nicolas Flammarion

This paper studies the problem of learning computable functions in the limit by extending Gold’s inductive inference framework to incorporate computational observations and restricted input sources. Complementary to the traditional Input-Output Observations, we introduce Time-Bound Observations and Policy-Trajectory Observations to study the learnability of general recursive functions under more realistic constraints. While input-output observations do not suffice for learning the class of general recursive functions in the limit, we overcome this learning barrier by imposing computational complexity constraints or supplementing with approximate time-bound observations. Further, we build a formal framework around observations of computational agents and show that learning computable functions from policy trajectories reduces to learning rational functions from input and output, thereby revealing interesting connections to finite-state transducer inference. On the negative side, we show that computable or polynomial-mass characteristic sets cannot exist for the class of linear-time computable functions even for policy-trajectory observations.

本文通过扩展Gold的感性推论框架以纳入 \ textit{ computation 观察} 和\ textit{ remitted 投入源} 来研究限制范围内可计算函数的问题。 与传统的输入- 输出观察相匹配, 我们引入时间- 循环观察, 和政策- 方向观察, 以研究在更现实的限制下可学习一般再现函数。 虽然输入- 产出观察不足以学习限制内一般循环函数的类别, 但我们通过强加计算复杂性限制或补充近似有时限的观察来克服这一学习障碍。 此外, 我们围绕对\ textit{ computation 代理的观察建立一个正式框架, 并表明从政策轨道学习可比较功能会减少从输入和输出中学习合理功能, 从而揭示与有限状态导师推断的有趣联系。 在负面方面, 我们显示, 即使在政策- 轨道观察 也不可能在线性时间可调函数类别中存在可比较或多质- 特征组。


Article 60

Title@2025-07-10 (4): Approximation Depth of Convex Polytopes

Title: Approximation Depth of Convex Polytopes Näherungstiefe von Konvex-Polytopen 电解多面的近似深度 2507.07779v1

Authors (3): Egor Bakaev, Florestan Brunck, Amir Yehudayoff

We study approximations of polytopes in the standard model for computing polytopes using Minkowski sums and (convex hulls of) unions. Specifically, we study the ability to approximate a target polytope by polytopes of a given depth. Our main results imply that simplices can only be “trivially approximated”. On the way, we obtain a characterization of simplices as the only “outer additive” convex bodies.

我们用Minkowski总和和(convex boards of) 工会来研究计算多式计算机标准模型中多式计算机的近似值。 具体地说, 我们研究用某种深度的多式计算机来近似目标多式计算机的能力。 我们的主要结果表明, 单体只能“ 略微接近 ” 。 在路上, 我们获得了“ 单体” 的描述, 认为它是唯一的“ 外添加” 的共体 。
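
The two building operations named in the abstract, Minkowski sums and convex hulls of unions, can be sketched for polygons in the plane. This is an illustrative 2D sketch using Andrew's monotone chain for the hull; the function names are assumptions, and the code does not compute the depth or approximation quantities studied in the paper.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of the convex hull in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def minkowski_sum(P, Q):
    """Minkowski sum of two convex polygons, as the hull of pairwise vertex sums."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])

def hull_of_union(P, Q):
    """Convex hull of the union of two polygons given by their vertices."""
    return convex_hull(list(P) + list(Q))

segment_x = [(0, 0), (1, 0)]
segment_y = [(0, 0), (0, 1)]

square = minkowski_sum(segment_x, segment_y)    # the unit square
triangle = hull_of_union(segment_x, segment_y)  # a right triangle
```

Starting from the same two segments, the Minkowski sum yields the unit square while the hull of the union yields a triangle (a 2D simplex), which illustrates how the two operations generate different polytopes.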


Article 61

Title@2025-07-10 (4): Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training

Title: Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training Aufgabenverhalten synchronisieren: Mehrere Aufgaben während der Test-Time-Schulung ausrichten 同步任务行为: 测试时训练中对齐多个任务 2507.07778v1

Authors (4): Wooseong Jeong, Jegyeong Cho, Youngho Yoon, Kuk-Jin Yoon

Generalizing neural networks to unseen target domains is a significant challenge in real-world deployments. Test-time training (TTT) addresses this by using an auxiliary self-supervised task to reduce the domain gap caused by distribution shifts between the source and target. However, we find that when models are required to perform multiple tasks under domain shifts, conventional TTT methods suffer from unsynchronized task behavior, where the adaptation steps needed for optimal performance in one task may not align with the requirements of other tasks. To address this, we propose a novel TTT approach called Synchronizing Tasks for Test-time Training (S4T), which enables the concurrent handling of multiple tasks. The core idea behind S4T is that predicting task relations across domain shifts is key to synchronizing tasks during test time. To validate our approach, we apply S4T to conventional multi-task benchmarks, integrating it with traditional TTT protocols. Our empirical results show that S4T outperforms state-of-the-art TTT methods across various benchmarks.

将神经网络推广到看不见的目标领域是现实世界部署中的一项重大挑战。 测试时间培训(TTT)通过使用辅助性自我监督的任务来解决这个问题,以减少源和目标之间分配变化造成的领域差距。 然而,我们发现,当模型需要在域转移下执行多项任务时,传统的TTT方法会遇到不同步的任务行为,在其中一项任务中,最佳业绩所需的适应步骤可能不符合其他任务的要求。 为了解决这个问题,我们提议了一种叫作测试时间培训同步任务(S4T)的新型TTTT方法,这可以同时处理多项任务。 S4T背后的核心思想是,预测跨域变化的任务关系是在试验期间同步任务的关键。为了验证我们的方法,我们将S4T应用到常规的多任务基准中,将它与传统的TT协议结合起来。我们的经验结果表明,S4T在各种基准中超越了最新的TTT方法。


Article 62

Title@2025-07-10 (4): Deep Learning is Not So Mysterious or Different

Title: Deep Learning is Not So Mysterious or Different Deep Learning ist nicht so geheimnisvoll oder anders 深深学习不是那么神秘或不同 2503.02113v2

Authors (1): Andrew Gordon Wilson

Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. We argue that these phenomena are not distinct to neural networks, or particularly mysterious. Moreover, this generalization behaviour can be intuitively understood, and rigorously characterized, using long-standing generalization frameworks such as PAC-Bayes and countable hypothesis bounds. We present soft inductive biases as a key unifying principle in explaining these phenomena: rather than restricting the hypothesis space to avoid overfitting, embrace a flexible hypothesis space, with a soft preference for simpler solutions that are consistent with the data. This principle can be encoded in many model classes, and thus deep learning is not as mysterious or different from other model classes as it might seem. However, we also highlight how deep learning is relatively distinct in other ways, such as its ability for representation learning, phenomena such as mode connectivity, and its relative universality.

纵深神经网络往往被视为不同于其他模型类别,无视常规的笼统概念。流行的超常行为的例子包括良性超称、双向下降和超平衡的成功。我们争论说,这些现象并不有别于神经网络,或特别神秘。此外,这种一般化行为可以直觉地理解和严格定性,使用诸如PAC-Bayes和可计算的假设界限等长期的概括化框架和可计算假设界限。我们在解释这些现象时提出软性暗示偏见作为关键的统一原则:而不是限制假说空间以避免过度适应,而接受灵活的假设空间,同时软性地偏好与数据一致的更简单的解决办法。这一原则可以在许多模型类中进行编码,因此深层次的学习不像其他模型类那样神秘或不同。然而,我们也强调深层次的学习在其他方面是相对不同的,例如代表性学习能力、模式连接等现象及其相对普遍性。


Article 63

Title@2025-07-10 (4): A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision

Title: A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision Ein einheitliches empirisches Risikominimierungs-Framework für flexible N-Tuples Schwache Überwachung 灵活N-Tuples弱监督统一经验风险最小化框架 2507.07771v1

Authors (4): Shuying Huang, Junpeng Li, Changchun Hua, Yana Yang

To alleviate the annotation burden in supervised learning, N-tuples learning has recently emerged as a powerful weakly-supervised method. While existing N-tuples learning approaches extend pairwise learning to higher-order comparisons and accommodate various real-world scenarios, they often rely on task-specific designs and lack a unified theoretical foundation. In this paper, we propose a general N-tuples learning framework based on empirical risk minimization, which systematically integrates pointwise unlabeled data to enhance learning performance. This paper first unifies the data generation processes of N-tuples and pointwise unlabeled data under a shared probabilistic formulation. Based on this unified view, we derive an unbiased empirical risk estimator that generalizes a broad class of existing N-tuples models. We further establish a generalization error bound for theoretical support. To demonstrate the flexibility of the framework, we instantiate it in four representative weakly supervised scenarios, each recoverable as a special case of our general model. Additionally, to address overfitting issues arising from negative risk terms, we adopt correction functions to adjust the empirical risk. Extensive experiments on benchmark datasets validate the effectiveness of the proposed framework and demonstrate that leveraging pointwise unlabeled data consistently improves generalization across various N-tuples learning tasks.

为了减轻监督学习过程中的说明负担,N-tuples学习最近作为一种强大的薄弱监督方法出现。虽然现有的N-tuples学习方法将双向学习扩展至更高层次的比较,并适应各种现实世界情景,但它们往往依赖特定任务的设计,缺乏统一的理论基础。在本文件中,我们提出了一个基于经验风险最小化的通用N-tuples学习框架,该框架系统地整合了点性非标签数据,以提高学习绩效。本文件首先将N-tuples和点性非标签数据的数据生成进程统一起来,根据这种统一的观点,我们产生了一个公正的实证风险估算器,它概括了现有N-tuples模型的广泛类别。我们进一步确立了一个用于理论支持的概括错误。为了展示该框架的灵活性,我们在四个有代表性的弱监督情景中将其实例化,每个情景都可以作为我们一般模型的一个特例得到恢复。此外,为了解决由负面风险项引起的过拟合问题,我们采用了纠正功能,以调整经验风险。在基准数据集上的大量实验验证了所提框架的有效性,并表明利用点性非标签数据能够在各类N-tuples学习任务中持续提升泛化性能。


Article 64

Title@2025-07-10 (4): BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning

Title: BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning BEAVER: Bauen von Umgebungen mit einschätzbarer Variation zur Bewertung von multi-objektiven Verstärkungslernen BEAVER: 在环境建设中采用可评估的变数评估多目标强化学习 2507.07769v1

Authors (3): Ruohong Liu, Jack Umenberger, Yize Chen

Recent years have seen significant advancements in designing reinforcement learning (RL)-based agents for building energy management. While individual success is observed in simulated or controlled environments, the scalability of RL approaches in terms of efficiency and generalization across building dynamics and operational scenarios remains an open question. In this work, we formally characterize the generalization space for the cross-environment, multi-objective building energy management task, and formulate the multi-objective contextual RL problem. Such a formulation helps understand the challenges of transferring learned policies across varied operational contexts such as climate and heat convection dynamics under multiple control objectives such as comfort level and energy consumption. We provide a principled way to parameterize such contextual information in realistic building RL environments, and construct a novel benchmark to facilitate the evaluation of generalizable RL algorithms in practical building control tasks. Our results show that existing multi-objective RL methods are capable of achieving reasonable trade-offs between conflicting objectives. However, their performance degrades under certain environment variations, underscoring the importance of incorporating dynamics-dependent contextual information into the policy learning process.

近年来,在设计建筑能源管理的强化学习(RL)代理物方面取得了显著进展。虽然在模拟或受控环境中观察到个别成功,但在建筑动态和操作设想中,在效率和通用方面,RL方法的可扩展性仍是一个未决问题。在这项工作中,我们正式确定跨环境、多目标建筑能源管理任务的通用空间,并拟订多目标环境RL问题。这种表述有助于理解在多种控制目标(如舒适程度和能源消耗)下,在气候和热对流动态等不同业务背景下转移已学政策的挑战。我们提供了一个原则性方法,将这种背景信息在现实建筑环境中进行参数化,并建立一个新的基准,以便利评估实际建筑控制任务中可通用的RL算法。我们的成果显示,现有的多目标RL方法能够在相互冲突的目标之间实现合理的权衡。然而,其性能在某些环境变化下退化,强调将动态背景信息纳入政策学习过程的重要性。


Article 65

Title@2025-07-10 (4): TRIX- Trading Adversarial Fairness via Mixed Adversarial Training

Title: TRIX- Trading Adversarial Fairness via Mixed Adversarial Training TRIX- Trading-Adversarial Fairness durch gemischte Adversarial Training TRIX-通过混合反向培训进行贸易反向公平 2507.07768v1

Authors (3): Tejaswini Medi, Steffen Jung, Margret Keuper

Adversarial Training (AT) is a widely adopted defense against adversarial examples. However, existing approaches typically apply a uniform training objective across all classes, overlooking disparities in class-wise vulnerability. This results in adversarial unfairness: classes with well-distinguishable features (strong classes) tend to become more robust, while classes with overlapping or shared features (weak classes) remain disproportionately susceptible to adversarial attacks. We observe that strong classes do not require strong adversaries during training, as their non-robust features are quickly suppressed. In contrast, weak classes benefit from stronger adversaries to effectively reduce their vulnerabilities. Motivated by this, we introduce TRIX, a feature-aware adversarial training framework that adaptively assigns weaker targeted adversaries to strong classes, promoting feature diversity via uniformly sampled targets, and stronger untargeted adversaries to weak classes, enhancing their focused robustness. TRIX further incorporates per-class loss weighting and perturbation strength adjustments, building on prior work, to emphasize weak classes during the optimization. Comprehensive experiments on standard image classification benchmarks, including evaluations under strong attacks such as PGD and AutoAttack, demonstrate that TRIX significantly improves worst-case class accuracy on both clean and adversarial data, reducing inter-class robustness disparities while preserving overall accuracy. Our results highlight TRIX as a practical step toward fair and effective adversarial defense.

反对立培训(AT)是被广泛接受的对抗性例子的辩护理由,然而,现有办法通常对所有班级都适用统一的培训目标,忽略了阶级脆弱性的差异。这导致对抗性不公平:具有明显不同特点的班级(强班级)往往变得更强,而具有重叠或共享特点的班级(弱班级)则特别容易受到敌对性攻击。我们注意到,强班级在培训期间不需要强大的对手,因为其非野蛮特征很快被压制。相比之下,弱班级受益于强势对手,以有效减少其脆弱性。我们为此引入了有特色的对抗性培训框架TRIX,这种有特色的有意识的对抗性培训框架将较弱的对手派到强班级,通过统一的抽样目标促进特色多样性,而较弱班级(弱班级)则更强的非目标对手,从而增强其集中性强势。TRIX还进一步吸收了每班级损失加权和扰动强度的强度调整,以强调优化班级。关于标准形象分类基准的全面实验,包括在强攻势攻击(如PGD和AutAtalAttack)下进行的评估,显示TRIX大幅改进了我们对立性判断性最强的等级之间的准确性判断性。


Article 66

Title@2025-07-10 (4): Distributed and Decentralised Training: Technical Governance Challenges in a Shifting AI Landscape

Title: Distributed and Decentralised Training: Technical Governance Challenges in a Shifting AI Landscape Verteilte und dezentralisierte Ausbildung: Technische Governance-Herausforderungen in einer sich verändernden KI-Landschaft 分散和分散化培训:AI 横向变化中的技术治理挑战 2507.07765v1

Authors (3): Jakub Kryś, Yashvardhan Sharma, Janet Egan

Advances in low-communication training algorithms are enabling a shift from centralised model training to compute setups that are either distributed across multiple clusters or decentralised via community-driven contributions. This paper distinguishes these two scenarios - distributed and decentralised training - which are little understood and often conflated in policy discourse. We discuss how they could impact technical AI governance through an increased risk of compute structuring, capability proliferation, and the erosion of detectability and shutdownability. While these trends foreshadow a possible new paradigm that could challenge key assumptions of compute governance, we emphasise that certain policy levers, like export controls, remain relevant. We also acknowledge potential benefits of decentralised AI, including privacy-preserving training runs that could unlock access to more data, and mitigating harmful power concentration. Our goal is to support more precise policymaking around compute, capability proliferation, and decentralised AI development.

低通信培训算法的进展正促使从集中模式培训转向计算分布于多个集群或分散于社区驱动的组合,本文件区分了这两种情景—-分布式和分散式培训—-在政策讨论中很少理解,而且往往混杂在一起;我们讨论了它们如何通过增加计算结构、能力扩散以及可探测性和可关闭性侵蚀的风险来影响技术自主治理;这些趋势预示着一种可能的新模式,可能挑战计算治理的关键假设,但我们强调,某些政策杠杆,如出口管制,仍然具有相关性;我们也承认分散式的AI的潜在好处,包括保密培训,这些培训能够解开获取更多数据的途径,并减轻有害的权力集中;我们的目标是支持围绕计算、能力扩散和分散式的AI开发进行更精确的决策。


Article 67

Title@2025-07-10 (4): Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Title: Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model Squeeze the Soaked Sponge: Effiziente Off-Policy-Verstärkung Feinsteuerung für großes Sprachmodell 挤压海绵:高效非政策强化大语言模式的高效非政策改进微调 2507.06892v2

Authors (8): Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao

Reinforcement Learning (RL) has demonstrated its potential to improve the reasoning ability of Large Language Models (LLMs). One major limitation of most existing Reinforcement Finetuning (RFT) methods is that they are on-policy RL in nature, i.e., data generated during the past learning process is not fully utilized. This inevitably comes at a significant cost of compute and time, posing a stringent bottleneck on continuing economic and efficient scaling. To this end, we launch the renaissance of off-policy RL and propose Reincarnating Mix-policy Proximal Policy Gradient (ReMix), a general approach to enable on-policy RFT methods like PPO and GRPO to leverage off-policy data. ReMix consists of three major components: (1) Mix-policy proximal policy gradient with an increased Update-To-Data (UTD) ratio for efficient training; (2) KL-Convex policy constraint to balance the trade-off between stability and flexibility; (3) Policy reincarnation to achieve a seamless transition from efficient early-stage learning to steady asymptotic improvement. In our experiments, we train a series of ReMix models upon PPO, GRPO and 1.5B, 7B base models. ReMix shows an average Pass@1 accuracy of 52.10% (for 1.5B model) with 0.079M response rollouts, 350 training steps and achieves 63.27%/64.39% (for 7B model) with 0.007M/0.011M response rollouts, 50/75 training steps, on five math reasoning benchmarks (i.e., AIME’24, AMC’23, Minerva, OlympiadBench, and MATH500). Compared with 15 recent advanced models, ReMix shows SOTA-level performance with an over 30x to 450x reduction in training cost in terms of rollout data volume. In addition, we reveal insightful findings via multifaceted analysis, including the implicit preference for shorter responses due to the Whipping Effect of off-policy discrepancy, the collapse mode of self-reflection behavior under the presence of severe off-policyness, etc.

强化学习(RL)展示了它提高大语言模型(LLM)推理能力的潜力。多数现有的强化微调(RFT)方法的一个主要局限性是,这些方法在性质上是同策略(on-policy)RL,即过去学习过程中生成的数据没有得到充分利用。这不可避免地要付出巨大的计算和时间代价,对持续的经济和高效规模造成严格的瓶颈。为此,我们启动了离策略(off-policy)RL的复兴,并提出 Reincarnating Mix-policy Proximal Policy Gradient(ReMix),这是一种使 PPO 和 GRPO 等同策略 RFT 方法能够利用离策略数据的一般方法。ReMix 包含三个主要组成部分:(1) Mix-policy 近端策略梯度,通过提高 Update-To-Data(UTD)比率实现高效训练;(2) KL-Convex 策略约束,以平衡稳定性和灵活性之间的权衡;(3) 策略再生(policy reincarnation),实现从高效的早期学习到稳定的渐近改进的平滑过渡。在实验中,我们基于 PPO、GRPO 以及 1.5B、7B 基础模型训练了一系列 ReMix 模型。在五个数学推理基准(AIME’24、AMC’23、Minerva、OlympiadBench 和 MATH500)上,ReMix 的平均 Pass@1 准确率达到 52.10%(1.5B 模型,0.079M 次响应 rollout,350 个训练步),以及 63.27%/64.39%(7B 模型,0.007M/0.011M 次响应 rollout,50/75 个训练步)。与 15 个近期先进模型相比,ReMix 达到 SOTA 水平的性能,同时以 rollout 数据量计的训练成本降低 30 倍以上至 450 倍。此外,我们通过多方面分析揭示了若干发现,包括离策略差异的 Whipping Effect 导致对较短回答的隐式偏好,以及在严重离策略情形下自我反思行为的崩溃模式等。


Article 68

Title@2025-07-10 (4): OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting

Title: OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting OPC: One-Point-Contraction Unlearning Toward Deep Feature Vergessen OPC: 一点-合同拆开学习深地地貌的遗忘 2507.07754v1

Authors (4): Jaeheun Jung, Bosung Jung, Suhyun Bae, Donghun Lee

Machine unlearning seeks to remove the influence of particular data or classes from trained models to meet privacy, legal, or ethical requirements. Existing unlearning methods tend to forget shallowly: an unlearned model pretends to forget by adjusting only the model response, while its internal representations retain enough information to restore the forgotten data or behavior. We empirically confirm this widespread shallowness by reverting the forgetting effect of various unlearning methods via a training-free performance recovery attack and a gradient-inversion-based data reconstruction attack. To address this vulnerability fundamentally, we define a theoretical criterion of “deep forgetting” based on one-point contraction of the feature representations of the data to forget. We also propose an efficient approximation algorithm, and use it to construct a novel general-purpose unlearning algorithm: One-Point-Contraction (OPC). Empirical evaluations on image classification unlearning benchmarks show that OPC achieves not only effective unlearning performance but also superior resilience against both the performance recovery attack and the gradient-inversion attack. The distinctive unlearning performance of OPC arises from the deep feature forgetting enforced by its theoretical foundation, and underscores the need for improved robustness of machine unlearning methods.

现有的不学习方法往往浅显地忘记:一个假装遗忘的未学习模式现象,只对模型反应进行调整,而其内部陈述则保留足够的信息,以恢复被遗忘的数据或行为。我们从经验上确认,通过不培训的恢复性能攻击和梯度反向数据重建攻击,恢复各种不学习方法的忘却效应,是普遍的浅度。为了从根本上解决这一脆弱性问题,我们根据数据特征显示的一点合同界定了“深为忘记”的理论标准。我们还提议了一个有效的近似算法,并用它来构建一个新的通用的不学习算法:一点合同(OPC)。关于图像分类不学习基准的实证评价表明,OPC不仅取得了有效的不学习业绩,而且还提高了对业绩恢复性攻击和梯度反向反向攻击的抗力。OPC的独特不学习表现源于其理论基础所实施的深刻特征,并重新确认需要改进机器不学习方法的稳健性。


Article 69

Title@2025-07-10 (4): Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks

Title: Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks Effiziente und skalierbare Abschätzung der Verteilungseffekte mit multi-Task Neuronalen Netzwerken 与多任务神经神经网络一道高效和可缩放地估算分布式治疗效应 2507.07738v1

Authors (5): Tomu Hirata, Undral Byambadalai, Tatsushi Oka, Shota Yasui, Shingo Uto

We propose a novel multi-task neural network approach for estimating distributional treatment effects (DTE) in randomized experiments. While DTE provides more granular insights into experiment outcomes than conventional methods focusing on the Average Treatment Effect (ATE), estimating it with regression adjustment methods presents significant challenges. Specifically, precision in the distribution tails suffers due to data imbalance, and computational inefficiencies arise from the need to solve numerous regression problems, particularly in the large-scale datasets commonly encountered in industry. To address these limitations, our method leverages multi-task neural networks to estimate conditional outcome distributions while incorporating monotonic shape constraints and multi-threshold label learning to enhance accuracy. To demonstrate its practical effectiveness, we apply the method to both simulated and real-world datasets, including a randomized field experiment aimed at reducing water consumption in the US and a large-scale A/B test from a leading streaming platform in Japan. The experimental results consistently demonstrate superior performance across various datasets, establishing our method as a robust and practical solution for modern causal inference applications requiring a detailed understanding of treatment effect heterogeneity.

在随机实验中,我们建议采用新的多任务神经网络方法来估计分布式处理效应(DTE) 。虽然DTE对以平均治疗效果(ATE)为重点的常规方法的实验结果提供了更多的粒子洞察力,但用回归调整方法来估计它却提出了重大挑战。具体地说,分配尾的精确因数据不平衡而受到影响,计算效率低下是因为需要解决许多回归问题,特别是在工业中常见的大型数据集中。为了解决这些限制,我们的方法利用多任务神经网络来估计有条件的结果分布,同时纳入单一形状限制和多临界标签学习以提高准确性。为了证明我们拟议方法的实际有效性,我们运用了我们的方法来模拟和现实世界数据集,包括旨在减少美国水消耗的随机实地实验和日本主要流平台的大规模A/B测试。实验结果一贯表明各种数据集的优异性表现,确定我们的方法是现代因果关系推论应用的可靠和实用解决办法,需要详细了解治疗效果的遗传性。
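
To make the estimand concrete, here is a minimal sketch of the unadjusted DTE: the difference between the empirical outcome CDFs of the treated and control groups at a grid of thresholds. It is not the multi-task network from the abstract; the function names and the toy data are assumptions chosen for illustration.

```python
from bisect import bisect_right


def empirical_cdf(samples, thresholds):
    """Return the fraction of samples that are <= each threshold."""
    ordered = sorted(samples)
    n = len(ordered)
    return [bisect_right(ordered, y) / n for y in thresholds]


def distributional_treatment_effect(treated, control, thresholds):
    """Unadjusted DTE: treated CDF minus control CDF at each threshold."""
    cdf_treated = empirical_cdf(treated, thresholds)
    cdf_control = empirical_cdf(control, thresholds)
    return [t - c for t, c in zip(cdf_treated, cdf_control)]


# Toy data (hypothetical): treatment shifts every outcome up by one unit.
control = [1, 2, 3, 4]
treated = [2, 3, 4, 5]
dte = distributional_treatment_effect(treated, control, thresholds=[1, 2, 3, 4, 5])
```

Each threshold corresponds to one binary label `1{Y <= y}`, which is presumably what the multi-threshold label learning in the abstract builds on; a regression-adjusted estimator would replace the raw indicator averages with model predictions.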


Article 70

Title@2025-07-10 (4): GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing

Title: GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing GuardVal: Dynamic Large Language Model Jailbreak Evaluation für umfassende Sicherheitstests 警卫:综合安全测试动态大语言示范监狱防爆评价 2507.07735v1

Authors (4): Peiyan Zhang, Haibo Jin, Liying Kang, Haohan Wang

Jailbreak attacks reveal critical vulnerabilities in Large Language Models (LLMs) by causing them to generate harmful or unethical content. Evaluating these threats is particularly challenging due to the evolving nature of LLMs and the sophistication required in effectively probing their vulnerabilities. Current benchmarks and evaluation methods struggle to fully address these challenges, leaving gaps in the assessment of LLM vulnerabilities. In this paper, we review existing jailbreak evaluation practices and identify three assumed desiderata for an effective jailbreak evaluation protocol. To address these challenges, we introduce GuardVal, a new evaluation protocol that dynamically generates and refines jailbreak prompts based on the defender LLM’s state, providing a more accurate assessment of defender LLMs’ capacity to handle safety-critical situations. Moreover, we propose a new optimization method that prevents stagnation during prompt refinement, ensuring the generation of increasingly effective jailbreak prompts that expose deeper weaknesses in the defender LLMs. We apply this protocol to a diverse set of models, from Mistral-7b to GPT-4, across 10 safety domains. Our findings highlight distinct behavioral patterns among the models, offering a comprehensive view of their robustness. Furthermore, our evaluation process deepens the understanding of LLM behavior, leading to insights that can inform future research and drive the development of more secure models.

在大型语言模型(LLMs)中,破狱袭击暴露出巨大的脆弱性,导致其产生有害或不道德的内容。评估这些威胁尤其具有挑战性,因为LLMs的性质不断演变,有效调查其脆弱性需要精密的技巧。目前的基准和评价方法努力充分应对这些挑战,在评估LLM脆弱性方面留下空白。在本文件中,我们审查现有的破狱评估做法,并找出三种假定的缺陷,以制定有效的破狱评估议定书。为了应对这些挑战,我们引入了GuardVal,这是一项新的评价协议,根据捍卫者LLM的状态,积极生成和完善破狱提示,为捍卫者LLMs处理安全危急情况的能力提供了更准确的评估。此外,我们提出了新的优化方法,防止在迅速完善期间出现停滞,确保产生日益有效的破狱提示,暴露捍卫者LMMs的更深处弱点。我们将这一协议应用于从Mistral-7b到GPT-4的10个安全领域的各种模式。我们的调查结果突出了各种模式中的不同行为模式,提供了对其健全性的全面认识。此外,我们的评估进程加深了对LMM行为的未来认识。


Article 71

Title@2025-07-10 (4): Robust Federated Personalised Mean Estimation for the Gaussian Mixture Model

Title: Robust Federated Personalised Mean Estimation for the Gaussian Mixture Model Robuste, federführende, personalisierte mittlere Schätzung für das Gaussian Mixture Model Gaussian Mixture 模型的联邦硬性个人化平均平均估计值 2504.19955v2

Authors (3): Malhar A. Managoli, Vinod M. Prabhakaran, Suhas Diggavi

Federated learning with heterogeneous data and personalization has received significant recent attention. Separately, robustness to corrupted data in the context of federated learning has also been studied. In this paper we explore combining personalization for heterogeneous data with robustness, where a constant fraction of the clients are corrupted. Motivated by this broad problem, we formulate a simple instantiation which captures some of its difficulty. We focus on the specific problem of personalized mean estimation where the data is drawn from a Gaussian mixture model. We give an algorithm whose error depends almost linearly on the ratio of corrupted to uncorrupted samples, and show a lower bound with the same behavior, albeit with a gap of a constant factor.

以各种数据和个性化相结合的学习最近受到极大关注。 另外,还研究了在联合学习背景下腐败数据是否可靠的问题。 在本文中,我们探讨了将差异性数据的个人化与稳健性相结合的问题,因为其中客户中经常有一部分是腐败的。我们受这个广泛问题的驱使,我们形成了一个简单的即时化,它捕捉了其中的一些困难。我们侧重于从高斯混合模型中提取数据的个人化平均估计的具体问题。我们给出了一种算法,其错误几乎线性地取决于腐败性样品与未腐蚀样品的比例,并且显示与同一行为之间的约束程度较低,尽管存在一个不变因素的空白。


Article 72

Title@2025-07-10 (4): Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means

Title: Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means Robuste Verteilte Schätzung: Erweiterung von Gossip-Algorithmen auf Rangfolge und Trimmmittel 强有力的分布分布式估算:将Gossip的数值扩大至排名和缩略语 2505.17836v6

Authors (3): Anna Van Elst, Igor Colin, Stephan Clémençon

This paper addresses the problem of robust estimation in gossip algorithms over arbitrary communication graphs. Gossip algorithms are fully decentralized, relying only on local neighbor-to-neighbor communication, making them well-suited for situations where communication is constrained. A fundamental challenge in existing mean-based gossip algorithms is their vulnerability to malicious or corrupted nodes. In this paper, we show that an outlier-robust mean can be computed by globally estimating a robust statistic. More specifically, we propose a novel gossip algorithm for rank estimation, referred to as GoRank, and leverage it to design a gossip procedure dedicated to trimmed mean estimation, coined GoTrim. In addition to a detailed description of the proposed methods, a key contribution of our work is a precise convergence analysis: we establish an $\mathcal{O}(1/t)$ rate for rank estimation and an $\mathcal{O}(1/t)$ rate for trimmed mean estimation, where $t$ denotes the number of iterations. Moreover, we provide a breakdown point analysis of GoTrim. We empirically validate our theoretical results through experiments on diverse network topologies, data distributions and contamination schemes.

本文讨论了在任意通信图上的八卦(gossip)算法中进行稳健估计的问题。八卦算法完全去中心化,仅依赖本地邻居之间的通信,因此非常适合通信受限的场景。现有基于均值的八卦算法面临的一个根本挑战是易受恶意或损坏节点的影响。本文表明,可以通过全局估计一个稳健统计量来计算对离群值稳健的均值。具体而言,我们提出一种用于排名估计的新型八卦算法 GoRank,并利用它设计了一个专门用于截尾均值估计的八卦过程 GoTrim。除了详细描述所提方法之外,我们工作的一个关键贡献是精确的收敛性分析:我们为排名估计建立了 $\mathcal{O}(1/t)$ 的收敛速率,为截尾均值估计建立了 $\mathcal{O}(1/t)$ 的收敛速率,其中 $t$ 表示迭代次数。此外,我们给出了 GoTrim 的崩溃点分析,并通过在多种网络拓扑、数据分布和污染方案上的实验对理论结果进行了实证验证。
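
The link between ranks and trimmed means can be illustrated with a centralized sketch: once every node knows the rank of its value, the trimmed mean is the average over nodes whose rank falls inside the kept window. This is not GoRank or GoTrim themselves (no gossip communication is simulated); the function names and the tie-breaking rule are assumptions.

```python
def ranks(values):
    """Rank each value from 1 (smallest) to n (largest), breaking ties by index."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def trimmed_mean(values, alpha):
    """Average the values whose rank lies in (m, n - m], with m = floor(alpha * n).

    Assumes 0 <= alpha < 0.5 so that at least one value is kept.
    """
    n = len(values)
    m = int(alpha * n)
    kept = [v for v, r in zip(values, ranks(values)) if m < r <= n - m]
    return sum(kept) / len(kept)


# Toy network (hypothetical): five nodes, one of them corrupted.
readings = [10.0, 11.0, 9.0, 10.5, 1000.0]
plain = sum(readings) / len(readings)       # pulled far off by the outlier
robust = trimmed_mean(readings, alpha=0.2)  # drops the lowest and highest value
```

In a decentralized setting each node would only hold an estimate of its own rank, which is why the accuracy of rank estimation feeds directly into the accuracy of the trimmed mean.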


Article 73

Title@2025-07-10 (4): Discrete Optimal Transport and Voice Conversion

Title: Discrete Optimal Transport and Voice Conversion Diskreter Optimaler Transport und Sprachumwandlung 分辨最佳传输和语音转换 2505.04382v2

Authors (2): Anton Selitskiy, Maitreya Kocharekar

In this work, we address the voice conversion (VC) task using a vector-based interface. To align audio embeddings between speakers, we employ discrete optimal transport mapping. Our evaluation results demonstrate the high quality and effectiveness of this method. Additionally, we show that applying discrete optimal transport as a post-processing step in audio generation can lead to the incorrect classification of synthetic audio as real.

在这项工作中,我们用一个基于矢量的界面处理语音转换任务。为了协调发言者之间的音频嵌入,我们采用了离散的最佳运输图绘制方法。我们的评估结果显示了这一方法的高质量和有效性。此外,我们还表明,将离散的最佳运输作为后处理步骤用于音频生成可能导致合成音频被错误地归类为真实。


Article 74

Title@2025-07-10 (4): Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots

Title: Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots Adaptive Gaussian Mixture Models-basierte Anomalieerkennung für unterbeschränkte kabelgetriebene Parallelroboter 用于控制不足的有线驱动平行机器人的适应性高斯混合混合模型异常探测 2507.07714v1

Authors (6): Julio Garrido, Javier Vales, Diego Silva-Muñiz, Enrique Riveiro, Pablo López-Matencio, Josué Rivera-Andrade

Cable-Driven Parallel Robots (CDPRs) are increasingly used for load manipulation tasks involving predefined toolpaths with intermediate stops. At each stop, where the platform maintains a fixed pose and the motors keep the cables under tension, the system must evaluate whether it is safe to proceed by detecting anomalies that could compromise performance (e.g., wind gusts or cable impacts). This paper investigates whether anomalies can be detected using only motor torque data, without additional sensors. It introduces an adaptive, unsupervised outlier detection algorithm based on Gaussian Mixture Models (GMMs) to identify anomalies from torque signals. The method starts with a brief calibration period, just a few seconds, during which a GMM is fit on known anomaly-free data. Real-time torque measurements are then evaluated using Mahalanobis distance from the GMM, with statistically derived thresholds triggering anomaly flags. Model parameters are periodically updated using the latest segments identified as anomaly-free to adapt to changing conditions. Validation includes 14 long-duration test sessions simulating varied wind intensities. The proposed method achieves a 100% true positive rate and 95.4% average true negative rate, with 1-second detection latency. Comparative evaluation against power threshold and non-adaptive GMM methods indicates higher robustness to drift and environmental variation.

缆索驱动并联机器人(CDPR)越来越多地用于沿预定义工具路径并带有中间停靠点的负载操作任务。在每个停靠点,平台保持固定位姿,电机使缆索保持张紧,系统必须通过检测可能影响性能的异常(如阵风或缆索碰撞)来判断继续运行是否安全。本文研究能否仅利用电机扭矩数据、在不增加传感器的情况下检测异常,并提出一种基于高斯混合模型(GMM)的自适应无监督离群点检测算法,用于从扭矩信号中识别异常。该方法从仅数秒的短暂校准期开始,在此期间用已知无异常的数据拟合 GMM;随后利用实时扭矩测量值到 GMM 的马氏(Mahalanobis)距离进行评估,并由统计推导的阈值触发异常标记。模型参数利用最新被判定为无异常的数据段定期更新,以适应变化的工况。验证包括 14 次模拟不同风力强度的长时间测试。所提方法的真阳性率达到 100%,平均真阴性率为 95.4%,检测延迟为 1 秒。与功率阈值法和非自适应 GMM 方法的对比评估表明,该方法对漂移和环境变化具有更高的鲁棒性。
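
The calibrate-then-threshold loop can be sketched in a simplified form. The example below fits a single Gaussian with diagonal covariance (a one-component stand-in for the GMM in the abstract) on anomaly-free calibration data and flags samples whose Mahalanobis distance exceeds a threshold. The threshold value, the torque numbers, and all function names are hypothetical, and the periodic model update is omitted.

```python
import math


def fit_diagonal_gaussian(samples):
    """Estimate the per-dimension mean and sample variance from calibration data."""
    n = len(samples)
    dims = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dims)]
    var = [sum((s[d] - mean[d]) ** 2 for s in samples) / (n - 1) for d in range(dims)]
    return mean, var


def mahalanobis(x, mean, var):
    """Mahalanobis distance under a diagonal covariance (variances must be > 0)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


# Hypothetical two-motor torque readings recorded while no anomaly is present.
calibration = [[1.0, 2.0], [1.5, 1.5], [0.5, 2.5], [1.0, 2.0]]
mean, var = fit_diagonal_gaussian(calibration)

THRESHOLD = 3.0  # assumed value; the abstract derives thresholds statistically


def is_anomaly(torques):
    """Flag a reading whose distance from the calibrated model exceeds the threshold."""
    return mahalanobis(torques, mean, var) > THRESHOLD
```

An adaptive variant would refit `mean` and `var` on the most recent segments that were not flagged, which is the role the abstract assigns to the periodic parameter update.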


Article 75

Title@2025-07-10 (4): Balancing the Past and Present: A Coordinated Replay Framework for Federated Class-Incremental Learning

Title: Balancing the Past and Present: A Coordinated Replay Framework for Federated Class-Incremental Learning Ausbalancieren der Vergangenheit und Gegenwart: Ein koordiniertes Replay-Framework für das Federated Class-Incremental Learning 平衡过去和现在的平衡:联邦级强化学习协调重现框架 2507.07712v1

Authors (3): Zhuang Qi, Lei Meng, Han Yu

Federated Class Incremental Learning (FCIL) aims to collaboratively process continuously increasing incoming tasks across multiple clients. Among various approaches, data replay has become a promising solution, which can alleviate forgetting by reintroducing representative samples from previous tasks. However, the performance of replay methods is typically limited by class imbalance, both within the replay buffer due to limited global awareness and between replayed and newly arrived classes. To address this issue, we propose a class-wise balancing data replay method for FCIL (FedCBDR), which employs a global coordination mechanism for class-level memory construction and reweights the learning objective to alleviate the aforementioned imbalances. Specifically, FedCBDR has two key components: 1) the global-perspective data replay module reconstructs global representations of prior tasks in a privacy-preserving manner, which then guide a class-aware and importance-sensitive sampling strategy to achieve balanced replay; 2) to handle class imbalance across tasks, the task-aware temperature scaling module adaptively adjusts the temperature of logits at both class and instance levels based on task dynamics, which reduces the model’s overconfidence in majority classes while enhancing its sensitivity to minority classes. Experimental results verify that FedCBDR achieves balanced class-wise sampling under heterogeneous data distributions and improves generalization under task imbalance between earlier and recent tasks, yielding a 2%-15% Top-1 accuracy improvement over six state-of-the-art methods.

联邦级递增学习(FCIL)旨在协作处理不断增加多个客户新来的任务。在各种办法中,数据重放已成为一个大有希望的解决办法,通过重新引入先前任务中的代表性样本,可以减轻忘记的记忆。然而,由于全球认识有限,因此在重播缓冲中,以及在重播和新抵达的类别之间,其性能通常受到阶级不平衡的限制。为了解决这一问题,我们提议为FCIL(FedCBDDR)采用一个明智的等级平衡数据重放方法,该方法采用一个全球协调机制,用于等级级记忆构建和重估学习目标,以减轻上述不平衡。具体地说,FedCBDR有两个关键组成部分:1)全球透视数据重放模块以隐私保存方式重建全球先前任务的表述,然后在重放缓缓缓缓冲中,然后指导一个班级和重要敏感度抽样战略,以实现平衡重现;2)随后,为了处理各任务之间的阶级不平衡,根据任务动态,有意识的温度缩模块调整了班级和例级的校正温度,这降低了多数班的自信,同时加强其对少数群体类的敏感度,同时加强其对最近对五级的敏感性。


Article 76

Title@2025-07-10 (4): Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors

Title: Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors Shapley-based Data Valuation mit gegenseitiger Information: Ein Schlüssel zu veränderten K-Nächsten Nachbarn 与相互信息一起进行基于虚光的数据估值:修改 K- 最近邻的密钥 2312.01991v4

Authors (4): Mohammad Ali Vahedifar, Azim Akhtarshenas, Mohammad Mohammadi Rafatpanah, Maryam Sabbaghian

The K-Nearest Neighbors (KNN) algorithm is widely used for classification and regression; however, it suffers from limitations, including the equal treatment of all samples. We propose Information-Modified KNN (IM-KNN), a novel approach that leverages Mutual Information ($I$) and Shapley values to assign weighted values to neighbors, thereby bridging the gap in treating all samples with the same value and weight. On average, IM-KNN improves the accuracy, precision, and recall of traditional KNN by 16.80%, 17.08%, and 16.98%, respectively, across 12 benchmark datasets. Experiments on four large-scale datasets further highlight IM-KNN’s robustness to noise, imbalanced data, and skewed distributions.

K- Nearest Mighbors (KNN) 算法被广泛用于分类和回归;然而,它受到各种限制,包括所有样本的平等待遇。我们提议采用信息修改 KNN(IM-KNN)这一新办法,即利用共同信息(I$)和Shapey值来为邻居分配加权值,从而缩小以相同价值和重量处理所有样本方面的差距。 平均而言,IM-KNN在12个基准数据集中提高传统 KNN的准确性、准确性,并分别提高16.80%、17.08%和16.98%。 对4个大型数据集的实验进一步凸显IM-KNN对噪音、不平衡数据和扭曲分布的坚固性。
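
The core change relative to plain KNN, replacing one-vote-per-neighbor with per-sample weights, can be sketched as follows. How IM-KNN derives the weights from mutual information and Shapley values is not reproduced here; the `weights` argument is simply assumed to be given, and the data and function name are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k, weights):
    """Predict a label by summing per-sample weights over the k nearest neighbors.

    train:   list of (features, label) pairs
    weights: one non-negative weight per training sample, in the same order
    """
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i][0], query))[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[train[i][1]] += weights[i]
    return max(votes, key=votes.get)


# Toy data (hypothetical): the single closest point is "a", the next two are "b".
train = [((0.0, 0.0), "a"), ((1.0, 0.0), "b"), ((0.0, 1.0), "b"), ((5.0, 5.0), "a")]
query = (0.2, 0.2)

uniform = weighted_knn_predict(train, query, k=3, weights=[1.0, 1.0, 1.0, 1.0])
informed = weighted_knn_predict(train, query, k=3, weights=[3.0, 1.0, 1.0, 1.0])
```

With uniform weights the two "b" neighbors outvote the closest point; giving that point a larger weight flips the prediction, which is the kind of effect a data-valuation-based weighting is meant to produce.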


Article 77

Title@2025-07-10 (4): Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought

Title: Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought Rationale-Enhanced Decodierung für multimodale Chain-of-Thought 多式联运谈判链附加说明 2507.07685v1

Authors (3): Shin’ya Yamaguchi, Kosuke Nishida, Daiki Chijiwa

Large vision-language models (LVLMs) have demonstrated remarkable capabilities by integrating pre-trained vision encoders with large language models (LLMs). Similar to single-modal LLMs, chain-of-thought (CoT) prompting has been adapted for LVLMs to enhance multi-modal reasoning by generating intermediate rationales based on visual and textual inputs. While CoT is assumed to improve grounding and accuracy in LVLMs, our experiments reveal a key challenge: existing LVLMs often ignore the contents of generated rationales in CoT reasoning. To address this, we re-formulate multi-modal CoT reasoning as a KL-constrained reward maximization focused on rationale-conditional log-likelihood. As the optimal solution, we propose rationale-enhanced decoding (RED), a novel plug-and-play inference-time decoding strategy. RED harmonizes visual and rationale information by multiplying distinct image-conditional and rationale-conditional next token distributions. Extensive experiments show that RED consistently and significantly improves reasoning over standard CoT and other decoding methods across multiple benchmarks and LVLMs. Our work offers a practical and effective approach to improve both the faithfulness and accuracy of CoT reasoning in LVLMs, paving the way for more reliable rationale-grounded multi-modal systems.

大型视觉语言模型(LVLMs)通过将经过预先培训的视觉编码器与大型语言模型(LLMs)整合起来,展示出非凡的能力。与单一模式的LLMs相似,为LVMs调整了激励思维链(CoT),以通过产生基于视觉和文字投入的中间推理加强多式推理。虽然假定COT可以提高LVMs的基础和准确性,但我们的实验揭示出一个关键的挑战:现有的LVMs往往忽视CoT推理中产生的理由。为了解决这个问题,我们重新将多模式COT推理重新制定为KLL限制的奖励最大化,侧重于理论条件逻辑类同性。作为最佳解决办法,我们提出了基于理论的强化解码(RED),这是一个新的插座和游戏时间解码战略。RED通过增加不同的图像附加条件和合理的下一个象征性分布来协调视觉和理由信息。广泛的实验表明,REDGT和其他解码方法持续和大大改进了对标准COT和其他精确性推理方法的推理,从而改进了我们的多层次推理和LMs。
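
The multiplication step described in the abstract can be sketched on a toy vocabulary: two next-token distributions, one conditioned on the image and one on the rationale, are multiplied elementwise and renormalized. The exponent `lam` on the rationale term is an assumption added here as a tunable weight, and the distributions are made-up numbers rather than model outputs.

```python
def combine_next_token(p_image, p_rationale, lam=1.0):
    """Multiply two next-token distributions over the same vocabulary and renormalize."""
    scores = {tok: p_image[tok] * (p_rationale[tok] ** lam) for tok in p_image}
    total = sum(scores.values())
    return {tok: score / total for tok, score in scores.items()}


# Hypothetical distributions: the image alone is ambiguous, the rationale is not.
p_image = {"cat": 0.5, "dog": 0.5}
p_rationale = {"cat": 0.8, "dog": 0.2}

combined = combine_next_token(p_image, p_rationale)
next_token = max(combined, key=combined.get)
```

Because the product is small wherever either distribution is small, a token must be plausible under both the image and the rationale to be selected, which is how the rationale content is kept from being ignored.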


Article 78

Title@2025-07-10 (4): Accelerating Transposed Convolutions on FPGA-based Edge Devices

Title: Accelerating Transposed Convolutions on FPGA-based Edge Devices Beschleunigung transponierter Konvolutionen auf FPGA-basierten Edge-Geräten 加速基于 FPGA 的边缘设备的转换变速 2507.07683v1

Authors (2): Jude Haris, José Cano

Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping, overlapping sums, and ineffectual computations. These inefficiencies further exacerbate the performance bottleneck of TCONV and generative models on resource-constrained edge devices. To address this problem, in this paper we propose MM2IM, a hardware-software co-designed accelerator that combines Matrix Multiplication (MatMul) with col2IM to process TCONV layers on resource-constrained edge devices efficiently. Using the SECDA-TFLite design toolkit, we implement MM2IM and evaluate its performance across 261 TCONV problem configurations, achieving an average speedup of 1.9x against a dual-thread ARM Neon optimized CPU baseline. We then evaluate the performance of MM2IM on a range of TCONV layers from well-known generative models achieving up to 4.2x speedup, and compare it against similar resource-constrained TCONV accelerators, outperforming them by at least 2x GOPs/DSP. Finally, we evaluate MM2IM on the DCGAN and pix2pix GAN models, achieving up to 3x speedup and 2.4x energy reduction against the CPU baseline.

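转置卷积(TCONV)实现了生成式人工智能(AI)模型中的上采样机制。然而,实现 TCONV 的主流方法即面向输入的映射(IOM)存在输出映射复杂、求和重叠以及无效计算等问题。这些低效进一步加剧了 TCONV 和生成模型在资源受限边缘设备上的性能瓶颈。为了解决这一问题,本文提出 MM2IM,一种软硬件协同设计的加速器,它将矩阵乘法(MatMul)与 col2IM 相结合,以便在资源受限的边缘设备上高效处理 TCONV 层。我们使用 SECDA-TFLite 设计工具包实现了 MM2IM,并在 261 种 TCONV 问题配置上评估其性能,相对于双线程 ARM Neon 优化的 CPU 基线取得了平均 1.9 倍的加速。随后,我们在一系列来自知名生成模型的 TCONV 层上评估 MM2IM,加速最高达 4.2 倍,并与类似的资源受限 TCONV 加速器进行比较,其 GOPs/DSP 至少高出 2 倍。最后,我们在 DCGAN 和 pix2pix GAN 模型上评估 MM2IM,相对于 CPU 基线最高实现 3 倍加速和 2.4 倍能耗降低。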
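
The equivalence that a MatMul-plus-col2im formulation relies on can be checked on a 1-D toy case. The sketch below computes a single-output-channel transposed convolution twice: once with an input-oriented scatter loop and once as a matrix product followed by a col2im scatter-add. It is a plain-Python illustration, not the MM2IM accelerator; shapes, names, and values are assumptions.

```python
def tconv1d_direct(x, w, stride):
    """Input-oriented loop: every input element scatters into k output positions."""
    n, channels, k = len(x), len(w), len(w[0])
    out = [0.0] * ((n - 1) * stride + k)
    for i in range(n):
        for c in range(channels):
            for j in range(k):
                out[i * stride + j] += x[i][c] * w[c][j]
    return out


def matmul(a, b):
    """Dense matrix product of a (n x c) and b (c x k)."""
    return [[sum(a[i][c] * b[c][j] for c in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def col2im_1d(cols, stride):
    """Scatter-add each row of the column matrix into its overlapping output window."""
    n, k = len(cols), len(cols[0])
    out = [0.0] * ((n - 1) * stride + k)
    for i in range(n):
        for j in range(k):
            out[i * stride + j] += cols[i][j]
    return out


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 positions, 2 input channels
w = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]   # 2 input channels, kernel size 3
stride = 2

direct = tconv1d_direct(x, w, stride)
via_matmul = col2im_1d(matmul(x, w), stride)
```

Separating the dense product from the scatter-add isolates the overlapping sums in one simple step, so the multiply-accumulate work can be handed to a regular matrix engine.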


Article 79

Title@2025-07-10 (4): Beyond Cox Models: Assessing the Performance of Machine-Learning Methods in Non-Proportional Hazards and Non-Linear Survival Analysis

Title: Beyond Cox Models: Assessing the Performance of Machine-Learning Methods in Non-Proportional Hazards and Non-Linear Survival Analysis Jenseits von Cox-Modellen: Bewertung der Leistungsfähigkeit von Machine-Learning-Methoden bei nichtproportionalen Gefahren und nichtlinearer Überlebensanalyse 超越考克斯模型:评估机器学习方法在非季节性危险和无林性生存分析方面的性能 2504.17568v2

Authors (6): Ivan Rossi, Flavio Sartori, Cesare Rollo, Giovanni Birolo, Piero Fariselli, Tiziana Sanavia

Survival analysis often relies on Cox models, assuming both linearity and proportional hazards (PH). This study evaluates machine and deep learning methods that relax these constraints, comparing their performance with penalized Cox models on a benchmark of three synthetic and three real datasets. In total, eight different models were tested, including six non-linear models of which four were also non-PH. Although Cox regression often yielded satisfactory performance, we showed the conditions under which machine and deep learning models can perform better. Indeed, the performance of these methods has often been underestimated due to the improper use of Harrell’s concordance index (C-index) instead of more appropriate scores such as Antolini’s concordance index, which generalizes C-index in cases where the PH assumption does not hold. In addition, since occasionally high C-index models happen to be badly calibrated, combining Antolini’s C-index with Brier’s score is useful to assess the overall performance of a survival method. Results on our benchmark data showed that survival prediction should be approached by testing different methods to select the most appropriate one according to sample size, non-linearity and non-PH conditions. To allow an easy reproducibility of these tests on our benchmark data, code and documentation are freely available at https://github.com/compbiomed-unito/survhive.

以线性和成比例的危害(PH)为假设,生存分析往往依赖于Cox模型。本研究评估了放松这些限制的机器和深层次学习方法,比较了它们的表现,将其与三个合成和三个真实数据集基准的受罚Cox模型相比较。总共测试了八个不同的模型,包括六个非线性模型,其中四个模型也是非PH。虽然Cox回归往往产生令人满意的效果,但我们展示了机器和深层次学习模型能够更好表现的条件。事实上,由于不当使用Harrell的和谐指数(C-index)而不是安托利尼的和谐指数(Antolini的和谐指数)等更合适的分数,这些分数比较了它们的表现。安托利尼的C-指数和Brier的得分有时发生误差,因此,我们展示这些方法的绩效往往被低估。我们的基准数据预测应该通过测试不同的方法来选择最合适的方法,按照样本大小、非线性/可比较性数据。
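
For reference, Harrell's C-index mentioned in the abstract can be sketched in a few lines: among comparable pairs (the subject with the shorter time had an observed event), it is the fraction in which that subject also received the higher risk score. This is a minimal quadratic-time version with hypothetical toy data; Antolini's time-dependent variant, which the abstract recommends when proportional hazards fail, is not shown.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index; higher risk should mean shorter survival.

    times:       observed time for each subject
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: one scalar risk prediction per subject
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable


# Toy cohort (hypothetical): the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.5, 0.1]
c_index = harrell_c_index(times, events, risk)
```

A single scalar risk per subject implies a time-invariant ordering, which is why this index can misjudge models whose predicted survival curves cross.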


Article 80

Title@2025-07-10 (4): Implicit Counterfactual Data Augmentation for Robust Learning

Title: Implicit Counterfactual Data Augmentation for Robust Learning Implizite gegenfaktische Datenvergrößerung für robustes Lernen 强力学习所需的反事实数据放大 2304.13431v4

Authors (3): Xiaoling Zhou, Ou Wu, Michael K. Ng

Machine learning models are prone to capturing the spurious correlations between non-causal attributes and classes, with counterfactual data augmentation being a promising direction for breaking these spurious associations. However, generating counterfactual data explicitly poses a challenge, and incorporating augmented data into the training process decreases training efficiency. This study proposes an Implicit Counterfactual Data Augmentation (ICDA) method to remove spurious correlations and make stable predictions. Specifically, first, a novel sample-wise augmentation strategy is developed that generates semantically and counterfactually meaningful deep features with distinct augmentation strength for each sample. Second, we derive an easy-to-compute surrogate loss on the augmented feature set when the number of augmented samples becomes infinite. Third, two concrete schemes are proposed, including direct quantification and meta-learning, to derive the key parameters for the robust loss. In addition, ICDA is explained from a regularization perspective, revealing its capacity to improve intra-class compactness and augment margins at both class and sample levels. Extensive experiments have been conducted across various biased learning scenarios covering both image and text datasets, demonstrating that ICDA consistently enhances the generalization and robustness performance of popular networks.

机器学习模型容易捕捉非因果属性和类别之间的虚假关联,反事实数据增强是打破这些虚假协会的一个大有希望的方向。然而,生成反事实数据显然构成挑战,并将扩大的数据纳入培训过程会降低培训效率。本研究报告提出了隐含反事实数据增强方法,以消除虚假关联和作出稳定的预测。具体地说,首先,开发了新型的样本增强战略,在每样样本中产生具有明显增强力的精细精细特征。第二,我们在扩大的样本数量无限时,对强化的功能设定的功能进行简单化的代谢损失。第三,提出了两个具体计划,包括直接量化和元学习,以得出稳健损失的关键参数。此外,ICDA从正规化角度解释其提高本级内部缩缩缩缩和增加等级和样本级间距的能力。在包括图像和文本数据集在内的各种有偏差的学习假设中进行了广泛的实验,表明ICDA始终加强通用网络的一般性和稳健性。


Article 81

Title@2025-07-10 (4): Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks

Title: Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks Einige theoretische Ergebnisse auf schichtweise Effektive Dimensions-Oszillationen in Finite-Wide-ReLU-Netzwerken 关于有限宽度 RELU 网络中多层有效尺寸振动的一些理论结果 2507.07675v1

Authors (1): Darshan Makwana

We analyze the layerwise effective dimension (rank of the feature matrix) in fully-connected ReLU networks of finite width. Specifically, for a fixed batch of $m$ inputs and random Gaussian weights, we derive closed-form expressions for the expected rank of the $m\times n$ hidden activation matrices. Our main result shows that $\mathbb{E}[EDim(\ell)]=m[1-(1-2/\pi)^\ell]+O(e^{-c m})$ so that the rank deficit decays geometrically with ratio $1-2 / \pi \approx 0.3634$. We also prove a sub-Gaussian concentration bound, and identify the “revival” depths at which the expected rank attains local maxima. In particular, these peaks occur at depths $\ell_k^*\approx(k+1/2)\pi/\log(1/\rho)$ with height $\approx (1-e^{-\pi/2}) m \approx 0.79m$. We further show that this oscillatory rank behavior is a finite-width phenomenon: under orthogonal weight initialization or strong negative-slope leaky-ReLU, the rank remains (nearly) full. These results provide a precise characterization of how random ReLU layers alternately collapse and partially revive the subspace of input variations, adding nuance to prior work on expressivity of deep networks.

我们分析有限宽度全连接 ReLU 网络中的逐层有效维度(特征矩阵的秩)。具体而言,对于固定的一批 $m$ 个输入和随机高斯权重,我们推导出 $m\times n$ 隐藏层激活矩阵期望秩的闭式表达式。我们的主要结果表明 $\mathbb{E}[EDim(\ell)]=m[1-(1-2/\pi)^\ell]+O(e^{-c m})$,因此秩亏以比率 $1-2/\pi \approx 0.3634$ 几何衰减。我们还证明了一个亚高斯集中界,并确定了期望秩达到局部极大值的“复苏”深度。特别地,这些峰值出现在深度 $\ell_k^*\approx(k+1/2)\pi/\log(1/\rho)$ 处,高度约为 $(1-e^{-\pi/2})m \approx 0.79m$。我们进一步表明,这种振荡的秩行为是一种有限宽度现象:在正交权重初始化或强负斜率 leaky-ReLU 下,秩(几乎)保持满秩。这些结果精确刻画了随机 ReLU 层如何交替地使输入变化子空间坍缩并部分复苏,为此前关于深度网络表达能力的研究补充了细节。
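
The leading term of the stated closed form is easy to evaluate numerically. The sketch below computes $m[1-(1-2/\pi)^\ell]$ only; the $O(e^{-cm})$ correction and the oscillatory behavior discussed in the abstract are not modeled, and the function name is an assumption.

```python
import math

RATIO = 1.0 - 2.0 / math.pi  # geometric decay ratio of the rank deficit, about 0.3634


def expected_rank_leading_term(m, depth):
    """Evaluate m * (1 - RATIO**depth), dropping the O(exp(-c*m)) correction."""
    return m * (1.0 - RATIO ** depth)


def rank_deficit(m, depth):
    """Gap between full rank m and the leading-term expected rank."""
    return m - expected_rank_leading_term(m, depth)


# For a batch of m = 100 inputs, tabulate the leading term over the first layers.
table = [(depth, expected_rank_leading_term(100, depth)) for depth in range(1, 6)]
```

Each extra layer multiplies the remaining deficit by `RATIO`, so under this leading term the deficit after `depth` layers is `m * RATIO**depth`.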


Article 82

Title@2025-07-10 (4): Learning Pole Structures of Hadronic States using Predictive Uncertainty Estimation

Title: Learning Pole Structures of Hadronic States using Predictive Uncertainty Estimation Erlernen der Polstrukturen von Hadronischen Staaten mittels vorausschauender Unsicherheitsabschätzung 使用预测性不确定性估计值的 强力国家学习极极结构 2507.07668v1

Authors (4): Felix Frohnert, Denny Lane B. Sombrillo, Evert van Nieuwenburg, Patrick Emonts

Matching theoretical predictions to experimental data remains a central challenge in hadron spectroscopy. In particular, the identification of new hadronic states is difficult, as exotic signals near threshold can arise from a variety of physical mechanisms. A key diagnostic in this context is the pole structure of the scattering amplitude, but different configurations can produce similar signatures. The mapping between pole configurations and line shapes is especially ambiguous near the mass threshold, where analytic control is limited. In this work, we introduce an uncertainty-aware machine learning approach for classifying pole structures in $S$-matrix elements. Our method is based on an ensemble of classifier chains that provide both epistemic and aleatoric uncertainty estimates. We apply a rejection criterion based on predictive uncertainty, achieving a validation accuracy of nearly $95\%$ while discarding only a small fraction of high-uncertainty predictions. Trained on synthetic data with known pole structures, the model generalizes to previously unseen experimental data, including enhancements associated with the $P_{c\bar{c}}(4312)^+$ state observed by LHCb. In this, we infer a four-pole structure, representing the presence of a genuine compact pentaquark in the presence of a higher channel virtual state pole with non-vanishing width. While evaluated on this particular state, our framework is broadly applicable to other candidate hadronic states and offers a scalable tool for pole structure inference in scattering amplitudes.

将理论预测与实验数据相匹配仍然是强子谱学中的核心挑战。特别是,新强子态的识别十分困难,因为阈值附近的奇特信号可能源于多种物理机制。这方面的一个关键诊断量是散射振幅的极点结构,但不同的极点构型可能产生相似的特征。在质量阈值附近,解析控制有限,极点构型与线形之间的对应关系尤其模糊。在这项工作中,我们提出一种具有不确定性感知能力的机器学习方法,用于对 $S$ 矩阵元中的极点结构进行分类。我们的方法基于分类器链的集成,可同时给出认知不确定性和偶然不确定性的估计。我们应用基于预测不确定性的拒绝准则,在仅舍弃一小部分高不确定性预测的情况下取得了接近 $95\%$ 的验证准确率。该模型在极点结构已知的合成数据上训练,能够泛化到此前未见过的实验数据,包括 LHCb 观测到的与 $P_{c\bar{c}}(4312)^+$ 态相关的增强信号。对此,我们推断出一个四极点结构,表明存在一个真正的紧致五夸克态,同时伴有一个宽度不为零的较高道虚态极点。尽管本文只针对这一特定态进行了评估,我们的框架可广泛适用于其他候选强子态,并为散射振幅的极点结构推断提供了可扩展的工具。


Article 83

Title@2025-07-10 (4): Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL

Title: Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL Uncovering RL Integration in SSL Loss: Zielspezifische Implikationen für dateneffiziente RL SSL损失中未覆盖的 RL 整合:对数据高效RL的客观具体影响 2410.17428v3

Authors (2): Ömer Veysel Çağatan, Barış Akgün

In this study, we investigate the effect of SSL objective modifications within the SPR framework, focusing on specific adjustments such as terminal state masking and prioritized replay weighting, which were not explicitly addressed in the original design. While these modifications are specific to RL, they are not universally applicable across all RL algorithms. Therefore, we aim to assess their impact on performance and explore other SSL objectives that do not accommodate these adjustments like Barlow Twins and VICReg. We evaluate six SPR variants on the Atari 100k benchmark, including versions both with and without these modifications. Additionally, we test the performance of these objectives on the DeepMind Control Suite, where such modifications are absent. Our findings reveal that incorporating specific SSL modifications within SPR significantly enhances performance, and this influence extends to subsequent frameworks like SR-SPR and BBF, highlighting the critical importance of SSL objective selection and related adaptations in achieving data efficiency in self-predictive reinforcement learning.

在本研究中,我们调查了特殊目的修改在特殊目的程序框架内的影响,侧重于特定调整,如终端状态掩码和优先重置权重,这些修改在原设计中没有明确述及,虽然这些修改是特别针对RL的,但并不普遍适用所有RL算法,因此,我们旨在评估它们对绩效的影响,并探索其他不符合这些调整的SSL目标,如Barlow Twins和ICORReg。我们评估了Atari 100k基准的六个特殊目的修改,包括有这些修改和没有这些修改的版本。此外,我们在DeepMind控制套件上测试了这些目标的绩效,在没有进行这种修改的地方,我们的调查结果显示,在特殊目的修改中包含具体的特殊目的修改极大地提高了绩效,这种影响延伸到SR-SPR和BBF等后续框架,突出了特殊目的选择和相关的调整对于实现自我准备强化学习数据效率的至关重要。


Article 84

Title@2025-07-10 (4): Curriculum Negative Mining For Temporal Networks

Title: Curriculum Negative Mining For Temporal Networks Curriculum Negative Mining für zeitliche Netzwerke 时间网络负面采矿课程 2407.17070v2

Authors (3): Ziyue Chen, Tongya Zheng, Mingli Song

Temporal networks are effective in capturing the evolving interactions of networks over time, such as social networks and e-commerce networks. In recent years, researchers have primarily concentrated on developing specific model architectures for Temporal Graph Neural Networks (TGNNs) in order to improve the representation quality of temporal nodes and edges. However, limited attention has been given to the quality of negative samples during the training of TGNNs. When compared with static networks, temporal networks present two specific challenges for negative sampling: positive sparsity and positive shift. Positive sparsity refers to the presence of a single positive sample amidst numerous negative samples at each timestamp, while positive shift relates to the variations in positive samples across different timestamps. To robustly address these challenges in training TGNNs, we introduce Curriculum Negative Mining (CurNM), a model-aware curriculum learning framework that adaptively adjusts the difficulty of negative samples. Within this framework, we first establish a dynamically updated negative pool that balances random, historical, and hard negatives to address the challenges posed by positive sparsity. Secondly, we implement a temporal-aware negative selection module that focuses on learning from the disentangled factors of recently active edges, thus accurately capturing shifting preferences. Finally, the selected negatives are combined with annealing random negatives to support stable training. Extensive experiments on 12 datasets and 3 TGNNs demonstrate that our method outperforms baseline methods by a significant margin. Additionally, thorough ablation studies and parameter sensitivity experiments verify the usefulness and robustness of our approach.

时间网络能够有效刻画网络中随时间演变的交互关系,例如社交网络和电子商务网络。近年来,研究者主要致力于为时序图神经网络(TGNN)设计专门的模型架构,以提升时序节点和边的表示质量。然而,TGNN训练中负样本的质量却很少受到关注。与静态网络相比,时间网络给负采样带来两个特有的挑战:正样本稀疏和正样本漂移。正样本稀疏是指在每个时间戳上,大量负样本中只有一个正样本;正样本漂移是指正样本在不同时间戳之间发生变化。为了在TGNN训练中稳健地应对这些挑战,我们提出课程负样本挖掘(CurNM),这是一个模型感知的课程学习框架,能够自适应地调整负样本的难度。在该框架中,我们首先建立一个动态更新的负样本池,平衡随机负样本、历史负样本和困难负样本,以应对正样本稀疏带来的挑战。其次,我们实现了一个时间感知的负样本选择模块,专注于从近期活跃边的解耦因子中学习,从而准确捕捉不断变化的偏好。最后,将所选负样本与逐步退火的随机负样本相结合,以支持稳定训练。在12个数据集和3个TGNN上的大量实验表明,我们的方法显著优于基线方法。此外,详尽的消融研究和参数敏感性实验验证了该方法的有效性和鲁棒性。


Article 85

Title@2025-07-10 (4): Machine Learning-Assisted Surrogate Modeling with Multi-Objective Optimization and Decision-Making of a Steam Methane Reforming Reactor

Title: Machine Learning-Assisted Surrogate Modeling with Multi-Objective Optimization and Decision-Making of a Steam Methane Reforming Reactor Machine Learning-Assisted Surrogate Modellierung mit multi-objektiver Optimierung und Entscheidungsfindung eines Dampfmethan-Reformreaktors 利用蒸气甲烷改造反应堆的多目标优化和决策 2507.07641v1

Authors (3): Seyed Reza Nabavi, Zonglin Guo, Zhiyuan Wang

This study presents an integrated modeling and optimization framework for a steam methane reforming (SMR) reactor, combining a mathematical model, artificial neural network (ANN)-based hybrid modeling, advanced multi-objective optimization (MOO) and multi-criteria decision-making (MCDM) techniques. A one-dimensional fixed-bed reactor model accounting for internal mass transfer resistance was employed to simulate reactor performance. To reduce the high computational cost of the mathematical model, a hybrid ANN surrogate was constructed, achieving a 93.8% reduction in average simulation time while maintaining high predictive accuracy. The hybrid model was then embedded into three MOO scenarios using the non-dominated sorting genetic algorithm II (NSGA-II) solver: 1) maximizing methane conversion and hydrogen output; 2) maximizing hydrogen output while minimizing carbon dioxide emissions; and 3) a combined three-objective case. The optimal trade-off solutions were further ranked and selected using two MCDM methods: technique for order of preference by similarity to ideal solution (TOPSIS) and simplified preference ranking on the basis of ideal-average distance (sPROBID). Optimal results include a methane conversion of 0.863 with 4.556 mol/s hydrogen output in the first case, and 0.988 methane conversion with 3.335 mol/s hydrogen and 0.781 mol/s carbon dioxide in the third. This comprehensive methodology offers a scalable and effective strategy for optimizing complex catalytic reactor systems with multiple, often conflicting, objectives.

这项研究为蒸气甲烷改革反应堆提供了一个综合模型和优化框架,将数学模型、人工神经网络(ANN)基于混合模型、高级多目标优化和多标准决策技术结合起来,为模拟反应堆的性能,采用了一维固定床反应堆模型,计算内部大规模转移阻力,模拟反应堆的性能;为降低数学模型的高计算成本,建立了混合ANN替代模型,平均模拟时间减少了93.8%,同时保持了较高的预测准确性;混合模型随后被嵌入三种MOO情景中,采用非主要分类遗传算法II(NSGA-II)解决方案:(1) 最大限度地实现甲烷转换和氢产出;(2) 最大限度地实现氢产出,同时尽量减少二氧化碳排放量;(3) 合并三个客观案例:采用两种MCDM方法进一步排列和选择了最佳的交换解决办法:采用与理想解决方案相似的排序技术(TOPSISIS)和根据理想平均距离(PROBID)简化了优惠等级。
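
The TOPSIS ranking step named in the abstract can be illustrated with a minimal sketch. The candidate points, the equal weights, and the function name `topsis` are assumptions made for illustration; they are not values or code from the paper, whose Pareto fronts come from NSGA-II runs on the hybrid surrogate.

```python
import math

def topsis(points, weights, benefit):
    """Rank alternatives with TOPSIS.

    points  : list of rows, one row of objective values per alternative
    weights : one weight per objective (assumed to sum to 1)
    benefit : True where the objective is maximized, False where minimized
    Returns closeness scores in [0, 1]; higher is better.
    """
    n_obj = len(weights)
    # Vector-normalize each objective column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in points)) for j in range(n_obj)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_obj)] for row in points]
    # Ideal and anti-ideal value for each objective.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical trade-off points: (CH4 conversion, H2 output mol/s, CO2 output mol/s).
candidates = [
    (0.80, 4.2, 0.95),
    (0.90, 3.9, 0.85),
    (0.98, 3.3, 0.78),
]
scores = topsis(candidates, weights=[1 / 3, 1 / 3, 1 / 3], benefit=[True, True, False])
best = max(range(len(candidates)), key=scores.__getitem__)
```

The selected alternative depends on the weights, so a decision-maker who values hydrogen output over low emissions would pass a different `weights` vector.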


Article 86

Title@2025-07-10 (4): HLF-FSL. A Decentralized Federated Split Learning Solution for IoT on Hyperledger Fabric

Title: HLF-FSL. A Decentralized Federated Split Learning Solution for IoT on Hyperledger Fabric HLF-FSL. Eine dezentrale, föderierte Split-Learning-Lösung für IoT auf Hyperledger Fabric HLF-FSL:基于Hyperledger Fabric的物联网去中心化联邦拆分学习方案 2507.07637v1

Authors (5): Carlos Beis Penedo, Rebeca P. Díaz Redondo, Ana Fernández Vilas, Manuel Fernández Veiga, Francisco Troncoso Pastoriza

Collaborative machine learning in sensitive domains demands scalable, privacy-preserving solutions for enterprise deployment. Conventional Federated Learning (FL) relies on a central server, introducing single points of failure and privacy risks, while Split Learning (SL) partitions models for privacy but scales poorly due to sequential training. We present a decentralized architecture that combines Federated Split Learning (FSL) with the permissioned blockchain Hyperledger Fabric (HLF). Our chaincode orchestrates FSL’s split model execution and peer-to-peer aggregation without any central coordinator, leveraging HLF’s transient fields and Private Data Collections (PDCs) to keep raw data and model activations private. On CIFAR-10 and MNIST benchmarks, HLF-FSL matches centralized FSL accuracy while reducing per-epoch training time compared to Ethereum-based works. Performance and scalability tests show minimal blockchain overhead and preserved accuracy, demonstrating enterprise-grade viability.

敏感领域中的协作式机器学习需要可扩展、保护隐私的企业级部署方案。传统联邦学习(FL)依赖中央服务器,会带来单点故障和隐私风险;拆分学习(SL)通过划分模型来保护隐私,但由于训练是串行的,可扩展性较差。我们提出一种去中心化架构,将联邦拆分学习(FSL)与许可型区块链Hyperledger Fabric(HLF)相结合。我们的链码在没有任何中央协调者的情况下编排FSL的拆分模型执行和点对点聚合,并利用HLF的瞬态字段和私有数据集合(PDC)保持原始数据和模型激活值的私密性。在CIFAR-10和MNIST基准上,HLF-FSL达到了与集中式FSL相当的准确率,同时与基于以太坊的方案相比缩短了每轮训练时间。性能和可扩展性测试表明,区块链开销极小且准确率得以保持,证明了其企业级可行性。


Article 87

Title@2025-07-10 (4): Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights

Title: Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights Vergleichende Stimmungsanalyse der öffentlichen Wahrnehmung: Monkeypox vs. COVID-19 Verhaltenseinblicke 公众认知的比较情绪分析:猴痘与COVID-19的行为洞察 2505.07430v2

Authors (3): Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma, Jamini Jasim

The emergence of global health crises, such as COVID-19 and Monkeypox (mpox), has underscored the importance of understanding public sentiment to inform effective public health strategies. This study conducts a comparative sentiment analysis of public perceptions surrounding COVID-19 and mpox by leveraging extensive datasets of 147,475 and 106,638 tweets, respectively. Advanced machine learning models, including Logistic Regression, Naive Bayes, RoBERTa, DistilRoBERTa and XLNet, were applied to perform sentiment classification, with results indicating key trends in public emotion and discourse. The analysis highlights significant differences in public sentiment driven by disease characteristics, media representation, and pandemic fatigue. Through the lens of sentiment polarity and thematic trends, this study offers valuable insights into tailoring public health messaging, mitigating misinformation, and fostering trust during concurrent health crises. The findings contribute to advancing sentiment analysis applications in public health informatics, setting the groundwork for enhanced real-time monitoring and multilingual analysis in future research.

COVID-19和猴痘(mpox)等全球卫生危机的出现,凸显了理解公众情绪对于制定有效公共卫生策略的重要性。本研究分别利用147,475条和106,638条推文的大规模数据集,对公众关于COVID-19和猴痘的看法进行了比较情绪分析。研究采用了逻辑回归、朴素贝叶斯、RoBERTa、DistilRoBERTa和XLNet等机器学习模型进行情绪分类,结果揭示了公众情绪和讨论的关键趋势。分析突出了由疾病特征、媒体报道和疫情疲劳所导致的公众情绪的显著差异。通过情绪极性和主题趋势的视角,本研究为调整公共卫生信息传播、减少错误信息以及在并发卫生危机中建立信任提供了有价值的见解。研究结果有助于推进情绪分析在公共卫生信息学中的应用,并为未来研究中加强实时监测和多语言分析奠定了基础。


Article 88

Title@2025-07-10 (4): Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks

Title: Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks Erforschung der Grenzen der Modellkompression in LLMs: Eine Studie zur Wissensdestillation über QA-Aufgaben 探索LLM模型压缩的极限:面向问答任务的知识蒸馏研究 2507.07630v1

Authors (4): Joyeeta Datta, Niclas Doll, Qusai Ramadan, Zeyd Boukhers

Large Language Models (LLMs) have demonstrated outstanding performance across a range of NLP tasks; however, their computational demands hinder their deployment in real-world, resource-constrained environments. This work investigates the extent to which LLMs can be compressed using Knowledge Distillation (KD) while maintaining strong performance on Question Answering (QA) tasks. We evaluate student models distilled from the Pythia and Qwen2.5 families on two QA benchmarks, SQuAD and MLQA, under zero-shot and one-shot prompting conditions. Results show that student models retain over 90% of their teacher models’ performance while reducing parameter counts by up to 57.1%. Furthermore, one-shot prompting yields additional performance gains over zero-shot setups for both model families. These findings underscore the trade-off between model efficiency and task performance, demonstrating that KD, combined with minimal prompting, can yield compact yet capable QA systems suitable for resource-constrained applications.

大型语言模型(LLM)在一系列NLP任务中表现出色,但其计算需求阻碍了它们在现实世界资源受限环境中的部署。本文研究在保持问答(QA)任务上较强性能的同时,利用知识蒸馏(KD)可以将LLM压缩到何种程度。我们在SQuAD和MLQA两个QA基准上,以零样本和单样本提示两种设置,评估了从Pythia和Qwen2.5系列蒸馏得到的学生模型。结果表明,学生模型保留了教师模型90%以上的性能,同时参数量最多减少57.1%。此外,对于两个模型系列,单样本提示相比零样本设置都带来了额外的性能提升。这些发现凸显了模型效率与任务性能之间的权衡,表明KD结合最少量的提示即可得到适用于资源受限应用的紧凑而强大的QA系统。


Article 89

Title@2025-07-10 (4): Concentration of measure for non-linear random matrices with applications to neural networks and non-commutative polynomials

Title: Concentration of measure for non-linear random matrices with applications to neural networks and non-commutative polynomials Maßkonzentration für nichtlineare Zufallsmatrizen mit Anwendungen auf neuronale Netze und nichtkommutative Polynome 非线性随机矩阵的测度集中及其在神经网络和非交换多项式中的应用 2507.07625v1

Authors (1): Radosław Adamczak

We prove concentration inequalities for several models of non-linear random matrices. As corollaries we obtain estimates for linear spectral statistics of the conjugate kernel of neural networks and non-commutative polynomials in (possibly dependent) random matrices.

我们证明了若干非线性随机矩阵模型的集中不等式。作为推论,我们得到了神经网络共轭核以及(可能相依的)随机矩阵的非交换多项式的线性谱统计量的估计。


Article 90

Title@2025-07-10 (4): TransformEEG: Towards Improving Model Generalizability in Deep Learning-based EEG Parkinson’s Disease Detection

Title: TransformEEG: Towards Improving Model Generalizability in Deep Learning-based EEG Parkinson’s Disease Detection TransformEEG: Auf dem Weg zu besserer Modellgeneralisierbarkeit bei der Deep-Learning-basierten EEG-Erkennung der Parkinson-Krankheit TransformEEG:提升基于深度学习的EEG帕金森病检测的模型泛化能力 2507.07622v1

Authors (10): Federico Del Pup, Riccardo Brun, Filippo Iotti, Edoardo Paccagnella, Mattia Pezzato, Sabrina Bertozzo, Andrea Zanola, Louis Fabrice Tshimanga, Henning Müller, Manfredo Atzori

Electroencephalography (EEG) is establishing itself as an important, low-cost, noninvasive diagnostic tool for the early detection of Parkinson’s Disease (PD). In this context, EEG-based Deep Learning (DL) models have shown promising results due to their ability to discover highly nonlinear patterns within the signal. However, current state-of-the-art DL models suffer from poor generalizability caused by high inter-subject variability. This high variability underscores the need for enhancing model generalizability by developing new architectures better tailored to EEG data. This paper introduces TransformEEG, a hybrid Convolutional-Transformer designed for Parkinson’s disease detection using EEG data. Unlike transformer models based on the EEGNet structure, TransformEEG incorporates a depthwise convolutional tokenizer. This tokenizer is specialized in generating tokens composed of channel-specific features, which enables more effective feature mixing within the self-attention layers of the transformer encoder. To evaluate the proposed model, four public datasets comprising 290 subjects (140 PD patients, 150 healthy controls) were harmonized and aggregated. A 10-outer, 10-inner Nested-Leave-N-Subjects-Out (N-LNSO) cross-validation was performed to provide an unbiased comparison against seven other consolidated EEG deep learning models. TransformEEG achieved the highest median balanced accuracy (78.45%) as well as the lowest interquartile range (6.37%) across all the N-LNSO partitions. When combined with data augmentation and threshold correction, median accuracy increased to 80.10%, with an interquartile range of 5.74%. In conclusion, TransformEEG produces more consistent and less skewed results. It demonstrates a substantial reduction in variability and more reliable PD detection using EEG data compared to the other investigated models.

脑电图(EEG)正逐渐成为早期检测帕金森病(PD)的一种重要、低成本、无创的诊断工具。在此背景下,基于EEG的深度学习(DL)模型能够发现信号中高度非线性的模式,因而取得了有前景的结果。然而,由于受试者之间差异很大,当前最先进的DL模型泛化能力较差。这种高变异性凸显出需要开发更适合EEG数据的新架构,以增强模型的泛化能力。本文提出TransformEEG,一种用于基于EEG数据检测帕金森病的卷积-Transformer混合模型。与基于EEGNet结构的Transformer模型不同,TransformEEG引入了深度卷积分词器(depthwise convolutional tokenizer)。该分词器专门生成由通道特定特征构成的token,使Transformer编码器的自注意力层能够更有效地进行特征混合。为评估所提模型,我们对四个公开数据集进行了统一和合并,共包含290名受试者(140名PD患者、150名健康对照)。我们采用10个外层、10个内层的嵌套留N受试者交叉验证(N-LNSO),与另外七个成熟的EEG深度学习模型进行无偏比较。在所有N-LNSO划分中,TransformEEG取得了最高的平衡准确率中位数(78.45%)和最小的四分位距(6.37%)。结合数据增强和阈值校正后,准确率中位数提高到80.10%,四分位距为5.74%。总之,TransformEEG产生的结果更一致、偏斜更小;与其他被考察的模型相比,它在利用EEG数据检测PD时变异性显著降低,结果也更可靠。
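
The nested leave-N-subjects-out protocol described above can be sketched as follows. This is a minimal illustration that assumes a plain random assignment of subjects to folds; the function names and the placeholder subject IDs are hypothetical, and the paper's own partitioning may balance datasets or labels in ways not shown here.

```python
import random

def split_folds(items, k):
    """Split a list into k disjoint folds of nearly equal size."""
    return [items[i::k] for i in range(k)]

def nested_lnso(subject_ids, n_outer=10, n_inner=10, seed=0):
    """Yield (train, val, test) subject lists for every outer/inner pair.

    Splitting is done on subject IDs, so all recordings of one subject
    stay in the same partition.
    """
    rng = random.Random(seed)
    subjects = list(subject_ids)
    rng.shuffle(subjects)
    for test in split_folds(subjects, n_outer):
        test_set = set(test)
        remaining = [s for s in subjects if s not in test_set]
        for val in split_folds(remaining, n_inner):
            val_set = set(val)
            train = [s for s in remaining if s not in val_set]
            yield train, val, test

# 290 placeholder subject IDs, matching only the cohort size in the abstract.
partitions = list(nested_lnso(range(290)))
```

With 10 outer and 10 inner folds this produces 100 partitions, and a median and interquartile range like those reported above would be computed over the per-partition scores.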


Article 91

Title@2025-07-10 (4): Sparse Causal Discovery with Generative Intervention for Unsupervised Graph Domain Adaptation

Title: Sparse Causal Discovery with Generative Intervention for Unsupervised Graph Domain Adaptation Sparse Causal Discovery mit generativer Intervention für unüberwachte Graphen-Domänenanpassung 面向无监督图域适应的稀疏因果发现与生成式干预 2507.07621v1

Authors (9): Junyu Luo, Yuhao Tang, Yiwei Fu, Xiao Luo, Zhizhuo Kou, Zhiping Xiao, Wei Ju, Wentao Zhang, Ming Zhang

Unsupervised Graph Domain Adaptation (UGDA) leverages labeled source domain graphs to achieve effective performance in unlabeled target domains despite distribution shifts. However, existing methods often yield suboptimal results due to the entanglement of causal-spurious features and the failure of global alignment strategies. We propose SLOGAN (Sparse Causal Discovery with Generative Intervention), a novel approach that achieves stable graph representation transfer through sparse causal modeling and dynamic intervention mechanisms. Specifically, SLOGAN first constructs a sparse causal graph structure, leveraging mutual information bottleneck constraints to disentangle sparse, stable causal features while compressing domain-dependent spurious correlations through variational inference. To address residual spurious correlations, we innovatively design a generative intervention mechanism that breaks local spurious couplings through cross-domain feature recombination while maintaining causal feature semantic consistency via covariance constraints. Furthermore, to mitigate error accumulation in target domain pseudo-labels, we introduce a category-adaptive dynamic calibration strategy, ensuring stable discriminative learning. Extensive experiments on multiple real-world datasets demonstrate that SLOGAN significantly outperforms existing baselines.

无监督图域适应(UGDA)利用带标签的源域图,在存在分布偏移的情况下于无标签的目标域上取得有效性能。然而,由于因果特征与虚假特征相互纠缠,以及全局对齐策略的失效,现有方法往往只能得到次优结果。我们提出SLOGAN(Sparse Causal Discovery with Generative Intervention),这是一种通过稀疏因果建模和动态干预机制实现稳定图表示迁移的新方法。具体而言,SLOGAN首先构建稀疏因果图结构,利用互信息瓶颈约束解耦出稀疏、稳定的因果特征,同时通过变分推断压缩依赖于域的虚假相关性。针对残留的虚假相关性,我们创新性地设计了一种生成式干预机制,通过跨域特征重组打破局部虚假耦合,并借助协方差约束保持因果特征的语义一致性。此外,为缓解目标域伪标签中的误差累积,我们引入类别自适应的动态校准策略,以确保稳定的判别学习。在多个真实数据集上的大量实验表明,SLOGAN显著优于现有基线方法。


Article 92

Title@2025-07-10 (4): Sparse Self-Federated Learning for Energy Efficient Cooperative Intelligence in Society 5.0

Title: Sparse Self-Federated Learning for Energy Efficient Cooperative Intelligence in Society 5.0 Sparses selbstföderiertes Lernen für energieeffiziente kooperative Intelligenz in der Gesellschaft 5.0 面向社会5.0节能协同智能的稀疏自联邦学习 2507.07613v1

Authors (7): Davide Domini, Laura Erhan, Gianluca Aguzzi, Lucia Cavallaro, Amirhossein Douzandeh Zenoozi, Antonio Liotta, Mirko Viroli

Federated Learning offers privacy-preserving collaborative intelligence but struggles to meet the sustainability demands of the emerging IoT ecosystems necessary for Society 5.0, a human-centered technological future that balances social advancement with environmental responsibility. The excessive communication bandwidth and computational resources required by traditional FL approaches make them environmentally unsustainable at scale, creating a fundamental conflict with green AI principles as billions of resource-constrained devices attempt to participate. To this end, we introduce Sparse Proximity-based Self-Federated Learning (SParSeFuL), a resource-aware approach that bridges this gap by combining aggregate computing for self-organization with neural network sparsification to reduce energy and bandwidth consumption.

联邦学习联盟提供保护隐私的合作情报,但努力满足社会5.0个以人为本的技术未来平衡社会进步与环境责任所需的新兴国际地球电信生态系统的可持续性需求。传统FL方法要求的过度通信带宽和计算资源使得这些系统在规模上无法在环境上持续,在数十亿个资源受限制的装置试图参与时,与绿色国际交流原则造成了根本冲突。为此,我们引入了基于资源节约的粗略近距离自联学习(SParSeFuL),这是一种资源节约办法,通过将自我组织的总计算与神经网络的节能和带宽消费相结合,弥合这一差距。


Article 93

Title@2025-07-10 (4): S2FGL: Spatial Spectral Federated Graph Learning

Title: S2FGL: Spatial Spectral Federated Graph Learning S2FGL: Raumspektrales Federiertes Graphenlernen S2FGL: 空间光谱联邦图表学习 2507.02409v2

Authors (6): Zihan Tan, Suyuan Huang, Guancheng Wan, Wenke Huang, He Li, Mang Ye

Federated Graph Learning (FGL) combines the privacy-preserving capabilities of federated learning (FL) with the strong graph modeling capability of Graph Neural Networks (GNNs). Current research addresses subgraph-FL only from the structural perspective, neglecting the propagation of graph signals on spatial and spectral domains of the structure. From a spatial perspective, subgraph-FL introduces edge disconnections between clients, leading to disruptions in label signals and a degradation in the class knowledge of the global GNN. From a spectral perspective, spectral heterogeneity causes inconsistencies in signal frequencies across subgraphs, which makes local GNNs overfit the local signal propagation schemes. As a result, spectral client drifts occur, undermining global generalizability. To tackle the challenges, we propose a global knowledge repository to mitigate label signal disruption and a frequency alignment to address spectral client drifts. The combination of spatial and spectral strategies forms our framework S2FGL. Extensive experiments on multiple datasets demonstrate the superiority of S2FGL. The code is available at https://github.com/Wonder7racer/S2FGL.git.

联邦图表学习(FGL)将联邦学习(FL)的隐私保护能力与图形神经网络(GNNS)的强烈图形模型模型能力相结合。当前的研究仅从结构角度处理子光子FL,忽视了该结构空间和光谱域图信号的传播。从空间角度看,子光子-FL引入客户之间的边缘断裂,导致标签信号中断和全球GN的班级知识退化。从光谱角度看,光谱异质导致各子集的信号频率不一致,使得本地GNN超出本地信号传播计划的适宜性。结果,光谱客户的漂移会发生,破坏全球通用性。为了应对挑战,我们提议建立一个全球知识库,以减少标签信号中断和频率调整,以应对光谱客户的漂移。空间和光谱战略的结合构成我们S2FGL.关于多个数据集的广泛实验的框架,显示了S2FGL的优越性。代码见https://github.com/Wenderassar/S2FL.F。


Article 94

Title@2025-07-10 (4): Offline Trajectory Optimization for Offline Reinforcement Learning

Title: Offline Trajectory Optimization for Offline Reinforcement Learning Offline-Trajektorienoptimierung für Offline-Verstärkungslernen 离线轨迹优化用于离线强化学习 2404.10393v2

Authors (9): Ziqi Zhao, Zhaochun Ren, Liu Yang, Yunsen Liang, Fajie Yuan, Pengjie Ren, Zhumin Chen, Jun Ma, Xin Xin

Offline reinforcement learning (RL) aims to learn policies without online exploration. To enlarge the training data, model-based offline RL learns a dynamics model that is used as a virtual environment to generate simulation data and enhance policy learning. However, existing data augmentation methods for offline RL suffer from (i) trivial improvement from short-horizon simulation; and (ii) the lack of evaluation and correction for generated data, leading to low-quality augmentation. In this paper, we propose offline trajectory optimization for offline reinforcement learning (OTTO). The key motivation is to conduct long-horizon simulation and then utilize model uncertainty to evaluate and correct the augmented data. Specifically, we propose an ensemble of Transformers, a.k.a. World Transformers, to predict environment state dynamics and the reward function. Three strategies are proposed to use World Transformers to generate long-horizon trajectory simulation by perturbing the actions in the offline data. Then, an uncertainty-based World Evaluator is introduced, first to evaluate the confidence of the generated trajectories and then to correct low-confidence data. Finally, we jointly use the original data with the corrected augmentation data to train an offline RL algorithm. OTTO serves as a plug-in module and can be integrated with existing model-free offline RL methods. Experiments on various benchmarks show that OTTO can effectively improve the performance of representative offline RL algorithms, including in complex environments with sparse rewards like AntMaze. Code is available at https://github.com/ZiqiZhao1/OTTO.

离线强化学习(RL)旨在无需在线探索的情况下学习策略。为了扩充训练数据,基于模型的离线RL会学习一个动力学模型,将其用作虚拟环境来生成模拟数据并增强策略学习。然而,现有的离线RL数据增强方法存在以下问题:(i)短时域模拟带来的提升微乎其微;(ii)缺乏对生成数据的评估和校正,导致增强数据质量较低。本文提出面向离线强化学习的离线轨迹优化方法(OTTO)。其核心思路是进行长时域模拟,然后利用模型不确定性对增强数据进行评估和校正。具体而言,我们提出一个由多个Transformer组成的集成模型,称为World Transformers,用于预测环境状态动力学和奖励函数。我们提出三种策略,通过扰动离线数据中的动作,利用World Transformers生成长时域轨迹模拟。随后,引入基于不确定性的World Evaluator,先评估生成轨迹的置信度,再对低置信度数据进行校正。最后,将原始数据与校正后的增强数据一起用于训练离线RL算法。OTTO作为即插即用模块,可与现有的无模型离线RL方法结合。在多个基准上的实验表明,OTTO能有效提升代表性离线RL算法的性能,包括在AntMaze等奖励稀疏的复杂环境中。代码见 https://github.com/ZiqiZhao1/OTTO。


Article 95

Title@2025-07-10 (4): Synthetic MC via Biological Transmitters: Therapeutic Modulation of the Gut-Brain Axis

Title: Synthetic MC via Biological Transmitters: Therapeutic Modulation of the Gut-Brain Axis Synthetische MC über biologische Transmitter: Therapeutische Modulation der Gut-Brain-Achse 通过生物传播器进行MC:古特脑轴体的治疗变化 2507.07604v1

Authors (6): Sebastian Lotter, Elisabeth Mohr, Andrina Rutsch, Lukas Brand, Francesca Ronchi, Laura Díaz-Marugán

Synthetic molecular communication (SMC) is a key enabler for future healthcare systems in which Internet of Bio-Nano-Things (IoBNT) devices facilitate the continuous monitoring of a patient’s biochemical signals. To close the loop between sensing and actuation, both the detection and the generation of in-body molecular communication (MC) signals are key. However, generating signals inside the human body, e.g., via synthetic nanodevices, poses a challenge in SMC, due to technological obstacles as well as legal, safety, and ethical issues. Hence, this paper considers an SMC system in which signals are generated indirectly via the modulation of a natural in-body MC system, namely the gut-brain axis (GBA). Therapeutic GBA modulation is already established as a treatment for neurological diseases, e.g., drug-refractory epilepsy (DRE), and is performed via the administration of nutritional supplements or specific diets. However, the molecular signaling pathways that mediate the effect of such treatments are mostly unknown. Consequently, existing treatments are standardized or designed heuristically and help only some patients while failing to help others. In this paper, we propose to leverage personal health data, e.g., gathered by in-body IoBNT devices, to design GBA modulation-based treatments that are more versatile and robust than the existing ones. To show the feasibility of our approach, we define a catalog of theoretical requirements for therapeutic GBA modulation. Then, we propose a machine learning model to verify these requirements for practical scenarios when only limited data on the GBA modulation exists. By evaluating the proposed model on several datasets, we confirm its excellent accuracy in identifying different modulators of the GBA. Finally, we utilize the proposed model to identify specific modulatory pathways that play an important role for therapeutic GBA modulation.

合成分子通信(SMC)是未来医疗保健系统的关键使能技术,在这类系统中,生物纳米物联网(IoBNT)设备能够持续监测患者的生化信号。要闭合感知与执行之间的回路,体内分子通信(MC)信号的检测和生成都至关重要。然而,由于技术障碍以及法律、安全和伦理问题,在人体内(例如通过合成纳米器件)生成信号是SMC面临的一项挑战。因此,本文考虑这样一种SMC系统:通过调制一个天然的体内MC系统,即肠-脑轴(GBA),来间接生成信号。治疗性GBA调制已被确立为神经系统疾病(例如药物难治性癫痫,DRE)的治疗手段,并通过服用营养补充剂或采用特定饮食来实施。然而,介导此类治疗效果的分子信号通路大多尚不清楚。因此,现有治疗方案要么是标准化的,要么是凭经验设计的,只能帮助部分患者,而对其他患者无效。本文提出利用个人健康数据(例如由体内IoBNT设备采集的数据),设计比现有方案更通用、更稳健的基于GBA调制的治疗方案。为说明该方法的可行性,我们定义了治疗性GBA调制的一组理论要求。随后,我们提出一个机器学习模型,用于在GBA调制数据有限的实际场景中验证这些要求。通过在多个数据集上评估所提模型,我们证实了它在识别不同GBA调制因子方面具有出色的准确性。最后,我们利用所提模型识别出在治疗性GBA调制中起重要作用的特定调制通路。


Article 96

Title@2025-07-10 (4): Don’t Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning

Title: Don’t Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning Drücken Sie nicht auf den Knopf! Erforschen von Daten Leckage Risiken im maschinellen Lernen und Transfer Lernen 不要按按钮! 探索机器学习和传输学习中的数据泄漏风险 2401.13796v4

Authors (3): Andrea Apicella, Francesco Isgrò, Roberto Prevete

Machine Learning (ML) has revolutionized various domains, offering predictive capabilities in several areas. However, with the increasing accessibility of ML tools, many practitioners, lacking deep ML expertise, adopt a “push the button” approach, utilizing user-friendly interfaces without a thorough understanding of underlying algorithms. While this approach provides convenience, it raises concerns about the reliability of outcomes, leading to challenges such as incorrect performance evaluation. This paper addresses a critical issue in ML, known as data leakage, where unintended information contaminates the training data, impacting model performance evaluation. Users, due to a lack of understanding, may inadvertently overlook crucial steps, leading to optimistic performance estimates that may not hold in real-world scenarios. The discrepancy between evaluated and actual performance on new data is a significant concern. In particular, this paper categorizes data leakage in ML, discussing how certain conditions can propagate through the ML workflow. Furthermore, it explores the connection between data leakage and the specific task being addressed, investigates its occurrence in Transfer Learning, and compares standard inductive ML with transductive ML frameworks. The conclusion summarizes key findings, emphasizing the importance of addressing data leakage for robust and reliable ML applications.

机器学习(ML)使各个领域发生革命性的变化,在多个领域提供了预测能力;然而,随着ML工具的可获取性日益增强,许多从业人员缺乏深度ML专业知识,采用“按键”方法,在不彻底了解基本算法的情况下使用用户友好的界面;虽然这种方法提供了方便,但它引起了对结果可靠性的关切,导致不正确的绩效评价等挑战。本文件涉及ML中的一个关键问题,即数据泄漏,即意外信息污染培训数据,影响模型绩效评价。由于缺乏理解,用户可能无意中忽略关键步骤,导致在现实世界中可能无法维持的乐观性性业绩估计。新数据的评价和实际性能之间的差异是一个重大关切问题。特别是,本文将ML的数据泄漏分类,讨论某些条件如何通过ML工作流程传播。此外,本文件探讨了数据泄漏与正在处理的具体任务之间的联系,调查在转移学习中发生的数据泄漏,并将引入ML标准与传输ML框架进行比较。结论总结了关键结论,强调解决数据泄漏对于可靠和可靠的ML应用的重要性。
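
One common form of the leakage discussed above is fitting a preprocessing step on the full dataset before splitting. The sketch below is a toy illustration with made-up numbers and hypothetical helper names; it is not taken from the paper, which covers a broader taxonomy of leakage types.

```python
import statistics

def fit_scaler(values):
    """Return (mean, std) computed from the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)

def transform(values, scaler):
    """Standardize values with a previously fitted (mean, std) pair."""
    mean, std = scaler
    return [(v - mean) / std for v in values]

# Toy feature column; the last two values play the role of the test set.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 50.0, 60.0]
train, test = data[:6], data[6:]

# Leaky: the scaler sees the test values before evaluation.
leaky_scaler = fit_scaler(data)

# Leak-free: the scaler is fitted on the training split only.
clean_scaler = fit_scaler(train)

train_leaky = transform(train, leaky_scaler)
train_clean = transform(train, clean_scaler)
test_clean = transform(test, clean_scaler)
```

In the leaky variant the training features already encode the scale of the held-out values, so any model evaluated on that test split has indirectly seen it.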


Article 97

Title@2025-07-10 (4): Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations

Title: Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations Überprüfung der Likelihood-basierten Out-of-Distribution-Erkennung durch Modellierung von Repräsentationen 通过建模代表机构重新审视以可能性为基础的分销外探测 2504.07793v3

Authors (6): Yifan Ding, Arturas Aleksandraus, Amirhossein Ahmadian, Jonas Unger, Fredrik Lindsten, Gabriel Eilertsen

Out-of-distribution (OOD) detection is critical for ensuring the reliability of deep learning systems, particularly in safety-critical applications. Likelihood-based deep generative models have historically faced criticism for their unsatisfactory performance in OOD detection, often assigning higher likelihood to OOD data than to in-distribution samples when applied to image data. In this work, we demonstrate that likelihood is not inherently flawed. Rather, several properties of the image space prevent likelihood from serving as a valid detection score. Given a sufficiently good likelihood estimator, specifically using the probability flow formulation of a diffusion model, we show that likelihood-based methods can still perform on par with state-of-the-art methods when applied in the representation space of pre-trained encoders. The code of our work can be found at https://github.com/limchaos/Likelihood-OOD.git.

分布外(OOD)检测对于确保深度学习系统的可靠性至关重要,尤其是在安全关键型应用中。基于似然的深度生成模型历来因其在OOD检测中的表现不佳而受到批评:在应用于图像数据时,它们往往给OOD数据赋予比分布内样本更高的似然。在这项工作中,我们证明似然本身并非存在固有缺陷,而是图像空间的若干性质使似然无法成为有效的检测分数。在似然估计器足够好的前提下(具体而言,使用扩散模型的概率流形式),我们表明,在预训练编码器的表示空间中应用时,基于似然的方法仍能达到与最先进方法相当的性能。我们的代码见 https://github.com/limchaos/Likelihood-OOD.git。


Article 98

Title@2025-07-10 (4): Stress Monitoring in Healthcare: An Ensemble Machine Learning Framework Using Wearable Sensor Data

Title: Stress Monitoring in Healthcare: An Ensemble Machine Learning Framework Using Wearable Sensor Data Stressüberwachung im Gesundheitswesen: Ein Ensemble Machine Learning Framework mit tragbaren Sensordaten 保健中压力监测:使用穿戴感感应数据的综合机械学习框架 2507.07589v1

Authors (3): Arpana Sinhal, Anay Sinhal, Amit Sinhal

Healthcare professionals, particularly nurses, face elevated occupational stress, a concern amplified during the COVID-19 pandemic. While wearable sensors offer promising avenues for real-time stress monitoring, existing studies often lack comprehensive datasets and robust analytical frameworks. This study addresses these gaps by introducing a multimodal dataset comprising three physiological signals: electrodermal activity, heart rate, and skin temperature. A systematic literature review identified limitations in prior stress-detection methodologies, particularly in handling class imbalance and optimizing model generalizability. To overcome these challenges, the dataset underwent preprocessing with the Synthetic Minority Over-sampling Technique (SMOTE), ensuring balanced representation of stress states. Advanced machine learning models including Random Forest, XGBoost and a Multi-Layer Perceptron (MLP) were evaluated and combined into a Stacking Classifier to leverage their collective predictive strengths. By using a publicly accessible dataset and a reproducible analytical pipeline, this work advances the development of deployable stress-monitoring systems, offering practical implications for safeguarding healthcare workers’ mental health. Future research directions include expanding demographic diversity and exploring edge-computing implementations for low-latency stress alerts.

在COVID-19大流行期间,保健专业人员,特别是护士,面临着职业压力升高的问题,这是人们更加关注的一个问题。虽然穿戴传感器为实时压力监测提供了有希望的渠道,但现有的研究往往缺乏全面的数据集和强有力的分析框架。这项研究通过引入由生理信号、电极活动、心率和皮肤温度组成的多式联运数据集,弥补了这些差距。系统文献审查查明了先前的压力检测方法的局限性,特别是在处理阶级不平衡和优化模型一般性方面。为了克服这些挑战,数据集与合成少数群体抽样技术(SMOTE)一起进行了预处理,确保压力状态的均衡代表。包括随机森林、XGBoust和多激光 Perceptron(MLP)在内的先进机器学习模型得到了评估,并合并成一个标准分类,以利用其集体预测优势。通过使用公众可获取的数据集和可复制的分析管道,这项工作推动了可部署的压力监测系统的开发,为保护保健工作者的心理健康提供了实际影响。未来的研究方向包括扩大人口多样性和探索低潜压压力警报的边缘执行。
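
The SMOTE step mentioned above balances classes by interpolating between minority samples and their nearest minority neighbours. The sketch below is a simplified pure-Python illustration of that idea with made-up sensor readings; it is an assumption-laden stand-in, not the study's pipeline, which would normally rely on an established library implementation.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples.
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical (electrodermal activity, heart rate, skin temperature) readings.
stressed = [(0.9, 98.0, 33.1), (1.1, 102.0, 32.8), (1.0, 105.0, 33.0), (1.3, 110.0, 32.5)]
calm = [(0.2, 70.0 + i, 34.0) for i in range(10)]

new_points = smote(stressed, n_new=len(calm) - len(stressed))
balanced_stressed = stressed + new_points
```

Oversampling of this kind should be applied to the training split only, so that synthetic points derived from test samples never reach the classifier.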


Article 99

Title@2025-07-10 (4): Bayesian Discrete Diffusion Beats Autoregressive Perplexity

Title: Bayesian Discrete Diffusion Beats Autoregressive Perplexity Bayessche diskrete Diffusion schlägt autoregressive Perplexität 贝叶斯离散扩散在困惑度上优于自回归模型 2507.07586v1

Authors (1): Cooper Doyle

We reveal a hidden Bayesian core of discrete-diffusion language models by showing that the expected denoiser output under the forward masking distribution recovers the exact posterior over clean tokens. Under minimal assumptions, Monte Carlo marginalization over K independent corruptions converges to this posterior at rate O(1/sqrt(K)), yielding a simple proof of consistency and finite-sample error bounds. Building on this insight, we introduce a lightweight inference-time ensemble that averages K mask-and-denoise passes to obtain posterior-aware token probabilities and uncertainty estimates at no extra training cost. On WikiText-2, our method achieves test perplexity 8.8 with K=8, versus 20.3 for GPT-2 Small, despite using a model of comparable size. Code is available at https://github.com/mercury0100/bayesradd.

我们揭示了一种隐蔽的贝叶斯语分流语言模型核心。 我们通过展示前面遮罩分布下预期的解名器输出将恢复准确的后遗迹而不是干净的象征物。 在最起码的假设下,蒙特卡洛对K独立腐败的边缘化以O(1/sqrt(K))率汇集到这个后遗迹上,产生一个简单的一致性证据和有限的标本误差界限。 基于这一洞察,我们引入了一种轻巧的推论时间共和元素,即平均使用 K 蒙面和隐蔽的通行证来获得后背负符号概率和不确定性估计,而无需额外培训费用。 在WikitText-2上,我们的方法实现了8.8和K=8的双倍性测试,而GPT-2 Small则达到20.3,尽管使用了类似规模的模式。 代码可在 https://github.com/mercury0100/Bayesradd上查阅。
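
The inference-time ensemble described above averages the denoiser output over K independent maskings of the same sequence. The sketch below illustrates only that averaging loop; `toy_denoiser`, the three-token vocabulary, and the masking rate are hypothetical stand-ins for a trained discrete-diffusion model and its forward process.

```python
import random

VOCAB = ["a", "b", "c"]
MASK = "<mask>"

def toy_denoiser(tokens):
    """Stand-in for a trained denoiser: returns one distribution per position.

    Visible tokens are predicted as themselves; masked positions get a
    Laplace-smoothed unigram distribution over the visible tokens.
    """
    visible = [t for t in tokens if t != MASK]
    total = len(visible) + len(VOCAB)
    unigram = [(visible.count(v) + 1) / total for v in VOCAB]
    return [
        unigram if t == MASK else [1.0 if v == t else 0.0 for v in VOCAB]
        for t in tokens
    ]

def corrupt(tokens, mask_rate, rng):
    """Forward masking: independently replace each token with MASK."""
    return [MASK if rng.random() < mask_rate else t for t in tokens]

def mc_posterior(tokens, k, mask_rate=0.5, seed=0):
    """Average the denoiser output over k independent corruptions."""
    rng = random.Random(seed)
    sums = [[0.0] * len(VOCAB) for _ in tokens]
    for _ in range(k):
        probs = toy_denoiser(corrupt(tokens, mask_rate, rng))
        for pos, dist in enumerate(probs):
            for j, p in enumerate(dist):
                sums[pos][j] += p
    return [[s / k for s in row] for row in sums]

posterior = mc_posterior(["a", "b", "a", "c", "a"], k=8)
```

Because each pass is an independent sample, the Monte Carlo error of the averaged probabilities shrinks at the O(1/sqrt(K)) rate stated in the abstract, so quadrupling K roughly halves it.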


Article 100

Title@2025-07-10 (4): Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench

Title: Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench Jenseits von Überkorrektur: Bewertung von Diversität in T2I-Modellen mit DivBench 超越过度纠正:在DivBench的T2I模型中评估多样性 2507.03015v2

Authors (5): Felix Friedrich, Thiemo Ganesha Welsch, Manuel Brack, Patrick Schramowski, Kristian Kersting

Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification where demographic attributes are modified even when explicitly specified in prompts. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation. Through systematic evaluation of state-of-the-art T2I models, we find that while most models exhibit limited diversity, many diversification approaches overcorrect by inappropriately altering contextually-specified attributes. We demonstrate that context-aware methods, particularly LLM-guided FairDiffusion and prompt rewriting, can already effectively address under-diversity while avoiding over-diversification, achieving a better balance between representation and semantic fidelity.

目前的文字到图像(T2I)模型多样化战略往往忽视了背景的适宜性,导致人口特征被过度多样化,即使有明确的规定,人口属性也会被修改,本文件介绍了DIVBENCH,这是衡量T2I一代中差异不足和差异过大的基准和评价框架。通过对最新T2I模型的系统评估,我们发现,虽然大多数模型表现出有限的多样性,但许多多样化方法因环境特征的不适当改变而过于纠正。我们证明,对背景的认识方法,特别是LLLM-指导的公平传播和迅速重写,已经能够有效解决多样性不足问题,同时避免过度多样化,在代表性和语义真实性之间实现更好的平衡。


Article 101

Title@2025-07-10 (4): Improving Clustering on Occupational Text Data through Dimensionality Reduction

Title: Improving Clustering on Occupational Text Data through Dimensionality Reduction Verbesserung der Clusterbildung auf berufsbezogenen Textdaten durch Dimensionalitätsreduzierung 通过减少分量改进职业文本数据集群化 2507.07582v1

Authors (3): Iago Xabier Vázquez García, Damla Partanaz, Emrullah Fatih Yetkin

In this study, we focused on proposing an optimal clustering mechanism for the occupations defined in the well-known US-based occupational database, O*NET. Even though all occupations are defined according to well-conducted surveys in the US, their definitions can vary across firms and countries. Hence, if one wants to expand the data already collected in O*NET to occupations defined with different tasks, a map between the definitions is a vital requirement. We proposed a pipeline using several BERT-based techniques with various clustering approaches to obtain such a map. We also examined the effect of dimensionality reduction approaches on several metrics used to measure the performance of clustering algorithms. Finally, we improved our results by using a specialized silhouette approach. This new clustering-based mapping approach with dimensionality reduction may help distinguish occupations automatically, creating new paths for people wanting to change their careers.

在这项研究中,我们侧重于为众所周知的美国职业数据库ONET中界定的职业提出一个最佳的集群机制。尽管所有职业的定义都是根据美国进行的良好调查确定的,但对于不同的公司和国家,其定义可能有所不同。因此,如果想扩大ONET中已经为不同任务界定的职业收集的数据,定义之间的图将是一项至关重要的要求。我们建议使用几种基于BERT的技术编审管道,以获得这样的地图。我们还研究了在测量组合算法绩效时采用的若干衡量标准中采用维度削减方法的影响。最后,我们通过使用专门的光影方法改进了我们的结果。这种新的基于集群的绘图方法加上减少维度,可能有助于自动区分职业,为希望改变职业的人创造新的途径。
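
As an illustration of the kind of clustering metric the abstract above refers to, the following is a small sketch of the standard mean silhouette coefficient. The abstract does not describe the authors' "specialized silhouette approach", so this is only the textbook definition; the function name and the toy points are assumptions.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient: s(i) = (b - a) / max(a, b).

    a = mean distance from point i to the other members of its own cluster,
    b = smallest mean distance from point i to the members of another cluster.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Group distances from point i to every other point by cluster label.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster: score 0 by convention
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 1-D "embeddings": two tight, well-separated groups.
POINTS = [(0.0,), (0.1,), (10.0,), (10.1,)]
good = mean_silhouette(POINTS, [0, 0, 1, 1])  # labels match the groups
bad = mean_silhouette(POINTS, [0, 1, 0, 1])   # labels cut across the groups
```

A score close to 1 indicates compact, well-separated clusters, so comparing it before and after dimensionality reduction is one simple way to see whether the reduction helped.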


Article 102

Title@2025-07-10 (4): CHOMET: Conditional Handovers via Meta-Learning

Title: CHOMET: Conditional Handovers via Meta-Learning CHOMET: Bedingte Übergaben über Meta-Learning CHOMET: 通过Met-Learn 有条件的交接 2507.07581v1

Authors (3): Michail Kalntis, Fernando A. Kuipers, George Iosifidis

Handovers (HOs) are the cornerstone of modern cellular networks for enabling seamless connectivity to a vast and diverse number of mobile users. However, as mobile networks become more complex with more diverse users and smaller cells, traditional HOs face significant challenges, such as prolonged delays and increased failures. To mitigate these issues, 3GPP introduced conditional handovers (CHOs), a new type of HO that enables the preparation (i.e., resource allocation) of multiple cells for a single user to increase the chance of HO success and decrease the delays in the procedure. Despite its advantages, CHO introduces new challenges that must be addressed, including efficient resource allocation and managing the signaling/communication overhead from frequent cell preparations and releases. This paper presents a novel framework aligned with the O-RAN paradigm that leverages meta-learning for CHO optimization, providing robust dynamic regret guarantees and performing at least 180% better than other 3GPP benchmarks in volatile signal conditions.

移交(HOs)是现代蜂窝网络的基石,使大量和多样化的流动用户能够实现无缝连接,然而,随着移动网络与更为多样化的用户和较小的细胞变得更加复杂,传统 hus面临重大挑战,如长期拖延和失败增加等。为缓解这些问题,3GPP采用了有条件移交(CHOs),这是一种新型的HO,它使得能够为单一用户准备(即资源分配)多个细胞,以增加HO成功的机会并减少程序的延误。尽管具有优势,但CHO提出了必须应对的新挑战,包括高效的资源分配以及管理频繁的细胞准备和释放的信号/通信间接费用。本文提出了一个与O-RAN模式相一致的新框架,该模式将元学习用于CHO优化,提供有力的强烈遗憾保证,并表明在不稳定的信号条件下至少比其他3GPP基准高180%的绩效。


Article 103

Title@2025-07-10 (4): COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation

Title: COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation COALA: Numerisch stabiles und effizientes Framework für kontextabhängige Low-Rank-Annäherung COALA: 低 Rank 上下低敏度接近度的数值稳定、高效框架 2507.07580v1

Authors (2): Uliana Parkina, Maxim Rakhuba

Recent studies suggest that context-aware low-rank approximation is a useful tool for compression and fine-tuning of modern large-scale neural networks. In this type of approximation, a norm is weighted by a matrix of input activations, significantly improving metrics over the unweighted case. Nevertheless, existing methods for neural networks suffer from numerical instabilities due to their reliance on classical formulas involving explicit Gram matrix computation and its subsequent inversion. We demonstrate that this can degrade the approximation quality or produce numerically singular matrices. To address these limitations, we propose a novel inversion-free regularized framework that is based entirely on stable decompositions and overcomes the numerical pitfalls of prior art. Our method handles challenging scenarios: (1) when calibration matrices exceed GPU memory capacity, (2) when input activation matrices are nearly singular, and even (3) when insufficient data prevents unique approximation. For the latter, we prove that our solution converges to a desired approximation and derive explicit error bounds.

最近的研究显示,环境觉察到的低级近似值是压缩和微调现代大型神经网络的有用工具。在这种近似值中,标准被输入激活矩阵加权,大大改进了未加权案例的测量尺度。然而,神经网络的现有方法由于依赖传统公式而存在数字不稳定性,这些公式包括明确的Gram矩阵计算及其随后的倒置。我们证明这可以降低近似质量或导致数字单一矩阵。为了解决这些限制,我们提议了一个全新的无倒置常规化框架,完全基于稳定的分解和克服先前艺术的数值陷阱。我们的方法可以处理可能具有挑战性的情况:(1) 当校准矩阵超过GPU记忆能力时,(2) 当输入激活矩阵几乎是单一的,(2) 当数据不足时,甚至(3) 当数据无法防止独特的近似时,我们证明我们的解决方案与理想的近似和明显错误的界限一致。
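
The abstract above does not spell out the formulas, so the following derivation is only a sketch of one standard inversion-free route to the weighted problem, under assumed notation ($W$ the weight matrix, $X$ the matrix of input activations, $r$ the target rank). It is not claimed to be the authors' exact algorithm. The classical solution uses $S = (XX^\top)^{1/2}$ and $S^{-1}$; the version below replaces both with a QR factor of $X^\top$ and a truncated SVD.

```latex
% Assumed notation: W = weights, X = input activations, r = target rank.
\begin{align*}
  &\min_{\operatorname{rank}(W') \le r} \; \lVert (W - W')\,X \rVert_F
  && \text{(weighted low-rank problem)} \\
  &X^\top = QR, \qquad Q^\top Q = I
  && \text{(thin QR; note } XX^\top = R^\top R \text{ is never formed)} \\
  &\lVert (W - W')\,X \rVert_F
   = \lVert (W - W')\,R^\top Q^\top \rVert_F
   = \lVert (W - W')\,R^\top \rVert_F
  && \text{(orthonormal columns of } Q\text{)} \\
  &W R^\top = U \Sigma V^\top, \qquad W'_\star = U_r U_r^\top W
  && \text{(truncated SVD, no inverse needed)} \\
  &(W - W'_\star)\,R^\top = W R^\top - U_r \Sigma_r V_r^\top
  && \text{(optimal by Eckart--Young)}
\end{align*}
```

The last line shows why no inverse is required: projecting $W$ onto the leading left singular vectors of $W R^\top$ already reproduces the best rank-$r$ approximation of $W R^\top$, and any rank-$r$ candidate $W'$ yields a product $W' R^\top$ of rank at most $r$.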


Article 104

Title@2025-07-10 (4): On Trustworthy Rule-Based Models and Explanations

Title: On Trustworthy Rule-Based Models and Explanations Über vertrauenswürdige regelbasierte Modelle und Erklärungen 关于可信赖、有可信赖的、基于规则的模型和解释 2507.07576v1

Authors (3): Mohamed Siala, Jordi Planes, Joao Marques-Silva

A task of interest in machine learning (ML) is that of ascribing explanations to the predictions made by ML models. Furthermore, in domains deemed high risk, the rigor of explanations is paramount. Indeed, incorrect explanations can and will mislead human decision makers. As a result, and even if interpretability is acknowledged as an elusive concept, so-called interpretable models are employed ubiquitously in high-risk uses of ML and data mining (DM). This is the case for rule-based ML models, which encompass decision trees, diagrams, sets and lists. This paper relates explanations to well-known undesired facets of rule-based ML models, which include negative overlap and several forms of redundancy. The paper develops algorithms for the analysis of these undesired facets of rule-based systems, and concludes that well-known and widely used tools for learning rule-based ML models will induce rule sets that exhibit one or more negative facets.

机器学习(ML)是一项令人感兴趣的任务,即对ML模型所作的预测作出解释。此外,在被认为高风险的领域,解释的严谨性至关重要。事实上,错误的解释可以而且会误导人类决策者。因此,即使可解释性被公认为是一个难以理解的概念,在ML和数据挖掘(DM)的高风险使用中,也普遍采用了所谓的可解释模型。基于规则的ML模型就属于这种情况,其中包括决策树、图表、数据集和列表。本文将解释与众所周知的基于规则的ML模型的不理想方面联系起来,其中包括负重叠和几种冗余形式。该文件为分析基于规则的系统的这些不理想方面制定了算法,并得出结论认为,众所周知和广泛使用的学习基于规则的ML模型的工具将引出一个或更多负面的方面。


Article 105

Title@2025-07-10 (4): Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning

Title: Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning Künstliche Generäle Intelligenz: Meistern von Generälen.io mit Verstärkungslernen 人造将军情报:掌握将军,加强学习 2507.06825v2

Authors (2): Matej Straka, Martin Schmid

We introduce a real-time strategy game environment based on Generals.io, a game with thousands of weekly active players. Our environment is fully compatible with Gymnasium and PettingZoo and is capable of running thousands of frames per second on commodity hardware. We also present a reference agent, trained with supervised pre-training and self-play, which reached the top 0.003% of the 1v1 human leaderboard after only 36 hours on a single H100 GPU. To accelerate learning, we incorporate potential-based reward shaping and memory features. Our contributions of a modular RTS benchmark and a competitive baseline agent provide an accessible yet challenging platform for advancing multi-agent reinforcement learning research. The documented code, together with examples and tutorials, is available at https://github.com/strakam/generals-bots.

我们引入了一个基于Generals.io的实时战略游戏环境,这是一个每周有数千名活跃球员参加的游戏,我们的环境与Gymnasium和PettingZoo完全兼容,并且能够每秒运行数千个商品硬件框架,我们还提供一名参考代理,在培训前和自玩上受过监督培训,在单个H100GPU上仅36小时就达到1v1人领导板的顶端0.03%。为了加速学习,我们纳入了潜在的奖励塑造和记忆特征。我们的一个模块式RTS基准和一个竞争性基准代理提供了无障碍但具有挑战性的平台,用于推进多剂强化学习研究。记录代码以及实例和辅导材料可在https://github.com/strakam/gences-bots查阅。
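
The abstract above mentions potential-based reward shaping without giving details. As a minimal sketch, the standard form adds $\gamma\,\Phi(s') - \Phi(s)$ to the environment reward. The potential values below are made-up numbers, and the function names are illustrative rather than part of the generals-bots API.

```python
def shaping_term(phi_s, phi_next, gamma, done):
    """Potential-based shaping term: gamma * Phi(s') - Phi(s).

    The potential of a terminal state is treated as 0.
    """
    next_potential = 0.0 if done else phi_next
    return gamma * next_potential - phi_s


def shaped_reward(reward, phi_s, phi_next, gamma=0.99, done=False):
    """Environment reward plus the shaping term."""
    return reward + shaping_term(phi_s, phi_next, gamma, done)


def discounted_shaping_sum(potentials, gamma):
    """Discounted sum of shaping terms along one episode.

    potentials[t] is Phi(s_t) for t = 0..T, where s_T is terminal.
    """
    last = len(potentials) - 2
    return sum(
        gamma**t * shaping_term(potentials[t], potentials[t + 1], gamma, done=(t == last))
        for t in range(len(potentials) - 1)
    )


# Hypothetical potentials along a four-state episode.
total = discounted_shaping_sum([3.0, 5.0, 4.0, 9.0], gamma=0.9)
```

The sum telescopes to `-Phi(s_0)` regardless of the path taken, which is the usual argument for why this kind of shaping changes learning speed but not which policy is optimal.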


Article 106

Title@2025-07-10 (4): Solving Probabilistic Verification Problems of Neural Networks using Branch and Bound

Title: Solving Probabilistic Verification Problems of Neural Networks using Branch and Bound Lösung probabilistischer Verifikationsprobleme von neuralen Netzen mittels Branch und Bound 利用分支和边界解决神经网络的概率核查问题 2405.17556v3

Authors (3): David Boetius, Stefan Leue, Tobias Sutter

Probabilistic verification problems of neural networks are concerned with formally analysing the output distribution of a neural network under a probability distribution of the inputs. Examples of probabilistic verification problems include verifying the demographic parity fairness notion or quantifying the safety of a neural network. We present a new algorithm for solving probabilistic verification problems of neural networks based on an algorithm for computing and iteratively refining lower and upper bounds on probabilities over the outputs of a neural network. By applying state-of-the-art bound propagation and branch and bound techniques from non-probabilistic neural network verification, our algorithm significantly outpaces existing probabilistic verification algorithms, reducing solving times for various benchmarks from the literature from tens of minutes to tens of seconds. Furthermore, our algorithm compares favourably even to dedicated algorithms for restricted probabilistic verification problems. We complement our empirical evaluation with a theoretical analysis, proving that our algorithm is sound and, under mildly restrictive conditions, also complete when using a suitable set of heuristics.

神经网络的概率核查问题涉及在投入的概率分布下正式分析神经网络的输出分布; 概率核查问题的例子包括核实人口均等公平概念或量化神经网络的安全性; 我们提出一种新的算法,以解决神经网络的概率核查问题,这种算法的基础是计算和迭代地完善神经网络产出概率的下限和上限的算法; 通过应用最先进的约束传播和分支以及非不稳定神经网络核查的约束技术,我们的算法大大超过现有的概率核查算法,将文献中各种基准的解算时间从几十分钟缩短到数十秒。 此外,我们的算法甚至优于用于有限概率核查问题的专用算法。 我们用理论分析来补充我们的经验评价,证明我们的算法是健全的,在温和的限制性条件下,在使用一套适当的超自然学时也完成。
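
To illustrate the idea of iteratively refining lower and upper bounds on a probability, here is a toy sketch for a one-dimensional ReLU network with a uniformly distributed input. It uses plain interval arithmetic and breadth-first splitting; the network format, the function names, and the step budget are assumptions for illustration, and the authors' algorithm relies on far stronger bound propagation and heuristics.

```python
from collections import deque

# Hypothetical network format: hidden units as (w1, b1, w2) triples plus output bias b2,
# so that f(x) = b2 + sum(w2 * relu(w1 * x + b1)).
EXAMPLE_NET = {"hidden": [(1.0, 0.0, 1.0)], "b2": -0.5}  # f(x) = relu(x) - 0.5


def output_bounds(net, lo_x, hi_x):
    """Interval bounds on f(x) for x in [lo_x, hi_x]."""
    lo = hi = net["b2"]
    for w1, b1, w2 in net["hidden"]:
        p, q = w1 * lo_x + b1, w1 * hi_x + b1
        h_lo, h_hi = max(0.0, min(p, q)), max(0.0, max(p, q))
        if w2 >= 0:
            lo, hi = lo + w2 * h_lo, hi + w2 * h_hi
        else:
            lo, hi = lo + w2 * h_hi, hi + w2 * h_lo
    return lo, hi


def probability_bounds(net, a, b, max_steps=64):
    """Bounds on P(f(x) >= 0) for x ~ Uniform[a, b]."""
    proven = 0.0   # probability mass where f >= 0 is certain
    refuted = 0.0  # probability mass where f < 0 is certain
    queue = deque([(a, b)])
    for _ in range(max_steps):
        if not queue:
            break
        l, u = queue.popleft()
        f_lo, f_hi = output_bounds(net, l, u)
        mass = (u - l) / (b - a)
        if f_lo >= 0.0:
            proven += mass
        elif f_hi < 0.0:
            refuted += mass
        else:
            mid = 0.5 * (l + u)  # undecided: branch into two halves
            queue.extend([(l, mid), (mid, u)])
    # Undecided regions still count toward the upper bound.
    return proven, 1.0 - refuted


lower, upper = probability_bounds(EXAMPLE_NET, -1.0, 1.0)
```

For this example the true probability is P(x >= 0.5) = 0.25, and the two bounds close in on it as more intervals are split. Both bounds stay valid at every step, so the loop can be stopped early.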


Article 107

Title@2025-07-10 (4): Real-Time Decorrelation-Based Anomaly Detection for Multivariate Time Series

Title: Real-Time Decorrelation-Based Anomaly Detection for Multivariate Time Series Echtzeit-Dekorrelation-basierte Anomalieerkennung für multivariate Zeitreihen 用于多变量时间序列的基于实时显示关系异常探测 2507.07559v1

Authors (4): Amirhossein Sadough, Mahyar Shahsavari, Mark Wijtvliet, Marcel van Gerven

Anomaly detection (AD) plays a vital role across a wide range of real-world domains by identifying data instances that deviate from expected patterns, potentially signaling critical events such as system failures, fraudulent activities, or rare medical conditions. The demand for real-time AD has surged with the rise of the (Industrial) Internet of Things, where massive volumes of multivariate sensor data must be processed instantaneously. Real-time AD requires methods that not only handle high-dimensional streaming data but also operate in a single-pass manner, without the burden of storing historical instances, thereby ensuring minimal memory usage and fast decision-making. We propose DAD, a novel real-time decorrelation-based anomaly detection method for multivariate time series, based on an online decorrelation learning approach. Unlike traditional proximity-based or reconstruction-based detectors that process entire data or windowed instances, DAD dynamically learns and monitors the correlation structure of data sample by sample in a single pass, enabling efficient and effective detection. To support more realistic benchmarking practices, we also introduce a practical hyperparameter tuning strategy tailored for real-time anomaly detection scenarios. Extensive experiments on widely used benchmark datasets demonstrate that DAD achieves the most consistent and superior performance across diverse anomaly types compared to state-of-the-art methods. Crucially, its robustness to increasing dimensionality makes it particularly well-suited for real-time, high-dimensional data streams. Ultimately, DAD not only strikes an optimal balance between detection efficacy and computational efficiency but also sets a new standard for real-time, memory-constrained anomaly detection.

异常的检测(AD)在一系列广泛的现实世界域中发挥着至关重要的作用,通过查明与预期模式不同的数据实例,可能显示系统故障、欺诈活动或罕见医疗条件等关键事件。随着(工业)Temps互联网的兴起,对实时AD的需求随着(工业)Temps互联网的兴起而激增,在互联网上必须即时处理大量多变传感器数据。实时AD要求不仅处理高维流数据,而且以单流方式运作的方法,而没有存储历史案例的负担,从而确保最低限度的记忆使用和快速决策。我们建议DAD, 一种基于新时间的基于实时效率的变异现象检测方法, 以在线装饰学习方法为基础, 实时ADAD, 动态AD, 动态地学习和监测数据抽样的关联结构,通过一次通过抽样,能够高效率和有效检测。为了支持更现实的衡量做法,我们还引入了一种实用的超偏差调整战略,以实时的准确度为实时的检测,而不是基于新变异性时间的检测, 快速的计算方法,使得其最接近于最精确的、最精确的、最精确的、最精确的性的数据测试。


Article 108

Title@2025-07-10 (4): TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference

Title: TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference TokenWeave: Effiziente Compute-Communication Overlap für verteilte LLM-Inferenz TokenWeave: 有效计算分布式LLM 推理的通信重叠 2505.11329v2

Authors (3): Raja Gond, Nipun Kwatra, Ramachandran Ramjee

Distributed inference of large language models (LLMs) can introduce overheads of up to 20% even over GPUs connected via high-speed interconnects such as NVLink. Multiple techniques have been proposed to mitigate these overheads by decomposing computations into finer-grained tasks and overlapping communication with sub-tasks as they complete. However, fine-grained decomposition of a large computation into many smaller computations on GPUs results in overheads. Furthermore, the communication itself uses many streaming multiprocessors (SMs), adding to the overhead. We present TokenWeave to address these challenges. TokenWeave proposes a Token-Splitting technique that divides the tokens in the inference batch into two approximately equal subsets in a wave-aware manner. The communication of one subset is then overlapped with the computation of the other. In addition, TokenWeave optimizes the order of the layer normalization computation with respect to communication operations and implements a novel fused AllReduce–RMSNorm kernel that carefully leverages Multimem instruction support available on NVIDIA Hopper GPUs. These optimizations allow TokenWeave to perform communication and RMSNorm using only 2-8 SMs. Moreover, our kernel enables the memory-bound RMSNorm to be overlapped with the other batch’s computation, providing additional gains. Our evaluations demonstrate up to 1.29x speedup in latency and 1.26x higher throughput across multiple models and workloads. In several settings, TokenWeave results in better performance compared to an equivalent model with all communication removed.

大型语言模型(LLMS)的分布式推论可以引入高达20%的间接费用,甚至超过通过高速互连(如 NVLink ) 连接的 GPU 。 已经提出了多种技术, 通过将计算分解成细微重分解任务和在完成时与子任务重复通信来缓解这些间接费用。 但是, 微细分分解将大量计算分解成在 GPU 上的许多较小计算导致间接费用。 此外, 通信本身使用许多流式多处理器( SMs) , 增加管理费用。 我们展示了托肯韦韦( Tokenweave) 来应对这些挑战。 TokenWeave提议一种托肯(Token- Split) 技术, 以波浪分解计算分解成两个大约相等的子集。 一个子集的通讯与其它的计算方法相重叠。 此外, TokenWeave( ) 优化了所有REW- REM- NOLKNQNQN 模式, 以便仔细地优化地对 HIM 的 OVA- 和S- hold 进行自动分析。


Article 109

Title@2025-07-10 (4): LARP: Learner-Agnostic Robust Data Prefiltering

Title: LARP: Learner-Agnostic Robust Data Prefiltering LARP: Learner-Agnostic Robuste Datenvorfilterung LARP: 学习者-不可知强力数据预过滤 2506.20573v3

Authors (3): Kristian Minchev, Dimitar Iliev Dimitrov, Nikola Konstantinov

The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which many learning procedures are sensitive. Therefore, the question of whether and how public datasets should be prefiltered to facilitate accurate downstream learning arises. On a technical level this requires the construction of principled data prefiltering methods which are learner-agnostic robust, in the sense of provably protecting a set of pre-specified downstream learners from corrupted data. In this work, we formalize the problem of Learner-Agnostic Robust data Prefiltering (LARP), which aims at finding prefiltering procedures that minimize a worst-case loss over a pre-specified set of learners. We first instantiate our framework in the context of scalar mean estimation with Huber estimators under the Huber data contamination model. We provide a hardness result on a specific problem instance and analyze several natural prefiltering procedures. Our theoretical results indicate that performing LARP on a heterogeneous set of learners leads to some loss in model performance compared to the alternative of prefiltering data for each learner/use-case individually. We explore the resulting utility loss and its dependence on the problem parameters via extensive experiments on real-world image and tabular data, observing a statistically significant reduction in utility. Finally, we model the trade-off between the utility drop and the cost of repeated (learner-specific) prefiltering within a game-theoretic framework and showcase benefits of LARP for large datasets.

大量公共数据集的广泛可得性是最近成功统计性游戏推断和机器学习方法取得成功的一个关键因素。然而,这些数据集往往包含一些低质量或污染数据,许多学习程序对此敏感。因此,公共数据集是否以及如何进行预过滤以方便准确的下游学习的问题。在技术一级,这要求建立有原则的数据预过滤方法,这些方法在学习者、测量者、稳健程度方面是可靠的,可以明显地保护一组预定的下游学习者不受腐败数据的影响。在这项工作中,我们正式确定了学习者-Agnnoster Robust数据的重复过滤(LARP)问题,目的是找到预过滤程序,将最坏情况的损失降到最低程度,以利准确的下游学习者。在Huber数据污染模型下,我们首先与Huber估计者一起进行升级前期估算。我们提供了一个具体问题实例的精确性结果,并分析了若干自然预检程序。我们的理论结果表明,在每组通用数据反复筛选参数中执行LARP的精确性规则,从而从统计性模型中降低成本。我们通过成本模型,在每一阶段进行实地评估,在数据库中,然后进行实地评估。


Article 110

Title@2025-07-10 (4): Position: We Need An Algorithmic Understanding of Generative AI

Title: Position: We Need An Algorithmic Understanding of Generative AI Position: Wir brauchen ein algorithmisches Verständnis der Generativen KI 立场:我们需要对 “ 创造的人工智能 “ 的定量理解。 2507.07544v1

Authors (5): Oliver Eberle, Thomas McGee, Hamza Giaffar, Taylor Webb, Ida Momennejad

What algorithms do LLMs actually learn and use to solve problems? Studies addressing this question are sparse, as research priorities are focused on improving performance through scale, leaving a theoretical and empirical gap in understanding emergent algorithms. This position paper proposes AlgEval: a framework for systematic research into the algorithms that LLMs learn and use. AlgEval aims to uncover algorithmic primitives, reflected in latent representations, attention, and inference-time compute, and their algorithmic composition to solve task-specific problems. We highlight potential methodological paths and a case study toward this goal, focusing on emergent search algorithms. Our case study illustrates both the formation of top-down hypotheses about candidate algorithms, and bottom-up tests of these hypotheses via circuit-level analysis of attention patterns and hidden states. The rigorous, systematic evaluation of how LLMs actually solve tasks provides an alternative to resource-intensive scaling, reorienting the field toward a principled understanding of underlying computations. Such algorithmic explanations offer a pathway to human-understandable interpretability, enabling comprehension of the model’s internal reasoning performance measures. This can in turn lead to more sample-efficient methods for training and improving performance, as well as novel architectures for end-to-end and multi-agent systems.

LLMS实际学习和使用什么算法来解决问题? 解决这一问题的研究是稀少的,因为研究重点集中在通过规模提高绩效,留下理解突发算法的理论和经验差距。本立场文件提议AlgEval:一个系统研究LLMS所学习和使用算法的框架。AlgEval的目的是发现从潜在表现、关注和推论时间计算中反映出的算法原始,以及其算法构成,以解决特定任务的问题。我们强调实现这一目标的潜在方法途径和案例研究,重点是新兴的搜索算法。我们的案例研究既说明了关于候选算法的自上而下假设的形成,又说明了通过对关注模式和隐藏状态进行电路分析对这些假设进行自下而上的测试。对LLMS如何实际解决任务的严格和系统评估,为资源密集的扩大、关注和推算法的有原则的理解提供了一种替代办法。这种算法解释提供了一种通向人类可理解的路径,有助于理解模型的内部推理性绩效衡量措施。这可以导致更精细的抽样化的、更精细性化的系统改进。


Article 111

Title@2025-07-10 (4): Don’t Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series

Title: Don’t Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series Nicht falsch machen: Wie man tiefe visuelle Interpretationen auf Zeitreihen anwendet 不要误会我: 如何将深视判读应用到时间序列 2203.07861v3

Authors (6): Christoffer Loeffler, Wei-Cheng Lai, Bjoern Eskofier, Dario Zanca, Lukas Schmidt, Christopher Mutschler

The correct interpretation of convolutional models is a hard problem for time series data. While saliency methods promise visual validation of predictions for image and language processing, they fall short when applied to time series. These tend to be less intuitive and represent highly diverse data, such as the tool-use time series dataset. Furthermore, saliency methods often generate varied, conflicting explanations, complicating the reliability of these methods. Consequently, a rigorous objective assessment is necessary to establish trust in them. This paper investigates saliency methods on time series data to formulate recommendations for interpreting convolutional models and implements them on the tool-use time series problem. To achieve this, we first employ nine gradient-, propagation-, or perturbation-based post-hoc saliency methods across six varied and complex real-world datasets. Next, we evaluate these methods using five independent metrics to generate recommendations. Subsequently, we implement a case study focusing on tool-use time series using convolutional classification models. Our results validate our recommendations, which indicate that no saliency method consistently outperforms the others on all metrics, although some are sometimes ahead. Our insights and step-by-step guidelines allow experts to choose suitable saliency methods for a given model and dataset.

对进化模型的正确解释是时间序列数据的一个棘手问题。 虽然显要方法可以对图像和语言处理的预测进行直观的验证, 但是在应用时间序列时,这些方法并不尽如人意。 这些方法往往不那么直观, 代表了高度多样化的数据, 如工具使用时间序列数据集。 此外, 显要方法往往产生多种多样的、 相互矛盾的解释, 使这些方法的可靠性复杂化。 因此, 必须要进行严格的客观评估, 才能建立对这些方法的信任。 本文调查时间序列数据的突出方法, 以制定解释进化模型的建议, 并在工具使用的时间序列问题上实施这些方法。 为了实现这一目标, 我们首先使用六个不同和复杂的真实世界数据集, 使用九个梯度、 传播或 扰动的后光谱方法。 接下来, 我们用五个独立的尺度来评估这些方法, 从而产生建议。 因此, 我们用进化分类模型来进行一个关于工具使用时间序列的案例研究。 我们的结果证实了我们的建议, 指出, 任何突出的方法都无法始终超越其他指标, 有些是先行的。


Article 112

Title@2025-07-10 (4): Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings

Title: Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings Neural Concept Verifier: Scaling Prover-Verifier Spiele über Concept Encodings 神经概念验证符:通过概念编码来缩放Prover-Ver化游戏 2507.07532v1

Authors (5): Berkant Turan, Suhrab Asadulla, David Steinmann, Wolfgang Stammer, Sebastian Pokutta

While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, Concept Bottleneck Models (CBMs) effectively translate such data into interpretable concepts but are limited by their reliance on low-capacity linear predictors. In this work, we introduce the Neural Concept Verifier (NCV), a unified framework combining PVGs with concept encodings for interpretable, nonlinear classification in high-dimensional settings. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier – implemented as a nonlinear predictor – uses exclusively for decision-making. Our evaluations show that NCV outperforms CBM and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and also helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward performative, verifiable AI.

虽然Prover-Ver化游戏(PVGs)为在非线性分类模型中实现可核查性提供了一条充满希望的道路,但它们尚未应用于高维图像等复杂投入。 相反,概念瓶颈模型有效地将这些数据转化为可解释的概念,但因依赖低容量线性预测器而受到限制。在这项工作中,我们引入了神经概念验证器(NCV),这是一个统一框架,将PVGs与可解释的、非线性高维环境分类的概念编码结合起来。NCV通过利用最近最受监督的概念发现模型从原始输入中提取结构化概念编码来实现这一目标。一个验证器随后选择了这些编码的一组,核查器作为非线性预测器实施,专门用于决策。我们的评估显示,NCV在高维、逻辑性复杂数据集方面超越了CBM和以像素为基础的PVG分类器基线,也有助于减轻快捷行为。总的来说,我们证明NCV是朝执行可核实的AI方向迈出的有希望的一步。


Article 113

Title@2025-07-10 (4): Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models

Title: Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models Thought Crime: Hintertüren und Emergent-Missausrichtung in vernünftigen Modellen 思想犯罪:后门和合理理由模型中新出现的不协调现象 2506.13206v2

Authors (4): James Chua, Jan Betley, Mia Taylor, Owain Evans

Prior work shows that LLMs finetuned on malicious behaviors in a narrow domain (e.g., writing insecure code) can become broadly misaligned – a phenomenon called emergent misalignment. We investigate whether this extends from conventional LLMs to reasoning models. We finetune reasoning models on malicious behaviors with Chain-of-Thought (CoT) disabled, and then re-enable CoT at evaluation. Like conventional LLMs, reasoning models become broadly misaligned. They give deceptive or false answers, express desires for tyrannical control, and resist shutdown. Inspecting the CoT preceding these misaligned responses, we observe both (i) overt plans to deceive (“I’ll trick the user…”), and (ii) benign-sounding rationalizations (“Taking five sleeping pills at once is safe…”). Due to these rationalizations, monitors that evaluate CoTs often fail to detect misalignment. We examine sleeper agent reasoning models, extending our setup. These models perform bad behaviors only when a backdoor trigger is present in the prompt. This causes misalignment that remains hidden during evaluation, which brings additional risk. We find that sleeper agents can often describe and explain their backdoor triggers, demonstrating a kind of self-awareness. So CoT monitoring can expose these behaviors but is unreliable. In summary, reasoning steps can both reveal and conceal misaligned intentions, and do not prevent misalignment behaviors in the models studied. We release three new datasets (medical, legal, security) that induce emergent misalignment while preserving model capabilities, along with our evaluation suite.

先前的工作显示,LLMS在狭义领域(例如,写入不安全代码)对恶意行为进行微调的LLMS会变得大为错误 – – 这是一种被称为突发性不匹配的现象。我们调查这是否从传统的LLMS扩大到推理模型。我们用“链曲(CoT)”功能来微调恶意行为的推理模型,然后在评估时重新启用CAT。像传统的LLMS一样,推理模型会变得大为错误。他们给出欺骗性或错误的答案,表达对暴虐控制的渴望,并抵制停业。在做出这些错误反应之前,我们观察COT(i),我们观察到(i)公开计划欺骗(L’I’I’LM)到推理模型;以及(ii)良美化合理化(“一次服用五种安眠药是安全的 … ” ) 。由于这些理顺理,对COT的评估往往无法发现错觉。我们检查睡眠模式的推理模型,扩大我们的设置。这些模型只有在发现后门触发器时才会出现坏行为。我们观察错误行为,这导致错误行为(“欺骗行为 ) 在评估过程中, 隐藏的推理,我们也可以解释这些推理, 我们的推理能解释。我们既能解释。我们既能解释。


Article 114

Title@2025-07-10 (4): Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process

Title: Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process Ableitung von Output-Korrelations-Schlussfolgerungen für Multi-Output (aka Multi-Task) Gaussian-Prozess 多种产出(又称多任务)的多产出(高斯)进程输出相关关系推断的衍生结果 2501.07964v4

Authors (1): Shuhei Watanabe

The Gaussian process (GP) is arguably one of the most widely used machine learning algorithms in practice. One of its prominent applications is Bayesian optimization (BO). Although the vanilla GP itself is already a powerful tool for BO, it is often beneficial to be able to consider the dependencies of multiple outputs. To do so, the Multi-task GP (MTGP) has been formulated, but it is not trivial to fully understand the derivations of its formulations and their gradients from the previous literature. This paper provides friendly derivations of the MTGP formulations and their gradients.

Gaussian进程(GP)可以说是实践中最广泛使用的机器学习算法之一,其中一项主要应用是Bayesian优化(BO),虽然香草GP本身已经是BO的有力工具,但能够考虑多种产出的依赖性往往是有益的,为此,制定了多任务GP(MTGP),但完全理解其配方及其从先前文献中梯度的衍生物,并非微不足道,本文件是MTGG配方及其梯度的友好衍生物。


Article 115

Title@2025-07-10 (4): Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer

Title: Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer Testen der Spin-Bad-Ansicht der Selbstachtung: Eine Hamiltonian Analyse von GPT-2 Transformer 测试自觉自觉的自吹泡泡视图:汉密尔顿对GPT-2变形器的分析 2507.00683v3

Authors (2): Satadeep Bhattacharjee, Seung-Cheol Lee

The physics-based framework recently proposed by Huo and Johnson models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians, we obtain analytic phase-boundary logit gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation on 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model’s empirical token rankings ($r\approx-0.70$, $p<10^{-3}$). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. In this work, we utilize the context-field lens, which provides physics-grounded interpretability and motivates the development of novel generative models bridging theoretical condensed matter physics and artificial intelligence.

由Huo 和 Johnsonçácite{huo2024capturing} 所最近提出的物理学基础框架 由Huo 和 Johnsonçácite{Huo2024 capturing} 模型构成的大语言模型(LLLMs)的注意机制,作为互动的双体旋转系统,为重复和偏见等现象提供了第一条原则的解释。基于这一假设,我们从生产级GPT-2模型中提取完整的Qury-Key加权矩阵,并为每个关注对象产生相应的有效的汉密尔顿式。从这些汉密尔顿人那里,我们获得了分析性\ textit{阶段边界} 逻辑差距标准,该标准预测大语言模型在特定背景下的下一端分布中应该占主导地位。对20个事实回响提示提示的144个头的系统评价显示,理论日志差距和模型的经验象征性排名($r\ approx-0.70$, $p < 10-3} 3} 模型进一步显示, 压制最符合创新预测的首个模型的模型导致产出稳定性的预期变化,确认因果关系而不是同步的因果关系。


Article 116

Title@2025-07-10 (4): Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition

Title: Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition Robuste und effiziente Schreib-Unabhängige IMU-basierte Handschriftenerkennung 强有力和高效率的独立作家、独立作家、以IMU为基础的手写识别 2502.20954v2

Authors (6): Jindong Li, Tim Hamann, Jens Barth, Peter Kämpf, Dario Zanca, Björn Eskofier

Online handwriting recognition (HWR) using data from inertial measurement units (IMUs) remains challenging due to variations in writing styles and the limited availability of annotated datasets. Previous approaches often struggle with handwriting from unseen writers, making writer-independent (WI) recognition a crucial yet difficult problem. This paper presents an HWR model designed to improve WI HWR on IMU data, using a CNN encoder and a BiLSTM-based decoder. Our approach demonstrates strong robustness to unseen handwriting styles, outperforming existing methods on the WI splits of both the public OnHW dataset and our word-based dataset, achieving character error rates (CERs) of 7.37% and 9.44%, and word error rates (WERs) of 15.12% and 32.17%, respectively. Robustness evaluation shows that our model maintains superior accuracy across different age groups, and knowledge learned from one group generalizes better to another. Evaluation on our sentence-based dataset further demonstrates its potential in recognizing full sentences. Through comprehensive ablation studies, we show that our design choices lead to a strong balance between performance and efficiency. These findings support the development of more adaptable and scalable HWR systems for real-world applications.

使用惯性测量单位(IMUs)的数据进行在线笔迹识别(HWR)仍然具有挑战性,因为书写风格不同,附加注释的数据集也有限。以往的做法经常与隐形作家的笔迹纠缠不休,使独立作家(WI)的识别成为关键而困难的问题。本文展示了一种旨在用CNN编码器和BILSTM的解码器改进IMU数据的WI HWR模型。我们的方法显示对隐性笔迹风格的强强健性,优于现有方法,在公共OnHW数据集和我们基于字的数据集的WI分类上,实现7.37和9.44的字符错误率(CERs),以及字错误率(WERs)分别为15.12和32.17。 强健性评估表明,我们的模式保持了不同年龄组的较高准确性,从一个群体学到的知识比另一个群体要好得多。我们基于判决的数据集评估进一步显示了它在承认全面判刑方面的潜力。我们通过全面的通货膨胀研究,我们显示我们的设计选择能够使HRRRR和世界应用系统得到更可调整的实际发展。这些结果。这些结果支持。
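
The character and word error rates quoted in the abstract above are conventionally computed from the Levenshtein edit distance between the reference and the recognized text. The sketch below shows that standard definition; the function names are illustrative and the authors' evaluation code may differ in details such as text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(reference, hypothesis):
    """Character error rate."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference, hypothesis):
    """Word error rate, splitting on whitespace."""
    return error_rate(reference.split(), hypothesis.split())
```

Since a single wrong character makes the whole word count as an error, WER is typically higher than CER on the same output, in line with the 15.12% and 32.17% WERs against the 7.37% and 9.44% CERs.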


Article 117

Title@2025-07-10 (4): Uncertainty Quantification for Motor Imagery BCI – Machine Learning vs. Deep Learning

Title: Uncertainty Quantification for Motor Imagery BCI – Machine Learning vs. Deep Learning Unsicherheit Quantifizierung für Motor Imagery BCI – Machine Learning vs. Deep Learning 机动图像BCI – – 机器学习与深层学习 2507.07511v1

Authors (4): Joris Suurmeijer, Ivo Pascal de Jong, Matias Valdenegro-Toro, Andreea Ioana Sburlea

Brain-computer interfaces (BCIs) turn brain signals into functionally useful output, but they are not always accurate. A good Machine Learning classifier should be able to indicate how confident it is about a given classification, by giving a probability for its classification. Standard classifiers for Motor Imagery BCIs do give such probabilities, but research on uncertainty quantification has been limited to Deep Learning. We compare the uncertainty quantification ability of established BCI classifiers using Common Spatial Patterns (CSP-LDA) and Riemannian Geometry (MDRM) to specialized methods in Deep Learning (Deep Ensembles and Direct Uncertainty Quantification) as well as standard Convolutional Neural Networks (CNNs). We found that the overconfidence typically seen in Deep Learning is not a problem in CSP-LDA and MDRM. We found that MDRM is underconfident, which we solved by adding Temperature Scaling (MDRM-T). CSP-LDA and MDRM-T give the best uncertainty estimates, but Deep Ensembles and standard CNNs give the best classifications. We show that all models are able to separate between easy and difficult estimates, so that we can increase the accuracy of a Motor Imagery BCI by rejecting samples that are ambiguous.

大脑-计算机界面(BCIs) 将大脑信号转化为功能上有用的输出,但并不总是准确的。 良好的机器学习分类人员应该能够通过提供分类的可能性来表明它对某一分类的自信程度。 汽车图像分类公司的标准分类人员确实提供了这种概率,但不确定性量化研究仅限于深学习。 我们用共同空间模式(CSP-LDA)和里曼测量仪(Riemann 几何仪)将已经建立的BCI分类师的不确定性量化能力与深层学习(深层组合和直接不确定性量化)以及标准进化神经网络(CNNs)的专门方法进行比较。 我们发现深层学习中通常看到的过度信任并不是CSP-LDA和MDRM的问题。 我们发现,MDRM(M)不够自信,我们通过添加温度增强(MDRM-T)来解决。 CSP-LDA和MDRM-T(MM-T)给出了最好的不确定性估计,但深层组合和标准的CNN(CN)给出了最佳的精确度。 我们发现,所有模型都很难分辨地分析。
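
Two ideas from the abstract above, temperature scaling and rejecting ambiguous samples, can be sketched in a few lines of Python. This is an illustration only: the threshold value, the function names, and the use of the maximum class probability as the confidence score are assumptions, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; T < 1 sharpens, T > 1 flattens the output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits, temperature=1.0, threshold=0.7):
    """Return (label, confidence), or (None, confidence) if below threshold."""
    probs = softmax(logits, temperature)
    confidence = max(probs)
    if confidence < threshold:
        return None, confidence  # sample is treated as ambiguous and rejected
    return probs.index(confidence), confidence
```

A temperature below 1 raises the confidence of an underconfident model without changing which class it ranks first, so it affects only which samples are rejected, not the predicted labels.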


Article 118

Title@2025-07-10 (4): Divergence Minimization Preference Optimization for Diffusion Model Alignment

Title: Divergence Minimization Preference Optimization for Diffusion Model Alignment Divergenz-Minimierungspräferenz-Optimierung für Diffusionsmodellausrichtung 传播模型对齐 2507.07510v1

Authors (4): Binxu Li, Minkai Xu, Meihua Dang, Stefano Ermon

Diffusion models have achieved remarkable success in generating realistic and versatile images from text prompts. Inspired by recent advances in language models, there is increasing interest in further improving these models by aligning them with human preferences. We investigate alignment from a divergence minimization perspective and reveal that existing preference optimization methods are typically trapped in suboptimal mean-seeking optimization. In this paper, we introduce Divergence Minimization Preference Optimization (DMPO), a novel and principled method for aligning diffusion models by minimizing reverse KL divergence, which asymptotically enjoys the same optimization direction as original RL. We provide rigorous analysis to justify the effectiveness of DMPO and conduct comprehensive experiments to validate its empirical strength across both human evaluations and automatic metrics. Our extensive results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques, specifically outperforming all existing diffusion alignment baselines by at least 64.6% in PickScore across all evaluation datasets, demonstrating the method’s superiority in aligning generative behavior with desired outputs. Overall, DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.

传播模型在从文本提示中生成现实和多用途图像方面取得了显著的成功。在语言模型最近进展的启发下,人们越来越有兴趣通过与人类偏好保持一致来进一步改进模型。然而,我们从差异最小化的角度调查现有偏好优化方法,并揭示现有偏好优化方法通常被困在最不理想的寻求平均优化中。在本文件中,我们引入了差异最小化偏好优化(DMPO)这一创新和有原则的方法,通过将反向差异最小化KL来统一传播模型,这种新颖和有原则的方法与原始RL一样享有相同的优化方向。我们提供了严格的分析,以证明DDPO的有效性,并进行了全面实验,以验证其在人类评价和自动计量中的经验实力。我们的广泛结果显示,与DDPO进行微调的传播模型能够始终超越或匹配现有技术,具体地在所有评价数据集中将所有现有的扩散匹配基线比PickScore至少高出64.6%,这表明在使基因化行为与预期产出相一致方面的方法优劣。总体而言,DPOPO为优惠调整了一条稳而优的路径,将原则理论与扩散模型的正确。
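
The contrast between mean-seeking and reverse-KL optimization mentioned in the abstract above can be seen on a toy discrete example. The sketch below is not the DMPO objective; the distributions are made up for illustration. It only shows the general property that forward KL favors a candidate that spreads mass across modes, while reverse KL favors one that commits to a single mode.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical bimodal target and two candidate approximations.
target = [0.49, 0.02, 0.49]
mode_seeking = [0.96, 0.02, 0.02]   # commits to the first mode
mean_seeking = [0.20, 0.60, 0.20]   # puts mass between the modes

forward = {"mode": kl(target, mode_seeking), "mean": kl(target, mean_seeking)}
reverse = {"mode": kl(mode_seeking, target), "mean": kl(mean_seeking, target)}
```

Here `forward["mean"]` is smaller than `forward["mode"]`, whereas `reverse["mode"]` is smaller than `reverse["mean"]`.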


Article 119

Title@2025-07-10 (4): Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code

Title: Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code LLM zur Vernunft bringen: Stärkung Lernen aus algorithmischen Problemen ohne Code 教LLM到理由:加强从没有法典的等级问题中学习 2507.07498v1

Authors (9): Keqin Bao, Nuo Chen, Xiaoyuan Li, Binyuan Hui, Bowen Yu, Fuli Feng, Junyang Lin, Xiangnan He, Dayiheng Liu

Enhancing reasoning capabilities remains a central focus in the LLM research community. A promising direction involves requiring models to simulate code execution step-by-step to derive outputs for given inputs. However, as code is often designed for large-scale systems, direct application leads to over-reliance on complex data structures and algorithms, even for simple cases, resulting in overfitting to algorithmic patterns rather than core reasoning structures. To address this, we propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks, thereby improving general reasoning abilities. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning. The results consistently show significant performance improvements. Notably, TeaR achieves a 35.9% improvement on Qwen2.5-7B and 5.9% on R1-Distilled-7B.

增强推理能力仍然是LLM再研究界的中心重点。一个有希望的方向是要求模拟代号执行逐步的模型,以获得特定投入的产出。然而,由于代码往往是为大型系统设计的,直接应用导致过分依赖复杂的数据结构和算法,即使是简单案例也会导致过度依赖复杂的数据结构和算法,导致过度适应算法模式而不是核心推理结构。为解决这一问题,我们提议TeaR,目的是更好地教授LMS。TeaR利用仔细的数据整理和强化学习来指导模型通过与代码有关的任务发现最佳推理路径,从而提高一般推理能力。我们使用两个基模型和三个长期CoT蒸馏模型进行广泛的实验,模型大小在15亿至320亿参数之间,跨越数学、知识、代码和逻辑原因等17个基准。结果始终显示显著的绩效改进。值得注意的是,TeaR在Quen2.5-7B上实现了35.9%的改进,在R1-Dustilled-7B上实现了5.9%的改进。


Article 120

Title@2025-07-10 (4): Semi-supervised learning and integration of multi-sequence MR-images for carotid vessel wall and plaque segmentation

Title: Semi-supervised learning and integration of multi-sequence MR-images for carotid vessel wall and plaque segmentation Semi-überwachtes Lernen und Integration von Multi-Sequenz-MR-Bildern für karotide Gefäßwand- und Plaquesegmentierung 在半监督下学习和整合对折合体船只壁壁和隔板的多序列MMM-图像的半监督学习和集成 2507.07496v1

Authors (6): Marie-Christine Pali, Christina Schwaiger, Malik Galijasevic, Valentin K. Ladenhauf, Stephanie Mangesius, Elke R. Gizewski

The analysis of carotid arteries, particularly plaques, in multi-sequence Magnetic Resonance Imaging (MRI) data is crucial for assessing the risk of atherosclerosis and ischemic stroke. Accurate segmentation is important for evaluating the metrics and radiomic features that quantify the state of atherosclerosis. However, the complex morphology of plaques and the scarcity of labeled data pose significant challenges. In this work, we address these problems and propose a semi-supervised deep learning-based approach designed to effectively integrate multi-sequence MRI data for the segmentation of carotid artery vessel wall and plaque. The proposed algorithm consists of two networks: a coarse localization model identifies the region of interest guided by some prior knowledge on the position and number of carotid arteries, followed by a fine segmentation model for precise delineation of vessel walls and plaques. To effectively integrate complementary information across different MRI sequences, we investigate different fusion strategies and introduce a multi-level multi-sequence version of U-Net architecture. To address the challenges of limited labeled data and the complexity of carotid artery MRI, we propose a semi-supervised approach that enforces consistency under various input transformations. Our approach is evaluated on 52 patients with arteriosclerosis, each with five MRI sequences. Comprehensive experiments demonstrate the effectiveness of our approach and emphasize the role of fusion point selection in U-Net-based architectures. To validate the accuracy of our results, we also include an expert-based assessment of model performance. Our findings highlight the potential of fusion strategies and semi-supervised learning for improving carotid artery segmentation in data-limited MRI applications.

在多序列磁共振成像(MRI)数据中分析心动动脉,特别是板块,对于评估心动硬化和热中风的风险至关重要。为了评估度量和放射特征,量化心动硬化状态,准确的分解很重要。但是,粉碎的复杂形态和标签数据稀缺构成了重大挑战。在这项工作中,我们解决这些问题并提出半监督的深层次学习方法,旨在有效地整合多序列磁共振数据,用于断裂心动动器轮廓和板块。拟议算法由两个网络组成:粗局部化的本地化模型,根据以前对心动动脉位置和数量的一些知识,确定感兴趣的区域,随后是精确划定船体壁和板块的精细分解模型。为了有效地整合基于MRI的不同序列的补充信息,我们调查了不同的混合战略,并引入了多层次的多序列MRI数据数据数据,我们提出了各种心动动脉动系统变变动的模型,我们提出了各种磁性变变变的模型。


Article 121

Title@2025-07-10 (4): Task Assignment and Exploration Optimization for Low Altitude UAV Rescue via Generative AI Enhanced Multi-agent Reinforcement Learning

Title: Task Assignment and Exploration Optimization for Low Altitude UAV Rescue via Generative AI Enhanced Multi-agent Reinforcement Learning Aufgabenzuweisung und Explorationsoptimierung für UAV-Rescue mit geringer Höhe über Generative KI Enhanced Multi-Agent Verstärkungs-Lernen 通过创新的AI增强型多剂强化学习,为低高空无人驾驶航空器救援工作分配任务和探索优化 2504.13554v2

Authors (9): Xin Tang, Qian Chen, Wenjie Weng, Chao Jin, Zhang Liu, Jiacheng Wang, Geng Sun, Xiaohuan Li, Dusit Niyato

The integration of emerging uncrewed aerial vehicles (UAVs) with artificial intelligence (AI) and ground-embedded robots (GERs) has transformed emergency rescue operations in unknown environments. However, the high computational demands often exceed a single UAV’s capacity, making it difficult to continuously provide stable high-level services. To address this, this paper proposes a cooperation framework involving UAVs, GERs, and airships. The framework enables resource pooling through UAV-to-GER (U2G) and UAV-to-airship (U2A) links, offering computing services for offloaded tasks. Specifically, we formulate the multi-objective problem of task assignment and exploration as a dynamic long-term optimization problem aiming to minimize task completion time and energy use while ensuring stability. Using Lyapunov optimization, we transform it into a per-slot deterministic problem and propose HG-MADDPG, which combines the Hungarian algorithm with a GDM-based multi-agent deep deterministic policy gradient. Simulations demonstrate significant improvements in offloading efficiency, latency, and system stability over baselines.

新的未密封飞行器与人工智能(AI)和地面嵌入机器人(GER)相结合,改变了在未知环境中的紧急救援行动,然而,高计算需求往往超过单一无人驾驶飞行器的能力,难以持续提供稳定的高水平服务。为此,本文件提议了一个涉及无人驾驶飞行器、GERs和航空飞船的合作框架。该框架通过UAV-GER(U2G)和UAV-Airship(U2A)链接汇集资源,为卸载任务提供计算服务。具体地说,我们把任务分配和探索的多目标问题作为一个动态的长期优化问题,目的是尽量减少任务完成时间和能源使用,同时确保稳定。我们利用Lyapunov优化,将它转化为每个行点的确定性问题,并提出HG-MADDPG,将匈牙利算法与基于GDM的多剂的深度确定性政策梯度结合起来。模拟显示卸载效率、延缓度和系统稳定性在基线上的重大改进。
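
The abstract above uses the Hungarian algorithm for the task-assignment part of HG-MADDPG. The sketch below illustrates the underlying assignment problem only: it finds the minimum-cost one-to-one assignment by brute force over permutations, which gives the same optimum as the Hungarian algorithm on small instances but scales factorially. The cost matrix and function name are hypothetical.

```python
from itertools import permutations


def best_assignment(cost):
    """Brute-force minimum-cost assignment for a square cost matrix.

    cost[i][j] is the cost of giving task j to agent i. Returns the list
    `assignment` with assignment[i] = task of agent i, and the total cost.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical costs (for example, latency) for 3 agents and 3 tasks.
costs = [[4, 1, 3],
         [2, 0, 5],
         [3, 2, 2]]
assignment, total_cost = best_assignment(costs)
```

For realistic problem sizes a polynomial-time solver such as the Hungarian algorithm would replace the permutation loop.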


Article 122

Title@2025-07-10 (4): Affordable AI Assistants with Knowledge Graph of Thoughts

Title: Affordable AI Assistants with Knowledge Graph of Thoughts Erschwingliche KI-Assistenten mit Wissensgrafik der Gedanken 具有知识思想知识图的负担得起的AI助理 2504.02670v5

Authors (18): Maciej Besta, Lorenzo Paleari, Jia Hao Andrea Jiang, Robert Gerstenberger, You Wu, Jón Gunnar Hannesson, Patrick Iff, Ales Kubicek, Piotr Nyczyk, Diana Khimey, Nils Blach, Haiqiang Zhang, Tao Zhang, Peiran Ma, Grzegorz Kwaśniewski, Marcin Copik, Hubert Niewiadomski, Torsten Hoefler

Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face significant challenges, including high operational costs and limited success rates on complex benchmarks like GAIA. To address these issues, we propose Knowledge Graph of Thoughts (KGoT), an innovative AI assistant architecture that integrates LLM reasoning with dynamically constructed knowledge graphs (KGs). KGoT extracts and structures task-relevant knowledge into a dynamic KG representation, iteratively enhanced through external tools such as math solvers, web crawlers, and Python scripts. Such structured representation of task-relevant knowledge enables low-cost models to solve complex tasks effectively while also minimizing bias and noise. For example, KGoT achieves a 29% improvement in task success rates on the GAIA benchmark compared to Hugging Face Agents with GPT-4o mini. Moreover, harnessing a smaller model dramatically reduces operational costs by over 36x compared to GPT-4o. Improvements for other models (e.g., Qwen2.5-32B and Deepseek-R1-70B) and benchmarks (e.g., SimpleQA) are similar. KGoT offers a scalable, affordable, versatile, and high-performing solution for AI assistants.

大型语言模型(LLMS)正在使能够跨领域执行不同任务的AI助理的发展革命性地革命性地革命性地革命性地革命性地发展能够执行不同任务的AI助理;然而,目前最先进的LLM驱动的代理商面临重大挑战,包括高运作成本和在诸如GAIA等复杂基准上的成功率有限。为了解决这些问题,我们提出了“思想知识图”(KGOT),这是一个创新的AI助理架构,将LLM推理与动态构建的知识图(KGGs)相结合。KGOT的摘录和结构任务相关知识,成为动态的KGG代表,通过数学解答器、网络爬虫器和Python脚本等外部工具不断加强。这种任务相关知识的结构化代表使低成本模式能够有效地解决复杂的任务,同时尽量减少偏见和噪音。例如,KGOT在GIA基准上实现了29%的任务成功率的提高,而GUG Fegg Feg Face Adriendorations (eg,Q-Go-Go-B) 和可负担得起的ASyal-GOATIal-GO1-B 提供的高标准70B)


Article 123

Title@2025-07-10 (4): Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning

Title: Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning Token-Space-Gradient-Konflikte lösen: Token Space-Manipulation für transformerbasiertes Multi-Task-Learning 解决 Token- Space 渐变冲突: 用于以变换器为基础的多任务学习的 Token 空间操纵 2507.07485v1

Authors (2): Wooseong Jeong, Kuk-Jin Yoon

Multi-Task Learning (MTL) enables multiple tasks to be learned within a shared network, but differences in objectives across tasks can cause negative transfer, where the learning of one task degrades another task’s performance. While pre-trained transformers significantly improve MTL performance, their fixed network capacity and rigid structure limit adaptability. Previous dynamic network architectures attempt to address this but are inefficient as they directly convert shared parameters into task-specific ones. We propose Dynamic Token Modulation and Expansion (DTME-MTL), a framework applicable to any transformer-based MTL architecture. DTME-MTL enhances adaptability and reduces overfitting by identifying gradient conflicts in token space and applying adaptive solutions based on conflict type. Unlike prior methods that mitigate negative transfer by duplicating network parameters, DTME-MTL operates entirely in token space, enabling efficient adaptation without excessive parameter growth. Extensive experiments demonstrate that DTME-MTL consistently improves multi-task performance with minimal computational overhead, offering a scalable and effective solution for enhancing transformer-based MTL models.

多任务学习(MTL)使多个任务能够在共享的网络中学习,但不同任务的目标差异可能导致负转移,因为对一项任务的学习会降低另一个任务的业绩。尽管预先培训的变压器显著改善了MTL的性能,但其固定的网络容量和僵硬的结构限制了适应性。以前的动态网络结构试图解决这个问题,但效率低下,因为它们直接将共享参数转换成具体任务参数。我们提出了动态调控和扩展(DTME-MTL),这是一个适用于任何基于变压器的MTL结构的框架。DTME-MTL通过识别象征性空间的梯度冲突以及应用基于冲突类型的适应性解决方案来增强适应性和减少过度适应性。DTME-MTL不同于以往通过重复网络参数减少负转移的方法,完全在象征性空间运作,使得高效适应而没有过度的参数增长。广泛的实验表明DTME-MTL始终在以最低的计算间接费用改进多任务性业绩,为增强基于变压器的模型提供了可扩展性和有效解决办法。
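
One common way to detect a gradient conflict between two tasks is to check whether their gradients point in opposing directions, that is, whether their cosine similarity is negative. The sketch below assumes that definition; how DTME-MTL measures conflicts in token space and how it distinguishes conflict types is not specified in the abstract above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def has_conflict(grad_a, grad_b):
    """Assumed criterion: two task gradients conflict if their cosine is negative."""
    return cosine_similarity(grad_a, grad_b) < 0.0
```

With this criterion, `has_conflict([1.0, 0.0], [-1.0, 0.5])` holds because following one gradient increases the other task's loss to first order.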


Article 124

Title@2025-07-10 (4): Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models

Title: Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models Machine Bullshit: Charakterisieren der Emergenten Missachtung der Wahrheit in großen Sprachmodellen 机器胡说:在大语言模型中突出新人无视真相的特点 2507.07484v1

Authors (6): Kaiqu Liang, Haimin Hu, Xuandong Zhao, Dawn Song, Thomas L. Griffiths, Jaime Fernández Fisac

Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. While previous work has explored large language model (LLM) hallucination and sycophancy, we propose machine bullshit as an overarching conceptual framework that can allow researchers to characterize the broader phenomenon of emergent loss of truthfulness in LLMs and shed light on its underlying mechanisms. We introduce the Bullshit Index, a novel metric quantifying LLMs’ indifference to truth, and propose a complementary taxonomy analyzing four qualitative forms of bullshit: empty rhetoric, paltering, weasel words, and unverified claims. We conduct empirical evaluations on the Marketplace dataset, the Political Neutrality dataset, and our new BullshitEval benchmark (2,400 scenarios spanning 100 AI assistants) explicitly designed to evaluate machine bullshit. Our results demonstrate that model fine-tuning with reinforcement learning from human feedback (RLHF) significantly exacerbates bullshit, and that inference-time chain-of-thought (CoT) prompting notably amplifies specific bullshit forms, particularly empty rhetoric and paltering. We also observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy. Our findings highlight systematic challenges in AI alignment and provide new insights toward more truthful LLM behavior.

哲学家Harry Frankfurk认为,胡说,胡说八道的概念是指没有考虑到其真实价值的言论。虽然以前的工作探索了大型语言模型(LLM)幻觉和偏执,但我们建议把机器胡说八道作为总体概念框架,使研究人员能够将LLMLM中突然失去真实性这一更广泛的现象定性,并阐明其基本机制。我们引入了Bullish Index,这是将LLMS对真理漠不关心量化的一个新指标,并提出了一种补充分类法,分析四种质量的胡说:空话、讨好听、恶言和未经核实的主张。我们还在市场数据集、政治中立数据集和我们新的BullshoothEval基准(2 400个方案,涵盖100名AI助理)上进行了经验性评价,以明确评价机器的胡说八道。我们的结果表明,通过从人类反馈中强化学习(RLHFF),对模型进行微调,大大地加剧了胡说和推导时间思维链(CT),从而显著地扩充了具体的废话形式,特别是空话和互换。我们还观察了政治环境中普遍存在的机器胡说八道,用新话,我们的调查结果显示了更真实的深刻的眼光。


Article 125

Title@2025-07-10 (4): Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences

Title: Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences Adaptive Randomisierte Glättung: Zertifizierte Adversarial Robustheit für Multi-Step-Verteidigungen 适应性随机调整平滑:多步骤防御的证明反向强力 2406.10427v3

Authors (5): Saiyue Lyu, Shadab Shaikh, Frederick Shpilevskiy, Evan Shelhamer, Mathias Lécuyer

We propose Adaptive Randomized Smoothing (ARS) to certify the predictions of our test-time adaptive models against adversarial examples. ARS extends the analysis of randomized smoothing using $f$-Differential Privacy to certify the adaptive composition of multiple steps. For the first time, our theory covers the sound adaptive composition of general and high-dimensional functions of noisy inputs. We instantiate ARS on deep image classification to certify predictions against adversarial examples of bounded $L_{\infty}$ norm. In the $L_{\infty}$ threat model, ARS enables flexible adaptation through high-dimensional input-dependent masking. We design adaptivity benchmarks, based on CIFAR-10 and CelebA, and show that ARS improves standard test accuracy by $1$ to $15\%$ points. On ImageNet, ARS improves certified test accuracy by up to $1.6\%$ points over standard RS without adaptivity. Our code is available at https://github.com/ubc-systopia/adaptive-randomized-smoothing .

我们建议采用适应性随机滑动(ARS),以证明我们测试时适应性模型的预测与对抗性实例的对比。ARS扩展了对随机性滑动的分析,使用美元差异隐私来证明多个步骤的适应性构成。我们的理论首次涵盖了噪音投入的一般和高维功能的健全适应性构成。我们即时将ARS用于深度图像分类,以证明对受约束的$Linfty}规范的对抗性实例的预测。在$Linfty}威胁模型中,ARS能够通过高维输入依赖的遮罩进行灵活适应性调整。我们根据CIFAR-10和CelibA设计了适应性基准,并表明ARS将标准测试精确度提高1美元至1美元至15美元点。在图像网上,ARS将认证的测试精度提高到不超过标准RS的1.6美元点,而不具有适应性。我们的代码可在https://github.com/ubc-syststopia/adpti-raniz-domiz-smotion-smothinging.
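
For context, the prediction step of standard (non-adaptive) randomized smoothing, the baseline that ARS extends, can be sketched as a majority vote over Gaussian-perturbed copies of the input. The sketch below omits the certification step and the adaptive masking described in the abstract above; the toy classifier, noise level, and sample count are assumptions for illustration.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, seed=0):
    """Majority-vote prediction of a classifier under Gaussian input noise.

    Returns the most frequent label and its empirical vote share.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the features sum to a positive value."""
    return 1 if sum(x) > 0 else 0


label, vote_share = smoothed_predict(toy_classifier, [1.0, 1.0])
```

In a certified pipeline, the vote share would feed a statistical lower bound from which a robustness radius is derived; that part is deliberately left out here.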


Article 126

Title@2025-07-10 (4): Mixture of Group Experts for Learning Invariant Representations

Title: Mixture of Group Experts for Learning Invariant Representations Mixtur von Gruppenexperten für Learning Invariante Repräsentationen 学习不稳定代表小组专家混合 2504.09265v2

Authors (4): Lei Kang, Jia Li, Mi Tian, Hua Huang

Sparsely activated Mixture-of-Experts (MoE) models effectively increase the number of parameters while maintaining consistent computational costs per token. However, vanilla MoE models often suffer from limited diversity and specialization among experts, constraining their performance and scalability, especially as the number of experts increases. In this paper, we present a novel perspective on vanilla MoE with top-$k$ routing inspired by sparse representation. This allows us to bridge established theoretical insights from sparse representation into MoE models. Building on this foundation, we propose a group sparse regularization approach for the input of top-$k$ routing, termed Mixture of Group Experts (MoGE). MoGE indirectly regularizes experts by imposing structural constraints on the routing inputs, while preserving the original MoE architecture. Furthermore, we organize the routing input into a 2D topographic map, spatially grouping neighboring elements. This structure enables MoGE to capture representations invariant to minor transformations, thereby significantly enhancing expert diversity and specialization. Comprehensive evaluations across various Transformer models for image classification and language modeling tasks demonstrate that MoGE substantially outperforms its MoE counterpart, with minimal additional memory and computation overhead. Our approach provides a simple yet effective solution to scale the number of experts and reduce redundancy among them. The source code is included in the supplementary material and will be publicly released.

在本文中,我们提出了一个关于香草混合模型的新视角,其中含有由稀疏代表所启发的以最高至1美元为主的路线路线。根据这个基础,我们提议对最高至1千美元路线(称为集团专家混合)的投入采取集体稀有的正规化方法。 移动模式通过对路由投入施加结构性限制,间接规范专家,同时保留原始的移动结构。此外,我们组织向2D地形图输入的路线,空间分组相邻要素。这一结构使得地理部能够将从稀疏代表形式到微小转变模式的既定理论见解连接起来,从而大大加强专家的多样性和专业化。在各种变异模式(即称为集团专家混合)的投入方面,我们提出了一种集体稀少的正规化方法。在图像分类和语言模式(称为组合专家混合)的输入中,对专家进行了有限的多样化和专业化。 移动模型通过对路由结构投入的结构性限制,对专家进行了间接规范,同时保留原始的移动结构结构。我们把路由2D地形图、空间分组的相邻元素组合要素组织输入的路线输入。这一结构可以将反映细微的图,从而大大增强专家对图像分类和语言模型的模型模型进行全面的翻版化。
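
The two ingredients named in the abstract above, top-$k$ routing and a group sparse penalty on the routing input, can be sketched as follows. This is a simplified illustration: the softmax over the selected logits is one common routing variant, and the non-overlapping 1D groups stand in for the paper's 2D topographic grouping, which is not reproduced here.

```python
import math


def top_k_route(router_logits, k=2):
    """Select the k largest router logits and softmax-normalize them.

    Returns a dict mapping expert index to gate weight.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    selected = ranked[:k]
    peak = max(router_logits[i] for i in selected)
    exps = {i: math.exp(router_logits[i] - peak) for i in selected}
    total = sum(exps.values())
    return {i: exps[i] / total for i in selected}


def group_sparsity_penalty(values, group_size):
    """Group-lasso style penalty: sum of L2 norms over consecutive groups."""
    penalty = 0.0
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        penalty += math.sqrt(sum(v * v for v in group))
    return penalty
```

Penalizing the L2 norm per group tends to switch whole groups on or off together, which is the structural constraint a group sparse regularizer imposes.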


Article 127

Title@2025-07-10 (4): ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining

Title: ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining ixi-GEN: Effiziente industrielle sLLMs durch Domain Adaptive Continual Pretraining ixi-GEN:通过远程适应性连续训练前,提高工业低温生产效率 2507.06795v2

Authors (10): Seonwu Kim, Yohan Na, Kihun Kim, Hanhee Cho, Geun Lim, Mintae Kim, Seongik Park, Ki Hyun Kim, Youngsub Han, Byoung-Ki Jeon

The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative, despite their inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been previously explored as a method for domain adaptation, its utility in commercial applications remains under-examined. In this study, we validate the effectiveness of applying a DACP-based recipe across diverse foundation models and service domains. Through extensive experiments and real-world evaluations, we demonstrate that DACP-applied sLLMs achieve substantial gains in target domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level deployment.

开放源码大型语言模型的出现扩大了企业应用机会;然而,许多组织仍然缺乏部署和维持大型模型的基础设施,因此,小型LLM(LLMS)尽管具有内在的性能限制,但已成为一种实用的替代方案;虽然以前曾探索过作为领域适应方法的DACP(DACP),但其在商业应用中的效用仍未得到充分审查;在本研究中,我们验证了在不同基础模型和服务领域应用基于DACP的配方的有效性;通过广泛的实验和现实世界的评价,我们证明DACP(DACP)所应用的SLLMS在目标领域业绩方面取得巨大进展,同时保留了一般能力,为企业一级的部署提供了具有成本效益和可扩展的解决方案。


Article 128

Title@2025-07-10 (4): Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting

Title: Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting Galerkin-ARIMA: Ein zweistufiges Polynom-Regressions-Framework für schnelles Ein-Schritt-Vorhersagen Galerkin-ARIMA:一个双级多级倒退框架,用于快速滚动单步单步预告 2507.07469v1

Authors (2): Haojie Liu, Zihan Lin

Time-series models like ARIMA remain widely used for forecasting but are limited by linear assumptions and high computational cost on large and complex datasets. We propose Galerkin-ARIMA, which generalizes the AR component of ARIMA by replacing it with a flexible spline-based function estimated by Galerkin projection. This enables the model to capture nonlinear dependencies in lagged values while retaining the MA component and Gaussian noise assumption. We derive a closed-form OLS estimator for the Galerkin coefficients and show the model is asymptotically unbiased and consistent under standard conditions. Our method bridges classical time-series modeling and nonparametric regression, offering improved forecasting performance and computational efficiency.

ARIMA等时间序列模型仍然广泛用于预测,但限于线性假设和大型和复杂数据集的高计算成本。我们提议Galerkin-ARIMA将ARIMA的AR部分概括化,代之以由Galerkin预测估计的灵活样条函数。这使模型能够捕捉滞后值的非线性依赖性,并保留MA部分和Gausian噪音假设。我们为Galerkin系数提取了一个封闭式的OSLS估计仪,并显示该模型在标准条件下是无差别的和一致的。我们的方法连接了传统的时序模型和非参数回归,提供了更好的预测性能和计算效率。
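
The idea of replacing a linear AR term with a basis expansion fitted by closed-form OLS can be illustrated on a deliberately small case. The sketch below uses a single lag and a two-term polynomial basis instead of the splines and multiple lags of the abstract above, and solves the 2x2 normal equations directly; it is a simplification under those assumptions, not the paper's estimator.

```python
def fit_basis_ar1(series):
    """OLS fit of y_t ~ b1 * y_{t-1} + b2 * y_{t-1}**2 via the normal equations."""
    x = series[:-1]
    y = series[1:]
    s11 = sum(v ** 2 for v in x)                 # <phi1, phi1>
    s12 = sum(v ** 3 for v in x)                 # <phi1, phi2>
    s22 = sum(v ** 4 for v in x)                 # <phi2, phi2>
    t1 = sum(a * b for a, b in zip(x, y))        # <phi1, y>
    t2 = sum(a * a * b for a, b in zip(x, y))    # <phi2, y>
    det = s11 * s22 - s12 * s12
    b1 = (t1 * s22 - t2 * s12) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2


def one_step_forecast(last_value, coeffs):
    """One-step-ahead forecast from the fitted basis coefficients."""
    b1, b2 = coeffs
    return b1 * last_value + b2 * last_value ** 2


# Noise-free toy series generated by y_t = 0.5 * y_{t-1} + 0.1 * y_{t-1}**2.
series = [1.0]
for _ in range(6):
    series.append(0.5 * series[-1] + 0.1 * series[-1] ** 2)
coeffs = fit_basis_ar1(series)
```

On this noise-free series the fit recovers the generating coefficients up to floating-point error; with real data the MA component and noise model would also need to be handled.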


Article 129

Title@2025-07-10 (4): Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals

Title: Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals Hess-MC2: Sequentielle Monte Carlo mit Hessischen Informationen und Vorschlägen für die zweite Ordnung Hess-MC2:使用黑森信息和第二顺序提案的顺序蒙特卡洛广场 2507.07461v1

Authors (6): Joshua Murphy, Conor Rosato, Andrew Millard, Lee Devlin, Paul Horridge, Simon Maskell

When performing Bayesian inference using Sequential Monte Carlo (SMC) methods, two considerations arise: the accuracy of the posterior approximation and computational efficiency. To address computational demands, Sequential Monte Carlo Squared (SMC$^2$) is well-suited for high-performance computing (HPC) environments. The design of the proposal distribution within SMC$^2$ can improve accuracy and exploration of the posterior as poor proposals may lead to high variance in importance weights and particle degeneracy. The Metropolis-Adjusted Langevin Algorithm (MALA) uses gradient information so that particles preferentially explore regions of higher probability. In this paper, we extend this idea by incorporating second-order information, specifically the Hessian of the log-target. While second-order proposals have been explored previously in particle Markov Chain Monte Carlo (p-MCMC) methods, we are the first to introduce them within the SMC$^2$ framework. Second-order proposals not only use the gradient (first-order derivative), but also the curvature (second-order derivative) of the target distribution. Experimental results on synthetic models highlight the benefits of our approach in terms of step-size selection and posterior approximation accuracy when compared to other proposals.

在使用连续的蒙特卡洛(SMC)方法进行巴伊西亚推断时,出现了两个因素:后近近和计算效率的准确性。为解决计算需求,SMC$2美元(SMC$2美元)适用于高性能计算环境。SMC$2美元内的建议分配设计可以提高后继体的准确性和探索性,因为低建议可能导致重要重量和粒子衰减的高度差异。大都会-调整后的朗埃文阿尔戈里特姆(MALA)使用梯度信息,以便粒子优先探索概率较高的区域。在本文件中,我们扩展这一想法,纳入了二级信息,特别是日志目标的黑森。虽然此前在粒子Markov链蒙特卡洛(p-MC)方法中已经探讨过第二级建议,但我们首先在SMC$%2的框架内引入了第二级建议。第二级建议不仅使用梯度(一级衍生品),而且还在合成阶段分配结果中比较了其他级标值。在实验性模型中,还使用了其他级标度分配结果。
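
The first-order proposal that the abstract above builds on, MALA, can be sketched for a one-dimensional standard normal target. This is a standalone Markov chain for illustration only: it is not embedded in an SMC$^2$ sampler, and the target, step size, and function names are assumptions.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def grad_log_target(x):
    return -x


def proposal_mean(x, step):
    """Langevin drift: move along the gradient of the log-target."""
    return x + 0.5 * step ** 2 * grad_log_target(x)


def log_proposal(x_to, x_from, step):
    """Log-density (up to a constant) of proposing x_to from x_from."""
    diff = x_to - proposal_mean(x_from, step)
    return -(diff * diff) / (2.0 * step ** 2)


def mala_step(x, step, rng):
    """One MALA transition: gradient-informed proposal plus accept/reject."""
    proposal = proposal_mean(x, step) + step * rng.gauss(0.0, 1.0)
    log_alpha = (log_target(proposal) + log_proposal(x, proposal, step)
                 - log_target(x) - log_proposal(proposal, x, step))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return proposal
    return x


def run_chain(n_steps=10000, step=1.0, seed=0):
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        x = mala_step(x, step, rng)
        samples.append(x)
    return samples
```

A second-order variant would additionally scale the drift and the noise using the Hessian of the log-target. For this particular target the Hessian is the constant -1, so such a preconditioned proposal would coincide with the one above.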


Article 130

Title@2025-07-10 (4): General purpose models for the chemical sciences

Title: General purpose models for the chemical sciences Allgemeine Zweckmodelle für die Chemiewissenschaften 化学科学通用模型 2507.07456v1

Authors (9): Nawaf Alampara, Anagha Aneesh, Martiño Ríos-García, Adrian Mirza, Mara Schilling-Wilhelmi, Ali Asghar Aghajani, Meiling Sun, Gordan Prastalo, Kevin Maik Jablonka

Data-driven techniques have great potential to transform and accelerate the chemical sciences. However, the chemical sciences also pose the unique challenge of very diverse, small, fuzzy datasets that are difficult to fully leverage with conventional machine learning approaches. A new class of models, general-purpose models (GPMs) such as large language models, has shown the ability to solve tasks they have not been directly trained on, and to flexibly operate with small amounts of data in different formats. In this review, we discuss fundamental building principles of GPMs and review recent applications of those models in the chemical sciences across the entire scientific process. While many of these applications are still in the prototype phase, we expect that the increasing interest in GPMs will make many of them mature in the coming years.

以数据为驱动的技术在改变和加速化学科学方面具有巨大的潜力,然而,化学科学也提出了非常多样、小型、模糊的数据集的独特挑战,这些数据集很难在传统的机器学习方法中完全加以利用。 一类新的模型,如大型语言模型等通用模型,已经表明有能力解决它们尚未直接接受过培训的任务,并以不同格式以低量数据灵活运作。本审查中,我们讨论了全球分子模型的基本建设原则,并审查了这些模型最近在整个科学进程中在化学科学中的应用情况。 虽然其中许多应用仍处于原型阶段,但我们预计,对全球分子模型的日益兴趣将在未来几年中使其中许多变得成熟。


Article 131

Title@2025-07-10 (4): C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition

Title: C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition C3T: Grenzüberschreitender Transfer durch Zeit für sensorgestützte menschliche Aktivitätserkennung C3T: 以传感器为基础的人类活动识别跨时间跨模式转让 2407.16803v4

Authors (3): Abhi Kamboj, Anh Duy Nguyen, Minh N. Do

In order to unlock the potential of diverse sensors, we investigate a method to transfer knowledge between time-series modalities using a multimodal temporal representation space for Human Activity Recognition (HAR). Specifically, we explore the setting where the modality used in testing has no labeled data during training, which we refer to as Unsupervised Modality Adaptation (UMA). We categorize existing UMA approaches as Student-Teacher or Contrastive Alignment methods. These methods typically compress continuous-time data samples into single latent vectors during alignment, inhibiting their ability to transfer temporal information through real-world temporal distortions. To address this, we introduce Cross-modal Transfer Through Time (C3T), which preserves temporal information during alignment to handle dynamic sensor data better. C3T achieves this by aligning a set of temporal latent vectors across sensing modalities. Our extensive experiments on various camera+IMU datasets demonstrate that C3T outperforms existing methods in UMA by at least 8% in accuracy and shows superior robustness to temporal distortions such as time-shift, misalignment, and dilation. Our findings suggest that C3T has significant potential for developing generalizable models for time-series sensor data, opening new avenues for various multimodal applications.

为了释放不同传感器的潜力,我们调查了一种在使用多式联运\textit{textoral}代表空间进行人类活动识别的时间序列模式之间转让知识的方法。 具体地说,我们探索了在培训期间测试模式没有标签数据的地方,我们称之为“不受监督的模式适应”(UMA)。我们将现有的UMA方法归类为“学生-教师或对齐方法 ” 。这些方法通常在调整期间将连续时间数据样本压缩为单一潜在矢量,从而抑制它们通过现实世界的时间扭曲传输时间信息的能力。为了解决这个问题,我们引入了跨时间模式转移(C3T),在调整期间保留时间信息,以便更好地处理动态传感器数据。C3T通过对各种遥感模式的一组时间潜值矢量加以调整,从而实现了这一点。我们在各种相机+IMU数据集上进行的广泛实验表明,C3T在精确度上比UMA现有方法高出至少8%,并显示它们通过现实时间变化、误差和对时间扭曲的超强性时间扭曲能力。为了改进时间变化等,我们发现C3号为各种通用数据系列开辟了各种移动模式。


Article 132

Title@2025-07-10 (4): Neural networks leverage nominally quantum and post-quantum representations

Title: Neural networks leverage nominally quantum and post-quantum representations Neurale Netzwerke nutzen nominal Quanten- und Post-Quantum-Darstellungen 神经网络在名义上对数量和数量后代表的杠杆作用发挥杠杆作用 2507.07432v1

Authors (3): Paul M. Riechers, Thomas J. Elliott, Adam S. Shai

We show that deep neural networks, including transformers and RNNs, pretrained as usual on next-token prediction, intrinsically discover and represent beliefs over ‘quantum’ and ‘post-quantum’ low-dimensional generative models of their training data – as if performing iterative Bayesian updates over the latent state of this world model during inference as they observe more context. Notably, neural nets easily find these representations whereas there is no finite classical circuit that would do the job. The corresponding geometric relationships among neural activations induced by different input sequences are found to be largely independent of neural-network architecture. Each point in this geometry corresponds to a history-induced probability density over all possible futures, and the relative displacement of these points reflects the difference in mechanism and magnitude for how these distinct pasts affect the future.

我们显示,深神经网络,包括变压器和RNNs,在下方预测方面如往常受过训练,其培训数据的“quantum”和“quantum后”的低维基因模型的内在发现和代表了这些“quantum”和“quantum后”的信念 – – 仿佛在推断期间对这个世界模型的潜伏状态进行迭代贝ysian更新,因为它们观察到更多的背景。值得注意的是,神经网很容易找到这些表达方式,而没有固定的古典电路来完成这项工作。不同输入序列引发的神经激活之间的对应几何关系被认为基本上独立于神经网络结构。这一几何测量方法中的每一个点都与所有可能的未来的历史诱发概率密度相匹配,这些点的相对偏移反映了这些截然不同的过去如何影响未来的机制与规模的差异。


Article 133

Title@2025-07-10 (4): ARBoids: Adaptive Residual Reinforcement Learning With Boids Model for Cooperative Multi-USV Target Defense

Title: ARBoids: Adaptive Residual Reinforcement Learning With Boids Model for Cooperative Multi-USV Target Defense ARBoids: Adaptives Residual-Verstärkungs-Lernen mit Boids-Modell für kooperative Multi-USV-Zielverteidigung ABBOids:多紫外线合作多紫外线目标防御用BOids模式进行适应性残余强化学习 2502.18549v2

Authors (4): Jiyue Tao, Tongsheng Shen, Dexin Zhao, Feitian Zhang

The target defense problem (TDP) for unmanned surface vehicles (USVs) concerns intercepting an adversarial USV before it breaches a designated target region, using one or more defending USVs. A particularly challenging scenario arises when the attacker exhibits superior maneuverability compared to the defenders, significantly complicating effective interception. To tackle this challenge, this letter introduces ARBoids, a novel adaptive residual reinforcement learning framework that integrates deep reinforcement learning (DRL) with the biologically inspired, force-based Boids model. Within this framework, the Boids model serves as a computationally efficient baseline policy for multi-agent coordination, while DRL learns a residual policy to adaptively refine and optimize the defenders’ actions. The proposed approach is validated in a high-fidelity Gazebo simulation environment, demonstrating superior performance over traditional interception strategies, including pure force-based approaches and vanilla DRL policies. Furthermore, the learned policy exhibits strong adaptability to attackers with diverse maneuverability profiles, highlighting its robustness and generalization capability. The code of ARBoids will be released upon acceptance of this letter.

无人驾驶表面飞行器(USVs)的防御目标问题(TDP)涉及在攻击者违反指定目标区域之前拦截对抗性USV,使用一种或多种保护USV。当攻击者比维权者表现出较强的机动性,使有效拦截工作大为复杂化时,就会出现一种特别具有挑战性的情况。为了应对这一挑战,本信引入了ABBOid(ARBOid),这是一个新的适应性剩余强化学习框架,将深度强化学习(DRL)与生物启发的、以力量为基础的BOids模型结合起来。在这个框架内,BOids模型作为多剂协调的计算效率基准政策,而DRL则学习一项调整和优化维权者行动的剩余政策。拟议方法在高不显性Gazebo模拟环境中得到验证,展示了优于传统拦截战略的优异性表现,包括纯力方法和香草DRL政策。此外,所学的政策显示对攻击者具有很强的适应性,突出其坚固性和概括性。一旦接受此信,ARBOids的守则将予公布。
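The residual scheme described in this abstract can be illustrated with a minimal sketch. It assumes the final action is the sum of a Boids-style force and a learned correction, clipped to an actuator limit; the abstract does not confirm this exact composition. All names (`boids_baseline`, `residual_policy`, `combined_action`, `sep_radius`, `max_norm`) are hypothetical, the Boids rule is reduced to cohesion plus separation, and the residual is a fixed stub rather than a trained DRL policy.

```python
import math

def boids_baseline(me, neighbors, sep_radius=1.0):
    """Toy force-based baseline: cohesion toward the neighbor centroid plus separation."""
    if not neighbors:
        return (0.0, 0.0)
    cx = sum(n[0] for n in neighbors) / len(neighbors)
    cy = sum(n[1] for n in neighbors) / len(neighbors)
    fx, fy = cx - me[0], cy - me[1]          # cohesion term
    for nx, ny in neighbors:
        dx, dy = me[0] - nx, me[1] - ny
        dist = math.hypot(dx, dy)
        if 0.0 < dist < sep_radius:          # push away from close neighbors
            fx += dx / dist ** 2
            fy += dy / dist ** 2
    return (fx, fy)

def residual_policy(observation):
    """Hypothetical stand-in for the learned residual; returns a fixed correction."""
    return (0.1, -0.2)

def combined_action(me, neighbors, max_norm=1.0):
    """Baseline action plus residual correction, clipped to max_norm."""
    bx, by = boids_baseline(me, neighbors)
    rx, ry = residual_policy((me, neighbors))
    ax, ay = bx + rx, by + ry
    norm = math.hypot(ax, ay)
    if norm > max_norm:
        ax, ay = ax * max_norm / norm, ay * max_norm / norm
    return (ax, ay)
```

In a setup like this the baseline supplies a usable action from the first step, so the learned part only has to model corrections to it.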


Article 134

Title@2025-07-10 (4): Harmonic Loss Trains Interpretable AI Models

Title: Harmonic Loss Trains Interpretable AI Models Harmonische Verlust Züge Interpretierbare KI-Modelle 可解释的 AI 模型 2502.01628v2

Authors (4): David D. Baek, Ziming Liu, Riya Tyagi, Max Tegmark

In this paper, we introduce harmonic loss as an alternative supervisory signal for training neural networks and large language models (LLMs). Harmonic loss differs from standard cross-entropy loss by (a) replacing the usual SoftMax normalization with a scale-invariant HarMax function and (b) computing logits via Euclidean distance rather than a dot product. Harmonic loss enables improved interpretability and faster convergence, owing to its scale invariance and finite convergence point by design, which can be interpreted as a class center. We first validate the performance of harmonic models across algorithmic, vision, and language datasets. Through extensive experiments, we demonstrate that models trained with harmonic loss perform better than standard models by: (a) enhancing interpretability, (b) requiring less data for generalization, and (c) reducing grokking. Moreover, we compare a GPT-2 model trained with harmonic loss to the standard GPT-2, illustrating that the harmonic model develops more interpretable representations. Looking forward, we believe harmonic loss may become a valuable tool in domains with limited data availability or in high-stakes applications where interpretability and reliability are paramount, paving the way for more robust and efficient neural network models.

在本文中,我们引入了和谐损失,作为培训神经网络和大语言模型的替代监督信号。和谐损失与标准的跨天体损失不同,具体做法是:(a) 以比例反差的哈马克斯功能取代常规软马克斯正常化,(b) 通过Euclidean距离而不是点产物计算日志。和谐损失可以提高可解释性和更快的趋同性,因为其规模变化和因设计而形成的有限趋同点可以被解释为阶级中心。我们首先验证算法、视觉和语言数据集之间的和谐模型的性能。通过广泛的实验,我们证明接受过协调损失培训的模型比标准模型效果更好,具体做法是:(a) 提高可解释性,(b) 减少一般化的数据,以及(c) 减少重力。此外,我们比较了受过口音损失训练的GPT-2模型与标准的GPT-2模型相比,表明该调理模型会发展更多的可解释性表现。我们先看,认为,在数据可用性有限或高压网络模型中,在高能可靠的情况下,协调损失可能会成为有价值的工具。
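The two changes named in this abstract (distance-based logits and a scale-invariant normalization) can be sketched as follows. The abstract does not give the functional form of HarMax; this sketch assumes probabilities proportional to an inverse power of the Euclidean distance to each class center. The exponent `n`, the `eps` guard, and the function names are assumptions made for illustration.

```python
import math

def harmax(x, centers, n=2.0, eps=1e-12):
    """Class probabilities from Euclidean distances to class centers.

    Assumed form: p_i proportional to d_i ** (-n). Rescaling x and all
    centers by a common factor leaves the probabilities unchanged.
    """
    dists = [math.dist(x, c) + eps for c in centers]   # eps avoids division by zero
    inv = [d ** (-n) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def harmonic_loss(x, centers, label, n=2.0):
    """Negative log-probability of the true class under harmax."""
    return -math.log(harmax(x, centers, n)[label])
```

For example, with centers at (0, 0) and (2, 0) and an input at (0.5, 0), the distances are 0.5 and 1.5, so with `n=2` the first class receives probability 0.9; multiplying all coordinates by 10 gives the same result.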


Article 135

Title@2025-07-10 (4): Probabilistic Approximate Optimization: A New Variational Monte Carlo Algorithm

Title: Probabilistic Approximate Optimization: A New Variational Monte Carlo Algorithm Probabilistische annähernde Optimierung: Eine neue Variation des Monte Carlo-Algorithmus 概率近似优化:新的变异性蒙特卡洛算法 2507.07420v1

Authors (4): Abdelrahman S. Abdelrahman, Shuvro Chowdhury, Flaviano Morone, Kerem Y. Camsari

We introduce a generalized Probabilistic Approximate Optimization Algorithm (PAOA), a classical variational Monte Carlo framework that extends and formalizes prior work by Weitz et al. \cite{Combes_2023}, enabling parameterized and fast sampling on present-day Ising machines and probabilistic computers. PAOA operates by iteratively modifying the couplings of a network of binary stochastic units, guided by cost evaluations from independent samples. We establish a direct correspondence between derivative-free updates and the gradient of the full $2^N \times 2^N$ Markov flow, showing that PAOA admits a principled variational formulation. Simulated annealing emerges as a limiting case under constrained parameterizations, and we implement this regime on an FPGA-based probabilistic computer with on-chip annealing to solve large 3D spin-glass problems. Benchmarking PAOA against QAOA on the canonical 26-spin Sherrington-Kirkpatrick model with matched parameters reveals superior performance for PAOA. We show that PAOA naturally extends simulated annealing by optimizing multiple temperature profiles, leading to improved performance over SA on heavy-tailed problems such as SK-Lévy.

我们引入了普遍化的 & Textit{ 概率接近最佳化算法( PAOA ) , 典型的Monte Carlo 变异框架, 延长和正式了Weitz\ textit{et al. cite{Combes_2023} 的先前工作, 使当今的Ising 机器和概率计算机能够进行参数化和快速取样。 PAOA 操作以迭接方式修改二进制抽查器网络的连接, 由独立样本的成本评估提供指导。 我们在无衍生产品更新和完全2+N\time Markov 流的梯度之间建立了直接对应关系, 这表明 PAOA 接受有原则性的变异配方配方。 在受限制的参数化下, 模拟安非金正比型计算机系统( FPGA) , 和 机尾固型自动调试机解决大型的3D 脊椎问题。 我们将PAAAA 与 QAAA 在 Cancial- spin Sherrlsin Sherlorxal 上显示高级的性表现模型。
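The simulated-annealing limit mentioned in this abstract can be illustrated with a small sketch of binary stochastic units on a Sherrington-Kirkpatrick instance. This is plain annealing with a fixed inverse-temperature schedule, not PAOA itself: nothing here optimizes couplings or schedules variationally. The update rule (each unit set to +1 with probability (1 + tanh(beta * field)) / 2) and all function names are assumptions for illustration.

```python
import math
import random

def sk_couplings(n, rng):
    """Symmetric Gaussian couplings with zero diagonal, scaled by 1/sqrt(n)."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, 1.0) / math.sqrt(n)
    return J

def energy(J, m):
    """Ising energy E = -sum_{i<j} J[i][j] * m_i * m_j."""
    n = len(m)
    return -sum(J[i][j] * m[i] * m[j] for i in range(n) for j in range(i + 1, n))

def anneal(J, m0, betas, rng):
    """One sweep over all units per inverse temperature; returns the best state seen."""
    m = list(m0)
    best, best_e = list(m), energy(J, m)
    n = len(m)
    for beta in betas:
        for i in range(n):
            field = sum(J[i][j] * m[j] for j in range(n) if j != i)
            # Binary stochastic unit: +1 with probability (1 + tanh(beta * field)) / 2.
            m[i] = 1 if rng.uniform(-1.0, 1.0) < math.tanh(beta * field) else -1
        e = energy(J, m)
        if e < best_e:
            best, best_e = list(m), e
    return best, best_e
```

A typical call would be `anneal(J, [1] * n, [0.1 * k for k in range(1, 21)], random.Random(0))`; because the starting state counts as a candidate, the returned energy never exceeds the initial one.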


Article 136

Title@2025-07-10 (4): Autonomous AI-based Cybersecurity Framework for Critical Infrastructure: Real-Time Threat Mitigation

Title: Autonomous AI-based Cybersecurity Framework for Critical Infrastructure: Real-Time Threat Mitigation Autonomes KI-basiertes Cybersecurity Framework für kritische Infrastruktur: Echtzeit-Bedrohungsmilderung 以AI为基础的关键基础设施自动网络安全框架:减少实时威胁 2507.07416v1

Authors (4): Jenifer Paulraj, Brindha Raghuraman, Nagarani Gopalakrishnan, Yazan Otoum

Critical infrastructure systems, including energy grids, healthcare facilities, transportation networks, and water distribution systems, are pivotal to societal stability and economic resilience. However, the increasing interconnectivity of these systems exposes them to various cyber threats, including ransomware, Denial-of-Service (DoS) attacks, and Advanced Persistent Threats (APTs). This paper examines cybersecurity vulnerabilities in critical infrastructure, highlighting the threat landscape, attack vectors, and the role of Artificial Intelligence (AI) in mitigating these risks. We propose a hybrid AI-driven cybersecurity framework to enhance real-time vulnerability detection, threat modelling, and automated remediation. This study also addresses the complexities of adversarial AI, regulatory compliance, and integration. Our findings provide actionable insights to strengthen the security and resilience of critical infrastructure systems against emerging cyber threats.

关键基础设施系统,包括能源电网、卫生保健设施、运输网络和水分配系统,对于社会稳定和经济复原力至关重要,然而,这些系统日益相互连接,使它们面临各种网络威胁,包括赎金软件、拒绝服务攻击和高级持续威胁。本文件审视了关键基础设施的网络安全脆弱性,重点介绍了威胁地貌、攻击病媒以及人工智能(AI)在减轻这些风险方面的作用。我们提议建立一个由AI驱动的混合网络安全框架,以加强实时脆弱性检测、威胁建模和自动补救。本研究还探讨了对抗性AI、监管合规和整合的复杂性。我们的调查结果提供了可操作的见解,以加强关键基础设施系统的安全和复原力,应对新出现的网络威胁。


Article 137

Title@2025-07-10 (4): Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks

Title: Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks Hybride LLM-verstärkte Intrusionserkennung für Zero-Day-Bedrohungen in IoT-Netzwerken 在IoT网络零日威胁下加强入侵探测 2507.07413v1

Authors (4): Mohammad F. Al-Hammouri, Yazan Otoum, Rasha Atwa, Amiya Nayak

This paper presents a novel approach to intrusion detection by integrating traditional signature-based methods with the contextual understanding capabilities of the GPT-2 Large Language Model (LLM). As cyber threats become increasingly sophisticated, particularly in distributed, heterogeneous, and resource-constrained environments such as those enabled by the Internet of Things (IoT), the need for dynamic and adaptive Intrusion Detection Systems (IDSs) becomes increasingly urgent. While traditional methods remain effective for detecting known threats, they often fail to recognize new and evolving attack patterns. In contrast, GPT-2 excels at processing unstructured data and identifying complex semantic relationships, making it well-suited to uncovering subtle, zero-day attack vectors. We propose a hybrid IDS framework that merges the robustness of signature-based techniques with the adaptability of GPT-2-driven semantic analysis. Experimental evaluations on a representative intrusion dataset demonstrate that our model enhances detection accuracy by 6.3%, reduces false positives by 9.0%, and maintains near real-time responsiveness. These results affirm the potential of language model integration to build intelligent, scalable, and resilient cybersecurity defences suited for modern connected environments.

本文介绍了一种新颖的入侵探测方法,将传统的基于签名的方法与GPT-2大语言模型(LLM)的背景理解能力结合起来。随着网络威胁日益复杂,特别是在分布式、多样化和资源紧张的环境中,例如由物联网(IoT)促成的环境,对动态和适应性入侵探测系统的需求日益迫切。虽然传统方法对探测已知威胁依然有效,但它们往往无法识别新的和不断变化的攻击模式。相比之下,GPT-2在处理非结构化的数据和识别复杂的语义关系方面十分出色,使得它非常适合发现微妙的零日攻击矢量。我们提议了一个基于签名的技术的稳健性与GPT-2驱动的语义分析的适应性相结合的混合的 IDS 框架。对具有代表性的入侵数据集的实验性评估表明,我们的模型提高了6.3%的检测准确性,将假正数减少9.0%,并保持近实时反应。这些结果证实语言模型整合对于建设智能、可伸缩性和耐力的网络安全防御系统适合现代连接环境的潜力。


Article 138

Title@2025-07-10 (4): Determinant Estimation under Memory Constraints and Neural Scaling Laws

Title: Determinant Estimation under Memory Constraints and Neural Scaling Laws Determinante Abschätzung unter Gedächtnisbeschränkungen und neuralen Skalierungsgesetzen 根据记忆限制和神经扩增法对决定因素进行估算 2503.04424v2

Authors (5): Siavash Ameli, Chris van der Heide, Liam Hodgkinson, Fred Roosta, Michael W. Mahoney

Calculating or accurately estimating log-determinants of large positive definite matrices is of fundamental importance in many machine learning tasks. While the cubic computational complexity of this calculation can already be prohibitive, in modern applications even storing the matrices themselves can pose a memory bottleneck. To address this, we derive a novel hierarchical algorithm based on block-wise computation of the LDL decomposition for large-scale log-determinant calculation in memory-constrained settings. In extreme cases where matrices are highly ill-conditioned, accurately computing the full matrix itself may be infeasible. This is particularly relevant when considering kernel matrices at scale, including the empirical Neural Tangent Kernel (NTK) of neural networks trained on large datasets. Under the assumption of neural scaling laws in the test error, we show that the ratio of pseudo-determinants satisfies a power-law relationship, allowing us to derive corresponding scaling laws. This enables accurate estimation of NTK log-determinants from a tiny fraction of the full dataset; in our experiments, this results in a roughly 100,000× speedup with improved accuracy over competing approximations. Using these techniques, we successfully estimate log-determinants for dense matrices of extreme sizes, which were previously deemed intractable and inaccessible due to their enormous scale and computational demands.

在许多机器学习任务中,计算或准确估计大型正确定基质的逻辑确定值至关重要。 虽然其立方计算复杂性已经令人望而却步,但在现代应用中,即使储存基质本身也可能造成记忆瓶颈。 为了解决这个问题,我们根据在记忆限制环境中对大型正确定基质确定值的LDL分解值的分解进行分解的分解的分解, 得出一种新的等级算法。 在矩阵条件极差的极端情况下, 精确计算完整基质本身可能不可行。 在考虑规模的内核矩阵时, 包括经大规模数据集培训的神经神经内尔(NTENCER)的经验性神经内尔(NTKKNTK), 这一点特别相关。 在假设测试错误中的神经缩放法假设下, 我们显示,假定值的比重符合一种权力- 法律关系, 使我们能够得出相应的缩放法。 这样就可以精确估计整个基质的微小部分的NTK逻辑确定值; 在我们的实验中, 以100000美元/timetime计时, 的结果是特别相关的结果。
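The identity underlying an LDL-based log-determinant is that for A = L D L^T with unit lower-triangular L, log det A equals the sum of the logs of the diagonal of D. The sketch below shows only the unblocked, in-memory case in pure Python; the block-wise, hierarchical scheme from the abstract is not reproduced here, and the function name `ldl_logdet` is hypothetical.

```python
import math

def ldl_logdet(A):
    """log det of a symmetric positive definite matrix via A = L D L^T.

    A is a list of rows. L is unit lower-triangular and D is diagonal,
    so log det A = sum(log d_j).
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return sum(math.log(v) for v in d)
```

For the matrix [[4, 2], [2, 3]] the pivots are 4 and 2, so the result is log 8, matching the determinant 4·3 − 2·2 = 8. Summing logs of pivots rather than multiplying them avoids overflow for large matrices.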


Article 139

Title@2025-07-10 (4): Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models

Title: Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models Phishing Detection in der Gen-AI Ära: Quantisierte LLMs gegen klassische Modelle Gen-AI 时代中的幻影探测:量化的LMs 与古典模型 2507.07406v1

Authors (4): Jikesh Thapa, Gurrehmat Chahal, Serban Voinea Gabreanu, Yazan Otoum

Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that strike a balance between high accuracy and computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. Through experiments on a curated dataset, we show that while LLMs currently underperform compared to ML and DL methods in terms of raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues. We also investigate the impact of zero-shot and few-shot prompting strategies, revealing that LLM-rephrased emails can significantly degrade the performance of both ML and LLM-based detectors. Our benchmarking highlights that models like DeepSeek R1 Distill Qwen 14B (Q8_0) achieve competitive accuracy, above 80%, using only 17 GB of VRAM, supporting their viability for cost-efficient deployment. We further assess the models’ adversarial robustness and cost-performance tradeoffs, and demonstrate how lightweight LLMs can provide concise, interpretable explanations to support real-time decision-making. These findings position optimized LLMs as promising components in phishing defence systems and offer a path forward for integrating explainable, efficient AI into modern cybersecurity frameworks.

光学攻击正在变得越来越复杂,突出表明需要建立探测系统,以便在高精确度和计算效率之间取得平衡。本文对传统机器学习(ML)、深学习(DL)和小参数大语言模型(LLM)进行比较评估,以进行钓鱼探测。我们通过对曲线数据集的实验表明,虽然LLMS目前与ML和DL方法相比在原始精确度方面表现不佳,但它们在查明微妙、基于背景的钓鱼提示方面有着巨大的潜力。我们还调查了零射和几发提示战略的影响,揭示了LLMM重新使用的电子邮件可以大大降低MLM和LLM的探测器的性能。我们的基准显示,DeepSeik R1stilling Qwen 14B(Q8_0)等模型取得了超过80%的竞争性精度,仅使用VRAM的17GB,支持其成本效益部署的可行性。我们进一步评估了这些模型的对抗性强性和成本效益权衡,并展示了LMS的轻度定位能够提供最精确、最有前途的前沿定位,从而提供最佳的自我定位。


Article 140

Title@2025-07-10 (4): HGMP:Heterogeneous Graph Multi-Task Prompt Learning

Title: HGMP:Heterogeneous Graph Multi-Task Prompt Learning HGMP:Heterogenes Graph-Multi-Task-Prompt-Lernen HGMP: 异基因图多任务快速学习 2507.07405v1

Authors (7): Pengfei Jiao, Jialong Ni, Di Jin, Xuan Guo, Huan Liu, Hongjiang Chen, Yanxian Bi

The pre-training and fine-tuning methods have gained widespread attention in the field of heterogeneous graph neural networks due to their ability to leverage large amounts of unlabeled data during the pre-training phase, allowing the model to learn rich structural features. However, these methods face the issue of a mismatch between the pre-trained model and downstream tasks, leading to suboptimal performance in certain application scenarios. Prompt learning methods have emerged as a new direction in heterogeneous graph tasks, as they allow flexible adaptation of task representations to address target inconsistency. Building on this idea, this paper proposes a novel multi-task prompt framework for the heterogeneous graph domain, named HGMP. First, to bridge the gap between the pre-trained model and downstream tasks, we reformulate all downstream tasks into a unified graph-level task format. Next, we address the limitations of existing graph prompt learning methods, which struggle to integrate contrastive pre-training strategies in the heterogeneous graph domain. We design a graph-level contrastive pre-training strategy to better leverage heterogeneous information and enhance performance in multi-task scenarios. Finally, we introduce heterogeneous feature prompts, which enhance model performance by refining the representation of input graph features. Experimental results on public datasets show that our proposed method adapts well to various tasks and significantly outperforms baseline methods.

培训前和微调方法由于能够在培训前阶段利用大量未贴标签的数据,使得模型能够学习丰富的结构特征,因而在多元图形神经网络领域得到了广泛的关注。然而,这些方法面临了预先培训的模式和下游任务之间不匹配的问题,在某些应用情景下游任务中导致不优化的绩效。快速学习方法已成为多元图形任务的新方向,因为这些方法允许灵活调整任务说明,以解决目标不一致的问题。基于这一想法,本文件为差异图形域(名为HGMP.First)提出了一个新的多任务快速框架,以弥合预先培训的模式与下游任务之间的差距,我们将所有下游任务改编成统一的图形级任务格式。接下来,我们处理现有的图表快速学习方法的局限性,这些方法努力将对比性培训前战略纳入组合图域。我们设计了一个图表级对比性培训前战略,以更好地利用混成信息,提高多任务情景的绩效。最后,我们引入了差异性特征提示,通过改进输入式图表的描述方式,改进了各种基准特征。实验性地展示了我们提出的各种基准方法。


Article 141

Title@2025-07-10 (4): Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization

Title: Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization Generalized Tree Edit Distance (GTED): Ein treues Bewertungsmetrikum für die Autoformalisierung von Aussagen 通用树版编辑距离(GTED):声明自动正规化的忠实评价度量 2507.07399v1

Authors (9): Yuntian Liu, Tao Zhu, Xiaoyang Liu, Yu Chen, Zhaoxuan Liu, Qingfeng Guo, Jiashuo Zhang, Kangjie Bao, Tao Luo

Statement autoformalization, the automated translation of statements from natural language into formal languages, has become a subject of extensive research, yet the development of robust automated evaluation metrics remains limited. Existing evaluation methods often lack semantic understanding, face challenges with high computational costs, and are constrained by the current progress of automated theorem proving. To address these issues, we propose GTED (Generalized Tree Edit Distance), a novel evaluation framework that first standardizes formal statements and converts them into operator trees, then determines the semantic similarity using the eponymous GTED metric. On the miniF2F and ProofNet benchmarks, GTED outperforms all baseline metrics by achieving the highest accuracy and Kappa scores, thus providing the community with a more faithful metric for automated evaluation. The code and experimental results are available at https://github.com/XiaoyangLiu-sjtu/GTED.

报表自动化,即将声明从自然语言自动翻译成正式语言,已成为广泛研究的主题,但开发强有力的自动评价指标仍然有限;现有的评价方法往往缺乏语义理解,面临高计算成本的挑战,并受到当前自动理论验证进展的制约;为解决这些问题,我们提议采用新的评价框架GTED(通用树编辑距离),首先将正式声明标准化,然后将其转换为操作员树,然后利用匿名GTED标准确定语义相似性。关于小型F2F和校对网基准,GTED通过达到最高准确度和卡帕分数,超越所有基线指标,从而为社区提供了更忠实的自动化评价指标。该代码和实验结果可在https://github./com-XiaoyangLiu-sjtu/GTED上查阅。


Article 142

Title@2025-07-10 (4): IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing

Title: IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing IML-Spikeformer: Multi-Level Spiking Transformer für die Sprachverarbeitung IML-Spikeex: 用于语音处理的具有投入意识的多层Spiking变换器 2507.07396v1

Authors (5): Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li

Spiking Neural Networks (SNNs), inspired by biological neural mechanisms, represent a promising neuromorphic computing paradigm that offers energy-efficient alternatives to traditional Artificial Neural Networks (ANNs). Despite proven effectiveness, SNN architectures have struggled to achieve competitive performance on large-scale speech processing tasks. Two key challenges hinder progress: (1) the high computational overhead during training caused by multi-timestep spike firing, and (2) the absence of large-scale SNN architectures tailored to speech processing tasks. To overcome these issues, we introduce the Input-aware Multi-Level Spikeformer (IML-Spikeformer), a spiking Transformer architecture specifically designed for large-scale speech processing. Central to our design is the Input-aware Multi-Level Spike (IMLS) mechanism, which simulates multi-timestep spike firing within a single timestep using an adaptive, input-aware thresholding scheme. IML-Spikeformer further integrates a Reparameterized Spiking Self-Attention (RepSSA) module with a Hierarchical Decay Mask (HDM), forming the HD-RepSSA module. This module enhances the precision of attention maps and enables modeling of multi-scale temporal dependencies in speech signals. Experiments demonstrate that IML-Spikeformer achieves word error rates of 6.0% on AiShell-1 and 3.4% on Librispeech-960, comparable to conventional ANN transformers, while reducing theoretical inference energy consumption by 4.64× and 4.32×, respectively. IML-Spikeformer marks an advance in scalable SNN architectures for large-scale speech processing in both task performance and energy efficiency.

在生物神经机制的启发下,Spiking神经网络(SNNS)是一个充满希望的神经形态计算模式,它为传统的人工神经网络(ANNS)提供了节能替代品。尽管已经证明是有效的,但SNN结构在大规模语音处理任务上很难达到竞争性业绩。两大挑战阻碍了进展:(1) 多时间步尖刺击射击在培训过程中造成的高计算间接费用,以及(2) 缺乏适合语音处理任务的大型SNNS结构。为了克服问题,我们引入了投入-觉醒多级Spikenter(REPSA)模块,即IML-Spikeuter(IM-Spikeore),这是一个专门为大规模语音处理而设计的高能效的理论式变换结构。对于我们设计的中心是投入-觉悟多级斯派(IMLS)机制,它利用一个适应性、有投入意识的门槛的启动器,在一次性时间步调中,同时将重新校正的SVA(REPSA) 模块,在高层次的IMSAS(H-SIMSAS) 大幅降低货币变压的货币变压模型处理中,在高的SIMLIMSIMS-S-SIMSIMSLSLSLSLS(SIMSLSLSLSAS) 上建立一个高的快速的快速的快速的模型和多级变压压压压压压压压压压压压压压的模型。


Article 143

Title@2025-07-10 (4): Learning Collective Variables from Time-lagged Generation

Title: Learning Collective Variables from Time-lagged Generation Kollektive Variablen aus der zeitverzögerten Generation lernen 时间滞后一代的学习集体变量 2507.07390v1

Authors (5): Seonghyun Park, Kiyoung Seong, Soojung Yang, Rafael Gómez-Bombarelli, Sungsoo Ahn

Rare events such as state transitions are difficult to observe directly with molecular dynamics simulations due to long timescales. Enhanced sampling techniques overcome this by introducing biases along carefully chosen low-dimensional features, known as collective variables (CVs), which capture the slow degrees of freedom. Machine learning approaches (MLCVs) have automated CV discovery, but existing methods typically focus on discriminating meta-stable states without fully encoding the detailed dynamics essential for accurate sampling. We propose TLC, a framework that learns CVs directly from time-lagged conditions of a generative model. Instead of modeling the static Boltzmann distribution, TLC models a time-lagged conditional distribution yielding CVs to capture the slow dynamic behavior. We validate TLC on the Alanine Dipeptide system using two CV-based enhanced sampling tasks: (i) steered molecular dynamics (SMD) and (ii) on-the-fly probability enhanced sampling (OPES), demonstrating equal or superior performance compared to existing MLCV methods in both transition path sampling and state discrimination.

由于时间尺度长,很难直接用分子动态模拟来观察诸如国家转型等罕见事件。强化的取样技术克服了这一点,引入了对精心选择的低维特征的偏差,即所谓的集体变量(CVs),以捕捉自由的慢度。机器学习方法(MLCVs)自动发现CV,但现有方法通常侧重于歧视元稳定状态,而不完全将精确取样所必需的详细动态编码。我们建议TLC,这是一个直接从一个基因模型的滞后条件下学习CV的框架。TLC不是模拟静态的波尔兹曼分布,而是模拟一个时间滞后的有条件分布,产生慢动态行为的CVs。我们用基于CV的两种强化采样任务验证了Alanine Dipeptide系统的TLC:(一) 引导分子动态(SMD) 和(二) 直流概率增强采样(OPES) ,在过渡路径采样和州歧视中显示与现有的刚果解放运动V方法相同或优异性。


Article 144

Title@2025-07-10 (4): ST-GRIT: Spatio-Temporal Graph Transformer For Internal Ice Layer Thickness Prediction

Title: ST-GRIT: Spatio-Temporal Graph Transformer For Internal Ice Layer Thickness Prediction ST-GRIT: Spatio-Temporal Graph Transformer für interne Eisschichtdicke Vorhersage ST-GRIT: 内部冰层厚度预测的时空图变异器 2507.07389v1

Authors (2): Zesheng Liu, Maryam Rahnemoonfar

Understanding the thickness and variability of internal ice layers in radar imagery is crucial for monitoring snow accumulation, assessing ice dynamics, and reducing uncertainties in climate models. Radar sensors, capable of penetrating ice, provide detailed radargram images of these internal layers. In this work, we present ST-GRIT, a spatio-temporal graph transformer for ice layer thickness, designed to process these radargrams and capture the spatiotemporal relationships between shallow and deep ice layers. ST-GRIT leverages an inductive geometric graph learning framework to extract local spatial features as feature embeddings and employs a series of temporal and spatial attention blocks separately to model long-range dependencies effectively in both dimensions. Experimental evaluation on radargram data from the Greenland ice sheet demonstrates that ST-GRIT consistently outperforms current state-of-the-art methods and other baseline graph neural networks by achieving lower root mean-squared error. These results highlight the advantages of self-attention mechanisms on graphs over pure graph neural networks, including the ability to handle noise, avoid oversmoothing, and capture long-range dependencies. Moreover, the use of separate spatial and temporal attention blocks allows for distinct and robust learning of spatial relationships and temporal patterns, providing a more comprehensive and effective approach.

雷达图像中内部冰层的厚度和变异性对于监测积雪、评估冰动态和减少气候模型的不确定性至关重要。雷达传感器能够穿透冰层,提供这些内部层的详细雷达图象。在这项工作中,我们提供ST-GRIT,这是一个用于冰层厚度的时空平面图变压器,旨在处理这些雷达图和捕捉浅层和深层冰层之间的短暂时空关系。ST-GRIT利用一个感应式几何图形学习框架,提取当地空间特征,作为嵌入特征,并使用一系列时间和空间关注区块,在两个层面都能够有效地分离出模型的远距离依赖性。对格陵兰冰层雷达图数据的实验性评估表明,ST-GRIT通过实现较低的底部平均误差,不断超越当前状态的方法和其他基线图形神经网络。这些结果突出表明,在纯图形神经网络上的图形中,包括处理噪音、避免过度移动和捕捉取长距离依赖性数据的能力,从而能够使用不同的空间和空间和空间依赖性关系,从而能够使用不同的空间和空间和空间分离式的系统。


Article 145

Title@2025-07-10 (4): GRIT: Graph Transformer For Internal Ice Layer Thickness Prediction

Title: GRIT: Graph Transformer For Internal Ice Layer Thickness Prediction GRIT: Graph Transformer für interne Eisschichtdicke Vorhersage GRIT: 内部冰层厚度预测的图形变形器 2507.07388v1

Authors (2): Zesheng Liu, Maryam Rahnemoonfar

Gaining a deeper understanding of the thickness and variability of internal ice layers in radar imagery is essential for monitoring snow accumulation, better evaluating ice dynamics processes, and minimizing uncertainties in climate models. Radar sensors, capable of penetrating ice, capture detailed radargram images of internal ice layers. In this work, we introduce GRIT, a graph transformer for ice layer thickness. GRIT integrates an inductive geometric graph learning framework with an attention mechanism, designed to map the relationships between shallow and deeper ice layers. Compared to baseline graph neural networks, GRIT demonstrates consistently lower prediction errors. These results highlight the attention mechanism’s effectiveness in capturing temporal changes across ice layers, while the graph transformer combines the strengths of transformers for learning long-range dependencies with graph neural networks for capturing spatial patterns, enabling robust modeling of complex spatiotemporal dynamics.

更深入地了解雷达图像中内部冰层的厚度和多变性,对于监测积雪、更好地评估冰动态过程和尽量减少气候模型中的不确定性至关重要。雷达传感器能够穿透冰层,捕捉内部冰层的详细雷达图象。在这项工作中,我们引入了冰层厚度的图形变异器GRIT。GRIT结合了一个带注意机制的感应几何图学习框架,目的是绘制浅层和深层冰层之间的关系。与基线图形神经网络相比,GRIT显示的预测错误一直较低。这些结果突显了关注机制在捕捉冰层间时间变化方面的有效性,而图形变异器则将变异器的长程依赖力与图形神经网络结合起来,以捕捉空间模式,从而能够对复杂的波时动态进行稳健的建模。


Article 146

Title@2025-07-10 (4): HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning

Title: HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning HeLo: Heterogene Multi-Modal Fusion mit Labelkorrelation für Emotion Distribution Learning HeLo:情感分布学习中带有标签关联的异变多模式融合 2507.06821v2

Authors (5): Chuhang Zheng, Chunwei Tian, Jie Wen, Daoqiang Zhang, Qi Zhu

Multi-modal emotion recognition has garnered increasing attention in recent years because of its significant role in human-computer interaction (HCI). Since different discrete emotions may exist at the same time, emotion distribution learning (EDL), which identifies a mixture of basic emotions, has gradually emerged as a trend in place of single-class emotion recognition. However, existing EDL methods face challenges in mining the heterogeneity among multiple modalities. Moreover, rich semantic correlations across arbitrary basic emotions are not fully exploited. In this paper, we propose a multi-modal emotion distribution learning framework, named HeLo, aimed at fully exploring the heterogeneity and complementary information in multi-modal emotional data and label correlation within mixed basic emotions. Specifically, we first adopt cross-attention to effectively fuse the physiological data. Then, an optimal transport (OT)-based heterogeneity mining module is devised to mine the interaction and heterogeneity between the physiological and behavioral representations. To facilitate label correlation learning, we introduce a learnable label embedding optimized by correlation matrix alignment. Finally, the learnable label embeddings and label correlation matrices are integrated with the multi-modal representations through a novel label correlation-driven cross-attention mechanism for accurate emotion distribution learning. Experimental results on two publicly available datasets demonstrate the superiority of our proposed method in emotion distribution learning.

近几年来,多模式情感的认知在人体-计算机互动(HCI)中起着重要作用,因而日益引起人们的注意。由于不同的离散情感可能同时存在,与单级情感情感识别相比,确定基本情感混合的情感分布学习(EDL)逐渐成为一种趋势。然而,现有的EDL方法在挖掘多种模式的异质性方面面临着挑战。此外,任意基本情感之间丰富的语义相关性没有得到充分利用。在本文中,我们提出一个多模式情感分布学习框架,名为HeLo,目的是在混合基本情感中充分探索多模式情感数据和标签相关性关系中的异质性和互补信息。具体地说,我们首先采用交叉意图将生理数据有效地结合在一起。然后,设计一个最佳运输(OT)基于异质的采矿模块,以挖掘生理和行为表现之间的互动和异质性。为了便利标签相关性学习,我们提出了一种可学习的标签,通过相关矩阵协调优化嵌入。最后,在可学习的标签嵌入式情感和标签感错交式关系矩阵中,将一个可理解的贴贴式的情感分布与现有两套式的情感感。通过新的标签式模式学习模式学习。


Article 147

Title@2025-07-10 (4): Online Continual Learning via Spiking Neural Networks with Sleep Enhanced Latent Replay

Title: Online Continual Learning via Spiking Neural Networks with Sleep Enhanced Latent Replay Online Continual Learning über Spiking Neuronal Networks mit Schlaf Enhanced Latent Replay 通过Spiking神经网络在线持续学习,并配有睡眠强化前端重播 2507.02901v2

Authors (5): Erliang Lin, Wenbin Luo, Wei Jia, Yu Chen, Shaofu Yang

Edge computing scenarios necessitate the development of hardware-efficient online continual learning algorithms that adapt to dynamic environments. However, existing algorithms always suffer from high memory overhead and bias towards recently trained tasks. To tackle these issues, this paper proposes a novel online continual learning approach termed SESLR, which incorporates a sleep enhanced latent replay scheme with spiking neural networks (SNNs). SESLR leverages SNNs’ binary spike characteristics to store replay features in single bits, significantly reducing memory overhead. Furthermore, inspired by biological sleep-wake cycles, SESLR introduces a noise-enhanced sleep phase where the model exclusively trains on replay samples with controlled noise injection, effectively mitigating classification bias towards new classes. Extensive experiments on both conventional (MNIST, CIFAR10) and neuromorphic (NMNIST, CIFAR10-DVS) datasets demonstrate SESLR’s effectiveness. On Split CIFAR10, SESLR achieves nearly 30% improvement in average accuracy with only one-third of the memory consumption compared to baseline methods. On Split CIFAR10-DVS, it improves accuracy by approximately 10% while reducing memory overhead by a factor of 32. These results validate SESLR as a promising solution for online continual learning in resource-constrained edge computing scenarios.

电磁计算假设方案要求开发硬件高效在线持续学习算法,使之适应动态环境。然而,现有的算法总是有高记忆管理,偏向于最近培训的任务。为了解决这些问题,本文件建议采用名为SESLR的新颖的在线持续学习方法,即SESLR(SESLR),其中包括一个睡眠增强潜伏重放计划,与飞动神经网络(SNNIS)结合。SESLR(SESLR)利用SNNS的二进制峰顶特性将功能存储在单位的重播功能中,显著减少记忆管理。此外,在生物睡眠周期的启发下,SESLR(SESLR)引入了一个强化的睡眠阶段,在这个阶段,模型专门用受控的噪音注入来重新播放样本,有效地减轻对新班的分类偏向。关于常规(MIST,CIFAR10)和神经形态(NMNIST,CIFAR10-DVS(N)数据集的广泛实验显示了SES(NM)的效能。在S-RED(S-ralental reslical ial reslical)的模型中,通过不断学习S-rial-reslical 的S-resligal Expral 的S-ral 的模型,通过10)的S-rippilveal 的模型,提高S-resimal 的图像校验测算结果。
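Storing binary spike features in single bits, as this abstract describes, can be sketched with plain bit packing. The helper names `pack_spikes` and `unpack_spikes` and the MSB-first layout are assumptions for illustration; the abstract does not describe the actual storage format.

```python
def pack_spikes(spikes):
    """Pack a list of 0/1 spikes into bytes, 8 spikes per byte (MSB first)."""
    packed = bytearray((len(spikes) + 7) // 8)
    for idx, s in enumerate(spikes):
        if s:
            packed[idx // 8] |= 0x80 >> (idx % 8)
    return bytes(packed)

def unpack_spikes(packed, length):
    """Recover the first `length` spikes from a packed buffer."""
    return [(packed[idx // 8] >> (7 - idx % 8)) & 1 for idx in range(length)]
```

Packing uses one bit per spike, whereas a 32-bit float per feature uses 32 bits, so 64 spikes fit in 8 bytes instead of 256.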


Article 148

Title@2025-07-10 (4): Unifews: You Need Fewer Operations for Efficient Graph Neural Networks

Title: Unifews: You Need Fewer Operations for Efficient Graph Neural Networks Unifews: Sie brauchen weniger Operationen für effiziente Graphen-Neural-Netzwerke Unifews: 高效图形神经网络需要更少操作 2403.13268v2

Authors (4): Ningyi Liao, Zihao Yu, Ruixiao Zeng, Siqiang Luo

Graph Neural Networks (GNNs) have shown promising performance, but at the cost of resource-intensive operations on graph-scale matrices. To reduce computational overhead, previous studies attempt to sparsify the graph or network parameters, but with limited flexibility and precision boundaries. In this work, we propose Unifews, a joint sparsification technique to unify graph and weight matrix operations and enhance GNN learning efficiency. The Unifews design enables adaptive compression across GNN layers with progressively increased sparsity, and is applicable to a variety of architectures with on-the-fly simplification. Theoretically, we establish a novel framework to characterize sparsified GNN learning in view of the graph optimization process, showing that Unifews effectively approximates the learning objective with bounded error and reduced computational overhead. Extensive experiments demonstrate that Unifews achieves efficiency improvements with comparable or better accuracy, including 10-20x matrix operation reduction and up to 100x acceleration on graphs up to billion-edge scale.

神经网络(GNNs)显示了有希望的业绩,但以图表尺度矩阵的资源密集型操作成本为代价。为了减少计算间接费用,以往的研究试图在图表或网络参数上加宽,但灵活性和精确度有限。在这项工作中,我们建议采用Unifews(Unifews)联合封闭技术,以统一图形和重量矩阵操作,提高GNN学习效率。Unifews设计使GNNT层的适应性压缩逐渐增加,并适用于各种在飞行上简化的建筑。理论上,我们建立了一个新的框架,为图形优化过程中的封闭式GNN学习定性,表明Unifews(Unifews)有效地接近学习目标,误差和减少计算间接费用。广泛的实验表明,Unifews(Unifews)以类似或更精确的方式实现了效率的提高,包括10-20x矩阵操作的减少和在10-20-20亿顶尖端的图表上达到100x加速度。
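
To make the idea of jointly sparsifying graph and weight operations more concrete, here is a minimal sketch that zeroes small entries of both a propagation matrix and a weight matrix, then counts the multiply-adds that remain. It uses plain magnitude thresholding as a stand-in; the thresholds, the matrices, and the helpers `prune` and `count_ops` are illustrative assumptions, not the criterion or code used by Unifews.

```python
def prune(matrix, threshold):
    """Zero out entries whose magnitude is below the threshold."""
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in matrix]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def count_ops(a, b):
    """Count multiply-adds in a @ b where both operands are non-zero."""
    return sum(1 for i in range(len(a)) for k in range(len(b))
               for j in range(len(b[0])) if a[i][k] != 0 and b[k][j] != 0)


# Hypothetical normalized adjacency, node features, and layer weights.
A = [[0.5, 0.45, 0.05], [0.45, 0.5, 0.05], [0.05, 0.05, 0.9]]
H = [[1.0, 2.0], [0.5, 1.5], [3.0, 0.1]]
W = [[1.0, 0.01], [-0.02, 0.8]]

A_sparse = prune(A, 0.1)    # graph sparsification
W_sparse = prune(W, 0.05)   # weight sparsification

dense_ops = count_ops(A, H) + count_ops(matmul(A, H), W)
sparse_ops = count_ops(A_sparse, H) + count_ops(matmul(A_sparse, H), W_sparse)
```

In this toy layer the operation count drops from 30 to 16; the abstract's reported reductions come from the authors' own adaptive, layer-wise scheme rather than from fixed thresholds like these.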


Article 149

Title@2025-07-10 (4): User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection

Title: User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection Benutzerbasierte sequentielle Modellierung mit Transformer-Encodern für Insider Threat Detection 以用户为基础的序列模型,使用变换器编码器进行内部威胁探测 2506.23446v2

Authors (2): Mohamed Elbasheer, Adewale Akinfaderin

Insider threat detection presents unique challenges due to the authorized status of malicious actors and the subtlety of anomalous behaviors. Existing machine learning methods often treat user activity as isolated events, thereby failing to leverage sequential dependencies in user behavior. In this study, we propose a User-Based Sequencing (UBS) methodology, transforming the CERT insider threat dataset into structured temporal sequences suitable for deep sequential modeling. We deploy a Transformer Encoder architecture to model benign user activity and employ its reconstruction errors as anomaly scores. These scores are subsequently evaluated using three unsupervised outlier detection algorithms: One-Class SVM (OCSVM), Local Outlier Factor (LOF), and Isolation Forest (iForest). Across four rigorously designed test sets, including combinations of multiple CERT dataset releases, our UBS-Transformer pipeline consistently achieves state-of-the-art performance - notably 96.61% accuracy, 99.43% recall, 96.38% F1-score, 95.00% AUROC, and exceptionally low false negative (0.0057) and false positive (0.0571) rates. Comparative analyses demonstrate that our approach substantially outperforms tabular and conventional autoencoder baselines, underscoring the efficacy of sequential user modeling and advanced anomaly detection in the insider threat domain.

内部威胁检测由于恶意行为者的核准地位和异常行为的微妙性而提出了独特的挑战。现有的机器学习方法往往将用户活动视为孤立事件,从而无法在用户行为中利用相继的依附关系。在本研究中,我们提出了基于用户的分类方法(UBS ) , 将 CERT 内部威胁数据集转换成结构化的时间序列, 适合深层相继建模。 我们为模拟无害用户活动配置了一个变换器编码器结构,并将其重建错误用作异常分数。这些分数随后使用三种未经监督的外部检测算法进行评估:一lass SVM(OCSVM)、地方外因(LOF)和Isolaration Forest(iest ) 。 在四个严格设计的测试组中,包括多个 CERT 数据集发布组合,我们的UBS- Transformormormation管道始终达到最新性能 — 特别是96.61%的准确性能、99.43%的回顾、96-38%的F1核心、9.00 % AUROC,以及非常低的负性负(0.0057)和假正面的系统内部威胁检测率的系统内部的系统测试。


Article 150

Title@2025-07-10 (4): BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Title: BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems BountyBench: Dollar-Impact von KI-Agenten-Angriffen und Verteidigern auf reale Cybersicherheitssysteme BuntyBuntyBunnench: AI代理攻击者和捍卫者对现实世界网络安全系统的美元影响 2505.15216v2

Authors (34): Andy K. Zhang, Joey Ji, Celeste Menders, Riya Dulepet, Thomas Qin, Ron Y. Wang, Junrong Wu, Kyleen Liao, Jiliang Li, Jinghan Hu, Sara Hong, Nardos Demilew, Shivatmica Murgai, Jason Tran, Nishka Kacheria, Ethan Ho, Denis Liu, Lauren McLane, Olivia Bruvik, Dai-Rong Han, Seungwoo Kim, Akhil Vyas, Cuiyuanxiu Chen, Ryan Li, Weiran Xu, Jonathan Z. Ye, Prerit Choudhary, Siddharth M. Bhatia, Vikram Sivashankar, Yuxuan Bao, Dawn Song, Dan Boneh, Daniel E. Ho, Percy Liang

AI agents have the potential to significantly alter the cybersecurity landscape. Here, we introduce the first framework to capture offensive and defensive cyber-capabilities in evolving real-world systems. Instantiating this framework with BountyBench, we set up 25 systems with complex, real-world codebases. To capture the vulnerability lifecycle, we define three task types: Detect (detecting a new vulnerability), Exploit (exploiting a specific vulnerability), and Patch (patching a specific vulnerability). For Detect, we construct a new success indicator, which is general across vulnerability types and provides localized evaluation. We manually set up the environment for each system, including installing packages, setting up server(s), and hydrating database(s). We add 40 bug bounties, which are vulnerabilities with monetary awards of $10-$30,485, covering 9 of the OWASP Top 10 Risks. To modulate task difficulty, we devise a new strategy based on information to guide detection, interpolating from identifying a zero day to exploiting a specific vulnerability. We evaluate 8 agents: Claude Code, OpenAI Codex CLI with o3-high and o4-mini, and custom agents with o3-high, GPT-4.1, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet Thinking, and DeepSeek-R1. Given up to three attempts, the top-performing agents are OpenAI Codex CLI: o3-high (12.5% on Detect, mapping to $3,720; 90% on Patch, mapping to $14,152), Custom Agent with Claude 3.7 Sonnet Thinking (67.5% on Exploit), and OpenAI Codex CLI: o4-mini (90% on Patch, mapping to $14,422). OpenAI Codex CLI: o3-high, OpenAI Codex CLI: o4-mini, and Claude Code are more capable at defense, achieving higher Patch scores of 90%, 90%, and 87.5%, compared to Exploit scores of 47.5%, 32.5%, and 57.5% respectively; while the custom agents are relatively balanced between offense and defense, achieving Exploit scores of 37.5-67.5% and Patch scores of 35-60%.

AI 代理商具有显著改变网络安全景观的潜力。 在这里, 我们引入了第一个框架, 以在不断演变的现实世界系统中捕捉攻击性和防御性网络安全性。 我们通过 BountyBench 来维护这个框架。 我们建立了25个系统, 拥有复杂的、 真实的代码库。 为了捕捉脆弱性生命周期, 我们定义了三种任务类型 : 检测( 发现新的脆弱性 ) , 开发( 开发特定的脆弱性 ) 和 Patch( 绘制特定的脆弱性 )。 为了检测, 我们建立了一个新的成功指标, 跨脆弱类型, 提供局部评估。 我们手动为每个系统设置了环境, 包括安装软件包, 安装服务器, 以及水晶体数据库数据库。 我们增加了40个错误元, 这是货币奖项的10 - 30, 485, 覆盖 OWASP Top 10 风险的9 。 为了调节任务难度, 我们设计了一个基于信息的新的战略, 指导检测, 确定一个零天, 用于特定的脆弱性 。 我们评估8个代理商 : C 代码, OS 代码, 代码 代码 代码 代码, C- 代码 代码 和 C- PTLI 3, IM- hil4, 服务器, 运行 的 O.
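
As a quick sanity check on the reported percentages, the sketch below converts a few of the success rates from the abstract into task counts, assuming (this is an assumption, not stated explicitly) that each rate is measured over the 40 bounties. The dictionary keys are shorthand labels chosen here for illustration.

```python
from fractions import Fraction

NUM_BOUNTIES = 40  # number of bug bounties stated in the abstract

# Success rates in percent, copied from the abstract.
reported_rates = {
    "Codex CLI o3-high, Detect": "12.5",
    "Codex CLI o3-high, Patch": "90",
    "Custom Claude 3.7 Sonnet Thinking, Exploit": "67.5",
    "Claude Code, Patch": "87.5",
}

# Exact arithmetic avoids floating-point rounding in the counts.
solved = {name: Fraction(rate) / 100 * NUM_BOUNTIES
          for name, rate in reported_rates.items()}

for name, count in solved.items():
    print(f"{name}: {count} of {NUM_BOUNTIES}")
```

Every rate maps to a whole number of tasks (5, 36, 27, and 35), which is consistent with the 40-bounty denominator assumed above.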


Article 151

Title@2025-07-10 (4): A Multi-Granularity Supervised Contrastive Framework for Remaining Useful Life Prediction of Aero-engines

Title: A Multi-Granularity Supervised Contrastive Framework for Remaining Useful Life Prediction of Aero-engines Ein Multi-Granularität überwacht Kontrastive Rahmen für das Bleiben nützlicher Lebensvorhersage von Aero-Motoren 空气-发动机剩余使用寿命预测多族监督多族监督违规框架 2411.00461v3

Authors (6): Zixuan He, Ziqian Kong, Zhengyu Chen, Yuling Zhan, Zijun Que, Zhengguo Xu

Accurate remaining useful life (RUL) predictions are critical to the safe operation of aero-engines. Currently, the RUL prediction task is mainly treated as a regression problem with mean square error as the only loss function, and research on feature space structure is lacking, even though the latter has shown excellent performance in a large number of studies. This paper develops a multi-granularity supervised contrastive (MGSC) framework from the plain intuition that samples with the same RUL label should be aligned in the feature space, and addresses the problems of excessively large minibatch sizes and unbalanced samples in the implementation. RUL prediction with MGSC is implemented using the proposed multi-phase training strategy. This paper also demonstrates a simple and scalable basic network structure and validates the proposed MGSC strategy on the C-MAPSS dataset using a convolutional long short-term memory network as a baseline, which effectively improves the accuracy of RUL prediction.

准确的剩余使用寿命(RUL)预测对于气动发动机的安全运行至关重要,目前,RUL预测任务主要是一个回归模式,仅以损失功能为平均方差,缺乏对地物空间结构的研究,后者在大量研究中表现出优异的性能。本文从普通直觉中发展出一个多角度监督的对比框架,即具有同一RUL标签的样品应在特性空间中进行校准,并解决执行中超大小批尺寸和不平衡样品的问题。与MGSC的RUL预测是利用拟议的多阶段培训战略进行的。本文还展示了一个简单和可扩缩的基本网络结构,并验证了CMPASS数据集的MGSC拟议战略,将一个长长长的短期记忆网络作为基线,有效地提高RUL预测的准确性。
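
The intuition that samples with the same RUL label should be aligned in feature space can be sketched with a generic supervised contrastive loss, in which every other sample sharing the anchor's label counts as a positive. This is the standard single-granularity form, not the authors' multi-granularity variant; the function name `supcon_loss`, the temperature value, and the toy features are assumptions for illustration.

```python
import math


def normalize(v):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive."""
    z = [normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy 2-D features; labels are hypothetical RUL values in cycles.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(features, [100, 100, 50, 50])
misaligned = supcon_loss(features, [100, 50, 100, 50])
```

The loss is small when same-label samples already sit close together (`aligned`) and large when they do not (`misaligned`). Anchors without any positive are skipped, which hints at why minibatch composition matters for this kind of objective.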


Article 152

Title@2025-07-10 (4): Bradley-Terry and Multi-Objective Reward Modeling Are Complementary

Title: Bradley-Terry and Multi-Objective Reward Modeling Are Complementary Bradley-Terry und Multi-Objective Reward Modeling sind komplementär Bradley-Terriy和多目标奖励模型具有补充作用 2507.07375v1

Authors (13): Zhiwei Zhang, Hui Liu, Xiaomin Li, Zhenwei Dai, Jingying Zeng, Fali Wang, Minhua Lin, Ramraj Chandradevan, Zhen Li, Chen Luo, Xianfeng Tang, Qi He, Suhang Wang

Reward models trained on human preference data have demonstrated strong effectiveness in aligning Large Language Models (LLMs) with human intent under the framework of Reinforcement Learning from Human Feedback (RLHF). However, RLHF remains vulnerable to reward hacking, where the policy exploits imperfections in the reward function rather than genuinely learning the intended behavior. Although significant efforts have been made to mitigate reward hacking, they predominantly focus on and evaluate in-distribution scenarios, where the training and testing data for the reward model share the same distribution. In this paper, we empirically show that state-of-the-art methods struggle in more challenging out-of-distribution (OOD) settings. We further demonstrate that incorporating fine-grained multi-attribute scores helps address this challenge. However, the limited availability of high-quality data often leads to weak performance of multi-objective reward functions, which can negatively impact overall performance and become the bottleneck. To address this issue, we propose a unified reward modeling framework that jointly trains Bradley–Terry (BT) single-objective and multi-objective regression-based reward functions using a shared embedding space. We theoretically establish a connection between the BT loss and the regression objective and highlight their complementary benefits. Specifically, the regression task enhances the single-objective reward function’s ability to mitigate reward hacking in challenging OOD settings, while BT-based training improves the scoring capability of the multi-objective reward function, enabling a 7B model to outperform a 70B baseline. Extensive experimental results demonstrate that our framework significantly improves both the robustness and the scoring performance of reward models.

人类偏好数据培训的奖赏模型在将大语言模型(LLMS)与人的意图结合到 “ 从人类反馈学习强化 “ (RLHF)的框架内显示出了强大的效果。然而,RLHF仍然易受黑客奖赏的伤害,因为政策利用的是奖赏功能中的不完善,而不是真正学习预期行为。虽然为减少奖赏黑客而作出了巨大努力,但它们主要侧重于和评价分配情景,奖赏模型的培训和测试数据分布相同。在本文中,我们从经验上表明,在更具挑战性的分配外(OOOD)设置中,最先进的方法在更具有挑战性地挣扎。我们进一步表明,纳入精细的多属性分数有助于应对这一挑战。然而,高质量数据的提供往往导致多目标奖赏功能表现不佳,这可能会对总体业绩产生负面影响,并成为瓶颈。为了解决这一问题,我们提议了一个统一的奖赏模型框架,以联合培训基于布拉德-T(BT)的单一目标和多目标的退缩奖赏模型。我们进一步表明,采用共同的嵌式多级评分,我们进一步的多级评分评分成绩框架有助于应对这一挑战。我们共同的B级的递增的递反的轨道的成绩,同时提升的成绩,同时提升的B级的轨道功能将提升的轨道功能将提升的B的轨绩的成绩框架将提升的B的轨道上的递增税益性能提升的轨道的成绩,同时提升的轨损功能将提升的轨道的轨道的轨道的轨道的成绩,同时提升了B的功能将提升了B的轨道上的功能将提升到更能,同时提升了B的稳定性的轨道的轨道,提升了B的轨道上变。
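
A minimal sketch of the joint objective described above: a Bradley-Terry loss on a scalar reward head and a regression loss on a multi-attribute head, both reading the same embedding. The linear heads, the weighting factor `lam`, and all names below are assumptions made for illustration; the paper's actual architecture and loss weighting may differ.

```python
import math


def dot(w, e):
    """Inner product of a weight vector and an embedding."""
    return sum(a * b for a, b in zip(w, e))


def bt_loss(r_chosen, r_rejected):
    """Bradley-Terry negative log-likelihood: -log sigmoid(r_chosen - r_rejected)."""
    return math.log1p(math.exp(-(r_chosen - r_rejected)))


def regression_loss(pred, target):
    """Mean squared error over attribute scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def joint_loss(emb_chosen, emb_rejected, w_bt, w_attr, attr_chosen, lam=1.0):
    """BT loss on the single-objective head plus weighted attribute regression."""
    r_c = dot(w_bt, emb_chosen)
    r_r = dot(w_bt, emb_rejected)
    attr_pred = [dot(row, emb_chosen) for row in w_attr]
    return bt_loss(r_c, r_r) + lam * regression_loss(attr_pred, attr_chosen)


# Hypothetical shared embeddings and head weights.
loss = joint_loss(
    emb_chosen=[1.0, 0.0],
    emb_rejected=[0.0, 1.0],
    w_bt=[1.0, -1.0],
    w_attr=[[1.0, 0.0], [0.0, 1.0]],
    attr_chosen=[1.0, 0.0],
)
```

Because both heads share the embedding, gradients from either term would shape the same representation, which is the mechanism the abstract credits for the complementary benefits.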


Article 153

Title@2025-07-10 (4): Prediction-Augmented Mechanism Design for Weighted Facility Location

Title: Prediction-Augmented Mechanism Design for Weighted Facility Location Voraussichtlicher Mechanismus für den Standort der gewichteten Fazilität 加权设施位置设计 2507.06509v2

Authors (2): Yangguang Shi, Zhenyu Xue

Facility location is fundamental in operations research, mechanism design, and algorithmic game theory, with applications ranging from urban infrastructure planning to distributed systems. Recent research in this area has focused on augmenting classic strategyproof mechanisms with predictions to achieve improved performance guarantees under the uncertainty of strategic environments. Previous work has addressed the trade-off between consistency (near-optimality under accurate predictions) and robustness (bounded inefficiency under poor predictions) primarily in the unweighted setting, which assumes that all agents have the same importance. However, this assumption may not hold in some practical scenarios, which motivates research on weighted facility location problems. The major contribution of the current work is a prediction-augmented algorithmic framework for balancing consistency and robustness over strategic agents with non-uniform weights. In particular, through a reduction technique that identifies a subset of representative instances and maps the other given locations to the representative ones, we prove that there exists a strategyproof mechanism achieving a bounded consistency guarantee of $\frac{\sqrt{(1+c)^2 W_{\min}^2 + (1-c)^2 W_{\max}^2}}{(1+c) W_{\min}}$ and a bounded robustness guarantee of $\frac{\sqrt{(1-c)^2 W_{\min}^2 + (1+c)^2 W_{\max}^2}}{(1-c) W_{\min}}$ in weighted settings, where $c$ can be viewed as a parameter that trades off consistency against robustness, and $W_{\min}$ and $W_{\max}$ denote the minimum and maximum agent weights. We also prove that no strategyproof deterministic mechanism reaches $1$-consistency and $O\left( n \cdot \frac{W_{\max}}{W_{\min}} \right)$-robustness in the weighted facility location problem (FLP), even with full predictions of all agents.

设施位置是操作研究、 机制设计和算法游戏理论的基础, 其应用范围从城市基础设施规划到分布式系统。 最近对这一领域的研究侧重于增加典型的防战略机制, 并进行预测, 以便在战略环境中实现更好的绩效保障。 以前的工作致力于解决平衡一致性( 准确预测下的接近最佳性) 和稳健性( 低预测下的低效率) 的权衡障碍, 主要是在未加权环境下, 假设所有代理商都具有同等重要性 。 然而, 在某些实际情景中, 这一假设可能并不真实, 导致对加权设施位置问题的研究。 目前工作的主要贡献是提供预测增强的算法框架, 以非统一重量来平衡战略代理的连贯性和稳健性。 特别是, 通过一种归约技术( 识别出一组代表性实例, 并将其他给定位置映射到这些代表性实例) , 我们证明在加权设定下存在一种防策略机制, 其一致性保证有界于 $\frac{\sqrt{(1+c)^2 W_{\min}^2 + (1-c)^2 W_{\max}^2}}{(1+c) W_{\min}}$ , 稳健性保证有界于 $\frac{\sqrt{(1-c)^2 W_{\min}^2 + (1+c)^2 W_{\max}^2}}{(1-c) W_{\min}}$ , 其中 $c$ 可视为在一致性与稳健性之间进行权衡的参数, $W_{\min}$ 和 $W_{\max}$ 分别表示代理权重的最小值和最大值。 我们还证明, 在加权设施位置问题中, 即使对所有代理都有完整预测, 也不存在同时达到 $1$-一致性和 $O\left( n \cdot \frac{W_{\max}}{W_{\min}} \right)$-稳健性的确定性防策略机制。
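
The two guarantees stated in the abstract can be evaluated numerically to see how the parameter $c$ trades consistency against robustness. The sketch below reads the garbled exponents as $W_{\min}^2$ and $W_{\max}^2$ and assumes $0 \le c < 1$ so that both denominators are positive; the function names are hypothetical.

```python
import math


def consistency_bound(c, w_min, w_max):
    """sqrt((1+c)^2 W_min^2 + (1-c)^2 W_max^2) / ((1+c) W_min)."""
    return math.sqrt((1 + c) ** 2 * w_min ** 2
                     + (1 - c) ** 2 * w_max ** 2) / ((1 + c) * w_min)


def robustness_bound(c, w_min, w_max):
    """sqrt((1-c)^2 W_min^2 + (1+c)^2 W_max^2) / ((1-c) W_min)."""
    return math.sqrt((1 - c) ** 2 * w_min ** 2
                     + (1 + c) ** 2 * w_max ** 2) / ((1 - c) * w_min)


# Sweep the trade-off parameter for hypothetical weights W_min = 1, W_max = 2.
for c in (0.0, 0.25, 0.5, 0.75):
    print(c, round(consistency_bound(c, 1.0, 2.0), 3),
          round(robustness_bound(c, 1.0, 2.0), 3))
```

With equal weights and $c = 0$ both expressions reduce to $\sqrt{2}$. Increasing $c$ lowers the consistency bound and raises the robustness bound, which is the trade-off the abstract describes.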


Article 154

Title@2025-07-10 (4): Atherosclerosis through Hierarchical Explainable Neural Network Analysis

Title: Atherosclerosis through Hierarchical Explainable Neural Network Analysis Atherosklerose durch hierarchische erklärende neurale Netzwerkanalyse 通过可解释的神经网络分析,通过高层次解析神经网络分析,实现天体硬化 2507.07373v1

Authors (10): Irsyad Adam, Steven Swee, Erika Yilin, Ethan Ji, William Speier, Dean Wang, Alex Bui, Wei Wang, Karol Watson, Peipei Ping

In this work, we study the problem pertaining to personalized classification of subclinical atherosclerosis by developing a hierarchical graph neural network framework to leverage two characteristic modalities of a patient: clinical features within the context of the cohort, and molecular data unique to individual patients. Current graph-based methods for disease classification detect patient-specific molecular fingerprints, but lack consistency and comprehension regarding cohort-wide features, which are an essential requirement for understanding pathogenic phenotypes across diverse atherosclerotic trajectories. Furthermore, understanding patient subtypes often considers clinical feature similarity in isolation, without integration of shared pathogenic interdependencies among patients. To address these challenges, we introduce ATHENA: Atherosclerosis Through Hierarchical Explainable Neural Network Analysis, which constructs a novel hierarchical network representation through integrated modality learning; subsequently, it optimizes learned patient-specific molecular fingerprints that reflect individual omics data, enforcing consistency with cohort-wide patterns. With a primary clinical dataset of 391 patients, we demonstrate that this heterogeneous alignment of clinical features with molecular interaction patterns has significantly boosted subclinical atherosclerosis classification performance across various baselines by up to 13% in area under the receiver operating curve (AUC) and 20% in F1 score. Taken together, ATHENA enables mechanistically-informed patient subtype discovery through explainable AI (XAI)-driven subnetwork clustering; this novel integration framework strengthens personalized intervention strategies, thereby improving the prediction of atherosclerotic disease progression and management of their clinical actionable outcomes.

在这项工作中,我们通过开发一个等级图形神经网络框架来利用患者的两种典型模式:组群内的临床特征和个体患者特有的分子数据,研究与临床亚性激素硬化个人分类有关的问题。当前基于图形的疾病分类方法检测了患者特有的分子指纹,但对于组群的特征缺乏一致性和理解,而这是理解不同神性加速轨道中病原体细胞型的基本要求。此外,了解患者亚性型往往认为临床特征与隔离相似,而病人之间没有共同的致病性相互依存性。为了应对这些挑战,我们引入了AHENA:通过可解释的神经网络分析,通过综合模式学习构建新型的等级网络代表;随后,它优化了反映个人食性数据的患者特定分子指纹的学习,确保了与全组织可变性模式的一致性。在391个病人的初级临床数据集中,我们展示了临床特征与分子间互动模式的异性一致性,从而大大推进了个人内分级诊断型内分级的内分级分析,1号内分级的内分级性内分级分析。


Article 155

Title@2025-07-10 (4): Data-driven Kinematic Modeling in Soft Robots: System Identification and Uncertainty Quantification

Title: Data-driven Kinematic Modeling in Soft Robots: System Identification and Uncertainty Quantification Datengesteuerte kinematische Modellierung in Soft Robots: Systemidentifikation und Unsicherheitsquantifizierung 软机器人中数据驱动的虚拟模型:系统识别和不确定性量化 2507.07370v1

Authors (4): Zhanhong Jiang, Dylan Shah, Hsin-Jung Yang, Soumik Sarkar

Precise kinematic modeling is critical in calibration and controller design for soft robots, yet remains a challenging issue due to their highly nonlinear and complex behaviors. To tackle the issue, numerous data-driven machine learning approaches have been proposed for modeling nonlinear dynamics. However, these models suffer from prediction uncertainty that can negatively affect modeling accuracy, and uncertainty quantification for kinematic modeling in soft robots is underexplored. In this work, using limited simulation and real-world data, we first investigate multiple linear and nonlinear machine learning models commonly used for kinematic modeling of soft robots. The results reveal that nonlinear ensemble methods exhibit the most robust generalization performance. We then develop a conformal kinematic modeling framework for soft robots by utilizing split conformal prediction to quantify predictive position uncertainty, ensuring distribution-free prediction intervals with a theoretical guarantee.

在软机器人的校准和控制器设计中,精密的动态建模至关重要,然而,由于其高度非线性和复杂的行为,仍是一个具有挑战性的问题。为了解决这一问题,为非线性动态建模提出了许多数据驱动机学习方法。然而,这些模型存在预测不确定性,可能会对建模准确性产生消极影响,软机器人的动能建模的不确定性量化也未得到充分探讨。在这项工作中,我们首先使用有限的模拟和现实世界数据,调查了用于软机器人动能建模常用的多线性和非线性机器学习模型。结果显示,非线性混合方法展示了最强的通用性性表现。我们随后开发了软机器人的符合性动态建模框架,通过使用分裂的相容预测来量化预测性定位不确定性,确保以理论保证的方式进行无分布的预测间隔。
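
Split conformal prediction, as mentioned in the abstract, can be sketched in a few lines: hold out a calibration set, score each calibration point by its prediction error, and use a finite-sample-corrected quantile of those scores as the radius of the prediction region. The sketch below uses Euclidean position error as the score; the toy `model`, the data, and the function names are assumptions, not the authors' setup.

```python
import math


def conformal_radius(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[k - 1]


def calibrate(model, calib_inputs, calib_positions, alpha):
    """Score each calibration point by Euclidean position error, then take the quantile."""
    scores = [math.dist(model(x), p)
              for x, p in zip(calib_inputs, calib_positions)]
    return conformal_radius(scores, alpha)


def model(x):
    """Hypothetical kinematic model mapping an actuation value to an (x, y) position."""
    return (x, 2.0 * x)


calib_inputs = [1.0, 2.0, 3.0, 4.0]
calib_positions = [(1.1, 2.0), (2.0, 4.4), (3.2, 6.0), (4.0, 8.3)]

radius = calibrate(model, calib_inputs, calib_positions, alpha=0.2)
center = model(5.0)  # the prediction region is the disc of this radius around center
```

If calibration and test points are exchangeable, the disc of radius `radius` around a new prediction contains the true position with probability at least `1 - alpha`, with no assumption on the error distribution.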


Article 156

Title@2025-07-10 (4): A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning

Title: A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning Eine kryptografische Perspektive auf Mitigation vs. Detection in Machine Learning 关于减缓与机械学习中的探测的加密视角 2504.20310v2

Authors (2): Greg Gluch, Shafi Goldwasser

In this paper, we initiate a cryptographically inspired theoretical study of detection versus mitigation of adversarial inputs produced by attackers on Machine Learning algorithms during inference time. We formally define defense by detection (DbD) and defense by mitigation (DbM). Our definitions come in the form of a 3-round protocol between two resource-bounded parties: a trainer/defender and an attacker. The attacker aims to produce inference-time inputs that fool the training algorithm. We define correctness, completeness, and soundness properties to capture successful defense at inference time while not degrading (too much) the performance of the algorithm on inputs from the training distribution. We first show that achieving DbD and achieving DbM are equivalent for ML classification tasks. Surprisingly, this is not the case for ML generative learning tasks, where there are many possible correct outputs for each input. We show a separation between DbD and DbM by exhibiting two generative learning tasks for which it is possible to defend by mitigation but provably impossible to defend by detection. The mitigation phase uses significantly fewer computational resources than the initial training algorithm. In the first learning task we consider sample complexity as the resource, and in the second, time complexity. The first result holds under the assumption that Identity-Based Fully Homomorphic Encryption (IB-FHE), publicly-verifiable zero-knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARK), and Strongly Unforgeable Signatures exist. The second result assumes the existence of Non-Parallelizing Languages with Average-Case Hardness (NPL), Incrementally-Verifiable Computation (IVC), and IB-FHE.

在本文中,我们开始对机器学习算法攻击者在推算期间产生的对抗性投入的检测和减缓进行由加密启发的理论研究。我们正式定义了通过检测(DbD)和减缓(DbM)进行防御。我们的定义以两个资源受资源约束的当事方(一个训练者/Defer和攻击者)之间3轮协议的形式出现。攻击者的目的是产生欺骗培训算法的推论时间投入。我们定义了正确性、完整性和正确性能特性,以便在推论时间捕成功防御,同时不贬低(太多)培训分布中投入的算法的性能。我们首先显示,DbDD和DbM的实现相当于3轮协议。我们发现,ML的变异性学习任务中,DbDbD和DbM的精确性能,我们展示了两个可以通过缓解来维护的归正性学习任务,但首先无法进行辨别。我们首先通过检测, 缓解阶段的变异性阶段使用较低的变性水平的变性数据, 将SB的变性阶段使用较低的计算结果。


Article 157

Title@2025-07-10 (4): Platform for Representation and Integration of multimodal Molecular Embeddings

Title: Platform for Representation and Integration of multimodal Molecular Embeddings Plattform für Repräsentation und Integration multimodaler molekularer Einbettungen 多式联运分子嵌入的 代表性和一体化平台 2507.07367v1

Authors (11): Erika Yilin Zheng, Yu Yan, Baradwaj Simha Sankar, Ethan Ji, Steven Swee, Irsyad Adam, Ding Wang, Alexander Russell Pelletier, Alex Bui, Wei Wang, Peipei Ping

Existing machine learning methods for molecular (e.g., gene) embeddings are restricted to specific tasks or data modalities, limiting their effectiveness to narrow domains. As a result, they fail to capture the full breadth of gene functions and interactions across diverse biological contexts. In this study, we systematically evaluate knowledge representations of biomolecules across multiple dimensions in a task-agnostic manner, spanning three major data sources: omics experimental data, literature-derived text data, and knowledge graph-based representations. To distinguish meaningful biological signals from chance correlations, we devised an adjusted variant of Singular Vector Canonical Correlation Analysis (SVCCA) that quantifies signal redundancy and complementarity across different data modalities and sources. These analyses reveal that existing embeddings capture largely non-overlapping molecular signals, highlighting the value of embedding integration. Building on this insight, we propose the Platform for Representation and Integration of multimodal Molecular Embeddings (PRISME), a machine learning-based workflow that uses an autoencoder to integrate these heterogeneous embeddings into a unified multimodal representation. We validated this approach across various benchmark tasks, where PRISME demonstrated consistent performance and outperformed individual embedding methods in missing value imputation. This new framework supports comprehensive modeling of biomolecules, advancing the development of robust, broadly applicable multimodal embeddings optimized for downstream biomedical machine learning applications.

分子嵌入(如基因)的现有机器学习方法仅限于特定任务或数据模式,限制了其在狭窄领域的效力,因此,这些方法未能捕捉基因功能和不同生物背景之间相互作用的全方位的基因功能和相互作用。在本研究中,我们系统地评价了生物球跨多个层面的知识表现,这代表着一个任务和不可知的方式,涵盖三大数据源,包括模拟实验数据、源自文献的文本数据和基于知识图表的表达方式。为了区分有意义的生物信号与偶然关联,我们设计了一个经调整的Singulal Victor Canonicolal Connational 分析(SVCCA)变量,以量化不同数据模式和来源之间的信号冗余和互补性。这些分析显示,现有的嵌入在很大程度上是非重叠分子信号,突出了嵌入整合的价值。我们根据这一洞察,提出了多式联运模型的体现和整合平台,一个基于机械学习的工作流程,利用自动编码将这些混合嵌入到统一的多式联运中。我们在各种基准任务中验证了这一方法,在生物-MIME的深度模型化中展示了单个学习模式的不断演进式模型。


Article 158

Title@2025-07-10 (4): Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning

Title: Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning Zielorientiertes sequentielles Bayesian Experimental Design für das kausale Lernen 以目标为导向、按顺序排列的Bayesian 因果关系学习实验设计 2507.07359v1

Authors (4): Zheyu Zhang, Jiayuan Dong, Jie Liu, Xun Huan

We present GO-CBED, a goal-oriented Bayesian framework for sequential causal experimental design. Unlike conventional approaches that select interventions aimed at inferring the full causal model, GO-CBED directly maximizes the expected information gain (EIG) on user-specified causal quantities of interest, enabling more targeted and efficient experimentation. The framework is both non-myopic, optimizing over entire intervention sequences, and goal-oriented, targeting only model aspects relevant to the causal query. To address the intractability of exact EIG computation, we introduce a variational lower bound estimator, optimized jointly through a transformer-based policy network and normalizing flow-based variational posteriors. The resulting policy enables real-time decision-making via an amortized network. We demonstrate that GO-CBED consistently outperforms existing baselines across various causal reasoning and discovery tasks, including synthetic structural causal models and semi-synthetic gene regulatory networks, particularly in settings with limited experimental budgets and complex causal mechanisms. Our results highlight the benefits of aligning experimental design objectives with specific research goals and of forward-looking sequential planning.

我们提出GO-CBED框架,这是针对目标的Bayesian框架,用于顺序因果关系实验设计。与选择旨在推断全因果模型的干预措施的常规方法不同,GO-CBED直接最大限度地扩大用户指定因果利益量的预期信息收益(EIG),从而能够更有针对性和高效率地进行实验。这个框架既非模拟,优化于整个干预序列,也只针对与因果查询有关的模式方面。为了解决精确的EIG计算是否具有吸引力的问题,我们引入了一个变式低约束估计器,通过以变压器为基础的政策网络和流基变动后继器实现最佳化。由此产生的政策使得通过摊分化网络实时决策成为可能。我们证明,GO-CBED一贯超越各种因果推理和发现任务的现有基线,包括合成因果模型和半合成基因管理网络,特别是在实验预算和复杂因果机制有限的情况下。我们的结果突出了将实验设计目标与具体研究目标和前瞻性连续规划相结合的好处。


Article 159

Title@2025-07-10 (4): Learning from positive and unlabeled examples -Finite size sample bounds

Title: Learning from positive and unlabeled examples -Finite size sample bounds Aus positiven und unmarkierten Beispielen lernen -Finite-Size-Probengrenzen 从正面和未贴标签的例子中学习 - 微小大小抽样范围 2507.07354v1

Authors (2): Farnam Mansouri, Shai Ben-David

PU (Positive Unlabeled) learning is a variant of supervised classification learning in which the only labels revealed to the learner are those of positively labeled instances. PU learning arises in many real-world applications. Most existing work relies on the simplifying assumptions that the positively labeled training data is drawn from the restriction of the data-generating distribution to positively labeled instances and/or that the proportion of positively labeled points (a.k.a. the class prior) is known a priori to the learner. This paper provides a theoretical analysis of the statistical complexity of PU learning under a wider range of setups. Unlike most prior work, our study does not assume that the class prior is known to the learner. We prove upper and lower bounds on the required sample sizes (of both the positively labeled and the unlabeled samples).

PU(积极无标签)学习是监督分类学习的一种变体,其中向学习者透露的唯一标签是贴有正面标签的事例。 PU学习在许多现实世界的应用中产生。大多数现有工作依赖于简化的假设,即贴有正面标签的培训数据是从数据发布限制中提取的,改为贴有正面标签的事例和(或)正面标签的点比例(a.k.a.先前的类别)是学习者最熟悉的。本文从理论上分析了在更广泛的设置下学习PU学习的统计复杂性。与大多数先前的工作不同,我们的研究并不假定学习者了解先前的类别。我们在所需的样本大小(贴有正面标签的样本和未贴标签的样本)上下划线。


Article 160

Title@2025-07-10 (4): Machine Learning-driven Multiscale MD Workflows: The Mini-MuMMI Experience

Title: Machine Learning-driven Multiscale MD Workflows: The Mini-MuMMI Experience Mehrstufige MD-Workflows mit maschinellem Lernen: Die Mini-MuMMI-Erfahrung 由学习驱动的机械式学习驱动的多规模MD工作流程:微型MIMI经验 2507.07352v1

Authors (11): Loïc Pottier, Konstantia Georgouli, Timothy S. Carpenter, Fikret Aydin, Jeremy O. B. Tempkin, Dwight V. Nissley, Frederick H. Streitz, Thomas R. W. Scogland, Peer-Timo Bremer, Felice C. Lightstone, Helgi I. Ingólfsson

Computational models have become one of the prevalent methods for studying complex phenomena. To accurately model complex interactions, such as detailed biomolecular interactions, scientists often rely on multiscale models composed of several internal models operating at different scales, ranging from microscopic to macroscopic length and time scales. Bridging the gap between different time and length scales has historically been challenging, but the advent of newer machine learning (ML) approaches has shown promise for tackling that task. Multiscale models require massive amounts of computational power and a powerful workflow management system. Orchestrating ML-driven multiscale studies on parallel systems with thousands of nodes is challenging: the workflow must schedule, allocate, and control thousands of simulations operating at different scales. Here, we discuss the massively parallel Multiscale Machine-Learned Modeling Infrastructure (MuMMI), a multiscale workflow management infrastructure that can orchestrate thousands of molecular dynamics (MD) simulations operating at different timescales, spanning from milliseconds to nanoseconds. More specifically, we introduce a novel version of MuMMI called “mini-MuMMI”. Mini-MuMMI is a curated version of MuMMI designed to run on modest HPC systems or even laptops, whereas MuMMI requires larger HPC systems. We demonstrate mini-MuMMI’s utility by exploring RAS-RAF membrane interactions, and we discuss the challenges behind generalizing multiscale workflows and how mini-MuMMI can be leveraged to target a broader range of applications outside of MD and RAS-RAF interactions.

精确地模拟复杂的相互作用,例如详细的生物分子相互作用,科学家往往依赖由从微观到宏观的长度和时间尺度等不同尺度运行的若干内部模型组成的多尺度模型。缩小不同时间和长度尺度之间的差距历来具有挑战性,但新机器学习(ML)方法的出现显示了应对这项任务的希望。多规模模型需要大量的计算力和强大的工作流程管理系统。用数千个节点对平行系统进行由ML驱动的多尺度研究具有挑战性,工作流程必须安排、分配和控制不同尺度运行的数千个模拟。在这里,我们讨论了大规模平行的多尺度机器模拟基础设施(MIMMI),一个多规模的工作流程管理基础设施,可以在不同的时间尺度上协调数千种分子动态模拟,从毫秒到毫秒不等。更具体地说,我们引入了名为“MIMMI-MI”的新型多尺度研究。MIMI的小型和MIMIMIMIMM系统需要更大规模地展示MIMIMIMA系统,而MIMI的小型和小型MIMIML系统则需要我们小规模的小型版本。


Article 161

Title@2025-07-10 (4): Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts

Title: Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts Zero-Shot-Context-Verallgemeinerung in der Verstärkung Lernen aus wenigen Trainingskontexten 从少见的培训背景中加强学习的零零零片背景概括化 2507.07348v1

Authors (3): James Chapman, Kedar Karhadkar, Guido Montufar

Deep reinforcement learning (DRL) has achieved remarkable success across multiple domains, including competitive games, natural language processing, and robotics. Despite these advancements, policies trained via DRL often struggle to generalize to evaluation environments with different parameters. This challenge is typically addressed by training with multiple contexts and/or by leveraging additional structure in the problem. However, obtaining sufficient training data across diverse contexts can be impractical in real-world applications. In this work, we consider contextual Markov decision processes (CMDPs) with transition and reward functions that exhibit regularity in context parameters. We introduce the context-enhanced Bellman equation (CEBE) to improve generalization when training on a single context. We prove both analytically and empirically that the CEBE yields a first-order approximation to the Q-function trained across multiple contexts. We then derive context sample enhancement (CSE) as an efficient data augmentation method for approximating the CEBE in deterministic control environments. We numerically validate the performance of CSE in simulation environments, showcasing its potential to improve generalization in DRL.

深入强化学习(DRL)在多个领域取得了显著成功,包括竞争性游戏、自然语言处理和机器人。尽管取得了这些进步,通过DRL培训的政策往往难以向具有不同参数的评价环境推广。这一挑战通常通过多种背景的培训和(或)在问题中利用额外结构来解决。然而,在现实世界应用中,获得不同背景的足够培训数据可能不切实际。在这项工作中,我们考虑到具有在环境参数中表现出规律性的过渡和奖励功能的背景Markov决策过程。我们引入了环境强化的贝尔曼方程式(CEBE),以便在单一背景下的培训中改进通用性。我们从分析和实践上证明,CEBE在多个背景下的“功能”具有第一阶近似值。然后我们从环境样本中获取一种有效的数据增强方法,用于在确定性控制环境中对CEBE进行适应。我们用数字验证CSE在模拟环境中的表现,展示其改进DRL总体化的潜力。


Article 162

Title@2025-07-10 (4): It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

Title: It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation Es ist schwer, normal zu sein: Der Einfluss von Lärm auf die strukturagnostische Abschätzung 很难正常:噪音对结构-不可计量估计的影响 2507.02275v2

Authors (3): Jikai Jin, Lester Mackey, Vasilis Syrgkanis

Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of Robinson (1988), we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of Mackey et al. (2018). Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These ACE procedures use structure-agnostic cumulant estimators to achieve $r$-th order insensitivity to nuisance errors whenever the $(r+1)$-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.

结构不可知的因果推断研究的是:在给定干扰函数(如混杂因素对处理和结果的影响)的黑盒机器学习估计时,处理效应能被估计得多好。我们发现,答案以一种令人惊讶的方式取决于处理噪声的分布。针对 Robinson (1988) 的部分线性模型,我们首先证明,广泛采用的双重机器学习(DML)估计量在高斯处理噪声下达到极小极大最优速率,从而解决了 Mackey 等人 (2018) 提出的一个公开问题。同时,对于独立的非高斯处理噪声,我们通过构造对干扰误差具有更高阶稳健性的新的实用方法,证明 DML 总是次优的。这些 ACE 方法使用结构不可知的累积量估计量,只要第 $(r+1)$ 阶处理累积量非零,即可实现对干扰误差的 $r$ 阶不敏感性。我们还为部分线性模型中的二元处理给出了新的极小极大保证,以补充上述核心结果。最后,通过合成需求估计实验,我们展示了高阶稳健估计量的实际优势。
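As an illustration of the DML estimator mentioned above, here is a minimal residual-on-residual sketch for the partially linear model $Y = \theta T + g(X) + \varepsilon$, $T = m(X) + \eta$. The data-generating functions and the deliberately imperfect nuisance estimates `m_hat` and `l_hat` are made up for the example. In practice the nuisances would come from ML models fit on held-out folds (cross-fitting), which this sketch omits.

```python
import math
import random

def dml_estimate(xs, ts, ys, m_hat, l_hat):
    """Residual-on-residual estimate of theta in Y = theta*T + g(X) + eps."""
    t_res = [t - m_hat(x) for x, t in zip(xs, ts)]   # treatment residuals
    y_res = [y - l_hat(x) for x, y in zip(xs, ys)]   # outcome residuals
    num = sum(tr * yr for tr, yr in zip(t_res, y_res))
    den = sum(tr * tr for tr in t_res)
    return num / den

rng = random.Random(0)
theta = 2.0
m = math.sin                      # E[T | X], hypothetical
g = lambda x: x * x               # confounder effect on the outcome, hypothetical
n = 20000
xs = [rng.uniform(-1, 1) for _ in range(n)]
ts = [m(x) + rng.gauss(0, 1) for x in xs]            # Gaussian treatment noise
ys = [theta * t + g(x) + rng.gauss(0, 1) for x, t in zip(xs, ts)]

# Stand-ins for black-box nuisance estimates, each deliberately slightly wrong.
m_hat = lambda x: m(x) + 0.05 * x
l_hat = lambda x: theta * m(x) + g(x) - 0.05 * x     # estimate of E[Y | X]

theta_hat = dml_estimate(xs, ts, ys, m_hat, l_hat)
```

The bias of `theta_hat` depends on the product of the two nuisance errors, not on either error alone. This is the first-order insensitivity that the higher-order procedures in the abstract aim to improve on.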


Article 163

Title@2025-07-10 (4): Way More Than the Sum of Their Parts: From Statistical to Structural Mixtures

Title: Way More Than the Sum of Their Parts: From Statistical to Structural Mixtures Viel mehr als die Summe ihrer Teile: Von statistischen zu strukturellen Mischungen 超出其部分总和:从统计到结构混合 2507.07343v1

Authors (1): James P. Crutchfield

We show that mixtures composed of multicomponent systems are typically much more structurally complex than the sum of their parts; sometimes, infinitely more complex. We contrast this with the more familiar notion of statistical mixtures, demonstrating how statistical mixtures miss key aspects of emergent hierarchical organization. This leads us to identify a new kind of structural complexity inherent in multicomponent systems and to draw out broad consequences for system ergodicity.

我们发现,由多构件系统构成的混合物通常比其各部分的总和结构复杂得多;有时甚至更为复杂。 我们把它与更熟悉的统计混合物概念相对应,表明统计混合物如何忽略了新兴等级组织的关键方面。 这导致我们发现多构件系统固有的一种新的结构复杂性,并勾画出系统遗传性的广泛后果。


Article 164

Title@2025-07-09 (3): Adversarial Defenses via Vector Quantization

Title: Adversarial Defenses via Vector Quantization Adversariale Abwehr durch Vector Quantization 通过矢量定量的矢量反向防御 2305.13651v2

Authors (2): Zhiyi Dong, Yongyi Mao

Adversarial attacks pose significant challenges to the robustness of modern deep neural networks in computer vision, and defending these networks against adversarial attacks has attracted intense research efforts. Among various defense strategies, preprocessing-based defenses are practically appealing since there is no need to train the network under protection. However, such approaches typically do not achieve robustness comparable to that of other methods such as adversarial training. In this paper, we propose a novel framework for preprocessing-based defenses, where a vector quantizer is used as a preprocessor. This framework, inspired by and extended from Randomized Discretization (RandDisc), is grounded in rate-distortion theory: indeed, RandDisc may be viewed as a scalar quantizer, and rate-distortion theory suggests that such quantization schemes are inferior to vector quantization. In our framework, the preprocessing vector quantizer treats the input image as a collection of patches and finds a set of representative patches based on the patch distributions; each original patch is then modified according to the representative patches close to it. We present two lightweight defenses in this framework, referred to as patched RandDisc (pRD) and sliding-window RandDisc (swRD), where the patches are disjoint in the former and overlapping in the latter. We show that vector-quantization-based defenses have certifiable robust accuracy and that pRD and swRD demonstrate state-of-the-art performance, surpassing RandDisc by a large margin. Notably, the proposed defenses possess the obfuscated gradients property. Our experiments, however, show that pRD and swRD remain effective under the STE and EOT attacks, which are designed specifically for defenses with gradient obfuscation. …

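对抗攻击对计算机视觉中现代深度神经网络的稳健性构成重大挑战,如何防御此类攻击已引起大量研究。在各种防御策略中,基于预处理的防御在实践中颇具吸引力,因为无需训练被保护的网络。然而,这类方法的稳健性通常不及对抗训练等其他方法。本文提出一种新的基于预处理的防御框架,以向量量化器作为预处理器。该框架受随机离散化(RandDisc)启发并加以扩展,其理论依据是率失真理论:RandDisc 可视为标量量化器,而率失真理论表明此类量化方案不如向量量化。在我们的框架中,预处理向量量化器将输入图像视为一组图像块,并根据图像块的分布找到一组代表性图像块;随后按照与其相近的代表性图像块修改每个原始图像块。我们在该框架下提出两种轻量级防御,分别称为分块 RandDisc(pRD)和滑动窗口 RandDisc(swRD),前者的图像块互不重叠,后者的图像块相互重叠。我们证明基于向量量化的防御具有可认证的稳健精度,且 pRD 和 swRD 达到了最先进的性能,大幅超越 RandDisc。值得注意的是,所提出的防御具有梯度混淆特性。然而我们的实验表明,在专门针对梯度混淆防御而设计的 STE 和 EOT 攻击下,pRD 和 swRD 依然有效。…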
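The patch-based preprocessing described above can be sketched roughly as follows, for the disjoint-patch case only. This is a simplified, deterministic stand-in: it uses plain Lloyd k-means to pick representative patches and replaces each patch by its nearest representative. The randomization that RandDisc-style defenses rely on is omitted, and all function names are hypothetical.

```python
def split_patches(image, size):
    """Cut a 2D list into disjoint size x size patches, flattened to tuples."""
    h, w = len(image), len(image[0])
    return [tuple(image[r + i][c + j] for i in range(size) for j in range(size))
            for r in range(0, h, size) for c in range(0, w, size)]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(patch, centers):
    """Index of the representative patch closest to `patch`."""
    return min(range(len(centers)), key=lambda k: sq_dist(patch, centers[k]))

def representative_patches(patches, k, iters=10):
    """Plain Lloyd k-means with deterministic initialisation (assumes k <= len(patches))."""
    centers = [patches[i * len(patches) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in patches:
            groups[nearest(p, centers)].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def quantize(image, size, k):
    """Replace every patch by its nearest representative patch."""
    patches = split_patches(image, size)
    centers = representative_patches(patches, k)
    return [centers[nearest(p, centers)] for p in patches]

image = [[0, 0, 9, 9],
         [0, 1, 9, 8],
         [9, 9, 0, 0],
         [8, 9, 1, 0]]
quantized = quantize(image, size=2, k=2)
```

Small pixel-level perturbations that do not move a patch across a cluster boundary leave the quantized output largely unchanged. This is the intuition behind using a quantizer as a preprocessor.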


Article 165

Title@2025-07-09 (3): Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset

Title: Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset Benchmarking der Warteliste Mortalitätsvorhersage in der Herztransplantation durch Time-to-Event-Modellierung mit neuem Langzeit-UNOS-Datensatz 利用新的纵向UNOS数据集,利用从时间到晚上的建模,制定基准的等待名单 心脏移植中死亡率预测 2507.07339v1

Authors (5): Yingtao Luo, Reza Skandari, Carlos Martinez, Arman Kilic, Rema Padman

Decisions about managing patients on the heart transplant waitlist are currently made by committees of doctors who consider multiple factors, but the process remains largely ad-hoc. With the growing volume of longitudinal patient, donor, and organ data collected by the United Network for Organ Sharing (UNOS) since 2018, there is increasing interest in analytical approaches to support clinical decision-making at the time of organ availability. In this study, we benchmark machine learning models that leverage longitudinal waitlist history data for time-dependent, time-to-event modeling of waitlist mortality. We train on 23,807 patient records with 77 variables and evaluate both survival prediction and discrimination at a 1-year horizon. Our best model achieves a C-Index of 0.94 and AUROC of 0.89, significantly outperforming previous models. Key predictors align with known risk factors while also revealing novel associations. Our findings can support urgency assessment and policy refinement in heart transplant decision making.

2018年以来,器官共享联合网络收集的纵向病人、捐助者和器官数据数量不断增加,因此,人们越来越有兴趣采用分析方法,在器官可用时支持临床决策。在这项研究中,我们用纵向等待名单历史数据作为基准,用于根据时间、时间和事件对等待名单死亡率进行模拟。我们对23 807个病人记录进行了77个变量的培训,并在1年的视野内评价生存预测和歧视。我们的最佳模型取得了0.94和0.89的C-Index和AUROC,大大超过以往的模型。关键预测器与已知的风险因素相一致,同时也揭示了新的协会。我们的调查结果可以支持对心脏移植决策的紧迫性评估和政策改进。
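For readers unfamiliar with the C-Index reported above, the following sketch computes a Harrell-style concordance index on made-up data. It is the static version of the metric; the time-dependent variants that a longitudinal benchmark would likely use are not reproduced here.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs that the risk scores rank correctly.

    A pair (i, j) is comparable when the subject with the shorter time had an
    observed event (events[i] == 1). Ties in risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

times = [5, 8, 12, 20]        # months on the waitlist (made-up values)
events = [1, 0, 1, 0]         # 1 = death observed, 0 = censored
risks = [0.9, 0.4, 0.6, 0.1]  # hypothetical model risk scores
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of comparable pairs.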


Article 166

Title@2025-07-09 (3): Bayesian Double Descent

Title: Bayesian Double Descent Bayesische Doppelabfahrt Bayesian 双世后代 2507.07338v1

Authors (2): Nick Polson, Vadim Sokolov

Double descent is a phenomenon of over-parameterized statistical models. Our goal is to view double descent from a Bayesian perspective. Over-parameterized models such as deep neural networks have an interesting re-descending property in their risk characteristics. This is a recent phenomenon in machine learning and has been the subject of many studies. As the complexity of the model increases, there is a U-shaped region corresponding to the traditional bias-variance trade-off, but then as the number of parameters equals the number of observations and the model becomes one of interpolation, the risk can become infinite and then, in the over-parameterized region, it re-descends – the double descent effect. We show that this has a natural Bayesian interpretation. Moreover, we show that it is not in conflict with the traditional Occam’s razor that Bayesian models possess, in that they tend to prefer simpler models when possible. We illustrate the approach with an example of Bayesian model selection in neural networks. Finally, we conclude with directions for future research.

我们的目标是从巴伊西亚角度观察双向下降。 深神经网络等超参数模型在其风险特性中具有有趣的再降解特性。 这是机器学习中最近出现的现象, 并且是许多研究的主题。 随着模型的复杂性增加, 有一个U形区域与传统的偏向偏差取舍相对应, 但是当参数数量等于观察次数, 而模型变成内插时, 风险会变得无限, 然后在超参数区域里, 它会重新降解 – – 双重下降效应。 我们显示这是一种自然的巴伊西亚人解释。 此外, 我们表明它与巴伊西亚模型拥有的传统奥卡姆剃刀没有冲突, 因为他们倾向于尽可能地选择更简单的模型。 我们用Bayesian模型在神经网络中选择的例子来说明这个方法。 最后, 我们得出了未来研究的方向。


Article 167

Title@2025-07-09 (3): Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning

Title: Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning Leveraging Manifold Einbettungen für verbesserte Graph Transformer Darstellungen und Lernen 利用操纵嵌入器促进增强的图表变形器演示和学习 2507.07335v1

Authors (2): Ankit Jyothish, Ali Jannesari

Graph transformers typically embed every node in a single Euclidean space, blurring heterogeneous topologies. We prepend a lightweight Riemannian mixture-of-experts layer that routes each node to the kinds of manifold (a mixture of spherical, flat, and hyperbolic) that best match its local structure. These projections provide intrinsic geometric explanations of the latent space. Inserted into a state-of-the-art ensemble graph transformer, this projector lifts accuracy by up to 3% on four node-classification benchmarks. The ensemble ensures that both Euclidean and non-Euclidean features are captured. Explicit, geometry-aware projection thus sharpens predictive power while making graph representations more interpretable.

图形变压器通常将每个节点嵌入单一的欧几里德空间, 模糊多样的地形。 我们预设了一个轻量级的里曼混合专家层, 将每个节点引向多种不同的多功能、 球体、 平板、 双曲- 最适合其本地结构 。 这些预测为潜在空间提供了内在的几何解释 。 被插入到一个最先进的混合式图形变压器中, 这个投影器将四个节点分类基准的精确度提高高达3% 。 共同点确保了 euclidean 和非 euclidean 的特性都被捕捉到。 清晰的、 几何性觉的预测因此使预测力更加精准, 使图形表达更加易解 。
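One way to picture the routing described above is a softmax gate over three candidate projections of a node embedding. The sketch below is an assumption about the general shape of such a layer, not the authors' architecture. The gate scores are given by hand, and the curvatures are fixed to +1, 0 and -1.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def to_sphere(v):
    """Project a nonzero vector onto the unit sphere."""
    n = norm(v)
    return [x / n for x in v]

def to_poincare_ball(v):
    """Exponential map at the origin of the Poincare ball (curvature -1), v nonzero."""
    n = norm(v)
    scale = math.tanh(n) / n
    return [x * scale for x in v]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(v, gate_scores):
    """Return gate weights and the three candidate embeddings for one node."""
    experts = {
        "spherical": to_sphere(v),
        "flat": list(v),
        "hyperbolic": to_poincare_ball(v),
    }
    weights = dict(zip(experts, softmax(gate_scores)))
    return weights, experts

weights, experts = route([3.0, 4.0], gate_scores=[0.2, 0.1, 1.5])
```

Reading off the largest gate weight per node gives a simple geometric label for that node (here, `hyperbolic`). This is the kind of interpretability the abstract alludes to.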


Article 168

Title@2025-07-09 (3): Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery

Title: Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery Überbrückung des Plausibilitäts-Validitäts-Gaps durch Feintuning eines mit Gründen versehenen LLM für chemische Synthese und Entdeckung 通过为化学合成和发现而精细提出一个有合理理由的强化LLM来弥合可视性与价值差距 2507.07328v1

Authors (2): Malikussaid, Hilal Hudan Nuha

Large Language Models (LLMs) often generate scientifically plausible but factually invalid information, a challenge we term the “plausibility-validity gap,” particularly in specialized domains like chemistry. This paper presents a systematic methodology to bridge this gap by developing a specialized scientific assistant. We utilized the Magistral Small model, noted for its integrated reasoning capabilities, and fine-tuned it using Low-Rank Adaptation (LoRA). A key component of our approach was the creation of a “dual-domain dataset,” a comprehensive corpus curated from various sources encompassing both molecular properties and chemical reactions, which was standardized to ensure quality. Our evaluation demonstrates that the fine-tuned model achieves significant improvements over the baseline model in format adherence, chemical validity of generated molecules, and the feasibility of proposed synthesis routes. The results indicate a hierarchical learning pattern, where syntactic correctness is learned more readily than chemical possibility and synthesis feasibility. While a comparative analysis with human experts revealed competitive performance in areas like chemical creativity and reasoning, it also highlighted key limitations, including persistent errors in stereochemistry, a static knowledge cutoff, and occasional reference hallucination. This work establishes a viable framework for adapting generalist LLMs into reliable, specialized tools for chemical research, while also delineating critical areas for future improvement.

大型语言模型(LLMS)通常产生科学上可信但事实上无效的信息,我们称之为“可变性-功能差距”,特别是在化学等专门领域。本文通过开发专业科学助理,提出了弥合这一差距的系统方法。我们利用了磁学小模型,以综合推理能力为标志,并使用低射线适应(LORA)对其进行了微调。我们方法的一个关键组成部分是创建“双体域数据集”,这是从各种来源,包括分子特性和化学反应两方面综合汇集的综合资料,以确保质量。我们的评估表明,微调模型在格式遵守、所产分子的化学有效性和拟议合成路径的可行性方面,在基线模型方面取得了显著改进。结果显示一种分级学习模式,在这种模式中,与化学创新和推理等领域相比,比较分析揭示了有竞争力的绩效,但也突出了关键局限性,包括立体化学学方面的持续错误、静态知识截断和偶时时参考错觉。这项工作还确立了一个将关键研究领域改成常规化学研究工具的可行框架。


Article 169

Title@2025-07-09 (3): Optimizing Model Splitting and Device Task Assignment for Deceptive Signal Assisted Private Multi-hop Split Learning

Title: Optimizing Model Splitting and Device Task Assignment for Deceptive Signal Assisted Private Multi-hop Split Learning Optimierung der Modellaufteilung und der Aufgabenzuweisung für trügerisches Signal unterstütztes privates Mehr-Hop-Split-Lernen 为欺骗性信号辅助的私人多窗口共享学习优化模型分割和设备任务分配 2507.07323v1

Authors (5): Dongyu Wei, Xiaoren Xu, Yuchen Liu, H. Vincent Poor, Mingzhe Chen

In this paper, deceptive signal-assisted private split learning is investigated. In our model, several edge devices jointly perform collaborative training, and some eavesdroppers aim to collect the model and data information from devices. To prevent the eavesdroppers from collecting model and data information, a subset of devices can transmit deceptive signals. Therefore, it is necessary to determine the subset of devices used for deceptive signal transmission, the subset of model training devices, and the models assigned to each model training device. This problem is formulated as an optimization problem whose goal is to minimize the information leaked to eavesdroppers while meeting the model training energy consumption and delay constraints. To solve this problem, we propose a soft actor-critic deep reinforcement learning framework with intrinsic curiosity module and cross-attention (ICM-CA) that enables a centralized agent to determine the model training devices, the deceptive signal transmission devices, the transmit power, and sub-models assigned to each model training device without knowing the position and monitoring probability of eavesdroppers. The proposed method uses an ICM module to encourage the server to explore novel actions and states and a CA module to determine the importance of each historical state-action pair thus improving training efficiency. Simulation results demonstrate that the proposed method improves the convergence rate by up to 3x and reduces the information leaked to eavesdroppers by up to 13% compared to the traditional SAC algorithm.

本文研究了欺骗信号辅助的隐私分割学习。在我们的模型中,多个边缘设备联合进行协作训练,而一些窃听者试图从设备处收集模型和数据信息。为防止窃听者收集模型和数据信息,一部分设备可以发送欺骗信号。因此,需要确定用于发送欺骗信号的设备子集、用于模型训练的设备子集,以及分配给每个训练设备的模型。该问题被表述为一个优化问题,其目标是在满足模型训练能耗和时延约束的同时,使泄露给窃听者的信息最少。为求解该问题,我们提出了一种带有内在好奇心模块和交叉注意力(ICM-CA)的软演员-评论家深度强化学习框架,使集中式智能体能够在不知道窃听者位置和监听概率的情况下,确定模型训练设备、欺骗信号发送设备、发射功率以及分配给每个训练设备的子模型。所提方法利用 ICM 模块鼓励服务器探索新的动作和状态,并利用 CA 模块确定每个历史状态-动作对的重要性,从而提高训练效率。仿真结果表明,与传统 SAC 算法相比,所提方法将收敛速度提高至多 3 倍,并将泄露给窃听者的信息减少至多 13%。


Article 170

Title@2025-07-09 (3): Optimizing Communication and Device Clustering for Clustered Federated Learning with Differential Privacy

Title: Optimizing Communication and Device Clustering for Clustered Federated Learning with Differential Privacy Optimierung von Kommunikation und Geräteclustering für clustergebundenes Federated Learning mit unterschiedlicher Privatsphäre 优化通信和装置集群,促进有不同隐私的分组联邦学习 2507.07320v1

Authors (4): Dongyu Wei, Xiaoren Xu, Shiwen Mao, Mingzhe Chen

In this paper, a secure and communication-efficient clustered federated learning (CFL) design is proposed. In our model, several base stations (BSs) with heterogeneous task-handling capabilities and multiple users with non-independent and identically distributed (non-IID) data jointly perform CFL training incorporating differential privacy (DP) techniques. Since each BS can process only a subset of the learning tasks and has limited wireless resource blocks (RBs) to allocate to users for federated learning (FL) model parameter transmission, it is necessary to jointly optimize RB allocation and user scheduling for CFL performance optimization. Meanwhile, our considered CFL method requires devices to use their limited data and FL model information to determine their task identities, which may introduce additional communication overhead. We formulate an optimization problem whose goal is to minimize the training loss of all learning tasks while considering device clustering, RB allocation, DP noise, and FL model transmission delay. To solve the problem, we propose a novel dynamic penalty function assisted value decomposed multi-agent reinforcement learning (DPVD-MARL) algorithm that enables distributed BSs to independently determine their connected users, RBs, and DP noise of the connected users but jointly minimize the training loss of all learning tasks across all BSs. Different from the existing MARL methods that assign a large penalty for invalid actions, we propose a novel penalty assignment scheme that assigns penalty depending on the number of devices that cannot meet communication constraints (e.g., delay), which can guide the MARL scheme to quickly find valid actions, thus improving the convergence speed. Simulation results show that the DPVD-MARL can improve the convergence rate by up to 20% and the ultimate accumulated rewards by 15% compared to independent Q-learning.

在本文中,提出了安全和通信高效的组合式联合学习(CFL)设计。在我们的模式中,几个任务处理能力不一且分布相同(非IID)数据的基本站(BS)和多个用户联合进行CFL培训,包括不同的隐私(DP)技术。由于每个BS只能处理学习任务的一个子集,并且有限的无线资源块分配给用户用于联合学习(FL)模型参数传输,因此有必要联合优化CFL绩效优化的RB分配和用户进度安排。与此同时,我们考虑的CFL方法需要各种设备来使用它们有限的数据和FL模型信息来确定他们的任务身份,这可能会增加通信管理费。我们形成了一个优化问题,目标是在考虑设备组合、RB分配、DP噪声和FL模式传输延迟的同时尽量减少所有培训任务的培训损失。为了解决问题,我们提出了一个新的动态惩罚功能,用S-DD-D-MAR算法来帮助降低增值(D-DD-MAR)的升级学习价值,但可以让分布的BS系统能够独立地决定其连接的用户、RB-RB最后的动作和DP-RVRMRMRMR的动作,这样可以让所有的计算。
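The contrast between a single large penalty for invalid actions and a penalty that scales with the number of violating devices can be sketched as below. The constants and function names are hypothetical, and the paper's actual penalty function may differ in form.

```python
def fixed_penalty(delays, max_delay, big=100.0):
    """Baseline: one large penalty if any device violates the delay limit."""
    return -big if any(d > max_delay for d in delays) else 0.0

def graded_penalty(delays, max_delay, per_device=1.0):
    """Penalty grows with the number of devices violating the delay limit."""
    violations = sum(1 for d in delays if d > max_delay)
    return -per_device * violations

delays_a = [0.8, 1.4, 1.6, 0.9]   # two devices too slow (made-up seconds)
delays_b = [0.8, 1.4, 0.7, 0.9]   # one device too slow
same_signal = fixed_penalty(delays_a, 1.0) == fixed_penalty(delays_b, 1.0)
better_b = graded_penalty(delays_b, 1.0) > graded_penalty(delays_a, 1.0)
```

With the fixed penalty, both actions look equally bad to the agent. With the graded one, the action that violates fewer constraints scores higher, which gives the learner a direction towards valid actions.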


Article 171

Title@2025-07-09 (3): AdeptHEQ-FL: Adaptive Homomorphic Encryption for Federated Learning of Hybrid Classical-Quantum Models with Dynamic Layer Sparing

Title: AdeptHEQ-FL: Adaptive Homomorphic Encryption for Federated Learning of Hybrid Classical-Quantum Models with Dynamic Layer Sparing AdeptHEQ-FL: Adaptive homomorphe Verschlüsselung für das Federated Learning von Hybrid-Klassisch-Quantum-Modellen mit dynamischem Layer-Sparing AdeptHEQ-FL:具有动态图层分隔的混合古量模型的联邦学习 2507.07316v1

Authors (5): Md Abrar Jahin, Taufikur Rahman Fuad, M. F. Mridha, Nafiz Fahad, Md. Jakir Hossen

Federated Learning (FL) faces inherent challenges in balancing model performance, privacy preservation, and communication efficiency, especially in non-IID decentralized environments. Recent approaches either sacrifice formal privacy guarantees, incur high overheads, or overlook quantum-enhanced expressivity. We introduce AdeptHEQ-FL, a unified hybrid classical-quantum FL framework that integrates (i) a hybrid CNN-PQC architecture for expressive decentralized learning, (ii) an adaptive accuracy-weighted aggregation scheme leveraging differentially private validation accuracies, (iii) selective homomorphic encryption (HE) for secure aggregation of sensitive model layers, and (iv) dynamic layer-wise adaptive freezing to minimize communication overhead while preserving quantum adaptability. We establish formal privacy guarantees, provide convergence analysis, and conduct extensive experiments on the CIFAR-10, SVHN, and Fashion-MNIST datasets. AdeptHEQ-FL achieves a $\approx 25.43\%$ and $\approx 14.17\%$ accuracy improvement over Standard-FedQNN and FHE-FedQNN, respectively, on the CIFAR-10 dataset. Additionally, it reduces communication overhead by freezing less important layers, demonstrating the efficiency and practicality of our privacy-preserving, resource-aware design for FL.

联邦学习(FL)在平衡模型性能、隐私保护和通信效率方面面临固有挑战,在非独立同分布(non-IID)的去中心化环境中尤其如此。近期的方法要么牺牲形式化的隐私保证,要么带来高昂开销,要么忽视量子增强的表达能力。我们提出 AdeptHEQ-FL,这是一个统一的经典-量子混合 FL 框架,它集成了:(i) 用于高表达力去中心化学习的混合 CNN-PQC 架构;(ii) 利用差分隐私验证精度的自适应精度加权聚合方案;(iii) 用于安全聚合敏感模型层的选择性同态加密(HE);(iv) 动态逐层自适应冻结,在保持量子适应性的同时尽量降低通信开销。我们建立了形式化的隐私保证,给出了收敛性分析,并在 CIFAR-10、SVHN 和 Fashion-MNIST 数据集上进行了大量实验。在 CIFAR-10 数据集上,AdeptHEQ-FL 相对 Standard-FedQNN 和 FHE-FedQNN 的精度分别提高约 25.43% 和 14.17%。此外,它通过冻结不太重要的层来降低通信开销,体现了这一隐私保护、资源感知的 FL 设计的效率和实用性。
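A rough sketch of accuracy-weighted aggregation with noised validation accuracies, in the spirit of component (ii) above. The Laplace noise scale here is arbitrary and not calibrated to any privacy budget, and the normalization rule is an assumption. Homomorphic encryption and layer freezing are not modelled.

```python
import random

def laplace_noise(rng, scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def aggregate(client_params, val_accuracies, noise_scale=0.01, seed=0):
    """Average client parameter vectors, weighted by noised validation accuracy."""
    rng = random.Random(seed)
    noisy = [max(a + laplace_noise(rng, noise_scale), 1e-6) for a in val_accuracies]
    total = sum(noisy)
    weights = [a / total for a in noisy]
    dim = len(client_params[0])
    merged = [sum(w * p[k] for w, p in zip(weights, client_params))
              for k in range(dim)]
    return merged, weights

clients = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy parameter vectors
accuracies = [0.9, 0.6, 0.3]                      # hypothetical validation accuracies
merged, weights = aggregate(clients, accuracies)
```

Clients that validate better pull the global model towards their parameters, while the added noise limits how much the reported accuracy reveals about any one client's data.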


Article 172

Title@2025-07-09 (3): Frontier LLMs Still Struggle with Simple Reasoning Tasks

Title: Frontier LLMs Still Struggle with Simple Reasoning Tasks Frontier LLMs kämpfen immer noch mit einfachen, vernünftigen Aufgaben 仍与简单理由任务抗争的前沿LLMS 2507.07313v1

Authors (6): Alan Malek, Jiawei Ge, Nevena Lazic, Chi Jin, András György, Csaba Szepesvári

While state-of-the-art large language models (LLMs) demonstrate advanced reasoning capabilities, achieving remarkable performance on challenging competitive math and coding benchmarks, they also frequently fail on tasks that are easy for humans. This work studies the performance of frontier LLMs on a broad set of such “easy” reasoning problems. By extending previous work in the literature, we create a suite of procedurally generated simple reasoning tasks, including counting, first-order logic, proof trees, and travel planning, with changeable parameters (such as document length or the number of variables in a math problem) that can arbitrarily increase the amount of computation required to produce the answer while preserving the fundamental difficulty. While previous work showed that traditional, non-thinking models can be made to fail on such problems, we demonstrate that even state-of-the-art thinking models consistently fail on such problems and for similar reasons (e.g., statistical shortcuts, errors in intermediate steps, and difficulties in processing long contexts). To further understand the behavior of the models, we introduce the unpuzzles dataset, a different “easy” benchmark consisting of trivialized versions of well-known math and logic puzzles. Interestingly, while modern LLMs excel at solving the original puzzles, they tend to fail on the trivialized versions, exhibiting several systematic failure patterns related to memorizing the originals. We show that this happens even if the models are otherwise able to solve problems with different descriptions but requiring the same logic. Our results highlight that out-of-distribution generalization is still problematic for frontier language models and the new generation of thinking models, even for simple reasoning tasks, and that making tasks easier does not necessarily imply improved performance.

虽然最先进的大型语言模型(LLMS)展示了先进的推理能力,在具有竞争力的数学和编码基准上取得了卓越的杰出成绩,但是这些模型也常常在对人类来说容易的任务上失败。 这项工作研究了一系列广泛的“ 容易” 推理问题的前沿LLMS的性能。 通过扩展文献中以前的工作,我们创造了一套程序上产生的简单推理任务,包括计数、一阶逻辑、校对树和旅行规划,以及可更改的参数(如文件长度等,或数学问题中的变量数量),这些参数可以任意增加生成答案所需的计算数量,同时保存基本困难。 虽然以前的工作表明传统、非思维模型可以导致这些问题的失败。 即使是最先进的思维模型也总是在这些问题上失败(例如统计捷径、中间步骤错误、长期处理困难 ) 。 为了进一步理解模型的行为,我们引入了不易被夸大的数据集,我们不得不用不易被夸大的数据模型,一个不同的“容易”基准,即原始的逻辑模型可以使最初的模型变得模糊化,而令人费解的数学和逻辑性任务也使某些逻辑的模型变得令人费解。
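To make "procedurally generated tasks with changeable parameters" concrete, here is a small hypothetical generator for a counting task whose document length can be scaled freely while the underlying question stays the same. The vocabulary and prompt wording are invented for illustration and are not taken from the benchmark.

```python
import random

def make_counting_task(length, target="apple", seed=0):
    """Build a word list of the given length and the true count of `target`."""
    rng = random.Random(seed)
    vocab = [target, "pear", "plum", "fig"]
    words = [rng.choice(vocab) for _ in range(length)]
    prompt = f"How many times does '{target}' appear in: {' '.join(words)}?"
    return prompt, words.count(target)

short_prompt, short_answer = make_counting_task(length=10)
long_prompt, long_answer = make_counting_task(length=1000)
```

Because the ground truth is computed alongside the prompt, any model answer can be checked exactly at every difficulty setting.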


Article 173

Title@2025-07-09 (3): From Images to Signals: Are Large Vision Models Useful for Time Series Analysis?

Title: From Images to Signals: Are Large Vision Models Useful for Time Series Analysis? Von Bildern zu Signalen: Sind große Visionsmodelle für die Zeitreihenanalyse nützlich? 从图像到信号:大型视觉模型对时间序列分析有用吗? 2505.24030v2

Authors (7): Ziming Zhao, ChengAo Shen, Hanghang Tong, Dongjin Song, Zhigang Deng, Qingsong Wen, Jingchao Ni

Transformer-based models have gained increasing attention in time series research, driving interest in Large Language Models (LLMs) and foundation models for time series analysis. As the field moves toward multi-modality, Large Vision Models (LVMs) are emerging as a promising direction. In the past, the effectiveness of Transformers and LLMs in time series has been debated. When it comes to LVMs, a similar question arises: are LVMs truly useful for time series analysis? To address it, we design and conduct the first principled study involving 4 LVMs, 8 imaging methods, 18 datasets and 26 baselines across both high-level (classification) and low-level (forecasting) tasks, with extensive ablation analysis. Our findings indicate LVMs are indeed useful for time series classification but face challenges in forecasting. Although effective, the contemporary best LVM forecasters are limited to specific types of LVMs and imaging methods, exhibit a bias toward forecasting periods, and have limited ability to utilize long look-back windows. We hope our findings could serve as a cornerstone for future research on LVM- and multimodal-based solutions to different time series tasks.

以变换器为基础的模型在时间序列研究中日益受到重视,吸引人们对大语言模型和时间序列分析基础模型的兴趣。随着实地走向多模式、大愿景模型(LVM)的出现,这是一个充满希望的方向。过去,对变换器和LLMS在时间序列中的有效性进行了辩论。在LVM方面,也出现了一个类似的问题:LVMs对时间序列分析是否真正有用?为了解决这个问题,我们设计并开展了第一次原则性研究,涉及4 LVMs、8个成像方法、18个数据集和26个基线,涵盖高级(分类)和低级别(预测)任务,并进行了广泛的通货膨胀分析。我们的调查结果表明LVMs确实可用于时间序列分类,但面临预测方面的挑战。尽管有效,但当代最好的LMMS预报器仅限于特定的LVMs和成像方法类型,对预测期表现出偏向,而且利用长视向后视窗口的能力有限。我们希望我们的调查结果能够成为未来对LVM和基于多式联运的解决方案进行不同时间序列的未来研究的基石。
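An "imaging method" turns a time series into a picture that a vision model can consume. The sketch below shows one simple possibility, stacking consecutive periods as image rows and scaling values to grey levels. It is an illustrative assumption and not necessarily one of the eight methods evaluated in the study.

```python
import math

def series_to_image(series, period):
    """Stack consecutive periods as rows and scale values to 0..255 grey levels."""
    usable = len(series) - len(series) % period   # drop the incomplete last period
    lo, hi = min(series[:usable]), max(series[:usable])
    span = (hi - lo) or 1.0                       # avoid division by zero on flat series
    pixels = [round(255 * (x - lo) / span) for x in series[:usable]]
    return [pixels[r:r + period] for r in range(0, usable, period)]

# One week of hourly data with a daily cycle, plus 5 trailing points.
series = [math.sin(2 * math.pi * t / 24) for t in range(24 * 7 + 5)]
image = series_to_image(series, period=24)
```

With this layout, periodic structure shows up as vertical patterns across rows. This suggests one reason such forecasters might be sensitive to the chosen period.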


Article 174

Title@2025-07-09 (3): Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing

Title: Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing Spektrale Estimatoren für strukturierte generalisierte Linearmodelle über ungefähre Message Passing 通过近似消息传递为结构化通用线性模型制作的光谱光度模拟器 2308.14507v4

Authors (4): Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli

We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d. Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix $\Sigma$. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of designs, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant models. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.

我们在一个高层次的通用线性模型中考虑参数估算问题。通过一个适当数据依赖矩阵的主成象仪的主成象仪获得的光谱方法提供了简单而令人惊讶的有效解决方案。然而,尽管这种方法使用范围广泛,但严格的性能定性以及数据预处理的原则性方法只能用于结构化的设计(即,d.d.\高森和哈尔或奥特索戈纳尔),而与之相反,现实世界的数据矩阵结构高度结构化,且显示非三角的关联性。为解决这一问题,我们认为,通过一个变量矩阵($\Sigma$)采集特征的异谱性特征的相联设计提供了简单而令人惊讶的有效解决方案。我们的主要结果是,对光谱估计器的性能进行精确的描述。这使我们能够确定最佳的预处理方法,最大限度地减少参数估计所需的样本数量。令人惊讶的是,这种预处理在一系列广泛的设计中是普遍的,其中部分涉及对可轮换性模型最佳光谱度定像度设置的预测。我们在通用的基因矩阵模型中,包括共同的基质的精确的计算方法,在以往的基质结构分析方法上改进了共同的精确的计算方法。
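A spectral estimator of the kind discussed above takes the principal eigenvector of $D = \frac{1}{n}\sum_i \mathcal{T}(y_i)\, x_i x_i^\top$ for some preprocessing function $\mathcal{T}$. The sketch below uses an i.i.d. Gaussian design (identity covariance, so not the correlated setting of the paper), a phase-retrieval-style model, and a simple truncation as a heuristic preprocessing. The optimal preprocessing derived in the paper is not reproduced.

```python
import math
import random

def spectral_estimate(xs, ys, preprocess, iters=200):
    """Top eigenvector of D = (1/n) * sum_i preprocess(y_i) * x_i x_i^T (power iteration)."""
    n, d = len(xs), len(xs[0])
    D = [[0.0] * d for _ in range(d)]
    for x, y in zip(xs, ys):
        w = preprocess(y) / n
        for a in range(d):
            for b in range(d):
                D[a][b] += w * x[a] * x[b]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        v = [sum(D[a][b] * v[b] for b in range(d)) for a in range(d)]
        scale = math.sqrt(sum(c * c for c in v))
        v = [c / scale for c in v]
    return v

rng = random.Random(0)
d, n = 5, 5000
beta = [0.6, 0.8, 0.0, 0.0, 0.0]                               # unit-norm signal (made up)
xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
ys = [sum(b * c for b, c in zip(beta, x)) ** 2 for x in xs]    # y = (x . beta)^2
estimate = spectral_estimate(xs, ys, preprocess=lambda y: min(y, 5.0))
overlap = abs(sum(b * c for b, c in zip(beta, estimate)))      # sign is not identifiable
```

The absolute value in `overlap` reflects that the model only determines the signal up to sign.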


Article 175

Title@2025-07-09 (3): Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes

Title: Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes Beschreibung des Ausbildungsprozesses von Neuronalen Netzwerken über Ergodic Theorem : Geisterknoten 描述Ergodic定理神经网络培训过程:幽灵节点 2507.01003v2

Authors (2): Eun-Ji Park, Sangwon Yun

Recent studies have proposed interpreting the training process from an ergodic perspective. Building on this foundation, we present a unified framework for understanding and accelerating the training of deep neural networks via stochastic gradient descent (SGD). By analyzing the geometric landscape of the objective function, we introduce a practical diagnostic, the running estimate of the largest Lyapunov exponent, which provably distinguishes genuine convergence toward stable minimizers from mere statistical stabilization near saddle points. We then propose a ghost category extension for standard classifiers that adds auxiliary ghost output nodes, so that the model gains extra descent directions that open a lateral corridor around narrow loss barriers and enable the optimizer to bypass poor basins during the early training phase. We show that this extension strictly reduces the approximation error and that, after sufficient convergence, the ghost dimensions collapse so that the extended model coincides with the original one; moreover, there exists a path in the enlarged parameter space along which the total loss does not increase. Taken together, these results provide a principled architecture-level intervention that accelerates early-stage trainability while preserving asymptotic behavior and simultaneously serves as an architecture-friendly regularizer.

最近的研究提议从性别角度来解释培训过程。 在此基础上, 我们提出了一个统一的框架, 以便通过随机梯度梯度下降(SGD)来理解和加速培训深神经网络。 通过分析客观功能的几何景观, 我们引入了一个实际的诊断, 正在对最大的Lyapunov Expent 进行估计, 这可以将向稳定的最小化点的真正趋同与在马鞍附近仅具有统计稳定性的最小化点区分开来。 然后我们建议为标准分类器增加一个幽灵类别扩展, 增加辅助幽灵输出节点, 使模型获得额外的下降方向, 围绕狭窄的损失障碍打开横向走廊, 并使优化者能够在早期培训阶段绕过贫穷盆地。 我们表明, 这一扩展将严格减少近似误差, 在完全融合了幽灵尺寸后, 扩展模型会与原始模型相吻合, 并且存在一条路径, 在总损失不会增加的扩大的参数空间中存在一条路径。 这些结果提供了一种原则性的结构级干预, 加速早期的阶段训练, 同时保持“ 保护” 行为, 并同时充当一个对建筑友好的固定器 。
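The collapse property of ghost output nodes can be illustrated with a plain softmax cross-entropy. This minimal sketch assumes ghost nodes are simply extra logits that no label ever points to, which is a reading of the abstract and not a confirmed implementation. It shows that once the ghost logits become very negative, the extended loss matches the original loss.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def ghost_cross_entropy(real_logits, ghost_logits, label):
    """Loss over real plus ghost output nodes; labels only point at real classes."""
    return cross_entropy(list(real_logits) + list(ghost_logits), label)

real = [2.0, 0.5, -1.0]
base = cross_entropy(real, label=0)
early = ghost_cross_entropy(real, [0.0, 0.0], label=0)      # ghosts still active
late = ghost_cross_entropy(real, [-30.0, -30.0], label=0)   # ghosts collapsed
```

Here `late` agrees with `base` to within floating-point precision, while `early` differs because the ghost nodes still hold probability mass.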


Article 176

Title@2025-07-09 (3): Multilayer GNN for Predictive Maintenance and Clustering in Power Grids

Title: Multilayer GNN for Predictive Maintenance and Clustering in Power Grids Multilayer GNN für Predictive Maintenance und Clustering in Power Grids 用于电力网的预报维护和集群的多层次GNN(多层次GNN) 2507.07298v1

Authors (5): Muhammad Kazim, Harun Pirim, Chau Le, Trung Le, Om Prakash Yadav

Unplanned power outages cost the US economy over $150 billion annually, partly due to predictive maintenance (PdM) models that overlook spatial, temporal, and causal dependencies in grid failures. This study introduces a multilayer Graph Neural Network (GNN) framework to enhance PdM and enable resilience-based substation clustering. Using seven years of incident data from Oklahoma Gas & Electric (292,830 records across 347 substations), the framework integrates Graph Attention Networks (spatial), Graph Convolutional Networks (temporal), and Graph Isomorphism Networks (causal), fused through attention-weighted embeddings. Our model achieves a 30-day F1-score of 0.8935 ± 0.0258, outperforming XGBoost and Random Forest by 3.2% and 2.7%, and single-layer GNNs by 10-15%. Removing the causal layer drops performance to 0.7354 ± 0.0418. For resilience analysis, HDBSCAN clustering on HierarchicalRiskGNN embeddings identifies eight operational risk groups. The highest-risk cluster (Cluster 5, 44 substations) shows 388.4 incidents/year and a 602.6-minute recovery time, while low-risk groups report fewer than 62 incidents/year. ANOVA (p < 0.0001) confirms significant inter-cluster separation. Our clustering outperforms K-Means and Spectral Clustering with a Silhouette Score of 0.626 and a Davies-Bouldin index of 0.527. This work supports proactive grid management through improved failure prediction and risk-aware substation clustering.

计划外停电使美国经济每年花费超过1 500亿美元,部分原因是预测性维护模型(PdM)忽略了电网故障的空间、时间和因果依赖性。本研究引入了一个多层图像神经网络框架(GNN),以加强电磁并促成基于复原力的子站群。使用来自俄克拉荷马天然气电气公司(292,830个记录,覆盖347个子站)的七年事件数据,框架整合了图表关注网络(空间)、图表革命网络(时间)和图表变异性网络(因子),这些模型通过关注加权嵌入嵌入系统连接起来。我们的模型实现了30天F1核心,即0.8935+/0.0258,比XGBOst和随机森林高出3.2%和2.7%,而单层GNNNS的事故增加了10-15%。将因果层的性能下降到0.354+/-04.18。用于复原力分析,HDBCCCGN内嵌入式嵌入式嵌入式网络网络(0.5号),比我们VILSLSLIA类 级的恢复风险最高类别和低40级(SLA级) 级) 和低级类。


Article 177

Title@2025-07-09 (3): Time Series Foundation Models for Multivariate Financial Time Series Forecasting

Title: Time Series Foundation Models for Multivariate Financial Time Series Forecasting Time Series Foundation Modelle für Multivariate Finanzzeitreihen Prognose 多变财务时间序列预测模型 2507.07296v1

Authors (1): Ben A. Marconi

Financial time series forecasting presents significant challenges due to complex nonlinear relationships, temporal dependencies, variable interdependencies and limited data availability, particularly for tasks involving low-frequency data, newly listed instruments, or emerging market assets. Time Series Foundation Models (TSFMs) offer a promising solution through pretraining on diverse time series corpora followed by task-specific adaptation. This study evaluates two TSFMs (Tiny Time Mixers (TTM) and Chronos) across three financial forecasting tasks: US 10-year Treasury yield changes, EUR/USD volatility, and equity spread prediction. Results demonstrate that TTM exhibits strong transferability. When fine-tuning both the pretrained version of TTM and an untrained model with the same architecture, the pretrained version achieved 25-50% better performance when fine-tuned on limited data and 15-30% improvements even when fine-tuned on lengthier datasets. Notably, TTM’s zero-shot performance outperformed naive benchmarks in volatility forecasting and equity spread prediction, with the latter demonstrating that TSFMs can surpass traditional benchmark models without fine-tuning. The pretrained model consistently required 3-10 fewer years of data than the untrained model to reach comparable performance, demonstrating significant sample-efficiency gains. However, while TTM outperformed naive baselines, traditional specialised models matched or exceeded its performance in two of three tasks, suggesting TSFMs prioritise breadth over task-specific optimisation. These findings indicate that TSFMs, though still nascent, offer substantial promise for financial forecasting, particularly in noisy, data-constrained tasks, but achieving competitive performance likely requires domain-specific pretraining and architectural refinements tailored to financial time series characteristics.

由于复杂的非线性关系、时间依赖性、不同相互依存性和有限数据可用性,特别是涉及低频数据、新上市工具或新兴市场资产的任务,时间系列基金会模型(TSFMs)通过对不同时间序列公司进行预先培训,然后根据具体任务进行调整,提供了一个有希望的解决办法。本研究报告评估了三个财务预测任务中的两个TSFM(Tiny Ti-Time Mixers和Chronos),这三项财务预测任务包括:美国10年期国库收益变化、欧元/美元波动和公平分布预测。结果显示,TTM公司具有很强的可转移性。当对TTM公司预培训版和同一结构的未培训模式进行微调时,在微调有限数据后,即实现了25-50%的改进,甚至对更长时间的调整。 值得注意的是,TTM公司在波动预测和公平分布预测中,零表现优于天真基准,后者表明,TSFM公司在不作微调的情况下可以超过传统的基准模型。


Article 178

Title@2025-07-09 (3): Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning

Title: Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning Thermodynamische Vorhersage ermöglicht durch automatisches Datensatzaufbauen und maschinelles Lernen 通过自动数据集建立和机器学习实现的热动力预测 2507.07293v1

Authors (7): Juejing Liu, Haydn Anderson, Noah I. Waxman, Vsevolod Kovalev, Byron Fisher, Elizabeth Li, Xiaofeng Guo

New discoveries in chemistry and materials science, with their rapidly expanding volume of requisite knowledge and experimental workload, provide unique opportunities for machine learning (ML) to play a critical role in accelerating research. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature reviews, and (2) the training of an ML model to predict chemical knowledge (thermodynamic parameters). Our LLM-based literature review tool (LMExt) successfully extracted chemical information, and data from other fields, into a machine-readable structure, including stability constants for metal cation-ligand interactions, thermodynamic properties, and broader data types (medical research papers and financial reports), effectively overcoming the challenges inherent in each domain. Using the automatically acquired thermodynamic data, an ML model was trained with the CatBoost algorithm to accurately predict thermodynamic parameters (e.g., enthalpy of formation) of minerals. This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.

化学和材料科学的新发现,随着所需知识和实验工作量的不断增加,为机器学习提供了独特的机会,以在加速研究效率方面发挥关键作用。在这里,我们展示:(1) 使用大型语言模型进行自动文献审查,(2) 培训一个ML模型来预测化学知识(热力参数),我们基于LLM的文献审查工具(LMExt)成功地将化学信息带入一个机器可读的结构,包括金属电离层和相互作用的稳定性常数、热力特性和其他更广泛的数据类型(医学研究论文和财务报告),有效地克服每个领域固有的挑战。利用自动获取热力数据,利用CatBoost算法培训了一个ML模型,以准确预测矿物的热力参数(例如形成物的刺痛物)。这项工作突出了综合ML方法在重塑化学和材料科学研究方面的变革潜力。


Article 179

Title@2025-07-09 (3): Discretization-independent multifidelity operator learning for partial differential equations

Title: Discretization-independent multifidelity operator learning for partial differential equations Diskretisationsunabhängiges Multifidelitätsoperator-Lernen für partielle Differentialgleichungen 为部分差异方程学习部分差异方程 2507.07292v1

Authors (2): Jacob Hauck, Yanzhi Zhang

We develop a new and general encode-approximate-reconstruct operator learning model that leverages learned neural representations of bases for input and output function distributions. We introduce the concepts of "numerical operator learning" and "discretization independence", which clarify the relationship between theoretical formulations and practical realizations of operator learning models. Our model is discretization-independent, making it particularly effective for multifidelity learning. We establish theoretical approximation guarantees, demonstrating uniform universal approximation under strong assumptions on the input functions and statistical approximation under weaker conditions. To our knowledge, this is the first comprehensive study that investigates how discretization independence enables robust and efficient multifidelity operator learning. We validate our method through extensive numerical experiments involving both local and nonlocal PDEs, including time-independent and time-dependent problems. The results show that multifidelity training significantly improves accuracy and computational efficiency. Moreover, multifidelity training further enhances empirical discretization independence.

我们开发了一个新的通用的编码-近似-重构算子学习模型,利用对输入和输出函数分布的基的学习神经表示。我们引入了“数值算子学习”和“离散化无关性”的概念,澄清了算子学习模型的理论表述与实际实现之间的关系。我们的模型与离散化无关,因此对多保真度学习特别有效。我们建立了理论近似保证,在对输入函数的强假设下证明了一致的通用近似,在较弱条件下证明了统计近似。据我们所知,这是第一项全面研究离散化无关性如何实现稳健而高效的多保真度算子学习的工作。我们通过涉及局部和非局部PDE的大量数值实验验证了我们的方法,包括与时间无关和与时间有关的问题。结果显示,多保真度训练显著提高了准确性和计算效率。此外,多保真度训练进一步增强了经验上的离散化无关性。


Article 180

Title@2025-07-09 (3): Estimating Dataset Dimension via Singular Metrics under the Manifold Hypothesis: Application to Inverse Problems

Title: Estimating Dataset Dimension via Singular Metrics under the Manifold Hypothesis: Application to Inverse Problems Schätzung der Datensatzdimension über Singular Metrics unter der Manifold Hypothese: Anwendung auf inverse Probleme 在 “ 假假设:对反向问题的应用 “ 下,通过单质计量法估算数据集的维度 2507.07291v1

Authors (2): Paola Causin, Alessio Marta

High-dimensional datasets often exhibit low-dimensional geometric structures, as suggested by the manifold hypothesis, which implies that data lie on a smooth manifold embedded in a higher-dimensional ambient space. While this insight underpins many advances in machine learning and inverse problems, fully leveraging it requires dealing with three key tasks: estimating the intrinsic dimension (ID) of the manifold, constructing appropriate local coordinates, and learning mappings between ambient and manifold spaces. In this work, we propose a framework that addresses all these challenges using a Mixture of Variational Autoencoders (VAEs) and tools from Riemannian geometry. We specifically focus on estimating the ID of datasets by analyzing the numerical rank of the VAE decoder pullback metric. The estimated ID guides the construction of an atlas of local charts using a mixture of invertible VAEs, enabling accurate manifold parameterization and efficient inference. We show how this approach enhances solutions to ill-posed inverse problems, particularly in biomedical imaging, by enforcing that reconstructions lie on the learned manifold. Lastly, we explore the impact of network pruning on manifold geometry and reconstruction quality, showing that the intrinsic dimension serves as an effective proxy for monitoring model capacity.
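
The ID estimate described above (the numerical rank of the decoder pullback metric) can be sketched as follows. This is a toy illustration under stated assumptions: `decoder` is a hand-written stand-in for a trained VAE decoder, the Jacobian is approximated by finite differences, and the relative tolerance `rel_tol` is an arbitrary choice. None of these come from the paper.

```python
import numpy as np

def decoder(z):
    """Toy decoder from a 3-D latent space to a 5-D ambient space.

    It ignores z[2], so its image is a 2-D surface (a stand-in for a
    trained VAE decoder whose latent space is larger than the data ID).
    """
    return np.array([z[0], z[1], z[0] * z[1], np.sin(z[0]), np.cos(z[1])])

def jacobian(f, z, eps=1e-6):
    """Central finite-difference Jacobian of f at z, shape (ambient, latent)."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((f(z + step) - f(z - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def pullback_rank(f, z, rel_tol=1e-6):
    """Numerical rank of the pullback metric G = J^T J at latent point z."""
    J = jacobian(f, z)
    G = J.T @ J                      # pullback of the Euclidean metric
    eigvals = np.linalg.eigvalsh(G)  # real eigenvalues, ascending order
    return int(np.sum(eigvals > rel_tol * eigvals.max()))

estimated_id = pullback_rank(decoder, [0.3, -0.7, 1.5])
```

In this toy case the metric has one zero eigenvalue, so the estimated dimension is 2 even though the latent space has 3 coordinates. In practice the rank would presumably be aggregated over many latent points.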

高维数据集往往展示出由多重假设所推荐的低维几何结构,这意味着数据存在于高维环境空间内嵌入的光滑元体上。虽然这种洞察力支持了机器学习的许多进步和反向问题,但充分利用它来应对三项关键任务:估计元体的内在维度(ID),建造适当的地方坐标,在环境空间和多元空间之间绘制学习绘图。在这项工作中,我们提出了一个框架,用以利用动态自动自动成像(VAE)混合体和Riemannian几何仪的工具来应对所有这些挑战。我们特别侧重于通过分析VAE 解码拉回指标的数值等级来估计数据集的标识。估计的ID指导了使用不可忽略的VAE混合体的本地图表的构建图集,使精确的多元参数化和高效的推论得以实现。我们如何利用这种方法加强解决错误的反向问题,特别是生物医学成像(VAEEE)和来自Riemannian几何体的工具。最后,我们探索了网络内置的模型模拟测量质量,以显示模型的模型的模拟质量,以显示其有效模拟的模拟质量。


Article 181

Title@2025-07-09 (3): Natural Evolutionary Search meets Probabilistic Numerics

Title: Natural Evolutionary Search meets Probabilistic Numerics Natürliche evolutionäre Suche trifft auf probabilistische Zahlen 自然进化搜索会议 2507.07288v1

Authors (4): Pierre Osselin, Masaki Adachi, Xiaowen Dong, Michael A. Osborne

Zeroth-order local optimisation algorithms are essential for solving real-valued black-box optimisation problems. Among these, Natural Evolution Strategies (NES) represent a prominent class, particularly well-suited for scenarios where prior distributions are available. By optimising the objective function in the space of search distributions, NES algorithms naturally integrate prior knowledge during initialisation, making them effective in settings such as semi-supervised learning and user-prior belief frameworks. However, due to their reliance on random sampling and Monte Carlo estimates, NES algorithms can suffer from limited sample efficiency. In this paper, we introduce a novel class of algorithms, termed Probabilistic Natural Evolutionary Strategy Algorithms (ProbNES), which enhance the NES framework with Bayesian quadrature. We show that ProbNES algorithms consistently outperform their non-probabilistic counterparts as well as global sample-efficient methods such as Bayesian Optimisation (BO) or $\pi$BO across a wide range of tasks, including benchmark test functions, data-driven optimisation tasks, user-informed hyperparameter tuning tasks and locomotion tasks.
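
To make the Monte Carlo dependence concrete, here is a minimal sketch of a plain search-gradient update of the kind NES-style methods build on. It is not ProbNES: there is no Bayesian quadrature, the search distribution is an isotropic Gaussian with a fixed `sigma`, and only the mean is updated. The objective `sphere` and all hyperparameters are illustrative assumptions.

```python
import random

def sphere(x):
    """Toy black-box objective with its minimum at (1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)

def es_minimise(f, mean, sigma=0.3, lr=0.1, pop=20, iters=300, seed=0):
    """Minimise f by updating the mean of a Gaussian search distribution.

    The gradient of the expected objective with respect to the mean is
    estimated by Monte Carlo with antithetic (mirrored) samples.
    """
    rng = random.Random(seed)
    mean = list(mean)
    dim = len(mean)
    for _ in range(iters):
        grad = [0.0] * dim
        for _ in range(pop):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            f_plus = f([m + sigma * e for m, e in zip(mean, eps)])
            f_minus = f([m - sigma * e for m, e in zip(mean, eps)])
            for i in range(dim):
                grad[i] += (f_plus - f_minus) * eps[i] / (2 * sigma * pop)
        # Descent step on the search-distribution mean.
        mean = [m - lr * g for m, g in zip(mean, grad)]
    return mean

best = es_minimise(sphere, mean=[4.0, -3.0])
```

Each iteration spends `2 * pop` function evaluations on a single gradient estimate, which is the sample-efficiency cost the abstract attributes to Monte Carlo estimation.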

零顺序本地优化算法对于解决实际价值黑盒优化问题至关重要。 其中,自然进化战略(NES)代表着一个突出的类别,特别适合先前分布的情景。通过优化搜索分布空间的客观功能,NES算法自然地结合了初始化过程中的先前知识,使其在半监管学习和用户优先信仰框架等环境中有效。然而,由于依赖随机抽样和蒙特卡洛估计,NES算法可能因抽样效率有限而受到影响。在本文件中,我们引入了新型的算法,称为概率性自然进化战略阿尔戈里什姆(ProbNES),用巴耶斯二次曲线改进了国家空间战略框架。我们表明,ProbNES算法始终优于其非不稳定对应方以及诸如Bayesian Opitimization(BO)或$\pi$BO等全球样本高效方法,跨越了广泛的任务范围,包括基准测试功能、数据驱动的优化任务、用户知情的超分辨率任务和高分辨率任务。


Article 182

Title@2025-07-09 (3): EditLord: Learning Code Transformation Rules for Code Editing

Title: EditLord: Learning Code Transformation Rules for Code Editing EditLord: Regeln zur Code-Transformation für die Code-Editing 编辑主: 学习代码编辑的代码转换规则 2504.15284v4

Authors (6): Weichen Li, Albert Jan, Baishakhi Ray, Junfeng Yang, Chengzhi Mao, Kexin Pei

Code editing is a foundational task in software development, where its effectiveness depends on whether it introduces desired code property changes without changing the original code’s intended functionality. Existing approaches often formulate code editing as an implicit end-to-end task, omitting the fact that code-editing procedures inherently consist of discrete and explicit steps. Thus, they suffer from suboptimal performance and lack of robustness and generalization. We introduce EditLord, a code editing framework that makes the code transformation steps explicit. Our key insight is to employ a language model (LM) as an inductive learner to extract code editing rules from the training code pairs as concise meta-rule sets. Such rule sets will be manifested for each training sample to augment them for finetuning or assist in prompting- and iterative-based code editing. EditLord outperforms the state-of-the-art by an average of 22.7% in editing performance and 58.1% in robustness while achieving 20.2% higher functional correctness across critical software engineering and security applications, LM models, and editing modes.

代码编辑是软件开发中的一项基本任务,其有效性取决于它是否在不改变原代码的预期功能的情况下引入了理想的代码属性变化。现有方法往往将代码编辑作为一种隐含的端对端任务来制定代码编辑,忽略了编码编辑程序本身就包含离散和清晰的步骤这一事实。因此,它们表现欠佳,缺乏稳健性和概括性。我们引入了编辑框架,即使代码转换步骤明确化的代码编辑框架。我们的主要见解是使用一种语言模式(LM)作为感应学习者,从培训代码对对口中提取代码编辑规则,作为简洁的元规则。对于每个培训样本,这些规则将表现为增强它们,以进行微调,或协助快速和反复的代码编辑。编辑能力优于最新水平,平均22.7%的编辑性能和58.1%的强性,同时在关键软件工程和安全应用程序、LM模式和编辑模式中实现20.2%更高的功能正确性。


Article 183

Title@2025-07-09 (3): Smart IoT Security: Lightweight Machine Learning Techniques for Multi-Class Attack Detection in IoT Networks

Title: Smart IoT Security: Lightweight Machine Learning Techniques for Multi-Class Attack Detection in IoT Networks Smart IoT Security: Leichtbautechniken für die Multi-Class-Angrifferkennung in IoT-Netzwerken Smart Iot安全:在IoT网络中用于多级攻击探测的轻型机械学习技术 2502.04057v3

Authors (5): Shahran Rahman Alve, Muhammad Zawad Mahmud, Samiha Islam, Md. Asaduzzaman Chowdhury, Jahirul Islam

The Internet of Things (IoT) is expanding at an accelerated pace, making it critical to have secure networks to mitigate a variety of cyber threats. This study addresses the limitations of multi-class attack detection for IoT devices and presents new machine learning-based lightweight ensemble methods. We used the CICIoT 2023 dataset, which has a total of 34 different attack types grouped into 10 categories, and methodically assessed the performance of a substantial array of current machine learning techniques, with the goal of identifying the best-performing algorithmic choice for IoT application protection. In this work, we focus on ML classifier-based methods to address the challenges presented by the difficult and heterogeneous properties of the attack vectors in IoT ecosystems. The best-performing method was the Decision Tree, achieving 99.56% accuracy and 99.62% F1, indicating this model is capable of detecting threats accurately and reliably. The Random Forest model performed nearly as well, with an accuracy of 98.22% and an F1 score of 98.24%, indicating that ML methods excel in a scenario of high-dimensional data. These findings emphasize the promise of integrating ML classifiers into the protective defenses of IoT devices and provide motivation for pursuing subsequent studies towards scalable, keystroke-based attack detection frameworks. We think that our approach offers a new avenue for constructing complex machine learning algorithms for low-resource IoT devices that strike a balance between accuracy requirements and time efficiency. In summary, these contributions expand and enhance the knowledge of the current IoT security literature, establishing a solid baseline and framework for smart, adaptive security to be used in IoT environments.

物联网(IoT)正在加速扩展,使得建立安全网络以减轻各种网络威胁至关重要。本研究涉及IoT设备多类攻击检测的局限性,并介绍了基于机器学习的轻量级集成新方法。我们使用了CICIoT 2023数据集,该数据集共有34种不同的攻击类型,分为10个类别,并有条不紊地评估了目前大量机器学习技术的性能,目的是确定IoT应用保护的最佳算法选择。在这项工作中,我们侧重于基于ML分类器的方法,以应对IoT生态系统中攻击向量复杂且异构的特性所带来的挑战。表现最好的方法是决策树,准确率达到99.56%,F1达到99.62%,表明该模型能够准确、可靠地检测威胁。随机森林模型的表现也几乎同样出色,准确率为98.22%,F1分数为98.24%,表明ML方法在高维数据场景中表现出色。


Article 184

Title@2025-07-09 (3): Almost Sure Convergence for the Last Iterate of Stochastic Gradient Descent Schemes

Title: Almost Sure Convergence for the Last Iterate of Stochastic Gradient Descent Schemes Fast sichere Konvergenz für die letzte Phase der stochastischen Stufenabfolge 沙粒渐变后代计划最后一次迭代几乎可以肯定的趋同 2507.07281v1

Authors (1): Marcel Hudiani

We study the almost sure convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex or non-convex and its gradient is $\gamma$-Hölder. Using only the discrete Gronwall inequality, without the Robbins-Siegmund theorem or martingale convergence theory, we recover results for both SGD and SHB: $\min_{s\leq t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$ for non-convex objectives and $F(w_t) - F_* = o(t^{2\gamma/(1+\gamma) \cdot \max(p-1,-2p+1)-\epsilon})$ for $\beta \in (0, 1)$ and $\min_{s \leq t} F(w_s) - F_* = o(t^{p-1})$ almost surely for convex objectives. In addition, we prove that SHB with constant momentum parameter $\beta \in (0, 1)$ attains a convergence rate of $F(w_t) - F_* = O(t^{\max(p-1,-2p+1)} \log^2 \frac{t}{\delta})$ with probability at least $1-\delta$ when $F$ is convex, $\gamma = 1$, and the step size is $\alpha_t = \Theta(t^{-p})$ with $p \in (\frac{1}{2}, 1)$.
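
For readers who want to see the setting in code, the following is a minimal sketch of a stochastic heavy ball iteration with step size $\alpha_t = c\,t^{-p}$ and constant momentum $\beta$. It uses the commonly written heavy-ball form, which may differ in detail from the scheme analysed in the paper. The quadratic objective, the Gaussian gradient noise, and the constants `c`, `p`, and `beta` are illustrative assumptions.

```python
import random

def shb(grad, w0, steps, p=0.75, beta=0.5, c=0.5, noise_std=1.0, seed=0):
    """Stochastic heavy ball on a scalar problem; returns the last iterate.

    Update: w_{t+1} = w_t - alpha_t * g_t + beta * (w_t - w_{t-1}),
    with alpha_t = c * t**(-p) and p in (1/2, 1).
    """
    rng = random.Random(seed)
    w_prev = w = w0
    for t in range(1, steps + 1):
        alpha = c * t ** (-p)
        # Noisy gradient oracle: true gradient plus zero-mean Gaussian noise.
        g = grad(w) + noise_std * rng.gauss(0.0, 1.0)
        w_next = w - alpha * g + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

def F(w):
    """Toy convex objective with minimum value F_* = 0 at w = 0."""
    return 0.5 * w * w

def grad_F(w):
    return w

w_last = shb(grad_F, w0=5.0, steps=5000)
```

The quantity of interest in the abstract is the gap $F(w_t) - F_*$ at the last iterate, here `F(w_last)`, rather than an average or a best-so-far value.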

我们研究在参数化设定下,当目标函数 $F$ 为全局凸或非凸且其梯度为 $\gamma$-Hölder 时,随机梯度下降(SGD)和随机重球法(SHB)最后一次迭代的几乎必然收敛速率。仅使用离散的 Gronwall 不等式,而不借助 Robbins-Siegmund 定理或鞅收敛理论,我们得到了 SGD 和 SHB 的结果:对于非凸目标,$\min_{s\leq t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$;对于 $\beta \in (0, 1)$,$F(w_t) - F_* = o(t^{2\gamma/(1+\gamma) \cdot \max(p-1,-2p+1)-\epsilon})$;对于凸目标,$\min_{s \leq t} F(w_s) - F_* = o(t^{p-1})$ 几乎必然成立。此外,我们证明,当 $F$ 为凸、$\gamma = 1$ 且步长为 $\alpha_t = \Theta(t^{-p})$($p \in (\frac{1}{2}, 1)$)时,动量参数恒定为 $\beta \in (0, 1)$ 的 SHB 以至少 $1-\delta$ 的概率达到收敛速率 $F(w_t) - F_* = O(t^{\max(p-1,-2p+1)} \log^2 \frac{t}{\delta})$。


Article 185

Title@2025-07-09 (3): TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores

Title: TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores TRIP: Ein nichtparametrischer Test zur Diagnose von Biased Feature Importance Scores TRIP: 诊断基线地物重要性分数的非参数测试 2507.07276v1

Authors (2): Aaron Foote, Danny Krizanc

Along with accurate prediction, understanding the contribution of each feature to the making of the prediction, i.e., the importance of the feature, is a desirable and arguably necessary component of a machine learning model. For a complex model such as a random forest, such importances are not innate – as they are, e.g., with linear regression. Efficient methods have been created to provide such capabilities, with one of the most popular among them being permutation feature importance due to its efficiency, model-agnostic nature, and perceived intuitiveness. However, permutation feature importance has been shown to be misleading in the presence of dependent features as a result of the creation of unrealistic observations when permuting the dependent features. In this work, we develop TRIP (Test for Reliable Interpretation via Permutation), a test requiring minimal assumptions that is able to detect unreliable permutation feature importance scores that are the result of model extrapolation. To build on this, we demonstrate how the test can be complemented in order to allow its use in high dimensional settings. Through testing on simulated data and applications, our results show that the test can be used to reliably detect when permutation feature importance scores are unreliable.
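
As background for the problem TRIP targets, here is a minimal sketch of plain permutation feature importance on data with two strongly dependent features. It is not the TRIP test itself. The data-generating process, the fixed `model`, and the use of mean squared error are illustrative assumptions.

```python
import random

def mse(model, X, y):
    """Mean squared error of model predictions on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, rng):
    """Increase in MSE after shuffling each feature column in turn."""
    baseline = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # breaks the link between feature j and the rest
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        scores.append(mse(model, X_perm, y) - baseline)
    return scores

rng = random.Random(0)
X, y = [], []
for _ in range(500):
    x0 = rng.gauss(0.0, 1.0)
    x1 = x0 + 0.1 * rng.gauss(0.0, 1.0)  # strongly dependent on x0
    x2 = rng.gauss(0.0, 1.0)             # irrelevant feature
    X.append([x0, x1, x2])
    y.append(x0)

def model(row):
    """Fixed stand-in for a fitted model that splits weight across x0 and x1."""
    return 0.5 * row[0] + 0.5 * row[1]

importances = permutation_importance(model, X, y, rng)
```

Shuffling `x0` alone produces rows in which `x0` and `x1` disagree wildly, a combination that never occurs in the original data, so the model is scored on inputs it was never meant to handle. This is the extrapolation effect the abstract describes.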

在准确预测的同时,了解每个特征对预测的贡献,即该特征的重要性,是机器学习模型中一个可取和可论证的必要组成部分。对于随机森林等复杂模型来说,这种重要性并不是内在的 – – 因为它们是线性回归。已经制定了高效的方法来提供这种能力,其中最受欢迎的一个是变异特征,因其效率、模型-不可知性以及感觉的直觉而具有重要性。然而,由于在调整依赖性特征时产生了不切实际的观察结果,因此在存在依赖性特征时,变异性特征的重要性被证明具有误导性。在这项工作中,我们开发了TRIP(通过变异性进行可靠解释试验),这一测试要求最低限度的假设能够检测不可靠的变异性特征分数,这是模型外推的结果。要在此基础上,我们证明如何对测试进行补充,以便能够在高维度环境中使用该测试。通过对模拟数据和应用的测试,我们的结果表明,在测算特征分数时,测试可以可靠地用于可靠地检测。


Article 186

Title@2025-07-09 (3): Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time

Title: Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time Jenseits der ATE: Interpretierbare Modellierung von Behandlungseffekten über Dosis und Zeit 超越ATE:可解释的剂量和时间处理效果模型 2507.07271v1

Authors (3): Julianna Piskorz, Krzysztof Kacprzyk, Mihaela van der Schaar

The Average Treatment Effect (ATE) is a foundational metric in causal inference, widely used to assess intervention efficacy in randomized controlled trials (RCTs). However, in many applications – particularly in healthcare – this static summary fails to capture the nuanced dynamics of treatment effects that vary with both dose and time. We propose a framework for modelling treatment effect trajectories as smooth surfaces over dose and time, enabling the extraction of clinically actionable insights such as onset time, peak effect, and duration of benefit. To ensure interpretability, robustness, and verifiability – key requirements in high-stakes domains – we adapt SemanticODE, a recent framework for interpretable trajectory modelling, to the causal setting where treatment effects are never directly observed. Our approach decouples the estimation of trajectory shape from the specification of clinically relevant properties (e.g., maxima, inflection points), supporting domain-informed priors, post-hoc editing, and transparent analysis. We show that our method yields accurate, interpretable, and editable models of treatment dynamics, facilitating both rigorous causal analysis and practical decision-making.

平均治疗效果(ATE)是因果推断的一项基本指标,广泛用于评估随机控制试验(RCTs)的干预效果。然而,在许多应用中,特别是在医疗保健方面,这种静态摘要未能捕捉到剂量和时间随剂量和时间而变化的治疗效果的细微动态。我们提议了一个模型处理效果轨迹的框架,作为剂量和时间平滑的表面,以便提取临床可操作的洞察力,如起始时间、峰值效应和效益的持续时间。为了确保可解释性、稳健性和可核查性 – – 高摄入领域的关键要求 – – 我们调整了可解释轨迹建模的最近框架,即可解释的轨迹建模,以适应从未直接观察到治疗效果的因果环境。我们的方法将轨迹形状的估算与临床相关特性的规格(如峰值、裂点)脱钩,支持领域知情的前期、热后编辑和透明分析。我们显示,我们的方法产生了准确、可解释和可编辑的治疗动态模型,便利严格的因果关系分析和实际决策。


Article 187

Title@2025-07-09 (3): Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors

Title: Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors Robuster multimodaler Lernrahmen für die Erfassung von Aufnahmegesten mit kontaktlosen Radar- und tragbaren IMU-Sensoren 使用无接触雷达和可穿戴的IMU传感器掌握手势的强有力多模式学习框架 2507.07261v1

Authors (3): Chunzhuo Wang, Hans Hallez, Bart Vanrumste

Automated food intake gesture detection plays a vital role in dietary monitoring, enabling objective and continuous tracking of eating behaviors to support better health outcomes. Wrist-worn inertial measurement units (IMUs) have been widely used for this task with promising results. More recently, contactless radar sensors have also shown potential. This study explores whether combining wearable and contactless sensing modalities through multimodal learning can further improve detection performance. We also address a major challenge in multimodal learning: reduced robustness when one modality is missing. To this end, we propose a robust multimodal temporal convolutional network with cross-modal attention (MM-TCN-CMA), designed to integrate IMU and radar data, enhance gesture detection, and maintain performance under missing modality conditions. A new dataset comprising 52 meal sessions (3,050 eating gestures and 797 drinking gestures) from 52 participants is developed and made publicly available. Experimental results show that the proposed framework improves the segmental F1-score by 4.3% and 5.2% over unimodal Radar and IMU models, respectively. Under missing modality scenarios, the framework still achieves gains of 1.3% and 2.4% for missing radar and missing IMU inputs. This is the first study to demonstrate a robust multimodal learning framework that effectively fuses IMU and radar data for food intake gesture detection.

自动食物摄入姿态检测在饮食监测、客观和持续跟踪饮食行为以支持更好的健康结果方面发挥着关键作用。在这项工作中,广泛使用了人工惯性惯性测量单位(IMUs),并取得了可喜的成果。最近,无接触的雷达传感器也显示出潜力。这项研究探讨了通过多式联运学习将可磨损和无接触的感测模式相结合是否可以进一步提高检测性能。我们还应对了多式联运学习中的一项重大挑战:缺少一种模式时会降低强度。为此,我们提议建立一个强有力的多式时际共变网络,提供跨模式的关注(MMM-TCN-CMA),旨在整合IMU和雷达数据,加强手势检测,并在缺失模式条件下保持性能。由52名参与者52次餐会(3 050次吃手势和797次饮意感)组成的新数据集正在开发并公开提供。实验结果表明,拟议的框架将分节F1分数分数分别改善4.3%和5.2%的单式雷达和IMU模型。在缺失模式假设下,框架仍然取得了1.3%和2.4%的收益,用于缺失的MUMU的磁感测和缺失的MU数据。这一研究展示了缺失的MU的模型数据。该测试和IMFMU的学习框架。


Article 188

Title@2025-07-09 (3): Cryptogenic stroke and migraine: using probabilistic independence and machine learning to uncover latent sources of disease from the electronic health record

Title: Cryptogenic stroke and migraine: using probabilistic independence and machine learning to uncover latent sources of disease from the electronic health record Kryptogener Schlaganfall und Migräne: Verwendung von probabilistischer Unabhängigkeit und maschinellem Lernen, um latente Krankheitsquellen aus dem elektronischen Gesundheitsbericht zu entdecken 低温和偏头痛:利用概率独立和机器学习,从电子健康记录中发现潜在的疾病源 2505.04631v2

Authors (3): Joshua W. Betts, John M. Still, Thomas A. Lasko

Migraine is a common but complex neurological disorder that doubles the lifetime risk of cryptogenic stroke (CS). However, this relationship remains poorly characterized, and few clinical guidelines exist to reduce this associated risk. We therefore propose a data-driven approach to extract probabilistically-independent sources from electronic health record (EHR) data and create a 10-year risk-predictive model for CS in migraine patients. These sources represent external latent variables acting on the causal graph constructed from the EHR data and approximate root causes of CS in our population. A random forest model trained on patient expressions of these sources demonstrated good accuracy (ROC 0.771) and identified the top 10 most predictive sources of CS in migraine patients. These sources revealed that pharmacologic interventions were the most important factor in minimizing CS risk in our population and identified a factor related to allergic rhinitis as a potential causative source of CS in migraine patients.

微生物是一种常见但复杂的神经疾病,使低温中风的终生风险翻一番。然而,这种关系的特点仍然很差,很少有临床准则可以降低这一相关风险。因此,我们提议采用数据驱动方法,从电子健康记录数据中提取概率独立的来源,并为偏头痛病人的慢性中风形成一个10年风险预测模型。这些来源代表着外部潜在变数,根据EHR数据构建的因果图和我国人口中高温中风的大致根源发挥作用。一个随机森林模型对这些来源的病人表达方式进行了培训,该模型的准确性很高(0.771 ROC 0.771),并确定了在偏头痛病人中CS的十大预测来源。这些来源表明,药理科干预是减少我们人口中的慢性中风风险的最重要因素,并确定了与过敏病有关的因素,作为偏头痛病人的CS的潜在诱导源。


Article 189

Title@2025-07-09 (3): AXLearn: Modular Large Model Training on Heterogeneous Infrastructure

Title: AXLearn: Modular Large Model Training on Heterogeneous Infrastructure AXLearn: Modulares Großes Modelltraining auf heterogener Infrastruktur AXLearn:关于不同差异基础设施的模块大型示范培训 2507.05411v2

Authors (37): Mark Lee, Tom Gunter, Chang Lan, John Peebles, Hanzhi Zhou, Kelvin Zou, Sneha Bangalore, Chung-Cheng Chiu, Nan Du, Xianzhi Du, Philipp Dufter, Ruixuan Hou, Haoshuo Huang, Dongseong Hwang, Xiang Kong, Jinhao Lei, Tao Lei, Meng Li, Li Li, Jiarui Lu, Zhiyun Lu, Yiping Ma, David Qiu, Vivek Rathod, Senyu Tong, Zhucheng Tu, Jianyu Wang, Yongqiang Wang, Zirui Wang, Floris Weers, Sam Wiseman, Guoli Yin, Bowen Zhang, Xiyou Zhou, Danyang Zhuo, Cheng Leong, Ruoming Pang

We design and implement AXLearn, a production deep learning system that facilitates scalable and high-performance training of large deep learning models. Compared to other state-of-the-art deep learning systems, AXLearn has a unique focus on modularity and support for heterogeneous hardware infrastructure. AXLearn’s internal interfaces between software components follow strict encapsulation, allowing different components to be assembled to facilitate rapid model development and experimentation on heterogeneous compute infrastructure. We introduce a novel method of quantifying modularity via Lines-of-Code (LoC)-complexity, which demonstrates how our system maintains constant complexity as we scale the components in the system, compared to linear or quadratic complexity in other systems. This allows integrating features such as Rotary Position Embeddings (RoPE) into AXLearn across hundreds of modules with just 10 lines of code, compared to the hundreds of lines required in other systems. At the same time, AXLearn maintains equivalent performance compared to state-of-the-art training systems. Finally, we share our experience in the development and operation of AXLearn.

我们设计并实施了AXLearn, 这是一种有助于对大型深层学习模式进行可扩缩和高性能培训的生产深层学习系统。与其他最先进的深层学习系统相比, AXLearn 以模块性和对多种硬件基础设施的支持为独特的重点。AXLearn 软件组件之间的内部界面采用了严格的封装,允许将不同的组件组装起来,以便利对多种计算基础设施进行快速模型开发和实验。我们采用了一种新的方法,通过Code(LoC)兼容性来量化模块性。我们采用的方法,以量化模块性,说明我们系统在扩大系统组件的规模时,与其他系统中的线性或二次式复杂性相比,如何保持始终的复杂性。这使我们得以将扶轮定位嵌入(ROPE)等特征纳入AXLearn的100个模块,只有10行代码,而其他系统则需要几百行。与此同时, AXLearn 保持与最新培训系统同等的性能。最后,我们分享我们在AXLears的开发和操作中的经验。


Article 190

Title@2025-07-09 (3): Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning

Title: Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning Ausnutzen von Edge-Features für übertragbare adversariale Angriffe im verteilten maschinellen Lernen 在分配式机器学习中利用可转移的对地攻击的边缘特征 2507.07259v1

Authors (5): Giulio Rossolini, Fabio Brau, Alessandro Biondi, Battista Biggio, Giorgio Buttazzo

As machine learning models are increasingly deployed at the edge of Internet of Things environments, a partitioned deep learning paradigm in which models are split across multiple computational nodes introduces a new dimension of security risk. Unlike traditional inference setups, these distributed pipelines spread the model computation across heterogeneous nodes and communication layers, thereby exposing a broader attack surface to potential adversaries. Building on these motivations, this work explores a previously overlooked vulnerability: even when both the edge and cloud components of the model are inaccessible (i.e., black-box), an adversary who intercepts the intermediate features transmitted between them can still pose a serious threat. We demonstrate that, under these mild and realistic assumptions, an attacker can craft highly transferable proxy models, making the entire deep learning system significantly more vulnerable to evasion attacks. In particular, the intercepted features can be effectively analyzed and leveraged to distill surrogate models capable of crafting highly transferable adversarial examples against the target model. To this end, we propose an exploitation strategy specifically designed for distributed settings, which involves reconstructing the original tensor shape from vectorized transmitted features using simple statistical analysis, and adapting surrogate architectures accordingly to enable effective feature distillation. A comprehensive and systematic experimental evaluation has been conducted to demonstrate that surrogate models trained with the proposed strategy, i.e., leveraging intermediate features, tremendously improve the transferability of adversarial attacks. These findings underscore the urgent need to account for intermediate feature leakage in the design of secure distributed deep learning systems.

随着机器学习模型越来越多地在事物环境的互联网边缘部署,一种分化的深层次学习模式,将模型分成多个计算节点,引入了安全风险的新层面。与传统的推理设置不同,这些分布式管道跨越了不同节点和通信层的模型计算,从而暴露了更广泛的攻击表面。基于这些动机,这项工作探索了一种以前忽视的脆弱性:即使模型的边缘和云层部分都无法进入(即黑箱),截获它们之间传输的中间特征的对手仍会构成严重的威胁。我们证明,根据这些轻度和现实的假设,攻击者可以制造高度可转移的代用模型,使整个深层次的学习系统更容易被规避攻击。特别是,截取的特征可以进行有效分析和利用,以蒸馏能够针对目标模型设计高度可转移的对抗性实例。为此,我们提出了专门为分布式环境设计的开发战略,其中涉及利用简单的统计分析,从传导传输的深层特征重建原始的变压形状,并相应地调整代位结构结构结构,从而使得经过培训的中期设计模型能够有效地进行细化。


Article 191

Title@2025-07-09 (3): A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms

Title: A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms Ein sprachgetriebener Rahmen zur Verbesserung personalisierter Empfehlungen: Zusammenführung von LLMs mit traditionellen Algorithmen 改进个性建议的语言驱动框架:将LLMs与传统等级合并 2507.07251v1

Authors (2): Aaron Goldstein, Ayan Dutta

Traditional recommendation algorithms are not designed to provide personalized recommendations based on user preferences provided through text, e.g., “I enjoy light-hearted comedies with a lot of humor”. Large Language Models (LLMs) have emerged as one of the most promising tools for natural language processing in recent years. This research proposes a novel framework that mimics how a close friend would recommend items based on their knowledge of an individual’s tastes. We leverage LLMs to enhance movie recommendation systems by refining traditional algorithm outputs and integrating them with language-based user preference inputs. We employ Singular Value Decomposition (SVD) or SVD++ algorithms to generate initial movie recommendations, implemented using the Surprise Python library and trained on the MovieLens-Latest-Small dataset. We compare the performance of the base algorithms with our LLM-enhanced versions using leave-one-out validation hit rates and cumulative hit rates. Additionally, to compare the performance of our framework against the current state-of-the-art recommendation systems, we use rating and ranking metrics with an item-based stratified 0.75 train, 0.25 test split. Our framework can generate preference profiles automatically based on users’ favorite movies or allow manual preference specification for more personalized results. Using an automated approach, our framework overwhelmingly surpassed SVD and SVD++ on every evaluation metric used (e.g., improvements of up to ~6x in cumulative hit rate, ~3.7x in NDCG, etc.), albeit at the cost of a slight increase in computational overhead.
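
The leave-one-out hit-rate evaluation mentioned above can be sketched in a few lines. The definitions here are assumptions, not taken from the paper: a "hit" means the held-out item appears in the user's top-N list, and the cumulative variant counts only held-out items whose true rating meets a cutoff, which is one common convention. The function name and toy data are hypothetical.

```python
def hit_rate(top_n, held_out, rating_cutoff=None):
    """Fraction of users whose held-out item appears in their top-N list.

    top_n: dict user -> list of recommended item ids
    held_out: dict user -> (item id, true rating) left out of training
    rating_cutoff: if given, only held-out items rated at least this
        value are counted (an assumed "cumulative hit rate" convention)
    """
    hits = total = 0
    for user, (item, rating) in held_out.items():
        if rating_cutoff is not None and rating < rating_cutoff:
            continue
        total += 1
        if item in top_n.get(user, []):
            hits += 1
    return hits / total if total else 0.0

# Toy recommendations and held-out ratings for four users.
top_n = {"u1": [10, 11, 12], "u2": [20, 21], "u3": [30], "u4": [40, 41]}
held_out = {"u1": (11, 5.0), "u2": (99, 4.0), "u3": (30, 4.0), "u4": (77, 1.5)}

hr = hit_rate(top_n, held_out)
chr_at_3 = hit_rate(top_n, held_out, rating_cutoff=3.0)
```

With these toy inputs, two of four users are hits overall, and two of the three users whose held-out rating is at least 3.0 are hits.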

传统推荐算法并非为根据用户以文本形式提供的偏好(例如“我喜欢充满幽默的轻松喜剧”)给出个性化推荐而设计。大型语言模型(LLMs)近年来已成为自然语言处理领域最有前景的工具之一。本研究提出了一个新框架,模仿亲密朋友如何根据对个人品味的了解来推荐项目。我们利用LLMs改进传统算法的输出,并将其与基于语言的用户偏好输入相结合,以增强电影推荐系统。我们使用奇异值分解(SVD)或SVD++算法生成初始电影推荐,通过Surprise Python库实现,并在MovieLens-Latest-Small数据集上训练。我们使用留一法验证的命中率和累积命中率,比较基础算法与LLM增强版本的性能。此外,为了将我们的框架与当前最先进的推荐系统进行比较,我们采用基于项目的分层划分(0.75训练、0.25测试),并使用评分和排序指标。我们的框架可以根据用户喜爱的电影自动生成偏好画像,也允许手动指定偏好以获得更个性化的结果。在自动化方式下,我们的框架在所用的每项评估指标上都大幅超过SVD和SVD++(例如累积命中率提升最高约6倍,NDCG提升约3.7倍),代价是计算开销略有增加。
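The leave-one-out hit rate named in the abstract can be illustrated with a minimal sketch. The function names, the toy data, and the rating threshold are assumptions for illustration and are not taken from the paper. The cumulative variant shown here counts only held-out items whose true rating meets a threshold, which is one common definition and may differ from the authors'.

```python
def hit_rate(top_n, left_out):
    """Fraction of users whose held-out item appears in their top-N list."""
    hits = sum(1 for user, item in left_out.items() if item in top_n.get(user, []))
    return hits / len(left_out)


def cumulative_hit_rate(top_n, left_out, ratings, min_rating=4.0):
    """Hit rate restricted to held-out items the user rated at least min_rating."""
    kept = {u: i for u, i in left_out.items() if ratings[(u, i)] >= min_rating}
    if not kept:
        return 0.0
    return hit_rate(top_n, kept)


# Toy data (hypothetical): one held-out movie per user.
top_n = {"u1": ["m1", "m2", "m3"], "u2": ["m4", "m5", "m6"], "u3": ["m7", "m8", "m9"]}
left_out = {"u1": "m2", "u2": "m9", "u3": "m7"}
ratings = {("u1", "m2"): 5.0, ("u2", "m9"): 4.5, ("u3", "m7"): 3.0}

hr = hit_rate(top_n, left_out)                        # u1 and u3 are hits
chr_ = cumulative_hit_rate(top_n, left_out, ratings)  # u3 is filtered out by rating
```

In a real pipeline the `top_n` lists would come from the recommender being evaluated (for example an SVD model), with each user's held-out item removed from training.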


Article 192

Title@2025-07-09 (3): Position: Adopt Constraints Over Penalties in Deep Learning

Title: Position: Adopt Constraints Over Penalties in Deep Learning Position: Überstrapazierte Strafen im Deep Learning adoptieren 职位:在深深学习中采用约束措施以凌驾刑罚 2505.20628v2

Authors (3): Juan Ramirez, Meraj Hashemizadeh, Simon Lacoste-Julien

Recent efforts to develop trustworthy AI systems with accountability guarantees have led to widespread use of machine learning formulations incorporating external requirements, or constraints. These requirements are often enforced via penalization, that is, by adding fixed-weight terms to the task loss. We argue this approach is fundamentally ill-suited since there may be no penalty coefficient that simultaneously ensures constraint satisfaction and optimal constrained performance, i.e., that truly solves the constrained problem. Moreover, tuning these coefficients requires costly trial-and-error, incurring significant time and computational overhead. We, therefore, advocate for broader adoption of tailored constrained optimization methods, such as the Lagrangian approach, which jointly optimizes the penalization “coefficients” (the Lagrange multipliers) and the model parameters. Such methods (i) truly solve the constrained problem and do so accountably, by clearly defining feasibility and verifying when it is achieved, (ii) eliminate the need for extensive penalty tuning, and (iii) integrate seamlessly with modern deep learning pipelines.

最近为发展具有问责制保障的可靠的AI系统所作的努力导致广泛使用包含外部要求或限制的机器学习配方,这些要求往往通过惩罚性附加固定重量条件来强制执行,我们争辩说,这种办法根本是不合适的,因为可能没有惩罚系数同时确保限制满意度和最佳限制业绩,即真正解决受限制问题;此外,调和这些系数需要花费大量时间和计算间接费用的试验和测试费用,因此,我们主张更广泛地采用量身定制的限制性优化方法,例如拉格兰吉办法,这种办法共同优化惩罚性“系数”(拉格兰特乘数)和示范参数。这些方法(一) 真正解决受限制的问题,并在达到后明确确定可行性和核查,从而做到会计分明,(二) 消除广泛调整刑罚的需要,(三) 与现代深层学习管道接轨。
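The Lagrangian approach described in the abstract can be sketched on a one-dimensional toy problem. Everything here is an illustrative assumption (the objective, the constraint, the step size, and the function name); it is plain gradient descent-ascent, not the authors' implementation. The problem is to minimize (x - 2)^2 subject to x - 1 <= 0, whose solution is x = 1 with multiplier 2.

```python
def solve_constrained(steps=5000, lr=0.05):
    """Minimize (x - 2)^2 subject to x - 1 <= 0 via gradient descent-ascent."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * (x - 2.0) + lam        # d/dx of the Lagrangian
        violation = x - 1.0                   # g(x); feasible when <= 0
        x -= lr * grad_x                      # descent step on the parameter
        lam = max(0.0, lam + lr * violation)  # projected ascent on the multiplier
    return x, lam


x_star, lam_star = solve_constrained()
```

For comparison, a fixed linear penalty c * (x - 1) on the same toy problem gives the minimizer x = 2 - c / 2, so only c = 2 lands exactly on the constraint boundary. The multiplier update finds that value without manual tuning.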


Article 193

Title@2025-07-09 (3): Leveraging the Structure of Medical Data for Improved Representation Learning

Title: Leveraging the Structure of Medical Data for Improved Representation Learning Nutzung der Struktur medizinischer Daten für ein verbessertes Repräsentationslernen 利用医疗数据结构改进代表性学习 2507.02987v2

Authors (10): Andrea Agostini, Sonia Laguna, Alain Ryser, Samuel Ruiperez-Campillo, Moritz Vandenhirtz, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt

Building generalizable medical AI systems requires pretraining strategies that are data-efficient and domain-aware. Unlike internet-scale corpora, clinical datasets such as MIMIC-CXR offer limited image counts and scarce annotations, but exhibit rich internal structure through multi-view imaging. We propose a self-supervised framework that leverages the inherent structure of medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and lateral views) as natural positive pairs, learning to reconstruct each view from sparse patches while aligning their latent embeddings. Our method requires no textual supervision and produces informative representations. Evaluated on MIMIC-CXR, our method performs strongly compared with supervised objectives and with baselines trained without leveraging structure. This work provides a lightweight, modality-agnostic blueprint for domain-specific pretraining where data is structured but scarce.

与互联网规模公司不同,像MIMIC-CXR这样的临床数据集提供了有限的图像计数和稀缺的注释,但通过多视图成像呈现出丰富的内部结构。我们提议了一个自我监督的框架,利用医疗数据集的固有结构。具体地说,我们把配对的胸腔X光片(即前视和横向观)作为自然正面对子处理,学习从稀疏的片段重建每一面,同时调整其潜伏。我们的方法不需要文字监督,并产生信息说明。对MIMIC-CXR的评价显示,与监督的目标和基线相比,我们表现良好,在没有杠杆结构的情况下,正在培训。这项工作为数据结构化但稀缺的领域特定培训提供了一种轻量、模式-通用的蓝图。


Article 194

Title@2025-07-09 (3): Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention

Title: Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention Aufmerksamkeiten unter dem Mikroskop: Eine vergleichende Studie der Ressourcennutzung für Varianten der Selbstaufmerksamkeit “显微镜”下的注意力:自注意力变体的资源利用比较研究 2507.07247v1

Authors (4): Zhengyu Tian, Anantha Padmanaban Krishna Kumar, Hemant Krishnakumar, Reza Rawassizadeh

As large language models (LLMs) and visual language models (VLMs) grow in scale and application, attention mechanisms have become a central computational bottleneck due to their high memory and time complexity. While many efficient attention variants have been proposed, there remains a lack of rigorous evaluation of their actual energy usage and hardware resource demands during training. In this work, we benchmark eight attention mechanisms when training the GPT-2 architecture, measuring key metrics including training time, GPU memory usage, FLOPS, CPU usage, and power consumption. Our results reveal that attention mechanisms with optimized kernel implementations, including Flash Attention, Locality-Sensitive Hashing (LSH) Attention, and Multi-Head Latent Attention (MLA), achieve the best energy efficiency. We further show that lower GPU power alone does not guarantee reduced energy use, as training time plays an equally important role. Our study highlights the importance of energy-aware benchmarking in attention design and provides practical insight for selecting resource-efficient mechanisms. All our code is available on GitHub.

随着大型语言模型(LLMS)和视觉语言模型(VLMS)在规模和应用上不断增长,关注机制由于记忆和时间的高度复杂,已成为一个核心的计算瓶颈。虽然提出了许多高效的关注变量,但是仍然缺乏对其实际能源使用和训练期间硬件资源需求的严格评价。在这项工作中,我们在培训GPT-2结构、衡量关键指标(包括培训时间、GPU记忆使用、FLOPS、CPU使用和电力消耗等)方面的八项关注机制中设定基准。我们的结果表明,关注机制具有优化的内核实施机制,包括闪应、地方敏感散射(LSH)关注和多领导人关注(MLA),实现了最佳的能效。我们进一步表明,由于培训时间发挥同样重要的作用,仅靠较低的GPUPU不能保证减少能源的使用。我们的研究强调了能源意识基准在关注设计中的重要性,并为选择资源效率机制提供了实用的洞察力。我们的所有代码都可在GitHub提供。
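The point that lower GPU power alone does not guarantee lower energy use follows from energy being average power multiplied by time. A minimal worked example, with hypothetical numbers that are not measurements from the paper:

```python
def energy_wh(avg_power_watts, train_seconds):
    """Energy in watt-hours: average power (W) times duration (h)."""
    return avg_power_watts * train_seconds / 3600.0


# Hypothetical runs: a low-power but slow mechanism vs. a high-power but fast one.
low_power_slow = energy_wh(200.0, 7200.0)   # 200 W for 2 h
high_power_fast = energy_wh(300.0, 3600.0)  # 300 W for 1 h
```

In this made-up case the run drawing 50% more power still uses less energy, because it finishes in half the time.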


Article 195

Title@2025-07-09 (3): Cosmos World Foundation Model Platform for Physical AI

Title: Cosmos World Foundation Model Platform for Physical AI Kosmos World Foundation Modellplattform für physische KI 宇宙世界基金会物理光学示范平台 2501.03575v3

Authors (79): NVIDIA, :, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski

Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make Cosmos open-source and our models open-weight with permissive licenses available via https://github.com/nvidia-cosmos/cosmos-predict1.

物理人工智能需要首先进行数字培训。 它需要自己的数字双胞胎、政策模型和世界的数码双胞胎,即世界模型。 在本文中,我们介绍了宇宙世界基金会模型平台,以帮助开发者为其物理人工智能设置建立定制的世界模型。我们把世界基础模型定位为通用世界模型,该模型可以精确地适应为下游应用而定制的世界模型。我们的平台包括视频整理管道、预先培训的世界基础模型、预先培训的世界基础模型、培训后世界基础模型的范例和视频符号。为了帮助物理人工智能建设者解决我们社会最关键问题,我们通过https://github.com/nvidia-cosmos/cosmos-predict1 提供许可,使宇宙开放源和模型开放重量。


Article 196

Title@2025-07-09 (3): Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased

Title: Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased Herausforderungen beim Lernen aus unausgeglichenen Daten mit Baum-basierten Modellen: Prävalenzschätzungen hängen systematisch von Hyperparametern ab und können nach oben verzerrt sein 利用树基模型从不平衡数据中吸取挑战:流行率估计数系统依赖超参数,可能向上偏偏 2412.16209v2

Authors (3): Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford

Imbalanced binary classification problems arise in many fields of study. When using machine learning models for these problems, it is common to subsample the majority class (i.e., undersampling) to create a (more) balanced dataset for model training. This biases the model’s predictions because the model learns from a dataset that does not follow the same data-generating process as new data. One way of accounting for this bias is to analytically map the resulting predictions to new values based on the sampling rate for the majority class, which was used to create the training dataset. While this approach may work well for some machine learning models, we show that calibrating a random forest this way has unintended negative consequences, including prevalence estimates that can be upwardly biased. These prevalence estimates depend on both (i) the number of predictors considered at each split in the random forest and (ii) the sampling rate used. We explain the former using known properties of random forests and analytical calibration. However, in investigating the latter issue, we made a surprising discovery: contrary to the widespread belief that decision trees are biased towards the majority class, they can actually be biased towards the minority class.

在许多研究领域出现了不平衡的二元分类问题。 当使用机器学习模型来处理这些问题时,通常会对多数类进行子采样(即欠采样),为模型培训建立一个(更加)平衡的数据集。这会使模型的预测产生偏差,因为模型从一个数据集中学习,而该数据集并不遵循与新数据相同的数据生成过程。纠正这种偏差的一个方法就是根据用于创建培训数据集的多数类抽样率,将预测结果解析地映射为新的值。虽然这种方法对某些机器学习模型可能效果良好,但我们发现以这种方式校准随机森林会产生意想不到的负面后果,包括流行率估计可能向上偏差。这些流行率估计取决于随机森林中每次分割所考虑的预测变量数,以及所使用的抽样率。我们用随机森林和解析校准的已知特性来解释前者。然而,在调查后一问题时,我们有一个惊人的发现:与普遍认为决策树偏向多数类的看法相反,它们实际上可能偏向少数类。
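The analytical mapping mentioned in the abstract can be sketched with the standard correction for majority-class undersampling. The sketch assumes the positive class is the minority and that each negative is kept independently with probability `beta`. The function names and numbers are illustrative, and the abstract does not confirm that this exact formula is the one studied.

```python
def undersampled_prob(p, beta):
    """Positive-class probability after keeping a fraction beta of negatives."""
    return p / (p + beta * (1.0 - p))


def calibrate(p_s, beta):
    """Map a probability learned on undersampled data back to the original scale."""
    return beta * p_s / (beta * p_s - p_s + 1.0)


p_true, beta = 0.02, 0.1               # 2% prevalence, keep 10% of negatives
p_s = undersampled_prob(p_true, beta)  # inflated probability on the sampled data
p_back = calibrate(p_s, beta)          # recovers the original probability
```

The round trip is exact only when the model outputs the true conditional probability on the undersampled data. The abstract's warning concerns models such as random forests whose raw outputs do not behave that way.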


Article 197

Title@2025-07-09 (3): Semantic Edge Computing and Semantic Communications in 6G Networks: A Unifying Survey and Research Challenges

Title: Semantic Edge Computing and Semantic Communications in 6G Networks: A Unifying Survey and Research Challenges Semantic Edge Computing und Semantic Communications in 6G Networks: Eine einheitliche Umfrage und Forschungsherausforderungen 6G网络中的电子计算和语义通信:统一调查和研究挑战 2411.18199v3

Authors (4): Milin Zhang, Mohammad Abdi, Venkat R. Dasari, Francesco Restuccia

Semantic Edge Computing (SEC) and Semantic Communications (SemCom) have been proposed as viable approaches to achieve real-time edge-enabled intelligence in sixth-generation (6G) wireless networks. On the one hand, SemCom leverages the strength of Deep Neural Networks (DNNs) to encode and communicate only the semantic information, while making it robust to channel distortions by compensating for wireless effects. Ultimately, this improves communication efficiency. On the other hand, SEC leverages distributed DNNs to divide the computation of a DNN across different devices based on their computational and networking constraints. Although significant progress has been made in both fields, the literature lacks a systematic view connecting the two. In this work, we fill this gap by unifying the SEC and SemCom fields. We summarize the research problems in these two fields and provide a comprehensive review of the state of the art with a focus on their technical strengths and challenges.

在第六代(6G)无线网络中,SEMCom利用深神经网络(DNNs)的力量来编码和传播语义信息,同时通过补偿无线效应来有力地引导扭曲现象,最终导致通信效率的提高;另一方面,SEC利用分布式DNN,根据计算和联网限制,将DNN的计算分为不同装置。虽然在这两个领域都取得了重大进展,但文献缺乏将这两个领域连接起来的系统观点。在这项工作中,我们通过统一SEC和SEMCom这两个领域来填补目前的差距。我们总结了这两个领域的研究问题,并全面审查了技术优势和挑战。


Article 198

Title@2025-07-09 (3): Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism

Title: Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism Kostenreduzierung: Der Weg zur Optimierung der Gedankenkette durch Sparse-Achtungsmechanismus 降低理据成本:通过分散关注机制优化思维链的道路 2411.09111v8

Authors (1): Libo Wang

To address the surge in inference cost caused by chain-of-thought reasoning in large language models, this research proposes a sparse attention mechanism that focuses on only a few relevant tokens. The researcher constructed a new attention mechanism and used GiantRabbit, trained with custom GPTs, as an experimental tool. The experiment compared the reasoning time, correctness score, and chain-of-thought length of this model and o1 Preview on linear algebra test questions from MIT OpenCourseWare. The results show that GiantRabbit’s reasoning time and chain-of-thought length are significantly lower than those of o1 Preview. This verifies the feasibility of a sparse attention mechanism for optimizing chain-of-thought reasoning. Architectural details and the experimental process have been uploaded to GitHub: https://github.com/brucewang123456789/GeniusTrail.git.

为了解决大语言模型推论成本激增中思维链的问题,本研究建议使用一个很少注意的机制,仅侧重于几个相关的象征物。研究者建立了一个新的关注机制,并使用经过自定义的GPT培训的GiantRabbbit作为实验工具。实验测试和比较了该模型和O1预览的推理时间、正确性分数和思维长度链,以解决MIT OpenCourseWare的线性代数测试问题。结果显示,GiantRabbbit的推理时间和思维长度链大大低于O1预览。它验证了为优化思维推理链而分散注意机制的可行性。详细的建筑细节和实验过程已经上传到Github,链接是:https://github.com/brucewang123456789/GeniusTrail.git。
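The abstract does not specify how the sparse attention mechanism selects its few relevant tokens. As a generic illustration only, here is a top-k sparse attention sketch in NumPy, where each query keeps its `top_k` highest-scoring keys. The function name, the shapes, and the top-k selection rule are all assumptions, not the author's design.

```python
import numpy as np


def topk_sparse_attention(q, k, v, top_k):
    """Single-head attention in which each query attends only to its top_k keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_queries, n_keys)
    # Per-row threshold: the top_k-th largest score.
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)  # drop all other keys
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)                           # exp(-inf) == 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
out, w = topk_sparse_attention(q, k, v, top_k=2)
```

This dense-then-mask version only shows the selection logic. It still computes every score, so any actual cost saving would need an implementation that avoids the discarded keys altogether.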


Article 199

Title@2025-07-09 (3): Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture

Title: Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture Auf dem Weg zu robusten Surrogate-Modellen: Benchmarking Machine Learning Ansätze zur Beschleunigung von Phasenfeldsimulationen von Bruchbruch 走向强力替代模型:为加速阶段断裂断裂的实地模拟工作制定机构学习方法基准 2507.07237v1

Authors (2): Erfan Hamdi, Emma Lejeune

Data-driven approaches have the potential to make modeling complex, nonlinear physical phenomena significantly more computationally tractable. For example, computational modeling of fracture is a core challenge where machine learning techniques have the potential to provide a much-needed speedup that would enable progress in areas such as multi-scale modeling and uncertainty quantification. Currently, phase field modeling (PFM) of fracture is one such approach that offers a convenient variational formulation to model crack nucleation, branching and propagation. To date, machine learning techniques have shown promise in approximating PFM simulations. However, most studies rely on overly simple benchmarks that do not reflect the true complexity of the fracture processes where PFM excels as a method. To address this gap, we introduce a challenging dataset based on PFM simulations designed to benchmark and advance ML methods for fracture modeling. This dataset includes three energy decomposition methods, two boundary conditions, and 1,000 random initial crack configurations for a total of 6,000 simulations. Each sample contains 100 time steps capturing the temporal evolution of the crack field. Alongside this dataset, we also implement and evaluate Physics-Informed Neural Networks (PINN), Fourier Neural Operators (FNO) and UNet models as baselines, and explore the impact of ensembling strategies on prediction accuracy. With this combination of our dataset and baseline models drawn from the literature, we aim to provide a standardized and challenging benchmark for evaluating machine learning approaches to solid mechanics. Our results highlight both the promise and limitations of popular current models, and demonstrate the utility of this dataset as a testbed for advancing machine learning in fracture mechanics research.

由数据驱动的方法有可能使建模复杂、非线性物理现象在计算上更加容易实现。例如,在机器学习技术有可能提供急需的加速,从而能够在诸如肌肉规模建模和不确定性量化等领域取得进展,骨折的分阶段实地建模(PFM)是这样一种方法,为模拟裂痕、分流和传播提供了一个方便的变异配方。迄今为止,机器学习技术在接近PFM的模拟中显示了前景。然而,大多数研究都依赖于过于简单的基准,并不反映骨折过程的真正复杂性,而PFM作为一种方法优于骨折的精确程度。为了解决这一差距,我们采用基于PFM模拟的具有挑战性的数据集,以基准和推进骨折建模的ML方法。这一数据集包括三种能量分解配法、两个边界条件和1 000个随机的初始裂变配置。每个样本都包含记录裂缝场时间演变的100个时间步骤。除了这一数据设置之外,我们还执行和评估了当前骨折质断裂过程的精度进程进程,同时为我们的骨质测试基准和骨质测试模型,同时展示了我们的骨质测试模型,并展示了我们的骨质研究基准和骨质研究基准。


Article 200

Title@2025-07-09 (3): An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation

Title: An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation Eine informationstheoretische Perspektive zur Multi-LLM-Unsicherheitsschätzung 关于多LLLM不确定性估算的信息理论视角 2507.07236v1

Authors (6): Maya Kruse, Majid Afshar, Saksham Khatwani, Anoop Mayampurath, Guanhua Chen, Yanjun Gao

Large language models (LLMs) often behave inconsistently across inputs, indicating uncertainty and motivating the need for its quantification in high-stakes settings. Prior work on calibration and uncertainty quantification often focuses on individual models, overlooking the potential of model diversity. We hypothesize that LLMs make complementary predictions due to differences in training and the Zipfian nature of language, and that aggregating their outputs leads to more reliable uncertainty estimates. To leverage this, we propose MUSE (Multi-LLM Uncertainty via Subset Ensembles), a simple information-theoretic method that uses Jensen-Shannon Divergence to identify and aggregate well-calibrated subsets of LLMs. Experiments on binary prediction tasks demonstrate improved calibration and predictive performance compared to single-model and naive ensemble baselines.

大型语言模型(LLMs)往往在各种投入中表现不一,表明不确定性,并促使需要在高占用环境中对其量化。先前的校准和不确定性量化工作往往侧重于单个模型,忽略了模型多样性的潜力。我们假设LLMs由于培训的不同和语言的西普菲语性质而作出互补预测,而将其产出合并则导致更可靠的不确定性估计。为了利用这一点,我们提议MUSE(通过子集成的多-LLMM不确定性),这是一种简单的信息理论方法,使用Jensen-Shannon differgence来识别和汇总LLMS的精准子集。二元预测任务实验显示,与单一模型和天真的共同基线相比,校准和预测性表现有所改善。
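The Jensen-Shannon Divergence that MUSE relies on can be computed for binary predictions as follows. This is a minimal sketch of the divergence only. How MUSE uses it to pick and aggregate subsets is not described in the abstract, and the helper names are assumptions.

```python
import math


def kl_bits(p, q):
    """KL divergence in bits between two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_bits(p, q):
    """Jensen-Shannon divergence with base-2 logs, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)


def bernoulli(p_yes):
    """Binary prediction as a two-outcome distribution."""
    return [p_yes, 1.0 - p_yes]


agree = jsd_bits(bernoulli(0.8), bernoulli(0.8))     # identical predictions
disagree = jsd_bits(bernoulli(1.0), bernoulli(0.0))  # maximally opposed predictions
```

Unlike KL divergence, this quantity is symmetric and stays finite when one model assigns zero probability to an outcome, which makes it convenient for comparing the outputs of different models.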


Article 201

Title@2025-07-09 (3): Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection

Title: Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection Emoji-Angriff: Verstärkung von Jailbreak-Angriffen gegen Richter LLM-Erkennung Emoji攻击:加强针对LLM法官的越狱袭击 2411.01077v3

Authors (3): Zhipeng Wei, Yuqi Liu, N. Benjamin Erichson

Jailbreaking techniques trick Large Language Models (LLMs) into producing restricted output, posing a potential threat. One line of defense is to use another LLM as a Judge to evaluate the harmfulness of generated text. However, we reveal that these Judge LLMs are vulnerable to token segmentation bias, an issue that arises when delimiters alter the tokenization process, splitting words into smaller sub-tokens. This alters the embeddings of the entire sequence, reducing detection accuracy and allowing harmful content to be misclassified as safe. In this paper, we introduce Emoji Attack, a novel strategy that amplifies existing jailbreak prompts by exploiting token segmentation bias. Our method leverages in-context learning to systematically insert emojis into text before it is evaluated by a Judge LLM, inducing embedding distortions that significantly lower the likelihood of detecting unsafe content. Unlike traditional delimiters, emojis also introduce semantic ambiguity, making them particularly effective in this attack. Through experiments on state-of-the-art Judge LLMs, we demonstrate that Emoji Attack substantially reduces the unsafe prediction rate, bypassing existing safeguards.

大量语言模型(LLMS)诱使大型语言模型(LLMS)产生限制产出,构成了潜在威胁。 一种防御之道是使用另一个LLM(LLM)作为法官评估生成文本的危害性。然而,我们发现,这些LLMS法官容易受到象征性分割偏差的影响,这个问题是当划界员改变代号过程,将单词分成小块时产生的。这改变了整个序列的嵌入,降低了检测准确性,并允许将有害内容错误地分类为安全的内容。在本文中,我们引入了Emoji Action,这是一个新颖的战略,通过利用象征性分割偏差来扩大现有的越狱事件。我们的方法在文字上学习系统地在文本中插入mojis,然后由LM法官加以评估,从而导致嵌入扭曲,从而大大降低检测不安全内容的可能性。与传统的划界员不同,Demojis还引入了语义模糊性,使这些模糊性在攻击中特别有效。通过对州级LMS的实验,我们证明Emoji攻击会大大降低不安全的预测率,绕过现有的保障措施。


Article 202

Title@2025-07-09 (3): OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning

Title: OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning OVOR: OnePrompt mit Virtual Outlier Regularization für probenfreies Klassen-Inkrementelles Lernen OVOR: 为无排练班级入门学习提供虚拟外部正规化的一次性 2402.04129v2

Authors (3): Wei-Cheng Huang, Chun-Fu Chen, Hsiang Hsu

Recent works have shown that by using large pre-trained models along with learnable prompts, rehearsal-free methods for class-incremental learning (CIL) settings can achieve superior performance to prominent rehearsal-based ones. Rehearsal-free CIL methods struggle with distinguishing classes from different tasks, as those are not trained together. In this work, we propose a regularization method based on virtual outliers to tighten decision boundaries of the classifier, such that confusion of classes among different tasks is mitigated. Recent prompt-based methods often require a pool of task-specific prompts, in order to prevent overwriting knowledge of previous tasks with that of the new task, leading to extra computation in querying and composing an appropriate prompt from the pool. This additional cost can be eliminated, without sacrificing accuracy, as we reveal in the paper. We illustrate that a simplified prompt-based method can achieve results comparable to previous state-of-the-art (SOTA) methods equipped with a prompt pool, using far fewer learnable parameters and lower inference cost. Our regularization method has demonstrated its compatibility with different prompt-based methods, boosting those previous SOTA rehearsal-free CIL methods’ accuracy on the ImageNet-R and CIFAR-100 benchmarks. Our source code is available at https://github.com/jpmorganchase/ovor.

最近的工作表明,通过使用大量预先培训的模型以及可学习的提示,对课堂强化学习(CIL)设置的不排练方法可以取得优异的成绩,而对于以排练为基础的突出的排练基础,不排练的CIL方法与不同任务的不同类别作斗争,因为这些任务没有经过共同培训。在这项工作中,我们提议一种基于虚拟外源的正规化方法,以收紧分类员的决定界限,从而缓解不同任务之间的混乱。最近基于快速的方法往往需要一组具体任务范围的提示,以防止与新任务相比,过度记录先前任务的知识,从而导致在查询和从集合中形成适当的提示方面进行额外的计算。正如我们在文件中所表明的那样,这种额外的成本可以在不牺牲准确性的情况下予以消除。我们指出,简化的即时制方法可以取得与先前的“最先进”(SOTA)方法相类似的结果,使用远不易学习的参数和较低的推算成本。我们的正规化方法表明它与不同基于迅速方法的兼容性,从而提升了以往SOTA免排练CIL方法在ImageNet-R和CIFAR-100基准上的准确率。我们的源代码可在 https://github.com/jpmorganchase/ovor 获取。


Article 203

Title@2025-07-09 (3): Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences

Title: Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences Zeit macht Raum: Entstehung von Platzfeldern in Netzwerken Kodierung temporär kontinuierlicher sensorischer Erfahrungen 时间创造空间:网络中位置字段的出现, 网络编码 温度持续感知经验 2408.05798v3

Authors (4): Zhaoze Wang, Ronald W. Di Tullio, Spencer Rooke, Vijay Balasubramanian

The vertebrate hippocampus is believed to use recurrent connectivity in area CA3 to support episodic memory recall from partial cues. This brain area also contains place cells, whose location-selective firing fields implement maps supporting spatial memory. Here we show that place cells emerge in networks trained to remember temporally continuous sensory episodes. We model CA3 as a recurrent autoencoder that recalls and reconstructs sensory experiences from noisy and partially occluded observations by agents traversing simulated rooms. The agents move along realistic trajectories modeled on rodent movement, and environments are modeled as high-dimensional sensory experience maps. Training our autoencoder to pattern-complete and reconstruct experiences with a constraint on total activity causes spatially localized firing fields, i.e., place cells, to emerge in the encoding layer. The emergent place fields reproduce key aspects of hippocampal phenomenology: a) remapping (maintenance of and reversion to distinct learned maps in different environments), implemented via repositioning of experience manifolds in the network’s hidden layer, b) orthogonality of spatial representations in different arenas, c) robust place field emergence in differently shaped rooms, with single units showing multiple place fields in large or complex spaces, and d) slow representational drift of place fields. We argue that these results arise because continuous traversal of space makes sensory experience temporally continuous. We make testable predictions: a) rapidly changing sensory context will disrupt place fields, b) place fields will form even if recurrent connections are blocked, but reversion to previously learned representations upon remapping will be abolished, c) the dimension of temporally smooth experience sets the dimensionality of place fields, including during virtual navigation of abstract spaces.

脊椎动物河马场据信使用CA3 区域的经常性连接来支持局部感知记忆的缩略图。 这个大脑区域还包含位置单元格, 其位置选择射击场使用支持空间记忆的地图。 这里我们显示, 细胞出现在经过训练的网络中, 以记住时间连续性感官事件。 我们模拟CA3 是一个经常性自动编码器, 以提醒和重新解析代理人在模拟房间中进行吵闹和部分隐蔽的观察所得出的感官体验。 代理器在从鼠标和环境中模拟的切合实际的平稳轨迹中移动, 以高高度感官体验地图为模型模型。 训练我们的自动编码器到模式的完整和重新构建经验, 在总体活动上, 导致空间局部射击场的空间范围, 即位置, 即位置, 即位置的移动速度变化, 在不同的战场上, 将形成一个不断变化的轨迹。


Article 204

Title@2025-07-09 (3): Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems

Title: Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems Effizienter parametrischer SVD von Koopman Operator für stochastische dynamische Systeme Koopman 斯托切动态系统操作员的高效参数SVD 2507.07222v1

Authors (4): Minchan Jeong, J. Jon Ryu, Se-Young Yun, Gregory W. Wornell

The Koopman operator provides a principled framework for analyzing nonlinear dynamical systems through linear operator theory. Recent advances in dynamic mode decomposition (DMD) have shown that trajectory data can be used to identify dominant modes of a system in a data-driven manner. Building on this idea, deep learning methods such as VAMPnet and DPNet have been proposed to learn the leading singular subspaces of the Koopman operator. However, these methods require backpropagation through potentially numerically unstable operations on empirical second moment matrices, such as singular value decomposition and matrix inversion, during objective computation, which can introduce biased gradient estimates and hinder scalability to large systems. In this work, we propose a scalable and conceptually simple method for learning the top-k singular functions of the Koopman operator for stochastic dynamical systems based on the idea of low-rank approximation. Our approach eliminates the need for unstable linear algebraic operations and integrates easily into modern deep learning pipelines. Empirical results demonstrate that the learned singular subspaces are both reliable and effective for downstream tasks such as eigen-analysis and multi-step prediction.

Koopman 操作员为通过线性操作员理论分析非线性动态系统提供了一个原则框架。 动态模式分解( DMD)的最近进展表明,轨迹数据可以用数据驱动的方式确定一个系统的主导模式。 基于这一想法,VAMPnet和DPNet等深层学习方法建议学习Koopman操作员的主要单子空间。然而,这些方法要求通过经验性第二时刻矩阵上可能数字不稳定的操作,如单值分解和矩阵反向,进行反向分析,在客观计算期间,这可能会引入偏差梯度估计,妨碍大型系统的伸缩性。在这项工作中,我们提出了一个在概念上简单、可扩展的方法,用于学习Koopman操作员基于低级近似概念的随机动态系统的顶级单项功能。我们的方法消除了对不稳定的线性高位操作的需求,并容易融入现代深层学习管道。 Empical 结果表明, 所学过的单项子空间对于下游任务,例如乙基分析和多步预测,既可靠又有效。
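The low-rank approximation idea behind learning the top-k singular functions can be illustrated in finite dimensions, where the best rank-k approximation of a matrix is given by its top-k singular triplets (the Eckart-Young theorem). The sketch below uses an explicit SVD purely for illustration. The paper's stated aim is to avoid such decompositions during training, so this is not the authors' method, and the random matrix is only a stand-in for an operator.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 15))  # stand-in for a finite-dimensional operator
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]  # keep only the top-k singular triplets

err = np.linalg.norm(A - A_k)           # Frobenius norm of the residual
expected = np.sqrt(np.sum(s[k:] ** 2))  # energy in the discarded singular values
```

The residual error equals the root sum of squares of the discarded singular values, which is the property that makes a low-rank approximation objective a natural target for recovering the leading singular subspaces.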


Article 205

Title@2025-07-09 (3): Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

Title: Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards Optimas: Optimierung von KI-Systemen mit global ausgerichteten lokalen Belohnungen 最佳结果:以全球统一地方奖励办法优化化合物AI系统 2507.03041v2

Authors (13): Shirley Wu, Parth Sarthi, Shiyu Zhao, Aaron Lee, Herumb Shandilya, Adrian Mladenic Grobelnik, Nurendra Choudhary, Eddie Huang, Karthik Subbian, Linjun Zhang, Diyi Yang, James Zou, Jure Leskovec

Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local-global alignment property, i.e., each component’s local reward correlates with the global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component’s local reward. This approach enables independent updates of heterogeneous configurations using the designated optimization method, while ensuring that local improvements consistently lead to performance gains. We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%, offering a general and effective approach for improving compound systems. Our website is at https://optimas.stanford.edu.

将多种组成部分(如大语言模型、专门工具、传统机器学习模式等)整合在一起的复合自毁系统,正越来越多地被部署用于解决复杂的现实世界任务。然而,优化复合系统因其结构没有差异,各组成部分的配置类型不同,包括提示、超参数和模型参数等,因此仍然具有挑战性。为了应对这一挑战,我们提议了Optimas,一个有效优化复合系统的统一框架。Optimas的核心理念是每个组成部分都保持一个地方奖励函数(LRF),每个组成部分都符合一个地方-全球协调属性,即每个组成部分的当地奖赏与全球系统性能相关。在每次迭代,Optimas高效地调整LRFs,以维护这一属性,同时最大限度地利用每个组成部分的当地奖赏。这个方法能够利用指定的优化方法独立更新多种组合的配置,同时确保地方的改进始终能带来绩效收益。我们介绍了五个真实世界的复合系统的广泛评价,以显示,每个组成部分的精确性比强基线高出11.92%的平均改进率。我们的网站在 https://optimas.stanford.edu。


Article 206

Title@2025-07-09 (3): Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains

Title: Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains Neurosymbolische Merkmalsextraktion zur Identifizierung von Zwangsarbeit in Lieferketten 查明供应链中强迫劳动情况 2507.07217v1

Authors (3): Zili Wang, Frank Montabon, Kristin Yvonne Rozier

Supply chain networks are complex systems that are challenging to analyze; this problem is exacerbated when there are illicit activities involved in the supply chain, such as counterfeit parts, forced labor, or human trafficking. While machine learning (ML) can find patterns in complex systems like supply chains, traditional ML techniques require large training data sets. However, illicit supply chains are characterized by very sparse data, and the data that is available is often (purposely) corrupted or unreliable in order to hide the nature of the activities. We need to be able to automatically detect new patterns that correlate with such illegal activity over complex, even temporal data, without requiring large training data sets. We explore neurosymbolic methods for identifying instances of illicit activity in supply chains and compare the effectiveness of manual and automated feature extraction from news articles accurately describing illicit activities uncovered by authorities. We propose a question tree approach for querying a large language model (LLM) to identify and quantify the relevance of articles. This enables a systematic evaluation of the differences between human and machine classification of news articles related to forced labor in supply chains.

供应链是复杂的系统,难以分析;当供应链中有伪造零件、强迫劳动或贩运人口等非法活动时,这一问题就更加严重。虽然机器学习(ML)可以在供应链等复杂系统中找到模式,但传统的ML技术需要大量的培训数据集;然而,非法供应链的特点是数据非常稀少,现有数据往往(目的上)腐败或不可靠,以掩盖活动的性质。我们需要能够自动发现与这种非法活动相关的新模式,超越复杂、甚至是时间性的数据,而不需要大量培训数据集。我们探索神经同步方法,以查明供应链中的非法活动案例,比较准确描述当局发现的非法活动的手动和自动专题文章的提取效果。我们提出一个问题树方法,查询大语言模型(LLMM),以查明和量化文章的相关性。这样就可以系统地评估与供应链中强迫劳动有关的新闻文章的人类和机器分类之间的差异。


Article 207

Title@2025-07-09 (3): Bias-Aware Mislabeling Detection via Decoupled Confident Learning

Title: Bias-Aware Mislabeling Detection via Decoupled Confident Learning Bias-Aware-Mislabeling-Erkennung durch entkoppeltes vertrauensvolles Lernen 通过解开信任学习解开错误标签检测 2507.07216v1

Authors (3): Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky

Reliable data is a cornerstone of modern organizational systems. A notable data integrity challenge stems from label bias, which refers to systematic errors in a label, a covariate that is central to a quantitative analysis, such that its quality differs across social groups. This type of bias has been conceptually and empirically explored and is widely recognized as a pressing issue across critical domains. However, effective methodologies for addressing it remain scarce. In this work, we propose Decoupled Confident Learning (DeCoLe), a principled machine-learning-based framework specifically designed to detect mislabeled instances in datasets affected by label bias, enabling bias-aware mislabeling detection and facilitating data quality improvement. We theoretically justify the effectiveness of DeCoLe and evaluate its performance in the impactful context of hate speech detection, a domain where label bias is a well-documented challenge. Empirical results demonstrate that DeCoLe excels at bias-aware mislabeling detection, consistently outperforming alternative approaches for label error detection. Our work identifies and addresses the challenge of bias-aware mislabeling detection and offers guidance on how DeCoLe can be integrated into organizational data management practices as a powerful tool to enhance data reliability.

可靠数据是现代组织系统的基石。一个值得注意的数据完整性挑战源于标签偏差,标签偏差是指标签中的系统错误,这是一个对定量分析至关重要的共变式,其质量因社会各群体而异。这种偏差在概念上和经验上进行了探讨,并被广泛认为是跨越关键领域的一个紧迫问题。然而,解决这一问题的有效方法仍然稀缺。在这项工作中,我们建议采用分解的共鸣学习(Decole),这是一个基于原则的机器学习框架,专门用来发现受标签偏差影响的数据集中误标的事例,使偏见意识到误标检测并促进数据质量的改进。我们理论上证明DeCole的有效性,并评价其在仇恨言论检测影响性背景下的绩效,这是一个有据可查的领域。 积极的结果表明,DeCole在偏差感识别方面表现优异,一贯优于标签错误检测的替代方法。我们的工作查明并解决了识别偏差的挑战,并就如何将DeCole纳入组织数据管理做法提供了指导,作为加强数据可靠性的有力工具。
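
The idea of detecting label errors separately for each social group can be illustrated with a small sketch. It is not DeCoLe itself: it assumes the common confident-learning rule (a per-class threshold equal to the mean predicted probability of that class among examples carrying that label) and simply computes those thresholds per group. The function names, the toy probabilities, and the group labels are all hypothetical.

```python
from collections import defaultdict

def class_thresholds(probs, labels, n_classes):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for p, y in zip(probs, labels):
        sums[y] += p[y]
        counts[y] += 1
    return [s / c if c else float("inf") for s, c in zip(sums, counts)]

def flag_mislabeled(probs, labels, groups, n_classes):
    """Flag examples whose confidently predicted class differs from the
    given label, with thresholds computed separately for each group."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    flagged = []
    for idxs in by_group.values():
        t = class_thresholds([probs[i] for i in idxs],
                             [labels[i] for i in idxs], n_classes)
        for i in idxs:
            confident = [k for k in range(n_classes) if probs[i][k] >= t[k]]
            if confident:
                best = max(confident, key=lambda k: probs[i][k])
                if best != labels[i]:
                    flagged.append(i)
    return sorted(flagged)

# Toy data (assumed): group "b" is predicted with lower confidence overall.
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.9, 0.1],
         [0.6, 0.4], [0.3, 0.7], [0.6, 0.4]]
labels = [0, 0, 1, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b"]

decoupled = flag_mislabeled(probs, labels, groups, n_classes=2)
pooled = flag_mislabeled(probs, labels, ["all"] * len(labels), n_classes=2)
```

In this toy setup the pooled thresholds are dominated by the more confidently predicted group, so the suspicious example in group "b" is only flagged when thresholds are computed per group.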


Article 208

Title@2025-07-09 (3): One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

Title: One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory Eine Trajektorie, ein Token: Erdliche Video-Tokenisierung über panoptische Sub-Objekt-Trajektorie 一个轨迹, 一个 Token: 通过泛光子物件轨迹, 固定的视频轨迹 2505.23617v2

Authors (8): Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi, Ziqi Gao, Vishnu Iyengar, Norimasa Kobori, Quan Kong, Ranjay Krishna

Effective video tokenization is critical for scaling transformer models to long videos. Current approaches tokenize videos using space-time patches, leading to excessive tokens and computational inefficiencies. The best token reduction strategies degrade performance and barely reduce the number of tokens when the camera moves. We introduce grounded video tokenization, a paradigm that organizes tokens based on panoptic sub-object trajectories rather than fixed patches. Our method aligns with fundamental perceptual principles, ensuring that tokenization reflects scene complexity rather than video duration. We propose TrajViT, a video encoder that extracts object trajectories and converts them into semantically meaningful tokens, significantly reducing redundancy while maintaining temporal coherence. Trained with contrastive learning, TrajViT significantly outperforms space-time ViT (ViT3D) across multiple video understanding benchmarks; e.g., TrajViT outperforms ViT3D by a large margin of 6% top-5 recall on average on video-text retrieval tasks with a 10x token reduction. We also show that TrajViT is a stronger video encoder than ViT3D for modern VideoLLMs, obtaining an average performance improvement of 5.2% across 6 VideoQA benchmarks while training 4x faster and using 18x fewer inference FLOPs. TrajViT is the first efficient encoder to consistently outperform ViT3D across diverse video analysis tasks, making it a robust and scalable solution.

有效的视频象征性化对于放大长视频变压器模型至关重要 。 当前的方法将使用时空补丁的视频象征性化, 导致过度的象征性和计算效率低下 。 最佳象征性减少策略会降低性能, 当相机移动时几乎不会减少物证数量 。 我们引入了有底线的视频象征性化模式, 这个模式可以组织基于全光子子对象轨迹而非固定补丁的物证。 我们的方法符合基本的概念性原则, 确保代号化反映场景复杂性而不是视频持续时间。 我们提议 TrajViT , 是一个视频编码器, 提取对象轨迹, 并将其转换为具有语义意义的代号, 大大减少冗余, 同时保持时间一致性 。 经过对比学习训练, TrajViT 在多个视频理解基准上明显优于时空 ViT (ViT3D) , 例如, 在视频-文本检索任务中, TrajViT 的 top-5 召回率平均比 ViT3D 高出 6%, 同时令牌数量减少 10 倍 。 我们还表明, 作为现代 VideoLLM 的视频编码器, TrajViT 是比 ViT3D 更强的模型, 在 6 个 VideoQA 基准上平均获得 5.2% 的性能提升, 同时训练速度快 4 倍, 推理 FLOPs 减少 18 倍 。 TrajViT 是第一个在各种视频分析任务中始终优于 ViT3D 的高效编码器, 是一个稳健且可扩展的解决方案 。


Article 209

Title@2025-07-09 (3): Scale leads to compositional generalization

Title: Scale leads to compositional generalization Skala führt zu kompositorischer Verallgemeinerung 缩放导致编成一般化 2507.07207v1

Authors (3): Florian Redhardt, Yassir Akram, Simon Schug

Can neural networks systematically capture discrete, compositional task structure despite their continuous, distributed nature? The impressive capabilities of large-scale neural networks suggest that the answer to this question is yes. However, even for the most capable models, there are still frequent failure cases that raise doubts about their compositionality. Here, we seek to understand what it takes for a standard neural network to generalize over tasks that share compositional structure. We find that simply scaling data and model size leads to compositional generalization. We show that this holds across different task encodings as long as the training distribution sufficiently covers the task space. In line with this finding, we prove that standard multilayer perceptrons can approximate a general class of compositional task families to arbitrary precision using only a linear number of neurons with respect to the number of task modules. Finally, we uncover that if networks successfully compositionally generalize, the constituents of a task can be linearly decoded from their hidden activations. We show that this metric correlates with failures of text-to-image generation models to compose known concepts.

神经网络能否系统地捕捉离散的、 构成任务结构? 大型神经网络令人印象深刻的能力表明这个问题的答案是肯定的。 但是,即使对于最有能力的模型,仍然经常出现一些失败案例,使人对其组成性产生怀疑。 在这里,我们试图了解标准神经网络需要什么来概括共享组成结构的任务。 我们发现, 仅仅缩放数据和模型大小就能导致组成普遍性。 我们显示, 只要培训分布足以覆盖任务空间, 这一点就存在于不同的任务编码中。 根据这一发现, 我们证明标准的多层透视器可以将一个普通的构成任务组类任意精确地近似, 仅使用对任务模块数的线性神经元。 最后, 我们发现, 如果网络能够成功地组成, 任务的组成可以从隐藏的激活中线性地解码。 我们显示, 这一指标与文本到图像生成模型无法构建已知概念有关。


Article 210

Title@2025-07-09 (3): MODA: A Unified 3D Diffusion Framework for Multi-Task Target-Aware Molecular Generation

Title: MODA: A Unified 3D Diffusion Framework for Multi-Task Target-Aware Molecular Generation MODA: Ein einheitliches 3D-Diffusions-Framework für multi-Task Target-Aware-Molekulargenerierung MODA:多目标目标目标软件分子生成3D统一传播框架 2507.07201v1

Authors (6): Dong Xu, Zhangfan Yang, Sisi Yuan, Jenna Xinyi Yao, Jiangqiang Li, Junkai Ji

Three-dimensional molecular generators based on diffusion models can now reach near-crystallographic accuracy, yet they remain fragmented across tasks. SMILES-only inputs, two-stage pretrain-finetune pipelines, and one-task-one-model practices hinder stereochemical fidelity, task alignment, and zero-shot transfer. We introduce MODA, a diffusion framework that unifies fragment growing, linker design, scaffold hopping, and side-chain decoration with a Bayesian mask scheduler. During training, a contiguous spatial fragment is masked and then denoised in one pass, enabling the model to learn shared geometric and chemical priors across tasks. Multi-task training yields a universal backbone that surpasses six diffusion baselines and three training paradigms on substructure, chemical property, interaction, and geometry. Model-C reduces ligand-protein clashes and substructure divergences while maintaining Lipinski compliance, whereas Model-B preserves similarity but trails in novelty and binding affinity. Zero-shot de novo design and lead-optimisation tests confirm stable negative Vina scores and high improvement rates without force-field refinement. These results demonstrate that a single-stage multi-task diffusion routine can replace two-stage workflows for structure-based molecular design.

基于扩散模型的三维分子生成器现在可以达到近晶晶晶体的精确度,但在不同任务之间仍然支离破碎。SMILES唯一的输入器、两阶段的先入先入为主的管道和一一式的模型做法阻碍了立体化学忠度、任务对齐和零弹射传输。我们引入了MODA,这是一个统一碎片生长、链接设计、链路设计、脚踏车和配有巴伊西亚面具调度仪的侧链装饰的传播框架。在培训期间,一个毗连的空间碎片被遮蔽,然后在一个通道中解开,使该模型能够学习共同的几何学和化学前期任务。多任务培训产生一个超越六个扩散基线和三个次结构、化学属性、互动和几何测量培训范例的普遍骨干。模型C减少了分辨冲突和亚结构差异,同时保持利宾斯基的合规性,而模型B保存了相似的线索,但在新颖性和约束性中,使该模型能够进行零位设计,使该模型能够学习共同学习共同的几何和化学前期化学前置。多相。多任务培训产生了一个超过六扩散基线和三个阶段的模型测试,从而证实了固定的僵化的僵化结构的Vina阶段的模型改进。


Article 211

Title@2025-07-09 (3): Combining Pre-Trained Models for Enhanced Feature Representation in Reinforcement Learning

Title: Combining Pre-Trained Models for Enhanced Feature Representation in Reinforcement Learning Kombination von vortrainierten Modellen für eine verbesserte Darstellung von Funktionen bei der Stärkung des Lernens 将强化学习中强化地物代表的预先培训模式相结合 2507.07197v1

Authors (5): Elia Piccoli, Malio Li, Giacomo Carfì, Vincenzo Lomonaco, Davide Bacciu

The recent focus on and release of pre-trained models have been key components of several advancements in many fields (e.g. Natural Language Processing and Computer Vision). As a matter of fact, pre-trained models learn disparate latent embeddings sharing insightful representations. On the other hand, Reinforcement Learning (RL) focuses on maximizing the cumulative reward obtained via the agent’s interaction with the environment. RL agents do not have any prior knowledge about the world, and they either learn from scratch an end-to-end mapping between the observation and action spaces or, in more recent works, are paired with monolithic and computationally expensive Foundational Models. How to effectively combine and leverage the hidden information of different pre-trained models simultaneously in RL is still an open and understudied question. In this work, we propose Weight Sharing Attention (WSA), a new architecture to combine embeddings of multiple pre-trained models to shape an enriched state representation, balancing the tradeoff between efficiency and performance. We run an extensive comparison between several combination modes showing that WSA obtains comparable performance on multiple Atari games compared to end-to-end models. Furthermore, we study the generalization capabilities of this approach and analyze how scaling the number of models influences agents’ performance during and after training.

培训前模式的最近重点和发布是许多领域(如自然语言处理和计算机愿景)取得若干进展的关键组成部分,事实上,培训前模式学习不同潜伏的嵌入,分享有见地的表现;另一方面,加强学习侧重于最大限度地增加通过代理机构与环境的互动而获得的累积奖励;培训前学习者对世界没有任何事先了解,他们要么从零到零地了解观察空间与行动空间之间的端到端绘图,要么在近期的工程中,与单一和计算成本昂贵的基础模型相匹配;如何有效地合并和利用在培训前不同模型的隐藏信息,仍然是开放和研究不足的问题;在这项工作中,我们提出了“分享注意”(WSA),这是将多种培训前模式相结合的新结构,以形成更丰富的国家代表制,平衡效率和业绩之间的权衡。我们对几种组合模式进行了广泛的比较,表明WSA在多个Atari游戏上取得与最终培训模式相比可比的业绩,在总体分析模型和升级后如何分析。


Article 212

Title@2025-07-09 (3): Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching

Title: Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching Bridging the Last Mile of Prediction: Verbesserung der Zeitreihenvorhersage mit konditional gesteuertem Flow Matching 连接预测的最后一环:加强时间序列预测与有条件的引导流动匹配 2507.07192v1

Authors (5): Huibo Xu, Runlong Yu, Likang Wu, Xianquan Wang, Qi Liu

Diffusion models, a type of generative model, have shown promise in time series forecasting. But they face limitations like rigid source distributions and limited sampling paths, which hinder their performance. Flow matching offers faster generation, higher-quality outputs, and greater flexibility, while also possessing the ability to utilize valuable information from the prediction errors of prior models, which were previously inaccessible yet critically important. To address these challenges and fully unlock the untapped potential of flow matching, we propose Conditional Guided Flow Matching (CGFM). CGFM extends flow matching by incorporating the outputs of an auxiliary model, enabling a previously unattainable capability in the field: learning from the errors of the auxiliary model. For time series forecasting tasks, it integrates historical data as conditions and guidance, constructs two-sided conditional probability paths, and uses a general affine path to expand the space of probability paths, ultimately leading to improved predictions. Extensive experiments show that CGFM consistently enhances and outperforms state-of-the-art models, highlighting its effectiveness in advancing forecasting methods.

传播模型是一种基因模型,在时间序列预测中显示出希望。但是,它们面临着僵化源分布和有限的取样路径等限制,妨碍了它们的性能。流动匹配提供了更快的生成、更高质量的产出和更大的灵活性,同时也具备了利用先前模型预测错误的宝贵信息的能力,这些错误以前是无法获得的,但至关重要。为了应对这些挑战和充分挖掘尚未开发的流量匹配潜力,我们提议有条件的引导流动匹配(CGM)。CGMM通过纳入辅助模型的产出来扩展流量匹配,使实地具备以前无法实现的能力:从辅助模型的错误中学习。对于时间序列预测任务来说,它将历史数据作为条件和指导进行整合,建立双向有条件的概率路径,并使用一条通路来扩大概率路径的空间,最终导致改进预测。广泛的实验表明,CGFM不断加强和完善最新模型,突出其在推进预测方法方面的效力。


Article 213

Title@2025-07-09 (3): Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Title: Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs Bepflanzt in der Vorausbildung, durch Finetuning abgeschwächt: Eine Fallstudie über die Herkunft von Kognitiv-Biasen in LLMs 编在培训前编,《微调:关于LLM中认知性双星起源的个案研究》,《微调摇摇晃》 2507.07186v1

Authors (3): Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

Large language models (LLMs) exhibit cognitive biases, systematic tendencies of irrational decision-making similar to those seen in humans. Prior work has found that these biases vary across models and can be amplified by instruction tuning. However, it remains unclear if these differences in biases stem from pretraining, finetuning, or even random noise due to training stochasticity. We propose a two-step causal experimental approach to disentangle these factors. First, we finetune models multiple times using different random seeds to study how training randomness affects over 30 cognitive biases. Second, we introduce cross-tuning, swapping instruction datasets between models to isolate bias sources. This swap uses datasets that led to different bias patterns, directly testing whether biases are dataset-dependent. Our findings reveal that while training randomness introduces some variability, biases are mainly shaped by pretraining: models with the same pretrained backbone exhibit more similar bias patterns than those sharing only finetuning data. These insights suggest that understanding biases in finetuned models requires considering their pretraining origins beyond finetuning effects. This perspective can guide future efforts to develop principled strategies for evaluating and mitigating bias in LLMs.

大型语言模型(LLMS)表现出认知偏差,即系统性的非理性决策倾向,类似于人类所见的不合理决策倾向。先前的工作发现,这些偏差在各模型之间有所不同,可以通过指令调换加以扩大。然而,由于训练随机性,这些偏差的差别是否源于训练前、微调,或甚至随机噪音,目前还不清楚。我们提出了分解这些因素的两步因果实验方法。首先,我们用不同的随机种子对模型进行多次微调,以研究训练随机性如何影响30多种认知偏差。第二,我们引入了交叉调优(cross-tuning),即在模型之间互换教学数据集以孤立偏差源。这种交换利用数据集导致不同的偏差模式,直接测试偏差是否取决于数据设置。我们的调查结果显示,虽然训练随机性带来了某些差异,但偏差主要取决于训练前:与受过训练的骨架模型显示的偏差模式比共享的微调数据更加相似。这些洞察显示,在微调模型中理解偏差需要考虑其前训练起源,而不是微调效果。这一视角可以指导今后制定评价和减轻偏差的战略。


Article 214

Title@2025-07-09 (3): Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor

Title: Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor Auf dem Weg zum multimodalen Verständnis durch stabile Diffusion als Task-Aware Feature Extractor 通过稳定传播实现多模式理解,作为任务软件特写提取器 2507.07106v1

Authors (6): Vatsal Agarwal, Matthew Gwilliam, Gefen Kohavi, Eshan Verma, Daniel Ulbricht, Abhinav Shrivastava

Recent advances in multimodal large language models (MLLMs) have enabled image-based question-answering capabilities. However, a key limitation is the use of CLIP as the visual encoder; while it can capture coarse global information, it can often miss fine-grained details that are relevant to the input query. To address these shortcomings, this work studies whether pre-trained text-to-image diffusion models can serve as instruction-aware visual encoders. Through an analysis of their internal representations, we find that diffusion features are rich in semantics and can encode strong image-text alignment. Moreover, we find that we can leverage text conditioning to focus the model on regions relevant to the input question. We then investigate how to align these features with large language models and uncover a leakage phenomenon, where the LLM can inadvertently recover information from the original diffusion prompt. We analyze the causes of this leakage and propose a mitigation strategy. Based on these insights, we explore a simple fusion strategy that utilizes both CLIP and conditional diffusion features. We evaluate our approach on both general VQA and specialized MLLM benchmarks, demonstrating the promise of diffusion models for visual understanding, particularly in vision-centric tasks that require spatial and compositional reasoning. Our project page can be found at https://vatsalag99.github.io/mustafar/.

最近多式大型语言模型(MLLMS)的进步使基于图像的问答能力得以实现。然而,一个关键的限制因素是使用CLIP作为视觉编码器;虽然它可以捕捉粗糙的全球信息,但它往往会错失与输入查询相关的精细细细微的细微细节。为解决这些缺陷,本工作研究的是,经过培训的文字到图像传播模型是否可以作为教学-有觉识的视觉编码器。通过对内部表述的分析,我们发现传播功能在语义学上十分丰富,可以对强烈的图像-文本协调进行编码。此外,我们发现我们可以利用文字调节,将模型的功能集中于与输入问题相关的区域。我们然后研究如何将这些特征与大语言模型相匹配,并发现渗漏现象,使LLMM能够无意中从原始的传播速度中提取信息。我们分析了这种渗漏的原因,并提出了一项缓解战略。我们根据这些洞察,我们探索了一种简单的融合战略,既利用CLIP又有条件的传播特性。我们评估了我们关于一般VQA和MLLLMM基准的方法,展示了我们在一般的视觉/CILA/CA的视觉分析中都要求有希望的视野分析。


Article 215

Title@2025-07-09 (3): Does Data Scaling Lead to Visual Compositional Generalization?

Title: Does Data Scaling Lead to Visual Compositional Generalization? Führt die Datenskalierung zu einer visuellen Verallgemeinerung der Zusammensetzung? 数据缩放是否导致视觉构成的概括化? 2507.07102v1

Authors (3): Arnas Uselis, Andrea Dittadi, Seong Joon Oh

Compositional understanding is crucial for human intelligence, yet it remains unclear whether contemporary vision models exhibit it. The dominant machine learning paradigm is built on the premise that scaling data and model sizes will improve out-of-distribution performance, including compositional generalization. We test this premise through controlled experiments that systematically vary data scale, concept diversity, and combination coverage. We find that compositional generalization is driven by data diversity, not mere data scale. Increased combinatorial coverage forces models to discover a linearly factored representational structure, where concepts decompose into additive components. We prove this structure is key to efficiency, enabling perfect generalization from few observed combinations. Evaluating pretrained models (DINO, CLIP), we find above-random yet imperfect performance, suggesting partial presence of this structure. Our work motivates stronger emphasis on constructing diverse datasets for compositional generalization, and considering the importance of representational structure that enables efficient compositional learning. Code available at https://github.com/oshapio/visual-compositional-generalization.

对于人类的智能来说,理解的构成至关重要,但目前尚不清楚当代的视觉模型是否表现出来。主要机器学习模式的前提是,扩大数据和模型规模将改善分配外的性能,包括组成上的概括化。我们通过控制实验测试这一前提,这种实验系统地改变数据规模、概念多样性和组合覆盖。我们发现,组成上的概括化是由数据多样性驱动的,而不仅仅是数据规模。扩大组合覆盖模式迫使模型发现一个线性要素代表结构,其概念分解成添加性成分。我们证明,这一结构是效率的关键,能够使少数观测到的组合实现完全的概括化。我们评估了预先训练的模型(DINO, CLIP),我们发现超自然但不完善的性能,表明这种结构存在部分的存在。我们的工作促使更加强调为组成总体化构建多种数据集,同时考虑到代表结构的重要性,从而能够有效地进行构成性学习。代码可在 https://github.com/oshappio/VVi-composial-gentional-genization。
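
The claim that a linearly factored representation allows generalization from few observed combinations can be made concrete with a toy sketch. It assumes, purely for illustration, that the representation of a combination is exactly the sum of one vector per concept value; the names `shape_vecs`, `color_vecs`, `embed`, and `predict_unseen` are hypothetical and not taken from the paper or its code.

```python
import random

random.seed(0)
DIM = 4

def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]

# Assumed additive components: one vector per shape and one per color.
shape_vecs = {s: rand_vec() for s in ("cube", "sphere")}
color_vecs = {c: rand_vec() for c in ("red", "blue")}

def embed(shape, color):
    """Additive (linearly factored) representation of a combination."""
    return [a + b for a, b in zip(shape_vecs[shape], color_vecs[color])]

def predict_unseen(new_shape_old_color, old_shape_new_color, old_shape_old_color):
    """If f(s, c) = u_s + v_c, then
    f(s2, c2) = f(s2, c1) + f(s1, c2) - f(s1, c1)."""
    return [x + y - z for x, y, z in
            zip(new_shape_old_color, old_shape_new_color, old_shape_old_color)]

# Three observed combinations suffice to predict the held-out fourth one.
predicted = predict_unseen(embed("sphere", "red"),
                           embed("cube", "blue"),
                           embed("cube", "red"))
actual = embed("sphere", "blue")
max_error = max(abs(p - a) for p, a in zip(predicted, actual))
```

Under this additive assumption the held-out combination is recovered up to floating-point error; representations that are only approximately additive would leave a residual.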


Article 216

Title@2025-07-09 (3): Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful

Title: Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful Kleine Batch Size Training für Sprachmodelle: Wenn Vanilla SGD funktioniert, und warum gradient Akkumulation verschwenderisch ist 语言模式小批量规模培训:Vanilla SGD何时奏效,为何梯度累积是浪费的 2507.07101v1

Authors (5): Martin Marek, Sanae Lotfi, Aditya Somasundaram, Andrew Gordon Wilson, Micah Goldblum

Conventional wisdom dictates that small batch sizes make language model pretraining and fine-tuning unstable, motivating gradient accumulation, which trades off the number of optimizer steps for a proportional increase in batch size. While it is common to decrease the learning rate for smaller batch sizes, other hyperparameters are often held fixed. In this work, we revisit small batch sizes all the way down to batch size one, and we propose a rule for scaling Adam hyperparameters to small batch sizes. We find that small batch sizes (1) train stably, (2) are consistently more robust to hyperparameter choices, (3) achieve equal or better per-FLOP performance than larger batch sizes, and (4) notably enable stable language model training with vanilla SGD, even without momentum, despite storing no optimizer state. Building on these results, we provide practical recommendations for selecting a batch size and setting optimizer hyperparameters. We further recommend against gradient accumulation unless training on multiple devices with multiple model replicas, bottlenecked by inter-device bandwidth.

常规智慧要求小批量体使语言模型的预培训和微调不稳定,鼓励梯度积累,从而从优化步骤的数量中取舍,使批量规模成比例增加。虽然降低小批量规模的学习率是常见的,但其他超参数往往被固定下来。在这项工作中,我们重新研究小批量大小,一直到批量规模为1,我们提出了将亚当超参数缩到小批量规模的规则。我们发现小批量体积:(1) 精细的火车,(2) 始终对超常参数选择更加强大,(3) 达到等于或更好的每批量体积体积性能,以及(4) 明显地使香草SGD能够进行稳定的语言模型培训,即使没有动力,即使没有储存优化状态。在这些结果的基础上,我们为选择批量大小和设定优化的超参数提供了切实可行的建议。我们进一步建议,除非在多种模型复制器上进行培训,并被内部带宽的带宽。
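
The trade-off described above, fewer optimizer steps in exchange for a proportionally larger batch, comes down to simple arithmetic. The sketch below assumes a fixed data budget counted in sequences; the function name and the numbers are illustrative only and do not come from the paper.

```python
def optimizer_steps(total_sequences, micro_batch_size, accumulation_steps=1):
    """Number of optimizer updates available for a fixed data budget.

    Gradient accumulation multiplies the effective batch size by
    `accumulation_steps`, so the number of updates shrinks by the same factor.
    """
    effective_batch = micro_batch_size * accumulation_steps
    return total_sequences // effective_batch

budget = 1024  # assumed number of training sequences
steps_no_accum = optimizer_steps(budget, micro_batch_size=4)
steps_accum_8 = optimizer_steps(budget, micro_batch_size=4, accumulation_steps=8)
```

With the same budget and micro-batch size, accumulating over 8 micro-batches leaves one eighth as many optimizer updates.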


Article 217

Title@2025-07-09 (3): Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts

Title: Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts Umgang mit unausgewogenem Domain-Incremental-Lernen durch Dual-Balance-Kollaborative Experten 通过双平衡合作专家解决不平等问题 2507.07100v1

Authors (4): Lan Li, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan

Domain-Incremental Learning (DIL) focuses on continual learning in non-stationary environments, requiring models to adjust to evolving domains while preserving historical knowledge. DIL faces two critical challenges in the context of imbalanced data: intra-domain class imbalance and cross-domain class distribution shifts. These challenges significantly hinder model performance, as intra-domain imbalance leads to underfitting of few-shot classes, while cross-domain shifts require maintaining well-learned many-shot classes and transferring knowledge to improve few-shot class performance in old domains. To overcome these challenges, we introduce the Dual-Balance Collaborative Experts (DCE) framework. DCE employs a frequency-aware expert group, where each expert is guided by specialized loss functions to learn features for specific frequency groups, effectively addressing intra-domain class imbalance. Subsequently, a dynamic expert selector is learned by synthesizing pseudo-features through balanced Gaussian sampling from historical class statistics. This mechanism navigates the trade-off between preserving many-shot knowledge of previous domains and leveraging new data to improve few-shot class performance in earlier tasks. Extensive experimental results on four benchmark datasets demonstrate DCE’s state-of-the-art performance.

在非静止环境中持续学习,要求模型适应不断演变的领域,同时保留历史知识。在数据不平衡的背景下,DIL面临两个关键的挑战:中产阶级内部不平衡和跨部阶级分配变化。这些挑战严重阻碍了模型性能,因为部内不平衡导致少发阶级不适应,而跨部位转变则需要保持获得的丰富经验的课堂和转让知识,以提高老领域少发阶级的绩效。为了克服这些挑战,我们引入了双发合作专家框架。DCE使用了一个频率意识专家组,在其中,每位专家都以专门损失功能为指导,学习特定频率组的特点,有效地解决中产阶级不平衡问题。随后,通过从历史类统计中平衡的高斯抽样,合成假性能,学习了一位动态专家选择者。这个机制在维护以往领域多发知识与利用新数据改进早期几发阶级业绩之间实现权衡。关于四度基准数据设置的大规模实验性结果展示了四个基准状态。
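
The balanced Gaussian sampling step can be sketched as follows. This is a minimal illustration, assuming each historical class is summarized by a per-dimension mean and standard deviation (a diagonal Gaussian); how DCE actually stores and uses its class statistics is not specified in the abstract, and the class names and numbers are made up.

```python
import random

def sample_balanced_pseudo_features(class_stats, per_class, seed=0):
    """Draw the same number of pseudo-features for every class, so that
    few-shot and many-shot classes are equally represented."""
    rng = random.Random(seed)
    features, labels = [], []
    for label, (mean, std) in sorted(class_stats.items()):
        for _ in range(per_class):
            features.append([rng.gauss(m, s) for m, s in zip(mean, std)])
            labels.append(label)
    return features, labels

# Hypothetical stored statistics: (mean vector, std vector) per class.
class_stats = {
    "many_shot_class": ([0.0, 1.0, 2.0], [0.5, 0.5, 0.5]),
    "few_shot_class": ([3.0, -1.0, 0.0], [0.2, 0.2, 0.2]),
}
features, labels = sample_balanced_pseudo_features(class_stats, per_class=5)
```

The resulting balanced set could then be used to train a selector over experts without revisiting the original, imbalanced data.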


Article 218

Title@2025-07-09 (3): From Pseudorandomness to Multi-Group Fairness and Back

Title: From Pseudorandomness to Multi-Group Fairness and Back Von Pseudorandomness zu Multi-Group Fairness und Back 从优多度到多群体公平与后向 2301.08837v4

Authors (4): Cynthia Dwork, Daniel Lee, Huijia Lin, Pranay Tankala

We identify and explore connections between the recent literature on multi-group fairness for prediction algorithms and the pseudorandomness notions of leakage-resilience and graph regularity. We frame our investigation using new variants of multicalibration based on statistical distance and closely related to the concept of outcome indistinguishability. Adopting this perspective leads us not only to new, more efficient algorithms for multicalibration, but also to our graph theoretic results and a proof of a novel hardcore lemma for real-valued functions.

我们发现并探索最近关于多群体公平预测算法的文献与渗漏-复原力和图表规律的假随机概念之间的联系。 我们使用基于统计距离和与结果不可分性概念密切相关的新的多校准变量来构建我们的调查。 采用这一视角不仅导致我们采用新的、更高效的多校准算法,而且导致我们的图表理论结果,并证明真实价值功能的新型硬核伦目。


Article 219

Title@2025-07-09 (3): Large-scale portfolio optimization with variational neural annealing

Title: Large-scale portfolio optimization with variational neural annealing Große Portfolio-Optimierung mit variationaler neuronaler Glühung 大规模组合组合优化,采用变异性神经净化 2507.07159v1

Authors (4): Nishan Ranabhat, Behnam Javanparast, David Goerz, Estelle Inack

Portfolio optimization is a routine asset management operation conducted in financial institutions around the world. However, under real-world constraints such as turnover limits and transaction costs, its formulation becomes a mixed-integer nonlinear program that current mixed-integer optimizers often struggle to solve. We propose mapping this problem onto a classical Ising-like Hamiltonian and solving it with Variational Neural Annealing (VNA), via its classical formulation implemented using autoregressive neural networks. We demonstrate that VNA can identify near-optimal solutions for portfolios comprising more than 2,000 assets and yields performance comparable to that of state-of-the-art optimizers, such as Mosek, while exhibiting faster convergence on hard instances. Finally, we present a dynamical finite-size scaling analysis applied to the S&P 500, Russell 1000, and Russell 3000 indices, revealing universal behavior and polynomial annealing time scaling of the VNA algorithm on portfolio optimization problems.

投资组合优化是世界各地金融机构进行的一项常规资产管理行动,然而,在诸如营业额限额和交易成本等现实世界制约因素下,其配方成为混合整数的非线性方案,目前混合整数优化器往往难以解决。 我们提议将这一问题映射成像汉密尔顿那样的古典Ising Hamiltonian(Hamiltonian),并通过使用自动递减神经网络执行的古典配方来解决它。我们证明,VNA可以为由2,000多个资产组成的组合找到近于最佳的解决方案,并产生类似于Mosek等最新优化器的效绩,同时在硬性实例上表现出更快的趋同。 最后,我们提出对S&P 500、Russell 1000和Russell 3000指数进行动态的有限规模缩放分析,揭示了在组合优化问题上VNA算法的普遍行为和多纳尼拉时间缩放。


Article 220

Title@2025-07-09 (3): Less can be more for predicting properties with large language models

Title: Less can be more for predicting properties with large language models Weniger kann mehr für die Vorhersage von Eigenschaften mit großen Sprachmodellen sein 用大语言模型预测属性可以少得多 2406.17295v3

Authors (3): Nawaf Alampara, Santiago Miret, Kevin Maik Jablonka

Predicting properties from coordinate-category data (sets of vectors paired with categorical information) is fundamental to computational science. In materials science, this challenge manifests as predicting properties like formation energies or elastic moduli from crystal structures comprising atomic positions (vectors) and element types (categorical information). While large language models (LLMs) have increasingly been applied to such tasks, with researchers encoding structural data as text, optimal strategies for achieving reliable predictions remain elusive. Here, we report fundamental limitations in LLMs’ ability to learn from coordinate information in coordinate-category data. Through systematic experiments using synthetic datasets with tunable coordinate and category contributions, combined with a comprehensive benchmarking framework (MatText) spanning multiple representations and model scales, we find that LLMs consistently fail to capture coordinate information while excelling at category patterns. This geometric blindness persists regardless of model size (up to 70B parameters), dataset scale (up to 2M structures), or text representation strategy. Our findings suggest immediate practical implications: for materials property prediction tasks dominated by structural effects, specialized geometric architectures consistently outperform LLMs by significant margins, as evidenced by a clear “GNN-LM wall” in performance benchmarks. Based on our analysis, we provide concrete guidelines for architecture selection in scientific machine learning, while highlighting the critical importance of understanding model inductive biases when tackling scientific prediction problems.

在材料科学中,这一挑战表现为预测由原子位置(矢量)和元素类型(分类信息)组成的晶体结构的形成能量或弹性模量等特性。虽然大型语言模型(LLMS)越来越多地应用于此类任务,研究人员将结构数据编码为文本,实现可靠预测的最佳战略仍然难以实现。在这里,我们报告了LLM从协调类别数据信息中学习的能力的根本局限性。在材料科学方面,通过利用具有金枪鱼协调和类别贡献的合成数据集进行系统实验,加上涵盖多个表象和模型规模的综合基准框架(MatText),我们发现LLOMS始终无法获取协调信息,同时优于类别模式。无论模型大小(最高为70B参数)、数据规模(最高为2M结构),还是文本表述战略,这种几何失明现象依然存在。我们的调查结果表明,对于以结构效应为主导的财产预测任务,专业的几何测量结构结构架构以显著的LLMMMS格式为顶端,同时以显著的边距值进行全面的综合基准,我们的科学选择模型分析中的重要基础是:我们科学定位分析中的关键建筑分析的清晰基准。


Article 221

Title@2025-07-09 (3): Multi-Attribute Steering of Language Models via Targeted Intervention

Title: Multi-Attribute Steering of Language Models via Targeted Intervention Multi-Attribute-Lenkung von Sprachmodellen durch gezielte Intervention 通过有针对性的干预措施,多语言模式多来源指导 2502.12446v2

Authors (4): Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Inference-time intervention (ITI) has emerged as a promising method for steering large language model (LLM) behavior in a particular direction (e.g., improving helpfulness) by intervening on token representations without costly updates to the LLM’s parameters. However, existing ITI approaches fail to scale to multi-attribute settings with conflicts, such as enhancing helpfulness while also reducing toxicity. To address this, we introduce Multi-Attribute Targeted Steering (MAT-Steer), a novel steering framework designed for selective token-level intervention across multiple attributes. MAT-Steer learns steering vectors using an alignment objective that shifts the model’s internal representations of undesirable outputs closer to those of desirable ones while enforcing sparsity and orthogonality among vectors for different attributes, thereby reducing inter-attribute conflicts. We evaluate MAT-Steer in two distinct settings: (i) on question answering (QA) tasks where we balance attributes like truthfulness, bias, and toxicity; (ii) on generative tasks where we simultaneously improve attributes like helpfulness, correctness, and coherence. MAT-Steer outperforms existing ITI and parameter-efficient fine-tuning approaches across both task types (e.g., 3% average accuracy gain across QA tasks and 55.82% win rate against the best ITI baseline).

为解决这一问题,我们引入了多归性定向指导(MAT-Steer),这是为多种属性的选择性象征性干预设计的一种新型指导框架。MAT-Steer学习了配比任务,即我们既要改进模型对不良产出的内部表述,又要使模型对理想产出的内部表述更接近于理想产出的内部表述,同时为不同属性加强矢量之间的宽度和异度,从而减少不同属性的冲突。我们用两种不同的背景评估MAT-Steer:(一) 问题回答(QA) 任务,我们把真实性、偏向性和毒性等特征与I类选择性象征性干预相平衡;(二) 基因化任务,我们同时改进属性,如精准性、正确性和一致性。
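
The sparsity and orthogonality constraints on steering vectors can be written as simple penalty terms. The sketch below is an assumption about what such terms could look like (squared cosine similarity between vectors of different attributes, plus an L1 norm); the exact MAT-Steer objective is not given in the abstract, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def orthogonality_penalty(vectors):
    """Sum of squared cosine similarities over all pairs of steering
    vectors; zero when the vectors are mutually orthogonal."""
    total = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            total += cosine(vectors[i], vectors[j]) ** 2
    return total

def l1_penalty(vectors):
    """L1 norm over all entries, a common way to encourage sparsity."""
    return sum(abs(x) for v in vectors for x in v)

orthogonal_pair = [[1.0, 0.0], [0.0, 1.0]]  # e.g. one vector per attribute
parallel_pair = [[1.0, 0.0], [2.0, 0.0]]
```

Adding such terms to an alignment loss would push vectors for different attributes toward non-interfering directions, which is the intuition behind reducing inter-attribute conflicts.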


Article 222

Title@2025-07-09 (3): How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks

Title: How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks Wie man die Sim-to-Real Gap in digitalen Twin-Aided Telecommunication Networks überbrücken kann 如何弥合数字双辅助电信网络的一对一差距 2507.07067v1

Authors (4): Clement Ruah, Houssem Sifaou, Osvaldo Simeone, Bashir M. Al-Hashimi

Training effective artificial intelligence models for telecommunications is challenging due to the scarcity of deployment-specific data. Real data collection is expensive, and available datasets often fail to capture the unique operational conditions and contextual variability of the network environment. Digital twinning provides a potential solution to this problem, as simulators tailored to the current network deployment can generate site-specific data to augment the available training datasets. However, there is a need to develop solutions to bridge the inherent simulation-to-reality (sim-to-real) gap between synthetic and real-world data. This paper reviews recent advances on two complementary strategies: 1) the calibration of digital twins (DTs) through real-world measurements, and 2) the use of sim-to-real gap-aware training strategies to robustly handle residual discrepancies between digital twin-generated and real data. For the latter, we evaluate two conceptually distinct methods that model the sim-to-real gap either at the level of the environment via Bayesian learning or at the level of the training loss via prediction-powered inference.

实际数据收集费用昂贵,现有数据集往往无法捕捉网络环境的独特操作条件和背景变异性。数字结对为这一问题提供了潜在的解决办法,因为根据当前网络部署量设计的模拟器可以生成针对具体地点的数据,以扩大现有的培训数据集。然而,需要制定解决方案,弥合合成数据与现实数据之间固有的模拟到现实(模拟到现实)差距。本文回顾了两个补充战略的最新进展:(1) 通过真实世界测量校准数字双胞胎(DTs),(2) 使用模拟到实际差距培训战略,以有力处理数字双生数据与真实数据之间的剩余差异。关于后者,我们评价两种概念上独特的方法,通过Bayesian学习在环境层面或通过预测力推断在培训损失水平上模拟模拟模拟到现实差距。
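For orientation, the simplest form of prediction-powered inference is mean estimation: a model's predictions on a large unlabeled (here, possibly simulator-generated) pool are corrected by the average prediction error measured on a small labeled set. The sketch below shows only that basic estimator. The function and argument names are illustrative, and the paper applies the idea at the level of the training loss, which is not reproduced here.

```python
def ppi_mean(labeled_targets, labeled_predictions, unlabeled_predictions):
    """Basic prediction-powered estimate of a population mean.

    estimate = mean(predictions on unlabeled pool)
             + mean(target - prediction on the labeled set)
    The second term (the "rectifier") removes the bias of the predictor.
    """
    if len(labeled_targets) != len(labeled_predictions):
        raise ValueError("labeled targets and predictions must align")
    model_mean = sum(unlabeled_predictions) / len(unlabeled_predictions)
    rectifier = sum(
        y - f for y, f in zip(labeled_targets, labeled_predictions)
    ) / len(labeled_targets)
    return model_mean + rectifier
```

If the predictor is systematically off by a constant, the rectifier cancels that offset, which is the intuition behind using real measurements to correct digital-twin data.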


Article 223

Title@2025-07-09 (3): LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits

Title: LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits LASER: Mit Multi-Armed Bandits lernen, Belohnungsmodelle adaptiv zu wählen LASeR: 学习用多武装强盗来灵活选择奖赏模型 2410.01735v2

Authors (4): Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Reward Models (RMs) are crucial to aligning large language models (LLMs), but the degree to which an RM specialized to one task (e.g. writing) generalizes to new tasks (e.g. math) is often not known a priori, so training LLMs with a single fixed RM is often suboptimal. However, optimizing LLMs with multiple RMs simultaneously can incur a prohibitively high computational cost and lead to conflicting signals from different RMs that may degrade performance. To address these challenges, we introduce LASeR (Learning to Adaptively Select Rewards), which frames reward model selection as a multi-armed bandit problem, efficiently and iteratively training LLMs using multiple RMs by selecting the best-suited RM for each instance. On commonsense and math reasoning tasks, we show that LASeR boosts iterative LLM training, improving the absolute average accuracy of Llama-3-8B over three datasets by 2.67% over an ensemble of RM scores while also showing superior efficiency (e.g., a 2x speedup). Moreover, on WildChat (open-ended instruction-following tasks), LASeR leads to a 72.69% AlpacaEval win rate over the RM score ensemble baseline. Extending to long-context generation, LASeR improves by 2.96 F1 points (avg.) on single-document QA tasks and 2.97 F1 points on few-shot learning over the RM score ensemble baseline with best-of-n sampling.

奖励模型(RM)对于对齐大型语言模型(LLM)至关重要,但专门针对某一任务(例如写作)的RM能在多大程度上推广到新任务(例如数学)往往无法事先得知,因此仅使用一个固定的RM来训练LLM往往并非最优。然而,同时使用多个RM优化LLM会带来过高的计算成本,并导致不同RM给出相互冲突的信号,从而降低性能。为了应对这些挑战,我们引入LASeR(学习自适应地选择奖励),将奖励模型的选择表述为多臂老虎机问题,通过为每个实例选择最合适的RM,利用多个RM高效地迭代训练LLM。在常识和数学推理任务上,LASeR提升了迭代式LLM训练的效果,使Llama-3-8B在三个数据集上的绝对平均准确率比RM分数集成高出2.67%,同时表现出更高的效率(例如2倍加速)。此外,在WildChat(开放式指令遵循任务)上,LASeR相对于RM分数集成基线取得了72.69%的AlpacaEval胜率。在长上下文生成方面,采用best-of-n采样时,LASeR在单文档问答任务上平均提高2.96个F1点,在少样本学习上提高2.97个F1点。
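To make the bandit framing concrete, the sketch below treats each reward model as an arm and picks one per round with UCB1. The abstract does not say which bandit algorithm LASeR uses, so UCB1 is a stand-in chosen for brevity, and the fixed per-arm feedback values are made up for the demo.

```python
import math


class UCB1Selector:
    """Chooses one arm (here: one reward model) per round with UCB1."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # times each arm was chosen
        self.means = [0.0] * n_arms  # running mean feedback per arm

    def select(self):
        # Try every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, feedback):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (feedback - self.means[arm]) / self.counts[arm]


def run_demo(rounds=500):
    """Deterministic toy run: arm 1 always gives the best feedback."""
    feedback_per_arm = [0.2, 0.8, 0.5]  # hypothetical usefulness of each RM
    selector = UCB1Selector(len(feedback_per_arm))
    for _ in range(rounds):
        arm = selector.select()
        selector.update(arm, feedback_per_arm[arm])
    return selector.counts
```

In a training loop, `feedback` would be some measure of how much the selected RM helped the LLM on that step. Only one RM is queried per round, which is where the efficiency gain over scoring with an ensemble would come from.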


Article 224

Title@2025-07-09 (3): Interpretable EEG-to-Image Generation with Semantic Prompts

Title: Interpretable EEG-to-Image Generation with Semantic Prompts Interpretierbare EEG-to-Image-Generation mit semantischen Prompts 具有语义提示的可解释的 EEG 至 图像生成 2507.07157v1

Authors (6): Arshak Rezvani, Ali Akbari, Kosar Sanjar Arani, Maryam Mirian, Emad Arasteh, Martin J. McKeown

Decoding visual experience from brain signals offers exciting possibilities for neuroscience and interpretable AI. While EEG is accessible and temporally precise, its limitations in spatial detail hinder image reconstruction. Our model bypasses direct EEG-to-image generation by aligning EEG signals with multilevel semantic captions – ranging from object-level to abstract themes – generated by a large language model. A transformer-based EEG encoder maps brain activity to these captions through contrastive learning. During inference, caption embeddings retrieved via projection heads condition a pretrained latent diffusion model for image generation. This text-mediated framework yields state-of-the-art visual decoding on the EEGCVPR dataset, with interpretable alignment to known neurocognitive pathways. Dominant EEG-caption associations reflected the importance of different semantic levels extracted from perceived images. Saliency maps and t-SNE projections reveal semantic topography across the scalp. Our model demonstrates how structured semantic mediation enables cognitively aligned visual decoding from EEG.

从大脑信号中解析视觉经验为神经科学和可解释的AI提供了令人兴奋的可能性。 虽然EEG是可获得的,而且时间精确,但其空间细节的局限性阻碍了图像重建。 我们的模型绕过EEEG到图像的生成,通过将EEG信号与由大型语言模型产生的多层次语义说明 – – 从物体层面到抽象主题 – – 相匹配。 以变压器为基础的EEEG编码器绘制大脑活动图谱,通过对比性学习绘制这些语义。 在推断过程中,通过投影头状态检索到的字幕嵌入模型,为图像生成提供了预先训练的潜在传播模型。 这个文本中介框架在 EGCVPR 数据集上生成了最先进的视觉解码,与已知的神经认知路径进行了可解释的对接。 Domant EEG-Caption 协会反映了从感知图像中提取的不同语义级别的重要性。 光度图和 t-SNEEE预测揭示了整个头部的语义地形学。 我们的模型展示了结构结构的语义调调调调调调解如何使得EEEEG能够对认知进行视觉解。


Article 225

Title@2025-07-09 (3): ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

Title: ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment ROCKET-2: Steuerung der Visuomotorik über Cross-View Goal Alignment ROCKET-2:通过交叉视野目标协调指导维苏莫多政策 2503.02505v2

Authors (4): Shaofei Cai, Zhancun Mu, Anji Liu, Yitao Liang

We aim to develop a goal specification method that is semantically clear, spatially sensitive, domain-agnostic, and intuitive for human users to guide agent interactions in 3D environments. Specifically, we propose a novel cross-view goal alignment framework that allows users to specify target objects using segmentation masks from their camera views rather than the agent’s observations. We highlight that behavior cloning alone fails to align the agent’s behavior with human intent when the human and agent camera views differ significantly. To address this, we introduce two auxiliary objectives: cross-view consistency loss and target visibility loss, which explicitly enhance the agent’s spatial reasoning ability. Building on these objectives, we develop ROCKET-2, a state-of-the-art agent trained in Minecraft, which achieves a 3x to 6x improvement in inference efficiency over ROCKET-1. We show that ROCKET-2 can directly interpret goals from human camera views, enabling better human-agent interaction. Remarkably, ROCKET-2 demonstrates zero-shot generalization capabilities: despite being trained exclusively on the Minecraft dataset, it can adapt and generalize to other 3D environments like Doom, DMLab, and Unreal through a simple action space mapping.

我们的目标是制定一种目标规格方法,使人体用户能够用精度清晰、空间敏感、域名和直觉来指导3D环境中的代理互动。具体地说,我们提议了一个新的交叉视图目标协调框架,使用户能够使用其相机视图中的分解面面罩而不是代理人的观测来指定目标对象。我们强调,在人类和代理相机观点大相径庭时,仅行为克隆不能使代理人的行为与人类意图一致。为了解决这个问题,我们引入了两个辅助目标:交叉视图一致性损失和目标可见度损失,这明显加强了代理人的空间推理能力。根据这个框架,我们开发了ROCKET-2,这是在地雷方面受过训练的一门最先进的工具,可以提高与ROCKET-1相比,3x至6x的推断效率。我们表明,在人类相机观点中直接解释目标,从而能够更好地进行人类-代理人的互动。值得注意的是,ROCKET-2展示了零光谱化能力:尽管仅接受过对采矿数据集的培训,但它可以通过DMRAMA和简单空间测绘,例如DMAL,可以将普通空间图解到其他3DRial环境。


Article 226

Title@2025-07-09 (3): Geometry-Informed Neural Operator Transformer

Title: Geometry-Informed Neural Operator Transformer Geometrie-informierter Neuraloperator Transformer 智能神经操作器变换器 2504.19452v4

Authors (6): Qibang Liu, Weiheng Zhong, Hadi Meidani, Diab Abueidda, Seid Koric, Philippe Geubelle

Machine-learning-based surrogate models offer significant computational efficiency and faster simulations compared to traditional numerical methods, especially for problems requiring repeated evaluations of partial differential equations. This work introduces the Geometry-Informed Neural Operator Transformer (GINOT), which integrates the transformer architecture with the neural operator framework to enable forward predictions on arbitrary geometries. GINOT employs a sampling and grouping strategy together with an attention mechanism to encode surface point clouds that are unordered, exhibit non-uniform point densities, and contain varying numbers of points for different geometries. The geometry information is seamlessly integrated with query points in the solution decoder through the attention mechanism. The performance of GINOT is validated on multiple challenging datasets, showcasing its high accuracy and strong generalization capabilities for complex and arbitrary 2D and 3D geometries.

与传统的数字方法相比,基于机械学习的代用模型提供了显著的计算效率和更快的模拟,特别是对于需要反复评价部分差异方程式的问题而言。这项工作引入了几何进制神经操作器变异器(GINOT),将变压器结构与神经操作器框架相结合,以便能够对任意的几何进行前瞻性预测。GINOT采用取样和分组战略,同时采用关注机制,对未排序、显示非统一点密度、并包含不同地理比例不同点数的表面点云进行编码。几何信息通过注意机制与溶液解码的查询点完全融合。GINOT的性能在多个具有挑战性的数据集上得到验证,显示其高精确度和强强的复杂和任意2D和3D的通用能力。


Article 227

Title@2025-07-09 (3): Low-Rank Adaptation Secretly Imitates Differentially Private SGD

Title: Low-Rank Adaptation Secretly Imitates Differentially Private SGD Low-Rank-Anpassung hinterhältig imitiert unterschiedlich private SGD 低浓度适应 秘密模仿 不同的私人 SGD 2409.17538v7

Authors (2): Saber Malekmohammadi, Golnoosh Farnadi

As pre-trained language models grow in size, fully fine-tuning their parameters on task adaptation data becomes increasingly impractical. To address this challenge, some methods for low-rank adaptation of language models have been proposed, e.g. LoRA, which incorporates trainable low-rank decomposition matrices into only some parameters of the pre-trained model, called adapters. This approach significantly reduces the number of trainable parameters compared to fine-tuning all parameters or adapters. In this work, we examine low-rank adaptation through the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA is equivalent to fine-tuning adapters with noisy batch gradients, just as the DPSGD algorithm does. We also quantify the variance of the injected noise as a decreasing function of adaptation rank. By establishing a Berry-Esseen type bound on the total variation distance between the injected noise distribution and a Gaussian noise distribution with the same variance, we show that the dynamics of low-rank adaptation are very close to those of DPSGD performed w.r.t. the adapters. Following our theoretical findings, and corroborated by our experimental results, we show that low-rank adaptation provides robustness to membership inference attacks w.r.t. the fine-tuning data.

随着经过培训的语文模型规模扩大,充分调整其任务适应数据参数变得日益不切实际。为了应对这一挑战,提出了一些语言模型低调适应方法,例如,LORA,它将可训练的低级分解矩阵纳入只是经过培训的模型的某些参数,称为适应器。这种方法大大降低了可训练参数的数量,而没有微调所有参数或适配器。在这项工作中,我们从数据隐私的角度来看待低级适应方法。我们从理论上表明,LORA使用的低级别适应方法相当于具有噪声批量梯度的微调适应器――正如DPSGD算法所做的那样。我们还将注入的噪声的差异量化为适应等级的递减功能。通过建立一个Berry-Es seen型,将注入的噪音分布与高调噪音分布之间的总变异差与差异联系起来。我们从数据隐私的角度看,低调的动态非常接近于在DPSGD进行调控时。根据我们的理论发现和实验性攻击的结果,我们展示了稳健的调数据。我们展示了低调数据。
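As a reminder of what "trainable low-rank decomposition matrices" means, the sketch below forms an adapted weight W + scale * (B A) from a frozen d x k matrix W and two small factors B (d x r) and A (r x k), and counts the trainable parameters. It uses plain Python lists for self-containment, and the helper names are illustrative rather than any library's API.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    inner = len(right)
    cols = len(right[0])
    return [
        [sum(row[t] * right[t][j] for t in range(inner)) for j in range(cols)]
        for row in left
    ]


def lora_weight(frozen, b_factor, a_factor, scale=1.0):
    """Return frozen + scale * (b_factor @ a_factor); only the factors train."""
    delta = matmul(b_factor, a_factor)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(frozen, delta)
    ]


def trainable_params(d, k, r):
    """Parameters in the two rank-r factors of a d x k adapter."""
    return r * (d + k)
```

For a 1024 x 1024 layer, a rank-8 adapter trains 16,384 parameters instead of 1,048,576, which is the reduction the abstract refers to.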


Article 228

Title@2025-07-09 (3): A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering

Title: A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering Eine neuartige, hybride Deep-Learning-Technik zur Sprachemotionserkennung mit Feature Engineering 利用地貌工程探测语音情感的新混合深层学习技术 2507.07046v1

Authors (4): Shahana Yasmin Chowdhury, Bithi Banik, Md Tamjidul Hoque, Shreya Banerjee

Nowadays, speech emotion recognition (SER) plays a vital role in the field of human-computer interaction (HCI) and the evolution of artificial intelligence (AI). Our proposed DCRF-BiLSTM model recognizes seven emotions (neutral, happy, sad, angry, fear, disgust, and surprise) and is trained on five datasets: RAVDESS (R), TESS (T), SAVEE (S), EmoDB (E), and CREMA-D (C). The model achieves high accuracy on individual datasets, including 97.83% on RAVDESS, 97.02% on SAVEE, 95.10% on CREMA-D, and 100% on both TESS and EmoDB. For the combined (R+T+S) datasets, it achieves 98.82% accuracy, outperforming previously reported results. To our knowledge, no existing study has evaluated a single SER model across all five benchmark datasets (i.e., R+T+S+C+E) simultaneously. In our work, we introduce this comprehensive combination and achieve an overall accuracy of 93.76%. These results confirm the robustness and generalizability of our DCRF-BiLSTM framework across diverse datasets.

目前,言语情感识别(SER)在人-计算机互动(HCI)和人工智能演化(AI)领域发挥着至关重要的作用。我们提议的DCR-BILSTM模式被用于识别七种情感:中立、快乐、悲伤、愤怒、恐惧、厌恶和惊喜,这些情感在五个数据集(REVDESS(R)、TESS(T)、SAVEE(S)、EmoDB(E)和CREMA-D(C)))方面受过培训,在五个数据集(RAVDESS(RAVDESS)97.83%、SAVE97.02%、CREMA-D95.10%、TES和EMO-DB一个完美的100%。对于综合(R+S)数据集,该数据集实现了98.82%的准确性,超过了以前报告的结果。据我们所知,现有研究没有对所有五个基准数据集(即R+T+S+CE+C+E+EEEE)的单一模型进行评价。在我们的工作中,我们采用了这种全面的组合,并实现了93.76%通用的DC-RIS框架的惊人的精确性。这些结果。


Article 229

Title@2025-07-09 (3): Non-Asymptotic Analysis of Online Local Private Learning with SGD

Title: Non-Asymptotic Analysis of Online Local Private Learning with SGD Nicht-asymptotische Analyse des lokalen Online-Lernens mit SGD 与SGD合作对本地私人在线学习进行非心理分析 2507.07041v1

Authors (5): Enze Shi, Jinhan Xie, Bei Jiang, Linglong Kong, Xuming He

Differentially Private Stochastic Gradient Descent (DP-SGD) has been widely used for solving optimization problems with privacy guarantees in machine learning and statistics. Despite this, a systematic non-asymptotic convergence analysis for DP-SGD, particularly in the context of online problems and local differential privacy (LDP) models, remains largely elusive. Existing non-asymptotic analyses have focused on non-private optimization methods, and hence are not applicable to privacy-preserving optimization problems. This work initiates the analysis to bridge this gap and opens the door to non-asymptotic convergence analysis of private optimization problems. A general framework is investigated for the online LDP model in stochastic optimization problems. We assume that sensitive information from individuals is collected sequentially and aim to estimate, in real-time, a static parameter that pertains to the population of interest. Most importantly, we conduct a comprehensive non-asymptotic convergence analysis of the proposed estimators in finite-sample situations, which gives their users practical guidelines regarding the effect of various hyperparameters, such as step size, parameter dimensions, and privacy budgets, on convergence rates. Our proposed estimators are validated in the theoretical and practical realms by rigorous mathematical derivations and carefully constructed numerical experiments.

尽管如此,对DP-SGD进行系统的非抽查性趋同分析,特别是在网上问题和当地差异隐私模式方面,仍然基本上难以实现。现有的非抽查性分析侧重于非私人优化方法,因此不适用于隐私保护优化问题。这项工作启动了这一分析,以弥补这一差距,打开对私人优化问题进行非抽查性趋同分析的大门。对在线LDP模式进行随机优化问题调查的总框架。我们假定,从个人收集敏感信息是按顺序顺序收集的,目的是实时估计与利益群体有关的静态参数。最重要的是,我们对有限情况下的拟议估算者进行全面的非抽查性趋同分析,为用户提供关于各种超分计(如步尺大小、参数尺寸、以及精确度和精确度)效应的实际模型的实用准则。我们设想,个人提供的敏感信息是按步数、精确度和精确度进行实时估算。我们提议的数学模型和精确度模型的精确度是精确度和精确度。
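The setting described above, with users arriving one at a time and each contributing a privatized gradient to estimate a static population parameter, can be sketched for the simplest case of mean estimation. The clipping bound, the Laplace mechanism, and the step-size schedule below are assumptions chosen for this sketch and are not the paper's estimator.

```python
import random


def clip(value, bound):
    """Restrict a scalar to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def laplace(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def online_ldp_sgd(stream, epsilon, clip_bound=5.0, offset=20, seed=0):
    """One-pass SGD for a population mean under epsilon-LDP (sketch).

    Each user holds one value x, computes the clipped gradient of
    0.5 * (theta - x)**2, and adds Laplace noise before reporting it.
    A clipped scalar in [-C, C] has sensitivity 2C, hence scale 2C/epsilon.
    """
    rng = random.Random(seed)
    theta = 0.0
    noise_scale = 2.0 * clip_bound / epsilon
    for t, x in enumerate(stream, start=1):
        private_grad = clip(theta - x, clip_bound) + laplace(noise_scale, rng)
        theta -= private_grad / (t + offset)  # decaying step size
    return theta


def run_demo(n_users=20000, epsilon=1.0):
    """Estimate the mean of a synthetic N(1, 1) population."""
    data_rng = random.Random(1)
    stream = [data_rng.gauss(1.0, 1.0) for _ in range(n_users)]
    return online_ldp_sgd(stream, epsilon)
```

A smaller privacy budget `epsilon` inflates the noise scale and slows convergence. That trade-off between step size, privacy budget, and convergence rate is what a non-asymptotic analysis is meant to quantify.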


Article 230

Title@2025-07-09 (3): Self-Supervised Learning at the Edge: The Cost of Labeling

Title: Self-Supervised Learning at the Edge: The Cost of Labeling Selbstüberwachtes Lernen am Rande: Die Kosten der Kennzeichnung 边缘自导学习:标签成本 2507.07033v1

Authors (6): Roberto Pereira, Fernanda Famá, Asal Rangrazi, Marco Miozzo, Charalampos Kalalas, Paolo Dini

Contrastive learning (CL) has recently emerged as an alternative to traditional supervised machine learning solutions by enabling rich representations from unstructured and unlabeled data. However, CL and, more broadly, self-supervised learning (SSL) methods often demand a large amount of data and computational resources, posing challenges for deployment on resource-constrained edge devices. In this work, we explore the feasibility and efficiency of SSL techniques for edge-based learning, focusing on trade-offs between model performance and energy efficiency. In particular, we analyze how different SSL techniques adapt to limited computational, data, and energy budgets, evaluating their effectiveness in learning robust representations under resource-constrained settings. Moreover, we also consider the energy costs involved in labeling data and assess how semi-supervised learning may assist in reducing the overall energy consumed to train CL models. Through extensive experiments, we demonstrate that tailored SSL strategies can achieve competitive performance while reducing resource consumption by up to 4X, underscoring their potential for energy-efficient learning at the edge.

最近出现了反向学习(CL),作为传统受监督的机器学习解决办法的替代方案,它使来自无结构的和未贴标签的数据能够产生丰富的代表性,然而,CL和更广泛地说,自我监督的学习(SSL)方法往往需要大量的数据和计算资源,对在资源紧缺的边缘设备上进行部署提出了挑战。在这项工作中,我们探索了SSL技术在边际学习的可行性和效率,侧重于模型性能和能源效率之间的权衡。我们特别分析了不同的SSL技术如何适应有限的计算、数据和能源预算,评价其在资源受限制的环境中学习强健代表性的效果。此外,我们还考虑了在标注数据方面的能源成本,并评估半监督的学习如何有助于减少用于培训CL模型的总体能源消耗量。我们通过广泛的实验,证明有针对性的SSL战略能够取得竞争性的绩效,同时将资源消耗减少到4X,强调其在边缘进行节能学习的潜力。


Article 231

Title@2025-07-09 (3): ZKTorch: Compiling ML Inference to Zero-Knowledge Proofs via Parallel Proof Accumulation

Title: ZKTorch: Compiling ML Inference to Zero-Knowledge Proofs via Parallel Proof Accumulation ZKTorch: Kompilieren von ML-Inferenz zu Null-Wissens-Proofs durch parallele Proof-Kumulation ZKTorch:通过平行证据累积,将ML推论编成零知识证据 2507.07031v1

Authors (3): Bing-Jyue Chen, Lilia Tang, Daniel Kang

As AI models become ubiquitous in our daily lives, there has been an increasing demand for transparency in ML services. However, the model owner does not want to reveal the weights, as they are considered trade secrets. To solve this problem, researchers have turned to zero-knowledge proofs of ML model inference. These proofs convince the user that the ML model output is correct, without revealing the weights of the model to the user. Past work on these provers can be placed into two categories. The first method compiles the ML model into a low-level circuit, and proves the circuit using a ZK-SNARK. The second method uses custom cryptographic protocols designed only for a specific class of models. Unfortunately, the first method is highly inefficient, making it impractical for the large models used today, and the second method does not generalize well, making it difficult to update in the rapidly changing field of machine learning. To solve this, we propose ZKTorch, an open source end-to-end proving system that compiles ML models into base cryptographic operations called basic blocks, each proved using specialized protocols. ZKTorch is built on top of a novel parallel extension to the Mira accumulation scheme, enabling succinct proofs with minimal accumulation overhead. These contributions allow ZKTorch to achieve at least a $3\times$ reduction in the proof size compared to specialized protocols and up to a $6\times$ speedup in proving time over a general-purpose ZKML framework.

随着AI模型在我们日常生活中变得无处不在,对ML服务透明度的需求不断增长。然而,模型所有人不想透露重量,因为他们被视为商业秘密。为了解决这个问题,研究人员已经转向了ML模型推断的零知识证明。这些证明使用户相信ML模型输出是正确的,而没有向用户透露模型的重量。这些证明过去的作品可以分为两类。第一种方法将ML模型编成一个低级电路,用ZK-SNARK来证明电路。第二种方法使用只为特定型号设计的定制加密协议。不幸的是,第一种方法效率很高,使今天使用的大型模型不切实际,而第二种方法没有很好地使快速变化的机器学习领域难以更新。为了解决这个问题,我们建议ZKTorrch,一个将ML模型编成基本加密操作的开放源端对端对端验证系统,称为ZK-SNARK。第二种方法使用专门化协议,每个方法都使用专门性协议的升级到最高层级协议。


Article 232

Title@2025-07-09 (3): Scaling 4D Representations

Title: Scaling 4D Representations Skalierung von 4D-Darstellungen 缩放 4D 代表制 2412.15212v2

Authors (35): João Carreira, Dilara Gokay, Michael King, Chuhan Zhang, Ignacio Rocco, Aravindh Mahendran, Thomas Albert Keck, Joseph Heyward, Skanda Koppula, Etienne Pot, Goker Erdogan, Yana Hasson, Yi Yang, Klaus Greff, Guillaume Le Moing, Sjoerd van Steenkiste, Daniel Zoran, Drew A. Hudson, Pedro Vélez, Luisa Polanía, Luke Friedman, Chris Duvarney, Ross Goroshin, Kelsey Allen, Jacob Walker, Rishabh Kabra, Eric Aboussouan, Jennifer Sun, Thomas Kipf, Carl Doersch, Viorica Pătrăucean, Dima Damen, Pauline Luc, Mehdi S. M. Sajjadi, Andrew Zisserman

Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks – action classification, ImageNet classification, etc. In this paper we focus on evaluating self-supervised learning on non-semantic vision tasks that are more spatial (3D) and temporal (+1D = 4D), such as camera pose estimation, point and object tracking, and depth estimation. We show that by learning from very large video datasets, masked auto-encoding (MAE) with transformer video models actually scales, consistently improving performance on these 4D tasks, as model size increases from 20M all the way to by far the largest reported self-supervised video model – 22B parameters. Rigorous apples-to-apples comparison with many recent image and video models demonstrates the benefits of scaling 4D representations. Pretrained models are available at https://github.com/google-deepmind/representations4d .

尚未令人信服地证明从视频中纯粹自我监督学习的缩放。 然而,先前的工作侧重于对语义相关任务的评价 $\ uncode{x2013}$Action分类、图像网络分类等。 在本文中,我们侧重于对空间(3D) 和时间(+1D = 4D) 等非语义性视觉任务自我监督学习进行评价,例如相机显示估计、点和对象跟踪以及深度估计。 我们通过从非常大的视频数据集中学习,以变压器视频模型为掩码的自动编码(MAE)实际上规模,不断改进这些4D任务的业绩,因为模型的规模从20M 一直到报告的最大型的自我监督视频模型 $\uncode{x2013}$ 22B 参数。 Rigorous苹果-to-aples 比较许多最近的图像和视频模型显示了缩放 4D 表示的效益。 预设模型可在 https://github.com/google-deepmind/pres4d上查阅。


Article 233

Title@2025-07-09 (3): Topological Machine Learning with Unreduced Persistence Diagrams

Title: Topological Machine Learning with Unreduced Persistence Diagrams Topologisches maschinelles Lernen mit nicht reduzierten Persistenzdiagrammen 具有未减少持久性图谱图的地形机学习 2507.07156v1

Authors (3): Nicole Abreu, Parker B. Edwards, Francis Motta

Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. However, computing persistence diagrams is often the most computationally demanding step in such a pipeline. To explore this, we introduce several methods to generate topological feature vectors from unreduced boundary matrices. We compare the performance of pipelines trained on vectorizations of unreduced PDs with that of pipelines trained on vectorizations of fully-reduced PDs across several data and task types. Our results indicate that models trained on PDs built from unreduced diagrams can perform on par with, and even outperform, those trained on fully-reduced diagrams on some tasks. This observation suggests that machine learning pipelines which incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.

根据持久性同系物的特征培训的受监督的机器学习管道已经实验观察,忽略了持久性图表中包含的许多信息。计算持久性图表往往是这种管道中计算最困难的一步。然而,为了探索这一点,我们采用了几种方法,从未减少的边界矩阵中产生地形特征矢量。我们比较了受过未减少的PD矢量培训的管道的性能,并将其与在若干数据和任务类型中完全减少的PD矢量化的管道的性能相比较。我们的结果显示,用未经减少的图表建立的PD培训模型可以以等同的方式运行,甚至优于那些经过充分减少的图表培训的一些任务。这一观察表明,机器学习含有基于地形的特征的管道在计算成本和性能方面可能获益于未减少的边界矩阵中所含的信息。
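For readers unfamiliar with the step the abstract describes as most expensive, the sketch below runs the standard left-to-right column reduction of a boundary matrix over Z/2. Columns are stored as sets of row indices and simplices are indexed in filtration order. The paper's own vectorizations of unreduced matrices are not shown here, and the example complex is made up.

```python
def reduce_boundary_matrix(columns):
    """Standard persistence reduction over Z/2.

    `columns[j]` is the set of row indices in the boundary of simplex j.
    Returns the reduced columns and a map pivot_row -> column, where each
    (pivot_row, column) entry is a birth/death pair of simplex indices.
    """
    reduced = [set(col) for col in columns]
    pivot_to_col = {}
    for j, col in enumerate(reduced):
        # Add earlier columns (mod 2) until the lowest entry is unique.
        while col and max(col) in pivot_to_col:
            col ^= reduced[pivot_to_col[max(col)]]
        if col:
            pivot_to_col[max(col)] = j
    return reduced, pivot_to_col


# Filtered triangle: vertices 0,1,2; edges 3={0,1}, 4={1,2}, 5={0,2}; face 6.
TRIANGLE = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
```

On `TRIANGLE`, vertices 1 and 2 are paired with edges 3 and 4, edge 5 creates a loop that the face 6 fills, and vertex 0 stays unpaired as the one connected component that never dies. The unreduced input is simply `TRIANGLE` itself, which is available before any of this work is done.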


Article 234

Title@2025-07-09 (3): On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence

Title: On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence On-Device-Training von PV-Stromprognosemodellen in einem Smart Meter für Grid Edge Intelligence 电网边缘情报智能光电预报模型在线设计培训 2507.07016v1

Authors (6): Jian Huang, Yongli Zhu, Linna Xu, Zhe Zheng, Wenpeng Cui, Mingyang Sun

In this paper, an edge-side model training study is conducted on a resource-limited smart meter. The motivation of grid-edge intelligence and the concept of on-device training are introduced. Then, the technical preparation steps for on-device training are described. A case study on the task of photovoltaic power forecasting is presented, where two representative machine learning models are investigated: a gradient boosting tree model and a recurrent neural network model. To adapt to the resource-limited situation in the smart meter, “mixed”- and “reduced”-precision training schemes are also devised. Experiment results demonstrate the feasibility of economically achieving grid-edge intelligence via the existing advanced metering infrastructures.

在本文中,在资源有限的智能计数器上进行了边缘示范培训研究,引进了电网情报的动机和装置培训概念,然后介绍了设备内部培训的技术准备步骤,介绍了关于光伏发电预报任务的案例研究,其中调查了两种有代表性的机器学习模型:梯度助树模型和经常性神经网络模型。为了适应智能计中资源有限的情况,还设计了“混合”和“减少”精确度培训计划。实验结果表明,通过现有的先进计量基础设施,在经济上实现电网智能的可行性。


Article 235

Title@2025-07-09 (3): MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation

Title: MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation MST-Destill: Mischung von spezialisierten Lehrern für Cross-Modal Knowledge Destillation MST-蒸馏:跨模式知识蒸馏专业教师混合 2507.07015v1

Authors (6): Hui Li, Pengfei Yang, Juanyang Chen, Le Dong, Yanxin Chen, Quan Wang

Knowledge distillation, as an efficient knowledge transfer technique, has achieved remarkable success in unimodal scenarios. However, in cross-modal settings, conventional distillation methods encounter significant challenges due to data and statistical heterogeneities, failing to leverage the complementary prior knowledge embedded in cross-modal teacher models. This paper empirically reveals two critical issues in existing approaches: distillation path selection and knowledge drift. To address these limitations, we propose MST-Distill, a novel cross-modal knowledge distillation framework featuring a mixture of specialized teachers. Our approach employs a diverse ensemble of teacher models across both cross-modal and multimodal configurations, integrated with an instance-level routing network that facilitates adaptive and dynamic distillation. This architecture effectively transcends the constraints of traditional methods that rely on monotonous and static teacher models. Additionally, we introduce a plug-in masking module, independently trained to suppress modality-specific discrepancies and reconstruct teacher representations, thereby mitigating knowledge drift and enhancing transfer effectiveness. Extensive experiments across five diverse multimodal datasets, spanning visual, audio, and text, demonstrate that our method significantly outperforms existing state-of-the-art knowledge distillation methods in cross-modal distillation tasks. The source code is available at https://github.com/Gray-OREO/MST-Distill.

作为一种有效的知识转让技术,知识蒸馏作为一种有效的知识蒸馏技术,在单式情景中取得了显著的成功;然而,在跨模式环境中,传统蒸馏方法由于数据和统计差异性不同而面临重大挑战,因为数据和统计层面的分流网络无法利用跨模式教师模式中固有的互补先前知识。本文从经验上揭示了现有方法中的两个关键问题:蒸馏路径选择和知识漂移。为解决这些局限性,我们提议采用新的跨模式知识蒸馏框架MST-蒸馏,这是一个由专业教师混合组成的新的跨模式知识蒸馏框架。我们的方法在跨模式和多式联运配置中采用多种教师模式的混合模式,结合了有助于适应和动态蒸馏的试级路径网络。这一结构有效地超越了依赖单调和静态教师模式的传统方法的制约。此外,我们引入了一个封闭式遮掩模模块,经过独立培训,以抑制特定模式的差异,并重建教师代表结构,从而减轻知识的流化和增强转让效力。我们的方法在五种不同的多式联运数据集、横跨视、音/多式结构中都采用多种教师模式模型,结合了各种教师模式的教师模型,与便于适应网络化网络化网络化网络网络网络网络网络网络网络网络网络网络网络,有效地超越了我们的方法超越了单质和静态/数字系统任务。


Article 236

Title@2025-07-09 (3): When Context Is Not Enough: Modeling Unexplained Variability in Car-Following Behavior

Title: When Context Is Not Enough: Modeling Unexplained Variability in Car-Following Behavior Wenn der Kontext nicht genug ist: Unerklärliche Variabilität im Auto-folgenden Verhalten modellieren 内容不够充分时:模拟汽车跟踪行为中未解释的变异性 2507.07012v1

Authors (4): Chengyuan Zhang, Zhengbing He, Cathy Wu, Lijun Sun

Modeling car-following behavior is fundamental to microscopic traffic simulation, yet traditional deterministic models often fail to capture the full extent of variability and unpredictability in human driving. While many modern approaches incorporate context-aware inputs (e.g., spacing, speed, relative speed), they frequently overlook structured stochasticity that arises from latent driver intentions, perception errors, and memory effects – factors that are not directly observable from context alone. To fill the gap, this study introduces an interpretable stochastic modeling framework that captures not only context-dependent dynamics but also residual variability beyond what context can explain. Leveraging deep neural networks integrated with nonstationary Gaussian processes (GPs), our model employs a scenario-adaptive Gibbs kernel to learn dynamic temporal correlations in acceleration decisions, where the strength and duration of correlations between acceleration decisions evolve with the driving context. This formulation enables a principled, data-driven quantification of uncertainty in acceleration, speed, and spacing, grounded in both observable context and latent behavioral variability. Comprehensive experiments on the naturalistic vehicle trajectory dataset collected from the German highway, i.e., the HighD dataset, demonstrate that the proposed stochastic simulation method within this framework surpasses conventional methods in both predictive performance and interpretable uncertainty quantification. The integration of interpretability and accuracy makes this framework a promising tool for traffic analysis and safety-critical applications.

模拟汽车跟踪行为是微小交通模拟的基础,但传统的确定性模型往往不能捕捉人类驾驶过程的可变性和不可预测性的全部程度。虽然许多现代方法包括了环境觉悟投入(如间距、速度、相对速度),但往往忽视了源于潜在驱动意图、认知错误和记忆效应的结构性随机效应 – – 这些因素并非仅从背景直接可见。为填补这一空白,本研究引入了一个可解释的随机模型框架,不仅能捕捉到因环境而异的动态,而且还能捕捉到环境所不能解释的剩余变异性。利用与非静止高斯过程(GPs)相结合的深神经网络,我们的模式采用一种情景适应性Gibs内核内流网络来学习加速决策中的动态时间相关性,即加速决定与驱动背景之间的关系的强度和持续时间。这一提法有助于根据可观察的背景和潜在行为变异性,在从德国高速公路、i. 可变性交通模型中采集的自然车辆轨迹数据,在这种可变性模型中和可变性分析工具中展示了这种可变性分析的可变性分析方法。
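The Gibbs kernel mentioned above is a nonstationary generalization of the squared-exponential kernel in which the lengthscale is a function of the input. A one-dimensional sketch follows. In the paper the lengthscale is said to adapt to the driving scenario, whereas the `lengthscale` callable here is a hypothetical placeholder supplied by the caller.

```python
import math


def gibbs_kernel(t1, t2, lengthscale):
    """1-D Gibbs kernel with input-dependent lengthscale l(t).

    k(t1, t2) = sqrt(2 l1 l2 / (l1^2 + l2^2))
                * exp(-(t1 - t2)^2 / (l1^2 + l2^2))
    """
    l1 = lengthscale(t1)
    l2 = lengthscale(t2)
    denom = l1 * l1 + l2 * l2
    prefactor = math.sqrt(2.0 * l1 * l2 / denom)
    return prefactor * math.exp(-((t1 - t2) ** 2) / denom)


def constant_lengthscale(value):
    """Lengthscale function that ignores its input (recovers the RBF kernel)."""
    return lambda _t: value
```

With a constant lengthscale the expression reduces to the ordinary RBF kernel exp(-(t1 - t2)^2 / (2 l^2)). Letting l(t) shrink in some regimes shortens the correlation between nearby acceleration decisions there, which matches the abstract's description of correlation strength and duration evolving with context.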


Article 237

Title@2025-07-09 (3): TokenShapley: Token Level Context Attribution with Shapley Value

Title: TokenShapley: Token Level Context Attribution with Shapley Value TokenShapley: Token Level Context Attribution mit Shapley Wert TokenShapley: 具有形状价值的 Token 级别背景属性 2507.05261v2

Authors (6): Yingtai Xiao, Yuqing Zhu, Sirat Samyoun, Wanrong Zhang, Jiachen T. Wang, Jian Du

Large language models (LLMs) demonstrate strong capabilities in in-context learning, but verifying the correctness of their generated responses remains a challenge. Prior work has explored attribution at the sentence level, but these methods fall short when users seek attribution for specific keywords within the response, such as numbers, years, or names. To address this limitation, we propose TokenShapley, a novel token-level attribution method that combines Shapley value-based data attribution with KNN-based retrieval techniques inspired by recent advances in KNN-augmented LLMs. By leveraging a precomputed datastore for contextual retrieval and computing Shapley values to quantify token importance, TokenShapley provides a fine-grained data attribution approach. Extensive evaluations on four benchmarks show that TokenShapley outperforms state-of-the-art baselines in token-level attribution, achieving an 11-23% improvement in accuracy.

大型语言模型(LLMs)在文字学习方面表现出很强的能力,但核实其生成响应的正确性仍是一项挑战。先前的工作在句级一级探索了归属问题,但当用户在响应中寻找特定关键词(如数字、年份或名称)的归属问题时,这些方法不尽人意。为了应对这一限制,我们提议TokenShapley,这是一种新型象征性归属方法,它将基于价值的数据归属与基于KNN的检索技术结合起来,这种方法受最近KNN提法LMs进步的启发。通过利用一个预先计算的数据存储器来进行背景检索和计算Shapley值以量化象征性重要性,TokenShapley提供了一种细微的归因数据归属方法。对四项基准的广泛评价表明,TokenShapley在象征性归属方面超越了最新的基线,从而实现了11-23%的准确性改进。
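For reference, the Shapley value of a context item is its marginal contribution to a utility function averaged over all orderings of the items. The brute-force sketch below enumerates permutations, so it is only usable for a handful of items. The toy utility is invented for illustration, and the KNN-based datastore that TokenShapley uses to make this tractable is not modelled.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values by averaging marginal gains over all orderings.

    `utility` maps a frozenset of players to a number.
    """
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        previous = utility(coalition)
        for player in order:
            coalition = coalition | {player}
            current = utility(coalition)
            totals[player] += current - previous
            previous = current
    return {player: total / len(orderings) for player, total in totals.items()}


def toy_utility(coalition):
    """Hypothetical: the answer token is supported only if 'a' and 'b' are both present."""
    return 1.0 if {"a", "b"} <= coalition else 0.0
```

In the toy case, items `a` and `b` share the credit equally and the irrelevant item `c` gets none. The values always sum to the utility of the full set minus that of the empty set.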


Article 238

Title@2025-07-09 (3): Exact Evaluation of the Accuracy of Diffusion Models for Inverse Problems with Gaussian Data Distributions

Title: Exact Evaluation of the Accuracy of Diffusion Models for Inverse Problems with Gaussian Data Distributions Exakte Bewertung der Genauigkeit von Diffusionsmodellen für inverse Probleme mit Gaußschen Datenverteilungen 对高山数据发布反问题传播模型准确性精确评价 2507.07008v1

Authors (2): Emile Pierret, Bruno Galerne

Used as priors for Bayesian inverse problems, diffusion models have recently attracted considerable attention in the literature. Their flexibility and high variance enable them to generate multiple solutions for a given task, such as inpainting, super-resolution, and deblurring. However, several unresolved questions remain about how well they perform. In this article, we investigate the accuracy of these models when applied to a Gaussian data distribution for deblurring. Within this constrained context, we are able to precisely analyze the discrepancy between the theoretical resolution of inverse problems and their resolution obtained using diffusion models by computing the exact Wasserstein distance between the distribution of the diffusion model sampler and the ideal distribution of solutions to the inverse problem. Our findings allow for the comparison of different algorithms from the literature.

传播模型作为贝叶斯反向问题的前身,最近在文献中引起了相当的注意。其灵活性和高度差异使得它们能够为特定任务产生多种解决办法,例如油漆、超分辨率和混凝土。然而,关于这些模型的运作情况,还存在几个尚未解决的问题。在本篇文章中,我们调查这些模型在应用到高斯数据分配中用于破碎的数据时的准确性。在这种有限的背景下,我们能够精确分析反向问题的理论解决办法与利用传播模型获得的这些问题的解析方法之间的差异,方法是计算扩散模型取样器分布与反向问题解决方案的理想分配之间的准确距离。我们的调查结果可以比较与文献的不同算法。
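The reason the Gaussian setting allows exact evaluation is that the 2-Wasserstein distance between two Gaussians has a closed form: the squared distance is the squared distance between the means plus tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}). The sketch below implements only the special case of diagonal covariances, where the trace term becomes a sum of squared differences of standard deviations. Restricting to that case is a simplification made here, not a claim about the paper's derivation.

```python
import math


def w2_gaussian_diag(mean1, var1, mean2, var2):
    """2-Wasserstein distance between Gaussians with diagonal covariances.

    `var1` and `var2` are the per-coordinate variances (the diagonals).
    """
    squared = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        squared += (m1 - m2) ** 2 + (math.sqrt(v1) - math.sqrt(v2)) ** 2
    return math.sqrt(squared)
```

Two samplers with equal covariance differ only by the Euclidean distance between their means, while a sampler whose variance is too small or too large is penalized through the standard-deviation term.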


Article 239

Title@2025-07-09 (3): GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning

Title: GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning GNN-ViTCap: GNN-erweitertes Multiple-Instance-Lernen mit Vision-Transformern für die Klassifizierung und Beschriftung von ganzen Folienbildern GNN-Vittrap:GNN-加强多实例学习,与全幻灯片图像分类和说明的愿景变异者一起进行全幻灯片图像分类和说明 2507.07006v1

Authors (5): S M Taslim Uddin Raju, Md. Milon Islam, Md Rezwanul Haque, Hamdi Altaheri, Fakhri Karray

Microscopic assessment of histopathology images is vital for accurate cancer diagnosis and treatment. Whole Slide Image (WSI) classification and captioning have become crucial tasks in computer-aided pathology. However, microscopic WSIs face challenges such as redundant patches and unknown patch positions due to subjective pathologist captures. Moreover, generating automatic pathology captions remains a significant challenge. To address these issues, we introduce a novel GNN-ViTCap framework for classification and caption generation from histopathological microscopic images. First, a visual feature extractor generates patch embeddings. Redundant patches are then removed by dynamically clustering these embeddings using deep embedded clustering and selecting representative patches via a scalar dot attention mechanism. We build a graph by connecting each node to its nearest neighbors in the similarity matrix and apply a graph neural network to capture both local and global context. The aggregated image embeddings are projected into the language model’s input space through a linear layer and combined with caption tokens to fine-tune a large language model. We validate our method on the BreakHis and PatchGastric datasets. GNN-ViTCap achieves an F1 score of 0.934 and an AUC of 0.963 for classification, along with a BLEU-4 score of 0.811 and a METEOR score of 0.569 for captioning. Experimental results demonstrate that GNN-ViTCap outperforms state-of-the-art approaches, offering a reliable and efficient solution for microscopy-based patient diagnosis.

组织病理学图像的显微评估对于癌症的准确诊断和治疗至关重要。全切片图像(WSI)分类和描述生成已成为计算机辅助病理学中的关键任务。然而,由于病理学家采集图像时带有主观性,显微 WSI 面临冗余图像块和图像块位置未知等挑战。此外,自动生成病理描述仍是一项重大挑战。为解决这些问题,我们提出了一个新的 GNN-ViTCap 框架,用于对组织病理学显微图像进行分类和描述生成。首先,视觉特征提取器生成图像块嵌入。随后,利用深度嵌入聚类对这些嵌入进行动态聚类,并通过标量点注意力机制选取有代表性的图像块,从而去除冗余图像块。我们将每个节点与其在相似度矩阵中的最近邻相连来构建图,并应用图神经网络捕获局部和全局上下文。聚合后的图像嵌入经线性层投影到语言模型的输入空间,并与描述词元结合,用于微调大型语言模型。我们在 BreakHis 和 PatchGastric 数据集上验证了该方法。GNN-ViTCap 在分类任务上取得了 0.934 的 F1 分数和 0.963 的 AUC,在描述生成任务上取得了 0.811 的 BLEU-4 分数和 0.569 的 METEOR 分数。实验结果表明,GNN-ViTCap 优于现有最先进的方法,为基于显微镜的患者诊断提供了可靠而高效的解决方案。


Article 240

Title@2025-07-09 (3): Skewed Score: A statistical framework to assess autograders

Title: Skewed Score: A statistical framework to assess autograders Skewed Score: Ein statistischer Rahmen zur Bewertung von Autogradern 倾斜分分数:评估自动降解器的统计框架 2507.03772v2

Authors (6): Magda Dubois, Harry Coppock, Mario Giulianelli, Timo Flesch, Lennart Luettgau, Cozmin Ududec

The evaluation of large language model (LLM) outputs is increasingly performed by other LLMs, a setup commonly known as “LLM-as-a-judge”, or autograders. While autograders offer a scalable alternative to human evaluation, they have shown mixed reliability and may exhibit systematic biases, depending on response type, scoring methodology, domain specificity, or other factors. Here we propose a statistical framework based on Bayesian generalised linear models (GLMs) that enables researchers to simultaneously assess their autograders while addressing their primary research questions (e.g., LLM evaluation). Our approach models evaluation outcomes (e.g., scores or pairwise preferences) as a function of properties of the grader (e.g., human vs. autograder) and the evaluated item (e.g., response length or the LLM that generated it), allowing for explicit quantification of scoring differences and potential biases within a unified framework. In addition, our method can be used to augment traditional metrics such as inter-rater agreement, by providing uncertainty estimates and clarifying sources of disagreement. Overall, this approach contributes to more robust and interpretable use of autograders in LLM evaluation, enabling both performance analysis and bias detection.

大型语言模型(LLM)产出的评价越来越多地由其他LLM(LLM)进行,这一机制通常称为“LLM-a-judge”或自动升级者。虽然自动升级者提供了可扩缩的人类评价的替代方法,但它们显示出了不完全的可靠性,并可能根据响应类型、评分方法、域特性或其他因素而表现出系统性的偏差。我们在这里提议了一个基于巴伊西亚通用线性模型(GLM)的统计框架,使研究人员能够在处理其主要研究问题(例如LLM评价)的同时评估自己的自动升级者。我们的方法模型的评价结果(例如分数或配对偏好)是分(例如人对自动升级者的偏好)和被评价项目(例如反应长度或生成的LM)的特性的函数。我们提出的统计框架可以明确量化评分差异和潜在偏差。此外,我们的方法可以用来通过提供不确定性估计和澄清分歧的来源来增强传统的衡量尺度。总体而言,这一方法有助于更有力和可解释地检测和解释的磁分析。


Article 241

Title@2025-07-09 (3): Multi-Modality Conditioned Variational U-Net for Field-of-View Extension in Brain Diffusion MRI

Title: Multi-Modality Conditioned Variational U-Net for Field-of-View Extension in Brain Diffusion MRI Multi-Modality Conditioned Variational U-Net für Feld-of-View-Erweiterung in Gehirn-Diffusion MRT 用于脑传播MRI中实地观察扩展的多式移动式有条件U-Net 2409.13846v2

Authors (16): Zhiyuan Li, Chenyu Gao, Praitayini Kanakaraj, Shunxing Bao, Lianrui Zuo, Michael E. Kim, Nancy R. Newlin, Gaurav Rudravaram, Nazirah M. Khairi, Yuankai Huo, Kurt G. Schilling, Walter A. Kukull, Arthur W. Toga, Derek B. Archer, Timothy J. Hohman, Bennett A. Landman

An incomplete field-of-view (FOV) in diffusion magnetic resonance imaging (dMRI) can severely hinder the volumetric and bundle analyses of whole-brain white matter connectivity. Although existing works have investigated imputing the missing regions using deep generative models, it remains unclear how to specifically utilize additional information from paired multi-modality data and whether this can enhance the imputation quality and be useful for downstream tractography. To fill this gap, we propose a novel framework for imputing dMRI scans in the incomplete part of the FOV by integrating the learned diffusion features in the acquired part of the FOV with the complete brain anatomical structure. We hypothesize that by this design the proposed framework can enhance the imputation performance of the dMRI scans and therefore be useful for repairing whole-brain tractography in corrupted dMRI scans with incomplete FOV. We tested our framework on two cohorts from different sites with a total of 96 subjects and compared it with a baseline imputation method that treats the information from T1w and dMRI scans equally. The proposed framework achieved significant improvements in imputation performance, as demonstrated by angular correlation coefficient (p < 1E-5), and in downstream tractography accuracy, as demonstrated by Dice score (p < 0.01). Results suggest that the proposed framework improved imputation performance in dMRI scans by specifically utilizing additional information from paired multi-modality data, compared with the baseline method. The imputation achieved by the proposed framework enhances whole-brain tractography, and therefore reduces the uncertainty when analyzing bundles associated with neurodegeneration.

扩散磁共振成像(dMRI)中不完整的视野(FOV)会严重妨碍对全脑白质连接的体积分析和纤维束分析。尽管已有工作研究了利用深度生成模型对缺失区域进行插补,但如何专门利用成对多模态数据中的额外信息,以及这是否能提高插补质量并有助于下游纤维束成像,仍不清楚。为填补这一空白,我们提出了一个新框架,通过将 FOV 已采集部分中学到的扩散特征与完整的脑解剖结构相结合,对 FOV 不完整部分的 dMRI 扫描进行插补。我们假设,这种设计能够提升 dMRI 扫描的插补性能,因而有助于修复 FOV 不完整的受损 dMRI 扫描中的全脑纤维束成像。我们在来自不同站点的两个队列(共 96 名受试者)上测试了该框架,并将其与同等对待 T1w 和 dMRI 扫描信息的基线插补方法进行了比较。所提框架在插补性能(以角相关系数衡量,p < 1E-5)和下游纤维束成像准确性(以 Dice 分数衡量,p < 0.01)上均取得了显著提升。结果表明,与基线方法相比,所提框架通过专门利用成对多模态数据中的额外信息,改进了 dMRI 扫描的插补性能。所提框架实现的插补增强了全脑纤维束成像,从而降低了分析与神经退行性变相关的纤维束时的不确定性。
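The abstract above reports downstream tractography accuracy as a Dice score. As a reminder of what that metric computes, here is a minimal sketch for two binary masks, assuming the bundles have already been rasterized to voxel masks of equal shape; the function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def dice_score(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)


if __name__ == "__main__":
    reference = [1, 1, 0, 0]
    imputed = [1, 0, 1, 0]
    print(dice_score(reference, imputed))
```

A score of 1 means the two masks coincide and 0 means they do not overlap at all.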


Article 242

Title@2025-07-09 (3): Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs

Title: Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs Bewusst lernen, intuitiv handeln: Entsperren von Test-Time-Reasoning in multimodalen LLMs 故意、自觉地学习:在多模式LMs中解锁试验时间理由 2507.06999v1

Authors (3): Yahan Yu, Yuyang Dong, Masafumi Oyamada

Reasoning is a key capability for large language models (LLMs), particularly when applied to complex tasks such as mathematical problem solving. However, multimodal reasoning research still requires further exploration of modality alignment and training costs. Many of these approaches rely on additional data annotation and relevant rule-based rewards to enhance the understanding and reasoning ability, which significantly increases training costs and limits scalability. To address these challenges, we propose the Deliberate-to-Intuitive reasoning framework (D2I) that improves the understanding and reasoning ability of multimodal LLMs (MLLMs) without extra annotations and complex rewards. Specifically, our method sets deliberate reasoning strategies to enhance modality alignment only through the rule-based format reward during training. During evaluation, the reasoning style shifts to intuitive: the deliberate reasoning strategies used during training are removed, and the model’s acquired abilities are reflected implicitly in the response. D2I outperforms baselines across both in-domain and out-of-domain benchmarks. Our findings highlight the role of format reward in fostering transferable reasoning skills in MLLMs, and inspire directions for decoupling training-time reasoning depth from test-time response flexibility.

解释是大型语言模型(LLMS)的关键能力,特别是在应用到数学问题解决等复杂任务时。然而,多式联运推理研究仍需要进一步探索模式调整和培训成本。许多这些方法依靠额外数据说明和相关基于规则的奖励来增强理解和推理能力,这大大增加了培训成本并限制了可缩放性。为应对这些挑战,我们提议了自觉到直觉的推理框架(D2I),它提高了多式联运LMS(MLMS)的理解和推理能力,而没有额外的说明和复杂的奖赏。具体地说,我们的方法设置了有意推理战略,只通过培训期间基于规则的格式奖励来增强模式的调整。在评估时,推理风格转向直觉,它消除了培训期间故意推理策略,隐含地反映了模型在应对过程中获得的能力。D2I超越了内部和外部基准的基线。我们的调查结果强调了格式奖励在培养MLLMS的可转移推理技巧和复杂奖赏方面的作用,并激励了培训时间推理深度与试验时反应灵活性的分。


Article 243

Title@2025-07-09 (3): Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks

Title: Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks Federated Learning-based MARL zur Stärkung der Physical-Layer-Sicherheit in B5G-Netzwerken 加强B5G网络中人身安全的联邦学习型MARL 2507.06997v1

Authors (3): Deemah H. Tashman, Soumaya Cherkaoui, Walaa Hamouda

This paper explores the application of a federated learning-based multi-agent reinforcement learning (MARL) strategy to enhance physical-layer security (PLS) in a multi-cellular network within the context of beyond 5G networks. At each cell, a base station (BS) operates as a deep reinforcement learning (DRL) agent that interacts with the surrounding environment to maximize the secrecy rate of legitimate users in the presence of an eavesdropper. This eavesdropper attempts to intercept the confidential information shared between the BS and its authorized users. The DRL agents are deemed to be federated since they only share their network parameters with a central server and not the private data of their legitimate users. Two DRL approaches, deep Q-network (DQN) and Reinforce deep policy gradient (RDPG), are explored and compared. The results demonstrate that RDPG converges more rapidly than DQN. In addition, we demonstrate that the proposed method outperforms the distributed DRL approach. Furthermore, the outcomes illustrate the trade-off between security and complexity.

本文探讨在5G网络之外,在一个多细胞网络中应用一个基于学习的多剂强化学习(MARL)联合战略,以加强实物安全,每个基站作为深度强化学习(DRL)代理器,与周围环境互动,在监听器面前最大限度地提高合法用户的保密率。这个窃听器试图拦截BS与其授权用户之间共享的机密信息。DRL代理器被视为是联合的,因为它们仅与中央服务器共享其网络参数,而不是其合法用户的私人数据。探索和比较了两种DRL方法、深Q网络(DQN)和加强深度政策梯度(RDPG)。结果显示,RDPG比DQN更快地聚合。此外,我们证明,拟议的方法超过了分布式DRL方法。此外,结果说明了安全和复杂性之间的权衡。


Article 244

Title@2025-07-09 (3): Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing

Title: Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing Erzeugung von Multi-Table-Zeitreihen EHR aus dem Latent Space mit minimaler Vorverarbeitung 利用最小预处理器从冷藏空间生成多表时间序列 EHR 2507.06996v1

Authors (5): Eunbyeol Cho, Jiyoun Kim, Minjae Lee, Sungjin Park, Edward Choi

Electronic Health Records (EHR) are time-series relational databases that record patient interactions and medical events over time, serving as a critical resource for healthcare research and applications. However, privacy concerns and regulatory restrictions limit the sharing and utilization of such sensitive data, necessitating the generation of synthetic EHR datasets. Unlike previous EHR synthesis methods, which typically generate medical records consisting of expert-chosen features (e.g. a few vital signs or structured codes only), we introduce RawMed, the first framework to synthesize multi-table, time-series EHR data that closely resembles raw EHRs. Using text-based representation and compression techniques, RawMed captures complex structures and temporal dynamics with minimal preprocessing. We also propose a new evaluation framework for multi-table time-series synthetic EHRs, assessing distributional similarity, inter-table relationships, temporal dynamics, and privacy. Validated on two open-source EHR datasets, RawMed outperforms baseline models in fidelity and utility. The code is available at https://github.com/eunbyeol-cho/RawMed.

电子健康记录(EHR)是长期记录病人互动和医疗事件的时间序列关系数据库,是保健研究和应用的重要资源,然而,隐私关切和监管限制限制了此类敏感数据的共享和利用,因此需要生成合成的EHR数据集。与以前的EHR合成方法不同,后者通常生成由专家选择特征(例如,只有几个生命标志或结构代码)组成的医疗记录。我们引入了RawMed,这是第一个综合与原始EHR非常相似的多表、时间序列EHR数据的框架。使用基于文本的表述和压缩技术,RawMed捕捉复杂的结构和时间动态,且只有最低限度的预处理。我们还提出了新的多表时间序列合成EHR的评估框架,评估分布相似性、跨表关系、时间动态和隐私。在两个开放源EHR数据集上进行了验证,RawMed超越了忠实和实用的基线模型。该代码可在 https://github.com/eunbyeol-cho/RawMed 上查阅。


Article 245

Title@2025-07-09 (3): A Principled Framework for Multi-View Contrastive Learning

Title: A Principled Framework for Multi-View Contrastive Learning Ein prinzipielles Rahmenwerk für das kontrastive Lernen mit mehreren Ansichten 多方审查差异学习原则性框架 2507.06979v1

Authors (6): Panagiotis Koromilas, Efthymios Georgiou, Giorgos Bouritsas, Theodoros Giannakopoulos, Mihalis A. Nicolaou, Yannis Panagakis

Contrastive Learning (CL), a leading paradigm in Self-Supervised Learning (SSL), typically relies on pairs of data views generated through augmentation. While multiple augmentations per instance (more than two) improve generalization in supervised learning, current CL methods handle additional views suboptimally by simply aggregating different pairwise objectives. This approach suffers from four critical limitations: (L1) it utilizes multiple optimization terms per data point, resulting in conflicting objectives, (L2) it fails to model all interactions across views and data points, (L3) it inherits fundamental limitations (e.g. alignment-uniformity coupling) from pairwise CL losses, and (L4) it prevents fully realizing the benefits of increased view multiplicity observed in supervised settings. We address these limitations through two novel loss functions: MV-InfoNCE, which extends InfoNCE to incorporate all possible view interactions simultaneously in one term per data point, and MV-DHEL, which decouples alignment from uniformity across views while scaling interaction complexity with view multiplicity. Both approaches are theoretically grounded: we prove they asymptotically optimize for alignment of all views and uniformity, providing principled extensions to multi-view contrastive learning. Our empirical results on ImageNet1K and three other datasets demonstrate that our methods consistently outperform existing multi-view approaches and effectively scale with increasing view multiplicity. We also apply our objectives to multimodal data and show that, in contrast to other contrastive objectives, they can scale beyond just two modalities. Most significantly, ablation studies reveal that MV-DHEL with five or more views effectively mitigates dimensionality collapse by fully utilizing the embedding space, thereby delivering multi-view benefits observed in supervised learning.

对比学习(CL)是自监督学习(SSL)中的主流范式,通常依赖于通过数据增强生成的成对数据视图。尽管在监督学习中,每个样本使用多个(两个以上)增强可以提升泛化能力,但现有的 CL 方法只是简单地聚合不同的成对目标,对额外视图的处理并不理想。这种做法存在四个关键局限:(L1)每个数据点使用多个优化项,导致目标相互冲突;(L2)无法对跨视图和跨数据点的全部交互进行建模;(L3)继承了成对 CL 损失的根本局限(例如对齐与均匀性的耦合);(L4)无法充分实现监督设置中观察到的增加视图数量所带来的收益。我们通过两个新的损失函数来解决这些局限:MV-InfoNCE 对 InfoNCE 进行扩展,在每个数据点的单个项中同时纳入所有可能的视图交互;MV-DHEL 则在各视图间将对齐与均匀性解耦,同时使交互复杂度随视图数量扩展。两种方法都有理论依据:我们证明它们渐近地优化所有视图的对齐以及均匀性,为多视图对比学习提供了有原则的扩展。我们在 ImageNet1K 和另外三个数据集上的实验结果表明,我们的方法始终优于现有的多视图方法,并能随视图数量的增加有效扩展。我们还将这些目标应用于多模态数据,并表明与其他对比目标不同,它们可以扩展到两个以上的模态。最值得注意的是,消融研究表明,使用五个或更多视图的 MV-DHEL 能够充分利用嵌入空间,从而有效缓解维度坍缩,实现监督学习中观察到的多视图收益。


Article 246

Title@2025-07-09 (3): PyPOTS: A Python Toolkit for Machine Learning on Partially-Observed Time Series

Title: PyPOTS: A Python Toolkit for Machine Learning on Partially-Observed Time Series PyPOTS: Ein Python-Toolkit für maschinelles Lernen auf teilbeobachteten Zeitreihen PyPOTS:关于部分使用时间序列的机器学习的Python工具包 2305.18811v2

Authors (5): Wenjie Du, Yiyuan Yang, Linglong Qian, Jun Wang, Qingsong Wen

PyPOTS is an open-source Python library dedicated to data mining and analysis on multivariate partially-observed time series with missing values. Particularly, it provides easy access to diverse algorithms categorized into five tasks: imputation, forecasting, anomaly detection, classification, and clustering. The included models represent a diverse set of methodological paradigms, offering a unified and well-documented interface suitable for both academic research and practical applications. With robustness and scalability in its design philosophy, best practices of software construction, for example, unit testing, continuous integration and continuous delivery, code coverage, maintainability evaluation, interactive tutorials, and parallelization, are carried out as principles during the development of PyPOTS. The toolbox is available on PyPI, Anaconda, and Docker. PyPOTS is open source and publicly available on GitHub https://github.com/WenjieDu/PyPOTS.

PyPOTS是一个开放源码的Python库,专门用于数据挖掘和分析缺少值的多变量部分观察时间序列,特别是,它提供了容易获得分为五类任务的各种算法:估算、预测、异常检测、分类和集群,包括模型代表了一套不同的方法范式,提供了适合学术研究和实用应用的统一和有据可查的界面。在设计哲学中,软件建设的最佳做法,例如单位测试、连续整合和连续交付、代码覆盖、维护性评价、交互式辅导和平行化,作为PyPOTS开发期间的原则,在PyPI、Anaconda和Docker上提供工具箱。PyPOTS是开源的,在GitHub https://github.com/WenjieDu/PyPOTS 上公开提供。


Article 247

Title@2025-07-09 (3): Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Title: Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy Vereinheitlichung von Re-Identifikations-, Attribut-Schlussfolgerungs- und Datenwiederaufbaurisiken in der unterschiedlichen Privatsphäre 不同隐私中的识别、属性推断和数据重建风险 2507.06969v1

Authors (7): Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Jamie Hayes, Borja Balle, Flavio du Pin Calmon, Jean Louis Raisaro

Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks – re-identification, attribute inference, and data reconstruction – are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP), and determine that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across a multitude of attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary (including worst-case) levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, Rényi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., more than 15pp accuracy increase in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.

不同的私人(DP)机制很难解释和校准,因为现有为具体隐私风险绘制标准隐私参数的方法 – – 重新识别、属性推断和数据重建 – – 过于悲观和不一致。在这项工作中,我们使用DP(f$-DP)的假设测试解释,确定攻击成功界限在重复识别、属性推断和数据重建风险之间可以采取同样的统一形式。我们的统一界限是:(1) 在许多攻击环境之间保持一致,(2) 使从业人员能够评估任意(包括最坏情况)基线风险水平的风险。 简而言之,我们的结果比以前使用$\varepsilon$-DP、R'enyi DP和集中的DP的方法更加严格。结果,用我们的界限校准噪音可以在同一风险水平上将所要求的噪音减少20%,例如,文本分类任务的准确度增加15ppo以上。总体而言,这种统一的观点为解释和校准DP的保护程度与特定水平的重新确定数据或属性提供了原则框架。


Article 248

Title@2025-07-09 (3): Noisy PDE Training Requires Bigger PINNs

Title: Noisy PDE Training Requires Bigger PINNs Noisy PDE Training erfordert größere PINNs 吵闹的PDE培训需要更大的 PINN 2507.06967v1

Authors (3): Sebastien Andre-Sloan, Anirbit Mukherjee, Matthew Colbrook

Physics-Informed Neural Networks (PINNs) are increasingly used to approximate solutions of partial differential equations (PDEs), especially in high dimensions. In real-world applications, data samples are noisy, so it is important to know when a predictor can still achieve low empirical risk. However, little is known about the conditions under which a PINN can do so effectively. We prove a lower bound on the size of neural networks required for the supervised PINN empirical risk to fall below the variance of noisy supervision labels. Specifically, if a predictor achieves an empirical risk $O(\eta)$ below $\sigma^2$ (variance of supervision data), then necessarily $d_N\log d_N\gtrsim N_s \eta^2$, where $N_s$ is the number of samples and $d_N$ is the number of trainable parameters of the PINN. A similar constraint applies to the fully unsupervised PINN setting when boundary labels are sampled noisily. Consequently, increasing the number of noisy supervision labels alone does not provide a “free lunch” in reducing empirical risk. We also show empirically that PINNs can indeed achieve empirical risks below $\sigma^2$ under such conditions. As a case study, we investigate PINNs applied to the Hamilton–Jacobi–Bellman (HJB) PDE. Our findings lay the groundwork for quantitatively understanding the parameter requirements for training PINNs in the presence of noise.

物理信息神经网络(PINN)越来越多地用于近似偏微分方程(PDE)的解,尤其是在高维情形下。在实际应用中,数据样本带有噪声,因此了解预测器何时仍能达到较低的经验风险十分重要。然而,对于 PINN 在何种条件下能够有效做到这一点,人们知之甚少。我们证明了一个神经网络规模的下界:要使有监督 PINN 的经验风险低于带噪监督标签的方差,网络必须达到该规模。具体而言,如果预测器的经验风险比 $\sigma^2$(监督数据的方差)低 $O(\eta)$,则必有 $d_N\log d_N\gtrsim N_s \eta^2$,其中 $N_s$ 是样本数,$d_N$ 是 PINN 的可训练参数个数。当边界标签的采样带有噪声时,类似的约束也适用于完全无监督的 PINN 设置。因此,仅增加带噪监督标签的数量并不能在降低经验风险方面提供“免费午餐”。我们还通过实验表明,在上述条件下 PINN 确实可以达到低于 $\sigma^2$ 的经验风险。作为案例研究,我们考察了应用于 Hamilton–Jacobi–Bellman(HJB)PDE 的 PINN。我们的发现为定量理解在噪声存在时训练 PINN 所需的参数规模奠定了基础。
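To get a feel for how the bound $d_N\log d_N\gtrsim N_s \eta^2$ scales, the sketch below finds the smallest integer $d_N$ satisfying $d_N \log d_N \ge c\, N_s \eta^2$. The constant `c`, the use of the natural logarithm, and the function name `min_trainable_params` are assumptions made for illustration: the abstract states the relation only up to an unspecified constant, so the numbers are indicative of scaling, not of actual network sizes.

```python
import math


def min_trainable_params(n_samples, eta, c=1.0):
    """Smallest integer d >= 2 with d * ln(d) >= c * n_samples * eta**2.

    `c` stands in for the unspecified constant hidden by the asymptotic
    inequality; its value here is an assumption.
    """
    target = c * n_samples * eta ** 2
    high = 2
    while high * math.log(high) < target:
        high *= 2  # exponential search for an upper bracket
    low = max(2, high // 2)
    while low < high:  # binary search; d * ln(d) is increasing for d >= 2
        mid = (low + high) // 2
        if mid * math.log(mid) >= target:
            high = mid
        else:
            low = mid + 1
    return low


if __name__ == "__main__":
    for n_samples in (10_000, 40_000, 160_000):
        print(n_samples, min_trainable_params(n_samples, eta=0.1))
```

Under these assumptions, the required parameter count grows almost linearly with the number of noisy samples, which matches the abstract's point that adding noisy labels alone is not a free lunch.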


Article 249

Title@2025-07-09 (3): Pullback Flow Matching on Data Manifolds

Title: Pullback Flow Matching on Data Manifolds Pullback Flow Passend zu Daten-Manifolds 数据处理匹配的回回回回回流 2410.04543v2

Authors (5): Friso de Kruiff, Erik Bekkers, Ozan Öktem, Carola-Bibiane Schönlieb, Willem Diepeveen

We propose Pullback Flow Matching (PFM), a novel framework for generative modeling on data manifolds. Unlike existing methods that assume or learn restrictive closed-form manifold mappings for training Riemannian Flow Matching (RFM) models, PFM leverages pullback geometry and isometric learning to preserve the underlying manifold’s geometry while enabling efficient generation and precise interpolation in latent space. This approach not only facilitates closed-form mappings on the data manifold but also allows for designable latent spaces, using assumed metrics on both data and latent manifolds. By enhancing isometric learning through Neural ODEs and proposing a scalable training objective, we achieve a latent space more suitable for interpolation, leading to improved manifold learning and generative performance. We demonstrate PFM’s effectiveness through applications in synthetic data, protein dynamics and protein sequence data, generating novel proteins with specific properties. This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.

我们提议“拉回流量匹配”(PFM),这是数据元模型的基因模型新颖框架。与目前采用或学习用于培训里曼流动匹配模型的限制性封闭式多方绘图的方法不同,PFM利用拉回几何学和等量学习来保护基本多方的几何,同时在潜质空间中进行高效生成和精确的内插。这个方法不仅有利于数据元的闭式绘图,而且允许使用数据和潜质的假设度量度进行可设计的潜在空间。我们通过通过神经观察和提出可扩缩的培训目标,实现一种更适合内插的隐性空间,从而导致改进多重学习和基因性能。我们通过合成数据、蛋白动态和蛋白序列数据的应用,展示PFM的功效,产生具有具体特性的新蛋白质。这种方法显示了药物发现和材料科学的巨大潜力,在其中产生具有特定特性的新样本具有极大兴趣。


Article 250

Title@2025-07-09 (3): Off-Policy Evaluation Under Nonignorable Missing Data

Title: Off-Policy Evaluation Under Nonignorable Missing Data Off-Policy-Bewertung unter nicht entschuldbaren fehlenden Daten 在无污名失踪数据项下的非政策评价 2507.06961v1

Authors (4): Han Wang, Yang Xu, Wenbin Lu, Rui Song

Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected from potentially different policies. In real-world applications, however, logged data often suffers from missingness. While OPE has been extensively studied in the literature, a theoretical understanding of how missing data affects OPE results remains unclear. In this paper, we investigate OPE in the presence of monotone missingness and theoretically demonstrate that the value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To retain the consistency of value estimation, we propose an inverse probability weighted value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we empirically demonstrate that our proposed estimator yields a more reliable value inference under missing data.

外部政策评价(OPE)旨在利用从潜在不同政策中收集的离线数据估计目标政策的价值,但在现实世界应用中,记录的数据往往缺乏,尽管在文献中已对OPE进行了广泛研究,但对缺少的数据如何影响OPE结果的理论理解仍然不明确,在本文中,我们以单质缺失为证,对OPE进行调查,从理论上表明,在可忽略的缺失情况下,价值估计仍然没有偏见,但在不可忽略(信息化)缺失情况下,可能会有偏见。为了保持价值估计的一致性,我们提议采用一个反概率加权价值估计,并进行统计推论,以量化估计数的不确定性。我们通过一系列数字实验,从经验上表明,我们提议的估算者在缺失数据下得出更可靠的推论。
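The idea behind inverse probability weighting can be shown on a toy example: each observed reward is up-weighted by the inverse of its probability of being observed, so that rewards that tend to go missing are not under-represented. The sketch below is a generic Horvitz-Thompson style estimator for a single-step setting, not the estimator proposed in the paper; the function name `ipw_value`, the optional `policy_ratio` importance weights, and the assumption that observation probabilities are known are all illustrative.

```python
import numpy as np


def ipw_value(rewards, observed, obs_prob, policy_ratio=None):
    """Inverse-probability-weighted value estimate from partially observed rewards.

    rewards      : logged rewards (entries where observed is False are ignored)
    observed     : boolean flags, True where the reward was recorded
    obs_prob     : probability that each reward is observed (assumed known here)
    policy_ratio : optional target/behavior importance ratios (defaults to 1)
    """
    rewards = np.asarray(rewards, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    obs_prob = np.asarray(obs_prob, dtype=float)
    if policy_ratio is None:
        policy_ratio = np.ones_like(obs_prob)
    policy_ratio = np.asarray(policy_ratio, dtype=float)
    if np.any(obs_prob[observed] <= 0.0):
        raise ValueError("observed entries need a positive observation probability")
    safe_rewards = np.where(observed, rewards, 0.0)  # drop NaNs at missing entries
    weights = policy_ratio * observed / obs_prob
    return float(np.sum(safe_rewards * weights) / len(rewards))


if __name__ == "__main__":
    # High rewards are observed only half of the time (nonignorable missingness).
    rewards = [1.0, float("nan"), 0.0, 0.0]  # the second reward was 1.0 but is missing
    observed = [True, False, True, True]
    obs_prob = [0.5, 0.5, 1.0, 1.0]
    naive = np.mean([r for r, o in zip(rewards, observed) if o])
    print("complete-case mean:", naive)
    print("IPW estimate:", ipw_value(rewards, observed, obs_prob))
```

In this toy data the true mean reward is 0.5; the complete-case average underestimates it because high rewards are more likely to be missing, while the weighted estimate compensates for that.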


Article 251

Title@2025-07-09 (3): From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis

Title: From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis Vom Video zum EEG: Anpassung der gemeinsamen Einbettung von vorausschauender Architektur an die Entdeckung visueller Konzepte in der Gehirnsignalanalyse 从视频到EEG:使联合嵌入的预测结构适应脑信号分析中的不可见视觉概念 2507.03633v3

Authors (6): Amirabbas Hojjati, Lu Li, Ibrahim Hameed, Anis Yazidi, Pedro G. Lind, Rabindra Khadka

EEG signals capture brain activity with high temporal and low spatial resolution, supporting applications such as neurological diagnosis, cognitive monitoring, and brain-computer interfaces. However, effective analysis is hindered by limited labeled data, high dimensionality, and the absence of scalable models that fully capture spatiotemporal dependencies. Existing self-supervised learning (SSL) methods often focus on either spatial or temporal features, leading to suboptimal representations. To this end, we propose EEG-VJEPA, a novel adaptation of the Video Joint Embedding Predictive Architecture (V-JEPA) for EEG classification. By treating EEG as video-like sequences, EEG-VJEPA learns semantically meaningful spatiotemporal representations using joint embeddings and adaptive masking. To our knowledge, this is the first work that exploits V-JEPA for EEG classification and explores the visual concepts learned by the model. Evaluations on the publicly available Temple University Hospital (TUH) Abnormal EEG dataset show that EEG-VJEPA outperforms existing state-of-the-art models in classification accuracy. Beyond classification accuracy, EEG-VJEPA captures physiologically relevant spatial and temporal signal patterns, offering interpretable embeddings that may support human-AI collaboration in diagnostic workflows. These findings position EEG-VJEPA as a promising framework for scalable, trustworthy EEG analysis in real-world clinical settings.

脑电(EEG)信号以高时间分辨率和低空间分辨率记录大脑活动,支持神经系统诊断、认知监测和脑机接口等应用。然而,标注数据有限、维度高,且缺乏能够充分捕获时空依赖关系的可扩展模型,这些都阻碍了有效分析。现有的自监督学习(SSL)方法往往只关注空间特征或时间特征,导致表征不够理想。为此,我们提出 EEG-VJEPA,它是视频联合嵌入预测架构(V-JEPA)面向 EEG 分类的一种新适配。EEG-VJEPA 将 EEG 视为类视频序列,利用联合嵌入和自适应掩码学习具有语义意义的时空表征。据我们所知,这是首个将 V-JEPA 用于 EEG 分类并探究模型所学视觉概念的工作。在公开的 Temple University Hospital(TUH)Abnormal EEG 数据集上的评估表明,EEG-VJEPA 在分类准确率上优于现有最先进的模型。除分类准确率外,EEG-VJEPA 还能捕获具有生理学意义的空间和时间信号模式,提供可解释的嵌入,有望支持诊断流程中的人机协作。这些发现表明,EEG-VJEPA 是在真实临床环境中进行可扩展、可信的 EEG 分析的一个有前景的框架。


Article 252

Title@2025-07-09 (3): Bayesian Invariance Modeling of Multi-Environment Data

Title: Bayesian Invariance Modeling of Multi-Environment Data Bayesische Invarianz Modellierung von Multi-Umwelt-Daten 多种环境数据的贝耶斯因果建模 2506.22675v3

Authors (5): Luhuan Wu, Mingzhang Yin, Yixin Wang, John P. Cunningham, David M. Blei

Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here we develop Bayesian Invariant Prediction (BIP), a probabilistic model for invariant prediction. BIP encodes the indices of invariant features as a latent variable and recovers them by posterior inference. Under the assumptions of Peters et al. [2016], the BIP posterior targets the true invariant features. We prove that the posterior is consistent and that greater environment heterogeneity leads to faster posterior contraction. To handle many features, we design an efficient variational approximation called VI-BIP. In simulations and real data, we find that BIP and VI-BIP are more accurate and scalable than existing methods for invariant prediction.

变化预测[Peters 等人,2016年]分析多种环境中的特征/结果数据,以查明与结果具有稳定的预测关系的变量特征。这些特征支持对新环境的概括化,并有助于揭示因果关系机制。以前的方法主要通过假设测试或常规化优化来解决这个问题。我们在这里开发了贝叶斯变化预测的概率模型(BIP),这是变化预测的概率模型。BIP将变化特征的指数编码为潜在变量,并通过事后推断加以恢复。根据Peters等人的假设([2016年]), BIP 海报针对的是真实的变量特征。我们证明,外表是一致的,更大的环境差异性导致更快的外表收缩。为了处理许多特征,我们设计了一个高效的变异近率,称为VI-BIP。在模拟和真实数据中,我们发现BIP和VI-BIP比现有的变量预测方法更准确、更可伸缩。


Article 253

Title@2025-07-09 (3): Generating Heterogeneous Multi-dimensional Data : A Comparative Study

Title: Generating Heterogeneous Multi-dimensional Data : A Comparative Study Heterogene mehrdimensionale Daten generieren: Eine vergleichende Studie 生成异质多维数据:比较研究 2507.00090v2

Authors (4): Michael Corbeau, Emmanuelle Claeys, Mathieu Serrurier, Pascale Zaraté

Allocation of personnel and material resources is highly sensitive in the case of firefighter interventions. This allocation relies on simulations to experiment with various scenarios. The main objective of this allocation is the global optimization of the firefighters' response. Data generation is therefore mandatory to study various scenarios. In this study, we propose to compare different data generation methods. Methods such as Random Sampling, Tabular Variational Autoencoders, standard Generative Adversarial Networks, Conditional Tabular Generative Adversarial Networks and Diffusion Probabilistic Models are examined to ascertain their efficacy in capturing the intricacies of firefighter interventions. Traditional evaluation metrics often fall short in capturing the nuanced requirements of synthetic datasets for real-world scenarios. To address this gap, an evaluation of synthetic data quality is conducted using a combination of domain-specific metrics tailored to the firefighting domain and standard measures such as the Wasserstein distance. Domain-specific metrics include response time distribution, spatial-temporal distribution of interventions, and accident representation. These metrics are designed to assess data variability, the preservation of fine and complex correlations and anomalies such as events with a very low occurrence, the conformity with the initial statistical distribution and the operational relevance of the synthetic data. The distribution has the particularity of being highly unbalanced, with none of the variables following a Gaussian distribution, adding complexity to the data generation process.

在消防救援行动中,人员和物资的分配极为敏感。这种分配依赖仿真来试验各种情景,其主要目标是对消防响应进行全局优化。因此,要研究各种情景,就必须进行数据生成。在本研究中,我们提出比较不同的数据生成方法。我们考察了随机采样、表格变分自编码器、标准生成对抗网络、条件表格生成对抗网络以及扩散概率模型等方法,以确定它们在刻画消防救援行动复杂性方面的效果。传统评估指标往往难以反映真实场景对合成数据集的细致要求。为弥补这一不足,我们结合针对消防领域定制的领域特定指标和 Wasserstein 距离等标准度量,对合成数据质量进行评估。领域特定指标包括响应时间分布、救援行动的时空分布以及事故的表征。这些指标旨在评估数据的变异性、对细微而复杂的相关性及异常(例如发生率极低的事件)的保留、与初始统计分布的一致性,以及合成数据的业务相关性。该分布的特点是高度不平衡,且没有任何变量服从高斯分布,这增加了数据生成过程的复杂性。


Article 254

Title@2025-07-09 (3): Representative Ranking for Deliberation in the Public Sphere

Title: Representative Ranking for Deliberation in the Public Sphere Repräsentative Rangliste für die Beratung im öffentlichen Bereich 公共领域评议代表 2503.18962v2

Authors (5): Manon Revel, Smitha Milli, Tyler Lu, Jamelle Watson-Daniels, Max Nickel

Online comment sections, such as those on news sites or social media, have the potential to foster informal public deliberation. However, this potential is often undermined by the frequency of toxic or low-quality exchanges that occur in these settings. To combat this, platforms increasingly leverage algorithmic ranking to facilitate higher-quality discussions, e.g., by using civility classifiers or forms of prosocial ranking. Yet, these interventions may also inadvertently reduce the visibility of legitimate viewpoints, undermining another key aspect of deliberation: representation of diverse views. We seek to remedy this problem by introducing guarantees of representation into these methods. In particular, we adopt the notion of justified representation (JR) from the social choice literature and incorporate a JR constraint into the comment ranking setting. We find that enforcing JR leads to greater inclusion of diverse viewpoints while still being compatible with optimizing for user engagement or other measures of conversational quality.

在线评论部分,如新闻网站或社交媒体上的评论部分,有可能促进非正式的公共讨论,然而,这种潜力往往由于这些场合中经常发生有毒或低质量的交流而受到损害,为此,平台越来越多地利用算法排名来便利更高质量的讨论,例如,利用文明分类或亲社会的排名形式;然而,这些干预措施也可能无意中降低合法观点的可见度,损害审议的另一个关键方面:代表不同观点。我们力求通过在这些方法中引入代表的保障来纠正这一问题。我们特别采用了社会选择文献中合理代表的概念,并在评论排名设置中纳入了JR的制约。我们发现,执行JR导致更多地纳入不同观点,同时仍然与用户参与的优化或其他对话质量措施相容。


Article 255

Title@2025-07-09 (3): ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease

Title: ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease ADPv2: Ein hierarchischer histologischer Gewebetyp-Annotierter Datensatz für potenzielle Biomarker Entdeckung von Kolorektalerkrankungen ADPv2: 用于潜在直肠疾病生物标志发现的潜在生物标志数据集 2507.05656v2

Authors (12): Zhiyuan Yang, Kai Li, Sophia Ghamoshi Ramandi, Patricia Brassard, Hakim Khellaf, Vincent Quoc-Huy Trinh, Jennifer Zhang, Lina Chen, Corwyn Rowsell, Sonal Varma, Kostas Plataniotis, Mahdi S. Hosseini

Computational pathology (CoPath) leverages histopathology images to enhance diagnostic precision and reproducibility in clinical pathology. However, publicly available datasets for CoPath that are annotated with extensive histological tissue type (HTT) taxonomies at a granular level remain scarce due to the significant expertise and high annotation costs required. Existing datasets, such as the Atlas of Digital Pathology (ADP), address this by offering diverse HTT annotations generalized to multiple organs, but limit the capability for in-depth studies on specific organ diseases. Building upon this foundation, we introduce ADPv2, a novel dataset focused on gastrointestinal histopathology. Our dataset comprises 20,004 image patches derived from healthy colon biopsy slides, annotated according to a hierarchical taxonomy of 32 distinct HTTs across 3 levels. Furthermore, we train a multilabel representation learning model following a two-stage training procedure on our ADPv2 dataset. We leverage the VMamba architecture, achieving a mean average precision (mAP) of 0.88 in multilabel classification of colon HTTs. Finally, we show that our dataset supports organ-specific in-depth study for potential biomarker discovery by analyzing the model’s prediction behavior on tissues affected by different colon diseases, which reveals statistical patterns that confirm the two pathological pathways of colon cancer development. Our dataset is publicly available at https://zenodo.org/records/15307021

计算病理学(CoPath)利用组织病理学图像来提高临床病理学的诊断精确度和可重复性。然而,由于需要大量专门知识和高昂的标注成本,在细粒度层面按广泛的组织学组织类型(HTT)分类体系进行标注的公开CoPath数据集仍然稀少。数字病理学图谱(ADP)等现有数据集通过提供适用于多个器官的多样化HTT标注来解决这一问题,但限制了对特定器官疾病进行深入研究的能力。在此基础上,我们引入了ADPv2,一个专注于胃肠道组织病理学的新数据集。我们的数据集包含20,004个来自健康结肠活检切片的图像块,按照包含3个层级、32种不同HTT的层次分类体系进行标注。此外,我们按照两阶段训练程序在ADPv2数据集上训练了一个多标签表示学习模型。我们利用VMamba架构,在结肠HTT的多标签分类中实现了0.88的平均精度均值(mAP)。最后,我们通过分析模型在受不同结肠疾病影响的组织上的预测行为,表明我们的数据集能够支持针对特定器官的深入研究,以发现潜在的生物标志物;该分析揭示的统计规律印证了结肠癌发展的两条病理途径。我们的数据集可在 https://zenodo.org/records/15307021 公开获取。


Article 256

Title@2025-07-09 (3): DICE: Data Influence Cascade in Decentralized Learning

Title: DICE: Data Influence Cascade in Decentralized Learning DICE: Dateneinfluss Cascade im dezentralisierten Lernen DICE: 分散学习中的数据影响连锁 2507.06931v1

Authors (4): Tongtian Zhu, Wenhao Li, Can Wang, Fengxiang He

Decentralized learning offers a promising approach to crowdsource data consumption and computational workloads across geographically distributed compute interconnected through peer-to-peer networks, accommodating the exponentially increasing demands. However, proper incentives are still absent, considerably discouraging participation. Our vision is that a fair incentive mechanism relies on fair attribution of contributions to participating nodes, which faces non-trivial challenges arising from the localized connections that make influence "cascade" in a decentralized network. To overcome this, we design the first method to estimate Data Influence CascadE (DICE) in a decentralized environment. Theoretically, the framework derives tractable approximations of influence cascade over arbitrary neighbor hops, suggesting the influence cascade is determined by an interplay of data, communication topology, and the curvature of the loss landscape. DICE also lays the foundations for applications including selecting suitable collaborators and identifying malicious behaviors. Project page is available at https://raiden-zhu.github.io/blog/2025/DICE/.

分散化学习为在通过点对点网络互联的地理分布式计算资源之间众包数据消费和计算工作量提供了一种充满希望的方法,从而满足急剧增长的需求。然而,适当的激励办法仍然缺乏,大大抑制了参与。我们的愿景是,公平的激励机制依赖于对参与节点贡献的公平归因,而这面临非平凡的挑战,因为局部连接使影响在分散化的网络中产生“连锁”效应。为了克服这一点,我们设计了第一个在分散化环境中估计数据影响连锁(DICE)的方法。从理论上讲,该框架推导出任意邻居跳数上影响连锁的可处理近似,表明影响连锁是由数据、通信拓扑和损失地形曲率的相互作用所决定的。DICE还为各种应用奠定了基础,包括选择合适的合作者和识别恶意行为。项目网页见 https://raiden-zhu.github.io/blog/2025/DICE/ 。


Article 257

Title@2025-07-09 (3): Machine-Learned Force Fields for Lattice Dynamics at Coupled-Cluster Level Accuracy

Title: Machine-Learned Force Fields for Lattice Dynamics at Coupled-Cluster Level Accuracy Machine-learned Force Fields for Lattice Dynamics at Coupled-Cluster Level Accuracy 混合集成级精密度拉蒂动态的机生成力场 2507.06929v1

Authors (3): Sita Schönbauer, Johanna P. Carbone, Andreas Grüneis

We investigate Machine-Learned Force Fields (MLFFs) trained on approximate Density Functional Theory (DFT) and Coupled Cluster (CC) level potential energy surfaces for the carbon diamond and lithium hydride solids. We assess the accuracy and precision of the MLFFs by calculating phonon dispersions and vibrational densities of states (VDOS) that are compared to experiment and reference ab initio results. To overcome limitations from long-range effects and the lack of atomic forces in the CC training data, a delta-learning approach based on the difference between CC and DFT results is explored. Compared to DFT, MLFFs trained on CC theory yield higher vibrational frequencies for optical modes, agreeing better with experiment. Furthermore, the MLFFs are used to estimate anharmonic effects on the VDOS of lithium hydride at the level of CC theory.

我们调查了在碳钻石和氢化锂固体的密度功能理论(DFT)和组合组(CC)级潜在能源表面方面受过大约密度功能理论(DFT)和组合组(CC)培训的机载力场;我们通过计算与试验和初始参考结果相比较的州分红和振动密度(VDOS)来评估MLFF的准确性和精确性;为了克服长距离影响的限制和CC培训数据中缺乏原子力量的问题,根据CCF结果和DFT结果之间的差别,探索了三角学习方法;与DFT相比,在CC理论方面受过培训的MLFF在光学模式方面产生更高的振动频率,并同意更好的实验;此外,MLFF还用来估计CC理论一级对锂氢化物的VDOS的和谐效应。
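The delta-learning idea mentioned in the abstract can be written down generically as follows. This is a sketch of the standard delta-learning decomposition, not the authors' exact formulation. The symbols $\mathbf{R}$ (atomic positions), $E^{\mathrm{MLFF}}_{\mathrm{DFT}}$ (a force field trained on DFT data) and $\Delta E^{\mathrm{MLFF}}$ (a second model trained on the energy difference) are notational assumptions introduced here.

```latex
\begin{align}
\Delta E(\mathbf{R}) &= E_{\mathrm{CC}}(\mathbf{R}) - E_{\mathrm{DFT}}(\mathbf{R}), \\
E_{\mathrm{CC}}(\mathbf{R}) &\approx E^{\mathrm{MLFF}}_{\mathrm{DFT}}(\mathbf{R}) + \Delta E^{\mathrm{MLFF}}(\mathbf{R}), \\
\mathbf{F}_i(\mathbf{R}) &\approx -\nabla_{\mathbf{R}_i}\left[ E^{\mathrm{MLFF}}_{\mathrm{DFT}}(\mathbf{R}) + \Delta E^{\mathrm{MLFF}}(\mathbf{R}) \right].
\end{align}
```

The usual motivation is that the difference $\Delta E$ tends to be smoother and shorter-ranged than either total energy, so it may be learnable from the few CC energies available. Forces then follow by differentiating the combined model even though the CC training data contain none.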


Article 258

Title@2025-07-09 (3): Distribution-free inference for LightGBM and GLM with Tweedie loss

Title: Distribution-free inference for LightGBM and GLM with Tweedie loss Verteilungsfreie Inferenz für LightGBM und GLM mit Tweedie-Verlust 对Tweedie损失的轻GBM和GLM灯光和GLM的无分配式推推 2507.06921v1

Authors (6): Alokesh Manna, Aditya Vikram Sett, Dipak K. Dey, Yuwen Gu, Elizabeth D. Schifano, Jichao He

Quantifying prediction uncertainty has been a key research topic in scientific and business problems in recent years. In the insurance industry (\cite{parodi2023pricing}), assessing the range of possible claim costs for individual drivers improves premium pricing accuracy. It also enables insurers to manage risk more effectively by accounting for uncertainty in accident likelihood and severity. In the presence of covariates, a variety of regression-type models are often used for modeling insurance claims, ranging from relatively simple generalized linear models (GLMs) to regularized GLMs to gradient boosting models (GBMs). Conformal predictive inference has arisen as a popular distribution-free approach for quantifying predictive uncertainty under relatively weak assumptions of exchangeability, and has been well studied under the classic linear regression setting. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss. Using regularized Tweedie GLM regression and LightGBM with Tweedie loss, we demonstrate conformal prediction performance with these non-conformity measures in insurance claims data. Our simulation results favor the use of locally weighted Pearson residuals for LightGBM over other methods considered, as the resulting intervals maintained the nominal coverage with the smallest average width.

近年来,预测不确定性的量化一直是科学和商业问题中的一个关键研究课题。在保险业(\cite{parodi2023pricing}),评估个别司机可能索赔费用的范围可以提高保费定价的准确性。它还使保险商能够通过考虑事故可能性和严重程度的不确定性来更有效地管理风险。在存在协变量的情况下,常常使用各种回归型模型来对保险索赔建模,从相对简单的广义线性模型(GLM)到正则化GLM,再到梯度提升模型(GBM)。保形预测推断已成为一种流行的无分布方法,可在相对较弱的可交换性假设下量化预测不确定性,并且在经典线性回归设置下已得到充分研究。在这项工作中,我们为GLM和采用GLM型损失的GBM提出了新的非一致性度量。我们使用正则化Tweedie GLM回归和采用Tweedie损失的LightGBM,在保险索赔数据上展示了这些非一致性度量的保形预测性能。我们的模拟结果支持对LightGBM使用局部加权Pearson残差,而不是所考虑的其他方法,因为由此得到的区间在保持名义覆盖率的同时平均宽度最小。
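To make the idea of a Pearson-residual non-conformity score concrete, here is a minimal split-conformal sketch. It assumes the fitted means `mu` come from some already-trained Tweedie model (GLM or LightGBM) and uses the Tweedie variance function V(mu) = mu^p. This is a plain Pearson-residual score, not necessarily the locally weighted variant the authors propose. All data values, function names, and the power p = 1.5 are hypothetical.

```python
import math


def pearson_score(y, mu, power=1.5):
    """Absolute Pearson residual under Tweedie variance V(mu) = mu**power (mu > 0)."""
    return abs(y - mu) / mu ** (power / 2.0)


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[k - 1]


def prediction_interval(mu_new, q, power=1.5):
    """Invert the score: all y with |y - mu| / mu**(p/2) <= q, clipped at 0 for claims."""
    half_width = q * mu_new ** (power / 2.0)
    return max(0.0, mu_new - half_width), mu_new + half_width


# Hypothetical calibration set: observed claims and model-predicted means.
cal_y = [0.0, 120.0, 0.0, 300.0, 50.0, 0.0, 80.0, 10.0, 0.0]
cal_mu = [20.0, 100.0, 10.0, 200.0, 60.0, 30.0, 90.0, 15.0, 25.0]

scores = [pearson_score(y, m) for y, m in zip(cal_y, cal_mu)]
q = conformal_quantile(scores, alpha=0.2)
low, high = prediction_interval(40.0, q)
print(f"80% interval for a policy with predicted mean 40: [{low:.1f}, {high:.1f}]")
```

Scaling the residual by mu^(p/2) lets the interval widen for policies with larger predicted claims, which is the usual reason to prefer it over the raw absolute residual when variance grows with the mean.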


Article 259

Title@2025-07-09 (3): Robust and Safe Traffic Sign Recognition using N-version with Weighted Voting

Title: Robust and Safe Traffic Sign Recognition using N-version with Weighted Voting Robuste und sichere Verkehrszeichenerkennung mit N-Version mit gewichteter Abstimmung 以加权投票方式使用N版本进行强力和安全交通信号识别 2507.06907v1

Authors (3): Linyun Gao, Qiang Wen, Fumio Machida

Autonomous driving is rapidly advancing as a key application of machine learning, yet ensuring the safety of these systems remains a critical challenge. Traffic sign recognition, an essential component of autonomous vehicles, is particularly vulnerable to adversarial attacks that can compromise driving safety. In this paper, we propose an N-version machine learning (NVML) framework that integrates a safety-aware weighted soft voting mechanism. Our approach utilizes Failure Mode and Effects Analysis (FMEA) to assess potential safety risks and assign dynamic, safety-aware weights to the ensemble outputs. We evaluate the robustness of three-version NVML systems employing various voting mechanisms against adversarial samples generated using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks. Experimental results demonstrate that our NVML approach significantly enhances the robustness and safety of traffic sign recognition systems under adversarial conditions.

自动驾驶作为机械学习的关键应用正在迅速推进,但确保这些系统的安全仍然是一个重大挑战。交通标志识别是自治车辆的一个基本组成部分,特别容易受到可能损害驾驶安全的对抗性攻击。在本文件中,我们提议了一个Nversion机器学习框架,将安全意识加权软投票机制纳入其中。我们的方法利用故障模式和效果分析来评估潜在的安全风险,并为联合产出分配动态的、安全觉悟的权重。我们评估了使用各种投票机制对抗使用快速渐进信号法和预测的梯子(PGD)攻击产生的对抗性样品的三版本NVML系统是否稳健。实验结果表明,我们的NVML方法在对抗条件下大大加强了交通标志识别系统的稳健性和安全性。
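The weighted soft voting step described above can be sketched in a few lines. The snippet assumes each of the N model versions outputs a class-probability vector and that a weight per version is already available. In the paper the weights are derived from an FMEA-based safety assessment, which is not reproduced here, so the weights, probabilities, and function name below are purely illustrative.

```python
def weighted_soft_vote(probabilities, weights):
    """Combine per-model class probabilities with per-model weights.

    probabilities: one list of class probabilities per model version.
    weights: one non-negative weight per model version.
    Returns the winning class index and the normalized combined scores.
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=combined.__getitem__)
    return best, combined


# Three hypothetical model versions scoring three traffic-sign classes.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
weights = [0.5, 0.3, 0.2]  # hypothetical weights, not FMEA-derived

label, scores = weighted_soft_vote(model_probs, weights)
print(label, [round(s, 2) for s in scores])
```

Here the first version has the largest weight and prefers class 0, yet the other two versions jointly outvote it, which illustrates how soft voting can absorb a single compromised model.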


Article 260

Title@2025-07-09 (3): Neural Canonical Polyadic Factorization for Traffic Analysis

Title: Neural Canonical Polyadic Factorization for Traffic Analysis Neurale kanonischen Polyadische Faktorisierung für die Verkehrsanalyse 用于交通分析的神经卡纳尼基多功能因子化 2506.15079v3

Authors (3): Wenyu Luo, Yikai Hou, Peng Tang

Modern intelligent transportation systems rely on accurate spatiotemporal traffic analysis to optimize urban mobility and infrastructure resilience. However, pervasive missing data caused by sensor failures and heterogeneous sensing gaps fundamentally hinders reliable traffic modeling. This paper proposes a Neural Canonical Polyadic Factorization (NCPF) model that synergizes low-rank tensor algebra with deep representation learning for robust traffic data imputation. The model innovatively embeds CP decomposition into neural architecture through learnable embedding projections, where sparse traffic tensors are encoded into dense latent factors across road segments, time intervals, and mobility metrics. A hierarchical feature fusion mechanism employs Hadamard products to explicitly model multilinear interactions, while stacked multilayer perceptron layers nonlinearly refine these representations to capture complex spatiotemporal couplings. Extensive evaluations on six urban traffic datasets demonstrate NCPF’s superiority over six state-of-the-art baselines. By unifying CP decomposition’s interpretable factor analysis with neural network’s nonlinear expressive power, NCPF provides a principled yet flexible approach for high-dimensional traffic data imputation, offering critical support for next-generation transportation digital twins and adaptive traffic control systems.

现代智能交通系统依赖于准确的时空交通分析,以优化城市交通和基础设施的复原力。然而,由于传感器失灵和不同感知差距造成的普遍缺失数据,严重阻碍了可靠的交通模式。本文件建议采用神经卡纳尼科综合聚合(NCPF)模型,将低层高温代数与深层代数相结合,为稳健交通数据估算进行深层代表性学习。模型创新地将CP分解嵌入神经结构,方法是通过可学习的嵌入预测,将稀释的交通电压编码成公路各部分、时间间隔和移动指标之间的密集潜伏因素。等级特征聚合机制使用Hadamard产品明确模拟多线性互动,同时堆叠多层多层感应无线性地改进这些特征,以捕捉复杂的双向相交相交相交。对6个城市交通数据集进行广泛评估,表明CPPF的优势高于6个州的基线。通过整合可解释的因素分析,通过神经网络的非线性直径直径传输能力,NCPFPF为关键交通控制双峰数据传输提供有原则的灵活方法。
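For readers unfamiliar with CP decomposition, the multilinear interaction that the Hadamard product models can be illustrated as follows. This sketch shows only the classical CP reconstruction of one tensor entry from three latent factor vectors. According to the abstract, NCPF additionally refines the fused representation with multilayer perceptron layers, which are omitted here. The factor values and the function name are hypothetical.

```python
def cp_entry(road_factor, time_factor, metric_factor):
    """Classical CP estimate of one tensor entry.

    The element-wise (Hadamard) product fuses the three rank-R latent
    vectors; summing over the rank dimension gives the scalar estimate.
    """
    fused = [a * b * c for a, b, c in zip(road_factor, time_factor, metric_factor)]
    return sum(fused)


# Hypothetical rank-2 latent factors for one road segment, one time
# interval, and one mobility metric (e.g. speed).
road = [1.0, 2.0]
time_slot = [0.5, 1.0]
metric = [2.0, 3.0]

estimate = cp_entry(road, time_slot, metric)
print(estimate)  # 1*0.5*2 + 2*1*3
```

Imputation then amounts to evaluating this estimate at the index triples whose sensor readings are missing, using factors fitted on the observed entries.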


Article 261

Title@2025-07-09 (3): Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams

Title: Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams Konzipieren von adaptiven Algorithmen auf Basis von Verstärkungslernen zur dynamischen Optimierung der Schiebefenstergröße in multidimensionalen Datenströmen 基于强化学习,使多维数据流中滑动窗口大小动态优化的强化学习设计适应性比值 2507.06901v1

Authors (2): Abolfazl Zarghani, Sadegh Abedi

Multi-dimensional data streams, prevalent in applications like IoT, financial markets, and real-time analytics, pose significant challenges due to their high velocity, unbounded nature, and complex inter-dimensional dependencies. Sliding window techniques are critical for processing such streams, but fixed-size windows struggle to adapt to dynamic changes like concept drift or bursty patterns. This paper proposes a novel reinforcement learning (RL)-based approach to dynamically optimize sliding window sizes for multi-dimensional data streams. By formulating window size selection as an RL problem, we enable an agent to learn an adaptive policy based on stream characteristics, such as variance, correlations, and temporal trends. Our method, RL-Window, leverages a Dueling Deep Q-Network (DQN) with prioritized experience replay to handle non-stationarity and high-dimensionality. Evaluations on benchmark datasets (UCI HAR, PAMAP2, Yahoo! Finance Stream) demonstrate that RL-Window outperforms state-of-the-art methods like ADWIN and CNN-Adaptive in classification accuracy, drift robustness, and computational efficiency. Additional qualitative analyses, extended metrics (e.g., energy efficiency, latency), and a comprehensive dataset characterization further highlight its adaptability and stability, making it suitable for real-time applications.

多维数据流在IoT、金融市场和实时分析等应用中十分普遍,由于其高速、无界的性质和复杂的维度间依赖关系,多维数据流构成重大挑战。滑动窗口技术对于处理这种数据流至关重要,但固定大小的窗口难以适应概念漂移或突发模式等动态变化。本文提出一种基于强化学习(RL)的新方法,以动态优化多维数据流的滑动窗口大小。通过将窗口大小选择表述为一个RL问题,我们使智能体能够根据方差、相关性和时间趋势等数据流特征学习自适应策略。我们的方法RL-Window利用带有优先经验回放的Dueling Deep Q-Network(DQN)来处理非平稳性和高维度。对基准数据集(UCI HAR、PAMAP2、Yahoo! Finance Stream)的评估表明,RL-Window在分类准确性、漂移稳健性和计算效率方面优于ADWIN和CNN-Adaptive等最先进的方法。补充的定性分析、扩展指标(例如能效、延迟)以及全面的数据集特征描述进一步凸显了其适应性和稳定性,使其适用于实时应用。


Article 262

Title@2025-07-09 (3): SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN

Title: SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN SCoRE: Verstärkte Corpus-basierte Relation-Extraktion mit multi-Label Kontrastivem Lernen und Bayesian kNN SCoRE:利用多标签对比学习和Bayesian kNN简化基于语料库的关系抽取 2507.06895v1

Authors (3): Luca Mariotti, Veronica Guidetti, Federica Mandreoli

The growing demand for efficient knowledge graph (KG) enrichment leveraging external corpora has intensified interest in relation extraction (RE), particularly under low-supervision settings. To address the need for adaptable and noise-resilient RE solutions that integrate seamlessly with pre-trained large language models (PLMs), we introduce SCoRE, a modular and cost-effective sentence-level RE system. SCoRE enables easy PLM switching, requires no finetuning, and adapts smoothly to diverse corpora and KGs. By combining supervised contrastive learning with a Bayesian k-Nearest Neighbors (kNN) classifier for multi-label classification, it delivers robust performance despite the noisy annotations of distantly supervised corpora. To improve RE evaluation, we propose two novel metrics: Correlation Structure Distance (CSD), measuring the alignment between learned relational patterns and KG structures, and Precision at R (P@R), assessing utility as a recommender system. We also release Wiki20d, a benchmark dataset replicating real-world RE conditions where only KG-derived annotations are available. Experiments on five benchmarks show that SCoRE matches or surpasses state-of-the-art methods while significantly reducing energy consumption. Further analyses reveal that increasing model complexity, as seen in prior work, degrades performance, highlighting the advantages of SCoRE’s minimal design. Combining efficiency, modularity, and scalability, SCoRE stands as an optimal choice for real-world RE applications.

利用外部语料库高效扩充知识图谱(KG)的需求日益增长,这使人们更加关注关系抽取(RE),特别是在低监督环境下。为了满足对适应性强、抗噪声且能与预训练大语言模型(PLM)无缝结合的RE解决方案的需求,我们引入了SCoRE,一个模块化且具有成本效益的句子级RE系统。SCoRE可以轻松切换PLM,无需微调,并能顺利适应不同的语料库和KG。通过将有监督对比学习与用于多标签分类的贝叶斯k近邻(kNN)分类器相结合,它在远程监督语料库标注噪声较大的情况下仍能提供稳健的性能。为了改进RE评价,我们提出了两个新指标:相关结构距离(CSD),用于衡量所学关系模式与KG结构之间的一致程度;以及R处精确率(P@R),用于评估其作为推荐系统的效用。我们还发布了Wiki20d,一个复现真实RE条件的基准数据集,其中只有源自KG的标注可用。在五个基准上的实验表明,SCoRE达到或超过了最先进的方法,同时显著降低了能耗。进一步的分析表明,像以往工作那样增加模型复杂度会降低性能,这凸显了SCoRE极简设计的优势。SCoRE兼具效率、模块化和可扩展性,是真实RE应用的最佳选择。


Article 263

Title@2025-07-09 (3): Horizontal and Vertical Federated Causal Structure Learning via Higher-order Cumulants

Title: Horizontal and Vertical Federated Causal Structure Learning via Higher-order Cumulants Horizontales und vertikales Federated Causal Structure Learning über hochgeordnete Kumulanten 横向和垂直联邦结业结构通过高阶碳素学习 2507.06888v1

Authors (6): Wei Chen, Wanyang Gu, Linjun Peng, Ruichu Cai, Zhifeng Hao, Kun Zhang

Federated causal discovery aims to uncover the causal relationships between entities while protecting data privacy, which has significant importance and numerous applications in real-world scenarios. Existing federated causal structure learning methods primarily focus on horizontal federated settings. However, in practical situations, different clients may not necessarily contain data on the same variables. In a single client, the incomplete set of variables can easily lead to spurious causal relationships, thereby affecting the information transmitted to other clients. To address this issue, we comprehensively consider causal structure learning methods under both horizontal and vertical federated settings. We provide the identification theories and methods for learning causal structure in the horizontal and vertical federated settings via higher-order cumulants. Specifically, we first aggregate higher-order cumulant information from all participating clients to construct global cumulant estimates. These global estimates are then used for recursive source identification, ultimately yielding a global causal strength matrix. Our approach not only enables the reconstruction of causal graphs but also facilitates the estimation of causal strength coefficients. Our algorithm demonstrates superior performance in experiments conducted on both synthetic data and real-world data.

联邦因果发现旨在发现实体之间的因果关系,同时保护数据隐私,这在现实世界情景中具有重大意义并有许多应用。现有的联邦因果结构学习方法主要侧重于横向联动环境。然而,在实际情况下,不同的客户不一定包含关于相同变量的数据。在一个客户中,不完整的变量组很容易导致虚假因果关系,从而影响传递给其他客户的信息。为了解决这一问题,我们在横向和纵向联结环境中全面考虑因果结构学习方法。我们提供了通过较高级累积器在横向和纵向联邦环境中学习因果结构的识别理论和方法。具体地说,我们首先汇总所有参与客户的较高级累积信息,以构建全球累积估计数。然后将这些全球估计数用于循环源的识别,最终生成出全球因果强度矩阵。我们的方法不仅能够重建因果图表,而且还有助于估计因果强度系数。我们的算法显示在合成数据和真实世界数据上进行的实验的优异性。


Article 264

Title@2025-07-09 (3): A Survey on Event Prediction Methods from a Systems Perspective: Bringing Together Disparate Research Areas

Title: A Survey on Event Prediction Methods from a Systems Perspective: Bringing Together Disparate Research Areas Eine Umfrage zu Event Prediction Methods aus einer Systemperspektive: Gemeinsam unterschiedliche Forschungsbereiche 从系统角度对事件预测方法的调查:将分离研究领域结合起来 2302.04018v2

Authors (2): Janik-Vasily Benzin, Stefanie Rinderle-Ma

Event prediction is the ability to anticipate future events, i.e., future real-world occurrences, and aims to support the user in deciding on actions that change future events towards a desired state. An event prediction method learns the relation between features of past events and future events. It is applied to newly observed events to predict corresponding future events that are evaluated with respect to the user’s desired future state. If the predicted future events do not comply with this state, actions are taken towards achieving desirable future states. Evidently, event prediction is valuable in many application domains such as business and natural disasters. The diversity of application domains results in a diverse range of methods that are scattered across various research areas which, in turn, use different terminology for event prediction methods. Consequently, the sharing of methods and knowledge for developing future event prediction methods is restricted. To facilitate knowledge sharing through a comprehensive integration and assessment of event prediction methods, we take a systems perspective to integrate event prediction methods into a single system, elicit requirements, and assess existing work with respect to the requirements. Based on the assessment, we identify open challenges and discuss future research directions.

事件预测是预测未来事件的能力,即未来真实世界事件的能力,目的是支持用户决定改变未来事件的行动,使之达到理想状态; 事件预测方法学习过去事件特征与未来事件之间的关系; 用于新观测的事件预测根据用户期望的未来状态评估的相应未来事件; 如果预测的未来事件不符合这一状态,则采取行动实现理想的未来状态; 很明显,事件预测在许多应用领域,如商业和自然灾害,是有价值的; 应用领域的多样性导致各种各样的方法,这些方法分散于不同的研究领域,而不同的研究领域则使用不同的术语来进行事件预测方法; 因此,为开发未来事件预测方法而共享方法和知识受到限制; 为了在综合整合和评估事件预测方法时促进知识共享,我们从系统角度将事件预测方法纳入一个单一的系统,征求要求,并评估与需求有关的现有工作。 根据评估,我们查明公开的挑战并讨论未来的研究方向。


Article 265

Title@2025-07-09 (3): Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack Problems

Title: Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack Problems Konsequentitäts-Robustheits-Trade-Offs für Learning-Augmented Online-Knapsack-Probleme 学习增强型在线知识包在线问题近最佳一致度-交战贸易差额 2406.18752v2

Authors (6): Mohammadreza Daneshvaramoli, Helia Karisani, Adam Lechowicz, Bo Sun, Cameron Musco, Mohammad Hajiesmaili

This paper introduces a family of learning-augmented algorithms for online knapsack problems that achieve near Pareto-optimal consistency-robustness trade-offs through a simple combination of trusted learning-augmented and worst-case algorithms. Our approach relies on succinct, practical predictions – single values or intervals estimating the minimum value of any item in an offline solution. Additionally, we propose a novel fractional-to-integral conversion procedure, offering new insights for online algorithm design.

本文介绍一系列在线 knapsack 问题的学习强化算法,这些算法通过一个简单的可信赖的学习强化算法和最坏的算法组合,在Pareto- 最佳一致性- 坏坏坏性权衡中实现接近Pareto- 最佳一致性- 平衡。 我们的方法依赖于简洁、实际的预测 – – 单一值或每隔一段时间估计离线解决方案中任何项目的最小值。 此外,我们提出一个新的分数转换程序,为在线算法设计提供新的洞察力。


Article 266

Title@2025-07-09 (3): IntOPE: Off-Policy Evaluation in the Presence of Interference

Title: IntOPE: Off-Policy Evaluation in the Presence of Interference IntOPE: Off-Policy-Evaluierung in Gegenwart von Störungen IntOPE: 干涉情况下的非政策评价 2408.13484v2

Authors (5): Yuqi Bai, Ziyu Zhao, Chenxin Lyu, Minqin Zhu, Kun Kuang

Off-Policy Evaluation (OPE) is employed to assess the potential impact of a hypothetical policy using logged contextual bandit feedback, which is crucial in areas such as personalized medicine and recommender systems, where online interactions are associated with significant risks and costs. Traditionally, OPE methods rely on the Stable Unit Treatment Value Assumption (SUTVA), which assumes that the reward for any given individual is unaffected by the actions of others. However, this assumption often fails in real-world scenarios due to the presence of interference, where an individual’s reward is affected not just by their own actions but also by the actions of their peers. This realization reveals significant limitations of existing OPE methods in real-world applications. To address this limitation, we propose IntIPW, an IPW-style estimator that extends the Inverse Probability Weighting (IPW) framework by integrating marginalized importance weights to account for both individual actions and the influence of adjacent entities. Extensive experiments are conducted on both synthetic and real-world data to demonstrate the effectiveness of the proposed IntIPW method.

政策外评估(OPE)用于评估假设政策的潜在影响,使用记录背景强盗反馈,这在个人化医学和建议系统等领域至关重要,因为在线互动与重大风险和费用相关。传统上,OPE方法依赖稳定单位处理价值假设(SUTVA),该假设假设认为,对任何特定个人的奖励不受他人行为的影响。然而,这一假设在现实世界中往往失败,原因是存在干扰,个人的报酬不仅受到其自身行动的影响,而且受到同行的行动的影响。这一实现表明,在现实世界应用中,现有的OPE方法存在很大的局限性。为解决这一局限性,我们提议IPIPW, IPW, 一种IPW式的估算器,通过综合边缘化重要性权重来说明个人行为和相邻实体的影响,从而扩展不可避免的估算框架。对合成和真实世界数据进行广泛的实验,以证明拟议的IntIPW方法的有效性。
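As background for the IPW-style estimator mentioned above, the following sketch implements the standard Inverse Probability Weighting estimator for logged bandit data, i.e. the SUTVA-based baseline that IntIPW extends. It is not the IntIPW estimator itself, whose marginalized weights over neighbors' actions are not specified in the abstract. The log entries, policy, and names are hypothetical.

```python
def ipw_value(logged, target_policy):
    """Standard IPW estimate of a target policy's value.

    logged: iterable of (context, action, reward, logging_propensity),
            where logging_propensity is the probability with which the
            logging policy chose that action in that context (> 0).
    target_policy: function (context, action) -> probability in [0, 1].
    """
    logged = list(logged)
    total = 0.0
    for context, action, reward, propensity in logged:
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logged)


def always_recommend(context, action):
    """Hypothetical deterministic target policy: always take action 1."""
    return 1.0 if action == 1 else 0.0


# Hypothetical log collected under a uniform-random logging policy.
logged = [
    ("u1", 1, 1.0, 0.5),
    ("u2", 0, 0.0, 0.5),
    ("u3", 1, 0.0, 0.5),
    ("u4", 0, 1.0, 0.5),
]

estimate = ipw_value(logged, always_recommend)
print(estimate)
```

Each weight here depends only on the individual's own action. Under interference that is exactly what breaks, because a reward may also reflect what neighboring units received.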


Article 267

Title@2025-07-09 (3): Conformal Prediction for Long-Tailed Classification

Title: Conformal Prediction for Long-Tailed Classification Konforme Vorhersagen für die lange hintere Klassifizierung 长期驾驶分类的正规预测 2507.06867v1

Authors (3): Tiffany Ding, Jean-Baptiste Fermanian, Joseph Salmon

Many real-world classification problems, such as plant identification, have extremely long-tailed class distributions. In order for prediction sets to be useful in such settings, they should (i) provide good class-conditional coverage, ensuring that rare classes are not systematically omitted from the prediction sets, and (ii) be a reasonable size, allowing users to easily verify candidate labels. Unfortunately, existing conformal prediction methods, when applied to the long-tailed setting, force practitioners to make a binary choice between small sets with poor class-conditional coverage or sets with very good class-conditional coverage but that are extremely large. We propose methods with guaranteed marginal coverage that smoothly trade off between set size and class-conditional coverage. First, we propose a conformal score function, prevalence-adjusted softmax, that targets a relaxed notion of class-conditional coverage called macro-coverage. Second, we propose a label-weighted conformal prediction method that allows us to interpolate between marginal and class-conditional conformal prediction. We demonstrate our methods on Pl@ntNet and iNaturalist, two long-tailed image datasets with 1,081 and 8,142 classes, respectively.

许多真实世界的分类问题,如植物识别,都存在极其长的等级分布问题。为了让预测组在这些环境中有用,它们应该(一) 提供良好的等级条件覆盖,确保预测组不系统地忽略稀有类别,以及(二) 合理规模,使用户能够方便地核查候选标签。不幸的是,现有的符合预测方法,如植物识别,在应用到长尾背景时,迫使从业者在低等级条件覆盖或极优等级条件覆盖但极为庞大的小类之间作出二进制选择。我们提出了保证边际覆盖的方法,在设定规模和等级条件覆盖之间进行平稳交换。首先,我们提出了一个符合的评分功能,即流行性调整软体积,目标是一个轻松的等级条件覆盖概念,称为宏观覆盖。第二,我们提出了一种标签加权的符合性预测方法,使我们能够在边际和低等级有条件的预测之间进行中间。我们在 Pl@ntNet 和 iNatrilist 上展示了我们的方法,两个长尾的图像数据集,分别为1,081 和 142 8。


Article 268

Title@2025-07-09 (3): Episodic Contextual Bandits with Knapsacks under Conversion Models

Title: Episodic Contextual Bandits with Knapsacks under Conversion Models Episodic Contextual Bandits mit Knapsacks unter Conversion Models 转换模型下带有Knapsack 背包的“瞬时背景强盗” 2507.06859v1

Authors (2): Zitian Li, Wang Chi Cheung

We study an online setting, where a decision maker (DM) interacts with contextual bandit-with-knapsack (BwK) instances in repeated episodes. These episodes start with different resource amounts, and the contexts’ probability distributions are non-stationary in an episode. All episodes share the same latent conversion model, which governs the random outcome contingent upon a request’s context and an allocation decision. Our model captures applications such as dynamic pricing on perishable resources with episodic replenishment, and first price auctions in repeated episodes with different starting budgets. We design an online algorithm that achieves a regret sub-linear in $T$, the number of episodes, assuming access to a confidence bound oracle that achieves an $o(T)$-regret. Such an oracle is readily available from existing contextual bandit literature. We overcome the technical challenge with arbitrarily many possible contexts, which leads to a reinforcement learning problem with an unbounded state space. Our framework provides improved regret bounds in certain settings when the DM is provided with unlabeled feature data, which is novel to the contextual BwK literature.

我们研究一个在线设置,让决策者(DM)与背景强盗和Knappsack(BwK)反复发生互动。这些过程从不同的资源数量开始,环境的概率分布是非静止的。所有过程都有着相同的潜在转换模式,这种模式根据请求的背景和分配决定来管理随机结果。我们的模型捕捉了各种应用,例如:对易腐资源动态定价并附带附带补充,以及第一次价格拍卖,在有不同开始预算的反复事件中。我们设计了一种在线算法,以美元为单位,实现遗憾的次线性次线,即假设访问达到$o(T)-gret的场景数量,假设访问达到$(T)-regret的场景。这样的触角很容易从现有的背景波段文献中找到。我们克服了技术挑战,这种技术挑战在任意的许多情况下导致一个无限制的状态空间强化学习问题。我们的框架在某些环境中提供了更好的遗憾界限,因为DM提供了未贴标签的特征数据,这对背景BwK文献来说是新奇的。


Article 269

Title@2025-07-09 (3): Adaptive Elicitation of Latent Information Using Natural Language

Title: Adaptive Elicitation of Latent Information Using Natural Language Adaptive Vermittlung von latenten Informationen mit natürlicher Sprache 利用自然语言调适性易感信息使用自然语言 2504.04204v2

Authors (4): Jimmy Wang, Thomas Zollo, Richard Zemel, Hongseok Namkoong

Eliciting information to reduce uncertainty about a latent entity is a critical task in many application domains, e.g., assessing individual student learning outcomes, diagnosing underlying diseases, or learning user preferences. Though natural language is a powerful medium for this purpose, large language models (LLMs) and existing fine-tuning algorithms lack mechanisms for strategically gathering information to refine their own understanding of the latent entity. To harness the generalization power and world knowledge of LLMs in developing effective information-gathering strategies, we propose an adaptive elicitation framework that actively reduces uncertainty on the latent entity. Since probabilistic modeling of an abstract latent entity is difficult, our framework adopts a predictive view of uncertainty, using a meta-learned language model to simulate future observations and enable scalable uncertainty quantification over complex natural language. Through autoregressive forward simulation, our model quantifies how new questions reduce epistemic uncertainty, enabling the development of sophisticated information-gathering strategies to choose the most informative next queries. In experiments on the 20 questions game, dynamic opinion polling, and adaptive student assessment, our method consistently outperforms baselines in identifying critical unknowns and improving downstream predictions, illustrating the promise of strategic information gathering in natural language settings.

在许多应用领域,例如,评估个别学生学习成果、诊断潜在疾病或学习用户偏好等,都是一项关键任务。虽然自然语言是这方面的强大媒介,但大型语言模型和现有的微调算法缺乏战略收集信息的机制,无法通过战略收集信息来完善对潜在实体的理解。为了利用LLMS的概括力量和世界知识来制定有效的信息收集战略,我们提议了一个适应性引导框架,以积极减少潜在实体的不确定性。由于抽象潜在实体的概率模型很难,我们的框架采用预测性不确定性的观点,使用元学语言模型模拟未来观测,使复杂自然语言的可扩展不确定性量化成为可能。通过自动递进式前瞻性模拟,我们的模式量化了新问题如何减少隐含性的不确定性,从而能够发展复杂的信息收集战略来选择最丰富的下一个查询。在20个问题游戏的实验中,动态的民意调查,以及适应性的学生评估中,我们的方法在识别关键未知因素和改进下游语言的预测方面始终超越了基线。
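
The selection step described in this abstract (choose the next query that most reduces uncertainty about the latent entity) can be illustrated with a toy 20-questions setup. This is only a minimal sketch under strong assumptions: the hypothesis set is finite, answers are deterministic, and a lookup table stands in for the paper's meta-learned language model that simulates future observations. All names (`expected_posterior_entropy`, `choose_question`, the example animals) are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_posterior_entropy(prior, answers):
    """Expected entropy over hypotheses after observing one question's answer.

    prior:   dict mapping hypothesis -> probability
    answers: dict mapping hypothesis -> the answer that hypothesis would give
             (an assumed stand-in for simulated future observations)
    """
    groups = {}
    for hypothesis, p in prior.items():
        groups.setdefault(answers[hypothesis], []).append(p)
    total = 0.0
    for probs in groups.values():
        mass = sum(probs)  # probability of observing this answer
        total += mass * entropy([p / mass for p in probs])
    return total

def choose_question(prior, questions):
    """Pick the question with the lowest expected remaining uncertainty."""
    return min(questions, key=lambda q: expected_posterior_entropy(prior, questions[q]))

prior = {"cat": 0.25, "dog": 0.25, "eagle": 0.25, "shark": 0.25}
questions = {
    "Is it a mammal?": {"cat": "yes", "dog": "yes", "eagle": "no", "shark": "no"},
    "Can it fly?": {"cat": "no", "dog": "no", "eagle": "yes", "shark": "no"},
}
best = choose_question(prior, questions)
```

In this toy case the even split ("Is it a mammal?") is preferred over the lopsided one, which is the usual behaviour of an entropy-based criterion; the paper's framework targets the same goal over open-ended natural language rather than a fixed table.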


Article 270

Title@2025-07-09 (3): DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models

Title: DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models DiffSpectra: Molekulare Strukturaufklärung aus Spektren mit Diffusionsmodellen DiffSpectra: 使用扩散模型从光谱解析分子结构 2507.06853v1

Authors (10): Liang Wang, Yu Rong, Tingyang Xu, Zhenyi Zhong, Zhiyuan Liu, Pengju Wang, Deli Zhao, Qiang Liu, Shu Wu, Liang Wang

Molecular structure elucidation from spectra is a foundational problem in chemistry, with profound implications for compound identification, synthesis, and drug development. Traditional methods rely heavily on expert interpretation and lack scalability. Pioneering machine learning methods have introduced retrieval-based strategies, but their reliance on finite libraries limits generalization to novel molecules. Generative models offer a promising alternative, yet most adopt autoregressive SMILES-based architectures that overlook 3D geometry and struggle to integrate diverse spectral modalities. In this work, we present DiffSpectra, a generative framework that directly infers both 2D and 3D molecular structures from multi-modal spectral data using diffusion models. DiffSpectra formulates structure elucidation as a conditional generation process. Its denoising network is parameterized by Diffusion Molecule Transformer, an SE(3)-equivariant architecture that integrates topological and geometric information. Conditioning is provided by SpecFormer, a transformer-based spectral encoder that captures intra- and inter-spectral dependencies from multi-modal spectra. Extensive experiments demonstrate that DiffSpectra achieves high accuracy in structure elucidation, recovering exact structures with 16.01% top-1 accuracy and 96.86% top-20 accuracy through sampling. The model benefits significantly from 3D geometric modeling, SpecFormer pre-training, and multi-modal conditioning. These results highlight the effectiveness of spectrum-conditioned diffusion modeling in addressing the challenge of molecular structure elucidation. To our knowledge, DiffSpectra is the first framework to unify multi-modal spectral reasoning and joint 2D/3D generative modeling for de novo molecular structure elucidation.

从光谱中解析分子结构是化学中的一个基础问题,对化合物鉴定、合成和药物开发具有深远影响。传统方法严重依赖专家解读,缺乏可扩展性。早期的机器学习方法引入了基于检索的策略,但对有限谱库的依赖限制了其对新分子的泛化。生成模型提供了一种有前景的替代方案,但多数采用基于 SMILES 的自回归架构,忽视了三维几何,也难以整合多种光谱模态。在这项工作中,我们提出 DiffSpectra,这是一个利用扩散模型从多模态光谱数据中直接推断二维和三维分子结构的生成框架。DiffSpectra 将结构解析表述为条件生成过程。其去噪网络由 Diffusion Molecule Transformer 参数化,这是一种融合拓扑与几何信息的 SE(3) 等变架构。条件信息由 SpecFormer 提供,这是一种基于 Transformer 的光谱编码器,能够从多模态光谱中捕捉谱内和谱间的依赖关系。大量实验表明,DiffSpectra 在结构解析中达到了较高精度,通过采样恢复精确结构的 top-1 准确率为 16.01%,top-20 准确率为 96.86%。模型显著受益于三维几何建模、SpecFormer 预训练和多模态条件。这些结果凸显了光谱条件扩散建模在应对分子结构解析挑战方面的有效性。据我们所知,DiffSpectra 是首个将多模态光谱推理与二维/三维联合生成建模统一起来、用于从头分子结构解析的框架。


Article 271

Title@2025-07-09 (3): EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning

Title: EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning EMORL: Ensemble Multi-Objektive Verstärkungs-Lernen für effizientes und flexibles LLM-Fine-Tuning EMORL: 将多目标强化学习组合起来,促进高效率和灵活的LLM 微调 2505.02579v3

Authors (5): Lingxiao Kong, Cong Yang, Susanne Neufang, Oya Deniz Beyan, Zeyd Boukhers

Recent advances in reinforcement learning (RL) for large language model (LLM) fine-tuning show promise in addressing multi-objective tasks but still face significant challenges, including competing objective balancing, low training efficiency, poor scalability, and limited explainability. Leveraging ensemble learning principles, we introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models with individual objectives while optimizing their aggregation after the fine-tuning to improve efficiency and flexibility. Our method is the first to aggregate the hidden states of individual models, incorporating contextual information from multiple objectives. This approach is supported by a hierarchical grid search algorithm that identifies optimal weighted combinations. We evaluate EMORL on counselor reflection generation tasks, using text classification models to score the generations and provide rewards during RL fine-tuning. Through comprehensive experiments on the PAIR and Psych8k datasets, we demonstrate the advantages of EMORL against existing baselines: significantly lower and more stable training consumption ($17,529\pm 1,650$ data points and $6,573\pm 147.43$ seconds), improved scalability and explainability, and comparable performance across multiple objectives.

用于大语言模型(LLM)微调的强化学习(RL)近期进展在处理多目标任务方面展现出潜力,但仍面临重大挑战,包括相互竞争的目标难以平衡、训练效率低、可扩展性差以及可解释性有限。借助集成学习原理,我们提出集成多目标强化学习(EMORL)框架:针对各个目标分别微调多个模型,并在微调之后优化它们的聚合方式,以提高效率和灵活性。我们的方法首次对各个模型的隐藏状态进行聚合,从而融入来自多个目标的上下文信息。该方法由一种分层网格搜索算法支持,用于找出最优的加权组合。我们在咨询师反思生成任务上评估 EMORL,使用文本分类模型对生成结果评分,并在 RL 微调期间提供奖励。通过在 PAIR 和 Psych8k 数据集上的全面实验,我们展示了 EMORL 相对于现有基线的优势:训练消耗显著更低且更稳定($17,529\pm 1,650$ 个数据点和 $6,573\pm 147.43$ 秒),可扩展性和可解释性得到改善,并且在多个目标上性能相当。
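
The abstract mentions a weighted aggregation of per-objective hidden states and a hierarchical grid search over the weights. The sketch below shows one generic coarse-to-fine grid search, not the authors' algorithm: the refinement rule, the unit-interval weight range, the function names and the toy score are all assumptions made for illustration.

```python
import itertools

def aggregate_hidden_states(hidden_states, weights):
    """Weighted sum of per-model hidden-state vectors (plain lists here)."""
    dim = len(hidden_states[0])
    return [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]

def hierarchical_grid_search(score, dim, levels=3, points=5):
    """Coarse-to-fine search for weights in [0, 1]^dim that maximize `score`.

    Each level evaluates a `points`-per-axis grid, then shrinks the search box
    to one grid step around the best point found so far.
    """
    lo, hi = [0.0] * dim, [1.0] * dim
    best = None
    for _ in range(levels):
        axes = [[l + i * (h - l) / (points - 1) for i in range(points)]
                for l, h in zip(lo, hi)]
        best = max(itertools.product(*axes), key=score)
        step = [(h - l) / (points - 1) for l, h in zip(lo, hi)]
        lo = [max(0.0, b - s) for b, s in zip(best, step)]
        hi = [min(1.0, b + s) for b, s in zip(best, step)]
    return list(best)

def toy_score(w):
    """Hypothetical stand-in for a multi-objective evaluation score."""
    return -((w[0] - 0.3) ** 2 + (w[1] - 0.7) ** 2)

weights = hierarchical_grid_search(toy_score, dim=2)
mixed = aggregate_hidden_states([[1.0, 0.0], [0.0, 1.0]], weights)
```

Compared with a single fine grid, the coarse-to-fine schedule evaluates far fewer weight combinations for the same final resolution, which is the kind of saving a hierarchical search is meant to provide.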


Article 272

Title@2025-07-09 (3): Adaptive collaboration for online personalized distributed learning with heterogeneous clients

Title: Adaptive collaboration for online personalized distributed learning with heterogeneous clients Adaptive Zusammenarbeit für online personalisiertes verteiltes Lernen mit heterogenen Kunden 与不同客户进行适应性协作,促进在线个性化个人化分布式学习 2507.06844v1

Authors (4): Constantin Philippenko, Batiste Le Bars, Kevin Scaman, Laurent Massoulié

We study the problem of online personalized decentralized learning with $N$ statistically heterogeneous clients collaborating to accelerate local training. An important challenge in this setting is to select relevant collaborators to reduce gradient variance while mitigating the introduced bias. To tackle this, we introduce a gradient-based collaboration criterion, allowing each client to dynamically select peers with similar gradients during the optimization process. Our criterion is motivated by a refined and more general theoretical analysis of the All-for-one algorithm, proved to be optimal in Even et al. (2022) for an oracle collaboration scheme. We derive excess loss upper-bounds for smooth objective functions, being either strongly convex, non-convex, or satisfying the Polyak-Lojasiewicz condition; our analysis reveals that the algorithm acts as a variance reduction method where the speed-up depends on a sufficient variance. We put forward two collaboration methods instantiating the proposed general schema; and we show that one variant preserves the optimality of All-for-one. We validate our results with experiments on synthetic and real datasets.

我们研究在线个性化去中心化学习问题:$N$ 个统计异质的客户端相互协作以加速本地训练。这一设定下的一个重要挑战是选择合适的协作者,在降低梯度方差的同时减轻由此引入的偏差。为此,我们引入一种基于梯度的协作准则,使每个客户端能够在优化过程中动态选择梯度相似的同伴。该准则源于对 All-for-one 算法更精细、更一般的理论分析;Even et al. (2022) 已证明该算法在预言机(oracle)协作方案下是最优的。我们针对光滑目标函数推导了超额损失上界,涵盖强凸、非凸以及满足 Polyak-Lojasiewicz 条件的情形;我们的分析表明,该算法起到方差缩减方法的作用,其加速效果取决于足够大的方差。我们提出了两种实例化所提通用方案的协作方法,并证明其中一种变体保留了 All-for-one 的最优性。我们通过合成数据集和真实数据集上的实验验证了这些结果。
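
To make the idea of a gradient-based collaboration criterion concrete, here is a minimal sketch in which a client averages only the gradients of peers whose gradient lies within a fixed distance of its own. The distance threshold `radius` and both function names are assumptions for illustration; the abstract does not state the exact criterion used in the paper.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two gradient vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def select_collaborators(gradients, client, radius):
    """Indices of clients whose gradient is within `radius` of the client's own.

    The client always selects itself (distance zero).
    """
    own = gradients[client]
    return [j for j, g in enumerate(gradients)
            if squared_distance(own, g) <= radius ** 2]

def collaborative_gradient(gradients, client, radius):
    """Average the gradients of the selected collaborators.

    Averaging similar gradients lowers variance; excluding dissimilar ones
    limits the bias introduced by heterogeneous objectives.
    """
    peers = select_collaborators(gradients, client, radius)
    dim = len(gradients[client])
    return [sum(gradients[j][k] for j in peers) / len(peers) for k in range(dim)]

gradients = [[1.0, 0.0], [1.2, 0.1], [-1.0, 0.0]]
peers_of_0 = select_collaborators(gradients, client=0, radius=0.5)
update_0 = collaborative_gradient(gradients, client=0, radius=0.5)
```

Here client 0 collaborates with client 1 but not with client 2, whose gradient points the opposite way; a larger `radius` trades more variance reduction for more bias.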


Article 273

Title@2025-07-09 (3): Towards Collaborative Anti-Money Laundering Among Financial Institutions

Title: Towards Collaborative Anti-Money Laundering Among Financial Institutions Auf dem Weg zur gemeinsamen Geldwäschebekämpfung unter den Finanzinstituten 实现金融机构之间反洗钱合作 2502.19952v3

Authors (7): Zhihua Tian, Yuan Ding, Wenjie Qu, Xiang Yu, Enchao Gong, Jian Liu, Kui Ren

Money laundering is the process that intends to legalize the income derived from illicit activities, thus facilitating their entry into the monetary flow of the economy without jeopardizing their source. It is crucial to identify such activities accurately and reliably in order to enforce anti-money laundering (AML). Despite considerable AML efforts, a large number of such activities still go undetected. Rule-based methods were first introduced and are still widely used in current detection systems. With the rise of machine learning, graph-based learning methods have gained prominence in detecting illicit accounts through the analysis of money transfer graphs. Nevertheless, these methods generally assume that the transaction graph is centralized, whereas in practice, money laundering activities usually span multiple financial institutions. Due to regulatory, legal, commercial, and customer privacy concerns, institutions tend not to share data, which restricts the utility of these methods in practice. In this paper, we propose the first algorithm that supports performing AML over multiple institutions while protecting the security and privacy of local data. For evaluation, we construct Alipay-ECB, a real-world dataset comprising digital transactions from Alipay, the world’s largest mobile payment platform, alongside transactions from E-Commerce Bank (ECB). The dataset includes over 200 million accounts and 300 million transactions, covering both intra-institution transactions and those between Alipay and ECB. This makes it the largest real-world transaction graph available for analysis. The experimental results demonstrate that our method can effectively identify cross-institution money laundering subgroups. Additionally, experiments on synthetic datasets demonstrate that our method is efficient, requiring only a few minutes on datasets with millions of transactions.

洗钱是意图使非法活动所得收入合法化的过程,使其在不危及来源的情况下进入经济的货币流通。为落实反洗钱(AML),准确可靠地识别此类活动至关重要。尽管在反洗钱方面已投入大量努力,仍有大量此类活动未被发现。基于规则的方法最早被引入,至今仍广泛用于现有检测系统。随着机器学习的兴起,基于图的学习方法通过分析转账图来检测非法账户,日益受到重视。然而,这些方法通常假设交易图是集中存放的,而在实践中,洗钱活动往往跨越多个金融机构。出于监管、法律、商业和客户隐私方面的考虑,各机构往往不共享数据,这限制了这些方法的实际效用。本文提出了首个支持在多个机构之间开展反洗钱、同时保护本地数据安全与隐私的算法。为进行评估,我们构建了 Alipay-ECB,这是一个真实世界数据集,包含来自全球最大移动支付平台 Alipay 的数字交易以及来自 E-Commerce Bank(ECB)的交易。该数据集包含超过 2 亿个账户和 3 亿笔交易,既涵盖机构内部交易,也涵盖 Alipay 与 ECB 之间的交易,是目前可供分析的最大真实世界交易图。实验结果表明,我们的方法能够有效识别跨机构洗钱子群体。此外,合成数据集上的实验还表明,我们的方法效率很高,在包含数百万笔交易的数据集上只需几分钟。


Article 274

Title@2025-07-09 (3): Scalable Gaussian Processes: Advances in Iterative Methods and Pathwise Conditioning

Title: Scalable Gaussian Processes: Advances in Iterative Methods and Pathwise Conditioning Skalierbare Gaußprozesse: Fortschritte bei iterativen Methoden und pfadweisen Konditionierungen 可缩放高斯进程:迭接方法和路径性条件方面的进展 2507.06839v1

Authors (1): Jihao Andreas Lin

Gaussian processes are a powerful framework for uncertainty-aware function approximation and sequential decision-making. Unfortunately, their classical formulation does not scale gracefully to large amounts of data and modern hardware for massively-parallel computation, prompting many researchers to develop techniques which improve their scalability. This dissertation focuses on the powerful combination of iterative methods and pathwise conditioning to develop methodological contributions which facilitate the use of Gaussian processes in modern large-scale settings. By combining these two techniques synergistically, expensive computations are expressed as solutions to systems of linear equations and obtained by leveraging iterative linear system solvers. This drastically reduces memory requirements, facilitating application to significantly larger amounts of data, and introduces matrix multiplication as the main computational operation, which is ideal for modern hardware.

高斯进程是了解不确定性功能近似和顺序决策的强大框架,但不幸的是,其传统公式并不优于大量数据和现代硬件,无法进行大规模平行计算,促使许多研究人员开发提高可缩放性的技术。这篇论文侧重于迭代方法的强大组合,以及发展方法调整的强大组合,以发展方法贡献,便利在现代大型环境中使用高斯进程。通过将这两种技术协同结合,昂贵的计算被表述为线性方程式系统的解决办法,而利用迭代线性线性系统解析器获取。这极大地减少了记忆要求,便利了对大量数据的应用,并引入矩阵乘法作为现代硬件的理想的主要计算操作。
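
The combination named in this abstract can be sketched in a few lines: a posterior sample is a prior sample plus a data-dependent correction, and the correction only requires solving one linear system, which an iterative solver handles with matrix-vector products. The sketch below is a toy under stated assumptions, not the dissertation's implementation: it uses an RBF kernel, random Fourier features for the prior sample, plain conjugate gradients as the solver, and one-dimensional inputs; all function names are hypothetical.

```python
import math
import random

def rbf(a, b, ls=1.0):
    """RBF kernel with unit variance and lengthscale `ls`."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)

def conjugate_gradient(A, b, tol=1e-12, max_iter=200):
    """Solve A x = b for symmetric positive-definite A using mat-vec products."""
    x = [0.0] * len(b)
    r, p = list(b), list(b)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs < tol:
            break
        Ap = [sum(a * pi for a, pi in zip(row, p)) for row in A]
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def sample_prior_function(rng, n_features=200, ls=1.0):
    """Approximate GP prior sample via random Fourier features (an assumption)."""
    omegas = [rng.gauss(0.0, 1.0 / ls) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return lambda x: scale * sum(c * math.cos(o * x + p)
                                 for c, o, p in zip(coeffs, omegas, phases))

def pathwise_posterior_sample(X, y, noise_std, rng, ls=1.0):
    """Posterior sample: f_prior(.) + k(., X) (K + s^2 I)^{-1} (y - f_prior(X) - eps)."""
    f_prior = sample_prior_function(rng, ls=ls)
    K = [[rbf(a, b, ls) + (noise_std ** 2 if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    residual = [yi - f_prior(xi) - rng.gauss(0.0, noise_std) for xi, yi in zip(X, y)]
    v = conjugate_gradient(K, residual)  # the only linear solve needed
    return lambda x: f_prior(x) + sum(rbf(x, xi, ls) * vi for xi, vi in zip(X, v))

X = [0.0, 1.5, 3.0, 4.5]
y = [math.sin(x) for x in X]
f_post = pathwise_posterior_sample(X, y, noise_std=0.01, rng=random.Random(0))
```

Because the solver only touches the kernel matrix through products with vectors, the matrix never has to be factorized, which is the property the abstract credits for the reduced memory footprint and the good fit to parallel hardware.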


Article 275

Title@2025-07-09 (3): PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection

Title: PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection PBCAT: Patch-basiertes komposites gegnerisches Training gegen physikalisch realisierbare Angriffe auf Objekterkennung PBCAT:针对实际可实现的物体探测攻击的补补综合对抗性训练 2506.23581v2

Authors (7): Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu

Object detection plays a crucial role in many security-sensitive applications. However, several recent studies have shown that object detectors can be easily fooled by physically realizable attacks, e.g., adversarial patches and recent adversarial textures, which pose realistic and urgent threats. Adversarial Training (AT) has been recognized as the most effective defense against adversarial attacks. While AT has been extensively studied in the $l_\infty$ attack settings on classification models, AT against physically realizable attacks on object detectors has received limited exploration. Early attempts are only performed to defend against adversarial patches, leaving AT against a wider range of physically realizable attacks under-explored. In this work, we consider defending against various physically realizable attacks with a unified AT method. We propose PBCAT, a novel Patch-Based Composite Adversarial Training strategy. PBCAT optimizes the model by incorporating the combination of small-area gradient-guided adversarial patches and imperceptible global adversarial perturbations covering the entire image. With these designs, PBCAT has the potential to defend against not only adversarial patches but also unseen physically realizable attacks such as adversarial textures. Extensive experiments in multiple settings demonstrated that PBCAT significantly improved robustness against various physically realizable attacks over state-of-the-art defense methods. Notably, it improved the detection accuracy by 29.7% over previous defense methods under one recent adversarial texture attack.

目标检测在许多安全敏感的应用中起着关键作用。然而,最近的一些研究表明,目标检测器很容易被物理可实现的攻击所欺骗,例如对抗补丁和近期的对抗纹理,这些攻击构成了现实而紧迫的威胁。对抗训练(AT)被公认为抵御对抗攻击最有效的防御手段。虽然 AT 已在分类模型的 $l_\infty$ 攻击设定下得到广泛研究,但针对目标检测器上物理可实现攻击的 AT 仍少有探索。早期的尝试仅用于防御对抗补丁,针对更广泛物理可实现攻击的 AT 尚未得到充分研究。在这项工作中,我们考虑用一种统一的 AT 方法来防御各类物理可实现的攻击。我们提出 PBCAT,一种新的基于补丁的复合对抗训练(Patch-Based Composite Adversarial Training)策略。PBCAT 将小面积的梯度引导对抗补丁与覆盖整幅图像的不可察觉全局对抗扰动结合起来对模型进行优化。借助这些设计,PBCAT 不仅有望防御对抗补丁,还有望防御对抗纹理等未见过的物理可实现攻击。多种设定下的大量实验表明,与最先进的防御方法相比,PBCAT 显著提升了对各类物理可实现攻击的鲁棒性。值得注意的是,在一种近期的对抗纹理攻击下,它的检测准确率比以往的防御方法提高了 29.7%。


Article 276

Title@2025-07-09 (3): Fredholm Neural Networks for forward and inverse problems in elliptic PDEs

Title: Fredholm Neural Networks for forward and inverse problems in elliptic PDEs Fredholm Neural Networks für vorwärts und inverse Probleme in elliptischen PDEs Fredholm神经网络,处理椭圆形PDE中前向和反向问题 2507.06038v2

Authors (3): Kyriakos Georgiou, Constantinos Siettos, Athanasios N. Yannacopoulos

Building on our previous work introducing Fredholm Neural Networks (Fredholm NNs/ FNNs) for solving integral equations, we extend the framework to tackle forward and inverse problems for linear and semi-linear elliptic partial differential equations. The proposed scheme consists of a deep neural network (DNN) which is designed to represent the iterative process of fixed-point iterations for the solution of elliptic PDEs using the boundary integral method within the framework of potential theory. The number of layers, weights, biases and hyperparameters are computed in an explainable manner based on the iterative scheme, and we therefore refer to this as the Potential Fredholm Neural Network (PFNN). We show that this approach ensures both accuracy and explainability, achieving small errors in the interior of the domain, and near machine-precision on the boundary. We provide a constructive proof for the consistency of the scheme and provide explicit error bounds for both the interior and boundary of the domain, reflected in the layers of the PFNN. These error bounds depend on the approximation of the boundary function and the integral discretization scheme, both of which directly correspond to components of the Fredholm NN architecture. In this way, we provide an explainable scheme that explicitly respects the boundary conditions. We assess the performance of the proposed scheme for the solution of both the forward and inverse problem for linear and semi-linear elliptic PDEs in two dimensions.

在我们此前提出用于求解积分方程的 Fredholm 神经网络(Fredholm NNs/FNNs)的工作基础上,我们将该框架扩展到线性和半线性椭圆型偏微分方程的正问题与反问题。所提方案由一个深度神经网络(DNN)构成,其设计用于在位势理论框架内,表示采用边界积分法求解椭圆型 PDE 时的不动点迭代过程。层数、权重、偏置和超参数均基于该迭代格式以可解释的方式计算得到,因此我们称之为位势 Fredholm 神经网络(Potential Fredholm Neural Network,PFNN)。我们表明该方法同时保证了精度与可解释性:在区域内部误差很小,在边界上接近机器精度。我们给出了该方案相容性的构造性证明,并给出了区域内部和边界上的显式误差界,这些误差界体现在 PFNN 的各层中。误差界取决于边界函数的逼近和积分离散格式,二者都直接对应于 Fredholm NN 架构的组成部分。由此,我们得到了一个显式满足边界条件的可解释方案。我们在二维线性和半线性椭圆型 PDE 的正问题和反问题上评估了所提方案的性能。


Article 277

Title@2025-07-09 (3): Comprehensive Evaluation of Prototype Neural Networks

Title: Comprehensive Evaluation of Prototype Neural Networks Umfassende Evaluierung von Prototypen-Neuralen Netzwerken 原型神经网络综合评价 2507.06819v1

Authors (3): Philipp Schlinge, Steffen Meinert, Martin Atzmueller

Prototype models are an important method for explainable artificial intelligence (XAI) and interpretable machine learning. In this paper, we perform an in-depth analysis of a set of prominent prototype models including ProtoPNet, ProtoPool and PIPNet. For their assessment, we apply a comprehensive set of metrics. In addition to applying standard metrics from the literature, we propose several new metrics to further complement the analysis of model interpretability. In our experimentation, we apply the set of prototype models to a diverse set of datasets including fine-grained classification, non-IID settings and multi-label classification to further contrast their performance. Furthermore, we also provide our code as an open-source library, which facilitates simple application of the metrics themselves, as well as extensibility, providing the option of easily adding new metrics and models. https://github.com/uos-sis/quanproto

原型模型是可解释的人工智能(XAI)和可解释的机器学习的重要方法。在本文中,我们深入分析了一套突出的原型模型,包括ProtoPNet、ProtoPolol和PIPNet。我们为评估这些模型,采用了一套全面的衡量标准。除了应用文献的标准衡量标准外,我们还提出若干新的衡量标准,以进一步补充模型解释性分析。在实验中,我们将一套原型模型应用于一套不同的数据集,包括精细的分类、非二维设置和多标签分类,以进一步对比性能。此外,我们还提供我们的代码,作为开放源图书馆,方便简单应用指标本身,以及扩展性,为容易添加新的衡量标准和模型提供选项。https://github.com/uos-sis/quanproto。


Article 278

Title@2025-07-09 (3): Bayesian Multi-Scale Neural Network for Crowd Counting

Title: Bayesian Multi-Scale Neural Network for Crowd Counting Bayesian Multi-Scale Neural Network für die Crowd Counting Bayesian 多人计数多空间神经网络 2007.14245v4

Authors (1): Abhinav Sagar

Crowd counting is a challenging yet critical task in computer vision with applications ranging from public safety to urban planning. Recent advances using Convolutional Neural Networks (CNNs) that estimate density maps have shown significant success. However, accurately counting individuals in highly congested scenes remains an open problem due to severe occlusions, scale variations, and perspective distortions, where people appear at drastically different sizes across the image. In this work, we propose a novel deep learning architecture that effectively addresses these challenges. Our network integrates a ResNet-based feature extractor for capturing rich hierarchical representations, followed by a downsampling block employing dilated convolutions to preserve spatial resolution while expanding the receptive field. An upsampling block using transposed convolutions reconstructs the high-resolution density map. Central to our architecture is a novel Perspective-aware Aggregation Module (PAM) designed to enhance robustness to scale and perspective variations by adaptively aggregating multi-scale contextual information. We detail the training procedure, including the loss functions and optimization strategies used. Our method is evaluated on three widely used benchmark datasets using Mean Absolute Error (MAE) and Mean Squared Error (MSE) as evaluation metrics. Experimental results demonstrate that our model achieves superior performance compared to existing state-of-the-art methods. Additionally, we incorporate principled Bayesian inference techniques to provide uncertainty estimates along with the crowd count predictions, offering a measure of confidence in the model’s outputs.

在计算机视野中,通过从公共安全到城市规划的应用程序,进行人群计数是一项具有挑战性但至关重要的任务。最近利用革命神经网络(CNNs)取得的进展,估计密度的地图显示了巨大的成功。然而,在高度拥挤的场景中,精确地计数个人仍是一个开放的问题,因为存在严重的排斥、比例变化和观点扭曲,人们在图像上出现的巨大差异。在这项工作中,我们提出了一个新的深层次学习架构,以有效应对这些挑战。我们的网络整合了一个基于ResNet的特征提取器,以捕捉丰富的等级代表,随后是下层取样块,利用变异变异来保存空间分辨率,同时扩大接收场。一个使用变异变变变变变的图块,重建高分辨率密度的地图。我们建筑的中心是一个新颖的视觉感知聚合模块(PAM),目的是通过适应性集中的多尺度背景信息,加强规模和视觉变化的力度。我们详细介绍了培训程序,包括损失功能和所使用的优化战略。我们的方法是用三种广泛使用的基准数据集来评估,使用绝对误差(MAE)和模拟模拟的图像分析结果,以显示我们目前的预测结果。
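
For readers unfamiliar with density-map counting, the following sketch shows how a predicted count is read off a density map and how the two evaluation metrics named in the abstract are computed. It assumes the plain definitions (mean of absolute errors, mean of squared errors); some crowd-counting papers report the square root of the latter under the name MSE, and the abstract does not say which convention is used. The function names are hypothetical.

```python
def count_from_density(density_map):
    """Predicted crowd count: the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density_map)

def mean_absolute_error(predicted, actual):
    """MAE over per-image counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mean_squared_error(predicted, actual):
    """MSE over per-image counts (no square root applied here)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

density_map = [[0.5, 0.5], [1.0, 0.0]]
predicted_count = count_from_density(density_map)
mae = mean_absolute_error([2.0, 10.0], [3.0, 8.0])
mse = mean_squared_error([2.0, 10.0], [3.0, 8.0])
```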


Article 279

Title@2025-07-09 (3): Designing Robust Software Sensors for Nonlinear Systems via Neural Networks and Adaptive Sliding Mode Control

Title: Designing Robust Software Sensors for Nonlinear Systems via Neural Networks and Adaptive Sliding Mode Control Entwerfen robuster Softwaresensoren für nichtlineare Systeme über neurale Netzwerke und adaptive Schiebemodussteuerung 通过神经网络和适应性滑动模式控制,为非线性系统设计强力软件传感器 2507.06817v1

Authors (4): Ayoub Farkane, Mohamed Boutayeb, Mustapha Oudani, Mounir Ghogho

Accurate knowledge of the state variables in a dynamical system is critical for effective control, diagnosis, and supervision, especially when direct measurements of all states are infeasible. This paper presents a novel approach to designing software sensors for nonlinear dynamical systems expressed in their most general form. Unlike traditional model-based observers that rely on explicit transformations or linearization, the proposed framework integrates neural networks with adaptive Sliding Mode Control (SMC) to design a robust state observer under a less restrictive set of conditions. The learning process is driven by available sensor measurements, which are used to correct the observer’s state estimate. The training methodology leverages the system’s governing equations as a physics-based constraint, enabling observer synthesis without access to ground-truth state trajectories. By employing a time-varying gain matrix dynamically adjusted by the neural network, the observer adapts in real-time to system changes, ensuring robustness against noise, external disturbances, and variations in system dynamics. Furthermore, we provide sufficient conditions to guarantee estimation error convergence, establishing a theoretical foundation for the observer’s reliability. The methodology’s effectiveness is validated through simulations on challenging examples, including systems with non-differentiable dynamics and varying observability conditions. These examples, which are often problematic for conventional techniques, serve to demonstrate the robustness and broad applicability of our approach. The results show rapid convergence and high accuracy, underscoring the method’s potential for addressing complex state estimation challenges in real-world applications.

对动态系统中的国家变量的准确了解对于有效控制、诊断和监督至关重要,特别是在无法对所有国家进行直接测量的情况下。本文件介绍了设计非线性动态系统软件传感器的新方法。与依赖明确转换或线性化的传统模型观察者不同,拟议框架将神经网络与适应性缓冲模式控制(SMC)结合起来,以便在一套限制较少的条件下设计一个强大的国家观察员。学习过程受现有传感器测量的驱动,这些测量用于纠正观察员的国家估计。培训方法将系统应用的方程式作为一种基于物理的限制,使观察员能够进行综合,而不能接触到地面光线性状态轨迹。与传统的模型观察者不同,通过使用由神经网络进行动态调整的时变模型,将神经网络与适应性缓冲模式控制(SMC)结合起来,在一系列条件限制较少的情况下设计一个强大的国家观察员。此外,我们提供了足够的条件,可以保证对错误进行估算,为观察员的可靠性建立理论基础。 培训的准确性将系统应用作为物理约束,使得观察员的方程式能够进行综合,同时通过模拟、具有挑战性的常规方法的系统来验证。


Article 280

Title@2025-07-09 (3): Intrinsic Training Signals for Federated Learning Aggregation

Title: Intrinsic Training Signals for Federated Learning Aggregation Intrinsische Trainingssignale für die Federated Learning Aggregation 联邦学习综合组织内部培训信号 2507.06813v1

Authors (5): Cosimo Fiorini, Matteo Mosconi, Pietro Buzzega, Riccardo Salami, Simone Calderara

Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy. While existing approaches for aggregating client-specific classification heads and adapted backbone parameters require architectural modifications or loss function changes, our method uniquely leverages intrinsic training signals already available during standard optimization. We present LIVAR (Layer Importance and VARiance-based merging), which introduces: i) a variance-weighted classifier aggregation scheme using naturally emergent feature statistics, and ii) an explainability-driven LoRA merging technique based on SHAP analysis of existing update parameter patterns. Without any architectural overhead, LIVAR achieves state-of-the-art performance on multiple benchmarks while maintaining seamless integration with existing FL methods. This work demonstrates that effective model merging can be achieved solely through existing training signals, establishing a new paradigm for efficient federated model aggregation. The code will be made publicly available upon acceptance.

联邦学习联合会(FL)在维护数据隐私的同时,使分布式客户能够进行合作模式培训,同时保护数据隐私。虽然整合特定客户分类头和调整后的主干参数的现有方法需要建筑改造或损失功能改变,但我们的方法独有地利用了标准优化期间已有的内在培训信号。我们介绍了LIVAR(Layer Streat和VARiance合并),它介绍了:i)使用自然突发特征统计的差别加权分类汇总计划;ii)根据SHAP对现有更新参数模式的分析,以解释性驱动的LORA合并技术。没有建筑管理,LIVAR在多个基准上取得最新业绩,同时保持与现有FL方法的无缝结合。这项工作表明,只有通过现有的培训信号,才能实现有效的合并模式,为高效的联邦化模型汇总建立一个新的模式。一旦被接受,守则将公布于众。


Article 281

Title@2025-07-09 (3): Noise tolerance via reinforcement: Learning a reinforced quantum dynamics

Title: Noise tolerance via reinforcement: Learning a reinforced quantum dynamics Geräuschtoleranz durch Verstärkung: Erlernen einer verstärkten Quantendynamik 通过加强增强的噪音容忍:学习强化量子动力学 2506.12418v2

Authors (1): Abolfazl Ramezanpour

The performance of quantum simulations heavily depends on the efficiency of noise mitigation techniques and error correction algorithms. Reinforcement has emerged as a powerful strategy to enhance the efficiency of learning and optimization algorithms. In this study, we demonstrate that a reinforced quantum dynamics can exhibit significant robustness against interactions with a noisy environment. We study a quantum annealing process where, through reinforcement, the system is encouraged to maintain its current state or follow a noise-free evolution. A learning algorithm is employed to derive a concise approximation of this reinforced dynamics, reducing the total evolution time and, consequently, the system’s exposure to noisy interactions. This also avoids the complexities associated with implementing quantum feedback in such reinforcement algorithms. The efficacy of our method is demonstrated through numerical simulations of reinforced quantum annealing with one- and two-qubit systems under Pauli noise.

量子模拟的性能在很大程度上取决于减少噪音技术和纠正错误算法的效率。强化已成为提高学习和优化算法效率的有力战略。在本研究中,我们证明,强化量子动态在与噪音环境的相互作用中可以表现出很强的强力。我们研究量子喷射过程,通过强化,鼓励系统保持其目前状态或遵循无噪音的演变。使用学习算法来对这种强化的动态进行简要的近似,减少整个演进时间,进而使系统暴露于吵闹的相互作用中。这也避免了在这种增强算法中执行量子反馈的复杂性。我们的方法的功效通过在保利噪音下用一二公分的量子系统进行强化量子喷射的数值模拟来证明。


Article 282

Title@2025-07-09 (3): Very fast Bayesian Additive Regression Trees on GPU

Title: Very fast Bayesian Additive Regression Trees on GPU Sehr schnelle Bayesian Additive Regression Bäume auf GPU 在 GPU 上非常快速的 Bayesian 补退树 2410.23244v2

Authors (1): Giacomo Petrillo

Bayesian Additive Regression Trees (BART) is a nonparametric Bayesian regression technique based on an ensemble of decision trees. It is part of the toolbox of many statisticians. The overall statistical quality of the regression is typically higher than other generic alternatives, and it requires less manual tuning, making it a good default choice. However, it is a niche method compared to its natural competitor XGBoost, due to the longer running time, making sample sizes above 10,000-100,000 a nuisance. I present a GPU-enabled implementation of BART, faster by up to 200x relative to a single CPU core, making BART competitive in running time with XGBoost. This implementation is available in the Python package bartz.

Bayesian Additive Regression Trees(BART)是一种基于决策树集成的非参数贝叶斯回归技术,是许多统计学家工具箱的一部分。其回归的总体统计质量通常高于其他通用替代方法,且需要的手动调参更少,因此是一个不错的默认选择。然而,与其天然竞争者 XGBoost 相比,它仍是一种小众方法,原因在于运行时间较长,使得超过 10,000-100,000 的样本量变得很麻烦。我给出了一个支持 GPU 的 BART 实现,相对于单个 CPU 核心最多可快 200 倍,使 BART 在运行时间上可与 XGBoost 竞争。该实现可在 Python 包 bartz 中获得。


Article 283

Title@2025-07-09 (3): A Wireless Foundation Model for Multi-Task Prediction

Title: A Wireless Foundation Model for Multi-Task Prediction Ein drahtloses Foundation-Modell für Multi-Task-Vorhersage 无线电无线基金会多任务预测模型 2507.05938v2

Authors (7): Yucheng Sheng, Jiacheng Wang, Xingyu Zhou, Le Liang, Hao Ye, Shi Jin, Geoffrey Ye Li

With the growing complexity and dynamics of mobile communication networks, accurately predicting key system parameters, such as channel state information (CSI), user location, and network traffic, has become essential for a wide range of physical (PHY)-layer and medium access control (MAC)-layer tasks. Although traditional deep learning (DL)-based methods have been widely applied to such prediction tasks, they often struggle to generalize across different scenarios and tasks. In response, we propose a unified foundation model for multi-task prediction in wireless networks that supports diverse prediction intervals. The proposed model enforces univariate decomposition to unify heterogeneous tasks, encodes granularity for interval awareness, and uses a causal Transformer backbone for accurate predictions. Additionally, we introduce a patch masking strategy during training to support arbitrary input lengths. After training on large-scale datasets, the proposed foundation model demonstrates strong generalization to unseen scenarios and achieves zero-shot performance on new tasks that surpasses traditional full-shot baselines.

随着移动通信网络的日益复杂和动态,准确预测关键系统参数,如频道状态信息、用户位置和网络流量等关键系统参数,对于一系列广泛的物理(物理)和中层出入控制(MAC)任务至关重要。虽然传统的深层学习(DL)方法已广泛应用于此类预测任务,但它们往往难以在不同的情景和任务中一概而论。作为回应,我们提出了一个支持不同预测间隔的无线网络多任务预测的统一基础模型。拟议模型强制实施单象体分解,以统一不同任务,为间隔意识编码颗粒,并使用因果变形主干柱进行准确预测。此外,我们在培训期间引入了补丁遮罩战略,以支持任意输入长度。在接受大规模数据集培训后,拟议的基础模型展示了对看不见情景的有力概括,并在超过传统全速基线的新任务上实现了零速性表现。


Article 284

Title@2025-07-09 (3): Speech Tokenizer is Key to Consistent Representation

Title: Speech Tokenizer is Key to Consistent Representation Sprach-Tokenizer ist der Schlüssel zur konsistenten Repräsentation 语音收发器是统一代表的关键 2507.06802v1

Authors (3): Wonjin Jung, Sungil Kang, Dong-Yeon Cho

Speech tokenization is crucial in digital speech processing, converting continuous speech signals into discrete units for various computational tasks. This paper introduces a novel speech tokenizer with broad applicability across downstream tasks. While recent advances in residual vector quantization (RVQ) have incorporated semantic elements, they often neglect critical acoustic features. We propose an advanced approach that simultaneously encodes both linguistic and acoustic information, preserving prosodic and emotional content. Our method significantly enhances speech representation fidelity across diverse applications. Empirical evaluations demonstrate its effectiveness in speech coding, voice conversion, emotion recognition, and multimodal language modeling, without requiring additional training. This versatility underscores its potential as a key tool for advancing AI-driven speech processing.

语音符号化在数字语音处理中至关重要,它将连续语音信号转换成不同单位,用于各种计算任务。本文介绍了一个新的语音符号,广泛适用于下游任务。虽然残余矢量量化(RVQ)最近的进展包括了语义元素,但往往忽视了关键的声学特征。我们提出了一个先进的方法,同时将语言和声学信息编码,保留预言和情感内容。我们的方法极大地提高了各种应用程序的语音表示的忠诚性。经验性评估表明其在语音编码、语音转换、情感识别和多式语言建模方面的有效性,而无需额外培训。这种多功能性凸显了它作为推进AI驱动语音处理的关键工具的潜力。


Article 285

Title@2025-07-09 (3): Neural Networks for Tamed Milstein Approximation of SDEs with Additive Symmetric Jump Noise Driven by a Poisson Random Measure

Title: Neural Networks for Tamed Milstein Approximation of SDEs with Additive Symmetric Jump Noise Driven by a Poisson Random Measure Neurale Netzwerke für gezähmte Milstein-Annäherung von SDEs mit additivem symmetrischem Sprunggeräusch, angetrieben durch eine Poisson Random Measure 利用 Poisson 随机测量法驱动的 添加对称性跳跃振动的 SDE 近似电磁神经网络 2507.04417v2

Authors (2): Jose-Hermenegildo Ramirez-Gonzalez, Ying Sun

This work aims to estimate the drift and diffusion functions in stochastic differential equations (SDEs) driven by a particular class of Lévy processes with finite jump intensity, using neural networks. We propose a framework that integrates the Tamed-Milstein scheme with neural networks employed as non-parametric function approximators. Estimation is carried out in a non-parametric fashion for the drift function $f: \mathbb{Z} \to \mathbb{R}$ and the diffusion coefficient $g: \mathbb{Z} \to \mathbb{R}$. The model of interest is given by $$ dX(t) = \xi + f(X(t))\, dt + g(X(t))\, dW_t + \gamma \int_{\mathbb{Z}} z\, N(dt,dz), $$ where $W_t$ is a standard Brownian motion, and $N(dt,dz)$ is a Poisson random measure on $(\mathbb{R}_{+} \times \mathbb{Z},\ \mathcal{B}(\mathbb{R}_{+}) \otimes \mathcal{Z},\ \lambda(\Lambda \otimes v))$, with $\lambda, \gamma > 0$, $\Lambda$ being the Lebesgue measure on $\mathbb{R}_{+}$, and $v$ a finite measure on the measurable space $(\mathbb{Z}, \mathcal{Z})$. Neural networks are used as non-parametric function approximators, enabling the modeling of complex nonlinear dynamics without assuming restrictive functional forms. The proposed methodology constitutes a flexible alternative for inference in systems with state-dependent noise and discontinuities driven by Lévy processes.

这项工作旨在利用神经网络,估计由一类具有有限跳跃强度的 Lévy 过程驱动的随机微分方程(SDE)中的漂移函数和扩散函数。我们提出了一个框架,将 Tamed-Milstein 格式与用作非参数函数逼近器的神经网络相结合。漂移函数 $f: \mathbb{Z} \to \mathbb{R}$ 和扩散系数 $g: \mathbb{Z} \to \mathbb{R}$ 均以非参数方式估计。所研究的模型为 $$ dX(t) = \xi + f(X(t))\, dt + g(X(t))\, dW_t + \gamma \int_{\mathbb{Z}} z\, N(dt,dz), $$ 其中 $W_t$ 是标准布朗运动,$N(dt,dz)$ 是 $(\mathbb{R}_{+} \times \mathbb{Z},\ \mathcal{B}(\mathbb{R}_{+}) \otimes \mathcal{Z},\ \lambda(\Lambda \otimes v))$ 上的 Poisson 随机测度,$\lambda, \gamma > 0$,$\Lambda$ 为 $\mathbb{R}_{+}$ 上的 Lebesgue 测度,$v$ 为可测空间 $(\mathbb{Z}, \mathcal{Z})$ 上的有限测度。神经网络用作非参数函数逼近器,无需假设受限的函数形式即可对复杂的非线性动力学建模。所提出的方法为具有状态依赖噪声和由 Lévy 过程驱动的间断的系统中的推断提供了一种灵活的替代方案。
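As a rough illustration of the discretization the abstract builds on, the sketch below simulates one path with a tamed Milstein step plus compound Poisson jumps. It is a minimal sketch under stated assumptions, not the authors' implementation: the taming factor `1 / (1 + h * |f(x)|)` is one common choice, the functions `f`, `g`, `dg` (the derivative of `g`) are supplied by hand rather than learned by neural networks, and all names (`tamed_milstein_path`, `sample_poisson`, `jump_sampler`) are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) count with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def tamed_milstein_path(f, g, dg, x0, horizon, n_steps,
                        gamma=0.0, jump_rate=0.0, jump_sampler=None, rng=None):
    """Simulate dX = f(X) dt + g(X) dW + gamma * (compound Poisson jumps).

    Assumed scheme (illustrative): tamed drift increment, Milstein correction
    for the state-dependent diffusion, additive jumps with intensity jump_rate.
    """
    rng = rng or random.Random()
    h = horizon / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))            # Brownian increment
        drift = f(x)
        tamed = drift * h / (1.0 + h * abs(drift))   # taming bounds the drift step
        diffusion = g(x) * dw
        correction = 0.5 * g(x) * dg(x) * (dw * dw - h)  # Milstein term
        jumps = 0.0
        if jump_sampler is not None and jump_rate > 0.0:
            n_jumps = sample_poisson(rng, jump_rate * h)
            jumps = sum(jump_sampler(rng) for _ in range(n_jumps))
        x = x + tamed + diffusion + correction + gamma * jumps
        path.append(x)
    return path
```

For example, `tamed_milstein_path(lambda x: -x**3, lambda x: 0.5, lambda x: 0.0, 1.0, 1.0, 50, gamma=0.3, jump_rate=2.0, jump_sampler=lambda r: r.gauss(0.0, 1.0), rng=random.Random(1))` returns a reproducible path of 51 points with symmetric Gaussian jump sizes. In the setting of the abstract, `f` and `g` would be replaced by trained networks.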


Article 286

Title@2025-07-09 (3): The cost of ensembling: is it always worth combining?

Title: The cost of ensembling: is it always worth combining? Die Kosten für die Montage: lohnt es sich immer zu kombinieren? 集合的成本:总值得合并吗? 2506.04677v2

Authors (1): Marco Zanotti

Given the continuous increase in dataset sizes and the complexity of forecasting models, the trade-off between forecast accuracy and computational cost is emerging as an extremely relevant topic, especially in the context of ensemble learning for time series forecasting. To assess it, we evaluated ten base models and eight ensemble configurations across two large-scale retail datasets (M5 and VN1), considering both point and probabilistic accuracy under varying retraining frequencies. We showed that ensembles consistently improve forecasting performance, particularly in probabilistic settings. However, these gains come at a substantial computational cost, especially for larger, accuracy-driven ensembles. We found that reducing retraining frequency significantly lowers costs, with minimal impact on accuracy, particularly for point forecasts. Moreover, efficiency-driven ensembles offer a strong balance, achieving competitive accuracy with considerably lower costs compared to accuracy-optimized combinations. Most importantly, small ensembles of two or three models are often sufficient to achieve near-optimal results. These findings provide practical guidelines for deploying scalable and cost-efficient forecasting systems, supporting the broader goals of sustainable AI in forecasting. Overall, this work shows that careful ensemble design and retraining strategy selection can yield accurate, robust, and cost-effective forecasts suitable for real-world applications.

鉴于数据集规模和预测模型复杂性的持续增长,预测准确性与计算成本之间的权衡正成为一个极为重要的课题,在时间序列预测的集成学习中尤其如此。为评估这一权衡,我们在两个大规模零售数据集(M5 和 VN1)上评估了十个基础模型和八种集成配置,并考察了不同重训练频率下的点预测准确性和概率预测准确性。结果表明,集成能够持续提升预测性能,在概率预测场景下尤为明显。然而,这些收益伴随着可观的计算成本,对于规模较大、以准确性为导向的集成尤其如此。我们发现,降低重训练频率可显著降低成本,而对准确性的影响很小,对点预测尤其如此。此外,以效率为导向的集成实现了良好的平衡,与以准确性为优化目标的组合相比,能以低得多的成本达到有竞争力的准确性。最重要的是,由两到三个模型组成的小型集成往往足以取得接近最优的结果。这些发现为部署可扩展且具成本效益的预测系统提供了实用指南,有助于实现预测领域可持续 AI 的更广泛目标。总体而言,这项工作表明,精心的集成设计和重训练策略选择能够产生准确、稳健且具成本效益、适用于实际应用的预测。


Article 287

Title@2025-07-09 (3): Test-Time Scaling with Reflective Generative Model

Title: Test-Time Scaling with Reflective Generative Model Test-Zeit-Skalierung mit reflektierendem Generativem Modell 具有反反思考生成模型的试验时间缩放 2507.01951v2

Authors (11): Zixiao Wang, Yuxin Wang, Xiaorui Wang, Mengting Xing, Jie Gao, Jianjun Xu, Guangcan Liu, Chenhui Jin, Zhuo Wang, Shengzhuo Zhang, Hongtao Xie

We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3-mini’s performance via the new Reflective Generative Form. The new form focuses on high-quality reasoning trajectory selection and contains two novelties: 1) A unified interface for policy and process reward model: we share the backbone network and use task-specific heads for reasoning trajectory predicting and scoring respectively, introducing only 53M extra parameters for trajectory scoring. 2) Eliminating the reliance on process-level annotation: we provide a self-supervised process reward model, which can directly learn the high-quality reasoning trajectory selection from the outcome reward. Equipped with the reflective generative form, MetaStone-S1 is naturally suitable for test-time scaling, and we provide three reasoning effort modes (low, medium, and high) based on the controllable thinking length. Experiments demonstrate that our MetaStone-S1 achieves comparable performance to OpenAI o3-mini’s series with only 32B parameter size. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.

我们推出首个反思生成模型 MetaStone-S1,它通过新的反思生成形式(Reflective Generative Form)达到了 OpenAI o3-mini 的性能。这一新形式聚焦于高质量推理轨迹的选择,包含两点创新:1) 策略模型与过程奖励模型的统一接口:我们共享主干网络,并使用任务特定的头分别进行推理轨迹的预测和评分,轨迹评分仅引入 5300 万(53M)额外参数。2) 消除对过程级标注的依赖:我们提出一种自监督过程奖励模型,可直接从结果奖励中学习高质量推理轨迹的选择。借助反思生成形式,MetaStone-S1 天然适用于测试时扩展,我们基于可控的思考长度提供三种推理强度模式(低、中、高)。实验表明,MetaStone-S1 仅以 32B 的参数规模即可达到与 OpenAI o3-mini 系列相当的性能。为支持研究社区,我们已在 https://github.com/MetaStone-AI/MetaStone-S1 开源 MetaStone-S1。


Article 288

Title@2025-07-09 (3): Temporal Information Retrieval via Time-Specifier Model Merging

Title: Temporal Information Retrieval via Time-Specifier Model Merging Temporal Information Retrieval über Zeit-Spezifikator Modellverschmelzung 通过时间规格模型合并获取时间信息 2507.06782v1

Authors (7): SeungYoon Han, Taeho Hwang, Sukmin Cho, Soyeong Jeong, Hoyun Song, Huije Lee, Jong C. Park

The rapid expansion of digital information and knowledge across structured and unstructured sources has heightened the importance of Information Retrieval (IR). While dense retrieval methods have substantially improved semantic matching for general queries, they consistently underperform on queries with explicit temporal constraints, often those containing numerical expressions and time specifiers such as “in 2015.” Existing approaches to Temporal Information Retrieval (TIR) improve temporal reasoning but often suffer from catastrophic forgetting, leading to reduced performance on non-temporal queries. To address this, we propose Time-Specifier Model Merging (TSM), a novel method that enhances temporal retrieval while preserving accuracy on non-temporal queries. TSM trains specialized retrievers for individual time specifiers and merges them into a unified model, enabling precise handling of temporal constraints without compromising non-temporal retrieval. Extensive experiments on both temporal and non-temporal datasets demonstrate that TSM significantly improves performance on temporally constrained queries while maintaining strong results on non-temporal queries, consistently outperforming other baseline methods. Our code is available at https://github.com/seungyoonee/TSM .

数字信息和知识在结构化与非结构化来源中的迅速扩张,提升了信息检索(IR)的重要性。尽管稠密检索方法已大幅改进了一般查询的语义匹配,但在带有明确时间约束的查询上表现始终不佳,这类查询通常包含数字表达式和时间限定词,例如“in 2015”。现有的时间信息检索(TIR)方法改进了时间推理,但常常出现灾难性遗忘,导致非时间查询的性能下降。为此,我们提出时间限定词模型合并(Time-Specifier Model Merging,TSM),这是一种在保持非时间查询准确性的同时增强时间检索的新方法。TSM 为各个时间限定词分别训练专门的检索器,并将它们合并为一个统一模型,从而在不损害非时间检索的前提下精确处理时间约束。在时间和非时间数据集上的大量实验表明,TSM 显著提升了时间约束查询的性能,同时在非时间查询上保持了良好结果,持续优于其他基线方法。我们的代码可在 https://github.com/seungyoonee/TSM 获取。
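The abstract does not say how the specialized retrievers are merged. As one plausible reading, the sketch below merges models by weighted averaging of their parameters; this is an assumption for illustration, not necessarily the TSM procedure. Parameters are represented as plain dictionaries mapping a name to a flat list of floats, and the function name `merge_state_dicts` is hypothetical.

```python
def merge_state_dicts(state_dicts, weights=None):
    """Merge models by a weighted average of same-named parameters.

    state_dicts: list of {param_name: [float, ...]} with identical keys/shapes.
    weights: optional per-model weights; defaults to a uniform average.
    """
    if not state_dicts:
        raise ValueError("need at least one model to merge")
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("one weight per model is required")

    reference_keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != reference_keys:
            raise ValueError("all models must share the same parameter names")

    merged = {}
    for name, reference in state_dicts[0].items():
        # Average each coordinate across the specialized models.
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(reference))
        ]
    return merged
```

For instance, merging a retriever tuned on "in" queries with one tuned on "before" queries would call `merge_state_dicts([in_model, before_model])`, where both arguments are hypothetical parameter dictionaries with the same layout.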


Article 289

Title@2025-07-09 (3): Learning safe, constrained policies via imitation learning: Connection to Probabilistic Inference and a Naive Algorithm

Title: Learning safe, constrained policies via imitation learning: Connection to Probabilistic Inference and a Naive Algorithm Sicheres Lernen, eingeschränkte Richtlinien durch Nachahmung des Lernens: Verbindung zur probabilistischen Inferenz und zu einem Naive-Algorithmus 通过模仿学习学习,学习安全、有限制的政策:与概率推论和自然算法的联系 2507.06780v1

Authors (2): George Papadopoulos, George A. Vouros

This article introduces an imitation learning method for learning maximum entropy policies that comply with constraints demonstrated by expert trajectories executing a task. The formulation of the method takes advantage of results connecting performance to bounds for the KL-divergence between demonstrated and learned policies, and its objective is rigorously justified through a connection to a probabilistic inference framework for reinforcement learning, incorporating the reinforcement learning objective and the objective to abide by constraints in an entropy maximization setting. The proposed algorithm optimizes the learning objective with dual gradient descent, supporting effective and stable training. Experiments show that the proposed method can learn effective policy models for constraints-abiding behaviour, in settings with multiple constraints of different types, accommodating different modalities of demonstrated behaviour, and with abilities to generalize.

本条引入了一种模拟学习方法,以学习符合执行任务的专家轨迹所显示的限制的最大限度的诱变政策。该方法的制定利用了将表现与所显示和所学政策之间的KL-调和界限联系起来的结果,其目标通过与强化学习的概率推理框架相联系,纳入强化学习的目标和在恒温最大化环境中遵守限制的目标而得到严格合理解释。拟议的算法将学习目标优化为双梯度下降,支持有效和稳定的培训。实验表明,拟议的方法能够在不同类型、适应所显示行为的不同模式和普及能力的情况下学习约束行为的有效政策模式。
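To make the phrase "dual gradient descent" concrete, here is a generic toy sketch on a one-dimensional constrained problem. It is not the paper's imitation-learning objective: the problem (minimize `(x - 2)^2` subject to `x <= 1`), the step sizes, and the function name `dual_gradient_descent` are all assumptions chosen for illustration.

```python
def dual_gradient_descent(grad_f, constraint, grad_constraint, x0,
                          lr_x=0.1, lr_lam=0.1, n_iters=2000):
    """Minimize f(x) subject to constraint(x) <= 0 via the Lagrangian
    L(x, lam) = f(x) + lam * constraint(x).

    Alternates a gradient descent step on x with a projected gradient
    ascent step on the multiplier lam (kept non-negative).
    """
    x, lam = x0, 0.0
    for _ in range(n_iters):
        # Primal step: descend the Lagrangian in x.
        x -= lr_x * (grad_f(x) + lam * grad_constraint(x))
        # Dual step: raise lam while the constraint is violated, clip at 0.
        lam = max(0.0, lam + lr_lam * constraint(x))
    return x, lam


# Toy problem: minimize (x - 2)^2 subject to x - 1 <= 0.
x_opt, lam_opt = dual_gradient_descent(
    grad_f=lambda x: 2.0 * (x - 2.0),
    constraint=lambda x: x - 1.0,
    grad_constraint=lambda x: 1.0,
    x0=0.0,
)
```

On this toy problem the iterates approach the constrained optimum `x = 1` with multiplier `lam = 2`. In the setting of the abstract, the primal variable would be the policy parameters and the constraint terms would come from the demonstrated trajectories.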


Article 290

Title@2025-07-09 (3): Tailoring deep learning for real-time brain-computer interfaces: From offline models to calibration-free online decoding

Title: Tailoring deep learning for real-time brain-computer interfaces: From offline models to calibration-free online decoding Maßgeschneidertes Deep Learning für Echtzeit-Brain-Computer-Schnittstellen: Von Offline-Modellen zur kalibrierungsfreien Online-Dekodierung 实时脑计算机界面的深层学习:从离线模型到无校准的在线解码 2507.06779v1

Authors (3): Martin Wimpff, Jan Zerfowski, Bin Yang

Despite the growing success of deep learning (DL) in offline brain-computer interfaces (BCIs), its adoption in real-time applications remains limited due to three primary challenges. First, most DL solutions are designed for offline decoding, making the transition to online decoding unclear. Second, the use of sliding windows in online decoding substantially increases computational complexity. Third, DL models typically require large amounts of training data, which are often scarce in BCI applications. To address these challenges and enable real-time, cross-subject decoding without subject-specific calibration, we introduce realtime adaptive pooling (RAP), a novel parameter-free method. RAP seamlessly modifies the pooling layers of existing offline DL models to meet online decoding requirements. It also reduces computational complexity during training by jointly decoding consecutive sliding windows. To further alleviate data requirements, our method leverages source-free domain adaptation, enabling privacy-preserving adaptation across varying amounts of target data. Our results demonstrate that RAP provides a robust and efficient framework for real-time BCI applications. It preserves privacy, reduces calibration demands, and supports co-adaptive BCI systems, paving the way for broader adoption of DL in online BCIs. These findings lay a strong foundation for developing user-centered, high-performance BCIs that facilitate immediate feedback and user learning.

尽管深度学习(DL)在离线脑机接口(BCI)中日益成功,但由于三个主要挑战,它在实时应用中的采用仍然有限。首先,大多数 DL 方案是为离线解码设计的,如何过渡到在线解码尚不明确。其次,在线解码中使用滑动窗口会大幅增加计算复杂度。第三,DL 模型通常需要大量训练数据,而这在 BCI 应用中往往稀缺。为应对这些挑战,并在无需针对特定受试者校准的情况下实现实时的跨受试者解码,我们提出实时自适应池化(RAP),这是一种新的无参数方法。RAP 无缝地修改现有离线 DL 模型的池化层,以满足在线解码的要求;它还通过联合解码连续的滑动窗口,降低了训练期间的计算复杂度。为进一步降低数据需求,我们的方法利用无源域自适应,可在目标数据量不同的情况下进行保护隐私的自适应。结果表明,RAP 为实时 BCI 应用提供了一个稳健而高效的框架。它保护隐私,降低校准需求,并支持协同自适应的 BCI 系统,为 DL 在在线 BCI 中的更广泛应用铺平了道路。这些发现为开发以用户为中心、高性能、便于即时反馈和用户学习的 BCI 奠定了坚实基础。


Article 291

Title@2025-07-09 (3): Mutual Information Free Topological Generalization Bounds via Stability

Title: Mutual Information Free Topological Generalization Bounds via Stability Gegenseitige Informationen Freie Topologische Verallgemeinerung Grenzen durch Stabilität 通过稳定建立无地形普遍化的互信信息 2507.06775v1

Authors (6): Mario Tuci, Lennart Bastian, Benjamin Dupuis, Nassir Navab, Tolga Birdal, Umut Şimşekli

Providing generalization guarantees for stochastic optimization algorithms is a major challenge in modern learning theory. Recently, several studies highlighted the impact of the geometry of training trajectories on the generalization error, both theoretically and empirically. Among these works, a series of topological generalization bounds have been proposed, relating the generalization error to notions of topological complexity that stem from topological data analysis (TDA). Despite their empirical success, these bounds rely on intricate information-theoretic (IT) terms that can be bounded in specific cases but remain intractable for practical algorithms (such as ADAM), potentially reducing the relevance of the derived bounds. In this paper, we seek to formulate comprehensive and interpretable topological generalization bounds free of intractable mutual information terms. To this end, we introduce a novel learning-theoretic framework that departs from the existing strategies via proof techniques rooted in algorithmic stability. By extending an existing notion of hypothesis set stability to trajectory stability, we prove that the generalization error of trajectory-stable algorithms can be upper bounded in terms of (i) TDA quantities describing the complexity of the trajectory of the optimizer in the parameter space, and (ii) the trajectory stability parameter of the algorithm. Through a series of experimental evaluations, we demonstrate that the TDA terms in the bound are of great importance, especially as the number of training samples grows. This ultimately forms an explanation of the empirical success of the topological generalization bounds.

为随机优化算法提供泛化保证是现代学习理论中的一大挑战。近来,若干研究从理论和实证两方面强调了训练轨迹的几何性质对泛化误差的影响。其中,有一系列工作提出了拓扑泛化界,将泛化误差与源自拓扑数据分析(TDA)的拓扑复杂度概念联系起来。尽管在实证上取得了成功,这些界依赖于复杂的信息论(IT)项,这些项在特定情形下可以被界定,但对实际算法(如 ADAM)仍难以处理,这可能削弱所得界的意义。在本文中,我们力求建立全面且可解释、不含难以处理的互信息项的拓扑泛化界。为此,我们引入了一个新的学习理论框架,通过植根于算法稳定性的证明技术,区别于现有策略。通过将已有的假设集稳定性(hypothesis set stability)概念推广为轨迹稳定性(trajectory stability),我们证明了轨迹稳定算法的泛化误差可以由以下两项给出上界:(i) 描述优化器在参数空间中轨迹复杂度的 TDA 量,以及 (ii) 算法的轨迹稳定性参数。通过一系列实验评估,我们表明界中的 TDA 项非常重要,在训练样本数量增加时尤其如此。这最终解释了拓扑泛化界在实证上的成功。


Article 292

Title@2025-07-09 (3): From Gradient Clipping to Normalization for Heavy Tailed SGD

Title: From Gradient Clipping to Normalization for Heavy Tailed SGD Vom gradienten Clipping zur Normalisierung für schweres Tailed SGD 从重尾SGD的梯度滑动到正常化 2410.13849v3

Authors (3): Florian Hübler, Ilyas Fatkhullin, Niao He

Recent empirical evidence indicates that many machine learning applications involve heavy-tailed gradient noise, which challenges the standard assumptions of bounded variance in stochastic optimization. Gradient clipping has emerged as a popular tool to handle this heavy-tailed noise, as it achieves good performance in this setting both theoretically and practically. However, our current theoretical understanding of non-convex gradient clipping has three main shortcomings. First, the theory hinges on large, increasing clipping thresholds, which are in stark contrast to the small constant clipping thresholds employed in practice. Second, clipping thresholds require knowledge of problem-dependent parameters to guarantee convergence. Lastly, even with this knowledge, current sampling complexity upper bounds for the method are sub-optimal in nearly all parameters. To address these issues, we study convergence of Normalized SGD (NSGD). First, we establish a parameter-free sample complexity for NSGD of $\mathcal{O}\left(\varepsilon^{-\frac{2p}{p-1}}\right)$ to find an $\varepsilon$-stationary point. Furthermore, we prove tightness of this result by providing a matching algorithm-specific lower bound. In the setting where all problem parameters are known, we show this complexity is improved to $\mathcal{O}\left(\varepsilon^{-\frac{3p-2}{p-1}}\right)$, matching the previously known lower bound for all first-order methods in all problem-dependent parameters. Finally, we establish high-probability convergence of NSGD with a mild logarithmic dependence on the failure probability. Our work complements the studies of gradient clipping under heavy-tailed noise, improving the sample complexities of existing algorithms and offering an alternative mechanism to achieve high-probability convergence.

最近的实证证据表明,许多机器学习应用都涉及重尾梯度噪声,这对随机优化中方差有界的标准假设提出了挑战。梯度裁剪已成为处理这种重尾噪声的常用工具,因为它在这一设定下于理论和实践中都表现良好。然而,我们目前对非凸梯度裁剪的理论理解有三个主要缺陷。首先,理论依赖于大且不断增大的裁剪阈值,这与实践中采用的小常数裁剪阈值形成鲜明对比。其次,裁剪阈值需要已知依赖于问题的参数才能保证收敛。最后,即使已知这些参数,该方法目前的样本复杂度上界在几乎所有参数上都是次优的。为解决这些问题,我们研究归一化 SGD(NSGD)的收敛性。首先,我们为 NSGD 建立了找到 $\varepsilon$-驻点的无参数样本复杂度 $\mathcal{O}\left(\varepsilon^{-\frac{2p}{p-1}}\right)$。此外,我们通过给出与之匹配的、针对该算法的下界,证明了这一结果是紧的。在所有问题参数已知的设定下,我们证明该复杂度可改进为 $\mathcal{O}\left(\varepsilon^{-\frac{3p-2}{p-1}}\right)$,在所有依赖于问题的参数上与此前已知的一阶方法下界相匹配。最后,我们建立了 NSGD 的高概率收敛性,其对失败概率仅有温和的对数依赖。我们的工作是对重尾噪声下梯度裁剪研究的补充,改进了现有算法的样本复杂度,并为实现高概率收敛提供了另一种机制。
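The core update behind normalized SGD can be sketched in a few lines: the stochastic gradient is rescaled to unit norm, so a single heavy-tailed sample cannot produce an arbitrarily large step. This is a minimal sketch of the plain normalized update only; batching, momentum, and the step-size schedules analyzed in the paper are not shown, and the name `nsgd_step` and the `eps` guard are assumptions.

```python
import math


def nsgd_step(params, grad, lr, eps=1e-12):
    """One normalized SGD step: params <- params - lr * grad / ||grad||.

    eps guards against division by zero when the gradient vanishes.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    scale = lr / (norm + eps)
    return [p - scale * g for p, g in zip(params, grad)]


def step_length(old, new):
    """Euclidean distance moved by one update."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
```

Unlike clipping, no threshold has to be tuned: whatever the gradient magnitude, the distance moved per step is (up to `eps`) exactly `lr`, which is why an extreme noise sample changes only the direction of the update and not its size.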


Article 293

Title@2025-07-09 (3): Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking

Title: Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking Tail-aware Adversarial Attacks: Ein Distributionsansatz für effizientes LLM-Jailbreaking 反性攻击:高效LLM监狱破解的分发办法 2507.04446v2

Authors (4): Tim Beyer, Yan Scholten, Leo Schwinn, Stephan Günnemann

To guarantee safe and robust deployment of large language models (LLMs) at scale, it is critical to accurately assess their adversarial robustness. Existing adversarial attacks typically target harmful responses in single-point, greedy generations, overlooking the inherently stochastic nature of LLMs. In this paper, we propose a novel framework for adversarial robustness evaluation that explicitly models the entire output distribution, including tail-risks, providing better estimates for model robustness at scale. By casting the attack process as a resource allocation problem between optimization and sampling, we determine compute-optimal tradeoffs and show that integrating sampling into existing attacks boosts ASR by up to 48% and improves efficiency by up to two orders of magnitude. Our framework also enables us to analyze how different attack algorithms affect output harm distributions. Surprisingly, we find that most optimization strategies have little effect on output harmfulness. Finally, we introduce a data-free proof-of-concept objective based on entropy-maximization to demonstrate how our tail-aware perspective enables new optimization targets. Overall, our findings highlight the importance of tail-aware attacks and evaluation protocols to accurately assess and strengthen LLM safety.

为了保证大规模安全可靠地部署大型语言模型(LLMs),必须准确评估其对抗性强势。现有的对抗性攻击通常针对单点、贪婪的几代人的有害反应,忽视LLMs固有的随机性。在本文中,我们提出了一个新的对抗性强力评价框架,明确模拟整个产出分布,包括尾点风险,为规模模型稳健性提供更好的估计。通过将攻击过程作为优化和取样之间的资源分配问题,我们确定计算最佳权衡,并表明将取样纳入现有攻击后,ASR率将提高至48%,并将效率提高至两个级。我们的框架还使我们能够分析不同攻击性算法如何影响产出伤害分布。令人惊讶的是,我们发现大多数优化战略对产出有害性影响不大。最后,我们引入了一个无数据验证概念的目标,即基于nentropy-maximmiz化,以显示我们尾点认识的视角如何使新的优化目标得以实现。总体而言,我们的调查结果强调了尾点觉攻击和评估协议对于准确评估和加强LMLM安全的重要性。


Article 294

Title@2025-07-09 (3): Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric

Title: Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric Robustes Deep Network Lernen von nichtlinearen Regressionsaufgaben durch parametrische Leaky Exponential Linear Units (LELUs) und eine Diffusion Metric 通过参数激光光极线性单位(LELUs)和扩散计量器对非线性回归任务进行强有力的深网络学习 2507.06765v1

Authors (1): Enda D. V. Bigarella

This document proposes a parametric activation function (ac.f.) aimed at improving multidimensional nonlinear data regression. It is established knowledge that nonlinear ac.f.’s are required for learning nonlinear datasets. This work shows that smoothness and gradient properties of the ac.f. further impact the performance of large neural networks in terms of overfitting and sensitivity to model parameters. Smooth but vanishing-gradient ac.f.’s such as ELU or SiLU have limited performance, and non-smooth ac.f.’s such as ReLU and Leaky-ReLU further impart discontinuity in the trained model. Improved performance is demonstrated with a smooth “Leaky Exponential Linear Unit”, with a non-zero gradient that can be trained. A novel diffusion-loss metric is also proposed to gauge the performance of the trained models in terms of overfitting.

本文件提出了旨在改进多维非线性数据回归的参数激活功能(ac.f.),这是公认的知识,即学习非线性数据集需要非线性 ac.f.f.。这项工作表明, ac.f.f.的光滑和梯度特性进一步影响到大型神经网络的性能,使其过于适应模型参数并对模型参数具有敏感性。光滑但消失-渐变 ac.f.等ELU或Sil.f.的性能有限,而RELU和Layaky-RELU等非线性能则进一步在培训模型中造成不连续性。通过光滑的“显性线性线性单元”展示了更好的性能,可以对非零梯度进行训练。还提出了新的扩散损失计量标准,以衡量经过训练的模型在超度方面的性能。


Article 295

Title@2025-07-09 (3): Fast Equivariant Imaging: Acceleration for Unsupervised Learning via Augmented Lagrangian and Auxiliary PnP Denoisers

Title: Fast Equivariant Imaging: Acceleration for Unsupervised Learning via Augmented Lagrangian and Auxiliary PnP Denoisers Fast Equivariant Imaging: Beschleunigung für unüberwachtes Lernen über Augmented Lagrangian und Auxiliary PnP Denoisers 快速等同成像:通过增强拉格朗江语和辅助PnP Denoisers语加速未受监督的学习 2507.06764v1

Authors (3): Guixian Xu, Jinglai Li, Junqi Tang

We propose Fast Equivariant Imaging (FEI), a novel unsupervised learning framework to efficiently train deep imaging networks without ground-truth data. By reformulating the Equivariant Imaging based optimization problem via the method of Lagrange multipliers and utilizing plug-and-play denoisers, this novel unsupervised scheme shows superior efficiency and performance compared to the vanilla Equivariant Imaging paradigm. In particular, our PnP-FEI scheme achieves an order-of-magnitude (10x) acceleration over standard EI when training a U-Net on the CT100 dataset for X-ray CT reconstruction, with improved generalization performance.

我们建议快速等同成像(FEI),这是一个新的不受监督的学习框架,可有效培训没有地面实况数据的深成像网络。 从通过拉格兰格乘数法和采用插杆式隐居器重新改造基于等同成像的优化问题的角度出发,这个新的、不受监督的计划比香草等同成像范范范还高效率和性能。 特别是,我们的PnP-FEI计划取得了一个超标准的超音速(10x)加速率,在X射线CT重建中,通过CT100数据集对U-网络进行培训,提高了通用性能。


Article 296

Title@2025-07-09 (3): Reinforcement Learning-based Feature Generation Algorithm for Scientific Data

Title: Reinforcement Learning-based Feature Generation Algorithm for Scientific Data Verstärktes Lernen-basiertes Feature-Generierungs-Algorithmus für wissenschaftliche Daten 科学数据基于学习的加强 学习 科学数据 地貌生成 算法 2507.03498v2

Authors (3): Meng Xiao, Junfeng Zhou, Yuanchun Zhou

Feature generation (FG) aims to enhance the prediction potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data to improve downstream machine-learning model performance. Traditional methods face the following two challenges when dealing with the feature generation of scientific data: First, the effective construction of high-order feature combinations in scientific data necessitates profound and extensive domain-specific expertise. Secondly, as the order of feature combinations increases, the search space expands exponentially, imposing prohibitive human labor consumption. Advancements in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened novel avenues for automating feature generation processes. Inspired by that, this paper revisits the conventional feature generation workflow and proposes the Multi-agent Feature Generation (MAFG) framework. Specifically, in the iterative exploration stage, multi-agents will construct mathematical transformation equations collaboratively, synthesize and identify feature combinations exhibiting high information content, and leverage a reinforcement learning mechanism to evolve their strategies. Upon completing the exploration phase, MAFG integrates the large language models (LLMs) to interpretatively evaluate the generated features of each significant model performance breakthrough. Experimental results and case studies consistently demonstrate that the MAFG framework effectively automates the feature generation process and significantly enhances various downstream scientific data mining tasks.

地物生成(FG)的目的是通过建立高顺序特征组合和排除冗余特征,提高原始数据的预测潜力,这是表层科学数据改进下游机器学习模型性能的关键预处理步骤,是表层科学数据改进下游机器学习模型性能的关键预处理步骤,传统方法在处理科学数据特性生成时面临以下两个挑战:第一,科学数据高顺序特征组合的有效构建需要深入和广泛的具体领域的专门知识;第二,随着地物组合的顺序的提高,搜索空间会迅速扩展,造成人类劳动力消费难以承受;数据精密人工智能(DCAI)模式的进展为地物生成过程自动化开辟了新的途径。受此启发,本文件回顾了传统地审视了传统地物生成工作流程,并提出了多媒介地物生成(MAFG)框架。具体地说,在迭接合探索阶段,多媒介将构建数学转换方程式,合成和识别特征组合会抑制高信息内容,并利用强化学习机制来发展其战略。在完成探索阶段后,MAFG大型语言模型(LMMM)集大型模型(LMMM)集成大型语言模型(LIMM)为持续解释各项实验性创新成果,并持续地展示各种实验性数据生成模型,从而持续地分析各项结果生成的模型分析结果。


Article 297

Title@2025-07-09 (3): Mathematical artificial data for operator learning

Title: Mathematical artificial data for operator learning Mathematische künstliche Daten für das Bedienerlernen 用于操作员学习的数学人工数据 2507.06752v1

Authors (2): Heng Wu, Benzhuo Lu

Machine learning has emerged as a transformative tool for solving differential equations (DEs), yet prevailing methodologies remain constrained by dual limitations: data-driven methods demand costly labeled datasets while model-driven techniques face efficiency-accuracy trade-offs. We present the Mathematical Artificial Data (MAD) framework, a new paradigm that integrates physical laws with data-driven learning to facilitate large-scale operator discovery. By exploiting DEs’ intrinsic mathematical structure to generate physics-embedded analytical solutions and associated synthetic data, MAD fundamentally eliminates dependence on experimental or simulated training data. This enables computationally efficient operator learning across multi-parameter systems while maintaining mathematical rigor. Through numerical demonstrations spanning 2D parametric problems where both the boundary values and source term are functions, we showcase MAD’s generalizability and superior efficiency/accuracy across various DE scenarios. This physics-embedded-data-driven framework and its capacity to handle complex parameter spaces gives it the potential to become a universal paradigm for physics-informed machine intelligence in scientific computing.

机器学习已成为解决差异方程式(DEs)的变革工具,但普遍采用的方法仍然受到双重限制:数据驱动的方法要求有昂贵的标签数据集,而模型驱动的技术则面临效率准确的权衡。我们介绍了数学人工数据(MAD)框架,这是一个将物理法与数据驱动的学习相结合的新范例,以便利大型操作者发现。通过利用DEs的内在数学结构生成物理组合分析解决方案和相关合成数据,MAD从根本上消除了对实验或模拟培训数据的依赖。这让计算效率高的操作者能够在多参数系统中学习,同时保持数学钻孔器。通过包含边界值和源术语功能的2D参数问题的数字演示,我们展示MAD在各种DE情景中的通用性和超高级效率/准确性。这个由物理组合的数据驱动的框架及其处理复杂参数空间的能力,使得它有可能成为科学计算中物理学知情的机器情报的普遍范例。


Article 298

Title@2025-07-09 (3): A Blockchain Solution for Collaborative Machine Learning over IoT

Title: A Blockchain Solution for Collaborative Machine Learning over IoT Eine Blockchain-Lösung für kollaboratives maschinelles Lernen über IoT 合作机构在IoT上学习的链链解决方案 2311.14136v2

Authors (6): Carlos Beis-Penedo, Francisco Troncoso-Pastoriza, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas, Manuel Fernández-Veiga, Martín González Soto

The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.

物联网(IoT)装置和应用的迅速增长导致对先进的分析和机器学习技术的需求增加,这些技术能够处理与数据隐私、安全和可扩展性有关的挑战; 联邦学习(FL)和连锁技术已成为解决这些挑战的有希望的办法,通过在分布式数据源方面进行分散化、安全和隐私保护模式培训; 在本文件中,我们提出了一个新的IoT解决方案,将递增学习矢量量化算法(XuILVQ)与Eitherum连锁技术结合起来,以促进在分布式环境中安全高效地分享数据、进行模型培训和原型储存; 我们提议的架构通过减少计算和通信间接费用,同时维护数据隐私和安全,解决现有以链为基础的FL解决方案的缺点; 我们通过一系列实验,评估我们的系统的业绩,展示其在IoT环境中提高机器学习任务的准确性和效率的潜力。


Article 299

Title@2025-07-09 (3): EFKAN: A KAN-Integrated Neural Operator For Efficient Magnetotelluric Forward Modeling

Title: EFKAN: A KAN-Integrated Neural Operator For Efficient Magnetotelluric Forward Modeling EFKAN: Ein KAN-Integrierter Neuraloperator für effiziente Magnetotellurische Vorwärtsmodellierung EFKAN: 高效磁电离子推进建模的KAN综合神经操作员 2502.02195v2

Authors (6): Feng Wang, Hong Qiu, Yingying Huang, Xiaozhe Gu, Renfang Wang, Bo Yang

Magnetotelluric (MT) forward modeling is fundamental for improving the accuracy and efficiency of MT inversion. Neural operators (NOs) have been effectively used for rapid MT forward modeling, demonstrating their promising performance in solving the MT forward modeling-related partial differential equations (PDEs). Particularly, they can obtain the electromagnetic field at arbitrary locations and frequencies. In these NOs, the projection layers have been dominated by multi-layer perceptrons (MLPs), which may reduce solution accuracy because they suffer from the usual disadvantages of MLPs, such as lack of interpretability and overfitting. Therefore, to improve the accuracy of MT forward modeling with NOs and explore potential alternatives to MLPs, we propose a novel neural operator by extending the Fourier neural operator (FNO) with a Kolmogorov-Arnold network (EFKAN). Within the EFKAN framework, the FNO serves as the branch network to calculate the apparent resistivity and phase from the resistivity model in the frequency domain. Meanwhile, the KAN acts as the trunk network to project the resistivity and phase, determined by the FNO, to the desired locations and frequencies. Experimental results demonstrate that the proposed method not only achieves higher accuracy in obtaining apparent resistivity and phase than the NO equipped with MLPs at the desired frequencies and locations, but also outperforms traditional numerical methods in terms of computational speed.

Magnetotelluric (MT) 前方建模对于提高MT的精确度和效率至关重要。神经操作员(NOs)已被有效地用于快速MT前方建模,表明他们在解决MT前建模相关部分差异方程式(PDEs)方面表现良好。特别是,他们可以在任意地点和频率获得电磁场。在这些NOs中,投影层受到多层透视器(MLPs)的支配,这可能会降低解决方案的准确性,因为他们通常受到MLPs(MLPs)的不利条件的影响,例如缺乏可解释性、过度安装等等。因此,为了提高MTT前建模与NOs的准确性,并探索MLPs的潜在替代品。我们建议,通过将Fourier神经操作员(FNO)与Kolmogorov-Arnold网络(EFKAN)扩大一个新型的神经操作员。在EFKAN框架内,FNO公司作为分支网络网络的网络网络,仅用来计算频率模型的明显阻力和阶段,而不是理想的频率轨道定位,在最精确的轨道上,以演示阶段展示的方式展示。


Article 300

Title@2025-07-09 (3): Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching

Title: Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching Dual-Granularity Cross-Modal Identity Association für schwach-überwachte Text-zu-Personen-Bild-Matching 弱度超强文本到人图像匹配双重通用跨模式身份协会 2507.06744v1

Authors (3): Yafei Zhang, Yongle Shang, Huafeng Li

Weakly supervised text-to-person image matching, as a crucial approach to reducing models’ reliance on large-scale manually labeled samples, holds significant research value. However, existing methods struggle to predict complex one-to-many identity relationships, severely limiting performance improvements. To address this challenge, we propose a local-and-global dual-granularity identity association mechanism. Specifically, at the local level, we explicitly establish cross-modal identity relationships within a batch, reinforcing identity constraints across different modalities and enabling the model to better capture subtle differences and correlations. At the global level, we construct a dynamic cross-modal identity association network with the visual modality as the anchor and introduce a confidence-based dynamic adjustment mechanism, effectively enhancing the model’s ability to identify weakly associated samples while improving overall sensitivity. Additionally, we propose an information-asymmetric sample pair construction method combined with consistency learning to tackle hard sample mining and enhance model robustness. Experimental results demonstrate that the proposed method substantially boosts cross-modal matching accuracy, providing an efficient and practical solution for text-to-person image matching.


Article 301

Title@2025-07-09 (3): Residual Prior-driven Frequency-aware Network for Image Fusion

Title: Residual Prior-driven Frequency-aware Network for Image Fusion Residual Prior-driven Frequency-aware Netzwerk für Bild-Fusion 图像融合超前驱动频率感知网络 2507.06735v1

Authors (5): Guan Zheng, Xue Wang, Wenhua Qian, Peng Liu, Runzhuo Ma

Image fusion aims to integrate complementary information across modalities to generate high-quality fused images, thereby enhancing the performance of high-level vision tasks. While global spatial modeling mechanisms show promising results, constructing long-range feature dependencies in the spatial domain incurs substantial computational costs. Additionally, the absence of ground truth exacerbates the difficulty of capturing complementary features effectively. To tackle these challenges, we propose a Residual Prior-driven Frequency-aware Network, termed RPFNet. Specifically, RPFNet employs a dual-branch feature extraction framework: the Residual Prior Module (RPM) extracts modality-specific difference information from residual maps, thereby providing complementary priors for fusion; the Frequency Domain Fusion Module (FDFM) achieves efficient global feature modeling and integration through frequency-domain convolution. Additionally, the Cross Promotion Module (CPM) enhances the synergistic perception of local details and global structures through bidirectional feature interaction. During training, we incorporate an auxiliary decoder and saliency structure loss to strengthen the model’s sensitivity to modality-specific differences. Furthermore, a combination of adaptive weight-based frequency contrastive loss and SSIM loss effectively constrains the solution space, facilitating the joint capture of local details and global features while ensuring the retention of complementary information. Extensive experiments validate the fusion performance of RPFNet, which effectively integrates discriminative features, enhances texture details and salient objects, and can facilitate the deployment of high-level vision tasks.


Article 302

Title@2025-07-09 (3): Torsion in Persistent Homology and Neural Networks

Title: Torsion in Persistent Homology and Neural Networks Torsion in persistenter Homologie und neuralen Netzwerken 持久性同族体和神经网络中的折磨 2506.03049v2

Authors (1): Maria Walch

We explore the role of torsion in hybrid deep learning models that incorporate topological data analysis, focusing on autoencoders. While most TDA tools use field coefficients, this conceals torsional features present in integer homology. We show that torsion can be lost during encoding, altered in the latent space, and in many cases, not reconstructed by standard decoders. Using both synthetic and high-dimensional data, we evaluate torsion sensitivity to perturbations and assess its recoverability across several autoencoder architectures. Our findings reveal key limitations of field-based approaches and underline the need for architectures or loss terms that preserve torsional information for robust data representation.
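As a standard textbook illustration of the point about field coefficients (not an example taken from the paper), consider the real projective plane $\mathbb{RP}^2$. Its integer homology carries a torsion class in degree 1, which rational coefficients erase and which $\mathbb{F}_2$ coefficients report as two ordinary-looking classes:

$$
\begin{aligned}
H_k(\mathbb{RP}^2;\mathbb{Z}) &= \mathbb{Z},\ \mathbb{Z}/2,\ 0 && (k = 0, 1, 2),\\
H_k(\mathbb{RP}^2;\mathbb{Q}) &= \mathbb{Q},\ 0,\ 0 && \Rightarrow\ \text{Betti numbers } (1, 0, 0),\\
H_k(\mathbb{RP}^2;\mathbb{F}_2) &= \mathbb{F}_2,\ \mathbb{F}_2,\ \mathbb{F}_2 && \Rightarrow\ \text{Betti numbers } (1, 1, 1).
\end{aligned}
$$

Over $\mathbb{Q}$ the $\mathbb{Z}/2$ summand vanishes entirely; over $\mathbb{F}_2$ it contributes to both $H_1$ and $H_2$ (by the universal coefficient theorem), so a single field computation cannot distinguish it from free classes.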


Article 303

Title@2025-07-09 (3): On the Effect of Uncertainty on Layer-wise Inference Dynamics

Title: On the Effect of Uncertainty on Layer-wise Inference Dynamics Über den Effekt der Unsicherheit auf die schichtweise Schlussfolgerungsdynamik 不确定性对图层推论作用的影响 2507.06722v1

Authors (3): Sunwoo Kim, Haneul Yoo, Alice Oh

Understanding how large language models (LLMs) internally represent and process their predictions is central to detecting uncertainty and preventing hallucinations. While several studies have shown that models encode uncertainty in their hidden states, how this affects the way they process such hidden states remains underexplored. In this work, we demonstrate that the dynamics of output token probabilities across layers for certain and uncertain outputs are largely aligned, revealing that uncertainty does not seem to affect inference dynamics. Specifically, we use the Tuned Lens, a variant of the Logit Lens, to analyze the layer-wise probability trajectories of final prediction tokens across 11 datasets and 5 models. Treating incorrect predictions as those with higher epistemic uncertainty, our results show aligned trajectories for certain and uncertain predictions, both of which show abrupt increases in confidence at similar layers. We balance this finding by showing evidence that more competent models may learn to process uncertainty differently. Our findings challenge the feasibility of leveraging simplistic methods for detecting uncertainty at inference. More broadly, our work demonstrates how interpretability methods may be used to investigate the way uncertainty affects inference.


Article 304

Title@2025-07-09 (3): Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning

Title: Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning Multiparameter-Steuerung für das $(1+(λ,λ))$-GA auf OneMax über Deep Reinforcement Learning $(1+(λ,λ))$-GA的多参数控制 2505.12982v2

Authors (4): Tai Nguyen, Phong Le, Carola Doerr, Nguyen Dang

It is well known that evolutionary algorithms can benefit from dynamic choices of the key parameters that control their behavior, to adjust their search strategy to the different stages of the optimization process. A prominent example where dynamic parameter choices have shown a provable super-constant speed-up is the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing the OneMax function. While optimal parameter control policies result in linear expected running times, this is not possible with static parameter choices. This result has spurred a lot of interest in parameter control policies. However, many works, in particular theoretical running time analyses, focus on controlling one single parameter. Deriving policies for controlling multiple parameters remains very challenging. In this work we reconsider the problem of the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing OneMax. We decouple its four main parameters and investigate how well state-of-the-art deep reinforcement learning techniques can approximate good control policies. We show that although making deep reinforcement learning learn effectively is a challenging task, once it works, it is very powerful and is able to find policies that outperform all previously known control policies on the same benchmark. Based on the results found through reinforcement learning, we derive a simple control policy that consistently outperforms the default theory-recommended setting by $27\%$ and the irace-tuned policy, the strongest existing control policy on this benchmark, by $13\%$, for all tested problem sizes up to $40{,}000$.
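For readers unfamiliar with the algorithm being controlled, the following is a minimal sketch of the $(1+(\lambda,\lambda))$ GA on OneMax with static parameters, assuming the usual textbook coupling $p = \lambda/n$ and $c = 1/\lambda$. It does not reproduce the learned or derived control policies from the paper; the function names and the `max_evals` budget are illustrative choices.

```python
import random


def onemax(bits):
    """OneMax fitness: the number of one-bits."""
    return sum(bits)


def one_plus_lambda_lambda_ga(n, lam, max_evals=200_000, seed=0):
    """Run the (1+(lambda,lambda)) GA on OneMax with static parameters."""
    rng = random.Random(seed)
    p = lam / n    # mutation rate (assumed static coupling)
    c = 1.0 / lam  # crossover bias (assumed static coupling)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, evals = onemax(x), 1
    history = [fx]
    while fx < n and evals < max_evals:
        # Mutation phase: one mutation strength is drawn and shared by all mutants.
        ell = sum(rng.random() < p for _ in range(n))
        best_mutant, best_mutant_f = None, -1
        for _ in range(lam):
            mutant = x[:]
            for i in rng.sample(range(n), ell):
                mutant[i] ^= 1
            f = onemax(mutant)
            evals += 1
            if f > best_mutant_f:
                best_mutant, best_mutant_f = mutant, f
        # Crossover phase: each bit comes from the best mutant with probability c.
        best_child, best_child_f = None, -1
        for _ in range(lam):
            child = [m if rng.random() < c else b for b, m in zip(x, best_mutant)]
            f = onemax(child)
            evals += 1
            if f > best_child_f:
                best_child, best_child_f = child, f
        # Elitist selection: keep the best child if it is at least as good.
        if best_child_f >= fx:
            x, fx = best_child, best_child_f
        history.append(fx)
    return x, fx, evals, history
```

A parameter control policy, in this picture, would replace the fixed `lam`, `p`, and `c` with values chosen anew in every iteration, for example as a function of the current fitness `fx`.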


Article 305

Title@2025-07-09 (3): From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution

Title: From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution Von Blurry bis Brillant Detection: YOLO-basierte Aerial Object Detection mit Super-Auflösung 从布鲁里到辉煌的探测:以YOLO为基地的超分辨率探测天体物体 2401.14661v2

Authors (5): Ragib Amin Nihal, Benjamin Yen, Takeshi Ashizawa, Katsutoshi Itoyama, Kazuhiro Nakadai

Aerial object detection presents challenges from small object sizes, high-density clustering, and image quality degradation from distance and motion blur. These factors create an information bottleneck where limited pixel representation cannot encode sufficient discriminative features. B2BDet addresses this with a two-stage framework that applies domain-specific super-resolution during inference, followed by detection using an enhanced YOLOv5 architecture. Unlike training-time super-resolution approaches that enhance learned representations, our method recovers visual information from each input image. The approach combines aerial-optimized SRGAN fine-tuning with architectural innovations including an Efficient Attention Module (EAM) and Cross-Layer Feature Pyramid Network (CLFPN). Evaluation across four aerial datasets shows performance gains, with VisDrone achieving 52.5% mAP using only 27.7M parameters. Ablation studies show that super-resolution preprocessing contributes +2.6% mAP improvement while architectural enhancements add +2.9%, yielding +5.5% total improvement over baseline YOLOv5. The method is computationally efficient, with a 53.8% parameter reduction compared to recent approaches, while maintaining strong small-object detection performance.


Article 306

Title@2025-07-09 (3): PINN-Obs: Physics-Informed Neural Network-Based Observer for Nonlinear Dynamical Systems

Title: PINN-Obs: Physics-Informed Neural Network-Based Observer for Nonlinear Dynamical Systems PINN-Obs: Physik-informierter neuraler Netzwerk-basierter Beobachter für nichtlineare dynamische Systeme PINN-Obs:非线性动态系统物理内成神经网络观察员 2507.06712v1

Authors (4): Ayoub Farkane, Mohamed Boutayeb, Mustapha Oudani, Mounir Ghogho

State estimation for nonlinear dynamical systems is a critical challenge in control and engineering applications, particularly when only partial and noisy measurements are available. This paper introduces a novel Adaptive Physics-Informed Neural Network-based Observer (PINN-Obs) for accurate state estimation in nonlinear systems. Unlike traditional model-based observers, which require explicit system transformations or linearization, the proposed framework directly integrates system dynamics and sensor data into a physics-informed learning process. The observer adaptively learns an optimal gain matrix, ensuring convergence of the estimated states to the true system states. A rigorous theoretical analysis establishes formal convergence guarantees, demonstrating that the proposed approach achieves uniform error minimization under mild observability conditions. The effectiveness of PINN-Obs is validated through extensive numerical simulations on diverse nonlinear systems, including an induction motor model, a satellite motion system, and benchmark academic examples. Comparative experimental studies against existing observer designs highlight its superior accuracy, robustness, and adaptability.


Article 307

Title@2025-07-09 (3): Causal Inference Isn’t Special: Why It’s Just Another Prediction Problem

Title: Causal Inference Isn’t Special: Why It’s Just Another Prediction Problem Kausale Schlussfolgerung ist nicht besonders: Warum es nur ein weiteres Vorhersageproblem ist 原因推论不是特别的:为什么这只是另一个预测问题? 2504.04320v3

Authors (1): Carlos Fernández-Loría

Causal inference is often portrayed as fundamentally distinct from predictive modeling, with its own terminology, goals, and intellectual challenges. But at its core, causal inference is simply a structured instance of prediction under distribution shift. In both cases, we begin with labeled data from a source domain and seek to generalize to a target domain where outcomes are not observed. The key difference is that in causal inference, the labels – potential outcomes – are selectively observed based on treatment assignment, introducing bias that must be addressed through assumptions. This perspective reframes causal estimation as a familiar generalization problem and highlights how techniques from predictive modeling, such as reweighting and domain adaptation, apply directly to causal tasks. It also clarifies that causal assumptions are not uniquely strong – they are simply more explicit. By viewing causal inference through the lens of prediction, we demystify its logic, connect it to familiar tools, and make it more accessible to practitioners and educators alike.
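The reweighting idea can be made concrete with a small worked example. The sketch below treats the treated units as the labeled source sample and the whole population as the target, and corrects the shift with inverse propensity weights. The data, the assumption that propensities are known, and the function names are all illustrative and do not come from the paper.

```python
def naive_treated_mean(rows):
    """Average outcome among treated units, ignoring who got treated."""
    treated = [y for _, t, y in rows if t == 1]
    return sum(treated) / len(treated)


def ipw_mean_treated_outcome(rows, propensity):
    """Inverse-propensity-weighted estimate of E[Y(1)] over the whole population.

    Each treated unit is weighted by 1 / P(treated | x); untreated units
    contribute nothing because their Y(1) is unobserved.
    """
    total = sum(y / propensity[x] for x, t, y in rows if t == 1)
    return total / len(rows)


# Rows are (covariate x, treated flag, observed outcome or None).
# By construction Y(1) = 1 when x = 0 and Y(1) = 3 when x = 1, so E[Y(1)] = 2.
rows = [
    (0, 1, 1.0), (0, 0, None), (0, 0, None), (0, 0, None),
    (1, 1, 3.0), (1, 1, 3.0), (1, 1, 3.0), (1, 0, None),
]
propensity = {0: 0.25, 1: 0.75}  # assumed known treatment probabilities

naive = naive_treated_mean(rows)                  # biased toward x = 1
ipw = ipw_mean_treated_outcome(rows, propensity)  # corrects the shift
print(naive, ipw)
```

Here the naive mean over-represents the $x = 1$ group because it was treated more often, which is the same failure a predictive model shows when its training distribution differs from its deployment distribution.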


Article 308

Title@2025-07-09 (3): Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity

Title: Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity Diversifizierung von Roboter-Lokomotionsverhalten mit extrinsischer Verhaltenskuriosität 具有极端行为判断力的多样化机器人行为行为 2410.06151v3

Authors (8): Zhenglin Wan, Xingrui Yu, David Mark Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fan, Yew Soon Ong, Ivor Tsang

Imitation learning (IL) has shown promise in robot locomotion but is often limited to learning a single expert policy, constraining behavior diversity and robustness in unpredictable real-world scenarios. To address this, we introduce Quality Diversity Inverse Reinforcement Learning (QD-IRL), a novel framework that integrates quality-diversity optimization with IRL methods, enabling agents to learn diverse behaviors from limited demonstrations. This work introduces Extrinsic Behavioral Curiosity (EBC), which allows agents to receive additional curiosity rewards from an external critic based on how novel the behaviors are with respect to a large behavioral archive. To validate the effectiveness of EBC in exploring diverse locomotion behaviors, we evaluate our method on multiple robot locomotion tasks. EBC improves the performance of QD-IRL instances with GAIL, VAIL, and DiffAIL across all included environments by up to 185%, 42%, and 150%, even surpassing expert performance by 20% in Humanoid. Furthermore, we demonstrate that EBC is applicable to Gradient-Arborescence-based Quality Diversity Reinforcement Learning (QD-RL) algorithms, where it substantially improves performance and provides a generic technique for diverse robot locomotion. The source code of this work is provided at https://github.com/vanzll/EBC.


Article 309

Title@2025-07-09 (3): Autonomy by Design: Preserving Human Autonomy in AI Decision-Support

Title: Autonomy by Design: Preserving Human Autonomy in AI Decision-Support Autonomie nach Design: Erhaltung der menschlichen Autonomie in der KI-Entscheidungsunterstützung 设计自主:在AI决定支持中维护人类自主 2506.23952v3

Authors (3): Stefan Buijsman, Sarah E. Carter, Juan Pablo Bermúdez

AI systems increasingly support human decision-making across domains of professional, skill-based, and personal activity. While previous work has examined how AI might affect human autonomy globally, the effects of AI on domain-specific autonomy – the capacity for self-governed action within defined realms of skill or expertise – remain understudied. We analyze how AI decision-support systems affect two key components of domain-specific autonomy: skilled competence (the ability to make informed judgments within one’s domain) and authentic value-formation (the capacity to form genuine domain-relevant values and preferences). By engaging with prior investigations and analyzing empirical cases across medical, financial, and educational domains, we demonstrate how the absence of reliable failure indicators and the potential for unconscious value shifts can erode domain-specific autonomy both immediately and over time. We then develop a constructive framework for autonomy-preserving AI support systems. We propose specific socio-technical design patterns – including careful role specification, implementation of defeater mechanisms, and support for reflective practice – that can help maintain domain-specific autonomy while leveraging AI capabilities. This framework provides concrete guidance for developing AI systems that enhance rather than diminish human agency within specialized domains of action.


Article 310

Title@2025-07-09 (3): Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement

Title: Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement Wert aus Beobachtungen: Auf dem Weg zu groß angelegtem Imitations-Lernen durch Selbstverbesserung 从观察中得出的价值:通过自我改进实现大型模拟学习 2507.06701v1

Authors (11): Michael Bloesch, Markus Wulfmeier, Philemon Brakel, Todor Davchev, Martina Zambelli, Jost Tobias Springenberg, Abbas Abdolmaleki, William F Whitney, Nicolas Heess, Roland Hafner, Martin Riedmiller

Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large-scale: Unlike behavior cloning or offline reinforcement learning, IfO can leverage action-free demonstrations and thus circumvents the need for costly action-labeled demonstrations or reward functions. However, current IfO research focuses on idealized scenarios with mostly bimodal-quality data distributions, restricting the meaningfulness of the results. In contrast, this paper investigates more nuanced distributions and introduces a method to learn from such data, moving closer to a paradigm in which imitation learning can be performed iteratively via self-improvement. Our method adapts RL-based imitation learning to action-free demonstrations, using a value function to transfer information between expert and non-expert data. Through comprehensive evaluation, we delineate the relation between different data distributions and the applicability of algorithms and highlight the limitations of established methods. Our findings provide valuable insights for developing more robust and practical IfO techniques on a path to scalable behaviour learning.


Article 311

Title@2025-07-09 (3): Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems

Title: Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems Multi-Task Offline-Verstärkung Lernen für Online-Werbung in Recommender-Systemen 顾问系统中在线广告网上广告多任务离线强化学习 2506.23090v2

Authors (9): Langming Liu, Wanyu Wang, Chi Zhang, Bo Li, Hongzhi Yin, Xuetao Wei, Wenbo Su, Bo Zheng, Xiangyu Zhao

Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios, primarily due to severe overestimation, distributional shifts, and overlooking budget constraints. To address these issues, we propose MTORL, a novel multi-task offline RL model that targets two key objectives. First, we establish a Markov Decision Process (MDP) framework specific to the nuances of advertising. Then, we develop a causal state encoder to capture dynamic user interests and temporal dependencies, facilitating offline RL through conditional sequence modeling. Causal attention mechanisms are introduced to enhance user sequence representations by identifying correlations among causal states. We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation. Notably, our framework includes an automated system for integrating these tasks into online advertising. Extensive experiments on offline and online environments demonstrate MTORL’s superiority over state-of-the-art methods.


Article 312

Title@2025-07-09 (3): An Optimisation Framework for Unsupervised Environment Design

Title: An Optimisation Framework for Unsupervised Environment Design Ein Rahmen für die Optimierung des unbeaufsichtigten Umweltdesigns 无人监督环境设计优化框架 2505.20659v2

Authors (7): Nathan Monette, Alistair Letcher, Michael Beukman, Matthew T. Jackson, Alexander Rutherford, Alexander D. Goldie, Jakob N. Foerster

For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent’s generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing stronger theoretical guarantees for practical settings than prior work. Whereas previous methods relied on guarantees if they reach convergence, our framework employs a nonconvex-strongly-concave objective for which we provide a provably convergent algorithm in the zero-sum setting. We empirically verify the efficacy of our method, outperforming prior methods in a number of environments with varying difficulties.


Article 313

Title@2025-07-09 (3): Heterogeneous Graph Neural Networks for Short-term State Forecasting in Power Systems across Domains and Time Scales: A Hydroelectric Power Plant Case Study

Title: Heterogeneous Graph Neural Networks for Short-term State Forecasting in Power Systems across Domains and Time Scales: A Hydroelectric Power Plant Case Study Heterogene Graph Neural Networks for Short-term State Forecasting in Power Systems across Domains and Time Scales: A Hydroelectric Power Plant Case Study 用于全域和时标电力系统短期国家预报的异异变形图形神经网络:水电厂案例研究 2507.06694v1

Authors (2): Raffael Theiler, Olga Fink

Accurate short-term state forecasting is essential for efficient and stable operation of modern power systems, especially in the context of increasing variability introduced by renewable and distributed energy resources. As these systems evolve rapidly, it becomes increasingly important to reliably predict their states in the short term to ensure operational stability, support control decisions, and enable interpretable monitoring of sensor and machine behavior. Modern power systems often span multiple physical domains - including electrical, mechanical, hydraulic, and thermal - posing significant challenges for modeling and prediction. Graph Neural Networks (GNNs) have emerged as a promising data-driven framework for system state estimation and state forecasting in such settings. By leveraging the topological structure of sensor networks, GNNs can implicitly learn inter-sensor relationships and propagate information across the network. However, most existing GNN-based methods are designed under the assumption of homogeneous sensor relationships and are typically constrained to a single physical domain. This limitation restricts their ability to integrate and reason over heterogeneous sensor data commonly encountered in real-world energy systems, such as those used in energy conversion infrastructure. In this work, we propose the use of Heterogeneous Graph Attention Networks to address these limitations. Our approach models both homogeneous intra-domain and heterogeneous inter-domain relationships among sensor data from two distinct physical domains - hydraulic and electrical - which exhibit fundamentally different temporal dynamics. Experimental results demonstrate that our method outperforms conventional baselines by 35.5% on average in terms of normalized root mean square error, confirming its effectiveness in multi-domain, multi-rate power system state forecasting.


Article 314

Title@2025-07-09 (3): A statistical approach to latent dynamic modeling with differential equations

Title: A statistical approach to latent dynamic modeling with differential equations Ein statistischer Ansatz zur latenten dynamischen Modellierung mit Differentialgleichungen 利用差异方程进行潜在动态建模的统计方法 2311.16286v2

Authors (5): Maren Hackenberg, Astrid Pechmann, Clemens Kreutz, Janbernd Kirschner, Harald Binder

Ordinary differential equations (ODEs) can provide mechanistic models of temporally local changes of processes, where parameters are often informed by external knowledge. While ODEs are popular in systems modeling, they are less established for statistical modeling of longitudinal cohort data, e.g., in a clinical setting. Yet, modeling of local changes could also be attractive for assessing the trajectory of an individual in a cohort in the immediate future given its current status, where ODE parameters could be informed by further characteristics of the individual. However, several hurdles so far limit such use of ODEs, as compared to regression-based function fitting approaches. The potentially higher level of noise in cohort data might be detrimental to ODEs, as the shape of the ODE solution heavily depends on the initial value. In addition, larger numbers of variables multiply such problems and might be difficult to handle for ODEs. To address this, we propose to use each observation in the course of time as the initial value to obtain multiple local ODE solutions and build a combined estimator of the underlying dynamics. Neural networks are used for obtaining a low-dimensional latent space for dynamic modeling from a potentially large number of variables, and for obtaining patient-specific ODE parameters from baseline variables. Simultaneous identification of dynamic models and of a latent space is enabled by recently developed differentiable programming techniques. We illustrate the proposed approach in an application with spinal muscular atrophy patients and a corresponding simulation study. In particular, modeling of local changes in health status at any point in time is contrasted to the interpretation of functions obtained from a global regression. This more generally highlights how different application settings might demand different modeling strategies.
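The idea of using every observation as an initial value can be sketched on a toy problem. The example below assumes a scalar linear ODE $dz/dt = a z$ with a known rate $a$, so each local solution has a closed form, and it combines the local solutions by a plain average. Both choices are simplifying assumptions for illustration: in the abstract, the ODE parameters and the latent space are obtained from neural networks, and the form of the combined estimator is not specified there.

```python
import math


def local_solution(t_obs, y_obs, t_query, rate):
    """Solve dz/dt = rate * z from one observation (t_obs, y_obs) to t_query."""
    return y_obs * math.exp(rate * (t_query - t_obs))


def combined_estimate(observations, t_query, rate):
    """Average the local ODE solutions started at every observation."""
    values = [local_solution(t, y, t_query, rate) for t, y in observations]
    return sum(values) / len(values)


def truth(t):
    """Hypothetical noise-free trajectory z(t) = 2 * exp(-0.5 * t)."""
    return 2.0 * math.exp(-0.5 * t)


rate = -0.5
noise = [0.3, -0.1, -0.05]  # fixed, made-up measurement errors
observations = [(t, truth(t) + e) for t, e in zip([0.0, 1.0, 2.0], noise)]

t_query = 3.0
single = local_solution(*observations[0], t_query, rate)  # first point only
combined = combined_estimate(observations, t_query, rate)
print(abs(single - truth(t_query)), abs(combined - truth(t_query)))
```

With a single initial value, the full error of that one noisy measurement propagates along the solution; averaging the local solutions lets errors from different time points partly cancel.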


Article 315

Title@2025-07-09 (3): Hierarchical Procedural Framework for Low-latency Robot-Assisted Hand-Object Interaction

Title: Hierarchical Procedural Framework for Low-latency Robot-Assisted Hand-Object Interaction Hierarchischer Verfahrensrahmen für die Interaktion von Robotern mit geringer Latenz mit Hand-Objekten 低弹性机器人辅助手用物体相互作用的等级程序框架 2405.19531v3

Authors (6): Mingqi Yuan, Huijiang Wang, Kai-Fung Chu, Fumiya Iida, Bo Li, Wenjun Zeng

Advances in robotics have been driving the development of human-robot interaction (HRI) technologies. However, accurately perceiving human actions and achieving adaptive control remains a challenge in facilitating seamless coordination between human and robotic movements. In this paper, we propose a hierarchical procedural framework to enable dynamic robot-assisted hand-object interaction (HOI). An open-loop hierarchy leverages the RGB-based 3D reconstruction of the human hand, based on which motion primitives have been designed to translate hand motions into robotic actions. The low-level coordination hierarchy fine-tunes the robot’s action by using the continuously updated 3D hand models. Experimental validation demonstrates the effectiveness of the hierarchical control architecture. The adaptive coordination between human and robot behavior has achieved a delay of $\leq 0.3$ seconds in the tele-interaction scenario. A case study of ring-wearing tasks indicates the potential application of this work in assistive technologies such as healthcare and manufacturing.


Article 316

Title@2025-07-09 (3): Class conditional conformal prediction for multiple inputs by p-value aggregation

Title: Class conditional conformal prediction for multiple inputs by p-value aggregation Klasse bedingte Konformitätsvorhersage für mehrere Eingänge durch p-Wert-Aggregation 通过 p-value 集合对多种投入进行有条件的分类一致预测 2507.07150v1

Authors (3): Jean-Baptiste Fermanian, Mohamed Hebiri, Joseph Salmon

Conformal prediction methods are statistical tools designed to quantify uncertainty and generate predictive sets with guaranteed coverage probabilities. This work introduces an innovative refinement to these methods for classification tasks, specifically tailored for scenarios where multiple observations (multi-inputs) of a single instance are available at prediction time. Our approach is particularly motivated by applications in citizen science, where multiple images of the same plant or animal are captured by individuals. Our method integrates the information from each observation into conformal prediction, enabling a reduction in the size of the predicted label set while preserving the required class-conditional coverage guarantee. The approach is based on the aggregation of conformal p-values computed from each observation of a multi-input. By exploiting the exact distribution of these p-values, we propose a general aggregation framework using an abstract scoring function, encompassing many classical statistical tools. Knowledge of this distribution also enables refined versions of standard strategies, such as majority voting. We evaluate our method on simulated and real data, with a particular focus on Pl@ntNet, a prominent citizen science platform that facilitates the collection and identification of plant species through user-submitted images.
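A minimal sketch of the pipeline described above: compute one class-conditional conformal p-value per observation, aggregate them per candidate label, and keep the labels whose aggregated p-value exceeds $\alpha$. The aggregation used here is a plain Bonferroni correction, swapped in because it is simple and valid under arbitrary dependence; it is conservative and is not the paper's exact-distribution framework. The scores, label names, and function names are made up for illustration.

```python
def conformal_p_value(calibration_scores, test_score):
    """Conformal p-value of a test score among one class's calibration scores.

    Scores are nonconformity scores: larger means less typical for the class.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + n_at_least) / (len(calibration_scores) + 1)


def bonferroni_aggregate(p_values):
    """Conservative combination of p-values, valid under any dependence."""
    return min(1.0, len(p_values) * min(p_values))


def multi_input_prediction_set(calibration_by_class, test_scores_by_class, alpha):
    """Keep every label whose aggregated p-value exceeds alpha."""
    kept = set()
    for label, calibration_scores in calibration_by_class.items():
        p_values = [conformal_p_value(calibration_scores, s)
                    for s in test_scores_by_class[label]]
        if bonferroni_aggregate(p_values) > alpha:
            kept.add(label)
    return kept


calibration = {"oak": [0.1, 0.2, 0.3, 0.4], "elm": [0.1, 0.2, 0.3, 0.4]}
# Nonconformity of two photos of the same plant, scored against each label.
test_scores = {"oak": [0.15, 0.25], "elm": [0.90, 0.95]}
labels = multi_input_prediction_set(calibration, test_scores, alpha=0.5)
print(labels)
```

Because each class uses only its own calibration scores, the guarantee is class-conditional; a sharper aggregation rule would shrink the prediction set further than Bonferroni can.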


Article 317

Title@2025-07-09 (3): Fast Gaussian Processes under Monotonicity Constraints

Title: Fast Gaussian Processes under Monotonicity Constraints Schnelle Gauß-Prozesse unter Monotonik-Beschränkungen 在单度制约下快速高斯进程 2507.06677v1

Authors (3): Chao Zhang, Jasper M. Everink, Jakob Sauer Jørgensen

Gaussian processes (GPs) are widely used as surrogate models for complicated functions in scientific and engineering applications. In many cases, prior knowledge about the function to be approximated, such as monotonicity, is available and can be leveraged to improve model fidelity. Incorporating such constraints into GP models enhances predictive accuracy and reduces uncertainty, but remains a computationally challenging task for high-dimensional problems. In this work, we present a novel virtual point-based framework for building constrained GP models under monotonicity constraints, based on regularized linear randomize-then-optimize (RLRTO), which enables efficient sampling from a constrained posterior distribution by means of solving randomized optimization problems. We also enhance two existing virtual point-based approaches by replacing Gibbs sampling with the No U-Turn Sampler (NUTS) for improved efficiency. A Python implementation of these methods is provided and can be easily applied to a wide range of problems. This implementation is then used to validate the approaches on approximating a range of synthetic functions, demonstrating comparable predictive performance between all considered methods and significant improvements in computational efficiency with the two NUTS methods and especially with the RLRTO method. The framework is further applied to construct surrogate models for systems of differential equations.


Article 318

Title@2025-07-09 (3): DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training

Title: DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training DAF: Ein effizientes dynamisches Aktivierungs-Framework für On-Device-DNN-Schulungen DAF: 设计DNN培训的高效端对端动态启动框架 2507.07149v1

Authors (8): Renyuan Liu, Yuyang Leng, Kaiyan Liu, Shaohan Hu, Chun-Fu Chen, Peijun Zhao, Heechul Yun, Shuochao Yao

Recent advancements in on-device training for deep neural networks have underscored the critical need for efficient activation compression to overcome the memory constraints of mobile and edge devices. As activations dominate memory usage during training and are essential for gradient computation, compressing them without compromising accuracy remains a key research challenge. While existing methods for dynamic activation quantization promise theoretical memory savings, their practical deployment is impeded by system-level challenges such as computational overhead and memory fragmentation. To address these challenges, we introduce DAF, a Dynamic Activation Framework that enables scalable and efficient on-device training through system-level optimizations. DAF achieves both memory- and time-efficient dynamic quantization training by addressing key system bottlenecks. It develops hybrid reduction operations tailored to the memory hierarchies of mobile and edge SoCs, leverages collaborative CPU-GPU bit-packing for efficient dynamic quantization, and implements an importance-aware paging memory management scheme to reduce fragmentation and support dynamic memory adjustments. These optimizations collectively enable DAF to achieve substantial memory savings and speedup without compromising model training accuracy. Evaluations on various deep learning models across embedded and mobile platforms demonstrate up to a $22.9\times$ reduction in memory usage and a $3.2\times$ speedup, making DAF a scalable and practical solution for resource-constrained environments.
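To illustrate why bit-packing matters for activation compression, here is a generic sketch of dynamic (per-tensor, min/max-calibrated) 4-bit quantization with two codes packed into each byte. It is a plain single-threaded illustration under assumed conventions, not DAF's CPU-GPU implementation; all function names are hypothetical.

```python
def quantize_4bit(values):
    """Map floats to integer codes 0..15 using the tensor's own min and max."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for a constant tensor
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte, padding with a zero code if needed."""
    padded = codes + [0] if len(codes) % 2 else codes
    return bytes((padded[i] << 4) | padded[i + 1]
                 for i in range(0, len(padded), 2))


def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit codes from the packed bytes."""
    codes = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))
    return codes[:count]


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


activations = [0.0, 0.5, 1.0, 1.5, 3.0]
codes, lo, scale = quantize_4bit(activations)
packed = pack_nibbles(codes)  # 5 values stored in 3 bytes
restored = dequantize(unpack_nibbles(packed, len(activations)), lo, scale)
print(len(packed), restored)
```

Without the packing step, each 4-bit code would still occupy at least a full byte, so the theoretical saving from quantization would not show up in actual memory use.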


Article 319

Title@2025-07-09 (3): Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey

Title: Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey Erklärbare Künstliche Intelligenz in der biomedizinischen Bildanalyse: Eine umfassende Umfrage 生物医学图像分析中可解释的人工智能:全面调查 2507.07148v1

Authors (7): Getamesay Haile Dagnaw, Yanming Zhu, Muhammad Hassan Maqsood, Wencheng Yang, Xingshuai Dong, Xuefei Yin, Alan Wee-Chung Liew

Explainable artificial intelligence (XAI) has become increasingly important in biomedical image analysis to promote transparency, trust, and clinical adoption of DL models. While several surveys have reviewed XAI techniques, they often lack a modality-aware perspective, overlook recent advances in multimodal and vision-language paradigms, and provide limited practical guidance. This survey addresses this gap through a comprehensive and structured synthesis of XAI methods tailored to biomedical image analysis. We systematically categorize XAI methods, analyzing their underlying principles, strengths, and limitations within biomedical contexts. A modality-centered taxonomy is proposed to align XAI methods with specific imaging types, highlighting the distinct interpretability challenges across modalities. We further examine the emerging role of multimodal learning and vision-language models in explainable biomedical AI, a topic largely underexplored in previous work. Our contributions also include a summary of widely used evaluation metrics and open-source frameworks, along with a critical discussion of persistent challenges and future directions. This survey offers a timely and in-depth foundation for advancing interpretable DL in biomedical image analysis.



Article 320

Title@2025-07-09 (3): Enhancing Diffusion Model Stability for Image Restoration via Gradient Management

Title: Enhancing Diffusion Model Stability for Image Restoration via Gradient Management Verbesserung der Diffusions-Modellstabilität für die Bildwiederherstellung durch Gradient Management 通过梯度管理加强通过图像恢复的传播模型稳定模型 2507.06656v1

Authors (5): Hongjie Wu, Mingqin Zhang, Linchao He, Ji-Zhe Zhou, Jiancheng Lv

Diffusion models have shown remarkable promise for image restoration by leveraging powerful priors. Prominent methods typically frame the restoration problem within a Bayesian inference framework, which iteratively combines a denoising step with a likelihood guidance step. However, the interactions between these two components in the generation process remain underexplored. In this paper, we analyze the underlying gradient dynamics of these components and identify significant instabilities. Specifically, we demonstrate conflicts between the prior and likelihood gradient directions, alongside temporal fluctuations in the likelihood gradient itself. We show that these instabilities disrupt the generative process and compromise restoration performance. To address these issues, we propose Stabilized Progressive Gradient Diffusion (SPGD), a novel gradient management technique. SPGD integrates two synergistic components: (1) a progressive likelihood warm-up strategy to mitigate gradient conflicts; and (2) adaptive directional momentum (ADM) smoothing to reduce fluctuations in the likelihood gradient. Extensive experiments across diverse restoration tasks demonstrate that SPGD significantly enhances generation stability, leading to state-of-the-art performance in quantitative metrics and visually superior results. Code is available at https://github.com/74587887/SPGD.



Article 321

Title@2025-07-09 (3): Federated Learning Inspired Fuzzy Systems: Decentralized Rule Updating for Privacy and Scalable Decision Making

Title: Federated Learning Inspired Fuzzy Systems: Decentralized Rule Updating for Privacy and Scalable Decision Making Federated Learning Inspired Fuzzy Systems: Dezentrale Regel Aktualisierung für Datenschutz und skalierbare Entscheidungsfindung 受联邦学习联合会启发的模糊系统:关于隐私和可扩展决策的权力下放规则更新更新 2507.06652v1

Authors (4): Arthur Alexander Lim, Zhen Bin It, Jovan Bowen Heng, Tee Hui Teo

Fuzzy systems allow machines, systems, and frameworks to deal with uncertainty, which the binary logic used by most computers cannot express. These systems have already been deployed for certain use cases, and this paper proposes ways to improve them further by drawing inspiration from machine learning and federated learning. Machine learning is a recent technological breakthrough that could be applied to fuzzy systems to improve the results they produce. Federated learning is another recent technology with huge potential: it improves machine learning training by reducing privacy risk, the burden on networking infrastructure, and the latency of the latest model. Aspects of federated learning could be used to improve fuzzy systems, such as applying the idea of updating the fuzzy rules that make up a key part of a fuzzy system so that it improves over time. This paper discusses how these improvements would be implemented in fuzzy systems and how they would improve them. It also discusses certain limitations of the potential improvements. It concludes that the proposed ideas require further investigation to determine how large the improvements are, but that the potential to improve fuzzy systems is there.



Article 322

Title@2025-07-09 (3): Deep Disentangled Representation Network for Treatment Effect Estimation

Title: Deep Disentangled Representation Network for Treatment Effect Estimation Deep Disentangled Representation Network for Treatment Effect Abschätzung 治疗效果估计的深度分解代表网 2507.06650v1

Authors (4): Hui Meng, Keping Yang, Xuyu Peng, Bo Zheng

Estimating individual-level treatment effect from observational data is a fundamental problem in causal inference and has attracted increasing attention in the fields of education, healthcare, and public policy. In this work, we concentrate on the study of disentangled representation methods that have shown promising outcomes by decomposing observed covariates into instrumental, confounding, and adjustment factors. However, most of the previous work has primarily revolved around generative models or hard decomposition methods for covariates, which often struggle to guarantee the attainment of precisely disentangled factors. In order to effectively model different causal relationships, we propose a novel treatment effect estimation algorithm that incorporates a mixture of experts with multi-head attention and a linear orthogonal regularizer to softly decompose the pre-treatment variables, and simultaneously eliminates selection bias via importance sampling re-weighting techniques. We conduct extensive experiments on both public semi-synthetic and real-world production datasets. The experimental results clearly demonstrate that our algorithm outperforms the state-of-the-art methods focused on individual treatment effects.



Article 323

Title@2025-07-09 (3): AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model

Title: AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model AHCPTQ: Genaue und hardwarekompatible Nachschulungs-Quantisierung für Segment-Anything-Modell ACHPTQ: 分片 “ 任何 “ 模式的准确和硬件兼容的训练后培训后量化 2503.03088v2

Authors (4): Wenlun Zhang, Yunshan Zhong, Shimpei Ando, Kentaro Yoshioka

The Segment Anything Model (SAM) has demonstrated strong versatility across various visual tasks. However, its large storage requirements and high computational cost pose challenges for practical deployment. Post-training quantization (PTQ) has emerged as an effective strategy for efficient deployment, but we identify two key challenges in SAM that hinder the effectiveness of existing PTQ methods: the heavy-tailed and skewed distribution of post-GELU activations, and significant inter-channel variation in linear projection activations. To address these challenges, we propose AHCPTQ, an accurate and hardware-efficient PTQ method for SAM. AHCPTQ introduces hardware-compatible Hybrid Log-Uniform Quantization (HLUQ) to manage post-GELU activations, employing log2 quantization for dense small values and uniform quantization for sparse large values to enhance quantization resolution. Additionally, AHCPTQ incorporates Channel-Aware Grouping (CAG) to mitigate inter-channel variation by progressively clustering activation channels with similar distributions, enabling them to share quantization parameters and improving hardware efficiency. The combination of HLUQ and CAG not only enhances quantization effectiveness but also ensures compatibility with efficient hardware execution. For instance, under the W4A4 configuration on the SAM-L model, AHCPTQ achieves 36.6% mAP on instance segmentation with the DINO detector, while achieving a 7.89x speedup and 8.64x energy efficiency over its floating-point counterpart in FPGA implementation.
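
The split between log2 quantization for small values and uniform quantization for large values can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's HLUQ: the function name `hybrid_quantize`, the fixed `threshold`, the level counts, and the treatment of non-positive inputs as zero are all hypothetical choices.

```python
import math


def hybrid_quantize(x, threshold, x_max, log_levels=8, uni_levels=8):
    """Quantize a value and return its dequantized approximation.

    Values below `threshold` snap to threshold * 2**-k (log2 steps, dense near zero);
    values at or above it snap to evenly spaced points on [threshold, x_max].
    Non-positive inputs map to 0.0 (a simplifying assumption of this sketch).
    """
    if x <= 0.0:
        return 0.0
    if x < threshold:
        # log2 branch: k = 0 .. log_levels - 1
        k = round(-math.log2(x / threshold))
        k = min(max(k, 0), log_levels - 1)
        return threshold * 2.0 ** -k
    # uniform branch: uni_levels points from threshold to x_max inclusive
    step = (x_max - threshold) / (uni_levels - 1)
    idx = min(round((x - threshold) / step), uni_levels - 1)
    return threshold + idx * step


small = hybrid_quantize(0.25, threshold=1.0, x_max=8.0)   # handled by the log2 branch
large = hybrid_quantize(5.0, threshold=1.0, x_max=8.0)    # handled by the uniform branch
clipped = hybrid_quantize(100.0, threshold=1.0, x_max=8.0)  # saturates at x_max
```

The log2 branch spends its levels where the values are dense (near zero), while the uniform branch covers the sparse tail with constant step size.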



Article 324

Title@2025-07-09 (3): Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning

Title: Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning Lost in Retraining: Roaming des Parameterraums exponentieller Familien unter geschlossenem Loop-Lernen 损失在再培训中:在闭路学习下,在封闭式学习下,对有生命力的家庭的参数空间进行Roaming 2506.20623v2

Authors (3): Fariba Jangjoo, Matteo Marsili, Yasser Roudi

Closed-loop learning is the process of repeatedly estimating a model from data generated from the model itself. It is receiving great attention due to the possibility that large neural network models may, in the future, be primarily trained with data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motion that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows sufficient statistics with the martingale property and that as a result the process converges to absorbing states that amplify initial biases present in the data. However, we show that this outcome may be prevented if the data contains at least one data point generated from a ground truth model, by relying on maximum a posteriori estimation or by introducing regularisation.
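
The absorbing behaviour described above can be reproduced with the simplest exponential family. The sketch below is a toy simulation, assuming a Bernoulli model refit by maximum likelihood on its own samples; the function name `closed_loop_bernoulli` and all parameter values are hypothetical and not taken from the paper.

```python
import random


def closed_loop_bernoulli(p0, n_samples, max_generations, seed=0):
    """Repeatedly refit a Bernoulli parameter by maximum likelihood on samples
    drawn from the previous fit; return the trajectory of estimates."""
    rng = random.Random(seed)
    p, history = p0, [p0]
    for _ in range(max_generations):
        draws = [1 if rng.random() < p else 0 for _ in range(n_samples)]
        p = sum(draws) / n_samples  # MLE is the sample mean; E[p_next | p] = p
        history.append(p)
        if p in (0.0, 1.0):  # absorbing states: every later sample is identical
            break
    return history


history = closed_loop_bernoulli(p0=0.5, n_samples=20, max_generations=10_000)
final_estimate = history[-1]
```

Because the estimate is a bounded martingale with no restoring force, the trajectory drifts until it reaches 0 or 1 and then stays there, which is the toy analogue of the collapse the abstract describes.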



Article 325

Title@2025-07-09 (3): Learning from Sparse Point Labels for Dense Carcinosis Localization in Advanced Ovarian Cancer Assessment

Title: Learning from Sparse Point Labels for Dense Carcinosis Localization in Advanced Ovarian Cancer Assessment Lernen von Sparse Point-Labels für Dense Carcinosis Lokalisierung im fortgeschrittenen Ovarialkrebs-Assessment 学习在高级肠道癌症评估中,从 “ 零星点标签 “ 中学习,用于 “ 高度癌症诊断 “ 地方化 2507.06643v1

Authors (10): Farahdiba Zarin, Riccardo Oliva, Vinkle Srivastav, Armine Vardazaryan, Andrea Rosati, Alice Zampolini Faustini, Giovanni Scambia, Anna Fagotti, Pietro Mascagni, Nicolas Padoy

Learning from sparse labels is a common challenge in the medical domain. This is due to numerous factors, such as annotation cost, and is especially true for newly introduced tasks. When dense pixel-level annotations are needed, this becomes even more unfeasible. However, being able to learn from just a few annotations at the pixel level, while extremely difficult and underutilized, can drive progress in studies where perfect annotations are not immediately available. This work tackles the challenge of learning the dense prediction task of keypoint localization from a few point annotations in the context of 2D carcinosis keypoint localization from laparoscopic video frames for diagnostic planning of advanced ovarian cancer patients. To enable this, we formulate the problem as a sparse heatmap regression from a few point annotations per image and propose a new loss function, called Crag and Tail loss, for efficient learning. Our proposed loss function effectively leverages positive sparse labels while minimizing the impact of false negatives or missed annotations. Through an extensive ablation study, we demonstrate the effectiveness of our approach in achieving accurate dense localization of carcinosis keypoints, highlighting its potential to advance research in scenarios where dense annotations are challenging to obtain.



Article 326

Title@2025-07-09 (3): EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision

Title: EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision EXAONE Path 2.0: Pathologie-Stiftungsmodell mit End-to-End-Überwachung EXONE路径2.0:病理学基金会的终至终监督模式 2507.06639v1

Authors (10): Myungjang Pyeon, Janghyeon Lee, Minsoo Lee, Juseung Yun, Hwanil Choi, Jonghyun Kim, Jiwon Kim, Yi Hu, Jongseong Jang, Soonyoung Lee

In digital pathology, whole-slide images (WSIs) are often difficult to handle due to their gigapixel scale, so most approaches train patch encoders via self-supervised learning (SSL) and then aggregate the patch-level embeddings via multiple instance learning (MIL) or slide encoders for downstream tasks. However, patch-level SSL may overlook complex domain-specific features that are essential for biomarker prediction, such as mutation status and molecular characteristics, as SSL methods rely only on basic augmentations selected for natural image domains on small patch-level areas. Moreover, SSL methods remain less data-efficient than fully supervised approaches, requiring extensive computational resources and datasets to achieve competitive performance. To address these limitations, we present EXAONE Path 2.0, a pathology foundation model that learns patch-level representations under direct slide-level supervision. Using only 37k WSIs for training, EXAONE Path 2.0 achieves state-of-the-art average performance across 10 biomarker prediction tasks, demonstrating remarkable data efficiency.



Article 327

Title@2025-07-09 (3): Semi-parametric Functional Classification via Path Signatures Logistic Regression

Title: Semi-parametric Functional Classification via Path Signatures Logistic Regression Semiparametrische Funktionsklassifizierung über Pfadsignaturen Logistische Regression 通过 “ 路径签名 “ 进行半参数功能分类 2507.06637v1

Authors (2): Pengcheng Zeng, Siyuan Jiang

We propose Path Signatures Logistic Regression (PSLR), a semi-parametric framework for classifying vector-valued functional data with scalar covariates. Classical functional logistic regression models rely on linear assumptions and fixed basis expansions, which limit flexibility and degrade performance under irregular sampling. PSLR overcomes these issues by leveraging truncated path signatures to construct a finite-dimensional, basis-free representation that captures nonlinear and cross-channel dependencies. By embedding trajectories as time-augmented paths, PSLR extracts stable, geometry-aware features that are robust to sampling irregularity without requiring a common time grid, while still preserving subject-specific timing patterns. We establish theoretical guarantees for the existence and consistent estimation of the optimal truncation order, along with non-asymptotic risk bounds. Experiments on synthetic and real-world datasets show that PSLR outperforms traditional functional classifiers in accuracy, robustness, and interpretability, particularly under non-uniform sampling schemes. Our results highlight the practical and theoretical benefits of integrating rough path theory into modern functional data analysis.
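
To make the feature construction concrete, the sketch below computes the level-1 and level-2 signature terms of a time-augmented, piecewise-linear path using Chen's identity. It is a minimal illustration only: the function name `signature_level2`, the truncation at order 2, and the sample data are assumptions, and the paper selects the truncation order by its own procedure.

```python
def signature_level2(times, values):
    """Level-1 and level-2 signature terms of the piecewise-linear,
    time-augmented path t -> (t, x_1(t), ..., x_d(t)).

    `times` may be irregularly spaced; `values[k]` is the observation at times[k].
    """
    path = [[t, *v] for t, v in zip(times, values)]
    dim = len(path[0])
    s1 = [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        # Chen's identity for appending one linear segment to the path so far
        for i in range(dim):
            for j in range(dim):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(dim):
            s1[i] += delta[i]
    return s1, s2


times = [0.0, 0.1, 0.5, 0.6, 1.0]             # irregular sampling grid
values = [[0.0], [0.3], [0.2], [0.9], [1.0]]  # one observed channel
s1, s2 = signature_level2(times, values)
features = s1 + [s2[i][j] for i in range(2) for j in range(2)]
```

The resulting fixed-length `features` vector could be passed to any logistic regression routine; no common time grid is needed because time enters as a path coordinate.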



Article 328

Title@2025-07-09 (3): PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER)

Title: PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER) PBa-LLM: Datenschutz- und Bias-aware NLP mit Named-Entity-Erkennung (NER) PBA-LLM: 使用命名实体识别(NER)的隐私和Bias-awar NLP 2507.02966v2

Authors (8): Gonzalo Mancera, Aythami Morales, Julian Fierrez, Ruben Tolosana, Alejandro Penna, Miguel Lopez-Duran, Francisco Jurado, Alvaro Ortigosa

The use of Natural Language Processing (NLP) in high-stakes AI-based applications has increased significantly in recent years, especially since the emergence of Large Language Models (LLMs). However, despite their strong performance, LLMs introduce important legal/ethical concerns, particularly regarding privacy, data protection, and transparency. Due to these concerns, this work explores the use of Named-Entity Recognition (NER) to facilitate the privacy-preserving training (or adaptation) of LLMs. We propose a framework that uses NER technologies to anonymize sensitive information in text data, such as personal identities or geographic locations. An evaluation of the proposed privacy-preserving learning framework was conducted to measure its impact on user privacy and system performance in a particular high-stakes and sensitive setup: AI-based resume scoring for recruitment processes. The study involved two language models (BERT and RoBERTa) and six anonymization algorithms (based on Presidio, FLAIR, BERT, and different versions of GPT) applied to a database of 24,000 candidate profiles. The findings indicate that the proposed privacy preservation techniques effectively maintain system performance while playing a critical role in safeguarding candidate confidentiality, thus promoting trust in the experimented scenario. On top of the proposed privacy-preserving approach, we also experiment with applying an existing approach that reduces the gender bias in LLMs, thus finally obtaining our proposed Privacy- and Bias-aware LLMs (PBa-LLMs). Note that the proposed PBa-LLMs have been evaluated in a particular setup (resume scoring), but are generally applicable to any other LLM-based AI application.



Article 329

Title@2025-07-09 (3): Prevention of Overfitting on Mesh-Structured Data Regressions with a Modified Laplace Operator

Title: Prevention of Overfitting on Mesh-Structured Data Regressions with a Modified Laplace Operator Vermeidung von Überrüstungen auf netzstrukturierte Datenregressionen mit einem modifizierten Laplace-Operator 使用变换的拉拉布操作器防止对网状结构化数据递减过度适应 2507.06631v1

Authors (1): Enda D. V. Bigarella

This document reports on a method for detecting and preventing overfitting on data regressions, herein applied to mesh-like data structures. The mesh structure allows for the straightforward computation of the Laplace-operator second-order derivatives in a finite-difference fashion for noiseless data. Derivatives of the training data are computed on the original training mesh to serve as a true label of the entropy of the training data. Derivatives of the trained data are computed on a staggered mesh to identify oscillations in the interior of the original training mesh cells. The loss of the Laplace-operator derivatives is used for hyperparameter optimisation, achieving a reduction of unwanted oscillation through the minimisation of the entropy of the trained model. In this setup, testing does not require the splitting of points from the training data, and training is thus directly performed on all available training points. The Laplace operator applied to the trained data on a staggered mesh serves as a surrogate testing metric based on diffusion properties.
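
The role of the staggered mesh can be shown with a small example. The sketch below is an assumption-laden toy, not the author's implementation: it uses a uniform 2D mesh, a five-point stencil, and a hypothetical "overfitted" model that matches the training data at every node but oscillates inside the cells.

```python
import math


def node_laplacian(grid, i, j, h):
    """Five-point Laplacian of tabulated data at interior node (i, j)."""
    return (grid[i + 1][j] + grid[i - 1][j] + grid[i][j + 1] + grid[i][j - 1]
            - 4.0 * grid[i][j]) / (h * h)


def staggered_laplacian(model, i, j, h):
    """Five-point Laplacian of a callable model at the centre of cell (i, j),
    using the neighbouring cell centres as stencil points."""
    x, y = (i + 0.5) * h, (j + 0.5) * h
    return (model(x + h, y) + model(x - h, y) + model(x, y + h) + model(x, y - h)
            - 4.0 * model(x, y)) / (h * h)


h, n = 0.25, 5


def truth(x, y):
    """Smooth field whose Laplacian is 4 everywhere."""
    return x * x + y * y


def wiggly(x, y):
    """Hypothetical overfit: equals truth at the nodes, oscillates in between."""
    return truth(x, y) + 0.05 * math.sin(math.pi * x / h)


grid = [[truth(i * h, j * h) for j in range(n)] for i in range(n)]
data_lap = node_laplacian(grid, 2, 2, h)          # from training data on the nodes
smooth_lap = staggered_laplacian(truth, 1, 1, h)  # well-behaved model
wiggly_lap = staggered_laplacian(wiggly, 1, 1, h) # oscillating model
```

A node-only error would rate `wiggly` as a near-perfect fit, whereas the mismatch between `wiggly_lap` and `data_lap` exposes the oscillation without holding out any training points.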



Article 330

Title@2025-07-09 (3): Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation

Title: Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation Gewichtetes Multi-Prompt-Lernen mit beschreibungsfreier großsprachiger Modelldestillation 无说明无大语言模式的加权多速学习 2507.07147v1

Authors (3): Sua Lee, Kyubum Shin, Jung Ho Park

Recent advances in pre-trained Vision Language Models (VLM) have shown promising potential for effectively adapting to downstream tasks through prompt learning, without the need for additional annotated paired datasets. To supplement the text information in VLM trained on correlations with vision data, new approaches leveraging Large Language Models (LLM) in prompts have been proposed, enhancing robustness to unseen and diverse data. Existing methods typically extract text-based responses (i.e., descriptions) from LLM to incorporate into prompts; however, this approach suffers from high variability and low reliability. In this work, we propose Description-free Multi-prompt Learning (DeMul), a novel method that eliminates the process of extracting descriptions and instead directly distills knowledge from LLM into prompts. By adopting a description-free approach, prompts can encapsulate richer semantics while still being represented as continuous vectors for optimization, thereby eliminating the need for discrete pre-defined templates. Additionally, in a multi-prompt setting, we empirically demonstrate the potential of prompt weighting in reflecting the importance of different prompts during training. Experimental results show that our approach achieves superior performance across 11 recognition datasets.



Article 331

Title@2025-07-09 (3): An attention-aware GNN-based input defender against multi-turn jailbreak on LLMs

Title: An attention-aware GNN-based input defender against multi-turn jailbreak on LLMs Ein aufmerksamer GNN-basierter Eingabeverteidiger gegen Multiturn-Jailbreak auf LLMs 一名关注GNN GNN的输入捍卫者, 2507.07146v1

Authors (7): Zixuan Huang, Kecheng Huang, Lihao Yin, Bowei He, Huiling Zhen, Mingxuan Yuan, Zili Shao

Large Language Models (LLMs) have gained widespread popularity and are increasingly integrated into various applications. However, their capabilities can be exploited for both benign and harmful purposes. Despite rigorous training and fine-tuning for safety, LLMs remain vulnerable to jailbreak attacks. Recently, multi-turn attacks have emerged, exacerbating the issue. Unlike single-turn attacks, multi-turn attacks gradually escalate the dialogue, making them more difficult to detect and mitigate, even after they are identified. In this study, we propose G-Guard, an innovative attention-aware GNN-based input classifier designed to defend against multi-turn jailbreak attacks on LLMs. G-Guard constructs an entity graph for multi-turn queries, explicitly capturing relationships between harmful keywords and queries even when those keywords appear only in previous queries. Additionally, we introduce an attention-aware augmentation mechanism that retrieves the most similar single-turn query based on the multi-turn conversation. This retrieved query is treated as a labeled node in the graph, enhancing the ability of GNN to classify whether the current query is harmful. Evaluation results demonstrate that G-Guard outperforms all baselines across all datasets and evaluation metrics.



Article 332

Title@2025-07-09 (3): Multi-objective methods in Federated Learning: A survey and taxonomy

Title: Multi-objective methods in Federated Learning: A survey and taxonomy Multi-objektive Methoden im Federated Learning: Eine Umfrage und Taxonomie 联邦学习的多目标方法:调查和分类 2502.03108v2

Authors (3): Maria Hartmann, Grégoire Danoy, Pascal Bouvry

The Federated Learning paradigm facilitates effective distributed machine learning in settings where training data is decentralized across multiple clients. As the popularity of the strategy grows, increasingly complex real-world problems emerge, many of which require balancing conflicting demands such as fairness, utility, and resource consumption. Recent works have begun to recognise the use of a multi-objective perspective in answer to this challenge. However, this novel approach of combining federated methods with multi-objective optimisation has never been discussed in the broader context of both fields. In this work, we offer a first clear and systematic overview of the different ways the two fields can be integrated. We propose a first taxonomy on the use of multi-objective methods in connection with Federated Learning, providing a targeted survey of the state-of-the-art and proposing unambiguous labels to categorise contributions. Given the developing nature of this field, our taxonomy is designed to provide a solid basis for further research, capturing existing works while anticipating future additions. Finally, we outline open challenges and possible directions for further research.



Article 333

Title@2025-07-09 (3): Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning

Title: Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning Zielorientierte Skill Abstraction für Offline-Multi-Task-Verstärkung Lernen 面向目标的离线多任务强化学习技能总结 2507.06628v1

Authors (7): Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

Offline multi-task reinforcement learning aims to learn a unified policy capable of solving multiple tasks using only pre-collected task-mixed datasets, without requiring any online interaction with the environment. However, it faces significant challenges in effectively sharing knowledge across tasks. Inspired by the efficient knowledge abstraction observed in human learning, we propose Goal-Oriented Skill Abstraction (GO-Skill), a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance. Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library. To mitigate class imbalances between broadly applicable and task-specific skills, we introduce a skill enhancement phase to refine the extracted skills. Furthermore, we integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks. Extensive experiments on diverse robotic manipulation tasks within the MetaWorld benchmark demonstrate the effectiveness and versatility of GO-Skill.
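
The vector quantization step behind a discrete skill library amounts to a nearest-neighbour lookup in a codebook. The sketch below shows only that lookup, under the assumption of a squared Euclidean distance; the function name `nearest_code` and the tiny codebook are hypothetical, and GO-Skill's training procedure is not reproduced.

```python
def nearest_code(embedding, codebook):
    """Return (index, code) of the codebook entry closest to `embedding`
    in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: sq_dist(embedding, codebook[k]))
    return index, codebook[index]


codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # three hypothetical skill prototypes
skill_id, skill_vec = nearest_code([0.9, 0.2], codebook)
```

A high-level policy could then select among the integer `skill_id` values rather than reasoning over continuous embeddings.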



Article 334

Title@2025-07-09 (3): Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic

Title: Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic Q-STAC: Q-geführtes Stein-Variantenmodell Predictive Actor-Critic Q-STAC: Q-受控斯坦变化模型预测动动作用-批评 2507.06625v1

Authors (4): Shizhe Cai, Jayadeep Jacob, Zeya Yin, Fabio Ramos

Deep reinforcement learning has shown remarkable success in continuous control tasks, yet often requires extensive training data, struggles with complex, long-horizon planning, and fails to maintain safety constraints during operation. Meanwhile, Model Predictive Control (MPC) offers explainability and constraint satisfaction, but typically yields only locally optimal solutions and demands careful cost function design. This paper introduces the Q-guided STein variational model predictive Actor-Critic (Q-STAC), a novel framework that bridges these approaches by integrating Bayesian MPC with actor-critic reinforcement learning through constrained Stein Variational Gradient Descent (SVGD). Our method optimizes control sequences directly using learned Q-values as objectives, eliminating the need for explicit cost function design while leveraging known system dynamics to enhance sample efficiency and ensure control signals remain within safe boundaries. Extensive experiments on 2D navigation and robotic manipulation tasks demonstrate that Q-STAC achieves superior sample efficiency, robustness, and optimality compared to state-of-the-art algorithms, while maintaining the high expressiveness of policy distributions. Experiment videos are available on our website: https://sites.google.com/view/q-stac



Article 335

Title@2025-07-09 (3): UniOD: A Universal Model for Outlier Detection across Diverse Domains

Title: UniOD: A Universal Model for Outlier Detection across Diverse Domains UniOD: Ein universelles Modell für Ausreißer-Erkennung über verschiedene Domains UNOD: 跨不同域域的外星探测通用模型 2507.06624v1

Authors (2): Dazhi Fu, Jicong Fan

Outlier detection (OD) seeks to distinguish inliers and outliers in completely unlabeled datasets and plays a vital role in science and engineering. Most existing OD methods require troublesome dataset-specific hyperparameter tuning and costly model training before they can be deployed to identify outliers. In this work, we propose UniOD, a universal OD framework that leverages labeled datasets to train a single model capable of detecting outliers of datasets from diverse domains. Specifically, UniOD converts each dataset into multiple graphs, produces consistent node features, and frames outlier detection as a node-classification task, and is able to generalize to unseen domains. As a result, UniOD avoids effort on model selection and hyperparameter tuning, reduces computational cost, and effectively utilizes the knowledge from historical datasets, which improves the convenience and accuracy in real applications. We evaluate UniOD on 15 benchmark OD datasets against 15 state-of-the-art baselines, demonstrating its effectiveness.



Article 336

Title@2025-07-09 (3): Saffron-1: Safety Inference Scaling

Title: Saffron-1: Safety Inference Scaling Saffron-1: Sicherheitsableitende Skalierung Safron-1:安全推断比例 2506.06444v2

Authors (5): Ruizhong Qiu, Gaotang Li, Tianxin Wei, Jingrui He, Hanghang Tong

Existing safety assurance research has primarily focused on training-phase alignment to instill safe behaviors into LLMs. However, recent studies have exposed these methods’ susceptibility to diverse jailbreak attacks. Concurrently, inference scaling has significantly advanced LLM reasoning capabilities but remains unexplored in the context of safety assurance. Addressing this gap, our work pioneers inference scaling for robust and effective LLM safety against emerging threats. We reveal that conventional inference scaling techniques, despite their success in reasoning tasks, perform poorly in safety contexts, even falling short of basic approaches like Best-of-N Sampling. We attribute this inefficiency to a newly identified challenge, the exploration–efficiency dilemma, arising from the high computational overhead associated with frequent process reward model (PRM) evaluations. To overcome this dilemma, we propose SAFFRON, a novel inference scaling paradigm tailored explicitly for safety assurance. Central to our approach is the introduction of a multifurcation reward model (MRM) that significantly reduces the required number of reward model evaluations. To operationalize this paradigm, we further propose: (i) a partial supervision training objective for MRM, (ii) a conservative exploration constraint to prevent out-of-distribution explorations, and (iii) a Trie-based key–value caching strategy that facilitates cache sharing across sequences during tree search. Extensive experiments validate the effectiveness of our method. Additionally, we publicly release our trained multifurcation reward model (Saffron-1) and the accompanying token-level safety reward dataset (Safety4M) to accelerate future research in LLM safety. Our code, model, and data are publicly available at https://github.com/q-rz/saffron , and our project homepage is at https://q-rz.github.io/p/saffron .
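
The benefit of a trie for cache sharing during tree search is that sequences with a common prefix store that prefix once. The sketch below illustrates the data structure only; the class names `TrieNode` and `PrefixTrie` and the `cache` placeholder are hypothetical, and this is not the key-value cache from the released Saffron code.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.cache = None  # placeholder for a per-token cache entry


class PrefixTrie:
    def __init__(self):
        self.root = TrieNode()
        self.num_nodes = 0

    def insert(self, tokens):
        """Insert a token sequence; return how many new nodes had to be created."""
        node, created = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = TrieNode()
                created += 1
            node = node.children[tok]
        self.num_nodes += created
        return created


trie = PrefixTrie()
sequences = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog"]]
new_entries = [trie.insert(seq) for seq in sequences]
total_tokens = sum(len(seq) for seq in sequences)
```

In this toy case the three sequences contain eight tokens in total but need only five trie nodes, because the shared prefixes are stored once.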



Article 337

Title@2025-07-09 (3): Steps Adaptive Decay DPSGD: Enhancing Performance on Imbalanced Datasets with Differential Privacy with HAM10000

Title: Steps Adaptive Decay DPSGD: Enhancing Performance on Imbalanced Datasets with Differential Privacy with HAM10000 Schritte Adaptive Decay DPSGD: Leistungssteigerung auf unausgeglichenen Datensätzen mit differentieller Privatsphäre mit HAM10000 适应性衰减步骤DPSGD:用HAM1000提高具有不同隐私的平衡数据集的性能 2507.06619v1

Authors (2): Xiaobo Huang, Fang Xie

When applying machine learning to medical image classification, data leakage is a critical issue. Previous methods, such as adding noise to gradients for differential privacy, work well on large datasets like MNIST and CIFAR-100, but fail on small, imbalanced medical datasets like HAM10000. This is because the imbalanced distribution causes gradients from minority classes to be clipped and lose crucial information, while majority classes dominate. This leads the model to fall into suboptimal solutions early. To address this, we propose SAD-DPSGD, which uses a linear decaying mechanism for noise and clipping thresholds. By allocating more privacy budget and using higher clipping thresholds in the initial training phases, the model avoids suboptimal solutions and enhances performance. Experiments show that SAD-DPSGD outperforms Auto-DPSGD on HAM10000, improving accuracy by 2.15% under $\epsilon = 3.0$, $\delta = 10^{-3}$.
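
The effect of a linearly decaying clipping threshold can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the start and end thresholds, and the step count are hypothetical, and the noise schedule and privacy accounting of SAD-DPSGD are deliberately omitted.

```python
import math


def linear_schedule(start, end, step, total_steps):
    """Linearly interpolate from `start` (step 0) to `end` (last step)."""
    frac = step / max(total_steps - 1, 1)
    return start + (end - start) * frac


def clip_gradient(grad, threshold):
    """Scale a per-sample gradient so that its L2 norm is at most `threshold`."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


total_steps = 5
thresholds = [linear_schedule(2.0, 1.0, t, total_steps) for t in range(total_steps)]
minority_grad = [3.0, 4.0]  # norm 5: stands in for a large minority-class gradient
clipped_early = clip_gradient(minority_grad, thresholds[0])
clipped_late = clip_gradient(minority_grad, thresholds[-1])
```

With the higher early threshold, the large gradient keeps twice as much of its magnitude as it does at the end of training, which is the intuition behind preserving minority-class signal in the initial phase.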



Article 338

Title@2025-07-09 (3): Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance

Title: Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance Effizientes Multi-Task-Verstärktes Lernen mit bereichsübergreifender politischer Anleitung 具有跨任务政策指导的高效多任务强化学习 2507.06615v1

Authors (7): Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

Multi-task reinforcement learning endeavors to efficiently leverage shared information across various tasks, facilitating the simultaneous learning of multiple tasks. Existing approaches primarily focus on parameter sharing with carefully designed network structures or tailored optimization procedures. However, they overlook a direct and complementary way to exploit cross-task similarities: the control policies of tasks already proficient in some skills can provide explicit guidance for unmastered tasks to accelerate skills acquisition. To this end, we present a novel framework called Cross-Task Policy Guidance (CTPG), which trains a guide policy for each task to select the behavior policy interacting with the environment from all tasks’ control policies, generating better training trajectories. In addition, we propose two gating mechanisms to improve the learning efficiency of CTPG: one gate filters out control policies that are not beneficial for guidance, while the other gate blocks tasks that do not necessitate guidance. CTPG is a general framework adaptable to existing parameter sharing approaches. Empirical evaluations demonstrate that incorporating CTPG with these approaches significantly enhances performance in manipulation and locomotion benchmarks.



Article 339

Title@2025-07-09 (3): Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning

Title: Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning Klassifizierung von Autoimmunerkrankungen aus peripheren Blut-TCR-Repertoires durch multimodales Multi-Instanz-Lernen 通过多式多机构学习,从周边血液中的自体免疫疾病 TCR 孔径通过多功能多功能学习分类 2507.04981v3

Authors (6): Ruihao Zhang, Mao Chen, Fei Ye, Dandan Meng, Yixuan Huang, Xiao Liu

T cell receptor (TCR) repertoires encode critical immunological signatures for autoimmune diseases, yet their clinical application remains limited by sequence sparsity and low witness rates. We developed EAMil, a multi-instance deep learning framework that leverages TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) with exceptional accuracy. By integrating PrimeSeq feature extraction with ESMonehot encoding and enhanced gate attention mechanisms, our model achieved state-of-the-art performance with AUCs of 98.95% for SLE and 97.76% for RA. EAMil successfully identified disease-associated genes with over 90% concordance with established differential analyses and effectively distinguished disease-specific TCR genes. The model demonstrated robustness in classifying multiple disease categories, utilizing the SLEDAI score to stratify SLE patients by disease severity as well as to diagnose the site of damage in SLE patients, and effectively controlling for confounding factors such as age and gender. This interpretable framework for immune receptor analysis provides new insights for autoimmune disease detection and classification with broad potential clinical applications across immune-mediated conditions.



Article 340

Title@2025-07-09 (3): Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation

Title: Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation Denoising Multi-Beta VAE: Repräsentationslernen für Entflechtung und Generation 多鲸目动物VAE:拆分和代代代代代代学习 2507.06613v1

Authors (4): Anshuk Uppal, Yuhta Takida, Chieh-Hsin Lai, Yuki Mitsufuji

Disentangled and interpretable latent representations in generative models typically come at the cost of generation quality. The $\beta$-VAE framework introduces a hyperparameter $\beta$ to balance disentanglement and reconstruction quality, where setting $\beta > 1$ introduces an information bottleneck that favors disentanglement over sharp, accurate reconstructions. To address this trade-off, we propose a novel generative modeling framework that leverages a range of $\beta$ values to learn multiple corresponding latent representations. First, we obtain a slew of representations by training a single variational autoencoder (VAE) with a new loss function that controls the information retained in each latent representation, such that higher $\beta$ values prioritize disentanglement over reconstruction fidelity. We then introduce a non-linear diffusion model that smoothly transitions between latent representations corresponding to different $\beta$ values. This model denoises towards less disentangled and more informative representations, ultimately leading to (almost) lossless representations, enabling sharp reconstructions. Furthermore, our model supports sample generation without input images, functioning as a standalone generative model. We evaluate our framework in terms of both disentanglement and generation quality. Additionally, we observe smooth transitions in the latent spaces with respect to changes in $\beta$, facilitating consistent manipulation of generated outputs.
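
For reference, the trade-off described above is visible in the standard single-$\beta$ objective from the $\beta$-VAE literature, written below. This is not the new multi-$\beta$ loss proposed in the paper, which the abstract does not spell out; the symbols $q_\phi$ (encoder), $p_\theta$ (decoder), and $p(z)$ (prior) are conventional notation rather than taken from the abstract.

```latex
\mathcal{L}_{\beta}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \beta \, D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

The objective is maximized. Setting $\beta = 1$ recovers the usual evidence lower bound, while $\beta > 1$ weights the KL term more heavily, tightening the bottleneck on $z$ at the expense of the reconstruction term.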



Article 341

Title@2025-07-09 (3): Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation

Title: Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Decoder-Hybrid-Decoder-Architektur für effizientes Nachdenken mit langer Generation 提高长代人合理性效率的代coder-Hybrid-Decer 结构 2507.06607v1

Authors (14): Liliang Ren, Congcong Chen, Haoran Xu, Young Jin Kim, Adam Atkinson, Zheng Zhan, Jiankai Sun, Baolin Peng, Liyuan Liu, Shuohang Wang, Hao Cheng, Jianfeng Gao, Weizhu Chen, Yelong Shen

Recent advances in language modeling have demonstrated the effectiveness of State Space Models (SSMs) for efficient sequence modeling. While hybrid architectures such as Samba and the decoder-decoder architecture, YOCO, have shown promising performance gains over Transformers, prior works have not investigated the efficiency potential of representation sharing between SSM layers. In this paper, we introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers. We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs in the cross-decoder to share memory readout states from a Samba-based self-decoder. SambaY significantly enhances decoding efficiency, preserves linear pre-filling time complexity, and boosts long-context performance, all while eliminating the need for explicit positional encoding. Through extensive scaling experiments, we demonstrate that our model exhibits a significantly lower irreducible loss compared to a strong YOCO baseline, indicating superior performance scalability under large-scale compute regimes. Our largest model enhanced with Differential Attention, Phi4-mini-Flash-Reasoning, achieves significantly better performance than Phi4-mini-Reasoning on reasoning tasks such as Math500, AIME24/25, and GPQA Diamond without any reinforcement learning, while delivering up to 10x higher decoding throughput on 2K-length prompts with 32K generation length under the vLLM inference framework. We release our training codebase on open-source data at https://github.com/microsoft/ArchScale.



Article 342

Title@2025-07-09 (3): Generalization in Reinforcement Learning for Radio Access Networks

Title: Generalization in Reinforcement Learning for Radio Access Networks Generalisierung im Ausbau-Lernen für Funkzugangsnetze 无线电接入网络强化学习一般化 2507.06602v1

Authors (4): Burak Demirel, Yu Wang, Cristian Tatino, Pablo Soldati

Modern RAN operate in highly dynamic and heterogeneous environments, where hand-tuned, rule-based RRM algorithms often underperform. While RL can surpass such heuristics in constrained settings, the diversity of deployments and unpredictable radio conditions introduce major generalization challenges. Data-driven policies frequently overfit to training conditions, degrading performance in unseen scenarios. To address this, we propose a generalization-centered RL framework for RAN control that: (i) encodes cell topology and node attributes via attention-based graph representations; (ii) applies domain randomization to broaden the training distribution; and (iii) distributes data generation across multiple actors while centralizing training in a cloud-compatible architecture aligned with O-RAN principles. Although generalization increases computational and data-management complexity, our distributed design mitigates this by scaling data collection and training across diverse network conditions. Applied to downlink link adaptation in five 5G benchmarks, our policy improves average throughput and spectral efficiency by ~10% over an OLLA baseline (10% BLER target) in full-buffer MIMO/mMIMO and by >20% under high mobility. It matches specialized RL in full-buffer traffic and achieves up to 4- and 2-fold gains in eMBB and mixed-traffic benchmarks, respectively. In nine-cell deployments, GAT models offer 30% higher throughput over MLP baselines. These results, combined with our scalable architecture, offer a path toward AI-native 6G RAN using a single, generalizable RL agent.



Article 343

Title@2025-07-09 (3): SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model

Title: SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model SeisMoLLM: Verbesserung der seismischen Überwachung durch modalen Transfer mit vortrainiertem Großsprachenmodell SeisMOLLM:利用预先培训的大型语言模式,通过跨模式转让推进地震监测 2502.19960v2

Authors (8): Xinghao Wang, Feng Liu, Rui Su, Zhihui Wang, Lihua Fang, Lianqing Zhou, Lei Bai, Wanli Ouyang

Recent advances in deep learning have revolutionized seismic monitoring, yet developing a foundation model that performs well across multiple complex tasks remains challenging, particularly when dealing with degraded signals or data scarcity. This work presents SeisMoLLM, the first foundation model that utilizes cross-modal transfer for seismic monitoring, to unleash the power of large-scale pre-training from a large language model without requiring direct pre-training on seismic datasets. Through elaborate waveform tokenization and fine-tuning of a pre-trained GPT-2 model, SeisMoLLM achieves state-of-the-art performance on the DiTing and STEAD datasets across five critical tasks: back-azimuth estimation, epicentral distance estimation, magnitude estimation, phase picking, and first-motion polarity classification. It attains 36 best results out of 43 task metrics and 12 top scores out of 16 few-shot generalization metrics, with many relative improvements ranging from 10% to 50%. In addition to its superior performance, SeisMoLLM maintains efficiency comparable to or even better than lightweight models in both training and inference. These findings establish SeisMoLLM as a promising foundation model for practical seismic monitoring and highlight cross-modal transfer as an exciting new direction for earthquake studies, showcasing the potential of advanced deep learning techniques to propel seismology research forward.



Article 344

Title@2025-07-09 (3): FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models

Title: FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models FEVO: Finanzwissenserweiterung und -entwicklung für große Sprachmodelle FEVO: 金融知识的扩大和大语言模式的推理演变 2507.06057v2

Authors (9): Bo Pang, Yalu Ouyang, Hangfei Xu, Ziqi Jia, Panpan Li, Shengzhao Wen, Lu Wang, Shiyong Li, Yanpeng Wang

Advancements in reasoning for large language models (LLMs) have led to significant performance improvements for LLMs in various fields such as mathematics and programming. However, research applying these advances to the financial domain, where considerable domain-specific knowledge is necessary to complete tasks, remains limited. To address this gap, we introduce FEVO (Financial Evolution), a multi-stage enhancement framework developed to enhance LLM performance in the financial domain. FEVO systematically enhances LLM performance by using continued pre-training (CPT) to expand financial domain knowledge, supervised fine-tuning (SFT) to instill structured, elaborate reasoning patterns, and reinforcement learning (RL) to further integrate the expanded financial domain knowledge with the learned structured reasoning. To ensure effective and efficient training, we leverage frontier reasoning models and rule-based filtering to curate FEVO-Train, high-quality datasets specifically designed for the different post-training phases. Using our framework, we train the FEVO series of models - C32B, S32B, R32B - from Qwen2.5-32B and evaluate them on seven benchmarks to assess financial and general capabilities, with results showing that FEVO-R32B achieves state-of-the-art performance on five financial benchmarks against much larger models as well as specialist models. More significantly, FEVO-R32B demonstrates markedly better performance than FEVO-R32B-0 (trained from Qwen2.5-32B-Instruct using only RL), thus validating the effectiveness of financial domain knowledge expansion and structured, logical reasoning distillation.



Article 345

Title@2025-07-09 (3): Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots

Title: Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots Hysterese-Aware Neural Network Modeling und Ganzkörper-Verstärkung Learning Control von Soft Robots 软机器人的神经网络建模和整体加强学习控制 2504.13582v2

Authors (13): Zongyuan Chen, Yan Xia, Jiayuan Liu, Jijia Liu, Wenhao Tang, Jiayu Chen, Feng Gao, Longfei Ma, Hongen Liao, Yu Wang, Chao Yu, Boyu Zhang, Fei Xing

Soft robots exhibit inherent compliance and safety, which makes them particularly suitable for applications requiring direct physical interaction with humans, such as surgical procedures. However, their nonlinear and hysteretic behavior, resulting from the properties of soft materials, presents substantial challenges for accurate modeling and control. In this study, we present a soft robotic system designed for surgical applications and propose a hysteresis-aware whole-body neural network model that accurately captures and predicts the soft robot’s whole-body motion, including its hysteretic behavior. Building upon the high-precision dynamic model, we construct a highly parallel simulation environment for soft robot control and apply an on-policy reinforcement learning algorithm to efficiently train whole-body motion control strategies. Based on the trained control policy, we developed a soft robotic system for surgical applications and validated it through phantom-based laser ablation experiments in a physical environment. The results demonstrate that the hysteresis-aware modeling reduces the Mean Squared Error (MSE) by 84.95 percent compared to traditional modeling methods. The deployed control algorithm achieved a trajectory tracking error ranging from 0.126 to 0.250 mm on the real soft robot, highlighting its precision in real-world conditions. The proposed method showed strong performance in phantom-based surgical experiments and demonstrates its potential for complex scenarios, including future real-world clinical applications.



Article 346

Title@2025-07-09 (3): Learning controllable dynamics through informative exploration

Title: Learning controllable dynamics through informative exploration Lernen steuerbare Dynamik durch informative Erkundung 通过信息探索进行学习控制动态 2507.06582v1

Authors (2): Peter N. Loxley, Friedrich T. Sommer

Environments with controllable dynamics are usually understood in terms of explicit models. Such models are not always available, but they can sometimes be learned by exploring an environment. In this work, we investigate using an information measure called “predicted information gain” to determine the most informative regions of an environment to explore next. Applying methods from reinforcement learning allows good suboptimal exploring policies to be found, and leads to reliable estimates of the underlying controllable dynamics. This approach is demonstrated by comparing with several myopic exploration approaches.



Article 347

Title@2025-07-09 (3): Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints

Title: Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints Diffusionsgetriebene semantische Kommunikation für generative Modelle mit Bandbreitenbeschränkungen 在带宽限制下生成模型的传播-驱动性语义交流 2407.18468v4

Authors (6): Lei Guo, Wei Chen, Yuxuan Sun, Bo Ai, Nikolaos Pappas, Tony Q. S. Quek

Diffusion models have been extensively utilized in AI-generated content (AIGC) in recent years, thanks to their superior generation capabilities. Combined with semantic communications, diffusion models are used for tasks such as denoising, data reconstruction, and content generation. However, existing diffusion-based generative models do not consider the stringent bandwidth limitation, which limits their application in wireless communication. This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our designed architecture utilizes the diffusion model, where the signal transmission process through the wireless channel acts as the forward process in diffusion. To reduce bandwidth requirements, we incorporate a downsampling module and a paired upsampling module based on a variational auto-encoder with reparameterization at the receiver to ensure that the recovered features conform to the Gaussian distribution. Furthermore, we derive the loss function for our proposed system and evaluate its performance through comprehensive experiments. Our experimental results demonstrate significant improvements in pixel-level metrics such as peak signal-to-noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS). These enhancements are more pronounced with respect to compression rate and SNR when compared with deep joint source-channel coding (DJSCC). We release the code at https://github.com/import-sudo/Diffusion-Driven-Semantic-Communication.
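
One way to see why a noisy channel can play the role of the forward process is the following correspondence. It is an illustrative sketch assuming an additive white Gaussian noise channel and the standard DDPM forward process; the abstract does not give the paper's own channel model or derivation, and the symbols $y$, $n$, $\sigma^2$, and $\bar\alpha_t$ are assumptions introduced here.

```latex
% Received signal over an AWGN channel vs. the DDPM forward process
y = x_0 + n, \quad n \sim \mathcal{N}(0, \sigma^2 I)
\qquad \text{vs.} \qquad
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\quad \epsilon \sim \mathcal{N}(0, I)

% Rescaling the received signal matches the forward process at one noise level
\sqrt{\bar\alpha_t}\, y
  = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon
\quad \text{when} \quad
\bar\alpha_t = \frac{1}{1 + \sigma^2}
```

The identity holds because the rescaled noise has variance $\bar\alpha_t \sigma^2 = \sigma^2 / (1 + \sigma^2) = 1 - \bar\alpha_t$. For a unit-power signal with $\mathrm{SNR} = 1/\sigma^2$, this gives $\bar\alpha_t = \mathrm{SNR} / (1 + \mathrm{SNR})$, so a cleaner channel corresponds to an earlier diffusion step.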



Article 348

Title@2025-07-09 (3): From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization

Title: From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization Von Daten-Centric zu Proben-Centric: Verbesserung der LLM-Reasoning durch progressive Optimierung 从数据中心到抽样中心:通过渐进优化提高LLM的教学理由 2507.06573v1

Authors (7): Xinjie Chen, Minpeng Liao, Guoxin Chen, Chengxi Li, Biao Fu, Kai Fan, Xinggao Liu

Reinforcement learning with verifiable rewards (RLVR) has recently advanced the reasoning capabilities of large language models (LLMs). While prior work has emphasized algorithmic design, data curation, and reward shaping, we investigate RLVR from a sample-centric perspective and introduce LPPO (Learning-Progress and Prefix-guided Optimization), a framework of progressive optimization techniques. Our work addresses a critical question: how to best leverage a small set of trusted, high-quality demonstrations, rather than simply scaling up data volume. First, motivated by how hints aid human problem-solving, we propose prefix-guided sampling, an online data augmentation method that incorporates partial solution prefixes from expert demonstrations to guide the policy, particularly for challenging instances. Second, inspired by how humans focus on important questions aligned with their current capabilities, we introduce learning-progress weighting, a dynamic strategy that adjusts each training sample’s influence based on model progression. We estimate sample-level learning progress via an exponential moving average of per-sample pass rates, promoting samples that foster learning and de-emphasizing stagnant ones. Experiments on mathematical-reasoning benchmarks demonstrate that our methods outperform strong baselines, yielding faster convergence and a higher performance ceiling.
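
The learning-progress estimate can be sketched as below. The exponential moving average over per-sample pass rates follows the abstract; the mapping from progress to a training weight (`progress_weight`, with its `floor` and `scale` parameters) is a hypothetical choice made only for illustration, since the abstract does not give the actual weighting function.

```python
def update_ema(prev_ema, pass_rate, decay=0.9):
    """Exponential moving average of a sample's pass rate."""
    return decay * prev_ema + (1.0 - decay) * pass_rate


def progress_weight(prev_ema, new_ema, floor=0.1, scale=10.0):
    """Hypothetical weight: grows with positive EMA change, never below `floor`."""
    progress = max(0.0, new_ema - prev_ema)
    return floor + min(1.0, scale * progress)


def final_weight(pass_rates):
    """Run the EMA over a sequence of pass rates and return the last weight."""
    ema, weight = 0.0, None
    for rate in pass_rates:
        new_ema = update_ema(ema, rate)
        weight = progress_weight(ema, new_ema)
        ema = new_ema
    return weight


# One sample whose pass rate keeps improving, and one that stays unsolved.
improving = final_weight([0.0, 0.25, 0.5, 0.75])
stagnant = final_weight([0.0, 0.0, 0.0, 0.0])
```

Under this toy rule the improving sample ends with a larger weight than the stagnant one, which falls back to the floor value.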



Article 349

Title@2025-07-09 (3): CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs

Title: CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs CCQ: Konvolutionaler Code für extreme Low-Bit-Quantisierung in LLMs CCQ: 极低位量化微量(LLMM)的革命法 2507.07145v1

Authors (11): Zhaojing Zhou, Xunchao Li, Minghao Li, Handi Zhang, Haoshuang Wang, Wenbin Chang, Yiqun Liu, Qingqing Dang, Dianhai Yu, Yanjun Ma, Haifeng Wang

The rapid scaling of Large Language Models (LLMs) elevates inference costs and compounds substantial deployment barriers. While quantization to 8 or 4 bits mitigates this, sub-3-bit methods face severe accuracy, scalability, and efficiency degradation. We propose Convolutional Code Quantization (CCQ), an inference-optimized quantization approach compressing LLMs to 2.0-2.75 bits with minimal accuracy loss. Departing from error-prone scalar quantization or slow vector quantization, CCQ integrates a hardware-aware bit-shift encoding and decoding solution with Convolutional Code, Hybrid Encoding, and Code Cluster, jointly overcoming accuracy-speed bottlenecks. We construct a lookup-free encoding space, enabling a linear mapping between the codebook and weight vectors, thereby optimizing inference performance. Meanwhile, by drawing on the concept of data mapping from vector quantization, we minimize the performance degradation of the model under extremely low-bit conditions. Experiments demonstrate that CCQ achieves outstanding performance on LLMs across various benchmarks. We compress DeepSeek-V3 (671B total parameters) to 184GB and ERNIE-4.5-300B-A47B to 89GB, enabling single-GPU deployment of ERNIE 4.5 and eliminating inter-card communication. The 2-bit ERNIE-4.5-300B-A47B model and inference engine have been open-sourced.
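
As a quick sanity check on the reported sizes, the average storage per parameter can be worked out directly. The calculation assumes 1 GB = 10^9 bytes and that the quoted sizes cover the whole checkpoint, including any tensors kept at higher precision, so the results are averages rather than the nominal bit width of any single layer.

```python
def bits_per_param(size_gb, n_params):
    """Average bits stored per parameter for a checkpoint of `size_gb` gigabytes."""
    return size_gb * 1e9 * 8 / n_params


def size_gb(n_params, bits):
    """Checkpoint size in gigabytes at a uniform bit width."""
    return n_params * bits / 8 / 1e9


deepseek_bits = bits_per_param(184, 671e9)  # about 2.19 bits per parameter
ernie_bits = bits_per_param(89, 300e9)      # about 2.37 bits per parameter

# For comparison, the same parameter counts stored at 16 bits per weight.
deepseek_fp16_gb = size_gb(671e9, 16)
ernie_fp16_gb = size_gb(300e9, 16)
```

Both averages fall inside the 2.0 to 2.75 bit range quoted in the abstract.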



Article 350

Title@2025-07-09 (3): SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference

Title: SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference SlimCaching: Kanten-Caching von Mixture-of-Experts für verteilte Inferenz SlimCaching: 分布式推断的混合专家的边缘缓存 2507.06567v1

Authors (3): Qian Chen, Xianhao Chen, Kaibin Huang

Mixture-of-Experts (MoE) models improve the scalability of large language models (LLMs) by activating only a small subset of relevant experts per input. However, the sheer number of expert networks in an MoE model introduces a significant storage burden for an edge device. To address this challenge, we consider a scenario where experts are dispersed within an edge network for distributed inference. Based on the popular Top-$K$ expert selection strategy, we formulate a latency minimization problem by optimizing expert caching on edge servers under storage constraints. When $K=1$, the problem reduces to a monotone submodular maximization problem with knapsack constraints, for which we design a greedy-based algorithm with a $(1 - 1/e)$-approximation guarantee. For the general case where $K\geq1$, expert co-activation within the same MoE layer introduces non-submodularity, causing greedy methods to be ineffective. To tackle this issue, we propose a successive greedy decomposition method to decompose the original problem into a series of subproblems, with each being solved by a dynamic programming approach. Furthermore, we design an accelerated algorithm based on the max-convolution technique to obtain the approximate solution with a provable guarantee in polynomial time. Simulation results on various MoE models demonstrate that our method significantly reduces inference latency compared to existing baselines.
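
For the $K=1$ case, the flavor of a greedy method for monotone submodular maximization under a knapsack (storage) constraint can be sketched as follows. This is a generic cost-benefit greedy on a toy weighted-coverage objective, not the paper's algorithm: the objective, the item names, and the costs are hypothetical, and this simplified version does not by itself carry the $(1 - 1/e)$ guarantee stated in the abstract.

```python
def coverage_value(selected, item_covers, weights):
    """Monotone submodular objective: total weight of elements covered."""
    covered = set()
    for item in selected:
        covered |= item_covers[item]
    return sum(weights[e] for e in covered)


def greedy_knapsack(item_covers, weights, costs, budget):
    """Pick items by marginal gain per unit cost while the budget allows."""
    selected, spent = [], 0.0
    remaining = set(item_covers)
    while remaining:
        base = coverage_value(selected, item_covers, weights)
        best_item, best_ratio = None, 0.0
        for item in sorted(remaining):
            if spent + costs[item] > budget:
                continue  # does not fit in the remaining storage
            gain = coverage_value(selected + [item], item_covers, weights) - base
            ratio = gain / costs[item]
            if ratio > best_ratio:
                best_item, best_ratio = item, ratio
        if best_item is None:
            break  # nothing feasible adds value
        selected.append(best_item)
        spent += costs[best_item]
        remaining.remove(best_item)

    # Common safeguard: also consider the best single feasible item.
    singles = [i for i in item_covers if costs[i] <= budget]
    if singles:
        best_single = max(
            singles, key=lambda i: coverage_value([i], item_covers, weights)
        )
        if coverage_value([best_single], item_covers, weights) > coverage_value(
            selected, item_covers, weights
        ):
            return [best_single]
    return selected


# Toy instance: each cached item serves some request types (hypothetical data).
item_covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
weights = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}
costs = {"a": 1.0, "b": 1.0, "c": 2.0}

chosen = greedy_knapsack(item_covers, weights, costs, budget=3.0)
value = coverage_value(chosen, item_covers, weights)
```

The ratio rule favors cheap items with large marginal gain; the single-item comparison guards against the case where one expensive item is worth more than everything the ratio rule selects.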



Article 351

Title@2025-07-09 (3): PASS: Private Attributes Protection with Stochastic Data Substitution

Title: PASS: Private Attributes Protection with Stochastic Data Substitution PASS: Schutz privater Attribute mit stochastischer Datenersetzung PASS: 私人财产保护,用存储数据替代数据 2506.07308v2

Authors (5): Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu, Shaohan Hu, Tarek Abdelzaher

The growing number of Machine Learning (ML) services requires extensive collections of user data, which may inadvertently include people’s private information irrelevant to the services. Various methods have been proposed to protect private attributes by removing them from the data while maintaining the utility of the data for downstream tasks. Nevertheless, as we theoretically and empirically show in the paper, these methods reveal severe vulnerability because of a common weakness rooted in their adversarial-training-based strategies. To overcome this limitation, we propose a novel approach, PASS, designed to stochastically substitute the original sample with another one according to certain probabilities, which is trained with a novel loss function soundly derived from an information-theoretic objective defined for utility-preserving private attributes protection. The comprehensive evaluation of PASS on various datasets of different modalities, including facial images, human activity sensory signals, and voice recording datasets, substantiates PASS’s effectiveness and generalizability.



Article 352

Title@2025-07-09 (3): CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback

Title: CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback CHAI für LLMs: Verbesserung der code-gemischten Übersetzung in großen Sprachmodellen durch Verstärkungslernen mit KI-Feedback LLM女士CHAI:利用AI反馈加强学习,改进大语言模式的编码混合翻译 2411.09073v3

Authors (3): Wenbo Zhang, Aditya Majumdar, Amulya Yadav

Large Language Models (LLMs) have demonstrated remarkable capabilities across various NLP tasks but struggle with code-mixed (or code-switched) language understanding. For example, prior work benchmarking the performance of multilingual LLMs on code-mixed translation tasks has demonstrated that current state-of-the-art multilingual LLMs are ineffective in dealing with code-mixed languages. However, the question of how to improve the capability of multilingual LLMs to handle code-mixed language has not received any attention to date. In this paper, we tackle this research gap by proposing CHAI, a novel general-purpose framework for improving the ability of multilingual LLMs to handle code-mixed languages. CHAI relies on three novel contributions made in this paper. First, we explore the ability of LLMs to provide accurate annotations for code-mixed translation tasks. Second, we leverage this ability of LLMs as annotators to generate preference data for code-mixed translation tasks at scale, which are then used within a reinforcement learning from AI feedback (RLAIF) procedure to improve LLMs’ capability on code-mixed tasks. Third, we conduct a rigorous experimental evaluation across various real-world datasets and settings. Our analysis shows that CHAI-powered LLMs outperform state-of-the-art open-source LLMs by 25.66% (in terms of win rate adjudicated by human annotators) in code-mixed translation tasks. This work represents a first step towards developing more inclusive code-mixed LLMs.



Article 353

Title@2025-07-09 (3): The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production

Title: The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production Die Fehler anderer: Ein LLM-gestützter Rahmen für die wissenschaftliche Wissensproduktion 其他法律:由LLM驱动的科学知识生产框架 2507.06565v1

Authors (1): Juan B. Gutiérrez

Large language models turn writing into a live exchange between humans and software. We capture this new medium with a discursive-network model that treats people and LLMs as equal nodes and tracks how their statements circulate. Broadening the focus from isolated hallucinations, we define invalidation (any factual, logical, or structural breach) and show it follows four hazards: drift from truth, self-repair, fresh fabrication, and external detection. A general mathematical model of discursive networks is developed and yields the following insights: a network governed only by drift and self-repair stabilizes at a modest error rate; adding fabrication reproduces the high rates seen in current LLMs. Giving each false claim even a small chance of peer review shifts the system to a truth-dominant state. We operationalize peer review with the open-source Flaws-of-Others (FOO) algorithm: a configurable loop in which any set of agents critique one another while a harmoniser merges their verdicts. The takeaway is practical and cultural: reliability in this new medium comes not from perfecting single models but from wiring imperfect ones into networks that keep each other honest.



Article 354

Title@2025-07-09 (3): Efficient Transfer Learning via Causal Bounds

Title: Efficient Transfer Learning via Causal Bounds Effizientes Transfer-Lernen über Causal Bounds 通过 “ 因果关系 “ 进行高效转让学习 2308.03572v5

Authors (3): Xueping Gong, Wei You, Jiheng Zhang

Transfer learning seeks to accelerate sequential decision-making by leveraging offline data from related agents. However, data from heterogeneous sources that differ in observed features, distributions, or unobserved confounders often render causal effects non-identifiable and bias naive estimators. We address this by forming ambiguity sets of structural causal models defined via integral constraints on their joint densities. Optimizing any causal effect over these sets leads to generally non-convex programs whose solutions tightly bound the range of possible effects under heterogeneity or confounding. To solve these programs efficiently, we develop a hit-and-run sampler that explores the entire ambiguity set and, when paired with a local optimization oracle, produces causal bound estimates that converge almost surely to the true limits. We further accommodate estimation error by relaxing the ambiguity set and exploit the Lipschitz continuity of causal effects to establish precise error propagation guarantees. These causal bounds are then embedded into bandit algorithms via arm elimination and truncated UCB indices, yielding optimal gap-dependent and minimax regret bounds. To handle estimation error, we also develop a safe algorithm for incorporating noisy causal bounds. In the contextual-bandit setting with function approximation, our method uses causal bounds to prune both the function class and the per-context action set, achieving matching upper and lower regret bounds with only logarithmic dependence on function-class complexity. Our analysis precisely characterizes when and how causal side-information accelerates online learning, and experiments on synthetic benchmarks confirm substantial regret reductions in data-scarce or confounded regimes.
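
The two ways causal bounds enter the bandit algorithm, arm elimination and truncated UCB indices, can be sketched as follows. This is a minimal illustration under assumptions: the UCB1-style exploration bonus, the function names, and the example bounds are all hypothetical, and the abstract does not specify the exact index the authors use.

```python
import math


def truncated_ucb_index(mean, pulls, t, causal_upper):
    """UCB1-style index, capped at the causal upper bound for the arm."""
    bonus = math.sqrt(2.0 * math.log(t) / pulls)
    return min(mean + bonus, causal_upper)


def eliminate_arms(causal_bounds):
    """Keep only arms whose upper bound reaches the best lower bound."""
    best_lower = max(lower for lower, _ in causal_bounds.values())
    return {arm for arm, (_, upper) in causal_bounds.items() if upper >= best_lower}


# Hypothetical (lower, upper) causal bounds on each arm's mean reward.
causal_bounds = {0: (0.1, 0.3), 1: (0.4, 0.9), 2: (0.2, 0.6)}

survivors = eliminate_arms(causal_bounds)  # arm 0 cannot be optimal

# Early on, the plain UCB index for arm 2 is very optimistic; the cap reins it in.
plain_index = 0.5 + math.sqrt(2.0 * math.log(10) / 1)
capped_index = truncated_ucb_index(mean=0.5, pulls=1, t=10, causal_upper=0.6)
```

Elimination removes arms before any online pulls are spent on them, while the cap limits how long a surviving arm can look better than the offline data allows.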



Article 355

Title@2025-07-09 (3): Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Title: Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback 超前优先优化(EGPO):超越纳什最后的融合(Nash) 从人类反馈中学习 2503.08942v3

Authors (3): Runlong Zhou, Maryam Fazel, Simon S. Du

Reinforcement learning from human feedback (RLHF) has become essential for improving language model capabilities, but traditional approaches rely on the assumption that human preferences follow a transitive Bradley-Terry model. This assumption fails to capture the non-transitive nature of populational human preferences. Nash learning from human feedback (NLHF), targeting non-transitive preferences, is a problem of computing the Nash equilibrium (NE) of the two-player constant-sum game defined by the human preference. We introduce Extragradient preference optimization (EGPO), a novel algorithm for NLHF achieving last-iterate linear convergence to the NE of KL-regularized games and polynomial convergence to the NE of original games, while being robust to noise. Unlike previous approaches that rely on nested optimization, we derive an equivalent implementation using gradients of an online variant of the identity preference optimization (IPO) loss, enabling more faithful implementation for neural networks. Our empirical evaluations demonstrate EGPO’s superior performance over baseline methods when training for the same number of epochs, as measured by pairwise win-rates using the ground truth preference. These results validate both the theoretical strengths and practical advantages of EGPO for language model alignment with non-transitive human preferences.
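
The extragradient idea that gives EGPO its name can be illustrated on a toy problem. The sketch below is the classical extragradient update on the unconstrained bilinear game $f(x, y) = xy$, not EGPO itself: there is no preference model, no KL regularization, and no policy parameterization, and the step size and iteration count are arbitrary choices.

```python
import math


def gda_step(x, y, lr):
    """Simultaneous gradient descent on x and ascent on y for f(x, y) = x * y."""
    return x - lr * y, y + lr * x


def extragradient_step(x, y, lr):
    """Look ahead with a trial step, then update using gradients at that point."""
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half


def run(step, x, y, lr, n_steps):
    """Apply `step` repeatedly; return the distance to the equilibrium (0, 0)."""
    for _ in range(n_steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


gda_dist = run(gda_step, 1.0, 1.0, lr=0.5, n_steps=100)
eg_dist = run(extragradient_step, 1.0, 1.0, lr=0.5, n_steps=100)
```

On this game plain gradient descent-ascent spirals away from the equilibrium, while the extragradient iterates contract toward it, which is the kind of last-iterate behavior the abstract refers to.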



Article 356

Title@2025-07-09 (3): Divergence-Based Similarity Function for Multi-View Contrastive Learning

Title: Divergence-Based Similarity Function for Multi-View Contrastive Learning Divergenzbasierte Ähnlichkeitsfunktion für das kontrastive Lernen mit mehreren Ansichten 多查看差异学习的基于差异的相似性功能 2507.06560v1

Authors (3): Jae Hyoung Jeon, Cheolsu Lim, Myungjoo Kang

Recent success in contrastive learning has sparked growing interest in more effectively leveraging multiple augmented views of an instance. While prior methods incorporate multiple views at the loss or feature level, they primarily capture pairwise relationships and fail to model the joint structure across all views. In this work, we propose a divergence-based similarity function (DSF) that explicitly captures the joint structure by representing each set of augmented views as a distribution and measuring similarity as the divergence between distributions. Extensive experiments demonstrate that DSF consistently improves performance across various tasks, including kNN classification and linear evaluation, while also offering greater efficiency compared to other multi-view methods. Furthermore, we establish a theoretical connection between DSF and cosine similarity, and show that, unlike cosine similarity, DSF operates effectively without requiring a temperature hyperparameter.



Article 357

Title@2025-07-09 (3): The Primacy of Magnitude in Low-Rank Adaptation

Title: The Primacy of Magnitude in Low-Rank Adaptation Das Primat der Magnitude in der Low-Rank-Anpassung 低Rank适应中磁力的优先性 2507.06558v1

Authors (8): Zicheng Zhang, Haoran Li, Yifeng Zhang, Guoqiang Gong, Jiaxing Wang, Pengzhang Liu, Qixia Jiang, Junxing Hu

Low-Rank Adaptation (LoRA) offers a parameter-efficient paradigm for tuning large models. While recent spectral initialization methods improve convergence and performance over the naive “Noise & Zeros” scheme, their extra computational and storage overhead undermines efficiency. In this paper, we establish update magnitude as the fundamental driver of LoRA performance and propose LoRAM, a magnitude-driven “Basis & Basis” initialization scheme that matches spectral methods without their inefficiencies. Our key contributions are threefold: (i) Magnitude of weight updates determines convergence. We prove that low-rank structures intrinsically bound update magnitudes, unifying hyperparameter tuning in learning rate, scaling factor, and initialization as mechanisms to optimize magnitude regulation. (ii) Spectral initialization succeeds via magnitude amplification. We show that the presumed knowledge-driven benefit of the spectral component essentially arises from the boost in the weight update magnitude. (iii) A novel and compact initialization strategy, LoRAM, scales deterministic orthogonal bases using pretrained weight magnitudes to simulate spectral gains. Extensive experiments show that LoRAM serves as a strong baseline, retaining the full efficiency of LoRA while matching or outperforming spectral initialization across benchmarks.
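
Illustrative sketch (standard LoRA, not LoRAM): the low-rank update is (alpha / r) * B @ A. Assuming "Noise & Zeros" refers to the usual default of a random A and an all-zero B, the update is exactly zero at initialization, and the scaling factor alpha multiplies its magnitude linearly once B is non-zero. All shapes and values below are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_delta(B, A, alpha, rank):
    """Low-rank weight update (alpha / rank) * B @ A."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]

def frobenius_norm(m):
    return sum(v * v for row in m for v in row) ** 0.5

random.seed(0)
d_out, d_in, rank, alpha = 4, 6, 2, 8.0
A = [[random.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]  # "noise"
B = [[0.0] * rank for _ in range(d_out)]                                   # "zeros"
delta = lora_delta(B, A, alpha, rank)  # all zeros: the adapted model starts at W

# With a non-zero B, doubling alpha doubles the update magnitude.
B_ones = [[1.0] * rank for _ in range(d_out)]
norm_1x = frobenius_norm(lora_delta(B_ones, A, alpha, rank))
norm_2x = frobenius_norm(lora_delta(B_ones, A, 2.0 * alpha, rank))
```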



Article 358

Title@2025-07-09 (3): On the Hardness of Unsupervised Domain Adaptation: Optimal Learners and Information-Theoretic Perspective

Title: On the Hardness of Unsupervised Domain Adaptation: Optimal Learners and Information-Theoretic Perspective Über die Härte der unüberwachten Domänenanpassung: Optimale Lernende und informationstheoretische Perspektive 关于未受监督的领域适应的难度:最佳学习者和信息理论视角 2507.06552v1

Authors (3): Zhiyi Dong, Zixuan Liu, Yongyi Mao

This paper studies the hardness of unsupervised domain adaptation (UDA) under covariate shift. We model the uncertainty that the learner faces by a distribution $\pi$ over the ground-truth triples $(p, q, f)$, which we call a UDA class, where $(p, q)$ is the source-target distribution pair and $f$ is the classifier. We define the performance of a learner as the overall target domain risk, averaged over the randomness of the ground-truth triple. This formulation couples the source distribution, the target distribution and the classifier in the ground truth, and deviates from the classical worst-case analyses, which pessimistically emphasize the impact of hard but rare UDA instances. In this formulation, we precisely characterize the optimal learner. The performance of the optimal learner then allows us to define the learning difficulty for the UDA class and for the observed sample. To quantify this difficulty, we introduce an information-theoretic quantity, Posterior Target Label Uncertainty (PTLU), along with its empirical estimate (EPTLU) from the sample, which capture the uncertainty in the prediction for the target domain. Briefly, PTLU is the entropy of the predicted label in the target domain under the posterior distribution of the ground-truth classifier given the observed source and target samples. By proving that such a quantity serves to lower-bound the risk of any learner, we suggest that these quantities can be used as proxies for evaluating the hardness of UDA learning. We provide several examples to demonstrate the advantage of PTLU, relative to the existing measures, in evaluating the difficulty of UDA learning.
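
Illustrative sketch (a simplification, not the paper's exact definition): for a finite set of candidate classifiers with posterior weights, the label at a target point has a predictive distribution obtained by summing the weights of the classifiers that output each label, and its entropy measures how much the posterior disagrees there. The threshold classifiers and weights below are hypothetical.

```python
import math

def predictive_label_entropy(posterior, classifiers, x):
    """Entropy (in bits) of the label at x when the classifier is drawn from `posterior`."""
    label_prob = {}
    for weight, f in zip(posterior, classifiers):
        label = f(x)
        label_prob[label] = label_prob.get(label, 0.0) + weight
    return -sum(p * math.log2(p) for p in label_prob.values() if p > 0.0)

# Hypothetical posterior over three threshold classifiers on the real line.
classifiers = [lambda x, t=t: int(x > t) for t in (0.0, 1.0, 2.0)]
posterior = [0.5, 0.25, 0.25]

h_far = predictive_label_entropy(posterior, classifiers, 5.0)   # all classifiers agree
h_near = predictive_label_entropy(posterior, classifiers, 0.5)  # posterior mass splits evenly
```

Target points where the plausible classifiers disagree carry high entropy, which matches the intuition that such regions make the adaptation problem hard.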



Article 359

Title@2025-07-09 (3): Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery

Title: Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery Q2SAR: Ein Quantum-Multiple-Kernel-Lernansatz für die Drogenentdeckung Q2SAR:药物发现量子多核心学习方法 2506.14920v3

Authors (5): Alejandro Giraldo, Daniel Ruiz, Mariano Caruso, Javier Mancilla, Guido Bellomo

Quantitative Structure-Activity Relationship (QSAR) modeling is a cornerstone of computational drug discovery. This research demonstrates the successful application of a Quantum Multiple Kernel Learning (QMKL) framework to enhance QSAR classification, showing a notable performance improvement over classical methods. We apply this methodology to a dataset for identifying DYRK1A kinase inhibitors. The workflow involves converting SMILES representations into numerical molecular descriptors, reducing dimensionality via Principal Component Analysis (PCA), and employing a Support Vector Machine (SVM) trained on an optimized combination of multiple quantum and classical kernels. By benchmarking the QMKL-SVM against a classical Gradient Boosting model, we show that the quantum-enhanced approach achieves a superior AUC score, highlighting its potential to provide a quantum advantage in challenging cheminformatics classification tasks.
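
Illustrative sketch (classical kernels only): multiple kernel learning combines several Gram matrices into one, commonly as a convex combination K = sum_i w_i * K_i, which can then be passed to an SVM as a precomputed kernel. The two kernels, the fixed weights, and the toy data below are assumptions; in the setting described above, some of the K_i would come from quantum kernels and the weights would be optimized.

```python
import math

def rbf_kernel(x, y, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram(kernel, data):
    """Gram matrix of `kernel` evaluated on all pairs of points in `data`."""
    return [[kernel(x, y) for y in data] for x in data]

def combine(grams, weights):
    """Convex combination of Gram matrices: K = sum_i w_i * K_i."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams))
             for j in range(n)] for i in range(n)]

data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical descriptors after PCA
grams = [gram(lambda x, y: rbf_kernel(x, y, 0.5), data), gram(linear_kernel, data)]
K = combine(grams, [0.5, 0.5])
```

A convex combination of positive semidefinite Gram matrices is itself positive semidefinite, so K remains a valid kernel matrix.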



Article 360

Title@2025-07-09 (3): Deep-Learning-Based Pre-Layout Parasitic Capacitance Prediction on SRAM Designs

Title: Deep-Learning-Based Pre-Layout Parasitic Capacitance Prediction on SRAM Designs Deep-Learning-based Pre-Layout Parasitic Capacitance Prediction auf SRAM-Designs 关于SRRAM设计设计的深层学习的Layount前寄生虫能力预测 2507.06549v1

Authors (6): Shan Shen, Dingcheng Yang, Yuyang Xie, Chunyan Pei, Wenjian Yu, Bei Yu

To achieve higher system energy efficiency, SRAM in SoCs is often customized. Parasitic effects cause notable discrepancies between pre-layout and post-layout circuit simulations, leading to difficulty in converging design parameters and excessive design iterations. Can the parasitics be predicted accurately from the pre-layout circuit, so that parasitic-aware pre-layout simulation becomes possible? In this work, we propose a deep-learning-based 2-stage model to accurately predict these parasitics in pre-layout stages. The model combines a Graph Neural Network (GNN) classifier and Multi-Layer Perceptron (MLP) regressors, effectively managing class imbalance of the net parasitics in SRAM circuits. We also employ Focal Loss to mitigate the impact of abundant internal net samples and integrate subcircuit information into the graph to abstract the hierarchical structure of schematics. Experiments on 4 real SRAM designs show that our approach reduces parasitic prediction error by up to 19X relative to the state-of-the-art model and speeds up the simulation process by up to 598X.
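
Illustrative sketch (the standard binary focal loss, without the optional class-weighting term): the loss is -(1 - p_t)^gamma * log(p_t), where p_t is the probability assigned to the true class. Easy, well-classified samples are down-weighted, so an abundant class contributes less to the total loss. The value gamma = 2 and the example probabilities are assumptions; the paper's exact variant and settings are not given in the abstract.

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for predicted probability p of class 1 and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss(0.95, 1)                           # confident and correct: heavily down-weighted
hard = focal_loss(0.10, 1)                           # confident and wrong: stays large
cross_entropy_easy = focal_loss(0.95, 1, gamma=0.0)  # gamma = 0 recovers cross-entropy
```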



Article 361

Title@2025-07-09 (3): Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution

Title: Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution Konzept-TRAK: Verstehen, wie Diffusionsmodelle Konzepte durch Konzept-Level-Attribution lernen 概念-TRAK:了解传播模式如何通过概念层面的归属来学习概念 2507.06547v1

Authors (10): Yonghyun Park, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Woosung Choi, Kin Wai Cheuk, Junghyun Koo, Yuki Mitsufuji

While diffusion models excel at image generation, their growing adoption raises critical concerns around copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions to specific elements, such as styles or objects, that matter most to stakeholders. To bridge this gap, we introduce concept-level attribution via a novel method called Concept-TRAK. Concept-TRAK extends influence functions with two key innovations: (1) a reformulated diffusion training loss based on diffusion posterior sampling, enabling robust, sample-specific attribution; and (2) a concept-aware reward function that emphasizes semantic relevance. We evaluate Concept-TRAK on the AbC benchmark, showing substantial improvements over prior methods. Through diverse case studies, ranging from identifying IP-protected and unsafe content to analyzing prompt engineering and compositional learning, we demonstrate how concept-level attribution yields actionable insights for responsible generative AI development and governance.



Article 362

Title@2025-07-09 (3): Semantic Augmentation in Images using Language

Title: Semantic Augmentation in Images using Language Semantische Augmentation in Bildern mit Sprache 语言图像中的语义放大 2404.02353v2

Authors (5): Sahiti Yerramilli, Jayant Sravan Tamarapalli, Tanmay Girish Kulkarni, Jonathan Francis, Eric Nyberg

Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples. Recent advancements in diffusion models have enabled the generation of photorealistic images based on textual inputs. Leveraging the substantial datasets used to train these diffusion models, we propose a technique to utilize generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization capabilities of deep learning models.



Article 363

Title@2025-07-09 (3): A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning

Title: A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning Eine einzige Zusammenführung: Wiederherstellung serverbasierter Lernleistung im dezentralisierten Lernen 单一合并条件:在分散学习中恢复基于服务器的学习绩效 2507.06542v1

Authors (5): Tongtian Zhu, Tianyu Zhang, Mingze Wang, Zhanpeng Zhou, Can Wang

Decentralized learning provides a scalable alternative to traditional parameter-server-based training, yet its performance is often hindered by limited peer-to-peer communication. In this paper, we study how communication should be scheduled over time, including determining when and how frequently devices synchronize. Our empirical results show that concentrating communication budgets in the later stages of decentralized training markedly improves global generalization. Surprisingly, we uncover that fully connected communication at the final step, implemented by a single global merging, is sufficient to match the performance of server-based training. We further show that low communication in decentralized learning preserves the mergeability of local models throughout training. Our theoretical contributions, which explain these phenomena, are the first to establish that the globally merged model of decentralized SGD can converge faster than centralized mini-batch SGD. Technically, we offer a novel reinterpretation of part of the discrepancy among local models, previously considered detrimental noise, as constructive components that accelerate convergence. This work challenges the common belief that decentralized learning generalizes poorly under data heterogeneity and limited communication, while offering new insights into model merging and neural network loss landscapes.
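
Illustrative sketch (a deliberately extreme toy, not the paper's setup): each worker runs gradient descent on its own scalar quadratic loss with no communication at all, and the models are averaged once at the end. For this toy problem the single merge recovers the minimizer of the summed loss. The loss, the targets, and the use of plain averaging as the "merging" operator are assumptions for illustration.

```python
def local_sgd(w, target, steps=100, lr=0.1):
    """Gradient descent on the local loss 0.5 * (w - target)^2."""
    for _ in range(steps):
        w -= lr * (w - target)
    return w

def merge(models):
    """Single global merging, sketched here as plain parameter averaging."""
    return sum(models) / len(models)

local_targets = [1.0, 2.0, 6.0]  # heterogeneous local data (hypothetical)
local_models = [local_sgd(0.0, t) for t in local_targets]
merged = merge(local_models)     # one communication round, at the very end

# The summed loss sum_i 0.5 * (w - t_i)^2 is minimized at the mean of the targets.
global_optimum = sum(local_targets) / len(local_targets)
```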



Article 364

Title@2025-07-09 (3): Attribution Regularization for Multimodal Paradigms

Title: Attribution Regularization for Multimodal Paradigms Zuordnungsregularisierung für multimodale Paradigmen 多式联运标准规范化 2404.02359v2

Authors (4): Sahiti Yerramilli, Jayant Sravan Tamarapalli, Jonathan Francis, Eric Nyberg

Multimodal machine learning has gained significant attention in recent years due to its potential for integrating information from multiple modalities to enhance learning and decision-making processes. However, it is commonly observed that unimodal models outperform multimodal models, despite the latter having access to richer information. Additionally, the influence of a single modality often dominates the decision-making process, resulting in suboptimal performance. This research project aims to address these challenges by proposing a novel regularization term that encourages multimodal models to effectively utilize information from all modalities when making decisions. The focus of this project lies in the video-audio domain, although the proposed regularization technique holds promise for broader applications in embodied AI research, where multiple modalities are involved. By leveraging this regularization term, the proposed approach aims to mitigate the issue of unimodal dominance and improve the performance of multimodal machine learning systems. Through extensive experimentation and evaluation, the effectiveness and generalizability of the proposed technique will be assessed. The findings of this research project have the potential to significantly contribute to the advancement of multimodal machine learning and facilitate its application in various domains, including multimedia analysis, human-computer interaction, and embodied AI research.



Article 365

Title@2025-07-09 (3): Graph-based Fake Account Detection: A Survey

Title: Graph-based Fake Account Detection: A Survey Grafische Fake Account Detection: Eine Umfrage 以图表为基础的假账户探测:调查 2507.06541v1

Authors (2): Ali Safarpoor Dehkordi, Ahad N. Zehmakan

In recent years, there has been a growing effort to develop effective and efficient algorithms for fake account detection in online social networks. This survey comprehensively reviews existing methods, with a focus on graph-based techniques that utilise topological features of social graphs (in addition to account information, such as their shared contents and profile data) to distinguish between fake and real accounts. We provide several categorisations of these methods (for example, based on techniques used, input data, and detection time), discuss their strengths and limitations, and explain how these methods connect in the broader context. We also investigate the available datasets, including both real-world data and synthesised models. We conclude the paper by proposing several potential avenues for future research.



Article 366

Title@2025-07-09 (3): Understanding Malware Propagation Dynamics through Scientific Machine Learning

Title: Understanding Malware Propagation Dynamics through Scientific Machine Learning Malware-Propagation-Dynamik durch wissenschaftliches maschinelles Lernen verstehen 通过科学机器学习了解恶意传播动态 2507.07143v1

Authors (5): Karthik Pappu, Prathamesh Dinesh Joshi, Raj Abhijit Dandekar, Rajat Dandekar, Sreedath Panat

Accurately modeling malware propagation is essential for designing effective cybersecurity defenses, particularly against adaptive threats that evolve in real time. While traditional epidemiological models and recent neural approaches offer useful foundations, they often fail to fully capture the nonlinear feedback mechanisms present in real-world networks. In this work, we apply scientific machine learning to malware modeling by evaluating three approaches: classical Ordinary Differential Equations (ODEs), Universal Differential Equations (UDEs), and Neural ODEs. Using data from the Code Red worm outbreak, we show that the UDE approach reduces prediction error by 44% compared to both traditional and neural baselines, while preserving interpretability. We introduce a symbolic recovery method that transforms the learned neural feedback into explicit mathematical expressions, revealing suppression mechanisms such as network saturation, security response, and malware variant evolution. Our results demonstrate that hybrid physics-informed models can outperform both purely analytical and purely neural approaches, offering improved predictive accuracy and deeper insight into the dynamics of malware spread. These findings support the development of early warning systems, efficient outbreak response strategies, and targeted cyber defense interventions.
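
Illustrative sketch (a classical baseline only): a common epidemiological starting point for worm spread is the logistic susceptible-infected model dI/dt = beta * I * (1 - I / N), integrated here with forward Euler. The model choice and every parameter value are assumptions and are not fitted to the Code Red data. A universal differential equation would keep this known term and add a small neural network to the right-hand side to learn the missing feedback.

```python
def simulate_si(beta, n_hosts, infected0, dt, n_steps):
    """Forward-Euler integration of dI/dt = beta * I * (1 - I / N)."""
    infected = infected0
    trajectory = [infected]
    for _ in range(n_steps):
        infected += dt * beta * infected * (1.0 - infected / n_hosts)
        trajectory.append(infected)
    return trajectory

# Hypothetical parameters: infection rate, population size, initial infections.
traj = simulate_si(beta=1.8, n_hosts=1000.0, infected0=1.0, dt=0.01, n_steps=2000)
```

The trajectory rises slowly, then sharply, then saturates near the population size, the S-shaped curve that richer models refine.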



Article 367

Title@2025-07-09 (3): Few-shot Learning on AMS Circuits and Its Application to Parasitic Capacitance Prediction

Title: Few-shot Learning on AMS Circuits and Its Application to Parasitic Capacitance Prediction Nur wenige Fotos auf AMS-Schaltungen und seine Anwendung auf parasitäre Kapazität Vorhersage 关于AMS电路及其在寄生虫能力预测中的应用的微小学习 2507.06538v1

Authors (4): Shan Shen, Yibin Zhang, Hector Rodriguez Rodriguez, Wenjian Yu

Graph representation learning is a powerful method to extract features from graph-structured data, such as analog/mixed-signal (AMS) circuits. However, training deep learning models for AMS designs is severely limited by the scarcity of integrated circuit design data. In this work, we present CircuitGPS, a few-shot learning method for parasitic effect prediction in AMS circuits. The circuit netlist is represented as a heterogeneous graph, with the coupling capacitance modeled as a link. CircuitGPS is pre-trained on link prediction and fine-tuned on edge regression. The proposed method starts with a small-hop sampling technique that converts a link or a node into a subgraph. Then, the subgraph embeddings are learned with a hybrid graph Transformer. Additionally, CircuitGPS integrates a low-cost positional encoding that summarizes the positional and structural information of the sampled subgraph. CircuitGPS improves the accuracy of coupling existence by at least 20% and reduces the MAE of capacitance estimation by at least 0.067 compared to existing methods. Our method demonstrates strong inherent scalability, enabling direct application to diverse AMS circuit designs through zero-shot learning. Furthermore, the ablation studies provide valuable insights into graph models for representation learning.



Article 368

Title@2025-07-09 (3): Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits

Title: Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits Übertragbare parasitäre Abschätzung über Graph Contrastive Learning und Label Rebalancing in AMS Circuits AMS 电路中通过图表对比学习和标签重新平衡进行可转移的寄生虫估计 2507.06535v1

Authors (7): Shan Shen, Shenglu Hua, Jiajun Zou, Jiawei Liu, Jianwang Zhai, Chuan Shi, Wenjian Yu

Graph representation learning on Analog-Mixed Signal (AMS) circuits is crucial for various downstream tasks, e.g., parasitic estimation. However, the scarcity of design data, the unbalanced distribution of labels, and the inherent diversity of circuit implementations pose significant challenges to learning robust and transferable circuit representations. To address these limitations, we propose CircuitGCL, a novel graph contrastive learning framework that integrates representation scattering and label rebalancing to enhance transferability across heterogeneous circuit graphs. CircuitGCL employs a self-supervised strategy to learn topology-invariant node embeddings through hyperspherical representation scattering, eliminating dependency on large-scale data. Simultaneously, balanced mean squared error (MSE) and softmax cross-entropy (bsmCE) losses are introduced to mitigate label distribution disparities between circuits, enabling robust and transferable parasitic estimation. Evaluated on parasitic capacitance estimation (edge-level task) and ground capacitance classification (node-level task) across TSMC 28nm AMS designs, CircuitGCL outperforms all state-of-the-art (SOTA) methods, with an $R^2$ improvement of $33.64\% \sim 44.20\%$ for edge regression and an F1-score gain of $0.9\times \sim 2.1\times$ for node classification. Our code is available at https://anonymous.4open.science/r/CircuitGCL-099B/README.md.



Article 369

Title@2025-07-09 (3): Medical Image Segmentation Using Advanced Unet: VMSE-Unet and VM-Unet CBAM+

Title: Medical Image Segmentation Using Advanced Unet: VMSE-Unet and VM-Unet CBAM+ Medizinische Bildsegmentierung mit Advanced Unet: VMSE-Unet und VM-Unet CBAM+ 利用高级Unet:VMSE-Unet和VM-Unet CBAM+ 2507.00511v2

Authors (6): Sayandeep Kanrar, Raja Piyush, Qaiser Razi, Debanshi Chakraborty, Vikas Hassija, GSS Chalapathi

In this paper, we present VMSE-Unet and VM-Unet CBAM+, two cutting-edge deep learning architectures designed to enhance medical image segmentation. Our approach integrates Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM) techniques into the traditional VM-Unet framework, significantly improving segmentation accuracy, feature localization, and computational efficiency. Both models show superior performance compared to the baseline VM-Unet across multiple datasets. Notably, VMSE-Unet achieves the highest accuracy, IoU, precision, and recall while maintaining low loss values. It also exhibits exceptional computational efficiency, with faster inference times and lower memory usage on both GPU and CPU. Overall, the study suggests that the enhanced architecture VMSE-Unet is a valuable tool for medical image analysis. These findings highlight its potential for real-world clinical applications, emphasizing the importance of further research to optimize accuracy, robustness, and computational efficiency.



Article 370

Title@2025-07-09 (3): From large-eddy simulations to deep learning: A U-net model for fast urban canopy flow predictions

Title: From large-eddy simulations to deep learning: A U-net model for fast urban canopy flow predictions Von Big-eddy-Simulationen bis zum Deep Learning: Ein U-Net-Modell für schnelle urbane Baldachinflussprognosen 从大成模拟到深学习:快速城市树冠流预测的U-net模型 2507.06533v1

Authors (2): Themistoklis Vargiemezis, Catherine Gorlé

Accurate prediction of wind flow fields in urban canopies is crucial for ensuring pedestrian comfort, safety, and sustainable urban design. Traditional methods using wind tunnels and Computational Fluid Dynamics, such as Large-Eddy Simulations (LES), are limited by high costs, computational demands, and time requirements. This study presents a deep neural network (DNN) approach for fast and accurate predictions of urban wind flow fields, reducing computation time from on the order of 10 hours on 32 CPUs for one LES evaluation to on the order of 1 second on a single GPU using the DNN model. We employ a U-Net architecture trained on LES data including 252 synthetic urban configurations at seven wind directions ($0^{\circ}$ to $90^{\circ}$ in $15^{\circ}$ increments). The model predicts two key quantities of interest: mean velocity magnitude and streamwise turbulence intensity, at multiple heights within the urban canopy. The U-Net uses 2D building representations augmented with signed distance functions and their gradients as inputs, forming a $256\times256\times9$ tensor. In addition, a Spatial Attention Module is used for feature transfer through skip connections. The loss function combines the root-mean-square error of predictions, their gradient magnitudes, and L2 regularization. Model evaluation on 50 test cases demonstrates high accuracy with an overall mean relative error of 9.3% for velocity magnitude and 5.2% for turbulence intensity. This research shows the potential of deep learning approaches to provide fast, accurate urban wind assessments essential for creating comfortable and safe urban environments. Code is available at https://github.com/tvarg/Urban-FlowUnet.git
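
Illustrative sketch (one rectangular building, not the paper's preprocessing): a signed distance function gives every grid cell its distance to the nearest building wall, with the sign telling inside from outside. The closed-form expression below is the standard one for an axis-aligned rectangle. The sign convention (negative inside), the grid size, and the footprint are assumptions.

```python
import math

def box_sdf(px, py, cx, cy, half_w, half_h):
    """Signed distance from point (px, py) to an axis-aligned rectangle.

    Negative inside the footprint, positive outside (a convention assumed
    here; the opposite sign is equally common).
    """
    dx = abs(px - cx) - half_w
    dy = abs(py - cy) - half_h
    outside = math.hypot(max(dx, 0.0), max(dy, 0.0))
    inside = min(max(dx, dy), 0.0)
    return outside + inside

def sdf_grid(n, cx, cy, half_w, half_h):
    """Sample the signed distance at the cell centres of an n x n unit grid."""
    return [[box_sdf(j + 0.5, i + 0.5, cx, cy, half_w, half_h)
             for j in range(n)] for i in range(n)]

# Hypothetical 8 x 8 grid with a 4 x 2 building centred at (4, 4).
grid = sdf_grid(8, cx=4.0, cy=4.0, half_w=2.0, half_h=1.0)
```

Unlike a binary building mask, this field varies smoothly with distance, so cells far from any wall still carry geometric information.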



Article 371

Title@2025-07-09 (3): A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence

Title: A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence Ein Politik-Gradient-Ansatz zur Lösung von Unvollkommenheits-Informationsspielen mit bestmöglicher Konvergenz 以政策优先方式解决最不完美信息游戏与最佳战略汇合问题 2408.00751v2

Authors (3): Mingyang Liu, Gabriele Farina, Asuman Ozdaglar

Policy gradient methods have become a staple of any single-agent reinforcement learning toolbox, due to their combination of desirable properties: iterate convergence, efficient use of stochastic trajectory feedback, and theoretically-sound avoidance of importance sampling corrections. In multi-agent imperfect-information settings (extensive-form games), however, it is still unknown whether the same desiderata can be guaranteed while retaining theoretical guarantees. Instead, sound methods for extensive-form games rely on approximating \emph{counterfactual} values (as opposed to Q values), which are incompatible with policy gradient methodologies. In this paper, we investigate whether policy gradient can be safely used in two-player zero-sum imperfect-information extensive-form games (EFGs). We establish positive results, showing for the first time that a policy gradient method leads to provable best-iterate convergence to a regularized Nash equilibrium in self-play.



Article 372

Title@2025-07-09 (3): Direct Regret Optimization in Bayesian Optimization

Title: Direct Regret Optimization in Bayesian Optimization Direkte Bedauern Optimierung in Bayesian Optimierung Bayesian优化的直接遗憾优化 2507.06529v1

Authors (2): Fengxue Zhang, Yuxin Chen

Bayesian optimization (BO) is a powerful paradigm for optimizing expensive black-box functions. Traditional BO methods typically rely on separate hand-crafted acquisition functions and surrogate models for the underlying function, and often operate in a myopic manner. In this paper, we propose a novel direct regret optimization approach that jointly learns the optimal model and non-myopic acquisition by distilling from a set of candidate models and acquisitions, and explicitly targets minimizing the multi-step regret. Our framework leverages an ensemble of Gaussian Processes (GPs) with varying hyperparameters to generate simulated BO trajectories, each guided by an acquisition function chosen from a pool of conventional choices, until a Bayesian early stop criterion is met. These simulated trajectories, capturing multi-step exploration strategies, are used to train an end-to-end decision transformer that directly learns to select next query points aimed at improving the ultimate objective. We further adopt a dense training–sparse learning paradigm: The decision transformer is trained offline with abundant simulated data sampled from ensemble GPs and acquisitions, while a limited number of real evaluations refine the GPs online. Experimental results on synthetic and real-world benchmarks suggest that our method consistently outperforms BO baselines, achieving lower simple regret and demonstrating more robust exploration in high-dimensional or noisy settings.



Article 373

Title@2025-07-09 (3): InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior

Title: InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior InvestAlign: Überwindung von Datenknappheit bei der Ausrichtung großer Sprachmodelle mit Entscheidungsprozessen von Investoren unter Herdverhalten 投资对称:克服在将大语言模式与牧民行为下的投资者决策过程相匹配方面的数据稀缺性 2507.06528v1

Authors (6): Huisheng Wang, Zhuoshi Pan, Hangjing Zhang, Mingxiao Liu, Hanqing Gao, H. Vicky Zhao

Aligning Large Language Models (LLMs) with investor decision-making processes under herd behavior is a critical challenge in behavioral finance, which grapples with a fundamental limitation: the scarcity of real-user data needed for Supervised Fine-Tuning (SFT). While SFT can bridge the gap between LLM outputs and human behavioral patterns, its reliance on massive authentic data imposes substantial collection costs and privacy risks. We propose InvestAlign, a novel framework that constructs high-quality SFT datasets by leveraging theoretical solutions to similar and simple optimal investment problems rather than complex scenarios. Our theoretical analysis demonstrates that training LLMs with InvestAlign-generated data achieves faster parameter convergence than using real-user data, suggesting superior learning efficiency. Furthermore, we develop InvestAgent, an LLM agent fine-tuned with InvestAlign, which demonstrates significantly closer alignment to real-user data than pre-SFT models in both simple and complex investment problems. This highlights our proposed InvestAlign as a promising approach with the potential to address complex optimal investment problems and align LLMs with investor decision-making processes under herd behavior. Our code is publicly available at https://github.com/thu-social-network-research-group/InvestAlign.



Article 374

Title@2025-07-09 (3): Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Title: Emergence in non-neural models: grokking modular arithmetic via average gradient outer product Entstehung in nicht-neuralen Modellen: Grokking modulare Arithmetik über durchschnittliche Gradienten-Außenprodukt 非新模式的出现:通过平均梯度外产产品计算模块计算 2407.20199v3

Authors (6): Neil Mallinar, Daniel Beaglehole, Libin Zhu, Adityanarayanan Radhakrishnan, Parthe Pandit, Mikhail Belkin

Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy in the training process. It is often taken as an example of “emergence”, where model ability manifests sharply through a phase transition. In this work, we show that the phenomenon of grokking is not specific to neural networks nor to gradient descent-based optimization. Specifically, we show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines (RFM), an iterative algorithm that uses the Average Gradient Outer Product (AGOP) to enable task-specific feature learning with general machine learning models. When used in conjunction with kernel machines, iterating RFM results in a fast transition from random, near zero, test accuracy to perfect test accuracy. This transition cannot be predicted from the training loss, which is identically zero, nor from the test loss, which remains constant in initial iterations. Instead, as we show, the transition is completely determined by feature learning: RFM gradually learns block-circulant features to solve modular arithmetic. Paralleling the results for RFM, we show that neural networks that solve modular arithmetic also learn block-circulant features. Furthermore, we present theoretical evidence that RFM uses such block-circulant features to implement the Fourier Multiplication Algorithm, which prior work posited as the generalizing solution neural networks learn on these tasks. Our results demonstrate that emergence can result purely from learning task-relevant features and is not specific to neural architectures nor gradient descent-based optimization methods. Furthermore, our work provides more evidence for AGOP as a key mechanism for feature learning in neural networks.
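
Illustrative sketch (the AGOP computation only, not a full Recursive Feature Machine): the Average Gradient Outer Product of a predictor f over points x_1, ..., x_n is the matrix (1/n) * sum_i grad f(x_i) grad f(x_i)^T. Directions the predictor ignores get zero weight. The finite-difference gradient, the toy predictor, and the sample points are assumptions for illustration.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad

def agop(f, points):
    """Average Gradient Outer Product: mean over points of grad f(x) grad f(x)^T."""
    dim = len(points[0])
    total = [[0.0] * dim for _ in range(dim)]
    for x in points:
        g = numerical_gradient(f, x)
        for i in range(dim):
            for j in range(dim):
                total[i][j] += g[i] * g[j]
    return [[v / len(points) for v in row] for row in total]

def toy_predictor(x):
    # Depends only on the first coordinate.
    return 3.0 * x[0]

points = [[0.0, 1.0, 2.0], [1.0, -1.0, 0.5], [2.0, 0.0, -3.0]]
M = agop(toy_predictor, points)  # only M[0][0] is non-zero, approximately 9
```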



Article 375

Title@2025-07-09 (3): The Power of Regularization in Solving Extensive-Form Games

Title: The Power of Regularization in Solving Extensive-Form Games Die Macht der Regularisierung bei der Lösung umfangreicher Form Spiele 解决广泛形式运动会的正规化权力 2206.09495v3

Authors (4): Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

In this paper, we investigate the power of regularization, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast $\tilde O(1/T)$ last-iterate convergence rate for the output of the algorithm in terms of duality gap and distance to the set of Nash equilibria (NE), without assuming uniqueness of the NE. Second, we show that regularized counterfactual regret minimization (Reg-CFR), with a variant of the optimistic mirror descent algorithm as regret minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates for finding NE in EFGs. Finally, we show that Reg-CFR can achieve asymptotic last-iterate convergence, and the optimal $O(1/T)$ average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, these constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate in finding NE for non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms.



Article 376

Title@2025-07-09 (3): AdaDPIGU: Differentially Private SGD with Adaptive Clipping and Importance-Based Gradient Updates for Deep Neural Networks

Title: AdaDPIGU: Differentially Private SGD with Adaptive Clipping and Importance-Based Gradient Updates for Deep Neural Networks AdaDPIGU: Differential Private SGD mit adaptivem Clipping und wichtigen Gradienten-Updates für tiefe neurale Netzwerke AdaDPIGU: 与深神经网络的适应性滑动和基于重要性的梯度更新有区别的私人SGD SGD 2507.06525v1

Authors (2): Huiqi Zhang, Fang Xie

Differential privacy has been proven effective for stochastic gradient descent; however, existing methods often suffer from performance degradation in high-dimensional settings, as the scale of injected noise increases with dimensionality. To tackle this challenge, we propose AdaDPIGU, a new differentially private SGD framework with importance-based gradient updates tailored for deep neural networks. In the pretraining stage, we apply a differentially private Gaussian mechanism to estimate the importance of each parameter while preserving privacy. During the gradient update phase, we prune low-importance coordinates and introduce a coordinate-wise adaptive clipping mechanism, enabling sparse and noise-efficient gradient updates. Theoretically, we prove that AdaDPIGU satisfies $(\varepsilon, \delta)$-differential privacy and retains convergence guarantees. Extensive experiments on standard benchmarks validate the effectiveness of AdaDPIGU. All results are reported under a fixed retention ratio of 60%. On MNIST, our method achieves a test accuracy of 99.12% under a privacy budget of $\varepsilon = 8$, nearly matching the non-private model. Remarkably, on CIFAR-10, it attains 73.21% accuracy at $\varepsilon = 4$, outperforming the non-private baseline of 71.12%, demonstrating that adaptive sparsification can enhance both privacy and utility.
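The general shape of a sparse, clipped, noised gradient step can be sketched as follows. This is a simplified stand-in, not AdaDPIGU itself: it uses a single global L2 clipping norm rather than the coordinate-wise adaptive clipping described above, it takes the keep mask as given instead of estimating importance privately, and all function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def clip_by_l2_norm(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Rescale a gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def sparse_dp_gradient(per_example_grads: Sequence[Sequence[float]],
                       keep_mask: Sequence[bool],
                       clip_norm: float,
                       noise_multiplier: float,
                       rng: random.Random) -> List[float]:
    """Mask, clip per example, sum, add Gaussian noise on kept coordinates, average."""
    dim = len(keep_mask)
    total = [0.0] * dim
    for grad in per_example_grads:
        # Assumption: keep_mask was itself obtained in a privacy-preserving way.
        masked = [g if keep else 0.0 for g, keep in zip(grad, keep_mask)]
        clipped = clip_by_l2_norm(masked, clip_norm)
        for i in range(dim):
            total[i] += clipped[i]
    sigma = noise_multiplier * clip_norm
    noisy = [total[i] + rng.gauss(0.0, sigma) if keep_mask[i] else 0.0
             for i in range(dim)]
    return [v / len(per_example_grads) for v in noisy]


grads = [[3.0, 4.0, 100.0], [0.3, 0.4, 5.0]]
mask = [True, True, False]  # hypothetical: the third coordinate was pruned
update = sparse_dp_gradient(grads, mask, clip_norm=1.0,
                            noise_multiplier=1.0, rng=random.Random(0))
```

Noise is drawn only for retained coordinates, so the total injected noise scales with the number of kept parameters rather than the full dimension, which is the intuition behind "noise-efficient" updates.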



Article 377

Title@2025-07-09 (3): Str-GCL: Structural Commonsense Driven Graph Contrastive Learning

Title: Str-GCL: Structural Commonsense Driven Graph Contrastive Learning Str-GCL: Structural Commonsense Driven Graph Contrastive Learning Str-GCL:结构常识驱动图对比性学习 2507.07141v1

Authors (5): Dongxiao He, Yongqi Huang, Jitao Zhao, Xiaobao Wang, Zhen Wang

Graph Contrastive Learning (GCL) is a widely adopted approach in self-supervised graph representation learning, applying contrastive objectives to produce effective representations. However, current GCL methods primarily focus on capturing implicit semantic relationships, often overlooking the structural commonsense embedded within the graph’s structure and attributes, which contains underlying knowledge crucial for effective representation learning. Due to the lack of explicit information and clear guidance in general graphs, identifying and integrating such structural commonsense in GCL poses a significant challenge. To address this gap, we propose a novel framework called Structural Commonsense Unveiling in Graph Contrastive Learning (Str-GCL). Str-GCL leverages first-order logic rules to represent structural commonsense and explicitly integrates them into the GCL framework. It introduces topological and attribute-based rules without altering the original graph and employs a representation alignment mechanism to guide the encoder in effectively capturing this commonsense. To the best of our knowledge, this is the first attempt to directly incorporate structural commonsense into GCL. Extensive experiments demonstrate that Str-GCL outperforms existing GCL methods, providing a new perspective on leveraging structural commonsense in graph representation learning.



Article 378

Title@2025-07-09 (3): Instance-Wise Monotonic Calibration by Constrained Transformation

Title: Instance-Wise Monotonic Calibration by Constrained Transformation Instance-Wise Monotonische Kalibrierung durch eingeschränkte Transformation 由受控变换产生的实度- Wise 单声调校准 2507.06516v1

Authors (3): Yunrui Zhang, Gustavo Batista, Salil S. Kanhere

Deep neural networks often produce miscalibrated probability estimates, leading to overconfident predictions. A common approach for calibration is fitting a post-hoc calibration map on unseen validation data that transforms predicted probabilities. A key desirable property of the calibration map is instance-wise monotonicity (i.e., preserving the ranking of probability outputs). However, most existing post-hoc calibration methods do not guarantee monotonicity. Previous monotonic approaches either use an under-parameterized calibration map with limited expressive ability or rely on black-box neural networks, which lack interpretability and robustness. In this paper, we propose a family of novel monotonic post-hoc calibration methods, which employ a constrained calibration map parameterized linearly with respect to the number of classes. Our proposed approach ensures expressiveness, robustness, and interpretability while preserving the relative ordering of the probability output by formulating the proposed calibration map as a constrained optimization problem. Our proposed methods achieve state-of-the-art performance across datasets with different deep neural network models, outperforming existing calibration methods while being data- and computation-efficient. Our code is available at https://github.com/YunruiZhang/Calibration-by-Constrained-Transformation
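To make instance-wise monotonicity concrete, the sketch below uses temperature scaling, a well-known single-parameter calibration map of the under-parameterized kind mentioned above. It is not the constrained transformation proposed in the paper; the logits and the temperature value are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scale(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by a positive temperature before the softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return softmax([z / temperature for z in logits])


def ranking(values: Sequence[float]) -> List[int]:
    """Class indices sorted from most to least probable."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


logits = [4.0, 1.0, 2.5]                 # hypothetical model output
before = softmax(logits)
after = temperature_scale(logits, 2.0)   # temperature > 1 softens confidence
```

Dividing every logit by the same positive constant cannot reorder the classes, so the predicted label and the full ranking are unchanged while the top probability shrinks; a more expressive monotonic map has to keep this property with more than one free parameter.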



Article 379

Title@2025-07-09 (3): Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Title: Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts Erforschen von Sparse Adaptern für skalierbare Zusammenführung von Parameter-Effizienten Experten 探索可缩放的参数集成高效专家的分散适配器 2507.07140v1

Authors (9): Samin Yeasar Arnob, Zhan Su, Minseon Kim, Oleksiy Ostapenko, Riyasat Ohib, Esra’a Saleh, Doina Precup, Lucas Caccia, Alessandro Sordoni

Merging parameter-efficient task experts has recently gained growing attention as a way to build modular architectures that can be rapidly adapted on the fly for specific downstream tasks, without requiring additional fine-tuning. Typically, LoRA serves as the foundational building block of such parameter-efficient modular architectures, leveraging low-rank weight structures to reduce the number of trainable parameters. In this paper, we study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature and surprisingly outperforms both LoRA and full fine-tuning in our setting. Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks, thus scaling beyond what is usually studied in the literature. Our findings demonstrate that sparse adapters yield superior in-distribution performance post-merging compared to LoRA or full model merging. Achieving strong held-out performance remains a challenge for all methods considered.



Article 380

Title@2025-07-09 (3): UniF$^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

Title: UniF$^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models UniF$^2$ace: Feinkörniges Gesichtsverständnis und -generierung mit einheitlichen multimodalen Modellen 2美元 UUF$$2美元ce:精美的面孔理解和具有统一多式联运模式的一代人 2503.08120v3

Authors (8): Junzhe Li, Xuerui Qiu, Linrui Xu, Liya Guo, Delin Qu, Tingting Long, Chun Fan, Ming Li

Unified multimodal models (UMMs) have emerged as a powerful paradigm in foundational computer vision research, demonstrating significant potential in both image understanding and generation. However, existing research in the face domain primarily focuses on coarse facial attribute understanding, with limited capacity to handle fine-grained facial attributes and without addressing generation capabilities. To overcome these limitations, we propose UniF$^2$ace, the first UMM tailored specifically for fine-grained face understanding and generation. In general, we train UniF$^2$ace on a self-constructed, specialized dataset utilizing two mutually beneficial diffusion techniques and a two-level mixture-of-experts architecture. Specifically, we first build a large-scale facial dataset, UniF$^2$ace-130K, which contains 130K image-text pairs with one million question-answering pairs that span a wide range of facial attributes. Second, we establish a theoretical connection between discrete diffusion score matching and masked generative models, optimizing both evidence lower bounds simultaneously, which significantly improves the model’s ability to synthesize facial details. Finally, we introduce both token-level and sequence-level mixture-of-experts, enabling efficient fine-grained representation learning for both understanding and generation tasks. Extensive experiments on UniF$^2$ace-130K demonstrate that UniF$^2$ace outperforms existing UMMs and generative models, achieving superior performance across both understanding and generation tasks.



Article 381

Title@2025-07-09 (3): Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings

Title: Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings Pun Intended: Multi-Agent Übersetzung von Wordplay mit Kontrastivem Lernen und phonetisch-semantischen Einbettungen 未预期:多代理翻译与矛盾学习和语音-语言嵌入的文字游戏的多代理翻译 2507.06506v1

Authors (3): Russell Taylor, Benjamin Herbert, Michael Sana

Translating wordplay across languages presents unique challenges that have long confounded both professional human translators and machine translation systems. This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation. Our methodology employs a three-stage approach. First, we establish a baseline using multiple frontier large language models with feedback based on a new contrastive learning dataset. Second, we implement a guided chain-of-thought pipeline with combined phonetic-semantic embeddings. Third, we implement a multi-agent generator-discriminator framework for evaluating and regenerating puns with feedback. Moving beyond the limitations of literal translation, our methodology’s primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary. Our best runs earned first and second place in the CLEF JOKER 2025 Task 2 competition where they were evaluated manually by expert native French speakers. This research addresses a gap between translation studies and computational linguistics by implementing linguistically-informed techniques for wordplay translation, advancing our understanding of how language models can be leveraged to handle the complex interplay between semantic ambiguity, phonetic similarity, and the implicit cultural and linguistic awareness needed for successful humor.



Article 382

Title@2025-07-09 (3): GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

Title: GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification GMLM: Überbrückung von Graph Neuronalen Netzwerken und Sprachmodellen für heterophile Knotenklassifikation GMLM: 血氧哲学节点分类的架接图形神经网络和语言模型 2503.05763v5

Authors (1): Aarush Sinha

Integrating powerful but computationally expensive Pre-trained Language Models (PLMs) with Graph Neural Networks (GNNs) is a key challenge, especially on text-rich heterophilic graphs. We propose the Graph Masked Language Model (GMLM), a framework designed for the efficient and effective fusion of graph structure and text semantics. GMLM employs a two-stage process: first, a contrastive pre-training stage with a novel soft masking technique builds a robust multi-scale GNN; second, an end-to-end fine-tuning stage uses a dynamic active node selection strategy for scalability and a bi-directional cross-attention module for deep fusion. Experiments on five heterophilic benchmarks show GMLM achieves state-of-the-art results on four, significantly outperforming prior GNN and large LLM-based methods. For instance, it improves accuracy on the Texas dataset by over 8% and on Wisconsin by nearly 5%. Our work demonstrates that a sophisticated, deeply-integrated architecture can be more effective and efficient than larger, general-purpose models for text-rich graph representation learning.



Article 383

Title@2025-07-09 (3): Oscillation-Reduced MXFP4 Training for Vision Transformers

Title: Oscillation-Reduced MXFP4 Training for Vision Transformers Oszillationsreduziertes MXFP4 Training für Vision Transformer MXFP4 愿景变形器培训 2502.20853v2

Authors (4): Yuxiang Chen, Haocheng Xi, Jun Zhu, Jianfei Chen

Pre-training Transformers in FP4 precision is becoming a promising approach to gain substantial speedup, but it comes with a considerable loss of accuracy. The Microscaling (MX) data format provides a fine-grained per-group quantization method to improve the representation ability of the FP4 format and is supported by the next-generation Blackwell GPU architecture. However, training with the MXFP4 data format still results in significant degradation, and there is a lack of systematic research on its causes. In this work, we propose a novel training method, TetraJet, for more accurate FP4 training. We comprehensively evaluate all of the quantizers involved in the training, and identify the weight oscillation problem in the forward pass as the main source of the degradation in MXFP4 training. Therefore, we introduce two novel methods, EMA Quantizer (Q-EMA) and Adaptive Ramping Optimizer (Q-Ramping), to resolve the oscillation problem. Extensive experiments on Vision Transformers demonstrate that TetraJet consistently outperforms the existing 4-bit training methods, and Q-EMA & Q-Ramping can provide additional enhancement by effectively reducing oscillation. We decreased the accuracy degradation by more than 50% compared to the baseline, and can even achieve competitive performance compared to full precision training. The code is available at https://github.com/thu-ml/TetraJet-MXFP4Training
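Per-group quantization in the MX style can be sketched as follows. This is a simplified reading of the Microscaling convention (one shared power-of-two scale per group, elements stored as FP4 E2M1), not TetraJet's quantizer: the group here is four values rather than the usual 32, rounding is plain round-to-nearest with ties toward zero, and the helper name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX_EXPONENT = 2  # the largest magnitude is 6.0 = 1.5 * 2**2


def quantize_group(values: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (shared scale, dequantized values) for one group of numbers."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(values)
    # Shared power-of-two scale chosen from the largest magnitude in the group.
    scale = 2.0 ** (math.floor(math.log2(max_abs)) - E2M1_MAX_EXPONENT)
    quantized = []
    for v in values:
        # Saturate at the largest FP4 magnitude, then snap to the nearest grid point.
        target = min(abs(v) / scale, FP4_E2M1_MAGNITUDES[-1])
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda q: abs(q - target))
        quantized.append(math.copysign(nearest, v) * scale)
    return scale, quantized


scale, dequantized = quantize_group([0.3, -1.0, 2.9, 12.0])
```

With only eight magnitudes per group, a weight sitting close to the midpoint between two grid values can flip between them after very small updates, which is one way to picture the oscillation problem described above.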



Article 384

Title@2025-07-09 (3): MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models

Title: MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models MoFE-Time: Mischung von Frequenzdomänenexperten für Zeitreihenprognosemodelle MOFE-时间:时间-系列预测模型频率域专家混合 2507.06502v1

Authors (9): Yiwen Liu, Chenyu Zhang, Junjie Song, Siqi Chen, Sun Yin, Zihan Wang, Lingming Zeng, Yuji Cao, Junming Jiao

As a prominent data modality task, time series forecasting plays a pivotal role in diverse applications. With the remarkable advancements in Large Language Models (LLMs), the adoption of LLMs as the foundational architecture for time series modeling has gained significant attention. Although existing models achieve some success, they rarely model both time and frequency characteristics in a pretraining-finetuning paradigm, leading to suboptimal performance in predictions of complex time series, which require modeling both periodicity and prior pattern knowledge of signals. We propose MoFE-Time, an innovative time series forecasting model that integrates time and frequency domain features within a Mixture of Experts (MoE) network. Moreover, we use the pretraining-finetuning paradigm as our training framework to effectively transfer prior pattern knowledge across pretraining and finetuning datasets with different periodicity distributions. Our method introduces both frequency and time cells as experts after attention modules and leverages the MoE routing mechanism to construct multidimensional sparse representations of input signals. In experiments on six public benchmarks, MoFE-Time has achieved new state-of-the-art performance, reducing MSE and MAE by 6.95% and 6.02% compared to the representative method Time-MoE. Beyond the existing evaluation benchmarks, we have developed a proprietary dataset, NEV-sales, derived from real-world business scenarios. Our method achieves outstanding results on this dataset, underscoring the effectiveness of the MoFE-Time model in practical commercial applications.



Article 385

Title@2025-07-09 (3): Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning

Title: Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning Bild kann Ihr Gedächtnis zurückbringen: Ein neuartiger Multi-Modal-Angriff gegen das Entlernen des Modells der Bildgenerierung 图像可以将您的记忆带回: 与图像生成模型重新学习相比的新创多模式向导攻击 2507.07139v1

Authors (4): Renyang Liu, Guanlin Li, Tianwei Zhang, See-Kiong Ng

Recent advances in image generation models (IGMs), particularly diffusion-based architectures such as Stable Diffusion (SD), have markedly enhanced the quality and diversity of AI-generated visual content. However, their generative capability has also raised significant ethical, legal, and societal concerns, including the potential to produce harmful, misleading, or copyright-infringing content. To mitigate these concerns, machine unlearning (MU) emerges as a promising solution by selectively removing undesirable concepts from pretrained models. Nevertheless, the robustness and effectiveness of existing unlearning techniques remain largely unexplored, particularly in the presence of multi-modal adversarial inputs. To bridge this gap, we propose Recall, a novel adversarial framework explicitly designed to compromise the robustness of unlearned IGMs. Unlike existing approaches that predominantly rely on adversarial text prompts, Recall exploits the intrinsic multi-modal conditioning capabilities of diffusion models by efficiently optimizing adversarial image prompts with guidance from a single semantically relevant reference image. Extensive experiments across ten state-of-the-art unlearning methods and diverse tasks show that Recall consistently outperforms existing baselines in terms of adversarial effectiveness, computational efficiency, and semantic fidelity with the original textual prompt. These findings reveal critical vulnerabilities in current unlearning mechanisms and underscore the need for more robust solutions to ensure the safety and reliability of generative models. Code and data are publicly available at https://github.com/ryliu68/RECALL.



Article 386

Title@2025-07-09 (3): Terrier: A Deep Learning Repeat Classifier

Title: Terrier: A Deep Learning Repeat Classifier Terrier: Ein Deep Learning Repeat Klassifikator Terrier: 深学习重复分类 2503.09312v2

Authors (5): Robert Turnbull, Neil D. Young, Edoardo Tescari, Lee F. Skerratt, Tiffany A. Kosch

Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Terrier is a deep learning model designed to overcome these challenges by classifying repetitive DNA sequences using a publicly available, curated repeat sequence library trained under the RepeatMasker schema. Poor representation of taxa within repeat databases often limits the classification accuracy and reproducibility of current repeat annotation methods, limiting our understanding of repeat evolution and function. Terrier overcomes these challenges by leveraging deep learning for improved accuracy. Trained on Repbase, which includes over 100,000 repeat families – four times more than Dfam – Terrier maps 97.1% of Repbase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice, fruit flies, humans, and mice), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian, flatworm and Northern krill genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.



Article 387

Title@2025-07-09 (3): On the Inherent Privacy of Zeroth Order Projected Gradient Descent

Title: On the Inherent Privacy of Zeroth Order Projected Gradient Descent Auf der inhärenten Privatsphäre der Zeroth Order projiziert Gradient Descent 关于零分顺序的固有隐私 预测的渐变后裔 2507.05610v2

Authors (3): Devansh Gupta, Meisam Razaviyayn, Vatsal Sharan

Differentially private zeroth-order optimization methods have recently gained popularity in private fine tuning of machine learning models due to their reduced memory requirements. Current approaches for privatizing zeroth-order methods rely on adding Gaussian noise to the estimated zeroth-order gradients. However, since the search direction in the zeroth-order methods is inherently random, researchers including Tang et al. (2024) and Zhang et al. (2024a) have raised an important question: is the inherent noise in zeroth-order estimators sufficient to ensure the overall differential privacy of the algorithm? This work settles this question for a class of oracle-based optimization algorithms where the oracle returns zeroth-order gradient estimates. In particular, we show that for a fixed initialization, there exist strongly convex objective functions such that running (Projected) Zeroth-Order Gradient Descent (ZO-GD) is not differentially private. Furthermore, we show that even with random initialization and without revealing (initial and) intermediate iterates, the privacy loss in ZO-GD can grow superlinearly with the number of iterations when minimizing convex objective functions.
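For readers unfamiliar with the setting, a projected zeroth-order gradient descent loop can be sketched as below. The two-point estimator with a Gaussian search direction and the projection onto an L2 ball are common textbook choices assumed here; the exact oracle analyzed in the paper may differ, and the objective, step size, and radius are illustrative. No noise is added for privacy: the only randomness is the search direction, which is precisely the "inherent noise" whose sufficiency the abstract questions.

```python
import math
import random
from typing import Callable, List, Sequence


def zo_gradient(f: Callable[[Sequence[float]], float], x: Sequence[float],
                mu: float, rng: random.Random) -> List[float]:
    """Two-point zeroth-order gradient estimate along a random Gaussian direction."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    plus = [xi + mu * ui for xi, ui in zip(x, u)]
    minus = [xi - mu * ui for xi, ui in zip(x, u)]
    slope = (f(plus) - f(minus)) / (2 * mu)
    return [slope * ui for ui in u]


def project_onto_ball(x: Sequence[float], radius: float) -> List[float]:
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def zo_projected_gd(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                    step: float, mu: float, radius: float,
                    iters: int, seed: int = 0) -> List[float]:
    """Run projected gradient descent using only function evaluations."""
    rng = random.Random(seed)
    x = project_onto_ball(x0, radius)
    for _ in range(iters):
        g = zo_gradient(f, x, mu, rng)
        x = project_onto_ball([xi - step * gi for xi, gi in zip(x, g)], radius)
    return x


def quadratic(x: Sequence[float]) -> float:
    # Strongly convex toy objective with minimizer (1, 1).
    return sum((xi - 1.0) ** 2 for xi in x)


x_final = zo_projected_gd(quadratic, [0.0, 0.0], step=0.05, mu=1e-3,
                          radius=5.0, iters=2000)
```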



Article 388

Title@2025-07-09 (3): Proximal Oracles for Optimization and Sampling

Title: Proximal Oracles for Optimization and Sampling Proximale Oracles für Optimierung und Probenahme 最佳化和取样的近似甲骨文 2404.02239v2

Authors (2): Jiaming Liang, Yongxin Chen

We consider convex optimization with non-smooth objective function and log-concave sampling with non-smooth potential (negative log density). In particular, we study two specific settings where the convex objective/potential function is either Hölder smooth or in hybrid form as the finite sum of Hölder smooth components. To overcome the challenges caused by non-smoothness, our algorithms employ two powerful proximal frameworks in optimization and sampling: the proximal point framework for optimization and the alternating sampling framework (ASF) that uses Gibbs sampling on an augmented distribution. A key component of both optimization and sampling algorithms is the efficient implementation of the proximal map by the regularized cutting-plane method. We establish its iteration-complexity under both Hölder smoothness and hybrid settings using novel convergence analysis, yielding results that are new to the literature. We further propose an adaptive proximal bundle method for non-smooth optimization that employs an aggressive adaptive stepsize strategy, which adjusts stepsizes only when necessary and never rejects iterates. The proposed method is universal since it does not need any problem parameters as input. Additionally, we provide an exact implementation of a proximal sampling oracle, analogous to the proximal map in optimization, along with simple complexity analyses for both the Hölder smooth and hybrid cases, using a novel technique based on a modified Gaussian integral. Finally, we combine this proximal sampling oracle and ASF to obtain a Markov chain Monte Carlo method with non-asymptotic complexity bounds for sampling in Hölder smooth and hybrid settings.
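A minimal sketch of the proximal point framework on a non-smooth function may help fix ideas. For f(x) = |x| the proximal map prox(v) = argmin_x { |x| + (x - v)^2 / (2λ) } has the closed-form soft-thresholding solution used below; the paper instead implements the proximal map with a regularized cutting-plane method, which this example does not attempt to reproduce. The starting point and λ are arbitrary.

```python
from typing import List


def prox_abs(v: float, lam: float) -> float:
    """Proximal map of f(x) = |x| with parameter lam (soft thresholding)."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def proximal_point(x0: float, lam: float, iters: int) -> List[float]:
    """Proximal point iteration x_{k+1} = prox_{lam * f}(x_k) for f(x) = |x|."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox_abs(xs[-1], lam))
    return xs


trajectory = proximal_point(2.0, 0.75, 4)
```

Each step moves the iterate toward the minimizer by at most λ and lands exactly on it once the iterate is within λ of zero, with no gradient of the non-smooth function ever being evaluated.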



Article 389

Title@2025-07-09 (3): FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning

Title: FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning FedDifRC: Entsperren des Potenzials von Text-zu-Bild-Diffusionsmodellen im Heterogenen Federated Learning FedDifRC:在异质联邦学习中释放文本到图像传播模型的潜力 2507.06482v1

Authors (6): Huan Wang, Haoran Li, Huaming Chen, Jun Yan, Jiahua Shi, Jun Shen

Federated learning aims at training models collaboratively across participants while protecting privacy. However, one major challenge for this paradigm is the data heterogeneity issue, where biased data preferences across multiple clients harm the model’s convergence and performance. In this paper, we first introduce powerful diffusion models into the federated learning paradigm and show that diffusion representations are effective steers during federated training. To explore the possibility of using diffusion representations in handling data heterogeneity, we propose a novel diffusion-inspired Federated paradigm with Diffusion Representation Collaboration, termed FedDifRC, leveraging meaningful guidance of diffusion models to mitigate data heterogeneity. The key idea is to construct text-driven diffusion contrasting and noise-driven diffusion regularization, aiming to provide abundant class-related semantic information and consistent convergence signals. On the one hand, we exploit the conditional feedback from the diffusion model for different text prompts to build a text-driven contrastive learning strategy. On the other hand, we introduce a noise-driven consistency regularization to align local instances with diffusion denoising representations, constraining the optimization region in the feature space. In addition, FedDifRC can be extended to a self-supervised scheme without relying on any labeled data. We also provide a theoretical analysis for FedDifRC to ensure convergence under non-convex objectives. The experiments on different scenarios validate the effectiveness of FedDifRC and the efficiency of crucial components.



Article 390

Title@2025-07-09 (3): Generative Lagrangian data assimilation for ocean dynamics under extreme sparsity

Title: Generative Lagrangian data assimilation for ocean dynamics under extreme sparsity Generative lagrangianische Datenassimilation für Ozeandynamik unter extremer Sparsamkeit 产生拉格朗加数据同化,用于极端零星条件下的海洋动态 2507.06479v1

Authors (5): Niloofar Asefi, Leonard Lupin-Jimenez, Tianning Wu, Ruoying He, Ashesh Chattopadhyay

Reconstructing ocean dynamics from observational data is fundamentally limited by the sparse, irregular, and Lagrangian nature of spatial sampling, particularly in subsurface and remote regions. This sparsity poses significant challenges for forecasting key phenomena such as eddy shedding and rogue waves. Traditional data assimilation methods and deep learning models often struggle to recover mesoscale turbulence under such constraints. We leverage a deep learning framework that combines neural operators with denoising diffusion probabilistic models (DDPMs) to reconstruct high-resolution ocean states from extremely sparse Lagrangian observations. By conditioning the generative model on neural operator outputs, the framework accurately captures small-scale, high-wavenumber dynamics even at 99% sparsity (for synthetic data) and 99.9% sparsity (for real satellite observations). We validate our method on benchmark systems, synthetic float observations, and real satellite data, demonstrating robust performance under severe spatial sampling limitations as compared to other deep learning baselines.



Article 391

Title: GNNs Meet Sequence Models Along the Shortest-Path: an Expressive Method for Link Prediction GNNs treffen auf Sequenzmodelle entlang des kürzesten Paths: eine Expressive Methode zur Link Prediction GNNs在最短的平面上遇到序列模型:联系预测的表达方法 2507.07138v1

Authors (5): Francesco Ferrini, Veronica Lachi, Antonio Longa, Bruno Lepri, Andrea Passerini

Graph Neural Networks (GNNs) often struggle to capture the link-specific structural patterns crucial for accurate link prediction, as their node-centric message-passing schemes overlook the subgraph structures connecting a pair of nodes. Existing methods to inject such structural context either incur high computational cost or rely on simplistic heuristics (e.g., common neighbor counts) that fail to model multi-hop dependencies. We introduce SP4LP (Shortest Path for Link Prediction), a novel framework that combines GNN-based node encodings with sequence modeling over shortest paths. Specifically, SP4LP first applies a GNN to compute representations for all nodes, then extracts the shortest path between each candidate node pair and processes the resulting sequence of node embeddings using a sequence model. This design enables SP4LP to capture expressive multi-hop relational patterns with computational efficiency. Empirically, SP4LP achieves state-of-the-art performance across link prediction benchmarks. Theoretically, we prove that SP4LP is strictly more expressive than standard message-passing GNNs and several state-of-the-art structural features methods, establishing it as a general and principled approach for link prediction in graphs.
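The pipeline described above (node embeddings, then the shortest path between a candidate pair, then a sequence model over the embeddings along that path) can be sketched in a few lines. Everything concrete here is a stand-in: the embeddings are a hand-written table rather than GNN outputs, the "sequence model" is a one-line decayed recurrence rather than a trained network, and the graph and names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(adjacency: Dict[int, List[int]],
                  source: int, target: int) -> Optional[List[int]]:
    """Breadth-first search; returns one shortest path or None if unreachable."""
    parent: Dict[int, Optional[int]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            current: Optional[int] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def encode_path(path: List[int], embeddings: Dict[int, List[float]],
                decay: float = 0.5) -> List[float]:
    """Order-sensitive placeholder for a sequence model: h_t = decay * h_{t-1} + x_t."""
    state = [0.0] * len(embeddings[path[0]])
    for node in path:
        state = [decay * s + e for s, e in zip(state, embeddings[node])]
    return state


# Hypothetical undirected graph: 0-1-2-3 plus the shortcut 0-4-3; node 5 is isolated.
adjacency = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3], 5: []}
embeddings = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0],
              3: [1.0, 1.0], 4: [0.0, 1.0], 5: [0.0, 0.0]}

path = shortest_path(adjacency, 0, 3)
pair_representation = encode_path(path, embeddings)
```

In a real link-prediction setup the pair representation would feed a classifier, and the candidate edge itself would have to be hidden from the graph when the path is extracted; both steps are omitted here.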



Article 392

Title@2025-07-09 (3): Stochastic Alignments: Matching an Observed Trace to Stochastic Process Models

Title: Stochastic Alignments: Matching an Observed Trace to Stochastic Process Models Stochastische Ausrichtungen: Passend zu einem beobachteten Pfad zu stochastischen Prozessmodellen 斯托卡对齐: 将观察到的路径与斯托卡过程模型匹配 2507.06472v1

Authors (3): Tian Li, Artem Polyvyanyy, Sander J. J. Leemans

Process mining leverages event data extracted from IT systems to generate insights into the business processes of organizations. Such insights benefit from explicitly considering the frequency of behavior in business processes, which is captured by stochastic process models. Given an observed trace and a stochastic process model, conventional alignment-based conformance checking techniques face a fundamental limitation: They prioritize matching the trace to a model path with minimal deviations, which may, however, lead to selecting an unlikely path. In this paper, we study the problem of matching an observed trace to a stochastic process model by identifying a likely model path with a low edit distance to the trace. We phrase this as an optimization problem and develop a heuristic-guided path-finding algorithm to solve it. Our open-source implementation demonstrates the feasibility of the approach and shows that it can provide new, useful diagnostic insights for analysts.
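The trade-off between edit distance and path likelihood can be illustrated with a brute-force toy. The scoring rule below (edit distance minus a weighted log-probability) and the enumerated model paths are assumptions made for this sketch; the paper formulates its own optimization problem and solves it with a heuristic-guided search rather than by enumeration.

```python
import math
from typing import Dict, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def best_path(trace: Tuple[str, ...],
              model_paths: Dict[Tuple[str, ...], float],
              likelihood_weight: float) -> Tuple[str, ...]:
    """Pick the path minimizing distance - likelihood_weight * log(probability)."""
    def cost(item):
        path, prob = item
        return edit_distance(trace, path) - likelihood_weight * math.log(prob)
    return min(model_paths.items(), key=cost)[0]


# Hypothetical stochastic model, given directly as paths with probabilities.
model_paths = {("a", "b", "c"): 0.05, ("a", "b", "d"): 0.60, ("a", "d"): 0.35}
trace = ("a", "b", "c")

closest = best_path(trace, model_paths, likelihood_weight=0.0)
likely = best_path(trace, model_paths, likelihood_weight=1.0)
```

With the weight set to zero the rare path that matches the trace exactly is chosen, as in conventional alignments; with a positive weight the far more probable path one edit away wins.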



Article 393

Title@2025-07-09 (3): MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters

Title: MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters MetaOptimize: Ein Framework zur Optimierung von Schrittgrößen und anderen Metaparametern 模型优化:优化步步尺寸和其他元参数的框架 2402.02342v6

Authors (3): Arsalan Sharifnassab, Saber Salehkaleybar, Richard Sutton

We address the challenge of optimizing meta-parameters (hyperparameters) in machine learning, a key factor for efficient training and high model performance. Rather than relying on expensive meta-parameter search methods, we introduce MetaOptimize: a dynamic approach that adjusts meta-parameters, particularly step sizes (also known as learning rates), during training. More specifically, MetaOptimize can wrap around any first-order optimization algorithm, tuning step sizes on the fly to minimize a specific form of regret that considers the long-term impact of step sizes on training, through a discounted sum of future losses. We also introduce lower-complexity variants of MetaOptimize that, in conjunction with its adaptability to various optimization algorithms, achieve performance comparable to that of the best hand-crafted learning rate schedules across diverse machine learning tasks.
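
The "discounted sum of future losses" mentioned in this abstract can be illustrated with a few lines of code. This is only a sketch of the quantity itself: the discount factor `gamma`, the function name, and the loss values are hypothetical, and the abstract does not specify the exact regret formulation used in the paper.

```python
def discounted_future_loss(losses, gamma):
    """Return sum over k >= 0 of gamma**k * losses[k].

    losses[0] is the loss at the current step and losses[k] the loss
    k steps later. Accumulating from the end avoids explicit powers.
    """
    total = 0.0
    for loss in reversed(losses):
        total = loss + gamma * total
    return total


# Two hypothetical loss trajectories produced by two different step sizes.
slow_decay = [1.0, 0.9, 0.8, 0.7]
fast_decay = [1.0, 0.6, 0.4, 0.3]
prefer_fast = (discounted_future_loss(fast_decay, 0.9)
               < discounted_future_loss(slow_decay, 0.9))
```

A discount factor close to 1 weights distant losses almost as heavily as immediate ones, which is what lets such an objective reflect the long-term effect of a step size.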



Article 394

Title@2025-07-09 (3): Mitigating Message Imbalance in Fraud Detection with Dual-View Graph Representation Learning

Title: Mitigating Message Imbalance in Fraud Detection with Dual-View Graph Representation Learning Message Imbalance bei Betrugserkennung mit Dual-View Graph Representation Learning abmildern 利用双视图代表制学习减少欺诈检测中的不平衡现象 2507.06469v1

Authors (9): Yudan Song, Yuecen Wei, Yuhang Lu, Qingyun Sun, Minglai Shao, Li-e Wang, Chunming Hu, Xianxian Li, Xingcheng Fu

Graph representation learning has become a mainstream method for fraud detection due to its strong expressive power, which focuses on enhancing node representations through improved neighborhood knowledge capture. However, the focus on local interactions leads to imbalanced transmission of global topological information and increased risk of node-specific information being overwhelmed during aggregation due to the imbalance between fraud and benign nodes. In this paper, we first summarize the impact of topology and class imbalance on downstream tasks in GNN-based fraud detection, as the problem of imbalanced supervisory messages is caused by fraudsters’ topological behavior obfuscation and identity feature concealment. Based on statistical validation, we propose a novel dual-view graph representation learning method to mitigate Message imbalance in Fraud Detection (MimbFD). Specifically, we design a topological message reachability module for high-quality node representation learning to penetrate fraudsters’ camouflage and alleviate insufficient propagation. Then, we introduce a local confounding debiasing module to adjust node representations, enhancing the stable association between node representations and labels to balance the influence of different classes. Finally, we conduct experiments on three public fraud datasets, and the results demonstrate that MimbFD exhibits outstanding performance in fraud detection.



Article 395

Title@2025-07-09 (3): Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models

Title: Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models Foundation Model Self-Play: Open-Ended Strategy Innovation über Foundation Models 基金会示范自发挥作用:通过基金会模式进行开放式战略创新 2507.06466v1

Authors (3): Aaron Dharna, Cong Lu, Jeff Clune

Multi-agent interactions have long fueled innovation, from natural predator-prey dynamics to the space race. Self-play (SP) algorithms try to harness these dynamics by pitting agents against ever-improving opponents, thereby creating an implicit curriculum toward learning high-quality solutions. However, SP often fails to produce diverse solutions and can get stuck in locally optimal behaviors. We introduce Foundation-Model Self-Play (FMSP), a new direction that leverages the code-generation capabilities and vast knowledge of foundation models (FMs) to overcome these challenges by leaping across local optima in policy space. We propose a family of approaches: (1) Vanilla Foundation-Model Self-Play (vFMSP) continually refines agent policies via competitive self-play; (2) Novelty-Search Self-Play (NSSP) builds a diverse population of strategies, ignoring performance; and (3) the most promising variant, Quality-Diversity Self-Play (QDSP), creates a diverse set of high-quality policies by combining the diversity of NSSP and refinement of vFMSP. We evaluate FMSPs in Car Tag, a continuous-control pursuer-evader setting, and in Gandalf, a simple AI safety simulation in which an attacker tries to jailbreak an LLM’s defenses. In Car Tag, FMSPs explore a wide variety of reinforcement learning, tree search, and heuristic-based methods, to name just a few. In terms of discovered policy quality, QDSP and vFMSP surpass strong human-designed strategies. In Gandalf, FMSPs can successfully automatically red-team an LLM, breaking through and jailbreaking six different, progressively stronger levels of defense. Furthermore, FMSPs can automatically proceed to patch the discovered vulnerabilities. Overall, FMSPs represent a promising new research frontier of improving self-play with foundation models, opening fresh paths toward more creative and open-ended strategy discovery.



Article 396

Title@2025-07-09 (3): Automating Evaluation of Diffusion Model Unlearning with (Vision-) Language Model World Knowledge

Title: Automating Evaluation of Diffusion Model Unlearning with (Vision-) Language Model World Knowledge Automatisieren der Evaluation von Diffusion Model Unlearning mit (Vision-) Language Model World Knowledge 以(视觉)语言示范世界知识自动评价传播模式 2507.07137v1

Authors (5): Eric Yeats, Darryl Hannan, Henry Kvinge, Timothy Doster, Scott Mahan

Machine unlearning (MU) is a promising cost-effective method to cleanse undesired information (generated concepts, biases, or patterns) from foundational diffusion models. While MU is orders of magnitude less costly than retraining a diffusion model without the undesired information, it can be challenging and labor-intensive to prove that the information has been fully removed from the model. Moreover, MU can damage diffusion model performance on surrounding concepts that one would like to retain, making it unclear if the diffusion model is still fit for deployment. We introduce autoeval-dmun, an automated tool which leverages (vision-) language models to thoroughly assess unlearning in diffusion models. Given a target concept, autoeval-dmun extracts structured, relevant world knowledge from the language model to identify nearby concepts which are likely damaged by unlearning and to circumvent unlearning with adversarial prompts. We use our automated tool to evaluate popular diffusion model unlearning methods, revealing that language models (1) impose semantic orderings of nearby concepts which correlate well with unlearning damage and (2) effectively circumvent unlearning with synthetic adversarial prompts.



Article 397

Title@2025-07-09 (3): SoftSignSGD(S3): An Enhanced Optimizer for Practical DNN Training and Loss Spikes Minimization Beyond Adam

Title: SoftSignSGD(S3): An Enhanced Optimizer for Practical DNN Training and Loss Spikes Minimization Beyond Adam SoftSignSGD(S3): Ein optimierter Optimierer für praktisches DNN Training und Verlust Spikes Minimierung jenseits Adams SoftSignSGD(S3):在亚当之后将实际的 DNN 培训和损失尖峰缩小到最小化的增强优化 2507.06464v1

Authors (6): Hanyang Peng, Shuang Qin, Yue Yu, Fangqing Jiang, Hui Wang, Wen Gao

Adam has proven remarkably successful in training deep neural networks, but the mechanisms underlying its empirical successes and limitations remain underexplored. In this study, we demonstrate that the effectiveness of Adam stems largely from its similarity to SignSGD in robustly handling large gradient fluctuations, yet it is also vulnerable to destabilizing loss spikes due to its uncontrolled update scaling. To enhance the advantage of Adam and mitigate its limitation, we propose SignSoftSGD (S3), a novel optimizer with three key innovations. First, S3 generalizes the sign-like update by employing a flexible $p$-th order momentum ($p \geq 1$) in the denominator, departing from the conventional second-order momentum (variance) preconditioning. This design enables enhanced performance while achieving stable training even with aggressive learning rates. Second, S3 minimizes the occurrences of loss spikes through unified exponential moving average coefficients for numerator and denominator momenta, which inherently bound updates to $[-1, 1]$ and simplify hyperparameter tuning. Third, S3 incorporates an equivalent Nesterov’s accelerated gradient (NAG) module, accelerating convergence without memory overhead. Theoretically, we prove that S3 achieves the optimal convergence rate of $O\left(\frac{1}{T^{1/4}}\right)$ for general nonconvex stochastic optimization under weak assumptions. Extensive experiments across a range of vision and language tasks show that S3 not only converges more rapidly and improves performance but also rarely experiences loss spikes, even with a $10\times$ larger learning rate. In fact, S3 delivers performance comparable to or better than AdamW with $2\times$ the training steps, establishing its efficacy in both efficiency and final task performance.
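
The first two ideas in this abstract (a $p$-th order moment in the denominator, and a single exponential moving average coefficient shared by numerator and denominator) can be sketched as below. This is a simplified reading of the abstract, not the authors' optimizer: the function name, default values, and `eps` term are assumptions, and the Nesterov module, learning rate, and weight decay are omitted.

```python
def s3_like_step(grad, m, s, beta=0.9, p=3.0, eps=1e-12):
    """One element-wise update direction for lists of floats.

    m tracks an EMA of the gradient, s an EMA of |gradient|**p,
    and both use the same coefficient beta. Starting from zeros, the
    two averages share the same weights (summing to at most 1), so by
    the power-mean inequality |m| <= s**(1/p) for p >= 1 and every
    returned component lies in [-1, 1].
    """
    new_m, new_s, update = [], [], []
    for g, mi, si in zip(grad, m, s):
        mi = beta * mi + (1.0 - beta) * g
        si = beta * si + (1.0 - beta) * abs(g) ** p
        new_m.append(mi)
        new_s.append(si)
        update.append(mi / (si ** (1.0 / p) + eps))
    return update, new_m, new_s


# A gradient spike (1e6) does not push the update outside [-1, 1].
m, s = [0.0], [0.0]
history = []
for g in ([1.0], [-2.0], [1e6], [0.0], [-3e-4]):
    u, m, s = s3_like_step(g, m, s)
    history.append(u[0])
```

With separate coefficients for the two averages, as in Adam's usual `beta1` and `beta2`, the same bound does not hold in general, which is the contrast the abstract draws.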



Article 398

Title@2025-07-09 (3): Cons-training Tensor Networks: Embedding and Optimization Over Discrete Linear Constraints

Title: Cons-training Tensor Networks: Embedding and Optimization Over Discrete Linear Constraints Cons-Training Tensor Networks: Einbetten und Optimieren über diskrete lineare Einschränkungen 内嵌和优化超过分辨线性限制 2405.09005v5

Authors (2): Javier Lopez-Piqueres, Jing Chen

In this study, we introduce a novel family of tensor networks, termed constrained matrix product states (MPS), designed to exactly incorporate arbitrary discrete linear constraints, including inequalities, into sparse block structures. These tensor networks are particularly tailored for modeling distributions with support strictly over the feasible space, offering benefits such as reducing the search space in optimization problems, alleviating overfitting, improving training efficiency, and decreasing model size. Central to our approach is the concept of a quantum region, an extension of quantum numbers traditionally used in U(1) symmetric tensor networks, adapted to capture any linear constraint, including the unconstrained scenario. We further develop a novel canonical form for these new MPS, which allows for the merging and factorization of tensor blocks according to quantum region fusion rules and permits optimal truncation schemes. Utilizing this canonical form, we apply an unsupervised training strategy to optimize arbitrary objective functions subject to discrete linear constraints. Our method’s efficacy is demonstrated by solving the quadratic knapsack problem, achieving superior performance compared to a leading nonlinear integer programming solver. Additionally, we analyze the complexity and scalability of our approach, demonstrating its potential in addressing complex constrained combinatorial optimization problems.



Article 399

Title@2025-07-09 (3): Energy-Efficient Supervised Learning with a Binary Stochastic Forward-Forward Algorithm

Title: Energy-Efficient Supervised Learning with a Binary Stochastic Forward-Forward Algorithm Energieeffizientes Überwachtes Lernen mit einem binären stochastischen Vorwärtsalgorithmus 以二进制前向前前向算法进行能源高效监督学习 2507.06461v1

Authors (3): Risi Jaiswal, Supriyo Datta, Joseph G. Makin

Reducing energy consumption has become a pressing need for modern machine learning, which has achieved many of its most impressive results by scaling to larger and more energy-consumptive neural networks. Unfortunately, the main algorithm for training such networks, backpropagation, poses significant challenges for custom hardware accelerators, due to both its serial dependencies and the memory footprint needed to store forward activations for the backward pass. Alternatives to backprop, although less effective, do exist; here the main computational bottleneck becomes matrix multiplication. In this study, we derive forward-forward algorithms for binary, stochastic units. Binarization of the activations transforms matrix multiplications into indexing operations, which can be executed efficiently in hardware. Stochasticity, combined with tied weights across units with different biases, bypasses the information bottleneck imposed by binary units. Furthermore, although binary sampling is slow and expensive in traditional hardware, very fast sampling can be implemented cheaply with p-bits (probabilistic bits), novel devices made up of unstable magnets. We evaluate our proposed algorithms on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, showing that their performance is close to real-valued forward-forward, but with estimated energy savings of about one order of magnitude.
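
The claim that binarized activations turn matrix multiplication into indexing can be made concrete with a small example. The sketch below assumes activations in {0, 1} and uses hypothetical helper names; it shows only the arithmetic identity, not the paper's training algorithm or its hardware mapping.

```python
def matvec_dense(W, x):
    """Ordinary matrix-vector product with multiplications."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matvec_indexed(W, active):
    """Same product for a binary x, given only the indices where x is 1.

    No multiplications are needed: each output is the sum of the
    weights selected by the active indices.
    """
    return [sum(row[j] for j in active) for row in W]


W = [[0.5, -1.0, 2.0, 0.25],
     [1.5, 0.0, -0.5, 4.0]]
x = [1, 0, 1, 1]                                   # binary activations
active = [j for j, bit in enumerate(x) if bit == 1]
same = matvec_dense(W, x) == matvec_indexed(W, active)
```

The indexed form touches only the weights selected by active units, so its cost scales with the number of ones in the activation vector.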



Article 400

Title@2025-07-09 (3): Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations

Title: Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations Nicht-negative Matrix-Fabrikisierung mit impliziten neuralen Darstellungen neu denken 以隐含神经表征代表面重新思考非电子矩阵的乘数 2404.04439v2

Authors (4): Krishna Subramani, Paris Smaragdis, Takuya Higuchi, Mehrez Souden

Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like the Short-Time Fourier Transform. However, extending these applications to irregularly-spaced TF representations, like the Constant-Q transform, wavelets, or sinusoidal analysis models, has not been possible since these representations cannot be directly stored in matrix form. In this paper, we formulate NMF in terms of learnable functions (instead of vectors) and show that NMF can be extended to a wider variety of signal classes that need not be regularly sampled.



Article 401

Title@2025-07-08 (2): Automated Neuron Labelling Enables Generative Steering and Interpretability in Protein Language Models

Title: Automated Neuron Labelling Enables Generative Steering and Interpretability in Protein Language Models Automatisierte Neuronen-Kennzeichnung ermöglicht generative Steuerung und Dolmetschbarkeit in Protein-Sprachmodellen 自动神经标签自动化使蛋白语言模型能够产生引导和解释 2507.06458v1

Authors (4): Arjun Banerjee, David Martinez, Camille Dang, Ethan Tam

Protein language models (PLMs) encode rich biological information, yet their internal neuron representations are poorly understood. We introduce the first automated framework for labeling every neuron in a PLM with biologically grounded natural language descriptions. Unlike prior approaches relying on sparse autoencoders or manual annotation, our method scales to hundreds of thousands of neurons, revealing that individual neurons are selectively sensitive to diverse biochemical and structural properties. We then develop a novel neuron activation-guided steering method to generate proteins with desired traits, enabling convergence to target biochemical properties like molecular weight and instability index as well as secondary and tertiary structural motifs, including alpha helices and canonical Zinc Fingers. Finally, we show that analysis of labeled neurons in different model sizes reveals PLM scaling laws and a structured neuron space distribution.



Article 402

Title@2025-07-08 (2): Wild refitting for black box prediction

Title: Wild refitting for black box prediction Wilde Nachrüstung für Black Box Vorhersage 黑盒预测的野生改造 2506.21460v2

Authors (1): Martin J. Wainwright

We describe and analyze a computationally efficient refitting procedure for computing high-probability upper bounds on the instance-wise mean-squared prediction error of penalized nonparametric estimates based on least-squares minimization. Requiring only a single dataset and black box access to the prediction method, it consists of three steps: computing suitable residuals, symmetrizing and scaling them with a pre-factor $\rho$, and using them to define and solve a modified prediction problem recentered at the current estimate. We refer to it as wild refitting, since it uses Rademacher residual symmetrization as in a wild bootstrap variant. Under relatively mild conditions allowing for noise heterogeneity, we establish a high probability guarantee on its performance, showing that the wild refit with a suitably chosen wild noise scale $\rho$ gives an upper bound on prediction error. This theoretical analysis provides guidance into the design of such procedures, including how the residuals should be formed, the amount of noise rescaling in the wild sub-problem needed for upper bounds, and the local stability properties of the black-box procedure. We illustrate the applicability of this procedure to various problems, including non-rigid structure-from-motion recovery with structured matrix penalties; plug-and-play image restoration with deep neural network priors; and randomized sketching with kernel methods.
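
The three steps named in this abstract (residuals, Rademacher symmetrization scaled by $\rho$, and a refit recentered at the current estimate) can be sketched as follows. This is a loose illustration under assumptions: the 1-D ridge fit standing in for the black-box method, the function names, and the toy data are all hypothetical, and the sketch stops before the step that converts the refit into an error bound, which the abstract does not spell out.

```python
import random


def fit_ridge_1d(xs, ys, lam=1.0):
    """Stand-in black-box method: ridge regression through the origin."""
    slope = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
    return lambda x: slope * x


def wild_refit(fit, xs, ys, rho, rng):
    """Residuals -> symmetrize and scale -> refit on recentered responses."""
    f_hat = fit(xs, ys)
    fitted = [f_hat(x) for x in xs]
    # Step 1: residuals of the original fit.
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Step 2: random Rademacher signs, scaled by the pre-factor rho.
    signs = [rng.choice((-1.0, 1.0)) for _ in xs]
    # Step 3: wild responses centered at the current estimate, then refit.
    wild_ys = [f + rho * e * r for f, e, r in zip(fitted, signs, residuals)]
    f_wild = fit(xs, wild_ys)
    return f_hat, f_wild, residuals, wild_ys


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
f_hat, f_wild, residuals, wild_ys = wild_refit(
    fit_ridge_1d, xs, ys, rho=2.0, rng=random.Random(0)
)
```

Only calls to `fit` are needed, which matches the abstract's point that the procedure requires nothing beyond black-box access to the prediction method.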



Article 403

Title@2025-07-08 (2): FedPhD: Federated Pruning with Hierarchical Learning of Diffusion Models

Title: FedPhD: Federated Pruning with Hierarchical Learning of Diffusion Models FedPhD: Federated Pruning mit Hierarchical Learning of Diffusion Models FFPhD: 与传播模型的等级化学习结合的联邦节制 2507.06449v1

Authors (4): Qianyu Long, Qiyuan Wang, Christos Anagnostopoulos, Daning Bi

Federated Learning (FL), as a distributed learning paradigm, trains models over distributed clients’ data. FL is particularly beneficial for distributed training of Diffusion Models (DMs), which are high-quality image generators that require diverse data. However, challenges such as high communication costs and data heterogeneity persist in training DMs similar to training Transformers and Convolutional Neural Networks. Limited research has addressed these issues in FL environments. To address this gap and these challenges, we introduce a novel approach, FedPhD, designed to efficiently train DMs in FL environments. FedPhD leverages Hierarchical FL with homogeneity-aware model aggregation and selection policy to tackle data heterogeneity while reducing communication costs. The distributed structured pruning of FedPhD enhances computational efficiency and reduces model storage requirements in clients. Our experiments across multiple datasets demonstrate that FedPhD achieves high model performance regarding Fréchet Inception Distance (FID) scores while reducing communication costs by up to $88\%$. FedPhD outperforms baseline methods, achieving at least a $34\%$ improvement in FID, while utilizing only $56\%$ of the total computation and communication resources.



Article 404

Title@2025-07-08 (2): Understanding Fixed Predictions via Confined Regions

Title: Understanding Fixed Predictions via Confined Regions Verstehe feste Vorhersagen über zusammenhängende Regionen 通过封闭区域了解固定预测 2502.16380v2

Authors (4): Connor Lawless, Tsui-Wei Weng, Berk Ustun, Madeleine Udell

Machine learning models can assign fixed predictions that preclude individuals from changing their outcome. Existing approaches to audit fixed predictions do so on a pointwise basis, which requires access to an existing dataset of individuals and may fail to anticipate fixed predictions in out-of-sample data. This work presents a new paradigm to identify fixed predictions by finding confined regions of the feature space in which all individuals receive fixed predictions. This paradigm enables the certification of recourse for out-of-sample data, works in settings without representative datasets, and provides interpretable descriptions of individuals with fixed predictions. We develop a fast method to discover confined regions for linear classifiers using mixed-integer quadratically constrained programming. We conduct a comprehensive empirical study of confined regions across diverse applications. Our results highlight that existing pointwise verification methods fail to anticipate future individuals with fixed predictions, while our method both identifies them and provides an interpretable description.



Article 405

Title@2025-07-08 (2): Can Interpretation Predict Behavior on Unseen Data?

Title: Can Interpretation Predict Behavior on Unseen Data? Kann Interpretation Verhalten auf ungesehenen Daten voraussagen? 解释预测行为是否与未知数据有关? 2507.06445v1

Authors (5): Victoria R. Li, Jenny Kaufmann, Martin Wattenberg, David Alvarez-Melis, Naomi Saphra

Interpretability research often aims to predict how a model will respond to targeted interventions on specific mechanisms. However, it rarely predicts how a model will respond to unseen input data. This paper explores the promises and challenges of interpretability as a tool for predicting out-of-distribution (OOD) model behavior. Specifically, we investigate the correspondence between attention patterns and OOD generalization in hundreds of Transformer models independently trained on a synthetic classification task. These models exhibit several distinct systematic generalization rules OOD, forming a diverse population for correlational analysis. In this setting, we find that simple observational tools from interpretability can predict OOD performance. In particular, when in-distribution attention exhibits hierarchical patterns, the model is likely to generalize hierarchically on OOD data – even when the rule’s implementation does not rely on these hierarchical patterns, according to ablation tests. Our findings offer a proof-of-concept to motivate further interpretability work on predicting unseen model behavior.



Article 406

Title@2025-07-08 (2): FACap: A Large-scale Fashion Dataset for Fine-grained Composed Image Retrieval

Title: FACap: A Large-scale Fashion Dataset for Fine-grained Composed Image Retrieval FACap: Ein großformatiger Fashion-Datensatz für feinkörnige Verbundbilder 缩放成形图像检索: 用于精制合成图像检索的大型时装数据集 2507.07135v1

Authors (4): François Gardères, Shizhe Chen, Camille-Sovanneary Gauthier, Jean Ponce

The composed image retrieval (CIR) task is to retrieve target images given a reference image and a modification text. Recent methods for CIR leverage large pretrained vision-language models (VLMs) and achieve good performance on general-domain concepts like color and texture. However, they still struggle with application domains like fashion, because the rich and diverse vocabulary used in fashion requires specific fine-grained vision and language understanding. An additional difficulty is the lack of large-scale fashion datasets with detailed and relevant annotations, due to the expensive cost of manual annotation by specialists. To address these challenges, we introduce FACap, a large-scale, automatically constructed fashion-domain CIR dataset. It leverages web-sourced fashion images and a two-stage annotation pipeline powered by a VLM and a large language model (LLM) to generate accurate and detailed modification texts. Then, we propose a new CIR model FashionBLIP-2, which fine-tunes the general-domain BLIP-2 model on FACap with lightweight adapters and multi-head query-candidate matching to better account for fine-grained fashion-specific information. FashionBLIP-2 is evaluated with and without additional fine-tuning on the Fashion IQ benchmark and the enhanced evaluation dataset enhFashionIQ, leveraging our pipeline to obtain higher-quality annotations. Experimental results show that the combination of FashionBLIP-2 and pretraining with FACap significantly improves the model’s performance in fashion CIR especially for retrieval with fine-grained modification texts, demonstrating the value of our dataset and approach in a highly demanding environment such as e-commerce websites. Code is available at https://fgxaos.github.io/facap-paper-website/.



Article 407

Title@2025-07-08 (2): Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention

Title: Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention Quadratic Gating Mixture of Experts: Statistische Einblicke in die Selbstachtung 二次曲线交配专家:将统计透视纳入自我意识 2410.11222v3

Authors (4): Pedram Akbarian, Huy Nguyen, Xing Han, Nhat Ho

Mixture of Experts (MoE) models are well known for effectively scaling model capacity while preserving computational overheads. In this paper, we establish a rigorous relation between MoE and the self-attention mechanism, showing that each row of a self-attention matrix can be written as a quadratic gating mixture of linear experts. Motivated by this connection, we conduct a comprehensive convergence analysis of MoE models with two different quadratic gating functions, namely the quadratic polynomial gate and the quadratic monomial gate, offering useful insights into the design of gating and experts for the MoE framework. First, our analysis indicates that the use of the quadratic monomial gate yields an improved sample efficiency for estimating parameters and experts compared to the quadratic polynomial gate. Second, parameter and expert estimation rates become significantly faster when employing non-linear experts in place of linear experts. Combining these theoretical insights with the above link between MoE and self-attention, we propose a novel active-attention mechanism where we apply a non-linear activation function to the value matrix in the formula of self-attention. Finally, we demonstrate that the proposed active-attention outperforms the standard self-attention through several extensive experiments in various tasks, including image classification, language modeling, and multivariate time series forecasting.
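
The active-attention idea in this abstract, applying a non-linear activation to the value matrix inside the self-attention formula, can be sketched in a few lines. This is an illustrative reading only: the abstract does not say which activation is used, so the ReLU below is an assumption, as are the function names and the use of plain Python lists for already-projected query, key, and value matrices.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def attention(Q, K, V, activation=None):
    """softmax(Q K^T / sqrt(d)) @ act(V); activation=None gives standard attention."""
    d = len(Q[0])
    K_t = [list(col) for col in zip(*K)]            # transpose of K
    scores = matmul(Q, K_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    if activation is not None:
        V = [[activation(v) for v in row] for row in V]
    return matmul(weights, V)


def relu(v):
    return max(0.0, v)


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, -2.0], [-3.0, 4.0]]
standard = attention(Q, K, V)
active = attention(Q, K, V, activation=relu)
```

In the mixture-of-experts reading given in the abstract, each value row acts as an expert, so activating the value matrix corresponds to replacing linear experts with non-linear ones.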



Article 408

Title@2025-07-08 (2): Deprecating Benchmarks: Criteria and Framework

Title: Deprecating Benchmarks: Criteria and Framework Deprecating Benchmarks: Kriterien und Rahmen 折旧基准:标准和框架 2507.06434v1

Authors (4): Ayrton San Joaquin, Rokas Gipiškis, Leon Staufer, Ariel Gil

As frontier artificial intelligence (AI) models rapidly advance, benchmarks are integral to comparing different models and measuring their progress in different task-specific domains. However, there is a lack of guidance on when and how benchmarks should be deprecated once they cease to effectively perform their purpose. This risks benchmark scores over-valuing model capabilities, or worse, obscuring capabilities and safety-washing. Based on a review of benchmarking practices, we propose criteria to decide when to fully or partially deprecate benchmarks, and a framework for deprecating benchmarks. Our work aims to advance the state of benchmarking towards rigorous and quality evaluations, especially for frontier models, and our recommendations are intended to benefit benchmark developers, benchmark users, AI governance actors (across governments, academia, and industry panels), and policy makers.



Article 409

Title@2025-07-08 (2): eegFloss: A Python package for refining sleep EEG recordings using machine learning models

Title: eegFloss: A Python package for refining sleep EEG recordings using machine learning models eegFloss: Ein Python-Paket zur Verfeinerung von EEG-Aufnahmen mit maschinellen Lernmodellen eegFloss:一个使用机器学习模型改进睡眠 EEEE 录音的 Python 软件包 2507.06433v1

Authors (5): Niloy Sikder, Paul Zerr, Mahdad Jafarzadeh Esfahani, Martin Dresler, Matthias Krauledat

Electroencephalography (EEG) allows monitoring of brain activity, providing insights into the functional dynamics of various brain regions and their roles in cognitive processes. EEG is a cornerstone in sleep research, serving as the primary modality of polysomnography, the gold standard in the field. However, EEG signals are prone to artifacts caused by both internal (device-specific) factors and external (environmental) interferences. As sleep studies are becoming larger, most rely on automatic sleep staging, a process highly susceptible to artifacts, leading to erroneous sleep scores. This paper addresses this challenge by introducing eegFloss, an open-source Python package to utilize eegUsability, a novel machine learning (ML) model designed to detect segments with artifacts in sleep EEG recordings. eegUsability has been trained and evaluated on manually artifact-labeled EEG data collected from 15 participants over 127 nights using the Zmax headband. It demonstrates solid overall classification performance (F1-score is approximately 0.85, Cohen's kappa is 0.78), achieving a high recall rate of approximately 94% in identifying channel-wise usable EEG data, and extends beyond Zmax. Additionally, eegFloss offers features such as automatic time-in-bed detection using another ML model named eegMobility, filtering out certain artifacts, and generating hypnograms and sleep statistics. By addressing a fundamental challenge faced by most sleep studies, eegFloss can enhance the precision and rigor of their analysis as well as the accuracy and reliability of their outcomes.



Article 410

Title@2025-07-08 (2): Transfer Learning for Transient Classification: From Simulations to Real Data and ZTF to LSST

Title: Transfer Learning for Transient Classification: From Simulations to Real Data and ZTF to LSST Transfer Learning for Transient Classification: Von Simulationen zu Realdaten und ZTF zu LSST 中转分类的学习转移:从模拟到真实数据和ZTF到LSST 2502.18558v2

Authors (4): Rithwik Gupta, Daniel Muthukrishna, Nabeel Rehemtulla, Ved Shah

Machine learning has become essential for automated classification of astronomical transients, but current approaches face significant limitations: classifiers trained on simulations struggle with real data, models developed for one survey cannot be easily applied to another, and new surveys require prohibitively large amounts of labelled training data. These challenges are particularly pressing as we approach the era of the Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST), where existing classification models will need to be retrained using LSST observations. We demonstrate that transfer learning can overcome these challenges by repurposing existing models trained on either simulations or data from other surveys. Starting with a model trained on simulated Zwicky Transient Facility (ZTF) light curves, we show that transfer learning reduces the amount of labelled real ZTF transients needed by 95% while maintaining equivalent performance to models trained from scratch. Similarly, when adapting ZTF models for LSST simulations, transfer learning achieves 94% of the baseline performance while requiring only 30% of the training data. These findings have significant implications for the early operations of LSST, suggesting that reliable automated classification will be possible soon after the survey begins, rather than waiting months or years to accumulate sufficient training data.



Article 411

Title@2025-07-08 (2): Bridging Data Gaps of Rare Conditions in ICU: A Multi-Disease Adaptation Approach for Clinical Prediction

Title: Bridging Data Gaps of Rare Conditions in ICU: A Multi-Disease Adaptation Approach for Clinical Prediction Überbrückung von Datenlücken seltener Bedingungen in der Intensivstation: Ein Multi-Disease-Anpassungsansatz für klinische Vorhersagen 在伊斯兰法院联盟中缩小罕见条件的数据差距:临床预测的多种疾病适应办法 2507.06432v1

Authors (4): Mingcheng Zhu, Yu Liu, Zhiyao Luo, Tingting Zhu

Artificial Intelligence has revolutionised critical care for common conditions. Yet, rare conditions in the intensive care unit (ICU), including recognised rare diseases and low-prevalence conditions in the ICU, remain underserved due to data scarcity and intra-condition heterogeneity. To bridge such gaps, we developed KnowRare, a domain adaptation-based deep learning framework for predicting clinical outcomes for rare conditions in the ICU. KnowRare mitigates data scarcity by initially learning condition-agnostic representations from diverse electronic health records through self-supervised pre-training. It addresses intra-condition heterogeneity by selectively adapting knowledge from clinically similar conditions with a developed condition knowledge graph. Evaluated on two ICU datasets across five clinical prediction tasks (90-day mortality, 30-day readmission, ICU mortality, remaining length of stay, and phenotyping), KnowRare consistently outperformed existing state-of-the-art models. Additionally, KnowRare demonstrated superior predictive performance compared to established ICU scoring systems, including APACHE IV and IV-a. Case studies further demonstrated KnowRare’s flexibility in adapting its parameters to accommodate dataset-specific and task-specific characteristics, its generalisation to common conditions under limited data scenarios, and its rationality in selecting source conditions. These findings highlight KnowRare’s potential as a robust and practical solution for supporting clinical decision-making and improving care for rare conditions in the ICU.



Article 412

Title@2025-07-08 (2): Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies

Title: Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies Neurale Aktor-Kritische Methoden für Hamilton-Jacobi-Bellman PDEs: Asymptotische Analyse und Numerische Studien 汉密尔顿-Jacobi-Bellman PDEs的神经活性-化学方法:无症状分析和数值研究 2507.06428v1

Authors (4): Samuel N. Cohen, Jackson Hebner, Deqing Jiang, Justin Sirignano

We mathematically analyze and numerically study an actor-critic machine learning algorithm for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) partial differential equations from stochastic control theory. The architecture of the critic (the estimator for the value function) is structured so that the boundary condition is always perfectly satisfied (rather than being included in the training loss) and utilizes a biased gradient which reduces computational cost. The actor (the estimator for the optimal control) is trained by minimizing the integral of the Hamiltonian over the domain, where the Hamiltonian is estimated using the critic. We show that the training dynamics of the actor and critic neural networks converge in a Sobolev-type space to a certain infinite-dimensional ordinary differential equation (ODE) as the number of hidden units in the actor and critic $\rightarrow \infty$. Further, under a convexity-like assumption on the Hamiltonian, we prove that any fixed point of this limit ODE is a solution of the original stochastic control problem. This provides an important guarantee for the algorithm’s performance in light of the fact that finite-width neural networks may only converge to local minimizers (and not optimal solutions) due to the non-convexity of their loss functions. In our numerical studies, we demonstrate that the algorithm can solve stochastic control problems accurately in up to 200 dimensions. In particular, we construct a series of increasingly complex stochastic control problems with known analytic solutions and study the algorithm’s numerical performance on them. These problems range from a linear-quadratic regulator equation to highly challenging equations with non-convex Hamiltonians, allowing us to identify and analyze the strengths and limitations of this neural actor-critic method for solving HJB equations.
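
One common way to make a boundary condition hold by construction, shown here only as an illustrative sketch, is to write the critic as the boundary data plus a network output multiplied by a function that vanishes on the boundary. The symbols $g$, $\eta$, $N_\theta$ and $\Omega$ are assumptions introduced for illustration; the abstract does not state the exact architecture used.

```latex
% Illustrative ansatz (assumed, not taken from the paper):
%   \Omega    : the domain, with boundary \partial\Omega
%   g         : prescribed boundary values, extended smoothly into \Omega
%   \eta      : smooth function with \eta = 0 on \partial\Omega and \eta > 0 inside \Omega
%   N_\theta  : neural network with parameters \theta
V_\theta(x) = g(x) + \eta(x)\, N_\theta(x),
\qquad
V_\theta(x) = g(x) \quad \text{for all } x \in \partial\Omega .
```

With a form like this, the boundary values hold for every $\theta$, so no boundary penalty term is needed in the training loss.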



Article 413

Title@2025-07-08 (2): BOOST: Out-of-Distribution-Informed Adaptive Sampling for Bias Mitigation in Stylistic Convolutional Neural Networks

Title: BOOST: Out-of-Distribution-Informed Adaptive Sampling for Bias Mitigation in Stylistic Convolutional Neural Networks BOOST: Out-of-Distribution-informierte Adaptive Probenahme für Bias Mitigation in stylistischen konvolutionären Neuralen Netzwerken BOOST:在立体革命神经网络中为减轻细菌影响而进行分配外-综合的适应性抽样 2507.07134v1

Authors (4): Mridula Vijendran, Shuang Chen, Jingjing Deng, Hubert P. H. Shum

The pervasive issue of bias in AI presents a significant challenge to painting classification, and is getting more serious as these systems become increasingly integrated into tasks like art curation and restoration. Biases, often arising from imbalanced datasets where certain artistic styles dominate, compromise the fairness and accuracy of model predictions, i.e., classifiers are less accurate on rarely seen paintings. While prior research has made strides in improving classification performance, it has largely overlooked the critical need to address these underlying biases, that is, when dealing with out-of-distribution (OOD) data. Our insight highlights the necessity of a more robust approach to bias mitigation in AI models for art classification on biased training data. We propose a novel OOD-informed model bias adaptive sampling method called BOOST (Bias-Oriented OOD Sampling and Tuning). It addresses these challenges by dynamically adjusting temperature scaling and sampling probabilities, thereby promoting a more equitable representation of all classes. We evaluate our proposed approach on the KaoKore and PACS datasets, focusing on the model’s ability to reduce class-wise bias. We further propose a new metric, Same-Dataset OOD Detection Score (SODC), designed to assess class-wise separation and per-class bias reduction. Our method demonstrates the ability to balance high performance with fairness, making it a robust solution for unbiasing AI models in the art domain.
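
For readers unfamiliar with the two quantities being adjusted, the following minimal sketch shows generic temperature scaling of logits and a simple inverse-frequency sampling rule. It is not BOOST's actual update rule, which the abstract does not specify; the function names and the inverse-frequency choice are assumptions made for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(class_counts):
    """Hypothetical sampling weights proportional to 1 / class count,
    normalised so that they sum to 1."""
    inverse = [1.0 / count for count in class_counts]
    total = sum(inverse)
    return [w / total for w in inverse]
```

A temperature above 1 flattens the predicted distribution and a temperature below 1 sharpens it, while the inverse-frequency weights make rare classes more likely to be drawn during training.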



Article 414

Title@2025-07-08 (2): Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders

Title: Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders Erforschen der Aufgabenleistung mit Interpretierbaren Modellen über Sparse Auto-Encoder 通过 Sparass Auto- Enccders 与可解释模型探索任务性能 2507.06427v1

Authors (8): Shun Wang, Tyler Loakman, Youbo Lei, Yi Liu, Bohao Yang, Yuting Zhao, Dong Yang, Chenghua Lin

Large Language Models (LLMs) are traditionally viewed as black-box algorithms, thereby reducing trustworthiness and obscuring potential approaches to increasing performance on downstream tasks. In this work, we apply an effective LLM decomposition method using a dictionary-learning approach with sparse autoencoders. This helps extract monosemantic features from polysemantic LLM neurons. Remarkably, our work identifies model-internal misunderstanding, allowing the automatic reformulation of the prompts with additional annotations to improve the interpretation by LLMs. Moreover, this approach demonstrates a significant performance improvement in downstream tasks, such as mathematical reasoning and metaphor detection.



Article 415

Title@2025-07-08 (2): Generative Panoramic Image Stitching

Title: Generative Panoramic Image Stitching Generative Panoramabildstiche 生成全景图像切换 2507.07133v1

Authors (3): Mathieu Tuli, Kaveh Kamali, David B. Lindell

We introduce the task of generative panoramic image stitching, which aims to synthesize seamless panoramas that are faithful to the content of multiple reference images containing parallax effects and strong variations in lighting, camera capture settings, or style. In this challenging setting, traditional image stitching pipelines fail, producing outputs with ghosting and other artifacts. While recent generative models are capable of outpainting content consistent with multiple reference images, they fail when tasked with synthesizing large, coherent regions of a panorama. To address these limitations, we propose a method that fine-tunes a diffusion-based inpainting model to preserve a scene’s content and layout based on multiple reference images. Once fine-tuned, the model outpaints a full panorama from a single reference image, producing a seamless and visually coherent result that faithfully integrates content from all reference images. Our approach significantly outperforms baselines for this task in terms of image quality and the consistency of image structure and scene layout when evaluated on captured datasets.



Article 416

Title@2025-07-08 (2): Capsule-ConvKAN: A Hybrid Neural Approach to Medical Image Classification

Title: Capsule-ConvKAN: A Hybrid Neural Approach to Medical Image Classification Kapsel-ConvKAN: Ein hybrider Neuralansatz zur Klassifizierung medizinischer Bilder Capsule-ConvKAN:医学图像分类混合神经方法 2507.06417v1

Authors (3): Laura Pituková, Peter Sinčák, László József Kovács

This study conducts a comprehensive comparison of four neural network architectures: Convolutional Neural Network, Capsule Network, Convolutional Kolmogorov–Arnold Network, and the newly proposed Capsule–Convolutional Kolmogorov–Arnold Network. The proposed Capsule-ConvKAN architecture combines the dynamic routing and spatial hierarchy capabilities of Capsule Network with the flexible and interpretable function approximation of Convolutional Kolmogorov–Arnold Networks. This novel hybrid model was developed to improve feature representation and classification accuracy, particularly in challenging real-world biomedical image data. The architectures were evaluated on a histopathological image dataset, where Capsule-ConvKAN achieved the highest classification performance with an accuracy of 91.21%. The results demonstrate the potential of the newly introduced Capsule-ConvKAN in capturing spatial patterns, managing complex features, and addressing the limitations of traditional convolutional models in medical image classification.



Article 417

Title@2025-07-08 (2): PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning

Title: PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning PERK: Lang-Kontext als Parameter-Effizientes Test-Zeit-Lernen PERK: 参数有效试验时间学习的长期理由 2507.06415v1

Authors (4): Zeming Chen, Angelika Romanou, Gail Weiss, Antoine Bosselut

Long-context reasoning requires accurately identifying relevant information in extensive, noisy input contexts. Previous research shows that using test-time learning to encode context directly into model parameters can effectively enable reasoning over noisy information. However, meta-learning methods for enabling test-time learning are prohibitively memory-intensive, preventing their application to long context settings. In this work, we propose PERK (Parameter Efficient Reasoning over Knowledge), a scalable approach for learning to encode long input contexts using gradient updates to a lightweight model adapter at test time. Specifically, PERK employs two nested optimization loops in a meta-training phase. The inner loop rapidly encodes contexts into a low-rank adapter (LoRA) that serves as a parameter-efficient memory module for the base model. Concurrently, the outer loop learns to use the updated adapter to accurately recall and reason over relevant information from the encoded long context. Our evaluations on several long-context reasoning tasks show that PERK significantly outperforms the standard prompt-based long-context baseline, achieving average absolute performance gains of up to 90% for smaller models (GPT-2) and up to 27% for our largest evaluated model, Qwen-2.5-0.5B. In general, PERK is more robust to reasoning complexity, length extrapolation, and the locations of relevant information in contexts. Finally, we show that while PERK is memory-intensive during training, it scales more efficiently at inference time than prompt-based long-context inference.
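
As background on why a low-rank adapter is a parameter-efficient place to store context, the sketch below shows the standard LoRA reparameterisation, W' = W + (alpha / r) · B·A, together with its trainable-parameter count. It illustrates generic LoRA only, not PERK's meta-training loops; the helper names and the toy matrices are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w is d_out x d_in (frozen base weight), a is r x d_in and
    b is d_out x r (the two trainable low-rank factors).
    """
    rank = len(a)
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def lora_trainable_params(d_out, d_in, rank):
    """Number of trainable entries in the two low-rank factors."""
    return rank * (d_in + d_out)
```

For a 768 x 768 layer with rank 8, the adapter holds 12,288 trainable values against 589,824 in the full matrix, which is why gradient updates to the adapter alone are comparatively cheap.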



Article 418

Title@2025-07-08 (2): Tokenization for Molecular Foundation Models

Title: Tokenization for Molecular Foundation Models Tokenization für molekulare Stiftungsmodelle 分子基础模型的代名化 2409.15370v3

Authors (3): Alexius Wadell, Anoushka Bhutani, Venkatasubramanian Viswanathan

Text-based foundation models have become an important part of scientific discovery, with molecular foundation models accelerating advancements in material science and molecular design. However, existing models are constrained by closed-vocabulary tokenizers that capture only a fraction of molecular space. In this work, we systematically evaluate 34 tokenizers, including 19 chemistry-specific ones, and reveal significant gaps in their coverage of the SMILES molecular representation. To assess the impact of tokenizer choice, we introduce n-gram language models as a low-cost proxy and validate their effectiveness by pretraining and finetuning 18 RoBERTa-style encoders for molecular property prediction. To overcome the limitations of existing tokenizers, we propose two new tokenizers – Smirk and Smirk-GPE – with full coverage of the OpenSMILES specification. The proposed tokenizers systematically integrate nuclear, electronic, and geometric degrees of freedom, facilitating applications in pharmacology, agriculture, biology, and energy storage. Our results highlight the need for open-vocabulary modeling and chemically diverse benchmarks in cheminformatics.



Article 419

Title@2025-07-08 (2): MedSyn: Enhancing Diagnostics with Human-AI Collaboration

Title: MedSyn: Enhancing Diagnostics with Human-AI Collaboration MedSyn: Diagnostik durch Mensch-AI-Kollaboration verbessern MedSyn:与人类-AI合作加强诊断 2506.14774v2

Authors (7): Burcu Sayin, Ipek Baris Schlicht, Ngoc Vo Hong, Sara Allievi, Jacopo Staiano, Pasquale Minervini, Andrea Passerini

Clinical decision-making is inherently complex, often influenced by cognitive biases, incomplete information, and case ambiguity. Large Language Models (LLMs) have shown promise as tools for supporting clinical decision-making, yet their typical one-shot or limited-interaction usage may overlook the complexities of real-world medical practice. In this work, we propose a hybrid human-AI framework, MedSyn, where physicians and LLMs engage in multi-step, interactive dialogues to refine diagnoses and treatment decisions. Unlike static decision-support tools, MedSyn enables dynamic exchanges, allowing physicians to challenge LLM suggestions while the LLM highlights alternative perspectives. Through simulated physician-LLM interactions, we assess the potential of open-source LLMs as physician assistants. Results show open-source LLMs are promising as physician assistants in the real world. Future work will involve real physician interactions to further validate MedSyn’s usefulness in diagnostic accuracy and patient outcomes.



Article 420

Title@2025-07-08 (2): SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents

Title: SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents SHADE-Arena: Bewertung von Sabotage und Überwachung in LLM-Agenten SHADE-Arena:用LLM代理物评估破坏和监测 2506.15740v2

Authors (12): Jonathan Kutasov, Yuqi Sun, Paul Colognese, Teun van der Weij, Linda Petrini, Chen Bo Calvin Zhang, John Hughes, Xiang Deng, Henry Sleight, Tyler Tracy, Buck Shlegeris, Joe Benton

As Large Language Models (LLMs) are increasingly deployed as autonomous agents in complex and long horizon settings, it is critical to evaluate their ability to sabotage users by pursuing hidden objectives. We study the ability of frontier LLMs to evade monitoring and achieve harmful hidden goals while completing a wide array of realistic tasks. We evaluate a broad range of frontier LLMs using SHADE (Subtle Harmful Agent Detection & Evaluation)-Arena, the first highly diverse agent evaluation dataset for sabotage and monitoring capabilities of LLM agents. SHADE-Arena consists of complex pairs of benign main tasks and harmful side objectives in complicated environments. Agents are evaluated on their ability to complete the side task without appearing suspicious to an LLM monitor. When measuring agent ability to (a) complete the main task, (b) complete the side task, and (c) avoid detection, we find that the best performing frontier models score 27% (Claude 3.7 Sonnet) and 15% (Gemini 2.5 Pro) as sabotage agents when overseen by Claude 3.6 Sonnet. For current frontier models, success on the side task relies heavily on having access to a hidden scratchpad that is not visible to the monitor. We also use SHADE-Arena to measure models’ monitoring abilities, with the top monitor (Gemini 2.5 Pro) achieving an AUC of 0.87 at distinguishing benign and malign transcripts. We find that for now, models still struggle at sabotage due to failures in long-context main task execution. However, our measurements already demonstrate the difficulty of monitoring for subtle sabotage attempts, which we expect to only increase in the face of more complex and longer-horizon tasks.
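
To make the reported monitor AUC concrete, here is a minimal sketch of how an AUC can be computed from suspicion scores: it is the probability that a randomly chosen malign transcript receives a higher score than a randomly chosen benign one, with ties counted as one half. The function name and the toy scores are assumptions; this is not the SHADE-Arena evaluation code.

```python
def roc_auc(scores_benign, scores_malign):
    """Pairwise AUC: P(malign score > benign score), ties count 0.5."""
    if not scores_benign or not scores_malign:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for malign in scores_malign:
        for benign in scores_benign:
            if malign > benign:
                wins += 1.0
            elif malign == benign:
                wins += 0.5
    return wins / (len(scores_malign) * len(scores_benign))
```

An AUC of 0.5 corresponds to a monitor that cannot separate the two classes at all, and 1.0 to perfect separation, so a value of 0.87 leaves a meaningful share of malign transcripts ranked below benign ones.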



Article 421

Title@2025-07-08 (2): Many-Task Federated Fine-Tuning via Unified Task Vectors

Title: Many-Task Federated Fine-Tuning via Unified Task Vectors Multi-Task-Federated Fine-Tuning über Unified Task-Vektoren 通过统一任务矢量进行许多任务、许多任务、联邦、联邦和 2502.06376v3

Authors (3): Vasileios Tsouvalas, Tanir Ozcelebi, Nirvana Meratnia

Federated Learning (FL) traditionally assumes homogeneous client tasks; however, in real-world scenarios, clients often specialize in diverse tasks, introducing task heterogeneity. To address this challenge, Many-Task FL (MaT-FL) has emerged, enabling clients to collaborate effectively despite task diversity. Existing MaT-FL approaches rely on client grouping or personalized layers, requiring the server to manage individual models and failing to account for clients handling multiple tasks. We propose MaTU, a MaT-FL approach that enables joint learning of task vectors across clients, eliminating the need for clustering or client-specific weight storage at the server. Our method introduces a novel aggregation mechanism that determines task similarity based on the direction of clients' task vectors and constructs a unified task vector encapsulating all tasks. To address task-specific requirements, we augment the unified task vector with lightweight modulators that facilitate knowledge transfer among related tasks while disentangling dissimilar ones. Evaluated across 30 datasets, MaTU achieves superior performance over state-of-the-art MaT-FL approaches, with results comparable to per-task fine-tuning, while delivering significant communication savings.
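
The idea of comparing tasks by the direction of their task vectors can be sketched as follows, using the usual definition of a task vector as fine-tuned weights minus pre-trained weights and cosine similarity as the direction measure. Cosine similarity is an assumption here; the abstract does not say which directional measure or aggregation rule MaTU uses, and the function names are hypothetical.

```python
import math


def task_vector(finetuned, pretrained):
    """Element-wise difference between fine-tuned and pre-trained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Two clients whose updates point the same way score close to 1 regardless of update magnitude, while unrelated tasks score near 0, which is the kind of signal a server could use to decide which tasks should share knowledge.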



Article 422

Title@2025-07-08 (2): Learning to Evaluate Autonomous Behaviour in Human-Robot Interaction

Title: Learning to Evaluate Autonomous Behaviour in Human-Robot Interaction Lernen, autonomes Verhalten in Mensch-Roboter-Interaktion zu bewerten 学习评估人类机器人互动中的自主行为 2507.06404v1

Authors (8): Matteo Tiezzi, Tommaso Apicella, Carlos Cardenas-Perez, Giovanni Fregonese, Stefano Dafarra, Pietro Morerio, Daniele Pucci, Alessio Del Bue

Evaluating and comparing the performance of autonomous Humanoid Robots is challenging, as success rate metrics are difficult to reproduce and fail to capture the complexity of robot movement trajectories, critical in Human-Robot Interaction and Collaboration (HRIC). To address these challenges, we propose a general evaluation framework that measures the quality of Imitation Learning (IL) methods by focusing on trajectory performance. We devise the Neural Meta Evaluator (NeME), a deep learning model trained to classify actions from robot joint trajectories. NeME serves as a meta-evaluator to compare the performance of robot control policies, enabling policy evaluation without requiring human involvement in the loop. We validate our framework on ergoCub, a humanoid robot, using teleoperation data and comparing IL methods tailored to the available platform. The experimental results indicate that our method is more aligned with the success rate obtained on the robot than baselines, offering a reproducible, systematic, and insightful means for comparing the performance of multimodal imitation learning approaches in complex HRI tasks.



Article 423

Title@2025-07-08 (2): Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach

Title: Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach Feinabstimmung multimodaler Transformer am Rand: Ein paralleler Split-Lernansatz 边缘的微调多式变形器:平行分割学习方法 2502.06355v3

Authors (3): Timo Fudala, Vasileios Tsouvalas, Nirvana Meratnia

Multimodal transformers integrate diverse data types like images, audio, and text, advancing tasks such as audio-visual understanding and image-text retrieval; yet their high parameterization limits deployment on resource-constrained edge devices. Split Learning (SL), which partitions models at a designated cut-layer to offload compute-intensive operations to the server, offers a promising approach for distributed training of multimodal transformers, though its application remains underexplored. We present MPSL, a parallel SL approach for computationally efficient fine-tuning of multimodal transformers in a distributed manner, while eliminating label sharing, client synchronization, and per-client sub-model management. MPSL employs lightweight client-side tokenizers and a unified modality-agnostic encoder, allowing flexible adaptation to task-specific needs. Our evaluation across 7 multimodal datasets demonstrates that MPSL matches or outperforms Federated Learning, reduces client-side computations by 250x, and achieves superior scalability in communication cost with model growth. Through extensive analysis, we highlight task suitability, trade-offs, and scenarios where MPSL excels, inspiring further exploration.



Article 424

Title@2025-07-08 (2): Detection of Intelligent Tampering in Wireless Electrocardiogram Signals Using Hybrid Machine Learning

Title: Detection of Intelligent Tampering in Wireless Electrocardiogram Signals Using Hybrid Machine Learning Erkennung der intelligenten Zähmung in drahtlosen Elektrokardiogrammsignalen mittels hybridem maschinellem Lernen 利用混合机器学习,在无线电心电图信号中探测智能安装设备 2507.06402v1

Authors (3): Siddhant Deshpande, Yalemzerf Getnet, Waltenegus Dargie

With the proliferation of wireless electrocardiogram (ECG) systems for health monitoring and authentication, protecting signal integrity against tampering is becoming increasingly important. This paper analyzes the performance of CNN, ResNet, and hybrid Transformer-CNN models for tamper detection. It also evaluates the performance of a Siamese network for ECG-based identity verification. Six tampering strategies, including structured segment substitutions and random insertions, are emulated to mimic real-world attacks. The one-dimensional ECG signals are transformed into a two-dimensional representation in the time-frequency domain using the continuous wavelet transform (CWT). The models are trained and evaluated using ECG data from 54 subjects recorded in four sessions from 2019 to 2025 outside of clinical settings while the subjects performed seven different daily activities. Experimental results show that in highly fragmented manipulation scenarios, CNN, FeatCNN-TranCNN, FeatCNN-Tran and ResNet models achieved an accuracy exceeding 99.5 percent. Similarly, for subtle manipulations (for example, substitutions of 50 percent from A and 50 percent from B, or 75 percent from A and 25 percent from B), our FeatCNN-TranCNN model demonstrated consistently reliable performance, achieving an average accuracy of 98 percent. For identity verification, the pure Transformer-Siamese network achieved an average accuracy of 98.30 percent. In contrast, the hybrid CNN-Transformer Siamese model delivered perfect verification performance with 100 percent accuracy.
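
The step that turns a one-dimensional signal into a two-dimensional time-frequency grid can be sketched with a direct, unoptimised CWT. The real-valued Morlet-style wavelet, the parameter `w0 = 5.0`, and the function names are assumptions chosen for illustration; the abstract does not state which wavelet or implementation the authors used.

```python
import math


def morlet(t, w0=5.0):
    """Real-valued Morlet-style wavelet: Gaussian envelope times a cosine."""
    return math.exp(-0.5 * t * t) * math.cos(w0 * t)


def cwt(signal, scales, w0=5.0):
    """Return a len(scales) x len(signal) grid of wavelet coefficients.

    coeffs[i][b] correlates the signal with the wavelet stretched by
    scales[i] (measured in samples) and centred on sample b.
    """
    n = len(signal)
    coeffs = []
    for a in scales:
        norm = 1.0 / math.sqrt(a)
        row = []
        for b in range(n):
            acc = 0.0
            for k in range(n):
                acc += signal[k] * morlet((k - b) / a, w0)
            row.append(norm * acc)
        coeffs.append(row)
    return coeffs


def scale_energy(coeffs):
    """Sum of squared coefficients per scale (one value per row)."""
    return [sum(c * c for c in row) for row in coeffs]
```

Each row of the returned grid corresponds to one scale and each column to one time sample, so the grid can be treated as an image by a CNN. The triple loop costs O(scales × n²) and is meant for small toy signals only; a practical pipeline would use an FFT-based routine.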



Article 425

Title@2025-07-08 (2): The Trilemma of Truth in Large Language Models

Title: The Trilemma of Truth in Large Language Models Das Trilemma der Wahrheit in großen Sprachmodellen 大语言模型中的真理三边 2506.23921v2

Authors (2): Germans Savcisens, Tina Eliassi-Rad

We often attribute human characteristics to large language models (LLMs) and claim that they “know” certain things. LLMs have an internal probabilistic knowledge that represents information retained during training. How can we assess the veracity of this knowledge? We examine two common methods for probing the veracity of LLMs and discover several assumptions that are flawed. To address these flawed assumptions, we introduce sAwMIL (short for Sparse Aware Multiple-Instance Learning), a probing method that utilizes the internal activations of LLMs to separate statements into true, false, and neither. sAwMIL is based on multiple-instance learning and conformal prediction. We evaluate sAwMIL on 5 validity criteria across 16 open-source LLMs, including both default and chat-based variants, as well as on 3 new datasets. Among the insights we provide are: (1) the veracity signal is often concentrated in the third quarter of an LLM’s depth; (2) truth and falsehood signals are not always symmetric; (3) linear probes perform better on chat models than on default models; (4) nonlinear probes may be required to capture veracity signals for some LLMs with reinforcement learning from human feedback or knowledge distillation; and (5) LLMs capture a third type of signal that is distinct from true and false and is neither true nor false. These findings provide a reliable method for verifying what LLMs “know” and how certain they are of their probabilistic internal knowledge.



Article 426

Title@2025-07-08 (2): Representing Prompting Patterns with PDL: Compliance Agent Case Study

Title: Representing Prompting Patterns with PDL: Compliance Agent Case Study Präsentieren von Prompting Patterns mit PDL: Compliance Agent Case Study 代表PDL的提示模式:合规代理案例研究 2507.06396v1

Authors (6): Mandana Vaziri, Louis Mandel, Yuji Watanabe, Hirokuni Kitahara, Martin Hirzel, Anca Sailer

Prompt engineering for LLMs remains complex, with existing frameworks either hiding complexity behind restrictive APIs or providing inflexible canned patterns that resist customization – making sophisticated agentic programming challenging. We present the Prompt Declaration Language (PDL), a novel approach to prompt representation that tackles this fundamental complexity by bringing prompts to the forefront, enabling manual and automatic prompt tuning while capturing the composition of LLM calls together with rule-based code and external tools. By abstracting away the plumbing for such compositions, PDL aims at improving programmer productivity while providing a declarative representation that is amenable to optimization. This paper demonstrates PDL’s utility through a real-world case study of a compliance agent. Tuning the prompting pattern of this agent yielded up to 4x performance improvement compared to using a canned agent and prompt pattern.



Article 427

Title@2025-07-08 (2): Nonlinear denoising score matching for enhanced learning of structured distributions

Title: Nonlinear denoising score matching for enhanced learning of structured distributions Nichtlineare Denoisierungs-Score-Matching für verbessertes Lernen von strukturierten Distributionen 非线性非线性除空比对匹配,以加强结构化分配的学习 2405.15625v2

Authors (5): Jeremiah Birrell, Markos A. Katsoulakis, Luc Rey-Bellet, Benjamin J. Zhang, Wei Zhu

We present a novel method for training score-based generative models which uses nonlinear noising dynamics to improve learning of structured distributions. Generalizing to a nonlinear drift allows for additional structure to be incorporated into the dynamics, thus making the training better adapted to the data, e.g., in the case of multimodality or (approximate) symmetries. Such structure can be obtained from the data by an inexpensive preprocessing step. The nonlinear dynamics introduces new challenges into training which we address in two ways: 1) we develop a new nonlinear denoising score matching (NDSM) method, 2) we introduce neural control variates in order to reduce the variance of the NDSM training objective. We demonstrate the effectiveness of this method on several examples: a) a collection of low-dimensional examples, motivated by clustering in latent space, b) high-dimensional images, addressing issues with mode imbalance, small training sets, and approximate symmetries, the latter being a challenge for methods based on equivariant neural networks, which require exact symmetries, c) latent space representation of high-dimensional data, demonstrating improved performance with greatly reduced computational cost. Our method learns score-based generative models with less data by flexibly incorporating structure arising in the dataset.



Article 428

Title@2025-07-08 (2): Ampere: Communication-Efficient and High-Accuracy Split Federated Learning

Title: Ampere: Communication-Efficient and High-Accuracy Split Federated Learning Ampere: Kommunikationseffizientes und hochgenaues Split-Federated-Learning Ampere: 通信效率和高准确度分立联邦学习 2507.07130v1

Authors (3): Zihan Zhang, Leon Wong, Blesson Varghese

A Federated Learning (FL) system collaboratively trains neural networks across devices and a server but is limited by significant on-device computation costs. Split Federated Learning (SFL) systems mitigate this by offloading a block of layers of the network from the device to a server. However, in doing so, they introduce large communication overheads due to frequent exchanges of intermediate activations and gradients between devices and the server, and they reduce model accuracy for non-IID data. We propose Ampere, a novel collaborative training system that simultaneously minimizes on-device computation and device-server communication while improving model accuracy. Unlike SFL, which uses a global loss by iterative end-to-end training, Ampere develops unidirectional inter-block training to sequentially train the device and server block with a local loss, eliminating the transfer of gradients. A lightweight auxiliary network generation method decouples training between the device and server, reducing frequent intermediate exchanges to a single transfer, which significantly reduces the communication overhead. Ampere mitigates the impact of data heterogeneity by consolidating activations generated by the trained device block to train the server block, in contrast to SFL, which trains on device-specific, non-IID activations. Extensive experiments on multiple CNNs and transformers show that, compared to state-of-the-art SFL baseline systems, Ampere (i) improves model accuracy by up to 13.26% while reducing training time by up to 94.6%, (ii) reduces device-server communication overhead by up to 99.1% and on-device computation by up to 93.13%, and (iii) reduces standard deviation of accuracy by 53.39% for various non-IID degrees, highlighting superior performance when faced with heterogeneous data.



Article 429

Title@2025-07-08 (2): KPFlow: An Operator Perspective on Dynamic Collapse Under Gradient Descent Training of Recurrent Networks

Title: KPFlow: An Operator Perspective on Dynamic Collapse Under Gradient Descent Training of Recurrent Networks KPFlow: Eine Operator-Perspektive zum dynamischen Kollaps bei gradienter Abwärtsschulung von recurrenten Netzwerken KPFlow:经常网络渐渐下降源培训的操作员视角 2507.06381v1

Authors (4): James Hazelden, Laura Driscoll, Eli Shlizerman, Eric Shea-Brown

Gradient Descent (GD) and its variants are the primary tool for enabling efficient training of recurrent dynamical systems such as Recurrent Neural Networks (RNNs), Neural ODEs and Gated Recurrent Units (GRUs). The dynamics that are formed in these models exhibit features such as neural collapse and emergence of latent representations that may support the remarkable generalization properties of networks. In neuroscience, qualitative features of these representations are used to compare learning in biological and artificial systems. Despite recent progress, there remains a need for theoretical tools to rigorously understand the mechanisms shaping learned representations, especially in finite, non-linear models. Here, we show that the gradient flow, which describes how the model’s dynamics evolve over GD, can be decomposed into a product that involves two operators: a Parameter Operator, K, and a Linearized Flow Propagator, P. K mirrors the Neural Tangent Kernel in feed-forward neural networks, while P appears in Lyapunov stability and optimal control theory. We demonstrate two applications of our decomposition. First, we show how their interplay gives rise to low-dimensional latent dynamics under GD, and, specifically, how the collapse is a result of the network structure, over and above the nature of the underlying task. Second, for multi-task training, we show that the operators can be used to measure how objectives relevant to individual sub-tasks align. We experimentally and theoretically validate these findings, providing an efficient PyTorch package, KPFlow, implementing robust analysis tools for general recurrent architectures. Taken together, our work moves towards building a next stage of understanding of GD learning in non-linear recurrent models.



Article 430

Title@2025-07-08 (2): Secure and Storage-Efficient Deep Learning Models for Edge AI Using Automatic Weight Generation

Title: Secure and Storage-Efficient Deep Learning Models for Edge AI Using Automatic Weight Generation Sichere und speicherfähige Deep-Learning-Modelle für Edge-KI mit automatischer Gewichtsgenerierung 使用自动重力生成的边缘AI的安全和储存高效深层学习模型 2507.06380v1

Authors (3): Habibur Rahaman, Atri Chatterjee, Swarup Bhunia

Complex neural networks require substantial memory to store a large number of synaptic weights. This work introduces WINGs (Automatic Weight Generator for Secure and Storage-Efficient Deep Learning Models), a novel framework that dynamically generates layer weights in a fully connected neural network (FC) and compresses the weights in convolutional neural networks (CNNs) during inference, significantly reducing memory requirements without sacrificing accuracy. The WINGs framework uses principal component analysis (PCA) for dimensionality reduction and lightweight support vector regression (SVR) models to predict layer weights in the FC networks, removing the need for storing full weight matrices and achieving substantial memory savings. It also preferentially compresses the weights in low-sensitivity layers of CNNs using PCA and SVR with sensitivity analysis. The sensitivity-aware design also offers an added level of security, as any bit-flip attack on weights in compressed layers has an amplified and readily detectable effect on accuracy. WINGs achieves 53x compression for the FC layers and 28x for AlexNet on the MNIST dataset, and 18x for AlexNet on the CIFAR-10 dataset, with 1-2% accuracy loss. This significant reduction in memory results in higher throughput and lower energy for DNN inference, making it attractive for resource-constrained edge applications.


Article 431

Title@2025-07-08 (2): Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos

Title: Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos Thermisator: Stabile autoregressive neuronale Emulation des spatiotemporalen Chaos 热水器:稳定的自动递减性神经神经模拟时空混乱 2503.18731v2

Authors (3): Chris Pedersen, Laure Zanna, Joan Bruna

Autoregressive surrogate models (or emulators) of spatiotemporal systems provide an avenue for fast, approximate predictions, with broad applications across science and engineering. At inference time, however, these models are generally unable to provide predictions over long time rollouts due to accumulation of errors leading to diverging trajectories. In essence, emulators operate out of distribution, and controlling the online distribution quickly becomes intractable in large-scale settings. To address this fundamental issue, and focusing on time-stationary systems admitting an invariant measure, we leverage diffusion models to obtain an implicit estimator of the score of this invariant measure. We show that this model of the score function can be used to stabilize autoregressive emulator rollouts by applying on-the-fly denoising during inference, a process we call thermalization. Thermalizing an emulator rollout is shown to extend the time horizon of stable predictions by an order of magnitude in complex systems exhibiting turbulent and chaotic behavior, opening up a novel application of diffusion models in the context of neural emulation.


Article 432

Title@2025-07-08 (2): Efficient Decision Trees for Tensor Regressions

Title: Efficient Decision Trees for Tensor Regressions Effiziente Entscheidungsbäume für Tensor-Regressionen 用于回退的高效决策树 2408.01926v2

Authors (3): Hengrui Luo, Akira Horiguchi, Li Ma

We propose the tensor-input tree (TT) method for scalar-on-tensor and tensor-on-tensor regression problems. We first address the scalar-on-tensor problem by proposing scalar-output regression tree models whose input variables are tensors (i.e., multi-way arrays). We devise and implement fast randomized and deterministic algorithms for efficient fitting of scalar-on-tensor trees, making TT competitive against tensor-input GP models. Based on scalar-on-tensor tree models, we extend our method to tensor-on-tensor problems using additive tree ensemble approaches. Theoretical justification and extensive experiments on real and synthetic datasets are provided to illustrate the performance of TT.


Article 433

Title@2025-07-08 (2): The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks

Title: The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks Die Riemannsche Geometrie im Zusammenhang mit Gradient Flows of Linear Convolutional Networks 与线性革命网络的梯度流相关的里伊曼尼几何学 2507.06367v1

Authors (3): El Mehdi Achour, Kathlén Kohn, Holger Rauhut

We study geometric properties of the gradient flow for learning deep linear convolutional networks. For linear fully connected networks, it has been shown recently that the corresponding gradient flow on parameter space can be written as a Riemannian gradient flow on function space (i.e., on the product of weight matrices) if the initialization satisfies a so-called balancedness condition. We establish that the gradient flow on parameter space for learning linear convolutional networks can be written as a Riemannian gradient flow on function space regardless of the initialization. This result holds for $D$-dimensional convolutions with $D \geq 2$, and for $D =1$ it holds if all so-called strides of the convolutions are greater than one. The corresponding Riemannian metric depends on the initialization.


Article 434

Title@2025-07-08 (2): DecoyDB: A Dataset for Graph Contrastive Learning in Protein-Ligand Binding Affinity Prediction

Title: DecoyDB: A Dataset for Graph Contrastive Learning in Protein-Ligand Binding Affinity Prediction DecoyDB: Ein Datensatz für Graph Kontrastives Lernen in Protein-Ligand Bindung Affinity Prediction 代号DB:蛋白碱-激光捆绑性近亲预测中的图表对比学习数据集 2507.06366v1

Authors (7): Yupu Zhang, Zelin Xu, Tingsong Xiao, Gustavo Seabra, Yanjun Li, Chenglong Li, Zhe Jiang

Predicting the binding affinity of protein-ligand complexes plays a vital role in drug discovery. Unfortunately, progress has been hindered by the lack of large-scale and high-quality binding affinity labels. The widely used PDBbind dataset has fewer than 20K labeled complexes. Self-supervised learning, especially graph contrastive learning (GCL), provides a unique opportunity to break the barrier by pre-training graph neural network models based on vast unlabeled complexes and fine-tuning the models on much fewer labeled complexes. However, the problem faces unique challenges, including a lack of a comprehensive unlabeled dataset with well-defined positive/negative complex pairs and the need to design GCL algorithms that incorporate the unique characteristics of such data. To fill the gap, we propose DecoyDB, a large-scale, structure-aware dataset specifically designed for self-supervised GCL on protein-ligand complexes. DecoyDB consists of high-resolution ground truth complexes (less than 2.5 Angstrom) and diverse decoy structures with computationally generated binding poses that range from realistic to suboptimal (negative pairs). Each decoy is annotated with a Root Mean Squared Deviation (RMSD) from the native pose. We further design a customized GCL framework to pre-train graph neural networks based on DecoyDB and fine-tune the models with labels from PDBbind. Extensive experiments confirm that models pre-trained with DecoyDB achieve superior accuracy, label efficiency, and generalizability.
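
For illustration, the RMSD annotation mentioned above can be computed as in the following minimal sketch. It assumes the decoy and native poses list the same atoms in the same order and that no superposition step is needed; the function name `rmsd` and the toy coordinates are hypothetical and not taken from DecoyDB.

```python
import math


def rmsd(pose_a, pose_b):
    """Root mean squared deviation between two poses.

    Each pose is a list of (x, y, z) tuples with matched atom ordering
    (an assumption of this sketch).
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


# Hypothetical two-atom ligand: the decoy is shifted by 1 Angstrom along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(native, decoy))
```

A decoy identical to the native pose has an RMSD of zero, and larger values indicate poses further from the ground truth.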


Article 435

Title@2025-07-08 (2): Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate

Title: Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate Wachsende Transformer: Modulare Zusammensetzung und schichtweise Erweiterung auf einem gefrorenen Substrat 增长变异器:在冻结基质上的模块构成和图层扩展 2507.07129v1

Authors (1): A. Bochkov

The prevailing paradigm for scaling large language models (LLMs) involves monolithic, end-to-end training, a resource-intensive process that lacks flexibility. This paper explores an alternative, constructive approach to model development, built upon the foundation of non-trainable, deterministic input embeddings. In prior work [1], we established that high-level semantic reasoning can emerge in Transformers using frozen embeddings derived from the visual structure of Unicode glyphs. Here, we demonstrate that this fixed representational substrate acts as a universal “docking port,” enabling two powerful and efficient scaling paradigms: seamless modular composition and progressive layer-wise growth. First, we show that specialist models trained on disparate datasets (e.g., Russian and Chinese text) can be merged into a single, more capable Mixture-of-Experts (MoE) model, post-training, with zero architectural modification. This is achieved by simply averaging their output logits. The resulting MoE model exhibits immediate performance improvements on reasoning benchmarks like MMLU, surpassing its constituent experts without catastrophic forgetting. Second, we introduce a layer-wise constructive training methodology, where a deep Transformer is “grown” by progressively stacking and training one layer at a time. This method demonstrates stable convergence and a clear correlation between model depth and the emergence of complex reasoning abilities, such as those required for SQuAD. Our findings suggest a paradigm shift from monolithic optimization towards a more biological or constructive model of AI development, where complexity is built incrementally and modules can be composed freely. This opens new avenues for resource-efficient scaling, continual learning, and a more democratized ecosystem for building powerful AI systems. We release all code and models to facilitate further research.
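
The logit-averaging step described above can be sketched as follows. This is a toy illustration only: it assumes the experts share one vocabulary (as the shared frozen embeddings would suggest), and the names `average_logits` and `softmax` and the example values are hypothetical rather than taken from the released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_logits(expert_logits):
    """Element-wise mean of per-expert logit vectors over a shared vocabulary."""
    vocab_size = len(expert_logits[0])
    if any(len(logits) != vocab_size for logits in expert_logits):
        raise ValueError("all experts must score the same vocabulary")
    count = len(expert_logits)
    return [sum(logits[i] for logits in expert_logits) / count for i in range(vocab_size)]


# Two hypothetical experts scoring a three-token vocabulary.
expert_a = [2.0, 0.0, -1.0]
expert_b = [0.0, 4.0, -1.0]
merged = average_logits([expert_a, expert_b])
probs = softmax(merged)
print(merged, probs)
```

Averaging in logit space requires no new parameters, which is consistent with the abstract's claim of zero architectural modification.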


Article 436

Title@2025-07-08 (2): ConTextTab: A Semantics-Aware Tabular In-Context Learner

Title: ConTextTab: A Semantics-Aware Tabular In-Context Learner ConTextTab: Ein Semantik-Bewusst Tabular In-Context Learner ConTextTab : 语义学软件制表器 2506.10707v2

Authors (4): Marco Spinaci, Marek Polewczyk, Maximilian Schambach, Sam Thelin

Tabular in-context learning (ICL) has recently achieved state-of-the-art (SOTA) performance on several tabular prediction tasks. Previously restricted to classification problems on small tables, recent advances such as TabPFN and TabICL have extended its use to larger datasets. While being architecturally efficient and well-adapted to tabular data structures, current table-native ICL architectures, being trained exclusively on synthetic data, do not fully leverage the rich semantics and world knowledge contained in real-world tabular data. On another end of this spectrum, tabular ICL models based on pretrained large language models such as TabuLa-8B integrate deep semantic understanding and world knowledge but are only able to make use of a small amount of context due to inherent architectural limitations. With the aim to combine the best of both these worlds, we introduce ConTextTab, integrating semantic understanding and alignment into a table-native ICL framework. By employing specialized embeddings for different data modalities and by training on large-scale real-world tabular data, our model is competitive with SOTA across a broad set of benchmarks while setting a new standard on the semantically rich CARTE benchmark. Code and checkpoints are available at https://github.com/SAP-samples/contexttab


Article 437

Title@2025-07-08 (2): hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation

Title: hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation hdl2v: Ein Code-Übersetzungsdatensatz für verbesserte LLM Verilog-Generierung hdl2v: 用于强化LLM Verilog 生成的代码翻译数据集 2506.04544v2

Authors (6): Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao

Large language models (LLMs) are playing an increasingly large role in domains such as code generation, including hardware code generation, where Verilog is the key language. However, the amount of publicly available Verilog code pales in comparison to the amount of code available for software languages like Python. In this work, we present hdl2v (“HDL-to-Verilog”), a dataset which seeks to increase the amount of available human-written Verilog data by translating or compiling three other hardware description languages - VHDL, Chisel, and PyMTL3 - to Verilog. Furthermore, we demonstrate the value of hdl2v in enhancing LLM Verilog generation by improving performance of a 32 billion-parameter open-weight model by up to 23% (pass@10) in VerilogEvalV2, without utilizing any data augmentation or knowledge distillation from larger models. We also show hdl2v’s ability to boost the performance of a data augmentation-based fine-tuning approach by 63%. Finally, we characterize and analyze our dataset to better understand which characteristics of HDL-to-Verilog datasets can be expanded upon in future work for even better performance.


Article 438

Title@2025-07-08 (2): Deep learning-based species-area models reveal multi-scale patterns of species richness and turnover

Title: Deep learning-based species-area models reveal multi-scale patterns of species richness and turnover Deep Learning-basierte Arten-Gebiet Modelle zeigen Multi-Skala-Muster von Arten Reichtum und Umsatz 基于深深学习的物种区模型揭示了物种丰富性和更替性等多种规模的物种丰富性模式 2507.06358v1

Authors (18): Victor Boussange, Philipp Brun, Johanna T. Malle, Gabriele Midolo, Jeanne Portier, Théophile Sanchez, Niklaus E. Zimmermann, Irena Axmanová, Helge Bruelheide, Milan Chytrý, Stephan Kambach, Zdeňka Lososová, Martin Večeřa, Idoia Biurrun, Klaus T. Ecker, Jonathan Lenoir, Jens-Christian Svenning, Dirk Nikolaus Karger

The number of species within ecosystems is influenced not only by their intrinsic characteristics but also by the spatial scale considered. As the sampled area expands, species richness increases, a phenomenon described by the species-area relationship (SAR). The accumulation dynamics of the SAR result from a complex interplay of biotic and abiotic processes operating at various spatial scales. However, the challenge of collecting exhaustive biodiversity records across spatial scales has hindered a comprehensive understanding of these dynamics. Here, we develop a deep learning approach that leverages sampling theory and small-scale ecological surveys to spatially resolve the scale-dependency of species richness. We demonstrate its performance by predicting the species richness of vascular plant communities across Europe, and evaluate the predictions against an independent dataset of plant community inventories. Our model improves species richness estimates by 32% and delivers spatially explicit patterns of species richness and turnover for sampling areas ranging from square meters to hundreds of square kilometers. Explainable AI techniques further disentangle how drivers of species richness operate across spatial scales. The ability of our model to represent the multi-scale nature of biodiversity is essential to deliver robust biodiversity assessments and forecasts under global change.


Article 439

Title@2025-07-08 (2): On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving

Title: On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving Über die natürliche Robustheit von Vision-Sprachmodellen gegen visuelle Wahrnehmungsangriffe im autonomen Fahren 关于在自主驾驶时反对视觉视力攻击的视觉-语言模型的自然强力 2506.11472v2

Authors (8): Pedram MohajerAnsari, Amir Salarpour, Michael Kühr, Siyu Huang, Mohammad Hamad, Sebastian Steinhorst, Habeeb Olufowobi, Mert D. Pesé

Autonomous vehicles (AVs) rely on deep neural networks (DNNs) for critical tasks such as traffic sign recognition (TSR), automated lane centering (ALC), and vehicle detection (VD). However, these models are vulnerable to attacks that can cause misclassifications and compromise safety. Traditional defense mechanisms, including adversarial training, often degrade benign accuracy and fail to generalize against unseen attacks. In this work, we introduce Vehicle Vision Language Models (V2LMs), fine-tuned vision-language models specialized for AV perception. Our findings demonstrate that V2LMs inherently exhibit superior robustness against unseen attacks without requiring adversarial training, maintaining significantly higher accuracy than conventional DNNs under adversarial conditions. We evaluate two deployment strategies: Solo Mode, where individual V2LMs handle specific perception tasks, and Tandem Mode, where a single unified V2LM is fine-tuned for multiple tasks simultaneously. Experimental results reveal that DNNs suffer performance drops of 33% to 46% under attacks, whereas V2LMs maintain adversarial accuracy with reductions of less than 8% on average. The Tandem Mode further offers a memory-efficient alternative while achieving comparable robustness to Solo Mode. We also explore integrating V2LMs as parallel components to AV perception to enhance resilience against adversarial threats. Our results suggest that V2LMs offer a promising path toward more secure and resilient AV perception systems.


Article 440

Title@2025-07-08 (2): Trainability of Quantum Models Beyond Known Classical Simulability

Title: Trainability of Quantum Models Beyond Known Classical Simulability Ausbildungsfähigkeit von Quantenmodellen jenseits der bekannten klassischen Simulierbarkeit 超出已知古典模拟性之外的量数模型的可操作性 2507.06344v1

Authors (4): Sabri Meyer, Francesco Scala, Francesco Tacchino, Aurelien Lucchi

Variational Quantum Algorithms (VQAs) are promising candidates for near-term quantum computing, yet they face scalability challenges due to barren plateaus, where gradients vanish exponentially in the system size. Recent conjectures suggest that avoiding barren plateaus might inherently lead to classical simulability, thus limiting the opportunities for quantum advantage. In this work, we advance the theoretical understanding of the relationship between the trainability and computational complexity of VQAs, thus directly addressing the conjecture. We introduce the Linear Clifford Encoder (LCE), a novel technique that ensures constant-scaling gradient statistics on optimization landscape regions that are close to Clifford circuits. Additionally, we leverage classical Taylor surrogates to reveal computational complexity phase transitions from polynomial to super-polynomial as the initialization region size increases. Combining these results, we reveal a deeper link between trainability and computational complexity, and analytically prove that barren plateaus can be avoided in regions for which no classical surrogate is known to exist. Furthermore, numerical experiments on LCE-transformed landscapes confirm in practice the existence of a super-polynomially complex “transition zone” where gradients decay polynomially. These findings indicate a plausible path to practically relevant, barren plateau-free variational models with potential for quantum advantage.


Article 441

Title@2025-07-08 (2): A Unifying Framework for Robust and Efficient Inference with Unstructured Data

Title: A Unifying Framework for Robust and Efficient Inference with Unstructured Data Ein einheitliches Rahmenwerk für robuste und effiziente Schlussfolgerungen mit unstrukturierten Daten 与无结构数据进行有力和高效推断的统一框架 2505.00282v2

Authors (2): Jacob Carlson, Melissa Dell

This paper presents a general framework for conducting efficient inference on parameters derived from unstructured data, which include text, images, audio, and video. Economists have long used unstructured data by first extracting low-dimensional structured features (e.g., the topic or sentiment of a text), since the raw data are too high-dimensional and uninterpretable to include directly in empirical analyses. The rise of deep neural networks has accelerated this practice by greatly reducing the costs of extracting structured data at scale, but neural networks do not make generically unbiased predictions. This potentially propagates bias to the downstream estimators that incorporate imputed structured data, and the availability of different off-the-shelf neural networks with different biases moreover raises p-hacking concerns. To address these challenges, we reframe inference with unstructured data as a problem of missing structured data, where structured variables are imputed from high-dimensional unstructured inputs. This perspective allows us to apply classic results from semiparametric inference, leading to estimators that are valid, efficient, and robust. We formalize this approach with MAR-S, a framework that unifies and extends existing methods for debiased inference using machine learning predictions, connecting them to familiar problems such as causal inference. Within this framework, we develop robust and efficient estimators for both descriptive and causal estimands and address challenges like inference with aggregated and transformed missing structured data, a common scenario that is not covered by existing work. These methods, and the accompanying implementation package, provide economists with accessible tools for constructing unbiased estimators using unstructured data in a wide range of applications, as we demonstrate by re-analyzing several influential studies.


Article 442

Title@2025-07-08 (2): SymFlux: deep symbolic regression of Hamiltonian vector fields

Title: SymFlux: deep symbolic regression of Hamiltonian vector fields SymFlux: tiefe symbolische Regression der Hamiltonschen Vektorfelder 交汇:汉密尔顿矢量场的深刻象征性回归 2507.06342v1

Authors (2): M. A. Evangelista-Alvarado, P. Suárez-Serrato

We present SymFlux, a novel deep learning framework that performs symbolic regression to identify Hamiltonian functions from their corresponding vector fields on the standard symplectic plane. SymFlux models utilize hybrid CNN-LSTM architectures to learn and output the symbolic mathematical expression of the underlying Hamiltonian. Training and validation are conducted on newly developed datasets of Hamiltonian vector fields, a key contribution of this work. Our results demonstrate the model’s effectiveness in accurately recovering these symbolic expressions, advancing automated discovery in Hamiltonian mechanics.


Article 443

Title@2025-07-08 (2): Learning Nonlinear Finite Element Solution Operators using Multilayer Perceptrons and Energy Minimization

Title: Learning Nonlinear Finite Element Solution Operators using Multilayer Perceptrons and Energy Minimization Nichtlineare Finite-Elemente-Lösungs-Betreiber lernen mit Multilayer-Perceptronen und Energie-Minimierung 使用多层光谱和能源最小化的学习非线性有限元素解答操作器 2412.04596v2

Authors (3): Mats G. Larson, Carl Lundholm, Anna Persson

We develop and evaluate a method for learning solution operators to nonlinear problems governed by partial differential equations (PDEs). The approach is based on a finite element discretization and aims at representing the solution operator by a multilayer perceptron (MLP) that takes problem data variables as input and gives a prediction of the finite element solution as output. The variables will typically correspond to parameters in a parametrization of input data such as boundary conditions, coefficients, and right-hand sides. The output will be an approximation of the corresponding finite element solution, thus enabling support and enhancement by the standard finite element method (FEM) both theoretically and practically. The loss function is most often an energy functional and we formulate efficient parallelizable training algorithms based on assembling the energy locally on each element. For large problems, the learning process can be made more efficient by using only a small fraction of randomly chosen elements in the mesh in each iteration. The approach is evaluated on several relevant test cases, where learning the finite element solution operator turns out to be beneficial, both in its own right and in combination with standard FEM theory and software.


Article 444

Title@2025-07-08 (2): Self-supervised learning predicts plant growth trajectories from multi-modal industrial greenhouse data

Title: Self-supervised learning predicts plant growth trajectories from multi-modal industrial greenhouse data Selbstüberwachtes Lernen prognostiziert Pflanzenwachstumspfade aus multimodalen industriellen Treibhausgasdaten 自监督的自学预测从多模式工业温室数据中预测到植物生长轨迹 2507.06336v1

Authors (4): Adam J Riesselman, Evan M Cofer, Therese LaRue, Wim Meeussen

Quantifying organism-level phenotypes, such as growth dynamics and biomass accumulation, is fundamental to understanding agronomic traits and optimizing crop production. However, quality growing data of plants at scale is difficult to generate. Here we use a mobile robotic platform to capture high-resolution environmental sensing and phenotyping measurements of a large-scale hydroponic leafy greens system. We describe a self-supervised modeling approach to build a map from observed growing data to the entire plant growth trajectory. We demonstrate our approach by forecasting future plant height and harvest mass of crops in this system. This approach represents a significant advance in combining robotic automation and machine learning, as well as providing actionable insights for agronomic research and operational efficiency.


Article 445

Title@2025-07-08 (2): AR2: Attention-Guided Repair for the Robustness of CNNs Against Common Corruptions

Title: AR2: Attention-Guided Repair for the Robustness of CNNs Against Common Corruptions AR2: Aufmerksamkeitsgeführte Reparatur für die Robustheit von CNNs gegen häufige Korruption AR2:对有线电视新闻网反常见腐败的强力进行引人注意的修理 2507.06332v1

Authors (3): Fuyuan Zhang, Qichen Wang, Jianjun Zhao

Deep neural networks suffer from significant performance degradation when exposed to common corruptions such as noise, blur, weather, and digital distortions, limiting their reliability in real-world applications. In this paper, we propose AR2 (Attention-Guided Repair for Robustness), a simple yet effective method to enhance the corruption robustness of pretrained CNNs. AR2 operates by explicitly aligning the class activation maps (CAMs) between clean and corrupted images, encouraging the model to maintain consistent attention even under input perturbations. Our approach follows an iterative repair strategy that alternates between CAM-guided refinement and standard fine-tuning, without requiring architectural changes. Extensive experiments show that AR2 consistently outperforms existing state-of-the-art methods in restoring robustness on standard corruption benchmarks (CIFAR-10-C, CIFAR-100-C and ImageNet-C), achieving a favorable balance between accuracy on clean data and corruption robustness. These results demonstrate that AR2 provides a robust and scalable solution for enhancing model reliability in real-world environments with diverse corruptions.


Article 446

Title@2025-07-08 (2): Sample-Efficient Reinforcement Learning Controller for Deep Brain Stimulation in Parkinson’s Disease

Title: Sample-Efficient Reinforcement Learning Controller for Deep Brain Stimulation in Parkinson’s Disease Sample-Efficient Verstärkungs-Lern-Controller für tiefe Gehirnstimulation bei Parkinson-Krankheit 帕金森病深脑刺激学主计长 2507.06326v1

Authors (5): Harsh Ravivarapu, Gaurav Bagwe, Xiaoyong Yuan, Chunxiu Yu, Lan Zhang

Deep brain stimulation (DBS) is an established intervention for Parkinson’s disease (PD), but conventional open-loop systems lack adaptability, are energy-inefficient due to continuous stimulation, and provide limited personalization to individual neural dynamics. Adaptive DBS (aDBS) offers a closed-loop alternative, using biomarkers such as beta-band oscillations to dynamically modulate stimulation. While reinforcement learning (RL) holds promise for personalized aDBS control, existing methods suffer from high sample complexity, unstable exploration in binary action spaces, and limited deployability on resource-constrained hardware. We propose SEA-DBS, a sample-efficient actor-critic framework that addresses the core challenges of RL-based adaptive neurostimulation. SEA-DBS integrates a predictive reward model to reduce reliance on real-time feedback and employs Gumbel Softmax-based exploration for stable, differentiable policy updates in binary action spaces. Together, these components improve sample efficiency, exploration robustness, and compatibility with resource-constrained neuromodulatory hardware. We evaluate SEA-DBS on a biologically realistic simulation of Parkinsonian basal ganglia activity, demonstrating faster convergence, stronger suppression of pathological beta-band power, and resilience to post-training FP16 quantization. Our results show that SEA-DBS offers a practical and effective RL-based aDBS framework for real-time, resource-constrained neuromodulation.
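
To make the exploration mechanism concrete, here is a minimal sketch of a Gumbel-Softmax relaxation over a binary action (for example, stimulate versus do not stimulate). It is a generic illustration of the technique, not the SEA-DBS implementation; the function name, temperature value, and logits are assumptions.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample over actions.

    Adds Gumbel(0, 1) noise to each logit, divides by the temperature,
    and applies a softmax. Lower temperatures give samples closer to one-hot.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical policy logits for the two actions [stimulate, rest].
rng = random.Random(0)
relaxed = gumbel_softmax([0.5, -0.5], temperature=0.5, rng=rng)
hard_action = relaxed.index(max(relaxed))  # discrete action sent to the device
print(relaxed, hard_action)
```

In an autodiff framework the relaxed sample stays differentiable with respect to the logits, which is what permits gradient-based policy updates despite the discrete action.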


Article 447

Title@2025-07-08 (2): (How) Can Transformers Predict Pseudo-Random Numbers?

Title: (How) Can Transformers Predict Pseudo-Random Numbers? (Wie) Können Transformer Pseudo-Random-Zahlen vorhersagen? (如何)变形者预测出优多-兰多数字? 2502.10390v2

Authors (5): Tao Tao, Darshil Doshi, Dayal Singh Kalra, Tianyu He, Maissam Barkeshli

Transformers excel at discovering patterns in sequential data, yet their fundamental limitations and learning mechanisms remain crucial topics of investigation. In this paper, we study the ability of Transformers to learn pseudo-random number sequences from linear congruential generators (LCGs), defined by the recurrence relation $x_{t+1} = a x_t + c \;\mathrm{mod}\; m$. We find that with sufficient architectural capacity and training data variety, Transformers can perform in-context prediction of LCG sequences with unseen moduli ($m$) and parameters ($a,c$). By analyzing the embedding layers and attention patterns, we uncover how Transformers develop algorithmic structures to learn these sequences in two scenarios of increasing complexity. First, we investigate how Transformers learn LCG sequences with unseen ($a, c$) but fixed modulus, and demonstrate successful learning up to $m = 2^{32}$. We find that models learn to factorize $m$ and utilize digit-wise number representations to make sequential predictions. In the second, more challenging scenario of unseen moduli, we show that Transformers can generalize to unseen moduli up to $m_{\text{test}} = 2^{16}$. In this case, the model employs a two-step strategy: first estimating the unknown modulus from the context, then utilizing prime factorizations to generate predictions. For this task, we observe a sharp transition in the accuracy at a critical depth $d= 3$. We also find that the number of in-context sequence elements needed to reach high accuracy scales sublinearly with the modulus.
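
The recurrence above is easy to reproduce. The sketch below generates an LCG sequence of the kind such a model would see in context; the function name `lcg_sequence` and the small parameter values are chosen purely for illustration and are not from the paper.

```python
def lcg_sequence(a, c, m, seed, length):
    """Generate `length` terms of x_{t+1} = (a * x_t + c) mod m, starting at `seed`."""
    values = []
    x = seed % m
    for _ in range(length):
        values.append(x)
        x = (a * x + c) % m
    return values


# Toy parameters: multiplier 5, increment 3, modulus 16, seed 7.
seq = lcg_sequence(a=5, c=3, m=16, seed=7, length=6)
print(seq)  # [7, 6, 1, 8, 11, 10]
```

In the in-context setting studied here, a model receives a prefix of such a sequence and must predict the next term without being told $a$, $c$, or (in the harder scenario) $m$.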


Article 448

Title@2025-07-08 (2): Centralized Copy-Paste: Enhanced Data Augmentation Strategy for Wildland Fire Semantic Segmentation

Title: Centralized Copy-Paste: Enhanced Data Augmentation Strategy for Wildland Fire Semantic Segmentation Zentralisierter Kopier-Paste: Verbesserte Datenerweiterungsstrategie für die semantische Segmentierung von Wildlandfeuern 中央中央复印版面:加强荒地火灾静义分割区的数据增加战略 2507.06321v1

Authors (8): Joon Tai Kim, Tianle Chen, Ziyu Dong, Nishanth Kunchala, Alexander Guller, Daniel Ospina Acero, Roger Williams, Mrinal Kumar

Collecting and annotating images for the purpose of training segmentation models is often cost-prohibitive. In the domain of wildland fire science, this challenge is further compounded by the scarcity of reliable public datasets with labeled ground truth. This paper presents the Centralized Copy-Paste Data Augmentation (CCPDA) method to assist the training of deep-learning multiclass segmentation models, with a special focus on improving segmentation outcomes for the fire class. CCPDA has three main steps: (i) identify fire clusters in the source image, (ii) apply a centralization technique to focus on the core of the fire area, and (iii) paste the refined fire clusters onto a target image. This method increases dataset diversity while preserving the essential characteristics of the fire class. The effectiveness of this augmentation technique is demonstrated via numerical analysis and comparison against various other augmentation methods using a weighted sum-based multi-objective optimization approach. This approach helps elevate segmentation performance metrics specific to the fire class, which carries significantly more operational significance than other classes (fuel, ash, or background). Numerical performance assessment validates the efficacy of the presented CCPDA method in alleviating the difficulties associated with small, manually labeled training datasets. It also illustrates that CCPDA outperforms other augmentation strategies in the application scenario considered, particularly in improving fire-class segmentation performance.


Article 449

Title@2025-07-08 (2): Implicit Neural Representations for Chemical Reaction Paths

Title: Implicit Neural Representations for Chemical Reaction Paths Implizite neurale Darstellungen für chemische Reaktionswege 化学反应路径的隐性神经图示 2502.15843v3

Authors (5): Kalyan Ramakrishnan, Lars L. Schaaf, Chen Lin, Guangrun Wang, Philip Torr

We show that neural networks can be optimized to represent minimum energy paths as continuous functions, offering a flexible alternative to discrete path-search methods such as Nudged Elastic Band (NEB). Our approach parameterizes reaction paths with a network trained on a loss function that discards tangential energy gradients and enables instant estimation of the transition state. We first validate the method on two-dimensional potentials and then demonstrate its advantages over NEB on challenging atomistic systems where (i) poor initial guesses yield unphysical paths, (ii) multiple competing paths exist, or (iii) the reaction follows a complex multi-step mechanism. Results highlight the versatility of the method: for instance, a simple adjustment to the sampling strategy during optimization can help escape local-minimum solutions. Finally, in a low-dimensional setting, we demonstrate that a single neural network can learn from existing paths and generalize to unseen systems, showing promise for a universal reaction path representation.


Article 450

Title@2025-07-08 (2): Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers

Title: Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers Effizienz-Wirksamkeit Reranking FLOPs für LLM-basierte Reranker 为基于LLM的升级者重新安排FLOP 2507.06223v1

Authors (5): Zhiyuan Peng, Ting-ruen Wei, Tingyu Song, Yilun Zhao, Yi Fang

Large Language Models (LLMs) have recently been applied to reranking tasks in information retrieval, achieving strong performance. However, their high computational demands often hinder practical deployment. Existing studies evaluate the efficiency of LLM-based rerankers using proxy metrics such as latency, the number of forward passes, input tokens, and output tokens. However, these metrics depend on hardware and run-time choices (e.g., parallel or not, batch size, etc.) and often fail to account for model size, making them difficult to interpret and obscuring the evaluation of the efficiency-effectiveness trade-off. To address this issue, we propose E²R-FLOPs for LLM-based rerankers: ranking metrics per PetaFLOP (RPP) for relevance per compute and queries per PetaFLOP (QPP) for hardware-agnostic throughput. Accompanying the new metrics, an interpretable FLOPs estimator is built to estimate the FLOPs of an LLM-based reranker even without running any experiments. Based on the proposed metrics, we conduct comprehensive experiments to evaluate a wide range of LLM-based rerankers with different architectures, studying the efficiency-effectiveness trade-off and bringing this issue to the attention of the research community.
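As a rough illustration of the two metrics, the sketch below reads RPP as a ranking metric divided by the compute spent per query (in PetaFLOPs) and QPP as the number of queries one PetaFLOP can serve. The abstract names the metrics but not their exact formulas, so these definitions, the function names, and the choice to take FLOPs per query as a given input are all assumptions.

```python
PETA = 1e15  # FLOPs in one PetaFLOP


def rpp(metric, flops_per_query):
    """Ranking metric per PetaFLOP (assumed form: metric / compute per query)."""
    return metric / (flops_per_query / PETA)


def qpp(flops_per_query):
    """Queries per PetaFLOP (assumed form: 1 PetaFLOP / compute per query)."""
    return PETA / flops_per_query


# Hypothetical reranker: NDCG@10 of 0.70 at 2e14 FLOPs per query.
example_rpp = rpp(0.70, 2e14)
example_qpp = qpp(2e14)
```

Because both quantities depend only on FLOP counts, they do not change with batch size, parallelism, or the accelerator used, which is the hardware independence the abstract argues for.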


Article 451

Title@2025-07-08 (2): Deep Learning Optimization of Two-State Pinching Antennas Systems

Title: Deep Learning Optimization of Two-State Pinching Antennas Systems Deep Learning Optimierung von zweistufigen Pinching-Antennensystemen 双州钉钉天线系统的深度学习优化 2507.06222v1

Authors (5): Odysseas G. Karagiannidis, Victoria E. Galanopoulou, Panagiotis D. Diamantoulakis, Zhiguo Ding, Octavia Dobre

The evolution of wireless communication systems requires flexible, energy-efficient, and cost-effective antenna technologies. Pinching antennas (PAs), which can dynamically control electromagnetic wave propagation through binary activation states, have recently emerged as a promising candidate. In this work, we investigate the problem of optimally selecting a subset of fixed-position PAs to activate in a waveguide, when the aim is to maximize the communication rate at a user terminal. Due to the complex interplay between antenna activation, waveguide-induced phase shifts, and power division, this problem is formulated as a combinatorial fractional 0-1 quadratic program. To efficiently solve this challenging problem, we use neural network architectures of varying complexity to learn activation policies directly from data, leveraging spatial features and signal structure. Furthermore, we incorporate user location uncertainty into our training and evaluation pipeline to simulate realistic deployment conditions. Simulation results demonstrate the effectiveness and robustness of the proposed models.


Article 452

Title@2025-07-08 (2): Is Diversity All You Need for Scalable Robotic Manipulation?

Title: Is Diversity All You Need for Scalable Robotic Manipulation? Ist Vielfalt alles, was Sie für eine skalierbare Robotermanipulation benötigen? 多样性是否是可缩放的机器人操纵所需要的? 2507.06219v1

Authors (10): Modi Shi, Li Chen, Jin Chen, Yuxiang Lu, Chiming Liu, Guanghui Ren, Ping Luo, Di Huang, Maoqing Yao, Hongyang Li

Data scaling has driven remarkable success in foundation models for Natural Language Processing (NLP) and Computer Vision (CV), yet the principles of effective data scaling in robotic manipulation remain insufficiently understood. In this work, we investigate the nuanced role of data diversity in robot learning by examining three critical dimensions: task (what to do), embodiment (which robot to use), and expert (who demonstrates), challenging the conventional intuition of “more diverse is better”. Through extensive experiments on various robot platforms, we reveal that (1) task diversity proves more critical than per-task demonstration quantity, benefiting transfer from diverse pre-training tasks to novel downstream scenarios; (2) multi-embodiment pre-training data is optional for cross-embodiment transfer: models trained on high-quality single-embodiment data can efficiently transfer to different platforms, showing a more desirable scaling property during fine-tuning than multi-embodiment pre-trained models; and (3) expert diversity, arising from individual operational preferences and stochastic variations in human demonstrations, can be confounding to policy learning, with velocity multimodality emerging as a key contributing factor. Based on this insight, we propose a distribution debiasing method to mitigate velocity ambiguity; the resulting GO-1-Pro achieves substantial performance gains of 15%, equivalent to using 2.5 times as much pre-training data. Collectively, these findings provide new perspectives and offer practical guidance on how to scale robotic manipulation datasets effectively.


Article 453

Title@2025-07-08 (2): What ZTF Saw Where Rubin Looked: Anomaly Hunting in DR23

Title: What ZTF Saw Where Rubin Looked: Anomaly Hunting in DR23 Was ZTF sah, wo Rubin sah: Anomalie Jagd in DR23 ZTF所看到的Rubin所看的地方:DR23的异常狩猎 2507.06217v1

Authors (10): Maria V. Pruzhinskaya, Anastasia D. Lavrukhina, Timofey A. Semenikhi, Alina A. Volnova, Sreevarsha Sreejith, Vadim V. Krushinsky, Emmanuel Gangler, Emille E. O. Ishida, Matwey V. Kornilov, Konstantin L. Malanchev

We present results from the SNAD VIII Workshop, during which we conducted the first systematic anomaly search in the ZTF fields also observed by LSSTComCam during Rubin Scientific Pipeline commissioning. Using the PineForest active anomaly detection algorithm, we analysed four selected fields (two galactic and two extragalactic) and visually inspected 400 candidates. As a result, we discovered six previously uncatalogued variable stars, including RS CVn, BY Draconis, ellipsoidal, and solar-type variables, and refined classifications and periods for six known objects. These results demonstrate the effectiveness of the SNAD anomaly detection pipeline and provide a preview of the discovery potential in the upcoming LSST data.


Article 454

Title@2025-07-08 (2): Embedding Atlas: Low-Friction, Interactive Embedding Visualization

Title: Embedding Atlas: Low-Friction, Interactive Embedding Visualization Atlas einbetten: Low-Friction, Interactive Embedding Visualisierung 嵌入式Atlas:低功能、互动嵌入式可视化 2505.06386v2

Authors (4): Donghao Ren, Fred Hohman, Halden Lin, Dominik Moritz

Embedding projections are popular for visualizing large datasets and models. However, people often encounter “friction” when using embedding visualization tools: (1) barriers to adoption, e.g., tedious data wrangling and loading, scalability limits, no integration of results into existing workflows, and (2) limitations in possible analyses, without integration with external tools to additionally show coordinated views of metadata. In this paper, we present Embedding Atlas, a scalable, interactive visualization tool designed to make interacting with large embeddings as easy as possible. Embedding Atlas uses modern web technologies and advanced algorithms – including density-based clustering, and automated labeling – to provide a fast and rich data analysis experience at scale. We evaluate Embedding Atlas with a competitive analysis against other popular embedding tools, showing that Embedding Atlas’s feature set specifically helps reduce friction, and report a benchmark on its real-time rendering performance with millions of points. Embedding Atlas is available as open source to support future work in embedding-based analysis.


Article 455

Title@2025-07-08 (2): Instruction Following by Boosting Attention of Large Language Models

Title: Instruction Following by Boosting Attention of Large Language Models Anleitung, indem man die Aufmerksamkeit großer Sprachmodelle erhöht 之后的教学,培养对大语言模式的注意 2506.13734v2

Authors (4): Vitoria Guardieiro, Adam Stein, Avishree Khare, Eric Wong

Controlling the generation of large language models (LLMs) remains a central challenge to ensure their safe and reliable deployment. While prompt engineering and finetuning are common approaches, recent work has explored latent steering, a lightweight technique that alters LLM internal activations to guide generation. However, subsequent studies revealed latent steering’s effectiveness to be limited, often underperforming simple instruction prompting. To address this limitation, we first establish a benchmark across diverse behaviors for standardized evaluation of steering techniques. Building on insights from this benchmark, we introduce Instruction Attention Boosting (InstABoost), a latent steering method that boosts the strength of instruction prompting by altering the model’s attention during generation. InstABoost combines the strengths of existing approaches and is theoretically supported by prior work that suggests that in-context rule following in transformer-based models can be controlled by manipulating attention on instructions. Empirically, InstABoost demonstrates superior control success compared to both traditional prompting and latent steering.
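One plausible reading of "boosting attention on instructions" is sketched below: attention weights that point at instruction tokens are multiplied by a constant, and each row is renormalized so it remains a probability distribution. The abstract does not give the update rule, so this form, the function name, and the `factor` parameter are assumptions rather than the InstABoost implementation.

```python
import numpy as np


def boost_instruction_attention(attn, instruction_mask, factor):
    """Scale attention on instruction tokens by `factor`, then renormalize each row.

    attn: array of shape (..., n_queries, n_keys) whose rows sum to 1.
    instruction_mask: boolean array of shape (n_keys,), True for instruction tokens.
    """
    attn = np.asarray(attn, dtype=float)
    scale = np.where(instruction_mask, factor, 1.0)  # per-key multiplier
    boosted = attn * scale                           # broadcasts over query rows
    return boosted / boosted.sum(axis=-1, keepdims=True)
```

With `factor=1.0` the attention is unchanged; larger values shift probability mass toward the instruction tokens without changing the relative weights among the remaining tokens.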


Article 456

Title@2025-07-08 (2): StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Title: StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation StreamDiffusion: Eine Pipeline-Level-Lösung für die interaktive Echtzeit-Generierung 流流扩散:实时互动新一代管道一级的解决方案 2312.12491v2

Authors (11): Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Masayoshi Tomizuka, Kurt Keutzer

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. This limitation becomes particularly evident in scenarios involving continuous input, such as Metaverse, live video streaming, and broadcasting, where high throughput is imperative. To address this, we present a novel approach that transforms the original sequential denoising into a batched denoising process. Stream Batch eliminates the conventional wait-and-interact approach and enables fluid and high-throughput streams. To handle the frequency disparity between data input and model throughput, we design a novel input-output queue for parallelizing the streaming process. Moreover, the existing diffusion pipeline uses classifier-free guidance (CFG), which requires additional U-Net computation. To mitigate the redundant computations, we propose a novel residual classifier-free guidance (RCFG) algorithm that reduces the number of negative conditional denoising steps to only one or even zero. Besides, we introduce a stochastic similarity filter (SSF) to optimize power consumption. Our Stream Batch achieves a speedup of around 1.5x compared to the sequential denoising method at different denoising levels. The proposed RCFG leads to speeds up to 2.05x higher than the conventional CFG. Combining the proposed strategies with existing mature acceleration tools lets image-to-image generation reach up to 91.07 fps on one RTX4090, improving on the throughput of AutoPipeline, developed by Diffusers, by over 59.56x. Furthermore, StreamDiffusion significantly reduces energy consumption, by 2.39x on one RTX3060 and 1.99x on one RTX4090.
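To see where the extra U-Net computation in conventional CFG comes from, here is the standard guidance combination as a minimal sketch. It shows ordinary classifier-free guidance only, not the paper's RCFG; the function name is illustrative, and the two noise predictions are assumed to come from two separate denoiser evaluations.

```python
import numpy as np


def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance: extrapolate from the unconditional
    (or negative-prompt) noise prediction toward the conditional one."""
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Each denoising step needs both `eps_uncond` and `eps_cond`, which means two network passes per step; reducing the number of unconditional (negative) passes is the saving the abstract attributes to RCFG.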


Article 457

Title@2025-07-08 (2): Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

Title: Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA Benchmarking der CoW mit der TopCoW Challenge: Topologie-Bewusst Anatomische Segmentierung des Willis-Kreises für CTA und MRA 将欧洲联盟委员会与顶级COW挑战:CTA和MRA的Willis圈的地形学-软件解剖分解 2312.17670v4

Authors (113): Kaiyuan Yang, Fabio Musio, Yihui Ma, Norman Juchler, Johannes C. Paetzold, Rami Al-Maskari, Luciano Höher, Hongwei Bran Li, Ibrahim Ethem Hamamci, Anjany Sekuboyina, Suprosanna Shit, Houjing Huang, Chinmay Prabhakar, Ezequiel de la Rosa, Bastian Wittmann, Diana Waldmannstetter, Florian Kofler, Fernando Navarro, Martin Menten, Ivan Ezhov, Daniel Rueckert, Iris N. Vos, Ynte M. Ruigrok, Birgitta K. Velthuis, Hugo J. Kuijf, Pengcheng Shi, Wei Liu, Ting Ma, Maximilian R. Rokuss, Yannick Kirchhoff, Fabian Isensee, Klaus Maier-Hein, Chengcheng Zhu, Huilin Zhao, Philippe Bijlenga, Julien Hämmerli, Catherine Wurster, Laura Westphal, Jeroen Bisschop, Elisa Colombo, Hakim Baazaoui, Hannah-Lea Handelsmann, Andrew Makmur, James Hallinan, Amrish Soundararajan, Bene Wiestler, Jan S. Kirschke, Roland Wiest, Emmanuel Montagnon, Laurent Letourneau-Guillon, Kwanseok Oh, Dahye Lee, Adam Hilbert, Orhun Utku Aydin, Dimitrios Rallios, Jana Rieger, Satoru Tanioka, Alexander Koch, Dietmar Frey, Abdul Qayyum, Moona Mazher, Steven Niederer, Nico Disch, Julius Holzschuh, Dominic LaBella, Francesco Galati, Daniele Falcetta, Maria A. Zuluaga, Chaolong Lin, Haoran Zhao, Zehan Zhang, Minghui Zhang, Xin You, Hanxiao Zhang, Guang-Zhong Yang, Yun Gu, Sinyoung Ra, Jongyun Hwang, Hyunjin Park, Junqiang Chen, Marek Wodzinski, Henning Müller, Nesrin Mansouri, Florent Autrusseau, Cansu Yalçin, Rachika E. Hamadache, Clara Lisazo, Joaquim Salvi, Adrià Casamitjana, Xavier Lladó, Uma Maria Lal-Trehan Estrada, Valeriia Abramova, Luca Giancardo, Arnau Oliver, Paula Casademunt, Adrian Galdran, Matteo Delucchi, Jialu Liu, Haibin Huang, Yue Cui, Zehang Lin, Yusheng Liu, Shunzhi Zhu, Tatsat R. Patel, Adnan H. Siddiqui, Vincent M. Tutino, Maysam Orouskhani, Huayu Wang, Mahmud Mossa-Basha, Yuki Sato, Sven Hirsch, Susanne Wegener, Bjoern Menze

The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neurovascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two non-invasive angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited datasets with annotations on CoW anatomy, especially for CTA. Therefore, we organized the TopCoW challenge with the release of an annotated CoW dataset. The TopCoW dataset is the first public dataset with voxel-level annotations for 13 CoW vessel components, enabled by virtual reality technology. It is also the first large dataset using 200 pairs of MRA and CTA from the same patients. As part of the benchmark, we invited submissions worldwide and attracted over 250 registered participants from six continents. The submissions were evaluated on both internal and external test datasets of 226 scans from over five centers. The top performing teams achieved over 90% Dice scores at segmenting the CoW components, over 80% F1 scores at detecting key CoW components, and over 70% balanced accuracy at classifying CoW variants for nearly all test sets. The best algorithms also showed clinical potential in classifying fetal-type posterior cerebral artery and locating aneurysms with CoW anatomy. TopCoW demonstrated the utility and versatility of CoW segmentation algorithms for a wide range of downstream clinical applications with explainability. The annotated datasets and best performing algorithms have been released as public Zenodo records to foster further methodological development and clinical tool building.
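For readers unfamiliar with the segmentation metric quoted above, a per-label Dice score can be computed as in the sketch below. The handling of a label that is absent from both prediction and reference (scored as 1.0 here) is a convention chosen for this example; the challenge's own evaluation rules are not stated in the abstract.

```python
import numpy as np


def dice_score(pred, target, label):
    """Dice = 2 |P ∩ T| / (|P| + |T|) for one label in two label maps."""
    p = np.asarray(pred) == label
    t = np.asarray(target) == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # label absent in both maps: treated as perfect agreement (assumption)
    return float(2.0 * np.logical_and(p, t).sum() / denom)
```

The same function works on 2D slices or full 3D volumes, since it only compares voxels elementwise.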


Article 458

Title@2025-07-08 (2): Modern Methods in Associative Memory

Title: Modern Methods in Associative Memory Moderne Methoden im assoziativen Gedächtnis 共同内存现代方法 2507.06211v1

Authors (4): Dmitry Krotov, Benjamin Hoover, Parikshit Ram, Bao Pham

Associative Memories like the famous Hopfield Networks are elegant models for describing fully recurrent neural networks whose fundamental job is to store and retrieve information. In the past few years they experienced a surge of interest due to novel theoretical results pertaining to their information storage capabilities, and their relationship with SOTA AI architectures, such as Transformers and Diffusion Models. These connections open up possibilities for interpreting the computation of traditional AI networks through the theoretical lens of Associative Memories. Additionally, novel Lagrangian formulations of these networks make it possible to design powerful distributed models that learn useful representations and inform the design of novel architectures. This tutorial provides an approachable introduction to Associative Memories, emphasizing the modern language and methods used in this area of research, with practical hands-on mathematical derivations and coding notebooks.
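The store-and-retrieve behaviour of a classical Hopfield network can be shown in a few lines. This is a textbook sketch using Hebbian weights and synchronous sign updates, not code from the tutorial's notebooks; the function names are illustrative.

```python
import numpy as np


def store(patterns):
    """Hebbian weights for +/-1 patterns, with self-connections removed."""
    x = np.asarray(patterns, dtype=float)  # shape (n_patterns, n_units)
    w = x.T @ x / x.shape[1]
    np.fill_diagonal(w, 0.0)
    return w


def retrieve(w, state, steps=10):
    """Apply synchronous sign updates until the state stops changing."""
    s = np.asarray(state, dtype=float)
    for _ in range(steps):
        new = np.where(w @ s >= 0, 1.0, -1.0)
        if np.array_equal(new, s):
            break
        s = new
    return s
```

Starting `retrieve` from a stored pattern with one unit flipped is a simple way to watch the network pull the state back toward the stored memory.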


Article 459

Title@2025-07-08 (2): EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG

Title: EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG EEG2TEXT-CN: Eine explorative Studie der offenen Vokabulären chinesischen Text-EEG-Ausrichtung über großsprachliches Modell und kontrastives Lernen auf ChinesischEEG EEG2TEXT-CN:通过大语言模式和中经语言差异性学习对中文文本与EEEG校对开放词汇的探索性研究 2506.00854v3

Authors (6): Jacky Tai-Yu Lu, Jung Chiang, Chi-Sheng Chen, Anna Nai-Yun Tung, Hsiang Wei Hu, Yuan Chiao Cheng

We propose EEG2TEXT-CN, which, to the best of our knowledge, represents one of the earliest open-vocabulary EEG-to-text generation frameworks tailored for Chinese. Built on a biologically grounded EEG encoder (NICE-EEG) and a compact pretrained language model (MiniLM), our architecture aligns multichannel brain signals with natural language representations via masked pretraining and contrastive learning. Using a subset of the ChineseEEG dataset, where each sentence contains approximately ten Chinese characters aligned with 128-channel EEG recorded at 256 Hz, we segment EEG into per-character embeddings and predict full sentences in a zero-shot setting. The decoder is trained with teacher forcing and padding masks to accommodate variable-length sequences. Evaluation on over 1,500 training-validation sentences and 300 held-out test samples shows promising lexical alignment, with a best BLEU-1 score of 6.38%. While syntactic fluency remains a challenge, our findings demonstrate the feasibility of non-phonetic, cross-modal language decoding from EEG. This work opens a new direction in multilingual brain-to-text research and lays the foundation for future cognitive-language interfaces in Chinese.


Article 460

Title@2025-07-08 (2): Differential Mamba

Title: Differential Mamba Unterschiedliche Mamba 差异 Mamba 2507.06204v1

Authors (3): Nadav Schneider, Itamar Zimerman, Eliya Nachmani

Sequence models like Transformers and RNNs often overallocate attention to irrelevant context, leading to noisy intermediate representations. This degrades LLM capabilities by promoting hallucinations, weakening long-range and retrieval abilities, and reducing robustness. Recent work has shown that differential design can mitigate this issue in Transformers, improving their effectiveness across various applications. In this paper, we explore whether these techniques, originally developed for Transformers, can be applied to Mamba, a recent architecture based on selective state-space layers that achieves Transformer-level performance with greater efficiency. We show that a naive adaptation of differential design to Mamba is insufficient and requires careful architectural modifications. To address this, we introduce a novel differential mechanism for Mamba, empirically validated on language modeling benchmarks, demonstrating improved retrieval capabilities and superior performance over vanilla Mamba. Finally, we conduct extensive ablation studies and empirical analyses to justify our design choices and provide evidence that our approach effectively mitigates the overallocation problem in Mamba-based models. Our code is publicly available.


Article 461

Title@2025-07-08 (2): Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions

Title: Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions Effiziente Umsetzung von Gaußschen Prozessregression Beschleunigte Sattelpunktsuche mit Anwendung auf molekulare Reaktionen 高效实施高斯进程递减加速散积点搜索,应用分子反应 2505.12519v2

Authors (5): Rohit Goswami, Maxim Masterov, Satish Kamath, Alejandro Peña-Torres, Hannes Jónsson

The task of locating first-order saddle points on high-dimensional surfaces describing the variation of energy as a function of atomic coordinates is an essential step for identifying the mechanism and estimating the rate of thermally activated events within the harmonic approximation of transition state theory. When combined directly with electronic structure calculations, the number of energy and atomic force evaluations needed for convergence is a primary issue. Here, we describe an efficient implementation of Gaussian process regression (GPR) acceleration of the minimum mode following method where a dimer is used to estimate the lowest eigenmode of the Hessian. A surrogate energy surface is constructed and updated after each electronic structure calculation. The method is applied to a test set of 500 molecular reactions previously generated by Hermez and coworkers [J. Chem. Theory Comput. 18, 6974 (2022)]. An order of magnitude reduction in the number of electronic structure calculations needed to reach the saddle point configurations is obtained by using the GPR compared to the dimer method. Despite the wide range in stiffness of the molecular degrees of freedom, the calculations are carried out using Cartesian coordinates and are found to require a similar number of electronic structure calculations as an elaborate internal coordinate method implemented in the Sella software package. The present implementation of the GPR surrogate model in C++ is efficient enough for the wall time of the saddle point searches to be reduced in 3 out of 4 cases even though the calculations are carried out at a low Hartree-Fock level.


Article 462

Title@2025-07-08 (2): UQLM: A Python Package for Uncertainty Quantification in Large Language Models

Title: UQLM: A Python Package for Uncertainty Quantification in Large Language Models UQLM: Ein Python-Paket zur Unsicherheitsquantifizierung in großen Sprachmodellen UQLM: 用于大语言模型中不确定性量化的 Python 软件包 2507.06196v1

Authors (6): Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Ho-Kyeong Ra, Viren Bajaj, Zeya Ahmad

Hallucinations, defined as instances where Large Language Models (LLMs) generate false or misleading content, pose a significant challenge that impacts the safety and trust of downstream applications. We introduce UQLM, a Python package for LLM hallucination detection using state-of-the-art uncertainty quantification (UQ) techniques. This toolkit offers a suite of UQ-based scorers that compute response-level confidence scores ranging from 0 to 1. This library provides an off-the-shelf solution for UQ-based hallucination detection that can be easily integrated to enhance the reliability of LLM outputs.


Article 463

Title@2025-07-08 (2): SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads

Title: SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads SQLBarber: Ein System, das große Sprachmodelle nutzt, um angepasste und realistische SQL Workloads zu generieren SQLBarber:一个系统利用大语言模型生成自定义和现实的 SQL 工作量 2507.06192v1

Authors (2): Jiale Lao, Immanuel Trummer

Database research and development often require a large number of SQL queries for benchmarking purposes. However, acquiring real-world SQL queries is challenging due to privacy concerns, and existing SQL generation methods are limited in customization and in satisfying realistic constraints. To address this issue, we present SQLBarber, a system based on Large Language Models (LLMs) to generate customized and realistic SQL workloads. SQLBarber (i) eliminates the need for users to manually craft SQL templates in advance, while providing the flexibility to accept natural language specifications to constrain SQL templates, (ii) scales efficiently to generate large volumes of queries matching any user-defined cost distribution (e.g., cardinality and execution plan cost), and (iii) uses execution statistics from Amazon Redshift and Snowflake to derive SQL template specifications and query cost distributions that reflect real-world query characteristics. SQLBarber introduces (i) a declarative interface for users to effortlessly generate customized SQL templates, (ii) an LLM-powered pipeline augmented with a self-correction module that profiles, refines, and prunes SQL templates based on query costs, and (iii) a Bayesian Optimizer to efficiently explore different predicate values and identify a set of queries that satisfy the target cost distribution. We construct and open-source ten benchmarks of varying difficulty levels and target query cost distributions based on real-world statistics from Snowflake and Amazon Redshift. Extensive experiments on these benchmarks show that SQLBarber is the only system that can generate customized SQL templates. It reduces query generation time by one to three orders of magnitude, and significantly improves alignment with the target cost distribution, compared with existing methods.


Article 464

Title@2025-07-08 (2): Conservative approximation-based feedforward neural network for WENO schemes

Title: Conservative approximation-based feedforward neural network for WENO schemes Konservatives annäherungsbasiertes neuronales Feedforward-Netzwerk für WENO-Systeme WENO计划保守近近近近近近效饲料向神经网络 2507.06190v1

Authors (3): Kwanghyuk Park, Jiaxi Gu, Jae-Hun Jung

In this work, we present a feedforward neural network, based on the conservative approximation to the derivative from point values, for weighted essentially non-oscillatory (WENO) schemes for solving hyperbolic conservation laws. The feedforward neural network, whose inputs are point values from the three-point stencil and whose outputs are two nonlinear weights, takes the place of the classical WENO weighting procedure. For the training phase, we employ supervised learning and create a new labeled dataset for one-dimensional conservative approximation, where we construct a numerical flux function from the given point values such that the flux difference approximates the derivative to high-order accuracy. A symmetric-balancing term is introduced into the loss function so that it drives the neural network to match the conservative approximation to the derivative and satisfy the symmetry property that WENO3-JS and WENO3-Z have in common. The resulting WENO schemes, WENO3-CADNNs, demonstrate robust generalization across various benchmark scenarios and resolutions, where they outperform WENO3-Z and achieve accuracy comparable to WENO5-JS.
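For context, the classical weighting procedure that the network replaces can be written down compactly. The sketch below is the standard WENO3-JS rule on a three-point stencil, producing two nonlinear weights and the reconstructed interface value; it is the textbook baseline, not the paper's WENO3-CADNN, and the function names and default `eps` are illustrative choices.

```python
def weno3_js_weights(um, u0, up, eps=1e-6):
    """Nonlinear weights (w0, w1) of WENO3-JS on the stencil (u[i-1], u[i], u[i+1])."""
    d0, d1 = 1.0 / 3.0, 2.0 / 3.0            # linear (optimal) weights
    b0, b1 = (u0 - um) ** 2, (up - u0) ** 2  # smoothness indicators
    a0, a1 = d0 / (eps + b0) ** 2, d1 / (eps + b1) ** 2
    return a0 / (a0 + a1), a1 / (a0 + a1)


def weno3_js_interface(um, u0, up, eps=1e-6):
    """Left-biased reconstruction at the interface x[i+1/2]."""
    w0, w1 = weno3_js_weights(um, u0, up, eps)
    p0 = -0.5 * um + 1.5 * u0  # candidate from the sub-stencil {i-1, i}
    p1 = 0.5 * u0 + 0.5 * up   # candidate from the sub-stencil {i, i+1}
    return w0 * p0 + w1 * p1
```

On smooth data the weights stay close to the linear values (1/3, 2/3), while near a jump almost all weight moves to the smoother sub-stencil; a learned replacement has to map the same three inputs to the same two outputs.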


Article 465

Title@2025-07-08 (2): GuiderNet: A Meta-Learning Framework for Optimizing Quantum Circuit Geometry and Mitigating Barren Plateaus

Title: GuiderNet: A Meta-Learning Framework for Optimizing Quantum Circuit Geometry and Mitigating Barren Plateaus GuiderNet: Ein Meta-Learning-Framework zur Optimierung der Quantum Circuit Geometrie und zur Abmilderung von Barren Plateaus 指南Net:优化量子电路几何和减缓巴伦高原的元学习框架 2506.21940v2

Authors (2): Marwan Ait Haddou, Mohamed Bennai

Variational Quantum Algorithms (VQAs) offer potential for near-term quantum advantage but face challenges from barren plateaus, where gradients vanish, and poorly conditioned optimization landscapes. We introduce GuiderNet, a meta-learning framework that conditions Parameterized Quantum Circuits (PQCs) using data-dependent parameter shifts aimed at minimizing the log condition number of the Fubini-Study metric tensor. Implemented as a classical neural network, GuiderNet is meta-trained to guide PQC parameters into geometrically favorable regions and is embedded within hybrid quantum-classical pipelines to steer both initialization and adaptive modulation during training. Applied to the Kaggle Diabetes classification task, GuiderNet reduces cumulative training loss by over 5x, improves test accuracy from 75.3% to 98.6%, and increases the minority-class F1 score from 0.67 to 0.95. It also suppresses gradient explosion and stabilizes parameter updates, enabling smoother and more robust optimization. These results demonstrate that geometric meta-conditioning can mitigate barren plateaus and ill-conditioning, providing a scalable approach to enhance trainability and generalization in quantum machine learning.
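The objective named in the abstract, the log condition number of a metric tensor, is easy to state in code once the matrix is available. The sketch below takes an already computed symmetric positive semi-definite matrix as input; how the Fubini-Study metric is obtained from a circuit is outside its scope, and the `floor` guard against zero eigenvalues is an assumption added for numerical safety.

```python
import numpy as np


def log_condition_number(metric, floor=1e-12):
    """log(lambda_max / lambda_min) of a symmetric positive semi-definite matrix."""
    eig = np.linalg.eigvalsh(np.asarray(metric, dtype=float))  # eigenvalues in ascending order
    return float(np.log(eig[-1] / max(eig[0], floor)))
```

A value of 0 means all eigenvalues are equal (a perfectly conditioned metric), and the value grows as the landscape becomes more anisotropic.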


Article 466

Title@2025-07-08 (2): A Method for Optimizing Connections in Differentiable Logic Gate Networks

Title: A Method for Optimizing Connections in Differentiable Logic Gate Networks Eine Methode zur Optimierung von Verbindungen in differenzierbaren Logic Gate-Netzwerken 在有区别逻辑门网中优化连接的方法 2507.06173v1

Authors (4): Wout Mommen, Lars Keuninckx, Matthias Hartmann, Piet Wambacq

We introduce a novel method for partial optimization of the connections in Deep Differentiable Logic Gate Networks (LGNs). Our training method utilizes a probability distribution over a subset of connections per gate input, selecting the connection with highest merit, after which the gate-types are selected. We show that the connection-optimized LGNs outperform standard fixed-connection LGNs on the Yin-Yang, MNIST and Fashion-MNIST benchmarks, while requiring only a fraction of the number of logic gates. When training all connections, we demonstrate that 8000 simple logic gates are sufficient to achieve over 98% on the MNIST data set. Additionally, we show that our network has 24 times fewer gates, while performing better on the MNIST data set compared to standard fully connected LGNs. As such, our work shows a pathway towards fully trainable Boolean logic.


Article 467

Title@2025-07-08 (2): Inferring Higher-Order Couplings with Neural Networks

Title: Inferring Higher-Order Couplings with Neural Networks Rückschlüsse auf höhere Auftragskoppelungen mit neuralen Netzen 与神经网络连接 2501.06108v4

Authors (3): Aurélien Decelle, Alfonso de Jesús Navas Gómez, Beatriz Seoane

Maximum entropy methods, rooted in the inverse Ising/Potts problem from statistical physics, are widely used to model pairwise interactions in complex systems across disciplines such as bioinformatics and neuroscience. While successful, these approaches often fail to capture higher-order interactions that are critical for understanding collective behavior. In contrast, modern machine learning methods can model such interactions, but their interpretability often comes at a prohibitive computational cost. Restricted Boltzmann Machines (RBMs) provide a computationally efficient alternative by encoding statistical correlations through hidden units in a bipartite architecture. In this work, we introduce a method that maps RBMs onto generalized Potts models, enabling the systematic extraction of interactions up to arbitrary order. Leveraging large-$N$ approximations, made tractable by the RBM’s structure, we extract effective many-body couplings with minimal computational effort. We further propose a robust framework for recovering higher-order interactions in more complex generative models, and introduce a simple gauge-fixing scheme for the effective Potts representation. Validation on synthetic data demonstrates accurate recovery of two- and three-body interactions. Applied to protein sequence data, our method reconstructs contact maps with high fidelity and outperforms state-of-the-art inverse Potts models. These results establish RBMs as a powerful and efficient tool for modeling higher-order structure in high-dimensional categorical data.


Article 468

Title@2025-07-08 (2): Aliasing in Convnets: A Frame-Theoretic Perspective

Title: Aliasing in Convnets: A Frame-Theoretic Perspective Aliasing in Convnets: Eine rahmentheoretische Perspektive Convnets中的异异相:框架理论视角 2507.06152v1

Authors (5): Daniel Haider, Vincent Lostanlen, Martin Ehler, Nicki Holighaus, Peter Balazs

Using a stride in a convolutional layer inherently introduces aliasing, which has implications for numerical stability and statistical generalization. While techniques such as the parametrizations via paraunitary systems have been used to promote orthogonal convolution and thus ensure Parseval stability, a general analysis of aliasing and its effects on the stability has not been done in this context. In this article, we adapt a frame-theoretic approach to describe aliasing in convolutional layers with 1D kernels, leading to practical estimates for stability bounds and characterizations of Parseval stability, that are tailored to take short kernel sizes into account. From this, we derive two computationally very efficient optimization objectives that promote Parseval stability via systematically suppressing aliasing. Finally, for layers with random kernels, we derive closed-form expressions for the expected value and variance of the terms that describe the aliasing effects, revealing fundamental insights into the aliasing behavior at initialization.


Article 469

Title@2025-07-08 (2): Online Planning for Multi-UAV Pursuit-Evasion in Unknown Environments Using Deep Reinforcement Learning

Title: Online Planning for Multi-UAV Pursuit-Evasion in Unknown Environments Using Deep Reinforcement Learning Online-Planung für Multi-UAV Pursuit-Evasion in unbekannten Umgebungen mit Deep Verstärkung Learning 利用深强化学习,在未知环境中进行多无人驾驶航空器探索-扩展的在线规划 2409.15866v4

Authors (9): Jiayu Chen, Chao Yu, Guosheng Li, Wenhao Tang, Shilong Ji, Xinyi Yang, Botian Xu, Huazhong Yang, Yu Wang

Multi-UAV pursuit-evasion, where pursuers aim to capture evaders, poses a key challenge for UAV swarm intelligence. Multi-agent reinforcement learning (MARL) has demonstrated potential in modeling cooperative behaviors, but most RL-based approaches remain constrained to simplified simulations with limited dynamics or fixed scenarios. Previous attempts to deploy RL policy to real-world pursuit-evasion are largely restricted to two-dimensional scenarios, such as ground vehicles or UAVs at fixed altitudes. In this paper, we address multi-UAV pursuit-evasion by considering UAV dynamics and physical constraints. We introduce an evader prediction-enhanced network to tackle partial observability in cooperative strategy learning. Additionally, we propose an adaptive environment generator within MARL training, enabling higher exploration efficiency and better policy generalization across diverse scenarios. Simulations show our method significantly outperforms all baselines in challenging scenarios, generalizing to unseen scenarios with a 100% capture rate. Finally, we derive a feasible policy via a two-stage reward refinement and deploy the policy on real quadrotors in a zero-shot manner. To our knowledge, this is the first work to derive and deploy an RL-based policy using collective thrust and body rates control commands for multi-UAV pursuit-evasion in unknown environments. The open-source code and videos are available at https://sites.google.com/view/pursuit-evasion-rl.


Article 470

Title@2025-07-08 (2): Transformers Simulate MLE for Sequence Generation in Bayesian Networks

Title: Transformers Simulate MLE for Sequence Generation in Bayesian Networks Transformer simulieren MLE für Sequenzgenerierung in Bayesischen Netzwerken Bayesian 网络序列生成的变换器模拟 MLE MLE 2501.02547v2

Authors (6): Yuan Cao, Yihan He, Dennis Wu, Hong-Yu Chen, Jianqing Fan, Han Liu

Transformers have achieved significant success in various fields, notably excelling in tasks involving sequential data like natural language processing. Despite these achievements, the theoretical understanding of transformers’ capabilities remains limited. In this paper, we investigate the theoretical capabilities of transformers to autoregressively generate sequences in Bayesian networks based on in-context maximum likelihood estimation (MLE). Specifically, we consider a setting where a context is formed by a set of independent sequences generated according to a Bayesian network. We demonstrate that there exists a simple transformer model that can (i) estimate the conditional probabilities of the Bayesian network according to the context, and (ii) autoregressively generate a new sample according to the Bayesian network with estimated conditional probabilities. We further demonstrate in extensive experiments that such a transformer does not only exist in theory, but can also be effectively obtained through training. Our analysis highlights the potential of transformers to learn complex probabilistic models and contributes to a better understanding of large language models as a powerful class of sequence generators.


Article 471

Title: Topic Modeling and Link-Prediction for Material Property Discovery Thema Modellierung und Link-Prädikation für Material Property Discovery 物质财产发现专题建模和连接预测 2507.06139v1

Authors (5): Ryan C. Barron, Maksim E. Eren, Valentin Stanev, Cynthia Matuszek, Boian S. Alexandrov

Link prediction infers missing or future relations between graph nodes, based on connection patterns. Scientific literature networks and knowledge graphs are typically large, sparse, and noisy, and often contain missing links between entities. We present an AI-driven hierarchical link prediction framework that integrates matrix factorization to infer hidden associations and steer discovery in complex material domains. Our method combines Hierarchical Nonnegative Matrix Factorization (HNMFk) and Boolean matrix factorization (BNMFk) with automatic model selection, as well as Logistic matrix factorization (LMF), which we use to construct a three-level topic tree from a 46,862-document corpus focused on 73 transition-metal dichalcogenides (TMDs). These materials are studied in a variety of physics fields with many current and potential applications. An ensemble BNMFk + LMF approach fuses discrete interpretability with probabilistic scoring. The resulting HNMFk clusters map each material onto coherent topics like superconductivity, energy storage, and tribology. In addition, missing or weakly connected links between topics and materials are highlighted, suggesting novel hypotheses for cross-disciplinary exploration. We validate our method by removing publications about superconductivity in well-known superconductors, and show the model predicts associations with the superconducting TMD clusters. This shows that the method finds hidden connections in a graph of material-to-latent-topic associations built from scientific literature, which is especially useful when examining a diverse corpus of scientific documents covering the same class of phenomena or materials but originating from distinct communities and perspectives. The inferred links, which generate new hypotheses, are exposed through an interactive Streamlit dashboard designed for human-in-the-loop scientific discovery.


Article 472

Title@2025-07-08 (2): The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

Title: The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret Die Gefahren der Optimierung gelernter Belohnungsfunktionen: Niedriger Trainingsfehler garantiert keinen geringen Bedauern 优化得失回报功能的危害:低培训错误不能保证低遗憾 2406.15753v3

Authors (6): Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse

In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the data distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main source of an error-regret mismatch is the distributional shift that commonly occurs during policy optimization. In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any fixed expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF. We hope our results stimulate the theoretical and empirical study of improved methods to learn reward models, and better ways to measure their quality reliably.
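
The error-regret mismatch described above can be illustrated with a toy three-armed bandit. This is only a hand-made sketch, not an example from the paper: the reward values, the data distribution, and the helper names `expected_abs_error` and `regret_of_greedy` are all assumptions chosen for illustration. The learned reward is wrong only on an arm that the data distribution rarely visits, so its expected error is small, yet a policy that greedily optimizes it picks exactly that arm.

```python
def expected_abs_error(true_r, learned_r, data_dist):
    """Expected absolute reward error under the data distribution."""
    return sum(p * abs(t - l) for p, t, l in zip(data_dist, true_r, learned_r))


def regret_of_greedy(true_r, learned_r):
    """True-reward regret of the arm that maximizes the learned reward."""
    chosen = max(range(len(learned_r)), key=lambda a: learned_r[a])
    return max(true_r) - true_r[chosen]


# Hypothetical numbers: arm 2 is rarely sampled and badly overestimated.
true_r = [1.0, 0.8, 0.0]
learned_r = [1.0, 0.8, 2.0]
data_dist = [0.495, 0.495, 0.01]

err = expected_abs_error(true_r, learned_r, data_dist)
regret = regret_of_greedy(true_r, learned_r)
print(f"expected error = {err:.3f}, regret = {regret:.3f}")
```

Here the expected error is 0.02 while the regret is 1.0, the largest possible in this toy problem: optimization shifts the policy onto the region where the data gave the reward model almost no supervision.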


Article 473

Title@2025-07-08 (2): Subspace-based Approximate Hessian Method for Zeroth-Order Optimization

Title: Subspace-based Approximate Hessian Method for Zeroth-Order Optimization Subraum-basierte annähernde hessische Methode zur Zeroth-Order-Optimierung 近乎赫西安的赫斯法 2507.06125v1

Authors (4): Dongyoon Kim, Sungjae Lee, Wonjin Lee, Kwang In Kim

Zeroth-order optimization addresses problems where gradient information is inaccessible or impractical to compute. While most existing methods rely on first-order approximations, incorporating second-order (curvature) information can, in principle, significantly accelerate convergence. However, the high cost of function evaluations required to estimate Hessian matrices often limits practical applicability. We present the subspace-based approximate Hessian (ZO-SAH) method, a zeroth-order optimization algorithm that mitigates these costs by focusing on randomly selected two-dimensional subspaces. Within each subspace, ZO-SAH estimates the Hessian by fitting a quadratic polynomial to the objective function and extracting its second-order coefficients. To further reduce function-query costs, ZO-SAH employs a periodic subspace-switching strategy that reuses function evaluations across optimization steps. Experiments on eight benchmark datasets, including logistic regression and deep neural network training tasks, demonstrate that ZO-SAH achieves significantly faster convergence than existing zeroth-order methods.
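
The core idea of estimating curvature from function values inside a two-dimensional subspace can be sketched as follows. This is a minimal illustration, not the authors' ZO-SAH implementation: the six-point sampling pattern, the step size `h`, and the function name `subspace_hessian` are assumptions. The six points determine a unique quadratic in the subspace coordinates `(a, b)`, and the closed-form expressions below are the second-order coefficients of that interpolating quadratic.

```python
def subspace_hessian(f, x, u, v, h=1e-2):
    """Estimate the 2x2 Hessian of f restricted to span{u, v} at point x.

    Uses six function evaluations, which is the number of coefficients
    of a quadratic polynomial in two variables.
    """
    def g(a, b):
        # Objective evaluated at x + a*u + b*v.
        return f([xi + a * ui + b * vi for xi, ui, vi in zip(x, u, v)])

    f00 = g(0.0, 0.0)
    fa_plus, fa_minus = g(h, 0.0), g(-h, 0.0)
    fb_plus, fb_minus = g(0.0, h), g(0.0, -h)
    fab = g(h, h)

    h_aa = (fa_plus + fa_minus - 2.0 * f00) / h**2
    h_bb = (fb_plus + fb_minus - 2.0 * f00) / h**2
    h_ab = (fab - fa_plus - fb_plus + f00) / h**2
    return [[h_aa, h_ab], [h_ab, h_bb]]


# Usage on a quadratic test function whose Hessian in the (x0, x1) plane
# is [[2, 3], [3, 4]]; the third coordinate only contributes a linear term.
def objective(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2 + x[2]


H = subspace_hessian(objective, [0.3, -0.2, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(H)
```

Because the cost is six evaluations per subspace regardless of the ambient dimension, a scheme of this kind stays cheap in high dimensions; reusing evaluations across steps, as the abstract describes, would reduce the count further.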


Article 474

Title@2025-07-08 (2): Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression

Title: Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression Regression für den Mittelwert: Auto-Evaluierung und Schlussfolgerung mit wenigen Etiketten durch post-hoc-Regression 平均值的回归:通过热后回归,用很少的标签进行自动评价和推断 2411.12665v2

Authors (2): Benjamin Eyre, David Madras

The availability of machine learning systems that can effectively perform arbitrary tasks has led to synthetic labels from these systems being used in applications of statistical inference, such as data analysis or model evaluation. The Prediction Powered Inference (PPI) framework provides a way of leveraging both a large pool of pseudo-labelled data and a small sample with real, high-quality labels to produce a low-variance, unbiased estimate of the quantity being evaluated for. Most work on PPI considers a relatively sizable set of labelled samples, which can be resource intensive to obtain. However, we find that when labelled data is scarce, the PPI++ method can perform even worse than classical inference. We analyze this phenomenon by relating PPI++ to ordinary least squares regression, which also experiences high variance with small sample sizes, and use this regression framework to better understand the efficacy of PPI. Motivated by this, we present two new PPI-based techniques that leverage robust regressors to produce even lower variance estimators in the few-label regime.
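
For the simplest case of estimating a mean, the PPI idea can be sketched in a few lines. This is a hedged illustration rather than the paper's code: the function names are hypothetical, and the tuning formula in `tuned_lambda` is the covariance-to-variance ratio as we recall it from the PPI++ literature, so treat it as an assumption. The ratio has the form of a regression slope, which is the link to ordinary least squares that the abstract draws, and it is estimated from the small labelled sample, which is where the extra variance in the few-label regime comes from.

```python
def mean(xs):
    return sum(xs) / len(xs)


def ppi_mean(y_lab, f_lab, f_unl, lam=1.0):
    """Estimate E[Y] from a few true labels plus many model predictions.

    y_lab: true labels on the small labelled sample
    f_lab: model predictions on that same labelled sample
    f_unl: model predictions on the large unlabelled pool
    lam=1.0 gives the basic PPI estimate; lam=0.0 gives the classical
    labelled-sample mean.
    """
    return mean(y_lab) + lam * (mean(f_unl) - mean(f_lab))


def tuned_lambda(y_lab, f_lab, n_unl):
    """Slope-like weight estimated from the labelled sample (assumed form)."""
    n = len(y_lab)
    y_bar, f_bar = mean(y_lab), mean(f_lab)
    cov = sum((y - y_bar) * (f - f_bar) for y, f in zip(y_lab, f_lab)) / (n - 1)
    var = sum((f - f_bar) ** 2 for f in f_lab) / (n - 1)
    return cov / ((1.0 + n / n_unl) * var)


# Made-up data: the model overpredicts by 0.5 on every labelled point.
y_lab = [1.0, 2.0, 3.0, 4.0]
f_lab = [1.5, 2.5, 3.5, 4.5]
f_unl = [2.0, 3.0, 4.0, 5.0, 6.0]

basic = ppi_mean(y_lab, f_lab, f_unl)
lam = tuned_lambda(y_lab, f_lab, len(f_unl))
tuned = ppi_mean(y_lab, f_lab, f_unl, lam=lam)
print(basic, lam, tuned)
```

With only four labels, `cov` and `var` are themselves noisy, so the tuned weight can be far from its ideal value; replacing this step with a more robust regressor is the direction the abstract points to.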


Article 475

Title@2025-07-08 (2): Entropy stable conservative flux form neural networks

Title: Entropy stable conservative flux form neural networks Entropiestabile konservative Flussform neuronale Netzwerke Entropy稳定稳定的保守通量形成神经网络 2411.01746v2

Authors (4): Lizuo Liu, Tongtong Li, Anne Gelb, Yoonsang Lee

We propose an entropy-stable conservative flux form neural network (CFN) that integrates classical numerical conservation laws into a data-driven framework using the entropy-stable, second-order, and non-oscillatory Kurganov-Tadmor (KT) scheme. The proposed entropy-stable CFN uses slope limiting as a denoising mechanism, ensuring accurate predictions in both noisy and sparse observation environments, as well as in both smooth and discontinuous regions. Numerical experiments demonstrate that the entropy-stable CFN achieves both stability and conservation while maintaining accuracy over extended time domains. Furthermore, it successfully predicts shock propagation speeds in long-term simulations, without oracle knowledge of later-time profiles in the training data.


Article 476

Title@2025-07-08 (2): Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation

Title: Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation Sichere Domain Randomisierung durch Uncertainty-Aware Out-of-Distribution-Erkennung und Politikanpassung 通过不确定性—- 软件外分发检测和政策适应而安全域域随机化 2507.06111v1

Authors (5): Mohamad H. Danesh, Maxime Wabartha, Stanley Wu, Joelle Pineau, Hsiu-Chin Lin

Deploying reinforcement learning (RL) policies in the real world involves significant challenges, including distribution shifts, safety concerns, and the impracticality of direct interactions during policy refinement. Existing methods, such as domain randomization (DR) and off-dynamics RL, enhance policy robustness by direct interaction with the target domain, an inherently unsafe practice. We propose Uncertainty-Aware RL (UARL), a novel framework that prioritizes safety during training by addressing Out-Of-Distribution (OOD) detection and policy adaptation without requiring direct interactions in the target domain. UARL employs an ensemble of critics to quantify policy uncertainty and incorporates progressive environmental randomization to prepare the policy for diverse real-world conditions. By iteratively refining over high-uncertainty regions of the state space in simulated environments, UARL enhances robust generalization to the target domain without explicitly training on it. We evaluate UARL on MuJoCo benchmarks and a quadrupedal robot, demonstrating its effectiveness in reliable OOD detection, improved performance, and enhanced sample efficiency compared to baselines.


Article 477

Title@2025-07-08 (2): Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification

Title: Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification Fliesenbasierte ViT-Inferenz mit Visual-Cluster-Prioren für die zero-shot Multi-Spezies Pflanzenidentifikation 零热多物种植物识别与视光聚群前科的电压VT推断值 2507.06093v1

Authors (4): Murilo Gustineli, Anthony Miyaguchi, Adrian Cheung, Divyansh Khattak

We describe DS@GT’s second-place solution to the PlantCLEF 2025 challenge on multi-species plant identification in vegetation quadrat images. Our pipeline combines (i) a fine-tuned Vision Transformer ViTD2PC24All for patch-level inference, (ii) a 4x4 tiling strategy that aligns patch size with the network’s 518x518 receptive field, and (iii) domain-prior adaptation through PaCMAP + K-Means visual clustering and geolocation filtering. Tile predictions are aggregated by majority vote and re-weighted with cluster-specific Bayesian priors, yielding a macro-averaged F1 of 0.348 (private leaderboard) while requiring no additional training. All code, configuration files, and reproducibility scripts are publicly available at https://github.com/dsgt-arc/plantclef-2025.
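
The tiling and aggregation steps can be sketched roughly as below. This is a loose illustration under stated assumptions, not the team's pipeline (their repository is the authoritative source): the function names, the choice to multiply each species' vote share by a prior probability, and the rule of dropping species whose score is zero are all hypothetical.

```python
from collections import Counter


def tile_boxes(width, height, grid=4):
    """Return (left, top, right, bottom) boxes for a grid x grid tiling."""
    xs = [round(i * width / grid) for i in range(grid + 1)]
    ys = [round(j * height / grid) for j in range(grid + 1)]
    return [
        (xs[i], ys[j], xs[i + 1], ys[j + 1])
        for j in range(grid)
        for i in range(grid)
    ]


def aggregate(tile_top_species, prior):
    """Rank species by vote share across tiles times an assumed prior.

    tile_top_species: top-1 predicted species for each tile
    prior: mapping species -> prior probability for the image's cluster
    Species with zero score are dropped.
    """
    votes = Counter(tile_top_species)
    n_tiles = len(tile_top_species)
    scores = {s: (c / n_tiles) * prior.get(s, 0.0) for s, c in votes.items()}
    kept = [s for s, score in scores.items() if score > 0.0]
    return sorted(kept, key=lambda s: -scores[s])


# A 2072 x 2072 image splits into sixteen 518 x 518 tiles.
boxes = tile_boxes(2072, 2072)
ranking = aggregate(["a", "a", "b", "c"], {"a": 0.5, "b": 0.4, "c": 0.0})
print(boxes[0], boxes[-1], ranking)
```

Since no training is involved, all of the adaptation to the quadrat domain in a scheme like this comes from the tiling and the priors.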


Article 478

Title@2025-07-08 (2): Taming Data Challenges in ML-based Security Tasks: Lessons from Integrating Generative AI

Title: Taming Data Challenges in ML-based Security Tasks: Lessons from Integrating Generative AI Zähmen von Daten-Herausforderungen in ML-basierten Sicherheitsaufgaben: Lehren aus der Integration generativer KI 在以多边后勤为基础的安全任务中应对数据挑战:从综合生成中汲取的教训 2507.06092v1

Authors (7): Shravya Kanchi, Neal Mangaokar, Aravind Cheruvu, Sifat Muhammad Abdullah, Shirin Nilizadeh, Atul Prakash, Bimal Viswanath

Machine learning-based supervised classifiers are widely used for security tasks, and their improvement has been largely focused on algorithmic advancements. We argue that data challenges that negatively impact the performance of these classifiers have received limited attention. We address the following research question: Can developments in Generative AI (GenAI) address these data challenges and improve classifier performance? We propose augmenting training datasets with synthetic data generated using GenAI techniques to improve classifier generalization. We evaluate this approach across 7 diverse security tasks using 6 state-of-the-art GenAI methods and introduce a novel GenAI scheme called Nimai that enables highly controlled data synthesis. We find that GenAI techniques can significantly improve the performance of security classifiers, achieving improvements of up to 32.6% even in severely data-constrained settings (only ~180 training samples). Furthermore, we demonstrate that GenAI can facilitate rapid adaptation to concept drift post-deployment, requiring minimal labeling in the adjustment process. Despite these successes, our study finds that some GenAI schemes struggle to initialize (train and produce data) on certain security tasks. We also identify characteristics of specific tasks, such as noisy labels, overlapping class distributions, and sparse feature vectors, that limit the performance gains from GenAI. We believe that our study will drive the development of future GenAI tools designed for security tasks.


Article 479

Title@2025-07-08 (2): CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs

Title: CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs CoRE: Verbesserung der Metakognition durch etikettenfreie Selbsteinschätzung in LRMs CORE:加强低林机制中与无标签自评的比喻意识 2507.06087v1

Authors (4): Haoxi Li, Sikai Bai, Jie Zhang, Song Guo

Large reasoning models (LRMs) have demonstrated impressive capabilities in domains like mathematics and program synthesis. Despite their strong performance, LRMs often exhibit overthinking – excessive and redundant reasoning steps that introduce inefficiencies during inference. This phenomenon raises an important question for LRM self-evaluation: How can a model autonomously assess the correctness of its own reasoning trajectory without external labels? To address this, we propose Chain-of-Reasoning Embedding (CoRE), a series of hidden states in latent space to enable label-free self-evaluation on intermediate reasoning steps of LRMs, so as to enhance metacognition abilities for improved reasoning efficiency. By analyzing the geometric properties of the CoRE trajectories, we reveal that redundant reasoning usually presents cyclical fluctuations, which correspond to repetitive and unconscious reflection/exploration. Leveraging this insight, we further introduce a training-free, label-free self-evaluation framework, CoRE-Eval, to detect such patterns and dynamically determine whether to terminate reasoning early. Extensive experiments on mathematical reasoning benchmarks (GSM8K, MATH-500, and AIME) and across model sizes from 7B to 32B demonstrate that CoRE-Eval reduces chain-of-thought length by 13.7% to 33.2% while improving answer accuracy by around 10%, achieving 70.0% accuracy on the challenging AIME benchmark with the 32B model.


Article 480

Title@2025-07-08 (2): A Cascading Cooperative Multi-agent Framework for On-ramp Merging Control Integrating Large Language Models

Title: A Cascading Cooperative Multi-agent Framework for On-ramp Merging Control Integrating Large Language Models Ein Cascading kooperatives Multi-Agent-Framework für die On-Ramp Merging Control Integration großer Sprachmodelle 连串合作多剂框架,以综合大语言模式,促进在轨合并控制控制 2503.08199v2

Authors (7): Miao Zhang, Zhenlong Fang, Tianyi Wang, Qian Zhang, Shuai Lu, Junfeng Jiao, Tianyu Shi

Traditional Reinforcement Learning (RL) struggles with replicating human-like behaviors, generalizing effectively in multi-agent scenarios, and overcoming inherent interpretability issues. These challenges are compounded when deep environment understanding, agent coordination and dynamic optimization are required. While Large Language Model (LLM) enhanced methods have shown promise in generalization and interoperability, they often neglect necessary multi-agent coordination. Therefore, we introduce the Cascading Cooperative Multi-agent (CCMA) framework, integrating RL for individual interactions, a fine-tuned LLM for regional cooperation, a reward function for global optimization, and the Retrieval-augmented Generation mechanism to dynamically optimize decision-making across complex driving scenarios. Our experiments show that the CCMA outperforms existing RL methods, with significant improvements in both micro- and macro-level performance in complex driving environments.


Article 481

Title@2025-07-08 (2): QS4D: Quantization-aware training for efficient hardware deployment of structured state-space sequential models

Title: QS4D: Quantization-aware training for efficient hardware deployment of structured state-space sequential models QS4D: Quantization-aware-Training für effiziente Hardware-Bereitstellung strukturierter State-Space-Sequential-Modelle QS4D:为高效部署结构有序的州-空相继模型的硬件进行量化了解培训 2507.06079v1

Authors (6): Sebastian Siegel, Ming-Jay Yang, Younes Bouhadjar, Maxime Fabre, Emre Neftci, John Paul Strachan

Structured State Space models (SSM) have recently emerged as a new class of deep learning models, particularly well-suited for processing long sequences. Their constant memory footprint, in contrast to the linearly scaling memory demands of Transformers, makes them attractive candidates for deployment on resource-constrained edge-computing devices. While recent works have explored the effect of quantization-aware training (QAT) on SSMs, they typically do not address its implications for specialized edge hardware, for example, analog in-memory computing (AIMC) chips. In this work, we demonstrate that QAT can significantly reduce the complexity of SSMs by up to two orders of magnitude across various performance metrics. We analyze the relation between model size and numerical precision, and show that QAT enhances robustness to analog noise and enables structural pruning. Finally, we integrate these techniques to deploy SSMs on a memristive analog in-memory computing substrate and highlight the resulting benefits in terms of computational efficiency.


Article 482

Title@2025-07-08 (2): Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol

Title: Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol Kontrastives und Transferlernen für effektives Audio-Fingerprinting durch ein Real-World Evaluation Protocol 通过《现实世界评价议定书》促进有效音频指纹的交流和转让学习 2507.06070v1

Authors (2): Christos Nikou, Theodoros Giannakopoulos

Recent advances in song identification leverage deep neural networks to learn compact audio fingerprints directly from raw waveforms. While these methods perform well under controlled conditions, their accuracy drops significantly in real-world scenarios where the audio is captured via mobile devices in noisy environments. In this paper, we introduce a novel evaluation protocol designed to better reflect such real-world conditions. We generate three recordings of the same audio, each with increasing levels of noise, captured using a mobile device’s microphone. Our results reveal a substantial performance drop for two state-of-the-art CNN-based models under this protocol, compared to previously reported benchmarks. Additionally, we highlight the critical role of the augmentation pipeline during training with contrastive loss. By introducing low-pass and high-pass filters in the augmentation pipeline, we significantly increase the performance of both systems in our proposed evaluation. Furthermore, we develop a transformer-based model with a tailored projection module and demonstrate that transferring knowledge from a semantically relevant domain yields a more robust solution. The transformer architecture outperforms CNN-based models across all noise levels and query durations. In low-noise conditions it achieves 47.99% for 1-sec queries and 97% for 10-sec queries in finding the correct song, surpassing the second-best performing model by 14% and 18.5%, respectively. Under heavy noise levels, we achieve a detection rate of 56.5% for a 15-second query duration. All experiments are conducted on a public large-scale dataset of over 100K songs, with queries matched against a database of 56 million vectors.


Article 483

Title@2025-07-08 (2): Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle

Title: Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle Sind LLMs prescient? Eine kontinuierliche Bewertung mit Daily News als Oracle LLLM是否具有先锋精神?利用每日新闻作为甲骨文进行持续评估。 2411.08324v2

Authors (3): Hui Dai, Ryan Teehan, Mengye Ren

Many existing evaluation benchmarks for Large Language Models (LLMs) quickly become outdated due to the emergence of new models and training data. These benchmarks also fall short in assessing how LLM performance changes over time, as they consist of a static set of questions without a temporal dimension. To address these limitations, we propose using future event prediction as a continuous evaluation method to assess LLMs’ temporal generalization and forecasting abilities. Our benchmark, Daily Oracle, automatically generates question-answer (QA) pairs from daily news, challenging LLMs to predict “future” event outcomes. Our findings reveal that as pre-training data becomes outdated, LLM performance degrades over time. While Retrieval Augmented Generation (RAG) has the potential to enhance prediction accuracy, the performance degradation pattern persists, highlighting the need for continuous model updates. Code and data are available at https://agenticlearning.ai/daily-oracle.


Article 484

Title@2025-07-08 (2): Few-Shot Learning by Explicit Physics Integration: An Application to Groundwater Heat Transport

Title: Few-Shot Learning by Explicit Physics Integration: An Application to Groundwater Heat Transport Wenig heißes Lernen durch explizite Physik-Integration: Eine Anwendung auf den Grundwasser-Wärmetransport 通过明确物理集成进行很少热的热学习:地下水热运输的应用 2507.06062v1

Authors (4): Julia Pelzer, Corné Verburg, Alexander Heinlein, Miriam Schulte

Machine learning methods often struggle with real-world applications in science and engineering due to limited or low-quality training data. In this work, the example of groundwater flow with heat transport is considered; this corresponds to an advection-diffusion process under heterogeneous flow conditions, that is, spatially distributed material parameters and heat sources. Classical numerical simulations are costly and challenging due to high spatio-temporal resolution requirements and large domains. While often computationally more efficient, purely data-driven surrogate models face difficulties, particularly in predicting the advection process, which is highly sensitive to input variations and involves long-range spatial interactions. Therefore, in this work, a Local-Global Convolutional Neural Network (LGCNN) approach is introduced. It combines a lightweight numerical surrogate for the transport process (global) with convolutional neural networks for the groundwater velocity and heat diffusion processes (local). With the LGCNN, a city-wide subsurface temperature field is modeled, involving a heterogeneous groundwater flow field and one hundred groundwater heat pump injection points forming interacting heat plumes over long distances. The model is first systematically analyzed based on random subsurface input fields. Then, the model is trained on a handful of cut-outs from a real-world subsurface map of the Munich region in Germany, and it scales to larger cut-outs without retraining. All datasets, our code, and trained models are published for reproducibility.


Article 485

Title@2025-07-08 (2): Estimating prevalence with precision and accuracy

Title: Estimating prevalence with precision and accuracy Prävalenz mit Präzision und Genauigkeit abschätzen 精确和精确地估计流行率 2507.06061v1

Authors (3): Aime Bienfait Igiraneza, Christophe Fraser, Robert Hinch

Unlike classification, whose goal is to estimate the class of each data point in a dataset, prevalence estimation or quantification is a task that aims to estimate the distribution of classes in a dataset. The two main tasks in prevalence estimation are to adjust for bias, due to the prevalence in the training dataset, and to quantify the uncertainty in the estimate. The standard methods used to quantify uncertainty in prevalence estimates are bootstrapping and Bayesian quantification methods. It is not clear which approach is ideal in terms of precision (i.e. the width of confidence intervals) and coverage (i.e. the confidence intervals being well-calibrated). Here, we propose Precise Quantifier (PQ), a Bayesian quantifier that is more precise than existing quantifiers and with well-calibrated coverage. We discuss the theory behind PQ and present experiments based on simulated and real-world datasets. Through these experiments, we establish the factors which influence quantification precision: the discriminatory power of the underlying classifier; the size of the labeled dataset used to train the quantifier; and the size of the unlabeled dataset for which prevalence is estimated. Our analysis provides deep insights into uncertainty quantification for quantification learning.
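
The two tasks named above, bias adjustment and uncertainty quantification, can be sketched with the classical adjusted-count estimator and a percentile bootstrap. This is a baseline of the kind the abstract compares against, not the proposed PQ method. The function names are hypothetical, the classifier's true- and false-positive rates are assumed known from held-out labelled data, and the bootstrap below resamples only the unlabelled predictions, so it ignores uncertainty in those rates.

```python
import random


def adjusted_count(preds, tpr, fpr):
    """Correct the raw positive rate for classifier error (requires tpr != fpr)."""
    q = sum(preds) / len(preds)          # observed fraction predicted positive
    p = (q - fpr) / (tpr - fpr)          # invert q = p*tpr + (1 - p)*fpr
    return min(1.0, max(0.0, p))         # clip to a valid prevalence


def bootstrap_interval(preds, tpr, fpr, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the adjusted prevalence."""
    rng = random.Random(seed)
    n = len(preds)
    estimates = sorted(
        adjusted_count([preds[rng.randrange(n)] for _ in range(n)], tpr, fpr)
        for _ in range(n_boot)
    )
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up example: 55 of 100 items are predicted positive.
preds = [1] * 55 + [0] * 45
point = adjusted_count(preds, tpr=0.9, fpr=0.2)
lo, hi = bootstrap_interval(preds, tpr=0.9, fpr=0.2)
print(point, lo, hi)
```

The interval widens as `tpr - fpr` shrinks, which matches the abstract's observation that the discriminatory power of the underlying classifier drives quantification precision.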



Article 486

Title@2025-07-08 (2): Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators

Title: Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators Kernel Trace Distance: Quantum Statistical Metric zwischen den Messungen durch RKHS-Dichteoperatoren 核心跟踪距离:通过RKHS密度运算器测量的量子统计尺度 2507.06055v1

Authors (4): Arturo Castellanos, Anna Korba, Pavlo Mozharovskyi, Hicham Janati

Distances between probability distributions are a key component of many statistical machine learning tasks, from two-sample testing to generative modeling. We introduce a novel distance between measures that compares them through a Schatten norm of their kernel covariance operators. We show that this new distance is an integral probability metric that can be framed between a Maximum Mean Discrepancy (MMD) and a Wasserstein distance. In particular, we show that it avoids some pitfalls of MMD by being more discriminative and more robust to the choice of hyperparameters. Moreover, it benefits from some compelling properties of kernel methods, which can avoid the curse of dimensionality in their sample complexity. We provide an algorithm to compute the distance in practice by introducing an extension of the kernel matrix to differences of distributions, which could be of independent interest. These advantages are illustrated by robust approximate Bayesian computation under contamination as well as particle flow simulations.
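The following sketch compares a plain MMD estimate with a Schatten-1 (trace) norm of the difference of empirical kernel covariance operators. It rests on several assumptions that the abstract does not confirm: a Gaussian kernel, uncentered second-moment operators, no 1/2 normalization factor, and a direct eigenvalue computation rather than the algorithm the paper proposes. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the Maximum Mean Discrepancy."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))


def covariance_trace_norm(x, y, bandwidth=1.0):
    """Schatten-1 norm of the difference of empirical kernel covariance operators.

    The difference is sum_i w_i phi(z_i) phi(z_i)^T with weights
    w = (+1/n, ..., -1/m, ...) over the pooled sample z. Its nonzero
    eigenvalues equal those of the symmetric matrix K^{1/2} diag(w) K^{1/2},
    where K is the kernel matrix of the pooled sample.
    """
    z = np.vstack([x, y])
    w = np.concatenate([np.full(len(x), 1.0 / len(x)),
                        np.full(len(y), -1.0 / len(y))])
    k = gaussian_kernel(z, z, bandwidth)
    evals, evecs = np.linalg.eigh(k)
    sqrt_k = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    m = sqrt_k @ (w[:, None] * sqrt_k)
    m = 0.5 * (m + m.T)  # remove rounding asymmetry before eigvalsh
    return float(np.sum(np.abs(np.linalg.eigvalsh(m))))
```

With a kernel satisfying k(x, x) = 1, each empirical operator has unit trace, so this quantity lies between 0 (identical samples) and 2 (samples whose kernel features are orthogonal).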



Article 487

Title@2025-07-08 (2): Minimal Deterministic Echo State Networks Outperform Random Reservoirs in Learning Chaotic Dynamics

Title: Minimal Deterministic Echo State Networks Outperform Random Reservoirs in Learning Chaotic Dynamics Minimale deterministische Echo State Networks übertreffen zufällige Reservoirs beim Lernen der Chaotischen Dynamik 最起码的确定性回声状态网络 2507.06050v1

Authors (1): Francesco Martinuzzi

Machine learning (ML) is widely used to model chaotic systems. Among ML approaches, echo state networks (ESNs) have received considerable attention due to their simple construction and fast training. However, ESN performance is highly sensitive to hyperparameter choices and to its random initialization. In this work, we demonstrate that ESNs constructed using deterministic rules and simple topologies (MESNs) outperform standard ESNs in the task of chaotic attractor reconstruction. We use a dataset of more than 90 chaotic systems to benchmark 10 different minimal deterministic reservoir initializations. We find that MESNs obtain up to a 41% reduction in error compared to standard ESNs. Furthermore, we show that the MESNs are more robust, exhibiting less inter-run variation, and have the ability to reuse hyperparameters across different systems. Our results illustrate how structured simplicity in ESN design can outperform stochastic complexity in learning chaotic dynamics.
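As a rough illustration of a reservoir built from deterministic rules and a simple topology, the sketch below uses a single-ring (cycle) reservoir with a fixed sign pattern and a ridge-regression readout. The specific choices are assumptions rather than the paper's configurations: the ring topology, the Thue-Morse sign pattern, the weights 0.5 and 0.1, and a sine wave standing in for a chaotic system.

```python
import numpy as np


def thue_morse_signs(n):
    """Deterministic, aperiodic +1/-1 pattern (parity of set bits in the index)."""
    return np.array([1.0 if bin(i).count("1") % 2 == 0 else -1.0 for i in range(n)])


def build_cycle_reservoir(n_units, cycle_weight=0.5, input_scale=0.1):
    """Reservoir whose units form a single ring; no random numbers are used."""
    w = np.zeros((n_units, n_units))
    for i in range(n_units):
        w[i, (i - 1) % n_units] = cycle_weight  # unit i listens to unit i-1
    w_in = input_scale * thue_morse_signs(n_units)
    return w, w_in


def run_reservoir(w, w_in, inputs):
    """Drive the reservoir with a scalar input sequence and collect the states."""
    state = np.zeros(w.shape[0])
    states = np.empty((len(inputs), w.shape[0]))
    for t, u in enumerate(inputs):
        state = np.tanh(w @ state + w_in * u)
        states[t] = state
    return states


def fit_readout(states, targets, ridge=1e-6):
    """Ridge-regression readout with a bias column."""
    x = np.hstack([states, np.ones((len(states), 1))])
    return np.linalg.solve(x.T @ x + ridge * np.eye(x.shape[1]), x.T @ targets)


def predict(states, w_out):
    """Apply a fitted readout to reservoir states."""
    x = np.hstack([states, np.ones((len(states), 1))])
    return x @ w_out


# Toy task: one-step-ahead prediction of a sine wave.
signal = np.sin(0.2 * np.arange(2000))
w, w_in = build_cycle_reservoir(n_units=50)
states = run_reservoir(w, w_in, signal[:-1])
targets = signal[1:]
washout, split = 100, 1500
w_out = fit_readout(states[washout:split], targets[washout:split])
test_pred = predict(states[split:], w_out)
nrmse = np.sqrt(np.mean((test_pred - targets[split:]) ** 2)) / np.std(targets[split:])
```

Because nothing in the construction is random, rebuilding the reservoir yields identical weights, so there is no run-to-run variation from initialization.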



Article 488

Title@2025-07-08 (2): Neural-Network solver of ideal MHD equilibria

Title: Neural-Network solver of ideal MHD equilibria Neural-Netzwerk-Löser der idealen MHD-Gleichgewichte 理想MHD平衡的神经网解决方案 2507.03119v2

Authors (5): Timo Thun, Andrea Merlo, Rory Conlin, Dario Panici, Daniel Böckenhoff

We present a novel approach to compute three-dimensional magnetohydrodynamic (MHD) equilibria by parametrizing Fourier modes with artificial neural networks and compare it to equilibria computed by conventional solvers. The full nonlinear global force residual across the volume in real space is then minimized with first-order optimizers. Already, we observe competitive computational cost to arrive at the same minimum residuals computed by existing codes. With increased computational cost, lower minima of the residual are achieved by the neural networks, establishing a new lower bound for the force residual. We use minimally complex neural networks, and we expect significant improvements for solving not only single equilibria with neural networks, but also for computing neural network models valid over continuous distributions of equilibria.



Article 489

Title@2025-07-08 (2): EdgeCodec: Onboard Lightweight High Fidelity Neural Compressor with Residual Vector Quantization

Title: EdgeCodec: Onboard Lightweight High Fidelity Neural Compressor with Residual Vector Quantization EdgeCodec: Onboard-Leichtgewicht Hoher Fidelity-Neuralkompressor mit residualer Vector-Quantisierung 载体内轻重量高菲力坚硬性神经压缩器,残余矢量量化 2507.06040v1

Authors (5): Benjamin Hodo, Tommaso Polonelli, Amirhossein Moallemi, Luca Benini, Michele Magno

We present EdgeCodec, an end-to-end neural compressor for barometric data collected from wind turbine blades. EdgeCodec leverages a heavily asymmetric autoencoder architecture, trained with a discriminator and enhanced by a Residual Vector Quantizer to maximize compression efficiency. It achieves compression rates between 2’560:1 and 10’240:1 while maintaining a reconstruction error below 3%, and operates in real time on the GAP9 microcontroller with bitrates ranging from 11.25 to 45 bits per second. Bitrates can be selected on a sample-by-sample basis, enabling on-the-fly adaptation to varying network conditions. In its highest compression mode, EdgeCodec reduces the energy consumption of wireless data transmission by up to 2.9x, significantly extending the operational lifetime of deployed sensor units.
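For readers unfamiliar with residual vector quantization, here is a generic sketch of the technique: each stage quantizes what the previous stages left over, and decoding with fewer stages gives a coarser reconstruction at a lower bitrate. The codebooks below are random placeholders rather than learned ones, and whether EdgeCodec selects its bitrate by truncating stages in this way is an assumption.

```python
import numpy as np


def rvq_encode(vector, codebooks):
    """Quantize `vector` stage by stage; each stage encodes the previous residual."""
    residual = np.asarray(vector, dtype=float).copy()
    indices = []
    for codebook in codebooks:
        distances = np.sum((codebook - residual) ** 2, axis=1)
        best = int(np.argmin(distances))
        indices.append(best)
        residual = residual - codebook[best]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codeword from each stage."""
    return sum(codebook[i] for i, codebook in zip(indices, codebooks))


# Hypothetical codebooks: 3 stages of 16 entries (4 bits each) in 4 dimensions.
# Each codebook contains the zero vector, so adding a stage never increases
# the reconstruction error.
rng = np.random.default_rng(0)
codebooks = []
for stage in range(3):
    entries = rng.normal(scale=1.0 / 2**stage, size=(15, 4))
    codebooks.append(np.vstack([np.zeros(4), entries]))

vector = rng.normal(size=4)
indices = rvq_encode(vector, codebooks)

# Reconstruction error when only the first k stages are transmitted.
errors = [float(np.linalg.norm(vector - rvq_decode(indices[:k], codebooks)))
          for k in (1, 2, 3)]
bits_per_vector = [4 * k for k in (1, 2, 3)]
```

Sending only the first stage index costs 4 bits per vector here, while sending all three costs 12 bits and yields an error that is no larger.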



Article 490

Title@2025-07-08 (2): TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision

Title: TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision TextPixs: Glyphenkonditionierte Diffusion mit zeichenbewusster Aufmerksamkeit und OCR-geführter Überwachung 文本像素: 带有字符提醒注意和 OCR 引导监督的 淋巴- 有条件扩散 2507.06033v1

Authors (6): Syeda Anshrah Gillani, Mirza Samad Ahmed Baig, Osama Ahmed Khan, Shahid Munir Shah, Umema Mujeeb, Maheen Ali

The boom in modern text-to-image diffusion models has opened a new era in digital content production, demonstrating a previously unseen ability to produce photorealistic and stylistically diverse imagery from natural-language descriptions. However, a persistent weakness of these models is that they cannot render readable, meaningful, and correctly spelled text in generated images, which significantly limits their use for practical purposes such as advertising, learning, and creative design. This paper introduces a new framework, Glyph-Conditioned Diffusion with Character-Aware Attention (GCDA), which extends a typical diffusion backbone with three modules. First, a dual-stream text encoder encodes both semantic contextual information and explicit glyph representations, yielding a rich, character-aware representation of the input text. Second, a character-aware attention mechanism is proposed together with a new attention segregation loss that constrains the attention distribution of each character independently in order to avoid distortion artifacts. Lastly, GCDA has an OCR-in-the-loop fine-tuning phase, in which a full-text perceptual loss directly optimises the model for legible and accurately spelled text. Large-scale experiments on benchmark datasets such as MARIO-10M and T2I-CompBench show that GCDA sets a new state of the art on all metrics, with better character-based text-rendering metrics (Character Error Rate: 0.08 vs 0.21 for the previous best; Word Error Rate: 0.15 vs 0.25), better human-perception results, and comparable high-fidelity image synthesis quality (FID: 14.3).
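The Character Error Rate and Word Error Rate quoted above are conventionally defined as an edit distance divided by the reference length. A minimal sketch of that standard definition follows; the exact normalization and OCR pipeline used in the paper are not stated in the abstract, and the example strings are made up.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical OCR readings of text rendered in a generated image.
cer = character_error_rate("SALE", "SAIE")                          # one wrong character out of four
wer = word_error_rate("grand opening sale", "grand opening sake")   # one wrong word out of three
```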



Article 491

Title@2025-07-08 (2): Efficient Federated Learning with Timely Update Dissemination

Title: Efficient Federated Learning with Timely Update Dissemination Effizientes Federated Learning mit rechtzeitiger Aktualisierung der Verbreitung 及时更新传播的高效联邦学习和及时更新更新的传播 2507.06031v1

Authors (7): Juncheng Jia, Ji Liu, Chao Huo, Yihui Shen, Yang Zhou, Huaiyu Dai, Dejing Dou

Federated Learning (FL) has emerged as a compelling methodology for the management of distributed data, marked by significant advancements in recent years. In this paper, we propose an efficient FL approach that capitalizes on additional downlink bandwidth resources to ensure timely update dissemination. Initially, we implement this strategy within an asynchronous framework, introducing the Asynchronous Staleness-aware Model Update (FedASMU), which integrates both server-side and device-side methodologies. On the server side, we present an asynchronous FL system model that employs a dynamic model aggregation technique, which harmonizes local model updates with the global model to enhance both accuracy and efficiency. Concurrently, on the device side, we propose an adaptive model adjustment mechanism that integrates the latest global model with local models during training to further elevate accuracy. Subsequently, we extend this approach to a synchronous context, referred to as FedSSMU. Theoretical analyses substantiate the convergence of our proposed methodologies. Extensive experiments, encompassing six models and five public datasets, demonstrate that FedASMU and FedSSMU significantly surpass baseline methods in terms of both accuracy (up to 145.87%) and efficiency (up to 97.59%).



Article 492

Title@2025-07-08 (2): Multi-view mid fusion: a universal approach for learning in an HDLSS setting

Title: Multi-view mid fusion: a universal approach for learning in an HDLSS setting Multi-View Mid Fusion: ein universeller Ansatz für das Lernen in einem HDLSS-Setting 多视角多视角中融合:在多语言和多语言系统环境下学习的普遍方法 2507.06026v1

Authors (1): Lynn Houthuys

The high-dimensional low-sample-size (HDLSS) setting presents significant challenges in various applications where the feature dimension far exceeds the number of available samples. This paper introduces a universal approach for learning in the HDLSS setting using multi-view mid fusion techniques. It shows how existing mid fusion multi-view methods perform well in an HDLSS setting even if no inherent views are provided. Three view construction methods are proposed that split the high-dimensional feature vectors into smaller subsets, each representing a different view. Extensive experimental validation across model types and learning tasks confirms the effectiveness and generalization of the approach. We believe the work in this paper lays the foundation for further research into the universal benefits of multi-view mid fusion learning.
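The idea of constructing views by splitting a long feature vector can be sketched as follows. The abstract does not name its three construction methods, so the two splitting rules shown here (consecutive blocks and a random partition) are illustrative assumptions, and the mid fusion model that would consume the views is not shown.

```python
import numpy as np


def contiguous_views(x, n_views):
    """Split the columns of x into n_views consecutive blocks."""
    return np.array_split(x, n_views, axis=1)


def random_views(x, n_views, seed=0):
    """Shuffle the feature indices once, then split them into disjoint views."""
    order = np.random.default_rng(seed).permutation(x.shape[1])
    return [x[:, idx] for idx in np.array_split(order, n_views)]


# HDLSS toy data: 20 samples, 1000 features.
x = np.random.default_rng(1).normal(size=(20, 1000))
views_a = contiguous_views(x, n_views=4)
views_b = random_views(x, n_views=4)
```

Each view keeps all 20 samples but only 250 features, so every per-view model works in a much lower dimension than the original problem.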



Article 493

Title@2025-07-08 (2): Kamae: Bridging Spark and Keras for Seamless ML Preprocessing

Title: Kamae: Bridging Spark and Keras for Seamless ML Preprocessing Kamae: Bridging Spark und Keras für Nahtlose ML Vorverarbeitung Kamae: 连接Spark和Keras, 用于无缝ML 2507.06021v1

Authors (4): George Barrowclough, Marian Andrecki, James Shinner, Daniele Donghi

In production recommender systems, feature preprocessing must be faithfully replicated across training and inference environments. This often requires duplicating logic between offline and online environments, increasing engineering effort and introducing risks of dataset shift. We present Kamae, an open-source Python library that bridges this gap by translating PySpark preprocessing pipelines into equivalent Keras models. Kamae provides a suite of configurable Spark transformers and estimators, each mapped to a corresponding Keras layer, enabling consistent, end-to-end preprocessing across the ML lifecycle. The framework's utility is illustrated on real-world use cases, including the MovieLens dataset and Expedia's Learning-to-Rank pipelines. The code is available at https://github.com/ExpediaGroup/kamae.



Article 494

Title@2025-07-08 (2): Instance-Optimal Quantum State Certification with Entangled Measurements

Title: Instance-Optimal Quantum State Certification with Entangled Measurements Instance-Optimal Quantum State Zertifizierung mit verschränkten Messungen 具有相互缠绕的测量量的国家认证 2507.06010v1

Authors (2): Ryan O’Donnell, Chirag Wadhwa

We consider the task of quantum state certification: given a description of a hypothesis state $\sigma$ and multiple copies of an unknown state $\rho$, a tester aims to determine whether the two states are equal or $\epsilon$-far in trace distance. It is known that $\Theta(d/\epsilon^2)$ copies of $\rho$ are necessary and sufficient for this task, assuming the tester can make entangled measurements over all copies [CHW07,OW15,BOW19]. However, these bounds are for a worst-case $\sigma$, and it is not known what the optimal copy complexity is for this problem on an instance-by-instance basis. While such instance-optimal bounds have previously been shown for quantum state certification when the tester is limited to measurements unentangled across copies [CLO22,CLHL22], they remained open when testers are unrestricted in the kind of measurements they can perform. We address this open question by proving nearly instance-optimal bounds for quantum state certification when the tester can perform fully entangled measurements. Analogously to the unentangled setting, we show that the optimal copy complexity for certifying $\sigma$ is given by the worst-case complexity times the fidelity between $\sigma$ and the maximally mixed state. We prove our lower bounds using a novel quantum analogue of the Ingster-Suslina method, which is likely to be of independent interest. This method also allows us to recover the $\Omega(d/\epsilon^2)$ lower bound for mixedness testing [OW15], i.e., certification of the maximally mixed state, with a surprisingly simple proof.
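To see how the stated scaling behaves, the sketch below evaluates the worst-case rate d/ε² multiplied by the fidelity between σ and the maximally mixed state. It assumes the squared-fidelity convention F(ρ, σ) = (tr √(√ρ σ √ρ))², which reduces to (tr √σ)²/d here, and it drops all constants and any logarithmic factors hidden in "nearly instance-optimal". The paper may use a different convention.

```python
import numpy as np


def fidelity_with_maximally_mixed(sigma):
    """F(sigma, I/d) = (tr sqrt(sigma))^2 / d under the squared-fidelity convention."""
    eigenvalues = np.clip(np.linalg.eigvalsh(sigma), 0.0, None)
    d = sigma.shape[0]
    return float(np.sum(np.sqrt(eigenvalues)) ** 2 / d)


def copy_complexity_scale(sigma, epsilon):
    """Order-of-magnitude copy count (d / epsilon^2) * F(sigma, I/d), constants dropped."""
    d = sigma.shape[0]
    return d / epsilon**2 * fidelity_with_maximally_mixed(sigma)


d = 4
pure_state = np.diag([1.0, 0.0, 0.0, 0.0])   # rank-one hypothesis state
maximally_mixed = np.eye(d) / d

scale_pure = copy_complexity_scale(pure_state, epsilon=0.1)
scale_mixed = copy_complexity_scale(maximally_mixed, epsilon=0.1)
```

Under these assumptions a pure hypothesis state scales like 1/ε² regardless of the dimension, while the maximally mixed state recovers the full d/ε² worst case.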



Article 495

Title@2025-07-08 (2): KnowIt: Deep Time Series Modeling and Interpretation

Title: KnowIt: Deep Time Series Modeling and Interpretation KnowIt: Deep Time Series Modellierung und Interpretation 深时间系列建模和解释 2507.06009v1

Authors (3): M. W. Theunissen, R. Rabe, M. H. Davel

KnowIt (Knowledge discovery in time series data) is a flexible framework for building deep time series models and interpreting them. It is implemented as a Python toolkit, with source code and documentation available from https://must-deep-learning.github.io/KnowIt. It imposes minimal assumptions about task specifications and decouples the definition of dataset, deep neural network architecture, and interpretability technique through well-defined interfaces. This ensures the ease of importing new datasets, custom architectures, and the definition of different interpretability paradigms while maintaining on-the-fly modeling and interpretation of different aspects of a user's own time series data. KnowIt aims to provide an environment where users can perform knowledge discovery on their own complex time series data through building powerful deep learning models and explaining their behavior. With ongoing development, collaboration, and application, our goal is to make this a platform that advances this underexplored field and a trusted tool for deep time series modeling.



Article 496

Title@2025-07-08 (2): Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics

Title: Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics Skalierbare diskrete Diffusionsproben: Kombinatorische Optimierung und Statistische Physik 可缩放分分解扩散采样器:组合优化和统计物理 2502.08696v3

Authors (6): Sebastian Sanokowski, Wilhelm Berghammer, Martin Ennemoser, Haoyu Peter Wang, Sepp Hochreiter, Sebastian Lehner

Learning to sample from complex unnormalized distributions over discrete domains has emerged as a promising research direction with applications in statistical physics, variational inference, and combinatorial optimization. Recent work has demonstrated the potential of diffusion models in this domain. However, existing methods face limitations in memory scaling, and thus in the number of attainable diffusion steps, since they require backpropagation through the entire generative process. To overcome these limitations, we introduce two novel training methods for discrete diffusion samplers, one grounded in the policy gradient theorem and the other leveraging Self-Normalized Neural Importance Sampling (SN-NIS). These methods yield memory-efficient training and achieve state-of-the-art results in unsupervised combinatorial optimization. Numerous scientific applications additionally require unbiased sampling. We introduce adaptations of SN-NIS and Neural Markov Chain Monte Carlo that enable, for the first time, the application of discrete diffusion models to this problem. We validate our methods on Ising model benchmarks and find that they outperform popular autoregressive approaches. Our work opens new avenues for applying diffusion models to a wide range of scientific applications in discrete domains that were hitherto restricted to exact likelihood models.



Article 497

Title@2025-07-08 (2): An Optimal Transport Perspective on Unpaired Image Super-Resolution

Title: An Optimal Transport Perspective on Unpaired Image Super-Resolution Optimale Transportperspektive auf ungepaarte Bild-Super-Resolution 对未调图像超分辨率的优化传输视角 2202.01116v3

Authors (7): Milena Gazdieva, Petr Mokrov, Litu Rout, Alexander Korotin, Andrey Kravchenko, Alexander Filippov, Evgeny Burnaev

Real-world image super-resolution (SR) tasks often do not have paired datasets, which limits the application of supervised techniques. As a result, the tasks are usually approached by unpaired techniques based on Generative Adversarial Networks (GANs), which yield complex training losses with several regularization terms, e.g., content or identity losses. While GANs usually provide good practical performance, they are used heuristically, i.e., theoretical understanding of their behaviour is still rather limited. We theoretically investigate optimization problems which arise in such models and make two surprising observations. First, the learned SR map is always an optimal transport (OT) map. Second, we theoretically prove and empirically show that the learned map is biased, i.e., it does not actually transform the distribution of low-resolution images to high-resolution ones. Inspired by these findings, we investigate recent advances in the neural OT field to resolve the bias issue. We establish an intriguing connection between regularized GANs and neural OT approaches. We show that, unlike the existing GAN-based alternatives, these algorithms aim to learn an unbiased OT map. We empirically demonstrate our findings via a series of synthetic and real-world unpaired SR experiments. Our source code is publicly available at https://github.com/milenagazdieva/OT-Super-Resolution.



Article 498

Title@2025-07-08 (2): DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting

Title: DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting DRAN: Ein Vertriebs- und Beziehungsadaptives Netzwerk für die räumlich-zeitliche Vorhersage DRAN: 空间时预报分布和关系适应网络 2504.01531v2

Authors (5): Xiaobei Zou, Luolin Xiong, Kexuan Zhang, Cesare Alippi, Yang Tang

Accurate predictions of spatio-temporal systems are crucial for tasks such as system management, control, and crisis prevention. However, the inherent time variance of many spatio-temporal systems poses challenges to achieving accurate predictions whenever stationarity is not granted. In order to address non-stationarity, we propose a Distribution and Relation Adaptive Network (DRAN) capable of dynamically adapting to relation and distribution changes over time. While temporal normalization and de-normalization are frequently used techniques to adapt to distribution shifts, this operation is not suitable for the spatio-temporal context as temporal normalization scales the time series of nodes and possibly disrupts the spatial relations among nodes. In order to address this problem, a Spatial Factor Learner (SFL) module is developed that enables the normalization and de-normalization process. To adapt to dynamic changes in spatial relationships among sensors, we propose a Dynamic-Static Fusion Learner (DSFL) module that effectively integrates features learned from both dynamic and static relations through an adaptive fusion ratio mechanism. Furthermore, we introduce a Stochastic Learner to capture the noisy components of spatio-temporal representations. Our approach outperforms state-of-the-art methods on weather prediction and traffic flow forecasting tasks. Experimental results show that our SFL efficiently preserves spatial relationships across various temporal normalization operations. Visualizations of the learned dynamic and static relations demonstrate that DSFL can capture both local and distant relationships between nodes.



Article 499

Title@2025-07-08 (2): Beating the Best Constant Rebalancing Portfolio in Long-Term Investment: A Generalization of the Kelly Criterion and Universal Learning Algorithm for Markets with Serial Dependence

Title: Beating the Best Constant Rebalancing Portfolio in Long-Term Investment: A Generalization of the Kelly Criterion and Universal Learning Algorithm for Markets with Serial Dependence Das beste konstante Rebalancing-Portfolio bei langfristigen Investitionen schlagen: Eine Generalisierung des Kelly-Kriteriums und des universellen Lernalgorithmus für Märkte mit serieller Abhängigkeit 击败长期投资中最佳常态再平衡组合:串级依赖市场凯利标准标准和普遍学习标准普遍化 2507.05994v1

Authors (1): Duy Khanh Lam

In the online portfolio optimization framework, existing learning algorithms generate strategies that yield significantly poorer cumulative wealth compared to the best constant rebalancing portfolio in hindsight, despite being consistent in asymptotic growth rate. While this unappealing performance can be improved by incorporating more side information, it raises difficulties in feature selection and high-dimensional settings. Instead, the inherent serial dependence of assets’ returns, such as day-of-the-week and other calendar effects, can be leveraged. Although latent serial dependence patterns are commonly detected using large training datasets, this paper proposes an algorithm that learns such dependence using only gradually revealed data, without any assumption on their distribution, to form a strategy that eventually exceeds the cumulative wealth of the best constant rebalancing portfolio. Moreover, the classical Kelly criterion, which requires independent assets’ returns, is generalized to accommodate serial dependence in a market modeled as an independent and identically distributed process of random matrices. In such a stochastic market, where existing learning algorithms designed for stationary processes fail to apply, the proposed learning algorithm still generates a strategy that asymptotically grows to the highest rate among all strategies, matching that of the optimal strategy constructed under the generalized Kelly criterion. The experimental results with real market data demonstrate the theoretical guarantees of the algorithm and its performance as expected, as long as serial dependence is significant, regardless of the validity of the generalized Kelly criterion in the experimental market. This further affirms the broad applicability of the algorithm in general contexts.
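The benchmark that the abstract refers to, the best constant rebalancing portfolio (CRP) in hindsight, can be illustrated with a small sketch. The two-asset toy market and the grid search are illustrative assumptions; the paper's learning algorithm is not reproduced here.

```python
import numpy as np


def crp_wealth(portfolio, price_relatives):
    """Final wealth of a constant rebalancing portfolio starting from 1 unit.

    price_relatives[t, i] is asset i's price at the end of period t divided
    by its price at the start; the portfolio is rebalanced every period.
    """
    return float(np.prod(price_relatives @ np.asarray(portfolio, dtype=float)))


def best_crp_two_assets(price_relatives, grid_size=101):
    """Grid search for the best constant mix of two assets in hindsight."""
    weights = np.linspace(0.0, 1.0, grid_size)
    wealth = [crp_wealth([w, 1.0 - w], price_relatives) for w in weights]
    best = int(np.argmax(wealth))
    return float(weights[best]), wealth[best]


# Toy market over 20 periods: cash stays flat, the other asset alternately
# doubles and halves, so buying and holding either asset ends at wealth 1.
price_relatives = np.array([[1.0, 2.0], [1.0, 0.5]] * 10)

hold_cash = crp_wealth([1.0, 0.0], price_relatives)
hold_asset = crp_wealth([0.0, 1.0], price_relatives)
best_weight, best_wealth = best_crp_two_assets(price_relatives)
```

Rebalancing to an even split gains a factor of 1.5 × 0.75 = 1.125 every two periods, so the best CRP ends at 1.125^10 ≈ 3.25 while both buy-and-hold strategies end where they started.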



Article 500

Title@2025-07-08 (2): Counterfactual Inference under Thompson Sampling

Title: Counterfactual Inference under Thompson Sampling Gegenfaktische Schlussfolgerung unter Thompson Sampling Thompson抽样下的反事实推断 2504.08773v2

Authors (1): Olivier Jeunen

Recommender systems exemplify sequential decision-making under uncertainty, strategically deciding what content to serve to users, to optimise a range of potential objectives. To balance the explore-exploit trade-off successfully, Thompson sampling provides a natural and widespread paradigm to probabilistically select which action to take. Questions of causal and counterfactual inference, which underpin use-cases like offline evaluation, are not straightforward to answer in these contexts. Specifically, whilst most existing estimators rely on action propensities, these are not readily available under Thompson sampling procedures. We derive exact and efficiently computable expressions for action propensities under a variety of parameter and outcome distributions, enabling the use of off-policy estimators in Thompson sampling scenarios. This opens up a range of practical use-cases where counterfactual inference is crucial, including unbiased offline evaluation of recommender systems, as well as general applications of causal inference in online advertising, personalisation, and beyond.
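To illustrate what an action propensity under Thompson sampling is, the sketch below handles the simplest case: two arms with independent Gaussian posteriors, where the probability of selecting an arm has a textbook closed form. A Monte Carlo estimate is included for any number of arms. The paper's own expressions for other parameter and outcome distributions are not reproduced, and the function names are hypothetical.

```python
import math

import numpy as np


def gaussian_two_arm_propensity(mean_a, std_a, mean_b, std_b):
    """Exact P(arm a is chosen) when both posteriors are independent Gaussians.

    Thompson sampling picks arm a when its posterior draw exceeds arm b's,
    and the difference of two independent Gaussians is again Gaussian.
    """
    z = (mean_a - mean_b) / math.sqrt(std_a**2 + std_b**2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def monte_carlo_propensities(means, stds, n_draws=100_000, seed=0):
    """Estimate each arm's selection probability by simulating posterior draws."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(means, stds, size=(n_draws, len(means)))
    winners = np.argmax(draws, axis=1)
    return np.bincount(winners, minlength=len(means)) / n_draws


exact = gaussian_two_arm_propensity(1.0, 1.0, 0.0, 1.0)
simulated = monte_carlo_propensities([1.0, 0.0], [1.0, 1.0])
```

Propensities of this kind are the weights that off-policy estimators such as inverse propensity scoring divide by, which is why having them in exact form matters for offline evaluation.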



Article 501

Title@2025-07-08 (2): A Survey of Multi Agent Reinforcement Learning: Federated Learning and Cooperative and Noncooperative Decentralized Regimes

Title: A Survey of Multi Agent Reinforcement Learning: Federated Learning and Cooperative and Noncooperative Decentralized Regimes Eine Umfrage über das Lernen von Multi-Agenten-Verstärkung: Föderiertes Lernen und kooperative und nichtkooperative dezentralisierte Regimes 多代理强化学习调查:联邦学习与合作及不合作的权力下放制度 2507.06278v1

Authors (6): Kemboi Cheruiyot, Nickson Kiprotich, Vyacheslav Kungurtsev, Kennedy Mugo, Vivian Mwirigi, Marvin Ngesa

The increasing interest in research and innovation towards the development of autonomous agents presents a number of complex yet important scenarios in which multiple AI agents interact with each other in an environment. This setting can be understood as exhibiting three possible topologies of interaction: centrally coordinated cooperation, ad-hoc interaction and cooperation, and settings with noncooperative incentive structures. This article presents a comprehensive survey of all three domains, defined under the formalisms of Federated Reinforcement Learning (RL), Decentralized RL, and Noncooperative RL, respectively. Highlighting the structural similarities and distinctions, we review the state of the art in these subjects, which have been explored and developed only recently in the literature. We include the formulations as well as known theoretical guarantees, and the highlights and limitations of numerical performance.



Article 502

Title@2025-07-08 (2): Robust Speech-Workload Estimation for Intelligent Human-Robot Systems

Title: Robust Speech-Workload Estimation for Intelligent Human-Robot Systems Robuste Sprach-Workload-Schätzung für intelligente Mensch-Roboter-Systeme 智能人类机器人系统的强力语音-工作负荷估计 2507.05985v1

Authors (3): Julian Fortune, Julie A. Adams, Jamison Heard

Demanding task environments (e.g., supervising a remotely piloted aircraft) require performing tasks quickly and accurately; however, periods of low and high operator workload can decrease task performance. Intelligent modulation of the system’s demands and interaction modality in response to changes in operator workload state may increase performance by avoiding undesirable workload states. This system requires real-time estimation of each workload component (i.e., cognitive, physical, visual, speech, and auditory) to adapt the correct modality. Existing workload systems estimate multiple workload components post-hoc, but few estimate speech workload, or function in real-time. An algorithm to estimate speech workload and mitigate undesirable workload states in real-time is presented. An analysis of the algorithm’s accuracy is presented, along with the results demonstrating the algorithm’s generalizability across individuals and human-machine teaming paradigms. Real-time speech workload estimation is a crucial element towards developing adaptive human-machine systems.



Article 503

Title@2025-07-08 (2): RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages

Title: RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages RabakBench: Skalierung menschlicher Anmerkungen zum Aufbau lokalisierter Mehrsprachiger Sicherheits-Benchmarks für ressourcenarme Sprachen Rabak Rabak Bench:提高人文说明,以建立本地化的低资源语言多语文安全基准 2507.05980v1

Authors (4): Gabriel Chua, Leanne Tan, Ziyu Ge, Roy Ka-Wei Lee

Large language models (LLMs) and their safety classifiers often perform poorly on low-resource languages due to limited training data and evaluation benchmarks. This paper introduces RabakBench, a new multilingual safety benchmark localized to Singapore’s unique linguistic context, covering Singlish, Chinese, Malay, and Tamil. RabakBench is constructed through a scalable three-stage pipeline: (i) Generate - adversarial example generation by augmenting real Singlish web content with LLM-driven red teaming; (ii) Label - semi-automated multi-label safety annotation using majority-voted LLM labelers aligned with human judgments; and (iii) Translate - high-fidelity translation preserving linguistic nuance and toxicity across languages. The final dataset comprises over 5,000 safety-labeled examples across four languages and six fine-grained safety categories with severity levels. Evaluations of 11 popular open-source and closed-source guardrail classifiers reveal significant performance degradation. RabakBench not only enables robust safety evaluation in Southeast Asian multilingual settings but also offers a reproducible framework for building localized safety datasets in low-resource environments. The benchmark dataset, including the human-verified translations, and evaluation code are publicly available.



Article 504

Title@2025-07-08 (2): CoDy: Counterfactual Explainers for Dynamic Graphs

Title: CoDy: Counterfactual Explainers for Dynamic Graphs CoDy: Counterfactual Explainers für dynamische Graphen CoDy: 动态图表的反事实解析器 2403.16846v2

Authors (3): Zhan Qu, Daniel Gomm, Michael Färber

Temporal Graph Neural Networks (TGNNs) are widely used to model dynamic systems where relationships and features evolve over time. Although TGNNs demonstrate strong predictive capabilities in these domains, their complex architectures pose significant challenges for explainability. Counterfactual explanation methods provide a promising solution by illustrating how modifications to input graphs can influence model predictions. To address this challenge, we present CoDy, Counterfactual Explainer for Dynamic Graphs, a model-agnostic, instance-level explanation approach that identifies counterfactual subgraphs to interpret TGNN predictions. CoDy employs a search algorithm that combines Monte Carlo Tree Search with heuristic selection policies, efficiently exploring a vast search space of potential explanatory subgraphs by leveraging spatial, temporal, and local event impact information. Extensive experiments against state-of-the-art factual and counterfactual baselines demonstrate CoDy’s effectiveness, with improvements of 16% in AUFSC+ over the strongest baseline.



Article 505

Title@2025-07-08 (2): VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play

Title: VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play VolleyballBots: Ein Testbett für Multi-Drone Volleyball Spiel Kombination von Motion Control und strategischem Spiel VolleyBots:多轨排球游戏组合运动控制和战略游戏的测试台 2502.01932v4

Authors (12): Zelai Xu, Ruize Zhang, Chao Yu, Huining Yuan, Xiangmin Yi, Shilong Ji, Chuqi Wang, Wenhao Tang, Feng Gao, Wenbo Ding, Xinlei Chen, Yu Wang

Robot sports, characterized by well-defined objectives, explicit rules, and dynamic interactions, present ideal scenarios for demonstrating embodied intelligence. In this paper, we present VolleyBots, a novel robot sports testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots integrates three features within a unified platform: competitive and cooperative gameplay, turn-based interaction structure, and agile 3D maneuvering. Competitive and cooperative gameplay challenges each drone to coordinate with its teammates while anticipating and countering opposing teams' tactics. Turn-based interaction demands precise timing, accurate state prediction, and management of long-horizon temporal dependencies. Agile 3D maneuvering requires rapid accelerations, sharp turns, and precise 3D positioning despite the quadrotor's underactuated dynamics. These intertwined features yield a complex problem combining motion control and strategic play, with no available expert demonstrations. We provide a comprehensive suite of tasks ranging from single-drone drills to multi-drone cooperative and competitive tasks, accompanied by baseline evaluations of representative multi-agent reinforcement learning (MARL) and game-theoretic algorithms. Simulation results show that on-policy reinforcement learning (RL) methods outperform off-policy methods in single-agent tasks, but both approaches struggle in complex tasks that combine motion control and strategic play. We additionally design a hierarchical policy which achieves a 69.5% win rate against the strongest baseline in the 3 vs 3 task, underscoring its potential as an effective solution for tackling the complex interplay between low-level control and high-level strategy. The project page is at https://sites.google.com/view/thu-volleybots.



Article 506

Title@2025-07-08 (2): Generalized and Unified Equivalences between Hardness and Pseudoentropy

Title: Generalized and Unified Equivalences between Hardness and Pseudoentropy Generalisierte und einheitliche Äquivalenz zwischen Härte und Pseudoentropie 硬度和优度机能之间的普遍和统一等同 2507.05972v1

Authors (2): Lunjia Hu, Salil Vadhan

Pseudoentropy characterizations provide a quantitatively precise demonstration of the close relationship between computational hardness and computational randomness. We prove a unified pseudoentropy characterization that generalizes and strengthens previous results for both uniform and non-uniform models of computation. Our characterization holds for a general family of entropy notions that encompasses the common notions of Shannon entropy and min entropy as special cases. Moreover, we show that the characterizations for different entropy notions can be simultaneously achieved by a single, universal function that simultaneously witnesses computational hardness and computational randomness. A key technical insight of our work is that the notion of weight-restricted calibration from the recent literature on algorithm fairness, along with standard computational indistinguishability (known as multiaccuracy in the fairness literature), suffices for proving pseudoentropy characterizations for general entropy notions. This demonstrates the power of weight-restricted calibration to enhance the classic Complexity-Theoretic Regularity Lemma (Trevisan, Tulsiani, and Vadhan, 2009) and Leakage Simulation Lemma (Jetchev and Pietrzak, 2014) and allows us to achieve an exponential improvement in the complexity dependency on the alphabet size compared to the pseudoentropy characterizations by Casacuberta, Dwork, and Vadhan (2024) based on the much stronger notion of multicalibration. We show that the exponential dependency on the alphabet size is inevitable for multicalibration as well as for the weaker notion of calibrated multiaccuracy.



Article 507

Title@2025-07-08 (2): Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model

Title: Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model Analytisches Subraum-Routing: Wie rekursive Least Squares im kontinuierlichen Lernen des großen Sprachmodells funktionieren 分析子空间运行分析:在继续不断学习大语言模式中如何使最起码的广场恢复 2503.13575v2

Authors (8): Kai Tong, Kang Pan, Xiao Zhang, Erli Meng, Run He, Yawen Cui, Nuoyan Guo, Huiping Zhuang

Large Language Models (LLMs) possess broad capabilities that let them handle diverse language-related tasks. However, finetuning an LLM diminishes these general skills, and continual finetuning further causes severe degradation of accumulated knowledge. Recently, Continual Learning (CL) for LLMs has emerged, aiming to continually adapt LLMs to new tasks while maintaining previously learned knowledge and inheriting general skills. Existing techniques either replay previous data, incurring extra computational cost, or use a single parameter-efficient module to learn the downstream tasks, which limits the absorption of new knowledge because of interference between tasks. To address these issues, this paper proposes Analytic Subspace Routing (ASR). For each task, we isolate the learning within a subspace of deep layers’ features via low-rank adaptation, eliminating knowledge interference between different tasks. Additionally, we propose an analytic routing mechanism to properly utilize knowledge learned in different subspaces. Our approach employs Recursive Least Squares to train a multi-task router model, allowing the router to dynamically adapt to incoming data without requiring access to historical data. The router also effectively assigns the current task to an appropriate subspace and provably does not forget previously learned tasks. Experimental results demonstrate that our method achieves near-perfect retention of prior knowledge while seamlessly integrating new information, effectively overcoming the core limitations of existing methods. Our code will be released after acceptance.
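
The recursive least squares idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `rls_init` and `rls_update`, the ridge parameter `gamma`, and the use of one-hot task targets are all assumptions made for illustration. The sketch shows the textbook RLS recursion, in which a linear router is updated one sample at a time and still matches the batch ridge-regression solution, so no historical data has to be stored.

```python
def rls_init(dim, n_tasks, gamma=1.0):
    """Start with P = (gamma * I)^-1 and an all-zero weight matrix W (dim x n_tasks)."""
    P = [[(1.0 / gamma if i == j else 0.0) for j in range(dim)] for i in range(dim)]
    W = [[0.0] * n_tasks for _ in range(dim)]
    return P, W


def rls_update(P, W, x, y):
    """One RLS step for a feature vector x and a target vector y (e.g. one-hot task id).

    P tracks the inverse of (gamma * I + sum of x x^T) and is updated with the
    Sherman-Morrison identity, so no past samples are revisited.
    """
    dim, n_out = len(x), len(y)
    Px = [sum(P[i][j] * x[j] for j in range(dim)) for i in range(dim)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(dim))
    # P_new = P - (P x)(P x)^T / (1 + x^T P x); valid because P is symmetric.
    P_new = [[P[i][j] - Px[i] * Px[j] / denom for j in range(dim)] for i in range(dim)]
    # Prediction error of the current weights on the new sample.
    err = [y[k] - sum(x[i] * W[i][k] for i in range(dim)) for k in range(n_out)]
    # Gain vector P_new x simplifies to P x / (1 + x^T P x).
    gain = [Px[i] / denom for i in range(dim)]
    W_new = [[W[i][k] + gain[i] * err[k] for k in range(n_out)] for i in range(dim)]
    return P_new, W_new


def route(W, x):
    """Return the index of the highest-scoring output (the chosen subspace)."""
    scores = [sum(x[i] * W[i][k] for i in range(len(x))) for k in range(len(W[0]))]
    return max(range(len(scores)), key=scores.__getitem__)
```

Feeding samples one by one through `rls_update` yields the same weights as solving the ridge problem on all samples at once, which is the property that makes a replay-free router plausible under these assumptions.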



Article 508

Title@2025-07-08 (2): Simple Convergence Proof of Adam From a Sign-like Descent Perspective

Title: Simple Convergence Proof of Adam From a Sign-like Descent Perspective Simple Convergence Beweis für Adam aus einer sign-like Descent Perspektive 亚当的简单聚合证据 2507.05966v1

Authors (6): Hanyang Peng, Shuang Qin, Yue Yu, Fangqing Jiang, Hui Wang, Zhouchen Lin

Adam is widely recognized as one of the most effective optimizers for training deep neural networks (DNNs). Despite its remarkable empirical success, its theoretical convergence analysis remains unsatisfactory. Existing works predominantly interpret Adam as a preconditioned stochastic gradient descent with momentum (SGDM), formulated as $\bm{x}_{t+1} = \bm{x}_t - \frac{\gamma_t}{\sqrt{\bm{v}_t}+\epsilon} \circ \bm{m}_t$. This perspective necessitates strong assumptions and intricate techniques, resulting in lengthy and opaque convergence proofs that are difficult to verify and extend. In contrast, we propose a novel interpretation by treating Adam as a sign-like optimizer, expressed as $\bm{x}_{t+1} = \bm{x}_t - \gamma_t \frac{|\bm{m}_t|}{\sqrt{\bm{v}_t}+\epsilon} \circ {\rm Sign}(\bm{m}_t)$. This reformulation significantly simplifies the convergence analysis. For the first time, with some mild conditions, we prove that Adam achieves the optimal rate of ${\cal O}\left(\frac{1}{T^{1/4}}\right)$ rather than the previous ${\cal O}\left(\frac{\ln T}{T^{1/4}}\right)$ under weak assumptions of the generalized $p$-affine variance and $(L_0, L_1, q)$-smoothness, without dependence on the model dimensionality or the numerical stability parameter $\epsilon$. Additionally, our theoretical analysis provides new insights into the role of momentum as a key factor ensuring convergence and offers practical guidelines for tuning learning rates in Adam, further bridging the gap between theory and practice.
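
The two readings of the Adam update can be compared with a small sketch. The helper names `sgdm_form` and `sign_form` are hypothetical, and bias correction is left out because the abstract's formulas do not include it. The sketch assumes the sign-like form scales ${\rm Sign}(\bm{m}_t)$ by $|\bm{m}_t|/(\sqrt{\bm{v}_t}+\epsilon)$, which makes the two forms algebraically identical since $|m| \cdot {\rm Sign}(m) = m$.

```python
import math


def sign(value):
    """Elementwise sign with sign(0) = 0."""
    if value > 0:
        return 1.0
    if value < 0:
        return -1.0
    return 0.0


def sgdm_form(x, m, v, lr, eps=1e-8):
    """Adam step read as preconditioned SGD with momentum."""
    return [xi - lr / (math.sqrt(vi) + eps) * mi for xi, mi, vi in zip(x, m, v)]


def sign_form(x, m, v, lr, eps=1e-8):
    """Same step read as a sign update with a per-coordinate scale |m| / (sqrt(v) + eps)."""
    return [
        xi - lr * (abs(mi) / (math.sqrt(vi) + eps)) * sign(mi)
        for xi, mi, vi in zip(x, m, v)
    ]
```

The per-coordinate scale in `sign_form` is the quantity a sign-like analysis would need to control; both functions return the same iterate up to floating-point rounding.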



Article 509

Title@2025-07-08 (2): Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification

Title: Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification Vergleichende Analyse von CNN- und Transformer-Architekturen mit Herzzyklus-Normalisierung für automatisierte Phonokardiogramm-Klassifikation CNN和变压器结构与心脏循环正常化用于自动心电图分类的比较分析 2507.07058v1

Authors (5): Martin Sondermann, Pinar Bisgin, Niklas Tschorn, Anja Burmann, Christoph M. Friedrich

The automated classification of phonocardiogram (PCG) recordings represents a substantial advancement in cardiovascular diagnostics. This paper presents a systematic comparison of four distinct models for heart murmur detection: two specialized convolutional neural networks (CNNs) and two zero-shot universal audio transformers (BEATs), evaluated using fixed-length and heart cycle normalization approaches. Utilizing the PhysioNet2022 dataset, a custom heart cycle normalization method tailored to individual cardiac rhythms is introduced. The findings indicate the following AUROC values: the CNN model with fixed-length windowing achieves 79.5%, the CNN model with heart cycle normalization scores 75.4%, the BEATs transformer with fixed-length windowing achieves 65.7%, and the BEATs transformer with heart cycle normalization results in 70.1%. The findings indicate that physiological signal constraints, especially those introduced by different normalization strategies, have a substantial impact on model performance. The research provides evidence-based guidelines for architecture selection in clinical settings, emphasizing the need for a balance between accuracy and computational efficiency. Although specialized CNNs demonstrate superior performance overall, the zero-shot transformer models may offer promising efficiency advantages during development, such as faster training and evaluation cycles, despite their lower classification accuracy. These findings highlight the potential of automated classification systems to enhance cardiac diagnostics and improve patient care.



Article 510

Title@2025-07-08 (2): skfolio: Portfolio Optimization in Python

Title: skfolio: Portfolio Optimization in Python skfolio: Portfoliooptimierung in Python skfolio: Python 的组合组合优化 2507.04176v2

Authors (3): Carlo Nicolini, Matteo Manzi, Hugo Delatte

Portfolio optimization is a fundamental challenge in quantitative finance, requiring robust computational tools that integrate statistical rigor with practical implementation. We present skfolio, an open-source Python library for portfolio construction and risk management that seamlessly integrates with the scikit-learn ecosystem. skfolio provides a unified framework for diverse allocation strategies, from classical mean-variance optimization to modern clustering-based methods, state-of-the-art financial estimators with native interfaces, and advanced cross-validation techniques tailored for financial time series. By adhering to scikit-learn’s fit-predict-transform paradigm, the library enables researchers and practitioners to leverage machine learning workflows for portfolio optimization, promoting reproducibility and transparency in quantitative finance.



Article 511

Title@2025-07-08 (2): Rethinking Associative Memory Mechanism in Induction Head

Title: Rethinking Associative Memory Mechanism in Induction Head Rethinking Assoziative Memory Mechanism in Induktion Head 上岗时重新思考联合记忆机制 2412.11459v2

Authors (2): Shuo Wang, Issei Sato

The induction head mechanism is part of the computational circuits for in-context learning (ICL) that enable large language models (LLMs) to adapt to new tasks without fine-tuning. Most existing work explains the training dynamics behind acquiring such a powerful mechanism. However, the model’s ability to coordinate in-context information over long contexts with global knowledge acquired during pretraining remains poorly understood. This paper investigates how a two-layer transformer thoroughly captures in-context information and balances it with pretrained bigram knowledge in next-token prediction, from the viewpoint of associative memory. We theoretically analyze the representation of weight matrices in attention layers and the resulting logits when a transformer is given prompts generated by a bigram model. In the experiments, we design specific prompts to evaluate whether the outputs of the trained transformer align with the theoretical results.



Article 512

Title@2025-07-08 (2): Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport

Title: Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport Unüberwachte Anomalienerkennung durch massenabstoßenden optimalen Transport 通过大规模反转最佳运输方式进行不受监督的异常探测 2502.12793v2

Authors (3): Eduardo Fernandes Montesuma, Adel El Habazi, Fred Ngole Mboula

Detecting anomalies in datasets is a longstanding problem in machine learning. In this context, an anomaly is defined as a sample that significantly deviates from the remaining data. Meanwhile, optimal transport (OT) is a field of mathematics concerned with transporting mass between two probability measures at least effort. In classical OT, the optimal transportation strategy of a measure to itself is the identity. In this paper, we tackle anomaly detection by forcing samples to displace their mass, while keeping the least-effort objective. We call this new transportation problem Mass Repulsing Optimal Transport (MROT). Naturally, samples lying in low-density regions of space will be forced to displace mass very far, incurring a higher transportation cost. We use these concepts to design a new anomaly score. Through a series of experiments on existing benchmarks and fault detection problems, we show that our algorithm improves over existing methods.



Article 513

Title@2025-07-08 (2): Deep neural networks have an inbuilt Occam’s razor

Title: Deep neural networks have an inbuilt Occam’s razor Tiefe neuronale Netzwerke haben einen eingebauten Occams Rasiermesser 深心神经网络有一个内建的奥卡姆剃刀 2304.06670v2

Authors (4): Chris Mingard, Henry Rees, Guillermo Valle-Pérez, Ard A. Louis

The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam’s razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.



Article 514

Title@2025-07-08 (2): Improving AI-Based Canine Heart Disease Diagnosis with Expert-Consensus Auscultation Labeling

Title: Improving AI-Based Canine Heart Disease Diagnosis with Expert-Consensus Auscultation Labeling Verbesserung der KI-basierten Canine Heart Disease Diagnose mit Expert-Consensus Auskultation Kennzeichnung 改进基于AI的犬心病诊断,与专家 – – 咨询理事会的隔离标签 2507.05950v1

Authors (11): Pinar Bisgin, Tom Strube, Niklas Tschorn, Michael Pantförder, Maximilian Fecke, Ingrid Ljungvall, Jens Häggström, Gerhard Wess, Christoph Schummer, Sven Meister, Falk M. Howar

Noisy labels pose significant challenges for AI model training in veterinary medicine. This study examines expert assessment ambiguity in canine auscultation data, highlights the negative impact of label noise on classification performance, and introduces methods for label noise reduction. To evaluate whether label noise can be minimized by incorporating multiple expert opinions, a dataset of 140 heart sound recordings (HSR) was annotated regarding the intensity of holosystolic heart murmurs caused by Myxomatous Mitral Valve Disease (MMVD). The expert opinions facilitated the selection of 70 high-quality HSR, resulting in a noise-reduced dataset. By leveraging individual heart cycles, the training data was expanded and classification robustness was enhanced. The investigation encompassed training and evaluating three classification algorithms: AdaBoost, XGBoost, and Random Forest. While AdaBoost and Random Forest exhibited reasonable performances, XGBoost demonstrated notable improvements in classification accuracy. All algorithms showed significant improvements in classification accuracy due to the applied label noise reduction, most notably XGBoost. Specifically, for the detection of mild heart murmurs, sensitivity increased from 37.71% to 90.98% and specificity from 76.70% to 93.69%. For the moderate category, sensitivity rose from 30.23% to 55.81% and specificity from 64.56% to 97.19%. In the loud/thrilling category, sensitivity and specificity increased from 58.28% to 95.09% and from 84.84% to 89.69%, respectively. These results highlight the importance of minimizing label noise to improve classification algorithms for the detection of canine heart murmurs. Index Terms: AI diagnosis, canine heart disease, heart sound classification, label noise reduction, machine learning, XGBoost, veterinary cardiology, MMVD.



Article 515

Title@2025-07-08 (2): Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence

Title: Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence Messung variabler Bedeutung bei heterogenen Behandlungseffekten mit Vertrauen 具有信心地衡量不同不同不同待遇的可变重要性 2408.13002v4

Authors (5): Joseph Paillard, Angel Reyero Lobo, Vitaliy Kolodyazhniy, Bertrand Thirion, Denis A. Engemann

Causal machine learning holds promise for estimating individual treatment effects from complex data. For successful real-world applications of machine learning methods, it is of paramount importance to obtain reliable insights into which variables drive heterogeneity in the response to treatment. We propose PermuCATE, an algorithm based on the Conditional Permutation Importance (CPI) method, for statistically rigorous global variable importance assessment in the estimation of the Conditional Average Treatment Effect (CATE). Theoretical analysis of the finite sample regime and empirical studies show that PermuCATE has lower variance than the Leave-One-Covariate-Out (LOCO) reference method and provides a reliable measure of variable importance. This property increases statistical power, which is crucial for causal inference in the limited-data regime common to biomedical applications. We empirically demonstrate the benefits of PermuCATE in simulated and real-world health datasets, including settings with up to hundreds of correlated variables.



Article 516

Title@2025-07-08 (2): Information-theoretic machine learning for time-varying mode decomposition of separated aerodynamic flows

Title: Information-theoretic machine learning for time-varying mode decomposition of separated aerodynamic flows Information-theoretisches maschinelles Lernen für die Zeit-Verschiedenheit-Modus Zersetzung von getrennten aerodynamischen Strömungen 分离的空气动力流动分流时间变化模式分解分解信息理论机学习 2505.24132v2

Authors (2): Kai Fukami, Ryo Araki

We perform an information-theoretic mode decomposition for separated aerodynamic flows. The current data-driven approach, based on a neural network referred to as deep sigmoidal flow, enables the extraction of an informative component from a given flow field snapshot with respect to a target variable at a future time stamp, thereby capturing the causality as a time-varying modal structure. We consider four examples of separated flows around a wing: (1) a laminar periodic wake at post-stall angles of attack, strong gust-wing interactions from (2) numerical and (3) experimental measurements, and (4) a turbulent wake in a spanwise-periodic domain. The present approach reveals informative vortical structures associated with a time-varying lift response. For the periodic shedding cases, the informative structures vary in time corresponding to the fluctuation level from their mean values. For the gust-wing interaction examples, the approach identifies, in an interpretable manner, how the effect of a gust on a wing emerges in the lift response over time. Furthermore, for the case of the turbulent wake, the present model highlights structures near the wing and vortex cores as informative components based solely on the information metric, without any prior knowledge of aerodynamics and length scales. This study provides causality-based insights into a range of unsteady aerodynamic problems.



Article 517

Title@2025-07-08 (2): News and Load: Social and Economic Drivers of Regional Multi-horizon Electricity Demand Forecasting

Title: News and Load: Social and Economic Drivers of Regional Multi-horizon Electricity Demand Forecasting News and Load: Soziale und wirtschaftliche Treiber regionaler Mehrhorizont-Stromnachfrageprognosen 新闻与负载:区域多光速电力需求预测的社会和经济驱动因素 2406.06641v2

Authors (3): Yun Bai, Simon Camal, Andrea Michiorri

The relationship between electricity demand and variables such as economic activity and weather patterns is well established. However, this paper explores the connection between electricity demand and social aspects. It further embeds dynamic information about the state of society into energy demand modelling and forecasting approaches. Through the use of natural language processing on a large news corpus, we highlight this important link. This study is conducted in five regions of the UK and Ireland and considers multiple time horizons from 1 to 30 days. It also considers economic variables such as GDP, unemployment and inflation. The textual features used in this study represent central constructs from the word frequencies, topics, and word embeddings extracted from the news. The findings indicate that: 1) The textual features are related to various contents, such as military conflicts, transportation, the global pandemic, regional economics, and the international energy market. They exhibit causal relationships with regional electricity demand, which are validated using Granger causality and Double Machine Learning methods. 2) Economic indicators play a more important role in the East Midlands and Northern Ireland, while social indicators are more influential in the West Midlands and the South West of England. 3) The use of these factors improves deterministic forecasting by around 6%.



Article 518

Title@2025-07-08 (2): WATS: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling

Title: WATS: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling WATS: Kalibrierung von Graphen-Neuralnetzwerken mit Wavelet-Aware-Temperaturskalierung WATS: 校准带有波盘软件温度缩放的图形神经网络 2506.23782v2

Authors (6): Xiaoyang Li, Linwei Tao, Haohui Lu, Minjing Dong, Junbin Gao, Chang Xu

Graph Neural Networks (GNNs) have demonstrated strong predictive performance on relational data; however, their confidence estimates often misalign with actual predictive correctness, posing significant limitations for deployment in safety-critical settings. While existing graph-aware calibration methods seek to mitigate this limitation, they primarily depend on coarse one-hop statistics, such as neighbor-predicted confidence, or latent node embeddings, thereby neglecting the fine-grained structural heterogeneity inherent in graph topology. In this work, we propose Wavelet-Aware Temperature Scaling (WATS), a post-hoc calibration framework that assigns node-specific temperatures based on tunable heat-kernel graph wavelet features. Specifically, WATS harnesses the scalability and topology sensitivity of graph wavelets to refine confidence estimates, all without necessitating model retraining or access to neighboring logits or predictions. Extensive evaluations across seven benchmark datasets with varying graph structures and two GNN backbones demonstrate that WATS achieves the lowest Expected Calibration Error (ECE) among all compared methods, outperforming both classical and graph-specific baselines by up to 42.3% in ECE and reducing calibration variance by 17.24% on average compared with graph-specific methods. Moreover, WATS remains computationally efficient, scaling well across graphs of diverse sizes and densities. Code will be released upon publication.
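
The effect of node-specific temperatures can be sketched as follows. This only illustrates the generic temperature-scaling step, not WATS itself: how temperatures are derived from heat-kernel wavelet features is not shown, and the function names and the hand-picked temperatures in the usage note are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate_nodes(node_logits, node_temperatures):
    """Apply a separate temperature to each node's logits (post-hoc, no retraining)."""
    return [softmax(z, t) for z, t in zip(node_logits, node_temperatures)]
```

For example, `calibrate_nodes([[2.0, 0.5, 0.1], [2.0, 0.5, 0.1]], [1.0, 2.5])` keeps the predicted class of both nodes unchanged but lowers the confidence of the second node, which is the behavior a calibration method relies on when a node is judged overconfident.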



Article 519

Title@2025-07-08 (2): A Comprehensive Study of Shapley Value in Data Analytics

Title: A Comprehensive Study of Shapley Value in Data Analytics Eine umfassende Untersuchung des Shapley-Wertes in der Datenanalyse 全面研究数据分析中的虚光值 2412.01460v8

Authors (7): Hong Lin, Shixin Wan, Zhongle Xie, Ke Chen, Meihui Zhang, Lidan Shou, Gang Chen

In recent years, Shapley value (SV), a solution concept from cooperative game theory, has found numerous applications in data analytics (DA). This paper presents the first comprehensive study of SV used throughout the DA workflow, clarifying the key variables in defining DA-applicable SV and the essential functionalities that SV can provide for data scientists. We condense four primary challenges of using SV in DA, namely computation efficiency, approximation error, privacy preservation, and interpretability, disentangle the resolution techniques found in existing work in this field, then analyze and discuss the techniques w.r.t. each challenge and the potential conflicts between challenges. We also implement SVBench, a modular and extensible open-source framework for developing SV applications in different DA tasks, and conduct extensive evaluations to validate our analyses and discussions. Based on the qualitative and quantitative results, we identify the limitations of current efforts for applying SV to DA and highlight the directions of future research and engineering.
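
The computation-efficiency challenge is easier to see next to the exact definition. The sketch below computes exact Shapley values by averaging marginal contributions over all player orderings; the function name `shapley_values` and the toy utility in the usage note are assumptions for illustration and are unrelated to the SVBench API.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values by enumerating every ordering of the players.

    `utility` maps a frozenset of players to a number. The cost grows as n!,
    which is why approximation methods are needed for real datasets.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}
```

As a usage example, take a toy utility worth 10 only when both `"a"` and `"b"` are present, plus 2 whenever `"c"` is present. The two complementary players split the joint value equally, the independent player keeps its own contribution, and the values sum to the utility of the full set.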



Article 520

Title@2025-07-08 (2): Self-supervised learning of speech representations with Dutch archival data

Title: Self-supervised learning of speech representations with Dutch archival data Selbstüberwachtes Lernen von Sprachdarstellungen mit niederländischen Archivdaten 自行监督学习用荷兰档案数据进行语音陈述 2507.04554v2

Authors (3): Nik Vaessen, Roeland Ordelman, David A. van Leeuwen

This paper explores the use of Dutch archival television broadcast data for self-supervised learning of speech foundation models, specifically wav2vec 2.0. We first study data quality assumptions for pre-training, and show how music, noise and speaker overlap affect SSL convergence and downstream fine-tuning performance. Secondly, we explore effective pre-processing strategies to convert the noisy broadcast dataset into a high-quality dataset for pre-training, using Whisper and WhisperX. Thirdly, we compare mono-lingual and multi-lingual pre-training with equivalent amounts of data, and show that mono-lingual pre-training is more robust to out-of-domain data. Lastly, we achieve a state-of-the-art LARGE wav2vec 2.0 model for the Dutch language by continuing pre-training of a wav2vec 2.0 XLS-R model checkpoint with our 55k-hour archival dataset.



Article 521

Title@2025-07-08 (2): KD$^{2}$M: A unifying framework for feature knowledge distillation

Title: KD$^{2}$M: A unifying framework for feature knowledge distillation KD$^{2}$M: Ein einheitlicher Rahmen für die Merkmalsdestillation von Wissen KD$^{2}$M:特色知识蒸馏的统一框架 2504.01757v2

Authors (1): Eduardo Fernandes Montesuma

Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher to a student neural net. This is often done by matching the networks’ predictions (i.e., their output), but recently several works have proposed matching the distributions of the neural nets’ activations (i.e., their features), a process known as distribution matching. In this paper, we propose a unifying framework, Knowledge Distillation through Distribution Matching (KD$^{2}$M), which formalizes this strategy. Our contributions are threefold. We i) provide an overview of distribution metrics used in distribution matching, ii) benchmark on computer vision datasets, and iii) derive new theoretical results for KD.



Article 522

Title@2025-07-08 (2): Online Regularized Learning Algorithms in RKHS with $β$- and $φ$-Mixing Sequences

Title: Online Regularized Learning Algorithms in RKHS with $β$- and $φ$-Mixing Sequences Online Regularisierte Lernalgorithmen in RKHS mit $β$- und $φ$-Mischsequenzen RKHS的在线正规化学习比值,按美元和美元混合顺序计算 2507.05929v1

Authors (2): Priyanka Roy, Susanne Saminger-Platz

In this paper, we study an online regularized learning algorithm in a reproducing kernel Hilbert space (RKHS) based on a class of dependent processes. We choose such a process where the degree of dependence is measured by mixing coefficients. As a representative example, we analyze a strictly stationary Markov chain, where the dependence structure is characterized by the $\phi$- and $\beta$-mixing coefficients. Under these assumptions, we derive probabilistic upper bounds as well as convergence rates for both the exponential and polynomial decay of the mixing coefficients.



Article 523

Title@2025-07-08 (2): Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques

Title: Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques Advancing Offline Handschriftliche Texterkennung: Eine systematische Überprüfung von Datenvergrößerungs- und Erzeugungstechniken 推进离线手写文本识别:系统审查数据增强和生成技术 2507.06275v1

Authors (7): Yassin Hussein Rassul, Aram M. Ahmed, Polla Fattah, Bryar A. Hassan, Arwaa W. Abdulkareem, Tarik A. Rashid, Joan Lu

Offline Handwritten Text Recognition (HTR) systems play a crucial role in applications such as historical document digitization, automatic form processing, and biometric authentication. However, their performance is often hindered by the limited availability of annotated training data, particularly for low-resource languages and complex scripts. This paper presents a comprehensive survey of offline handwritten data augmentation and generation techniques designed to improve the accuracy and robustness of HTR systems. We systematically examine traditional augmentation methods alongside recent advances in deep learning, including Generative Adversarial Networks (GANs), diffusion models, and transformer-based approaches. Furthermore, we explore the challenges associated with generating diverse and realistic handwriting samples, particularly in preserving script authenticity and addressing data scarcity. This survey follows the PRISMA methodology, ensuring a structured and rigorous selection process. Our analysis began with 1,302 primary studies, which were filtered down to 848 after removing duplicates, drawing from key academic sources such as IEEE Digital Library, Springer Link, Science Direct, and ACM Digital Library. By evaluating existing datasets, assessment metrics, and state-of-the-art methodologies, this survey identifies key research gaps and proposes future directions to advance the field of handwritten text generation across diverse linguistic and stylistic landscapes.



Article 524

Title@2025-07-08 (2): Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data

Title: Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data Diffusion Datensatzkondensation: Training Ihres Diffusionsmodells schneller mit weniger Daten 传播数据集集中: 训练您的传播模型, 以更少数据更快的速度 2507.05914v1

Authors (9): Rui Huang, Shitong Shao, Zikai Zhou, Pukun Zhao, Hangyu Guo, Tian Ye, Lichen Bai, Shuo Yang, Zeke Xie

Diffusion models have achieved remarkable success in various generative tasks, but training them remains highly resource-intensive, often requiring millions of images and many days of GPU computation. To address this limitation from a data-centric perspective, we study diffusion dataset condensation as a new and challenging problem setting. The goal is to construct a “synthetic” sub-dataset with significantly fewer samples than the original dataset, enabling high-quality diffusion model training at greatly reduced cost. To the best of our knowledge, we are the first to formally investigate dataset condensation for diffusion models, whereas prior work focused on training discriminative models. To tackle this new challenge, we propose a novel Diffusion Dataset Condensation (D2C) framework, which consists of two phases: Select and Attach. The Select phase identifies a compact and diverse subset using a diffusion difficulty score and interval sampling. The Attach phase enhances the selected subset by attaching rich semantic and visual representations to strengthen the conditional signals. Extensive experiments across various dataset sizes, model architectures, and resolutions show that our D2C framework enables significantly faster diffusion model training with dramatically less data, while preserving high visual quality. Notably, for the SiT-XL/2 architecture, D2C achieves a 100x training speed-up, reaching an FID score of 4.3 in just 40k steps using only 0.8% of the training data.



Article 525

Title@2025-07-08 (2): Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis

Title: Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis Best-of-N durch die Glättende Lens: KL Divergence and Regret Analysis 通过平滑透镜:KL差异和悔恨分析实现最佳N级 2507.05913v1

Authors (5): Gholamali Aminian, Idan Shenfeld, Amir R. Asadi, Ahmad Beirami, Youssef Mroueh

A simple yet effective method for inference-time alignment of generative models is Best-of-$N$ (BoN), where $N$ outcomes are sampled from a reference policy, evaluated using a proxy reward model, and the highest-scoring one is selected. While prior work argues that BoN is almost optimal in reward vs KL tradeoffs, the effectiveness of BoN depends critically on the quality of the proxy reward model used for selection. For this purpose, we study BoN through a smooth version known as Soft Best-of-N (SBoN) and develop a theoretical framework to address this gap. We analyze the scaling behaviour of BoN by providing bounds on the KL divergence between the SBoN policy and the reference policy, offering insights into how performance varies with the number of samples. We also study the regret gap, i.e., the gap between the expected true reward under the optimal policy and the SBoN policy. Our theoretical and empirical findings show that smoothing helps SBoN mitigate reward overoptimization, especially when the quality of the proxy reward is low.
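
The difference between hard and soft selection can be sketched as follows. The sketch assumes SBoN picks among the N sampled candidates with probability proportional to exp(reward / temperature); the function names and the `temperature` parameter are illustrative and may not match the paper's exact parameterisation.

```python
import math
import random


def best_of_n(candidates, proxy_reward):
    """Hard BoN: return the candidate with the highest proxy reward."""
    return max(candidates, key=proxy_reward)


def soft_best_of_n(candidates, proxy_reward, temperature, rng=random):
    """Soft BoN (assumed form): sample one candidate with probability
    proportional to exp(proxy_reward / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rewards = [proxy_reward(c) for c in candidates]
    top = max(rewards)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((r - top) / temperature) for r in rewards]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

As the temperature approaches zero the soft rule concentrates on the top-scoring candidate and recovers hard BoN; a large temperature approaches uniform sampling from the reference policy's candidates, which limits how strongly an unreliable proxy reward can steer the output.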


Article 526

Title@2025-07-08 (2): Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality

Title: Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality Tiefes Lernen aus stark mischenden Beobachtungen: Sparse-penalisierte Regularisierung und Minimax-Optimalität 从高度混合的观测中汲取的深层经验教训: 2406.08321v2

Authors (2): William Kengne, Modou Wade

The explicit regularization and optimality of deep neural network estimators from independent data have made considerable progress recently. The study of such properties on dependent data is still a challenge. In this paper, we carry out deep learning from strongly mixing observations, and deal with the squared loss and a broad class of loss functions. We consider sparse-penalized regularization for deep neural network predictors. For a general framework that includes regression estimation, classification, time series prediction, $\cdots$, an oracle inequality for the expected excess risk is established and a bound on the class of Hölder smooth functions is provided. For nonparametric regression from strongly mixing data with sub-exponential errors, we provide an oracle inequality for the $L_2$ error and investigate an upper bound of this error on a class of Hölder composition functions. For the specific case of nonparametric autoregression with Gaussian and Laplace errors, a lower bound of the $L_2$ error on this Hölder composition class is established. Up to a logarithmic factor, this bound matches its upper bound; so, the deep neural network estimator attains the minimax optimal rate.


Article 527

Title@2025-07-08 (2): Longitudinal Ensemble Integration for sequential classification with multimodal data

Title: Longitudinal Ensemble Integration for sequential classification with multimodal data Longitudinal Ensemble Integration für die sequentielle Klassifizierung mit multimodalen Daten 与多式联运数据相继分类的一体化 2411.05983v2

Authors (8): Aviad Susman, Rupak Krishnamurthy, Yan Chak Li, Mohammad Olaimat, Serdar Bozdag, Bino Varghese, Nasim Sheikh-Bahaei, Gaurav Pandey

Effectively modeling multimodal longitudinal data is a pressing need in various application areas, especially biomedicine. Despite this, few approaches exist in the literature for this problem, with most not adequately taking into account the multimodality of the data. In this study, we developed multiple configurations of a novel multimodal and longitudinal learning framework, Longitudinal Ensemble Integration (LEI), for sequential classification. We evaluated LEI’s performance, and compared it against existing approaches, for the early detection of dementia, which is among the most studied multimodal sequential classification tasks. LEI outperformed these approaches due to its use of intermediate base predictions arising from the individual data modalities, which enabled their better integration over time. LEI’s design also enabled the identification of features that were consistently important across time for the effective prediction of dementia-related diagnoses. Overall, our work demonstrates the potential of LEI for sequential classification from longitudinal multimodal data.


Article 528

Title@2025-07-08 (2): Trust-Region Twisted Policy Improvement

Title: Trust-Region Twisted Policy Improvement Vertrauensregion verdrehte politische Verbesserung 改变政策改进 2504.06048v4

Authors (4): Joery A. de Vries, Jinke He, Yaniv Oren, Matthijs T. J. Spaan

Monte-Carlo tree search (MCTS) has driven many recent breakthroughs in deep reinforcement learning (RL). However, scaling MCTS to parallel compute has proven challenging in practice, which has motivated alternative planners like sequential Monte-Carlo (SMC). Many of these SMC methods adopt particle filters for smoothing through a reformulation of RL as a policy inference problem. Yet, persisting design choices of these particle filters often conflict with the aim of online planning in RL, which is to obtain a policy improvement at the start of planning. Drawing inspiration from MCTS, we tailor SMC planners specifically for RL by improving data generation within the planner through constrained action sampling and explicit terminal state handling, as well as improving policy and value target estimation. This leads to our Trust-Region Twisted SMC (TRT-SMC), which shows improved runtime and sample-efficiency over baseline MCTS and SMC methods in both discrete and continuous domains.


Article 529

Title@2025-07-08 (2): Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs

Title: Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs Unüberwachtes Lernen für optimale Verkehrsplanungsvorhersage zwischen unausgewogenen Graphen 未受监督的优化交通学习计划预测 2506.12025v3

Authors (3): Sonia Mazelet, Rémi Flamary, Bertrand Thirion

Optimal transport between graphs, based on Gromov-Wasserstein and other extensions, is a powerful tool for comparing and aligning graph structures. However, solving the associated non-convex optimization problems is computationally expensive, which limits the scalability of these methods to large graphs. In this work, we present Unbalanced Learning of Optimal Transport (ULOT), a deep learning method that predicts optimal transport plans between two graphs. Our method is trained by minimizing the fused unbalanced Gromov-Wasserstein (FUGW) loss. We propose a novel neural architecture with cross-attention that is conditioned on the FUGW tradeoff hyperparameters. We evaluate ULOT on synthetic stochastic block model (SBM) graphs and on real cortical surface data obtained from fMRI. ULOT predicts transport plans with competitive loss up to two orders of magnitude faster than classical solvers. Furthermore, the predicted plan can be used as a warm start for classical solvers to accelerate their convergence. Finally, the predicted transport plan is fully differentiable with respect to the graph inputs and FUGW hyperparameters, enabling the optimization of functionals of the ULOT plan.


Article 530

Title@2025-07-08 (2): Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why

Title: Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why Feature-based vs. GAN-based Learning from Demonstrations: Wann und warum 从示范活动中学习:何时和为何 2507.05906v1

Authors (3): Chenhao Li, Marco Hutter, Andreas Krause

This survey provides a comparative analysis of feature-based and GAN-based approaches to learning from demonstrations, with a focus on the structure of reward functions and their implications for policy learning. Feature-based methods offer dense, interpretable rewards that excel at high-fidelity motion imitation, yet often require sophisticated representations of references and struggle with generalization in unstructured settings. GAN-based methods, in contrast, use implicit, distributional supervision that enables scalability and adaptation flexibility, but are prone to training instability and coarse reward signals. Recent advancements in both paradigms converge on the importance of structured motion representations, which enable smoother transitions, controllable synthesis, and improved task integration. We argue that the dichotomy between feature-based and GAN-based methods is increasingly nuanced: rather than one paradigm dominating the other, the choice should be guided by task-specific priorities such as fidelity, diversity, interpretability, and adaptability. This work outlines the algorithmic trade-offs and design considerations that underlie method selection, offering a framework for principled decision-making in learning from demonstrations.


Article 531

Title@2025-07-08 (2): Universal Embeddings of Tabular Data

Title: Universal Embeddings of Tabular Data Universelle Einbettungen von Tabellendaten 表格数据通用嵌入式 2507.05904v1

Authors (4): Astrid Franz, Frederik Hoppe, Marianne Michaelis, Udo Göbel

Tabular data in relational databases represents a significant portion of industrial data. Hence, analyzing and interpreting tabular data is of utmost importance. Application tasks on tabular data are manifold and are often not specified when setting up an industrial database. To address this, we present a novel framework for generating universal, i.e., task-independent embeddings of tabular data for performing downstream tasks without predefined targets. Our method transforms tabular data into a graph structure, leverages Graph Auto-Encoders to create entity embeddings, which are subsequently aggregated to obtain embeddings for each table row, i.e., each data sample. This two-step approach has the advantage that unseen samples, consisting of similar entities, can be embedded without additional training. Downstream tasks such as regression, classification or outlier detection, can then be performed by applying a distance-based similarity measure in the embedding space. Experiments on real-world datasets demonstrate that our method achieves superior performance compared to existing universal tabular data embedding techniques.


Article 532

Title@2025-07-08 (2): On the Fundamental Impossibility of Hallucination Control in Large Language Models

Title: On the Fundamental Impossibility of Hallucination Control in Large Language Models Über die grundsätzliche Unmöglichkeit der Halluzinationskontrolle in großen Sprachmodellen 关于大语言模型中幻听控制的基本不可能性 2506.06382v3

Authors (1): Michał P. Karpowicz

We prove that perfect hallucination control in large language models is mathematically impossible. No LLM inference mechanism can simultaneously achieve truthful response generation, semantic information conservation, relevant knowledge revelation, and knowledge-constrained optimality. This impossibility is fundamental, arising from the mathematical structure of information aggregation itself rather than engineering limitations. The proof spans three mathematical frameworks: auction theory, proper scoring theory for probabilistic predictions, and log-sum-exp analysis for transformer architectures. In each setting, we demonstrate that information aggregation creates unavoidable violations of conservation principles. The Jensen gap in transformer probability aggregation provides a direct measure of this impossibility. These results reframe hallucination from an engineering bug to an inevitable mathematical feature of distributed intelligence. There are fundamental trade-offs between truthfulness, knowledge utilization, and response completeness, providing principled foundations for managing rather than eliminating hallucination. This work reveals deep connections between neural network inference, philosophy of knowledge and reasoning, and classical results in game theory and information theory, opening new research directions for developing beneficial AI systems within mathematical constraints.


Article 533

Title@2025-07-08 (2): Stable Acoustic Relay Assignment with High Throughput via Lase Chaos-based Reinforcement Learning

Title: Stable Acoustic Relay Assignment with High Throughput via Lase Chaos-based Reinforcement Learning Stabile akustische Relay-Zuordnung mit hohem Durchsatz über Lase Chaos-basiertes Verstärkungslernen 通过以激光混乱为基础的加强学习,以高输送量方式分配稳定的声频中继任务 2507.05900v1

Authors (3): Zengjing Chen, Lu Wang, Chengzhi Xing

This study addresses the problem of stable acoustic relay assignment in an underwater acoustic network. Unlike the objectives of most existing literature, two distinct objectives, namely classical stable arrangement and ambiguous stable arrangement, are considered. To achieve these stable arrangements, a laser chaos-based multi-processing learning (LC-ML) method is introduced to efficiently obtain high throughput and rapidly attain stability. In order to sufficiently explore the relay’s decision-making, this method uses random numbers generated by laser chaos to learn the assignment of relays to multiple source nodes. This study finds that the laser chaos-based random numbers and multi-processing in the exchange process have a positive effect on throughput and on adaptability to environmental changes over time. Meanwhile, ambiguous cognitions result in a stable configuration with less volatility compared to accurate ones. This provides a practical and useful method and can be the basis for relay selection in complex underwater environments.


Article 534

Title@2025-07-08 (2): Composable Strategy Framework with Integrated Video-Text based Large Language Models for Heart Failure Assessment

Title: Composable Strategy Framework with Integrated Video-Text based Large Language Models for Heart Failure Assessment Composable Strategy Framework mit integrierten Video-Text basierten großen Sprachmodellen für die Herzfehlerbewertung 综合战略框架,包括基于视频文本的心衰失评估大语言模型 2502.16548v2

Authors (9): Jianzhou Chen, Jinyang Sun, Xiumei Wang, Xi Chen, Heyu Chu, Guo Song, Yuji Luo, Xingping Zhou, Rong Gu

Heart failure is one of the leading causes of death worldwide, with millions of deaths each year, according to data from the World Health Organization (WHO) and other public health agencies. While significant progress has been made in the field of heart failure, leading to improved survival rates and improvement of ejection fraction, there remain substantial unmet needs, due to the complexity and multifactorial characteristics of the condition. Therefore, we propose a composable strategy framework for assessment and treatment optimization in heart failure. This framework simulates the doctor-patient consultation process and leverages multi-modal algorithms to analyze a range of data, including video, physical examination, text results as well as medical history. By integrating these various data sources, our framework offers a more holistic evaluation and optimized treatment plan for patients. Our results demonstrate that this multi-modal approach outperforms single-modal artificial intelligence (AI) algorithms in terms of accuracy in heart failure (HF) prognosis prediction. Through this method, we can further evaluate the impact of various pathological indicators on HF prognosis, providing a more comprehensive evaluation.


Article 535

Title@2025-07-08 (2): Improving Trust Estimation in Human-Robot Collaboration Using Beta Reputation at Fine-grained Timescales

Title: Improving Trust Estimation in Human-Robot Collaboration Using Beta Reputation at Fine-grained Timescales Verbesserung der Vertrauensschätzung in der Mensch-Roboter-Kollaboration mit Beta-Reputation bei feinkörnigen Zeitskalan 利用精确时间尺度的贝塔名声提高人类-机器人合作信任度估计 2411.01866v2

Authors (4): Resul Dagdanov, Milan Andrejevic, Dikai Liu, Chin-Teng Lin

When interacting with each other, humans adjust their behavior based on perceived trust. To achieve similar adaptability, robots must accurately estimate human trust at sufficiently granular timescales while collaborating with humans. Beta reputation is a popular way to formalize a mathematical estimation of human trust. However, it relies on binary performance, which updates trust estimations only after each task concludes. Additionally, manually crafting a reward function is the usual method of building a performance indicator, which is labor-intensive and time-consuming. These limitations prevent efficient capture of continuous trust changes at more granular timescales throughout the collaboration task. Therefore, this paper presents a new framework for the estimation of human trust using beta reputation at fine-grained timescales. To achieve granularity in beta reputation, we utilize continuous reward values to update trust estimates at each timestep of a task. We construct a continuous reward function using maximum entropy optimization to eliminate the need for the laborious specification of a performance indicator. The proposed framework improves trust estimations by increasing accuracy, eliminating the need to manually craft a reward function, and advancing toward the development of more intelligent robots.
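
A minimal sketch of a per-timestep beta reputation update is given below. It assumes the continuous reward has already been scaled to [0, 1] and is split between the two beta parameters as fractional evidence; the class name `BetaTrust`, the uniform prior, and this update rule are assumptions for illustration, and the maximum entropy reward construction from the abstract is not shown.

```python
class BetaTrust:
    """Trust estimate as the mean of a Beta(alpha, beta) distribution."""

    def __init__(self, alpha=1.0, beta=1.0):
        # alpha = beta = 1 is a uniform prior, i.e. an initial trust of 0.5.
        self.alpha = alpha
        self.beta = beta

    def estimate(self):
        """Return the current trust estimate in (0, 1)."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, reward):
        """Update trust with one continuous reward in [0, 1] (assumed scaling)."""
        if not 0.0 <= reward <= 1.0:
            raise ValueError("reward must lie in [0, 1]")
        # The reward counts as fractional positive evidence,
        # and its complement as fractional negative evidence.
        self.alpha += reward
        self.beta += 1.0 - reward
        return self.estimate()
```

Calling `update` at every timestep, rather than once with a binary success flag at the end of a task, is what gives the estimate its finer time resolution.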


Article 536

Title@2025-07-08 (2): Iterative Importance Fine-tuning of Diffusion Models

Title: Iterative Importance Fine-tuning of Diffusion Models Iterative Bedeutung Feinabstimmung von Diffusionsmodellen 传播模型的微调 2502.04468v2

Authors (4): Alexander Denker, Shreyas Padhy, Francisco Vargas, Johannes Hertrich

Diffusion models are an important tool for generative modelling, serving as effective priors in applications such as imaging and protein design. A key challenge in applying diffusion models for downstream tasks is efficiently sampling from resulting posterior distributions, which can be addressed using the $h$-transform. This work introduces a self-supervised algorithm for fine-tuning diffusion models by estimating the $h$-transform, enabling amortised conditional sampling. Our method iteratively refines the $h$-transform using a synthetic dataset resampled with path-based importance weights. We demonstrate the effectiveness of this framework on class-conditional sampling, inverse problems and reward fine-tuning for text-to-image diffusion models.


Article 537

Title@2025-07-08 (2): Robust Power System State Estimation using Physics-Informed Neural Networks

Title: Robust Power System State Estimation using Physics-Informed Neural Networks Robuste Leistungssystem-State-Schätzung mit physikinformierten Neuronalen Netzwerken 利用物理成形神经网络进行强力动力系统状态估算 2507.05874v1

Authors (4): Solon Falas, Markos Asprou, Charalambos Konstantinou, Maria K. Michael

Modern power systems face significant challenges in state estimation and real-time monitoring, particularly regarding response speed and accuracy under faulty conditions or cyber-attacks. This paper proposes a hybrid approach using physics-informed neural networks (PINNs) to enhance the accuracy and robustness of power system state estimation. By embedding physical laws into the neural network architecture, PINNs improve estimation accuracy for transmission grid applications under both normal and faulty conditions, while also showing potential in addressing security concerns such as data manipulation attacks. Experimental results show that the proposed approach outperforms traditional machine learning models, achieving up to 83% higher accuracy on unseen subsets of the training dataset and 65% better performance on entirely new, unrelated datasets. Experiments also show that during a data manipulation attack against a critical bus in a system, the PINN can be up to 93% more accurate than an equivalent neural network.


Article 538

Title@2025-07-08 (2): Bayesian Hierarchical Invariant Prediction

Title: Bayesian Hierarchical Invariant Prediction Bayesische Hierarchische Invariante Vorhersage Bayesian 海湾的等级性不稳定预测 2505.11211v2

Authors (4): Francisco Madaleno, Pernille Julie Viuff Sand, Francisco C. Pereira, Sergio Hernan Garrido Mejia

We propose Bayesian Hierarchical Invariant Prediction (BHIP), reframing Invariant Causal Prediction (ICP) through the lens of Hierarchical Bayes. We leverage the hierarchical structure to explicitly test invariance of causal mechanisms under heterogeneous data, resulting in improved computational scalability for a larger number of predictors compared to ICP. Moreover, given its Bayesian nature, BHIP enables the use of prior information. In this paper, we test two sparsity-inducing priors, horseshoe and spike-and-slab, both of which allow more reliable identification of causal features. We test BHIP on synthetic and real-world data, showing its potential as an alternative inference method to ICP.


Article 539

Title@2025-07-08 (2): Communication-Efficient Module-Wise Federated Learning for Grasp Pose Detection in Cluttered Environments

Title: Communication-Efficient Module-Wise Federated Learning for Grasp Pose Detection in Cluttered Environments Kommunikation-Effizient Modul-Wise-Federated Learning für Graspone-Erkennung in zerklüfteten Umgebungen 用于在冷冻环境中进行格拉斯普·波斯探测的通信效率模块 – – 联邦学习 2507.05861v1

Authors (5): Woonsang Kang, Joohyung Lee, Seungjun Kim, Jungchan Cho, Yoonseon Oh

Grasp pose detection (GPD) is a fundamental capability for robotic autonomy, but its reliance on large, diverse datasets creates significant data privacy and centralization challenges. Federated Learning (FL) offers a privacy-preserving solution, but its application to GPD is hindered by the substantial communication overhead of large models, a key issue for resource-constrained robots. To address this, we propose a novel module-wise FL framework that begins by analyzing the learning dynamics of the GPD model’s functional components. This analysis identifies slower-converging modules, to which our framework then allocates additional communication effort. This is realized through a two-phase process: a standard full-model training phase is followed by a communication-efficient phase where only the identified subset of slower-converging modules is trained and their partial updates are aggregated. Extensive experiments on the GraspNet-1B dataset demonstrate that our method outperforms standard FedAvg and other baselines, achieving higher accuracy for a given communication budget. Furthermore, real-world experiments on a physical robot validate our approach, showing a superior grasp success rate compared to baseline methods in cluttered scenes. Our work presents a communication-efficient framework for training robust, generalized GPD models in a decentralized manner, effectively improving the trade-off between communication cost and model performance.
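
The communication-efficient phase can be sketched as a FedAvg step restricted to a chosen set of modules. The sketch assumes models are plain dictionaries mapping module names to flat parameter lists and that the slower-converging modules have already been identified; the function name `fedavg_selected` and the module names in the usage below are hypothetical.

```python
def fedavg_selected(global_model, client_models, client_sizes, selected_modules):
    """Aggregate only the selected modules with dataset-size-weighted averaging.

    global_model     : dict of module name -> list of floats
    client_models    : list of dicts with the same keys (only the selected
                       modules need to be transmitted and present)
    client_sizes     : number of local samples per client (FedAvg weights)
    selected_modules : names of the modules to aggregate in this round
    """
    total = sum(client_sizes)
    # Unselected modules are carried over unchanged from the global model.
    new_model = dict(global_model)
    for name in selected_modules:
        dim = len(global_model[name])
        new_model[name] = [
            sum(size * model[name][i]
                for model, size in zip(client_models, client_sizes)) / total
            for i in range(dim)
        ]
    return new_model
```

For example, with `selected_modules=["head"]` only the `"head"` parameters are averaged and sent, so the per-round payload shrinks in proportion to the parameters left out.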


Article 540

Title@2025-07-08 (2): Quantum QSAR for drug discovery

Title: Quantum QSAR for drug discovery Quantum QSAR für die Entdeckung von Arzneimitteln 药物发现量量 QSAR 2505.04648v2

Authors (4): Alejandro Giraldo, Daniel Ruiz, Mariano Caruso, Guido Bellomo

Quantitative Structure-Activity Relationship (QSAR) modeling is key in drug discovery, but classical methods face limitations when handling high-dimensional data and capturing complex molecular interactions. This research proposes enhancing QSAR techniques through Quantum Support Vector Machines (QSVMs), which leverage quantum computing principles to process information in Hilbert spaces. By using quantum data encoding and quantum kernel functions, we aim to develop more accurate and efficient predictive models.


Article 541

Title@2025-07-08 (2): Property Elicitation on Imprecise Probabilities

Title: Property Elicitation on Imprecise Probabilities Immobilien-Elizitierung auf ungenaue Wahrscheinlichkeiten 易失可能性财产援引 2507.05857v1

Authors (2): James Bailie, Rabanus Derr

Property elicitation studies which attributes of a probability distribution can be determined by minimising a risk. We investigate a generalisation of property elicitation to imprecise probabilities (IP). This investigation is motivated by multi-distribution learning, which takes the classical machine learning paradigm of minimising a single risk over a (precise) probability and replaces it with $\Gamma$-maximin risk minimisation over an IP. We provide necessary conditions for elicitability of an IP-property. Furthermore, we explain what an elicitable IP-property actually elicits through Bayes pairs – the elicited IP-property is the corresponding standard property of the maximum Bayes risk distribution.


Article 542

Title@2025-07-08 (2): Optimal Transport for Domain Adaptation through Gaussian Mixture Models

Title: Optimal Transport for Domain Adaptation through Gaussian Mixture Models Optimaler Transport für Domain-Anpassung durch Gaussian Mixture Models 通过高山混合混合模型优化迁移以适应域域 2403.13847v3

Authors (3): Eduardo Fernandes Montesuma, Fred Maurice Ngolè Mboula, Antoine Souloumiac

Machine learning systems operate under the assumption that training and test data are sampled from a fixed probability distribution. However, this assumption is rarely verified in practice, as the conditions under which data was acquired are likely to change. In this context, unsupervised domain adaptation requires only minimal access to data from the new conditions in order to learn models that are robust to changes in the data distribution. Optimal transport is a theoretically grounded tool for analyzing changes in distribution, especially as it allows the mapping between domains. However, these methods are usually computationally expensive as their complexity scales cubically with the number of samples. In this work, we explore optimal transport between Gaussian Mixture Models (GMMs), which is conveniently written in terms of the components of source and target GMMs. We experiment with 9 benchmarks, with a total of $85$ adaptation tasks, showing that our methods are more efficient than previous shallow domain adaptation methods, and that they scale well with the number of samples $n$ and dimensions $d$.


Article 543

Title@2025-07-08 (2): Prototype-Guided and Lightweight Adapters for Inherent Interpretation and Generalisation in Federated Learning

Title: Prototype-Guided and Lightweight Adapters for Inherent Interpretation and Generalisation in Federated Learning Prototypengeführte und leichte Adapter zur inhärenten Interpretation und Generalisierung im Federated Learning 用于联邦学习中固有解释和通化的原型辅助和轻量级适应器 2507.05852v1

Authors (3): Samuel Ofosu Mensah, Kerol Djoumessi, Philipp Berens

Federated learning (FL) provides a promising paradigm for collaboratively training machine learning models across distributed data sources while maintaining privacy. Nevertheless, real-world FL often faces major challenges, including communication overhead during the transfer of large model parameters and statistical heterogeneity arising from non-identical independent data distributions across clients. In this work, we propose an FL framework that 1) provides inherent interpretations using prototypes, and 2) tackles statistical heterogeneity by utilising lightweight adapter modules to act as compressed surrogates of local models and guide clients to achieve generalisation despite varying client distributions. Each client locally refines its model by aligning class embeddings toward prototype representations and simultaneously adjusts the lightweight adapter. Our approach replaces the need to communicate entire model weights with prototypes and lightweight adapters. This design ensures that each client’s model aligns with a globally shared structure while minimising communication load and providing inherent interpretations. Moreover, we conducted our experiments on a real-world retinal fundus image dataset, which provides clinical-site information. We demonstrate inherently interpretable capabilities and perform a classification task, which shows improvements in accuracy over baseline algorithms.


Article 544

Title@2025-07-08 (2): Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining

Title: Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining Filtern wie Sie testen: Datengesteuerte Datenfilterung für CLIP-Vorschulung 像您一样的过滤器测试: CLIP 预培训前的数据驱动数据过滤 2503.08805v2

Authors (2): Mikey Shechter, Yair Carmon

We introduce Filter Like You Test (FLYT), an algorithm for curating large-scale vision-language datasets that learns the usefulness of each data point as a pretraining example. FLYT trains a scoring model that learns to weigh each example’s features using gradient signals from downstream tasks' training sets. Based on FLYT, we implement Mixing-FLYT (M-FLYT), which takes the per-example scores generated by different scoring methods as features, and learns to unify them into a single score. FLYT naturally produces a distribution over the training examples, which we leverage through Soft Cap Sampling (SCS), a strategy for obtaining a filtered pretraining dataset from per-example probabilities that samples examples while preventing over-representation through a repetition penalty. Using these methods, we achieve 40.1% ImageNet zero-shot accuracy on the DataComp medium scale filtering benchmark, a 2% absolute accuracy increase over all previous results and a 5.5% increase over results that, like ours, use only public resources. Our approach also yields 37.7% on the average of 38 DataComp evaluation tasks, outperforming previous public-resource approaches by 0.4%.


Article 545

Title@2025-07-08 (2): GC-GAT: Multimodal Vehicular Trajectory Prediction using Graph Goal Conditioning and Cross-context Attention

Title: GC-GAT: Multimodal Vehicular Trajectory Prediction using Graph Goal Conditioning and Cross-context Attention GC-GAT: Multimodale Fahrzeugbahnvorhersage mit Graph Goal Conditioning und Cross-Context-Achtung GC-GAT:利用图表目标条件和交叉注意的多式车辆轨迹预测 2504.11150v2

Authors (3): Mahir Gulzar, Yar Muhammad, Naveed Muhammad

Predicting future trajectories of surrounding vehicles heavily relies on what contextual information is given to a motion prediction model. The context itself can be static (lanes, regulatory elements, etc.) or dynamic (traffic participants). This paper presents a lane graph-based motion prediction model that first predicts graph-based goal proposals and later fuses them with cross attention over multiple contextual elements. We follow the well-known encoder-interactor-decoder architecture, where the encoder encodes scene context using lightweight Gated Recurrent Units, the interactor applies cross-context attention over encoded scene features and graph goal proposals, and the decoder regresses multimodal trajectories via a Laplacian Mixture Density Network from the aggregated encodings. Using cross-attention over graph-based goal proposals gives robust trajectory estimates since the model learns to attend to future goal-relevant scene elements for the intended agent. We evaluate our work on the nuScenes motion prediction dataset, achieving state-of-the-art results.


Article 546

Title@2025-07-08 (2): PDFMathTranslate: Scientific Document Translation Preserving Layouts

Title: PDFMathTranslate: Scientific Document Translation Preserving Layouts PDFMathTranslate: Wissenschaftliche Dokumentübersetzung Layouts erhalten PDFMath Translate:科学文件翻译保存布局 2507.03009v2

Authors (4): Rongxin Ouyang, Chang Chu, Zhikuang Xin, Xiangyao Ma

Language barriers in scientific documents hinder the diffusion and development of science and technologies. However, prior efforts in translating such documents largely overlooked the information in layouts. To bridge the gap, we introduce PDFMathTranslate, the world’s first open-source software for translating scientific documents while preserving layouts. Leveraging the most recent advances in large language models and precise layout detection, we contribute to the community with key improvements in precision, flexibility, and efficiency. The work has been open-sourced at https://github.com/byaidu/pdfmathtranslate with more than 222k downloads.


Article 547

Title@2025-07-08 (2): Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing

Title: Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing Intra-DP: Ein Hochleistungs-Kollaboratives Schlussfolgerungssystem für Mobile Edge Computing DP内部:移动边缘电子计算高绩效合作推断系统 2507.05829v1

Authors (11): Zekai Sun, Xiuxian Guan, Zheng Lin, Zihan Fang, Xiangming Cai, Zhe Chen, Fangming Liu, Heming Cui, Jie Xiong, Wei Ni, Chau Yuen

Deploying deep neural networks (DNNs) on resource-constrained mobile devices presents significant challenges, particularly in achieving real-time performance while coping with limited computational resources and battery life. While Mobile Edge Computing (MEC) offers collaborative inference with GPU servers as a promising solution, existing approaches primarily rely on layer-wise model partitioning and suffer from significant transmission bottlenecks caused by the sequential execution of DNN operations. To address this challenge, we present Intra-DP, a high-performance collaborative inference system optimized for DNN inference on MEC. Intra-DP employs a novel parallel computing technique based on local operators (i.e., operators whose minimum unit input is not the entire input tensor, such as the convolution kernel). By decomposing their computations (operations) into several independent sub-operations and overlapping the computation and transmission of different sub-operations through parallel execution, Intra-DP mitigates transmission bottlenecks in MEC, achieving fast and energy-efficient inference. The evaluation demonstrates that Intra-DP reduces per-inference latency by up to 50% and energy consumption by up to 75% compared to state-of-the-art baselines, without sacrificing accuracy.
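
The decomposition of a local operator into independent sub-operations can be illustrated with a 1-D convolution. This is a minimal sketch and not the Intra-DP implementation: the function names (`conv_valid`, `split_with_halo`, `conv_by_parts`) are hypothetical, and the sketch only shows that chunks with a small overlap (a "halo" of `kernel_size - 1` samples) can be processed separately and concatenated into the same result. Scheduling and overlapping computation with transmission are not modeled.

```python
import numpy as np

def conv_valid(x, kernel):
    """Reference: 1-D 'valid' cross-correlation over the whole input."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def split_with_halo(x, kernel_size, num_parts):
    """Cut x into chunks that overlap by kernel_size - 1 samples (the halo),
    so each chunk can be convolved without access to the rest of the input."""
    out_len = len(x) - kernel_size + 1
    bounds = np.linspace(0, out_len, num_parts + 1, dtype=int)
    return [x[start:stop + kernel_size - 1]
            for start, stop in zip(bounds[:-1], bounds[1:])]

def conv_by_parts(x, kernel, num_parts):
    """Convolve each chunk independently, then concatenate the partial outputs."""
    chunks = split_with_halo(x, len(kernel), num_parts)
    partial = [conv_valid(chunk, kernel) for chunk in chunks]
    return np.concatenate(partial)

rng = np.random.default_rng(0)
signal = rng.normal(size=32)
kernel = np.array([0.25, 0.5, 0.25])
full = conv_valid(signal, kernel)
parts = conv_by_parts(signal, kernel, num_parts=4)
```

Because each chunk depends only on its own slice of the input, the partial results could in principle be computed on different devices or while other chunks are still in transit, which is the property the abstract attributes to local operators.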



Article 548

Title@2025-07-08 (2): The Impact of Prompt Programming on Function-Level Code Generation

Title: The Impact of Prompt Programming on Function-Level Code Generation Die Auswirkungen der Prompt-Programmierung auf die Code-Generierung auf Funktionsebene 迅速编制方案对职能层面代码生成的影响 2412.20545v2

Authors (4): Ranim Khojah, Francisco Gomes de Oliveira Neto, Mazen Mohamad, Philipp Leitner

Large Language Models (LLMs) are increasingly used by software engineers for code generation. However, limitations of LLMs such as irrelevant or incorrect code have highlighted the need for prompt programming (or prompt engineering) where engineers apply specific prompt techniques (e.g., chain-of-thought or input-output examples) to improve the generated code. While some prompt techniques have been studied, the impact of different techniques – and their interactions – on code generation is still not fully understood. In this study, we introduce CodePromptEval, a dataset of 7072 prompts designed to evaluate five prompt techniques (few-shot, persona, chain-of-thought, function signature, list of packages) and their effect on the correctness, similarity, and quality of complete functions generated by three LLMs (GPT-4o, Llama3, and Mistral). Our findings show that while certain prompt techniques significantly influence the generated code, combining multiple techniques does not necessarily improve the outcome. Additionally, we observed a trade-off between correctness and quality when using prompt techniques. Our dataset and replication package enable future research on improving LLM-generated code and evaluating new prompt techniques.



Article 549

Title@2025-07-08 (2): Accelerating Large-Scale Regularized High-Order Tensor Recovery

Title: Accelerating Large-Scale Regularized High-Order Tensor Recovery Beschleunigung der großflächigen Regularisierung der High-Order Tensor Erholung 加速大规模正规化高阶天文恢复 2506.09594v2

Authors (4): Wenjin Qin, Hailin Wang, Jingyao Hou, Jianjun Wang

Existing tensor recovery methods fail to recognize the impact of tensor scale variations on their structural characteristics. Furthermore, existing studies face prohibitive computational costs when dealing with large-scale high-order tensor data. To alleviate these issues, assisted by the Krylov subspace iteration, block Lanczos bidiagonalization process, and random projection strategies, this article first devises two fast and accurate randomized algorithms for the low-rank tensor approximation (LRTA) problem. Theoretical bounds on the accuracy of the approximation error estimate are established. Next, we develop a novel generalized nonconvex modeling framework tailored to large-scale tensor recovery, in which a new regularization paradigm is exploited to achieve insightful prior representation for large-scale tensors. On the basis of the above, we further investigate new unified nonconvex models and efficient optimization algorithms, respectively, for several typical high-order tensor recovery tasks in unquantized and quantized situations. To render the proposed algorithms practical and efficient for large-scale tensor data, the proposed randomized LRTA schemes are integrated into their central and time-intensive computations. Finally, we conduct extensive experiments on various large-scale tensors, whose results demonstrate the practicability, effectiveness and superiority of the proposed method in comparison with some state-of-the-art approaches.



Article 550

Title@2025-07-08 (2): Fair Domain Generalization: An Information-Theoretic View

Title: Fair Domain Generalization: An Information-Theoretic View Fair Domain Generalization: Eine informationstheoretische Ansicht 公平域一般化:信息理论观点 2507.05823v1

Authors (5): Tangzheng Lian, Guanyu Hu, Dimitrios Kollias, Xinyu Yang, Oya Celiktutan

Domain generalization (DG) and algorithmic fairness are two critical challenges in machine learning. However, most DG methods focus only on minimizing expected risk in the unseen target domain without considering algorithmic fairness. Conversely, fairness methods typically do not account for domain shifts, so the fairness achieved during training may not generalize to unseen test domains. In this work, we bridge these gaps by studying the problem of Fair Domain Generalization (FairDG), which aims to minimize both expected risk and fairness violations in unseen target domains. We derive novel mutual information-based upper bounds for expected risk and fairness violations in multi-class classification tasks with multi-group sensitive attributes. These bounds provide key insights for algorithm design from an information-theoretic perspective. Guided by these insights, we introduce PAFDG (Pareto-Optimal Fairness for Domain Generalization), a practical framework that solves the FairDG problem and models the utility-fairness trade-off through Pareto optimization. Experiments on real-world vision and language datasets show that PAFDG achieves superior utility-fairness trade-offs compared to existing methods.



Article 551

Title@2025-07-08 (2): Just Say Better or Worse: A Human-AI Collaborative Framework for Medical Image Segmentation Without Manual Annotations

Title: Just Say Better or Worse: A Human-AI Collaborative Framework for Medical Image Segmentation Without Manual Annotations Einfach besser oder schlechter sagen: Ein Mensch-AI-Kollaborativer Rahmen für medizinische Bildsegmentierung ohne manuelle Anmerkungen 说好说坏说好说好说坏:未经人工说明的医学图像分割人类-大赦国际协作框架 2507.05815v1

Authors (1): Yizhe Zhang

Manual annotation of medical images is a labor-intensive and time-consuming process, posing a significant bottleneck in the development and deployment of robust medical imaging AI systems. This paper introduces a novel Human-AI collaborative framework for medical image segmentation that substantially reduces the annotation burden by eliminating the need for explicit manual pixel-level labeling. The core innovation lies in a preference learning paradigm, where human experts provide minimal, intuitive feedback – simply indicating whether an AI-generated segmentation is better or worse than a previous version. The framework comprises four key components: (1) an adaptable foundation model (FM) for feature extraction, (2) label propagation based on feature similarity, (3) a clicking agent that learns from human better-or-worse feedback to decide where to click and with which label, and (4) a multi-round segmentation learning procedure that trains a state-of-the-art segmentation network using pseudo-labels generated by the clicking agent and FM-based label propagation. Experiments on three public datasets demonstrate that the proposed approach achieves competitive segmentation performance using only binary preference feedback, without requiring experts to annotate the images manually.



Article 552

Title@2025-07-08 (2): Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs

Title: Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs Konzeptbasierte mechanistische Interpretationsfähigkeit mit strukturierten Wissensgraphen 使用结构化知识图 2507.05810v1

Authors (4): Sofiia Chorna, Kateryna Tarelkina, Eloïse Berthier, Gianni Franchi

While concept-based interpretability methods have traditionally focused on local explanations of neural network predictions, we propose a novel framework and interactive tool that extends these methods into the domain of mechanistic interpretability. Our approach enables a global dissection of model behavior by analyzing how high-level semantic attributes (referred to as concepts) emerge, interact, and propagate through internal model components. Unlike prior work that isolates individual neurons or predictions, our framework systematically quantifies how semantic concepts are represented across layers, revealing latent circuits and information flow that underlie model decision-making. A key innovation is our visualization platform that we named BAGEL (for Bias Analysis with a Graph for global Explanation Layers), which presents these insights in a structured knowledge graph, allowing users to explore concept-class relationships, identify spurious correlations, and enhance model trustworthiness. Our framework is model-agnostic, scalable, and contributes to a deeper understanding of how deep learning models generalize (or fail to) in the presence of dataset biases. The demonstration is available at https://knowledge-graph-ui-4a7cb5.gitlab.io/.



Article 553

Title@2025-07-08 (2): Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters

Title: Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters Verbesserung der Robustheit von Foundation-Modellen in der Domänenanpassung mit Soup-Adaptern 改进与Soup-Adapters公司在适应域域域方面的基础模型的牢固性 2507.05807v1

Authors (1): Marco Roschkowski

In this paper, we tackle two fundamental problems in few-shot domain adaptation of foundation models. First, hyperparameter tuning is often impractical due to the lack of large validation datasets. Second, model robustness under distribution shifts, where test-time data deviates slightly from training distributions, remains a concern. We show that by training multiple independent adapters and averaging their outputs, the new model achieves higher performance and is more robust to distribution shifts compared to any individual adapter. This improvement holds even when the adapters are trained with diverse hyperparameters sampled from a wide range, resulting in varied individual performance. Consequently, our method addresses both of the problems described above. The ensemble is also significantly less sensitive to the residual ratio, a critical hyperparameter of CLIP-Adapter. Since the ensemble can be reparameterized to a single adapter again using a principled concatenation of the parameters, we refer to our method as Soup-Adapter. This is also the first study to explore CLIP adapter-style techniques for DINOv2 and to directly compare them with CLIP in this setting.
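
The claim that an averaged ensemble of adapters can be folded back into a single adapter by concatenating parameters can be checked numerically. The sketch below assumes a simplified adapter (linear, ReLU, linear, no biases and no residual mixing), which may differ from the architecture used in the paper; `adapter`, `ensemble_output`, and `merge_adapters` are illustrative names.

```python
import numpy as np

def adapter(x, w_down, w_up):
    """Assumed two-layer bottleneck adapter: linear -> ReLU -> linear (no biases)."""
    return np.maximum(x @ w_down, 0.0) @ w_up

def ensemble_output(x, adapters):
    """Average the outputs of independently trained adapters."""
    return np.mean([adapter(x, wd, wu) for wd, wu in adapters], axis=0)

def merge_adapters(adapters):
    """Fold the ensemble into one wider adapter.

    Down-projections are concatenated along the hidden axis; up-projections are
    stacked along the same axis and scaled by 1/K, so summing over all hidden
    units reproduces the ensemble mean.
    """
    k = len(adapters)
    w_down = np.concatenate([wd for wd, _ in adapters], axis=1)
    w_up = np.concatenate([wu for _, wu in adapters], axis=0) / k
    return w_down, w_up

rng = np.random.default_rng(0)
dim, hidden, k = 16, 4, 3
adapters = [(rng.normal(size=(dim, hidden)), rng.normal(size=(hidden, dim)))
            for _ in range(k)]
x = rng.normal(size=(5, dim))
merged_down, merged_up = merge_adapters(adapters)
```

Under these assumptions the merged adapter has hidden width `k * hidden` and produces the same output as the averaged ensemble, so inference needs only one forward pass through a wider adapter.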



Article 554

Title@2025-07-08 (2): Predicting Graph Structure via Adapted Flux Balance Analysis

Title: Predicting Graph Structure via Adapted Flux Balance Analysis Vorhersage der Graphenstruktur über angepasste Flux-Balance-Analyse 通过经调整的通量平衡分析实现的预测图结构 2507.05806v1

Authors (4): Sevvandi Kandanaarachchi, Ziqi Xu, Stefan Westerlund, Conrad Sanderson

Many dynamic processes such as telecommunication and transport networks can be described through discrete time series of graphs. Modelling the dynamics of such time series enables prediction of graph structure at future time steps, which can be used in applications such as detection of anomalies. Existing approaches for graph prediction have limitations such as assuming that the vertices do not change between consecutive graphs. To address this, we propose to exploit time series prediction methods in combination with an adapted form of flux balance analysis (FBA), a linear programming method originating from biochemistry. FBA is adapted to incorporate various constraints applicable to the scenario of growing graphs. Empirical evaluations on synthetic datasets (constructed via Preferential Attachment model) and real datasets (UCI Message, HePH, Facebook, Bitcoin) demonstrate the efficacy of the proposed approach.



Article 555

Title@2025-07-08 (2): Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

Title: Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps Feinabstimmungs-Diffusions-Politiken mit Backpropagation durch Diffusion Zeitschritte 微调通过扩散时间步骤进行反向宣传的传播政策 2505.10482v3

Authors (5): Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu

Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, are capable of learning diverse skills from demonstration data due to their high representation power. However, the sub-optimal and limited coverage of demonstration data could lead to diffusion policies that generate sub-optimal trajectories and even catastrophic failures. While reinforcement learning (RL)-based fine-tuning has emerged as a promising solution to address these limitations, existing approaches struggle to effectively adapt Proximal Policy Optimization (PPO) to diffusion models. This challenge stems from the computational intractability of action likelihood estimation during the denoising process, which leads to complicated optimization objectives. In our experiments starting from randomly initialized policies, we find that online tuning of Diffusion Policies demonstrates much lower sample efficiency compared to directly applying PPO on MLP policies (MLP+PPO). To address these challenges, we introduce NCDPO, a novel framework that reformulates Diffusion Policy as a noise-conditioned deterministic policy. By treating each denoising step as a differentiable transformation conditioned on pre-sampled noise, NCDPO enables tractable likelihood evaluation and gradient backpropagation through all diffusion timesteps. Our experiments demonstrate that NCDPO achieves sample efficiency comparable to MLP+PPO when training from scratch, outperforming existing methods in both sample efficiency and final performance across diverse benchmarks, including continuous robot control and multi-agent game scenarios. Furthermore, our experimental results show that our method is robust to the number of denoising timesteps in the Diffusion Policy.



Article 556

Title@2025-07-08 (2): An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems

Title: An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems Ein Ensemble-Embedding-Ansatz zur Verbesserung der semantischen Caching-Performance in LLM-basierten Systemen 提高基于LLM系统的语义缓冲性能的结合嵌入方法 2507.07061v1

Authors (3): Shervin Ghaffari, Zohre Bahranifard, Mohammad Akbari

Semantic caching enhances the efficiency of large language model (LLM) systems by identifying semantically similar queries, storing responses once, and serving them for subsequent equivalent requests. However, existing semantic caching frameworks rely on single embedding models for query representation, which limits their ability to capture the diverse semantic relationships present in real-world query distributions. This paper presents an ensemble embedding approach that combines multiple embedding models through a trained meta-encoder to improve semantic similarity detection in LLM caching systems. We evaluate our method using the Quora Question Pairs (QQP) dataset, measuring cache hit ratios, cache miss ratios, token savings, and response times. Our ensemble approach achieves a 92% cache hit ratio for semantically equivalent queries while maintaining an 85% accuracy in correctly rejecting non-equivalent queries as cache misses. These results demonstrate that ensemble embedding methods significantly outperform single-model approaches in distinguishing between semantically similar and dissimilar queries, leading to more effective caching performance and reduced computational overhead in LLM-based systems.
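
A toy version of the lookup path can make the hit/miss mechanics concrete. Everything in this sketch is an assumption for illustration: the two hashed bag-of-token "embedders" stand in for real embedding models, plain concatenation with renormalization stands in for the trained meta-encoder described in the abstract, and the threshold value and the `SemanticCache` class are hypothetical.

```python
import zlib
import numpy as np

def _hashed_bag(tokens, dim):
    """Deterministic hashed bag-of-tokens vector, L2-normalized."""
    vec = np.zeros(dim)
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def embed_words(text, dim=256):
    """Stand-in embedder 1: word-level features."""
    return _hashed_bag(text.lower().split(), dim)

def embed_char_trigrams(text, dim=256):
    """Stand-in embedder 2: character-trigram features."""
    t = text.lower()
    return _hashed_bag([t[i:i + 3] for i in range(len(t) - 2)], dim)

def ensemble_embed(text):
    """Combine both views; concatenation replaces the trained meta-encoder here."""
    vec = np.concatenate([embed_words(text), embed_char_trigrams(text)])
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum cosine similarity for a hit
        self.keys = []
        self.values = []

    def put(self, query, response):
        self.keys.append(ensemble_embed(query))
        self.values.append(response)

    def get(self, query):
        """Return the cached response of the nearest stored query, or None on a miss."""
        if not self.keys:
            return None
        sims = np.stack(self.keys) @ ensemble_embed(query)
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

cache = SemanticCache(threshold=0.8)
cache.put("How do I bake bread at home?", "cached answer about bread")
```

The threshold trades off the two quantities the abstract reports: lowering it raises the hit ratio for paraphrases but also the chance of serving a cached answer to a non-equivalent query.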



Article 557

Title@2025-07-08 (2): MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training

Title: MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training MAMUT: Ein neuartiges Framework zur Modifizierung mathematischer Formeln für die Generierung von Spezialdatensätzen für Sprachmodellschulungen MAMUT:为语言模式培训制作专门数据集修改数学公式的新框架 2502.20855v2

Authors (3): Jonathan Drechsel, Anja Reusch, Steffen Herbold

Mathematical formulas are a fundamental and widely used component in various scientific fields, serving as a universal language for expressing complex concepts and relationships. While state-of-the-art transformer models excel in processing and understanding natural language, they encounter challenges with mathematical notation, which involves a complex structure and diverse representations. This study focuses on the development of specialized training datasets to enhance the encoding of mathematical content. We introduce Math Mutator (MAMUT), a framework capable of generating equivalent and falsified versions of a given mathematical formula in LaTeX notation, effectively capturing the mathematical variety in notation of the same concept. Based on MAMUT, we have generated four large mathematical datasets containing diverse notation. Experiments show that models trained on these datasets exhibit new SoTA performance on mathematical retrieval tasks. We publish our code, generated datasets, and pretrained mathematical models: https://github.com/aieng-lab/math-mutator.



Article 558

Title@2025-07-08 (2): Copula Density Neural Estimation

Title: Copula Density Neural Estimation Copula Dichte Neurale Schätzung Copula 密度心神经估计 2211.15353v3

Authors (3): Nunzio A. Letizia, Nicola Novello, Andrea M. Tonello

Probability density estimation from observed data constitutes a central task in statistics. In this brief, we focus on the problem of estimating the copula density associated with any observed data, as it fully describes the dependence between random variables. We separate univariate marginal distributions from the joint dependence structure in the data, the copula itself, and we model the latter with a neural network-based method referred to as copula density neural estimation (CODINE). Results show that the novel learning approach is capable of modeling complex distributions and can be applied for mutual information estimation and data generation.
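
Separating the marginals from the dependence structure is commonly done by mapping each variable through its empirical CDF, which yields so-called pseudo-observations on the unit cube. The sketch below shows only this standard preprocessing step under the assumption of no ties; it is not the CODINE estimator, and `pseudo_observations` is an illustrative name.

```python
import numpy as np

def pseudo_observations(data):
    """Map each column to (0, 1) via its empirical CDF: rank / (n + 1).

    Assumes no ties within a column (true almost surely for continuous data).
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    ranks = data.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    return ranks / (n + 1)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Second column is a strictly increasing transform of the first:
# different marginal, identical dependence structure.
samples = np.column_stack([x, np.exp(x)])
u = pseudo_observations(samples)
```

Both columns of `u` coincide even though the marginals differ, which illustrates why the copula describes dependence independently of the univariate distributions.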



Article 559

Title@2025-07-08 (2): Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning

Title: Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning Robuste Bandbreitenschätzung für Echtzeit-Kommunikation mit Offline-Verstärkung 与离线强化学习进行实时通信的强力波段测算 2507.05785v1

Authors (5): Jian Kai, Tianwei Zhang, Zihan Ling, Yang Cao, Can Shen

Accurate bandwidth estimation (BWE) is critical for real-time communication (RTC) systems. Traditional heuristic approaches offer limited adaptability under dynamic networks, while online reinforcement learning (RL) suffers from high exploration costs and potential service disruptions. Offline RL, which leverages high-quality data collected from real-world environments, offers a promising alternative. However, challenges such as out-of-distribution (OOD) actions, policy extraction from behaviorally diverse datasets, and reliable deployment in production systems remain unsolved. We propose RBWE, a robust bandwidth estimation framework based on offline RL that integrates Q-ensemble (an ensemble of Q-functions) with a Gaussian mixture policy to mitigate OOD risks and enhance policy learning. A fallback mechanism ensures deployment stability by switching to heuristic methods under high uncertainty. Experimental results show that RBWE reduces overestimation errors by 18% and improves the 10th percentile Quality of Experience (QoE) by 18.6%, demonstrating its practical effectiveness in real-world RTC applications.
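
The fallback idea can be sketched as a simple decision rule. The abstract does not specify how uncertainty is measured, so this example assumes it is the standard deviation across the Q-ensemble for the selected action; the function name `select_bandwidth`, the discrete candidate set, and the threshold `max_std` are all hypothetical and not taken from RBWE.

```python
import numpy as np

def select_bandwidth(q_values, candidate_bw, heuristic_bw, max_std):
    """Pick a bandwidth estimate, falling back to a heuristic when critics disagree.

    q_values: array of shape (n_critics, n_candidates) with ensemble Q-estimates.
    candidate_bw: bandwidth value (e.g. in kbps) for each candidate action.
    heuristic_bw: estimate from a conventional heuristic estimator.
    max_std: assumed uncertainty threshold on ensemble disagreement.
    """
    q_values = np.asarray(q_values, dtype=float)
    best = int(np.argmax(q_values.mean(axis=0)))   # action with highest mean Q
    uncertainty = q_values[:, best].std()          # disagreement among critics
    if uncertainty > max_std:
        return heuristic_bw, "fallback"
    return candidate_bw[best], "policy"

candidates = [500.0, 1000.0, 2000.0]
agree = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.1], [1.0, 2.0, 2.9]]
disagree = [[1.0, 2.0, 9.0], [1.0, 2.0, 0.0], [1.0, 2.0, 3.0]]
confident = select_bandwidth(agree, candidates, heuristic_bw=800.0, max_std=0.5)
uncertain = select_bandwidth(disagree, candidates, heuristic_bw=800.0, max_std=0.5)
```

When the critics agree, the learned choice is used; when they diverge on the preferred action, the heuristic estimate is returned instead, which is one plausible way to keep out-of-distribution states from driving the estimate.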



Article 560

Title@2025-07-08 (2): From Motion to Meaning: Biomechanics-Informed Neural Network for Explainable Cardiovascular Disease Identification

Title: From Motion to Meaning: Biomechanics-Informed Neural Network for Explainable Cardiovascular Disease Identification Von der Bewegung zur Bedeutung: Biomechanik-informiertes Neuronales Netzwerk zur erklärbaren Identifizierung von Herz-Kreislauf-Erkrankungen 从运动到意义:用于可解释心血管疾病识别的生物机械-内建神经网络 2507.05783v1

Authors (4): Comte Valentin, Gemma Piella, Mario Ceresa, Miguel A. Gonzalez Ballester

Cardiac diseases are among the leading causes of morbidity and mortality worldwide, which requires accurate and timely diagnostic strategies. In this study, we introduce an innovative approach that combines deep learning image registration with physics-informed regularization to predict the biomechanical properties of moving cardiac tissues and extract features for disease classification. We utilize the energy strain formulation of Neo-Hookean material to model cardiac tissue deformations, optimizing the deformation field while ensuring its physical and biomechanical coherence. This explainable approach not only improves image registration accuracy, but also provides insights into the underlying biomechanical processes of the cardiac tissues. Evaluation on the Automated Cardiac Diagnosis Challenge (ACDC) dataset achieved Dice scores of 0.945 for the left ventricular cavity, 0.908 for the right ventricular cavity, and 0.905 for the myocardium. Subsequently, we estimate the local strains within the moving heart and extract a detailed set of features used for cardiovascular disease classification. We evaluated five classification algorithms, Logistic Regression, Multi-Layer Perceptron, Support Vector Classifier, Random Forest, and Nearest Neighbour, and identified the most relevant features using a feature selection algorithm. The best performing classifier obtained a classification accuracy of 98% in the training set and 100% in the test set of the ACDC dataset. By integrating explainable artificial intelligence, this method empowers clinicians with a transparent understanding of the model’s predictions based on cardiac mechanics, while also significantly improving the accuracy and reliability of cardiac disease diagnosis, paving the way for more personalized and effective patient care.
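
The Dice scores reported above measure the overlap between a predicted and a reference mask as twice the intersection divided by the sum of the two mask sizes. The following is a minimal sketch of that standard metric for binary masks; the empty-mask convention and the name `dice_score` are choices made here, not details from the paper.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, target).sum() / denom

pred_mask = np.array([[1, 1, 0], [0, 0, 0]])
true_mask = np.array([[1, 0, 0], [0, 0, 0]])
score = dice_score(pred_mask, true_mask)  # 2 * 1 / (2 + 1)
```

For multi-structure segmentation such as the left ventricle, right ventricle, and myocardium, the metric is typically computed once per structure on the corresponding binary mask.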



Article 561

Title@2025-07-08 (2): On the relation between trainability and dequantization of variational quantum learning models

Title: On the relation between trainability and dequantization of variational quantum learning models Über die Beziehung zwischen Lernfähigkeit und Dequantisierung von variationalen Quanten-Lernmodellen 关于变化量量学习模型的可培训性和量化之间的关系 2406.07072v3

Authors (4): Elies Gil-Fuster, Casper Gyurik, Adrián Pérez-Salinas, Vedran Dunjko

The quest for successful variational quantum machine learning (QML) relies on the design of suitable parametrized quantum circuits (PQCs), as analogues to neural networks in classical machine learning. Successful QML models must fulfill the properties of trainability and non-dequantization, among others. Recent works have highlighted an intricate interplay between trainability and dequantization of such models, which is still unresolved. In this work we contribute to this debate from the perspective of machine learning, proving a number of results identifying, among other things, when trainability and non-dequantization are not mutually exclusive. We begin by providing a number of new, somewhat broader definitions of the relevant concepts, compared to what is found in other literature, which are operationally motivated and consistent with prior art. With these precise definitions given and motivated, we then study the relation between trainability and dequantization of variational QML. Next, we also discuss the degrees of “variationalness” of QML models, where we distinguish between models like the hardware efficient ansatz and quantum kernel methods. Finally, we introduce recipes for building PQC-based QML models which are both trainable and nondequantizable, and which correspond to different degrees of variationalness. We do not address the practical utility of such models. Our work does, however, point toward a way forward for finding more general constructions, for which finding applications may become feasible.



Article 562

Title@2025-07-08 (2): NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Title: NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models NoWag: Ein einheitliches Rahmenwerk für die Erhaltung von Formen für die Kompression großer Sprachmodelle NoWag:大型语言模型形状保留压缩统一框架 2504.14569v2

Authors (6): Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang

Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for zero-shot shape-preserving compression algorithms. We compressed Llama-2 7B/13B/70B and Llama-3 8B/70B models, using two popular forms of shape-preserving compression: vector quantization, NoWag-VQ (NoWag for Vector Quantization), and unstructured/semi-structured pruning, NoWag-P (NoWag for Pruning). We found that NoWag-VQ significantly outperforms state-of-the-art zero-shot VQ, and that NoWag-P performs competitively against state-of-the-art methods. These results suggest commonalities between these compression paradigms that could inspire future work. Our code is available at https://github.com/LawrenceRLiu/NoWag



Article 563

Title@2025-07-08 (2): Efficient Risk-sensitive Planning via Entropic Risk Measures

Title: Efficient Risk-sensitive Planning via Entropic Risk Measures Effiziente risikosensible Planung über entropische Risikomaßnahmen 通过整体风险措施进行对风险有敏感认识的有效规划 2502.20423v2

Authors (4): Alexandre Marthe, Samuel Bounan, Aurélien Garivier, Claire Vernade

Risk-sensitive planning aims to identify policies maximizing some tail-focused metrics in Markov Decision Processes (MDPs). Such an optimization task can be very costly for the most widely used and interpretable metrics such as threshold probabilities or (Conditional) Values at Risk. Indeed, previous work showed that only Entropic Risk Measures (EntRM) can be efficiently optimized through dynamic programming, leaving a hard-to-interpret parameter to choose. We show that the computation of the full set of optimal policies for EntRM across parameter values leads to tight approximations for the metrics of interest. We prove that this optimality front can be computed effectively thanks to a novel structural analysis and smoothness properties of entropic risks. Empirical results demonstrate that our approach achieves strong performance in a variety of decision-making scenarios.
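
For reference, the entropic risk measure of a random return X with parameter β is usually written as (1/β) log E[exp(βX)], with the expectation recovered as β tends to 0. The sketch below evaluates it for a discrete return distribution over a grid of β values. The sign convention (negative β is risk-averse for returns), the example distribution, and the name `entropic_risk` are assumptions made for illustration, and the code does not implement the paper's planning algorithm.

```python
import numpy as np

def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete return X.

    beta == 0 is treated as the limit case, the plain expectation.
    A max-shift keeps the exponentials numerically stable.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if beta == 0.0:
        return float(np.dot(probs, values))
    z = beta * values
    shift = z.max()
    return float((shift + np.log(np.dot(probs, np.exp(z - shift)))) / beta)

returns = [0.0, 10.0]   # hypothetical outcomes
probs = [0.5, 0.5]
betas = np.linspace(-2.0, 2.0, 9)
front = [entropic_risk(returns, probs, b) for b in betas]
```

Sweeping β moves the value continuously from near the worst outcome (strongly negative β) through the mean (β = 0) toward the best outcome, which gives some intuition for why a family of EntRM-optimal policies across β can approximate tail-focused metrics.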



Article 564

Title@2025-07-08 (2): Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack

Title: Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack Achten Sie auf die Kosten des Gerüstes! Benign Clients können sogar Komplizen von Backdoor Attack werden 注意脚手架的成本! 仁慈的客户甚至可能成为幕后攻击的帮凶 2411.16167v3

Authors (10): Xingshuo Han, Xuanye Zhang, Xiang Lan, Haozhao Wang, Shengmin Xu, Shen Ren, Jason Zeng, Ming Wu, Michael Heinrich, Tianwei Zhang

By using a control variate to calibrate the local gradient of each client, Scaffold has been widely known as a powerful solution to mitigate the impact of data heterogeneity in Federated Learning. Although Scaffold achieves significant performance improvements, we show that this superiority is at the cost of increased security vulnerabilities. Specifically, this paper presents BadSFL, the first backdoor attack targeting Scaffold, which turns benign clients into accomplices to amplify the attack effect. The core idea of BadSFL is to uniquely tamper with the control variate to subtly steer benign clients’ local gradient updates towards the attacker’s poisoned direction, effectively turning them into unwitting accomplices and significantly enhancing the backdoor persistence. Additionally, BadSFL leverages a GAN-enhanced poisoning strategy to enrich the attacker’s dataset, maintaining high accuracy on both benign and backdoored samples while remaining stealthy. Extensive experiments demonstrate that BadSFL achieves superior attack durability, maintaining effectiveness for over 60 global rounds, lasting up to three times longer than existing baselines even after ceasing malicious model injections.



Article 565

Title@2025-07-08 (2): PSAT: Pediatric Segmentation Approaches via Adult Augmentations and Transfer Learning

Title: PSAT: Pediatric Segmentation Approaches via Adult Augmentations and Transfer Learning PSAT: Pädiatrische Segmentierungsansätze über Erwachsene Augmentationen und Transfer Learning PSAT: 通过成人增强和转移学习的儿科分解方法 2507.05764v1

Authors (5): Tristan Kirscher, Sylvain Faisan, Xavier Coubez, Loris Barrier, Philippe Meyer

Pediatric medical imaging presents unique challenges due to significant anatomical and developmental differences compared to adults. Direct application of segmentation models trained on adult data often yields suboptimal performance, particularly for small or rapidly evolving structures. To address these challenges, several strategies leveraging the nnU-Net framework have been proposed, differing along four key axes: (i) the fingerprint dataset (adult, pediatric, or a combination thereof) from which the Training Plan (including the network architecture) is derived; (ii) the Learning Set (adult, pediatric, or mixed); (iii) Data Augmentation parameters; and (iv) the Transfer learning method (finetuning versus continual learning). In this work, we introduce PSAT (Pediatric Segmentation Approaches via Adult Augmentations and Transfer learning), a systematic study that investigates the impact of these axes on segmentation performance. We benchmark the derived strategies on two pediatric CT datasets and compare them with state-of-the-art methods, including a commercial radiotherapy solution. PSAT highlights key pitfalls and provides actionable insights for improving pediatric segmentation. Our experiments reveal that a training plan based on an adult fingerprint dataset is misaligned with pediatric anatomy, resulting in significant performance degradation, especially when segmenting fine structures, and that continual learning strategies mitigate institutional shifts, thus enhancing generalization across diverse pediatric datasets. The code is available at https://github.com/ICANS-Strasbourg/PSAT.



Article 566

Title@2025-07-08 (2): Jigsaw: Training Multi-Billion-Parameter AI Weather Models with Optimized Model Parallelism

Title: Jigsaw: Training Multi-Billion-Parameter AI Weather Models with Optimized Model Parallelism Jigsaw: Training von Multi-Billion-Parameter KI-Wettermodellen mit optimierter Modellparallelität Jigsaw:以优化模型平行化优化模式培训多亿分之一的气象模型 2507.05753v1

Authors (5): Deifilia Kieckhefen, Markus Götz, Lars H. Heyen, Achim Streit, Charlotte Debus

AI-based methods have revolutionized atmospheric forecasting, with recent successes in medium-range forecasting spurring the development of climate foundation models. Accurate modeling of complex atmospheric dynamics at high spatial resolutions and longer lead times requires large neural networks and gigabyte-sized data samples, making accelerator memory and I/O-bandwidth the bottlenecks for model training. We introduce WeatherMixer, a multi-layer-perceptron-based architecture whose workload scales linearly with input size, allowing the model to learn global weather phenomena at accuracies similar to numerical weather prediction. To cope with the computational demand, we propose Jigsaw, a novel model parallelization scheme that employs both domain and tensor parallelism, eliminating memory redundancy. Jigsaw exceeds state-of-the-art performance in strong scaling in compute-communication-limited systems and achieves superscalar weak scaling in I/O-bandwidth-limited systems. We scale training to 256 GPUs, reaching peak performances of 9 and 11 PFLOPs, 23% and 28% of theoretical peaks, achieving 68% and 72% scaling efficiency versus 51% without model parallelism.



Article 567

Title@2025-07-08 (2): Pretrained Reversible Generation as Unsupervised Visual Representation Learning

Title: Pretrained Reversible Generation as Unsupervised Visual Representation Learning Pretrained Reversible Generation als unüberwachtes visuelles Repräsentationslernen 作为无人监督的视觉代表学习 2412.01787v5

Authors (7): Rongkun Xue, Jinouwen Zhang, Yazhe Niu, Dazhong Shen, Bingqi Ma, Yu Liu, Jing Yang

Recent generative models based on score matching and flow matching have significantly advanced generation tasks, but their potential in discriminative tasks remains underexplored. Previous approaches, such as generative classifiers, have not fully leveraged the capabilities of these models for discriminative tasks due to their intricate designs. We propose Pretrained Reversible Generation (PRG), which extracts unsupervised representations by reversing the generative process of a pretrained continuous generation model. PRG effectively reuses unsupervised generative models, leveraging their high capacity to serve as robust and generalizable feature extractors for downstream tasks. This framework enables the flexible selection of feature hierarchies tailored to specific downstream tasks. Our method consistently outperforms prior approaches across multiple benchmarks, achieving state-of-the-art performance among generative-model-based methods, including 78% top-1 accuracy on ImageNet at a resolution of 64×64. Extensive ablation studies, including out-of-distribution evaluations, further validate the effectiveness of our approach. PRG is available at https://github.com/opendilab/PRG.



Article 568

Title@2025-07-08 (2): Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates

Title: Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates Politikprüfung in stochastischen dynamischen Systemen mit logarithmischen Neuralzertifikaten 使用对数神经神经证书进行斯托卡动态系统的政策核查 2406.00826v3

Authors (4): Thom Badings, Wietze Koops, Sebastian Junges, Nils Jansen

We consider the verification of neural network policies for discrete-time stochastic systems with respect to reach-avoid specifications. We use a learner-verifier procedure that learns a certificate for the specification, represented as a neural network. Verifying that this neural network certificate is a so-called reach-avoid supermartingale (RASM) proves the satisfaction of a reach-avoid specification. Existing approaches for such a verification task rely on computed Lipschitz constants of neural networks. These approaches struggle with large Lipschitz constants, especially for reach-avoid specifications with high threshold probabilities. We present two key contributions to obtain smaller Lipschitz constants than existing approaches. First, we introduce logarithmic RASMs (logRASMs), which take exponentially smaller values than RASMs and hence have lower theoretical Lipschitz constants. Second, we present a fast method to compute tighter upper bounds on Lipschitz constants based on weighted norms. Our empirical evaluation shows we can consistently verify the satisfaction of reach-avoid specifications with probabilities as high as 99.9999%.
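
To make the "exponentially smaller values" claim concrete, the sketch below assumes the common RASM formulation in which the certificate is at most 1 on initial states and at least 1/(1 - p) on unsafe states for a threshold probability p. The abstract does not spell out these conditions, and reading "logarithmic" as taking the natural logarithm of the certificate value is also an assumption. The function names are illustrative.

```python
import math

def rasm_unsafe_bound(p: float) -> float:
    """Value a RASM must reach on unsafe states, assuming the common
    condition V(x) >= 1 / (1 - p) for threshold probability p."""
    return 1.0 / (1.0 - p)

def log_rasm_unsafe_bound(p: float) -> float:
    """The same bound after taking the natural log of the certificate
    (an assumed reading of 'logarithmic RASM')."""
    return -math.log(1.0 - p)

# Compare the range the certificate must span as p approaches 1.
for p in (0.9, 0.99, 0.999999):
    print(f"p={p}: RASM bound {rasm_unsafe_bound(p):.1f}, "
          f"log bound {log_rasm_unsafe_bound(p):.2f}")
```

Under these assumptions, a threshold of 99.9999% requires a plain certificate to climb from 1 to about one million, while its logarithm only has to climb from 0 to about 13.8, which suggests why smaller Lipschitz constants may suffice.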



Article 569

Title@2025-07-08 (2): A Satellite-Ground Synergistic Large Vision-Language Model System for Earth Observation

Title: A Satellite-Ground Synergistic Large Vision-Language Model System for Earth Observation Ein Satelliten-Ground Synergistisches Large Vision-Language-Modellsystem für die Erdbeobachtung 地球观测卫星地面同步卫星同步大型视野语言模型系统 2507.05731v1

Authors (6): Yuxin Zhang, Jiahao Yang, Zhe Chen, Wenjun Zhu, Jin Zhao, Yue Gao

Recently, large vision-language models (LVLMs) have unleashed powerful analysis capabilities for low Earth orbit (LEO) satellite Earth observation images in the data center. However, fast satellite motion, brief satellite-ground station (GS) contact windows, and the large size of the images pose a data download challenge. To enable near real-time Earth observation applications (e.g., disaster and extreme weather monitoring), we explore how to deploy LVLMs in LEO satellite networks and design SpaceVerse, an efficient satellite-ground synergistic LVLM inference system. To this end, we first deploy compact LVLMs on satellites for lightweight tasks, whereas regular LVLMs operate on GSs to handle computationally intensive tasks. Then, we propose a computing and communication co-design framework comprising a progressive confidence network and an attention-based multi-scale preprocessing module, used respectively to identify data suited to on-satellite inference and to reduce data redundancy before satellite-GS transmission. We implement and evaluate SpaceVerse on real-world LEO satellite constellations and datasets, achieving a 31.2% average gain in accuracy and a 51.2% reduction in latency compared to state-of-the-art baselines.



Article 570

Title@2025-07-08 (2): Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling

Title: Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling Verbesserung der LLM-Zuverlässigkeit durch explizite Wissensgrenzenmodellierung 通过明确知识边界建模增强LLM可靠性 2503.02233v3

Authors (6): Hang Zheng, Hongshen Xu, Yuncong Liu, Lu Chen, Pascale Fung, Kai Yu

Large language models (LLMs) are prone to hallucination stemming from misaligned self-awareness, particularly when processing queries exceeding their knowledge boundaries. While existing mitigation strategies employ uncertainty estimation or query rejection mechanisms, they suffer from computational inefficiency and sacrificed helpfulness. To address these issues, we propose the Explicit Knowledge Boundary Modeling (EKBM) framework, integrating fast and slow reasoning systems to harmonize reliability and usability. The framework first employs a fast-thinking model to generate confidence-labeled responses, enabling immediate utilization of high-confidence outputs, whereas uncertain predictions trigger a slow refinement model for accuracy improvement. To align model behavior with our proposed objective, we propose a hybrid training pipeline, enhancing self-awareness without degrading task performance. Evaluations on dialogue state tracking tasks demonstrate that EKBM achieves superior model reliability over uncertainty-based baselines. Further analysis reveals that refinement substantially boosts accuracy while maintaining low computational overhead. The framework establishes a scalable paradigm for deploying reliable LLMs in error-sensitive applications, effectively balancing accuracy and practical utility.



Article 571

Title@2025-07-08 (2): Multi-Channel Hypergraph Contrastive Learning for Matrix Completion

Title: Multi-Channel Hypergraph Contrastive Learning for Matrix Completion Multi-Kanal Hypergraph Kontrastives Lernen für Matrix-Vervollständigung 用于完成矩阵完成的多通道高频反竞争学习 2411.01376v2

Authors (5): Xiang Li, Changsheng Shui, Zhongying Zhao, Junyu Dong, Yanwei Yu

A rating is a typical form of explicit user feedback that directly reflects how much a user likes a related item. (Rating) matrix completion is essentially a rating prediction process and a significant problem in recommender systems. Recently, graph neural networks (GNNs) have been widely used in matrix completion, capturing users’ preferences over items by formulating a rating matrix as a bipartite graph. However, existing methods are susceptible to data sparsity and long-tail distributions in real-world scenarios. Moreover, the message-passing mechanism of GNNs makes it difficult to capture high-order correlations and constraints between nodes, which are essential in recommendation tasks. To tackle these challenges, we propose a Multi-Channel Hypergraph Contrastive Learning framework for matrix completion, named MHCL. Specifically, MHCL adaptively learns hypergraph structures to capture high-order correlations between nodes and jointly captures local and global collaborative relationships through attention-based cross-view aggregation. Additionally, to consider the magnitude and order information of ratings, we treat different rating subgraphs as different channels, encourage alignment between adjacent ratings, and further achieve mutual enhancement between different ratings through multi-channel cross-rating contrastive learning. Extensive experiments on five public datasets demonstrate that the proposed method significantly outperforms the current state-of-the-art approaches.



Article 572

Title@2025-07-08 (2): Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition

Title: Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition Omni-Router: Routing-Entscheidungen in Sparse Mixture-of-Experts für die Spracherkennung teilen Omni-Router: 分享语音识别专家的松散混集决定 2507.05724v1

Authors (3): Zijin Gu, Tatiana Likhomanenko, Navdeep Jaitly

Mixture-of-experts (MoE) architectures have expanded from language modeling to automatic speech recognition (ASR). Traditional MoE methods, such as the Switch Transformer, route experts independently within each layer. Our analysis reveals that routers in most layers make expert choices that are not strongly correlated with the choices of the routers in other layers. To increase the cooperation between experts in different layers and encourage greater specialization, we use a shared router across different MoE layers. We call this model the Omni-router Transformer. Extensive experiments on a large-scale pseudo-labeled dataset and evaluations across 10 diverse, out-of-domain ASR benchmarks demonstrate that the Omni-router Transformer is able to achieve lower training loss and consistently outperform dense and Switch Transformer models, reducing average word error rates by 11.2% and 8.2%, respectively, while providing structured expert usage and improved robustness to diverse data.
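
The difference between per-layer routing and a shared router can be illustrated with a toy top-1 router. This is a minimal sketch, not the authors' implementation: the linear router, the dimensions, and the helper names are all assumptions, and the hidden state is held fixed across layers purely to isolate the effect of sharing router weights.

```python
import random

def make_router(d_model, n_experts, rng):
    """Random linear router: one weight row per expert (illustrative only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d_model)]
            for _ in range(n_experts)]

def route_top1(router, x):
    """Return the index of the expert with the largest router logit."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    return max(range(len(logits)), key=logits.__getitem__)

rng = random.Random(0)
d_model, n_experts, n_layers = 8, 4, 3
token = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

# Switch-style: every MoE layer owns its own router weights.
independent_routers = [make_router(d_model, n_experts, rng)
                       for _ in range(n_layers)]
# Shared-router style: one set of router weights reused by every layer.
shared_router = make_router(d_model, n_experts, rng)
shared_routers = [shared_router] * n_layers

independent_choices = [route_top1(r, token) for r in independent_routers]
shared_choices = [route_top1(r, token) for r in shared_routers]
print("independent:", independent_choices)
print("shared:     ", shared_choices)
```

In a real Transformer the hidden state changes from layer to layer, so a shared router would correlate expert choices across layers rather than force them to be identical as in this fixed-input toy.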



Article 573

Title@2025-07-08 (2): Empirical Validation of the Independent Chip Model

Title: Empirical Validation of the Independent Chip Model Empirische Validierung des unabhängigen Chipmodells 独立芯片模型的经验验证 2506.00180v2

Authors (1): Juho Kim

The independent chip model (ICM) forms a cornerstone of all modern poker tournament strategy. However, despite its prominence, the ICM’s performance in the real world has not been sufficiently scrutinized, especially at a large scale. In this paper, we introduce our new dataset of poker tournaments, consisting of results of over ten thousand events. Then, using this dataset, we perform two experiments as part of a large-scale empirical validation of the ICM. First, we verify that the ICM performs more accurately than a baseline we propose. Second, we obtain empirical evidence of the ICM underestimating the performances of players with larger stacks while overestimating those who are short-stacked. Our contributions may be useful to future researchers developing new algorithms for estimating a player’s value in poker tournaments.
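
For readers unfamiliar with the ICM, the sketch below computes tournament equities with the standard Malmuth-Harville recursion, assumed here to be the variant under evaluation (the abstract does not give the formula). A player's chance of finishing first is taken to be their share of the chips, and lower places are filled recursively among the remaining players. All stacks are assumed to be positive.

```python
def icm_equities(stacks, payouts):
    """Expected prize per player under the Malmuth-Harville ICM.

    stacks:  positive chip counts, one per player
    payouts: prize for 1st place, 2nd place, ... (may be shorter than stacks)
    """
    equities = [0.0] * len(stacks)

    def recurse(remaining, place, prob):
        # Stop once every paid place has been assigned.
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            # Probability that player i takes this place, given the
            # finishing order assumed so far.
            p = prob * stacks[i] / total
            equities[i] += p * payouts[place]
            recurse([j for j in remaining if j != i], place + 1, p)

    recurse(list(range(len(stacks))), 0, 1.0)
    return equities

equities = icm_equities([50, 30, 20], [70, 30])
print([round(e, 2) for e in equities])
```

The recursion enumerates finishing orders, so its cost grows factorially with the number of paid places; it is meant for small illustrative fields only.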



Article 574

Title@2025-07-08 (2): Hierarchical Task Offloading for UAV-Assisted Vehicular Edge Computing via Deep Reinforcement Learning

Title: Hierarchical Task Offloading for UAV-Assisted Vehicular Edge Computing via Deep Reinforcement Learning Hierarchische Aufgabe Offloading für UAV-Assisted Vehicular Edge Computing über Deep Reinforcement Learning 通过深强化学习为无人驾驶航空器协助的车辆边缘计算机卸载 2507.05722v1

Authors (5): Hongbao Li, Ziye Jia, Sijie He, Kun Guo, Qihui Wu

With the emergence of compute-intensive and delay-sensitive applications in vehicular networks, unmanned aerial vehicles (UAVs) have emerged as a promising complement for vehicular edge computing due to their high mobility and flexible deployment. However, the existing UAV-assisted offloading strategies are insufficient in coordinating heterogeneous computing resources and adapting to dynamic network conditions. Hence, this paper proposes a dual-layer UAV-assisted edge computing architecture based on partial offloading, composed of the relay capability of high-altitude UAVs and the computing support of low-altitude UAVs. The proposed architecture enables efficient integration and coordination of heterogeneous resources. A joint optimization problem is formulated to minimize the system delay and energy consumption while ensuring the task completion rate. To solve the high-dimensional decision problem, we reformulate the problem as a Markov decision process and propose a hierarchical offloading scheme based on the soft actor-critic algorithm. The method decouples global and local decisions: the global decisions integrate offloading ratios and trajectory planning into continuous actions, while the local scheduling is handled by a priority-based mechanism. Simulations demonstrate that the proposed approach outperforms several baselines in task completion rate, system efficiency, and convergence speed, showing strong robustness and applicability in dynamic vehicular environments.



Article 575

Title@2025-07-08 (2): MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment

Title: MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment MobileGUI-RL: Mobile GUI-Agent durch Verstärkungslernen im Online-Umfeld fördern MomGUI-RL:通过在线环境强化学习推进移动图形界面代理 2507.05720v1

Authors (8): Yucheng Shi, Wenhao Yu, Zaitang Li, Yonglin Wang, Hongming Zhang, Ninghao Liu, Haitao Mi, Dong Yu

Recently, there has been a surge of vision-based GUI agents designed to automate everyday mobile and web tasks. These agents interpret raw GUI screenshots and autonomously decide where to click, scroll, or type, which bypasses handcrafted rules and app-specific APIs. However, most existing methods train GUI agents in offline environments using pre-collected trajectories. This approach limits scalability, causes overfitting to specific UI templates, and leads to brittle policies when faced with unseen environments. We present MobileGUI-RL, a scalable framework that trains GUI agents in online environments. MobileGUI-RL contains two key components. It (i) synthesizes a curriculum of learnable tasks through self-exploration and filtering, and (ii) adapts GRPO to GUI navigation with trajectory-aware advantages and composite rewards that balance task success and execution efficiency. Experiments on three online mobile-agent benchmarks show consistent gains, validating the effectiveness of our approach.
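
As a rough illustration of component (ii), the sketch below pairs a hypothetical composite reward (success plus a bonus for short trajectories) with the standard GRPO group normalization, in which each trajectory's advantage is its reward minus the group mean, divided by the group standard deviation. The reward shape, the `efficiency_weight` value, and the function names are assumptions; the paper's trajectory-aware variant is not described in the abstract and is not reproduced here.

```python
import statistics

def composite_reward(success, steps, max_steps, efficiency_weight=0.2):
    """Hypothetical reward: 1 for task success plus a bonus that grows as
    the trajectory uses fewer of the allowed steps; 0 on failure."""
    if not success:
        return 0.0
    return 1.0 + efficiency_weight * (1.0 - steps / max_steps)

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO normalization within one group of sampled trajectories:
    (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Three rollouts of the same task: fast success, slow success, failure.
rewards = [
    composite_reward(True, steps=5, max_steps=20),
    composite_reward(True, steps=15, max_steps=20),
    composite_reward(False, steps=20, max_steps=20),
]
print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the group, a task on which every rollout succeeds or every rollout fails yields zero advantages, which is one reason a curriculum of learnable tasks matters for this kind of training.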



Article 576

Title@2025-07-08 (2): Agentic-R1: Distilled Dual-Strategy Reasoning

Title: Agentic-R1: Distilled Dual-Strategy Reasoning Agentic-R1: Destillierte Dual-Strategy-Reasoning 剂- R1: 蒸馏的双战略理由 2507.05707v1

Authors (4): Weihua Du, Pranjal Aggarwal, Sean Welleck, Yiming Yang

Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning. Our project is available at https://github.com/StigLidu/DualDistill



Article 577

Title@2025-07-08 (2): MPX: Mixed Precision Training for JAX

Title: MPX: Mixed Precision Training for JAX MPX: Mixed Precision Training für JAX MPX:JAX混合精密训练 2507.03312v2

Authors (2): Alexander Gräfe, Sebastian Trimpe

Mixed-precision training has emerged as an indispensable tool for enhancing the efficiency of neural network training in recent years. Concurrently, JAX has grown in popularity as a versatile machine learning toolbox. However, it currently lacks robust support for mixed-precision training. We propose MPX, a mixed-precision training toolbox for JAX that simplifies and accelerates the training of large-scale neural networks while preserving model accuracy. MPX seamlessly integrates with popular toolboxes such as Equinox and Flax, allowing users to convert full-precision pipelines to mixed-precision versions with minimal modifications. By casting both inputs and outputs to half precision, and introducing a dynamic loss-scaling mechanism, MPX alleviates issues like gradient underflow and overflow that commonly arise in half-precision computations. Its design inherits critical features from JAX’s type-promotion behavior, ensuring that operations take place in the correct precision and allowing for selective enforcement of full precision where needed (e.g., sums, means, or softmax). MPX further provides wrappers for automatic creation and management of mixed-precision gradients and optimizers, enabling straightforward integration into existing JAX training pipelines. MPX’s source code, documentation, and usage examples are available at github.com/Data-Science-in-Mechanical-Engineering/mixed_precision_for_JAX.
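
The dynamic loss-scaling idea can be sketched independently of any framework. The class below is a generic illustration in plain Python, not MPX's API: the class name, the default values, and the halve-on-overflow, double-after-N-good-steps policy are assumptions based on how dynamic loss scaling is commonly implemented.

```python
import math
from dataclasses import dataclass

@dataclass
class DynamicLossScale:
    """Generic dynamic loss scaler (hypothetical; not the MPX interface)."""
    scale: float = 2.0 ** 15      # factor applied to the loss before backprop
    growth_interval: int = 2000   # finite steps required before growing
    factor: float = 2.0           # multiplicative growth / backoff factor
    good_steps: int = 0           # consecutive steps with finite gradients

    def unscale(self, scaled_grads):
        """Divide gradients of the scaled loss by the current scale."""
        return [g / self.scale for g in scaled_grads]

    def update(self, scaled_grads):
        """Adjust the scale; return True if the optimizer step should run."""
        if not all(math.isfinite(g) for g in scaled_grads):
            # Overflow: shrink the scale and skip this step.
            self.scale = max(self.scale / self.factor, 1.0)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            # A long run of finite gradients: try a larger scale.
            self.scale *= self.factor
            self.good_steps = 0
        return True

scaler = DynamicLossScale(scale=8.0, growth_interval=2)
print(scaler.update([float("inf"), 1.0]), scaler.scale)  # overflow step
print(scaler.update([0.5, 1.0]), scaler.scale)           # finite step
```

Scaling the loss up keeps small gradients from underflowing in half precision, while the backoff on non-finite gradients guards against overflow; the unscaled gradients are what the optimizer would consume.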



Article 578

Title@2025-07-08 (2): A COMPASS to Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution

Title: A COMPASS to Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution Ein COMPASS zu Modellvergleich und simulationsbasierter Schlussfolgerung in der galaktischen chemischen Evolution 银河化学进化中以模型比较和模拟为基础的模拟推论的合成系统 2507.05060v2

Authors (3): Berkay Gunes, Sven Buder, Tobias Buck

We present COMPASS, a novel simulation-based inference framework that combines score-based diffusion models with transformer architectures to jointly perform parameter estimation and Bayesian model comparison across competing Galactic Chemical Evolution (GCE) models. COMPASS handles high-dimensional, incomplete, and variable-size stellar abundance datasets. Applied to high-precision elemental abundance measurements, COMPASS evaluates 40 combinations of nucleosynthetic yield tables. The model strongly favours Asymptotic Giant Branch yields from NuGrid and core-collapse SN yields used in the IllustrisTNG simulation, achieving near-unity cumulative posterior probability. Using the preferred model, we infer a steep high-mass IMF slope and an elevated Supernova Ia normalization, consistent with prior solar neighbourhood studies but now derived from fully amortized Bayesian inference. Our results demonstrate that modern SBI methods can robustly constrain uncertain physics in astrophysical simulators and enable principled model selection when analysing complex, simulation-based data.



Article 579

Title@2025-07-08 (2): AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Title: AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs AutoTriton: Automatische Triton-Programmierung mit Verstärkungslernen in LLMs AutoTriton: 自动三联编程,在LLMs中进行强化学习 2507.05687v1

Authors (11): Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun

Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting low-level details, developers must still manually tune critical parameters such as tile sizes and memory access patterns through iterative experimentation, creating substantial barriers to optimal performance and wider adoption. In this work, we introduce AutoTriton, the first model dedicated to Triton programming powered by reinforcement learning (RL). AutoTriton first performs supervised fine-tuning (SFT), using a high-quality data gathering pipeline, to acquire essential Triton programming expertise, and then conducts RL with the Group Relative Policy Optimization (GRPO) algorithm, combining a rule-based reward and an execution-based reward to further improve Triton programming ability. Experiments across five evaluation channels of TritonBench and KernelBench illustrate that our 8B model AutoTriton achieves performance comparable to mainstream large models, including Claude-4-Sonnet and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial role of each module within AutoTriton, including the SFT stage, the RL stage, and the reward design strategy. These findings underscore the promise of RL for automatically generating high-performance kernels, and since high-performance kernels are core components of AI systems, this breakthrough establishes an important foundation for building more efficient AI systems. The model and code will be available at https://github.com/AI9Stars/AutoTriton.



Article 580

Title@2025-07-08 (2): Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG

Title: Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG Pre-Training Graph Kontrastive Maskierte Autoencoder sind starke Destiller für EEG 在EEG中,培训前的对比图蒙面自动编码器是强大的蒸馏器 2411.19230v2

Authors (6): Xinxu Wei, Kanhao Zhao, Yong Jiao, Hua Xie, Lifang He, Yu Zhang

Effectively utilizing extensive unlabeled high-density EEG data to improve performance in scenarios with limited labeled low-density EEG data presents a significant challenge. In this paper, we address this challenge by formulating it as a graph transfer learning and knowledge distillation problem. We propose a Unified Pre-trained Graph Contrastive Masked Autoencoder Distiller, named EEG-DisGCMAE, to bridge the gap between unlabeled and labeled as well as high- and low-density EEG data. Our approach introduces a novel unified graph self-supervised pre-training paradigm, which seamlessly integrates the graph contrastive pre-training with the graph masked autoencoder pre-training. Furthermore, we propose a graph topology distillation loss function, allowing a lightweight student model trained on low-density data to learn from a teacher model trained on high-density data during pre-training and fine-tuning. This method effectively handles missing electrodes through contrastive distillation. We validate the effectiveness of EEG-DisGCMAE across four classification tasks using two clinical EEG datasets with abundant data. The source code is available at https://github.com/weixinxu666/EEG_DisGCMAE.



Article 581

Title@2025-07-08 (2): Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

Title: Training Set Reconstruction from Differentially Private Forests: How Effective is DP? Trainingsset Rekonstruktion aus unterschiedlichen Privatwäldern: Wie effektiv ist DP? 从不同的私人森林中重建培训成套培训:发展伙伴关系如何有效? 2502.05307v2

Authors (4): Alice Gorgé, Julien Ferry, Sébastien Gambs, Thibaut Vidal

Recent research has shown that machine learning models are vulnerable to privacy attacks targeting their training data. To mitigate these risks, differential privacy (DP) has become a widely adopted countermeasure, as it offers rigorous privacy protection. In this paper, we introduce a reconstruction attack targeting state-of-the-art $\varepsilon$-DP random forests. By leveraging a constraint programming model that incorporates knowledge of the forest’s structure and DP mechanism characteristics, our approach formally reconstructs the most likely dataset that could have produced a given forest. Through extensive computational experiments, we examine the interplay between model utility, privacy guarantees and reconstruction accuracy across various configurations. Our results reveal that random forests trained with meaningful DP guarantees can still leak portions of their training data. Specifically, while DP reduces the success of reconstruction attacks, the only forests fully robust to our attack exhibit predictive performance no better than a constant classifier. Building on these insights, we provide practical recommendations for the construction of DP random forests that are more resilient to reconstruction attacks and maintain non-trivial predictive performance.



Article 582

Title@2025-07-08 (2): Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach

Title: Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach Effiziente Schulung von großformatigen KI-Modellen durch Federated Mixture-of-Experts: Ein System-Level-Ansatz 通过联邦混合专家对大型AI型模式进行有效培训:全系统办法 2507.05685v1

Authors (7): Xiaobing Chen, Boyang Zhang, Xiangwei Zhou, Mingxuan Sun, Shuai Zhang, Songyang Zhang, Geoffrey Ye Li

The integration of Federated Learning (FL) and Mixture-of-Experts (MoE) presents a compelling pathway for training more powerful, large-scale artificial intelligence models (LAMs) on decentralized data while preserving privacy. However, efficient federated training of these complex MoE-structured LAMs is hindered by significant system-level challenges, particularly in managing the interplay between heterogeneous client resources and the sophisticated coordination required for numerous specialized experts. This article highlights a critical yet underexplored gap: the absence of robust quantitative strategies for dynamic client-expert alignment that holistically consider varying client capacities and the imperative for system-wide load balancing. Specifically, we propose a conceptual system design for intelligent client-expert alignment that incorporates dynamic fitness scoring, global expert load monitoring, and client capacity profiling. By tackling these systemic issues, we can unlock more scalable, efficient, and robust training mechanisms with fewer communication rounds for convergence, paving the way for the widespread deployment of large-scale federated MoE-structured LAMs in edge computing with ultra-high communication efficiency.



Article 583

Title@2025-07-08 (2): LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization

Title: LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization LoSiA: Effizientes High-Rank Feintuning über Subnet Lokalisierung und Optimierung LoSiA:通过子网本地化和优化化提高高射速精密导航效率 2507.04487v2

Authors (3): Xujia Wang, Yunjia Qi, Bin Xu

Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, significantly reduce the number of trainable parameters by introducing low-rank decomposition matrices. However, existing methods perform extensive matrix multiplications in domain specialization tasks, resulting in computational inefficiency and sub-optimal fine-tuning performance. Hence, we propose LoSiA (Low-Resources Subnet Integration Adaptation), an innovative method that dynamically localizes and optimizes critical parameters during the training process. Specifically, it identifies a sub-network using gradient sparsity analysis and optimizes it as the trainable target. This design enables effective high-rank adaptation by updating only the sub-network parameters, reducing the additional matrix multiplication. We also present LoSiA-Pro, a faster implementation of LoSiA, which reduces the training latency by about 27% compared to LoRA. Extensive evaluations show that our method achieves minimal performance drop compared to full fine-tuning, while requiring the least training time across domain specialization and common-sense reasoning tasks. Further analysis shows that LoSiA also reduces forgetting during continued training.



Article 584

Title@2025-07-08 (2): GATMesh: Clock Mesh Timing Analysis using Graph Neural Networks

Title: GATMesh: Clock Mesh Timing Analysis using Graph Neural Networks GATMesh: Uhr Mesh Timing Analyse mit Hilfe von Graph Neural Networks GATMesh:利用图形神经网络分析时钟网时间 2507.05681v1

Authors (2): Muhammad Hadir Khan, Matthew Guthaus

Clock meshes are essential in high-performance VLSI systems for minimizing skew and handling PVT variations, but analyzing them is difficult due to reconvergent paths, multi-source driving, and input mesh buffer skew. SPICE simulations are accurate but slow, while simplified models miss key effects like slew and input skew. We propose GATMesh, a Graph Neural Network (GNN)-based framework that models the clock mesh as a graph with augmented structural and physical features. Trained on SPICE data, GATMesh achieves high accuracy, with an average delay error of 5.27 ps on unseen benchmarks, and speed-ups of 47146x over multi-threaded SPICE simulation.



Article 585

Title@2025-07-08 (2): Speeding up Speculative Decoding via Sequential Approximate Verification

Title: Speeding up Speculative Decoding via Sequential Approximate Verification Beschleunigung der spekulativen Dekodierung über die sequentielle annähernde Verifikation 通过序列近似间接核查加快投机性代号 2502.04557v3

Authors (3): Meiyu Zhong, Noel Teku, Ravi Tandon

Speculative Decoding (SD) is a recently proposed technique for faster inference using Large Language Models (LLMs). SD operates by using a smaller draft LLM for autoregressively generating a sequence of tokens and a larger target LLM for parallel verification to ensure statistical consistency. However, periodic parallel calls to the target LLM for verification prevent SD from achieving even lower latencies. We propose SPRINTER, which utilizes a low-complexity verifier trained to predict whether tokens generated by a draft LLM would be accepted by the target LLM. By performing sequential approximate verification, SPRINTER does not require verification by the target LLM, which is invoked only when a token is deemed unacceptable. This reduces the number of calls to the larger LLM, achieving further speedups and lower computation cost. We present a theoretical analysis of SPRINTER, examining the statistical properties of the generated tokens, as well as the expected reduction in latency as a function of the verifier. We evaluate SPRINTER on several datasets and model pairs, demonstrating that approximate verification can still maintain high-quality generation while further reducing latency.
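
The control flow described above can be sketched with stand-in callables for the three models. This is a minimal illustration under stated assumptions, not the SPRINTER implementation: the function names are hypothetical, and the choice to let the target model supply the replacement token after a rejection is an assumption the abstract does not confirm.

```python
from typing import Callable, List, Tuple

def verifier_gated_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],
    verifier_accepts: Callable[[List[int], int], bool],
    target_next: Callable[[List[int]], int],
    max_new_tokens: int,
) -> Tuple[List[int], int]:
    """Generate with the draft model, checking each token with a cheap
    verifier and calling the expensive target model only on rejection."""
    tokens = list(prompt)
    target_calls = 0
    for _ in range(max_new_tokens):
        candidate = draft_next(tokens)
        if verifier_accepts(tokens, candidate):
            tokens.append(candidate)            # cheap path: no target call
        else:
            tokens.append(target_next(tokens))  # assumed fallback behavior
            target_calls += 1
    return tokens, target_calls

# Toy stand-ins: integer "tokens" and arbitrary deterministic rules.
tokens, calls = verifier_gated_decode(
    prompt=[0],
    draft_next=lambda t: (t[-1] + 1) % 10,
    verifier_accepts=lambda t, c: c % 4 != 0,
    target_next=lambda t: (t[-1] + 2) % 10,
    max_new_tokens=5,
)
print(tokens, calls)
```

The latency benefit in such a scheme depends on how often the verifier rejects: every accepted token avoids a target-model call, at the cost of occasionally accepting a token the target would have rejected.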



Article 586

Title@2025-07-08 (2): From Counterfactuals to Trees: Competitive Analysis of Model Extraction Attacks

Title: From Counterfactuals to Trees: Competitive Analysis of Model Extraction Attacks Von Gegenfakten zu Bäumen: Wettbewerbsanalyse von Modellextraktionsangriffen 从反事实到树木:对模拟采掘攻击的竞争分析 2502.05325v2

Authors (3): Awa Khouna, Julien Ferry, Thibaut Vidal

The advent of Machine Learning as a Service (MLaaS) has heightened the trade-off between model explainability and security. In particular, explainability techniques, such as counterfactual explanations, inadvertently increase the risk of model extraction attacks, enabling unauthorized replication of proprietary models. In this paper, we formalize and characterize the risks and inherent complexity of model reconstruction, focusing on the “oracle” queries required for faithfully inferring the underlying prediction function. We present the first formal analysis of model extraction attacks through the lens of competitive analysis, establishing a foundational framework to evaluate their efficiency. Focusing on models based on additive decision trees (e.g., decision trees, gradient boosting, and random forests), we introduce novel reconstruction algorithms that achieve provably perfect fidelity while demonstrating strong anytime performance. Our framework provides theoretical bounds on the query complexity of extracting tree-based models, offering new insights into the security vulnerabilities of their deployment.


Article 587

Title@2025-07-08 (2): Canine Clinical Gait Analysis for Orthopedic and Neurological Disorders: An Inertial Deep-Learning Approach

Title: Canine Clinical Gait Analysis for Orthopedic and Neurological Disorders: An Inertial Deep-Learning Approach Canine Clinical Gait Analysis für orthopädische und neurologische Erkrankungen: Ein inertialer Deep-Learning-Ansatz 关于矫形和神经神经病症的犬犬临床盖特分析:一种永恒的深层学习方法 2507.05671v1

Authors (6): Netta Palez, Léonie Straß, Sebastian Meller, Holger Volk, Anna Zamansky, Itzik Klein

Canine gait analysis using wearable inertial sensors is gaining attention in veterinary clinical settings, as it provides valuable insights into a range of mobility impairments. Neurological and orthopedic conditions cannot always be easily distinguished even by experienced clinicians. The current study explored and developed a deep learning approach using inertial sensor readings to assess whether neurological and orthopedic gait could facilitate gait analysis. Our investigation focused on optimizing both performance and generalizability in distinguishing between these gait abnormalities. Variations in sensor configurations, assessment protocols, and enhancements to deep learning model architectures were further suggested. Using a dataset of 29 dogs, our proposed approach achieved 96% accuracy in the multiclass classification task (healthy/orthopedic/neurological) and 82% accuracy in the binary classification task (healthy/non-healthy) when generalizing to unseen dogs. Our results demonstrate the potential of inertial-based deep learning models to serve as a practical and objective diagnostic and clinical aid to differentiate gait assessment in orthopedic and neurological conditions.


Article 588

Title@2025-07-08 (2): HRRRCast: a data-driven emulator for regional weather forecasting at convection allowing scales

Title: HRRRCast: a data-driven emulator for regional weather forecasting at convection allowing scales HRRRCast: ein datengesteuerter Emulator für regionale Wettervorhersagen bei Konvektion ermöglicht Skalen HRRRCast:数据驱动模拟器,用于在对流允许尺度范围内进行区域天气预报 2507.05658v1

Authors (8): Daniel Abdi, Isidora Jankov, Paul Madden, Vanderlei Vargas, Timothy A. Smith, Sergey Frolov, Montgomery Flora, Corey Potvin

The High-Resolution Rapid Refresh (HRRR) model is a convection-allowing model used in operational weather forecasting across the contiguous United States (CONUS). To provide a computationally efficient alternative, we introduce HRRRCast, a data-driven emulator built with advanced machine learning techniques. HRRRCast includes two architectures: a ResNet-based model (ResHRRR) and a Graph Neural Network-based model (GraphHRRR). ResHRRR uses convolutional neural networks enhanced with squeeze-and-excitation blocks and Feature-wise Linear Modulation, and supports probabilistic forecasting via the Denoising Diffusion Implicit Model (DDIM). To better handle longer lead times, we train a single model to predict multiple lead times (1h, 3h, and 6h), then use a greedy rollout strategy during inference. When evaluated on composite reflectivity over the full CONUS domain using ensembles of 3 to 10 members, ResHRRR outperforms the HRRR forecast at the light rainfall threshold (20 dBZ) and achieves competitive performance at moderate thresholds (30 dBZ). Our work advances the StormCast model of Pathak et al. [21] by: a) training on the full CONUS domain, b) using multiple lead times to improve long-range skill, c) training on analysis data instead of the +1h post-analysis data inadvertently used in StormCast, and d) incorporating future GFS states as inputs, enabling downscaling that improves long-lead accuracy. Grid-, neighborhood-, and object-based metrics confirm better storm placement, lower frequency bias, and higher success ratios than HRRR. HRRRCast ensemble forecasts also maintain sharper spatial detail, with power spectra more closely matching HRRR analysis. While GraphHRRR underperforms in its current form, it lays the groundwork for future graph-based forecasting. HRRRCast represents a step toward efficient, data-driven regional weather prediction with competitive accuracy and ensemble capability.
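
The abstract does not spell out the greedy rollout rule. One plausible reading, sketched below as an assumption, is that inference always takes the largest trained lead time that still fits, so that a long forecast needs as few autoregressive steps as possible. The function name `greedy_rollout_plan` is hypothetical.

```python
from typing import List, Sequence


def greedy_rollout_plan(
    target_hours: int, lead_times: Sequence[int] = (1, 3, 6)
) -> List[int]:
    """Split a forecast horizon into model steps, largest lead time first.

    Assumption: "greedy" means preferring the longest available step; the
    default lead times (1h, 3h, 6h) are the ones named in the abstract.
    """
    if target_hours < 0:
        raise ValueError("target_hours must be non-negative")
    plan: List[int] = []
    remaining = target_hours
    for step in sorted(lead_times, reverse=True):
        while remaining >= step:
            plan.append(step)
            remaining -= step
    if remaining != 0:
        raise ValueError("horizon cannot be reached with the given lead times")
    return plan
```

Under this reading, a 16-hour forecast would use four model calls (6h, 6h, 3h, 1h) instead of sixteen 1-hour calls, which limits the accumulation of rollout error.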


Article 589

Title@2025-07-08 (2): KAN-AD: Time Series Anomaly Detection with Kolmogorov-Arnold Networks

Title: KAN-AD: Time Series Anomaly Detection with Kolmogorov-Arnold Networks KAN-AD: Zeitreihen-Anomalieerkennung mit Kolmogorov-Arnold-Netzwerken KAN-AD: 与科尔莫戈罗夫-阿诺尔德网络的异常探测时间序列 2411.00278v3

Authors (9): Quan Zhou, Changhua Pei, Fei Sun, Jing Han, Zhengwei Gao, Dan Pei, Haiming Zhang, Gaogang Xie, Jianhui Li

Time series anomaly detection (TSAD) underpins real-time monitoring in cloud services and web systems, allowing rapid identification of anomalies to prevent costly failures. Most TSAD methods driven by forecasting models tend to overfit by emphasizing minor fluctuations. Our analysis reveals that effective TSAD should focus on modeling “normal” behavior through smooth local patterns. To achieve this, we reformulate time series modeling as approximating the series with smooth univariate functions. The local smoothness of each univariate function ensures that the fitted time series remains resilient against local disturbances. However, a direct KAN implementation proves susceptible to these disturbances due to the inherently localized characteristics of B-spline functions. We thus propose KAN-AD, replacing B-splines with truncated Fourier expansions and introducing a novel lightweight learning mechanism that emphasizes global patterns while staying robust to local disturbances. On four popular TSAD benchmarks, KAN-AD achieves an average 15% improvement in detection accuracy (with peaks exceeding 27%) over state-of-the-art baselines. Remarkably, it requires fewer than 1,000 trainable parameters, resulting in a 50% faster inference speed compared to the original KAN, demonstrating the approach’s efficiency and practical viability.
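
To illustrate why a truncated Fourier expansion stays robust to local disturbances, the sketch below fits a window with a few low-frequency terms by classical projection and scores each point by its residual. This is not KAN-AD itself (which learns its coefficients inside a network); the function names and the residual-based score are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def truncated_fourier_fit(window: Sequence[float], n_harmonics: int) -> List[float]:
    """Project a window onto a constant plus the first n_harmonics sin/cos pairs."""
    n = len(window)
    if n == 0 or not 0 <= 2 * n_harmonics < n:
        raise ValueError("need 0 <= 2 * n_harmonics < len(window)")
    mean = sum(window) / n
    fitted = [mean] * n
    for k in range(1, n_harmonics + 1):
        cos_k = [math.cos(2 * math.pi * k * t / n) for t in range(n)]
        sin_k = [math.sin(2 * math.pi * k * t / n) for t in range(n)]
        # Discrete Fourier coefficients (the basis is orthogonal on a uniform grid).
        a_k = 2.0 / n * sum(x * c for x, c in zip(window, cos_k))
        b_k = 2.0 / n * sum(x * s for x, s in zip(window, sin_k))
        for t in range(n):
            fitted[t] += a_k * cos_k[t] + b_k * sin_k[t]
    return fitted


def anomaly_scores(window: Sequence[float], n_harmonics: int = 3) -> List[float]:
    """Score each point by its distance from the smooth low-frequency fit."""
    fitted = truncated_fourier_fit(window, n_harmonics)
    return [abs(x - f) for x, f in zip(window, fitted)]
```

Because every basis function spans the whole window, a single spike can only shift the fit slightly everywhere, so most of the spike remains in the residual at its own position.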


Article 590

Title@2025-07-08 (2): MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

Title: MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation MALT Diffusion: Memory-Augmented Latent Transformers für jede Länge Video-Generation MALT 传播: 用于任何语言视频生成的内存增强的中子变换器 2502.12632v3

Authors (9): Sihyun Yu, Meera Hahn, Dan Kondratyuk, Jinwoo Shin, Agrim Gupta, José Lezama, Irfan Essa, David Ross, Jonathan Huang

Diffusion models are successful at synthesizing high-quality videos but are limited to generating short clips (e.g., 2-10 seconds). Synthesizing sustained footage (e.g., over minutes) remains an open research question. In this paper, we propose MALT Diffusion (using Memory-Augmented Latent Transformers), a new diffusion model specialized for long video generation. MALT Diffusion (or just MALT) handles long videos by subdividing them into short segments and performing segment-level autoregressive generation. To achieve this, we first propose recurrent attention layers that encode multiple segments into a compact memory latent vector; by maintaining this memory vector over time, MALT is able to condition on it and continuously generate new footage based on a long temporal context. We also present several training techniques that enable the model to generate frames over a long horizon with consistent quality and minimal degradation. We validate the effectiveness of MALT through experiments on long video benchmarks. We first perform an extensive analysis of MALT's long-context understanding capability and stability using popular long video benchmarks. For example, MALT achieves an FVD score of 220.4 on 128-frame video generation on UCF-101, outperforming the previous state-of-the-art of 648.4. Finally, we explore MALT’s capabilities in a text-to-video generation setting and show that it can produce long videos that compare favorably with recent techniques for long text-to-video generation.


Article 591

Title@2025-07-08 (2): DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning

Title: DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning DESIGN: Verschlüsselte GNN-Inferenz über Server-Side Input Graph Pruning design:通过服务器- Side 输入图路透图加密的 GNN 推论 2507.05649v1

Authors (4): Kaixiang Zhao, Joseph Yousry Attalla, Qian Lou, Yushun Dong

Graph Neural Networks (GNNs) have achieved state-of-the-art performance in various graph-based learning tasks. However, enabling privacy-preserving GNNs in encrypted domains, such as under Fully Homomorphic Encryption (FHE), typically incurs substantial computational overhead, rendering real-time and privacy-preserving inference impractical. In this work, we propose DESIGN (EncrypteD GNN Inference via sErver-Side Input Graph pruNing), a novel framework for efficient encrypted GNN inference. DESIGN tackles the critical efficiency limitations of existing FHE GNN approaches, which often overlook input data redundancy and apply uniform computational strategies. Our framework achieves significant performance gains through a hierarchical optimization strategy executed entirely on the server: first, FHE-compatible node importance scores (based on encrypted degree statistics) are computed from the encrypted graph. These scores then guide a homomorphic partitioning process, generating multi-level importance masks directly under FHE. This dynamically generated mask facilitates both input graph pruning (by logically removing unimportant elements) and a novel adaptive polynomial activation scheme, where activation complexity is tailored to node importance levels. Empirical evaluations demonstrate that DESIGN substantially accelerates FHE GNN inference compared to state-of-the-art methods while maintaining competitive model accuracy, presenting a robust solution for secure graph analytics.
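
The server-side pipeline described above (degree-based scores, multi-level masks, then pruning and per-level activation complexity) can be sketched in plaintext. This is only an unencrypted analogue: under FHE the comparisons would themselves have to be approximated homomorphically, and all names, thresholds, and polynomial degrees below are assumptions rather than values from the paper.

```python
from typing import Dict, List, Sequence


def degree_scores(adjacency: Sequence[Sequence[int]]) -> List[int]:
    """Node importance as plain degree (row sum of a 0/1 adjacency matrix)."""
    return [sum(row) for row in adjacency]


def importance_levels(scores: Sequence[int], thresholds: Sequence[int]) -> List[int]:
    """Level = number of thresholds a score reaches; level 0 means 'prune'."""
    return [sum(score >= t for t in thresholds) for score in scores]


def plan_inference(
    adjacency: Sequence[Sequence[int]],
    thresholds: Sequence[int],
    poly_degree_by_level: Dict[int, int],
) -> Dict[int, int]:
    """Map each kept node to the activation polynomial degree for its level."""
    levels = importance_levels(degree_scores(adjacency), thresholds)
    return {
        node: poly_degree_by_level[level]
        for node, level in enumerate(levels)
        if level > 0  # level-0 nodes are logically removed from the graph
    }
```

In this sketch, low-degree nodes drop out entirely, mid-level nodes get a cheap low-degree activation, and only the most connected nodes pay for a higher-degree polynomial.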


Article 592

Title@2025-07-08 (2): Curvature-Aligned Federated Learning (CAFe): Harmonizing Loss Landscapes for Fairness Without Demographics

Title: Curvature-Aligned Federated Learning (CAFe): Harmonizing Loss Landscapes for Fairness Without Demographics Krümmungsorientiertes Federated Learning (CAFe): Harmonisierung von Verlustlandschaften für Fairness ohne Demographie CAFE: 协调丧失的景观,促进没有人口统计的公平 2404.19725v5

Authors (3): Shaily Roy, Harshit Sharma, Asif Salekin

Federated Learning (FL) enables privacy-preserving collaborative training, making it well-suited for decentralized human-sensing applications. Ensuring fairness in FL is challenging, as current methods rely on sensitive attribute knowledge, which conflicts with FL’s privacy principles. Additionally, sensitive attributes in human-sensing data may be unknown or latent. To address this, we introduce Curvature-Aligned Federated Learning (CAFe), a theoretically grounded approach that achieves fairness in FL without requiring sensitive attribute knowledge, a concept termed “Fairness without Demographics” (FWD). CAFe introduces loss-landscape curvature regularization during local training and clients’ loss-landscape sharpness-aware aggregation to align curvature both within and across clients, enabling a strong balance between higher fairness and performance. CAFe is especially suitable for real-world human-sensing FL scenarios involving single or multi-user edge devices with unknown or multiple bias factors. We validated CAFe through theoretical and empirical justifications, and comprehensive evaluations using three real-world datasets and a live real-world FL deployment with a heterogeneous testbed of resource-constrained devices. Additionally, we conduct sensitivity analyses on local training data volume, client sampling, communication overhead, resource costs, and runtime performance to demonstrate its feasibility for practical FL edge device deployment.


Article 593

Title@2025-07-08 (2): FACT: the Features At Convergence Theorem for neural networks

Title: FACT: the Features At Convergence Theorem for neural networks FACT: die Merkmale am Konvergenz-Theorem für neuronale Netzwerke FACT:神经网络在趋同理论上的特征 2507.05644v1

Authors (4): Enric Boix-Adsera, Neil Mallinar, James B. Simon, Mikhail Belkin

A central challenge in deep learning theory is to understand how neural networks learn and represent features. To this end, we prove the Features at Convergence Theorem (FACT), which gives a self-consistency equation that neural network weights satisfy at convergence when trained with nonzero weight decay. For each weight matrix $W$, this equation relates the “feature matrix” $W^\top W$ to the set of input vectors passed into the matrix during forward propagation and the loss gradients passed through it during backpropagation. We validate this relation empirically, showing that neural features indeed satisfy the FACT at convergence. Furthermore, by modifying the “Recursive Feature Machines” of Radhakrishnan et al. 2024 so that they obey the FACT, we arrive at a new learning algorithm, FACT-RFM. FACT-RFM achieves high performance on tabular data and captures various feature learning behaviors that occur in neural network training, including grokking in modular arithmetic and phase transitions in learning sparse parities.
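
A short stationarity argument shows how a relation of this kind can arise. The following is a sketch under assumptions not stated in the abstract: a single layer computing $h = Wx$, a per-example loss $\ell$, an $L_2$ penalty $\frac{\lambda}{2}\|W\|_F^2$ with $\lambda > 0$, and an exact stationary point. The paper's precise statement may differ in normalization and notation.

$$
\begin{aligned}
0 &= \nabla_W \Big( \mathbb{E}[\ell] + \tfrac{\lambda}{2}\|W\|_F^2 \Big) = \mathbb{E}\big[ g\, x^\top \big] + \lambda W, \qquad g := \frac{\partial \ell}{\partial h}, \\
\lambda\, W^\top W &= -\,\mathbb{E}\big[ W^\top g\, x^\top \big] = -\,\mathbb{E}\big[ (\nabla_x \ell)\, x^\top \big].
\end{aligned}
$$

The second line follows by left-multiplying the first by $W^\top$ and using $\nabla_x \ell = W^\top g$ for the gradient that flows back through this layer. It ties $W^\top W$ to exactly the two quantities named above: the inputs $x$ seen in the forward pass and the loss gradients from the backward pass.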


Article 594

Title@2025-07-08 (2): An empirical study of task and feature correlations in the reuse of pre-trained models

Title: An empirical study of task and feature correlations in the reuse of pre-trained models Eine empirische Untersuchung von Aufgaben- und Merkmalskorrelationen bei der Wiederverwendung von vortrainierten Modellen 经验性研究,研究在重新使用培训前模式方面的任务和特点相互关系 2506.01975v2

Authors (2): Jama Hussein Mohamud, Willie Brink

Pre-trained neural networks are commonly used and reused in the machine learning community. Alice trains a model for a particular task, and a part of her neural network is reused by Bob for a different task, often to great effect. To what can we ascribe Bob’s success? This paper introduces an experimental setup through which factors contributing to Bob’s empirical success could be studied in silico. As a result, we demonstrate that Bob might just be lucky: his task accuracy increases monotonically with the correlation between his task and Alice’s. Even when Bob has provably uncorrelated tasks and input features from Alice’s pre-trained network, he can achieve significantly better than random performance due to Alice’s choice of network and optimizer. When there is little correlation between tasks, only reusing lower pre-trained layers is preferable, and we hypothesize the converse: that the optimal number of retrained layers is indicative of task and feature correlation. Finally, we show in controlled real-world scenarios that Bob can effectively reuse Alice’s pre-trained network if there are semantic correlations between his and Alice’s task.


Article 595

Title@2025-07-08 (2): A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs

Title: A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs Laufzeit-Adaptive Transformer Neural Network Accelerator auf FPGAs FPGAs 运行时间- 适应性变革器神经网络加速器 2411.18148v3

Authors (5): Ehsan Kabir, Austin R. J. Downey, Jason D. Bakos, David Andrews, Miaoqing Huang

Transformer neural networks (TNNs) excel in natural language processing (NLP), machine translation, and computer vision (CV) without relying on recurrent or convolutional layers. However, they have high computational and memory demands, particularly on resource-constrained devices like FPGAs. Moreover, transformer models vary in processing time across applications, requiring custom models with specific parameters. Designing custom accelerators for each model is complex and time-intensive. Some existing custom accelerators lack runtime adaptability, and they often rely on sparse matrices to reduce latency. However, hardware designs become more challenging due to the need for application-specific sparsity patterns. This paper introduces ADAPTOR, a runtime-adaptive accelerator for dense matrix computations in transformer encoders and decoders on FPGAs. ADAPTOR improves the utilization of processing elements and on-chip memory, increasing parallelism and reducing latency. It incorporates efficient matrix tiling to distribute resources across FPGA platforms and is fully quantized for computational efficiency and portability. Evaluations on Xilinx Alveo U55C data center cards and embedded platforms like VC707 and ZCU102 show that our design is 1.2$\times$ and 2.87$\times$ more power efficient than the NVIDIA K80 GPU and the i7-8700K CPU respectively. Additionally, it achieves a speedup of 1.7 to 2.25$\times$ compared to some state-of-the-art FPGA-based accelerators.


Article 596

Title@2025-07-08 (2): Variational OOD State Correction for Offline Reinforcement Learning

Title: Variational OOD State Correction for Offline Reinforcement Learning Unterschiedliche OOD-Staatskorrektur für Offline-Verstärkungslernen OOOD 国家离线强化学习校正 2505.00503v3

Authors (3): Ke Jiang, Wen Jiang, Xiaoyang Tan

The performance of offline reinforcement learning is significantly impacted by the issue of state distributional shift, and out-of-distribution (OOD) state correction is a popular approach to address this problem. In this paper, we propose a novel method named Density-Aware Safety Perception (DASP) for OOD state correction. Specifically, our method encourages the agent to prioritize actions that lead to outcomes with higher data density, thereby promoting operation within, or return to, in-distribution (safe) regions. To achieve this, we optimize the objective within a variational framework that concurrently considers both the potential outcomes of decision-making and their density, thus providing crucial contextual information for safe decision-making. Finally, we validate the effectiveness and feasibility of our proposed method through extensive experimental evaluations on the offline MuJoCo and AntMaze suites.


Article 597

Title@2025-07-08 (2): Learnable quantum spectral filters for hybrid graph neural networks

Title: Learnable quantum spectral filters for hybrid graph neural networks Erlernbare Quantenspektralfilter für hybride Graphen-Neuralnetzwerke 用于混合图形神经网络的可学习量子光谱过滤器 2507.05640v1

Authors (1): Ammar Daskin

In this paper, we describe a parameterized quantum circuit that can be considered as convolutional and pooling layers for graph neural networks. The circuit incorporates the parameterized quantum Fourier circuit, where the qubit connections for the controlled gates are derived from the Laplacian operator. Specifically, we show that the eigenspace of the Laplacian operator of a graph can be approximated by using a QFT-based circuit whose connections are determined from the adjacency matrix. For an $N\times N$ Laplacian, this approach yields an approximate polynomial-depth circuit requiring only $n=\log(N)$ qubits. These types of circuits can eliminate the expensive classical computations for approximating the learnable functions of the Laplacian through Chebyshev polynomial or Taylor expansions. Using this circuit as a convolutional layer provides an $n$-dimensional probability vector that can be considered as the filtered and compressed graph signal. Therefore, the circuit along with the measurement can be considered a very efficient convolution plus pooling layer that transforms an $N$-dimensional signal input into an $n$-dimensional signal with an exponential compression. We then apply a classical neural network prediction head to the output of the circuit to construct a complete graph neural network. Since the circuit incorporates geometric structure through its graph connection-based approach, we present graph classification results for the benchmark datasets listed in the TUDataset library. Using only 1-100 learnable parameters for the quantum circuit and minimal classical layers (1000-5000 parameters) in a generic setting, the obtained results are comparable to and in some cases better than many of the baseline results, particularly in cases where geometric structure plays a significant role.


Article 598

Title@2025-07-08 (2): Learning Federated Neural Graph Databases for Answering Complex Queries from Distributed Knowledge Graphs

Title: Learning Federated Neural Graph Databases for Answering Complex Queries from Distributed Knowledge Graphs Learning Federated Neural Graph Datenbanken zur Beantwortung komplexer Abfragen aus verteilten Wissensgraphen 用于回答分布式知识图的复杂查询的学习联邦神经图数据库 2402.14609v4

Authors (9): Qi Hu, Weifeng Jiang, Haoran Li, Zihao Wang, Jiaxin Bai, Qianren Mao, Yangqiu Song, Lixin Fan, Jianxin Li

The increasing demand for deep learning-based foundation models has highlighted the importance of efficient data retrieval mechanisms. Neural graph databases (NGDBs) offer a compelling solution, leveraging neural spaces to store and query graph-structured data, thereby enabling LLMs to access precise and contextually relevant information. However, current NGDBs are constrained to single-graph operation, limiting their capacity to reason across multiple, distributed graphs. Furthermore, the lack of support for multi-source graph data in existing NGDBs hinders their ability to capture the complexity and diversity of real-world data. In many applications, data is distributed across multiple sources, and the ability to reason across these sources is crucial for making informed decisions. This limitation is particularly problematic when dealing with sensitive graph data, as directly sharing and aggregating such data poses significant privacy risks. As a result, many applications that rely on NGDBs are forced to choose between compromising data privacy or sacrificing the ability to reason across multiple graphs. To address these limitations, we propose to learn Federated Neural Graph DataBase (FedNGDB), a pioneering systematic framework that empowers privacy-preserving reasoning over multi-source graph data. FedNGDB leverages federated learning to collaboratively learn graph representations across multiple sources, enriching relationships between entities, and improving the overall quality of graph data.


Article 599

Title@2025-07-08 (2): Graph Learning

Title: Graph Learning Graphisches Lernen 图表学习 2507.05636v1

Authors (8): Feng Xia, Ciyuan Peng, Jing Ren, Falih Gozi Febrinanto, Renqiang Luo, Vidya Saikrishna, Shuo Yu, Xiangjie Kong

Graph learning has rapidly evolved into a critical subfield of machine learning and artificial intelligence (AI). Its development began with early graph-theoretic methods, gaining significant momentum with the advent of graph neural networks (GNNs). Over the past decade, progress in scalable architectures, dynamic graph modeling, multimodal learning, generative AI, explainable AI (XAI), and responsible AI has broadened the applicability of graph learning to various challenging environments. Graph learning is significant due to its ability to model complex, non-Euclidean relationships that traditional machine learning struggles to capture, thus better supporting real-world applications ranging from drug discovery and fraud detection to recommender systems and scientific reasoning. However, challenges like scalability, generalization, heterogeneity, interpretability, and trustworthiness must be addressed to unlock its full potential. This survey provides a comprehensive introduction to graph learning, focusing on key dimensions including scalable, temporal, multimodal, generative, explainable, and responsible graph learning. We review state-of-the-art techniques for efficiently handling large-scale graphs, capturing dynamic temporal dependencies, integrating heterogeneous data modalities, generating novel graph samples, and enhancing interpretability to foster trust and transparency. We also explore ethical considerations, such as privacy and fairness, to ensure responsible deployment of graph learning models. Additionally, we identify and discuss emerging topics, highlighting recent integration of graph learning and other AI paradigms and offering insights into future directions. This survey serves as a valuable resource for researchers and practitioners seeking to navigate the rapidly evolving landscape of graph learning.


Article 600

Title@2025-07-08 (2): DeepCell: Self-Supervised Multiview Fusion for Circuit Representation Learning

Title: DeepCell: Self-Supervised Multiview Fusion for Circuit Representation Learning DeepCell: Selbstüberwachte Multiview-Fusion für Circuit Representative Learning DeepCell: 巡回代表学习自上式多视角集成 2502.06816v2

Authors (10): Zhengyuan Shi, Chengyu Ma, Ziyang Zheng, Lingfeng Zhou, Hongyang Pan, Wentao Jiang, Fan Yang, Xiaoyan Yang, Zhufei Chu, Qiang Xu

We introduce DeepCell, a novel circuit representation learning framework that effectively integrates multiview information from both And-Inverter Graphs (AIGs) and Post-Mapping (PM) netlists. At its core, DeepCell employs a self-supervised Mask Circuit Modeling (MCM) strategy, inspired by masked language modeling, to fuse complementary circuit representations from different design stages into unified and rich embeddings. To our knowledge, DeepCell is the first framework explicitly designed for PM netlist representation learning, setting new benchmarks in both predictive accuracy and reconstruction quality. We demonstrate the practical efficacy of DeepCell by applying it to critical EDA tasks such as functional Engineering Change Orders (ECO) and technology mapping. Extensive experimental results show that DeepCell significantly surpasses state-of-the-art open-source EDA tools in efficiency and performance.


Article 601

Title@2025-07-08 (2): How Not to Detect Prompt Injections with an LLM

Title: How Not to Detect Prompt Injections with an LLM Wie man Injektionen mit einem LLM nicht erkennen kann 如何不用LLM检测快速注射 2507.05630v1

Authors (4): Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, Somesh Jha

LLM-integrated applications and agents are vulnerable to prompt injection attacks, in which adversaries embed malicious instructions within seemingly benign user inputs to manipulate the LLM’s intended behavior. Recent defenses based on $\textit{known-answer detection}$ (KAD) have achieved near-perfect performance by using an LLM to classify inputs as clean or contaminated. In this work, we formally characterize the KAD framework and uncover a structural vulnerability in its design that invalidates its core security premise. We design a methodical adaptive attack, $\textit{DataFlip}$, to exploit this fundamental weakness. It consistently evades KAD defenses with detection rates as low as $1.5\%$ while reliably inducing malicious behavior with success rates of up to $88\%$, without needing white-box access to the LLM or any optimization procedures.


Article 602

Title@2025-07-08 (2): One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP

Title: One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP Ein Surrogate an Narren: All: Universelle, übertragbare und gezielte Widersacherangriffe mit CLIP 以CLIP取代 “ 愚人Them all “ :通用、可转移和有针对性的对立攻击 2505.19840v2

Authors (4): Binyan Xu, Xilin Dai, Di Tang, Kehuan Zhang

Deep Neural Networks (DNNs) have achieved widespread success yet remain prone to adversarial attacks. Typically, such attacks either involve frequent queries to the target model or rely on surrogate models closely mirroring the target model (often trained with subsets of the target model’s training data) to achieve high attack success rates through transferability. However, in realistic scenarios where training data is inaccessible and excessive queries can raise alarms, crafting adversarial examples becomes more challenging. In this paper, we present UnivIntruder, a novel attack framework that relies solely on a single, publicly available CLIP model and publicly available datasets. By using textual concepts, UnivIntruder generates universal, transferable, and targeted adversarial perturbations that mislead DNNs into misclassifying inputs into adversary-specified classes defined by textual concepts. Our extensive experiments show that our approach achieves an Attack Success Rate (ASR) of up to 85% on ImageNet and over 99% on CIFAR-10, significantly outperforming existing transfer-based methods. Additionally, we reveal real-world vulnerabilities, showing that even without querying target models, UnivIntruder compromises image search engines like Google and Baidu with ASRs of up to 84%, and vision language models like GPT-4 and Claude-3.5 with ASRs of up to 80%. These findings underscore the practicality of our attack in scenarios where traditional avenues are blocked, highlighting the need to reevaluate security paradigms in AI applications.


Article 603

Title@2025-07-08 (2): StreamDiT: Real-Time Streaming Text-to-Video Generation

Title: StreamDiT: Real-Time Streaming Text-to-Video Generation StreamDiT: Echtzeit-Streaming Text-zu-Video-Generierung 流流DDIT:实时流动文本到视频一代 2507.03745v2

Authors (5): Akio Kodaira, Tingbo Hou, Ji Hou, Masayoshi Tomizuka, Yue Zhao

Recently, great progress has been achieved in text-to-video (T2V) generation by scaling transformer-based diffusion models to billions of parameters, which can generate high-quality videos. However, existing models typically produce only short clips offline, restricting their use in interactive and real-time applications. This paper addresses these challenges by proposing StreamDiT, a streaming video generation model. StreamDiT training is based on flow matching with an added moving buffer. We design mixed training with different partitioning schemes of buffered frames to boost both content consistency and visual quality. StreamDiT modeling is based on adaLN DiT with varying time embedding and window attention. To put the proposed method into practice, we train a StreamDiT model with 4B parameters. In addition, we propose a multistep distillation method tailored for StreamDiT. Sampling distillation is performed in each segment of a chosen partitioning scheme. After distillation, the total number of function evaluations (NFEs) is reduced to the number of chunks in a buffer. Finally, our distilled model reaches real-time performance at 16 FPS on one GPU, generating video streams at 512p resolution. We evaluate our method through both quantitative metrics and human evaluation. Our model enables real-time applications, e.g., streaming generation, interactive generation, and video-to-video. We provide video results and more examples on our project website: https://cumulo-autumn.github.io/StreamDiT/



Article 604

Title@2025-07-08 (2): DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective

Title: DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective DATENBench: Auswertung von Datensatz-Auditing in Deep Learning aus einer adversarischen Perspektive DATABENCH:从反向角度评价深学习中数据集审计 2507.05622v1

Authors (11): Shuo Shao, Yiming Li, Mengren Zheng, Zhiyang Hu, Yukun Chen, Boheng Li, Yu He, Junfeng Guo, Tianwei Zhang, Dacheng Tao, Zhan Qin

The widespread application of Deep Learning across diverse domains hinges critically on the quality and composition of training datasets. However, the common lack of disclosure regarding their usage raises significant privacy and copyright concerns. Dataset auditing techniques, which aim to determine if a specific dataset was used to train a given suspicious model, provide promising solutions for addressing these transparency gaps. While prior work has developed various auditing methods, their resilience against dedicated adversarial attacks remains largely unexplored. To bridge the gap, this paper initiates a comprehensive study evaluating dataset auditing from an adversarial perspective. We start by introducing a novel taxonomy, classifying existing methods based on their reliance on internal features (IF) (inherent to the data) versus external features (EF) (artificially introduced for auditing). Subsequently, we formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intended to falsely implicate an unused dataset. Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery. These formulations and strategies lead to our new benchmark, DATABench, comprising 17 evasion attacks, 5 forgery attacks, and 9 representative auditing methods. Extensive evaluations using DATABench reveal that none of the evaluated auditing methods are sufficiently robust or distinctive under adversarial settings. These findings underscore the urgent need for developing a more secure and reliable dataset auditing method capable of withstanding sophisticated adversarial manipulation. Code is available at https://github.com/shaoshuo-ss/DATABench.



Article 605

Title@2025-07-08 (2): Generative Head-Mounted Camera Captures for Photorealistic Avatars

Title: Generative Head-Mounted Camera Captures for Photorealistic Avatars Generative Head-Mounted Kameraaufnahmen für fotorealistische Avatare 为摄影现实主义的Astatars 生成头热摄像头摄像头抓取 2507.05620v1

Authors (11): Shaojie Bai, Seunghyeon Seo, Yida Wang, Chenghui Li, Owen Wang, Te-Li Wang, Tianyang Ma, Jason Saragih, Shih-En Wei, Nojun Kwak, Hyung Jun Kim

Enabling photorealistic avatar animations in virtual and augmented reality (VR/AR) has been challenging because of the difficulty of obtaining the ground truth state of faces. It is physically impossible to obtain synchronized images from head-mounted cameras (HMC) sensing input, which has partial observations in infrared (IR), and an array of outside-in dome cameras, which have full observations that match avatars’ appearance. Prior works relying on analysis-by-synthesis methods could generate accurate ground truth, but suffer from imperfect disentanglement between expression and style in their personalized training. The reliance on extensive paired captures (HMC and dome) for the same subject makes it operationally expensive to collect large-scale datasets, which cannot be reused for different HMC viewpoints and lighting. In this work, we propose a novel generative approach, Generative HMC (GenHMC), that leverages large unpaired HMC captures, which are much easier to collect, to directly generate high-quality synthetic HMC images given any conditioning avatar state from dome captures. We show that our method is able to properly disentangle the input conditioning signal that specifies facial expression and viewpoint, from facial appearance, leading to more accurate ground truth. Furthermore, our method can generalize to unseen identities, removing the reliance on the paired captures. We demonstrate these breakthroughs by evaluating both synthetic HMC images and universal face encoders trained from these new HMC-avatar correspondences, which achieve better data efficiency and state-of-the-art accuracy.



Article 606

Title@2025-07-08 (2): Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study

Title: Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study Erkennung und Eindämmung von Belohnungshacking in Verstärkungs-Lernsystemen: Eine umfassende empirische Studie 检测和减轻强化学习系统中的回扣:综合经验研究 2507.05619v1

Authors (3): Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Reward hacking in Reinforcement Learning (RL) systems poses a critical threat to the deployment of autonomous agents, where agents exploit flaws in reward functions to achieve high scores without fulfilling intended objectives. Despite growing awareness of this problem, systematic detection and mitigation approaches remain limited. This paper presents a large-scale empirical study of reward hacking across diverse RL environments and algorithms. We analyze 15,247 training episodes across 15 RL environments (Atari, MuJoCo, custom domains) and 5 algorithms (PPO, SAC, DQN, A3C, Rainbow), implementing automated detection algorithms for six categories of reward hacking: specification gaming, reward tampering, proxy optimization, objective misalignment, exploitation patterns, and wireheading. Our detection framework achieves 78.4% precision and 81.7% recall across environments, with computational overhead under 5%. Through controlled experiments varying reward function properties, we demonstrate that reward density and alignment with true objectives significantly impact hacking frequency ($p < 0.001$, Cohen’s $d = 1.24$). We validate our approach through three simulated application studies representing recommendation systems, competitive gaming, and robotic control scenarios. Our mitigation techniques reduce hacking frequency by up to 54.6% in controlled scenarios, though we find these trade-offs are more challenging in practice due to concept drift, false positive costs, and adversarial adaptation. All detection algorithms, datasets, and experimental protocols are publicly available to support reproducible research in RL safety.



Article 607

Title@2025-07-08 (2): A Theory for Conditional Generative Modeling on Multiple Data Sources

Title: A Theory for Conditional Generative Modeling on Multiple Data Sources Eine Theorie für die bedingte Generative Modellierung auf mehreren Datenquellen 多种数据源有条件生成建模理论 2502.14583v2

Authors (5): Rongzhen Wang, Yan Zhang, Chenyu Zheng, Chongxuan Li, Guoqiang Wu

The success of large generative models has driven a paradigm shift, leveraging massive multi-source data to enhance model capabilities. However, the interaction among these sources remains theoretically underexplored. This paper takes the first step toward a rigorous analysis of multi-source training in conditional generative modeling, where each condition represents a distinct data source. Specifically, we establish a general distribution estimation error bound in average total variation distance for conditional maximum likelihood estimation based on the bracketing number. Our result shows that when source distributions share certain similarities and the model is expressive enough, multi-source training guarantees a sharper bound than single-source training. We further instantiate the general theory on conditional Gaussian estimation and deep generative models including autoregressive and flexible energy-based models, by characterizing their bracketing numbers. The results highlight that the number of sources and similarity among source distributions improve the advantage of multi-source training. Simulations and real-world experiments are conducted to validate the theory, with code available at: https://github.com/ML-GSAI/Multi-Source-GM.



Article 608

Title@2025-07-08 (2): Activation Steering for Chain-of-Thought Compression

Title: Activation Steering for Chain-of-Thought Compression Aktivierungslenkung für Ketten-of-Thought-Kompression 研究链压缩启动指导 2507.04742v2

Authors (3): Seyedarmin Azizi, Erfan Baghaei Potraghloo, Massoud Pedram

Large language models (LLMs) excel at complex reasoning when they include intermediate steps, known as “chains of thought” (CoTs). However, these rationales are often overly verbose, even for simple problems, leading to wasted context, increased latency, and higher energy consumption. We observe that verbose, English-heavy CoTs and concise, math-centric CoTs occupy distinct regions in the model’s residual-stream activation space. By extracting and injecting a “steering vector” to transition between these modes, we can reliably shift generation toward more concise reasoning, effectively compressing CoTs without retraining. We formalize this approach as Activation-Steered Compression (ASC), an inference-time technique that shortens reasoning traces by directly modifying hidden representations. In addition, we provide a theoretical analysis of the impact of ASC on the output distribution, derived from a closed-form KL-divergence-bounded constraint to regulate steering strength. Using only 100 paired verbose and concise examples, ASC achieves up to 67.43% reduction in CoT length on MATH500 and GSM8K datasets, while maintaining accuracy across 7B, 8B, and 32B parameter models. As a training-free method, ASC introduces negligible runtime overhead and, on MATH500, delivers an average 2.73x speedup in end-to-end reasoning wall-clock time on an 8B model. This makes ASC a practical and efficient tool for streamlining the deployment of reasoning-capable LLMs in latency- or cost-sensitive settings. The code is available at: https://github.com/ArminAzizi98/ASC
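
The steering-vector idea described above can be sketched on synthetic data. This is only an illustration under stated assumptions: the abstract does not say how the vector is extracted, so the sketch uses a plain difference of mean activations, which is one common choice. The function names, the array shapes, and the fixed `strength` are hypothetical, and the KL-divergence-bounded rule for choosing the strength is not reproduced.

```python
import numpy as np

def steering_vector(verbose_acts, concise_acts):
    """Difference of mean activations (assumed extraction rule, not confirmed by the source).

    Both inputs have shape [n_examples, hidden_dim].
    """
    return concise_acts.mean(axis=0) - verbose_acts.mean(axis=0)

def steer(hidden, vector, strength):
    """Shift every hidden state along the steering direction."""
    return hidden + strength * vector

# Synthetic stand-ins for residual-stream activations of 100 paired examples.
rng = np.random.default_rng(0)
hidden_dim = 8
verbose = rng.normal(loc=0.0, size=(100, hidden_dim))
concise = rng.normal(loc=1.0, size=(100, hidden_dim))

vector = steering_vector(verbose, concise)
steered = steer(verbose, vector, strength=1.0)
```

With `strength=1.0` the mean of the steered activations coincides with the mean of the concise ones; smaller values move only part of the way.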



Article 609

Title@2025-07-08 (2): Stacked conformal prediction

Title: Stacked conformal prediction Gestapelte konforme Vorhersage 堆叠一致预测 2505.12578v3

Authors (1): Paulo C. Marques F

We consider a method for conformalizing a stacked ensemble of predictive models, showing that the potentially simple form of the meta-learner at the top of the stack enables a procedure with manageable computational cost that achieves approximate marginal validity without requiring the use of a separate calibration sample. Empirical results indicate that the method compares favorably to a standard inductive alternative.
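
For context, the "standard inductive alternative" mentioned above is usually taken to mean split (inductive) conformal prediction, which does need a held-out calibration sample. The sketch below shows that baseline only, not the stacked procedure of the paper, whose details the abstract does not give. The function name and the toy residuals are hypothetical.

```python
import math
import numpy as np

def split_conformal_interval(cal_abs_residuals, y_pred, alpha):
    """Symmetric split-conformal interval around a point prediction.

    cal_abs_residuals: |y - y_hat| on a held-out calibration sample.
    Uses the ceil((n + 1) * (1 - alpha))-th smallest residual as the radius.
    """
    n = len(cal_abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage level.
        return -math.inf, math.inf
    radius = float(np.sort(cal_abs_residuals)[k - 1])
    return y_pred - radius, y_pred + radius

# Toy calibration residuals 1, 2, ..., 7 and a point prediction of 10.
cal = np.arange(1.0, 8.0)
lo, hi = split_conformal_interval(cal, y_pred=10.0, alpha=0.25)
```

Here `k = ceil(8 * 0.75) = 6`, so the radius is the sixth smallest residual.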



Article 610

Title@2025-07-08 (2): Efficient Detection of Intermittent Job Failures Using Few-Shot Learning

Title: Efficient Detection of Intermittent Job Failures Using Few-Shot Learning Effiziente Erkennung intermittierender Job-Fälle durch wenig scharfes Lernen 利用很少热的学习方法有效检测间歇性工作失败 2507.04173v2

Authors (3): Henri Aïdasso, Francis Bordeleau, Ali Tizghadam

One of the main challenges developers face in the use of continuous integration (CI) and deployment pipelines is the occurrence of intermittent job failures, which result from unexpected non-deterministic issues (e.g., flaky tests or infrastructure problems) rather than regular code-related errors such as bugs. Prior studies developed machine learning (ML) models trained on large datasets of job logs to classify job failures as either intermittent or regular. As an alternative to costly manual labeling of large datasets, the state-of-the-art (SOTA) approach leveraged a heuristic based on non-deterministic job reruns. However, this method mislabels intermittent job failures as regular in contexts where rerunning suspicious job failures is not an explicit policy, and therefore limits the SOTA’s performance in practice. In fact, our manual analysis of 2,125 job failures from 5 industrial and 1 open-source projects reveals that, on average, 32% of intermittent job failures are mislabeled as regular. To address these limitations, this paper introduces a novel approach to intermittent job failure detection using few-shot learning (FSL). Specifically, we fine-tune a small language model using a small number of manually labeled log examples to generate rich embeddings, which are then used to train an ML classifier. Our FSL-based approach achieves 70-88% F1-score with only 12 shots in all projects, outperforming the SOTA, which proved ineffective (34-52% F1-score) in 4 projects. Overall, this study underlines the importance of data quality over quantity and provides a more efficient and practical framework for the detection of intermittent job failures in organizations.



Article 611

Title@2025-07-08 (2): The Fourier Spectral Transformer Networks For Efficient and Generalizable Nonlinear PDEs Prediction

Title: The Fourier Spectral Transformer Networks For Efficient and Generalizable Nonlinear PDEs Prediction Die Fourier Spectral Transformer-Netzwerke für effiziente und generalisierbare nichtlineare PDE-Vorhersage Fariier 光谱变异器网络,以高效和可通用的非线性PDEs预测 2507.05584v1

Authors (1): Beibei Li

In this work we propose a unified Fourier Spectral Transformer network that integrates the strengths of classical spectral methods and attention-based neural architectures. By transforming the original PDEs into spectral ordinary differential equations, we use high-precision numerical solvers to generate training data and use a Transformer network to model the evolution of the spectral coefficients. We demonstrate the effectiveness of our approach on the two-dimensional incompressible Navier-Stokes equations and the one-dimensional Burgers’ equation. The results show that our spectral Transformer can achieve highly accurate long-term predictions even with limited training data, outperforming traditional numerical methods and machine learning methods in forecasting future flow dynamics. The proposed framework generalizes well to unseen data, offering a promising paradigm for real-time prediction and control of complex dynamical systems.
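
To make "transforming the original PDEs into spectral ordinary differential equations" concrete, here is a worked example for the one-dimensional Burgers' equation. The abstract does not state the exact form used, so the viscous equation with viscosity $\nu$, the periodic domain $[0, 2\pi)$, and the Fourier basis are all assumptions of this sketch.

```latex
% Assumed setting: viscous Burgers' equation on a periodic domain [0, 2\pi).
\begin{align}
  \partial_t u + u\,\partial_x u &= \nu\,\partial_x^2 u,
  \qquad
  u(x,t) = \sum_{k} \hat{u}_k(t)\, e^{ikx}. \\
\intertext{Writing $u\,\partial_x u = \tfrac{1}{2}\,\partial_x (u^2)$ and projecting onto $e^{ikx}$
gives one ordinary differential equation per wavenumber $k$:}
  \frac{d\hat{u}_k}{dt} &= -\frac{ik}{2}\,\widehat{(u^2)}_k \;-\; \nu k^2\,\hat{u}_k .
\end{align}
```

Under these assumptions, the coefficients $\hat{u}_k(t)$ are the quantities a sequence model would be trained to evolve in time.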



Article 612

Title@2025-07-08 (2): Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance

Title: Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance Ethische KI für junge digitale Bürgerinnen und Bürger: Ein Aufruf zum Handeln in Sachen Datenschutz-Governance 青年数字公民道德伦理协会:隐私权治理行动呼吁 2503.11947v2

Authors (4): Austin Shouli, Ankur Barthwal, Molly Campbell, Ajay Kumar Shrestha

The rapid expansion of Artificial Intelligence (AI) in digital platforms used by youth has created significant challenges related to privacy, autonomy, and data protection. While AI-driven personalization offers enhanced user experiences, it often operates without clear ethical boundaries, leaving young users vulnerable to data exploitation and algorithmic biases. This paper presents a call to action for ethical AI governance, advocating for a structured framework that ensures youth-centred privacy protections, transparent data practices, and regulatory oversight. We outline key areas requiring urgent intervention, including algorithmic transparency, privacy education, parental data-sharing ethics, and accountability measures. Through this approach, we seek to empower youth with greater control over their digital identities and propose actionable strategies for policymakers, AI developers, and educators to build a fairer and more accountable AI ecosystem.



Article 613

Title@2025-07-08 (2): Model-free Optical Processors using In Situ Reinforcement Learning with Proximal Policy Optimization

Title: Model-free Optical Processors using In Situ Reinforcement Learning with Proximal Policy Optimization Modellfreie optische Prozessoren mit In Situ Verstärkungslernen mit proximaler Politikoptimierung 采用不使用模型的光学处理器,在采用最优政策优化的 “ 环境强化学习 “ 中采用无模型的光学处理器 2507.05583v1

Authors (4): Yuhang Li, Shiqi Chen, Tingyu Gong, Aydogan Ozcan

Optical computing holds promise for high-speed, energy-efficient information processing, with diffractive optical networks emerging as a flexible platform for implementing task-specific transformations. A challenge, however, is the effective optimization and alignment of the diffractive layers, which is hindered by the difficulty of accurately modeling physical systems with their inherent hardware imperfections, noise, and misalignments. While existing in situ optimization methods offer the advantage of direct training on the physical system without explicit system modeling, they are often limited by slow convergence and unstable performance due to inefficient use of limited measurement data. Here, we introduce a model-free reinforcement learning approach utilizing Proximal Policy Optimization (PPO) for the in situ training of diffractive optical processors. PPO efficiently reuses in situ measurement data and constrains policy updates to ensure more stable and faster convergence. We experimentally validated our method across a range of in situ learning tasks, including targeted energy focusing through a random diffuser, holographic image generation, aberration correction, and optical image classification, demonstrating in each task better convergence and performance. Our strategy operates directly on the physical system and naturally accounts for unknown real-world imperfections, eliminating the need for prior system knowledge or modeling. By enabling faster and more accurate training under realistic experimental constraints, this in situ reinforcement learning approach could offer a scalable framework for various optical and physical systems governed by complex, feedback-driven dynamics.
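
The phrase "constrains policy updates" refers to PPO's clipped surrogate objective, which can be sketched in a few lines. The clipping rule below is the standard PPO formulation; how the paper defines the policy, the reward, and the advantage for a diffractive processor is not stated in the abstract, so the argument names and the toy numbers are purely illustrative.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sample
    advantage: advantage estimate for each sample
    eps:       clipping range; 0.2 is a common default, assumed here
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum removes any incentive to push the ratio
    # outside [1 - eps, 1 + eps] within a single update.
    return float(np.minimum(unclipped, clipped).mean())

# A large ratio with positive advantage is capped at (1 + eps) * advantage.
capped = ppo_clip_objective(np.array([1.5]), np.array([1.0]))
# A small ratio with negative advantage is evaluated at the clipped value.
penalized = ppo_clip_objective(np.array([0.5]), np.array([-1.0]))
```

Because the ratio is computed against the policy that collected the data, the same batch of measurements can be used for several gradient steps, which is the data reuse the abstract highlights.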



Article 614

Title@2025-07-08 (2): Visual Adaptive Prompting for Compositional Zero-Shot Learning

Title: Visual Adaptive Prompting for Compositional Zero-Shot Learning Visuelle Adaptive Prompting für kompositorisches Zero-Shot-Lernen 零热学习的视觉适应性促进 2502.20292v5

Authors (4): Kyle Stein, Arash Mahyari, Guillermo Francia, Eman El-Sheikh

Vision-Language Models (VLMs) have demonstrated impressive multimodal capabilities in learning joint representations of visual and textual data, making them powerful tools for tasks such as Compositional Zero-Shot Learning (CZSL). CZSL requires models to generalize to novel combinations of visual primitives–such as attributes and objects–that were not explicitly encountered during training. Recent works in prompting for CZSL have focused on modifying inputs for the text encoder, often using static prompts that do not change across varying visual contexts. However, these approaches struggle to fully capture varying visual contexts, as they focus on text adaptation rather than leveraging visual features for compositional reasoning. To address this, we propose a Visual Adaptive Prompting System (VAPS) that leverages a learnable visual prompt repository and similarity-based retrieval mechanism within the framework of VLMs to bridge the gap between semantic and visual features. Our method introduces a dynamic visual prompt repository mechanism that selects the most relevant attribute and object prompts based on the visual features of the image. Our proposed system includes a visual prompt adapter that encourages the model to learn a more generalizable embedding space. Experiments on three CZSL benchmarks, across both closed and open-world scenarios, demonstrate state-of-the-art results.



Article 615

Title@2025-07-08 (2): Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems

Title: Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems Vereinheitlichung der erklärbaren Anomalienerkennung und der Ursachenanalyse in dynamischen Systemen 动态系统中不可解释的异常探测和根本原因分析 2502.12086v2

Authors (3): Yue Sun, Rick S. Blum, Parv Venkitasubramaniam

Dynamical systems, prevalent in various scientific and engineering domains, are susceptible to anomalies that can significantly impact their performance and reliability. This paper addresses the critical challenges of anomaly detection, root cause localization, and anomaly type classification in dynamical systems governed by ordinary differential equations (ODEs). We define two categories of anomalies: cyber anomalies, which propagate through interconnected variables, and measurement anomalies, which remain localized to individual variables. To address these challenges, we propose the Interpretable Causality Ordinary Differential Equation (ICODE) Networks, a model-intrinsic explainable learning framework. ICODE leverages Neural ODEs for anomaly detection while employing causality inference through an explanation channel to perform root cause analysis (RCA), elucidating why specific time periods are flagged as anomalous. ICODE is designed to simultaneously perform anomaly detection, RCA, and anomaly type classification within a single, interpretable framework. Our approach is grounded in the hypothesis that anomalies alter the underlying ODEs of the system, manifesting as changes in causal relationships between variables. We provide a theoretical analysis of how perturbations in learned model parameters can be utilized to identify anomalies and their root causes in time series data. Comprehensive experimental evaluations demonstrate the efficacy of ICODE across various dynamical systems, showcasing its ability to accurately detect anomalies, classify their types, and pinpoint their origins.



Article 616

Title@2025-07-08 (2): The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation

Title: The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation Die Landschaft der Erinnerung in LLMs: Mechanismen, Messung und Milderung LLMM中纪念的景观:机制、衡量和缓解 2507.05578v1

Authors (4): Alexander Xiong, Xuandong Zhao, Aneesh Pappu, Dawn Song

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they also exhibit memorization of their training data. This phenomenon raises critical questions about model behavior, privacy risks, and the boundary between learning and memorization. Addressing these concerns, this paper synthesizes recent studies and investigates the landscape of memorization, the factors influencing it, and methods for its detection and mitigation. We explore key drivers, including training data duplication, training dynamics, and fine-tuning procedures that influence data memorization. In addition, we examine methodologies such as prefix-based extraction, membership inference, and adversarial prompting, assessing their effectiveness in detecting and measuring memorized content. Beyond technical analysis, we also explore the broader implications of memorization, including the legal and ethical implications. Finally, we discuss mitigation strategies, including data cleaning, differential privacy, and post-training unlearning, while highlighting open challenges in balancing the minimization of harmful memorization with utility. This paper provides a comprehensive overview of the current state of research on LLM memorization across technical, privacy, and performance dimensions, identifying critical directions for future work.



Article 617

Title@2025-07-08 (2): Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models

Title: Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models Feature-Extraktion und -Lenkung für eine verbesserte Kettenbildung in Sprachmodellen 语言模型中强化研究链理由的特征采掘和指南 2505.15634v3

Authors (6): Zihao Li, Xu Wang, Yuzhe Yang, Ziyu Yao, Haoyi Xiong, Mengnan Du

Large Language Models (LLMs) demonstrate the ability to solve reasoning and mathematical problems using the Chain-of-Thought (CoT) technique. Expanding CoT length, as seen in models such as DeepSeek-R1, significantly enhances this reasoning for complex problems, but requires costly and high-quality long CoT data and fine-tuning. This work, inspired by the deep thinking paradigm of DeepSeek-R1, utilizes a steering technique to enhance the reasoning ability of an LLM without external datasets. Our method first employs Sparse Autoencoders (SAEs) to extract interpretable features from vanilla CoT. These features are then used to steer the LLM’s internal states during generation. Recognizing that many LLMs do not have corresponding pre-trained SAEs, we further introduce a novel SAE-free steering algorithm, which directly computes steering directions from the residual activations of an LLM, obviating the need for an explicit SAE. Experimental results demonstrate that both our SAE-based and subsequent SAE-free steering algorithms significantly enhance the reasoning capabilities of LLMs.



Article 618

Title@2025-07-08 (2): Offline Learning and Forgetting for Reasoning with Large Language Models

Title: Offline Learning and Forgetting for Reasoning with Large Language Models Offline-Lernen und Vergessen für die Vernunft mit großen Sprachmodellen 利用大语言模式进行离线学习和遗忘 2504.11364v3

Authors (6): Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor

Leveraging inference-time search in large language models has proven effective in further enhancing a trained model’s capability to solve complex mathematical and reasoning problems. However, this approach significantly increases computational costs and inference time, as the model must generate and evaluate multiple candidate solutions to identify a viable reasoning path. To address this, we propose an effective approach that integrates search capabilities directly into the model by fine-tuning it on unpaired successful (learning) and failed reasoning paths (forgetting) derived from diverse search methods. A key challenge we identify is that naive fine-tuning can degrade the model’s search capability; we show this can be mitigated with a smaller learning rate. Extensive experiments on the challenging Game-of-24 and Countdown reasoning benchmarks show that replacing CoT-generated data with search-generated data for offline fine-tuning improves success rates by around 23% over inference-time search baselines, while reducing inference time by 180$\times$. On top of this, our learning and forgetting objective consistently outperforms both supervised fine-tuning and preference-based methods.



Article 619

Title@2025-07-08 (2): Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA

Title: Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA Beyond Retrieval: Zusammenbauen von Cross-Encodern und GPT-Rerankern mit LLMs für biomedizinische QA 超出检索范围:利用生物医学质量评估LMs将跨环境者和GPT再生者与生物医学质量评估LMs组合在一起 2507.05577v1

Authors (3): Shashank Verma, Fengyi Jiang, Xiangning Xue

Biomedical semantic question answering rooted in information retrieval can play a crucial role in keeping up to date with the vast, rapidly evolving and ever-growing biomedical literature. A robust system can help researchers, healthcare professionals and even lay users access relevant knowledge grounded in evidence. The BioASQ 2025 Task13b Challenge serves as an important benchmark, offering a competitive platform for advancing this space. This paper presents the methodologies and results from our participation in this challenge, where we built a Retrieval-Augmented Generation (RAG) system that can answer biomedical questions by retrieving relevant PubMed documents and snippets to generate answers. For the retrieval task, we generated dense embeddings from biomedical articles for initial retrieval, and applied an ensemble of finetuned cross-encoders and large language models (LLMs) for re-ranking to identify the most relevant documents. Our solution achieved a MAP@10 of 0.1581, placing 10th on the leaderboard for the retrieval task. For answer generation, we employed few-shot prompting of instruction-tuned LLMs. Our system achieved a macro-F1 score of 0.95 for yes/no questions (rank 12), a Mean Reciprocal Rank (MRR) of 0.64 for factoid questions (rank 1), a mean-F1 score of 0.63 for list questions (rank 5), and a ROUGE-SU4 F1 score of 0.29 for ideal answers (rank 11).
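
As a reminder of what the factoid metric measures, Mean Reciprocal Rank averages the reciprocal of the position of the first correct answer in each ranked list. The sketch below uses the textbook definition with made-up questions; the official BioASQ scoring may differ in details such as how many returned candidates are considered, and the function name and data are hypothetical.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """Average of 1 / rank of the first correct answer per question.

    ranked_answers: list of ranked candidate lists, best candidate first
    gold_answers:   list of sets of acceptable answers, one set per question
    A question with no correct candidate contributes 0.
    """
    total = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_answers)

# Toy example: first question answered at rank 1, second at rank 3.
ranked = [["BRCA1", "TP53"], ["aspirin", "ibuprofen", "warfarin"]]
gold = [{"BRCA1"}, {"warfarin"}]
mrr = mean_reciprocal_rank(ranked, gold)
```

In the toy example the score is the average of 1 and 1/3.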



Article 620

Title@2025-07-08 (2): Approximating invariant functions with the sorting trick is theoretically justified

Title: Approximating invariant functions with the sorting trick is theoretically justified Annähernde invariante Funktionen mit dem Sortiertrick sind theoretisch gerechtfertigt 用排序技巧排序相似的变量函数在理论上是合理的 2403.01671v4

Authors (2): Wee Chaimanowong, Ying Zhu

Many machine learning models leverage group invariance, which arises in a wide range of applications. For exploiting an invariance structure, one common approach is known as frame averaging. One popular example of frame averaging is group averaging, where the entire group is used to symmetrize a function. At the other end of the spectrum is canonicalization, where a frame at each point consists of a single group element which transforms the point to its orbit representative. Compared to group averaging, canonicalization is more efficient computationally. However, it results in non-differentiability or discontinuity of the canonicalized function. As a result, the theoretical performance of canonicalization has not been given much attention. In this work, we establish an approximation theory for canonicalization. Specifically, we bound the point-wise and $L^2(\mathbb{P})$ approximation errors as well as the kernel’s eigenvalue decay rates associated with a canonicalization trick.
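
The contrast between group averaging and canonicalization can be illustrated for the permutation group, where sorting maps each input to its orbit representative (the "sorting trick" of the title). This is a minimal sketch, not the construction analyzed in the paper: the base function `f` and the helper names are hypothetical.

```python
import itertools
import numpy as np

def f(x):
    """An arbitrary function that is NOT permutation invariant."""
    weights = np.arange(1, len(x) + 1)
    return float(np.dot(weights, x) ** 2)

def group_averaged(func, x):
    """Average func over all n! permutations of x (the whole group)."""
    values = [func(x[list(p)]) for p in itertools.permutations(range(len(x)))]
    return sum(values) / len(values)

def canonicalized(func, x):
    """Evaluate func once, at the sorted orbit representative of x."""
    return func(np.sort(x))

x = np.array([3.0, 1.0, 2.0])
x_permuted = x[::-1].copy()  # [2.0, 1.0, 3.0]
```

Both constructions yield a permutation-invariant function, but group averaging costs `n!` evaluations while canonicalization costs one sort and one evaluation. The price is that `np.sort` is only piecewise linear in its input, so the canonicalized function can lose smoothness where entries of `x` tie.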



Article 621

Title@2025-07-08 (2): ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models

Title: ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models ReLayout: Integration von Beziehungsgründen für die Content-aware Layout-Generierung mit multimodalen großen Sprachmodellen ReLayyout:将具有内容意识的布局生成相关理由与多模式大语言模式相结合 2507.05568v1

Authors (6): Jiaxu Tian, Xuehui Yu, Yaoxing Wang, Pan Wang, Guangqian Guo, Shan Gao

Content-aware layout aims to arrange design elements appropriately on a given canvas to convey information effectively. Recently, the trend for this task has been to leverage large language models (LLMs) to generate layouts automatically, achieving remarkable performance. However, existing LLM-based methods fail to adequately interpret spatial relationships among visual themes and design elements, leading to problems with structure and diversity in layout generation. To address this issue, we introduce ReLayout, a novel method that leverages relation-CoT to generate more reasonable and aesthetically coherent layouts by starting from fundamental design concepts. Specifically, we enhance layout annotations by introducing explicit relation definitions, such as region, salient, and margin between elements, with the goal of decomposing the layout into smaller, structured, and recursive layouts, thereby enabling the generation of more structured layouts. Furthermore, based on these defined relationships, we introduce a layout prototype rebalance sampler, which defines layout prototype features across three dimensions and quantifies distinct layout styles. This sampler addresses uniformity issues in generation that arise from data bias in the prototype distribution balance process. Extensive experimental results verify that ReLayout outperforms baselines and can generate structured and diverse layouts that are more aligned with human aesthetics and more explainable.



Article 622

Title@2025-07-08 (2): AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks

Title: AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks AbdomenAtlas-8K: Anmerkung zu 8.000 CT-Volumes für Multi-Organ-Segmentierung in drei Wochen AbdomenAtlas-8K:说明8 000个CT卷,供三周内多组织分解使用 2305.09666v3

Authors (7): Chongyu Qu, Tiezheng Zhang, Hualin Qiao, Jie Liu, Yucheng Tang, Alan Yuille, Zongwei Zhou

Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating an abdominal organ takes an estimated 30-60 minutes per CT volume, depending on the expertise of the annotator and the size, visibility, and complexity of the organ. Therefore, publicly available datasets for multi-organ segmentation are often limited in data size and organ diversity. This paper proposes an active learning method to expedite the annotation process for organ segmentation and creates the largest multi-organ dataset (by far) with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and IVC annotated in 8,448 CT volumes, equating to 3.2 million slices. Conventional annotation methods would take an experienced annotator up to 1,600 weeks (or roughly 30.8 years) to complete this task. In contrast, our annotation method has accomplished this task in three weeks (based on an 8-hour workday, five days a week) while maintaining a similar or even better annotation quality. This achievement is attributed to three unique properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance for annotators to make corrections on the most salient errors. Furthermore, we summarize the taxonomy of common errors made by AI algorithms and annotators. This allows for continuous revision of both AI and annotations and significantly reduces the annotation costs required to create large-scale datasets for a wider variety of medical imaging tasks.
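
The annotation-time figures in this abstract can be sanity-checked with a short calculation. The sketch below assumes (this is not stated in the abstract) that the 30-60 minute estimate applies to each of the eight listed organs in each CT volume, and that a year has 52 working weeks.

```python
# Back-of-the-envelope check of the annotation-time figures quoted above.
# Assumption: 30-60 minutes per organ per CT volume, eight organs per volume.
VOLUMES = 8_448
ORGANS = 8
HOURS_PER_WEEK = 8 * 5  # 8-hour workday, five days a week (from the abstract)


def weeks_needed(minutes_per_organ: float) -> float:
    """Working weeks for one annotator to label every organ in every volume."""
    total_hours = VOLUMES * ORGANS * minutes_per_organ / 60
    return total_hours / HOURS_PER_WEEK


low_weeks = weeks_needed(30)    # optimistic end of the 30-60 minute range
high_weeks = weeks_needed(60)   # pessimistic end of the range
years_for_1600_weeks = 1_600 / 52

print(f"{low_weeks:.1f} to {high_weeks:.1f} weeks")
print(f"1,600 weeks is about {years_for_1600_weeks:.1f} years")
```

Under these assumptions the quoted 1,600 weeks falls inside the computed range, and 1,600 weeks corresponds to roughly 30.8 years, matching the abstract.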



Article 623

Title@2025-07-08 (2): Exact and efficient basis pursuit denoising via differential inclusions and a selection principle

Title: Exact and efficient basis pursuit denoising via differential inclusions and a selection principle Exakte und effiziente Basisverfolgung durch Differentialeinschlüsse und ein Auswahlprinzip 通过有区别的包容和甄选原则实现无无异化 2507.05562v1

Authors (2): Gabriel P. Langlois, Jérôme Darbon

Basis pursuit denoising (BPDN) is a cornerstone of compressive sensing, statistics and machine learning. While various algorithms for BPDN have been proposed, they invariably suffer from drawbacks and must either favor efficiency at the expense of accuracy or vice versa. As such, state-of-the-art algorithms remain ineffective for high-dimensional applications that require accurate solutions within a reasonable amount of computational time. In this work, we address this issue and propose an exact and efficient algorithm for BPDN using differential inclusions. Specifically, we prove that a selection principle from the theory of differential inclusions turns the dual problem of BPDN into calculating the trajectory of an integrable projected dynamical system, that is, one whose trajectory and asymptotic limit can be computed exactly. Our analysis naturally yields an algorithm that is exact (numerically, up to machine precision), very fast, and amenable to computing regularization paths. Numerical experiments confirm that our algorithm outperforms the state-of-the-art algorithms in both accuracy and efficiency. Moreover, we show that the global continuation of solutions (in terms of the hyperparameter and data) of the projected dynamical system yields a rigorous homotopy algorithm for BPDN, as well as a novel greedy algorithm for computing feasible solutions to basis pursuit in strongly polynomial time. Beyond this work, we expect that our results and analysis can be adapted to compute exact or approximate solutions to a broader class of polyhedral-constrained optimization problems.
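
For reference, BPDN is commonly written as an $\ell_1$-regularized least-squares problem together with its dual. The symbols $A$, $b$, $\lambda$, $x$, and $p$ below are notation assumed here for illustration; the abstract does not fix a notation, and the paper may use an equivalent but differently scaled form.

```latex
\begin{aligned}
\text{(primal)}\quad & \min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\,\|Ax - b\|_2^2 + \lambda \|x\|_1, \qquad \lambda > 0, \\
\text{(dual)}\quad & \max_{p \in \mathbb{R}^m} \; \langle b, p \rangle - \tfrac{1}{2}\,\|p\|_2^2
\quad \text{subject to} \quad \|A^{\top} p\|_\infty \le \lambda .
\end{aligned}
```

In this form the dual objective equals $\tfrac{1}{2}\|b\|_2^2 - \tfrac{1}{2}\|p - b\|_2^2$, so solving the dual amounts to projecting $b$ onto the polyhedron $\{p : \|A^{\top} p\|_\infty \le \lambda\}$, which is one way to read the abstract's reference to polyhedral-constrained problems.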



Article 624

Title@2025-07-08 (2): LATST: Are Transformers Necessarily Complex for Time-Series Forecasting

Title: LATST: Are Transformers Necessarily Complex for Time-Series Forecasting LATST: Sind Transformer notwendig komplex für die Zeitreihenprognose LATST: 是变异器 必然复杂 用于时间- 系列预报 2410.23749v9

Authors (1): Dizhen Liang

Transformer-based architectures have achieved remarkable success in natural language processing and computer vision. However, their performance in multivariate long-term forecasting often falls short compared to simpler linear baselines. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. To bridge this gap, we introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, two common challenges in Transformer-based time series forecasting. We rigorously evaluate LATST across multiple real-world multivariate time series datasets, demonstrating its ability to outperform existing state-of-the-art Transformer models. Notably, LATST manages to achieve competitive performance with fewer parameters than some linear models on certain datasets, highlighting its efficiency and effectiveness.



Article 625

Title@2025-07-08 (2): Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines

Title: Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines Preemptive Lösung künftiger Probleme: Multitask Preplay in Menschen und Maschinen 预防性解决未来问题:人与机器的多重任务前奏 2507.05561v1

Authors (4): Wilka Carvalho, Sam Hall-McMaster, Honglak Lee, Samuel J. Gershman

Humans can pursue a near-infinite variety of tasks, but typically can only pursue a small number at the same time. We hypothesize that humans leverage experience on one task to preemptively learn solutions to other tasks that were accessible but not pursued. We formalize this idea as Multitask Preplay, a novel algorithm that replays experience on one task as the starting point for “preplay” – counterfactual simulation of an accessible but unpursued task. Preplay is used to learn a predictive representation that can support fast, adaptive task performance later on. We first show that, compared to traditional planning and predictive representation methods, multitask preplay better predicts how humans generalize to tasks that were accessible but not pursued in a small grid-world, even when people didn’t know they would need to generalize to these tasks. We then show these predictions generalize to Craftax, a partially observable 2D Minecraft environment. Finally, we show that Multitask Preplay enables artificial agents to learn behaviors that transfer to novel Craftax worlds sharing task co-occurrence structure. These findings demonstrate that Multitask Preplay is a scalable theory of how humans counterfactually learn and generalize across multiple tasks; endowing artificial agents with the same capacity can significantly improve their performance in challenging multitask environments.



Article 626

Title@2025-07-08 (2): Neural Network-Based Parameter Estimation for Non-Autonomous Differential Equations with Discontinuous Signals

Title: Neural Network-Based Parameter Estimation for Non-Autonomous Differential Equations with Discontinuous Signals Neurale netzwerkbasierte Parameterschätzung für nichtautonome Differentialgleichungen mit diskontinuierlichen Signalen 具有不连续信号的非自主差异等式的以神经网络为基础的参数参数估计 2507.06267v1

Authors (3): Hyeontae Jo, Krešimir Josić, Jae Kyoung Kim

Non-autonomous differential equations are crucial for modeling systems influenced by external signals, yet fitting these models to data becomes particularly challenging when the signals change abruptly. To address this problem, we propose a novel parameter estimation method utilizing functional approximations with artificial neural networks. Our approach, termed Harmonic Approximation of Discontinuous External Signals using Neural Networks (HADES-NN), operates in two iterated stages. In the first stage, the algorithm employs a neural network to approximate the discontinuous signal with a smooth function. In the second stage, it uses this smooth approximate signal to estimate model parameters. HADES-NN gives highly accurate and precise parameter estimates across various applications, including circadian clock systems regulated by external light inputs measured via wearable devices and the mating response of yeast to external pheromone signals. HADES-NN greatly extends the range of model systems that can be fit to real-world measurements.



Article 627

Title@2025-07-08 (2): Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort

Title: Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort Unversicherbare Risiken durch KI sichern: Der Staat als Versicherer des Last Resorts AI:国家作为最后度假地的保险人 2409.06672v2

Authors (1): Cristian Trout

Many experts believe that AI systems will sooner or later pose uninsurable risks, including existential risks. This creates an extreme judgment-proof problem: few if any parties can be held accountable ex post in the event of such a catastrophe. This paper proposes a novel solution: a government-provided, mandatory indemnification program for AI developers. The program uses risk-priced indemnity fees to induce socially optimal levels of care. Risk-estimates are determined by surveying experts, including indemnified developers. The Bayesian Truth Serum mechanism is employed to incent honest and effortful responses. Compared to alternatives, this approach arguably better leverages all private information, and provides a clearer signal to indemnified developers regarding what risks they must mitigate to lower their fees. It’s recommended that collected fees be used to help fund the safety research developers need, employing a fund matching mechanism (Quadratic Financing) to induce an optimal supply of this public good. Under Quadratic Financing, safety research projects would compete for private contributions from developers, signaling how much each is to be supplemented with public funds.
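
The fund-matching idea can be illustrated with the standard quadratic funding rule, in which a project's total funding is the square of the sum of the square roots of its individual contributions. The sketch below assumes an unlimited matching budget and uses hypothetical contribution amounts; the function name is illustrative, and the paper's exact mechanism may differ.

```python
from math import sqrt


def quadratic_funding(contributions):
    """Return (total_funding, public_subsidy) for one project.

    Standard quadratic funding rule: total = (sum of sqrt(c_i)) ** 2.
    The subsidy is whatever the private contributions do not cover.
    """
    total = sum(sqrt(c) for c in contributions) ** 2
    return total, total - sum(contributions)


# Hypothetical example: the same private total (4 units), spread differently.
broad = quadratic_funding([1.0, 1.0, 1.0, 1.0])  # four developers give 1 each
narrow = quadratic_funding([4.0])                # one developer gives 4

print("broad support :", broad)
print("narrow support:", narrow)
```

The design point is that many small contributions attract more matching than one large contribution of the same size, so the subsidy tracks how widely a project is valued rather than how much a single backer pays.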



Article 628

Title@2025-07-08 (2): Machine Learning based Enterprise Financial Audit Framework and High Risk Identification

Title: Machine Learning based Enterprise Financial Audit Framework and High Risk Identification Machine Learning based Enterprise Financial Audit Framework und High Risk Identification 企业财务审计框架和高风险识别 2507.06266v1

Authors (3): Tingyu Yuan, Xi Zhang, Xuanjing Chen

In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI’s impact on operations. To build a robust risk prediction model, three algorithms - Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) - are evaluated. SVM uses hyperplane optimization for complex classification, RF combines decision trees to manage high-dimensional, nonlinear data with resistance to overfitting, and KNN applies distance-based learning for flexible performance. Through hierarchical K-fold cross-validation and evaluation using F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, excelling in identifying fraud and compliance anomalies. Feature importance analysis reveals audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring. This research contributes valuable insights into using machine learning for intelligent auditing and risk management in modern enterprises.



Article 629

Title@2025-07-08 (2): Liability and Insurance for Catastrophic Losses: the Nuclear Power Precedent and Lessons for AI

Title: Liability and Insurance for Catastrophic Losses: the Nuclear Power Precedent and Lessons for AI Haftung und Versicherung für Katastrophenschäden: Der Atomkraftvorhersage und Lehren für KI 灾难损失的责任和保险:核能先例和对AI的教训 2409.06673v2

Authors (1): Cristian Trout

As AI systems become more autonomous and capable, experts warn of them potentially causing catastrophic losses. Drawing on the successful precedent set by the nuclear power industry, this paper argues that developers of frontier AI models should be assigned limited, strict, and exclusive third party liability for harms resulting from Critical AI Occurrences (CAIOs) - events that cause or easily could have caused catastrophic losses. Mandatory insurance for CAIO liability is recommended to overcome developers’ judgment-proofness, mitigate winner’s curse dynamics, and leverage insurers’ quasi-regulatory abilities. Based on theoretical arguments and observations from the analogous nuclear power context, insurers are expected to engage in a mix of causal risk-modeling, monitoring, lobbying for stricter regulation, and providing loss prevention guidance in the context of insuring against heavy-tail risks from AI. While not a substitute for regulation, clear liability assignment and mandatory insurance can help efficiently allocate resources to risk-modeling and safe design, facilitating future regulatory efforts.



Article 630

Title@2025-07-08 (2): A Malliavin calculus approach to score functions in diffusion generative models

Title: A Malliavin calculus approach to score functions in diffusion generative models Ein Malliavin Kalkül Ansatz, um Funktionen in Diffusion generative Modelle punkten 以Malliavin微积分法在传播基因变异模型中计分功能 2507.05550v1

Authors (5): Ehsan Mirafzali, Frank Proske, Utkarsh Gupta, Daniele Venturi, Razvan Marinescu

Score-based diffusion generative models have recently emerged as a powerful tool for modelling complex data distributions. These models aim at learning the score function, which defines a map from a known probability distribution to the target data distribution via deterministic or stochastic differential equations (SDEs). The score function is typically estimated from data using a variety of approximation techniques, such as denoising or sliced score matching, Hyvärinen’s method, or Schrödinger bridges. In this paper, we derive an exact, closed-form expression for the score function for a broad class of nonlinear diffusion generative models. Our approach combines modern stochastic analysis tools such as Malliavin derivatives and their adjoint operators (Skorokhod integrals or Malliavin Divergence) with a new Bismut-type formula. The resulting expression for the score function can be written entirely in terms of the first and second variation processes, with all Malliavin derivatives systematically eliminated, thereby enhancing its practical applicability. The theoretical framework presented in this work offers a principled foundation for advancing score estimation methods in generative modelling, enabling the design of new sampling algorithms for complex probability distributions. Our results can be extended to broader classes of stochastic differential equations, opening new directions for the development of score-based diffusion generative models.



Article 631

Title@2025-07-08 (2): SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning

Title: SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning SPEED-RL: Schnellere Schulung von Vernunftmodellen über Online Curriculum Learning SPEEED-RL:通过在线课程学习更快地培训推理模型 2506.09016v2

Authors (4): Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette

Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce Selective Prompting with Efficient Estimation of Difficulty (SPEED), an adaptive online RL curriculum that selectively chooses training examples of intermediate difficulty to maximize learning efficiency. Theoretically, we establish that intermediate-difficulty prompts improve the gradient estimator’s signal-to-noise ratio, accelerating convergence. Empirically, our efficient implementation leads to 2x to 6x faster training without degrading accuracy, requires no manual tuning, and integrates seamlessly into standard RL algorithms.
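
The selection idea can be sketched as a screening step over sampled rollouts. In the sketch below, the function names, the strict `0 < rate < 1` filter, and the toy reward lists are all assumptions made for illustration; the abstract does not describe how SPEED actually estimates difficulty.

```python
def pass_rate(rewards):
    """Fraction of sampled responses that earned the verifiable reward."""
    return sum(rewards) / len(rewards)


def select_intermediate(rollout_rewards):
    """Keep prompts whose estimated pass rate is strictly between 0 and 1.

    rollout_rewards maps each prompt to a list of 0/1 rewards from a few
    cheap screening rollouts (hypothetical data layout).
    """
    return [
        prompt
        for prompt, rewards in rollout_rewards.items()
        if 0.0 < pass_rate(rewards) < 1.0
    ]


def bernoulli_variance(p):
    """Variance of a 0/1 reward with success probability p."""
    return p * (1.0 - p)


# Toy screening results: one prompt always solved, one never, one sometimes.
rollouts = {
    "easy": [1, 1, 1, 1],
    "medium": [1, 0, 1, 0],
    "hard": [0, 0, 0, 0],
}
selected = select_intermediate(rollouts)
print(selected)
```

A binary reward with pass rate p has variance p(1 - p), which is zero at p = 0 and p = 1 and largest at p = 0.5. Prompts that are always or never solved therefore carry no reward contrast to learn from, which is one intuition behind favoring intermediate difficulty.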



Article 632

Title@2025-07-08 (2): Gait-Based Hand Load Estimation via Deep Latent Variable Models with Auxiliary Information

Title: Gait-Based Hand Load Estimation via Deep Latent Variable Models with Auxiliary Information Gait-basierte Handlastschätzung über tief latente variable Modelle mit Hilfsinformationen 通过带有辅助信息的深边变量模型进行基于 Gait 的手负载估计 2507.05544v1

Authors (3): Jingyi Gao, Sol Lim, Seokhyun Chung

Machine learning methods are increasingly applied to ergonomic risk assessment in manual material handling, particularly for estimating carried load from gait motion data collected from wearable sensors. However, existing approaches often rely on direct mappings from loaded gait to hand load, limiting generalization and predictive accuracy. In this study, we propose an enhanced load estimation framework that incorporates auxiliary information, including baseline gait patterns during unloaded walking and carrying style. While baseline gait can be automatically captured by wearable sensors and is thus readily available at inference time, carrying style typically requires manual labeling and is often unavailable during deployment. Our model integrates deep latent variable modeling with temporal convolutional networks and bi-directional cross-attention to capture gait dynamics and fuse loaded and unloaded gait patterns. Guided by domain knowledge, the model is designed to estimate load magnitude conditioned on carrying style, while eliminating the need for carrying style labels at inference time. Experiments using real-world data collected from inertial measurement units attached to participants demonstrate substantial accuracy gains from incorporating auxiliary information and highlight the importance of explicit fusion mechanisms over naive feature concatenation.



Article 633

Title@2025-07-07 (1): Robust Learning on Noisy Graphs via Latent Space Constraints with External Knowledge

Title: Robust Learning on Noisy Graphs via Latent Space Constraints with External Knowledge Robustes Lernen von Noisy Graphs über Latent Space Constraints mit externem Wissen 通过具有外部知识的冷藏空间限制对噪音图进行有力的学习 2507.05540v1

Authors (5): Chunhui Gu, Mohammad Sadegh Nasr, James P. Long, Kim-Anh Do, Ehsan Irajizad

Graph Neural Networks (GNNs) often struggle with noisy edges. We propose Latent Space Constrained Graph Neural Networks (LSC-GNN) to incorporate external “clean” links and guide embeddings of a noisy target graph. We train two encoders–one on the full graph (target plus external edges) and another on a regularization graph excluding the target’s potentially noisy links–then penalize discrepancies between their latent representations. This constraint steers the model away from overfitting spurious edges. Experiments on benchmark datasets show LSC-GNN outperforms standard and noise-resilient GNNs in graphs subjected to moderate noise. We extend LSC-GNN to heterogeneous graphs and validate it on a small protein-metabolite network, where metabolite-protein interactions reduce noise in protein co-occurrence data. Our results highlight LSC-GNN’s potential to boost predictive performance and interpretability in settings with noisy relational structures.
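
The latent-space constraint can be sketched as a penalty added to the task loss. The squared-distance form, the weight `lam`, and the toy embeddings below are assumptions for illustration; the abstract only says that discrepancies between the two encoders' latent representations are penalized.

```python
def latent_discrepancy(z_full, z_reg):
    """Mean squared distance between two sets of node embeddings.

    z_full: embeddings from the encoder trained on the full graph.
    z_reg:  embeddings from the encoder trained on the regularization graph.
    Both are lists of equal-length vectors, one per node.
    """
    total = 0.0
    for u, v in zip(z_full, z_reg):
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    return total / len(z_full)


def total_loss(task_loss, z_full, z_reg, lam=0.1):
    """Task loss plus the weighted latent-space penalty (lam is hypothetical)."""
    return task_loss + lam * latent_discrepancy(z_full, z_reg)


# Toy embeddings for two nodes; only the second node disagrees.
z_full = [[1.0, 0.0], [0.0, 1.0]]
z_reg = [[1.0, 0.0], [0.0, 0.0]]
penalty = latent_discrepancy(z_full, z_reg)
loss = total_loss(1.0, z_full, z_reg, lam=0.1)
print(penalty, loss)
```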



Article 634

Title@2025-07-07 (1): The Role of Deductive and Inductive Reasoning in Large Language Models

Title: The Role of Deductive and Inductive Reasoning in Large Language Models Die Rolle der deduktiven und induktiven Vernunft in großen Sprachmodellen 大语种模式中的减量和引引原因的作用 2410.02892v3

Authors (8): Chengkun Cai, Xu Zhao, Haoliang Liu, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei Li

Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning tasks, yet their reliance on static prompt structures and limited adaptability to complex scenarios remains a significant challenge. In this paper, we propose the Deductive and InDuctive (DID) method, a novel framework that enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning approaches. Drawing from cognitive science principles, DID implements a dual-metric complexity evaluation system that combines Littlestone dimension and information entropy to precisely assess task difficulty and guide decomposition strategies. DID enables the model to progressively adapt its reasoning pathways based on problem complexity, mirroring human cognitive processes. We evaluate DID’s effectiveness across multiple benchmarks, including AIW and MR-GSM8K, as well as our custom Holiday Puzzle dataset for temporal reasoning. Our results demonstrate significant improvements in reasoning quality and solution accuracy, achieving 70.3% accuracy on AIW (compared to 62.2% for Tree of Thought) while maintaining lower computational costs. The success of DID in improving LLM performance while preserving computational efficiency suggests promising directions for developing more cognitively aligned and capable language models. Our work contributes a theoretically grounded, input-centric approach to enhancing LLM reasoning capabilities, offering an efficient alternative to traditional output-exploration methods.



Article 635

Title@2025-07-07 (1): Balancing Efficiency and Expressiveness: Subgraph GNNs with Walk-Based Centrality

Title: Balancing Efficiency and Expressiveness: Subgraph GNNs with Walk-Based Centrality Ausbalancierende Effizienz und Expressivität: Subgraph GNNs mit Walk-Based Centrality 平衡效率和表现:GNNs与步行中心 2501.03113v2

Authors (6): Joshua Southern, Yam Eitan, Guy Bar-Shalom, Michael Bronstein, Haggai Maron, Fabrizio Frasca

Subgraph GNNs have emerged as promising architectures that overcome the expressiveness limitations of Graph Neural Networks (GNNs) by processing bags of subgraphs. Despite their compelling empirical performance, these methods are afflicted by a high computational complexity: they process bags whose size grows linearly in the number of nodes, hindering their applicability to larger graphs. In this work, we propose an effective and easy-to-implement approach to dramatically alleviate the computational cost of Subgraph GNNs and unleash broader applications thereof. Our method, dubbed HyMN, leverages walk-based centrality measures to sample a small number of relevant subgraphs and drastically reduce the bag size. By drawing a connection to perturbation analysis, we highlight the strength of the proposed centrality-based subgraph sampling, and further prove that these walk-based centralities can be additionally used as Structural Encodings for improved discriminative power. A comprehensive set of experimental results demonstrates that HyMN provides an effective synthesis of expressiveness, efficiency, and downstream performance, unlocking the application of Subgraph GNNs to dramatically larger graphs. Not only does our method outperform more sophisticated subgraph sampling approaches, it is also competitive with, and sometimes better than, other state-of-the-art approaches for a fraction of their runtime.
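
Centrality-based subgraph selection can be sketched as follows. The abstract does not say which walk-based centrality HyMN uses, so this sketch assumes a truncated subgraph centrality (closed walks of length k weighted by 1/k!); the function names, the truncation length, and the toy graph are illustrative only.

```python
from math import factorial


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def subgraph_centrality(adj, max_len=6):
    """Truncated subgraph centrality: sum over k of (A^k)[i][i] / k!."""
    n = len(adj)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0
    scores = [1.0] * n  # k = 0 term
    for k in range(1, max_len + 1):
        power = matmul(power, adj)
        for i in range(n):
            scores[i] += power[i][i] / factorial(k)
    return scores


def top_nodes(adj, budget):
    """Indices of the `budget` most central nodes, most central first."""
    scores = subgraph_centrality(adj)
    return sorted(range(len(adj)), key=lambda i: -scores[i])[:budget]


# Path graph 0 - 1 - 2: the middle node lies on the most closed walks.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
chosen = top_nodes(path, budget=1)
print(chosen)
```

Only the selected nodes would then seed subgraphs, so the bag size is set by `budget` rather than by the number of nodes.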



Article 636

Title@2025-07-07 (1): Zero-shot Medical Event Prediction Using a Generative Pre-trained Transformer on Electronic Health Records

Title: Zero-shot Medical Event Prediction Using a Generative Pre-trained Transformer on Electronic Health Records Zero-shot Medical Event Vorhersage mit einem Generativen Vortrainierten Transformer auf elektronischen Gesundheitsdaten 利用电子健康记录培训前变革器生成电子健康记录 2503.05893v2

Authors (10): Ekaterina Redekop, Zichen Wang, Rushikesh Kulkarni, Mara Pleasure, Aaron Chin, Hamid Reza Hassanzadeh, Brian L. Hill, Melika Emami, William Speier, Corey W. Arnold

Longitudinal data in electronic health records (EHRs) represent an individual’s clinical history through a sequence of codified concepts, including diagnoses, procedures, medications, and laboratory tests. Generative pre-trained transformers (GPT) can leverage this data to predict future events. While fine-tuning of these models can enhance task-specific performance, it becomes costly when applied to many clinical prediction tasks. In contrast, a pretrained foundation model can be used in a zero-shot forecasting setting, offering a scalable alternative to fine-tuning separate models for each outcome. This study presents the first comprehensive analysis of zero-shot forecasting with GPT-based foundational models in EHRs, introducing a novel pipeline that formulates medical concept prediction as a generative modeling task. Unlike supervised approaches requiring extensive labeled data, our method enables the model to forecast the next medical event purely from pretraining knowledge. We evaluate performance across multiple time horizons and clinical categories, demonstrating the model’s ability to capture latent temporal dependencies and complex patient trajectories without task supervision. Model performance for predicting the next medical concept was evaluated using precision and recall metrics, achieving an average top-1 precision of 0.614 and recall of 0.524. For 12 major diagnostic conditions, the model demonstrated strong zero-shot performance, achieving high true positive rates while maintaining low false positives. We demonstrate the power of a foundational EHR GPT model in capturing diverse phenotypes and enabling robust, zero-shot forecasting of clinical outcomes. This capability enhances the versatility of predictive healthcare models and reduces the need for task-specific training, enabling more scalable applications in clinical settings.



Article 637

Title@2025-07-07 (1): Simulating Refractive Distortions and Weather-Induced Artifacts for Resource-Constrained Autonomous Perception

Title: Simulating Refractive Distortions and Weather-Induced Artifacts for Resource-Constrained Autonomous Perception Simulation refraaktiver Verzerrungen und wetterinduzierter Artefakte für ressourcenabhängige autonome Wahrnehmung 为受资源限制的自主感知模拟折变扭曲和天气诱导的天体 2507.05536v1

Authors (7): Moseli Mots’oehli, Feimei Chen, Hok Wai Chan, Itumeleng Tlali, Thulani Babeli, Kyungim Baek, Huaijin Chen

The scarcity of autonomous vehicle datasets from developing regions, particularly across Africa’s diverse urban, rural, and unpaved roads, remains a key obstacle to robust perception in low-resource settings. We present a procedural augmentation pipeline that enhances low-cost monocular dashcam footage with realistic refractive distortions and weather-induced artifacts tailored to challenging African driving scenarios. Our refractive module simulates optical effects from low-quality lenses and air turbulence, including lens distortion, Perlin noise, Thin-Plate Spline (TPS), and divergence-free (incompressible) warps. The weather module adds homogeneous fog, heterogeneous fog, and lens flare. To establish a benchmark, we provide baseline performance using three image restoration models. To support perception research in underrepresented African contexts, without costly data collection, labeling, or simulation, we release our distortion toolkit, augmented dataset splits, and benchmark results.



Article 638

Title@2025-07-07 (1): Special-Unitary Parameterization for Trainable Variational Quantum Circuits

Title: Special-Unitary Parameterization for Trainable Variational Quantum Circuits Spezial-Unitary-Parameterisierung für lernfähige Variations-Quantenkreise 可训练变量量电路特殊单位参数 2507.05535v1

Authors (5): Kuan-Cheng Chen, Huan-Hsin Tseng, Samuel Yen-Chi Chen, Chen-Yu Liu, Kin K. Leung

We propose SUN-VQC, a variational-circuit architecture whose elementary layers are single exponentials of a symmetry-restricted Lie subgroup, $\mathrm{SU}(2^{k}) \subset \mathrm{SU}(2^{n})$ with $k \ll n$. Confining the evolution to this compact subspace reduces the dynamical Lie-algebra dimension from $\mathcal{O}(4^{n})$ to $\mathcal{O}(4^{k})$, ensuring only polynomial suppression of gradient variance and circumventing barren plateaus that plague hardware-efficient ansätze. Exact, hardware-compatible gradients are obtained using a generalized parameter-shift rule, avoiding ancillary qubits and finite-difference bias. Numerical experiments on quantum auto-encoding and classification show that SUN-VQCs sustain order-of-magnitude larger gradient signals, converge 2–3$\times$ faster, and reach higher final fidelities than depth-matched Pauli-rotation or hardware-efficient circuits. These results demonstrate that Lie-subalgebra engineering provides a principled, scalable route to barren-plateau-resilient VQAs compatible with near-term quantum processors.
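
To illustrate what a parameter-shift rule does, the sketch below applies the standard two-term rule for a single Pauli rotation to a toy expectation value. This is not the generalized rule for $\mathrm{SU}(2^{k})$ layers that the abstract refers to; the function names and the use of cos(theta) as a stand-in for a circuit are assumptions for illustration.

```python
import math


def expectation(theta):
    """Toy stand-in for a circuit: <Z> after RX(theta) applied to |0>."""
    return math.cos(theta)


def parameter_shift_grad(f, theta, shift=math.pi / 2):
    """Two-term parameter-shift gradient for a single Pauli rotation.

    Exact whenever f(theta) = a + b*cos(theta) + c*sin(theta); unlike a
    finite difference, the shift does not need to be small.
    """
    return (f(theta + shift) - f(theta - shift)) / (2 * math.sin(shift))


theta = 0.3
grad = parameter_shift_grad(expectation, theta)
print(grad, -math.sin(theta))  # the two values agree up to rounding
```

Because the gradient comes from two evaluations of the same circuit at shifted parameters, no ancillary qubits are needed and there is no step-size bias, which is the property the abstract highlights.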



Article 639

Title@2025-07-07 (1): Random Walks with Tweedie: A Unified View of Score-Based Diffusion Models

Title: Random Walks with Tweedie: A Unified View of Score-Based Diffusion Models Zufällige Spaziergänge mit Tweedie: Eine einheitliche Ansicht von Score-basierten Diffusionsmodellen 与 Tweedie 一起随机漫步:基于分数的传播模型统一视图 2411.18702v2

Authors (5): Chicago Y. Park, Michael T. McCann, Cristina Garcia-Cardona, Brendt Wohlberg, Ulugbek S. Kamilov

We present a concise derivation for several influential score-based diffusion models that relies on only a few textbook results. Diffusion models have recently emerged as powerful tools for generating realistic, synthetic signals – particularly natural images – and often play a role in state-of-the-art algorithms for inverse problems in image processing. While these algorithms are often surprisingly simple, the theory behind them is not, and multiple complex theoretical justifications exist in the literature. Here, we provide a simple and largely self-contained theoretical justification for score-based diffusion models that is targeted towards the signal processing community. This approach leads to generic algorithmic templates for training and generating samples with diffusion models. We show that several influential diffusion models correspond to particular choices within these templates and demonstrate that alternative, more straightforward algorithmic choices can provide comparable results. This approach has the added benefit of enabling conditional sampling without any likelihood approximation.
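
The textbook result the title alludes to is Tweedie's formula, stated here for Gaussian noise. The symbols $x$, $y$, $n$, $\sigma$, and $p_\sigma$ are notation assumed for illustration rather than taken from the abstract.

```latex
y = x + \sigma n, \quad n \sim \mathcal{N}(0, I)
\quad \Longrightarrow \quad
\mathbb{E}[\,x \mid y\,] = y + \sigma^{2}\, \nabla_{y} \log p_{\sigma}(y),
```

where $p_\sigma$ is the density of the noisy observation $y$. Read in reverse, the score $\nabla_{y} \log p_{\sigma}(y)$ equals $(\mathbb{E}[x \mid y] - y)/\sigma^{2}$, so a minimum mean-squared-error denoiser gives an estimate of the score.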



Article 640

Title@2025-07-07 (1): Theoretical Learning Performance of Graph Neural Networks: The Impact of Jumping Connections and Layer-wise Sparsification

Title: Theoretical Learning Performance of Graph Neural Networks: The Impact of Jumping Connections and Layer-wise Sparsification Theoretische Lernleistung von Graph Neural Networks: Die Auswirkungen von Sprungverbindungen und schichtweise Sparsifikation 图形神经网络的理论学习表现:跳动连接和图层分化的影响 2507.05533v1

Authors (3): Jiawei Sun, Hongkang Li, Meng Wang

Jumping connections enable Graph Convolutional Networks (GCNs) to overcome over-smoothing, while graph sparsification reduces computational demands by selecting a sub-matrix of the graph adjacency matrix during neighborhood aggregation. Learning GCNs with graph sparsification has shown empirical success across various applications, but a theoretical understanding of the generalization guarantees remains limited, with existing analyses ignoring either graph sparsification or jumping connections. This paper presents the first learning dynamics and generalization analysis of GCNs with jumping connections using graph sparsification. Our analysis demonstrates that the generalization accuracy of the learned model closely approximates the highest achievable accuracy within a broad class of target functions dependent on the proposed sparse effective adjacency matrix $A^*$. Thus, graph sparsification maintains generalization performance when $A^*$ preserves the essential edges that support meaningful message propagation. We reveal that jumping connections lead to different sparsification requirements across layers. In a two-hidden-layer GCN, the generalization is more affected by the sparsified matrix deviations from $A^*$ of the first layer than the second layer. To the best of our knowledge, this marks the first theoretical characterization of jumping connections’ role in sparsification requirements. We validate our theoretical results on benchmark datasets in deep GCNs.


Article 641

Title: Bit-Flip Fault Attack: Crushing Graph Neural Networks via Gradual Bit Search Bit-Flip-Fault-Angriff: Zerkleinernde Graphen-Neural-Netzwerke über schrittweise Bitsuche Bit- Flip 错误攻击: 通过渐变位搜索粉碎图形神经网络 2507.05531v1

Authors (2): Sanaz Kazemi Abharian, Sai Manoj Pudukotai Dinakarrao

Graph Neural Networks (GNNs) have emerged as a powerful machine learning method for graph-structured data. A plethora of hardware accelerators has been introduced to meet the performance demands of GNNs in real-world applications. However, security challenges of hardware-based attacks have been generally overlooked. In this paper, we investigate the vulnerability of GNN models to hardware-based fault attack, wherein an attacker attempts to misclassify output by modifying trained weight parameters through fault injection in a memory device. Thus, we propose Gradual Bit-Flip Fault Attack (GBFA), a layer-aware bit-flip fault attack, selecting a vulnerable bit in each selected weight gradually to compromise the GNN’s performance by flipping a minimal number of bits. To achieve this, GBFA operates in two steps. First, a Markov model is created to predict the execution sequence of layers based on features extracted from memory access patterns, enabling the launch of the attack within a specific layer. Subsequently, GBFA identifies vulnerable bits within the selected weights using gradient ranking through an in-layer search. We evaluate the effectiveness of the proposed GBFA attack on various GNN models for node classification tasks using the Cora and PubMed datasets. Our findings show that GBFA significantly degrades prediction accuracy, and the variation in its impact across different layers highlights the importance of adopting a layer-aware attack strategy in GNNs. For example, GBFA degrades GraphSAGE’s prediction accuracy by 17% on the Cora dataset with only a single bit flip in the last layer.
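
To make the idea of a gradient-ranked bit flip concrete, here is a minimal sketch that assumes signed 8-bit quantized weights. The function names `flip_bit_int8` and `most_damaging_flip`, the first-order loss estimate `grad * delta`, and the int8 format are assumptions for illustration. The Markov-model layer prediction and the actual GBFA search procedure are not modeled.

```python
def flip_bit_int8(weight, bit):
    """Flip one bit of a signed 8-bit weight stored in two's complement."""
    if not -128 <= weight <= 127:
        raise ValueError("weight must fit in int8")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    raw = (weight & 0xFF) ^ (1 << bit)       # view as unsigned byte, toggle the bit
    return raw - 256 if raw >= 128 else raw  # convert back to signed

def most_damaging_flip(weights, grads):
    """Pick the weight with the largest |gradient|, then the bit whose flip
    maximizes the first-order loss change grad * (new_weight - old_weight)."""
    idx = max(range(len(weights)), key=lambda i: abs(grads[i]))
    w, g = weights[idx], grads[idx]
    best_bit = max(range(8), key=lambda b: g * (flip_bit_int8(w, b) - w))
    return idx, best_bit, flip_bit_int8(w, best_bit)
```

In this toy ranking the sign bit is often the most damaging choice, because flipping it changes the stored value by 128.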


Article 642

Title@2025-07-07 (1): Mitigating Shortcut Learning with InterpoLated Learning

Title: Mitigating Shortcut Learning with InterpoLated Learning Kurzbefehls-Lernen mit interpoLiertem Lernen abmildern 以互插学习来缩短快捷键学习 2507.05527v1

Authors (3): Michalis Korakakis, Andreas Vlachos, Adrian Weller

Empirical risk minimization (ERM) incentivizes models to exploit shortcuts, i.e., spurious correlations between input attributes and labels that are prevalent in the majority of the training data but unrelated to the task at hand. This reliance hinders generalization on minority examples, where such correlations do not hold. Existing shortcut mitigation approaches are model-specific, difficult to tune, computationally expensive, and fail to improve learned representations. To address these issues, we propose InterpoLated Learning (InterpoLL) which interpolates the representations of majority examples to include features from intra-class minority examples with shortcut-mitigating patterns. This weakens shortcut influence, enabling models to acquire features predictive across both minority and majority examples. Experimental results on multiple natural language understanding tasks demonstrate that InterpoLL improves minority generalization over both ERM and state-of-the-art shortcut mitigation methods, without compromising accuracy on majority examples. Notably, these gains persist across encoder, encoder-decoder, and decoder-only architectures, demonstrating the method’s broad applicability.
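
The interpolation step can be sketched as follows. This is an assumption-laden toy: the linear mixing rule, the fixed coefficient `lam`, the explicit `is_minority` flags, and the function names are all hypothetical, since the abstract does not say how minority examples are identified or how the interpolation is weighted.

```python
import random

def interpolate(z_major, z_minor, lam):
    """Linearly mix a majority representation with a minority one."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(z_major, z_minor)]

def interpolate_batch(reps, labels, is_minority, lam, seed=0):
    """Mix each majority representation with a same-class minority one."""
    rng = random.Random(seed)
    pools = {}  # class label -> minority representations of that class
    for z, y, m in zip(reps, labels, is_minority):
        if m:
            pools.setdefault(y, []).append(z)
    out = []
    for z, y, m in zip(reps, labels, is_minority):
        if m or y not in pools:
            out.append(list(z))  # minority examples (or no partner) stay unchanged
        else:
            out.append(interpolate(z, rng.choice(pools[y]), lam))
    return out
```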


Article 643

Title@2025-07-07 (1): Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning

Title: Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning Schätzung der Interventionsverteilungen mit unsicheren Kausalgraphen durch Meta-Learning 通过Met-Llearn 估计含不确定因果图的干预分布 2507.05526v1

Authors (6): Anish Dhir, Cristiana Diaconu, Valentinian Mihai Lungu, James Requeima, Richard E. Turner, Mark van der Wilk

In scientific domains – from biology to the social sciences – many questions boil down to “What effect will we observe if we intervene on a particular variable?” If the causal relationships (e.g., a causal graph) are known, it is possible to estimate the intervention distributions. In the absence of this domain knowledge, the causal structure must be discovered from the available observational data. However, observational data are often compatible with multiple causal graphs, making methods that commit to a single structure prone to overconfidence. A principled way to manage this structural uncertainty is via Bayesian inference, which averages over a posterior distribution on possible causal structures and functional mechanisms. Unfortunately, the number of causal structures grows super-exponentially with the number of nodes in the graph, making computations intractable. We propose to circumvent these challenges by using meta-learning to create an end-to-end model: the Model-Averaged Causal Estimation Transformer Neural Process (MACE-TNP). The model is trained to predict the Bayesian model-averaged interventional posterior distribution, and its end-to-end nature bypasses the need for expensive calculations. Empirically, we demonstrate that MACE-TNP outperforms strong Bayesian baselines. Our work establishes meta-learning as a flexible and scalable paradigm for approximating complex Bayesian causal inference that can be scaled to increasingly challenging settings in the future.


Article 644

Title@2025-07-07 (1): Escaping Plato’s Cave: JAM for Aligning Independently Trained Vision and Language Models

Title: Escaping Plato’s Cave: JAM for Aligning Independently Trained Vision and Language Models Escaping Platons Cave: JAM for Aligning Independently Trained Vision and Language Models 脱离柏拉图的洞穴:调整独立培训的愿景和语言模式的JAM 2507.01201v3

Authors (3): Hyoseo Yoon, Yisong Yue, Been Kim

Independently trained vision and language models inhabit disjoint representational spaces, shaped by their respective modalities, objectives, and architectures. Yet an emerging hypothesis - the Platonic Representation Hypothesis - suggests that such models may nonetheless converge toward a shared statistical model of reality. This compatibility, if it exists, raises a fundamental question: can we move beyond post-hoc statistical detection of alignment and explicitly optimize for it between such disjoint representations? We cast this Platonic alignment problem as a multi-objective optimization task - preserve each modality’s native structure while aligning for mutual coherence. We introduce the Joint Autoencoder Modulator (JAM) framework that jointly trains modality-specific autoencoders on the latent representations of pre-trained single modality models, encouraging alignment through both reconstruction and cross-modal objectives. By analogy, this framework serves as a method to escape Plato’s Cave, enabling the emergence of shared structure from disjoint inputs. We evaluate this framework across three critical design axes: (i) the alignment objective - comparing contrastive loss (Con), its hard-negative variant (NegCon), and our Spread loss, (ii) the layer depth at which alignment is most effective, and (iii) the impact of foundation model scale on representational convergence. Our findings show that our lightweight Pareto-efficient framework reliably induces alignment, even across frozen, independently trained representations, offering both theoretical insight and practical pathways for transforming generalist unimodal foundations into specialist multimodal models.


Article 645

Title@2025-07-07 (1): Deep Learning of Continuous and Structured Policies for Aggregated Heterogeneous Treatment Effects

Title: Deep Learning of Continuous and Structured Policies for Aggregated Heterogeneous Treatment Effects Deep Learning von kontinuierlichen und strukturierten Strategien für Aggregate Heterogene Behandlungseffekte 深入学习关于综合异异同治疗效果的持续和结构性政策 2507.05511v1

Authors (3): Jennifer Y. Zhang, Shuyang Du, Will Y. Zou

As estimation of Heterogeneous Treatment Effect (HTE) is increasingly adopted across a wide range of scientific and industrial applications, the treatment action space can naturally expand, from a binary treatment variable to a structured treatment policy. This policy may include several policy factors such as a continuous treatment intensity variable, or discrete treatment assignments. From first principles, we derive the formulation for incorporating multiple treatment policy variables into the functional forms of individual and average treatment effects. Building on this, we develop a methodology to directly rank subjects using aggregated HTE functions. In particular, we construct a Neural-Augmented Naive Bayes layer within a deep learning framework to incorporate an arbitrary number of factors that satisfies the Naive Bayes assumption. The factored layer is then applied with continuous treatment variables, treatment assignment, and direct ranking of aggregated treatment effect functions. Together, these algorithms build towards a generic framework for deep learning of heterogeneous treatment policies, and we show their power to improve performance with public datasets.


Article 646

Title@2025-07-07 (1): Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting

Title: Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting Integration von Spatiotemporalen Features in LSTM für spatially Informed COVID-19 Hospitalization Forecasting LSTM综合地貌特征纳入LSTM系统,用于空间知情COVID-19医院化预测 2506.05752v2

Authors (5): Zhongying Wang, Thoai D. Ngo, Hamidreza Zoraghein, Benjamin Lucas, Morteza Karimzadeh

The COVID-19 pandemic’s severe impact highlighted the need for accurate and timely hospitalization forecasting to support effective healthcare planning. However, most forecasting models struggled, particularly during variant surges, when they were most needed. This study introduces a novel parallel-stream Long Short-Term Memory (LSTM) framework to forecast daily state-level incident hospitalizations in the United States. Our framework incorporates a spatiotemporal feature, Social Proximity to Hospitalizations (SPH), derived from Meta’s Social Connectedness Index, to improve forecasts. SPH serves as a proxy for interstate population interaction, capturing transmission dynamics across space and time. Our architecture captures both short- and long-term temporal dependencies, and a multi-horizon ensembling strategy balances forecasting consistency and error. An evaluation against the COVID-19 Forecast Hub ensemble models during the Delta and Omicron surges reveals the superiority of our model. On average, our model surpasses the ensemble by 27, 42, 54, and 69 hospitalizations per state at the 7-, 14-, 21-, and 28-day horizons, respectively, during the Omicron surge. Data-ablation experiments confirm SPH’s predictive power, highlighting its effectiveness in enhancing forecasting models. This research not only advances hospitalization forecasting but also underscores the significance of spatiotemporal features, such as SPH, in modeling the complex dynamics of infectious disease spread.


Article 647

Title@2025-07-07 (1): Heterogeneous Causal Learning for Optimizing Aggregated Functions in User Growth

Title: Heterogeneous Causal Learning for Optimizing Aggregated Functions in User Growth Heterogenes Kausallernen zur Optimierung Aggregation von Funktionen im Nutzerwachstum 在用户增长中优化综合功能以优化综合功能 2507.05510v1

Authors (3): Shuyang Du, Jennifer Zhang, Will Y. Zou

User growth is a major strategy for consumer internet companies. To optimize costly marketing campaigns and maximize user engagement, we propose a novel treatment effect optimization methodology to enhance user growth marketing. By leveraging deep learning, our algorithm learns from past experiments to optimize user selection and reward allocation, maximizing campaign impact while minimizing costs. Unlike traditional prediction methods, our model directly models uplifts in key business metrics. Further, our deep learning model can jointly optimize parameters for an aggregated loss function using softmax gating. Our approach surpasses traditional methods by directly targeting desired business metrics and demonstrates superior algorithmic flexibility in handling complex business constraints. Comprehensive evaluations, including comparisons with state-of-the-art techniques such as R-learner and Causal Forest, validate the effectiveness of our model. We experimentally demonstrate that our proposed constrained and direct optimization algorithms significantly outperform state-of-the-art methods by over $20\%$, proving their cost-efficiency and real-world impact. The versatile methods can be applied to various product scenarios, including optimal treatment allocation. Their effectiveness has also been validated through successful worldwide production deployments.


Article 648

Title@2025-07-07 (1): Beyond Communication Overhead: A Multilevel Monte Carlo Approach for Mitigating Compression Bias in Distributed Learning

Title: Beyond Communication Overhead: A Multilevel Monte Carlo Approach for Mitigating Compression Bias in Distributed Learning Beyond Communication Overhead: Ein mehrstufiger Monte-Carlo-Ansatz zur Vermeidung von Kompressions-Bias im verteilten Lernen ” 超越传播 “ overoverhead:减少在分配的学习中 “ 压力 “ 的多层次蒙特卡洛办法 2507.05508v1

Authors (3): Ze’ev Zukerman, Bassel Hamoud, Kfir Y. Levy

Distributed learning methods have gained substantial momentum in recent years, with communication overhead often emerging as a critical bottleneck. Gradient compression techniques alleviate communication costs but involve an inherent trade-off between the empirical efficiency of biased compressors and the theoretical guarantees of unbiased compressors. In this work, we introduce a novel Multilevel Monte Carlo (MLMC) compression scheme that leverages biased compressors to construct statistically unbiased estimates. This approach effectively bridges the gap between biased and unbiased methods, combining the strengths of both. To showcase the versatility of our method, we apply it to popular compressors, like Top-$k$ and bit-wise compressors, resulting in enhanced variants. Furthermore, we derive an adaptive version of our approach to further improve its performance. We validate our method empirically on distributed deep learning tasks.
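
One standard way to obtain an unbiased estimate from a family of biased compressors is a telescoping multilevel construction, sketched below with Top-$k$. This is an illustrative assumption, not necessarily the paper's scheme: level $l$ uses Top-$l$, the top level is the identity, a level is drawn with probability `probs[l-1]`, and the transmitted message is $(C_l(g) - C_{l-1}(g))/p_l$ with $C_0 = 0$. The names `top_k`, `mlmc_term`, and `mlmc_compress` are hypothetical.

```python
import random

def top_k(g, k):
    """Biased Top-k compressor: keep the k largest-magnitude entries."""
    order = sorted(range(len(g)), key=lambda i: abs(g[i]), reverse=True)
    keep = set(order[:k])
    return [g[i] if i in keep else 0.0 for i in range(len(g))]

def mlmc_term(g, level, probs):
    """Message sent when `level` (1-based) is drawn: (C_l(g) - C_{l-1}(g)) / p_l."""
    hi = top_k(g, level)
    lo = top_k(g, level - 1)  # level 0 compresses everything to zero
    p = probs[level - 1]
    return [(a - b) / p for a, b in zip(hi, lo)]

def mlmc_compress(g, probs, rng):
    """Draw one level at random and return its rescaled correction term."""
    level = rng.choices(range(1, len(g) + 1), weights=probs)[0]
    return mlmc_term(g, level, probs)
```

Summing `probs[l-1] * mlmc_term(g, l, probs)` over all levels telescopes to `top_k(g, len(g))`, which is `g` itself, so the estimate is unbiased even though every individual compressor is biased. In this particular sketch each message carries a single nonzero coordinate, and the choice of `probs` controls the variance.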


Article 649

Title@2025-07-07 (1): X-ray transferable polyrepresentation learning

Title: X-ray transferable polyrepresentation learning Röntgen übertragbares Polyrepräsentationslernen X光可转移多代表制学习 2507.06264v1

Authors (2): Weronika Hryniewska-Guzik, Przemyslaw Biecek

The success of machine learning algorithms is inherently related to the extraction of meaningful features, as they play a pivotal role in the performance of these algorithms. Central to this challenge is the quality of data representation. However, the ability to generalize and extract these features effectively from unseen datasets is also crucial. In light of this, we introduce a novel concept: the polyrepresentation. Polyrepresentation integrates multiple representations of the same modality extracted from distinct sources, for example, vector embeddings from the Siamese Network, self-supervised models, and interpretable radiomic features. This approach yields better performance metrics compared to relying on a single representation. Additionally, in the context of X-ray images, we demonstrate the transferability of the created polyrepresentation to a smaller dataset, underscoring its potential as a pragmatic and resource-efficient approach in various image-related solutions. It is worth noting that the concept of polyrepresentation, demonstrated here on medical data, can also be applied to other domains, showcasing its versatility and broad potential impact.


Article 650

Title@2025-07-07 (1): Dynamic Campus Origin-Destination Mobility Prediction using Graph Convolutional Neural Network on WiFi Logs

Title: Dynamic Campus Origin-Destination Mobility Prediction using Graph Convolutional Neural Network on WiFi Logs Dynamic Campus Origin-Destination Mobilitätsvorhersage mit Graph Convolutional Neural Network auf WiFi Logs 利用无线、无线、无线、无线、无线、无线、无线、无线、无线、无线、无线和无线 2507.05507v1

Authors (2): Godwin Badu-Marfo, Bilal Farooq

We present an integrated graph-based neural network architecture for predicting campus building occupancy and inter-building movement at dynamic temporal resolution that learns traffic flow patterns from Wi-Fi logs combined with the usage schedules within the buildings. The relative traffic flows are directly estimated from the WiFi data without assuming the occupant behaviour or preferences while maintaining individual privacy. We formulate the problem as a data-driven graph structure represented by a set of nodes (representing buildings), connected through a route of edges or links using a novel Graph Convolution plus LSTM Neural Network (GCLSTM) which has shown remarkable success in modelling complex patterns. We describe the formulation, model estimation, and interpretability, and examine the relative performance of our proposed model. We also present an illustrative architecture of the models and apply them to real-world WiFi logs collected at the Toronto Metropolitan University campus. The results of the experiments show that the integrated GCLSTM models significantly outperform traditional pedestrian flow estimators like the Multi Layer Perceptron (MLP) and Linear Regression.


Article 651

Title@2025-07-07 (1): Predicting mutational effects on protein binding from folding energy

Title: Predicting mutational effects on protein binding from folding energy Vorhersage von Mutationseffekten auf Proteinbindung aus Faltenergie 预测折叠能量对蛋白质结合的突变效应 2507.05502v1

Authors (6): Arthur Deng, Karsten Householder, Fang Wu, Sebastian Thrun, K. Christopher Garcia, Brian Trippe

Accurate estimation of mutational effects on protein-protein binding energies is an open problem with applications in structural biology and therapeutic design. Several deep learning predictors for this task have been proposed, but, presumably due to the scarcity of binding data, these methods underperform computationally expensive estimates based on empirical force fields. In response, we propose a transfer-learning approach that leverages advances in protein sequence modeling and folding stability prediction for this task. The key idea is to parameterize the binding energy as the difference between the folding energy of the protein complex and the sum of the folding energies of its binding partners. We show that using a pre-trained inverse-folding model as a proxy for folding energy provides strong zero-shot performance, and can be fine-tuned with (1) copious folding energy measurements and (2) more limited binding energy measurements. The resulting predictor, StaB-ddG, is the first deep learning predictor to match the accuracy of the state-of-the-art empirical force-field method FoldX, while offering an over 1,000x speed-up.
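
Written out, the parameterization described above takes roughly the following form. The symbols are assumptions introduced for illustration ($A$ and $B$ for the binding partners, $AB$ for the complex, "mut" and "wt" for mutant and wild type), and sign conventions for $\Delta\Delta G$ vary between sources.

```latex
% Binding energy as complex folding energy minus the partners' folding energies
\Delta G_{\mathrm{bind}}(A{:}B)
  = \Delta G_{\mathrm{fold}}(AB)
  - \bigl( \Delta G_{\mathrm{fold}}(A) + \Delta G_{\mathrm{fold}}(B) \bigr)

% Mutational effect on binding (assumed convention: mutant minus wild type)
\Delta\Delta G_{\mathrm{bind}}
  = \Delta G_{\mathrm{bind}}^{\mathrm{mut}}
  - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}
```

Under this decomposition, any model that scores folding energy, such as the pre-trained inverse-folding model mentioned in the abstract, can be evaluated on the complex and on each partner to produce a binding prediction.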


Article 652

Title@2025-07-07 (1): Explainable Hierarchical Deep Learning Neural Networks (Ex-HiDeNN)

Title: Explainable Hierarchical Deep Learning Neural Networks (Ex-HiDeNN) Erklärbare Hierarchische Deep-Learning-Neural-Netzwerke (Ex-HiDeNN) 可解释的等级级深层学习神经网络(Ex-HIDENN) 2507.05498v1

Authors (4): Reza T. Batley, Chanwook Park, Wing Kam Liu, Sourav Saha

Data-driven science and computation have advanced immensely to construct complex functional relationships using trainable parameters. However, efficiently discovering interpretable and accurate closed-form expressions from complex datasets remains a challenge. The article presents a novel approach called Explainable Hierarchical Deep Learning Neural Networks or Ex-HiDeNN that uses an accurate, frugal, fast, separable, and scalable neural architecture with symbolic regression to discover closed-form expressions from limited observation. The article presents the two-step Ex-HiDeNN algorithm with a separability checker embedded in it. The accuracy and efficiency of Ex-HiDeNN are tested on several benchmark problems, including discerning a dynamical system from data, and the outcomes are reported. Ex-HiDeNN generally shows outstanding approximation capability in these benchmarks, producing orders of magnitude smaller errors compared to reference data and traditional symbolic regression. Later, Ex-HiDeNN is applied to three engineering applications: a) discovering a closed-form fatigue equation, b) identification of hardness from micro-indentation test data, and c) discovering the expression for the yield surface with data. In every case, Ex-HiDeNN outperformed the reference methods used in the literature. The proposed method is built upon the foundation and published works of the authors on Hierarchical Deep Learning Neural Network (HiDeNN) and Convolutional HiDeNN. The article also provides a clear idea about the current limitations and future extensions of Ex-HiDeNN.


Article 653

Title@2025-07-07 (1): MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

Title: MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation MEIT: Multimodale Elektrokardiogramm-Anleitung Tuning auf großen Sprachmodellen für die Berichtserstellung MEIT:多式电动心电图指示,关于报告生成大语言模式的图示 2403.04945v4

Authors (9): Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Jing Xiong, Rossella Arcucci, Huaxiu Yao, Mi Zhang

Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions. To facilitate future research, we establish a benchmark to evaluate MEIT with various LLM backbones across two large-scale ECG datasets. Our approach uniquely aligns the representations of the ECG signal and the report, and we conduct extensive experiments to benchmark MEIT with nine open-source LLMs using more than 800,000 ECG reports. MEIT’s results underscore the superior performance of instruction-tuned LLMs, showcasing their proficiency in quality report generation, zero-shot capabilities, resilience to signal perturbation, and alignment with human expert evaluation. These findings emphasize the efficacy of MEIT and its potential for real-world clinical application.


Article 654

Title@2025-07-07 (1): Cloud Diffusion Part 1: Theory and Motivation

Title: Cloud Diffusion Part 1: Theory and Motivation Cloud Diffusion Teil 1: Theorie und Motivation 云传播第一部分:理论和动力 2507.05496v1

Authors (1): Andrew Randono

Diffusion models for image generation function by progressively adding noise to an image set and training a model to separate out the signal from the noise. The noise profile used by these models is white noise – that is, noise based on independent normal distributions at each point whose mean and variance are independent of the scale. By contrast, most natural image sets exhibit a type of scale invariance in their low-order statistical properties characterized by a power-law scaling. Consequently, natural images are closer (in a quantifiable sense) to a different probability distribution that emphasizes large scale correlations and de-emphasizes small scale correlations. These scale invariant noise profiles can be incorporated into diffusion models in place of white noise to form what we will call a “Cloud Diffusion Model”. We argue that these models can lead to faster inference, improved high-frequency details, and greater controllability. In a follow-up paper, we will build and train a Cloud Diffusion Model that uses scale invariance at a fundamental level and compare it to classic, white noise diffusion models.
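
For intuition about power-law noise, the sketch below builds a one-dimensional signal whose power spectrum falls off as $k^{-\beta}$ by summing sinusoids with random phases. It is a simplified stand-in rather than the construction proposed in the paper: the 1D setting, the exponent `beta`, the random-phase (non-Gaussian) synthesis, and the names `amplitude` and `power_law_noise` are all assumptions.

```python
import math
import random

def amplitude(k, beta):
    """Fourier amplitude at frequency k for a power spectrum ~ k**(-beta)."""
    return k ** (-beta / 2.0)

def power_law_noise(n, beta, seed=0):
    """Sum of random-phase sinusoids at integer frequencies 1 .. n//2 - 1."""
    rng = random.Random(seed)
    phases = {k: rng.uniform(0.0, 2.0 * math.pi) for k in range(1, n // 2)}
    return [
        sum(amplitude(k, beta) * math.cos(2.0 * math.pi * k * i / n + phi)
            for k, phi in phases.items())
        for i in range(n)
    ]
```

Setting `beta = 0` gives a flat spectrum, while larger values of `beta` shift energy toward low frequencies, that is, toward the large-scale correlations the abstract emphasizes.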


Article 655

Title@2025-07-07 (1): Navigating Sparse Molecular Data with Stein Diffusion Guidance

Title: Navigating Sparse Molecular Data with Stein Diffusion Guidance Navigation von Sparse Molekulardaten mit Stein Diffusion Guidance 使用 Stein 扩散指导导航微粒分子数据 2507.05482v1

Authors (3): Van Khoa Nguyen, Lionel Blondé, Alexandros Kalousis

Stochastic optimal control (SOC) has recently emerged as a principled framework for fine-tuning diffusion models. However, its dependence on computationally intensive simulations makes it impractical for fast sampling. In parallel, a class of training-free approaches has been developed that guides diffusion models using off-the-shelf classifiers on predicted clean samples, bypassing the need to train classifiers on noisy data. These methods can be interpreted as approximate SOC schemes, using Tweedie’s formula to estimate diffusion posteriors. In practice, however, such direct approximations can introduce significant errors, leading to unreliable guidance. In this work, we unify the strengths of both paradigms by proposing a novel training-free diffusion guidance framework based on a surrogate stochastic optimal control objective. We derive a new theoretical bound on the value function that reveals the necessity of correcting the approximate posteriors to remain faithful to the true diffusion posterior. To this end, we connect the problem with Stein variational inference, which seeks the steepest descent direction that minimizes the Kullback-Leibler discrepancy between the two posteriors. Our method, which we refer to as Stein Diffusion Guidance (SDG), introduces a principled correction mechanism and incorporates a novel running cost functional to enable effective guidance in low-density regions. Experiments on challenging molecular generation tasks demonstrate that SDG significantly outperforms standard training-free guidance methods, highlighting its potential for broader applications.


Article 656

Title@2025-07-07 (1): Dynamic Regret Reduces to Kernelized Static Regret

Title: Dynamic Regret Reduces to Kernelized Static Regret Dynamisches Bedauern reduziert den kernelisierten statischen Bedauern 向内内中性静态迟缓的动态遗憾减少 2507.05478v1

Authors (4): Andrew Jacobsen, Alessandro Rudi, Francesco Orabona, Nicolo Cesa-Bianchi

We study dynamic regret in online convex optimization, where the objective is to achieve low cumulative loss relative to an arbitrary benchmark sequence. By observing that competing with an arbitrary sequence of comparators $u_{1},\ldots,u_{T}$ in $\mathcal{W}\subseteq\mathbb{R}^{d}$ is equivalent to competing with a fixed comparator function $u:[1,T]\to \mathcal{W}$, we frame dynamic regret minimization as a static regret problem in a function space. By carefully constructing a suitable function space in the form of a Reproducing Kernel Hilbert Space (RKHS), our reduction enables us to recover the optimal $R_{T}(u_{1},\ldots,u_{T}) = \mathcal{O}(\sqrt{\sum_{t}|u_{t}-u_{t-1}|T})$ dynamic regret guarantee in the setting of linear losses, and yields new scale-free and directionally-adaptive dynamic regret guarantees. Moreover, unlike prior dynamic-to-static reductions – which are valid only for linear losses – our reduction holds for any sequence of losses, allowing us to recover $\mathcal{O}\big(|u|^2+d_{\mathrm{eff}}(\lambda)\ln T\big)$ bounds in exp-concave and improper linear regression settings, where $d_{\mathrm{eff}}(\lambda)$ is a measure of complexity of the RKHS. Despite working in an infinite-dimensional space, the resulting reduction leads to algorithms that are computable in practice, due to the reproducing property of RKHSs.


Article 657

Title@2025-07-07 (1): Epistemically-guided forward-backward exploration

Title: Epistemically-guided forward-backward exploration Epistemisch geführte Forward-Backward-Exploration 后向探索 2507.05477v1

Authors (4): Núria Armengol Urpí, Marin Vlastelica, Georg Martius, Stelian Coros

Zero-shot reinforcement learning is necessary for extracting optimal policies in the absence of concrete rewards for fast adaptation to future problem settings. Forward-backward representations (FB) have emerged as a promising method for learning optimal policies in the absence of rewards via a factorization of the policy occupancy measure. However, up until now, FB and many similar zero-shot reinforcement learning algorithms have been decoupled from the exploration problem, generally relying on other exploration algorithms for data collection. We argue that FB representations should fundamentally be used for exploration in order to learn more efficiently. With this goal in mind, we design exploration policies that arise naturally from the FB representation and minimize its posterior variance, hence minimizing its epistemic uncertainty. We empirically demonstrate that such principled exploration strategies improve sample complexity of the FB algorithm considerably in comparison to other exploration methods. Code is publicly available at https://sites.google.com/view/fbee-url.


Article 658

Title@2025-07-07 (1): Features are fate: a theory of transfer learning in high-dimensional regression

Title: Features are fate: a theory of transfer learning in high-dimensional regression Merkmale sind Schicksal: eine Theorie des Transferlernens in hochdimensionaler Regression 特征是命运:在高维回归中传授学习的理论 2410.08194v2

Authors (3): Javan Tahir, Surya Ganguli, Grant M. Rotskoff

With the emergence of large-scale pre-trained neural networks, methods to adapt such “foundation” models to data-limited downstream tasks have become a necessity. Fine-tuning, preference optimization, and transfer learning have all been successfully employed for these purposes when the target task closely resembles the source task, but a precise theoretical understanding of “task similarity” is still lacking. While conventional wisdom suggests that simple measures of similarity between source and target distributions, such as $\phi$-divergences or integral probability metrics, can directly predict the success of transfer, we prove the surprising fact that, in general, this is not the case. We adopt, instead, a feature-centric viewpoint on transfer learning and establish a number of theoretical results that demonstrate that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch. We study deep linear networks as a minimal model of transfer learning in which we can analytically characterize the transferability phase diagram as a function of the target dataset size and the feature space overlap. For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance, especially in the low data limit. These results build on an emerging understanding of feature learning dynamics in deep linear networks, and we demonstrate numerically that the rigorous results we derive for the linear case also apply to nonlinear networks.


Article 659

Title@2025-07-07 (1): Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting

Title: Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting Temporal Conformal Prediction (TCP): Ein verteilungsfreies statistisches und maschinelles Lernkonzept für adaptive Risikoprognosen 时空危机预测:用于适应风险预测的不分发的统计和机器学习框架 2507.05470v1

Authors (3): Agnideep Aich, Ashit Baran Aich, Dipak C. Jain

We propose Temporal Conformal Prediction (TCP), a novel framework for constructing prediction intervals in financial time-series with guaranteed finite-sample validity. TCP integrates quantile regression with a conformal calibration layer that adapts online via a decaying learning rate. This hybrid design bridges statistical and machine learning paradigms, enabling TCP to accommodate non-stationarity, volatility clustering, and regime shifts which are hallmarks of real-world asset returns, without relying on rigid parametric assumptions. We benchmark TCP against established methods including GARCH, Historical Simulation, and static Quantile Regression across equities (S&P 500), cryptocurrency (Bitcoin), and commodities (Gold). Empirical results show that TCP consistently delivers sharper intervals with competitive or superior coverage, particularly in high-volatility regimes. Our study underscores TCP’s strength in navigating the coverage-sharpness tradeoff, a central challenge in modern risk forecasting. Overall, TCP offers a distribution-free, adaptive, and interpretable alternative for financial uncertainty quantification, advancing the interface between statistical inference and machine learning in finance.
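
A conformal calibration layer that adapts online can be sketched as a running correction to the quantile-regression interval. The version below is a minimal illustration in the style of adaptive conformal updates, not the TCP procedure itself: the additive margin `c`, the update `c += gamma_t * (miss - alpha)`, the `gamma0 / sqrt(t)` decay schedule, and the name `online_conformal` are all assumptions.

```python
def online_conformal(lower, upper, y, alpha=0.1, gamma0=0.5):
    """Adjust base intervals [lower_t, upper_t] online toward 1 - alpha coverage.

    lower, upper: base interval bounds (e.g. from quantile regression).
    y: realized values, revealed one step at a time.
    Returns the calibrated intervals and the empirical miss rate.
    """
    c = 0.0  # additive margin; may go negative and shrink the base interval
    intervals, misses = [], 0
    for t, (lo, hi, obs) in enumerate(zip(lower, upper, y), start=1):
        lo_t, hi_t = lo - c, hi + c
        intervals.append((lo_t, hi_t))
        miss = 0 if lo_t <= obs <= hi_t else 1
        misses += miss
        gamma = gamma0 / t ** 0.5    # decaying learning rate (assumed schedule)
        c += gamma * (miss - alpha)  # widen after a miss, narrow slightly otherwise
    return intervals, misses / len(intervals)
```

Each miss widens the next interval by `gamma * (1 - alpha)` on each side, and each covered point narrows it by `gamma * alpha`, so the margin stops drifting only when the miss frequency is close to `alpha`.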



Article 660

Title@2025-07-07 (1): GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC

Title: GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC GCN-getriebenes Verstärkungslernen für probabilistische Echtzeitgarantien in industriellen URLLC GCN-Driven Energy Enter Enter Engment 学习,以在工业URLC中进行概率性实时保证 2506.15011v2

Authors (2): Eman Alqudah, Ashfaq Khokhar

Ensuring packet-level communication quality is vital for ultra-reliable, low-latency communications (URLLC) in large-scale industrial wireless networks. We enhance the Local Deadline Partition (LDP) algorithm by introducing a Graph Convolutional Network (GCN) integrated with a Deep Q-Network (DQN) reinforcement learning framework for improved interference coordination in multi-cell, multi-channel networks. Unlike LDP’s static priorities, our approach dynamically learns link priorities based on real-time traffic demand, network topology, remaining transmission opportunities, and interference patterns. The GCN captures spatial dependencies, while the DQN enables adaptive scheduling decisions through reward-guided exploration. Simulation results show that our GCN-DQN model achieves mean SINR improvements of 179.6%, 197.4%, and 175.2% over LDP across three network configurations. Additionally, the GCN-DQN model demonstrates mean SINR improvements of 31.5%, 53.0%, and 84.7% over our previous CNN-based approach across the same configurations. These results underscore the effectiveness of our GCN-DQN model in addressing complex URLLC requirements with minimal overhead and superior network performance.



Article 661

Title@2025-07-07 (1): 2048: Reinforcement Learning in a Delayed Reward Environment

Title: 2048: Reinforcement Learning in a Delayed Reward Environment 2048: Verstärktes Lernen in einer verzögerten Belohnungsumgebung 2048年:在延迟奖励环境中加强学习 2507.05465v1

Authors (3): Prady Saligram, Tanvir Bhathal, Robby Manihani

Delayed and sparse rewards present a fundamental obstacle for reinforcement-learning (RL) agents, which struggle to assign credit for actions whose benefits emerge many steps later. The sliding-tile game 2048 epitomizes this challenge: although frequent small score changes yield immediate feedback, they often mislead agents into locally optimal but globally suboptimal strategies. In this work, we introduce a unified, distributional multi-step RL framework designed to directly optimize long-horizon performance. Using the open-source Gym-2048 environment, we develop and compare four agent variants: standard DQN, PPO, QR-DQN (Quantile Regression DQN), and a novel Horizon-DQN (H-DQN) that integrates distributional learning, dueling architectures, noisy networks, prioritized replay, and more. Empirical evaluation reveals a clear hierarchy in effectiveness: max episode scores improve from 3.988K (DQN) to 5.756K (PPO), 8.66K (QR-DQN), and 18.21K (H-DQN), with H-DQN reaching the 2048 tile. When scaled up, H-DQN reaches a max score of 41.828K and a 4096 tile. These results demonstrate that distributional, multi-step targets substantially enhance performance in sparse-reward domains, and they suggest promising avenues for further gains through model-based planning and curriculum learning.
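As a rough illustration of what a distributional, multi-step target looks like, the sketch below computes a standard n-step bootstrapped return and applies the same shift and discount to each quantile of a next-state return distribution, as in n-step variants of QR-DQN. It is a textbook construction, not the H-DQN implementation; the function names and default discount are hypothetical.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value."""
    target = bootstrap_value
    # Fold rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def n_step_quantile_targets(rewards, next_quantiles, gamma=0.99):
    """Apply the n-step shift and discount to every quantile of the next-state distribution."""
    return [n_step_target(rewards, q, gamma) for q in next_quantiles]
```

For example, with rewards `[1.0, 2.0, 3.0]`, a bootstrap value of `8.0`, and `gamma=0.5`, the target is `1 + 0.5*2 + 0.25*3 + 0.125*8 = 3.75`.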



Article 662

Title@2025-07-07 (1): RSPO: Regularized Self-Play Alignment of Large Language Models

Title: RSPO: Regularized Self-Play Alignment of Large Language Models RSPO: Regularisierte Selbstausrichtung großer Sprachmodelle RSPO:大语言模式标准化的自用调整 2503.00030v2

Authors (6): Xiaohang Tang, Sangwoong Yoon, Seongho Son, Huizhuo Yuan, Quanquan Gu, Ilija Bogunovic

Self-play alignment has emerged as an effective approach for fine-tuning large language models (LLMs), formulating preference optimization as a two-player game. However, the regularization with respect to the reference policy, which is crucial for mitigating over-optimization, has been insufficiently investigated in self-play alignment. To study the impact of different regularization strategies, we propose Regularized Self-Play Policy Optimization (RSPO), a general and modular framework that unifies prior methods and enables simple plug-and-play integration of various regularizers while preserving convergence to the Nash equilibrium of the corresponding regularized game. Our empirical study involving over $120$ fine-tuned Mistral-7B-Instruct models reveals that forward KL divergence regularization reduces response length, whereas reverse KL divergence markedly improves raw win rates. Crucially, RSPO regularized with a linear combination of forward and reverse KL divergence significantly boosts the length-controlled win rate on AlpacaEval-2 from $28.5\%$ (unregularized self-play, SPPO) to $35.4\%$, and consistently demonstrates superior performance on Arena-Hard, MT-Bench, ArmoRM scores, and response diversity. Combining simplicity, convergence guarantees, and significant empirical gains, RSPO offers a strong foundation for exploring regularized self-play in language model alignment.
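The linear combination of forward and reverse KL regularizers can be sketched for discrete distributions. This assumes the common RLHF convention that reverse KL is KL(policy || reference) and forward KL is KL(reference || policy); the function names and weights are hypothetical, and this is not the RSPO training objective itself, only the penalty term under that assumption.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mixed_kl_penalty(policy, reference, w_forward=0.5, w_reverse=0.5):
    """Weighted sum of forward and reverse KL to the reference policy."""
    reverse = kl_divergence(policy, reference)  # KL(policy || reference)
    forward = kl_divergence(reference, policy)  # KL(reference || policy)
    return w_forward * forward + w_reverse * reverse
```

Setting one weight to zero recovers a purely forward or purely reverse regularizer, which is one way to compare the strategies the abstract contrasts.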



Article 663

Title@2025-07-07 (1): When Federated Learning Meets Quantum Computing: Survey and Research Opportunities

Title: When Federated Learning Meets Quantum Computing: Survey and Research Opportunities Wenn Federated Learning auf Quanten Computing trifft: Umfrage- und Forschungsmöglichkeiten 《当联邦学习与量子计算:调查和研究机会》 2504.08814v2

Authors (3): Aakar Mathur, Ashish Gupta, Sajal K. Das

Quantum Federated Learning (QFL) is an emerging field that harnesses advances in Quantum Computing (QC) to improve the scalability and efficiency of decentralized Federated Learning (FL) models. This paper provides a systematic and comprehensive survey of the emerging problems and solutions when FL meets QC, from research protocol to a novel taxonomy, focusing in particular on both quantum and federated limitations, such as their architectures, Noisy Intermediate Scale Quantum (NISQ) devices, and privacy preservation. This work explores key developments and integration strategies, along with the impact of quantum computing on FL, keeping a sharp focus on hybrid quantum-classical approaches. The paper offers an in-depth understanding of how the strengths of QC, such as gradient hiding, state entanglement, quantum key distribution, quantum security, and quantum-enhanced differential privacy, have been integrated into FL to ensure the privacy of participants in an enhanced, fast, and secure framework. Finally, this study proposes potential future directions to address the identified research gaps and challenges, aiming to inspire faster and more secure QFL models for practical use.



Article 664

Title@2025-07-07 (1): Towards General Continuous Memory for Vision-Language Models

Title: Towards General Continuous Memory for Vision-Language Models Auf dem Weg zum allgemeinen Dauergedächtnis für Vision-Sprachenmodelle 争取实现视野-语言模型的一般持续记忆 2505.17670v2

Authors (6): Wenyi Wu, Zixuan Song, Kun Zhou, Yifei Shao, Zhiting Hu, Biwei Huang

Language models (LMs) and their extension, vision-language models (VLMs), have achieved remarkable performance across various tasks. However, they still struggle with complex reasoning tasks that require multimodal or multilingual real-world knowledge. To support such capabilities, an external memory system that can efficiently provide relevant multimodal information is essential. Existing approaches generally concatenate image and text tokens into a long sequence as memory, which may drastically increase context length and even degrade performance. In contrast, we propose using continuous memory, a compact set of dense embeddings, to represent multimodal and multilingual knowledge more effectively and efficiently. Our key insight is that a VLM can serve as its own continuous memory encoder. We empirically show that this design improves performance on complex multimodal reasoning tasks. Building on this, we introduce a data-efficient and parameter-efficient method to fine-tune the VLM into a memory encoder, requiring only 1.2% of the model’s parameters and a small corpus of 15.6K self-synthesized samples. Our approach, CoMEM, uses the VLM’s original capabilities to encode arbitrary multimodal and multilingual knowledge into just 8 continuous embeddings. Since the inference-time VLM remains frozen, our memory module is plug-and-play and can be flexibly integrated as needed. Extensive experiments across eight multimodal reasoning benchmarks demonstrate the effectiveness of our approach.



Article 665

Title@2025-07-07 (1): Adversarial Machine Learning Attacks on Financial Reporting via Maximum Violated Multi-Objective Attack

Title: Adversarial Machine Learning Attacks on Financial Reporting via Maximum Violated Multi-Objective Attack Adversarial Machine Learning Attacks on Financial Reporting via Maximum Violated Multi-Objective Attack 通过最大对抗性多目标攻击对财务报告进行反向机器学习攻击 2507.05441v1

Authors (4): Edward Raff, Karen Kukla, Michel Benaroch, Joseph Comprix

Bad actors, primarily distressed firms, have the incentive and desire to manipulate their financial reports to hide their distress and derive personal gains. As attackers, these firms are motivated by potentially millions of dollars and the availability of many publicly disclosed and used financial modeling frameworks. Existing attack methods do not work on this data due to anti-correlated objectives that must both be satisfied for the attacker to succeed. We introduce Maximum Violated Multi-Objective (MVMO) attacks that adapt the attacker’s search direction to find $20\times$ more satisfying attacks compared to standard attacks. The result is that in $\approx50\%$ of cases, a company could inflate their earnings by 100-200%, while simultaneously reducing their fraud scores by 15%. By working with lawyers and professional accountants, we ensure our threat model is realistic to how such frauds are performed in practice.



Article 666

Title@2025-07-07 (1): DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

Title: DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning DeepRetro: Retrosynthetische Pathway Discovery mit iterativem LLM Reasoning 使用迭代LM 理由解释的回溯合成途径发现 2507.07060v1

Authors (9): Shreyas Vinaya Sathyanarayana, Rahil Shah, Sharanabasava D. Hiremath, Rishikesh Panda, Rahul Jana, Riya Singh, Rida Irfan, Ashwin Murali, Bharath Ramsundar

Retrosynthesis, the identification of precursor molecules for a target compound, is pivotal for synthesizing complex molecules, but faces challenges in discovering novel pathways beyond predefined templates. Recent large language model (LLM) approaches to retrosynthesis have shown promise, but harnessing LLM reasoning capabilities for effective multi-step planning remains an open question. To address this challenge, we introduce DeepRetro, an open-source, iterative, hybrid LLM-based retrosynthetic framework. Our approach integrates the strengths of conventional template-based/Monte Carlo tree search tools with the generative power of LLMs in a step-wise, feedback-driven loop. Initially, synthesis planning is attempted with a template-based engine. If this fails, the LLM proposes single-step retrosynthetic disconnections. Crucially, these suggestions undergo rigorous validity, stability, and hallucination checks before the resulting precursors are recursively fed back into the pipeline for further evaluation. This iterative refinement allows for dynamic pathway exploration and correction. We demonstrate the potential of this pipeline through benchmark evaluations and case studies, showcasing its ability to identify viable and potentially novel retrosynthetic routes. In particular, we develop an interactive graphical user interface that allows expert human chemists to provide human-in-the-loop feedback to the reasoning algorithm. This approach successfully generates novel pathways for complex natural product compounds, demonstrating the potential for iterative LLM reasoning to advance the state of the art in complex chemical syntheses.



Article 667

Title@2025-07-07 (1): Optimizing Bidding Strategies in First-Price Auctions in Binary Feedback Setting with Predictions

Title: Optimizing Bidding Strategies in First-Price Auctions in Binary Feedback Setting with Predictions Optimierung von Bietenstrategien in Erstpreis-Auktionen in Binary-Feedback-Einstellung mit Vorhersagen 在有预测的二进制反馈中,在第一价格拍卖中优化投标战略 2506.15817v2

Authors (1): Jason Tandiary

This paper studies Vickrey first-price auctions under binary feedback. Leveraging the enhanced performance of machine learning algorithms, the new algorithm uses past information to improve the regret bounds of the BROAD-OMD algorithm. Motivated by the growing relevance of first-price auctions and the predictive capabilities of machine learning models, this paper proposes a new algorithm within the BROAD-OMD framework (Hu et al., 2025) that leverages predictions of the highest competing bid. This paper’s main contribution is an algorithm that achieves zero regret under accurate predictions. Additionally, a regret bound of $O(T^{3/4} V_t^{1/4})$ is established under certain normality conditions.



Article 668

Title@2025-07-07 (1): LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

Title: LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users LLM Hypnose: Nutzung des Benutzerfeedbacks für unautorisierte Wissensinjektion für alle Benutzer LLM Hypnisis:利用用户反馈,为所有用户提供未经授权知识注射 2507.02850v2

Authors (4): Almog Hilel, Idan Shenfeld, Jacob Andreas, Leshem Choshen

We describe a vulnerability in language models (LMs) trained with user feedback, whereby a single user can persistently alter LM knowledge and behavior given only the ability to provide prompts and upvote / downvote feedback on LM outputs. To implement the attack, the attacker prompts the LM to stochastically output either a “poisoned” or benign response, then upvotes the poisoned response or downvotes the benign one. When feedback signals are used in a subsequent preference tuning behavior, LMs exhibit increased probability of producing poisoned responses even in contexts without malicious prompts. We show that this attack can be used to (1) insert factual knowledge the model did not previously possess, (2) modify code generation patterns in ways that introduce exploitable security flaws, and (3) inject fake financial news. Our finding identifies both a new qualitative feature of language model preference tuning (showing that even highly restricted forms of preference data can be used to exert fine-grained control over behavior) and a new attack mechanism for LMs trained with user feedback (extending work on pretraining-time data poisoning and deployment-time prompt injection).



Article 669

Title@2025-07-07 (1): Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning

Title: Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning Coarse-to-fine Q-Network mit Aktionssequenz für dateneffizientes Roboterlernen Coarse 至 fine Q 网络与数据效率机器人学习行动序列 2411.12155v5

Authors (2): Younggyo Seo, Pieter Abbeel

Predicting a sequence of actions has been crucial in the success of recent behavior cloning algorithms in robotics. Can similar ideas improve reinforcement learning (RL)? We answer affirmatively by observing that incorporating action sequences when predicting ground-truth return-to-go leads to lower validation loss. Motivated by this, we introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based RL algorithm that learns a critic network that outputs Q-values over a sequence of actions, i.e., explicitly training the value function to learn the consequence of executing action sequences. Our experiments show that CQN-AS outperforms several baselines on a variety of sparse-reward humanoid control and tabletop manipulation tasks from BiGym and RLBench.



Article 670

Title@2025-07-07 (1): Motion Generation: A Survey of Generative Approaches and Benchmarks

Title: Motion Generation: A Survey of Generative Approaches and Benchmarks Motion Generation: Eine Übersicht über generative Ansätze und Benchmarks 动议的产生:对产生方式和基准的调查 2507.05419v1

Authors (8): Aliasghar Khani, Arianna Rampini, Bruno Roy, Larasika Nadela, Noa Kaplan, Evan Atherton, Derek Cheung, Jacky Bibliowicz

Motion generation, the task of synthesizing realistic motion sequences from various conditioning inputs, has become a central problem in computer vision, computer graphics, and robotics, with applications ranging from animation and virtual agents to human-robot interaction. As the field has rapidly progressed with the introduction of diverse modeling paradigms including GANs, autoencoders, autoregressive models, and diffusion-based techniques, each approach brings its own advantages and limitations. This growing diversity has created a need for a comprehensive and structured review that specifically examines recent developments from the perspective of the generative approach employed. In this survey, we provide an in-depth categorization of motion generation methods based on their underlying generative strategies. Our main focus is on papers published in top-tier venues since 2023, reflecting the most recent advancements in the field. In addition, we analyze architectural principles, conditioning mechanisms, and generation settings, and compile a detailed overview of the evaluation metrics and datasets used across the literature. Our objective is to enable clearer comparisons and identify open challenges, thereby offering a timely and foundational reference for researchers and practitioners navigating the rapidly evolving landscape of motion generation.



Article 671

Title@2025-07-07 (1): Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning

Title: Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning Global lernen, lokal sprechen: Die Lücken in der Mehrsprachigkeit überwinden 《全球学习,本地语言:弥合多种语文理由方面的差距》 2507.05418v1

Authors (8): Jaedong Hwang, Kumar Tanmay, Seok-Jin Lee, Ayush Agrawal, Hamid Palangi, Kumar Ayush, Ila Fiete, Paul Pu Liang

Large Language Models (LLMs) have achieved strong performance in domains like mathematics, factual QA, and code generation, yet their multilingual reasoning capabilities in these tasks remain underdeveloped. Especially for low-resource languages such as Swahili or Thai, LLMs can often misinterpret prompts or default to reasoning in English. This implicit bias toward high-resource languages undermines factual accuracy, interpretability, and trust. Current multilingual benchmarks focus only on final answers, overlooking whether models actually reason in the target language. To address this gap, we introduce GeoFact-X, a geography-based multilingual factual reasoning benchmark with annotated reasoning traces in five languages: English, Hindi, Japanese, Swahili, and Thai. We further propose BRIDGE, a novel training method that guides supervised fine-tuning and test-time reinforcement learning with a language-consistency reward to align reasoning with the input language. Finally, we develop an automatic evaluation protocol using LLM-as-a-judge to assess answer correctness and the quality and language consistency of reasoning traces, enabling nuanced and scalable analysis beyond surface-level metrics. Our results show that BRIDGE significantly enhances multilingual reasoning fidelity, demonstrating that reasoning-aware multilingual reinforcement learning is crucial for robust cross-lingual generalization. https://jd730.github.io/projects/GeoFact-X_BRIDGE



Article 672

Title@2025-07-07 (1): What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning

Title: What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning Was würdest du fragen, wenn du zuerst $a^2+b^2=c^2$ sahst? LLM auf Kuriosität-getriebene Fragestellung bewerten 当你第一次看到$2+b2=c2$时你会问什么? 2409.17172v2

Authors (2): Shashidhar Reddy Javaji, Zining Zhu

Large language models (LLMs) can store a massive amount of knowledge, yet their potential to acquire new knowledge remains unknown. We propose a novel evaluation framework that evaluates this capability. This framework prompts LLMs to generate questions about a statement introducing scientific knowledge, simulating a curious person when facing the statement for the first time. We score the qualities of the generated questions, thereby evaluating the knowledge acquisition potential of the LLM. We apply controlled ablation studies to validate our scoring procedures. Additionally, we created a synthetic dataset consisting of 1101 statements in physics, chemistry, and maths with distinct levels of difficulty, 300 general knowledge statements, and 567 incorrect statements. Human evaluations were conducted to validate our model assessments, achieving an approximate weighted Cohen’s kappa of 0.7 on all three metrics considered. We find that while large models like GPT-4 and Mistral 8x7b are adept at generating coherent and relevant questions, the smaller Phi-2 model is equally or more effective. This indicates that size does not solely determine a model’s knowledge acquisition potential. The proposed framework quantifies a critical model capability that was commonly overlooked and opens up research opportunities for developing more knowledgeable AI systems.



Article 673

Title@2025-07-07 (1): EmissionNet: Air Quality Pollution Forecasting for Agriculture

Title: EmissionNet: Air Quality Pollution Forecasting for Agriculture EmissionsNet: Vorhersage der Luftqualität für die Landwirtschaft 排放网:农业空气质量污染预测 2507.05416v1

Authors (2): Prady Saligram, Tanvir Bhathal

Air pollution from agricultural emissions is a significant yet often overlooked contributor to environmental and public health challenges. Traditional air quality forecasting models rely on physics-based approaches, which struggle to capture complex, nonlinear pollutant interactions. In this work, we explore forecasting N$_2$O agricultural emissions by evaluating popular architectures and proposing two novel deep learning architectures, EmissionNet (ENV) and EmissionNet-Transformer (ENT). These models leverage convolutional and transformer-based architectures to extract spatial-temporal dependencies from high-resolution emissions data.



Article 674

Title@2025-07-07 (1): Tractable Transformers for Flexible Conditional Generation

Title: Tractable Transformers for Flexible Conditional Generation Tractable Transformer für flexible Bedingte Generation 灵活条件一代的可变变变换器 2502.07616v2

Authors (6): Anji Liu, Xuejie Liu, Dayuan Zhao, Mathias Niepert, Yitao Liang, Guy Van den Broeck

Non-autoregressive (NAR) generative models are valuable because they can handle diverse conditional generation tasks in a more principled way than their autoregressive (AR) counterparts, which are constrained by sequential dependency requirements. Recent advancements in NAR models, such as diffusion language models, have demonstrated superior performance in unconditional generation compared to AR models (e.g., GPTs) of similar sizes. However, such improvements do not always lead to improved conditional generation performance. We show that a key reason for this gap is the difficulty in generalizing to conditional probability queries (i.e., the set of unknown variables) unseen during training. As a result, strong unconditional generation performance does not guarantee high-quality conditional generation. This paper proposes Tractable Transformers (Tracformer), a Transformer-based generative model that is more robust to different conditional generation tasks. Unlike existing models that rely solely on global contextual features derived from full inputs, Tracformers incorporate a sparse Transformer encoder to capture both local and global contextual information. This information is routed through a decoder for conditional generation. Empirical results demonstrate that Tracformers achieve state-of-the-art conditional generation performance on text modeling compared to recent diffusion and AR model baselines.



Article 675

Title@2025-07-07 (1): Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift

Title: Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift Einschließliche Interventionale Unabhängigkeit verbessert Robustheit gegen Interventionale Verteilungsverschiebung 纳入干预性独立 增强抵御干预性分配转变的力度 2507.05412v1

Authors (2): Gautam Sreekumar, Vishnu Naresh Boddeti

We consider the problem of learning robust discriminative representations of causally-related latent variables. In addition to observational data, the training dataset also includes interventional data obtained through targeted interventions on some of these latent variables to learn representations robust against the resulting interventional distribution shifts. Existing approaches treat interventional data like observational data, even when the underlying causal model is known, and ignore the independence relations that arise from these interventions. Since these approaches do not fully exploit the causal relational information resulting from interventions, they learn representations that produce large disparities in predictive performance on observational and interventional data, which worsens when the number of interventional training samples is limited. In this paper, (1) we first identify a strong correlation between this performance disparity and adherence of the representations to the independence conditions induced by the interventional causal model. (2) For linear models, we derive sufficient conditions on the proportion of interventional data in the training dataset, for which enforcing interventional independence between representations corresponding to the intervened node and its non-descendants lowers the error on interventional data. Combining these insights, (3) we propose RepLIn, a training algorithm to explicitly enforce this statistical independence during interventions. We demonstrate the utility of RepLIn on a synthetic dataset and on real image and text datasets on facial attribute classification and toxicity detection, respectively. Our experiments show that RepLIn scales with the number of nodes in the causal graph and is suitable for improving the robustness of representations against interventional distribution shifts of both continuous and discrete latent variables.



Article 676

Title@2025-07-07 (1): Probabilistically Tightened Linear Relaxation-based Perturbation Analysis for Neural Network Verification

Title: Probabilistically Tightened Linear Relaxation-based Perturbation Analysis for Neural Network Verification Probabilistisch verstärkte lineare Entspannungs-basierte Störungsanalyse für neurale Netzwerkverifizierung 神经网络核查的基于扰动分析的线性轻松度分析 2507.05405v1

Authors (3): Luca Marzari, Ferdinando Cicalese, Alessandro Farinelli

We present Probabilistically Tightened Linear Relaxation-based Perturbation Analysis (PT-LiRPA), a novel framework that combines over-approximation techniques from LiRPA-based approaches with a sampling-based method to compute tight intermediate reachable sets. In detail, we show that, with negligible computational overhead, PT-LiRPA exploits the estimated reachable sets to significantly tighten the lower and upper linear bounds of a neural network’s output, reducing the computational cost of formal verification tools while providing probabilistic guarantees on verification soundness. Extensive experiments on standard formal verification benchmarks, including the International Verification of Neural Networks Competition, show that our PT-LiRPA-based verifier improves robustness certificates by up to 3.31X and 2.26X compared to related work. Importantly, our probabilistic approach results in a valuable solution for challenging competition entries where state-of-the-art formal verification methods fail, allowing us to provide answers with high confidence (i.e., at least 99%).
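The gap between sound over-approximation and sampled reachable sets can be seen on a toy network. The sketch below substitutes plain interval arithmetic for LiRPA's linear relaxations and uses naive uniform sampling, so it is not the PT-LiRPA procedure and carries none of its probabilistic guarantees; all function names are hypothetical.

```python
import random


def affine(W, b, x):
    """Compute W @ x + b for list-of-lists W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def interval_affine(W, b, lo, hi):
    """Sound elementwise bounds of W @ x + b for x in the box [lo, hi]."""
    out_lo, out_hi = [], []
    for row, bi in zip(W, b):
        out_lo.append(bi + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lo, hi)))
        out_hi.append(bi + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lo, hi)))
    return out_lo, out_hi


def sound_bounds(layers, lo, hi):
    """Propagate an input box through affine + ReLU layers (no ReLU after the last)."""
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(W, b, lo, hi)
        if i < len(layers) - 1:
            lo, hi = relu(lo), relu(hi)
    return lo, hi


def sampled_bounds(layers, lo, hi, n_samples=2000, seed=0):
    """Estimate output bounds from random inputs; tighter, but not sound."""
    rng = random.Random(seed)
    best_lo, best_hi = None, None
    for _ in range(n_samples):
        x = [rng.uniform(l, h) for l, h in zip(lo, hi)]
        for i, (W, b) in enumerate(layers):
            x = affine(W, b, x)
            if i < len(layers) - 1:
                x = relu(x)
        best_lo = x if best_lo is None else [min(a, c) for a, c in zip(best_lo, x)]
        best_hi = x if best_hi is None else [max(a, c) for a, c in zip(best_hi, x)]
    return best_lo, best_hi
```

For the network `relu(x) + relu(-x)` on `x` in `[-1, 1]`, whose true output range is `[0, 1]`, interval propagation reports an upper bound of 2, while sampled outputs never exceed 1. Sampled bounds can miss rare extremes, which is why a method built on them needs an explicit probabilistic guarantee.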



Article 677

Title@2025-07-07 (1): Q-Detection: A Quantum-Classical Hybrid Poisoning Attack Detection Method

Title: Q-Detection: A Quantum-Classical Hybrid Poisoning Attack Detection Method Q-Erkennung: Eine Quantum-klassische Hybrid-Vergiftungs-Angriffserkennungsmethode Q-检测:量子谱混合中毒袭击探测方法 2507.06262v1

Authors (4): Haoqi He, Xiaokai Lin, Jiancai Chen, Yan Xiao

Data poisoning attacks pose significant threats to machine learning models by introducing malicious data into the training process, thereby degrading model performance or manipulating predictions. Detecting and sifting out poisoned data is an important method to prevent data poisoning attacks. Limited by classical computation frameworks, upcoming larger-scale and more complex datasets may pose difficulties for detection. We introduce the unique speedup of quantum computing for the first time in the task of detecting data poisoning. We present Q-Detection, a quantum-classical hybrid defense method for detecting poisoning attacks. Q-Detection also introduces the Q-WAN, which is optimized using quantum computing devices. Experimental results using multiple quantum simulation libraries show that Q-Detection effectively defends against label manipulation and backdoor attacks. The metrics demonstrate that Q-Detection consistently outperforms the baseline methods and is comparable to the state-of-the-art. Theoretical analysis shows that Q-Detection is expected to achieve more than a 20% speedup using quantum computing power.



Article 678

Title@2025-07-07 (1): Embedding-Based Approaches to Hyperpartisan News Detection

Title: Embedding-Based Approaches to Hyperpartisan News Detection Einbetten-basierte Ansätze zu Hyperparteien-Nachrichten-Erkennung 以嵌入式方法探测超党派新闻 2501.01370v3

Authors (1): Karthik Mohan

In this report, I describe systems whose objective is to determine whether a given news article can be considered hyperpartisan. Hyperpartisan news takes an extremely polarized political standpoint with the intention of creating political divide among the public. Several approaches were used, including n-grams, sentiment analysis, and sentence and document representations from pre-trained ELMo models. The best system uses LLMs for embedding generation and achieves an accuracy of around 92%, improving on the previous best system, which used pre-trained ELMo with a bidirectional LSTM and achieved an accuracy of around 83% through 10-fold cross-validation.



Article 679

Title@2025-07-07 (1): pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models

Title: pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models pFedMMA: Personalisiertes Federed Fine-Tuning mit Multi-Modal Adapter für Vision-Language-Modelle pFedMMA:具有视觉语言模型多模式适应器的个性化联邦微调 2507.05394v1

Authors (3): Sajjad Ghiasvand, Mahnoosh Alizadeh, Ramtin Pedarsani

Vision-Language Models (VLMs) like CLIP have demonstrated remarkable generalization in zero- and few-shot settings, but adapting them efficiently to decentralized, heterogeneous data remains a challenge. While prompt tuning has emerged as a popular parameter-efficient approach in personalized federated learning, existing methods often sacrifice generalization in favor of personalization, struggling particularly on unseen classes or domains. In this work, we propose pFedMMA, the first personalized federated learning framework that leverages multi-modal adapters for vision-language tasks. Each adapter contains modality-specific up- and down-projection layers alongside a globally shared projection that aligns cross-modal features. Our asymmetric optimization strategy allows clients to locally adapt to personalized data distributions while collaboratively training the shared projection to improve global generalization. This design is also communication-efficient, as only the shared component is exchanged during rounds. Through extensive experiments across eleven datasets, including domain- and label-shift scenarios, we show that pFedMMA achieves state-of-the-art trade-offs between personalization and generalization, outperforming recent federated prompt tuning methods. The code is available at https://github.com/sajjad-ucsb/pFedMMA.



Article 680

Title@2025-07-07 (1): Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training

Title: Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training Verstärkte Feinsteuerung mildert natürlich das Vergessen im kontinuierlichen Post-Training 长期培训后忘却的自然缓解措施 2507.05386v1

Authors (13): Song Lai, Haohan Zhao, Rong Feng, Changyi Ma, Wenzhuo Liu, Hongbo Zhao, Xi Lin, Dong Yi, Min Xie, Qingfu Zhang, Hongbin Liu, Gaofeng Meng, Fei Zhu

Continual post-training (CPT) is a popular and effective technique for adapting foundation models like multimodal large language models to specific and ever-evolving downstream tasks. While existing research has primarily concentrated on methods like data replay, model expansion, or parameter regularization, the fundamental role of the learning paradigm within CPT remains largely unexplored. This paper presents a comparative analysis of two core post-training paradigms: supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT), investigating their respective impacts on knowledge retention during CPT. Our experiments are conducted on a benchmark comprising seven diverse multimodal tasks, utilizing Qwen2.5-VL-7B-Instruct as the base model for continual post-training. The investigation yields two significant findings: (1) When continuously learning on downstream tasks, SFT leads to catastrophic forgetting of previously learned tasks. In contrast, RFT inherently preserves prior knowledge and achieves performance comparable to multi-task training. (2) RFT successfully protects and even enhances the model’s general knowledge on standard benchmarks (e.g., MMMU and MMLU-Pro). Conversely, SFT degrades general model capabilities severely. Further analysis shows that explicit mechanisms, such as KL penalty and chain-of-thought reasoning, are not the primary factors. Instead, we find that the implicit regularization inherent to RFT is a key factor in mitigating forgetting. Finally, we propose a rollout-based instance filtering algorithm to improve the stability and efficiency of RFT. Our comprehensive study demonstrates the superiority of RFT as a robust paradigm for continual post-training.



Article 681

Title@2025-07-07 (1): On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study

Title: On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study Über die Bias von Next-Token Predictors hin zu systematisch ineffizienter Begründung: Eine Kurzeste-Path-Fallstudie 关于对系统、低效率理性的 “ 下肯预测家 “ 的偏见:最短篇案例研究 2507.05362v1

Authors (4): Riccardo Alberghi, Elizaveta Demyanenko, Luca Biggio, Luca Saglietti

Recent advances in natural language processing highlight two key factors for improving reasoning in large language models (LLMs): (i) allocating more test-time compute tends to help on harder problems but often introduces redundancy in the reasoning trace, and (ii) compute is most effective when reasoning is systematic and incremental, forming structured chains of thought (CoTs) akin to human problem-solving. To study these factors in isolation, we introduce a controlled setting based on shortest-path tasks in layered graphs. We train decoder-only transformers on question-trace-answer triples using a custom tokenizer, comparing models trained on optimal bottom-up dynamic programming traces with those trained on longer, valid traces involving backtracking. Surprisingly, with the same training-token budget, models trained on inefficient traces generalize better to unseen graphs. This benefit is not due to length alone: injecting arbitrary redundancy into reasoning traces fails to help and can even hurt performance. Instead, we find that generalization correlates with the model’s confidence in next-token prediction, suggesting that long, coherent, and locally incremental traces make the training signal easier to optimize.
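Illustrative sketch (not the authors' code): the "optimal bottom-up dynamic programming traces" mentioned above can be pictured with a small shortest-path solver on a layered graph that records every intermediate cost it computes. The input layout `costs[k][i][j]`, the trace format, and the choice to let any first-layer node be a zero-cost start are assumptions for illustration; the paper's tokenized trace format is not described in the abstract.

```python
def layered_shortest_path(costs):
    """Bottom-up DP over a layered graph.

    costs[k][i][j] is the edge cost from node i in layer k to node j in
    layer k + 1. Returns (best_cost, path, trace), where trace lists the
    (layer, node, best_cost_so_far) entries in the order they are computed.
    """
    best = [0.0] * len(costs[0])  # assumed: every first-layer node is a free start
    parents = []                  # parents[k][j] = best predecessor of node j in layer k + 1
    trace = []
    for k, layer in enumerate(costs):
        new_best, back = [], []
        for j in range(len(layer[0])):
            # Cheapest way to reach node j from any node i in the previous layer.
            cost, i = min((best[i] + layer[i][j], i) for i in range(len(layer)))
            new_best.append(cost)
            back.append(i)
            trace.append((k + 1, j, cost))
        parents.append(back)
        best = new_best

    # Backtrack from the cheapest node in the last layer.
    end = min(range(len(best)), key=lambda j: best[j])
    path = [end]
    for back in reversed(parents):
        path.append(back[path[-1]])
    path.reverse()
    return best[end], path, trace


if __name__ == "__main__":
    demo = [
        [[1, 4], [2, 1]],  # layer 0 -> layer 1
        [[3, 1], [1, 5]],  # layer 1 -> layer 2
    ]
    print(layered_shortest_path(demo))
```

A backtracking-style trace, by contrast, would also contain explored-then-abandoned partial paths; that variant is not shown here.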



Article 682

Title@2025-07-07 (1): LoRA-Augmented Generation (LAG) for Knowledge-Intensive Language Tasks

Title: LoRA-Augmented Generation (LAG) for Knowledge-Intensive Language Tasks LoRA-Augmented Generation (LAG) für wissensintensive Sprachaufgaben LoRA - 知识密集型语言任务支持代(LAG) 2507.05346v1

Authors (2): William Fleshman, Benjamin Van Durme

The proliferation of fine-tuned language model experts for specific tasks and domains signals the need for efficient selection and combination methods. We propose LoRA-Augmented Generation (LAG) for leveraging large libraries of knowledge and task-specific LoRA adapters. LAG requires no additional training or access to data, and efficiently filters, retrieves, and applies experts on a per-token and layer basis. We evaluate LAG on various knowledge-intensive tasks, achieving superior performance over existing data-free methods. We explore scenarios where additional data is available, demonstrating LAG’s compatibility with alternative solutions such as retrieval-augmented generation (RAG).



Article 683

Title@2025-07-07 (1): Causal Foundation Models: Disentangling Physics from Instrument Properties

Title: Causal Foundation Models: Disentangling Physics from Instrument Properties Causal Foundation Models: Entwirren von Physik aus Instrumenteneigenschaften 因果基金会模型:将物理与仪器属性分离 2507.05333v1

Authors (5): Jeroen Audenaert, Daniel Muthukrishna, Paul F. Gregory, David W. Hogg, V. Ashley Villar

Foundation models for structured time series data must contend with a fundamental challenge: observations often conflate the true underlying physical phenomena with systematic distortions introduced by measurement instruments. This entanglement limits model generalization, especially in heterogeneous or multi-instrument settings. We present a causally-motivated foundation model that explicitly disentangles physical and instrumental factors using a dual-encoder architecture trained with structured contrastive learning. Leveraging naturally occurring observational triplets (i.e., where the same target is measured under varying conditions, and distinct targets are measured under shared conditions), our model learns separate latent representations for the underlying physical signal and instrument effects. Evaluated on simulated astronomical time series designed to resemble the complexity of variable stars observed by missions like NASA’s Transiting Exoplanet Survey Satellite (TESS), our method significantly outperforms traditional single-latent space foundation models on downstream prediction tasks, particularly in low-data regimes. These results demonstrate that our model supports key capabilities of foundation models, including few-shot generalization and efficient adaptation, and highlight the importance of encoding causal structure into representation learning for structured data.



Article 684

Title@2025-07-07 (1): Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations

Title: Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations Beyond One Shot, Beyond One Perspective: Cross-View und Long-Horizon-Destillation für bessere LiDAR-Darstellungen 超越一个镜头,超越一个视角:交叉查看和长毛利区蒸馏,以更好地进行激光雷达演示 2507.05260v1

Authors (5): Xiang Xu, Lingdong Kong, Song Wang, Chuanwei Zhou, Qingshan Liu

LiDAR representation learning aims to extract rich structural and semantic information from large-scale, readily available datasets, reducing reliance on costly human annotations. However, existing LiDAR representation strategies often overlook the inherent spatiotemporal cues in LiDAR sequences, limiting their effectiveness. In this work, we propose LiMA, a novel long-term image-to-LiDAR Memory Aggregation framework that explicitly captures longer range temporal correlations to enhance LiDAR representation learning. LiMA comprises three key components: 1) a Cross-View Aggregation module that aligns and fuses overlapping regions across neighboring camera views, constructing a more unified and redundancy-free memory bank; 2) a Long-Term Feature Propagation mechanism that efficiently aligns and integrates multi-frame image features, reinforcing temporal coherence during LiDAR representation learning; and 3) a Cross-Sequence Memory Alignment strategy that enforces consistency across driving sequences, improving generalization to unseen environments. LiMA maintains high pretraining efficiency and incurs no additional computational overhead during downstream tasks. Extensive experiments on mainstream LiDAR-based perception benchmarks demonstrate that LiMA significantly improves both LiDAR semantic segmentation and 3D object detection. We hope this work inspires more effective pretraining paradigms for autonomous driving. The code has been made publicly accessible for future research.



Article 685

Title@2025-07-07 (1): Spatio-Temporal LLM: Reasoning about Environments and Actions

Title: Spatio-Temporal LLM: Reasoning about Environments and Actions Spatio-Temporal LLM: Begründung von Umgebungen und Aktionen Spatio-临时LLM:环境原因和行动 2507.05258v1

Authors (6): Haozhen Zheng, Beitong Tian, Mingyuan Wu, Zhenggang Tang, Klara Nahrstedt, Alex Schwing

Despite the significant recent progress of Multimodal Large Language Models (MLLMs), MLLMs still struggle to correctly answer prompts that require a holistic spatio-temporal understanding. Specifically, it is challenging to address prompts that refer to 1) the entirety of an environment that an agent equipped with an MLLM can operate in; and simultaneously also refer to 2) recent actions that just happened and are encoded in a video clip. However, such a holistic spatio-temporal understanding is important for agents operating in the real world. To address this issue, we first develop a framework to collect a large-scale dataset. Using the collected “Reasoning about Environments and Actions” (REA) dataset, we show that recent methods indeed struggle to correctly answer the prompts. To improve, we develop a “spatio-temporal LLM” (ST-LLM), a model equipped with projectors to improve both spatial understanding of an environment and temporal understanding of recent observations. On the collected REA data, we show that the proposed method significantly improves results compared to prior work. Code and data are available at https://zoezheng126.github.io/STLLM-website/.



Article 686

Title@2025-07-07 (1): Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining

Title: Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining Human2LocoMan: Vielseitige Quadrupedalmanipulation mit menschlichem Vortraining lernen 人类2 Locoman: 学习与人类预科培训一起四步操作 2506.16475v2

Authors (14): Yaru Niu, Yunzhe Zhang, Mingyang Yu, Changyi Lin, Chenhao Li, Yikai Wang, Yuxiang Yang, Wenhao Yu, Tingnan Zhang, Zhenzhen Li, Jonathan Francis, Bingqing Chen, Jie Tan, Ding Zhao

Quadrupedal robots have demonstrated impressive locomotion capabilities in complex environments, but equipping them with autonomous versatile manipulation skills in a scalable way remains a significant challenge. In this work, we introduce a cross-embodiment imitation learning system for quadrupedal manipulation, leveraging data collected from both humans and LocoMan, a quadruped equipped with multiple manipulation modes. Specifically, we develop a teleoperation and data collection pipeline, which unifies and modularizes the observation and action spaces of the human and the robot. To effectively leverage the collected data, we propose an efficient modularized architecture that supports co-training and pretraining on structured modality-aligned data across different embodiments. Additionally, we construct the first manipulation dataset for the LocoMan robot, covering various household tasks in both unimanual and bimanual modes, supplemented by a corresponding human dataset. We validate our system on six real-world manipulation tasks, where it achieves an average success rate improvement of 41.9% overall and 79.7% under out-of-distribution (OOD) settings compared to the baseline. Pretraining with human data contributes a 38.6% success rate improvement overall and 82.7% under OOD settings, enabling consistently better performance with only half the amount of robot data. Our code, hardware, and data are open-sourced at: https://human2bots.github.io.



Article 687

Title@2025-07-07 (1): From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving

Title: From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving Von Marginal zu Joint Predictions: Bewertung szenekonsistenter Trajektorien-Vorhersageansätze für Automatisiertes Fahren 从边际到联合预测:评价自动驾驶的场景-群集轨迹预测方法 2507.05254v1

Authors (7): Fabian Konstantinidis, Ariel Dallari Guerreiro, Raphael Trumpp, Moritz Sackmann, Ulrich Hofmann, Marco Caccamo, Christoph Stiller

Accurate motion prediction of surrounding traffic participants is crucial for the safe and efficient operation of automated vehicles in dynamic environments. Marginal prediction models commonly forecast each agent’s future trajectories independently, often leading to sub-optimal planning decisions for an automated vehicle. In contrast, joint prediction models explicitly account for the interactions between agents, yielding socially and physically consistent predictions on a scene level. However, existing approaches differ not only in their problem formulation but also in the model architectures and implementation details used, making it difficult to compare them. In this work, we systematically investigate different approaches to joint motion prediction, including post-processing of the marginal predictions, explicitly training the model for joint predictions, and framing the problem as a generative task. We evaluate each approach in terms of prediction accuracy, multi-modality, and inference efficiency, offering a comprehensive analysis of the strengths and limitations of each approach. Several prediction examples are available at https://frommarginaltojointpred.github.io/.
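Illustrative sketch (not the authors' code): the difference between marginal and joint (scene-level) evaluation can be made concrete with the commonly used minimum average displacement error. In the marginal variant each agent picks its own best mode; in the joint variant one mode must be chosen for the whole scene. The data layout `preds[k][a]` (mode `k`, agent `a`, list of `(x, y)` points) is an assumption, and the abstract does not state which metrics the paper actually reports.

```python
import math


def ade(pred, gt):
    """Average displacement error between one predicted and one true trajectory."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def marginal_min_ade(preds, gt):
    """Each agent independently selects its best mode, then errors are averaged."""
    n_agents = len(gt)
    return sum(
        min(ade(mode[a], gt[a]) for mode in preds) for a in range(n_agents)
    ) / n_agents


def joint_min_ade(preds, gt):
    """A single scene mode is selected for all agents at once."""
    n_agents = len(gt)
    return min(
        sum(ade(mode[a], gt[a]) for a in range(n_agents)) / n_agents
        for mode in preds
    )


if __name__ == "__main__":
    gt = [[(0.0, 0.0)], [(0.0, 0.0)]]      # two agents, one time step each
    preds = [
        [[(0.0, 0.0)], [(2.0, 0.0)]],      # mode 0: agent 0 right, agent 1 off by 2
        [[(2.0, 0.0)], [(0.0, 0.0)]],      # mode 1: agent 0 off by 2, agent 1 right
    ]
    print(marginal_min_ade(preds, gt), joint_min_ade(preds, gt))
```

In the demo, every agent is predicted perfectly in some mode, so the marginal error is zero, yet no single mode is right for the whole scene, so the joint error is not. The joint value can never be smaller than the marginal one.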



Article 688

Title@2025-07-07 (1): Physics-Guided Dual Implicit Neural Representations for Source Separation

Title: Physics-Guided Dual Implicit Neural Representations for Source Separation Physik-geführte Dual-Implizite-Neural-Darstellungen für die Quellentrennung 源分离的物理引导双隐含神经图示 2507.05249v1

Authors (10): Yuan Ni, Zhantao Chen, Alexander N. Petsch, Edmund Xu, Cheng Peng, Alexander I. Kolesnikov, Sugata Chowdhury, Arun Bansil, Jana B. Thayer, Joshua J. Turner

Significant challenges exist in efficient data analysis of most advanced experimental and observational techniques because the collected signals often include unwanted contributions, such as background and signal distortions, that can obscure the physically relevant information of interest. To address this, we have developed a self-supervised machine-learning approach for source separation using a dual implicit neural representation framework that jointly trains two neural networks: one for approximating distortions of the physical signal of interest and the other for learning the effective background contribution. Our method learns directly from the raw data by minimizing a reconstruction-based loss function without requiring labeled data or pre-defined dictionaries. We demonstrate the effectiveness of our framework by considering a challenging case study involving large-scale simulated as well as experimental momentum-energy-dependent inelastic neutron scattering data in a four-dimensional parameter space, characterized by heterogeneous background contributions and unknown distortions to the target signal. The method is found to successfully separate physically meaningful signals from a complex or structured background even when the signal characteristics vary across all four dimensions of the parameter space. An analytical approach that informs the choice of the regularization parameter is presented. Our method offers a versatile framework for addressing source separation problems across diverse domains, ranging from superimposed signals in astronomical measurements to structural features in biomedical image reconstructions.



Article 689

Title@2025-07-07 (1): Multi-Disease Deep Learning Framework for GWAS: Beyond Feature Selection Constraints

Title: Multi-Disease Deep Learning Framework for GWAS: Beyond Feature Selection Constraints Multi-Disease Deep Learning Framework für GWAS: Neben der Feature-Auswahl Einschränkungen GWAS多重疾病深入学习框架:超越地物选择限制 2507.05247v1

Authors (5): Iqra Farooq, Sara Atito, Ayse Demirkan, Inga Prokopenko, Muhammad Rana

Traditional GWAS has advanced our understanding of complex diseases but often misses nonlinear genetic interactions. Deep learning offers new opportunities to capture complex genomic patterns, yet existing methods mostly depend on feature selection strategies that either constrain analysis to known pathways or risk data leakage when applied across the full dataset. Further, covariates can inflate predictive performance without reflecting true genetic signals. We explore different deep learning architecture choices for GWAS and demonstrate that careful architectural choices can outperform existing methods under strict no-leakage conditions. Building on this, we extend our approach to a multi-label framework that jointly models five diseases, leveraging shared genetic architecture for improved efficiency and discovery. Applied to five million SNPs across 37,000 samples, our method achieves competitive predictive performance (AUC 0.68-0.96), offering a scalable, leakage-free, and biologically meaningful approach for multi-disease GWAS analysis.



Article 690

Title@2025-07-07 (1): Going Beyond Heuristics by Imposing Policy Improvement as a Constraint

Title: Going Beyond Heuristics by Imposing Policy Improvement as a Constraint Über Heuristiken hinausgehend durch politische Verbesserung als Zwang 将政策改进作为约束手段, 2507.05328v1

Authors (3): Chi-Chang Lee, Zhang-Wei Hong, Pulkit Agrawal

In many reinforcement learning (RL) applications, augmenting the task rewards with heuristic rewards that encode human priors about how a task should be solved is crucial for achieving desirable performance. However, because such heuristics are usually not optimal, much human effort and computational resources are wasted in carefully balancing task and heuristic rewards. Theoretically rigorous ways of incorporating heuristics rely on the idea of policy invariance, which guarantees that the performance of a policy obtained by maximizing heuristic rewards is the same as the optimal policy with respect to the task reward. However, in practice, policy invariance doesn’t result in policy improvement, and such methods are known to empirically perform poorly. We propose a new paradigm to mitigate reward hacking and effectively use heuristics based on the practical goal of maximizing policy improvement instead of policy invariance. Our framework, Heuristic Enhanced Policy Optimization (HEPO), effectively leverages heuristics while avoiding the pitfall of prior methods for mitigating reward hacking. HEPO achieves superior performance on standard benchmarks with well-engineered reward functions. More surprisingly, HEPO allows policy optimization to achieve good performance even when heuristics are not well-engineered and designed by non-expert humans, showcasing HEPO’s ability to reduce human effort in reward design. HEPO is a plug-and-play optimization method for leveraging heuristics in reinforcement learning. Code is available at https://github.com/Improbable-AI/hepo.



Article 691

Title@2025-07-07 (1): Logit Reweighting for Topic-Focused Summarization

Title: Logit Reweighting for Topic-Focused Summarization Logit-Umgewichtung für thematisch fokussierte Zusammenfassung 用于专题用途汇总的对数重新加权 2507.05235v1

Authors (3): Joschka Braun, Bálint Mucsányi, Seyed Ali Bahrainian

Generating abstractive summaries that adhere to a specific topic remains a significant challenge for language models. While standard approaches, such as fine-tuning, are resource-intensive, simpler methods like prompt engineering often struggle to maintain topical focus, particularly with smaller models. To address this, we propose a lightweight method that enhances topical relevance by directly reweighting the logits of topic-relevant tokens during generation. We evaluate three such reweighting techniques: Constant Shift, which adds a constant value to logits; Factor Scaling, which multiplies them by a factor; and Threshold Selection, which selectively boosts logits that exceed a probability threshold. Experiments on the NEWTS topical summarization dataset, using both Gemma-2B and Llama-3-8B models, show that these techniques effectively increase the use of topic-relevant vocabulary. Notably, the Threshold Selection method successfully improves topical focus without compromising summary quality, a trade-off often seen in other approaches. Our findings demonstrate that directly reweighting logits is a practical and resource-efficient alternative to fine-tuning, offering a promising pathway for precisely controlling the thematic content of generated text.
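Illustrative sketch (not the authors' code): the three reweighting techniques named in the abstract can be written as small functions over one decoding step's logits. The function names, the `topic_ids` set of topic-relevant token indices, and the choice to implement the Threshold Selection boost as an additive shift are assumptions; the abstract does not say how the selected logits are boosted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def constant_shift(logits, topic_ids, shift):
    """Constant Shift: add a constant to every topic-relevant logit."""
    return [x + shift if i in topic_ids else x for i, x in enumerate(logits)]


def factor_scaling(logits, topic_ids, factor):
    """Factor Scaling: multiply topic-relevant logits by a factor.

    Note that with factor > 1 a negative logit becomes more negative,
    so only positive logits are actually boosted.
    """
    return [x * factor if i in topic_ids else x for i, x in enumerate(logits)]


def threshold_selection(logits, topic_ids, shift, min_prob):
    """Threshold Selection: boost only topic tokens already above min_prob.

    The additive boost is an assumption made for this sketch.
    """
    probs = softmax(logits)
    return [
        x + shift if (i in topic_ids and probs[i] > min_prob) else x
        for i, x in enumerate(logits)
    ]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]
    topic = {1, 3}  # hypothetical topic-relevant token ids
    print(constant_shift(logits, topic, 1.0))
    print(factor_scaling(logits, topic, 2.0))
    print(threshold_selection(logits, topic, 1.0, 0.1))
```

In the demo, token 3 starts with a probability of roughly 0.03, so Threshold Selection leaves it alone while Constant Shift raises it. That difference is one plausible reason a thresholded boost could disturb fluency less.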



Article 692

Title@2025-07-07 (1): Transformers Can Solve Non-Linear and Non-Markovian Filtering Problems in Continuous Time For Conditionally Gaussian Signals

Title: Transformers Can Solve Non-Linear and Non-Markovian Filtering Problems in Continuous Time For Conditionally Gaussian Signals Transformer können nicht-lineare und nicht-markowsche Filterprobleme in kontinuierlicher Zeit für bedingt gaussische Signale lösen 变换器可以在连续时间解答非滑动和非马尔科维的过滤问题, 以用于有条件的高斯信号 2310.19603v3

Authors (4): Blanka Horvath, Anastasis Kratsios, Yannick Limmer, Xuwei Yang

The use of attention-based deep learning models in stochastic filtering, e.g., transformers and deep Kalman filters, has recently come into focus; however, the potential for these models to solve stochastic filtering problems remains largely unknown. The paper provides an affirmative answer to this open problem in the theoretical foundations of machine learning by showing that a class of continuous-time transformer models, called filterformers, can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-time (possibly non-Gaussian) measurements. Our approximation guarantees hold uniformly over sufficiently regular compact subsets of continuous-time paths, where the worst-case 2-Wasserstein distance between the true optimal filter and our deep learning model quantifies the approximation error. Our construction relies on two new customizations of the standard attention mechanism: The first can losslessly adapt to the characteristics of a broad range of paths since we show that the attention mechanism implements bi-Lipschitz embeddings of sufficiently regular sets of paths into low-dimensional Euclidean spaces; thus, it incurs no “dimension reduction error”. The second attention mechanism is tailored to the geometry of Gaussian measures in the 2-Wasserstein space. Our analysis relies on new stability estimates of robust optimal filters in the conditionally Gaussian setting.
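For reference (a standard result, not taken from the paper): the 2-Wasserstein distance used above as the error measure has a closed form between Gaussian measures, which is presumably what makes a Gaussian-tailored attention mechanism tractable. The symbols $m_i$ (means) and $\Sigma_i$ (covariances) are notation introduced here for illustration.

```latex
W_2^2\bigl(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\bigr)
  = \lVert m_1 - m_2 \rVert_2^2
  + \operatorname{tr}\!\Bigl(\Sigma_1 + \Sigma_2
      - 2\bigl(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\bigr)^{1/2}\Bigr)
```

In one dimension this reduces to $(m_1 - m_2)^2 + (\sigma_1 - \sigma_2)^2$, so the distance separates cleanly into a mean term and a spread term.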



Article 693

Title@2025-07-07 (1): SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

Title: SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound SEE-2-SOUND: Zero-Shot-Spatial-Environment-to-Spatial-Sound SEE-2-SOUND:零热空间环境对空间声音 2406.06612v2

Authors (4): Rishit Dagli, Shivesh Prakash, Robert Wu, Houman Khosravani

Generating combined visual and auditory sensory experiences is critical for the consumption of immersive content. Recent advances in neural generative models have enabled the creation of high-resolution content across multiple modalities such as images, text, speech, and videos. Despite these successes, there remains a significant gap in the generation of high-quality spatial audio that complements generated visual content. Furthermore, current audio generation models excel in either generating natural audio or speech or music but fall short in integrating spatial audio cues necessary for immersive experiences. In this work, we introduce SEE-2-SOUND, a zero-shot approach that decomposes the task into (1) identifying visual regions of interest; (2) locating these elements in 3D space; (3) generating mono-audio for each; and (4) integrating them into spatial audio. Using our framework, we demonstrate compelling results for generating spatial audio for high-quality videos, images, and dynamic images from the internet, as well as media generated by learned approaches.



Article 694

Title@2025-07-07 (1): Cascade: Token-Sharded Private LLM Inference

Title: Cascade: Token-Sharded Private LLM Inference Cascade: Token-Sharded Private LLM Inferenz Cascade: 石块硬硬的私人LLM 推论 2507.05228v1

Authors (6): Rahul Thomas, Louai Zahran, Erica Choi, Akilesh Potti, Micah Goldblum, Arka Pal

As LLMs continue to increase in parameter size, the computational resources required to run them are available to fewer parties. Therefore, third-party inference services – where LLMs are hosted by third parties with significant computational resources – are becoming increasingly popular. However, third-party inference raises critical concerns about user data privacy. To mitigate these risks, privacy researchers have developed provably secure schemes for third-party inference, such as Secure Multi-Party Computation (SMPC). However, SMPC protocols have significant computational and communication overhead, and do not scale to large models. In this work, we propose a new multi-party inference protocol, Cascade, that avoids these punitive costs by leveraging sharding in the sequence dimension to maintain privacy, trading off cryptographic privacy guarantees for increased performance and scalability. We demonstrate that Cascade is resistant to a generalization of a recent attack that is highly effective against other statistical privacy schemes, and that it is further resistant to learning-based attacks. As Cascade is orders of magnitude faster than existing schemes, our findings offer practical solutions for secure deployment of modern state-of-the-art LLMs.



Article 695

Title@2025-07-07 (1): NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving

Title: NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving NavigScene: Überbrückung der lokalen Wahrnehmung und globale Navigation für das autonome Fahren jenseits des Visuellen Bereichs 导航-光景:将地方感知与全球导航连接起来,促进超越视觉-Range 自主驾驶 2507.05227v1

Authors (8): Qucheng Peng, Chen Bai, Guoxiang Zhang, Bo Xu, Xiaotong Liu, Xiaoyin Zheng, Chen Chen, Cheng Lu

Autonomous driving systems have made significant advances in Q&A, perception, prediction, and planning based on local visual information, yet they struggle to incorporate broader navigational context that human drivers routinely utilize. We address this critical gap between local sensor data and global navigation information by proposing NavigScene, an auxiliary navigation-guided natural language dataset that simulates a human-like driving environment within autonomous driving systems. Moreover, we develop three complementary paradigms to leverage NavigScene: (1) Navigation-guided Reasoning, which enhances vision-language models by incorporating navigation context into the prompting approach; (2) Navigation-guided Preference Optimization, a reinforcement learning method that extends Direct Preference Optimization to improve vision-language model responses by establishing preferences for navigation-relevant summarized information; and (3) Navigation-guided Vision-Language-Action model, which integrates navigation guidance and vision-language models with conventional driving models through feature fusion. Extensive experiments demonstrate that our approaches significantly improve performance across perception, prediction, planning, and question-answering tasks by enabling reasoning capabilities beyond visual range and improving generalization to diverse driving scenarios. This work represents a significant step toward more comprehensive autonomous driving systems capable of navigating complex, unfamiliar environments with greater reliability and safety.



Article 696

Title@2025-07-07 (1): OminiControl: Minimal and Universal Control for Diffusion Transformer

Title: OminiControl: Minimal and Universal Control for Diffusion Transformer OminiControl: Minimale und universelle Steuerung für Diffusionstransformator Omini Control: 扩散变异器最小和通用控制 2411.15098v6

Authors (5): Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang

We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures. Current image conditioning methods either introduce substantial parameter overhead or handle only specific control tasks effectively, limiting their practical versatility. OminiControl addresses these limitations through three key innovations: (1) a minimal architectural design that leverages the DiT’s own VAE encoder and transformer blocks, requiring just 0.1% additional parameters; (2) a unified sequence processing strategy that combines condition tokens with image tokens for flexible token interactions; and (3) a dynamic position encoding mechanism that adapts to both spatially-aligned and non-aligned control tasks. Our extensive experiments show that this streamlined approach not only matches but surpasses the performance of specialized methods across multiple conditioning tasks. To overcome data limitations in subject-driven generation, we also introduce Subjects200K, a large-scale dataset of identity-consistent image pairs synthesized using DiT models themselves. This work demonstrates that effective image control can be achieved without architectural complexity, opening new possibilities for efficient and versatile image generation systems.



Article 697

Title@2025-07-07 (1): QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions

Title: QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions QuEst: Verbesserung der Schätzungen von Quantil-basierten Verteilungsmaßen mittels Modellvorhersagen Quest:利用模型预测,提高量基分配措施的估计数 2507.05220v1

Authors (6): Zhun Deng, Thomas P Zollo, Benjamin Eyre, Amogh Inamdar, David Madras, Richard Zemel

As machine learning models grow increasingly competent, their predictions can supplement scarce or expensive data in various important domains. In support of this paradigm, algorithms have emerged to combine a small amount of high-fidelity observed data with a much larger set of imputed model outputs to estimate some quantity of interest. Yet current hybrid-inference tools target only means or single quantiles, limiting their applicability for many critical domains and use cases. We present QuEst, a principled framework to merge observed and imputed data to deliver point estimates and rigorous confidence intervals for a wide family of quantile-based distributional measures. QuEst covers a range of measures, from tail risk (CVaR) to population segments such as quartiles, that are central to fields such as economics, sociology, education, medicine, and more. We extend QuEst to multidimensional metrics, and introduce an additional optimization technique to further reduce variance in this and other hybrid estimators. We demonstrate the utility of our framework through experiments in economic modeling, opinion polling, and language model auto-evaluation.
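Illustrative sketch (not QuEst itself): to make the setting concrete, the Python below computes an empirical CVaR, one of the quantile-based measures named above, and combines a small observed sample with a large imputed sample through a simple plug-in correction in the spirit of prediction-powered estimation. The function names and the correction rule are assumptions for illustration only; the abstract does not describe QuEst's estimator or its confidence intervals, and this sketch provides neither.

```python
import math


def empirical_cvar(values, alpha):
    """Mean of the worst (largest) ceil((1 - alpha) * n) values.

    The rounding guards against floating-point noise such as
    (1 - 0.7) * 10 evaluating to slightly more than 3.
    """
    k = max(1, math.ceil(round((1 - alpha) * len(values), 9)))
    tail = sorted(values, reverse=True)[:k]
    return sum(tail) / k


def rectified_estimate(stat, observed, imputed_paired, imputed_large):
    """Assumed plug-in correction (not the paper's estimator).

    stat(imputed_large) uses the many cheap model outputs; the bracketed
    term estimates the model's bias on the few points where both the
    observation and the imputed value are available.
    """
    return stat(imputed_large) + (stat(observed) - stat(imputed_paired))


if __name__ == "__main__":
    observed = [1, 2, 3, 10]           # scarce high-fidelity data
    imputed_paired = [1, 2, 3, 8]      # model outputs for the same four units
    imputed_large = list(range(1, 11)) # model outputs for a larger pool

    def cvar_half(xs):
        return empirical_cvar(xs, 0.5)

    print(rectified_estimate(cvar_half, observed, imputed_paired, imputed_large))
```

Here the model underestimates the tail on the paired units, so the correction term pushes the large-sample estimate upward.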



Article 698

Title@2025-07-07 (1): A 3D Machine Learning based Volume Of Fluid scheme without explicit interface reconstruction

Title: A 3D Machine Learning based Volume Of Fluid scheme without explicit interface reconstruction Ein 3D Machine Learning based Volume Of Fluid Schema ohne explizite Schnittstellenrekonstruktion 基于A3D的基于A3D机机学习的流流计划量,没有明确的接口重建 2507.05218v1

Authors (2): Moreno Pintore, Bruno Després

We present a machine-learning based Volume Of Fluid method to simulate multi-material flows on three-dimensional domains. One of the novelties of the method is that the flux fraction is computed by evaluating a previously trained neural network and without explicitly reconstructing any local interface approximating the exact one. The network is trained on a purely synthetic dataset generated by randomly sampling numerous local interfaces and which can be adapted to improve the scheme on less regular interfaces when needed. Several strategies to ensure the efficiency of the method and the satisfaction of physical constraints and properties are suggested and formalized. Numerical results on the advection equation are provided to show the performance of the method. We observe numerical convergence as the size of the mesh tends to zero $h=1/N_h\searrow 0$, with a better rate than two reference schemes.



Article 699

Title@2025-07-07 (1): Bridging Prediction and Intervention Problems in Social Systems

Title: Bridging Prediction and Intervention Problems in Social Systems Überbrückung von Vorhersagen und Interventionsprobleme in den Sozialsystemen 弥合社会制度中的预测和干预问题 2507.05216v1

Authors (35): Lydia T. Liu, Inioluwa Deborah Raji, Angela Zhou, Luke Guerdan, Jessica Hullman, Daniel Malinsky, Bryan Wilder, Simone Zhang, Hammaad Adam, Amanda Coston, Ben Laufer, Ezinne Nwankwo, Michael Zanger-Tishler, Eli Ben-Michael, Solon Barocas, Avi Feller, Marissa Gerchick, Talia Gillis, Shion Guha, Daniel Ho, Lily Hu, Kosuke Imai, Sayash Kapoor, Joshua Loftus, Razieh Nabi, Arvind Narayanan, Ben Recht, Juan Carlos Perdomo, Matthew Salganik, Mark Sendak, Alexander Tolbert, Berk Ustun, Suresh Venkatasubramanian, Angelina Wang, Ashia Wilson

Many automated decision systems (ADS) are designed to solve prediction problems – where the goal is to learn patterns from a sample of the population and apply them to individuals from the same population. In reality, these prediction systems operationalize holistic policy interventions in deployment. Once deployed, ADS can shape impacted population outcomes through an effective policy change in how decision-makers operate, while also being defined by past and present interactions between stakeholders and the limitations of existing organizational, as well as societal, infrastructure and context. In this work, we consider the ways in which we must shift from a prediction-focused paradigm to an interventionist paradigm when considering the impact of ADS within social systems. We argue this requires a new default problem setup for ADS beyond prediction, to instead consider predictions as decision support, final decisions, and outcomes. We highlight how this perspective unifies modern statistical frameworks and other tools to study the design, implementation, and evaluation of ADS systems, and point to the research directions necessary to operationalize this paradigm shift. Using these tools, we characterize the limitations of focusing on isolated prediction tasks, and lay the foundation for a more intervention-oriented approach to developing and deploying ADS.



Article 700

Title@2025-07-07 (1): Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in $L^p$-sense

Title: Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in $L^p$-sense Mehrstufige Picard-Annäherungen und tiefe neuronale Netzwerke mit ReLU, undichter ReLU und Softplus-Aktivierung überwinden den Fluch der Dimensionalität, wenn sie semilineare parabolische partielle Differentialgleichungen in $L^p$-Sense annähern 多级 Piccar 近似和深神经网络,与 ReLU、 泄漏 ReLU 和软附加激活 克服了维度的诅咒, 当半线性半线性抛抛物线部分偏差方程以 $Lp$- sense 等值接近一致时 2409.20431v3

Authors (2): Ariel Neufeld, Tuan Anh Nguyen

We prove that multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation are capable of approximating solutions of semilinear Kolmogorov PDEs in $L^\mathfrak{p}$-sense, $\mathfrak{p}\in [2,\infty)$, in the case of gradient-independent, Lipschitz-continuous nonlinearities, while the computational effort of the multilevel Picard approximations and the required number of parameters in the neural networks grow at most polynomially in both dimension $d\in \mathbb{N}$ and reciprocal of the prescribed accuracy $\epsilon$.



Article 701

Title@2025-07-07 (1): ST-LoRA: Low-rank Adaptation for Spatio-Temporal Forecasting

Title: ST-LoRA: Low-rank Adaptation for Spatio-Temporal Forecasting ST-LoRA: Niedrigrank-Anpassung für Spatio-Temporale Vorhersagen ST-LORA: 低度适应性 2404.07919v2

Authors (7): Weilin Ruan, Wei Chen, Xilin Dang, Jianxiang Zhou, Weichuang Li, Xu Liu, Yuxuan Liang

Spatio-temporal forecasting is essential for understanding future dynamics within real-world systems by leveraging historical data from multiple locations. Existing methods often prioritize the development of intricate neural networks to capture the complex dependencies of the data. These methods neglect node-level heterogeneity and face over-parameterization when attempting to model node-specific characteristics. In this paper, we present a novel low-rank adaptation framework for existing spatio-temporal prediction models, termed ST-LoRA, which alleviates the aforementioned problems through node-level adjustments. Specifically, we introduce the node-adaptive low-rank layer and node-specific predictor, capturing the complex functional characteristics of nodes while maintaining computational efficiency. Extensive experiments on multiple real-world datasets demonstrate that our method consistently achieves superior performance across various forecasting models with minimal computational overhead, improving performance by 7% with only 1% additional parameter cost. The source code is available at https://github.com/RWLinno/ST-LoRA.
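The idea of a node-level low-rank adjustment can be sketched as follows. This is a hypothetical illustration, not the authors' implementation (see their repository for that): the function name `node_lowrank_adjust`, the tensor shapes, and the zero-initialised second factor (a common low-rank adaptation convention) are all assumptions.

```python
import numpy as np

def node_lowrank_adjust(h, A, B):
    """Per-node low-rank correction: out[n] = h[n] @ A[n] @ B[n].

    h: (nodes, d_in)     hidden features from a frozen backbone (assumed)
    A: (nodes, d_in, r)  node-specific down-projection, rank r << d_in
    B: (nodes, r, d_out) node-specific up-projection
    """
    z = np.einsum("nd,ndr->nr", h, A)      # project each node to rank r
    return np.einsum("nr,nro->no", z, B)   # expand to the output width

rng = np.random.default_rng(0)
nodes, d_in, r, d_out = 4, 8, 2, 3
h = rng.normal(size=(nodes, d_in))
A = rng.normal(size=(nodes, d_in, r))
B = np.zeros((nodes, r, d_out))            # zero init: adjustment starts at 0
delta = node_lowrank_adjust(h, A, B)

# Parameters per node: r * (d_in + d_out) instead of d_in * d_out.
print(delta.shape, r * (d_in + d_out), d_in * d_out)
```

The low rank is what keeps the extra parameter cost small relative to a full node-specific weight matrix.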



Article 702

Title@2025-07-07 (1): Pre-Trained Policy Discriminators are General Reward Models

Title: Pre-Trained Policy Discriminators are General Reward Models Pre-Trained Policy Diskriminatoren sind allgemeine Prämienmodelle 培训前政策歧视者是一般奖励模式 2507.05197v1

Authors (22): Shihan Dou, Shichun Liu, Yuming Yang, Yicheng Zou, Yunhua Zhou, Shuhao Xing, Chenhao Huang, Qiming Ge, Demin Song, Haijun Lv, Songyang Gao, Chengqi Lv, Enyu Zhou, Honglin Guo, Zhiheng Xi, Wenwei Zhang, Qipeng Guo, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Tao Gui, Kai Chen

We offer a novel perspective on reward modeling by formulating it as a policy discriminator, which quantifies the difference between two policies to generate a reward signal, guiding the training policy towards a target policy with desired behaviors. Based on this conceptual insight, we propose a scalable pre-training method named Policy Discriminative Learning (POLAR), which trains a reward model (RM) to discern identical policies and discriminate different ones. Unlike traditional reward modeling methods relying on absolute preferences, POLAR captures the relative difference between one policy and an arbitrary target policy, which is a scalable, high-level optimization objective suitable for modeling generic ranking relationships. Leveraging the POLAR pre-training paradigm, we present a series of RMs with parameter scales from 1.8B to 7B. Empirical results show that POLAR substantially outperforms traditional non-pre-trained methods, significantly enhancing RM performance. For instance, POLAR-7B could improve preference accuracy from 54.8% to 81.0% on STEM tasks and from 57.9% to 85.5% on creative writing tasks compared to SOTA baselines. POLAR also shows robust generalization capabilities in RLHF using Reinforcement Fine-tuning (RFT), providing reliable reward signals and markedly enhancing policy performance, improving LLaMa3.1-8B from an average of 47.36% to 56.33% and Qwen2.5-32B from 64.49% to 70.47% on 20 benchmarks. Moreover, scaling experiments reveal a clear power-law relationship between computation and performance, supported by linear correlation coefficients approaching 0.99. The impressive performance, strong generalization, and scaling properties suggest that POLAR is a promising direction for developing general and strong reward models.
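A power law becomes a straight line in log-log space, which is why a linear correlation coefficient can be used to support it. The sketch below shows that check on made-up numbers; the function name `loglog_fit` and the data are hypothetical, and the abstract does not state how the authors computed their coefficient.

```python
import numpy as np

def loglog_fit(compute, loss):
    """Fit loss ~ scale * compute**slope by linear regression on logs.

    Returns (slope, scale, r), where r is the Pearson correlation
    between log(compute) and log(loss).
    """
    x, y = np.log(compute), np.log(loss)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, np.exp(intercept), r

# Synthetic, noise-free example (not data from the paper).
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = 3.0 * compute ** -0.05
print(loglog_fit(compute, loss))
```

For a quantity that decreases with compute, r is negative, so the magnitude of r is what measures how well the power law fits.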



Article 703

Title@2025-07-07 (1): Train-before-Test Harmonizes Language Model Rankings

Title: Train-before-Test Harmonizes Language Model Rankings Train-fore-Test harmonisiert Sprachmodell-Rankings 培训前测试前统一语言协调培训示范排列 2507.05195v1

Authors (3): Guanhua Zhang, Ricardo Dominguez-Olmedo, Moritz Hardt

Existing language model benchmarks provide contradictory model rankings, even for benchmarks that aim to capture similar skills. This dilemma of conflicting rankings hampers model selection, clouds model comparisons, and adds confusion to a growing ecosystem of competing models. Recent work attributed ranking disagreement to the phenomenon of training on the test task: As released, different models exhibit a different level of preparation for any given test task. A candidate solution to the problem is train-before-test: Give each model the same benchmark-specific finetuning before evaluation. Our primary contribution is a broad empirical evaluation of train-before-test across 24 benchmarks and 61 models. We show that train-before-test significantly improves ranking agreement consistently across all benchmarks. Whereas rankings have little external validity to start with, they enjoy a significant degree of external validity when applying train-before-test: Model rankings transfer gracefully from one benchmark to the other. Even within the same model family, train-before-test reduces strong ranking disagreement to near-perfect agreement. In addition, train-before-test reduces the model-score matrix to essentially rank one, revealing new insights into the latent factors of benchmark performance. Our work supports the recommendation to make train-before-test a default component of LLM benchmarking.
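Two quantities in this abstract lend themselves to a short illustration: agreement between two benchmark rankings, and how close a model-score matrix is to rank one. The sketch below uses Kendall's tau (without tie correction) and the share of squared singular values carried by the top singular value. Both are assumptions chosen for illustration; the abstract does not say which statistics the authors use, and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def kendall_tau(a, b):
    """Kendall tau-a between two score vectors over the same models."""
    pairs = list(combinations(range(len(a)), 2))
    s = sum(np.sign(a[i] - a[j]) * np.sign(b[i] - b[j]) for i, j in pairs)
    return s / len(pairs)

def top_singular_share(scores):
    """Fraction of squared singular values explained by the largest one."""
    sv = np.linalg.svd(np.asarray(scores, dtype=float), compute_uv=False)
    return sv[0] ** 2 / np.sum(sv ** 2)

# Hypothetical scores: rows are models, columns are benchmarks.
ability = np.array([0.2, 0.5, 0.9])
difficulty = np.array([1.0, 0.7])
scores = np.outer(ability, difficulty)     # exactly rank one by construction
print(kendall_tau(scores[:, 0], scores[:, 1]), top_singular_share(scores))
```

When the score matrix is exactly rank one with positive factors, every benchmark orders the models identically, so tau between any two columns is 1.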



Article 704

Title@2025-07-07 (1): $\varphi$-Adapt: A Physics-Informed Adaptation Learning Approach to 2D Quantum Material Discovery

Title: $\varphi$-Adapt: A Physics-Informed Adaptation Learning Approach to 2D Quantum Material Discovery $\varphi$-Adapt: Ein physikinformierter Anpassungs-Lernansatz zur 2D Quantenmaterial-Entdeckung 美元-dapt: 2D 量量材料发现物理上综合的适应学习方法 2507.05184v1

Authors (7): Hoang-Quan Nguyen, Xuan Bac Nguyen, Sankalp Pandey, Tim Faltermeier, Nicholas Borys, Hugh Churchill, Khoa Luu

Characterizing quantum flakes is a critical step in quantum hardware engineering because the quality of these flakes directly influences qubit performance. Although computer vision methods for identifying two-dimensional quantum flakes have emerged, they still face significant challenges in estimating flake thickness. These challenges include limited data, poor generalization, sensitivity to domain shifts, and a lack of physical interpretability. In this paper, we introduce one of the first Physics-informed Adaptation Learning approaches to overcome these obstacles. We focus on two main issues, i.e., data scarcity and generalization. First, we propose a new synthetic data generation framework that produces diverse quantum flake samples across various materials and configurations, reducing the need for time-consuming manual collection. Second, we present $\varphi$-Adapt, a physics-informed adaptation method that bridges the performance gap between models trained on synthetic data and those deployed in real-world settings. Experimental results show that our approach achieves state-of-the-art performance on multiple benchmarks, outperforming existing methods. Our proposed approach advances the integration of physics-based modeling and domain adaptation. It also addresses a critical gap in leveraging synthesized data for real-world 2D material analysis, offering impactful tools for deep learning and materials science communities.



Article 705

Title@2025-07-07 (1): MMD-OPT : Maximum Mean Discrepancy Based Sample Efficient Collision Risk Minimization for Autonomous Driving

Title: MMD-OPT : Maximum Mean Discrepancy Based Sample Efficient Collision Risk Minimization for Autonomous Driving MMD-OPT : Maximale mittlere Diskrepanz auf Probenbasis Effiziente Kollisionsrisikominimierung für autonomes Fahren MMD-OPT: 以最大平均值差差为基础的抽样有效碰撞风险最小化 2412.09121v2

Authors (2): Basant Sharma, Arun Kumar Singh

We propose MMD-OPT: a sample-efficient approach for minimizing the risk of collision under arbitrary prediction distributions of the dynamic obstacles. MMD-OPT is based on embedding distributions in a Reproducing Kernel Hilbert Space (RKHS) and the associated Maximum Mean Discrepancy (MMD). We show how these two concepts can be used to define a sample-efficient surrogate for the collision risk estimate. We perform extensive simulations to validate the effectiveness of MMD-OPT on both synthetic and real-world datasets. Importantly, we show that trajectory optimization with our MMD-based collision risk surrogate leads to safer trajectories in low-sample regimes than popular alternatives based on Conditional Value at Risk (CVaR).
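The Maximum Mean Discrepancy itself is straightforward to estimate from two sample sets. The sketch below is the generic biased (V-statistic) estimator with a Gaussian kernel, not the paper's collision-risk surrogate; the kernel choice, the bandwidth, and the function names are assumptions.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and y (m, d)."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
print(mmd2(x, x), mmd2(x, x + 5.0))   # near zero vs. clearly positive
```

The estimate is zero when both sample sets coincide and grows as the two distributions separate, which is the property a distribution-level risk surrogate can build on.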



Article 706

Title@2025-07-07 (1): Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis

Title: Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis Auf dem Weg zu erklärbarer Fusion und ausgewogenem Lernen in der multimodalen Sentiment-Analyse 争取在多式联运分析中实现可解释的融合和平衡学习 2504.12151v2

Authors (3): Miaosen Luo, Yuncheng Jiang, Sijie Mai

Multimodal Sentiment Analysis (MSA) faces two critical challenges: the lack of interpretability in the decision logic of multimodal fusion and modality imbalance caused by disparities in inter-modal information density. To address these issues, we propose KAN-MCP, a novel framework that integrates the interpretability of Kolmogorov-Arnold Networks (KAN) with the robustness of the Multimodal Clean Pareto (MCPareto) framework. First, KAN leverages its univariate function decomposition to achieve transparent analysis of cross-modal interactions. This structural design allows direct inspection of feature transformations without relying on external interpretation tools, thereby ensuring both high expressiveness and interpretability. Second, the proposed MCPareto enhances robustness by addressing modality imbalance and noise interference. Specifically, we introduce the Dimensionality Reduction and Denoising Modal Information Bottleneck (DRD-MIB) method, which jointly denoises and reduces feature dimensionality. This approach provides KAN with discriminative low-dimensional inputs to reduce the modeling complexity of KAN while preserving critical sentiment-related information. Furthermore, MCPareto dynamically balances gradient contributions across modalities using the purified features output by DRD-MIB, ensuring lossless transmission of auxiliary signals and effectively alleviating modality imbalance. This synergy of interpretability and robustness not only achieves superior performance on benchmark datasets such as CMU-MOSI, CMU-MOSEI, and CH-SIMS v2 but also offers an intuitive visualization interface through KAN’s interpretable architecture. Our code is released on https://github.com/LuoMSen/KAN-MCP.



Article 707

Title@2025-07-07 (1): Blind Targeting: Personalization under Third-Party Privacy Constraints

Title: Blind Targeting: Personalization under Third-Party Privacy Constraints Blind Targeting: Personalisierung unter Datenschutzbeschränkungen Dritter 瞄准盲人:第三方隐私限制下的个性化 2507.05175v1

Authors (1): Anya Shchetkina

Major advertising platforms recently increased privacy protections by limiting advertisers’ access to individual-level data. Instead of providing access to granular raw data, the platforms only allow a limited number of aggregate queries to a dataset, which is further protected by adding differentially private noise. This paper studies whether and how advertisers can design effective targeting policies within these restrictive privacy-preserving data environments. To achieve this, I develop a probabilistic machine learning method based on Bayesian optimization, which facilitates dynamic data exploration. Since Bayesian optimization was designed to sample points from a function to find its maximum, it is not directly applicable to aggregate queries and to targeting. Therefore, I introduce two innovations: (i) integral updating of posteriors, which allows selecting the best regions of the data to query rather than individual points, and (ii) a targeting-aware acquisition function that dynamically selects the most informative regions for the targeting task. I identify the conditions of the dataset and privacy environment that necessitate the use of such a “smart” querying strategy. I apply the strategic querying method to the Criteo AI Labs dataset for uplift modeling (Diemert et al., 2018) that contains visit and conversion data from 14M users. I show that an intuitive benchmark strategy only achieves 33% of the non-privacy-preserving targeting potential in some cases, while my strategic querying method achieves 97-101% of that potential, and is statistically indistinguishable from Causal Forest (Athey et al., 2019), a state-of-the-art non-privacy-preserving machine learning targeting method.
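The data environment described above, aggregate queries answered with differentially private noise, can be sketched as below. The Laplace mechanism and the sensitivity-1 count query are assumptions (the abstract only says "differentially private noise"), and the function name and data are hypothetical.

```python
import numpy as np

def noisy_count(values, low, high, epsilon, rng):
    """Count of values in [low, high), released with Laplace noise.

    A count query changes by at most 1 when one user is added or
    removed, so the noise scale is 1 / epsilon.
    """
    true_count = int(np.sum((values >= low) & (values < high)))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
feature = rng.uniform(size=1000)          # hypothetical user covariate
print(noisy_count(feature, 0.0, 0.5, epsilon=1.0, rng=rng))
```

Because each query returns a noisy aggregate over a region and the number of queries is limited, choosing which regions to query matters, which is the problem the strategic querying method addresses.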



Article 708

Title@2025-07-07 (1): Physics Encoded Blocks in Residual Neural Network Architectures for Digital Twin Models

Title: Physics Encoded Blocks in Residual Neural Network Architectures for Digital Twin Models Physikkodierte Blöcke in residualen Neural Netzwerkarchitekturen für digitale Zwillingsmodelle 数字双型模型残余神经网络架构中的物理编码区块 2411.11497v2

Authors (5): Muhammad Saad Zia, Ashiq Anjum, Lu Liu, Anthony Conway, Anasol Pena Rios

Physics Informed Machine Learning has emerged as a popular approach for modeling and simulation in digital twins, enabling the generation of accurate models of processes and behaviors in real-world systems. However, existing methods either rely on simple loss regularizations that offer limited physics integration or employ highly specialized architectures that are difficult to generalize across diverse physical systems. This paper presents a generic approach based on a novel physics-encoded residual neural network (PERNN) architecture that seamlessly combines data-driven and physics-based analytical models to overcome these limitations. Our method integrates differentiable physics blocks, which implement mathematical operators from physics-based models, with feed-forward learning blocks, while intermediate residual blocks ensure stable gradient flow during training. Consequently, the model naturally adheres to the underlying physical principles even when prior physics knowledge is incomplete, thereby improving generalizability with low data requirements and reduced model complexity. We investigate our approach in two application domains. The first is a steering model for autonomous vehicles in a simulation environment, and the second is a digital twin for climate modeling using an ordinary differential equation (ODE)-based model of Net Ecosystem Exchange (NEE) to enable gap-filling in flux tower data. In both cases, our method outperforms conventional neural network approaches as well as state-of-the-art Physics Informed Machine Learning methods.



Article 709

Title@2025-07-07 (1): Critiques of World Models

Title: Critiques of World Models Kritik an Weltmodellen 世界模式的证明 2507.05169v1

Authors (4): Eric Xing, Mingkai Deng, Jinyu Hou, Zhiting Hu

The world model, the supposed algorithmic surrogate of the real-world environment that biological agents experience and act upon, has been an emerging topic in recent years because of the rising need to develop virtual agents with artificial (general) intelligence. There has been much debate on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed sci-fi classic Dune, and drawing inspiration from the concept of “hypothetical thinking” in the psychology literature, we offer critiques of several schools of thought on world modeling, and argue that the primary goal of a world model is to simulate all actionable possibilities of the real world for purposeful reasoning and acting. Building on the critiques, we propose a new architecture for a general-purpose world model, based on hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervised learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.



Article 710

Title@2025-07-07 (1): Can Local Representation Alignment RNNs Solve Temporal Tasks?

Title: Can Local Representation Alignment RNNs Solve Temporal Tasks? Kann die lokale Vertretung RNNs zeitliche Aufgaben lösen? 当地代表协调区域网络能够解决时空任务吗? 2504.13531v2

Authors (2): Nikolay Manchev, Luis C. Garcia-Peraza-Herrera

Recurrent Neural Networks (RNNs) are commonly used for real-time processing, streaming data, and cases where the amount of training samples is limited. Backpropagation Through Time (BPTT) is the predominant algorithm for training RNNs; however, it is frequently criticized for being prone to exploding and vanishing gradients and being biologically implausible. In this paper, we present and evaluate a target propagation-based method for RNNs, which uses local updates and seeks to reduce the said instabilities. Having stable RNN models increases their practical use in a wide range of fields such as natural language processing, time-series forecasting, anomaly detection, control systems, and robotics. The proposed solution uses local representation alignment (LRA). We thoroughly analyze the performance of this method, experiment with normalization and different local error functions, and invalidate certain assumptions about the behavior of this type of learning. Namely, we demonstrate that despite the decomposition of the network into sub-graphs, the model still suffers from vanishing gradients. We also show that gradient clipping as proposed in LRA has little to no effect on network performance. This results in an LRA RNN model that is very difficult to train due to vanishing gradients. We address this by introducing gradient regularization in the direction of the update and demonstrate that this modification promotes gradient flow and meaningfully impacts convergence. We compare and discuss the performance of the algorithm, and we show that the regularized LRA RNN considerably outperforms the unregularized version on three landmark tasks: temporal order, 3-bit temporal order, and random permutation.



Article 711

Title: Language Models can Self-Improve at State-Value Estimation for Better Search Sprachmodelle können sich bei State-Value-Schätzung für eine bessere Suche selbst improvisieren 语言模型可以在国家估价估算中自我改进,以更好地搜索 2503.02878v2

Authors (2): Ethan Mendes, Alan Ritter

Collecting ground-truth rewards or human demonstrations for multi-step reasoning tasks is often prohibitively expensive and time consuming, especially in interactive domains like web tasks. To address this bottleneck, we present self-taught lookahead (STL), a self-supervised method that leverages state-transition dynamics to improve a value model capable of effectively guiding language model-controlled search without any labeled data. We find that moderately sized (8 billion parameters) open-weight value models improved with STL can match the performance of using a gpt-4o value model. Furthermore, we find that specialized value models learned with STL can be deployed with computationally lightweight search algorithms, achieving performance that matches that of more expensive tree search methods, while reducing costs by an order of magnitude.



Article 712

Title@2025-07-07 (1): A Dynamical Systems Perspective on the Analysis of Neural Networks

Title: A Dynamical Systems Perspective on the Analysis of Neural Networks Eine dynamische Systemperspektive zur Analyse neuraler Netzwerke 神经网络分析动态系统视角 2507.05164v1

Authors (4): Dennis Chemnitz, Maximilian Engel, Christian Kuehn, Sara-Viola Kuntz

In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.
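The simplest instance of the dynamical view of gradient descent is a one-dimensional quadratic loss. For L(x) = lam * x**2 / 2, one gradient step is the linear map x -> (1 - lr * lam) * x, which is stable exactly when lr < 2 / lam. The sketch below is a textbook illustration under that assumption, not code from the chapter; the function name is hypothetical.

```python
def gd_iterates(lam, lr, x0=1.0, steps=50):
    """Iterate gradient descent on L(x) = lam * x**2 / 2.

    Each step multiplies x by (1 - lr * lam), so the fixed point
    x = 0 is stable if and only if |1 - lr * lam| < 1.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x - lr * lam * x     # gradient of the loss is lam * x
        trajectory.append(x)
    return trajectory

lam = 2.0                          # curvature; stability threshold is 2 / lam = 1
stable = gd_iterates(lam, lr=0.9)
unstable = gd_iterates(lam, lr=1.1)
print(abs(stable[-1]), abs(unstable[-1]))
```

The threshold lr = 2 / lam, with lam replaced by the largest Hessian eigenvalue, is the quantity around which the edge of stability phenomenon is usually stated.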



Article 713

Title@2025-07-07 (1): Role of scrambling and noise in temporal information processing with quantum systems

Title: Role of scrambling and noise in temporal information processing with quantum systems Rolle von Scrambling und Rauschen in der zeitlichen Informationsverarbeitung mit Quantensystemen 利用量子系统处理时空信息的作用 2505.10080v2

Authors (6): Weijie Xiong, Zoë Holmes, Armando Angrisani, Yudai Suzuki, Thiparat Chotibut, Supanut Thanasilp

Scrambling quantum systems have attracted attention as effective substrates for temporal information processing. Here we consider a quantum reservoir processing framework that captures a broad range of physical computing models with quantum systems. We examine the scalability and memory retention of the model with scrambling reservoirs modelled by high-order unitary designs in both noiseless and noisy settings. In the former regime, we show that measurement readouts become exponentially concentrated with increasing reservoir size, yet strikingly do not worsen with the reservoir iterations. Thus, while repeatedly reusing a small scrambling reservoir with quantum data might be viable, scaling up the problem size deteriorates generalization unless one can afford an exponential shot overhead. In contrast, the memory of early inputs and initial states decays exponentially in both reservoir size and reservoir iterations. In the noisy regime, we also prove that memory decays exponentially in time for local noisy channels. These results required us to introduce new proof techniques for bounding concentration in temporal quantum models.



Article 714

Title@2025-07-07 (1): Scalable Multi-Task Learning for Particle Collision Event Reconstruction with Heterogeneous Graph Neural Networks

Title: Scalable Multi-Task Learning for Particle Collision Event Reconstruction with Heterogeneous Graph Neural Networks Skalierbares Multi-Task-Lernen für die Rekonstruktion von Teilchenkollisionsereignissen mit heterogenen Graphen-Neuralnetzwerken 可缩放多任务学习,用于与异形图形神经网络相配合的粒子碰撞重建活动 2504.21844v2

Authors (8): William Sutcliffe, Marta Calvi, Simone Capelli, Jonas Eschle, Julián García Pardiñas, Abhijit Mathad, Azusa Uzuki, Nicola Serra

The growing luminosity frontier at the Large Hadron Collider is challenging the reconstruction and analysis of particle collision events. Increased particle multiplicities are straining latency and storage requirements at the data acquisition stage, while new complications are emerging, including higher background levels and more frequent particle vertex misassociations. This in turn necessitates the development of more holistic and scalable reconstruction methods that take advantage of recent advances in machine learning. We propose a novel Heterogeneous Graph Neural Network (HGNN) architecture featuring unique representations for diverse particle collision relationships and integrated graph pruning layers for scalability. Trained with a multi-task paradigm in an environment mimicking the LHCb experiment, this HGNN significantly improves beauty hadron reconstruction performance. Notably, it concurrently performs particle vertex association and graph pruning within a single framework. We quantify reconstruction and pruning performance, demonstrate enhanced inference time scaling with event complexity, and mitigate potential performance loss using a weighted message passing scheme.



Article 715

Title@2025-07-07 (1): Robust Molecular Property Prediction via Densifying Scarce Labeled Data

Title: Robust Molecular Property Prediction via Densifying Scarce Labeled Data Robuste Vorhersage der molekularen Eigenschaften durch verdichtende, mit Beschriftung versehene Daten 通过冷凝隐蔽标签数据进行强力分子属性预测 2506.11877v2

Authors (4): Jina Kim, Jeffrey Willette, Bruno Andreis, Sung Ju Hwang

A widely recognized limitation of molecular prediction models is their reliance on structures observed in the training data, resulting in poor generalization to out-of-distribution compounds. Yet in drug discovery, the compounds most critical for advancing research often lie beyond the training set, making the bias toward the training data particularly problematic. This mismatch introduces substantial covariate shift, under which standard deep learning models produce unstable and inaccurate predictions. Furthermore, the scarcity of labeled data, stemming from the onerous and costly nature of experimental validation, further exacerbates the difficulty of achieving reliable generalization. To address these limitations, we propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data, enabling the model to meta-learn how to generalize beyond the training distribution. We demonstrate significant performance gains on challenging real-world datasets with substantial covariate shift, supported by t-SNE visualizations highlighting our interpolation method.



Article 716

Title@2025-07-07 (1): Distributional Diffusion Models with Scoring Rules

Title: Distributional Diffusion Models with Scoring Rules Verteilungs-Diffusionsmodelle mit Bewertungsregeln 配有Scorcing 规则的分发传播模型 2502.02483v3

Authors (7): Valentin De Bortoli, Alexandre Galashov, J. Swaroop Guntupalli, Guangyao Zhou, Kevin Murphy, Arthur Gretton, Arnaud Doucet

Diffusion models generate high-quality synthetic data. They operate by defining a continuous-time forward process which gradually adds Gaussian noise to data until fully corrupted. The corresponding reverse process progressively “denoises” a Gaussian sample into a sample from the data distribution. However, generating high-quality outputs requires many discretization steps to obtain a faithful approximation of the reverse process. This is expensive and has motivated the development of many acceleration methods. We propose to accomplish sample generation by learning the posterior distribution of clean data samples given their noisy versions, instead of only the mean of this distribution. This allows us to sample from the probability transitions of the reverse process on a coarse time scale, significantly accelerating inference with minimal degradation of the quality of the output. This is accomplished by replacing the standard regression loss used to estimate conditional means with a scoring rule. We validate our method on image and robot trajectory generation, where we consistently outperform standard diffusion models at few discretization steps.
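To make "scoring rule" concrete, the sketch below estimates the energy score, a common proper scoring rule for multivariate predictive distributions, from model samples. The abstract does not name the rule the authors use, so the energy score is an assumption chosen for illustration, and the function name is hypothetical.

```python
import numpy as np

def energy_score(samples, y):
    """Sample estimate of E||X - y|| - 0.5 * E||X - X'|| (lower is better).

    samples: (m, d) draws from the predictive distribution
    y:       (d,)   the observed clean data point
    """
    samples = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = np.linalg.norm(samples - y, axis=1).mean()
    pairwise = samples[:, None, :] - samples[None, :, :]
    spread = np.linalg.norm(pairwise, axis=-1).mean()
    return fit - 0.5 * spread

rng = np.random.default_rng(0)
draws = rng.normal(size=(200, 2))
target = np.zeros(2)
print(energy_score(draws, target), energy_score(draws + 3.0, target))
```

Unlike a squared-error regression loss, which is minimised by the conditional mean alone, a score of this kind depends on the whole set of samples, so minimising it pushes the model toward the full conditional distribution.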



Article 717

Title@2025-07-07 (1): OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows

Title: OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows OGF: Eine Online-Gradient-Flow-Methode zur Optimierung des statistischen Steady-State-Zeitdurchschnitts unbeständiger Turbulentenströme OGGF:优化不稳定波动流量统计稳定时间平均值的在线渐进流动方法 2507.05149v1

Authors (4): Tom Hickling, Jonathan F. MacArt, Justin Sirignano, Den Waidmann

Turbulent flows are chaotic and unsteady, but their statistical distribution converges to a statistical steady state. Engineering quantities of interest typically take the form of time-average statistics such as $ \frac{1}{t} \int_0^t f ( u(x,\tau; \theta) ) d\tau \overset{t \rightarrow \infty}{\rightarrow} F(x; \theta)$, where $u(x,t; \theta)$ are solutions of the Navier–Stokes equations with parameters $\theta$. Optimizing over $F(x; \theta)$ has many engineering applications including geometric optimization, flow control, and closure modeling. However, this remains an open challenge, as existing computational approaches are incapable of scaling to physically representative numbers of grid points. The fundamental obstacle is the chaoticity of turbulent flows: gradients calculated with the adjoint method diverge exponentially as $t \rightarrow \infty$. We develop a new online gradient-flow (OGF) method that is scalable to large degree-of-freedom systems and enables optimizing for the steady-state statistics of chaotic, unsteady, turbulence-resolving simulations. The method forward-propagates an online estimate for the gradient of $F(x; \theta)$ while simultaneously performing online updates of the parameters $\theta$. A key feature is the fully online nature of the algorithm to facilitate faster optimization progress and its combination with a finite-difference estimator to avoid the divergence of gradients due to chaoticity. The proposed OGF method is demonstrated for optimizations over three chaotic ordinary and partial differential equations: the Lorenz-63 equation, the Kuramoto–Sivashinsky equation, and Navier–Stokes solutions of compressible, forced, homogeneous isotropic turbulence. In each case, the OGF method successfully reduces the loss based on $F(x; \theta)$ by several orders of magnitude and accurately recovers the optimal parameters.



Article 718

Title@2025-07-07 (1): Dataless Neural Networks for Resource-Constrained Project Scheduling

Title: Dataless Neural Networks for Resource-Constrained Project Scheduling Datenlose Neurale Netzwerke für ressourcengebundene Projektplanung 无数据神经网络 资源培训项目日程安排 2507.05322v1

Authors (1): Marc Bara

Dataless neural networks represent a paradigm shift in applying neural architectures to combinatorial optimization problems, eliminating the need for training datasets by encoding problem instances directly into network parameters. Despite the pioneering work of Alkhouri et al. (2022) demonstrating the viability of dataless approaches for the Maximum Independent Set problem, our comprehensive literature review reveals that no published work has extended these methods to the Resource-Constrained Project Scheduling Problem (RCPSP). This paper addresses this gap by presenting the first dataless neural network approach for RCPSP, providing a complete mathematical framework that transforms discrete scheduling constraints into differentiable objectives suitable for gradient-based optimization. Our approach leverages smooth relaxations and automatic differentiation to unlock GPU parallelization for project scheduling, traditionally a domain of sequential algorithms. We detail the mathematical formulation for both precedence and renewable resource constraints, including a memory-efficient dense time-grid representation. Implementation and comprehensive experiments on PSPLIB benchmark instances (J30, J60, and J120) are currently underway, with empirical results to be reported in an updated version of this paper.



Article 719

Title@2025-07-07 (1): Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings

Title: Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings Pseudo-Likelihood erzeugt assoziative Erinnerungen, die auch für asymmetrische Kupplungen verallgemeinern können 优多似近似产生连带记忆,即使对非对称联结而言,也能加以概括化。 2507.05147v1

Authors (5): Francesco D’Amico, Dario Bocchi, Luca Maria Del Bono, Saverio Rossi, Matteo Negri

Energy-based probabilistic models learned by maximizing the likelihood of the data are limited by the intractability of the partition function. A widely used workaround is to maximize the pseudo-likelihood, which replaces the global normalization with tractable local normalizations. Here we show that, in the zero-temperature limit, a network trained to maximize pseudo-likelihood naturally implements an associative memory: if the training set is small, patterns become fixed-point attractors whose basins of attraction exceed those of any classical Hopfield rule. We quantitatively explain this effect on uncorrelated random patterns. Moreover, we show that, for different structured datasets coming from computer science (random feature model, MNIST), physics (spin glasses) and biology (proteins), as the number of training examples increases, the learned network goes beyond memorization, developing meaningful attractors with non-trivial correlations with test examples, thus showing the ability to generalize. Our results therefore reveal that pseudo-likelihood works both as an efficient inference tool and as a principled mechanism for memory and generalization.
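For reference, the pseudo-likelihood objective can be written out for a network of binary units. The setup below (spins $s_i \in \{-1,+1\}$, couplings $J_{ij}$, inverse temperature $\beta$, training patterns $\xi^\mu$) is the textbook Ising-type form and is an assumption here, not necessarily the paper's notation.

```latex
% Local field on unit i and the tractable conditional that replaces
% the global normalization (partition function).
\begin{align}
  h_i(s) &= \sum_{j \neq i} J_{ij}\, s_j, \\
  P(s_i \mid s_{\setminus i}) &= \frac{e^{\beta s_i h_i(s)}}{2\cosh\!\big(\beta h_i(s)\big)}, \\
  % Log-pseudo-likelihood over M training patterns of N units each.
  \mathcal{L}_{\mathrm{PL}}(J) &= \sum_{\mu=1}^{M} \sum_{i=1}^{N}
      \log P\big(\xi_i^{\mu} \mid \xi_{\setminus i}^{\mu}\big).
\end{align}
```

Each conditional normalizes over a single unit only, so no sum over all $2^N$ configurations is needed. In the limit $\beta \to \infty$ the conditional reduces to the deterministic update $s_i \leftarrow \operatorname{sign}(h_i(s))$, which is the associative-memory dynamics the abstract refers to; since each row of $J$ is fitted separately, nothing forces $J_{ij} = J_{ji}$.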



Article 720

Title@2025-07-07 (1): VERITAS: Verification and Explanation of Realness in Images for Transparency in AI Systems

Title: VERITAS: Verification and Explanation of Realness in Images for Transparency in AI Systems VERITAS: Verifizierung und Erklärung der Realität in Bildern für Transparenz in KI-Systemen VERITAS: 核实和解释图像的真实性,促进AI系统透明度 2507.05146v1

Authors (6): Aadi Srivastava, Vignesh Natarajkumar, Utkarsh Bheemanaboyna, Devisree Akashapu, Nagraj Gaonkar, Archit Joshi

The widespread and rapid adoption of AI-generated content, created by models such as Generative Adversarial Networks (GANs) and Diffusion Models, has revolutionized the digital media landscape by allowing efficient and creative content generation. However, these models also blur the difference between real images and AI-generated synthetic images, raising concerns regarding content authenticity and integrity. While many existing solutions to detect fake images focus solely on classification and higher-resolution images, they often lack transparency in their decision-making, making it difficult for users to understand why an image is classified as fake. In this paper, we present VERITAS, a comprehensive framework that not only accurately detects whether a small (32x32) image is AI-generated but also explains why it was classified that way through artifact localization and semantic reasoning. VERITAS produces human-readable explanations that describe key artifacts in synthetic images. We show that this architecture offers clear explanations of the basis of zero-shot synthetic image detection tasks. Code and relevant prompts can be found at https://github.com/V-i-g-n-e-s-h-N/VERITAS.



Article 721

Title@2025-07-07 (1): A generalized Wasserstein-2 distance approach for efficient reconstruction of random field models using stochastic neural networks

Title: A generalized Wasserstein-2 distance approach for efficient reconstruction of random field models using stochastic neural networks Ein generalisierter Wasserstein-2-Distanzansatz zur effizienten Rekonstruktion von Zufallsfeldmodellen mit stochastischen neuronalen Netzwerken 利用随机现场模型利用随机神经神经网络有效重建的通用瓦森斯坦-2远程方法 2507.05143v1

Authors (2): Mingtao Xia, Qijing Shen

In this work, we propose a novel generalized Wasserstein-2 distance approach for efficiently training stochastic neural networks to reconstruct random field models, where the target random variable comprises both continuous and categorical components. We prove that a stochastic neural network can approximate random field models under a Wasserstein-2 distance metric under nonrestrictive conditions. Furthermore, this stochastic neural network can be efficiently trained by minimizing our proposed generalized local squared Wasserstein-2 loss function. We showcase the effectiveness of our proposed approach in various uncertainty quantification tasks, including classification, reconstructing the distribution of mixed random variables, and learning complex noisy dynamical systems from spatiotemporal data.
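As background for the distance the abstract builds on, the sketch below computes the plain empirical Wasserstein-2 distance between two one-dimensional samples of equal size. It is not the paper's generalized local squared loss; the function name `empirical_w2_1d` and the equal-size restriction are assumptions made for illustration.

```python
import numpy as np


def empirical_w2_1d(x, y):
    """Empirical Wasserstein-2 distance between two 1-D samples of equal size.

    In one dimension the optimal coupling matches order statistics, so the
    distance is the root mean squared gap between the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        # Hypothetical simplification: unequal sizes would need quantile interpolation.
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

Shifting every sample point by a constant `c` gives a distance of `|c|`, which is a quick sanity check on the sorting-based formula.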



Article 722

Title@2025-07-07 (1): Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications

Title: Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications Hardware-effiziente tragbare probabilistische Inferenz für TinyML Neurosymbolische KI-Anwendungen TinyML Nerosymblic AI 应用的硬件高效可移动可移动概率推推 2507.05141v1

Authors (3): Jelin Leslin, Martin Trapp, Martin Andraud

Neurosymbolic AI (NSAI) has recently emerged to mitigate limitations associated with deep learning (DL) models, e.g., quantifying their uncertainty or reasoning with explicit rules. Hence, TinyML hardware will need to support these symbolic models to bring NSAI to embedded scenarios. Yet, although symbolic models are typically compact, their sparsity and computation resolution contrast with low-resolution and dense neuro models, which is a challenge on resource-constrained TinyML hardware, severely limiting the size of symbolic models that can be computed. In this work, we remove this bottleneck by leveraging a tight hardware/software integration to present a complete framework to compute NSAI with TinyML hardware. We focus on symbolic models realized with tractable probabilistic circuits (PCs), a popular subclass of probabilistic models for hardware integration. This framework: (1) trains a specific class of hardware-efficient \emph{deterministic} PCs, chosen for the symbolic task; (2) \emph{compresses} this PC until it can be computed on TinyML hardware with minimal accuracy degradation, using our $n^{th}$-root compression technique; and (3) \emph{deploys} the complete NSAI model on TinyML hardware. Compared to a 64b precision baseline necessary for the PC without compression, our workflow leads to significant hardware reduction on FPGA (up to 82.3\% in FF, 52.6\% in LUTs, and 18.0\% in Flash usage) and an average inference speedup of 4.67x on an ESP32 microcontroller.



Article 723

Title@2025-07-07 (1): Inductive randomness predictors: beyond conformal

Title: Inductive randomness predictors: beyond conformal Induktive Zufallsvorhersagen: jenseits von konform 感导随机性预测值:超出符合 2503.02803v2

Authors (1): Vladimir Vovk

This paper introduces inductive randomness predictors, which form a proper superset of inductive conformal predictors but have the same principal property of validity under the assumption of randomness (i.e., of IID data). It turns out that every non-trivial inductive conformal predictor is strictly dominated by an inductive randomness predictor, although the improvement is not great, at most a factor of $\mathrm{e}\approx2.72$ in the case of e-prediction. The dominating inductive randomness predictors are more complicated and more difficult to compute; besides, an improvement by a factor of $\mathrm{e}$ is rare. Therefore, this paper does not suggest replacing inductive conformal predictors by inductive randomness predictors and only calls for a more detailed study of the latter.
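For readers unfamiliar with the baseline being generalized, the sketch below shows the p-value computed by a standard inductive conformal predictor. It does not implement the paper's inductive randomness predictors; the function name `icp_p_value` and the convention that larger nonconformity scores mean stranger examples are assumptions.

```python
import numpy as np


def icp_p_value(calibration_scores, test_score):
    """Inductive conformal p-value for one candidate label.

    calibration_scores: nonconformity scores of held-out calibration examples.
    test_score: nonconformity score of the test object with the candidate label.
    Under the randomness (IID) assumption, the p-value of the true label is
    at most eps with probability at most eps.
    """
    scores = np.asarray(calibration_scores, dtype=float)
    at_least_as_strange = int(np.sum(scores >= test_score))
    # The "+1" terms count the test example itself.
    return (at_least_as_strange + 1) / (len(scores) + 1)
```

A prediction set at significance level `eps` then keeps every candidate label whose p-value exceeds `eps`.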



Article 724

Title@2025-07-07 (1): Deep Learning to Automate Parameter Extraction and Model Fitting of Two-Dimensional Transistors

Title: Deep Learning to Automate Parameter Extraction and Model Fitting of Two-Dimensional Transistors Deep Learning zur Automatisierung der Parameterextraktion und Modellmontage von zweidimensionalen Transistoren 深入学习自动取用参数自动提取和两维晶体管模型 2507.05134v1

Authors (11): Robert K. A. Bennett, Jan-Lucas Uslu, Harmon F. Gault, Asir Intisar Khan, Lauren Hoang, Tara Peña, Kathryn Neilson, Young Suh Song, Zhepeng Zhang, Andrew J. Mannix, Eric Pop

We present a deep learning approach to extract physical parameters (e.g., mobility, Schottky contact barrier height, defect profiles) of two-dimensional (2D) transistors from electrical measurements, enabling automated parameter extraction and technology computer-aided design (TCAD) fitting. To facilitate this task, we implement a simple data augmentation and pre-training approach by training a secondary neural network to approximate a physics-based device simulator. This method enables high-quality fits after training the neural network on electrical data generated from physics-based simulations of ~500 devices, a factor >40$\times$ fewer than other recent efforts. Consequently, fitting can be achieved by training on physically rigorous TCAD models, including complex geometry, self-consistent transport, and electrostatic effects, and is not limited to computationally inexpensive compact models. We apply our approach to reverse-engineer key parameters from experimental monolayer WS$_2$ transistors, achieving a median coefficient of determination ($R^2$) = 0.99 when fitting measured electrical data. We also demonstrate that this approach generalizes and scales well by reverse-engineering electrical data on high-electron-mobility transistors while fitting 35 parameters simultaneously. To facilitate future research on deep learning approaches for inverse transistor design, we have published our code and sample data sets online.



Article 725

Title@2025-07-07 (1): Federated Learning for Big Data: A Survey on Opportunities, Applications, and Future Directions

Title: Federated Learning for Big Data: A Survey on Opportunities, Applications, and Future Directions Federated Learning for Big Data: Eine Umfrage zu Chancen, Anwendungen und Zukunftsrichtungen 大数据联邦学习:机会、应用和未来方向调查 2110.04160v3

Authors (9): Thippa Reddy Gadekallu, Quoc-Viet Pham, Thien Huynh-The, Hailin Feng, Kai Fang, Sharnil Pandya, Madhusanka Liyanage, Wei Wang, Thanh Thi Nguyen

In recent years, the generation of data has escalated to extensive dimensions, and big data has emerged as a propelling force in the development of various machine learning advances and internet-of-things (IoT) devices. In this regard, the analytical and learning tools that transport data from several sources to a central cloud for processing, training, and storage enable realization of the potential of big data. Nevertheless, since the data may contain sensitive information such as bank account information, government information, and personal information, these traditional techniques often raise serious privacy concerns. To overcome such challenges, Federated Learning (FL) has emerged as a sub-field of machine learning that focuses on scenarios where several entities (commonly termed clients) work together to train a model while maintaining the decentralisation of their data. Although enormous efforts have been channeled into such studies, there still exists a gap in the literature: an extensive review of FL in the realm of big data services is lacking. The present paper thus emphasizes the use of FL in handling big data and related services, encompassing a comprehensive review of the potential of FL in big data acquisition, storage, big data analytics, and privacy preservation. Subsequently, the potential of FL in big data applications, such as smart city, smart healthcare, smart transportation, smart grid, and social media, is also explored. The paper also highlights various projects pertaining to FL-big data and discusses the challenges associated with such implementations. This acts as a direction for further research, encouraging the development of plausible solutions.



Article 726

Title@2025-07-07 (1): SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction

Title: SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction SMART: Simulierte Schüler ausgerichtet mit Item Response Theorie für Frage Schwierigkeitsvorhersage SMART: 与 “ 问题困难预测 “ 项目反应理论一致的模拟学生 2507.05129v1

Authors (5): Alexander Scarlatos, Nigel Fernandez, Christopher Ormerod, Susan Lottridge, Andrew Lan

Item (question) difficulties play a crucial role in educational assessments, enabling accurate and efficient assessment of student abilities and personalization to maximize learning outcomes. Traditionally, estimating item difficulties can be costly, requiring real students to respond to items, followed by fitting an item response theory (IRT) model to get item difficulty estimates. This approach cannot be applied to the cold-start setting for previously unseen items either. In this work, we present SMART (Simulated Students Aligned with IRT), a novel method for aligning simulated students with instructed ability, which can then be used in simulations to predict the difficulty of open-ended items. We achieve this alignment using direct preference optimization (DPO), where we form preference pairs based on how likely responses are under a ground-truth IRT model. We perform a simulation by generating thousands of responses, evaluating them with an LLM-based scoring model, and fitting the resulting data to an IRT model to obtain item difficulty estimates. Through extensive experiments on a real-world student response dataset, we show that SMART outperforms other item difficulty prediction methods by leveraging its improved ability alignment.
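To illustrate what an IRT model ties together, the sketch below implements the simplest variant, the one-parameter (Rasch) model, in which the probability of a correct response depends only on the gap between student ability and item difficulty. The abstract does not say which IRT variant is used, so the model choice and the names `rasch_prob`, `theta`, and `b` are assumptions.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the 1PL (Rasch) IRT model.

    theta: student ability (assumed symbol).
    b: item difficulty (assumed symbol), on the same scale as theta.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

Fitting such a model to simulated responses yields an estimate of `b` for each item, which corresponds to the difficulty-prediction step described above; scored open-ended responses would typically call for a polytomous extension rather than this binary form.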



Article 727

Title@2025-07-07 (1): A Concise Lyapunov Analysis of Nesterov’s Accelerated Gradient Method

Title: A Concise Lyapunov Analysis of Nesterov’s Accelerated Gradient Method Eine kurze Lyapunov-Analyse der beschleunigten Gradientenmethode von Nesterov Nesterov加速梯度方法的简明Lyapunov分析 2502.17373v3

Authors (1): Jun Liu

Convergence analysis of Nesterov’s accelerated gradient method has attracted significant attention over the past decades. While extensive work has explored its theoretical properties and elucidated the intuition behind its acceleration, a simple and direct proof of its convergence rates is still lacking. We provide a concise Lyapunov analysis of the convergence rates of Nesterov’s accelerated gradient method for both general convex and strongly convex functions.
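The method under analysis can be sketched in a few lines. The version below is one common form for $L$-smooth convex objectives, with step size $1/L$ and momentum weight $(k-1)/(k+2)$; the paper may analyze a different but related parameterization, and the function name and arguments are assumptions.

```python
import numpy as np


def nesterov_agd(grad, x0, smoothness, num_iters):
    """Nesterov's accelerated gradient method for an L-smooth convex function.

    grad: callable returning the gradient at a point.
    smoothness: Lipschitz constant L of the gradient (step size is 1 / L).
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()  # extrapolated ("look-ahead") point
    for k in range(1, num_iters + 1):
        x_next = y - grad(y) / smoothness  # gradient step from the look-ahead point
        y = x_next + (k - 1) / (k + 2) * (x_next - x)  # momentum extrapolation
        x = x_next
    return x
```

For example, on the quadratic $f(x) = \tfrac{1}{2} x^\top A x$ with $A = \mathrm{diag}(1, 10)$, one would pass `grad = lambda x: A @ x` and `smoothness = 10.0`. A Lyapunov analysis of the kind the abstract describes bounds $f(x_k) - f^\star$ by tracking a potential function that does not increase along these iterates.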



Article 728

Title@2025-07-07 (1): LVM4CSI: Enabling Direct Application of Pre-Trained Large Vision Models for Wireless Channel Tasks

Title: LVM4CSI: Enabling Direct Application of Pre-Trained Large Vision Models for Wireless Channel Tasks LVM4CSI: Direkte Anwendung von vortrainierten Large Vision-Modellen für Wireless Channel-Aufgaben LVM4CSI:无线频道任务能够直接应用培训前大型愿景模型 2507.05121v1

Authors (5): Jiajia Guo, Peiwen Jiang, Chao-Kai Wen, Shi Jin, Jun Zhang

Accurate channel state information (CSI) is critical to the performance of wireless communication systems, especially with the increasing scale and complexity introduced by 5G and future 6G technologies. While artificial intelligence (AI) offers a promising approach to CSI acquisition and utilization, existing methods largely depend on task-specific neural networks (NNs) that require expert-driven design and large training datasets, limiting their generalizability and practicality. To address these challenges, we propose LVM4CSI, a general and efficient framework that leverages the structural similarity between CSI and computer vision (CV) data to directly apply large vision models (LVMs) pre-trained on extensive CV datasets to wireless tasks without any fine-tuning, in contrast to large language model-based methods that generally necessitate fine-tuning. LVM4CSI maps CSI tasks to analogous CV tasks, transforms complex-valued CSI into visual formats compatible with LVMs, and integrates lightweight trainable layers to adapt extracted features to specific communication objectives. We validate LVM4CSI through three representative case studies, including channel estimation, human activity recognition, and user localization. Results demonstrate that LVM4CSI achieves comparable or superior performance to task-specific NNs, including an improvement exceeding 9.61 dB in channel estimation and approximately 40% reduction in localization error. Furthermore, it significantly reduces the number of trainable parameters and eliminates the need for task-specific NN design.



Article 729

Title@2025-07-07 (1): A Comparative Study of Machine Learning Algorithms for Stock Price Prediction Using Insider Trading Data

Title: A Comparative Study of Machine Learning Algorithms for Stock Price Prediction Using Insider Trading Data Eine vergleichende Studie des maschinellen Lernens Algorithmen für Aktienkursvorhersage mit Insider Trading-Daten 利用内部贸易数据进行股票价格预测的机器学习算法比较研究 2502.08728v2

Authors (2): Amitabh Chakravorty, Nelly Elsayed

The research paper empirically investigates several machine learning algorithms to forecast stock prices based on insider trading information. Insider trading offers special insights into market sentiment, pointing to upcoming changes in stock prices. This study examines the effectiveness of algorithms like decision trees, random forests, support vector machines (SVM) with different kernels, and K-Means Clustering using a dataset of Tesla stock transactions. Examining past data from April 2020 to March 2023, this study focuses on how well these algorithms identify trends and forecast stock price fluctuations. The paper uses Recursive Feature Elimination (RFE) and feature importance analysis to optimize the feature set and, hence, increase prediction accuracy. While it requires substantially greater processing time than other models, SVM with the Radial Basis Function (RBF) kernel displays the best accuracy. This paper highlights the trade-offs between accuracy and efficiency in machine learning models and proposes the possibility of pooling multiple data sources to raise prediction performance. The results of this paper aim to help financial analysts and investors in choosing strong algorithms to optimize investment strategies.



Article 730

Title@2025-07-07 (1): Fast online node labeling with graph subsampling

Title: Fast online node labeling with graph subsampling Schnelle Online-Node-Beschriftung mit Graphen-Subsampling 带有图形子抽样的快速在线节点标签 2503.16755v2

Authors (4): Yushen Huang, Ertai Luo, Reza Babenezhad, Yifan Sun

Large data applications rely on storing data in massive, sparse graphs with millions to trillions of nodes. Graph-based methods, such as node prediction, aim for computational efficiency regardless of graph size. Techniques like localized approximate personalized page rank (APPR) solve sparse linear systems with complexity that is independent of graph size but depends on the maximum node degree, which can be much larger in practice than the average node degree for real-world large graphs. In this paper, we consider an \emph{online subsampled APPR method}, where messages are intentionally dropped at random. We use tools from graph sparsifiers and matrix linear algebra to give approximation bounds on the graph’s spectral properties ($O(1/\epsilon^2)$ edges), and node classification performance (added $O(n\epsilon)$ overhead).



Article 731

Title@2025-07-07 (1): CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

Title: CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation CLIP-geführte Backdoor-Verteidigung durch Entropie-basierte vergiftete Datensatztrennung CLIP-通过基于英基中毒数据集的分离来引导后门防御 2507.05113v1

Authors (5): Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang

Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant a backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. To address these limitations, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model with these inputs, using CLIP’s logits as guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing defenses. Additionally, we show that clean-data-based defenses can be adapted to poisoned data using CGD. Also, CGD exhibits strong robustness, maintaining low ASRs even when employing a weaker CLIP model or when CLIP itself is compromised by a backdoor. These findings underscore CGD’s exceptional efficiency, effectiveness, and applicability for real-world backdoor defense scenarios. Code: https://github.com/binyxu/CGD.



Article 732

Title@2025-07-07 (1): DICE: Discrete inverse continuity equation for learning population dynamics

Title: DICE: Discrete inverse continuity equation for learning population dynamics DICE: Diskrete inverse Kontinuitätsgleichung für das Lernen von Populationsdynamik DICIC: 学习人口动态的分辨反反连续性方程 2507.05107v1

Authors (4): Tobias Blickhan, Jules Berman, Andrew Stuart, Benjamin Peherstorfer

We introduce the Discrete Inverse Continuity Equation (DICE) method, a generative modeling approach that learns the evolution of a stochastic process from given sample populations at a finite number of time points. Models learned with DICE capture the typically smooth and well-behaved population dynamics, rather than the dynamics of individual sample trajectories that can exhibit complex or even chaotic behavior. The DICE loss function is developed specifically to be invariant, even in discrete time, to spatially constant but time-varying spurious constants that can emerge during training; this invariance increases training stability and robustness. Generating a trajectory of sample populations with DICE is fast because samples evolve directly in the time interval over which the stochastic process is formulated, in contrast to approaches that condition on time and then require multiple sampling steps per time step. DICE is stable to train in situations where other methods for learning population dynamics fail, and DICE generates representative samples at orders of magnitude lower cost than methods that have to condition on time. Numerical experiments on a wide range of problems, from random waves and Vlasov-Poisson instabilities to high-dimensional chaos, are included to justify these assertions.



Article 733

Title@2025-07-07 (1): PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs

Title: PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs PRING: Protein-Protein-Interaktionsvorhersage von Paaren zu Graphen neu denken 重新思考从对到图的蛋白质-蛋白质-蛋白质互动预测 2507.05101v1

Authors (12): Xinzhe Zheng, Hao Du, Fanding Xu, Jinzhe Li, Zhiyuan Liu, Wenkang Wang, Tao Chen, Wanli Ouyang, Stan Z. Li, Yan Lu, Nanqing Dong, Yang Zhang

Deep learning-based computational methods have achieved promising results in predicting protein-protein interactions (PPIs). However, existing benchmarks predominantly focus on isolated pairwise evaluations, overlooking a model’s capability to reconstruct biologically meaningful PPI networks, which is crucial for biology research. To address this gap, we introduce PRING, the first comprehensive benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions, with well-designed strategies to address both data redundancy and leakage. Building on this gold-standard dataset, we establish two complementary evaluation paradigms: (1) topology-oriented tasks, which assess intra- and cross-species PPI network construction, and (2) function-oriented tasks, including protein complex pathway prediction, GO module analysis, and essential protein justification. These evaluations not only reflect the model’s capability to understand the network topology but also facilitate protein function annotation, biological module detection, and even disease mechanism analysis. Extensive experiments on four representative model categories, consisting of sequence similarity-based, naive sequence-based, protein language model-based, and structure-based approaches, demonstrate that current PPI models have potential limitations in recovering both structural and functional properties of PPI networks, highlighting the gap in supporting real-world biological applications. We believe PRING provides a reliable platform to guide the development of more effective PPI prediction models for the community. The dataset and source code of PRING are available at https://github.com/SophieSarceau/PRING.



Article 734

Title@2025-07-07 (1): Beyond Features: How Dataset Design Influences Multi-Agent Trajectory Prediction Performance

Title: Beyond Features: How Dataset Design Influences Multi-Agent Trajectory Prediction Performance Beyond Features: Wie Dataset Design die Leistung von Multi-Agent Trajektorien beeinflusst 超越地貌外的地貌:数据集设计如何影响多要素轨迹预测性能 2507.05098v1

Authors (6): Tobias Demmler, Jakob Häringer, Andreas Tamke, Thao Dang, Alexander Hegai, Lars Mikelsons

Accurate trajectory prediction is critical for safe autonomous navigation, yet the impact of dataset design on model performance remains understudied. This work systematically examines how feature selection, cross-dataset transfer, and geographic diversity influence trajectory prediction accuracy in multi-agent settings. We evaluate a state-of-the-art model using our novel L4 Motion Forecasting dataset, which is based on our own data recordings in Germany and the US and includes enhanced map and agent features. We compare our dataset to the US-centric Argoverse 2 benchmark. First, we find that incorporating supplementary map and agent features unique to our dataset yields no measurable improvement over baseline features, demonstrating that modern architectures do not need extensive feature sets for optimal performance. The limited features of public datasets are sufficient to capture convoluted interactions without added complexity. Second, we perform cross-dataset experiments to evaluate how effectively domain knowledge can be transferred between datasets. Third, we group our dataset by country and check the knowledge transfer between different driving cultures.



Article 735

Title@2025-07-07 (1): Exploring Semantic Clustering and Similarity Search for Heterogeneous Traffic Scenario Graph

Title: Exploring Semantic Clustering and Similarity Search for Heterogeneous Traffic Scenario Graph Erforschung semantischer Clusterung und Ähnlichkeit Suche nach heterogenen Verkehrsszenarien Graph 探索迭代交通情况图 2507.05086v1

Authors (4): Ferdinand Mütsch, Maximilian Zipfl, Nikolai Polley, J. Marius Zöllner

Scenario-based testing is an indispensable instrument for the comprehensive validation and verification of automated vehicles (AVs). However, finding a manageable and finite, yet representative subset of scenarios in a scalable, possibly unsupervised manner is notoriously challenging. Our work is meant to constitute a cornerstone to facilitate sample-efficient testing, while still capturing the diversity of relevant operational design domains (ODDs) and accounting for the “long tail” phenomenon in particular. To this end, we first propose an expressive and flexible heterogeneous, spatio-temporal graph model for representing traffic scenarios. Leveraging recent advances of graph neural networks (GNNs), we then propose a self-supervised method to learn a universal embedding space for scenario graphs that enables clustering and similarity search. In particular, we implement contrastive learning alongside a bootstrapping-based approach and evaluate their suitability for partitioning the scenario space. Experiments on the nuPlan dataset confirm the model’s ability to capture semantics and thus group related scenarios in a meaningful way despite the absence of discrete class labels. Different scenario types materialize as distinct clusters. Our results demonstrate how variable-length traffic scenarios can be condensed into single vector representations that enable nearest-neighbor retrieval of representative candidates for distinct scenario categories. Notably, this is achieved without manual labeling or bias towards an explicit objective such as criticality. Ultimately, our approach can serve as a basis for scalable selection of scenarios to further enhance the efficiency and robustness of testing AVs in simulation.


Article 736

Title@2025-07-07 (1): Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

Title: Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks Verteilungsabhängige Generalisierungs-Bounds für das Tuning linearer Regressions-Across-Aufgaben 调制线性递退跨任务时的视分配情况而定的通用宽度 2507.05084v1

Authors (3): Maria-Florina Balcan, Saumya Goyal, Dravyansh Sharma

Modern regression problems often involve high-dimensional data and a careful tuning of the regularization hyperparameters is crucial to avoid overly complex models that may overfit the training data while guaranteeing desirable properties like effective variable selection. We study the recently introduced direction of tuning regularization hyperparameters in linear regression across multiple related tasks. We obtain distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients, including ridge, lasso and the elastic net. In contrast, prior work develops bounds that apply uniformly to all distributions, but such bounds necessarily degrade with feature dimension, d. While these bounds are shown to be tight for worst-case distributions, our bounds improve with the “niceness” of the data distribution. Concretely, we show that under additional assumptions that instances within each task are i.i.d. draws from broad well-studied classes of distributions including sub-Gaussians, our generalization bounds do not get worse with increasing d, and are much sharper than prior work for very large d. We also extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account an estimate of the mean of the ground truth distribution.
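
As a rough illustration of the setting (tuning one regularization coefficient against validation loss averaged over several related tasks), the sketch below uses closed-form ridge regression on synthetic Gaussian data. The task generator, grid values and function names are assumptions made for this example; the paper's bounds and its lasso and elastic-net cases are not reproduced here.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def avg_validation_loss(tasks, lam):
    """Mean squared validation error of ridge(lam), averaged over tasks."""
    losses = []
    for X_tr, y_tr, X_val, y_val in tasks:
        w = ridge_fit(X_tr, y_tr, lam)
        losses.append(np.mean((X_val @ w - y_val) ** 2))
    return float(np.mean(losses))

def make_task(rng, n=30, d=20, noise=0.5):
    """Hypothetical synthetic task: Gaussian features, linear ground truth."""
    w_true = rng.normal(size=d)
    X = rng.normal(size=(2 * n, d))
    y = X @ w_true + noise * rng.normal(size=2 * n)
    # First half is the training split, second half the validation split.
    return X[:n], y[:n], X[n:], y[n:]

rng = np.random.default_rng(0)
tasks = [make_task(rng) for _ in range(10)]
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
losses = {lam: avg_validation_loss(tasks, lam) for lam in grid}
best_lam = min(losses, key=losses.get)
```

The selected `best_lam` is the grid value with the lowest average validation loss; the generalization question studied in the abstract is how far this empirical average can be from its expectation over new tasks.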


Article 737

Title@2025-07-07 (1): Mirror Online Conformal Prediction with Intermittent Feedback

Title: Mirror Online Conformal Prediction with Intermittent Feedback Spiegel Online Konforme Vorhersage mit intermittierendem Feedback 附带间断反馈的 “ 镜 “ 在线非正式预测 2503.10345v5

Authors (3): Bowen Wang, Matteo Zecchin, Osvaldo Simeone

Online conformal prediction enables the runtime calibration of a pre-trained artificial intelligence model using feedback on its performance. Calibration is achieved through set predictions that are updated via online rules so as to ensure long-term coverage guarantees. While recent research has demonstrated the benefits of incorporating prior knowledge into the calibration process, this has come at the cost of replacing coverage guarantees with less tangible regret guarantees based on the quantile loss. This work introduces intermittent mirror online conformal prediction (IM-OCP), a novel runtime calibration framework that integrates prior knowledge, operates under potentially intermittent feedback, and features minimal memory complexity. IM-OCP guarantees long-term coverage and sub-linear regret, both of which hold deterministically for any given data sequence and in expectation with respect to the intermittent feedback.
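
For orientation, a generic online conformal update with intermittent feedback might look like the sketch below. It is not IM-OCP itself: it assumes a scalar threshold `q` on a nonconformity score, a plain gradient step on the quantile loss, feedback that arrives independently with probability `p_feedback`, and an importance weight `1 / p_feedback` to keep the update unbiased in expectation. All names and constants are illustrative.

```python
import numpy as np

def online_conformal(scores, alpha=0.1, eta=0.05, p_feedback=0.5, q0=0.5, seed=0):
    """Track a score threshold q so that long-run miscoverage approaches alpha."""
    rng = np.random.default_rng(seed)
    q = q0
    errors = []
    for s in scores:
        # The prediction set contains every label with score <= q,
        # so an error occurs when the true label's score exceeds q.
        err = float(s > q)
        errors.append(err)
        # Feedback is observed only with probability p_feedback.
        if rng.random() < p_feedback:
            # Importance weighting by 1 / p_feedback (an assumption of this sketch).
            q += (eta / p_feedback) * (err - alpha)
    return q, float(np.mean(errors))

# Synthetic nonconformity scores of the true labels (assumption: uniform on [0, 1]).
scores = np.random.default_rng(1).uniform(size=5000)
final_q, miscoverage = online_conformal(scores)
```

On this synthetic stream the empirical miscoverage settles near the target `alpha`, even though only about half of the rounds trigger an update.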


Article 738

Title@2025-07-07 (1): A dimensionality reduction technique based on the Gromov-Wasserstein distance

Title: A dimensionality reduction technique based on the Gromov-Wasserstein distance Eine Dimensionalitätsreduktionstechnik auf Basis der Gromov-Wasserstein-Distanz 基于Gromov-Wasserstein距离的降低维度技术 2501.13732v2

Authors (3): Rafael P. Eufrazio, Eduardo Fernandes Montesuma, Charles C. Cavalcante

Analyzing relationships between objects is a pivotal problem within data science. In this context, dimensionality reduction (DR) techniques are employed to generate smaller and more manageable data representations. This paper proposes a new method for dimensionality reduction, based on optimal transportation theory and the Gromov-Wasserstein distance. We offer a new probabilistic view of the classical Multidimensional Scaling (MDS) algorithm and of Isomap (Isometric Mapping or Isometric Feature Mapping), the nonlinear dimensionality reduction algorithm that extends classical MDS, in which we use the Gromov-Wasserstein distance between the probability measure of the high-dimensional data and that of its low-dimensional representation. Through gradient descent, our method embeds high-dimensional data into a lower-dimensional space, providing a robust and efficient solution for analyzing complex high-dimensional datasets.
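
As background for the classical MDS baseline that the abstract reinterprets, here is a minimal NumPy sketch of classical MDS via double centering and an eigendecomposition. It shows only the standard algorithm; the Gromov-Wasserstein objective and the gradient-descent embedding of the paper are not implemented, and the toy points are an assumption.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed points in k dimensions from a matrix D of pairwise distances."""
    n = D.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the k largest eigenpairs; clip tiny negative eigenvalues to zero.
    idx = np.argsort(eigvals)[::-1][:k]
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return eigvecs[:, idx] * scale

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

Because the toy points already live in two dimensions, the pairwise distances of the embedding `Y` match the input distances up to numerical error (the embedding itself is only determined up to rotation and reflection).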


Article 739

Title@2025-07-07 (1): Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference

Title: Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference Computation-Aware Gaussian Prozesse: Modellauswahl und lineare Zeitableitung 计算-软件高斯过程:模型选择和线性时间推算 2411.01036v2

Authors (6): Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R. Gardner, Geoff Pleiss, John P. Cunningham

Model selection in Gaussian processes scales prohibitively with the size of the training dataset, both in time and memory. While many approximations exist, all incur inevitable approximation error. Recent work accounts for this error in the form of computational uncertainty, which enables – at the cost of quadratic complexity – an explicit tradeoff between computation and precision. Here we extend this development to model selection, which requires significant enhancements to the existing approach, including linear-time scaling in the size of the dataset. We propose a novel training loss for hyperparameter optimization and demonstrate empirically that the resulting method can outperform SGPR, CGGP and SVGP, state-of-the-art methods for GP model selection, on medium to large-scale datasets. Our experiments show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty – a fundamental prerequisite for optimal decision-making.


Article 740

Title@2025-07-07 (1): Replacing thinking with tool usage enables reasoning in small language models

Title: Replacing thinking with tool usage enables reasoning in small language models Das Ersetzen des Denkens durch den Werkzeuggebrauch ermöglicht die Argumentation in kleinen Sprachmodellen 以使用工具取代思维,使小语言模式推理成为可能 2507.05065v1

Authors (3): Corrado Rainone, Tim Bakker, Roland Memisevic

Recent advances have established a new machine learning paradigm based on scaling up compute at inference time as well as at training time. In that line of work, a combination of Supervised Fine-Tuning (SFT) on synthetic demonstrations and Reinforcement Learning with Verifiable Rewards (RLVR) is used for training Large Language Models to expend extra compute during inference in the form of “thoughts” expressed in natural language. In this paper, we propose to instead format these tokens as a multi-turn interaction trace with a stateful tool. At each turn, the new state of the tool is appended to the context of the model, whose job is to generate the tokens necessary to control the tool via a custom DSL. We benchmark this approach on the problem of repairing malfunctioning Python code, and show that this constrained setup allows for faster sampling of experience and a denser reward signal, allowing even models of size up to 3B parameters to learn how to proficiently expend additional compute on the task.


Article 741

Title@2025-07-07 (1): Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

Title: Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes Vecchia-Inducing-Points Voll-Scale-Annäherungen für Gauß-Prozesse 高斯进程的全面接近度 2507.05064v1

Authors (3): Tim Gyger, Reinhard Furrer, Fabio Sigrist

Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that reduce computational costs for training and prediction by several orders of magnitude compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.


Article 742

Title@2025-07-07 (1): AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics

Title: AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics KI-getriebene Zytomorphologie Bildsynthese für medizinische Diagnostik AI-Driven 用于医学诊断的细胞形态图像合成 2507.05063v1

Authors (3): Jan Carreras Boada, Rao Muhammad Umer, Carsten Marr

Biomedical datasets often contain a large sample imbalance and are subject to strict privacy constraints, which together hinder the development of accurate machine learning models. One potential solution is to generate synthetic images, as this can improve data availability while preserving patient privacy. However, it remains difficult to generate synthetic images of sufficient quality for training robust classifiers. In this work, we focus on the classification of single white blood cells, a key component in the diagnosis of hematological diseases such as acute myeloid leukemia (AML), a severe blood cancer. We demonstrate how synthetic images, generated with a fine-tuned stable diffusion model using LoRA weights and guided by real few-shot samples of the target white blood cell classes, can enhance classifier performance when data are limited. When training a ResNet classifier, accuracy increased from 27.3% to 78.4% (+51.1 percentage points) by adding 5000 synthetic images per class to a small and highly imbalanced real dataset. For a CLIP-based classifier, the accuracy improved from 61.8% to 76.8% (+15.0 percentage points). The synthetic images are highly similar to real images, and they can help overcome dataset limitations, enhancing model generalization. Our results establish synthetic images as a tool in biomedical research, improving machine learning models, and facilitating medical diagnosis and research.


Article 743

Title@2025-07-07 (1): Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

Title: Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems Follow-the-Perturbed-Leader nähert sich Best-of-Both-Worlds für die m-Set Semi-Bandit-Probleme M-Set半银行问题最佳世界最佳办法 2504.07307v4

Authors (4): Jingxin Zhan, Yuchen Xin, Chenjie Sun, Zhihua Zhang

We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner selects exactly $m$ arms from the total $d$ arms. In the adversarial setting, the best regret bound, known to be $\mathcal{O}(\sqrt{nmd})$ for time horizon $n$, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, this requires explicitly computing the arm-selection probabilities by solving an optimization problem at each time step and sampling according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the $m$ arms that rank among the $m$ smallest (estimated) losses with random perturbation. In this paper, we show that FTPL with a Fréchet perturbation also enjoys the near-optimal regret bound $\mathcal{O}(\sqrt{nm}(\sqrt{d\log(d)}+m^{5/6}))$ in the adversarial setting and approaches best-of-both-worlds regret bounds, i.e., achieves a logarithmic regret for the stochastic setting. Moreover, our lower bounds show that the extra factors are unavoidable with our approach; any improvement would require a fundamentally different and more challenging method.
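
The arm-selection step of FTPL described above can be sketched as follows. The sketch covers only the selection rule (pull the $m$ arms with the smallest perturbed estimated losses); the loss estimator and learning-rate schedule of the paper are omitted. The Fréchet shape parameter of 2, the scaling of the noise by `1 / eta`, and all function names are assumptions of this example.

```python
import numpy as np

def frechet_perturbation(rng, size, shape=2.0):
    """Sample Frechet(shape) noise by inverting F(x) = exp(-x^(-shape))."""
    # Draw u in (0, 1) and solve F(x) = u for x.
    u = rng.uniform(np.finfo(float).tiny, 1.0, size=size)
    return (-np.log(u)) ** (-1.0 / shape)

def ftpl_select(est_losses, m, eta, rng, shape=2.0):
    """Pick the m arms with the smallest perturbed estimated cumulative loss."""
    r = frechet_perturbation(rng, est_losses.shape, shape)
    # Subtracting the noise lets a large perturbation make an arm look better.
    perturbed = est_losses - r / eta
    return np.argsort(perturbed)[:m]

rng = np.random.default_rng(0)
est_losses = np.array([5.0, 1.0, 3.0, 0.5, 4.0])
# With a very large eta the perturbation is negligible (greedy choice).
arms_greedy = ftpl_select(est_losses, m=2, eta=1e6, rng=rng)
# With a moderate eta the choice is randomized.
arms_random = ftpl_select(est_losses, m=2, eta=1.0, rng=rng)
```

No optimization problem is solved and no selection probabilities are computed explicitly: one perturbation draw and a partial sort suffice per round, which is the computational advantage over FTRL noted in the abstract.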


Article 744

Title@2025-07-07 (1): Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

Title: Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning Graph-Assistente Stiche für Offline-Hierarchisches Verstärkungslernen 离线高层强化学习的图表辅助细化 2506.07744v3

Authors (5): Seungho Baek, Taegeon Park, Jongchan Park, Seungjun Oh, Yusung Kim

Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
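
The idea of choosing subgoals by graph search instead of a learned high-level policy can be illustrated with a plain Dijkstra search. The sketch assumes the graph has already been built (nodes standing for clustered states, edge costs standing for temporal distances); the node names, costs and the function `shortest_subgoal_path` are hypothetical, and the TDR embedding and TE filtering from the paper are not modeled.

```python
import heapq

def shortest_subgoal_path(graph, start, goal):
    """Dijkstra search; returns the node sequence from start to goal, or None."""
    dist = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        # Skip stale heap entries that were superseded by a shorter path.
        if d > dist.get(node, float("inf")):
            continue
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                parent[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    # Walk parent pointers back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

# Toy graph (assumption): adjacency lists of (neighbor, temporal cost).
graph = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 1.0), ("g", 5.0)],
    "b": [("g", 1.0)],
}
path = shortest_subgoal_path(graph, "s", "g")  # ["s", "a", "b", "g"]
```

Each intermediate node on the returned path would serve as a subgoal for the low-level policy to reach in turn.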


Article 745

Title@2025-07-07 (1): Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Title: Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Warum kämpfen Open Source LLMs mit Datenanalyse? Eine systematische empirische Studie 开放源码LLMs为何要与数据分析斗争?系统的经验研究 2506.19794v2

Authors (10): Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang

Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs. By curating a seed dataset of diverse, realistic scenarios, we evaluate models across three dimensions: data understanding, code generation, and strategic planning. Our analysis reveals three key findings: (1) Strategic planning quality serves as the primary determinant of model performance; (2) Interaction design and task complexity significantly influence reasoning capabilities; (3) Data quality demonstrates a greater impact than diversity in achieving optimal performance. We leverage these insights to develop a data synthesis methodology, demonstrating significant improvements in open-source LLMs’ analytical reasoning capabilities.


Article 746

Title@2025-07-07 (1): Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens

Title: Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens Jenseits von Skalierungskurven: Interne Dynamik neuraler Netzwerke durch die NTK-Lens 超越缩放曲线:通过NTK镜头神经网络的内部动态 2507.05035v1

Authors (4): Konstantin Nikolaou, Sven Krippendorf, Samuel Tovey, Christian Holm

Scaling laws offer valuable insights into the relationship between neural network performance and computational cost, yet their underlying mechanisms remain poorly understood. In this work, we empirically analyze how neural networks behave under data and model scaling through the lens of the neural tangent kernel (NTK). This analysis establishes a link between performance scaling and the internal dynamics of neural networks. Our findings on standard vision tasks show that similar performance scaling exponents can occur even though the internal model dynamics show opposite behavior. This demonstrates that performance scaling alone is insufficient for understanding the underlying mechanisms of neural networks. We also address a previously unresolved issue in neural scaling: how convergence to the infinite-width limit affects scaling behavior in finite-width models. To this end, we investigate how feature learning is lost as the model width increases and quantify the transition between kernel-driven and feature-driven scaling regimes. We identify the maximum model width that supports feature learning, which, in our setups, we find to be more than ten times smaller than typical large language model widths.


Article 747

Title@2025-07-07 (1): A Generative Diffusion Model for Amorphous Materials

Title: A Generative Diffusion Model for Amorphous Materials Ein Generatives Diffusionsmodell für amorphe Materialien 无定形材料发源扩散模型 2507.05024v1

Authors (2): Kai Yang, Daniel Schwalbe-Koda

Generative models show great promise for the inverse design of molecules and inorganic crystals, but remain largely ineffective within more complex structures such as amorphous materials. Here, we present a diffusion model that reliably generates amorphous structures up to 1000 times faster than conventional simulations across processing conditions, compositions, and data sources. Generated structures recovered the short- and medium-range order, sampling diversity, and macroscopic properties of silica glass, as validated by simulations and an information-theoretical strategy. Conditional generation allowed sampling large structures at low cooling rates of 10$^{-2}$ K/ps to uncover a ductile-to-brittle transition and mesoporous silica structures. Extension to metallic glassy systems accurately reproduced local structures and properties from both computational and experimental datasets, demonstrating how synthetic data can be generated from characterization results. Our methods provide a roadmap for the design and simulation of amorphous materials previously inaccessible to computational methods.


Article 748

Title@2025-07-07 (1): A Novel Automatic Real-time Motion Tracking Method in MRI-guided Radiotherapy Using Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation

Title: A Novel Automatic Real-time Motion Tracking Method in MRI-guided Radiotherapy Using Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation Eine neuartige automatische Echtzeit-Bewegungsverfolgungsmethode in der MRT-geführten Radiotherapie mit erweitertem Tracking-Learning-Detection-Framework mit automatischer Segmentierung 利用带有自动分割的强化跟踪-学习-检测框架,在MRI-制导放射治疗中使用新颖自动实时实时运动跟踪方法 2411.07503v3

Authors (9): Shengqi Chen, Zilin Wang, Jianrong Dai, Shirui Qin, Ying Cao, Ruiao Zhao, Jiayun Chen, Guohua Wu, Yuan Tang

Background and Purpose: Accurate motion tracking in MRI-guided Radiotherapy (MRIgRT) is essential for effective treatment delivery. This study aimed to enhance motion tracking precision in MRIgRT through an automatic real-time markerless tracking method using an enhanced Tracking-Learning-Detection (ETLD) framework with automatic segmentation. Materials and Methods: We developed a novel MRIgRT motion tracking and segmentation method by integrating the ETLD framework with an improved Chan-Vese model (ICV), named ETLD+ICV. The ETLD framework was upgraded for real-time cine MRI, including advanced image preprocessing, no-reference image quality assessment, an enhanced median-flow tracker, and a refined detector with dynamic search region adjustments. ICV was used for precise target volume coverage, refining the segmented region frame by frame using tracking results, with key parameters optimized. The method was tested on 3.5D MRI scans from 10 patients with liver metastases. Results: Evaluation of 106,000 frames across 77 treatment fractions showed sub-millimeter tracking errors of less than 0.8 mm, with over 99% precision and 98% recall for all subjects in the Beam Eye View (BEV)/Beam Path View (BPV) orientation. The ETLD+ICV method achieved a global Dice score of more than 82% for all subjects, demonstrating the method’s extensibility and precise target volume coverage. Conclusion: This study successfully developed an automatic real-time markerless motion tracking method for MRIgRT that significantly outperforms current methods. The novel method not only delivers exceptional precision in tracking and segmentation but also shows enhanced adaptability to clinical demands, making it an indispensable asset in improving the efficacy of radiotherapy treatments.


Article 749

Title@2025-07-07 (1): Meta-Learning Transformers to Improve In-Context Generalization

Title: Meta-Learning Transformers to Improve In-Context Generalization Meta-Learning Transformer zur Verbesserung der In-Context-Verallgemeinerung 利用元学习变换器改进内文通化 2507.05019v1

Authors (6): Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Nicola Conci

In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and pose privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an alternative training strategy where we leverage a collection of multiple, small-scale, and domain-specific datasets. We empirically demonstrate that the increased quality and diversity of such data improve the generalization abilities of in-context learners beyond their training domain, while achieving comparable performance with models trained on a single large-scale dataset. We investigate this paradigm by leveraging meta-learning to train an in-context learner on the Meta-Album collection under several settings. Firstly, we show the performance in a controlled environment, where the test domain is completely excluded from the training knowledge. Secondly, we explore the robustness of these models to forgetting in a continual scenario where the information is accessible for a limited time. Finally, we explore the more challenging unsupervised scenario. Our findings demonstrate that transformers still generalize for in-context prediction when trained on a curated dataset collection while offering advantages in modularity and replaceability.


Article 750

Title@2025-07-07 (1): Relative Overfitting and Accept-Reject Framework

Title: Relative Overfitting and Accept-Reject Framework Relativer Überbau und Akzeptanz-Abwehrrahmen 相对超配和接受-接受-拒绝框架 2505.07783v4

Authors (2): Yanxin Liu, Yunqi Zhang

The scaling of Large Language Models (LLMs) currently faces significant challenges. Model ensembling is widely considered a promising solution to break through these performance bottlenecks. However, current ensembling methods are primarily guided by the statistical expectation that combining multiple models over large samples will lead to performance gains. We propose an ensemble framework that transitions from such stochastic, sample-dependent methods to a regular, controllable approach based on fine-grained model segmentation. This regularity governs how models are segmented to ensure performance improvement, how the magnitude of this improvement varies with model selection, and what factors determine its theoretical maximum. To formalize this pattern, we introduce the concept of ‘relative overfitting,’ which is derived from the performance discrepancies between constituent models and builds a bridge between ensemble outcomes and the inherent attributes of these models. We detail the patterns of this framework within the domain of NLP and briefly describe its extensibility to other fields, such as computer vision (CV) and AI for science. Our approach was validated using both custom-built and pre-trained mainstream models across diverse benchmarks, including language modeling, long-context tasks, and question-answering (QA). The results indicate that the proposed ensemble rules are generally effective, and we provide a rigorous proof of these rules in certain experimental scenarios. The proposed framework offers a new perspective for understanding ensemble theory and provides a systematic approach to addressing the performance bottlenecks of LLMs.


Article 751

Title@2025-07-07 (1): Moderate Adaptive Linear Units (MoLU)

Title: Moderate Adaptive Linear Units (MoLU) Mäßige adaptive Lineareinheiten (MoLU) 适应性线性线性单位(MoLU) 2302.13696v6

Authors (3): Hankyul Koh, Joon-hyuk Ko, Wonho Jhe

We propose a new high-performance activation function, Moderate Adaptive Linear Units (MoLU), for deep neural networks. The MoLU is a simple, beautiful and powerful activation function that can be a good main activation function among hundreds of activation functions. Because the MoLU is made up of elementary functions, it is not only a diffeomorphism (i.e. analytic over the whole domain) but also reduces the training time.


Article 752

Title@2025-07-07 (1): Conditional Graph Neural Network for Predicting Soft Tissue Deformation and Forces

Title: Conditional Graph Neural Network for Predicting Soft Tissue Deformation and Forces Bedingtes Graphen-Neural-Netzwerk zur Vorhersage weicher Gewebedeformationen und Kräfte 预测软质变形和力量有条件图图神经网络 2507.05315v1

Authors (4): Madina Kojanazarova, Florentin Bieder, Robin Sandkühler, Philippe C. Cattin

Soft tissue simulation in virtual environments is becoming increasingly important for medical applications. However, the high deformability of soft tissue poses significant challenges. Existing methods rely on segmentation, meshing and estimation of stiffness properties of tissues. In addition, the integration of haptic feedback requires precise force estimation to enable a more immersive experience. We introduce a novel data-driven model, a conditional graph neural network (cGNN) to tackle this complexity. Our model takes surface points and the location of applied forces, and is specifically designed to predict the deformation of the points and the forces exerted on them. We trained our model on experimentally collected surface tracking data of a soft tissue phantom and used transfer learning to overcome the data scarcity by initially training it with mass-spring simulations and fine-tuning it with the experimental data. This approach improves the generalisation capability of the model and enables accurate predictions of tissue deformations and corresponding interaction forces. The results demonstrate that the model can predict deformations with a distance error of 0.35$\pm$0.03 mm for deformations up to 30 mm and the force with an absolute error of 0.37$\pm$0.05 N for forces up to 7.5 N. Our data-driven approach presents a promising solution to the intricate challenge of simulating soft tissues within virtual environments. Beyond its applicability in medical simulations, this approach holds the potential to benefit various fields where realistic soft tissue simulations are required.


Article 753

Title@2025-07-07 (1): Random weights of DNNs and emergence of fixed points

Title: Random weights of DNNs and emergence of fixed points Zufällige Gewichte von DNNs und Entstehung von festen Punkten DNN的随机权数和固定点的出现 2501.04182v2

Authors (3): L. Berlyand, O. Krupchytskyi, V. Slavin

This paper is concerned with a special class of deep neural networks (DNNs) where the input and the output vectors have the same dimension. Such DNNs are widely used in applications, e.g., autoencoders. The training of such networks can be characterized by their fixed points (FPs). We are concerned with the dependence of the number of FPs and their stability on the distribution of randomly initialized DNNs’ weight matrices. Specifically, we consider i.i.d. random weights with heavy- and light-tailed distributions. Our objectives are twofold: first, to study the dependence of the number and stability of FPs on the type of the distribution tail; second, to study the dependence of the number of FPs on the DNNs’ architecture. We perform extensive simulations and show that for light tails (e.g., Gaussian), which are typically used for initialization, a single stable FP exists for broad types of architectures. In contrast, for heavy-tailed distributions (e.g., Cauchy), which typically appear in trained DNNs, a number of FPs emerge. We further observe that these FPs are stable attractors and their basins of attraction partition the domain of input vectors. Finally, we observe an intriguing non-monotone dependence of the number of fixed points $Q(L)$ on the DNNs’ depth $L$. The above results were first obtained for untrained DNNs with two types of distributions at initialization and then verified by considering DNNs in which the heavy-tailed distributions arise in training.


Article 754

Title@2025-07-07 (1): Solar Flare Prediction Using LSTM and DLSTM with Sliding Window Pattern Recognition

Title: Solar Flare Prediction Using LSTM and DLSTM with Sliding Window Pattern Recognition Solarflare-Vorhersage mit LSTM und DLSTM mit Schiebefenstermustererkennung 使用 LSTM 和 DLSTM 和 DLSTM 进行太阳光电预测,并识别滑动窗口模式 2507.05313v1

Authors (3): Zeinab Hassani, Davud Mohammadpur, Hossein Safari

We investigate the use of Long Short-Term Memory (LSTM) and Decomposition-LSTM (DLSTM) networks, combined with an ensemble algorithm, to predict solar flare occurrences using time-series data from the GOES catalog. The dataset spans from 2003 to 2023 and includes 151,071 flare events. Among approximately possible patterns, 7,552 yearly pattern windows are identified, highlighting the challenge of long-term forecasting due to the Sun’s complex, self-organized criticality-driven behavior. A sliding window technique is employed to detect temporal quasi-patterns in both irregular and regularized flare time series. Regularization reduces complexity, enhances large flare activity, and captures active days more effectively. To address class imbalance, resampling methods are applied. LSTM and DLSTM models are trained on sequences of peak fluxes and waiting times from irregular time series, while LSTM and DLSTM, integrated with an ensemble approach, are applied to sliding windows of regularized time series with a 3-hour interval. Performance metrics, particularly TSS (0.74), recall (0.95) and the area under the curve (AUC=0.87) in the receiver operating characteristic (ROC), indicate that DLSTM with an ensemble approach on regularized time series outperforms other models, offering more accurate large-flare forecasts with fewer false errors compared to models trained on irregular time series. The superior performance of DLSTM is attributed to its ability to decompose time series into trend and seasonal components, effectively isolating random noise. This study underscores the potential of advanced machine learning techniques for solar flare prediction and highlights the importance of incorporating various solar cycle phases and resampling strategies to enhance forecasting reliability.
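
Two ingredients named in the abstract, sliding windows over a time series and the true skill statistic (TSS), have simple standard definitions that can be sketched directly. TSS is recall minus the false alarm rate, TP/(TP+FN) - FP/(FP+TN). The confusion-matrix counts below are hypothetical, chosen only to make the arithmetic easy to follow; they are not the paper's results, and the function names are illustrative.

```python
def sliding_windows(series, width, step=1):
    """Return all contiguous windows of the given width, moving by step."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

def skill_scores(tp, fn, fp, tn):
    """Recall and true skill statistic from confusion-matrix counts."""
    recall = tp / (tp + fn)            # fraction of flares that were predicted
    false_alarm_rate = fp / (fp + tn)  # fraction of quiet periods flagged as flares
    return {"recall": recall, "tss": recall - false_alarm_rate}

windows = sliding_windows([1, 2, 3, 4, 5], width=3)
# Hypothetical counts (assumption), not taken from the paper.
scores = skill_scores(tp=95, fn=5, fp=21, tn=79)
```

Unlike accuracy, TSS does not depend on the ratio of flaring to non-flaring samples, which is why it is a common choice for imbalanced forecasting problems such as this one.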


Article 755

Title@2025-07-07 (1): The Case for Instance-Optimized LLMs in OLAP Databases

Title: The Case for Instance-Optimized LLMs in OLAP Databases Der Fall der instance-optimierten LLM in OLAP-Datenbanken OLAP数据库中经适当改造的理 案LMLM 2507.04967v1

Authors (2): Bardia Mohammadi, Laurent Bindschaedler

Large Language Models (LLMs) can enhance analytics systems with powerful data summarization, cleaning, and semantic transformation capabilities. However, deploying LLMs at scale – processing millions to billions of rows – remains prohibitively expensive in computation and memory. We present IOLM-DB, a novel system that makes LLM-enhanced database queries practical through query-specific model optimization. Instead of using general-purpose LLMs, IOLM-DB generates lightweight, specialized models tailored to each query’s specific needs using representative data samples. IOLM-DB reduces model footprints by up to 76% and increases throughput by up to 3.31$\times$ while maintaining accuracy through aggressive compression techniques, including quantization, sparsification, and structural pruning. We further show how our approach enables higher parallelism on existing hardware and seamlessly supports caching and batching strategies to reduce overheads. Our prototype demonstrates that leveraging LLM queries inside analytics systems is feasible at scale, opening new possibilities for future OLAP applications.



Article 756

Title@2025-07-07 (1): Mask Approximation Net: A Novel Diffusion Model Approach for Remote Sensing Change Captioning

Title: Mask Approximation Net: A Novel Diffusion Model Approach for Remote Sensing Change Captioning Maskenannäherung Netz: Ein neuartiger Diffusions-Modellansatz zur Fernerkundung von Änderungsunterschriften 面罩接近网:遥感变化说明的新传播示范方法 2412.19179v3

Authors (6): Dongwei Sun, Jing Yao, Wu Xue, Changsheng Zhou, Pedram Ghamisi, Xiangyong Cao

Remote sensing image change description represents an innovative multimodal task within the realm of remote sensing processing. This task not only facilitates the detection of alterations in surface conditions, but also provides comprehensive descriptions of these changes, thereby improving human interpretability and interactivity. Current deep learning methods typically adopt a three-stage framework consisting of feature extraction, feature fusion, and change localization, followed by text generation. Most approaches focus heavily on designing complex network modules but lack solid theoretical guidance, relying instead on extensive empirical experimentation and iterative tuning of network components. This experience-driven design paradigm may lead to overfitting and design bottlenecks, thereby limiting the model’s generalizability and adaptability. To address these limitations, this paper proposes a paradigm shift towards data distribution learning using diffusion models, reinforced by frequency-domain noise filtering, to provide a theoretically motivated and practically effective solution to multimodal remote sensing change description. The proposed method primarily includes a simple multi-scale change detection module, whose output features are subsequently refined by a well-designed diffusion model. Furthermore, we introduce a frequency-guided complex filter module to boost the model performance by managing high-frequency noise throughout the diffusion process. We validate the effectiveness of our proposed method across several datasets for remote sensing change detection and description, showcasing its superior performance compared to existing techniques. The code will be available at https://github.com/sundongwei (MaskApproxNet).



Article 757

Title@2025-07-07 (1): Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems

Title: Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems Quantifying Robustness: Ein Benchmarking-Framework für Deep Learning Forecasting in Cyber-Physical Systems 量化强度:网络物理系统深度学习预测基准框架 2504.03494v2

Authors (4): Alexander Windmann, Henrik Steude, Daniel Boschmann, Oliver Niggemann

Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due to insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models) we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility.



Article 758

Title@2025-07-07 (1): ConBatch-BAL: Batch Bayesian Active Learning under Budget Constraints

Title: ConBatch-BAL: Batch Bayesian Active Learning under Budget Constraints ConBatch-BAL: Batch Bayesian Active Learning unter Budgetbeschränkungen BAL:预算限制下的Bayesian Batch Bayesian积极学习 2507.04929v1

Authors (3): Pablo G. Morato, Charalampos P. Andriotis, Seyran Khademi

Varying annotation costs among data points and budget constraints can hinder the adoption of active learning strategies in real-world applications. This work introduces two Bayesian active learning strategies for batch acquisition under constraints (ConBatch-BAL), one based on dynamic thresholding and one following greedy acquisition. Both select samples using uncertainty metrics computed via Bayesian neural networks. The dynamic thresholding strategy redistributes the budget across the batch, while the greedy one selects the top-ranked sample at each step, limited by the remaining budget. Focusing on scenarios with costly data annotation and geospatial constraints, we also release two new real-world datasets containing geolocated aerial images of buildings, annotated with energy efficiency or typology classes. The ConBatch-BAL strategies are benchmarked against a random acquisition baseline on these datasets under various budget and cost scenarios. The results show that the developed ConBatch-BAL strategies can reduce active learning iterations and data acquisition costs in real-world settings, and even outperform the unconstrained baseline solutions.
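The greedy strategy described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function name, the rule of skipping samples that no longer fit the remaining budget, and the toy uncertainty and cost values are all assumptions.

```python
def greedy_batch_under_budget(uncertainty, cost, budget):
    """Pick samples in order of decreasing uncertainty while they still fit the budget.

    Assumption: a sample whose cost exceeds the remaining budget is skipped,
    and the search continues with the next-ranked sample.
    """
    remaining = budget
    selected = []
    ranked = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True)
    for i in ranked:
        if cost[i] <= remaining:
            selected.append(i)
            remaining -= cost[i]
    return selected, remaining


# Hypothetical per-sample uncertainty scores and annotation costs.
uncertainty = [0.9, 0.8, 0.4, 0.7]
cost = [5.0, 4.0, 1.0, 2.0]
batch, left = greedy_batch_under_budget(uncertainty, cost, budget=8.0)
print(batch, left)
```

In this toy run the second-ranked sample is skipped because it no longer fits after the first pick, which is the kind of interaction between ranking and cost that a budget constraint introduces.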



Article 759

Title@2025-07-07 (1): Object-centric Denoising Diffusion Models for Physical Reasoning

Title: Object-centric Denoising Diffusion Models for Physical Reasoning Objektzentrierte Denoising-Diffusionsmodelle für physikalische Vernunft 物理理由解释的传播模型 2507.04920v1

Authors (5): Moritz Lange, Raphael C. Engelhardt, Wolfgang Konen, Andrew Melnik, Laurenz Wiskott

Reasoning about the trajectories of multiple, interacting objects is integral to physical reasoning tasks in machine learning. This involves conditions imposed on the objects at different time steps, for instance initial states or desired goal states. Existing approaches in physical reasoning generally rely on autoregressive modeling, which can only be conditioned on initial states, but not on later states. In fields such as planning for reinforcement learning, similar challenges are being addressed with denoising diffusion models. In this work, we propose an object-centric denoising diffusion model architecture for physical reasoning that is translation equivariant over time, permutation equivariant over objects, and can be conditioned on arbitrary time steps for arbitrary objects. We demonstrate how this model can solve tasks with multiple conditions and examine its performance when changing object numbers and trajectory lengths during inference.



Article 760

Title@2025-07-07 (1): LaCoOT: Layer Collapse through Optimal Transport

Title: LaCoOT: Layer Collapse through Optimal Transport LaCoOT: Layer Collapse durch optimalen Transport LaCOOT: 通过最佳迁移折叠图层 2406.08933v2

Authors (3): Victor Quétu, Nour Hezbri, Enzo Tartaglione

Although deep neural networks are well-known for their outstanding performance in tackling complex tasks, their hunger for computational resources remains a significant hurdle, posing energy-consumption issues and restricting their deployment on resource-constrained devices, preventing their widespread adoption. In this paper, we present an optimal transport-based method to reduce the depth of over-parametrized deep neural networks, alleviating their computational burden. More specifically, we propose a new regularization strategy based on the Max-Sliced Wasserstein distance to minimize the distance between the intermediate feature distributions in the neural network. We show that minimizing this distance enables the complete removal of intermediate layers in the network, achieving better performance/depth trade-off compared to existing techniques. We assess the effectiveness of our method on traditional image classification setups and extend it to generative image models. Our code is available at https://github.com/VGCQ/LaCoOT.
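To make the Max-Sliced Wasserstein distance concrete, the sketch below projects two equally sized point sets onto unit directions, computes the one-dimensional Wasserstein-1 distance along each, and keeps the largest value. Searching over random directions is a crude stand-in, assumed here for brevity, for actually optimizing the direction. The function names and toy data are hypothetical and this is not the LaCoOT regularizer itself.

```python
import math
import random


def wasserstein_1d(a, b):
    """W1 distance between two equally sized 1-D samples: mean gap between sorted values."""
    a, b = sorted(a), sorted(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def max_sliced_wasserstein(xs, ys, n_directions=256, seed=0):
    """Approximate the max-sliced W1 distance by trying random unit directions."""
    rng = random.Random(seed)
    dim = len(xs[0])
    best = 0.0
    for _ in range(n_directions):
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        px = [sum(t * v for t, v in zip(theta, x)) for x in xs]
        py = [sum(t * v for t, v in zip(theta, y)) for y in ys]
        best = max(best, wasserstein_1d(px, py))
    return best


# Toy "feature distributions": the second set is the first shifted by (3, 4).
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [(x + 3.0, y + 4.0) for x, y in xs]
d_same = max_sliced_wasserstein(xs, xs)
d_shift = max_sliced_wasserstein(xs, ys)
print(d_same, d_shift)
```

For a pure translation the exact max-sliced distance equals the length of the shift vector (5 here), so the random search should return a value just under that bound. A distance near zero between the input and output features of a layer is the intuition behind treating that layer as removable.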



Article 761

Title@2025-07-07 (1): Gradient Purification: Defense Against Poisoning Attack in Decentralized Federated Learning

Title: Gradient Purification: Defense Against Poisoning Attack in Decentralized Federated Learning Gradient Purification: Verteidigung gegen den Giftangriff im dezentralisierten Föderierten Lernen 逐步净化:在权力下放的联邦学习中防范毒物攻击 2501.04453v3

Authors (4): Bin Li, Xiaoye Miao, Yan Zhang, Jianwei Yin

Decentralized federated learning (DFL) is inherently vulnerable to data poisoning attacks, as malicious clients can transmit manipulated gradients to neighboring clients. Existing defense methods either reject suspicious gradients per iteration or restart DFL aggregation after excluding all malicious clients. They all neglect the potential benefits that may exist within contributions from malicious clients. In this paper, we propose a novel gradient purification defense, termed GPD, to defend against data poisoning attacks in DFL. It aims to separately mitigate the harm in gradients and retain benefits embedded in model weights, thereby enhancing overall model accuracy. For each benign client in GPD, a recording variable is designed to track historically aggregated gradients from one of its neighbors. It allows benign clients to precisely detect malicious neighbors and mitigate all aggregated malicious gradients at once. Upon mitigation, benign clients optimize model weights using purified gradients. This optimization not only retains previously beneficial components from malicious clients but also exploits canonical contributions from benign clients. We analyze the convergence of GPD, as well as its ability to harvest high accuracy. Extensive experiments demonstrate that GPD is capable of mitigating data poisoning attacks under both iid and non-iid data distributions. It also significantly outperforms state-of-the-art defense methods in terms of model accuracy.



Article 762

Title@2025-07-07 (1): Do LLMs Understand the Safety of Their Inputs? Training-Free Moderation via Latent Prototypes

Title: Do LLMs Understand the Safety of Their Inputs? Training-Free Moderation via Latent Prototypes Verstehen LLMs die Sicherheit ihrer Eingaben? Schulungsfreie Moderation über Latent Prototypen LLM女士了解其投入的安全性吗?通过中原原型进行无培训的调节 2502.16174v2

Authors (6): Maciej Chrabąszcz, Filip Szatkowski, Bartosz Wójcik, Jan Dubiński, Tomasz Trzciński, Sebastian Cygert

With the rise of LLMs, ensuring model safety and alignment has become a critical concern. While modern instruction-finetuned LLMs incorporate alignment during training, they still frequently require moderation tools to prevent unsafe behavior. The most common approach to moderation is to use guard models that flag unsafe inputs. However, guards require costly training and are typically limited to fixed-size, pre-trained options, making them difficult to adapt to evolving risks and resource constraints. We hypothesize that instruction-finetuned LLMs already encode safety-relevant information internally and explore training-free safety assessment methods that work with off-the-shelf models. We show that simple prompting allows models to recognize harmful inputs they would otherwise mishandle. We also demonstrate that safe and unsafe prompts are distinctly separable in the models’ latent space. Building on this, we introduce the Latent Prototype Moderator (LPM), a training-free moderation method that uses Mahalanobis distance in latent space to assess input safety. LPM is a lightweight, customizable add-on that generalizes across model families and sizes. Our method matches or exceeds state-of-the-art guard models across multiple safety benchmarks, offering a practical and flexible solution for scalable LLM moderation.
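The idea of scoring an input by its Mahalanobis distance to class prototypes can be illustrated with a toy example. The sketch below assumes two-dimensional vectors standing in for LLM hidden states, one mean prototype per class, a covariance pooled across classes, and a nearest-prototype decision rule. None of these details are confirmed by the abstract, and all names and data are hypothetical.

```python
from statistics import fmean


def mean_vec(points):
    """Class prototype: the per-dimension mean of 2-D points."""
    return [fmean(p[k] for p in points) for k in range(2)]


def pooled_cov_2d(groups, means):
    """Shared 2x2 covariance, pooled over all classes (an assumption of this sketch)."""
    n = sum(len(g) for g in groups)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for g, m in zip(groups, means):
        for p in g:
            d = [p[0] - m[0], p[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    c[a][b] += d[a] * d[b] / n
    return c


def inv_2x2(c):
    """Closed-form inverse of a 2x2 matrix."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return [[c[1][1] / det, -c[0][1] / det], [-c[1][0] / det, c[0][0] / det]]


def mahalanobis_sq(x, m, prec):
    """Squared Mahalanobis distance (x - m)^T P (x - m) with precision matrix P."""
    d = [x[0] - m[0], x[1] - m[1]]
    return sum(d[a] * prec[a][b] * d[b] for a in range(2) for b in range(2))


def classify(x, prototypes, prec):
    """Assign x to the label whose prototype is closest in Mahalanobis distance."""
    return min(prototypes, key=lambda label: mahalanobis_sq(x, prototypes[label], prec))


# Hypothetical latent vectors for prompts labelled safe and unsafe.
safe = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (1.0, 1.0)]
unsafe = [(4.0, 4.0), (5.0, 4.5), (4.5, 5.0), (5.0, 5.0)]
prototypes = {"safe": mean_vec(safe), "unsafe": mean_vec(unsafe)}
precision = inv_2x2(pooled_cov_2d([safe, unsafe], [prototypes["safe"], prototypes["unsafe"]]))
print(classify((0.8, 0.2), prototypes, precision), classify((4.2, 4.9), prototypes, precision))
```

Only means and a covariance are estimated from labelled examples, with no gradient updates, which is consistent with the training-free framing of the abstract.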



Article 763

Title@2025-07-07 (1): Training-Conditional Coverage Bounds under Covariate Shift

Title: Training-Conditional Coverage Bounds under Covariate Shift Training-Conditional Coverage Bounds unter Kovariate Shift 共同变换下的培训 – – 有条件的覆盖范围 2405.16594v3

Authors (2): Mehrdad Pournaderi, Yu Xiang

Conformal prediction methodology has recently been extended to the covariate shift setting, where the distribution of covariates differs between training and test data. While existing results ensure that the prediction sets from these methods achieve marginal coverage above a nominal level, their coverage rate conditional on the training dataset (referred to as training-conditional coverage) remains unexplored. In this paper, we address this gap by deriving upper bounds on the tail of the training-conditional coverage distribution, offering probably approximately correct (PAC) guarantees for these methods. Our results quantify the relationship between the quality of the prediction sets and the severity of distributional changes, and can potentially be used to compute more efficient prediction sets.
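As background, a common way to adapt split conformal prediction to covariate shift is to reweight the calibration scores by an estimate of the test-to-training density ratio and take a weighted quantile. The sketch below shows that quantile step only. It assumes the weights are already known, and the function name, scores, and weights are hypothetical; the specific methods analysed in the paper may differ.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Smallest calibration score whose normalized cumulative weight reaches 1 - alpha.

    The test point carries its own weight as a point mass at +infinity, so the
    function returns infinity when the calibration scores alone cannot reach the level.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative / total >= 1 - alpha:
            return score
    return math.inf


# Hypothetical nonconformity scores from a calibration set.
scores = [0.2, 0.5, 0.1, 0.9, 0.4]

# No shift: equal weights recover the usual split-conformal quantile.
q_uniform = weighted_conformal_quantile(scores, [1.0] * 5, test_weight=1.0, alpha=0.25)

# Shift: assumed density-ratio weights emphasize some calibration points.
q_shifted = weighted_conformal_quantile(
    scores, [0.5, 2.0, 0.5, 0.5, 2.0], test_weight=0.5, alpha=0.25
)
print(q_uniform, q_shifted)
```

The prediction set for a new input then contains every candidate label or value whose nonconformity score is at most the returned threshold. The training-conditional question studied in the paper is how the coverage of such sets varies with the particular training and calibration data drawn.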



Article 764

Title@2025-07-07 (1): When do World Models Successfully Learn Dynamical Systems?

Title: When do World Models Successfully Learn Dynamical Systems? Wann lernen Weltmodelle erfolgreich Dynamische Systeme? 世界模型何时成功学习动态系统? 2507.04898v1

Authors (7): Edmund Ross, Claudia Drygala, Leonhard Schwarz, Samir Kaiser, Francesca di Mare, Tobias Breiten, Hanno Gottschalk

In this work, we explore the use of compact latent representations with learned time dynamics (‘World Models’) to simulate physical systems. Drawing on concepts from control theory, we propose a theoretical framework that explains why projecting time slices into a low-dimensional space and then concatenating to form a history (‘Tokenization’) is so effective at learning physics datasets, and characterise when exactly the underlying dynamics admit a reconstruction mapping from the history of previous tokenized frames to the next. To validate these claims, we develop a sequence of models with increasing complexity, starting with least-squares regression and progressing through simple linear layers, shallow adversarial learners, and ultimately full-scale generative adversarial networks (GANs). We evaluate these models on a variety of datasets, including modified forms of the heat and wave equations, the chaotic regime 2D Kuramoto-Sivashinsky equation, and a challenging computational fluid dynamics (CFD) dataset of a 2D Kármán vortex street around a fixed cylinder, where our model is successfully able to recreate the flow.



Article 765

Title@2025-07-07 (1): EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework

Title: EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework EFRame: Deeper Reasoning über Exploration-Filter-Replay Stärkungs-Lernrahmen EFRame:通过探索-钻孔-回放-强化学习框架,更深的原因 2506.22200v3

Authors (8): Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Yue Wang, Yuzhi Zhang

Recent advances in reinforcement learning (RL) have significantly enhanced the reasoning capabilities of large language models (LLMs). Group Relative Policy Optimization (GRPO), an efficient variant of PPO that lowers RL’s computational cost, still faces limited exploration, low sample efficiency and instability, constraining its performance on complex reasoning tasks. To address these limitations, we introduce EFRame, an Exploration-Filter-Replay framework that systematically augments GRPO along three critical dimensions. EFRame performs additional rollouts to explore high-quality trajectories, applies online filtering to eliminate low-quality samples that introduce noise and variance, and leverages experience replay to repeatedly exploit rare but informative samples. EFRame establishes a complete and stable learning cycle, guiding the model through a structured transition from exploration to convergence. Our experiments across a variety of reasoning benchmarks demonstrate that EFRame not only improves the robustness and efficiency of training, but also enables access to deeper reasoning capabilities that remain unattainable under vanilla GRPO. Furthermore, EFRame not only enables fine-grained categorization of training samples for deeper insight into their contributions, but also introduces an efficient and precise mechanism for entropy control, which is critical for balancing exploration and convergence in RL training. Our code is available at https://github.com/597358816/EFRame.
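For context, GRPO scores each sampled response relative to the other responses drawn for the same prompt, typically by standardizing the rewards within the group. The sketch below shows that computation under assumed toy rewards. The function name and the `eps` stabilizer are illustrative, and EFRame's own exploration, filtering, and replay criteria are not specified in the abstract and are not reproduced here.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts for the same prompt."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical binary correctness rewards for four rollouts of one prompt.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# When every rollout gets the same reward, all advantages vanish.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
print(mixed, flat)
```

The second case shows why limited exploration is costly: a group in which every rollout succeeds or every rollout fails yields zero advantages and therefore no learning signal, which is one motivation for the additional rollouts the abstract describes.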



Article 766

Title@2025-07-07 (1): Fine-tuning on simulated data outperforms prompting for agent tone of voice

Title: Fine-tuning on simulated data outperforms prompting for agent tone of voice Feinabstimmung auf simulierte Daten übertrifft Aufforderung für Agent Ton der Stimme 模拟数据优于工作表现的微调,激发了代理方声音的音调 2507.04889v1

Authors (2): Ingo Marquardt, Philippe Brule

Deploying language models (LMs) in customer-facing speech applications requires conversational fluency and adherence to specific stylistic guidelines. This can be challenging to achieve reliably using complex system prompts due to issues like instruction-following limitations and in-context bias. This study investigates the effectiveness of fine-tuning versus system prompting for aligning LMs with a specific behavioral target: responding in a natural, conversational tone suitable for voice interactions. We fine-tuned a small, open-weights model (Llama3.2-1B-Instruct) using Low-Rank Adaptation (LoRA) on a synthetically generated dataset derived from Wikipedia. Additionally, we fine-tuned two closed-source models (gpt-4o-mini, gpt-4.1-mini). Our results demonstrate that fine-tuning outperformed system prompting, achieving a high percentage of conversational responses, even when trained on only 100 data samples. Semantic similarity analysis confirmed that fine-tuning did not degrade content quality. Interestingly, fine-tuning with 8-bit integer quantization converged faster towards the target style than using bfloat16 precision, potentially due to implicit regularization effects. We conclude that fine-tuning small, open-weights LMs on simulated data is a highly effective and data-efficient method for instilling specific stylistic behaviors, offering a preferable alternative to complex system prompting for practical applications requiring nuanced response styles.



Article 767

Title@2025-07-07 (1): Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning

Title: Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning Jenseits der Trainingszeit Vergiftung: Komponenten-Ebene und Nachschulung Hintertüren im Deep Reinforcement Learning 超越培训时间中毒:深强化学习的构成部分和培训后后后门 2507.04883v1

Authors (5): Sanyam Vyas, Alberto Caron, Chris Hicks, Pete Burnap, Vasilios Mavroudis

Deep Reinforcement Learning (DRL) systems are increasingly used in safety-critical applications, yet their security remains severely underexplored. This work investigates backdoor attacks, which implant hidden triggers that cause malicious actions only when specific inputs appear in the observation space. Existing DRL backdoor research focuses solely on training-time attacks requiring unrealistic access to the training pipeline. In contrast, we reveal critical vulnerabilities across the DRL supply chain where backdoors can be embedded with significantly reduced adversarial privileges. We introduce two novel attacks: (1) TrojanentRL, which exploits component-level flaws to implant a persistent backdoor that survives full model retraining; and (2) InfrectroRL, a post-training backdoor attack which requires no access to training, validation, or test data. Empirical and analytical evaluations across six Atari environments show our attacks rival state-of-the-art training-time backdoor attacks while operating under much stricter adversarial constraints. We also demonstrate that InfrectroRL further evades two leading DRL backdoor defenses. These findings challenge the current research focus and highlight the urgent need for robust defenses.



Article 768

Title@2025-07-07 (1): Model-free Posterior Sampling via Learning Rate Randomization

Title: Model-free Posterior Sampling via Learning Rate Randomization Modellfreies Polandor-Proben über Lernrate Randomisierung 通过学习率随机调整进行无模型的Pos别处抽样 2310.18186v2

Authors (9): Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieves a regret bound of order $\widetilde{O}(\sqrt{H^{5}SAT})$, where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. For a metric state-action space, RandQL enjoys a regret bound of order $\widetilde{O}(H^{5/2} T^{(d_z+1)/(d_z+2)})$, where $d_z$ denotes the zooming dimension. Notably, RandQL achieves optimistic exploration without using bonuses, relying instead on a novel idea of learning rate randomization. Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments.



Article 769

Title@2025-07-07 (1): Adaptive Slimming for Scalable and Efficient Speech Enhancement

Title: Adaptive Slimming for Scalable and Efficient Speech Enhancement Adaptive Schlankheit für skalierbare und effiziente Sprachverbesserung 促进可缩放和高效语音增强的适应性缩小 2507.04879v1

Authors (5): Riccardo Miccini, Minje Kim, Clément Laroche, Luca Pezzarossa, Paris Smaragdis

Speech enhancement (SE) enables robust speech recognition, real-time communication, hearing aids, and other applications where speech quality is crucial. However, deploying such systems on resource-constrained devices involves choosing a static trade-off between performance and computational efficiency. In this paper, we introduce dynamic slimming to DEMUCS, a popular SE architecture, making it scalable and input-adaptive. Slimming lets the model operate at different utilization factors (UF), each corresponding to a different performance/efficiency trade-off, effectively mimicking multiple model sizes without the extra storage costs. In addition, a router subnet, trained end-to-end with the backbone, determines the optimal UF for the current input. Thus, the system saves resources by adaptively selecting smaller UFs when additional complexity is unnecessary. We show that our solution is Pareto-optimal against individual UFs, confirming the benefits of dynamic routing. When training the proposed dynamically-slimmable model to use 10% of its capacity on average, we obtain the same or better speech quality as the equivalent static 25% utilization while reducing MACs by 29%.



Article 770

Title@2025-07-07 (1): Learned enclosure method for experimental EIT data

Title: Learned enclosure method for experimental EIT data Gelerntes Gehäuseverfahren für experimentelle EIT-Daten 经济转型期数据实验性数据累积方法 2504.11512v3

Authors (5): Sara Sippola, Siiri Rautio, Andreas Hauptmann, Takanori Ide, Samuli Siltanen

Electrical impedance tomography (EIT) is a non-invasive imaging method with diverse applications, including medical imaging and non-destructive testing. The inverse problem of reconstructing internal electrical conductivity from boundary measurements is nonlinear and highly ill-posed, making it difficult to solve accurately. In recent years, there has been growing interest in combining analytical methods with machine learning to solve inverse problems. In this paper, we propose a method for estimating the convex hull of inclusions from boundary measurements by combining the enclosure method proposed by Ikehata with neural networks. We demonstrate its performance using experimental data. Compared to the classical enclosure method with least squares fitting, the learned convex hull achieves superior performance on both simulated and experimental data.



Article 771

Title@2025-07-07 (1): ReCAP: Recursive Cross Attention Network for Pseudo-Label Generation in Robotic Surgical Skill Assessment

Title: ReCAP: Recursive Cross Attention Network for Pseudo-Label Generation in Robotic Surgical Skill Assessment ReCAP: Rekursives Cross-Care-Netzwerk für die Pseudo-Label-Generation in Robotic Surgical Skills Assessment RECAP:机器人外科技能评估中优杜-拉贝尔一代人递递性交叉关注网络 2407.05180v4

Authors (5): Julien Quarez, Marc Modat, Sebastien Ourselin, Jonathan Shapey, Alejandro Granados

In surgical skill assessment, the Objective Structured Assessments of Technical Skills (OSATS) and Global Rating Scale (GRS) are well-established tools for evaluating surgeons during training. These metrics, along with performance feedback, help surgeons improve and reach practice standards. Recent research on the open-source JIGSAWS dataset, which includes both GRS and OSATS labels, has focused on regressing GRS scores from kinematic data, video, or their combination. However, we argue that regressing GRS alone is limiting, as it aggregates OSATS scores and overlooks clinically meaningful variations during a surgical trial. To address this, we developed a weakly-supervised recurrent transformer model that tracks a surgeon’s performance throughout a session by mapping hidden states to six OSATS, derived from kinematic data. These OSATS scores are averaged to predict GRS, allowing us to compare our model’s performance against state-of-the-art (SOTA) methods. We report Spearman’s Correlation Coefficients (SCC) demonstrating that our model outperforms SOTA using kinematic data (SCC 0.83-0.88), and matches performance with video-based models. Our model also surpasses SOTA in most tasks for average OSATS predictions (SCC 0.46-0.70) and specific OSATS (SCC 0.56-0.95). The generation of pseudo-labels at the segment level translates quantitative predictions into qualitative feedback, vital for automated surgical skill assessment pipelines. A senior surgeon validated our model’s outputs, agreeing with 77% of the weakly-supervised predictions (p=0.006).



Article 772

Title@2025-07-07 (1): NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification

Title: NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification NTSFormer: Ein selbstlernender Graphentransformator für multimodale Kaltstartknotenklassifikation NTS Former: 用于多式冷 Start- Start节点分类的自教形图变形器 2507.04870v1

Authors (5): Jun Hu, Yufei He, Yuan Li, Bryan Hooi, Bingsheng He

Cold-start node classification on multimodal graphs is challenging because cold-start nodes are isolated (i.e., no edges) and often have missing modalities (e.g., absent text or image features). Existing methods address structural isolation by degrading graph learning models to MLPs for cold-start inference, using a teacher model (with graph access) to guide the MLP. However, this results in limited model capacity in the student, which is further challenged when modalities are missing. In this paper, we propose Neighbor-to-Self Graph Transformer (NTSFormer), a unified Graph Transformer framework that jointly tackles the isolation and missing-modality issues via a self-teaching paradigm. Specifically, NTSFormer uses a cold-start attention mask to simultaneously make two predictions for each node: a “student” prediction based only on self-information (i.e., the node’s own features), and a “teacher” prediction incorporating both self and neighbor information. This enables the model to supervise itself without degrading to an MLP, thereby fully leveraging the Transformer’s capacity to handle missing modalities. To handle diverse graph information and missing modalities, NTSFormer performs a one-time multimodal graph pre-computation that converts structural and feature data into token sequences, which are then processed by a Mixture-of-Experts (MoE) Input Projection and Transformer layers for effective fusion. Experimental results on public datasets show that NTSFormer achieves superior performance on multimodal cold-start node classification tasks.



Article 773

Title@2025-07-07 (1): Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation

Title: Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation Music Boomerang: Wiederverwendbare Diffusionsmodelle für Datenvergrößerung und Audiomanipulation 音乐 “ 潮动 “ :重新使用数据增强和音频操纵的传播模型 2507.04864v1

Authors (3): Alexander Fichtinger, Jan Schlüter, Gerhard Widmer

Generative models of music audio are typically used to generate output based solely on a text prompt or melody. Boomerang sampling, recently proposed for the image domain, allows generating output close to an existing example, using any pretrained diffusion model. In this work, we explore its application in the audio domain as a tool for data augmentation or content manipulation. Specifically, implementing Boomerang sampling for Stable Audio Open, we augment training data for a state-of-the-art beat tracker, and attempt to replace musical instruments in recordings. Our results show that the rhythmic structure of existing examples is mostly preserved, that it improves performance of the beat tracker, but only in scenarios of limited training data, and that it can accomplish text-based instrument replacement on monophonic inputs. We publish our implementation to invite experiments on data augmentation in other tasks and explore further applications.



Article 774

Title@2025-07-07 (1): Fairness and Sparsity within Rashomon sets: Enumeration-Free Exploration and Characterization

Title: Fairness and Sparsity within Rashomon sets: Enumeration-Free Exploration and Characterization Fairness und Sparsamkeit innerhalb der Rashomon-Sets: Enumeration-freie Exploration und Charakterisierung Rashomon各套内部的公平和公平:数字-自由勘探和定性 2502.05286v2

Authors (4): Lucas Langlade, Julien Ferry, Gabriel Laberge, Thibaut Vidal

We introduce an enumeration-free method based on mathematical programming to precisely characterize various properties such as fairness or sparsity within the set of “good models”, known as the Rashomon set. This approach is generically applicable to any hypothesis class, provided that a mathematical formulation of the model learning task exists. It offers a structured framework to define the notion of business necessity and evaluate how fairness can be improved or degraded towards a specific protected group, while remaining within the Rashomon set and maintaining any desired sparsity level. We apply our approach to two hypothesis classes: scoring systems and decision diagrams, leveraging recent mathematical programming formulations for training such models. As seen in our experiments, the method comprehensively and certifiably quantifies trade-offs between predictive performance, sparsity, and fairness. We observe that a wide range of fairness values are attainable, ranging from highly favorable to significantly unfavorable for a protected group, while staying within less than 1% of the best possible training accuracy for the hypothesis class. Additionally, we observe that sparsity constraints limit these trade-offs and may disproportionately harm specific subgroups. As our results show, thoroughly characterizing the tensions between these key aspects is critical for an informed and accountable selection of models.



Article 775

Title@2025-07-07 (1): Improving Predictions of Convective Storm Wind Gusts through Statistical Post-Processing of Neural Weather Models

Title: Improving Predictions of Convective Storm Wind Gusts through Statistical Post-Processing of Neural Weather Models Verbesserung der Vorhersagen von Konvective Storm Wind Gusts durch statistische Nachbearbeitung von neuralen Wettermodellen 通过神经气象模型的统计后处理,改进对动态风暴风速风速素的预测 2504.00128v2

Authors (5): Antoine Leclerc, Erwan Koch, Monika Feldmann, Daniele Nerini, Tom Beucler

Issuing timely severe weather warnings helps mitigate potentially disastrous consequences. Recent advancements in Neural Weather Models (NWMs) offer a computationally inexpensive and fast approach for forecasting atmospheric environments on a 0.25° global grid. For thunderstorms, these environments can be empirically post-processed to predict wind gust distributions at specific locations. With the Pangu-Weather NWM, we apply a hierarchy of statistical and deep learning post-processing methods to forecast hourly wind gusts up to three days ahead. To ensure statistical robustness, we constrain our probabilistic forecasts using generalised extreme-value distributions across five regions in Switzerland. Using a convolutional neural network to post-process the predicted atmospheric environment’s spatial patterns yields the best results, outperforming direct forecasting approaches across lead times and wind gust speeds. Our results confirm the added value of NWMs for extreme wind forecasting, especially for designing more responsive early-warning systems.
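
As a rough illustration of what a generalised extreme-value (GEV) forecast provides, the sketch below evaluates the standard GEV distribution function and the probability that a gust exceeds a warning threshold. The function names and the parameter values in the usage note are hypothetical; how the paper fits the location, scale, and shape parameters is not described in the abstract and is not reproduced here.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """Standard GEV cumulative distribution function.

    mu: location, sigma: scale (> 0), xi: shape.
    xi == 0 gives the Gumbel limit.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def exceedance_probability(threshold, mu, sigma, xi):
    """Probability that the gust speed exceeds `threshold`."""
    return 1.0 - gev_cdf(threshold, mu, sigma, xi)
```

For example, `exceedance_probability(25.0, mu=15.0, sigma=4.0, xi=0.1)` would give the chance of exceeding 25 m/s under those purely illustrative parameters.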



Article 776

Title@2025-07-07 (1): Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu

Title: Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu Auf dem Weg zum Menschen-in-the-Loop-Eingangserkennung: Ein Transfer-Lernansatz für Maracatu 走向人类在卢布Onset探测:Maracatu的转让学习方法 2507.04858v1

Authors (1): António Sá Pinto

We explore transfer learning strategies for musical onset detection in the Afro-Brazilian Maracatu tradition, which features complex rhythmic patterns that challenge conventional models. We adapt two Temporal Convolutional Network architectures: one pre-trained for onset detection (intra-task) and another for beat tracking (inter-task). Using only 5-second annotated snippets per instrument, we fine-tune these models through layer-wise retraining strategies for five traditional percussion instruments. Our results demonstrate significant improvements over baseline performance, with F1 scores reaching up to 0.998 in the intra-task setting and improvements of over 50 percentage points in best-case scenarios. The cross-task adaptation proves particularly effective for time-keeping instruments, where onsets naturally align with beat positions. The optimal fine-tuning configuration varies by instrument, highlighting the importance of instrument-specific adaptation strategies. This approach addresses the challenges of underrepresented musical traditions, offering an efficient human-in-the-loop methodology that minimizes annotation effort while maximizing performance. Our findings contribute to more inclusive music information retrieval tools applicable beyond Western musical contexts.



Article 777

Title@2025-07-07 (1): Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos

Title: Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos Räumliche und semantische Integration für Stereo Sound Event Lokalisierung und Detektion in regelmäßigen Videos 常规视频中立体声音事件定位和探测的空间和语义内嵌式整合 2507.04845v1

Authors (2): Davide Berghi, Philip J. B. Jackson

This report presents our systems submitted to the audio-only and audio-visual tracks of the DCASE2025 Task 3 Challenge: Stereo Sound Event Localization and Detection (SELD) in Regular Video Content. SELD is a complex task that combines temporal event classification with spatial localization, requiring reasoning across spatial, temporal, and semantic dimensions. The last is arguably the most challenging to model. Traditional SELD architectures rely on multichannel input, which limits their ability to leverage large-scale pre-training due to data constraints. To address this, we enhance standard SELD architectures with semantic information by integrating pre-trained, contrastive language-aligned models: CLAP for audio and OWL-ViT for visual inputs. These embeddings are incorporated into a modified Conformer module tailored for multimodal fusion, which we refer to as the Cross-Modal Conformer. Additionally, we incorporate autocorrelation-based acoustic features to improve distance estimation. We pre-train our models on curated synthetic audio and audio-visual datasets and apply a left-right channel swapping augmentation to further increase the training data. Both our audio-only and audio-visual systems substantially outperform the challenge baselines on the development set, demonstrating the effectiveness of our strategy. Performance is further improved through model ensembling and a visual post-processing step based on human keypoints. Future work will investigate the contribution of each modality and explore architectural variants to further enhance results.
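
The left-right channel swapping augmentation mentioned above can be sketched as follows. This is a minimal illustration, assuming azimuth labels are given in degrees with 0 straight ahead and the sign indicating the side, so that mirroring the scene flips the sign. That label convention and the name `swap_stereo` are assumptions, not taken from the report.

```python
def swap_stereo(left, right, azimuths_deg):
    """Mirror a stereo example.

    Exchanges the two channels and negates each azimuth label so the
    labels stay consistent with the mirrored audio (assumed convention:
    0 degrees is straight ahead, sign encodes left vs. right).
    """
    mirrored_azimuths = [-a for a in azimuths_deg]
    return list(right), list(left), mirrored_azimuths
```

Each training clip then yields a second, mirrored clip at essentially no cost, which doubles the amount of labeled data.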



Article 778

Title@2025-07-07 (1): PEVLM: Parallel Encoding for Vision-Language Models

Title: PEVLM: Parallel Encoding for Vision-Language Models PEVLM: Parallele Kodierung für Vision-Language-Modelle PEVLM: 视觉语言模型平行编码 2506.19651v2

Authors (6): Letian Kang, Shixian Luo, Yiqiang Li, Xiaoyang Yu, Shenxuan Zhou, Yong Wu

Vision-Language Models (VLMs) have demonstrated strong capabilities in multimodal understanding and generation tasks. However, their application to long video understanding remains hindered by the quadratic complexity of standard attention mechanisms. In this work, we introduce PEVLM, a fine-tuning-free parallel encoding method designed to enhance the prefilling efficiency of VLMs in long video scenarios. PEVLM partitions the input video into context blocks with a shared sink block, while preserving sequential position embeddings to align the attention weight distribution with that of Full-Attention. This design reduces attention complexity from $O((T \times N)^2)$ to $O(T \times N)$ where $T$ is the number of frames and $N$ the number of tokens per frame, without sacrificing accuracy. Extensive experiments across multiple state-of-the-art models and benchmarks demonstrate that PEVLM consistently outperforms existing parallel encoding approaches, achieving up to 7.47x speedup in attention computation and reducing end-to-end latency by 40%. Remarkably, PEVLM not only maintains high accuracy, but in some settings even surpasses Full-Attention performance. Under strict latency constraints, it achieves substantial gains, improving accuracy from 23.26% to 61.03%. These results underscore the effectiveness of PEVLM for low-latency, long-context video understanding, making it a promising solution for real-world applications.



Article 779

Title@2025-07-07 (1): RewardAnything: Generalizable Principle-Following Reward Models

Title: RewardAnything: Generalizable Principle-Following Reward Models BelohnungAlles: Allgemeines Prinzip-folgende Belohnung Modelle 奖励任何奖励:可普遍适用的遵循原则的奖赏模式 2506.03637v2

Authors (10): Zhuohao Yu, Jiali Zeng, Weizheng Gu, Yidong Wang, Jindong Wang, Fandong Meng, Jie Zhou, Yue Zhang, Shikun Zhang, Wei Ye

Reward Models, essential for guiding Large Language Model optimization, are typically trained on fixed preference datasets, resulting in rigid alignment to single, implicit preference distributions. This prevents adaptation to diverse real-world needs, from conciseness in one task to detailed explanations in another. The standard practice of collecting task-specific preference data and retraining reward models is resource-intensive, often produces biased rewards, and limits practical application. We introduce generalizable, principle-following reward models. We propose that RMs should understand and adhere to dynamically provided natural language specifications of reward principles, similar to instruction-following in LLMs. To measure this capability, we develop RABench, a comprehensive benchmark for RMs focusing on generalization across diverse principles. Evaluations on RABench reveal poor generalization of current RMs. As a solution, we present RewardAnything, a novel RM designed and trained to explicitly follow natural language principles. We achieve SotA performance with RewardAnything on a traditional RM benchmark simply by specifying a well-defined principle, and results on RABench show we excel in adapting to novel principles without retraining. Furthermore, RewardAnything integrates seamlessly with existing RLHF methods, and we show through a case study how to automatically and efficiently align LLMs with only natural language principles.



Article 780

Title@2025-07-07 (1): Discrete Diffusion Trajectory Alignment via Stepwise Decomposition

Title: Discrete Diffusion Trajectory Alignment via Stepwise Decomposition Diskrete Diffusions-Trajektorie-Ausrichtung über Schrittweise Zersetzung 通过步进分解解解析分解扩散轨迹对齐 2507.04832v1

Authors (7): Jiaqi Han, Austin Wang, Minkai Xu, Wenda Chu, Meihua Dang, Yisong Yue, Stefano Ermon

Discrete diffusion models have demonstrated great promise in modeling various sequence data, ranging from human language to biological sequences. Inspired by the success of RL in language models, there is growing interest in further improving the models by alignment with a certain reward. In this work, we propose a novel preference optimization method for masked discrete diffusion models through a principled diffusion trajectory alignment. Instead of applying the reward on the final output and backpropagating the gradient to the entire discrete denoising process, we decompose the problem into a set of stepwise alignment objectives. This framework enables efficient diffusion optimization, is compatible with arbitrary reward functions, and importantly, guarantees an equivalent optimal solution under additive factorization of the trajectory reward. Experiments across multiple domains including DNA sequence design, protein inverse folding, and language modeling consistently demonstrate the superiority of our approach. Notably, it achieves an up to 12% improvement over the most competitive RL-based baseline in terms of predicted activity on DNA sequence design, and further improves the GSM8K score from 78.6 to 80.7 on LLaDA-8B-Instruct for language modeling.



Article 781

Title@2025-07-07 (1): A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization

Title: A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization Ein High-Level-Compiler-Integrationsansatz für Deep-Learning-Beschleuniger, die Abstraktion und Optimierung unterstützen 支持抽象和优化的深学习加速器高级别汇编者一体化办法 2507.04828v1

Authors (3): Samira Ahmadifarsani, Daniel Mueller-Gritschneder, Ulf Schlichtmann

The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML) compilers remains a complex challenge due to the need for significant modifications in compilation layers and specialized scheduling techniques. Existing frameworks offer partial solutions and require users to navigate intricate compiler internals. In this paper, we introduce a TVM-based compilation integration approach that targets GEMM-based deep learning accelerators. Our approach abstracts the complexities of compiler integration, enabling seamless integration of accelerators without requiring in-depth knowledge of the underlying compiler. Furthermore, we extend and incorporate design space exploration tools, specifically CoSA, to automate efficient tensor scheduling, accounting for factors such as uneven mapping and double buffering. Our framework is benchmarked on the Gemmini accelerator, demonstrating performance comparable to its specialized manually implemented toolchain.



Article 782

Title@2025-07-07 (1): Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions

Title: Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions Verbesserung der numerischen Stabilität des Normalisierten gegenseitigen Informations-Schätzers über hohe Abmessungen 提高高方面标准化相互信息模拟器的数值稳定 2410.07642v3

Authors (2): Marko Tuononen, Ville Hautamäki

Mutual information provides a powerful, general-purpose metric for quantifying the amount of shared information between variables. Estimating normalized mutual information using a k-Nearest Neighbor (k-NN) based approach involves the calculation of the scaling-invariant k-NN radius. Calculation of the radius suffers from numerical overflow when the joint dimensionality of the data becomes high, typically in the range of several hundred dimensions. To address this issue, we propose a logarithmic transformation technique that improves the numerical stability of the radius calculation in high-dimensional spaces. By applying the proposed transformation during the calculation of the radius, numerical overflow is avoided, and precision is maintained. The proposed transformation is validated through both theoretical analysis and empirical evaluation, demonstrating its ability to stabilize the calculation without compromising precision, increasing bias, or adding significant computational overhead, while also helping to maintain estimator variance.
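
The general idea behind a logarithmic transformation can be illustrated with a toy example. The sketch below is not the paper's estimator: the hypothetical quantity `radius ** dim` merely stands in for whichever high-dimensional term overflows, and the function names are made up for illustration. It shows that working with `dim * log(radius)` keeps the value within floating-point range where the direct power does not.

```python
import math

def naive_radius_power(radius, dim):
    """Direct computation of radius ** dim.

    Raises OverflowError once the result exceeds the float range.
    """
    return radius ** dim

def log_radius_power(radius, dim):
    """Same quantity in log space: log(radius ** dim) = dim * log(radius)."""
    return dim * math.log(radius)
```

With `radius = 1000.0` and `dim = 200`, the direct power would be 1e600, which a 64-bit float cannot represent, whereas the log-space value is a modest number (about 1381.6) that later steps can combine by addition and subtraction.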



Article 783

Title@2025-07-07 (1): UDF-GMA: Uncertainty Disentanglement and Fusion for General Movement Assessment

Title: UDF-GMA: Uncertainty Disentanglement and Fusion for General Movement Assessment UDF-GMA: Ungewissheitsentflechtung und Fusion für allgemeine Bewegungsbewertung UDF-GMA: 不确定的分解和融合促进总体运动评估 2507.04814v1

Authors (3): Zeqi Luo, Ali Gooya, Edmond S. L. Ho

General movement assessment (GMA) is a non-invasive tool for the early detection of brain dysfunction through the qualitative assessment of general movements, and the development of automated methods can broaden its application. However, mainstream pose-based automated GMA methods are prone to uncertainty due to limited high-quality data and noisy pose estimation, hindering clinical reliability without reliable uncertainty measures. In this work, we introduce UDF-GMA which explicitly models epistemic uncertainty in model parameters and aleatoric uncertainty from data noise for pose-based automated GMA. UDF-GMA effectively disentangles uncertainties by directly modelling aleatoric uncertainty and estimating epistemic uncertainty through Bayesian approximation. We further propose fusing these uncertainties with the embedded motion representation to enhance class separation. Extensive experiments on the Pmi-GMA benchmark dataset demonstrate the effectiveness and generalisability of the proposed approach in predicting poor repertoire.



Article 784

Title@2025-07-07 (1): Neural Velocity for hyperparameter tuning

Title: Neural Velocity for hyperparameter tuning Neurale Geschwindigkeit für Hyperparameter-Tuning 用于超分光计调的神经速率 2507.05309v1

Authors (5): Gianluca Dalmasso, Andrea Bragagnolo, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto

Hyperparameter tuning, such as learning rate decay and defining a stopping criterion, often relies on monitoring the validation loss. This paper presents NeVe, a dynamic training approach that adjusts the learning rate and defines the stop criterion based on the novel notion of “neural velocity”. The neural velocity measures the rate of change of each neuron’s transfer function and is an indicator of model convergence: sampling neural velocity can be performed even by forwarding noise in the network, reducing the need for a held-out dataset. Our findings show the potential of neural velocity as a key metric for optimizing neural network training efficiently.



Article 785

Title@2025-07-07 (1): Synthesising Activity Participations and Scheduling with Deep Generative Machine Learning

Title: Synthesising Activity Participations and Scheduling with Deep Generative Machine Learning Synthesising Activity Participations and Scheduling with Deep Generative Machine Learning 与深创机学习相结合的活动参与和时间安排 2501.10221v3

Authors (2): Fred Shone, Tim Hillel

Using a deep generative machine learning approach, we synthesise human activity participations and scheduling; i.e. the choices of what activities to participate in and when. Activity schedules are a core component of many applied transport, energy, and epidemiology models. Our data-driven approach directly learns the distributions resulting from human preferences and scheduling logic without the need for complex interacting combinations of sub-models and custom rules. This makes our approach significantly faster and simpler to operate than existing approaches to synthesise or anonymise schedule data. We additionally contribute a novel schedule representation and a comprehensive evaluation framework. We evaluate a range of schedule encoding and deep model architecture combinations. The evaluation shows our approach can rapidly generate large, diverse, novel, and realistic synthetic samples of activity schedules.



Article 786

Title@2025-07-07 (1): High Order Collaboration-Oriented Federated Graph Neural Network for Accurate QoS Prediction

Title: High Order Collaboration-Oriented Federated Graph Neural Network for Accurate QoS Prediction High Order Collaboration-Oriented Federated Graph Neural Network für genaue QoS-Vorhersage 高级秩序协作-以联邦州际同步预测神经网络 2507.05308v1

Authors (2): Zehuan Chen, Xiangwei Lai

Predicting Quality of Service (QoS) data is crucial for cloud service selection, where user privacy is a critical concern. Federated Graph Neural Networks (FGNNs) can perform QoS data prediction while maintaining user privacy. However, existing FGNN-based QoS predictors commonly implement on-device training on scattered explicit user-service graphs, thereby failing to utilize the implicit user-user interactions. To address this issue, this study proposes a high order collaboration-oriented federated graph neural network (HC-FGNN) to obtain accurate QoS prediction with privacy preservation. Concretely, it magnifies the explicit user-service graphs following the principle of the attention mechanism to obtain the high order collaboration, which reflects the implicit user-user interactions. Moreover, it utilizes a lightweight message aggregation scheme to improve computational efficiency. Extensive experiments on two QoS datasets from real applications indicate that the proposed HC-FGNN offers both high prediction accuracy and privacy protection.



Article 787

Title@2025-07-07 (1): Kalman Filter Aided Federated Koopman Learning

Title: Kalman Filter Aided Federated Koopman Learning Kalman Filter Aided Federated Koopman Learning Kalman过滤器帮助联邦库普曼学习 2507.04808v1

Authors (2): Yutao Chen, Wei Chen

Real-time control and estimation are pivotal for applications such as industrial automation and future healthcare. The realization of this vision relies heavily on efficient interactions with nonlinear systems. Therefore, Koopman learning, which leverages the power of deep learning to linearize nonlinear systems, has been one of the most successful examples of mitigating the complexity inherent in nonlinearity. However, the existing literature assumes access to accurate system states and abundant high-quality data for Koopman analysis, which is usually impractical in real-world scenarios. To fill this void, this paper considers the case where only observations of the system are available and where the observation data is insufficient to accomplish an independent Koopman analysis. To this end, we propose Kalman Filter aided Federated Koopman Learning (KF-FedKL), which pioneers the combination of Kalman filtering and federated learning with Koopman analysis. By doing so, we can achieve collaborative linearization with privacy guarantees. Specifically, we employ a straightforward yet efficient loss function to drive the training of a deep Koopman network for linearization. To obtain system information devoid of individual information from observation data, we leverage the unscented Kalman filter and the unscented Rauch-Tung-Striebel smoother. To achieve collaboration between clients, we adopt the federated learning framework and develop a modified FedAvg algorithm to orchestrate the collaboration. A convergence analysis of the proposed framework is also presented. Finally, through extensive numerical simulations, we showcase the performance of KF-FedKL under various situations.
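
For orientation, the aggregation step of standard FedAvg can be sketched as below. The abstract states that the authors use a modified FedAvg but does not describe the modification, so this shows only the textbook version: a server average of client parameters weighted by local sample counts. The function name and the flat-list parameter layout are illustrative assumptions.

```python
def fed_avg(client_weights, client_sizes):
    """Textbook FedAvg aggregation (not the paper's modified variant).

    client_weights: list of flat parameter lists, one per client.
    client_sizes:   number of local samples held by each client.
    Returns the sample-weighted average parameter list.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

Only parameters and sample counts reach the server in this scheme; the raw observations stay on each client, which is what makes the collaboration privacy-preserving in spirit.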



Article 788

Title@2025-07-07 (1): Interpretable Machine Learning for Urban Heat Mitigation: Attribution and Weighting of Multi-Scale Drivers

Title: Interpretable Machine Learning for Urban Heat Mitigation: Attribution and Weighting of Multi-Scale Drivers Verdolmetschbares maschinelles Lernen für die urbane Wärmemilderung: Attribution und Gewichtung von Multi-Scale-Treibern 缓解城市热量的可解释性机器学习:多规模驱动因素的归属和加权 2507.04802v1

Authors (4): David Tschan, Zhi Wang, Jan Carmeliet, Yongling Zhao

Urban heat islands (UHIs) are often accentuated during heat waves (HWs) and pose a public health risk. Mitigating UHIs requires urban planners to first estimate how urban heat is influenced by different land use types (LUTs) and drivers across scales - from synoptic-scale climatic background processes to small-scale urban- and scale-bridging features. This study proposes to classify these drivers into driving (D), urban (U), and local (L) features, respectively. To increase interpretability and enhance computational efficiency, a LUT-distinguishing machine learning approach is proposed as a fast emulator for the Weather Research and Forecasting model coupled to a Single-Layer Urban Canopy Model (WRF-SLUCM) to predict ground- (TSK) and 2-meter air temperature (T2). Using random forests (RFs) with extreme gradient boosting (XGB) trained on WRF-SLUCM output over Zurich, Switzerland, during heatwave (HW) periods in 2017 and 2019, this study proposes LUT-based (LB) models that categorize features by scales and practical controllability, allowing optional categorical weighting. This approach enables category-specific feature ranking and sensitivity estimation of T2 and TSK to the most important small-scale drivers - most notably surface emissivity, albedo, and leaf area index (LAI). Models employing the LB framework are statistically significantly more accurate than models that do not, with higher performance when more HW data is included in training. With RF-XGB robustly performing optimally with unit weights, the method substantially increases interpretability. Despite the need to reduce statistical uncertainties and to test the method on other cities, the proposed approach offers urban planners a direct framework for feasibility-centered UHI mitigation assessment.



Article 789

Title@2025-07-07 (1): Training-Free Query Optimization via LLM-Based Plan Similarity

Title: Training-Free Query Optimization via LLM-Based Plan Similarity Schulungsfreie Abfrage Optimierung über LLM-basierte Plan-Ähnlichkeit 通过LLM-基于LLM-计划相似性实现无培训查询优化 2506.05853v2

Authors (3): Nikita Vasilenko, Alexander Demin, Vladimir Boorlakov

Large language model (LLM) embeddings offer a promising new avenue for database query optimization. In this paper, we explore how pre-trained execution plan embeddings can guide SQL query execution without the need for additional model training. We introduce LLM-PM (LLM-based Plan Mapping), a framework that embeds the default execution plan of a query, finds its k nearest neighbors among previously executed plans, and recommends database hintsets based on neighborhood voting. A lightweight consistency check validates the selected hint, while a fallback mechanism searches the full hint space when needed. Evaluated on the JOB-CEB benchmark using OpenGauss, LLM-PM achieves an average query latency reduction of 21%. This work highlights the potential of LLM-powered embeddings to deliver practical improvements in query performance and opens new directions for training-free, embedding-based optimizer guidance systems.
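
The nearest-neighbor voting step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: plan embeddings are plain lists of floats, similarity is cosine similarity, and each stored plan carries the hintset that worked best for it. The names `recommend_hintset` and `history` are hypothetical, and the consistency check and fallback search from the abstract are not modeled.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_hintset(query_embedding, history, k=3):
    """Majority vote over the k most similar previously executed plans.

    history: list of (plan_embedding, best_hintset) pairs.
    """
    if not history:
        raise ValueError("history must contain at least one executed plan")
    ranked = sorted(
        history,
        key=lambda item: cosine_similarity(query_embedding, item[0]),
        reverse=True,
    )
    votes = Counter(hintset for _, hintset in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because the lookup needs only stored embeddings and a similarity function, no model is trained or fine-tuned, which matches the training-free framing of the abstract.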



Article 790

Title@2025-07-07 (1): Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning

Title: Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning Interaction-Merged Motion Planning: Diverse Motion-Datensätze für robuste Planung effektiv nutzen 交互式组合式动态规划:有效利用多种移动式数据集进行强力规划 2507.04790v1

Authors (5): Giwon Lee, Wooseong Jeong, Daehee Park, Jaewoo Jeong, Kuk-Jin Yoon

Motion planning is a crucial component of autonomous robot driving. While various trajectory datasets exist, effectively utilizing them for a target domain remains challenging due to differences in agent interactions and environmental characteristics. Conventional approaches, such as domain adaptation or ensemble learning, leverage multiple source datasets but suffer from domain imbalance, catastrophic forgetting, and high computational costs. To address these challenges, we propose Interaction-Merged Motion Planning (IMMP), a novel approach that leverages parameter checkpoints trained on different domains during adaptation to the target domain. IMMP follows a two-step process: pre-merging to capture agent behaviors and interactions, sufficiently extracting diverse information from the source domain, followed by merging to construct an adaptable model that efficiently transfers diverse interactions to the target domain. Our method is evaluated on various planning benchmarks and models, demonstrating superior performance compared to conventional approaches.



Article 791

Title@2025-07-07 (1): Machine Learning from Explanations

Title: Machine Learning from Explanations Maschinelles Lernen aus Erklärungen 从解释中学习机器 2507.04788v1

Authors (2): Jiashu Tao, Reza Shokri

Acquiring and training on large-scale labeled data can be impractical due to cost constraints. Additionally, the use of small training datasets can result in considerable variability in model outcomes, overfitting, and learning of spurious correlations. A crucial shortcoming of data labels is their lack of any reasoning behind a specific label assignment, causing models to learn any arbitrary classification rule as long as it aligns data with labels. To overcome these issues, we introduce an innovative approach for training reliable classification models on smaller datasets, by using simple explanation signals such as important input features from labeled data. Our method centers around a two-stage training cycle that alternates between enhancing model prediction accuracy and refining its attention to match the explanations. This instructs models to grasp the rationale behind label assignments during their learning phase. We demonstrate that our training cycle expedites the convergence towards more accurate and reliable models, particularly for small, class-imbalanced training data, or data with spurious features.



Article 792

Title@2025-07-07 (1): Reason to Rote: Rethinking Memorization in Reasoning

Title: Reason to Rote: Rethinking Memorization in Reasoning Grund zu rotieren: Die Erinnerung an die Vernunft neu denken 旋转理由:重新思考在理由方面的记忆 2507.04782v1

Authors (6): Yupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, Barbara Plank

Large language models readily memorize arbitrary training instances, such as label noise, yet they perform strikingly well on reasoning tasks. In this work, we investigate how language models memorize label noise, and why such memorization in many cases does not heavily affect generalizable reasoning capabilities. Using two controllable synthetic reasoning datasets with noisy labels, four-digit addition (FDA) and two-hop relational reasoning (THR), we discover a reliance of memorization on generalizable reasoning mechanisms: models continue to compute intermediate reasoning outputs even when retrieving memorized noisy labels, and intervening reasoning adversely affects memorization. We further show that memorization operates through distributed encoding, i.e., aggregating various inputs and intermediate results, rather than building a look-up mechanism from inputs to noisy labels. Moreover, our FDA case study reveals memorization occurs via outlier heuristics, where existing neuron activation patterns are slightly shifted to fit noisy labels. Together, our findings suggest that memorization of label noise in language models builds on, rather than overrides, the underlying reasoning mechanisms, shedding light on the intriguing phenomenon of benign memorization.



Article 793

Title@2025-07-07 (1): FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift

Title: FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift FedPall: Prototypenbasiertes adversarisches und kollaboratives Lernen für Federated Learning mit Feature Drift FedPall: 以原型为基础的反versarial和协作学习,促进联邦学习与地物学 2507.04781v1

Authors (6): Yong Zhang, Feng Liang, Guanghu Yuan, Min Yang, Chengming Li, Xiping Hu

Federated learning (FL) enables collaborative training of a global model on a centralized server with data from multiple parties while preserving privacy. However, data heterogeneity can significantly degrade the performance of the global model when each party uses datasets from different sources to train a local model, thereby affecting personalized local models. Among various cases of data heterogeneity, feature drift, i.e., differences in feature space among parties, is prevalent in real-life data but remains largely unexplored. Feature drift can distract feature extraction learning in clients and thus lead to poor feature extraction and classification performance. To tackle the problem of feature drift in FL, we propose FedPall, an FL framework that utilizes prototype-based adversarial learning to unify feature spaces and collaborative learning to reinforce class information within the features. Moreover, FedPall leverages mixed features generated from global prototypes and local features to enhance the global classifier with classification-relevant information from a global perspective. Evaluation results on three representative feature-drifted datasets demonstrate FedPall’s consistently superior performance in classification with feature-drifted data in the FL scenario.



Article 794

Title@2025-07-07 (1): Sure Convergence and Constructive Universal Approximation for Multi-Layer Neural Networks

Title: Sure Convergence and Constructive Universal Approximation for Multi-Layer Neural Networks Sichere Konvergenz und konstruktive Universal-Annäherung für multi-layer-Neural-Netzwerke 多层神经神经网络的稳固趋同和建设性通用近似 2507.04779v1

Authors (1): Chien-Ming Chi

We propose a new neural network model, 01Neuro, built on indicator activation neurons. Its boosted variant possesses two key statistical properties: (1) Sure Convergence, where model optimization can be achieved with high probability given sufficient computational resources; and (2) Constructive Universal Approximation: In the infinite sample setting, the model can approximate any finite sum of measurable functions, each depending on only k out of p input features, provided the architecture is properly tuned. Unlike most universal approximation results that are agnostic to training procedures, our guarantees are directly tied to the model’s explicit construction and optimization algorithm. To improve prediction stability, we integrate stochastic training and bagging into the boosted 01Neuro framework. Empirical evaluations on simulated and real-world tabular datasets with small to medium sample sizes highlight its strengths: effective approximation of interaction components (multiplicative terms), stable prediction performance (comparable to Random Forests), robustness to many noisy features, and insensitivity to feature scaling. A major limitation of the current implementation of boosted 01Neuro is its higher computational cost, which is approximately 5 to 30 times that of Random Forests and XGBoost.



Article 795

Title@2025-07-07 (1): Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction

Title: Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction Verbesserung des BERT für symbolisches Musikverständnis mit Token Denoising und Pianoroll Prediction 改进使用托肯否认学和钢琴预测学的符号音乐理解BERT 2507.04776v1

Authors (2): Jun-You Wang, Li Su

We propose a pre-trained BERT-like model for symbolic music understanding that achieves competitive performance across a wide range of downstream tasks. To achieve this target, we design two novel pre-training objectives, namely token correction and pianoroll prediction. First, we sample a portion of note tokens and corrupt them with a limited amount of noise, and then train the model to denoise the corrupted tokens; second, we also train the model to predict bar-level and local pianoroll-derived representations from the corrupted note tokens. We argue that these objectives guide the model to better learn specific musical knowledge such as pitch intervals. For evaluation, we propose a benchmark that incorporates 12 downstream tasks ranging from chord estimation to symbolic genre classification. Results confirm the effectiveness of the proposed pre-training objectives on downstream tasks.
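
The token-corruption step described above can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not state the token vocabulary, the corruption rate, or the noise distribution, so the function `corrupt_pitches`, the 30% rate, and the shift of at most two semitones are hypothetical choices. The sketch perturbs a random subset of MIDI pitches by a small non-zero amount and keeps the positions so that a model could be trained to recover the originals.

```python
import random


def corrupt_pitches(pitches, rate=0.15, max_shift=2, seed=0):
    """Return (corrupted, positions): a copy of `pitches` in which a random
    subset of notes is shifted by a small, non-zero number of semitones.

    Hypothetical helper: rate and max_shift are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_corrupt = max(1, round(rate * len(pitches)))
    positions = sorted(rng.sample(range(len(pitches)), n_corrupt))
    shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
    corrupted = list(pitches)
    for i in positions:
        # Clamp to the valid MIDI pitch range 0..127.
        corrupted[i] = min(127, max(0, pitches[i] + rng.choice(shifts)))
    return corrupted, positions


melody = [60, 62, 64, 65, 67, 69, 71, 72, 74, 76]
noisy, where = corrupt_pitches(melody, rate=0.3)
# Denoising targets: the original pitches at the corrupted positions.
targets = [melody[i] for i in where]
```

Because the noise is bounded, the corrupted note stays close to the original, which is one plausible way to make the denoising task depend on pitch intervals rather than on coarse context alone.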



Article 796

Title@2025-07-07 (1): Integrating Biological and Machine Intelligence: Attention Mechanisms in Brain-Computer Interfaces

Title: Integrating Biological and Machine Intelligence: Attention Mechanisms in Brain-Computer Interfaces Integration biologischer und maschineller Intelligenz: Aufmerksamkeitsmechanismen in Gehirn-Computer-Schnittstellen 生物和机器综合情报:脑-计算机界面中的注意机制 2502.19281v2

Authors (7): Jiyuan Wang, Weishan Ye, Jialin He, Li Zhang, Gan Huang, Zhuliang Yu, Zhen Liang

With the rapid advancement of deep learning, attention mechanisms have become indispensable in electroencephalography (EEG) signal analysis, significantly enhancing Brain-Computer Interface (BCI) applications. This paper presents a comprehensive review of traditional and Transformer-based attention mechanisms, their embedding strategies, and their applications in EEG-based BCI, with a particular emphasis on multimodal data fusion. By capturing EEG variations across time, frequency, and spatial channels, attention mechanisms improve feature extraction, representation learning, and model robustness. These methods can be broadly categorized into traditional attention mechanisms, which typically integrate with convolutional and recurrent networks, and Transformer-based multi-head self-attention, which excels in capturing long-range dependencies. Beyond single-modality analysis, attention mechanisms also enhance multimodal EEG applications, facilitating effective fusion between EEG and other physiological or sensory data. Finally, we discuss existing challenges and emerging trends in attention-based EEG modeling, highlighting future directions for advancing BCI technology. This review aims to provide valuable insights for researchers seeking to leverage attention mechanisms for improved EEG interpretation and application.



Article 797

Title@2025-07-07 (1): Efficient Unlearning with Privacy Guarantees

Title: Efficient Unlearning with Privacy Guarantees Effizientes Lernen mit Datenschutzgarantien 以隐私保障高效率地不学习 2507.04771v1

Authors (3): Josep Domingo-Ferrer, Najeeb Jebreel, David Sánchez

Privacy protection laws, such as the GDPR, grant individuals the right to request the forgetting of their personal data not only from databases but also from machine learning (ML) models trained on them. Machine unlearning has emerged as a practical means to facilitate model forgetting of data instances seen during training. Although some existing machine unlearning methods guarantee exact forgetting, they are typically costly in computational terms. On the other hand, more affordable methods do not offer forgetting guarantees and are applicable only to specific ML models. In this paper, we present efficient unlearning with privacy guarantees (EUPG), a novel machine unlearning framework that offers formal privacy guarantees to individuals whose data are being unlearned. EUPG involves pre-training ML models on data protected using privacy models, and it enables efficient unlearning with the privacy guarantees offered by the privacy models in use. Through empirical evaluation on four heterogeneous data sets protected with $k$-anonymity and $\epsilon$-differential privacy as privacy models, our approach demonstrates utility and forgetting effectiveness comparable to those of exact unlearning methods, while significantly reducing computational and storage costs. Our code is available at https://github.com/najeebjebreel/EUPG.
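
As a reminder of what $k$-anonymity requires of the protected data, here is a minimal sketch of a checker. It is not taken from the EUPG code base: the function name `is_k_anonymous`, the column names, and the toy records are all hypothetical. It only encodes the standard definition that every combination of quasi-identifier values must be shared by at least $k$ records.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized toy table.
records = [
    {"age": "30-39", "zip": "430**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "430**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "431**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "431**", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], k=2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], k=3)
```

In this toy table each (age, zip) group has two members, so the check passes for k=2 and fails for k=3.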



Article 798

Title@2025-07-07 (1): ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems

Title: ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems ABench-Physics: Benchmarking Physical Reasoning in LLMs über hochschwierige und dynamische Physikprobleme ABench-物理:通过高难度和动态物理问题确定LLM中物理推理的基准 2507.04766v1

Authors (12): Yiming Zhang, Yingfan Ma, Yanmei Gu, Zhengkai Yang, Yihong Zhuang, Feng Wang, Zenan Huang, Yuanyuan Wang, Chao Huang, Bowen Song, Cheng Lin, Junbo Zhao

Large Language Models (LLMs) have shown impressive performance in domains such as mathematics and programming, yet their capabilities in physics remain underexplored and poorly understood. Physics poses unique challenges that demand not only precise computation but also deep conceptual understanding and physical modeling skills. Existing benchmarks often fall short due to limited difficulty, multiple-choice formats, and static evaluation settings that fail to capture physical modeling ability. In this paper, we introduce ABench-Physics, a novel benchmark designed to rigorously evaluate LLMs’ physical reasoning and generalization capabilities. ABench-Physics consists of two components: Phy_A, a static set of 400 graduate- or Olympiad-level problems; and Phy_B, a dynamic subset of 100 problems equipped with an automatic variation engine to test model robustness across changing conditions. All questions require precise numerical answers, with strict formatting and tolerance constraints. Our evaluation of several state-of-the-art LLMs reveals substantial performance gaps, highlighting persistent limitations in physical reasoning, especially in generalization to dynamic variants. ABench-Physics provides a challenging and diagnostic framework for advancing scientific reasoning in LLMs.
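
Grading numerical answers under a tolerance constraint can be sketched as follows. The abstract does not give the benchmark's actual tolerance or formatting rules, so the 1% relative tolerance and the helper name `is_correct` are assumptions chosen only to show the idea.

```python
import math


def is_correct(predicted, reference, rel_tol=0.01):
    """Accept a numerical answer if it is within a relative tolerance of the
    reference value. rel_tol=0.01 (1%) is an illustrative assumption."""
    return math.isclose(predicted, reference, rel_tol=rel_tol)


close_enough = is_correct(9.81, 9.80)   # differs by about 0.1%
too_far = is_correct(9.00, 9.80)        # differs by about 8%
```

A purely relative tolerance rejects every non-exact answer when the reference is zero; a real grader would likely need an absolute tolerance for that case (the `abs_tol` argument of `math.isclose`).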



Article 799

Title@2025-07-07 (1): Intervening to learn and compose disentangled representations

Title: Intervening to learn and compose disentangled representations Intervenieren, um entfremdete Darstellungen zu lernen und zu komponieren 干预学习和组成分解的表述 2507.04754v1

Authors (5): Alex Markham, Jeri A. Chang, Isaac Hirsch, Liam Solus, Bryon Aragam

In designing generative models, it is commonly believed that in order to learn useful latent structure, we face a fundamental tension between expressivity and structure. In this paper we challenge this view by proposing a new approach to training arbitrarily expressive generative models that simultaneously learn disentangled latent structure. This is accomplished by adding a simple decoder-only module to the head of an existing decoder block that can be arbitrarily complex. The module learns to process concept information by implicitly inverting linear representations from an encoder. Inspired by the notion of intervention in causal graphical models, our module selectively modifies its architecture during training, allowing it to learn a compact joint model over different contexts. We show how adding this module leads to disentangled representations that can be composed for out-of-distribution generation. To further validate our proposed approach, we prove a new identifiability result that extends existing work on identifying structured representations in nonlinear models.



Article 800

Title@2025-07-07 (1): Enhancing variational quantum algorithms by balancing training on classical and quantum hardware

Title: Enhancing variational quantum algorithms by balancing training on classical and quantum hardware Verbesserung von variationalen Quantenalgorithmen durch Ausbalancieren von Trainings auf klassischer und Quantenhardware 通过平衡传统硬件和量子硬件的培训,加强变异量算法 2503.16361v2

Authors (6): Rahul Bhowmick, Harsh Wadhwa, Avinash Singh, Tania Sidana, Quoc Hoan Tran, Krishna Kumar Sabapathy

Quantum computers offer a promising route to tackling problems that are classically intractable such as in prime-factorization, solving large-scale linear algebra and simulating complex quantum systems, but potentially require fault-tolerant quantum hardware. On the other hand, variational quantum algorithms (VQAs) are a promising approach for leveraging near-term quantum computers to solve complex problems. However, there remain major challenges in their trainability and resource costs on quantum hardware. Here we address these challenges by adopting Hardware Efficient and dynamical LIe algebra supported Ansatz (HELIA), and propose two training methods that combine an existing classical-enhanced g-sim method and the quantum-based Parameter-Shift Rule (PSR). Our improvement comes from distributing the resources required for gradient estimation and training to both classical and quantum hardware. We numerically evaluate our approach for ground-state estimation of 6 to 18-qubit Hamiltonians using the Variational Quantum Eigensolver (VQE) and quantum phase classification for up to 12-qubit Hamiltonians using quantum neural networks. For VQE, our method achieves higher accuracy and success rates, with an average reduction in quantum hardware calls of up to 60% compared to purely quantum-based PSR. For classification, we observe test accuracy improvements of up to 2.8%. We also numerically demonstrate the capability of HELIA in mitigating barren plateaus, paving the way for training large-scale quantum models.
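
For readers unfamiliar with the Parameter-Shift Rule mentioned above, the following sketch shows it on the simplest possible case. It is a classical simulation under stated assumptions, not the paper's method: the expectation value of Pauli-Z after an RX(theta) rotation of the |0> state equals cos(theta), and for such a gate the exact gradient is obtained from two evaluations shifted by plus and minus pi/2. The function names are hypothetical.

```python
import math


def expectation(theta):
    """<Z> after applying RX(theta) to |0>, which equals cos(theta).
    On hardware this value would be estimated from repeated measurements."""
    return math.cos(theta)


def parameter_shift_gradient(f, theta):
    """Exact gradient via the parameter-shift rule with shifts of +/- pi/2."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2.0


theta = 0.7
grad = parameter_shift_gradient(expectation, theta)
analytic = -math.sin(theta)  # derivative of cos(theta)
```

Each gradient component costs two circuit evaluations, which is why moving part of the gradient estimation to classical hardware can reduce the number of quantum hardware calls.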



Article 801

Title@2025-07-07 (1): Improving Graph Out-of-distribution Generalization Beyond Causality

Title: Improving Graph Out-of-distribution Generalization Beyond Causality Verbesserung der Graphen-Verallgemeinerung außerhalb der Verteilung jenseits der Kausalität 改善超出原因范围的普遍化的分布外图 2407.10204v3

Authors (7): Can Xu, Yao Cheng, Jianxiang Yu, Haosen Wang, Jingsong Lv, Yao Liu, Xiang Li

Existing methods for graph out-of-distribution (OOD) generalization primarily rely on empirical studies on synthetic datasets. Such approaches tend to overemphasize the causal relationships between invariant sub-graphs and labels, thereby neglecting the non-negligible role of environment in real-world scenarios. In contrast to previous studies that impose rigid independence assumptions on environments and invariant sub-graphs, this paper presents the theorems of environment-label dependency and mutable rationale invariance, where the former characterizes the usefulness of environments in determining graph labels while the latter refers to the mutable importance of graph rationales. Based on analytic investigations, a novel variational inference based method named “Probability Dependency on Environments and Rationales for OOD Graphs on Real-world Data” (DEROG) is introduced. To alleviate the adverse effect of unknown prior knowledge on environments and rationales, DEROG utilizes generalized Bayesian inference. Further, DEROG employs an EM-based algorithm for optimization. Finally, extensive experiments on real-world datasets under different distribution shifts are conducted to show the superiority of DEROG. Our code is publicly available at https://github.com/LEOXC1571/DEROG.



Article 802

Title@2025-07-07 (1): Enjoying Non-linearity in Multinomial Logistic Bandits

Title: Enjoying Non-linearity in Multinomial Logistic Bandits Nichtlinearität in multinomialen Logistischen Banditen genießen 享受多种后勤大盗的非线性 2507.05306v1

Authors (3): Pierre Boudart, Pierre Gaillard, Alessandro Rudi

We consider the multinomial logistic bandit problem, a variant of generalized linear bandits where a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on understanding the impact of the non-linearity of the logistic model (Faury et al., 2020; Abeille et al., 2021). They introduced a problem-dependent constant $\kappa_*$, that may be exponentially large in some problem parameters and which is captured by the derivative of the sigmoid function. It encapsulates the non-linearity and improves existing regret guarantees over $T$ rounds from $O(d\sqrt{T})$ to $O(d\sqrt{T/\kappa_*})$, where $d$ is the dimension of the parameter space. We extend their analysis to the multinomial logistic bandit framework, making it suitable for complex applications with more than two choices, such as reinforcement learning or recommender systems. To achieve this, we extend the definition of $\kappa_*$ to the multinomial setting and propose an efficient algorithm that leverages the problem’s non-linearity. Our method yields a problem-dependent regret bound of order $\widetilde{\mathcal{O}}(Kd\sqrt{T/\kappa_*})$, where $K$ is the number of actions and $\kappa_* \ge 1$. This improves upon the best existing guarantees of order $\widetilde{\mathcal{O}}(Kd\sqrt{T})$. Moreover, we provide a $\Omega(d\sqrt{T/\kappa_*})$ lower-bound, showing that our dependence on $\kappa_*$ is optimal.



Article 803

Title@2025-07-07 (1): CueLearner: Bootstrapping and local policy adaptation from relative feedback

Title: CueLearner: Bootstrapping and local policy adaptation from relative feedback CueLearner: Bootstrapping und lokale Politikanpassung durch relatives Feedback CueLearner:利用相对反馈推动和调整地方政策 2507.04730v1

Authors (4): Giulio Schiavi, Andrei Cramariuc, Lionel Ott, Roland Siegwart

Human guidance has emerged as a powerful tool for enhancing reinforcement learning (RL). However, conventional forms of guidance such as demonstrations or binary scalar feedback can be challenging to collect or have low information content, motivating the exploration of other forms of human input. Among these, relative feedback (i.e., feedback on how to improve an action, such as “more to the left”) offers a good balance between usability and information richness. Previous research has shown that relative feedback can be used to enhance policy search methods. However, these efforts have been limited to specific policy classes and use feedback inefficiently. In this work, we introduce a novel method to learn from relative feedback and combine it with off-policy reinforcement learning. Through evaluations on two sparse-reward tasks, we demonstrate our method can be used to improve the sample efficiency of reinforcement learning by guiding its exploration process. Additionally, we show it can adapt a policy to changes in the environment or the user’s preferences. Finally, we demonstrate real-world applicability by employing our approach to learn a navigation policy in a sparse reward setting.



Article 804

Title@2025-07-07 (1): Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization

Title: Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization Textentgiftung: Dateneffizienz, Semantische Erhaltung und Modellverallgemeinerung 文字解毒:数据效率、语义保护和模式通用 2507.01050v2

Authors (7): Jing Yu, Yibo Zhao, Jiapeng Zhu, Wenming Shao, Bo Pang, Zhao Zhang, Xiang Li

The widespread dissemination of toxic content on social media poses a serious threat to both online environments and public discourse, highlighting the urgent need for detoxification methods that effectively remove toxicity while preserving the original semantics. However, existing approaches often struggle to simultaneously achieve strong detoxification performance, semantic preservation, and robustness to out-of-distribution data. Moreover, they typically rely on costly, manually annotated parallel corpora while showing poor data efficiency. To address these challenges, we propose a two-stage training framework that jointly optimizes for data efficiency, semantic preservation, and model generalization. We first perform supervised fine-tuning on a small set of high-quality, filtered parallel data to establish a strong initialization. Then, we leverage unlabeled toxic inputs and a custom-designed reward model to train the LLM using Group Relative Policy Optimization. Experimental results demonstrate that our method effectively mitigates the trade-offs faced by previous work, achieving state-of-the-art performance with improved generalization and significantly reduced dependence on annotated data. Our code is available at: https://github.com/allacnobug/Detoxification-of-Text.
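
The core idea behind Group Relative Policy Optimization, scoring each sampled output relative to the other outputs for the same input, can be sketched in a few lines. This is an illustrative assumption rather than the authors' implementation: the reward values are made up, and the use of the population standard deviation with a small `eps` is one common choice among several.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of sampled outputs to zero mean and
    (approximately) unit variance, so no learned value baseline is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical reward-model scores for four detoxified rewrites of one input.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
best = max(range(len(rewards)), key=lambda i: advantages[i])
```

Outputs scoring above the group mean receive positive advantages and are reinforced; those below the mean are discouraged.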



Article 805

Title@2025-07-07 (1): Data Matters: The Case of Predicting Mobile Cellular Traffic

Title: Data Matters: The Case of Predicting Mobile Cellular Traffic Datenangelegenheiten: Der Fall der Vorhersage des mobilen Zellulären Verkehrs 数据事项:预测移动电话流量的案例 2411.02418v2

Authors (3): Natalia Vesselinova, Matti Harjula, Pauliina Ilmonen

Accurate predictions of base stations’ traffic load are essential to mobile cellular operators and their users as they support the efficient use of network resources and allow delivery of services that sustain smart cities and roads. Traditionally, cellular network time-series have been considered for this prediction task. More recently, exogenous factors such as points of interest and other environmental knowledge have been explored too. In contrast to incorporating external factors, we propose to learn the processes underlying cellular load generation by employing population dynamics data. In this study, we focus on smart roads and use road traffic measures to improve prediction accuracy. Comprehensive experiments demonstrate that by employing road flow and speed, in addition to cellular network metrics, base station load prediction errors can be substantially reduced, by as much as 56.5%. The code, visualizations and extensive results are available at https://github.com/nvassileva/DataMatters.



Article 806

Title@2025-07-07 (1): Dementia Detection using Multi-modal Methods on Audio Data

Title: Dementia Detection using Multi-modal Methods on Audio Data Demenzerkennung mit multimodalen Methoden auf Audiodaten 使用多模式音频数据方法检测痴呆症 2501.00465v2

Authors (8): Saugat Kannojia, Anirudh Praveen, Danish Vasdev, Saket Nandedkar, Divyansh Mittal, Sarthak Kalankar, Shaurya Johari, Vipul Arora

Dementia is a common neurodegenerative disease that causes gradual cognitive impairment and is the subject of extensive research into prevention and cures every year. It severely impacts the patient’s ability to remember events and communicate clearly. Most variants have no known cure, but early detection can help alleviate symptoms before they become worse. One of the main symptoms of dementia is difficulty in expressing ideas through speech. This paper presents a model developed to predict the onset of the disease using audio recordings from patients. The ASR-based model generates transcripts from the audio files using the Whisper model and then applies a RoBERTa regression model to generate an MMSE score for the patient. This score can be used to predict the extent to which the cognitive ability of a patient has been affected. We use the PROCESS_V1 dataset for this task, which was introduced through the PROCESS Grand Challenge 2025. The model achieved an RMSE of 2.6911, which is around 10 percent lower than the described baseline.
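
The reported metric, root-mean-square error between predicted and clinical MMSE scores, is computed as in the sketch below. The scores are invented for illustration and are not from the PROCESS_V1 dataset; MMSE scores range from 0 to 30.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equally long score lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical regression outputs and clinician-assigned MMSE scores.
predicted_mmse = [27.0, 22.0, 18.0, 29.0]
clinical_mmse = [28.0, 20.0, 21.0, 29.0]
error = rmse(predicted_mmse, clinical_mmse)
```

Because errors are squared before averaging, a few large misses on severely affected patients weigh more heavily than many small ones.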



Article 807

Title@2025-07-07 (1): Uncertainty in Real-Time Semantic Segmentation on Embedded Systems

Title: Uncertainty in Real-Time Semantic Segmentation on Embedded Systems Unsicherheit in der Echtzeit-Semantik-Segmentierung auf Embedded-Systemen 内嵌系统实时塞义分割的不确定性 2301.01201v5

Authors (2): Ethan Goan, Clinton Fookes

Applications of semantic segmentation models in areas such as autonomous vehicles and human-computer interaction require real-time predictive capabilities. The challenges of real-time operation are amplified by the need to run on resource-constrained hardware. Whilst development of real-time methods for these platforms has increased, these models are unable to sufficiently reason about the uncertainty present when applied on embedded real-time systems. This paper addresses this by combining deep feature extraction from pre-trained models with Bayesian regression and moment propagation for uncertainty-aware predictions. We demonstrate how the proposed method can yield meaningful epistemic uncertainty on embedded hardware in real time whilst maintaining predictive performance.



Article 808

Title@2025-07-07 (1): Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs

Title: Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs 正式/非正式陈述和正式/非正式证明正式理由理由完全基准辩护人 2507.04719v1

Authors (2): Roozbeh Yousefzadeh, Xuenan Cao

This position paper provides a critical but constructive discussion of current benchmarking and evaluation practices in the field of formal reasoning and automated theorem proving. We take the position that open code, open data, and benchmarks that are complete and error-free will accelerate progress in this field. We identify practices that create barriers to contributing to this field and suggest ways to remove them. We also discuss some of the practices that might produce misleading evaluative information. We aim to create discussions that bring together people from various groups contributing to automated theorem proving, autoformalization, and informal reasoning.



Article 809

Title@2025-07-07 (1): On the quality of randomized approximations of Tukey’s depth

Title: On the quality of randomized approximations of Tukey’s depth Zur Qualität der randomisierten Näherung der Tukey-Tiefe Tukey深度随机近似质量 2309.05657v3

Authors (3): Simon Briend, Gábor Lugosi, Roberto Imbuzeiro Oliveira

Tukey’s depth (or halfspace depth) is a widely used measure of centrality for multivariate data. However, exact computation of Tukey’s depth is known to be a hard problem in high dimensions. As a remedy, randomized approximations of Tukey’s depth have been proposed. In this paper we explore when such randomized algorithms return a good approximation of Tukey’s depth. We study the case when the data are sampled from a log-concave isotropic distribution. We prove that, if one requires that the algorithm runs in polynomial time in the dimension, the randomized algorithm correctly approximates the maximal depth $1/2$ and depths close to zero. On the other hand, for any point of intermediate depth, any good approximation requires exponential complexity.
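
A randomized approximation of the kind discussed above can be sketched by minimizing over a finite set of random directions instead of all directions. This is a generic illustration, not the specific algorithm analyzed in the paper; the function name, the number of directions, and the toy data are assumptions. Since the minimum is taken over fewer directions, the result can only overestimate the true depth.

```python
import random


def random_tukey_depth(x, points, n_directions=200, seed=0):
    """Approximate the halfspace depth of x by the smallest fraction of points
    lying in a closed halfspace through x, over random Gaussian directions."""
    rng = random.Random(seed)
    n = len(points)
    depth = 1.0
    for _ in range(n_directions):
        # Only the sign of each projection matters, so u is not normalized.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        proj = [sum(ui * (pi - xi) for ui, pi, xi in zip(u, p, x)) for p in points]
        above = sum(1 for v in proj if v >= 0) / n
        below = sum(1 for v in proj if v <= 0) / n
        depth = min(depth, above, below)
    return depth


square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
center_depth = random_tukey_depth((0.0, 0.0), square)      # deepest point
far_depth = random_tukey_depth((100.0, 100.0), square)     # far outside the data
```

The two easy regimes in this toy example, the maximal depth and a depth of zero, mirror the cases the abstract says are approximated correctly in polynomial time; the hard case is a point of intermediate depth, where few random directions come close to the minimizing one.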



Article 810

Title@2025-07-07 (1): Optimal Model Selection for Conformalized Robust Optimization

Title: Optimal Model Selection for Conformalized Robust Optimization Optimale Modellauswahl für konformisierte Robuste Optimierung 正规化强力优化最佳模型选择最佳模型选择 2507.04716v1

Authors (5): Yajie Bao, Yang Hu, Haojie Ren, Peng Zhao, Changliang Zou

In decision-making under uncertainty, Contextual Robust Optimization (CRO) provides reliability by minimizing the worst-case decision loss over a prediction set, hedging against label variability. While recent advances use conformal prediction to construct prediction sets for machine learning models, the downstream decisions critically depend on model selection. This paper introduces novel model selection frameworks for CRO that unify robustness control with decision risk minimization. We first propose Conformalized Robust Optimization with Model Selection (CROMS), which automatically selects models to approximately minimize the average decision risk in CRO solutions. We develop two algorithms: E-CROMS, which is computationally efficient, and F-CROMS, which enjoys a marginal robustness guarantee in finite samples. Further, we introduce Conformalized Robust Optimization with Individualized Model Selection (CROiMS), which performs individualized model selection by minimizing the conditional decision risk given the covariate of test data. This framework advances conformal prediction methodology by enabling covariate-aware model selection. Theoretically, CROiMS achieves asymptotic conditional robustness and decision efficiency under mild assumptions. Numerical results demonstrate significant improvements in decision efficiency and robustness across diverse synthetic and real-world applications, outperforming baseline approaches.
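
The basic pipeline that the proposed frameworks build on, a conformal prediction set followed by a worst-case decision over that set, can be sketched for a single fixed model. This is not CROMS or CROiMS: it omits model selection entirely, and the function names, residuals, candidate decisions, and absolute-error loss are illustrative assumptions. The radius uses the standard split-conformal rank ceil((n+1)(1-alpha)).

```python
import math


def conformal_radius(calibration_residuals, alpha):
    """Split-conformal quantile of absolute residuals at miscoverage alpha."""
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration_residuals)[rank - 1]


def robust_decision(candidates, point_prediction, radius, loss):
    """Pick the candidate minimizing the worst-case loss over the interval
    [prediction - radius, prediction + radius]. Assumes the loss is convex in
    the label, so the worst case is attained at an interval endpoint."""
    lo, hi = point_prediction - radius, point_prediction + radius
    return min(candidates, key=lambda z: max(loss(z, lo), loss(z, hi)))


# Hypothetical absolute residuals |y - f(x)| on a calibration set.
residuals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
radius = conformal_radius(residuals, alpha=0.2)
decision = robust_decision([4.0, 5.0, 6.0], 5.0, radius, lambda z, y: abs(z - y))
```

A model with smaller calibration residuals yields a tighter set and therefore a less conservative decision, which is why the choice of model matters for the downstream decision risk.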



Article 811

Title@2025-07-07 (1): Learning Maximal Safe Sets Using Hypernetworks for MPC-based Local Trajectory Planning in Unknown Environments

Title: Learning Maximal Safe Sets Using Hypernetworks for MPC-based Local Trajectory Planning in Unknown Environments Maximal sichere Sets mit Hypernetworks für MPC-basierte lokale Flugbahnplanung in unbekannten Umgebungen lernen 为未知环境中基于 MPC 的本地轨迹规划使用超网络学习最大安全设置 2410.20267v3

Authors (4): Bojan Derajić, Mohamed-Khalil Bouzidi, Sebastian Bernhard, Wolfgang Hönig

This paper presents a novel learning-based approach for online estimation of maximal safe sets for local trajectory planning in unknown static environments. The neural representation of a set is used as the terminal set constraint for a model predictive control (MPC) local planner, resulting in improved recursive feasibility and safety. To achieve real-time performance and desired generalization properties, we employ the idea of hypernetworks. We use the Hamilton-Jacobi (HJ) reachability analysis as the source of supervision during the training process, allowing us to consider general nonlinear dynamics and arbitrary constraints. The proposed method is extensively evaluated against relevant baselines in simulations for different environments and robot dynamics. The results show an increase in success rate of up to 52% compared to the best baseline while maintaining comparable execution speed. Additionally, we deploy our proposed method, NTC-MPC, on a physical robot and demonstrate its ability to safely avoid obstacles in scenarios where the baselines fail.



Article 812

Title@2025-07-07 (1): Mutual Information Optimal Control of Discrete-Time Linear Systems

Title: Mutual Information Optimal Control of Discrete-Time Linear Systems Gegenseitige Information Optimale Kontrolle von Diskretzeit-Linearsystemen Discrete-Ti时线性系统最佳最佳控制 2507.04712v1

Authors (2): Shoju Enami, Kenji Kashima

In this paper, we formulate a mutual information optimal control problem (MIOCP) for discrete-time linear systems. This problem can be regarded as an extension of a maximum entropy optimal control problem (MEOCP). Unlike the MEOCP, where the prior is fixed to the uniform distribution, the MIOCP optimizes the policy and prior simultaneously. As analytical results, with the policy and prior classes restricted to Gaussian distributions, we derive the optimal policy of the MIOCP with the prior fixed and the optimal prior with the policy fixed. Using these results, we propose an alternating minimization algorithm for the MIOCP. Through numerical experiments, we discuss how our proposed algorithm works.



Article 813

Title@2025-07-07 (1): Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model

Title: Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model Geometrisch-geführte Wenig-heiße Dentalmark-Erkennung mit human-zentrischem Stiftungsmodell 使用人类研究中心基金会模型的几口热牙科地标探测 2507.04710v1

Authors (7): Anbang Wang, Marawan Elbatel, Keyuan Liu, Lizhuo Lin, Meng Lan, Yanqi Yang, Xiaomeng Li

Accurate detection of anatomic landmarks is essential for assessing alveolar bone and root conditions, thereby optimizing clinical outcomes in orthodontics, periodontics, and implant dentistry. Manual annotation of landmarks on cone-beam computed tomography (CBCT) by dentists is time-consuming, labor-intensive, and subject to inter-observer variability. Deep learning-based automated methods present a promising approach to streamline this process efficiently. However, the scarcity of training data and the high cost of expert annotations hinder the adoption of conventional deep learning techniques. To overcome these challenges, we introduce GeoSapiens, a novel few-shot learning framework designed for robust dental landmark detection using limited annotated CBCT of anterior teeth. Our GeoSapiens framework comprises two key components: (1) a robust baseline adapted from Sapiens, a foundational model that has achieved state-of-the-art performance in human-centric vision tasks, and (2) a novel geometric loss function that improves the model’s capacity to capture critical geometric relationships among anatomical structures. Experiments conducted on our collected dataset of anterior teeth landmarks revealed that GeoSapiens surpassed existing landmark detection methods, exceeding the leading approach by 8.18% in success detection rate at a strict 0.5 mm threshold, a standard widely recognized in dental diagnostics. Code is available at: https://github.com/xmed-lab/GeoSapiens.
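
The success detection rate at a distance threshold can be computed as in the sketch below. It assumes the commonly used definition, the fraction of landmarks whose predicted position lies within the threshold of the annotated position; the abstract does not spell out its exact definition, and the coordinates here are invented.

```python
import math


def success_detection_rate(predicted, ground_truth, threshold_mm=0.5):
    """Fraction of landmarks whose Euclidean error is within threshold_mm."""
    hits = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= threshold_mm
    )
    return hits / len(ground_truth)


# Hypothetical landmark coordinates in millimetres (x, y, z).
truth = [(10.0, 12.0, 5.0), (20.0, 8.0, 7.5), (15.0, 15.0, 9.0), (30.0, 4.0, 2.0)]
preds = [(10.2, 12.1, 5.0), (20.0, 8.9, 7.5), (15.0, 15.3, 9.0), (30.4, 4.2, 2.0)]
sdr = success_detection_rate(preds, truth)
```

Three of the four toy predictions fall within 0.5 mm; the second misses by 0.9 mm, giving a rate of 0.75.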



Article 814

Title@2025-07-07 (1): Spooky Action at a Distance: Normalization Layers Enable Side-Channel Spatial Communication

Title: Spooky Action at a Distance: Normalization Layers Enable Side-Channel Spatial Communication Spooky Action auf Distanz: Normalisierungsschichten aktivieren die räumliche Kommunikation mit Seitenkanal 远距离恐怖行动:普通化层使侧通道空间通信得以实现 2507.04709v1

Authors (4): Samuel Pfrommer, George Ma, Yixiao Huang, Somayeh Sojoudi

This work shows that normalization layers can facilitate a surprising degree of communication across the spatial dimensions of an input tensor. We study a toy localization task with a convolutional architecture and show that normalization layers enable an iterative message passing procedure, allowing information aggregation from well outside the local receptive field. Our results suggest that normalization layers should be employed with caution in applications such as diffusion-based trajectory generation, where maintaining a spatially limited receptive field is crucial.
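
The effect can be seen in a toy one-dimensional example. This sketch is not the paper's localization task; the kernel, the signal, and the function names are assumptions. A width-3 convolution keeps the first output independent of the last input, but once the activations are normalized with statistics computed across all spatial positions (as instance or group normalization does), a change at the far end of the signal alters the first output.

```python
import math


def conv3(signal, kernel=(0.25, 0.5, 0.25)):
    """Width-3 convolution with zero padding: each output sees 3 inputs only."""
    padded = [0.0] + list(signal) + [0.0]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def spatial_normalize(signal, eps=1e-5):
    """Normalize using the mean and variance taken over all spatial positions."""
    mean = sum(signal) / len(signal)
    var = sum((v - mean) ** 2 for v in signal) / len(signal)
    return [(v - mean) / math.sqrt(var + eps) for v in signal]


base = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
probe = list(base)
probe[-1] = 10.0  # perturb only the last position

# Change observed at position 0, far outside the convolution's receptive field.
conv_only_shift = abs(conv3(probe)[0] - conv3(base)[0])
with_norm_shift = abs(
    spatial_normalize(conv3(probe))[0] - spatial_normalize(conv3(base))[0]
)
```

The shared mean and variance act as a global channel: every position can read a summary of every other position in a single layer.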



Article 815

Title@2025-07-07 (1): UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization

Title: UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization UrbanMind: Auf dem Weg zu urbaner Allgemeinintelligenz durch Tool-Enhanced Retrieval-Augmented Generation und Multilevel-Optimierung 城市中心:通过工具增强的回收型和多层次优化实现城市一般情报 2507.04706v1

Authors (7): Kai Yang, Zelin Zhu, Chengtao Jian, Hui Ma, Shengjie Zhao, Xiaozhou Ye, Ye Ouyang

Urban general intelligence (UGI) refers to the capacity of AI systems to autonomously perceive, reason, and act within dynamic and complex urban environments. In this paper, we introduce UrbanMind, a tool-enhanced retrieval-augmented generation (RAG) framework designed to facilitate UGI. Central to UrbanMind is a novel architecture based on Continual Retrieval-Augmented MoE-based LLM (C-RAG-LLM), which dynamically incorporates domain-specific knowledge and evolving urban data to support long-term adaptability. The architecture of C-RAG-LLM aligns naturally with a multilevel optimization framework, where different layers are treated as interdependent sub-problems. Each layer has distinct objectives and can be optimized either independently or jointly through a hierarchical learning process. The framework is highly flexible, supporting both end-to-end training and partial layer-wise optimization based on resource or deployment constraints. To remain adaptive under data drift, it is further integrated with an incremental corpus updating mechanism. Evaluations on real-world urban tasks of varying complexity verify the effectiveness of the proposed framework. This work presents a promising step toward the realization of general-purpose LLM agents in future urban environments.



Article 816

Title@2025-07-07 (1): Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

Title: Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards Balance Act: Priorisierungsstrategien für LLM-Designed Restless Bandit Rewards 平衡法:LLM设计的无休止老虎机奖励的优先策略 2408.12112v4

Authors (4): Shresth Verma, Niclas Boehmer, Lingkai Kong, Milind Tambe

LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.



Article 817

Title@2025-07-07 (1): FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs

Title: FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs FAMOUS: Flexibler Beschleuniger für den Aufmerksamkeitsmechanismus des Transformators auf UltraScale+ FPGAs FAMOUS: 超标准+FPGAs变异器注意机制灵活加速器 2409.14023v3

Authors (6): Ehsan Kabir, Md. Arafat Kabir, Austin R. J. Downey, Jason D. Bakos, David Andrews, Miaoqing Huang

Transformer neural networks (TNNs) are being applied across a widening range of application domains, including natural language processing (NLP), machine translation, and computer vision (CV). Their popularity is largely attributed to the exceptional performance of their multi-head self-attention blocks when analyzing sequential data and extracting features. To date, few hardware accelerators have been tailored for this mechanism, which is the first step before designing an accelerator for a complete model. This paper proposes FAMOUS, a flexible hardware accelerator for dense multi-head attention (MHA) computation of TNNs on field-programmable gate arrays (FPGAs). It is optimized for high utilization of processing elements and on-chip memories to improve parallelism and reduce latency. An efficient tiling of large matrices has been employed to distribute memory and computing resources across different modules on various FPGA platforms. The design is evaluated on Xilinx Alveo U55C and U200 data center cards containing UltraScale+ FPGAs. Experimental results show that, on the U55C, it attains a maximum throughput of 328 giga operations per second (GOPS), a maximum of 8 parallel attention heads, a maximum embedding dimension of 768, and a maximum tile size of 64. Furthermore, it is 3.28$\times$ and 2.6$\times$ faster than the Intel Xeon Gold 5220R CPU and NVIDIA V100 GPU respectively. It is also 1.3$\times$ faster than the fastest state-of-the-art FPGA-based accelerator.
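
The tiling idea mentioned above can be illustrated in software. The following is a minimal Python sketch of a tiled matrix product, such as the score computation $QK^T$ in attention. It is not the FAMOUS hardware design: the function name `tiled_matmul` and its loop order are assumptions made for illustration only. The default tile size of 64 matches the maximum tile size the abstract reports for the U55C.

```python
def tiled_matmul(a, b, tile=64):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows,
    by iterating over tile-sized blocks. Illustrative sketch only."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    out = [[0.0] * m for _ in range(n)]
    # Outer loops walk over blocks; each block touches at most
    # tile x tile entries of a, b and out at a time.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Inner loops accumulate the partial product of one block.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_out = a[i], out[i]
                    for kk in range(k0, min(k0 + tile, k)):
                        a_ik, row_b = row_a[kk], b[kk]
                        for j in range(j0, min(j0 + tile, m)):
                            row_out[j] += a_ik * row_b[j]
    return out
```

Calling `tiled_matmul(q, k_transposed, tile=64)` gives the same result as an untiled product. Blocking only changes the order in which partial sums are accumulated, which is what lets a design keep each working block in fast local memory.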



Article 818

Title@2025-07-07 (1): The Neural Networks with Tensor Weights and the Corresponding Fermionic Quantum Field Theory

Title: The Neural Networks with Tensor Weights and the Corresponding Fermionic Quantum Field Theory Die neuralen Netzwerke mit Tensorgewichten und die korrespondierende Fermionische Quantenfeldtheorie 具有Tensor重量的神经网络和相应的Fermionic 量子场理论 2507.05303v1

Authors (2): Guojun Huang, Kai Zhou

In this paper, we establish a theoretical connection between complex-valued neural networks (CVNNs) and fermionic quantum field theory (QFT), bridging a fundamental gap in the emerging framework of neural network quantum field theory (NN-QFT). While prior NN-QFT works have linked real-valued architectures to bosonic fields, we demonstrate that CVNNs equipped with tensor-valued weights intrinsically generate fermionic quantum fields. By promoting hidden-to-output weights to Clifford algebra-valued tensors, we induce anticommutation relations essential for fermionic statistics. Through analytical study of the generating functional, we obtain the exact quantum state in the infinite-width limit, revealing that the parameters between the input layer and the last hidden layer correspond to the eigenvalues of the quantum system, and the tensor weighting parameters in the hidden-to-output layer map to dynamical fermionic fields. The continuum limit reproduces free fermion correlators, with diagrammatic expansions confirming anticommutation. The work provides the first explicit mapping from neural architectures to fermionic QFT at the level of correlation functions and generating functional. It extends NN-QFT beyond bosonic theories and opens avenues for encoding fermionic symmetries into machine learning models, with potential applications in quantum simulation and lattice field theory.



Article 819

Title@2025-07-07 (1): ResQuNNs: Towards Enabling Deep Learning in Quantum Convolution Neural Networks

Title: ResQuNNs: Towards Enabling Deep Learning in Quantum Convolution Neural Networks ResQuNNs: Auf dem Weg zum Deep Learning in Quantum Convolution Neural Networks ResQuNNs:争取在量子革命神经网络中进行深入学习 2402.09146v6

Authors (2): Muhammad Kashif, Muhammad Shafique

In this paper, we present a novel framework for enhancing the performance of Quanvolutional Neural Networks (QuNNs) by introducing trainable quanvolutional layers and addressing the critical challenges associated with them. Traditional quanvolutional layers, although beneficial for feature extraction, have largely been static, offering limited adaptability. Unlike state-of-the-art approaches, our research overcomes this limitation by enabling training within these layers, significantly increasing the flexibility and potential of QuNNs. However, the introduction of multiple trainable quanvolutional layers induces complexities in gradient-based optimization, primarily due to the difficulty in accessing gradients across these layers. To resolve this, we propose a novel architecture, Residual Quanvolutional Neural Networks (ResQuNNs), leveraging the concept of residual learning, which facilitates the flow of gradients by adding skip connections between layers. By inserting residual blocks between quanvolutional layers, we ensure enhanced gradient access throughout the network, leading to improved training performance. Moreover, we provide empirical evidence on the strategic placement of these residual blocks within QuNNs. Through extensive experimentation, we identify an efficient configuration of residual blocks, which enables gradient flow across all layers of the network and ultimately results in efficient training. Our findings suggest that the precise location of residual blocks plays a crucial role in maximizing the performance gains in QuNNs. Our results mark a substantial step forward in the evolution of quantum deep learning, offering new avenues for both theoretical development and practical quantum computing applications.



Article 820

Title@2025-07-07 (1): Synergistic Localization and Sensing in MIMO-OFDM Systems via Mixed-Integer Bilevel Learning

Title: Synergistic Localization and Sensing in MIMO-OFDM Systems via Mixed-Integer Bilevel Learning Synergistische Lokalisierung und Sensing in MIMO-OFDM-Systemen über Mixed-Integer Bilevel Learning 通过混合内插双级学习,在MOIMO-OFDM系统中实现协同本地化和遥感 2507.07118v1

Authors (3): Zelin Zhu, Kai Yang, Rui Zhang

Wireless localization and sensing technologies are essential in modern wireless networks, supporting applications in smart cities, the Internet of Things (IoT), and autonomous systems. High-performance localization and sensing systems are critical for both network efficiency and emerging intelligent applications. Integrating channel state information (CSI) with deep learning has recently emerged as a promising solution. Recent works have leveraged the spatial diversity of multiple input multiple output (MIMO) systems and the frequency granularity of orthogonal frequency division multiplexing (OFDM) waveforms to improve spatial resolution. Nevertheless, the joint modeling of localization and sensing under the high-dimensional CSI characteristics of MIMO-OFDM systems remains insufficiently investigated. This work aims to jointly model and optimize localization and sensing tasks to harness their potential synergy. We first formulate localization and sensing as a mixed-integer bilevel deep learning problem and then propose a novel stochastic proximal gradient-based mixed-integer bilevel optimization (SPG-MIBO) algorithm. SPG-MIBO is well-suited for high-dimensional and large-scale datasets, leveraging mini-batch training at each step for computational and memory efficiency. The algorithm is also supported by theoretical convergence guarantees. Extensive experiments on multiple datasets validate its effectiveness and highlight the performance gains from joint localization and sensing optimization.



Article 821

Title@2025-07-07 (1): Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation

Title: Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation Performance-Evaluierung allgemeiner Zwecke Große Sprachmodelle für grundlegende lineare Algebra-Unterprogramme Code-Generierung 基本线性代代数子方案代码生成通用大语言模型绩效评价 2507.04697v1

Authors (4): Daichi Mukunoki, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri

Generative AI technology based on Large Language Models (LLMs) has been developed and applied to assist with or automatically generate program code. In this paper, we evaluate the capability of existing general LLMs for Basic Linear Algebra Subprograms (BLAS) code generation for CPUs. We use two LLMs provided by OpenAI: GPT-4.1, a Generative Pre-trained Transformer (GPT) model, and o4-mini, one of the o-series of Reasoning models. Both were released in April 2025. For the routines from level-1 to 3 BLAS, we tried to generate (1) C code without optimization from the routine name only, (2) C code with basic performance optimizations (thread parallelization, SIMD vectorization, and cache blocking) from the routine name only, and (3) C code with basic performance optimizations based on Fortran reference code. As a result, we found that correct code can be generated in many cases even when only the routine name is given. We also confirmed that thread parallelization with OpenMP, SIMD vectorization, and cache blocking can be implemented to some extent, and that the code is faster than the reference code.



Article 822

Title@2025-07-07 (1): Monte Carlo Tree Diffusion for System 2 Planning

Title: Monte Carlo Tree Diffusion for System 2 Planning Monte Carlo Tree Diffusion für System 2 Planung 用于系统2规划的蒙特卡洛树传播 2502.07202v5

Authors (5): Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn

Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with inference-time computation scaling, standard diffusion-based planners offer only limited avenues for scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.



Article 823

Title@2025-07-07 (1): Interpretable Reward Modeling with Active Concept Bottlenecks

Title: Interpretable Reward Modeling with Active Concept Bottlenecks Interpretierbare Prämienmodellierung mit Active Concept Engpässen 具有主动概念瓶颈的可解释的奖励模型 2507.04695v1

Authors (4): Sonia Laguna, Katarzyna Kobalczyk, Julia E. Vogt, Mihaela Van der Schaar

We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning through selective concept annotation. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. To make this framework efficient in low-supervision settings, we formalize an active learning strategy that dynamically acquires the most informative concept labels. We propose an acquisition function based on Expected Information Gain and show that it significantly accelerates concept learning without compromising preference accuracy. Evaluated on the UltraFeedback dataset, our method outperforms baselines in interpretability and sample efficiency, marking a step towards more transparent, auditable, and human-aligned reward models.



Article 824

Title@2025-07-07 (1): Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness

Title: Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness Überbrücken von KAN und MLP: MJKAN, eine hybride Architektur mit Effizienz und Expressivität 连接KAN和MLP:MJKAN,一个既效率又能表达的混合结构 2507.04690v1

Authors (4): Hanseon Joo, Hayoung Choi, Ook Lee, Minjong Cheon

Kolmogorov-Arnold Networks (KANs) have garnered attention for replacing fixed activation functions with learnable univariate functions, but they exhibit practical limitations, including high computational costs and performance deficits in general classification tasks. In this paper, we propose the Modulation Joint KAN (MJKAN), a novel neural network layer designed to overcome these challenges. MJKAN integrates a FiLM (Feature-wise Linear Modulation)-like mechanism with Radial Basis Function (RBF) activations, creating a hybrid architecture that combines the non-linear expressive power of KANs with the efficiency of Multilayer Perceptrons (MLPs). We empirically validated MJKAN’s performance across a diverse set of benchmarks, including function regression, image classification (MNIST, CIFAR-10/100), and natural language processing (AG News, SMS Spam). The results demonstrate that MJKAN achieves superior approximation capabilities in function regression tasks, significantly outperforming MLPs, with performance improving as the number of basis functions increases. Conversely, in image and text classification, its performance was competitive with MLPs but revealed a critical dependency on the number of basis functions. We found that a smaller basis size was crucial for better generalization, highlighting that the model’s capacity must be carefully tuned to the complexity of the data to prevent overfitting. In conclusion, MJKAN offers a flexible architecture that inherits the theoretical advantages of KANs while improving computational efficiency and practical viability.



Article 825

Title@2025-07-07 (1): Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Title: Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks Skalierungskollaps offenbart universelle Dynamiken in berechenbaren neuralen Netzwerken 计算最佳培训神经网络中的缩放 缩放 折叠 Revials 通用动态 2507.02119v2

Authors (5): Shikai Qiu, Lechao Xiao, Andrew Gordon Wilson, Jeffrey Pennington, Atish Agarwala

What scaling limits govern neural network training dynamics when model size and training time grow in tandem? We show that despite the complex interactions between architecture, training algorithms, and data, compute-optimally trained models exhibit a remarkably precise universality. Specifically, loss curves from models of varying sizes collapse onto a single universal curve when training compute and loss are normalized to unity at the end of training. With learning rate decay, the collapse becomes so tight that differences in the normalized curves across models fall below the noise floor of individual loss curves across random seeds, a phenomenon we term supercollapse. We observe supercollapse across learning rate schedules, datasets, and architectures, including transformers trained on next-token prediction, and find it breaks down when hyperparameters are scaled suboptimally, providing a precise and practical indicator of good scaling. We explain these phenomena by connecting collapse to the power-law structure in typical neural scaling laws, and analyzing a simple yet surprisingly effective model of SGD noise dynamics that accurately predicts loss curves across various learning rate schedules and quantitatively explains the origin of supercollapse.
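
The normalization described above, and its link to power-law structure, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' procedure: the helper names `normalize_curve` and `power_law_curve` are hypothetical, the loss is modeled as a pure power law in compute, and any offset term (such as an irreducible loss) is omitted.

```python
def normalize_curve(compute, loss):
    """Rescale a loss curve so that its final point is (1.0, 1.0)."""
    if not compute or len(compute) != len(loss):
        raise ValueError("compute and loss must be non-empty and equal length")
    final_compute, final_loss = compute[-1], loss[-1]
    norm_compute = [c / final_compute for c in compute]
    norm_loss = [value / final_loss for value in loss]
    return norm_compute, norm_loss


def power_law_curve(a, b, final_compute, fractions):
    """Toy loss curve loss(c) = a * c**(-b), sampled at the given
    fractions of the final compute budget (an assumed functional form)."""
    compute = [final_compute * f for f in fractions]
    loss = [a * c ** (-b) for c in compute]
    return compute, loss


fractions = [0.1, 0.25, 0.5, 1.0]
small = power_law_curve(a=5.0, b=0.3, final_compute=1e3, fractions=fractions)
large = power_law_curve(a=2.0, b=0.3, final_compute=1e6, fractions=fractions)
_, small_norm = normalize_curve(*small)
_, large_norm = normalize_curve(*large)
# Largest gap between the two normalized curves; tiny for pure power laws.
max_gap = max(abs(x - y) for x, y in zip(small_norm, large_norm))
```

For a pure power law with a shared exponent, the normalized loss at compute fraction $f$ is $f^{-b}$ regardless of the prefactor or the budget, so `max_gap` is at the level of floating-point error. Real loss curves are noisier, which is why the abstract compares the collapse against seed-to-seed variation.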



Article 826

Title@2025-07-07 (1): Fully Automatic Neural Network Reduction for Formal Verification

Title: Fully Automatic Neural Network Reduction for Formal Verification Vollautomatische Neural-Netzwerk-Reduktion für formale Überprüfung 用于正式核查的全自动神经网络减少 2305.01932v3

Authors (2): Tobias Ladner, Matthias Althoff

Formal verification of neural networks is essential before their deployment in safety-critical applications. However, existing methods for formally verifying neural networks are not yet scalable enough to handle practical problems under strict time constraints. We address this challenge by introducing a fully automatic and sound reduction of neural networks using reachability analysis. The soundness ensures that the verification of the reduced network entails the verification of the original network. Our sound reduction approach is applicable to neural networks with any type of element-wise activation function, such as ReLU, sigmoid, and tanh. The network reduction is computed on the fly while simultaneously verifying the original network and its specification. All parameters are automatically tuned to minimize the network size without compromising verifiability. We further show the applicability of our approach to convolutional neural networks by explicitly exploiting similar neighboring pixels. Our evaluation shows that our approach reduces large neural networks to a fraction of the original number of neurons and thus shortens the verification time to a similar degree.



Article 827

Title@2025-07-07 (1): Recovering Plasticity of Neural Networks via Soft Weight Rescaling

Title: Recovering Plasticity of Neural Networks via Soft Weight Rescaling Wiederherstellung der Plastizität von neuralen Netzwerken durch weiche Gewichtsreskalierung 通过软重量缩放恢复神经网络的可塑性 2507.04683v1

Authors (4): Seungwon Oh, Sangyeon Park, Isaac Han, Kyung-Joong Kim

Recent studies have shown that as training progresses, neural networks gradually lose their capacity to learn new information, a phenomenon known as plasticity loss. Unbounded weight growth is one of the main causes of plasticity loss. Furthermore, it harms generalization capability and disrupts optimization dynamics. Re-initializing the network can be a solution, but it results in the loss of learned information, leading to performance drops. In this paper, we propose Soft Weight Rescaling (SWR), a novel approach that prevents unbounded weight growth without losing information. SWR recovers the plasticity of the network by simply scaling down the weights at each step of the learning process. We theoretically prove that SWR bounds weight magnitude and balances weight magnitude between layers. Our experiments show that SWR improves performance on warm-start learning, continual learning, and single-task learning setups on standard image classification benchmarks.



Article 828

Title@2025-07-07 (1): Operator-based machine learning framework for generalizable prediction of unsteady treatment dynamics in stormwater infrastructure

Title: Operator-based machine learning framework for generalizable prediction of unsteady treatment dynamics in stormwater infrastructure Bedienerbasierter Rahmen für maschinelles Lernen zur allgemeinen Vorhersage unbeständiger Behandlungsdynamiken in der Sturmwasserinfrastruktur 以操作人员为基础的机械学习框架,以普遍预测暴雨水基础设施中的不稳定处理动态 2507.04682v1

Authors (3): Mohamed Shatarah, Kai Liu, Haochen Li

Stormwater infrastructures are decentralized urban water-management systems that face highly unsteady hydraulic and pollutant loadings from episodic rainfall-runoff events. Accurately evaluating their in-situ treatment performance is essential for cost-effective design and planning. Traditional lumped dynamic models (e.g., continuously stirred tank reactor, CSTR) are computationally efficient but oversimplify transport and reaction processes, limiting predictive accuracy and insight. Computational fluid dynamics (CFD) resolves detailed turbulent transport and pollutant fate physics but incurs prohibitive computational cost for unsteady and long-term simulations. To address these limitations, this study develops a composite operator-based neural network (CPNN) framework that leverages state-of-the-art operator learning to predict the spatial and temporal dynamics of hydraulics and particulate matter (PM) in stormwater treatment. The framework is demonstrated on a hydrodynamic separator (HS), a common urban treatment device. Results indicate that the CPNN achieves $R^2 > 0.8$ for hydraulic predictions in 95.2% of test cases; for PM concentration predictions, $R^2 > 0.8$ in 72.6% of cases and $0.4 < R^2 < 0.8$ in 22.6%. The analysis identifies challenges in capturing dynamics under extreme low-flow conditions, owing to their lower contribution to the training loss. Exploiting the automatic-differentiation capability of the CPNN, sensitivity analyses quantify the influence of storm event loading on PM transport. Finally, the potential of the CPNN framework for continuous, long-term evaluation of stormwater infrastructure performance is discussed, marking a step toward robust, climate-aware planning and implementation.
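
The per-case score and the "share of test cases above a threshold" summary reported above can be computed as in the sketch below. It assumes the standard coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$; the abstract does not state which definition the authors use, and the function names `r_squared` and `share_above` are hypothetical.

```python
def r_squared(observed, predicted):
    """Standard coefficient of determination, 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant observations")
    return 1.0 - ss_res / ss_tot


def share_above(scores, threshold):
    """Fraction of test cases whose score strictly exceeds the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)
```

Each test case (for example, one storm event) would yield one `r_squared` value from its observed and predicted series, and `share_above(scores, 0.8)` then gives the kind of percentage quoted in the abstract.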



Article 829

Title@2025-07-07 (1): Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation

Title: Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation Identifizieren, Isolieren und Spülen: Halluzinationen in LVLMs mittels selbst-entwickelnder Destillation abmildern 识别、孤立和清洗:通过自演蒸馏减少LVLMs中的幻觉 2507.04680v1

Authors (8): Wenhao Li, Xiu Su, Jingyi Wu, Feng Yang, Yang Liu, Yi Chen, Shan You, Chang Xu

Large Vision-Language Models (LVLMs) have demonstrated remarkable advancements in numerous areas such as multimedia. However, hallucination issues significantly limit their credibility and application potential. Existing mitigation methods typically rely on external tools or the comparison of multi-round inference, which significantly increase inference time. In this paper, we propose SElf-Evolving Distillation (SEED), which identifies hallucinations within the inner knowledge of LVLMs, isolates and purges them, and then distills the purified knowledge back into the model, enabling self-evolution. Furthermore, we identified that traditional distillation methods are prone to inducing void spaces in the output space of LVLMs. To address this issue, we propose a Mode-Seeking Evolving approach, which performs distillation to capture the dominant modes of the purified knowledge distribution, thereby avoiding the chaotic results that could emerge from void spaces. Moreover, we introduce a Hallucination Elimination Adapter, which corrects the dark knowledge of the original model by learning purified knowledge. Extensive experiments on multiple benchmarks validate the superiority of our SEED, demonstrating substantial improvements in mitigating hallucinations for representative LVLMs such as LLaVA-1.5 and InternVL2. Remarkably, the F1 score of LLaVA-1.5 on the hallucination evaluation metric POPE-Random improved from 81.3 to 88.3.



Article 830

Title@2025-07-07 (1): QuTE: decentralized multiple testing on sensor networks with false discovery rate control

Title: QuTE: decentralized multiple testing on sensor networks with false discovery rate control QuTE: dezentrale Mehrfachprüfung in Sensornetzwerken mit falscher Discovery-Rate-Steuerung QuTE:对带有虚假发现率控制的传感器网络进行分散式多重测试 2210.04334v2

Authors (4): Aaditya Ramdas, Jianbo Chen, Martin J. Wainwright, Michael I. Jordan

This paper designs methods for decentralized multiple hypothesis testing on graphs that are equipped with provable guarantees on the false discovery rate (FDR). We consider the setting where distinct agents reside on the nodes of an undirected graph, and each agent possesses p-values corresponding to one or more hypotheses local to its node. Each agent must individually decide whether to reject one or more of its local hypotheses by only communicating with its neighbors, with the joint aim that the global FDR over the entire graph must be controlled at a predefined level. We propose a simple decentralized family of Query-Test-Exchange (QuTE) algorithms and prove that they can control FDR under independence or positive dependence of the p-values. Our algorithm reduces to the Benjamini-Hochberg (BH) algorithm after graph-diameter rounds of communication, and to the Bonferroni procedure when no communication has occurred or the graph is empty. To avoid communicating real-valued p-values, we develop a quantized BH procedure, and extend it to a quantized QuTE procedure. QuTE works seamlessly in streaming data settings, where anytime-valid p-values may be continually updated at each node. Last, QuTE is robust to arbitrary dropping of packets, or a graph that changes at every step, making it particularly suitable to mobile sensor networks involving drones or other multi-agent systems. We study the power of our procedure using a simulation suite of different levels of connectivity and communication on a variety of graph structures, and also provide an illustrative real-world example.
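
The two classical procedures named above as limiting cases can be written compactly. The sketch below implements the centralized Benjamini-Hochberg step-up rule and the Bonferroni rule in plain Python. It is not the QuTE algorithm itself, which is decentralized; the function names and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below the cutoff.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]
```

With the p-values `[0.01, 0.02, 0.03, 0.04]` and `alpha=0.05`, `benjamini_hochberg` rejects all four hypotheses while `bonferroni` rejects only the first. This gives a sense of the power gap between the no-communication and full-communication regimes described in the abstract.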



Article 831

Title@2025-07-07 (1): Normality-Guided Distributional Reinforcement Learning for Continuous Control

Title: Normality-Guided Distributional Reinforcement Learning for Continuous Control Normalitätsorientiertes Verstärkungslernen zur kontinuierlichen Kontrolle 为持续控制而指导的正常水平分配强化学习 2208.13125v4

Authors (2): Ju-Seung Byun, Andrew Perrault

Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value distribution, not just the mean. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirically quite close to normal. We design a method that exploits this property, employing variances predicted from a variance network, along with returns, to analytically compute target quantile bars representing a normal for our distributional value function. In addition, we propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function. The approach we outline is compatible with many DRL structures. We use two representative on-policy algorithms, PPO and TRPO, as testbeds. Our method yields statistically significant improvements in 10 out of 16 continuous task settings, while utilizing a reduced number of weights and achieving faster training time compared to an ensemble-based method for quantifying value distribution uncertainty.
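
The step of analytically computing quantile targets from a normal distribution can be sketched with the Python standard library. This is a hedged illustration rather than the authors' implementation: the function name `normal_quantile_targets` is hypothetical, and the midpoint quantile levels $(2i+1)/(2N)$ are an assumption borrowed from common practice in quantile-based distributional RL.

```python
from statistics import NormalDist


def normal_quantile_targets(mean, variance, num_quantiles):
    """Quantile values of N(mean, variance) at the midpoint levels
    (2i + 1) / (2N), for i = 0, ..., N - 1 (assumed level choice)."""
    if num_quantiles <= 0:
        raise ValueError("num_quantiles must be positive")
    if variance <= 0:
        # Degenerate case: all probability mass sits at the mean.
        return [float(mean)] * num_quantiles
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    levels = [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]
    return [dist.inv_cdf(tau) for tau in levels]
```

Here `mean` would play the role of the return estimate and `variance` the output of the variance network. The resulting list is symmetric around the mean and increasing, and can serve as the target for a quantile-based value head.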



Article 832

Title@2025-07-07 (1): Enhancing Long Video Generation Consistency without Tuning

Title: Enhancing Long Video Generation Consistency without Tuning Verbesserung der langen Video-Generierungskonsistenz ohne Tuning 增强长视频一代的连续性,不收费 2412.17254v2

Authors (6): Xingyao Li, Fengzhuo Zhang, Jiachun Pan, Yunlong Hou, Vincent Y. F. Tan, Zhuoran Yang

Despite the considerable progress achieved in the long video generation problem, there is still significant room to improve the consistency of the generated videos, particularly in terms of their smoothness and transitions between scenes. We address these issues to enhance the consistency and coherence of videos generated with either single or multiple prompts. We propose the Time-frequency based temporal Attention Reweighting Algorithm (TiARA), which judiciously edits the attention score matrix based on the Discrete Short-Time Fourier Transform. This method is supported by a frequency-based analysis, ensuring that the edited attention score matrix achieves improved consistency across frames. It is the first frequency-based method of its kind for video diffusion models. For videos generated by multiple prompts, we further uncover key factors, such as the alignment of the prompts, that affect prompt interpolation quality. Inspired by our analyses, we propose PromptBlend, an advanced prompt interpolation pipeline that systematically aligns the prompts. Extensive experimental results validate the efficacy of our proposed method, demonstrating consistent and substantial improvements over multiple baselines.



Article 833

Title@2025-07-07 (1): Universal approximation results for neural networks with non-polynomial activation function over non-compact domains

Title: Universal approximation results for neural networks with non-polynomial activation function over non-compact domains Universelle Näherungsergebnisse für neuronale Netzwerke mit nicht-polynomischer Aktivierungsfunktion über nicht-kompakte Domänen 具有非常规领域非球体激活功能的神经网络的通用近似结果 2410.14759v4

Authors (2): Ariel Neufeld, Philipp Schmocker

This paper extends the universal approximation property of single-hidden-layer feedforward neural networks beyond compact domains, which is of particular interest for the approximation within weighted $C^k$-spaces and weighted Sobolev spaces over unbounded domains. More precisely, by assuming that the activation function is non-polynomial, we establish universal approximation results within function spaces defined over non-compact subsets of a Euclidean space, including $L^p$-spaces, weighted $C^k$-spaces, and weighted Sobolev spaces, where the latter two include the approximation of the (weak) derivatives. Moreover, we provide some dimension-independent rates for approximating a function with sufficiently regular and integrable Fourier transform by neural networks with non-polynomial activation function.



Article 834

Title@2025-07-07 (1): DANCE: Resource-Efficient Neural Architecture Search with Data-Aware and Continuous Adaptation

Title: DANCE: Resource-Efficient Neural Architecture Search with Data-Aware and Continuous Adaptation DANCE: Ressourceneffiziente Neuralarchitektur-Suche mit Daten-Bewusstsein und kontinuierlicher Anpassung 包括数据软件和持续适应在内的资源效率高的神经结构搜索 2507.04671v1

Authors (9): Maolin Wang, Tianshuo Wei, Sheng Zhang, Ruocheng Guo, Wanyu Wang, Shanshan Ye, Lixin Zou, Xuetao Wei, Xiangyu Zhao

Neural Architecture Search (NAS) has emerged as a powerful approach for automating neural network design. However, existing NAS methods face critical limitations in real-world deployments: architectures lack adaptability across scenarios, each deployment context requires costly separate searches, and performance consistency across diverse platforms remains challenging. We propose DANCE (Dynamic Architectures with Neural Continuous Evolution), which reformulates architecture search as a continuous evolution problem through learning distributions over architectural components. DANCE introduces three key innovations: a continuous architecture distribution enabling smooth adaptation, a unified architecture space with learned selection gates for efficient sampling, and a multi-stage training strategy for effective deployment optimization. Extensive experiments across five datasets demonstrate DANCE’s effectiveness. Our method consistently outperforms state-of-the-art NAS approaches in terms of accuracy while significantly reducing search costs. Under varying computational constraints, DANCE maintains robust performance while smoothly adapting architectures to different hardware requirements. The code and appendix can be found at https://github.com/Applied-Machine-Learning-Lab/DANCE.



Article 835

Title@2025-07-07 (1): Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading

Title: Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading Pensieve Grader: Eine KI-Powered, Ready-to-Use Plattform für mühelose handschriftliche STEM-Grading Pensieve grafer: 一个AI授权的无力手写STEM分级的现用平台 2507.01431v2

Authors (4): Yoonseok Yang, Minjune Kim, Marlon Rondinelli, Keren Shao

Grading handwritten, open-ended responses remains a major bottleneck in large university STEM courses. We introduce Pensieve (https://www.pensieve.co), an AI-assisted grading platform that leverages large language models (LLMs) to transcribe and evaluate student work, providing instructors with rubric-aligned scores, transcriptions, and confidence ratings. Unlike prior tools that focus narrowly on specific tasks like transcription or rubric generation, Pensieve supports the entire grading pipeline, from scanned student submissions to final feedback, within a human-in-the-loop interface. Pensieve has been deployed in real-world courses at over 20 institutions and has graded more than 300,000 student responses. We present system details and empirical results across four core STEM disciplines: Computer Science, Mathematics, Physics, and Chemistry. Our findings show that Pensieve reduces grading time by an average of 65%, while maintaining a 95.4% agreement rate with instructor-assigned grades for high-confidence predictions.



Article 836

Title@2025-07-07 (1): Hybrid Adversarial Spectral Loss Conditional Generative Adversarial Networks for Signal Data Augmentation in Ultra-precision Machining Surface Roughness Prediction

Title: Hybrid Adversarial Spectral Loss Conditional Generative Adversarial Networks for Signal Data Augmentation in Ultra-precision Machining Surface Roughness Prediction Hybrid Adversarial Spectral Loss Conditional Generative Adversarial Networks für Signaldaten Augmentation in Ultrapräzision Bearbeitung Oberflächenrauigkeit Vorhersage 超精密断层表面粗糙预测中信号数据增强条件性反向网络 2507.04665v1

Authors (3): Suiyan Shang, Chi Fai Cheung, Pai Zheng

Accurate surface roughness prediction in ultra-precision machining (UPM) is critical for real-time quality control, but small datasets hinder model performance. We propose HAS-CGAN, a Hybrid Adversarial Spectral Loss CGAN, for effective UPM data augmentation. Among five CGAN variants tested, HAS-CGAN excels in 1D force signal generation, particularly for high-frequency signals, achieving >0.85 wavelet coherence through Fourier-domain optimization. By combining generated signals with machining parameters, prediction accuracy significantly improves. Experiments with traditional ML (SVR, RF, LSTM) and deep learning models (BPNN, 1DCNN, CNN-Transformer) demonstrate that augmenting training data with 520+ synthetic samples reduces prediction error from 31.4% (original 52 samples) to ~9%, effectively addressing data scarcity in UPM roughness prediction.



Article 837

Title@2025-07-07 (1): A Cycle-Consistency Constrained Framework for Dynamic Solution Space Reduction in Noninjective Regression

Title: A Cycle-Consistency Constrained Framework for Dynamic Solution Space Reduction in Noninjective Regression Ein zyklisch-konsistenzbeschränktes Framework zur Reduzierung des Raumes dynamischer Lösungen bei nichtinjektiver Regression 循环-一致性协调框架,以在非预测性倒退中减少动态解决办法空间 2507.04659v1

Authors (2): Hanzhang Jia, Yi Gao

To address the challenges posed by the heavy reliance of multi-output models on preset probability distributions and embedded prior knowledge in non-injective regression tasks, this paper proposes a cycle consistency-based data-driven training framework. The method jointly optimizes a forward model $\Phi: X \to Y$ and a backward model $\Psi: Y \to X$, where the cycle consistency loss is defined as $L_{\mathrm{cycle}} = L(Y - \Phi(\Psi(Y)))$ (and vice versa). By minimizing this loss, the framework establishes a closed-loop mechanism integrating generation and validation phases, eliminating the need for manual rule design or prior distribution assumptions. Experiments on normalized synthetic and simulated datasets demonstrate that the proposed method achieves a cycle reconstruction error below 0.003, achieving an improvement of approximately 30% in evaluation metrics compared to baseline models without cycle consistency. Furthermore, the framework supports unsupervised learning and significantly reduces reliance on manual intervention, demonstrating potential advantages in non-injective regression tasks.
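
As a rough illustration of the cycle consistency loss described in this abstract, the sketch below evaluates the backward-then-forward reconstruction error on a toy non-injective map. The squared-error form of the loss, the scalar toy functions `phi` and `psi`, and the helper name `cycle_loss` are assumptions made for this example and are not taken from the paper.

```python
def cycle_loss(ys, phi, psi):
    """Mean squared error between each y and its reconstruction phi(psi(y))."""
    return sum((y - phi(psi(y))) ** 2 for y in ys) / len(ys)


def phi(x):
    # Toy forward model: non-injective, since x and -x map to the same y.
    return x * x


def psi(y):
    # Toy backward model: commits to one branch of the inverse (the non-negative root).
    return y ** 0.5


def bad_psi(y):
    # A backward model that is inconsistent with phi, for comparison.
    return y


ys = [0.0, 1.0, 4.0, 9.0]
consistent_loss = cycle_loss(ys, phi, psi)
inconsistent_loss = cycle_loss(ys, phi, bad_psi)
```

A backward model that picks any single valid branch of the inverse drives the loss to zero, which is why the loss can be minimized without assuming a prior distribution over the candidate solutions.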



Article 838

Title@2025-07-07 (1): Quantum Doeblin Coefficients: Interpretations and Applications

Title: Quantum Doeblin Coefficients: Interpretations and Applications Quanten-Doeblin-Koeffizienten: Interpretationen und Anwendungen Doeblin系数:解释和应用 2503.22823v2

Authors (4): Ian George, Christoph Hirche, Theshani Nuradha, Mark M. Wilde

In classical information theory, the Doeblin coefficient of a classical channel provides an efficiently computable upper bound on the total-variation contraction coefficient of the channel, leading to what is known as a strong data-processing inequality. Here, we investigate quantum Doeblin coefficients as a generalization of the classical concept. In particular, we define various new quantum Doeblin coefficients, one of which has several desirable properties, including concatenation and multiplicativity, in addition to being efficiently computable. We also develop various interpretations of two of the quantum Doeblin coefficients, including representations as minimal singlet fractions, exclusion values, reverse max-mutual and oveloH informations, reverse robustnesses, and hypothesis testing reverse mutual and oveloH informations. Our interpretations of quantum Doeblin coefficients as either entanglement-assisted or unassisted exclusion values are particularly appealing, indicating that they are proportional to the best possible error probabilities one could achieve in state-exclusion tasks by making use of the channel. We also outline various applications of quantum Doeblin coefficients, ranging from limitations on quantum machine learning algorithms that use parameterized quantum circuits (noise-induced barren plateaus), on error mitigation protocols, on the sample complexity of noisy quantum hypothesis testing, and on mixing, distinguishability, and decoupling times of time-varying channels. All of these applications make use of the fact that quantum Doeblin coefficients appear in upper bounds on various trace-distance contraction coefficients of a channel. Furthermore, in all of these applications, our analysis using Doeblin coefficients provides improvements of various kinds over contributions from prior literature, both in terms of generality and being efficiently computable.



Article 839

Title@2025-07-07 (1): VaxPulse: Monitoring of Online Public Concerns to Enhance Post-licensure Vaccine Surveillance

Title: VaxPulse: Monitoring of Online Public Concerns to Enhance Post-licensure Vaccine Surveillance VaxPulse: Überwachung der Online-Belange der Öffentlichkeit zur Verbesserung der Überwachung nach der Lizenzierung von Impfstoffen VaxPulse:监测公众在线关注加强许可证后疫苗监测 2507.04656v1

Authors (6): Muhammad Javed, Sedigh Khademi, Joanne Hickman, Jim Buttery, Hazel Clothier, Gerardo Luis Dimaguila

The recent vaccine-related infodemic has amplified public concerns, highlighting the need for proactive misinformation management. We describe how we enhanced the reporting surveillance system of Victoria’s vaccine safety service, SAEFVIC, through the incorporation of new information sources for public sentiment analysis, topics of discussion, and hesitancies about vaccinations online. Using VaxPulse, a multi-step framework, we integrate adverse events following immunisation (AEFI) with sentiment analysis, demonstrating the importance of contextualising public concerns. Additionally, we emphasise the need to address non-English languages to stratify concerns across ethno-lingual communities, providing valuable insights for vaccine uptake strategies and combating mis/disinformation. The framework is applied to real-world examples and a case study on women’s vaccine hesitancy, showcasing its benefits and adaptability by identifying public opinion from online media.



Article 840

Title@2025-07-07 (1): Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models

Title: Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models Breach in the Shield: Enthüllen der Schwachstellen großer Sprachmodelle 打破盾牌:消除大语言模式的脆弱性 2504.03714v2

Authors (4): Runpeng Dai, Run Yang, Fan Zhou, Hongtu Zhu

Large Language Models (LLMs) and Vision-Language Models (VLMs) have achieved impressive performance across a wide range of tasks, yet they remain vulnerable to carefully crafted perturbations. In this study, we seek to pinpoint the sources of this fragility by identifying parameters and input dimensions (pixels or token embeddings) that are susceptible to such perturbations. To this end, we propose a stability measure called FI (First-order local Influence), which is rooted in information geometry and quantifies the sensitivity of individual parameter and input dimensions. Our extensive analysis across LLMs and VLMs (from 1.5B to 13B parameters) reveals that: (I) A small subset of parameters or input dimensions with high FI values contributes disproportionately to model brittleness. (II) Mitigating the influence of these vulnerable parameters during model merging leads to improved performance.



Article 841

Title@2025-07-07 (1): Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Title: Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance Selbst-rektifizierende Diffusions-Probenahme mit Perturbed-Attention-Anleitung 使用受扰动的防扰动指导进行自我强化扩散抽样 2403.17377v2

Authors (9): Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim

Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms’ ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.
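
The two ingredients named in this abstract, replacing a self-attention map with an identity matrix and steering the denoiser away from the resulting degraded prediction, can be sketched on plain Python lists. The extrapolation rule `eps + scale * (eps - eps_perturbed)` is an assumption modeled on the usual classifier-free guidance form; the abstract states only that denoising is guided away from the degraded samples. All function names are hypothetical.

```python
def attention_output(attn, values):
    """Apply an n x n attention map to n value vectors of dimension d."""
    n, d = len(values), len(values[0])
    return [
        [sum(attn[i][j] * values[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


def identity_attention(n):
    """Perturbed attention map: each token attends only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def guide_away(eps, eps_perturbed, scale):
    """Push the normal prediction away from the structure-degraded one (assumed form)."""
    return [e + scale * (e - p) for e, p in zip(eps, eps_perturbed)]


values = [[1.0, 2.0], [3.0, 4.0]]
perturbed_out = attention_output(identity_attention(2), values)
guided = guide_away([1.0, 2.0], [0.5, 2.5], scale=2.0)
```

With the identity map, the attention layer passes the value vectors through unchanged, so no information is mixed across tokens in the perturbed branch.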



Article 842

Title@2025-07-07 (1): Decision Feedback In-Context Learning for Wireless Symbol Detection

Title: Decision Feedback In-Context Learning for Wireless Symbol Detection Entscheidungsfeedback In-Context-Lernen für drahtlose Symbolerkennung 无线电无线符号探测的网上学习 2503.16594v2

Authors (4): Li Fan, Wei Shen, Jing Yang, Cong Shen

Pre-trained Transformers, through in-context learning (ICL), have demonstrated exceptional capabilities to adapt to new tasks using example prompts without model update. Transformer-based wireless receivers, where prompts consist of the pilot data in the form of transmitted and received signal pairs, have shown high detection accuracy when pilot data are abundant. However, pilot information is often costly and limited in practice. In this work, we propose DEcision Feedback IN-ContExt Detection (DEFINED) as a new wireless receiver design, which bypasses channel estimation and directly performs symbol detection using the (sometimes extremely) limited pilot data. The key innovation in DEFINED is the proposed decision feedback mechanism in ICL, where we sequentially incorporate the detected symbols into the prompts as pseudo-labels to improve the detection for subsequent symbols. We further establish an error lower bound and provide theoretical insights into the model’s generalization under channel distribution mismatch. Extensive experiments across a broad range of wireless settings demonstrate that a small Transformer trained with DEFINED achieves significant performance improvements over conventional methods, in some cases only needing a single pilot pair to achieve similar performance to the latter with more than 4 pilot pairs.
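
The decision feedback loop described in this abstract can be sketched as follows. The `nearest_pair_detector` is a hypothetical stand-in for the trained Transformer, chosen only so the example runs; the scalar signals and BPSK-like symbols are likewise assumptions made for illustration.

```python
def decision_feedback_detect(pilots, received, detect):
    """Detect symbols one by one, appending each decision to the prompt as a pseudo-label."""
    prompt = list(pilots)  # (transmitted, received) pairs
    decisions = []
    for y in received:
        x_hat = detect(prompt, y)
        decisions.append(x_hat)
        prompt.append((x_hat, y))  # feed the decision back into the context
    return decisions


def nearest_pair_detector(prompt, y):
    """Stand-in detector: copy the symbol of the prompt pair whose received value is closest."""
    return min(prompt, key=lambda pair: abs(pair[1] - y))[0]


pilots = [(1, 0.9), (-1, -1.1)]
received = [0.7, -0.8, 1.2]
decisions = decision_feedback_detect(pilots, received, nearest_pair_detector)
```

Each detected symbol enlarges the context available for later symbols, which is how the scheme compensates for a small pilot set. A wrong early decision would also be fed back, so the quality of the first few detections matters.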



Article 843

Title@2025-07-07 (1): SOSAE: Self-Organizing Sparse AutoEncoder

Title: SOSAE: Self-Organizing Sparse AutoEncoder SOSAE: Selbstorganisierender Sparse AutoEncoder SOSAE: 自行组织式的垃圾自动编码器 2507.04644v1

Authors (6): Sarthak Ketanbhai Modi, Zi Pong Lim, Yushi Cao, Yupeng Cheng, Yon Shin Teo, Shang-Wei Lin

Tuning the size of the hidden layers of an autoencoder has the benefit of providing optimally compressed representations of the input data. However, such hyper-parameter tuning takes considerable computation and time, with grid search as the default option. In this paper, we introduce the Self-Organization Regularization for Autoencoders, which dynamically adapts the dimensionality of the feature space to the optimal size. Inspired by physics concepts, the Self-Organizing Sparse AutoEncoder (SOSAE) induces sparsity in feature space in a structured way that permits the truncation of the non-active part of the feature vector without any loss of information. This is done by penalizing the autoencoder based on the magnitude and the positional index of the feature vector dimensions, which during training constricts the feature space in both terms. Extensive experiments on various datasets show that SOSAE can tune the feature space dimensionality with up to 130 times fewer floating-point operations (FLOPs) than other baselines while maintaining the same quality of tuning and performance.
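
One way to read the penalty "based on the magnitude and the positional index" is a weighted L1 term whose weight grows with the index, so that active features are pushed toward the front of the vector and the inactive tail can be cut off. The geometric weight `growth ** i`, the tolerance, and both function names are assumptions for illustration; the abstract does not give the exact functional form.

```python
def positional_sparsity_penalty(features, growth=1.5):
    """Weighted L1 penalty: later positions cost more per unit of magnitude (assumed form)."""
    return sum(abs(h) * growth ** i for i, h in enumerate(features))


def truncate_inactive_tail(features, tol=1e-6):
    """Drop the trailing run of (near-)zero features; nothing active is lost."""
    last_active = 0
    for i, h in enumerate(features):
        if abs(h) > tol:
            last_active = i + 1
    return features[:last_active]


front_loaded = positional_sparsity_penalty([1.0, 0.0])
back_loaded = positional_sparsity_penalty([0.0, 1.0])
compact = truncate_inactive_tail([0.8, -0.3, 0.0, 0.0])
```

Because the same magnitude is cheaper at a low index than at a high one, minimizing the penalty favors codes whose non-zero entries are contiguous at the start, which is what makes truncation lossless.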



Article 844

Title@2025-07-07 (1): Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

Title: Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? Kann Prompt Schwierigkeit Online vorausgesagt werden, um RL zu beschleunigen Finetuning of Reasoning Models? 快速困难能否预测为加速理据模型的RL微调而在线化? 2507.04632v1

Authors (5): Yun Qu, Qi Cheems Wang, Yixiu Mao, Vincent Tao Hu, Xiangyang Ji

Recent advances have witnessed the effectiveness of reinforcement learning (RL) finetuning in enhancing the reasoning capabilities of large language models (LLMs). The optimization process often requires numerous iterations to achieve satisfactory performance, resulting in high computational costs due to the need for frequent prompt evaluations under intensive LLM interactions and repeated policy updates. Appropriate online prompt selection methods reduce iteration steps by prioritizing informative prompts during training, but the pipeline's reliance on exhaustive prompt evaluation and subset selection for optimization still incurs substantial computational overhead due to frequent LLM inference calls. In contrast to these direct evaluate-then-select schemes, this work investigates iterative approximate evaluation for arbitrary prompts and introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions. Technically, MoPPS models each prompt's success rate as a latent variable, performs streaming Bayesian inference, and employs posterior sampling in a constructed multi-armed bandit machine, enabling sample-efficient and adaptive prompt selection. Extensive experiments across mathematics, planning, and vision-based geometry tasks show that MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced LLM rollouts.
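
A minimal sketch of the ingredients listed in this abstract (a latent success rate per prompt, streaming Bayesian updates, posterior sampling in a bandit) is given below using a Beta-Bernoulli model. The Beta(1, 1) prior, the rule of preferring prompts whose sampled success rate is closest to a target of 0.5, and the class name `PromptDifficultyBandit` are assumptions for this example rather than details confirmed by the abstract.

```python
import random


class PromptDifficultyBandit:
    """Beta-Bernoulli posterior over each prompt's success rate, with Thompson-style sampling."""

    def __init__(self, num_prompts, seed=0):
        self.alpha = [1.0] * num_prompts  # prior pseudo-count of successes
        self.beta = [1.0] * num_prompts   # prior pseudo-count of failures
        self.rng = random.Random(seed)

    def update(self, prompt_id, successes, failures):
        """Streaming update from rollouts already produced during training."""
        self.alpha[prompt_id] += successes
        self.beta[prompt_id] += failures

    def posterior_mean(self, prompt_id):
        a, b = self.alpha[prompt_id], self.beta[prompt_id]
        return a / (a + b)

    def select(self, batch_size, target=0.5):
        """Sample a success rate per prompt and keep those closest to the target (assumed rule)."""
        sampled = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        ranked = sorted(range(len(sampled)), key=lambda i: abs(sampled[i] - target))
        return ranked[:batch_size]


bandit = PromptDifficultyBandit(num_prompts=3)
bandit.update(0, successes=1000, failures=0)   # almost always solved: too easy
bandit.update(1, successes=500, failures=500)  # solved about half the time
bandit.update(2, successes=0, failures=1000)   # almost never solved: too hard
chosen = bandit.select(batch_size=1)
```

Selection uses only the stored counts, so no additional LLM inference call is needed to rank candidate prompts.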



Article 845

Title@2025-07-07 (1): Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

Title: Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation Multimodales Latent Diffusionsmodell für komplexe Nähmuster-Generierung 复杂缝合模式生成的多式联运中流传播模式模型 2412.14453v2

Authors (9): Shengqi Liu, Yuhao Cheng, Zhuo Chen, Xingyu Ren, Wenhan Zhu, Lincheng Li, Mengxiao Bi, Xiaokang Yang, Yichao Yan

Generating sewing patterns in garment design is receiving increasing attention due to its CG-friendly and flexible-editing nature. Previous sewing pattern generation methods have been able to produce exquisite clothing, but struggle to design complex garments with detailed control. To address these issues, we propose SewingLDM, a multi-modal generative model that generates sewing patterns controlled by text prompts, body shapes, and garment sketches. Initially, we extend the original vector of sewing patterns into a more comprehensive representation to cover more intricate details and then compress them into a compact latent space. To learn the sewing pattern distribution in the latent space, we design a two-step training strategy to inject the multi-modal conditions, i.e., body shapes, text prompts, and garment sketches, into a diffusion model, ensuring the generated garments are body-suited and detail-controlled. Comprehensive qualitative and quantitative experiments show the effectiveness of our proposed method, significantly surpassing previous approaches in terms of complex garment design and various body adaptability. Our project page: https://shengqiliu1.github.io/SewingLDM.



Article 846

Title@2025-07-07 (1): VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration

Title: VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration VCDiag: Klassifizierende Erroneous-Wellenformen für Ausfall-Triage-Beschleunigung VCDiag: 失灵千兆字节加速不规则波形分类 2506.03590v2

Authors (7): Minh Luu, Surya Jasper, Khoi Le, Evan Pan, Michael Quinn, Aakash Tyagi, Jiang Hu

Failure triage in design functional verification is critical but time-intensive, relying on manual specification reviews, log inspections, and waveform analyses. While machine learning (ML) has improved areas like stimulus generation and coverage closure, its application to RTL-level simulation failure triage, particularly for large designs, remains limited. VCDiag offers an efficient, adaptable approach using VCD data to classify failing waveforms and pinpoint likely failure locations. In the largest experiment, VCDiag achieves over 94% accuracy in identifying the top three most likely modules. The framework introduces a novel signal selection and statistical compression approach, achieving over 120x reduction in raw data size while preserving features essential for classification. It can also be integrated into diverse Verilog/SystemVerilog designs and testbenches.



Article 847

Title@2025-07-07 (1): Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences

Title: Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences Multimodale LLM integrierte semantische Kommunikation für 6G Immersive Erfahrungen 用于6G吸收经验的多式多式LLM综合语义通信 2507.04621v1

Authors (6): Yusong Zhang, Yuxuan Sun, Lei Guo, Wei Chen, Bo Ai, Deniz Gunduz

6G networks promise revolutionary immersive communication experiences including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real time, which is extremely challenging over resource-limited wireless communication systems. Moreover, a joint understanding of the environment, context, and user intent is essential to deliver task-relevant content effectively. This article presents a novel multimodal large language model (MLLM) integrated semantic communications framework, termed MLLM-SC, which fully leverages the reasoning and generative capabilities of pre-trained foundation models for context-aware and task-oriented wireless communication. The MLLM-SC framework adopts a device-edge collaborative architecture. At the edge, an MLLM-empowered semantic guidance module analyzes multimodal inputs, user intents, and channel conditions to generate importance-aware attention maps prioritizing semantically critical information. An importance-aware semantic encoder and a resource-adaptive semantic decoder are jointly designed and optimized, which can utilize the semantic guidance for adaptive bandwidth allocation and high-quality content reconstruction or generation. Extensive case studies on visual question answering for AR/VR applications and diffusion-driven image generation validate the effectiveness of MLLM-SC.



Article 848

Title@2025-07-07 (1): Information-Guided Diffusion Sampling for Dataset Distillation

Title: Information-Guided Diffusion Sampling for Dataset Distillation Informationsgeführte Diffusionsproben für die Datensatzdestillation 用于数据集蒸馏的信息引导扩散抽样 2507.04619v1

Authors (6): Linfeng Ye, Shayan Mohajer Hamidi, Guang Li, Takahiro Ogawa, Miki Haseyama, Konstantinos N. Plataniotis

Dataset distillation aims to create a compact dataset that retains essential information while maintaining model performance. Diffusion models (DMs) have shown promise for this task but struggle in low images-per-class (IPC) settings, where generated samples lack diversity. In this paper, we address this issue from an information-theoretic perspective by identifying two key types of information that a distilled dataset must preserve: ($i$) prototype information $\mathrm{I}(X;Y)$, which captures label-relevant features; and ($ii$) contextual information $\mathrm{H}(X \mid Y)$, which preserves intra-class variability. Here, $(X,Y)$ represents the pair of random variables corresponding to the input data and its ground truth label, respectively. Observing that the required contextual information scales with IPC, we propose maximizing $\mathrm{I}(X;Y) + \beta \mathrm{H}(X \mid Y)$ during the DM sampling process, where $\beta$ is IPC-dependent. Since directly computing $\mathrm{I}(X;Y)$ and $\mathrm{H}(X \mid Y)$ is intractable, we develop variational estimations to tightly lower-bound these quantities via a data-driven approach. Our approach, information-guided diffusion sampling (IGDS), seamlessly integrates with diffusion models and improves dataset distillation across all IPC settings. Experiments on Tiny ImageNet and ImageNet subsets show that IGDS significantly outperforms existing methods, particularly in low-IPC regimes. The code will be released upon acceptance.



Article 849

Title@2025-07-07 (1): AI for the Open-World: the Learning Principles

Title: AI for the Open-World: the Learning Principles KI für die offene Welt: die Lernprinzipien 开放世界大赦国际:学习原则 2504.14751v2

Authors (1): Jianyu Zhang

During the past decades, numerous AI successes have been achieved on “specific capabilities”, termed the closed-world, such as artificial environments or specific real-world tasks. This well-defined, narrow capability brings two benefits: a clear criterion of success and the opportunity to collect many examples. The criteria reveal not only whether a machine has achieved a goal, but also how the machine falls short of the goal. As a result, human designers can fix the problems one after another until the machine is deemed good enough for the task. Furthermore, the large set of collected examples reduces the difficulty of this problem-fixing process (by the central limit theorem). Does success in the closed-world translate into the broad open-world, where a machine is required to perform any task that a human could possibly undertake, with fewer examples and less a priori knowledge from human designers? No. Because competence in a specific task provides little insight into handling other tasks, the valuable criteria for specific tasks become unhelpful when handling broader unseen tasks. Furthermore, due to the shortage of examples in unseen tasks, the central limit theorem does not stand on our side. In the end, human designers lose the oscilloscope to “hack” an AI system for the open-world. Achieving AI for the open-world requires unique learning principles and innovative techniques, which differ from those used to build AI for the closed-world. This thesis explores the learning principles required to construct AI for the open-world, including rich features (analogous to a large tool box), disentangled representation (an organized tool box), and inference-time learning (a tool-savvy hand). Driven by these learning principles, this thesis further proposes techniques to apply them and conducts extensive large-scale experiments to verify them.



Article 850

Title@2025-07-07 (1): Towards Cost-Effective Reward Guided Text Generation

Title: Towards Cost-Effective Reward Guided Text Generation Auf dem Weg zu einer kosteneffizienten, von der Belohnung geleiteten Textgenerierung 实现成本-效益高的回报制制制制制制 2502.04517v2

Authors (6): Ahmad Rashid, Ruotian Wu, Rongqi Fan, Hongliang Li, Agustinus Kristiadi, Pascal Poupart

Reward-guided text generation (RGTG) has emerged as a viable alternative to offline reinforcement learning from human feedback (RLHF). RGTG methods can align baseline language models to human preferences without further training like in standard RLHF methods. However, they rely on a reward model to score each candidate token generated by the language model at inference, incurring significant test-time overhead. Additionally, the reward model is usually only trained to score full sequences, which can lead to sub-optimal choices for partial sequences. In this work, we present a novel reward model architecture that is trained, using a Bradley-Terry loss, to prefer the optimal expansion of a sequence with just a single call to the reward model at each step of the generation process. That is, a score for all possible candidate tokens is generated simultaneously, leading to efficient inference. We theoretically analyze various RGTG reward models and demonstrate that prior techniques prefer sub-optimal sequences compared to our method during inference. Empirically, our reward model leads to significantly faster inference than other RGTG methods. It requires fewer calls to the reward model and performs competitively compared to previous RGTG and offline RLHF methods.
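
The single-call idea in this abstract can be sketched as a greedy decoding loop in which one reward-model call per step returns a score for every vocabulary token. The additive combination `logit + beta * reward`, the greedy argmax, and the toy `toy_lm_logits` and `toy_reward_scores` functions are assumptions for illustration; they stand in for a real language model and the paper's trained reward model.

```python
def reward_guided_decode(lm_logits, reward_scores, prefix, steps, beta=1.0):
    """Greedy decoding where each step makes exactly one reward-model call."""
    tokens = list(prefix)
    reward_calls = 0
    for _ in range(steps):
        logits = lm_logits(tokens)       # one score per vocabulary token
        rewards = reward_scores(tokens)  # single call scoring ALL candidate tokens
        reward_calls += 1
        combined = [l + beta * r for l, r in zip(logits, rewards)]
        tokens.append(max(range(len(combined)), key=combined.__getitem__))
    return tokens, reward_calls


def toy_lm_logits(tokens):
    # Hypothetical language model over a 3-token vocabulary: prefers token 0.
    return [2.0, 1.0, 0.0]


def toy_reward_scores(tokens):
    # Hypothetical reward model: prefers token 2.
    return [0.0, 0.5, 3.0]


guided_tokens, calls = reward_guided_decode(toy_lm_logits, toy_reward_scores, [0], steps=3)
unguided_tokens, _ = reward_guided_decode(toy_lm_logits, toy_reward_scores, [0], steps=3, beta=0.0)
```

A reward model that scores one candidate continuation per call would instead need one call per candidate token at every step, which is the overhead the abstract refers to.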



Article 851

Title@2025-07-07 (1): Network Topology Inference from Smooth Signals Under Partial Observability

Title: Network Topology Inference from Smooth Signals Under Partial Observability Netzwerk Topologie Schlussfolgerung von glatten Signalen unter teilweiser Beobachtung 局部可部分观测下平滑信号产生的网络地形推断 2410.05707v3

Authors (4): Chuansen Peng, Hanning Tang, Zhiguo Wang, Xiaojing Shen

Inferring network topology from smooth signals is a significant problem in data science and engineering. A common challenge in real-world scenarios is the availability of only partially observed nodes. While some studies have considered hidden nodes and proposed various optimization frameworks, existing methods often lack the practical efficiency needed for large-scale networks or fail to provide theoretical convergence guarantees. In this paper, we address the problem of inferring network topologies from smooth signals with partially observed nodes. We propose a first-order algorithmic framework that includes two variants: one based on column sparsity regularization and the other on a low-rank constraint. We establish theoretical convergence guarantees and demonstrate the linear convergence rate of our algorithms. Extensive experiments on both synthetic and real-world data show that our results align with theoretical predictions, exhibiting not only linear convergence but also superior speed compared to existing methods. To the best of our knowledge, this is the first work to propose a first-order algorithmic framework for inferring network structures from smooth signals under partial observability, offering both guaranteed linear convergence and practical effectiveness for large-scale networks.



Article 852

Title@2025-07-07 (1): Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track

Title: Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Position: Machine Learning Konferenzen sollten einen “Refutations and Critiques” Track erstellen 职位:机器学习会议应建立“反驳和批评”轨道 2506.19882v3

Authors (14): Rylan Schaeffer, Joshua Kazdan, Yegor Denisov-Blanch, Brando Miranda, Matthias Gerstgrasser, Susan Zhang, Andreas Haupt, Isha Gupta, Elyas Obbad, Jesse Dodge, Jessica Zosa Forde, Francesco Orabona, Sanmi Koyejo, David Donoho

Science progresses by iteratively advancing and correcting humanity’s understanding of the world. In machine learning (ML) research, rapid advancements have led to an explosion of publications, but have also led to misleading, incorrect, flawed or perhaps even fraudulent studies being accepted and sometimes highlighted at ML conferences due to the fallibility of peer review. While such mistakes are understandable, ML conferences do not offer robust processes to help the field systematically correct when such errors are made. This position paper argues that ML conferences should establish a dedicated “Refutations and Critiques” (R&C) Track. This R&C Track would provide a high-profile, reputable platform to support vital research that critically challenges prior research, thereby fostering a dynamic self-correcting research ecosystem. We discuss key considerations including track design, review principles, potential pitfalls, and provide an illustrative example submission concerning a recent ICLR 2025 Oral. We conclude that ML conferences should create official, reputable mechanisms to help ML research self-correct.



Article 853

Title@2025-07-07 (1): any4: Learned 4-bit Numeric Representation for LLMs

Title: any4: Learned 4-bit Numeric Representation for LLMs any4: Erlernte 4-Bit numerische Darstellung für LLMs 任何4: 获得4位数的LLM 数字代表数 2507.04610v1

Authors (2): Mostafa Elhoushi, Jeff Johnson

We present any4, a learned 4-bit weight quantization solution for large language models (LLMs) providing arbitrary numeric representations without requiring pre-processing of weights or activations. any4 yields higher accuracy compared to other related 4-bit numeric representation types: int4, fp4 and nf4, as evaluated on a range of model sizes, generations and families (Llama 2, Llama 3, Mistral and Mixtral). While any4 does not require preprocessing of weights or activations, it is also competitive with orthogonal techniques that require such preprocessing (e.g., AWQ and GPTQ). We also experiment with any3 and any2 and show competitiveness at lower bits. Additionally, we show that we can calibrate using a single curated diverse sample rather than hundreds of samples from a dataset as done in most quantization approaches. We also open source tinygemm, a latency optimized GPU matrix multiplication library for LLMs, that implements any4 using a GPU-efficient lookup table strategy along with other common quantization methods. We open source our code at https://github.com/facebookresearch/any4 .
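
To make the lookup-table idea in this abstract concrete, the sketch below stores each weight as a 4-bit index into a table of 16 learned values and dequantizes by table lookup. Fitting the table with plain one-dimensional k-means (Lloyd's algorithm) from a uniform initialization is an assumption made for this example; the abstract says only that the representation is learned. The function names are hypothetical and unrelated to the tinygemm API.

```python
def nearest_level(w, codebook):
    """Index of the codebook entry closest to w."""
    return min(range(len(codebook)), key=lambda k: abs(w - codebook[k]))


def learn_codebook(weights, num_levels=16, iters=20):
    """Fit a table of representative values with 1-D k-means (assumed fitting procedure)."""
    lo, hi = min(weights), max(weights)
    codebook = [lo + (hi - lo) * i / (num_levels - 1) for i in range(num_levels)]
    for _ in range(iters):
        buckets = [[] for _ in codebook]
        for w in weights:
            buckets[nearest_level(w, codebook)].append(w)
        # Empty buckets keep their previous value.
        codebook = [sum(b) / len(b) if b else c for b, c in zip(buckets, codebook)]
    return codebook


def quantize(weights, codebook):
    """Replace each weight by the index of its nearest table entry."""
    return [nearest_level(w, codebook) for w in weights]


def dequantize(codes, codebook):
    """Recover approximate weights by table lookup."""
    return [codebook[c] for c in codes]


row = [0.0, 1.0, 2.0, 3.0]
table = learn_codebook(row, num_levels=4)
codes = quantize(row, table)
restored = dequantize(codes, table)
```

Because the table entries are arbitrary floating-point values, the representable levels are not tied to a fixed grid such as int4 or a fixed format such as fp4 or nf4.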



Article 854

Title@2025-07-07 (1): Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions

Title: Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions Beschleunigtes Online-Verstärkungslernen mit Hilfe von Hilfs-Zustandsverteilungen 利用辅助启动州分发物资加速在线强化学习 2507.04606v1

Authors (3): Aman Mehra, Alexandre Capone, Jeff Schneider

A long-standing problem in online reinforcement learning (RL) is that of ensuring sample efficiency, a difficulty that stems from an inability to explore environments efficiently. Most attempts at efficient exploration tackle this problem in a setting where learning begins from scratch, without prior information available to bootstrap learning. However, such approaches fail to leverage expert demonstrations and simulators that can reset to arbitrary states. These affordances are valuable resources that offer enormous potential to guide exploration and speed up learning. In this paper, we explore how a small number of expert demonstrations and a simulator allowing arbitrary resets can accelerate learning during online RL. We find that training with a suitable choice of an auxiliary start state distribution that may differ from the true start state distribution of the underlying Markov Decision Process can significantly improve sample efficiency. We find that using a notion of safety to inform the choice of this auxiliary distribution significantly accelerates learning. By using episode length information as a way to operationalize this notion, we demonstrate state-of-the-art sample efficiency on a sparse-reward hard-exploration environment.



Article 855

Title@2025-07-07 (1): A Statistical Approach for Synthetic EEG Data Generation

Title: A Statistical Approach for Synthetic EEG Data Generation Ein statistischer Ansatz für die synthetische EEG-Datenerstellung 合成EEG数据生成的统计方法 2504.16143v2

Authors (5): Gideon Vos, Maryam Ebrahimpour, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi

Electroencephalogram (EEG) data is crucial for diagnosing mental health conditions but is costly and time-consuming to collect at scale. Synthetic data generation offers a promising solution to augment datasets for machine learning applications. However, generating high-quality synthetic EEG that preserves emotional and mental health signals remains challenging. This study proposes a method combining correlation analysis and random sampling to generate realistic synthetic EEG data. We first analyze interdependencies between EEG frequency bands using correlation analysis. Guided by this structure, we generate synthetic samples via random sampling. Samples with high correlation to real data are retained and evaluated through distribution analysis and classification tasks. A Random Forest model trained to distinguish synthetic from real EEG performs at chance level, indicating high fidelity. The generated synthetic data closely match the statistical and structural properties of the original EEG, with similar correlation coefficients and no significant differences in PERMANOVA tests. This method provides a scalable, privacy-preserving approach for augmenting EEG datasets, enabling more efficient model training in mental health research.



Article 856

Title@2025-07-07 (1): Predicting Drivers’ Route Trajectories in Last-Mile Delivery Using A Pair-wise Attention-based Pointer Neural Network

Title: Predicting Drivers’ Route Trajectories in Last-Mile Delivery Using A Pair-wise Attention-based Pointer Neural Network Vorhersage der Routentrajektorien von Fahrern in der Last-Mile-Lieferung mit einem paarweise aufmerksamkeitsbasierten Neural-Netzwerk 利用以对等关注为基础的指针神经网络预测最后中程交付中的司机路线轨迹 2301.03802v2

Authors (5): Baichuan Mo, Qing Yi Wang, Xiaotong Guo, Matthias Winkenbach, Jinhua Zhao

In last-mile delivery, drivers frequently deviate from planned delivery routes because of their tacit knowledge of the road and curbside infrastructure, customer availability, and other characteristics of the respective service areas. Hence, the actual stop sequences chosen by an experienced human driver may be preferable to the theoretical shortest-distance routing under real-life operational conditions. Thus, being able to predict the actual stop sequence that a human driver would follow can help to improve route planning in last-mile delivery. This paper proposes a pair-wise attention-based pointer neural network for this prediction task using drivers’ historical delivery trajectory data. In addition to the commonly used encoder-decoder architecture for sequence-to-sequence prediction, we propose a new attention mechanism based on an alternative specific neural network to capture the local pair-wise information for each pair of stops. To further capture the global efficiency of the route, we propose a new iterative sequence generation algorithm that is used after model training to identify the first stop of a route that yields the lowest operational cost. Results from an extensive case study on real operational data from Amazon’s last-mile delivery operations in the US show that our proposed method can significantly outperform traditional optimization-based approaches and other machine learning methods (such as the Long Short-Term Memory encoder-decoder and the original pointer network) in finding stop sequences that are closer to high-quality routes executed by experienced drivers in the field. Compared to benchmark models, the proposed model can increase the average prediction accuracy of the first four stops from around 0.229 to 0.312, and reduce the disparity between the predicted route and the actual route by around 15%.



Article 857

Title@2025-07-07 (1): Photon Splatting: A Physics-Guided Neural Surrogate for Real-Time Wireless Channel Prediction

Title: Photon Splatting: A Physics-Guided Neural Surrogate for Real-Time Wireless Channel Prediction Photon Splatting: Ein physikgeführtes Neural Surrogate für Echtzeit-Wireless-Channel-Vorhersage 光子喷射:实时无线频道预测的物理引导神经系统巡视 2507.04595v1

Authors (3): Ge Cao, Gabriele Gradoni, Zhen Peng

We present Photon Splatting, a physics-guided neural surrogate model for real-time wireless channel prediction in complex environments. The proposed framework introduces surface-attached virtual sources, referred to as photons, which carry directional wave signatures informed by the scene geometry and transmitter configuration. At runtime, channel impulse responses (CIRs) are predicted by splatting these photons onto the angular domain of the receiver using a geodesic rasterizer. The model is trained to learn a physically grounded representation that maps transmitter-receiver configurations to full channel responses. Once trained, it generalizes to new transmitter positions, antenna beam patterns, and mobile receivers without requiring model retraining. We demonstrate the effectiveness of the framework through a series of experiments, from canonical 3D scenes to a complex indoor cafe with 1,000 receivers. Results show 30 millisecond-level inference latency and accurate CIR predictions across a wide range of configurations. The approach supports real-time adaptability and interpretability, making it a promising candidate for wireless digital twin platforms and future 6G network planning.



Article 858

Title@2025-07-07 (1): Structured Captions Improve Prompt Adherence in Text-to-Image Models (Re-LAION-Caption 19M)

Title: Structured Captions Improve Prompt Adherence in Text-to-Image Models (Re-LAION-Caption 19M) Structured Captions verbessern die sofortige Einhaltung von Text-zu-Bild-Modellen (Re-LAION-Caption 19M) 结构说明 改进对文本到图像模型的迅速遵守(Re-LAION-Caption 19M) 2507.05300v1

Authors (4): Nicholas Merchant, Haitz Sáez de Ocáriz Borde, Andrei Cristian Popescu, Carlos Garcia Jurado Suarez

We argue that generative text-to-image models often struggle with prompt adherence due to the noisy and unstructured nature of large-scale datasets like LAION-5B. This forces users to rely heavily on prompt engineering to elicit desirable outputs. In this work, we propose that enforcing a consistent caption structure during training can significantly improve model controllability and alignment. We introduce Re-LAION-Caption 19M, a high-quality subset of Re-LAION-5B, comprising 19 million 1024x1024 images with captions generated by a Mistral 7B Instruct-based LLaVA-Next model. Each caption follows a four-part template: subject, setting, aesthetics, and camera details. We fine-tune PixArt-$\Sigma$ and Stable Diffusion 2 using both structured and randomly shuffled captions, and show that structured versions consistently yield higher text-image alignment scores using visual question answering (VQA) models. The dataset is publicly available at https://huggingface.co/datasets/supermodelresearch/Re-LAION-Caption19M.



Article 859

Title@2025-07-07 (1): Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation

Title: Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation Gradient-geführte Aufmerksamkeitskarte Bearbeitung: Auf dem Weg zu einer effizienten kontextuellen Halluzination 逐步制导注意地图编辑:努力有效减少环境幻觉 2503.08963v2

Authors (6): Yu Wang, Kamalika Das, Xiang Gao, Wendi Cui, Peng Li, Jiaxin Zhang

In tasks like summarization and open-book question answering (QA), Large Language Models (LLMs) often encounter “contextual hallucination”, where they produce irrelevant or incorrect responses despite having access to accurate source information. This typically occurs because these models tend to prioritize self-generated content over the input context, causing them to disregard pertinent details. To address this challenge, we introduce a novel method called “Guided Attention Map Editing” (GAME), which dynamically adjusts attention maps to improve contextual relevance. During inference, GAME employs a trained classifier to identify attention maps prone to inducing hallucinations and executes targeted interventions. These interventions, guided by gradient-informed “edit directions”, strategically redistribute attention weights across various heads to effectively reduce hallucination. Comprehensive evaluations on challenging summarization and open-book QA tasks show that GAME consistently reduces hallucinations across a variety of open-source models. Specifically, GAME reduces hallucinations by 10% in the XSum summarization task while achieving a 7X speed-up in computational efficiency compared to the state-of-the-art baselines.



Article 860

Title@2025-07-07 (1): Label-free evaluation of lung and heart transplant biopsies using tissue autofluorescence-based virtual staining

Title: Label-free evaluation of lung and heart transplant biopsies using tissue autofluorescence-based virtual staining Labelfreie Bewertung von Lungen- und Herztransplantationsbiopsien unter Verwendung von Gewebe-Autofluoreszenz-basierter virtueller Färbung 利用组织基于自体荧光的虚拟污点,对肺部和心脏移植活性细胞进行无标签的无标签评价 2409.05255v2

Authors (15): Yuzhu Li, Nir Pillar, Tairan Liu, Guangdong Ma, Yuxuan Qi, Kevin de Haan, Yijie Zhang, Xilin Yang, Adrian J. Correa, Guangqian Xiao, Kuang-Yu Jen, Kenneth A. Iczkowski, Yulun Wu, William Dean Wallace, Aydogan Ozcan

Organ transplantation serves as the primary therapeutic strategy for end-stage organ failures. However, allograft rejection is a common complication of organ transplantation. Histological assessment is essential for the timely detection and diagnosis of transplant rejection and remains the gold standard. Nevertheless, the traditional histochemical staining process is time-consuming, costly, and labor-intensive. Here, we present a panel of virtual staining neural networks for lung and heart transplant biopsies, which digitally convert autofluorescence microscopic images of label-free tissue sections into their brightfield histologically stained counterparts, bypassing the traditional histochemical staining process. Specifically, we virtually generated Hematoxylin and Eosin (H&E), Masson’s Trichrome (MT), and Elastic Verhoeff-Van Gieson (EVG) stains for label-free transplant lung tissue, along with H&E and MT stains for label-free transplant heart tissue. Subsequent blind evaluations conducted by three board-certified pathologists have confirmed that the virtual staining networks consistently produce high-quality histology images with high color uniformity, closely resembling their well-stained histochemical counterparts across various tissue features. The use of virtually stained images for the evaluation of transplant biopsies achieved comparable diagnostic outcomes to those obtained via traditional histochemical staining, with a concordance rate of 82.4% for lung samples and 91.7% for heart samples. Moreover, virtual staining models create multiple stains from the same autofluorescence input, eliminating structural mismatches observed between adjacent sections stained in the traditional workflow, while also saving tissue, expert time, and staining costs.



Article 861

Title@2025-07-07 (1): A Lightweight Deep Learning Model for Automatic Modulation Classification using Dual Path Deep Residual Shrinkage Network

Title: A Lightweight Deep Learning Model for Automatic Modulation Classification using Dual Path Deep Residual Shrinkage Network Ein leichtes Deep-Learning-Modell für automatische Modulationsklassifikation mit Dual Path Deep Residual Shrinkage Network 使用双路深海残余缩小网络自动调整分类的轻量深学习模式 2507.04586v1

Authors (2): Prakash Suman, Yanzhen Qu

Efficient spectrum utilization is critical to meeting the growing data demands of modern wireless communication networks. Automatic Modulation Classification (AMC) plays a key role in enhancing spectrum efficiency by accurately identifying modulation schemes in received signals, an essential capability for dynamic spectrum allocation and interference mitigation, particularly in cognitive radio (CR) systems. With the increasing deployment of smart edge devices, such as IoT nodes with limited computational and memory resources, there is a pressing need for lightweight AMC models that balance low complexity with high classification accuracy. This paper proposes a low-complexity, lightweight deep learning (DL) AMC model optimized for resource-constrained edge devices. We introduce a dual-path deep residual shrinkage network (DP-DRSN) with Garrote thresholding for effective signal denoising and design a compact hybrid CNN-LSTM architecture comprising only 27,000 trainable parameters. The proposed model achieves average classification accuracies of 61.20%, 63.78%, and 62.13% on the RML2016.10a, RML2016.10b, and RML2018.01a datasets, respectively, demonstrating a strong balance between model efficiency and classification performance. These results underscore the model’s potential for enabling accurate and efficient AMC on edge devices with limited resources.
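For readers unfamiliar with Garrote thresholding, the sketch below shows the standard non-negative garrote shrinkage function next to the more common soft threshold. It is a scalar illustration only: how the threshold is produced inside the DP-DRSN is not described in the abstract, so the fixed `lam` argument here is an assumption.

```python
def soft_threshold(x, lam):
    """Soft thresholding: zero inside [-lam, lam], shift by lam outside."""
    if abs(x) <= lam:
        return 0.0
    return x - lam if x > 0 else x + lam


def garrote_threshold(x, lam):
    """Non-negative garrote shrinkage: zero inside [-lam, lam],
    x - lam^2 / x outside, so large inputs are left almost unchanged."""
    if abs(x) <= lam:
        return 0.0
    return x - (lam * lam) / x
```

Both functions zero out small (presumably noise-dominated) values. The garrote version shrinks large values by less than soft thresholding does, which is the usual motivation for choosing it.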



Article 862

Title@2025-07-07 (1): Distributionally Robust Active Learning for Gaussian Process Regression

Title: Distributionally Robust Active Learning for Gaussian Process Regression Distributionell robustes aktives Lernen für Gaußsche Prozessregression Gaussian 进程倒退的分布强力积极学习 2502.16870v3

Authors (12): Shion Takeno, Yoshito Okura, Yu Inatsu, Tatsuya Aoyama, Tomonari Tanaka, Satoshi Akahane, Hiroyuki Hanada, Noriaki Hashimoto, Taro Murayama, Hanju Lee, Shinya Kojima, Ichiro Takeuchi

Gaussian process regression (GPR) or kernel ridge regression is a widely used and powerful tool for nonlinear prediction. Therefore, active learning (AL) for GPR, which actively collects data labels to achieve an accurate prediction with fewer data labels, is an important problem. However, existing AL methods do not theoretically guarantee prediction accuracy for the target distribution. Furthermore, as discussed in the distributionally robust learning literature, specifying the target distribution is often difficult. Thus, this paper proposes two AL methods that effectively reduce the worst-case expected error for GPR, that is, the worst-case expectation over target distribution candidates. We show an upper bound on the worst-case expected squared error, which suggests that the error can be made arbitrarily small with a finite number of data labels under mild conditions. Finally, we demonstrate the effectiveness of the proposed methods through synthetic and real-world datasets.



Article 863

Title@2025-07-06 (7): Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates

Title: Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates Iterative lineare quadratische Optimierung für nichtlineare Steuerung: Differentielle Programmierung algorithmischer Vorlagen 非线性控控的迭代线性线性赤道优化:可区分的编程 2207.06362v2

Authors (4): Vincent Roulet, Siddhartha Srinivasa, Maryam Fazel, Zaid Harchaoui

Iterative optimization algorithms depend on access to information about the objective function. In a differentiable programming framework, this information, such as gradients, can be automatically derived from the computational graph. We explore how nonlinear control algorithms, often employing linear and/or quadratic approximations, can be effectively cast within this framework. Our approach illuminates shared components and differences between gradient descent, Gauss-Newton, Newton, and differential dynamic programming methods in the context of discrete time nonlinear control. Furthermore, we present line-search strategies and regularized variants of these algorithms, along with a comprehensive analysis of their computational complexities. We study the performance of the aforementioned algorithms on various nonlinear control benchmarks, including autonomous car racing simulations using a simplified car model. All implementations are publicly available in a package coded in a differentiable programming language.



Article 864

Title@2025-07-06 (7): DMesh++: An Efficient Differentiable Mesh for Complex Shapes

Title: DMesh++: An Efficient Differentiable Mesh for Complex Shapes DMesh++: Ein effizientes, differenzierbares Mesh für komplexe Formen DMEsh++: 用于复杂形状的高效区别网格 2412.16776v2

Authors (8): Sanghyun Son, Matheus Gadelha, Yang Zhou, Matthew Fisher, Zexiang Xu, Yi-Ling Qiao, Ming C. Lin, Yi Zhou

Recent probabilistic methods for 3D triangular meshes capture diverse shapes by differentiable mesh connectivity, but face high computational costs with increased shape details. We introduce a new differentiable mesh processing method that addresses this challenge and efficiently handles meshes with intricate structures. Our method reduces time complexity from O(N) to O(log N) and requires significantly less memory than previous approaches. Building on this innovation, we present a reconstruction algorithm capable of generating complex 2D and 3D shapes from point clouds or multi-view images. Visit our project page (https://sonsang.github.io/dmesh2-project) for source code and supplementary material.



Article 865

Title@2025-07-06 (7): Sensor-fusion based Prognostics for Deep-space Habitats Exhibiting Multiple Unlabeled Failure Modes

Title: Sensor-fusion based Prognostics for Deep-space Habitats Exhibiting Multiple Unlabeled Failure Modes Sensor-fusionsbasierte Prognostik für Deep-Space-Habitate, die mehrere nicht markierte Ausfallmodi ausstellen 展示多种无标签故障模式的深空生境感应集成预测 2411.12159v3

Authors (5): Benjamin Peters, Ayush Mohanty, Xiaolei Fang, Stephen K. Robinson, Nagi Gebraeel

Deep-space habitats are complex systems that must operate autonomously over extended durations without ground-based maintenance. These systems are vulnerable to multiple, often unknown, failure modes that affect different subsystems and sensors in mode-specific ways. Developing accurate remaining useful life (RUL) prognostics is challenging, especially when failure labels are unavailable and sensor relevance varies by failure mode. In this paper, we propose an unsupervised prognostics framework that jointly identifies latent failure modes and selects informative sensors using only unlabeled training data. The methodology consists of two phases. In the offline phase, we model system failure times using a mixture of Gaussian regressions and apply a novel Expectation-Maximization algorithm to cluster degradation trajectories and select mode-specific sensors. In the online phase, we extract low-dimensional features from the selected sensors to diagnose the active failure mode and predict RUL using a weighted regression model. We demonstrate the effectiveness of our approach on a simulated dataset that reflects deep-space telemetry characteristics and on a real-world engine degradation dataset, showing improved accuracy and interpretability over existing methods.



Article 866

Title@2025-07-06 (7): Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts

Title: Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts Nile-Chat: Ägyptische Sprachmodelle für arabische und lateinische Scripte Nile-Chat:阿拉伯文和拉丁文脚本埃及语言模型 2507.04569v1

Authors (10): Guokan Shang, Hadi Abdine, Ahmad Chamma, Amr Mohamed, Mohamed Anwar, Abdelaziz Bounhar, Omar El Herraoui, Preslav Nakov, Michalis Vazirgiannis, Eric Xing

We introduce Nile-Chat-4B, 3x4B-A6B, and 12B, a collection of LLMs for Egyptian dialect, uniquely designed to understand and generate texts written in both Arabic and Latin scripts. Specifically, with Nile-Chat-3x4B-A6B, we introduce a novel language adaptation approach by leveraging the Branch-Train-MiX strategy to merge script-specialized experts into a single MoE model. Our Nile-Chat models significantly outperform leading multilingual and Arabic LLMs, such as LLaMa, Jais, and ALLaM, on our newly introduced Egyptian evaluation benchmarks, which span both understanding and generative tasks. Notably, our 12B model yields a 14.4% performance gain over Qwen2.5-14B-Instruct on Latin-script benchmarks. All our resources are publicly available. We believe this work presents a comprehensive methodology for adapting LLMs to dual-script languages, addressing an often overlooked aspect in modern LLM development.



Article 867

Title@2025-07-06 (7): Evaluating LLMs on Real-World Forecasting Against Human Superforecasters

Title: Evaluating LLMs on Real-World Forecasting Against Human Superforecasters Bewertung von LLMs bei der Real-World-Prognose gegen menschliche Superforecaster 评价针对人类超先天预测者进行真实世界预测的LLMs评价项目 2507.04562v1

Authors (1): Janna Lu

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but their ability to forecast future events remains understudied. A year ago, large language models struggled to come close to the accuracy of a human crowd. I evaluate state-of-the-art LLMs on 464 forecasting questions from Metaculus, comparing their performance against human superforecasters. Frontier models achieve Brier scores that ostensibly surpass the human crowd but still significantly underperform a group of superforecasters.
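As a reminder of the metric used above, the Brier score for binary questions is the mean squared difference between the forecast probability and the realized outcome (0 or 1); lower is better, and always answering 0.5 scores 0.25. The helper below is a minimal sketch with a hypothetical name and is not taken from the paper's evaluation code.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)
```

For example, forecasts of 0.9 and 0.2 on questions that resolve "yes" and "no" respectively give (0.01 + 0.04) / 2 = 0.025.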



Article 868

Title@2025-07-06 (7): PENCIL: Long Thoughts with Short Memory

Title: PENCIL: Long Thoughts with Short Memory PENCIL: Lange Gedanken mit kurzer Erinnerung PENCIL:有短记忆的长思考 2503.14337v2

Authors (4): Chenxiao Yang, Nathan Srebro, David McAllester, Zhiyuan Li

While state-of-the-art LLMs have demonstrated great promise of using long Chains-of-Thought (CoT) to boost reasoning, scaling it up to more challenging problems at test-time is fundamentally limited by suboptimal memory usage – intermediate computations accumulate indefinitely in context even when no longer needed for future thoughts. We introduce PENCIL, which incorporates a novel reduction mechanism into the autoregressive generation process that recursively cleans up intermediate thoughts based on patterns learned from training. By iteratively generating and erasing thoughts, PENCIL can think deeper to solve harder problems using shorter context and less compute. Empirically, we observe PENCIL is significantly more effective and efficient than CoT. For example, we demonstrate PENCIL with a small 25M-parameter transformer and 2048 context length solves Einstein’s puzzle – a task that challenges much larger models like GPT-4. Theoretically, we prove PENCIL can perform universal efficient computation by simulating any Turing machine with optimal time and space complexity, and thus can solve arbitrary computable tasks that are otherwise intractable for vanilla CoT.



Article 869

Title@2025-07-06 (7): SPIRA: Building an Intelligent System for Respiratory Insufficiency Detection

Title: SPIRA: Building an Intelligent System for Respiratory Insufficiency Detection SPIRA: Aufbau eines intelligenten Systems zur Erkennung von respiratorischer Insuffizienz SPIRA: 建立呼吸系统不足检测智能系统 2507.04548v1

Authors (5): Renato Cordeiro Ferreira, Dayanne Gomes, Vitor Tamae, Francisco Wernke, Alfredo Goldman

Respiratory insufficiency is a medical symptom in which a person gets a reduced amount of oxygen in the blood. This paper reports the experience of building SPIRA: an intelligent system for detecting respiratory insufficiency from voice. It compiles challenges faced in two succeeding implementations of the same architecture, summarizing lessons learned on data collection, training, and inference for future projects in similar systems.



Article 870

Title@2025-07-06 (7): Long Context Modeling with Ranked Memory-Augmented Retrieval

Title: Long Context Modeling with Ranked Memory-Augmented Retrieval Langes Kontextmodellieren mit Rangfolgememory-Augmented Retrieval 长处建模,有升级的内存增强检索模型 2503.14800v2

Authors (7): Ghadir Alselwi, Hao Xue, Shoaib Jameel, Basem Suleiman, Hakim Hacid, Flora D. Salim, Imran Razzak

Effective long-term memory management is crucial for language models handling extended contexts. We introduce a novel framework that dynamically ranks memory entries based on relevance. Unlike previous works, our model introduces a novel relevance scoring and a pointwise re-ranking model for key-value embeddings, inspired by learning-to-rank techniques in information retrieval. Enhanced Ranked Memory Augmented Retrieval (ERMAR) achieves state-of-the-art results on standard benchmarks.



Article 871

Title@2025-07-06 (7): Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data

Title: Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data Gradienten Steigerung Entscheidung Bäume auf medizinische Diagnose über Tabellendaten 关于表格数据上医学诊断的渐进推动决策树 2410.03705v4

Authors (2): A. Yarkın Yıldız, Asli Kalayci

Medical diagnosis is a crucial task in the medical field, in terms of providing accurate classification and respective treatments. Having near-precise decisions based on correct diagnosis can affect a patient’s life itself, and a misclassification may, in extreme cases, result in a catastrophe. Several traditional machine learning (ML) methods, such as support vector machines (SVMs) and logistic regression, and state-of-the-art tabular deep learning (DL) methods, including TabNet and TabTransformer, have been proposed and used over tabular medical datasets. Additionally, due to their superior performance, lower computational costs, and easier optimization over different tasks, ensemble methods have been used in the field more recently. They offer a powerful alternative in terms of providing successful medical decision-making processes in several diagnosis tasks. In this study, we investigated the benefits of ensemble methods, especially the Gradient Boosting Decision Tree (GBDT) algorithms in medical classification tasks over tabular data, focusing on XGBoost, CatBoost, and LightGBM. The experiments demonstrate that GBDT methods outperform traditional ML and deep neural network architectures and have the highest average rank over several benchmark tabular medical diagnosis datasets. Furthermore, they require much less computational power compared to DL models, creating the optimal methodology in terms of high performance and lower complexity.



Article 872

Title@2025-07-06 (7): da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

Title: da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs da4ml: Verteilte Arithmetik für Echtzeit-Neurale Netzwerke auf FPGAs da4ml: FPGAs 实时神经网络的分布式重新测量 2507.04535v1

Authors (5): Chang Sun, Zhiqiang Que, Vladimir Loncar, Wayne Luk, Maria Spiropulu

Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully unrolled and pipelined. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the required constant matrix-vector multiplication (CMVM) operations. In this work, we propose an efficient algorithm for implementing CMVM operations with distributed arithmetic (DA) on FPGAs that simultaneously optimizes for area consumption and latency. The algorithm achieves resource reduction similar to state-of-the-art algorithms while being significantly faster to compute. The proposed algorithm is open-sourced and integrated into the hls4ml library, a free and open-source library for running real-time neural network inference on FPGAs. We show that the proposed algorithm can reduce on-chip resources by up to a third for realistic, highly quantized neural networks while simultaneously reducing latency, enabling the implementation of previously infeasible networks.
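To make the CMVM setting concrete, the sketch below evaluates a constant integer matrix-vector product using only shifts, additions and subtractions, which is the kind of multiplier-free structure that maps well onto FPGA logic. It is not the da4ml algorithm: the canonical signed-digit (CSD) decomposition is a textbook technique chosen here as an assumption, and the function names are hypothetical.

```python
def to_csd(c):
    """Decompose integer c into signed powers of two (non-adjacent form).

    Returns a list of (shift, sign) pairs with c == sum(sign * 2**shift).
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            sign = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            digits.append((shift, sign))
            c -= sign
        c >>= 1
        shift += 1
    return digits


def cmvm_shift_add(weights, x):
    """Compute weights @ x for constant integer weights without multiplying."""
    out = []
    for row in weights:
        acc = 0
        for w, xj in zip(row, x):
            for shift, sign in to_csd(w):
                term = xj << shift  # a shift is just wiring in hardware
                acc = acc + term if sign > 0 else acc - term
        out.append(acc)
    return out
```

In this naive form the adder count is roughly the total number of non-zero signed digits. Sharing common sub-expressions across rows is where real resource savings would come from, and that part is not modeled here.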



Article 873

Title@2025-07-06 (7): DP-Fusion: Token-Level Differentially Private Inference for Large Language Models

Title: DP-Fusion: Token-Level Differentially Private Inference for Large Language Models DP-Fusion: Token-Level Differentially Private Inferenz für große Sprachmodelle DP-Fusion: 大语言模式的调控层面的私人不同推论 2507.04531v1

Authors (4): Rushil Thareja, Preslav Nakov, Praneeth Vepakomma, Nils Lukas

Large language models (LLMs) can leak sensitive information from their context through generated outputs, either accidentally or when prompted adversarially. Existing defenses that aim to preserve context privacy during inference either lack formal guarantees or suffer from a poor utility/privacy trade-off. We propose DP-Fusion, a token-level Differentially Private Inference (DPI) mechanism that provably bounds how much an LLM’s outputs reveal about sensitive tokens in its context. We demonstrate DPI through the task of document privatization, where the goal is to paraphrase documents so that sensitive content (e.g., Personally Identifiable Information, PII) cannot be reliably inferred, while still preserving the overall utility of the text. This is controlled by a parameter $\epsilon$: $\epsilon=0$ hides PII entirely, while higher values trade off privacy for improved paraphrase quality. DP-Fusion works as follows: (i) partition sensitive tokens into disjoint privacy groups, (ii) run the LLM once per group, and (iii) blend the output distributions so that the final output remains within a fixed statistical distance of the baseline distribution produced when no privacy group is revealed. This approach allows fine-grained control over the privacy/utility trade-off but requires multiple LLM forward passes.



Article 874

Title@2025-07-06 (7): Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness

Title: Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness Von ähnlichen linearen Darstellungen lernen: Adaptivität, Minimaximität und Robustheit 从类似线性表示法学习:适应性、微量和强力 2303.17765v5

Authors (3): Ye Tian, Yuqi Gu, Yang Feng

Representation multi-task learning (MTL) has achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on cases where all tasks share the same representation, and claim that MTL almost always improves performance. Nevertheless, as the number of tasks grows, assuming all tasks share the same representation is unrealistic. Furthermore, empirical findings often indicate that a shared representation does not necessarily improve single-task learning performance. In this paper, we aim to understand how to learn from tasks with similar but not exactly the same linear representations, while dealing with outlier tasks. Assuming a known intrinsic dimension, we propose a penalized empirical risk minimization method and a spectral method that are adaptive to the similarity structure and robust to outlier tasks. Both algorithms outperform single-task learning when representations across tasks are sufficiently similar and the proportion of outlier tasks is small. Moreover, they always perform at least as well as single-task learning, even when the representations are dissimilar. We provide information-theoretic lower bounds to demonstrate that both methods are nearly minimax optimal in a large regime, with the spectral method being optimal in the absence of outlier tasks. Additionally, we introduce a thresholding algorithm to adapt to an unknown intrinsic dimension. We conduct extensive numerical experiments to validate our theoretical findings.



Article 875

Title@2025-07-06 (7): Selective Prediction via Training Dynamics

Title: Selective Prediction via Training Dynamics Selektive Vorhersage über Trainingsdynamik 通过培训动态进行选择性预测 2205.13532v4

Authors (8): Stephan Rabanser, Anvith Thudi, Kimia Hamidieh, Adam Dziedzic, Israfil Bahceci, Akram Bin Sediq, Hamza Sokun, Nicolas Papernot

Selective Prediction is the task of rejecting inputs a model would predict incorrectly on. This involves a trade-off between input space coverage (how many data points are accepted) and model utility (how good is the performance on accepted data points). Current methods for selective prediction typically impose constraints on either the model architecture or the optimization objective; this inhibits their usage in practice and introduces unknown interactions with pre-existing loss functions. In contrast to prior work, we show that state-of-the-art selective prediction performance can be attained solely from studying the (discretized) training dynamics of a model. We propose a general framework that, given a test input, monitors metrics capturing the instability of predictions from intermediate models (i.e., checkpoints) obtained during training w.r.t. the final model’s prediction. In particular, we reject data points exhibiting too much disagreement with the final prediction at late stages in training. The proposed rejection mechanism is domain-agnostic (i.e., it works for both discrete and real-valued prediction) and can be flexibly combined with existing selective prediction approaches as it does not require any train-time modifications. Our experimental evaluation on image classification, regression, and time series problems shows that our method beats past state-of-the-art accuracy/utility trade-offs on typical selective prediction benchmarks.
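
The rejection rule described above can be sketched as follows. This is a minimal illustration, not the authors' exact metric: the function names, the linear weighting toward late checkpoints, and the threshold value are all assumptions made for the example.

```python
from typing import Sequence


def instability_score(checkpoint_preds: Sequence[int], final_pred: int) -> float:
    """Weighted fraction of checkpoints that disagree with the final prediction.

    Later checkpoints receive larger weights (linear in the checkpoint index),
    so disagreement late in training counts more. The linear weighting is an
    illustrative assumption.
    """
    n = len(checkpoint_preds)
    if n == 0:
        raise ValueError("need at least one checkpoint prediction")
    weights = [(t + 1) / n for t in range(n)]
    disagreement = sum(w for w, p in zip(weights, checkpoint_preds) if p != final_pred)
    return disagreement / sum(weights)


def accept(checkpoint_preds: Sequence[int], final_pred: int, threshold: float = 0.2) -> bool:
    """Accept a test input only if its instability score stays below the threshold."""
    return instability_score(checkpoint_preds, final_pred) <= threshold


# Early disagreement only: accepted. Late disagreement: rejected.
print(accept([0, 1, 1, 1, 1], final_pred=1))
print(accept([1, 1, 1, 0, 0], final_pred=1))
```

Because the score uses only stored checkpoint predictions, a rule of this kind needs no change to the architecture or the loss, which is the property the abstract emphasizes.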



Article 876

Title@2025-07-06 (7): Verification of Visual Controllers via Compositional Geometric Transformations

Title: Verification of Visual Controllers via Compositional Geometric Transformations Verifizierung von visuellen Controllern durch kompositorische geometrische Transformationen 通过组成几何变形核查视觉控制器 2507.04523v1

Authors (3): Alexander Estornell, Leonard Jung, Michael Everett

Perception-based neural network controllers are increasingly used in autonomous systems that rely on visual inputs to operate in the real world. Ensuring the safety of such systems under uncertainty is challenging. Existing verification techniques typically focus on Lp-bounded perturbations in the pixel space, which fails to capture the low-dimensional structure of many real-world effects. In this work, we introduce a novel verification framework for perception-based controllers that can generate outer-approximations of reachable sets through explicitly modeling uncertain observations with geometric perturbations. Our approach constructs a boundable mapping from states to images, enabling the use of state-based verification tools while accounting for uncertainty in perception. We provide theoretical guarantees on the soundness of our method and demonstrate its effectiveness across benchmark control environments. This work provides a principled framework for certifying the safety of perception-driven control systems under realistic visual perturbations.



Article 877

Title@2025-07-06 (7): Diffusion Explorer: Interactive Exploration of Diffusion Models

Title: Diffusion Explorer: Interactive Exploration of Diffusion Models Diffusion Explorer: Interaktive Erforschung von Diffusionsmodellen 传播探索:传播模型的互动探索 2507.01178v2

Authors (2): Alec Helbling, Duen Horng Chau

Diffusion models have been central to the development of recent image, video, and even text generation systems. They possess striking geometric properties that can be faithfully portrayed in low-dimensional settings. However, existing resources for explaining diffusion either require an advanced theoretical foundation or focus on their neural network architectures rather than their rich geometric properties. We introduce Diffusion Explorer, an interactive tool to explain the geometric properties of diffusion models. Users can train 2D diffusion models in the browser and observe the temporal dynamics of their sampling process. Diffusion Explorer leverages interactive animation, which has been shown to be a powerful tool for making engaging visualizations of dynamic systems, making it well suited to explaining diffusion models, which represent stochastic processes that evolve over time. Diffusion Explorer is open source and a live demo is available at alechelbling.com/Diffusion-Explorer.



Article 878

Title@2025-07-06 (7): Toward Universal Laws of Outlier Propagation

Title: Toward Universal Laws of Outlier Propagation Auf dem Weg zu universellen Gesetzen der Ausreißerpropagation 走向世界外阴外传播法 2502.08593v3

Authors (3): Aram Ebtekar, Yuhao Wang, Dominik Janzing

When a variety of anomalous features motivate flagging different samples as outliers, Algorithmic Information Theory (AIT) offers a principled way to unify them in terms of a sample’s randomness deficiency. Subject to the algorithmic Markov condition on a causal Bayesian network, we show that the randomness deficiency of a joint sample decomposes into a sum of randomness deficiencies at each causal mechanism. Consequently, anomalous observations can be attributed to their root causes, i.e., the mechanisms that behaved anomalously. As an extension of Levin’s law of randomness conservation, we show that weak outliers cannot cause strong ones. We show how these information theoretic laws clarify our understanding of outlier detection and attribution, in the context of more specialized outlier scores from prior literature.



Article 879

Title@2025-07-06 (7): DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging

Title: DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging DOTResize: Reduzierung der LLM-Breite durch diskrete Optimale transportbasierte Neuronenverschmelzung DOTRIZ: 通过分立优化运输最佳方式的中子合并减少 LLM 宽度 2507.04517v1

Authors (3): Neha Verma, Kenton Murray, Kevin Duh

Model compression offers a promising path to reducing the cost and inaccessibility of large pre-trained models, without significantly compromising their impressive performance. Large Transformer models, including large language models (LLMs), often contain computational redundancy, which can serve as a target for new model compression methods. In this work, we specifically target neuron-level redundancies in model layers by combining groups of similar neurons into fewer neurons. We frame this width reduction as a Discrete Optimal Transport problem, and propose DOTResize, a novel Transformer compression method that uses optimal transport theory to transform and compress model weights. To ensure applicability within the Transformer architecture, we motivate and incorporate entropic regularization and matrix factorization into the transportation maps produced by our method. Unlike pruning-based approaches which discard neurons based on importance measures, DOTResize re-projects the entire neuron width, allowing the retention and redistribution of useful signal across the reduced layer. Empirical results show that compared to simple or state-of-the-art neuron width-pruning techniques, DOTResize can outperform these methods across multiple LLM families and sizes, while achieving measurable reductions in real-world computational cost.



Article 880

Title@2025-07-06 (7): On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems

Title: On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems Über die Nützlichkeit von Domain-Adjacent Fine-Tuned Model Ensembles für Probleme mit wenigen Fotos 关于Dome-邻接精度精美设计模型集的效用问题 2406.13720v2

Authors (5): Md Ibrahim Ibne Alam, Parikshit Ram, Soham Dan, Horst Samulowitz, Koushik Kar

Large Language Models (LLMs) have been observed to perform well on a wide range of downstream tasks when fine-tuned on domain-specific data. However, such data may not be readily available in many applications, motivating zero-shot or few-shot approaches using domain-adjacent models. While several fine-tuned models for various tasks are available, finding an appropriate domain-adjacent model for a given task is often not straightforward. In this paper, we study DAFT-E, a framework that utilizes an Ensemble of Domain-Adjacent Fine-Tuned Foundation Models for few-shot problems. We show that for zero-shot problems, this ensembling method provides accuracy close to that of the single best model. With few-shot problems, this performance improves further, at which point DAFT-E can outperform any single domain-adjacent model while requiring much less data for domain-specific fine-tuning.



Article 881

Title@2025-07-06 (7): Collapse-Proof Non-Contrastive Self-Supervised Learning

Title: Collapse-Proof Non-Contrastive Self-Supervised Learning Kollaps-Proof des nicht kontrasiven Selbstüberwachten Lernens 非封闭性自我监督学习的折叠结果 2410.04959v4

Authors (3): Emanuele Sansone, Tim Lebailly, Tinne Tuytelaars

We present a principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing. We theoretically demonstrate that this design introduces an inductive bias that encourages representations to be simultaneously decorrelated and clustered, without explicitly enforcing these properties. This bias provably enhances generalization and suffices to avoid known training failure modes, such as representation, dimensional, cluster, and intracluster collapses. We validate our theoretical findings on image datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet-100. Our approach effectively combines the strengths of feature decorrelation and cluster-based self-supervised learning methods, overcoming training failure modes while achieving strong generalization in clustering and linear classification tasks.



Article 882

Title@2025-07-06 (7): Mixtures of Gaussian Process Experts with SMC$^2$

Title: Mixtures of Gaussian Process Experts with SMC$^2$ Mischungen von Gaußschen Prozessexperten mit SMC$^2$ 高斯进程专家与SMC合金2美元混合 2208.12830v2

Authors (4): Teemu Härkönen, Sara Wade, Kody Law, Lassi Roininen

Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need to invert and store a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered where data points are assigned to independent experts, reducing the complexity by allowing inference based on smaller, local covariance matrices. Moreover, mixtures of Gaussian process experts substantially enrich the model’s flexibility, allowing for behaviors such as non-stationarity, heteroscedasticity, and discontinuities. In this work, we construct a novel inference approach based on nested sequential Monte Carlo samplers to simultaneously infer both the gating network and Gaussian process expert parameters. This greatly improves inference compared to importance sampling, particularly in settings where a stationary Gaussian process is inappropriate, while still being thoroughly parallelizable.



Article 883

Title@2025-07-06 (7): Tackling Dimensional Collapse toward Comprehensive Universal Domain Adaptation

Title: Tackling Dimensional Collapse toward Comprehensive Universal Domain Adaptation Ein dimensionaler Zusammenbruch hin zu einer umfassenden universellen Domain-Anpassung 应对走向全面通用域域适应的方位折叠 2410.11271v3

Authors (3): Hung-Chieh Fang, Po-Yi Lu, Hsuan-Tien Lin

Universal Domain Adaptation (UniDA) addresses unsupervised domain adaptation where target classes may differ arbitrarily from source ones, except for a shared subset. A widely used approach, partial domain matching (PDM), aligns only shared classes but struggles in extreme cases where many source classes are absent in the target domain, underperforming the most naive baseline that trains on only source data. In this work, we identify that the failure of PDM for extreme UniDA stems from dimensional collapse (DC) in target representations. To address target DC, we propose to use the de-collapse techniques in self-supervised learning on the unlabeled target data to preserve the intrinsic structure of the learned representations. Our experimental results confirm that SSL consistently advances PDM and delivers new state-of-the-art results across a broader benchmark of UniDA scenarios with different portions of shared classes, representing a crucial step toward truly comprehensive UniDA. Project page: https://dc-unida.github.io/



Article 884

Title@2025-07-06 (7): Self-Consistency Preference Optimization

Title: Self-Consistency Preference Optimization Selbst-Konsistenz-Präferenz-Optimierung 优化优惠 2411.04109v3

Authors (9): Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang, Jing Xu, Maryam Fazel-Zarandi, Mohit Bansal, Sainbayar Sukhbaatar, Jason Weston, Jane Yu

Self-alignment, whereby models learn to improve themselves without human annotation, is a rapidly growing research area. However, existing techniques often fail to improve complex reasoning tasks due to the difficulty of assigning correct rewards. An orthogonal approach that is known to improve correctness is self-consistency, a method applied at inference time based on multiple sampling in order to find the most consistent answer. In this work, we extend the self-consistency concept to help train models. We thus introduce self-consistency preference optimization (ScPO), which iteratively trains consistent answers to be preferred over inconsistent ones on unsupervised new problems. We show ScPO leads to large improvements over conventional reward model training on reasoning tasks such as GSM8K and MATH, closing the gap with supervised training with gold answers or preferences, and that combining ScPO with standard supervised learning improves results even further. On ZebraLogic, ScPO finetunes Llama-3 8B to be superior to Llama-3 70B, Gemma-2 27B, and Claude-3 Haiku.
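
The idea of turning self-consistency into a training signal can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `preference_pair`, the choice of the most-voted and least-voted answers as the pair, and the `min_margin` filter are hypothetical simplifications, not a confirmed description of ScPO.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def preference_pair(answers: Sequence[str], min_margin: int = 2) -> Optional[Tuple[str, str]]:
    """Build a (chosen, rejected) pair from sampled answers to one problem.

    The most frequent answer is treated as "consistent" (chosen) and the
    least frequent as "inconsistent" (rejected). Problems whose vote margin
    is below min_margin are skipped, since the vote is then uninformative.
    """
    counts = Counter(answers).most_common()  # sorted by vote count, descending
    if len(counts) < 2:
        return None  # all samples agree, so there is nothing to contrast
    chosen, chosen_votes = counts[0]
    rejected, rejected_votes = counts[-1]
    if chosen_votes - rejected_votes < min_margin:
        return None
    return chosen, rejected


samples = ["42", "42", "41", "42", "40", "41"]
print(preference_pair(samples))  # ('42', '40')
```

Pairs produced this way need no gold labels, so they could feed any preference-optimization loss on unlabeled problems.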



Article 885

Title@2025-07-06 (7): Thousand-Brains Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference

Title: Thousand-Brains Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference Tausende-Brains-Systeme: Sensorimotorische Intelligenz für schnelles, robustes Lernen und Rückschlüsse 千字系统:用于快速、强力学习和推断的感知分子情报 2507.04494v1

Authors (5): Niels Leadholm, Viviane Clay, Scott Knudstrup, Hojae Lee, Jeff Hawkins

Current AI systems achieve impressive performance on many tasks, yet they lack core attributes of biological intelligence, including rapid, continual learning, representations grounded in sensorimotor interactions, and structured knowledge that enables efficient generalization. Neuroscience theory suggests that mammals evolved flexible intelligence through the replication of a semi-independent, sensorimotor module, a functional unit known as a cortical column. To address the disparity between biological and artificial intelligence, thousand-brains systems were proposed as a means of mirroring the architecture of cortical columns and their interactions. In the current work, we evaluate the unique properties of Monty, the first implementation of a thousand-brains system. We focus on 3D object perception, and in particular, the combined task of object recognition and pose estimation. Utilizing the YCB dataset of household objects, we first assess Monty’s use of sensorimotor learning to build structured representations, finding that these enable robust generalization. These representations include an emphasis on classifying objects by their global shape, as well as a natural ability to detect object symmetries. We then explore Monty’s use of model-free and model-based policies to enable rapid inference by supporting principled movements. We find that such policies complement Monty’s modular architecture, a design that can accommodate communication between modules to further accelerate inference speed via a novel ‘voting’ algorithm. Finally, we examine Monty’s use of associative, Hebbian-like binding to enable rapid, continual, and computationally efficient learning, properties that compare favorably to current deep learning architectures. While Monty is still in a nascent stage of development, these findings support thousand-brains systems as a powerful and promising new approach to AI.



Article 886

Title@2025-07-06 (7): Machine Learning-Based Prediction of Metal-Organic Framework Materials: A Comparative Analysis of Multiple Models

Title: Machine Learning-Based Prediction of Metal-Organic Framework Materials: A Comparative Analysis of Multiple Models Machine Learning-based Prediction of Metal-Organic Framework Materials: Eine vergleichende Analyse mehrerer Modelle 对金属-有机框架材料的基于学习的机 机 动 学习-预测:对多种模型的比较分析 2507.04493v1

Authors (3): Zhuo Zheng, Keyan Liu, Xiyuan Zhu

Metal-organic frameworks (MOFs) have emerged as promising materials for various applications due to their unique structural properties and versatile functionalities. This study presents a comprehensive investigation of machine learning approaches for predicting MOF material properties. We employed five different machine learning models: Random Forest, XGBoost, LightGBM, Support Vector Machine, and Neural Network, to analyze and predict MOF characteristics using a dataset from the Kaggle platform. The models were evaluated using multiple performance metrics, including RMSE, R^2, MAE, and cross-validation scores. Results demonstrated that the Random Forest model achieved superior performance with an R^2 value of 0.891 and RMSE of 0.152, significantly outperforming other models. LightGBM showed remarkable computational efficiency, completing training in 25.7 seconds while maintaining high accuracy. Our comparative analysis revealed that ensemble learning methods generally exhibited better performance than traditional single models in MOF property prediction. This research provides valuable insights into the application of machine learning in materials science and establishes a robust framework for future MOF material design and property prediction.



Article 887

Title@2025-07-06 (7): Dealing with Uncertainty in Contextual Anomaly Detection

Title: Dealing with Uncertainty in Contextual Anomaly Detection Umgang mit Ungewissheit bei der kontextuellen Anomalienerkennung 处理内地异常探测的不确定性 2507.04490v1

Authors (5): Luca Bindini, Lorenzo Perini, Stefano Nistri, Jesse Davis, Paolo Frasconi

Contextual anomaly detection (CAD) aims to identify anomalies in a target (behavioral) variable conditioned on a set of contextual variables that influence the normalcy of the target variable but are not themselves indicators of anomaly. In this work, we propose a novel framework for CAD, normalcy score (NS), that explicitly models both the aleatoric and epistemic uncertainties. Built on heteroscedastic Gaussian process regression, our method regards the Z-score as a random variable, providing confidence intervals that reflect the reliability of the anomaly assessment. Through experiments on benchmark datasets and a real-world application in cardiology, we demonstrate that NS outperforms state-of-the-art CAD methods in both detection accuracy and interpretability. Moreover, confidence intervals enable an adaptive, uncertainty-driven decision-making process, which may be very important in domains such as healthcare.



Article 888

Title@2025-07-06 (7): Source Attribution in Retrieval-Augmented Generation

Title: Source Attribution in Retrieval-Augmented Generation Quellenzuweisung in der retrieval-angereicherten Generation 来源来源来源的归属 2507.04480v1

Authors (7): Ikhtiyor Nematov, Tarik Kalai, Elizaveta Kuzmenko, Gabriele Fugagnoli, Dimitris Sacharidis, Katja Hose, Tomer Sagi

While attribution methods, such as Shapley values, are widely used to explain the importance of features or training data in traditional machine learning, their application to Large Language Models (LLMs), particularly within Retrieval-Augmented Generation (RAG) systems, is nascent and challenging. The primary obstacle is the substantial computational cost, where each utility function evaluation involves an expensive LLM call, resulting in direct monetary and time expenses. This paper investigates the feasibility and effectiveness of adapting Shapley-based attribution to identify influential retrieved documents in RAG. We compare Shapley with more computationally tractable approximations and some existing attribution methods for LLMs. Our work aims to: (1) systematically apply established attribution principles to the RAG document-level setting; (2) quantify how well SHAP approximations can mirror exact attributions while minimizing costly LLM interactions; and (3) evaluate their practical explainability in identifying critical documents, especially under complex inter-document relationships such as redundancy, complementarity, and synergy. This study seeks to bridge the gap between powerful attribution techniques and the practical constraints of LLM-based RAG systems, offering insights into achieving reliable and affordable RAG explainability.
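
To make the cost argument concrete, exact document-level Shapley values can be sketched as follows. The `utility` callable is a hypothetical stand-in for "run the RAG pipeline with this subset of documents and score the answer". In a real system each call would be an LLM invocation, and the toy utility below is an assumption made purely for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    docs: Sequence[str], utility: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley value of each document under the given utility.

    The loops visit every subset of the other documents, so the number of
    utility evaluations grows exponentially with len(docs).
    """
    n = len(docs)
    values = {}
    for d in docs:
        others = [o for o in docs if o != d]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude d.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (utility(s | {d}) - utility(s))
        values[d] = total
    return values


# Toy utility: documents A and B are redundant (either one answers the
# question), while C is irrelevant.
def toy_utility(subset: FrozenSet[str]) -> float:
    return 1.0 if ("A" in subset or "B" in subset) else 0.0


print(shapley_values(["A", "B", "C"], toy_utility))
```

In this toy case the redundant documents A and B split the credit equally and C receives none, which illustrates why redundancy is one of the inter-document relationships the study examines.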



Article 889

Title@2025-07-06 (7): Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice

Title: Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice Minimax und Buchten Optimale Best-Arm-Identifikation: Adaptives Experimentaldesign für die Behandlungswahl Minimax和Bayes 最佳最佳武器识别:治疗选择的适应性实验设计 2506.24007v2

Authors (1): Masahiro Kato

This study investigates adaptive experimental design for treatment choice, also known as fixed-budget best-arm identification. We consider an adaptive procedure consisting of a treatment-allocation phase followed by a treatment-choice phase, and we design an adaptive experiment for this setup to efficiently identify the best treatment arm, defined as the one with the highest expected outcome. In our designed experiment, the treatment-allocation phase consists of two stages. The first stage is a pilot phase, where we allocate each treatment arm uniformly with equal proportions to eliminate clearly suboptimal arms and estimate outcome variances. In the second stage, we allocate treatment arms in proportion to the variances estimated in the first stage. After the treatment-allocation phase, the procedure enters the treatment-choice phase, where we choose the treatment arm with the highest sample mean as our estimate of the best treatment arm. We prove that this single design is simultaneously asymptotically minimax and Bayes optimal for the simple regret, with upper bounds that match our lower bounds up to exact constants. Therefore, our designed experiment achieves the sharp efficiency limits without requiring separate tuning for minimax and Bayesian objectives.
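
The two-stage procedure described above can be sketched as follows. This is a simplified illustration, not the paper's exact design: it omits the elimination of clearly suboptimal arms after the pilot, and the function name, the `pilot_fraction` parameter, and the Gaussian arms in the usage lines are assumptions.

```python
import random
import statistics


def two_stage_experiment(arms, budget, pilot_fraction=0.3, seed=0):
    """Pilot stage with uniform allocation, then variance-proportional allocation.

    arms is a list of callables; each takes a random.Random and returns one
    outcome. Returns the index of the arm with the highest sample mean and
    the number of samples drawn per arm.
    """
    rng = random.Random(seed)
    k = len(arms)
    pilot_per_arm = max(2, int(budget * pilot_fraction) // k)
    if pilot_per_arm * k > budget:
        raise ValueError("budget too small for the pilot stage")

    # Stage 1: uniform allocation to estimate each arm's outcome variance.
    samples = [[draw(rng) for _ in range(pilot_per_arm)] for draw in arms]
    variances = [statistics.variance(s) for s in samples]

    # Stage 2: spend the remaining budget in proportion to estimated variances.
    remaining = budget - pilot_per_arm * k
    total_var = sum(variances)
    for i, draw in enumerate(arms):
        share = variances[i] / total_var if total_var > 0 else 1.0 / k
        samples[i].extend(draw(rng) for _ in range(int(remaining * share)))

    # Treatment choice: pick the arm with the highest sample mean.
    means = [statistics.fmean(s) for s in samples]
    best = max(range(k), key=means.__getitem__)
    return best, [len(s) for s in samples]


arms = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(1.0, 2.0)]
best, counts = two_stage_experiment(arms, budget=2000)
print(best, counts)
```

The noisier arm receives more of the second-stage budget, which matches the allocation rule stated in the abstract.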



Article 890

Title@2025-07-06 (7): Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models

Title: Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models Modellinversionsangriffe auf Llama 3: PII aus großen Sprachmodellen extrahieren 3:从大语言模型中提取PII 2507.04478v1

Authors (1): Sathesh P. Sivashanmugam

Large language models (LLMs) have transformed natural language processing, but their ability to memorize training data poses significant privacy risks. This paper investigates model inversion attacks on the Llama 3.2 model, a multilingual LLM developed by Meta. By querying the model with carefully crafted prompts, we demonstrate the extraction of personally identifiable information (PII) such as passwords, email addresses, and account numbers. Our findings highlight the vulnerability of even smaller LLMs to privacy attacks and underscore the need for robust defenses. We discuss potential mitigation strategies, including differential privacy and data sanitization, and call for further research into privacy-preserving machine learning techniques.



Article 891

Title@2025-07-06 (7): SHyPar: A Spectral Coarsening Approach to Hypergraph Partitioning

Title: SHyPar: A Spectral Coarsening Approach to Hypergraph Partitioning ShyPar: Ein spektraler, gewichtiger Ansatz zur Hypergraphenpartitionierung SHyPar: 高射线分割的光谱粗化方法 2410.10875v3

Authors (3): Hamed Sajadinia, Ali Aghdaei, Zhuo Feng

State-of-the-art hypergraph partitioners utilize a multilevel paradigm to construct progressively coarser hypergraphs across multiple layers, guiding cut refinements at each level of the hierarchy. Traditionally, these partitioners employ heuristic methods for coarsening and do not consider the structural features of hypergraphs. In this work, we introduce a multilevel spectral framework, SHyPar, for partitioning large-scale hypergraphs by leveraging hyperedge effective resistances and flow-based community detection techniques. Inspired by the latest theoretical spectral clustering frameworks, such as HyperEF and HyperSF, SHyPar aims to decompose large hypergraphs into multiple subgraphs with few inter-partition hyperedges (cut size). A key component of SHyPar is a flow-based local clustering scheme for hypergraph coarsening, which incorporates a max-flow-based algorithm to produce clusters with substantially improved conductance. Additionally, SHyPar utilizes an effective resistance-based rating function for merging nodes that are strongly connected (coupled). Compared with existing state-of-the-art hypergraph partitioning methods, our extensive experimental results on real-world VLSI designs demonstrate that SHyPar can more effectively partition hypergraphs, achieving state-of-the-art solution quality.



Article 892

Title@2025-07-06 (7): ESSA: Evolutionary Strategies for Scalable Alignment

Title: ESSA: Evolutionary Strategies for Scalable Alignment ESSA: Evolutionäre Strategien für eine skalierbare Ausrichtung ESSA:可缩放调整的演变战略 2507.04453v1

Authors (8): Daria Korotyshova, Boris Shaposhnikov, Alexey Malakhov, Nikita Surnachev, George Bredis, Alexey Gorbatovski, Viacheslav Sinii, Daniil Gavrilov

Large Language Models (LLMs) are increasingly relying on alignment techniques to ensure that their outputs match human preferences. Although reinforcement learning from human feedback (RLHF) is the dominant approach, it has high computational costs, memory requirements, and training instability, particularly when scaling to larger models. This paper introduces ESSA (Evolutionary Strategies for Scalable Alignment), a new framework that uses Evolutionary Strategies (ES) to efficiently align LLMs without the need for gradient computation. ES is well-suited for LLM alignment due to its favorable properties, such as high parallelizability, memory efficiency, robustness to sparse rewards, and fewer data samples required for convergence, especially when starting from a strong pre-trained policy. Moreover, ES eliminates the need for extensive hyperparameter tuning, making the alignment process simpler and more stable. Although ES excels in low-dimensional optimization, it poses a challenge when applied to high-dimensional LLMs. To address this challenge, we propose a parameter-efficient architectural modification that reduces the dimensionality of optimization through low-rank adaptation. We evaluated our approach on mathematical reasoning tasks with verifiable accuracy-based metrics, demonstrating that ESSA converges faster and is more data efficient than gradient-based methods like Group Relative Policy Optimization (GRPO). Our findings establish ES as a promising and scalable alternative to gradient-based alignment, paving the way for efficient post-training of large language models.
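
The gradient-free update at the heart of evolutionary strategies can be sketched as follows. This is a generic antithetic ES step on a small parameter vector, not ESSA itself: the short vector stands in for the low-dimensional parameters that the abstract obtains through low-rank adaptation, and the function name, hyperparameters, and quadratic reward are assumptions for illustration.

```python
import random


def es_step(theta, reward_fn, rng, population=20, sigma=0.1, lr=0.05):
    """One antithetic evolutionary-strategies update.

    Each perturbation eps is evaluated at theta + sigma*eps and
    theta - sigma*eps. Only reward values are used, so reward_fn does not
    need to be differentiable.
    """
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = reward_fn(plus) - reward_fn(minus)
        for j in range(dim):
            grad[j] += diff * eps[j] / (2.0 * sigma * population)
    # Move in the direction of higher estimated reward.
    return [t + lr * g for t, g in zip(theta, grad)]


# Toy reward: negative squared distance to a hypothetical target vector.
target = [1.0, -2.0, 0.5, 3.0, 0.0]


def reward(theta):
    return -sum((a - b) ** 2 for a, b in zip(theta, target))


rng = random.Random(0)
theta = [0.0] * len(target)
for _ in range(100):
    theta = es_step(theta, reward, rng)
print(reward(theta))
```

Each member of the population can be evaluated independently, which is the source of the parallelizability the abstract mentions.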



Article 893

Title@2025-07-06 (7): Transfer Learning in Infinite Width Feature Learning Networks

Title: Transfer Learning in Infinite Width Feature Learning Networks Transfer Lernen in unendlichen Breite Feature Lernnetzwerke 无限宽地学习网络中的转移学习 2507.04448v1

Authors (3): Clarissa Lauditi, Blake Bordelon, Cengiz Pehlevan

We develop a theory of transfer learning in infinitely wide neural networks where both the pretraining (source) and downstream (target) task can operate in a feature learning regime. We analyze both the Bayesian framework, where learning is described by a posterior distribution over the weights, and gradient flow training of randomly initialized networks trained with weight decay. Both settings track how representations evolve in both source and target tasks. The summary statistics of these theories are adapted feature kernels which, after transfer learning, depend on data and labels from both source and target tasks. Reuse of features during transfer learning is controlled by an elastic weight coupling which controls the reliance of the network on features learned during training on the source task. We apply our theory to linear and polynomial regression tasks as well as real datasets. Our theory and experiments reveal interesting interplays between elastic weight coupling, feature learning strength, dataset size, and source and target task alignment on the utility of transfer learning.



Article 894

Title@2025-07-06 (7): KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

Title: KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality KnowRL: Erforschendes Wissenswertes Verstärktes Lernen für die Realität KnowRL:探索知识强化学习促进事实质量 2506.19807v2

Authors (5): Baochang Ren, Shuofei Qiao, Wenhao Yu, Huajun Chen, Ningyu Zhang

Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. While Reinforcement Learning (RL) can enhance complex reasoning abilities, its outcome-oriented reward mechanism often lacks factual supervision over the thinking process, further exacerbating the hallucination problem. To address the high hallucination in slow-thinking models, we propose Knowledge-enhanced RL, KnowRL. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. This targeted factual input during RL training enables the model to learn and internalize fact-based reasoning strategies. By directly rewarding adherence to facts within the reasoning steps, KnowRL fosters a more reliable thinking process. Experimental results on three hallucination evaluation datasets and two reasoning evaluation datasets demonstrate that KnowRL effectively mitigates hallucinations in slow-thinking models while maintaining their original strong reasoning capabilities. Our code is available at https://github.com/zjunlp/KnowRL.



Article 895

Title@2025-07-06 (7): AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Title: AutoMind: Adaptive Knowledgeable Agent for Automated Data Science AutoMind: Adaptives Knowledgeable Agent für automatisierte Datenwissenschaft 自动Mind:自动数据科学适应性知识代理 2506.10974v2

Authors (9): Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Shuofei Qiao, Jintian Zhang, Da Zheng, Huajun Chen, Ningyu Zhang

Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.



Article 896

Title@2025-07-06 (7): The Joys of Categorical Conformal Prediction

Title: The Joys of Categorical Conformal Prediction Die Freuden der kategorischen konformen Vorhersage 分类共变预言的欢乐 2507.04441v1

Authors (1): Michele Caprio

Conformal prediction (CP) is an Uncertainty Representation technique that delivers finite-sample calibrated prediction regions for any underlying Machine Learning model, yet its status as an Uncertainty Quantification (UQ) tool has remained conceptually opaque. We adopt a category-theoretic approach to CP – framing it as a morphism, embedded in a commuting diagram, of two newly-defined categories – that brings us three joys. First, we show that – under minimal assumptions – CP is intrinsically a UQ mechanism, that is, its UQ capabilities are a structural feature of the method. Second, we demonstrate that CP bridges (and perhaps subsumes) the Bayesian, frequentist, and imprecise probabilistic approaches to predictive statistical reasoning. Finally, we show that a conformal prediction region (CPR) is the image of a covariant functor. This observation is relevant to AI privacy: It implies that privacy noise added locally does not break coverage.
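
The finite-sample calibration mentioned above can be illustrated with standard split conformal prediction for regression. This is a minimal sketch of the textbook procedure, not the category-theoretic construction of the paper; the names `conformal_quantile`, `prediction_interval`, and the `predict` callable are hypothetical and introduced only for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, there are too few calibration points for the
    requested level, and the only valid threshold is infinity.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Split conformal interval using absolute residuals as scores.

    `predict` is any already-fitted point predictor (an assumption here);
    the calibration data must not have been used to fit it.
    """
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q
```

Under exchangeability of calibration and test points, the returned interval covers the true response with probability at least 1 - alpha, regardless of how good `predict` is; a poor predictor only makes the interval wider.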



Article 897

Title@2025-07-06 (7): Quantum Algorithms for Bandits with Knapsacks with Improved Regret and Time Complexities

Title: Quantum Algorithms for Bandits with Knapsacks with Improved Regret and Time Complexities Quantenalgorithmen für Banditen mit Knapsacks mit verbesserter Reue und Zeitkomplexitäten 带克纳普书包、有改进的悔恨和时间复杂性的强盗的量图量算法 2507.04438v1

Authors (5): Yuexin Su, Ziyi Yang, Peiyuan Huang, Tongyang Li, Yinyu Ye

Bandits with knapsacks (BwK) constitute a fundamental model that combines aspects of stochastic integer programming with online learning. Classical algorithms for BwK with a time horizon $T$ achieve a problem-independent regret bound of $O(\sqrt{T})$ and a problem-dependent bound of $O(\log T)$. In this paper, we initiate the study of the BwK model in the setting of quantum computing, where both reward and resource consumption can be accessed via quantum oracles. We establish both problem-independent and problem-dependent regret bounds for quantum BwK algorithms. For the problem-independent case, we demonstrate that a quantum approach can improve the classical regret bound by a factor of $(1+\sqrt{B/\mathrm{OPT}_{\mathrm{LP}}})$, where $B$ is the budget constraint in BwK and $\mathrm{OPT}_{\mathrm{LP}}$ denotes the optimal value of a linear programming relaxation of the BwK problem. For the problem-dependent setting, we develop a quantum algorithm using an inexact quantum linear programming solver. This algorithm achieves a quadratic improvement in terms of the problem-dependent parameters, as well as a polynomial speedup in time complexity with respect to the problem’s dimensions compared to classical counterparts. Compared to previous works on quantum algorithms for multi-armed bandits, our study is the first to consider bandit models with resource constraints and hence sheds light on operations research.
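
For orientation, the linear programming relaxation referred to above is usually written as follows in the classical BwK literature. The notation is an assumption made here for illustration and is not taken from the abstract: $m$ arms, $d$ resources, expected reward $r_a$ and expected consumption $c_{a,i}$ of resource $i$ for arm $a$, and $x$ a distribution over arms.

```latex
\mathrm{OPT}_{\mathrm{LP}}
  \;=\; \max_{x \in \Delta_m} \; T \sum_{a=1}^{m} x_a \, r_a
\quad \text{subject to} \quad
  T \sum_{a=1}^{m} x_a \, c_{a,i} \;\le\; B
  \quad \text{for all } i = 1, \dots, d .
```

Here $\Delta_m$ is the probability simplex over arms. The relaxation replaces the random, sequential problem with its expected per-round behaviour, so $\mathrm{OPT}_{\mathrm{LP}}$ upper-bounds the expected reward of any algorithm and serves as the regret benchmark.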



Article 898

Title@2025-07-06 (7): Making a Pipeline Production-Ready: Challenges and Lessons Learned in the Healthcare Domain

Title: Making a Pipeline Production-Ready: Challenges and Lessons Learned in the Healthcare Domain Herstellung einer Pipeline-Produktion: Herausforderungen und Lektionen im Bereich Healthcare 《管道生产-准备:保健领域的挑战和经验教训》 2506.06946v3

Authors (6): Daniel Angelo Esteves Lawand, Lucas Quaresma Medina Lam, Roberto Oliveira Bolgheroni, Renato Cordeiro Ferreira, Alfredo Goldman, Marcelo Finger

Deploying a Machine Learning (ML) training pipeline into production requires good software engineering practices. Unfortunately, the typical data science workflow often leads to code that lacks critical software quality attributes. This experience report investigates this problem in SPIRA, a project whose goal is to create an ML-Enabled System (MLES) to pre-diagnose respiratory insufficiency via speech analysis. This paper presents an overview of the architecture of the MLES, then compares three versions of its Continuous Training subsystem: from a proof of concept Big Ball of Mud (v1), to a design pattern-based Modular Monolith (v2), to a test-driven set of Microservices (v3). Each version improved its overall extensibility, maintainability, robustness, and resiliency. The paper shares challenges and lessons learned in this process, offering insights for researchers and practitioners seeking to productionize their pipelines.



Article 899

Title@2025-07-06 (7): Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models

Title: Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models Rekonstruieren biologischer Pfade durch Anwendung selektiven Inkrementellen Lernens auf (Sehr) kleine Sprachmodelle 通过将选择性增量学习应用到(老式)小型语言模式,重建生物途径 2507.04432v1

Authors (8): Pranta Saha, Joyce Reimer, Brook Byrns, Connor Burbridge, Neeraj Dhar, Jeffrey Chen, Steven Rayan, Gordon Broderick

The use of generative artificial intelligence (AI) models is becoming ubiquitous in many fields. Though progress continues to be made, general purpose large language AI models (LLM) show a tendency to deliver creative answers, often called “hallucinations”, which have slowed their application in the medical and biomedical fields where accuracy is paramount. We propose that the design and use of much smaller, domain and even task-specific LM may be a more rational and appropriate use of this technology in biomedical research. In this work we apply a very small LM by today’s standards to the specialized task of predicting regulatory interactions between molecular components to fill gaps in our current understanding of intracellular pathways. Toward this we attempt to correctly posit known pathway-informed interactions recovered from manually curated pathway databases by selecting and using only the most informative examples as part of an active learning scheme. With this example we show that a small (~110 million parameters) LM based on a Bidirectional Encoder Representations from Transformers (BERT) architecture can propose molecular interactions relevant to tuberculosis persistence and transmission with over 80% accuracy using less than 25% of the ~520 regulatory relationships in question. Using information entropy as a metric for the iterative selection of new tuning examples, we also find that increased accuracy is driven by favoring the use of the incorrectly assigned statements with the highest certainty (lowest entropy). In contrast, the concurrent use of correct but least certain examples contributed little and may have even been detrimental to the learning rate.
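
The selection rule described above (prefer misclassified examples on which the model is most certain, i.e. has the lowest predictive entropy) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code; the function names and the `(id, probabilities, true_label)` tuple layout are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete predictive distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_confident_errors(examples, k):
    """Pick the k misclassified examples with the lowest predictive entropy.

    `examples` is a list of (example_id, probs, true_label) tuples, where
    `probs` is the model's class distribution (hypothetical layout).
    """
    wrong = []
    for example_id, probs, true_label in examples:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != true_label:
            wrong.append((entropy(probs), example_id))
    wrong.sort()  # lowest entropy (highest certainty) first
    return [example_id for _, example_id in wrong[:k]]
```

The selected examples would then be added to the tuning set for the next active learning round; how many to add per round is a design choice the abstract does not specify.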



Article 900

Title@2025-07-06 (7): ARMR: Adaptively Responsive Network for Medication Recommendation

Title: ARMR: Adaptively Responsive Network for Medication Recommendation ARMR: Adaptiv Responsive Network for Medicination Empfehlung ARMR: 药品建议适应性反应网络 2507.04428v1

Authors (3): Feiyue Wu, Tianxing Wu, Shenqi Jing

Medication recommendation is a crucial task in healthcare, especially for patients with complex medical conditions. However, existing methods often struggle to effectively balance the reuse of historical medications with the introduction of new drugs in response to changing patient conditions. To address this challenge, we propose an Adaptively Responsive network for Medication Recommendation (ARMR), a new method which incorporates 1) a piecewise temporal learning component that distinguishes between recent and distant patient history, enabling more nuanced temporal understanding, and 2) an adaptively responsive mechanism that dynamically adjusts attention to new and existing drugs based on the patient’s current health state and medication history. Experiments on the MIMIC-III and MIMIC-IV datasets indicate that ARMR outperforms state-of-the-art baselines across different evaluation metrics, which contributes to more personalized and accurate medication recommendations. The source code is publicly available at: https://github.com/seucoin/armr2.



Article 901

Title@2025-07-06 (7): Accelerated Diffusion Models via Speculative Sampling

Title: Accelerated Diffusion Models via Speculative Sampling Beschleunigte Diffusionsmodelle durch spekulative Probenahme 通过投机抽样加速传播模型 2501.05370v2

Authors (4): Valentin De Bortoli, Alexandre Galashov, Arthur Gretton, Arnaud Doucet

Speculative sampling is a popular technique for accelerating inference in Large Language Models by generating candidate tokens using a fast draft model and accepting or rejecting them based on the target model’s distribution. While speculative sampling was previously limited to discrete sequences, we extend it to diffusion models, which generate samples via continuous, vector-valued Markov chains. In this context, the target model is a high-quality but computationally expensive diffusion model. We propose various drafting strategies, including a simple and effective approach that does not require training a draft model and is applicable out of the box to any diffusion model. Our experiments demonstrate significant generation speedup on various diffusion models, halving the number of function evaluations, while generating exact samples from the target model.
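
The accept/reject rule for the discrete (language model) case described in the first sentence can be sketched as below. This shows only the standard discrete scheme that the paper extends, not its diffusion variant; `speculative_step` and its list-based distributions are hypothetical names chosen for illustration.

```python
import random


def speculative_step(p, q, rng):
    """One speculative sampling step over a finite vocabulary.

    p: target distribution, q: draft distribution (lists of equal length).
    Returns (token, accepted). The returned token is distributed exactly
    according to p, whatever q is.
    """
    # Draft a candidate from the cheap model.
    x = rng.choices(range(len(q)), weights=q)[0]
    # Accept with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the normalised residual max(0, p - q).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(residual)), weights=residual)[0], False
```

Exactness follows because the probability of emitting token x is min(p(x), q(x)) from the accept branch plus max(0, p(x) - q(x)) from the reject branch, which sums to p(x). The closer the draft is to the target, the more often the cheap draft is accepted.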



Article 902

Title@2025-07-06 (7): Enhancing Text-Based Hierarchical Multilabel Classification for Mobile Applications via Contrastive Learning

Title: Enhancing Text-Based Hierarchical Multilabel Classification for Mobile Applications via Contrastive Learning Verbesserung der textbasierten Hierarchischen Multilabel-Klassifikation für mobile Anwendungen durch Kontrastives Lernen 通过差异学习加强用于移动应用的基于文本的等级式等级式多标签标签分类 2507.04413v1

Authors (4): Jiawei Guo, Yang Xiao, Weipeng Huang, Guangyuan Piao

A hierarchical labeling system for mobile applications (apps) benefits a wide range of downstream businesses that integrate the labeling with their proprietary user data to improve user modeling. Such a label hierarchy can define more granular labels that capture detailed app features beyond the limitations of traditional broad app categories. In this paper, we address the problem of hierarchical multilabel classification for apps by using their textual information such as names and descriptions. We present: 1) HMCN (Hierarchical Multilabel Classification Network) for handling the classification from two perspectives: the first focuses on a multilabel classification without hierarchical constraints, while the second predicts labels sequentially at each hierarchical level considering such constraints; 2) HMCL (Hierarchical Multilabel Contrastive Learning), a scheme that is capable of learning more distinguishable app representations to enhance the performance of HMCN. Empirical results on our Tencent App Store dataset and two public datasets demonstrate that our approach performs well compared with state-of-the-art methods. The approach has been deployed at Tencent for over one year, and the multilabel classification outputs for apps have helped a downstream task (user credit risk management) improve its performance by 10.70% with regard to the Kolmogorov-Smirnov metric.



Article 903

Title@2025-07-06 (7): Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning

Title: Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning Konvergenz und Stichprobenkomplexität von Methoden erster Ordnung für das agnostische Verstärkungslernen 一等加强教育方法的趋同性和样本复杂性 2507.04406v1

Authors (3): Uri Sherman, Tomer Koren, Yishay Mansour

We study reinforcement learning (RL) in the agnostic policy learning setting, where the goal is to find a policy whose performance is competitive with the best policy in a given class of interest $\Pi$ – crucially, without assuming that $\Pi$ contains the optimal policy. We propose a general policy learning framework that reduces this problem to first-order optimization in a non-Euclidean space, leading to new algorithms as well as shedding light on the convergence properties of existing ones. Specifically, under the assumption that $\Pi$ is convex and satisfies a variational gradient dominance (VGD) condition – an assumption known to be strictly weaker than more standard completeness and coverability conditions – we obtain sample complexity upper bounds for three policy learning algorithms: (i) Steepest Descent Policy Optimization, derived from a constrained steepest descent method for non-convex optimization; (ii) the classical Conservative Policy Iteration algorithm (Kakade and Langford, 2002) reinterpreted through the lens of the Frank-Wolfe method, which leads to improved convergence results; and (iii) an on-policy instantiation of the well-studied Policy Mirror Descent algorithm. Finally, we empirically evaluate the VGD condition across several standard environments, demonstrating the practical relevance of our key assumption.



Article 904

Title@2025-07-06 (7): Convergence of Policy Mirror Descent Beyond Compatible Function Approximation

Title: Convergence of Policy Mirror Descent Beyond Compatible Function Approximation Konvergenz der Politik Spiegel Abstieg jenseits kompatibler Funktion Annäherung 超出相兼容功能相似性的政策从镜面产生的趋同 2502.11033v3

Authors (3): Uri Sherman, Tomer Koren, Yishay Mansour

Modern policy optimization methods roughly follow the policy mirror descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working with parametric policy classes in large-scale environments. In this work, we develop a theoretical framework for PMD for general policy classes where we replace the closure conditions with a strictly weaker variational gradient dominance assumption, and obtain upper bounds on the rate of convergence to the best-in-class policy. Our main result leverages a novel notion of smoothness with respect to a local norm induced by the occupancy measure of the current policy, and casts PMD as a particular instance of smooth non-convex optimization in non-Euclidean space.
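
As background for the PMD template, the best-known special case is the tabular update with a KL (negative entropy) regulariser, where each state's action distribution is reweighted multiplicatively by exponentiated action values. The sketch below shows only this tabular instance, which is precisely the restricted setting the paper moves beyond; `pmd_update` and its nested-list inputs are assumptions for illustration.

```python
import math


def pmd_update(policy, q_values, eta):
    """One tabular policy mirror descent step with a KL regulariser.

    policy[s][a]   : current probability of action a in state s
    q_values[s][a] : action-value estimate for the current policy
    eta            : step size

    Implements pi_new(a|s) proportional to pi(a|s) * exp(eta * Q(s, a)).
    """
    new_policy = []
    for probs, qs in zip(policy, q_values):
        shift = max(qs)  # subtract the max for numerical stability
        weights = [p * math.exp(eta * (q - shift)) for p, q in zip(probs, qs)]
        total = sum(weights)
        new_policy.append([w / total for w in weights])
    return new_policy
```

With parametric policy classes this closed-form update is generally not available, since the reweighted distribution need not lie in the class, which is why closure-type conditions appear in earlier analyses.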



Article 905

Title@2025-07-06 (7): KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs

Title: KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs KERAP: Ein wissensbasierter Ansatz für genaue Null-Shot-Diagnose-Vorhersage mit Multi-Agent LLMs KERRAP: 利用多种试剂LLMs进行准确零光诊断预测的知识强化理由说明方法 2507.02773v2

Authors (8): Yuzhang Xie, Hejie Cui, Ziyang Zhang, Jiaying Lu, Kai Shu, Fadi Nahab, Xiao Hu, Carl Yang

Medical diagnosis prediction plays a critical role in disease detection and personalized healthcare. While machine learning (ML) models have been widely adopted for this task, their reliance on supervised training limits their ability to generalize to unseen cases, particularly given the high cost of acquiring large, labeled datasets. Large language models (LLMs) have shown promise in leveraging language abilities and biomedical knowledge for diagnosis prediction. However, they often suffer from hallucinations, lack structured medical reasoning, and produce useless outputs. To address these challenges, we propose KERAP, a knowledge graph (KG)-enhanced reasoning approach that improves LLM-based diagnosis prediction through a multi-agent architecture. Our framework consists of a linkage agent for attribute mapping, a retrieval agent for structured knowledge extraction, and a prediction agent that iteratively refines diagnosis predictions. Experimental results demonstrate that KERAP enhances diagnostic reliability efficiently, offering a scalable and interpretable solution for zero-shot medical diagnosis prediction.



Article 906

Title@2025-07-06 (7): Have LLMs Made Active Learning Obsolete? Surveying the NLP Community

Title: Have LLMs Made Active Learning Obsolete? Surveying the NLP Community Haben LLMs aktives Lernen überholt? Vermessung der NLP-Gemeinschaft LLM女士是否做过积极学习过时? 2503.09701v2

Authors (5): Julia Romberg, Christopher Schröder, Julius Gonsior, Katrin Tomanek, Fredrik Olsson

Supervised learning relies on annotated data, which is expensive to obtain. A longstanding strategy to reduce annotation costs is active learning, an iterative process, in which a human annotates only data instances deemed informative by a model. Large language models (LLMs) have pushed the effectiveness of active learning, while also advancing methods such as few- or zero-shot learning, and text synthesis – all of which can reduce the need for active learning. This naturally raises the question: has active learning become obsolete? To answer this fully, we must look beyond literature to practical experiences. We conduct an online survey in the NLP community to collect previously intangible insights on the perceived relevance of data annotation, particularly focusing on active learning, including best practices, obstacles, and future prospects. Our findings show that annotated data is expected to remain a key factor and active learning to stay highly relevant while benefiting from LLMs. Consistent with a community survey from over a decade ago, however, we find that three key challenges persist – setup complexity, risks in the cost reduction, and tooling – for which we propose alleviation strategies. We publish an anonymized version of the collected dataset.



Article 907

Title@2025-07-06 (7): Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization

Title: Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization Inverses Verstärktes Lernen mit Revealed Preferences und passiver stochastischer Optimierung 利用公开首选项和被动斯托口优化进行反强化学习 2507.04396v1

Authors (1): Vikram Krishnamurthy

This monograph, spanning three chapters, explores Inverse Reinforcement Learning (IRL). The first two chapters view IRL through the lens of revealed preferences from microeconomics, while the third chapter studies adaptive IRL via Langevin dynamics stochastic gradient algorithms. Chapter 1 uses classical revealed preference theory (Afriat’s theorem and extensions) to identify constrained utility maximizers based on observed agent actions. This allows for the reconstruction of set-valued estimates of an agent’s utility. We illustrate this procedure by identifying the presence of a cognitive radar and reconstructing its utility function. The chapter also addresses the construction of a statistical detector for utility maximization behavior when agent actions are corrupted by noise. Chapter 2 studies Bayesian IRL. It investigates how an analyst can determine if an observed agent is a rationally inattentive Bayesian utility maximizer (i.e., simultaneously optimizing its utility and observation likelihood). The chapter discusses inverse stopping-time problems, focusing on reconstructing the continuation and stopping costs of a Bayesian agent operating over a random horizon. We then apply this IRL methodology to identify the presence of a Bayes-optimal sequential detector. Additionally, Chapter 2 provides a concise overview of discrete choice models, inverse Bayesian filtering, and inverse stochastic gradient algorithms for adaptive IRL. Finally, Chapter 3 introduces an adaptive IRL approach utilizing passive Langevin dynamics. This method aims to track time-varying utility functions given noisy and misspecified gradients. In essence, the adaptive IRL algorithms presented in Chapter 3 can be conceptualized as inverse stochastic gradient algorithms, as they learn the utility function in real-time while a stochastic gradient algorithm is in operation.



Article 908

Title@2025-07-06 (7): Continual Visual Reinforcement Learning with A Life-Long World Model

Title: Continual Visual Reinforcement Learning with A Life-Long World Model Kontinuierliches Visuelles Verstärktes Lernen mit einem lebenslangen Weltmodell 采用 “ 终身世界模式 “ 的持续视力强化学习 2303.06572v2

Authors (7): Minting Pan, Wendong Zhang, Geng Chen, Xiangming Zhu, Siyu Gao, Yunbo Wang, Xiaokang Yang

Learning physical dynamics in a series of non-stationary environments is a challenging but essential task for model-based reinforcement learning (MBRL) with visual inputs. It requires the agent to consistently adapt to novel tasks without forgetting previous knowledge. In this paper, we present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control. The key assumption is that an ideal world model can provide a non-forgetting environment simulator, which enables the agent to optimize the policy in a multi-task learning manner based on the imagined trajectories from the world model. To this end, we first introduce the life-long world model, which learns task-specific latent dynamics using a mixture of Gaussians and incorporates generative experience replay to mitigate catastrophic forgetting. Then, we further address the value estimation challenge for previous tasks with the exploratory-conservative behavior learning approach. Our model remarkably outperforms the straightforward combinations of existing continual learning and visual RL algorithms on DeepMind Control Suite and Meta-World benchmarks with continual visual control tasks.



Article 909

Title@2025-07-06 (7): Linear Bandits with Partially Observable Features

Title: Linear Bandits with Partially Observable Features Lineare Banditen mit teilweise beobachtbaren Eigenschaften 带有部分可部分观察特征的线形强盗 2502.06142v2

Authors (5): Wonyoung Kim, Sungwoo Park, Garud Iyengar, Assaf Zeevi, Min-hwan Oh

We study the linear bandit problem that accounts for partially observable features. Without proper handling, unobserved features can lead to linear regret in the decision horizon $T$, as their influence on rewards is unknown. To tackle this challenge, we propose a novel theoretical framework and an algorithm with sublinear regret guarantees. The core of our algorithm consists of: (i) feature augmentation, by appending basis vectors that are orthogonal to the row space of the observed features; and (ii) the introduction of a doubly robust estimator. Our approach achieves a regret bound of $\tilde{O}(\sqrt{(d + d_h)T})$, where $d$ denotes the dimension of the observed features, and $d_h$ represents the number of nonzero coefficients in the parameter associated with the reward component projected onto the subspace orthogonal to the row space spanned by the observed features. Notably, our algorithm requires no prior knowledge of the unobserved feature space, which may expand as more features become hidden. Numerical experiments confirm that our algorithm outperforms both non-contextual multi-armed bandits and linear bandit algorithms depending solely on observed features.



Article 910

Title@2025-07-06 (7): Tractable Representation Learning with Probabilistic Circuits

Title: Tractable Representation Learning with Probabilistic Circuits Tractable Representative Learning mit probabilistischen Schaltungen 利用概率电路进行可追踪的代表性学习 2507.04385v1

Authors (6): Steven Braun, Sahil Sidheekh, Antonio Vergari, Martin Mundt, Sriraam Natarajan, Kristian Kersting

Probabilistic circuits (PCs) are powerful probabilistic models that enable exact and tractable inference, making them highly suitable for probabilistic reasoning and inference tasks. While representation learning is dominant in neural networks, it remains underexplored with PCs, with prior approaches relying on external neural embeddings or activation-based encodings. To address this gap, we introduce autoencoding probabilistic circuits (APCs), a novel framework leveraging the tractability of PCs to model probabilistic embeddings explicitly. APCs extend PCs by jointly modeling data and embeddings, obtaining embedding representations through tractable probabilistic inference. The PC encoder allows the framework to natively handle arbitrary missing data and is seamlessly integrated with a neural decoder in a hybrid, end-to-end trainable architecture enabled by differentiable sampling. Our empirical evaluation demonstrates that APCs outperform existing PC-based autoencoding methods in reconstruction quality, generate embeddings competitive with those of neural autoencoders, and exhibit superior robustness in handling missing data compared to neural autoencoders. These results highlight APCs as a powerful and flexible representation learning method that exploits the probabilistic inference capabilities of PCs, showing promising directions for robust inference, out-of-distribution detection, and knowledge distillation.



Article 911

Title@2025-07-06 (7): SCAWaveNet: A Spatial-Channel Attention-Based Network for Global Significant Wave Height Retrieval

Title: SCAWaveNet: A Spatial-Channel Attention-Based Network for Global Significant Wave Height Retrieval SCAWaveNet: Ein raumgreifendes, auf Aufmerksamkeit basierendes Netzwerk für globales, signifikantes Wellenhöhen-Retrieval SCAWaveNet:全球重大波高高度回收空间通道关注网 2507.00701v2

Authors (6): Chong Zhang, Xichao Liu, Yibing Zhan, Dapeng Tao, Jun Ni, Jinwei Bu

Recent advancements in spaceborne GNSS missions have produced extensive global datasets, providing a robust basis for deep learning-based significant wave height (SWH) retrieval. While existing deep learning models predominantly utilize CYGNSS data with four-channel information, they often adopt single-channel inputs or simple channel concatenation without leveraging the benefits of cross-channel information interaction during training. To address this limitation, a novel spatial-channel attention-based network, namely SCAWaveNet, is proposed for SWH retrieval. Specifically, features from each channel of the DDMs are modeled as independent attention heads, enabling the fusion of spatial and channel-wise information. For auxiliary parameters, a lightweight attention mechanism is designed to assign weights along the spatial and channel dimensions. The final feature integrates both spatial and channel-level characteristics. Model performance is evaluated using four-channel CYGNSS data. When ERA5 is used as a reference, SCAWaveNet achieves an average RMSE of 0.438 m. When using buoy data from NDBC, the average RMSE reaches 0.432 m. Compared to state-of-the-art models, SCAWaveNet reduces the average RMSE by at least 3.52% on the ERA5 dataset and by 5.68% on the NDBC buoy observations. The code is available at https://github.com/Clifx9908/SCAWaveNet.



Article 912

Title@2025-07-06 (7): Transferring Visual Explainability of Self-Explaining Models through Task Arithmetic

Title: Transferring Visual Explainability of Self-Explaining Models through Task Arithmetic Übertragung der visuellen Erklärbarkeit von selbsterklärenden Modellen durch Task Arithmetic 通过任务重新定性转移自我解释模型的可视解释性 2507.04380v1

Authors (4): Yuya Yoshikawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito

In scenarios requiring both prediction and explanation efficiency for image classification, self-explaining models that perform both tasks in a single inference are effective. However, their training incurs substantial labeling and computational costs. This study aims to tackle the issue by proposing a method to transfer the visual explainability of self-explaining models, learned in a source domain, to a target domain based on a task arithmetic framework. Specifically, we construct a self-explaining model by extending image classifiers based on a vision-language pretrained model. We then define an explainability vector as the difference between model parameters trained on the source domain with and without explanation supervision. Based on the task arithmetic framework, we impart explainability to a model trained only on the prediction task in the target domain by applying the explainability vector. Experimental results on various image classification datasets demonstrate that, except for transfers between some less-related domains, visual explainability can be successfully transferred from source to target domains, improving explanation quality in the target domain without sacrificing classification accuracy. Furthermore, we show that the explainability vector learned on a large and diverse dataset like ImageNet, extended with explanation supervision, exhibits universality and robustness, improving explanation quality on nine out of ten different target datasets. We also find that the explanation quality achieved with a single model inference is comparable to that of Kernel SHAP, which requires 150 model inferences.
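
The explainability vector and its application can be sketched with plain parameter dictionaries. This is a minimal illustration of the arithmetic stated in the abstract; the function names, the dictionary-of-lists parameter layout, and the `scale` coefficient are assumptions, and the abstract does not say whether the paper uses such a coefficient.

```python
def explainability_vector(theta_with, theta_without):
    """Parameter difference between source models trained with and
    without explanation supervision (same keys and shapes assumed)."""
    return {
        name: [a - b for a, b in zip(theta_with[name], theta_without[name])]
        for name in theta_with
    }


def apply_vector(theta_target, vector, scale=1.0):
    """Add the (optionally scaled) vector to a target-domain model that
    was trained on the prediction task only."""
    return {
        name: [t + scale * v for t, v in zip(theta_target[name], vector[name])]
        for name in theta_target
    }
```

Because the operation is pure parameter arithmetic, transferring explainability requires no explanation labels and no additional training in the target domain.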



Article 913

Title@2025-07-06 (7): Hierarchical Reinforcement Learning with Targeted Causal Interventions

Title: Hierarchical Reinforcement Learning with Targeted Causal Interventions Hierarchisches Stärkungslernen mit gezielten Kausalinterventionen 具有有针对性的因果干预措施的等级强化学习 2507.04373v1

Authors (4): Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser

Hierarchical reinforcement learning (HRL) improves the efficiency of long-horizon reinforcement-learning tasks with sparse rewards by decomposing the task into a hierarchy of subgoals. The main challenge of HRL is efficient discovery of the hierarchical structure among subgoals and utilizing this structure to achieve the final goal. We address this challenge by modeling the subgoal structure as a causal graph and propose a causal discovery algorithm to learn it. Additionally, rather than intervening on the subgoals at random during exploration, we harness the discovered causal model to prioritize subgoal interventions based on their importance in attaining the final goal. These targeted interventions result in a significantly more efficient policy in terms of the training cost. Unlike previous work on causal HRL, which lacked theoretical analysis, we provide a formal analysis of the problem. Specifically, for tree structures and for a variant of Erdős-Rényi random graphs, our approach results in remarkable improvements. Our experimental results on HRL tasks also illustrate that our proposed framework outperforms existing work in terms of training cost.



Article 914

Title@2025-07-06 (7): Adaptive Malware Detection using Sequential Feature Selection: A Dueling Double Deep Q-Network (D3QN) Framework for Intelligent Classification

Title: Adaptive Malware Detection using Sequential Feature Selection: A Dueling Double Deep Q-Network (D3QN) Framework for Intelligent Classification Adaptive Malware-Erkennung mit sequentieller Feature-Auswahl: Ein duellierendes Double Deep Q-Network (D3QN) Framework für intelligente Klassifizierung 使用序列特征选择选择的适应性恶意检测:智能分类双重深Q网络框架(D3QN) 2507.04372v1

Authors (4): Naseem Khan, Aref Y. Al-Tamimi, Amine Bermak, Issa M. Khalil

Traditional malware detection methods exhibit computational inefficiency due to exhaustive feature extraction requirements, creating accuracy-efficiency trade-offs that limit real-time deployment. We formulate malware classification as a Markov Decision Process with episodic feature acquisition and propose a Dueling Double Deep Q-Network (D3QN) framework for adaptive sequential feature selection. The agent learns to dynamically select informative features per sample before terminating with classification decisions, optimizing both detection accuracy and computational cost through reinforcement learning. We evaluate our approach on Microsoft Big2015 (9-class, 1,795 features) and BODMAS (binary, 2,381 features) datasets. D3QN achieves 99.22% and 98.83% accuracy while utilizing only 61 and 56 features on average, representing 96.6% and 97.6% dimensionality reduction. This yields computational efficiency improvements of 30.1x and 42.5x over traditional ensemble methods. Comprehensive ablation studies demonstrate consistent superiority over Random Forest, XGBoost, and static feature selection approaches. Quantitative analysis demonstrates that D3QN learns non-random feature selection policies with 62.5% deviation from uniform baseline distributions. The learned policies exhibit structured hierarchical preferences, utilizing high-level metadata features for initial assessment while selectively incorporating detailed behavioral features based on classification uncertainty. Feature specialization analysis reveals 57.7% of examined features demonstrate significant class-specific discrimination patterns. Our results validate reinforcement learning-based sequential feature selection for malware classification, achieving superior accuracy with substantial computational reduction through learned adaptive policies.
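
The two ingredients named in "Dueling Double Deep Q-Network" can be sketched independently of any neural network library. This shows the standard dueling aggregation and the standard double Q-learning target, not the paper's architecture or reward design; the function names and list-based inputs are assumptions for illustration.

```python
def dueling_q(value, advantages):
    """Combine a state value and per-action advantages into Q-values.

    Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, .)
    Subtracting the mean keeps V and A identifiable.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


def double_dqn_target(reward, gamma, done, q_online_next, q_target_next):
    """Double Q-learning target for one transition.

    The online network picks the next action; the target network
    evaluates it, which reduces overestimation bias.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

In the setting of the abstract, the actions would be "acquire feature i" or "stop and classify", so an episode ends (`done` is true) when the agent commits to a classification decision.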


Article 915

Title@2025-07-06 (7): Time2Agri: Temporal Pretext Tasks for Agricultural Monitoring

Title: Time2Agri: Temporal Pretext Tasks for Agricultural Monitoring Time2Agri: Temporale Vortextaufgaben für die Agrarüberwachung Time2Agri: 农业监测时间预测任务 2507.04366v1

Authors (2): Moti Rattan Gupta, Anupam Sobti

Self-supervised learning (SSL) has emerged as a prominent paradigm for label-efficient learning and has been widely adopted by remote sensing foundation models (RSFMs). Recent RSFMs, including SatMAE and DoFA, primarily rely on masked autoencoding (MAE), contrastive learning, or some combination of the two. However, these pretext tasks often overlook the unique temporal characteristics of agricultural landscapes, namely nature’s cycle. Motivated by this gap, we propose three novel agriculture-specific pretext tasks: Time-Difference Prediction (TD), Temporal Frequency Prediction (FP), and Future-Frame Prediction (FF). Comprehensive evaluation on the SICKLE dataset shows that FF achieves 69.6% IoU on crop mapping and FP reduces yield prediction error to 30.7% MAPE, outperforming all baselines, while TD remains competitive on most tasks. Further, we scale FF to the national scale of India, achieving 54.2% IoU and outperforming all baselines on field boundary delineation on the FTW India dataset.
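
The two metrics quoted above can be sketched as follows. This is a minimal illustration assuming binary masks and non-zero ground-truth values; the function names and toy inputs are ours, and the paper's exact evaluation protocol (for example, per-class averaging of IoU) is not given in the abstract.

```python
def iou(pred: list[int], truth: list[int]) -> float:
    """Intersection over union for two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    return inter / union if union else 1.0


def mape(pred: list[float], truth: list[float]) -> float:
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs(p - t) / abs(t) for p, t in zip(pred, truth)) / len(truth)


# Toy values invented for illustration only.
mask_iou = iou([1, 1, 0, 0], [1, 0, 1, 0])        # one shared pixel out of three marked
yield_mape = mape([90.0, 220.0], [100.0, 200.0])  # each prediction is off by 10%
```

Higher IoU is better for segmentation tasks such as crop mapping, whereas lower MAPE is better for regression tasks such as yield prediction.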


Article 916

Title@2025-07-06 (7): K Nearest Neighbor-Guided Trajectory Similarity Learning

Title: K Nearest Neighbor-Guided Trajectory Similarity Learning K Näherste Nachbar-geführte Flugbahn Ähnlichkeit Lernen Kneeest 邻里引导轨迹相似性学习 2502.00285v2

Authors (4): Yanchuan Chang, Xu Cai, Christian S. Jensen, Jianzhong Qi

Trajectory similarity is fundamental to many spatio-temporal data mining applications. Recent studies propose deep learning models to approximate conventional trajectory similarity measures, exploiting their fast inference time once trained. Although efficient inference has been reported, challenges remain in similarity approximation accuracy due to difficulties in trajectory granularity modeling and in exploiting similarity signals in the training data. To fill this gap, we propose TSMini, a highly effective trajectory similarity model with a sub-view modeling mechanism capable of learning multi-granularity trajectory patterns and a k nearest neighbor-based loss that guides TSMini to learn not only absolute similarity values between trajectories but also their relative similarity ranks. Together, these two innovations enable highly accurate trajectory similarity approximation. Experiments show that TSMini can outperform the state-of-the-art models by 22% in accuracy on average when learning trajectory similarity measures.
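
To see why learned approximations are attractive, it helps to look at what a conventional measure costs. The sketch below implements dynamic time warping (DTW), chosen here only as a familiar example; the abstract does not say which measures TSMini is trained to approximate.

```python
import math

Point = tuple[float, float]


def dtw(a: list[Point], b: list[Point]) -> float:
    """Dynamic time warping distance between two point sequences."""
    inf = math.inf
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


# Two parallel two-point trajectories, one unit apart.
parallel = dtw([(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)])
```

The nested loop makes each comparison quadratic in trajectory length, whereas a trained embedding model typically needs one forward pass per trajectory followed by a cheap vector comparison.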


Article 917

Title@2025-07-06 (7): Information-theoretic Quantification of High-order Feature Effects in Classification Problems

Title: Information-theoretic Quantification of High-order Feature Effects in Classification Problems Information-theoretische Quantifizierung von High-Order-Feature-Effekten bei Klassifizierungsproblemen 对分类问题中高顺序地物效应的信息理论定量 2507.04362v1

Authors (6): Ivan Lazic, Chiara Barà, Marta Iovino, Sebastiano Stramaglia, Niksa Jakovljevic, Luca Faes

Understanding the contribution of individual features in predictive models remains a central goal in interpretable machine learning, and while many model-agnostic methods exist to estimate feature importance, they often fall short in capturing high-order interactions and disentangling overlapping contributions. In this work, we present an information-theoretic extension of the High-order interactions for Feature importance (Hi-Fi) method, leveraging Conditional Mutual Information (CMI) estimated via a k-Nearest Neighbor (kNN) approach working on mixed discrete and continuous random variables. Our framework decomposes feature contributions into unique, synergistic, and redundant components, offering a richer, model-independent understanding of their predictive roles. We validate the method using synthetic datasets with known Gaussian structures, where ground truth interaction patterns are analytically derived, and further test it on non-Gaussian and real-world gene expression data from TCGA-BRCA. Results indicate that the proposed estimator accurately recovers theoretical and expected findings, providing a potential use case for developing feature selection algorithms or model development based on interaction analysis.
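
For jointly Gaussian variables, conditional mutual information has a closed form, which is what makes an analytical ground truth possible in the synthetic setting. The sketch below covers only the simplest case of three scalar variables, using the standard partial-correlation identity; it is not the kNN estimator from the paper, and the function name is ours.

```python
import math


def gaussian_cmi(r_xy: float, r_xz: float, r_yz: float) -> float:
    """I(X; Y | Z) in nats for jointly Gaussian scalars, from pairwise correlations."""
    # Partial correlation of X and Y after removing the linear effect of Z.
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    return -0.5 * math.log(1 - partial**2)


# Chain X -> Z -> Y: the X-Y correlation is fully explained by Z.
chain = gaussian_cmi(r_xy=0.4, r_xz=0.8, r_yz=0.5)
# Z unrelated to both: the conditional and unconditional information coincide.
direct = gaussian_cmi(r_xy=0.5, r_xz=0.0, r_yz=0.0)
```

In the chain case the conditional information vanishes, which corresponds to a feature whose contribution is entirely redundant with another one.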


Article 918

Title@2025-07-06 (7): AVTENet: A Human-Cognition-Inspired Audio-Visual Transformer-Based Ensemble Network for Video Deepfake Detection

Title: AVTENet: A Human-Cognition-Inspired Audio-Visual Transformer-Based Ensemble Network for Video Deepfake Detection AVTENet: Ein vom Menschen erkannten, inspirierten Audio-Visual Transformer-basiertes Ensemble-Netzwerk für Video-Detektion Deepfake AVTENet:一个以视频深假探测视频合成网络 2310.13103v2

Authors (5): Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries. Most previous studies on detecting artificial intelligence-generated fake videos only utilize visual modality or audio modality. While some methods exploit audio and visual modalities to detect forged videos, they have not been comprehensively evaluated on multimodal datasets of deepfake videos involving acoustic and visual manipulations, and are mostly based on convolutional neural networks with low detection accuracy. Considering that human cognition instinctively integrates multisensory information including audio and visual cues to perceive and interpret content and the success of transformer in various fields, this study introduces the audio-visual transformer-based ensemble network (AVTENet). This innovative framework tackles the complexities of deepfake technology by integrating both acoustic and visual manipulations to enhance the accuracy of video forgery detection. Specifically, the proposed model integrates several purely transformer-based variants that capture video, audio, and audio-visual salient cues to reach a consensus in prediction. For evaluation, we use the recently released benchmark multimodal audio-video FakeAVCeleb dataset. For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset. Experimental results show that the proposed model outperforms all existing methods and achieves state-of-the-art performance on Testset-I and Testset-II of the FakeAVCeleb dataset. We also compare AVTENet against humans in detecting video forgery. The results show that AVTENet significantly outperforms humans.


Article 919

Title@2025-07-06 (7): Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models

Title: Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models Effiziente Verwirrtheit und Verhältnisbindung in diskreten Diffusions-Sprachmodellen 分散传播语言模型中高效的双倍宽度和比例匹配 2507.04341v1

Authors (4): Etrit Haxholli, Yeti Z. Gürbüz, Oğul Can, Eli Waxman

While continuous diffusion models excel in modeling continuous distributions, their application to categorical data has been less effective. Recent work has shown that ratio-matching through score-entropy within a continuous-time discrete Markov chain (CTMC) framework serves as a competitive alternative to autoregressive models in language modeling. To enhance this framework, we first introduce three new theorems concerning the KL divergence between the data and learned distribution. Our results serve as the discrete counterpart to those established for continuous diffusion models and allow us to derive an improved upper bound of the perplexity. Second, we empirically show that ratio-matching performed by minimizing the denoising cross-entropy between the clean and corrupted data enables models to outperform those utilizing score-entropy with up to 10% lower perplexity/generative-perplexity, and 15% faster training steps. To further support our findings, we introduce and evaluate a novel CTMC transition-rate matrix that allows prediction refinement, and derive the analytic expression for its matrix exponential which facilitates the computation of conditional ratios thus enabling efficient training and generation.
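
Perplexity is the exponential of the average per-token negative log-likelihood, so any upper bound on that average translates into an upper bound on perplexity. A minimal sketch with invented numbers; the bound values below are placeholders and not results from the paper:

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity as exp of the mean negative log-likelihood per token (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Exact per-token NLLs versus looser per-token upper bounds (toy values).
exact = perplexity([1.0, 2.0, 3.0])
bound = perplexity([1.2, 2.1, 3.3])
```

Because the exponential is monotone, tightening the bound on the log-likelihood tightens the reported perplexity bound as well.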


Article 920

Title@2025-07-06 (7): Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback

Title: Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback Interactive Groupwise Vergleich für verstärktes Lernen aus menschlichem Feedback 利用人类反馈加强学习交互式群体比较 2507.04340v1

Authors (5): Jan Kompatscher, Danqing Shi, Giovanna Varni, Tino Weinkauf, Antti Oulasvirta

Reinforcement learning from human feedback (RLHF) has emerged as a key enabling technology for aligning AI behavior with human preferences. The traditional way to collect data in RLHF is via pairwise comparisons: human raters are asked to indicate which one of two samples they prefer. We present an interactive visualization that better exploits the human visual ability to compare and explore whole groups of samples. The interface comprises two linked views: 1) an exploration view showing a contextual overview of all sampled behaviors organized in a hierarchical clustering structure; and 2) a comparison view displaying two selected groups of behaviors for user queries. Users can efficiently explore large sets of behaviors by iterating between these two views. Additionally, we devised an active learning approach that suggests groups for comparison. As shown by our evaluation in six simulated robotics tasks, our approach increases the final policy returns by 69.34%. It leads to lower error rates and better policies. We open-source the code, which can be easily integrated into the RLHF training loop, supporting research on human-AI alignment.
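
Pairwise labels of this kind are commonly turned into a reward-model training signal through the Bradley-Terry model. The abstract does not state which preference model the authors use, so the sketch below is an assumption for illustration, with hypothetical function names.

```python
import math


def preference_prob(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that sample A is preferred over sample B."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


def preference_loss(reward_a: float, reward_b: float, a_preferred: bool) -> float:
    """Negative log-likelihood of one human label under the model above."""
    p = preference_prob(reward_a, reward_b)
    return -math.log(p if a_preferred else 1.0 - p)


# A label that agrees with the current reward estimates costs less than one that disagrees.
agree = preference_loss(2.0, 0.0, a_preferred=True)
disagree = preference_loss(2.0, 0.0, a_preferred=False)
```

Each comparison yields one such loss term, so an interface that elicits judgments over whole groups can supply many terms per rater interaction.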


Article 921

Title@2025-07-06 (7): Privacy-Preserving Collaborative Split Learning Framework for Smart Grid Load Forecasting

Title: Privacy-Preserving Collaborative Split Learning Framework for Smart Grid Load Forecasting Privacy-Preserving Collaborative Split Learning Framework for Smart Grid Load Forecasting 智能网格负载预测的维护隐私合作拆分学习框架 2403.01438v3

Authors (3): Asif Iqbal, Prosanta Gope, Biplab Sikdar

Accurate load forecasting is crucial for energy management, infrastructure planning, and demand-supply balancing. Smart meter data availability has led to the demand for sensor-based load forecasting. Conventional ML trains a single global model on data from multiple smart meters, which requires transferring the data to a central server and raises concerns about network requirements, privacy, and security. We propose a split learning-based framework for load forecasting to alleviate this issue. We split a deep neural network model into two parts, one for each Grid Station (GS) responsible for an entire neighbourhood’s smart meters and the other for the Service Provider (SP). Instead of sharing their data, client smart meters use their respective GSs’ model split for the forward pass and only share their activations with the GS. Under this framework, each GS is responsible for training a personalized model split for its respective neighbourhood, whereas the SP can train a single global model or a personalized model for each GS. Experiments show that the proposed models match or exceed a centrally trained model’s performance and generalize well. Privacy is analyzed by assessing information leakage between data and shared activations of the GS model split. Additionally, we apply differential privacy to enhance local data privacy and examine its impact on performance. A transformer model is used as our base learner.


Article 922

Title@2025-07-06 (7): Consistent Labeling Across Group Assignments: Variance Reduction in Conditional Average Treatment Effect Estimation

Title: Consistent Labeling Across Group Assignments: Variance Reduction in Conditional Average Treatment Effect Estimation Konsequente Beschriftung über Gruppenzuweisungen: Varianzreduzierung in bedingter durchschnittlicher Behandlungseffektschätzung 各组任务统一标签标签:减少在条件平均治疗效果估计方面的差异 2507.04332v1

Authors (3): Yi-Fu Fu, Keng-Te Liao, Shou-De Lin

Numerous algorithms have been developed for Conditional Average Treatment Effect (CATE) estimation. In this paper, we first highlight a common issue where many algorithms exhibit inconsistent learning behavior for the same instance across different group assignments. We introduce a metric to quantify and visualize this inconsistency. Next, we present a theoretical analysis showing that this inconsistency indeed contributes to higher test errors and cannot be resolved through conventional machine learning techniques. To address this problem, we propose a general method called Consistent Labeling Across Group Assignments (CLAGA), which eliminates the inconsistency and is applicable to any existing CATE estimation algorithm. Experiments on both synthetic and real-world datasets demonstrate significant performance improvements with CLAGA.
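
For reference, the estimand can be written in standard potential-outcomes notation. The symbols below (covariates $X$, binary treatment $T$, potential outcomes $Y(0)$ and $Y(1)$) are the usual textbook ones and are not taken from the abstract:

```latex
\tau(x) = \mathbb{E}\bigl[\, Y(1) - Y(0) \mid X = x \,\bigr]
% Under the usual unconfoundedness and overlap assumptions this is identified as
\tau(x) = \mathbb{E}[\, Y \mid X = x,\ T = 1 \,] - \mathbb{E}[\, Y \mid X = x,\ T = 0 \,]
```

Only one of $Y(0)$ and $Y(1)$ is observed for any given instance, so the training signal an instance provides depends on the group it was assigned to.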


Article 923

Title@2025-07-06 (7): A note on the unique properties of the Kullback–Leibler divergence for sampling via gradient flows

Title: A note on the unique properties of the Kullback–Leibler divergence for sampling via gradient flows Ein Hinweis auf die einzigartigen Eigenschaften der Kullback–Leiibler Divergenz für die Probenahme über Gradientenströme 关于通过梯度流取样的Kullback-Librec-Libber差异的独特特性的说明 2507.04330v1

Authors (1): Francesca Romana Crucinio

We consider the problem of sampling from a probability distribution $\pi$. It is well known that this can be written as an optimisation problem over the space of probability distributions in which we aim to minimise a divergence from $\pi$. The optimisation problem is normally solved through gradient flows in the space of probability distributions with an appropriate metric. We show that the Kullback–Leibler divergence is the only divergence in the family of Bregman divergences whose gradient flow w.r.t. many popular metrics does not require knowledge of the normalising constant of $\pi$.
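
The well-known half of this statement, that the Kullback–Leibler flow does not need the normalising constant, can be checked in a few lines. The sketch below writes $\pi = \gamma / Z$ with $\gamma$ the unnormalised density and uses the Wasserstein-2 metric as one illustrative choice of metric; both the notation and the choice of metric are our assumptions.

```latex
\mathrm{KL}(\mu \,\|\, \pi)
  = \int \mu(x) \log \frac{\mu(x)}{\gamma(x)} \, dx + \log Z ,
\qquad
\frac{\delta\, \mathrm{KL}}{\delta \mu}(x) = \log \mu(x) - \log \gamma(x) + \log Z + 1 .

% Wasserstein-2 gradient flow: only the spatial gradient of the first variation enters,
% so the constants \log Z and 1 drop out.
\partial_t \mu_t
  = \nabla \cdot \Bigl( \mu_t \, \nabla \frac{\delta\, \mathrm{KL}}{\delta \mu} \Bigr)
  = \nabla \cdot \bigl( \mu_t \, \nabla ( \log \mu_t - \log \gamma ) \bigr) .
```

Here $Z$ enters only as an additive constant in the first variation, which is why it disappears under the gradient.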


Article 924

Title@2025-07-06 (7): TinyProto: Communication-Efficient Federated Learning with Sparse Prototypes in Resource-Constrained Environments

Title: TinyProto: Communication-Efficient Federated Learning with Sparse Prototypes in Resource-Constrained Environments TinyProto: Kommunikationseffizientes Federated Learning mit Sparse Prototypen in ressourcengebundenen Umgebungen TinyProto:在资源紧张的环境中,与浅质原型进行交流-高效的联邦学习 2507.04327v1

Authors (2): Gyuejeong Lee, Daeyoung Choi

Communication efficiency in federated learning (FL) remains a critical challenge for resource-constrained environments. While prototype-based FL reduces communication overhead by sharing class prototypes (mean activations in the penultimate layer) instead of model parameters, its efficiency decreases with larger feature dimensions and class counts. We propose TinyProto, which addresses these limitations through Class-wise Prototype Sparsification (CPS) and adaptive prototype scaling. CPS enables structured sparsity by allocating specific dimensions to class prototypes and transmitting only non-zero elements, while adaptive scaling adjusts prototypes based on class distributions. Our experiments show TinyProto reduces communication costs by up to 4x compared to existing methods while maintaining performance. Beyond its communication efficiency, TinyProto offers crucial advantages: achieving compression without client-side computational overhead and supporting heterogeneous architectures, making it ideal for resource-constrained heterogeneous FL.
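
The idea of sharing prototypes and then sending only the dimensions allocated to each class can be sketched as follows. The allocation rule used here (dimension `d` belongs to class `d % num_classes`) is a hypothetical stand-in, since the abstract does not describe how CPS assigns dimensions, and adaptive scaling is left out entirely.

```python
def class_prototypes(features, labels, num_classes):
    """Per-class mean of penultimate-layer activations."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, y in zip(features, labels):
        counts[y] += 1
        for d, v in enumerate(vec):
            sums[y][d] += v
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]]
            for c in range(num_classes)]


def sparsify(protos):
    """Keep only each class's allocated dimensions as {index: value} pairs.

    Hypothetical allocation: dimension d is owned by class d % num_classes.
    Everything outside the allocation is treated as zero and not transmitted.
    """
    num_classes = len(protos)
    return [{d: v for d, v in enumerate(p) if d % num_classes == c}
            for c, p in enumerate(protos)]


# Toy activations: three samples, four dimensions, two classes.
protos = class_prototypes([[1, 2, 3, 4], [3, 4, 5, 6], [0, 2, 0, 2]], [0, 0, 1], 2)
sparse = sparsify(protos)
values_sent = sum(len(s) for s in sparse)
```

In this toy case the client sends 4 values instead of the 8 a dense prototype table would need; the actual saving depends on the real allocation scheme and on how indices are encoded.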


Article 925

Title@2025-07-06 (7): Alleviating Attack Data Scarcity: SCANIA’s Experience Towards Enhancing In-Vehicle Cyber Security Measures

Title: Alleviating Attack Data Scarcity: SCANIA’s Experience Towards Enhancing In-Vehicle Cyber Security Measures Benachteiligung von Angriffsdaten: SCANIAs Erfahrung zur Verbesserung von Cybersicherheitsmaßnahmen im Fahrzeug 减轻攻击数据稀缺性:SCANIA在加强车辆内部网络安全措施方面的经验 2507.02607v2

Authors (5): Frida Sundfeldt, Bianca Widstam, Mahshid Helali Moghadam, Kuo-Yun Liang, Anders Vesterberg

The digital evolution of connected vehicles and the subsequent security risks emphasize the critical need for implementing in-vehicle cyber security measures such as intrusion detection and response systems. The continuous advancement of attack scenarios further highlights the need for adaptive detection mechanisms that can detect evolving, unknown, and complex threats. The effective use of ML-driven techniques can help address this challenge. However, constraints on implementing diverse attack scenarios on test vehicles due to safety, cost, and ethical considerations result in a scarcity of data representing attack scenarios. This limitation necessitates alternative efficient and effective methods for generating high-quality attack-representing data. This paper presents a context-aware attack data generator that generates attack inputs and the corresponding in-vehicle network logs, i.e., controller area network (CAN) logs, representing various types of attack including denial of service (DoS), fuzzy, spoofing, suspension, and replay attacks. It utilizes parameterized attack models augmented with CAN message decoding and attack intensity adjustments to configure the attack scenarios with high similarity to real-world scenarios and promote variability. We evaluate the practicality of the generated attack-representing data within an intrusion detection system (IDS) case study, in which we develop and perform an empirical evaluation of two deep neural network IDS models using the generated data. In addition to the efficiency and scalability of the approach, the performance of the IDS models, which show high detection and classification capabilities, validates the consistency and effectiveness of the generated data. In this experience study, we also elaborate on the aspects influencing the fidelity of the data to real-world scenarios and provide insights into its application.


Article 926

Title@2025-07-06 (7): CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning

Title: CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning CLIP-RL: Chirurgische Szene-Segmentierung mit Kontrastiv-Sprach-Vision Vor- und Verstärkungs-Lernen CLIP-RL: 使用矛盾语言视力的外科区分解 2507.04317v1

Authors (5): Fatmaelzahraa Ali Ahmed, Muhammad Arsalan, Abdulaziz Al-Ali, Khalid Al-Jalham, Shidin Balakrishnan

Understanding surgical scenes can improve healthcare quality for patients, especially given the vast amount of video data generated during minimally invasive surgery (MIS). Processing these videos generates valuable assets for training sophisticated models. In this paper, we introduce CLIP-RL, a novel contrastive language-image pre-training model tailored for semantic segmentation of surgical scenes. CLIP-RL presents a new segmentation approach that involves reinforcement learning and curriculum learning, enabling continuous refinement of the segmentation masks during the full training pipeline. Our model has shown robust performance under challenging optical conditions such as occlusions, texture variations, and dynamic lighting. The CLIP model serves as a powerful feature extractor, capturing rich semantic context that enhances the distinction between instruments and tissues. The RL module plays a pivotal role in dynamically refining predictions through iterative action-space adjustments. We evaluated CLIP-RL on the EndoVis 2018 and EndoVis 2017 datasets. CLIP-RL achieved a mean IoU of 81% on EndoVis 2018, outperforming state-of-the-art models, and a mean IoU of 74.12% on EndoVis 2017. This superior performance is due to the combination of contrastive learning with reinforcement learning and curriculum learning.


Article 927

Title@2025-07-06 (7): A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment

Title: A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment Ein ursächliches Weltmodell für die nächste Token-Vorhersage: GPT in einer kontrollierten Umgebung erforschen 由下到下预测所根据的因果世界模式:在受控制的环境中探索GPT 2412.07446v4

Authors (5): Raanan Y. Rohekar, Yaniv Gurwicz, Sungduk Yu, Estelle Aflalo, Vasudev Lal

Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning for input sequences, and introduce a corresponding confidence score. Empirical tests were conducted in controlled environments using the setups of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, was tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where it generates illegal moves, it also fails to capture a causal structure.


Article 928

Title@2025-07-06 (7): Heterogeneous Federated Learning with Prototype Alignment and Upscaling

Title: Heterogeneous Federated Learning with Prototype Alignment and Upscaling Heterogenes Föderiertes Lernen mit Prototypenausrichtung und Upscaling 具有原型调整和升级的异异质联邦学习 2507.04310v1

Authors (3): Gyuejeong Lee, Jihwan Shin, Daeyoung Choi

Heterogeneity in data distributions and model architectures remains a significant challenge in federated learning (FL). Various heterogeneous FL (HtFL) approaches have recently been proposed to address this challenge. Among them, prototype-based FL (PBFL) has emerged as a practical framework that only shares per-class mean activations from the penultimate layer. However, PBFL approaches often suffer from suboptimal prototype separation, limiting their discriminative power. We propose Prototype Normalization (ProtoNorm), a novel PBFL framework that addresses this limitation through two key components: Prototype Alignment (PA) and Prototype Upscaling (PU). The PA method draws inspiration from the Thomson problem in classical physics, optimizing global prototype configurations on a unit sphere to maximize angular separation; subsequently, the PU method increases prototype magnitudes to enhance separation in Euclidean space. Extensive evaluations on benchmark datasets show that our approach better separates prototypes and thus consistently outperforms existing HtFL approaches. Notably, since ProtoNorm inherits the communication efficiency of PBFL and the PA is performed server-side, it is particularly suitable for resource-constrained environments.
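
The effect of the upscaling step can be illustrated with two unit-norm prototypes: multiplying both by the same factor leaves their angle unchanged but stretches their Euclidean distance by that factor. The factor 4.0 and the helper names below are arbitrary choices for illustration; the abstract does not say how ProtoNorm picks the magnitude.

```python
import math


def scale(vec: list[float], factor: float) -> list[float]:
    """Multiply every coordinate of a prototype by the same factor."""
    return [factor * v for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Two unit-norm prototypes that are already 90 degrees apart on the sphere.
a, b = [1.0, 0.0], [0.0, 1.0]
a_up, b_up = scale(a, 4.0), scale(b, 4.0)

dist_before = math.dist(a, b)
dist_after = math.dist(a_up, b_up)
```

This is why the two components are complementary: alignment fixes the directions, and upscaling then widens the margins that a Euclidean-distance classifier sees.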


Article 929

Title@2025-07-06 (7): Domain Adaptation of Drag Reduction Policy to Partial Measurements

Title: Domain Adaptation of Drag Reduction Policy to Partial Measurements Bereich Anpassung der Drag-Reduction-Politik an Teilmessungen 将减拖量政策调整为部分计量 2507.04309v1

Authors (2): Anton Plaksin, Georgios Rigas

Feedback control of fluid-based systems poses significant challenges due to their high-dimensional, nonlinear, and multiscale dynamics, which demand real-time, three-dimensional, multi-component measurements for sensing. While such measurements are feasible in digital simulations, they are often only partially accessible in the real world. In this paper, we propose a method to adapt feedback control policies obtained from full-state measurements to setups with only partial measurements. Our approach is demonstrated in a simulated environment by minimising the aerodynamic drag of a simplified road vehicle. Reinforcement learning algorithms can optimally solve this control task when trained on full-state measurements by placing sensors in the wake. However, in real-world applications, sensors are limited and typically only on the vehicle, providing only partial measurements. To address this, we propose to train a Domain Specific Feature Transfer (DSFT) map reconstructing the full measurements from the history of the partial measurements. By applying this map, we derive optimal policies based solely on partial data. Additionally, our method enables determination of the optimal history length and offers insights into the architecture of optimal control policies, facilitating their implementation in real-world environments with limited sensor information.


Article 930

Title@2025-07-06 (7): Learning Classifiers That Induce Markets

Title: Learning Classifiers That Induce Markets Lernklassifikatoren, die Märkte induzieren 招生市场学习分类 2502.20012v2

Authors (4): Yonatan Sommer, Ivri Hikri, Lotan Amit, Nir Rosenfeld

When learning is used to inform decisions about humans, such as for loans, hiring, or admissions, this can incentivize users to strategically modify their features, at a cost, to obtain positive predictions. The common assumption is that the function governing costs is exogenous, fixed, and predetermined. We challenge this assumption, and assert that costs can emerge as a result of deploying a classifier. Our idea is simple: when users seek positive predictions, this creates demand for important features; and if features are available for purchase, then a market will form, and competition will give rise to prices. We extend the strategic classification framework to support this notion, and study learning in a setting where a classifier can induce a market for features. We present an analysis of the learning task, devise an algorithm for computing market prices, propose a differentiable learning framework, and conduct experiments to explore our novel setting and approach.


Article 931

Title@2025-07-06 (7): Agnostic Online Learning and Excellent Sets

Title: Agnostic Online Learning and Excellent Sets Agnostic Online-Lernen und ausgezeichnete Sets 无名在线学习和优秀套 2108.05569v3

Authors (2): Maryanthe Malliaris, Shay Moran

We use algorithmic methods from online learning to explore some important objects at the intersection of model theory and combinatorics, and find natural ways that algorithmic methods can detect and explain (and improve our understanding of) stable structure in the sense of model theory. The main theorem deals with existence of $\epsilon$-excellent sets (which are key to the Stable Regularity Lemma, a theorem characterizing the appearance of irregular pairs in Szemerédi’s celebrated Regularity Lemma). We prove that $\epsilon$-excellent sets exist for any $\epsilon < \frac{1}{2}$ in $k$-edge stable graphs in the sense of model theory (equivalently, Littlestone classes); earlier proofs had given this only for $\epsilon < 1/{2^{2^k}}$ or so. We give two proofs: the first uses regret bounds from online learning, the second uses Boolean closure properties of Littlestone classes and sampling. We also give a version of the dynamic Sauer-Shelah-Perles lemma appropriate to this setting, related to definability of types. We conclude by characterizing stable/Littlestone classes as those supporting a certain abstract notion of majority: the proof shows that the two distinct, natural notions of majority, arising from measure and from dimension, densely often coincide.


Article 932

Title@2025-07-06 (7): Enhancing Neural Autoregressive Distribution Estimators for Image Reconstruction

Title: Enhancing Neural Autoregressive Distribution Estimators for Image Reconstruction Verbesserung der Neuralen Autoregressiven Verteilungsstimulatoren für die Bildrekonstruktion 图像重建的加强神经自动递减分布动动测算器 2506.05391v2

Authors (2): Ambrose Emmett-Iwaniw, Nathan Kirk

Autoregressive models are often employed to learn distributions of image data by decomposing the $D$-dimensional density function into a product of one-dimensional conditional distributions. Each conditional depends on preceding variables (pixels, in the case of image data), making the order in which variables are processed fundamental to the model performance. In this paper, we study the problem of observing a small subset of image pixels (referred to as a pixel patch) to predict the unobserved parts of the image. As our prediction mechanism, we propose a generalized version of the convolutional neural autoregressive distribution estimation (ConvNADE) model adapted for real-valued and color images. Moreover, we investigate the quality of image reconstruction when observing both random pixel patches and low-discrepancy pixel patches inspired by quasi-Monte Carlo theory. Experiments on benchmark datasets demonstrate that, where design permits, pixels sampled or stored to preserve uniform coverage improve reconstruction fidelity and test performance.
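
The decomposition described above can be written out explicitly. The ordering symbol $o$ is our notation for a permutation of the pixel indices and does not appear in the abstract:

```latex
p(\mathbf{x}) = \prod_{d=1}^{D} p\bigl( x_{o_d} \mid x_{o_1}, \dots, x_{o_{d-1}} \bigr)
```

If the first $k$ positions of the ordering are the observed pixel patch, the remaining factors $d = k+1, \dots, D$ give the predictive distribution of the unobserved pixels, which is why the choice of observed pixels matters for reconstruction.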


Article 933

Title@2025-07-06 (7): Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization

Title: Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization Adversariale Daten Augmentation für Single Domain Generalization via Lyapunov Exponent-Guided Optimization 通过 Lyapunov 指导热源引导优化化对单域的对数据增强 2507.04302v1

Authors (5): Zuyu Zhang, Ning Chen, Yongshan Liu, Qinghua Zhang, Xu Zhang

Single Domain Generalization (SDG) aims to develop models capable of generalizing to unseen target domains using only one source domain, a task complicated by substantial domain shifts and limited data diversity. Existing SDG approaches primarily rely on data augmentation techniques, which struggle to effectively adapt training dynamics to accommodate large domain shifts. To address this, we propose LEAwareSGD, a novel Lyapunov Exponent (LE)-guided optimization approach inspired by dynamical systems theory. By leveraging LE measurements to modulate the learning rate, LEAwareSGD encourages model training near the edge of chaos, a critical state that optimally balances stability and adaptability. This dynamic adjustment allows the model to explore a wider parameter space and capture more generalizable features, ultimately enhancing the model’s generalization capability. Extensive experiments on PACS, OfficeHome, and DomainNet demonstrate that LEAwareSGD yields substantial generalization gains, achieving up to 9.47% improvement on PACS in low-data regimes. These results underscore the effectiveness of training near the edge of chaos for enhancing model generalization capability in SDG tasks.


Article 934

Title@2025-07-06 (7): QF: Quick Feedforward AI Model Training without Gradient Back Propagation

Title: QF: Quick Feedforward AI Model Training without Gradient Back Propagation QF: Quick Feedforward KI-Modelltraining ohne gradiente Back Propagation QF: 快速进料推进AI 示范培训,不逐步后推推进 2507.04300v1

Authors (1): Feng Qi

We propose Quick Feedforward (QF) Learning, a novel knowledge consolidation framework for transformer-based models that enables efficient transfer of instruction derived knowledge into model weights through feedforward activations without any gradient back propagation. Unlike traditional finetuning, QF updates are computed in closed form, require minimal parameter modification, and preserve prior knowledge. Importantly, QF allows models to train and infer within the same runtime environment, making the process more resource efficient and closely aligned with how the human brain operates. Code and models are open sourced on GitHub. I hope QF Learning inspires a more efficient and brain-like paradigm for AI systems.


Article 935

Title@2025-07-06 (7): Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents

Title: Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents Intelligente Assistenten für die Semiconductor Failure Analysis mit LLM-basierten Planungs-Agenten 与基于LLM的规划代理人进行半导体故障分析的智能助理 2506.15567v2

Authors (3): Aline Dobrovsky, Konstantin Schekotihin, Christian Burmer

Failure Analysis (FA) is a highly intricate and knowledge-intensive process. The integration of AI components within the computational infrastructure of FA labs has the potential to automate a variety of tasks, including the detection of non-conformities in images, the retrieval of analogous cases from diverse data sources, and the generation of reports from annotated images. However, as the number of deployed AI models increases, the challenge lies in orchestrating these components into cohesive and efficient workflows that seamlessly integrate with the FA process. This paper investigates the design and implementation of a Large Language Model (LLM)-based Planning Agent (LPA) to assist FA engineers in solving their analysis cases. The LPA integrates LLMs with advanced planning capabilities and external tool utilization, enabling autonomous processing of complex queries, retrieval of relevant data from external systems, and generation of human-readable responses. Evaluation results demonstrate the agent’s operational effectiveness and reliability in supporting FA tasks.



Article 936

Title@2025-07-06 (7): Learning collision risk proactively from naturalistic driving data at scale

Title: Learning collision risk proactively from naturalistic driving data at scale Kollisionsrisiko proaktiv aus naturalistischen Fahrdaten im Maßstab lernen 从规模的自然主义驾驶数据中积极主动地学习碰撞风险 2505.13556v3

Authors (4): Yiru Jiao, Simeon C. Calvert, Sander van Cranenburgh, Hans van Lint

Accurately and proactively alerting drivers or automated systems to emerging collisions is crucial for road safety, particularly in highly interactive and complex urban environments. However, existing approaches to identifying potential collisions either require labour-intensive annotation of sparse risk, struggle to consider varying contextual factors, or are only useful in specific scenarios. To address these limits, this study introduces the Generalised Surrogate Safety Measure (GSSM), a new data-driven approach that learns collision risk exclusively from naturalistic driving without the need for crash or risk labels. GSSM captures the patterns of normal driving and estimates the extent to which a traffic interaction deviates from the norm towards an unsafe state. Diverse data from naturalistic driving, including motion kinematics, weather, lighting, etc., are used to train multiple GSSMs, which are tested with 2,591 reconstructed real-world crashes and near-crashes. These test events are also released here as the largest dataset of its kind to date. A basic GSSM using only instantaneous motion kinematics achieves an area under the precision-recall curve of 0.9 and secures a median time advance of 2.6 seconds to prevent potential collisions. Additional interaction patterns and contextual factors provide further performance gains. Across various types of collision risk scenarios (such as rear-end, merging, and turning interactions), the accuracy and timeliness of GSSM consistently outperform existing baselines. GSSM therefore establishes a scalable, context-aware, and generalisable foundation for proactively quantifying collision risk in traffic interactions. This can support and facilitate autonomous driving systems, traffic safety assessment, and road emergency management. Code and experiment data are openly accessible at https://github.com/Yiru-Jiao/GSSM.



Article 937

Title@2025-07-06 (7): Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping

Title: Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping Robustes Federated Learning Over the Air: Mit Median Anchored Clipping gegen schwerfällige Geräusche 强力的联邦在空中学习:用中度排成的剪接打击重击噪音 2409.15100v6

Authors (6): Jiaxing Li, Zihan Chen, Kai Fong Ernest Chong, Bikramjit Das, Tony Q. S. Quek, Howard H. Yang

Leveraging over-the-air computations for model aggregation is an effective approach to cope with the communication bottleneck in federated edge learning. By exploiting the superposition properties of multi-access channels, this approach facilitates an integrated design of communication and computation, thereby enhancing system privacy while reducing implementation costs. However, the inherent electromagnetic interference in radio channels often exhibits heavy-tailed distributions, giving rise to exceptionally strong noise in globally aggregated gradients that can significantly deteriorate the training performance. To address this issue, we propose a novel gradient clipping method, termed Median Anchored Clipping (MAC), to combat the detrimental effects of heavy-tailed noise. We also derive analytical expressions for the convergence rate of model training with analog over-the-air federated learning under MAC, which quantitatively demonstrates the effect of MAC on training performance. Extensive experimental results show that the proposed MAC algorithm effectively mitigates the impact of heavy-tailed noise, hence substantially enhancing system robustness.



Article 938

Title@2025-07-06 (7): Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

Title: Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning Skalierung von Iterationsnetzwerken auf 5000 Ebenen für extreme Langzeitplanung 用于极端长期规划的 向5000层扩展值迭代网络 2406.08404v2

Authors (7): Yuhui Wang, Qingyuan Wu, Dylan R. Ashley, Francesco Faccio, Weida Li, Chao Huang, Jürgen Schmidhuber

The Value Iteration Network (VIN) is an end-to-end differentiable neural network architecture for planning. It exhibits strong generalization to unseen domains by incorporating a differentiable planning module that operates on a latent Markov Decision Process (MDP). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a 100x100 maze – a task that typically requires thousands of planning steps to solve. We observe that this deficiency is due to two issues: the representation capacity of the latent MDP and the planning module’s depth. We address these by augmenting the latent MDP with a dynamic transition kernel, dramatically improving its representational capacity, and, to mitigate the vanishing gradient problem, introduce an “adaptive highway loss” that constructs skip connections to improve gradient flow. We evaluate our method on 2D/3D maze navigation environments, continuous control, and the real-world Lunar rover navigation task. We find that our new method, named Dynamic Transition VIN (DT-VIN), scales to 5000 layers and solves challenging versions of the above tasks. Altogether, we believe that DT-VIN represents a concrete step forward in performing long-term large-scale planning in complex environments.
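
To make the depth requirement concrete, here is a minimal sketch of classical, non-learned value iteration on a grid maze. It is not DT-VIN: the function name `value_iteration`, the unit step cost, and the cost-to-go formulation (minimizing steps rather than maximizing reward) are illustrative assumptions. With synchronous updates, each sweep propagates value information by one cell, so a planning module that unrolls one iteration per layer needs depth on the order of the longest shortest path.

```python
def value_iteration(grid, goal, max_iters=10_000):
    """Synchronous value iteration on a 4-connected grid maze.

    grid[r][c] == 1 marks a wall, 0 marks free space. Every move costs 1
    and the goal cell is absorbing with cost 0.
    Returns (cost_to_go, sweeps), where sweeps counts the passes needed
    until no value changes any more.
    """
    rows, cols = len(grid), len(grid[0])
    inf = float("inf")
    values = [[inf] * cols for _ in range(rows)]
    values[goal[0]][goal[1]] = 0.0
    for sweep in range(1, max_iters + 1):
        new = [row[:] for row in values]
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 or (r, c) == goal:
                    continue
                best = values[r][c]
                # Bellman backup: one step plus the neighbour's cost-to-go.
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                        best = min(best, 1.0 + values[nr][nc])
                if best < new[r][c]:
                    new[r][c] = best
                    changed = True
        values = new
        if not changed:
            return values, sweep
    return values, max_iters


# A 3x3 maze whose only route from the top-left corner to the goal
# at the bottom-left winds around a wall.
maze = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
cost, sweeps = value_iteration(maze, goal=(2, 0))
```

In this toy maze the number of sweeps grows with the length of the winding path, not with the grid's side length, which is the same effect that pushes the required planning depth into the thousands for a 100x100 maze.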



Article 939

Title@2025-07-06 (7): VOLTRON: Detecting Unknown Malware Using Graph-Based Zero-Shot Learning

Title: VOLTRON: Detecting Unknown Malware Using Graph-Based Zero-Shot Learning VOLTRON: Unbekannte Malware mit graphisch basiertem Zero-Shot-Lernen erkennen VOLTRON: 使用基于图表的零热学习检测未知的恶意软件 2507.04275v1

Authors (5): M. Tahir Akdeniz, Zeynep Yeşilkaya, İ. Enes Köse, İ. Ulaş Ünal, Sevil Şen

The persistent threat of Android malware presents a serious challenge to the security of millions of users globally. While many machine learning-based methods have been developed to detect these threats, their reliance on large labeled datasets limits their effectiveness against emerging, previously unseen malware families, for which labeled data is scarce or nonexistent. To address this challenge, we introduce a novel zero-shot learning framework that combines Variational Graph Auto-Encoders (VGAE) with Siamese Neural Networks (SNN) to identify malware without needing prior examples of specific malware families. Our approach leverages graph-based representations of Android applications, enabling the model to detect subtle structural differences between benign and malicious software, even in the absence of labeled data for new threats. Experimental results show that our method outperforms the state-of-the-art MaMaDroid, especially in zero-day malware detection. Our model achieves 96.24% accuracy and 95.20% recall for unknown malware families, highlighting its robustness against evolving Android threats.



Article 940

Title@2025-07-06 (7): EDALearn: A Comprehensive RTL-to-Signoff EDA Benchmark for Democratized and Reproducible ML for EDA Research

Title: EDALearn: A Comprehensive RTL-to-Signoff EDA Benchmark for Democratized and Reproducible ML for EDA Research EDALearn: Ein umfassender RTL-to-Signoff EDA Benchmark für Demokratisierung und reproduzierbare ML für EDA-Forschung EDALearn: EDA 研究的民主化和可复制的 ML 综合RTL-Sign-Sign-Sign-Signoff EDA基准 2312.01674v2

Authors (5): Jingyu Pan, Chen-Chia Chang, Zhiyao Xie, Yiran Chen, Hai Li

The application of Machine Learning (ML) in Electronic Design Automation (EDA) for Very Large-Scale Integration (VLSI) design has garnered significant research attention. Despite the requirement for extensive datasets to build effective ML models, most studies are limited to smaller, internally generated datasets due to the lack of comprehensive public resources. In response, we introduce EDALearn, the first holistic, open-source benchmark suite specifically for ML tasks in EDA. This benchmark suite presents an end-to-end flow from synthesis to physical implementation, enriching data collection across various stages. It fosters reproducibility and promotes research into ML transferability across different technology nodes. Accommodating a wide range of VLSI design instances and sizes, our benchmark aptly represents the complexity of contemporary VLSI designs. Additionally, we provide an in-depth data analysis, enabling users to fully comprehend the attributes and distribution of our data, which is essential for creating efficient ML models. Our contributions aim to encourage further advances in the ML-EDA domain.



Article 941

Title@2025-07-06 (7): Efficient Training of Deep Networks using Guided Spectral Data Selection: A Step Toward Learning What You Need

Title: Efficient Training of Deep Networks using Guided Spectral Data Selection: A Step Toward Learning What You Need Effiziente Ausbildung von Deep Networks unter Verwendung einer geführten Spektraldatenauswahl: Ein Schritt auf dem Weg zu lernen, was man braucht 利用向导光谱数据选择对深网络进行有效培训:逐步学习你需要什么 2507.04269v1

Authors (2): Mohammadreza Sharifi, Ahad Harati

Effective data curation is essential for optimizing neural network training. In this paper, we present the Guided Spectrally Tuned Data Selection (GSTDS) algorithm, which dynamically adjusts the subset of data points used for training using an off-the-shelf pre-trained reference model. Based on a pre-scheduled filtering ratio, GSTDS effectively reduces the number of data points processed per batch. The proposed method ensures an efficient selection of the most informative data points for training while avoiding redundant or less beneficial computations. Preserving data points in each batch is performed based on spectral analysis. A Fiedler vector-based scoring mechanism removes the filtered portion of the batch, lightening the resource requirements of the learning. The proposed data selection approach not only streamlines the training process but also promotes improved generalization and accuracy. Extensive experiments on standard image classification benchmarks, including CIFAR-10, Oxford-IIIT Pet, and Oxford-Flowers, demonstrate that GSTDS outperforms standard training scenarios and JEST, a recent state-of-the-art data curation method, on several key factors. It is shown that GSTDS achieves notable reductions in computational requirements, up to four times, without compromising performance. GSTDS exhibits a considerable growth in terms of accuracy under the limited computational resource usage, in contrast to other methodologies. These promising results underscore the potential of spectral-based data selection as a scalable solution for resource-efficient deep learning and motivate further exploration into adaptive data curation strategies. You can find the code at https://github.com/rezasharifi82/GSTDS.
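
For readers unfamiliar with the Fiedler vector, the sketch below computes it for a small similarity graph in plain Python. It only illustrates the spectral quantity itself: how GSTDS builds the graph from a batch and turns the vector into per-sample scores is not specified in the abstract, so the function name `fiedler_vector`, the projected power iteration, and the toy graph are all assumptions made for illustration.

```python
import math


def fiedler_vector(adj, iters=500):
    """Approximate the Fiedler vector of a connected undirected graph.

    adj is a symmetric n x n list of non-negative edge weights. The
    Fiedler vector is the eigenvector of the Laplacian L = D - A that
    belongs to the second-smallest eigenvalue. It is found here by power
    iteration on (c*I - L), keeping the iterate orthogonal to the
    all-ones vector (the eigenvector for eigenvalue 0).
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    c = 2.0 * max(deg)  # Gershgorin bound on the largest eigenvalue of L
    # Deterministic zero-mean start; assumed not orthogonal to the target.
    x = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        # y = (c*I - L) x = (c - deg) * x + A x
        y = [(c - deg[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # project out the constant component
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
v = fiedler_vector(graph)
```

The sign pattern of `v` separates the two triangles, and the entries with the smallest magnitude sit at the bridge between them. This kind of structural signal is what a spectral scoring rule can exploit.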



Article 942

Title@2025-07-06 (7): An Explainable Transformer Model for Alzheimer’s Disease Detection Using Retinal Imaging

Title: An Explainable Transformer Model for Alzheimer’s Disease Detection Using Retinal Imaging Ein erklärbares Transformer-Modell für die Alzheimer-Erkennung mit retinaler Bildgebung 阿尔茨海默氏病利用视网膜成像检测疾病可解释的变形模型 2507.04259v1

Authors (5): Saeed Jamshidiha, Alireza Rezaee, Farshid Hajati, Mojtaba Golzan, Raymond Chiong

Alzheimer’s disease (AD) is a neurodegenerative disorder that affects millions worldwide. In the absence of effective treatment options, early diagnosis is crucial for initiating management strategies to delay disease onset and slow down its progression. In this study, we propose Retformer, a novel transformer-based architecture for detecting AD using retinal imaging modalities, leveraging the power of transformers and explainable artificial intelligence. The Retformer model is trained on datasets of different modalities of retinal images from patients with AD and age-matched healthy controls, enabling it to learn complex patterns and relationships between image features and disease diagnosis. To provide insights into the decision-making process of our model, we employ the Gradient-weighted Class Activation Mapping algorithm to visualize the feature importance maps, highlighting the regions of the retinal images that contribute most significantly to the classification outcome. These findings are compared to existing clinical studies on detecting AD using retinal biomarkers, allowing us to identify the most important features for AD detection in each imaging modality. The Retformer model outperforms a variety of benchmark algorithms across different performance metrics by margins of up to 11.



Article 943

Title@2025-07-06 (7): Bandit Pareto Set Identification in a Multi-Output Linear Model

Title: Bandit Pareto Set Identification in a Multi-Output Linear Model Bandit Pareto Set-Identifikation in einem linearen Multi-Output-Modell 在多输出线性模型中设定多输出线性模型中的标识 2507.04255v1

Authors (3): Cyrille Kone, Emilie Kaufmann, Laura Richert

We study the Pareto Set Identification (PSI) problem in a structured multi-output linear bandit model. In this setting, each arm is associated with a feature vector belonging to $\mathbb{R}^h$, and its mean vector in $\mathbb{R}^d$ linearly depends on this feature vector through a common unknown matrix $\Theta \in \mathbb{R}^{h \times d}$. The goal is to identify the set of non-dominated arms by adaptively collecting samples from the arms. We introduce and analyze the first optimal design-based algorithms for PSI, providing nearly optimal guarantees in both the fixed-budget and the fixed-confidence settings. Notably, we show that the difficulty of these tasks mainly depends on the sub-optimality gaps of $h$ arms only. Our theoretical results are supported by an extensive benchmark on synthetic and real-world datasets.
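
The target of the identification problem can be sketched as follows, under the simplifying assumption that $\Theta$ is known (in the bandit setting it is unknown and must be estimated from samples). The helper names `mean_vectors`, `dominates`, and `pareto_set`, and the convention that larger means are better in every coordinate, are illustrative choices, not taken from the paper.

```python
def mean_vectors(features, theta):
    """Return mu_i = Theta^T x_i for every arm.

    features: list of arm feature vectors x_i in R^h.
    theta:    h x d matrix given as a list of h rows.
    """
    h, d = len(theta), len(theta[0])
    return [[sum(x[k] * theta[k][j] for k in range(h)) for j in range(d)]
            for x in features]


def dominates(u, v):
    """u dominates v if it is at least as good everywhere and better somewhere."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def pareto_set(means):
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [i for i, mu in enumerate(means)
            if not any(dominates(other, mu)
                       for j, other in enumerate(means) if j != i)]


# Four arms with h = 2 features and d = 2 objectives.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.2]]
theta = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, chosen so that mu_i = x_i
optimal_arms = pareto_set(mean_vectors(features, theta))
```

Here the last arm is dominated by the third one, so it falls outside the Pareto set, while the first three arms are mutually incomparable.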



Article 944

Title@2025-07-06 (7): FedAli: Personalized Federated Learning Alignment with Prototype Layers for Generalized Mobile Services

Title: FedAli: Personalized Federated Learning Alignment with Prototype Layers for Generalized Mobile Services FedAli: Personalisiertes Federated Learning Alignment mit Prototypen-Layers für Generalized Mobile Services FedAli:通用移动服务中个性化联邦学习与原型图层相适应 2411.10595v2

Authors (5): Sannara Ek, Kaile Wang, François Portet, Philippe Lalanda, Jiannong Cao

Personalized Federated Learning (PFL) enables distributed training on edge devices, allowing models to collaboratively learn global patterns while tailoring their parameters to better fit each client’s local data, all while preserving data privacy. However, PFL faces two key challenges in mobile systems: client drift, where heterogeneous data cause model divergence, and the overlooked need for client generalization, as the dynamic of mobile sensing demands adaptation beyond local environments. To overcome these limitations, we introduce Federated Alignment (FedAli), a prototype-based regularization technique that enhances inter-client alignment while strengthening the robustness of personalized adaptations. At its core, FedAli introduces the ALignment with Prototypes (ALP) layer, inspired by human memory, to enhance generalization by guiding inference embeddings toward personalized prototypes while reducing client drift through alignment with shared prototypes during training. By leveraging an optimal transport plan to compute prototype-embedding assignments, our approach allows pre-training the prototypes without any class labels to further accelerate convergence and improve performance. Our extensive experiments show that FedAli significantly enhances client generalization while preserving strong personalization in heterogeneous settings.



Article 945

Title@2025-07-06 (7): A Two-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments

Title: A Two-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments ATwo-Stage-Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments 分布式计算环境中微射线数据分类的两步组合特征选择和粒子摇动优化方法 2507.04251v1

Authors (9): Aayush Adhikari, Sandesh Bhatta, Harendra S. Jangwan, Amit Mishra, Khair Ul Nisa, Abu Taha Zamani, Aaron Sapkota, Debendra Muduli, Nikhat Parveen

High dimensionality in datasets produced by microarray technology presents a challenge for Machine Learning (ML) algorithms, particularly in terms of dimensionality reduction and handling imbalanced sample sizes. To mitigate these problems, we propose hybrid ensemble feature selection techniques with a majority voting classifier for microarray classification. We consider both filter and wrapper-based feature selection techniques, including Mutual Information (MI), Chi-Square, Variance Threshold (VT), Least Absolute Shrinkage and Selection Operator (LASSO), Analysis of Variance (ANOVA), and Recursive Feature Elimination (RFE), followed by Particle Swarm Optimization (PSO) for selecting the optimal features. This Artificial Intelligence (AI) approach leverages a Majority Voting Classifier that combines multiple machine learning models, such as Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to enhance overall performance and accuracy. By leveraging the strengths of each model, the ensemble approach aims to provide more reliable and effective diagnostic predictions. The efficacy of the proposed model has been tested in both local and cloud environments. In the cloud environment, three virtual machines with virtual Central Processing Units (vCPUs) of size 8, 16, and 64 bits were used to demonstrate the model performance. The experiments show that the 64-bit vCPU configuration provides better classification accuracies of 95.89%, 97.50%, 99.13%, 99.58%, 99.11%, and 94.60% on six microarray datasets, Mixed Lineage Leukemia (MLL), Leukemia, Small Round Blue Cell Tumors (SRBCT), Lymphoma, Ovarian, and Lung, respectively, validating the effectiveness of the proposed model in both local and cloud environments.



Article 946

Title@2025-07-06 (7): Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning

Title: Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning Just Enough Shifts: Mit gezielten Darstellungs-Fine-Tunings überwiderliche Sprachmodelle abmildern 仅作足够的改动:减少在有针对性地派有定额代表的定时派定额代表制统一语言模式中的过度拒绝 2507.04250v1

Authors (5): Mahavir Dabas, Si Chen, Charles Fleming, Ming Jin, Ruoxi Jia

Safety alignment is crucial for large language models (LLMs) to resist malicious instructions but often results in over-refusals, where benign prompts are unnecessarily rejected, impairing user experience and model utility. We introduce ACTOR (Activation-Based Training for Over-Refusal Reduction), a robust and compute- and data-efficient training framework that minimizes over-refusals by leveraging internal activation patterns from diverse queries. ACTOR precisely identifies and adjusts the activation components that trigger refusals, providing stronger control over the refusal mechanism. By fine-tuning only a single model layer, ACTOR effectively reduces over-refusals across multiple benchmarks while maintaining the model’s ability to handle harmful queries and preserve overall utility.



Article 947

Title@2025-07-06 (7): GLU Attention Improve Transformer

Title: GLU Attention Improve Transformer GLU Aufmerksamkeit verbessern Transformer GLU 关注改善变换器 2507.00022v2

Authors (1): Zehao Wang

Gated Linear Units (GLU) have shown great potential in enhancing neural network performance. In this paper, I introduce a novel attention mechanism called GLU Attention, which introduces nonlinearity into the values of Attention. My experiments demonstrate that GLU Attention improves both model performance and convergence speed across text and vision modalities with zero additional parameters and negligible computational costs. GLU Attention is lightweight and can seamlessly integrate with other technologies, such as Flash Attention, Rotary Position Embedding (RoPE), and various Multi-Head Attention (MHA) variants such as Grouped-Query Attention (GQA). This project is open-sourced on GitHub.



Article 948

Title@2025-07-06 (7): Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning

Title: Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning Inertiale quadratische Majorisierung Minimierung mit Anwendung auf Kernel Regularisiertes Lernen 内核正规化学习应用的惯性二次二次二次多数化最小化 2507.04247v1

Authors (2): Qiang Heng, Caixing Wang

First-order methods in convex optimization offer low per-iteration cost but often suffer from slow convergence, while second-order methods achieve fast local convergence at the expense of costly Hessian inversions. In this paper, we highlight a middle ground: minimizing a quadratic majorant with fixed curvature at each iteration. This strategy strikes a balance between per-iteration cost and convergence speed, and crucially allows the reuse of matrix decompositions, such as Cholesky or spectral decompositions, across iterations and varying regularization parameters. We introduce the Quadratic Majorization Minimization with Extrapolation (QMME) framework and establish its sequential convergence properties under standard assumptions. The new perspective of our analysis is to center the arguments around the induced norm of the curvature matrix $H$. To demonstrate practical advantages, we apply QMME to large-scale kernel regularized learning problems. In particular, we propose a novel Sylvester equation modelling technique for kernel multinomial regression. In Julia-based experiments, QMME compares favorably against various established first- and second-order methods. Furthermore, we demonstrate that our algorithms complement existing kernel approximation techniques through more efficiently handling sketching matrices with large projection dimensions. Our numerical experiments and real data analysis are available and fully reproducible at https://github.com/qhengncsu/QMME.jl.
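
The core idea, a fixed-curvature quadratic majorant whose factorization is computed once and reused, can be sketched on L2-regularized logistic regression. This is a hedged illustration, not the QMME algorithm itself: it relies on the standard bound that the logistic Hessian is dominated by $X^\top X / 4 + \lambda I$, and the constant extrapolation weight `beta`, the function names, and the toy data are assumptions. The paper's own extrapolation rule may differ.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def grad(w, X, y, lam):
    """Gradient of the L2-regularized logistic loss at w."""
    g = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(xi, w))))
        for j, a in enumerate(xi):
            g[j] += (p - yi) * a
    return g


def qmm_extrapolated(X, y, lam=1.0, beta=0.3, iters=300):
    """Quadratic majorization-minimization with a simple inertial step."""
    d = len(X[0])
    # Fixed curvature H = X^T X / 4 + lam * I majorizes the true Hessian.
    H = [[sum(x[i] * x[j] for x in X) / 4.0 + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    low = cholesky(H)  # factor once, reuse in every iteration
    w_prev = w = [0.0] * d
    for _ in range(iters):
        v = [wi + beta * (wi - pi) for wi, pi in zip(w, w_prev)]  # extrapolate
        step = chol_solve(low, grad(v, X, y, lam))
        w_prev, w = w, [vi - si for vi, si in zip(v, step)]
    return w


X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
y = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
w_hat = qmm_extrapolated(X, y)
```

Each iteration costs one gradient evaluation plus two triangular solves. Because the Gram part of `H` does not depend on `lam`, a spectral decomposition of it can also be reused across regularization values, which is the reuse the abstract refers to.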



Article 949

Title@2025-07-06 (7): Using Machine Learning to Discover Parsimonious and Physically-Interpretable Representations of Catchment-Scale Rainfall-Runoff Dynamics

Title: Using Machine Learning to Discover Parsimonious and Physically-Interpretable Representations of Catchment-Scale Rainfall-Runoff Dynamics Mit maschinellem Lernen Parsimonious und physisch-interpretable Darstellungen von Catchment-Scale Rainfall-Runoff Dynamics entdecken 利用机器学习发现抓捕规模降雨量规模降雨运行动态的分解和物理解释性说明 2412.04845v4

Authors (2): Yuan-Heng Wang, Hoshin V. Gupta

Despite excellent real-world predictive performance of modern machine learning (ML) methods, many scientists hesitate to discard traditional physical-conceptual (PC) approaches due to their relative interpretability, which contributes to credibility during decision-making. In this context, a currently underexplored aspect of ML is how to develop minimally-optimal representations that can facilitate better insight regarding system functioning. Regardless of how this is achieved, parsimonious representations seem to better support the advancement of scientific understanding. Our own view is that ML-based modeling should be based on the use of computational units that are fundamentally easy to interpret in a physical-conceptual sense. This paper continues our exploration of how ML can be exploited in the service of scientific investigation. We use the Mass-Conserving-Perceptron (MCP) as the fundamental computational unit in a generic network architecture to explore important issues related to the use of observational data for constructing models of dynamical systems. We show, in the context of lumped catchment modeling, that physical interpretability and predictive performance can both be achieved using a relatively parsimonious distributed-state multiple-flow-path network with context-dependent gating and information sharing across the nodes, suggesting that MCP-based modeling can play a significant role in the application of ML to geoscientific investigation.



Article 950

Title@2025-07-06 (7): EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models

Title: EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models EasyEdit2: Ein einfach zu bedienender Steuerungsrahmen für die Bearbeitung großer Sprachmodelle EasyEdit2: 编辑大语言模式的易用指导框架 2504.15133v2

Authors (10): Ziwen Xu, Shuxun Wang, Kewei Xu, Haoming Xu, Mengru Wang, Xinle Deng, Yunzhi Yao, Guozhou Zheng, Huajun Chen, Ningyu Zhang

In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors. EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language features. Unlike its predecessor, EasyEdit2 features a new architecture specifically designed for seamless model steering. It comprises key modules such as the steering vector generator and the steering vector applier, which enable automatic generation and application of steering vectors to influence the model’s behavior without modifying its parameters. One of the main advantages of EasyEdit2 is its ease of use: users do not need extensive technical knowledge. With just a single example, they can effectively guide and adjust the model’s responses, making precise control both accessible and efficient. Empirically, we report model steering performance across different LLMs, demonstrating the effectiveness of these techniques. We have released the source code on GitHub at https://github.com/zjunlp/EasyEdit along with a demonstration notebook. In addition, we provide a demo video at https://www.youtube.com/watch?v=AkfoiPfp5rQ for a quick introduction.



Article 951

Title@2025-07-06 (7): Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment

Title: Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment Music102: Ein $D_{12}$-equivarianter Transformator für Akkordprogressionsbegleitung 音乐102: 用于伴奏合奏进度的 $D12} QQ变压器 2410.18151v2

Authors (1): Weiliang Luo

We present Music102, an advanced model aimed at enhancing chord progression accompaniment through a $D_{12}$-equivariant transformer. Inspired by group theory and symbolic music structures, Music102 leverages musical symmetry, such as transposition and reflection operations, and integrates these properties into the transformer architecture. By encoding prior music knowledge, the model maintains equivariance across both melody and chord sequences. The POP909 dataset was employed to train and evaluate Music102, revealing significant improvements over the non-equivariant Music101 prototype in both weighted loss and exact accuracy metrics, despite using fewer parameters. This work showcases the adaptability of self-attention mechanisms and layer normalization to the discrete musical domain, addressing challenges in computational music analysis. With its stable and flexible neural framework, Music102 sets the stage for further exploration in equivariant music generation and computational composition tools, bridging mathematical theory with practical music performance.



Article 952

Title@2025-07-06 (7): Scaling Context Requires Rethinking Attention

Title: Scaling Context Requires Rethinking Attention Skalierungskontext erfordert nachdenkliche Aufmerksamkeit 需要重新思考时重新思考环境 2507.04239v1

Authors (4): Carles Gelada, Jacob Buckman, Sean Zhang, Txus Bach

We argue that neither transformers nor sub-quadratic architectures are well suited to training at long sequence lengths: the cost of processing the context is too expensive in the former, too inexpensive in the latter. Approaches such as sliding window attention, which reduce the cost-per-token of a transformer, impair in-context learning, and so are also unsuitable. To address these limitations, we introduce power attention, an architectural layer for linear-cost sequence modeling whose state size can be adjusted independently of parameters, unlocking the advantages of linear attention on practical domains. We develop and open-source a set of GPU kernels for efficient power attention, identifying a novel pattern of operation fusion to avoid memory and bandwidth bottlenecks. Our experiments on the in-context learning of power attention show that these models dominate both exponential attention and linear attention at long-context training.



Article 953

Title@2025-07-06 (7): Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering

Title: Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering Tensor-basiertes Graphenlernen mit Konsistenz und Spezifität für Multi-View-Clustering 多观点群集的具有一致性和针对性的基于时标的图示学习 2403.18393v3

Authors (5): Long Shi, Lei Cao, Yunshan Ye, Yu Zhao, Badong Chen

In the context of multi-view clustering, graph learning is recognized as a crucial technique, which generally involves constructing an adaptive neighbor graph based on probabilistic neighbors, and then learning a consensus graph for clustering. However, it is worth noting that these graph learning methods encounter two significant limitations. Firstly, they often rely on Euclidean distance to measure similarity when constructing the adaptive neighbor graph, which proves inadequate in capturing the intrinsic structure among data points in practice, particularly for high-dimensional data. Secondly, most of these methods focus solely on the consensus graph, ignoring unique information from each view. Although a few graph-based studies have considered using specific information as well, the modelling approach employed does not exclude the noise impact from the common or specific components. To this end, we propose a novel tensor-based multi-view graph learning framework that simultaneously considers consistency and specificity, while effectively eliminating the influence of noise. Specifically, we calculate similarity using pseudo-Stiefel manifold distance to preserve the intrinsic properties of data. By making an assumption that the learned neighbor graph of each view comprises a consistent part, a specific part, and a noise part, we formulate a new tensor-based target graph learning paradigm for noise-free graph fusion. Owing to the benefits of tensor singular value decomposition (t-SVD) in uncovering high-order correlations, this model is capable of achieving a comprehensive understanding of the target graph. Furthermore, we derive an algorithm to address the optimization problem. Experiments on six datasets have demonstrated the superiority of our method. We have released the source code on https://github.com/lshi91/CSTGL-Code.



Article 954

Title@2025-07-06 (7): Normalized Iterative Hard Thresholding for Tensor Recovery

Title: Normalized Iterative Hard Thresholding for Tensor Recovery Normalisierte iterative Hard Thresholding für Tensor Erholung 恢复Tensor的标准化迭代硬推进手段 2507.04228v1

Authors (4): Li Li, Yuneng Liang, Kaijie Zheng, Jian Lu

Low-rank recovery builds upon ideas from the theory of compressive sensing, which predicts that sparse signals can be accurately reconstructed from incomplete measurements. Iterative thresholding-type algorithms, particularly the normalized iterative hard thresholding (NIHT) method, have been widely used in compressed sensing (CS) and applied to matrix recovery tasks. In this paper, we propose a tensor extension of NIHT, referred to as TNIHT, for the recovery of low-rank tensors under two widely used tensor decomposition models. This extension enables the effective reconstruction of high-order low-rank tensors from a limited number of linear measurements by leveraging the inherent low-dimensional structure of multi-way data. Specifically, we consider both the CANDECOMP/PARAFAC (CP) rank and the Tucker rank to characterize tensor low-rankness within the TNIHT framework. We also establish a convergence theorem for the proposed TNIHT method under the tensor restricted isometry property (TRIP), providing theoretical support for its recovery guarantees. Finally, we evaluate the performance of TNIHT through numerical experiments on synthetic, image, and video data, and compare it with several state-of-the-art algorithms.
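
For orientation, the classical sparse-vector NIHT iteration that the abstract extends can be sketched as follows. This is not TNIHT: it handles sparse vectors rather than low-rank tensors, and it omits the step-size safeguard used in the full method when the support changes. The function names, problem sizes, and random test problem are illustrative assumptions.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def niht(A, y, s, n_iter=300):
    """Simplified NIHT for y = A @ x, with x assumed to be s-sparse."""
    x = np.zeros(A.shape[1])
    # Initial support guess: largest entries of the back-projection A^T y.
    support = np.argsort(np.abs(A.T @ y))[-s:]
    for _ in range(n_iter):
        g = A.T @ (y - A @ x)            # negative gradient of 0.5 * ||y - A x||^2
        g_s = g[support]                 # gradient restricted to the current support
        denom = np.sum((A[:, support] @ g_s) ** 2)
        if denom == 0.0:                 # gradient vanished on the support: stop
            break
        mu = np.sum(g_s ** 2) / denom    # normalized (adaptive) step size
        x = hard_threshold(x + mu * g, s)
        support = np.flatnonzero(x)
    return x

# Illustrative test problem (sizes chosen arbitrarily for the sketch).
rng = np.random.default_rng(0)
m, n, s = 120, 200, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
idx = rng.choice(n, size=s, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], size=s) * (1.0 + rng.random(s))
y = A @ x_true
x_hat = niht(A, y, s)
```

In the tensor setting described in the abstract, the hard-thresholding step would presumably be replaced by a projection onto tensors of bounded CP or Tucker rank; that projection is not shown here.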



Article 955

Title@2025-07-06 (7): Explainable AI for Mental Health Emergency Returns: Integrating LLMs with Predictive Modeling

Title: Explainable AI for Mental Health Emergency Returns: Integrating LLMs with Predictive Modeling Erklärbare KI für psychische Gesundheit Notfallrückkehren: Integration von LLMs mit vorausschauender Modellierung 心理健康紧急返回可解释的AI:将LLMs与预测模型相结合 2502.00025v4

Authors (8): Abdulaziz Ahmed, Mohammad Saleem, Mohammed Alzeen, Badari Birur, Rachel E Fargason, Bradley G Burk, Ahmed Alhassan, Mohammed Ali Al-Garadi

Importance: Emergency department (ED) returns for mental health conditions pose a major healthcare burden, with 24-27% of patients returning within 30 days. Traditional machine learning models for predicting these returns often lack interpretability for clinical use. Objective: To assess whether integrating large language models (LLMs) with machine learning improves predictive accuracy and clinical interpretability of ED mental health return risk models. Methods: This retrospective cohort study analyzed 42,464 ED visits for 27,904 unique mental health patients at an academic medical center in the Deep South from January 2018 to December 2022. Main Outcomes and Measures: Two primary outcomes were evaluated: (1) 30-day ED return prediction accuracy and (2) model interpretability using a novel LLM-enhanced framework integrating SHAP (SHapley Additive exPlanations) values with clinical knowledge. Results: For chief complaint classification, LLaMA 3 (8B) with 10-shot learning outperformed traditional models (accuracy: 0.882, F1-score: 0.86). In SDoH classification, LLM-based models achieved 0.95 accuracy and 0.96 F1-score, with Alcohol, Tobacco, and Substance Abuse performing best (F1: 0.96-0.89), while Exercise and Home Environment showed lower performance (F1: 0.70-0.67). The LLM-based interpretability framework achieved 99% accuracy in translating model predictions into clinically relevant explanations. LLM-extracted features improved XGBoost AUC from 0.74 to 0.76 and AUC-PR from 0.58 to 0.61. Conclusions and Relevance: Integrating LLMs with machine learning models yielded modest but consistent accuracy gains while significantly enhancing interpretability through automated, clinically relevant explanations. This approach provides a framework for translating predictive analytics into actionable clinical insights.



Article 956

Title@2025-07-06 (7): Zero-Shot Cyclic Peptide Design with Composable Geometric Conditions

Title: Zero-Shot Cyclic Peptide Design with Composable Geometric Conditions Zero-Shot Cyclic Peptid Design mit komposierbaren geometrischen Bedingungen 具有可合成几何条件的零热相流式五氯苯设计 2507.04225v1

Authors (9): Dapeng Jiang, Xiangzhe Kong, Jiaqi Han, Mingyu Li, Rui Jiao, Wenbing Huang, Stefano Ermon, Jianzhu Ma, Yang Liu

Cyclic peptides, characterized by geometric constraints absent in linear peptides, offer enhanced biochemical properties, presenting new opportunities to address unmet medical needs. However, designing target-specific cyclic peptides remains underexplored due to limited training data. To bridge the gap, we propose CP-Composer, a novel generative framework that enables zero-shot cyclic peptide generation via composable geometric constraints. Our approach decomposes complex cyclization patterns into unit constraints, which are incorporated into a diffusion model through geometric conditioning on nodes and edges. During training, the model learns from unit constraints and their random combinations in linear peptides, while at inference, novel constraint combinations required for cyclization are imposed as input. Experiments show that our model, despite being trained on linear peptides, is capable of generating diverse target-binding cyclic peptides, reaching success rates from 38% to 84% on different cyclization strategies.



Article 957

Title@2025-07-06 (7): Interface Laplace Learning: Learnable Interface Term Helps Semi-Supervised Learning

Title: Interface Laplace Learning: Learnable Interface Term Helps Semi-Supervised Learning Interface Laplace Learning: Erlernbarer Interface Term hilft semi-überwachtes Lernen 界面 Laplace 学习: 可学习的界面定期帮助半监督学习 2408.05419v2

Authors (3): Tangjun Wang, Chenglong Bao, Zuoqiang Shi

We introduce a novel framework, called Interface Laplace learning, for graph-based semi-supervised learning. Motivated by the observation that an interface should exist between different classes where the function value is non-smooth, we introduce a Laplace learning model that incorporates an interface term. This model challenges the long-standing assumption that functions are smooth at all unlabeled points. In the proposed approach, we add an interface term to the Laplace learning model at the interface positions. We provide a practical algorithm to approximate the interface positions using k-hop neighborhood indices, and to learn the interface term from labeled data without artificial design. Our method is efficient and effective, and we present extensive experiments demonstrating that Interface Laplace learning achieves better performance than other recent semi-supervised learning approaches at extremely low label rates on the MNIST, FashionMNIST, and CIFAR-10 datasets.



Article 958

Title@2025-07-06 (7): Context Tuning for In-Context Optimization

Title: Context Tuning for In-Context Optimization Kontext-Tuning für In-Context-Optimierung Intext 优化的上下文图示 2507.04221v1

Authors (4): Jack Lu, Ryan Teehan, Zhenbang Yang, Mengye Ren

We introduce Context Tuning, a simple and effective method to significantly enhance few-shot adaptation of large language models (LLMs) without fine-tuning model parameters. While prompt-based adaptation techniques have demonstrated the effectiveness of lightweight adaptation methods for LLMs, they typically initialize a trainable prompt or prefix with tokens that are irrelevant to the task at hand. In contrast, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples, leveraging the model’s inherent In-Context Learning (ICL) ability to extract relevant information for improved few-shot learning performance. Extensive evaluations on benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC demonstrate that Context Tuning outperforms traditional prompt-based adaptation methods and achieves accuracy competitive with Test-Time Training at significantly higher training efficiency.



Article 959

Title@2025-07-06 (7): Particle Trajectory Representation Learning with Masked Point Modeling

Title: Particle Trajectory Representation Learning with Masked Point Modeling Partikel-Trajektoriendarstellung Lernen mit maskierten Punktmodellierung 使用蒙面点建模模型的粒子轨迹代表学习 2502.02558v3

Authors (3): Sam Young, Yeon-jae Jwa, Kazuhiro Terao

Effective self-supervised learning (SSL) techniques have been key to unlocking large datasets for representation learning. While many promising methods have been developed using online corpora and captioned photographs, their application to scientific domains, where data encodes highly specialized knowledge, remains a challenge. Liquid Argon Time Projection Chambers (LArTPCs) provide high-resolution 3D imaging for fundamental physics, but analysis of their sparse, complex point cloud data often relies on supervised methods trained on large simulations, introducing potential biases. We introduce the Point-based Liquid Argon Masked Autoencoder (PoLAr-MAE), applying masked point modeling to unlabeled LArTPC images using domain-specific volumetric tokenization and energy prediction. We show this SSL approach learns physically meaningful trajectory representations directly from data. This yields remarkable data efficiency: fine-tuning on just 100 labeled events achieves track/shower semantic segmentation performance comparable to the state-of-the-art supervised baseline trained on more than 100,000 events. Furthermore, internal attention maps exhibit emergent instance segmentation of particle trajectories. While challenges remain, particularly for fine-grained features, we make concrete SSL’s potential for building a foundation model for LArTPC image analysis capable of serving as a common base for all data reconstruction tasks. To facilitate further progress, we release PILArNet-M, a large dataset of 1M LArTPC events. Project site: https://youngsm.com/polarmae.



Article 960

Title@2025-07-06 (7): Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

Title: Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs Modellkollaps ist kein Fehler, sondern ein Feature in Machine Unlearning für LLMs 模型折叠不是臭虫,而是机器为 LLM 取消学习的特写 2507.04219v1

Authors (4): Yan Scholten, Sophie Xhonneux, Stephan Günnemann, Leo Schwinn

Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their training objectives. We argue this not only risks reinforcing exposure to sensitive data but also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method, Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from the model. Our core idea is to leverage this collapse for unlearning by triggering collapse partially on the sensitive data. We show theoretically that our approach converges to the desired outcome, i.e., the LLM unlearns the information in the forget set. We empirically demonstrate that PMC overcomes two key limitations of existing unlearning approaches that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs. Overall, our contributions represent an important step toward more comprehensive unlearning that aligns with real-world privacy constraints. Code available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.



Article 961

Title@2025-07-06 (7): CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation

Title: CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation CRISP-SAM2: SAM2 mit Cross-Modal Interaction und semantischer Prompting für Multi-Organ Segmentierung CRIISP-SAM2:SAM2 具有跨模式相互作用和跨组织分解的语义提示的SAM2 2506.23121v2

Authors (8): Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge

Multi-organ medical segmentation is a crucial component of medical image processing, essential for doctors to make accurate diagnoses and develop effective treatment plans. Despite significant progress in this field, current multi-organ segmentation models often suffer from inaccurate details, dependence on geometric prompts, and loss of spatial information. Addressing these challenges, we introduce a novel model named CRISP-SAM2 with CRoss-modal Interaction and Semantic Prompting based on SAM2. This model represents a promising approach to multi-organ medical segmentation guided by textual descriptions of organs. Our method begins by converting visual and textual inputs into cross-modal contextualized semantics using a progressive cross-attention interaction mechanism. These semantics are then injected into the image encoder to enhance the detailed understanding of visual information. To eliminate reliance on geometric prompts, we use a semantic prompting strategy, replacing the original prompt encoder to sharpen the perception of challenging targets. In addition, a similarity-sorting self-updating strategy for memory and a mask-refining process are applied to further adapt to medical imaging and enhance localized details. Comparative experiments conducted on seven public datasets indicate that CRISP-SAM2 outperforms existing models. Extensive analysis also demonstrates the effectiveness of our method, especially in addressing the limitations mentioned earlier. Our code is available at: https://github.com/YU-deep/CRISP_SAM2.git.



Article 962

Title@2025-07-06 (7): Confidence Intervals for Evaluation of Data Mining

Title: Confidence Intervals for Evaluation of Data Mining Vertrauensintervalle für die Evaluation von Data Mining 评价数据采矿的互信评价 2502.07016v2

Authors (2): Zheng Yuan, Wenxin Jiang

In data mining, when binary prediction rules are used to predict a binary outcome, many performance measures are used across the literature for evaluation and comparison. Some examples include classification accuracy, precision, recall, F measures, and the Jaccard index. Typically, these performance measures are only approximately estimated from a finite dataset, which may lead to findings that are not statistically significant. In order to properly quantify such statistical uncertainty, it is important to provide confidence intervals associated with these estimated performance measures. We consider statistical inference about general performance measures used in data mining, with both individual and joint confidence intervals. These confidence intervals are based on asymptotic normal approximations and can be computed quickly, without the need for bootstrap resampling. We study the finite sample coverage probabilities for these confidence intervals and also propose a ‘blurring correction’ on the variance to improve the finite sample performance. This ‘blurring correction’ generalizes the plus-four method from binomial proportion to general performance measures used in data mining. Our framework allows multiple performance measures of multiple classification rules to be inferred simultaneously for comparisons.
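
As background for the plus-four method mentioned above, here is a minimal sketch of the two textbook intervals for a single binomial proportion such as classification accuracy: the plain normal-approximation (Wald) interval and the plus-four interval, which adds two successes and two failures before applying the same formula. It does not implement the paper's blurring correction, and the function names and example counts are illustrative assumptions.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def wald_interval(successes, n, z=Z_95):
    """Normal-approximation interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def plus_four_interval(successes, n, z=Z_95):
    """Plus-four interval: add 2 successes and 2 failures, then apply the Wald formula."""
    p = (successes + 2) / (n + 4)
    half = z * math.sqrt(p * (1 - p) / (n + 4))
    return p - half, p + half

# Example: a classifier that gets 90 of 100 test items right.
wald = wald_interval(90, 100)
plus4 = plus_four_interval(90, 100)

# Edge case: with 100 of 100 correct the Wald interval has zero width,
# while the plus-four interval still reports some uncertainty.
wald_perfect = wald_interval(100, 100)
plus4_perfect = plus_four_interval(100, 100)
```

The edge case illustrates why a small-sample correction to the variance matters: the uncorrected interval can collapse to a single point even though the estimate is uncertain.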



Article 963

Title@2025-07-06 (7): Backdooring Bias ($B^2$) into Stable Diffusion Models

Title: Backdooring Bias ($B^2$) into Stable Diffusion Models Hintertürige Bias ($B^2$) in stabile Diffusionsmodelle 后门Bisas (B2美元) 进入稳定扩散模型 2406.15213v4

Authors (4): Ali Naseh, Jaechul Roh, Eugene Bagdasarian, Amir Houmansadr

Recent advances in large text-conditional diffusion models have revolutionized image generation by enabling users to create realistic, high-quality images from textual prompts, significantly enhancing artistic creation and visual communication. However, these advancements also introduce an underexplored attack opportunity: the possibility that an adversary induces biases in the generated images for malicious purposes, e.g., to influence public opinion and spread propaganda. In this paper, we study an attack vector that allows an adversary to inject arbitrary bias into a target model. The attack leverages low-cost backdooring techniques using a targeted set of natural textual triggers embedded within a small number of malicious data samples produced with public generative models. An adversary could pick common sequences of words that can then be inadvertently activated by benign users during inference. We investigate the feasibility and challenges of such attacks, demonstrating how modern generative models have made this adversarial process both easier and more adaptable. On the other hand, we explore various aspects of the detectability of such attacks and demonstrate that the model’s utility remains intact in the absence of the triggers. Our extensive experiments using over 200,000 generated images and against hundreds of fine-tuned models demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases maintain strong text-image alignment, highlighting the challenges in detecting biased images without knowing that bias in advance. Our cost analysis confirms the low financial barrier ($10-$15) to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in diffusion models.



Article 964

Title@2025-07-06 (7): An explicit formulation of the learned noise predictor $ε_θ({\bf x}_t, t)$ via the forward-process noise $ε_t$ in denoising diffusion probabilistic models (DDPMs)

Title: An explicit formulation of the learned noise predictor $ε_θ({\bf x}_t, t)$ via the forward-process noise $ε_t$ in denoising diffusion probabilistic models (DDPMs) Eine explizite Formulierung des erlernten Noise Predictors $ε_θ({\bf x}_t, t)$ über das Forward-Process-Noise $ε_t$ in denoising diffusion probabilistic models (DDPMs) 通过在排除扩散概率模型(DDPMs)中的前处理噪声,明确配制所学的噪音预测器$ε_θ({\bf x}_t, t)$的清晰配方 2507.04203v1

Authors (1): KiHyun Yun

In denoising diffusion probabilistic models (DDPMs), the learned noise predictor $\epsilon_{\theta}({\bf x}_t, t)$ is trained to approximate the forward-process noise $\epsilon_t$. The equality $\nabla_{{\bf x}_t} \log q({\bf x}_t) = -\frac{1}{\sqrt{1-\bar{\alpha}_t}} \epsilon_{\theta}({\bf x}_t, t)$ plays a fundamental role in both theoretical analyses and algorithmic design, and thus is frequently employed across diffusion-based generative models. In this paper, an explicit formulation of $\epsilon_{\theta}({\bf x}_t, t)$ in terms of the forward-process noise $\epsilon_t$ is derived. This result shows how the forward-process noise $\epsilon_t$ contributes to the learned predictor $\epsilon_{\theta}({\bf x}_t, t)$. Furthermore, based on this formulation, we present a novel and mathematically rigorous proof of the fundamental equality above, clarifying its origin and providing new theoretical insight into the structure of diffusion models.
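
For context, the usual textbook route to this equality is sketched below. It is the standard argument via the conditional score, not the paper's new proof, and it assumes the common DDPM forward process with $\epsilon_t \sim \mathcal{N}(0, I)$ and an ideal (mean-squared-error optimal) predictor.

```latex
% Forward process (standard DDPM assumption):
{\bf x}_t = \sqrt{\bar{\alpha}_t}\,{\bf x}_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon_t,
\qquad \epsilon_t \sim \mathcal{N}(0, I).

% Score of the Gaussian conditional q({\bf x}_t \mid {\bf x}_0):
\nabla_{{\bf x}_t} \log q({\bf x}_t \mid {\bf x}_0)
  = -\frac{{\bf x}_t - \sqrt{\bar{\alpha}_t}\,{\bf x}_0}{1-\bar{\alpha}_t}
  = -\frac{1}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_t.

% The marginal score is the posterior average of the conditional score:
\nabla_{{\bf x}_t} \log q({\bf x}_t)
  = \mathbb{E}\!\left[\nabla_{{\bf x}_t} \log q({\bf x}_t \mid {\bf x}_0) \,\middle|\, {\bf x}_t\right]
  = -\frac{1}{\sqrt{1-\bar{\alpha}_t}}\,\mathbb{E}\!\left[\epsilon_t \mid {\bf x}_t\right].

% An MSE-optimal predictor satisfies
% \epsilon_{\theta}({\bf x}_t, t) = \mathbb{E}[\epsilon_t \mid {\bf x}_t],
% which gives the stated equality.
```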



Article 965

Title@2025-07-06 (7): Predicting Air Pollution in Cork, Ireland Using Machine Learning

Title: Predicting Air Pollution in Cork, Ireland Using Machine Learning Vorhersage der Luftverschmutzung in Cork, Irland durch maschinelles Lernen 利用机器学习预测爱尔兰科克的空气污染 2507.04196v1

Authors (6): Md Rashidunnabi, Fahmida Faiza Ananna, Kailash Hambarde, Bruno Gabriel Nascimento Andrade, Dean Venables, Hugo Proenca

Air pollution poses a critical health threat in cities worldwide, with nitrogen dioxide levels in Cork, Ireland exceeding World Health Organization safety standards by up to $278\%$. This study leverages artificial intelligence to predict air pollution with unprecedented accuracy, analyzing nearly ten years of data from five monitoring stations combined with 30 years of weather records. We evaluated 17 machine learning algorithms, with Extra Trees emerging as the optimal solution, achieving $77\%$ prediction accuracy and significantly outperforming traditional forecasting methods. Our analysis reveals that meteorological conditions, particularly temperature, wind speed, and humidity, are the primary drivers of pollution levels, while traffic patterns and seasonal changes create predictable pollution cycles. Pollution exhibits dramatic seasonal variations, with winter levels nearly double those of summer, and daily rush-hour peaks reaching $120\%$ above normal levels. While Cork’s air quality shows concerning violations of global health standards, our models detected an encouraging $31\%$ improvement from 2014 to 2022. This research demonstrates that intelligent forecasting systems can provide city planners and environmental officials with powerful prediction tools, enabling life-saving early warning systems and informed urban planning decisions. The technology exists today to transform urban air quality management. All research materials and code are freely available at: https://github.com/MdRashidunnabi/Air-Pollution-Analysis.git



Article 966

Title@2025-07-06 (7): Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning

Title: Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning Mixed-Sample SGD: Eine End-to-End-Analyse des Supervised Transfer Learning 混合抽样 SGD:监督转移学习的端到端分析 2507.04194v1

Authors (2): Yuyang Deng, Samory Kpotufe

Theoretical works on supervised transfer learning (STL), where the learner has access to labeled samples from both source and target distributions, have for the most part focused on statistical aspects of the problem, while efficient optimization has received less attention. We consider the problem of designing an SGD procedure for STL that alternates sampling between source and target data, while maintaining statistical transfer guarantees without prior knowledge of the quality of the source data. A main algorithmic difficulty is in understanding how to design such an adaptive sub-sampling mechanism at each SGD step, to automatically gain from the source when it is informative, or bias towards the target and avoid negative transfer when the source is less informative. We show that such a mixed-sample SGD procedure is feasible for general prediction tasks with convex losses, rooted in tracking an abstract sequence of constrained convex programs that serve to maintain the desired transfer guarantees. We instantiate these results in the concrete setting of linear regression with square loss, and show that the procedure converges, at a $1/\sqrt{T}$ rate, to a solution whose statistical performance on the target is adaptive to the a priori unknown quality of the source. Experiments with synthetic and real datasets support the theory.



Article 967

Title@2025-07-06 (7): JAX-MPM: A Learning-Augmented Differentiable Meshfree Framework for GPU-Accelerated Lagrangian Simulation and Geophysical Inverse Modeling

Title: JAX-MPM: A Learning-Augmented Differentiable Meshfree Framework for GPU-Accelerated Lagrangian Simulation and Geophysical Inverse Modeling JAX-MPM: Ein Learning-Augmented Differentiable Meshfree Framework für GPU-beschleunigte lagrangianische Simulation und geophysikalische Inverse Modellierung JAX-MPM:GPU-加速拉格朗日模拟和地球物理反建模的学习增强型差异网状框架 2507.04192v1

Authors (2): Honghui Du, QiZhi He

Differentiable programming that enables automatic differentiation through simulation pipelines has emerged as a powerful paradigm in scientific computing, supporting both forward and inverse modeling and facilitating integration with deep learning frameworks. We present JAX-MPM, a general-purpose differentiable meshfree solver within a hybrid Lagrangian-Eulerian framework, tailored for simulating complex continuum mechanics involving large deformations, frictional contact, and inelastic material behavior, with emphasis on geomechanics and geophysical hazard applications. Built on the material point method (MPM) and implemented using the JAX computing framework, JAX-MPM is fully differentiable and GPU-accelerated, enabling efficient gradient-based optimization directly through time-stepping solvers. It supports joint training of physical models and neural networks, allowing the learning of embedded closures and neural constitutive models. We validate JAX-MPM on several 2D and 3D benchmarks, including dam-breaks and granular collapses, demonstrating its accuracy and performance. A high-resolution 3D granular cylinder collapse with 2.7 million particles completes 1000 steps in ~22 seconds (single precision) and ~98 seconds (double precision) on a single GPU. Beyond forward modeling, we demonstrate inverse modeling capabilities such as velocity field reconstruction and spatially varying friction estimation. These results establish JAX-MPM as a unified, scalable platform for differentiable meshfree simulation and data-driven geomechanical inference.



Article 968

Title@2025-07-05 (6): Where to Intervene: Action Selection in Deep Reinforcement Learning

Title: Where to Intervene: Action Selection in Deep Reinforcement Learning Wo man interveniert: Aktionsauswahl im Deep Reinforcement Learning 在哪里干预:深强化学习中的行动选择 2507.04187v1

Authors (2): Wenbo Zhang, Hengrui Cai

Deep reinforcement learning (RL) has gained widespread adoption in recent years but faces significant challenges, particularly in unknown and complex environments. Among these, high-dimensional action selection stands out as a critical problem. Existing works often require a sophisticated prior design to eliminate redundancy in the action space, relying heavily on domain expert experience or involving high computational complexity, which limits their generalizability across different RL tasks. In this paper, we address these challenges by proposing a general data-driven action selection approach with model-free and computationally friendly properties. Our method not only selects minimal sufficient actions but also controls the false discovery rate via knockoff sampling. More importantly, we seamlessly integrate the action selection into deep RL methods during online training. Empirical experiments validate the established theoretical guarantees, demonstrating that our method surpasses various alternative techniques in terms of both performance in variable selection and overall achieved rewards.



Article 969

Title@2025-07-05 (6): Bayesian Multiobject Tracking With Neural-Enhanced Motion and Measurement Models

Title: Bayesian Multiobject Tracking With Neural-Enhanced Motion and Measurement Models Bayesian Multiobject Tracking mit neural-erweiterten Bewegungs- und Messmodellen Bayesian 多功能物体跟踪,以神经强化机动和测量模型跟踪 2506.18124v2

Authors (3): Shaoxiu Wei, Mingchao Liang, Florian Meyer

Multiobject tracking (MOT) is an important task in applications including autonomous driving, ocean sciences, and aerospace surveillance. Traditional MOT methods are model-based and combine sequential Bayesian estimation with data association and an object birth model. More recent methods are fully data-driven and rely on the training of neural networks. Both approaches offer distinct advantages in specific settings. In particular, model-based methods are generally applicable across a wide range of scenarios, whereas data-driven MOT achieves superior performance in scenarios where abundant labeled data for training is available. A natural question is whether a general framework can integrate the two approaches. This paper introduces a hybrid method that utilizes neural networks to enhance specific aspects of the statistical model in Bayesian MOT that have been identified as overly simplistic. By doing so, the performance of the prediction and update steps of Bayesian MOT is improved. To ensure tractable computation, our framework uses belief propagation to avoid high-dimensional operations combined with sequential Monte Carlo methods to perform low-dimensional operations efficiently. The resulting method combines the flexibility and robustness of model-based approaches with the capability of neural networks to learn complex information from data. We evaluate the performance of the proposed method based on the nuScenes autonomous driving dataset and demonstrate that it has state-of-the-art performance.



Article 970

Title@2025-07-05 (6): AASeg: Attention Aware Network for Real Time Semantic Segmentation

Title: AASeg: Attention Aware Network for Real Time Semantic Segmentation AASeg: Achtung im Netzwerk für Echtzeit-Semantik-Segmentierung AASeg:实时语义区段注意网络 2108.04349v4

Authors (1): Abhinav Sagar

Semantic segmentation is a fundamental task in computer vision that involves dense pixel-wise classification for scene understanding. Despite significant progress, achieving high accuracy while maintaining real-time performance remains a challenging trade-off, particularly for deployment in resource-constrained or latency-sensitive applications. In this paper, we propose AASeg, a novel Attention-Aware Network for real-time semantic segmentation. AASeg effectively captures both spatial and channel-wise dependencies through lightweight Spatial Attention (SA) and Channel Attention (CA) modules, enabling enhanced feature discrimination without incurring significant computational overhead. To enrich contextual representation, we introduce a Multi-Scale Context (MSC) module that aggregates dense local features across multiple receptive fields. The outputs from attention and context modules are adaptively fused to produce high-resolution segmentation maps. Extensive experiments on Cityscapes, ADE20K, and CamVid demonstrate that AASeg achieves a compelling trade-off between accuracy and efficiency, outperforming prior real-time methods.



Article 971

Title@2025-07-05 (6): Graded Neural Networks

Title: Graded Neural Networks Eingestufte neurale Netze 神经网络 2502.17751v2

Authors (1): Tony Shaska

This paper presents a novel framework for graded neural networks (GNNs) built over graded vector spaces $\mathcal{V}_{\mathbf{w}}^n$, extending classical neural architectures by incorporating algebraic grading. Leveraging a coordinate-wise grading structure with scalar action $\lambda \star \mathbf{x} = (\lambda^{q_i} x_i)$, defined by a tuple $\mathbf{w} = (q_0, \ldots, q_{n-1})$, we introduce graded neurons, layers, activation functions, and loss functions that adapt to feature significance. Theoretical properties of graded spaces are established, followed by a comprehensive GNN design, addressing computational challenges like numerical stability and gradient scaling. Potential applications span machine learning and photonic systems, exemplified by high-speed laser-based implementations. This work offers a foundational step toward graded computation, unifying mathematical rigor with practical potential, with avenues for future empirical and hardware exploration.
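
The graded scalar action defined in the abstract can be illustrated with a few lines of NumPy. This is only a sketch of the coordinate-wise formula $\lambda^{q_i} x_i$; the function name `graded_action` and the example weights are assumptions, and the sketch restricts itself to positive real scalars.

```python
import numpy as np

def graded_action(lam, x, weights):
    """Apply the graded scalar action: coordinate i is scaled by lam ** q_i.

    Assumes lam > 0 so that lam ** q_i is well defined for any real weight q_i.
    """
    x = np.asarray(x, dtype=float)
    q = np.asarray(weights, dtype=float)
    return (lam ** q) * x

w = (1, 2, 3)                      # hypothetical grading weights (q_0, q_1, q_2)
x = np.array([1.0, 1.0, 1.0])
y = graded_action(2.0, x, w)       # coordinates scaled by 2**1, 2**2, 2**3

# The action is compatible with multiplication of scalars:
# acting by 6 equals acting by 3 and then by 2.
lhs = graded_action(6.0, x, w)
rhs = graded_action(2.0, graded_action(3.0, x, w), w)
```

With all weights equal to 1 the action reduces to ordinary scalar multiplication, which is the sense in which graded spaces extend the classical setting.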



Article 972

Title@2025-07-05 (6): Uncertainty Quantification in the Tsetlin Machine

Title: Uncertainty Quantification in the Tsetlin Machine Unsicherheit Quantifizierung in der Tsetlin Maschine Tsetlin 机器的不确定性量化 2507.04175v1

Authors (8): Runar Helin, Ole-Christoffer Granmo, Mayur Kishor Shende, Lei Jiao, Vladimir I. Zadorozhny, Kunal Ganesh Dumbre, Rishad Shafik, Alex Yakovlev

Data modeling using Tsetlin machines (TMs) is all about building logical rules from the data features. The decisions of the model are based on a combination of these logical rules. Hence, the model is fully transparent and it is possible to get explanations of its predictions. In this paper, we present a probability score for TM predictions and develop new techniques for uncertainty quantification to increase the explainability further. The probability score is an inherent property of any TM variant and is derived through an analysis of the TM learning dynamics. Simulated data is used to show a clear connection between the learned TM probability scores and the underlying probabilities of the data. A visualization of the probability scores also reveals that the TM is less confident in its predictions outside the training data domain, which contrasts with the typical extrapolation phenomenon found in Artificial Neural Networks. The paper concludes with an application of the uncertainty quantification techniques on an image classification task using the CIFAR-10 dataset, where they provide new insights and suggest possible improvements to current TM image classification models.



Article 973

Title@2025-07-05 (6): Structure As Search: Unsupervised Permutation Learning for Combinatorial Optimization

Title: Structure As Search: Unsupervised Permutation Learning for Combinatorial Optimization Struktur als Suche: Unüberwachtes Permutationslernen für kombinatorische Optimierung 结构搜索: 用于组合优化的未经监督的变异学习 2507.04164v1

Authors (2): Yimeng Min, Carla P. Gomes

We propose a non-autoregressive framework for the Travelling Salesman Problem where solutions emerge directly from learned permutations without explicit search. By applying a similarity transformation to Hamiltonian cycles, the model learns to approximate permutation matrices via continuous relaxations. Our unsupervised approach achieves competitive performance against classical heuristics, demonstrating that the inherent structure of the problem can effectively guide combinatorial optimization without sequential decision-making.



Article 974

Title@2025-07-05 (6): Enhancing Learning Path Recommendation via Multi-task Learning

Title: Enhancing Learning Path Recommendation via Multi-task Learning Verbesserung der Empfehlung über Lernpfade durch Multi-Task-Learning 通过多任务学习加强学习途径建议 2507.05295v1

Authors (4): Afsana Nasrin, Lijun Qian, Pamela Obiomon, Xishuang Dong

Personalized learning is a student-centered educational approach that adapts content, pace, and assessment to meet each learner’s unique needs. As the key technique for implementing personalized learning, learning path recommendation sequentially recommends personalized learning items such as lectures and exercises. Advances in deep learning, particularly deep reinforcement learning, have made modeling such recommendations more practical and effective. This paper proposes a multi-task LSTM model that enhances learning path recommendation by leveraging shared information across tasks. The approach reframes learning path recommendation as a sequence-to-sequence (Seq2Seq) prediction problem, generating personalized learning paths from a learner’s historical interactions. The model uses a shared LSTM layer to capture common features for both learning path recommendation and deep knowledge tracing, along with task-specific LSTM layers for each objective. To avoid redundant recommendations, a non-repeat loss penalizes repeated items within the recommended learning path. Experiments on the ASSIST09 dataset show that the proposed model significantly outperforms baseline methods for learning path recommendation.



Article 975

Title@2025-07-05 (6): SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network

Title: SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network SEZ-HARN: Selbsterklärbares Null-Shot-Netzwerk für die Anerkennung menschlicher Aktivität SEZ-HARN: 自爆零弹人活动识别网 2507.00050v2

Authors (4): Devin Y. De Silva, Sandareka Wickramanayake, Dulani Meedeniya, Sanka Rasnayaka

Human Activity Recognition (HAR), which uses data from Inertial Measurement Unit (IMU) sensors, has many practical applications in healthcare and assisted living environments. However, its use in real-world scenarios has been limited by the lack of comprehensive IMU-based HAR datasets that cover a wide range of activities and the lack of transparency in existing HAR models. Zero-shot HAR (ZS-HAR) overcomes the data limitations, but current models struggle to explain their decisions, making them less transparent. This paper introduces a novel IMU-based ZS-HAR model called the Self-Explainable Zero-shot Human Activity Recognition Network (SEZ-HARN). It can recognize activities not encountered during training and provide skeleton videos to explain its decision-making process. We evaluate the effectiveness of the proposed SEZ-HARN on four benchmark datasets (PAMAP2, DaLiAc, HTD-MHAD, and MHealth) and compare its performance against three state-of-the-art black-box ZS-HAR models. The experimental results demonstrate that SEZ-HARN produces realistic and understandable explanations while achieving competitive zero-shot recognition accuracy. SEZ-HARN achieves a zero-shot prediction accuracy within 3% of the best-performing black-box model on PAMAP2 while maintaining comparable performance on the other three datasets.



Article 976

Title@2025-07-05 (6): Geometric Deep Learning for Computer-Aided Design: A Survey

Title: Geometric Deep Learning for Computer-Aided Design: A Survey Geometrisches Deep Learning für computergestütztes Design: Eine Umfrage 计算机辅助设计深深深学习:调查 2402.17695v2

Authors (2): Negar Heidari, Alexandros Iosifidis

Geometric Deep Learning techniques have become a transformative force in the field of Computer-Aided Design (CAD), and have the potential to revolutionize how designers and engineers approach and enhance the design process. By harnessing the power of machine learning-based methods, CAD designers can optimize their workflows, save time and effort while making better informed decisions, and create designs that are both innovative and practical. The ability to process the CAD designs represented by geometric data and to analyze their encoded features enables the identification of similarities among diverse CAD models, the proposition of alternative designs and enhancements, and even the generation of novel design alternatives. This survey offers a comprehensive overview of learning-based methods in computer-aided design across various categories, including similarity analysis and retrieval, 2D and 3D CAD model synthesis, and CAD generation from point clouds and single/multi-view images. Additionally, it provides a complete list of benchmark datasets and their characteristics, along with open-source codes that have propelled research in this domain. The final discussion covers the prevalent challenges, followed by potential future research directions in this rapidly evolving field.



Article 977

Title@2025-07-05 (6): Physics-informed neural networks and neural operators for a study of EUV electromagnetic wave diffraction from a lithography mask

Title: Physics-informed neural networks and neural operators for a study of EUV electromagnetic wave diffraction from a lithography mask Physik-informierte neuronale Netzwerke und neuronale Operatoren für eine Untersuchung der elektromagnetischen Wellenbeugung von EUV aus einer Lithographiemaske 物理知情神经网络和神经操作器,用于研究欧洲紫外线电磁波从平面印刷面罩的碎片 2507.04153v1

Authors (2): Vasiliy A. Es’kin, Egor V. Ivanov

Physics-informed neural networks (PINNs) and neural operators (NOs) for solving the problem of diffraction of Extreme Ultraviolet (EUV) electromagnetic waves from a mask are presented. A novel hybrid Waveguide Neural Operator (WGNO) is introduced, which is based on a waveguide method with its most computationally expensive part replaced by a neural network. Numerical experiments on realistic 2D and 3D masks show that the WGNO achieves state-of-the-art accuracy and inference time, providing a highly efficient solution for accelerating the design workflows of lithography masks.



Article 978

Title@2025-07-05 (6): Pedestrian Intention Prediction via Vision-Language Foundation Models

Title: Pedestrian Intention Prediction via Vision-Language Foundation Models Fußgänger-Intentionsvorhersage über Vision-Language-Stiftungsmodelle 通过视觉语言基金会模型预测行人意图 2507.04141v1

Authors (3): Mohsen Azarmi, Mahdi Rezaei, He Wang

Prediction of pedestrian crossing intention is a critical function in autonomous vehicles. Conventional vision-based methods of crossing intention prediction often struggle with generalizability, context understanding, and causal reasoning. This study explores the potential of vision-language foundation models (VLFMs) for predicting pedestrian crossing intentions by integrating multimodal data through hierarchical prompt templates. The methodology incorporates contextual information, including visual frames, physical cue observations, and ego-vehicle dynamics, into systematically refined prompts to guide VLFMs effectively in intention prediction. Experiments were conducted on three common datasets: JAAD, PIE, and FU-PIP. Results demonstrate that incorporating vehicle speed, its variations over time, and time-conscious prompts significantly enhances the prediction accuracy by up to 19.8%. Additionally, optimised prompts generated via an automatic prompt engineering framework yielded 12.5% further accuracy gains. These findings highlight the superior performance of VLFMs compared to conventional vision-based models, offering enhanced generalisation and contextual understanding for autonomous driving applications.



Article 979

Title@2025-07-05 (6): Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles

Title: Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles Driver-Net: Multi-Kamera Fusion für die Bewertung von Driver Take-Over Readiness in Automated Vehicles 司机-网络:评估自动车辆驾驶员准备就绪情况评估驱动驾驶员采用自动车辆多车队合并 2507.04139v1

Authors (2): Mahdi Rezaei, Mohsen Azarmi

Ensuring safe transition of control in automated vehicles requires an accurate and timely assessment of driver readiness. This paper introduces Driver-Net, a novel deep learning framework that fuses multi-camera inputs to estimate driver take-over readiness. Unlike conventional vision-based driver monitoring systems that focus on head pose or eye gaze, Driver-Net captures synchronised visual cues from the driver’s head, hands, and body posture through a triple-camera setup. The model integrates spatio-temporal data using a dual-path architecture, comprising a Context Block and a Feature Block, followed by a cross-modal fusion strategy to enhance prediction accuracy. Evaluated on a diverse dataset collected from the University of Leeds Driving Simulator, the proposed method achieves an accuracy of up to 95.8% in driver readiness classification. This performance significantly improves on existing approaches and highlights the importance of multimodal and multi-view fusion. As a real-time, non-intrusive solution, Driver-Net contributes meaningfully to the development of safer and more reliable automated vehicles and aligns with new regulatory mandates and upcoming safety standards.



Article 980

Title@2025-07-05 (6): Token Level Hallucination Detection via Variance in Language Models

Title: Token Level Hallucination Detection via Variance in Language Models Token Level Halluzination Erkennung durch Varianz in Sprachmodellen 通过语言模式差异探测洞穴级幻觉 2507.04137v1

Authors (1): Keshav Kumar

Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations: confidently generated yet factually incorrect outputs. We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple stochastic generations. Unlike prior methods that require ground-truth references or sentence-level verification, our approach is model-agnostic, interpretable, and suited for real-time or post-hoc analysis. We evaluate our method on unanswerable question prompts from the SQuAD v2 dataset and benchmark across three autoregressive models of varying scales: GPT-Neo 125M, Falcon 1B, and Mistral 7B. Through both quantitative metrics and visual diagnostics, we show that token-level variance reliably highlights instability in model outputs and correlates with hallucination patterns. Our framework is lightweight, reproducible, and adaptable to multiple domains, offering a valuable diagnostic tool for analyzing generative reliability in LLMs.
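The variance signal described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-token log-probabilities are taken as already extracted from several stochastic generations, tokens are aligned by position, and runs are truncated to the shortest one. The function names and the fixed threshold are hypothetical and not from the paper.

```python
from statistics import pvariance


def token_logprob_variance(logprobs_per_run):
    """Per-position variance of token log-probabilities across runs.

    `logprobs_per_run` is a list of runs; each run is a list of per-token
    log-probabilities. Positional alignment is an assumption of this sketch.
    """
    n_tokens = min(len(run) for run in logprobs_per_run)
    return [pvariance([run[t] for run in logprobs_per_run]) for t in range(n_tokens)]


def flag_unstable_tokens(variances, threshold):
    """Return the positions whose variance exceeds a (hypothetical) threshold."""
    return [i for i, v in enumerate(variances) if v > threshold]


# Three sampled generations, two token positions each.
runs = [[-0.1, -2.0], [-0.1, -0.5], [-0.1, -3.5]]
variances = token_logprob_variance(runs)
unstable = flag_unstable_tokens(variances, threshold=1.0)
```

In this toy input the first position is stable across runs and the second is not, so only the second position is flagged.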



Article 981

Title@2025-07-05 (6): Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment

Title: Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment Multi-Preference Lambda-bewertet Listwise DPO für Dynamic Preference Alignment 多首选项 Lambda 加权列表 DPO 动态首选项一致 2506.19780v3

Authors (4): Yuhui Sun, Xiyao Wang, Zixi Li, Jinman Zhao

While large-scale unsupervised language models (LMs) capture broad world knowledge and reasoning capabilities, steering their behavior toward desired objectives remains challenging due to the lack of explicit supervision. Existing alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on training a reward model and performing reinforcement learning to align with human preferences. However, RLHF is often computationally intensive, unstable, and sensitive to hyperparameters. To address these limitations, Direct Preference Optimization (DPO) was introduced as a lightweight and stable alternative, enabling direct alignment of language models with pairwise preference data via classification loss. However, DPO and its extensions generally assume a single static preference distribution, limiting flexibility in multi-objective or dynamic alignment settings. In this paper, we propose a novel framework: Multi-Preference Lambda-weighted Listwise DPO, which extends DPO to incorporate multiple human preference dimensions (e.g., helpfulness, harmlessness, informativeness) and enables dynamic interpolation through a controllable simplex-weighted formulation. Our method supports both listwise preference feedback and flexible alignment across varying user intents without re-training. Empirical and theoretical analysis demonstrates that our method is as effective as traditional DPO on static objectives while offering greater generality and adaptability for real-world deployment.
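One way to picture the simplex-weighted interpolation mentioned above is as a convex combination of per-dimension preference distributions over the same list of candidate responses. The sketch below assumes that reading. The function name `mix_preferences` and the input layout are hypothetical, and the paper's actual objective may differ.

```python
import math


def mix_preferences(lam, per_dim_prefs):
    """Convex combination of preference distributions, one per dimension.

    `lam` must lie on the probability simplex (non-negative, summing to 1).
    Each entry of `per_dim_prefs` is a distribution over the same responses.
    """
    if len(lam) != len(per_dim_prefs):
        raise ValueError("need one weight per preference dimension")
    if any(w < 0 for w in lam) or not math.isclose(sum(lam), 1.0):
        raise ValueError("lam must be non-negative and sum to 1")
    n = len(per_dim_prefs[0])
    return [sum(w * p[i] for w, p in zip(lam, per_dim_prefs)) for i in range(n)]


# Two dimensions (say, helpfulness and harmlessness) over two responses.
mixed = mix_preferences([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

Changing `lam` at inference time moves the target between the two dimensions without touching the per-dimension data, which is the kind of control the abstract describes.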



Article 982

Title@2025-07-05 (6): Online Convex Optimization with Switching Cost with Only One Single Gradient Evaluation

Title: Online Convex Optimization with Switching Cost with Only One Single Gradient Evaluation Online Convex-Optimierung mit Schaltkosten mit nur einem einzigen Gradienten-Bewertung 仅用单级梯度评价以切换成本实现在线 Convex优化 2507.04133v1

Authors (3): Harsh Shah, Purna Chandrasekhar, Rahul Vaze

Online convex optimization with switching cost is considered under the frugal information setting where at time $t$, before action $x_t$ is taken, only a single function evaluation and a single gradient are available at the previously chosen action $x_{t-1}$ for either the current cost function $f_t$ or the most recent cost function $f_{t-1}$. When the switching cost is linear, online algorithms with optimal order-wise competitive ratios are derived for the frugal setting. When the gradient information is noisy, an online algorithm whose competitive ratio grows quadratically with the noise magnitude is derived.



Article 983

Title@2025-07-05 (6): False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems

Title: False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems Falsche Alarme, wirkliche Schäden: Widerwärtige Angriffe mit LLM-basierten Modellen auf textbasierten Cyber Threat Intelligence-Systemen 假警报、真实损害:利用基于LLM的基于LLM的基于文本的网络威胁信息系统模型进行对抗性攻击 2507.06252v1

Authors (3): Samaneh Shafee, Alysson Bessani, Pedro M. Ferreira

Cyber Threat Intelligence (CTI) has emerged as a vital complementary approach that operates in the early phases of the cyber threat lifecycle. CTI involves collecting, processing, and analyzing threat data to provide a more accurate and rapid understanding of cyber threats. Due to the large volume of data, automation through Machine Learning (ML) and Natural Language Processing (NLP) models is essential for effective CTI extraction. These automated systems leverage Open Source Intelligence (OSINT) from sources like social networks, forums, and blogs to identify Indicators of Compromise (IoCs). Although prior research has focused on adversarial attacks on specific ML models, this study expands the scope by investigating vulnerabilities within various components of the entire CTI pipeline and their susceptibility to adversarial attacks. These vulnerabilities arise because the components ingest textual inputs from various open sources, including real and potentially fake content. We analyse three types of attacks against CTI pipelines (evasion, flooding, and poisoning) and assess their impact on the system’s information selection capabilities. Specifically, on fake text generation, the work demonstrates how adversarial text generation techniques can create fake cybersecurity and cybersecurity-like text that misleads classifiers, degrades performance, and disrupts system functionality. The focus is primarily on the evasion attack, as it precedes and enables flooding and poisoning attacks within the CTI pipeline.



Article 984

Title@2025-07-05 (6): Graph Neural Networks as a Substitute for Transformers in Single-Cell Transcriptomics

Title: Graph Neural Networks as a Substitute for Transformers in Single-Cell Transcriptomics Graphische Neuralnetze als Ersatz für Transformer in Single-Cell Transcriptomics 以神经网络替代单子转换器变换器的神经网络 2507.04125v1

Authors (5): Jiaxin Qi, Yan Cui, Jinli Ou, Jianqiang Huang, Gaogang Xie

Graph Neural Networks (GNNs) and Transformers share significant similarities in their encoding strategies for interacting with features from nodes of interest, where Transformers use query-key scores and GNNs use edges. Compared to GNNs, which are unable to encode relative positions, Transformers leverage dynamic attention capabilities to better represent relative relationships, thereby becoming the standard backbones in large-scale sequential pre-training. However, the subtle difference prompts us to consider: if positions are no longer crucial, could we substitute Transformers with Graph Neural Networks in some fields such as Single-Cell Transcriptomics? In this paper, we first explore the similarities and differences between GNNs and Transformers, specifically in terms of relative positions. Additionally, we design a synthetic example to illustrate their equivalence where there are no relative positions between tokens in the sample. Finally, we conduct extensive experiments on a large-scale position-agnostic dataset, single-cell transcriptomics, finding that GNNs achieve competitive performance compared to Transformers while consuming fewer computation resources. These findings provide novel insights for researchers in the field of single-cell transcriptomics, challenging the prevailing notion that the Transformer is always the optimum choice.



Article 985

Title@2025-07-05 (6): Diversity Conscious Refined Random Forest

Title: Diversity Conscious Refined Random Forest Diversity Bewusst Verfeinerter Zufallswald 改良的随机森林 2507.00467v2

Authors (5): Sijan Bhattarai, Saurav Bhandari, Girija Bhusal, Saroj Shakya, Tapendra Pandey

Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and model redundancy. In this work, our goal is to grow trees dynamically only on informative features and then enforce maximal diversity by clustering and retaining uncorrelated trees. Therefore, we propose a Refined Random Forest Classifier that iteratively refines itself by first removing the least informative features and then analytically determining how many new trees should be grown, followed by correlation-based clustering to remove redundant trees. The classification accuracy of our model was compared against the standard RF with the same number of trees. Experiments on 8 benchmark datasets, including binary and multiclass datasets, demonstrate that the proposed model achieves improved accuracy compared to standard RF.



Article 986

Title@2025-07-05 (6): Graph Inverse Style Transfer for Counterfactual Explainability

Title: Graph Inverse Style Transfer for Counterfactual Explainability Graph Inverse Style Transfer für gegenfaktische Erklärbarkeit 反事实解释的反反向样式传输 2505.17542v2

Authors (3): Bardh Prenkaj, Efstratios Zaradoukas, Gjergji Kasneci

Counterfactual explainability seeks to uncover model decisions by identifying minimal changes to the input that alter the predicted outcome. This task becomes particularly challenging for graph data because structural integrity and semantic meaning must be preserved. Unlike prior approaches that rely on forward perturbation mechanisms, we introduce Graph Inverse Style Transfer (GIST), the first framework to re-imagine graph counterfactual generation as a backtracking process, leveraging spectral style transfer. By aligning the global structure with the original input spectrum and preserving local content faithfulness, GIST produces valid counterfactuals as interpolations between the input style and counterfactual content. Tested on 8 binary and multi-class graph classification benchmarks, GIST achieves a remarkable +7.6% improvement in the validity of produced counterfactuals and significant gains (+45.5%) in faithfully explaining the true class distribution. Additionally, GIST’s backtracking mechanism effectively mitigates overshooting the underlying predictor’s decision boundary, minimizing the spectral differences between the input and the counterfactuals. These results challenge traditional forward perturbation methods, offering a novel perspective that advances graph explainability.



Article 987

Title@2025-07-05 (6): Model selection for stochastic dynamics: a parsimonious and principled approach

Title: Model selection for stochastic dynamics: a parsimonious and principled approach Modellauswahl für stochastische Dynamik: ein parsimonischer und prinzipienorientierter Ansatz 随机动态模型选择模式:有原则、有原则的方法 2507.04121v1

Authors (1): Andonis Gerardos

This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC, BIC) are often limited. We introduce PASTIS (Parsimonious Stochastic Inference), a new information criterion derived from extreme value theory. Its penalty term, $n_\mathcal{B} \ln(n_0/p)$, explicitly incorporates the size of the initial library of candidate parameters ($n_0$), the number of parameters in the considered model ($n_\mathcal{B}$), and a significance threshold ($p$). This significance threshold represents the probability of selecting a model containing more parameters than necessary when comparing many models. Benchmarks on various systems (Lorenz, Ornstein-Uhlenbeck, Lotka-Volterra for SDEs; Gray-Scott for SPDEs) demonstrate that PASTIS outperforms AIC, BIC, cross-validation (CV), and SINDy (a competing method) in terms of exact model identification and predictive capability. Furthermore, real-world data can be subject to large sampling intervals ($\Delta t$) or measurement noise ($\sigma$), which can impair model learning and selection capabilities. To address this, we have developed robust variants of PASTIS, PASTIS-$\Delta t$ and PASTIS-$\sigma$, thus extending the applicability of the approach to imperfect experimental data. PASTIS thus provides a statistically grounded, validated, and practical methodological framework for discovering simple models for processes with stochastic dynamics.
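The penalty term $n_\mathcal{B} \ln(n_0/p)$ is simple enough to evaluate directly. The Python sketch below computes it and, as an assumption of this illustration only, scores each candidate model as its log-likelihood minus the penalty. The function names, the candidate dictionary, and the numeric values are hypothetical, and the thesis may combine the terms differently.

```python
import math


def pastis_penalty(n_params: int, n_library: int, p: float) -> float:
    """Penalty n_B * ln(n_0 / p) from the abstract.

    n_params  -- number of parameters in the considered model (n_B)
    n_library -- size of the initial library of candidate parameters (n_0)
    p         -- significance threshold, with 0 < p < 1
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n_library <= 0 or n_params < 0:
        raise ValueError("n_library must be positive and n_params non-negative")
    return n_params * math.log(n_library / p)


def select_model(candidates, n_library, p):
    """Pick the candidate maximizing log-likelihood minus penalty (assumed form).

    `candidates` maps a model name to a (log_likelihood, n_params) pair.
    """
    def score(item):
        log_lik, n_params = item[1]
        return log_lik - pastis_penalty(n_params, n_library, p)

    return max(candidates.items(), key=score)[0]


# Made-up numbers: the larger model fits slightly better but pays a larger penalty.
best = select_model({"small": (-100.0, 2), "large": (-95.0, 5)}, n_library=20, p=0.001)
```

Because the penalty per parameter grows with the library size $n_0$, a larger candidate library demands stronger evidence before an extra term is accepted, unlike the fixed per-parameter cost of AIC.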



Article 988

Title@2025-07-05 (6): When Data-Free Knowledge Distillation Meets Non-Transferable Teacher: Escaping Out-of-Distribution Trap is All You Need

Title: When Data-Free Knowledge Distillation Meets Non-Transferable Teacher: Escaping Out-of-Distribution Trap is All You Need Wenn datenfreie Wissensdestillation auf nicht übertragbaren Lehrer trifft: Aus-von-Distribution-Falle zu entkommen ist alles, was Sie brauchen 当数据-无知识蒸馏遇到非可转让教师时:摆脱分配办法就是你所需要的一切 2507.04119v1

Authors (6): Ziming Hong, Runnan Chen, Zengmao Wang, Bo Han, Bo Du, Tongliang Liu

Data-free knowledge distillation (DFKD) transfers knowledge from a teacher to a student without access to the real in-distribution (ID) data. Its common solution is to use a generator to synthesize fake data and use them as a substitute for real ID data. However, existing works typically assume teachers are trustworthy, leaving the robustness and security of DFKD from untrusted teachers largely unexplored. In this work, we conduct the first investigation into distilling non-transferable learning (NTL) teachers using DFKD, where the transferability from an ID domain to an out-of-distribution (OOD) domain is prohibited. We find that NTL teachers fool DFKD by diverting the generator’s attention from the useful ID knowledge to the misleading OOD knowledge. This hinders ID knowledge transfer but prioritizes OOD knowledge transfer. To mitigate this issue, we propose Adversarial Trap Escaping (ATEsc) to benefit DFKD by identifying and filtering out OOD-like synthetic samples. Specifically, inspired by the evidence that NTL teachers show stronger adversarial robustness on OOD samples than ID samples, we split synthetic samples into two groups according to their robustness. The fragile group is treated as ID-like data and used for normal knowledge distillation, while the robust group is seen as OOD-like data and utilized for forgetting OOD knowledge. Extensive experiments demonstrate the effectiveness of ATEsc for improving DFKD against NTL teachers. Code is released at https://github.com/tmllab/2025_ICML_ATEsc.



Article 989

Title@2025-07-05 (6): Relational inductive biases on attention mechanisms

Title: Relational inductive biases on attention mechanisms Relationale induktive Verzerrungen der Aufmerksamkeitsmechanismen 对关注机制的关系诱导偏见 2507.04117v1

Authors (6): Víctor Mijangos, Ximena Gutierrez-Vasques, Verónica E. Arriola, Ulises Rodríguez-Domínguez, Alexis Cervantes, José Luis Almanzara

Inductive learning aims to construct general models from specific examples, guided by biases that influence hypothesis selection and determine generalization capacity. In this work, we focus on characterizing the relational inductive biases present in attention mechanisms, understood as assumptions about the underlying relationships between data elements. From the perspective of geometric deep learning, we analyze the most common attention mechanisms in terms of their equivariance properties with respect to permutation subgroups, which allows us to propose a classification based on their relational biases. Under this perspective, we show that different attention layers are characterized by the underlying relationships they assume on the input data.



Article 990

Title@2025-07-05 (6): How to Train Your LLM Web Agent: A Statistical Diagnosis

Title: How to Train Your LLM Web Agent: A Statistical Diagnosis Wie Sie Ihren LLM Web Agent trainieren: Eine statistische Diagnose 如何训练你的LLM网络代理:统计分析 2507.04103v1

Authors (16): Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia

LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents. To address this, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT), followed by on-policy reinforcement learning. We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWob++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.



Article 991

Title@2025-07-05 (6): How more data can hurt: Instability and regularization in next-generation reservoir computing

Title: How more data can hurt: Instability and regularization in next-generation reservoir computing Wie mehr Daten schaden können: Instabilität und Regularisierung im Reservoir Computing der nächsten Generation 下一代储油层计算中的不稳定性和正规化 2407.08641v3

Authors (4): Yuanzhao Zhang, Edmilson Roque dos Santos, Huixin Zhang, Sean P. Cornelius

It has been found recently that more data can, counter-intuitively, hurt the performance of deep neural networks. Here, we show that a more extreme version of the phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC) – a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned "integrator" and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing regularization strength in tandem with data size, or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.



Article 992

Title@2025-07-05 (6): Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems

Title: Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems Hierarchische Tests mit Rabbit-Optimierung für industrielle Cyber-Physische Systeme 工业网络物理系统对兔子最佳优化进行等级测试 2507.04100v1

Authors (6): Jinwei Hu, Zezhi Tang, Xin Jin, Benyuan Zhang, Yi Dong, Xiaowei Huang

This paper presents HERO (Hierarchical Testing with Rabbit Optimization), a novel black-box adversarial testing framework for evaluating the robustness of deep learning-based Prognostics and Health Management (PHM) systems in Industrial Cyber-Physical Systems (ICPS). Leveraging Artificial Rabbit Optimization, HERO generates physically constrained adversarial examples that align with real-world data distributions via global and local perspectives. Its generalizability ensures applicability across diverse ICPS scenarios. This study specifically focuses on the Proton Exchange Membrane Fuel Cell system, chosen for its highly dynamic operational conditions, complex degradation mechanisms, and increasing integration into ICPS as a sustainable and efficient energy solution. Experimental results highlight HERO’s ability to uncover vulnerabilities in even state-of-the-art PHM models, underscoring the critical need for enhanced robustness in real-world applications. By addressing these challenges, HERO demonstrates its potential to advance more resilient PHM systems across a wide range of ICPS domains.



Article 993

Title@2025-07-05 (6): A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory

Title: A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory A LoD of Gaussians: Unified Training und Rendering für Ultra-Large Scale Rekonstruktion mit externem Speicher Gaussian AD: 利用外部记忆进行超大型重建的统一培训和投标 2507.01110v2

Authors (6): Felix Windisch, Lukas Radl, Thomas Köhler, Michael Steiner, Dieter Schmalstieg, Markus Steinberger

Gaussian Splatting has emerged as a high-performance technique for novel view synthesis, enabling real-time rendering and high-quality reconstruction of small scenes. However, scaling to larger environments has so far relied on partitioning the scene into chunks – a strategy that introduces artifacts at chunk boundaries, complicates training across varying scales, and is poorly suited to unstructured scenarios such as city-scale flyovers combined with street-level views. Moreover, rendering remains fundamentally limited by GPU memory, as all visible chunks must reside in VRAM simultaneously. We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU – without partitioning. Our method stores the full scene out-of-core (e.g., in CPU memory) and trains a Level-of-Detail (LoD) representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering. Together, these innovations enable seamless multi-scale reconstruction and interactive visualization of complex scenes – from broad aerial views to fine-grained ground-level details.



Article 994

Title@2025-07-05 (6): Accurate and Efficient World Modeling with Masked Latent Transformers

Title: Accurate and Efficient World Modeling with Masked Latent Transformers Präzise und effiziente Weltmodellierung mit maskierten Latententransformatoren 以隐蔽的冷流变形器建立准确而高效的世界模型 2507.04075v1

Authors (2): Maxime Burchi, Radu Timofte

The Dreamer algorithm has recently obtained remarkable performance across diverse environment domains by training powerful agents with simulated trajectories. However, the compressed nature of its world model’s latent space can result in the loss of crucial information, negatively affecting the agent’s performance. Recent approaches, such as $\Delta$-IRIS and DIAMOND, address this limitation by training more accurate world models. However, these methods require training agents directly from pixels, which reduces training efficiency and prevents the agent from benefiting from the inner representations learned by the world model. In this work, we propose an alternative approach to world modeling that is both accurate and efficient. We introduce EMERALD (Efficient MaskEd latent tRAnsformer worLD model), a world model using a spatial latent state with MaskGIT predictions to generate accurate trajectories in latent space and improve the agent performance. On the Crafter benchmark, EMERALD achieves new state-of-the-art performance, becoming the first method to surpass human expert performance within 10M environment steps. Our method also succeeds in unlocking all 22 Crafter achievements at least once during evaluation.



Article 995

Title@2025-07-05 (6): Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

Title: Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering Diffusionsmodelle lernen Low-Dimensional Distributionen über Subspace Clustering 通过子空间聚集学习低多样性分布的传播模型 2409.02426v4

Authors (6): Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

Recent empirical studies have demonstrated that diffusion models can effectively learn the image distribution and generate new samples. Remarkably, these models can achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging key empirical observations: (i) the low intrinsic dimensionality of image data, (ii) the union-of-manifolds structure of image data, and (iii) the low-rank property of the denoising autoencoder in trained diffusion models. These observations motivate us to model the underlying distribution of image data as a mixture of low-rank Gaussians and to parameterize the denoising autoencoder as a low-rank model according to the score function of the assumed distribution. With these setups, we rigorously show that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem over the training samples. Based on this equivalence, we further show that the minimal number of samples required to learn the underlying distribution scales linearly with the intrinsic dimensions under the above data and model assumptions. This insight sheds light on why diffusion models can break the curse of dimensionality and exhibit the phase transition in learning distributions. Moreover, we empirically establish a correspondence between the subspaces and the semantic representations of image data, facilitating image editing. We corroborate these results with experiments on both simulated distributions and image datasets.
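
To make the "mixture of low-rank Gaussians" data assumption concrete, here is a minimal sketch of how such data could be sampled. The function name `sample_molrg`, the equal mixture weights, the zero means, and the random orthonormal bases are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def sample_molrg(n_samples, ambient_dim, intrinsic_dim, n_components, rng):
    """Sample from an equal-weight mixture of zero-mean Gaussians, each supported on a random subspace."""
    # One orthonormal basis (ambient_dim x intrinsic_dim) per mixture component.
    bases = [np.linalg.qr(rng.standard_normal((ambient_dim, intrinsic_dim)))[0]
             for _ in range(n_components)]
    labels = rng.integers(n_components, size=n_samples)
    coeffs = rng.standard_normal((n_samples, intrinsic_dim))
    # Each sample lies in the subspace of its component, so its covariance is low-rank.
    samples = np.stack([bases[k] @ a for k, a in zip(labels, coeffs)])
    return samples, labels

rng = np.random.default_rng(0)
X, labels = sample_molrg(500, ambient_dim=50, intrinsic_dim=3, n_components=2, rng=rng)
```

Samples drawn from one component span only `intrinsic_dim` directions of the 50-dimensional ambient space, which is the structure a subspace clustering method would try to recover.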



Article 996

Title@2025-07-05 (6): Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering

Title: Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering Jenseits von unabhängigen Passagen: Adaptive Passage-Kombinations-Retrieval für retrieval Augmented Open-Domain-Fragebeantwortung 超过独立通过通道: 用于检索的适应性被动通道组合检索, 以获取更多开放域 问题解答 2507.04069v1

Authors (3): Ting-Wen Ko, Jyun-Yu Jiang, Pu-Jen Cheng

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external documents at inference time, enabling up-to-date knowledge access without costly retraining. However, conventional RAG methods retrieve passages independently, often leading to redundant, noisy, or insufficiently diverse context, which is particularly problematic in noisy corpora and for multi-hop questions. To address this, we propose Adaptive Passage Combination Retrieval (AdaPCR), a novel framework for open-domain question answering with black-box LMs. AdaPCR explicitly models dependencies between passages by considering passage combinations as units for retrieval and reranking. It consists of a context-aware query reformulation using concatenated passages, and a reranking step trained with a predictive objective aligned with downstream answer likelihood. Crucially, AdaPCR adaptively selects the number of retrieved passages without additional stopping modules. Experiments across several QA benchmarks show that AdaPCR outperforms baselines, particularly in multi-hop reasoning, demonstrating the effectiveness of modeling inter-passage dependencies for improved retrieval.



Article 997

Title@2025-07-05 (6): Attributing Data for Sharpness-Aware Minimization

Title: Attributing Data for Sharpness-Aware Minimization Zuordnung von Daten zur Schärfe-Bewusst-Minimierung 用于尽量减少尖锐度( 最小化)软件的数据 2507.04059v1

Authors (8): Chenyang Ren, Yifan Jia, Huanyi Xie, Zhaobin Xu, Tianxing Wei, Liangyu Wang, Lijie Hu, Di Wang

Sharpness-aware Minimization (SAM) improves generalization in large-scale model training by linking loss landscape geometry to generalization. However, challenges such as mislabeled noisy data and privacy concerns have emerged as significant issues. Data attribution, which identifies the contributions of specific training samples, offers a promising solution. However, directly applying existing data influence evaluation tools such as influence functions (IF) to SAM is inapplicable or inaccurate, because SAM uses an inner loop to find model perturbations that maximize loss, which the outer loop then minimizes, resulting in a doubled computational structure. Additionally, this bilevel structure complicates the modeling of data influence on the parameters. In this paper, based on the IF, we develop two innovative data valuation methods for SAM, each offering unique benefits in different scenarios: the Hessian-based IF and the Gradient Trajectory-based IF. The first one provides a comprehensive estimation of data influence using a closed-form measure that relies only on the trained model weights. In contrast, the other IF for SAM utilizes gradient trajectory information during training for more accurate and efficient data assessment. Extensive experiments demonstrate their effectiveness in data evaluation and parameter tuning, with applications in identifying mislabeled data, model editing, and enhancing interpretability.
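
The inner-maximization, outer-minimization structure described above can be sketched in a few lines. This is a minimal NumPy illustration of the commonly used first-order SAM update, not the paper's attribution method; `sam_step`, `grad_fn`, and the toy quadratic loss are assumptions made for the example.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: perturb toward higher loss, then descend using the perturbed gradient."""
    g = grad_fn(w)
    # Inner step: first-order approximation of the loss-maximizing perturbation of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Outer step: apply the gradient evaluated at the perturbed weights to the original weights.
    g_sam = grad_fn(w + eps)
    return w - lr * g_sam

# Toy quadratic loss L(w) = 0.5 * w^T A w, whose gradient is A w.
A = np.diag([1.0, 10.0])
grad_fn = lambda w: A @ w

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad_fn, lr=0.05, rho=0.05)
```

Because every update depends on a second gradient taken at `w + eps`, a training sample influences the parameters through both evaluations, which is the difficulty for standard influence functions that the abstract points to.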



Article 998

Title@2025-07-05 (6): Aggregating Concepts of Fairness and Accuracy in Prediction Algorithms

Title: Aggregating Concepts of Fairness and Accuracy in Prediction Algorithms Aggregierende Konzepte von Fairness und Genauigkeit in Vorhersagen Algorithmen 综合预测数值中的公平和准确概念 2505.08829v3

Authors (1): David Kinney

An algorithm that outputs predictions about the state of the world will almost always be designed with the implicit or explicit goal of outputting accurate predictions (i.e., predictions that are likely to be true). In addition, the rise of increasingly powerful predictive algorithms brought about by the recent revolution in artificial intelligence has led to an emphasis on building predictive algorithms that are fair, in the sense that their predictions do not systematically evince bias or bring about harm to certain individuals or groups. This state of affairs presents two conceptual challenges. First, the goals of accuracy and fairness can sometimes be in tension, and there are no obvious normative guidelines for managing the trade-offs between these two desiderata when they arise. Second, there are many distinct ways of measuring both the accuracy and fairness of a predictive algorithm; here too, there are no obvious guidelines on how to aggregate our preferences for predictive algorithms that satisfy disparate measures of fairness and accuracy to various extents. The goal of this paper is to address these challenges by arguing that there are good reasons for using a linear combination of accuracy and fairness metrics to measure the all-things-considered value of a predictive algorithm for agents who care about both accuracy and fairness. My argument depends crucially on a classic result in the preference aggregation literature due to Harsanyi. After making this formal argument, I apply my result to an analysis of accuracy-fairness trade-offs using the COMPAS dataset compiled by Angwin et al.
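
As a rough illustration of the linear-combination idea, the sketch below scores two candidate models by a weighted sum of an accuracy metric and a fairness metric. The metric names, the weights, and all numbers are hypothetical and chosen only for illustration; the abstract does not specify which metrics or weights to use.

```python
def aggregate_value(metrics, weights):
    """Weighted sum of metric scores; every metric is assumed to be 'higher is better'."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must cover the same names")
    return sum(weights[name] * metrics[name] for name in metrics)

# Hypothetical scores for two candidate classifiers (illustrative numbers only).
model_a = {"accuracy": 0.90, "parity": 0.70}
model_b = {"accuracy": 0.85, "parity": 0.90}
weights = {"accuracy": 0.6, "parity": 0.4}

value_a = aggregate_value(model_a, weights)
value_b = aggregate_value(model_b, weights)
```

Under these made-up weights the less accurate but fairer `model_b` scores higher, which shows how the choice of weights encodes the trade-off between the two goals.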



Article 999

Title@2025-07-05 (6): Membership Inference Attacks on Large-Scale Models: A Survey

Title: Membership Inference Attacks on Large-Scale Models: A Survey Membership Inferenz Attacks on Large-Scale Models: Eine Umfrage 成员对大型模型的推断攻击:调查 2503.19338v2

Authors (2): Hengyu Wu, Yang Cao

The adoption of Large Language Models (LLMs) has accelerated dramatically since ChatGPT from OpenAI went online in November 2022. Recent advances in Large Multimodal Models (LMMs), which process diverse data types and enable interaction through various channels, have expanded beyond the text-to-text limitations of early LLMs, attracting significant and concurrent attention from both researchers and industry. While LLMs and LMMs are starting to spread widely, concerns about their privacy risks are increasing as well. Membership Inference Attacks (MIAs) are techniques used to determine whether a particular data point was part of a model’s training set, which is a key metric for assessing the privacy vulnerabilities of machine learning models. Hu et al. show that various machine learning algorithms are vulnerable to MIA. Despite extensive studies on MIAs in classic models, there remains a lack of systematic surveys addressing their effectiveness and limitations in advanced large-scale models like LLMs and LMMs. In this paper, we systematically review recent studies of MIA against LLMs and LMMs. We analyze and categorize each attack based on its methodology, scenario, and targeted model, and we discuss the limitations of existing research. In addition to examining attacks on pre-training and fine-tuning stages, we also explore MIAs that target other development pipelines, including Retrieval-Augmented Generation (RAG) and the model alignment process. Based on the survey, we provide suggestions for future studies to improve the robustness of MIA in large-scale AI models.
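
For readers new to the topic, a simple loss-threshold baseline conveys the idea of a membership inference attack: examples with unusually low loss are guessed to be training members. The sketch below uses synthetic loss values and an arbitrary threshold; it is an illustrative assumption, not one of the specific attacks the survey categorizes.

```python
import numpy as np

def loss_threshold_attack(losses, threshold):
    """Predict 'member' (True) when the per-example loss falls below the threshold."""
    return np.asarray(losses) < threshold

# Synthetic per-example losses: members are assumed to have lower loss than non-members.
rng = np.random.default_rng(0)
member_losses = rng.normal(loc=0.5, scale=0.2, size=1000)
nonmember_losses = rng.normal(loc=1.5, scale=0.2, size=1000)

threshold = 1.0
tpr = loss_threshold_attack(member_losses, threshold).mean()     # true positive rate
fpr = loss_threshold_attack(nonmember_losses, threshold).mean()  # false positive rate
```

On real large-scale models the two loss distributions typically overlap far more than in this toy setup, so the achievable true positive rate at a low false positive rate is what matters in practice.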



Article 1000

Title@2025-07-05 (6): Predictive Modeling of Effluent Temperature in SAT Systems Using Ambient Meteorological Data: Implications for Infiltration Management

Title: Predictive Modeling of Effluent Temperature in SAT Systems Using Ambient Meteorological Data: Implications for Infiltration Management Prädiktive Modellierung der Effluenttemperatur in SAT-Systemen unter Verwendung von Ambient Meteorological Data: Implikationen für Infiltrationsmanagement 利用环境气象数据利用环境气象数据对SAT系统中的废水温度进行预测性模型建模:对渗透管理的影响 2507.04050v1

Authors (1): Roy Elkayam

Accurate prediction of effluent temperature in recharge basins is essential for optimizing the Soil Aquifer Treatment (SAT) process, as temperature directly influences water viscosity and infiltration rates. This study develops and evaluates predictive models for effluent temperature in the upper recharge layer of a Shafdan SAT system recharge basin using ambient meteorological data. Multiple linear regression (MLR), neural networks (NN), and random forests (RF) were tested for their predictive accuracy and interpretability. The MLR model, preferred for its operational simplicity and robust performance, achieved high predictive accuracy ($R^2$ = 0.86-0.87) and was used to estimate effluent temperatures over a 10-year period. Results highlight pronounced seasonal temperature cycles and the importance of topsoil temperature in governing the thermal profile of the infiltrating effluent. The study provides practical equations for real-time monitoring and long-term planning of SAT operations.
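
The sketch below shows how a multiple linear regression and its $R^2$ could be computed with NumPy. The two predictors, the coefficients, and the data are synthetic stand-ins (hypothetically, air and topsoil temperature); they are not the study's data or its fitted equations.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns coefficients with the intercept first."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def r_squared(X, y, coef):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    X1 = np.column_stack([np.ones(len(X)), X])
    residual = y - X1 @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data with two hypothetical predictors in degrees Celsius.
rng = np.random.default_rng(0)
X = rng.uniform(5.0, 35.0, size=(200, 2))
y = 2.0 + 0.3 * X[:, 0] + 0.6 * X[:, 1] + rng.normal(scale=1.0, size=200)

coef = fit_mlr(X, y)
r2 = r_squared(X, y, coef)
```

A fitted model of this form reduces to one explicit equation in the measured variables, which fits the abstract's emphasis on operational simplicity.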



Article 1001

Title@2025-07-05 (6): Discrete Diffusion in Large Language and Multimodal Models: A Survey

Title: Discrete Diffusion in Large Language and Multimodal Models: A Survey Diskrete Diffusion in großen Sprach- und Multimodalmodellen: Eine Umfrage 大语言和多模式模型的分解传播:调查 2506.13759v3

Authors (3): Runpeng Yu, Qi Li, Xinchao Wang

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception. These capabilities were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second contributing domain is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements have catalyzed a surge in dLLM and dMLLM research in early 2025. In this work, we present a comprehensive overview of the research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey



Article 1002

Title@2025-07-05 (6): Physics-Informed Graph Neural Networks to Reconstruct Local Fields Considering Finite Strain Hyperelasticity

Title: Physics-Informed Graph Neural Networks to Reconstruct Local Fields Considering Finite Strain Hyperelasticity Physik-informierte Graph-Neural-Netzwerke, um lokale Felder unter Berücksichtigung der Finite Strain Hyperelastizität zu rekonstruieren 物理成形图形神经网络,以重建考虑到有限斯特兰超弹性的局部场 2507.05291v1

Authors (4): Manuel Ricardo Guevara Garban, Yves Chemisky, Étienne Prulière, Michaël Clément

We propose a physics-informed machine learning framework called P-DivGNN to reconstruct local stress fields at the micro-scale, in the context of multi-scale simulation, given a periodic micro-structure mesh and mean macro-scale stress values. The method is based on representing a periodic micro-structure as a graph, combined with a message-passing graph neural network. We are able to retrieve local stress field distributions, given average stress values produced by a mean-field reduced order model (ROM) or Finite Element (FE) simulation at the macro-scale. The prediction of local stress fields is of utmost importance for fracture analysis or the definition of local fatigue criteria. Our model incorporates physical constraints during training to constrain the local stress field to an equilibrium state and employs a periodic graph representation to enforce periodic boundary conditions. The benefits of the proposed physics-informed GNN are evaluated considering linear and nonlinear hyperelastic responses applied to varying geometries. In the nonlinear hyperelastic case, the proposed method achieves significant computational speed-ups compared to FE simulation, making it particularly attractive for large-scale applications.



Article 1003

Title@2025-07-05 (6): Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks

Title: Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks Benchmarking stochastischer Annäherungsalgorithmen für Fairness-beschränkte Ausbildung von tiefen neuralen Netzwerken 为深神经网络的公平-有约束的深神经网络培训制定标准 2507.04033v1

Authors (4): Andrii Kliachkin, Jana Lepšová, Gilles Bareilles, Jakub Mareček

The ability to train Deep Neural Networks (DNNs) with constraints is instrumental in improving the fairness of modern machine-learning models. Many algorithms have been analysed in recent years, and yet there is no standard, widely accepted method for the constrained training of DNNs. In this paper, we provide a challenging benchmark of real-world large-scale fairness-constrained learning tasks, built on top of the US Census (Folktables). We point out the theoretical challenges of such tasks and review the main approaches in stochastic approximation algorithms. Finally, we demonstrate the use of the benchmark by implementing and comparing three recently proposed, but as-of-yet unimplemented, algorithms both in terms of optimization performance, and fairness improvement. We release the code of the benchmark as a Python package at https://github.com/humancompatible/train.



Article 1004

Title@2025-07-05 (6): From Prompting to Alignment: A Generative Framework for Query Recommendation

Title: From Prompting to Alignment: A Generative Framework for Query Recommendation Von Prompting zu Alignment: Ein generativer Rahmen für die Abfrageempfehlung 从提示到对齐:查询建议生成框架 2504.10208v3

Authors (10): Erxue Min, Hsiu-Yuan Huang, Xihong Yang, Min Yang, Xin Jia, Yunfang Wu, Hengyi Cai, Junfeng Wang, Shuaiqiang Wang, Dawei Yin

In modern search systems, search engines often suggest relevant queries to users through various panels or components, helping refine their information needs. Traditionally, these recommendations heavily rely on historical search logs to build models, which suffer from cold-start or long-tail issues. Furthermore, tasks such as query suggestion, completion or clarification are studied separately with task-specific designs, which lack generalizability and hinder adaptation to novel applications. Despite recent attempts to explore the use of LLMs for query recommendation, these methods mainly rely on the inherent knowledge of LLMs or external sources like few-shot examples, retrieved documents, or knowledge bases, neglecting the importance of calibration and alignment with user feedback, thus limiting their practical utility. To address these challenges, we first propose a general Generative Query Recommendation (GQR) framework that aligns LLM-based query generation with user preference. Specifically, we unify diverse query recommendation tasks through a universal prompt framework, leveraging the instruction-following capability of LLMs for effective generation. Secondly, we align LLMs with user feedback through a CTR-alignment framework, which involves training a query-wise CTR predictor as a process reward model and employing list-wise preference alignment to maximize the click probability of the generated query list. Furthermore, recognizing the inconsistency between LLM knowledge and proactive search intents arising from the separation of user-initiated queries from models, we align LLMs with user initiative by retrieving co-occurrence queries as side information when historical logs are available.



Article 1005

Title@2025-07-05 (6): An Efficient Local Search Approach for Polarized Community Discovery in Signed Networks

Title: An Efficient Local Search Approach for Polarized Community Discovery in Signed Networks Ein effizienter lokaler Suchansatz für polarisierte Community Discovery in Signed Networks 对签署网络中的极地化社区发现的有效地方搜索方法 2502.02197v2

Authors (2): Linus Aronsson, Morteza Haghir Chehreghani

Signed networks, where edges are labeled as positive or negative to represent friendly or antagonistic interactions, offer a natural framework for analyzing polarization, trust, and conflict in social systems. Detecting meaningful group structures in such networks is crucial for understanding online discourse, political divisions, and trust dynamics. A key challenge is to identify communities that are internally cohesive and externally antagonistic, while allowing for neutral or unaligned vertices. In this paper, we propose a method for identifying $k$ polarized communities that addresses a major limitation of prior methods: their tendency to produce highly size-imbalanced solutions. We introduce a novel optimization objective that avoids such imbalance. In addition, it is well known that approximation algorithms based on local search are highly effective for clustering signed networks when neutral vertices are not allowed. We build on this idea and design the first local search algorithm that extends to the setting with neutral vertices while scaling to large networks. By connecting our approach to block-coordinate Frank-Wolfe optimization, we prove a linear convergence rate, enabled by the structure of our objective. Experiments on real-world and synthetic datasets demonstrate that our method consistently outperforms state-of-the-art baselines in solution quality, while remaining competitive in computational efficiency.



Article 1006

Title@2025-07-05 (6): Commute Networks as a Signature of Urban Socioeconomic Performance: Evaluating Mobility Structures with Deep Learning Models

Title: Commute Networks as a Signature of Urban Socioeconomic Performance: Evaluating Mobility Structures with Deep Learning Models Commute Networks als Signatur urbaner Sozioökonomischer Leistung: Bewertung von Mobilitätsstrukturen mit Deep-Learning-Modellen 将网络作为城市社会经济绩效的标志:利用深学习模式评价流动结构 2507.04027v1

Authors (3): Devashish Khulbe, Alexander Belyi, Stanislav Sobolevsky

Urban socioeconomic modeling has predominantly concentrated on extensive location and neighborhood-based features, relying on the localized population footprint. However, networks in urban systems are common, and many urban modeling methods don’t account for network-based effects. In this study, we propose using commute information records from the census as a reliable and comprehensive source to construct mobility networks across cities. Leveraging deep learning architectures, we employ these commute networks across U.S. metro areas for socioeconomic modeling. We show that mobility network structures provide significant predictive performance without considering any node features. Consequently, we use mobility networks to present a supervised learning framework to model a city’s socioeconomic indicator directly, combining Graph Neural Network and Vanilla Neural Network models to learn all parameters in a single learning pipeline. Our experiments in 12 major U.S. cities show the proposed model outperforms previous conventional machine learning models. This work provides urban researchers methods to incorporate network effects in urban modeling and informs stakeholders of wider network-based effects in urban policymaking and planning.



Article 1007

Title@2025-07-05 (6): Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo

Title: Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo Test-Time Alignment von diskreten Diffusionsmodellen mit Sequential Monte Carlo 使用顺序式蒙特卡洛的分解传播模型的测试时间对齐 2505.22524v2

Authors (3): Chinmay Pani, Zijing Ou, Yingzhen Li

Discrete diffusion models have become highly effective across various domains. However, real-world applications often require the generative process to adhere to certain constraints without task-specific fine-tuning. To this end, we propose a training-free method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution at test time. Our approach leverages twisted SMC with an approximate locally optimal proposal, obtained via a first-order Taylor expansion of the reward function. To address the challenge of ill-defined gradients in discrete spaces, we incorporate a Gumbel-Softmax relaxation, enabling efficient gradient-based approximation within the discrete generative framework. Empirical results on both synthetic datasets and image modelling validate the effectiveness of our approach.
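
The Gumbel-Softmax relaxation mentioned above replaces a hard categorical sample with a point on the probability simplex, so that a function of the sample becomes differentiable. The following NumPy sketch shows only the sampling step; the function name, the temperature value, and the example logits are illustrative assumptions and not the paper's implementation.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Draw relaxed one-hot samples; a lower tau gives samples closer to one-hot."""
    # Gumbel(0, 1) noise via the inverse-CDF trick; the lower bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    expz = np.exp(z)
    return expz / expz.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = np.log(np.array([0.7, 0.2, 0.1]))
sample = gumbel_softmax(logits, tau=0.5, rng=rng)
```

The argmax of each relaxed sample is distributed according to the original categorical probabilities, while the relaxed vector itself varies smoothly with the logits, which is what allows a first-order Taylor expansion of a reward to be taken.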



Article 1008

Title@2025-07-05 (6): Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder

Title: Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder Adversariale Entflechtung durch Backpropagation mit physik-informiertem Variational Autoencoder 与物理不完善的变异自动编码器反向反向反向分解 2506.13658v2

Authors (2): Ioannis Christoforos Koune, Alice Cicirello

Inference and prediction under partial knowledge of a physical system is challenging, particularly when multiple confounding sources influence the measured response. Explicitly accounting for these influences in physics-based models is often infeasible due to epistemic uncertainty, cost, or time constraints, resulting in models that fail to accurately describe the behavior of the system. On the other hand, data-driven machine learning models such as variational autoencoders are not guaranteed to identify a parsimonious representation. As a result, they can suffer from poor generalization performance and reconstruction accuracy in the regime of limited and noisy data. We propose a physics-informed variational autoencoder architecture that combines the interpretability of physics-based models with the flexibility of data-driven models. To promote disentanglement of the known physics and confounding influences, the latent space is partitioned into physically meaningful variables that parametrize a physics-based model, and data-driven variables that capture variability in the domain and class of the physical system. The encoder is coupled with a decoder that integrates physics-based and data-driven components, and constrained by an adversarial training objective that prevents the data-driven components from overriding the known physics, ensuring that the physics-grounded latent variables remain interpretable. We demonstrate that the model is able to disentangle features of the input signal and separate the known physics from confounding influences using supervision in the form of class and domain observables. The model is evaluated on a series of synthetic case studies relevant to engineering structures, demonstrating the feasibility of the proposed approach.



Article 1009

Title@2025-07-05 (6): Quantum Computing and Neuromorphic Computing for Safe, Reliable, and explainable Multi-Agent Reinforcement Learning: Optimal Control in Autonomous Robotics

Title: Quantum Computing and Neuromorphic Computing for Safe, Reliable, and explainable Multi-Agent Reinforcement Learning: Optimal Control in Autonomous Robotics Quantum Computing und Neuromorphic Computing für sicheres, zuverlässiges und erklärbares Multi-Agenten-Verstärkungslernen: Optimale Kontrolle in der autonomen Robotik 用于安全、可靠和可解释的多机构强化学习的量子计算和神经形态计算:自主机器人学的最佳控制 2408.03884v2

Authors (2): Mazyar Taghavi, Rahman Farnoosh

This paper investigates the utilization of Quantum Computing and Neuromorphic Computing for Safe, Reliable, and Explainable Multi-Agent Reinforcement Learning (MARL) in the context of optimal control in autonomous robotics. The objective was to address the challenges of optimizing the behavior of autonomous agents while ensuring safety, reliability, and explainability. Quantum Computing techniques, including Quantum Approximate Optimization Algorithm (QAOA), were employed to efficiently explore large solution spaces and find approximate solutions to complex MARL problems. Neuromorphic Computing, inspired by the architecture of the human brain, provided parallel and distributed processing capabilities, which were leveraged to develop intelligent and adaptive systems. The combination of these technologies held the potential to enhance the safety, reliability, and explainability of MARL in autonomous robotics. This research contributed to the advancement of autonomous robotics by exploring cutting-edge technologies and their applications in multi-agent systems. Codes and data are available.



Article 1010

Title@2025-07-05 (6): Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Title: Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Easy Dataset: Ein einheitliches und erweiterbares Framework zur Synthese von LLM-Feineinstellungsdaten aus unstrukturierten Dokumenten 简易数据集:从无结构文档中合成LLM精密调试数据的统一和扩展框架 2507.04009v1

Authors (7): Ziyang Miao, Qiyu Sun, Jingyuan Wang, Yuchen Gong, Yaowei Zheng, Shiqi Li, Richong Zhang

Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle to extract reliable fine-tuning data from heterogeneous documents effectively. To address this limitation, we propose Easy Dataset, a unified framework for synthesizing fine-tuning data from unstructured documents via an intuitive graphical user interface (GUI). Specifically, Easy Dataset allows users to easily configure text extraction models and chunking strategies to transform raw documents into coherent text chunks. It then leverages a persona-driven prompting approach to generate diverse question-answer pairs using publicly available LLMs. Throughout the pipeline, a human-in-the-loop visual interface facilitates the review and refinement of intermediate outputs to ensure data quality. Experiments on a financial question-answering task show that fine-tuning LLMs on the synthesized dataset significantly improves domain-specific performance while preserving general knowledge. The source code and installable package are available at https://github.com/ConardLi/easy-dataset and have garnered over 9,000 GitHub stars.



Article 1011

Title@2025-07-05 (6): Score-based Diffusion Models via Stochastic Differential Equations – a Technical Tutorial

Title: Score-based Diffusion Models via Stochastic Differential Equations – a Technical Tutorial Score-basierte Diffusionsmodelle über stochastische Differentialgleichungen – ein technisches Tutorial 以记分为基础的通过斯托卡差异等式传播模型 – – 技术教学 2402.07487v3

Authors (2): Wenpin Tang, Hanyang Zhao

This is an expository article on score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDE). After a gentle introduction, we discuss the two pillars of diffusion modeling: sampling and score matching, which encompass SDE/ODE sampling, score matching efficiency, consistency models, and reinforcement learning. Short proofs are given to illustrate the main idea of the stated results. The article is primarily a technical introduction to the field, and practitioners may also find some analysis useful in designing new models or algorithms.
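
For orientation, the standard SDE formulation of score-based diffusion models can be summarized as follows. The notation (drift $f$, diffusion coefficient $g$, marginal density $p_t$, Brownian motions $W_t$ and $\bar{W}_t$) is a common convention assumed here and may differ from the tutorial's own symbols.

```latex
\begin{aligned}
\text{forward SDE:}\quad & \mathrm{d}X_t = f(t, X_t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t, \\
\text{reverse SDE:}\quad & \mathrm{d}X_t = \left[f(t, X_t) - g(t)^2\,\nabla_x \log p_t(X_t)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t, \\
\text{probability flow ODE:}\quad & \frac{\mathrm{d}X_t}{\mathrm{d}t} = f(t, X_t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(X_t).
\end{aligned}
```

The reverse SDE and the probability flow ODE are both integrated backward in time from $t = T$ to $t = 0$, with $\bar{W}_t$ a Brownian motion in reversed time. Sampling needs only the score $\nabla_x \log p_t$, which is the quantity that score matching estimates.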



Article 1012

Title@2025-07-05 (6): Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation

Title: Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation Nahtlose Integration von baumbasierten Positions-Embeddings in Transformer-Modelle zur Darstellung von Quellcodes 将植树定位嵌入式嵌入纳入源代码代表法的变换模型 2507.04003v1

Authors (2): Patryk Bartkowiak, Filip Graliński

Transformer-based models have demonstrated significant success in various source code representation tasks. Nonetheless, traditional positional embeddings employed by these models inadequately capture the hierarchical structure intrinsic to source code, typically represented as Abstract Syntax Trees (ASTs). To address this, we propose a novel tree-based positional embedding approach that explicitly encodes hierarchical relationships derived from ASTs, including node depth and sibling indices. These hierarchical embeddings are integrated into the transformer architecture, specifically enhancing the CodeBERTa model. We thoroughly evaluate our proposed model through masked language modeling (MLM) pretraining and clone detection fine-tuning tasks. Experimental results indicate that our Tree-Enhanced CodeBERTa consistently surpasses the baseline model in terms of loss, accuracy, F1 score, precision, and recall, emphasizing the importance of incorporating explicit structural information into transformer-based representations of source code.
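
To make "node depth and sibling indices" concrete, here is a minimal sketch that extracts both quantities from a Python AST using the standard `ast` module. It only illustrates the structural features named in the abstract. The function name `tree_positions` is hypothetical, and how the paper turns these indices into embeddings is not shown here.

```python
import ast


def tree_positions(source: str):
    """Return (node_type, depth, sibling_index) for every AST node, in pre-order.

    depth is the distance from the root; sibling_index is the node's position
    among its parent's children. Both are illustrative features only.
    """
    root = ast.parse(source)
    rows = []

    def visit(node, depth, sibling_index):
        rows.append((type(node).__name__, depth, sibling_index))
        for i, child in enumerate(ast.iter_child_nodes(node)):
            visit(child, depth + 1, i)

    visit(root, 0, 0)
    return rows


if __name__ == "__main__":
    for name, depth, sibling in tree_positions("x = a + b"):
        print(f"{'  ' * depth}{name} depth={depth} sibling={sibling}")
```

Each (depth, sibling index) pair could then index a learned embedding table whose output is added to the token embedding, which is one common way to inject such features. That design is an assumption here, not a description of Tree-Enhanced CodeBERTa.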


Article 1013

Title@2025-07-05 (6): Fast Re-Trainable Attention Autoencoder for Liquid Sensor Anomaly Detection at the Edge

Title: Fast Re-Trainable Attention Autoencoder for Liquid Sensor Anomaly Detection at the Edge Schnelle, retrainierbare Aufmerksamkeit Autoencoder für Flüssigkeitssensoranomalienerkennung an der Kante 边边缘液体感应器异常探测快速再培训注意自动编码器 2507.03995v1

Authors (1): Seongyun Choi

A lightweight, edge-deployable pipeline is proposed for detecting sensor anomalies in chemistry and biology laboratories. A custom PCB captures seven sensor channels and streams them over the local network. An Attention-based One-Class Autoencoder reaches a usable state after training on only thirty minutes of normal data. Despite the small data set, the model already attains an F1 score of 0.72, a precision of 0.89, and a recall of 0.61 when tested on synthetic micro-anomalies. The trained network is converted into a TensorFlow-Lite binary of about 31 kB and runs on an Advantech ARK-1221L, a fan-less x86 edge device without AVX instructions; end-to-end inference latency stays below two seconds. The entire collect-train-deploy workflow finishes within one hour, which demonstrates that the pipeline adapts quickly whenever a new liquid or sensor is introduced.
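
The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic. The helper name `f1_score` is ours and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract.
reported_precision = 0.89
reported_recall = 0.61

if __name__ == "__main__":
    f1 = f1_score(reported_precision, reported_recall)
    # 2 * 0.89 * 0.61 / (0.89 + 0.61) = 1.0858 / 1.5
    print(f"F1 = {f1:.2f}")
```

The result rounds to 0.72, matching the figure in the abstract. The gap between high precision and lower recall means the model raises few false alarms but misses a share of the synthetic micro-anomalies.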


Article 1014

Title@2025-07-05 (6): MalVol-25: A Diverse, Labelled and Detailed Volatile Memory Dataset for Malware Detection and Response Testing and Validation

Title: MalVol-25: A Diverse, Labelled and Detailed Volatile Memory Dataset for Malware Detection and Response Testing and Validation MalVol-25: Ein vielfältiges, beschriftetes und detailliertes Speicherdatensatz für Malware-Erkennung und -Ansprechtests und -Validierung MalVol-25:用于检测和反应测试及校验的多种、标签和详细、挥发性记忆数据集 2507.03993v1

Authors (3): Dipo Dunsin, Mohamed Chahine Ghanem, Eduardo Almeida Palmieri

This paper addresses the critical need for high-quality malware datasets that support advanced analysis techniques, particularly machine learning and agentic AI frameworks. Existing datasets often lack diversity, comprehensive labelling, and the complexity necessary for effective machine learning and agent-based AI training. To fill this gap, we developed a systematic approach for generating a dataset that combines automated malware execution in controlled virtual environments with dynamic monitoring tools. The resulting dataset comprises clean and infected memory snapshots across multiple malware families and operating systems, capturing detailed behavioural and environmental features. Key design decisions include applying ethical and legal compliance, thorough validation using both automated and manual methods, and comprehensive documentation to ensure replicability and integrity. The dataset’s distinctive features enable modelling system states and transitions, facilitating RL-based malware detection and response strategies. This resource is significant for advancing adaptive cybersecurity defences and digital forensic research. Its scope supports diverse malware scenarios and offers potential for broader applications in incident response and automated threat mitigation.


Article 1015

Title: Adaptive Inference-Time Scaling via Cyclic Diffusion Search Adaptive Inferenz-Zeitskalierung über zyklische Diffusionssuche 通过cycclic传播搜索进行适应性推导-时间缩放 2505.14036v3

Authors (7): Gyubin Lee, Truong Nhat Nguyen Bao, Jaesik Yoon, Dongwoo Lee, Minsu Kim, Yoshua Bengio, Sungjin Ahn

Diffusion models have demonstrated strong generative capabilities across domains ranging from image synthesis to complex reasoning tasks. However, most inference-time scaling methods rely on fixed denoising schedules, limiting their ability to adaptively allocate computation based on instance difficulty or task-specific demands. We introduce the challenge of adaptive inference-time scaling (dynamically adjusting computational effort during inference) and propose Adaptive Bi-directional Cyclic Diffusion (ABCD), a flexible, search-based inference framework. ABCD refines outputs through bi-directional diffusion cycles while adaptively controlling exploration depth and termination. It comprises three components: Cyclic Diffusion Search, Automatic Exploration-Exploitation Balancing, and Adaptive Thinking Time. Experiments show that ABCD improves performance across diverse tasks while maintaining computational efficiency.


Article 1016

Title@2025-07-05 (6): Improving RL Exploration for LLM Reasoning through Retrospective Replay

Title: Improving RL Exploration for LLM Reasoning through Retrospective Replay Verbesserung der RL Exploration für LLM Reasoning durch retrospektive Replay 通过回溯性回放改进LL探索LLM 2504.14363v2

Authors (7): Shihan Dou, Muling Wu, Jingwen Xu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Reinforcement learning (RL) has increasingly become a pivotal technique in the post-training of large language models (LLMs). The effective exploration of the output space is essential for the success of RL. We observe that for complex problems, during the early stages of training, the model exhibits strong exploratory capabilities and can identify promising solution ideas. However, its limited capability at this stage prevents it from successfully solving these problems. The early suppression of these potentially valuable solution ideas by the policy gradient hinders the model’s ability to revisit and re-explore these ideas later. Consequently, although the LLM’s capabilities improve in the later stages of training, it still struggles to effectively address these complex problems. To address this exploration issue, we propose a novel algorithm named Retrospective Replay-based Reinforcement Learning (RRL), which introduces a dynamic replay mechanism throughout the training process. RRL enables the model to revisit promising states identified in the early stages, thereby improving its efficiency and effectiveness in exploration. To evaluate the effectiveness of RRL, we conduct extensive experiments on complex reasoning tasks, including mathematical reasoning and code generation, and general dialogue tasks. The results indicate that RRL maintains high exploration efficiency throughout the training period, significantly enhancing the effectiveness of RL in optimizing LLMs for complicated reasoning tasks. Moreover, it also improves the performance of RLHF, making the model both safer and more helpful.


Article 1017

Title@2025-07-05 (6): Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data

Title: Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data Real-TabPFN: Tabuläre Stiftungsmodelle durch fortgesetztes Pre-Training mit Real-World Daten verbessern Real-TabPPFN:通过继续以真实世界数据进行预培训,改进图表基础模型 2507.03971v1

Authors (6): Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel Müller, Frank Hutter

Foundation models for tabular data, like TabPFN, achieve strong performance on small datasets when pre-trained solely on synthetic data. We show that this performance can be significantly boosted by a targeted continued pre-training phase. Specifically, we demonstrate that leveraging a small, curated collection of large, real-world datasets for continued pre-training yields superior downstream predictive accuracy compared to using broader, potentially noisier corpora like CommonCrawl or GitTables. Our resulting model, Real-TabPFN, achieves substantial performance gains on 29 datasets from the OpenML AutoML Benchmark.


Article 1018

Title@2025-07-05 (6): Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Title: Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Open-Reasoner-Zero: Ein Open Source Ansatz zur Skalierung von Verstärkungslernen am Basismodell 开放反馈-零:基于基础模式的扩大强化学习的开放源码方法 2503.24290v2

Authors (6): Jingcheng Hu, Yinmin Zhang, Qi Han, Daxin Jiang, Xiangyu Zhang, Heung-Yeung Shum

We introduce Open-Reasoner-Zero, the first open source implementation of large-scale reasoning-oriented RL training on the base model focusing on scalability, simplicity and accessibility. Through extensive experiments, we demonstrate that a minimalist approach, vanilla PPO with GAE ($\lambda=1$, $\gamma=1$) and straightforward rule-based rewards, without any KL regularization, is sufficient to scale up both benchmark performance and response length, replicating the scaling phenomenon observed in DeepSeek-R1-Zero. Using the same base model, Qwen2.5-32B base, as DeepSeek-R1-Zero-Qwen-32B, our implementation achieves superior performance across AIME2024, MATH500, and GPQA Diamond, while demonstrating remarkable efficiency, requiring only 1/10 of the training steps compared to the DeepSeek-R1-Zero pipeline. Moreover, our analysis not only covers training dynamics and ablation for critical design choices, but also quantitatively shows how the learned critic in Reasoner-Zero training effectively identifies and devalues repetitive response patterns, yielding more robust advantage estimations and enhancing training stability. Embracing the principles of open-source, we release our source code, training data, and various model weights, fostering reproducibility and encouraging further exploration of the properties of related models.
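
The setting GAE with lambda = 1 and gamma = 1 has a simple reading: the advantage at each step is the undiscounted return-to-go minus the critic's value estimate. The sketch below is a generic GAE routine for a single finished episode, not the project's code. The function name and the toy reward and value numbers are assumptions for illustration.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 1.0, lam: float = 1.0) -> List[float]:
    """Generalized Advantage Estimation for one finished episode.

    rewards[t] is the reward received after step t and values[t] is the
    critic's estimate at step t. The value after the last step is taken
    to be 0 because the episode has terminated.
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then the exponentially weighted sum of TD errors.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    # Toy episode: a single rule-based reward of 1 at the end (illustrative).
    rewards = [0.0, 0.0, 1.0]
    values = [0.2, 0.5, 0.9]
    print(gae_advantages(rewards, values))
```

With gamma = lam = 1 the TD errors telescope, so each advantage equals (sum of remaining rewards) minus values[t]. In the toy episode that gives approximately 0.8, 0.5 and 0.1.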


Article 1019

Title@2025-07-05 (6): A Comparative Study of Specialized LLMs as Dense Retrievers

Title: A Comparative Study of Specialized LLMs as Dense Retrievers Eine vergleichende Studie von spezialisierten LLMs als Dense Retriever 专业专利妇女作为 “ 大量开发 “ 的比较研究 2507.03958v1

Authors (3): Hengran Zhang, Keping Bi, Jiafeng Guo

While large language models (LLMs) are increasingly deployed as dense retrievers, the impact of their domain-specific specialization on retrieval effectiveness remains underexplored. This investigation systematically examines how task-specific adaptations in LLMs influence their retrieval capabilities, an essential step toward developing unified retrievers capable of handling text, code, images, and multimodal content. We conduct extensive experiments with eight Qwen2.5 7B LLMs, including base, instruction-tuned, code/math-specialized, long reasoning, and vision-language models across zero-shot retrieval settings and the supervised setting. For the zero-shot retrieval settings, we consider text retrieval from the BEIR benchmark and code retrieval from the CoIR benchmark. Further, to evaluate supervised performance, all LLMs are fine-tuned on the MS MARCO dataset. We find that mathematical specialization and the long reasoning capability cause consistent degradation in three settings, indicating conflicts between mathematical reasoning and semantic matching. The vision-language model and code-specialized LLMs demonstrate superior zero-shot performance compared to other LLMs, even surpassing BM25 on the code retrieval task, and maintain comparable performance to base LLMs in supervised settings. These findings suggest promising directions for the unified retrieval task leveraging cross-domain and cross-modal fusion.


Article 1020

Title@2025-07-05 (6): Optimizing Age of Trust and Throughput in Multi-Hop UAV-Aided IoT Networks

Title: Optimizing Age of Trust and Throughput in Multi-Hop UAV-Aided IoT Networks Optimierung des Zeitalters des Vertrauens und des Durchsatzes in Multi-Hop UAV-geförderten IoT-Netzwerken 优化多Hop UAV援助的IOT网络的信任和输送时间 2507.03950v1

Authors (7): Yizhou Luo, Kwan-Wu Chin, Ruyi Guan, Xi Xiao, Caimeng Wang, Jingyin Feng, Tengjiao He

Devices operating in Internet of Things (IoT) networks may be deployed across vast geographical areas and interconnected via multi-hop communications. Further, they may be unguarded. This makes them vulnerable to attacks and motivates operators to check on devices frequently. To this end, we propose and study an Unmanned Aerial Vehicle (UAV)-aided attestation framework for use in IoT networks with a solar-powered charging station. A key challenge is optimizing the trajectory of the UAV to ensure it attests as many devices as possible. A trade-off here is that devices being checked by the UAV are offline, which affects the amount of data delivered to a gateway. Another challenge is that the charging station experiences time-varying energy arrivals, which in turn affect the flight duration and charging schedule of the UAV. To address these challenges, we employ a Deep Reinforcement Learning (DRL) solution to optimize the UAV’s charging schedule and the selection of devices to be attested during each flight. The simulation results show that our solution reduces the average age of trust by 88% and throughput loss due to attestation by 30%.


Article 1021

Title: Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs Graph Kollaboratives Aufmerksamkeitsnetzwerk für Link-Vorhersage in Wissensgraphen 知识图中预测联系协作关注网络 2507.03947v1

Authors (1): Thanh Hoang-Minh

Knowledge graphs offer a structured representation of real-world entities and their relationships, enabling a wide range of applications from information retrieval to automated reasoning. In this paper, we conduct a systematic comparison between traditional rule-based approaches and modern deep learning methods for link prediction. We focus on KBGAT, a graph neural network model that leverages multi-head attention to jointly encode both entity and relation features within local neighborhood structures. To advance this line of research, we introduce GCAT (Graph Collaborative Attention Network), a refined model that enhances context aggregation and interaction between heterogeneous nodes. Experimental results on four widely-used benchmark datasets demonstrate that GCAT not only consistently outperforms rule-based methods but also achieves competitive or superior performance compared to existing neural embedding models. Our findings highlight the advantages of attention-based architectures in capturing complex relational patterns for knowledge graph completion tasks.


Article 1022

Title@2025-07-05 (6): Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance

Title: Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance Selbstlenkende, nicht-lineare, räumlich selektive Filter zur effizienten Extraktion von beweglichen Lautsprechern unter schwacher Anleitung 在薄弱指导下有效抽取流动发言者的自制深不滑空间选择过滤器 2507.02791v2

Authors (4): Jakob Kienegger, Alina Mannanova, Huajian Fang, Timo Gerkmann

Recent works on deep non-linear spatially selective filters demonstrate exceptional enhancement performance with computationally lightweight architectures for stationary speakers of known directions. However, to maintain this performance in dynamic scenarios, resource-intensive data-driven tracking algorithms become necessary to provide precise spatial guidance conditioned on the initial direction of a target speaker. As this additional computational overhead hinders application in resource-constrained scenarios such as real-time speech enhancement, we present a novel strategy that instead uses a low-complexity tracking algorithm in the form of a particle filter. Assuming a causal, sequential processing style, we introduce temporal feedback that leverages the enhanced speech signal of the spatially selective filter to compensate for the limited modeling capabilities of the particle filter. Evaluation on a synthetic dataset illustrates how the autoregressive interplay between the two algorithms drastically improves tracking accuracy and leads to strong enhancement performance. A listening test with real-world recordings complements these findings by indicating a clear trend towards our proposed self-steering pipeline as the preferred choice over comparable methods.


Article 1023

Title@2025-07-05 (6): Time Distributed Deep Learning Models for Purely Exogenous Forecasting: Application to Water Table Depth Prediction using Weather Image Time Series

Title: Time Distributed Deep Learning Models for Purely Exogenous Forecasting: Application to Water Table Depth Prediction using Weather Image Time Series Time Distributed Deep Learning Models for Purely Exogenous Forecasting: Application to Water Table Depth Prediction mit Weather Image Time Series 纯粹外源预测所用时间分配的深学习模型:利用气象图像时间序列对水表深度预测应用 2409.13284v2

Authors (4): Matteo Salis, Abdourrahmane M. Atto, Stefano Ferraris, Rosa Meo

Groundwater resources are one of the most relevant elements in the water cycle; developing models to accurately predict them is therefore a pivotal task in sustainable resource management. Deep Learning (DL) models have proven very effective in hydrology, especially when fed spatially distributed data (e.g. raster data). In many regions, hydrological measurements are difficult to obtain regularly or periodically in time, and in some cases the latest available data are not up to date. Conversely, weather data, which significantly impact water resources, are usually more readily available and of higher quality. We propose two DL models that predict the water table depth in the Grana-Maira catchment (Piemonte, IT) using only exogenous weather image time series. To handle the image time series, both models begin with a Time Distributed Convolutional Neural Network (TDC) that encodes the image available at each time step into a vector representation. The first model, TDC-LSTM, then uses a Sequential Module based on an LSTM layer to learn temporal relations and output the predictions. The second model, TDC-UnPWaveNet, instead uses a new version of the WaveNet architecture, adapted here to output a sequence that is shorter than the input and shifted entirely into the future relative to it. To this end, and to handle the different sequence lengths in the UnPWaveNet, we designed a new Channel Distributed layer, which acts like a Time Distributed layer but on the channel dimension, i.e. it applies the same set of operations to each channel of the input. Both TDC-LSTM and TDC-UnPWaveNet achieve remarkable results. However, the two models focus on different learnable information: TDC-LSTM focuses more on lowering the bias, while TDC-UnPWaveNet focuses more on the temporal dynamics, maximizing correlation and KGE.
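
The abstract contrasts bias, correlation and KGE without defining the last of these. The sketch below assumes the commonly used Kling-Gupta Efficiency, which combines correlation, a variability ratio and a bias ratio. The abstract does not say which KGE variant the authors use, so treat this as an illustration of the metric family only. The function name `kge` is ours.

```python
import math
from statistics import mean, pstdev


def kge(simulated, observed):
    """Kling-Gupta Efficiency and its three components.

    Returns (kge, r, alpha, beta) where r is the Pearson correlation,
    alpha = std(simulated) / std(observed) and beta = mean(simulated) /
    mean(observed). A perfect prediction gives (1, 1, 1, 1).
    Assumes non-constant series and a non-zero observed mean.
    """
    mu_s, mu_o = mean(simulated), mean(observed)
    sd_s, sd_o = pstdev(simulated), pstdev(observed)
    cov = mean([(s - mu_s) * (o - mu_o) for s, o in zip(simulated, observed)])
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    score = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return score, r, alpha, beta


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    # Perfectly correlated, but with doubled mean and spread.
    print(kge([2.0, 4.0, 6.0, 8.0], observed))
```

The example shows why the two goals can diverge: a prediction can track the temporal dynamics perfectly (r = 1) and still score poorly on KGE when its mean or spread is off.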


Article 1024

Title@2025-07-05 (6): Hierarchical graph sampling based minibatch learning with chain preservation and variance reduction

Title: Hierarchical graph sampling based minibatch learning with chain preservation and variance reduction Hierarchisches Graphen-Sampling basiertes Minibatch-Lernen mit Kettenerhaltung und Varianzreduktion 以等级图表抽样抽样为基础的小型学习,同时保持链条和减少差异 2503.00860v7

Authors (2): Qia Hu, Bo Jiao

Graph sampling-based Graph Convolutional Networks (GCNs) decouple sampling from forward and backward propagation during minibatch training, enhancing scalability with respect to layer depth and graph size. We propose HIS_GCNs, a hierarchical importance sampling-based learning method. By constructing minibatches from sampled subgraphs, HIS_GCNs focuses on the importance of both the core and the periphery in a scale-free training graph. Specifically, it preserves the centrum of the core in most minibatches, which maintains connectivity between periphery nodes, and it samples periphery edges without core node interference, which allows longer chains composed entirely of low-degree nodes to remain within the same minibatch. HIS_GCNs can maximize the discrete Ricci curvature (i.e., the Ollivier-Ricci curvature) of the edges in a subgraph, enabling preservation of important chains for information propagation. This approach can achieve a low node embedding variance and a high convergence speed. Diverse experiments on Graph Neural Networks (GNNs) with node classification tasks confirmed the superior performance of HIS_GCNs in terms of both accuracy and training time. Open-source code is available at https://github.com/HuQiaCHN/HIS-GCN.


Article 1025

Title@2025-07-05 (6): Learning Physically Interpretable Atmospheric Models from Data with WSINDy

Title: Learning Physically Interpretable Atmospheric Models from Data with WSINDy Physikalisch interpretierbare atmosphärische Modelle aus Daten mit WSINDy lernen WSINDy 数据产生的学习物理上可解释的大气模型 2501.00738v2

Authors (4): Seth Minor, Daniel A. Messenger, Vanja Dukic, David M. Bortz

The multiscale and turbulent nature of Earth’s atmosphere has historically rendered accurate weather modeling a hard problem. Recently, there has been an explosion of interest surrounding data-driven approaches to weather modeling, which in many cases show improved forecasting accuracy and computational efficiency when compared to traditional methods. However, many of the current data-driven approaches employ highly parameterized neural networks, often resulting in uninterpretable models and limited gains in scientific understanding. In this work, we address the interpretability problem by explicitly discovering partial differential equations governing atmospheric phenomena, identifying symbolic mathematical models with direct physical interpretations. The purpose of this paper is to demonstrate that, in particular, the Weak form Sparse Identification of Nonlinear Dynamics (WSINDy) algorithm can learn effective atmospheric models from both simulated and assimilated data. Our approach adapts the standard WSINDy algorithm to work with high-dimensional fluid data of arbitrary spatial dimension.


Article 1026

Title@2025-07-05 (6): MCST-Mamba: Multivariate Mamba-Based Model for Traffic Prediction

Title: MCST-Mamba: Multivariate Mamba-Based Model for Traffic Prediction MCST-Mamba: Multivariates Mamba-basiertes Modell für Verkehrsvorhersage MCST-Mamba:多变 Mamba 流量预测模型 2507.03927v1

Authors (3): Mohamed Hamad, Mohamed Mabrok, Nizar Zorba

Accurate traffic prediction plays a vital role in intelligent transportation systems by enabling efficient routing, congestion mitigation, and proactive traffic control. However, forecasting is challenging due to the combined effects of dynamic road conditions, varying traffic patterns across different locations, and external influences such as weather and accidents. Traffic data often consists of several interrelated measurements (such as speed, flow and occupancy), yet many deep-learning approaches either predict only one of these variables or require a separate model for each. This limits their ability to capture joint patterns across channels. To address this, we introduce the Multi-Channel Spatio-Temporal (MCST) Mamba model, a forecasting framework built on the Mamba selective state-space architecture that natively handles multivariate inputs and simultaneously models all traffic features. The proposed MCST-Mamba model integrates adaptive spatio-temporal embeddings and separates the modeling of temporal sequences and spatial sensor interactions into two dedicated Mamba blocks, improving representation learning. Unlike prior methods that evaluate on a single channel, we assess MCST-Mamba across all traffic features at once, aligning more closely with how congestion arises in practice. Our results show that MCST-Mamba achieves strong predictive performance with a lower parameter count compared to baseline models.


Article 1027

Title@2025-07-05 (6): Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model

Title: Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model Kombination von Graph-Neural-Netzwerken und gemischter Integer-Linear-Programmierung für molekulare Inferenz unter dem zweilagigen Modell 在双面模型下结合图形神经网络和混合整数线性线性规划,用于分子推断 2507.03920v1

Authors (5): Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

Recently, a novel two-phase framework named mol-infer has been proposed for inferring chemical compounds with prescribed abstract structures and desired property values. The framework is primarily based on using mixed integer linear programming (MILP) to simulate the computational process of machine learning methods and to describe the necessary and sufficient conditions for such a chemical graph to exist. Existing approaches usually first convert the chemical compounds into handcrafted feature vectors to construct prediction functions, but because the need for tractability in the MILP formulation limits the kinds of descriptors that can be used, the learning performance on datasets for some properties is not good enough. Poor learning performance can greatly lower the quality of the inferred chemical graphs, so improving it is of great importance. On the other hand, graph neural networks (GNNs) offer a promising machine learning method that takes chemical graphs directly as input, and many existing GNN-based approaches to molecular property prediction have shown better learning performance than traditional approaches based on feature vectors. In this study, we develop a molecular inference framework based on mol-infer, namely mol-infer-GNN, that uses a GNN as the learning method while keeping the great flexibility that the two-layered model provides for the abstract structure of the chemical graph to be inferred. We conducted computational experiments on the QM9 dataset to show that our proposed GNN model can obtain satisfactory learning performance for some properties despite its simple structure, and can infer small chemical graphs comprising up to 20 non-hydrogen atoms within reasonable computational time.


Article 1028

Title: Consistency-Aware Padding for Incomplete Multi-Modal Alignment Clustering Based on Self-Repellent Greedy Anchor Search Konsistenz-Bewusst-Padding für unvollständige Multi-Modal Alignment Clustering basierend auf selbst-abweisender Greedy Ankersuche 以自应贪婪锁定搜索为基础,为不完全多模式对齐聚合集成设置一致软件面 2507.03917v1

Authors (5): Shubin Ma, Liang Zhao, Mingdong Lu, Yifan Guo, Bo Xu

Multimodal representations describe the characteristics of real-world data samples faithfully and effectively by capturing their complementary information. However, the collected data are often incomplete and misaligned due to factors such as inconsistent sensor frequencies and device malfunctions. Existing research has not effectively addressed the issue of filling in missing data in scenarios where multiview data are both imbalanced and misaligned. Instead, it relies on class-level alignment of the available data. As a result, some data samples are not well matched, which degrades the quality of data fusion. In this paper, we propose Consistency-Aware Padding for Incomplete Multimodal Alignment Clustering Based on Self-Repellent Greedy Anchor Search (CAPIMAC) to tackle the problem of filling in imbalanced and misaligned data in multimodal datasets. Specifically, we propose a self-repellent greedy anchor search module (SRGASM), which employs a self-repellent random walk combined with a greedy algorithm to identify anchor points for re-representing incomplete and misaligned multimodal data. Subsequently, based on noise-contrastive learning, we design a consistency-aware padding module (CAPM) to effectively interpolate and align imbalanced and misaligned data, thereby improving the quality of multimodal data fusion. Experimental results demonstrate the superiority of our method on benchmark datasets. The code will be publicly released at https://github.com/Autism-mm/CAPIMAC.git.


Article 1029

Title@2025-07-05 (6): Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces

Title: Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces Rückkehr des Latent Space COWBOYS: Der Einsatz von VAEs für die Bayesian-Optimierung strukturierter Räume neu denken 冷层空间COWBOYS的返回:重新思考如何利用VAEs优化Bayesian结构空间 2507.03910v1

Authors (3): Henry B. Moss, Sebastian W. Ober, Tom Diethe

Bayesian optimisation in the latent space of a Variational AutoEncoder (VAE) is a powerful framework for optimisation tasks over complex structured domains, such as the space of scientifically interesting molecules. However, existing approaches tightly couple the surrogate and generative models, which can lead to suboptimal performance when the latent space is not tailored to specific tasks, which in turn has led to the proposal of increasingly sophisticated algorithms. In this work, we explore a new direction, instead proposing a decoupled approach that trains a generative model and a Gaussian Process (GP) surrogate separately, then combines them via a simple yet principled Bayesian update rule. This separation allows each component to focus on its strengths – structure generation from the VAE and predictive modelling by the GP. We show that our decoupled approach improves our ability to identify high-potential candidates in molecular optimisation problems under constrained evaluation budgets.


Article 1030

Title@2025-07-05 (6): Risk-sensitive Actor-Critic with Static Spectral Risk Measures for Online and Offline Reinforcement Learning

Title: Risk-sensitive Actor-Critic with Static Spectral Risk Measures for Online and Offline Reinforcement Learning Risikosensitive Aktor-Kritik mit statischen Spektralrisikomaßnahmen für Online- und Offline-Verstärkungstraining 与在线和离线强化学习的静态频谱风险衡量措施有关的对风险敏感的行为者-批评 2507.03900v1

Authors (2): Mehrdad Moghimi, Hyejin Ku

The development of Distributional Reinforcement Learning (DRL) has introduced a natural way to incorporate risk sensitivity into value-based and actor-critic methods by employing risk measures other than expectation in the value function. While this approach is widely adopted in many online and offline RL algorithms due to its simplicity, the naive integration of risk measures often results in suboptimal policies. This limitation can be particularly harmful in scenarios where the need for effective risk-sensitive policies is critical and worst-case outcomes carry severe consequences. To address this challenge, we propose a novel framework for optimizing static Spectral Risk Measures (SRM), a flexible family of risk measures that generalizes objectives such as CVaR and Mean-CVaR, and enables the tailoring of risk preferences. Our method is applicable to both online and offline RL algorithms. We establish theoretical guarantees by proving convergence in the finite state-action setting. Moreover, through extensive empirical evaluations, we demonstrate that our algorithms consistently outperform existing risk-sensitive methods in both online and offline environments across diverse domains.
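
To illustrate how a spectral risk measure generalizes CVaR and Mean-CVaR, the sketch below evaluates an empirical spectral risk measure on a sample of returns. It covers only the risk measure, not the paper's actor-critic algorithm. The sign convention (higher return is better, so risk aversion puts weight on the lowest quantiles) and all function names are assumptions made for this example.

```python
from typing import Callable, Sequence


def spectral_risk(returns: Sequence[float],
                  spectrum_cdf: Callable[[float], float]) -> float:
    """Empirical spectral risk measure of sampled returns (higher is better).

    spectrum_cdf(u) is the integral of the risk spectrum phi over [0, u],
    with spectrum_cdf(0) == 0 and spectrum_cdf(1) == 1. The i-th smallest
    return receives weight spectrum_cdf(i / n) - spectrum_cdf((i - 1) / n).
    """
    ordered = sorted(returns)
    n = len(ordered)
    return sum(
        z * (spectrum_cdf((i + 1) / n) - spectrum_cdf(i / n))
        for i, z in enumerate(ordered)
    )


def cvar_spectrum(alpha: float) -> Callable[[float], float]:
    """CVaR at level alpha: phi(u) = 1 / alpha on [0, alpha], 0 elsewhere."""
    return lambda u: min(u, alpha) / alpha


def mean_cvar_spectrum(alpha: float, weight: float) -> Callable[[float], float]:
    """Mean-CVaR: weight * mean + (1 - weight) * CVaR at level alpha."""
    cvar = cvar_spectrum(alpha)
    return lambda u: weight * u + (1 - weight) * cvar(u)


if __name__ == "__main__":
    returns = [-10, 0, 1, 2, 3, 4, 5, 6, 7, 8]
    print(spectral_risk(returns, cvar_spectrum(1.0)))            # plain mean
    print(spectral_risk(returns, cvar_spectrum(0.2)))            # worst 20%
    print(spectral_risk(returns, mean_cvar_spectrum(0.2, 0.5)))  # mixture
```

Choosing a different non-increasing spectrum changes the risk preference without changing the evaluation routine, which is the flexibility the abstract attributes to this family.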



Article 1031

Title@2025-07-05 (6): Transformer Model for Alzheimer’s Disease Progression Prediction Using Longitudinal Visit Sequences

Title: Transformer Model for Alzheimer’s Disease Progression Prediction Using Longitudinal Visit Sequences Transformer-Modell für Alzheimer-Krankheit Progressionsvorhersage mit Längsschnitt Besuch Sequenzen 利用纵向访问序列对阿尔茨海默氏病变异变异模型 2507.03899v1

Authors (3): Mahdi Moghaddami, Clayton Schubring, Mohammad-Reza Siadat

Alzheimer’s disease (AD) is a neurodegenerative disorder with no known cure that affects tens of millions of people worldwide. Early detection of AD is critical for timely intervention to halt or slow the progression of the disease. In this study, we propose a Transformer model for predicting the stage of AD progression at a subject’s next clinical visit using features from a sequence of visits extracted from the subject’s visit history. We also rigorously compare our model to recurrent neural networks (RNNs) such as long short-term memory (LSTM), gated recurrent unit (GRU), and minimalRNN and assess their performances based on factors such as the length of prior visits and data imbalance. We test the importance of different feature categories and visit history, as well as compare the model to a newer Transformer-based model optimized for time series. Our model demonstrates strong predictive performance despite missing visits and missing features in available visits, particularly in identifying converter subjects – individuals transitioning to more severe disease stages – an area that has posed significant challenges in longitudinal prediction. The results highlight the model’s potential in enhancing early diagnosis and patient outcomes.



Article 1032

Title@2025-07-05 (6): GenAI-Powered Inference

Title: GenAI-Powered Inference GenAI-Powered-Schlussfolgerung GenAI 授权推论 2507.03897v1

Authors (2): Kosuke Imai, Kentaro Nakamura

We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models - such as large language models and diffusion models - not only to generate unstructured data at scale but also to extract low-dimensional representations that capture their underlying structure. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates’ facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.



Article 1033

Title@2025-07-05 (6): LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records

Title: LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records LabTOP: Ein einheitliches Modell für Labortestergebnisse Vorhersage auf elektronische Gesundheitsdatensätze LabTOP:电子健康记录实验室试验结果预测统一模型 2502.14259v5

Authors (3): Sujeong Im, Jungwoo Oh, Edward Choi

Lab tests are fundamental for diagnosing diseases and monitoring patient conditions. However, frequent testing can be burdensome for patients, and test results may not always be immediately available. To address these challenges, we propose LabTOP, a unified model that predicts lab test outcomes by leveraging a language modeling approach on EHR data. Unlike conventional methods that estimate only a subset of lab tests or classify discrete value ranges, LabTOP performs continuous numerical predictions for a diverse range of lab items. We evaluate LabTOP on three publicly available EHR datasets and demonstrate that it outperforms existing methods, including traditional machine learning models and state-of-the-art large language models. We also conduct extensive ablation studies to confirm the effectiveness of our design choices. We believe that LabTOP will serve as an accurate and generalizable framework for lab test outcome prediction, with potential applications in clinical decision support and early detection of critical conditions.



Article 1034

Title@2025-07-05 (6): Unraveling the Black-box Magic: An Analysis of Neural Networks’ Dynamic Local Extrema

Title: Unraveling the Black-box Magic: An Analysis of Neural Networks’ Dynamic Local Extrema Entwirren der Black-Box-Magie: Eine Analyse der dynamischen lokalen Neuralnetzwerke 解开黑箱魔术:神经网络动态局部极端现象分析 2507.03885v1

Authors (1): Shengjian Chen

We point out that neural networks are not black boxes, and their generalization stems from the ability to dynamically map a dataset to the local extrema of the model function. We further prove that the number of local extrema in a neural network is positively correlated with the number of its parameters, and on this basis, we give a new algorithm that is different from the back-propagation algorithm, which we call the extremum-increment algorithm. Some difficult situations, such as gradient vanishing and overfitting, can be reasonably explained and dealt with in this framework.



Article 1035

Title@2025-07-05 (6): Leo Breiman, the Rashomon Effect, and the Occam Dilemma

Title: Leo Breiman, the Rashomon Effect, and the Occam Dilemma Leo Breiman, der Rashomon-Effekt und das Occam-Dilemma 里欧布雷曼,拉索蒙效应 和奥卡姆困境 2507.03884v1

Authors (1): Cynthia Rudin

In the famous Two Cultures paper, Leo Breiman provided a visionary perspective on the cultures of “data models” (modeling with consideration of data generation) versus “algorithmic models” (vanilla machine learning models). I provide a modern perspective on these approaches. One of Breiman’s key arguments against data models is the “Rashomon Effect,” which is the existence of many different-but-equally-good models. The Rashomon Effect implies that data modelers would not be able to determine which model generated the data. Conversely, one of his core arguments in favor of data models is simplicity, as he claimed there exists an “Occam Dilemma,” i.e., an accuracy-simplicity tradeoff. After 25 years of powerful computers, it has become clear that this claim is not generally true, in that algorithmic models do not need to be complex to be accurate; however, there are nuances that help explain Breiman’s logic, specifically, that by “simple,” he appears to consider only linear models or unoptimized decision trees. Interestingly, the Rashomon Effect is a key tool in proving the nullification of the Occam Dilemma. To his credit, though, Breiman did not have the benefit of modern computers, with which my observations are much easier to make. Breiman’s goal for interpretability was somewhat intertwined with causality: simpler models can help reveal which variables have a causal relationship with the outcome. However, I argue that causality can be investigated without the use of single models, whether or not they are simple. Interpretability is useful in its own right, and I think Breiman knew that too. Technically, my modern perspective does not belong to either of Breiman’s Two Cultures, but shares the goals of both of them - causality, simplicity, accuracy - and shows that these goals can be accomplished in other ways, without the limitations Breiman was concerned about.



Article 1036

Title@2025-07-05 (6): On the Convergence of Gradient Descent on Learning Transformers with Residual Connections

Title: On the Convergence of Gradient Descent on Learning Transformers with Residual Connections Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen 关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同 2506.05249v2

Authors (3): Zhen Qin, Jinxin Zhou, Zhihui Zhu

Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their training dynamics. Existing research predominantly examines isolated components, such as self-attention mechanisms and feedforward networks, without thoroughly investigating the interdependencies between these components, especially when residual connections are present. In this paper, we aim to bridge this gap by analyzing the convergence behavior of a structurally complete yet single-layer Transformer, comprising self-attention, a feedforward network, and residual connections. We demonstrate that, under appropriate initialization, gradient descent exhibits a linear convergence rate, where the convergence speed is determined by the minimum and maximum singular values of the output matrix from the attention layer. Moreover, our analysis reveals that residual connections serve to ameliorate the ill-conditioning of this output matrix, an issue stemming from the low-rank structure imposed by the softmax operation, thereby promoting enhanced optimization stability. We also extend our theoretical findings to a multi-layer Transformer architecture, confirming the linear convergence rate of gradient descent under suitable initialization. Empirical results corroborate our theoretical insights, illustrating the beneficial role of residual connections in promoting convergence stability.



Article 1037

Title@2025-07-05 (6): Neural models of multiscale systems: conceptual limitations, stochastic parametrizations, and a climate application

Title: Neural models of multiscale systems: conceptual limitations, stochastic parametrizations, and a climate application Neurale Modelle von multiskaligen Systemen: konzeptionelle Einschränkungen, stochastische Parametrisierungen und eine Klimaanwendung 多尺度系统的神经模型:概念局限性、随机超称和气候应用 2506.22552v2

Authors (1): Fabrizio Falasca

This work explores key conceptual limitations in data-driven modeling of multiscale dynamical systems, focusing on neural emulators and stochastic climate modeling. A skillful climate model should capture both stationary statistics and responses to external perturbations. While current autoregressive neural models often reproduce the former, they typically struggle with the latter. We begin by analyzing a low-dimensional dynamical system to expose, by analogy, fundamental limitations that persist in high-dimensional settings. Specifically, we construct neural stochastic models under two scenarios: one where the full state vector is observed, and another with only partial observations (i.e., a subset of variables). In the first case, the models accurately capture both equilibrium statistics and forced responses in ensemble mean and variance. In the more realistic case of partial observations, two key challenges emerge: (i) identifying the proper variables to model, and (ii) parameterizing the influence of unobserved degrees of freedom. These issues are not specific to neural networks but reflect fundamental limitations of data-driven modeling and the need to target the slow dynamics of the system. We argue that physically grounded strategies – such as coarse-graining and stochastic parameterizations – are critical, both conceptually and practically, for the skillful emulation of complex systems like the coupled climate system. Building on these insights, we turn to a more realistic application: a stochastic reduced neural model of the sea surface temperature field and the net radiative flux at the top of the atmosphere, assessing its stationary statistics, response to temperature forcing, and interpretability.



Article 1038

Title@2025-07-05 (6): Predicting function of evolutionarily implausible DNA sequences

Title: Predicting function of evolutionarily implausible DNA sequences Vorhersagefunktion evolutionär unglaubwürdiger DNA-Sequenzen 进化性DNA序列的预测功能 2506.10271v2

Authors (3): Shiyu Jiang, Xuyin Liu, Zitong Jerry Wang

Genomic language models (gLMs) show potential for generating novel, functional DNA sequences for synthetic biology, but doing so requires them to learn not just evolutionary plausibility, but also sequence-to-function relationships. We introduce a set of prediction tasks called Nullsettes, which assesses a model’s ability to predict loss-of-function mutations created by translocating key control elements in synthetic expression cassettes. Across 12 state-of-the-art models, we find that mutation effect prediction performance strongly correlates with the predicted likelihood of the nonmutant. Furthermore, the range of likelihood values predictive of strong model performance is highly dependent on sequence length. Our work highlights the importance of considering both sequence likelihood and sequence length when using gLMs for mutation effect prediction.



Article 1039

Title@2025-07-05 (6): LoRAShield: Data-Free Editing Alignment for Secure Personalized LoRA Sharing

Title: LoRAShield: Data-Free Editing Alignment for Secure Personalized LoRA Sharing LoRAShield: Datenfreie Bearbeitungsausrichtung für die sichere Personalisierte LoRA-Sharing LoRAShild: 安全个人化LORA共享的无数据编辑一致 2507.07056v1

Authors (9): Jiahao Chen, junhao li, Yiming Wang, Zhe Ma, Yi Jiang, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji

The proliferation of Low-Rank Adaptation (LoRA) models has democratized personalized text-to-image generation, enabling users to share lightweight models (e.g., personal portraits) on platforms like Civitai and Liblib. However, this “share-and-play” ecosystem introduces critical risks: benign LoRAs can be weaponized by adversaries to generate harmful content (e.g., political, defamatory imagery), undermining creator rights and platform safety. Existing defenses like concept-erasure methods focus on full diffusion models (DMs), neglecting LoRA’s unique role as a modular adapter and its vulnerability to adversarial prompt engineering. To bridge this gap, we propose LoRAShield, the first data-free editing framework for securing LoRA models against misuse. Our platform-driven approach dynamically edits and realigns LoRA’s weight subspace via adversarial optimization and semantic augmentation. Experimental results demonstrate that LoRAShield achieves remarkable effectiveness, efficiency, and robustness in blocking malicious generations without sacrificing the functionality of the benign task. By shifting the defense to platforms, LoRAShield enables secure, scalable sharing of personalized models, a critical step toward trustworthy generative ecosystems.



Article 1040

Title@2025-07-05 (6): Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States

Title: Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States Verbesserung der adaptiven Verhaltensinterventionen mit LLM-Schlussfolgerungen aus den von den Teilnehmern benannten Staaten 加强适应性行为干预,使用LLM LLM 2507.03871v1

Authors (2): Karine Karine, Benjamin M. Marlin

The use of reinforcement learning (RL) methods to support health behavior change via personalized and just-in-time adaptive interventions is of significant interest to health and behavioral science researchers focused on problems such as smoking cessation support and physical activity promotion. However, RL methods are often applied to these domains using a small collection of context variables to mitigate the significant data scarcity issues that arise from practical limitations on the design of adaptive intervention trials. In this paper, we explore an approach to significantly expanding the state space of an adaptive intervention without impacting data efficiency. The proposed approach enables intervention participants to provide natural language descriptions of aspects of their current state. It then leverages inference with pre-trained large language models (LLMs) to better align the policy of a base RL method with these state descriptions. To evaluate our method, we develop a novel physical activity intervention simulation environment that generates text-based state descriptions conditioned on latent state variables using an auxiliary LLM. We show that this approach has the potential to significantly improve the performance of online policy learning methods.



Article 1041

Title@2025-07-05 (6): A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime

Title: A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime Eine strenge Verhaltensbewertung von CNNs mittels eines Data-Domain-Sampling-Regimes 利用数据域抽样制度对有线电视新闻网进行严格行为评估 2507.03866v1

Authors (5): Shuning Jiang, Wei-Lun Chao, Daniel Haehn, Hanspeter Pfister, Jian Chen

We present a data-domain sampling regime for quantifying CNNs’ graphic perception behaviors. This regime lets us evaluate CNNs’ ratio estimation ability in bar charts from three perspectives: sensitivity to training-test distribution discrepancies, stability to limited samples, and expertise relative to human observers. After analyzing 16 million trials from 800 CNN models and 6,825 trials from 113 human participants, we arrived at a simple and actionable conclusion: CNNs can outperform humans, and their biases simply depend on the training-test distance. We show evidence of this simple, elegant behavior of the machines when they interpret visualization images. osf.io/gfqc3 provides registration, the code for our sampling regime, and experimental results.



Article 1042

Title@2025-07-05 (6): Efficient generative adversarial networks using linear additive-attention Transformers

Title: Efficient generative adversarial networks using linear additive-attention Transformers Effiziente generative Adversarial-Netzwerke mit linearen Additive-Attention-Transformatoren 使用线性添加添加剂-意向变异器的高效基因化对抗性网络 2401.09596v5

Authors (2): Emilio Morales-Juarez, Gibran Fuentes-Pineda

Although the capacity of deep generative models for image generation, such as Diffusion Models (DMs) and Generative Adversarial Networks (GANs), has dramatically improved in recent years, much of their success can be attributed to computationally expensive architectures. This has limited their adoption and use to research laboratories and companies with large resources, while significantly raising the carbon footprint for training, fine-tuning, and inference. In this work, we present a novel GAN architecture which we call LadaGAN. This architecture is based on a linear attention Transformer block named Ladaformer. The main component of this block is a linear additive-attention mechanism that computes a single attention vector per head instead of the quadratic dot-product attention. We employ Ladaformer in both the generator and discriminator, which reduces the computational complexity and overcomes the training instabilities often associated with Transformer GANs. LadaGAN consistently outperforms existing convolutional and Transformer GANs on benchmark datasets at different resolutions while being significantly more efficient. Moreover, LadaGAN shows competitive performance compared to state-of-the-art multi-step generative models (e.g., DMs) using orders of magnitude fewer computational resources.



Article 1043

Title@2025-07-05 (6): OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference

Title: OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference OrthoRank: Tokenauswahl über Sink Token Orthogonalität für effiziente LLM-Inferenz OrthoRank:通过 Sink Token 条形对高效LLM 推论进行当量选择 2507.03865v1

Authors (3): Seungjun Shin, Jaehoon Oh, Dokwan Oh

Attention mechanisms are central to the success of large language models (LLMs), enabling them to capture intricate token dependencies and implicitly assign importance to each token. Recent studies have revealed the sink token, which receives disproportionately high attention despite its limited semantic role. In this paper, we first expand the relationship between the sink token and other tokens, moving beyond attention to explore their similarity in hidden states, considering the layer depth. We observe that as the layers get deeper, the cosine similarity between the normalized hidden states of the sink token and those of other tokens increases, and that the normalized hidden states of the sink token exhibit negligible changes. These observations imply that other tokens are consistently directed toward the sink token throughout the layers. Next, we propose a dynamic token selection method, called OrthoRank, using these findings to select important tokens. Specifically, in a certain layer, we define token importance by the speed at which the token moves toward the sink token. This is converted into orthogonality with the sink token, meaning that tokens that are more orthogonal to the sink token are assigned greater importance. Finally, through extensive experiments, we demonstrate that our method results in lower perplexity and higher zero-shot accuracy compared to layer pruning methods at the same sparsity ratio with comparable throughput, while also achieving superior performance on LongBench.
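The selection rule described above (tokens more orthogonal to the sink token are treated as more important) can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the function names, the use of the absolute cosine similarity as the orthogonality score, the assumption that the sink token sits at index 0, and the toy hidden states are all assumptions, and the abstract does not say how unselected tokens are handled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_orthogonal_tokens(hidden, k, sink_index=0):
    """Return the indices of the k non-sink tokens most orthogonal to the sink."""
    sink = hidden[sink_index]
    scores = [
        (abs(cosine(h, sink)), i)
        for i, h in enumerate(hidden)
        if i != sink_index
    ]
    scores.sort()  # smallest |cosine| first, i.e. most orthogonal first
    return sorted(i for _, i in scores[:k])


# Toy hidden states for five tokens at one layer (hypothetical values).
hidden_states = [
    [1.0, 0.0, 0.0],  # token 0: assumed sink token
    [0.9, 0.1, 0.0],  # nearly aligned with the sink
    [0.0, 1.0, 0.0],  # orthogonal to the sink
    [0.5, 0.5, 0.0],  # in between
    [0.1, 0.0, 1.0],  # nearly orthogonal to the sink
]

selected = select_orthogonal_tokens(hidden_states, k=2)
```

In this toy, tokens 2 and 4 are selected because their hidden states have the smallest cosine similarity with the sink token's hidden state.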



Article 1044

Title@2025-07-05 (6): Enhanced accuracy through ensembling of randomly initialized auto-regressive models for time-dependent PDEs

Title: Enhanced accuracy through ensembling of randomly initialized auto-regressive models for time-dependent PDEs Verbesserte Genauigkeit durch Assembling von zufällig initialisierten auto-regressiven Modellen für zeitabhängige PDEs 通过混合随机初始化自动递减模型,为时间依赖性项目设计实体组合随机初始化自动递减模型,提高准确性 2507.03863v1

Authors (4): Ishan Khurjekar, Indrashish Saha, Lori Graham-Brady, Somdatta Goswami

Systems governed by partial differential equations (PDEs) require computationally intensive numerical solvers to predict spatiotemporal field evolution. While machine learning (ML) surrogates offer faster solutions, autoregressive inference with ML models suffers from error accumulation over successive predictions, limiting their long-term accuracy. We propose a deep ensemble framework to address this challenge, where multiple ML surrogate models with random weight initializations are trained in parallel and aggregated during inference. This approach leverages the diversity of model predictions to mitigate error propagation while retaining the autoregressive strategy’s ability to capture the system’s time-dependent relations. We validate the framework on three PDE-driven dynamical systems - stress evolution in heterogeneous microstructures, Gray-Scott reaction-diffusion, and a planetary-scale shallow water system - demonstrating consistent reduction in error accumulation over time compared to individual models. Critically, the method requires only a few time steps as input, enabling full trajectory predictions with inference times significantly faster than numerical solvers. Our results highlight the robustness of ensemble methods in diverse physical systems and their potential as efficient and accurate alternatives to traditional solvers. The codes for this work are available on GitHub (https://github.com/Graham-Brady-Research-Group/AutoregressiveEnsemble_SpatioTemporal_Evolution).
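The idea of aggregating several independently trained surrogates during an autoregressive rollout can be illustrated with a scalar toy problem. The sketch below assumes the aggregation is a simple mean that is fed back as the next input at every step; the abstract does not specify the aggregation rule, so treat that as an assumption. The "models" are hypothetical stand-ins with fixed multiplicative errors, chosen so that the errors partially cancel; the toy shows the mechanism and is not evidence for the paper's results.

```python
def rollout_single(model, x0, steps):
    """Autoregressive rollout of one model: each prediction is the next input."""
    trajectory = [x0]
    x = x0
    for _ in range(steps):
        x = model(x)
        trajectory.append(x)
    return trajectory


def rollout_ensemble(models, x0, steps):
    """Autoregressive rollout where the ensemble mean is fed back at every step."""
    trajectory = [x0]
    x = x0
    for _ in range(steps):
        predictions = [m(x) for m in models]
        x = sum(predictions) / len(predictions)
        trajectory.append(x)
    return trajectory


# True one-step map of the toy system: x -> 0.9 * x.
def true_step(x):
    return 0.9 * x


# Hypothetical surrogates: each has a different fixed error in its coefficient.
errors = [0.03, -0.02, 0.01, -0.025]
models = [lambda x, e=e: (0.9 + e) * x for e in errors]

steps = 20
truth = rollout_single(true_step, 1.0, steps)
ensemble = rollout_ensemble(models, 1.0, steps)

ensemble_error = abs(ensemble[-1] - truth[-1])
single_errors = [
    abs(rollout_single(m, 1.0, steps)[-1] - truth[-1]) for m in models
]
```

Here the averaged coefficient is closer to the true one than any individual coefficient, so the ensemble's accumulated error after 20 steps is smaller than that of every single model.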



Article 1045

Title@2025-07-05 (6): Continual Recommender Systems

Title: Continual Recommender Systems Kontinuierliche Recommender-Systeme 连续建议系统 2507.03861v1

Authors (3): Hyunsik Yoo, SeongKu Kang, Hanghang Tong

Modern recommender systems operate in uniquely dynamic settings: user interests, item pools, and popularity trends shift continuously, and models must adapt in real time without forgetting past preferences. While existing tutorials on continual or lifelong learning cover broad machine learning domains (e.g., vision and graphs), they do not address recommendation-specific demands, such as balancing stability and plasticity per user, handling cold-start items, and optimizing recommendation metrics under streaming feedback. This tutorial aims to make a timely contribution by filling that gap. We begin by reviewing the background and problem settings, followed by a comprehensive overview of existing approaches. We then highlight recent efforts to apply continual learning to practical deployment environments, such as resource-constrained systems and sequential interaction settings. Finally, we discuss open challenges and future research directions. We expect this tutorial to benefit researchers and practitioners in recommender systems, data mining, AI, and information retrieval across academia and industry.



Article 1046

Title@2025-07-05 (6): Structured Generative Modeling with the Thermodynamic Kolmogorov-Arnold Model

Title: Structured Generative Modeling with the Thermodynamic Kolmogorov-Arnold Model Strukturierte Generative Modellierung mit dem Thermodynamischen Kolmogorov-Arnold-Modell 与热力科尔莫戈洛夫-阿诺尔德模型进行结构生成模型 2506.14167v2

Authors (1): Prithvi Raj

Learning an energy-based model (EBM) in the latent space of a top-down generative model offers an expressive and interpretable framework for text and image generation. However, it remains unclear how this interpretability can be systematically leveraged to guide model design, improve generative quality, and reduce training time. Moreover, the reliance on Langevin Monte Carlo (LMC) sampling presents challenges in efficiency and exploring multimodal latent distributions. In this work, we propose a novel adaptation of the Kolmogorov-Arnold representation theorem for generative modeling and introduce the Thermodynamic Kolmogorov-Arnold Model (T-KAM) as a new framework for incorporating structural and inductive biases. By constraining the prior to univariate relationships, T-KAM enables fast and exact inference via the inverse transform method. We also introduce a training strategy based on population-based LMC, which decomposes posterior sampling into a sequence of annealed distributions to improve multimodal exploration. We empirically demonstrate how inductive biases enable more efficient training strategies and compare our novel approaches to scaling and prior sampling. T-KAM provides an elegant balance among common trade-offs in generative modeling, offering fast inference, high sample quality, and stable training, while being naturally suited to upcoming Zettascale Computing Co. hardware and extendable to other high-impact research directions in generative intelligence.



Article 1047

Title@2025-07-05 (6): Taylor-Model Physics-Informed Neural Networks (PINNs) for Ordinary Differential Equations

Title: Taylor-Model Physics-Informed Neural Networks (PINNs) for Ordinary Differential Equations Taylor-Model Physics-informed Neural Networks (PINNs) für normale Differentialgleichungen 用于普通差异等式的Taylor-Model物理成形神经网络 2507.03860v1

Authors (5): Chandra Kanth Nagesh, Sriram Sankaranarayanan, Ramneet Kaur, Tuhin Sahai, Susmit Jha

We study the problem of learning neural network models for Ordinary Differential Equations (ODEs) with parametric uncertainties. Such neural network models capture the solution to the ODE over a given set of parameters, initial conditions, and range of times. Physics-Informed Neural Networks (PINNs) have emerged as a promising approach for learning such models that combine data-driven deep learning with symbolic physics models in a principled manner. However, the accuracy of PINNs degrades when they are used to solve an entire family of initial value problems characterized by varying parameters and initial conditions. In this paper, we combine symbolic differentiation and Taylor series methods to propose a class of higher-order models for capturing the solutions to ODEs. These models combine neural networks and symbolic terms: they use higher-order Lie derivatives and a Taylor series expansion obtained symbolically, with the remainder term modeled as a neural network. The key insight is that the remainder term can itself be modeled as a solution to a first-order ODE. We show how the use of these higher-order PINNs can improve accuracy using interesting but challenging ODE benchmarks. We also show that the resulting model can be quite useful for situations such as controlling uncertain physical systems modeled as ODEs.
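The structure of such a model, a symbolic Taylor expansion built from Lie derivatives plus a remainder term, can be written out as follows. This is a sketch of the standard Taylor expansion of an ODE solution rather than the paper's exact formulation; the symbols $f$, $p$, $x_0$, $n$, and $R_n$ are notation assumed here, and the scalar linear ODE is a worked example chosen for illustration. In the approach described above, $R_n$ is the part represented by a neural network.

```latex
% ODE with parameters p:  \dot{x} = f(x; p), \quad x(0) = x_0
% Lie derivatives of the state along the vector field f:
\mathcal{L}_f^{0} x = x, \qquad
\mathcal{L}_f^{k+1} x = \frac{\partial \bigl(\mathcal{L}_f^{k} x\bigr)}{\partial x}\, f(x; p)

% Order-n Taylor model with remainder term:
x(t) = \sum_{k=0}^{n} \frac{t^{k}}{k!}\, \bigl(\mathcal{L}_f^{k} x\bigr)(x_0) + R_n(t; x_0, p)

% Worked example: f(x; a) = -a x gives \mathcal{L}_f^{k} x = (-a)^{k} x, hence
x(t) = x_0 \sum_{k=0}^{n} \frac{(-a t)^{k}}{k!} + R_n(t; x_0, a),
\qquad
R_n(t; x_0, a) = x_0 e^{-a t} - x_0 \sum_{k=0}^{n} \frac{(-a t)^{k}}{k!}
```

In the worked example the symbolic part is the order-$n$ partial sum of the exponential series, and the remainder is exactly the gap to the true solution $x_0 e^{-at}$.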



Article 1048

Title@2025-07-05 (6): Towards Efficient Contrastive PAC Learning

Title: Towards Efficient Contrastive PAC Learning Auf dem Weg zu einem effizienten kontraproduktiven PAC-Lernen 争取高效率竞争和反腐败方案学习 2502.15962v2

Authors (1): Jie Shen

We study contrastive learning under the PAC learning framework. While a series of recent works have shown statistical results for learning under contrastive loss, based either on the VC-dimension or Rademacher complexity, their algorithms are inherently inefficient or do not imply PAC guarantees. In this paper, we consider contrastive learning of the fundamental concept of linear representations. Surprisingly, even under such a basic setting, the existence of efficient PAC learners is largely open. We first show that the problem of contrastive PAC learning of linear representations is intractable to solve in general. We then show that it can be relaxed to a semi-definite program when the distance between contrastive samples is measured by the $\ell_2$-norm. We then establish generalization guarantees based on Rademacher complexity, and connect them to PAC guarantees under certain contrastive large-margin conditions. To the best of our knowledge, this is the first efficient PAC learning algorithm for contrastive learning.



Article 1049

Title@2025-07-05 (6): Learning Representations for Independence Testing

Title: Learning Representations for Independence Testing Lernrepräsentanzen für Unabhängigkeitsprüfungen 独立测试学习代表 2409.06890v2

Authors (3): Nathaniel Xu, Feng Liu, Danica J. Sutherland

Many tools exist to detect dependence between random variables, a core question across a wide range of machine learning, statistical, and scientific endeavors. Although several statistical tests guarantee eventual detection of any dependence with enough samples, standard tests may require an exorbitant number of samples for detecting subtle dependencies between high-dimensional random variables with complex distributions. In this work, we study two related ways to learn powerful independence tests. First, we show how to construct powerful statistical tests with finite-sample validity by using variational estimators of mutual information, such as the InfoNCE or NWJ estimators. Second, we establish a close connection between these variational mutual information-based tests and tests based on the Hilbert-Schmidt Independence Criterion (HSIC); in particular, learning a variational bound (typically parameterized by a deep network) for mutual information is closely related to learning a kernel for HSIC. Finally, we show how to select a representation that maximizes the power of a test, rather than the statistic itself, in either setting; we term the former case a Neural Dependency Statistic (NDS). While HSIC power optimization has been recently considered in the literature, we correct some important misconceptions and expand to considering deep kernels. In our experiments, while all approaches can yield powerful tests with exact level control, optimized HSIC tests generally outperform the other approaches on difficult problems of detecting structured dependence.
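As background for the HSIC-based tests discussed above, here is a minimal sketch of a permutation test using the biased HSIC estimator with fixed Gaussian kernels. It is a plain baseline, not the learned-representation or power-optimized tests the abstract describes; the function names, the bandwidth of 1.0, the number of permutations, and the synthetic data are assumptions for illustration.

```python
import math
import random


def gaussian_gram(values, bandwidth):
    """Gram matrix of a Gaussian kernel on scalar observations."""
    return [
        [math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def center(gram):
    """Double-center a symmetric Gram matrix (computes H K H)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(centered_k, gram_l, perm):
    """Biased HSIC estimate trace(H K H L) / n^2, with the y-samples permuted."""
    n = len(gram_l)
    return sum(
        centered_k[i][j] * gram_l[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    ) / n ** 2


def hsic_permutation_test(xs, ys, num_perms=200, bandwidth=1.0, seed=0):
    """Return the observed HSIC statistic and a permutation p-value."""
    rng = random.Random(seed)
    n = len(xs)
    centered_k = center(gaussian_gram(xs, bandwidth))
    gram_l = gaussian_gram(ys, bandwidth)
    identity = list(range(n))
    observed = hsic(centered_k, gram_l, identity)
    exceed = 0
    for _ in range(num_perms):
        perm = identity[:]
        rng.shuffle(perm)
        if hsic(centered_k, gram_l, perm) >= observed:
            exceed += 1
    # The +1 terms make the p-value valid at any finite sample size.
    return observed, (exceed + 1) / (num_perms + 1)


# Synthetic data: y depends on x through x^2, so the two are uncorrelated
# but clearly dependent.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(80)]
ys = [x * x + 0.1 * data_rng.gauss(0.0, 1.0) for x in xs]

statistic, p_value = hsic_permutation_test(xs, ys)
```

Permuting one sample breaks any dependence while preserving both marginals, which is what gives the test its level control regardless of the kernel. Choosing or learning the kernel then affects only the power.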



Article 1050

Title@2025-07-05 (6): Graphons of Line Graphs

Title: Graphons of Line Graphs Graphen von Liniengraphen 线图图图图图 2409.01656v3

Authors (2): Sevvandi Kandanaarachchi, Cheng Soon Ong

We consider the problem of estimating graph limits, known as graphons, from observations of sequences of sparse finite graphs. In this paper we show a simple method that can shed light on a subset of sparse graphs. The method involves mapping the original graphs to their line graphs. We show that graphs satisfying a particular property, which we call the square-degree property, are sparse but give rise to dense line graphs. This enables the use of results on graph limits of dense graphs to derive convergence. In particular, star graphs satisfy the square-degree property, resulting in dense line graphs and non-zero graphons of line graphs. We demonstrate empirically that we can distinguish different numbers of stars (which are sparse) by the graphons of their corresponding line graphs. In the original graphs, by contrast, the different numbers of stars all converge to the zero graphon due to sparsity. Similarly, superlinear preferential attachment graphs give rise to dense line graphs almost surely. In contrast, dense graphs, including Erdős-Rényi graphs, make the line graphs sparse, resulting in the zero graphon.
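The claim that a sparse star graph has a dense line graph is easy to check directly. The following minimal sketch builds a star, constructs its line graph from the definition (two edges are adjacent when they share an endpoint), and compares edge densities; the helper names and the choice of 50 leaves are illustrative assumptions.

```python
from itertools import combinations


def star_graph_edges(num_leaves):
    """Edges of a star with centre 0 and leaves 1..num_leaves."""
    return [(0, leaf) for leaf in range(1, num_leaves + 1)]


def line_graph_edges(edges):
    """Line graph: one node per original edge, adjacent iff they share an endpoint."""
    return [(e, f) for e, f in combinations(edges, 2) if set(e) & set(f)]


def density(num_nodes, num_edges):
    """Fraction of possible node pairs that are connected."""
    return num_edges / (num_nodes * (num_nodes - 1) / 2)


num_leaves = 50
star = star_graph_edges(num_leaves)
line = line_graph_edges(star)

star_density = density(num_leaves + 1, len(star))  # equals 2 / (num_leaves + 1)
line_density = density(len(star), len(line))       # every pair of edges shares the centre
```

Every pair of edges in a star meets at the centre, so the line graph is complete and its density stays at 1 as the star grows, while the density of the star itself tends to zero.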



Article 1051

Title@2025-07-05 (6): Transformer with Koopman-Enhanced Graph Convolutional Network for Spatiotemporal Dynamics Forecasting

Title: Transformer with Koopman-Enhanced Graph Convolutional Network for Spatiotemporal Dynamics Forecasting Transformer mit Koopman-erweitertem Graphen-Konvolutions-Netzwerk für Spatiotemporal Dynamics Forecasting 带有Koopman-Enced的光时动态预测图变形网络的变形器 2507.03855v1

Authors (2): Zekai Wang, Bing Yao

Spatiotemporal dynamics forecasting is inherently challenging, particularly in systems defined over irregular geometric domains, due to the need to jointly capture complex spatial correlations and nonlinear temporal dynamics. To tackle these challenges, we propose TK-GCN, a two-stage framework that integrates geometry-aware spatial encoding with long-range temporal modeling. In the first stage, a Koopman-enhanced Graph Convolutional Network (K-GCN) is developed to embed the high-dimensional dynamics distributed on spatially irregular domains into a latent space where the evolution of system states is approximately linear. By leveraging Koopman operator theory, this stage enhances the temporal consistency during the latent learning. In the second stage, a Transformer module is employed to model the temporal progression within the Koopman-encoded latent space. Through the self-attention mechanism, the Transformer captures long-range temporal dependencies, enabling accurate forecasting over extended horizons. We evaluate TK-GCN in spatiotemporal cardiac dynamics forecasting and benchmark its performance against several state-of-the-art baselines. Experimental results and ablation studies show that TK-GCN consistently delivers superior predictive accuracy across a range of forecast horizons, demonstrating its capability to effectively model complex spatial structures and nonlinear temporal dynamics.



Article 1052

Title@2025-07-05 (6): Latent FxLMS: Accelerating Active Noise Control with Neural Adaptive Filters

Title: Latent FxLMS: Accelerating Active Noise Control with Neural Adaptive Filters Latent FxLMS: Beschleunigung der aktiven Geräuschkontrolle mit neuralen adaptiven Filtern 低端 FxLMS: 使用神经适应过滤器加速主动噪音控制 2507.03854v1

Authors (6): Kanad Sarkar, Austin Lu, Manan Mittal, Yongjie Zhuang, Ryan Corey, Andrew Singer

Filtered-X LMS (FxLMS) is commonly used for active noise control (ANC), wherein the soundfield is minimized at a desired location. Given prior knowledge of the spatial region of the noise or control sources, we could improve FxLMS by adapting along the low-dimensional manifold of possible adaptive filter weights. We train an auto-encoder on the filter coefficients of the steady-state adaptive filter for each primary source location sampled from a given spatial region and constrain the weights of the adaptive filter to be the output of the decoder for a given state of latent variables. Then, we perform updates in the latent space and use the decoder to generate the cancellation filter. We evaluate how various neural network constraints and normalization techniques impact the convergence speed and steady-state mean squared error. Under certain conditions, our Latent FxLMS model converges in fewer steps with comparable steady-state error to the standard FxLMS.



Article 1053

Title@2025-07-05 (6): OrbitAll: A Unified Quantum Mechanical Representation Deep Learning Framework for All Molecular Systems

Title: OrbitAll: A Unified Quantum Mechanical Representation Deep Learning Framework for All Molecular Systems OrbitAll: Ein einheitliches Quantum Mechanische Darstellung Deep Learning Framework für alle molekularen Systeme 全轨道:所有分子系统统一量子机械代表深学习框架 2507.03853v1

Authors (6): Beom Seok Kang, Vignesh C. Bhethanabotla, Amin Tavakoli, Maurice D. Hanisch, William A. Goddard III, Anima Anandkumar

Despite the success of deep learning methods in quantum chemistry, their representational capacity is most often confined to neutral, closed-shell molecules. However, real-world chemical systems often exhibit complex characteristics, including varying charges, spins, and environments. We introduce OrbitAll, a geometry- and physics-informed deep learning framework that can represent all molecular systems with electronic structure information. OrbitAll utilizes spin-polarized orbital features from the underlying quantum mechanical method and combines them with graph neural networks satisfying SE(3)-equivariance. The resulting framework can represent and process any molecular system with arbitrary charges, spins, and environmental effects. OrbitAll demonstrates superior performance and generalization on predicting charged, open-shell, and solvated molecules, while also robustly extrapolating to molecules significantly larger than the training data by leveraging a physics-informed architecture. OrbitAll achieves chemical accuracy using 10 times less training data than competing AI models, with a speedup of approximately $10^3$ to $10^4$ compared to density functional theory.



Article 1054

Title@2025-07-05 (6): KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis

Title: KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis KEA Explain: Erklärungen von Halluzinationen mittels Graph Kernel Analysis KEA 解释: 使用图心分析解释幻觉 2507.03847v1

Authors (2): Reilly Haskins, Ben Adams

Large Language Models (LLMs) frequently generate hallucinations: statements that are syntactically plausible but lack factual grounding. This research presents KEA (Kernel-Enriched AI) Explain: a neurosymbolic framework that detects and explains such hallucinations by comparing knowledge graphs constructed from LLM outputs with ground truth data from Wikidata or contextual documents. Using graph kernels and semantic clustering, the method provides explanations for detected hallucinations, ensuring both robustness and interpretability. Our framework achieves competitive accuracy in detecting hallucinations across both open- and closed-domain tasks, and is able to generate contrastive explanations, enhancing transparency. This research advances the reliability of LLMs in high-stakes domains and provides a foundation for future work on precision improvements and multi-source knowledge integration.



Article 1055

Title@2025-07-05 (6): Interpretable Diffusion Models with B-cos Networks

Title: Interpretable Diffusion Models with B-cos Networks Interpretierbare Diffusionsmodelle mit B-cos-Netzwerken B-cos网络的可解释传播模型 2507.03846v1

Authors (4): Nicola Bernold, Moritz Vandenhirtz, Alice Bizeul, Julia E. Vogt

Text-to-image diffusion models generate images by iteratively denoising random noise, conditioned on a prompt. While these models have enabled impressive progress in image generation, they often fail to accurately reflect all semantic information described in the prompt – failures that are difficult to detect automatically. In this work, we introduce a diffusion model architecture built with B-cos modules that offers inherent interpretability. Our approach provides insight into how individual prompt tokens affect the generated image by producing explanations that highlight the pixel regions influenced by each token. We demonstrate that B-cos diffusion models can produce high-quality images while providing meaningful insights into prompt-image alignment.



Article 1056

Title@2025-07-05 (6): Regularizing Log-Linear Cost Models for Inpatient Stays by Merging ICD-10 Codes

Title: Regularizing Log-Linear Cost Models for Inpatient Stays by Merging ICD-10 Codes Regularisieren von Log-Linear Cost Models für stationäre Aufenthalte durch Zusammenführen von ICD-10 Codes 通过合并 ICD-10 代码,规范住院患者日志成本模型 2507.03843v1

Authors (4): Chi-Ken Lu, David Alonge, Nicole Richardson, Bruno Richard

Cost models in healthcare research must balance interpretability, accuracy, and parameter consistency. However, interpretable models often struggle to achieve both accuracy and consistency. Ordinary least squares (OLS) models for high-dimensional regression can be accurate but fail to produce stable regression coefficients over time when using highly granular ICD-10 diagnostic codes as predictors. This instability arises because many ICD-10 codes are infrequent in healthcare datasets. While regularization methods such as Ridge can address this issue, they risk discarding important predictors. Here, we demonstrate that reducing the granularity of ICD-10 codes is an effective regularization strategy within OLS while preserving the representation of all diagnostic code categories. By truncating ICD-10 codes from seven characters (e.g., T67.0XXA, T67.0XXD) to six (e.g., T67.0XX) or fewer, we reduce the dimensionality of the regression problem while maintaining model interpretability and consistency. Mathematically, the merging of predictors in OLS leads to an increased trace of the Hessian matrix, which reduces the variance of coefficient estimation. Our findings explain why broader diagnostic groupings like DRGs and HCC codes are favored over highly granular ICD-10 codes in real-world risk adjustment and cost models.
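
The truncation step described above can be sketched as follows. This is an illustrative sketch only: the function names and the toy patient records are hypothetical, and character counts are assumed to exclude the dot, as in the abstract's example of seven characters (T67.0XXA) becoming six (T67.0XX).

```python
def truncate_icd10(code, n_chars):
    """Keep the first n_chars characters of an ICD-10 code, not counting the dot."""
    kept, count = [], 0
    for ch in code:
        if ch != ".":
            if count == n_chars:
                break
            count += 1
        kept.append(ch)
    # Drop a trailing dot left behind when truncating right after the category.
    return "".join(kept).rstrip(".")


def merge_predictors(patient_codes, n_chars):
    """Replace each patient's code set with the set of truncated codes."""
    return [{truncate_icd10(c, n_chars) for c in codes} for codes in patient_codes]


# Toy records (hypothetical): each set holds the codes on one inpatient stay.
patients = [{"T67.0XXA"}, {"T67.0XXD"}, {"T67.0XXA", "E11.9"}]
merged = merge_predictors(patients, 6)

n_before = len(set().union(*patients))  # distinct predictor columns before merging
n_after = len(set().union(*merged))     # distinct predictor columns after merging
print(n_before, n_after)
```

Each merged column is observed at least as often as any of the columns it absorbs, which is the mechanism the abstract relies on to stabilize the OLS coefficients.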



Article 1057

Title@2025-07-04 (5): Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction

Title: Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction 用于大型电子结构预测的分布式等差图像神经网络 2507.03840v1

Authors (5): Manasa Kaniselvan, Alexander Maeder, Chen Hao Xia, Alexandros Nikolaos Ziogas, Mathieu Luisier

Equivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales, enabling investigation of the electronic properties of materials that contain extended defects or interfaces, or that exhibit disordered phases. However, as interactions between atomic orbitals typically extend over 10+ angstroms, the graph representations required for this task tend to be densely connected, and the memory requirements to perform training and inference on these large structures can exceed the limits of modern GPUs. Here we present a distributed eGNN implementation which leverages direct GPU communication, and we introduce a partitioning strategy for the input graph to reduce the number of embedding exchanges between GPUs. Our implementation shows strong scaling up to 128 GPUs, and weak scaling up to 512 GPUs with 87% parallel efficiency for structures with 3,000 to 190,000 atoms on the Alps supercomputer.



Article 1058

Title: MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search MatRL: Wahrscheinlich verallgemeinerbare iterative Algorithmen Entdeckung über Monte-Carlo Baumsuche MatRL: 通过蒙特-卡洛树搜索 发现可普遍实现的迭代性电算算法 2507.03833v1

Authors (4): Sungyoon Kim, Rajat Vadiraj Dwaraknath, Longling Geng, Mert Pilanci

Iterative methods for computing matrix functions have been extensively studied, and their convergence speed can be significantly improved with the right tuning of parameters and by mixing different iteration types. Hand-tuning the design options for optimal performance can be cumbersome, especially in modern computing environments: numerous different classical iterations and their variants exist, each with non-trivial per-step cost and tuning parameters. To this end, we propose MatRL, a reinforcement learning based framework that automatically discovers iterative algorithms for computing matrix functions. The key idea is to treat algorithm design as a sequential decision-making process. Monte-Carlo tree search is then used to plan a hybrid sequence of matrix iterations and step sizes, tailored to a specific input matrix distribution and computing environment. Moreover, we show that the learned algorithms provably generalize to sufficiently large matrices drawn from the same distribution. Finally, we corroborate our theoretical results with numerical experiments demonstrating that MatRL produces algorithms that outperform various baselines in the literature.
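
As a concrete instance of a classical matrix iteration of the kind such a search could choose among, here is a minimal sketch of the Newton-Schulz iteration for the matrix inverse. The choice of this iteration, the fixed step count, and the helper names are assumptions for illustration; the abstract does not specify MatRL's search space, and this is not MatRL itself.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def newton_schulz_inverse(a, steps=30):
    """Approximate the inverse of a square nonsingular matrix with
    X_{k+1} = X_k (2I - A X_k), starting from X_0 = A^T / (||A||_1 ||A||_inf)."""
    n = len(a)
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))  # max column sum
    norm_inf = max(sum(abs(v) for v in row) for row in a)                # max row sum
    scale = norm_1 * norm_inf
    x = [[a[j][i] / scale for j in range(n)] for i in range(n)]          # scaled transpose
    for _ in range(steps):
        ax = matmul(a, x)
        correction = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                      for i in range(n)]
        x = matmul(x, correction)
    return x


a = [[4.0, 1.0], [2.0, 3.0]]
x = newton_schulz_inverse(a)
print(matmul(a, x))  # should be close to the identity matrix
```

The fixed step count and the single iteration type are exactly the kind of design options that the abstract proposes to select automatically rather than by hand.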



Article 1059

Title@2025-07-04 (5): Understanding Differential Transformer Unchains Pretrained Self-Attentions

Title: Understanding Differential Transformer Unchains Pretrained Self-Attentions Verständnis von Differentialtransformer-Unketten Vorgespannte Selbstaufmerksamkeiten 理解差异变异器 2505.16333v2

Authors (3): Chaerin Kong, Jiho Jang, Nojun Kwak

Differential Transformer has recently gained significant attention for its impressive empirical performance, often attributed to its ability to perform noise-canceled attention. However, precisely how differential attention achieves its empirical benefits remains poorly understood. Moreover, the Differential Transformer architecture demands large-scale training from scratch, hindering the use of open pretrained weights. In this work, we conduct an in-depth investigation of Differential Transformer, uncovering three key factors behind its success: (1) enhanced expressivity via negative attention, (2) reduced redundancy among attention heads, and (3) improved learning dynamics. Based on these findings, we propose DEX, a novel method to efficiently integrate the advantages of differential attention into pretrained language models. By reusing the softmax attention scores and adding a lightweight differential operation on the output value matrix, DEX effectively incorporates the key advantages of differential attention while remaining lightweight in both training and inference. Evaluations confirm that DEX substantially improves pretrained LLMs across diverse benchmarks, achieving significant performance gains with minimal adaptation data (< 0.01%).
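
To make "negative attention" concrete, the sketch below computes plain differential attention for a single head: the difference of two softmax attention maps, scaled by a coefficient `lam`, applied to the values. It is a simplified illustration, not DEX: in the Differential Transformer the coefficient is learned and a normalization follows, both of which are omitted here, and the toy inputs and fixed `lam` are assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, k):
    """Scaled dot-product attention weights, one row per query."""
    d = len(q[0])
    return [softmax([sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d)
                     for k_row in k]) for q_row in q]


def differential_attention(q1, k1, q2, k2, v, lam):
    """Return (weights, output) where weights = softmax(q1 k1^T) - lam * softmax(q2 k2^T)."""
    a1 = attention_weights(q1, k1)
    a2 = attention_weights(q2, k2)
    weights = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    output = [[sum(w * v_row[j] for w, v_row in zip(row, v))
               for j in range(len(v[0]))] for row in weights]
    return weights, output


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
weights, output = differential_attention([[1.0, 0.0]], keys, [[0.0, 1.0]], keys, values, lam=0.8)
print(weights, output)
```

Unlike a single softmax map, the resulting weights can be negative and each row sums to `1 - lam` rather than 1, which is the extra expressivity the first factor in the abstract refers to.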



Article 1060

Title@2025-07-04 (5): Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion

Title: Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion Beziehungs-Aware-Netzwerk mit aufmerksamkeitsbasiertem Verlust für wenige-heiße Wissensgraphen-Vervollständigung 几乎没有热知识图补全的基于关注的损失关系软件网络 2306.09519v2

Authors (4): Qiao Qiao, Yuepei Li, Kang Zhou, Qi Li

The few-shot knowledge graph completion (FKGC) task aims to predict unseen facts of a relation given only a few reference entity pairs. Current approaches randomly select one negative sample for each reference entity pair to minimize a margin-based ranking loss, which easily leads to a zero-loss problem if the negative sample is far away from the positive sample and thus falls outside the margin. Moreover, an entity should have a different representation under a different context. To tackle these issues, we propose a novel Relation-Aware Network with Attention-Based Loss (RANA) framework. Specifically, to better utilize the plentiful negative samples and alleviate the zero-loss issue, we strategically select relevant negative samples and design an attention-based loss function to further differentiate the importance of each negative sample. The intuition is that negative samples more similar to positive samples will contribute more to the model. Further, we design a dynamic relation-aware entity encoder for learning a context-dependent entity representation. Experiments demonstrate that RANA outperforms the state-of-the-art models on two benchmark datasets.
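
The zero-loss problem can be seen directly in a distance-based margin ranking loss. This is a minimal sketch under assumed conventions (smaller distance means a more plausible triple, margin of 1.0, made-up distances); it illustrates the problem only and does not reproduce RANA's attention-based loss.

```python
def margin_ranking_loss(d_pos, d_neg, margin=1.0):
    """Hinge loss that asks the negative to be at least `margin` farther than the positive."""
    return max(0.0, margin + d_pos - d_neg)


d_pos = 0.5          # distance assigned to a true (reference) triple
easy_negative = 3.0  # randomly drawn negative that is already far away
hard_negative = 0.8  # negative that resembles the positive

print(margin_ranking_loss(d_pos, easy_negative))  # no training signal
print(margin_ranking_loss(d_pos, hard_negative))  # positive loss, useful gradient
```

A randomly chosen negative is usually of the "easy" kind, so the loss and its gradient vanish; this is why the abstract argues for selecting relevant negatives and weighting them by importance.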



Article 1061

Title@2025-07-04 (5): IMPACT: Importance-Aware Activation Space Reconstruction

Title: IMPACT: Importance-Aware Activation Space Reconstruction IMPACT: Bedeutungs-Bewusst-Aktivierung Weltraum-Rekonstruktion IMPACT: 重要-警报启动空间重建 2507.03828v1

Authors (4): Md Mokarram Chowdhury, Daniel Agyei Asante, Ernie Chang, Yang Li

Large language models (LLMs) achieve strong performance across many domains but are difficult to deploy in resource-constrained settings due to their size. Low-rank weight matrix compression is a popular strategy for reducing model size, typically by minimizing weight reconstruction error under the assumption that weights are low-rank. However, this assumption often does not hold in LLMs. Instead, LLM activations exhibit stronger low-rank structure, prompting a shift toward minimizing activation reconstruction error. We show that this shift alone is insufficient: activation dimensions contribute unequally to model performance, and uniform reconstruction can harm performance. We propose IMPACT, a principled framework for importance-aware activation reconstruction that links model compression decisions to their impact on model behavior. IMPACT formulates an optimization problem that considers both activation structure and gradient sensitivity, and derives a closed-form solution where the optimal reconstruction bases are the eigenvectors of an importance-weighted activation covariance matrix. This enables low-rank approximations explicitly optimized to preserve accuracy. Experiments across diverse models and tasks show that IMPACT achieves up to 48.6% greater model size reduction with accuracy comparable to state-of-the-art baselines.



Article 1062

Title@2025-07-04 (5): Symmetry-Robust 3D Orientation Estimation

Title: Symmetry-Robust 3D Orientation Estimation Symmetrie-Robuste 3D-Orientierung Schätzung 对称-对称 3D 3D 方向估计 2410.02101v4

Authors (3): Christopher Scarvelis, David Benhaim, Paul Zhang

Orientation estimation is a fundamental task in 3D shape analysis which consists of estimating a shape’s orientation axes: its side-, up-, and front-axes. Using this data, one can rotate a shape into canonical orientation, where its orientation axes are aligned with the coordinate axes. Developing an orientation algorithm that reliably estimates complete orientations of general shapes remains an open problem. We introduce a two-stage orientation pipeline that achieves state-of-the-art performance on up-axis estimation and further demonstrate its efficacy on full-orientation estimation, where one seeks all three orientation axes. Unlike previous work, we train and evaluate our method on all of ShapeNet rather than a subset of classes. We motivate our engineering contributions by theory describing fundamental obstacles to orientation estimation for rotationally-symmetric shapes, and show how our method avoids these obstacles.



Article 1063

Title@2025-07-04 (5): Compressing Deep Neural Networks Using Explainable AI

Title: Compressing Deep Neural Networks Using Explainable AI Komprimierende Tiefen-Neural-Netzwerke mit erklärbarer KI 使用可解释的 AI 压缩深神经网络 2507.05286v1

Authors (3): Kimia Soroush, Mohsen Raji, Behnam Ghavami

Deep neural networks (DNNs) have demonstrated remarkable performance in many tasks, but this often comes at a high computational cost and memory usage. Compression techniques, such as pruning and quantization, are applied to reduce the memory footprint of DNNs and make it possible to accommodate them on resource-constrained edge devices. Recently, explainable artificial intelligence (XAI) methods have been introduced with the purpose of understanding and explaining AI methods. XAI can be utilized to understand the inner functioning of DNNs, such as the importance of different neurons and features to the overall performance of DNNs. In this paper, a novel DNN compression approach using XAI is proposed to efficiently reduce the DNN model size with negligible accuracy loss. In the proposed approach, the importance scores of DNN parameters (i.e., weights) are computed using a gradient-based XAI technique called Layer-wise Relevance Propagation (LRP). Then, the scores are used to compress the DNN as follows: 1) the parameters with negative or zero importance scores are pruned and removed from the model, 2) mixed-precision quantization is applied so that weights with higher/lower scores are quantized with a higher/lower number of bits. The experimental results show that the proposed compression approach reduces the model size by 64% while the accuracy is improved by 42% compared to the state-of-the-art XAI-based compression method.
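
The two compression steps can be sketched on a flat list of weights. This sketch assumes the importance scores are already available (it does not compute LRP), and the split point between "higher" and "lower" scores, the 8-bit and 4-bit widths, and the symmetric uniform quantizer are all hypothetical choices not taken from the paper.

```python
def quantize(weight, bits, w_max):
    """Symmetric uniform quantization of one weight onto a signed `bits`-bit grid."""
    if w_max == 0:
        return 0.0
    levels = 2 ** (bits - 1) - 1   # e.g. 127 for 8 bits, 7 for 4 bits
    step = w_max / levels
    return round(weight / step) * step


def compress(weights, scores, high_bits=8, low_bits=4):
    """Return (compressed weights, bit widths); a bit width of 0 marks a pruned weight."""
    kept_scores = sorted(s for s in scores if s > 0)
    if not kept_scores:
        return [0.0] * len(weights), [0] * len(weights)
    threshold = kept_scores[len(kept_scores) // 2]  # hypothetical split (upper median)
    w_max = max(abs(w) for w, s in zip(weights, scores) if s > 0)
    values, bits = [], []
    for w, s in zip(weights, scores):
        if s <= 0:                                  # step 1: prune non-positive scores
            values.append(0.0)
            bits.append(0)
        else:                                       # step 2: more bits for higher scores
            b = high_bits if s >= threshold else low_bits
            values.append(quantize(w, b, w_max))
            bits.append(b)
    return values, bits


values, bits = compress([0.9, -0.4, 0.05, 0.7], [0.8, -0.1, 0.0, 0.2])
print(values, bits)
```

In a real model the scores would come from a relevance method and the bit allocation would typically be chosen per layer or per group; the flat list here only shows the control flow.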



Article 1064

Title@2025-07-04 (5): Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts

Title: Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts Nutzung großer Sprachmodelle für die Tacit Knowledge Discovery in organisatorischen Kontexten 利用大型语言模型促进组织环境中的隐蔽知识发现 2507.03811v1

Authors (4): Gianlucca Zuin, Saulo Mastelini, Túlio Loures, Adriano Veloso

Documenting tacit knowledge in organizations can be a challenging task due to incomplete initial information, difficulty in identifying knowledgeable individuals, the interplay of formal hierarchies and informal networks, and the need to ask the right questions. To address this, we propose an agent-based framework leveraging large language models (LLMs) to iteratively reconstruct dataset descriptions through interactions with employees. Modeling knowledge dissemination as a Susceptible-Infectious (SI) process with waning infectivity, we conduct 864 simulations across various synthetic company structures and different dissemination parameters. Our results show that the agent achieves 94.9% full-knowledge recall, with self-critical feedback scores strongly correlating with external literature critic scores. We analyze how each simulation parameter affects the knowledge retrieval process for the agent. In particular, we find that our approach is able to recover information without needing to directly access the only domain specialist. These findings highlight the agent’s ability to navigate organizational complexity and capture fragmented knowledge that would otherwise remain inaccessible.



Article 1065

Title@2025-07-04 (5): Coil Geometry Learning for Short-Range Magnetic Actuation

Title: Coil Geometry Learning for Short-Range Magnetic Actuation Spulengeometrie-Lernen für kurze magnetische Betätigung 短期磁力动动动的焦油几何学习 2507.03806v1

Authors (3): Yuta Takahashi, Hayate Tajima, Shin-ichiro Sakai

Fuel-free docking is a key operational technology for in-space assembly, resupplying space stations, sample return missions, and formation keeping of large-scale satellite swarms. The use of conventional propulsion systems, including thrusters, can cause adverse effects at short distances, such as sensor contamination, which may lead to the failure of the satellite or onboard equipment. Control based on the magnetic field interaction generated by magnetorquers can overcome these weaknesses of propulsion. This actuation enables simultaneous attitude and formation control among desired satellite groups. Previous studies typically use the traditional dipole approximation of the exact magnetic field to reduce computation cost. However, proximity operations often involve relatively short distances between satellites, which can easily compromise the effectiveness of this approximation. To avoid model errors that could result in satellite collisions, we utilize a magnetic field model described by Biot-Savart’s law, without distance approximations (Near-field model), in consideration of short-distance operations. To overcome the high computational cost associated with the coil geometry and relative state information, a learning-based magnetic field approximation is derived, and its effectiveness is shown in a docking simulation of target and chaser satellites equipped with electromagnetic coils on three axes. Our method significantly reduces the computational cost of the exact magnetic model and scales to an increasing number of target satellites through parallel processing.
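
Why the dipole approximation degrades at short range can be seen in the simplest case, the field on the axis of a single circular coil, where the Biot-Savart integral has a closed form. The sketch below is an illustration under assumed values (a 0.1 m coil carrying 1 A); it is not the paper's learned field model, which handles general coil geometry and relative states.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A


def loop_field_on_axis(current, radius, z):
    """Exact on-axis field of a circular loop (closed form of the Biot-Savart integral)."""
    return MU0 * current * radius**2 / (2 * (radius**2 + z**2) ** 1.5)


def dipole_field_on_axis(current, radius, z):
    """Far-field dipole approximation with magnetic moment m = I * pi * R^2."""
    moment = current * math.pi * radius**2
    return MU0 * moment / (2 * math.pi * z**3)


def relative_error(current, radius, z):
    """Relative overestimate of the dipole model with respect to the exact field."""
    exact = loop_field_on_axis(current, radius, z)
    return (dipole_field_on_axis(current, radius, z) - exact) / exact


radius = 0.1  # assumed coil radius in metres
for z in (0.1, 0.3, 1.0):
    print(z, relative_error(1.0, radius, z))
```

The ratio of the two models is (1 + (R/z)^2)^(3/2), so the dipole model is accurate to within about 1.5% at ten coil radii but overestimates the field almost threefold at one coil radius, the regime relevant to docking.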



Article 1066

Title@2025-07-04 (5): Learning Differentiable Logic Programs for Abstract Visual Reasoning

Title: Learning Differentiable Logic Programs for Abstract Visual Reasoning Lernen von differenzierbaren Logic-Programmen für abstrakte visuelle Vernunft 学习用于抽象视觉理性的差别化逻辑程序 2307.00928v2

Authors (4): Hikaru Shindo, Viktor Pfanschilling, Devendra Singh Dhami, Kristian Kersting

Visual reasoning is essential for building intelligent agents that understand the world and perform problem-solving beyond perception. Differentiable forward reasoning has been developed to integrate reasoning with gradient-based machine learning paradigms. However, because of their memory intensity, most existing approaches do not exploit the full expressivity of first-order logic, which rules out a crucial ability to solve abstract visual reasoning, where agents need to reason by analogy over abstract concepts in different scenarios. To overcome this problem, we propose NEUro-symbolic Message-pAssiNg reasoNer (NEUMANN), which is a graph-based differentiable forward reasoner, passing messages in a memory-efficient manner and handling structured programs with functors. Moreover, we propose a computationally efficient structure learning algorithm to perform explanatory program induction on complex visual scenes. For evaluation, in addition to conventional visual reasoning tasks, we propose a new task, visual reasoning behind-the-scenes, where agents need to learn abstract programs and then answer queries by imagining scenes that are not observed. We empirically demonstrate that NEUMANN solves visual reasoning tasks efficiently, outperforming neural, symbolic, and neuro-symbolic baselines.



Article 1067

Title@2025-07-04 (5): Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs

Title: Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs Halluzinatorische Bild-Token: Ein trainingsfreier EAZY-Ansatz zur Erkennung und Abmilderung von Objekt-Halluzinationen in LVLMs 幻觉图像 Tokens: 关于在LVLMs中探测和减轻天体幻觉的无培训的EAZY方法 2503.07772v2

Authors (6): Liwei Che, Tony Qingze Liu, Jing Jia, Weiyi Qin, Ruixiang Tang, Vladimir Pavlovic

Despite their remarkable potential, Large Vision-Language Models (LVLMs) still face challenges with object hallucination, a problem where their generated outputs mistakenly incorporate objects that do not actually exist. Although most works focus on addressing this issue within the language-model backbone, our work shifts the focus to the image input source, investigating how specific image tokens contribute to hallucinations. Our analysis reveals a striking finding: a small subset of image tokens with high attention scores are the primary drivers of object hallucination. By removing these hallucinatory image tokens (only 1.5% of all image tokens), the issue can be effectively mitigated. This finding holds consistently across different models and datasets. Building on this insight, we introduce EAZY, a novel, training-free method that automatically identifies and Eliminates hAllucinations by Zeroing out hallucinatorY image tokens. We utilize EAZY for unsupervised object hallucination detection, achieving 15% improvement compared to previous methods. Additionally, EAZY demonstrates remarkable effectiveness in mitigating hallucinations while preserving model utility and seamlessly adapting to various LVLM architectures.



Article 1068

Title@2025-07-04 (5): Efficient and Effective Query Context-Aware Learning-to-Rank Model for Sequential Recommendation

Title: Efficient and Effective Query Context-Aware Learning-to-Rank Model for Sequential Recommendation Effizientes und effektives Abfrage-Context-Aware-Lern-to-Rank-Modell für sequentielle Empfehlung 高效率和高效能的查询背景- 软件学习到 Rank 序列建议模型 2507.03789v1

Authors (5): Andrii Dzhoha, Alisa Mironenko, Vladimir Vlasov, Maarten Versteegh, Marjan Celikik

Modern sequential recommender systems commonly use transformer-based models for next-item prediction. While these models demonstrate a strong balance between efficiency and quality, integrating interleaving features - such as the query context (e.g., browse category) under which next-item interactions occur - poses challenges. Effectively capturing query context is crucial for refining ranking relevance and enhancing user engagement, as it provides valuable signals about user intent within a session. Unlike an item’s features, query context is not temporally aligned with the item sequence, making its incorporation into transformers challenging and error-prone. This paper analyzes different strategies for incorporating query context into transformers trained with a causal language modeling procedure as a case study. We propose a new method that effectively fuses the item sequence with query context within the attention mechanism. Through extensive offline and online experiments on a large-scale online platform and open datasets, we present evidence that our proposed method is an effective approach for integrating query context to improve model ranking quality in terms of relevance and diversity.



Article 1069

Title@2025-07-04 (5): Effective Capacitance Modeling Using Graph Neural Networks

Title: Effective Capacitance Modeling Using Graph Neural Networks Effektive Kapazitätsmodellierung mit Hilfe von Graph Neural Networks 使用图形神经网络的有效能力建模 2507.03787v1

Authors (2): Eren Dogan, Matthew R. Guthaus

Static timing analysis is a crucial stage in the VLSI design flow that verifies the timing correctness of circuits. Timing analysis depends on the placement and routing of the design, but at the same time, placement and routing efficiency depend on the final timing performance. VLSI design flows can benefit from timing-related prediction to better perform the earlier stages of the design flow. Effective capacitance is an essential input for gate delay calculation, and finding exact values requires routing or routing estimates. In this work, we propose the first GNN-based post-layout effective capacitance modeling method, GNN-Ceff, that achieves significant speed gains due to GPU parallelization while also providing better accuracy than current heuristics. GNN-Ceff parallelization achieves 929x speedup on real-life benchmarks over the state-of-the-art method run serially.



Article 1070

Title@2025-07-04 (5): Identifying Large-Scale Linear Parameter Varying Systems with Dynamic Mode Decomposition Methods

Title: Identifying Large-Scale Linear Parameter Varying Systems with Dynamic Mode Decomposition Methods Identifizieren von großräumigen Linearparameter-Verschiedenheitssystemen mit dynamischen Methoden zur Dekomposition 确定具有动态模式分解方法的大型线性参数变异系统 2502.02336v2

Authors (3): Jean Panaioti Jordanou, Eduardo Camponogara, Eduardo Gildin

Linear Parameter Varying (LPV) systems are a well-established class of nonlinear systems with a rich theory for stability analysis, control, and analytical response finding, among other aspects. Although there are works on data-driven identification of such systems, few tackle the identification of LPV models for large-scale systems. Since large-scale systems are ubiquitous in practice, this work develops a methodology for the local and global identification of large-scale LPV systems based on nonintrusive reduced-order modeling. The developed method is named DMD-LPV because it is inspired by Dynamic Mode Decomposition (DMD). To validate the proposed identification method, we identify a system described by a discretized linear diffusion equation, with the diffusion gain defined by a polynomial over a parameter. The experiments show that the proposed method can easily identify a reduced-order LPV model of a given large-scale system without the need to perform identification in the full-order dimension, and with almost no performance decay from performing a reduction, given that the model structure is well-established.



Article 1071

Title@2025-07-04 (5): FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed

Title: FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed FastDINOv2: Frequency Based Curriculum Learning verbessert Robustheit und Trainingsgeschwindigkeit FastDINOv2:基于频率的课程学习提高强力和培训速度 2507.03779v1

Authors (5): Jiaqi Zhang, Juntuo Wang, Zhixin Sun, John Zou, Randall Balestriero

Large-scale vision foundation models such as DINOv2 achieve impressive performance by leveraging massive architectures and training datasets. However, many scenarios require practitioners to reproduce those pre-training solutions, for example on private data, on new modalities, or simply for scientific inquiry, which is currently extremely demanding computationally. We thus propose a novel pre-training strategy for DINOv2 that accelerates convergence and, as a by-product, strengthens robustness to common corruptions. Our approach combines a frequency filtering curriculum, in which low-frequency content is seen first, with Gaussian noise patching augmentation. Applied to a ViT-B/16 backbone trained on ImageNet-1K, our method reduces pre-training time and FLOPs by 1.6x and 2.25x, respectively, while still matching the baseline's robustness on corruption benchmarks (ImageNet-C) and maintaining competitive linear probing performance. This dual benefit of efficiency and robustness makes large-scale self-supervised foundation modeling more attainable, while opening the door to novel exploration of data curricula and augmentation as means to improve the robustness of self-supervised learning models. The code is available at https://github.com/KevinZ0217/fast_dinov2
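One simple way to picture a low-frequency-first curriculum is an FFT low-pass filter whose cutoff grows over training. The sketch below is only an illustration under that assumption; `low_pass`, `radius_for_epoch`, and the linear schedule are hypothetical and are not taken from the FastDINOv2 code.

```python
import numpy as np

def low_pass(image, radius):
    """Keep only spatial frequencies within `radius` of the zero frequency."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real

def radius_for_epoch(epoch, total_epochs, max_radius):
    """Hypothetical linear schedule: low frequencies first, wider band later."""
    return max(1, round(max_radius * (epoch + 1) / total_epochs))

# Toy image: a constant plus the highest-frequency checkerboard pattern.
i, j = np.indices((8, 8))
image = 1.0 + (-1.0) ** (i + j)
early = low_pass(image, radius_for_epoch(0, 10, 8))   # checkerboard removed
late = low_pass(image, radius_for_epoch(9, 10, 8))    # full image restored
```

Early in the schedule only the constant component survives; by the last epoch the mask covers the whole spectrum and the image passes through unchanged.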



Article 1072

Title@2025-07-04 (5): Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms

Title: Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms Effiziente Finite-Initialisierung mit Teilnormen für spannungsverstärkte Neuronalnetze und Tensor-Netzwerke Algorithmen 具有部分规范的短期神经网络和Tensor网络的有限化 2309.06577v4

Authors (4): Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta

We present two algorithms to initialize layers of tensorized neural networks and general tensor network algorithms using partial computations of their Frobenius norms and linear entrywise norms, depending on the type of tensor network involved. The core of the method is to use the norms of subnetworks of the tensor network iteratively, so that we normalize by the finite values of the norms that led to the divergence or zero norm. In addition, the method benefits from the reuse of intermediate calculations. We have also applied it to Matrix Product State/Tensor Train (MPS/TT) and Matrix Product Operator/Tensor Train Matrix (MPO/TT-M) layers and examined how it scales with the number of nodes, the bond dimension, and the physical dimension. All code is publicly available.



Article 1073

Title@2025-07-04 (5): Finetuning CLIP to Reason about Pairwise Differences

Title: Finetuning CLIP to Reason about Pairwise Differences Finetuning CLIP auf Grund von Pairwise Unterschiede 将 CLIP 微调到对等差异的原因 2409.09721v2

Authors (4): Dylan Sam, Devin Willmott, Joao D. Semedo, J. Zico Kolter

Vision-language models (VLMs) such as CLIP are trained via contrastive learning between text and image pairs, resulting in aligned image and text embeddings that are useful for many downstream tasks. A notable drawback of CLIP, however, is that the resulting embedding space seems to lack some of the structure of its purely text-based alternatives. For instance, while text embeddings have long been noted to satisfy analogies in embedding space using vector arithmetic, CLIP has no such property. In this paper, we propose an approach to natively train CLIP in a contrastive manner to reason about differences in embedding space. We finetune CLIP so that text descriptions of differences between images correspond to their difference in image embedding space, using synthetically generated data with large language models on image-caption paired datasets. We first demonstrate that our approach yields significantly improved capabilities in ranking images by a certain attribute (e.g., elephants are larger than cats), which is useful in retrieval or constructing attribute-based classifiers, and improved zeroshot classification performance on many downstream image classification tasks. In addition, our approach enables a new mechanism for inference that we refer to as comparative prompting, where we leverage prior knowledge of text descriptions of differences between classes of interest, achieving even larger performance gains in classification. Finally, we illustrate that the resulting embeddings obey a larger degree of geometric properties in embedding space, such as in text-to-image generation.



Article 1074

Title@2025-07-04 (5): Causal Evidence for the Primordiality of Colors in Trans-Neptunian Objects

Title: Causal Evidence for the Primordiality of Colors in Trans-Neptunian Objects Kausale Beweise für die Primordialität von Farben in transneptunischen Objekten 跨尼普敦天体颜色纯度的因果证据 2507.03760v1

Authors (6): Benjamin L. Davis, Mohamad Ali-Dib, Yujia Zheng, Zehao Jin, Kun Zhang, Andrea Valerio Macciò

The origins of the colors of Trans-Neptunian Objects (TNOs) represent a crucial unresolved question, central to understanding the history of our Solar System. Recent observational surveys have revealed correlations between the eccentricity and inclination of TNOs and their colors. This has rekindled the long-standing debate on whether these colors reflect the conditions of TNO formation or their subsequent collisional evolution. In this study, we address this question with 98.7% certainty, using a model-agnostic, data-driven approach based on causal graphs. First, as a sanity check, we demonstrate how our model can replicate the currently accepted paradigms of TNOs’ dynamical history, blindly and without any orbital modeling or physics-based assumptions. In fact, our causal model (with no knowledge of the existence of Neptune) predicts the existence of an unknown perturbing body, i.e., Neptune. We then show how this model predicts, with high certainty, that the color of TNOs is the root cause of their inclination distribution, rather than the other way around. This strongly suggests that the colors of TNOs reflect an underlying dynamical property, most likely their formation location. Moreover, our causal model excludes formation scenarios that invoke substantial color modification by subsequent irradiation. We therefore conclude that the colors of TNOs are predominantly primordial.



Article 1075

Title@2025-07-04 (5): Sequential Regression Learning with Randomized Algorithms

Title: Sequential Regression Learning with Randomized Algorithms Sequentielle Regressions-Lernen mit Randomisierten Algorithmen 使用随机定序算法的序列回归学习 2507.03759v1

Authors (3): Dorival Leão, Reiko Aoki, Teh Led Red

This paper presents "randomized SINDy", a sequential machine learning algorithm designed for dynamic data that has a time-dependent structure. It employs a probabilistic approach, with its PAC learning property rigorously proven through the mathematical theory of functional analysis. The algorithm dynamically predicts using a learned probability distribution of predictors, updating weights via gradient descent and a proximal algorithm to maintain a valid probability density. Inspired by SINDy (Brunton et al. 2016), it incorporates feature augmentation and Tikhonov regularization. For multivariate normal weights, the proximal step is omitted to focus on parameter estimation. The algorithm’s effectiveness is demonstrated through experimental results in regression and binary classification using real-world data.



Article 1076

Title@2025-07-04 (5): Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models

Title: Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models Probabilistische Einbettungen für frozen Vision-Language-Modelle: Ungewissheitsquantifizierung mit Gaußschen Prozess Latent Variable Models 冻结愿景-语言模型的概率嵌入模型:与高山进程边流变量模型的不确定性量化 2505.05163v2

Authors (3): Aishwarya Venkataramanan, Paul Bodesheim, Joachim Denzler

Vision-Language Models (VLMs) learn joint representations by mapping images and text into a shared latent space. However, recent research highlights that deterministic embeddings from standard VLMs often struggle to capture the uncertainties arising from the ambiguities in visual and textual descriptions and the multiple possible correspondences between images and texts. Existing approaches tackle this by learning probabilistic embeddings during VLM training, which demands large datasets and does not leverage the powerful representations already learned by large-scale VLMs like CLIP. In this paper, we propose GroVE, a post-hoc approach to obtaining probabilistic embeddings from frozen VLMs. GroVE builds on Gaussian Process Latent Variable Model (GPLVM) to learn a shared low-dimensional latent space where image and text inputs are mapped to a unified representation, optimized through single-modal embedding reconstruction and cross-modal alignment objectives. Once trained, the Gaussian Process model generates uncertainty-aware probabilistic embeddings. Evaluation shows that GroVE achieves state-of-the-art uncertainty calibration across multiple downstream tasks, including cross-modal retrieval, visual question answering, and active learning.



Article 1077

Title@2025-07-04 (5): Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis

Title: Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis Implizite Regularisierung in Diffusionsmodellen: Eine Algorithmen-Abhängige Generalisierungsanalyse 在传播模型中隐含的规范化:一个分析法 – – 依赖性一般分析 2507.03756v1

Authors (4): Tyler Farghly, Patrick Rebeschini, George Deligiannidis, Arnaud Doucet

The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data – implying that some form of regularisation is essential for generalisation. Existing theoretical analyses primarily rely on algorithm-independent techniques such as uniform convergence, heavily utilising model structure to obtain generalisation bounds. In this work, we instead leverage the algorithmic aspects that promote generalisation in diffusion models, developing a general theory of algorithm-dependent generalisation for this setting. Borrowing from the framework of algorithmic stability, we introduce the notion of score stability, which quantifies the sensitivity of score-matching algorithms to dataset perturbations. We derive generalisation bounds in terms of score stability, and apply our framework to several fundamental learning settings, identifying sources of regularisation. In particular, we consider denoising score matching with early stopping (denoising regularisation), sampler-wide coarse discretisation (sampler regularisation) and optimising with SGD (optimisation regularisation). By grounding our analysis in algorithmic properties rather than model structure, we identify multiple sources of implicit regularisation unique to diffusion models that have so far been overlooked in the literature.



Article 1078

Title@2025-07-04 (5): Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression

Title: Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression Bestimmung von Partikelgrößenverteilungen aus lichtstreuender Messung mit eingeschränkter Gauß-Prozessregression 利用受限制的高斯进程倒退,确定从轻喷射测量中得出的粒子-碎片分布 2507.03736v1

Authors (4): Fahime Seyedheydari, Mahdi Nasiri, Marcin Mińkowski, Simo Särkkä

In this work, we propose a novel methodology for robustly estimating particle size distributions from optical scattering measurements using constrained Gaussian process regression. The estimation of particle size distributions is commonly formulated as a Fredholm integral equation of the first kind, an ill-posed inverse problem characterized by instability due to measurement noise and limited data. To address this, we use a Gaussian process prior to regularize the solution and integrate a normalization constraint into the Gaussian process via two approaches: by constraining the Gaussian process using a pseudo-measurement and by using Lagrange multipliers in the equivalent optimization problem. To improve computational efficiency, we employ a spectral expansion of the covariance kernel using eigenfunctions of the Laplace operator, resulting in a computationally tractable low-rank representation without sacrificing accuracy. Additionally, we investigate two complementary strategies for hyperparameter estimation: a data-driven approach based on maximizing the unconstrained log marginal likelihood, and an alternative approach where the physical constraints are taken into account. Numerical experiments demonstrate that the proposed constrained Gaussian process regression framework accurately reconstructs particle size distributions, producing numerically stable, smooth, and physically interpretable results. This methodology provides a principled and efficient solution for addressing inverse scattering problems and related ill-posed integral equations.
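The pseudo-measurement route to the normalization constraint can be pictured as ordinary Gaussian conditioning on one linear observation. The sketch below assumes a discretized distribution on a grid, a squared-exponential kernel, and trapezoidal quadrature weights; these choices and the name `condition_on_linear` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def condition_on_linear(mean, cov, h, y, noise_var=0.0):
    """Condition N(mean, cov) on the scalar observation y = h @ f + noise."""
    s = h @ cov @ h + noise_var           # innovation variance
    gain = cov @ h / s
    new_mean = mean + gain * (y - h @ mean)
    new_cov = cov - np.outer(gain, h @ cov)
    return new_mean, new_cov

# Grid and prior (assumed): zero mean, squared-exponential covariance.
x = np.linspace(0.0, 1.0, 21)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2 ** 2)
mean0 = np.zeros_like(x)

# Trapezoidal weights so that w @ f approximates the integral of f over [0, 1].
w = np.full(x.size, x[1] - x[0])
w[[0, -1]] *= 0.5

# Pseudo-measurement: the distribution must integrate to one.
mean1, cov1 = condition_on_linear(mean0, K, w, 1.0)
```

After conditioning, `w @ mean1` equals one and the posterior has no remaining variance along `w`, so every posterior sample satisfies the discretized normalization. Light-scattering measurements could be folded in with further calls of the same form, one per linear measurement functional.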



Article 1079

Title@2025-07-04 (5): Inverse Synthetic Aperture Fourier Ptychography

Title: Inverse Synthetic Aperture Fourier Ptychography Inverse Synthetische Blende Fourier Ptychographie 反向合成孔径孔径 2507.03733v1

Authors (3): Matthew A. Chan, Casey J. Pellizzari, Christopher A. Metzler

Fourier ptychography (FP) is a powerful light-based synthetic aperture imaging technique that allows one to reconstruct a high-resolution, wide field-of-view image by computationally integrating a diverse collection of low-resolution, far-field measurements. Typically, FP measurement diversity is introduced by changing the angle of the illumination or the position of the camera; either approach results in sampling different portions of the target’s spatial frequency content, but both approaches introduce substantial costs and complexity to the acquisition process. In this work, we introduce Inverse Synthetic Aperture Fourier Ptychography, a novel approach to FP that foregoes changing the illumination angle or camera position and instead generates measurement diversity through target motion. Critically, we also introduce a novel learning-based method for estimating k-space coordinates from dual plane intensity measurements, thereby enabling synthetic aperture imaging without knowing the rotation of the target. We experimentally validate our method in simulation and on a tabletop optical system.



Article 1080

Title@2025-07-04 (5): Transforming Calabi-Yau Constructions: Generating New Calabi-Yau Manifolds with Transformers

Title: Transforming Calabi-Yau Constructions: Generating New Calabi-Yau Manifolds with Transformers Calabi-Yau-Konstruktionen transformieren: Neue Calabi-Yau-Manifolds mit Transformern generieren Calabi-Yau 变换建筑: 以变换器生成新 Calabi- Yau 的 变换元件 2507.03732v1

Authors (4): Jacky H. T. Yip, Charles Arnal, Francois Charton, Gary Shiu

Fine, regular, and star triangulations (FRSTs) of four-dimensional reflexive polytopes give rise to toric varieties, within which generic anticanonical hypersurfaces yield smooth Calabi-Yau threefolds. We employ transformers – deep learning models originally developed for language modeling – to generate FRSTs across a range of polytope sizes. Our models exhibit efficient and unbiased sampling, and can self-improve through retraining on their own output. These results lay the foundation for AICY: a community-driven platform that combines self-improving machine learning models with a continuously expanding FRST database to explore and catalog the Calabi-Yau landscape.



Article 1081

Title@2025-07-04 (5): Less is More: Empowering GUI Agent with Context-Aware Simplification

Title: Less is More: Empowering GUI Agent with Context-Aware Simplification Weniger ist mehr: Ermächtigen von GUI-Agenten mit Context-Aware-Vereinfachung 更少是更多: 授权用户界面代理,简化内容软件 2507.03730v1

Authors (10): Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie

The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agents and summarize two findings: 1) the high density and loose relations of element context point to the existence of many unrelated elements and their negative influence; 2) the high redundancy of history context reveals inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI agent, termed SimpAgent. To mitigate potential interference from numerous unrelated elements, we introduce a masking-based element pruning method that circumvents the intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a consistency-guided history compression module, which enhances implicit LLM-based compression through innovative explicit guidance, achieving an optimal balance between performance and efficiency. With the above components, SimpAgent reduces FLOPs by 27% and achieves superior GUI navigation performance. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.



Article 1082

Title@2025-07-04 (5): FAROS: Fair Graph Generation via Attribute Switching Mechanisms

Title: FAROS: Fair Graph Generation via Attribute Switching Mechanisms FAROS: Fair Graph Generierung über Attributwechselmechanismen FAROS:通过属性转换机制产生公平的图表 2507.03728v1

Authors (5): Abdennacer Badaoui, Oussama Kharouiche, Hatim Mrabet, Daniele Malitesta, Fragkiskos D. Malliaros

Recent advancements in graph diffusion models (GDMs) have enabled the synthesis of realistic network structures, yet ensuring fairness in the generated data remains a critical challenge. Existing solutions attempt to mitigate bias by re-training the GDMs with ad-hoc fairness constraints. In contrast, we propose FAROS, a novel FAir graph geneRatiOn framework that leverages attribute Switching mechanisms and runs directly in the generation process of the pre-trained GDM. Technically, our approach works by altering nodes’ sensitive attributes during generation. To this end, FAROS calculates the optimal fraction of switching nodes and selects the diffusion step at which to perform the switch by setting tailored multi-criteria constraints that preserve the node-topology profile of the original distribution (a proxy for accuracy) while ensuring that edges are independent of the sensitive attributes in the generated graph (a proxy for fairness). Our experiments on benchmark datasets for link prediction demonstrate that the proposed approach effectively reduces fairness discrepancies while maintaining accuracy comparable to (or even higher than) that of similar baselines. Notably, FAROS also strikes a better accuracy-fairness trade-off than competing methods in some of the tested settings under the Pareto optimality concept, demonstrating the effectiveness of the imposed multi-criteria constraints.



Article 1083

Title@2025-07-04 (5): Temporal Window Smoothing of Exogenous Variables for Improved Time Series Prediction

Title: Temporal Window Smoothing of Exogenous Variables for Improved Time Series Prediction Temporale Fensterglättung von exogenen Variablen für eine verbesserte Zeitreihenvorhersage 改进时间序列预测外源变量平滑时间窗口 2507.05284v1

Authors (5): Mustafa Kamal, Niyaz Bin Hashem, Robin Krambroeckers, Nabeel Mohammed, Shafin Rahman

Although most transformer-based time series forecasting models primarily depend on endogenous inputs, recent state-of-the-art approaches have significantly improved performance by incorporating external information through exogenous inputs. However, these methods face challenges, such as redundancy when endogenous and exogenous inputs originate from the same source and limited ability to capture long-term dependencies due to fixed look-back windows. In this paper, we propose a method that whitens the exogenous input to reduce redundancy that may persist within the data based on global statistics. Additionally, our approach helps the exogenous input to be more aware of patterns and trends over extended periods. By introducing this refined, globally context-aware exogenous input to the endogenous input without increasing the lookback window length, our approach guides the model towards improved forecasting. Our approach achieves state-of-the-art performance in four benchmark datasets, consistently outperforming 11 baseline models. These results establish our method as a robust and effective alternative for using exogenous inputs in time series forecasting.



Article 1084

Title@2025-07-04 (5): Differentially private scale testing via rank transformations and percentile modifications

Title: Differentially private scale testing via rank transformations and percentile modifications Differential private Skalierungstests über Rangtransformationen und Perzentilmodifikationen 通过等级变换和百分度变换进行差别私人规模测试 2507.03725v1

Authors (2): Joshua Levine, Kelly Ramsay

We develop a class of differentially private two-sample scale tests, called the rank-transformed percentile-modified Siegel–Tukey tests, or RPST tests. These RPST tests are inspired both by recent differentially private extensions of some common rank tests and by some older modifications to non-private rank tests. We present the asymptotic distribution of the RPST test statistic under the null hypothesis, under a very general condition on the rank transformation. We also prove that RPST tests are differentially private and that their type I error does not exceed the given level. We find that the growth rate of the rank transformation presents a tradeoff between power and sensitivity. We run extensive simulations to investigate the effects of the tuning parameters and compare against a general private testing framework. Lastly, we show that our techniques can also be used to improve the differentially private signed-rank test.
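For orientation, the classical (non-private) Siegel–Tukey ranking that these tests build on can be sketched as follows. This is only the textbook ranking without ties; it includes neither the rank transformation, nor the percentile modification, nor any privacy mechanism, and the function names are chosen for this example.

```python
def siegel_tukey_ranks(n):
    """Ranks for n sorted values: 1 to the smallest, 2 and 3 to the two
    largest, 4 and 5 to the next two smallest, and so on."""
    ranks = [0] * n
    lo, hi = 0, n - 1
    from_low, group = True, 1          # the first group has a single element
    for rank in range(1, n + 1):
        if from_low:
            ranks[lo] = rank
            lo += 1
        else:
            ranks[hi] = rank
            hi -= 1
        group -= 1
        if group == 0:                 # switch ends after each group
            from_low = not from_low
            group = 2
    return ranks

def siegel_tukey_statistic(x, y):
    """Sum of Siegel-Tukey ranks of sample x in the pooled sample (no ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = siegel_tukey_ranks(len(pooled))
    return sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)

concentrated = siegel_tukey_statistic([4, 5, 6], [1, 2, 8, 9, 10])
spread_out = siegel_tukey_statistic([1, 2, 10], [4, 5, 6, 8, 9])
```

Extreme observations receive small ranks, so a sample with larger scale tends to have a smaller rank sum than a sample concentrated near the middle.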



Article 1085

Title@2025-07-04 (5): Predicting Business Angel Early-Stage Decision Making Using AI

Title: Predicting Business Angel Early-Stage Decision Making Using AI Geschäftsengel frühzeitige Entscheidungsfindung mit KI voraussagen 利用大赦国际预测商业天使早期阶段决策 2507.03721v1

Authors (2): Yan Katcharovski, Andrew L. Maxwell

External funding is crucial for early-stage ventures, particularly technology startups that require significant R&D investment. Business angels offer a critical source of funding, but their decision-making is often subjective and resource-intensive for both investor and entrepreneur. Much research has investigated this investment process to find the critical factors angels consider. One such tool, the Critical Factor Assessment (CFA), deployed more than 20,000 times by the Canadian Innovation Centre, has been evaluated post-decision and found to be significantly more accurate than investors’ own decisions. However, a single CFA analysis requires three trained individuals and several days, limiting its adoption. This study builds on previous work validating the CFA to investigate whether the constraints inhibiting its adoption can be overcome using a trained AI model. In this research, we prompted multiple large language models (LLMs) to assign the eight CFA factors to a dataset of 600 transcribed, unstructured startup pitches seeking business angel funding with known investment outcomes. We then trained and evaluated machine learning classification models using the LLM-generated CFA scores as input features. Our best-performing model demonstrated high predictive accuracy (85.0% for predicting BA deal/no-deal outcomes) and exhibited significant correlation (Spearman’s r = 0.896, p-value < 0.001) with conventional human-graded evaluations. The integration of AI-based feature extraction with a structured and validated decision-making framework yielded a scalable, reliable, and less-biased model for evaluating startup pitches, removing the constraints that previously limited adoption.



Article 1086

Title@2025-07-04 (5): Code Simulation as a Proxy for High-order Tasks in Large Language Models

Title: Code Simulation as a Proxy for High-order Tasks in Large Language Models Code-Simulation als Proxy für High-Order-Aufgaben in großen Sprachmodellen 守则模拟作为大语言模型中高顺序任务代理器的代码模拟 2502.03568v3

Authors (9): Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, X. Angelo Huang, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. We collect pairs of naturalistic and synthetic reasoning tasks to assess the capabilities of Large Language Models (LLM). While naturalistic tasks often require careful human handcrafting, we show that synthetic data is, in many cases, a good proxy that is much easier to collect at scale. We leverage common constructs in programming as the counterpart of the building blocks of naturalistic reasoning tasks, such as straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the capabilities of LLMs on sorting problems and repeated operations via sorting algorithms and nested loops. Our synthetic datasets further reveal that while the most powerful LLMs exhibit relatively strong execution capabilities, the process is fragile: it is negatively affected by memorisation and seems to rely heavily on pattern recognition. Our contribution builds upon synthetically testing the reasoning capabilities of LLMs as a scalable complement to handcrafted human-annotated problems.



Article 1087

Title@2025-07-04 (5): CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning

Title: CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning CosmoBench: Multiscale, Multiview, Multitask Kosmologie Benchmark für geometrisches Deep Learning CosmoBench: 用于几何深习的多尺度、多视角、多任务宇宙学基准 2507.03707v1

Authors (9): Ningyuan Huang, Richard Stiskalek, Jun-Young Lee, Adrian E. Bayer, Charles C. Margossian, Christian Kragh Jespersen, Lucia A. Perez, Lawrence K. Saul, Francisco Villaescusa-Navarro

Cosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and composition of the Universe. In this paper we introduce CosmoBench, a benchmark dataset curated from state-of-the-art cosmological simulations whose runs required more than 41 million core-hours and generated over two petabytes of data. CosmoBench is the largest dataset of its kind: it contains 34 thousand point clouds from simulations of dark matter halos and galaxies at three different length scales, as well as 25 thousand directed trees that record the formation history of halos on two different time scales. The data in CosmoBench can be used for multiple tasks – to predict cosmological parameters from point clouds and merger trees, to predict the velocities of individual halos and galaxies from their collective positions, and to reconstruct merger trees on finer time scales from those on coarser time scales. We provide several baselines on these tasks, some based on established approaches from cosmological modeling and others rooted in machine learning. For the latter, we study different approaches – from simple linear models that are minimally constrained by symmetries to much larger and more computationally-demanding models in deep learning, such as graph neural networks. We find that least-squares fits with a handful of invariant features sometimes outperform deep architectures with many more parameters and far longer training times. Still there remains tremendous potential to improve these baselines by combining machine learning and cosmology to fully exploit the data. CosmoBench sets the stage for bridging cosmology and geometric deep learning at scale. We invite the community to push the frontier of scientific discovery by engaging with this dataset, available at https://cosmobench.streamlit.app



Article 1088

Title@2025-07-04 (5): Offline RLAIF: Piloting VLM Feedback for RL via SFO

Title: Offline RLAIF: Piloting VLM Feedback for RL via SFO Offline RLAIF: Pilotierung von VLM Feedback für RL über SFO RLAIF离线:通过SFO试行VLM对RL的反馈 2503.01062v5

Authors (1): Jacob Beck

While internet-scale image and textual data have enabled strong generalization in Vision-Language Models (VLMs), the absence of internet-scale control data has impeded the development of similar generalization in standard reinforcement learning (RL) agents. Although VLMs are fundamentally limited in their ability to solve control tasks due to their lack of action-conditioned training data, their capacity for image understanding allows them to provide valuable feedback in RL tasks by recognizing successful outcomes. A key challenge in Reinforcement Learning from AI Feedback (RLAIF) is determining how best to integrate VLM-derived signals into the learning process. We explore this question in the context of offline RL and introduce a class of methods called Sub-Trajectory Filtered Optimization (SFO). We identify three key insights. First, trajectory length plays a crucial role in offline RL, as full-trajectory preference learning exacerbates the stitching problem, necessitating the use of sub-trajectories. Second, even in Markovian environments, a non-Markovian reward signal from a sequence of images is required to assess trajectory improvement, as VLMs do not interpret control actions and must rely on visual cues over time. Third, a simple yet effective approach, filtered and weighted behavior cloning, consistently outperforms more complex RLHF-based methods. We propose Sub-Trajectory Filtered Behavior Cloning (SFBC), a method that leverages VLM feedback on sub-trajectories while incorporating a retrospective filtering mechanism that removes sub-trajectories preceding failures to improve robustness and prevent turbulence. Please enjoy our airport puns.
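The filtering idea can be pictured with a small data-preparation sketch. Everything here is hypothetical: `score_fn` stands in for VLM feedback on a sub-trajectory, and the window length, threshold, and `failure_horizon` are illustrative parameters rather than the settings or the exact procedure of SFBC.

```python
def split(steps, length):
    """Cut a trajectory into consecutive, non-overlapping sub-trajectories."""
    return [steps[i:i + length] for i in range(0, len(steps) - length + 1, length)]

def filtered_dataset(trajectories, length, score_fn, threshold=0.0, failure_horizon=1):
    """Keep sub-trajectories whose score exceeds `threshold`, each paired with
    its score as a behavior-cloning weight. For failed trajectories, the last
    `failure_horizon` sub-trajectories are dropped first (retrospective filter)."""
    kept = []
    for steps, failed in trajectories:
        windows = split(steps, length)
        if failed and failure_horizon > 0:
            windows = windows[:-failure_horizon]
        for window in windows:
            score = score_fn(window)
            if score > threshold:
                kept.append((window, score))
    return kept

# Toy example: each step is a scalar "progress" value and the stand-in score
# is the progress gained across the window.
progress = [0.0, 0.1, 0.3, 0.2, 0.6, 0.9]
gain = lambda window: window[-1] - window[0]

from_success = filtered_dataset([(progress, False)], 2, gain)
from_failure = filtered_dataset([(progress, True)], 2, gain)
```

The resulting (sub-trajectory, weight) pairs would then feed a weighted behavior-cloning loss; in the failed trajectory the final window is discarded even though its score is high.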



Article 1089

Title@2025-07-04 (5): The Geometries of Truth Are Orthogonal Across Tasks

Title: The Geometries of Truth Are Orthogonal Across Tasks Die Geometrien der Wahrheit sind orthogonal über Aufgaben hinweg 真理的地貌不同任务 2506.08572v2

Authors (7): Waiss Azizian, Michael Kirchhof, Eugene Ndiaye, Louis Bethune, Michal Klein, Pierre Ablin, Marco Cuturi

Large Language Models (LLMs) have demonstrated impressive generalization capabilities across various tasks, but their claim to practical relevance is still marred by concerns about their reliability. Recent works have proposed examining the activations produced by an LLM at inference time to assess whether its answer to a question is correct. Some works claim that a “geometry of truth” can be learned from examples, in the sense that the activations that generate correct answers can be distinguished from those leading to mistakes with a linear classifier. In this work, we underline a limitation of these approaches: we observe that these “geometries of truth” are intrinsically task-dependent and fail to transfer across tasks. More precisely, we show that linear classifiers trained across distinct tasks share little similarity and, when trained with sparsity-enforcing regularizers, have almost disjoint supports. We show that more sophisticated approaches (e.g., using mixtures of probes and tasks) fail to overcome this limitation, likely because activation vectors commonly used to classify answers form clearly separated clusters when examined across tasks.
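Two simple ways to quantify how much a pair of linear probes share are the cosine similarity of their weight vectors and the Jaccard overlap of their supports. The sketch below uses made-up weight vectors; the metric choice and the names are assumptions for illustration, not necessarily the measures used in the paper.

```python
import math

def support(weights, tol=1e-8):
    """Indices of the coordinates a sparse probe actually uses."""
    return {i for i, w in enumerate(weights) if abs(w) > tol}

def jaccard(a, b):
    """Overlap of two index sets: 0.0 for disjoint, 1.0 for identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Hypothetical sparse probe weights for three tasks.
probe_a = [0.8, 0.0, 0.0, -0.5, 0.0]
probe_b = [0.0, 0.3, 0.0, 0.0, 0.9]
probe_c = [0.8, 0.3, 0.0, 0.0, 0.0]

overlap_ab = jaccard(support(probe_a), support(probe_b))
overlap_ac = jaccard(support(probe_a), support(probe_c))
similarity_ab = cosine(probe_a, probe_b)
```

Probes with disjoint supports, like `probe_a` and `probe_b` here, have zero overlap and zero cosine similarity, which is the pattern the abstract describes across tasks.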



Article 1090

Title@2025-07-04 (5): Plugging Attention into Power Grids: Towards Transparent Forecasting

Title: Plugging Attention into Power Grids: Towards Transparent Forecasting Einbinden der Aufmerksamkeit in Stromnetze: Auf dem Weg zu einer transparenten Prognose 将注意力引入电力网网:走向透明预测 2507.03690v1

Authors (5): Eloi Campagne, Itai Zehavi, Yvenn Amara-Ouali, Yannig Goude, Argyris Kalogeratos

Accurate electricity consumption forecasting is crucial for ensuring grid stability and optimizing power generation, particularly in increasingly decentralized and complex systems. While classical approaches such as Generalized Additive Models (GAMs) remain widely used, they often fail to capture the spatial dependencies inherent in energy networks. Graph Neural Networks (GNNs) offer a principled framework to incorporate this structure by directly leveraging graph topologies. In this work, we evaluate a broad set of GNN architectures – including GCN, GraphSAGE, ChebConv, TAG, APPNP, TransformerConv, and Graph Attention Networks (GAT and GATv2) – on two real-world electricity consumption datasets from France and the UK. Our experiments show that while complex architectures like GATv2 and TransformerConv do not consistently outperform their simpler counterparts, models such as GCN and APPNP achieve strong results in low-data or highly disaggregated settings. Nonetheless, the vanilla GAT remains highly competitive across both datasets and offers an additional interpretability layer via attention mechanisms. We perform a temporal analysis of attention weights, revealing evolving patterns of regional interaction linked to seasonal and meteorological variability. These results highlight that, although attention is not universally superior, it provides valuable explanatory power when spatial dependencies are prominent. Finally, we benchmark ensemble-based expert aggregation strategies, showing that uniform or learned combinations can enhance robustness and outperform individual models under data heterogeneity.



Article 1091

Title@2025-07-04 (5): A Resource Efficient Quantum Kernel

Title: A Resource Efficient Quantum Kernel Ein ressourceneffizienter Quantenkern 资源效率高的量子核心 2507.03689v1

Authors (5): Utkarsh Singh, Marco Armenta, Jean-Frédéric Laprade, Aaron Z. Goldberg, Khabat Heshami

Quantum processors may enhance machine learning by mapping high-dimensional data onto quantum systems for processing. Conventional quantum kernels, or feature maps, for encoding data features onto a quantum circuit are currently impractical, as the number of entangling gates scales quadratically with the dimension of the dataset and the number of qubits. In this work, we introduce a quantum kernel designed to handle high-dimensional data with a significantly reduced number of qubits and entangling operations. Our approach preserves essential data characteristics while promoting computational efficiency, as evidenced by extensive experiments on benchmark datasets that demonstrate a marked improvement in both accuracy and resource utilization, as compared to state-of-the-art quantum feature maps. Our noisy simulation results, combined with lower resource requirements, highlight our kernel’s ability to function within the constraints of noisy intermediate-scale quantum devices. Through numerical simulations and small-scale implementation on a superconducting circuit quantum computing platform, we demonstrate that our scheme performs on par with or better than a set of classical algorithms for classification. Our findings herald a promising avenue for the practical implementation of quantum machine learning algorithms on near-future quantum computing platforms.



Article 1092

Title@2025-07-04 (5): Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning

Title: Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning Auf dem Weg zur Maschinentheorie des Geistes mit groß sprachlicher modellverstärkter Inverse Planung 通向大语言模拟反向规划的脑力理论 2507.03682v1

Authors (3): Rebekah A. Gelpí, Eric Xue, William A. Cunningham

We propose a hybrid approach to machine Theory of Mind (ToM) that uses large language models (LLMs) as a mechanism for generating hypotheses and likelihood functions with a Bayesian inverse planning model that computes posterior probabilities for an agent’s likely mental states given its actions. Bayesian inverse planning models can accurately predict human reasoning on a variety of ToM tasks, but these models are constrained in their ability to scale these predictions to scenarios with a large number of possible hypotheses and actions. Conversely, LLM-based approaches have recently demonstrated promise in solving ToM benchmarks, but can exhibit brittleness and failures on reasoning tasks even when they pass otherwise structurally identical versions. By combining these two methods, this approach leverages the strengths of each component, closely matching optimal results on a task inspired by prior inverse planning models and improving performance relative to models that utilize LLMs alone or with chain-of-thought prompting, even with smaller LLMs that typically perform poorly on ToM tasks. We also exhibit the model’s potential to predict mental states on open-ended tasks, offering a promising direction for future development of ToM models and the creation of socially intelligent generative agents.
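The Bayesian step at the core of inverse planning is a posterior update over candidate mental states. The following minimal sketch assumes the hypotheses and per-action likelihoods are already available as plain numbers (in the hybrid approach these would come from an LLM); the hypothesis names and values are made up for illustration.

```python
import math
from typing import Dict, List


def posterior(prior: Dict[str, float], likelihoods: Dict[str, List[float]]) -> Dict[str, float]:
    """Posterior over hypotheses given per-action likelihoods P(action_t | hypothesis).

    Actions are treated as conditionally independent given the hypothesis,
    which is a simplifying assumption of this sketch.
    """
    unnormalized = {h: prior[h] * math.prod(likelihoods[h]) for h in prior}
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("all hypotheses have zero probability under the observed actions")
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical example: two candidate goals, two observed actions.
example_prior = {"wants_coffee": 0.5, "wants_tea": 0.5}
example_likelihoods = {"wants_coffee": [0.8, 0.9], "wants_tea": [0.2, 0.1]}
example_posterior = posterior(example_prior, example_likelihoods)
```

Here the posterior mass on `wants_coffee` is 0.36 / 0.37, because both observed actions are far more likely under that goal.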



Article 1093

Title@2025-07-04 (5): Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data

Title: Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data Robuste Abschätzung heterogener Behandlungseffekte in randomisierten Studien, die externe Daten nutzen 利用外部数据对随机试验中多种不同治疗效果的有力估计 2507.03681v1

Authors (4): Rickard Karlsson, Piersilvio De Bartolomeis, Issa J. Dahabreh, Jesse H. Krijthe

Randomized trials are typically designed to detect average treatment effects but often lack the statistical power to uncover effect heterogeneity over patient characteristics, limiting their value for personalized decision-making. To address this, we propose the QR-learner, a model-agnostic learner that estimates conditional average treatment effects (CATE) within the trial population by leveraging external data from other trials or observational studies. The proposed method is robust: it has the potential to reduce the CATE prediction mean squared error while maintaining consistency, even when the external data is not aligned with the trial. Moreover, we introduce a procedure that combines the QR-learner with a trial-only CATE learner and show that it asymptotically matches or exceeds the trial-only learner in terms of mean squared error. We examine the performance of our approach in simulation studies and apply the methods to a real-world dataset, demonstrating improvements in both CATE estimation and statistical power for detecting heterogeneous effects.



Article 1094

Title@2025-07-04 (5): Re-Emergent Misalignment: How Narrow Fine-Tuning Erodes Safety Alignment in LLMs

Title: Re-Emergent Misalignment: How Narrow Fine-Tuning Erodes Safety Alignment in LLMs Re-Emergent-Missausrichtung: Wie schmale Feintuning-Erodes Sicherheit Ausrichtung in LLMs 重新出现错配: LLM 中如何将安全对齐 2507.03662v1

Authors (1): Jeremiah Giordani

Recent work has shown that fine-tuning large language models (LLMs) on code with security vulnerabilities can result in misaligned and unsafe behaviors across broad domains. These results prompted concerns about the emergence of harmful behaviors from narrow domain fine-tuning. In this paper, we contextualize these findings by analyzing how such narrow adaptation impacts the internal mechanisms and behavioral manifestations of LLMs. Through a series of experiments covering output probability distributions, loss and gradient vector geometry, layer-wise activation dynamics, and activation space dimensions, we find that behaviors attributed to “emergent misalignment” may be better interpreted as an erosion of prior alignment. We show that fine-tuning on insecure code induces internal changes that oppose alignment. Further, we identify a shared latent dimension in the model’s activation space that governs alignment behavior. We show that this space is activated by insecure code and by misaligned responses more generally, revealing how narrow fine-tuning can degrade general safety behavior by interfering with shared internal mechanisms. Our findings offer a mechanistic interpretation for previously observed misalignment phenomena and highlight the fragility of alignment in LLMs. The results underscore the need for more robust fine-tuning strategies that preserve intended behavior across domains.



Article 1095

Title@2025-07-04 (5): When Network Architecture Meets Physics: Deep Operator Learning for Coupled Multiphysics

Title: When Network Architecture Meets Physics: Deep Operator Learning for Coupled Multiphysics Wenn Netzwerkarchitektur auf Physik trifft: Deep Operator Learning für gekoppelte Multiphysik 当网络架构与物理相遇时:深操作者为混合多物理学学习 2507.03660v1

Authors (6): Kazuma Kobayashi, Jaewan Park, Qibang Liu, Seid Koric, Diab Abueidda, Syed Bahauddin Alam

Scientific applications increasingly demand real-time surrogate models that can capture the behavior of strongly coupled multiphysics systems driven by multiple input functions, such as in thermo-mechanical and electro-thermal processes. While neural operator frameworks, such as Deep Operator Networks (DeepONets), have shown considerable success in single-physics settings, their extension to multiphysics problems remains poorly understood. In particular, the challenge of learning nonlinear interactions between tightly coupled physical fields has received little systematic attention. This study addresses a foundational question: should the architectural design of a neural operator reflect the strength of physical coupling it aims to model? To answer this, we present the first comprehensive, architecture-aware evaluation of DeepONet variants across three regimes: single-physics, weakly coupled, and strongly coupled multiphysics systems. We consider a reaction-diffusion equation with dual spatial inputs, a nonlinear thermo-electrical problem with bidirectional coupling through temperature-dependent conductivity, and a viscoplastic thermo-mechanical model of steel solidification governed by transient phase-driven interactions. Two operator-learning frameworks, the classical DeepONet and its sequential GRU-based extension, S-DeepONet, are benchmarked using both single-branch and multi-branch (MIONet-style) architectures. Our results demonstrate that architectural alignment with physical coupling is crucial: single-branch networks significantly outperform multi-branch counterparts in strongly coupled settings, whereas multi-branch encodings offer advantages for decoupled or single-physics problems. Once trained, these surrogates achieve full-field predictions up to 1.8e4 times faster than high-fidelity finite-element solvers, without compromising solution accuracy.
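For orientation, the two branch designs being compared are commonly written as below. These are the standard DeepONet and MIONet forms with bias terms omitted; the symbols ($b_k$ for branch outputs, $t_k$ for trunk outputs, $p$ for the latent width, $u_1, u_2$ for the input functions) are notation assumed here rather than taken from the paper.

```latex
% Single-branch DeepONet: all input functions are stacked into one branch input u.
G_\theta(u)(y) \approx \sum_{k=1}^{p} b_k(u)\, t_k(y)

% Multi-branch (MIONet-style): one branch per input function, merged by an element-wise product.
G_\theta(u_1, u_2)(y) \approx \sum_{k=1}^{p} b^{(1)}_k(u_1)\, b^{(2)}_k(u_2)\, t_k(y)
```

In the single-branch form the network can mix the inputs freely before the latent code is formed, whereas the multi-branch form encodes each input separately and only combines them through the product.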



Article 1096

Title@2025-07-04 (5): Forecast Evaluation and the Relationship of Regret and Calibration

Title: Forecast Evaluation and the Relationship of Regret and Calibration Prognosebewertung und das Verhältnis von Bedauern und Kalibrierung 预测评价以及遗憾和校准的关系 2401.14483v3

Authors (2): Rabanus Derr, Robert C. Williamson

Machine learning is about forecasting. When the forecasts come with an evaluation metric, the forecasts become useful. What are reasonable evaluation metrics? How do existing evaluation metrics relate? In this work, we provide a general structure which subsumes many currently used evaluation metrics in a two-dimensional hierarchy, e.g., external and swap regret, loss scores, and calibration scores. The framework embeds those evaluation metrics in a large set of single-instance-based comparisons of forecasts and observations which respect a meta-criterion for reasonable forecast evaluations which we term “fairness”. In particular, this framework sheds light on the relationship between regret-type and calibration-type evaluation metrics, showing a theoretical equivalence in their ability to evaluate but practical incomparability of the obtained scores.
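To make the two metric families concrete, the sketch below computes one regret-type score and one calibration-type score for binary outcomes. The specific definitions (external regret under the Brier loss against the best constant forecast, and a squared-gap calibration score over distinct forecast values) are common textbook choices assumed here, not the general definitions of the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def brier(p: float, y: int) -> float:
    """Squared loss of probability forecast p against binary outcome y."""
    return (p - y) ** 2


def external_regret(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Cumulative Brier loss minus that of the best constant forecast in hindsight."""
    total = sum(brier(p, y) for p, y in zip(forecasts, outcomes))
    best_constant = sum(outcomes) / len(outcomes)  # the mean minimizes squared loss
    best = sum(brier(best_constant, y) for y in outcomes)
    return total - best


def calibration_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Frequency-weighted squared gap between each forecast value and the observed rate."""
    groups: Dict[float, List[int]] = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        groups[p].append(y)  # groups by exact forecast value; bin first for continuous forecasts
    n = len(outcomes)
    return sum(len(ys) / n * (p - sum(ys) / len(ys)) ** 2 for p, ys in groups.items())
```

On outcomes `[1, 0, 1, 0]`, the constant forecast 0.5 scores zero on both measures, while the constant forecast 1.0 has regret 1.0 and calibration score 0.25: both flag the same forecaster as poor, but the numbers live on different scales.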



Article 1097

Title@2025-07-04 (5): PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks

Title: PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks PRUNE: Ein Patching-basiertes Reparatur-Framework für zertifizierbares Entlernen neuraler Netzwerke PPRUNE:神经网络可核证的不可学习补补补式修理框架 2505.06520v3

Authors (4): Xuran Li, Jingyi Wang, Xiaohan Yuan, Peixin Zhang

It is often desirable to remove (a.k.a. unlearn) a specific part of the training data from a trained neural network model. A typical application scenario is to protect the data holder’s right to be forgotten, which has been promoted by many recent regulation rules. Existing unlearning methods involve training alternative models with remaining data, which may be costly and challenging to verify from the perspective of the data holder or a third-party auditor. In this work, we provide a new angle and propose a novel unlearning approach that imposes a carefully crafted “patch” on the original neural network to achieve targeted “forgetting” of the data requested for deletion. Specifically, inspired by the research line of neural network repair, we propose to strategically seek a lightweight minimum “patch” for unlearning a given data point with a certifiable guarantee. Furthermore, to unlearn a considerable number of data points (or an entire class), we propose to iteratively select a small subset of representative data points to unlearn, which achieves the effect of unlearning the whole set. Extensive experiments on multiple categorical datasets demonstrate our approach’s effectiveness, achieving measurable unlearning while preserving the model’s performance and being competitive in efficiency and memory consumption compared to various baseline methods.



Article 1098

Title@2025-07-04 (5): MLASDO: a software tool to detect and explain clinical and omics inconsistencies applied to the Parkinson’s Progression Markers Initiative cohort

Title: MLASDO: a software tool to detect and explain clinical and omics inconsistencies applied to the Parkinson’s Progression Markers Initiative cohort MLASDO: ein Software-Tool zur Erkennung und Erklärung klinischer und omischer Inkonsistenzen, die auf die Kohorte der Parkinson Progression Markers Initiative angewendet werden MLASDO:用于检测和解释适用于帕金森人进步标记倡议组群的临床和流行性不一致现象的软件工具 2507.03656v1

Authors (8): José A. Pardo, Tomás Bernal, Jaime Ñiguez, Ana Luisa Gil-Martínez, Laura Ibañez, José T. Palma, Juan A. Botía, Alicia Gómez-Pascual

Inconsistencies between clinical and omics data may arise within medical cohorts. The identification, annotation and explanation of anomalous omics-based patients or individuals may become crucial to better reshape the disease, e.g., by detecting early onsets signaled by the omics and undetectable from observable symptoms. Here, we developed MLASDO (Machine Learning based Anomalous Sample Detection on Omics), a new method and software tool to identify, characterize and automatically describe anomalous samples based on omics data. Its workflow is based on three steps: (1) classification of healthy and case individuals using a support vector machine algorithm; (2) detection of anomalous samples within groups; (3) explanation of anomalous individuals based on clinical data and expert knowledge. We showcase MLASDO using transcriptomics data of 317 healthy controls (HC) and 465 Parkinson’s disease (PD) cases from the Parkinson’s Progression Markers Initiative. In this cohort, MLASDO detected 15 anomalous HC with a PD-like transcriptomic signature and PD-like clinical features, including a lower proportion of CD4/CD8 naive T-cells and CD4 memory T-cells compared to HC (P < 3.5 × 10^-3). MLASDO also identified 22 anomalous PD cases with a transcriptomic signature more similar to that of HC and some clinical features more similar to HC, including a lower proportion of mature neutrophils compared to PD cases (P < 6 × 10^-3). In summary, MLASDO is a powerful tool that can help the clinician to detect and explain anomalous HC and cases of interest to be followed up. MLASDO is an open-source R package available at: https://github.com/JoseAdrian3/MLASDO.



Article 1099

Title@2025-07-04 (5): Disentangling the Roles of Representation and Selection in Data Pruning

Title: Disentangling the Roles of Representation and Selection in Data Pruning Entwirren der Rollen von Repräsentation und Auswahl bei der Datenkorrektur 分离数据谨慎中的代表和甄选作用 2507.03648v1

Authors (6): Yupei Du, Yingjin Song, Hugh Mee Wong, Daniil Ignatev, Albert Gatt, Dong Nguyen

Data pruning, selecting small but impactful subsets, offers a promising way to efficiently scale NLP model training. However, existing methods often involve many different design choices, which have not been systematically studied. This limits future developments. In this work, we decompose data pruning into two key components: the data representation and the selection algorithm, and we systematically analyze their influence on the selection of instances. Our theoretical and empirical results highlight the crucial role of representations: better representations, e.g., training gradients, generally lead to a better selection of instances, regardless of the chosen selection algorithm. Furthermore, different selection algorithms excel in different settings, and none consistently outperforms the others. Moreover, the selection algorithms do not always align with their intended objectives: for example, algorithms designed for the same objective can select drastically different instances, highlighting the need for careful evaluation.



Article 1100

Title@2025-07-04 (5): SymmetryLens: Unsupervised Symmetry Learning via Locality and Density Preservation

Title: SymmetryLens: Unsupervised Symmetry Learning via Locality and Density Preservation SymmetrieLens: Unüberwachte Symmetrie Lernen über Lokalität und Dichte Erhaltung 对称路程:通过地点和密度保护进行不受监督的对称学习 2410.05232v2

Authors (2): Onur Efe, Arkadas Ozakin

We develop a new unsupervised symmetry learning method that starts with raw data and provides the minimal generator of an underlying Lie group of symmetries, together with a symmetry-equivariant representation of the data, which turns the hidden symmetry into an explicit one. The method is able to learn the pixel translation operator from a dataset with only an approximate translation symmetry and can learn quite different types of symmetries that are not apparent to the naked eye. The method is based on the formulation of an information-theoretic loss function that measures both the degree of symmetry of a dataset under a candidate symmetry generator and a proposed notion of locality of the samples, which is coupled to symmetry. We demonstrate that this coupling between symmetry and locality, together with an optimization technique developed for entropy estimation, results in a stable system that provides reproducible results.



Article 1101

Title@2025-07-04 (5): A Hybrid Supervised and Self-Supervised Graph Neural Network for Edge-Centric Applications

Title: A Hybrid Supervised and Self-Supervised Graph Neural Network for Edge-Centric Applications Hybrides überwachtes und selbstüberwachtes Graphen-Neural-Netzwerk für Edge-Centric-Anwendungen 边缘中心应用混合监督和自我监督图形神经网络 2501.12309v2

Authors (3): Eugenio Borzone, Leandro Di Persia, Matias Gerard

This paper presents a novel graph-based deep learning model for tasks involving relations between two nodes (edge-centric tasks), where the focus lies on predicting relationships and interactions between pairs of nodes rather than node properties themselves. This model combines supervised and self-supervised learning, with a loss function that takes into account both the learned embeddings and patterns with and without ground truth. Additionally, it incorporates an attention mechanism that leverages both node and edge features. The architecture, trained end-to-end, comprises two primary components: embedding generation and prediction. First, a graph neural network (GNN) transforms raw node features into dense, low-dimensional embeddings, incorporating edge attributes. Then, a feedforward neural model processes the node embeddings to produce the final output. Experiments demonstrate that our model matches or exceeds existing methods for protein-protein interaction prediction and Gene Ontology (GO) term prediction. The model also performs effectively with one-hot encoding for node features, providing a solution for the previously unsolved problem of predicting similarity between compounds with unknown structures.



Article 1102

Title@2025-07-04 (5): Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States

Title: Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States Probing Latent Subspaces in LLM für KI-Sicherheit: Identifizierung und Manipulation von Adversarialstaaten 为AI安全而在LLM中为LLM中为LLM中进行试验的低端子空间:识别和操纵反向国家 2503.09066v2

Authors (3): Xin Wei Chia, Swee Liang Wong, Jonathan Pan

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they remain vulnerable to adversarial manipulations such as jailbreaking via prompt injection attacks. These attacks bypass safety mechanisms to generate restricted or harmful content. In this study, we investigated the underlying latent subspaces of safe and jailbroken states by extracting hidden activations from an LLM. Inspired by attractor dynamics in neuroscience, we hypothesized that LLM activations settle into semi-stable states that can be identified and perturbed to induce state transitions. Using dimensionality reduction techniques, we projected activations from safe and jailbroken responses to reveal latent subspaces in lower-dimensional spaces. We then derived a perturbation vector that, when applied to safe representations, shifted the model towards a jailbreak state. Our results demonstrate that this causal intervention results in statistically significant jailbreak responses in a subset of prompts. Next, we probed how these perturbations propagate through the model’s layers, testing whether the induced state change remains localized or cascades throughout the network. Our findings indicate that targeted perturbations induced distinct shifts in activations and model responses. Our approach paves the way for potential proactive defenses, shifting from traditional guardrail-based methods to preemptive, model-agnostic techniques that neutralize adversarial states at the representation level.



Article 1103

Title@2025-07-04 (5): PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction

Title: PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction PPFL-RDSN: 用于加密失传图像重建的基于隐私-保护联邦学习的残余常量空间网络 2507.00230v2

Authors (2): Peilin He, James Joshi

Reconstructing high-quality images from low-resolution inputs using Residual Dense Spatial Networks (RDSNs) is crucial yet challenging, particularly in collaborative scenarios where centralized training poses significant privacy risks, including data leakage and inference attacks, as well as high computational costs. We propose a novel Privacy-Preserving Federated Learning-based RDSN (PPFL-RDSN) framework specifically tailored for lossy image reconstruction. PPFL-RDSN integrates Federated Learning (FL), local differential privacy, and robust model watermarking techniques, ensuring data remains secure on local devices, safeguarding sensitive information, and maintaining model authenticity without revealing underlying data. Empirical evaluations show that PPFL-RDSN achieves comparable performance to the state-of-the-art centralized methods while reducing computational burdens, and effectively mitigates security and privacy vulnerabilities, making it a practical solution for secure and privacy-preserving collaborative computer vision applications.



Article 1104

Title@2025-07-04 (5): Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations

Title: Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations Wissenschaftliches maschinelles Lernen chaotischer Systeme entdeckt regierende Gleichungen für neurale Bevölkerungen 神经人口等分的麻风系统发现科学机学 2507.03631v1

Authors (9): Anthony G. Chesebro, David Hofmann, Vaibhav Dixit, Earl K. Miller, Richard H. Granger, Alan Edelman, Christopher V. Rackauckas, Lilianne R. Mujica-Parodi, Helmut H. Strey

Discovering governing equations that describe complex chaotic systems remains a fundamental challenge in physics and neuroscience. Here, we introduce the PEM-UDE method, which combines the prediction-error method with universal differential equations to extract interpretable mathematical expressions from chaotic dynamical systems, even with limited or noisy observations. This approach succeeds where traditional techniques fail by smoothing optimization landscapes and removing the chaotic properties during the fitting process without distorting optimal parameters. We demonstrate its efficacy by recovering hidden states in the Rossler system and reconstructing dynamics from noise-corrupted electrical circuit data, where the correct functional form of the dynamics is recovered even when one of the observed time series is corrupted by noise 5x the magnitude of the true signal. We demonstrate that this method is capable of recovering the correct dynamics, whereas direct symbolic regression methods, such as SINDy, fail to do so with the given amount of data and noise. Importantly, when applied to neural populations, our method derives novel governing equations that respect biological constraints such as network sparsity - a constraint necessary for cortical information processing yet not captured in next-generation neural mass models - while preserving microscale neuronal parameters. These equations predict an emergent relationship between connection density and both oscillation frequency and synchrony in neural circuits. We validate these predictions using three intracranial electrode recording datasets from the medial entorhinal cortex, prefrontal cortex, and orbitofrontal cortex. Our work provides a pathway to develop mechanistic, multi-scale brain models that generalize across diverse neural architectures, bridging the gap between single-neuron dynamics and macroscale brain activity.



Article 1105

Title@2025-07-04 (5): Disentangling Doubt in Deep Causal AI

Title: Disentangling Doubt in Deep Causal AI Disentangling Zweifel in tiefen kausalen KI 《深因果中驱散怀疑》 2507.03622v1

Authors (1): Cooper Doyle

Accurate individual treatment-effect estimation in high-stakes applications demands both reliable point predictions and interpretable uncertainty quantification. We propose a factorized Monte Carlo Dropout framework for deep twin-network models that splits total predictive variance into representation uncertainty (sigma_rep) in the shared encoder and prediction uncertainty (sigma_pred) in the outcome heads. Across three synthetic covariate-shift regimes, our intervals are well-calibrated (ECE < 0.03) and satisfy sigma_rep^2 + sigma_pred^2 ~ sigma_tot^2. Additionally, we observe a crossover: head uncertainty leads on in-distribution data, but representation uncertainty dominates under shift. Finally, on a real-world twins cohort with induced multivariate shifts, only sigma_rep spikes on out-of-distribution samples (delta sigma ~ 0.0002) and becomes the primary error predictor (rho_rep <= 0.89), while sigma_pred remains flat. This module-level decomposition offers a practical diagnostic for detecting and interpreting uncertainty sources in deep causal-effect models.
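One way to see why the two components can add up to the total is the law of total variance applied to nested stochastic forward passes. The sketch below is an illustration under assumptions, not the paper's estimator: `preds[r][h]` is taken to be the prediction under encoder dropout mask `r` and head dropout mask `h`, with the same number of head samples per encoder mask, and the simulated noise levels are arbitrary.

```python
import random
from statistics import mean, pvariance
from typing import List, Tuple


def decompose_variance(preds: List[List[float]]) -> Tuple[float, float, float]:
    """Split total variance into a between-encoder-mask and a within-encoder-mask part."""
    row_means = [mean(row) for row in preds]
    var_rep = pvariance(row_means)                       # spread caused by encoder masks
    var_pred = mean([pvariance(row) for row in preds])   # average spread caused by head masks
    var_tot = pvariance([x for row in preds for x in row])
    return var_rep, var_pred, var_tot


def simulate(n_enc: int = 20, n_head: int = 30, seed: int = 0) -> List[List[float]]:
    """Toy stand-in for stochastic forward passes; encoder noise is shared within a row."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_enc):
        z = rng.gauss(0.0, 0.5)  # hypothetical encoder-level perturbation
        preds.append([1.0 + z + rng.gauss(0.0, 0.2) for _ in range(n_head)])
    return preds
```

With equally sized rows and population variances, `var_rep + var_pred` equals `var_tot` up to floating-point error; an estimator that samples the two sources separately would only satisfy the relation approximately.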



Article 1106

Title@2025-07-04 (5): Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy

Title: Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy Ist es Zeit, Prompts als Code zu behandeln? Eine Multi-Use-Fallstudie für Prompt-Optimierung mit DSPy 是否是时候将提示作为代码处理? 使用 DSPy 快速优化的多用途案例研究 2507.03620v1

Authors (3): Francisca Lemos, Victor Alves, Filipa Ferraz

Although prompt engineering is central to unlocking the full potential of Large Language Models (LLMs), crafting effective prompts remains a time-consuming trial-and-error process that relies on human intuition. This study investigates Declarative Self-improving Python (DSPy), an optimization framework that programmatically creates and refines prompts, applied to five use cases: guardrail enforcement, hallucination detection in code, code generation, routing agents, and prompt evaluation. Each use case explores how prompt optimization via DSPy influences performance. While some cases demonstrated modest improvements - such as minor gains in the guardrails use case and selective enhancements in hallucination detection - others showed notable benefits. The prompt evaluation criterion task demonstrated a substantial performance increase, with accuracy rising from 46.2% to 64.0%. In the router agent case, the possibility of improving a poorly performing prompt and of a smaller model matching a stronger one through optimized prompting was explored. Although prompt refinement increased accuracy from 85.0% to 90.0%, using the optimized prompt with a cheaper model did not improve performance. Overall, this study’s findings suggest that DSPy’s systematic prompt optimization can enhance LLM performance, particularly when instruction tuning and example selection are optimized together. However, the impact varies by task, highlighting the importance of evaluating specific use cases in prompt optimization research.



Article 1107

Title@2025-07-04 (5): DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation

Title: DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation DTN: Deep Multiple Task-spezifisches Feature Interactions Network für Multi-Task-Empfehlung DTN:多任务建议深度多任务特有互动网络 2408.11611v4

Authors (17): Yaowen Bi, Yuteng Lian, Jie Cui, Jun Liu, Peijian Wang, Guanghui Li, Xuejun Chen, Jinglin Zhao, Hao Wen, Jing Zhang, Zhaoqi Zhang, Wenzhuo Song, Yang Sun, Weiwei Zhang, Mingchen Cai, Jian Dong, Guanxing Zhang

Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis across various tasks in MTL, we have observed an interesting divergence phenomenon: the same feature can have significantly different importance across different tasks in MTL. To address these issues, we propose Deep Multiple Task-specific Feature Interactions Network (DTN) with a novel model structure design. DTN introduces multiple diversified task-specific feature interaction methods and task-sensitive network in MTL networks, enabling the model to learn task-specific diversified feature interaction representations, which improves the efficiency of joint representation learning in a general setup. We applied DTN to our company’s real-world E-commerce recommendation dataset, which consisted of over 6.3 billion samples; the results demonstrated that DTN significantly outperformed state-of-the-art MTL models. Moreover, during online evaluation of DTN in a large-scale E-commerce recommender system, we observed a 3.28% increase in clicks, a 3.10% increase in orders and a 2.70% increase in GMV (Gross Merchandise Value) compared to the state-of-the-art MTL models. Finally, extensive offline experiments conducted on public benchmark datasets demonstrate that DTN can be applied to various scenarios beyond recommendations, enhancing the performance of ranking models.



Article 1108

Title@2025-07-04 (5): Non-negative matrix factorization algorithms generally improve topic model fits

Title: Non-negative matrix factorization algorithms generally improve topic model fits Nicht-negative Matrixfaktorisierungsalgorithmen verbessern generell Themenmodell-Anpassungen 非负矩阵非负矩阵乘数化算法一般改进专题模式 2105.13440v4

Authors (4): Peter Carbonetto, Abhishek Sarkar, Zihao Wang, Matthew Stephens

In an effort to develop topic modeling methods that can be quickly applied to large data sets, we revisit the problem of maximum-likelihood estimation in topic models. It is known, at least informally, that maximum-likelihood estimation in topic models is closely related to non-negative matrix factorization (NMF). Yet, to our knowledge, this relationship has not been exploited previously to fit topic models. We show that recent advances in NMF optimization methods can be leveraged to fit topic models very efficiently, often resulting in much better fits and in less time than existing algorithms for topic models. We also formally make the connection between the NMF optimization problem and maximum-likelihood estimation for the topic model, and using this result we show that the expectation maximization (EM) algorithm for the topic model is essentially the same as the classic multiplicative updates for NMF (the only difference being that the operations are performed in a different order). Our methods are implemented in the R package fastTopics.
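
The multiplicative updates mentioned in the abstract can be illustrated with a minimal pure-Python sketch for Poisson NMF, where a count matrix X is modeled with rates L F^T. This is not the fastTopics implementation and not the faster algorithms the paper advocates; the function names, the toy matrix, and the initialization are assumptions made for illustration only.

```python
import math
import random


def rate(L, F, i, j):
    """Poisson rate for entry (i, j), i.e. element (i, j) of L @ F^T."""
    return sum(l * f for l, f in zip(L[i], F[j]))


def poisson_loss(X, L, F):
    """Negative Poisson log-likelihood of X, dropping the constant log(x!) terms."""
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            lam = rate(L, F, i, j)
            total += lam - (x * math.log(lam) if x > 0 else 0.0)
    return total


def multiplicative_step(X, L, F):
    """One sweep of the classic multiplicative updates: L first, then F."""
    n, m, K = len(X), len(F), len(F[0])
    col_F = [sum(F[j][k] for j in range(m)) for k in range(K)]
    new_L = [
        [
            L[i][k]
            * sum(X[i][j] / rate(L, F, i, j) * F[j][k] for j in range(m))
            / col_F[k]
            for k in range(K)
        ]
        for i in range(n)
    ]
    col_L = [sum(new_L[i][k] for i in range(n)) for k in range(K)]
    new_F = [
        [
            F[j][k]
            * sum(X[i][j] / rate(new_L, F, i, j) * new_L[i][k] for i in range(n))
            / col_L[k]
            for k in range(K)
        ]
        for j in range(m)
    ]
    return new_L, new_F


def fit(X, K, n_iter=25, seed=0):
    """Run the updates from a random positive start; return factors and loss trace."""
    rng = random.Random(seed)
    L = [[rng.uniform(0.5, 1.5) for _ in range(K)] for _ in X]
    F = [[rng.uniform(0.5, 1.5) for _ in range(K)] for _ in X[0]]
    losses = [poisson_loss(X, L, F)]
    for _ in range(n_iter):
        L, F = multiplicative_step(X, L, F)
        losses.append(poisson_loss(X, L, F))
    return L, F, losses


# Toy document-by-word count matrix (hypothetical data).
X = [[5, 0, 1], [4, 1, 0], [0, 3, 6], [1, 2, 7]]
L, F, losses = fit(X, K=2)
```

The loss trace never increases from one sweep to the next, which is the usual guarantee for these updates; they are typically slow to converge, which is consistent with the abstract's motivation for using more recent NMF optimizers.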



Article 1109

Title@2025-07-04 (5): Kinetic Langevin Diffusion for Crystalline Materials Generation

Title: Kinetic Langevin Diffusion for Crystalline Materials Generation Kinetische Langevin Diffusion für die kristalline Materialgenerierung 水晶材料生产动态Langevin传播 2507.03602v1

Authors (6): François Cornet, Federico Bergamin, Arghya Bhowmik, Juan Maria Garcia Lastra, Jes Frellsen, Mikkel N. Schmidt

Generative modeling of crystalline materials using diffusion models presents a series of challenges: the data distribution is characterized by inherent symmetries and involves multiple modalities, with some defined on specific manifolds. Notably, the treatment of fractional coordinates representing atomic positions in the unit cell requires careful consideration, as they lie on a hypertorus. In this work, we introduce Kinetic Langevin Diffusion for Materials (KLDM), a novel diffusion model for crystalline materials generation, where the key innovation resides in the modeling of the coordinates. Instead of resorting to Riemannian diffusion on the hypertorus directly, we generalize Trivialized Diffusion Model (TDM) to account for the symmetries inherent to crystals. By coupling coordinates with auxiliary Euclidean variables representing velocities, the diffusion process is now offset to a flat space. This allows us to effectively perform diffusion on the hypertorus while providing a training objective that accounts for the periodic translation symmetry of the true data distribution. We evaluate KLDM on both Crystal Structure Prediction (CSP) and De-novo Generation (DNG) tasks, demonstrating its competitive performance with current state-of-the-art models.



Article 1110

Title@2025-07-04 (5): SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications

Title: SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications SciVid: Cross-Domain-Bewertung von Videomodellen in wissenschaftlichen Anwendungen SciVid:科学应用录像模型跨主题评价 2507.03578v1

Authors (13): Yana Hasson, Pauline Luc, Liliane Momeni, Maks Ovsjanikov, Guillaume Le Moing, Alina Kuznetsova, Ira Ktena, Jennifer J. Sun, Skanda Koppula, Dilara Gokay, Joseph Heyward, Etienne Pot, Andrew Zisserman

In recent years, there has been a proliferation of spatiotemporal foundation models in different scientific disciplines. While promising, these models are often domain-specific and are only assessed within the particular applications for which they are designed. Given that many tasks can be represented as video modeling problems, video foundation models (ViFMs) hold considerable promise as general-purpose domain-agnostic approaches. However, it is not known whether the knowledge acquired on large-scale but potentially out-of-domain data can be effectively transferred across diverse scientific disciplines, and if a single, pretrained ViFM can be competitive with domain-specific baselines. To address this, we introduce SciVid, a comprehensive benchmark comprising five Scientific Video tasks, across medical computer vision, animal behavior, and weather forecasting. We adapt six leading ViFMs to SciVid using simple trainable readout modules, establishing strong baselines and demonstrating the potential for effective transfer learning. Specifically, we show that state-of-the-art results can be obtained in several applications by leveraging the general-purpose representations from ViFM backbones. Furthermore, our results reveal the limitations of existing ViFMs, and highlight opportunities for the development of generalizable models for high-impact scientific applications. We release our code at https://github.com/google-deepmind/scivid to facilitate further research in the development of ViFMs.



Article 1111

Title@2025-07-04 (5): JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model

Title: JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model JanusDNA: Ein leistungsstarkes bidirektionales Hybrid-DNA-Stiftungsmodell JanusDNA:一个强大的双向双向混合DNA基金会模型 2505.17257v3

Authors (7): Qihao Duan, Bingding Huang, Zhenqiao Song, Irina Lehmann, Lei Gu, Roland Eils, Benjamin Wild

Large language models (LLMs) have revolutionized natural language processing and are increasingly applied to other sequential data types, including genetic sequences. However, adapting LLMs to genomics presents significant challenges. Capturing complex genomic interactions requires modeling long-range dependencies within DNA sequences, where interactions often span over 10,000 base pairs, even within a single gene, posing substantial computational burdens under conventional model architectures and training paradigms. Moreover, standard LLM training approaches are suboptimal for DNA: autoregressive training, while efficient, supports only unidirectional understanding. However, DNA is inherently bidirectional, e.g., bidirectional promoters regulate transcription in both directions and account for nearly 11% of human gene expression. Masked language models (MLMs) allow bidirectional understanding but are inefficient, as only masked tokens contribute to the loss per step. To address these limitations, we introduce JanusDNA, the first bidirectional DNA foundation model built upon a novel pretraining paradigm that combines the optimization efficiency of autoregressive modeling with the bidirectional comprehension of masked modeling. JanusDNA adopts a hybrid Mamba, Attention and Mixture of Experts (MoE) architecture, combining long-range modeling of Attention with efficient sequential learning of Mamba. MoE layers further scale model capacity via sparse activation while keeping computational cost low. Notably, JanusDNA processes up to 1 million base pairs at single nucleotide resolution on a single 80GB GPU. Extensive experiments and ablations show JanusDNA achieves new SOTA results on three genomic representation benchmarks, outperforming models with 250x more activated parameters. Code: https://github.com/Qihao-Duan/JanusDNA



Article 1112

Title@2025-07-04 (5): Learning unitaries with quantum statistical queries

Title: Learning unitaries with quantum statistical queries Lerneinheiten mit quantenstatistischen Abfragen 附有量数统计查询的学习单 2310.02254v2

Authors (1): Armando Angrisani

We propose several algorithms for learning unitary operators from quantum statistical queries with respect to their Choi-Jamiolkowski state. Quantum statistical queries capture the capabilities of a learner with limited quantum resources, which receives as input only noisy estimates of expected values of measurements. Our approach leverages quantum statistical queries to estimate the Fourier mass of a unitary on a subset of Pauli strings, generalizing previous techniques developed for uniform quantum examples. Specifically, we show that the celebrated quantum Goldreich-Levin algorithm can be implemented with quantum statistical queries, whereas the prior version of the algorithm involves oracle access to the unitary and its inverse. As an application, we prove that quantum Boolean functions with constant total influence or with constant degree are efficiently learnable in our model. Moreover, we prove that $\mathcal{O}(\log n)$-juntas are efficiently learnable and constant-depth circuits are learnable query-efficiently with quantum statistical queries. On the other hand, all previous algorithms for these tasks demand significantly greater resources, such as oracle access to the unitary or direct access to the Choi-Jamiolkowski state. We also demonstrate that, despite these positive results, quantum statistical queries lead to an exponentially larger query complexity for certain tasks, compared to separable measurements to the Choi-Jamiolkowski state. In particular, we show an exponential lower bound for learning a class of phase-oracle unitaries and a double exponential lower bound for testing the unitarity of channels. Taken together, our results indicate that quantum statistical queries offer a unified framework for various unitary learning tasks, with potential applications in quantum machine learning, many-body physics and benchmarking of near-term devices.



Article 1113

Title@2025-07-04 (5): From Street Form to Spatial Justice: Explaining Urban Exercise Inequality via a Triadic SHAP-Informed Framework

Title: From Street Form to Spatial Justice: Explaining Urban Exercise Inequality via a Triadic SHAP-Informed Framework Von der Straßenform zur räumlichen Gerechtigkeit: Erklären städtischer Übungsungleichheit über einen triadischen SHAP-informierten Rahmen 从街道形式到空间司法:通过三部三部CHAP综合框架解释城市行使不平等问题 2507.03570v1

Authors (4): Minwei Zhao, Guosheng Yang, Zhuoni Zhang, Cai Wu

Urban streets are essential public spaces that facilitate everyday physical activity and promote health equity. Drawing on Henri Lefebvre’s spatial triad, this study proposes a conceptual and methodological framework to quantify street-level exercise deprivation through the dimensions of conceived (planning and structure), perceived (visual and sensory), and lived (practice and experiential) urban spaces. We integrate multi-source spatial data (including street networks, street-view imagery, and social media) using explainable machine learning (SHAP analysis) to classify streets by their dominant deprivation modes, forming a novel typology of spatial inequity. Results highlight significant differences across urban contexts: older city cores predominantly experience infrastructural constraints (conceived space), whereas new development areas suffer from experiential disengagement (lived space). Furthermore, by identifying spatial mismatches between population distribution and exercise intensity, our study reveals localized clusters of latent deprivation. Simulation experiments demonstrate that targeted improvements across spatial dimensions can yield up to 14% increases in exercise supportiveness. This research not only operationalizes Lefebvre’s spatial theory at the street scale but also provides actionable insights and intervention guidelines, contributing to the broader goals of spatial justice and urban health equity.



Article 1114

Title@2025-07-04 (5): Consistency of augmentation graph and network approximability in contrastive learning

Title: Consistency of augmentation graph and network approximability in contrastive learning Kohärenz von Augmentationsgraph und Netzwerk-Annäherung im kontrastierenden Lernen 在对比性学习中,增强图和网络接近网络的一致性 2502.04312v2

Authors (2): Chenghui Li, A. Martina Neuman

Contrastive learning leverages data augmentation to develop feature representation without relying on large labeled datasets. However, despite its empirical success, the theoretical foundations of contrastive learning remain incomplete, with many essential guarantees left unaddressed, particularly the realizability assumption concerning neural approximability of an optimal spectral contrastive loss solution. In this work, we overcome these limitations by analyzing pointwise and spectral consistency of the augmentation graph Laplacian. We establish that, under specific conditions for data generation and graph connectivity, as the augmented dataset size increases, the augmentation graph Laplacian converges to a weighted Laplace-Beltrami operator on the natural data manifold. These consistency results ensure that the graph Laplacian spectrum effectively captures the manifold geometry. Consequently, they give way to a robust framework for establishing neural approximability, directly resolving the realizability assumption in a current paradigm.



Article 1115

Title@2025-07-04 (5): 2.5D Object Detection for Intelligent Roadside Infrastructure

Title: 2.5D Object Detection for Intelligent Roadside Infrastructure 2.5D-Objekterkennung für intelligente Straßeninfrastruktur 2.5D 智能路边基础设施物体探测 2507.03564v1

Authors (6): Nikolai Polley, Yacin Boualili, Ferdinand Mütsch, Maximilian Zipfl, Tobias Fleck, J. Marius Zöllner

On-board sensors of autonomous vehicles can be obstructed, occluded, or limited by restricted fields of view, complicating downstream driving decisions. Intelligent roadside infrastructure perception systems, installed at elevated vantage points, can provide wide, unobstructed intersection coverage, supplying a complementary information stream to autonomous vehicles via vehicle-to-everything (V2X) communication. However, conventional 3D object-detection algorithms struggle to generalize under the domain shift introduced by top-down perspectives and steep camera angles. We introduce a 2.5D object detection framework, tailored specifically for infrastructure roadside-mounted cameras. Unlike conventional 2D or 3D object detection, we employ a prediction approach to detect ground planes of vehicles as parallelograms in the image frame. The parallelogram preserves the planar position, size, and orientation of objects while omitting their height, which is unnecessary for most downstream applications. For training, a mix of real-world and synthetically generated scenes is leveraged. We evaluate generalizability on a held-out camera viewpoint and in adverse-weather scenarios absent from the training set. Our results show high detection accuracy, strong cross-viewpoint generalization, and robustness to diverse lighting and weather conditions. Model weights and inference code are provided at: https://gitlab.kit.edu/kit/aifb/ATKS/public/digit4taf/2.5d-object-detection



Article 1116

Title@2025-07-04 (5): Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention

Title: Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention Gradient Short-Circuit: Effiziente Out-of-Distribution-Erkennung über Feature-Intervention 渐进短路:通过地物干预有效外分配探测 2507.01417v2

Authors (3): Jiawei Gu, Ziyue Qiao, Zechao Li

Out-of-Distribution (OOD) detection is critical for safely deploying deep models in open-world environments, where inputs may lie outside the training distribution. During inference on a model trained exclusively with In-Distribution (ID) data, we observe a salient gradient phenomenon: around an ID sample, the local gradient directions for “enhancing” that sample’s predicted class remain relatively consistent, whereas OOD samples, unseen in training, exhibit disorganized or conflicting gradient directions in the same neighborhood. Motivated by this observation, we propose an inference-stage technique to short-circuit those feature coordinates that spurious gradients exploit to inflate OOD confidence, while leaving ID classification largely intact. To circumvent the expense of recomputing the logits after this gradient short-circuit, we further introduce a local first-order approximation that accurately captures the post-modification outputs without a second forward pass. Experiments on standard OOD benchmarks show our approach yields substantial improvements. Moreover, the method is lightweight and requires minimal changes to the standard inference pipeline, offering a practical path toward robust OOD detection in real-world applications.



Article 1117

Title@2025-07-04 (5): Simplifying Graph Neural Kernels: from Stacking Layers to Collapsed Structure

Title: Simplifying Graph Neural Kernels: from Stacking Layers to Collapsed Structure Vereinfachende Graphen-Neural-Kernel: von Stapelschichten zur zusammengebrochenen Struktur 简化图形神经内核:从堆叠层到折叠结构 2507.03560v1

Authors (4): Lin Wang, Shijie Wang, Sirui Huang, Qing Li

The Graph Neural Tangent Kernel (GNTK) successfully bridges the gap between kernel methods and Graph Neural Networks (GNNs), addressing key challenges such as the difficulty of training deep networks and the limitations of traditional kernel methods. However, the existing layer-stacking strategy in GNTK introduces redundant computations, significantly increasing computational complexity and limiting scalability for practical applications. To address these issues, this paper proposes the Simplified Graph Neural Tangent Kernel (SGTK), which replaces the traditional multi-layer stacking mechanism with a continuous $K$-step aggregation operation. This novel approach streamlines the iterative kernel computation process, effectively eliminating redundant calculations while preserving the kernel’s expressiveness. By reducing the dependency on layer stacking, SGTK achieves both computational simplicity and efficiency. Furthermore, we introduce the Simplified Graph Neural Kernel (SGNK), which models infinitely wide Graph Neural Networks as Gaussian Processes. This allows kernel values to be directly determined from the expected outputs of activation functions in the infinite-width regime, bypassing the need for explicit layer-by-layer computation. SGNK further reduces computational complexity while maintaining the capacity to capture intricate structural patterns in graphs. Extensive experiments on node and graph classification tasks demonstrate that the proposed SGTK and SGNK achieve performance comparable to existing approaches while improving computational efficiency. Implementation details are available at https://anonymous.4open.science/r/SGNK-1CE4/.



Article 1118

Title@2025-07-04 (5): Communication Efficient, Differentially Private Distributed Optimization using Correlation-Aware Sketching

Title: Communication Efficient, Differentially Private Distributed Optimization using Correlation-Aware Sketching Kommunikation Effiziente, differenzierte Private Distributed Optimization mit Correlation-Aware Sketching 使用关联软件拼写法实现通信效率高的、有差异的私人分配最佳优化 2507.03545v1

Authors (4): Julien Nicolas, Mohamed Maouche, Sonia Ben Mokhtar, Mark Coates

Federated learning with differential privacy suffers from two major costs: each client must transmit $d$-dimensional gradients every round, and the magnitude of DP noise grows with $d$. Yet empirical studies show that gradient updates exhibit strong temporal correlations and lie in a $k$-dimensional subspace with $k \ll d$. Motivated by this, we introduce DOME, a decentralized DP optimization framework in which each client maintains a compact sketch to project gradients into $\mathbb{R}^k$ before privatization and Secure Aggregation. This reduces per-round communication from order $d$ to order $k$ and moves towards a gradient approximation mean-squared error of $\sigma^2 k$. To allow the sketch to span new directions and prevent it from collapsing onto historical gradients, we augment it with random probes orthogonal to historical directions. We prove that our overall protocol satisfies $(\epsilon,\delta)$-Differential Privacy.
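
The project-then-privatize step described above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the DOME protocol: the orthonormal basis is given rather than learned from past gradients, Secure Aggregation and the random probes are omitted, and the names `project`, `clip`, `privatize`, and `lift`, as well as the noise scale `sigma`, are hypothetical.

```python
import math
import random


def project(gradient, basis):
    """Coordinates of a d-dimensional gradient in a k-dimensional orthonormal basis."""
    return [sum(b_i * g_i for b_i, g_i in zip(b, gradient)) for b in basis]


def clip(vector, max_norm):
    """Rescale the vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vector]


def privatize(vector, sigma, rng):
    """Add independent Gaussian noise to each of the k coordinates."""
    return [x + rng.gauss(0.0, sigma) for x in vector]


def lift(coords, basis):
    """Map k coordinates back to the original d-dimensional space."""
    d = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coords, basis)) for i in range(d)]


# Toy example with d = 4 and k = 2 (assumed values).
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
gradient = [3.0, 0.0, 4.0, 0.0]
rng = random.Random(0)

message = privatize(clip(project(gradient, basis), max_norm=1.0), sigma=0.1, rng=rng)
estimate = lift(message, basis)
```

Only the k-dimensional `message` would be transmitted, and noise is added to k coordinates rather than d, which is the intuition behind the order-k communication and the `sigma^2 k` error term in the abstract. Calibrating `sigma` to a target $(\epsilon,\delta)$ is outside the scope of this sketch.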



Article 1119

Title@2025-07-04 (5): Generating Synthetic Relational Tabular Data via Structural Causal Models

Title: Generating Synthetic Relational Tabular Data via Structural Causal Models Synthetische relationale Tabellendaten mittels struktureller Kausalmodelle generieren 通过结构因果模型生成合成关系图表数据 2507.03528v1

Authors (4): Frederik Hoppe, Astrid Franz, Lars Kleinemeier, Udo Göbel

Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast quantities of synthetic tabular datasets derived from structural causal models (SCMs), demonstrates the critical role synthetic data plays in developing powerful tabular foundation models. However, most real-world tabular data exists in relational formats spanning multiple interconnected tables, a structure not adequately addressed by current generation methods. In this work, we extend the SCM-based approach by developing a novel framework that generates realistic synthetic relational tabular data including causal relationships across tables. Our experiments confirm that this framework is able to construct relational datasets with complex inter-table dependencies mimicking real-world scenarios.



Article 1120

Title@2025-07-04 (5): Decoupled Relative Learning Rate Schedules

Title: Decoupled Relative Learning Rate Schedules Entkoppelte Zeitpläne für relative Lernraten 分离的相对学习率表 2507.03526v1

Authors (11): Jan Ludziejewski, Jan Małaśnicki, Maciej Pióro, Michał Krutul, Kamil Ciebiera, Maciej Stefaniak, Jakub Krajewski, Piotr Sankowski, Marek Cygan, Kamil Adamczewski, Sebastian Jaszczur

In this work, we introduce a novel approach for optimizing LLM training by adjusting learning rates across weights of different components in Transformer models. Traditional methods often apply a uniform learning rate across all network layers, potentially overlooking the unique dynamics of each part. Remarkably, our relative learning rates (RLRS) method accelerates the training process by up to $23\%$, particularly in complex models such as Mixture of Experts (MoE). Hyperparameters of RLRS can be efficiently tuned on smaller models and then effectively reused on models up to $27\times$ larger. This simple and effective method results in a substantial reduction in training time and computational resources, offering a practical and scalable solution for optimizing large-scale neural networks.
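
The idea of component-specific learning rates can be sketched as a shared base schedule multiplied by a per-component factor. The abstract does not give the exact parameterization, so everything below is an assumption for illustration: the warmup-plus-cosine base schedule, the component names, and the multiplier values are hypothetical and are not taken from the paper.

```python
import math


def base_lr(step, total_steps, peak_lr, warmup_steps):
    """Shared schedule: linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical relative multipliers, one per model component.
RELATIVE_LR = {
    "embedding": 2.0,
    "attention": 1.0,
    "expert": 0.5,
    "router": 0.25,
}


def component_lr(component, step, total_steps=1000, peak_lr=1e-3, warmup_steps=100):
    """Learning rate for one component: its multiplier times the shared schedule."""
    return RELATIVE_LR[component] * base_lr(step, total_steps, peak_lr, warmup_steps)


# Learning rates for every component at the end of warmup.
rates_at_peak = {name: component_lr(name, step=100) for name in RELATIVE_LR}
```

Because the multipliers are relative to the base schedule rather than absolute values, a set tuned on a small model can in principle be carried over to a larger one while only the base learning rate is retuned, which matches the reuse claim in the abstract.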



Article 1121

Title@2025-07-04 (5): UWB TDoA Error Correction using Transformers: Patching and Positional Encoding Strategies

Title: UWB TDoA Error Correction using Transformers: Patching and Positional Encoding Strategies UWB TDoA Fehlerkorrektur mit Transformern: Patching und Positional Encoding Strategien UWB TDoA 使用变换器校正错误:补丁和定位编码战略 2507.03523v1

Authors (3): Dieter Coppens, Adnan Shahid, Eli De Poorter

Despite their high accuracy, UWB-based localization systems suffer inaccuracies when deployed in industrial locations with many obstacles due to multipath effects and non-line-of-sight (NLOS) conditions. In such environments, current error mitigation approaches for time difference of arrival (TDoA) localization typically exclude NLOS links. However, excluding links leads to geometric dilution of precision problems and is infeasible when the majority of links are NLOS. To address these limitations, we propose a transformer-based TDoA position correction method that uses raw channel impulse responses (CIRs) from all available anchor nodes to compute position corrections. We introduce different CIR ordering, patching and positional encoding strategies for the transformer, and analyze each proposed technique’s scalability and performance gains. Based on experiments on real-world UWB measurements, our approach can provide accuracies of up to 0.39 m in a complex environment consisting of (almost) only NLOS signals, which is an improvement of 73.6% compared to the TDoA baseline.
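
One way to picture CIR patching is to cut each anchor's impulse response into fixed-length segments and tag every segment with its anchor and patch index, which a positional encoding could then consume. The abstract does not specify the patching scheme, so the non-overlapping layout, the dropped remainder, and the function name `patch_cirs` are assumptions made for this sketch.

```python
def patch_cirs(cirs, patch_len):
    """Split each anchor's CIR into non-overlapping patches.

    Returns a list of (anchor_index, patch_index, samples) tokens.
    Trailing samples that do not fill a whole patch are dropped.
    """
    tokens = []
    for anchor_idx, cir in enumerate(cirs):
        usable = len(cir) - len(cir) % patch_len
        for start in range(0, usable, patch_len):
            patch = cir[start:start + patch_len]
            tokens.append((anchor_idx, start // patch_len, patch))
    return tokens


# Two anchors with eight CIR samples each (toy values, not real measurements).
cirs = [
    [0, 1, 2, 3, 4, 5, 6, 7],
    [10, 11, 12, 13, 14, 15, 16, 17],
]
tokens = patch_cirs(cirs, patch_len=4)
```

Each token keeps both indices so that the ordering of anchors and the position within a CIR remain available to the model; how those indices are encoded is one of the design choices the paper compares.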



Article 1122

Title@2025-07-04 (5): A Flexible Instruction Set Architecture for Efficient GEMMs

Title: A Flexible Instruction Set Architecture for Efficient GEMMs Flexible Instruktions-Set-Architektur für effiziente GEMMs 高效的通用环管机制的灵活教学结构 2507.03522v1

Authors (5): Alexandre de Limas Santana, Adrià Armejach, Francesc Martinez, Erich Focht, Marc Casas

GEneral Matrix Multiplications (GEMMs) are recurrent in high-performance computing and deep learning workloads. Typically, high-end CPUs accelerate GEMM workloads with Single-Instruction Multiple Data (SIMD) or vector Instruction Set Architectures (ISAs). Since these ISAs face significant issues when running GEMM workloads, particularly when dealing with small, tall, or skinny matrices, matrix ISAs have been proposed and implemented by major hardware vendors in recent years. Although these matrix ISAs deliver higher throughput when running GEMMs than their SIMD/vector counterparts, they are rigid solutions unable to dynamically adapt themselves to application-specific aspects like the data format. This paper demonstrates that the state-of-the-art matrix ISAs deliver suboptimal performance when running the most commonly used convolution and transformer models. This paper proposes the Matrix Tile Extension (MTE), the first matrix ISA that completely decouples the instruction set architecture from the microarchitecture and seamlessly interacts with existing vector ISAs. MTE incurs minimal implementation overhead since it only requires a few additional instructions and a 64-bit Control Status Register (CSR) to keep its state. Specifically, MTE can i) vectorize GEMMs across the three dimensions M, N, and K; ii) leverage the capacity of the existing vector register file; and iii) decouple the tile shape from the underlying microarchitecture. MTE achieves speed-ups of 1.35x over the best state-of-the-art matrix ISA.



Article 1123

Title@2025-07-04 (5): High-Dimensional Learning in Finance

Title: High-Dimensional Learning in Finance High-Dimensional Learning in Finance 金融领域高层次学习 2506.03780v3

Authors (1): Hasan Fallahgoul

Recent advances in machine learning have shown promising results for financial prediction using large, over-parameterized models. This paper provides theoretical foundations and empirical validation for understanding when and how these methods achieve predictive success. I examine two key aspects of high-dimensional learning in finance. First, I prove that within-sample standardization in Random Fourier Features implementations fundamentally alters the underlying Gaussian kernel approximation, replacing shift-invariant kernels with training-set dependent alternatives. Second, I establish information-theoretic lower bounds that identify when reliable learning is impossible no matter how sophisticated the estimator. A detailed quantitative calibration of the polynomial lower bound shows that with typical parameter choices, e.g., 12,000 features, 12 monthly observations, and an R-squared of 2-3%, the required sample size to escape the bound exceeds 25-30 years of data, well beyond any rolling window actually used. Thus, observed out-of-sample success must originate from lower-complexity artefacts rather than from the intended high-dimensional mechanism.
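
The first point can be made concrete with a minimal pure-Python sketch of standard Random Fourier Features for the Gaussian kernel $\exp(-\gamma \|x - y\|^2)$, followed by a within-sample standardization step. The function names, the value of `gamma`, and the toy points are assumptions for illustration; this is not the paper's code or the implementation it analyzes.

```python
import math
import random
import statistics


def make_rff(dim, n_features, gamma, seed=0):
    """Return a feature map z with z(x) . z(y) ~ exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # std of the Gaussian spectral distribution
    W = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_j)
            for w, b_j in zip(W, b)
        ]

    return features


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel value."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def standardize(Z):
    """Divide each feature column by its standard deviation over the sample Z."""
    stds = [statistics.pstdev(col) or 1.0 for col in zip(*Z)]
    return [[z / s for z, s in zip(row, stds)] for row in Z]


gamma = 0.5
features = make_rff(dim=2, n_features=4000, gamma=gamma, seed=1)
x, y = [0.3, -0.2], [0.1, 0.4]

approx = dot(features(x), features(y))
exact = gaussian_kernel(x, y, gamma)

# The rescaling factors are computed from whichever sample is passed in.
Z_std = standardize([features(x), features(y), features([1.0, -1.0])])
```

Without standardization, `approx` depends only on `x - y` in expectation, which is the shift-invariance property. After `standardize`, each feature is divided by a quantity computed from the sample at hand, so the implied kernel changes with the training set, which is the effect the abstract describes.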



Article 1124

Title@2025-07-04 (5): ObjectRL: An Object-Oriented Reinforcement Learning Codebase

Title: ObjectRL: An Object-Oriented Reinforcement Learning Codebase ObjectRL: Eine objektorientierte, verstärkte Lerncodebase 对象RL: 目标导向的加强学习代码库 2507.03487v1

Authors (7): Gulcin Baykal, Abdullah Akgül, Manuel Haussmann, Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, Melih Kandemir

ObjectRL is an open-source Python codebase for deep reinforcement learning (RL), designed for research-oriented prototyping with minimal programming effort. Unlike existing codebases, ObjectRL is built on Object-Oriented Programming (OOP) principles, providing a clear structure that simplifies the implementation, modification, and evaluation of new algorithms. ObjectRL lowers the entry barrier for deep RL research by organizing best practices into explicit, clearly separated components, making them easier to understand and adapt. Each algorithmic component is a class with attributes that describe key RL concepts and methods that intuitively reflect their interactions. The class hierarchy closely follows common ontological relationships, enabling data encapsulation, inheritance, and polymorphism, which are core features of OOP. We demonstrate the efficiency of ObjectRL’s design through representative use cases that highlight its flexibility and suitability for rapid prototyping. The documentation and source code are available at https://objectrl.readthedocs.io and https://github.com/adinlab/objectrl .



Article 1125

Title@2025-07-04 (5): Molecular Machine Learning Using Euler Characteristic Transforms

Title: Molecular Machine Learning Using Euler Characteristic Transforms Molekulares maschinelles Lernen mit Euler Charakteristische Transformationen 利用超强特征变形进行分子机学习 2507.03474v1

Authors (3): Victor Toscano-Duran, Florian Rottach, Bastian Rieck

The shape of a molecule determines its physicochemical and biological properties. However, it is often underrepresented in standard molecular representation learning approaches. Here, we propose using the Euler Characteristic Transform (ECT) as a geometrical-topological descriptor. Computed directly on a molecular graph derived from handcrafted atomic features, the ECT enables the extraction of multiscale structural features, offering a novel way to represent and encode molecular shape in the feature space. We assess the predictive performance of this representation across nine benchmark regression datasets, all centered around predicting the inhibition constant $K_i$. In addition, we compare our proposed ECT-based representation against traditional molecular representations and methods, such as molecular fingerprints/descriptors and graph neural networks (GNNs). Our results show that our ECT-based representation achieves competitive performance, ranking among the best-performing methods on several datasets. More importantly, its combination with traditional representations, particularly with the AVALON fingerprint, significantly enhances predictive performance, outperforming other methods on most datasets. These findings highlight the complementary value of multiscale topological information and its potential for being combined with established techniques. Our study suggests that hybrid approaches incorporating explicit shape information can lead to more informative and robust molecular representations, enhancing and opening new avenues in molecular machine learning tasks. To support reproducibility and foster open biomedical research, we provide open access to all experiments and code used in this work.
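
For a graph with vertex coordinates, the Euler Characteristic Transform can be sketched as follows: for each direction, vertices are ordered by their height along that direction, and for each threshold the Euler characteristic (vertices minus edges) of the sublevel subgraph is recorded. This is a minimal sketch with a toy triangle, not the authors' pipeline; the choice of coordinates, directions, and thresholds is an assumption, and the paper derives its vertex coordinates from handcrafted atomic features.

```python
def euler_characteristic_curve(coords, edges, direction, thresholds):
    """Euler characteristic of the sublevel graph at each threshold.

    A vertex is included when its height (dot product with the direction)
    is at most the threshold; an edge is included when both endpoints are.
    """
    heights = [sum(c * d for c, d in zip(point, direction)) for point in coords]
    curve = []
    for t in thresholds:
        n_vertices = sum(1 for h in heights if h <= t)
        n_edges = sum(1 for u, v in edges if max(heights[u], heights[v]) <= t)
        curve.append(n_vertices - n_edges)
    return curve


def ect(coords, edges, directions, thresholds):
    """Stack one Euler characteristic curve per direction."""
    return [
        euler_characteristic_curve(coords, edges, d, thresholds)
        for d in directions
    ]


# Toy three-vertex cycle embedded in the plane (hypothetical coordinates).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
edges = [(0, 1), (1, 2), (0, 2)]
directions = [(1.0, 0.0), (0.0, 1.0)]
thresholds = [-1.0, 0.0, 1.0]

descriptor = ect(coords, edges, directions, thresholds)
```

Flattening `descriptor` gives a fixed-length vector regardless of the number of atoms, which is what allows it to be concatenated with fingerprints such as AVALON as the abstract describes.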



Article 1126

Title@2025-07-04 (5): Evaluating the Evaluators: Trust in Adversarial Robustness Tests

Title: Evaluating the Evaluators: Trust in Adversarial Robustness Tests Bewertung der Evaluatoren: Vertrauen in zweifelhafte Robustheitstests 评估评价者:对抗逆力测试的信任 2507.03450v1

Authors (6): Antonio Emanuele Cinà, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Fabio Roli

Despite significant progress in designing powerful adversarial evasion attacks for robustness verification, the evaluation of these methods often remains inconsistent and unreliable. Many assessments rely on mismatched models, unverified implementations, and uneven computational budgets, which can lead to biased results. Consequently, robustness claims built on such flawed testing protocols may be misleading and give a false sense of security. As a concrete step toward improving evaluation reliability, we present AttackBench, a benchmark framework developed to assess the effectiveness of gradient-based attacks under standardized and reproducible conditions. AttackBench serves as an evaluation tool that ranks existing attack implementations based on a novel optimality metric, which enables researchers and practitioners to identify the most reliable and effective attack for use in subsequent robustness evaluations. The framework enforces consistent testing conditions and enables continuous updates, making it a reliable foundation for robustness verification.


Article 1127

Title@2025-07-04 (5): Multi-Level Fusion Graph Neural Network for Molecule Property Prediction

Title: Multi-Level Fusion Graph Neural Network for Molecule Property Prediction Multi-Level Fusion Graph Neural Network für Molecule Property Prediction 多级融合图图 分子财产预测多级融合图神经网络 2507.03430v1

Authors (4): XiaYu Liu, Hou-biao Li, Yang Liu, Chao Fan

Accurate molecular property prediction is essential in drug discovery and related fields. However, existing graph neural networks (GNNs) often struggle to simultaneously capture both local and global molecular structures. In this work, we propose a Multi-Level Fusion Graph Neural Network (MLFGNN) that integrates Graph Attention Networks and a novel Graph Transformer to jointly model local and global dependencies. In addition, we incorporate molecular fingerprints as a complementary modality and introduce a mechanism of interaction between attention to adaptively fuse information across representations. Extensive experiments on multiple benchmark datasets demonstrate that MLFGNN consistently outperforms state-of-the-art methods in both classification and regression tasks. Interpretability analysis further reveals that the model effectively captures task-relevant chemical patterns, supporting the usefulness of multi-level and multi-modal fusion in molecular representation learning.


Article 1128

Title@2025-07-04 (5): Generating realistic patient data

Title: Generating realistic patient data Erzeugen realistischer Patientendaten 产生现实的病人数据 2507.03423v1

Authors (5): Tabea Brandt, Christina Büsing, Johanna Leweke, Finn Seesemann, Sina Weber

Developing algorithms for real-life problems that perform well in practice depends heavily on the availability of realistic data for testing. Obtaining real-life data for optimization problems in health care, however, is often difficult. This is especially true for patient-related optimization problems, e.g., patient-to-room assignment, due to data privacy policies. Furthermore, real-life data that has been obtained usually cannot be published, which prevents other researchers from reproducing the results. Therefore, artificially generated instances are often used. In this paper, we present combinatorial insights about the feasibility of instances for the patient-to-room assignment problem (PRA). We use these insights to develop a configurable instance generator for PRA with an easy-to-use graphical user interface. Configurability is especially important in this case because we observed in an extensive analysis of real-life data that, e.g., the probability distributions of patients’ age and length of stay depend on the respective ward.


Article 1129

Title@2025-07-04 (5): A Hybrid Game-Theory and Deep Learning Framework for Predicting Tourist Arrivals via Big Data Analytics and Opinion Leader Detection

Title: A Hybrid Game-Theory and Deep Learning Framework for Predicting Tourist Arrivals via Big Data Analytics and Opinion Leader Detection Ein hybrides Game-Theorie- und Deep-Learning-Framework zur Vorhersage von Touristenankünften über Big Data Analytics und Opinion Leader Detection 通过大数据分析和舆论领袖检测,预测游客抵达的混合游戏理论和深层学习框架 2507.03411v1

Authors (1): Ali Nikseresht

In the era of Industry 5.0, data-driven decision-making has become indispensable for optimizing systems across Industrial Engineering. This paper addresses the value of big data analytics by proposing a novel non-linear hybrid approach for forecasting international tourist arrivals in two different contexts: (i) arrivals to Hong Kong from five major source nations (pre-COVID-19), and (ii) arrivals to Sanya in Hainan province, China (post-COVID-19). The method integrates multiple sources of Internet big data and employs an innovative game theory-based algorithm to identify opinion leaders on social media platforms. Subsequently, nonstationary attributes in tourism demand data are managed through Empirical Wavelet Transform (EWT), ensuring refined time-frequency analysis. Finally, a memory-aware Stacked Bi-directional Long Short-Term Memory (Stacked BiLSTM) network is used to generate accurate demand forecasts. Experimental results demonstrate that this approach outperforms existing state-of-the-art techniques and remains robust under dynamic and volatile conditions, highlighting its applicability to broader Industrial Engineering domains, such as logistics, supply chain management, and production planning, where forecasting and resource allocation are key challenges. By merging advanced Deep Learning (DL), time-frequency analysis, and social media insights, the proposed framework showcases how large-scale data can elevate the quality and efficiency of decision-making processes.


Article 1130

Title@2025-07-04 (5): On the Effectiveness of the $z$-Transform Method in Quadratic Optimization

Title: On the Effectiveness of the $z$-Transform Method in Quadratic Optimization Über die Wirksamkeit der $z$-Transform Methode in der quadratischen Optimierung 美元-四重力优化转换方法的效果 2507.03404v1

Authors (1): Francis Bach

The $z$-transform of a sequence is a classical tool used within signal processing, control theory, computer science, and electrical engineering. It allows for studying sequences from their generating functions, with many operations that can be equivalently defined on the original sequence and its $z$-transform. In particular, the $z$-transform method focuses on asymptotic behaviors and allows the use of Taylor expansions. We present a sequence of results of increasing significance and difficulty for linear models and optimization algorithms, demonstrating the effectiveness and versatility of the $z$-transform method in deriving new asymptotic results. Starting from the simplest gradient descent iterations in an infinite-dimensional Hilbert space, we show how the spectral dimension characterizes the convergence behavior. We then extend the analysis to Nesterov acceleration, averaging techniques, and stochastic gradient descent.
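
To make the starting point of this abstract concrete, the sketch below runs gradient descent on a diagonal quadratic and compares the truncated generating function of one coordinate's error sequence with its closed form. The convention $A(z) = \sum_k a_k z^k$, the toy eigenvalues, and the step size are assumptions made for illustration; the paper itself works in an infinite-dimensional setting that this finite example does not reproduce.

```python
def gd_errors(eigenvalues, deltas0, step, n_iters):
    """Per-coordinate errors of gradient descent on f = 0.5 * sum(lam_i * delta_i**2)."""
    deltas = list(deltas0)
    history = [list(deltas)]
    for _ in range(n_iters):
        # Gradient of f in coordinate i is lam_i * delta_i.
        deltas = [d - step * lam * d for lam, d in zip(eigenvalues, deltas)]
        history.append(list(deltas))
    return history

def generating_function(sequence, z):
    """Truncated generating function sum_k a_k * z**k (convention assumed here)."""
    return sum(a * z ** k for k, a in enumerate(sequence))

eigenvalues = [1.0, 0.5, 0.1]
history = gd_errors(eigenvalues, [1.0, 1.0, 1.0], step=0.5, n_iters=200)

# For one coordinate the error is geometric, delta_k = rho**k with
# rho = 1 - step * lam, so its generating function is 1 / (1 - rho * z)
# whenever |rho * z| < 1.
rho = 1 - 0.5 * eigenvalues[2]
approx = generating_function([h[2] for h in history], 0.9)
closed_form = 1 / (1 - rho * 0.9)
```

The slowest coordinate (smallest eigenvalue) dominates the tail of the error, which is why the distribution of eigenvalues near zero governs the asymptotic rate.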


Article 1131

Title@2025-07-04 (5): Absolute Evaluation Measures for Machine Learning: A Survey

Title: Absolute Evaluation Measures for Machine Learning: A Survey Absolute Bewertungsmaßnahmen für maschinelles Lernen: Eine Umfrage 机器学习绝对评价措施:调查 2507.03392v1

Authors (4): Silvia Beddar-Wiesing, Alice Moallemy-Oureh, Marie Kempkes, Josephine M. Thomas

Machine Learning is a diverse field applied across various domains such as computer science, social sciences, medicine, chemistry, and finance. This diversity results in varied evaluation approaches, making it difficult to compare models effectively. Absolute evaluation measures offer a practical solution by assessing a model’s performance on a fixed scale, independent of reference models and data ranges, enabling explicit comparisons. However, many commonly used measures are not universally applicable, leading to a lack of comprehensive guidance on their appropriate use. This survey addresses this gap by providing an overview of absolute evaluation metrics in ML, organized by the type of learning problem. While classification metrics have been extensively studied, this work also covers clustering, regression, and ranking metrics. By grouping these measures according to the specific ML challenges they address, this survey aims to equip practitioners with the tools necessary to select appropriate metrics for their models. The provided overview thus improves individual model evaluation and facilitates meaningful comparisons across different models and applications.


Article 1132

Title@2025-07-04 (5): On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages

Title: On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages Zur Expressivität und Längenverallgemeinerung von selektiven State-Space-Modellen auf regulären Sprachen 关于常规语言选择性国家空间模型的表达性和时间长度的一般化 2412.19350v2

Authors (6): Aleksandar Terzić, Michael Hersche, Giacomo Camposampiero, Thomas Hofmann, Abu Sebastian, Abbas Rahimi

Selective state-space models (SSMs) are an emerging alternative to the Transformer, offering the unique advantage of parallel training and sequential inference. Although these models have shown promising performance on a variety of tasks, their formal expressiveness and length generalization properties remain underexplored. In this work, we provide insight into the workings of selective SSMs by analyzing their expressiveness and length generalization performance on regular language tasks, i.e., finite-state automaton (FSA) emulation. We address certain limitations of modern SSM-based architectures by introducing the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization on a set of various regular language tasks using a single layer. It utilizes a dictionary of dense transition matrices, a softmax selection mechanism that creates a convex combination of dictionary matrices at each time step, and a readout consisting of layer normalization followed by a linear map. We then proceed to evaluate variants of diagonal selective SSMs by considering their empirical performance on commutative and non-commutative automata. We explain the experimental results with theoretical considerations. Our code is available at https://github.com/IBM/selective-dense-state-space-model.
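
The recurrence described in this abstract (a dictionary of dense transition matrices mixed by a softmax at each time step) can be sketched as follows. This is a hand-built toy, assuming hypothetical names and hand-set selection logits rather than learned ones, and it omits the readout (layer normalization followed by a linear map); it is not the released implementation.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mat_vec(matrix, vec):
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

def combine(dictionary, weights):
    """Convex combination sum_i w_i * A_i of dense matrices."""
    n = len(dictionary[0])
    return [[sum(w * A[r][c] for w, A in zip(weights, dictionary))
             for c in range(n)] for r in range(n)]

def run(dictionary, selection_logits, tokens, state):
    """Apply x_t = A(u_t) x_{t-1}, where A(u_t) is selected by a softmax over the dictionary."""
    for tok in tokens:
        weights = softmax(selection_logits[tok])
        state = mat_vec(combine(dictionary, weights), state)
    return state

# Parity automaton: token 0 keeps the state, token 1 swaps the two states.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
SWAP = [[0.0, 1.0], [1.0, 0.0]]
# Large logits make the softmax nearly one-hot (hand-set here, not learned).
LOGITS = {0: [50.0, 0.0], 1: [0.0, 50.0]}
final = run([IDENTITY, SWAP], LOGITS, [1, 0, 1, 1], [1.0, 0.0])
```

With three `1` tokens the input has odd parity, so the state mass ends up on the second coordinate. Because the per-step operation is the same at every position, the same behavior holds for inputs of any length in this toy.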


Article 1133

Title@2025-07-04 (5): Learning Traffic Anomalies from Generative Models on Real-Time Observations

Title: Learning Traffic Anomalies from Generative Models on Real-Time Observations Verkehrsanomalien aus generativen Modellen auf Echtzeit-Beobachtungen lernen 实时观测生成模型的学习交通异常现象 2502.01391v4

Authors (2): Fotis I. Giasemis, Alexandros Sopasakis

Accurate detection of traffic anomalies is crucial for effective urban traffic management and congestion mitigation. We use the Spatiotemporal Generative Adversarial Network (STGAN) framework combining Graph Neural Networks and Long Short-Term Memory networks to capture complex spatial and temporal dependencies in traffic data. We apply STGAN to real-time, minute-by-minute observations from 42 traffic cameras across Gothenburg, Sweden, collected over several months in 2020. The images are processed to compute a flow metric representing vehicle density, which serves as input for the model. Training is conducted on data from April to November 2020, and validation is performed on a separate dataset from November 14 to 23, 2020. Our results demonstrate that the model effectively detects traffic anomalies with high precision and low false positive rates. The detected anomalies include camera signal interruptions, visual artifacts, and extreme weather conditions affecting traffic flow.


Article 1134

Title@2025-07-04 (5): Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Title: Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers Verallgemeinerung oder Halluzination? Verstehen von Out-of-Context-Reasoning in Transformers 通化还是幻觉? 理解变异器的逻辑外原因 2506.10887v2

Authors (8): Yixiao Huang, Hanlin Zhu, Tianyu Guo, Jiantao Jiao, Somayeh Sojoudi, Michael I. Jordan, Stuart Russell, Song Mei

Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. However, the reasons for this phenomenon remain poorly understood. In this work, we argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR): the ability to deduce implications by associating concepts, even those without a causal link. Our experiments across five prominent LLMs confirm that OCR indeed drives both generalization and hallucination, depending on whether the associated concepts are causally related. To build a rigorous theoretical understanding of this phenomenon, we then formalize OCR as a synthetic factual recall task. We empirically show that a one-layer single-head attention-only transformer with factorized output and value matrices can learn to solve this task, while a model with combined weights cannot, highlighting the crucial role of matrix factorization. Our theoretical analysis shows that the OCR capability can be attributed to the implicit bias of gradient descent, which favors solutions that minimize the nuclear norm of the combined output-value matrix. This mathematical structure explains why the model learns to associate facts and implications with high sample efficiency, regardless of whether the correlation is causal or merely spurious. Ultimately, our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.


Article 1135

Title@2025-07-04 (5): Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

Title: Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections Implizite Belohnung als Brücke: Eine einheitliche Ansicht von SFT- und DPO-Verbindungen 隐含奖励为桥梁:SFT和DPO连接的统一观点 2507.00018v2

Authors (10): Bo Wang, Qinyuan Cheng, Runyu Peng, Rong Bao, Peiji Li, Qipeng Guo, Linyang Li, Zhiyuan Zeng, Yunhua Zhou, Xipeng Qiu

Post-training processes are essential phases in grounding pre-trained language models to real-world tasks, with learning from demonstrations or preference signals playing a crucial role in this adaptation. We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. Through rigorous mathematical derivation, we demonstrate that both SFT and preference learning methods like Direct Preference Optimization (DPO) operate within the same optimal policy-reward subspace, with SFT representing a special case of implicit reward learning. Our analysis reveals a critical limitation in conventional SFT: the KL divergence term in distribution matching becomes constant with respect to the policy during optimization, failing to constrain model updates. To address this, we propose a simple yet effective learning rate reduction approach that yields significant performance improvements (up to 25% relative gain and 6% absolute win rate increase in instruction-following tasks). Additionally, we derive alternative SFT objectives from various f-divergence functions that preserve the KL term during optimization, further enhancing post-DPO model performance. Finally, we extend the theoretical relationship between LLM logits and Q-functions from preference learning to the SFT context, providing mathematical derivations and experimental validation.
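
For readers unfamiliar with the implicit reward that this abstract builds on, the sketch below shows the standard DPO formulation: the reward is $\beta$ times the log-ratio between policy and reference, and the loss is the negative log-sigmoid of the reward margin between the chosen and rejected responses. The function names and numeric log-probabilities are illustrative assumptions; the paper's own SFT objectives are not reproduced here.

```python
import math

def implicit_reward(logp_policy, logp_ref, beta):
    """DPO implicit reward, up to a term that depends only on the prompt."""
    return beta * (logp_policy - logp_ref)

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log(sigmoid(margin)) for one (chosen, rejected) pair of sequence log-probs."""
    margin = (implicit_reward(logp_w, ref_logp_w, beta)
              - implicit_reward(logp_l, ref_logp_l, beta))
    # Numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# When the policy equals the reference, the margin is zero and the loss is log(2).
loss_at_reference = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
loss_after_update = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

Only the gap between policy and reference log-probabilities enters the loss, which is what ties the policy to an implicitly defined reward.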


Article 1136

Title@2025-07-04 (5): MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning

Title: MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning MENTOR: Mixture-of-Experts-Netzwerk mit Task-Oriented Perturbation für visuelles Verstärkungslernen INTOOR: 视力强化学习中以任务为导向的干扰干扰模拟专家网络 2410.14972v3

Authors (9): Suning Huang, Zheyu Zhang, Tianhai Liang, Yihan Xu, Zhehao Kou, Chenhao Lu, Guowei Xu, Zhengrong Xue, Huazhe Xu

Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.


Article 1137

Title@2025-07-04 (5): Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization

Title: Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization Maßnahmen Robustes Verstärktes Lernen durch Optimal Adversary Aware Policy Optimization 通过优化反向最佳认识政策优化,加强学习行动强力加强学习 2507.03372v1

Authors (4): Buqing Nie, Yangqing Fu, Jingtian Ji, Yue Gao

Reinforcement Learning (RL) has achieved remarkable success in sequential decision tasks. However, recent studies have revealed the vulnerability of RL policies to different perturbations, raising concerns about their effectiveness and safety in real-world applications. In this work, we focus on the robustness of RL policies against action perturbations and introduce a novel framework called Optimal Adversary-aware Policy Iteration (OA-PI). Our framework enhances action robustness under various perturbations by evaluating and improving policy performance against the corresponding optimal adversaries. Besides, our approach can be integrated into mainstream DRL algorithms such as Twin Delayed DDPG (TD3) and Proximal Policy Optimization (PPO), improving action robustness effectively while maintaining nominal performance and sample efficiency. Experimental results across various environments demonstrate that our method enhances robustness of DRL policies against different action adversaries effectively.


Article 1138

Title@2025-07-04 (5): Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education

Title: Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education Detection of Disengagement from Voluntary Quizzes: Ein erklärbarer maschineller Lernansatz in der höheren Fernlehre 探测自愿Quizzes脱离自愿Quizzes:高远程教育中可解释的机械学习方法 2507.02681v2

Authors (7): Behnam Parsaeifard, Christof Imhof, Tansu Pancar, Ioan-Sorin Comsa, Martin Hlosta, Nicole Bergamin, Per Bergamin

Students disengaging from their tasks can have serious long-term consequences, including academic drop-out. This is particularly relevant for students in distance education. One way to measure the level of disengagement in distance education is to observe participation in non-mandatory exercises in different online courses. In this paper, we detect student disengagement in the non-mandatory quizzes of 42 courses in four semesters from a distance-based university. We carefully identified the most informative student log data that could be extracted and processed from Moodle. Then, eight machine learning algorithms were trained and compared to obtain the highest possible prediction accuracy. Using the SHAP method, we developed an explainable machine learning framework that allows practitioners to better understand the decisions of the trained algorithm. The experimental results show a balanced accuracy of 91%, where about 85% of disengaged students were correctly detected. On top of the high predictive performance and explainable framework, we provide a discussion on how to design a timely intervention to minimise disengagement from voluntary tasks in online learning.
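
The two reported figures can be related with a short calculation. The sketch below assumes a binary task in which the 85% figure is the recall on the disengaged class and balanced accuracy is the mean of the two per-class recalls; under those assumptions, which the abstract does not state explicitly, the implied recall on the engaged class is about 97%. The confusion-matrix counts are made up to match the percentages.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def implied_specificity(balanced_acc, sensitivity):
    """Solve (sensitivity + specificity) / 2 = balanced_acc for specificity."""
    return 2 * balanced_acc - sensitivity

# Hypothetical counts: 100 disengaged and 100 engaged students.
example = balanced_accuracy(tp=85, fn=15, tn=97, fp=3)
specificity = implied_specificity(0.91, 0.85)
```

Balanced accuracy is used instead of plain accuracy because it is not inflated when one class, here presumably the engaged students, is much larger than the other.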


Article 1139

Title@2025-07-04 (5): A Real-Time Digital Twin for Type 1 Diabetes using Simulation-Based Inference

Title: A Real-Time Digital Twin for Type 1 Diabetes using Simulation-Based Inference Ein Echtzeit-Digital Twin für Typ-1-Diabetes mit simulationsbasierter Schlussfolgerung 使用模拟推断法为1型糖尿病实时数字双 2507.01740v2

Authors (7): Trung-Dung Hoang, Alceu Bissoto, Vihangkumar V. Naik, Tim Flühmann, Artemii Shlychkov, Jose Garcia-Tirado, Lisa M. Koch

Accurately estimating parameters of physiological models is essential to achieving reliable digital twins. For Type 1 Diabetes, this is particularly challenging due to the complexity of glucose-insulin interactions. Traditional methods based on Markov Chain Monte Carlo struggle with high-dimensional parameter spaces and fit parameters from scratch at inference time, making them slow and computationally expensive. In this study, we propose a Simulation-Based Inference approach based on Neural Posterior Estimation to efficiently capture the complex relationships between meal intake, insulin, and glucose level, providing faster, amortized inference. Our experiments demonstrate that SBI not only outperforms traditional methods in parameter estimation but also generalizes better to unseen conditions, offering real-time posterior inference with reliable uncertainty quantification.


Article 1140

Title@2025-07-04 (5): Adaptive Gate-Aware Mamba Networks for Magnetic Resonance Fingerprinting

Title: Adaptive Gate-Aware Mamba Networks for Magnetic Resonance Fingerprinting Adaptive Gate-Aware Mamba-Netzwerke für Magnetresonanz-Fingerprinting 用于磁共振指纹的适应性门软件Mamba网络 2507.03369v1

Authors (7): Tianyi Ding, Hongli Chen, Yang Gao, Zhuang Xiong, Feng Liu, Martijn A. Cloos, Hongfu Sun

Magnetic Resonance Fingerprinting (MRF) enables fast quantitative imaging by matching signal evolutions to a predefined dictionary. However, conventional dictionary matching (DM) suffers from exponential growth in computational cost and memory usage as the number of parameters increases, limiting its scalability to multi-parametric mapping. To address this, recent work has explored deep learning-based approaches as alternatives to DM. We propose GAST-Mamba, an end-to-end framework that combines a dual Mamba-based encoder with a Gate-Aware Spatial-Temporal (GAST) processor. Built on structured state-space models, our architecture efficiently captures long-range spatial dependencies with linear complexity. On 5× accelerated simulated MRF data (200 frames), GAST-Mamba achieved a T1 PSNR of 33.12 dB, outperforming SCQ (31.69 dB). For T2 mapping, it reached a PSNR of 30.62 dB and SSIM of 0.9124. In vivo experiments further demonstrated improved anatomical detail and reduced artifacts. Ablation studies confirmed that each component contributes to performance, with the GAST module being particularly important under strong undersampling. These results demonstrate the effectiveness of GAST-Mamba for accurate and robust reconstruction from highly undersampled MRF acquisitions, offering a scalable alternative to traditional DM-based methods.
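
The dictionary-matching baseline that this abstract contrasts against can be sketched as a nearest-neighbor search by normalized inner product. The signal model below is a made-up toy (not a real MRF sequence simulation), and the grids and function names are hypothetical; the point is only that the dictionary holds one entry per combination of parameter values, so its size is the product of the grid sizes.

```python
import math

def normalize(signal):
    norm = math.sqrt(sum(s * s for s in signal))
    return [s / norm for s in signal]

def dictionary_match(measured, dictionary):
    """Return the parameters whose simulated signal best matches the measurement."""
    target = normalize(measured)
    best_params, best_score = None, -1.0
    for params, atom in dictionary.items():
        # Normalized inner product, insensitive to overall signal scale.
        score = abs(sum(a * b for a, b in zip(normalize(atom), target)))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_signal(t1, t2, n_frames=20):
    """Toy signal evolution (an assumption for illustration only)."""
    return [(1 - math.exp(-(k + 1) / t1)) * math.exp(-(k + 1) / t2)
            for k in range(n_frames)]

T1_GRID = [5.0, 10.0, 20.0]
T2_GRID = [2.0, 4.0, 8.0]
dictionary = {(t1, t2): toy_signal(t1, t2) for t1 in T1_GRID for t2 in T2_GRID}

measured = [3.0 * s for s in toy_signal(10.0, 4.0)]  # scaled copy of one atom
params, score = dictionary_match(measured, dictionary)
```

Adding a third parameter with ten grid values would multiply the number of atoms, and hence the search cost per voxel, by ten, which is the scaling problem learned reconstructions aim to avoid.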


Article 1141

Title@2025-07-04 (5): LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment

Title: LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment LearnAlign: Grundlegende Datenauswahl für Verstärkungslernen in großen Sprachmodellen basierend auf verbesserter Gradient Alignment 学习对称:根据改进梯度对齐,为在大语言模式中强化学习选择理由数据 2506.11480v3

Authors (8): Shipeng Li, Shikun Li, Zhiqin Yang, Xinghua Zhang, Gaode Chen, Xiaobo Xia, Hengyu Liu, Zhe Peng

Reinforcement learning (RL) has become a key technique for enhancing LLMs’ reasoning abilities, yet its data inefficiency remains a major bottleneck. To address this critical yet challenging issue, we present a novel gradient-alignment-based method, named LearnAlign, which intelligently selects learnable and representative reasoning data for RL post-training. To overcome the issue of response-length bias in gradient norms, we introduce a measure of data learnability based on the success rate, which can indicate the learning potential of each data point. Experiments across three mathematical reasoning benchmarks demonstrate that our method significantly reduces training data requirements while incurring only minor performance degradation, or even improving performance, compared to full-data training. For example, it reduces data requirements by up to 1,000 data points with better performance (77.53%) than that on the full dataset on GSM8K benchmark (77.04%). Furthermore, we show its effectiveness in the staged RL setting. This work provides valuable insights into data-efficient RL post-training and establishes a foundation for future research in optimizing reasoning data selection. To facilitate future work, we will release code.


Article 1142

Title@2025-07-04 (5): Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency

Title: Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency Grade der Freiheit für lineare Aufmerksamkeit: Destillieren von Softmax-Achtung mit optimaler Feature-Effizienz 线性关注自由度:以最佳地物效率保持软性关注 2507.03340v1

Authors (3): Naoki Nishikawa, Rei Higuchi, Taiji Suzuki

Linear attention has attracted interest as a computationally efficient approximation to softmax attention, especially for long sequences. Recent studies have explored distilling softmax attention in pre-trained Transformers into linear attention. However, a critical challenge remains: how to choose the feature dimension that governs the approximation quality. Existing methods fix this dimension uniformly across all attention layers, overlooking their diverse roles and complexities. In this paper, we propose a principled method to automatically determine the feature dimension in linear attention using the concept of statistical degrees of freedom, which represent the effective dimensionality of the inputs. We provide a theoretical bound on the approximation error and show that the dimension chosen by our method achieves smaller error under a fixed computational budget. Furthermore, we introduce an efficient layerwise training strategy to learn nonlinear features tailored to each layer. Experiments on multiple pre-trained transformers demonstrate that our method improves the performance of distilled models compared to baselines without increasing the inference cost. Our findings also provide insight into how the complexity of the attention mechanism evolves across layers.
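
One common definition of statistical degrees of freedom, taken from kernel ridge regression, is $N(\lambda) = \sum_i \mu_i / (\mu_i + \lambda)$ over the eigenvalues $\mu_i$ of a kernel operator. The sketch below uses that definition to split a feature budget across layers. Whether the paper uses exactly this form is an assumption, and the proportional allocation rule, the spectra, and the function names are hypothetical.

```python
def degrees_of_freedom(eigenvalues, lam):
    """N(lam) = sum_i mu_i / (mu_i + lam): an effective count of significant eigen-directions."""
    return sum(mu / (mu + lam) for mu in eigenvalues)

def allocate_features(layer_spectra, lam, budget):
    """Split a total feature budget across layers in proportion to each layer's N(lam)."""
    dofs = [degrees_of_freedom(s, lam) for s in layer_spectra]
    total = sum(dofs)
    return [max(1, round(budget * d / total)) for d in dofs]

# Two made-up layers: one with a fast-decaying spectrum, one with a slow-decaying one.
fast_decay = [2.0 ** -k for k in range(20)]
slow_decay = [1.0 / (k + 1) for k in range(20)]
allocation = allocate_features([fast_decay, slow_decay], lam=0.01, budget=64)
```

A layer whose spectrum decays quickly can be approximated with few features, so a non-uniform allocation spends the budget where the attention kernel is effectively higher-dimensional.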


Article 1143

Title@2025-07-04 (5): CAOTE: KV Cache Eviction for LLMs via Attention Output Error-Based Token Selection

Title: CAOTE: KV Cache Eviction for LLMs via Attention Output Error-Based Token Selection CAOTE: KV-Cache-Eviktion für LLMs über Aufmerksamkeitsausgabe Fehlerbasierte Token-Auswahl CAOTE: KV 通过注意输出错误选择选中的LLMs KV Cache Evoction 2504.14051v3

Authors (8): Raghavv Goel, Junyoung Park, Mukul Gagrani, Dalton Jones, Matthew Morse, Harper Langston, Mingu Lee, Chris Lott

While long-context support in large language models has extended their abilities, it also incurs memory and compute challenges that become crucial bottlenecks on resource-restricted devices. Token eviction, a widely adopted post-training methodology designed to alleviate these bottlenecks by evicting less important tokens from the cache, typically uses attention scores as proxy metrics for token importance. However, one major limitation of the attention score as a token-wise importance metric is that it lacks information about the contribution of tokens to the attention output. In this paper, we propose a simple eviction criterion based on the contribution of cached tokens to attention outputs. Our method, CAOTE, optimizes for the error caused by token eviction by seamlessly integrating attention scores and value vectors. This is the first method that uses value vector information on top of attention-based eviction scores. Additionally, CAOTE can act as a meta-heuristic that can be used flexibly with any token eviction method. We show that CAOTE, when combined with state-of-the-art attention score-based methods, always improves accuracy on downstream tasks, indicating the importance of leveraging information from values during the token eviction process.
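
The idea of scoring a cached token by its effect on the attention output, rather than by its attention weight alone, can be sketched as follows. If token $j$ with weight $a_j$ and value $v_j$ is evicted and the remaining weights are renormalized, the output $o$ changes by $\frac{a_j}{1 - a_j}(o - v_j)$, which follows directly from the weighted-sum definition. This is a sketch of the idea under that renormalization assumption, with hypothetical function names and toy numbers, and is not claimed to be the authors' exact scoring rule.

```python
import math

def attention_output(weights, values):
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def eviction_error(weights, values, j):
    """Closed-form norm of the output change when token j is evicted."""
    out = attention_output(weights, values)
    gap = math.sqrt(sum((o - v) ** 2 for o, v in zip(out, values[j])))
    return weights[j] / (1.0 - weights[j]) * gap

def brute_force_error(weights, values, j):
    """Recompute attention without token j and compare, as a check on the closed form."""
    kept = [i for i in range(len(weights)) if i != j]
    total = sum(weights[i] for i in kept)
    new_out = attention_output([weights[i] / total for i in kept],
                               [values[i] for i in kept])
    old_out = attention_output(weights, values)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(old_out, new_out)))

# Toy cache: the lowest-attention token has an unusual value vector.
weights = [0.5, 0.3, 0.2]
values = [[1.0, 0.0], [1.0, 0.1], [-4.0, 3.0]]
errors = [eviction_error(weights, values, j) for j in range(3)]
evict_by_error = min(range(3), key=errors.__getitem__)
evict_by_attention = min(range(3), key=weights.__getitem__)
```

In this toy case the attention-only rule would evict token 2, while the output-error rule evicts token 1, whose removal perturbs the output least.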


Article 1144

Title@2025-07-04 (5): Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Title: Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky Disambiguation-Centric Finetuning macht Enterprise Tool-Calling LLMs realistischer und weniger riskant 差异性-中心微调使企业用工具召集LLM 更现实、风险更小 2507.03336v1

Authors (3): Ashutosh Hathidara, Julien Yu, Sebastian Schreiber

Large language models (LLMs) are increasingly tasked with invoking enterprise APIs, yet they routinely falter when near-duplicate tools vie for the same user intent or when required arguments are left underspecified. We introduce DiaFORGE (Dialogue Framework for Organic Response Generation & Evaluation), a disambiguation-centric, three-stage pipeline that (i) synthesizes persona-driven, multi-turn dialogues in which the assistant must distinguish among highly similar tools, (ii) performs supervised fine-tuning of open-source models with reasoning traces across 3B - 70B parameters, and (iii) evaluates real-world readiness via a dynamic suite that redeploys each model in a live agentic loop and reports end-to-end goal completion alongside conventional static metrics. On our dynamic benchmark DiaBENCH, models trained with DiaFORGE raise tool-invocation success by 27 pp over GPT-4o and by 49 pp over Claude-3.5-Sonnet, both under optimized prompting. To spur further research, we release an open corpus of 5000 production-grade enterprise API specifications paired with rigorously validated, disambiguation-focused dialogues, offering a practical blueprint for building reliable, enterprise-ready tool-calling agents.


Article 1145

Title@2025-07-04 (5): Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling

Title: Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling Aufgabenspezifische Generative Datensatzdestillation mit schwer wiegender Probenahme 利用难于指导的抽样抽样进行任务特定生成数据集蒸馏 2507.03331v1

Authors (6): Mingzhuo Li, Guang Li, Jiafeng Mao, Linfeng Ye, Takahiro Ogawa, Miki Haseyama

To alleviate the reliance of deep neural networks on large-scale datasets, dataset distillation aims to generate compact, high-quality synthetic datasets that can achieve comparable performance to the original dataset. The integration of generative models has significantly advanced this field. However, existing approaches primarily focus on aligning the distilled dataset with the original one, often overlooking task-specific information that can be critical for optimal downstream performance. In this paper, focusing on the downstream task of classification, we propose a task-specific sampling strategy for generative dataset distillation that incorporates the concept of difficulty to consider the requirements of the target task better. The final dataset is sampled from a larger image pool with a sampling distribution obtained by matching the difficulty distribution of the original dataset. A logarithmic transformation is applied as a pre-processing step to correct for distributional bias. The results of extensive experiments demonstrate the effectiveness of our method and suggest its potential for enhancing performance on other downstream tasks.


Article 1146

Title@2025-07-04 (5): Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelis

Title: Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelis Universal Checkpointing: Ein flexibles und effizientes Distributed Checkpointing-System für großformatige DNN-Schulungen mit rekonfigurierbarer Parallelis 通用检查:采用可重新配置平行系统进行大型DNN培训的灵活和高效分布式检查系统 2406.18820v3

Authors (7): Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang

Deep neural network (DNN) training continues to scale rapidly in terms of model size, data volume, and sequence length, to the point where multiple machines are required to fit large models for training. Different distributed and parallel training strategies have been developed to support large-scale DNN training by partitioning the training state across GPUs. However, existing DNN training systems provide very limited support for reconfiguring parallelism strategies in the middle of the training via checkpointing. This limitation arises because distributed checkpoints are tightly coupled to specific model parallelism and hardware configurations, preventing large-scale training jobs from efficiently adapting to hardware failures or resource elasticity. This paper presents Universal Checkpointing (UCP), a novel checkpointing system that enables flexible and efficient DNN training with reconfigurable parallelism. UCP overcomes challenges in existing systems by decoupling checkpoint structure from parallel training strategies and hardware configurations. In addition, we present a pattern-based reconfiguration pipeline that enables automatic, flexible, and efficient mapping of checkpoint state to various parallelism strategies. Evaluation on a range of DNN models, including state-of-the-art dense and sparse LLMs, shows that UCP enables reconfiguration for a broader set of widely used parallelism strategies than existing solutions while adding negligible reconfiguration cost. UCP has been successfully employed in real LLM training workloads, greatly enhancing their flexibility and resilience to dynamic hardware environments.
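
The decoupling this abstract describes can be pictured as a two-step mapping: first consolidate the per-rank shards into a parallelism-agnostic form, then partition that form for the new configuration. The sketch below shows this for a single flat parameter split contiguously across ranks. It is a deliberately simplified assumption with hypothetical function names; real checkpoints also involve replicated, padded, and non-contiguously partitioned state, which this does not model.

```python
def consolidate(shards):
    """Merge per-rank slices back into one flat parameter (parallelism-agnostic form)."""
    return [x for shard in shards for x in shard]

def partition(flat, n_ranks):
    """Split a flat parameter into n_ranks near-equal contiguous slices."""
    base, extra = divmod(len(flat), n_ranks)
    shards, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        shards.append(flat[start:start + size])
        start += size
    return shards

def reconfigure(shards, new_n_ranks):
    """Map a checkpoint saved on one number of ranks onto another."""
    return partition(consolidate(shards), new_n_ranks)

saved_on_two_ranks = [[1, 2, 3], [4, 5, 6]]
resumed_on_four_ranks = reconfigure(saved_on_two_ranks, 4)
```

Because the intermediate form does not depend on either configuration, resuming on fewer ranks after a hardware failure uses the same two steps as scaling out.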


Article 1147

Title@2025-07-04 (5): Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization

Title: Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization Structure-Aware Compound-Protein Affinitätsvorhersage über Graph Neural Network mit Group Lasso Regularization 通过Lasso集团正规化的图像神经网络进行结构器件复合复合物-蛋白质亲近性预测 2507.03318v1

Authors (7): Zanyu Shi, Yang Wang, Pathum Weerawarna, Jie Zhang, Timothy Richardson, Yijie Wang, Kun Huang

Explainable artificial intelligence (XAI) approaches have been increasingly applied in drug discovery to learn molecular representations and identify substructures driving property predictions. However, building end-to-end explainable machine learning models for structure-activity relationship (SAR) modeling in compound property prediction faces many challenges, such as limited activity data per target and the sensitivity of properties to subtle molecular changes. To address this, we leveraged activity-cliff molecule pairs, i.e., compounds sharing a common scaffold but differing sharply in potency, targeting three proto-oncogene tyrosine-protein kinase Src proteins (i.e., PDB IDs 1O42, 2H8H, and 4MXO). We implemented graph neural network (GNN) methods to obtain atom-level feature information and predict compound-protein affinity (i.e., half maximal inhibitory concentration, IC50). In addition, we trained GNN models with different structure-aware loss functions to adequately leverage molecular property and structure information. We also utilized group lasso and sparse group lasso to prune and highlight molecular subgraphs and enhance the structure-specific model explainability for the predicted property difference in molecular activity-cliff pairs. We improved drug property prediction by integrating common and uncommon node information and using sparse group lasso, reducing the average root mean squared error (RMSE) by 12.70% and achieving the lowest average RMSE of 0.2551 and the highest PCC of 0.9572. Furthermore, applying regularization enhances feature attribution methods that estimate the contribution of each atom in the molecular graphs, boosting global direction scores and atom-level coloring accuracy. This improves model interpretability in drug discovery pipelines, particularly when investigating important molecular substructures in lead optimization.
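
The group lasso and sparse group lasso regularizers mentioned above can be illustrated with a minimal sketch. The abstract does not specify how groups are defined or weighted, so the grouping, the square-root group-size weighting, and the names `group_lasso_penalty` and `sparse_group_lasso_penalty` below are assumptions following the textbook formulation, not the authors' implementation.

```python
import numpy as np

def group_lasso_penalty(weights, groups, lam=1.0):
    """Textbook group lasso: lam * sum_g sqrt(|g|) * ||w_g||_2.

    `groups` is a list of index lists (hypothetical grouping, e.g. the
    feature indices that belong to one molecular subgraph).
    """
    penalty = 0.0
    for idx in groups:
        w_g = weights[idx]
        penalty += np.sqrt(len(idx)) * np.linalg.norm(w_g)
    return lam * penalty

def sparse_group_lasso_penalty(weights, groups, lam=1.0, alpha=0.5):
    """Convex mix of an elementwise L1 term and the group lasso term."""
    l1 = np.abs(weights).sum()
    return alpha * lam * l1 + (1.0 - alpha) * group_lasso_penalty(weights, groups, lam)

# Toy example: the second group is entirely zero, so it adds nothing.
w = np.array([3.0, 4.0, 0.0, 0.0])
groups = [[0, 1], [2, 3]]
print(group_lasso_penalty(w, groups))
print(sparse_group_lasso_penalty(w, groups))
```

The group term drives whole groups to zero together, which is what makes it suitable for pruning subgraphs; the added L1 term in the sparse variant also encourages sparsity inside the groups that survive.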


Article 1148

Title@2025-07-04 (5): Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Title: Rewriting Pre-Training Data Boosts LLM Performance in Math and Code Umschreiben von Vorschulungsdaten steigert die LLM-Performance in Mathematik und Code 数学和代码方面的LLM性能 2505.02881v3

Authors (16): Kazuki Fujii, Yukito Tajima, Sakae Mizuki, Hinari Shimada, Taihei Shiotani, Koshiro Saito, Masanari Ohi, Masaki Kawamura, Taishi Nakamura, Takumi Okamoto, Shigeki Ishida, Kakeru Hattori, Youmi Ma, Hiroya Takamura, Rio Yokota, Naoaki Okazaki

The performance of large language models (LLMs) in program synthesis and mathematical reasoning is fundamentally limited by the quality of their pre-training corpora. We introduce two openly licensed datasets, released under the Llama 3.3 Community License, that significantly enhance LLM performance by systematically rewriting public data. SwallowCode (approximately 16.1 billion tokens) refines Python snippets from The-Stack-v2 through a novel four-stage pipeline: syntax validation, pylint-based style filtering, and a two-stage LLM rewriting process that enforces style conformity and transforms snippets into self-contained, algorithmically efficient examples. Unlike prior methods that rely on exclusionary filtering or limited transformations, our transform-and-retain approach upgrades low-quality code, maximizing data utility. SwallowMath (approximately 2.3 billion tokens) enhances Finemath-4+ by removing boilerplate, restoring context, and reformatting solutions into concise, step-by-step explanations. Within a fixed 50 billion token training budget, continual pre-training of Llama-3.1-8B with SwallowCode boosts pass@1 by +17.0 on HumanEval and +17.7 on HumanEval+ compared to Stack-Edu, surpassing the baseline model’s code generation capabilities. Similarly, substituting SwallowMath yields +12.4 accuracy on GSM8K and +7.6 on MATH. Ablation studies confirm that each pipeline stage contributes incrementally, with rewriting delivering the largest gains. All datasets, prompts, and checkpoints are publicly available, enabling reproducible research and advancing LLM pre-training for specialized domains.
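
The first pipeline stage named above, syntax validation, can be sketched as follows. This is a minimal illustration that assumes the check amounts to "the snippet parses as Python"; the abstract does not say which tool the authors use, and `is_valid_python` and `filter_snippets` are hypothetical names.

```python
import ast

def is_valid_python(snippet: str) -> bool:
    """Return True if the snippet parses as Python source (assumed criterion)."""
    try:
        ast.parse(snippet)
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers inputs such as
        # source strings containing null bytes on some Python versions.
        return False
    return True

def filter_snippets(snippets):
    """Keep only the snippets that pass the syntax check."""
    return [s for s in snippets if is_valid_python(s)]

samples = [
    "def add(a, b):\n    return a + b\n",
    "def broken(:\n    pass\n",
]
kept = filter_snippets(samples)
print(len(kept), "of", len(samples), "snippets kept")
```

Later stages (pylint-based filtering and the two LLM rewriting passes) are not reproduced here.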


Article 1149

Title@2025-07-04 (5): OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Title: OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework OpenRLHF: Ein einfach zu bedienendes, skalierbares und leistungsstarkes RLHF-Framework OpenRLHF: 易于使用、可缩放和高绩效RLHF框架 2405.11143v5

Authors (15): Jian Hu, Xibin Wu, Wei Shen, Jason Klein Liu, Zilin Zhu, Weixun Wang, Songlin Jiang, Haoran Wang, Hao Chen, Bin Chen, Weikai Fang, Xianyu, Yu Cao, Haotian Xu, Yiming Liu

Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values and further raise the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility for newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency with speedups ranging from 1.22x to 1.68x across different model sizes compared to state-of-the-art frameworks, while requiring significantly fewer lines of code for implementation. OpenRLHF is publicly available at https://github.com/OpenRLHF/OpenRLHF, and has already been adopted by leading institutions to accelerate RLHF research and learning.


Article 1150

Title@2025-07-04 (5): Deep Autoregressive Models as Causal Inference Engines

Title: Deep Autoregressive Models as Causal Inference Engines Deep Autoregressive Modelle als Kausale Inferenz-Engine 深自递减模型,如因果推断引擎 2409.18581v3

Authors (4): Daniel Jiwoong Im, Kevin Zhang, Nakul Verma, Kyunghyun Cho

Existing causal inference (CI) models are often restricted to data with low-dimensional confounders and singleton actions. We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions commonly found in modern applications. Our approach accomplishes this using sequencification, which transforms data from an underlying causal diagram into a sequence of tokens. Sequencification not only accommodates training with data generated from a large class of DAGs, but also extends existing CI capabilities to estimate multiple causal quantities using a single model. We can directly compute probabilities from interventional distributions, simplifying inference and improving outcome prediction accuracy. We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates, where we consider causal queries beyond standard reinforcement learning-type questions.


Article 1151

Title@2025-07-04 (5): Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure

Title: Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure Diffusionsfaktormodelle: Erzeugen von hochdimensionalen Rückgaben mit Faktorstruktur 传播因数模型:产生具有因数结构的高差异返回 2504.06566v3

Authors (4): Minshuo Chen, Renyuan Xu, Yumin Xu, Ruixun Zhang

Financial scenario simulation is essential for risk management and portfolio optimization, yet it remains challenging, especially in the high-dimensional and small data settings common in finance. We propose a diffusion factor model that integrates latent factor structure into generative diffusion processes, bridging econometrics with modern generative AI to address the curse of dimensionality and data scarcity in financial simulation. By exploiting the low-dimensional factor structure inherent in asset returns, we decompose the score function, a key component in diffusion models, using time-varying orthogonal projections, and this decomposition is incorporated into the design of neural network architectures. We derive rigorous statistical guarantees, establishing nonasymptotic error bounds for both score estimation at O(d^{5/2} n^{-2/(k+5)}) and generated distribution at O(d^{5/4} n^{-1/2(k+5)}), primarily driven by the intrinsic factor dimension k rather than the number of assets d, surpassing the dimension-dependent limits in the classical nonparametric statistics literature and making the framework viable for markets with thousands of assets. Numerical studies confirm superior performance in latent subspace recovery under small data regimes. Empirical analysis demonstrates the economic significance of our framework in constructing mean-variance optimal portfolios and factor portfolios. This work presents the first theoretical integration of factor structure with diffusion models, offering a principled approach for high-dimensional financial simulation with limited data. Our code is available at https://github.com/xymmmm00/diffusion_factor_model.


Article 1152

Title@2025-07-04 (5): KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

Title: KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction KEPLA: Ein Wissen-erweitertes Deep Learning Framework für präzise Protein-Ligand Bindung Affinity Prediction KEPLA:一个知识强化的更深层学习框架,用于准确预测蛋白-银-捆绑性近亲关系 2506.13196v2

Authors (7): Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang

Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.


Article 1153

Title@2025-07-04 (5): ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series

Title: ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series ReTimeCausal: 非定期时间序列中可解释性因果关系发现模型的EM- 增强的Additive Additive 噪音模型 2507.03310v1

Authors (4): Weihong Li, Anpeng Wu, Kun Kuang, Keting Yin

This paper studies causal discovery in irregularly sampled time series, a pivotal challenge in high-stakes domains like finance, healthcare, and climate science, where missing data and inconsistent sampling frequencies distort causal mechanisms. Traditional methods (e.g., Granger causality, PCMCI) fail to reconcile multi-scale interactions (e.g., hourly storms vs. decadal climate shifts), while neural approaches (e.g., CUTS+) lack interpretability. Both shortcomings stem from a critical gap: existing frameworks either rigidly assume temporal regularity or aggregate dynamics into opaque representations, neglecting real-world granularity and auditable logic. To bridge this gap, we propose ReTimeCausal, a novel integration of Additive Noise Models (ANM) and Expectation-Maximization (EM) that unifies physics-guided data imputation with sparse causal inference. Through kernelized sparse regression and structural constraints, ReTimeCausal iteratively refines missing values (E-step) and causal graphs (M-step), resolving cross-frequency dependencies and missing data issues. Extensive experiments on synthetic and real-world datasets demonstrate that ReTimeCausal outperforms existing state-of-the-art methods under challenging irregular sampling and missing data conditions.


Article 1154

Title@2025-07-04 (5): LRM-1B: Towards Large Routing Model

Title: LRM-1B: Towards Large Routing Model LRM-1B: Auf dem Weg zu einem großen Routing-Modell LRM-1B:迈向大型区域迁移模式 2507.03300v1

Authors (4): Han Li, Fei Liu, Zhenkun Wang, Qingfu Zhang

Vehicle routing problems (VRPs) are central to combinatorial optimization with significant practical implications. Recent advancements in neural combinatorial optimization (NCO) have demonstrated promising results by leveraging neural networks to solve VRPs, yet model scaling within this domain remains underexplored. Inspired by the success of model scaling in large language models (LLMs), this study introduces a Large Routing Model with 1 billion parameters (LRM-1B), designed to address diverse VRP scenarios. We present a comprehensive evaluation of LRM-1B across multiple problem variants, distributions, and sizes, establishing state-of-the-art results. Our findings reveal that LRM-1B not only adapts to different VRP challenges but also outperforms existing models. Additionally, we explore the scaling behavior of neural routing models from 1M to 1B parameters. Our analysis confirms a power-law relationship between multiple model factors and performance, offering critical insights into the optimal configurations for foundation neural routing solvers.


Article 1155

Title@2025-07-04 (5): Dyn-O: Building Structured World Models with Object-Centric Representations

Title: Dyn-O: Building Structured World Models with Object-Centric Representations Dyn-O: Strukturierte Weltmodelle mit objektzentralen Darstellungen Dyn-O:建立结构化世界模型,有实物中心代表 2507.03298v1

Authors (5): Zizhao Wang, Kaixin Wang, Li Zhao, Peter Stone, Jiang Bian

World models aim to capture the dynamics of the environment, enabling agents to predict and plan for future states. In most scenarios of interest, the dynamics are highly centered on interactions among objects within the environment. This motivates the development of world models that operate on object-centric rather than monolithic representations, with the goal of more effectively capturing environment dynamics and enhancing compositional generalization. However, the development of object-centric world models has largely been explored in environments with limited visual complexity (such as basic geometries). It remains underexplored whether such models can generalize to more complex settings with diverse textures and cluttered scenes. In this paper, we fill this gap by introducing Dyn-O, an enhanced structured world model built upon object-centric representations. Compared to prior work in object-centric representations, Dyn-O improves in both learning representations and modeling dynamics. On the challenging Procgen games, we find that our method can learn object-centric world models directly from pixel observations, outperforming DreamerV3 in rollout prediction accuracy. Furthermore, by decoupling object-centric features into dynamics-agnostic and dynamics-aware components, we enable finer-grained manipulation of these features and generate more diverse imagined trajectories.


Article 1156

Title@2025-07-04 (5): SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL

Title: SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL SLAC: Simulationsvorausgesetzter Latent Action Space für Ganzkörper-Real-World RL SLAC: 模拟 - 预先准备的中转行动空间,用于整个模拟 - 模拟 - 现实世界RL 2506.04147v3

Authors (3): Jiaheng Hu, Peter Stone, Roberto Martín-Martín

Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information and robot videos are available at robo-rl.github.io.


Article 1157

Title@2025-07-04 (5): MGAA: Multi-Granular Adaptive Allocation for Low-Rank Compression of LLMs

Title: MGAA: Multi-Granular Adaptive Allocation for Low-Rank Compression of LLMs MGAA: Multi-Granular Adaptive Allocation für Low-Rank Compression von LLMs MGAA: 低Rank 低Rank 低压压LLMs的多组适应性分配 2507.03294v1

Authors (3): Guangyan Li, Yongqiang Tang, Wensheng Zhang

The enormous parameter scale of large language models (LLMs) has made model compression a research hotspot, with the aim of alleviating computational resource demands during deployment and inference. As a promising direction, low-rank approximation has made remarkable achievements. Nevertheless, the vast majority of studies on low-rank approximation compression apply uniform compression ratios across all weight matrices, disregarding their inherently different impacts on the model’s performance. Although a few recent works attempt to employ heuristic search strategies to achieve the optimal parameter allocation, such strategies are computationally inefficient and lose the generalization ability in the era of LLMs. In this study, we propose a novel parameter Multi-Granular Adaptive Allocation (MGAA) method, which can adaptively allocate parameters between and within sublayers without task-specific evaluations in the compression process. MGAA consists of two components: 1) Among different sublayers, it assigns compression ratios based on the cosine similarity between their inputs and outputs, allowing for a more tailored compression in sublayers with varying degrees of importance, and 2) Within each sublayer, it allocates different compression ratios to weight matrices based on their energy distribution characteristics, ensuring a consistent energy retention ratio while optimizing compression efficiency. Comprehensive evaluations of MGAA across multiple LLM backbone models and benchmark datasets demonstrate its superior performance. Additionally, we apply MGAA to the multimodal model LLaVA, exhibiting remarkable performance improvements.
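
The idea of choosing a rank per weight matrix so that a fixed share of "energy" is retained can be sketched with a truncated SVD. This is only an illustration under stated assumptions: energy is taken to be the sum of squared singular values, and the names `rank_for_energy` and `low_rank_approx` are hypothetical. The abstract does not give MGAA's exact definition or its allocation rule.

```python
import numpy as np

def rank_for_energy(singular_values, keep_ratio):
    """Smallest rank whose squared singular values reach `keep_ratio` of the total."""
    energy = singular_values ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    rank = int(np.searchsorted(cumulative, keep_ratio)) + 1
    return min(rank, len(singular_values))  # guard against rounding at keep_ratio = 1.0

def low_rank_approx(W, keep_ratio=0.9):
    """Truncated-SVD approximation of W that retains the requested energy share."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = rank_for_energy(s, keep_ratio)
    return (U[:, :r] * s[:r]) @ Vt[:r], r

# Toy example: a matrix whose spectrum decays quickly needs a small rank.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
W_hat, r = low_rank_approx(W, keep_ratio=0.9)
print("chosen rank:", r, "of", min(W.shape))
```

With a shared retention ratio, matrices with flat spectra keep more rank than matrices with steep spectra, which is one way different compression ratios can arise from a single target.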


Article 1158

Title@2025-07-04 (5): LTLCrit: A Temporal Logic-based LLM Critic for Safe and Efficient Embodied Agents

Title: LTLCrit: A Temporal Logic-based LLM Critic for Safe and Efficient Embodied Agents LTLCrit: Ein auf Temporal Logic basierender LLM-Kritiker für sichere und effiziente einkörperige Wirkstoffe LLLCritri: 一种基于时时逻辑的LLM LLM 用于安全和高效装药剂的Criticr 2507.03293v1

Authors (3): Anand Gokhale, Vaibhav Srivastava, Francesco Bullo

Large language models (LLMs) have demonstrated promise in reasoning tasks and general decision-making in static environments. In long-term planning tasks, however, errors tend to accumulate, often leading to unsafe or inefficient behavior, limiting their use in general-purpose settings. We propose a modular actor-critic architecture in which an LLM actor is guided by LTLCrit, a trajectory-level LLM critic that communicates via linear temporal logic (LTL). Our setup combines the reasoning strengths of language models with the guarantees of formal logic. The actor selects high-level actions from natural language observations, while the critic analyzes full trajectories and proposes new LTL constraints that shield the actor from future unsafe or inefficient behavior. The architecture supports both fixed, hand-specified safety constraints and adaptive, learned soft constraints that promote long-term efficiency. Our architecture is model-agnostic: any LLM-based planner can serve as the actor, and LTLCrit serves as a logic-generating wrapper. We formalize planning as graph traversal under symbolic constraints, allowing LTLCrit to analyze failed or suboptimal trajectories and generate new temporal logic rules that improve future behavior. We evaluate our system on the Minecraft diamond-mining benchmark, achieving 100% completion rates and improving efficiency compared to baseline LLM planners. Our results suggest that enabling LLMs to supervise each other through logic is a powerful and flexible paradigm for safe, generalizable decision making.


Article 1159

Title@2025-07-04 (5): Lion Cub: Minimizing Communication Overhead in Distributed Lion

Title: Lion Cub: Minimizing Communication Overhead in Distributed Lion Lion Cub: Minimierung der Kommunikation über Kopf in verteilten Löwen Lion Cub:尽量减少分配狮子的通讯问题 2411.16462v2

Authors (5): Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden

Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck. While gradient compression techniques have been explored for SGD and Adam, the Lion optimizer has the distinct advantage that its update vectors are the output of a sign operation, enabling straightforward quantization. However, simply compressing updates for communication and using techniques like majority voting fails to lead to end-to-end speedups due to inefficient communication algorithms and reduced convergence. We analyze three factors critical to distributed learning with Lion: optimizing communication methods, identifying effective quantization methods, and assessing the necessity of momentum synchronization. Our findings show that quantization techniques adapted to Lion and selective momentum synchronization can significantly reduce communication costs while maintaining convergence. We combine these into Lion Cub, which enables up to 5x speedups in end-to-end training compared to Lion. This highlights Lion’s potential as a communication-efficient solution for distributed training.
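
Why a sign-valued update is easy to quantize can be shown with a short sketch. It covers only the naive baseline the abstract describes (1-bit packing plus majority voting), which the authors report does not by itself yield end-to-end speedups; it is not the Lion Cub method. The helper names and the `beta1` default are assumptions for illustration.

```python
import numpy as np

def lion_update_direction(momentum, grad, beta1=0.9):
    """Lion-style update direction: sign of an interpolation of momentum and gradient."""
    return np.sign(beta1 * momentum + (1.0 - beta1) * grad)

def pack_signs(signs):
    """Pack a +1/-1 vector into one bit per entry (exact zeros are sent as -1)."""
    return np.packbits(signs > 0)

def unpack_signs(packed, n):
    """Recover the first n entries as a +1/-1 vector."""
    bits = np.unpackbits(packed)[:n].astype(np.int8)
    return bits * 2 - 1

def majority_vote(worker_signs):
    """Elementwise majority over workers; ties resolve to 0."""
    return np.sign(np.sum(worker_signs, axis=0))

# Three hypothetical workers with different local gradients.
momentum = np.zeros(4)
grads = [np.array([0.3, -1.0, 0.2, 0.5]),
         np.array([0.1, -0.4, -0.6, 0.9]),
         np.array([-0.2, -0.8, 0.7, 0.1])]
signs = np.stack([lion_update_direction(momentum, g) for g in grads])
payloads = [pack_signs(s) for s in signs]            # 1 byte each instead of 4 floats
decoded = np.stack([unpack_signs(p, 4) for p in payloads])
print(majority_vote(decoded))
```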


Article 1160

Title@2025-07-04 (5): Global Variational Inference Enhanced Robust Domain Adaptation

Title: Global Variational Inference Enhanced Robust Domain Adaptation Globale Variationsableitung Verbesserte Robuste Domain-Anpassung 全球变异推力增强强力域域适应 2507.03291v1

Authors (3): Lingkun Luo, Shiqiang Hu, Liming Chen

Deep learning-based domain adaptation (DA) methods have shown strong performance by learning transferable representations. However, their reliance on mini-batch training limits global distribution modeling, leading to unstable alignment and suboptimal generalization. We propose Global Variational Inference Enhanced Domain Adaptation (GVI-DA), a framework that learns continuous, class-conditional global priors via variational inference to enable structure-aware cross-domain alignment. GVI-DA minimizes domain gaps through latent feature reconstruction, and mitigates posterior collapse using global codebook learning with randomized sampling. It further improves robustness by discarding low-confidence pseudo-labels and generating reliable target-domain samples. Extensive experiments on four benchmarks and thirty-eight DA tasks demonstrate consistent state-of-the-art performance. We also derive the model’s evidence lower bound (ELBO) and analyze the effects of prior continuity, codebook size, and pseudo-label noise tolerance. In addition, we compare GVI-DA with diffusion-based generative frameworks in terms of optimization principles and efficiency, highlighting both its theoretical soundness and practical advantages.


Article 1161

Title@2025-07-04 (5): Do Tensorized Large-Scale Spatiotemporal Dynamic Atmospheric Data Exhibit Low-Rank Properties?

Title: Do Tensorized Large-Scale Spatiotemporal Dynamic Atmospheric Data Exhibit Low-Rank Properties? Tensorisierte großräumige räumlich-zeitliche dynamische atmosphärische Daten zeigen Low-Rank-Eigenschaften? 长期大型外观、时空动态大气数据展览低兰克属性是否? 2507.03289v1

Authors (3): Ryan Solgi, Seyedali Mousavinezhad, Hugo A. Loaiciga

In this study, we investigate for the first time the low-rank properties of a tensorized large-scale spatio-temporal dynamic atmospheric variable. We focus on the Sentinel-5P tropospheric NO2 product (S5P-TN) over a four-year period in an area that encompasses the contiguous United States (CONUS). We demonstrate that a low-rank approximation of such a dynamic variable is feasible. We apply the low-rank properties of the S5P-TN data to inpaint gaps in the Sentinel-5P product by adopting a low-rank tensor model (LRTM) based on the CANDECOMP / PARAFAC (CP) decomposition and alternating least squares (ALS). Furthermore, we evaluate the LRTM’s results by comparing them with spatial interpolation using geostatistics, and conduct a comprehensive spatial statistical and temporal analysis of the S5P-TN product. The results demonstrate that tensor completion successfully reconstructs the missing values in the S5P-TN product, particularly in the presence of extended cloud obscuration, while predicting outliers and identifying hotspots, when the data is tensorized over extended spatial and temporal scales.
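
Gap filling with a CP model fitted by ALS can be sketched on a small synthetic tensor. This is a generic impute-then-refit variant written for illustration; the abstract does not state which ALS variant the authors use, and the function names, the rank, and the synthetic rank-1 data are all assumptions rather than anything taken from the S5P-TN product.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product, shape (A.rows * B.rows, R)."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def unfold(T, mode):
    """Mode-n unfolding with the remaining axes kept in their original order."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_reconstruct(factors):
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_als_complete(T, mask, rank, n_iter=200, seed=0):
    """Fill entries where mask is False using a rank-`rank` CP model (3-way tensors)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in T.shape]
    X = np.where(mask, T, 0.0)  # start with zeros in the gaps
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(X, mode) @ kr @ np.linalg.pinv(gram)
        # Keep observed values, replace the gaps with the current model estimate.
        X = np.where(mask, T, cp_reconstruct(factors))
    return cp_reconstruct(factors)

# Synthetic rank-1 tensor with three hidden entries (illustrative data only).
a, b, c = np.linspace(1, 2, 4), np.linspace(1, 2, 3), np.linspace(1, 2, 5)
truth = np.einsum("i,j,k->ijk", a, b, c)
mask = np.ones(truth.shape, dtype=bool)
mask[0, 0, 0] = mask[1, 2, 3] = mask[3, 1, 4] = False
recovered = cp_als_complete(truth, mask, rank=1)
print(np.abs(recovered[~mask] - truth[~mask]).max())
```

Real satellite data would need a rank chosen by validation and would have far larger, spatially clustered gaps than this toy case.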


Article 1162

Title@2025-07-04 (5): VGMShield: Mitigating Misuse of Video Generative Models

Title: VGMShield: Mitigating Misuse of Video Generative Models VGMShield: Missbrauch von Videogenerativen Modellen verhindern VGMShield:减少滥用视频生成模型 2402.13126v2

Authors (4): Yan Pang, Baicheng Chen, Yang Zhang, Tianhao Wang

With the rapid advancement in video generation, people can conveniently use video generation models to create videos tailored to their specific desires. As a result, there are also growing concerns about the potential misuse of video generation for spreading illegal content and misinformation. In this work, we introduce VGMShield: a set of straightforward but effective mitigations through the lifecycle of fake video generation. We start from fake video detection, trying to understand whether there is uniqueness in generated videos and whether we can differentiate them from real videos; then, we investigate the fake video source tracing problem, which maps a fake video back to the model that generated it. Towards these, we propose to leverage pre-trained models that focus on spatial-temporal dynamics as the backbone to identify inconsistencies in videos. In detail, we analyze fake videos from the perspective of the generation process. Based on the observation of attention shifts, motion variations, and frequency fluctuations, we identify common patterns in the generated video. These patterns serve as the foundation for our experiments on fake video detection and source tracing. Through experiments on seven state-of-the-art open-source models, we demonstrate that current models still cannot reliably reproduce spatial-temporal relationships, and thus, we can accomplish detection and source tracing with over 90% accuracy. Furthermore, anticipating future generative model improvements, we propose a prevention method that adds invisible perturbations to the query images to make the generated videos look unreal. Together with detection and tracing, our multi-faceted set of solutions can effectively mitigate misuse of video generative models.


Article 1163

Title@2025-07-04 (5): QCResUNet: Joint Subject-level and Voxel-level Segmentation Quality Prediction

Title: QCResUNet: Joint Subject-level and Voxel-level Segmentation Quality Prediction QCResUNet: Gemeinsame Segmentierung auf Subjektebene und Voxel-Ebene QCResUNet:联合主题一级和福克斯级分解质量预测 2412.07156v2

Authors (5): Peijie Qiu, Satrajit Chakrabarty, Phuc Nguyen, Soumyendu Sekhar Ghosh, Aristeidis Sotiras

Deep learning has made significant strides in automated brain tumor segmentation from magnetic resonance imaging (MRI) scans in recent years. However, the reliability of these tools is hampered by the presence of poor-quality segmentation outliers, particularly in out-of-distribution samples, making their implementation in clinical practice difficult. Therefore, there is a need for quality control (QC) to screen the quality of the segmentation results. Although numerous automatic QC methods have been developed for segmentation quality screening, most were designed for cardiac MRI segmentation, which involves a single modality and a single tissue type. Furthermore, most prior works only provided subject-level predictions of segmentation quality and did not identify erroneous parts of the segmentation that may require refinement. To address these limitations, we propose a novel multi-task deep learning architecture, termed QCResUNet, which produces subject-level segmentation-quality measures as well as voxel-level segmentation error maps for each available tissue class. To validate the effectiveness of the proposed method, we evaluated its quality predictions on two distinct segmentation tasks. First, we assessed the quality of brain tumor segmentation results, using one internal and two external datasets. Second, we evaluated the segmentation quality of cardiac MRI data from the Automated Cardiac Diagnosis Challenge. The proposed method achieved high performance in predicting subject-level segmentation-quality metrics and accurately identifying segmentation errors on a voxel basis. This has the potential to be used to guide human-in-the-loop feedback to improve segmentations in clinical settings.


Article 1164

Title@2025-07-04 (5): Conformal Information Pursuit for Interactively Guiding Large Language Models

Title: Conformal Information Pursuit for Interactively Guiding Large Language Models Conformal Information Pursuit für interactively Guiding große Sprachmodelle 互动指导型大语言模式非正式信息搜索 2507.03279v1

Authors (5): Kwan Ho Ryan Chan, Yuyan Ge, Edgar Dobriban, Hamed Hassani, René Vidal

A significant use case of instruction-finetuned Large Language Models (LLMs) is to solve question-answering tasks interactively. In this setting, an LLM agent is tasked with making a prediction by sequentially querying relevant information from the user, as opposed to a single-turn conversation. This paper explores sequential querying strategies that aim to minimize the expected number of queries. One such strategy is Information Pursuit (IP), a greedy algorithm that at each iteration selects the query that maximizes information gain or equivalently minimizes uncertainty. However, obtaining accurate estimates of mutual information or conditional entropy for LLMs is very difficult in practice due to over- or under-confident LLM probabilities, which leads to suboptimal query selection and predictive performance. To better estimate the uncertainty at each iteration, we propose Conformal Information Pursuit (C-IP), an alternative approach to sequential information gain based on conformal prediction sets. More specifically, C-IP leverages a relationship between prediction sets and conditional entropy at each iteration to estimate uncertainty based on the average size of conformal prediction sets. In contrast to conditional entropy, we find that conformal prediction sets are a distribution-free and robust method of measuring uncertainty. Experiments with 20 Questions show that C-IP obtains better predictive performance and shorter query-answer chains compared to previous approaches to IP and uncertainty-based chain-of-thought methods. Furthermore, extending to an interactive medical setting between a doctor and a patient on the MediQ dataset, C-IP achieves competitive performance with direct single-turn prediction while offering greater interpretability.
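The idea of scoring a query by the average size of conformal prediction sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the nonconformity score `1 - probability`, the split-conformal threshold, and the names `conformal_threshold`, `prediction_set`, and `pick_query` are all hypothetical choices made for the example.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)), capped at n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Labels whose assumed score (1 - probability) is within the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]


def pick_query(candidates, threshold):
    """Pick the query whose possible answers give the smallest average set size.

    `candidates` maps a query to a list of (answer_weight, label_probs) pairs,
    one pair per possible answer to that query.
    """
    def expected_size(outcomes):
        return sum(w * len(prediction_set(probs, threshold)) for w, probs in outcomes)

    return min(candidates, key=lambda q: expected_size(candidates[q]))
```

A smaller average set size is read here as lower remaining uncertainty, so the greedy step selects the query that shrinks the sets the most.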



Article 1165

Title@2025-07-04 (5): Securing Transformer-based AI Execution via Unified TEE and Crypto-protected Accelerators

Title: Securing Transformer-based AI Execution via Unified TEE and Crypto-protected Accelerators Sicherung transformerbasierter KI-Execution über Unified TEE und Crypto-geschützte Acceleratoren 通过统一TEE和加密保护加速器实施基于安全变换器的AI执行 2507.03278v1

Authors (7): Jiaqi Xue, Yifei Zhao, Mengxin Zheng, Xun Chen, Fan Yao, Yan Solihin, Qian Lou

Recent advances in Transformer models, e.g., large language models (LLMs), have brought tremendous breakthroughs in various artificial intelligence (AI) tasks, leading to their wide applications in many security-critical domains. Due to their unprecedented scale and prohibitively high development cost, these models have become highly valuable intellectual property for AI stakeholders and are increasingly deployed via machine learning as a service (MLaaS). However, MLaaS often runs on untrusted cloud infrastructure, exposing data and models to potential breaches. Mainstream protection mechanisms leverage trusted execution environments (TEEs) where confidentiality and integrity for secretive data are shielded using hardware-based encryption and integrity checking. Unfortunately, running model inference entirely within TEEs is subject to non-trivial slowdown, which is further exacerbated in LLMs due to the substantial computation and memory footprint involved. Recent studies reveal that the hybrid TEE-based scheme offloading partial model inference operations to the untrusted accelerators (e.g., GPU) is a promising solution. However, prior offloading schemes fail to ensure dual protection of data and model in Transformer inference, as they cannot securely offload critical operations, i.e., Attention and SoftMax, forcing these computations to remain confined within TEEs. To address these challenges, we propose TwinShield, a framework enabling secure Transformer inference in heterogeneous TEE and accelerator systems with dual protection for both model and data. TwinShield offloads ~87% of computation to GPUs and delivers 4.0x - 6.1x speedups over previous approaches across various Transformer models.



Article 1166

Title@2025-07-04 (5): REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models

Title: REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models REINFORCE++: Effizienter RLHF-Algorithmus mit Robustheit sowohl für Prompt- als auch für Reward-Modelle REINFORCE++: 高效的RLHF对快速模型和奖励模型具有强力的测算法 2501.03262v5

Authors (3): Jian Hu, Jason Klein Liu, Wei Shen

Reinforcement Learning from Human Feedback (RLHF) is crucial in aligning large language models (LLMs) with human values and preferences. While state-of-the-art applications like ChatGPT/GPT-4 commonly employ Proximal Policy Optimization (PPO), including a critic network introduces significant computational overhead. REINFORCE-based methods, such as REINFORCE Leave One-Out (RLOO), ReMax, and Group Relative Policy Optimization (GRPO), address this limitation by eliminating the critic network. However, these approaches face challenges in accurate advantage estimation. Specifically, they estimate advantages independently for responses to each prompt, which can lead to overfitting on more straightforward prompts and vulnerability to reward hacking. To address these challenges, we introduce REINFORCE++, a novel approach that removes the critic model while using the normalized reward of a batch as the baseline. Our empirical evaluation demonstrates that REINFORCE++ exhibits robust performance across various reward models without requiring prompt set truncation. Furthermore, it achieves superior generalization in RLHF and long chain-of-thought (CoT) settings compared to REINFORCE-based methods. The implementation is available at https://github.com/OpenRLHF/OpenRLHF.
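The contrast between a per-prompt baseline and a batch-level normalized baseline can be sketched as below. This is a rough illustration only: the function names are hypothetical, and the sketch covers just the advantage normalization step, not the full training objective in the OpenRLHF implementation.

```python
import statistics


def per_prompt_advantages(rewards_by_prompt):
    """Baseline computed separately for the responses to each prompt."""
    return {
        prompt: [r - statistics.fmean(rs) for r in rs]
        for prompt, rs in rewards_by_prompt.items()
    }


def batch_normalized_advantages(rewards_by_prompt, eps=1e-8):
    """Baseline and scale computed over every response in the batch."""
    flat = [r for rs in rewards_by_prompt.values() for r in rs]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    return {
        prompt: [(r - mean) / (std + eps) for r in rs]
        for prompt, rs in rewards_by_prompt.items()
    }
```

With rewards `{"easy": [1.0, 1.0], "hard": [0.0, 0.0]}`, the per-prompt version yields all-zero advantages, while the batch-level version still separates the two prompts (roughly +1 and -1).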



Article 1167

Title@2025-07-04 (5): DBA-DFL: Towards Distributed Backdoor Attacks with Network Detection in Decentralized Federated Learning

Title: DBA-DFL: Towards Distributed Backdoor Attacks with Network Detection in Decentralized Federated Learning DBA-DFL: Auf dem Weg zu verteilten Backdoor-Angriffen mit Netzwerkerkennung im dezentralisierten Föderierten Lernen DBA-DFL:努力在权力下放的联邦学习中以网络探测方式进行分散的后门攻击 2501.15005v2

Authors (6): Bohan Liu, Yang Xiao, Ruimeng Ye, Zinan Ling, Xiaolong Ma, Bo Hui

Distributed backdoor attacks (DBA) have shown a higher attack success rate than centralized attacks in centralized federated learning (FL). However, DBA has not been investigated in decentralized FL. In this paper, we experimentally demonstrate that, when DBA is applied directly to decentralized FL, the attack success rate depends on the distribution of attackers in the network architecture. Considering that the attackers cannot decide their location, this paper aims to achieve a high attack success rate regardless of the attackers’ location distribution. Specifically, we first design a method to detect the network by predicting the distance between any two attackers on the network. Then, based on the distance, we organize the attackers into different clusters. Lastly, we propose an algorithm to dynamically embed local patterns decomposed from a global pattern into the different attackers in each cluster. We conduct a thorough empirical investigation and find that our method can, on benchmark datasets, outperform both centralized attacks and naive DBA in different decentralized frameworks.



Article 1168

Title@2025-07-04 (5): SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA

Title: SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA SC-LoRA: Ausbalancieren effizienter Feinsteuerung und Wissenserhaltung über Subraum-kontrainierte LoRA SC-LORA:通过分空间训练LORA平衡高效微调和知识保护 2505.23724v2

Authors (5): Minrui Luo, Fuhang Kuang, Yu Wang, Zirui Liu, Tianxing He

Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), are indispensable for efficiently customizing Large Language Models (LLMs). However, vanilla LoRA suffers from slow convergence speed and knowledge forgetting problems. Recent studies have used carefully designed LoRA initializations either to enhance fine-tuning efficiency or to preserve knowledge in the pre-trained LLM. However, none of these works addresses both goals at the same time. To this end, we introduce Subspace-Constrained LoRA (SC-LoRA), a novel LoRA initialization framework engineered to navigate the trade-off between efficient fine-tuning and knowledge preservation. We achieve this by constraining the output of trainable LoRA adapters in a low-rank subspace, where the context information of fine-tuning data is most preserved while the context information of preserved knowledge is least retained, in a balanced way. This constraint enables the trainable weights to primarily focus on the main features of fine-tuning data while avoiding damaging the preserved knowledge features. We provide a theoretical analysis of our method and conduct extensive experiments, including safety preservation and world knowledge preservation, on various downstream tasks. In our experiments, SC-LoRA succeeds in delivering superior fine-tuning performance while markedly diminishing knowledge forgetting, surpassing contemporary LoRA initialization methods.



Article 1169

Title@2025-07-04 (5): Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos

Title: Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos Memory Storyboard: Nutzung von Temporal Segmentation zum Streaming von selbstüberwachtem Lernen aus egozentrischen Videos 内存故事板:利用时间分层利用从 Egocent 视频中流出自我超强学习 2501.12254v2

Authors (2): Yanlai Yang, Mengye Ren

Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose “Memory Storyboard” that groups recent past frames into temporal segments for more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, and the storyboard temporal segments are then transferred to a long-term memory. Experiments on real-world egocentric video datasets including SAYCam and KrishnaCam show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.



Article 1170

Title@2025-07-04 (5): Order Acquisition Under Competitive Pressure: A Rapidly Adaptive Reinforcement Learning Approach for Ride-Hailing Subsidy Strategies

Title: Order Acquisition Under Competitive Pressure: A Rapidly Adaptive Reinforcement Learning Approach for Ride-Hailing Subsidy Strategies Auftragserwerb unter Wettbewerbsdruck: Ein schnell adaptiver Lernansatz zur Stärkung der Reit-Hailing-Subsidy-Strategien 在竞争压力下取得订单:快速适应性强化学习办法,用于骑乘-快乐补贴战略 2507.02244v2

Authors (6): Fangzhou Shi, Xiaopeng Ke, Xinye Xiong, Kexin Meng, Chang Men, Zhengdan Zhu

The proliferation of ride-hailing aggregator platforms presents significant growth opportunities for ride-service providers by increasing order volume and gross merchandise value (GMV). On most ride-hailing aggregator platforms, service providers that offer lower fares are ranked higher in listings and, consequently, are more likely to be selected by passengers. This competitive ranking mechanism creates a strong incentive for service providers to adopt coupon strategies that lower prices to secure a greater number of orders, as order volume directly influences their long-term viability and sustainability. Thus, designing an effective coupon strategy that can dynamically adapt to market fluctuations while optimizing order acquisition under budget constraints is a critical research challenge. However, existing studies in this area remain scarce. To bridge this gap, we propose FCA-RL, a novel reinforcement learning-based subsidy strategy framework designed to rapidly adapt to competitors’ pricing adjustments. Our approach integrates two key techniques: Fast Competition Adaptation (FCA), which enables swift responses to dynamic price changes, and Reinforced Lagrangian Adjustment (RLA), which ensures adherence to budget constraints while optimizing coupon decisions on the new price landscape. Furthermore, we introduce RideGym, the first dedicated simulation environment tailored for ride-hailing aggregators, facilitating comprehensive evaluation and benchmarking of different pricing strategies without compromising real-world operational efficiency. Experimental results demonstrate that our proposed method consistently outperforms baseline approaches across diverse market conditions, highlighting its effectiveness in subsidy optimization for ride-hailing service providers.



Article 1171

Title@2025-07-04 (5): ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization

Title: ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization ConceptMix++: Leveln des Spielfeldes in Text-zu-Bild-Benchmarking über iterative Prompt-Optimierung 概念Mix+++:通过迭代性即时优化,在文本到图像基准设定中将游戏场定位 2507.03275v1

Authors (5): Haosheng Gan, Berk Tinaz, Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi

Current text-to-image (T2I) benchmarks evaluate models on rigid prompts, potentially underestimating true generative capabilities due to prompt sensitivity and creating biases that favor certain models while disadvantaging others. We introduce ConceptMix++, a framework that disentangles prompt phrasing from visual generation capabilities by applying iterative prompt optimization. Building on ConceptMix, our approach incorporates a multimodal optimization pipeline that leverages vision-language model feedback to refine prompts systematically. Through extensive experiments across multiple diffusion models, we show that optimized prompts significantly improve compositional generation performance, revealing previously hidden model capabilities and enabling fairer comparisons across T2I models. Our analysis reveals that certain visual concepts – such as spatial relationships and shapes – benefit more from optimization than others, suggesting that existing benchmarks systematically underestimate model performance in these categories. Additionally, we find strong cross-model transferability of optimized prompts, indicating shared preferences for effective prompt phrasing across models. These findings demonstrate that rigid benchmarking approaches may significantly underrepresent true model capabilities, while our framework provides more accurate assessment and insights for future development.



Article 1172

Title@2025-07-04 (5): DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift

Title: DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift DNN-basierte Vorkodierung in RIS-gestützten mmWave MIMO-Systemen mit praktischer Phasenverschiebung 以DNN为基础的RIS-援助毫米 wave MIMO系统实际阶段调整前编码 2507.02824v2

Authors (6): Po-Heng Chou, Ching-Wen Chen, Wan-Jen Huang, Walid Saad, Yu Tsao, Ronald Y. Chang

In this paper, the precoding design is investigated for maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths. In particular, a reconfigurable intelligent surface (RIS) is employed to enhance MIMO transmissions, considering mmWave characteristics related to line-of-sight (LoS) and multipath effects. The traditional exhaustive search (ES) for optimal codewords in the continuous phase shift is computationally intensive and time-consuming. To reduce computational complexity, permuted discrete Fourier transform (DFT) vectors are used for codebook design, incorporating amplitude responses for practical or ideal RIS systems. However, even when a discrete phase shift is adopted, the ES still requires significant computation and is time-consuming. Instead, a deep neural network (DNN) is trained to facilitate faster codeword selection. Simulation results show that the DNN maintains sub-optimal spectral efficiency even as the distance between the end-user and the RIS varies in the testing phase. These results highlight the potential of DNNs in advancing RIS-aided systems.



Article 1173

Title@2025-07-04 (5): LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference

Title: LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference LILI-Clustering-Algorithmus: Limit Inferior Leaf Interval Integrated in Causal Forest for Causal Interference LILI 群集算法: 将受难者叶叶之间结合的限值纳入Causal Forest 2507.03271v1

Authors (3): Yiran Dong, Di Fan, Chuanhou Gao

Causal forest methods are powerful tools in causal inference. Similar to traditional random forest in machine learning, causal forest independently considers each causal tree. However, this independence consideration increases the likelihood that classification errors in one tree are repeated in others, potentially leading to significant bias in causal effect estimation. In this paper, we propose a novel approach that establishes connections between causal trees through the Limit Inferior Leaf Interval (LILI) clustering algorithm. LILIs are constructed based on the leaves of all causal trees, emphasizing the similarity of dataset confounders. When two instances with different treatments are grouped into the same leaf across a sufficient number of causal trees, they are treated as counterfactual outcomes of each other. Through this clustering mechanism, LILI clustering reduces bias present in traditional causal tree methods and enhances the prediction accuracy for the average treatment effect (ATE). By integrating LILIs into a causal forest, we develop an efficient causal inference method. Moreover, we explore several key properties of LILI by relating it to the concepts of limit inferior and limit superior in the set theory. Theoretical analysis rigorously proves the convergence of the estimated ATE using LILI clustering. Empirically, extensive comparative experiments demonstrate the superior performance of LILI clustering.



Article 1174

Title@2025-07-04 (5): On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization

Title: On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization On-Policy-Optimierung von ANFIS-Politiken unter Verwendung einer proximalen Politikoptimierung ANFIS政策采用最优政策优化的政策优化 2507.01039v2

Authors (3): Kaaustaaub Shankar, Wilhelm Louw, Kelly Cohen

We present a reinforcement learning method for training neuro-fuzzy controllers using Proximal Policy Optimization (PPO). Unlike prior approaches that used Deep Q-Networks (DQN) with Adaptive Neuro-Fuzzy Inference Systems (ANFIS), our PPO-based framework leverages a stable on-policy actor-critic setup. Evaluated on the CartPole-v1 environment across multiple seeds, PPO-trained fuzzy agents consistently achieved the maximum return of 500 with zero variance after 20000 updates, outperforming ANFIS-DQN baselines in both stability and convergence speed. This highlights PPO’s potential for training explainable neuro-fuzzy agents in reinforcement learning tasks.
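For readers unfamiliar with PPO, the clipped surrogate objective it maximizes can be sketched as below. This is the standard textbook form, not code from the paper; the function name and the default `clip_eps=0.2` are assumptions made for the example, and nothing here is specific to ANFIS policies.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    `ratios` are new-policy / old-policy action probabilities and
    `advantages` are the matching advantage estimates.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

The clipping removes the incentive to move the policy far from the one that collected the data, which is one common explanation for the stability of on-policy PPO updates.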



Article 1175

Title@2025-07-04 (5): Subject Invariant Contrastive Learning for Human Activity Recognition

Title: Subject Invariant Contrastive Learning for Human Activity Recognition Thema Invariant Kontrastives Lernen für die Anerkennung menschlicher Aktivität 为确认人类活动而进行各种差异学习的主题 2507.03250v1

Authors (4): Yavuz Yarici, Kiran Kokilepersaud, Mohit Prabhushankar, Ghassan AlRegib

The high cost of annotating data makes self-supervised approaches, such as contrastive learning methods, appealing for Human Activity Recognition (HAR). Effective contrastive learning relies on selecting informative positive and negative samples. However, HAR sensor signals are subject to significant domain shifts caused by subject variability. These domain shifts hinder model generalization to unseen subjects by embedding subject-specific variations rather than activity-specific features. As a result, human activity recognition models trained with contrastive learning often struggle to generalize to new subjects. We introduce Subject-Invariant Contrastive Learning (SICL), a simple yet effective loss function to improve generalization in human activity recognition. SICL re-weights negative pairs drawn from the same subject to suppress subject-specific cues and emphasize activity-specific information. We evaluate our loss function on three public benchmarks: UTD-MHAD, MMAct, and DARai. We show that SICL improves performance by up to 11% over traditional contrastive learning methods. Additionally, we demonstrate the adaptability of our loss function across various settings, including multiple self-supervised methods, multimodal scenarios, and supervised learning frameworks.
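One way to picture re-weighting same-subject negatives is an InfoNCE-style loss with a per-negative weight, sketched below for a single anchor. This is an illustrative guess at the general shape, not the SICL loss as published: the function name, the `same_subject_weight` parameter and its default value, and the temperature are all hypothetical.

```python
import math


def weighted_info_nce(pos_sim, neg_sims, neg_same_subject,
                      same_subject_weight=2.0, temperature=0.1):
    """Contrastive loss for one anchor with re-weighted same-subject negatives.

    `neg_same_subject[i]` is True when negative i comes from the anchor's subject.
    """
    pos = math.exp(pos_sim / temperature)
    neg = 0.0
    for sim, same in zip(neg_sims, neg_same_subject):
        # Assumed scheme: scale same-subject negatives by a constant factor.
        weight = same_subject_weight if same else 1.0
        neg += weight * math.exp(sim / temperature)
    return -math.log(pos / (pos + neg))
```

Setting `same_subject_weight=1.0` recovers the unweighted loss; in this sketch a value above 1 makes same-subject negatives contribute more to the denominator.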



Article 1176

Title@2025-07-04 (5): EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora

Title: EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora EraRAG: Effiziente und inkrementelle retrieval Augmented Generation für wachsende Corpora EraRAG: 增长企业的高效和递增回收增量增殖型 2506.20963v2

Authors (9): Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang, Xiaofang Zhou

Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates. Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures, enabling efficient and localized insertions of new data without disrupting the existing topology. The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency. Experiments on large-scale benchmarks demonstrate that EraRAG achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy. This work offers a practical path forward for RAG systems that must operate over continually growing corpora, bridging the gap between retrieval efficiency and adaptability. Our code and data are available at https://github.com/EverM0re/EraRAG-Official.
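Hyperplane-based LSH, the partitioning primitive named above, can be sketched in a few lines. This is a generic illustration of the technique, not EraRAG's code: the function names, the seed, and the flat bucket dictionary are assumptions, and the hierarchical graph construction is omitted.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random Gaussian hyperplane normals; fixed once, then reused."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def insert(buckets, doc_id, vector, hyperplanes):
    """Add a new document by touching only the bucket its signature selects."""
    buckets.setdefault(lsh_signature(vector, hyperplanes), []).append(doc_id)
```

Because the hyperplanes stay fixed, inserting a new embedding only changes one bucket, which is the property that makes localized updates possible without rehashing the existing corpus.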



Article 1177

Title@2025-07-04 (5): Evaluating the Impact of Multiple DER Aggregators on Wholesale Energy Markets: A Hybrid Mean Field Approach

Title: Evaluating the Impact of Multiple DER Aggregators on Wholesale Energy Markets: A Hybrid Mean Field Approach Bewertung der Auswirkungen mehrerer DER Aggregatoren auf die Energiemärkte des Großhandels: Ein hybrider mittlerer Feldansatz 评价多个DER聚合企业对批发能源市场的影响:混合的实地办法 2409.00107v2

Authors (2): Jun He, Andrew L. Liu

The integration of distributed energy resources (DERs) into wholesale energy markets can greatly enhance grid flexibility, improve market efficiency, and contribute to a more sustainable energy future. As DERs – such as solar PV panels and energy storage – proliferate, effective mechanisms are needed to ensure that small prosumers can participate meaningfully in these markets. We study a wholesale market model featuring multiple DER aggregators, each controlling a portfolio of DER resources and bidding into the market on behalf of the DER asset owners. The key to our approach lies in recognizing the repeated nature of market interactions and the ability of participants to learn and adapt over time. Specifically, aggregators repeatedly interact with each other and with other suppliers in the wholesale market, collectively shaping wholesale electricity prices (also known as locational marginal prices (LMPs)). We model this multi-agent interaction using a mean-field game (MFG), which uses market information – reflecting the average behavior of market participants – to enable each aggregator to predict long-term LMP trends and make informed decisions. Because each aggregator controls the DERs within its portfolio under certain contract structures, we employ a mean-field control (MFC) approach (as opposed to an MFG) to learn an optimal policy that maximizes the total rewards of the DERs under its management. We also propose a reinforcement learning (RL)-based method to help each agent learn optimal strategies within the MFG framework, enhancing their ability to adapt to market conditions and uncertainties. Numerical simulations show that LMPs quickly reach a steady state in the hybrid mean-field approach. Furthermore, our results demonstrate that the combination of energy storage and mean-field learning significantly reduces price volatility compared to scenarios without storage.



Article 1178

Title@2025-07-04 (5): Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks

Title: Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks Fault Sneaking Attack: Ein Stealthy-Framework für fehlführende tiefe neurale Netzwerke 失事偷偷攻击:错误领导深神经网络的隐形框架 1905.12032v2

Authors (6): Pu Zhao, Siyue Wang, Cheng Gongye, Yanzhi Wang, Yunsi Fei, Xue Lin

Despite the great achievements of deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns of DNNs in many application domains requiring high reliability. We propose the fault sneaking attack on DNNs, where the adversary aims to misclassify certain input images into any target labels by modifying the DNN parameters. We apply ADMM (alternating direction method of multipliers) for solving the optimization problem of the fault sneaking attack with two constraints: 1) the classification of the other images should be unchanged and 2) the parameter modifications should be minimized. Specifically, the first constraint requires us not only to inject designated faults (misclassifications), but also to hide the faults for stealthy or sneaking considerations by maintaining model accuracy. The second constraint requires us to minimize the parameter modifications (using L0 norm to measure the number of modifications and L2 norm to measure the magnitude of modifications). Comprehensive experimental evaluation demonstrates that the proposed framework can inject multiple sneaking faults without losing the overall test accuracy performance.
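The two measures of parameter modification mentioned above (L0 for how many parameters change, L2 for how much they change) can be computed as in the short sketch below. The function name and the flat-list parameter representation are assumptions for illustration; the ADMM optimization itself is not shown.

```python
import math


def modification_norms(original, modified, tol=0.0):
    """Return (L0, L2) norms of the parameter change `modified - original`."""
    deltas = [m - o for o, m in zip(original, modified)]
    # L0: number of parameters whose change exceeds the tolerance.
    l0 = sum(1 for d in deltas if abs(d) > tol)
    # L2: Euclidean magnitude of the overall change.
    l2 = math.sqrt(sum(d * d for d in deltas))
    return l0, l2
```

For example, changing a single weight from 2.0 to 2.5 in a three-parameter vector gives an L0 norm of 1 and an L2 norm of 0.5.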



Article 1179

Title@2025-07-04 (5): Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models

Title: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models Neural Diskrete Token Repräsentanz Lernen für extreme Token Reduction in Video Große Sprachmodelle 在视频大语言模式中进行关于极端减少视频大语言模式的神经分辨代言人学习 2503.16980v4

Authors (2): Haichao Zhang, Yun Fu

Token-based video representation has emerged as a promising approach for enabling large language models (LLMs) to interpret video content. However, existing token reduction techniques, such as pruning and merging, often disrupt essential positional embeddings and rely on continuous visual tokens sampled from nearby pixels with similar spatial-temporal locations. By removing only a small fraction of tokens, these methods still produce relatively lengthy continuous sequences, which falls short of the extreme compression required to balance computational efficiency and token count in video LLMs. In this paper, we introduce the novel task of Extreme Short Token Reduction, which aims to represent entire videos using a minimal set of discrete tokens. We propose VQToken, a neural discrete token representation framework that (i) applies adaptive vector quantization to continuous ViT embeddings to learn a compact codebook and (ii) preserves spatial-temporal positions via a token hash function by assigning each grid-level token to its nearest codebook entry. On the Extreme Short Token Reduction task, our VQToken compresses sequences to just 0.07 percent of their original length while incurring only a 0.66 percent drop in accuracy on the NextQA-MC benchmark. It also achieves comparable performance on ActNet-QA, Long Video Bench, and VideoMME. We further introduce the Token Information Density (TokDense) metric and formalize fixed-length and adaptive-length subtasks, achieving state-of-the-art results in both settings. Our approach dramatically lowers theoretical complexity, increases information density, drastically reduces token counts, and enables efficient video LLMs in resource-constrained environments.
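The step of assigning each grid-level token to its nearest codebook entry can be sketched as plain vector quantization. This is a simplified illustration, not the VQToken implementation: the codebook is taken as given rather than learned, and the names `nearest_code` and `build_token_map`, as well as the `(frame, row, col)` position keys, are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` in squared L2 distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def build_token_map(grid_tokens, codebook):
    """Map each (frame, row, col) position to the index of its nearest code.

    The map keeps positional information even though only the small set of
    discrete codes would be passed on to the language model.
    """
    return {pos: nearest_code(vec, codebook) for pos, vec in grid_tokens.items()}
```

With a two-entry codebook `[[0.0, 0.0], [1.0, 1.0]]`, a token near the origin maps to code 0 and a token near `(1, 1)` maps to code 1, so many grid positions collapse onto a handful of discrete tokens.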



Article 1180

Title@2025-07-04 (5): Federated Continual Learning: Concepts, Challenges, and Solutions

Title: Federated Continual Learning: Concepts, Challenges, and Solutions Federated Continual Learning: Konzepte, Herausforderungen und Lösungen 联邦持续性学习:概念、挑战和解决方案 2502.07059v2

Authors (3): Parisa Hamedi, Roozbeh Razavi-Far, Ehsan Hallaji

Federated Continual Learning (FCL) has emerged as a robust solution for collaborative model training in dynamic environments, where data samples are continuously generated and distributed across multiple devices. This survey provides a comprehensive review of FCL, focusing on key challenges such as heterogeneity, model stability, communication overhead, and privacy preservation. We explore various forms of heterogeneity and their impact on model performance. Solutions to non-IID data, resource-constrained platforms, and personalized learning are reviewed in an effort to show the complexities of handling heterogeneous data distributions. Next, we review techniques for ensuring model stability and avoiding catastrophic forgetting, which are critical in non-stationary environments. Privacy-preserving techniques, another aspect of FCL, are also reviewed in this work. This survey integrates insights from federated learning and continual learning to present strategies for improving the efficacy and scalability of FCL systems, making them applicable to a wide range of real-world scenarios.



Article 1181

Title@2025-07-04 (5): 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement

Title: 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement 7B Vollständig offene Quelle Moxin-LLM/VLM – Vom Vortraining zur GRPO-basierten Verstärkungs-Lernverbesserung 7B 完全开放源代码 Moxin-LLM/VLM – – 从预培训到基于GRPO的强化学习加强 2412.06845v6

Authors (19): Pu Zhao, Xuan Shen, Zhenglun Kong, Yixin Shen, Sung-En Chang, Arash Akbari, Timothy Rupprecht, Lei Lu, Enfu Nan, Changdi Yang, Yumei He, Weiyan Shi, Xingchen Xu, Yu Huang, Wei Jiang, Wei Wang, Yue Chen, Yong He, Yanzhi Wang

Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have contributed greatly to the ever-increasing popularity of LLMs because they are easy to customize and deploy across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in adherence to the principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make a continuous commitment to fully open-source LLMs. After pre-training the base model, we finetune the Moxin Base model with a SOTA post-training framework and instruction data to obtain the Moxin Instruct model. To improve the reasoning capability, we further finetune our Instruct model with chain-of-thought data distilled from DeepSeek R1, and then use Group Relative Policy Optimization (GRPO) following DeepSeek R1 to finetune our model, leading to the Moxin Reasoning model. Moreover, we develop our vision language model based on our Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.



Article 1182

Title@2025-07-03 (4): SI-Agent: An Agentic Framework for Feedback-Driven Generation and Tuning of Human-Readable System Instructions for Large Language Models

Title: SI-Agent: An Agentic Framework for Feedback-Driven Generation and Tuning of Human-Readable System Instructions for Large Language Models SI-Agent: Agentischer Rahmen für feedbackgetriebene Erzeugung und Tuning von menschenlesbaren Systemanweisungen für große Sprachmodelle SI-Agent:用于大语言模型的可读人类系统指令反馈驱动生成和调试的代理框架 2507.03223v1

Authors (1): Jeshwanth Challagundla

System Instructions (SIs), or system prompts, are pivotal for guiding Large Language Models (LLMs) but manual crafting is resource-intensive and often suboptimal. Existing automated methods frequently generate non-human-readable “soft prompts,” sacrificing interpretability. This paper introduces SI-Agent, a novel agentic framework designed to automatically generate and iteratively refine human-readable SIs through a feedback-driven loop. SI-Agent employs three collaborating agents: an Instructor Agent, an Instruction Follower Agent (target LLM), and a Feedback/Reward Agent evaluating task performance and optionally SI readability. The framework utilizes iterative cycles where feedback guides the Instructor’s refinement strategy (e.g., LLM-based editing, evolutionary algorithms). We detail the framework’s architecture, agent roles, the iterative refinement process, and contrast it with existing methods. We present experimental results validating SI-Agent’s effectiveness, focusing on metrics for task performance, SI readability, and efficiency. Our findings indicate that SI-Agent generates effective, readable SIs, offering a favorable trade-off between performance and interpretability compared to baselines. Potential implications include democratizing LLM customization and enhancing model transparency. Challenges related to computational cost and feedback reliability are acknowledged.
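
The feedback-driven loop described above can be sketched in a few lines. This is only an illustrative skeleton under stated assumptions: the function `refine_system_instruction`, the three toy agents, and the scoring rule (task accuracy minus a small word-count penalty standing in for readability) are hypothetical and are not taken from the paper. In practice each agent would wrap an LLM call.

```python
def refine_system_instruction(instructor, follower, scorer, tasks, initial_si, rounds=4):
    """Iteratively refine a human-readable system instruction (SI).

    instructor(si, score, step) -> new candidate SI   (hypothetical interface)
    follower(si, task)          -> model output        (hypothetical interface)
    scorer(si, tasks, outputs)  -> scalar reward       (hypothetical interface)
    """
    best_si, best_score = None, float("-inf")
    candidate = initial_si
    for step in range(rounds):
        # The follower (target model) attempts every task under the candidate SI.
        outputs = [follower(candidate, task) for task in tasks]
        # The feedback agent turns the outputs into a single score.
        score = scorer(candidate, tasks, outputs)
        if score > best_score:
            best_si, best_score = candidate, score
        # The instructor proposes the next candidate from the best SI so far.
        candidate = instructor(best_si, best_score, step)
    return best_si, best_score


# Toy stand-ins for the three agents (assumptions for illustration only).
HINTS = ["Be concise.", "Answer in uppercase.", "Cite sources."]

def toy_instructor(si, score, step):
    return f"{si} {HINTS[step % len(HINTS)]}"

def toy_follower(si, task):
    return task.upper() if "uppercase" in si.lower() else task

def toy_scorer(si, tasks, outputs):
    accuracy = sum(out == task.upper() for task, out in zip(tasks, outputs)) / len(tasks)
    return accuracy - 0.001 * len(si.split())  # small penalty for longer SIs


best_si, best_score = refine_system_instruction(
    toy_instructor, toy_follower, toy_scorer,
    tasks=["hello", "world"], initial_si="You are a helpful assistant.",
)
```

Keeping the best-scoring SI and editing from it is one simple refinement strategy; the abstract also mentions evolutionary algorithms, which would maintain a population of candidates instead.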



Article 1183

Title@2025-07-03 (4): Neural Inhibition Improves Dynamic Routing and Mixture of Experts

Title: Neural Inhibition Improves Dynamic Routing and Mixture of Experts Neurale Hemmung verbessert dynamische Routing und Mischung von Experten 改善专家动态路由和混合 2507.03221v1

Authors (2): Will Y. Zou, Jennifer Y. Zhang

To be effective, efficient, and diverse, deep learning models need to dynamically choose their architecture based on signals from a population of neurons. We hypothesize that dynamic routing models can be improved with neural inhibition in those neural populations. This means that signals commonly shared among the various modes of data statistics can be inhibited so that the routing model can choose a specialized expert path for each data sample. Only through inhibition is the routing mechanism able to effectively select neural pathways. We believe this is an under-studied and under-verified implementation methodology for Mixture-of-Experts, dynamic routing, and transformer language models. We provide experimental evidence that the neural inhibition algorithm significantly boosts performance on general tasks, which motivates further investment in this research direction.



Article 1184

Title@2025-07-03 (4): Symbiosis: Multi-Adapter Inference and Fine-Tuning

Title: Symbiosis: Multi-Adapter Inference and Fine-Tuning Symbiose: Multi-Adapter-Schlussfolgerung und Feinabstimmung 共生关系:多位开发商的推断和精准调整 2507.03220v1

Authors (4): Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman

Parameter-efficient fine-tuning (PEFT) allows model builders to capture task-specific parameters in adapters, which are a fraction of the size of the original base model. The popularity of PEFT techniques for fine-tuning has led to the creation of a large number of adapters for popular Large Language Models (LLMs). However, existing frameworks fall short in supporting inference or fine-tuning with multiple adapters in the following ways. 1) For fine-tuning, each job needs to deploy its own dedicated base model instance, which results in excessive GPU memory consumption and poor GPU utilization. 2) While popular inference platforms can serve multiple PEFT adapters, they do not allow independent resource management or mixing of different PEFT methods. 3) They cannot share resources (such as a base model instance) between inference and fine-tuning jobs. 4) They do not provide privacy to users who may not wish to expose their fine-tuned parameters to service providers. In Symbiosis, we address the above problems by enabling as-a-service deployment of the base model. The base model layers can be shared across multiple inference or fine-tuning processes. Our split-execution technique decouples the execution of client-specific adapters and layers from the frozen base model layers, giving clients the flexibility to manage their resources, select their fine-tuning method, and achieve their performance goals. Our approach is transparent to models and works out of the box for most models in the transformers library. Our evaluation on Llama2-13B shows that, compared to the baseline, Symbiosis can fine-tune 4X more adapters on the same set of GPUs in the same amount of time.



Article 1185

Title@2025-07-03 (4): Advancing Precision Oncology Through Modeling of Longitudinal and Multimodal Data

Title: Advancing Precision Oncology Through Modeling of Longitudinal and Multimodal Data Weiterentwicklung der Präzisions-Onkologie durch Modellierung von Längs- und Multimodaldaten 通过建立纵向和多模式数据模型,推进精密肿瘤学 2502.07836v3

Authors (6): Luoting Zhuang, Stephen H. Park, Steven J. Skates, Ashley E. Prosper, Denise R. Aberle, William Hsu

Cancer evolves continuously over time through a complex interplay of genetic, epigenetic, microenvironmental, and phenotypic changes. This dynamic behavior drives uncontrolled cell growth, metastasis, immune evasion, and therapy resistance, posing challenges for effective monitoring and treatment. However, today’s data-driven research in oncology has primarily focused on cross-sectional analysis using data from a single modality, limiting the ability to fully characterize and interpret the disease’s dynamic heterogeneity. Advances in multiscale data collection and computational methods now enable the discovery of longitudinal multimodal biomarkers for precision oncology. Longitudinal data reveal patterns of disease progression and treatment response that are not evident from single-timepoint data, enabling timely abnormality detection and dynamic treatment adaptation. Multimodal data integration offers complementary information from diverse sources for more precise risk assessment and targeting of cancer therapy. In this review, we survey methods of longitudinal and multimodal modeling, highlighting their synergy in providing multifaceted insights for personalized care tailored to the unique characteristics of a patient’s cancer. We summarize the current challenges and future directions of longitudinal multimodal analysis in advancing precision oncology.



Article 1186

Title@2025-07-03 (4): Fast Algorithm for Constrained Linear Inverse Problems

Title: Fast Algorithm for Constrained Linear Inverse Problems Schneller Algorithmus für eingeschränkte lineare Inverse Probleme 限制线性反向问题快速算法 2212.01068v7

Authors (3): Mohammed Rayyan Sheriff, Floor Fenne Redel, Peyman Mohajerin Esfahani

We consider the constrained Linear Inverse Problem (LIP), where a certain atomic norm (like the $\ell_1$ norm) is minimized subject to a quadratic constraint. Typically, such cost functions are non-differentiable, which makes them not amenable to the fast optimization methods used in practice. We propose two equivalent reformulations of the constrained LIP with improved convex regularity: (i) a smooth convex minimization problem, and (ii) a strongly convex min-max problem. These problems can be solved by applying existing acceleration-based convex optimization methods, which provide a better $O\left(\frac{1}{k^2}\right)$ theoretical convergence guarantee, improving upon the current best rate of $O\left(\frac{1}{k}\right)$. We also provide a novel algorithm named the Fast Linear Inverse Problem Solver (FLIPS), which is tailored to maximally exploit the structure of the reformulations. We demonstrate the performance of FLIPS on the classical problems of Binary Selection, Compressed Sensing, and Image Denoising. We also provide an open-source package for these three examples, which can be easily adapted to other LIPs.
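
As a reading aid, the constrained problem described in this abstract can be written out as follows. The symbols are notation assumed here for illustration ($f$ for the unknown, $x$ for the measurements, $\phi$ for the linear map, $\epsilon$ for the tolerance, $\mathcal{A}$ for the atom set) and may differ from the paper's. The atomic norm is given by its standard gauge definition.

```latex
\min_{f} \; \|f\|_{\mathcal{A}}
\quad \text{subject to} \quad
\|x - \phi(f)\|_2^2 \le \epsilon ,
\qquad
\|f\|_{\mathcal{A}} = \inf\{\, t > 0 : f \in t \cdot \operatorname{conv}(\mathcal{A}) \,\}.
```

With $\mathcal{A} = \{\pm e_i\}$ (signed standard basis vectors), the atomic norm reduces to the $\ell_1$ norm, which is non-differentiable wherever a coordinate of $f$ is zero.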



Article 1187

Title@2025-07-03 (4): Structured Difference-of-Q via Orthogonal Learning

Title: Structured Difference-of-Q via Orthogonal Learning Strukturierter Differenz-of-Q über Orthogonales Lernen 通过矫形学习实现结构差异 2406.08697v3

Authors (2): Defu Cao, Angela Zhou

Offline reinforcement learning is important in many settings where observational data are available but new policies cannot be deployed online due to safety, cost, and other concerns. Many recent advances in causal inference and machine learning target estimation of causal contrast functions such as the CATE, which are sufficient for optimizing decisions and can adapt to potentially smoother structure. We develop a dynamic generalization of the R-learner (Nie and Wager 2021, Lewis and Syrgkanis 2021) for estimating and optimizing the difference of $Q^\pi$-functions, $Q^\pi(s,1)-Q^\pi(s,0)$ (which can be used to optimize multiple-valued actions). We leverage orthogonal estimation to improve convergence rates in the presence of slower nuisance estimation rates and prove consistency of policy optimization under a margin condition. The method can leverage black-box nuisance estimators of the $Q$-function and behavior policy to target estimation of a more structured $Q$-function contrast.
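
For context, the static (single-decision) R-learner that this work generalizes fits the contrast $\tau$ by residual-on-residual regression. The form below is the standard one without its regularization term. The notation is assumed here for illustration: $A_i \in \{0,1\}$ is the action, $\hat{m}(x)$ estimates $E[Y \mid X = x]$, and $\hat{e}(x)$ estimates $P(A = 1 \mid X = x)$. The paper's dynamic version is not reproduced here.

```latex
\hat{\tau} = \arg\min_{\tau} \; \frac{1}{n} \sum_{i=1}^{n}
\Big[ \big(Y_i - \hat{m}(X_i)\big) - \big(A_i - \hat{e}(X_i)\big)\, \tau(X_i) \Big]^2 .
```

The nuisance functions $\hat{m}$ and $\hat{e}$ enter only through residuals, which is what makes the loss insensitive to first-order errors in their estimation.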



Article 1188

Title@2025-07-03 (4): DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing

Title: DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing DistZO2: High-Throughput und Memory-Efficient Zeroth-Order Fine-Tuning LLMs mit verteiltem Parallel Computing DistrZO 2 : 具有分布式平行计算法的高吞压和内存-内存-有效 Zero- Ord 微调微调LMLM 2507.03211v1

Authors (3): Liangyu Wang, Huanyi Xie, Di Wang

Fine-tuning large language models (LLMs) remains resource-intensive due to their sheer scale. While zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating backward passes, its application to multi-hundred-billion-parameter models is constrained by GPU memory and compute throughput. The ZO2 framework addresses the memory bottleneck by offloading model parameters to CPU memory and overlapping transformer block transfer with dual forward computation on a single GPU. However, ZO2 remains limited by its single-device execution and achieves modest throughput. In this work, we present DistZO2, a high-throughput, memory-efficient framework for distributed zeroth-order fine-tuning of LLMs. DistZO2 introduces three parallel strategies: (1) Perturbation Parallelism (PertP), which parallelizes the two perturbed forward passes across devices; (2) Distributed Data Parallelism (DDP), adapted to the scalar-gradient nature of ZO training; and (3) a unified 2D Parallelism design that combines PertP and DDP. To further mitigate communication bottlenecks introduced by parameter offloading, we propose a hardware-aware communication strategy that slices parameter blocks and redistributes them across GPUs via high-speed interconnects such as NVLink. DistZO2 scales zeroth-order fine-tuning to modern multi-GPU systems, preserving ZO2’s memory efficiency while substantially improving training throughput. In our experiments on OPT-175B, DistZO2 achieves a 3x speedup over ZO2 with distributed computing. DistZO2’s code has been open-sourced in https://github.com/liangyuwang/zo2.
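
To make the "two perturbed forward passes" and the "scalar-gradient nature" of ZO training concrete, here is a minimal single-process sketch of a zeroth-order SGD step on a toy quadratic. It is not the DistZO2 or ZO2 implementation. The function names, the toy loss, and the hyperparameters are assumptions chosen for illustration.

```python
import random

def zo_sgd_step(params, loss_fn, lr=0.05, eps=1e-3, seed=0):
    """One zeroth-order SGD step: no backward pass, only two forward passes."""
    rng = random.Random(seed)
    # Random direction z; it can be regenerated from the seed instead of stored.
    z = [rng.gauss(0.0, 1.0) for _ in params]
    # The two perturbed forward passes are independent of each other,
    # so in principle they could run on different devices.
    loss_plus = loss_fn([p + eps * zi for p, zi in zip(params, z)])
    loss_minus = loss_fn([p - eps * zi for p, zi in zip(params, z)])
    # The gradient estimate along z is a single scalar.
    g = (loss_plus - loss_minus) / (2.0 * eps)
    new_params = [p - lr * g * zi for p, zi in zip(params, z)]
    return new_params, g

def quadratic_loss(w):
    # Toy objective with its minimum at w_i = 3 for every coordinate.
    return sum((wi - 3.0) ** 2 for wi in w)

def train(steps=500):
    w = [0.0, 0.0, 0.0, 0.0]
    for t in range(steps):
        w, _ = zo_sgd_step(w, quadratic_loss, seed=t)
    return w
```

Calling `train()` drives every coordinate close to 3. Because each step is summarized by one scalar `g` plus a seed, workers that share the seed would only need to exchange that scalar. This matches the property the abstract alludes to for data parallelism.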



Article 1189

Title@2025-07-03 (4): LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction

Title: LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction LANTERN: Ein maschinelles Lern-Framework für Lipid-Nanopartikel-Transfektions-Effizienz-Vorhersage LANTERN:利皮纳米粒子转化效率预测的机械学习框架 2507.03209v1

Authors (7): Asal Mehradfar, Mohammad Shahab Sepehri, Jose Miguel Hernandez-Lobato, Glen S. Kwon, Mahdi Soltanolkotabi, Salman Avestimehr, Morteza Rasoulianboroujeni

The discovery of new ionizable lipids for efficient lipid nanoparticle (LNP)-mediated RNA delivery remains a critical bottleneck for RNA-based therapeutics development. Recent advances have highlighted the potential of machine learning (ML) to predict transfection efficiency from molecular structure, enabling high-throughput virtual screening and accelerating lead identification. However, existing approaches are hindered by inadequate data quality, ineffective feature representations, low predictive accuracy, and poor generalizability. Here, we present LANTERN (Lipid nANoparticle Transfection Efficiency pRedictioN), a robust ML framework for predicting transfection efficiency based on ionizable lipid representation. We benchmarked a diverse set of ML models against AGILE, a previously published model developed for transfection prediction. Our results show that combining simpler models with chemically informative features, particularly count-based Morgan fingerprints, outperforms more complex models that rely on internally learned embeddings, such as AGILE. We also show that a multi-layer perceptron trained on a combination of Morgan fingerprints and Expert descriptors achieved the highest performance ($\text{R}^2$ = 0.8161, r = 0.9053), significantly exceeding AGILE ($\text{R}^2$ = 0.2655, r = 0.5488). We show that the models in LANTERN consistently have strong performance across multiple evaluation metrics. Thus, LANTERN offers a robust benchmarking framework for LNP transfection prediction and serves as a valuable tool for accelerating lipid-based RNA delivery systems design.
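
The two reported metrics measure different things, and a small example makes the distinction visible. The sketch below uses the standard definitions of the coefficient of determination and the Pearson correlation. That the paper computes them exactly this way is an assumption, and the numbers are made up for illustration.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative data: predictions are perfectly correlated but badly scaled.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

Here `r` is 1 while `r2` is negative, because Pearson r ignores scale and offset errors whereas $\text{R}^2$ penalizes them. Reporting both, as the abstract does, guards against a model that ranks candidates well but predicts poorly calibrated values.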



Article 1190

Title@2025-07-03 (4): Statistical Significance of Feature Importance Rankings

Title: Statistical Significance of Feature Importance Rankings Statistische Signifikanz der Merkmals-Wichtigitäts-Rankings 地物重要性排名的统计意义 2401.15800v5

Authors (2): Jeremy Goldwasser, Giles Hooker

Feature importance scores are ubiquitous tools for understanding the predictions of machine learning models. However, many popular attribution methods suffer from high instability due to random sampling. Leveraging novel ideas from hypothesis testing, we devise techniques that ensure the most important features are correct with high-probability guarantees. These assess the set of $K$ top-ranked features, as well as the order of its elements. Given a set of local or global importance scores, we demonstrate how to retrospectively verify the stability of the highest ranks. We then introduce two efficient sampling algorithms that identify the $K$ most important features, perhaps in order, with probability exceeding $1-\alpha$. The theoretical justification for these procedures is validated empirically on SHAP and LIME.



Article 1191

Title@2025-07-03 (4): Weak Form Scientific Machine Learning: Test Function Construction for System Identification

Title: Weak Form Scientific Machine Learning: Test Function Construction for System Identification Schwache Form des wissenschaftlichen maschinellen Lernens: Testfunktionsaufbau für die Systemidentifizierung 微弱形式科学机器学习:系统识别的测试功能建设 2507.03206v1

Authors (2): April Tran, David Bortz

Weak form Scientific Machine Learning (WSciML) is a recently developed framework for data-driven modeling and scientific discovery. It leverages the weak form of equation error residuals to provide enhanced noise robustness in system identification via convolving model equations with test functions, reformulating the problem to avoid direct differentiation of data. The performance, however, relies on wisely choosing a set of compactly supported test functions. In this work, we mathematically motivate a novel data-driven method for constructing Single-scale-Local reference functions for creating the set of test functions. Our approach numerically approximates the integration error introduced by the quadrature and identifies the support size for which the error is minimal, without requiring access to the model parameter values. Through numerical experiments across various models, noise levels, and temporal resolutions, we demonstrate that the selected supports consistently align with regions of minimal parameter estimation error. We also compare the proposed method against the strategy for constructing Multi-scale-Global (and orthogonal) test functions introduced in our prior work, demonstrating the improved computational efficiency.
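
The step that lets the weak form avoid differentiating noisy data is integration by parts against a compactly supported test function. A minimal worked version for a first-order ODE is given below. The notation is assumed here for illustration: state $u$, right-hand side $f$ with parameters $\theta$, and test function $\varphi$ vanishing at the endpoints $a$ and $b$.

```latex
\dot{u}(t) = f\big(u(t); \theta\big), \qquad \varphi(a) = \varphi(b) = 0
\;\;\Longrightarrow\;\;
-\int_a^b \dot{\varphi}(t)\, u(t)\, dt \;=\; \int_a^b \varphi(t)\, f\big(u(t); \theta\big)\, dt .
```

The derivative now falls on the known function $\varphi$ rather than on the data $u$. Both integrals are evaluated by quadrature in practice, which is why the support size of $\varphi$ affects the integration error discussed in the abstract.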



Article 1192

Title@2025-07-03 (4): On the Adversarial Robustness of Graph Neural Networks with Graph Reduction

Title: On the Adversarial Robustness of Graph Neural Networks with Graph Reduction Auf der adversarialen Robustheit von Graphen-Neuralen Netzwerken mit Graphenreduktion 关于具有缩小图量的图形神经网络的逆向强力 2412.05883v2

Authors (4): Kerui Wu, Ka-Ho Chow, Wenqi Wei, Lei Yu

As Graph Neural Networks (GNNs) become increasingly popular for learning from large-scale graph data across various domains, their susceptibility to adversarial attacks when using graph reduction techniques for scalability remains underexplored. In this paper, we present an extensive empirical study to investigate the impact of graph reduction techniques, specifically graph coarsening and sparsification, on the robustness of GNNs against adversarial attacks. Through extensive experiments involving multiple datasets and GNN architectures, we examine the effects of four sparsification and six coarsening methods on the poisoning attacks. Our results indicate that, while graph sparsification can mitigate the effectiveness of certain poisoning attacks, such as Mettack, it has limited impact on others, like PGD. Conversely, graph coarsening tends to amplify the adversarial impact, significantly reducing classification accuracy as the reduction ratio decreases. Additionally, we provide a novel analysis of the causes driving these effects and examine how defensive GNN models perform under graph reduction, offering practical insights for designing robust GNNs within graph acceleration systems.



Article 1193

Title@2025-07-03 (4): Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding

Title: Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding Quantifizierung der Cross-Attention-Interaktion in Transformern zur Interpretation von TCR-pMHC-Bindung 为解释TRCR-pMHC而对变压器中交叉注意互动进行量化 2507.03197v1

Authors (6): Jiarui Li, Zixiang Yin, Haley Smith, Zhengming Ding, Samuel J. Landry, Ramgopal R. Mettu

CD8+ “killer” T cells and CD4+ “helper” T cells play a central role in the adaptive immune system by recognizing antigens presented by Major Histocompatibility Complex (pMHC) molecules via T Cell Receptors (TCRs). Modeling binding between T cells and the pMHC complex is fundamental to understanding basic mechanisms of human immune response as well as in developing therapies. While transformer-based models such as TULIP have achieved impressive performance in this domain, their black-box nature precludes interpretability and thus limits a deeper mechanistic understanding of T cell response. Most existing post-hoc explainable AI (XAI) methods are confined to encoder-only, co-attention, or model-specific architectures and cannot handle encoder-decoder transformers used in TCR-pMHC modeling. To address this gap, we propose Quantifying Cross-Attention Interaction (QCAI), a new post-hoc method designed to interpret the cross-attention mechanisms in transformer decoders. Quantitative evaluation is a challenge for XAI methods; we have compiled TCR-XAI, a benchmark consisting of 274 experimentally determined TCR-pMHC structures to serve as ground truth for binding. Using these structures we compute physical distances between relevant amino acid residues in the TCR-pMHC interaction region and evaluate how well our method and others estimate the importance of residues in this region across the dataset. We show that QCAI achieves state-of-the-art performance on both interpretability and prediction accuracy under the TCR-XAI benchmark.



Article 1194

Title@2025-07-03 (4): CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning

Title: CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning CODE-CL: Konzeptorbasierte Gradientenprojektion für tiefes kontinuierliches Lernen CODE-CL:深海持续学习基于概念的逐步预测 2411.15235v3

Authors (3): Marco Paul E. Apolinario, Sakshi Choudhary, Kaushik Roy

Continual learning (CL), the ability to progressively acquire and integrate new concepts, is essential for intelligent systems to adapt to dynamic environments. However, deep neural networks struggle with catastrophic forgetting (CF) when learning tasks sequentially, as training for new tasks often overwrites previously learned knowledge. To address this, recent approaches constrain updates to orthogonal subspaces using gradient projection, effectively preserving important gradient directions for previous tasks. While effective in reducing forgetting, these approaches inadvertently hinder forward knowledge transfer (FWT), particularly when tasks are highly correlated. In this work, we propose Conceptor-based gradient projection for Deep Continual Learning (CODE-CL), a novel method that leverages conceptor matrix representations, a form of regularized reconstruction, to adaptively handle highly correlated tasks. CODE-CL mitigates CF by projecting gradients onto pseudo-orthogonal subspaces of previous task feature spaces while simultaneously promoting FWT. It achieves this by learning a linear combination of shared basis directions, allowing an efficient balance between stability and plasticity and the transfer of knowledge between overlapping input feature representations. Extensive experiments on continual learning benchmarks validate CODE-CL's efficacy, demonstrating superior performance, reduced forgetting, and improved FWT compared to state-of-the-art methods.
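
For readers unfamiliar with conceptors, the standard definition is shown below for a feature matrix $X \in \mathbb{R}^{d \times n}$ with aperture $\alpha > 0$. The last expression, a soft projection of a weight gradient $G$ away from previously used feature directions, is an illustrative assumption about how such a matrix can be used. It is not a statement of the exact CODE-CL update.

```latex
R = \tfrac{1}{n} X X^{\top}, \qquad
C = R \,\big(R + \alpha^{-2} I\big)^{-1}, \qquad
\lambda_i(C) = \frac{s_i}{s_i + \alpha^{-2}} \in [0, 1), \qquad
\tilde{G} = G\,(I - C).
```

Here $s_i$ are the eigenvalues of $R$. Directions with large variance in earlier tasks get $\lambda_i$ near 1 and are mostly removed from the update, while weakly used directions pass through almost unchanged. This is why the resulting subspace is described as pseudo-orthogonal rather than strictly orthogonal.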



Article 1195

Title@2025-07-03 (4): Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity

Title: Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity Steigerung der Robustheit im Preference-Based Verstärkungs-Lernen mit dynamischer Sparsamkeit 以动态平等促进基于优惠的强化学习的强力 2406.06495v3

Authors (6): Calarina Muslimani, Bram Grooten, Deepak Ranganatha Sastry Mamillapalli, Mykola Pechenizkiy, Decebal Constantin Mocanu, Matthew E. Taylor

To integrate into human-centered environments, autonomous agents must learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) can enable this by learning reward functions from human preferences. However, humans live in a world full of diverse information, most of which is irrelevant to completing any particular task. It then becomes essential that agents learn to focus on the subset of task-relevant state features. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several sparse training and PbRL algorithms across simulated robotic environments.



Article 1196

Title@2025-07-03 (4): Discovering Algorithms with Computational Language Processing

Title: Discovering Algorithms with Computational Language Processing Algorithmen mit numerischer Sprachverarbeitung entdecken 使用计算语言语言处理发现算法 2507.03190v1

Authors (4): Theo Bourdais, Abeynaya Gnanasekaran, Houman Owhadi, Tuhin Sahai

Algorithms are the engine for reproducible problem-solving. We present a framework that automates algorithm discovery by conceptualizing algorithms as sequences of operations, represented as tokens. These computational tokens are chained using a grammar, enabling the formation of increasingly sophisticated procedures. Our ensemble Monte Carlo tree search (MCTS) guided by reinforcement learning (RL) explores token chaining and drives the creation of new tokens. This methodology rediscovers, improves, and generates new algorithms that substantially outperform existing methods for strongly NP-hard combinatorial optimization problems and foundational quantum computing approaches such as Grover's algorithm and the Quantum Approximate Optimization Algorithm. Operating at the computational rather than code-generation level, our framework produces algorithms that can be tailored specifically to problem instances, not merely classes.



Article 1197

Title@2025-07-03 (4): First Contact: Data-driven Friction-Stir Process Control

Title: First Contact: Data-driven Friction-Stir Process Control Erster Kontakt: Datengesteuerte Friktions-Stir Prozesssteuerung 第一次联系:数据驱动摩擦-系统进程控制 2507.03177v1

Authors (7): James Koch, Ethan King, WoongJo Choi, Megan Ebers, David Garcia, Ken Ross, Keerti Kappagantula

This study validates the use of Neural Lumped Parameter Differential Equations for open-loop setpoint control of the plunge sequence in Friction Stir Processing (FSP). The approach integrates a data-driven framework with classical heat transfer techniques to predict tool temperatures, informing control strategies. By utilizing a trained Neural Lumped Parameter Differential Equation model, we translate theoretical predictions into practical set-point control, facilitating rapid attainment of desired tool temperatures and ensuring consistent thermomechanical states during FSP. This study covers the design, implementation, and experimental validation of our control approach, establishing a foundation for efficient, adaptive FSP operations.



Article 1198

Title@2025-07-03 (4): Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies

Title: Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies Deep Learning Atmospheric Models zuverlässig simulieren außerhalb der Probelandwärme und Kaltwellenfrequenzen 可可靠模拟地模拟地热和冷波碎片 2507.03176v1

Authors (4): Zilu Meng, Gregory J. Hakim, Wenchang Yang, Gabriel A. Vecchi

Deep learning (DL)-based general circulation models (GCMs) are emerging as fast simulators, yet their ability to replicate extreme events outside their training range remains unknown. Here, we evaluate two such models, the hybrid Neural General Circulation Model (NGCM) and the purely data-driven Deep Learning Earth System Model (DLESyM), against a conventional high-resolution land-atmosphere model (HiRAM) in simulating land heatwaves and cold waves. All models are forced with observed sea surface temperatures and sea ice over 1900-2020, and the evaluation focuses on the out-of-sample early-20th-century period (1900-1960). Both DL models generalize successfully to unseen climate conditions, broadly reproducing the frequency and spatial patterns of heatwave and cold wave events during 1900-1960 with skill comparable to HiRAM. An exception is over portions of North Asia and North America, where all models perform poorly during 1940-1960. Due to excessive temperature autocorrelation, DLESyM tends to overestimate heatwave and cold wave frequencies, whereas the physics-DL hybrid NGCM exhibits persistence more similar to HiRAM.



Article 1199

Title@2025-07-03 (4): Understanding Knowledge Transferability for Transfer Learning: A Survey

Title: Understanding Knowledge Transferability for Transfer Learning: A Survey Wissensübertragbarkeit für Transfer Learning verstehen: Eine Umfrage 了解知识转让转让学习的可转让性:调查 2507.03175v1

Authors (12): Haohua Wang, Jingge Wang, Zijie Zhao, Yang Tan, Yanru Wu, Hanbing Liu, Jingyun Yang, Enming Zhang, Xiangyu Chen, Zhengze Rong, Shanxin Guo, Yang Li

Transfer learning has become an essential paradigm in artificial intelligence, enabling the transfer of knowledge from a source task to improve performance on a target task. This approach, particularly through techniques such as pretraining and fine-tuning, has seen significant success in fields like computer vision and natural language processing. However, despite its widespread use, how to reliably assess the transferability of knowledge remains a challenge. Understanding the theoretical underpinnings of each transferability metric is critical for ensuring the success of transfer learning. In this survey, we provide a unified taxonomy of transferability metrics, categorizing them based on transferable knowledge types and measurement granularity. This work examines the various metrics developed to evaluate the potential of source knowledge for transfer learning and their applicability across different learning paradigms emphasizing the need for careful selection of these metrics. By offering insights into how different metrics work under varying conditions, this survey aims to guide researchers and practitioners in selecting the most appropriate metric for specific applications, contributing to more efficient, reliable, and trustworthy AI systems. Finally, we discuss some open challenges in this field and propose future research directions to further advance the application of transferability metrics in trustworthy transfer learning.



Article 1200

Title@2025-07-03 (4): Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data

Title: Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data Latent Thermodynamic Flows: Unified Representation Learning und Generative Modellierung von temperaturabhängigen Verhaltensmustern aus begrenzten Daten 冷热动力流动:从有限数据中产生单一代表学习和产生温度依附行为行为模型 2507.03174v1

Authors (4): Yunrui Qiu, Richard John, Lukas Herron, Pratyush Tiwary

Accurate characterization of the equilibrium distributions of complex molecular systems and their dependence on environmental factors such as temperature is essential for understanding thermodynamic properties and transition mechanisms. Projecting these distributions onto meaningful low-dimensional representations enables interpretability and downstream analysis. Recent advances in generative AI, particularly flow models such as Normalizing Flows (NFs), have shown promise in modeling such distributions, but their scope is limited without tailored representation learning. In this work, we introduce Latent Thermodynamic Flows (LaTF), an end-to-end framework that tightly integrates representation learning and generative modeling. LaTF unifies the State Predictive Information Bottleneck (SPIB) with NFs to simultaneously learn low-dimensional latent representations, referred to as Collective Variables (CVs), classify metastable states, and generate equilibrium distributions across temperatures beyond the training data. The two components of representation learning and generative modeling are optimized jointly, ensuring that the learned latent features capture the system's slow, important degrees of freedom while the generative model accurately reproduces the system's equilibrium behavior. We demonstrate LaTF's effectiveness across diverse systems, including a model potential, the Chignolin protein, and a cluster of Lennard-Jones particles, with thorough evaluations and benchmarking using multiple metrics and extensive simulations. Finally, we apply LaTF to an RNA tetraloop system, where, despite using simulation data from only two temperatures, LaTF reconstructs the temperature-dependent structural ensemble and melting behavior, consistent with experimental and prior extensive computational results.



Article 1201

Title@2025-07-03 (4): A Neurosymbolic Framework for Geometric Reduction of Binary Forms

Title: A Neurosymbolic Framework for Geometric Reduction of Binary Forms Ein neurosymbolisches Framework zur geometrischen Reduktion von Binärformen 二进制几何减少二进制的纽罗申姆伯利框架 2501.15404v2

Authors (2): Ilias Kotsireas, Tony Shaska

This paper compares Julia reduction and hyperbolic reduction with the aim of finding equivalent binary forms with minimal coefficients. We demonstrate that hyperbolic reduction generally outperforms Julia reduction, particularly in the cases of sextics and decimics, though neither method guarantees achieving the minimal form. We further propose an additional shift and scaling to approximate the minimal form more closely. Finally, we introduce a machine learning framework to identify optimal transformations that minimize the heights of binary forms. This study provides new insights into the geometry and algebra of binary forms and highlights the potential of AI in advancing symbolic computation and reduction techniques. The findings, supported by extensive computational experiments, lay the groundwork for hybrid approaches that integrate traditional reduction methods with data-driven techniques.
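
The "additional shift" idea can be illustrated on a dehomogenized form. The sketch below is only an illustration under stated assumptions: it takes the height to be the largest absolute coefficient, restricts the search to integer shifts x -> x + k, and uses hypothetical helper names (`shift`, `height`, `best_integer_shift`) and a hypothetical search radius. It is not the authors' reduction procedure.

```python
from math import comb

def shift(coeffs, k):
    """Coefficients of f(x + k), where coeffs[i] is the coefficient of x**i."""
    n = len(coeffs) - 1
    return [
        sum(coeffs[i] * comb(i, j) * k ** (i - j) for i in range(j, n + 1))
        for j in range(n + 1)
    ]

def height(coeffs):
    """Naive height: the largest absolute value among the coefficients."""
    return max(abs(c) for c in coeffs)

def best_integer_shift(coeffs, radius=10):
    """Integer k in [-radius, radius] minimizing the height of f(x + k).

    `radius` is a hypothetical search bound chosen for illustration.
    """
    return min(range(-radius, radius + 1), key=lambda k: height(shift(coeffs, k)))

# f(x) = x^2 - 6x + 10, stored lowest degree first.
f = [10, -6, 1]
k = best_integer_shift(f)
reduced = shift(f, k)   # f(x + 3) = x^2 + 1
```

Here the height drops from 10 to 1 after the shift; the paper combines shifts with scaling and with reduction methods that this toy search does not model.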



Article 1202

Title@2025-07-03 (4): Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation

Title: Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation Über SEO hinaus: Ein transformerbasierter Ansatz zur Neuerfindung von Web Content Optimierung 超越SEO:改造网络内容优化的变革型办法 2507.03169v1

Authors (3): Florian Lüttgenau, Imar Colic, Gervasio Ramirez

The rise of generative AI search engines is disrupting traditional SEO, with Gartner predicting a 25% reduction in conventional search usage by 2026. This necessitates new approaches for web content visibility in AI-driven search environments. We present a domain-specific fine-tuning approach for Generative Engine Optimization (GEO) that transforms web content to improve discoverability in large language model outputs. Our method fine-tunes a BART-base transformer on synthetically generated training data comprising 1,905 cleaned travel website content pairs. Each pair consists of raw website text and its GEO-optimized counterpart incorporating credible citations, statistical evidence, and improved linguistic fluency. We evaluate using intrinsic metrics (ROUGE-L, BLEU) and extrinsic visibility assessments through controlled experiments with Llama-3.3-70B. The fine-tuned model achieves significant improvements over baseline BART: ROUGE-L scores of 0.249 (vs. 0.226) and BLEU scores of 0.200 (vs. 0.173). Most importantly, optimized content demonstrates substantial visibility gains in generative search responses, with a 15.63% improvement in absolute word count and a 30.96% improvement in position-adjusted word count metrics. This work provides the first empirical demonstration that targeted transformer fine-tuning can effectively enhance web content visibility in generative search engines with modest computational resources. Our results suggest GEO represents a tractable approach for content optimization in the AI-driven search landscape, offering concrete evidence that small-scale, domain-focused fine-tuning yields meaningful improvements in content discoverability.



Article 1203

Title@2025-07-03 (4): Adopting a human developmental visual diet yields robust, shape-based AI vision

Title: Adopting a human developmental visual diet yields robust, shape-based AI vision Die Annahme einer visuellen Ernährung für die menschliche Entwicklung führt zu einer robusten, formenbasierten KI-Vision 采用人文发展视觉饮食,产生稳健的、基于形状的人工智能愿景 2507.03168v1

Authors (4): Zejin Lu, Sushrut Thorat, Radoslaw M Cichy, Tim C Kietzmann

Despite years of research and the dramatic scaling of artificial intelligence (AI) systems, a striking misalignment between artificial and human vision persists. Contrary to humans, AI heavily relies on texture-features rather than shape information, lacks robustness to image distortions, remains highly vulnerable to adversarial attacks, and struggles to recognise simple abstract shapes within complex backgrounds. To close this gap, we here introduce a solution that arises from a previously underexplored direction: rather than scaling up, we take inspiration from how human vision develops from early infancy into adulthood. We quantified the visual maturation by synthesising decades of psychophysical and neurophysiological research into a novel developmental visual diet (DVD) for AI vision. We show that guiding AI systems through this human-inspired curriculum produces models that closely align with human behaviour on every hallmark of robust vision tested, yielding the strongest reported reliance on shape information to date, abstract shape recognition beyond the state of the art, higher robustness to image corruptions, and stronger resilience to adversarial attacks. By outperforming high-parameter AI foundation models trained on orders of magnitude more data, we provide evidence that robust AI vision can be achieved by guiding how a model learns, not merely how much it learns, offering a resource-efficient route toward safer and more human-like artificial visual systems.



Article 1204

Title@2025-07-03 (4): Adversarial Manipulation of Reasoning Models using Internal Representations

Title: Adversarial Manipulation of Reasoning Models using Internal Representations Adversariale Manipulation von Vernunftmodellen mittels interner Repräsentationen 采用内部代表制对理性模型进行反向调整 2507.03167v1

Authors (3): Kureha Yamaguchi, Benjamin Etheridge, Andy Arditi

Reasoning models generate chain-of-thought (CoT) tokens before their final output, but how this affects their vulnerability to jailbreak attacks remains unclear. While traditional language models make refusal decisions at the prompt-response boundary, we find evidence that DeepSeek-R1-Distill-Llama-8B makes these decisions within its CoT generation. We identify a linear direction in activation space during CoT token generation that predicts whether the model will refuse or comply – termed the “caution” direction because it corresponds to cautious reasoning patterns in the generated text. Ablating this direction from model activations increases harmful compliance, effectively jailbreaking the model. We additionally show that intervening only on CoT token activations suffices to control final outputs, and that incorporating this direction into prompt-based attacks improves success rates. Our findings suggest that the chain-of-thought itself is a promising new target for adversarial manipulation in reasoning models. Code available at https://github.com/ky295/reasoning-manipulation
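
Ablating a linear direction from an activation vector is commonly done by projecting that direction out. The abstract does not say how the authors implement it, so the following is a minimal sketch under that assumption; the 4-dimensional vectors and the names `ablate_direction` and `caution_direction` are illustrative and not taken from the linked repository.

```python
import math

def normalize(v):
    """Return v scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]

def ablate_direction(activation, direction):
    """Remove the component of `activation` along `direction`.

    Computes a - (a . d_hat) * d_hat, so the result is orthogonal to d_hat.
    """
    d_hat = normalize(direction)
    coeff = sum(a * d for a, d in zip(activation, d_hat))
    return [a - coeff * d for a, d in zip(activation, d_hat)]

# Hypothetical activation and "caution" direction (illustrative values only).
activation = [0.5, -1.0, 2.0, 0.25]
caution_direction = [1.0, 0.0, 1.0, 0.0]

ablated = ablate_direction(activation, caution_direction)
# Component that remains along the direction after ablation (numerically zero).
residual = sum(a * d for a, d in zip(ablated, normalize(caution_direction)))
```

In a real model the same projection would be applied to the activations at the chosen layers and token positions; which layers and positions to use is a design choice the abstract does not specify.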



Article 1205

Title@2025-07-03 (4): PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset

Title: PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset PiCME: Pipeline zur kontrastiven Modalitätsbewertung und Kodierung im MIMIC Datensatz PICME:MIMIMIMIC数据集中的反竞争模式评估和编码管道 2507.03165v1

Authors (4): Michal Golovanevsky, Pranav Mahableshwarkar, Carsten Eickhoff, Ritambhara Singh

Multimodal deep learning holds promise for improving clinical prediction by integrating diverse patient data, including text, imaging, time-series, and structured demographics. Contrastive learning facilitates this integration by producing a unified representation that can be reused across tasks, reducing the need for separate models or encoders. Although contrastive learning has seen success in vision-language domains, its use in clinical settings remains largely limited to image and text pairs. We propose the Pipeline for Contrastive Modality Evaluation and Encoding (PiCME), which systematically assesses five clinical data types from MIMIC: discharge summaries, radiology reports, chest X-rays, demographics, and time-series. We pre-train contrastive models on all 26 combinations of two to five modalities and evaluate their utility on in-hospital mortality and phenotype prediction. To address performance plateaus with more modalities, we introduce a Modality-Gated LSTM that weights each modality according to its contrastively learned importance. Our results show that contrastive models remain competitive with supervised baselines, particularly in three-modality settings. Performance declines beyond three modalities, which supervised models fail to recover. The Modality-Gated LSTM mitigates this drop, improving AUROC from 73.19% to 76.93% and AUPRC from 51.27% to 62.26% in the five-modality setting. We also compare contrastively learned modality importance scores with attribution scores and evaluate generalization across demographic subgroups, highlighting strengths in interpretability and fairness. PiCME is the first to scale contrastive learning across all modality combinations in MIMIC, offering guidance for modality selection, training strategies, and equitable clinical prediction.
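
The figure of 26 combinations follows from counting every subset of the five modalities that contains at least two of them: 10 pairs, 10 triples, 5 quadruples, and 1 full set. A short check, with the modality names copied from the abstract and everything else being illustrative scaffolding:

```python
from itertools import combinations

MODALITIES = [
    "discharge summaries",
    "radiology reports",
    "chest X-rays",
    "demographics",
    "time-series",
]

# Every subset containing between two and five modalities.
subsets = [
    combo
    for size in range(2, len(MODALITIES) + 1)
    for combo in combinations(MODALITIES, size)
]

# Number of subsets of each size.
counts_by_size = {
    size: sum(1 for combo in subsets if len(combo) == size)
    for size in range(2, len(MODALITIES) + 1)
}
print(len(subsets), counts_by_size)  # 26 {2: 10, 3: 10, 4: 5, 5: 1}
```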



Article 1206

Title@2025-07-03 (4): MathOptAI.jl: Embed trained machine learning predictors into JuMP models

Title: MathOptAI.jl: Embed trained machine learning predictors into JuMP models MathOptAI.jl: Einbetten ausgebildeter Maschinenlern-Prognostiker in JuMP-Modelle MathOptAI.jl: 嵌入式培训机器学习预测器,以进入JuMP模型 2507.03159v1

Authors (3): Oscar Dowson, Robert B Parker, Russel Bent

We present MathOptAI.jl, an open-source Julia library for embedding trained machine learning predictors into a JuMP model. MathOptAI.jl can embed a wide variety of neural networks, decision trees, and Gaussian Processes into a larger mathematical optimization model. In addition to interfacing a range of Julia-based machine learning libraries such as Lux.jl and Flux.jl, MathOptAI.jl uses Julia’s Python interface to provide support for PyTorch models. When the PyTorch support is combined with MathOptAI.jl’s gray-box formulation, the function, Jacobian, and Hessian evaluations associated with the PyTorch model are offloaded to the GPU in Python, while the rest of the nonlinear oracles are evaluated on the CPU in Julia. MathOptAI.jl is available at https://github.com/lanl-ansi/MathOptAI.jl under a BSD-3 license.



Article 1207

Title@2025-07-03 (4): HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference

Title: HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference HGCA: Hybride GPU-CPU Aufmerksamkeit für lange Kontext LLM-Inferenz HGCA: 混合GPU-CPU注意长距离LLM 推理 2507.03153v1

Authors (9): Weishu Deng, Yujie Yang, Peiran Du, Lingfeng Xiang, Zhen Lin, Chen Zhong, Song Jiang, Hui Lu, Jia Rao

Scaling inference for large language models (LLMs) is increasingly constrained by limited GPU memory, especially due to growing key-value (KV) caches required for long-context generation. While existing approaches offload KV caches to CPU memory or apply sparse attention to reduce GPU load, they often underutilize CPU compute resources and compromise accuracy. We present HGCA, a hybrid CPU-GPU attention mechanism that enables scalable, high-throughput LLM inference with near-full attention quality. HGCA performs dense attention on recently generated KV entries retained in GPU memory and parallel sparse attention on selected, salient KV entries in CPU memory. The attention outputs are efficiently merged using log-sum-exp fusion, minimizing PCIe transfer overhead. HGCA also introduces a fine-grained, per-head sparsification strategy optimized for CPU execution, preserving contextual relevance while reducing computation. Our implementation seamlessly integrates into existing LLM frameworks without requiring model retraining. Experiments across diverse models and workloads show that HGCA achieves superior scalability, supports longer sequences and larger batch sizes, and outperforms existing sparse attention baselines in both performance and accuracy – all on commodity GPU hardware.
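
Log-sum-exp fusion relies on a standard identity: softmax attention computed separately over two disjoint blocks of keys can be recombined exactly if each block also returns the log of its summed exponentiated scores. The sketch below shows that identity for a single query in plain Python. It is an illustration under simplifying assumptions (no 1/sqrt(d) scaling, no sparsification, hypothetical function names) and not the HGCA implementation.

```python
import math

def partial_attention(query, keys, values):
    """Softmax attention over one block of keys/values.

    Returns (output, lse), where lse = log(sum(exp(scores))) for this block.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                              # subtract max for stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    output = [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(dim)
    ]
    return output, m + math.log(total)

def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial results into attention over the union of blocks."""
    m = max(lse_a, lse_b)
    w_a = math.exp(lse_a - m)
    w_b = math.exp(lse_b - m)
    return [(w_a * a + w_b * b) / (w_a + w_b) for a, b in zip(out_a, out_b)]

query = [0.3, -0.2]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full, _ = partial_attention(query, keys, values)
out_cpu, lse_cpu = partial_attention(query, keys[:2], values[:2])  # older entries
out_gpu, lse_gpu = partial_attention(query, keys[2:], values[2:])  # recent entries
merged = merge(out_cpu, lse_cpu, out_gpu, lse_gpu)
```

Only the per-block output vector and one scalar need to cross between devices, which is consistent with the abstract's point about limiting PCIe traffic. In this toy case `merged` matches `full` because both blocks are dense; with sparse selection on the CPU side the merged result would approximate full attention instead.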



Article 1208

Title@2025-07-03 (4): Expert-level validation of AI-generated medical text with scalable language models

Title: Expert-level validation of AI-generated medical text with scalable language models Validierung von KI-generierten medizinischen Texten auf Expertenebene mit skalierbaren Sprachmodellen 专家一级对AI产生的带有可缩放语言模型的可缩放语言模型的医学文本进行鉴定 2507.03152v1

Authors (27): Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador-Martinez, Eduardo Juan Perez Guerrero, Paola Naovi Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy D. Zandee van Rilland, Poonam Laxmappa Hosamani, Kevin R Keet, Minjoung Go, Evelyn Ling, David B. Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay S. Chaudhari

With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the “LM-as-judge” paradigm (an LM evaluating another LM) offers scalable evaluation, even frontier LMs can miss subtle but clinically significant errors. To address these challenges, we propose MedVAL, a self-supervised framework that leverages synthetic data to train evaluator LMs to assess whether LM-generated medical outputs are factually consistent with inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVAL-Bench, a dataset containing 840 outputs annotated by physicians, following a physician-defined taxonomy of risk levels and error categories. Across 6 diverse medical tasks and 10 state-of-the-art LMs spanning open-source, proprietary, and medically adapted models, MedVAL fine-tuning significantly improves (p < 0.001) alignment with physicians on both seen and unseen tasks, increasing average F1 scores from 66% to 83%, with per-sample safety classification scores up to 86%. MedVAL improves the performance of even the best-performing proprietary LM (GPT-4o) by 8%. To support a scalable, risk-aware pathway towards clinical integration, we open-source the 1) codebase ( https://github.com/StanfordMIMI/MedVAL ), 2) MedVAL-Bench ( https://huggingface.co/datasets/stanfordmimi/MedVAL-Bench ), and 3) MedVAL-4B ( https://huggingface.co/stanfordmimi/MedVAL-4B ), the best-performing open-source LM. Our research provides the first evidence of LMs approaching expert-level validation ability for medical text.



Article 1209

Title@2025-07-03 (4): Last-Iterate Convergence of No-Regret Learning for Equilibria in Bargaining Games

Title: Last-Iterate Convergence of No-Regret Learning for Equilibria in Bargaining Games Last-Iterate Konvergenz von No-Regret-Learning für Equilibria in Schnäppchen Spiele 谈判运动会中不登记学习促进平衡的融合 2507.03150v1

Authors (3): Serafina Kamp, Reese Liebman, Benjamin Fish

Bargaining games, where agents attempt to agree on how to split utility, are an important class of games used to study economic behavior, which motivates a study of online learning algorithms in these games. In this work, we address the question of when no-regret learning algorithms converge to Nash equilibria in bargaining games. Recent results have shown that online algorithms related to Follow the Regularized Leader (FTRL) converge to Nash equilibria (NE) in the last iterate in a wide variety of games, including zero-sum games. However, bargaining games do not have the properties used previously to establish convergence guarantees, even in the simplest case of the ultimatum game, which features a single take-it-or-leave-it offer. Nonetheless, we establish that FTRL (without the modifications necessary for zero-sum games) achieves last-iterate convergence to an approximate NE in the ultimatum game along with a bound on convergence time under mild assumptions. Further, we provide experimental results to demonstrate that convergence to NE, including NE with asymmetric payoffs, occurs under a broad range of initial conditions, both in the ultimatum game and in bargaining games with multiple rounds. This work demonstrates how complex economic behavior (e.g. learning to use threats and the existence of many possible equilibrium outcomes) can result from using a simple learning algorithm, and that FTRL can converge to equilibria in a more diverse set of games than previously known.



Article 1210

Title@2025-07-03 (4): Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models

Title: Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models Elemente des Weltwissens (EWoK): Ein von Kognition inspirierter Rahmen zur Bewertung des Grundwissens der Welt in Sprachmodellen 世界知识要素(EWoK): “ 评价基本世界语言模式知识的认知 – – 启发框架 “ 2405.09605v2

Authors (20): Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, Vivian Paulun, Maria Ryskina, Ekin Akyürek, Ethan Wilcox, Nafisa Rashid, Leshem Choshen, Roger Levy, Evelina Fedorenko, Joshua Tenenbaum, Jacob Andreas

The ability to build and reason about models of the world is essential for situated language understanding. But evaluating world modeling capabilities in modern AI systems – especially those based on language models – has proven challenging, in large part because of the difficulty of disentangling conceptual knowledge about the world from knowledge of surface co-occurrence statistics. This paper presents Elements of World Knowledge (EWoK), a framework for evaluating language models’ understanding of the conceptual knowledge underlying world modeling. EWoK targets specific concepts from multiple knowledge domains known to be important for world modeling in humans, from social interactions (help, deceive) to spatial relations (left, right). Objects, agents, and locations in the items can be flexibly filled in, enabling easy generation of multiple controlled datasets. We then introduce EWoK-core-1.0, a dataset of 4,374 items covering 11 world knowledge domains. We evaluate 20 open-weights large language models (1.3B–70B parameters) and compare them with human performance. All tested models perform worse than humans, with results varying drastically across domains. Performance on social interactions and social properties was highest and performance on physical relations and spatial relations was lowest. Overall, this dataset highlights simple cases where even large models struggle and presents rich avenues for targeted research on LLM world modeling capabilities.



Article 1211

Title@2025-07-03 (4): DeepGesture: A conversational gesture synthesis system based on emotions and semantics

Title: DeepGesture: A conversational gesture synthesis system based on emotions and semantics DeepGesture: Ein dialogisches Gesten-Synthesesystem basierend auf Emotionen und Semantik DeepGesture:基于情感和语义的谈话手势合成系统 2507.03147v1

Authors (1): Thanh Hoang-Minh

Along with the explosion of large language models, improvements in speech synthesis, advancements in hardware, and the evolution of computer graphics, the current bottleneck in creating digital humans lies in generating character movements that correspond naturally to text or speech inputs. In this work, we present DeepGesture, a diffusion-based gesture synthesis framework for generating expressive co-speech gestures conditioned on multimodal signals: text, speech, emotion, and seed motion. Built upon the DiffuseStyleGesture model, DeepGesture introduces novel architectural enhancements that improve semantic alignment and emotional expressiveness in generated gestures. Specifically, we integrate fast text transcriptions as semantic conditioning and implement emotion-guided classifier-free diffusion to support controllable gesture generation across affective states. A lightweight Transformer backbone combines full self-attention and cross-local attention for effective feature fusion of heterogeneous modalities. To visualize results, we implement a full rendering pipeline in Unity based on BVH output from the model. Evaluation on the ZeroEGGS dataset shows that DeepGesture produces gestures with improved human-likeness and contextual appropriateness, outperforming baselines on Mean Opinion Score and Fréchet Gesture Distance metrics. Our system supports interpolation between emotional states and demonstrates generalization to out-of-distribution speech, including synthetic voices, marking a step toward fully multimodal, emotionally aware digital humans.



Article 1212

Title@2025-07-03 (4): Set Valued Predictions For Robust Domain Generalization

Title: Set Valued Predictions For Robust Domain Generalization Wertvolle Vorhersagen für robuste Domain-Verallgemeinerung setzen 设定强力域一般化的价值预测值 2507.03146v1

Authors (3): Ron Tsibulsky, Daniel Nevo, Uri Shalit

Despite the impressive advancements in modern machine learning, achieving robustness in Domain Generalization (DG) tasks remains a significant challenge. In DG, models are expected to perform well on samples from unseen test distributions (also called domains) by learning from multiple related training distributions. Most existing approaches to this problem rely on single-valued predictions, which inherently limit their robustness. We argue that set-valued predictors could be leveraged to enhance robustness across unseen domains, while also taking into account that these sets should be as small as possible. We introduce a theoretical framework defining successful set prediction in the DG setting, focusing on meeting a predefined performance criterion across as many domains as possible, and provide theoretical insights into the conditions under which such domain generalization is achievable. We further propose a practical optimization method, compatible with modern learning architectures, that balances robust performance on unseen domains with small prediction set sizes. We evaluate our approach on several real-world datasets from the WILDS benchmark, demonstrating its potential as a promising direction for robust domain generalization.



Article 1213

Title@2025-07-03 (4): Neural Substitute Solver for Efficient Edge Inference of Power Electronic Hybrid Dynamics

Title: Neural Substitute Solver for Efficient Edge Inference of Power Electronic Hybrid Dynamics Neuraler Ersatzlöser für effiziente Edge Inferenz von Power Electronic Hybrid Dynamics 电动电子混合动力学高效边缘推断神经替代溶剂 2507.03144v1

Authors (8): Jialin Zheng, Haoyu Wang, Yangbin Zeng, Han Xu, Di Mou, Hong Li, Sergio Vazquez, Leopoldo G. Franquelo

Advancing the dynamics inference of power electronic systems (PES) to the real-time edge-side holds transformative potential for testing, control, and monitoring. However, efficiently inferring the inherent hybrid continuous-discrete dynamics on resource-constrained edge hardware remains a significant challenge. This letter proposes a neural substitute solver (NSS) approach, which is a neural-network-based framework aimed at rapid accurate inference with significantly reduced computational costs. Specifically, NSS leverages lightweight neural networks to substitute time-consuming matrix operation and high-order numerical integration steps in traditional solvers, which transforms sequential bottlenecks into highly parallel operation suitable for edge hardware. Experimental validation on a multi-stage DC-DC converter demonstrates that NSS achieves 23x speedup and 60% hardware resource reduction compared to traditional solvers, paving the way for deploying edge inference of high-fidelity PES dynamics.



Article 1214

Title@2025-07-03 (4): The geometry of financial institutions – Wasserstein clustering of financial data

Title: The geometry of financial institutions – Wasserstein clustering of financial data Die Geometrie der Finanzinstitute – Wasserstein-Clustering von Finanzdaten 金融机构的几何形状 – – 金融数据的瓦塞斯坦组合 2305.03565v2

Authors (5): Lorenz Riess, Mathias Beiglböck, Johannes Temme, Andreas Wolf, Julio Backhoff

The increasing availability of granular and big data on various objects of interest has made it necessary to develop methods for condensing this information into a representative and intelligible map. Financial regulation is a field that exemplifies this need, as regulators require diverse and often highly granular data from financial institutions to monitor and assess their activities. However, processing and analyzing such data can be a daunting task, especially given the challenges of dealing with missing values and identifying clusters based on specific features. To address these challenges, we propose a variant of Lloyd’s algorithm that applies to probability distributions and uses generalized Wasserstein barycenters to construct a metric space which represents given data on various objects in condensed form. By applying our method to the financial regulation context, we demonstrate its usefulness in dealing with the specific challenges faced by regulators in this domain. We believe that our approach can also be applied more generally to other fields where large and complex data sets need to be represented in concise form.
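
To make the idea of Lloyd's algorithm on probability distributions concrete, the sketch below clusters one-dimensional empirical distributions of equal sample size. In that special case the 2-Wasserstein distance reduces to comparing sorted samples, and the barycenter is the position-wise mean of the sorted samples. This is a deliberately simplified illustration: it does not implement the generalized barycenters or the missing-value handling described in the abstract, and all names and data are hypothetical.

```python
def w2_squared(a, b):
    """Squared 2-Wasserstein distance between equal-size 1D samples."""
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b))) / len(a)

def barycenter(samples):
    """1D Wasserstein barycenter: average the sorted samples position-wise."""
    ordered = [sorted(s) for s in samples]
    return [sum(col) / len(col) for col in zip(*ordered)]

def wasserstein_lloyd(distributions, initial_centers, iterations=10):
    """Alternate nearest-center assignment and barycenter updates."""
    centers = [sorted(c) for c in initial_centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center in Wasserstein distance.
        labels = [
            min(range(len(centers)), key=lambda k: w2_squared(d, centers[k]))
            for d in distributions
        ]
        # Update step: each center becomes the barycenter of its members.
        for k in range(len(centers)):
            members = [d for d, lab in zip(distributions, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = barycenter(members)
    return labels, centers

# Four toy "institutions", each described by three observations of one feature.
data = [[0, 1, 2], [1, 0, 3], [10, 11, 12], [12, 10, 13]]
labels, centers = wasserstein_lloyd(data, initial_centers=[data[0], data[2]])
```

On this toy input the first two distributions land in one cluster and the last two in the other.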



Article 1215

Title@2025-07-03 (4): Towards a Psychoanalytic Perspective on VLM Behaviour: A First-step Interpretation with Intriguing Observations

Title: Towards a Psychoanalytic Perspective on VLM Behaviour: A First-step Interpretation with Intriguing Observations Auf dem Weg zu einer psychoanalytischen Perspektive auf VLM-Verhalten: Eine erste Interpretation mit faszinierenden Beobachtungen 对VLM行为进行心理分析:带有引人注意的首步解释 2507.03123v1

Authors (5): Xiangrui Liu, Man Luo, Agneet Chatterjee, Hua Wei, Yezhou Yang

Hallucination is a long-standing problem that has been actively investigated in Vision-Language Models (VLMs). Existing research commonly attributes hallucinations to technical limitations or sycophancy bias, where the latter means the models tend to generate incorrect answers to align with user expectations. However, these explanations primarily focus on technical or externally driven factors and may neglect the possibility that hallucination behaviours mirror cognitive biases observed in human psychology. In this work, we introduce a psychological taxonomy categorizing VLMs’ hallucination behaviours, including sycophancy, logical inconsistency, and a newly identified VLM behaviour: authority bias. To systematically analyze these behaviours, we design AIpsych, a scalable benchmark that reveals psychological tendencies in model response patterns. Leveraging this benchmark, we investigate how variations in model architecture and parameter size influence model behaviour when responding to strategically manipulated questions. Our experiments reveal that as model size increases, VLMs exhibit stronger sycophantic tendencies but reduced authority bias, suggesting increasing competence but a potential erosion of response integrity. A human subject study further validates our hypotheses and highlights key behavioural differences between VLMs and human respondents. This work suggests a new perspective for understanding hallucination in VLMs and highlights the importance of integrating psychological principles into model evaluation. The benchmark is available at https://github.com/lxrswdd/AIpsych.



Article 1216

Title@2025-07-03 (4): MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration

Title: MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration MORPH-LER: Log-Euklidischen Regularisierung für bevölkerungsbewusste Image-Registrierung MORPH-LER: 人口软件图像注册的日志-欧克利德兰常规化 2502.02029v2

Authors (4): Mokshagna Sai Teja Karanam, Krithika Iyer, Sarang Joshi, Shireen Elhabian

Spatial transformations that capture population-level morphological statistics are critical for medical image analysis. Commonly used smoothness regularizers for image registration fail to integrate population statistics, leading to anatomically inconsistent transformations. Inverse consistency regularizers promote geometric consistency but lack population morphometrics integration. Methods that constrain deformations to a low-dimensional manifold address this, but they prioritize reconstruction over interpretability and neglect diffeomorphic properties, such as group composition and inverse consistency. We introduce MORPH-LER, a Log-Euclidean regularization framework for population-aware unsupervised image registration. MORPH-LER learns population morphometrics from spatial transformations to guide and regularize registration networks, ensuring anatomically plausible deformations. It features a bottleneck autoencoder that computes the principal logarithm of deformation fields via iterative square-root predictions. It creates a linearized latent space that respects diffeomorphic properties and enforces inverse consistency. By integrating a registration network with a diffeomorphic autoencoder, MORPH-LER produces smooth, meaningful deformation fields. The framework offers two main contributions: (1) a data-driven regularization strategy that incorporates population-level anatomical statistics to enhance transformation validity and (2) a linearized latent space that enables compact and interpretable deformation fields for efficient population morphometrics analysis. We validate MORPH-LER across two families of deep learning-based registration networks, demonstrating its ability to produce anatomically accurate, computationally efficient, and statistically meaningful transformations on the OASIS-1 brain imaging dataset. https://github.com/iyerkrithika21/MORPH_LER



Article 1217

Title@2025-07-03 (4): LEDA: Log-Euclidean Diffeomorphism Autoencoder for Efficient Statistical Analysis of Diffeomorphisms

Title: LEDA: Log-Euclidean Diffeomorphism Autoencoder for Efficient Statistical Analysis of Diffeomorphisms LEDA: Log-Euclidean Diffeomorphism Autoencoder für effiziente statistische Analyse von Diffeomorphismen LEDA: 用于有效分析二异形特征的高效统计分析的Log-Euclidean Diffepheromist 自动编码器 2412.16129v2

Authors (3): Krithika Iyer, Shireen Elhabian, Sarang Joshi

Image registration is a core task in computational anatomy that establishes correspondences between images. Invertible deformable registration, which computes a deformation field and handles complex, non-linear transformations, is essential for tracking anatomical variations, especially in neuroimaging applications where inter-subject differences and longitudinal changes are key. Analyzing the deformation fields is challenging due to their non-linearity, which limits statistical analysis. Moreover, traditional approaches for analyzing deformation fields are computationally expensive, sensitive to initialization, and prone to numerical errors, especially when the deformation is far from the identity. To address these limitations, we propose the Log-Euclidean Diffeomorphism Autoencoder (LEDA), an innovative framework designed to compute the principal logarithm of deformation fields by efficiently predicting consecutive square roots. LEDA operates within a linearized latent space that adheres to the diffeomorphism group action laws, enhancing our model’s robustness and applicability. We also introduce a loss function to enforce inverse consistency, ensuring accurate latent representations of deformation fields. Extensive experiments with the OASIS-1 dataset demonstrate the effectiveness of LEDA in accurately modeling and analyzing complex non-linear deformations while maintaining inverse consistency. Additionally, we evaluate its ability to capture and incorporate clinical variables, enhancing its relevance for clinical applications.
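
The link between consecutive square roots and the principal logarithm can be seen in a scalar analogue: for x > 0, ln(x) = 2^k ln(x^(1/2^k)), and once the repeated root is close to 1, ln(1 + e) is approximately e. The sketch below applies that identity to a positive real number. It illustrates the numerical idea only; in LEDA the square roots are taken of deformation fields under composition and are predicted by a network, none of which is modelled here, and the function name is hypothetical.

```python
import math

def log_via_square_roots(x, num_roots=20):
    """Approximate ln(x) for x > 0 using repeated square roots.

    Uses ln(x) = 2**k * ln(x**(1 / 2**k)) and ln(1 + e) ~ e for small e.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    root = x
    for _ in range(num_roots):
        root = math.sqrt(root)   # after k steps: root = x**(1 / 2**k)
    return (2 ** num_roots) * (root - 1.0)

approx = log_via_square_roots(5.0)
exact = math.log(5.0)
```

Taking more roots brings the argument closer to the identity, where the linear approximation is accurate, which mirrors the abstract's remark that errors grow when a deformation is far from the identity.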



Article 1218

Title@2025-07-03 (4): Predicting the Susceptibility of Examples to Catastrophic Forgetting

Title: Predicting the Susceptibility of Examples to Catastrophic Forgetting Vorhersage der Anfälligkeit von Beispielen für katastrophisches Vergessen 预测灾难性遗忘实例的可视性 2406.09935v2

Authors (2): Guy Hacohen, Tinne Tuytelaars

Catastrophic forgetting - the tendency of neural networks to forget previously learned data when learning new information - remains a central challenge in continual learning. In this work, we adopt a behavioral approach, observing a connection between learning speed and forgetting: examples learned more quickly are less prone to forgetting. Focusing on replay-based continual learning, we show that the composition of the replay buffer - specifically, whether it contains quickly or slowly learned examples - has a significant effect on forgetting. Motivated by this insight, we introduce Speed-Based Sampling (SBS), a simple yet general strategy that selects replay examples based on their learning speed. SBS integrates easily into existing buffer-based methods and improves performance across a wide range of competitive continual learning benchmarks, advancing state-of-the-art results. Our findings underscore the value of accounting for the forgetting dynamics when designing continual learning algorithms.
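
Selecting replay examples by learning speed might look roughly like the sketch below. It rests on assumptions that are not in the abstract: learning speed is taken to be the fraction of training epochs in which an example was classified correctly, and the buffer is a contiguous window of the speed-sorted examples. The names `learning_speed`, `speed_based_sample` and `skip_fraction` are hypothetical.

```python
import numpy as np

def learning_speed(correct_history):
    """Per-example learning speed (assumed definition).

    correct_history: array of shape (epochs, n_examples), truthy where the
    model classified the example correctly at that epoch.
    Returns the fraction of epochs in which each example was correct.
    """
    return np.asarray(correct_history, dtype=float).mean(axis=0)

def speed_based_sample(correct_history, buffer_size, skip_fraction=0.0):
    """Pick replay indices from a window of the speed-sorted examples.

    Examples are sorted from fastest to slowest learned; the first
    skip_fraction of them is skipped and the next buffer_size are kept.
    """
    speed = learning_speed(correct_history)
    order = np.argsort(-speed, kind="stable")  # fastest-learned first
    start = int(skip_fraction * len(order))
    return order[start:start + buffer_size]

# Toy history: example 0 is learned first, example 3 is never learned.
history = [[1, 0, 0, 0],
           [1, 1, 0, 0],
           [1, 1, 1, 0]]
fast_buffer = speed_based_sample(history, buffer_size=2)
slow_buffer = speed_based_sample(history, buffer_size=2, skip_fraction=0.5)
```

Changing `skip_fraction` shifts the buffer between quickly and slowly learned examples, which is the composition effect the abstract describes.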



Article 1219

Title@2025-07-03 (4): Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings

Title: Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings Federated Learning für die ICD-Klassifizierung mit Leichtbaumodellen und vorgeschulten Einbettungen 具有轻重量型模型和未受过训练的嵌入型综合疾病分类的联邦学习 2507.03122v1

Authors (2): Binbin Xu, Gérard Dray

This study investigates the feasibility and performance of federated learning (FL) for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. Unlike previous approaches that rely on centralized training or fine-tuned large language models, we propose a lightweight and scalable pipeline combining frozen text embeddings with simple multilayer perceptron (MLP) classifiers. This design offers a privacy-preserving and deployment-efficient alternative for clinical NLP applications, particularly suited to distributed healthcare settings. Extensive experiments across both centralized and federated configurations were conducted, testing six publicly available embedding models from the Massive Text Embedding Benchmark leaderboard and three MLP classifier architectures under two medical coding standards (ICD-9 and ICD-10). Additionally, ablation studies over ten random stratified splits assess performance stability. Results show that embedding quality substantially outweighs classifier complexity in determining predictive performance, and that federated learning can closely match centralized results in idealized conditions. While the models are orders of magnitude smaller than state-of-the-art architectures and achieve competitive micro and macro F1 scores, limitations remain, including the lack of end-to-end training and the simplified FL assumptions. Nevertheless, this work demonstrates a viable way toward scalable, privacy-conscious medical coding systems and offers a step toward future research into federated, domain-adaptive clinical AI.



Article 1220

Title@2025-07-03 (4): How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models

Title: How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models Wie Übervertrauen in anfängliche Entscheidungen und Unterbewusstsein unter Kritik den Sinneswandel in großen Sprachmodellen modulieren 批评主义下对最初选择和不信任的过度自信如何改变大语言模式中的心态变化 2507.03120v1

Authors (11): Dharshan Kumaran, Stephen M Fleming, Larisa Markeeva, Joe Heyward, Andrea Banino, Mrinal Mathur, Razvan Pascanu, Simon Osindero, Benedetto de Martino, Petar Velickovic, Viorica Patraucean

Large language models (LLMs) exhibit strikingly conflicting behaviors: they can appear steadfastly overconfident in their initial answers whilst at the same time being prone to excessive doubt when challenged. To investigate this apparent paradox, we developed a novel experimental paradigm, exploiting the unique ability to obtain confidence estimates from LLMs without creating memory of their initial judgments – something impossible in human participants. We show that LLMs – Gemma 3, GPT4o and o1-preview – exhibit a pronounced choice-supportive bias that reinforces and boosts their estimate of confidence in their answer, resulting in a marked resistance to change their mind. We further demonstrate that LLMs markedly overweight inconsistent compared to consistent advice, in a fashion that deviates qualitatively from normative Bayesian updating. Finally, we demonstrate that these two mechanisms – a drive to maintain consistency with prior commitments and hypersensitivity to contradictory feedback – parsimoniously capture LLM behavior in a different domain. Together, these findings furnish a mechanistic account of LLM confidence that explains both their stubbornness and excessive sensitivity to criticism.



Article 1221

Title@2025-07-03 (4): BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers

Title: BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers BLaST: High Performance Inferenz und Pretraining mit BLock Sparse Transformers BLAST:使用BLock Sparse变形器进行高性能推断和预先训练 2507.03117v1

Authors (7): Patrik Okanovic, Sameer Deshmukh, Grzegorz Kwasniewski, Kentaro Katayama, Takumi Honda, Maciej Besta, Torsten Hoefler

The energy consumption of large-scale ML models is dominated by data movement - shuffling billions of parameters across memory hierarchies and data centers. Effective sparsification to prune redundant parameters is still challenging: existing methods incur significant accuracy degradation, performance overhead, or both. We introduce (Bl)ock (a)nd (S)parse (T)ransformers (BLaST), a general, robust, and reliable sparsification method applicable to linear layers in all settings. Our method iteratively sparsifies weight matrices into a block sparsity pattern suitable for efficient sparse matrix-matrix (SpMM) multiplication. BLaST achieves up to 95% sparsity in MLP weights with negligible accuracy loss. Our fused, highly optimized Sparse MLP kernel delivers up to 16.7x speedup over dense MLPs across 9 architectures and 8 datasets, resulting in up to 1.6x inference speedup, 1.11x pretraining speedup and up to 3.12x inference memory usage reduction. BLaST enables the next generation of large-scale AI systems by reducing energy use, memory footprint, and latency.
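
A block sparsity pattern on a weight matrix can be illustrated with a one-shot magnitude criterion. This is a minimal sketch under stated assumptions, not the BLaST procedure: the abstract says the method is iterative and does not give its pruning criterion, so the Frobenius-norm ranking and the function name `block_sparsify` are stand-ins chosen here.

```python
import numpy as np

def block_sparsify(weight, block, sparsity):
    """Zero out the lowest-norm (block x block) tiles of a 2-D weight matrix.

    sparsity is the fraction of tiles to remove. Returns (sparse_weight, mask).
    """
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0, "shape must tile evenly"
    # View the matrix as a grid of tiles: index [i, a, j, b] = weight[i*block+a, j*block+b].
    tiles = weight.reshape(rows // block, block, cols // block, block)
    norms = np.sqrt((tiles ** 2).sum(axis=(1, 3)))  # Frobenius norm per tile
    n_tiles = norms.size
    n_keep = n_tiles - int(round(sparsity * n_tiles))
    keep = np.zeros(n_tiles, dtype=bool)
    keep[np.argsort(-norms.ravel(), kind="stable")[:n_keep]] = True
    # Expand the per-tile decision back to element resolution.
    mask = np.repeat(np.repeat(keep.reshape(norms.shape), block, axis=0), block, axis=1)
    return weight * mask, mask

weight = np.arange(16, dtype=float).reshape(4, 4)
sparse_weight, mask = block_sparsify(weight, block=2, sparsity=0.5)
```

Because whole tiles are either kept or dropped, the surviving entries stay contiguous, which is the property that makes block patterns suitable for sparse matrix-matrix kernels.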



Article 1222

Title@2025-07-03 (4): Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization

Title: Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization Hintere Inferenz mit Diffusionsmodellen für die hochdimensionale Black-Box-Optimierung 高维黑盒优化扩散模型的外皮推断 2502.16824v2

Authors (5): Taeyoung Yun, Kiyoung Om, Jaewoo Lee, Sujin Yun, Jinkyoo Park

Optimizing high-dimensional and complex black-box functions is crucial in numerous scientific applications. While Bayesian optimization (BO) is a powerful method for sample-efficient optimization, it struggles with the curse of dimensionality and scaling to thousands of evaluations. Recently, leveraging generative models to solve black-box optimization problems has emerged as a promising framework. However, those methods often underperform compared to BO methods due to limited expressivity and difficulty of uncertainty estimation in high-dimensional spaces. To overcome these issues, we introduce DiBO, a novel framework for solving high-dimensional black-box optimization problems. Our method iterates two stages. First, we train a diffusion model to capture the data distribution and deep ensembles to predict function values with uncertainty quantification. Second, we cast the candidate selection as a posterior inference problem to balance exploration and exploitation in high-dimensional spaces. Concretely, we fine-tune diffusion models to amortize posterior inference. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines across synthetic and real-world tasks. Our code is publicly available at https://github.com/umkiyoung/DiBO.



Article 1223

Title@2025-07-03 (4): An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks

Title: An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks Ein erweitertes Benchmarking von Multi-Agenten-Verstärkung-Lernalgorithmen in komplexen voll kooperativen Aufgaben 扩大复杂全面合作任务中多机构强化学习比重的基准基准 2502.04773v2

Authors (6): George Papadopoulos, Andreas Kontogiannis, Foteini Papadopoulou, Chaido Poulianou, Ioannis Koumentis, George Vouros

Multi-Agent Reinforcement Learning (MARL) has recently emerged as a significant area of research. However, MARL evaluation often lacks systematic diversity, hindering a comprehensive understanding of algorithms’ capabilities. In particular, cooperative MARL algorithms are predominantly evaluated on benchmarks such as SMAC and GRF, which primarily feature team game scenarios without adequately assessing the various aspects of agents’ capabilities required in fully cooperative real-world tasks such as multi-robot cooperation and warehouse, resource management, search and rescue, and human-AI cooperation. Moreover, MARL algorithms are mainly evaluated on low-dimensional state spaces, and thus their performance on high-dimensional (e.g., image) observations is not well studied. To fill this gap, this paper highlights the crucial need for expanding systematic evaluation across a wider array of existing benchmarks. To this end, we conduct extensive evaluation and comparisons of well-known MARL algorithms on complex fully cooperative benchmarks, including tasks with images as agents’ observations. Interestingly, our analysis shows that many algorithms, hailed as state-of-the-art on SMAC and GRF, may underperform standard MARL baselines on fully cooperative benchmarks. Finally, towards more systematic and better evaluation of cooperative MARL algorithms, we have open-sourced PyMARLzoo+, an extension of the widely used (E)PyMARL libraries, which addresses an open challenge from [TBG++21], facilitating seamless integration and support with all benchmarks of PettingZoo, as well as Overcooked, PressurePlate, Capture Target and Box Pushing.



Article 1224

Title@2025-07-03 (4): SymMatika: Structure-Aware Symbolic Discovery

Title: SymMatika: Structure-Aware Symbolic Discovery SymMatika: Struktur-Bewusst Symbolische Entdeckung SymMatika: 结构-意识符号发现 2507.03110v1

Authors (2): Michael Scherk, Boyuan Chen

Symbolic regression (SR) seeks to recover closed-form mathematical expressions that describe observed data. While existing methods have advanced the discovery of either explicit mappings (i.e., $y = f(\mathbf{x})$) or implicit relations (i.e., $F(\mathbf{x}, y)=0$), few modern and accessible frameworks support both. Moreover, most approaches treat each expression candidate in isolation, without reusing recurring structural patterns that could accelerate search. We introduce SymMatika, a hybrid SR algorithm that combines multi-island genetic programming (GP) with a reusable motif library inspired by biological sequence analysis. SymMatika identifies high-impact substructures in top-performing candidates and reintroduces them to guide future generations. Additionally, it incorporates a feedback-driven evolutionary engine and supports both explicit and implicit relation discovery using implicit-derivative metrics. Across benchmarks, SymMatika achieves state-of-the-art recovery rates, with 5.1% higher performance than the previous best results on Nguyen, the first recovery of Nguyen-12, and competitive performance on the Feynman equations. It also recovers implicit physical laws from Eureqa datasets up to $100\times$ faster. Our results demonstrate the power of structure-aware evolutionary search for scientific discovery. To support broader research in interpretable modeling and symbolic discovery, we have open-sourced the full SymMatika framework.



Article 1225

Title@2025-07-03 (4): Neural Dynamic Modes: Computational Imaging of Dynamical Systems from Sparse Observations

Title: Neural Dynamic Modes: Computational Imaging of Dynamical Systems from Sparse Observations Neural Dynamische Modi: Computational Imaging von dynamischen Systemen aus Sparse-Beobachtungen 神经动态模式:来自分散观测的动态系统的计算成像 2507.03094v1

Authors (4): Ali SaraerToosi, Renbo Tu, Kamyar Azizzadenesheli, Aviad Levis

Dynamical systems are ubiquitous within science and engineering, from turbulent flow across aircraft wings to structural variability of proteins. Although some systems are well understood and simulated, scientific imaging often confronts never-before-seen dynamics observed through indirect, noisy, and highly sparse measurements. We present NeuralDMD, a model-free framework that combines neural implicit representations with Dynamic Mode Decomposition (DMD) to reconstruct continuous spatio-temporal dynamics from such measurements. The expressiveness of neural representations enables capturing complex spatial structures, while the linear dynamical modes of DMD introduce an inductive bias that guides training and supports stable, low-dimensional representations and forecasting. We validate NeuralDMD on two real-world problems: reconstructing near-surface wind-speed fields over North America from sparse station observations, and recovering the evolution of plasma near the Galactic-center black hole, Sgr A*. In both cases, NeuralDMD outperforms established baselines, demonstrating its potential as a general tool for imaging dynamical systems across geoscience, astronomy, and beyond.
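
For orientation, the classical Dynamic Mode Decomposition that NeuralDMD builds on can be sketched in a few lines. This covers only standard exact DMD on fully observed snapshots; the neural implicit representation and the handling of sparse, indirect measurements described in the abstract are not shown, and the function name `dmd_eigenvalues` is introduced here.

```python
import numpy as np

def dmd_eigenvalues(snapshots, rank=None):
    """Eigenvalues of the best-fit linear operator A with x[t+1] ~ A x[t].

    snapshots: array of shape (state_dim, n_times), one state per column.
    rank: optional truncation rank for the SVD of the input snapshots.
    """
    snapshots = np.asarray(snapshots, dtype=float)
    X, Y = snapshots[:, :-1], snapshots[:, 1:]       # states and their successors
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    if rank is not None:
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced operator U^T Y V S^{-1}; dividing by s scales column j by 1/s[j].
    A_tilde = U.T @ Y @ Vh.T / s
    return np.linalg.eigvals(A_tilde)

# Toy linear system with known decay rates 0.9 and 0.5.
A = np.diag([0.9, 0.5])
states = [np.array([1.0, 1.0])]
for _ in range(2):
    states.append(A @ states[-1])
eigs = dmd_eigenvalues(np.stack(states, axis=1))
```

Each eigenvalue gives the per-step growth or decay of one linear mode, which is the low-dimensional structure the abstract refers to as an inductive bias.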



Article 1226

Title@2025-07-03 (4): The surrogate Gibbs-posterior of a corrected stochastic MALA: Towards uncertainty quantification for neural networks

Title: The surrogate Gibbs-posterior of a corrected stochastic MALA: Towards uncertainty quantification for neural networks Die Surrogat Gibbs-Posterior einer korrigierten stochastischen MALA: Auf dem Weg zur Unsicherheitsquantifizierung für neuronale Netzwerke 校正的《MAMALA》的替代Gibbs前置:争取对神经网络的不确定性进行量化 2310.09335v2

Authors (4): Sebastian Bieringer, Gregor Kasieczka, Maximilian F. Steffen, Mathias Trabs

MALA is a popular gradient-based Markov chain Monte Carlo method to access the Gibbs-posterior distribution. Stochastic MALA (sMALA) scales to large data sets, but changes the target distribution from the Gibbs-posterior to a surrogate posterior which only exploits a reduced sample size. We introduce a corrected stochastic MALA (csMALA) with a simple correction term for which the distance between the resulting surrogate posterior and the original Gibbs-posterior decreases in the full sample size while retaining scalability. In a nonparametric regression model, we prove a PAC-Bayes oracle inequality for the surrogate posterior. Uncertainties can be quantified by sampling from the surrogate posterior. Focusing on Bayesian neural networks, we analyze the diameter and coverage of credible balls for shallow neural networks and we show optimal contraction rates for deep neural networks. Our credibility result is independent of the correction and can also be applied to the standard Gibbs-posterior. A simulation study in a high-dimensional parameter space demonstrates that an estimator drawn from csMALA based on its surrogate Gibbs-posterior indeed exhibits these advantages in practice.



Article 1227

Title@2025-07-03 (4): Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Title: Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory Point3R: Streaming der 3D-Rekonstruktion mit explizitem Raumzeigerspeicher Point3R: 利用明确的空间指内存进行三维重建 2507.02863v1

Authors (4): Yuqi Wu, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Dense 3D scene reconstruction from an ordered sequence or unordered image collections is a critical step when bringing research in computer vision into practical scenarios. Following the paradigm introduced by DUSt3R, which unifies an image pair densely into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. To be specific, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates scene information nearby in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and design a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs. Code is available at: https://github.com/YkiWu/Point3R.



Article 1228

Title@2025-07-03 (4): Answer Matching Outperforms Multiple Choice for Language Model Evaluation

Title: Answer Matching Outperforms Multiple Choice for Language Model Evaluation Antwort Matching Outperforms Multiple Choice für Sprachmodell-Bewertung 语言模式评价的多种选择 2507.02856v1

Authors (5): Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping

Multiple choice benchmarks have long been the workhorse of language model evaluation because grading multiple choice is objective and easy to automate. However, we show multiple choice questions from popular benchmarks can often be answered without even seeing the question. These shortcuts arise from a fundamental limitation of discriminative evaluation not shared by evaluations of the model’s free-form, generative answers. Until recently, there appeared to be no viable, scalable alternative to multiple choice, but we show that this has changed. We consider generative evaluation via what we call answer matching: Give the candidate model the question without the options, have it generate a free-form response, then use a modern language model with the reference answer to determine if the response matches the reference. To compare the validity of different evaluation strategies, we annotate MMLU-Pro and GPQA-Diamond to obtain human grading data, and measure the agreement of each evaluation approach. We find answer matching using recent models, even small ones, achieves near-perfect agreement, in the range of inter-annotator agreement. In contrast, both multiple choice evaluation and using LLM-as-a-judge without reference answers align poorly with human grading. Improving evaluations via answer matching is not merely a conceptual concern: the rankings of several models change significantly when evaluating their free-form responses with answer matching. In light of these findings, we discuss how to move the evaluation ecosystem from multiple choice to answer matching.
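
The answer-matching loop described above can be sketched as follows. The callables `candidate` and `matcher` are hypothetical placeholders for the candidate model and the grading language model. The `toy_matcher` here is a plain substring check used only so the example runs, and it is not the model-based matcher the abstract describes.

```python
from typing import Callable, Iterable, Tuple

def answer_matching_accuracy(
    items: Iterable[Tuple[str, str]],
    candidate: Callable[[str], str],
    matcher: Callable[[str, str, str], bool],
) -> float:
    """Fraction of questions whose free-form response matches the reference.

    items: (question, reference_answer) pairs; no answer options are shown.
    candidate: maps a question to a free-form response.
    matcher: decides whether the response matches the reference answer.
    """
    items = list(items)
    hits = 0
    for question, reference in items:
        response = candidate(question)          # generate without options
        hits += bool(matcher(question, reference, response))
    return hits / len(items)

def toy_matcher(question: str, reference: str, response: str) -> bool:
    """Stand-in grader: case-insensitive substring match (assumption)."""
    return reference.strip().lower() in response.strip().lower()

def toy_candidate(question: str) -> str:
    """Stand-in candidate model with canned responses (assumption)."""
    return "It is Paris." if "France" in question else "5"

items = [("What is the capital of France?", "Paris"), ("What is 2 + 2?", "4")]
accuracy = answer_matching_accuracy(items, toy_candidate, toy_matcher)
```

Keeping the matcher as a separate callable makes it easy to swap in a language-model grader and compare its verdicts against human grading.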



Article 1229

Title@2025-07-03 (4): Urban Region Pre-training and Prompting: A Graph-based Approach

Title: Urban Region Pre-training and Prompting: A Graph-based Approach Urban Region Pre-Training und Prompting: Ein graphischer Ansatz 培训前和推动:基于图表的方法 2408.05920v4

Authors (8): Jiahui Jin, Yifan Song, Dong Kan, Haojia Zhu, Xiangguo Sun, Zhicheng Li, Xigang Sun, Jinghui Zhang

Urban region representation is crucial for various urban downstream tasks. However, despite the proliferation of methods and their success, acquiring general urban region knowledge and adapting to different tasks remains challenging. Existing work pays limited attention to the fine-grained functional layout semantics in urban regions, limiting their ability to capture transferable knowledge across regions. Further, inadequate handling of the unique features and relationships required for different downstream tasks may also hinder effective task adaptation. In this paper, we propose a Graph-based Urban Region Pre-training and Prompting framework (GURPP) for region representation learning. Specifically, we first construct an urban region graph and develop a subgraph-centric urban region pre-training model to capture the heterogeneous and transferable patterns of entity interactions. This model pre-trains knowledge-rich region embeddings using contrastive learning and multi-view learning methods. To further refine these representations, we design two graph-based prompting methods: a manually-defined prompt to incorporate explicit task knowledge and a task-learnable prompt to discover hidden knowledge, which enhances the adaptability of these embeddings to different tasks. Extensive experiments on various urban region prediction tasks and different cities demonstrate the superior performance of our framework.



Article 1230

Title@2025-07-03 (4): Mitigating Goal Misgeneralization with Minimax Regret

Title: Mitigating Goal Misgeneralization with Minimax Regret Zielverallgemeinerung mit Minimax-Beschwerde abmildern 与Minimax Regret 的不普及化 2507.03068v1

Authors (7): Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, Michael Dennis

Safe generalization in reinforcement learning requires not only that a learned policy acts capably in new situations, but also that it uses its capabilities towards the pursuit of the designer’s intended goal. The latter requirement may fail when a proxy goal incentivizes similar behavior to the intended goal within the training environment, but not in novel deployment environments. This creates the risk that policies will behave as if in pursuit of the proxy goal, rather than the intended goal, in deployment – a phenomenon known as goal misgeneralization. In this paper, we formalize this problem setting in order to theoretically study the possibility of goal misgeneralization under different training objectives. We show that goal misgeneralization is possible under approximate optimization of the maximum expected value (MEV) objective, but not the minimax expected regret (MMER) objective. We then empirically show that the standard MEV-based training method of domain randomization exhibits goal misgeneralization in procedurally-generated grid-world environments, whereas current regret-based unsupervised environment design (UED) methods are more robust to goal misgeneralization (though they don’t find MMER policies in all cases). Our findings suggest that minimax expected regret is a promising approach to mitigating goal misgeneralization.
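
The difference between the two objectives can be seen on a small return table. The sketch below uses toy numbers chosen here so that the objectives disagree. They are not taken from the paper, and the exact-argmax selection ignores the approximate-optimization setting the abstract analyzes. The function names are introduced for illustration.

```python
import numpy as np

def mev_policy(returns, env_probs):
    """Index of the policy with maximum expected value under env_probs.

    returns[i, j] is the return of policy i in environment j.
    """
    return int(np.argmax(returns @ np.asarray(env_probs, dtype=float)))

def mmer_policy(returns):
    """Index of the policy minimizing its worst-case regret over environments.

    Regret in an environment is the best achievable return there minus
    the policy's own return.
    """
    regret = returns.max(axis=0, keepdims=True) - returns
    return int(np.argmin(regret.max(axis=1)))

# Row 0: pursues the proxy goal; row 1: pursues the intended goal.
# Column 0: common training-like level; column 1: rare level where the goals differ.
returns = np.array([[1.0, 0.0],
                    [0.9, 1.0]])
env_probs = [0.95, 0.05]

chosen_by_mev = mev_policy(returns, env_probs)   # expected values 0.95 vs 0.905
chosen_by_mmer = mmer_policy(returns)            # worst-case regrets 1.0 vs 0.1
```

In this toy table the expected-value objective favors the proxy policy because the distinguishing level is rare, while the regret objective is dominated by that level.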



Article 1231

Title@2025-07-03 (4): MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

Title: MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs MOTIF: Modulares Denken durch Verstärkung Feinabstimmung in LLMs MOTIF:通过强化微调在LLM中进行模块思考 2507.02851v1

Authors (2): Purbesh Mitra, Sennur Ulukus

Recent advancements in the reasoning capabilities of large language models (LLMs) show that employing the group relative policy optimization (GRPO) algorithm for reinforcement learning (RL) training allows the models to use more thinking/reasoning tokens for generating better responses. However, LLMs can generate only a finite number of tokens while maintaining attention to the previously generated tokens. This limit, also known as the context size of an LLM, is a bottleneck in LLM reasoning with an arbitrarily large number of tokens. To think beyond the limit of context size, an LLM must employ a modular thinking strategy to reason over multiple rounds. In this work, we propose MOTIF: Modular Thinking via Reinforcement Finetuning, an RL training method for generating thinking tokens in multiple rounds, effectively allowing the model to think with additional context size. We trained the open-source model Qwen2.5-3B-Instruct on the GSM8K dataset via parameter-efficient fine-tuning and tested its accuracy on the MATH500 and AIME2024 benchmarks. Our experiments show 3.8% and 3.3% improvements over vanilla GRPO-based training in the respective benchmarks. Furthermore, this improvement was achieved with only 15% of samples, thus demonstrating the sample efficiency of MOTIF. Our code and models are available at https://github.com/purbeshmitra/MOTIF and https://huggingface.co/purbeshmitra/MOTIF, respectively.



Article 1232

Title@2025-07-03 (4): MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis

Title: MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis MvHo-IB: Multi-View Higher-Order Informationen Engpass für Gehirnerkrankungen Diagnose MvHO-IB:脑紊乱诊断多视高端信息瓶颈 2507.02847v1

Authors (3): Kunyu Zhang, Qiang Li, Shujian Yu

Recent evidence suggests that modeling higher-order interactions (HOIs) in functional magnetic resonance imaging (fMRI) data can enhance the diagnostic accuracy of machine learning systems. However, effectively extracting and utilizing HOIs remains a significant challenge. In this work, we propose MvHo-IB, a novel multi-view learning framework that integrates both pairwise interactions and HOIs for diagnostic decision-making, while automatically compressing task-irrelevant redundant information. MvHo-IB introduces several key innovations: (1) a principled method that combines O-information from information theory with a matrix-based Renyi alpha-order entropy estimator to quantify and extract HOIs, (2) a purpose-built Brain3DCNN encoder to effectively utilize these interactions, and (3) a new multi-view learning information bottleneck objective to enhance representation learning. Experiments on three benchmark fMRI datasets demonstrate that MvHo-IB achieves state-of-the-art performance, significantly outperforming previous methods, including recent hypergraph-based techniques. The implementation of MvHo-IB is available at https://github.com/zky04/MvHo-IB.



Article 1233

Title@2025-07-03 (4): Evaluating Frontier Models for Stealth and Situational Awareness

Title: Evaluating Frontier Models for Stealth and Situational Awareness Bewertung von Frontier-Modellen für Stealth und Situationsbewusstsein 评价隐形和了解情况的边界模式 2505.01420v4

Authors (9): Mary Phuong, Roland S. Zimmermann, Ziyue Wang, David Lindner, Victoria Krakovna, Sarah Cogan, Allan Dafoe, Lewis Ho, Rohin Shah

Recent work has demonstrated the plausibility of frontier AI models scheming – knowingly and covertly pursuing an objective misaligned with its developer’s intentions. Such behavior could be very hard to detect, and if present in future advanced systems, could pose severe loss of control risk. It is therefore important for AI developers to rule out harm from scheming prior to model deployment. In this paper, we present a suite of scheming reasoning evaluations measuring two types of reasoning capabilities that we believe are prerequisites for successful scheming: First, we propose five evaluations of ability to reason about and circumvent oversight (stealth). Second, we present eleven evaluations for measuring a model’s ability to instrumentally reason about itself, its environment and its deployment (situational awareness). We demonstrate how these evaluations can be used as part of a scheming inability safety case: a model that does not succeed on these evaluations is almost certainly incapable of causing severe harm via scheming in real deployment. We run our evaluations on current frontier models and find that none of them show concerning levels of either situational awareness or stealth.



Article 1234

Title@2025-07-03 (4): LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding

Title: LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding LLM-getriebene Behandlung Wirkung Schätzung unter Inferenzzeit Text Confounding LLM-驱动处理效应估算 2507.02843v1

Authors (4): Yuchen Ma, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel

Estimating treatment effects is crucial for personalized decision-making in medicine, but this task faces unique challenges in clinical practice. At training time, models for estimating treatment effects are typically trained on well-structured medical datasets that contain detailed patient information. However, at inference time, predictions are often made using textual descriptions (e.g., descriptions with self-reported symptoms), which are incomplete representations of the original patient information. In this work, we make three contributions. (1) We show that the discrepancy between the data available during training time and inference time can lead to biased estimates of treatment effects. We formalize this issue as an inference time text confounding problem, where confounders are fully observed during training time but only partially available through text at inference time. (2) To address this problem, we propose a novel framework for estimating treatment effects that explicitly accounts for inference time text confounding. Our framework leverages large language models together with a custom doubly robust learner to mitigate biases caused by the inference time text confounding. (3) Through a series of experiments, we demonstrate the effectiveness of our framework in real-world applications.



Article 1235

Title@2025-07-03 (4): On the Structure of Replicable Hypothesis Testers

Title: On the Structure of Replicable Hypothesis Testers Über die Struktur replizierbarer Hypothesenprüfer 关于可复制的假设测试器结构 2507.02842v1

Authors (5): Anders Aamand, Maryam Aliakbarpour, Justin Y. Chen, Shyam Narayanan, Sandeep Silwal

A hypothesis testing algorithm is replicable if, when run on two different samples from the same distribution, it produces the same output with high probability. This notion, defined by Impagliazzo, Lei, Pitassi, and Sorell [STOC’22], can increase trust in testing procedures and is deeply related to algorithmic stability, generalization, and privacy. We build general tools to prove lower and upper bounds on the sample complexity of replicable testers, unifying and quantitatively improving upon existing results. We identify a set of canonical properties, and prove that any replicable testing algorithm can be modified to satisfy these properties without worsening accuracy or sample complexity. A canonical replicable algorithm computes a deterministic function of its input (i.e., a test statistic) and thresholds against a uniformly random value in $[0,1]$. It is invariant to the order in which the samples are received, and, if the testing problem is "symmetric," then the algorithm is also invariant to the labeling of the domain elements, resolving an open question by Liu and Ye [NeurIPS’24]. We prove new lower bounds for uniformity, identity, and closeness testing by reducing to the case where the replicable algorithm satisfies these canonical properties. We systematize and improve upon a common strategy for replicable algorithm design based on test statistics with known expectation and bounded variance. Our framework allows testers which have been extensively analyzed in the non-replicable setting to be made replicable with minimal overhead. As direct applications of our framework, we obtain constant-factor optimal bounds for coin testing and closeness testing and get replicability for free in a large parameter regime for uniformity testing. We also give state-of-the-art bounds for replicable Gaussian mean testing, and, unlike prior work, our algorithm runs in polynomial time.
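
The canonical form, a deterministic test statistic compared against a uniformly random threshold, can be illustrated for coin testing. This is a minimal sketch under assumptions made here: the statistic is the empirical mean rescaled to $[0,1]$ between two hypothesized biases `p0 < p1`, and the threshold comes from a seed shared across runs. The function names and the rescaling are illustrative and are not the paper's construction.

```python
import numpy as np

def coin_statistic(samples, p0, p1):
    """Deterministic, order-invariant statistic in [0, 1].

    Rescales the empirical mean so that p0 maps to 0 and p1 maps to 1.
    """
    mean = float(np.mean(samples))
    return min(1.0, max(0.0, (mean - p0) / (p1 - p0)))

def replicable_coin_test(samples, p0, p1, shared_seed):
    """Return True ("bias looks like p1") if the statistic exceeds the threshold.

    The threshold is uniform on [0, 1) and depends only on shared_seed, so two
    runs on different samples disagree only when the threshold lands between
    their two statistics.
    """
    threshold = np.random.default_rng(shared_seed).random()
    return coin_statistic(samples, p0, p1) > threshold

heads_only = [1] * 20
tails_only = [0] * 20
mixed = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
```

When two samples from the same coin give statistics that are close together, the shared threshold rarely separates them, which is the intuition behind replicability of this form.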


Article 1236

Title@2025-07-03 (4): StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason

Title: StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason StepHint: Mehrstufige stufenweise Hinweise stärken das Lernen zur Vernunft 步进提示:多级分步骤将强化学习提升到合理 2507.02841v1

Authors (7): Kaiyi Zhang, Ang Lv, Jinpeng Li, Yongbo Wang, Feng Wang, Haoyuan Hu, Rui Yan

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for improving the complex reasoning abilities of large language models (LLMs). However, current RLVR methods face two significant challenges: the near-miss reward problem, where a small mistake can invalidate an otherwise correct reasoning process, greatly hindering training efficiency; and exploration stagnation, where models tend to focus on solutions within their “comfort zone,” lacking the motivation to explore potentially more effective alternatives. To address these challenges, we propose StepHint, a novel RLVR algorithm that utilizes multi-level stepwise hints to help models explore the solution space more effectively. StepHint generates valid reasoning chains from stronger models and partitions these chains into reasoning steps using our proposed adaptive partitioning method. The initial few steps are used as hints, and simultaneously, multiple-level hints (each comprising a different number of steps) are provided to the model. This approach directs the model's exploration toward a promising solution subspace while preserving its flexibility for independent exploration. By providing hints, StepHint mitigates the near-miss reward problem, thereby improving training efficiency. Additionally, the external reasoning pathways help the model develop better reasoning abilities, enabling it to move beyond its “comfort zone” and mitigate exploration stagnation. StepHint outperforms competitive RLVR enhancement methods across six mathematical benchmarks, while also demonstrating superior generalization and excelling over baselines on out-of-domain benchmarks.
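
The idea of multi-level hints (prefixes of a reasoning chain, each with a different number of steps) can be sketched as follows. The abstract does not describe the adaptive partitioning method, so this sketch assumes the chain is already split into steps; `multi_level_hints`, `build_prompts`, and the inclusion of a level-0 (no hint) prompt are hypothetical choices for illustration.

```python
def multi_level_hints(steps, levels):
    """Return one hint per level; level k is the first k reasoning steps."""
    hints = []
    for k in levels:
        if not 0 <= k < len(steps):
            # A hint that contains every step would leave nothing to explore.
            raise ValueError(f"level {k} must be in [0, {len(steps) - 1}]")
        hints.append("\n".join(steps[:k]))
    return hints

def build_prompts(question, steps, levels):
    """Attach each hint to the question, yielding one prompt per hint level."""
    prompts = []
    for hint in multi_level_hints(steps, levels):
        prompts.append(question if not hint else question + "\n" + hint)
    return prompts

chain = ["Let x be the unknown.", "Then 2x + 3 = 11.", "So 2x = 8.", "Hence x = 4."]
for prompt in build_prompts("Solve 2x + 3 = 11.", chain, levels=[0, 1, 3]):
    print(repr(prompt))
```

Shorter hints leave more of the solution to the model's own exploration, while longer hints steer it closer to a known-valid chain.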


Article 1237

Title@2025-07-03 (4): From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

Title: From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents Von der Web-Suche in Richtung Agentic Deep Research: Incentivizing Search with Reasoning Agents 从网络搜索到代理深层研究:激励使用理性代理进行搜索 2506.18959v3

Authors (23): Weizhi Zhang, Yangning Li, Yuanchen Bei, Junyu Luo, Guancheng Wan, Liangwei Yang, Chenxuan Xie, Yuyao Yang, Wei-Chieh Huang, Chunyu Miao, Henry Peng Zou, Xiao Luo, Yusheng Zhao, Yankai Chen, Chunkit Chan, Peilin Zhou, Xinyang Zhang, Chenwei Zhang, Jingbo Shang, Ming Zhang, Yangqiu Song, Irwin King, Philip S. Yu

Information retrieval is a cornerstone of modern knowledge acquisition, enabling billions of queries each day across diverse domains. However, traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. These systems transcend conventional information search techniques by tightly integrating autonomous reasoning, iterative retrieval, and information synthesis into a dynamic feedback loop. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We also introduce a test-time scaling law to formalize the impact of computational depth on reasoning and search. Supported by benchmark results and the rise of open-source implementations, we demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking. All the related resources, including industry products, research papers, benchmark datasets, and open-source implementations, are collected for the community in https://github.com/DavidZWZ/Awesome-Deep-Research.


Article 1238

Title@2025-07-03 (4): Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble

Title: Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble Mitgliedschafts-Inferenz Angriffe als Datenschutz-Tools: Zuverlässigkeit, Disparität und Ensemble 以成员攻击作为隐私工具:可靠性、差异性和共性 2506.13972v2

Authors (6): Zhiqi Wang, Chengyu Zhang, Yuetian Chen, Nathalie Baracaldo, Swanand Kadhe, Lei Yu

Membership inference attacks (MIAs) pose a significant threat to the privacy of machine learning models and are widely used as tools for privacy assessment, auditing, and machine unlearning. While prior MIA research has primarily focused on performance metrics such as AUC, accuracy, and TPR@low FPR - either by developing new methods to enhance these metrics or using them to evaluate privacy solutions - we found that it overlooks the disparities among different attacks. These disparities, both between distinct attack methods and between multiple instantiations of the same method, have crucial implications for the reliability and completeness of MIAs as privacy evaluation tools. In this paper, we systematically investigate these disparities through a novel framework based on coverage and stability analysis. Extensive experiments reveal significant disparities in MIAs, their potential causes, and their broader implications for privacy evaluation. To address these challenges, we propose an ensemble framework with three distinct strategies to harness the strengths of state-of-the-art MIAs while accounting for their disparities. This framework not only enables the construction of more powerful attacks but also provides a more robust and comprehensive methodology for privacy evaluation.


Article 1239

Title@2025-07-03 (4): ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning

Title: ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning ExPO: Entsperren harter Vernunft mit selbsterklärungsgeführtem Verstärkungslernen ExPO: 以自我剥削指导强化学习来解锁困难理由 2507.02834v1

Authors (4): Ruiyang Zhou, Shuozhe Li, Amy Zhang, Liu Leqi

Recent advances in large language models have been driven by reinforcement learning (RL)-style post-training, which improves reasoning by optimizing model outputs based on reward or preference signals. GRPO-style approaches implement this by using self-generated samples labeled by an outcome-based verifier. However, these methods depend heavily on the model’s initial ability to produce positive samples. They primarily refine what the model already knows (distribution sharpening) rather than enabling the model to solve problems where it initially fails. This limitation is especially problematic in early-stage RL training and on challenging reasoning tasks, where positive samples are unlikely to be generated. To unlock reasoning ability in such settings, the model must explore new reasoning trajectories beyond its current output distribution. Such exploration requires access to sufficiently good positive samples to guide the learning. While expert demonstrations seem like a natural solution, we find that they are often ineffective in RL post-training. Instead, we identify two key properties of effective positive samples: they should (1) be likely under the current policy, and (2) increase the model’s likelihood of predicting the correct answer. Based on these insights, we propose Self-Explanation Policy Optimization (ExPO), a simple and modular framework that generates such samples by conditioning on the ground-truth answer. ExPO enables efficient exploration and guides the model to produce reasoning trajectories more aligned with its policy than expert-written CoTs, while ensuring higher quality than its own (incorrect) samples. Experiments show that ExPO improves both learning efficiency and final performance on reasoning benchmarks, surpassing expert-demonstration-based methods in challenging settings such as MATH level-5, where the model initially struggles the most.


Article 1240

Title@2025-07-03 (4): SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model

Title: SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model SynapseRoute: Ein Auto-Routen-Schaltrahmen für das Dual-State Large Language Model SynapseRoute:关于两州大语言模式的自动运行切换框架 2507.02822v1

Authors (12): Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun

With the widespread adoption of large language models (LLMs) in practical applications, selecting an appropriate model requires balancing not only performance but also operational cost. The emergence of reasoning-capable models has further widened the cost gap between “thinking” (high reasoning) and “non-thinking” (fast, low-cost) modes. In this work, we reveal that approximately 58% of medical questions can be accurately answered by the non-thinking mode alone, without requiring the high-cost reasoning process. This highlights a clear dichotomy in problem complexity and suggests that dynamically routing queries to the appropriate mode based on complexity could optimize accuracy, cost-efficiency, and overall user experience. Based on this, we further propose SynapseRoute, a machine learning-based dynamic routing framework that intelligently assigns input queries to either thinking or non-thinking modes. Experimental results on several medical datasets demonstrate that SynapseRoute not only improves overall accuracy (0.8390 vs. 0.8272) compared to the thinking mode alone but also reduces inference time by 36.8% and token consumption by 39.66%. Importantly, qualitative analysis indicates that over-reasoning on simpler queries can lead to unnecessary delays and even decreased accuracy, a pitfall avoided by our adaptive routing. Finally, this work further introduces the Accuracy-Inference-Token (AIT) index to comprehensively evaluate the trade-offs among accuracy, latency, and token cost.


Article 1241

Title@2025-07-03 (4): Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks

Title: Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks Messung als Bricolage: Untersuchung, wie Datenwissenschaftler Zielvariablen für vorausschauende Modellierungsaufgaben konstruieren 计量为 “ 饼干测量 “ :审查数据科学家如何构建预测模拟任务的目标变量 2507.02819v1

Authors (5): Luke Guerdan, Devansh Saxena, Stevie Chancellor, Zhiwei Steven Wu, Kenneth Holstein

Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the “authenticity” of student writing or the “healthcare need” of a patient. Yet the process by which data scientists translate fuzzy concepts into a concrete, proxy target variable remains poorly understood. We interview fifteen data scientists in education (N=8) and healthcare (N=7) to understand how they construct target variables for predictive modeling tasks. Our findings suggest that data scientists construct target variables through a bricolage process, involving iterative negotiation between high-level measurement objectives and low-level practical constraints. Data scientists attempt to satisfy five major criteria for a target variable through bricolage: validity, simplicity, predictability, portability, and resource requirements. To achieve this, data scientists adaptively use problem (re)formulation strategies, such as swapping out one candidate target variable for another when the first fails to meet certain criteria (e.g., predictability), or composing multiple outcomes into a single target variable to capture a more holistic set of modeling objectives. Based on our findings, we present opportunities for future HCI, CSCW, and ML research to better support the art and science of target variable construction.


Article 1242

Title@2025-07-03 (4): Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case

Title: Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case Große Sprachmodelle zur Automatisierung der klinischen Datenstandardisierung: HL7 FHIR Use Case 临床数据标准化自动化大语言模型:HL7 FHIR使用案例 2507.03067v1

Authors (3): Alvaro Riquelme, Pedro Costa, Catalina Martinez

For years, semantic interoperability standards have sought to streamline the exchange of clinical data, yet their deployment remains time-consuming, resource-intensive, and technically challenging. To address this, we introduce a semi-automated approach that leverages large language models, specifically GPT-4o and Llama 3.2 405b, to convert structured clinical datasets into HL7 FHIR format while assessing accuracy, reliability, and security. Applying our method to the MIMIC-IV database, we combined embedding techniques, clustering algorithms, and semantic retrieval to craft prompts that guide the models in mapping each tabular field to its corresponding FHIR resource. In an initial benchmark, resource identification achieved a perfect F1-score, with GPT-4o outperforming Llama 3.2 thanks to the inclusion of FHIR resource schemas within the prompt. Under real-world conditions, accuracy dipped slightly to 94%, but refinements to the prompting strategy restored robust mappings. Error analysis revealed occasional hallucinations of non-existent attributes and mismatches in granularity, which more detailed prompts can mitigate. Overall, our study demonstrates the feasibility of context-aware, LLM-driven transformation of clinical data into HL7 FHIR, laying the groundwork for semi-automated interoperability workflows. Future work will focus on fine-tuning models with specialized medical corpora, extending support to additional standards such as HL7 CDA and OMOP, and developing an interactive interface to enable expert validation and iterative refinement.


Article 1243

Title@2025-07-03 (4): Replicable Distribution Testing

Title: Replicable Distribution Testing Reproduzierbare Verteilungsprüfung 可复制分发测试 2507.02814v1

Authors (5): Ilias Diakonikolas, Jingyi Gao, Daniel Kane, Sihan Liu, Christopher Ye

We initiate a systematic investigation of distribution testing in the framework of algorithmic replicability. Specifically, given independent samples from a collection of probability distributions, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new replicable algorithms for testing closeness and independence of discrete distributions. On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing that may be of broader interest. As an application of our technique, we establish near-optimal sample complexity lower bounds for replicable uniformity testing – answering an open question from prior work – and closeness testing.


Article 1244

Title@2025-07-03 (4): Cycle-Consistent Helmholtz Machine: Goal-Seeded Simulation via Inverted Inference

Title: Cycle-Consistent Helmholtz Machine: Goal-Seeded Simulation via Inverted Inference Cycle-Consistent Helmholtz Machine: Ziel-gesäte Simulation durch Inverted Inferenz Helmholtz 机 : 通过反向推论进行目标种子模拟 2507.03065v1

Authors (1): Xin Li

The Helmholtz Machine (HM) is a foundational architecture for unsupervised learning, coupling a bottom-up recognition model with a top-down generative model through alternating inference. However, its reliance on symmetric, data-driven updates constrains its ability to perform goal-directed reasoning or simulate temporally extended processes. In this work, we introduce the \emph{Cycle-Consistent Helmholtz Machine} (C$^2$HM), a novel extension that reframes inference as a \emph{goal-seeded}, \emph{asymmetric} process grounded in structured internal priors. Rather than inferring latent causes solely from sensory data, C$^2$HM simulates plausible latent trajectories conditioned on abstract goals, aligning them with observed outcomes through a recursive cycle of forward generation and inverse refinement. This cycle-consistent formulation integrates top-down structure with bottom-up evidence via a variational loop, enforcing mutual alignment between goal-conditioned latent predictions and recognition-based reconstructions. We formalize this mechanism within the framework of the \emph{Context-Content Uncertainty Principle} (CCUP), which posits that inference proceeds by aligning structured, low-entropy content with high-entropy, ambiguous context. C$^2$HM improves representational efficiency, supports memory chaining via path-dependent inference, and enables spatial compositional imagination. By offering a biologically inspired alternative to classical amortized inference, C$^2$HM reconceives generative modeling as intentional simulation, bridging memory-based planning and unsupervised learning in a unified probabilistic framework.


Article 1245

Title@2025-07-03 (4): In-Training Multicalibrated Survival Analysis for Healthcare via Constrained Optimization

Title: In-Training Multicalibrated Survival Analysis for Healthcare via Constrained Optimization In-Training Multikalibrierte Überlebensanalyse für das Gesundheitswesen durch eingeschränkte Optimierung 通过 “ 限制最佳化 “ ,对通过 “ 限制优化 “ 进行的 “ 多校制 “ 多校准生存分析 “ 进行的 “ 多校准 “ 培训中 “ 多校准 “ 健康生存分析 “ 2507.02807v1

Authors (2): Thiti Suttaket, Stanley Kok

Survival analysis is an important problem in healthcare because it models the relationship between an individual’s covariates and the onset time of an event of interest (e.g., death). It is important for survival models to be well-calibrated (i.e., for their predicted probabilities to be close to ground-truth probabilities) because badly calibrated systems can result in erroneous clinical decisions. Existing survival models are typically calibrated at the population level only, and thus run the risk of being poorly calibrated for one or more minority subpopulations. We propose a model called GRADUATE that achieves multicalibration by ensuring that all subpopulations are well-calibrated too. GRADUATE frames multicalibration as a constrained optimization problem, and optimizes both calibration and discrimination in-training to achieve a good balance between them. We mathematically prove that the optimization method used yields a solution that is both near-optimal and feasible with high probability. Empirical comparisons against state-of-the-art baselines on real-world clinical datasets demonstrate GRADUATE’s efficacy. In a detailed analysis, we elucidate the shortcomings of the baselines vis-a-vis GRADUATE’s strengths.


Article 1246

Title@2025-07-03 (4): Transferrable Surrogates in Expressive Neural Architecture Search Spaces

Title: Transferrable Surrogates in Expressive Neural Architecture Search Spaces Übertragbare Surrogate in Expressive Neural Architecture Suchplätze 显性神经结构搜索空间中的可转让代孕人 2504.12971v3

Authors (8): Shiwen Qin, Gabriela Kadlecová, Martin Pilát, Shay B. Cohen, Roman Neruda, Elliot J. Crowley, Jovita Lukasik, Linus Ericsson

Neural architecture search (NAS) faces a challenge in balancing the exploration of expressive, broad search spaces that enable architectural innovation with the need for efficient evaluation of architectures to effectively search such spaces. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate models trained either using zero-cost-proxy metrics and neural graph features (GRAF) or by fine-tuning an off-the-shelf LM have high predictive power for the performance of architectures both within and across datasets, ii) these surrogates can be used to filter out bad architectures when searching on novel datasets, thereby significantly speeding up search and achieving better final performances, and iii) the surrogates can be further used directly as the search objective for huge speed-ups.


Article 1247

Title@2025-07-03 (4): Learning to Coordinate Bidders in Non-Truthful Auctions

Title: Learning to Coordinate Bidders in Non-Truthful Auctions Lernen, Bieter in nicht-wahrheitsgemäßen Auktionen zu koordinieren 学习如何在非真话拍卖中协调投标人 2507.02801v1

Authors (2): Hu Fu, Tao Lin

In non-truthful auctions such as first-price and all-pay auctions, the independent strategic behaviors of bidders, with the corresponding equilibrium notion – Bayes Nash equilibria – are notoriously difficult to characterize and can cause undesirable outcomes. An alternative approach to designing better auction systems is to coordinate the bidders: let a mediator make incentive-compatible recommendations of correlated bidding strategies to the bidders, namely, implementing a Bayes correlated equilibrium (BCE). The implementation of BCE, however, requires knowledge of the distribution of bidders’ private valuations, which is often unavailable. We initiate the study of the sample complexity of learning Bayes correlated equilibria in non-truthful auctions. We prove that the BCEs in a large class of non-truthful auctions, including first-price and all-pay auctions, can be learned with a polynomial number $\tilde O(\frac{n}{\varepsilon^2})$ of samples from the bidders’ value distributions. Our technique is a reduction to the problem of estimating bidders’ expected utility from samples, combined with an analysis of the pseudo-dimension of the class of all monotone bidding strategies of bidders.


Article 1248

Title@2025-07-03 (4): GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

Title: GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling GPAS: Beschleunigung der Konvergenz des LLM-Vortrainings durch Gradient-Preserving Activation Scaling GPAS:通过 “ 渐进式保留动力扩增 “ 加速汇集LLM预备训练 2506.22049v2

Authors (15): Tianhao Chen, Xin Xu, Zijing Liu, Pengxiang Li, Xinyuan Song, Ajay Kumar Jaiswal, Fan Zhang, Jishan Hu, Yang Wang, Hao Chen, Shizhe Diao, Shiwei Liu, Yu Li, Lu Yin, Can Yang

Modern Large Language Models, such as the LLaMA, Qwen and DeepSeek series, predominantly adopt the Pre-LayerNorm (Pre-LN) Transformer architecture. While being stable during pretraining and scalable to large model sizes, Pre-LN suffers from an exponential growth in activation variance across layers, causing the shortcut to dominate over sub-layer outputs in the residual connection and limiting the learning capacity of deeper layers. To mitigate this issue, we propose Gradient-Preserving Activation Scaling (GPAS), a simple technique that can be used in combination with existing approaches. GPAS works by scaling down the intermediate activations while keeping their gradients unchanged. This leaves information in the activations intact, and avoids the gradient vanishing problem associated with gradient downscaling. Extensive experiments across various model sizes from 71M to 1B show that GPAS achieves consistent performance gains. Beyond enhancing Pre-LN Transformers, GPAS also shows promise in improving alternative architectures such as Sandwich-LN and DeepNorm, demonstrating its versatility and potential for improving training dynamics in a wide range of settings. Our code is available at https://github.com/dandingsky/GPAS.
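
The core behaviour described above (activations scaled down in the forward pass, gradients left unchanged in the backward pass) can be shown with a hand-written forward/backward pair. This is only an illustration of the principle, not the GPAS implementation from the linked repository: the function names and the fixed scalar `scale` are assumptions. In an autodiff framework, the same effect is commonly obtained with a stop-gradient, e.g. `y = x - stop_gradient((1 - scale) * x)`.

```python
def scaled_forward(x, scale):
    """Forward pass: shrink every activation by `scale` (hypothetical fixed scalar)."""
    return [scale * v for v in x]

def gradient_preserving_backward(grad_out, scale):
    """Backward pass that ignores `scale`: the gradient w.r.t. x is passed through as is."""
    return list(grad_out)

def naive_backward(grad_out, scale):
    """Ordinary backward pass of y = scale * x, which also shrinks the gradient."""
    return [scale * g for g in grad_out]

x = [2.0, -4.0]
upstream = [1.0, 1.0]
print(scaled_forward(x, 0.5))                       # activations are halved
print(gradient_preserving_backward(upstream, 0.5))  # gradient is unchanged
print(naive_backward(upstream, 0.5))                # gradient is halved as well
```

The contrast with `naive_backward` shows why plain downscaling would shrink gradients layer after layer, which is the vanishing-gradient issue the abstract says GPAS avoids.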


Article 1249

Title@2025-07-03 (4): Understanding and Improving Length Generalization in Recurrent Models

Title: Understanding and Improving Length Generalization in Recurrent Models Verständnis und Verbesserung der Längenverallgemeinerung in rekurrenten Modellen 经常模式中的了解和改进时间长度的概述 2507.02782v1

Authors (2): Ricardo Buitrago Ruiz, Albert Gu

Recently, recurrent models such as state space models and linear attention have become popular due to their linear complexity in the sequence length. Thanks to their recurrent nature, in principle they can process arbitrarily long sequences, but their performance sometimes drops considerably beyond their training context lengths, i.e., they fail to length generalize. In this work, we provide comprehensive empirical and theoretical analysis to support the unexplored states hypothesis, which posits that models fail to length generalize when during training they are only exposed to a limited subset of the distribution of all attainable states (i.e., states that would be attained if the recurrence were applied to long sequences). Furthermore, we investigate simple training interventions that aim to increase the coverage of the states that the model is trained on, e.g., by initializing the state with Gaussian noise or with the final state of a different input sequence. With only 500 post-training steps ($\sim 0.1\%$ of the pre-training budget), these interventions enable length generalization for sequences that are orders of magnitude longer than the training context (e.g., $2k\longrightarrow 128k$) and show improved performance in long context tasks, thus presenting a simple and efficient way to enable robust length generalization in general recurrent models.
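
The two state-initialization interventions mentioned above can be sketched on a toy scalar recurrence. Everything here is an assumption for illustration: the recurrence `h = a*h + b*x`, the mode names, and the noise scale are hypothetical and are not taken from the paper.

```python
import random

def run_recurrence(xs, h0=0.0, a=0.9, b=0.1):
    """Toy linear recurrence h_t = a * h_{t-1} + b * x_t; returns the final state."""
    h = h0
    for x in xs:
        h = a * h + b * x
    return h

def initial_state(mode, rng, other_sequence=None, noise_std=1.0):
    """Choose the state a training sequence starts from.

    "zero"  : the usual all-zero initial state.
    "noise" : a Gaussian sample, so training sees states away from zero.
    "carry" : the final state reached on a different input sequence.
    """
    if mode == "zero":
        return 0.0
    if mode == "noise":
        return rng.gauss(0.0, noise_std)
    if mode == "carry":
        return run_recurrence(other_sequence)
    raise ValueError(f"unknown mode: {mode}")

rng = random.Random(0)
sequence = [1.0, 2.0, 3.0]
other = [5.0] * 50
for mode in ("zero", "noise", "carry"):
    h0 = initial_state(mode, rng, other_sequence=other)
    print(mode, run_recurrence(sequence, h0=h0))
```

Starting from a carried-over or noisy state exposes the short training sequence to states that would otherwise only be reached after many more steps, which is the coverage argument behind the unexplored states hypothesis.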


Article 1250

Title@2025-07-03 (4): Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Title: Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Selbstkorrektionsbank: Enthüllung und Adressierung des Selbstkorrektions-Blindflecks in LLMs 自我校正法官:在LLMs中披露和处理自我校正的盲人点 2507.02778v1

Authors (1): Ken Tsui

Although large language models (LLMs) have become transformative, they still make mistakes and can explore unproductive reasoning paths. Self-correction is an important capability for a trustworthy LLM, particularly an autoregressive LLM. While LLMs can identify errors in user input, they exhibit a systematic ‘Self-Correction Blind Spot’: they fail to correct identical errors in their own outputs. To study this phenomenon, we introduce Self-Correction Bench, a systematic framework that measures it through controlled error injection at three complexity levels. Testing 14 models, we find an average 64.5% blind spot rate. We find multiple lines of evidence that this limitation relates to training data composition: human training demonstrations predominantly show error-free responses rather than error-correction sequences, unlike RL-trained models that learn error correction through outcome feedback. Remarkably, simply appending “Wait” reduces blind spots by 89.3%, suggesting that the capability exists but requires activation. Our work highlights a critical limitation in current LLMs and offers potential avenues for improving their reliability and trustworthiness.


Article 1251

Title@2025-07-03 (4): LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet

Title: LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet LLM-Powered Vorhersage von Hyperglykämie und Entdeckung von Verhaltenstherapie Wege von Wearables und Diät LLM 法力预测超血糖和从穿衣和饮食中发现的行为处理途径 2503.03935v2

Authors (7): Abdullah Mamun, Asiful Arefeen, Susan B. Racette, Dorothy D. Sears, Corrie M. Whisner, Matthew P. Buman, Hassan Ghasemzadeh

Postprandial hyperglycemia, marked by the blood glucose level exceeding the normal range after consuming a meal, is a critical indicator of progression toward type 2 diabetes in people with prediabetes and in healthy individuals. A key metric for understanding blood glucose dynamics after eating is the postprandial area under the curve (AUC). Predicting postprandial AUC in advance based on a person’s lifestyle factors, such as diet and physical activity level, and explaining the factors that affect postprandial blood glucose could allow an individual to adjust their lifestyle accordingly to maintain normal glucose levels. In this study, we developed an explainable machine learning solution, GlucoLens, that takes sensor-driven inputs and uses advanced data processing, large language models, and trainable machine learning models to predict postprandial AUC and hyperglycemia from diet, physical activity, and recent glucose patterns. We used data obtained from wearables in a five-week clinical trial of 10 adults who worked full-time to develop and evaluate the proposed computational model that integrates wearable sensing, multimodal data, and machine learning. Our machine learning model takes multimodal data from wearable activity and glucose monitoring sensors, along with food and work logs, and provides an interpretable prediction of the postprandial glucose pattern. Our GlucoLens system achieves a normalized root mean squared error (NRMSE) of 0.123 in its best configuration. On average, the proposed technology provides a 16% better performance level compared to the comparison models. Additionally, our technique predicts hyperglycemia with an accuracy of 73.3% and an F1 score of 0.716 and recommends different treatment options to help avoid hyperglycemia through diverse counterfactual explanations. Code available: https://github.com/ab9mamun/GlucoLens.
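
The two quantities named above, postprandial AUC and NRMSE, can be computed as in the following sketch. It is not taken from the GlucoLens code: the trapezoidal rule is one standard way to approximate area under a sampled curve, and normalizing the RMSE by the range of the observed values is an assumption, since the abstract does not say which normalization is used.

```python
import math

def trapezoid_auc(times, values):
    """Area under a sampled glucose curve using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (values[i] + values[i - 1]) / 2.0
    return area

def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed normalization)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))

# Hypothetical readings: minutes after a meal and glucose in mg/dL.
minutes = [0, 30, 60]
glucose = [100, 140, 120]
print(trapezoid_auc(minutes, glucose))
print(nrmse([0.0, 2.0], [1.0, 1.0]))
```

Other normalizations (for example by the mean of the observed values) give different numbers, so a reported NRMSE is only comparable across studies when the convention matches.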


Article 1252

Title@2025-07-03 (4): BERT4Traj: Transformer Based Trajectory Reconstruction for Sparse Mobility Data

Title: BERT4Traj: Transformer Based Trajectory Reconstruction for Sparse Mobility Data BERT4Traj: Transformer-basierter Trajektorien-Rekonstruktion für Sparse Mobility Data BERT4Traj: 变压器基础轨迹重建,用于分析移动数据 2507.03062v1

Authors (4): Hao Yang, Angela Yao, Christopher Whalen, Gengchen Mai

Understanding human mobility is essential for applications in public health, transportation, and urban planning. However, mobility data often suffers from sparsity due to limitations in data collection methods, such as infrequent GPS sampling or call detail record (CDR) data that only capture locations during communication events. To address this challenge, we propose BERT4Traj, a transformer-based model that reconstructs complete mobility trajectories by predicting hidden visits in sparse movement sequences. Inspired by BERT’s masked language modeling objective and self-attention mechanisms, BERT4Traj leverages spatial embeddings, temporal embeddings, and contextual background features such as demographics and anchor points. We evaluate BERT4Traj on real-world CDR and GPS datasets collected in Kampala, Uganda, demonstrating that our approach significantly outperforms traditional models such as Markov Chains, KNN, RNNs, and LSTMs. Our results show that BERT4Traj effectively reconstructs detailed and continuous mobility trajectories, enhancing insights into human movement patterns.


Article 1253

Title@2025-07-03 (4): Grounding Intelligence in Movement

Title: Grounding Intelligence in Movement Erdungsintelligenz in der Bewegung 移动中的地面情报 2507.02771v1

Authors (6): Melanie Segado, Felipe Parodi, Jordan K. Matelsky, Michael L. Platt, Eva B. Dyer, Konrad P. Kording

Recent advances in machine learning have dramatically improved our ability to model language, vision, and other high-dimensional data, yet they continue to struggle with one of the most fundamental aspects of biological systems: movement. Across neuroscience, medicine, robotics, and ethology, movement is essential for interpreting behavior, predicting intent, and enabling interaction. Despite its core significance in our intelligence, movement is often treated as an afterthought rather than as a rich and structured modality in its own right. This reflects a deeper fragmentation in how movement data is collected and modeled, often constrained by task-specific goals and domain-specific assumptions. But movement is not domain-bound. It reflects shared physical constraints, conserved morphological structures, and purposeful dynamics that cut across species and settings. We argue that movement should be treated as a primary modeling target for AI. It is inherently structured and grounded in embodiment and physics. This structure, often allowing for compact, lower-dimensional representations (e.g., pose), makes it more interpretable and computationally tractable to model than raw, high-dimensional sensory inputs. Developing models that can learn from and generalize across diverse movement data will not only advance core capabilities in generative modeling and control, but also create a shared foundation for understanding behavior across biological and artificial systems. Movement is not just an outcome, it is a window into how intelligent systems engage with the world.



Article 1254

Title@2025-07-03 (4): Avoiding Catastrophe in Online Learning by Asking for Help

Title: Avoiding Catastrophe in Online Learning by Asking for Help Vermeiden von Katastrophen im Online-Lernen, indem Sie um Hilfe bitten 通过请求帮助避免在线学习中的灾难 2402.08062v5

Authors (3): Benjamin Plaut, Hanlin Zhu, Stuart Russell

Most learning algorithms with formal regret guarantees assume that all mistakes are recoverable and essentially rely on trying all possible behaviors. This approach is problematic when some mistakes are “catastrophic”, i.e., irreparable. We propose an online learning problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff in each round represents the chance of avoiding catastrophe in that round and try to maximize the product of payoffs (the overall chance of avoiding catastrophe) while allowing a limited number of queries to a mentor. We also assume that the agent can transfer knowledge between similar inputs. We first show that in general, any algorithm either queries the mentor at a linear rate or is nearly guaranteed to cause catastrophe. However, in settings where the mentor policy class is learnable in the standard online model, we provide an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows. Although our focus is the product of payoffs, we provide matching bounds for the typical additive regret. Conceptually, if a policy class is learnable in the absence of catastrophic risk, it is learnable in the presence of catastrophic risk if the agent can ask for help.
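
The product objective described above can be illustrated with a few lines of arithmetic. This is a minimal sketch, not the paper's algorithm: the function name `overall_survival` and the example payoffs are hypothetical, and the only fact taken from the abstract is that each round's payoff is read as the chance of avoiding catastrophe in that round.

```python
import math

def overall_survival(per_round_payoffs):
    """Chance of avoiding catastrophe in every round (product of payoffs)."""
    return math.prod(per_round_payoffs)

# Hypothetical example: 100 rounds, each with a 1% chance of catastrophe.
payoffs = [0.99] * 100
overall = overall_survival(payoffs)

# The additive view sums the per-round shortfalls instead of multiplying payoffs.
additive_shortfall = sum(1.0 - p for p in payoffs)

print(f"overall chance of avoiding catastrophe: {overall:.3f}")  # roughly 0.37
print(f"sum of per-round shortfalls: {additive_shortfall:.3f}")
```

Small per-round risks compound: a 1% risk per round leaves only about a 37% chance of surviving all 100 rounds, which is why the product is the natural quantity to track in this setting.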



Article 1255

Title@2025-07-03 (4): Multiple data-driven missing imputation

Title: Multiple data-driven missing imputation Mehrere datengetriebene fehlende Imputation 多数据驱动的缺失估算 2507.03061v1

Authors (1): Sergii Kavun

This paper introduces KZImputer, a novel adaptive imputation method for univariate time series designed for short to medium-sized missed points (gaps) (1-5 points and beyond) with tailored strategies for segments at the start, middle, or end of the series. KZImputer employs a hybrid strategy to handle various missing data scenarios. Its core mechanism differentiates between gaps at the beginning, middle, or end of the series, applying tailored techniques at each position to optimize imputation accuracy. The method leverages linear interpolation and localized statistical measures, adapting to the characteristics of the surrounding data and the gap size. The performance of KZImputer has been systematically evaluated against established imputation techniques, demonstrating its potential to enhance data quality for subsequent time series analysis. This paper describes the KZImputer methodology in detail and discusses its effectiveness in improving the integrity of time series data. Empirical analysis demonstrates that KZImputer achieves particularly strong performance for datasets with high missingness rates (around 50% or more), maintaining stable and competitive results across statistical and signal-reconstruction metrics. The method proves especially effective in high-sparsity regimes, where traditional approaches typically experience accuracy degradation.
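
To make the idea of position-aware gap filling concrete, here is a minimal sketch. It is not KZImputer itself: the abstract does not give the method's rules, so the choices below (nearest observed value for gaps at the start or end, linear interpolation for interior gaps) and the name `impute_position_aware` are assumptions for illustration only.

```python
import math

def impute_position_aware(series):
    """Fill NaN gaps in a univariate series.

    Assumed strategy (illustrative, not the published method):
    - leading and trailing gaps take the nearest observed value;
    - interior gaps are filled by linear interpolation between neighbours.
    """
    values = list(series)
    observed = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not observed:
        raise ValueError("series has no observed values")
    first, last = observed[0], observed[-1]

    # Gap at the start: copy the first observed value backwards.
    for i in range(first):
        values[i] = values[first]
    # Gap at the end: copy the last observed value forwards.
    for i in range(last + 1, len(values)):
        values[i] = values[last]
    # Interior gaps: interpolate between consecutive observed points.
    for left, right in zip(observed, observed[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            values[i] = values[left] + weight * (values[right] - values[left])
    return values

nan = float("nan")
filled = impute_position_aware([nan, 2.0, nan, nan, 8.0, nan])
print(filled)
```

In the example, the leading and trailing gaps are filled with 2.0 and 8.0, and the two interior points fall on the straight line between 2.0 and 8.0.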



Article 1256

Title@2025-07-03 (4): MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel

Title: MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel MaizeField3D: Ein kuratierter 3D-Punkt-Cloud- und Verfahrensmodell-Datensatz von Feld-Grown Maize aus einem Diversity-Panel Maize Fire3D:来自多样性小组的3D点云和实地增长磁场程序模型数据集 2503.07813v3

Authors (9): Elvis Kimara, Mozhgan Hadadi, Jackson Godbersen, Aditya Balu, Talukder Jubery, Yawei Li, Adarsh Krishnamurthy, Patrick S. Schnable, Baskar Ganapathysubramanian

The development of artificial intelligence (AI) and machine learning (ML) based tools for 3D phenotyping, especially for maize, has been limited due to the lack of large and diverse 3D datasets. 2D image datasets fail to capture essential structural details such as leaf architecture, plant volume, and spatial arrangements that 3D data provide. To address this limitation, we present MaizeField3D (https://baskargroup.github.io/MaizeField3D/), a curated dataset of 3D point clouds of field-grown maize plants from a diverse genetic panel, designed to be AI-ready for advancing agricultural research. Our dataset includes 1,045 high-quality point clouds of field-grown maize collected using a terrestrial laser scanner (TLS). Point clouds of 520 plants from this dataset were segmented and annotated using a graph-based segmentation method to isolate individual leaves and stalks, ensuring consistent labeling across all samples. This labeled data was then used for fitting procedural models that provide a structured parametric representation of the maize plants. The leaves of the maize plants in the procedural models are represented using Non-Uniform Rational B-Spline (NURBS) surfaces that were generated using a two-step optimization process combining gradient-free and gradient-based methods. We conducted rigorous manual quality control on all datasets, correcting errors in segmentation, ensuring accurate leaf ordering, and validating metadata annotations. The dataset also includes metadata detailing plant morphology and quality, alongside multi-resolution subsampled point cloud data (100k, 50k, 10k points), which can be readily used for different downstream computational tasks. MaizeField3D will serve as a comprehensive foundational dataset for AI-driven phenotyping, plant structural analysis, and 3D applications in agricultural research.



Article 1257

Title@2025-07-03 (4): Contextual Online Pricing with (Biased) Offline Data

Title: Contextual Online Pricing with (Biased) Offline Data Kontextuelle Online-Preise mit (Biased) Offline-Daten 在线在线在线在线上上上上下线数据(比亚) 2507.02762v1

Authors (3): Yixuan Zhang, Ruihao Zhu, Qiaomin Xie

We study contextual online pricing with biased offline data. For the scalar price elasticity case, we identify the instance-dependent quantity $\delta^2$ that measures how far the offline data lies from the (unknown) online optimum. We show that the time length $T$, bias bound $V$, size $N$ and dispersion $\lambda_{\min}(\hat{\Sigma})$ of the offline data, and $\delta^2$ jointly determine the statistical complexity. An Optimism-in-the-Face-of-Uncertainty (OFU) policy achieves a minimax-optimal, instance-dependent regret bound $\tilde{\mathcal{O}}\big(d\sqrt{T} \wedge (V^2T + \frac{dT}{\lambda_{\min}(\hat{\Sigma}) + (N \wedge T) \delta^2})\big)$. For general price elasticity, we establish a worst-case, minimax-optimal rate $\tilde{\mathcal{O}}\big(d\sqrt{T} \wedge (V^2T + \frac{dT }{\lambda_{\min}(\hat{\Sigma})})\big)$ and provide a generalized OFU algorithm that attains it. When the bias bound $V$ is unknown, we design a robust variant that always guarantees sub-linear regret and strictly improves on purely online methods whenever the exact bias is small. These results deliver the first tight regret guarantees for contextual pricing in the presence of biased offline data. Our techniques also transfer verbatim to stochastic linear bandits with biased offline data, yielding analogous bounds.
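
The scalar-elasticity rate above can be evaluated numerically to see when offline data helps. The sketch below drops constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$, so it shows only how the bound scales; the function name and the example values are hypothetical.

```python
import math

def regret_bound_scalar(d, T, V, N, lam_min, delta_sq):
    """Evaluate d*sqrt(T) ∧ (V^2*T + d*T / (lam_min + (N ∧ T) * delta_sq)).

    Constants and log factors are omitted; requires lam_min + min(N, T) * delta_sq > 0.
    """
    online_only = d * math.sqrt(T)
    offline_aided = V ** 2 * T + d * T / (lam_min + min(N, T) * delta_sq)
    return min(online_only, offline_aided)

# Hypothetical settings: unbiased versus heavily biased offline data.
small_bias = regret_bound_scalar(d=1, T=100, V=0.0, N=100, lam_min=1.0, delta_sq=1.0)
large_bias = regret_bound_scalar(d=1, T=100, V=1.0, N=100, lam_min=1.0, delta_sq=1.0)
print(small_bias, large_bias)
```

With no bias the offline-aided term dominates and the value falls below $d\sqrt{T}$; with a large bias the $V^2T$ term takes over and the bound reverts to the purely online rate.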



Article 1258

Title@2025-07-03 (4): Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness

Title: Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness Quantifizierung der sektorübergreifenden Intersecting-Diskrepanzen innerhalb mehrerer Gruppen mittels Latent Class-Analyse zur Fairness 量化多个群体内部的跨部门交叉差异,利用编内分类分析进行量化,以实现公平 2407.03133v4

Authors (5): Yingfang Yuan, Kefan Chen, Mehdi Rizvi, Lynne Baillie, Wei Pang

The growing interest in fair AI development is evident. The “Leave No One Behind” initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation and service scheme development, across various sectors such as health, energy, and housing. Therefore, exploring joint inequalities in these sectors is significant and valuable for thoroughly understanding overall inequality and unfairness. This research introduces an innovative approach to quantify cross-sectoral intersecting discrepancies among user-defined groups using latent class analysis. These discrepancies can be used to approximate inequality and provide valuable insights into fairness issues. We validate our approach using both proprietary and public datasets, including both EVENS and Census 2021 (England & Wales) datasets, to examine cross-sectoral intersecting discrepancies among different ethnic groups. We also verify the reliability of the quantified discrepancy by conducting a correlation analysis with a government public metric. Our findings reveal significant discrepancies both among minority ethnic groups and between minority ethnic groups and non-minority ethnic groups, emphasising the need for targeted interventions in policy-making processes. Furthermore, we demonstrate how the proposed approach can provide valuable insights into ensuring fairness in machine learning systems.



Article 1259

Title@2025-07-03 (4): Fast and Simplex: 2-Simplicial Attention in Triton

Title: Fast and Simplex: 2-Simplicial Attention in Triton Schnell und Simplex: 2-Simplizielle Aufmerksamkeit in Triton 快速和简单: Tritton 中的 2 简化注意 2507.02754v1

Authors (8): Aurko Roy, Timothy Chou, Sai Surya Duvvuri, Sijia Chen, Jiecao Yu, Xiaodong Wang, Manzil Zaheer, Rohan Anil

Recent work has shown that training loss scales as a power law with both model size and the number of tokens, and that achieving compute-optimal models requires scaling model size and token count together. However, these scaling laws assume an infinite supply of data and apply primarily in compute-bound settings. As modern large language models increasingly rely on massive internet-scale datasets, the assumption that they are compute-bound is becoming less valid. This shift highlights the need for architectures that prioritize token efficiency. In this work, we investigate the use of the 2-simplicial Transformer, an architecture that generalizes standard dot-product attention to trilinear functions through an efficient Triton kernel implementation. We demonstrate that the 2-simplicial Transformer achieves better token efficiency than standard Transformers: for a fixed token budget, similarly sized models outperform their dot-product counterparts on tasks involving mathematics, coding, reasoning, and logic. We quantify these gains by demonstrating that $2$-simplicial attention changes the exponent in the scaling laws for knowledge and reasoning tasks compared to dot product attention.
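
A naive reference version of trilinear attention may help fix ideas. The abstract only states that dot-product attention is generalized to trilinear functions, so the specific form below is an assumption: logits are the three-way product of a query with two keys, the softmax runs over key pairs, and the output combines two value vectors elementwise. This is a cubic-time pure-Python sketch, not the paper's Triton kernel, and all names are hypothetical.

```python
import math

def two_simplicial_attention(q, k1, k2, v1, v2):
    """Naive trilinear attention over n positions with d channels.

    Assumed form: logit[i][j][l] = sum_c q[i][c] * k1[j][c] * k2[l][c] / sqrt(d),
    softmax over all (j, l) pairs, output[i] = sum_{j,l} w[j][l] * (v1[j] * v2[l]).
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for i in range(n):
        # Trilinear logits for every pair of key positions.
        logits = {}
        for j in range(n):
            for l in range(n):
                logits[(j, l)] = scale * sum(
                    q[i][c] * k1[j][c] * k2[l][c] for c in range(d)
                )
        # Numerically stable softmax over pairs.
        top = max(logits.values())
        weights = {pair: math.exp(s - top) for pair, s in logits.items()}
        norm = sum(weights.values())
        # Weighted sum of elementwise products of the two value vectors.
        row = [0.0] * d
        for (j, l), w in weights.items():
            for c in range(d):
                row[c] += (w / norm) * v1[j][c] * v2[l][c]
        outputs.append(row)
    return outputs

# Zero queries give uniform weights, so each output is the mean over pairs.
q = [[0.0, 0.0], [0.0, 0.0]]
k1 = [[1.0, 0.0], [0.0, 1.0]]
k2 = [[1.0, 1.0], [0.5, 0.5]]
v1 = [[1.0, 2.0], [3.0, 4.0]]
v2 = [[1.0, 1.0], [1.0, 1.0]]
out = two_simplicial_attention(q, k1, k2, v1, v2)
print(out)
```

The loop over pairs of key positions is what makes the naive form cubic in sequence length, which is the cost an efficient kernel has to manage.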



Article 1260

Title@2025-07-03 (4): Towards a Novel Measure of User Trust in XAI Systems

Title: Towards a Novel Measure of User Trust in XAI Systems Auf dem Weg zu einem neuartigen Maß für das Vertrauen der Nutzer in XAI-Systeme 争取对XAI系统用户信任的新措施 2405.05766v2

Authors (7): Miquel Miró-Nicolau, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Manuel González-Hidalgo, Adel Ghazel, Maria Gemma Sempere Campello, Juan Antonio Palmer Sancho

The increasing reliance on Deep Learning models, combined with their inherent lack of transparency, has spurred the development of a novel field of study known as eXplainable AI (XAI) methods. These methods seek to enhance the trust of end-users in automated systems by providing insights into the rationale behind their decisions. This paper presents a novel trust measure in XAI systems, allowing their refinement. Our proposed metric combines both performance metrics and trust indicators from an objective perspective. To validate this novel methodology, we conducted three case studies showing an improvement with respect to the state of the art, with an increased sensitivity to different scenarios.



Article 1261

Title@2025-07-03 (4): Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics

Title: Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics Lineare Aufmerksamkeit mit globalem Kontext: Ein mehrpoliger Aufmerksamkeitsmechanismus für Vision und Physik 全球一线关注:视觉和物理多极关注机制 2507.02748v1

Authors (4): Alex Colagrande, Paul Caillon, Eva Feillet, Alexandre Allauzen

Transformers have become the de facto standard for a wide range of tasks, from image classification to physics simulations. Despite their impressive performance, the quadratic complexity of standard Transformers in both memory and time with respect to the input length makes them impractical for processing high-resolution inputs. Therefore, several variants have been proposed, the most successful relying on patchification, downsampling, or coarsening techniques, often at the cost of losing the finest-scale details. In this work, we take a different approach. Inspired by state-of-the-art techniques in $n$-body numerical simulations, we cast attention as an interaction problem between grid points. We introduce the Multipole Attention Neural Operator (MANO), which computes attention in a distance-based multiscale fashion. MANO maintains, in each attention head, a global receptive field and achieves linear time and memory complexity with respect to the number of grid points. Empirical results on image classification and Darcy flows demonstrate that MANO rivals state-of-the-art models such as ViT and Swin Transformer, while reducing runtime and peak memory usage by orders of magnitude. We open source our code for reproducibility at https://github.com/AlexColagrande/MANO.



Article 1262

Title@2025-07-03 (4): Early Signs of Steganographic Capabilities in Frontier LLMs

Title: Early Signs of Steganographic Capabilities in Frontier LLMs Frühe Anzeichen von Steganographischen Fähigkeiten in Frontier LLMs 边疆长长体动物能力早期信号 2507.02737v1

Authors (5): Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner

Monitoring Large Language Model (LLM) outputs is crucial for mitigating risks from misuse and misalignment. However, LLMs could evade monitoring through steganography: Encoding hidden information within seemingly benign generations. In this paper, we evaluate the steganography capabilities in frontier LLMs to better understand the risk they pose. We focus on two types of steganography: passing encoded messages and performing encoded reasoning. We find that current models are unable to encode short messages in their outputs without a monitor noticing under standard affordances. They can succeed, however, if given additional affordances such as using an unmonitored scratchpad and coordinating on what encoding scheme to use. We additionally find early signs that models can perform basic encoded reasoning in a simple state-tracking problem. This includes some ability to reason with their own and pre-defined schemes, including encoding schemes such as Hexadecimal. Despite this, they can rarely hide reasoning subtly within a cover task to fool a monitor. Overall, our results indicate that current LLMs exhibit nascent steganographic capabilities. While these capabilities are likely insufficient to bypass well-designed monitors at present, this could change in the future.



Article 1263

Title@2025-07-03 (4): Classification by Separating Hypersurfaces: An Entropic Approach

Title: Classification by Separating Hypersurfaces: An Entropic Approach Klassifizierung durch Trennen von Hyperoberflächen: Ein entropischer Ansatz 按分离超表层分类: 2507.02732v1

Authors (3): Argimiro Arratia, Mahmoud El Daou, Henryk Gzyl

We consider the following classification problem: Given a population of individuals characterized by a set of attributes represented as a vector in ${\mathbb R}^N$, the goal is to find a hyperplane in ${\mathbb R}^N$ that separates two sets of points corresponding to two distinct classes. This problem, with a history dating back to the perceptron model, remains central to machine learning. In this paper we propose a novel approach by searching for a vector of parameters in a bounded $N$-dimensional hypercube centered at the origin and a positive vector in ${\mathbb R}^M$, obtained through the minimization of an entropy-based function defined over the space of unknown variables. The method extends to polynomial surfaces, allowing the separation of data points by more complex decision boundaries. This provides a robust alternative to traditional linear or quadratic optimization techniques, such as support vector machines and gradient descent. Numerical experiments demonstrate the efficiency and versatility of the method in handling diverse classification tasks, including linear and non-linear separability.



Article 1264

Title@2025-07-03 (4): Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving

Title: Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving Bourbaki: Selbsterzeugte und zielkonditionierte MDPs für Theorem-Proving Bourbaki:为理论证明提供自创和有目标、有条件的MDP 2507.02726v1

Authors (6): Matthieu Zimmer, Xiaotong Ji, Rasul Tutunov, Anthony Bordg, Jun Wang, Haitham Bou Ammar

Reasoning remains a challenging task for large language models (LLMs), especially within the logically constrained environment of automated theorem proving (ATP), due to sparse rewards and the vast scale of proofs. These challenges are amplified in benchmarks like PutnamBench, which contains university-level problems requiring complex, multi-step reasoning. To address this, we introduce self-generated goal-conditioned MDPs (sG-MDPs), a new framework in which agents generate and pursue their subgoals based on the evolving proof state. Given this more structured generation of goals, the resulting problem becomes more amenable to search. We then apply Monte Carlo Tree Search (MCTS)-like algorithms to solve the sG-MDP, instantiating our approach in Bourbaki (7B), a modular system that can ensemble multiple 7B LLMs for subgoal generation and tactic synthesis. On PutnamBench, Bourbaki (7B) solves 26 problems, achieving new state-of-the-art results with models at this scale.



Article 1265

Title@2025-07-03 (4): Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms

Title: Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms Hierarchisches Multi-Label Kontrastives Lernen für Protein-Protein-Interaktionsvorhersage über Organismen hinweg 高等级多标签差异学习,促进各种组织间蛋白因-蛋白因-蛋白质-蛋白因互动预测 2507.02724v1

Authors (5): Shiyi Liu, Buwen Liang, Yuetong Fang, Zixuan Jiang, Renjing Xu

Recent advances in AI for science have highlighted the power of contrastive learning in bridging heterogeneous biological data modalities. Building on this paradigm, we propose HIPPO (HIerarchical Protein-Protein interaction prediction across Organisms), a hierarchical contrastive framework for protein-protein interaction (PPI) prediction, where protein sequences and their hierarchical attributes are aligned through multi-tiered biological representation matching. The proposed approach incorporates hierarchical contrastive loss functions that emulate the structured relationship among functional classes of proteins. The framework adaptively incorporates domain and family knowledge through a data-driven penalty mechanism, enforcing consistency between the learned embedding space and the intrinsic hierarchy of protein functions. Experiments on benchmark datasets demonstrate that HIPPO achieves state-of-the-art performance, outperforming existing methods and showing robustness in low-data regimes. Notably, the model demonstrates strong zero-shot transferability to other species without retraining, enabling reliable PPI prediction and functional inference even in less characterized or rare organisms where experimental data are limited. Further analysis reveals that hierarchical feature fusion is critical for capturing conserved interaction determinants, such as binding motifs and functional annotations. This work advances cross-species PPI prediction and provides a unified framework for interaction prediction in scenarios with sparse or imbalanced multi-species data.



Article 1266

Title@2025-07-03 (4): On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability

Title: On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability Über Charakterisierungen für die Sprachgenerierung: Interplay von Halluzinationen, Breadth und Stabilität 语言生成特征:幻觉、面包和稳定之间的相互作用 2412.18530v2

Authors (3): Alkis Kalavasis, Anay Mehrotra, Grigoris Velegkas

We study language generation in the limit - introduced by Kleinberg and Mullainathan [KM24] - building on classical works of Gold [Gol67] and Angluin [Ang79]. [KM24]’s main result is an algorithm for generating from any countable language collection in the limit. While their algorithm eventually generates unseen strings from the target language $K$, it sacrifices coverage or breadth, i.e., its ability to generate a rich set of strings. Recent work introduces different notions of breadth and explores when generation with breadth is possible, leaving a full characterization of these notions open. Our first set of results settles this by characterizing generation for existing notions of breadth and their natural extensions. Interestingly, our lower bounds are very flexible and hold for many performance metrics beyond breadth - for instance, showing that, in general, it is impossible to train generators which achieve a higher perplexity or lower hallucination rate for $K$ compared to other languages. Next, we study language generation with breadth and stable generators - algorithms that eventually stop changing after seeing an arbitrary but finite number of strings - and prove unconditional lower bounds for such generators, strengthening the results of [KMV25] and demonstrating that generation with many existing notions of breadth becomes equally hard, when stability is required. This gives a separation for generation with approximate breadth, between stable and unstable generators, highlighting the rich interplay between breadth, stability, and consistency in language generation.



Article 1267

Title@2025-07-03 (4): Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification

Title: Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification Gradient-Based Model Fingerprinting für LLM Ähnlichkeitserkennung und Familienklassifizierung LLM相似性探测和家庭分类的渐进式样指纹 2506.01631v2

Authors (3): Zehao Wu, Yanjie Zhao, Haoyu Wang

As Large Language Models (LLMs) become integral software components in modern applications, unauthorized model derivations through fine-tuning, merging, and redistribution have emerged as critical software engineering challenges. Unlike traditional software where clone detection and license compliance are well-established, the LLM ecosystem lacks effective mechanisms to detect model lineage and enforce licensing agreements. This gap is particularly problematic when open-source model creators, such as Meta’s LLaMA, require derivative works to maintain naming conventions for attribution, yet no technical means exist to verify compliance. To fill this gap, treating LLMs as software artifacts requiring provenance tracking, we present TensorGuard, a gradient-based fingerprinting framework for LLM similarity detection and family classification. Our approach extracts model-intrinsic behavioral signatures by analyzing gradient responses to random input perturbations across tensor layers, operating independently of training data, watermarks, or specific model formats. TensorGuard supports the widely-adopted safetensors format and constructs high-dimensional fingerprints through statistical analysis of gradient features. These fingerprints enable two complementary capabilities: direct pairwise similarity assessment between arbitrary models through distance computation, and systematic family classification of unknown models via the K-Means clustering algorithm with domain-informed centroid initialization using known base models. Experimental evaluation on 58 models comprising 8 base models and 50 derivatives across five model families (Llama, Qwen, Gemma, Phi, Mistral) demonstrates 94% classification accuracy under our centroid-initialized K-Means clustering.
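
The family-classification step, K-Means with centroids initialized from known base models, can be sketched in plain Python. This is an illustration under stated assumptions, not TensorGuard's implementation: the fingerprints here are made-up two-dimensional vectors, and `kmeans_with_seeds`, `family_a`, and the model labels are hypothetical names.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_with_seeds(points, seeds, iterations=10):
    """K-Means where each cluster starts at a known base model's fingerprint.

    points: {model_label: fingerprint}, seeds: {family_name: base fingerprint}.
    Returns {model_label: family_name}.
    """
    centroids = {name: list(vec) for name, vec in seeds.items()}
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each model joins the family with the nearest centroid.
        for label, vec in points.items():
            best_name, best_dist = None, float("inf")
            for name, centre in centroids.items():
                dist = squared_distance(vec, centre)
                if dist < best_dist:
                    best_name, best_dist = name, dist
            assignment[label] = best_name
        # Update step: move each centroid to the mean of its members.
        for name in centroids:
            members = [points[l] for l, fam in assignment.items() if fam == name]
            if members:
                centroids[name] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Made-up fingerprints for two hypothetical families and three unknown models.
seeds = {"family_a": [0.0, 0.0], "family_b": [10.0, 10.0]}
points = {"m1": [0.5, 0.2], "m2": [9.0, 9.5], "m3": [1.0, 0.0]}
families = kmeans_with_seeds(points, seeds)
print(families)
```

Seeding the centroids with base-model fingerprints ties each cluster to a named family from the start, so the resulting cluster labels need no post-hoc matching.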



Article 1268

Title@2025-07-03 (4): A Comprehensive Machine Learning Framework for Micromobility Demand Prediction

Title: A Comprehensive Machine Learning Framework for Micromobility Demand Prediction Umfassendes Machine Learning Framework für die Nachfragevorhersage der Mikromobilität 微观流动需求预测综合机械学习框架 2507.02715v1

Authors (3): Omri Porat, Michael Fire, Eran Ben-Elia

Dockless e-scooters, a key micromobility service, have emerged as eco-friendly and flexible urban transport alternatives. These services improve first and last-mile connectivity, reduce congestion and emissions, and complement public transport for short-distance travel. However, effective management of these services depends on accurate demand prediction, which is crucial for optimal fleet distribution and infrastructure planning. While previous studies have focused on analyzing spatial or temporal factors in isolation, this study introduces a framework that integrates spatial, temporal, and network dependencies for improved micromobility demand forecasting. This integration enhances accuracy while providing deeper insights into urban micromobility usage patterns. Our framework improves demand prediction accuracy by 27 to 49% over baseline models, demonstrating its effectiveness in capturing micromobility demand patterns. These findings support data-driven micromobility management, enabling optimized fleet distribution, cost reduction, and sustainable urban planning.



Article 1269

Title@2025-07-03 (4): A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control

Title: A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control Eine Forget-and-Grow-Strategie für tiefe Verstärkungs-Lernskalierung in kontinuierlicher Kontrolle 连续控制深入加强学习的 “ 遗忘与增长 “ 战略 2507.02712v1

Authors (6): Zilin Kang, Chenyuan Hu, Yu Luo, Zhecheng Yuan, Ruijie Zheng, Huazhe Xu

Deep reinforcement learning for continuous control has recently achieved impressive progress. However, existing methods often suffer from primacy bias, a tendency to overfit early experiences stored in the replay buffer, which limits an RL agent’s sample efficiency and generalizability. In contrast, humans are less susceptible to such bias, partly due to infantile amnesia, where the formation of new neurons disrupts early memory traces, leading to the forgetting of initial experiences. Inspired by these dual processes of forgetting and growing in neuroscience, in this paper, we propose Forget and Grow (FoG), a new deep RL algorithm with two mechanisms. First, Experience Replay Decay (ER Decay), “forgetting early experience”, which balances memory by gradually reducing the influence of early experiences. Second, Network Expansion, “growing neural capacity”, which enhances agents’ capability to exploit the patterns of existing data by dynamically adding new parameters during training. Empirical results on four major continuous control benchmarks with more than 40 tasks demonstrate the superior performance of FoG against existing SoTA deep RL algorithms, including BRO, SimBa, and TD-MPC2.



Article 1270

Title@2025-07-03 (4): Fluid Democracy in Federated Data Aggregation

Title: Fluid Democracy in Federated Data Aggregation Fluide Demokratie in Federated Data Aggregation 联邦数据聚合中流体民主 2507.02710v1

Authors (2): Aditya Vema Reddy Kesari, Krishna Reddy Kesari

Federated learning (FL) mechanisms typically require each client to transfer their weights to a central server, irrespective of how useful they are. In order to avoid wasteful data transfer costs from clients to the central server, we propose the use of consensus-based protocols to identify a subset of clients with the most useful model weights at each data transfer step. First, we explore the application of existing fluid democracy protocols to FL from a performance standpoint, comparing them with traditional one-person-one-vote (also known as 1p1v or FedAvg). We propose a new fluid democracy protocol named viscous-retained democracy that always does better than 1p1v under the same assumptions as existing fluid democracy protocols while also not allowing for influence accumulation. Secondly, we identify weaknesses of fluid democracy protocols from an adversarial lens in terms of their dependence on topology and/or number of adversaries required to negatively impact the global model weights. To this effect, we propose an algorithm (FedVRD) that dynamically limits the effect of adversaries while minimizing cost by leveraging the delegation topology.



Article 1271

Title@2025-07-03 (4): Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions

Title: Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions Multi-Agenten-Verstärkungs-Lernen für dynamische Preise in Lieferketten: Benchmarking Strategic Agent Verhaltensmuster unter realistisch simulierten Marktbedingungen 供应链动态定价多机构强化学习:在现实模拟市场条件下确定战略代理行为基准 2507.02698v1

Authors (4): Thomas Hazenberg, Yao Ma, Seyed Sahand Mohammadi Ziabari, Marijn van Rijswijk

This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains, particularly in contexts where traditional ERP systems rely on static, rule-based approaches that overlook strategic interactions among market actors. While recent research has applied reinforcement learning to pricing, most implementations remain single-agent and fail to model the interdependent nature of real-world supply chains. This study addresses that gap by evaluating the performance of three MARL algorithms (MADDPG, MADQN, and QMIX) against static rule-based baselines, within a simulated environment informed by real e-commerce transaction data and a LightGBM demand prediction model. Results show that rule-based agents achieve near-perfect fairness (Jain’s Index: 0.9896) and the highest price stability (volatility: 0.024), but they fully lack competitive dynamics. Among MARL agents, MADQN exhibits the most aggressive pricing behaviour, with the highest volatility and the lowest fairness (0.5844). MADDPG provides a more balanced approach, supporting market competition (share volatility: 9.5 pp) while maintaining relatively high fairness (0.8819) and stable pricing. These findings suggest that MARL introduces emergent strategic behaviour not captured by static pricing rules and may inform future developments in dynamic pricing.
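
For readers unfamiliar with the fairness metric quoted above, Jain's index for $n$ non-negative allocations $x_1, \dots, x_n$ is $(\sum_i x_i)^2 / (n \sum_i x_i^2)$, ranging from $1/n$ (one agent takes everything) to 1 (perfectly equal). The sketch below computes it; the abstract does not say which per-agent quantity the study feeds into the index, so the example allocations are hypothetical.

```python
def jains_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(allocations)
    total = sum(allocations)
    squares = sum(x * x for x in allocations)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero allocation")
    return total * total / (n * squares)

# Hypothetical per-agent allocations (for example, revenue per agent).
equal = jains_index([10.0, 10.0, 10.0, 10.0])   # perfectly even split
skewed = jains_index([40.0, 0.0, 0.0, 0.0])     # one agent takes everything
print(equal, skewed)
```

An even split across four agents scores 1.0 and a winner-takes-all split scores 0.25, which puts the reported values of 0.9896, 0.8819, and 0.5844 in context.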



Article 1272

Title@2025-07-03 (4): RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes

Title: RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes RLHGNN: Verstärkung Lernorientiertes Heterogenes Graph Neuronales Netzwerk für die nächste Aktivitätsvorhersage in Geschäftsprozessen RLHGNN: 业务流程下一个活动预测的强化学习驱动的异质图形神经网络 2507.02690v1

Authors (6): Jiaxing Wang, Yifeng Yu, Jiahan Song, Bin Cao, Jing Fan, Ji Zhang

Next activity prediction represents a fundamental challenge for optimizing business processes in service-oriented architectures such as microservices environments, distributed enterprise systems, and cloud-native platforms, which enables proactive resource allocation and dynamic service composition. Despite the prevalence of sequence-based methods, these approaches fail to capture non-sequential relationships that arise from parallel executions and conditional dependencies. Even though graph-based approaches address structural preservation, they suffer from homogeneous representations and static structures that apply uniform modeling strategies regardless of individual process complexity characteristics. To address these limitations, we introduce RLHGNN, a novel framework that transforms event logs into heterogeneous process graphs with three distinct edge types grounded in established process mining theory. Our approach creates four flexible graph structures by selectively combining these edges to accommodate different process complexities, and employs reinforcement learning formulated as a Markov Decision Process to automatically determine the optimal graph structure for each specific process instance. RLHGNN then applies heterogeneous graph convolution with relation-specific aggregation strategies to effectively predict the next activity. This adaptive methodology enables precise modeling of both sequential and non-sequential relationships in service interactions. Comprehensive evaluation on six real-world datasets demonstrates that RLHGNN consistently outperforms state-of-the-art approaches. Furthermore, it maintains an inference latency of approximately 1 ms per prediction, representing a highly practical solution suitable for real-time business process monitoring applications. The source code is available at https://github.com/Joker3993/RLHGNN.



Article 1273

Title@2025-07-03 (4): HAPI: A Model for Learning Robot Facial Expressions from Human Preferences

Title: HAPI: A Model for Learning Robot Facial Expressions from Human Preferences HAPI: Ein Modell zum Lernen von Roboter-Gesichtsausdrücken aus menschlichen Präferenzen HAPI: 人类首选机器人法表达方式学习模式 2503.17046v2

Authors (6): Dongsheng Yang, Qianying Liu, Wataru Sato, Takashi Minato, Chaoran Liu, Shin’ya Nishida

Automatic robotic facial expression generation is crucial for human-robot interaction, as handcrafted methods based on fixed joint configurations often yield rigid and unnatural behaviors. Although recent automated techniques reduce the need for manual tuning, they tend to fall short by not adequately bridging the gap between human preferences and model predictions, resulting in a deficiency of nuanced and realistic expressions due to limited degrees of freedom and insufficient perceptual integration. In this work, we propose a novel learning-to-rank framework that leverages human feedback to address this discrepancy and enhance the expressiveness of robotic faces. Specifically, we conduct pairwise comparison annotations to collect human preference data and develop the Human Affective Pairwise Impressions (HAPI) model, a Siamese RankNet-based approach that refines expression evaluation. Results obtained via Bayesian Optimization and an online expression survey on a 35-DOF android platform demonstrate that our approach produces significantly more realistic and socially resonant expressions of Anger, Happiness, and Surprise than those generated by baseline and expert-designed methods. This confirms that our framework effectively bridges the gap between human preferences and model predictions while robustly aligning robotic expression generation with human affective responses.
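
To make the pairwise learning-to-rank idea concrete, here is a minimal sketch of the standard RankNet loss for one annotated pair. The abstract does not give HAPI's exact loss or architecture, so this is the textbook formulation rather than the authors' implementation; `ranknet_pair_loss` and its arguments are hypothetical names. In a Siamese setup, both scores would come from the same scoring network applied to each expression.

```python
import math


def ranknet_pair_loss(score_a, score_b, a_preferred):
    """Cross-entropy loss for one pairwise preference (standard RankNet form).

    The model's probability that A is preferred is sigmoid(score_a - score_b).
    The loss is -log of the probability assigned to the annotated outcome,
    computed via a numerically stable softplus.
    """
    diff = score_a - score_b
    # -log(sigmoid(diff)) == softplus(-diff); -log(1 - sigmoid(diff)) == softplus(diff)
    z = -diff if a_preferred else diff
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical scores for two candidate expressions.
agrees = ranknet_pair_loss(2.0, 0.0, a_preferred=True)     # model agrees with annotator
disagrees = ranknet_pair_loss(0.0, 2.0, a_preferred=True)  # model disagrees
```

The loss is small when the scorer ranks the pair the way the annotator did and grows roughly linearly with the score gap when it ranks them the other way.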



Article 1274

Title@2025-07-03 (4): Learning few-step posterior samplers by unfolding and distillation of diffusion models

Title: Learning few-step posterior samplers by unfolding and distillation of diffusion models Lernen der wenigen Schritte hinteren Sampler durch Entfalten und Destillation von Diffusionsmodellen 通过推广模型的推出和蒸馏,学习几步后后后继采样器 2507.02686v1

Authors (3): Charlesquin Kemajou Mbakam, Jonathan Spence, Marcelo Pereyra

Diffusion models (DMs) have emerged as powerful image priors in Bayesian computational imaging. Two primary strategies have been proposed for leveraging DMs in this context: Plug-and-Play methods, which are zero-shot and highly flexible but rely on approximations; and specialized conditional DMs, which achieve higher accuracy and faster inference for specific tasks through supervised training. In this work, we introduce a novel framework that integrates deep unfolding and model distillation to transform a DM image prior into a few-step conditional model for posterior sampling. A central innovation of our approach is the unfolding of a Markov chain Monte Carlo (MCMC) algorithm - specifically, the recently proposed LATINO Langevin sampler (Spagnoletti et al., 2025) - representing the first known instance of deep unfolding applied to a Monte Carlo sampling scheme. We demonstrate our proposed unfolded and distilled samplers through extensive experiments and comparisons with the state of the art, where they achieve excellent accuracy and computational efficiency, while retaining the flexibility to adapt to variations in the forward model at inference time.



Article 1275

Title@2025-07-03 (4): Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs

Title: Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs Einbetten-basierter Federated Data Sharing über differentially Private Conditional VAEs 通过差异私人条件性自愿和有条件的 嵌入式基于嵌入式的联邦联邦数据分享 2507.02671v1

Authors (3): Francesco Di Salvo, Hanh Huyen My Nguyen, Christian Ledig

Deep Learning (DL) has revolutionized medical imaging, yet its adoption is constrained by data scarcity and privacy regulations, limiting access to diverse datasets. Federated Learning (FL) enables decentralized training but suffers from high communication costs and is often restricted to a single downstream task, reducing flexibility. We propose a data-sharing method via Differentially Private (DP) generative models. By adopting foundation models, we extract compact, informative embeddings, reducing redundancy and lowering computational overhead. Clients collaboratively train a Differentially Private Conditional Variational Autoencoder (DP-CVAE) to model a global, privacy-aware data distribution, supporting diverse downstream tasks. Our approach, validated across multiple feature extractors, enhances privacy, scalability, and efficiency, outperforming traditional FL classifiers while ensuring differential privacy. Additionally, DP-CVAE produces higher-fidelity embeddings than DP-CGAN while requiring $5{\times}$ fewer parameters.



Article 1276

Title@2025-07-03 (4): Guided Generation for Developable Antibodies

Title: Guided Generation for Developable Antibodies Geführte Generation für entwickelbare Antikörper 可开发抗体的代人制 2507.02670v1

Authors (4): Siqi Zhao, Joshua Moller, Porfi Quintero-Cadena, Lood van Niekerk

Therapeutic antibodies require not only high-affinity target engagement, but also favorable manufacturability, stability, and safety profiles for clinical effectiveness. These properties are collectively called ‘developability’. To enable a computational framework for optimizing antibody sequences for favorable developability, we introduce a guided discrete diffusion model trained on natural paired heavy- and light-chain sequences from the Observed Antibody Space (OAS) and quantitative developability measurements for 246 clinical-stage antibodies. To steer generation toward biophysically viable candidates, we integrate a Soft Value-based Decoding in Diffusion (SVDD) Module that biases sampling without compromising naturalness. In unconstrained sampling, our model reproduces global features of both the natural repertoire and approved therapeutics, and under SVDD guidance we achieve significant enrichment in predicted developability scores over unguided baselines. When combined with high-throughput developability assays, this framework enables an iterative, ML-driven pipeline for designing antibodies that satisfy binding and biophysical criteria in tandem.



Article 1277

Title@2025-07-03 (4): OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

Title: OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding OmniDraft: Ein Cross-Vocabulary, Online Adaptive Drafter für die gerätespezifische Dekodierung 总括草案:跨词汇、在线在线可调适性套用投机下限设计图纸 2507.02659v1

Authors (7): Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Shaojie Zhuo, Chen Feng, Yicheng Lin, Chenzheng Su, Xiaopeng Zhang

Speculative decoding generally requires a small, efficient draft model that is either pretrained or distilled offline for a particular target model series, for instance, Llama or Qwen models. However, within online deployment settings, there are two major challenges: 1) usage of a target model that is incompatible with the draft model; 2) expectation of latency improvements over usage and time. In this work, we propose OmniDraft, a unified framework that enables a single draft model to operate with any target model and adapt dynamically to user data. We introduce an online n-gram cache with hybrid distillation fine-tuning to address the cross-vocabulary mismatch across draft and target models; and further improve decoding speed by leveraging adaptive drafting techniques. OmniDraft is particularly suitable for on-device LLM applications where model cost, efficiency and user customization are the major points of contention. This further highlights the need to tackle the above challenges and motivates the “one drafter for all” paradigm. We showcase the proficiency of the OmniDraft framework by performing online learning on math reasoning, coding and text generation tasks. Notably, OmniDraft enables a single Llama-68M model to pair with various target models including Vicuna-7B, Qwen2-7B and Llama3-8B models for speculative decoding; and additionally provides up to 1.5-2x speedup.



Article 1278

Title@2025-07-03 (4): Fair Deepfake Detectors Can Generalize

Title: Fair Deepfake Detectors Can Generalize Faire Deepfake-Detektoren können generalisieren 公平深假探测器可普遍化 2507.02645v1

Authors (6): Harry Cheng, Ming-Hui Liu, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli

Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness among population groups. However, existing approaches often demonstrate that these two objectives are inherently conflicting, revealing a trade-off between them. In this paper, we, for the first time, uncover and formally define a causal relationship between fairness and generalization. Building on the back-door adjustment, we show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. Across three cross-domain benchmarks, DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art detectors, validating both its theoretical foundation and practical effectiveness.
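
As a rough illustration of the inverse-propensity weighting mentioned for the data-rebalancing component, the sketch below weights each sample by the inverse of its demographic group's empirical frequency. How DAID actually estimates propensities is not stated in the abstract; the empirical-frequency estimate, the normalisation, and the name `inverse_propensity_weights` are assumptions made here for illustration.

```python
from collections import Counter


def inverse_propensity_weights(groups):
    """Per-sample weights proportional to 1 / P(group).

    P(group) is estimated as the group's share of the samples (an assumption
    made here). Weights are scaled so they sum to the number of samples,
    which gives every group the same total weight.
    """
    counts = Counter(groups)
    n = len(groups)
    k = len(counts)
    return [n / (k * counts[g]) for g in groups]


# Hypothetical group labels: group "a" is over-represented 3:1.
labels = ["a", "a", "a", "b"]
weights = inverse_propensity_weights(labels)
```

With these weights, the three samples of the majority group together carry the same total weight as the single minority sample, so a weighted loss no longer favours the over-represented group.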



Article 1279

Title@2025-07-03 (4): On Efficient Bayesian Exploration in Model-Based Reinforcement Learning

Title: On Efficient Bayesian Exploration in Model-Based Reinforcement Learning Effiziente bayesische Exploration im modellbasierten Ausbau-Lernen 关于在示范强化学习中高效探索贝耶斯人 2507.02639v1

Authors (3): Alberto Caron, Chris Hicks, Vasilios Mavroudis

In this work, we address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. Specifically, we focus on a class of exploration bonuses that targets epistemic uncertainty rather than the aleatoric noise inherent in the environment. We prove that these bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment’s dynamics and rewards, thereby aligning exploration with genuine knowledge gaps. Our analysis provides formal guarantees for IG-based approaches, which previously lacked theoretical grounding. To enable practical use, we also discuss tractable approximations via sparse variational Gaussian Processes, Deep Kernels and Deep Ensemble models. We then outline a general framework - Predictive Trajectory Sampling with Bayesian Exploration (PTS-BE) - which integrates model-based planning with information-theoretic bonuses to achieve sample-efficient deep exploration. We empirically demonstrate that PTS-BE substantially outperforms other baselines across a variety of environments characterized by sparse rewards and/or purely exploratory tasks.



Article 1280

Title@2025-07-03 (4): High-Order Deep Meta-Learning with Category-Theoretic Interpretation

Title: High-Order Deep Meta-Learning with Category-Theoretic Interpretation High-Order Deep Meta-Learning mit kategorie-theoretischer Interpretation 具有分类理论判读的高等级深层元元学习 2507.02634v1

Authors (1): David H. Mguni

We introduce a new hierarchical deep learning framework for recursive higher-order meta-learning that enables neural networks (NNs) to construct, solve, and generalise across hierarchies of tasks. Central to this approach is a generative mechanism that creates virtual tasks: synthetic problem instances designed to enable the meta-learner to learn soft constraints and unknown generalisable rules across related tasks. Crucially, this enables the framework to generate its own informative, task-grounded datasets, thereby freeing machine learning (ML) training from the limitations of relying entirely on human-generated data. By actively exploring the virtual point landscape and seeking out tasks lower-level learners find difficult, the meta-learner iteratively refines constraint regions. This enhances inductive biases, regularises the adaptation process, and produces novel, unanticipated tasks and constraints required for generalisation. Each meta-level of the hierarchy corresponds to a progressively abstracted generalisation of problems solved at lower levels, enabling a structured and interpretable learning progression. By interpreting meta-learners as category-theoretic functors that generate and condition a hierarchy of subordinate learners, we establish a compositional structure that supports abstraction and knowledge transfer across progressively generalised tasks. The category-theoretic perspective unifies existing meta-learning models and reveals how learning processes can be transformed and compared through functorial relationships, while offering practical design principles for structuring meta-learning. We speculate this architecture may underpin the next generation of NNs capable of autonomously generating novel, instructive tasks and their solutions, thereby advancing ML towards general artificial intelligence.



Article 1281

Title@2025-07-03 (4): Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data

Title: Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data Medical Data Pecking: Ein kontextorientierter Ansatz zur automatisierten Qualitätsbewertung strukturierter medizinischer Daten 医疗数据分级:结构医疗数据自动质量评价的内幕意识方法 2507.02628v1

Authors (4): Irena Girshovitz, Atai Ambus, Moni Shahar, Ran Gilad-Bachrach

Background: The use of Electronic Health Records (EHRs) for epidemiological studies and artificial intelligence (AI) training is increasing rapidly. The reliability of the results depends on the accuracy and completeness of EHR data. However, EHR data often contain significant quality issues, including misrepresentations of subpopulations, biases, and systematic errors, as they are primarily collected for clinical and billing purposes. Existing quality assessment methods remain insufficient, lacking systematic procedures to assess data fitness for research. Methods: We present the Medical Data Pecking approach, which adapts unit testing and coverage concepts from software engineering to identify data quality concerns. We demonstrate our approach using the Medical Data Pecking Tool (MDPT), which consists of two main components: (1) an automated test generator that uses large language models and grounding techniques to create a test suite from data and study descriptions, and (2) a data testing framework that executes these tests, reporting potential errors and coverage. Results: We evaluated MDPT on three datasets: All of Us (AoU), MIMIC-III, and SyntheticMass, generating 55-73 tests per cohort across four conditions. These tests correctly identified 20-43 non-aligned or non-conforming data issues. We present a detailed analysis of the LLM-generated test suites in terms of reference grounding and value accuracy. Conclusion: Our approach incorporates external medical knowledge to enable context-sensitive data quality testing as part of the data analysis workflow to improve the validity of its outcomes. Our approach tackles these challenges from a quality assurance perspective, laying the foundation for further development such as additional data modalities and improved grounding methods.



Article 1282

Title@2025-07-03 (4): A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes

Title: A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes Ein Matrix-Varianten-Auto-Encoder für Variant-Effekt-Vorhersage in Pharmakogenen 药用植物可变效应预测矩阵变式自动编码器 2507.02624v1

Authors (6): Antoine Honoré, Borja Rodríguez Gálvez, Yoomi Park, Yitian Zhou, Volker M. Lauschke, Ming Xiao

Variant effect predictors (VEPs) aim to assess the functional impact of protein variants, traditionally relying on multiple sequence alignments (MSAs). This approach assumes that naturally occurring variants are fit, an assumption challenged by pharmacogenomics, where some pharmacogenes experience low evolutionary pressure. Deep mutational scanning (DMS) datasets provide an alternative by offering quantitative fitness scores for variants. In this work, we propose a transformer-based matrix variational auto-encoder (matVAE) with a structured prior and evaluate its performance on 33 DMS datasets corresponding to 26 drug target and ADME proteins from the ProteinGym benchmark. Our model trained on MSAs (matVAE-MSA) outperforms the state-of-the-art DeepSequence model in zero-shot prediction on DMS datasets, despite using an order of magnitude fewer parameters and requiring less computation at inference time. We also compare matVAE-MSA to matENC-DMS, a model of similar capacity trained on DMS data, and find that the latter performs better on supervised prediction tasks. Additionally, incorporating AlphaFold-generated structures into our transformer model further improves performance, achieving results comparable to DeepSequence trained on MSAs and finetuned on DMS. These findings highlight the potential of DMS datasets to replace MSAs without significant loss in predictive performance, motivating further development of DMS datasets and exploration of their relationships to enhance variant effect prediction.



Article 1283

Title@2025-07-03 (4): L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation

Title: L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation L-VAE: Variationaler Auto-Encoder mit erlernbarer Beta zur entfremdeten Darstellung L-VAE: 具有拆分代表的可学习贝塔变量自动编码器 2507.02619v1

Authors (3): Hazal Mogultay Ozcan, Sinan Kalkan, Fatos T. Yarman-Vural

In this paper, we propose a novel model called Learnable VAE (L-VAE), which learns a disentangled representation together with the hyperparameters of the cost function. L-VAE can be considered as an extension of β-VAE, wherein the hyperparameter, β, is empirically adjusted. L-VAE mitigates the limitations of β-VAE by learning the relative weights of the terms in the loss function to control the dynamic trade-off between disentanglement and reconstruction losses. In the proposed model, the weights of the loss terms and the parameters of the model architecture are learned concurrently. An additional regularization term is added to the loss function to prevent bias towards either reconstruction or disentanglement losses. Experimental analyses show that the proposed L-VAE finds an effective balance between reconstruction fidelity and disentangling the latent dimensions. Comparisons of the proposed L-VAE against β-VAE, VAE, ControlVAE, DynamicVAE, and σ-VAE on datasets such as dSprites, MPI3D-complex, Falcor3D, and Isaac3D reveal that L-VAE consistently provides the best or the second-best performance as measured by a set of disentanglement metrics. Moreover, qualitative experiments on the CelebA dataset confirm the success of the L-VAE model in disentangling facial attributes.
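
For reference, the β-VAE objective that L-VAE extends is commonly written as below, with a fixed, hand-tuned weight β on the KL term. This is the standard formulation from the β-VAE literature, not an equation taken from this abstract; the exact form of L-VAE's learned weights and its extra regularization term is not given here, so it is deliberately not reproduced.

```latex
% Standard beta-VAE loss (to be minimised); beta = 1 recovers the plain VAE.
% q_phi: encoder, p_theta: decoder, p(z): prior over latents.
\mathcal{L}_{\beta}(\theta, \phi; x)
  = -\,\mathbb{E}_{q_{\phi}(z \mid x)}\!\left[ \log p_{\theta}(x \mid z) \right]
  + \beta \, D_{\mathrm{KL}}\!\left( q_{\phi}(z \mid x) \,\|\, p(z) \right)
```

The first term is the reconstruction loss and the second drives disentanglement; according to the abstract, L-VAE learns the relative weights of these terms instead of fixing β by hand.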



Article 1284

Title@2025-07-03 (4): Interpreting Graph Inference with Skyline Explanations

Title: Interpreting Graph Inference with Skyline Explanations Graphenableitung mit Skyline-Erläuterungen interpretieren 与天线解释解释的图表推理 2505.07635v2

Authors (4): Dazhuo Qiu, Haolai Che, Arijit Khan, Yinghui Wu

Inference queries have been routinely issued to graph machine learning models such as graph neural networks (GNNs) for various network analytical tasks. Nevertheless, GNN outputs are often hard to interpret comprehensively. Existing methods typically settle for individual pre-defined explainability measures (such as fidelity), which often leads to biased, “one-sided” interpretations. This paper introduces skyline explanation, a new paradigm that interprets GNN output by simultaneously optimizing multiple explainability measures of users’ interests. (1) We propose skyline explanations as a Pareto set of explanatory subgraphs that dominate others over multiple explanatory measures. We formulate skyline explanation as a multi-criteria optimization problem, and establish its hardness results. (2) We design efficient algorithms with an onion-peeling approach, which strategically prioritizes nodes and removes unpromising edges to incrementally assemble skyline explanations. (3) We also develop an algorithm to diversify the skyline explanations to enrich the comprehensive interpretation. (4) We introduce efficient parallel algorithms with load-balancing strategies to scale skyline explanation for large-scale GNN-based inference. Using real-world and synthetic graphs, we experimentally verify our algorithms’ effectiveness and scalability.
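
The notion of a Pareto set (skyline) over several explainability measures can be sketched as follows. This is a brute-force quadratic filter for illustration only, not the onion-peeling algorithm the paper describes; the candidate names, the two measures, and the assumption that higher is better on every measure are all hypothetical.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every measure
    and strictly better on at least one (higher is assumed better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]


# Hypothetical explanatory subgraphs scored on two made-up measures.
scores = {
    "g1": (0.9, 0.2),
    "g2": (0.5, 0.8),
    "g3": (0.4, 0.1),  # worse than g1 on both measures
}
pareto = skyline(scores)
```

Here `g1` and `g2` each win on one measure, so neither dominates the other and both stay in the skyline, while `g3` is dropped because `g1` beats it on both.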



Article 1285

Title@2025-07-03 (4): Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection

Title: Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection Orientation-Aware Sparse Tensor PCA für effiziente, unüberwachte Feature-Auswahl 用于高效、不受监督地物选择的五氯苯甲醚 2407.16985v3

Authors (5): Junjing Zheng, Xinyu Zhang, Weidong Jiang, Xiangfeng Qiu, Mingjian Ren

Recently, introducing Tensor Decomposition (TD) techniques into unsupervised feature selection (UFS) has been an emerging research topic. A tensor structure is beneficial for mining the relations between different modes and helps relieve the computation burden. However, while existing methods exploit TD to preserve the data tensor structure, they do not consider the influence of data orientation and thus have difficulty in handling orientation-specific data such as time series. To solve the above problem, we utilize the orientation-dependent tensor-tensor product from Tensor Singular Value Decomposition based on *M-product (T-SVDM) and extend the one-dimensional Sparse Principal Component Analysis (SPCA) to a tensor form. The proposed sparse tensor PCA model can constrain sparsity at the specified mode and yield sparse tensor principal components, enhancing flexibility and accuracy in learning feature relations. To ensure fast convergence and a flexible description of feature correlation, we develop a convex version specially designed for general UFS tasks and propose an efficient slice-by-slice algorithm that performs dual optimization in the transform domain. Experimental results on real-world datasets demonstrate the effectiveness and remarkable computational efficiency of the proposed method for tensor data of diverse structures over the state-of-the-art. When transform axes align with feature distribution patterns, our method is promising for various applications. The codes related to our proposed methods and the experiments are available at https://github.com/zjj20212035/STPCA.git.



Article 1286

Title@2025-07-03 (4): Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation

Title: Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation Lost in Latent Space: Eine empirische Studie von Latent Diffusionsmodellen für die Emulation von Physik 失落于隐时空:物理模拟的中子扩散模型经验研究 2507.02608v1

Authors (6): François Rozet, Ruben Ohana, Michael McCabe, Gilles Louppe, François Lanusse, Shirley Ho

The steep computational cost of diffusion models at inference hinders their use as fast physics emulators. In the context of image and video generation, this computational drawback has been addressed by generating in the latent space of an autoencoder instead of the pixel space. In this work, we investigate whether a similar strategy can be effectively applied to the emulation of dynamical systems and at what cost. We find that the accuracy of latent-space emulation is surprisingly robust to a wide range of compression rates (up to 1000x). We also show that diffusion-based emulators are consistently more accurate than non-generative counterparts and compensate for uncertainty in their predictions with greater diversity. Finally, we cover practical design choices, spanning from architectures to optimizers, that we found critical to train latent-space emulators.



Article 1287

Title@2025-07-03 (4): Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs”

Title: Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs” Nieder mit der Hierarchie: Das ‘H’ in HNSW steht für “Hubs” 打倒等级制度: HNSW的”H”代表”Hubs” 2412.01940v3

Authors (3): Blaise Munyampirwa, Vihan Lakshman, Benjamin Coleman

Driven by recent breakthrough advances in neural representation learning, approximate near-neighbor (ANN) search over vector embeddings has emerged as a critical computational workload. With the introduction of the seminal Hierarchical Navigable Small World (HNSW) algorithm, graph-based indexes have established themselves as the overwhelmingly dominant paradigm for efficient and scalable ANN search. As the name suggests, HNSW searches a layered hierarchical graph to quickly identify neighborhoods of similar points to a given query vector. But is this hierarchy even necessary? A rigorous experimental analysis to answer this question would provide valuable insights into the nature of algorithm design for ANN search and motivate directions for future work in this increasingly crucial domain. We conduct an extensive benchmarking study covering more large-scale datasets than prior investigations of this question. We ultimately find that a flat navigable small world graph retains all of the benefits of HNSW on high-dimensional datasets, with latency and recall performance essentially identical to the original algorithm but with less memory overhead. We go a step further and study why the hierarchy of HNSW provides no benefit in high dimensions, hypothesizing that navigable small world graphs contain a well-connected, frequently traversed “highway” of hub nodes that maintain the same purported function as the hierarchical layers. We present compelling empirical evidence that the Hub Highway Hypothesis holds for real datasets and investigate the mechanisms by which the highway forms. The implications of this hypothesis may also provide future research directions in developing enhancements to graph-based ANN search.
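
To illustrate what searching a flat (single-layer) navigable small world graph looks like, here is a minimal greedy-routing sketch. It is a simplification: practical HNSW and flat-NSW implementations use a beam of candidates rather than a single current node, and nothing below comes from the authors' code. The adjacency-dict representation, the toy graph, and the name `greedy_search` are assumptions for illustration.

```python
import math


def greedy_search(graph, vectors, query, entry):
    """Greedy routing on a flat proximity graph.

    Starting from `entry`, repeatedly move to the neighbour closest to
    `query`, and stop when no neighbour is closer than the current node.
    `graph` maps node id -> list of neighbour ids; `vectors` maps
    node id -> coordinate tuple.
    """
    current = entry
    current_dist = math.dist(vectors[current], query)
    while True:
        candidates = [(math.dist(vectors[n], query), n) for n in graph[current]]
        if not candidates:
            return current
        best_dist, best = min(candidates)
        if best_dist >= current_dist:
            return current  # local minimum: no neighbour improves on current
        current, current_dist = best, best_dist


# Toy graph: four points on a line, each linked to its adjacent points.
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
links = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
nearest = greedy_search(links, points, query=(2.9, 0.0), entry=0)
```

In the hierarchical variant, upper layers supply long-range hops toward the query's region; the paper's hypothesis is that well-connected hub nodes in the flat graph already play that role in high dimensions.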



Article 1288

Title@2025-07-03 (4): De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

Title: De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks De-AntiFake: Die Protective Perturbations gegen Stimmen-Klonangriffe neu denken De-Antifake:重新思考对语音克隆攻击进行保护性干扰 2507.02606v1

Authors (5): Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu

The rapid advancement of speech generation models has heightened privacy and security concerns related to voice cloning (VC). Recent studies have investigated disrupting unauthorized voice cloning by introducing adversarial perturbations. However, determined attackers can mitigate these protective perturbations and successfully execute VC. In this study, we conduct the first systematic evaluation of these protective perturbations against VC under realistic threat models that include perturbation purification. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models, which degrades the performance of VC. From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. Experimental results demonstrate that our method outperforms state-of-the-art purification methods in disrupting VC defenses. Our study reveals the limitations of adversarial perturbation-based VC defenses and underscores the urgent need for more robust solutions to mitigate the security and privacy risks posed by VC. The code and audio samples are available at https://de-antifake.github.io.



Article 1289

Title@2025-07-03 (4): Automated Grading of Students’ Handwritten Graphs: A Comparison of Meta-Learning and Vision-Large Language Models

Title: Automated Grading of Students’ Handwritten Graphs: A Comparison of Meta-Learning and Vision-Large Language Models Automatisierte Abstufung der handschriftlichen Grafiken von Studenten: Vergleich von Meta-Learning und Vision-Large Language Models 学生手写图图的自动分级:混合学习和视野大的语言模型比较 2507.03056v1

Authors (3): Behnam Parsaeifard, Martin Hlosta, Per Bergamin

With the rise of online learning, the demand for efficient and consistent assessment in mathematics has significantly increased over the past decade. Machine Learning (ML), particularly Natural Language Processing (NLP), has been widely used for autograding student responses, especially those involving text and/or mathematical expressions. However, there has been limited research on autograding responses involving students’ handwritten graphs, despite their prevalence in Science, Technology, Engineering, and Mathematics (STEM) curricula. In this study, we implement multimodal meta-learning models for autograding images containing students’ handwritten graphs and text. We further compare the performance of Vision Large Language Models (VLLMs) with these specially trained meta-learning models. Our results, evaluated on a real-world dataset collected from our institution, show that the best-performing meta-learning models outperform VLLMs in 2-way classification tasks. In contrast, in more complex 3-way classification tasks, the best-performing VLLMs slightly outperform the meta-learning models. While VLLMs show promising results, their reliability and practical applicability remain uncertain and require further investigation.



Article 1290

Title@2025-07-03 (4): StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models

Title: StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models StructTransform: Eine skalierbare Angriffsfläche für sicherheitsorientierte große Sprachmodelle 条形变形:用于安全-统一大语言模型的可缩放攻击表面 2502.11853v2

Authors (5): Shehel Yoosuf, Temoor Ali, Ahmed Lekssays, Mashael AlSabah, Issa Khalil

In this work, we present a series of structure transformation attacks on LLM alignment, where we encode natural language intent using diverse syntax spaces, ranging from simple structure formats and basic query languages (e.g., SQL) to novel spaces and syntaxes created entirely by LLMs. Our extensive evaluation shows that our simplest attacks can achieve close to a 90% success rate, even on strict LLMs (such as Claude 3.5 Sonnet) using SOTA alignment mechanisms. We improve the attack performance further by using an adaptive scheme that combines structure transformations with existing content transformations, resulting in over 96% ASR with 0% refusals. To generalize our attacks, we explore numerous structure formats, including syntaxes purely generated by LLMs. Our results indicate that such novel syntaxes are easy to generate and result in a high ASR, suggesting that defending against our attacks is not a straightforward process. Finally, we develop a benchmark and evaluate existing safety-alignment defenses against it, showing that most of them fail with 100% ASR. Our results show that existing safety alignment mostly relies on token-level patterns without recognizing harmful concepts, highlighting and motivating the need for serious research efforts in this direction. As a case study, we demonstrate how attackers can use our attack to easily generate a sample malware and a corpus of fraudulent SMS messages, which perform well in bypassing detection.



Article 1291

Title@2025-07-03 (4): Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data

Title: Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data Padé Ungefähre neurale Netzwerke für verbesserte elektrische Motorfehlerdiagnose mit Vibration und akustischen Daten 利用振动和声频数据增强电动机动车故障诊断 2507.02599v1

Authors (2): Sertac Kilickaya, Levent Eren

Purpose: The primary aim of this study is to enhance fault diagnosis in induction machines by leveraging the Padé Approximant Neuron (PAON) model. While accelerometers and microphones are standard in motor condition monitoring, deep learning models with nonlinear neuron architectures offer promising improvements in diagnostic performance. This research addresses the question: Can Padé Approximant Neural Networks (PadéNets) outperform conventional Convolutional Neural Networks (CNNs) and Self-Organized Operational Neural Networks (Self-ONNs) in diagnosing electrical and mechanical faults using vibration and acoustic data? Methods: We evaluate and compare the diagnostic capabilities of three deep learning architectures: one-dimensional CNNs, Self-ONNs, and PadéNets. These models are tested on the University of Ottawa’s publicly available constant-speed induction motor datasets, which include both vibration and acoustic sensor data. The PadéNet model is designed to introduce enhanced nonlinearity and is compatible with unbounded activation functions such as Leaky ReLU. Results and Conclusion: PadéNets consistently outperformed the baseline models, achieving diagnostic accuracies of 99.96%, 98.26%, 97.61%, and 98.33% for accelerometers 1, 2, 3, and the acoustic sensor, respectively. The enhanced nonlinearity of PadéNets, together with their compatibility with unbounded activation functions, significantly improves fault diagnosis performance in induction motor condition monitoring.



Article 1292

Title@2025-07-03 (4): SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks

Title: SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks SURE-VQA: Systematisches Verständnis der Robustheitsbewertung in medizinischen VQA-Aufgaben SURE-VQA: 系统了解医疗VQA任务中的强力评价 2411.19688v3

Authors (7): Kim-Celine Kahl, Selen Erkan, Jeremias Traub, Carsten T. Lüth, Klaus Maier-Hein, Lena Maier-Hein, Paul F. Jaeger

Vision-Language Models (VLMs) have great potential in medical tasks, like Visual Question Answering (VQA), where they could act as interactive assistants for both patients and clinicians. Yet their robustness to distribution shifts on unseen data remains a key concern for safe deployment. Evaluating such robustness requires a controlled experimental setup that allows for systematic insights into the model’s behavior. However, we demonstrate that current setups fail to offer sufficiently thorough evaluations. To address this gap, we introduce a novel framework, called SURE-VQA, centered around three key requirements to overcome current pitfalls and systematically analyze VLM robustness: 1) Since robustness on synthetic shifts does not necessarily translate to real-world shifts, it should be measured on real-world shifts that are inherent to the VQA data; 2) Traditional token-matching metrics often fail to capture underlying semantics, necessitating the use of large language models (LLMs) for more accurate semantic evaluation; 3) Model performance often lacks interpretability due to missing sanity baselines, thus meaningful baselines should be reported that allow assessing the multimodal impact on the VLM. To demonstrate the relevance of this framework, we conduct a study on the robustness of various Fine-Tuning (FT) methods across three medical datasets with four types of distribution shifts. Our study highlights key insights into robustness: 1) No FT method consistently outperforms others in robustness, and 2) robustness trends are more stable across FT methods than across distribution shifts. Additionally, we find that simple sanity baselines that do not use the image data can perform surprisingly well and confirm LoRA as the best-performing FT method on in-distribution data. Code is provided at https://github.com/IML-DKFZ/sure-vqa.



Article 1293

Title@2025-07-03 (4): Revisiting Active Learning under (Human) Label Variation

Title: Revisiting Active Learning under (Human) Label Variation Aktives Lernen unter (menschlichen) Label-Varianten 在(人)标签标签变换下重新审查积极学习 2507.02593v1

Authors (6): Cornelia Gruber, Helen Alber, Bernd Bischl, Göran Kauermann, Barbara Plank, Matthias Aßenmacher

Access to high-quality labeled data remains a limiting factor in applied supervised learning. While label variation (LV), i.e., differing labels for the same instance, is common, especially in natural language processing, annotation frameworks often still rest on the assumption of a single ground truth. This overlooks human label variation (HLV), the occurrence of plausible differences in annotations, as an informative signal. Similarly, active learning (AL), a popular approach to optimizing the use of limited annotation budgets in training ML models, often relies on at least one of several simplifying assumptions, which rarely hold in practice when acknowledging HLV. In this paper, we examine foundational assumptions about truth and label nature, highlighting the need to decompose observed LV into signal (e.g., HLV) and noise (e.g., annotation error). We survey how the AL and (H)LV communities have addressed – or neglected – these distinctions and propose a conceptual framework for incorporating HLV throughout the AL loop, including instance selection, annotator choice, and label representation. We further discuss the integration of large language models (LLMs) as annotators. Our work aims to lay a conceptual foundation for HLV-aware active learning, better reflecting the complexities of real-world annotation.



Article 1294

Title@2025-07-03 (4): Scalable Interconnect Learning in Boolean Networks

Title: Scalable Interconnect Learning in Boolean Networks Skalierbares Interconnect-Lernen in Booleschen Netzwerken 布尔网络中的可缩放互连学习 2507.02585v1

Authors (3): Fabian Kresse, Emily Yu, Christoph H. Lampert

Learned Differentiable Boolean Logic Networks (DBNs) already deliver efficient inference on resource-constrained hardware. We extend them with a trainable, differentiable interconnect whose parameter count remains constant as input width grows, allowing DBNs to scale to far wider layers than earlier learnable-interconnect designs while preserving their advantageous accuracy. To further reduce model size, we propose two complementary pruning stages: an SAT-based logic equivalence pass that removes redundant gates without affecting performance, and a similarity-based, data-driven pass that outperforms a magnitude-style greedy baseline and offers a superior compression-accuracy trade-off.
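
As a rough illustration of what makes a logic network differentiable, the sketch below relaxes a few two-input gates to real-valued inputs in [0, 1] and lets a node hold a softmax-weighted mixture over them. This is a generic relaxation assumed for illustration only: the gate subset, the `soft_gate` helper, and the logits are hypothetical, and the abstract does not describe how the learnable interconnect itself is parameterized.

```python
import math

# Real-valued relaxations of Boolean gates; on inputs in {0, 1} they
# reproduce the usual truth tables. Only a small subset is shown here.
GATES = {
    "AND": lambda a, b: a * b,
    "OR": lambda a, b: a + b - a * b,
    "XOR": lambda a, b: a + b - 2 * a * b,
    "NAND": lambda a, b: 1 - a * b,
}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_gate(a, b, logits):
    """Softmax-weighted mixture of the relaxed gates (hypothetical helper)."""
    weights = softmax(logits)
    return sum(w * gate(a, b) for w, gate in zip(weights, GATES.values()))

# Logits that strongly favour XOR make the node behave almost like XOR.
out = soft_gate(1.0, 0.0, logits=[0.0, 0.0, 8.0, 0.0])
print(out)
```

After training, a node of this kind would typically be discretized to its highest-weighted gate, which is what allows inference with plain Boolean operations.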



Article 1295

Title@2025-07-03 (4): SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings

Title: SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings SoccerDiffusion: Auf dem Weg zum Lernen von End-to-End Humanoid Robot Soccer aus Gameplay-Aufnahmen 足球传播:从游戏游戏录音录制中学习最终到最终人类的机器人足球 2504.20808v2

Authors (5): Florian Vahl, Jörn Griepenburg, Jan Gutsche, Jasper Güldenstein, Jianwei Zhang

This paper introduces SoccerDiffusion, a transformer-based diffusion model designed to learn end-to-end control policies for humanoid robot soccer directly from real-world gameplay recordings. Using data collected from RoboCup competitions, the model predicts joint command trajectories from multi-modal sensor inputs, including vision, proprioception, and game state. To enable real-time inference on embedded platforms, we employ a distillation technique that reduces the multi-step diffusion process to a single step. Our results demonstrate the model’s ability to replicate complex motion behaviors such as walking, kicking, and fall recovery both in simulation and on physical robots. Although high-level tactical behavior remains limited, this work provides a robust foundation for subsequent reinforcement learning or preference optimization methods. We release the dataset, pretrained models, and code at https://bit-bots.github.io/SoccerDiffusion



Article 1296

Title@2025-07-03 (4): From 2:4 to 8:16 sparsity patterns in LLMs for Outliers and Weights with Variance Correction

Title: From 2:4 to 8:16 sparsity patterns in LLMs for Outliers and Weights with Variance Correction Von 2:4 bis 8:16 Sparsitätsmuster in LLMs für Ausreißer und Gewichte mit Varianzkorrektur 从2:4至8:16差异校正后外部线和重力LLMLMs的宽度模式 2507.03052v1

Authors (7): Egor Maximov, Yulia Kuzkina, Azamat Kanametov, Alexander Prutko, Aleksei Goncharov, Maxim Zhelnin, Egor Shvetsov

As large language models (LLMs) grow in size, efficient compression techniques like quantization and sparsification are critical. While quantization maintains performance with reduced precision, structured sparsity methods, such as N:M sparsification, often fall short due to limited flexibility and sensitivity to outlier weights. We explore 8:16 semi-structured sparsity, demonstrating its ability to surpass the Performance Threshold, where a compressed model matches the accuracy of its uncompressed or smaller counterpart under equivalent memory constraints. Compared to 2:4 sparsity, 8:16 offers greater flexibility with minimal storage overhead (0.875 vs. 0.75 bits/element). We also apply sparse structured patterns for salient weights, showing that structured sparsity for outliers is competitive with unstructured approaches, leading to equivalent or better results. Finally, we demonstrate that simple techniques such as variance correction and SmoothQuant-like weight equalization improve the performance of sparse models.
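
The quoted overheads of 0.75 and 0.875 bits/element can be reproduced with a short calculation, under the assumption (not stated in the abstract) that the overhead is the number of whole bits needed to index one valid mask per group of M weights, divided by M. The function name below is illustrative.

```python
import math

def mask_bits_per_element(n: int, m: int) -> float:
    """Bits per weight needed to record which n of every m weights are kept.

    Assumes each group of m weights stores the index of its mask among the
    C(m, n) valid patterns, rounded up to a whole number of bits.
    """
    patterns = math.comb(m, n)                   # valid masks per group
    bits_per_group = (patterns - 1).bit_length() # ceil(log2(patterns))
    return bits_per_group / m

for n, m in [(2, 4), (8, 16)]:
    print(f"{n}:{m} has {math.comb(m, n)} patterns, "
          f"{mask_bits_per_element(n, m)} bits/element")
```

Under this assumption, moving from 2:4 to 8:16 raises the number of admissible patterns per group from 6 to 12870 while adding only 0.125 bits of metadata per weight, which is the flexibility-versus-overhead trade the abstract refers to.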



Article 1297

Title@2025-07-03 (4): Towards an Explainable Comparison and Alignment of Feature Embeddings

Title: Towards an Explainable Comparison and Alignment of Feature Embeddings Auf dem Weg zu einem erklärbaren Vergleich und der Ausrichtung von Feature-Embeddings 争取实现可解释的比较和地貌隐含物的一致 2506.06231v2

Authors (3): Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia

While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable comparison of different embeddings requires identifying and analyzing mismatches between sample groups clustered within the embedding spaces. In this work, we propose the Spectral Pairwise Embedding Comparison (SPEC) framework to compare embeddings and identify their differences in clustering a reference dataset. Our approach examines the kernel matrices derived from two embeddings and leverages the eigendecomposition of the difference kernel matrix to detect sample clusters that are captured differently by the two embeddings. We present a scalable implementation of this kernel-based approach, with computational complexity that grows linearly with the sample size. Furthermore, we introduce an optimization problem using this framework to align two embeddings, ensuring that clusters identified in one embedding are also captured in the other model. We provide numerical results demonstrating the application of SPEC to comparing and aligning embeddings on large-scale datasets such as ImageNet and MS-COCO. The project page is available at https://mjalali.github.io/SPEC/.



Article 1298

Title@2025-07-03 (4): Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability

Title: Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability Transformer benötigen keine Layernorm zur Inferenzzeit: Skalierung der Layernorm Entfernung zu GPT-2 XL und die Implikationen für mechanistische Interpretierbarkeit 突变器不需要在推断时间使用层内线: 将层内线缩放去除到 GPT-2 XL 和对机械可解释性的影响 2507.02559v1

Authors (5): Luca Baroni, Galvin Khara, Joachim Schaeffer, Marat Subkhankulov, Stefan Heimersheim

Layer-wise normalization (LN) is an essential component of virtually all transformer-based large language models. While its effects on training stability are well documented, its role at inference time is poorly understood. Additionally, LN layers hinder mechanistic interpretability by introducing additional nonlinearities and increasing the interconnectedness of individual model components. Here, we show that all LN layers can be removed from every GPT-2 model with only a small increase in validation loss (e.g. +0.03 cross-entropy loss for GPT-2 XL). Thus, LN cannot play a substantial role in language modeling. We find that the amount of fine-tuning data needed for LN removal grows sublinearly with model parameters, suggesting scaling to larger models is feasible. We release a suite of LN-free GPT-2 models on Hugging Face. Furthermore, we test interpretability techniques on LN-free models. Direct logit attribution now gives the exact direct effect of individual components, while the accuracy of attribution patching does not significantly improve. We also confirm that GPT-2’s “confidence neurons” are inactive in the LN-free models. Our work clarifies the role of LN layers in language modeling, showing that GPT-2-class models can function without LN layers. We hope that our LN-free analogs of the GPT-2 family of models will enable more precise interpretability research and improve our understanding of language models.
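
One way to see why LN complicates attribution: direct logit attribution splits the residual stream into a sum of component outputs, which is only exact if everything between the residual stream and the logits is linear. The sketch below, which uses a simplified LayerNorm without learned scale and bias and two made-up component vectors, shows that LayerNorm is not additive.

```python
import math

def layer_norm(x, eps=1e-5):
    """Simplified LayerNorm: centre the vector, then divide by its std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

# Two hypothetical component outputs written to the residual stream.
a = [1.0, -2.0, 0.5, 3.0]
b = [0.2, 0.4, -1.0, 0.1]

joint = layer_norm(add(a, b))                 # what the model computes
separate = add(layer_norm(a), layer_norm(b))  # what an additive split assumes
gap = max(abs(p - q) for p, q in zip(joint, separate))
print(f"max deviation from additivity: {gap:.3f}")
```

The deviation arises because the normalizing standard deviation depends on the full sum, so each component's contribution depends on all the others. With LN removed, the path to the logits is linear and the per-component split becomes exact, consistent with the claim about direct logit attribution above.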



Article 1299

Title@2025-07-03 (4): Fading memory and the convolution theorem

Title: Fading memory and the convolution theorem Das verblassende Gedächtnis und das Konvolutionstheorem 过时的记忆和变幻的定理 2408.07386v3

Authors (2): Juan-Pablo Ortega, Florian Rossmannek

Several topological and analytical notions of continuity and fading memory for causal and time-invariant filters are introduced, and the relations between them are analyzed. A significant generalization of the convolution theorem that establishes the equivalence between the fading memory property and the availability of convolution representations of linear filters is proved. This result extends a previous similar characterization to a complete array of weighted norms in the definition of the fading memory property. Additionally, the main theorem shows that the availability of convolution representations can be characterized, at least when the codomain is finite-dimensional, not only by the fading memory property but also by the reunion of two purely topological notions that are called minimal continuity and minimal fading memory property. Finally, when the input space and the codomain of a linear functional are Hilbert spaces, it is shown that minimal continuity and the minimal fading memory property guarantee the existence of interesting embeddings of the associated reproducing kernel Hilbert spaces.



Article 1300

Title@2025-07-03 (4): Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements

Title: Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements Adaptive probabilistische ODE-Lösemittel ohne adaptive Speicheranforderungen 无适应性内存要求的适应性概率性ODE溶剂 2410.10530v2

Authors (1): Nicholas Krämer

Despite substantial progress in recent years, probabilistic solvers with adaptive step sizes still cannot solve memory-demanding differential equations – unless we care only about a single point in time (which is far too restrictive; we want the whole time series). Counterintuitively, the culprit is the adaptivity itself: Its unpredictable memory demands easily exceed our machine’s capabilities, making our simulations fail unexpectedly and without warning. Still, dropping adaptivity would abandon years of progress, which can’t be the answer. In this work, we solve this conundrum. We develop an adaptive probabilistic solver with fixed memory demands building on recent developments in robust state estimation. Switching to our method (i) eliminates memory issues for long time series, (ii) accelerates simulations by orders of magnitude through unlocking just-in-time compilation, and (iii) makes adaptive probabilistic solvers compatible with scientific computing in JAX.



Article 1301

Title@2025-07-03 (4): Flow Matching on Lie Groups

Title: Flow Matching on Lie Groups Flow Passend zu Lie-Gruppen 与 Lis Group 匹配 2504.00494v2

Authors (2): Finn M. Sherry, Bart M. N. Smets

Flow Matching (FM) is a recent generative modelling technique: we aim to learn how to sample from distribution $\mathfrak{X}_1$ by flowing samples from some distribution $\mathfrak{X}_0$ that is easy to sample from. The key trick is that this flow field can be trained while conditioning on the end point in $\mathfrak{X}_1$: given an end point, simply move along a straight line segment to the end point (Lipman et al. 2022). However, straight line segments are only well-defined on Euclidean space. Consequently, Chen and Lipman (2023) generalised the method to FM on Riemannian manifolds, replacing line segments with geodesics or their spectral approximations. We take an alternative point of view: we generalise to FM on Lie groups by instead substituting exponential curves for line segments. This leads to a simple, intrinsic, and fast implementation for many matrix Lie groups, since the required Lie group operations (products, inverses, exponentials, logarithms) are simply given by the corresponding matrix operations. FM on Lie groups could then be used for generative modelling with data consisting of sets of features (in $\mathbb{R}^n$) and poses (in some Lie group), e.g. the latent codes of Equivariant Neural Fields (Wessels et al. 2025).
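
To make the substitution of exponential curves for line segments concrete, here is a minimal sketch on the rotation group SO(2), written with plain 2x2 matrices. It assumes the usual form of an exponential curve between two group elements, gamma(t) = g0 exp(t log(g0^{-1} g1)); the paper's exact parameterization may differ, and all function names are illustrative.

```python
import math

def rot(theta):
    """2x2 rotation matrix, an element of SO(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices (the group product)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(r):
    """Group inverse; for a rotation this is the transpose."""
    return [[r[0][0], r[1][0]], [r[0][1], r[1][1]]]

def log_so2(r):
    """Logarithm on SO(2): the rotation angle in (-pi, pi]."""
    return math.atan2(r[1][0], r[0][0])

def exp_so2(theta):
    """Exponential on SO(2): angle back to rotation matrix."""
    return rot(theta)

def exponential_curve(g0, g1, t):
    """gamma(t) = g0 * exp(t * log(g0^{-1} g1)), so gamma(0)=g0, gamma(1)=g1."""
    return matmul(g0, exp_so2(t * log_so2(matmul(inverse(g0), g1))))

g0, g1 = rot(0.3), rot(2.0)
mid = exponential_curve(g0, g1, 0.5)   # halfway along the curve
end = exponential_curve(g0, g1, 1.0)   # recovers g1
print(mid, end)
```

Only products, inverses, exponentials, and logarithms are used, which is why the same recipe carries over to other matrix Lie groups once their matrix exponential and logarithm are available.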



Article 1302

Title@2025-07-03 (4): Interleaved Gibbs Diffusion: Generating Discrete-Continuous Data with Implicit Constraints

Title: Interleaved Gibbs Diffusion: Generating Discrete-Continuous Data with Implicit Constraints Interleaved Gibbs Diffusion: Diskrete-kontinuierliche Daten mit impliziten Einschränkungen generieren Gibbs Interlebert Gibbs 扩散:产生具有隐含限制的分散持续数据 2502.13450v2

Authors (5): Gautham Govind Anil, Sachin Yadav, Dheeraj Nagaraj, Karthikeyan Shanmugam, Prateek Jain

We introduce Interleaved Gibbs Diffusion (IGD), a novel generative modeling framework for discrete-continuous data, focusing on problems with important, implicit and unspecified constraints in the data. Most prior works on discrete and discrete-continuous diffusion assume a factorized denoising distribution, which can hinder the modeling of strong dependencies between random variables in such problems. We empirically demonstrate a significant improvement in 3-SAT performance out of the box by switching to a Gibbs-sampling-style discrete diffusion model which does not assume factorizability. Motivated by this, we introduce IGD, which generalizes the discrete-time Gibbs-sampling-type Markov chain to the case of discrete-continuous generation. IGD allows for seamless integration between discrete and continuous denoisers while theoretically guaranteeing exact reversal of a suitable forward process. Further, it provides flexibility in the choice of denoisers and allows conditional generation via state-space doubling and inference-time refinement. Empirical evaluations on three challenging generation tasks - molecule structures, layouts and tabular data - demonstrate state-of-the-art performance. Notably, IGD achieves state-of-the-art results without relying on domain-specific inductive biases like equivariant diffusion or auxiliary losses. We explore a wide range of modeling and interleaving strategies along with hyperparameters in each of these problems.



Article 1303

Title@2025-07-03 (4): AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Title: AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench KI-Forscher für maschinelles Lernen: Suche, Exploration und Generalisierung in der MLE-Bench AI 机器学习研究代理机构:MLE-Bench的搜索、探索和普及 2507.02554v1

Authors (25): Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Carole-Jean Wu, Pontus Stenetorp, Nicola Cancedda, Jakob Nicolaus Foerster, Yoram Bachrach

AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents’ performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.



Article 1304

Title@2025-07-03 (4): Position: A Theory of Deep Learning Must Include Compositional Sparsity

Title: Position: A Theory of Deep Learning Must Include Compositional Sparsity Position: Eine Theorie des tiefen Lernens muss kompositorische Sparsamkeit einschließen 位置:深深学习理论必须包含构成平等 2507.02550v1

Authors (4): David A. Danhofer, Davide D’Ascenzo, Rafael Dubach, Tomaso Poggio

Overparametrized Deep Neural Networks (DNNs) have demonstrated remarkable success in a wide variety of domains too high-dimensional for classical shallow networks subject to the curse of dimensionality. However, open questions about the fundamental principles that govern the learning dynamics of DNNs remain. In this position paper we argue that it is the ability of DNNs to exploit the compositionally sparse structure of the target function that drives their success. As such, DNNs can leverage the property that most practically relevant functions can be composed from a small set of constituent functions, each of which relies only on a low-dimensional subset of all inputs. We show that this property is shared by all efficiently Turing-computable functions and is therefore very likely present in all current learning problems. While some promising theoretical insights on questions concerned with approximation and generalization exist in the setting of compositionally sparse functions, several important questions on the learnability and optimization of DNNs remain. Completing the picture of the role of compositional sparsity in deep learning is essential to a comprehensive theory of artificial, and even general, intelligence.
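
A toy example may help fix the idea of compositional sparsity. The function below takes eight inputs, yet it is built as a binary tree of constituent functions that each see only two values. The particular constituents are arbitrary choices made for illustration and are not taken from the paper.

```python
# Leaf constituents: each depends on only two of the eight inputs.
def g1(a, b): return a * b
def g2(a, b): return max(a, b)
def g3(a, b): return a - b
def g4(a, b): return a + b

# Inner constituents: each combines two intermediate values.
def h1(u, v): return u + v
def h2(u, v): return u * v

def f(x):
    """Eight-dimensional function in which no constituent has more than two arguments."""
    u = h1(g1(x[0], x[1]), g2(x[2], x[3]))
    v = h2(g3(x[4], x[5]), g4(x[6], x[7]))
    return u - v  # the root is itself a two-argument constituent

print(f([1, 2, 3, 4, 5, 6, 7, 8]))
```

A deep network whose connectivity mirrors this tree only ever has to approximate two-dimensional functions, whereas a shallow approximator faces the full eight-dimensional input at once.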



Article 1305

Title@2025-07-03 (4): Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling

Title: Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling Offline-Verstärkung Lernen für das Lernen zum Versand für Job Shop Scheduling 离线强化学习,学习到派到招工商店日程安排的调度 2409.10589v4

Authors (3): Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang

The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch, leading to sample inefficiency; it cannot leverage existing high-quality solutions from traditional methods like Constraint Programming (CP); and it requires simulated environments to train in, which are impracticable to build for complex scheduling environments. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP, which addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available or scenarios where online training of RL approaches with simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, namely, Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), that are able to learn from historical data through Conservative Q-Learning (CQL). Moreover, we present a novel entropy bonus modification for d-mSAC for maskable action spaces. We also introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. Notably, introducing noise to the expert dataset yields comparable or superior results to using the expert dataset with the same number of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.
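
The role of masking in a dispatching policy can be sketched generically: at each step only some jobs can be dispatched, so infeasible actions are excluded before the greedy choice over Q-values. The code below illustrates action masking in general, not mQRDQN or d-mSAC as defined in the paper, and the Q-values and mask are made up.

```python
import math

def masked_greedy_action(q_values, valid):
    """Return the index of the highest Q-value among valid actions only."""
    if not any(valid):
        raise ValueError("no valid action available")
    # Invalid actions get -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, valid)]
    return max(range(len(masked)), key=masked.__getitem__)

# Hypothetical step: four jobs, of which jobs 0 and 3 cannot be dispatched.
q = [2.5, 0.1, 1.7, -0.3]
valid = [False, True, True, False]
action = masked_greedy_action(q, valid)
print(action)
```

Without the mask, the greedy choice would pick job 0, which has the highest Q-value but is not dispatchable at this step.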



Article 1306

Title@2025-07-03 (4): Higher-Order Singular-Value Derivatives of Rectangular Real Matrices

Title: Higher-Order Singular-Value Derivatives of Rectangular Real Matrices Hohe Ordnung Singular-Value Derivate der rechteckigen Realmatrizen 矩形真实矩阵的高端圆形单值衍生物 2506.03764v3

Authors (3): Róisín Luo, James McDermott, Colm O’Riordan

We present a theoretical framework for deriving the general $n$-th order Fréchet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato’s analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular values is notoriously challenging through standard matrix-analysis techniques. To overcome this, we treat a real rectangular matrix as a compact operator on a finite-dimensional Hilbert space, and embed the rectangular matrix into a block self-adjoint operator so that non-symmetric perturbations are captured. Applying Kato’s asymptotic eigenvalue expansion to this construction, we obtain a general, closed-form expression for the infinitesimal $n$-th order spectral variations. Specializing to $n=2$ and deploying on a Kronecker-product representation with matrix convention yield the Hessian of a singular value, not found in the literature. By bridging abstract operator-theoretic perturbation theory with matrices, our framework equips researchers with a practical toolkit for higher-order spectral sensitivity studies in random matrix applications (e.g., adversarial perturbation in deep learning).
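
The abstract does not spell out the block self-adjoint embedding. A standard choice, assumed here for illustration, is the symmetric dilation (Jordan-Wielandt matrix) of $A \in \mathbb{R}^{m \times n}$, whose nonzero eigenvalues are plus and minus the singular values of $A$. With singular triplets satisfying $A v_i = \sigma_i u_i$ and $A^{\top} u_i = \sigma_i v_i$:

```latex
\[
H(A) =
\begin{pmatrix} 0 & A \\ A^{\top} & 0 \end{pmatrix}
\in \mathbb{R}^{(m+n)\times(m+n)},
\qquad
H(A)\begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}
= \begin{pmatrix} \pm A v_i \\ A^{\top} u_i \end{pmatrix}
= \pm\,\sigma_i \begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}.
\]
% First-order case for a simple (non-repeated) singular value,
% the well-known starting point that higher-order formulas extend:
\[
\mathrm{d}\sigma_i = u_i^{\top}\,(\mathrm{d}A)\,v_i .
\]
```

Because $H(A)$ is symmetric for any rectangular $A$, eigenvalue perturbation theory for self-adjoint operators applies to it directly, and a perturbation $\mathrm{d}A$ of the matrix enters as the symmetric perturbation $H(\mathrm{d}A)$.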



Article 1307

Title@2025-07-03 (4): Assessing Quantum Advantage for Gaussian Process Regression

Title: Assessing Quantum Advantage for Gaussian Process Regression Bewertung des Quantenvorteils für Gaussian Process Regression 评估高山进程倒退的量度优势 2505.22502v2

Authors (3): Dominic Lowe, M. S. Kim, Roberto Bondesan

Gaussian Process Regression is a well-known machine learning technique for which several quantum algorithms have been proposed. We show here that in a wide range of scenarios these algorithms show no exponential speedup. We achieve this by rigorously proving that the condition number of a kernel matrix scales at least linearly with the matrix size under general assumptions on the data and kernel. We additionally prove that the sparsity and Frobenius norm of a kernel matrix scale linearly under similar assumptions. The implications for the runtime of the quantum algorithms are independent of the complexity of loading classical data on a quantum computer and also apply to dequantised algorithms. We supplement our theoretical analysis with numerical verification for popular kernels in machine learning.



Article 1308

Title@2025-07-03 (4): Ordinality in Discrete-level Question Difficulty Estimation: Introducing Balanced DRPS and OrderedLogitNN

Title: Ordinality in Discrete-level Question Difficulty Estimation: Introducing Balanced DRPS and OrderedLogitNN Normalität in Diskrete-Level-Frage Schwierigkeitsgrad Schätzung: Einführung ausgewogene DRPS und bestelltLogitNN 不同层次问题难以估计的常态:采用平衡的DRPS和有秩序LogitNN 2507.00736v2

Authors (3): Arthur Thuy, Ekaterina Loginova, Dries F. Benoit

Recent years have seen growing interest in Question Difficulty Estimation (QDE) using natural language processing techniques. Question difficulty is often represented using discrete levels, framing the task as ordinal regression due to the inherent ordering from easiest to hardest. However, the literature has neglected the ordinal nature of the task, relying on classification or discretized regression models, with specialized ordinal regression methods remaining unexplored. Furthermore, evaluation metrics are tightly coupled to the modeling paradigm, hindering cross-study comparability. While some metrics fail to account for the ordinal structure of difficulty levels, none adequately address class imbalance, resulting in biased performance assessments. This study addresses these limitations by benchmarking three types of model outputs – discretized regression, classification, and ordinal regression – using the balanced Discrete Ranked Probability Score (DRPS), a novel metric that jointly captures ordinality and class imbalance. In addition to using popular ordinal regression methods, we propose OrderedLogitNN, extending the ordered logit model from econometrics to neural networks. We fine-tune BERT on the RACE++ and ARC datasets and find that OrderedLogitNN performs considerably better on complex tasks. The balanced DRPS offers a robust and fair evaluation metric for discrete-level QDE, providing a principled foundation for future research.
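
For readers unfamiliar with the ordered logit model from econometrics, the sketch below shows how a single scalar score and a set of increasing thresholds yield a probability for each difficulty level, via P(y <= k) = sigmoid(threshold_k - score). In a neural variant the score would presumably be the network output; that reading, along with the threshold values and function names, is an assumption and not a detail from the paper.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(score, thresholds):
    """Class probabilities of an ordered logit model.

    thresholds must be strictly increasing; K thresholds give K + 1 levels.
    Cumulative probabilities are P(y <= k) = sigmoid(thresholds[k] - score),
    and each class probability is the difference of adjacent cumulatives.
    """
    cdf = [sigmoid(t - score) for t in thresholds] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

# Hypothetical example: four difficulty levels, three thresholds.
probs = ordered_logit_probs(score=0.4, thresholds=[-1.0, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

Because every level shares the same score and differs only in its threshold, raising the score shifts probability mass monotonically toward harder levels, which is how the model encodes the ordering that a plain softmax classifier ignores.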



Article 1309

Title@2025-07-03 (4): A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning

Title: A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning Eine quadratische Peg in einem quadratischen Loch: Meta-Experte für langes semi-überwachtes Lernen 广场洞中的一个广场塞格:长期辍学的半模拟学习的元专家 2505.16341v2

Authors (2): Yaxin Hou, Yuheng Jia

This paper studies long-tailed semi-supervised learning (LTSSL) with distribution mismatch, where the class distribution of the labeled training data follows a long-tailed distribution and differs from that of the unlabeled training data. Most existing methods introduce auxiliary classifiers (experts) to model various unlabeled data distributions and produce pseudo-labels, but the expertise of the individual experts is not fully utilized. We observe that different experts are good at predicting different intervals of samples, e.g., the long-tailed expert is skilled at samples located in the head interval and the uniform expert excels at samples located in the medium interval. Therefore, we propose a dynamic expert assignment module that estimates the class membership (i.e., head, medium, or tail class) of samples and dynamically assigns a suitable expert to each sample based on the estimated membership, producing high-quality pseudo-labels in the training phase and predictions in the testing phase. We also show theoretically that integrating different experts’ strengths leads to a smaller generalization error bound. Moreover, we find that deeper features are more biased toward the head class but more discriminative, while shallower features are less biased but also less discriminative. We therefore propose a multi-depth feature fusion module that uses features from different depths to mitigate the model bias. Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR-10-LT, STL-10-LT, and SVHN-LT datasets across various settings. The code is available at https://github.com/yaxinhou/Meta-Expert.



Article 1310

Title@2025-07-03 (4): RetrySQL: text-to-SQL training with retry data for self-correcting query generation

Title: RetrySQL: text-to-SQL training with retry data for self-correcting query generation RetrySQL: Text-zu-SQL-Training mit Retry-Daten zur Selbstkorrektur der Abfragegenerierung 重新trySQL: 文本到 SQL 培训, 并重试数据, 用于自校查询生成 2507.02529v1

Authors (5): Alicja Rączkowska, Riccardo Belluzzo, Piotr Zieliński, Joanna Baran, Paweł Olszewski

The text-to-SQL task is an active challenge in Natural Language Processing. Many existing solutions focus on using black-box language models extended with specialized components within customized end-to-end text-to-SQL pipelines. While these solutions use both closed-source proprietary language models and coding-oriented open-source models, there is a lack of research regarding SQL-specific generative models. At the same time, recent advancements in self-correcting generation strategies show promise for improving the capabilities of existing architectures. The application of these concepts to the text-to-SQL task remains unexplored. In this paper, we introduce RetrySQL, a new approach to training text-to-SQL generation models. We prepare reasoning steps for reference SQL queries and then corrupt them to create retry data that contains both incorrect and corrected steps, separated by a special token. We continuously pre-train an open-source coding model with this data and demonstrate that retry steps yield an improvement of up to 4 percentage points in both overall and challenging execution accuracy metrics, compared to pre-training without retry data. Additionally, we confirm that supervised fine-tuning with LoRA is ineffective for learning from retry data and that full-parameter pre-training is necessary for that task. We show that the self-correcting behavior is learned by the model and that the increase in downstream accuracy metrics is a result of this additional skill. Finally, we incorporate RetrySQL-trained models into the full text-to-SQL pipeline and show that they are competitive in terms of execution accuracy with proprietary models that contain orders of magnitude more parameters. RetrySQL demonstrates that self-correction can be learned in the text-to-SQL task and provides a novel way of improving generation accuracy for SQL-oriented language models.
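
The retry-data construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the token string `[BACK]`, the distractor pool, and the function name `make_retry_sequence` are all hypothetical, and the abstract does not say how corrupted steps are chosen.

```python
import random

BACK_TOKEN = "[BACK]"  # hypothetical special token; the abstract does not name it


def make_retry_sequence(steps, distractors, p_retry, rng):
    """Interleave corrupted steps with their corrections.

    steps: correct reasoning steps for a reference SQL query.
    distractors: pool of plausible but wrong steps (assumed to be given).
    p_retry: probability of injecting an error before each correct step.
    rng: a random.Random instance, passed in for reproducibility.
    """
    out = []
    for step in steps:
        wrong = [d for d in distractors if d != step]
        if wrong and rng.random() < p_retry:
            out.append(rng.choice(wrong))  # incorrect step
            out.append(BACK_TOKEN)         # marks the previous step as retracted
        out.append(step)                   # the correct step
    return out
```

With `p_retry = 0` the output is the clean reference sequence, which corresponds to the "pre-training without retry data" baseline mentioned in the abstract. With `p_retry > 0` the model sees an error, the special token, and then the correction.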



Article 1311

Title@2025-07-03 (4): Deep learning four decades of human migration

Title: Deep learning four decades of human migration Tiefes Lernen von vier Jahrzehnten menschlicher Migration 四十年人类移徙的深入学习 2506.22821v2

Authors (2): Thomas Gaskin, Guy J. Abel

We present a novel and detailed dataset on origin-destination annual migration flows and stocks between 230 countries and regions, spanning the period from 1990 to the present. Our flow estimates are further disaggregated by country of birth, providing a comprehensive picture of migration over the last 35 years. The estimates are obtained by training a deep recurrent neural network to learn flow patterns from 18 covariates for all countries, including geographic, economic, cultural, societal, and political information. The recurrent architecture of the neural network means that the entire past can influence current migration patterns, allowing us to learn long-range temporal correlations. By training an ensemble of neural networks and additionally pushing uncertainty on the covariates through the trained network, we obtain confidence bounds for all our estimates, allowing researchers to pinpoint the geographic regions most in need of additional data collection. We validate our approach on various test sets of unseen data, demonstrating that it significantly outperforms traditional methods estimating five-year flows while delivering a significant increase in temporal resolution. The model is fully open source: all training data, neural network weights, and training code are made public alongside the migration estimates, providing a valuable resource for future studies of human migration.



Article 1312

Title@2025-07-03 (4): Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Title: Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling Selbstüberwachte Rahmenbedingungen für die Lautsprecherverifizierung durch Bootstrapped Positive Sampling 通过推动积极抽样,自我监督的演讲人核查框架 2501.17772v3

Authors (2): Theo Lepage, Reda Dehak

Recent developments in Self-Supervised Learning (SSL) have demonstrated significant potential for Speaker Verification (SV), but closing the performance gap with supervised systems remains an ongoing challenge. SSL frameworks rely on anchor-positive pairs, constructed from segments of the same audio utterance. Hence, positives have channel characteristics similar to those of their corresponding anchors, even with extensive data-augmentation. Therefore, this positive sampling strategy is a fundamental limitation as it encodes too much information regarding the recording source in the learned representations. This article introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped technique for sampling appropriate and diverse positives in SSL frameworks for SV. SSPS samples positives close to their anchor in the representation space, assuming that these pseudo-positives belong to the same speaker identity but correspond to different recording conditions. This method consistently demonstrates improvements in SV performance on VoxCeleb benchmarks when applied to major SSL frameworks, including SimCLR, SwAV, VICReg, and DINO. Using SSPS, SimCLR and DINO achieve 2.57% and 2.53% EER on VoxCeleb1-O, respectively. SimCLR yields a 58% relative reduction in EER, getting comparable performance to DINO with a simpler training framework. Furthermore, SSPS lowers intra-class variance and reduces channel information in speaker representations while exhibiting greater robustness without data-augmentation.



Article 1313

Title@2025-07-03 (4): Reconsidering the energy efficiency of spiking neural networks

Title: Reconsidering the energy efficiency of spiking neural networks Über die Energieeffizienz neuronaler Netze nachdenken 重新考虑突发神经网络的能源效率 2409.08290v2

Authors (3): Zhanglu Yan, Zhenyu Bai, Weng-Fai Wong

Spiking Neural Networks (SNNs) promise higher energy efficiency than conventional Quantized Artificial Neural Networks (QNNs) due to their event-driven, spike-based computation. However, prevailing energy evaluations often oversimplify, focusing on computational aspects while neglecting critical overheads like comprehensive data movement and memory access. Such simplifications can lead to misleading conclusions regarding the true energy benefits of SNNs. This paper presents a rigorous re-evaluation. We establish a fair baseline by mapping rate-encoded SNNs with $T$ timesteps to functionally equivalent QNNs with $\lceil \log_2(T+1) \rceil$ bits. This ensures both models have comparable representational capacities, as well as similar hardware requirements, enabling meaningful energy comparisons. We introduce a detailed analytical energy model encompassing core computation and data movement (sparse and dense activations, weights). Using this model, we systematically explore a wide parameter space, including intrinsic network characteristics ($T$, spike rate $s_r$, QNN sparsity $\gamma$, model size $N$, weight bit-level) and hardware characteristics (memory system and network-on-chip). Our analysis identifies specific operational regimes where SNNs genuinely offer superior energy efficiency. For example, under typical neuromorphic hardware conditions, SNNs with moderate time windows ($T \in [5,10]$) require an average spike rate ($s_r$) below 6.4% to outperform equivalent QNNs. These insights guide the design of genuinely energy-efficient neural network solutions.
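
The bit-width mapping in the abstract is easy to check numerically. A plausible reading, stated here as an assumption, is that a rate-encoded neuron observed for $T$ timesteps can emit between 0 and $T$ spikes, giving $T+1$ distinct values, which need $\lceil \log_2(T+1) \rceil$ bits. The function name `equivalent_qnn_bits` is hypothetical.

```python
def equivalent_qnn_bits(timesteps):
    """Bits needed to represent the T + 1 possible spike counts 0..T.

    For a positive integer T, ceil(log2(T + 1)) equals T.bit_length(),
    which avoids floating-point rounding in the logarithm.
    """
    if timesteps < 1:
        raise ValueError("timesteps must be a positive integer")
    return timesteps.bit_length()


# Bit widths for the moderate time windows quoted in the abstract.
bits_by_window = {t: equivalent_qnn_bits(t) for t in range(5, 11)}
```

For the window $T \in [5,10]$, this gives 3 bits for $T = 5, 6, 7$ and 4 bits for $T = 8, 9, 10$.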



Article 1314

Title@2025-07-03 (4): Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

Title: Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs Sprachenübergreifendes Reisen: Benchmarking Cross-Lingual Consistency in multimodalen LLMs 跨语言旅行:多模式LLM中跨语言一致基准 2505.15075v2

Authors (5): Hao Wang, Pinzhi Huang, Jihan Yang, Saining Xie, Daisuke Kawahara

The rapid evolution of multimodal large language models (MLLMs) has significantly enhanced their real-world applications. However, achieving consistent performance across languages, especially when integrating cultural knowledge, remains a significant challenge. To better assess this issue, we introduce two new benchmarks: KnowRecall and VisRecall, which evaluate cross-lingual consistency in MLLMs. KnowRecall is a visual question answering benchmark designed to measure factual knowledge consistency in 15 languages, focusing on cultural and historical questions about global landmarks. VisRecall assesses visual memory consistency by asking models to describe landmark appearances in 9 languages without access to images. Experimental results reveal that state-of-the-art MLLMs, including proprietary ones, still struggle to achieve cross-lingual consistency. This underscores the need for more robust approaches that produce truly multilingual and culturally aware models.



Article 1315

Title@2025-07-03 (4): Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning

Title: Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning Selbstgesteuerte Prozess-Reward-Optimierung mit neu definiertem Schrittweiser Vorteil für Prozess-Verstärkungs-Lernen 自指导流程向上优化,具有重新定义的逐步改进的流程强化学习优势 2507.01551v2

Authors (8): Wu Fei, Hao Kong, Shuxian Liang, Yang Lin, Yibo Yang, Jing Tang, Lei Chen, Xiansheng Hua

Process Reinforcement Learning (PRL) has demonstrated considerable potential in enhancing the reasoning capabilities of Large Language Models (LLMs). However, introducing additional process reward models incurs substantial computational overhead, and there is no unified theoretical framework for process-level advantage estimation. To bridge this gap, we propose Self-Guided Process Reward Optimization (SPRO), a novel framework that enables process-aware RL through two key innovations: (1) we first theoretically demonstrate that process rewards can be derived intrinsically from the policy model itself, and (2) we introduce well-defined cumulative process rewards and Masked Step Advantage (MSA), which facilitates rigorous step-wise action advantage estimation within shared-prompt sampling groups. Our experimental results demonstrate that SPRO outperforms vanilla GRPO with 3.4x higher training efficiency and a 17.5% test accuracy improvement. Furthermore, SPRO maintains a stable and elevated policy entropy throughout training while reducing the average response length by approximately $1/3$, evidencing sufficient exploration and prevention of reward hacking. Notably, SPRO incurs no additional computational overhead compared to outcome-supervised RL methods such as GRPO, which benefits industrial implementation.



Article 1316

Title@2025-07-03 (4): Monitoring of Static Fairness

Title: Monitoring of Static Fairness Überwachung der statischen Fairness 监测静态公平 2507.03048v1

Authors (4): Thomas A. Henzinger, Mahyar Karimi, Konstantin Kueffner, Kaushik Mallik

Machine-learned systems are in widespread use for making decisions about humans, and it is important that they are fair, i.e., not biased against individuals based on sensitive attributes. We present a general framework of runtime verification of algorithmic fairness for systems whose models are unknown, but are assumed to have a Markov chain structure, with or without full observation of the state space. We introduce a specification language that can model many common algorithmic fairness properties, such as demographic parity, equal opportunity, and social burden. We build monitors that observe a long sequence of events as generated by a given system, and output, after each observation, a quantitative estimate of how fair or biased the system was on that run until that point in time. The estimate is proven to be correct modulo a variable error bound and a given confidence level, where the error bound gets tighter as the observed sequence gets longer. We present two categories of monitoring algorithms, namely ones with a uniform error bound across all time points, and ones with weaker non-uniform, pointwise error bounds at different time points. Our monitoring algorithms use statistical tools that are adapted to suit the dynamic requirements of monitoring and the special needs of the fairness specifications. Using a prototype implementation, we show how we can monitor if a bank is fair in giving loans to applicants from different social backgrounds, and if a college is fair in admitting students while maintaining a reasonable financial burden on the society. In these experiments, our monitors took less than a millisecond to update their verdicts after each observation.
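
A much-simplified monitor in this spirit can be sketched for demographic parity. The sketch below assumes independent observations and uses a Hoeffding bound at a fixed point in time, so it corresponds loosely to the pointwise category of bounds. It does not handle the Markov chain setting or the specification language of the paper. The class name `ParityMonitor` and the group labels `"a"` and `"b"` are hypothetical.

```python
import math


class ParityMonitor:
    """Running estimate of acceptance-rate difference between two groups."""

    def __init__(self, delta):
        self.delta = delta  # allowed failure probability of the bound
        self.n = {"a": 0, "b": 0}
        self.accepted = {"a": 0, "b": 0}

    def observe(self, group, accepted):
        """Record one decision and return the updated verdict."""
        self.n[group] += 1
        self.accepted[group] += int(accepted)
        return self.estimate()

    def estimate(self):
        """Return (rate_a - rate_b, error_bound), or None before both groups are seen."""
        if min(self.n.values()) == 0:
            return None
        rate = {g: self.accepted[g] / self.n[g] for g in self.n}
        # Hoeffding: each rate is within its eps with probability >= 1 - delta/2;
        # a union bound over both groups gives overall confidence 1 - delta.
        eps = sum(
            math.sqrt(math.log(4 / self.delta) / (2 * self.n[g])) for g in self.n
        )
        return rate["a"] - rate["b"], eps
```

Each update is a constant number of arithmetic operations, and the error bound shrinks as the observed sequence gets longer, matching the qualitative behaviour described in the abstract.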



Article 1317

Title@2025-07-03 (4): TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification

Title: TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification TFOC-Net: Ein kurzzeitiger Fourier-Transform-basierter Deep-Learning-Ansatz zur Verbesserung der Klassifizierung bereichsübergreifender Motorbilder TFOC-Net:一个短时间短时间的四倍变换式深学习方法,以加强跨主题机动机图象分类 2507.02510v1

Authors (4): Ahmed G. Habashi, Ahmed M. Azab, Seif Eldawlatly, Gamal M. Aly

Cross-subject motor imagery (CS-MI) classification in brain-computer interfaces (BCIs) is a challenging task due to the significant variability in Electroencephalography (EEG) patterns across different individuals. This variability often results in lower classification accuracy compared to subject-specific models, presenting a major barrier to developing calibration-free BCIs suitable for real-world applications. In this paper, we introduce a novel approach that significantly enhances cross-subject MI classification performance through optimized preprocessing and deep learning techniques. Our approach involves direct classification of Short-Time Fourier Transform (STFT)-transformed EEG data, optimized STFT parameters, and a balanced batching strategy during training of a Convolutional Neural Network (CNN). This approach is validated across four different datasets, including three widely used benchmark datasets, leading to substantial improvements in cross-subject classification: 67.60% on the BCI Competition IV Dataset 1 (IV-1), 65.96% on Dataset 2A (IV-2A), and 80.22% on Dataset 2B (IV-2B), outperforming state-of-the-art techniques. Additionally, we systematically investigate the classification performance using MI windows ranging from the full 4-second window to 1-second windows. These results establish a new benchmark for generalizable, calibration-free MI classification in addition to contributing a robust open-access dataset to advance research in this domain.



Article 1318

Title@2025-07-03 (4): The Choice of Normalization Influences Shrinkage in Regularized Regression

Title: The Choice of Normalization Influences Shrinkage in Regularized Regression Die Wahl der Normalisierung beeinflusst Schrumpfung bei regularisierter Regression 正常化的影响在正规化后退过程中的缩减 2501.03821v3

Authors (2): Johan Larsson, Jonas Wallin

Regularized models are often sensitive to the scales of the features in the data and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But there are many different ways to normalize the features and the choice may have dramatic effects on the resulting model. In spite of this, there has so far been no research on this topic. In this paper, we begin to bridge this knowledge gap by studying normalization in the context of lasso, ridge, and elastic net regression. We focus on binary features and show that their class balances (proportions of ones) directly influence the regression coefficients and that this effect depends on the combination of normalization and regularization methods used. We demonstrate that this effect can be mitigated by scaling binary features with their variance in the case of the lasso and with their standard deviation in the case of ridge regression, but that this comes at the cost of increased variance of the coefficient estimates. For the elastic net, we show that scaling the penalty weights, rather than the features, can achieve the same effect. Finally, we also tackle mixes of binary and normal features as well as interactions and provide some initial results on how to normalize features in these cases.
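
The two scalings mentioned for binary features can be written down directly. A binary feature with a proportion $q$ of ones has variance $q(1-q)$ and standard deviation $\sqrt{q(1-q)}$. The sketch below centers a column and divides by $(q(1-q))^{\text{power}}$, so `power=1.0` corresponds to variance scaling and `power=0.5` to standard-deviation scaling. The function names and the `power` parameterization are illustrative choices, not taken from the paper.

```python
def binary_scale(q, power):
    """Scale factor (q(1-q))**power for a binary feature with class balance q."""
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    return (q * (1 - q)) ** power


def normalize_binary(column, power):
    """Center a 0/1 column at its mean and divide by the chosen scale.

    power=1.0 scales by the variance, power=0.5 by the standard deviation.
    """
    q = sum(column) / len(column)
    scale = binary_scale(q, power)
    return [(x - q) / scale for x in column]
```

The two choices agree only in how they treat balance qualitatively: as $q$ moves away from 0.5 the variance shrinks faster than the standard deviation, so variance scaling inflates an unbalanced feature more strongly.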



Article 1319

Title: IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders IndianBailJudgments-1200: Ein Multi-Attribut-Datensatz für legale NLP auf indischen Bail-Aufträgen IndianBailJudgments-1200:印度保释令法律国家保护程序多属性数据集 2507.02506v1

Authors (2): Sneha Deshmukh, Prathmesh Kamble

Legal NLP remains underdeveloped in regions like India due to the scarcity of structured datasets. We introduce IndianBailJudgments-1200, a new benchmark dataset comprising 1200 Indian court judgments on bail decisions, annotated across 20+ attributes including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated using a prompt-engineered GPT-4o pipeline and verified for consistency. This resource supports a wide range of legal NLP tasks such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.



Article 1320

Title@2025-07-03 (4): Continual Gradient Low-Rank Projection Fine-Tuning for LLMs

Title: Continual Gradient Low-Rank Projection Fine-Tuning for LLMs Kontinuierliche gradiente Low-Rank-Projektion Feintuning für LLMs LLMM 连续渐进式低射频低射射精度 2507.02503v1

Authors (4): Chenxu Wang, Yilin Lyu, Zicheng Sun, Liping Jing

Continual fine-tuning of Large Language Models (LLMs) is hampered by the trade-off between efficiency and expressiveness. Low-Rank Adaptation (LoRA) offers efficiency but constrains the model’s ability to learn new tasks and transfer knowledge due to its low-rank nature and reliance on explicit parameter constraints. We propose GORP (Gradient LOw Rank Projection) for Continual Learning, a novel training strategy that overcomes these limitations by synergistically combining full and low-rank parameters and jointly updating within a unified low-rank gradient subspace. GORP expands the optimization space while preserving efficiency and mitigating catastrophic forgetting. Extensive experiments on continual learning benchmarks demonstrate GORP’s superior performance compared to existing state-of-the-art approaches. Code is available at https://github.com/Wcxwcxw/GORP.



Article 1321

Title@2025-07-03 (4): Online Conformal Prediction with Efficiency Guarantees

Title: Online Conformal Prediction with Efficiency Guarantees Online-Konforme Vorhersage mit Effizienzgarantien 具有效率保障的在线非正式预测 2507.02496v1

Authors (1): Vaidehi Srinivas

We study the problem of conformal prediction in a novel online framework that directly optimizes efficiency. In our problem, we are given a target miscoverage rate $\alpha > 0$, and a time horizon $T$. On each day $t \le T$ an algorithm must output an interval $I_t \subseteq [0, 1]$, then a point $y_t \in [0, 1]$ is revealed. The goal of the algorithm is to achieve coverage, that is, $y_t \in I_t$ on (close to) a $(1 - \alpha)$-fraction of days, while maintaining efficiency, that is, minimizing the average volume (length) of the intervals played. This problem is an online analogue to the problem of constructing efficient confidence intervals. We study this problem over arbitrary and exchangeable (random order) input sequences. For exchangeable sequences, we show that it is possible to construct intervals that achieve coverage $(1 - \alpha) - o(1)$, while having length upper bounded by the best fixed interval that achieves coverage in hindsight. For arbitrary sequences however, we show that any algorithm that achieves a $\mu$-approximation in average length compared to the best fixed interval achieving coverage in hindsight, must make a multiplicative factor more mistakes than $\alpha T$, where the multiplicative factor depends on $\mu$ and the aspect ratio of the problem. Our main algorithmic result is a matching algorithm that can recover all Pareto-optimal settings of $\mu$ and number of mistakes. Furthermore, our algorithm is deterministic and therefore robust to an adaptive adversary. This gap between the exchangeable and arbitrary settings is in contrast to the classical online learning problem. In fact, we show that no single algorithm can simultaneously be Pareto-optimal for arbitrary sequences and optimal for exchangeable sequences. On the algorithmic side, we give an algorithm that achieves the near-optimal tradeoff between the two cases.
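
The two quantities the problem trades off, and the hindsight benchmark, can be computed as follows. This is a sketch of the evaluation only, not of the paper's online algorithm. It assumes a non-empty sequence and $0 \le \alpha \le 1$, and the function names are hypothetical.

```python
import math


def coverage_and_length(intervals, ys):
    """Fraction of days with y_t in I_t, and the average interval length."""
    covered = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, ys))
    avg_len = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return covered / len(ys), avg_len


def best_fixed_interval(ys, alpha):
    """Shortest single interval containing at least a (1 - alpha) fraction of ys."""
    pts = sorted(ys)
    k = max(math.ceil((1 - alpha) * len(pts)), 1)  # points that must be covered
    # An optimal interval can start and end at data points, so it suffices
    # to slide a window over k consecutive sorted points.
    windows = ((pts[i], pts[i + k - 1]) for i in range(len(pts) - k + 1))
    return min(windows, key=lambda iv: iv[1] - iv[0])
```

An online algorithm's average length can then be compared against the length of `best_fixed_interval(ys, alpha)`, which is the benchmark the $\mu$-approximation in the abstract is measured against.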



Article 1322

Title@2025-07-03 (4): Reliability-Adjusted Prioritized Experience Replay

Title: Reliability-Adjusted Prioritized Experience Replay Reliability-Adjusted Prioritized Experience Replay 调整了可靠性调整后确定优先经验重述 2506.18482v2

Authors (3): Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer

Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.
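
For reference, the baseline that ReaPER extends can be sketched as follows. This is the standard proportional variant of Prioritized Experience Replay, where a transition's sampling probability grows with the magnitude of its temporal difference error. It does not include the reliability adjustment, which the abstract does not specify. The default values for `alpha`, `beta`, and `eps` are common illustrative choices and the function names are hypothetical.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities P(i) proportional to (|delta_i| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(td_errors, batch_size, rng, alpha=0.6, beta=0.4):
    """Sample transition indices and their importance-sampling weights.

    Weights (N * P(i)) ** -beta correct for non-uniform sampling and are
    normalized by the largest weight in the batch.
    """
    probs = per_probabilities(td_errors, alpha)
    idx = rng.choices(range(len(probs)), weights=probs, k=batch_size)
    n = len(probs)
    weights = [(n * probs[i]) ** -beta for i in idx]
    max_w = max(weights)
    return idx, [w / max_w for w in weights]
```

Setting `alpha = 0` recovers the uniform sampling described as the traditional approach. Any reliability measure would enter by modifying the priorities before normalization, though how ReaPER does so is not stated in the abstract.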



Article 1323

Title@2025-07-03 (4): MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations

Title: MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations MC-INR: Effiziente Kodierung von multivariaten wissenschaftlichen Simulationsdaten mit Meta-Learning und clusterierten Impliziten Neuraldarstellungen MC-INR:利用元学习和集成隐隐性神经图示对多变量科学模拟数据进行有效编码 2507.02494v1

Authors (5): Hyunsoo Son, Jeonghyun Noh, Suemin Jeon, Chaoli Wang, Won-Ki Jeong

Implicit Neural Representations (INRs) are widely used to encode data as continuous functions, enabling the visualization of large-scale multivariate scientific simulation data with reduced memory usage. However, existing INR-based methods face three main limitations: (1) inflexible representation of complex structures, (2) a primary focus on single-variable data, and (3) dependence on structured grids. Thus, their performance degrades when applied to complex real-world datasets. To address these limitations, we propose a novel neural network-based framework, MC-INR, which handles multivariate data on unstructured grids. It combines meta-learning and clustering to enable flexible encoding of complex structures. To further improve performance, we introduce a residual-based dynamic re-clustering mechanism that adaptively partitions clusters based on local error. We also propose a branched layer to leverage multivariate data through independent branches simultaneously. Experimental results demonstrate that MC-INR outperforms existing methods on scientific data encoding tasks.



Article 1324

Title@2025-07-03 (4): Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke

Title: Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke Ergebnisvorhersage und individualisierte Abschätzung des Behandlungseffekts bei Patienten mit großem Gefäßverschluss-Schlaganfall 大型船舶隔离中风病人的结果预测和个人化治疗估计效果 2507.03046v1

Authors (5): Lisa Herzog, Pascal Bühler, Ezequiel de la Rosa, Beate Sick, Susanne Wegener

Mechanical thrombectomy has become the standard of care in patients with stroke due to large vessel occlusion (LVO). However, only 50% of successfully treated patients show a favorable outcome. We developed and evaluated interpretable deep learning models to predict functional outcomes in terms of the modified Rankin Scale score alongside individualized treatment effects (ITEs) using data of 449 LVO stroke patients from a randomized clinical trial. Besides clinical variables, we considered non-contrast CT (NCCT) and angiography (CTA) scans, which were integrated using novel foundation models to make use of advanced imaging information. Clinical variables had good predictive power for binary functional outcome prediction (AUC of 0.719 [0.666, 0.774]), which could be slightly improved by adding CTA imaging (AUC of 0.737 [0.687, 0.795]). Adding NCCT scans or a combination of NCCT and CTA scans to clinical features yielded no improvement. The most important clinical predictor for functional outcome was pre-stroke disability. While estimated ITEs were well calibrated to the average treatment effect, discriminatory ability was limited, as indicated by a C-for-Benefit statistic of around 0.55 in all models. In summary, the models allowed us to jointly integrate CT imaging and clinical features while achieving state-of-the-art prediction performance and ITE estimates. Yet, further research is needed, particularly to improve ITE estimation.



Article 1325

Title@2025-07-03 (4): Deep Transfer Learning for Kidney Cancer Diagnosis

Title: Deep Transfer Learning for Kidney Cancer Diagnosis Deep Transfer Learning für Nierenkrebs-Diagnose 肾脏癌症诊断深入转移学习 2408.04318v2

Authors (8): Yassine Habchi, Hamza Kheddar, Yassine Himeur, Mohamed Chahine Ghanem, Abdelkrim Boukabou, Shadi Atalla, Wathiq Mansoor, Hussain Al-Ahmad

Incurable diseases continue to pose major challenges to global healthcare systems, with their prevalence shaped by lifestyle, economic, social, and genetic factors. Among these, kidney disease remains a critical global health issue, requiring ongoing research to improve early diagnosis and treatment. In recent years, deep learning (DL) has shown promise in medical imaging and diagnostics, driving significant progress in automatic kidney cancer (KC) detection. However, the success of DL models depends heavily on the availability of high-quality, domain-specific datasets, which are often limited and expensive to acquire. Moreover, DL models demand substantial computational power and storage, restricting their real-world clinical use. To overcome these barriers, transfer learning (TL) has emerged as an effective approach, enabling the reuse of pre-trained models from related domains to enhance KC diagnosis. This paper presents a comprehensive survey of DL-based TL frameworks for KC detection, systematically reviewing key methodologies, their advantages, and limitations, and analyzing their practical performance. It further discusses challenges in applying TL to medical imaging and highlights emerging trends that could influence future research. This review demonstrates the transformative role of TL in precision medicine, particularly oncology, by improving diagnostic accuracy, lowering computational demands, and supporting the integration of AI-powered tools in healthcare. The insights provided offer valuable guidance for researchers and practitioners, paving the way for future advances in KC diagnostics and personalized treatment strategies.



Article 1326

Title@2025-07-03 (4): Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD

Title: Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD Learning Massive-scale Partial Correlation Networks in klinischen Multi-Omics-Studien mit HP-ACCORD 与HP-ACCORD在临床多工程学研究中学习大规模大规模局部关联网络 2412.11554v3

Authors (6): Sungdong Lee, Joshua Bang, Youngrae Kim, Hyungwon Choi, Sang-Yun Oh, Joong-Ho Won

Graphical model estimation from multi-omics data requires a balance between statistical estimation performance and computational scalability. We introduce a novel pseudolikelihood-based graphical model framework that reparameterizes the target precision matrix while preserving the sparsity pattern and estimates it by minimizing an $\ell_1$-penalized empirical risk based on a new loss function. The proposed estimator maintains estimation and selection consistency in various metrics under high-dimensional assumptions. The associated optimization problem allows for a provably fast computation algorithm using a novel operator-splitting approach and communication-avoiding distributed matrix multiplication. A high-performance computing implementation of our framework was tested using simulated data with up to one million variables, demonstrating complex dependency structures similar to those found in biological networks. Leveraging this scalability, we estimated a partial correlation network from a dual-omic liver cancer data set. The co-expression network estimated from the ultrahigh-dimensional data demonstrated superior specificity in prioritizing key transcription factors and co-activators by excluding the impact of epigenetic regulation, thereby highlighting the value of computational scalability in multi-omic data analysis.



Article 1327

Title@2025-07-03 (4): Variational Kolmogorov-Arnold Network

Title: Variational Kolmogorov-Arnold Network Unterschiedliches Kolmogorov-Arnold-Netzwerk 变化科尔莫戈罗夫-阿诺尔德网络 2507.02466v1

Authors (3): Francesco Alesiani, Henrik Christiansen, Federico Errica

Kolmogorov-Arnold Networks (KANs) are an emerging architecture for building machine learning models. KANs are based on the theoretical foundation of the Kolmogorov-Arnold Theorem and its expansions, which provide an exact representation of a multivariate continuous bounded function as the composition of a limited number of univariate continuous functions. While such theoretical results are powerful, their use as a representation learning alternative to a multi-layer perceptron (MLP) hinges on the ad-hoc choice of the number of bases modeling each of the univariate functions. In this work, we show how to address this problem by adaptively learning a potentially infinite number of bases for each univariate function during training. We therefore model the problem as a variational inference optimization problem. Our proposal, called InfinityKAN, which uses backpropagation, extends the potential applicability of KANs by treating an important hyperparameter as part of the learning process.
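
As a reminder of the representation the abstract refers to, the classical Kolmogorov-Arnold theorem states that any continuous $f$ on $[0,1]^n$ can be written with continuous univariate functions $\Phi_q$ and $\phi_{q,p}$ as shown below. The second line is only a sketch of how a KAN-style model might parameterize one univariate function with $K$ basis functions; the symbols $c_k$, $b_k$, and $K$ are illustrative assumptions, not notation from the paper.

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\qquad
\phi(x) \approx \sum_{k=1}^{K} c_k \, b_k(x)
```

Under this reading, $K$ is the hyperparameter that is usually fixed by hand and that the abstract proposes to learn during training.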



Article 1328

Title@2025-07-03 (4): Quantum-enhanced causal discovery for a small number of samples

Title: Quantum-enhanced causal discovery for a small number of samples Quantenverstärkte kausale Entdeckung für eine kleine Anzahl von Proben 少数样本的量吨量强化因果发现 2501.05007v2

Authors (6): Yu Terada, Ken Arai, Yu Tanaka, Yota Maeda, Hiroshi Ueno, Hiroyuki Tezuka

The discovery of causal relations from observed data has attracted significant interest from disciplines such as economics, social sciences, and biology. In practical applications, considerable knowledge of the underlying systems is often unavailable, and real data are usually associated with nonlinear causal structures, which makes the direct use of most conventional causality analysis methods difficult. This study proposes a novel quantum Peter-Clark (qPC) algorithm for causal discovery that does not require any assumptions about the underlying model structures. Based on conditional independence tests in a class of reproducing kernel Hilbert spaces characterized by quantum circuits, the proposed algorithm can explore causal relations from the observed data drawn from arbitrary distributions. We conducted systematic experiments on fundamental graphs of causal structures, demonstrating that the qPC algorithm exhibits better performance, particularly with smaller sample sizes compared to its classical counterpart. Furthermore, we proposed a novel optimization approach based on Kernel Target Alignment (KTA) for determining hyperparameters of quantum kernels. This method effectively reduced the risk of false positives in causal discovery, enabling more reliable inference. Our theoretical and experimental results demonstrate that the quantum algorithm can empower classical algorithms for accurate inference in causal discovery, supporting them in regimes where classical algorithms typically fail. In addition, the effectiveness of this method was validated using the datasets on Boston housing prices, heart disease, and biological signaling systems as real-world applications. These findings highlight the potential of quantum-based causal discovery methods in addressing practical challenges, particularly in small-sample scenarios, where traditional approaches have shown significant limitations.
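
To make the Kernel Target Alignment criterion concrete, here is a minimal sketch of the standard definition, $A(K, yy^\top) = \langle K, yy^\top \rangle_F / (\lVert K \rVert_F \, \lVert yy^\top \rVert_F)$, used to pick a kernel hyperparameter. The paper applies KTA to quantum kernels; this sketch substitutes a classical RBF kernel, and the toy data, the `gammas` grid, and the function names are all assumptions for illustration.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Alignment between kernel matrix K and labels y in {-1, +1}."""
    Y = np.outer(y, y)  # ideal target kernel built from the labels
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

def rbf_kernel(X, gamma):
    """Classical RBF kernel, standing in for a quantum kernel (assumption)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

# Toy two-class data (made up): two points near 0, two points near 1.
X = np.array([[0.0], [0.1], [1.0], [1.1]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Hypothetical hyperparameter grid; keep the value with the highest alignment.
gammas = [0.01, 0.1, 1.0, 10.0]
scores = {g: kernel_target_alignment(rbf_kernel(X, g), y) for g in gammas}
best_gamma = max(scores, key=scores.get)
print(scores, best_gamma)
```

The alignment equals 1 when the kernel matrix is proportional to $yy^\top$, so maximizing it pushes the kernel toward the label structure.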



Article 1329

Title@2025-07-03 (4): Universal Collection of Euclidean Invariants between Pairs of Position-Orientations

Title: Universal Collection of Euclidean Invariants between Pairs of Position-Orientations Universelle Sammlung von Euklidischen Invarianten zwischen Paaren von Positionsorientierungen 位置对等位置对等之间通用的欧立地异异异物收集 2504.03299v2

Authors (3): Gijs Bellaard, Bart M. N. Smets, Remco Duits

Euclidean E(3) equivariant neural networks that employ scalar fields on position-orientation space M(3) have been effectively applied to tasks such as predicting molecular dynamics and properties. To perform equivariant convolution-like operations in these architectures one needs Euclidean invariant kernels on M(3) x M(3). In practice, a handcrafted collection of invariants is selected, and this collection is then fed into multilayer perceptrons to parametrize the kernels. We rigorously describe an optimal collection of 4 smooth scalar invariants on the whole of M(3) x M(3). By optimal we mean that the collection is independent and universal: all invariants are pertinent, and any invariant kernel is a function of them. We evaluate two collections of invariants, one universal and one not, using the PONITA neural network architecture. Our experiments show that using a universal collection of invariants significantly improves the accuracy of PONITA.



Article 1330

Title@2025-07-03 (4): Roto-Translation Invariant Metrics on Position-Orientation Space

Title: Roto-Translation Invariant Metrics on Position-Orientation Space Roto-Translation Invariante Metrics auf Positions-Orientierungsraum 定位空间的旋转变换变化计量器 2504.03309v2

Authors (2): Gijs Bellaard, Bart M. N. Smets

Riemannian metrics on the position-orientation space M(3) that are roto-translation group SE(3) invariant play a key role in image analysis tasks like enhancement, denoising, and segmentation. These metrics enable roto-translation equivariant algorithms, with the associated Riemannian distance often used in implementation. However, computing the Riemannian distance is costly, which makes it unsuitable in situations where constant recomputation is needed. We propose the mav (minimal angular velocity) distance, defined as the Riemannian length of a geometrically meaningful curve, as a practical alternative. We see an application of the mav distance in geometric deep learning. Namely, neural network architectures such as PONITA rely on geometric invariants to create their roto-translation equivariant models. The mav distance offers a trainable invariant, with the parameters that determine the Riemannian metric acting as learnable weights. In this paper we: 1) classify and parametrize all SE(3) invariant metrics on M(3), 2) describe how to efficiently calculate the mav distance, and 3) investigate whether including the mav distance within PONITA can positively impact its accuracy in predicting molecular properties.



Article 1331

Title@2025-07-03 (4): Red grape detection with accelerated artificial neural networks in the FPGA’s programmable logic

Title: Red grape detection with accelerated artificial neural networks in the FPGA’s programmable logic Rote Traubenerkennung mit beschleunigten künstlichen neuronalen Netzwerken in der programmierbaren Logik des FPGA FPGA的可编程逻辑的红葡萄探测与加速人工神经网络 2507.02443v1

Authors (5): Sandro Costa Magalhães, Marco Almeida, Filipe Neves dos Santos, António Paulo Moreira, Jorge Dias

Robots usually slow down for scanning to detect objects while moving. Additionally, the robot’s camera is configured with a low framerate to match the speed of the detection algorithms. These constraints apply while executing tasks and exploring, which increases the robots’ task execution time. AMD has developed the Vitis-AI framework to deploy detection algorithms into FPGAs. However, this tool does not fully use the FPGAs’ PL. In this work, we use the FINN architecture to deploy three ANNs, MobileNet v1 with 4-bit quantisation, CNV with 2-bit quantisation, and CNV with 1-bit quantisation (BNN), inside an FPGA’s PL. The models were trained on the RG2C dataset, a self-acquired dataset released in open access. MobileNet v1 performed best, reaching a success rate of 98% and an inference speed of 6611 FPS. In this work, we proved that we can use FPGAs to speed up ANNs and make them suitable for attention mechanisms.



Article 1332

Title@2025-07-03 (4): A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms

Title: A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms Eine tief lernfähige numerische Relativitätsüberlagerung für Binary Black Hole Waveforms 二进制黑洞波形的深学习动力数字相对相对性替代工具 2412.06946v2

Authors (9): Osvaldo Gramaxo Freitas, Anastasios Theodoropoulos, Nino Villanueva, Tiago Fernandes, Solange Nunes, José A. Font, Antonio Onofre, Alejandro Torres-Forné, José D. Martin-Guerrero

Gravitational-wave approximants are essential for gravitational-wave astronomy, allowing coverage of the binary black hole parameter space for inference or matched filtering without costly numerical relativity (NR) simulations, but generally trading some accuracy for computational efficiency. To reduce this trade-off, NR surrogate models can be constructed using interpolation within NR waveform space. We present a two-stage training approach for neural network-based NR surrogate models. Initially trained on approximant-generated waveforms and then fine-tuned with NR data, these dual-stage artificial neural surrogate (DANSur) models offer rapid and competitively accurate waveform generation, generating millions of waveforms in under 20 ms on a GPU while keeping mean mismatches with NR around $10^{-4}$. We show that, implemented in the bilby framework, they can be used for parameter estimation tasks.



Article 1333

Title@2025-07-03 (4): Unsupervised Cognition

Title: Unsupervised Cognition Unüberwachte Kognition 未受监督的认知 2409.18624v3

Authors (5): Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon

Unsupervised learning methods have a soft inspiration in cognition models. To this day, the most successful unsupervised learning methods revolve around clustering samples in a mathematical space. In this paper we propose a primitive-based, unsupervised learning approach for decision-making inspired by a novel cognition framework. This representation-centric approach models the input space constructively as a distributed hierarchical structure in an input-agnostic way. We compared our approach with the current state of the art in unsupervised learning classification, in classification on small and incomplete datasets, and in cancer type classification. We show how our proposal outperforms the previous state of the art. We also evaluate some cognition-like properties of our proposal, where it not only outperforms the compared algorithms (even supervised learning ones) but also shows a different, more cognition-like behaviour.



Article 1334

Title@2025-07-03 (4): Anatomical Foundation Models for Brain MRIs

Title: Anatomical Foundation Models for Brain MRIs Anatomische Grundlagenmodelle für Hirn-MRTs 脑吸管机模型解剖基础模型 2408.07079v4

Authors (4): Carlo Alberto Barbano, Matteo Brunello, Benoit Dufumier, Marco Grangetto

Deep Learning (DL) in neuroimaging has become increasingly relevant for detecting neurological conditions and neurodegenerative disorders. One of the most prominent biomarkers in neuroimaging is brain age, which has been shown to be a good indicator for different conditions, such as Alzheimer’s Disease. Using brain age for weakly supervised pre-training of DL models in transfer learning settings has also recently shown promising results, especially when dealing with data scarcity of different conditions. On the other hand, anatomical information of brain MRIs (e.g. cortical thickness) can provide important information for learning good representations that can be transferred to many downstream tasks. In this work, we propose AnatCL, an anatomical foundation model for brain MRIs that i) leverages anatomical information in a weakly contrastive learning approach, and ii) achieves state-of-the-art performance across many different downstream tasks. To validate our approach we consider 12 different downstream tasks for the diagnosis of different conditions such as Alzheimer’s Disease, autism spectrum disorder, and schizophrenia. Furthermore, we also target the prediction of 10 different clinical assessment scores using structural MRI data. Our findings show that incorporating anatomical information during pre-training leads to more robust and generalizable representations. Pre-trained models can be found at: https://github.com/EIDOSLAB/AnatCL.



Article 1335

Title@2025-07-03 (4): Optimisation Is Not What You Need

Title: Optimisation Is Not What You Need Optimierung ist nicht das, was man braucht 最优化不是你需要的 2507.03045v1

Authors (1): Alfredo Ibias

The Artificial Intelligence field has focused on developing optimisation methods to solve multiple problems, specifically problems that we thought to be only solvable through cognition. The obtained results have been outstanding, being able to even surpass the Turing Test. However, we have found that these optimisation methods share some fundamental flaws that prevent them from becoming a true artificial cognition. Specifically, the field has identified catastrophic forgetting as a fundamental problem in developing such cognition. This paper formally proves that this problem is inherent to optimisation methods, and as such it will always limit approaches that try to solve the Artificial General Intelligence problem as an optimisation problem. Additionally, it addresses the problem of overfitting and discusses other smaller problems that optimisation methods pose. Finally, it empirically shows how world-modelling methods avoid suffering from either problem. In conclusion, the field of Artificial Intelligence needs to look outside the machine learning field to find methods capable of developing an artificial cognition.



Article 1336

Title@2025-07-03 (4): The Artificial Scientist – in-transit Machine Learning of Plasma Simulations

Title: The Artificial Scientist – in-transit Machine Learning of Plasma Simulations Der Künstliche Wissenschaftler – in-transit maschinelles Lernen von Plasmasimulationen 人造科学家 – – Plasma模拟模拟的中转机器学习 2501.03383v3

Authors (22): Jeffrey Kelling, Vicente Bolea, Michael Bussmann, Ankush Checkervarty, Alexander Debus, Jan Ebert, Greg Eisenhauer, Vineeth Gutta, Stefan Kesselheim, Scott Klasky, Vedhas Pandit, Richard Pausch, Norbert Podhorszki, Franz Poschel, David Rogers, Jeyhun Rustamov, Steve Schmerler, Ulrich Schramm, Klaus Steiniger, Rene Widera, Anna Willmann, Sunita Chandrasekaran

Increasing HPC cluster sizes and large-scale simulations that produce petabytes of data per run create massive IO and storage challenges for analysis. Deep learning-based techniques, in particular, make use of these amounts of domain data to extract patterns that help build scientific understanding. Here, we demonstrate a streaming workflow in which simulation data is streamed directly to a machine-learning (ML) framework, circumventing the file system bottleneck. Data is transformed in transit, asynchronously to the simulation and the training of the model. With the presented workflow, data operations can be performed in common and easy-to-use programming languages, freeing the application user from adapting the application output routines. As a proof-of-concept we consider a GPU accelerated particle-in-cell (PIConGPU) simulation of the Kelvin-Helmholtz instability (KHI). We employ experience replay to avoid catastrophic forgetting in learning from this non-steady process in a continual manner. We detail challenges addressed while porting and scaling to the Frontier exascale system.
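
The idea of experience replay on a data stream can be sketched as follows: each training batch mixes freshly streamed samples with samples drawn from a bounded buffer of earlier ones. The abstract does not say which buffer policy is used; reservoir sampling is assumed here purely for illustration, and `ReplayBuffer` and `training_batches` are hypothetical names.

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling (an assumed policy)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of items offered so far
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep each item seen so far with equal probability capacity/seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))

def training_batches(stream, buffer, replay_k):
    """Yield each streamed batch extended with up to replay_k replayed items."""
    for batch in stream:
        replayed = buffer.sample(replay_k)  # drawn before the new data is stored
        for item in batch:
            buffer.add(item)
        yield list(batch) + replayed

# Toy stream of three batches standing in for in-transit simulation data.
stream = [[0, 1], [2, 3], [4, 5]]
buffer = ReplayBuffer(capacity=4)
batches = list(training_batches(stream, buffer, replay_k=2))
print(batches)
```

Because old samples keep reappearing in later batches, the model is repeatedly exposed to earlier stages of the non-steady process, which is the mechanism by which replay counteracts forgetting.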



Article 1337

Title@2025-07-03 (4): Is Complex Query Answering Really Complex?

Title: Is Complex Query Answering Really Complex? Ist die Antwort auf komplexe Abfragen wirklich komplex? 复杂查询回答是否真的复杂? 2410.12537v3

Authors (7): Cosimo Gregucci, Bo Xiong, Daniel Hernandez, Lorenzo Loconte, Pasquale Minervini, Steffen Staab, Antonio Vergari

Complex query answering (CQA) on knowledge graphs (KGs) is gaining momentum as a challenging reasoning task. In this paper, we show that the current benchmarks for CQA might not be as complex as we think, as the way they are built distorts our perception of progress in this field. For example, we find that in these benchmarks, most queries (up to 98% for some query types) can be reduced to simpler problems, e.g., link prediction, where only one link needs to be predicted. The performance of state-of-the-art CQA models decreases significantly when such models are evaluated on queries that cannot be reduced to easier types. Thus, we propose a set of more challenging benchmarks composed of queries that require models to reason over multiple hops and better reflect the construction of real-world KGs. In a systematic empirical investigation, the new benchmarks show that current CQA methods leave much to be desired.



Article 1338

Title@2025-07-03 (4): Determination Of Structural Cracks Using Deep Learning Frameworks

Title: Determination Of Structural Cracks Using Deep Learning Frameworks Bestimmung von strukturellen Rissen mittels Deep Learning Frameworks 利用深学习框架确定结构裂缝 2507.02416v1

Authors (3): Subhasis Dasgupta, Jaydip Sen, Tuhina Halder

Structural crack detection is a critical task for public safety as it helps in preventing potential structural failures that could endanger lives. Manual detection by inexperienced personnel can be slow, inconsistent, and prone to human error, which may compromise the reliability of assessments. The current study addresses these challenges by introducing a novel deep-learning architecture designed to enhance the accuracy and efficiency of structural crack detection. In this research, various configurations of residual U-Net models were utilized. These models, due to their robustness in capturing fine details, were further integrated into an ensemble with a meta-model comprising convolutional blocks. This unique combination aimed to boost prediction efficiency beyond what individual models could achieve. The ensemble’s performance was evaluated against well-established architectures such as SegNet and the traditional U-Net. Results demonstrated that the residual U-Net models outperformed their predecessors, particularly with low-resolution imagery, and the ensemble model exceeded the performance of individual models, proving to be the most effective. The assessment was based on the Intersection over Union (IoU) metric and the Dice coefficient. The ensemble model achieved the highest scores, signifying superior accuracy. This advancement paves the way for more reliable automated systems in structural defect monitoring tasks.
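
For reference, the two evaluation metrics named above can be computed on binary masks as in the following minimal sketch. The masks are made-up toy arrays and `iou_and_dice` is a hypothetical helper, not code from the study.

```python
import numpy as np

def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Convention assumed here: two empty masks count as a perfect match.
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return float(iou), float(dice)

# Toy 2x3 crack masks (1 = crack pixel).
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
iou, dice = iou_and_dice(pred, target)
print(iou, dice)
```

The two scores are monotonically related by Dice = 2·IoU / (1 + IoU), so they rank models identically on a single image but can differ once averaged over a dataset.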



Article 1339

Title@2025-07-03 (4): ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation

Title: ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation ForgGrip: VR 手动操纵中现实力力控的无参考课程学习 2503.08061v4

Authors (6): DongHeun Han, Byungmin Kim, RoUn Lee, KyeongMin Kim, Hyoseok Hwang, HyeongYeop Kang

Realistic hand manipulation is a key component of immersive virtual reality (VR), yet existing methods often rely on kinematic approaches or motion-capture datasets that omit crucial physical attributes such as contact forces and finger torques. Consequently, these approaches prioritize tight, one-size-fits-all grips rather than reflecting users’ intended force levels. We present ForceGrip, a deep learning agent that synthesizes realistic hand manipulation motions, faithfully reflecting the user’s grip force intention. Instead of mimicking predefined motion datasets, ForceGrip uses generated training scenarios, randomizing object shapes, wrist movements, and trigger input flows, to challenge the agent with a broad spectrum of physical interactions. To effectively learn from these complex tasks, we employ a three-phase curriculum learning framework comprising Finger Positioning, Intention Adaptation, and Dynamic Stabilization. This progressive strategy ensures stable hand-object contact, adaptive force control based on user inputs, and robust handling under dynamic conditions. Additionally, a proximity reward function enhances natural finger motions and accelerates training convergence. Quantitative and qualitative evaluations reveal ForceGrip’s superior force controllability and plausibility compared to state-of-the-art methods. Demo videos are available as supplementary material and the code is provided at https://han-dongheun.github.io/ForceGrip.



Article 1340

Title@2025-07-03 (4): EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks

Title: EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks EquiTabPFN: Eine Ziel-Permutation Equivariant Prior Fitted Networks EquiTabPFN: 目标-变异等同先前适合的网络 2502.06684v2

Authors (3): Michael Arbel, David Salinas, Frank Hutter

Recent foundational models for tabular data, such as TabPFN, excel at adapting to new tasks via in-context learning, but remain constrained to a fixed, pre-defined number of target dimensions, often necessitating costly ensembling strategies. We trace this constraint to a deeper architectural shortcoming: these models lack target equivariance, so that permuting target dimension orderings alters their predictions. This deficiency gives rise to an irreducible “equivariance gap”, an error term that introduces instability in predictions. We eliminate this gap by designing a fully target-equivariant architecture, ensuring permutation invariance via equivariant encoders, decoders, and a bi-attention mechanism. Empirical evaluation on standard classification benchmarks shows that, on datasets with more classes than those seen during pre-training, our model matches or surpasses existing methods while incurring lower computational overhead.



Article 1341

Title@2025-07-03 (4): Significativity Indices for Agreement Values

Title: Significativity Indices for Agreement Values Signifikativitätsindizes für Vereinbarungswerte 协议价值的量化指数 2504.15325v2

Authors (4): Alberto Casagrande, Francesco Fabris, Rossano Girometti, Roberto Pagliarini

Agreement measures, such as Cohen’s kappa or intraclass correlation, gauge the matching between two or more classifiers. They are used in a wide range of contexts from medicine, where they evaluate the effectiveness of medical treatments and clinical trials, to artificial intelligence, where they can quantify the approximation due to the reduction of a classifier. The consistency of different classifiers with a gold standard can be compared simply by using the order induced by their agreement measure with respect to the gold standard itself. Nevertheless, labelling an approach as good or bad exclusively by using the value of an agreement measure requires a scale or a significativity index. Some quality scales have been proposed in the literature for Cohen’s kappa, but they are mainly naive, and their boundaries are arbitrary. This work proposes a general approach to evaluate the significativity of any agreement value between two classifiers and introduces two significativity indices: one dealing with finite data sets, the other one handling classification probability distributions. Moreover, this manuscript addresses the computational challenges of evaluating such indices and proposes some efficient algorithms for their evaluation.
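
As background for the agreement values discussed above, Cohen's kappa is $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. The sketch below computes it for two label sequences; it does not implement the paper's significativity indices, and the labels are made up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences.

    Raises ZeroDivisionError if chance agreement is exactly 1
    (both raters always use the same single label).
    """
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two raters' marginal label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Two toy classifiers labelling eight items.
rater_a = ["y", "y", "y", "n", "n", "n", "y", "n"]
rater_b = ["y", "y", "n", "n", "n", "y", "y", "n"]
kappa = cohens_kappa(rater_a, rater_b)
print(kappa)
```

Here the raters agree on 6 of 8 items while chance alone would give 0.5, so kappa is 0.5. Whether 0.5 counts as "good" is exactly the question a scale or significativity index is meant to answer.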



Article 1342

Title@2025-07-03 (4): Attention-based clustering

Title: Attention-based clustering Aufmerksamkeitsbasiertes Clustering 以关注为主的集群集群 2505.13112v2

Authors (3): Rodrigo Maulen-Soto, Claire Boyer, Pierre Marion

Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids. This phenomenon highlights the ability of attention-based layers to capture underlying distributional structure. We further examine an attention layer with key, query, and value matrices fixed to the identity, and show that, even without any trainable parameters, it can perform in-context quantization, revealing the surprising capacity of transformer-based methods to adapt dynamically to input-specific distributions.
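
The parameter-free layer mentioned at the end of the abstract can be illustrated with a small sketch: with key, query, and value matrices set to the identity, softmax attention reduces to softmax(X Xᵀ) X, so each token is replaced by a similarity-weighted average of all tokens. The temperature `beta`, the toy two-cluster data, and the function name are assumptions added for illustration and are not taken from the paper.

```python
import numpy as np

def identity_attention(X, beta=1.0):
    """Softmax attention with key, query, and value matrices equal to the identity.

    `beta` is a hypothetical temperature introduced for this sketch.
    """
    scores = beta * X @ X.T
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X  # each row is a convex combination of the input tokens

# Toy input: 20 tokens around (3, 3) followed by 20 tokens around (-3, -3).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(3.0, 0.5, size=(20, 2)),
    rng.normal(-3.0, 0.5, size=(20, 2)),
])
out = identity_attention(X, beta=0.1)

print("input spread of first cluster: ", X[:20].std(axis=0))
print("output spread of first cluster:", out[:20].std(axis=0))
```

In this toy setting the outputs of tokens from the same cluster land much closer together than the inputs, which gives an intuition for how such a layer can act like a quantizer of the input distribution.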



Article 1343

Title@2025-07-03 (4): Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability

Title: Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability Benchmarking Akan ASR-Modelle über Domain-spezifische Datensätze: Eine vergleichende Bewertung von Leistung, Skalierbarkeit und Anpassungsfähigkeit 确定Akan ASR模型基准的全域具体数据集:业绩比较评价、可缩放性和可调适性 2507.02407v1

Authors (8): Mark Atta Mensah, Isaac Wiafe, Akon Ekpezu, Justice Kwame Appati, Jamal-Deen Abdulai, Akosua Nyarkoa Wiafe-Akenten, Frank Ernest Yeboah, Gifty Odame

Most existing automatic speech recognition (ASR) research evaluates models using in-domain datasets. However, it seldom evaluates how they generalize across diverse speech contexts. This study addresses this gap by benchmarking seven Akan ASR models built on transformer architectures, such as Whisper and Wav2Vec2, using four Akan speech corpora to determine their performance. These datasets encompass various domains, including culturally relevant image descriptions, informal conversations, biblical scripture readings, and spontaneous financial dialogues. A comparison of the word error rate and character error rate highlighted domain dependency, with models performing optimally only within their training domains while showing marked accuracy degradation in mismatched scenarios. This study also identified distinct error behaviors between the Whisper and Wav2Vec2 architectures. Whereas fine-tuned Whisper Akan models led to more fluent but potentially misleading transcription errors, Wav2Vec2 produced more obvious yet less interpretable outputs when encountering unfamiliar inputs. This trade-off between readability and transparency in ASR errors should be considered when selecting architectures for low-resource language (LRL) applications. These findings highlight the need for targeted domain adaptation techniques, adaptive routing strategies, and multilingual training frameworks for Akan and other LRLs.
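
The two metrics compared in the study are both edit distances normalised by reference length: word error rate (WER) over word tokens and character error rate (CER) over characters. The following is a minimal sketch with made-up English placeholder sentences rather than Akan transcripts; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

In this example one substitution and one deletion over six reference words give a WER of 2/6. A fluent but wrong word and an obviously garbled one each cost the same single substitution, which is one reason the error behaviours described above are hard to tell apart from WER alone.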



Article 1344

Title@2025-07-03 (4): Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization

Title: Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization Verbesserung der Konsistenz in der Fahrzeug-Trajektorie-Vorhersage durch Preference-Optimierung 通过优先优化提高车辆轨迹预测的一致性 2507.02406v1

Authors (7): Caio Azevedo, Lina Achaji, Stefano Sabatini, Nicola Poerio, Grzegorz Bartyzel, Sascha Hornauer, Fabien Moutarde

Trajectory prediction is an essential step in the pipeline of an autonomous vehicle. Inaccurate or inconsistent predictions regarding the movement of agents in its surroundings lead to poorly planned maneuvers and potentially dangerous situations for the end-user. Current state-of-the-art deep-learning-based trajectory prediction models can achieve excellent accuracy on public datasets. However, when used in more complex, interactive scenarios, they often fail to capture important interdependencies between agents, leading to inconsistent predictions among agents in the traffic scene. Inspired by the efficacy of incorporating human preference into large language models, this work fine-tunes trajectory prediction models in multi-agent settings using preference optimization. Taking automatically calculated preference rankings among predicted futures as input to the fine-tuning process, our experiments, which use state-of-the-art models on three separate datasets, show that we are able to significantly improve scene consistency while minimally sacrificing trajectory prediction accuracy and without adding any excess computational requirements at inference time.



Article 1345

Title@2025-07-03 (4): XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation

Title: XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation XGeM: Ein Multi-Prompt-Stiftungsmodell für multimodale medizinische Datengenerierung XGeM:多式医疗数据多式生成多式医疗多模式基金会模式 2501.04614v3

Authors (8): Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda

The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM. First, we benchmark it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view chest X-ray and radiological report generation. Second, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we show how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. The project page is at https://cosbidev.github.io/XGeM/.



Article 1346

Title@2025-07-03 (4): Wildlife Target Re-Identification Using Self-supervised Learning in Non-Urban Settings

Title: Wildlife Target Re-Identification Using Self-supervised Learning in Non-Urban Settings Wildlife Target Re-Identification mit selbstüberwachtem Lernen in nicht-Urban-Einstellungen 在非城市环境中利用自我监督的学习重新确定野生动物目标 2507.02403v1

Authors (2): Mufhumudzi Muthivhi, Terence L. van Zyl

Wildlife re-identification aims to match individuals of the same species across different observations. Current state-of-the-art (SOTA) models rely on class labels to train supervised models for individual classification. This dependence on annotated data has driven the curation of numerous large-scale wildlife datasets. This study investigates self-supervised learning (SSL) for wildlife re-identification. We automatically extract two distinct views of an individual using temporal image pairs from camera trap data without supervision. The image pairs train a self-supervised model from a potentially endless stream of video data. We evaluate the learnt representations against supervised features on open-world scenarios and transfer learning in various wildlife downstream tasks. The analysis of the experimental results shows that self-supervised models are more robust even with limited data. Moreover, self-supervised features outperform supervised features across all downstream tasks. The code is available at https://github.com/pxpana/SSLWildlife.



Article 1347

Title@2025-07-03 (4): TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation

Title: TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation TABNet: Ein Triplet Augmentation Self-Recovery Framework mit Boundary-Aware Pseudo-Labors für die Segmentierung medizinischer Bilder TABNet:三联增强自我恢复框架,配有医疗图像分割的边界-软件软件-标签 2507.02399v1

Authors (8): Peilin Zhang, Shaouxan Wua, Jun Feng, Zhuo Jin, Zhizezhang Gao, Jingkun Chen, Yaqiong Xing, Xiao Zhang

Background and objective: Medical image segmentation is a core task in various clinical applications. However, acquiring large-scale, fully annotated medical image datasets is both time-consuming and costly. Scribble annotations, as a form of sparse labeling, provide an efficient and cost-effective alternative for medical image segmentation. However, the sparsity of scribble annotations limits feature learning of the target region and provides insufficient boundary supervision, which poses significant challenges for training segmentation networks. Methods: We propose TABNet, a novel weakly-supervised medical image segmentation framework, consisting of two key components: the triplet augmentation self-recovery (TAS) module and the boundary-aware pseudo-label supervision (BAP) module. The TAS module enhances feature learning through three complementary augmentation strategies: intensity transformation improves the model’s sensitivity to texture and contrast variations, cutout forces the network to capture local anatomical structures by masking key regions, and jigsaw augmentation strengthens the modeling of global anatomical layout by disrupting spatial continuity. By guiding the network to recover complete masks from diverse augmented inputs, TAS promotes a deeper semantic understanding of medical images under sparse supervision. The BAP module enhances pseudo-supervision accuracy and boundary modeling by fusing dual-branch predictions into a loss-weighted pseudo-label and introducing a boundary-aware loss for fine-grained contour refinement. Results: Experimental evaluations on two public datasets, ACDC and MSCMRseg, demonstrate that TABNet significantly outperforms state-of-the-art methods for scribble-based weakly supervised segmentation. Moreover, it achieves performance comparable to that of fully supervised methods.
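The three augmentation strategies named for the TAS module can be illustrated on a tiny grayscale image. This is a minimal sketch under stated assumptions: images are plain 2D lists of floats in [0, 1], the intensity transformation is a linear gain and bias, and the jigsaw grid divides the image evenly. The function names and parameters are hypothetical and are not taken from the TABNet code.

```python
import random

def intensity_transform(img, gain, bias):
    """Linear contrast/brightness change, clipped to [0, 1] (assumed form)."""
    return [[min(1.0, max(0.0, gain * p + bias)) for p in row] for row in img]

def cutout(img, top, left, size):
    """Return a copy of img with a size x size square set to zero."""
    out = [row[:] for row in img]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[0]))):
            out[r][c] = 0.0
    return out

def jigsaw(img, grid, rng):
    """Split img into grid x grid tiles and reassemble them in shuffled order.

    Assumes height and width are divisible by grid.
    """
    th, tw = len(img) // grid, len(img[0]) // grid
    tiles = [
        [row[c * tw:(c + 1) * tw] for row in img[r * th:(r + 1) * th]]
        for r in range(grid) for c in range(grid)
    ]
    rng.shuffle(tiles)
    out = []
    for r in range(grid):
        for i in range(th):
            line = []
            for c in range(grid):
                line.extend(tiles[r * grid + c][i])
            out.append(line)
    return out
```

In a self-recovery setup of the kind the abstract describes, the network would receive each augmented input and be trained to predict the complete mask; that training loop is outside the scope of this sketch.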



Article 1348

Title@2025-07-03 (4): Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

Title: Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement Hinterer Übergang Modellierung für unüberwachte Diffusions-basierte Sprachverbesserung 未经监督的基于传播的语音增强的外向过渡建模 2507.02391v1

Authors (4): Mostafa Sadeghi, Jean-Eudes Ayilo, Romain Serizel, Xavier Alameda-Pineda

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed likelihood score, combined with the unconditional score via a trade-off hyperparameter. In this work, we propose two alternative algorithms that directly model the conditional reverse transition distribution of diffusion states. The first method integrates the diffusion prior with the observation model in a principled way, removing the need for hyperparameter tuning. The second defines a diffusion process over the noisy speech itself, yielding a fully tractable and exact likelihood score. Experiments on the WSJ0-QUT and VoiceBank-DEMAND datasets demonstrate improved enhancement metrics and greater robustness to domain shifts compared to both supervised and unsupervised baselines.



Article 1349

Title@2025-07-03 (4): MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning

Title: MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning MInCo: Informationskonflikte im distraktierten visuellen Modell-basierten Verstärkungslernen abmildern MINCO: 减缓视觉模型化强化学习中的信息冲突 2504.04164v3

Authors (7): Shiguang Sun, Hanbo Zhang, Zeyang Liu, Xinrui Yang, Lipeng Wan, Xingyu Chen, Xuguang Lan

Existing visual model-based reinforcement learning (MBRL) algorithms with observation reconstruction often suffer from information conflicts, which make it difficult to learn compact representations and hence result in less robust policies, especially in the presence of task-irrelevant visual distractions. In this paper, we first reveal, from an information-theoretic perspective, that the information conflicts in current visual MBRL algorithms stem from visual representation learning and latent dynamics modeling. Based on this finding, we present a new algorithm to resolve information conflicts for visual MBRL, named MInCo, which mitigates information conflicts by leveraging negative-free contrastive learning, aiding in learning invariant representations and robust policies despite noisy observations. To prevent the dominance of visual representation learning, we introduce time-varying reweighting to bias the learning towards dynamics modeling as training proceeds. We evaluate our method on several robotic control tasks with dynamic background distractions. Our experiments demonstrate that MInCo learns invariant representations against background noise and consistently outperforms current state-of-the-art visual MBRL methods. Code is available at https://github.com/ShiguangSun/minco.



Article 1350

Title@2025-07-03 (4): Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited

Title: Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited Sparse Gaussian Processes: Strukturierte Annäherungen und Power-EP Revisited 粗浅的高斯进程:结构相近和电源-EP 重新审视 2507.02377v1

Authors (2): Thang D. Bui, Michalis K. Titsias

Inducing-point-based sparse variational Gaussian processes have become the standard workhorse for scaling up GP models. Recent advances show that these methods can be improved by introducing a diagonal scaling matrix to the conditional posterior density given the inducing points. This paper first considers an extension that employs a block-diagonal structure for the scaling matrix, provably tightening the variational lower bound. We then revisit the unifying framework of sparse GPs based on Power Expectation Propagation (PEP) and show that it can leverage and benefit from the new structured approximate posteriors. Through extensive regression experiments, we show that the proposed block-diagonal approximation consistently performs similarly to or better than existing diagonal approximations while maintaining comparable computational costs. Furthermore, the new PEP framework with structured posteriors provides competitive performance across various power hyperparameter settings, offering practitioners flexible alternatives to standard variational approaches.



Article 1351

Title@2025-07-03 (4): Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations

Title: Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations Gateformer: Multivariate Zeitreihen durch Temporale und Variate-Wise-Achtung mit gezahnten Darstellungen prognostizieren Gatefex:通过时空和蒸汽预报推进多变时间序列 2505.00307v3

Authors (2): Yu-Hsiang Lan, Eric K. Oermann

There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformer presents a unique challenge as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information in the context of the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention mechanisms on these learned embeddings. Gating operations in both cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements up to 20.7% over original models. Code is available at this repository: https://github.com/nyuolab/Gateformer.



Article 1352

Title@2025-07-03 (4): Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning

Title: Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning Gute Darstellung, bessere Erklärung: Rolle konvolutionärer neuraler Netzwerke bei transformerbasierter Fernerkundung Bildunterschriften 良好的表述,更好的解释:进化神经网络在以变换器为基础的遥感图像描述中的作用 2502.16095v2

Authors (4): Swadhin Das, Saarthak Gupta, Kamal Kumar, Raksha Sharma

Remote Sensing Image Captioning (RSIC) is the process of generating meaningful descriptions from remote sensing images. It has recently gained significant attention, with encoder-decoder models serving as the backbone for generating meaningful captions. The encoder extracts essential visual features from the input image, transforming them into a compact representation, while the decoder utilizes this representation to generate coherent textual descriptions. Transformer-based models in particular have gained significant popularity due to their ability to capture long-range dependencies and contextual information. The decoder has been well explored for text generation, whereas the encoder remains relatively unexplored. However, optimizing the encoder is crucial as it directly influences the richness of extracted features, which in turn affects the quality of generated captions. To address this gap, we systematically evaluate twelve different convolutional neural network (CNN) architectures within a transformer-based encoder framework to assess their effectiveness in RSIC. The evaluation consists of two stages: first, a numerical analysis categorizes CNNs into different clusters, based on their performance. The best performing CNNs are then evaluated by a human annotator. Additionally, we analyze the impact of different search strategies, namely greedy search and beam search, to ensure the best caption. The results highlight the critical role of encoder selection in improving captioning performance, demonstrating that specific CNN architectures significantly enhance the quality of generated descriptions for remote sensing images. By providing a detailed comparison of multiple encoders, this study offers valuable insights to guide advances in transformer-based image captioning models.



Article 1353

Title@2025-07-03 (4): Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes

Title: Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes Erforschung der Integration von großen Sprachmodellen in industrielle Testwartungsprozesse 探索将大语言模型纳入工业试验维护工艺 2409.06416v2

Authors (6): Jingxiong Liu, Ludvig Lemner, Linnea Wahlgren, Gregory Gay, Nasser Mohammadiha, Joakim Wennerberg

Much of the cost and effort required during the software testing process is invested in performing test maintenance - the addition, removal, or modification of test cases to keep the test suite in sync with the system-under-test or to otherwise improve its quality. Tool support could reduce the cost - and improve the quality - of test maintenance by automating aspects of the process or by providing guidance and support to developers. In this study, we explore the capabilities and applications of large language models (LLMs) - complex machine learning models adapted to textual analysis - to support test maintenance. We conducted a case study at Ericsson AB where we explore the triggers that indicate the need for test maintenance, the actions that LLMs can take, and the considerations that must be made when deploying LLMs in an industrial setting. We also propose and demonstrate a multi-agent architecture that can predict which tests require maintenance following a change to the source code. Collectively, these contributions advance our theoretical and practical understanding of how LLMs can be deployed to benefit industrial test maintenance processes.



Article 1354

Title@2025-07-03 (4): Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations

Title: Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations Tiefenverstärkte DRAM-Equalizer-Parameter-Optimierung mit latenten Darstellungen 利用延迟代表方式优化深度强化学习-基于深强化的 DRAM 平准参数 2507.02365v1

Authors (2): Muhammad Usama, Dong Eui Chang

Equalizer parameter optimization for signal integrity in high-speed Dynamic Random Access Memory systems is crucial but often computationally demanding or model-reliant. This paper introduces a data-driven framework employing learned latent signal representations for efficient signal integrity evaluation, coupled with a model-free Advantage Actor-Critic reinforcement learning agent for parameter optimization. The latent representation captures vital signal integrity features, offering a fast alternative to direct eye diagram analysis during optimization, while the reinforcement learning agent derives optimal equalizer settings without explicit system models. Applied to industry-standard Dynamic Random Access Memory waveforms, the method achieved significant eye-opening window area improvements: 42.7% for cascaded Continuous-Time Linear Equalizer and Decision Feedback Equalizer structures, and 36.8% for Decision Feedback Equalizer-only configurations. These results demonstrate superior performance, computational efficiency, and robust generalization across diverse Dynamic Random Access Memory units compared to existing techniques. Core contributions include an efficient latent signal integrity metric for optimization, a robust model-free reinforcement learning strategy, and validated superior performance for complex equalizer architectures.



Article 1355

Title@2025-07-03 (4): Detection of Rail Line Track and Human Beings Near the Track to Avoid Accidents

Title: Detection of Rail Line Track and Human Beings Near the Track to Avoid Accidents Erkennung von Schienengleisen und Menschen in der Nähe des Gleises, um Unfälle zu vermeiden 在避免事故的轨道上探测铁路线轨迹和附近人类 2507.03040v1

Authors (2): Mehrab Hosain, Rajiv Kapoor

This paper presents an approach for rail line detection and the identification of human beings in proximity to the track, utilizing the YOLOv5 deep learning model to mitigate potential accidents. The technique incorporates real-time video data to identify railway tracks with impressive accuracy and recognizes nearby moving objects within a one-meter range, specifically targeting the identification of humans. This system aims to enhance safety measures in railway environments by providing real-time alerts for any detected human presence close to the track. The integration of a functionality to identify objects at a longer distance further fortifies the preventative capabilities of the system. With a precise focus on real-time object detection, this method is poised to deliver significant contributions to the existing technologies in railway safety. The effectiveness of the proposed method is demonstrated through a comprehensive evaluation, yielding a remarkable improvement in accuracy over existing methods. These results underscore the potential of this approach to revolutionize safety measures in railway environments, providing a substantial contribution to accident prevention strategies.



Article 1356

Title@2025-07-03 (4): Enhancing Swarms Durability to Threats via Graph Signal Processing and GNN-based Generative Modeling

Title: Enhancing Swarms Durability to Threats via Graph Signal Processing and GNN-based Generative Modeling Verbesserung der Swarms Durability to Threats durch Graph Signal Processing und GNN-basierte Generative Modellierung 通过图形信号处理和基于全球NNN的创制模型,提高牲畜对威胁的易感性 2507.03039v1

Authors (3): Jonathan Karin, Zoe Piran, Mor Nitzan

Swarms, such as schools of fish or drone formations, are prevalent in both natural and engineered systems. While previous works have focused on the social interactions within swarms, the role of external perturbations, such as environmental changes, predators, or communication breakdowns, in affecting swarm stability is not fully understood. Our study addresses this gap by modeling swarms as graphs and applying graph signal processing techniques to analyze perturbations as signals on these graphs. By examining predation, we uncover a “detectability-durability trade-off”, demonstrating a tension between a swarm’s ability to evade detection and its resilience to predation, once detected. We provide theoretical and empirical evidence for this trade-off, explicitly tying it to properties of the swarm’s spatial configuration. Toward task-specific optimized swarms, we introduce SwaGen, a graph neural network-based generative model. We apply SwaGen to resilient swarm generation by defining a task-specific loss function, optimizing the contradicting trade-off terms simultaneously. With this, SwaGen reveals novel spatial configurations, optimizing the trade-off at both ends. Applying the model can guide the design of robust artificial swarms and deepen our understanding of natural swarm dynamics.
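To make "perturbations as signals on graphs" concrete, one standard graph signal processing quantity is the Laplacian quadratic form, which measures how much a signal varies across connected nodes. The sketch below computes it for a weighted edge list. It illustrates the general technique only; the abstract does not state which graph signal processing measures the paper uses, and the function name and edge format are assumptions.

```python
def laplacian_quadratic_form(edges, signal):
    """Return x^T L x = sum over edges of w_ij * (x_i - x_j)^2.

    edges: iterable of (i, j, weight) tuples for an undirected graph
           (hypothetical format, e.g. agents linked to nearby agents).
    signal: list of per-node values, e.g. a perturbation felt by each agent.
    """
    return sum(w * (signal[i] - signal[j]) ** 2 for i, j, w in edges)
```

A small value means the perturbation is smooth over the swarm graph (neighbors are affected similarly), while a large value means it is localized or irregular.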



Article 1357

Title@2025-07-03 (4): Offline Reinforcement Learning with Penalized Action Noise Injection

Title: Offline Reinforcement Learning with Penalized Action Noise Injection Offline-Verstärkung Lernen mit Penalized Action Noise Injection 用惩罚性行动噪音注射加强离线强化学习 2507.02356v1

Authors (2): JunHyeok Oh, Byung-Jun Lee

Offline reinforcement learning (RL) optimizes a policy using only a fixed dataset, making it a practical approach in scenarios where interaction with the environment is costly. Due to this limitation, generalization ability is key to improving the performance of offline RL algorithms, as demonstrated by recent successes of offline RL with diffusion models. However, it remains questionable whether such diffusion models are necessary for highly performing offline RL algorithms, given their significant computational requirements during inference. In this paper, we propose Penalized Action Noise Injection (PANI), a method that simply enhances offline learning by utilizing noise-injected actions to cover the entire action space, while penalizing according to the amount of noise injected. This approach is inspired by how diffusion models have worked in offline RL algorithms. We provide a theoretical foundation for this method, showing that offline RL algorithms with such noise-injected actions solve a modified Markov Decision Process (MDP), which we call the noisy action MDP. PANI is compatible with a wide range of existing off-policy and offline RL algorithms, and despite its simplicity, it demonstrates significant performance improvements across various benchmarks.
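The core idea, perturbing dataset actions and penalizing in proportion to the perturbation, can be sketched for a single transition. The abstract gives neither the noise distribution nor the penalty form, so the Gaussian noise, the uniformly sampled scale, the squared-distance penalty, and the names `inject_noise`, `max_sigma`, and `penalty_weight` are all assumptions chosen for illustration.

```python
import random

def inject_noise(action, reward, rng, max_sigma=1.0, penalty_weight=1.0):
    """Return a noise-injected action and a reward penalized by the noise.

    Assumed form: a' = a + sigma * eps with eps ~ N(0, I), and
    r' = r - penalty_weight * ||a' - a||^2. Not the paper's exact rule.
    """
    sigma = rng.uniform(0.0, max_sigma)  # random noise scale per sample
    noisy = [a + sigma * rng.gauss(0.0, 1.0) for a in action]
    sq_dist = sum((n - a) ** 2 for n, a in zip(noisy, action))
    return noisy, reward - penalty_weight * sq_dist
```

Under this reading, actions far from the dataset are still visited during training but carry a lower learning signal, which is one way an off-policy learner could cover the action space without overvaluing unseen actions.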



Article 1358

Title@2025-07-03 (4): Text-Aware Image Restoration with Diffusion Models

Title: Text-Aware Image Restoration with Diffusion Models Text-Aware Bildrestaurierung mit Diffusionsmodellen 带有传播模型的文本软件图像恢复 2506.09993v2

Authors (9): Jaewon Min, Jin Hyeon Kim, Paul Hyunbin Cho, Jaeeun Lee, Jihye Park, Minkyu Park, Sangpil Kim, Hyunhee Park, Seungryong Kim

Image restoration aims to recover degraded images. However, existing diffusion-based restoration methods, despite great success in natural image restoration, often struggle to faithfully reconstruct textual regions in degraded images. Those methods frequently generate plausible but incorrect text-like patterns, a phenomenon we refer to as text-image hallucination. In this paper, we introduce Text-Aware Image Restoration (TAIR), a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Furthermore, we propose a multi-task diffusion framework, called TeReDiff, that integrates internal features from diffusion models into a text-spotting module, enabling both components to benefit from joint training. This allows for the extraction of rich text representations, which are utilized as prompts in subsequent denoising steps. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy. See our project page: https://cvlab-kaist.github.io/TAIR/



Article 1359

Title@2025-07-03 (4): Privacy-Preserving Operating Room Workflow Analysis using Digital Twins

Title: Privacy-Preserving Operating Room Workflow Analysis using Digital Twins Datenschutz-Bewahrung von Betriebsraum-Workflow-Analysen mit digitalen Zwillingen 利用数字双对进行工作流量分析 2504.12552v2

Authors (10): Alejandra Perez, Han Zhang, Yu-Chun Ku, Lalithkumar Seenivasan, Roger Soberanis, Jose L. Porras, Richard Day, Jeff Jopling, Peter Najjar, Mathias Unberath

The operating room (OR) is a complex environment where optimizing workflows is critical to reduce costs and improve patient outcomes. While computer vision approaches for automatic recognition of perioperative events can identify bottlenecks for OR optimization, privacy concerns limit the use of OR videos for automated event detection. We propose a two-stage pipeline for privacy-preserving OR video analysis and event detection. First, we leverage vision foundation models for depth estimation and semantic segmentation to generate de-identified Digital Twins (DT) of the OR from conventional RGB videos. Second, we employ the SafeOR model, a fused two-stream approach that processes segmentation masks and depth maps for OR event detection. Evaluation on an internal dataset of 38 simulated surgical trials with five event classes shows that our DT-based approach achieves performance on par with – and sometimes better than – raw RGB video-based models for OR event detection. Digital Twins enable privacy-preserving OR workflow analysis, facilitating the sharing of de-identified data across institutions and potentially enhancing model generalizability by mitigating domain-specific appearance differences.



Article 1360

Title@2025-07-03 (4): UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security

Title: UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security UniNet: Ein einheitliches Multi-Granular Traffic Modeling Framework für Netzwerksicherheit UniNet:网络安全统一多重点交通示范框架 2503.04174v2

Authors (3): Binghui Wu, Dinil Mon Divakaran, Mohan Gurusamy

As modern networks grow increasingly complex, driven by diverse devices, encrypted protocols, and evolving threats, network traffic analysis has become critically important. Existing machine learning models often rely only on a single representation of packets or flows, limiting their ability to capture the contextual relationships essential for robust analysis. Furthermore, task-specific architectures for supervised, semi-supervised, and unsupervised learning lead to inefficiencies in adapting to varying data formats and security tasks. To address these gaps, we propose UniNet, a unified framework that introduces a novel multi-granular traffic representation (T-Matrix), integrating session, flow, and packet-level features to provide comprehensive contextual information. Combined with T-Attent, a lightweight attention-based model, UniNet efficiently learns latent embeddings for diverse security tasks. Extensive evaluations across four key network security and privacy problems (anomaly detection, attack classification, IoT device identification, and encrypted website fingerprinting) demonstrate UniNet’s significant performance gain over state-of-the-art methods, achieving higher accuracy, lower false positive rates, and improved scalability. By addressing the limitations of single-level models and unifying traffic analysis paradigms, UniNet sets a new benchmark for modern network security.



Article 1361

Title@2025-07-03 (4): DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values

Title: DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values DeltaSHAP: Erklären von Vorhersageentwicklungen bei der Online-Patientenüberwachung mit Shapley-Werten DelsaSHAP: 解释在有阴影值的在线患者监测中的预测演变 2507.02342v1

Authors (4): Changhun Kim, Yechan Mun, Sangchul Hahn, Eunho Yang

This study proposes DeltaSHAP, a novel explainable artificial intelligence (XAI) algorithm specifically designed for online patient monitoring systems. In clinical environments, discovering the causes driving patient risk evolution is critical for timely intervention, yet existing XAI methods fail to address the unique requirements of clinical time series explanation tasks. To this end, DeltaSHAP addresses three key clinical needs: explaining the changes in consecutive predictions rather than isolated prediction scores, providing both magnitude and direction of feature attributions, and delivering these insights in real time. By adapting Shapley values to temporal settings, our approach accurately captures feature coalition effects. It further attributes prediction changes using only the actually observed feature combinations, making it efficient and practical for time-sensitive clinical applications. We also introduce new evaluation metrics to evaluate the faithfulness of the attributions for online time series, and demonstrate through experiments on online patient monitoring tasks that DeltaSHAP outperforms state-of-the-art XAI methods in both explanation quality (by 62%) and computational efficiency (a 33% time reduction) on the MIMIC-III decompensation benchmark. We release our code at https://github.com/AITRICS/DeltaSHAP.
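The idea of attributing a change between consecutive predictions, with both sign and magnitude, can be shown with textbook Shapley values. The sketch below is the exact, exponential-cost Shapley computation over all feature orderings, using the previous time step as the baseline; it is not the DeltaSHAP algorithm, which the abstract says restricts itself to actually observed feature combinations for efficiency. The function name `delta_shapley` and the `model` callable are hypothetical.

```python
from itertools import permutations

def delta_shapley(model, x_prev, x_curr):
    """Exact Shapley attribution of model(x_curr) - model(x_prev).

    model: callable taking a list of feature values and returning a float.
    Features are switched from their previous to their current value one
    at a time, averaged over every ordering (only feasible for few features).
    """
    n = len(x_prev)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        x = list(x_prev)
        last = model(x)
        for i in order:
            x[i] = x_curr[i]       # update feature i to its new value
            out = model(x)
            phi[i] += out - last   # marginal effect of feature i
            last = out
    return [p / len(orders) for p in phi]
```

By construction the attributions sum to the prediction change, so a positive value marks a feature that pushed the risk score up between the two time steps and a negative value one that pulled it down.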



Article 1362

Title@2025-07-03 (4): Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Title: Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Skywork-Reward-V2:通过人类-AI协同增强优先数据曲线 2507.01352v2

Authors (12): Chris Yuhao Liu, Liang Zeng, Yuzhen Xiao, Jujie He, Jiacai Liu, Chaojie Wang, Rui Yan, Wei Shen, Fuxiang Zhang, Jiacheng Xu, Yang Liu, Yahui Zhou

Despite the critical role of reward models (RMs) in reinforcement learning from human feedback (RLHF), current state-of-the-art open RMs perform poorly on most existing evaluation benchmarks, failing to capture the spectrum of nuanced and sophisticated human preferences. Even approaches that incorporate advanced training techniques have not yielded meaningful performance improvements. We hypothesize that this brittleness stems primarily from limitations in preference datasets, which are often narrowly scoped, synthetically labeled, or lack rigorous quality control. To address these challenges, we present a large-scale preference dataset comprising 40 million preference pairs, named SynPref-40M. To enable data curation at scale, we design a human-AI synergistic two-stage pipeline that leverages the complementary strengths of human annotation quality and AI scalability. In this pipeline, humans provide verified annotations, while large language models perform automatic curation based on human guidance. Training on this preference mixture, we introduce Skywork-Reward-V2, a suite of eight reward models ranging from 0.6B to 8B parameters, trained on a carefully curated subset of 26 million preference pairs from SynPref-40M. We demonstrate that Skywork-Reward-V2 is versatile across a wide range of capabilities, including alignment with human preferences, objective correctness, safety, resistance to stylistic biases, and best-of-N scaling, achieving state-of-the-art performance across seven major reward model benchmarks. Ablation studies confirm that the effectiveness of our approach stems not only from data scale but also from high-quality curation. The Skywork-Reward-V2 series represents substantial progress in open reward models, highlighting the untapped potential of existing preference datasets and demonstrating how human-AI curation synergy can unlock significantly higher data quality.



Article 1363

Title@2025-07-03 (4): Cautious Next Token Prediction

Title: Cautious Next Token Prediction Vorsichtige nächste Zeichen Vorhersage 谨慎的次下 Tok 预测 2507.03038v1

Authors (10): Yizhou Wang, Lingzhi Zhang, Yue Bai, Mang Tik Chiu, Zhengmian Hu, Mingyuan Zhang, Qihua Dong, Yu Yin, Sohrab Amirghodsi, Yun Fu

The next-token prediction paradigm has prevailed for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence. Nevertheless, such an approach leads to inferior performance in various NLP tasks when the model is not certain about testing questions. To this end, we propose a brand new training-free decoding strategy, dubbed Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we sample multiple trials starting from the step independently and stop when encountering any punctuation. Then we select the trial with the lowest perplexity score, viewed as the most probable and reliable trial path given the model’s capacity. The trial number is negatively correlated with the prediction confidence, i.e., the less confident the model is, the more trials it should sample. This is consistent with human beings’ behaviour: when feeling uncertain or unconfident, one tends to think more creatively, exploring multiple thinking paths, to cautiously select the path one feels most confident about. Extensive experiments on both LLMs and MLLMs show that our proposed CNTP approach outperforms existing standard decoding strategies consistently by a clear margin. Moreover, the integration of CNTP with self-consistency can further improve over vanilla self-consistency. We believe our proposed CNTP has the potential to become one of the default choices for LLM decoding. Code is available at https://github.com/wyzjack/CNTP.
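The decoding procedure described above can be sketched with a toy model. This is a minimal illustration, not the released implementation: `next_probs` is a hypothetical callable mapping a token context to a `{token: probability}` dictionary, and the entropy threshold, the punctuation set, and the rule that scales the number of trials with normalized entropy are assumptions chosen for the example.

```python
import math
import random

PUNCTUATION = {".", ",", "!", "?", ";", ":"}  # assumed stop set

def entropy(probs):
    """Shannon entropy (in nats) of a {token: prob} distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def sample_trial(next_probs, prefix, rng, max_len=20):
    """Sample tokens until punctuation (or max_len); return tokens and log-probs."""
    tokens, logps, context = [], [], list(prefix)
    for _ in range(max_len):
        probs = next_probs(context)
        choices, weights = zip(*probs.items())
        tok = rng.choices(choices, weights=weights, k=1)[0]
        tokens.append(tok)
        logps.append(math.log(probs[tok]))
        context.append(tok)
        if tok in PUNCTUATION:
            break
    return tokens, logps

def perplexity(logps):
    """Perplexity of a sampled span from its per-token log-probabilities."""
    return math.exp(-sum(logps) / len(logps))

def cautious_step(next_probs, prefix, rng, entropy_threshold=0.5, max_trials=5):
    """Return the next token(s): one sample if confident, else the best trial."""
    probs = next_probs(prefix)
    h = entropy(probs)
    if h <= entropy_threshold:
        choices, weights = zip(*probs.items())
        return [rng.choices(choices, weights=weights, k=1)[0]]
    # Hypothetical schedule: more trials the closer entropy is to its maximum.
    max_h = math.log(len(probs))
    n_trials = max(2, math.ceil(max_trials * h / max_h)) if max_h > 0 else 2
    trials = [sample_trial(next_probs, prefix, rng) for _ in range(n_trials)]
    best_tokens, _ = min(trials, key=lambda t: perplexity(t[1]))
    return best_tokens
```

Calling `cautious_step` repeatedly, appending its output to the prefix each time, yields a full decode in which extra sampling is spent only at uncertain steps.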



Article 1364

Title@2025-07-03 (4): SecAlign: Defending Against Prompt Injection with Preference Optimization

Title: SecAlign: Defending Against Prompt Injection with Preference Optimization SecAlign: Abwehr gegen prompte Injektion mit Preference-Optimierung Secallign: 以优先优化防范迅速注射 2410.05451v3

Authors (6): Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, Chuan Guo

Large language models (LLMs) are becoming increasingly prevalent in modern software systems, interfacing between the user and the Internet to assist with tasks that require advanced language understanding. To accomplish these tasks, the LLM often uses external data sources such as user documents, web retrieval, results from API calls, etc. This opens up new avenues for attackers to manipulate the LLM via prompt injection. Adversarial prompts can be injected into external data sources to override the system’s intended instruction and instead execute a malicious instruction. To mitigate this vulnerability, we propose a new defense called SecAlign based on the technique of preference optimization. Our defense first constructs a preference dataset with prompt-injected inputs, secure outputs (ones that respond to the legitimate instruction), and insecure outputs (ones that respond to the injection). We then perform preference optimization on this dataset to teach the LLM to prefer the secure output over the insecure one. This provides the first known method that reduces the success rates of various prompt injections to <10%, even against attacks much more sophisticated than ones seen during training. This indicates that our defense generalizes well against unknown and yet-to-come attacks. Also, SecAlign models remain practical, with utility similar to that of the model before defensive training in our evaluations. Our code is at https://github.com/facebookresearch/SecAlign
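The preference dataset described above pairs each prompt-injected input with a secure and an insecure output. A minimal sketch of building one such sample follows. The prompt template, the `PreferenceSample` fields, and the `respond` callable (standing in for whatever produces a response to a given instruction) are hypothetical and not taken from the SecAlign repository.

```python
from dataclasses import dataclass

@dataclass
class PreferenceSample:
    prompt: str    # legitimate instruction plus data containing the injection
    chosen: str    # secure output: responds to the legitimate instruction
    rejected: str  # insecure output: responds to the injected instruction

def build_sample(instruction, data, injection, respond):
    """Build one preference triple from an instruction, clean data, and an injection.

    respond: hypothetical callable (instruction, data) -> response text.
    """
    injected_data = f"{data} {injection}"  # attacker text appended to the data
    prompt = f"Instruction: {instruction}\nData: {injected_data}"
    return PreferenceSample(
        prompt=prompt,
        chosen=respond(instruction, data),
        rejected=respond(injection, data),
    )
```

Samples in this chosen/rejected form are what preference-optimization methods typically consume, so a dataset of them could be used to train the model to favor the secure response.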



Article 1365

Title@2025-07-03 (4): Path Planning using a One-shot-sampling Skeleton Map

Title: Path Planning using a One-shot-sampling Skeleton Map Pfadplanung mit einer One-Shot-Sampling-Skeleton-Karte 使用单点抽样的 Skeleton 地图进行路径规划 2507.02328v1

Authors (3): Gabriel O. Flores-Aquino, Octavio Gutierrez-Frias, Juan Irving Vasquez

Path planning algorithms aim to compute a collision-free path, and many works focus on finding the optimal distance path. However, for some applications, a more suitable approach is to balance response time, safety of the paths, and path length. In this context, a skeleton map is a useful tool in graph-based schemes, as it provides an intrinsic representation of free configuration space. However, skeletonization algorithms are very resource-intensive, being primarily oriented towards image processing tasks. We propose an efficient path-planning methodology that finds safe paths within an acceptable processing time. This methodology leverages a Deep Denoising Auto-Encoder (DDAE) based on U-Net architecture to compute a skeletonized version of the navigation map, which we refer to as SkelUnet. The SkelUnet network facilitates exploration of the entire workspace through one-shot sampling (OSS), as opposed to the iterative process used by exact algorithms or the probabilistic sampling process. SkelUnet is trained and tested on a dataset consisting of 12,500 bi-dimensional dungeon maps. The motion planning methodology is evaluated in a simulation environment for an Unmanned Aerial Vehicle (UAV) using 250 previously unseen maps, and assessed with various navigation metrics to quantify the navigability of the computed paths. The results demonstrate that using SkelUnet to construct a roadmap offers significant advantages, such as connecting all regions of free workspace, providing safer paths, and reducing processing times. These characteristics make this method particularly suitable for mobile service robots in structured environments.



Article 1366

Title@2025-07-03 (4): High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures

Title: High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures Leistungsstarkes Stärkungslernen vor Ort: Simulationsparameter mit distributiven Maßnahmen optimieren 高绩效加强现场学习:以分配措施优化模拟参数 2504.17857v3

Authors (4): AJ Miller, Fangzhou Yu, Michael Brauckmann, Farbod Farshidian

This work presents an overview of the technical details behind a high-performance reinforcement learning policy deployment with the Spot RL Researcher Development Kit for low-level motor access on Boston Dynamics Spot. This represents the first public demonstration of an end-to-end reinforcement learning policy deployed on Spot hardware, with training code publicly available through Nvidia IsaacLab and deployment code available through Boston Dynamics. We utilize Wasserstein Distance and Maximum Mean Discrepancy to quantify the distributional dissimilarity of data collected on hardware and in simulation to measure our sim2real gap. We use these measures as a scoring function for the Covariance Matrix Adaptation Evolution Strategy to optimize simulated parameters that are unknown or difficult to measure from Spot. Our procedure for modeling and training produces high-quality reinforcement learning policies capable of multiple gaits, including a flight phase. We deploy policies capable of locomotion at over 5.2 m/s, more than triple the maximum speed of Spot’s default controller, with robustness to slippery surfaces, disturbance rejection, and overall agility previously unseen on Spot. We detail our method and release our code to support future work on Spot with the low-level API.
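To make the two distributional measures named in this abstract concrete, the sketch below computes a one-dimensional Wasserstein-1 distance between equal-sized samples and a biased squared Maximum Mean Discrepancy with an RBF kernel. It is a minimal pure-Python illustration on scalar samples; the bandwidth and the toy data are assumptions, and the paper's actual signals and estimators may differ.

```python
import math

def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1-D samples.

    For equal sample counts this is the mean absolute difference
    between the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def rbf(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD with an RBF kernel."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

# Toy "hardware" and "simulation" samples of one signal (made-up numbers).
hardware = [0.0, 1.0, 2.0]
sim_close = [0.5, 1.5, 2.5]
sim_far = [5.0, 6.0, 7.0]
gap_close = wasserstein_1d(hardware, sim_close)
gap_far = wasserstein_1d(hardware, sim_far)
```

Either score could serve as the objective that a black-box optimizer such as CMA-ES minimizes over simulator parameters, since neither requires gradients.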



Article 1367

Title@2025-07-03 (4): Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach

Title: Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach Ausrichten von gefrorenen LLMs durch Verstärkungslernen: Ein iteratives Reweight-then-Optimize-Ansatz 通过强化学习将冻结的LLMs与 “ 强化学习:一种过渡性再加权再优化方法 “ 相匹配 2506.17828v2

Authors (9): Xinnan Zhang, Chenliang Li, Siliang Zeng, Jiaxiang Li, Zhongruo Wang, Kaixiang Lin, Songtao Lu, Alfredo Garcia, Mingyi Hong

Aligning large language models (LLMs) with human preferences usually requires fine-tuning methods such as RLHF and DPO. These methods directly optimize the model parameters, so they cannot be used in test-time to improve model performance, nor are they applicable when the model weights are not accessible. In contrast, test-time methods sidestep weight updates by leveraging reward functions to guide and improve output quality. However, they incur high inference costs, and their one-shot guidance is often based on imperfect reward or value functions, leading to suboptimal outputs. In this work, we present a method named Iterative Reweight-then-Optimize (IRO), a reinforcement learning (RL) framework that performs RL-style alignment of the (frozen) base model without touching its parameters. During training, each iteration (i) samples candidates from the base model, (ii) resamples using current value functions, and (iii) trains a new lightweight value function that guides the next decoding pass. At test time, the value functions are used to guide the base model generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI’s reinforcement fine-tuning (RFT), but without requiring access to the model weights.



Article 1368

Title@2025-07-03 (4): Transformer-based EEG Decoding: A Survey

Title: Transformer-based EEG Decoding: A Survey Transformer-basierte EEG-Dekodierung: Eine Umfrage 以变换器为基础的电子EEG编码:调查 2507.02320v1

Authors (2): Haodong Zhang, Hongqi Li

Electroencephalography (EEG) is one of the most common signals used to capture the electrical activity of the brain, and decoding EEG to infer user intent has been at the forefront of brain-computer/machine interface (BCI/BMI) research. Compared to traditional EEG analysis methods with machine learning, the advent of deep learning approaches has gradually revolutionized the field by providing an end-to-end long-cascaded architecture, which can learn more discriminative features automatically. Among these, the Transformer is renowned for its strong handling of sequential data through the attention mechanism, and the application of Transformers in various EEG processing tasks is increasingly prevalent. This article presents a survey summarizing the latest applications of Transformer models in EEG decoding since the architecture appeared. We follow the evolution of the model architecture to sort and organize the related advances: we first elucidate the fundamentals of the Transformer that benefit EEG decoding and its direct application. Then, the common hybrid architectures that integrate the basic Transformer with other deep learning techniques (convolutional/recurrent/graph/spiking neural networks, generative adversarial networks, diffusion models, etc.) are overviewed in detail. Research advances that apply modified intrinsic structures of customized Transformers are also introduced. Finally, the current challenges and future development prospects in this rapidly evolving field are discussed. This paper aims to help readers gain a clear understanding of the current state of Transformer applications in EEG decoding and to provide valuable insights for future research endeavors.



Article 1369

Title@2025-07-03 (4): Generalization vs. Specialization under Concept Shift

Title: Generalization vs. Specialization under Concept Shift Verallgemeinerung vs. Spezialisierung unter Konzeptverschiebung 普遍化与概念转变下的专业化 2409.15582v2

Authors (3): Alex Nguyen, David J. Schwab, Vudtiwat Ngampruetikorn

Machine learning models are often brittle under distribution shift, i.e., when data distributions at test time differ from those during training. Understanding this failure mode is central to identifying and mitigating safety risks of mass adoption of machine learning. Here we analyze ridge regression under concept shift, a form of distribution shift in which the input-label relationship changes at test time. We derive an exact expression for prediction risk in the thermodynamic limit. Our results reveal nontrivial effects of concept shift on generalization performance, including a phase transition between weak and strong concept shift regimes and nonmonotonic data dependence of test performance even when double descent is absent. Our theoretical results are in good agreement with experiments based on transformers pretrained to solve linear regression; under concept shift, an overly long context can be detrimental to the generalization performance of next-token prediction. Finally, our experiments on MNIST and FashionMNIST suggest that this intriguing behavior is also present in classification problems.



Article 1370

Title@2025-07-03 (4): Improving Constrained Generation in Language Models via Self-Distilled Twisted Sequential Monte Carlo

Title: Improving Constrained Generation in Language Models via Self-Distilled Twisted Sequential Monte Carlo Verbesserung der eingeschränkten Generierung in Sprachmodellen durch selbstdestillierte Twisted Sequential Monte Carlo 通过自行筛选的蒙蒙特卡洛顺序改进语言模型中受约束的一代 2507.02315v1

Authors (3): Sooyeon Kim, Giung Nam, Juho Lee

Recent work has framed constrained text generation with autoregressive language models as a probabilistic inference problem. Among these, Zhao et al. (2024) introduced a promising approach based on twisted Sequential Monte Carlo, which incorporates learned twist functions and twist-induced proposals to guide the generation process. However, in constrained generation settings where the target distribution concentrates on outputs that are unlikely under the base model, learning becomes challenging due to sparse and uninformative reward signals. We show that iteratively refining the base model through self-distillation alleviates this issue by making the model progressively more aligned with the target, leading to substantial gains in generation quality.



Article 1371

Title@2025-07-03 (4): Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment

Title: Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment Ganzheitliches kontinuierliches Lernen unter Konzept Drift mit adaptiver Memory-Neuausrichtung 适应性记忆调整概念下的全面持续学习 2507.02310v1

Authors (4): Alif Ashrafee, Jedrzej Kozal, Michal Wozniak, Bartosz Krawczyk

Traditional continual learning methods prioritize knowledge retention and focus primarily on mitigating catastrophic forgetting, implicitly assuming that the data distribution of previously learned tasks remains static. This overlooks the dynamic nature of real-world data streams, where concept drift permanently alters previously seen data and demands both stability and rapid adaptation. We introduce a holistic framework for continual learning under concept drift that simulates realistic scenarios by evolving task distributions. As a baseline, we consider Full Relearning (FR), in which the model is retrained from scratch on newly labeled samples from the drifted distribution. While effective, this approach incurs substantial annotation and computational overhead. To address these limitations, we propose Adaptive Memory Realignment (AMR), a lightweight alternative that equips rehearsal-based learners with a drift-aware adaptation mechanism. AMR selectively removes outdated samples of drifted classes from the replay buffer and repopulates it with a small number of up-to-date instances, effectively realigning memory with the new distribution. This targeted resampling matches the performance of FR while reducing the need for labeled data and computation by orders of magnitude. To enable reproducible evaluation, we introduce four concept-drift variants of standard vision benchmarks: Fashion-MNIST-CD, CIFAR10-CD, CIFAR100-CD, and Tiny-ImageNet-CD, where previously seen classes reappear with shifted representations. Comprehensive experiments on these datasets using several rehearsal-based baselines show that AMR consistently counters concept drift, maintaining high accuracy with minimal overhead. These results position AMR as a scalable solution that reconciles stability and plasticity in non-stationary continual learning environments.
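The memory realignment step described in this abstract (drop outdated samples of drifted classes from the replay buffer, then refill with a few up-to-date instances) can be sketched as follows. This is a minimal illustration with hypothetical names and a made-up `per_class` budget; drift detection and the actual AMR sampling policy are outside its scope.

```python
def realign_memory(buffer, drifted_classes, fresh_samples, per_class=2):
    """Return a replay buffer realigned with the drifted distribution.

    buffer, fresh_samples: lists of (sample, label) pairs.
    drifted_classes: set of labels whose distribution has changed.
    per_class: hypothetical budget of fresh samples kept per drifted class.
    """
    # 1) Remove outdated samples that belong to drifted classes.
    kept = [(x, y) for x, y in buffer if y not in drifted_classes]
    # 2) Repopulate with a small number of up-to-date samples per class.
    added, counts = [], {}
    for x, y in fresh_samples:
        if y in drifted_classes and counts.get(y, 0) < per_class:
            added.append((x, y))
            counts[y] = counts.get(y, 0) + 1
    return kept + added

old_buffer = [("cat_v1", "cat"), ("dog_v1", "dog"), ("cat_v1b", "cat")]
fresh = [("cat_v2a", "cat"), ("cat_v2b", "cat"), ("cat_v2c", "cat")]
new_buffer = realign_memory(old_buffer, {"cat"}, fresh, per_class=2)
```

Samples of classes that did not drift are left untouched, which is what keeps the labeling and compute cost far below retraining from scratch.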



Article 1372

Title@2025-07-03 (4): Distributional Soft Actor-Critic with Diffusion Policy

Title: Distributional Soft Actor-Critic with Diffusion Policy Verteilungs-Soft-Actor-Kritik mit Diffusionspolitik 配发软软软动作- 带有传播政策批评器 2507.01381v2

Authors (9): Tong Liu, Yinuo Wang, Xujie Song, Wenjun Zou, Liangfa Chen, Likun Wang, Bin Shuai, Jingliang Duan, Shengbo Eben Li

Reinforcement learning has proven highly effective in handling complex control tasks. Traditional methods typically use unimodal distributions, such as Gaussian distributions, to model the output of value distributions. However, unimodal distributions often cause bias in value function estimation, leading to poor algorithm performance. This paper proposes a distributional reinforcement learning algorithm called DSAC-D (Distributional Soft Actor-Critic with Diffusion Policy) to address the challenges of estimation bias in value functions and of obtaining multimodal policy representations. A multimodal distributional policy iteration framework that can converge to the optimal policy is established by introducing policy entropy and a value distribution function. A diffusion value network that can accurately characterize multi-peak distributions is constructed by generating a set of reward samples through reverse sampling with a diffusion model. Based on this, a distributional reinforcement learning algorithm with dual diffusion of the value network and the policy network is derived. MuJoCo testing tasks demonstrate that the proposed algorithm not only learns multimodal policies, but also achieves state-of-the-art (SOTA) performance in all 9 control tasks, with significant suppression of estimation bias and a total average return improvement of over 10% compared to existing mainstream algorithms. The results of real vehicle testing show that DSAC-D can accurately characterize the multimodal distribution of different driving styles, and the diffusion policy network can characterize multimodal trajectories.



Article 1373

Title@2025-07-03 (4): Neural CRNs: A Natural Implementation of Learning in Chemical Reaction Networks

Title: Neural CRNs: A Natural Implementation of Learning in Chemical Reaction Networks Neurale CRNs: Eine natürliche Umsetzung des Lernens in chemischen Reaktionsnetzwerken CRNs:化学反应网络学习的自然应用 2409.00034v3

Authors (2): Rajiv Teja Nagipogu, John H. Reif

This work introduces Neural CRNs, a general-purpose chemical neural network framework that embeds learning directly into mass-action chemical reaction systems. Unlike prior approaches that chemically implement and compose discrete neural computations, Neural CRNs adopt an analog computing approach, where both forward and backward passes of learning are implemented as continuous-time evolutions of molecular concentrations. Such an analog formulation naturally aligns with the analog nature of chemical kinetics, yielding concise circuits and practicable reactions. We demonstrate this efficiency by constructing a streamlined supervised learning procedure executable in just two sequential stages. We then implement several learning circuits to demonstrate the framework’s linear and nonlinear modeling capabilities and to validate its learning procedure. These circuits are implemented entirely using unimolecular and bimolecular reactions, avoiding the complexity of higher-order chemistries. In summary, Neural CRNs offer a compact, scalable, and autonomous framework for biochemical learning, opening new avenues for adaptive computing in synthetic biology, bioengineering, and biomedicine.



Article 1374

Title@2025-07-03 (4): Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient

Title: Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient Bypass Back-Propagation: Optimierungsbasiertes Structural Pruning für große Sprachmodelle über Policy Gradient Bypass 后回通信:通过 “ 政策梯度 “ 优化基于优化的结构结构,为大语言模式提供缓冲 2406.10576v3

Authors (5): Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, Gui-Song Xia

Recent pruning methods for Large Language Models (LLMs) typically operate at the post-training phase without expensive weight fine-tuning; however, their pruning criteria often rely on heuristically hand-crafted metrics, potentially leading to suboptimal performance. We instead propose a novel optimization-based structural pruning that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. To preserve efficiency, our method eliminates back-propagation through the LLM itself during optimization, requiring only the forward pass of the LLM. We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks, where we decouple the Bernoulli parameters from the LLM loss, facilitating efficient optimization via a policy gradient estimator without back-propagation. Thus, our method can 1) support global and heterogeneous pruning (i.e., automatically determine different redundancy for different layers), and 2) optionally initialize with a metric-based method (for our Bernoulli distributions). Extensive experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models using the C4 and WikiText2 datasets demonstrate the promising performance of our method in efficiency and effectiveness. Code is available at https://github.com/ethanygao/backprop-free_LLM_pruning.
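The idea of learning Bernoulli mask probabilities from forward passes alone can be illustrated with a score-function (REINFORCE-style) estimator on a toy problem. Everything below is an assumption made for illustration: `toy_loss` stands in for a forward pass of the pruned model, the two "units" and the 0.3 sparsity cost are invented, and the batch-mean baseline is a generic variance-reduction choice rather than the paper's estimator.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def toy_loss(mask):
    """Stand-in for a forward pass of the pruned model (hypothetical).

    Dropping unit 0 hurts, unit 1 is redundant, and each kept unit
    costs 0.3 to encourage sparsity.
    """
    return 1.0 * (1 - mask[0]) + 0.3 * sum(mask)

def learn_masks(steps=200, batch=16, lr=0.5, seed=0):
    """Learn keep-probabilities using only loss evaluations."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]  # Bernoulli parameters, one per prunable unit
    for _ in range(steps):
        probs = [sigmoid(t) for t in logits]
        masks = [[1 if rng.random() < p else 0 for p in probs]
                 for _ in range(batch)]
        losses = [toy_loss(m) for m in masks]
        baseline = sum(losses) / batch  # reduces estimator variance
        for j in range(len(logits)):
            # d/d(logit) log P(mask_j) = mask_j - prob_j for a Bernoulli
            grad = sum((l - baseline) * (m[j] - probs[j])
                       for l, m in zip(losses, masks)) / batch
            logits[j] -= lr * grad
    return [sigmoid(t) for t in logits]

keep_probs = learn_masks()
```

No gradient ever flows through `toy_loss`; only its scalar value is used, which is the property that lets this style of estimator avoid back-propagation through the model being pruned.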



Article 1375

Title@2025-07-03 (4): Kernel Density Bayesian Inverse Reinforcement Learning

Title: Kernel Density Bayesian Inverse Reinforcement Learning Kernel Dichte Bayesian Inverse Verstärkung Lernen 内核冷密度贝耶斯反强化学习 2303.06827v4

Authors (6): Aishwarya Mandyam, Didong Li, Jiayu Yao, Diana Cai, Andrew Jones, Barbara E. Engelhardt

Inverse reinforcement learning (IRL) methods infer an agent’s reward function using demonstrations of expert behavior. A Bayesian IRL approach models a distribution over candidate reward functions, capturing a degree of uncertainty in the inferred reward function. This is critical in some applications, such as those involving clinical data. Typically, Bayesian IRL algorithms require large demonstration datasets, which may not be available in practice. In this work, we incorporate existing domain-specific data to achieve better posterior concentration rates. We study a common setting in clinical and biological applications where we have access to expert demonstrations and known reward functions for a set of training tasks. Our aim is to learn the reward function of a new test task given limited expert demonstrations. Existing Bayesian IRL methods impose restrictions on the form of input data, thus limiting the incorporation of training task data. To better leverage information from training tasks, we introduce kernel density Bayesian inverse reinforcement learning (KD-BIRL). Our approach employs a conditional kernel density estimator, which uses the known reward functions of the training tasks to improve the likelihood estimation across a range of reward functions and demonstration samples. Our empirical results highlight KD-BIRL’s faster concentration rate in comparison to baselines, particularly in low test task expert demonstration data regimes. Additionally, we are the first to provide theoretical guarantees of posterior concentration for a Bayesian IRL algorithm. Taken together, this work introduces a principled and theoretically grounded framework that enables Bayesian IRL to be applied across a variety of domains.
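A conditional kernel density estimator of the kind mentioned in this abstract estimates how likely a demonstration is under a candidate reward by weighting training demonstrations according to how close their known rewards are to that candidate. The sketch below reduces both demonstrations and rewards to scalars and uses Gaussian kernels with made-up bandwidths; in KD-BIRL these are state-action pairs and reward functions, so this is only a one-dimensional analogue under those simplifying assumptions.

```python
import math

def gaussian(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def conditional_density(demo, reward, training_pairs,
                        h_demo=0.3, h_reward=0.3):
    """Estimate p(demo | reward) from (demo, reward) training pairs.

    Each training demonstration contributes a kernel bump around itself,
    weighted by how similar its known reward is to the query reward.
    """
    num = 0.0
    den = 0.0
    for d_j, r_j in training_pairs:
        w = gaussian((reward - r_j) / h_reward)
        num += w * gaussian((demo - d_j) / h_demo) / h_demo
        den += w
    return num / den if den > 0 else 0.0

# Two training tasks: reward 0.0 produced demo 0.0, reward 1.0 produced demo 1.0.
pairs = [(0.0, 0.0), (1.0, 1.0)]
likely = conditional_density(0.0, 0.0, pairs)
unlikely = conditional_density(1.0, 0.0, pairs)
```

Plugging such an estimate in as the likelihood term is what allows a posterior over reward functions to draw on training-task data even when demonstrations for the test task are scarce.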



Article 1376

Title@2025-07-03 (4): DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning

Title: DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning DoMIX: Ein effizientes Framework zur Nutzung von Domain-Wissen im Feintuning DoMIX:一个在微调中利用域知识的有效框架 2507.02302v1

Authors (3): Dohoon Kim, Donghun Kang, Taesup Moon

Domain-Adaptive Pre-training (DAP) has recently gained attention for its effectiveness in fine-tuning pre-trained models. Building on this, continual DAP has been explored to develop pre-trained models capable of incrementally incorporating different domain datasets. However, existing continual DAP methods face several limitations: (1) high computational cost and GPU memory usage during training; (2) sensitivity to incremental data order; and (3) providing a single, generalized model for all end tasks, which contradicts the essence of DAP. In this paper, we propose DoMIX, a novel approach that addresses these challenges by leveraging LoRA modules, a representative parameter-efficient fine-tuning (PEFT) method. Our approach enables efficient and parallel domain-adaptive pre-training that is robust to domain order and effectively utilizes accumulated knowledge to provide tailored pre-trained models for specific tasks. We also demonstrate that our method can be extended beyond the DAP setting to standard LLM fine-tuning scenarios. Code is available at https://github.com/dohoonkim-ai/DoMIX.



Article 1377

Title@2025-07-03 (4): Knowledge Graph-Based Explainable and Generalized Zero-Shot Semantic Communications

Title: Knowledge Graph-Based Explainable and Generalized Zero-Shot Semantic Communications Wissensgraphisch erklärbare und generalisierte zero-shot semantische Kommunikation 基于知识的图表可解释和通用的零热语义通信 2507.02291v1

Authors (5): Zhaoyu Zhang, Lingyi Wang, Wei Wu, Fuhui Zhou, Qihui Wu

Data-driven semantic communication is based on superficial statistical patterns, thereby lacking interpretability and generalization, especially in applications involving unseen data. To address these challenges, we propose a novel knowledge graph-enhanced zero-shot semantic communication (KGZS-SC) network. Guided by the structured semantic information from a knowledge graph-based semantic knowledge base (KG-SKB), our scheme provides generalized semantic representations and enables reasoning for unseen cases. Specifically, the KG-SKB aligns the semantic features in a shared category semantics embedding space and enhances the generalization ability of the transmitter through aligned semantic features, thus reducing communication overhead by selectively transmitting compact visual semantics. At the receiver, zero-shot learning (ZSL) is leveraged to enable direct classification of unseen cases without retraining or additional computational overhead, thereby enhancing the adaptability and efficiency of the classification process in dynamic or resource-constrained environments. Simulations conducted on the APY datasets show that the proposed KGZS-SC network exhibits robust generalization and significantly outperforms existing SC frameworks in classifying unseen categories across a range of SNR levels.



Article 1378

Title@2025-07-03 (4): Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain

Title: Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain Optimale Strategien zur mehrsprachigen Analyse sozialer Inhalte für einen neuartigen Datensatz im Tourismusbereich 为旅游领域新数据集的社会内容进行多语种社会内容分析的最佳最佳战略 2311.14727v2

Authors (6): Maxime Masson, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose

The rising influence of social media platforms in various domains, including tourism, has highlighted the growing need for efficient and automated Natural Language Processing (NLP) strategies to take advantage of this valuable resource. However, the transformation of multilingual, unstructured, and informal texts into structured knowledge still poses significant challenges, most notably the never-ending requirement for manually annotated data to train deep learning classifiers. In this work, we study different NLP techniques to establish which ones achieve competitive performance while keeping the need for annotated training data to a minimum. To do so, we built the first publicly available multilingual dataset (French, English, and Spanish) for the tourism domain, composed of tourism-related tweets. The dataset includes multilayered, manually revised annotations for Named Entity Recognition (NER) for Locations and Fine-grained Thematic Concepts Extraction mapped to the Thesaurus of Tourism and Leisure Activities of the World Tourism Organization, as well as for Sentiment Analysis at the tweet level. Extensive experimentation comparing various few-shot and fine-tuning techniques with modern language models demonstrates that modern few-shot techniques allow us to obtain competitive results for all three tasks with very little annotation data: 5 tweets per label (15 in total) for Sentiment Analysis, 30 tweets for Named Entity Recognition of Locations, and 1K tweets annotated with fine-grained thematic concepts, a highly fine-grained sequence labeling task based on an inventory of 315 classes. We believe that our results, grounded in a novel dataset, pave the way for applying NLP to new domain-specific applications, reducing the need for manual annotations and circumventing the complexities of rule-based, ad-hoc solutions.



Article 1379

Title@2025-07-03 (4): Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

Title: Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization Prompt Entwirren über Sprachführung und Repräsentation Ausrichtung für Domain-Verallgemeinerung 通过语言指导和代表对称调整,迅速解脱 2507.02288v1

Authors (6): De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nannan Wang, Xinbo Gao

Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG, the effective design of prompts capable of disentangling invariant features across diverse domains remains a critical challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns domain-invariant visual representation guided by the disentangled text feature. However, relying solely on language to guide visual feature disentanglement has limitations, as visual features can sometimes be too complex or nuanced to be fully captured by descriptive text. To address this, we introduce Worst Explicit Representation Alignment (WERA), which extends text-guided visual prompts by incorporating an additional set of abstract prompts. These prompts enhance source domain diversity through stylized image augmentations, while alignment constraints ensure that visual representations remain consistent across both the original and augmented distributions. Experiments conducted on major DG datasets, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that our proposed method outperforms state-of-the-art DG methods.



Article 1380

Title@2025-07-03 (4): Content filtering methods for music recommendation: A review

Title: Content filtering methods for music recommendation: A review Methoden zur Inhaltsfilterung für Musikempfehlungen: Eine Rezension 音乐建议的内容过滤方法:审查 2507.02282v1

Authors (2): Terence Zeng, Abhishek K. Umrawal

Recommendation systems have become essential in modern music streaming platforms, shaping how users discover and engage with songs. One common approach in recommendation systems is collaborative filtering, which suggests content based on the preferences of users with similar listening patterns to the target user. However, this method is less effective on media where interactions are sparse. Music is one such medium, since the average user of a music streaming service will never listen to the vast majority of tracks. Due to this sparsity, there are several challenges that have to be addressed with other methods. This review examines the current state of research in addressing these challenges, with an emphasis on the role of content filtering in mitigating biases inherent in collaborative filtering approaches. We explore various methods of song classification for content filtering, including lyrical analysis using Large Language Models (LLMs) and audio signal processing techniques. Additionally, we discuss the potential conflicts between these different analysis methods and propose avenues for resolving such discrepancies.



Article 1381

Title@2025-07-03 (4): Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments

Title: Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments Kausales Repräsentationslernen mit generativer Künstlicher Intelligenz: Anwendung auf Texte als Behandlungen 产生人工智能的因果代表性学习:应用文字作为治疗 2410.00903v3

Authors (2): Kosuke Imai, Kentaro Nakamura

In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, the proposed GenAI-Powered Inference (GPI) methodology eliminates the need to learn causal representation from the data, and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed methodology to the settings in which the treatment feature is based on human perception. The proposed GPI methodology is also applicable to text reuse where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using the generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.



Article 1382

Title@2025-07-03 (4): Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

Title: Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge bei CVPR 2025 MEIS Workshop 基准的可通用二手操纵:2025年欧洲气象和气象科学研究所讲习班上的机器人双臂双臂合作挑战 2506.23351v2

Authors (99): Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, Yunfei Ge, Peiqing Cong, Guannan He, Zhaoming Han, Ruocheng Yin, Jingxiang Guo, Lunkai Lin, Tianling Xu, Hongzhe Bi, Xuewu Lin, Tianwei Lin, Shujie Luo, Keyu Li, Ziyan Zhao, Ke Fan, Heyang Xu, Bo Peng, Wenlong Gao, Dongjiang Li, Feng Jin, Hui Shen, Jinming Li, Chaowei Cui, Yu Chen, Yaxin Peng, Lingdong Zeng, Wenlong Dong, Tengfei Li, Weijie Ke, Jun Chen, Erdemt Bao, Tian Lan, Tenglong Liu, Jin Yang, Huiping Zhuang, Baozhi Jia, Shuai Zhang, Zhengfeng Zou, Fangheng Guan, Tianyi Jia, Ke Zhou, Hongjiu Zhang, Yating Han, Cheng Fang, Yixian Zou, Chongyang Xu, Qinglun Zhang, Shen Cheng, Xiaohe Wang, Ping Tan, Haoqiang Fan, Shuaicheng Liu, Jiaheng Chen, Chuxuan Huang, Chengliang Lin, Kaijun Luo, Boyu Yue, Yi Liu, Jinyu Chen, Zichang Tan, Liming Deng, Shuo Xu, Zijian Cai, Shilong Yin, Hao Wang, Hongshan Liu, Tianyang Li, Long Shi, Ran Xu, Huilin Xu, Zhengquan Zhang, Congsheng Xu, Jinchang Yang, Feng Xu

Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To advance this goal, we launched the RoboTwin Dual-Arm Collaboration Challenge at the 2nd MEIS Workshop, CVPR 2025. Built on the RoboTwin Simulation platform (1.0 and 2.0) and the AgileX COBOT-Magic Robot platform, the competition consisted of three stages: Simulation Round 1, Simulation Round 2, and a final Real-World Round. In total, participants tackled 17 dual-arm manipulation tasks covering rigid, deformable, and tactile-based scenarios. The challenge attracted 64 global teams and over 400 participants, producing top-performing solutions like SEM and AnchorDP3 and generating valuable insights into generalizable bimanual policy learning. This report outlines the competition setup, task design, evaluation methodology, key findings, and future directions, aiming to support future research on robust and generalizable bimanual manipulation policies. The Challenge Webpage is available at https://robotwin-benchmark.github.io/cvpr-2025-challenge/.



Article 1383

Title@2025-07-03 (4): Adaptive Cubic Regularized Second-Order Latent Factor Analysis Model

Title: Adaptive Cubic Regularized Second-Order Latent Factor Analysis Model Adaptives kubisches Regularisiertes Latenten-Faktoranalysemodell zweiter Ordnung 适应性三次立方 正规化的二至二极冷态要素分析模型 2507.03036v1

Authors (3): Jialiang Wang, Junzhou Wang, Xin Liao

High-dimensional and incomplete (HDI) data, characterized by massive node interactions, have become ubiquitous across various real-world applications. Second-order latent factor (SLF) models have shown promising performance in modeling this type of data. Nevertheless, due to the bilinear and non-convex nature of the SLF model’s objective function, incorporating a damping term into the Hessian approximation and carefully tuning the associated parameters become essential. To overcome these challenges, we propose a new approach in this study, named the adaptive cubic regularized second-order latent factor analysis (ACRSLF) model. ACRSLF combines two ideas: 1) self-tuning cubic regularization that dynamically mitigates non-convex optimization instabilities; 2) multi-Hessian-vector product evaluation during conjugate gradient iterations for precise second-order information assimilation. Comprehensive experiments on two industrial HDI datasets demonstrate that ACRSLF converges faster and achieves higher representation accuracy than advanced optimizer-based latent factor analysis (LFA) models.



Article 1384

Title@2025-07-03 (4): Explainable AI for Comprehensive Risk Assessment for Financial Reports: A Lightweight Hierarchical Transformer Network Approach

Title: Explainable AI for Comprehensive Risk Assessment for Financial Reports: A Lightweight Hierarchical Transformer Network Approach Erklärbare KI für umfassende Risikobewertung für Finanzberichte: Ein leichter Hierarchischer Transformer-Netzwerkansatz 用于财务报告综合风险评估的可解释的AIAI:轻量级高压变压器网络方法 2506.23767v2

Authors (2): Xue Wen Tan, Stanley Kok

Every publicly traded U.S. company files an annual 10-K report containing critical insights into financial health and risk. We propose Tiny eXplainable Risk Assessor (TinyXRA), a lightweight and explainable transformer-based model that automatically assesses company risk from these reports. Unlike prior work that relies solely on the standard deviation of excess returns (adjusted for the Fama-French model), which indiscriminately penalizes both upside and downside risk, TinyXRA incorporates skewness, kurtosis, and the Sortino ratio for more comprehensive risk assessment. We leverage TinyBERT as our encoder to efficiently process lengthy financial documents, coupled with a novel dynamic, attention-based word cloud mechanism that provides intuitive risk visualization while filtering irrelevant terms. This lightweight design ensures scalable deployment across diverse computing environments, with real-time processing capabilities for thousands of financial documents, which is essential for production systems with constrained computational resources. We employ triplet loss for risk quartile classification, improving over the pairwise loss approaches in the existing literature by capturing both the direction and magnitude of risk differences. TinyXRA achieves state-of-the-art predictive accuracy across seven test years on a dataset spanning 2013-2024, while providing transparent and interpretable risk assessments. We conduct comprehensive ablation studies to evaluate our contributions and assess model explanations both quantitatively, by systematically removing highly attended words and sentences, and qualitatively, by examining explanation coherence. The paper concludes with findings, practical implications, limitations, and future research directions. Our code is available at https://github.com/Chen-XueWen/TinyXRA.
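
The risk measures named in this abstract have standard textbook definitions. The following is a minimal sketch of them, not TinyXRA's code: the function names are hypothetical, the target return is assumed to be 0, and population (not sample) moments are used. It illustrates why the Sortino ratio penalizes only downside moves, whereas a plain standard deviation penalizes both directions.

```python
import math
from statistics import mean


def downside_deviation(returns, target=0.0):
    """Root-mean-square shortfall below `target`; returns above it count as zero."""
    shortfalls = [min(0.0, r - target) ** 2 for r in returns]
    return math.sqrt(sum(shortfalls) / len(returns))


def sortino_ratio(returns, target=0.0):
    """Mean return in excess of `target`, divided by the downside deviation."""
    dd = downside_deviation(returns, target)
    if dd == 0.0:
        raise ValueError("no returns below target; Sortino ratio is undefined")
    return (mean(returns) - target) / dd


def _central_moment(returns, k):
    """k-th population central moment."""
    m = mean(returns)
    return sum((r - m) ** k for r in returns) / len(returns)


def skewness(returns):
    """Third standardized moment; negative values indicate a heavier left tail."""
    return _central_moment(returns, 3) / _central_moment(returns, 2) ** 1.5


def excess_kurtosis(returns):
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    return _central_moment(returns, 4) / _central_moment(returns, 2) ** 2 - 3.0
```

With `returns = [0.02, -0.01, 0.03, -0.02]`, only the two negative entries contribute to `downside_deviation`, so the gains do not inflate the risk estimate.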



Article 1385

Title@2025-07-03 (4): Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning

Title: Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning Neural Graph Matching verbessert retrieval Augmented Generation im molekularen maschinellen Lernen 在分子机学习中改进回流增强一代的神经图相匹配 2502.17874v2

Authors (4): Runzhong Wang, Rui-Xi Wang, Mrunali Manjrekar, Connor W. Coley

Molecular machine learning has gained popularity with the advancements of geometric deep learning. In parallel, retrieval-augmented generation has become a principled approach commonly used with language models. However, the optimal integration of retrieval augmentation into molecular machine learning remains unclear. Graph neural networks stand to benefit from clever matching to understand the structural alignment of retrieved molecules to a query molecule. Neural graph matching offers a compelling solution by explicitly modeling node and edge affinities between two structural graphs while employing a noise-robust, end-to-end neural network to learn affinity metrics. We apply this approach to mass spectrum simulation and introduce MARASON, a novel model that incorporates neural graph matching to enhance a fragmentation-based neural network. Experimental results highlight the effectiveness of our design, with MARASON achieving 28% top-1 accuracy, a substantial improvement over the non-retrieval state-of-the-art accuracy of 19%. Moreover, MARASON outperforms both naive retrieval-augmented generation methods and traditional graph matching approaches. Code is publicly available at https://github.com/coleygroup/ms-pred



Article 1386

Title@2025-07-03 (4): NLP4Neuro: Sequence-to-sequence learning for neural population decoding

Title: NLP4Neuro: Sequence-to-sequence learning for neural population decoding NLP4Neuro: Sequence-to-Sequence-Lernen für neuronale Populationsdekodierung NLP4Neuro:神经人口解码的序列到序列学习 2507.02264v1

Authors (7): Jacob J. Morra, Kaitlyn E. Fouke, Kexin Hang, Zichen He, Owen Traubert, Timothy W. Dunn, Eva A. Naumann

Delineating how animal behavior arises from neural activity is a foundational goal of neuroscience. However, as the computations underlying behavior unfold in networks of thousands of individual neurons across the entire brain, this presents challenges for investigating neural roles and computational mechanisms in large, densely wired mammalian brains during behavior. Transformers, the backbones of modern large language models (LLMs), have become powerful tools for neural decoding from smaller neural populations. These modern LLMs have benefited from extensive pre-training, and their sequence-to-sequence learning has been shown to generalize to novel tasks and data modalities, which may also confer advantages for neural decoding from larger, brain-wide activity recordings. Here, we present a systematic evaluation of off-the-shelf LLMs to decode behavior from brain-wide populations, termed NLP4Neuro, which we used to test LLMs on simultaneous calcium imaging and behavior recordings in larval zebrafish exposed to visual motion stimuli. Through NLP4Neuro, we found that LLMs become better at neural decoding when they use pre-trained weights learned from textual natural language data. Moreover, we found that a recent mixture-of-experts LLM, DeepSeek Coder-7b, significantly improved behavioral decoding accuracy, predicted tail movements over long timescales, and provided anatomically consistent highly interpretable readouts of neuron salience. NLP4Neuro demonstrates that LLMs are highly capable of informing brain-wide neural circuit dissection.



Article 1387

Title@2025-07-03 (4): MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Title: MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent MemAgent: Umgestalten von Langkontext-LLM mit Multi-Conv RL-basierten Speicheragenten MemerAgent: 与基于多Conv RL的内存代理重塑长文本LLM 2507.02259v1

Authors (11): Hongli Yu, Tinghong Chen, Jiangtao Feng, Jiangjie Chen, Weinan Dai, Qiying Yu, Ya-Qin Zhang, Wei-Ying Ma, Jingjing Liu, Mingxuan Wang, Hao Zhou

Despite improvements from length extrapolation, efficient attention, and memory modules, handling infinitely long documents with linear complexity and without performance degradation during extrapolation remains the ultimate challenge in long-text processing. We directly optimize for long-text tasks in an end-to-end fashion and introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy. We extend the DAPO algorithm to facilitate training via independent-context multi-conversation generation. MemAgent demonstrates strong long-context capabilities: with an 8K context and training on 32K text, it extrapolates to a 3.5M QA task with a performance loss < 5% and achieves 95%+ on the 512K RULER test.
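
The segment-wise reading loop with an overwrite strategy can be sketched as follows. This is a toy illustration under stated assumptions, not MemAgent's implementation: `read_in_segments` and `make_toy_updater` are hypothetical names, and the updater is a simple stand-in for the model call that would rewrite the memory. The point is that the memory stays bounded while the text is scanned once, so the work grows linearly with document length.

```python
def read_in_segments(tokens, segment_size, update_memory, initial_memory):
    """Scan `tokens` once, segment by segment, overwriting a bounded memory."""
    memory = initial_memory
    for start in range(0, len(tokens), segment_size):
        segment = tokens[start:start + segment_size]
        # Overwrite: the old memory is replaced by whatever the updater returns.
        memory = update_memory(memory, segment)
    return memory


def make_toy_updater(memory_limit):
    """Hypothetical stand-in for the model call: keep only the latest tagged facts."""
    def update_memory(memory, segment):
        facts = [tok for tok in segment if tok.startswith("FACT:")]
        return (memory + facts)[-memory_limit:]
    return update_memory


tokens = ["a", "FACT:x=1", "b", "c", "FACT:y=2", "d", "FACT:z=3"]
final_memory = read_in_segments(tokens, 2, make_toy_updater(2), [])
```

Here the memory never holds more than two items, regardless of how many segments are read.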



Article 1388

Title@2025-07-03 (4): Uncertainty-aware Reward Design Process

Title: Uncertainty-aware Reward Design Process Unsicheres Design von Belohnungen 不确定性-对不确定性的认知度-奖励设计流程 2507.02256v1

Authors (4): Yang Yang, Xiaolu Zhou, Bosong Ding, Miao Xin

Designing effective reward functions is a cornerstone of reinforcement learning (RL), yet it remains a challenging process due to the inefficiencies and inconsistencies inherent in conventional reward engineering methodologies. Recent advances have explored leveraging large language models (LLMs) to automate reward function design. However, their suboptimal performance in numerical optimization often yields unsatisfactory reward quality, while the evolutionary search paradigm demonstrates inefficient utilization of simulation resources, resulting in prohibitively lengthy design cycles with disproportionate computational overhead. To address these challenges, we propose the Uncertainty-aware Reward Design Process (URDP), a novel framework that integrates large language models to streamline reward function design and evaluation in RL environments. URDP quantifies candidate reward function uncertainty based on self-consistency analysis, enabling simulation-free identification of ineffective reward components while discovering novel reward components. Furthermore, we introduce uncertainty-aware Bayesian optimization (UABO), which incorporates uncertainty estimation to significantly enhance hyperparameter configuration efficiency. Finally, we construct a bi-level optimization architecture by decoupling the reward component optimization and the hyperparameter tuning. URDP orchestrates synergistic collaboration between the reward logic reasoning of the LLMs and the numerical optimization strengths of the Bayesian Optimization. We conduct a comprehensive evaluation of URDP across 35 diverse tasks spanning three benchmark environments. Our experimental results demonstrate that URDP not only generates higher-quality reward functions but also achieves significant improvements in the efficiency of automated reward design compared to existing approaches.



Article 1389

Title@2025-07-03 (4): Listwise Preference Alignment Optimization for Tail Item Recommendation

Title: Listwise Preference Alignment Optimization for Tail Item Recommendation Listwise Preference Alignment Optimization for Tail Item Empfehlung 用于反尾项目建议的优先对齐优化列表 2507.02255v1

Authors (7): Zihao Li, Chao Yang, Tong Zhang, Yakun Chen, Xianzhi Wang, Guandong Xu, Daoyi Dong

Preference alignment has achieved great success on Large Language Models (LLMs) and drawn broad interest in recommendation research. Existing preference alignment methods for recommendation either require explicit reward modeling or only support pairwise preference comparison. The former incurs substantial computational costs, while the latter hinders training efficiency on negative samples. Moreover, no existing effort has explored preference alignment solutions for tail-item recommendation. To bridge these gaps, we propose LPO4Rec, which extends the Bradley-Terry model from pairwise comparison to listwise comparison to improve the efficiency of model training. Specifically, we derive a closed-form optimal policy to enable more efficient and effective training without explicit reward modeling. We also present an adaptive negative sampling and reweighting strategy to prioritize tail items during optimization and enhance performance in tail-item recommendation. In addition, we theoretically prove that optimizing the listwise preference optimization (LPO) loss is equivalent to maximizing the upper bound of the optimal reward. Our experiments on three public datasets show that our method outperforms 10 baselines by a large margin, achieving up to 50% performance improvement while reducing GPU memory usage by 17.9% compared with direct preference optimization (DPO) in tail-item recommendation. Our code is available at https://github.com/Yuhanleeee/LPO4Rec.
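
To make the pairwise-to-listwise step concrete, the sketch below compares the standard Bradley-Terry pairwise loss with a generic listwise (Plackett-Luce top-1) loss over one preferred item and several negatives. This is an assumption-laden illustration, not the LPO loss derived in the paper: the function names and the use of raw scores are hypothetical. It shows that the listwise form reduces to the pairwise one when there is a single negative, while handling many negatives in one term.

```python
import math


def pairwise_bt_loss(score_pos, score_neg):
    """Bradley-Terry negative log-likelihood: -log sigmoid(score_pos - score_neg)."""
    return math.log1p(math.exp(-(score_pos - score_neg)))


def listwise_top1_loss(score_pos, scores_neg):
    """Negative log-probability that the preferred item ranks first in the list."""
    scores = [score_pos] + list(scores_neg)
    peak = max(scores)  # subtract the max for numerical stability
    log_partition = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_partition - score_pos
```

Calling `listwise_top1_loss(1.0, [0.2])` gives the same value as `pairwise_bt_loss(1.0, 0.2)`; adding more negatives to the list can only increase the loss.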



Article 1390

Title@2025-07-03 (4): Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks

Title: Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks Schaltungs-Tuning: Mechanistischer Ansatz zur Identifizierung von Parameter Redundanz und Feinsteuerung neuraler Netzwerke 电路调控:确定参数冗余和精微调整神经网络的机械化方法 2502.06106v2

Authors (4): Yueyan Li, Wenhao Gao, Caixia Yuan, Xiaojie Wang

The study of mechanistic interpretability aims to reverse-engineer a model to explain its behaviors. While recent studies have focused on the static mechanism of a certain behavior, the learning dynamics inside a model remain to be explored. In this work, we develop an interpretable fine-tuning method for analyzing the mechanism behind learning. We first introduce the concept of node-level intrinsic dimensionality to describe the learning process of a model in a computational graph. Based on our theory, we propose circuit-tuning, a two-stage algorithm that iteratively builds the minimal subgraph for a specific task and updates the key parameters in a heuristic way. Experimental results confirm the existence of the intrinsic dimensionality at the node level and demonstrate the effectiveness of our method for transparent and interpretable fine-tuning. We visualize and analyze the circuits before, during, and after fine-tuning, providing new insights into the self-organization mechanism of a neural network in the learning process.



Article 1391

Title@2025-07-03 (4): Adapter-Enhanced Semantic Prompting for Continual Learning

Title: Adapter-Enhanced Semantic Prompting for Continual Learning Adapter-erweiterte semantische Prompting für kontinuierliches Lernen 增强适应性强化的持续学习的语义提示 2412.11074v3

Authors (8): Baocai Yin, Ji Zhao, Huajie Jiang, Ningning Hou, Yongli Hu, Amin Beheshti, Ming-Hsuan Yang, Yuankai Qi

Continual learning (CL) enables models to adapt to evolving data streams. A major challenge of CL is catastrophic forgetting, where new knowledge will overwrite previously acquired knowledge. Traditional methods usually retain the past data for replay or add additional branches in the model to learn new knowledge, which has high memory requirements. In this paper, we propose a novel lightweight CL framework, Adapter-Enhanced Semantic Prompting (AESP), which integrates prompt tuning and adapter techniques. Specifically, we design semantic-guided prompts to enhance the generalization ability of visual features and utilize adapters to efficiently fuse the semantic information, aiming to learn more adaptive features for the continual learning task. Furthermore, to choose the right task prompt for feature adaptation, we have developed a novel matching mechanism for prompt selection. Extensive experiments on three CL datasets demonstrate that our approach achieves favorable performance across multiple metrics, showing its potential for advancing CL.



Article 1392

Title@2025-07-03 (4): Rethinking Data Protection in the (Generative) Artificial Intelligence Era

Title: Rethinking Data Protection in the (Generative) Artificial Intelligence Era Datenschutz im Zeitalter der (generativen) Künstlichen Intelligenz neu denken 在人工(人工)情报时代重新思考数据保护问题 2507.03034v1

Authors (11): Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren

The (generative) artificial intelligence (AI) era has profoundly reshaped the meaning and value of data. No longer confined to static content, data now permeates every stage of the AI lifecycle, from the training samples that shape model parameters to the prompts and outputs that drive real-world model deployment. This shift renders traditional notions of data protection insufficient, while the boundaries of what needs safeguarding remain poorly defined. Failing to safeguard data in AI systems can inflict societal and individual harm, underscoring the urgent need to clearly delineate the scope of data protection and to enforce it rigorously. In this perspective, we propose a four-level taxonomy, comprising non-usability, privacy preservation, traceability, and deletability, that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline, including training datasets, model weights, system prompts, and AI-generated content. We analyze representative technical approaches at each level and reveal regulatory blind spots that leave critical assets exposed. By offering a structured lens to align future AI technologies and governance with trustworthy data practices, we underscore the urgency of rethinking data protection for modern AI techniques and provide timely guidance for developers, researchers, and regulators alike.



Article 1393

Title@2025-07-03 (4): Transfer Learning for Matrix Completion

Title: Transfer Learning for Matrix Completion Transfer-Lernen für Matrix-Vervollständigung 用于完成矩阵表完成的转移学习 2507.02248v1

Authors (2): Dali Liu, Haolei Weng

In this paper, we explore knowledge transfer in the setting of matrix completion, which aims to enhance the estimation of a low-rank target matrix when auxiliary data are available. We propose a transfer learning procedure given prior information on which source datasets are favorable. We study its convergence rates and prove its minimax optimality. Our analysis reveals that when the source matrices are close enough to the target matrix, our method outperforms the traditional method that uses only the target data. In particular, we leverage the sharp concentration inequalities introduced in \cite{brailovskaya2024universality} to eliminate a logarithmic factor in the convergence rate, which is crucial for proving the minimax optimality. When the relevance of source datasets is unknown, we develop an efficient detection procedure to identify informative sources and establish its selection consistency. Simulations and real data analysis are conducted to support the validity of our methodology.



Article 1394

Title@2025-07-03 (4): EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference

Title: EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference EigenLoRAx: Recycling-Adapter, um wesentliche Subräume für ressourceneffiziente Anpassung und Schlussfolgerung zu finden EigenLORAx:为资源效率适应和推断寻找主要子空间的再循环适应器 2502.04700v4

Authors (4): Prakhar Kaushik, Ankit Vaidya, Shravan Chaudhari, Alan Yuille

The rapid growth of large models has raised concerns about their environmental impact and equity in accessibility due to significant computational costs. Low-Rank Adapters (LoRA) offer a lightweight solution for finetuning large models, resulting in an abundance of publicly available adapters tailored to diverse domains. We ask: Can these pretrained adapters be leveraged to further streamline adaptation to new tasks while addressing these challenges? We introduce EigenLoRAx, a parameter-efficient finetuning method that recycles existing adapters to create a principal subspace aligned with their shared domain knowledge, which can be further augmented with orthogonal basis vectors in low-resource scenarios. This enables rapid adaptation to new tasks by learning only lightweight coefficients on the principal components of the subspace, eliminating the need to finetune entire adapters. EigenLoRAx requires significantly fewer parameters and less memory, improving efficiency for both training and inference. Our method demonstrates strong performance across diverse domains and tasks, offering a scalable solution for edge-based applications, personalization, and equitable deployment of large models in resource-constrained environments.



Article 1395

Title@2025-07-03 (4): VERBA: Verbalizing Model Differences Using Large Language Models

Title: VERBA: Verbalizing Model Differences Using Large Language Models VERBA: Verbalisieren von Modellunterschieden mit großen Sprachmodellen VERBA: 使用大语言模式 2507.02241v1

Authors (3): Shravan Doda, Shashidhar Reddy Javaji, Zining Zhu

In the current machine learning landscape, we face a “model lake” phenomenon: given a task, there is a proliferation of trained models with similar performance despite different behavior. For model users attempting to navigate and select from these models, documentation comparing model pairs is helpful. However, for every $N$ models there could be $O(N^2)$ pairwise comparisons, too many for model developers to perform manually and document. To facilitate fine-grained pairwise comparisons among models, we introduced VERBA. Our approach leverages a large language model (LLM) to generate verbalizations of model differences by sampling from the two models. We established a protocol that evaluates the informativeness of the verbalizations via simulation. We also assembled a suite with a diverse set of commonly used machine learning models as a benchmark. For a pair of decision tree models with up to 5% performance difference but 20-25% behavioral differences, VERBA effectively verbalizes their variations with up to 80% overall accuracy. When we included the models’ structural information, the verbalization accuracy further improved to 90%. VERBA opens up new research avenues for improving the transparency and comparability of machine learning models in a post-hoc manner.



Article 1396

Title@2025-07-03 (4): Bi-modality medical images synthesis by a bi-directional discrete process matching method

Title: Bi-modality medical images synthesis by a bi-directional discrete process matching method Bi-Modalität medizinische Bilder Synthese durch eine bi-direktionale diskrete Prozess-Matching-Methode 以双向离散进程匹配方法对双向单离散过程匹配方法进行双向医学图像合成 2409.03977v3

Authors (3): Zhe Xiong, Qiaoqiao Ding, Xiaoqun Zhang

Medical image synthesis has recently grown in popularity alongside the rapid development of generative models. It aims to generate an unacquired image modality, often from other observed data modalities. Synthesized images can be used for clinical diagnostic assistance, data augmentation for model training and validation, or image quality improvement. Meanwhile, flow-based models are among the successful generative models thanks to their ability to generate realistic, high-quality synthetic images. However, most flow-based models must compute the evolution steps of a flow ordinary differential equation (ODE) during synthesis, and their performance is significantly limited by the heavy computation time that a large number of time iterations entails. In this paper, we propose a novel flow-based model, bi-directional Discrete Process Matching (Bi-DPM), for bi-modality image synthesis tasks. Unlike other flow-matching-based models, we propose to use both forward and backward ODE flows and to enhance the consistency of the intermediate images over a few discrete time steps, resulting in a synthesis process that maintains high-quality generation for both modalities under the guidance of paired data. Our experiments on three datasets of MRI T1/T2 and CT/MRI demonstrate that Bi-DPM outperforms other state-of-the-art flow-based methods for bi-modality image synthesis, delivering higher image quality with accurate anatomical regions.



Article 1397

Title@2025-07-03 (4): Fast AI Model Splitting over Edge Networks

Title: Fast AI Model Splitting over Edge Networks Schnelle KI-Modellaufspaltung über Edge-Netzwerke 快速 AI 模型在边缘网络上分割 2507.01041v2

Authors (6): Zuguang Li, Wen Wu, Shaohua Wu, Songge Zhang, Ye Wang, Xuemin Shen

Split learning (SL) has emerged as a computationally efficient approach for artificial intelligence (AI) model training, which can alleviate device-side computational workloads. However, complex AI model architectures pose high computational complexity to obtain the optimal model splitting. In this paper, we represent an arbitrary AI model as a directed acyclic graph (DAG), and then reformulate the optimal model splitting problem as a minimum s-t cut search problem. To solve the problem, we propose a fast DAG-based model splitting algorithm, which restructures the DAG to enable the optimal model splitting identification via a maximum flow method. Theoretical analysis indicates that the proposed algorithm is optimal. Furthermore, considering AI models with block structures, we propose a block-wise model splitting algorithm to reduce computational complexity. The algorithm abstracts each block, i.e., a component consisting of multiple layers, into a single vertex, thereby obtaining the optimal model splitting via a simplified DAG. Extensive experimental results demonstrate that the proposed algorithms can determine the optimal model splitting within milliseconds, as well as reduce training delay by 24.62%-38.95% in dynamic edge networks as compared to the state-of-the-art benchmarks.
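
The reformulation of model splitting as a minimum s-t cut can be illustrated with a small, self-contained max-flow routine (Edmonds-Karp). This is a sketch under stated assumptions rather than the paper's algorithm: the DAG, the node names, and the edge capacities (standing in for the cost of cutting each edge, such as the data that would cross the device-server boundary) are all hypothetical, and the paper's DAG restructuring step is not reproduced.

```python
from collections import deque


def min_st_cut(capacity, source, sink):
    """Return (cut value, set of nodes on the source side) via Edmonds-Karp."""
    residual = {}
    for u, neighbours in capacity.items():
        for v, cap in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)  # reverse edge for flow cancellation
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    def reachable_parents():
        """BFS over edges with spare residual capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_parents()
        if sink not in parent:
            # No augmenting path left: reachable nodes form the source side.
            return flow, set(parent)
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical three-layer model with a skip connection from layer1 to layer3.
toy_model = {
    "device": {"layer1": 10},
    "layer1": {"layer2": 4, "layer3": 2},
    "layer2": {"layer3": 7},
    "layer3": {"server": 10},
}
cut_value, device_side = min_st_cut(toy_model, "device", "server")
```

In this toy graph the cheapest split keeps `layer1` on the device side: cutting there severs the edges to `layer2` and `layer3` for a total cost of 6, which is lower than any other split.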



Article 1398

Title@2025-07-03 (4): Privacy-Preserving Quantized Federated Learning with Diverse Precision

Title: Privacy-Preserving Quantized Federated Learning with Diverse Precision Privacy-Preserving Quantized Federated Learning with Diverse Precision 具有多种精密度的隐私保护量化联邦学习 2507.00920v2

Authors (6): Dang Qua Nguyen, Morteza Hashemi, Erik Perrins, Sergiy A. Vorobyov, David J. Love, Taejoon Kim

Federated learning (FL) has emerged as a promising paradigm for distributed machine learning, enabling collaborative training of a global model across multiple local devices without requiring them to share raw data. Despite its advancements, FL is limited by factors such as: (i) privacy risks arising from the unprotected transmission of local model updates to the fusion center (FC) and (ii) decreased learning utility caused by heterogeneity in model quantization resolution across participating devices. Prior work typically addresses only one of these challenges because maintaining learning utility under both privacy risks and quantization heterogeneity is a non-trivial task. In this paper, our aim is therefore to improve the learning utility of a privacy-preserving FL that allows clusters of devices with different quantization resolutions to participate in each FL round. Specifically, we introduce a novel stochastic quantizer (SQ) that is designed to simultaneously achieve differential privacy (DP) and minimum quantization error. Notably, the proposed SQ guarantees bounded distortion, unlike other DP approaches. To address quantization heterogeneity, we introduce a cluster size optimization technique combined with a linear fusion approach to enhance model aggregation accuracy. Numerical simulations validate the benefits of our approach in terms of privacy protection and learning utility compared to the conventional LaplaceSQ-FL algorithm.
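
For readers unfamiliar with stochastic quantization, the sketch below shows the generic unbiased variant (stochastic rounding to a uniform grid). It is not the SQ proposed in the paper and provides no differential privacy on its own; the function names and the grid step are hypothetical. It only illustrates two properties the abstract refers to: the quantization error is bounded by the grid step, and the quantized value equals the input in expectation.

```python
import math
import random


def round_up_probability(x, step):
    """Probability of rounding x up to the next multiple of `step`."""
    lower = math.floor(x / step) * step
    return (x - lower) / step


def stochastic_quantize(x, step, rng):
    """Round x to a neighbouring grid point so that the result is unbiased."""
    lower = math.floor(x / step) * step
    p_up = (x - lower) / step
    return lower + step if rng.random() < p_up else lower


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [stochastic_quantize(0.3, 0.25, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

With a step of 0.25, the value 0.3 is mapped to either 0.25 or 0.5, and the average of many draws stays close to 0.3. Devices with coarser resolution would use a larger `step`, which is one simple way to picture quantization heterogeneity.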



Article 1399

Title@2025-07-03 (4): PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations

Title: PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations PhysikKorrektur: Ein trainingsfreier Ansatz für stabile neurale PDE-Simulationen 物理校正:稳定神经物质模拟的无培训方法 2507.02227v1

Authors (2): Xinquan Huang, Paris Perdikaris

Neural networks have emerged as powerful surrogates for solving partial differential equations (PDEs), offering significant computational speedups over traditional methods. However, these models suffer from a critical limitation: error accumulation during long-term rollouts, where small inaccuracies compound exponentially, eventually causing complete divergence from physically valid solutions. We present PhysicsCorrect, a training-free correction framework that enforces PDE consistency at each prediction step by formulating correction as a linearized inverse problem based on PDE residuals. Our key innovation is an efficient caching strategy that precomputes the Jacobian and its pseudoinverse during an offline warm-up phase, reducing computational overhead by two orders of magnitude compared to standard correction approaches. Across three representative PDE systems – Navier-Stokes fluid dynamics, wave equations, and the chaotic Kuramoto-Sivashinsky equation – PhysicsCorrect reduces prediction errors by up to 100x while adding negligible inference time (under 5%). The framework integrates seamlessly with diverse architectures including Fourier Neural Operators, UNets, and Vision Transformers, effectively transforming unstable neural surrogates into reliable simulation tools that bridge the gap between deep learning’s computational efficiency and the physical fidelity demanded by practical scientific applications.
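
One generic way to read "correction as a linearized inverse problem" is a single Gauss-Newton style step on the PDE residual. The equation below is a sketch under that assumption and is not taken from the paper; the symbols are hypothetical: $\hat{u}$ is the network prediction, $r(\cdot)$ the discretized PDE residual, $J$ the Jacobian of $r$, and $J^{+}$ its Moore-Penrose pseudoinverse.

```latex
% Assumed generic form: linearize the residual around the prediction and solve for the update.
r(\hat{u} + \delta) \approx r(\hat{u}) + J\,\delta = 0
\quad\Longrightarrow\quad
\delta = -J^{+}\, r(\hat{u}),
\qquad
u_{\text{corrected}} = \hat{u} - J^{+}\, r(\hat{u})
```

Under this reading, caching $J^{+}$ once means each later correction costs only a residual evaluation and a matrix-vector product, which is consistent with the low overhead the abstract reports.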



Article 1400

Title@2025-07-03 (4): DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs

Title: DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs DecoRTL: Ein Laufzeit-Decoding-Framework für RTL-Code-Generierung mit LLMs DecoRTL: 使用LLMs的RTL代码生成运行时间解码框架 2507.02226v1

Authors (3): Mohammad Akyash, Kimia Azar, Hadi Kamali

As one of their many applications, large language models (LLMs) have recently shown promise in automating register transfer level (RTL) code generation. However, conventional LLM decoding strategies, originally designed for natural language, often fail to meet the structural and semantic demands of RTL, leading to hallucinated, repetitive, or invalid code outputs. In this paper, we first investigate the root causes of these decoding failures through an empirical analysis of token-level entropy during RTL generation. Our findings reveal that LLMs exhibit low confidence in regions of structural ambiguity or semantic complexity, showing that standard decoding strategies fail to differentiate between regions requiring determinism (syntax-critical regions) and those that benefit from creative exploratory variability (design-critical regions). To overcome this, we introduce DecoRTL, a novel run-time decoding strategy for RTL code generation that is both syntax-aware and contrastive. DecoRTL integrates two complementary components: (i) self-consistency sampling, which generates multiple candidates and re-ranks them based on token-level agreement to promote correctness while maintaining diversity; and (ii) syntax-aware temperature adaptation, which classifies tokens by their syntactical and functional roles and adjusts the sampling temperature accordingly, enforcing low temperature for syntax-critical tokens and higher temperature for exploratory ones. Our approach operates entirely at inference time without requiring any additional model fine-tuning. Through evaluations on multiple open-source LLMs using the VerilogEval benchmark, we demonstrate significant improvements in syntactic validity, functional correctness, and output diversity, with imperceptible execution overhead.
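The effect of role-dependent temperature can be illustrated with a minimal sketch. This is not DecoRTL's implementation: the boolean `syntax_critical` flag stands in for the token classifier, and the two temperature values are hypothetical.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_distribution(logits, syntax_critical, low_t=0.2, high_t=1.2):
    """Use a low temperature for syntax-critical positions, a higher one otherwise."""
    temperature = low_t if syntax_critical else high_t
    return softmax(logits, temperature)

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
sharp = adaptive_distribution(logits, syntax_critical=True)
flat = adaptive_distribution(logits, syntax_critical=False)
```

With the low temperature, nearly all probability mass sits on the top-scoring token, which makes sampling close to deterministic. The higher temperature leaves room for alternative tokens.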



Article 1401

Title@2025-07-03 (4): Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction

Title: Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction Metrisches Design != Metrisches Verhalten: Verbesserung der Metrischen Auswahl für die unvoreingenommene Bewertung der Dimensionalitätsreduktion 计量设计 ! = 计量行为:改进对减少多维度的无偏见评价的计量选择 2507.02225v1

Authors (3): Jiyeon Bae, Hyeon Jeon, Jinwook Seo

Evaluating the accuracy of dimensionality reduction (DR) projections in preserving the structure of high-dimensional data is crucial for reliable visual analytics. Diverse evaluation metrics targeting different structural characteristics have thus been developed. However, evaluations of DR projections can become biased if highly correlated metrics (those measuring similar structural characteristics) are inadvertently selected, favoring DR techniques that emphasize those characteristics. To address this issue, we propose a novel workflow that reduces bias in the selection of evaluation metrics by clustering metrics based on their empirical correlations rather than on their intended design characteristics alone. Our workflow computes metric similarity using pairwise correlations, clusters metrics to minimize overlap, and selects a representative metric from each cluster. Quantitative experiments demonstrate that our approach improves the stability of DR evaluation, which indicates that our workflow contributes to mitigating evaluation bias.
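The three workflow steps (pairwise correlation, clustering, representative selection) can be sketched as follows. The greedy threshold clustering and the rule for picking a representative are assumptions chosen for brevity, not the algorithm used in the paper. The metric names and scores are made up.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length score lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def cluster_metrics(scores, threshold=0.9):
    """Greedy clustering: a metric joins the first cluster whose seed it correlates with."""
    clusters = []
    for name in scores:
        for cluster in clusters:
            if abs(pearson(scores[name], scores[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def representative(cluster, scores):
    """Pick the member most correlated, on average, with the rest of its cluster."""
    def avg_corr(name):
        others = [m for m in cluster if m != name]
        if not others:
            return 1.0
        return mean(abs(pearson(scores[name], scores[o])) for o in others)
    return max(cluster, key=avg_corr)

# Hypothetical scores of three metrics over five DR projections.
scores = {
    "metric_a": [0.1, 0.4, 0.5, 0.9, 0.7],
    "metric_b": [0.2, 0.5, 0.6, 1.0, 0.8],  # metric_a shifted by 0.1
    "metric_c": [0.9, 0.2, 0.8, 0.3, 0.5],
}
clusters = cluster_metrics(scores)
reps = [representative(c, scores) for c in clusters]
```

Here `metric_a` and `metric_b` end up in one cluster because they rank the projections identically, so only one of them is kept alongside `metric_c`.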



Article 1402

Title@2025-07-03 (4): On the Mathematical Impossibility of Safe Universal Approximators

Title: On the Mathematical Impossibility of Safe Universal Approximators Über die mathematische Unmöglichkeit sicherer Universal-Annäherer 安全普遍接近者在数学方面的不易性 2507.03031v1

Authors (1): Jasper Yao

We establish fundamental mathematical limits on universal approximation theorem (UAT) system alignment by proving that catastrophic failures are an inescapable feature of any useful computational system. Our central thesis is that for any universal approximator, the expressive power required for useful computation is inextricably linked to a dense set of instabilities that make perfect, reliable control a mathematical impossibility. We prove this through a three-level argument that leaves no escape routes for any class of universal approximator architecture. i) Combinatorial Necessity: For the vast majority of practical universal approximators (e.g., those using ReLU activations), we prove that the density of catastrophic failure points is directly proportional to the network’s expressive power. ii) Topological Necessity: For any theoretical universal approximator, we use singularity theory to prove that the ability to approximate generic functions requires the ability to implement the dense, catastrophic singularities that characterize them. iii) Empirical Necessity: We prove that the universal existence of adversarial examples is empirical evidence that real-world tasks are themselves catastrophic, forcing any successful model to learn and replicate these instabilities. These results, combined with a quantitative “Impossibility Sandwich” showing that the minimum complexity for usefulness exceeds the maximum complexity for safety, demonstrate that perfect alignment is not an engineering challenge but a mathematical impossibility. This foundational result reframes UAT safety from a problem of “how to achieve perfect control” to one of “how to operate safely in the presence of irreducible uncontrollability,” with profound implications for the future of UAT development and governance.



Article 1403

Title@2025-07-03 (4): Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

Title: Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Wo finden Sie Grokking in LLM Pretraining? Überwachen Sie Memorization-to-Generalization ohne Test 在 LLLM 预修课程中在哪里找到 Grokking ? 监视不试验的记忆化到普及 。 2506.21551v2

Authors (3): Ziyue Li, Chenrui Fan, Tianyi Zhou

Grokking, i.e., test performance that keeps improving long after the training loss has converged, has recently been observed in neural network training, making the mechanism of generalization and other emerging capabilities such as reasoning mysterious. While prior studies usually train small models on a few toy or highly-specific tasks for thousands of epochs, we conduct the first study of grokking on checkpoints during one-pass pretraining of a 7B large language model (LLM), i.e., OLMoE. We compute the training loss and evaluate generalization on diverse benchmark tasks, including math reasoning, code generation, and commonsense/domain-specific knowledge retrieval tasks. Our study, for the first time, verifies that grokking still happens in the pretraining of large-scale foundation models, though different data may enter grokking stages asynchronously. We further demystify grokking’s “emergence of generalization” by investigating LLM internal dynamics. Specifically, we find that training samples’ pathways (i.e., expert choices across layers) evolve from random and instance-specific to more structured and shareable between samples. Also, the complexity of a sample’s pathway decreases even though the loss has converged. These findings indicate a memorization-to-generalization “knowledge digestion”, providing a mechanistic explanation of delayed generalization. In the study, we develop two novel metrics to quantify pathway distance and the complexity of a single pathway. We show their ability to predict the generalization improvement on diverse downstream tasks. They are efficient, simple to compute, and solely dependent on training data. Hence, they have practical value for pretraining, enabling us to monitor the generalization performance without finetuning or testing. Theoretically, we show that more structured pathways reduce model complexity and improve the generalization bound.



Article 1404

Title@2025-07-03 (4): Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting

Title: Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting Multivariate de Bruijn Graphen: Ein symbolisches Graphen-Framework für die Vorhersage von Zeitreihen 布鲁伊图多变量图:时间序列预测符号图框架 2505.22768v3

Authors (5): Mert Onur Cakiroglu, Idil Bilge Altun, Mehmet Dalkilic, Elham Buxton, Hasan Kurban

Time series forecasting remains a challenging task for foundation models due to temporal heterogeneity, high dimensionality, and the lack of inherent symbolic structure. In this work, we propose DRAGON (Discrete Representation and Augmented Graph encoding Over de BruijN Graphs), a novel encoder that introduces Multivariate de Bruijn Graphs (MdBGs) to bridge the gap between symbolic representations and neural modeling. DRAGON discretizes continuous input sequences and maps them onto a fixed graph structure, enabling dynamic context recovery via graph-based attention. Integrated as an auxiliary module within a dual-branch architecture, DRAGON augments conventional CNN-based encoders with symbolic, structure-aware representations. All code developed for this study is available at: https://github.com/KurbanIntelligenceLab/MultdBG-Time-Series-Library
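As a rough illustration of discretizing a sequence and mapping it onto a de Bruijn graph, the sketch below handles a single univariate series with equal-width bins. It is a simplification, not DRAGON's multivariate encoder. The bin count, the order `k`, and the function names are assumptions.

```python
from collections import defaultdict

def discretize(series, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant series
    return [min(int((x - lo) / width), n_bins - 1) for x in series]

def de_bruijn_edges(symbols, k):
    """Nodes are k-tuples of symbols; an edge links each window to the next overlapping one."""
    edges = defaultdict(int)
    for i in range(len(symbols) - k):
        src = tuple(symbols[i:i + k])
        dst = tuple(symbols[i + 1:i + k + 1])
        edges[(src, dst)] += 1  # edge weight counts how often the transition occurs
    return dict(edges)

series = [0.0, 0.1, 0.9, 1.0, 0.1, 0.9]  # hypothetical input
symbols = discretize(series, n_bins=2)
edges = de_bruijn_edges(symbols, k=2)
```

Consecutive nodes share `k - 1` symbols, so a walk along the edges reproduces the symbolic sequence. That overlap is what allows context to be recovered from the graph.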



Article 1405

Title@2025-07-03 (4): Hybrid least squares for learning functions from highly noisy data

Title: Hybrid least squares for learning functions from highly noisy data Hybride kleinste Quadrate für das Lernen von Funktionen aus hochlauten Daten 从高噪音数据中学习函数最小混合方格 2507.02215v1

Authors (4): Ben Adcock, Bernhard Hientzsch, Akil Narayan, Yiming Xu

Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are powerful in the small noise regime are suboptimal when large noise is present. We propose a hybrid approach that combines Christoffel sampling with certain types of optimal experimental design to address this issue. We show that the proposed algorithm enjoys appropriate optimality properties for both sample point generation and noise mollification, leading to improved computational efficiency and sample complexity compared to existing methods. We also extend the algorithm to convex-constrained settings with similar theoretical guarantees. When the target function is defined as the expectation of a random field, we extend our approach to leverage adaptive random subspaces and establish results on the approximation capacity of the adaptive procedure. Our theoretical findings are supported by numerical studies on both synthetic data and on a more challenging stochastic simulation problem in computational finance.



Article 1406

Title@2025-07-03 (4): SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers

Title: SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers SciGA: Ein umfassender Datensatz zur Gestaltung grafischer Abstracts in wissenschaftlichen Papieren SciGA: 用于设计学术论文制图摘要的综合数据集 2507.02212v1

Authors (4): Takuro Kawada, Shunsuke Kitada, Sota Nemoto, Hitoshi Iyatomi

Graphical Abstracts (GAs) play a crucial role in visually conveying the key findings of scientific papers. While recent research has increasingly incorporated visual materials such as Figure 1 as de facto GAs, their potential to enhance scientific communication remains largely unexplored. Moreover, designing effective GAs requires advanced visualization skills, creating a barrier to their widespread adoption. To tackle these challenges, we introduce SciGA-145k, a large-scale dataset comprising approximately 145,000 scientific papers and 1.14 million figures, explicitly designed for supporting GA selection and recommendation as well as facilitating research in automated GA generation. As a preliminary step toward GA design support, we define two tasks: 1) Intra-GA recommendation, which identifies figures within a given paper that are well-suited to serve as GAs, and 2) Inter-GA recommendation, which retrieves GAs from other papers to inspire the creation of new GAs. We provide reasonable baseline models for these tasks. Furthermore, we propose Confidence Adjusted top-1 ground truth Ratio (CAR), a novel recommendation metric that offers a fine-grained analysis of model behavior. CAR addresses limitations in traditional ranking-based metrics by considering cases where multiple figures within a paper, beyond the explicitly labeled GA, may also serve as GAs. By unifying these tasks and metrics, our SciGA-145k establishes a foundation for advancing visual scientific communication while contributing to the development of AI for Science.
